DUKE MATHEMATICAL JOURNAL, Vol. 109, No. 1, © 2001

ALGEBRAIC ASPECTS OF INCREASING SUBSEQUENCES

JINHO BAIK AND ERIC M. RAINS

Abstract
We present a number of results relating partial Cauchy-Littlewood sums, integrals over the compact classical groups, and increasing subsequences of permutations. These include: integral formulae for the distribution of the longest increasing subsequence of a random involution with constrained number of fixed points; new formulae for partial Cauchy-Littlewood sums, as well as new proofs of old formulae; relations of these expressions to orthogonal polynomials on the unit circle; and explicit bases for invariant spaces of the classical groups, together with appropriate generalizations of the straightening algorithm.

Introduction
Consider the following two identities:
\[ \sum_{\lambda} s_\lambda(x_1, x_2, \dots)\, s_\lambda(y_1, y_2, \dots) = \prod_{i,j} (1 - x_i y_j)^{-1}, \tag{0.1} \]
\[ \lim_{l\to\infty} E_{U\in U(l)} \prod_i \det(1 - x_i U)^{-1} \prod_i \det(1 - y_i U^{-1})^{-1} = \prod_{i,j} (1 - x_i y_j)^{-1}. \tag{0.2} \]
The first of these is the well-known identity of Cauchy (see [29]). The second is a formal analogue of the Szegő limit theorem, equivalent to a theorem of [10]. Since the right-hand sides are the same, we also have a third identity:
\[ \sum_{\lambda} s_\lambda(x_1, x_2, \dots)\, s_\lambda(y_1, y_2, \dots) = \lim_{l\to\infty} E_{U\in U(l)} \prod_i \det(1 - x_i U)^{-1} \prod_i \det(1 - y_i U^{-1})^{-1}. \tag{0.3} \]
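As a numerical illustration of the Cauchy identity (0.1), the following minimal sketch restricts to two variables in each alphabet, where only partitions with at most two parts contribute. The function names (`schur2`, `cauchy_sum`, `cauchy_product`) and the truncation parameter are assumptions made for this example; the Schur functions are evaluated by the standard bialternant formula.

```python
def schur2(a, b, x1, x2):
    """Schur function s_(a,b)(x1, x2) for a >= b >= 0 (bialternant formula)."""
    return (x1 ** (a + 1) * x2 ** b - x2 ** (a + 1) * x1 ** b) / (x1 - x2)


def cauchy_sum(x, y, max_part=60):
    """Left side of (0.1) in two variables, truncated to partitions with first part <= max_part."""
    total = 0.0
    for a in range(max_part + 1):
        for b in range(a + 1):
            total += schur2(a, b, x[0], x[1]) * schur2(a, b, y[0], y[1])
    return total


def cauchy_product(x, y):
    """Right side of (0.1): the product of (1 - x_i y_j)^(-1) over all i, j."""
    result = 1.0
    for xi in x:
        for yj in y:
            result /= 1.0 - xi * yj
    return result


if __name__ == "__main__":
    x, y = (0.3, 0.2), (0.25, 0.1)
    print(cauchy_sum(x, y), cauchy_product(x, y))
```

For small arguments the truncated sum converges geometrically, so the two printed values should agree to floating-point accuracy.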
Received 23 February 2000. Revision received 26 September 2000.
2000 Mathematics Subject Classification. Primary 05E15.
Baik's work supported in part by a Sloan Doctoral Foundation Fellowship.

Our object of study in the present work is generalizations of these three identities. Our generalizations take two forms. One is to remove the limit $l \to \infty$. As we see in Section 5, in order to preserve (0.3), we must then restrict the sum over partitions
to involve only partitions with at most $l$ parts. It is here that increasing subsequences appear; in order to rescue equations (0.1) and (0.2), we must replace the common right-hand side with a generating function counting objects (reducing to permutations in an appropriate limit) without long increasing subsequences. In the case of (0.1), the connection is via the Robinson-Schensted-Knuth correspondence, with its well-known connections to increasing subsequences. It turns out that there is also a direct connection for (0.2), in terms of the invariant theory of the unitary group. In particular, this gives a direct (and essentially elementary) proof of the known connection between unitary group integrals and increasing subsequences (see [32]), as well as of the new connections given here.

The other way in which we generalize these identities is to replace the unitary group $U(l)$ by one of four other groups, including the orthogonal and symplectic groups. In terms of permutations, this corresponds to considering involutions (in two ways), signed permutations, and signed involutions, in addition to the original case of permutations; each of these conditions can be described as a particular type of symmetry condition. In each case, we obtain analogues of the finite $l$ versions of equations (0.1), (0.2), and (0.3), together with increasing subsequence interpretations.

Guide to main results
One of the classical models used to analyze increasing subsequences is the Poisson model: one generates a random subset of the unit square via a Poisson process; then one associates a permutation to this subset in a canonical way (the order of the $y$ coordinates relative to the $x$ coordinates). The five symmetry types correspond to the five subgroups of $\mathbb{Z}_2 \times \mathbb{Z}_2$ (acting on the unit square via diagonal reflections); one insists that the random subset be preserved by the appropriate group.
It turns out that each symmetry type is naturally associated with a certain compact Lie group, determined as a fixed subgroup via an appropriate action of $\mathbb{Z}_2 \times \mathbb{Z}_2$ on the unitary group. Our first main result (Theorem 1.2) then states that the (exact) distribution of the (length of the) longest increasing subsequence of a random permutation of a given symmetry type is given by the moments of the trace of a random element of the corresponding Lie group.

By the Schensted correspondence, each of the five cases of Theorem 1.2 can be viewed as expressing an integral over one of the five groups as an appropriate sum over partitions. Each such identity specializes an appropriate Schur function identity (see Theorem 5.2 and Corollary 5.3). For the three symmetry types with diagonal symmetry, this can be further generalized (essentially allowing points on the diagonal of symmetry); thus, for instance, if $f(\lambda)$ is the number of odd parts and if $\ell(\lambda)$ is the number of parts of $\lambda$, then (see Theorem 5.6)
\[ \sum_{\ell(\lambda)\le l} \alpha^{f(\lambda)} s_\lambda(x) = E_{U\in O(l)} \det(1 + \alpha U) \prod_j \det(1 - x_j U)^{-1}. \tag{0.4} \]
For a quite general class of specializations (corresponding to “super-Schur” or “hook Schur” functions), these Schur function sums have combinatorial interpretations in terms of increasing subsequences of multisets. That is, for each symmetry type and each specialization (subject to convergence conditions), we construct a random multiset and a notion of increasing subsequence such that (see Theorem 7.1) the normalized Schur function sum gives the distribution of the longest increasing subsequence. These random multiset models generalize K. Johansson’s 2-dimensional random growth model in [24]. Putting these together, we obtain a connection between integrals and increasing subsequences of multisets. In each case, the identity states that the dimension of a certain space of invariants is given by counting a certain collection of multisets without long increasing subsequences. For the three classical groups, we give direct proofs of this fact by producing an explicit basis of the invariants indexed by the appropriate multisets. Thus, for instance (see Theorem 8.2), the centralizer algebra of the nth tensor power representation of U (l) has basis given by permutations of length n with no increasing subsequence of length greater than l. More generally (see Theorem 8.4), the space of simultaneous multilinear invariants of a collection of symmetric and antisymmetric, covariant and contravariant tensors is explicitly indexed by multisets without long increasing subsequences (generalizing the classical straightening algorithm). Corresponding results hold for the orthogonal (see Theorems 8.5 and 8.6) and symplectic (see Theorems 8.7 and 8.8) groups. The remaining collection of results is of lesser interest combinatorially, but it is crucial to our asymptotic analysis in [3]. The key step in the analysis is to express the integrals of interest in terms of orthogonal polynomials on the unit circle. 
This is done for the classical groups in Theorem 2.3 (the remaining two groups reduce to the unitary group); for the unitary group the connection is immediate, while for the orthogonal group one must pass through orthogonal polynomials on $[-1, 1]$. We also give a number of results indicating how certain modifications to the integrand affect the integral. As a consequence, we find (see Corollary 4.3) that for each of the five natural Poisson models, the longest increasing subsequence distribution can be expressed in terms of the same family of orthogonal polynomials. In the sequel to this paper (see [3]), we determine the asymptotics of such polynomials via the Riemann-Hilbert method, and thus we obtain the limiting longest increasing subsequence distribution for each of the five Poisson models, as well as for the de-Poissonized versions (random symmetric permutations). The results are expressed in terms of the solution to the Painlevé II equation, thus connecting to random matrix theory. Further related
asymptotic work can be found in [2], [1], and [33].

Outline
Section 1 introduces the five symmetry types and their associated groups. In particular, we express the (exact) distribution of the longest increasing subsequence of a random permutation of a given symmetry type as an integral over the corresponding group. Section 2 expresses these integrals as determinants of Bessel functions related to orthogonal polynomials; this relation is given in Section 3. Section 4 describes the extension of the integrals to include the cases when fixed points are allowed. In order to prove the formulae of Section 4, Section 5 uses representation-theoretic arguments to deduce integral representations of partial Cauchy-Littlewood sums, at which point the theory of symmetric functions can be applied. Section 6 briefly discusses alternate proofs of those formulae based on intermediate Pfaffian forms. Section 7 uses a generalized Robinson-Schensted-Knuth correspondence from [5] to relate the partial Cauchy-Littlewood sums to increasing subsequences of certain distributions of random multisets. Finally, Section 8 shows that there is an intimate connection between increasing subsequences and invariants of the classical groups. Indeed, all of the integrals for which we give increasing subsequence interpretations also can be interpreted as dimensions of certain spaces of invariants. We give direct, elementary proofs of these identities by constructing bases of invariants explicitly indexed by multisets with restricted increasing subsequence length. In the process we obtain a generalization of the straightening algorithm of invariant theory, as well as analogues for the orthogonal and symplectic groups. We also discuss extensions to quantum groups and supergroups.

Notation
We refer the reader to [29] for notation and an introduction to symmetric functions. In other notation, if $G$ is a compact group, we use
\[ E_{U\in G} f(U) \tag{0.5} \]
to denote the integral of $f(U)$ with respect to the (normalized) Haar measure on $G$. In other words, this is the expected value of $f$ evaluated at a uniform random element of $G$. When $G$ is the orthogonal group, we occasionally need to consider the two components of $G$. Thus we write
\[ E_{U\in O^{\pm}(l)} f(U) \tag{0.6} \]
to denote the integral of $f$ over the coset of $O(l)$ of determinant $\pm 1$. In particular, we have
\[ E_{U\in O^{\pm}(l)} f(U) = \frac{1}{2} E_{U\in O(l)} (1 \pm \det(U)) f(U). \tag{0.7} \]

1. Symmetrized increasing subsequence problems
One of the standard models used in analyzing the usual increasing subsequence problem is defined as follows. We say that a collection of $k$ points in the unit square is increasing if for any two points $(x_1, y_1)$ and $(x_2, y_2)$, either $x_1 < x_2$ and $y_1 < y_2$ or $x_1 > x_2$ and $y_1 > y_2$. One can ask, then, for the distribution of the size of the largest increasing subset of $n$ points chosen uniformly and independently from the unit square. It is not too difficult to see that this is the same as the distribution of the longest increasing subsequence of a random permutation; indeed, we can associate a (uniformly distributed) permutation to a given collection of points by using the relative order of the $y$ coordinates after sorting along the $x$ coordinates. Also of interest is the Poissonized analogue, in which new points are occasionally added in such a way that the number of points at time $\lambda$ is Poisson with parameter $\lambda$.

One way to generalize this model is to impose a symmetry condition on the set of points. The square has eight symmetries; if we insist that the symmetry preserve increasing collections, we obtain a group $H$ of four elements, generated by the reflections through the main diagonals. Thus there are five possible symmetry conditions we can impose (including the trivial condition), which we denote by the symbols $\square$, $\boxslash$, $\boxbslash$, $\boxdot$, and $\boxtimes$, with associated groups $H^{\square}$, $H^{\boxslash}$, $H^{\boxbslash}$, $H^{\boxdot}$, and $H^{\boxtimes}$. (The symbol indicates the point/line(s) of reflection.) We also use the symbol $\circledast$ to denote an arbitrary choice of the five possibilities.

Definition 1
We define $p^{\circledast}_{nl}$ to be the probability that, if $n$ points are chosen uniformly at random in the unit square, then the set $\Sigma$ consisting of the images of those points under $H^{\circledast}$ contains no increasing subset with more than $l$ points. We also define a function
\[ Q_l^{\circledast}(\lambda) = e^{-\lambda} \sum_{0\le n} \frac{\lambda^n p^{\circledast}_{nl}}{n!}. \tag{1.1} \]
The function $Q_l^{\circledast}(\lambda)$ corresponds to the natural Poisson model; $Q_l^{\circledast}(\lambda)$ is the probability that the largest increasing subset at time $\lambda$ has size at most $l$, and $1 - Q_l^{\circledast}(\lambda)$ is the distribution function of the time at which an increasing subset of size $l + 1$ first appears. In the sequel, however, it turns out to be convenient to use a different time scale; we thus define
\[ P_l^{\circledast}(t) = Q_l^{\circledast}(a^{\circledast} t^2/2), \tag{1.2} \]
where $a^{\boxslash} = a^{\boxbslash} = 1$, $a^{\square} = a^{\boxtimes} = 2$, and $a^{\boxdot} = 4$. (Here $a^{\circledast}$ is the number of Young tableaux associated to an element of $S_n^{\circledast}$ in the proof of Theorem 1.2.)

If we map the set of points to a permutation, we obtain a permutation uniformly distributed from an appropriate ensemble. To be precise, define the involution $\iota \in S_n$ by $x \mapsto n + 1 - x$. Then define an ensemble $S_n^{\circledast}$ for each symmetry type as follows:
\begin{align}
S_n^{\square} &= S_n, \tag{1.3} \\
S_n^{\boxslash} &= \{\pi \in S_{2n} : \pi = \pi^{-1},\ \pi(x) \ne x\}, \tag{1.4} \\
S_n^{\boxbslash} &= \{\pi \in S_{2n} : \pi = \iota\pi^{-1}\iota,\ \pi(x) \ne \iota(x)\}, \tag{1.5} \\
S_n^{\boxdot} &= \{\pi \in S_{2n} : \pi = \iota\pi\iota\}, \tag{1.6} \\
S_n^{\boxtimes} &= \{\pi \in S_{4n} : \pi = \pi^{-1},\ \pi = \iota\pi^{-1}\iota,\ \pi(x) \ne x, \iota(x)\}. \tag{1.7}
\end{align}
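The passage from a point set to a permutation, and the measurement of its largest increasing subset, can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, all coordinates are assumed distinct (which holds with probability 1), and only the reflection through the diagonal $y = x$ is implemented, which yields fixed-point-free involutions as in (1.4).

```python
import bisect
import random


def permutation_from_points(points):
    """Sort the points by x and record the rank (1-based) of each y coordinate."""
    ordered_ys = sorted(y for _, y in points)
    rank = {y: i + 1 for i, y in enumerate(ordered_ys)}
    return [rank[y] for _, y in sorted(points)]


def longest_increasing_length(sequence):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []  # tails[i] = smallest possible last entry of an increasing run of length i + 1
    for value in sequence:
        position = bisect.bisect_left(tails, value)
        if position == len(tails):
            tails.append(value)
        else:
            tails[position] = value
    return len(tails)


def reflect_through_diagonal(points):
    """Add the mirror image (y, x) of every point (x, y)."""
    return list(points) + [(y, x) for x, y in points]


if __name__ == "__main__":
    random.seed(1)
    sample = [(random.random(), random.random()) for _ in range(6)]
    pi = permutation_from_points(reflect_through_diagonal(sample))
    print(pi, longest_increasing_length(pi))
```

Because the symmetrized set is invariant under swapping coordinates, the resulting permutation equals its own inverse, and it has no fixed points as long as no sampled point lies on the diagonal.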
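LEMMA 1.1
If a set $\Sigma$ is chosen as above with symmetry $\circledast$, then with probability 1, the associated permutation is well defined and is uniformly distributed from $S_n^{\circledast}$.

This motivates the further definition
\[ f^{\circledast}_{nl} = |S_n^{\circledast}|\, p^{\circledast}_{nl}. \tag{1.8} \]
That is, $f^{\circledast}_{nl}$ is the number of elements of $S_n^{\circledast}$ with no increasing subsequence of length greater than $l$. It is straightforward to compute $|S_n^{\circledast}|$ for each case:
\begin{align}
|S_n^{\square}| &= n!, \tag{1.9} \\
|S_n^{\boxslash}| = |S_n^{\boxbslash}| &= \frac{(2n)!}{2^n n!}, \tag{1.10} \\
|S_n^{\boxdot}| &= 2^n n!, \tag{1.11} \\
|S_n^{\boxtimes}| &= \frac{(2n)!}{n!}. \tag{1.12}
\end{align}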
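The cardinalities (1.9) through (1.12) can be checked by brute-force enumeration for small $n$. The sketch below is an illustration only; the ensemble labels (`"plain"`, `"inv"`, `"rev_inv"`, `"signed"`, `"signed_inv"`) are hypothetical names for the five symmetry types in the order (1.3) to (1.7), and permutations are stored 0-based, so that $\iota(x) = N - 1 - x$ on $N$ letters.

```python
from itertools import permutations
from math import factorial


def ensemble(kind, n):
    """Enumerate the ensemble of the given kind by filtering the full symmetric group."""
    size = {"plain": n, "inv": 2 * n, "rev_inv": 2 * n, "signed": 2 * n, "signed_inv": 4 * n}[kind]
    flip = lambda x: size - 1 - x  # the involution iota, 0-based
    members = []
    for p in permutations(range(size)):
        involution = all(p[p[x]] == x for x in range(size))              # pi = pi^{-1}
        rev_involution = all(p[flip(p[flip(x)])] == x for x in range(size))  # pi = iota pi^{-1} iota
        commutes = all(p[flip(x)] == flip(p[x]) for x in range(size))    # pi = iota pi iota
        no_fixed = all(p[x] != x for x in range(size))
        no_anti_fixed = all(p[x] != flip(x) for x in range(size))
        keep = {
            "plain": True,
            "inv": involution and no_fixed,
            "rev_inv": rev_involution and no_anti_fixed,
            "signed": commutes,
            "signed_inv": involution and rev_involution and no_fixed and no_anti_fixed,
        }[kind]
        if keep:
            members.append(p)
    return members


def expected_size(kind, n):
    """The closed forms (1.9)-(1.12)."""
    if kind == "plain":
        return factorial(n)
    if kind in ("inv", "rev_inv"):
        return factorial(2 * n) // (2 ** n * factorial(n))
    if kind == "signed":
        return 2 ** n * factorial(n)
    return factorial(2 * n) // factorial(n)


if __name__ == "__main__":
    for kind in ("plain", "inv", "rev_inv", "signed", "signed_inv"):
        print(kind, [len(ensemble(kind, n)) for n in (1, 2)])
```

The filter is exponential in $n$ and is meant only for very small cases (the largest group scanned here is $S_8$).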
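Note that
\[ P_l^{\circledast}(t) = e^{-a^{\circledast} t^2/2} \sum_{0\le n} f^{\circledast}_{nl} \frac{t^{2n}}{n!\,n!} \tag{1.13} \]
for $\square$ and $\boxdot$, and note that
\[ P_l^{\circledast}(t) = e^{-a^{\circledast} t^2/2} \sum_{0\le n} f^{\circledast}_{nl} \frac{t^{2n}}{(2n)!} \tag{1.14} \]
for $\boxslash$, $\boxbslash$, and $\boxtimes$. A major reason for considering the above problems is the following theorem.

THEOREM 1.2
Fix an integer $l > 0$. Map $H^{\boxtimes}$ into $\mathrm{Aut}(U(2l))$ by
\[ \boxslash \mapsto (U \mapsto (U^t)^{-1}) \quad\text{and}\quad \boxbslash \mapsto (U \mapsto -J (U^t)^{-1} J), \tag{1.15} \]
where $J = \begin{pmatrix} 0 & I_l \\ -I_l & 0 \end{pmatrix}$. Let $U^{\circledast}(2l)$ be the subgroup of $U(2l)$ fixed by the corresponding automorphisms. Then
\[ f^{\circledast}_{n(2l)} = E_{U\in U^{\circledast}(2l)} |\mathrm{Tr}(U)|^{2n}. \tag{1.16} \]

Before giving the proof, it is helpful to list the groups $U^{\circledast}(2l)$:
\begin{align}
U^{\square}(2l) &= U(2l), \tag{1.17} \\
U^{\boxslash}(2l) &= O(2l), \tag{1.18} \\
U^{\boxbslash}(2l) &= Sp(2l), \tag{1.19} \\
U^{\boxdot}(2l) &\cong U(l) \times U(l), \tag{1.20} \\
U^{\boxtimes}(2l) &= O(2l) \cap Sp(2l) \cong U(l). \tag{1.21}
\end{align}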
The last instance is the image of the fundamental representation of $U(l)$ as a $2l \times 2l$ real matrix, and thus it corresponds to the direct sum of the fundamental representation and its conjugate.

Proof
The cases $\square$, $\boxslash$, and $\boxbslash$ are given in [32]; more precisely, $\square$ is given there as [32, Th. 1.1], while $\boxslash$ and $\boxbslash$ are given in [32, Th. 3.4]. (Note that if $\pi \in S_n^{\boxbslash}$, then $\pi\iota$ is a fixed-point-free involution with decreasing subsequences corresponding to the increasing subsequence of $\pi$.) We also give new, elementary proofs (see Theorems 8.2, 8.5, and 8.7).

It remains to consider $\boxdot$ and $\boxtimes$. Via the Robinson-Schensted correspondence (see [26, Sec. 5.1.4] for an excellent introduction), we can associate a pair $(P, Q)$ of Young tableaux of the same shape to $\pi \in S_n^{\boxdot}$, satisfying the relations $P^S = P$, $Q^S = Q$, where $S$ is the duality operation ("evacuation") of Schützenberger (see [26]). But there is a bijective correspondence between self-dual tableaux and domino tableaux (see, e.g., [42]), and, further, from domino tableaux to pairs of ordinary tableaux with disjoint content, and with shape determined only by the shape of the domino tableau (see [37]). Thus we have associated four Young tableaux $(P_1, P_2, Q_1, Q_2)$, where $P_1$ and $Q_1$ have the same shape, $P_2$ and $Q_2$ have the same shape, $P_1$ and $P_2$ have disjoint content, and $Q_1$ and $Q_2$ have disjoint content. This corresponds to a choice of $0 \le m \le n$, independent choices of two subsets of size $m$ of $[1, 2, \dots, n]$, and independent choices of $\pi_1$ and $\pi_2$ of length $m$ and $n - m$. Furthermore, the longest increasing subsequence of $\pi$ has length at most $2l$ precisely when the longest increasing subsequences of $\pi_1$ and $\pi_2$ are of length at most $l$. Putting this together, we find
\[ f^{\boxdot}_{n(2l)} = \sum_{0\le m\le n} \binom{n}{m}^2 f^{\square}_{ml} f^{\square}_{(n-m)l}, \tag{1.22} \]
while the integral formula simplifies as follows:
\begin{align}
E_{U\in U^{\boxdot}(2l)} |\mathrm{Tr}(U)|^{2n}
&= E_{U_1\in U(l), U_2\in U(l)} |\mathrm{Tr}(U_1) + \mathrm{Tr}(U_2)|^{2n} \notag \\
&= E_{U_1\in U(l), U_2\in U(l)} |(\mathrm{Tr}(U_1) + \mathrm{Tr}(U_2))^n|^2 \notag \\
&= E_{U_1\in U(l), U_2\in U(l)} \Bigl| \sum_{0\le m\le n} \binom{n}{m} \mathrm{Tr}(U_1)^m \mathrm{Tr}(U_2)^{n-m} \Bigr|^2 \notag \\
&= \sum_{0\le m\le n} \binom{n}{m}^2 f^{\square}_{ml} f^{\square}_{(n-m)l} \notag \\
&= f^{\boxdot}_{n(2l)}. \tag{1.23}
\end{align}
Similarly, an element $\pi \in S_n^{\boxtimes}$ corresponds to a pair of Young tableaux of the same shape with disjoint content, and thus
\[ f^{\boxtimes}_{n(2l)} = \binom{2n}{n} f^{\square}_{nl}, \tag{1.24} \]
while
\begin{align}
E_{U\in U^{\boxtimes}(2l)} |\mathrm{Tr}(U)|^{2n}
&= E_{U\in U(l)} (\mathrm{Tr}(U) + \overline{\mathrm{Tr}(U)})^{2n} \notag \\
&= E_{U\in U(l)} \sum_{0\le m\le 2n} \binom{2n}{m} \mathrm{Tr}(U)^m\, \overline{\mathrm{Tr}(U)}^{\,2n-m} \notag \\
&= \binom{2n}{n} E_{U\in U(l)} |\mathrm{Tr}(U)|^{2n}. \tag{1.25}
\end{align}
It ought to be possible to give a more uniform proof of this result; the results of Section 8 may be relevant to this goal.
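The counting identities (1.22) and (1.24) from the proof can be sanity-checked by exhaustive enumeration for very small $n$. The following is a sketch under stated assumptions rather than part of the argument: the function names are hypothetical, permutations are 0-based with $\iota(x) = N - 1 - x$, and the symmetric ensembles are obtained by filtering the full symmetric group as in (1.6) and (1.7).

```python
import bisect
from itertools import permutations
from math import comb


def lis_length(sequence):
    """Length of the longest strictly increasing subsequence."""
    tails = []
    for value in sequence:
        position = bisect.bisect_left(tails, value)
        if position == len(tails):
            tails.append(value)
        else:
            tails[position] = value
    return len(tails)


def count_plain(n, bound):
    """Number of permutations of n letters with no increasing subsequence longer than bound."""
    return sum(1 for p in permutations(range(n)) if lis_length(p) <= bound)


def count_signed(n, bound):
    """Same count over {pi in S_2n : pi = iota pi iota}, cf. (1.6)."""
    size = 2 * n
    return sum(
        1
        for p in permutations(range(size))
        if all(p[size - 1 - x] == size - 1 - p[x] for x in range(size)) and lis_length(p) <= bound
    )


def count_signed_involutions(n, bound):
    """Same count over the ensemble (1.7) inside S_4n."""
    size = 4 * n
    total = 0
    for p in permutations(range(size)):
        symmetric = all(
            p[p[x]] == x and p[size - 1 - x] == size - 1 - p[x] and p[x] != x and p[x] != size - 1 - x
            for x in range(size)
        )
        if symmetric and lis_length(p) <= bound:
            total += 1
    return total


def rhs_1_22(n, l):
    return sum(comb(n, m) ** 2 * count_plain(m, l) * count_plain(n - m, l) for m in range(n + 1))


if __name__ == "__main__":
    print(count_signed(2, 2), rhs_1_22(2, 1))
    print(count_signed_involutions(2, 2), comb(4, 2) * count_plain(2, 1))
```

Both printed pairs compare a count with bound $2l$ on the left against the corresponding right-hand side with bound $l$.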
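There is an analogue of Theorem 1.2 in which $2l$ is replaced by $2l + 1$.

THEOREM 1.3
For any $n, l \ge 0$,
\begin{align}
f^{\square}_{n(2l+1)} &= E_{U\in U(2l+1)} |\mathrm{Tr}(U)|^{2n}, \tag{1.26} \\
f^{\boxslash}_{n(2l+1)} &= E_{U\in O(2l+1)} |\mathrm{Tr}(U)|^{2n}, \tag{1.27} \\
f^{\boxdot}_{n(2l+1)} &= E_{U\in U(l)\oplus U(l+1)} |\mathrm{Tr}(U)|^{2n}, \tag{1.28} \\
f^{\boxbslash}_{n(2l+1)} &= f^{\boxbslash}_{n(2l)}, \tag{1.29}
\end{align}
while
\[ f^{\boxtimes}_{n(2l+1)} = f^{\boxtimes}_{n(2l)}. \tag{1.30} \]

Also, we have the following corollary for $\boxdot$ and $\boxtimes$.

COROLLARY 1.4
For any $n, l \ge 0$,
\begin{align}
P^{\boxdot}_{2l}(t) &= P^{\square}_l(t)^2, \tag{1.31} \\
P^{\boxdot}_{2l+1}(t) &= P^{\square}_l(t) P^{\square}_{l+1}(t), \tag{1.32} \\
P^{\boxtimes}_{2l}(t) &= P^{\square}_l(t). \tag{1.33}
\end{align}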
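And for $\boxslash$, $\boxbslash$, and $\boxtimes$, we have the following corollary.

COROLLARY 1.5
For any $n, l \ge 0$,
\begin{align}
P^{\boxslash}_l(t) &= e^{-t^2/2} E_{U\in O(l)} \exp(t\, \mathrm{Tr}(U)), \tag{1.34} \\
P^{\boxbslash}_{2l}(t) &= e^{-t^2/2} E_{U\in Sp(2l)} \exp(t\, \mathrm{Tr}(U)), \tag{1.35} \\
P^{\boxtimes}_{2l}(t) &= e^{-t^2} E_{U\in U^{\boxtimes}(2l)} \exp(t\, \mathrm{Tr}(U)). \tag{1.36}
\end{align}

Proof
For $\boxslash$, we have
\[ e^{t^2/2} P^{\boxslash}_l(t) = \sum_{0\le n} \frac{t^{2n}}{(2n)!} E_{U\in O(l)} \mathrm{Tr}(U)^{2n}. \tag{1.37} \]
But $E_{U\in O(l)} \mathrm{Tr}(U)^n = 0$ for $n$ odd, so this is
\[ E_{U\in O(l)} \exp(t\, \mathrm{Tr}(U)), \tag{1.38} \]
as required. The calculations for $\boxbslash$ and $\boxtimes$ are analogous.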
Remark. In particular, we see that formula (1.36), which was derived in [30] as an expression for $\square$, is really most naturally interpreted in $\boxtimes$ terms.

As an aside, we observe that if we remove the condition that the symmetries under consideration preserve increasing sets but insist that the corresponding sets should still give permutations, there is one further type of symmetry allowed, namely, rotation by 90 degrees. In terms of permutations, this is the set
\[ S_n^{\circ} = \{\pi \in S_{4n} : \pi^2 = \iota\}, \qquad |S_n^{\circ}| = (2n)!/n!. \tag{1.39} \]
Such permutations correspond to pairs of tableaux $(P, Q^t)$ with $n$ elements such that $P$ and $Q$ have the same shape and disjoint content. It follows that the length $l$ of the longest increasing subsequence from this set has the same distribution as $\max(2l^{+}(\pi), 2l^{-}(\pi) - 1)$, where $\pi$ is randomly chosen from $S_n$, $l^{+}(\pi)$ is the increasing subsequence length of $\pi$, and $l^{-}(\pi)$ is the decreasing subsequence length of $\pi$. In particular, the bound of P. Erdős and G. Szekeres implies that $f^{\circ}_{nl} = 0$ for $n > l^2$, and thus no integral formula à la Theorem 1.2 can exist for this case. There is a determinant formula, however, which can be obtained from the following symmetric function identity:
! X (h j−i (x))0≤i
One can either derive this via the approach in [16] or simply use the Jacobi-Trudi identity together with matrix manipulation as in [13]. This then gives the following formulae: j−i t 2n X ( j−1)! t ◦ 0≤i
(l+i− j)!
0≤i
It is not clear how to obtain asymptotic information from the formulae, so we do not discuss this case further. (The techniques of [6] may be applicable, however.)
2. Determinantal formulae
It turns out that each of the formulae of Corollary 1.5 can be expressed in terms of a Toeplitz or Hankel determinant, related to orthogonal polynomials on the unit circle. Indeed, this correspondence is more general. For the unitary group, we have the following theorem.

THEOREM 2.1
Let $f(z)$, $g(z)$ be any functions on the unit circle. Then
\[ E_{U\in U(l)} \det(f(U)) \det(g(U^{\dagger})) = \det\left( \frac{1}{2\pi} \int_{[0,2\pi]} f(e^{i\theta}) g(e^{-i\theta}) e^{i(j-k)\theta}\, d\theta \right)_{0\le j,k<l}. \tag{2.1} \]
0≤ j
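The result follows from the standard theory of Toeplitz determinants or by the classic formula for the integral of a product of two (generalized) Vandermonde determinants (see, for instance, [9]):
\[ \frac{1}{l!} \int_{S^l} \det(\phi_j(x_k))_{0\le j,k<l} \det(\psi_j(x_k))_{0\le j,k<l} \prod_j \mu(dx_j) = \det\left( \int_S \phi_j(x) \psi_k(x)\, \mu(dx) \right)_{0\le j,k<l} \tag{2.3} \]
for any measure $\mu$ on any set $S$.

Similarly, for the orthogonal and symplectic groups, we have the following theorem.

THEOREM 2.2
Let $g(z)$ be any function on the unit circle such that the integrals
\[ \iota_j = \frac{1}{2\pi} \int_{[0,2\pi]} g(e^{i\theta}) g(e^{-i\theta}) e^{ij\theta}\, d\theta \tag{2.4} \]
are well defined. Then
\begin{align}
E_{U\in O^{+}(2l)} \det(g(U)) &= \tfrac{1}{2} \det(\iota_{j-k} + \iota_{j+k})_{0\le j,k<l}, \tag{2.5} \\
E_{U\in O^{-}(2l)} \det(g(U)) &= g(1) g(-1) \det(\iota_{j-k} - \iota_{j+k+2})_{0\le j,k<l-1}, \tag{2.6} \\
E_{U\in O^{+}(2l+1)} \det(g(U)) &= g(1) \det(\iota_{j-k} - \iota_{j+k+1})_{0\le j,k<l}, \tag{2.7} \\
E_{U\in O^{-}(2l+1)} \det(g(U)) &= g(-1) \det(\iota_{j-k} + \iota_{j+k+1})_{0\le j,k<l}, \tag{2.8} \\
E_{U\in Sp(2l)} \det(g(U)) &= \det(\iota_{j-k} - \iota_{j+k+2})_{0\le j,k<l}, \tag{2.9}
\end{align}
except that $E_{U\in O^{+}(0)} \det(g(U)) = 1$.

Proof
As observed in [23, Prop. 3.1], integrals over the orthogonal and symplectic groups can be expressed as Hankel determinants; thus, for instance,
\[ E_{U\in O^{+}(2l)} \det(g(U)) \propto \det\left( \frac{1}{\pi} \int_{[0,2\pi]} g(e^{i\theta}) g(e^{-i\theta}) p_j(\cos\theta) p_k(\cos\theta)\, d\theta \right)_{0\le j,k<l} \tag{2.10} \]
for any polynomials $p_j$ with $\deg(p_j) = j$. In particular, this must be true when $p_j(\cos\theta) = \cos(j\theta)$ (Chebyshev polynomials). In that case, noting that $\iota_k = \iota_{-k}$, the $jk$ coefficient of the determinant is
\[ \frac{1}{\pi} \int_{[0,2\pi]} g(e^{i\theta}) g(e^{-i\theta}) \cos(j\theta) \cos(k\theta)\, d\theta = \iota_{j-k} + \iota_{j+k}. \tag{2.11} \]
The constant of proportionality can be determined by comparing the two sides when $g = 1$, and thus $\iota_j = \delta_{j0}$. For the other cases, we take $p_j(\cos\theta)$ to be
\[ \frac{\sin((j+1)\theta)}{\sin\theta}, \qquad \frac{\sin((j+1/2)\theta)}{\sin(\theta/2)}, \qquad \frac{\cos((j+1/2)\theta)}{\cos(\theta/2)}, \tag{2.12} \]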
as appropriate.

For our purposes, we need the following related result.

THEOREM 2.3
Let $g(z)$ be as above, and consider the (symmetric) inner product on polynomials
\[ \langle p(z), q(z) \rangle = \frac{1}{2\pi} \int_{[0,2\pi]} p(e^{i\theta}) q(e^{-i\theta}) g(e^{i\theta}) g(e^{-i\theta})\, d\theta. \tag{2.13} \]
Let $\pi_j(z)$ be the monic orthogonal polynomials on the unit circle relative to that inner product, and define $N_j = \langle \pi_j(z), \pi_j(z) \rangle$. If the polynomials $\pi_j(z)$ are well defined, then
\begin{align}
E_{U\in U(l)} |\det(g(U))|^2 &= \prod_{0\le j<l} N_j, \tag{2.14} \\
E_{U\in O^{+}(0)} \det(g(U)) &= 1, \tag{2.15} \\
E_{U\in O^{+}(2l)} \det(g(U)) &= N_0 \prod_{0\le j<l-1} N_{2j+2} (1 + \pi_{2j+2}(0))^{-1}, \tag{2.16} \\
E_{U\in O^{-}(2l)} \det(g(U)) &= g(1) g(-1) \prod_{0\le j<l-1} N_{2j+2} (1 - \pi_{2j+2}(0))^{-1}, \tag{2.17} \\
E_{U\in O^{+}(2l+1)} \det(g(U)) &= g(1) \prod_{0\le j<l} N_{2j+1} (1 - \pi_{2j+1}(0))^{-1}, \tag{2.18} \\
E_{U\in O^{-}(2l+1)} \det(g(U)) &= g(-1) \prod_{0\le j<l} N_{2j+1} (1 + \pi_{2j+1}(0))^{-1}, \tag{2.19} \\
E_{U\in Sp(2l)} \det(g(U)) &= \prod_{0\le j<l} N_{2j+2} (1 - \pi_{2j+2}(0))^{-1}. \tag{2.20}
\end{align}
The proof is given in Section 3. By combining these formulae, we obtain the following corollary.

COROLLARY 2.4
Let $g(z)$ be as above. Then, for any $l > 0$,
\[ g(1) g(-1)\, E_{U\in U(l)} |\det(g(U))|^2 = \bigl(E_{U\in O^{+}(l+1)} \det(g(U))\bigr) \bigl(E_{U\in O^{-}(l+1)} \det(g(U))\bigr). \tag{2.21} \]
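Proof
Use $N_l = (1 - \pi_l(0)^2) N_{l-1}$, which follows from (3.11).

In our case, the function $g(z) = e^{tz}$, and thus everything is related to the orthogonal polynomials on the circle for the weight function $\exp(t(z + 1/z)) = \exp(2t\cos\theta)$. Let these polynomials be $\pi_l(z; t)$ with norms $N_l(t)$, and define five functions $D_l(t)$, $D_l^{\pm\pm}(t)$ by
\begin{align}
D_l(t) &= \det(I_{j-k}(2t))_{0\le j,k<l} = \prod_{0\le j<l} N_j(t), \tag{2.22} \\
D_0^{--}(t) &= 1, \tag{2.23} \\
D_l^{--}(t) &= \tfrac{1}{2} \det(I_{j-k}(2t) + I_{j+k}(2t))_{0\le j,k<l} = N_0(t) \prod_{0\le j<l-1} N_{2j+2}(t) (1 + \pi_{2j+2}(0; t))^{-1}, \tag{2.24} \\
D_l^{++}(t) &= \det(I_{j-k}(2t) - I_{j+k+2}(2t))_{0\le j,k<l} = \prod_{0\le j<l} N_{2j+2}(t) (1 - \pi_{2j+2}(0; t))^{-1}, \tag{2.25} \\
D_l^{+-}(t) &= \det(I_{j-k}(2t) - I_{j+k+1}(2t))_{0\le j,k<l} = \prod_{0\le j<l} N_{2j+1}(t) (1 - \pi_{2j+1}(0; t))^{-1}, \tag{2.26} \\
D_l^{-+}(t) &= \det(I_{j-k}(2t) + I_{j+k+1}(2t))_{0\le j,k<l} = \prod_{0\le j<l} N_{2j+1}(t) (1 + \pi_{2j+1}(0; t))^{-1}, \tag{2.27}
\end{align}
where
\[ I_j(2t) = \frac{1}{2\pi} \int_{[0,2\pi]} e^{2t\cos\theta} e^{ij\theta}\, d\theta = \sum_m \frac{t^m\, t^{j+m}}{m!\,(j+m)!} \tag{2.28} \]
is a modified Bessel function of the first kind. Then
\begin{align}
E_{U\in U(l)} |\exp(t\, \mathrm{Tr}(U))|^2 &= D_l(t), \tag{2.29} \\
E_{U\in O^{+}(2l)} \exp(t\, \mathrm{Tr}(U)) &= D_l^{--}(t), \tag{2.30} \\
E_{U\in O^{-}(2l)} \exp(t\, \mathrm{Tr}(U)) &= D_{l-1}^{++}(t), \tag{2.31} \\
E_{U\in O^{+}(2l+1)} \exp(t\, \mathrm{Tr}(U)) &= e^{t} D_l^{+-}(t), \tag{2.32} \\
E_{U\in O^{-}(2l+1)} \exp(t\, \mathrm{Tr}(U)) &= e^{-t} D_l^{-+}(t). \tag{2.33}
\end{align}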
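In summary we have the following theorem.

THEOREM 2.5
For $l \ge 0$, we have the following formulae:
\begin{align}
P^{\square}_l(t) &= e^{-t^2} D_l(t), \tag{2.34} \\
P^{\boxslash}_{2l}(t) &= e^{-t^2/2} \bigl( D_l^{--}(t) + D_{l-1}^{++}(t) \bigr)/2, \tag{2.35} \\
P^{\boxslash}_{2l+1}(t) &= e^{-t^2/2} \bigl( e^{t} D_l^{+-}(t) + e^{-t} D_l^{-+}(t) \bigr)/2, \tag{2.36} \\
P^{\boxbslash}_{2l}(t) &= e^{-t^2/2} D_l^{++}(t), \tag{2.37} \\
P^{\boxdot}_{2l}(t) &= e^{-2t^2} D_l(t)^2, \tag{2.38} \\
P^{\boxdot}_{2l+1}(t) &= e^{-2t^2} D_l(t) D_{l+1}(t), \tag{2.39} \\
P^{\boxtimes}_{2l}(t) &= e^{-t^2} D_l(t), \tag{2.40}
\end{align}
except that $P^{\boxslash}_0(t) = e^{-t^2/2} D_0^{--}(t) = e^{-t^2/2}$.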
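Formula (2.34) lends itself to a direct numerical check: the Toeplitz determinant of Bessel values should reproduce the Poisson generating function (1.13) of permutations without long increasing subsequences. The sketch below is an illustration only; the function names are hypothetical, the Bessel function is summed from the series in (2.28), the determinant uses plain Gaussian elimination, and the generating function is truncated at `n_max` using brute-force counts.

```python
import bisect
from itertools import permutations
from math import exp, factorial


def bessel_i(j, t, terms=40):
    """I_j(2t) from the power series in (2.28); uses I_{-j} = I_j."""
    j = abs(j)
    return sum(t ** (j + 2 * m) / (factorial(m) * factorial(j + m)) for m in range(terms))


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
    return result


def toeplitz_d(l, t):
    """D_l(t) = det(I_{j-k}(2t)) for 0 <= j, k < l, with D_0 = 1."""
    if l == 0:
        return 1.0
    return determinant([[bessel_i(j - k, t) for k in range(l)] for j in range(l)])


def lis_length(sequence):
    tails = []
    for value in sequence:
        position = bisect.bisect_left(tails, value)
        if position == len(tails):
            tails.append(value)
        else:
            tails[position] = value
    return len(tails)


def poisson_series(l, t, n_max=7):
    """Truncation of (1.13) for permutations: exp(-t^2) * sum_n f_nl t^(2n) / (n!)^2."""
    total = 0.0
    for n in range(n_max + 1):
        f_nl = sum(1 for p in permutations(range(n)) if lis_length(p) <= l)
        total += f_nl * t ** (2 * n) / factorial(n) ** 2
    return exp(-t * t) * total


if __name__ == "__main__":
    t = 0.5
    for l in (1, 2, 3):
        print(l, exp(-t * t) * toeplitz_d(l, t), poisson_series(l, t))
```

For small $t$ the neglected tail of the series is tiny, so the two columns should agree to many digits; for larger $t$ one would need to raise `n_max`, which quickly becomes expensive with brute-force counting.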
In [3], we also need the following limits, which follow immediately from the Szegő limit theorem and the analogue (see [23]) for orthogonal polynomials on a finite interval.

THEOREM 2.6
For any real $t \ge 0$,
\begin{align}
\lim_{l\to\infty} D_l(t) &= e^{t^2}, \tag{2.41} \\
\lim_{l\to\infty} D_l^{--}(t) &= e^{t^2/2}, \tag{2.42} \\
\lim_{l\to\infty} D_l^{++}(t) &= e^{t^2/2}, \tag{2.43} \\
\lim_{l\to\infty} D_l^{+-}(t) &= e^{-t + t^2/2}, \tag{2.44} \\
\lim_{l\to\infty} D_l^{-+}(t) &= e^{t + t^2/2}. \tag{2.45}
\end{align}
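Remark. These limits are also valid as limits of formal power series in $t$; see [10] and Theorem 8.10.

COROLLARY 2.7
For any real $t \ge 0$,
\begin{align}
e^{-t^2} D_l(t) &= \prod_{j\ge l} N_j(t)^{-1}, \tag{2.46} \\
e^{-t^2/2} D_l^{--}(t) &= \prod_{j\ge l-1} N_{2j+2}(t)^{-1} (1 + \pi_{2j+2}(0; t)), \tag{2.47} \\
e^{-t^2/2} D_l^{++}(t) &= \prod_{j\ge l} N_{2j+2}(t)^{-1} (1 - \pi_{2j+2}(0; t)), \tag{2.48} \\
e^{-t^2/2 + t} D_l^{+-}(t) &= \prod_{j\ge l} N_{2j+1}(t)^{-1} (1 - \pi_{2j+1}(0; t)), \tag{2.49} \\
e^{-t^2/2 - t} D_l^{-+}(t) &= \prod_{j\ge l} N_{2j+1}(t)^{-1} (1 + \pi_{2j+1}(0; t)). \tag{2.50}
\end{align}
In (2.47) the product starts at $j = l - 1$ (for $l \ge 1$), in agreement with the range $0 \le j < l - 1$ of the finite product in (2.24).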
3. Orthogonal polynomial identities
Let $w(x)$ on $[-1, 1]$ and $f(z)$ on the unit circle be related by
\[ f(e^{i\theta}) = w(\cos\theta). \tag{3.1} \]
Associated to $f$ are five sets of monic orthogonal polynomials: $\pi_l(z)$ with respect to the (symmetric) inner product
\[ \langle p(z), q(z) \rangle = \frac{1}{2\pi} \int_{[0,2\pi]} p(e^{i\theta}) q(e^{-i\theta}) f(e^{i\theta})\, d\theta \tag{3.2} \]
on the unit circle, and the four sets $p_l^{\pm\pm}(z)$ with respect to the inner products
\[ \langle p(x), q(x) \rangle = \frac{1}{\pi} \int_{[-1,1]} p(x) q(x) w(x) (1 - x)^{\pm 1/2} (1 + x)^{\pm 1/2}\, dx. \tag{3.3} \]
The notation $\langle p, q \rangle$ refers to the inner product with respect to $f(z)$ or $w(x)(1 - x^2)^{-1/2}$, whichever is appropriate. Thus the defining identities for these polynomials are
\begin{align}
\langle \pi_n, \pi_m \rangle &= \delta_{nm} N_n, \tag{3.4} \\
\langle p_n^{--}, p_m^{--} \rangle &= \delta_{nm} N_n^{--}, \tag{3.5} \\
\langle (1 + x) p_n^{-+}, p_m^{-+} \rangle &= \delta_{nm} N_n^{-+}, \tag{3.6} \\
\langle (1 - x) p_n^{+-}, p_m^{+-} \rangle &= \delta_{nm} N_n^{+-}, \tag{3.7} \\
\langle (1 - x^2) p_n^{++}, p_m^{++} \rangle &= \delta_{nm} N_n^{++}. \tag{3.8}
\end{align}
We also use the notation
\[ \pi_l^{*}(z) = z^l \pi_l(1/z). \tag{3.9} \]
Then the following identities hold (see, e.g., [39, Th. 11.5]). For $\pi$,
\begin{align}
\pi_{l+1}(z) &= z \pi_l(z) + \pi_{l+1}(0) \pi_l^{*}(z), \tag{3.10} \\
N_l &= N_0 \prod_{1\le j\le l} (1 - \pi_j(0)^2), \tag{3.11} \\
\pi_l(1) &= \prod_{1\le j\le l} (1 + \pi_j(0)), \tag{3.12} \\
(-1)^l \pi_l(-1) &= \prod_{1\le j\le l} (1 + (-1)^j \pi_j(0)). \tag{3.13}
\end{align}
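For $p^{--}$ (with the convention $\pi_0(0) = 1$),
\begin{align}
p_0^{--}((z + 1/z)/2) &= 1, \tag{3.14} \\
(2z)^l p_l^{--}((z + 1/z)/2) &= \pi_{2l-1}^{*}(z) + z \pi_{2l-1}(z), \tag{3.15} \\
p_l^{--}(1) &= 2^{-l} \prod_{0\le j<2l} (1 + \pi_j(0)), \tag{3.16} \\
p_l^{--}(-1) &= (-2)^{-l} \prod_{0\le j<2l} (1 + (-1)^j \pi_j(0)), \tag{3.17} \\
N_l^{--} &= 4^{1/2-l} (1 + \pi_{2l}(0))^{-1} N_0 \prod_{1\le j\le 2l} (1 - \pi_j(0)^2) \notag \\
&= 4^{1/2-l} N_{2l} (1 + \pi_{2l}(0))^{-1}. \tag{3.18}
\end{align}
For $p^{+-}$,
\begin{align}
(1 - z)(2z)^l p_l^{+-}((z + 1/z)/2) &= \pi_{2l}^{*}(z) - z \pi_{2l}(z), \tag{3.19} \\
p_l^{+-}(-1) &= (-2)^{-l} \prod_{1\le j\le 2l} (1 + (-1)^j \pi_j(0)), \tag{3.20} \\
N_l^{+-} &= 4^{-l} (1 + \pi_{2l+1}(0)) N_0 \prod_{1\le j\le 2l} (1 - \pi_j(0)^2) \notag \\
&= 4^{-l} N_{2l} (1 + \pi_{2l+1}(0)) \notag \\
&= 4^{-l} N_{2l+1} (1 - \pi_{2l+1}(0))^{-1}. \tag{3.21}
\end{align}
For $p^{-+}$,
\begin{align}
(1 + z)(2z)^l p_l^{-+}((z + 1/z)/2) &= \pi_{2l}^{*}(z) + z \pi_{2l}(z), \tag{3.22} \\
p_l^{-+}(1) &= 2^{-l} \prod_{1\le j\le 2l} (1 + \pi_j(0)), \tag{3.23} \\
N_l^{-+} &= 4^{-l} (1 - \pi_{2l+1}(0)) N_0 \prod_{1\le j\le 2l} (1 - \pi_j(0)^2) \notag \\
&= 4^{-l} N_{2l} (1 - \pi_{2l+1}(0)) \notag \\
&= 4^{-l} N_{2l+1} (1 + \pi_{2l+1}(0))^{-1}. \tag{3.24}
\end{align}
And finally, for $p^{++}$,
\begin{align}
(1 - z^2)(2z)^l p_l^{++}((z + 1/z)/2) &= \pi_{2l+1}^{*}(z) - z \pi_{2l+1}(z), \tag{3.25} \\
N_l^{++} &= 4^{-l-1/2} (1 + \pi_{2l+2}(0)) N_0 \prod_{1\le j\le 2l+1} (1 - \pi_j(0)^2) \notag \\
&= 4^{-l-1/2} N_{2l+1} (1 + \pi_{2l+2}(0)) \notag \\
&= 4^{-l-1/2} N_{2l+2} (1 - \pi_{2l+2}(0))^{-1}. \tag{3.26}
\end{align}
In each chain the first expression carries the factor $N_0$ so that it agrees with the second by (3.11).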
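The recurrence (3.10) and the norm formula (3.11) can be illustrated numerically for the weight $\exp(2t\cos\theta)$ of Section 2, whose Fourier coefficients are the Bessel values $I_j(2t)$. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, polynomials are stored as coefficient lists indexed by the power of $z$, and the value $\pi_{l+1}(0)$ is obtained by imposing orthogonality of $\pi_{l+1}$ to the constants, using $\langle \pi_l^{*}, 1 \rangle = N_l$.

```python
from math import factorial


def bessel_moment(j, t, terms=40):
    """c_j = I_|j|(2t), the j-th Fourier coefficient of exp(2 t cos(theta)), cf. (2.28)."""
    j = abs(j)
    return sum(t ** (j + 2 * m) / (factorial(m) * factorial(j + m)) for m in range(terms))


def inner(p, q, c):
    """<p, q> = sum_{j,k} p_j q_k c_|j-k| for coefficient lists p and q, cf. (3.2)."""
    return sum(pj * qk * c[abs(j - k)] for j, pj in enumerate(p) for k, qk in enumerate(q))


def szego_recursion(c, degree):
    """Return the monic polynomials pi_0..pi_degree and their norms N_0..N_degree."""
    polys = [[1.0]]
    norms = [c[0]]
    for _ in range(degree):
        p = polys[-1]
        # <z pi_l, 1> + pi_{l+1}(0) <pi_l^*, 1> = 0 with <pi_l^*, 1> = N_l
        alpha = -sum(a * c[k + 1] for k, a in enumerate(p)) / norms[-1]
        shifted = [0.0] + p              # coefficients of z * pi_l(z)
        reversed_poly = p[::-1] + [0.0]  # coefficients of pi_l^*(z) = z^l pi_l(1/z)
        polys.append([s + alpha * r for s, r in zip(shifted, reversed_poly)])  # (3.10)
        norms.append(norms[-1] * (1.0 - alpha ** 2))                           # (3.11)
    return polys, norms


if __name__ == "__main__":
    moments = [bessel_moment(j, 0.7) for j in range(8)]
    polys, norms = szego_recursion(moments, 6)
    for l, (p, n) in enumerate(zip(polys, norms)):
        print(l, p[0], n, inner(p, p, moments))
```

The printed norms from the recursion should match the directly computed inner products, and the constant terms `p[0]` are the values $\pi_l(0)$ that enter the product formulae above.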
All of the proofs are straightforward.

We can now prove Theorem 2.3. Recall that we have chosen $f$ to be of the form $f(z) = g(z) g(1/z)$ such that the above inner product is well defined and nondegenerate. Now consider the integral for $O^{+}(2l)$. By the proof of Theorem 2.2 and the theory of Hankel determinants, we have
\[ E_{U\in O^{+}(2l)} \det(g(U)) \propto \prod_{0\le j<l} N_j^{--}; \tag{3.27} \]
but then this is in turn proportional to
\[ N_0 \prod_{0\le j<l-1} N_{2j+2} (1 + \pi_{2j+2}(0))^{-1}. \tag{3.28} \]
The constant can be determined by taking $g = 1$. The other cases are analogous.

THEOREM 3.1
Let $f$ and $g$ be as above. Then, for any $\alpha$,
\begin{align}
E_{U\in O^{\pm}(2l)} \det((1 - \alpha U) g(U)) &= (\pi_{2l-1}^{*}(\alpha) \pm \alpha \pi_{2l-1}(\alpha))\, E_{U\in O^{\pm}(2l)} \det(g(U)), \tag{3.29} \\
E_{U\in O^{\pm}(2l+1)} \det((1 - \alpha U) g(U)) &= (\pi_{2l}^{*}(\alpha) \mp \alpha \pi_{2l}(\alpha))\, E_{U\in O^{\pm}(2l+1)} \det(g(U)). \tag{3.30}
\end{align}
As a special case,
\begin{align}
E_{U\in O^{+}(l)} \det((1 + U) g(U)) &= 2 g(-1)^{-1} E_{U\in O^{-}(l+1)} \det(g(U)), \tag{3.31} \\
E_{U\in O^{-}(l)} \det((1 + U) g(U)) &= 0, \tag{3.32} \\
E_{U\in O^{+}(2l)} \det((1 - U) g(U)) &= 2 g(1)^{-1} E_{U\in O^{+}(2l+1)} \det(g(U)), \tag{3.33} \\
E_{U\in O^{-}(2l)} \det((1 - U) g(U)) &= 0, \tag{3.34} \\
E_{U\in O^{+}(2l+1)} \det((1 - U) g(U)) &= 0, \tag{3.35} \\
E_{U\in O^{-}(2l+1)} \det((1 - U) g(U)) &= 2 g(1)^{-1} E_{U\in O^{-}(2l+2)} \det(g(U)). \tag{3.36}
\end{align}
Proof
The proof follows by standard results about the behavior of orthogonal polynomials when the weight function is multiplied by a polynomial.

Similarly, we have the following theorem.

THEOREM 3.2
With notation as above and for $\alpha\beta \ne 1$,
\[ E_{U\in U(l)} \det((1 - \alpha U)(1 - \beta U^{\dagger}) g(U) g(U^{\dagger})) = \frac{\pi_l^{*}(\alpha) \pi_l^{*}(\beta) - \alpha\beta\, \pi_l(\alpha) \pi_l(\beta)}{1 - \alpha\beta}\, E_{U\in U(l)} \det(g(U) g(U^{\dagger})). \tag{3.37} \]
In particular,
\begin{align}
2 g(1) g(-1) (1 - \alpha\beta)\, & E_{U\in U(l)} \det((1 - \alpha U)(1 - \beta U^{\dagger}) g(U) g(U^{\dagger})) \notag \\
&= E_{U\in O^{+}(l+1)} \det((1 - \alpha U) g(U))\, E_{U\in O^{-}(l+1)} \det((1 - \beta U) g(U)) \notag \\
&\quad + E_{U\in O^{+}(l+1)} \det((1 - \beta U) g(U))\, E_{U\in O^{-}(l+1)} \det((1 - \alpha U) g(U)). \tag{3.38}
\end{align}
For our purposes, the following form is preferable.

COROLLARY 3.3
With notation as above and for $\alpha\beta \ne 1$, $|\beta| < 1$,
\begin{align}
\frac{2 g(1) g(-1) (1 - \alpha\beta)}{1 - \beta^2}\, & E_{U\in U(l)} \det((1 - \alpha U)(1 - \beta U)^{-1} g(U) g(U^{\dagger})) \notag \\
&= E_{U\in O^{+}(l+1)} \det((1 - \alpha U)(1 - \beta U)^{-1} g(U)) \cdot E_{U\in O^{-}(l+1)} \det(g(U)) \notag \\
&\quad + E_{U\in O^{+}(l+1)} \det(g(U)) \cdot E_{U\in O^{-}(l+1)} \det((1 - \alpha U)(1 - \beta U)^{-1} g(U)). \tag{3.39}
\end{align}
Remark. This, together with the following theorem, enables us to compute the limits $\alpha \to \pm 1$ or $\beta \to \pm 1$.

THEOREM 3.4
With notation as above and for $|\beta| < 1$,
\begin{align}
\frac{E_{U\in O^{+}(2l+2)} \det((1 - \beta U)^{-1} g(U))}{E_{U\in O^{+}(2l+2)} \det(g(U))} &= \frac{(1/2) \beta^{-l} \langle 2^l p_l^{--}(x), (1 - \beta^2)/(1 - 2\beta x + \beta^2) \rangle}{(1 - \beta^2) N_{2l} (1 + \pi_{2l}(0))^{-1}}, \tag{3.40} \\
\frac{E_{U\in O^{-}(2l+2)} \det((1 - \beta U)^{-1} g(U))}{E_{U\in O^{-}(2l+2)} \det(g(U))} &= \frac{\beta^{1-l} \langle 2^l (1 - x^2) p_{l-1}^{++}(x), 1/(1 - 2\beta x + \beta^2) \rangle}{(1 - \beta^2) N_{2l} (1 - \pi_{2l}(0))^{-1}}, \tag{3.41} \\
\frac{E_{U\in O^{+}(2l+3)} \det((1 - \beta U)^{-1} g(U))}{E_{U\in O^{+}(2l+3)} \det(g(U))} &= \frac{\beta^{-l} \langle 2^l (1 - x) p_l^{+-}(x), (1 + \beta)/(1 - 2\beta x + \beta^2) \rangle}{(1 - \beta^2) N_{2l} (1 + \pi_{2l+1}(0))}, \tag{3.42} \\
\frac{E_{U\in O^{-}(2l+3)} \det((1 - \beta U)^{-1} g(U))}{E_{U\in O^{-}(2l+3)} \det(g(U))} &= \frac{\beta^{-l} \langle 2^l (1 + x) p_l^{-+}(x), (1 - \beta)/(1 - 2\beta x + \beta^2) \rangle}{(1 - \beta^2) N_{2l} (1 - \pi_{2l+1}(0))}, \tag{3.43} \\
\frac{E_{U\in Sp(2l)} \det((1 - \beta U)^{-1} g(U))}{E_{U\in Sp(2l)} \det(g(U))} &= \frac{\beta^{1-l} \langle 2^l (1 - x^2) p_{l-1}^{++}(x), 1/(1 - 2\beta x + \beta^2) \rangle}{N_{2l} (1 - \pi_{2l}(0))^{-1}}. \tag{3.44}
\end{align}
In the limits as $\beta \to \pm 1$,
\begin{align}
\lim_{\substack{\beta\to 1 \\ |\beta|<1}} (1 - \beta^2)\, E_{U\in O^{\pm}(l+1)} \det((1 - \beta U)^{-1} g(U)) &= g(1)\, E_{U\in O^{\pm}(l)} \det(g(U)), \tag{3.45} \\
\lim_{\substack{\beta\to -1 \\ |\beta|<1}} (1 - \beta^2)\, E_{U\in O^{\pm}(l+1)} \det((1 - \beta U)^{-1} g(U)) &= g(-1)\, E_{U\in O^{\mp}(l)} \det(g(U)). \tag{3.46}
\end{align}
For the symplectic group,
\begin{align}
\lim_{\substack{\beta\to 1 \\ |\beta|<1}} E_{U\in Sp(2l)} \det((1 - \beta U)^{-1} g(U)) &= g(-1)^{-1} E_{U\in O^{-}(2l+1)} \det(g(U)), \tag{3.47} \\
\lim_{\substack{\beta\to -1 \\ |\beta|<1}} E_{U\in Sp(2l)} \det((1 - \beta U)^{-1} g(U)) &= g(1)^{-1} E_{U\in O^{+}(2l+1)} \det(g(U)). \tag{3.48}
\end{align}

Proof
The first group of statements is straightforward via appropriate row and column operations on the Hankel matrix. For the limits, consider
\[ \lim_{\substack{\beta\to 1 \\ |\beta|<1}} (1 - \beta^2)\, E_{U\in O^{+}(2l+2)} \det(1 - \beta U)^{-1} \det(g(U)). \tag{3.49} \]
Here we need to compute the limit of
\[ \beta^{-l} \langle 2^l p_l^{--}(x), (1 - \beta^2)/(1 - 2\beta x + \beta^2) \rangle. \tag{3.50} \]
Now,
\[ (1 - \beta^2)/(1 - \beta(z + 1/z) + \beta^2) = 1 + \sum_{m>0} \beta^m (z^m + z^{-m}), \tag{3.51} \]
so, for any polynomial $q(z)$,
\[ \lim_{\substack{\beta\to 1 \\ |\beta|<1}} \langle q(z), (1 - \beta^2)/(1 - 2\beta x + \beta^2) \rangle = q(1) g(1)^2. \tag{3.52} \]
Plugging in and simplifying gives the desired result. The other cases either are analogous to or follow from Theorem 3.1.
ALGEBRAIC ASPECTS OF INCREASING SUBSEQUENCES
21
4. Diagonal points

Recall that in defining the ensembles $S^{\circledast}$ above we insisted for and that π have no fixed points and for and that πι have no fixed points. While these conditions are very natural from the standpoint of points in the square, they seem somewhat artificial in the permutation setting. This suggests consideration of the following extended ensembles:
\[ \tilde S_n = \{\pi \in S_n : \pi = \pi^{-1}\}, \tag{4.1} \]
\[ \tilde S_n = \{\pi \in S_n : \pi = \iota\pi^{-1}\iota\}, \tag{4.2} \]
\[ \tilde S_n = \{\pi \in S_{2n} : \pi = \pi^{-1},\ \pi = \iota\pi^{-1}\iota\}. \tag{4.3} \]
We immediately encounter a difficulty, however, with the Poisson generating function since the probabilities involve division by the complicated sum
\[ \sum_{0\le m\le \lfloor n/2\rfloor} \binom{n}{2m} \frac{(2m)!}{2^m\, m!} \tag{4.4} \]
for and , and a similar sum for . However, there is a generalized Poisson model for which the generating function is again tractable. For , the model is as follows. In a given infinitesimal time interval [t, t + dt], we add two points (the images of a uniform random point) with probability (1/2)t dt, and we add one point (a point uniformly chosen on the diagonal) with probability α dt. Then the probability that we have n points at time t is
\[ e^{-\alpha t - t^2/2} \sum_{0\le m\le \lfloor n/2\rfloor} \binom{n}{2m} \frac{(2m)!}{2^m\, m!}\, \alpha^{n-2m}\, t^n/n!. \tag{4.5} \]
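The sum (4.4) counts involutions of n elements, grouped by the number m of 2-cycles. As a hedged illustration (the helper names below are hypothetical, and the enumeration is brute force, so it is only practical for small n), the formula can be compared against a direct count:

```python
from itertools import permutations
from math import comb, factorial

def involutions_by_formula(n):
    """Evaluate the sum (4.4): choose 2m moved points, then pair them up."""
    return sum(
        comb(n, 2 * m) * factorial(2 * m) // (2**m * factorial(m))
        for m in range(n // 2 + 1)
    )

def involutions_by_enumeration(n):
    """Count permutations p of {0,...,n-1} with p(p(i)) = i for all i."""
    return sum(
        1
        for p in permutations(range(n))
        if all(p[p[i]] == i for i in range(n))
    )

for n in range(7):
    print(n, involutions_by_formula(n), involutions_by_enumeration(n))
```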
Thus if we define $P_l(t;\alpha)$ to be the probability that the increasing subsequence length at time t is at most l, then
\[ P_l(t;\alpha) = e^{-\alpha t - t^2/2} \sum_{0\le n} \frac{t^n}{n!} \sum_{0\le m} \alpha^m \tilde f_{nml}, \tag{4.6} \]
where $\tilde f_{nml}$ is the number of involutions on n elements with m fixed points and no increasing subsequence of length greater than l. This is, of course, compatible with our previous notation, with $P_l(t;0) = P_l(t)$. Similarly, for , if we add negated points with probability β dt, we obtain the Poisson generating function
\[ P_l(t;\beta) = e^{-\beta t - t^2/2} \sum_{0\le n} \frac{t^n}{n!} \sum_{0\le m} \beta^m \tilde f_{nml}. \tag{4.7} \]
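For small n the coefficients in (4.6) can be tabulated by brute force. The sketch below is an illustration only: it assumes the fixed-point convention π(i) = i used in (4.6), and the names `lis_length` and `f_tilde` are hypothetical helpers, not notation from the text.

```python
from bisect import bisect_left
from itertools import permutations

def lis_length(seq):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []
    for value in seq:
        k = bisect_left(tails, value)
        if k == len(tails):
            tails.append(value)
        else:
            tails[k] = value
    return len(tails)

def f_tilde(n, m, l):
    """Involutions on n elements with m fixed points and no increasing
    subsequence of length greater than l (brute force, small n only)."""
    count = 0
    for p in permutations(range(n)):
        is_involution = all(p[p[i]] == i for i in range(n))
        fixed_points = sum(1 for i in range(n) if p[i] == i)
        if is_involution and fixed_points == m and lis_length(p) <= l:
            count += 1
    return count

# For l = 1 only the decreasing permutation survives; it is an involution
# with n mod 2 fixed points.
print([f_tilde(n, n % 2, 1) for n in range(6)])
```

With l ≥ n the length constraint is vacuous, so summing `f_tilde(n, m, n)` over m recovers the involution count (4.4).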
Finally, for , we have two parameters in the model; we add fixed points with probability α dt, negated points with probability β dt, and generic points with probability 2t dt, obtaining
\[ P_l(t;\alpha,\beta) = e^{-\alpha t - \beta t - t^2} \sum_{0\le n} \frac{t^n}{n!} \sum_{0\le m_+,\,m_-} \alpha^{m_+} \beta^{m_-} \tilde f_{n m_+ m_- l}. \tag{4.8} \]
The β-parameter is largely irrelevant due to the following lemma.

LEMMA 4.1
For all l ≥ 0, α ≥ 0, and β ≥ 0,
\[ P_{2l+1}(t;\beta) = P_{2l}(t;0), \tag{4.9} \]
\[ P_{2l+1}(t;\alpha,\beta) = P_{2l}(t;\alpha,0). \tag{4.10} \]
Proof. The key observation is that if an increasing subsequence contains a given point off the diagonal, it can always be extended to contain the reflection of that point through that diagonal; moreover, no increasing subsequence can contain more than one point on that diagonal. Thus the point collection at time t has increasing subset size at most 2l + 1 if and only if the off-diagonal subset has increasing subset size at most 2l. Since the points were added via a generalized Poisson process, the off-diagonal subset itself corresponds to a generalized Poisson process; the lemma is then immediate.

We can again express these generating functions as integrals.

THEOREM 4.2
For all l,
\[ P_l(t;\alpha) = e^{-\alpha t - t^2/2}\, E_{U\in O(l)} \det(1+\alpha U) \exp(t \operatorname{Tr}(U)), \tag{4.11} \]
\[ P_l(t;1) = e^{-t^2/2}\, E_{U\in O^{-}(l+1)} \exp(t \operatorname{Tr}(U)), \tag{4.12} \]
\[ P_{2l}(t;\beta) = e^{-\beta t - t^2/2}\, E_{U\in Sp(2l)} \det(1-\beta U)^{-1} \exp(t \operatorname{Tr}(U)), \tag{4.13} \]
\[ P_{2l+1}(t;\beta) = e^{-t^2/2}\, E_{U\in Sp(2l)} \exp(t \operatorname{Tr}(U)), \tag{4.14} \]
\[ P_l(t;1) = e^{-t^2/2}\, E_{U\in O^{-}(l+1)} \exp(t \operatorname{Tr}(U)), \tag{4.15} \]
\[ P_{2l}(t;\alpha,\beta) = e^{-\alpha t - \beta t - t^2}\, E_{U\in U(l)} \det((1+\alpha U)(1-\beta U)^{-1}) \exp(2t \operatorname{Re}\operatorname{Tr}(U)), \tag{4.16} \]
\[ P_{2l+1}(t;\alpha,\beta) = e^{-\alpha t - t^2}\, E_{U\in U(l)} \det(1+\alpha U) \exp(2t \operatorname{Re}\operatorname{Tr}(U)), \tag{4.17} \]
\[ P_{2l+1}(t;1,\beta) = e^{-t^2}\, E_{U\in O^{-}(l+2)} \exp(t \operatorname{Tr}(U)) \cdot E_{U\in O^{-}(l+1)} \exp(t \operatorname{Tr}(U)). \tag{4.18} \]
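As a sanity check of (4.11), not taken from the original text, consider l = 1. The group O(1) consists of the two points ±1 with equal weight, and the only permutation with no increasing subsequence of length 2 is the decreasing one, which is an involution with one fixed point when n is odd and none when n is even. Under these observations both sides of (4.11) can be computed directly:

```latex
\begin{align*}
E_{U\in O(1)} \det(1+\alpha U)\exp(t\operatorname{Tr}(U))
  &= \tfrac12\bigl[(1+\alpha)e^{t} + (1-\alpha)e^{-t}\bigr]
   = \cosh t + \alpha \sinh t, \\
\sum_{0\le n} \frac{t^n}{n!} \sum_{0\le m} \alpha^m \tilde f_{nm1}
  &= \sum_{n\ \mathrm{even}} \frac{t^n}{n!} + \alpha \sum_{n\ \mathrm{odd}} \frac{t^n}{n!}
   = \cosh t + \alpha \sinh t.
\end{align*}
```

Multiplying either line by $e^{-\alpha t - t^2/2}$ gives $P_1(t;\alpha)$ as defined in (4.6).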
Proof. This follows immediately from the symmetric function identities of Section 5 (see, in particular, Remark 5.5.2 to Corollary 5.5).

Remark. Strictly speaking, equations (4.13) and (4.16) are valid only when β < 1. The correct formulae for β ≥ 1 are obtained by analytic continuation (e.g., by expanding $\det(1-\beta U)^{-1}$ in a formal power series, integrating term-by-term, and summing).

From Theorem 4.2 and the orthogonal polynomial identities of Section 3, we have the following formulae.

COROLLARY 4.3
For α, β ≥ 0,
\[ P_{2l}(t;\alpha) = e^{-\alpha t - t^2/2}\, \tfrac12 \Bigl\{ \bigl[\pi^*_{2l-1}(-\alpha;t) - \alpha\pi_{2l-1}(-\alpha;t)\bigr] D_l^{--}(t) + \bigl[\pi^*_{2l-1}(-\alpha;t) + \alpha\pi_{2l-1}(-\alpha;t)\bigr] D_{l-1}^{++}(t) \Bigr\}, \tag{4.19} \]
\[ P_{2l+1}(t;\alpha) = e^{-\alpha t - t^2/2}\, \tfrac12 \Bigl\{ \bigl[\pi^*_{2l}(-\alpha;t) + \alpha\pi_{2l}(-\alpha;t)\bigr] e^{t} D_l^{+-}(t) + \bigl[\pi^*_{2l}(-\alpha;t) - \alpha\pi_{2l}(-\alpha;t)\bigr] e^{-t} D_l^{-+}(t) \Bigr\}, \tag{4.20} \]
\[ P_{2l}(t;1) = P_{2l}(t;1) = e^{-t - t^2/2}\, D_l^{-+}(t), \tag{4.21} \]
\[ P_{2l+1}(t;1) = P_{2l+1}(t;1) = e^{-t^2/2}\, D_l^{++}(t), \tag{4.22} \]
\[ P_{2l+1}(t;\beta) = e^{-t^2/2}\, D_l^{++}(t), \tag{4.23} \]
\[ P_{2l+1}(t;\alpha,\beta) = e^{-\alpha t - t^2}\, \pi_l^*(-\alpha;t)\, D_l(t), \tag{4.24} \]
\[ P_{4l+1}(t;1,\beta) = e^{-t - t^2}\, D_l^{++}(t)\, D_l^{-+}(t), \tag{4.25} \]
\[ P_{4l+3}(t;1,\beta) = e^{-t - t^2}\, D_l^{++}(t)\, D_{l+1}^{-+}(t). \tag{4.26} \]
5. Schur function identities

In order to derive the above generating functions for P(t; α), P(t; β), and P(t; α, β), we first need a stronger version of Theorem 1.2.

Remark. In the sequel, we see a number of expressions of the form
\[ E_{U\in U^{\circledast}(l)} \det(g(U)), \tag{5.1} \]
where g takes values in a ring of formal power series (e.g., the ring of symmetric functions) with coefficients in C[z, 1/z] (Laurent polynomials in z). This formal integral is defined by expanding det(g(U)) and integrating term-by-term. We can recover analytical results by specializing down to a small number of variables in such a way that the resulting series converge.

We refer the reader to [29] for an introduction to symmetric functions. Recall the following symmetric function identities.

LEMMA 5.1
The following identities hold in the algebra of symmetric functions on two sets of variables:
\[ \prod_{j,k} (1 - x_j y_k)^{-1} = \sum_{\lambda} s_\lambda(x) s_\lambda(y) = \exp\Bigl( \sum_{m>0} \frac{1}{m}\, p_m(x)\, p_m(y) \Bigr). \tag{5.2} \]
And, for any symmetric function f,
\[ \Bigl\langle f(x),\ \prod_{j,k} (1 - x_j y_k)^{-1} \Bigr\rangle = f(y). \tag{5.3} \]

It is convenient to write
\[ \prod_{j,k} (1 - x_j y_k)^{-1} = \prod_j H(x_j; y), \tag{5.4} \]
where
\[ H(u; y) := \prod_j (1 - u y_j)^{-1} = \sum_j h_j u^j \tag{5.5} \]
is the generating function for the complete symmetric functions $h_j$; by convention, $h_j = 0$ when j < 0. In the sequel we also need the generating function
\[ E(u; y) := \prod_j (1 + u y_j) = \sum_j e_j u^j \tag{5.6} \]
for the elementary symmetric functions.

THEOREM 5.2
For all l ≥ 0,
\[ \sum_{\ell(\lambda)\le l} s_\lambda(x) s_\lambda(y) = E_{U\in U(l)} \det(H(U;x) H(U^\dagger;y)), \tag{5.7} \]
\[ \sum_{\ell(\lambda)\le l} s_{2\lambda}(x) = E_{U\in O(l)} \det(H(U;x)), \tag{5.8} \]
\[ \sum_{\ell(\lambda)\le 2l} s_{\lambda^2}(x) = E_{U\in Sp(2l)} \det(H(U;x)). \tag{5.9} \]
Proof. Consider the first identity. We have
\[ E_{U\in U(l)} \det(H(U;x) H(U^\dagger;y)) = \sum_{\lambda,\mu} s_\lambda(x) s_\mu(y)\, E_{U\in U(l)}\, s_\lambda(U) s_\mu(U^\dagger). \tag{5.10} \]
But the integral on the right is 1 when λ = µ and ℓ(λ), ℓ(µ) ≤ l, and it is zero otherwise. The result follows immediately. Similarly, the identity for O(l) follows from the fact that
\[ E_{U\in O(l)}\, s_\lambda(U) \tag{5.11} \]
is 1 when λ is even and ℓ(λ) ≤ l, and it is zero otherwise; and the identity for Sp(2l) follows from the fact that
\[ E_{U\in Sp(2l)}\, s_\lambda(U) \tag{5.12} \]
is 1 when λ′ is even and ℓ(λ) ≤ l, and it is zero otherwise.

Remark 5.2.1
This is a direct generalization of the argument in [32].

Remark 5.2.2
The expression det(H(U; x)) can also be written as
\[ \exp\Bigl( \sum_{m>0} \frac{1}{m}\, p_m(x) \operatorname{Tr}(U^m) \Bigr). \tag{5.13} \]
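The equality between the product form and the exponential form in (5.2) and (5.13) can be checked numerically for finitely many variables. This is a rough sketch under stated assumptions: the variables `xs`, the sample eigenvalues `us` of a unitary matrix U (so that $p_m(u) = \operatorname{Tr}(U^m)$ and the product equals det(H(U; x))), and the truncation length are all illustrative choices.

```python
import cmath

def power_sum(m, values):
    """p_m evaluated on a finite list of variables."""
    return sum(v**m for v in values)

def product_side(xs, us):
    """prod_{j,k} (1 - x_j u_k)^{-1}, i.e. det(H(U; x)) when U has eigenvalues us."""
    result = 1
    for x in xs:
        for u in us:
            result *= 1 / (1 - x * u)
    return result

def exponential_side(xs, us, terms=200):
    """exp(sum_{m>0} p_m(x) p_m(u) / m), truncated (assumed cutoff)."""
    exponent = sum(
        power_sum(m, xs) * power_sum(m, us) / m for m in range(1, terms + 1)
    )
    return cmath.exp(exponent)

xs = [0.3, 0.2, -0.1]                          # small, so the series converges
us = [cmath.exp(0.4j), cmath.exp(-1.3j)]       # eigenvalues on the unit circle
print(abs(product_side(xs, us) - exponential_side(xs, us)))
```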
For the hyperoctahedral cases $\boxdot$ and , we need the notation
\[ \tilde s_\lambda(x) = s_{\lambda^{(0)}}(x)\, s_{\lambda^{(1)}}(x) \tag{5.14} \]
whenever λ is a partition with trivial 2-core and 2-quotient $(\lambda^{(0)}, \lambda^{(1)})$, and $\tilde s_\lambda(x) = 0$ otherwise. From [29, Example 1.5.24], this can also be written as
\[ \tilde s_\lambda(x) = (-1)^{f(\lambda)/2}\, \phi_2(s_\lambda(x)), \tag{5.15} \]
where f(λ) is the number of odd parts of λ (note that this is even when λ has trivial 2-core), and $\phi_2$ is the homomorphism such that
\[ \phi_2(h_{2n}) = h_n, \tag{5.16} \]
\[ \phi_2(h_{2n+1}) = 0, \tag{5.17} \]
or, equivalently,
\[ \phi_2(H(u;x)) = H(u^2;x). \tag{5.18} \]
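COROLLARY 5.3
For all l ≥ 0,
\[ \sum_{\ell(\lambda)\le 2l} \tilde s_\lambda(x)\, \tilde s_\lambda(y) = E_{U\in U^{\boxdot}(2l)} \det(H(U;x) H(U^\dagger;y)), \tag{5.19} \]
\[ \sum_{\ell(\lambda)\le l} \tilde s_{2\lambda^2}(x) = E_{U\in U\,(2l)} \det(H(U;x)). \tag{5.20} \]

Proof. This follows from the computations
\begin{align*}
\sum_{\ell(\lambda)\le 2l} \tilde s_\lambda(x)\, \tilde s_\lambda(y)
 &= \sum_{\ell(\lambda)\le l,\ \ell(\mu)\le l} s_\lambda(x) s_\mu(x) s_\lambda(y) s_\mu(y) \\
 &= \Bigl( \sum_{\ell(\lambda)\le l} s_\lambda(x) s_\lambda(y) \Bigr)^2 \\
 &= \bigl( E_{U\in U(l)} \det(H(U;x) H(U^\dagger;y)) \bigr)^2 \\
 &= E_{U_1,U_2\in U(l)} \det(H(U_1;x) H(U_1^\dagger;y)) \det(H(U_2;x) H(U_2^\dagger;y)) \\
 &= E_{U\in U(l)\oplus U(l)} \det(H(U;x) H(U^\dagger;y)) \tag{5.21}
\end{align*}
and
\begin{align*}
\sum_{\ell(\lambda)\le l} \tilde s_{2\lambda^2}(x)
 &= \sum_{\ell(\lambda)\le l} s_\lambda(x) s_\lambda(x) \\
 &= E_{U\in U(l)} \det(H(U;x)) \det(H(U^\dagger;x)) \\
 &= E_{U\in U\,(2l)} \det(H(U;x)). \tag{5.22}
\end{align*}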
The point of writing the integrands in the form det(H (U ; x)) is that we know how to convert such integrals into determinants. THEOREM 5.4 For all l ≥ 0,
EU ∈U (l) det(H (U ; x)H (U † ; y)) = det(g j−k (x; y))0≤ j,k
· det(i j−k (x) − i j+k+2 (x))0≤ j,k
(5.23) (5.24)
(5.25)
EU ∈O + (2l+1) det(H (U ; x)) = H (1; x) det(i j−k (x) − i j+k+1 (x))0≤ j,k
(5.28)
where we define
\[ g_j(x;y) = \sum_m h_m(x)\, h_{m+j}(y), \tag{5.29} \]
\[ i_j(x) = \sum_m h_m(x)\, h_{m+j}(x). \tag{5.30} \]

Proof. By Theorems 2.1 and 2.2, it suffices to evaluate
\[ \frac{1}{2\pi} \int_{[0,2\pi]} H(e^{i\theta};x)\, H(e^{-i\theta};y)\, e^{ij\theta}\, d\theta. \tag{5.31} \]
But this integral is just the coefficient of $z^{-j}$ in $H(z;x) H(z^{-1};y)$; that is,
\[ \sum_m h_m(x)\, h_{m+j}(y) = g_j(x;y). \tag{5.32} \]
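The following is then immediate.

COROLLARY 5.5
For all l > 0,
\[ \sum_{\ell(\lambda)\le l} s_\lambda(x) s_\lambda(y) = \det(g_{j-k}(x;y))_{0\le j,k}, \tag{5.33} \]
\[ \sum_{\ell(\lambda)\le 2l} s_{2\lambda}(x) = \tfrac12 \Bigl[ \det(i_{j-k}(x) + i_{j+k}(x))_{0\le j,k} + H(1;x) H(-1;x) \det(i_{j-k}(x) - i_{j+k+2}(x))_{0\le j,k} \Bigr], \tag{5.34} \]
\[ \sum_{\ell(\lambda)\le 2l+1} s_{2\lambda}(x) = H(1;x) \det(i_{j-k}(x) - i_{j+k+1}(x))_{0\le j,k}, \tag{5.35} \]
\[ \sum_{\ell(\lambda)\le 2l} s_{\lambda^2}(x) = \det(i_{j-k}(x) - i_{j+k+2}(x))_{0\le j,k}. \tag{5.36} \]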
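A concrete instance of the first identity can be tested with exact arithmetic. The sketch below assumes the exponential specialization $h_m(x) \mapsto t^m/m!$ with y = x, under which $g_j$ becomes the modified Bessel series $I_j(2t) = \sum_m t^{2m+j}/(m!\,(m+j)!)$, and it assumes the Toeplitz determinant has size l × l (Gessel's form). For l = 2 the determinant is $I_0(2t)^2 - I_1(2t)^2$, and the coefficient of $t^{2n}$ should equal $f_{n2}/n!^2$, where $f_{nl}$ counts permutations of n elements with no increasing subsequence longer than l. All function names are hypothetical.

```python
from bisect import bisect_left
from fractions import Fraction
from itertools import permutations
from math import factorial

def lis_length(seq):
    """Length of the longest increasing subsequence of a permutation."""
    tails = []
    for value in seq:
        k = bisect_left(tails, value)
        if k == len(tails):
            tails.append(value)
        else:
            tails[k] = value
    return len(tails)

def f(n, l):
    """Permutations of n elements with no increasing subsequence longer than l."""
    return sum(1 for p in permutations(range(n)) if lis_length(p) <= l)

def bessel_series(j, order):
    """Coefficients of t^0..t^order in I_j(2t)."""
    coeffs = [Fraction(0)] * (order + 1)
    m = 0
    while 2 * m + j <= order:
        coeffs[2 * m + j] = Fraction(1, factorial(m) * factorial(m + j))
        m += 1
    return coeffs

def multiply(a, b, order):
    """Product of two truncated power series."""
    out = [Fraction(0)] * (order + 1)
    for i, ai in enumerate(a):
        for k, bk in enumerate(b):
            if i + k <= order:
                out[i + k] += ai * bk
    return out

ORDER = 10
i0, i1 = bessel_series(0, ORDER), bessel_series(1, ORDER)
det2 = [p - q for p, q in zip(multiply(i0, i0, ORDER), multiply(i1, i1, ORDER))]
matches = all(det2[2 * n] == Fraction(f(n, 2), factorial(n) ** 2) for n in range(6))
print(matches)
```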
Remark 5.5.1
The expressions for $\sum_{\ell(\lambda)\le l} s_\lambda(x) s_\lambda(y)$ and $\sum_{\ell(\lambda)\le 2l} s_{\lambda^2}(x)$ are already known (see [13] and [16], resp.; the latter also gives analogues of (5.56) and (5.57)), although it is worth noting that the proof of the latter formula given here, in contrast to most of the proofs in the literature, involves neither Pfaffians nor symplectic tableaux. The particular forms of (5.34) and (5.35) given here are new, although (C. Krattenthaler, personal communication) they can be readily derived from [31, Th. 2.4(3)] together with well-known character formulas for symplectic and special orthogonal characters (see Remark 5.5.3). The analogue of (5.55), again given in terms of orthogonal and symplectic characters, is given in [27, Th. 2].

Remark 5.5.2
We recover our earlier identities by noting that, when |λ| = n, $\langle p_1^n, s_\lambda(x)\rangle$ is equal to the number of tableaux of shape λ. So the generating functions for $f^{\circledast}$ are obtained by taking inner products of the above formulae with $\exp(t\, p_1(x))$. But in general taking an inner product with the function
\[ \exp\Bigl( \sum_{m>0} \frac{f_m\, p_m(x)}{m} \Bigr) \tag{5.37} \]
gives a homomorphism from the algebra of symmetric functions to $Q[f_1, f_2, \ldots]$, given by the substitutions $p_m(x) \mapsto f_m$. In our case, we have
\[ h_m(x) \mapsto \frac{t^m}{m!}, \tag{5.38} \]
\[ H(u;x) \mapsto e^{ut}, \tag{5.39} \]
\[ i_j(x) \mapsto I_j(2t), \tag{5.40} \]
so, for instance, we obtain
\[ \sum_{n\ge 0} f_{nl}\, \frac{t^{2n}}{n!^2} = E_{U\in U(l)} \exp(2t \operatorname{Re}\operatorname{Tr}(U)) = \det(I_{j-k}(2t))_{0\le j,k} \tag{5.41} \]
as before.

Remark 5.5.3
In [38], it is observed that the determinant
\[ \det(i_{j-k}(x) + i_{j-k+2}(x))_{0\le j,k} \tag{5.42} \]
is related by duality to a certain irreducible character of the symplectic group, and similarly the determinants of Corollary 5.7 are related to irreducible characters of the (odd-dimensional) special orthogonal group, as are those of equations (5.34) and (5.35). It is not clear how this relates to our integrals since the integrals are over groups of fixed dimension, while the characters are associated to groups of arbitrarily high dimension.

Remark 5.5.4
The approach in [16] for deriving such formulae is based on the operator
\[ f(z) \mapsto \phi\Bigl( \prod_{1\le j<k\le l} (1 - z_j^{-1} z_k)\, f(z) \Bigr), \tag{5.43} \]
where φ is the linear operator on Laurent series taking $\prod_{1\le j\le l} z_j^{b_j}$ to $\prod_{1\le j\le l} h_{b_j}(x)$. Now, this operator can be expressed as an integral:
\[ \phi\Bigl( \prod_{1\le j<k\le l} (1 - z_j^{-1} z_k)\, f(z) \Bigr) = \int_{[0,2\pi]^l} f(e^{i\theta}) \prod_{1\le j\le l} H(e^{-i\theta_j};x) \prod_{1\le j<k\le l} (1 - e^{-i\theta_j} e^{i\theta_k}) \prod_{1\le j\le l} d\theta_j, \tag{5.44} \]
which after symmetrizing the integrand becomes simply
\[ E_{U\in U(l)}\, f(U) \det(H(U^\dagger;x)). \tag{5.45} \]
In [16], formulae are given for a generalization of $\sum_{\ell(\lambda)\le l} s_{\lambda^2}(x)$, in which the sum is over all λ with a given number of odd parts in the dual. This is derived using the homomorphism $H^\perp(\beta;x)$ (see [29]) which takes f(x) to f(β, x) for any symmetric function f. The relevant property of $H^\perp(\beta;x)$ is that
\[ H^\perp(\beta;x)\, s_\lambda(x) = s_\lambda(\beta, x) = \sum_{\mu \lesssim \lambda} \beta^{|\lambda|-|\mu|}\, s_\mu(x), \tag{5.46} \]
where $\mu \lesssim \lambda$ indicates that λ − µ is a "horizontal strip"; that is,
\[ \mu'_i \le \lambda'_i \le \mu'_i + 1 \tag{5.47} \]
for all i, or, equivalently,
\[ \lambda_1 \ge \mu_1 \ge \lambda_2 \ge \mu_2 \ge \cdots. \tag{5.48} \]
We also need the dual operator $E^\perp(\alpha;x)$, which satisfies
\[ E^\perp(\alpha;x)\, s_\lambda(x) = \sum_{\mu \lesssim' \lambda} \alpha^{|\lambda|-|\mu|}\, s_\mu(x), \tag{5.49} \]
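where
\[ \mu \lesssim' \lambda \iff \mu' \lesssim \lambda'. \tag{5.50} \]
We have the identities
\[ H^\perp(\beta;x)\, H(u;x) = (1-\beta u)^{-1} H(u;x), \tag{5.51} \]
\[ H^\perp(\beta;x)\, E(u;x) = (1+\beta u)\, E(u;x), \tag{5.52} \]
\[ E^\perp(\alpha;x)\, H(u;x) = (1+\alpha u)\, H(u;x), \tag{5.53} \]
\[ E^\perp(\alpha;x)\, E(u;x) = (1-\alpha u)^{-1} E(u;x). \tag{5.54} \]

THEOREM 5.6
For all l > 0, we have the integral identities
\[ \sum_{\ell(\lambda)\le l} \alpha^{f(\lambda)}\, s_\lambda(x) = E_{U\in O(l)} \det((1+\alpha U) H(U;x)), \tag{5.55} \]
\[ \sum_{\ell(\lambda)\le 2l} \beta^{f(\lambda')}\, s_\lambda(x) = E_{U\in Sp(2l)} \det((1-\beta U)^{-1} H(U;x)), \tag{5.56} \]
\[ \sum_{\ell(\lambda)\le 2l+1} \beta^{f(\lambda')}\, s_\lambda(x) = H(\beta;x)\, E_{U\in Sp(2l)} \det(H(U;x)). \tag{5.57} \]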
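Proof. The point is that given any partition λ, there is a unique partition µ such that $\lambda \lesssim' 2\mu$; simply add 1 to each odd part of λ, and then divide by 2. In particular, then, f(λ) = |2µ| − |λ|, and thus
\begin{align*}
\sum_{\ell(\lambda)\le l} \alpha^{f(\lambda)}\, s_\lambda(x)
 &= \sum_{\ell(\mu)\le l}\ \sum_{\lambda \lesssim' 2\mu} \alpha^{|2\mu|-|\lambda|}\, s_\lambda(x) \\
 &= E^\perp(\alpha;x) \sum_{\ell(\mu)\le l} s_{2\mu}(x) \\
 &= E_{U\in O(l)}\, E^\perp(\alpha;x) \det(H(U;x)) \\
 &= E_{U\in O(l)} \det((1+\alpha U) H(U;x)). \tag{5.58}
\end{align*}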
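The combinatorial step at the start of this proof is easy to make concrete. The following sketch (with hypothetical helper names, and partitions represented as weakly decreasing lists) builds µ from λ and checks that 2µ differs from λ by at most one box per row and that the number of added boxes is f(λ).

```python
def odd_parts(partition):
    """f(lambda): the number of odd parts."""
    return sum(1 for part in partition if part % 2)

def even_cover(partition):
    """The partition mu obtained by adding 1 to each odd part and halving."""
    return [(part + 1) // 2 for part in partition]

lam = [5, 4, 3, 3, 1]
mu = even_cover(lam)
doubled = [2 * part for part in mu]

# 2*mu contains lam and adds at most one box in each row (a vertical strip).
is_vertical_strip = all(d - p in (0, 1) for d, p in zip(doubled, lam))
added_boxes = sum(doubled) - sum(lam)
print(mu, doubled, is_vertical_strip, added_boxes, odd_parts(lam))
```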
Similarly,
\[ \sum_{\ell(\lambda)\le 2l} \beta^{f(\lambda')}\, s_\lambda(x) = H^\perp(\beta;x)\, E_{U\in Sp(2l)} \det(H(U;x)) = E_{U\in Sp(2l)} \det((1-\beta U)^{-1} H(U;x)). \tag{5.59} \]
By the argument in [16], the expression for
\[ \sum_{\ell(\lambda)\le 2l+1} \beta^{f(\lambda')}\, s_\lambda(x) \tag{5.60} \]
follows from Pieri's formula
\[ H(\beta;x)\, s_\mu(x) = \sum_{\lambda \gtrsim \mu} \beta^{|\lambda|-|\mu|}\, s_\lambda(x) \tag{5.61} \]
(see [29, (5.16) in Chap. I]) and the fact that, for every λ with ℓ(λ) ≤ 2l + 1, there is a unique µ with $\mu^2 \lesssim \lambda$; and for that µ, ℓ(µ) ≤ l.

Remark. It is worth stressing here that the integral
\[ E_{U\in Sp(2l)} \det((1-\beta U)^{-1} H(U;x)) \tag{5.62} \]
is a formal integral, in which the expression $(1-\beta z)^{-1}$ stands for the formal power series $\sum_{0\le m} z^m \beta^m$. We must therefore take special care when specializing to an analytical integral with |β| ≥ 1.

COROLLARY 5.7
For all l > 0,
\[ \sum_{\ell(\lambda)\le l} s_\lambda(x) = E(1;x)\, E_{U\in O^{-}(l+1)} \det(H(U;x)), \tag{5.63} \]
so
\[ \sum_{\ell(\lambda)\le 2l} s_\lambda(x) = \det(i_{j-k}(x) + i_{j+k+1}(x))_{0\le j,k}, \tag{5.64} \]
\[ \sum_{\ell(\lambda)\le 2l+1} s_\lambda(x) = H(1;x) \det(i_{j-k}(x) - i_{j+k+2}(x))_{0\le j,k}. \tag{5.65} \]

Proof. We have
\begin{align*}
\sum_{\ell(\lambda)\le l} s_\lambda(x)
 &= \sum_{\ell(\lambda)\le l} 1^{f(\lambda)}\, s_\lambda(x) \\
 &= E_{U\in O(l)} \det((1+U) H(U;x)) \\
 &= \tfrac12\, E_{U\in O^{+}(l)} \det((1+U) H(U;x)) \tag{5.66}
\end{align*}
since det(1 + U) vanishes on $O^{-}(l)$. But then we can apply Theorem 3.1 to obtain
\[ \tfrac12\, E_{U\in O^{+}(l)} \det((1+U) H(U;x)) = H(-1;x)^{-1}\, E_{U\in O^{-}(l+1)} \det(H(U;x)), \tag{5.67} \]
as desired. (Recall H(−t; x)E(t; x) = 1.) The remaining formulae are immediate.
Remark 5.7.1
Again, the determinantal forms are known from [14] (based on an earlier Pfaffian form from [15]).

Remark 5.7.2
We could also have derived the formulae by the (formal) substitution β = 1 in the above symplectic formulae, or by taking the limit $\beta \to 1^-$.

We also have the following formulae.

THEOREM 5.8
For all l ≥ 0,
\[ \sum_{\lambda'_2 \le l} \alpha^{f(\lambda)}\, s_\lambda(x) = E(\alpha;x)\, E_{U\in O(l)} \det(H(U;x)), \tag{5.68} \]
\[ \sum_{\lambda'_2 \le l \le \lambda'_1} \alpha^{2\lambda'_1 - l - f(\lambda)}\, s_\lambda(x) = E(\alpha;x)\, E_{U\in O(l)} \det(U) \det(H(U;x)). \tag{5.69} \]

Proof. We recall
\[ E(\alpha;x)\, s_\mu(x) = \sum_{\mu \lesssim' \lambda} \alpha^{|\lambda|-|\mu|}\, s_\lambda(x); \tag{5.70} \]
that is, we sum over all partitions λ obtained from µ by adjoining a vertical strip. On the other hand,
\[ E_{U\in O(l)} \det(H(U;x)) = \sum_{\ell(\mu)\le l,\ f(\mu)=0} s_\mu(x), \tag{5.71} \]
\[ E_{U\in O(l)} \det(U) \det(H(U;x)) = \sum_{\ell(\mu)=f(\mu)=l} s_\mu(x). \tag{5.72} \]
In each case, there is at most one way to remove a vertical strip from a generic partition to obtain one with the desired special form. Thus it remains only to determine which partitions occur. In either case, we note that $\mu_{l+1} = 0$, so $\lambda_{l+1} \le 1$; it follows that $\lambda'_2 \le l$. In the second case, $\mu_l \ge 1$, and thus $\lambda'_1 \ge l$. As these conditions are readily seen to be sufficient, the result follows.

Remark. This gives another proof of Corollary 5.7 by subtracting (5.69) from (5.68) with α = 1.
It remains to consider the formulae corresponding to hyperoctahedral involutions. For an arbitrary partition λ, we define new partitions
\[ \lambda^+ = (\lambda_1, \lambda_3, \ldots), \tag{5.73} \]
\[ \lambda^- = (\lambda_2, \lambda_4, \ldots). \tag{5.74} \]
Note that $\lambda^-$ and $\lambda^+$ are unique partitions such that $(\lambda^-)^2 \lesssim \lambda \lesssim (\lambda^+)^2$; also note that $f(\lambda') = |\lambda^+| - |\lambda^-|$.

LEMMA 5.9
A partition λ has trivial 2-core if and only if $f(\lambda^+) = f(\lambda^-) = f(\lambda)/2$.

Proof. A partition has trivial 2-core if and only if its diagram can be tiled by dominos. By a classical result, this can happen if and only if the diagram contains as many points (i, j) with i + j even as with i + j odd. For a, b ∈ {0, 1}, let $C_{ab}$ be the number of points in the diagram of λ with (i mod 2, j mod 2) = (a, b). Then
\[ f(\lambda^+) = C_{11} - C_{10}, \tag{5.75} \]
\[ f(\lambda^-) = C_{01} - C_{00}. \tag{5.76} \]
But then
\[ f(\lambda^+) - f(\lambda^-) = C_{11} - C_{10} - C_{01} + C_{00} = 0. \tag{5.77} \]
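Lemma 5.9 and the identity $f(\lambda') = |\lambda^+| - |\lambda^-|$ can be confirmed by exhaustive search over small partitions. The sketch below is only an illustration: it computes the 2-core by repeatedly stripping removable dominos (a standard procedure, assumed here rather than taken from the text), and every function name is hypothetical.

```python
def partitions(n, largest=None):
    """Generate all partitions of n as weakly decreasing lists."""
    if largest is None:
        largest = n
    if n == 0:
        yield []
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest

def odd_parts(partition):
    return sum(1 for part in partition if part % 2)

def conjugate(partition):
    return [sum(1 for part in partition if part > i) for i in range(partition[0])] if partition else []

def two_core(partition):
    """Strip removable dominos until none remain; the result is a staircase."""
    lam = list(partition)
    changed = True
    while changed:
        changed = False
        padded = lam + [0, 0]
        for i in range(len(lam)):
            if padded[i] - padded[i + 1] >= 2:             # horizontal domino
                lam[i] -= 2
                changed = True
                break
            if padded[i] == padded[i + 1] > padded[i + 2]:  # vertical domino
                lam[i] -= 1
                lam[i + 1] -= 1
                changed = True
                break
        lam = [part for part in lam if part > 0]
    return lam

def check(max_n):
    for n in range(max_n + 1):
        for lam in partitions(n):
            plus, minus = lam[0::2], lam[1::2]
            if (two_core(lam) == []) != (odd_parts(plus) == odd_parts(minus)):
                return False
            if odd_parts(conjugate(lam)) != sum(plus) - sum(minus):
                return False
    return True

print(check(8))
```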
THEOREM 5.10
For all l ≥ 0,
\[ \sum_{\ell(\lambda)\le 2l} \alpha^{f(\lambda)/2} \beta^{f(\lambda')/2}\, \tilde s_\lambda(x) = E_{U\in U(l)} \det((1+\alpha U)(1-\beta U)^{-1} H(U;x) H(U^\dagger;x)). \tag{5.78} \]

Proof. It follows from Lemma 5.9 that, for any λ with trivial 2-core, there exist unique partitions µ and ν such that
\[ \lambda \lesssim \nu^2 \lesssim' 2\mu^2, \tag{5.79} \]
and which satisfy
\[ f(\lambda) = 2f(\nu) = |2\mu^2| - |\nu^2|, \tag{5.80} \]
\[ f(\lambda') = |\nu^2| - |\lambda|. \tag{5.81} \]
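Consequently,
\[ \sum_{\ell(\lambda)\le 2l} \phi_2\bigl( (-\alpha)^{f(\lambda)/2} \beta^{f(\lambda')/2}\, s_\lambda \bigr) = \sum_{\ell(\mu)\le l} \phi_2\bigl( E^\perp(\sqrt{-\alpha};x)\, H^\perp(\sqrt{\beta};x)\, s_{2\mu^2}(x) \bigr). \tag{5.82} \]
It thus suffices to show that, for any partition µ,
\[ \phi_2\bigl( E^\perp(a;x)\, H^\perp(b;x)\, s_{2\mu^2}(x) \bigr) = s_\mu(x)\, \bigl( E^\perp(-a^2;x)\, H^\perp(b^2;x)\, s_\mu(x) \bigr) \tag{5.83} \]
since then we may set y = x in
\[ \sum_{\ell(\mu)\le l} s_\mu(y)\, \bigl( E^\perp(\alpha;x)\, H^\perp(\beta;x)\, s_\mu(x) \bigr) = E_{U\in U(l)} \det((1+\alpha U)(1-\beta U)^{-1} H(U;x) H(U^\dagger;y)). \tag{5.84} \]
By the Jacobi-Trudi identity, $s_{2\mu^2}(x)$ is the determinant of the block matrix
\[ \det(B_{\mu_i + j - i})_{1\le i,j\le \ell(\mu)}, \tag{5.85} \]
where
\[ B_m = \begin{pmatrix} h_{2m} & h_{2m-1} \\ h_{2m+1} & h_{2m} \end{pmatrix}. \tag{5.86} \]
Equivalently, via row and column operations, we could take
\[ B_m = \begin{pmatrix} h_{2m} - b\,h_{2m-1} & h_{2m-1} \\ h_{2m+1} - (a+b)\,h_{2m} + ab\,h_{2m-1} & h_{2m} - a\,h_{2m-1} \end{pmatrix}. \tag{5.87} \]
Upon applying $E^\perp(a;x) H^\perp(b;x)$ and then $\phi_2$, we obtain
\[ B'_m = \begin{pmatrix} h_m & (a+b)\, H^\perp(b^2;x)\, h_{m-1} \\ 0 & E^\perp(-a^2;x)\, H^\perp(b^2;x)\, h_m \end{pmatrix}. \tag{5.88} \]
But then
\[ \det(B'_{\mu_i + j - i})_{1\le i,j\le \ell(\mu)} = s_\mu(x)\, \bigl( E^\perp(-a^2;x)\, H^\perp(b^2;x)\, s_\mu(x) \bigr), \tag{5.89} \]
as required.

COROLLARY 5.11
We have
\[ \sum_{\ell(\lambda)\le 2l+1} \alpha^{f(\lambda)/2} \beta^{f(\lambda')/2}\, \tilde s_\lambda(x) = H(\beta;x)\, E_{U\in U(l)} \det((1+\alpha U) H(U;x) H(U^\dagger;x)). \tag{5.90} \]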
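Proof. Here we use the following analogue of Pieri's formula:
\[ H(\beta;x)\, \tilde s_\lambda(x) = (-1)^{f(\lambda)/2}\, \phi_2\bigl( H(\sqrt{\beta};x)\, s_\lambda(x) \bigr) = \sum_{\mu \gtrsim \lambda} (-1)^{(f(\lambda)-f(\mu))/2}\, \beta^{(|\mu|-|\lambda|)/2}\, \tilde s_\mu(x). \tag{5.91} \]
Thus
\begin{align*}
H(\beta;x) \sum_{\ell(\lambda)\le l} \alpha^{f(\lambda)}\, \tilde s_{\lambda^2}(x)
 &= \sum_{\ell(\lambda)\le l}\ \sum_{\mu \gtrsim \lambda^2} \beta^{|\mu|/2 - |\lambda|}\, \alpha^{f(\lambda)}\, \tilde s_\mu(x) \\
 &= \sum_{\ell(\mu)\le 2l+1} \beta^{|\mu|/2 - |\mu^-|}\, \alpha^{f(\mu^-)}\, \tilde s_\mu(x) \\
 &= \sum_{\ell(\mu)\le 2l+1} \beta^{f(\mu')/2}\, \alpha^{f(\mu)/2}\, \tilde s_\mu(x). \tag{5.92}
\end{align*}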
Analogously, we have the following corollary.

COROLLARY 5.12
For all l > 0,
\[ \sum_{\mu'_2 \le 2l} \alpha^{f(\mu)/2} \beta^{f(\mu')/2}\, \tilde s_\mu(x) = E(\alpha;x)\, E_{U\in U(l)} \det((1-\beta U)^{-1} H(U;x) H(U^\dagger;x)), \tag{5.93} \]
\[ \sum_{\mu'_2 \le 2l+1} \alpha^{f(\mu)/2} \beta^{f(\mu')/2}\, \tilde s_\mu(x) = E(\alpha;x)\, H(\beta;x)\, E_{U\in U(l)} \det(H(U;x) H(U^\dagger;x)). \tag{5.94} \]

Proof. Dualizing the proof of Corollary 5.11, we find
\[ E(\alpha;x) \sum_{\ell(\lambda)\le l} \beta^{f(\lambda')}\, \tilde s_{2\lambda}(x) = \sum_{\mu} \alpha^{f(\mu)/2} \beta^{f(\mu')/2}\, \tilde s_\mu(x), \tag{5.95} \]
where the sum is over µ such that $((\mu')^-)' = \lambda$ with ℓ(λ) ≤ l. But, as in the proof of Theorem 5.8, this condition is simply that $\mu'_2 \le l$. The result follows.
6. Pfaffian identities

As we remarked earlier, many of the earlier proofs of the known identities from Section 5 proceeded via an intermediate Pfaffian form. We sketch analogous proofs of the remaining identities of that section. We recall the definition of the Pfaffian. If A is a 2n × 2n antisymmetric matrix, the Pfaffian pf(A) is defined by
\[ \operatorname{pf}(A) = \frac{1}{2^n\, n!} \sum_{\pi\in S_{2n}} \sigma(\pi) \prod_{1\le j\le n} A_{\pi(2j-1)\pi(2j)}. \tag{6.1} \]
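Definition (6.1) can be evaluated literally for small matrices. The sketch below is a direct, deliberately naive transcription (the sum has (2n)! terms, so it is only usable for tiny n); the function names and the sample 4 × 4 matrix are illustrative assumptions. For a 4 × 4 antisymmetric matrix the result should agree with the closed form $a_{12}a_{34} - a_{13}a_{24} + a_{14}a_{23}$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def sign(perm):
    """Sign of a permutation given as a tuple of 0-based images."""
    s = 1
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                s = -s
    return s

def pfaffian(a):
    """Pfaffian of an antisymmetric 2n x 2n matrix, straight from (6.1)."""
    size = len(a)
    n = size // 2
    total = 0
    for perm in permutations(range(size)):
        term = sign(perm)
        for j in range(n):
            term *= a[perm[2 * j]][perm[2 * j + 1]]
        total += term
    return Fraction(total, 2**n * factorial(n))

# Sample antisymmetric matrix with a12=1, a13=2, a14=3, a23=4, a24=5, a34=6.
upper = {(0, 1): 1, (0, 2): 2, (0, 3): 3, (1, 2): 4, (1, 3): 5, (2, 3): 6}
A = [[0] * 4 for _ in range(4)]
for (i, j), value in upper.items():
    A[i][j], A[j][i] = value, -value

print(pfaffian(A))
```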
In the odd-dimensional case, it is convenient to use the notation
\[ \operatorname{pf}(v; A) \tag{6.2} \]
to denote the Pfaffian of A bordered by v. The fundamental identity that we use is the following theorem.

THEOREM 6.1 (N. de Bruijn [9])
Let X be a measure space, let ρ(x, y) be an antisymmetric function on X × X, and let $\phi_j(x)$ be a sequence of functions on X, such that, for all j and k, the function $\phi_j(x)\rho(x,y)\phi_k(y)$ is absolutely integrable. Then, for n even,
\[ \int_{\vec x} \operatorname{pf}(\rho(x_j,x_k))_{1\le j,k\le n} \det(\phi_j(x_k))_{1\le j,k\le n} = n!\, \operatorname{pf}\Bigl( \int_{x,y} \phi_j(x)\rho(x,y)\phi_k(y) \Bigr)_{1\le j,k\le n}. \tag{6.3} \]
For n odd,
\[ \int_{\vec x} \operatorname{pf}(\rho(x_j,x_k))_{1\le j,k\le n} \det(\phi_j(x_k))_{1\le j,k\le n} = n!\, \operatorname{pf}\Bigl( \Bigl(\int_x \phi_j(x)\Bigr)_{1\le j\le n};\ \Bigl(\int_{x,y} \phi_j(x)\rho(x,y)\phi_k(y)\Bigr)_{1\le j,k\le n} \Bigr), \tag{6.4} \]
where the integrals are all over X.

Proof. For n even, we have
\begin{align*}
\int_{\vec x} \operatorname{pf}(\rho(x_j,x_k))_{1\le j,k\le n} \det(\phi_j(x_k))_{1\le j,k\le n}
 &= \sum_{\pi_1,\pi_2\in S_n} \sigma(\pi_1\pi_2) \int_{\vec x} \prod_{1\le j\le n/2} \rho(x_{\pi_1(2j-1)}, x_{\pi_1(2j)}) \prod_{1\le j\le n} \phi_j(x_{\pi_2(j)}) \\
 &= n! \sum_{\pi\in S_n} \sigma(\pi) \int_{\vec x} \prod_{1\le j\le n/2} \rho(x_{\pi(2j-1)}, x_{\pi(2j)}) \prod_{1\le j\le n} \phi_{\pi(j)}(x_{\pi(j)}) \\
 &= n! \sum_{\pi\in S_n} \sigma(\pi) \prod_{1\le j\le n/2} \int_{x,y} \phi_{\pi(2j-1)}(x)\rho(x,y)\phi_{\pi(2j)}(y). \tag{6.5}
\end{align*}
For n odd, we simply adjoin an extra element ∞ to X and add a new function $\phi_0$ that is zero away from ∞; then we apply the identity for n even.

Remark. As noted in [9], since the proof holds for arbitrary measure spaces, this includes the case of a sum, thus obtaining a Pfaffian analogue of the Cauchy-Binet formula (which is also readily obtained as a special case). As such, this was independently rediscovered in [21], as well as some extensions (e.g., to a q-analogue) which we do not need. It is also worth noting that de Bruijn's formula has applications to random matrices (see [41]).

To any partition λ of length less than or equal to l, we associate an l-tuple $\mu_l(\lambda)$ of distinct integers, defined by
\[ \mu_l(\lambda)_j = \lambda_{l-j} + j \tag{6.6} \]
with 0 ≤ j < l. In terms of this, the Jacobi-Trudi identity becomes
\[ s_\lambda(x) = \det(h_{\mu_l(\lambda)_k - j}(x))_{0\le j,k<l}. \tag{6.7} \]
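The reindexed Jacobi-Trudi determinant (6.7) can be compared with the classical ratio-of-alternants formula for a Schur polynomial in finitely many variables. This is a minimal sketch with exact rational arithmetic; the helper names, the sample variables (1, 2, 3), and the use of cofactor expansion for determinants are all illustrative choices.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import prod

def h(m, xs):
    """Complete symmetric polynomial h_m(xs); zero for m < 0."""
    if m < 0:
        return Fraction(0)
    return sum((prod(c) for c in combinations_with_replacement(xs, m)), Fraction(0))

def det(matrix):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if not matrix:
        return Fraction(1)
    total = Fraction(0)
    for col, entry in enumerate(matrix[0]):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * entry * det(minor)
    return total

def schur_jacobi_trudi(lam, l, xs):
    """det(h_{mu_k - j}) with mu_j = lambda_{l-j} + j, as in (6.6)-(6.7)."""
    padded = list(lam) + [0] * (l - len(lam))
    mu = [padded[l - 1 - j] + j for j in range(l)]   # lambda is 1-indexed in the text
    return det([[h(mu[k] - j, xs) for k in range(l)] for j in range(l)])

def schur_bialternant(lam, xs):
    """Ratio of alternants det(x_i^(lambda_j + n - j)) / det(x_i^(n - j))."""
    n = len(xs)
    padded = list(lam) + [0] * (n - len(lam))
    numerator = det([[x ** (padded[j] + n - 1 - j) for j in range(n)] for x in xs])
    denominator = det([[x ** (n - 1 - j) for j in range(n)] for x in xs])
    return numerator / denominator

xs = [Fraction(1), Fraction(2), Fraction(3)]
print(schur_jacobi_trudi((2, 1), 2, xs), schur_bialternant((2, 1), xs))
```

Padding λ with zeros (taking l larger than ℓ(λ)) should not change the value, which the same functions can confirm.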
We observe that this has the appropriate form for the application of Theorem 6.1, where we take
\[ \phi_j(\mu_k) = \phi_j(\mu_k; x) := h_{\mu_k - j}(x). \tag{6.8} \]
Thus, for instance, we have
\begin{align*}
\sum_{\ell(\lambda)\le l} s_\lambda(x) s_\lambda(y)
 &= \frac{1}{l!} \sum_{\mu_1,\mu_2,\ldots,\mu_l\in N} \det(\phi_j(\mu_k;x)) \det(\phi_j(\mu_k;y)) \\
 &= \det\Bigl( \sum_{\mu\in N} \phi_j(\mu;x)\, \phi_k(\mu;y) \Bigr)_{0\le j,k<l} \\
 &= \det(g_{j-k}(x;y))_{0\le j,k<l}. \tag{6.9}
\end{align*}
This is, of course, precisely I. Gessel's original proof in [13], slightly restated. Similarly, the identity for $\boxdot$ follows analogously.

To handle the involution cases, we note the following. Define a function F(µ) on l-tuples of nonnegative integers as follows: if µ is increasing (and thus $\mu = \mu_l(\lambda)$ for some λ), then
\[ F(\mu) = \alpha^{f(\lambda)} \beta^{f(\lambda')}; \tag{6.10} \]
otherwise, F(µ) is alternating under permutations of µ.
LEMMA 6.2 [20] If l is even, then
F(µ) = pf(F(µi µk ))0≤i,k
(6.11)
F(µ) = − pf F(µi )0≤i
(6.12)
while if l is odd, then
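At this point we can write
\[ \sum_{\ell(\lambda)\le l} \alpha^{f(\lambda)} \beta^{f(\lambda')}\, s_\lambda(x) = \frac{1}{l!} \sum_{\mu_1,\mu_2,\ldots,\mu_l\in N} F(\mu) \det(\phi_j(\mu_k;x))_{0\le j,k<l}. \tag{6.13} \]
For l even, this becomes
\[ \frac{1}{l!} \sum_{\mu_1,\mu_2,\ldots,\mu_l\in N} \operatorname{pf}(F(\mu_i\mu_k))_{0\le i,k<l} \det(\phi_j(\mu_k;x))_{0\le j,k<l} = \operatorname{pf}\Bigl( \sum_{\mu,\nu} F(\mu\nu)\, \phi_j(\mu)\, \phi_k(\nu) \Bigr)_{0\le j,k<l}. \tag{6.14} \]
For l odd, we must border the Pfaffian with
\[ -\Bigl( \sum_{\mu} F(\mu)\, \phi_j(\mu) \Bigr)_{0\le j<l}. \tag{6.15} \]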
In three cases (corresponding to , , ), this Pfaffian simplifies. We consider only in detail (in particular since the resulting Pfaffians for the other cases are more complicated). For l even, after simplifying, we find
\[ M(j,k) := \sum_{\mu,\nu} F(\mu,\nu)\, \phi_j(\mu)\, \phi_k(\nu) = M_0(j,k) + M_1(j,k), \tag{6.16} \]
where
\[ M_0(j,k) = \frac12 \sum_{0\le d<(k-j)} \Bigl[ (1+\alpha^2)\, i_{2d+1-(k-j)}(x) + \alpha\bigl( i_{2d-(k-j)}(x) + i_{2d+2-(k-j)}(x) \bigr) \Bigr], \tag{6.17} \]
while M1 ( j, k) = 0 unless k − j is odd, in which case M1 ( j, k) = (−1) j
1 − α2 H (1; x)H (−1; x) = (−1) j H (1; α, x)H (−1; α, x). 2 (6.18)
Now, $M_1$ has rank 2, so pf(M) can be simplified via appropriate row and column operations. To be precise, if we subtract row/column i from row/column i + 2 in decreasing order, $M_1$ is now zero except when (j, k) = (0, 1) or (1, 0). Thus
\[ \sum_{\ell(\lambda)\le l} \alpha^{f(\lambda)}\, s_\lambda(x) = \operatorname{pf}(M_0) + \tfrac12\, H(1;\alpha,x)\, H(-1;\alpha,x)\, \operatorname{pf}(M'_0), \tag{6.19} \]
where
\[ M'_0(j,k) = i_{k-j-1}(x) - i_{k-j+1}(x). \tag{6.20} \]
Our earlier formula follows from the next identity, due to B. Gordon [14]. 6.3 Let x j be an odd function of j ∈ Z. Then X pf(x k− j )0≤ j,k<2l = det x j−k+2m+1 THEOREM
0≤m≤k
pf((z j−l )0≤ j<2l+1 ; (x k− j )0≤k, j<2l+1 ) = det
X 0≤m≤k
z+
0≤ j,k
(6.21)
,
1 x j−k+2m+1 z
− (x j−k+2m + x j−k+2m+2 )
. 0≤ j,k
(6.22)
Remark. In addition to the proof in [14], one can also prove this by showing that the second Pfaffian gives an appropriate orthogonal polynomial or by applying a special case of de Bruijn's formula (with X the unit circle).

The computation for l odd is analogous.

7. More increasing subsequence problems

It turns out that the Schur function sums considered above can in many cases be given direct interpretations in terms of suitably defined increasing subsequences of random multisets. Let 1 and 2 be totally ordered sets, and let $W_1$ ⊂ 1 and $W_2$ ⊂ 2 be arbitrarily chosen subsets, with complements $\overline{W}_1$ and $\overline{W}_2$. It is also convenient to use the corresponding indicator functions $\chi_1$ and $\chi_2$.

Definition 2
A $(W_1, W_2)$-increasing sequence in 1 × 2 is a sequence $(x_i, y_i)$ such that
\[ x_i \le x_{i+1} \quad\text{and}\quad y_i \le y_{i+1} \tag{7.1} \]
for all i and such that
\[ x_i = x_{i+1} \implies x_i \in W_1, \tag{7.2} \]
\[ y_i = y_{i+1} \implies y_i \in W_2. \tag{7.3} \]
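Conditions (7.1)-(7.3) translate directly into a dynamic program over the points sorted lexicographically. The following sketch is one possible implementation, not the paper's: a finite multiset is represented as a Python list of pairs with repetitions, `w1` and `w2` are sets playing the roles of $W_1$ and $W_2$, and the function names are hypothetical.

```python
def can_follow(a, b, w1, w2):
    """True if b may directly follow a in a (W1, W2)-increasing sequence."""
    (x1, y1), (x2, y2) = a, b
    if x1 > x2 or y1 > y2:
        return False
    if x1 == x2 and x1 not in w1:      # ties in x only allowed inside W1
        return False
    if y1 == y2 and y1 not in w2:      # ties in y only allowed inside W2
        return False
    return True

def longest_increasing(points, w1, w2):
    """Length of the longest (W1, W2)-increasing subsequence of a multiset."""
    pts = sorted(points)               # any valid sequence respects this order
    best = []
    for j, b in enumerate(pts):
        best.append(
            1 + max(
                (best[i] for i in range(j) if can_follow(pts[i], b, w1, w2)),
                default=0,
            )
        )
    return max(best, default=0)

points = [(1, 1), (1, 2), (2, 2), (2, 2), (3, 1)]
everything = {1, 2, 3}
print(longest_increasing(points, everything, everything))  # weakly increasing case
print(longest_increasing(points, set(), set()))            # strictly increasing case
```

The quadratic running time is fine for illustration; faster methods exist for the purely weak and purely strict cases.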
Thus $W_i$ specifies where the ith coordinate is allowed to weakly increase. In particular, a (1 , 2 )-increasing sequence is weakly increasing, while a (∅, ∅)-increasing sequence is strictly increasing. We also have a notion of a $(W_1, W_2)$-decreasing sequence, in which the inequality for y is reversed.

Definition 3
A finite multiset M from 1 × 2 is $(W_1, W_2)$-compatible if for all i and j such that $\chi_1(i) \ne \chi_2(j)$, (i, j) occurs at most once in M.

If M is a $(W_1, W_2)$-compatible multiset from 1 × 2, define $l_{W_1 W_2}(M)$ to be the length of the longest $(W_1, W_2)$-increasing subsequence of M (with, of course, the condition that an element may appear in the sequence at most as many times as it appears in M). We also require the notation $l^{-}_{W_1 W_2}(M)$ for the length of the longest $(W_1, W_2)$-decreasing subsequence of M.

We now define, for each symmetry type and for certain choices of $(W_1, W_2)$ for each symmetry type, a random $(W_1, W_2)$-compatible multiset and thus a random longest increasing subsequence length. We define the multiset by specifying for each (x, y) the distribution of the multiplicity w(x, y) of (x, y) in M; the multiplicities are independent unless otherwise specified. Each multiset distribution also depends on certain parameters. For convenience, we use the following notation for the distributions that appear: g(q) for the geometric distribution with parameter q < 1, b(q) for a variable that is zero with probability 1/(1 + q) and 1 with probability q/(1 + q), and g′(α, q) for a variable with
\[ \Pr(g'(\alpha,q) = k) = \frac{1-q^2}{1+\alpha q}\, \alpha^{k \bmod 2}\, q^k. \tag{7.4} \]
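A quick numerical check that (7.4) defines a probability distribution: the even terms sum to $1/(1-q^2)$ and the odd terms to $\alpha q/(1-q^2)$, so the total weight is 1. The snippet below is a small sketch; the function name, the sample parameters, and the truncation at 200 terms are arbitrary illustrative choices.

```python
def g_prime_pmf(k, alpha, q):
    """Pr(g'(alpha, q) = k) as given in (7.4)."""
    return (1 - q * q) / (1 + alpha * q) * alpha ** (k % 2) * q**k

alpha, q = 0.7, 0.5
total = sum(g_prime_pmf(k, alpha, q) for k in range(200))
probability_zero = g_prime_pmf(0, alpha, q)
print(total, probability_zero)
```

The value at k = 0, namely $(1-q^2)/(1+\alpha q)$, is the kind of factor that appears in the normalizing products Z for the involution cases.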
We denote the set of multisets from a set S by M(S).

Case . The parameters are subsets W, W′ ⊂ Z+ and nonnegative sequences $q_i$ and $q'_i$ such that $q_i q'_j < 1$ whenever $\chi_1(i) = \chi_2(j)$ and such that
\[ Z_{W,W'}(q;q') := \prod_{\chi(i)=\chi'(j)} (1 - q_i q'_j) \prod_{\chi(i)\ne\chi'(j)} (1 + q_i q'_j)^{-1} \ne 0. \tag{7.5} \]
M ∈ M(Z+ × Z+) is chosen as follows:
\[ \chi(i) = \chi'(j):\quad w(i,j) \sim g(q_i q'_j), \tag{7.6} \]
\[ \chi(i) \ne \chi'(j):\quad w(i,j) \sim b(q_i q'_j). \tag{7.7} \]
Denote the variable $l_{WW'}(M)$ by $l_{WW'}(q;q')$.

Case $\boxdot$. The parameters are subsets W, W′ ⊂ Z symmetric under negation and nonnegative sequences $q_i$ and $q'_i$ such that $q_i q'_j < 1$ whenever χ(i) = χ′(j) and such that
\[ Z^{\boxdot}_{WW'}(q;q') := \prod_{\chi(i)=\chi'(j)} (1 - q_i q'_j)^2 \prod_{\chi(i)\ne\chi'(j)} (1 + q_i q'_j)^{-2} \ne 0. \tag{7.8} \]
M ∈ M(Z × Z) is chosen as follows: w(i, j) = 0 if i = 0 or j = 0; otherwise,
\[ \chi(i) = \chi'(j):\quad w(i,j) = w(-i,-j) \sim g(q_{|i|}\, q'_{|j|}), \tag{7.9} \]
\[ \chi(i) \ne \chi'(j):\quad w(i,j) = w(-i,-j) \sim b(q_{|i|}\, q'_{|j|}). \tag{7.10} \]
Denote the variable $l_{WW'}(M)$ by $l^{\boxdot}_{WW'}(q;q')$.

Case . The parameters are a subset W ⊂ Z+, a real number α ≥ 0, and a nonnegative sequence $q_i$ such that $q_i < 1$ for all i, $\alpha q_i < 1$ when χ(i) = 1, and
\[ Z_W(q;\alpha) := \prod_{\chi(i)=1} (1 - \alpha q_i) \prod_{\chi(i)=0} (1 + \alpha q_i)^{-1} (1 - q_i^2) \cdot \prod_{i<j,\ \chi(i)=\chi(j)} (1 - q_i q_j) \prod_{i<j,\ \chi(i)\ne\chi(j)} (1 + q_i q_j)^{-1} \ne 0. \tag{7.11} \]
M ∈ M(Z+ × Z+) is chosen as follows. For i ≠ j,
\[ \chi(i) = \chi(j):\quad w(i,j) = w(j,i) \sim g(q_i q_j), \tag{7.12} \]
\[ \chi(i) \ne \chi(j):\quad w(i,j) = w(j,i) \sim b(q_i q_j). \tag{7.13} \]
For i = j,
\[ \chi(i) = 1:\quad w(i,i) \sim g(\alpha q_i), \tag{7.14} \]
\[ \chi(i) = 0:\quad w(i,i) \sim g'(\alpha, q_i). \tag{7.15} \]
Denote the variable $l_{WW}(M)$ by $l_W(q;\alpha)$.

Case . The parameters are a subset W ⊂ Z+, a real number β ≥ 0, and a nonnegative sequence $q_i$ such that $q_i < 1$ for all i, $\beta q_i < 1$ when χ(i) = 0, and
\[ Z_W(q;\beta) := \prod_{\chi(i)=0} (1 - \beta q_i) \prod_{\chi(i)=1} (1 + \beta q_i)^{-1} (1 - q_i^2) \cdot \prod_{i<j,\ \chi(i)=\chi(j)} (1 - q_i q_j) \prod_{i<j,\ \chi(i)\ne\chi(j)} (1 + q_i q_j)^{-1} \ne 0. \tag{7.16} \]
M ∈ M(Z+ × Z−) is chosen as follows. For i ≠ −j,
\[ \chi(i) = \chi(-j):\quad w(i,-j) = w(j,-i) \sim g(q_i q_{-j}), \tag{7.17} \]
\[ \chi(i) \ne \chi(-j):\quad w(i,-j) = w(j,-i) \sim b(q_i q_{-j}). \tag{7.18} \]
For i = −j,
\[ \chi(i) = 0:\quad w(i,-i) \sim g(\beta q_i), \tag{7.19} \]
\[ \chi(i) = 1:\quad w(i,-i) \sim g'(\beta, q_i). \tag{7.20} \]
Denote the variable $l_{W,-W}(M)$ by $l_W(q;\beta)$.

Case . The parameters are a subset W ⊂ Z symmetric under negation, real numbers α ≥ 0 and β ≥ 0, and a nonnegative sequence $q_i$ such that $q_i, \alpha q_i < 1$ when χ(i) = 1, $q_i, \beta q_i < 1$ when χ(i) = 0, and
\[ Z_W(q;\alpha,\beta) := \prod_{\chi(i)=0} (1 - \beta q_i)(1 + \alpha q_i)^{-1} \cdot \prod_{\chi(i)=1} (1 + \beta q_i)^{-1} (1 - \alpha q_i)\, Z_{WW}(q;q) \ne 0. \tag{7.21} \]
M ∈ M(Z × Z) is chosen as follows: w(i, j) = 0 if i = 0 or j = 0; otherwise, for |i| ≠ |j|,
\[ \chi(i) = \chi(j):\quad w(i,j) = w(-i,-j) = w(j,i) = w(-j,-i) \sim g(q_{|i|}\, q_{|j|}), \tag{7.22} \]
\[ \chi(i) \ne \chi(j):\quad w(i,j) = w(-i,-j) = w(j,i) = w(-j,-i) \sim b(q_{|i|}\, q_{|j|}); \tag{7.23} \]
for i = j,
\[ \chi(i) = 0:\quad w(i,i) = w(-i,-i) \sim g'(\alpha, q_{|i|}), \tag{7.24} \]
\[ \chi(i) = 1:\quad w(i,i) = w(-i,-i) \sim g(\alpha q_{|i|}); \tag{7.25} \]
and, for i = −j,
\[ \chi(i) = 0:\quad w(i,-i) = w(-i,i) \sim g(\beta q_{|i|}), \tag{7.26} \]
\[ \chi(i) = 1:\quad w(i,-i) = w(-i,i) \sim g'(\beta, q_{|i|}). \tag{7.27} \]
Denote the variable $l_{WW}(M)$ by $l_W(q;\alpha,\beta)$.

Remark. The conditions on the parameters are simply (1) that the various probability distributions are defined and (2) that M be finite with probability 1. In each case the quantity $Z^{\circledast}$ gives the probability that the multiset is empty.
To relate these variables to our Schur function sums, we need the notion of a "super-Schur" (also known as a "hook Schur") function. If $x_i$ and $y_i$ are countable sets of variables, we define $s_\lambda(x/y)$ to be the image of $s_\lambda$ under the homomorphism
\[ H(t;z) \mapsto H(t;x/y) := H(t;x)\, E(t;y) \tag{7.28} \]
(see [29, Example 3.21], but note that we have used a slightly different sign convention). In particular, since this is defined via a homomorphism, all of our identities are valid for such functions; for example,
\[ \sum_{\ell(\lambda)\le l} s_\lambda(x/y)\, s_\lambda(z/w) = E_{U\in U(l)} \det(H(U;x) E(U;y) H(U^\dagger;z) E(U^\dagger;w)). \tag{7.29} \]
Then the relation to our symmetric function identities is as follows.

THEOREM 7.1
For any valid choices of parameters,
\[ \Pr(l_{WW'}(q;q') \le l) = Z_{WW'}(q;q') \sum_{\ell(\lambda)\le l} s_\lambda(q_W/q_{\overline{W}})\, s_\lambda(q'_{W'}/q'_{\overline{W'}}), \tag{7.30} \]
\[ \Pr(l^{\boxdot}_{WW'}(q;q') \le l) = Z^{\boxdot}_{WW'}(q;q') \sum_{\ell(\lambda)\le l} \tilde s_\lambda(q_W/q_{\overline{W}})\, \tilde s_\lambda(q'_{W'}/q'_{\overline{W'}}), \tag{7.31} \]
\[ \Pr(l_W(q;\alpha) \le l) = Z_W(q;\alpha) \sum_{\ell(\lambda)\le l} \alpha^{f(\lambda)}\, s_\lambda(q_W/q_{\overline{W}}), \tag{7.32} \]
\[ \Pr(l_W(q;\beta) \le l) = Z_W(q;\beta) \sum_{\ell(\lambda)\le l} \beta^{f(\lambda')}\, s_\lambda(q_W/q_{\overline{W}}), \tag{7.33} \]
\[ \Pr(l_W(q;\alpha,\beta) \le l) = Z_W(q;\alpha,\beta) \sum_{\ell(\lambda)\le l} \alpha^{f(\lambda)/2} \beta^{f(\lambda')/2}\, \tilde s_\lambda(q_W/q_{\overline{W}}). \tag{7.34} \]

Remark. These processes are generalizations of processes studied by Johansson [24]. In particular, he studies the process $l_{Z^+ Z^+}$ in the special case
\[ q_i = \begin{cases} \sqrt{q}, & 1\le i\le N, \\ 0, & i > N, \end{cases} \qquad q'_i = \begin{cases} \sqrt{q}, & 1\le i\le M, \\ 0, & i > M, \end{cases} \tag{7.35} \]
as well as the process $l_{Z^+}$ in the special case
\[ \alpha = 1, \qquad q_i = \begin{cases} \sqrt{q}, & 1\le i\le N, \\ 0, & i > N. \end{cases} \tag{7.36} \]
In both cases, he also studies an appropriate limit as q → 1.
To prove this theorem, we need a generalization of the Knuth correspondences in [25]. Let $\Omega$ be a totally ordered set, let $W$ be a chosen subset, and let $\lambda$ be a partition.

Definition 4
An $(\Omega, W)$-bitableau $T$ of shape $\lambda$ is a function from the diagram of $\lambda$ to $\Omega$ which is weakly increasing along each row and column, such that any element of $W$ appears at most once in each column and such that any element of $\overline{W}$ appears at most once in each row.

Remark. This is essentially the same as the notion of the bitableau given in [29, Example 5.23] (see also [4]). We denote the set of such bitableaux as $B_\lambda(\Omega, W)$ and observe (see [29]) that if $x_i$ is a sequence of indeterminates, then, for any subset $W \subset \mathbb{Z}^+$,
\[
\sum_{T \in B_\lambda(\mathbb{Z}^+, W)} x^T = s_\lambda(x_W/x_{\overline{W}}), \tag{7.37}
\]
where for a bitableau $T$, $x^T$ is the product of $x_i^{m_i}$, where $i$ appears in $T$ exactly $m_i$ times.

THEOREM 7.2
Given a pair $\Omega_1$ and $\Omega_2$ of totally ordered sets, and respective subsets $W_1$ and $W_2$, there exists a bijective correspondence $K_{W_1W_2}$ that, given a $(W_1, W_2)$-compatible multiset $M$, produces a pair $(P, Q)$ of bitableaux of the same shape such that $P$ is an $(\Omega_1, W_1)$-bitableau and $Q$ is an $(\Omega_2, W_2)$-bitableau. A given value appears in the first (resp., second) tableau exactly as many times as it appears as the first (resp., second) coordinate in $M$.
Proof This is essentially shown in [5] (see also [35]). While the references deal only with the case in which every element of W is greater than every element of W , the proofs carry over directly. Remark 7.2.1 We have switched the order of tableaux usually used in the Robinson-Schensted correspondence. Remark 7.2.2 The special cases K 1 2 and K 1 ∅ are known as the Knuth correspondence and the dual Knuth correspondence (see [25]), respectively, while the special case K ∅∅ is known as the Burge correspondence (see [8], [12]).
Since the above correspondence can be reduced to the Robinson-Schensted correspondence (see [5]), all of the usual properties carry over. We observe that the operations of inflation and deflation carry over to bitableaux, and thus one can define the dual $P^*$ of a bitableau. That $P^*$ is indeed a bitableau follows from the next theorem.

THEOREM 7.3
Let $\iota_1 : \Omega_1 \to \Omega_1'$ and $\iota_2 : \Omega_2 \to \Omega_2'$ be order-reversing maps; also, for a multiset $M$, define $M^t$ to be the multiset in $\Omega_2 \times \Omega_1$ obtained by exchanging the coordinates. For any finite multiset $M \subset \Omega_1 \times \Omega_2$, the following are equivalent:
\begin{align}
K_{W_1W_2}(M) &= (P, Q), \tag{7.38}\\
K_{W_1W_2}((\iota_1 \times \iota_2)(M)) &= (\iota_1(P^*), \iota_2(Q^*)), \tag{7.39}\\
K_{W_2W_1}(M^t) &= (Q, P), \tag{7.40}\\
K_{W_2W_1}((\iota_2 \times \iota_1)(M^t)) &= (\iota_2(Q^*), \iota_1(P^*)), \tag{7.41}\\
K_{\overline{W_1}\,\overline{W_2}}((\iota_1 \times 1)(M)) &= (\iota_1((P^*)^t), Q^t), \tag{7.42}\\
K_{\overline{W_1}\,\overline{W_2}}((1 \times \iota_2)(M)) &= (P^t, \iota_2((Q^*)^t)), \tag{7.43}\\
K_{\overline{W_2}\,\overline{W_1}}((\iota_2 \times 1)(M^t)) &= (\iota_2((Q^*)^t), P^t), \tag{7.44}\\
K_{\overline{W_2}\,\overline{W_1}}((1 \times \iota_1)(M^t)) &= (Q^t, \iota_1((P^*)^t)). \tag{7.45}
\end{align}
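For ordinary permutations (all multiplicities equal to one, so that the choice of subsets plays no role), two of these symmetries reduce to classical facts about the Robinson-Schensted correspondence: inverting a permutation exchanges the two tableaux, as in (7.40), and reversing the word transposes the insertion tableau. The following Python sketch checks both statements on $S_5$. It assumes the usual row-insertion convention (insertion tableau first, recording tableau second), which is the opposite of the order used in this section; the helper names `rs`, `transpose`, `inverse`, and `check` are illustrative and are not taken from the paper.

```python
from itertools import permutations


def rs(word):
    """Robinson-Schensted by row insertion; returns (P, Q) as lists of rows."""
    P, Q = [], []
    for step, x in enumerate(word, start=1):
        row = 0
        while True:
            if row == len(P):
                # x falls off the bottom: start a new row
                P.append([x])
                Q.append([step])
                break
            r = P[row]
            # position of the leftmost entry strictly larger than x, if any
            j = next((j for j, y in enumerate(r) if y > x), None)
            if j is None:
                r.append(x)
                Q[row].append(step)
                break
            r[j], x = x, r[j]  # bump the old entry into the next row
            row += 1
    return P, Q


def transpose(T):
    """Transpose a tableau given as a list of rows."""
    if not T:
        return []
    return [[row[j] for row in T if j < len(row)] for j in range(len(T[0]))]


def inverse(p):
    """Inverse of a permutation of {1, ..., n} in one-line notation."""
    inv = [0] * len(p)
    for i, v in enumerate(p, start=1):
        inv[v - 1] = i
    return tuple(inv)


def check(n):
    """Verify both symmetries for every permutation in S_n."""
    for p in permutations(range(1, n + 1)):
        P, Q = rs(p)
        assert rs(inverse(p)) == (Q, P)        # inversion swaps the tableaux
        assert rs(p[::-1])[0] == transpose(P)  # reversal transposes P
    return True


print(check(5))
```

This covers only the permutation case; the bitableau versions with repeated entries, and the role of the dual $P^*$, are not exercised by the sketch.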
Remark. For the cases $K_{\Omega_1\Omega_2}$ and $K_{\emptyset\emptyset}$, this theorem appears in [12] and [43].

When $W_1 = W_2 = W$ and $M = M^t$, the two bitableaux are the same; as in the Robinson-Schensted correspondence, we can describe the number of odd-length columns.

THEOREM 7.4
Let $M$ be a finite multiset in $\Omega \times \Omega$ with $M = M^t$. If $\lambda$ is the common shape of $K_{WW}(M)$, then $f(\lambda')$ is equal to the sum of (1) the number of elements of $M$ of the form $(x, x)$ with $x \in W$ and (2) the number of $x \in \overline{W}$ for which $(x, x)$ appears an odd number of times in $M$.
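In the simplest instance, an involution in $S_n$ viewed as a symmetric multiset with all multiplicities one, the statement reduces to the classical fact that the number of odd-length columns of the Robinson-Schensted shape equals the number of fixed points. The Python sketch below checks this by brute force; it is an illustration of that special case only, under the assumption of standard row insertion, and the function names are hypothetical.

```python
from itertools import permutations


def rs_shape(word):
    """Shape of the Robinson-Schensted tableau of a word (row insertion)."""
    rows = []
    for x in word:
        for r in rows:
            j = next((j for j, y in enumerate(r) if y > x), None)
            if j is None:
                r.append(x)
                break
            r[j], x = x, r[j]  # bump into the next row
        else:
            rows.append([x])   # bumped through every row: open a new one
    return [len(r) for r in rows]


def odd_columns(shape):
    """Number of columns of odd length in the diagram of `shape`."""
    if not shape:
        return 0
    cols = [sum(1 for r in shape if r > j) for j in range(shape[0])]
    return sum(c % 2 for c in cols)


def involutions(n):
    """All involutions of {1, ..., n} in one-line notation."""
    for p in permutations(range(1, n + 1)):
        if all(p[p[i] - 1] == i + 1 for i in range(n)):
            yield p


def check(n):
    """Compare odd columns with fixed points for every involution in S_n."""
    for p in involutions(n):
        fixed = sum(1 for i, v in enumerate(p, start=1) if i == v)
        assert odd_columns(rs_shape(p)) == fixed
    return True


print(check(6))
```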
Proof
This was known for $K_{\Omega_1\Omega_2}$ (see [25]) and for $K_{\emptyset\emptyset}$ (see [8]); it thus follows in general.

Let $M$ be a $(W_1, W_2)$-compatible multiset from $\Omega_1 \times \Omega_2$. We observe that $l_{W_1W_2}(M) \le l$ whenever $M$ can be written as the union of $l$ $(W_1, W_2)$-decreasing sequences. (Again, we use $(W_1, W_2)$-compatibility.) This motivates the notation $l^{(k)}_{W_1W_2}(M)$ for the size of the largest submultiset $M'$ of $M$ with $l^{-}_{W_1W_2}(M') \le k$, and similarly for $l^{-(k)}_{W_1W_2}(M)$.

THEOREM 7.5
Let $M$ be a finite $(W_1, W_2)$-compatible multiset from $\Omega_1 \times \Omega_2$. If $\lambda$ is the common shape of $K_{W_1W_2}(M)$, then
\[
\sum_{1 \le i \le k} \lambda_i = l^{(k)}_{W_1W_2}(M), \qquad \sum_{1 \le i \le k} \lambda'_i = l^{-(k)}_{W_1W_2}(M). \tag{7.46}
\]
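For permutations, (7.46) is Greene's theorem, and it can be checked by exhaustive search on small cases. The Python sketch below compares partial row sums (and column sums) of the Robinson-Schensted shape with the size of the largest subsequence containing no decreasing (resp., increasing) subsequence longer than $k$. It is a brute-force illustration under the assumption of standard row insertion, not the general bitableau correspondence; all function names are hypothetical.

```python
from itertools import combinations, permutations


def rs_shape(word):
    """Shape of the Robinson-Schensted tableau of a word (row insertion)."""
    rows = []
    for x in word:
        for r in rows:
            j = next((j for j, y in enumerate(r) if y > x), None)
            if j is None:
                r.append(x)
                break
            r[j], x = x, r[j]
        else:
            rows.append([x])
    return [len(r) for r in rows]


def longest_decreasing(seq):
    """Length of the longest strictly decreasing subsequence of seq."""
    best = []
    for i, x in enumerate(seq):
        best.append(1 + max((best[j] for j in range(i) if seq[j] > x), default=0))
    return max(best, default=0)


def greene(word, k):
    """Largest size of a subsequence with no decreasing subsequence longer than k."""
    n = len(word)
    for size in range(n, -1, -1):
        for idx in combinations(range(n), size):
            if longest_decreasing([word[i] for i in idx]) <= k:
                return size


def check(n):
    """Verify both halves of the permutation case of (7.46) on S_n."""
    for p in permutations(range(1, n + 1)):
        shape = rs_shape(p)
        cols = [sum(1 for r in shape if r > j) for j in range(shape[0])]
        negated = [-x for x in p]  # swaps increasing and decreasing
        for k in range(1, n + 1):
            assert sum(shape[:k]) == greene(p, k)
            assert sum(cols[:k]) == greene(negated, k)
    return True


print(check(5))
```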
Remark. For the Robinson-Schensted correspondence, this was proved in [18]; the extension to the general case is in [5]. In particular, $\lambda_1$ gives the length of the longest $(W_1, W_2)$-increasing sequence.

Proof of Theorem 7.1
We generalize the argument of [24]. Consider, for instance, the case of $l_W(q;\alpha)$. In this case, the probability that our random multiset $M(q;\alpha)$ is equal to a given fixed multiset $M$ is
\[
\Pr(M(q;\alpha) = M) = \Pr(M(q;\alpha) = \emptyset)\, \alpha^{f(\lambda')} q^P, \tag{7.47}
\]
where $K_{WW}(M) = (P, P)$ with $P$ of shape $\lambda$. Thus
\begin{align}
\Pr(l_W(q;\alpha) \le l) &= \sum_{\lambda_1 \le l}\ \sum_{\lambda_{WW}(M) = \lambda} \Pr(M(q;\alpha) = M) \notag\\
&= \Pr(M(q;\alpha) = \emptyset) \sum_{\ell(\lambda) \le l}\ \sum_{P \in B_{\lambda'}(\mathbb{Z}^+, W)} \alpha^{f(\lambda)} q^P \notag\\
&= Z_W(q;\alpha) \sum_{\ell(\lambda) \le l} \alpha^{f(\lambda)} s_\lambda(q_{\overline{W}}/q_W). \tag{7.48}
\end{align}
Similar arguments hold for types and . For · and , we also need the fact that s˜λ (x/y) can be defined in terms of a sum over self-dual bitableaux. Clearly, it suffices to show that the usual correspondence between self-dual tableaux and pairs of tableaux extends to bitableaux. Define a domino (, W )-bitableau of shape λ to be a tiling of the diagram of λ with labeled dominos such that the labels increase weakly in each row and column, and such that (1) for x ∈ W no column hits more than one domino labeled x and (2) for x ∈ / W no row hits more than one domino labeled x. We readily verify (using the fact that deflation preserves the bitableau property) that we have a bijection between self-dual bitableaux and domino bitableaux.
To proceed from domino bitableaux to pairs of ordinary bitableaux, it suffices to show that the usual correspondence preserves the bitableau property. But this follows from the fact that the correspondence is valid for column-strict tableaux (and thus for row-strict tableaux by symmetry), as remarked in [37]. Combining these bijections, we obtain the desired correspondence and thus the theorem for the two cases involving $\tilde{s}_\lambda$.

Theorem 7.1 motivates the following change of notation:
\begin{align}
l_{WW'}(q;q') &\to l(q_{\overline{W}}/q_W;\ q'_{\overline{W'}}/q'_{W'}), \tag{7.49}\\
l^{\,\cdot}_{WW'}(q;q') &\to l^{\,\cdot}(q_{\overline{W}}/q_W;\ q'_{\overline{W'}}/q'_{W'}), \tag{7.50}\\
l_W(q;\alpha) &\to l(q_{\overline{W}}/q_W;\ \alpha), \tag{7.51}\\
l_W(q;\beta) &\to l(q_{\overline{W}}/q_W;\ \beta), \tag{7.52}\\
l_W(q;\alpha,\beta) &\to l(q_{\overline{W}}/q_W;\ \alpha, \beta). \tag{7.53}
\end{align}
It is somewhat startling that, even though the new notation in principle discards information, the resulting distributions are in fact exactly the same. For the involution cases, the integral formulae of Section 5 imply further the following corollary.

COROLLARY 7.6
For any valid parameter choices, the following pairs of random variables have the same distribution:
\begin{align}
l(q/r;\alpha) &\sim l(q/\alpha, r; 0), \tag{7.54}\\
\lfloor \tfrac{1}{2}\, l(q/r;\beta) \rfloor &\sim \tfrac{1}{2}\, l(q/r; 0), \tag{7.55}\\
\lceil \tfrac{1}{2}\, l(q/r;\beta) \rceil &\sim \tfrac{1}{2}\, l(\beta, q/r; 0), \tag{7.56}\\
\lfloor \tfrac{1}{2}\, l(q/r;\alpha,\beta) \rfloor &\sim l(q/\alpha, r;\ q/r), \tag{7.57}\\
\lceil \tfrac{1}{2}\, l(q/r;\alpha,\beta) \rceil &\sim l(\beta, q/\alpha, r;\ q/r), \tag{7.58}
\end{align}
where, for instance, $l(\alpha, q/r; 0)$ corresponds to a process in which $\alpha$ has been inserted into the sequence $q$.

Proof
We have
\[
\Pr(l(q/r;\alpha) \le l) = Z(q/r;\alpha)\, \mathbf{E}_{U \in O(l)} \det((1 + \alpha U) H(U; q/r)) = Z(q/\alpha, r; 0)\, \mathbf{E}_{U \in O(l)} \det(H(U; q/\alpha, r)) \tag{7.59}
\]
and similarly for the other cases.

Remark 7.6.1
By the arguments of Theorem 5.6, one can prove something stronger; namely, the joint distribution of the lengths of the odd-numbered rows is the same for $(q/r;\alpha)$ and for $(q/\alpha, r; 0)$, and similarly for the other cases. For the even-numbered rows, the argument of Theorem 5.8 shows that the joint distribution is independent of $\alpha$. In the Laguerre limit (see below), we obtain the following fact. Consider the "matrix ensemble" with joint eigenvalue density (on $[0, \infty)$) proportional to
\[
\operatorname{pf}\bigl(\operatorname{sgn}(x_k - x_j)\, e^{A|x_k - x_j|}\bigr)_{1 \le j,k \le 2n} \prod_{1 \le j < k \le 2n} (x_k - x_j) \prod_{1 \le j \le 2n} e^{-C x_j}, \tag{7.60}
\]
where $A$ and $C$ are parameters with $C > \max(A, 0)$; note that if $x_1 < x_2 < \cdots < x_{2n}$, then
\[
\operatorname{pf}\bigl(\operatorname{sgn}(x_k - x_j)\, e^{A|x_k - x_j|}\bigr)_{1 \le j,k \le 2n} = \exp\Bigl(A \sum_{1 \le j \le 2n} (-1)^j x_j\Bigr). \tag{7.61}
\]
Then the joint distribution of the second, fourth, sixth, and so on, largest eigenvalues is independent of $A$. Since for $A = 0$ this ensemble is the Laguerre orthogonal ensemble (LOE), while in the limit $A \to -\infty$ it becomes the Laguerre symplectic ensemble (LSE), we find in particular that the distribution of the second-largest eigenvalue of LOE is the same as the distribution of the largest eigenvalue of LSE (since every eigenvalue of LSE occurs twice). For an alternate proof, and generalizations, see [11].

Remark 7.6.2
We can recover Theorems 1.2 and 4.2 from Theorem 7.1 by taking suitable limits. For instance, for the process $l_{WW'}(q;q')$, we take $q_i = q'_i = t/N$ for $1 \le i \le N$, and $q_i = q'_i = 0$ otherwise. As $N \to \infty$, the resulting point process converges to the usual Poisson process. In Corollary 7.6, if we take the corresponding limit for the right-hand sides, we obtain Poisson processes in which, instead of adding extra diagonal points, we add extra side points. Thus we obtain the fact that these two Poisson processes have exactly the same distribution. For instance, if $n$ and $m$ are nonnegative integers, the distribution of the longest weakly increasing subsequence is the same if we choose either of the following: (1) pick $n$ points at random in the triangle $0 \le y \le x \le 1$, and $m$ points at random with $0 \le x = y \le 1$; or (2) pick $n$ points at random in the triangle $0 \le y \le x \le 1$, and $m$ points at random with $y = 0$ and $0 \le x \le 1$. It is not at all clear why these distributions should be the same.

Remark 7.6.3
By taking a Poisson limit for only some of the variables, we obtain increasing subsequence interpretations for the image of the Schur function sums under homomorphisms of the form
\[
H(t; z) \mapsto e^{at} H(t; x) E(t; y) \tag{7.62}
\]
with $a$, $x_i$, and $y_i$ nonnegative (and satisfying the appropriate additional conditions). As this is the most general case in which the images of $s_\lambda$ are all nonnegative (see [40]), this is presumably the most general case for which such an interpretation can be given.

Remark 7.6.4
In addition to the Poisson limit, another natural limit is the Laguerre limit (see [24]), in which the random multiplicities are exponentially distributed. More precisely, one fixes integers $N$ (and $N'$, as necessary), and one considers a process in which $q$ and $q'$ are constant on the respective intervals $[1, N]$ and $[1, N']$, and zero otherwise; then one takes the limit $q, q' \to 1$. In the case $\alpha = \beta = 0$, one finds the following curious fact, analogous to Theorem 1.2. Given a complex $(N \times N')$-matrix $M$, we define two decreasing sequences of nonnegative real numbers. $\Sigma(M)_i$ is the $i$th largest eigenvalue of $M M^\dagger$, or zero if $i > \min(N, N')$, while $\Delta(M)_i$ is defined such that
\[
\sum_{1 \le i \le j} \Delta(M)_i = \max_S \sum_{(k,l) \in S} |M_{kl}|^2, \tag{7.63}
\]
where $S$ ranges over unions of $j$ decreasing paths.

THEOREM 7.7
Let $N$ be a positive even integer. Map $H$ (recall Section 1) into transformations of $N \times N$ complex matrices as follows:
7→ (M 7→ −M t )
and
7→ (M 7→ J M t J ).
(7.64)
Let G ~ (N ) be the Gaussian distribution on matrices with the appropriate symmetry. Then, for each ~, the distributions 6(G ~ (N )) and 1(G ~ (N )) are the same. Proof (Sketch) That we can compute the distribution of 1(G ~ (N )) (as well as the significance of this result) follows from the fact that it is the Laguerre limit of the appropriate discrete process with symmetry ~. Thus, as in [24] for , we find that the appropriate sum of Pfaffians (see Section 6) tends to a Riemann integral. On the other hand, the distributions G ~ (N ) for , , and are well studied (these are unconstrained, antisymmetric, and symmetric complex matrices, resp.), and, in particular, the distribution 6(G ~ (N )) is known in these cases (see [11]). For
· and , we can perform simple row and column operations (not changing 6(M)) to reduce to the case of . We find that the two density functions we obtain are the same, proving the theorem. Remark. For be square.
and · , the matrices need not
and , N need not be even, while for
8. Invariants and increasing subsequences
Consider the integral
\[
\mathbf{E}_{U \in U(l)} |\operatorname{Tr}(U)|^{2n}. \tag{8.1}
\]
The function $|\operatorname{Tr}(U)|^{2n}$ is the character of a representation of $U(l)$ since
\[
|\operatorname{Tr}(U)|^{2n} = \operatorname{Tr}\bigl(U^{\otimes n} \otimes \overline{U}^{\otimes n}\bigr), \tag{8.2}
\]
where we write $A^{\otimes n}$ for the tensor product of $A$ with itself $n$ times. It follows that (8.1) (and thus $f_{nl}$) gives the dimension of the fixed subspace of that representation. Equivalently, this is the dimension of the space $C_n(U(l))$ of operators on $(\mathbb{C}^l)^{\otimes n}$ that commute with $U^{\otimes n}$ for all $U \in U(l)$. Given a permutation $\pi \in S_n$, we can associate an operator $T_l(\pi)$ on $(\mathbb{C}^l)^{\otimes n}$ as follows:
\[
T_l(\pi)(v_1 \otimes v_2 \otimes \cdots \otimes v_n) = v_{\pi(1)} \otimes v_{\pi(2)} \otimes \cdots \otimes v_{\pi(n)}. \tag{8.3}
\]
This operator clearly commutes with $U^{\otimes n}$ for $U \in U(l)$, and thus $T_l$ extends to a map from $\mathbb{C}[S_n]$ to $C_n(U(l))$. Indeed, this map is known to be surjective (see, e.g., [7]); that is, the operators $T_l(\pi)$ span $C_n(U(l))$. In general, however, it is not injective. For any subset $S \subset \{1, 2, \ldots, n\}$, define two elements of $\mathbb{C}[S_n]$:
\[
E_S := \sum_{\substack{\pi \in S_n \\ \pi(x) = x,\ \forall x \notin S}} \sigma(\pi)\, \pi, \tag{8.4}
\]
\[
H_S := \sum_{\substack{\pi \in S_n \\ \pi(x) = x,\ \forall x \notin S}} \pi. \tag{8.5}
\]

LEMMA 8.1
For $l \ge n$, $T_l$ is injective on $\mathbb{C}[S_n]$, while for $l < n$, the kernel of $T_l$ contains all elements of the form $\pi E_S$ with $|S| > l$.
Proof
That $T_l$ is injective for $l \ge n$ is straightforward; we simply note that if $v_1, v_2, \ldots, v_n$ are linearly independent vectors, then the vectors
\[
T_l(\pi)(v_1 \otimes v_2 \otimes \cdots \otimes v_n) = v_{\pi(1)} \otimes v_{\pi(2)} \otimes \cdots \otimes v_{\pi(n)} \tag{8.6}
\]
are linearly independent, as $\pi$ ranges over $S_n$. Thus, suppose $l < n$. That $\pi E_S \in \ker T_l$ follows from the fact that any tensor product of $|S| > l$ basis vectors must contain at least one basis vector more than once, so it is taken to zero by $E_S$.

Remark. It follows from the proof of Theorem 8.2 that these elements also span the kernel. Using this fact, we obtain the following theorem.

THEOREM 8.2
For any nonnegative integers $l$ and $n$, the set
\[
\{T_l(\pi) : \pi \in S_n \mid \ell^{-}(\pi) \le l\} \tag{8.7}
\]
is a basis of $C_n(U(l))$.

Proof
We first need to show that, given any permutation $\pi$ with a long decreasing subsequence, we can express $T_l(\pi)$ as a linear combination of $T_l(\pi')$ with $\pi'$ ranging over permutations without long decreasing subsequences. Let $\pi$ be such a permutation, and let $S$ be a subset of size $l + 1$ on which $\pi$ is decreasing. By the lemma, it follows that
\[
T_l(\pi) = T_l(\pi - \pi E_S) = -\sigma(\pi) \sum_{\substack{\pi' \in S_n \\ \pi'(x) = \pi(x),\ \forall x \notin S \\ \pi' \ne \pi}} \sigma(\pi')\, T_l(\pi'). \tag{8.8}
\]
Now each permutation $\pi'$ on the right-hand side agrees with $\pi$ outside $S$, but it is no longer decreasing on $S$. It follows that each $\pi'$ has strictly fewer inversions than $\pi$. (Reduce to the case in which $\pi'$ differs from $\pi$ by a 2-cycle.) It follows that if we iterate this reduction, we eventually obtain a linear combination of permutations that cannot be reduced. But this is precisely the desired result.

It remains only to show that the elements $T_l(\pi)$ are linearly independent, which we do via a triangularity argument. If we choose a basis of $V$, we can view $T_l$ as the restriction to $\{1, 2, \ldots, l\}^n$ of the corresponding action of $S_n$ on the set $W_n = \mathbb{N}^n$. Now, to a given permutation $\pi$, we associate two words $w_1(\pi)$ and $w_2(\pi)$ as follows: $w_1(\pi)_j$ is equal to the length of the longest decreasing subsequence of $\pi$ starting with $j$; similarly, $w_2(\pi)_j$ is equal to the length of the longest decreasing subsequence of $\pi$ starting in position $j$. (Since $\pi$ has longest decreasing subsequence of length less than or equal to $l$, these are indeed in $\{1, 2, \ldots, l\}^n$.) We easily see that $w_2(\pi) = \pi(w_1(\pi))$ and thus that $T_l(\pi)$ has coefficient 1 on the pair $(w_1(\pi), w_2(\pi))$. The claim is then that any other permutation taking $w_1(\pi)$ to $w_2(\pi)$ has strictly more inversions than $\pi$.

Define the number of inversions $i(w)$ of a word $w$ to be the number of coordinate positions $i < j$ such that $w_j < w_i$. By standard arguments, we find that if $\pi(v) = w$, then $i(w) \le i(v) + i(\pi)$; equality holds if, for all pairs $i < j$ such that $\pi_i > \pi_j$, we have $v_i < v_j$ and $w_i > w_j$. We immediately deduce that (a) $i(w_2(\pi)) = i(w_1(\pi)) + i(\pi)$ and (b) if $\pi'(w_1(\pi)) = w_2(\pi)$, then $i(\pi') \ge i(\pi)$. Finally, if $\pi'(w_1(\pi)) = w_2(\pi)$ with $i(\pi') = i(\pi)$, then $\pi'$ has not only the same number of inversions as $\pi$, but indeed the same set of inversions; it follows that $\pi' = \pi$.

Probably the most important consequence of Theorem 8.2 is that it gives an elementary proof of the following corollary.

COROLLARY 8.3
For any integers $n$ and $l$,
\[
\mathbf{E}_{U \in U(l)} |\operatorname{Tr}(U)|^{2n} = \dim(C_n(U(l))) = f_{nl}. \tag{8.9}
\]
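Theorem 8.2 and Corollary 8.3 can be tested numerically in small cases. The Python sketch below builds the matrices of $T_l(\pi)$ from (8.3) on $(\mathbb{C}^l)^{\otimes n}$ for $n = 4$, $l = 2$, and compares three numbers: the count of permutations with no decreasing subsequence longer than $l$, the rank of the corresponding operators, and the rank of all $n!$ operators. It is a brute-force check under the stated conventions (one-line notation, exact rational elimination), not the reduction algorithm of the proof; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product


def longest_decreasing(seq):
    """Length of the longest strictly decreasing subsequence of seq."""
    best = []
    for i, x in enumerate(seq):
        best.append(1 + max((best[j] for j in range(i) if seq[j] > x), default=0))
    return max(best, default=0)


def basis_permutations(n, l):
    """Permutations of {1, ..., n} with no decreasing subsequence longer than l."""
    return [p for p in permutations(range(1, n + 1)) if longest_decreasing(p) <= l]


def operator(p, l):
    """Matrix of T_l(p) on (C^l)^{tensor n}, flattened row by row, following (8.3)."""
    n = len(p)
    basis = list(product(range(l), repeat=n))
    index = {b: i for i, b in enumerate(basis)}
    flat = [0] * (len(basis) ** 2)
    for col, b in enumerate(basis):
        image = tuple(b[p[i] - 1] for i in range(n))  # slot i receives v_{p(i)}
        flat[index[image] * len(basis) + col] = 1
    return flat


def rank(rows):
    """Rank of a list of integer vectors, by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(rk, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][c] != 0:
                factor = rows[i][c] / rows[rk][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
        if rk == len(rows):
            break
    return rk


n, l = 4, 2
all_ops = [operator(p, l) for p in permutations(range(1, n + 1))]
basis_ops = [operator(p, l) for p in basis_permutations(n, l)]
print(len(basis_ops), rank(basis_ops), rank(all_ops))
```

If the three printed numbers agree, the selected operators are linearly independent and span the image of $T_l$, which is what Theorem 8.2 asserts in this case.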
Remark 8.3.1 A reduction algorithm closely related to that used in the proof of Theorem 8.2 appeared in [34], for an application to P.I. (polynomial identity) algebras. The connection to invariant theory, as well as the various generalizations given below, appears to be new, however. Remark 8.3.2 A different basis for l = 2 (based on one of the many combinatorial interpretations of the Catalan numbers) was given in [17]. Remark 8.3.3 We could, of course, just as easily have used the permutations without long increasing subsequences to form the basis. The current choice has the merit of giving a basis containing the identity and closed under taking inverses, as well as making the proof of linear independence somewhat cleaner. Remark 8.3.4 While we defined everything over C, we observe that both the representation Tl and
the above reduction algorithm are actually defined over $\mathbb{Z}$.

Remark 8.3.5
There are also, of course, analogues of Corollary 8.3 associated with Theorems 8.4 through 8.8; we leave the details to the reader.

More generally, Theorem 7.1 implies
\[
Z_{WW'}(q;q')^{-1} \Pr(l_{WW'}(q;q') \le l) = \mathbf{E}_{U \in U(l)} \det\bigl(H(U; q_{\overline{W}}/q_W)\, H(U^\dagger; q'_{\overline{W'}}/q'_{W'})\bigr). \tag{8.10}
\]
Consider the coefficient of a monomial
\[
\prod_i q_i^{\nu_i} (q'_i)^{\nu'_i} \tag{8.11}
\]
in the right-hand side. By the properties of $H(t; x/y)$, this is
\[
\mathbf{E}_{U \in U(l)} \prod_{i \in W} e_{\nu_i}(U) \prod_{i \in \overline{W}} h_{\nu_i}(U) \prod_{i \in W'} e_{\nu'_i}(U^\dagger) \prod_{i \in \overline{W'}} h_{\nu'_i}(U^\dagger). \tag{8.12}
\]
Again, this is the expectation of a character, and thus it computes the dimension of a space of invariants. The corresponding coefficient of the left-hand side counts $(W, W')$-compatible multisets without $(W, W')$-increasing subsequences of length $l + 1$, in which $i$ appears $\nu_i$ times as a first coordinate and $\nu'_i$ times as a second coordinate. Thus to obtain the analogue of Theorem 8.2, we first need an analogue of $T_l$ for multisets. Given a composition $\nu$, we define a partition $S(\nu)$ of $\{1, 2, \ldots, |\nu|\}$ by
\[
S(\nu)_i = \Bigl\{ j : \sum_{k < i} \nu_k < j \le \sum_{k \le i} \nu_k \Bigr\}. \tag{8.13}
\]
Associated to this partition is an operator
\[
\Pi(\nu) = \prod_{i \in W} E_{S(\nu)_i} \prod_{i \in \overline{W}} H_{S(\nu)_i}. \tag{8.14}
\]
Then the representation of $U(l)$ in question is simply the action by conjugation of $U^{\otimes n}$ on operators of the form $T_l(\Pi(\nu')) A\, T_l(\Pi(\nu))$. In particular, the invariant subspace is spanned by operators of the form $T_l(\Pi(\nu') \pi \Pi(\nu))$ with $\pi \in S_n$. We note that if two elements of $S_i$ with $i \in W$ each map to elements of the same $S'_j$ with $j \notin W'$, then $\Pi(\nu') \pi \Pi(\nu) = 0$; the same is true if $i \notin W$ and $j \in W'$. It follows that operators of the form $\Pi(\nu') \pi \Pi(\nu)$ are, up to sign, in one-to-one correspondence with $(W, W')$-compatible multisets with content $(\nu, \nu')$. In particular, given a $(W, W')$-compatible multiset $M$ with content $(\nu, \nu')$, we obtain an element of $\mathbb{C}[S_n]$ which we denote $T_l(M)$. Writing $\mathcal{M}_{WW'}(\nu;\nu')$ for the set of such multisets, we have the following theorem.

THEOREM 8.4
For any nonnegative integer $l$ and compositions $\nu$ and $\nu'$,
\[
\{T_l(M) : M \in \mathcal{M}_{WW'}(\nu;\nu') \mid \ell^{-}_{WW'}(M) \le l\} \tag{8.15}
\]
is a basis of $\Pi(\nu') C_n(U(l)) \Pi(\nu)$.

Proof
Given a multiset $M$ in $\mathbb{Z}^+ \times \mathbb{Z}^+$, we define an inversion of $M$ to be a pair of elements $(x, y) \in M$, $(z, w) \in M$ with $x < z$ and $y > w$ (i.e., a strictly decreasing subsequence of length 2). Now, suppose $M \in \mathcal{M}_{WW'}(\nu;\nu')$ has a $(W, W')$-decreasing subsequence of length $l + 1$. Choose a permutation $\pi$ corresponding to $M$, and let $S \subset \{1, 2, \ldots, n\}$ be the set of positions of $\pi$ corresponding to the $(W, W')$-decreasing subsequence. As before, we have
\[
T_l(\Pi(\nu') \pi E_S \Pi(\nu)) = 0. \tag{8.16}
\]
Now, we readily see that, for any permutation that appears in the left-hand side, the corresponding multiset has at most as many inversions as $M$. Indeed, the number of inversions is strictly smaller unless the corresponding multiset is equal to $M$. Thus it remains only to show that the terms corresponding to $M$ do not cancel. The only way that the term
\[
\sigma(\pi')\, \Pi(\nu') \pi \pi' \Pi(\nu) \tag{8.17}
\]
can correspond to $M$ is if we can write
\[
\pi' = \pi^{-1} \Bigl(\prod_i \pi'_{1i}\Bigr) \pi \prod_i \pi'_{2i}, \tag{8.18}
\]
where $\pi'_{2i}$ fixes the complement of $S \cap S(\nu)_i$ and where $\pi'_{1i}$ fixes the complement of $\pi(S) \cap S(\nu')_i$. Now, by the definition of a $(W, W')$-increasing subsequence, $S \cap S(\nu)_i$ contains at most one element when $i \notin W$; similarly, $\pi(S) \cap S(\nu')_i$ contains at most one element when $i \notin W'$. In other words, we can write
\[
\pi' = \pi^{-1} \Bigl(\prod_{i \in W'} \pi'_{1i}\Bigr) \pi \prod_{i \in W} \pi'_{2i}; \tag{8.19}
\]
but then
\[
\sigma(\pi')\, \Pi' \pi \pi' \Pi = \Pi' \pi \Pi, \tag{8.20}
\]
as desired.

The proof of linear independence is analogous to that in Theorem 8.2. Of the permutations associated to $M$, there is a unique one ($\pi(M)$) such that each element of $W$ and $W'$ induces a decreasing subsequence and each element of $\overline{W}$ and $\overline{W'}$ induces an increasing subsequence. We define $w_1(M) = w_1(\pi(M))$, $w_2(M) = w_2(\pi(M))$, and we observe that any other multiset $M'$ with a nonzero coefficient at $(w_1(M), w_2(M))$ satisfies $i(\pi(M')) > i(\pi(M))$.

Remark 8.4.1
It is possible to renormalize the basic invariants $T_l(M)$ in such a way that the reduction algorithm is integral. We need simply divide $T_l(M)$ by
\[
\prod_{i \in W,\ j \in W'} |S(\nu)_i \cap \pi(S(\nu')_j)|!, \tag{8.21}
\]
where $T_l(M) = \Pi(\nu') \pi \Pi(\nu)$.

Remark 8.4.2
The space $\Pi(\nu') C_n(U(l)) \Pi(\nu)$ can be thought of as the space of simultaneous (multilinear) invariants of a collection of symmetric and antisymmetric tensors, some covariant and some contravariant. Of special interest is the case in which $\nu'$ is the composition $l^k$ with $1, 2, \ldots, k \in W'$. In this case, the space
\[
(E_{\{1,2,\ldots,l\}} E_{\{l+1,l+2,\ldots,2l\}} \cdots)\, C_n(U(l))\, \Pi(\nu) \tag{8.22}
\]
corresponds to relative invariants of a collection of symmetric and antisymmetric covariant tensors; that is, transforming the tensors multiplies the invariant by a power of the determinant. There is a known algorithm (the straightening algorithm (see [36])) for computing a basis of such invariants. In fact, we observe that the resulting basis is, up to constant factors, the same as our basis. Thus our algorithm can be viewed as a generalization of this algorithm (different from the generalization to the "fourfold" algebra of [19]). Similarly, the algorithms below for the orthogonal and symplectic groups can be thought of as straightening algorithms for those groups.

Remark 8.4.3
There is also a "quantum" analogue of this result. If one replaces the unitary group $U(l)$ by the quantum enveloping algebra $U_q(\mathfrak{gl}_l)$, the role of the symmetric group is now played by the Hecke algebra (see [22]). As long as $S$ consists of consecutive elements, there is no difficulty in defining $E_S$ and $H_S$. (These are idempotents corresponding to 1-dimensional characters of parabolic subalgebras.) We find that the kernel of the quantum $T_l$ is the ideal generated by $E_S$ with $S = \{k, k + 1, \ldots, k + l\}$, $1 \le k \le n - l$. Using the appropriate normalization, we obtain a reduction algorithm integral over $\mathbb{Z}[q]$. (We also have linear independence whenever $q$ is not a root of unity.) Of particular interest is the "crystal limit" $q = 0$. Under that specialization, the relations take the form $T_l(M) = 0$, with $M$ ranging over multisets with long decreasing subsequences. This case surely merits further investigation, given the connection (see [28]) between the crystal limit of the quantum straightening algorithm and the Robinson-Schensted-Knuth correspondence. It would also be interesting to understand the analogues for the quantum orthogonal and symplectic groups.

For the orthogonal group $O(l)$, there is a "basic" invariant $T_l(\tau)$ associated to any fixed-point-free involution $\tau$ in $S_{2n}$ such that the basic invariants span the invariant space of $U^{\otimes 2n}$ (which we denote by $F_{2n}(O(l))$). These transform under $T_l(\pi)$ as
\[
T_l(\pi) T_l(\tau) = T_l(\pi^{-1} \tau \pi). \tag{8.23}
\]
We write $\pi \cdot \tau$ for the corresponding action on the formal span of fixed-point-free involutions. For any composition $\nu$ and nonnegative integer $a$, we define
\[
\mathcal{M}_W(\nu; a) \tag{8.24}
\]
to be the set of symmetric $(W, W)$-compatible multisets with content composition $\nu$ and with $a$ "fixed points" (i.e., $(i, i) \in M$ such that $i \in W$, together with $(i, i)$ with $i \notin W$ such that $(i, i)$ has odd multiplicity). Clearly, there is a correspondence (up to sign) between $\mathcal{M}_W(\nu; 0)$ and elements of the form $\Pi \cdot \tau$.

THEOREM 8.5
For any nonnegative integer $l$ and any composition $\nu$,
\[
\{T_l(M) : M \in \mathcal{M}_W(\nu; 0) \mid \ell_{WW}(M) \le l\} \tag{8.25}
\]
is a basis of $\Pi(\nu) F_{2n}(O(l))$.

Proof
The argument is analogous. In eliminating a given increasing subsequence, we replace both it and its reflection through the diagonal by nonincreasing subsequences, so the number of inversions increases. The proof of linear independence is analogous to that of Theorem 8.4. We simply reverse the inequalities on the second coordinates before applying the above arguments.

This corresponds to taking the coefficient of a monomial $q^\nu$ in the identity
\[
Z_W(q;\alpha)^{-1} \Pr(l_W(q;\alpha) \le l) = \mathbf{E}_{U \in O(l)} \det(H(U; q_{\overline{W}}/\alpha, q_W)). \tag{8.26}
\]
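In the simplest case of Theorem 8.5, where every part of $\nu$ equals 1, the basis is indexed by fixed-point-free involutions of $2n$ points with no increasing subsequence longer than $l$. The Python sketch below counts these by brute force and compares the result with the number of standard tableaux of shapes $2\mu$, $\mu \vdash n$, with at most $l$ parts, computed by the hook length formula. The comparison assumes the classical decomposition of the $O(l)$-invariants by such shapes; it is a counting check only and does not implement the reduction algorithm. All function names are hypothetical.

```python
from math import factorial


def longest_increasing(seq):
    """Length of the longest strictly increasing subsequence of seq."""
    best = []
    for i, x in enumerate(seq):
        best.append(1 + max((best[j] for j in range(i) if seq[j] < x), default=0))
    return max(best, default=0)


def matchings(points):
    """All perfect matchings of the list `points`, as lists of pairs."""
    if not points:
        yield []
        return
    a, rest = points[0], points[1:]
    for i, b in enumerate(rest):
        for m in matchings(rest[:i] + rest[i + 1:]):
            yield [(a, b)] + m


def count_involutions(n, l):
    """Fixed-point-free involutions of 2n points with longest increasing subsequence <= l."""
    total = 0
    for m in matchings(list(range(2 * n))):
        tau = [0] * (2 * n)
        for a, b in m:
            tau[a], tau[b] = b, a
        total += longest_increasing(tau) <= l
    return total


def partitions(n, max_part=None):
    """Partitions of n as weakly decreasing lists of positive parts."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield []
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest


def num_syt(shape):
    """Number of standard Young tableaux of `shape`, by the hook length formula."""
    cols = [sum(1 for r in shape if r > j) for j in range(shape[0])]
    hooks = 1
    for i, r in enumerate(shape):
        for j in range(r):
            hooks *= (r - j - 1) + (cols[j] - i - 1) + 1  # arm + leg + 1
    return factorial(sum(shape)) // hooks


def count_tableaux(n, l):
    """Sum of f^{2 mu} over partitions mu of n with at most l parts."""
    return sum(num_syt([2 * part for part in mu])
               for mu in partitions(n) if len(mu) <= l)


for n in range(1, 5):
    print(n, [(count_involutions(n, l), count_tableaux(n, l)) for l in range(1, 5)])
```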
To handle the general case, we need to map an element of $\mathcal{M}_W(\nu; a)$ (corresponding to the monomial $q^\nu \alpha^a$) to an element of
\[
\Pi(\nu; a) F_{2n}(O(l)), \tag{8.27}
\]
where we define
\[
\Pi(\nu; a) = \Pi(\nu)\, E_{\{|\nu|+1, |\nu|+2, \ldots, |\nu|+a\}}. \tag{8.28}
\]
But this is straightforward: convert each fixed point $(i, i)$ to a pair of elements $(i, \infty)$ and $(\infty, i)$, and apply $T_l$.

THEOREM 8.6
For any nonnegative integers $l$ and $a$ and any composition $\nu$,
\[
\{T_l(M) : M \in \mathcal{M}_W(\nu; a) \mid \ell_{WW}(M) \le l\} \tag{8.29}
\]
is a basis of $\Pi(\nu; a) F_{2n}(O(l))$.

Proof
The main difficulty here is that the naïve extension of the above algorithm is no longer guaranteed to terminate; for instance, corresponding to the increasing subsequence 13 of 132, we have the identity
\[
T_2(132) = T_2(213). \tag{8.30}
\]
Since 13 is an increasing subsequence of both sides, we could clearly loop indefinitely. The solution is to require that the increasing subsequence consist entirely of points $(x, y)$ with $x \le y$. Even with this restriction, the above proof does not entirely carry over; for instance, both 132 and 213 have the same number of inversions (i.e., 1). For an involution $\tau$, denote by $S_{\le}(\tau)$ the set of $i$ with $i \le \tau(i)$. Given a multiset $M$, we then define $M_{\le}(M)$ to be the multiset corresponding to $S_{\le}(\tau)$, where $\tau$ corresponds to $M$. Given two multisets $M_1$ and $M_2$ of the same size on the same totally ordered set, we write $M_1 \le M_2$ to indicate that we can identify elements of $M_1$ and $M_2$ in such a way that each element of $M_1$ is less than or equal to the corresponding element of $M_2$. Then the theorem follows from the next observation. If $M'$ is one of the multisets obtained after eliminating an increasing subsequence of $M$, then $M_{\le}(M') \le M_{\le}(M)$. If equality occurs, then either $M' = M$, or $M'$ has strictly more inversions than $M$. Linear independence follows as above.
For the symplectic group $Sp(2l)$, the basic invariants again correspond to involutions, but now they transform under $T_l(\pi)$ as
\[
T_{2l}(\pi) T_l(\tau) = \pm T_{2l}(\pi^{-1} \tau \pi) \tag{8.31}
\]
with
\[
T_{2l}(\pi) T_l(\tau) = \sigma(\pi) T_l(\tau) \tag{8.32}
\]
whenever $\pi$ commutes with $\tau$. For any composition $\nu$ and nonnegative integer $b$, we define
\[
\mathcal{M}_W(\nu; b) \tag{8.33}
\]
to be the set of symmetric $(W, W)$-compatible multisets with content composition $\nu$ and with $b$ fixed points.

THEOREM 8.7
For any nonnegative integer $l$ and composition $\nu$,
\[
\{T_l(M) : M \in \mathcal{M}_W(\nu; 0) \mid \ell^{-}_{WW}(M) \le 2l\} \tag{8.34}
\]
is a basis of $\Pi(\nu) F_{2n}(Sp(2l))$.

Proof
The proof is analogous. The only issue is that the long decreasing subsequence we eliminate must be symmetric about the diagonal (clearly always possible). For linear independence, we choose a symplectic basis of $V$ indexed $v_{\pm 1}, v_{\pm 2}, \ldots$. We thus find that the nonzero coefficients of an involution $\tau$ correspond to words of length $2n$ on $\pm\mathbb{Z}^+$ such that $\tau(w) = -w$. The word $w(\tau)$ associated to an involution is now defined such that $|w(\tau)_j|$ is equal to half the length of the longest symmetric decreasing subsequence starting or ending with $j$; the sign is positive if the sequence ends with $j$, and it is negative otherwise. Other than this, the arguments are analogous.

This extends easily to the case when fixed points are allowed; in this case, we define
\[
\Pi(\nu; b) = \Pi(\nu)\, H_{\{|\nu|+1, |\nu|+2, \ldots, |\nu|+b\}}. \tag{8.35}
\]
When eliminating (symmetric) decreasing subsequences, if the subsequence has even length, we can proceed as above; otherwise, we permute only those elements not corresponding to $\infty$. In either case, we convert a decreasing subsequence to a nondecreasing subsequence. (And the linear independence argument carries over.) Thus we have the following theorem.
THEOREM 8.8
For any nonnegative integers $l$ and $b$ and any composition $\nu$,
\[
\{T_l(M) : M \in \mathcal{M}_W(\nu; b) \mid \ell^{-}_{WW}(M) \le 2l\} \tag{8.36}
\]
is a basis of 5 (ν; b)F2n (Sp(2l)). It is not entirely clear how to proceed for either of · or . For · , or more generally U ( p) × U (q), the centralizer algebra corresponds as expected to a certain representation T p,q of the hyperoctahedral group, in which permutations are mapped via T p+q , and sign changes correspond to an operator with eigenvalues 1 ( p times) and −1 (q times). Given a set S ⊂ {1, 2, . . . , n}, we can define two elements E S± ∈ C[Hn ]. Each is a sum over elements of Hn which fix the complement of S; in E + , we multiply by the sign of the corresponding permutation, while in E − , we further multiply by the number of sign changes. Since the group U ( p) × U (q) is a direct product, we obtain the following lemma. LEMMA 8.9 The kernel of T p,q on C[Sn ] is spanned by the elements π E S+ with |S| > p and by the elements π E S− with |S| > q.
Remark. The point is that $\pi E_S^{+}$ can be used to reduce invariants of $U(p)$, while $\pi E_S^{-}$ can be used to reduce invariants of $U(q)$. Since every invariant of $U(p) \times U(q)$ can be expressed in terms of the invariants of the respective factors, we are done.

Even in the cases of interest, however, ($q = p$ or $q = p + 1$), it is not clear how to use these relations to eliminate hyperoctahedral permutations with long decreasing subsequences; in particular, the invariants do not span the kernel of $T_{p,q}$ on $\mathbb{Z}[S_n]$, in general, but only on $\mathbb{Z}[1/2][S_n]$. Similar remarks apply to . As in the other involution cases, the invariant space for is associated to a (twisted) Gelfand pair in $H_{2n}$; the relevant subgroup of $H_{2n}$ is the centralizer of an element of $S_n$. (For and , the Gelfand pair is $(S_{2n}, H_n)$, twisted in the case.)

We observe that in each case, $\ker(T_l^{\sim}) \subset \ker(T_{l-1}^{\sim})$ for all $l$, and $\ker(T_n^{\sim}) = 0$. We thus obtain the following theorem, which is a formal analogue of the Szegő limit theorem.

THEOREM 8.10
We have the following limits of formal power series:
\begin{align}
\lim_{l \to \infty} \mathbf{E}_{U \in U(l)} \det(H(U; x/y) H(U^\dagger; z/w)) &= \prod_{j,k} (1 - x_j z_k)^{-1} (1 - y_j w_k)^{-1} (1 + x_j w_k)(1 + y_j z_k), \tag{8.37}\\
\lim_{l \to \infty} \mathbf{E}_{U \in O(l)} \det(H(U; x/y)) &= \prod_{j,k} (1 + x_j y_k) \prod_{j \le k} (1 - x_j x_k)^{-1} \prod_{j < k} (1 - y_j y_k)^{-1}, \tag{8.38}\\
\lim_{l \to \infty} \mathbf{E}_{U \in Sp(2l)} \det(H(U; x/y)) &= \prod_{j,k} (1 + x_j y_k) \prod_{j < k} (1 - x_j x_k)^{-1} \prod_{j \le k} (1 - y_j y_k)^{-1}. \tag{8.39}
\end{align}
More precisely, all coefficients of degree less than or equal to $2l$ agree, and each limit is monotonic in all coefficients.

Remark 8.10.1
Under the homomorphism $p_j(x/y) \mapsto f_j$, $p_j(z/w) \mapsto g_j$, we obtain
\begin{align}
\lim_{l \to \infty} \mathbf{E}_{U \in U(l)} \exp\Bigl(\sum_j f_j \operatorname{Tr}(U^j)/j + \sum_j g_j \operatorname{Tr}(U^{-j})/j\Bigr) &= \exp\Bigl(\sum_j f_j g_j / j\Bigr), \tag{8.40}\\
\lim_{l \to \infty} \mathbf{E}_{U \in O(l)} \exp\Bigl(\sum_j f_j \operatorname{Tr}(U^j)/j\Bigr) &= \exp\Bigl(\sum_j (f_j^2 + f_{2j})/(2j)\Bigr), \tag{8.41}\\
\lim_{l \to \infty} \mathbf{E}_{U \in Sp(2l)} \exp\Bigl(\sum_j f_j \operatorname{Tr}(U^j)/j\Bigr) &= \exp\Bigl(\sum_j (f_j^2 - f_{2j})/(2j)\Bigr), \tag{8.42}
\end{align}
and again coefficients with (weighted) degree less than or equal to $2l$ agree. This fact was proved via representation theory in [10] (note that the degree bounds given there are incorrect); the connection to the Szegő limit theorem was observed in [23]. The monotonicity result is new.

Remark 8.10.2
There are, of course, also hyperoctahedral analogues, both of which simply reduce to the case.

We can also give a partial extension of the above results to the "super" analogues of the classical groups. For the unitary supergroup $U(l/k)$, the centralizer algebra $C_n(U(l/k))$ is again spanned by permutations, under a particular representation $T_{l/k}$ of $S_n$.

THEOREM 8.11
The representations that appear in $T_{l/k}$ with positive multiplicity are those indexed by partitions $\lambda$ with $\lambda_{l+1} < k + 1$.
We then have the following theorem.

THEOREM 8.12
For any nonnegative integers l and k and compositions ν and ν′, the dimension of the space Π(ν′) C_n(U(l/k)) Π(ν) is equal to the number of multisets M ∈ ℳ^{WW′}(ν; ν′) such that
\[
\lambda^{WW'}(M)_{l+1} < k + 1. \tag{8.43}
\]

Proof
If T_λ is the representation of S_n corresponding to the partition λ, then the dimension of
\[
T_\lambda\bigl(\Pi(\nu')\, S_n\, \Pi(\nu)\bigr) \tag{8.44}
\]
is given by the number of pairs of bitableaux of shape λ with respective content ν and ν′. In other words, by the generalized Knuth correspondence, this is equal to the number of multisets M ∈ ℳ^{WW′}(ν; ν′) with
\[
\lambda^{WW'}(M) = \lambda. \tag{8.45}
\]
The result follows by summing over λ.

Remark 8.12.1
The obvious conjecture that those multisets give a basis under T_{l/k} is not true in general (not even for C_4(U(1/1))); thus Theorem 8.4 does not immediately extend to the supergroup case. It is also not clear whether there is a simple description of the kernel of T_{l/k}.

Remark 8.12.2
This theorem can be thought of as a formal statement along the lines of
\[
E_{U\in U(l/k)} \det\bigl(H(U;x)\,H(U^{\dagger};y)\bigr) = \sum_{\lambda_{l+1}\le k} s_\lambda(x)\,s_\lambda(y). \tag{8.46}
\]
It would be nice to make this statement precise. It is also interesting to speculate on the possibility of a "super" analogue of orthogonal polynomials and Toeplitz determinants.

Similarly, for the orthosymplectic supergroup OSp(l/2k) (the full group, not just the component of the identity), the invariant space F_n(OSp(l/2k)) is the image of fixed-point-free involutions under a map T_{l/2k} for which
\[
\pm T_{l/2k}(\pi)\,T_{l/2k}(\tau) = T_{l/2k}(\pi^{-1}\tau\pi). \tag{8.47}
\]
This thus induces an action of S_{2n} on F_{2n}(O(l/2k)), for which we have the following theorem.

THEOREM 8.13
F_{2n}(OSp(l/2k)) splits into S_n-irreducible submodules as the direct sum
\[
\bigoplus_{\substack{\lambda\vdash n\\ \lambda_{l+1}\le k}} T_{2\lambda}. \tag{8.48}
\]

It then follows that we have the next theorem.

THEOREM 8.14
For any nonnegative integers l and k and compositions ν and ν′, the dimension of the space Π(ν) F_{2n}(OSp(l/2k)) is equal to the number of multisets M ∈ ℳ^{WW′}(ν; 0) such that
\[
\lambda^{WW'}(M)_{l+1} < 2k + 1. \tag{8.49}
\]
In "integral" form, this reads as
\[
E_{U\in OSp(l/2k)} \det\bigl(H(U;x)\bigr) = \sum_{\lambda_{l+1}\le k} s_{2\lambda}(x). \tag{8.50}
\]
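The counting argument in the proof of Theorem 8.12 can be illustrated in its simplest classical instance. The sketch below assumes k = 0 and all contents equal to (1, ..., 1), so that (as an assumption of this illustration) the multisets are read as permutations and the generalized Knuth correspondence as ordinary Robinson-Schensted row insertion. It compares the number of permutations whose insertion shape has at most l rows with the sum of squared tableau counts over those shapes, computed by the hook length formula. All function names are hypothetical.

```python
from bisect import bisect_right
from itertools import permutations
from math import factorial


def rsk_shape(perm):
    """Shape of the insertion tableau of a permutation under row insertion."""
    rows = []
    for value in perm:
        for row in rows:
            pos = bisect_right(row, value)  # first entry larger than value
            if pos == len(row):
                row.append(value)
                break
            row[pos], value = value, row[pos]  # bump the displaced entry down
        else:
            rows.append([value])  # bumped out of every row: start a new row
    return tuple(len(row) for row in rows)


def partitions(n, max_part=None):
    """Yield the partitions of n as weakly decreasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def num_standard_tableaux(shape):
    """Number of standard Young tableaux of the given shape (hook length formula)."""
    hooks = 1
    for i, row_len in enumerate(shape):
        for j in range(row_len):
            arm = row_len - j - 1
            leg = sum(1 for r in shape[i + 1:] if r > j)
            hooks *= arm + leg + 1
    return factorial(sum(shape)) // hooks


def count_by_insertion(n, l):
    """Permutations of n letters whose insertion shape has at most l rows."""
    return sum(1 for p in permutations(range(1, n + 1)) if len(rsk_shape(p)) <= l)


def count_by_shapes(n, l):
    """Sum of (number of tableaux)^2 over partitions of n with at most l parts."""
    return sum(num_standard_tableaux(s) ** 2 for s in partitions(n) if len(s) <= l)


print(count_by_insertion(5, 2), count_by_shapes(5, 2))
```

For n = 5 and l = 2 the shapes are (5), (4,1) and (3,2), contributing 1 + 16 + 25 = 42 on both sides. By Schensted's theorem the number of rows of the insertion shape is the length of the longest decreasing subsequence, which is how the restriction on λ_{l+1} turns into a restriction on monotone subsequences.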
Acknowledgments. We would like to acknowledge the following people for helpful discussions: Kurt Johansson for telling us about the processes generalized in Section 7, Richard Stanley for telling us about the references for that section, Peter Shor for spotting flaws in earlier versions of the algorithms of Section 8, and Christian Krattenthaler for helpful comments on Section 5. We would also like to thank Jeff Lagarias, Andrew Odlyzko, and Neil Sloane for helpful comments and enthusiasm.

References
[1] J. BAIK, Random vicious walks and random matrices, Comm. Pure Appl. Math. 53 (2000), 1385–1410. MR CMP 1 773 413
[2] J. BAIK and E. M. RAINS, Limiting distributions for a polynuclear growth model with external sources, J. Statist. Phys. 100 (2000), 523–541. MR CMP 1 788 477
[3] J. BAIK and E. M. RAINS, The asymptotics of monotone subsequences of involutions, Duke Math. J. 109 (2001), 205–282.
[4] A. BERELE and A. REGEV, Hook Young diagrams with applications to combinatorics and to representations of Lie superalgebras, Adv. in Math. 64 (1987), 118–175. MR 88i:20006
[5] A. BERELE and J. B. REMMEL, Hook flag characters and their combinatorics, J. Pure Appl. Algebra 35 (1985), 225–245. MR 86g:20014
[6] A. BORODIN, A. OKOUNKOV, and G. OLSHANSKI, Asymptotics of Plancherel measures for symmetric groups, J. Amer. Math. Soc. 13 (2000), 481–515, http://www.ams.org/jams/ MR CMP 1 758 751
[7] R. BRAUER, On algebras which are connected with the semisimple continuous groups, Ann. of Math. (2) 38 (1937), 857–872.
[8] W. H. BURGE, Four correspondences between graphs and generalized Young tableaux, J. Combin. Theory Ser. A 17 (1974), 12–30. MR 50:4372
[9] N. G. DE BRUIJN, On some multiple integrals involving determinants, J. Indian Math. Soc. (N.S.) 19 (1955), 133–151. MR 18:121g
[10] P. DIACONIS and M. SHAHSHAHANI, On the eigenvalues of random matrices, J. Appl. Probab. 31A (1994), 49–62. MR 95m:60011
[11] P. J. FORRESTER and E. M. RAINS, Inter-relationships between orthogonal, unitary and symplectic matrix ensembles, preprint, arXiv:solv-int/9907008, to appear in Random Matrix Models and Their Applications, Math. Sci. Res. Inst. Publ.
[12] E. R. GANSNER, Matrix correspondences of plane partitions, Pacific J. Math. 92 (1981), 295–315. MR 82f:05009
[13] I. M. GESSEL, Symmetric functions and P-recursiveness, J. Combin. Theory Ser. A 53 (1990), 257–285. MR 91c:05190
[14] B. GORDON, Notes on plane partitions, V, J. Combin. Theory Ser. B 11 (1971), 157–168. MR 43:6175
[15] B. GORDON and L. HOUTEN, Notes on plane partitions, II, J. Combinatorial Theory 4 (1968), 81–99. MR 36:1339
[16] I. P. GOULDEN, A linear operator for symmetric functions and tableaux in a strip with given trace, Discrete Math. 99 (1992), 69–77. MR 93f:05100
[17] M. GRASSL, M. RÖTTELER, and T. BETH, Computing local invariants of quantum-bit systems, Phys. Rev. A (3) 58 (1998), 1833–1839. MR 99d:81026
[18] C. GREENE, An extension of Schensted's theorem, Adv. Math. 14 (1974), 254–265. MR 50:6874
[19] F. D. GROSSHANS, G.-C. ROTA, and J. A. STEIN, Invariant Theory and Superalgebras, CBMS Regional Conf. Ser. in Math. 69, Amer. Math. Soc., Providence, 1987. MR 88k:15035
[20] M. ISHIKAWA, S. OKADA, and M. WAKAYAMA, Applications of minor-summation formula, I: Littlewood's formulas, J. Algebra 183 (1996), 193–216. MR 97e:05197
[21] M. ISHIKAWA and M. WAKAYAMA, Minor summation formula of Pfaffians, Linear and Multilinear Algebra 39 (1995), 285–305. MR 96m:15010
[22] M. JIMBO, A q-analogue of U(gl(N + 1)), Hecke algebra, and the Yang-Baxter equation, Lett. Math. Phys. 11 (1986), 247–252. MR 87k:17011
[23] K. JOHANSSON, On random matrices from the compact classical groups, Ann. of Math. (2) 145 (1997), 519–545. MR 98e:60016
[24] K. JOHANSSON, Shape fluctuations and random matrices, Comm. Math. Phys. 209 (2000), 437–476. MR CMP 1 737 991
[25] D. E. KNUTH, Permutations, matrices and generalized Young tableaux, Pacific J. Math. 34 (1970), 709–727. MR 42:7535
[26] D. E. KNUTH, The Art of Computer Programming, Vol. 3: Sorting and Searching, 2d ed., Addison-Wesley Ser. Comput. Sci. Inform. Process., Addison-Wesley, Reading, Mass., 1973. MR 56:4281
[27] C. KRATTENTHALER, Identities for classical group characters of nearly rectangular shape, J. Algebra 209 (1998), 1–64. MR 2000a:05218
[28] B. LECLERC and J.-Y. THIBON, The Robinson-Schensted correspondence, crystal bases, and the quantum straightening at q = 0, Electron. J. Combin. 3, no. 2 (1996), The Foata Festschrift, R11, http://www.combinatorics.org/ MR 99c:05203
[29] I. G. MACDONALD, Symmetric Functions and Hall Polynomials, 2d ed., Oxford Math. Monogr., Oxford Univ. Press, New York, 1995. MR 96h:05207
[30] A. M. ODLYZKO, B. POONEN, H. WIDOM, and H. S. WILF, On the distribution of longest increasing subsequences in random permutations, in preparation.
[31] S. OKADA, Applications of minor summation formulas to rectangular-shaped representations of classical groups, J. Algebra 205 (1998), 337–367. MR 99g:20081
[32] E. M. RAINS, Increasing subsequences and the classical groups, Electron. J. Combin. 5 (1998), R12, http://www.combinatorics.org/ MR 98k:05146
[33] E. M. RAINS, A mean identity for longest increasing subsequence problems, preprint, arXiv:math.CO/0004082.
[34] A. REGEV, The representations of S_n and explicit identities for P.I. algebras, J. Algebra 51 (1978), 25–40. MR 57:9745
[35] J. B. REMMEL, "The combinatorics of (k, l)-hook Schur functions" in Combinatorics and Algebra (Boulder, Colo., 1983), ed. C. Greene, Contemp. Math. 34, Amer. Math. Soc., Providence, 1984, 253–287. MR 86h:05012
[36] G.-C. ROTA and B. STURMFELS, "Introduction to invariant theory in superalgebras" in Invariant Theory and Tableaux (Minneapolis, 1988), ed. D. Stanton, IMA Vol. Math. Appl. 19, Springer, New York, 1990, 1–35. MR 91f:05118
[37] D. W. STANTON and D. E. WHITE, A Schensted algorithm for rim hook tableaux, J. Combin. Theory Ser. A 40 (1985), 211–247. MR 87c:05014
[38] J. R. STEMBRIDGE, Nonintersecting paths, Pfaffians, and plane partitions, Adv. Math. 83 (1990), 96–131. MR 91h:05014
[39] G. SZEGŐ, Orthogonal Polynomials, 4th ed., Amer. Math. Soc. Colloq. Publ. 23, Amer. Math. Soc., Providence, 1975. MR 51:8724
[40] E. THOMA, Die unzerlegbaren, positiv-definiten Klassenfunktionen der abzählbar unendlichen, symmetrischen Gruppe, Math. Z. 85 (1964), 40–61. MR 30:3382
[41] C. A. TRACY and H. WIDOM, Correlation functions, cluster functions, and spacing distributions for random matrices, J. Statist. Phys. 92 (1998), 809–835. MR 99m:82030
[42] M. A. A. VAN LEEUWEN, The Robinson-Schensted and Schützenberger algorithms, an elementary approach, Electron. J. Combin. 3, no. 2 (1996), The Foata Festschrift, R15, http://www.combinatorics.org/ MR 97e:05200
[43] K. P. VO and R. WHITNEY, Tableaux and matrix correspondences, J. Combin. Theory Ser. A 35 (1983), 328–359. MR 85e:05016
Baik
Courant Institute of Mathematical Sciences, New York University, New York, New York 10012-1185, USA; current: Mathematics Department, Princeton University, Princeton, New Jersey 08544-1000, USA; School of Mathematics, Institute for Advanced Study, Princeton, New Jersey 08540, USA.

Rains
AT&T Labs-Research, Shannon Laboratory, Florham Park, New Jersey 07932-0971, USA.
DUKE MATHEMATICAL JOURNAL © 2001 Vol. 109, No. 1
ON A CONJECTURE OF JACQUET ABOUT DISTINGUISHED REPRESENTATIONS OF GL(n)
DIPENDRA PRASAD

Abstract
In this paper we prove a conjecture of Jacquet about supercuspidal representations of GL_n(K) distinguished by GL_n(k), or by U_n(k), for K a quadratic unramified extension of a non-Archimedean local field k.

1. Introduction
Let G be a reductive algebraic group over a non-Archimedean local field k. Let K be a quadratic field extension of k. There has recently been much interest in trying to classify representations of G(K) which have G(k)-invariant linear forms. The initial impetus for such a study came from the work of G. Harder, R. Langlands, and M. Rapoport [HLR] for G = GL_2, which was done in the global context. H. Jacquet (cf. [JY1], [JY2]) has made the following conjectures for G = GL_n and G = U_n, where U_n is the unique quasi-split unitary group in n variables over k which is split over K. We also refer to the paper [F] by Y. Flicker.

CONJECTURE 1
Let π be an irreducible admissible representation of GL_n(K), where K is a quadratic extension of a non-Archimedean local field k. Assume that the central character of π restricted to k^* is trivial. Then we have the following.
(1) If n is odd, π^σ ≅ π^* if and only if π has a GL_n(k)-invariant linear form, where π^σ denotes the representation of GL_n(K) obtained from π by using the automorphism of GL_n(K) coming from the Galois automorphism σ of K over k.
(2) If n is even, π^σ ≅ π^* if and only if either π has a GL_n(k)-invariant linear form or π has a linear form ℓ with ℓ(gv) = ω_{K/k}(det g) ℓ(v) for g ∈ GL_n(k) and v ∈ π, where ω_{K/k} is the quadratic character of k^* associated to the extension K of k.

Received 19 July 2000. Revision received 2 October 2000.
2000 Mathematics Subject Classification. Primary 22E50; Secondary 22E35, 11F70.
CONJECTURE 2
Let π be an irreducible admissible representation of GL_n(K), where K is a quadratic extension of a non-Archimedean local field k. Then π^σ ≅ π if and only if π has a U_n(k)-invariant linear form for U_n, the unique quasi-split unitary group in n variables over k which is split over K, and where π^σ denotes the representation of GL_n(K) obtained from π by using the automorphism of GL_n(K) coming from the Galois automorphism σ of K over k.
The aim of this paper is to prove these conjectures for supercuspidal representations of GL_n(K) when K is an unramified quadratic extension of k. The analogues of these conjectures in the case of finite fields are due to R. Gow [G] (cf. also [P2]). The proof of these conjectures is accomplished via the methods of our earlier paper [P2], in which we treated a similar question for certain representations of finite groups of Lie type, together with the theorem of C. Bushnell and P. Kutzko that realizes any supercuspidal representation of GL_n by compact induction from a finite-dimensional representation of an open subgroup which is compact modulo the center. Since realization by compact induction is expected to hold for supercuspidal representations in great generality, it appears that the methods of this paper, which treat representations of compact open subgroups via finite groups of Lie type (though usually not reductive ones), may have greater applicability.

We note that in an ongoing work of J. Hakim and F. Murnaghan (cf. [HM]), the authors are able to obtain certain results for both the ramified and unramified quadratic extension K of k by an elaborate structure theory of the representations of GL_n, but they are not able to get results as complete as those we obtain here for the quadratic unramified case. We refer to the paper of Hakim and Z. Mao [HMa] for some of the earlier results in this direction.

2. Recollection of earlier results
We begin by recalling the theorem proved in our earlier paper [P2]. Let G(F) be the F-rational points of a connected algebraic group G over a finite field F. Let E be the quadratic field extension of F. We recall that in [P2] we called a representation of G = G(E) stable if its character takes the same value on any two elements in G(E) which become conjugate when we extend the field E to its algebraic closure.

Let σ denote the automorphism of G(E) obtained from the Galois automorphism of E over F, and let π^σ denote the representation of G(E) obtained from a representation π of G(E) by using the automorphism σ of the group G(E).
THEOREM 1
For a connected algebraic group G over F, an irreducible stable representation π of G(E) has a fixed vector for G(F) if and only if π^σ ≅ π^*.
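The simplest instance of Theorem 1 is G = GL_1, where every character is trivially stable and the statement can be checked by direct enumeration. The sketch below assumes the standard facts that E^* is cyclic of order q² − 1 with some generator g, that F^* is generated by g^{q+1}, and that σ acts by x ↦ x^q; the characters are written χ_a(g^m) = exp(2πi·am/(q² − 1)). The function names are hypothetical.

```python
def chi(a, m, order):
    """Exponent of chi_a(g^m), i.e. chi_a(g^m) = exp(2*pi*i * value / order)."""
    return (a * m) % order


def trivial_on_subfield(a, q):
    """True if chi_a is trivial on F^* = {g^(m(q+1)) : 0 <= m < q-1}."""
    order = q * q - 1
    return all(chi(a, m * (q + 1), order) == 0 for m in range(q - 1))


def is_conjugate_dual(a, q):
    """True if chi_a composed with sigma (x -> x^q) equals the dual character chi_{-a}."""
    order = q * q - 1
    return all(chi(a, q * m, order) == chi(-a, m, order) for m in range(order))


def check(q):
    """Compare the two sets of characters of E^* for a prime power q."""
    order = q * q - 1
    fixed = {a for a in range(order) if trivial_on_subfield(a, q)}
    dual = {a for a in range(order) if is_conjugate_dual(a, q)}
    return fixed == dual, len(fixed)


for q in (2, 3, 4, 5, 7, 8, 9):
    print(q, check(q))
```

For each q the two sets coincide and have q + 1 elements, matching the index of F^* in E^*. Both conditions reduce to (q + 1)a ≡ 0 mod q² − 1, which is the one-dimensional shadow of the equivalence in the theorem.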
Remark. For a Hermitian matrix J in GL_n(E), the unitary group U_n(J) can be defined to be the set of matrices g ∈ GL_n(E) such that g J σ(ᵗg) = J, or g = J σ(ᵗg^{-1}) J^{-1}. Thus the unitary group U_n(J) can be considered as the fixed points of the involution g → J σ(ᵗg^{-1}) J^{-1} on GL_n(E), which is to be thought of as the new Frobenius action on GL_n(E) whose fixed point subgroup is U_n(J). Under this Frobenius action the transform of π, to be denoted by π^Fr, becomes (π^σ)^*; hence the condition π^Fr ≅ π^* becomes, for unitary groups, (π^σ)^* ≅ π^* or π^σ ≅ π. This explains the difference between the conditions in Conjectures 1 and 2, due to Jacquet, for a representation of GL_n(K) to have a GL_n(k)-invariant linear form or to have a U_n(k)-invariant linear form.

We apply this theorem to prove the following theorem.

THEOREM 2
Let K be a quadratic unramified extension of a non-Archimedean local field k. Suppose that O_K and O_k are the rings of integers in the two fields. Let G be either GL_n over O_k or the unitary group over O_k defined in terms of a nondegenerate Hermitian form over O_K. (A nondegenerate Hermitian form over O_K means, in concrete terms, that the matrix of the Hermitian form has entries in O_K and that its determinant is a unit, i.e., an element of O_K^*.) Then an irreducible representation π of GL_n(O_K) has a fixed vector for G(O_k) if and only if π^σ ≅ π^*. Here the action of σ on representations of GL_n(K) is the standard Galois action for G = GL_n and is the standard Galois action composed with the dual for G, the unitary group.
Proof
An irreducible representation of GL_n(O_K) is actually a representation of GL_n(O_K/π_k^m) for some integer m ≥ 1, where π_k denotes a uniformizing parameter of O_k and hence also of O_K. It is a consequence of a theorem of M. Greenberg (cf. [G1], [G2]), generalizing the notion of Witt group schemes, that the group G(O_k/π_k^m) is representable by a connected algebraic group over the finite field O_k/π_k in the sense that there is a connected algebraic group G_{n,m} over the finite field O_k/π_k such that for any finite field extension E of O_k/π_k, G_{n,m}(E) = G(O_L/π_k^m), where L is the unique unramified extension of k which corresponds to the extension E of the residue field O_k/π_k of k.

The proof of this theorem will therefore follow from Theorem 1 if we can check that all representations of GL_n(O_K/π_k^m) are stable. This is because in fact in G_{n,m} there is no difference between conjugacy and stable conjugacy. This follows, for instance, by an application of Lang's theorem, as the centralizer of any element in G_{n,m} is connected. To substantiate our claim about the connectedness of the centralizer, we only point out that the invertible elements in any O_K-subalgebra (not necessarily free over O_K/π_k^n) of the matrix algebra M_n(O_K/π_k^n) define a connected group.

3. The theorem of Bushnell and Kutzko
The following theorem is due to Bushnell and Kutzko [BK, Chap. 6].

THEOREM 3
Given a supercuspidal representation π of GL_n(K), there exists an irreducible representation Π of K^* GL_n(O_K) such that
\[
\mathrm{Ind}_{K^{*}\mathrm{GL}_n(O_K)}^{\mathrm{GL}_n(K)} \Pi \cong \sum_{\mu} \pi\otimes\mu,
\]
where the characters μ of K^* are certain distinct unramified characters of K^* with μ^n = 1 which form a group under multiplication; the representations π ⊗ μ are distinct for distinct characters μ.

Proof
Since this is not the usual form of the theorem of Bushnell and Kutzko, we give a detailed proof. We recall that Bushnell and Kutzko realized any supercuspidal representation of GL_n(K) as an induced representation from a maximal compact modulo center subgroup of GL_n(K),
\[
\pi \cong \mathrm{ind}_{\mathcal{K}}^{\mathrm{GL}_n(K)} \Lambda,
\]
for a certain maximal compact modulo center subgroup 𝒦 of GL_n(K) which can be written as 𝒦 = 𝒦_0 · E^* with 𝒦_0 ⊂ GL_n(O_K), a normal subgroup of 𝒦, and E a field extension of K of degree n. The mapping val ∘ det on 𝒦 induces an isomorphism
\[
\mathcal{K}/(\mathcal{K}_0\cdot K^{*}) \cong f\mathbb{Z}/n\mathbb{Z} \cong \mathbb{Z}/e\mathbb{Z},
\]
where fZ is the image of val ∘ det on E^* and e is the ramification index of E. It follows that
\[
\mathrm{Ind}_{K^{*}\mathcal{K}_0}^{\mathcal{K}} \Lambda|_{K^{*}\mathcal{K}_0} = \sum_{\mu} \Lambda\otimes\mu
\]
for unramified characters μ of K^* coming from the characters of Z/eZ. Hence
\[
\begin{aligned}
\sum_{\mu} \pi\otimes\mu
 &= \mathrm{ind}_{K^{*}\mathcal{K}_0}^{\mathrm{GL}_n(K)} \Lambda|_{K^{*}\mathcal{K}_0} \\
 &= \mathrm{ind}_{K^{*}\mathrm{GL}_n(O_K)}^{\mathrm{GL}_n(K)} \Bigl[\mathrm{ind}_{K^{*}\mathcal{K}_0}^{K^{*}\mathrm{GL}_n(O_K)} \Lambda|_{K^{*}\mathcal{K}_0}\Bigr] \\
 &= \mathrm{ind}_{K^{*}\mathrm{GL}_n(O_K)}^{\mathrm{GL}_n(K)} \Pi
\end{aligned}
\]
with
\[
\Pi = \mathrm{ind}_{K^{*}\mathcal{K}_0}^{K^{*}\mathrm{GL}_n(O_K)} \Lambda|_{K^{*}\mathcal{K}_0}.
\]
That the representations π ⊗ μ are distinct follows from the uniqueness of the representation Λ of 𝒦 with π ≅ ind_𝒦^{GL_n(K)} Λ together with the property of Λ that it is irreducible when restricted to 𝒦_0.

4. Some known results
In this section we recall the following lemma due to Flicker (cf. [F]) about the double coset decomposition of GL_n(K) by GL_n(k), whose simple proof we supply for completeness. Here K is a separable quadratic extension of k.

LEMMA 1
For any g in GL_n(K), σ(g^{-1}) = g_1 g g_2 for matrices g_1, g_2 ∈ GL_n(k).
Proof
It suffices to prove that, given any g in GL_n(K), there is g_1 in GL_n(k) such that σ(g) g_1 g belongs to GL_n(k). For this it suffices to prove that the equation σ(g) X g = g X σ(g) has a solution for X in GL_n(k). It is clear that the set of solutions in the matrix algebra M_n(K) forms a vector space V that is stable under the Galois involution, hence defined over k, and is nonzero (as it contains, for instance, g^{-1}). Since the determinant takes nonzero values on V, it does so over the k-structure V_k of V too. (A nonzero polynomial cannot vanish on an affine space over an infinite field!)

The following two corollaries were also obtained by Flicker via standard techniques.

COROLLARY 1
The space of GL_n(k)-invariant linear forms on any irreducible admissible representation of GL_n(K) is at most one-dimensional.

COROLLARY 2
If an irreducible representation π of GL_n(K) has a GL_n(k)-invariant linear form, then one has π^σ ≅ π^*.
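The argument in the proof of Lemma 1 can be traced on a small example. The sketch below is an illustration only: it replaces K/k by Q(i)/Q (an assumption made for exact arithmetic; the proof uses only that k is infinite and K/k is a separable quadratic extension), works with 2 × 2 matrices, and searches for an invertible rational X in the Q-span of g^{-1} + σ(g)^{-1} and i(g^{-1} − σ(g)^{-1}), both of which lie in the solution space V. The class and function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


class Gauss:
    """Exact element re + im*i of Q(i); Q(i)/Q stands in for K/k (an assumption)."""

    def __init__(self, re, im=0):
        self.re, self.im = Fraction(re), Fraction(im)

    def __add__(self, other):
        return Gauss(self.re + other.re, self.im + other.im)

    def __sub__(self, other):
        return Gauss(self.re - other.re, self.im - other.im)

    def __neg__(self):
        return Gauss(-self.re, -self.im)

    def __mul__(self, other):
        return Gauss(self.re * other.re - self.im * other.im,
                     self.re * other.im + self.im * other.re)

    def __eq__(self, other):
        return self.re == other.re and self.im == other.im

    def conj(self):
        return Gauss(self.re, -self.im)

    def inverse(self):
        norm = self.re ** 2 + self.im ** 2
        return Gauss(self.re / norm, -self.im / norm)


def mat_mul(a, b):
    return [[a[r][0] * b[0][c] + a[r][1] * b[1][c] for c in range(2)] for r in range(2)]


def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def mat_inv(a):
    d = det(a).inverse()
    return [[a[1][1] * d, -a[0][1] * d], [-a[1][0] * d, a[0][0] * d]]


def sigma(a):
    """Galois involution (complex conjugation) applied entrywise."""
    return [[x.conj() for x in row] for row in a]


def find_rational_solution(g):
    """Search for an invertible rational X with sigma(g) X g = g X sigma(g)."""
    u, v = mat_inv(g), mat_inv(sigma(g))
    i = Gauss(0, 1)
    for s, t in product(range(-2, 3), repeat=2):
        x = [[Gauss(s) * (u[r][c] + v[r][c]) + Gauss(t) * i * (u[r][c] - v[r][c])
              for c in range(2)] for r in range(2)]
        if det(x) != Gauss(0):
            return x
    return None


g = [[Gauss(1, 1), Gauss(2)], [Gauss(0, 1), Gauss(3, -1)]]  # sample matrix, det = 4
g1 = find_rational_solution(g)
h = mat_mul(mat_mul(sigma(g), g1), g)    # should have rational entries
g2 = mat_inv(h)
lhs = mat_inv(sigma(g))                  # sigma(g^{-1})
rhs = mat_mul(mat_mul(g1, g), g2)        # g1 * g * g2
print(all(x.im == 0 for row in g1 + h for x in row), lhs == rhs)
```

The equality σ(g^{-1}) = g_1 g g_2 holds formally once g_2 is defined as the inverse of σ(g) g_1 g; the content of the lemma is that g_1 and σ(g) g_1 g can both be taken rational, which is what the first printed value checks.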
5. Main theorem
We now give the proof of Jacquet's conjecture when K is an unramified quadratic extension of k.

THEOREM 4
Let π be an irreducible admissible supercuspidal representation of GL_n(K), where K is an unramified quadratic extension of a non-Archimedean local field k. Assume that the central character of π restricted to k^* is trivial. Then we have the following.
(1) If n is odd, π^σ ≅ π^* if and only if π has a GL_n(k)-invariant linear form, where π^σ denotes the representation of GL_n(K) obtained from π by using the automorphism of GL_n(K) coming from the Galois automorphism σ of K over k.
(2) If n is even, π^σ ≅ π^* if and only if either π has a GL_n(k)-invariant linear form or π has a linear form ℓ with ℓ(gv) = ω_{K/k}(det g) ℓ(v) for g ∈ GL_n(k) and v ∈ π, where ω_{K/k} is the quadratic character of k^* associated to the extension K of k.
Proof
From Corollary 2 we already know that if π has a GL_n(k)-invariant linear form, then π^σ ≅ π^*. It therefore suffices to prove the converse statement. From the theorem of Bushnell and Kutzko recalled in Section 3, there exists an irreducible representation Π of K^* GL_n(O_K) such that
\[
\mathrm{Ind}_{K^{*}\mathrm{GL}_n(O_K)}^{\mathrm{GL}_n(K)} \Pi \cong \sum_{\mu} \pi\otimes\mu,
\]
where the characters μ of K^* are certain distinct unramified characters of K^*. The isomorphism π^σ ≅ π^* implies that the same is true for Π: Π^σ ≅ Π^*. To prove this, note that Bushnell and Kutzko work with "simple types", say (J, λ), and prove a uniqueness theorem for these up to G-conjugacy. The isomorphism of π with π^{σ*} would, however, only give an element g in G = GL_n(K) which preserves J under inner conjugation and takes λ to λ^{σ*}. This g has the property that g² takes λ to λ and hence belongs to J. It follows that the group 𝒥 generated by g and J is compact modulo center and that the induction of λ to 𝒥, say Λ, has the property that Λ ≅ Λ^{σ*}. Induction of Λ to a maximal compact modulo center subgroup will continue to have this property, and hence the same is true for Π by the proof of Theorem 3.
From a simple application of Mackey theory about the restriction of an induced representation to a subgroup, it follows that
\[
\mathrm{Res}_{\mathrm{GL}_n(k)}\, \mathrm{Ind}_{K^{*}\mathrm{GL}_n(O_K)}^{\mathrm{GL}_n(K)} \Pi
 = \mathrm{Ind}_{k^{*}\mathrm{GL}_n(O_k)}^{\mathrm{GL}_n(k)} \Pi|_{k^{*}\mathrm{GL}_n(O_k)} \oplus \cdots,
\]
where the terms omitted in the above expression come from the nontrivial double cosets of K^* GL_n(O_K) \ GL_n(K) / GL_n(k). Noting that Π restricted to k^* is trivial, Theorem 2 together with an application of Frobenius reciprocity implies that one of the twists of π by an unramified character μ of K^* has a GL_n(k)-invariant form. We claim that the only μ for which π ⊗ μ could possibly have a GL_n(k)-invariant linear form are the unramified characters μ of order at most 2. For this we note by Corollary 2 that if π ⊗ μ has a GL_n(k)-invariant linear form, then (π ⊗ μ)^σ ≅ (π ⊗ μ)^*, or π^σ ≅ π^* ⊗ μ^{-2}. Since we are already given that π^σ ≅ π^*, it follows that π ≅ π ⊗ μ². But the twists that appear in Theorem 3 (due to Bushnell and Kutzko) are all distinct. Hence μ² = 1. In particular, if n is odd, μ is trivial, and hence a representation π of GL_n(K), n odd, with π^σ ≅ π^*, has a GL_n(k)-invariant linear form. If n is even, then either π or π ⊗ ω_{K/k} has a GL_n(k)-invariant form.

THEOREM 5
Let π be an irreducible admissible supercuspidal representation of GL_n(K), where K is a quadratic unramified extension of a non-Archimedean local field k. Then π^σ ≅ π if and only if π has a U_n(k)-invariant linear form for U_n, the unique quasi-split unitary group in n variables over k which is split over K, and where π^σ denotes the representation of GL_n(K) obtained from π by using the automorphism of GL_n(K) coming from the Galois automorphism σ of K over k.
Proof
If a representation π of GL_n(K) has a U_n(k)-invariant linear form, then one has π^σ ≅ π. This is proved via global methods by embedding a representation of GL_n(K) which has a nontrivial U_n(k)-invariant linear form into a global automorphic representation with nonzero period on the unitary group and then appealing to a global theorem. We refer to [F], [HF], and [H] for various contexts in which such a result has been proved and to the most recent and most complete work by Jacquet in [J].

It therefore suffices to prove that supercuspidal representations π of GL_n(K) with π^σ ≅ π carry a U_n(k)-invariant linear form. The proof of the previous theorem constructs in this case a U_n(k)-invariant linear form on some twist π ⊗ μ of π by an unramified character of K^*. Notice that if π ⊗ μ has a U_n(k)-invariant linear form, then π itself carries a U_n(k)-invariant linear form. This follows because the determinant map on GL_n(K), when restricted to U_n(k), takes values in U_1 = {z ∈ K^* | zσ(z) = 1}, on which an unramified character such as μ must be trivial.

Remark. It would be nice to be able to carry out a generalization of the method used here to the case of ramified quadratic extensions. One of the difficulties in this case, which was also encountered in [P1] but taken care of there by explicit character formulae, is that the unique invariant linear form that one wants to construct does not arise from the trivial double coset used in the arguments of the above theorems. Thus although Theorem 2 is not true for ramified field extensions, as one can easily see, Conjectures 1 and 2 are expected to be true.

Remark. It is expected that if π is a supercuspidal representation of GL_n(K) with π^σ ≅ π^*, then either π or π ⊗ ω_{K/k} has a GL_n(k)-invariant linear form, where ω_{K/k} is the quadratic character of k^* associated to the extension K of k, but that the two possibilities do not hold simultaneously. Also, it is expected that if π is a supercuspidal representation of GL_n(K) with π^σ ≅ π, then π has a U_n(k)-invariant form which is unique up to scalars. Both of these expectations are false for principal series representations, as can be easily seen. Hence methods of Gelfand pairs are inadequate to prove these multiplicity-1 expectations. Having constructed the desired linear forms, what needs to be proved is that nontrivial double cosets do not contribute to invariant linear forms. This can be done by the property of inducing data in many cases (cf. [HM], [HMa]).

6. Question of central characters
In Conjecture 1 and Theorem 4, we restricted ourselves to representations of GL_n(K) whose central character restricted to k^* is trivial. One can in fact treat the more general situation that might arise from the condition π^σ ≅ π^*. Observe that if π^σ ≅ π^*, then the restriction of the central character of π to k^* is either trivial or is ω_{K/k}.
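The observation about central characters can be spelled out as follows. This is a sketch that writes ω_π for the central character of π (notation introduced here for convenience) and uses the standard fact from local class field theory that the norm group N_{K/k}(K^*) has index 2 in k^*, with ω_{K/k} the nontrivial character of the quotient.

```latex
\begin{align*}
\pi^{\sigma} \cong \pi^{*}
 &\implies \omega_{\pi}(\sigma(z)) = \omega_{\pi}(z)^{-1} \quad (z \in K^{*}) \\
 &\implies \omega_{\pi}\bigl(z\,\sigma(z)\bigr) = 1 \quad (z \in K^{*}) \\
 &\implies \omega_{\pi}\big|_{N_{K/k}(K^{*})} = 1 \\
 &\implies \omega_{\pi}\big|_{k^{*}} \in \{\,1,\ \omega_{K/k}\,\}.
\end{align*}
```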
Fix a character χ of K^* whose restriction to k^* is ω_{K/k}. It is easy to see that twisting by the character χ preserves the condition π^σ ≅ π^*, and if n is odd, it takes representations π whose central character restricted to k^* is trivial to representations π ⊗ χ whose central character restricted to k^* is nontrivial, and vice versa. It is clear that if π has a GL_n(k)-invariant linear form, then for n odd, π ⊗ χ has a linear form on which GL_n(k) operates by ω_{K/k}. Hence Theorem 4 implies the following slightly more general theorem.

THEOREM 6
A representation π of GL_n(K) for K a quadratic unramified extension of k and n odd, with π^σ ≅ π^*, has a GL_n(k)-invariant linear form if and only if its central character restricted to k^* is trivial. If the central character of π restricted to k^* is ω_{K/k}, then π has a linear form ℓ : π → C with ℓ(gv) = ω_{K/k}(det g) ℓ(v) for v ∈ π and for g in GL_n(k).

For n even, we have the following theorem.

THEOREM 7
For K a quadratic extension of k and n even, a supercuspidal representation π of GL_n(K) with π^σ ≅ π^* has trivial central character when restricted to k^*.
Proof
The analogous result for irreducible representations of the Weil group W_K is a simple group-theoretic fact proved in J. Rogawski's book (cf. [R, Lemma 15.1.2(b)]). The result then follows from the local Langlands conjecture proved by Harris, Taylor, and Henniart.

7. A conjecture
The method of this paper, which tries to retrieve information on representations of a p-adic group via its restriction to compact open subgroups, does not apply to representations other than supercuspidals, most notably to discrete series representations that are not supercuspidal. Based on what is expected for GL_n, it is tempting to speculate about at least one general class of representations as to what may be expected in general. In this section we make a conjecture about when the Steinberg representation of G(K) has a G(k)-invariant linear form in the case when G is a quasi-split reductive group over a non-Archimedean local field k. A particular case of the conjecture below is that, for a simply connected semisimple quasi-split group G over a local field k, the Steinberg representation of G(K) carries a unique G(k)-invariant linear form. This is not the case for general quasi-split reductive groups, and we make a precise conjecture below.

Observe that if there is an exact sequence of algebraic groups 1 → A → G → G′ → 1 with A a central subgroup in a reductive algebraic group G whose derived subgroup is quasi-split over k, then the k-rational points of a flag variety G/P of G can be identified with the k-rational points of a flag variety G′/P′ of G′. It follows that the Steinberg representation of G(k) is the restriction to G(k) of the Steinberg representation of G_ad(k), where G_ad is the group G divided by its center Z(G). This actually gives an extra structure to the Steinberg representation of G(k) since G_ad(k) is in general larger than the image of G(k) in G_ad(k).
We now construct a natural character χ_K on G(k) with values in Z/2 associated to any quadratic extension K of k, where G is any reductive group over the local field k. We denote the simply connected cover of G_ad by G_sc, and we denote the center of G_sc by Z. By a theorem due to Kneser and Bruhat-Tits, the first Galois cohomology of G_sc vanishes. This gives rise to the following exact sequence of groups:
1 → Z(k) → G_sc(k) → G_ad(k) → H^1(k, Z) → 1.
It is known that G_sc(k)/Z(k) is its own derived subgroup if G_sc is not anisotropic; this is a consequence of the so-called Kneser-Tits problem, known to be true for all p-adic fields due to Platonov. Hence, from the exact sequence above, the character group of G_ad(k) can be identified with the character group of H^1(k, Z). By the Tate-Nakayama duality, the character group of H^1(k, Z) can be identified with H^1(k, Z^∨), where Z^∨ is the Cartier dual of Z.

Let G^∨ be the dual group of G_ad. So G^∨ is a complex semisimple simply connected group whose center is isomorphic to Z^∨(C). The group G^∨ comes equipped with the action of the Galois group of k via algebraic automorphisms on the complex group G^∨(C), and hence the center of G^∨(C), which as we have pointed out is Z^∨(C), gets a Galois action that is the same as it gets as the Cartier dual of Z. (In particular, Z^∨ is a constant group scheme over k for a semisimple split group G s .) It follows from the Jacobson-Morozov theorem that there is a homomorphism from SL_2(C) to G^∨(C) which takes a nontrivial unipotent of SL_2 to a regular unipotent in G^∨(C). Since the action of the Galois group of k preserves a based root datum in G^∨, there is a regular unipotent in G^∨(C) on which the Galois action is trivial. Hence the homomorphism from SL_2(C) to G^∨(C) can be assumed to be invariant under the Galois action.

Under this homomorphism the center of SL_2, consisting of ±1, goes to the center of G^∨ and thus canonically gives a Galois-invariant element in the center of G^∨ which is either trivial or is of order 2. (This is the element that decides whether an algebraic self-dual representation of G^∨ is orthogonal or symplectic (cf. [P3]).) The associated mapping from Z/2 to Z^∨ gives rise to a homomorphism from H^1(k, Z/2) to H^1(k, Z^∨). We now define an element in H^1(k, Z^∨) to be the image of the element in H^1(k, Z/2) which defines the quadratic extension K of k. This, as we saw earlier, defines a character, say χ_K, which is either trivial or of order 2 on the group G_ad(k) with values in Z/2 associated to any quadratic extension K of a local field k. If G is any reductive group over k, the natural map from G to G_ad, when composed with the character χ_K defined here for G_ad, thus defines a character on G(k) for any reductive group G. We are now ready to make our conjecture.
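As a hedged illustration of the construction, consider G = GL_n, so that G_ad = PGL_n and G^∨ = SL_n(C). The sketch below assumes the standard fact (not stated in the text) that the principal SL_2(C) → SL_n(C) is the symmetric power Sym^{n−1} of the two-dimensional representation, on which diag(t, t^{-1}) acts with eigenvalues t^{n−1}, t^{n−3}, ..., t^{−(n−1)}; it then evaluates the image of the central element −1. The function names are hypothetical.

```python
def sym_power_weights(n):
    """Exponents e with diag(t, 1/t) acting by t^e on Sym^(n-1) of C^2 (assumed model)."""
    return [n - 1 - 2 * i for i in range(n)]


def central_sign(n):
    """Scalar by which the image of -1 in SL_2 acts on Sym^(n-1): +1 or -1."""
    signs = {1 if w % 2 == 0 else -1 for w in sym_power_weights(n)}
    if len(signs) != 1:
        raise ValueError("the image of -1 should act as a scalar")
    return signs.pop()


for n in range(1, 7):
    print(n, central_sign(n))
```

The sign is (−1)^{n−1}: trivial for n odd and of order 2 for n even. Under the identifications in the text one would therefore expect χ_K to be trivial on PGL_n(k) for n odd and to be g ↦ ω_{K/k}(det g) for n even, which is consistent with the dichotomy between odd and even n in Conjecture 1.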
CONJECTURE 3
For a reductive algebraic group $G$ over a local field $k$ whose derived subgroup is quasi-split, the Steinberg representation of $G(K)$, $K$ a quadratic extension of $k$, carries a unique linear form $\ell$ such that
$$\ell(gv) = \chi_K(g)\,\ell(v)$$
for all $g \in G^{\mathrm{ad}}(k)$ and $v$ a vector in the Steinberg representation of $G(K)$. The Steinberg representation of $G(K)$ does not carry a $\chi$-invariant linear form for the action of the group $G^{\mathrm{ad}}(k)$ on the Steinberg representation of $G(K)$ for any other character $\chi$ of $G^{\mathrm{ad}}(k)$.

Remark. For $G = \mathrm{GL}_2(K)$, this conjecture follows from the results in [P1].

Acknowledgments. This paper was written at the Institut Henri Poincaré (IHP), where the author was visiting under a programme of the Ministry of Education of France. The author thanks the organizers of the special programme on automorphic forms at the IHP for the invitation to visit. The author would like to thank H. Jacquet, R. Kottwitz, P. Kutzko, J. Rogawski, and M.-F. Vigneras for encouraging words and for helpful remarks, J. Hakim and F. Murnaghan for telling him about their ongoing work, and J.-K. Yu for pointing out a subtlety in compact induction. Finally, the author thanks the referee for some very pertinent remarks.

References
[BK] C. BUSHNELL and P. KUTZKO, The Admissible Dual of GL(N) via Compact Open Subgroups, Ann. of Math. Stud. 129, Princeton Univ. Press, Princeton, 1993. MR 94h:22007
[F] Y. FLICKER, On distinguished representations, J. Reine Angew. Math. 418 (1991), 139–172. MR 92i:22019
[G] R. GOW, Two multiplicity-free permutation representations of the general linear group $\mathrm{GL}(n, q^2)$, Math. Z. 188 (1984), 45–54. MR 86a:20008
[G1] M. J. GREENBERG, Schemata over local rings, Ann. of Math. (2) 73 (1961), 624–648. MR 23:A3745
[G2] M. J. GREENBERG, Schemata over local rings, II, Ann. of Math. (2) 78 (1963), 256–266. MR 28:98
[H] J. HAKIM, Character relations for distinguished representations, Amer. J. Math. 116 (1994), 1153–1202. MR 95i:22026
[HF] J. HAKIM and Y. FLICKER, Quaternionic distinguished representations, Amer. J. Math. 116 (1994), 683–736. MR 95i:22028
[HMa] J. HAKIM and Z. MAO, Supercuspidal representations of GL(n) distinguished by a unitary subgroup, Pacific J. Math. 185 (1998), 149–162. MR 99j:22023
[HM] J. HAKIM and F. MURNAGHAN, Tame supercuspidal representations of GL(n) distinguished by a unitary group, in preparation.
[HLR] G. HARDER, R. LANGLANDS, and M. RAPOPORT, Algebraische Zyklen auf Hilbert-Blumenthal-Flächen, J. Reine Angew. Math. 366 (1986), 53–120. MR 87k:11066
[J] H. JACQUET, Factorization of period integrals, to appear in J. Number Theory.
[JY1] H. JACQUET and Y. YE, Une remarque sur le changement de base quadratique, C. R. Acad. Sci. Paris Sér. I Math. 311 (1990), 671–676. MR 92j:11046
[JY2] H. JACQUET and Y. YE, Distinguished representations and quadratic base change for GL(3), Trans. Amer. Math. Soc. 348 (1996), 913–939. MR 96h:11041
[P1] D. PRASAD, Invariant forms for representations of $\mathrm{GL}_2$ over a local field, Amer. J. Math. 114 (1992), 1317–1363. MR 93m:22011
[P2] D. PRASAD, Distinguished representations for quadratic extensions, Compositio Math. 119 (1999), 335–345. MR 2001b:22016
[P3] D. PRASAD, On the self-dual representations of a p-adic group, Internat. Math. Res. Notices 1999, 443–452. MR 2000d:22019
[R] J. ROGAWSKI, Automorphic Representations of Unitary Groups in Three Variables, Ann. of Math. Stud. 123, Princeton Univ. Press, Princeton, 1990. MR 91k:22037
Harish-Chandra Research Institute, Chhatnag Road, Jhusi, Allahabad, 211019, India
DUKE MATHEMATICAL JOURNAL Vol. 109, No. 1, © 2001
ON THE SLOPE FILTRATION THOMAS ZINK
Received 2 May 2000. Revision received 22 October 2000. 2000 Mathematics Subject Classification. Primary 14L05; Secondary 14F30.

Abstract
Let $X$ be a $p$-divisible group over a regular scheme $S$ such that the Newton polygon in each geometric point of $S$ is the same. Then there is a $p$-divisible group isogenous to $X$ which has a slope filtration.

1. Introduction
Let $X$ be a $p$-divisible group over a perfect field. The Dieudonné classification implies that $X$ is isogenous to a direct product of isoclinic $p$-divisible groups. We study what remains true if the perfect field is replaced by a ring $R$ such that $pR = 0$. Now let $X$ be a $p$-divisible group over $R$. Let us denote by $\mathrm{Fr}_X : X \to X^{(p)}$ the Frobenius homomorphism. We call $X$ isoclinic and slope divisible if there are natural numbers $r \ge 0$ and $s > 0$ such that
$$p^{-r}\,\mathrm{Fr}_X^s : X \to X^{(p^s)}$$
is an isomorphism. The rational number $r/s$ is called the slope of $X$. Then $X$ is isoclinic of slope $r/s$; that is, it is isoclinic of slope $r/s$ over each geometric point of $\mathrm{Spec}\,R$. If $R$ is a field, a $p$-divisible group is isoclinic if and only if it is isogenous to a $p$-divisible group that is isoclinic and slope divisible. It is stated in a letter of A. Grothendieck to I. Barsotti (see [G2]) that over a field $K = R$ any $p$-divisible group admits a slope filtration
$$0 = X_0 \subset X_1 \subset X_2 \subset \cdots \subset X_m = X. \tag{1}$$
This filtration is uniquely determined by the following properties: the inclusions are strict, and the factors $X_i/X_{i-1}$ are isoclinic $p$-divisible groups of slope $\lambda_i$ such that $1 \ge \lambda_1 > \cdots > \lambda_m \ge 0$. Moreover, the rational numbers $\lambda_i$ are uniquely determined. A proof of this statement was never published but can be found here. The heights of the factors and the numbers $\lambda_i$ determine the Newton polygon, and conversely. If we want a slope filtration over $R$, we have to assume that the Newton polygon is the same at every point of $\mathrm{Spec}\,R$. We say in this case that $X$ has a constant Newton polygon.

THEOREM
Let $R$ be a regular ring. Then any $p$-divisible group over $R$ with constant Newton polygon is isogenous to a $p$-divisible group $X$ which admits a strict filtration (1) such that the quotients $X_i/X_{i-1}$ are isoclinic and slope divisible of slope $\lambda_i$ with $1 \ge \lambda_1 > \cdots > \lambda_m \ge 0$.

In the case where $\dim R = 1$ and $R$ is finitely generated over a perfect field, the theorem was proved by N. Katz [K] using the crystalline theory. Our proof uses only Dieudonné theory over a perfect field. It is based on a purity result (see Proposition 5) that was suggested to us when reading the work of M. Harris and R. Taylor. Let $S$ be a regular scheme, and let $U$ be an open subset such that the codimension of the complement is greater than or equal to 2. Then we show that a $p$-divisible group over $U$ with constant Newton polygon extends up to isogeny to a $p$-divisible group over $S$. One might call this Nagata-Zariski purity for $p$-divisible groups. We note that there is a difficult purity result of A. de Jong and F. Oort which holds without the regularity assumption for any noetherian scheme $S$. It says that a $p$-divisible group $X$ over $S$, which has constant Newton polygon on $U$, has constant Newton polygon on $S$.

2. The étale part of a Frobenius module
We work over a base scheme $S$ over $\mathbb{F}_p$. The Frobenius morphism is denoted by $\mathrm{Frob}_S$.

Definition 1
Fix an integer $a > 0$. A Frobenius module over $S$ is a finitely generated locally free $\mathcal{O}_S$-module $\mathcal{M}$ together with a $\mathrm{Frob}_S^a$-linear map $\Phi : \mathcal{M} \to \mathcal{M}$.

There is an important case where the condition that $\mathcal{M}$ is locally free is automatically satisfied, namely, if $\Phi$ is a $\mathrm{Frob}_S^a$-linear isomorphism. This means that the linearization
$$\Phi^\sharp : \mathcal{O}_S \otimes_{\mathrm{Frob}_S^a,\,\mathcal{O}_S} \mathcal{M} \to \mathcal{M}$$
is an isomorphism. Let $R$ be a local ring with maximal ideal $\mathfrak{m}$. Assume that $R$ is $\mathfrak{m}$-adically separated.

LEMMA 2
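Let $M$ be a finitely generated $R$-module. Assume that there exists a $\mathrm{Frob}_S^a$-linear isomorphism $\Phi : M \to M$. Then $M$ is free.

Proof
We choose a minimal resolution of $M$,
$$0 \to U \to P \to M \to 0,$$
where $P$ is a finitely generated free $R$-module and $U \subset \mathfrak{m}P$. Since $R \otimes_{\mathrm{Frob}^a,R} P$ is a free $R$-module, the linearization $\Phi^\sharp$ extends to $R \otimes_{\mathrm{Frob}^a,R} P$; that is, we find a commutative diagram
$$\begin{array}{ccc}
R \otimes_{\mathrm{Frob}^a,R} P & \longrightarrow & R \otimes_{\mathrm{Frob}^a,R} M \\
\Big\downarrow{\scriptstyle \Phi^\sharp} & & \Big\downarrow{\scriptstyle \Phi^\sharp} \\
P & \longrightarrow & M
\end{array} \tag{2}$$
Since $P/\mathfrak{m}P \cong M/\mathfrak{m}M$, it follows by Nakayama that the left vertical arrow is surjective and hence an isomorphism. The diagram implies that $U = \Phi^\sharp(R \otimes_{\mathrm{Frob}^a,R} U)$ (with a small abuse of notation). Since $P$ is $\mathfrak{m}$-adically separated, it is enough to show that $U \subset \mathfrak{m}^n P$ for each number $n$. This is true for $n = 1$ by construction. We assume by induction that the inclusion is true for a given $n$, and we find
$$U \subset \Phi^\sharp(R \otimes_{\mathrm{Frob}^a,R} \mathfrak{m}^n P) \subset \Phi^\sharp(\mathfrak{m}^{np^a} \otimes_{\mathrm{Frob}^a,R} P) \subset \mathfrak{m}^{np^a} P.$$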
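The last chain of inclusions closes the induction because the exponent strictly grows. Spelled out (this is an elaboration of a step the proof leaves implicit, using only the assumption $a > 0$ from Definition 1 and the separatedness of $P$):

```latex
% Why the inclusion for n implies the inclusion for n+1, and why this gives freeness.
\[
  a \ge 1 \;\Longrightarrow\; p^a \ge 2 \;\Longrightarrow\; n p^a \ge n+1
  \;\Longrightarrow\; U \subset \mathfrak{m}^{np^a}P \subset \mathfrak{m}^{n+1}P .
\]
\[
  \text{Hence } U \subset \bigcap_{n\ge 1}\mathfrak{m}^nP = 0,
  \quad\text{so } P \xrightarrow{\ \sim\ } M \text{ and } M \text{ is free.}
\]
```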
To any Frobenius module we associate the following functor on the category of schemes $T \to S$:
$$C_{\mathcal{M}}(T) = \{x \in \Gamma(T, \mathcal{M}_T) \mid \Phi x = x\}.$$

PROPOSITION 3
The functor $C_{\mathcal{M}}$ is representable by a scheme that is étale and affine over $S$.

Proof
Since the functor is a sheaf for the flat (fppf) topology, the question is local on $S$. We may therefore assume that $S = \mathrm{Spec}\,R$ and that $\mathcal{M}$ is the sheaf associated to a free $R$-module $M$. We choose an isomorphism $M \cong R^n$ and write the operator $\Phi$ in matrix form:
$$\Phi x = U x^{(p^a)}, \quad x \in R^n.$$
Here $x$ is a column vector, and $x^{(p^a)}$ is the vector obtained by raising all components to the $p^a$th power. $U$ is a square matrix with coefficients in $R$. Let $A$ be an $R$-algebra.
We set $C_M(A) = C_{\mathcal{M}}(\mathrm{Spec}\,A)$. Then $C_M$ is just the functor of solutions of the equation
$$x = U x^{(p^a)}, \quad x \in A^n.$$
This functor is clearly a closed subscheme of the affine space $\mathbb{A}^n_R$. To show that $C_M$ is étale, one applies the infinitesimal criterion. Let $A \to \bar{A}$ be a surjection of $R$-algebras with kernel $\mathfrak{a}$ such that $\mathfrak{a}^2 = 0$. We have to show that the canonical map
$$C_M(A) \to C_M(\bar{A})$$
is bijective. We consider an element $\bar{x} \in C_M(\bar{A})$, and we lift it to an element $x$ of $A \otimes_R M \cong A^n$. We set $\rho = \Phi x - x \in \mathfrak{a} \otimes_R M$. Since $\Phi(\mathfrak{a} \otimes_R M) = 0$, we obtain
$$\Phi(x + \rho) = \Phi x = x + \rho.$$
This shows that $x + \rho \in C_M(A)$ is the unique lifting of $\bar{x}$.

To make life easier, let us assume that $S$ is an $\mathbb{F}_{p^a}$-scheme. Then $C_{\mathcal{M}}$ may be considered as a sheaf of $\mathbb{F}_{p^a}$-vector spaces. If $S$ is connected and $\eta \in S$ is a point, the natural map $C_{\mathcal{M}}(S) \to C_{\mathcal{M}}(\eta)$ is injective because $C_{\mathcal{M}}$ is unramified and separated over $S$ (see, e.g., [EGAIV, Proposition 17.4.9]). Let us assume that $S = \mathrm{Spec}\,K$ is the spectrum of an algebraically closed field. Let $(M, \Phi)$ be a Frobenius module over $K$. Then there is a unique decomposition
$$M = M^{\mathrm{bij}} \oplus M^{\mathrm{nil}} \tag{3}$$
into $\Phi$-invariant subspaces such that $\Phi$ is bijective on the first summand and nilpotent on the second summand. Moreover, by a theorem of Dieudonné (see [Z, Lemma 6.25]) we have an isomorphism
$$K \otimes_{\mathbb{F}_{p^a}} C_M(\mathrm{Spec}\,K) \to M^{\mathrm{bij}}. \tag{4}$$
Let us assume that $S = \mathrm{Spec}\,K$ is the spectrum of a separably closed field, and denote the algebraic closure by $\bar{K}$. Since $C_M(K) = C_M(\bar{K})$, the subspace $M^{\mathrm{bij}}$ is defined over $K$ by (4). Note that $M^{\mathrm{nil}}$ is not defined over $K$; for example, $M = K^{p^{-1}}$ and $\Phi = \mathrm{Frob}$. We note that the submodule $M^{\mathrm{bij}}$ is defined over any field $K$ by Galois descent (see [G2, B, Example 1]). If $K^s$ denotes the separable closure and $G$ denotes its Galois group over $K$, we set
$$M^{\mathrm{bij}} = \bigl(K^s \otimes_{\mathbb{F}_{p^a}} C_M(K^s)\bigr)^G.$$
This subspace is characterized as follows: on $M^{\mathrm{bij}}$ the operator $\Phi$ acts as a $\mathrm{Frob}^a$-linear isomorphism, and on the factor $M/M^{\mathrm{bij}}$ it acts nilpotently. We note that the functor $M \mapsto M^{\mathrm{bij}}$ is an exact functor in $M$. To see this it is enough to consider the case of an algebraically closed field $K$. With this assumption the result follows because the decomposition (3) is functorial in $M$. The same argument shows that the functor commutes with tensor products. Assume that $S = \mathrm{Spec}\,R$ and that $(M, \Phi)$ is a Frobenius module over $R$.

LEMMA 4
Assume that $\mathrm{Spec}\,R$ is connected. Then the natural map
$$R \otimes_{\mathbb{F}_{p^a}} C_M(R) \to M \tag{5}$$
is an injection onto a direct summand of $M$.

Proof
Since $\mathrm{Spec}\,R$ is connected, the natural map $C_M(R) \to C_M(R_{\mathfrak{p}})$ is injective for any prime ideal $\mathfrak{p}$ of $R$ (see [EGAIV, Proposition 17.4.9]). Therefore it is enough to show our statement for a local ring $R$ with maximal ideal $\mathfrak{m}$. Indeed, the question of whether the finitely generated quotient of (5) is projective is local. Since $R_{\mathfrak{p}} \otimes_{\mathbb{F}_{p^a}} C_M(R)$ is obviously a direct summand of $R_{\mathfrak{p}} \otimes_{\mathbb{F}_{p^a}} C_M(R_{\mathfrak{p}})$, we are reduced to the local case. In this case it is enough to show that the following map is injective:
$$R/\mathfrak{m} \otimes_{\mathbb{F}_{p^a}} C_M(R) \to M/\mathfrak{m}M.$$
Since the map $C_M(R) \to C_M(R/\mathfrak{m})$ is injective, we are reduced to the case where $R$ is a field. Then the injectivity follows from the considerations above.

Let $S = \mathrm{Spec}\,R$, where $R$ is a henselian local ring with maximal ideal $\mathfrak{m}$. Then there is a unique $\Phi$-invariant direct summand $L \subset M$ such that $\Phi$ is a $\mathrm{Frob}^a$-linear isomorphism on $L$ and is nilpotent on $M/(L + \mathfrak{m}M)$. We call $L$ the finite part. To show this, one reduces the problem by Galois descent (see [G3]) to the case where $R$ is strictly henselian. In this case we can set $L = R \otimes_{\mathbb{F}_{p^a}} C_M(R)$. We note also that taking the finite part $L$ is an exact functor in $M$. This functor also commutes with tensor products.

Let us return to the general situation of Definition 1. For each point $\eta$ of $S$ we define the function
$$\mu_{(\mathcal{M},\Phi)}(\eta) = \dim_{\mathbb{F}_{p^a}} (C_{\mathcal{M}})_{\bar{\eta}},$$
where $\bar{\eta}$ is some geometric point over $\eta$. If $\mu_{(\mathcal{M},\Phi)}(\eta) \ge k$, it stays bigger than or equal to $k$ in some neighbourhood of $\eta$. If this function is constant on $S$, there is a $\Phi$-invariant submodule $\mathcal{L}$ of $\mathcal{M}$, which is locally a direct summand, such that $\Phi$ is a $\mathrm{Frob}^a$-linear isomorphism on $\mathcal{L}$ and is locally on $S$ nilpotent on $\mathcal{M}/\mathcal{L}$. By this last property $\mathcal{L}$ is uniquely determined. For this result it is not necessary that $S$ is noetherian. Indeed, in this case the scheme $C$ associated to $(\mathcal{M}, \Phi)$ is finite étale since all geometric fibres have the same number of points (see [EGAIV, corollaire 18.2.9]). Then $C$ represents an étale sheaf on $S$ denoted by the same letter. In the sense of étale sheaves, we have
$$\mathcal{L} = \mathcal{O}_S \otimes_{\mathbb{F}_{p^a}} C.$$
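As a hedged illustration of the function $\mu$ (a toy example of ours, not taken from the text), consider the rank-one Frobenius module $M = R$ over an $\mathbb{F}_{p^a}$-algebra $R$, with $\Phi x = u\,x^{p^a}$ for a chosen element $u \in R$. The element $u$ and the rank-one setting are assumptions made only for this sketch.

```latex
% Rank-one toy example: M = R, \Phi x = u x^{p^a}, with u in R an assumed parameter.
\[
  C_M = \operatorname{Spec} R[x]/(u x^{p^a} - x), \qquad
  \frac{\partial}{\partial x}\bigl(u x^{p^a} - x\bigr) = p^a u\, x^{p^a-1} - 1 = -1 ,
\]
\[
  \mu_{(M,\Phi)}(\eta) =
  \begin{cases}
    1, & u(\eta) \neq 0 \quad (\text{the } p^a \text{ distinct roots of } u x^{p^a} = x \text{ over } \bar\eta),\\
    0, & u(\eta) = 0 \quad (\text{only } x = 0).
  \end{cases}
\]
```

The derivative is a unit, in line with Proposition 3, and the locus where $\mu \ge 1$ is the open set $D(u)$, in line with the semicontinuity statement. When $u$ is a unit, $\mu$ is constant equal to 1 and $\Phi$ is an isomorphism on all of $M$.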
If the scheme $S$ is perfect, the exact sequence
$$0 \to \mathcal{L} \to \mathcal{M} \to \mathcal{M}/\mathcal{L} \to 0$$
splits canonically. Indeed, it is enough to define this splitting in the case $S = \mathrm{Spec}\,R$. Then $\Phi : L \to L$ is bijective. Assume that $\Phi^n$ is zero on $M/L$ for some number $n$. Let $M^{\mathrm{nil}}$ be the kernel of $\Phi^n$ on $M$. Then the projection $M^{\mathrm{nil}} \to M/L$ is bijective. It is injective because $\Phi$ is bijective on $L$, so that $L \cap M^{\mathrm{nil}} = 0$. For surjectivity, let $x \in M$. Then $\Phi^n x \in L$. Since $\Phi$ is bijective on $L$, we find $y \in L$ with $\Phi^n y = \Phi^n x$. But then $x$ and $x - y \in M^{\mathrm{nil}}$ have the same image in $M/L$. This proves the assertion.

The following purity result is contained in Harris and Taylor [HT] in a special case.

PROPOSITION 5
Let $R$ be a noetherian local ring of dimension greater than or equal to 2. Assume that the function $\mu_{(M,\Phi)}$ is constant outside the closed point. Then $\mu_{(M,\Phi)}$ is constant.
Proof
We can assume that $R$ is a complete local ring with algebraically closed residue field. Let $S = \mathrm{Spec}\,R$, and let $U$ be the complement of the closed point $s \in S$. Since $C_M$ is étale over $R$, it admits a unique decomposition
$$C_M = C_M^f \amalg C_M^0,$$
where $C_M^f$ is finite and étale over $\mathrm{Spec}\,R$ and where $C_M^0$ has an empty special fibre. We note that $C_M^0$ is affine as a closed subscheme of $C_M$. We have to show that $C_M^0$ is empty. Let us assume the opposite. We consider the following function on $U$:
$$\sharp\, C^0_{M,\bar{\eta}} = \sharp\, C_{M,\bar{\eta}} - \sharp\, C^f_{M,\bar{\eta}}, \quad \eta \in U. \tag{6}$$
Here $\sharp$ denotes the number of points in the corresponding scheme. The first term on the right-hand side of (6) is by assumption constant on $U$, while the second term has this property for obvious reasons. Hence all geometric fibres of the map
$$C_M^0 \to U$$
have the same number of points. Together with our assumption that $C_M^0$ is not empty this shows that the last map is surjective. But this implies that $U$ is affine (see [EGAII, théorème 6.7.1]). Since $U$ is not affine (see [G3, Proposition 6.4]), we have a contradiction.
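The non-affineness of the punctured spectrum used at the end of this proof can be seen directly in the simplest case. The following sketch assumes, purely for illustration, that $R = k[[x, y]]$ for a field $k$ (a case not singled out in the text), so that $U = D(x) \cup D(y)$.

```latex
% Illustration under the assumption R = k[[x,y]]; U is the punctured spectrum.
\[
  \Gamma(U,\mathcal{O}_U) = R_x \cap R_y = R
  \quad\text{inside } \operatorname{Frac}(R),
  \qquad R = k[[x,y]] .
\]
\[
  U \text{ affine} \;\Longrightarrow\; U = \operatorname{Spec}\Gamma(U,\mathcal{O}_U) = \operatorname{Spec} R
  \;\Longrightarrow\; \mathfrak{m} \in U, \text{ a contradiction.}
\]
```

The equality $R_x \cap R_y = R$ uses that $R$ is a unique factorization domain in which $x$ and $y$ are coprime.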
If $R$ is a regular local ring of dimension 2, then any Frobenius module $(\mathcal{M}, \Phi)$ over $U$ may be extended to $S$ because the direct image of $\mathcal{M}$ by $j : U \to S$ is a free $R$-module $M$. This implies in particular that any locally constant étale sheaf of $\mathbb{F}_{p^a}$-vector spaces extends to a locally constant étale sheaf on $S$ (purity).

We apply this to finite commutative group schemes as follows: let $G$ be a finite locally free group scheme over a scheme $S$. Assume that we are given a homomorphism $\Phi : G \to G^{(p^a)}$. Let $G = \mathrm{Spec}\,\mathcal{M}$ relative to $S$. Then $\Phi$ induces on $\mathcal{M}$ the structure of a Frobenius module. Let $S = \mathrm{Spec}\,R$ be the spectrum of a henselian local ring. Let $\mathcal{L}$ be the finite part of $\mathcal{M}$. Since its formation commutes with tensor products, we obtain a finite locally free group scheme $G^{\Phi} = \mathrm{Spec}\,\mathcal{L}$. Since $\mathcal{L}$ is a direct summand of $\mathcal{M}$, the natural morphism $G \to G^{\Phi}$ is an epimorphism of finite locally free group schemes. Let us denote by $G^{\Phi\text{-nil}}$ the kernel. We obtain an exact sequence of finite locally free group schemes
$$0 \to G^{\Phi\text{-nil}} \to G \to G^{\Phi} \to 0 \tag{7}$$
such that $\Phi$ induces an isomorphism on $G^{\Phi}$ and is nilpotent on the special fibre of $G^{\Phi\text{-nil}}$.

LEMMA 6
Let $G_i$, $i = 1, 2, 3$, be finite locally free group schemes over the spectrum $S$ of a henselian local ring. Let $\Phi_i : G_i \to G_i^{(p^a)}$ be homomorphisms. Assume we are given an exact sequence
$$0 \to G_1 \to G_2 \to G_3 \to 0$$
which respects the homomorphisms $\Phi_i$. Then the corresponding sequence
$$0 \to G_1^{\Phi_1} \to G_2^{\Phi_2} \to G_3^{\Phi_3} \to 0$$
is exact.

Proof
Let $S$ be the spectrum of an algebraically closed field. In view of decomposition (3),
we have a unique $\Phi_i$-equivariant section of the epimorphism $G_i \to G_i^{\Phi_i}$. Therefore there exists a functorial decomposition
$$G_i = G_i^{\Phi_i} \oplus G_i^{\Phi_i\text{-nil}}.$$
This proves the assertion for an algebraically closed field and hence for any field. In the general case we consider the kernel $H$ of the epimorphism $G_2^{\Phi_2} \to G_3^{\Phi_3}$. Then we obtain a homomorphism of locally free group schemes $G_1^{\Phi_1} \to H$, which is an isomorphism over the closed point of $S$. Hence it is an isomorphism by the lemma of Nakayama.

We consider a pair $(G, \Phi)$, as above, over any locally noetherian scheme $S$. Let $k$ be the maximal value of the corresponding function $\mu = \mu_{(\mathcal{M},\Phi)}$. Then the set $\mu = k$ is an open set $U$ of $S$. Over $U$ we have an exact sequence of finite locally free group schemes (7) such that $\Phi$ induces an isomorphism on $G^{\Phi}$ and is locally on $S$ nilpotent on $G^{\Phi\text{-nil}}$. If $S$ is irreducible, the complement of $U$ is, by Proposition 5, of pure codimension 1, or it is empty. The formation of $G^{\Phi}$ is by Lemma 6 an exact functor in an obvious sense.

Assume that $\Phi$ is an isomorphism, that is, that $G = G^{\Phi}$. The étale sheaf $C$ associated to $(\mathcal{M}, \Phi)$ is a locally constant étale sheaf of $\mathbb{F}_{p^a}$-bigebras. If $S$ is the spectrum of a strictly henselian local ring, the canonical isomorphism $\mathcal{M} = \mathcal{O}_S \otimes C$ of bigebras means that $G$ is obtained via base change from a group scheme $G_0$ over $\mathbb{F}_{p^a}$, and $\Phi$ is obtained from the identity $G_0 \to G_0^{(p^a)}$.

3. The slope filtration of a p-divisible group
Let $X$ be a $p$-divisible group over a scheme $S$, and let $\lambda \in \mathbb{Q}$. We call $X$ slope divisible with respect to $\lambda$ if locally on $S$ there are integers $r, s > 0$ such that $\lambda = r/s$ and the following quasi-isogeny is an isogeny:
$$p^{-r}\,\mathrm{Fr}_X^s : X \to X^{(p^s)}. \tag{8}$$
Recall that a quasi-isogeny is an isogeny formally divided by a power of p (see [RZ, Definition 2.8]). We use the fact that the functor of points of S where a quasi-isogeny is an isogeny is representable by a closed subscheme of S (see [RZ, Proposition 2.9]). If X is slope divisible and isoclinic of slope λ (i.e., isoclinic over any geometric point of S), then the isogeny above is an isomorphism. THEOREM 7 Let S be a regular scheme. Let X be a p-divisible group over S whose Newton polygon is constant. Then there is a p-divisible group Y over S which is isogenous to X and which has a filtration by closed immersions of p-divisible groups
0 = Y0 ⊂ Y1 ⊂ · · · ⊂ Yk = Y
such that $Y_i/Y_{i-1}$ is isoclinic and slope divisible of slope $\lambda_i$, and the group $Y_i$ is slope divisible with respect to $\lambda_i$. One has $\lambda_1 > \lambda_2 > \cdots > \lambda_k$.

The existence of the slope filtration over a field which is not necessarily perfect is announced in [G2]. Since a proof was never published, we give it here before treating the general case.

PROPOSITION 8
Let $K$ be a field of characteristic $p$. Let $G \to H$ be a morphism of $p$-divisible groups over $K$. Then there is a unique factorization in the category of $p$-divisible groups
$$G \to G' \to H' \to H$$
with the following properties.
(i) $G' \to H'$ is an isogeny.
(ii) $H' \to H$ is a monomorphism of $p$-divisible groups.
(iii) For each number $n$, the morphism $G(n) \to G'(n)$ is an epimorphism of finite group schemes.

Proof
We note that a monomorphism in the category of $p$-divisible groups is the same thing as a closed immersion. Let $A$ be the kernel of $G \to H$ in the category of flat sheaves of abelian groups. Then $A$ has the following properties.
(i) The kernel $A(n)$ of multiplication by $p^n$ on $A$ is representable by a finite group scheme.
(ii) The group $A$ is the union of the subgroups $A(n)$.
With these assumptions there is a unique $p$-divisible subgroup $A' \subset A$ such that the quotient is a finite group scheme. Indeed, we consider the following sequence of monomorphisms
$$A(n+1)/A(n) \xrightarrow{\ p\ } A(n)/A(n-1) \xrightarrow{\ p\ } \cdots \to A(1), \tag{9}$$
which is induced by the multiplication by $p$. Since the ranks of the group schemes in (9) cannot decrease infinitely, there is a number $n_0$ such that $A(n+1)/A(n) \to A(n)/A(n-1)$ is an isomorphism for $n > n_0$. We set $A' = A/A(n_0)$. Then we obtain $A'(m) = A(n_0 + m)/A(n_0)$. Because for $A'$ all homomorphisms in (9) are isomorphisms, this group is a $p$-divisible group. The multiplication by $p^{n_0}$ defines a monomorphism $A' \xrightarrow{\ p^{n_0}\ } A$. The cokernel of this monomorphism is a finite locally free
group scheme. This is seen in the diagram
$$\begin{array}{ccccccccc}
0 & \to & A'(n_0) & \to & A' & \xrightarrow{\ p^{n_0}\ } & A' & \to & 0 \\
 & & \downarrow & & \downarrow & & \downarrow & & \\
0 & \to & A(n_0) & \to & A & \xrightarrow{\ p^{n_0}\ } & A' & \to & 0
\end{array}$$
Now we may define $G'$ as the quotient $G/A'$ and $H'$ as the quotient of $G'$ by the finite group scheme $A/A'$. The group $H'$ is the image of $G \to H$ in the category of flat sheaves.

We call $G'$ the small image of $G \to H$. Assume for a moment that $K$ is a perfect field, and let $M_G$ and $M_H$ be the covariant Dieudonné modules. Then $M_{G'}$ is the image of the map $M_G \to M_H$, while $M_{H'}$ is the smallest direct summand of $M_H$ containing $M_{G'}$.

Let $X$ be a $p$-divisible group of height $h$ over a perfect field $K$. We denote by $M$ its covariant Dieudonné module. It is a free $W(K)$-module of rank $h$. Let $\lambda = r/s$ be the first Newton slope of $X$. By [Z, Lemma 6.13] there is a $W(K)$-lattice $M'$ in $M \otimes \mathbb{Q}$ such that $V^s M' \subset p^r M'$. The operator $U = p^{-r}V^s$ acts on $M \otimes \mathbb{Q}$.

LEMMA 9
The submodule $M_0 \subset M \otimes \mathbb{Q}$, defined by
$$M_0 = M + UM + U^2M + \cdots + U^{h-1}M,$$
is a Dieudonné module that is invariant by $U$.

Proof
By [Z, Lemma 6.13] we know that $M \otimes \mathbb{Q}$ contains a $U$-invariant lattice. Let $M'$ be a lattice that contains $M$ such that $UM' \subset M'$. We take $M'$ minimal with respect to inclusion. Then $M \not\subset pM'$. We consider the ascending chain of lattices
$$pM' \subsetneq pM' + M \subset pM' + M + UM \subset \cdots \subset M'.$$
Since $\dim_K M'/pM' = h$, there is an integer $e \le h - 1$ such that
$$pM' + M + \cdots + U^eM = pM' + M + \cdots + U^{e+1}M.$$
Hence this is a $U$-invariant lattice containing $M$, and we conclude by minimality that
$$M' = M + \cdots + U^eM + pM'.$$
But then the lemma of Nakayama shows that $M' = M + \cdots + U^eM$. Since $U$ commutes with $F$ and $V$, it is easy to see that $M'$ is a Dieudonné module. This proves the lemma.

By this lemma, $F^{s(h-1)}M_0$ is the Dieudonné module of a $p$-divisible group $Y$ over $K$, which is slope divisible with respect to $\lambda$. Clearly, $Y$ is the small image of the morphism of $p$-divisible groups which is defined as the composite of the following quasimorphisms:
$$X^{(p^{(h-1)s})} \times \cdots \times X^{(p^s)} \times X \xrightarrow{\ \alpha\ } X^{(p^{(h-1)s})} \to X, \tag{10}$$
where the last arrow is the power $\mathrm{Ver}^{(h-1)s}$ of the Verschiebung $\mathrm{Ver} : X^{(p)} \to X$ and where the restriction of $\alpha$ to the factor $X^{(p^{(h-i)s})}$ is $p^{-(i-1)r}\,\mathrm{Fr}^{(i-1)s}$. We recall here that $\mathrm{Ver}$ induces $F$ on the Dieudonné module $M$, while $\mathrm{Fr} : X \to X^{(p)}$ induces $V$ (see [Z, Lemma 5.19]).

If $K$ is not perfect, we can still consider the small image $Y$ of (10). Making base change to the perfect hull, we see that $Y \to X$ is an isogeny and that $Y$ is slope divisible, that is, that
$$\Phi = p^{-r}\,\mathrm{Fr}^s : Y \to Y^{(p^s)}$$
is an isogeny. If we apply (7) to the finite group schemes $Y(n)$ and the operator $\Phi$, we obtain an exact sequence of $p$-divisible groups
$$0 \to Y^{\Phi\text{-nil}} \to Y \to Y^{\Phi\text{-ét}} \to 0. \tag{11}$$
The $p$-divisible group $Y^{\Phi\text{-ét}}$ is slope divisible and isoclinic, while the first slope of $Y^{\Phi\text{-nil}}$ is strictly bigger than $\lambda$.

Remark. In the notation of Lemma 9, the inclusion $F^{s(h-1)}M_0 \subset M$ holds. This follows easily from $r \le s$. Note that we can take $s \le h$. It follows that over any field $K$ the degree of the isogeny $Y \to X$ is bounded by $p^{h^2(h-1)}$, that is, by a constant that depends only on the height $h$ of $X$.

Definition 10
Let $X$ be a $p$-divisible group over a field $K$. We call $X$ completely slope divisible if it admits a filtration
$$0 \subset X_1 \subset X_2 \subset \cdots \subset X_m = X \tag{12}$$
by $p$-divisible subgroups and if there are rational numbers $\lambda_1 > \cdots > \lambda_m$ such that
(i) $X_i$ is slope divisible with respect to $\lambda_i$ for $i = 1, \ldots, m$;
(ii) $X_i/X_{i-1}$ is isoclinic and slope divisible with respect to $\lambda_i$.
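A minimal example of Definition 10, given here as our own illustration rather than taken from the text: over a field $K$ of characteristic $p$, take $X = \mu_{p^\infty} \times \mathbb{Q}_p/\mathbb{Z}_p$ with $m = 2$. The sketch uses the standard facts that Frobenius is an isomorphism on the étale group $\mathbb{Q}_p/\mathbb{Z}_p$ and equals $p$ times an isomorphism on $\mu_{p^\infty}$, and it allows $r = 0$ as in the Introduction.

```latex
% Illustrative example (an assumption of this sketch, not from the text):
% X = \mu_{p^\infty} x Q_p/Z_p is completely slope divisible with slopes 1 > 0.
\[
  0 \subset X_1 = \mu_{p^\infty} \subset X_2 = X = \mu_{p^\infty} \times \mathbb{Q}_p/\mathbb{Z}_p,
  \qquad \lambda_1 = 1 > \lambda_2 = 0 .
\]
\[
  \begin{aligned}
    &p^{-1}\,\mathrm{Fr} \colon X_1 \to X_1^{(p)} \text{ is an isomorphism} && (r = s = 1),\\
    &\mathrm{Fr} \colon X_2 \to X_2^{(p)} \text{ is an isogeny} && (r = 0,\ s = 1),\\
    &\mathrm{Fr} \colon X_2/X_1 = \mathbb{Q}_p/\mathbb{Z}_p \to (X_2/X_1)^{(p)} \text{ is an isomorphism} && (r = 0,\ s = 1).
  \end{aligned}
\]
```

Note that $X_2$ is slope divisible with respect to $0$ but not with respect to $1$, since $p^{-1}\mathrm{Fr}$ is not an isogeny on the étale factor. This is why condition (i) is stated with the slope $\lambda_i$ attached to each step of the filtration.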
Since there are no homomorphisms between $p$-divisible groups with pairwise different Newton slopes (see [Z]), it follows easily that the filtration (12) is uniquely determined.

COROLLARY 11
If the field $K$ is perfect, the sequence (12) splits canonically.

Proof
By the remark after Definition 10, the splitting is unique. We consider the Dieudonné modules $M_i$ of $X_i$. By induction it is enough to show that the following sequence splits as a sequence of Dieudonné modules:
$$0 \to M_{m-1} \to M_m \to M_m/M_{m-1} \to 0.$$
But $\Phi = p^{-r_m}V^s$ acts on this sequence. On $M_{m-1}$ the action is topologically nilpotent, and on $M_m/M_{m-1}$ it is bijective. Therefore we conclude by using [Z, Lemma 6.16].

PROPOSITION 12
Let $h$ be a number. Then there is a constant $c$ that depends only on $h$ with the following property. Let $X$ be a $p$-divisible group of height $h$ over a field $K$. Then there is an isogeny $X' \to X$ whose degree is smaller than $c$ such that $X'$ is completely slope divisible.

Proof
Let $\lambda_i$, for $i = 1, \ldots, m$, be the slopes of $X$. We may write $\lambda_i = r_i/s$, where $s$ divides $h!$, and $r_1 > \cdots > r_m$. We argue by induction on $m$. By what we have proved we find an isogeny $Y \to X$ of bounded degree such that $Y$ is slope divisible with respect to $\lambda_m$. We set $\Phi = p^{-r_m}\,\mathrm{Fr}^s$, and we obtain the $\Phi$-decomposition (11). By induction there is an isogeny of bounded degree $Y^{\Phi\text{-nil}} \to Z$, where $Z$ is completely slope divisible. Then we take the pushout of the sequence (11) by the morphism $Y^{\Phi\text{-nil}} \to Z$:
$$0 \to Z \to Z' \to Y^{\Phi\text{-ét}} \to 0.$$
The only thing we have to check is that $Z'$ is slope divisible with respect to $r_m/s$. But by induction $Z$ is slope divisible with respect to $r_{m-1}/s$ and hence a fortiori with respect to $r_m/s$. From the exact sequence (11) it follows that $Y^{\Phi\text{-nil}}$ is slope divisible with respect to $\lambda_m = r_m/s$. By definition, $Z'$ sits in an exact sequence
$$0 \to Y^{\Phi\text{-nil}} \to Y \times Z \to Z' \to 0.$$
Since all groups in this sequence except $Z'$ are slope divisible with respect to $\lambda_m = r_m/s$, the same is true for $Z'$.

COROLLARY 13
Let $X$ be a $p$-divisible group over a field $K$. Let $\lambda_1 > \cdots > \lambda_m$ be the sequence of slopes of $X$. Then there is a filtration of $X$,
$$0 \subset X_1 \subset X_2 \subset \cdots \subset X_m = X, \tag{13}$$
by $p$-divisible subgroups such that
(i) $X_i$ has the slopes $\lambda_1, \ldots, \lambda_i$ for $i = 1, \ldots, m$;
(ii) $X_i/X_{i-1}$ is isoclinic of slope $\lambda_i$.

Proof
Indeed, in the notation of Proposition 12 it is enough to consider the image of the filtration on $X'$ by the isogeny $X' \to X$.

Proof of Theorem 7
If $S$ is the spectrum of a field, this follows from Proposition 12. This proves the result of A. Grothendieck. If the dimension of $S$ is 1, this was shown under more restrictive conditions by Katz [K, Corollary 2.6.3]. We give an alternative proof that holds in the general situation. Let $K$ be a function field of $S$. Any isogeny $X_K \to \mathring{Y}$ over $K$ extends to an isogeny $X \to Y$ over $S$ (see also the discussion in front of the next proposition). Hence we may assume that $X_K$ is completely slope divisible. We have to show that this filtration extends to $S$. Since the functor of points of $S$, where (8) is an isogeny, is representable by a closed subscheme of $S$, we see that $X$ is slope divisible with respect to $\lambda_k$. Therefore we have an isogeny
$$\Phi = p^{-r}\,\mathrm{Fr}^s : X \to X^{(p^s)}.$$
The function $\mu$ associated to $(X(n), \Phi)$ is constant since the Newton polygon is constant. Therefore we may form the finite group schemes $X(n)^{\Phi}$. For varying $n$ this is a $p$-divisible group $Z$ since the functor $X(n) \mapsto X(n)^{\Phi}$ is exact. We obtain an exact sequence
$$0 \to X' \to X \to Z \to 0.$$
Then $X'$ again has constant Newton polygon, but the slope $\lambda_k$ does not appear. Since $X'_K$ is completely slope divisible, we can finish the proof in the case $\dim S = 1$ by induction.

By the same method we may find $Y$ with the filtration over an open set $U \subset S$, which contains all points of codimension 1. Indeed, we start again with an isogeny $X_K \to \mathring{Y}$, where $\mathring{Y}$ is a completely slope divisible $p$-divisible group over $K$. The kernel $\mathring{G}$ of this isogeny is a closed subscheme of some $X_K(n)$. We denote by $G$ the scheme-theoretic closure in $X(n)$. Let $U$ be the open subscheme where $G$ is flat. We replace $S$ by $U$, and we assume that $G$ is flat over $S$. Then one checks that $G$ inherits the structure of a group scheme such that $G \to X(n)$ is a closed immersion of group schemes. We may replace $X$ by $Y = X/G$. This group is slope divisible with respect to $\lambda_k$. Therefore we obtain the slope filtration as above.

The case where $S$ has arbitrary dimension is now a consequence of the following.

PROPOSITION 14
Let $S$ be a regular scheme, and let $U \subset S$ be an open subscheme that contains all points of codimension 1. Suppose that $Y$ is a $p$-divisible group on $U$ with a filtration, as in Theorem 7. Then $Y$ extends to $S$.
Proof
We set $\mathrm{Spec}\,\mathcal{A}_i(n) = Y_i(n)$. Let $j : U \to S$ be the immersion. It is enough to prove the following 3 statements.
(1) The sheaves $\mathcal{A}_i'(n) = j_*\mathcal{A}_i(n)$ are locally free $\mathcal{O}_S$-modules. If this is true, the bigebra structure on $\mathcal{A}_i(n)$ extends to $\mathcal{A}_i'(n)$. Therefore we can define finite locally free group schemes $Y_i'(n) = \mathrm{Spec}\,\mathcal{A}_i'(n)$.
(2) For varying $n$ the systems $\{Y_i'(n)\}$ define a $p$-divisible group $Y_i'$.
(3) The induced maps $Y_i' \to Y_{i+1}'$ are closed immersions.
To verify these statements, one can make without loss of generality a faithfully flat base change $S' \to S$. Therefore it is enough to consider the case where $S$ is the spectrum of a complete regular local ring $R$ of dimension greater than or equal to 2 with algebraically closed residue field. By induction on the dimension we may assume that $U$ is the complement of the closed point.

We make an induction on the length of the filtration. For $k = 1$ we extend $Y_1$ as follows. Let $\lambda_1 = r/s$ such that $\Phi = p^{-r}\,\mathrm{Fr}^s$ is an isogeny and hence an isomorphism $Y_1 \to Y_1^{(p^s)}$. Since the morphism $\mathrm{Frob}^s : S \to S$ is flat, we may apply base change (see [EGAIV, lemme 2.3.1]) to the Cartesian diagram
$$\begin{array}{ccc}
U & \longrightarrow & S \\
\Big\downarrow{\scriptstyle \mathrm{Frob}^s} & & \Big\downarrow{\scriptstyle \mathrm{Frob}^s} \\
U & \longrightarrow & S
\end{array}$$
This yields an isomorphism
$$R \otimes_{\mathrm{Frob}^s,R} A_1'(n) \cong j_*\bigl(\mathcal{O}_U \otimes_{\mathrm{Frob}^s,\mathcal{O}_U} \mathcal{A}_1(n)\bigr),$$
where A01 (n) denotes the global sections of A10 (n). Therefore the isomorphism 8 induces an isomorphism 8∗ : R ⊗Frobs ,R A01 (n) → A01 (n). We denote the associated Frobs -linear map by 9 : A01 (n) → A01 (n). By Lemma 2 it follows that A01 (n) is free. This shows assertion (1). To see the second assertion, we have to show that the sequence 0 → Y10 (1) → Y10 (n) → Y10 (n − 1) → 0
(14)
is exact. Since we know that this sequence is exact over $U$, it suffices to show that the first arrow is a closed immersion. Indeed, knowing this, we obtain a morphism of finite locally free group schemes over $S$:
\[ Y_1'(n)/Y_1'(1) \to Y_1'(n-1). \]
Since this is an isomorphism over $U$, it must also be an isomorphism over $S$. Finally, consider the locally constant étale sheaves $C_n$ on $S$ associated to $(A_1'(n), \Psi)$. Then $C_n \to C_1$ is an epimorphism of étale sheaves because the restriction to $U$ is. From this we obtain that $A_1'(n) \to A_1'(1)$ is surjective too. Hence the first arrow of (14) is a closed immersion, and the sequence is exact.
We assume now by induction that $Y_{k-1}$ with its filtration extends to a $p$-divisible group $Y_{k-1}'$ on $S$. We denote by $Z'$ the extension of the $p$-divisible group $Z = Y_k/Y_{k-1}$ to $S$.
We show that $\mathcal{A}_k'(n)$ is free. We denote $\mathrm{Frob}_U^m$ by $\alpha_m : U_m \to U$, and we denote $\mathrm{Frob}_S^m$ by $\beta_m : S_m \to S$. Of course $U_m = U$, but we would like to think of $\mathrm{Frob}_U^m$ as a flat covering. Again applying base change to the Cartesian diagram above, it is enough to show that $j_* \alpha_m^* \mathcal{A}_k(n)$ is free. But the exact sequence
\[ 0 \to Y_{k-1}(n) \to Y_k(n) \to Z(n) \to 0 \tag{15} \]
splits over the perfect closure of $U$ and therefore over some $U_m$ by the discussion preceding Proposition 5. Hence over $U_m$ the scheme $Y_k(n) \times_U U_m$ is the product of the schemes $Y_{k-1}(n) \times_U U_m$ and $Z(n) \times_U U_m$. Therefore $Y_k(n) \times_U U_m$ extends to a locally free scheme over $S_m$. This proves that $j_* \alpha_m^* \mathcal{A}_k(n)$ is free.
Since $Y_k$ is slope divisible with respect to $\lambda_k$, we find $r$ and $s$ with $\lambda_k = r/s$ such that $\Phi = p^{-r} \mathrm{Fr}^s : Y_k \to Y_k^{(p^s)}$ is an isogeny. The pairs $(Y_k(n), \Phi)$ extend to $S$. The purity result (see Proposition 5) for these extensions $(Y_k'(n), \Phi)$ yields an exact sequence of finite locally free group schemes on $S$:
\[ 0 \to H_n \to Y_k'(n) \to (Y_k'(n))^{\Phi} \to 0. \]
It is clear that $H_n$ must be the extension of $Y_{k-1}(n)$ and that $(Y_k'(n))^{\Phi}$ must be the extension $Z'$. We set $Y' = \varinjlim Y_k'(n)$ as a flat sheaf. Then we obtain an exact sequence of flat sheaves
\[ 0 \to Y_{k-1}' \to Y' \to Z' \to 0. \]
We know that the outer sheaves are $p$-divisible groups, and therefore $Y'$ is a $p$-divisible group. This is the desired group in the isogeny class of $X$.
Since this also proves Theorem 7, we obtain the following by combining this theorem and Proposition 14.
COROLLARY 15
Let $S$ be a regular scheme, and let $U \subset S$ be an open subscheme that contains all points of codimension 1. Suppose that $X$ is a $p$-divisible group on $U$ such that the Newton polygon is constant on $U$. Then there is a $p$-divisible group $X'$ on $S$ whose restriction to $U$ is isogenous to $X$. The Newton polygon of $X'$ is constant.
Proof The reader should compare this with a result of de Jong and Oort [JO, Theorem 4.13].
Acknowledgments. I would like to thank Johan de Jong and Michael Harris for pointing out this problem to me, and Frans Oort for helpful remarks.
References
[G1] A. GROTHENDIECK, Local Cohomology, Lecture Notes in Math. 41, Springer, Berlin, 1967. MR 37:219
[G2] A. GROTHENDIECK, Groupes de Barsotti-Tate et cristaux de Dieudonné, Sém. Math. Sup. 45, Presses de l’Univ. de Montreal, Montreal, 1974. MR 54:5250
[G3] A. GROTHENDIECK, “Technique de descente et théorèmes d’existence en géométrie algébrique, I: Généralités: Descente par morphismes fidèlement plats” in Séminaire Bourbaki, Vol. 5, 1959/1960, exp. no. 190, Soc. Math. France, Montrouge, 1995, 299–327. MR CMP 1 603 475
[EGAII] A. GROTHENDIECK and J. DIEUDONNÉ, Éléments de géométrie algébrique, II: Étude globale élémentaire de quelques classes des morphismes, Inst. Hautes Études Sci. Publ. Math. 8 (1961). MR 29:1208
[EGAIV] A. GROTHENDIECK and J. DIEUDONNÉ, Éléments de géométrie algébrique, IV: Étude locale des schémas et des morphismes de schémas, II, Inst. Hautes Études Sci. Publ. Math. 24 (1965), MR 33:7330; IV, Inst. Hautes Études Sci. Publ. Math. 32 (1967), MR 39:220
[HT] M. HARRIS and R. TAYLOR, On the geometry and cohomology of some simple Shimura varieties, preprint, 2001, http://www.math.harvard.edu/HTML/Individuals/Richard_Taylor.html
[JO] A. J. DE JONG and F. OORT, Purity of the stratification by Newton polygons, J. Amer. Math. Soc. 13 (2000), 209–241. MR 2000m:14050
[K] N. M. KATZ, “Slope filtration of F-crystals” in Journées de géométrie algébrique de Rennes (Rennes, 1978), Vol. I, Astérisque 63, Soc. Math. France, Montrouge, 1979, 113–163. MR 81i:14014
[M] W. MESSING, The Crystals Associated to Barsotti-Tate Groups: With Applications to Abelian Schemes, Lecture Notes in Math. 264, Springer, Berlin, 1972. MR 50:337
[RZ] M. RAPOPORT and TH. ZINK, Period Spaces for p-Divisible Groups, Ann. of Math. Stud. 141, Princeton Univ. Press, Princeton, 1996. MR 97f:14023
[Z] TH. ZINK, Cartiertheorie kommutativer formaler Gruppen, Teubner-Texte Math. 68, Teubner, Leipzig, 1984. MR 86j:14046
Fakultät für Mathematik, Universität Bielefeld, Postfach 100131, D-33501 Bielefeld, Germany
DUKE MATHEMATICAL JOURNAL c 2001 Vol. 109, No. 1,
ON THE MODULARITY OF Q-CURVES JORDAN S. ELLENBERG AND CHRIS SKINNER
Abstract
A Q-curve is an elliptic curve over a number field $K$ which is geometrically isogenous to each of its Galois conjugates. K. Ribet [17] asked whether every Q-curve is modular, and he showed that a positive answer would follow from J.-P. Serre’s conjecture on mod $p$ Galois representations. We answer Ribet’s question in the affirmative, subject to certain local conditions at 3.
1. Introduction
Let $K$ be a number field, Galois over $\mathbf{Q}$. A Q-curve over $K$ is an elliptic curve $E/K$ which is isogenous over $K$ to each of its Galois conjugates. Our interest in Q-curves is motivated by the following theorem of Ribet.
THEOREM [17, §5]
Suppose $E/\bar{\mathbf{Q}}$ is an elliptic curve that is also a quotient of $J_1(N)/\bar{\mathbf{Q}}$. Then $E$ is a Q-curve over some number field.
A Q-curve that is a quotient of $J_1(N)/\bar{\mathbf{Q}}$ is called modular; Ribet has conjectured that, in fact, every Q-curve is modular. The modularity of various Q-curves has been verified by B. Roberts and L. Washington [18], by Y. Hasegawa, K.-I. Hashimoto, and F. Momose [12], and by H. Hida [13]. In this article we establish the modularity of a large class of Q-curves, including infinitely many curves not treated in the aforementioned papers (but not including every curve treated there).
Received 11 May 2000. 2000 Mathematics Subject Classification. Primary 11G05; Secondary 11F80, 11G18, 14H52. Skinner’s work partially supported by a National Science Foundation grant and by the Clay Mathematics Institute.
Suppose $E/K$ is a Q-curve, and suppose $E^\sigma$ is a Galois conjugate of $E$. Then there exists a nonzero $K$-isogeny $\mu : E^\sigma \to E$, and so if $p$ is a prime dividing the square-free part of the degree of $\mu$, then the $\mathrm{Gal}(\bar{K}/K)$-module $E[p]$ is reducible. The arguments employed in [12] and [13] use this reducibility to associate to $E$ a $p$-adic representation of $\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})$ whose reduction mod $p$ has dihedral image and which is therefore modular (in the sense that it arises from a modular form). Consequently, the results in [12] and [13] depend on the existence of a prime $p \geq 5$ dividing the square-free part of the degree of some $K$-rational isogeny between $E$ and one of its Galois conjugates. Moreover, their results require $E$ to satisfy certain local conditions at $p$.
In contrast, the arguments we employ in the present paper make use of the mod 3 and 3-adic representations attached to a Q-curve. We thus obtain a theorem that does not require the existence of a rational isogeny of large degree (but that does require local conditions at 3). This allows us, for instance, to prove the modularity of the Q-curve
\[ E = E_{A,B,C} : y^2 = x^3 + 2(1+i)Ax^2 + (B + iA^2)x \tag{1.1} \]
discussed by H. Darmon in [5] in connection with the generalized Fermat equation
\[ A^4 + B^2 = C^p. \tag{1.2} \]
We will discuss in a later paper the consequences of the present result regarding solutions of (1.2).
In order to state the main theorems of this paper, we introduce a few definitions. Let $E/K$ be a Q-curve, and, for each $\sigma \in \mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})$, let $\mu_\sigma : E^\sigma \to E$ be a nonzero isogeny. Then we define $b_E \in H^2(\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}), \pm 1)$ by
\[ b_E(\sigma, \tau) = \operatorname{sgn}(\mu_\sigma \mu_\tau^\sigma \mu_{\sigma\tau}^{-1}). \]
Denote by $(b_E)_3$ the restriction of $b_E$ to $H^2(\mathrm{Gal}(\bar{\mathbf{Q}}_3/\mathbf{Q}_3), \pm 1)$. Furthermore, we associate to $E$ an $\ell$-adic Galois representation $\rho_{E,\ell}$ of $\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})$ and a quadratic character $\bar\psi_{E,3}$ of $\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})$ (cf. Proposition 2.3 and Definition 2.16).
THEOREM
Suppose $E/K$ is a Q-curve with potentially ordinary or multiplicative reduction at a prime of $K$ over 3, and such that $(b_E)_3$ is trivial. Then $E$ is modular.
THEOREM
Suppose $E/K$ is a Q-curve such that, for some (hence every) prime $\ell > 3$, the projective representation $P\rho_{E,\ell}$ associated to $\rho_{E,\ell}$ is unramified at 3. Then $E$ is modular.
We can weaken the condition on $\rho_{E,\ell}$ in the second theorem above, at the expense of introducing some technical conditions.
Denote by $q_{3,\infty}$ the unique class in $H^2(\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}), \pm 1)$ ramified exactly at 3 and $\infty$. We prove the following theorem.
THEOREM
Suppose $E/K$ is a Q-curve that acquires semistable reduction over a field tamely ramified over $\mathbf{Q}_3$. Suppose further that $(b_E)_3$ is trivial and that the four classes $(\bar\psi_{E,3}, -1)$, $b_E\, q_{3,\infty}(\bar\psi_{E,3}, -1)$, $q_{3,\infty}(\bar\psi_{E,3}, 3)$, and $b_E(\bar\psi_{E,3}, 3)$ are all nontrivial in $H^2(\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}), \pm 1)$. Finally, suppose that $\deg \mu_\sigma$ can be chosen to be prime to 3 for all $\sigma \in \mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})$. Then $E$ is modular.
We remark that the Q-curve (1.1) satisfies the hypotheses of both the second and third theorems above.
There are infinitely many Q-curves that are not proved to be modular by the theorems in this paper. An instructive example is the curve
\[
\begin{aligned}
E : y^2 = x^3 &+ \bigl(-994708512\sqrt{5257}\sqrt{73} - 414461880\sqrt{5257} - 4973542560\sqrt{73} - 1089620282520\bigr)x \\
&+ 36601957546560\sqrt{5257}\sqrt{73} + 5349307626327168\sqrt{5257} \\
&+ 55021459817878848\sqrt{73} + 32065347994985088,
\end{aligned}
\tag{1.3}
\]
which is the specialization to $a = 2^2 \cdot 3^2 \cdot 73$ of the family of Q-curves described by J. Quer in [14, §6]. One checks that
• the reduction of $E$ over 3 is potentially supersingular;
• $P\rho_{E,\ell}$ is ramified at 3;
• there is an isogeny of degree 3 between $E$ and one of its Galois conjugates.
Thus, $E$ does not satisfy the hypotheses of any of the theorems above. The 3-adic representation $\rho_{E,3}$ is residually irreducible, but the image of the restriction $\rho_{E,3}|_{G_3}$ does not have trivial centralizer. To prove that such a representation is modular is beyond the reach of existing technology in deformation theory, including the recent result of C. Breuil, B. Conrad, F. Diamond, and R. Taylor [1].
The modularity of a Q-curve $E$ is equivalent to the modularity of any one of the $\rho_{E,\ell}$’s. (The modularity of the latter means that it is a representation associated to a modular form.) Most of the present paper is devoted to proving the modularity of the $\rho_{E,3}$’s. This is essentially done by showing that under the hypotheses stated in the theorem these representations satisfy the main theorems of either [3], [22], or [23].
Some notation. If $K$ is a quadratic extension of $\mathbf{Q}$, we write $\chi_K$ for the quadratic character of $\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})$ associated to $K$. Any two quadratic characters $\chi$ and $\chi'$ of $\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})$ give classes in $H^1(\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}), \pm 1)$. We write $(\chi, \chi')$ for their cup product. This is an element in $H^2(\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}), \pm 1)$. If $d$ is an element of $\mathbf{Q}^*/(\mathbf{Q}^*)^2$, we write $(\chi, d)$ to mean the cup product $(\chi, \chi_{\mathbf{Q}(\sqrt{d})})$.
We write elements in $H^2(\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}), \pm 1)$ multiplicatively. Thus, if $c_1, c_2 \in H^2(\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}), \pm 1)$, then $c_1 c_2$ is the class such that $(c_1 c_2)(\sigma, \tau) = c_1(\sigma, \tau) c_2(\sigma, \tau)$ for all $\sigma, \tau \in \mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})$.
We take an embedding $\nu : \bar{\mathbf{Q}} \hookrightarrow \bar{\mathbf{Q}}_\ell$ to be fixed for each $\ell$, and we denote the resulting decomposition subgroup (resp., inertia subgroup) of $\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})$ by $G_\ell$ (resp., $I_\ell$). To be completely precise, we need to define two such embeddings: one in order to define decomposition subgroups of $\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})$ and the other in order to make sense of the scalar action of $\bar{\mathbf{Q}}$ on $\ell$-adic vector spaces like $T_\ell A \otimes_{\mathbf{Z}_\ell} \bar{\mathbf{Q}}_\ell$. Write $\iota : \bar{\mathbf{Q}} \hookrightarrow \bar{\mathbf{Q}}_\ell$ for the second embedding. We may think of $\nu$ as fixed through the course of the paper; on the other hand, we occasionally want to vary $\iota$.
We also take an embedding $\bar{\mathbf{Q}} \hookrightarrow \mathbf{C}$ to be fixed. This determines a complex conjugation $c$.
We denote by $G_\ell^t$, $I_\ell^t$, $I_\ell^w$ the tame quotient of the decomposition group, the tame inertia group, and the wild inertia group, respectively.
For $\ell$ a rational prime, we denote by $\chi_\ell : \mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}) \to \mathbf{Z}_\ell^*$ the cyclotomic character, and by $\bar\chi_\ell : \mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}) \to \mathbf{F}_\ell^*$ the mod $\ell$ cyclotomic character.
If $\rho : G \to \mathrm{GL}_2(F)$ is a representation of a group $G$ over a field $F$, we write $P\rho$ for the composition of $\rho$ with the natural projection $\mathrm{GL}_2(F) \to \mathrm{PGL}_2(F)$.
2. Q-curves and Galois representations
In this section we describe the $\ell$-adic and mod $\ell$ Galois representations attached to a Q-curve. We also define Galois cohomology classes $c_E$, $b_E$, and $\bar\psi_{E,\ell}$ which are naturally attached to a Q-curve $E$.
The definitions and results of this section, with the exception of Proposition 2.13, are not original to this paper. The basic framework is laid down in Ribet’s foundational paper [17]. The interested reader should also consult Quer’s paper [14], which alerted us to the relevance of the class $b_E$.
Let $K$ be a number field Galois over $\mathbf{Q}$.
Definition 2.1
A Q-curve $E/K$ is an elliptic curve $E/K$ such that, for each $\sigma \in \mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})$, there exists a nonzero $K$-isogeny $\mu_\sigma : E^\sigma \to E$.
We may, and do, suppose that $\mu_\sigma$ is the identity morphism for all $\sigma \in \mathrm{Gal}(\bar{\mathbf{Q}}/K)$.
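Remark 2.2
Throughout this paper it is understood that all Q-curves are elliptic curves without complex multiplication. This assumption is not restrictive from our point of view since Q-curves with complex multiplication are known to be modular (see [21]).
Let $\ell$ be a rational prime, and define
\[ \phi_{E,\ell} : \mathrm{Gal}(\bar{K}/K) \to \mathrm{GL}_2(\mathbf{Z}_\ell) \]
to be the representation of $\mathrm{Gal}(\bar{K}/K)$ on the $\ell$-adic Tate module $T_\ell E$ of $E$. (We have fixed an isomorphism $T_\ell E \cong \mathbf{Z}_\ell^2$.) In the following proposition we describe an extension of $\phi_{E,\ell}$ to a representation of the whole group $\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})$.
PROPOSITION 2.3
There exists a representation
\[ \rho_{E,\ell} : G_{\mathbf{Q}} \to \bar{\mathbf{Q}}_\ell^* \mathrm{GL}_2(\mathbf{Q}_\ell) \]
such that $P\rho_{E,\ell}|_{\mathrm{Gal}(\bar{K}/K)} \cong P\phi_{E,\ell}$. This representation is odd, continuous, and ramified at only finitely many primes.
Proof
For each nonzero isogeny $\mu : E' \to E$, we write $\mu^{-1}$ to mean $(1/\deg \mu)\mu^\vee \in \mathrm{Hom}(E', E) \otimes_{\mathbf{Z}} \mathbf{Q}$, where $\mu^\vee$ is the dual isogeny.
Let $\sigma$ and $\tau$ be elements of $\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})$. Following [17, §6], we define
\[ c_E(\sigma, \tau) = \mu_\sigma \mu_\tau^\sigma \mu_{\sigma\tau}^{-1} \in (\mathrm{Hom}(E, E) \otimes_{\mathbf{Z}} \mathbf{Q})^* = \mathbf{Q}^*. \]
Then $c_E$ determines a class in $H^2(\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}), \mathbf{Q}^*)$. Tate showed that $H^2(\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}), \bar{\mathbf{Q}}^*)$ is trivial, where $\bar{\mathbf{Q}}^*$ is acted on trivially by $\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})$ (see [19, Th. 4]). It follows that there exists a continuous map $\alpha : \mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}) \to \bar{\mathbf{Q}}^*$ such that
\[ c_E(g, h) = \alpha(g)\alpha(h)\alpha(gh)^{-1}. \tag{2.4} \]
We can now define an action of $\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})$ on $\bar{\mathbf{Q}}_\ell \otimes_{\mathbf{Z}_\ell} T_\ell E$ by
\[ \rho_{E,\ell}(g)(1 \otimes x) = \alpha^{-1}(g) \otimes \mu_g(x^g). \tag{2.5} \]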
It is clear from the above definition that $P\rho_{E,\ell}|_{\mathrm{Gal}(\bar{K}/K)} \cong P\phi_{E,\ell}$. In particular, $\rho_{E,\ell}|_{\mathrm{Gal}(\bar{K}/K)}$ and $\phi_{E,\ell}$ differ by the continuous character $\alpha|_{\mathrm{Gal}(\bar{K}/K)}$. It follows that $\rho_{E,\ell}$ is continuous and unramified away from finitely many primes.
It remains to show that $\rho_{E,\ell}$ is odd, that is, that $\det \rho_{E,\ell}(c) = -1$, where $c$ is our fixed complex conjugation.
Define a map $\epsilon_E : \mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}) \to \bar{\mathbf{Q}}^*$ by
\[ \epsilon_E(\sigma) = \alpha^2(\sigma)/(\deg \mu_\sigma), \]
and let $\epsilon_{E,\ell}$ be the composition of $\epsilon_E$ with the chosen embedding $\bar{\mathbf{Q}} \hookrightarrow \bar{\mathbf{Q}}_\ell$. That this map is a character follows from the observation that
\[ c_E(\sigma, \tau)^2 = \frac{(\deg \mu_\sigma)(\deg \mu_\tau)}{\deg \mu_{\sigma\tau}}. \]
It also follows immediately from (2.5) that
\[ \det \rho_{E,\ell} = \epsilon_{E,\ell}^{-1} \chi_\ell. \tag{2.6} \]
Write $\mu$ for $\mu_c$. We may write the complexification $E/\mathbf{C}$ as the quotient of $\mathbf{C}$ by a lattice $\Lambda$. Then $\mu$ is given by multiplication by a complex number $z$ such that $z\bar\Lambda \subset \Lambda$. The composition $\mu\mu^c$ is then given by $z z^c$, a positive real number. Since the degree of $\mu\mu^c$ is $(\deg \mu)^2$, we conclude that $\mu\mu^c = \deg \mu$. Therefore,
\[ \epsilon_E(c) = \alpha^2(c)/\deg \mu = c_E(c, c)/\deg \mu = \mu\mu^c/\deg \mu = 1 \]
and
\[ \det \rho_{E,\ell}(c) = \epsilon_{E,\ell}^{-1}(c)\chi_\ell(c) = -1. \]
Since $\alpha^2(c) = c_E(c, c) = \mu\mu^c = \deg \mu$, the proposition follows from the definition of $\epsilon_{E,\ell}$ and (2.6).
Remark 2.4
It is occasionally useful to work directly with the homomorphism
\[ \hat\rho_{E,\ell} : \mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}) \to \bar{\mathbf{Q}}^* \mathrm{GL}_2(\mathbf{Q}_\ell) \]
defined by (2.5). More precisely, suppose $M$ is a number field such that $\hat\rho_{E,\ell}$ takes values in $M^* \mathrm{GL}_2(\mathbf{Q}_\ell)$. Let $\lambda$ be the prime of $M$ defined by $\iota : \bar{\mathbf{Q}} \to \bar{\mathbf{Q}}_\ell$, and let $\lambda = \lambda_1, \ldots, \lambda_r$ be the set of all primes of $M$ dividing $\ell$. Write $\rho_{E,\lambda_i}$ for the composition of $\hat\rho_{E,\ell}$ with the map
\[ M^* \mathrm{GL}_2(\mathbf{Q}_\ell) \to M_{\lambda_i}^* \mathrm{GL}_2(\mathbf{Q}_\ell). \]
So $\rho_{E,\lambda}$ is just another name for $\rho_{E,\ell}$.
Remark 2.5
While $\rho_{E,\ell}$ and $\epsilon_E$ depend on our choice of $\alpha$, the projective representation $P\rho_{E,\ell}$ depends only on the isomorphism class of $E/K$. Moreover, $P\rho_{E,\ell}$ is independent of the choice of $\iota$.
Remark 2.6
We can choose $\alpha$ in such a way that the image of $\epsilon_E$ has 2-power order, by the following argument. Let $n = 2^a m$ be the order of the image of $\epsilon_E$, where $m$ is odd. If $m \neq 1$, replace $\alpha$ by $\alpha\epsilon_E^{(m-1)/2}$; this has the effect of replacing $\epsilon_E$ by $\epsilon_E^m$, whose image has 2-power order.
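The order bookkeeping in Remark 2.6 can be checked numerically. The following is only a minimal sketch, not part of the original argument: it assumes the image of the character is cyclic of order n and models it by exponents modulo n, and all function names are hypothetical.

```python
from math import gcd


def split_two_power(n: int) -> tuple[int, int]:
    """Write n = 2**a * m with m odd and return (a, m)."""
    a = 0
    while n % 2 == 0:
        n //= 2
        a += 1
    return a, n


def order_of_power(n: int, k: int) -> int:
    """Order of g**k when g generates a cyclic group of order n."""
    return n // gcd(n, k)


def twisted_exponent(m: int) -> int:
    """Exponent e such that the new character is eps**e.

    Replacing alpha by alpha * eps**((m - 1) / 2) multiplies
    eps = alpha**2 / deg(mu) by eps**(m - 1), so e = 1 + (m - 1).
    """
    if m % 2 == 0:
        raise ValueError("m must be odd so that (m - 1) / 2 is an integer")
    return 1 + 2 * ((m - 1) // 2)


n = 24                             # illustrative order of the image (assumption)
a, m = split_two_power(n)          # 24 = 2**3 * 3
e = twisted_exponent(m)            # the twist raises the character to the m-th power
new_order = order_of_power(n, e)   # order of the image after the twist
```

With these illustrative values the odd part 3 of the order is removed and the twisted character has image of order 2**3, as the remark asserts.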
The reason for introducing the representations $\rho_{E,\ell}$ is found in the following proposition.
PROPOSITION 2.7
A Q-curve $E/K$ is modular if there exists a (normalized) eigenform $f$ and a prime $\ell$ such that $\rho_{E,\ell} \cong \rho_{f,\ell}$.
Here, $f$ is a holomorphic Hecke eigenform on the complex upper half-plane, and $\rho_{f,\ell}$ is the Galois representation $\rho_{f,\ell} : \mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}) \to \mathrm{GL}_2(\bar{\mathbf{Q}}_\ell)$ such that if $f(z) = \sum_{n=1}^{\infty} a(n)e(nz)$ ($a(1) = 1$), then $\operatorname{trace} \rho_{f,\ell}(\mathrm{Frob}_p) = a(p)$ for almost all primes $p$.
Proof
Suppose $\rho_{E,\ell} \cong \rho_{f,\ell}$ for some eigenform $f$ of level $N$. Then there exists some finite extension $L/K$ such that
\[ \phi_{E,\ell}|_{\mathrm{Gal}(\bar{L}/L)} \cong \rho_{f,\ell}|_{\mathrm{Gal}(\bar{L}/L)} \tag{2.7} \]
and the weight of $f$ must be 2, as can be seen by comparing determinants. Let $\rho_{N,\ell}$ be the representation of $\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})$ on $T_\ell J_1(N) \otimes_{\mathbf{Z}_\ell} \bar{\mathbf{Q}}_\ell$, where $T_\ell J_1(N)$ is the $\ell$-Tate module of $J_1(N)$. We have
\[ \rho_{N,\ell} \simeq \oplus \rho_{g,\ell} \tag{2.8} \]
where the sum is over all the eigenforms $g$ of level $N$ and weight 2. (This can be deduced from [20, Th. 7.11].) From (2.7) and (2.8) it follows that $\phi_{E,\ell}$ is a $\mathrm{Gal}(\bar{L}/L)$-quotient of $\rho_{N,\ell}$. It then follows that $\mathrm{Hom}_L(T_\ell J_1(N), T_\ell E)$ is nonzero. By a theorem of G. Faltings [8] we can conclude from this that $\mathrm{Hom}_L(J_1(N), E)$ is nonzero.
We next define some cohomological invariants associated to $E$. Let $b_E \in H^2(\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}), \pm 1)$ be the composition of $c_E$ with the sign map $\mathbf{Q}^* \to \pm 1$. Then $b_E$ can be computed from $\epsilon_E$. Consider the exact sequence in Galois cohomology
\[ \mathrm{Hom}(\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}), \bar{\mathbf{Z}}^*) \to \mathrm{Hom}(\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}), \bar{\mathbf{Z}}^*) \xrightarrow{\ \delta\ } H^2(\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}), \pm 1) \tag{2.9} \]
arising from the short exact sequence of Galois modules (with trivial action)
\[ 0 \to \pm 1 \to \bar{\mathbf{Z}}^* \to \bar{\mathbf{Z}}^* \to 0. \]
PROPOSITION 2.8
We have $b_E = \delta(\epsilon_E)$.
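Proof
Let $\chi : \mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}) \to \bar{\mathbf{Z}}^*$ be a character, and, for each $\sigma \in \mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})$, let $\tilde\chi(\sigma)$ be a square root of $\chi(\sigma)$. Then $\delta(\chi)$ is defined by
\[ \delta(\chi)(\sigma, \tau) = \frac{\tilde\chi(\sigma)\tilde\chi(\tau)}{\tilde\chi(\sigma\tau)}. \]
To compute $\delta(\epsilon_E)$, we may choose
\[ \tilde\epsilon_E(\sigma) = \alpha(\sigma)/\sqrt{\deg \mu_\sigma}, \]
where the $\sqrt{\ }$ sign signifies positive square root. We now have
\[ \delta(\epsilon_E)(\sigma, \tau) = \frac{\alpha(\sigma)\alpha(\tau)}{\alpha(\sigma\tau)} \cdot \frac{\sqrt{\deg \mu_{\sigma\tau}}}{\sqrt{\deg \mu_\sigma}\sqrt{\deg \mu_\tau}} = c_E(\sigma, \tau)\Big/\sqrt{c_E^2(\sigma, \tau)} = b_E(\sigma, \tau). \]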
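As an illustration (a worked special case added here, not taken from the argument above), suppose that $K$ is quadratic over $\mathbf{Q}$ with $\mathrm{Gal}(K/\mathbf{Q}) = \{1, t\}$, that $\mu_g$ depends only on the image of $g$ in $\mathrm{Gal}(K/\mathbf{Q})$ with $\mu_1$ the identity and $\mu_t = \mu$, and that $\mu\mu^t$ is multiplication by a nonzero integer $n$. The symbol $t$ and these normalizations are assumptions made only for this sketch. Under them the quantities entering Proposition 2.8 can be written out explicitly:

```latex
\begin{align*}
c_E(1,1) &= c_E(1,t) = c_E(t,1) = 1, & c_E(t,t) &= \mu\mu^{t} = n,\\
\alpha(1) &= 1, & \alpha(t)^2 &= n,\\
\deg\mu &= |n| & &\text{since } \deg(\mu\mu^{t}) = (\deg\mu)^2 = n^2,\\
\epsilon_E(t) &= \alpha(t)^2/\deg\mu = n/|n| = \operatorname{sgn}(n), & b_E(t,t) &= \operatorname{sgn}(c_E(t,t)) = \operatorname{sgn}(n).
\end{align*}
```

So in this special case $\epsilon_E$ is trivial when $n > 0$ and is the quadratic character cutting out $K$ when $n < 0$, while the cocycle $b_E$ is identically 1 exactly when $n > 0$.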
Remark 2.9
Note that the class $c_E$ is the inflation of a class in $H^2(\mathrm{Gal}(K/\mathbf{Q}), \pm 1)$. Quer [14, Th. 2.4] has proven the converse: if $E/K'$ is a Q-curve over some extension of $K$, and if $c_E$ is the inflation of a class in $H^2(\mathrm{Gal}(K/\mathbf{Q}), \pm 1)$, then there exists a Q-curve $E'/K$ such that $E' \times_K K'$ is geometrically isogenous to $E$.
We need the fact that the representation $\rho_{E,\ell}$ can also be viewed as the $\ell$-adic representation attached to a certain abelian variety over $\mathbf{Q}$.
PROPOSITION 2.10
Let $E/K$ be a Q-curve, and let $\alpha : \mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}) \to \bar{\mathbf{Q}}^*$ be a 1-cochain with coboundary $c_E$, as in (2.4). Define $\rho_{E,\ell}$ as in (2.5). Let $M$ be the number field generated by the $\alpha(g)$ for all $g \in \mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})$. There exists an abelian variety $A_\alpha/\mathbf{Q}$ satisfying the following conditions.
• There exists an injection $M \hookrightarrow \mathrm{End}(A_\alpha/\mathbf{Q}) \otimes_{\mathbf{Z}} \mathbf{Q}$.
• If $\lambda_1, \ldots, \lambda_r$ are the primes of $M$ lying over $\ell$, then the rational Tate module $V_\ell A_\alpha$ decomposes as
\[ V_\ell A_\alpha = \bigoplus_i V_{\lambda_i} A_\alpha \]
and $V_{\lambda_i} A_\alpha$ is isomorphic, as the $M_{\lambda_i}[\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})]$-module, to $\rho_{E,\lambda_i}$. In particular, $V_\lambda A_\alpha \cong \rho_{E,\ell}$.
Proof
The desired $A_\alpha$ is the one constructed by Ribet in [17, §6]. We briefly recall this
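construction. First, enlarge $K$ if necessary so that $\alpha$ is the inflation of a function on $\mathrm{Gal}(K/\mathbf{Q})$. Let $R$ be the algebra generated by elements $\lambda_\sigma$ for each $\sigma \in \mathrm{Gal}(K/\mathbf{Q})$, with the multiplication table
\[ \lambda_{\sigma\tau}\, c_E(\sigma, \tau) = \lambda_\sigma \lambda_\tau. \]
Then $R$ acts on the abelian variety
\[ \mathrm{Res}^K_{\mathbf{Q}} E \times_{\mathbf{Q}} K \cong \bigoplus_{\sigma \in \mathrm{Gal}(K/\mathbf{Q})} E^\sigma \]
by the rule
\[ \lambda_\sigma(P) = \mu_\sigma^\tau(P) \tag{2.10} \]
for any $P \in E^{\tau\sigma}(\bar{K})$. This action descends to an action of $R$ on $\mathrm{Res}^K_{\mathbf{Q}} E$.
Our choice of $\alpha$ defines a homomorphism $\omega : R \to M$. Now, define
\[ A_\alpha = \mathrm{Res}^K_{\mathbf{Q}} E \otimes_R M \]
in the category of abelian varieties up to isogeny. To be more precise, let $\pi \in R$ be the projector onto $M$; then $A_\alpha$ is the image of $m\pi$, where $m$ is an integer large enough to make $m\pi$ an actual endomorphism (not only a rational endomorphism) of $\mathrm{Res}^K_{\mathbf{Q}} E$.
Then $A_\alpha$ admits the desired injection $M \hookrightarrow \mathrm{End}(A_\alpha) \otimes_{\mathbf{Z}} \mathbf{Q}$, and the rational $\lambda_i$-adic Tate module $V_{\lambda_i} A_\alpha$ is a 2-dimensional vector space over $M_{\lambda_i}$ (see [16, Th. 2.1.1]); one then has from (2.10) that $\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})$ acts on $V_{\lambda_i} A_\alpha$ via $\rho_{E,\lambda_i}$.
Remark 2.11
We emphasize that the construction of $A_\alpha$ is independent of $\ell$.
PROPOSITION 2.12
Let $\ell > 2$. Suppose $\ell$ does not divide $\deg \mu_g$ for any $g \in \mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})$, and suppose $\alpha$ is chosen so that $\epsilon_E$ has 2-power order (see Remark 2.6). Let $\lambda = \lambda_1, \ldots, \lambda_r$ be the set of primes of $M$ dividing $\ell$.
Then the full ring of integers of $M \otimes_{\mathbf{Z}} \mathbf{Z}_\ell \cong \oplus_i M_{\lambda_i}$ acts on $T_\ell A_\alpha$, and the $\ell$-divisible group $T_\ell A_\alpha$ breaks up as a direct sum of $\ell$-divisible groups
\[ \bigoplus_i T_{\lambda_i} A_\alpha, \tag{2.11} \]
where the $\lambda_i$ range over the primes of $M$ dividing $\ell$. Moreover, $T_{\lambda_i} A_\alpha$ is a free $\mathcal{O}_{M_{\lambda_i}}$-module of rank 2.
Proof
Let $\mathbf{Z}[\alpha]$ be the ring generated by the $\alpha(g)$. Then it follows by the definition of $A_\alpha$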
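that $\mathbf{Z}_\ell \otimes_{\mathbf{Z}} \mathbf{Z}[\alpha]$ acts on $T_\ell A_\alpha$. Since $\deg \mu_g$ is an $\ell$-adic unit, and since $\alpha(g)/\sqrt{\deg \mu_g}$ is a 2-power root of unity, we can obtain $\mathbf{Z}_\ell \otimes_{\mathbf{Z}} \mathbf{Z}[\alpha]$ by successively adjoining square roots of $\ell$-adic units to $\mathbf{Z}_\ell$; it follows that $\mathbf{Z}_\ell \otimes_{\mathbf{Z}} \mathbf{Z}[\alpha]$ is étale over $\mathbf{Z}_\ell$, and therefore
\[ \mathbf{Z}_\ell \otimes_{\mathbf{Z}} \mathbf{Z}[\alpha] \cong \bigoplus_i \mathcal{O}_{M_{\lambda_i}}. \]
The decomposition of $T_\ell A_\alpha$ now follows immediately from the decomposition (2.11).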
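We now want to define a mod $\ell$ representation attached to $E$. We begin with a general result about Galois representations.
PROPOSITION 2.13
Let $L$ be a totally ramified extension of $\mathbf{Q}_\ell$, and let $F$ be an unramified extension of $L$. Let $\rho$ be a continuous representation of $\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})$ (or any compact group) with image in $F^* \mathrm{GL}_2(\mathbf{Q}_\ell)$. Then $\rho$ is conjugate in $\mathrm{GL}_2(F)$ to a representation with image in $\mathcal{O}_F^* \mathrm{GL}_2(\mathcal{O}_L)$.
Proof
Let $S, T$ be a basis for $F^{\oplus 2}$ with respect to which the image of $\rho$ lies in $F^* \mathrm{GL}_2(\mathbf{Q}_\ell)$, and let $\mathcal{L}_0$ be the lattice $\mathcal{O}_F S + \mathcal{O}_F T$ generated by $S, T$. There are only finitely many images of $\mathcal{L}_0$ under the action of the compact group $\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})$. Each such image $\mathcal{L}_i$ is of the form
\[ x\bigl(\mathcal{O}_F(aS + bT) + \mathcal{O}_F(cS + dT)\bigr) \]
with $x \in F^*$ and $a, b, c, d \in \mathbf{Q}_\ell$. Let $\mathcal{L}$ be the lattice generated by all the $\mathcal{L}_i$; then $\mathcal{L}$ is preserved by the action of $\rho(\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}))$. Because $F/L$ is unramified, we may write $x = yu$ with $y \in L^*$ and $u \in \mathcal{O}_F^*$. So each $\mathcal{L}_i$ can be rewritten as
\[ y\bigl(\mathcal{O}_F(aS + bT) + \mathcal{O}_F(cS + dT)\bigr). \]
Let $\mathcal{L}_i'$ be the lattice in $L^2$ defined by
\[ \mathcal{L}_i' = y\bigl(\mathcal{O}_L(aS + bT) + \mathcal{O}_L(cS + dT)\bigr), \]
and let
\[ \mathcal{L}' = \mathcal{O}_L(\alpha S + \beta T) + \mathcal{O}_L(\gamma S + \delta T) \]
with $\alpha, \beta, \gamma, \delta \in L$, be the lattice generated by all the $\mathcal{L}_i'$. Then $\mathcal{L} = \mathcal{L}' \otimes_{\mathcal{O}_L} \mathcal{O}_F$. Let $S' = \alpha S + \beta T$ and $T' = \gamma S + \delta T$, and write
\[ \rho : \mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}) \to \mathrm{GL}_2(F) \]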
with respect to the basis elements $S'$ and $T'$.
Since $\rho(\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}))$ preserves the lattice $\mathcal{O}_F S' + \mathcal{O}_F T'$, we have $\rho(\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})) \subset \mathrm{GL}_2(\mathcal{O}_F)$. Since $S', T'$ lie in $LS + LT$, we have $\rho(\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})) \subset F^* \mathrm{GL}_2(L)$. Combining these two facts yields the desired result.
The representation $\rho_{E,\ell}$ produced in Proposition 2.3 takes values in $M_\lambda^* \mathrm{GL}_2(\mathbf{Q}_\ell)$, where $M_\lambda$ is the extension of $\mathbf{Q}_\ell$ generated by the values of $\iota(\alpha(\sigma))$ for all $\sigma \in \mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})$. Recall from the proof of Proposition 2.3 that $\epsilon_{E,\ell}(\sigma) = \alpha^2(\sigma)/\deg \mu_\sigma$ is a Dirichlet character. So $M_\lambda$ is contained in an extension generated by square roots and roots of unity; it is thus an abelian extension of $\mathbf{Q}_\ell$. It follows from local class field theory that there exists an abelian extension $F$ of $\mathbf{Q}_\ell$ containing $M$ and such that $F$ also contains a subextension $L$ totally ramified over $\mathbf{Q}_\ell$ over which $F$ is unramified. Then $F$ and $L$ satisfy the conditions of Proposition 2.13, so there exists a basis of $F^{\oplus 2}$ with respect to which $\rho_{E,\ell}$ takes images in $\mathcal{O}_F^* \mathrm{GL}_2(\mathcal{O}_L)$.
Definition 2.14
We denote by
\[ \bar\rho_{E,\ell} : \mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}) \to \bar{\mathbf{F}}_\ell^* \mathrm{GL}_2(\mathbf{F}_\ell) \]
the representation obtained by choosing a basis of $F^{\oplus 2}$ as above and reducing the resulting representation
\[ \rho_{E,\ell} : \mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}) \to \mathcal{O}_F^* \mathrm{GL}_2(\mathcal{O}_L) \]
modulo the maximal ideal $\mathfrak{m}_F$ of $\mathcal{O}_F$.
The reduced representation $\bar\rho_{E,\ell}$ is then well defined up to semisimplification and conjugation by $\mathrm{GL}_2(\bar{\mathbf{F}}_\ell)$.
We observe that $\det \bar\rho_{E,\ell} = \bar\epsilon_{E,\ell}^{-1} \bar\chi_\ell$, where the overlines indicate the reductions of the $\ell$-adic characters to mod $\ell$ characters. Let $\bar\delta$ be the reduction mod $\ell$ of the coboundary map $\delta$ in (2.9).
From this point on, we assume that $\ell > 2$.
PROPOSITION 2.15
We have $b_E = \bar\delta(\bar\epsilon_{E,\ell})$.
Proof
This is immediate from Proposition 2.8.
When $R$ is a domain, we abuse notation and denote by “det” the determinant character from $\mathrm{PGL}_2(R)$ to $R^*/(R^*)^2$.
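Definition 2.16
Let $\ell$ be an odd prime. Then we define a quadratic Dirichlet character
\[ \bar\psi_{E,\ell} = \det P\bar\rho_{E,\ell} : \mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}) \to \mathbf{F}_\ell^*/(\mathbf{F}_\ell^*)^2 \cong \pm 1. \]
The character $\bar\psi_{E,\ell}$, like the cohomology class $b_E$, depends only on the isomorphism class of $E/K$.
Remark 2.17
The invariants $b_E$ and $\bar\psi_{E,\ell}$ are easy to compute in practice. For instance, suppose $K$ is a quadratic extension of $\mathbf{Q}$, and suppose $E/K$ is a Q-curve. Let $\tau$ be the nontrivial element of $\mathrm{Gal}(K/\mathbf{Q})$, and let $n$ be the integer such that $\mu\mu^\tau$ is multiplication by $n$. Then
\[ b_E = \begin{cases} 0 & \text{if } n \text{ is positive,} \\ \chi_K & \text{if } n \text{ is negative.} \end{cases} \]
Suppose, for simplicity, that $\ell$ does not divide $n$. Let $\eta : \mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}) \to \pm 1$ be the quadratic character ramified only at $\ell$. Then
\[ \bar\psi_{E,\ell} = \begin{cases} \eta, & n \in (\mathbf{Q}_\ell^*)^2, \\ \eta\chi_K, & n \notin (\mathbf{Q}_\ell^*)^2. \end{cases} \]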
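The case distinction for the character in Remark 2.17 is mechanical once one can decide whether n is an ℓ-adic square. A minimal sketch, assuming ℓ is an odd prime not dividing n: for an ℓ-adic unit, Hensel's lemma reduces the question to whether n is a square modulo ℓ, which Euler's criterion tests. The function names and the string labels are hypothetical and serve only to illustrate the two cases.

```python
def is_l_adic_square_unit(n: int, l: int) -> bool:
    """Decide whether n is a square in Q_l^*, for an odd prime l not dividing n.

    The caller is responsible for passing a prime l; primality is not checked.
    For odd l, a unit is an l-adic square exactly when it is a square mod l,
    and Euler's criterion says that happens when n**((l-1)/2) = 1 mod l.
    """
    if l < 3 or l % 2 == 0 or n % l == 0:
        raise ValueError("need an odd prime l that does not divide n")
    return pow(n % l, (l - 1) // 2, l) == 1


def psi_bar_label(n: int, l: int) -> str:
    """Return which case of Remark 2.17 applies, as a descriptive label."""
    return "eta" if is_l_adic_square_unit(n, l) else "eta * chi_K"


# Illustrative values (assumptions, not taken from the paper):
label_square = psi_bar_label(7, 3)      # 7 = 1 mod 3 is a square
label_nonsquare = psi_bar_label(2, 3)   # 2 is not a square mod 3
```

The sketch only selects between the two cases; it does not construct the characters themselves.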
3. The potentially supersingular, residually irreducible case
THEOREM 3.1
Suppose $K$ is tamely ramified over 3. Let $E/K$ be a Q-curve such that
• $E$ has good supersingular reduction over $K_v$ for one (hence every) prime $v$ of $K$ above 3;
• $b_E \in H^2(\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}), \pm 1)$ has trivial projection to $H^2(G_3, \pm 1)$;
• either $P\rho_{E,\ell}|_{G_3}$ is unramified for some (hence every) $\ell \neq 3$, or $\deg \mu_\sigma$ is not a multiple of 3 for any $\sigma \in \mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})$;
• the restriction of $\bar\rho_{E,3}$ to $\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}[\sqrt{-3}])$ is absolutely irreducible.
Then $E$ is modular.
Proof The basic tool is the theorem of A. Wiles [25] and of Taylor and Wiles [24], as refined
by Diamond [7] and by Conrad, Diamond, and Taylor [3]. In particular, our argument follows closely the proof of [3, Th. 7.2.1].
We have from Proposition 2.3 that $\rho_{E,3}$ is an odd, continuous representation unramified away from finitely many primes.
Write $G_v$, $I_v$ for the absolute Galois group and the inertia group of $K_v$. Write $I_3^w$ for the subgroup of wild inertia in $I_3$.
LEMMA 3.2
We have that $\bar\rho_{E,3}$ is modular.
Proof
We follow closely the usual argument that the 3-division points of an elliptic curve over $\mathbf{Q}$ form a modular representation (see [9, §I.1] for more details).
The image of $\bar\rho_{E,3}$ lies in $\bar{\mathbf{F}}_3^* \mathrm{GL}_2(\mathbf{F}_3)$. We suppose without loss of generality that the chosen extension of the 3-adic valuation of $\mathbf{Q}$ to $\mathbf{Q}[\sqrt{-2}]$ is given by the prime $(1 + \sqrt{-2})$. We can define a homomorphism
\[ \iota : \bar{\mathbf{F}}_3^* \mathrm{GL}_2(\mathbf{F}_3) \to \mu_\infty \mathrm{GL}_2(\mathbf{Z}[\sqrt{-2}]), \]
where $\mu_\infty$ denotes the group of roots of unity, as follows: set
\[ \iota\begin{pmatrix} -1 & 1 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} -1 & 1 \\ -1 & 0 \end{pmatrix}, \qquad \iota\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ -\sqrt{-2} & -1 + \sqrt{-2} \end{pmatrix}, \]
and, for each scalar $a \in \bar{\mathbf{F}}_3^*$, define $\iota(a)$ to be the preimage, under the chosen embedding $\bar{\mathbf{Q}} \hookrightarrow \bar{\mathbf{Q}}_3$, of the Teichmüller lift of $a$.
Let $F$ be a number field such that the image of $\iota \circ \bar\rho_{E,3}$ lies in $\mathrm{GL}_2(F)$, and let $w$ be the chosen extension of the 3-adic valuation to $F$. Then the composition of $\iota \circ \bar\rho_{E,3}$ with reduction mod $w$ is $\bar\rho_{E,3}$.
The composition $\iota \circ \bar\rho_{E,3}$ is a continuous complex representation of $\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})$, odd and irreducible because $\bar\rho_{E,3}$ is odd and absolutely irreducible. It follows from the theorem of R. Langlands and J. Tunnell that there exists a weight-1 eigenform, of some level and Dirichlet character,
\[ g = \sum_{n=1}^{\infty} b_n q^n \]
such that
\[ b_p = \operatorname{Tr}(\iota \circ \bar\rho_{E,3}(\mathrm{Frob}_p)) \]
for almost all $p$. Let $F'$ be a number field containing all the $b_n$. If $E$ is a weight-1 Eisenstein series whose Fourier expansion is congruent to 1 mod 3, then $gE$ is a weight-2 cusp form, of some level and Dirichlet character, such that $T_n(gE)$ is congruent mod $w'$ to $b_n gE$, for some prime $w'$ of $F'$ above $w$. It then follows from an argument of P. Deligne and Serre [6, §6.10] that there exists an eigenform
\[ f = \sum_{n=1}^{\infty} a_n q^n \]
of weight 2 with $a_n \in F'$ and $a_n \equiv b_n$ mod $w'$ for all $n$. In particular, $a_p = \operatorname{Tr}(\bar\rho_{E,3}(\mathrm{Frob}_p))$ for almost all $p$. So $\bar\rho_{E,3}$ is the mod $w'$ representation associated to $f$.
We show that $\rho_{E,3}$ satisfies the conditions of [3, Th. 7.1.1]. Recall that, for any $\ell$,
\[ \phi_{E,\ell} : \mathrm{Gal}(\bar{K}/K) \to \mathrm{GL}_2(\mathbf{Z}_\ell) \]
is the Galois representation attached to $E/K$ as elliptic curve, and recall that $P\phi_{E,\ell}$ and $P\rho_{E,\ell}$ are isomorphic projective representations of $\mathrm{Gal}(\bar{K}/K)$, by Proposition 2.3.
The representation $\rho_{E,\ell}$ produced by Proposition 2.3 depends on a choice of $\alpha : \mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}) \to \bar{\mathbf{Q}}^*$, a cochain whose coboundary is $c_E$. We begin by observing that $\alpha$ can be chosen so as to impart to $\rho_{E,\ell}$ some useful arithmetic properties.
LEMMA 3.3
There exists a choice of $\alpha : G_{\mathbf{Q}} \to \bar{\mathbf{Q}}^*$ such that
• for all $\ell \neq 3$, the representation $\rho_{E,\ell}|_{G_3}$ is tamely ramified;
• for all $\ell$, $\det \rho_{E,\ell}|_{G_3} = \chi_\ell|_{G_3}$;
• $\epsilon_E$ has 2-power order.
Proof
Since $K$ is tamely ramified, $c_E|_{G_3}$ is the inflation of an element of $H^2(G_3^t, \mathbf{Q}^*)$. The cohomology group
\[ H^2(G_3^t, \bar{\mathbf{Q}}^*) \]
is trivial, as can be seen by placing $G_3^t$ in the exact sequence
\[ 0 \to I_3^t \to G_3^t \to G_3/I_3 \to 0 \]
and computing the initial terms of the Hochschild-Serre spectral sequence (see [19, §6.1]). Therefore, there is a cochain $\alpha_3 : G_3 \to \bar{\mathbf{Q}}^*$ such that
\[ c_E(g, h) = \alpha_3(g)\alpha_3(h)\alpha_3(gh)^{-1} \]
for all $g, h \in G_3$, and such that $\alpha_3$ vanishes on the wild inertia group $I_3^w$. Now, let $\alpha' : \mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q}) \to \bar{\mathbf{Q}}^*$ be any cochain whose coboundary is $c_E$. Then $(\alpha'|_{G_3})\alpha_3^{-1}$ is a character $\theta_3$ of $G_3$. Let $\theta$ be a character of $\mathrm{Gal}(\bar{\mathbf{Q}}/\mathbf{Q})$ whose restriction to $G_3$ is $\theta_3$. Then define $\alpha = \alpha'\theta^{-1}$. So the coboundary of $\alpha$ is $c_E$, and $\alpha|_{G_3} = \alpha_3$; in particular, $\alpha$ vanishes on wild inertia.
Since $E$ obtains good reduction after a tame extension of $K$, we know $\phi_{E,\ell}$ is tamely ramified at 3 for all $\ell \neq 3$. It follows from definition (2.5) that $\rho_{E,\ell}|_{G_3}$ is tamely ramified for all $\ell \neq 3$.
By Proposition 2.8 and (2.6), the assumption that $b_E|_{G_3}$ is trivial means that
\[ \epsilon_E|_{G_3} = \chi^2 \]
for some character $\chi : G_3 \to \bar{\mathbf{Q}}^*$. The character $\epsilon_{E,\ell}$ is tamely ramified for any $\ell \neq 3$ because $\rho_{E,\ell}|_{G_3}$ is tamely ramified; it follows that $\epsilon_E$, hence also $\chi$, is tamely ramified. Replacing $\alpha$ by $\alpha\chi$ now yields the first two desired conditions.
In particular, $\epsilon_E|_{G_3}$ is trivial. So we can modify $\alpha$ by any power of $\epsilon_E$ without affecting the first two conditions. Now we can force $\epsilon_E$ to have 2-power order by the argument of Remark 2.6.
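Returning briefly to the explicit lift ι used in the proof of Lemma 3.2: two of its properties can be machine-checked, namely that the two matrices on which ι is prescribed generate all of GL2(F3), and that their images reduce back to them modulo the prime (1 + √−2). The sketch below is an illustration under stated assumptions, not part of the proof. It reads the matrices off the display in that proof, stores an element u + v√−2 of Z[√−2] as the pair (u, v), and uses the congruence √−2 ≡ −1 modulo (1 + √−2). All names are hypothetical, and the homomorphism property of ι itself is not verified.

```python
P = 3  # residue characteristic


def matmul_mod(x, y, p=P):
    """Multiply two 2x2 matrices with entries taken modulo p."""
    (a, b), (c, d) = x
    (e, f), (g, h) = y
    return (((a * e + b * g) % p, (a * f + b * h) % p),
            ((c * e + d * g) % p, (c * f + d * h) % p))


def generated_group(gens, p=P):
    """Closure of the generators under multiplication (a subgroup, since finite)."""
    identity = ((1, 0), (0, 1))
    seen = {identity}
    frontier = [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = matmul_mod(x, g, p)
            if y not in seen:
                seen.add(y)
                frontier.append(y)
    return seen


def reduce_entry(entry, p=P):
    """Reduce u + v*sqrt(-2) modulo (1 + sqrt(-2)), where sqrt(-2) = -1."""
    u, v = entry
    return (u - v) % p


def reduce_matrix(m, p=P):
    """Reduce a 2x2 matrix over Z[sqrt(-2)] entrywise to F_3."""
    return tuple(tuple(reduce_entry(e, p) for e in row) for row in m)


# The two matrices of GL2(F_3) on which iota is prescribed, written mod 3.
A = ((2, 1), (2, 0))   # (-1 1; -1 0)
B = ((1, 2), (1, 1))   # ( 1 -1; 1 1)

# Their prescribed images, with entries stored as pairs (u, v) = u + v*sqrt(-2).
iota_A = (((-1, 0), (1, 0)), ((-1, 0), (0, 0)))
iota_B = (((1, 0), (-1, 0)), ((0, -1), (-1, 1)))

group = generated_group([A, B])
```

The group generated has 48 elements, the order of GL2(F3), and each prescribed image reduces to the matrix it lifts.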
For the rest of the proof, it is understood that $\alpha$ is chosen so that $\rho_{E,\ell}|_{G_3}$ satisfies the conditions in Lemma 3.3.
We now take as fixed some $\ell > 3$. From Proposition 2.10, we have an abelian variety $A_\alpha/\mathbf{Q}$ such that
\[ \rho_{E,\ell} \cong V_\lambda A_\alpha, \]
where $\lambda | \ell$ is the prime of $M$ (the number field generated by the $\alpha(g)$) determined by $\iota$.
Denote by $L$ the ramified quadratic extension of $\mathbf{Q}_3$, by $G_L \subset G_3$ the absolute Galois group of $L$, and by $I_L$ the inertia subgroup of $G_L$. Let $\psi$ be the ramified quadratic character of $G_L$, and write $A_\alpha^\psi/L$ for the twist of $A_\alpha \times_{\mathbf{Q}} L$ by $\psi$.
LEMMA 3.4
Either
• $\rho_{E,\ell}|_{G_L}$ and $P\rho_{E,\ell}|_{G_3}$ are unramified, and $A_\alpha/L$ has good reduction; or
• $\rho_{E,\ell}|_{G_L} \otimes \psi$ is unramified, and $A_\alpha^\psi/L$ has good reduction.
Proof By Lemma 3.3, wild inertia is killed by ρ E,` . Let τ be a topological generator of I3t , and define m = ρ E,` (τ ). Since m has finite order, it is diagonalizable. Note that τ and τ 3 are conjugate in G 3 . So m and m 3 are conjugate in ∗ ¯ Q` GL2 (Q` ). Since det(m) = χ` (τ ) = 1, we conclude that the eigenvalues of m must be (1, 1), (−1, −1), or (i, −i). In the first two cases, we see that ρ E,` |G L is unramified. In the last case, (ρ E,` |G L ) ⊗ ψ is unramified. Suppose the eigenvalues of m are (1, 1) or (−1, −1); equivalently, Pρ E,` |G 3 is unramified. Recall from Remark 2.5 that Pρ E,` does not depend on the choice of ι. So ρ E,λi (τ ) is scalar for any i, hence I L acts trivially on Vλi Aα for any prime λi of M dividing `. It then follows from Proposition 2.10 that I L acts trivially on V` Aα , and so Aα /L has good reduction. Likewise, if the eigenvalues of m are (i, −i), then ρ E,λi (τ 2 ) = −1 for all i, so ψ ρ E,λi |G L ⊗ ψ is unramified for all i, and Aα /L has good reduction. Suppose Pρ E,` |G 3 is unramified. Then ρ E,` |G L is unramified. Since det ρ E,` |I3 is trivial, the image ρ E,` (I3 ) is either trivial or ±1. So, in fact, either Aα /Q3 or its ramified quadratic twist has good reduction over Q3 . Therefore, either ρ E,3 |G 3 or its ramified quadratic twist is associated to a 3-divisible group, and E is modular by [7, Th. 5.3]. We therefore assume from now on that (ρ E,` |G L ) ⊗ ψ is unramified, so that ψ Aα /L has good reduction. In this case, by the hypotheses of our theorem, 3 does not ¯ divide deg µg for any g ∈ Gal(Q/Q). ψ In this case, Aα /L has good reduction. Therefore, if ψ 0 is a ramified quadratic √ √ ψ0 ¯ character of Gal(Q/Q[ −3]), the twist Aα /Q[ −3] has good reduction at the prime over 3. ¯ 3 . 
Let $\theta$ be the prime of $M$ determined by the chosen embedding $M \hookrightarrow \bar{\mathbb{Q}}_3$. Then, by Proposition 2.12, we can define a finite flat group scheme $A_\alpha^{\psi'}[\theta]$ as the 3-torsion (equivalently, the $\theta$-torsion) of the 3-divisible group $T_\theta A_\alpha^{\psi'}/\mathbb{Q}[\sqrt{-3}]$. Because $\bar\rho_{E,3}$ is absolutely irreducible when restricted to $\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}[\sqrt{-3}])$, and because
$$\rho_{E,3} \otimes \psi' \cong V_\theta A_\alpha^{\psi'},$$
we have an isomorphism of $(\mathcal{O}_{M_\theta}/\theta)[\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}[\sqrt{-3}])]$-modules:
$$\bar\rho_{E,3} \otimes \psi' \cong A_\alpha^{\psi'}[\theta].$$
Restricting this isomorphism to $G_L$ yields an isomorphism of $(\mathcal{O}_{M_\theta}/\theta)[G_L]$-modules:
$$\bar\rho_{E,3}|_{G_L} \otimes \psi \cong A_\alpha^{\psi}[\theta].$$
In particular, $(\bar\rho_{E,3}|_{G_L}) \otimes \psi$ is flat. Recall that a representation of $G_L$ with finite image is said to be flat if the attached finite flat group scheme over $L$ is the generic
fiber of a finite flat group scheme over $\mathcal{O}_L$. In this case, the finite flat group scheme in question is $(A_\alpha^{\psi})[\theta]$.

LEMMA 3.5
The centralizer of $\bar\rho_{E,3}(G_3)$ consists entirely of scalars.

Proof
The result follows from a theorem of Conrad [2, Th. 4.2.1]. As above, the relevant finite flat group scheme over $\mathcal{O}_L$ is $G = (A_\alpha^{\psi})[\theta]$. To apply Conrad's theorem, we need only verify that $G$ is connected and has connected Cartier dual, and that $G$ satisfies a certain exactness condition on Dieudonné modules. The connectedness of $G$ and its dual follows from the fact that $G$ is a closed subgroup scheme of the 3-torsion subscheme of the supersingular abelian variety $A_\alpha^{\psi}$. The exactness condition is automatically satisfied because $G$ is the 3-torsion in the 3-divisible group $T_\theta A_\alpha^{\psi}$.

Let $F$ be a finite extension of $\mathbb{Q}_\ell$. Recall that an $\ell$-adic representation $\rho$ of the Galois group of $F$ is said to be Barsotti-Tate if it arises from the generic fiber of an $\ell$-divisible group, and to be potentially Barsotti-Tate if the restriction of $\rho$ to some finite-index subgroup of $\mathrm{Gal}(\bar F/F)$ is Barsotti-Tate (see [3, §1.1]). The representation $\rho_{E,3}|_{G_3}$ is potentially Barsotti-Tate because it is realized on the $\theta$-adic Tate module of $A_\alpha$, which has potentially good reduction. From now on, we abuse notation and refer to the local representation $\rho_{E,3}|_{G_3}$ simply as $\rho_{E,3}$.

Let $V$ be a $d$-dimensional vector space over a finite extension $F'$ of $\mathbb{Q}_\ell$. One can associate to any potentially Barsotti-Tate representation $\rho : \mathrm{Gal}(\bar F/F) \to \mathrm{GL}(V)$ a continuous representation
$$WD(\rho) : W_F \to \mathrm{GL}(D)$$
of the Weil group of $F$ on a $\bar{\mathbb{Q}}_\ell$-vector space $D$ of dimension $d$, as in Conrad, Diamond, and Taylor [3, Appendix B]. In the lemma that follows, we freely use definitions and facts from that paper, especially [3, §1.2, §2.3, and Appendix B].

LEMMA 3.6
The type of $WD(\rho_{E,3})$ is strongly acceptable for $\bar\rho_{E,3}$.
Proof
We take $F' = M_\lambda$. Let $\tau$ be the restriction of $WD(\rho_{E,3})$ to $I_3$. It follows from Proposition 2.10 and [3, Prop. B.4.2] that $\rho_{E,3}$ is Barsotti-Tate over $L'$ for any finite extension $L'/\mathbb{Q}_\ell$ such that $\tau$ is trivial. Our choice of $\alpha$ in Lemma 3.3 guarantees that $\det \rho_{E,3} = \chi_3$ and
$\det \bar\rho_{E,3} = \bar\chi_3$. It follows that $\rho_{E,3}$ is a deformation of $\bar\rho_{E,3}$ of type $\tau$, according to the definition in [3, §1.2].

We know that $(\rho_{E,3}|_{G_L}) \otimes \psi$ is Barsotti-Tate because it is associated to the 3-divisible group $T_\theta A_\alpha^{\psi}$. So
$$WD((\rho_{E,3}|_{G_L}) \otimes \psi) = WD(\rho_{E,3}|_{G_L}) \otimes WD(\psi)$$
is unramified, and so $\tau|_{I_L} = WD(\psi)|_{I_L}$. We know that $WD(\psi) = \psi|_{W_L} \otimes_{M_\lambda} \bar{\mathbb{Q}}_\ell$ (see [3, §B.2]); that is, $WD(\psi)|_{I_L}$ is a nontrivial quadratic character of $I_L$. We also know that the determinant of $\tau$ is trivial on $I_3$ because the $WD$ functor commutes with exterior products, and the determinant of $\rho_{E,3}$ is the cyclotomic character $\chi_3$; the character $WD(\chi_3)$ is shown to be unramified in [3, §B.2]. We conclude that
$$\tau \cong \tilde\omega_2^{2} \oplus \tilde\omega_2^{6},$$
where $\tilde\omega_2 : I_3^t \to \bar{\mathbb{Q}}_3^{*}$ is the Teichmüller lift of $\omega_2$, the fundamental tame character of level 2. It now follows from [3, Cor. 2.3.2] that $\tau$ is acceptable for $\bar\rho_{E,3}$.

We have by [2, Th. 4.2.1] that either
• $(\bar\rho_{E,3}|_{I_3}) \otimes_{\mathbb{F}_3} \bar{\mathbb{F}}_3 \cong \omega_2^{m} \oplus \omega_2^{3m}$, where $m = 1$ or $5$; or
• $\bar\rho_{E,3}|_{I_3} \cong \begin{pmatrix} \bar\chi_3^{m} & * \\ 0 & \bar\chi_3^{n} \end{pmatrix}$, where $(m, n) = (0, 1)$ or $(1, 0)$ and $*$ is peu ramifié.

In either case, it follows from the criterion of [3, §1.2] that $\tau$ is strongly acceptable for $\bar\rho_{E,3}$. Now, combining Lemmas 3.5 and 3.6, we can apply [3, Th. 7.1.1] and conclude that $\rho_{E,3}$, hence $E$, is modular.

4. More on residual representations
In [25], Wiles deals with the case where the 3-adic representation associated to an elliptic curve $C$ is residually reducible by executing a "3-5 switch." That is, he replaces $C$ with another elliptic curve $C'$ such that the mod 3 representation attached to $C'$ is absolutely irreducible when restricted to $\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}[\sqrt{-3}])$, and such that $C$ and $C'$ have isomorphic mod 5 Galois representations. Aside from a finite set of exceptions, the common mod 5 Galois representation is absolutely irreducible when restricted to $\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}[\sqrt{5}])$. This coincidence of mod 5 Galois representations is enough to show that modularity of $C'$ is equivalent to modularity of $C$, and the modularity of $C'$ follows from the condition on the mod 3 Galois representation of $C'$. This argument relies on the fact that, given an elliptic curve $C$, there are plenty of elliptic curves $C'$ whose mod 5 representations are isomorphic to that of $C$. This fact, in turn, depends on the fact that the modular curve $X(5)/\mathbb{Q}$ is isomorphic to $\mathbb{P}^1/\mathbb{Q}$. In general, the
modular curve parametrizing $\mathbb{Q}$-curves with full level 5 structure does not have genus zero, rendering a 3-5 switch impossible.

We are left with two methods of treating the residually reducible cases. One method is to generalize the lifting theorems of Wiles, Taylor and Wiles, and others to the residually reducible situation. Several theorems in this direction have been proven by Wiles and C. Skinner [22], [23] in the case where the reduction of $E$ is ordinary or multiplicative. We apply those theorems to the present situation in Theorem 5.1.

Another method is to exploit the fact that, in contrast with the case of elliptic curves over $\mathbb{Q}$, there are often cohomological obstructions to the reducibility of $\bar\rho_{E,\ell}$. These obstructions can be computed explicitly in terms of the invariants described in Section 2. We begin with a general fact about reducible projective mod $\ell$ Galois representations.

PROPOSITION 4.1
Let $\ell$ be an odd prime, and let
$$P\bar\rho : \mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}) \to \mathrm{PGL}_2(\mathbb{F}_\ell)$$
be a projective mod $\ell$ Galois representation. Let $\chi : \mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}) \to \pm 1$ be a quadratic Dirichlet character (possibly trivial). Let $G$ be the subgroup of matrices in $\bar{\mathbb{F}}_\ell^{*}\,\mathrm{GL}_2(\mathbb{F}_\ell)$ having determinant 1, and let $\gamma \in H^2(\mathrm{PGL}_2(\mathbb{F}_\ell), \pm 1)$ be the class of the extension
$$1 \to \pm 1 \to G \to \mathrm{PGL}_2(\mathbb{F}_\ell) \to 1.$$
Let $\bar\psi = \det P\bar\rho$. Finally, suppose that either
(a) the image of $P\bar\rho$ lies in the normalizer $N$ of a Cartan subgroup $C$ of $\mathrm{PGL}_2(\mathbb{F}_\ell)$, and the quadratic character $\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}) \to N/C$ is equal to $\chi$; or
(b) the image of $P\bar\rho$ lies in a Borel subgroup of $\mathrm{PGL}_2(\mathbb{F}_\ell)$, and $\chi$ is trivial.
Then either $(\bar\psi, \chi\bar\psi)$ or $P\bar\rho^{*}\gamma\,(\bar\psi, \chi\bar\psi)(\chi, \chi)$ is the trivial class in $H^2(\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}), \pm 1)$.

Proof
First, suppose the image of $P\bar\rho$ lies in the normalizer $N$ of a Cartan subgroup $C$ of $\mathrm{PGL}_2(\mathbb{F}_\ell)$. Write $\bar N$ for the group $N/C^2$. Let $\pi$ be the natural projection of $N$ onto $\bar N$. Then $\bar N \cong (\mathbb{Z}/2\mathbb{Z})^{\oplus 2}$; a choice of isomorphism can be fixed by requiring that the first copy of $\mathbb{Z}/2\mathbb{Z}$ be $\pi(C)$ and the second be the kernel of $\det : \bar N \to \mathbb{F}_\ell^{*}/(\mathbb{F}_\ell^{*})^2$. We then have
$$\pi \circ P\bar\rho = \bar\psi \oplus \chi : \mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}) \to (\mathbb{Z}/2\mathbb{Z})^{\oplus 2}. \tag{4.12}$$
We consider two cases.

Case 1: $|C^2|$ is even. Then $\pi$ factors as
$$N \to \hat N \to \bar N,$$
where $\hat N$ is a dihedral group of order 8 whose cyclic subgroup of order 4 is the preimage of $\pi(C)$. So $\pi \circ P\bar\rho$ lifts to a homomorphism from $\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q})$ to $\hat N$, which means that $d(\pi \circ P\bar\rho)$ vanishes in the cohomology sequence
$$H^1(\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}), \hat N) \to H^1(\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}), \bar N) \xrightarrow{\;d\;} H^2(\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}), \pm 1).$$
The isomorphism $\bar N \cong (\mathbb{Z}/2\mathbb{Z})^{\oplus 2}$ then tells us that $d'(\bar\psi \oplus \chi)$ vanishes in
$$H^1(\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}), D_4) \to H^1(\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}), (\mathbb{Z}/2\mathbb{Z})^{\oplus 2}) \xrightarrow{\;d'\;} H^2(\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}), \pm 1),$$
where $D_4$ is a dihedral group of order 8 whose cyclic subgroup of order 4 is the preimage of the first copy of $\mathbb{Z}/2\mathbb{Z}$. It is well known that, for any two characters $\chi_1, \chi_2$, we have $d'(\chi_1 \oplus \chi_2) = (\chi_1, \chi_1\chi_2)$ (see [11, Prop. 3.10]). So $(\bar\psi, \chi\bar\psi) = 0$, as desired.

Case 2: $|C^2|$ is odd. In this case, the inflation map
$$\pi^{*} : H^2(\bar N, \pm 1) \to H^2(N, \pm 1) \tag{4.13}$$
is an isomorphism. The subgroup of $N$ generated by an involution in $C$ and any element of $N \setminus C$ is isomorphic to $(\mathbb{Z}/2\mathbb{Z})^{\oplus 2}$; in fact, any such subgroup is the image of an injection $s : \bar N \to N$ such that $\pi \circ s$ is the identity. Write $\iota$ for the inclusion of $N$ in $\mathrm{PGL}_2(\mathbb{F}_\ell)$. Let $M$ be the subgroup of $G$ lying over $s(\bar N)$. Then $s^{*}\iota^{*}\gamma$ is the class $c \in H^2(\bar N, \pm 1)$ corresponding to the extension
$$1 \to \pm 1 \to M \to \bar N \to 1.$$
It follows from the fact that a nonscalar element of $G$ whose square is a scalar has exact order 4 that $M$ is the quaternion group of order 8. Now, from (4.12) and [11, Th. 3.11], one gets
$$P\bar\rho^{*}\pi^{*}c = (\bar\psi \oplus \chi)^{*}c = (\bar\psi, \chi\bar\psi)(\chi, \chi).$$
The isomorphism (4.13) implies that $\pi^{*}s^{*}$ acts as the identity on $H^2(N, \pm 1)$. In particular, we have $\pi^{*}c = \iota^{*}\gamma$. Pulling back both of these by $P\bar\rho$ (or, more precisely, by the homomorphism $f : \mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}) \to N$ such that $\iota \circ f = P\bar\rho$), one obtains the equality $P\bar\rho^{*}\gamma = P\bar\rho^{*}\pi^{*}c$, which yields the desired result.

The only case remaining is that where the image of $P\bar\rho$ lies in a Borel subgroup but not necessarily in the normalizer of a Cartan. In this case, the semisimplification of $P\bar\rho$ has image lying in a split Cartan subgroup, and we are in the case already discussed.
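The fact about elements of $G$ quoted in Case 2 follows from the Cayley-Hamilton theorem. A short derivation is sketched below; the names $g$ (a nonscalar element of $G$) and $c_0$ (the scalar value of $g^2$) are illustrative and are not taken from the text.

```latex
% Sketch: g in G (so det g = 1), g not a scalar, g^2 = c_0 a scalar.
\begin{align*}
  g^{2} - \operatorname{tr}(g)\, g + 1 &= 0
      && \text{(Cayley-Hamilton, using } \det g = 1\text{)} \\
  \operatorname{tr}(g)\, g &= c_0 + 1
      && \text{(substituting } g^{2} = c_0\text{)} \\
  \operatorname{tr}(g) = 0, \quad c_0 &= -1
      && \text{(otherwise } g \text{ would be a scalar)} \\
  g^{2} = -1, \quad g^{4} &= 1
      && \text{(and } -1 \neq 1 \text{ because } \ell \text{ is odd)}
\end{align*}
```

Each of the six elements of $M$ outside $\pm 1$ lies over an involution of $\mathrm{PGL}_2(\mathbb{F}_\ell)$, so it is nonscalar with scalar square and therefore has order 4. Among groups of order 8, only the quaternion group has six elements of order 4.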
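We now apply Proposition 4.1 to the case of mod $\ell$ representations attached to $\mathbb{Q}$-curves.

PROPOSITION 4.2
Let $E/K$ be a $\mathbb{Q}$-curve, and let $\ell$ be an odd prime. Let $\chi : \mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}) \to \pm 1$ be a quadratic Dirichlet character (possibly trivial). Let $q_{\ell,\infty} \in H^2(\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}), \pm 1)$ be the Brauer class of the quaternion algebra ramified only at $\ell$ and $\infty$. Suppose that either
(i) the image of $P\bar\rho_{E,\ell}$ lies in the normalizer $N$ of a Cartan subgroup $C$ of $\mathrm{PGL}_2(\mathbb{F}_\ell)$, and the quadratic character $\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}) \to N/C$ is equal to $\chi$; or
(ii) the image of $P\bar\rho_{E,\ell}$ lies in a Borel subgroup of $\mathrm{PGL}_2(\mathbb{F}_\ell)$, and $\chi$ is trivial.
Then either $(\bar\psi_{E,\ell}, \chi\bar\psi_{E,\ell})$ or $b_E\, q_{\ell,\infty}\,(\bar\psi_{E,\ell}, \chi\bar\psi_{E,\ell})(\chi, \chi)$ is the trivial class in $H^2(\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}), \pm 1)$.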
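Proof
The proposition is an immediate corollary of Proposition 4.1. The only thing to check is that
$$P\bar\rho_{E,\ell}^{*}\gamma = b_E\, q_{\ell,\infty}.$$
Let $G$ be as in the statement of Proposition 4.1. For each $\sigma \in \mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q})$, let $d_\sigma \in \bar{\mathbb{F}}_\ell^{*}$ be a square root of $\det(\bar\rho_{E,\ell}(\sigma))$. Then $g_\sigma = d_\sigma^{-1}\bar\rho_{E,\ell}(\sigma)$ is a set-theoretic lift of $P\bar\rho_{E,\ell}$ to $G$. To this lift one associates a 2-cocycle $c$ given by the rule
$$c(\sigma, \tau) = g_\sigma g_\tau g_{\sigma\tau}^{-1} = d_\sigma^{-1} d_\tau^{-1} d_{\sigma\tau}.$$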
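The second equality in the formula for $c$ uses only that $\bar\rho_{E,\ell}$ is a homomorphism and that the scalars $d_\sigma$ are central. Spelled out as a sketch, with $g_\sigma = d_\sigma^{-1}\bar\rho_{E,\ell}(\sigma)$ as in the proof:

```latex
% Sketch: the matrix factors cancel, and c takes values in {+1, -1}.
\begin{align*}
  g_\sigma g_\tau g_{\sigma\tau}^{-1}
    &= d_\sigma^{-1} d_\tau^{-1} d_{\sigma\tau}\,
       \bar\rho_{E,\ell}(\sigma)\,\bar\rho_{E,\ell}(\tau)\,
       \bar\rho_{E,\ell}(\sigma\tau)^{-1}
     = d_\sigma^{-1} d_\tau^{-1} d_{\sigma\tau}, \\
  c(\sigma,\tau)^{2}
    &= \det\bar\rho_{E,\ell}(\sigma)^{-1}\,
       \det\bar\rho_{E,\ell}(\tau)^{-1}\,
       \det\bar\rho_{E,\ell}(\sigma\tau) = 1 .
\end{align*}
```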
But this is just a 2-cocycle representing the class $\bar\delta(\det \bar\rho_{E,\ell})$, where $\bar\delta$ is defined as in Proposition 2.15. From that proposition and from the fact that $\bar\delta(\bar\chi_\ell) = q_{\ell,\infty}$, one has
$$\bar\delta(\det \bar\rho_{E,\ell}) = b_E\, \bar\delta(\bar\chi_\ell) = b_E\, q_{\ell,\infty}.$$
The desired result follows.

Proposition 4.2 guarantees in many cases that the 3-adic representation attached to a $\mathbb{Q}$-curve is residually absolutely irreducible, even when restricted to a quadratic field.

5. The main theorems
We are now ready to state and prove the main results of the paper. Recall that $(b_E)_3$ denotes the restriction of $b_E$ to $H^2(G_3, \pm 1)$.
THEOREM 5.1
Suppose $E/K$ is a $\mathbb{Q}$-curve with potentially ordinary or multiplicative reduction at some (hence every) prime of $K$ over 3, and such that $(b_E)_3$ is trivial. Then $E$ is modular.

Proof
First, suppose that $\bar\rho_{E,3}$ is absolutely reducible when restricted to $\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}[\sqrt{-3}])$. For this case we appeal to the main theorems of [22] and [23]. In order for these theorems to apply, we need only verify the following properties of the representation $\rho_{E,3}$:
(i) $\rho_{E,3}$ is continuous, irreducible, and odd;
(ii) $\det \rho_{E,3}(\mathrm{Frob}_\ell) = \psi(\ell)\ell^{k-1}$ for some finite character $\psi$, some integer $k \geq 2$, and almost all primes $\ell$;
(iii) $\rho_{E,3}|_{G_3} \cong \begin{pmatrix} \phi_1 & * \\ & \phi_2 \end{pmatrix}$ with $\phi_2|_{I_3}$ finite;
(iv) the reductions $\bar\phi_1$ and $\bar\phi_2$ are distinct;
(v) $\bar\rho_{E,3}$ is modular (in the sense of Lemma 3.2) if it is absolutely irreducible.

Properties (i) and (ii) follow from Proposition 2.3 and (2.6). (Here we have again used that $E$ does not have complex multiplication, this time to ensure that $\phi_{E,3}$, and hence $\rho_{E,3}$, is irreducible.)

We next prove that property (iii) holds. From the possibilities for the reduction type of $E$, it follows that the restriction of $\phi_{E,3}$ to a decomposition group $G_v$ at a prime $v \mid 3$ of $K$ satisfies
$$\phi_{E,3}|_{G_v} \cong \begin{pmatrix} \theta_1 & * \\ & \theta_2 \end{pmatrix}$$
with $\theta_2$ having finite order on inertia. We claim that the same is true of $\rho_{E,3}|_{G_3}$. Suppose otherwise. From the fact that $\rho_{E,3}|_{\mathrm{Gal}(\bar K/K)}$ is isomorphic to a twist of $\phi_{E,3}$, it follows that there is a quadratic extension, say $L$, of $\mathbb{Q}_3$ such that the restriction of $\rho_{E,3}$ to $\mathrm{Gal}(\bar L/L)$ is the direct sum of two characters that are interchanged by the action of $\mathrm{Gal}(L/\mathbb{Q}_3)$. Since the product of these two characters, being the restriction of $\det \rho_{E,3}$, is infinitely ramified, so must be one, and hence both, of these characters. But this contradicts the above description of $\phi_{E,3}|_{G_v}$. Write
$$\rho_{E,3}|_{G_3} \cong \begin{pmatrix} \phi_1 & * \\ & \phi_2 \end{pmatrix}.$$

We next prove that the reductions $\bar\phi_1$ and $\bar\phi_2$ are distinct on $G_3$; in other words, that $\rho_{E,3}$ has property (iv). To see this, we note that if $\bar\phi_1$ and $\bar\phi_2$ were not distinct on $G_3$, then $\det \bar\rho_{E,3}|_{G_3}$ would be a square. Suppose this were so. Then from $\det \bar\rho_{E,3} = \bar\epsilon\,\bar\chi_3$ (see (2.6)), we conclude that $\bar\epsilon|_{G_3} = \phi^2\,\bar\chi_3|_{G_3}$ for some character $\phi$ of $G_3$. It then follows from Proposition 2.15 that the restriction of $b_E$ to $G_3$ equals
the restriction of $\bar\delta(\bar\chi_3)$ to $G_3$. But the latter is nontrivial, and hence so is the former, contradicting the hypothesis of the theorem that $(b_E)_3$ is trivial.

It remains to prove that property (v) holds. If $\bar\rho_{E,3}$ is absolutely irreducible, then it must be dihedral and in fact induced from a character of $\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}(\sqrt{-3}))$ since we are assuming that $\bar\rho_{E,3}$ is absolutely reducible on $\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}(\sqrt{-3}))$. It is a classical result that such representations are modular.

We have shown that $\rho_{E,3}$ has properties (i)–(v) listed above. As mentioned before, the theorem follows.

Now, suppose that $\bar\rho_{E,3}$ is absolutely irreducible when restricted to $\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}[\sqrt{-3}])$. By the argument above,
$$\rho_{E,3}|_{G_3} \cong \begin{pmatrix} \phi_1 & * \\ & \phi_2 \end{pmatrix},$$
where $\phi_2$ has finite image on inertia and $\phi_1|_{I_3} = \eta\chi_3|_{I_3}$ with $\eta$ a finite-order character. After twisting $\rho_{E,3}$ by a finite-order character of $G_{\mathbb{Q}}$, we may assume $\phi_2$ is unramified. We have already shown above that $\bar\phi_1 \neq \bar\phi_2$. Finally, $\bar\rho_{E,3}$ is modular by Lemma 3.2 (which does not use the assumption of supersingular reduction in Theorem 3.1). It now follows from [7, Th. 5.3] that $\rho_{E,3}$, hence $E$, is modular.

THEOREM 5.2
Suppose $E/K$ is a $\mathbb{Q}$-curve such that, for some (hence every) prime $\ell > 3$, the projective representation $P\rho_{E,\ell}$ associated to $\rho_{E,\ell}$ is unramified at 3. Then $E$ is modular.
Proof
If $E$ has potentially ordinary or multiplicative reduction, the modularity follows from Theorem 5.1. We therefore assume that the reduction of $E$ is potentially supersingular. We have that $\rho_{E,\ell}|_{I_3}$ is a character $\theta$. So $\theta^2 = \det \rho_{E,\ell}|_{I_3} = \epsilon_E|_{I_3}$. Choose $\alpha$ such that $\epsilon_E$ has 2-power order; then $\epsilon_E$, hence also $\rho_{E,\ell}$, is tamely ramified. We may choose $K$ to be a compositum of quadratic fields (see [14, Cor. 2.5]), in which case it follows that $E$ obtains good reduction over a tamely ramified extension of $\mathbb{Q}_3$.

Let $\tau$ be a topological generator of tame inertia, and let $m = \rho_{E,\ell}(\tau)$; then $m$ is a scalar that is conjugate to its cube, so $m = \pm 1$. In either case, $\det m = \epsilon_E(\tau) = 1$, so $\epsilon_E$ is unramified at 3, and $(b_E)_3 = \delta(\epsilon_E|_{G_3})$ is trivial. From Proposition 2.10, the action of $\tau$ on $T_\ell A_\alpha$ is either $1$ or $-1$. Thus, after modifying $\alpha$ by a quadratic character, we may assume that $A_\alpha$ has good supersingular reduction at 3.
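The step from "scalar and conjugate to its cube" to $m = \pm 1$ and $\det m = 1$ is a one-line computation. In the sketch below, $c_0$ is an illustrative name for the scalar by which $m$ acts on the 2-dimensional space; it is not notation from the text.

```latex
% Sketch: conjugation fixes scalar matrices, so m ~ m^3 forces c_0 = c_0^3.
\begin{align*}
  m = c_0 \cdot 1, \quad m \sim m^{3}
    &\implies c_0 = c_0^{3} \implies c_0^{2} = 1 \implies m = \pm 1, \\
  \det m &= c_0^{2} = 1 .
\end{align*}
```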
Therefore, $A_\alpha[3]$ extends to a finite flat group scheme over $R = W(\bar{\mathbb{F}}_3)$, to which we can apply M. Raynaud's classification in [15]. Let $F$ be the fraction field of $R$. Let $H$ be a Jordan-Hölder quotient of $A_\alpha[3]_F$. Then we have from [15, Cor. 3.4.4] that the action of $\tau$ on $H(\bar F)$ has eigenvalues
$$\psi_m(\tau)^{n 3^{i}} \quad (i = 0, \ldots, m - 1),$$
where $\psi_m$ is a fundamental tame character of $I_3$, and $n$ is an integer whose base-3 expansion contains only 0's and 1's. In particular, $\tau^4$ acts trivially on $H(\bar F)$ if and only if $\tau^2$ acts trivially.

Suppose $\tau^2$ acts trivially on $H(\bar F)$. Then $H(\bar F)$ is a 1-dimensional $\mathbb{F}_3$-vector space, and $H$ is isomorphic to either $(\mathbb{Z}/3\mathbb{Z})_K$ or $(\mu_3)_K$. It then follows from [15, Cor. 3.3.6] that $A_\alpha[3]/R$ has either $\mathbb{Z}/3\mathbb{Z}$ or $\mu_3$ as a subquotient, which contradicts the supersingularity of $A_\alpha$.

We may therefore suppose that $\tau^4$ acts nontrivially on the $\bar F$-points of every subquotient of $A_\alpha[3]$. In particular, $\bar\rho_{E,3}(\tau^4)$ does not have 1 as an eigenvalue.

Suppose the restriction of $\bar\rho_{E,3}$ to $\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}[\sqrt{-3}])$ is absolutely reducible. As in §3, let $L$ be the ramified quadratic extension of $\mathbb{Q}_3$. Then $(\bar\rho_{E,3}|_{I_L})^{ss} \cong \phi_1 \oplus \phi_2$ for some characters $\phi_1, \phi_2 : I_L^t \to \bar{\mathbb{F}}_3^{*}$. Since $(\bar\rho_{E,3}|_{I_L})$ extends to a representation of $G_3$, we have that $\{\phi_1, \phi_2\} = \{\phi_1^3, \phi_2^3\}$. The fact that $\bar\rho_{E,3}|_{\mathbb{Q}[\sqrt{-3}]}$ is absolutely reducible means that in fact $\phi_i^3 = \phi_i$ for $i = 1, 2$; in other words, $\phi_1$ and $\phi_2$ are quadratic characters. In particular, $\bar\rho_{E,3}(\tau^4)$ is unipotent, which is a contradiction.

To sum up: we have shown that under the hypotheses of the theorem, we know that
• $E$ obtains good supersingular reduction over a tame extension of $\mathbb{Q}_3$;
• $(b_E)_3 = 1$; and
• $\bar\rho_{E,3}|_{\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}[\sqrt{-3}])}$ is absolutely irreducible.
It now follows from Theorem 3.1 that $E$ is modular.

THEOREM 5.3
Suppose $E/K$ is a $\mathbb{Q}$-curve that acquires semistable reduction over a field tamely ramified over $\mathbb{Q}_3$. Suppose further that $(b_E)_3$ is trivial, and suppose that the four classes $(\bar\psi_{E,3}, -1)$, $b_E\, q_{3,\infty}(\bar\psi_{E,3}, -1)$, $q_{3,\infty}(\bar\psi_{E,3}, 3)$, and $b_E(\bar\psi_{E,3}, 3)$ are all nontrivial in $H^2(\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}), \pm 1)$. Finally, suppose that $\deg \mu_\sigma$ can be chosen to be prime to 3 for all $\sigma \in \mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q})$. Then $E$ is modular.
Proof We may assume that the reduction of E over 3 is potentially supersingular; otherwise, E is modular by Theorem 5.1.
It follows from Proposition 4.2 that the restriction of $\bar\rho_{E,3}$ to $\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}[\sqrt{-3}])$ is absolutely irreducible. It then follows from Theorem 3.1 that $E$ is modular.

Acknowledgments. The authors wish to thank Brian Conrad, Fred Diamond, Matthew Emerton, Jordi Quer, Ken Ribet, Richard Taylor, and Andrew Wiles for helpful discussions.

References
[1] C. BREUIL, B. CONRAD, F. DIAMOND, and R. TAYLOR, On the modularity of elliptic curves over Q, to appear in J. Amer. Math. Soc.
[2] B. CONRAD, Ramified deformation problems, Duke Math. J. 97 (1999), 439–513. MR 2000h:11055
[3] B. CONRAD, F. DIAMOND, and R. TAYLOR, Modularity of certain potentially Barsotti-Tate Galois representations, J. Amer. Math. Soc. 12 (1999), 521–567. MR 99i:11037
[4] G. CORNELL, J. H. SILVERMAN, and G. STEVENS, eds., Modular Forms and Fermat's Last Theorem (Boston, 1995), Springer, New York, 1997. MR 99k:11004
[5] H. DARMON, "Serre's conjectures" in Seminar on Fermat's Last Theorem (Toronto, 1993/94), ed. V. Kumar Murty, CMS Conf. Proc. 17, Amer. Math. Soc., Providence, 1995, 135–153. MR 96h:11048
[6] P. DELIGNE and J.-P. SERRE, Formes modulaires de poids 1, Ann. Sci. École Norm. Sup. (4) 7 (1974), 507–530. MR 52:284
[7] F. DIAMOND, On deformation rings and Hecke rings, Ann. of Math. (2) 144 (1996), 137–166. MR 97d:11172
[8] G. FALTINGS, Endlichkeitssätze für abelsche Varietäten über Zahlkörpern, Invent. Math. 73 (1983), 349–366. MR 85g:11026a
[9] S. GELBART, "Three lectures on the modularity of $\bar\rho_{E,3}$ and the Langlands reciprocity conjecture" in Modular Forms and Fermat's Last Theorem (Boston, 1995), Springer, New York, 1997, 155–207. MR CMP 1 638 479
[10] A. GROTHENDIECK, Groupes de monodromie en géométrie algébrique, I, Séminaire de Géométrie Algébrique du Bois-Marie (SGA 7 I), Lecture Notes in Math. 288, Springer, Berlin, 1972. MR 50:7134
[11] H. G. GRUNDMAN, T. L. SMITH, and J. R. SWALLOW, Groups of order 16 as Galois groups, Exposition. Math. 13 (1995), 289–319. MR 96h:12005
[12] Y. HASEGAWA, K.-I. HASHIMOTO, and F. MOMOSE, Modularity conjecture for Q-curves and QM-curves, Internat. J. Math. 10 (1999), 1011–1036. MR CMP 1 739 367
[13] H. HIDA, Modular Galois representations of "Neben" type, preprint, 2000, http://www.math.ucla.edu/~hida/
[14] J. QUER, Q-curves and abelian varieties of GL2-type, Proc. London Math. Soc. (3) 81 (2000), 285–317. MR CMP 1 770 611
[15] M. RAYNAUD, Schémas en groupes de type (p, . . . , p), Bull. Soc. Math. France 102 (1974), 241–280. MR 54:7488
[16] K. A. RIBET, Galois action on division points of Abelian varieties with real multiplications, Amer. J. Math. 98 (1976), 751–804. MR 56:15660
[17] K. A. RIBET, "Abelian varieties over Q and modular forms" in Algebra and Topology (Taejŏn, Korea, 1992), Korea Adv. Inst. Sci. Tech., Taejŏn, 1992, 53–79. MR 94g:11042
[18] B. ROBERTS and L. WASHINGTON, The modularity of some Q-curves, Compositio Math. 111 (1998), 35–49. MR 99a:11072
[19] J.-P. SERRE, "Modular forms of weight one and Galois representations" in Algebraic Number Fields: L-Functions and Galois Properties (Durham, England, 1975), Academic Press, London, 1977, 193–268. MR 56:8497
[20] G. SHIMURA, Introduction to the Arithmetic Theory of Automorphic Functions, Kanô Memorial Lectures 1, Iwanami Shoten, Tokyo; Publ. Math. Soc. Japan 11, Princeton Univ. Press, Princeton, 1971. MR 47:3318
[21] G. SHIMURA, On elliptic curves with complex multiplication as factors of the Jacobians of modular function fields, Nagoya Math. J. 43 (1971), 199–208. MR 45:5111
[22] C. M. SKINNER and A. J. WILES, Residually reducible representations and modular forms, Inst. Hautes Études Sci. Publ. Math. 89 (1999), 5–126. MR CMP 1 793 414
[23] C. M. SKINNER and A. J. WILES, Nearly ordinary deformations of irreducible residual representations, to appear in Ann. Fac. Sci. Toulouse Math. (6).
[24] R. TAYLOR and A. WILES, Ring-theoretic properties of certain Hecke algebras, Ann. of Math. (2) 141 (1995), 553–572. MR 96d:11072
[25] A. WILES, Modular elliptic curves and Fermat's last theorem, Ann. of Math. (2) 141 (1995), 443–551. MR 96d:11071
Ellenberg Mathematics Department, Princeton University, Princeton, New Jersey 08544-1000, USA;
[email protected] Skinner Department of Mathematics, University of Michigan, Ann Arbor, Michigan 48109-1109, USA;
[email protected]
DUKE MATHEMATICAL JOURNAL c 2001 Vol. 109, No. 1,
FOURIER TRANSFORM FOR D-ALGEBRAS, I A. POLISHCHUK AND M. ROTHSTEIN
Abstract
An analogue of the Fourier transform is developed for D-algebras. G. Laumon's equivalence between the derived category of $\mathcal{D}$-modules on an abelian variety and the derived category of $\mathcal{O}$-modules on the universal extension of the dual variety is seen as a degenerate case of a duality for twisted differential operators (tdo's) with respect to which the dual of a generic tdo is again a tdo.

Received 13 July 1999. Revision received 14 August 2000. 2000 Mathematics Subject Classification. Primary 14K05; Secondary 35Q53, 35A27. Authors' work supported in part by National Science Foundation grants number DMS-9700458 and DMS-9626522.

1. Introduction
This is part one of a pair of papers devoted to an analogue of the Fourier transform for a certain class of noncommutative ringed spaces. The model example that initiated this study is the equivalence between derived categories of $\mathcal{D}$-modules on an abelian variety and $\mathcal{O}$-modules on the universal extension of the dual abelian variety by a vector space (see [L], [R]). The natural framework for a generalization of this equivalence is provided by the language of D-algebras and D-schemes developed by A. Beilinson and J. Bernstein in [BB]. We consider a subclass of D-schemes we call special D-schemes. We show that whenever one has an equivalence of categories of $\mathcal{O}$-modules on two varieties $X$ and $Y$, given by $\mathcal{O}_{X \times Y}$-modules $\mathcal{P}$ and $\mathcal{Q}$, it gives rise to a correspondence between special D-schemes on $X$ and $Y$ such that the corresponding derived categories of modules are equivalent.

When $X$ is an abelian variety and $Y$ is the dual abelian variety, according to S. Mukai [M] the derived categories of $\mathcal{O}$-modules on $X$ and $Y$ are equivalent. So our construction gives in particular the Fourier transform between modules over rings of twisted differential operators (tdo for short; see Section 5 for a definition) with nondegenerate first Chern class on $X$ and $Y$. This picture restores some of the symmetry between $X$ and $Y$ and also gives a perspective on the asymmetry in Laumon's result. The idea, as noted in Lemma 5.2, is that extensions
$$0 \to \Delta_{*}\mathcal{O}_X \to \mathcal{E} \to \Delta_{*}\mathcal{O}_X \to 0$$
of the structure sheaf of the diagonal decompose as a sum of extensions of two kinds: those for which the $\mathcal{O}_X$-bimodule $\mathcal{E}$ is simply an $\mathcal{O}_X$-module and the extension data is global, and those for which the extension is trivial both as a left $\mathcal{O}_X$-module and as a right $\mathcal{O}_X$-module but the two module structures are intertwined by a global vector field. The observation that motivated this paper is that the Fourier-Mukai transform interchanges these two kinds of extensions. Now the sheaf of untwisted differential operators corresponds to the latter type of extension, which explains why it corresponds under the Fourier-Mukai transform to a sheaf of commutative algebras.∗ A generic tdo involves both types of extensions and therefore corresponds to another tdo.

For much of the paper we do not assume that we are working with abelian varieties since the notion of a special D-algebra is meaningful for any variety. However, as far as we know, abelian varieties provide the only nontrivial examples.

Here is a sketch of the basic construction. Given three schemes $X$, $Y$, and $Z$ and objects $\mathcal{F}$ and $\mathcal{G}$ in the derived category of $\mathcal{O}$-modules on $X \times Y$ and $Y \times Z$, respectively, we define their circle product in the derived category of $\mathcal{O}$-modules on $X \times Z$ by the formula
$$\mathcal{F} \circ_Y \mathcal{G} = R\pi_{XZ*}\bigl(\pi_{XY}^{*}(\mathcal{F}) \otimes^{L}_{\mathcal{O}} \pi_{YZ}^{*}(\mathcal{G})\bigr).$$
It is an easy application of the projection and base-change formulas that this operation is associative for objects in the quasi-coherent derived category, that is, complexes with quasi-coherent cohomology. For $\mathcal{P}$ and $\mathcal{Q}$ to give rise to an equivalence, it is sufficient that they satisfy
$$\mathcal{P} \circ_Y \mathcal{Q} \simeq \Delta_{*}\mathcal{O}_X, \qquad \mathcal{Q} \circ_X \mathcal{P} \simeq \Delta_{*}\mathcal{O}_Y,$$
where $\Delta_{*}\mathcal{O}_Y$ denotes the structure sheaf of the diagonal.

We develop the analogue of this picture when $X$, $Y$, and $Z$ are replaced by special D-schemes $(X, \mathcal{A})$, $(Y, \mathcal{B})$, and $(Z, \mathcal{C})$. Starting with $\mathcal{P}$, $\mathcal{Q}$ and a special D-algebra $\mathcal{A}$ on $X$, we endow $\mathcal{Q} \circ \mathcal{A} \circ \mathcal{P}$ with the structure of a special D-algebra on $Y$. It is then shown that $\mathcal{Q} \circ \mathcal{A}$ may be endowed with the structure of a module over $\mathcal{A}^{op} \boxtimes (\mathcal{Q} \circ \mathcal{A} \circ \mathcal{P})$ and that it gives an equivalence of derived categories of $\mathcal{A}$-modules and $\mathcal{Q} \circ \mathcal{A} \circ \mathcal{P}$-modules.

In part two of the series we will treat the microlocal version of the Fourier transform, using the theory of NC-schemes developed by M. Kapranov [K], and give applications to higher dimensional analogues of Krichever theory (see [Kr1], [Kr2]).

∗ More precisely, this explains why the Fourier transform of the differential operators is a sheaf of algebras in which $\mathcal{O}$ is central. The commutativity comes from the fact that a connection defines a $\mathcal{D}$-module if and only if its curvature vanishes (see example 6.6).

Notation. For ease of reading we work with schemes over $\mathrm{Spec}(\mathbb{C})$, though the results
may be generalized to an arbitrary base. We assume our schemes to be separated and of finite type. For sheaves of $\mathbb{C}$-vector spaces, tensor product is indicated simply by juxtaposition, or by $\otimes$ if necessary. The projection from a product $X_1 \times X_2 \times X_3 \cdots$ to a sequence of its factors is denoted $\pi^{X_1 X_2 \ldots}_{i,j,\ldots}$ or $\pi_{i,j,\ldots}$.

2. D-algebras and D-schemes
The notion of D-algebra, or differential algebra, is treated in [BB]. Here we review the basic definitions. Given a ring homomorphism $R \to A$, with $R$ commutative, the differential part of $A$ is the direct limit of the $R$-submodules $A_i$ defined recursively by
$$A_0 = \{\, a \in A \mid [a, R] = 0 \,\}, \qquad A_{i+1} = \{\, a \in A \mid [a, R] \subset A_i \,\}.$$
We call $A$ a D-algebra over $R$ if $A$ equals its differential part. The same construction works for any $R$-bimodule, so one has the notion of a differential $R$-bimodule. If $M$ is an $R \otimes R$-module, it is a differential $R$-bimodule if and only if the sheaf $\tilde M$ on $\mathrm{Spec}(R \otimes R)$ is supported on the diagonal.

Let $A$ be a D-algebra over $R$, and let there be given an étale morphism of commutative rings $R \to S$. There is a natural way to endow the left $S$-module $S \otimes_R A$ with the structure of a D-algebra over $S$. The multiplication law is defined recursively using the unique lifting of derivations. Thus to a D-algebra $A$ over $R$ one associates a sheaf of D-algebras $\tilde A_{ét}$ on $\mathrm{Spec}(R)_{ét}$. The Zariski version is denoted simply by $\tilde A$.

Given a scheme $X$, a sheaf $\mathcal{A}$ of D-algebras over $\mathcal{O}_X$ is quasi-coherent if it is constructed as above on affine subsets. Thus $\mathcal{A}$ is quasi-coherent if it is quasi-coherent either as a left $\mathcal{O}_X$-module or as a right $\mathcal{O}_X$-module. Equivalently, $\mathcal{A}$ is quasi-coherent if and only if it is quasi-coherent as a sheaf on $X \times X$ supported on the diagonal.

A D-scheme is a pair $(X, \mathcal{A})$, where $X$ is a scheme and $\mathcal{A}$ is a quasi-coherent sheaf of D-algebras over $\mathcal{O}_X$. Given a pair of D-schemes $(X, \mathcal{A})$ and $(Y, \mathcal{B})$, the sheaf
$$\bigl(\pi_1^{-1}(\mathcal{A})\,\pi_2^{-1}(\mathcal{B})\bigr) \otimes_{\pi_1^{-1}(\mathcal{O})\,\pi_2^{-1}(\mathcal{O})} \mathcal{O}_{X \times Y}$$
has a natural structure of a D-algebra on $X \times Y$. We denote this D-scheme by $(X \times Y, \mathcal{A} \boxtimes \mathcal{B})$. There is a convenient characterization of $\mathcal{A} \boxtimes \mathcal{B}$-modules.

PROPOSITION 2.1
Let $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ be D-schemes, and let $\mathcal{F}$ be a sheaf of $\mathcal{O}_{X \times Y}$-modules. The following data are equivalent:
(1) an $\mathcal{A} \boxtimes \mathcal{B}$-module structure on $\mathcal{F}$,
(2) a $\pi_1^{-1}(\mathcal{A})\,\pi_2^{-1}(\mathcal{B})$-module structure on $\mathcal{F}$ such that the two $\pi_1^{-1}(\mathcal{O})\,\pi_2^{-1}(\mathcal{O})$-module structures given by the homomorphisms $\pi_1^{-1}(\mathcal{O})\,\pi_2^{-1}(\mathcal{O}) \to \pi_1^{-1}(\mathcal{A})\,\pi_2^{-1}(\mathcal{B})$ and $\pi_1^{-1}(\mathcal{O})\,\pi_2^{-1}(\mathcal{O}) \to \mathcal{O}_{X \times Y}$ coincide.
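Proof
Clearly an $\mathcal{A} \boxtimes \mathcal{B}$-module structure on $\mathcal{F}$ gives the latter type of structure. Conversely, let $\mathcal{F}$ be endowed with $\pi_1^{-1}(\mathcal{O})\,\pi_2^{-1}(\mathcal{O})$-compatible $\pi_1^{-1}(\mathcal{A})\,\pi_2^{-1}(\mathcal{B})$- and $\mathcal{O}_{X \times Y}$-module structures. Then if $U \times V \subset X \times Y$ is the product of affine open subsets, there is an action of $\Gamma(U \times V, \mathcal{A} \boxtimes \mathcal{B})$ on $\Gamma(U \times V, \mathcal{F})$, and there is a unique way to localize this action on basic open subsets $(U \times V)_f$. Then we can patch.

Given sheaves of modules $\mathcal{F}$ and $\mathcal{G}$ on $(X, \mathcal{A})$ and $(Y, \mathcal{B})$, respectively, the sheaf $\pi_1^{*}(\mathcal{F}) \otimes_{\mathcal{O}_{X \times Y}} \pi_2^{*}(\mathcal{G})$ satisfies the hypotheses of Proposition 2.1 and is therefore an $\mathcal{A} \boxtimes \mathcal{B}$-module, which we denote by $\mathcal{F} \boxtimes \mathcal{G}$.

3. Circle product for schemes
For a scheme $X$, denote by $D^{-}(X)$ the bounded-above derived category of $\mathcal{O}_X$-modules, and denote by $D^{-}_{qc}(X)$ the subcategory with quasi-coherent cohomology sheaves. Let $X$, $Y$, and $Z$ be schemes, and let $\mathcal{F}$ and $\mathcal{G}$ be objects in $D^{-}(X \times Y)$ and $D^{-}(Y \times Z)$, respectively. Define
$$\mathcal{F} \circ_Y \mathcal{G} = R\pi_{13*}\bigl(\pi_{12}^{*}(\mathcal{F}) \otimes^{L}_{\mathcal{O}} \pi_{23}^{*}(\mathcal{G})\bigr).$$
The basic result is the following proposition.

PROPOSITION 3.1
Let $W$, $X$, $Y$, and $Z$ be schemes, and let
$$\mathcal{F} \in D^{-}_{qc}(W \times X), \qquad \mathcal{G} \in D^{-}_{qc}(X \times Y), \qquad \mathcal{H} \in D^{-}_{qc}(Y \times Z).$$
(1) There is a natural isomorphism $(\mathcal{F} \circ_X \mathcal{G}) \circ_Y \mathcal{H} \simeq \mathcal{F} \circ_X (\mathcal{G} \circ_Y \mathcal{H})$.
(2) Let $\Delta_{*}\mathcal{O}_X$ denote the structure sheaf of the diagonal in $X \times X$. There are natural isomorphisms $\mathcal{F} \circ_X \Delta_{*}\mathcal{O}_X \simeq \mathcal{F}$ and $\Delta_{*}\mathcal{O}_X \circ_X \mathcal{G} \simeq \mathcal{G}$.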
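The first isomorphism in part (2) can be checked directly. The following is only a sketch for $\mathcal{F}$ on $W \times X$; the map $\delta = 1_W \times \Delta : W \times X \to W \times X \times X$ is notation introduced here, not taken from the text. The steps use flat base change, the projection formula, and the fact that $\delta$ is a closed immersion because the schemes are assumed separated.

```latex
% Sketch: the diagonal is a right unit for the circle product.
% delta(w, x) = (w, x, x), so pi_12 . delta = id and pi_13 . delta = id.
\begin{align*}
  \pi_{23}^{*}(\Delta_{*}\mathcal{O}_X)
    &\simeq \delta_{*}\mathcal{O}_{W \times X}
      && \text{(flat base change along } \pi_{23}\text{)} \\
  \pi_{12}^{*}(\mathcal{F}) \otimes^{L}_{\mathcal{O}} \delta_{*}\mathcal{O}_{W \times X}
    &\simeq \delta_{*}\bigl(L\delta^{*}\pi_{12}^{*}\mathcal{F}\bigr)
     \simeq \delta_{*}\mathcal{F}
      && \text{(projection formula, } \pi_{12} \circ \delta = \mathrm{id}\text{)} \\
  \mathcal{F} \circ_X \Delta_{*}\mathcal{O}_X
    &\simeq R\pi_{13*}\,\delta_{*}\mathcal{F} \simeq \mathcal{F}
      && (\pi_{13} \circ \delta = \mathrm{id})
\end{align*}
```

The second isomorphism follows in the same way with the roles of the first and last factors exchanged.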
Part (2) is an easy verification. Part (1) is essentially Mukai's proposition [M, Proposition 1.3, p. 154]. The proof uses the projection and base-change formulas. Mukai's statement of the proposition notwithstanding, quasicoherence is indeed essential for statement (1).

3.1. D-algebras and the circle product
Let us call a D-scheme $(X, \mathcal{A})$ flat if $\mathcal{A}$ is flat as a left and as a right $\mathcal{O}$-module. Then the circle product enters naturally into the discussion. When we regard $\mathcal{A}$ as a sheaf of quasi-coherent $\mathcal{O}_{X \times X}$-modules supported on the diagonal, it makes sense to consider $\mathcal{A} \circ \mathcal{A}$. In the flat case it is again a sheaf, for then
$$\mathcal{A} \circ \mathcal{A} = \pi_{13*}\bigl(\pi_{12}^{*}(\mathcal{A}) \otimes_{\mathcal{O}} \pi_{23}^{*}(\mathcal{A})\bigr).$$
Therefore the multiplication law for $\mathcal{A}$ is an associative morphism
$$\mathcal{A} \circ \mathcal{A} \to \mathcal{A}.$$
Proposition 2.1 then takes the following convenient form. For the applications we have in mind, it is best stated in terms of $\mathcal{A} \boxtimes \mathcal{B}^{op}$-modules, where $\mathcal{B}^{op}$ is the D-algebra opposite to $\mathcal{B}$.

PROPOSITION 3.2
Let $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ be flat D-schemes, and let $\mathcal{F}$ be a sheaf of $\mathcal{O}_{X \times Y}$-modules. Then to give $\mathcal{F}$ the structure of an $\mathcal{A} \boxtimes \mathcal{B}^{op}$-module is the same as to give morphisms
$$\mathcal{A} \circ_X \mathcal{F} \to \mathcal{F}, \qquad \mathcal{F} \circ_Y \mathcal{B} \to \mathcal{F},$$
making $\mathcal{F}$ a module with respect to $\circ$ over both $\mathcal{A}$ and $\mathcal{B}^{op}$ such that the following diagram commutes:
$$\begin{array}{ccc}
\mathcal{A} \circ (\mathcal{F} \circ \mathcal{B}) \simeq (\mathcal{A} \circ \mathcal{F}) \circ \mathcal{B} & \longrightarrow & \mathcal{F} \circ \mathcal{B} \\
\downarrow & & \downarrow \\
\mathcal{A} \circ \mathcal{F} & \longrightarrow & \mathcal{F}
\end{array} \tag{3.1.1}$$

4. Circle product for D-schemes
Our goal now is to extend the notion of circle product to D-schemes and to prove the analogue of Proposition 3.1. Along the way we establish a set of formulas in the spirit of the projection and base-change formulas.
4.1. Transform Definition 4.1 Let (X, A ) and (Y, B ) be D-schemes. Let F ∈ D− (X, A ), and let G ∈ D− (X × Y , A op B ). Define the transform of F by G , F G ∈ D− (Y, B ), by the formula L
XY F G = Rπ2∗ (F B ⊗A op B G ).
Note that XY F G = Rπ2∗ (π1X Y
−1
L
(F )⊗π X Y −1 (A op ) G ). 1
Consider the affine case. Given D-algebras A and B over commutative rings R and S, respectively, let F and G be modules over A and Aop B, respectively. If A is commutative, the sheaf F˜ B˜ ⊗ ˜ op ˜ G˜ is a quasi-coherent O -module on Spec(R S). In A B
particular, it has no higher cohomology. In the noncommutative case this sheaf has no O -module structure. However, there is the following result. LEMMA 4.2 Let the notation be as in the preceding paragraph, and let X = Spec(R) and Y = L
˜ B = B, ˜ F = F, ˜ and G = G. ˜ Assume F ⊗ Aop G ' F ⊗ Aop G Spec(S). Let A = A, − in D (B-Mod). Then for i > 0, XY R i π2∗ (F B ⊗A op B G ) = 0.
Moreover,
XY ^ F G ' π2∗ (F B ⊗A op B G ) ' F ⊗ Aop G.
Proof Consider first the case that G is Aop B itself. Then F B ⊗A op B G is simply F B , which is a quasi-coherent O X ×Y -module and therefore is acyclic for the functor Rπ2∗ . Its direct image onto Y is also quasi-coherent, and on the level of global sections we have XY 0(Y, π2∗ (F B ⊗A op B G )) = 0(X × Y , g F B)
= F B = F ⊗ Aop G. This proves the lemma when G = Aop B and, therefore, also when G is free. To prove the lemma in general, let G • →G→0 be a resolution of G by free Aop Bmodules. Then L F B ⊗A op B G ' F B ⊗A op B G˜ • .
If we can show that this complex has cohomology only in degree zero, we then have L
F B ⊗A op B G ' F B ⊗A op B G .
We also know that the terms of the complex F B ⊗A op B G˜ • are acyclic for the functor Rπ2∗ , from which we learn that L
RπY ∗ (F B ⊗A op B G ) ' πY ∗ (F B ⊗A op B G˜ • ). We would be finished if we then knew that the complex πY ∗ (F B ⊗A op B G˜ • ) had cohomology only in degree zero, and that its zeroth cohomology was F ^ ⊗ Aop G. But since each G i is a free module, the terms of the complex F B ⊗A op B G˜ • are quasi-coherent O X ×Y -modules, with the morphisms being differential operators. So everything that remains to be proved can be checked on the level of global sections. We have L
0(X × Y , F B ⊗A op B G˜ • ) = F B ⊗ Aop B G • ' F ⊗ Aop G ' F ⊗ Aop G.
PROPOSITION 4.3 op Let G ∈ D− qc (X × Y, A B ).
− Then the functor (·)G takes D− qc (X, A ) to Dqc (Y, B ).
Proof By [H1, Chap. I, Prop. 7.3, p. 73], it suffices to prove that F G ∈ D− qc (Y, B ) when F and G are quasi-coherent sheaves. Then one has the following observation, useful here and elsewhere in the paper. For F a quasi-coherent A -module, there are quasiisomorphisms of complexes bounded above F →c1 (F ) ← c2 (F )
such that the terms of the complex c2 (F ) are direct sums of sheaves of the form j∗ (L ), where j : U →X is an open embedding from an affine U and where L is a flat ˇ quasi-coherent A |U -module.∗ For c1 (F ), take the Cech resolution of F with respect to a cover of X by finitely many affine open subsets. Now, given the embedding j : U →X , there is a canonical flat resolution of j∗ j ∗ (F ). Namely, resolve j ∗ (F ) by free A |U -modules using the functor that sends every module to the free module generated by its elements, then apply j∗ . Do this for every term of c1 (F ) to get a double complex whose associated simple complex is c2 (F ). Now use Lemma 4.2 to deduce that F G is quasi-isomorphic to the complex πY ∗ (c2 (F )B ⊗A op B G ). Moreover, the terms of this complex are quasi-coherent, again by Lemma 4.2. ∗ We
thank P. Deligne for pointing this out to us.
4.2. Some natural isomorphisms The following proposition lists several natural morphisms analogous to the projection and base-change formulas. PROPOSITION 4.4 One has the following natural morphisms: (1) for F ∈ D− (X, A ), G ∈ D− (X × Y , A op B ), and H ∈ D− (Z , C ),
F G H →F G H ;
(2)
(4.2.1)
for F ∈ D− (X, A ), G ∈ D− (X × Y , A op B ), and H ∈ D− (Y × Z , B op C ), (F G )H →G F H ;
(3)
(4.2.2)
for F ∈ D− (X, A ), G ∈ D− (Y, B ), and H ∈ D− (X × Y × Z , A op B op C ), F (G
H
)
→(F G )H .
(4.2.3)
Moreover, in every case the morphism is an isomorphism if F , G , and H have quasicoherent cohomology sheaves. To illustrate, here is the morphism (4.2.1). We have F G H = BC ⊗π
1
−1 (B )π −1 (C ) 2
(π1Y Z
−1
(F G )π2Y Z
−1
(H )).
There is a natural transformation π2X Y Z
−1
XY XY Z Rπ2∗ (·)→π12
XY Z given by adjunction. Moreover, π12 XY Z π23
−1
(F G H )→
XY Z π23
−1
(BC ) ⊗π
2
= π1X Y Z ·
−1 (B )π −1 (C ) 3
−1
−1
−1
(·)
(·) is exact. So we get a morphism
−1
XY Z (π12
L
(F B ⊗A op B G )π3X Y Z
−1
(H ))
L
(F )⊗π
1
−1 (A op )
X Y Z −1 π23 (BC ) ⊗π −1 (B )π −1 (C ) 2 3
XY Z (π12
−1
(G )π3X Y Z
−1
(H )) .
On the other hand, F BC ⊗A op BC G H = π1X Y Z
−1
L
(F )⊗π
1
−1 (A op )
A op BC ⊗π
12
XY Z · (π12
−1
−1 (A op B )π −1 (C ) 3
(G )π3X Y Z
−1
(H )) .
The morphism $(\pi_{23}^{XYZ})^{-1}(\mathcal B\boxtimes\mathcal C)\to\mathcal A^{op}\boxtimes\mathcal B\boxtimes\mathcal C$ then gives us our morphism (4.2.1).
The proofs that the morphisms are isomorphisms in the quasi-coherent case are all done in the same way. One first uses the lemma on way-out functors (see [H1, Chap. I, Sect. 7, Prop. 7.1, p. 68]) to reduce to the case when the objects involved are quasi-coherent sheaves. Then one fixes finite affine open covers of all the schemes and uses the quasi-isomorphisms $(\cdot)\to c_2(\cdot)$ described in the proof of Proposition 4.3. This reduces everything to the case of complexes of flat sheaves on affine schemes, where one may invoke Lemma 4.2. Then the morphisms reduce to the canonical isomorphisms
$$(F\otimes_{A^{op}}G)\boxtimes H\simeq F\otimes_{A^{op}}(G\boxtimes H)\tag{4.2.4}$$
($F\in A$-mod, $G\in A^{op}\boxtimes B$-mod, $H\in C$-mod),
$$(F\otimes_{A^{op}}G)\otimes_{B^{op}}H\simeq G\otimes_{A\boxtimes B^{op}}(F\boxtimes H)\tag{4.2.5}$$
($F\in A$-mod, $G\in A^{op}\boxtimes B$-mod, $H\in B^{op}\boxtimes C$-mod),
$$F\otimes_{A^{op}}(G\otimes_{B^{op}}H)\simeq(F\boxtimes G)\otimes_{A^{op}\boxtimes B^{op}}H\tag{4.2.6}$$
($F\in A$-mod, $G\in B$-mod, $H\in A^{op}\boxtimes B^{op}\boxtimes C$-mod).
4.3. Circle product Given the D-scheme (Y, B ), if we view B as a sheaf on Y × Y supported on the diagonal, it has a natural BB op -module structure. Denote this BB op -module by δ B . Given F ∈ D− (X × Y , A B op ), G ∈ D− (Y × Z , BC op ), we have F G ∈ D− (X × Y × Y × Z , A B op BC op ).
Thus we can make the following definition. Definition 4.5 Let F ∈ D− (X × Y , A B op ) and G ∈ D− (Y × Z , BC op ). Define F ◦B G ∈ D− (X × Z , A C op )
by the formula
F ◦B G = δ B F G .
PROPOSITION 4.6 For F ∈ D− (X × Y, A B op ) there is a natural morphism
F →F ◦B δ B . op It is an isomorphism if F ∈ D− qc (X × Y, A B ).
Proof If we assume that F is a complex of flat sheaves and if we set H = A (δ B )B op ⊗A B op BB op F δ B ,
then we need a natural morphism XY Y Y π14
−1
(F )→H .
So it is enough to give such a morphism when F is an arbitrary sheaf of A B op modules. Since H is supported on the main diagonal in X × Y 3 , it suffices to define the morphism on open sets U 0 ⊂ X × Y 3 of the form U 0 = U × X U × X U , where U is an open subset of X × Y . On the level of additive groups, one has the obvious morphism 0(U 0 , π12 −1 (F ))→0(U 0 , H ) and the isomorphism 0(U 0 , π14 −1 (F ))→0(U 0 , π12 −1 (F )). It must be checked that the composite morphism respects the 0(U, A B op )-module structure. We leave this to the reader. (Take U to be a product open set.) The proof that the morphism is an isomorphism in the quasi-coherent case proceeds, as usual, by reducing to the case of a flat quasi-coherent sheaf on an affine variety and then reducing by Lemma 4.2 to the obvious identity B ⊗ B op B (F B) ' F. The same proof works for transforms. PROPOSITION 4.7 Let F ∈ D− qc (X, A ).
There is a natural isomorphism F ' F δA . op
4.4. Associativity Now let (W, A ), (X, B ), (Y, C ), and (Z , D ) be D-schemes, and let F ∈ D− (A B op ),
G ∈ D− (BC op ),
H ∈ D− (C D op ).
We have the following morphism by Proposition 4.4: (F ◦B G ) ◦C H = δ C ((δ B
F G )H
→ δC δB
)
→δ C δ B
F G H
.
FG H
(4.4.1)
A morphism F ◦B (G ◦C H )→ δ C δ B
F G H
(4.4.2)
is similarly defined. These are isomorphisms in the quasi-coherent case, so we have the following result. PROPOSITION 4.8 op − op − op For F ∈ D− qc (A B ), G ∈ Dqc (BC ), and H ∈ Dqc (C D ), there is a natural isomorphism (F ◦B G ) ◦C H ' F ◦B (G ◦C H ). − op One final remark; given F ∈ D− qc (A ) and G ∈ Dqc (A B ), one may regard (X, A ) as the product D-scheme (Spec(C) × X, OSpec(C) A ) and hence consider F ◦A op G or consider instead F G .
PROPOSITION 4.9 Given F ∈ D− qc (A )
op and G ∈ D− qc (A B ), there is a natural isomorphism
F G ' F ◦A op G .
Proof We have
F G ' (F δ A )G . op
Then by the isomorphism (4.2.2), (F δ A )G ' (δ A op )F G = F ◦A op G . op
5. Lie algebroids and twisted differential operators
Let $\mathcal T=\operatorname{Der}\mathcal O_X$ be the tangent sheaf of $X$. A Lie algebroid $L$ on $X$ is a (quasi-coherent) $\mathcal O_X$-module equipped with a morphism of $\mathcal O_X$-modules $\sigma:L\to\mathcal T$ and a $\mathbb C$-linear Lie bracket $[\cdot,\cdot]:L\otimes L\to L$ such that $\sigma$ is a homomorphism of Lie algebras and the following identity is satisfied:
$$[\ell_1,f\ell_2]=f\cdot[\ell_1,\ell_2]+\sigma(\ell_1)(f)\,\ell_2,$$
where $\ell_1,\ell_2\in L$, $f\in\mathcal O_X$ (see [Mc]). To every Lie algebroid $L$ one can associate a D-algebra $U(L)$ called the universal enveloping algebra of $L$. By definition, $U(L)$ is a sheaf of algebras equipped with the morphisms of sheaves $i:\mathcal O_X\to U(L)$, $i_L:L\to U(L)$ such that $U(L)$ is generated, as an algebra, by the images of these morphisms, and the only relations are
(i) $i$ is a morphism of algebras;
(ii) $i_L$ is a morphism of Lie algebras;
(iii) $i_L(f\ell)=i(f)\,i_L(\ell)$, $[i_L(\ell),i(f)]=i(\sigma(\ell)(f))$, where $f\in\mathcal O_X$, $\ell\in L$.
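A standard example may help fix the relations (i)-(iii). It is not taken from the text and assumes $X$ is the affine line $\operatorname{Spec}\mathbb C[x]$ with $L=\mathcal T$ and $\sigma=\mathrm{id}$; the symbol $\partial$ is our shorthand.

```latex
% Illustration (assumptions: X = A^1 = Spec C[x], L = T = O_X d/dx, sigma = id).
% Write \partial = i_L(d/dx) and identify f with i(f).
\[
  [\partial, f] \;=\; \sigma\!\left(\tfrac{d}{dx}\right)(f) \;=\; f' ,
  \qquad\text{so}\qquad
  U(\mathcal T)\;\simeq\;\mathbb C[x]\langle\partial\rangle/(\partial x - x\partial - 1),
\]
% the Weyl algebra, i.e. the ring of differential operators on the affine line.
```

Relation (ii) is vacuous here because $\mathcal T$ has rank one, and the first part of (iii) says that $f\,\frac{d}{dx}$ maps to $f\partial$.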
5.1
Let $L$ be a Lie algebroid on $X$. A central extension of $L$ by $\mathcal O_X$ is a Lie algebroid $\tilde L$ on $X$ equipped with an embedding of $\mathcal O_X$-modules $c:\mathcal O_X\hookrightarrow\tilde L$ such that $[c(1),\tilde\ell]=0$ for every $\tilde\ell\in\tilde L$ (in particular, $c(\mathcal O_X)$ is an ideal in $\tilde L$) and an isomorphism of Lie algebroids $\tilde L/c(\mathcal O_X)\simeq L$. For such a central extension we denote by $U^\circ(\tilde L)$ the quotient of $U(\tilde L)$ modulo the ideal generated by the central element $i(1)-i_{\tilde L}(c(1))$.

LEMMA 5.1
Let $L$ be a locally free $\mathcal O_X$-module of finite rank. Then there is a bijective correspondence between isomorphism classes of the following data:
(i) a structure of a Lie algebroid on $L$ and a central extension $\tilde L$ of $L$ by $\mathcal O_X$,
(ii) a D-algebra $\mathcal A$ equipped with an increasing algebra filtration $\mathcal O_X=\mathcal A_0\subset\mathcal A_1\subset\mathcal A_2\subset\cdots$ such that $\bigcup\mathcal A_n=\mathcal A$ and an isomorphism of the associated graded algebra $\operatorname{gr}\mathcal A$ with the symmetric algebra $S^\bullet L$.

Proof
The correspondence between (i) and (ii) is established as follows. Given a central extension $\tilde L$ as in (i), the corresponding D-algebra is $U^\circ(\tilde L)$ with its standard filtration. The isomorphism of the associated graded algebra with $S^\bullet L$ is provided by the Poincaré-Birkhoff-Witt (PBW) theorem for Lie algebroids (see [Mc]). Conversely, given a D-algebra $\mathcal A$ with filtration as in (ii), it gives rise to a central extension
$$0\to\mathcal A_0\to\mathcal A_1\to\mathcal A_1/\mathcal A_0\to 0,$$
where the Lie algebroid structure on $\mathcal A_1$ is induced by the algebra structure on $\mathcal A$. Since $\mathcal A_0\simeq\mathcal O_X$ and $\mathcal A_1/\mathcal A_0\simeq L$, this is an extension of $L$ by $\mathcal O_X$.

5.2
Assume that $X$ is smooth. Then one can take $L=\mathcal T$ with its natural Lie algebroid structure. The corresponding central extensions $\tilde{\mathcal T}$ of $\mathcal T$ by $\mathcal O$ are called Picard algebroids, and the associated D-algebras are called algebras of twisted differential operators or simply tdo's. If $\mathcal D$ is a tdo and $\mathcal D_{-1}=0\subset\mathcal D_0\subset\mathcal D_1\subset\mathcal D_2\subset\cdots$ is its maximal D-filtration, that is,
$$\mathcal D_i=\{d\in\mathcal D \mid \operatorname{ad}(f)d\in\mathcal D_{i-1},\ f\in\mathcal O_X\},$$
then $\operatorname{gr}\mathcal D\simeq S^\bullet\mathcal T$.
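The maximal D-filtration can be checked by hand in the simplest untwisted case. The computation below is our own illustration; it assumes $\mathcal D$ is the ring of ordinary differential operators on the affine line, with $\partial=d/dx$ and the convention $\operatorname{ad}(f)d=fd-df$.

```latex
% Illustration (assumption: D = differential operators on the affine line,
% ad(f)d = fd - df, and \partial f = f\partial + f' as operators).
\begin{align*}
\operatorname{ad}(f)\,g &= 0 && \Rightarrow\ g\in\mathcal D_0,\\
\operatorname{ad}(f)\,\partial &= f\partial-\partial f = -f' \in\mathcal D_0 && \Rightarrow\ \partial\in\mathcal D_1,\\
\operatorname{ad}(f)\,\partial^{2} &= -2f'\partial - f'' \in\mathcal D_1 && \Rightarrow\ \partial^{2}\in\mathcal D_2 .
\end{align*}
```

Each application of $\operatorname{ad}(f)$ lowers the order by one, so $\mathcal D_i$ consists of operators of order at most $i$, and the class of $\partial^i$ in $\mathcal D_i/\mathcal D_{i-1}$ corresponds to the $i$-th symmetric power of $d/dx$.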
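LEMMA 5.2
For a locally free $\mathcal O_X$-module of finite rank $E$, one has a canonical isomorphism
$$\operatorname{Ext}^1_{\mathcal O_{X\times X}}(\Delta_*E,\Delta_*\mathcal O_X)\simeq\operatorname{Hom}_{\mathcal O_X}(E,\mathcal T)\oplus\operatorname{Ext}^1_{\mathcal O_X}(E,\mathcal O_X),$$
where $\Delta:X\to X\times X$ is the diagonal embedding.

Proof
Since $\Delta_*E\simeq p_1^*E\otimes_{\mathcal O_{X\times X}}(\mathcal O_{X\times X}/J)$, where $J$ is the ideal sheaf of the diagonal, we have an exact sequence
$$0\to\operatorname{Hom}(p_1^*E\otimes_{\mathcal O_{X\times X}}J,\Delta_*\mathcal O_X)\to\operatorname{Ext}^1(\Delta_*E,\Delta_*\mathcal O_X)\to\operatorname{Ext}^1(p_1^*E,\Delta_*\mathcal O_X).$$
Note that the first and last terms are isomorphic to $\operatorname{Hom}(E,\mathcal T)$ and $\operatorname{Ext}^1(E,\mathcal O_X)$, respectively. It remains to note that there is a canonical splitting $\Delta_*:\operatorname{Ext}^1(E,\mathcal O_X)\to\operatorname{Ext}^1(\Delta_*E,\Delta_*\mathcal O_X)$.

Note that the projection $\operatorname{Ext}^1(\Delta_*E,\Delta_*\mathcal O_X)\to\operatorname{Hom}(E,\mathcal T)$ can be described as follows. Given an extension
$$0\to\Delta_*\mathcal O_X\to\tilde E\to\Delta_*E\to 0,$$
the action of $J/J^2$ on $\tilde E$ induces the morphism $J/J^2\otimes\tilde E\to\Delta_*\mathcal O_X$, which factors through $J/J^2\otimes\Delta_*E$ since $J$ annihilates $\Delta_*\mathcal O_X$. Hence we get a morphism $\Delta_*E\to\Delta_*\mathcal T$. Now if $\mathcal A$ is a D-algebra, equipped with a filtration $\mathcal A_\bullet$ such that $\operatorname{gr}\mathcal A\simeq S^\bullet(E)$, then we consider the corresponding extension of $\mathcal O_X$-bimodules
$$0\to\mathcal O_X=\mathcal A_0\to\mathcal A_1\to E=\mathcal A_1/\mathcal A_0\to 0$$
as an element in $\operatorname{Ext}^1_{\mathcal O_{X\times X}}(\Delta_*E,\Delta_*\mathcal O_X)$. By definition, $\mathcal A$ is a tdo if the projection of this element to $\operatorname{Hom}_{\mathcal O_X}(E,\mathcal T)$ is a map $E\to\mathcal T$ that is an isomorphism.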
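For orientation, here is the simplest instance of Lemma 5.2, written out under the assumption (ours, for illustration) that $X$ is an abelian variety of dimension $g$ and $E=\mathcal O_X$:

```latex
% Illustration (assumption: X an abelian variety of dimension g, E = O_X).
% The tangent sheaf is trivial of rank g and h^1(X, O_X) = g.
\[
\operatorname{Ext}^1_{\mathcal O_{X\times X}}(\Delta_*\mathcal O_X,\Delta_*\mathcal O_X)
\;\simeq\; H^0(X,\mathcal T)\oplus H^1(X,\mathcal O_X),
\qquad \dim = g+g = 2g .
\]
```

The first summand records how far the left and right $\mathcal O_X$-actions on an extension differ, and the second records its class as an extension of $\mathcal O_X$-modules.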
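6. Equivalences of categories of modules over D-algebras
6.1
Let $\mathcal P$ and $\mathcal Q$ be objects in $D^-_{qc}(X\times Y)$ such that
$$\mathcal P\circ_Y\mathcal Q\simeq\Delta_*\mathcal O_X,\qquad \mathcal Q\circ_X\mathcal P\simeq\Delta_*\mathcal O_Y.\tag{6.1.1}$$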
By Proposition 3.1, the transforms by $\mathcal P$ and $\mathcal Q$ give equivalences of the derived categories $D^-_{qc}(X)$ and $D^-_{qc}(Y)$. There are many examples of such equivalences. However, we are mainly interested in the case when $X$ and $Y$ are dual abelian varieties, where we may take $\mathcal P$ to be the normalized Poincaré line bundle on $X\times Y$ and $\mathcal Q=\mathcal P^{-1}\omega_X^{-1}[-g]$, where $g=\dim X$. In this case we can extend the equivalence to the derived categories of modules over a large class of D-algebras on $X$ and $Y$. The idea is to study the functor $\mathcal Q\circ_X(\cdot)\circ_X\mathcal P$ from $D^-_{qc}(X\times X)$ to $D^-_{qc}(Y\times Y)$ for sheaves supported on the diagonal.
6.2. Special sheaves
Let us call a quasi-coherent sheaf $K$ on $X\times X$ special if there is an exhaustive filtration of $K$ by quasi-coherent sheaves $0=K_{-1}\subset K_0\subset K_1\subset\cdots$ such that for all $i$, $K_i/K_{i-1}\simeq\Delta_*(F_i)$ for some globally free $\mathcal O_X$-module $F_i$. Denote by $\mathcal S_X$ the exact category of special sheaves on $X\times X$. We have the following easy proposition.

PROPOSITION 6.1
(1) For every $K\in\mathcal S_X$, the functor $M\mapsto K\circ_X M$ is exact from $\mathcal O_{X\times Z}$-modules to $\mathcal O_{X\times Z}$-modules.
(2) For every pair of special sheaves $K,K'\in\mathcal S_X$, $K\circ_X K'$ is special.

From the fact that $\mathcal Q\circ_X(\Delta_*\mathcal O_X)\circ_X\mathcal P\simeq\Delta_*\mathcal O_Y$, we obtain the following proposition.

PROPOSITION 6.2
The functor $\Phi:K\mapsto\mathcal Q\circ_X K\circ_X\mathcal P$ defines an equivalence of categories $\Phi:\mathcal S_X\to\mathcal S_Y$, the inverse being $K\mapsto\mathcal P\circ_Y K\circ_Y\mathcal Q$.

PROPOSITION 6.3
For $K,K'\in\mathcal S_X$, $M\in D^-(X)$, one has a canonical isomorphism of $\mathcal O_Y$-bimodules,
$$\Phi(K\circ_X K')\simeq\Phi K\circ_Y\Phi K',$$
and a canonical isomorphism in $D^-(Y)$,
$$\Phi(K\circ_X M)\simeq\Phi K\circ_Y\Phi M,$$
where $\Phi M=\mathcal Q\circ_X M$.

Definition 6.4
A D-scheme $(X,\mathcal A)$ is special if $\delta_{\mathcal A}$ is special when regarded as an $\mathcal O_{X\times X}$-module.
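As a concrete instance of Definition 6.4, consider the sheaf of ordinary differential operators on an abelian variety. The display below is our illustration; it assumes $X$ is an abelian variety, writes $\mathfrak g$ for its tangent space at zero, and takes $\mathcal D_i$ to be the operators of order at most $i$.

```latex
% Illustration (assumptions: X an abelian variety, \mathfrak g its tangent space at zero,
% D_i = differential operators of order <= i, D_{-1} = 0).
% Left and right multiplication by functions agree on D_i / D_{i-1},
% so each quotient is supported on the diagonal:
\[
  \mathcal D_i/\mathcal D_{i-1}\;\simeq\;\Delta_*\bigl(S^{i}\mathcal T\bigr)
  \;\simeq\;\Delta_*\bigl(S^{i}\mathfrak g\otimes_{\mathbb C}\mathcal O_X\bigr).
\]
```

Each quotient is the pushforward of a globally free $\mathcal O_X$-module, so the order filtration exhibits $\delta_{\mathcal D}$ as a special sheaf.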
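The main class of examples of special D-schemes is provided by tdo's over abelian varieties. It follows from Propositions 6.2 and 6.3 that for any special D-scheme $(X,\mathcal A)$ there exists a canonical D-algebra $\Phi\mathcal A$ on $Y$ such that $\delta_{\Phi\mathcal A}\simeq\Phi(\delta_{\mathcal A})$. Indeed, every special D-algebra is flat, so the structural morphism of $\mathcal A$ is a morphism $\delta_{\mathcal A}\circ_X\delta_{\mathcal A}\to\delta_{\mathcal A}$. One just has to apply $\Phi$ to this morphism and also to the morphism $\Delta_*\mathcal O_X\to\delta_{\mathcal A}$. Then $(Y,\Phi\mathcal A)$ is again a special D-scheme. Furthermore, we now prove that the derived categories of modules over $\mathcal A$ and $\Phi\mathcal A$ are equivalent.

THEOREM 6.5
Assume that the objects $\mathcal P$ and $\mathcal Q$ in equations (6.1.1) are quasi-coherent sheaves up to a shift (i.e., they have only one cohomology). Then for every special D-algebra $\mathcal A$ on $X$ there is an exact equivalence $\Phi:D^-_{qc}(\mathcal A)\to D^-_{qc}(\Phi\mathcal A)$ such that the following diagram of functors is commutative:
$$\begin{array}{ccc}
D^-_{qc}(\mathcal A) & \xrightarrow{\ \Phi\ } & D^-_{qc}(\Phi\mathcal A)\\
\downarrow & & \downarrow\\
D^-_{qc}(\mathcal A') & \xrightarrow{\ \Phi\ } & D^-_{qc}(\Phi\mathcal A')
\end{array}$$
for every homomorphism of special D-algebras $\mathcal A'\to\mathcal A$ (resp., $\mathcal A\to\mathcal A'$), where the vertical arrows are the restriction (resp., induction) functors.

Proof
Set $\mathcal B=\Phi\mathcal A$. Set
$$\mathcal G'=\mathcal P\circ_Y\delta_{\mathcal B}\in D^-_{qc}(\mathcal O_X\boxtimes\mathcal B^{op}),\qquad \mathcal H'=\delta_{\mathcal B}\circ_Y\mathcal Q\in D^-_{qc}(\mathcal B\boxtimes\mathcal O_X).$$
Note that these objects are concentrated in one cohomological degree. We claim, moreover, that $\mathcal G'$ and $\mathcal H'$ come from objects $\mathcal G$ and $\mathcal H$ in $D^-_{qc}(\mathcal A\boxtimes\mathcal B^{op})$ and $D^-_{qc}(\mathcal B\boxtimes\mathcal A^{op})$, respectively, via the forgetful functor. By Proposition 3.2, it suffices to endow $\mathcal G'$ with a left action of $\mathcal A$ with respect to $\circ$, commuting with the right $\mathcal B$-action. But
$$\mathcal G'=\mathcal P\circ(\mathcal Q\circ\delta_{\mathcal A}\circ\mathcal P)=\delta_{\mathcal A}\circ_X\mathcal P,$$
which exhibits the desired structure. If we apply the forgetful functor to $\mathcal G\circ_{\mathcal B}\mathcal H$, we get $\mathcal G'\circ_{\mathcal B}\mathcal H'\in D^-_{qc}(X\times X)$. Moreover,
$$\mathcal G'\circ_{\mathcal B}\mathcal H'=(\mathcal P\circ_Y\delta_{\mathcal B})\circ_{\mathcal B}(\delta_{\mathcal B}\circ_Y\mathcal Q)=\mathcal P\circ_Y\delta_{\mathcal B}\circ_Y\mathcal Q=\delta_{\mathcal A}$$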
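as an $\mathcal O_{X\times X}$-module. We have to check that we have the correct $\mathcal A\boxtimes\mathcal A^{op}$-module structure on $\mathcal G\circ_{\mathcal B}\mathcal H$, but this can be done one side at a time. That is, we have
$$\mathcal G'\circ_{\mathcal B}\mathcal H\in D^-_{qc}(\mathcal O_X\boxtimes\mathcal A^{op})\qquad\text{and}\qquad \mathcal G\circ_{\mathcal B}\mathcal H'\in D^-_{qc}(\mathcal A\boxtimes\mathcal O_X),$$
and it is easy to check that we get the correct $\pi^{-1}(\mathcal A^{op})$- and $\pi^{-1}(\mathcal A)$-module structures. We now have $\mathcal G\circ_{\mathcal B}\mathcal H\simeq\delta_{\mathcal A}$ and $\mathcal H\circ_{\mathcal A}\mathcal G\simeq\delta_{\mathcal B}$, so the transforms by $\mathcal G$ and $\mathcal H$ are equivalences of categories.

Now let $\mathcal A_1\to\mathcal A_2$ be a homomorphism of special D-algebras on $X$. Let $\mathcal G_i=\mathcal P\circ_{\mathcal O_Y}\delta_{\mathcal A_i}$ for $i=1,2$. Then we have an isomorphism of $\Phi\mathcal A_1\boxtimes(\mathcal A_2)^{op}$-modules
$$\mathcal G_2\simeq\mathcal G_1\circ_{\mathcal A_1}\mathcal A_2.$$
This immediately implies that the equivalences $\Phi$ for $\mathcal A_1$ and $\mathcal A_2$ commute with restriction. Indeed, for an $\mathcal A_2$-module $M$ we have an isomorphism of $\Phi\mathcal A_1$-modules,
$$\mathcal G_2\circ_{\mathcal A_2}M\simeq(\mathcal G_1\circ_{\mathcal A_1}\mathcal A_2)\circ_{\mathcal A_2}M\simeq\mathcal G_1\circ_{\mathcal A_1}M.$$
The compatibility of $\Phi$ with induction is checked similarly, using the isomorphism
$$\mathcal G_2\simeq\Phi\mathcal A_2\circ_{\Phi\mathcal A_1}\mathcal G_1.$$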
Example 6.6 (Connections; see [R])
For any smooth variety $X$ there is a D-algebra $\mathcal C_X$ such that, for any $\mathcal O_X$-module $\mathcal F$, endowing $\mathcal F$ with a (not-necessarily-integrable) connection is equivalent to endowing $\mathcal F$ with the structure of a $\mathcal C_X$-module. If we denote by $\mathcal T$ the tangent sheaf of $X$, then there is a map of left $\mathcal O_X$-modules $\alpha:\mathcal T\to\mathcal C_X$ such that, for $f\in\mathcal O$ and $\xi\in\mathcal T$, $\xi(f)=\alpha(\xi)f-f\alpha(\xi)$, and $\mathcal C_X$ is universal among $\mathcal O_X$-algebras with this property. The graded algebra associated to the maximal D-filtration of $\mathcal C_X$ is the tensor algebra $T(\mathcal T)$. In particular, if $X$ is an abelian variety, $(X,\mathcal C_X)$ is a special D-scheme. To compute its Fourier transform, it suffices to transform the extension of $\mathcal O_{X\times X}$-modules
$$0\to\mathcal O_X\to\mathcal O_X\oplus\mathcal T\to\mathcal T\to 0,\tag{6.2.1}$$
where the middle term is an $\mathcal O_X$-bimodule by the formula
$$f(g,\xi)h=(fgh+f\xi(h),\ fh\xi).$$
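That the bimodule formula in Example 6.6 really defines a right action can be verified directly. The check below is our own computation; it isolates the right action $(g,\xi)\cdot h=(gh+\xi(h),\,h\xi)$ obtained by setting $f=1$.

```latex
% Check (our computation): the right action (g,\xi)\cdot h = (gh+\xi(h),\,h\xi) is associative.
\begin{align*}
\bigl((g,\xi)\cdot h_1\bigr)\cdot h_2
 &= \bigl(gh_1+\xi(h_1),\,h_1\xi\bigr)\cdot h_2
  = \bigl(gh_1h_2+\xi(h_1)h_2+h_1\xi(h_2),\ h_1h_2\xi\bigr),\\
(g,\xi)\cdot(h_1h_2)
 &= \bigl(gh_1h_2+\xi(h_1h_2),\ h_1h_2\xi\bigr).
\end{align*}
% The two agree by the Leibniz rule \xi(h_1h_2)=\xi(h_1)h_2+h_1\xi(h_2).
```

The left and right actions differ by the term $(\xi(h),0)$, which is what makes the extension nontrivial as a bimodule even though it splits as a left module.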
Set $\mathfrak g=H^0(X,\mathcal T)=H^1(Y,\mathcal O)$, where $Y$ is the dual abelian variety. By Lemma 5.2,
$$\operatorname{Ext}^1_{\mathcal O_{X\times X}}(\Delta_*\mathcal O_X,\Delta_*\mathcal O_X)=\mathfrak g\oplus\hat{\mathfrak g},$$
where $\hat{\mathfrak g}=H^0(Y,\mathcal T)=H^1(X,\mathcal O)$. With this identification the extension class of (6.2.1) is the identity element in $\mathfrak g^*\otimes\mathfrak g$. Therefore the transform of extension (6.2.1) is the universal vector bundle extension on $Y$,
$$0\to\mathcal O_Y\to\mathcal E\to\mathfrak g\otimes\mathcal O_Y\to 0.\tag{6.2.2}$$
We find, therefore, that $\Phi(\mathcal C_X)$ is the universal D-algebra associated to the extension (6.2.2); that is, $\Phi(\mathcal C_X)=T(\mathcal E)/(1_{\mathcal E}-1)$, the tensor algebra of $\mathcal E$ modulo the identification of 1's. Note that for an $\mathcal O_Y$-module $\mathcal F$ to be endowed with a module structure over $\Phi(\mathcal C_X)$ is the same as to give a splitting of the sequence
$$0\to\mathcal F\to\mathcal E\otimes_{\mathcal O}\mathcal F\to\mathfrak g\otimes_{\mathbb C}\mathcal F\to 0.$$
Thus the derived category of sheaves on $Y$ equipped with such a splitting is equivalent via the Fourier-Mukai transform to the derived category of sheaves on $X$ equipped with a connection. This is the form of the equivalence given in [R].

Example 6.7 (Integrable connections)
These are $\mathcal D$-modules. Now $\mathcal D_X$ is the quotient of $\mathcal C_X$ by the relations $[\alpha(\xi),\alpha(\zeta)]=\alpha([\xi,\zeta])$ for vector fields $\xi$ and $\zeta$. Upon Fourier transform this translates to the condition that our $\Phi(\mathcal C_X)$-module is in fact a module over the commutative D-algebra $\mathcal A_Y=\operatorname{Sym}(\mathcal E)/(1_{\mathcal E}-1)$. Now $\operatorname{Spec}_Y(\mathcal A_Y)$ is the universal additive group extension $Y^\natural\to Y$. This recovers Laumon's correspondence between the derived category of $\mathcal D$-modules on $X$ and the derived category of $\mathcal O$-modules on $Y^\natural$. Below we show that this equivalence is a degenerate case of a more symmetric picture involving the categories of modules over tdo's on both $X$ and $Y$.

Remark 6.8
Assuming that $X$ is an abelian variety, we can generalize the notion of a special D-algebra on $X$ as follows. Instead of considering special sheaves on $X\times X$ one can consider quasi-coherent sheaves on $X\times X$ admitting filtration with quotients of the form $(\mathrm{id},t_x)_*L$, where $(\mathrm{id},t_x):X\to X\times X$ is the graph of the translation by some point $x\in X$ and $L$ is a line bundle algebraically equivalent to zero on $X$. Let us call such sheaves quasi-special. It is easy to see that quasi-special sheaves are flat over $X$
with respect to both projections $p_1$ and $p_2$, so the operation $\circ$ is exact on them. We can define a quasi-special algebra as a quasi-special sheaf $K$ on $X\times X$ together with the associative multiplication $K\circ K\to K$ admitting a unit $\Delta_*\mathcal O_X\to K$. Then there is a Fourier duality for quasi-special algebras and equivalence of the corresponding derived categories. The proof of Theorem 6.5 works literally in this situation. Note that modules over quasi-special algebras form a much broader class of categories than those over special D-algebras. Among these categories we can find some categories of modules over 1-motives, and our Fourier duality coincides with the one defined by G. Laumon in [L]. For example, a homomorphism $\phi:\mathbb Z\to X$ defines a quasi-special algebra on $X$ that is a sum of structural sheaves of graphs of translations by $\phi(n)$, $n\in\mathbb Z$. The corresponding category of modules is the category of $\mathbb Z$-equivariant $\mathcal O_X$-modules. The Fourier dual algebra corresponds to the affine group over $Y$ that is an extension of $Y$ by the multiplicative group.
7. Transforms of Lie algebroids and twisted differential operators
PROPOSITION 7.1
Let $L$ be a Lie algebroid on $Y$ such that $L\simeq\mathcal O_Y^d$ as an $\mathcal O_Y$-module. Then for any central extension $\tilde L$ of $L$ by $\mathcal O_Y$, the D-algebra $U^\circ(\tilde L)$ is special. Furthermore, one has $\Phi U^\circ(\tilde L)\simeq U^\circ(\tilde L')$ for some central extension $\tilde L'$ of a Lie algebroid $L'$ on $X$ such that $L'\simeq\mathcal O_X^d$ as an $\mathcal O_X$-module.

Proof
This follows from Lemma 5.1. One just has to notice that if a D-algebra $\mathcal A$ on $Y$ has an algebra filtration $\mathcal A_\bullet$ with $\operatorname{gr}\mathcal A_\bullet\simeq S^\bullet(\mathcal O_Y^d)$, then $\Phi\mathcal A$ has an algebra filtration $\Phi\mathcal A_\bullet$ with $\operatorname{gr}\Phi\mathcal A_\bullet\simeq S^\bullet(\mathcal O_X^d)$.

Note that if $L$ is a successive extension of trivial bundles, then the D-algebra $U^\circ(\tilde L)$ is still special, but $\Phi U^\circ(\tilde L)$ is not necessarily of the form $U^\circ(\tilde L')$.

7.1
From now on, we assume that $X$ is an abelian variety; $Y$ is the dual abelian variety. As before, denote by $\mathfrak g$ (resp., $\hat{\mathfrak g}$) the tangent space to $X$ (resp., $Y$) at zero. Let $\tilde{\mathcal T}$ be a Picard algebroid on $X$, and let $\mathcal D=U^\circ(\tilde{\mathcal T})$ be the corresponding tdo. Then $\tilde{\mathcal T}/\mathcal O_X\simeq\mathcal T_X\simeq\mathfrak g\otimes_{\mathbb C}\mathcal O_X$ is a trivial $\mathcal O_X$-module. Hence $\mathcal D$ is a special D-algebra. By Proposition 7.1, $\Phi\mathcal D\simeq U^\circ(\tilde L')$ for some Lie algebroid $L'$ on $Y$ and its central extension $\tilde L'$ by $\mathcal O_Y$. It is then natural to ask whether $\Phi\mathcal D$ is a tdo.
THEOREM 7.2
Let $\mathcal D$ be a tdo on $X$, and let $\tilde{\mathcal T}$ be the corresponding Picard algebroid. Then $\Phi\mathcal D$ is a tdo on $Y$ if and only if the map $\mathfrak g\to H^1(X,\mathcal O)$, induced by the extension of $\mathcal O_X$-modules
$$0\to\mathcal O_X\to\tilde{\mathcal T}\to\mathfrak g\otimes_{\mathbb C}\mathcal O_X\to 0,$$
is an isomorphism.

Proof
Let $\mathcal D_\bullet$ be the canonical filtration of $\mathcal D$. Then $\Phi\mathcal D$ is a tdo if and only if the class of the extension of $\mathcal O_Y$-bimodules
$$0\to\mathcal O_Y\simeq\Phi\mathcal D_0\to\Phi\mathcal D_1\to\Phi(\mathcal D_1/\mathcal D_0)\simeq\mathfrak g\otimes_{\mathbb C}\mathcal O_Y\to 0$$
induces an isomorphism $\hat{\mathfrak g}\otimes_{\mathbb C}\mathcal O_Y\to\mathcal T_Y$. Thus it is sufficient to check that the components of the canonical decomposition
$$\operatorname{Ext}^1_{\mathcal O_{X\times X}}(\Delta_*\mathcal O_X,\Delta_*\mathcal O_X)\simeq H^0(X,\mathcal T_X)\oplus H^1(X,\mathcal O_X),$$
introduced in Lemma 5.2, get interchanged by the Fourier-Mukai transform if we take into account the natural isomorphisms $H^0(X,\mathcal T)\simeq\mathfrak g\simeq H^1(Y,\mathcal O)$, $H^1(X,\mathcal O)\simeq\hat{\mathfrak g}\simeq H^0(Y,\mathcal T)$. We leave this to the reader as a pleasant exercise on Fourier-Mukai transform.

7.2
Let us describe in more detail the data consisting of a Lie algebroid $L$ on an abelian variety $X$ such that $L\simeq V\otimes_{\mathbb C}\mathcal O_X$ as an $\mathcal O_X$-module (where $V$ is a finite-dimensional $k$-vector space) and a central extension $\tilde L$ of $L$ by $\mathcal O_X$. First of all, $V=H^0(X,L)$ has a structure of a Lie algebra, and the structural morphism $L\to\mathcal T$ is given by some $k$-linear map $\beta:V\to\mathfrak g=H^0(X,\mathcal T)$ which is a homomorphism of Lie algebras (where $\mathfrak g$ is an abelian Lie algebra). The central extension $\tilde L$ is described (up to an isomorphism) by a class $\tilde\alpha$ in the first hypercohomology space $\mathbb H^1(X,L^*\to\wedge^2L^*\to\wedge^3L^*\to\cdots)$ of the truncated Koszul complex of $L$. In particular, we have the corresponding class $\alpha\in H^1(X,L^*)$, which is just the class of the extension of $\mathcal O_X$-modules
$$0\to\mathcal O_X\to\tilde L\to L\to 0.$$
We can consider $\alpha$ as a linear map $V\to H^1(X,\mathcal O_X)=\hat{\mathfrak g}$. The maps $\alpha$ and $\beta$ get interchanged by the Fourier transform, up to a sign.
By definition, the D-algebra associated with $\tilde L$ is a tdo if and only if $\beta:V\to\mathfrak g$ is an isomorphism. If in addition $\alpha:V\to\hat{\mathfrak g}$ is an isomorphism, then the dual D-algebra is also a tdo. Thus we have a bijection between tdo's with nondegenerate first Chern class on $X$ and $Y$ such that the corresponding derived categories of modules are equivalent. According to [BB], isomorphism classes of tdo's on $X$ are classified by $\mathbb H^2(X,\Omega^{\geq 1})$, which is an extension of $H^1(X,\Omega^1)\simeq\operatorname{Hom}(\mathfrak g,\hat{\mathfrak g})$ by $H^0(X,\Omega^2)=\wedge^2\mathfrak g^*$. Let $U_X\subset\mathbb H^2(X,\Omega^{\geq 1})$ be the subset of elements with nondegenerate projection to $H^1(X,\Omega^1)$. The duality gives an isomorphism between $U_X$ and $U_Y$. It is easy to see that under this isomorphism the operation of multiplication by $\lambda\in\mathbb C^*$ on $U_X$ corresponds to multiplication by $\lambda^{-1}$ on $U_Y$.

On the other hand, let $\mathcal A$ be a tdo with trivial $c_1$. In other words, $\mathcal A$ corresponds to some global 2-form $\omega$ on $X$. Modules over $\mathcal A$ are $\mathcal O$-modules equipped with a connection having curvature $\omega$. Let $\mathcal B$ be the dual D-algebra on $Y$, and let $\tilde L\to L=H^0(X,\mathcal T_X)\otimes\mathcal O_Y$ be the corresponding central extension of Lie algebroids. We claim that $L$ is just an $\mathcal O_Y$-linear commutative Lie algebra while the central extension $\tilde L$ is given by the class $(e,\omega)\in H^1(L^*)\oplus H^0(\wedge^2L^*)$, where $e$ is the canonical element in $H^1(L^*)\simeq H^1(Y,\mathcal O)\otimes H^1(Y,\mathcal O)^*$. Indeed, as an $\mathcal O_Y$-module, $\tilde L$ is a universal extension of $H^1(Y,\mathcal O)\otimes\mathcal O$ by $\mathcal O$. Hence the Lie bracket defines a morphism of $\mathcal O$-modules $\wedge^2L\to\tilde L$. Since $H^0(\tilde L)=H^0(\mathcal O)$, it follows that $[\tilde L,\tilde L]\subset\mathcal O\subset\tilde L$. It is easy to see that the Lie bracket is just given by $\omega:\wedge^2L\to\mathcal O$.

7.3. Action on the Neron-Severi group
Recall that the Neron-Severi group of $X$ is identified with $\operatorname{Hom}^{sym}(X,Y)\otimes\mathbb Q$, where $\operatorname{Hom}^{sym}(X,Y)$ is the group of symmetric homomorphisms $X\to Y$. Namely, to a line bundle $L$ there corresponds a symmetric homomorphism $\phi_L:X\to Y$ sending a point $x$ to $t_x^*L\otimes L^{-1}$, where $t_x:X\to X$ is the translation by $x$.

One has the natural $\mathbb Q$-linear homomorphism $c_1:NS(X)\to\mathbb H^2(X,\Omega^{\geq 1})$ sending a line bundle $L$ to the class of the ring $\mathcal D_L$ of differential operators on $L$. For $\mu\in NS(X)$ we denote by $\mathcal D_\mu$ the corresponding tdo. For a vector bundle $E$ we set $c_1(E)=c_1(\det E)$.

PROPOSITION 7.3
If $\mu\in NS(X)$ is a nondegenerate class, then the tdo Fourier dual to $\mathcal D_\mu$ is
$$\Phi(\mathcal D_\mu)=\mathcal D_{-\mu^{-1}}.\tag{7.3.1}$$
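To make (7.3.1) concrete, here is a small worked case. It rests on an assumption the text does not spell out: that $\mu^{-1}\in NS(Y)\otimes\mathbb Q$ denotes the class whose symmetric homomorphism is the inverse of $\phi_\mu$ in $\operatorname{Hom}(Y,X)\otimes\mathbb Q$. The class $\mu_0$ and the scalar $\lambda$ are illustrative names.

```latex
% Assumption: \mu = \lambda\mu_0 with \lambda \in Q^* and \mu_0 nondegenerate,
% and \mu^{-1} corresponds to \phi_\mu^{-1} in Hom(Y,X) \otimes Q.
\[
  \phi_{\lambda\mu_0}=\lambda\,\phi_{\mu_0}
  \ \Longrightarrow\
  (\lambda\mu_0)^{-1}=\lambda^{-1}\mu_0^{-1},
  \qquad
  \Phi(\mathcal D_{\lambda\mu_0})=\mathcal D_{-\lambda^{-1}\mu_0^{-1}} .
\]
```

Scaling the class on $X$ by $\lambda$ therefore scales the dual class by $\lambda^{-1}$, in line with the behaviour of the duality between $U_X$ and $U_Y$ under multiplication by $\lambda$ described in Section 7.2.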
Proof
It suffices to check this when $\mu$ is a class of a line bundle $L$, in which case it follows easily from the isomorphism
$$\phi_L^*\det\Phi(L)\simeq L^{-\operatorname{rk}\Phi(L)}$$
and the fact that the dual tdo to $\mathcal D_L$ acts on $\Phi(L)$.

8. Projective connections
Let $E$ be a coherent sheaf that is a module over some tdo on $X$ (then $E$ is automatically locally free). Following [BB], we say in this case that there is an integrable projective connection on $E$.

PROPOSITION 8.1
Let $E$ be a vector bundle on $X$ equipped with an integrable projective connection. Assume that $\det E$ is a nondegenerate line bundle. Then $H^i\Phi(E)$ are vector bundles with canonical integrable projective connections, and the following equality holds:
$$\phi_{\det E}^*\,c_1(\Phi(E))=-\chi(X,E)\cdot\operatorname{rk}E\cdot c_1(E).$$

Proof
The first statement follows immediately from the fact that $\Phi(E)$ is quasi-isomorphic to a complex of modules over the tdo on $Y$ dual to $\mathcal D_{(\det E)^{1/r}}$, where $r=\operatorname{rk}E$. On the other hand, this tdo acting on $\Phi(E)$ is isomorphic to $\mathcal D_{(\det\Phi(E))^{1/r'}}$, where $r'=\operatorname{rk}\Phi(E)=\chi(X,E)$. Considering classes of these dual tdo's and using the isomorphism (7.3.1) applied to $\mu=(1/r)\phi_{\det E}$, we get the above formula.

8.1
The following two natural questions arise:
(1) For every $\mu\in NS(X)$, does there exist a vector bundle $E$ on $X$ that is a module over $\mathcal D_\mu$?
(2) Which vector bundles on an abelian variety admit integrable projective connections?
To answer these questions we use the following construction. Let $\pi:X_1\to X_2$ be an isogeny of abelian varieties, and let $E$ be a vector bundle with an integrable projective connection on $X_1$. Then there is a canonical integrable projective connection on $\pi_*E$. Indeed, the simplest way to see this is to use Fourier duality. If $E$ is a module over some tdo $\mathcal D_\lambda$ on $X_1$, then $\Phi(E)$ is a module over the dual D-algebra $\Phi(\mathcal D_\lambda)$ on $Y_1$. Now we use the formula $\pi_*E\simeq\Phi^{-1}\hat\pi^*(\Phi(E))$, where $\Phi^{-1}$ is the inverse Fourier transform on $X_2$; hence $\pi_*E$ is a module over $\Phi^{-1}\hat\pi^*\Phi(\mathcal D_\lambda)$ which is a tdo on $X_2$. In particular, the pushforwards of line bundles under isogenies have canonical integrable projective connections. Also, it is clear that if $E$ is a vector bundle with an integrable projective connection and $F$ is a flat vector bundle, then $E\otimes F$ has a natural integrable projective connection.
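As a consistency check of the formula in Proposition 8.1 (our own remark, under the assumption that $E=L$ is a nondegenerate line bundle, so that $\operatorname{rk}E=1$), one can compare it with the isomorphism used in the proof of Proposition 7.3:

```latex
% Consistency check (assumption: E = L a nondegenerate line bundle, so rk E = 1).
\begin{align*}
\text{Proposition 8.1:}\quad
 & \phi_L^{*}\,c_1(\Phi(L)) = -\chi(X,L)\,c_1(L),\\
\text{proof of Proposition 7.3:}\quad
 & \phi_L^{*}\det\Phi(L)\simeq L^{-\operatorname{rk}\Phi(L)}
 \ \Longrightarrow\ \phi_L^{*}\,c_1(\Phi(L)) = -\operatorname{rk}\Phi(L)\,c_1(L).
\end{align*}
```

The two expressions agree because $\operatorname{rk}\Phi(L)=\chi(X,L)$, which is the identity $r'=\operatorname{rk}\Phi(E)=\chi(X,E)$ used in the proof of Proposition 8.1.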
Now we can answer the above questions.

THEOREM 8.2
For every $\mu\in NS(X)$ there exists a vector bundle $E$ that is a module over $\mathcal D_\mu$.

Proof
We can write $\mu=[L]/n$, where $n>0$ is an integer and $[L]$ is a class of a line bundle $L$ on $X$. Let $[n]_A:A\to A$ be an endomorphism of multiplication by $n$. Then $[n]_A^*(\mu)\in NS(X)$ is represented by a line bundle $L'$. Now we claim that the pushforward $[n]_{A,*}L'$ has the structure of a module over $\mathcal D_\mu$. Indeed, it suffices to check that
$$c_1([n]_{A,*}L')/\deg([n]_A)=\mu.$$
Let $\operatorname{Nm}_n:NS(X)\to NS(X)$ be the norm homomorphism corresponding to the isogeny $[n]_A$. Then the left-hand side of the above equality is $\operatorname{Nm}_n([L'])/\deg([n]_A)$. Hence the pullback of the left-hand side by $[n]_A$ is equal to $[L']=[n]_A^*(\mu)$, which implies our claim.

THEOREM 8.3
Let $E$ be an indecomposable vector bundle with an integrable projective connection on an abelian variety $X$. Then there exists an isogeny of abelian varieties $\pi:X'\to X$, a line bundle $L$ on $X'$, and a flat bundle $F$ on $X$, such that
$$E\simeq\pi_*L\otimes F.$$

Proof
The main idea is to analyze the sheaf of algebras $\mathcal A=\operatorname{End}(E)$. Namely, $\mathcal A$ has a flat connection such that the multiplication is covariantly constant. In other words, it corresponds to a representation of the fundamental group $\pi_1(X)$ in automorphisms of the matrix algebra. Since all such automorphisms are inner, we get a homomorphism $\rho:\pi_1(X)\to PGL(E_0)$, where $E_0$ is a fiber of $E$ at zero. Now the central extension $SL(E_0)\to PGL(E_0)$ induces a central extension of $\pi_1(X)=\mathbb Z^{2g}$ by the group of roots of unity of order $\operatorname{rk}E$. This central extension splits on some subgroup of finite index $H\subset\pi_1(X)$. In other words, the restriction of $\rho$ to $H$ lifts to a homomorphism $\rho_H:H\to GL(E_0)$. Let $\pi:\tilde X\to X$ be an isogeny corresponding to $H$, so that $\tilde X$ is an abelian variety with $\pi_1(\tilde X)=H$. Then $\rho_H$ defines a flat bundle $\tilde F$ on $\tilde X$ such that $\pi^*\mathcal A\simeq\operatorname{End}(\tilde F)$ as algebras with connections. It follows that $\pi^*E\simeq L\otimes\tilde F$ for some line bundle $L$ on $\tilde X$. Thus $E$ is a direct summand of $\pi_*(L\otimes\tilde F)$. Note that there exists a flat
FOURIER TRANSFORM FOR D-ALGEBRAS, I
145
e ' π ∗ F. (Again, the simplest way to see this is to use the bundle F on X such that F Fourier duality.) Hence E is a direct summand of π∗ L ⊗ F. It remains to check that all indecomposable summands of the latter bundle have the same form. This follows from the following lemma. 8.4 Let π : X 1 →X 2 be an isogeny of abelian varieties, let L be a line bundle on X 1 , and let F be an indecomposable flat bundle on X 1 . Assume that π∗ (L ⊗ F) is decomposable. Then there exists a nontrivial factorization of π into a composition LEMMA
π0
X 1 → X 10 →X 2 such that L ' (π 0 )∗ L 0 for some line bundle L 0 on X 10 . Proof By adjunction and projection formula we have End(π∗ (L ⊗ F)) ' Hom(π ∗ π∗ (L ⊗ F), L ⊗ F) ' ⊕x∈K Hom(tx∗ L ⊗ F, L ⊗ F), where K ⊂ X 1 is the kernel of π. If tx∗ L ' L for some x ∈ K , x 6= 0, then L descends to a line bundle on the quotient of X 1 by the subgroup generated by x. Otherwise we get End(π∗ (L ⊗ F)) ' End(F); hence π∗ (L ⊗ F) is indecomposable. References [BB]
Polishchuk
Department of Mathematics, Harvard University, Cambridge, Massachusetts 02138, USA; current: Department of Mathematics and Statistics, Boston University, Boston, Massachusetts 02215, USA
Rothstein
Department of Mathematics, University of Georgia, Athens, Georgia 30602, USA
DUKE MATHEMATICAL JOURNAL c 2001 Vol. 109, No. 1,
LOW-LYING ZEROS OF L-FUNCTIONS AND RANDOM MATRIX THEORY
MICHAEL RUBINSTEIN
Abstract
By looking at the average behavior (n-level density) of the low-lying zeros of certain families of L-functions, we find evidence, as predicted by function field analogs, in favor of a spectral interpretation of the nontrivial zeros in terms of the classical compact groups.
1. Introduction
In this paper, a connection is made between the low-lying zeros of L-functions and the eigenvalues of large matrices from the classical compact groups. The Langlands program (see [2], [10], [7]) predicts that all L-functions can be written as products of $\zeta(s)$ and L-functions attached to automorphic cuspidal representations of $GL_M$ over $\mathbb{Q}$. Such an L-function is given initially (for $\Re s$ sufficiently large) as an Euler product of the form
\[ L(s,\pi) = \prod_p L(s,\pi_p) = \prod_p \prod_{j=1}^{M} \bigl(1 - \alpha_\pi(p,j)\,p^{-s}\bigr)^{-1}. \tag{1.1} \]
Basic properties of such L-functions are described in [15]. The L-functions that arise in the $M = 1$ case are the Riemann zeta-function $\zeta(s)$ and Dirichlet L-functions $L(s,\chi)$, $\chi$ a primitive character. For $M = 2$, the L-functions in question are associated to cusp forms or Maass forms of congruence subgroups of $SL_2(\mathbb{Z})$. The Riemann hypothesis (RH) for $L(s,\pi)$ asserts that the nontrivial zeros of $L(s,\pi)$, $\{1/2 + i\gamma_\pi\}$, all have $\gamma_\pi \in \mathbb{R}$. (Our L-functions are always normalized so that the critical line is through $\Re s = 1/2$.) A vague suggestion of G. Pólya and D. Hilbert points to an approach that one might take in establishing RH. They hypothesized (for $\zeta(s)$) that one might be able to associate the nontrivial zeros of $\zeta$ to the eigenvalues of some operator acting on some Hilbert space, thus (depending on the properties of the operator) forcing the zeros to lie on a line.
Received 11 February 2000. Revision received 25 September 2000. 2000 Mathematics Subject Classification. Primary 11M26; Secondary 15A52.
The first evidence in favor of this approach was obtained by H. Montgomery [9], who derived (under certain restrictions) the pair correlation of the zeros of $\zeta(s)$. Together with an observation of Freeman Dyson, who pointed out that the Gaussian Unitary Ensemble (GUE), consisting of $N \times N$ random Hermitian matrices (see [8] for a more precise definition), has the same pair correlation (as $N \to \infty$), it seems to suggest that the relevant operator, at least for $\zeta(s)$, might be Hermitian. Extensive computations of A. Odlyzko [11], [12] further seem to bolster the Hermitian nature of the zeros of $\zeta(s)$, as might the work of Z. Rudnick and P. Sarnak [15], where, under certain restrictions, the n-level correlations of $\zeta(s)$ and $L(s,\pi)$ are found to be the same as those of the GUE.
However, recent developments suggest that, rather than being Hermitian, the relevant operators for L-functions belong to the classical compact groups. (This is consistent with the above work of Montgomery, Odlyzko, and Rudnick and Sarnak since all the classical compact groups have the same n-level correlations as the GUE (as $N \to \infty$).) First, analogs with function field zeta-functions, where there is a spectral interpretation of the zeros in terms of Frobenius on cohomology, point towards the classical compact groups (see [6]). Second, even though all the mentioned families of matrices have the same n-level correlations, there is another statistic, called n-level density, which is sensitive to the particular family. By looking at this statistic for zeros of L-functions, one finds the fingerprints of the classical compact groups. For $n = 1$ this was done, for quadratic twists of $\zeta(s)$, and with certain restrictions, by A. Özlük and C. Snyder [13]. A stronger result (which takes into account certain nondiagonal contributions and allows one to choose test functions whose Fourier transform is supported in $(-2/M, 2/M)$) which applies for $\zeta(s)$ as well as all $L(s,\pi)$ was obtained by N. Katz and Sarnak [6]. The general case, $n \ge 1$, is worked out (again, with some restrictions) in this paper.
2. n-level density
For an $(N \times N)$-matrix $A$ in one of the classical compact groups, write its eigenvalues as $\lambda_j = e^{i\theta_j}$, with
\[ 0 \le \theta_1 \le \cdots \le \theta_N < 2\pi. \tag{2.1} \]
Assume that $f : \mathbb{R}^n \to \mathbb{R}$ is bounded, Borel measurable, and compactly supported. Then, letting
\[ H^{(n)}(A, f) = \sum_{\substack{1 \le j_1,\dots,j_n \le N \\ \text{distinct}}} f\bigl(\theta_{j_1} N/(2\pi), \dots, \theta_{j_n} N/(2\pi)\bigr), \]
Katz and Sarnak [5, Appendix] obtain the following family dependent result:
\[ \lim_{N\to\infty} \int_{G(N)} H^{(n)}(A, f)\, dA = \int_{\mathbb{R}^n_{\ge 0}} W_G^{(n)}(x)\, f(x)\, dx \tag{2.2} \]
for the following families:
\[
\begin{array}{ll}
G & W_G^{(n)} \\ \hline
U(N),\ U_\kappa(N) & \det\bigl(K_0(x_j,x_k)\bigr)_{1\le j,k\le n} \\
USp(N) & \det\bigl(K_{-1}(x_j,x_k)\bigr)_{1\le j,k\le n} \\
SO(2N),\ O^-(2N+1) & \det\bigl(K_1(x_j,x_k)\bigr)_{1\le j,k\le n} \\
SO(2N+1),\ O^-(2N) & \det\bigl(K_{-1}(x_j,x_k)\bigr)_{1\le j,k\le n} + \sum_{\nu=1}^{n} \delta(x_\nu) \det\bigl(K_{-1}(x_j,x_k)\bigr)_{1\le j\ne\nu\le n,\ 1\le k\ne\nu\le n}
\end{array}
\]
with
\[ K_\varepsilon(x,y) = \frac{\sin(\pi(x-y))}{\pi(x-y)} + \varepsilon\,\frac{\sin(\pi(x+y))}{\pi(x+y)}. \]
In the above, $dA$ is the Haar measure on $G(N)$ (normalized so that $\int_{G(N)} dA = 1$), and
\[ U_\kappa(N) = \{A \in U(N) : \det{}^\kappa(A) = 1\}, \quad SO(N) = \{A \in O(N) : \det A = 1\}, \quad O^-(N) = \{A \in O(N) : \det A = -1\}. \]
The delta functions in the $SO(2N+1)$, $O^-(2N)$ case are accounted for by the eigenvalue $\lambda_1 = 1$. (Notice, for $O(N)$, that $\lambda = 1$ is an eigenvalue if $N$ is even and $\det A = -1$ (i.e., $A \in O^-(2N)$) or if $N$ is odd and $\det A = 1$ (i.e., $A \in SO(2N+1)$).) Removing this zero from (2.2) would yield the same $W_G^{(n)}$ as for $USp$. For ease of notation, we refer to the third $W_G^{(n)}$ above (i.e., $\det(K_1(x_j,x_k))$) as the scaling density of $O^+$ and the fourth $W_G^{(n)}$ as the scaling density of $O^-$. (We use this notation because the former comes from orthogonal matrices with even functional equations $p(z) = z^N p(1/z)$, while the latter comes from orthogonal matrices with odd functional equations $p(z) = -z^N p(1/z)$.)
One could also form a similar statistic for the eigenvalues of the GUE (where we would normalize the eigenvalues according to the Wigner semicircle law), and one would obtain the same answer (as $N \to \infty$) as for $U(N)$.
The function $W_G^{(n)}(x)$ is called the n-level scaling density of the group $G(N)$, and its nonuniversality can be used to detect which group lies behind which family of L-functions. Notice that the normalization by $N/(2\pi)$ is such that the mean spacing is 1 and that only the low-lying eigenvalues (those with $\theta \le c/N$ for some constant $c$) contribute to $H^{(n)}(A, f)$. So, (2.2) measures how the low-lying eigenvalues of matrices in $G(N)$ fall near the point 1 (as $N \to \infty$).
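To make the determinantal densities above concrete, the following is a minimal numerical sketch, not part of the paper. It evaluates $K_\varepsilon$ and the determinant $\det(K_\varepsilon(x_j,x_k))$ by the Leibniz expansion, which is practical only for small $n$. The function names `sinc_pi`, `K`, `perm_sign`, and `scaling_density` are illustrative choices, and the delta-function term of the $SO(2N+1)$, $O^-(2N)$ row is deliberately left out.

```python
import math
from itertools import permutations

def sinc_pi(t):
    """sin(pi t)/(pi t), with the removable singularity at t = 0 filled in."""
    if abs(t) < 1e-12:
        return 1.0
    return math.sin(math.pi * t) / (math.pi * t)

def K(eps, x, y):
    """Kernel K_eps(x, y) = S(x - y) + eps * S(x + y), S(t) = sin(pi t)/(pi t)."""
    return sinc_pi(x - y) + eps * sinc_pi(x + y)

def perm_sign(p):
    """Sign of a permutation given as a tuple of images of 0..n-1."""
    sign, seen = 1, [False] * len(p)
    for i in range(len(p)):
        if not seen[i]:
            length, j = 0, i
            while not seen[j]:
                seen[j] = True
                j = p[j]
                length += 1
            if length % 2 == 0:  # a cycle of even length is an odd permutation
                sign = -sign
    return sign

def scaling_density(eps, xs):
    """Determinantal part det(K_eps(x_j, x_k)) via the Leibniz expansion."""
    n = len(xs)
    total = 0.0
    for p in permutations(range(n)):
        term = float(perm_sign(p))
        for j in range(n):
            term *= K(eps, xs[j], xs[p[j]])
        total += term
    return total
```

For $n = 1$ and $\varepsilon = -1$ this reduces to $1 - \sin(2\pi x)/(2\pi x)$, which vanishes at $x = 0$, while $\varepsilon = 1$ gives the value 2 there; this difference near the origin is what lets the statistic tell the families apart.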
3. Results
In this section, we consider the analog of (2.2) for the zeros of families of L-functions. One looks at the average behavior of the low-lying nontrivial zeros (i.e., those close to the real axis) of a family of L-functions hoping to find evidence (as predicted by function field analogs (see [6])) in favor of a spectral interpretation in terms of the classical compact groups.
Indeed, if we take quadratic twists of $\zeta(s)$, $\{L(s,\chi_d)\}$, as our family of L-functions, where $\chi_d(n) = \left(\frac{d}{n}\right)$ is Kronecker's symbol and we restrict ourselves to primitive $\chi_d$, we find evidence of a $USp(\infty)$ symmetry. This is Theorem 3.1.
More generally, we take a self-contragredient automorphic cuspidal representation of $GL_M$ over $\mathbb{Q}$, $\pi = \tilde{\pi}$, that is, one whose L-function has real coefficients, $\alpha_\pi(p,j) \in \mathbb{R}$, and we look at the family of quadratic twists, $\{L(s, \pi \otimes \chi_d)\}$. The low-lying zeros of this family behave as if they are coming either from $USp(\infty)$ or from $O^\pm(\infty)$. (Here the $\pm$ is to indicate that we need to consider separately the $L(s, \pi \otimes \chi_d)$'s with even (resp., odd) functional equations.) We describe this result in Theorem 3.2. It confirms the connection to the classical compact groups, and it gives an answer that cannot be confused with the corresponding statistic for the GUE.
Numerical experiments that further support the connection to classical compact groups are described in the author's thesis [14] and in Katz and Sarnak [6].
3.1. Main theorem
Write the nontrivial zeros of $L(s,\chi_d)$ as
\[ 1/2 + i\gamma_d^{(j)}, \qquad j = \pm 1, \pm 2, \dots, \]
where
\[ 0 \le \Re\gamma_d^{(1)} \le \Re\gamma_d^{(2)} \le \Re\gamma_d^{(3)} \le \cdots \quad\text{and}\quad \gamma_d^{(-j)} = -\gamma_d^{(j)}. \tag{3.4} \]
Here $\chi_d(n) = \left(\frac{d}{n}\right)$ is Kronecker's symbol, and we restrict ourselves to primitive $\chi_d$. Let $D$ denote the set of such $d$'s, and let $D(X) = \{d \in D : X/2 \le |d| < X\}$. Notice that we are not assuming the Riemann hypothesis for $L(s,\chi_d)$ since we allow that the $\gamma_d^{(j)}$'s be complex.
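As an illustration of the family $D(X)$, here is a small sketch that enumerates it. It assumes that "primitive $\chi_d$" means that $d$ is a fundamental discriminant (the standard characterization; the value $d = 1$, giving the trivial character, is counted as fundamental here). The helper names are hypothetical.

```python
def is_squarefree(m):
    """True if no square of a prime divides m (m != 0)."""
    m = abs(m)
    if m == 0:
        return False
    p = 2
    while p * p <= m:
        if m % (p * p) == 0:
            return False
        p += 1
    return True

def is_fundamental_discriminant(d):
    """d = 1 mod 4 squarefree, or d = 4m with m = 2, 3 mod 4 squarefree."""
    if d == 0:
        return False
    if d % 4 == 1:          # Python's % is non-negative, so this also handles d < 0
        return is_squarefree(d)
    if d % 4 == 0:
        m = d // 4
        return m % 4 in (2, 3) and is_squarefree(m)
    return False

def D(X):
    """Fundamental discriminants d with X/2 <= |d| < X (both signs)."""
    return [d for d in range(-X + 1, X)
            if X / 2 <= abs(d) < X and is_fundamental_discriminant(d)]
```

Under the same assumption, `len(D(X)) / X` should settle near $3/\pi^2 \approx 0.304$, which is consistent with the asymptotic $|D(X)| \sim cX$ used later in the proof of Claim 2.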
THEOREM 3.1
Let
\[ f(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} f_i(x_i), \tag{3.5} \]
where each $f_i$ is even and in $S(\mathbb{R})$ (i.e., smooth and rapidly decreasing). Assume further that $\hat{f}(u_1,\dots,u_n) = \prod_{i=1}^{n} \hat{f}_i(u_i)$ is supported in $\sum_{i=1}^{n} |u_i| < 1$, where
\[ \hat{f}(u) \overset{\mathrm{def}}{=} \int_{\mathbb{R}^n} f(x)\, e^{2\pi i x\cdot u}\, dx. \tag{3.6} \]
Then
\[ \lim_{X\to\infty} \frac{1}{|D(X)|} \sum_{d\in D(X)} \sideset{}{^*}\sum_{j_1,\dots,j_n} f\bigl(L\gamma_d^{(j_1)}, L\gamma_d^{(j_2)}, \dots, L\gamma_d^{(j_n)}\bigr) = \int_{\mathbb{R}^n} f(x)\, W_{USp}^{(n)}(x)\, dx, \tag{3.7} \]
where
\[ L = \frac{\log X}{2\pi}, \qquad W_{USp}^{(n)}(x_1,\dots,x_n) = \det\bigl(K_{-1}(x_j,x_k)\bigr)_{1\le j,k\le n}, \]
\[ K_{-1}(x,y) = \frac{\sin(\pi(x-y))}{\pi(x-y)} - \frac{\sin(\pi(x+y))}{\pi(x+y)}, \]
and where $\sum^*_{j_1,\dots,j_n}$ is over $j_k = \pm 1, \pm 2, \dots$, with $j_{k_1} \ne \pm j_{k_2}$ if $k_1 \ne k_2$.
Plan. We first use the explicit formula to study the l.h.s. (left-hand side) of (3.7), and we end up expressing it in terms of the $\hat{f}_i$'s. Parseval's formula is then applied to the r.h.s. (right-hand side) of (3.7), and terms are matched with the l.h.s.
Remark. The condition $f_i$ even is not essential to the proof, nor is the assumption that $f$ be of the form $\prod f_i$. At the expense of more cumbersome writing, these can be removed.
3.2. l.h.s.
By (3.4), (3.5), and since $f_i(-x) = f_i(x)$,
\[ \frac{1}{|D(X)|} \sum_{d\in D(X)} \sideset{}{^*}\sum_{j_1,\dots,j_n} f\bigl(L\gamma_d^{(j_1)}, L\gamma_d^{(j_2)}, \dots, L\gamma_d^{(j_n)}\bigr) = \frac{2^n}{|D(X)|} \sum_{d\in D(X)} \sum_{\substack{j_1,\dots,j_n \\ \text{positive and distinct}}} \tilde{f}_d(j_1,\dots,j_n), \tag{3.8} \]
where
\[ \tilde{f}_d(j_1,\dots,j_n) = \prod_{i=1}^{n} f_i\bigl(L\gamma_d^{(j_i)}\bigr). \tag{3.9} \]
In order to apply the explicit formula to (3.8), we need to circumvent the fact that the $j_i$'s are distinct. By combinatorial sieving, as in [15, p. 305], the r.h.s. of (3.8) is
\[ \frac{2^n}{|D(X)|} \sum_{d\in D(X)} \sum_{F} (-1)^{n-\nu(F)} \prod_{\ell=1}^{\nu(F)} (|F_\ell| - 1)!\; w_F, \]
where $F$ ranges over all ways of decomposing $\{1, 2, \dots, n\}$ into disjoint subsets $[F_1, \dots, F_\nu]$, and where
\[ w_F = \sum_{\substack{j_1,\dots,j_\nu \\ \text{positive}}} \tilde{f}_d\bigl(\ell_F(j_1,\dots,j_\nu)\bigr). \]
Here $\ell_F : \mathbb{R}^\nu \to \mathbb{R}^n$, $\ell_F(x_1,\dots,x_\nu) = (y_1,\dots,y_n)$ with $y_i = x_\ell$ if $i \in F_\ell$. For example, for $n = 3$, the possible $F$'s are $[\{1,2,3\}]$, $[\{1,2\},\{3\}]$, $[\{1,3\},\{2\}]$, $[\{2,3\},\{1\}]$, $[\{1\},\{2\},\{3\}]$, and $\ell_{[\{1,3\},\{2\}]}(x_1,x_2) = (x_1,x_2,x_1)$. Thus, (3.8) is
\[ \frac{2^n}{|D(X)|} \sum_{d\in D(X)} \sum_{F} (-1)^{n-\nu(F)} \prod_{\ell=1}^{\nu(F)} (|F_\ell| - 1)! \sum_{\substack{j_1,\dots,j_{\nu(F)} \\ \text{positive}}} \tilde{f}_d\bigl(\ell_F(j_1,\dots,j_{\nu(F)})\bigr), \]
which, by (3.9), equals
\[ \frac{2^n}{|D(X)|} \sum_{d\in D(X)} \sum_{F} \frac{(-1)^{n-\nu(F)}}{2^{\nu(F)}} \prod_{\ell=1}^{\nu(F)} \Bigl( (|F_\ell| - 1)! \sum_{\gamma_d} \prod_{i\in F_\ell} f_i(L\gamma_d) \Bigr). \tag{3.10} \]
In the innermost sum, we are going over all $\gamma_d^{(j)}$ (instead of $j > 0$) and hence the presence of the $1/2^{\nu(F)}$. This is justified by (3.4) and because we are assuming that the $f_i$'s are even. Let
\[ F_\ell(x) = \prod_{i\in F_\ell} f_i(x). \tag{3.11} \]
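The combinatorial sieve used above can be checked on toy data. The sketch below is an illustration rather than the paper's procedure: it enumerates set partitions $F$ and verifies that a sum over distinct index tuples equals the signed sum over partitions with weights $(-1)^{n-\nu(F)}\prod_\ell (|F_\ell|-1)!$. The table `values[i][j]` is a hypothetical stand-in for $f_i(L\gamma_d^{(j)})$.

```python
from itertools import product
from math import factorial, prod

def set_partitions(items):
    """Yield all set partitions of the list `items` as lists of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # put `first` into one of the existing blocks ...
        for k in range(len(partition)):
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]
        # ... or into a new block of its own
        yield [[first]] + partition

def block_weight(F, n):
    """Sieve coefficient (-1)^(n - nu(F)) * prod_l (|F_l| - 1)!."""
    return (-1) ** (n - len(F)) * prod(factorial(len(b) - 1) for b in F)

def distinct_sum(values):
    """Sum of prod_i values[i][j_i] over tuples (j_1..j_n) of distinct indices."""
    n, J = len(values), len(values[0])
    return sum(prod(values[i][js[i]] for i in range(n))
               for js in product(range(J), repeat=n) if len(set(js)) == n)

def sieved_sum(values):
    """Same quantity, computed from unrestricted sums over set partitions."""
    n, J = len(values), len(values[0])
    total = 0
    for F in set_partitions(list(range(n))):
        # the unrestricted sum over (j_1, ..., j_nu) factorises block by block
        inner = prod(sum(prod(values[i][j] for i in block) for j in range(J))
                     for block in F)
        total += block_weight(F, n) * inner
    return total
```

For $n = 3$ the generator produces the five partitions listed in the text, and `distinct_sum` agrees with `sieved_sum` on any integer table, which is the identity that lets the distinctness condition be dropped before the explicit formula is applied.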
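By the explicit formula (see [15, (2.16)], with, in the notation of that paper, $h(r) = F_\ell(Lr)$, $g(y) = (1/\log X)\,\hat{F}_\ell(-y/\log X)$),
\[ \sum_{\gamma_d} F_\ell\bigl(L\gamma_d^{(j)}\bigr) = \int_{\mathbb{R}} F_\ell(x)\, dx + O(1/\log X) - \frac{2}{\log X} \sum_{m=1}^{\infty} \frac{\Lambda(m)}{m^{1/2}}\, \chi_d(m)\, \hat{F}_\ell\Bigl(\frac{\log m}{\log X}\Bigr). \tag{3.12} \]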
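(Note that $\hat{F}_\ell(x)$ is even since each $f_i$ is even. We have also used the facts that $F_\ell(x)$ is rapidly decreasing and that $\Gamma'(s)/\Gamma(s) = O(\log|s|)$ to replace the $\Gamma'/\Gamma$-terms in [15, (2.16)] by $O(1/\log X)$. Note further that $\hat{F}_\ell$ is compactly supported (see Claim 1).)
Plugging (3.12) into (3.10) (without the $O(1/\log X)$-term, a step that is justified in Lemma 2), we see, on multiplying out the product over $\ell$ in (3.10), that (3.10) is
\[ \frac{1}{|D(X)|} \sum_{d\in D(X)} \sum_{F} (-2)^{n-\nu(F)} \prod_{\ell=1}^{\nu(F)} (|F_\ell| - 1)!\,(C_\ell + D_\ell), \]
where
\[ C_\ell = \int_{\mathbb{R}} F_\ell(x)\, dx, \qquad D_\ell = -\frac{2}{\log X} \sum_{m=1}^{\infty} \frac{\Lambda(m)}{m^{1/2}}\, \chi_d(m)\, \hat{F}_\ell\Bigl(\frac{\log m}{\log X}\Bigr). \]
When we expand the product over $\ell$, we obtain $2^{\nu(F)}$ terms, each a product of $C_\ell$'s and $D_\ell$'s. A typical term can be written as
\[ \prod_{\ell\in S^c} C_\ell \prod_{\ell\in S} D_\ell \]
for some subset $S$ of $\{1, 2, \dots, \nu(F)\}$. (Empty products are taken to be 1.) The product of the $C_\ell$'s contributes to (3.10) a factor of
\[ \prod_{\ell\in S^c} \int_{\mathbb{R}} F_\ell(x)\, dx. \]
The product of the $D_\ell$'s equals
\[ \Bigl(\frac{-2}{\log X}\Bigr)^{|S|} \prod_{\ell\in S} \sum_{m=1}^{\infty} \frac{\Lambda(m)}{m^{1/2}}\, \chi_d(m)\, \hat{F}_\ell\Bigl(\frac{\log m}{\log X}\Bigr), \]
which, by Lemma 1, contributes a factor of
\[ \sum_{\substack{S_2\subseteq S \\ |S_2| \text{ even}}} \Bigl( \Bigl(\frac{-1}{2}\Bigr)^{|S_2^c|} \prod_{\ell\in S_2^c} \int_{\mathbb{R}} \hat{F}_\ell(u)\, du \Bigr) \Bigl( \sum_{(A;B)} 2^{|S_2|/2} \prod_{j=1}^{|S_2|/2} \int_{\mathbb{R}} |u|\, \hat{F}_{a_j}(u)\, \hat{F}_{b_j}(u)\, du \Bigr), \]
from which we find that (3.10) (and hence (3.8)) tends, as $X \to \infty$, to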
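\[ \sum_{F} (-2)^{n-\nu(F)} \prod_{\ell=1}^{\nu(F)} (|F_\ell| - 1)! \sum_{S} \Bigl( \prod_{\ell\in S^c} \int_{\mathbb{R}} F_\ell(x)\, dx \Bigr) \cdot \sum_{\substack{S_2\subseteq S \\ |S_2| \text{ even}}} \Bigl( \Bigl(\frac{-1}{2}\Bigr)^{|S_2^c|} \prod_{\ell\in S_2^c} \int_{\mathbb{R}} \hat{F}_\ell(u)\, du \Bigr) \cdot \Bigl( \sum_{(A;B)} 2^{|S_2|/2} \prod_{j=1}^{|S_2|/2} \int_{\mathbb{R}} |u|\, \hat{F}_{a_j}(u)\, \hat{F}_{b_j}(u)\, du \Bigr). \tag{3.13} \]
Here $S$ ranges over all $2^{\nu(F)}$ subsets of $\{1, 2, \dots, \nu(F)\}$, and $S^c$ denotes the complement of $S$. The rest of the notation is as in Lemma 1.
LEMMA 1
We have
\[ \lim_{X\to\infty} \frac{1}{|D(X)|} \sum_{d\in D(X)} \Bigl(\frac{-2}{\log X}\Bigr)^{k} \prod_{j=1}^{k} \sum_{m=1}^{\infty} \frac{\Lambda(m)}{m^{1/2}}\, \chi_d(m)\, \hat{F}_{\ell_j}\Bigl(\frac{\log m}{\log X}\Bigr) = \sum_{\substack{S_2\subseteq S \\ |S_2| \text{ even}}} \Bigl( \Bigl(\frac{-1}{2}\Bigr)^{|S_2^c|} \prod_{\ell\in S_2^c} \int_{\mathbb{R}} \hat{F}_\ell(u)\, du \Bigr) \cdot \Bigl( \sum_{(A;B)} 2^{|S_2|/2} \prod_{j=1}^{|S_2|/2} \int_{\mathbb{R}} |u|\, \hat{F}_{a_j}(u)\, \hat{F}_{b_j}(u)\, du \Bigr), \tag{3.14} \]
where $S = \{\ell_1, \dots, \ell_k\}$. $\sum_{S_2\subseteq S,\ |S_2| \text{ even}}$ is over all subsets $S_2$ of $S$ whose size is even. $\sum_{(A;B)}$ is over all ways of pairing up the elements of $S_2$. $F_\ell(x)$ is defined in (3.11).
For example, if $S = \{1, 2, 5, 7\}$, the possible $S_2$'s are $\emptyset$, $\{1,2\}$, $\{1,5\}$, $\{1,7\}$, $\{2,5\}$, $\{2,7\}$, $\{5,7\}$, $\{1,2,5,7\}$. And if $S_2 = \{1,2,5,7\}$, then the possible $(A;B)$'s are $(1,2;5,7)$, $(1,2;7,5)$, $(1,5;2,7)$. These correspond, respectively, to matching 1 with 5 and 2 with 7, 1 with 7 and 2 with 5, 1 with 2 and 5 with 7. Note that our notation is not unique. For example, $(1,2;5,7) \equiv (7,1;2,5)$.
Proof
Lemma 1 is obtained in a sequence of claims.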
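The index sets in Lemma 1 are easy to enumerate mechanically. The sketch below (function names are illustrative) lists the even-sized subsets $S_2 \subseteq S$ and, for each, all ways of pairing up its elements; each pairing is returned as a list of unordered pairs rather than in the $(A;B)$ notation, which the text notes is not unique.

```python
from itertools import combinations

def even_subsets(S):
    """Yield the subsets of S of even size, as sorted tuples."""
    S = sorted(S)
    for r in range(0, len(S) + 1, 2):
        for sub in combinations(S, r):
            yield sub

def pairings(elements):
    """Yield all perfect matchings of an even-sized collection as lists of pairs."""
    elements = list(elements)
    if not elements:
        yield []
        return
    first = elements[0]
    # choose the partner of the first element, then match the rest recursively
    for k in range(1, len(elements)):
        partner = elements[k]
        rest = elements[1:k] + elements[k + 1:]
        for tail in pairings(rest):
            yield [(first, partner)] + tail
```

For $S = \{1,2,5,7\}$ this gives the eight subsets and, for $S_2 = S$, the three matchings quoted in the example; in general a set of size $2r$ has $(2r-1)!!$ matchings.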
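CLAIM 1
Suppose that $\prod_{i=1}^{n} \hat{f}_i(u_i)$ is supported in $\sum_{i=1}^{n} |u_i| \le \alpha$. Then $\prod_{j=1}^{k} \hat{F}_{\ell_j}(u_j)$ is supported in $\sum_{j=1}^{k} |u_j| \le \alpha$.
Proof
By (3.11),
\[
\begin{aligned}
\hat{F}_\ell(u) &= \int_{\mathbb{R}} \prod_{i\in F_\ell} f_i(x)\, e^{2\pi i u x}\, dx \\
&= \int_{\mathbb{R}^{|F_\ell|}} \Bigl( \prod_{i\in F_\ell} dx_i\, f_i(x_i) \Bigr) e^{2\pi i u \sum_{i\in F_\ell} x_i/|F_\ell|} \prod_{m=2}^{|F_\ell|} \delta(x_{i_m} - x_{i_1}) \\
&= \int_{\mathbb{R}^{|F_\ell|}} \Bigl( \prod_{i\in F_\ell} dv_i\, \hat{f}_i(v_i) \Bigr) \delta\Bigl( u - \sum_{i\in F_\ell} v_i \Bigr),
\end{aligned}
\tag{3.15}
\]
the last step following from Parseval's formula. (Note: If $|F_\ell| = 1$, then the product over $m$ is taken to be 1.) Hence,
\[ \prod_{j=1}^{k} \hat{F}_{\ell_j}(u_j) = \int_{\mathbb{R}^{\sum_{1}^{k} |F_{\ell_j}|}} \Bigl( \prod_{i\in\bigcup F_{\ell_j}} dv_i\, \hat{f}_i(v_i) \Bigr) \prod_{j=1}^{k} \delta\Bigl( u_j - \sum_{i\in F_{\ell_j}} v_i \Bigr). \tag{3.16} \]
In the integrand, the $\delta$'s restrict us to
\[ \sum_{j=1}^{k} |u_j| = \sum_{j=1}^{k} \Bigl| \sum_{i\in F_{\ell_j}} v_i \Bigr| \le \sum_{i\in\bigcup F_{\ell_j}} |v_i|. \]
So, if $\sum_{j=1}^{k} |u_j| > \alpha$, then $\sum_{i\in\bigcup F_{\ell_j}} |v_i| > \alpha$. But, by the support condition on $\prod_{i=1}^{n} \hat{f}_i(v_i)$, $\prod_{i\in\bigcup F_{\ell_j}} \hat{f}_i(v_i) = 0$ if $\sum_{i\in\bigcup F_{\ell_j}} |v_i| > \alpha$. Hence (3.16) is zero if $\sum_{j=1}^{k} |u_j| > \alpha$; thus the claim is proved.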
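CLAIM 2
Suppose that $\prod_{i=1}^{n} \hat{f}_i(u_i)$ is supported in $\sum_{i=1}^{n} |u_i| \le \alpha < 1$. Then
\[ \lim_{X\to\infty} \frac{1}{|D(X)|} \sum_{d\in D(X)} \Bigl(\frac{-2}{\log X}\Bigr)^{k} \sum_{\substack{m_i \ge 1,\ i=1,\dots,k \\ m_1\cdots m_k \ne \square}} \prod_{j=1}^{k} \frac{\Lambda(m_j)}{m_j^{1/2}}\, \chi_d(m_j)\, \hat{F}_{\ell_j}\Bigl(\frac{\log m_j}{\log X}\Bigr) = 0. \tag{3.17} \]
Here we are summing over all $k$-tuples $(m_1, \dots, m_k)$ of positive integers with $\prod_{1}^{k} m_i \notin \{1, 4, 9, 16, \dots\}$, and $S = \{\ell_1, \dots, \ell_k\}$.
Remark. This claim tells us that the only contributions to (3.14) come from perfect squares. (This is dealt with in Claim 3.)
Proof
Changing the order of summation and applying Claim 1 and the Cauchy-Schwarz inequality, we find that the l.h.s. of (3.17) is
\[ \ll \lim_{X\to\infty} \frac{1}{|D(X)|} \frac{1}{\log^k X} \Biggl[ \sum_{\substack{m_i \ge 1 \\ \sum \log m_i \le \alpha\log X \\ m_1\cdots m_k \ne \square}} \frac{\Lambda^2(m_1)\cdots\Lambda^2(m_k)}{m_1\cdots m_k} \Biggr]^{1/2} \cdot \Biggl[ \sum_{\substack{m_i \ge 1 \\ \sum \log m_i \le \alpha\log X \\ m_1\cdots m_k \ne \square}} \Bigl| \sum_{d\in D(X)} \chi_d(m_1\cdots m_k) \Bigr|^2 \Biggr]^{1/2}. \tag{3.18} \]
The first bracketed term is less than
\[ \Bigl( \sum_{m\le X^\alpha} \frac{\Lambda^2(m)}{m} \Bigr)^{k/2} \ll \log^k X. \tag{3.19} \]
Next, the number of times we may write $m = m_1\cdots m_k$, $m_i \ge 1$, is $O(\sigma_0^{k-1}(m)) = O_\varepsilon(m^\varepsilon)$ for any $\varepsilon > 0$ ($\sigma_0(m)$ being the number of divisors of $m$), so that the second bracketed term is
\[ \ll_\varepsilon X^\varepsilon \Bigl( \sum_{m\le X^\alpha} \Bigl| \sum_{d\in D(X)} \chi_d(m) \Bigr|^2 \Bigr)^{1/2}. \tag{3.20} \]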
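Applying the methods of M. Jutila [4], we find that the above is
\[ \ll_\varepsilon \bigl( X^{\varepsilon+1+\alpha} \log^A X \bigr)^{1/2} \quad \text{for some constant } A \ (A = 10 \text{ is admissible}), \]
which, combined with (3.19), shows that (3.18) is
\[ \ll_\varepsilon \lim_{X\to\infty} \frac{1}{|D(X)|}\, X^{\varepsilon+(1+\alpha)/2}. \]
But, for $\varepsilon$ small enough, this limit equals zero (because $|D(X)| \sim cX$ for some constant $c$, and we are assuming $\alpha < 1$).
CLAIM 3
We have
\[ \lim_{X\to\infty} \frac{1}{|D(X)|} \sum_{d\in D(X)} \Bigl(\frac{-2}{\log X}\Bigr)^{k} \sum_{\substack{m_i \ge 1 \\ m_1\cdots m_k = \square}} \prod_{j=1}^{k} \frac{\Lambda(m_j)}{m_j^{1/2}}\, \chi_d(m_j)\, \hat{F}_{\ell_j}\Bigl(\frac{\log m_j}{\log X}\Bigr) = \sum_{\substack{S_2\subseteq S \\ |S_2| \text{ even}}} \Bigl( \Bigl(\frac{-1}{2}\Bigr)^{|S_2^c|} \prod_{\ell\in S_2^c} \int_{\mathbb{R}} \hat{F}_\ell(u)\, du \Bigr) \cdot \Bigl( \sum_{(A;B)} 2^{|S_2|/2} \prod_{j=1}^{|S_2|/2} \int_{\mathbb{R}} |u|\, \hat{F}_{a_j}(u)\, \hat{F}_{b_j}(u)\, du \Bigr). \tag{3.21} \]
Here we are summing over all $k$-tuples $(m_1, \dots, m_k)$ of positive integers with $\prod_{1}^{k} m_i \in \{1, 4, 9, 16, \dots\}$.
Proof
First, the $\Lambda(m_i)$'s restrict us to prime powers, $m_i = p_i^{e_i}$, so the only way that $\prod_{1}^{k} m_i$ can equal a perfect square is if some of the $e_i$'s are even, and the rest of the $p_i^{e_i}$'s match up to produce squares. We can focus our attention on $e_i = 1$ or $2$ since the sum over $e_i \ge 3$ contributes zero as $X \to \infty$.
Also, note, in (3.21), that $\chi_d(\prod_{1}^{k} m_i) = 1$ since $\prod_{1}^{k} m_i$ is restricted to perfect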
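squares. Hence the l.h.s. of (3.21) is
\[ \lim_{X\to\infty} \sum_{\substack{S_2\subseteq S \\ |S_2| \text{ even}}} \Biggl[ \sum_{\substack{p_\ell,\ \ell\in S_2 \\ \prod_{\ell\in S_2} p_\ell = \square}} \Bigl(\frac{-2}{\log X}\Bigr)^{|S_2|} \prod_{i\in S_2} \frac{\log(p_i)}{p_i^{1/2}}\, \hat{F}_i\Bigl(\frac{\log p_i}{\log X}\Bigr) \Biggr] \cdot \Biggl[ \sum_{p_\ell,\ \ell\in S_2^c} \Bigl(\frac{-2}{\log X}\Bigr)^{|S_2^c|} \prod_{i\in S_2^c} \frac{\log(p_i)}{p_i}\, \hat{F}_i\Bigl(\frac{2\log p_i}{\log X}\Bigr) \Biggr]. \]
(We have dropped the $(1/|D(X)|)\sum_{d\in D(X)}$ since the terms in the sum do not depend on $d$.) The sum over $\ell \in S_2$ corresponds to the $e_\ell$'s that are equal to 1 (and that pair up to produce squares), while the sum over $\ell \in S_2^c$ corresponds to the $e_\ell$'s that are equal to 2. To complete the proof of this claim and hence of Lemma 1, we establish the two subclaims below.
SUBCLAIM 3.1
We have
\[ \lim_{X\to\infty} \sum_{p_\ell,\ \ell\in S_2^c} \Bigl(\frac{-2}{\log X}\Bigr)^{|S_2^c|} \prod_{i\in S_2^c} \frac{\log(p_i)}{p_i}\, \hat{F}_i\Bigl(\frac{2\log p_i}{\log X}\Bigr) = \Bigl(\frac{-1}{2}\Bigr)^{|S_2^c|} \prod_{\ell\in S_2^c} \int_{\mathbb{R}} \hat{F}_\ell(u)\, du. \tag{3.22} \]
Proof
The l.h.s. of (3.22) factors
\[ \prod_{\ell\in S_2^c} \Bigl( \frac{-2}{\log X} \sum_{p} \frac{\log(p)}{p}\, \hat{F}_\ell\Bigl(\frac{2\log p}{\log X}\Bigr) \Bigr), \]
which, summing by parts, equals
\[ \prod_{\ell\in S_2^c} \frac{2}{\log X} \int_{1}^{\infty} \Bigl( \sum_{p\le t} \frac{\log(p)}{p} \Bigr) \Bigl( \hat{F}_\ell\Bigl(\frac{2\log t}{\log X}\Bigr) \Bigr)'\, dt. \]
The sum $\sum_{p\le t} \log(p)/p$ can be evaluated elementarily (see [3, p. 22]), and the above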
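becomes
\[ \prod_{\ell\in S_2^c} \frac{2}{\log X} \int_{1}^{\infty} (\log t + O(1)) \Bigl( \hat{F}_\ell\Bigl(\frac{2\log t}{\log X}\Bigr) \Bigr)'\, dt = \prod_{\ell\in S_2^c} \Bigl( \frac{-2}{\log X} \int_{1}^{\infty} \hat{F}_\ell\Bigl(\frac{2\log t}{\log X}\Bigr) \frac{dt}{t} + O\Bigl(\frac{1}{\log X}\Bigr) \Bigr), \tag{3.23} \]
the last step from integration by parts, and using the fact that $\hat{F}_\ell^{(1)}(u)$ is supported in $|u| \le \alpha$. Changing variables $u = 2\log t/\log X$ and noting that all the $\hat{F}_\ell$'s are even (since all the $f_i$'s are), we thus find that the limit in (3.22) is
\[ \Bigl(\frac{-1}{2}\Bigr)^{|S_2^c|} \prod_{\ell\in S_2^c} \int_{\mathbb{R}} \hat{F}_\ell(u)\, du. \]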
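The elementary evaluation $\sum_{p\le t} \log(p)/p = \log t + O(1)$ used in the proof of Subclaim 3.1 can be observed numerically. The sketch below is an illustration with hypothetical function names, using a basic sieve of Eratosthenes.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: list of primes <= n (n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            # strike out p*p, p*p + p, ... with a zero-filled slice of equal length
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]

def mertens_sum(t):
    """sum_{p <= t} log(p)/p."""
    return sum(math.log(p) / p for p in primes_up_to(t))
```

Comparing `mertens_sum(t)` with `math.log(t)` for growing `t` shows a difference that stays bounded, which is all the summation by parts above requires.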
SUBCLAIM 3.2
We have
\[ \lim_{X\to\infty} \sum_{\substack{p_\ell,\ \ell\in S_2 \\ \prod_{\ell\in S_2} p_\ell = \square}} \Bigl(\frac{-2}{\log X}\Bigr)^{|S_2|} \prod_{i\in S_2} \frac{\log(p_i)}{p_i^{1/2}}\, \hat{F}_i\Bigl(\frac{\log p_i}{\log X}\Bigr) = \sum_{(A;B)} 2^{|S_2|/2} \prod_{j=1}^{|S_2|/2} \int_{\mathbb{R}} |u|\, \hat{F}_{a_j}(u)\, \hat{F}_{b_j}(u)\, du. \tag{3.24} \]
Proof
In (3.24), $\prod_{\ell\in S_2} p_\ell = \square$ implies that the $p_\ell$'s pair up to produce squares. So, the l.h.s. of (3.24) equals
\[ \lim_{X\to\infty} \sum_{(A;B)} \sum_{\substack{p_i \\ i=1,\dots,|S_2|/2}} \prod_{j=1}^{|S_2|/2} \frac{4\log^2(p_j)}{p_j \log^2(X)}\, \hat{F}_{a_j}\Bigl(\frac{\log p_j}{\log X}\Bigr) \hat{F}_{b_j}\Bigl(\frac{\log p_j}{\log X}\Bigr). \tag{3.25} \]
The sum over $(A;B)$ accounts for all ways of pairing up primes in (3.24). Note that there is a bit of overlap produced in (3.25), but this overlap contributes zero as $X \to \infty$. For example, if $S_2 = \{1, 2, 5, 7\}$, then the three ways of pairing up $p_1, p_2, p_5, p_7$ are: $p_1 = p_5$ and $p_2 = p_7$, $p_1 = p_7$ and $p_2 = p_5$, $p_1 = p_2$ and $p_5 = p_7$. So the sum over $p_1 = p_2 = p_5 = p_7$ is counted three times in (3.25), whereas it is counted only once in the l.h.s. of (3.24). Such diagonal sums do not bother us since there are $O_k(1)$
such sums, and a typical $p_{j_1} = p_{j_2} = \cdots = p_{j_{2r}}$, $r \ge 2$, contributes to (3.25) a term with a factor that is
\[ \lim_{X\to\infty} \frac{1}{\log^{2r} X} \sum_{p} \frac{\log^{2r} p}{p^r} \ll \lim_{X\to\infty} \frac{1}{\log^{2r} X} = 0. \]
Now, (3.25) can be written as
\[ \lim_{X\to\infty} \sum_{(A;B)} \prod_{j=1}^{|S_2|/2} \Bigl( \sum_{p} \frac{4\log^2(p)}{p\log^2(X)}\, \hat{F}_{a_j}\Bigl(\frac{\log p}{\log X}\Bigr) \hat{F}_{b_j}\Bigl(\frac{\log p}{\log X}\Bigr) \Bigr). \]
Summing by parts, we find that the bracketed term is
\[ 4 \int_{0}^{\infty} u\, \hat{F}_{a_j}(u)\, \hat{F}_{b_j}(u)\, du + O(1/\log X). \]
Recalling that the $\hat{F}$'s are even, we obtain the subclaim.
We thus obtain Claim 3 and Lemma 1.
LEMMA 2
Let
\[ a_\ell(d) = \sum_{\gamma_d} F_\ell\bigl(L\gamma_d^{(j)}\bigr), \]
where $F_\ell(x) = \prod_{i\in F_\ell} f_i(x)$, and $f_i$ is as in Theorem 3.1. Then
\[ \lim_{X\to\infty} \frac{1}{|D(X)|} \sum_{d\in D(X)} \prod_{\ell=1}^{\nu(F)} a_\ell(d) = \lim_{X\to\infty} \frac{1}{|D(X)|} \sum_{d\in D(X)} \prod_{\ell=1}^{\nu(F)} \bigl( a_\ell(d) + O(1/\log X) \bigr). \]
Remark. This lemma justifies dropping the $O(1/\log X)$ when plugging (3.12) into (3.10).
Proof
The proof is by induction. We consider
\[ \lim_{X\to\infty} \frac{1}{|D(X)|} \sum_{d\in D(X)} \prod_{\ell=1}^{k} \bigl( a_\ell(d) + O(1/\log X) \bigr) \tag{3.26} \]
for $k = 1, 2, \dots, \nu(F)$. When $k = 1$, this clearly equals
\[ \lim_{X\to\infty} \frac{1}{|D(X)|} \sum_{d\in D(X)} a_\ell(d). \]
Now, consider the general case. Multiplying out the product in (3.26), we get
\[ \lim_{X\to\infty} \frac{1}{|D(X)|} \sum_{d\in D(X)} \prod_{\ell=1}^{k} a_\ell(d) + \text{remainder}, \]
where the remainder consists of $2^k - 1$ terms, each of which is of the form
\[ O\Bigl( \frac{1}{\log^r(X)} \frac{1}{|D(X)|} \sum_{d\in D(X)} \prod_{j=1}^{k_2} \bigl| a_{\ell_j}(d) \bigr| \Bigr) \tag{3.27} \]
with $r \ge 1$, $k_2 < k$. Now, if $F_\ell(x) \ge 0$ for all $x$, then $|a_{\ell_j}(d)| = a_{\ell_j}(d)$, and, by our inductive hypothesis combined with Lemma 1, the $O$-term above tends to zero as $X \to \infty$.
If $F_\ell(x)$ is not greater than or equal to zero for all $x$, we can show that the $O$-term in (3.27) tends to zero as $X \to \infty$ by replacing each $f_i(x)$ ($i = 1, \dots, n$) with a function $g_i(x)$, which is positive and bigger in absolute value than $f_i(x)$, and which satisfies the conditions of Theorem 3.1; that is, we require that
• $g_i(x) \ge |f_i(x)|$,
• $g_i(x)$ be even and in $S(\mathbb{R})$,
• $\prod_{i=1}^{n} \hat{g}_i(u_i)$ be supported in $\sum_{i=1}^{n} |u_i| < 1$.
That there exist $g_i$'s satisfying the required conditions can be seen as follows. Let
\[ h(t) = \begin{cases} K \exp(-1/(1 - t^2)), & |t| < 1, \\ 0, & |t| \ge 1, \end{cases} \]
where $K$ is chosen so that
\[ \int_{-1}^{1} h(t)\, dt = 1, \]
let
\[ \theta_\beta(t) = \frac{1}{\beta}\, h(t/\beta) \tag{3.28} \]
(so that $\theta_\beta$ approximates the $\delta$-function when $\beta$ is small), and consider
\[ \Psi_\beta(x) = (\theta_\beta * \theta_\beta)\hat{\ }(x) = (\hat{\theta}_\beta(x))^2. \tag{3.29} \]
Now
\[ \hat{\theta}_\beta(x) = \frac{1}{\beta} \int_{-\beta}^{\beta} h(t/\beta) \cos(2\pi x t)\, dt = \int_{-1}^{1} h(u) \cos(2\pi\beta u x)\, du. \tag{3.30} \]
But when $|x| \le 1/(8\beta)$, we have
\[ \hat{\theta}_\beta(x) > \frac{\sqrt{2}}{2} \int_{-1}^{1} h(u)\, du = \frac{\sqrt{2}}{2} \]
(since, when $|x| \le 1/(8\beta)$, $|u| \le 1$, we get $|2\pi\beta u x| \le \pi/4$). Hence
\[ \Psi_\beta(x) > 1/2 \quad \text{when } |x| \le 1/(8\beta) \]
(so $\Psi_\beta$ is bounded away from zero for long stretches when $\beta$ is small), and, from (3.29), $\Psi_\beta(x) \ge 0$ for all $x$.
Also, note that $\Psi_\beta$ is even and in $S(\mathbb{R})$ (since $h(t)$ enjoys these properties), and note that $\hat{\Psi}_\beta(t) = (\theta_\beta * \theta_\beta)(t)$ is supported in $[-2\beta, 2\beta]$. We use $\Psi_\beta(x)$'s to construct a $g_i(x)$ satisfying the three required properties. Let
\[ M_f(c, d) = \max_{c\le|x|\le d} |f(x)|, \]
and let
\[ \beta_j^{-1} = \begin{cases} 2n + j, & j \ge 1, \\ 0, & j = 0. \end{cases} \]
(The $j = 0$ case is only for notational convenience.) Then
\[ g_i(x) = 2 \sum_{j=0}^{\infty} M_{f_i}\bigl( (8\beta_j)^{-1}, (8\beta_{j+1})^{-1} \bigr)\, \Psi_{\beta_{j+1}}(x) \]
has the required properties.
3.3. r.h.s.
Our goal is to express
\[ \int_{\mathbb{R}^n} f(x)\, W_{USp}^{(n)}(x)\, dx \]
in a manner that allows us to easily see how to match terms with (3.13).
We consider the more general
\[ \int_{\mathbb{R}^n} f(x)\, W_\varepsilon(x)\, dx, \tag{3.31} \]
where $\varepsilon \in \{-1, 1\}$ and
\[ W_\varepsilon(x_1,\dots,x_n) = \det\bigl(K_\varepsilon(x_j,x_k)\bigr)_{1\le j,k\le n}, \qquad K_\varepsilon(x,y) = \frac{\sin(\pi(x-y))}{\pi(x-y)} + \varepsilon\,\frac{\sin(\pi(x+y))}{\pi(x+y)}, \]
because it is needed when we study analogous questions for $GL_M/\mathbb{Q}$. Write
\[ W_\varepsilon(x_1,\dots,x_n) = \sum_{\sigma} \mathrm{sgn}(\sigma) \prod_{j=1}^{n} K_\varepsilon(x_j, x_{\sigma(j)}). \]
Here, $\sigma$ is over all permutations of $n$ elements. Express $\sigma$ as a product of disjoint cycles
\[ \sigma \in \bigsqcup_{F} S^*(F_1) \times \cdots \times S^*(F_{\nu(F)}), \tag{3.32} \]
where $F$ is over set partitions of $\{1, \dots, n\}$ (as in Section 3.2) and $S^*(F_\ell)$ denotes the set of all $(|F_\ell| - 1)!$ cyclic permutations of the elements of $F_\ell$. Notice that $\mathrm{sgn}(\sigma) = \prod_{\ell=1}^{\nu(F)} (-1)^{|F_\ell|-1}$.
For example, if $n = 7$ and $F = [\{1,3,4,6\}, \{2,5,7\}]$, then $S^*(\{1,3,4,6\}) \times S^*(\{2,5,7\})$ is the set of 12 permutations:
{(1 3 4 6)(2 5 7), (1 3 6 4)(2 5 7), (1 4 3 6)(2 5 7), (1 4 6 3)(2 5 7), (1 6 3 4)(2 5 7), (1 6 4 3)(2 5 7), (1 3 4 6)(2 7 5), (1 3 6 4)(2 7 5), (1 4 3 6)(2 7 5), (1 4 6 3)(2 7 5), (1 6 3 4)(2 7 5), (1 6 4 3)(2 7 5)}.
We are applying Parseval's formula to (3.31), and thus we need to determine $\hat{W}_\varepsilon(u)$. So, for each cycle $(i_1, \dots, i_m)$, we evaluate the Fourier transform
\[ \int_{\mathbb{R}^m} K_\varepsilon(x_{i_1}, x_{i_2})\, K_\varepsilon(x_{i_2}, x_{i_3}) \cdots K_\varepsilon(x_{i_m}, x_{i_1})\, e^{2\pi i \sum_{j=1}^{m} u_{i_j} x_{i_j}}\, dx_{i_1} \cdots dx_{i_m}. \tag{3.33} \]
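The cycle bookkeeping behind (3.32) can be sketched in a few lines. The code below is an illustration with hypothetical helper names: it lists the cyclic permutations $S^*(F_\ell)$ of a block, assembles a permutation from disjoint cycles, and compares the sign formula $\prod_\ell (-1)^{|F_\ell|-1}$ with a direct inversion count.

```python
from itertools import permutations

def cyclic_permutations(block):
    """All (len(block)-1)! cycles on `block`, each written starting at block[0]."""
    block = list(block)
    return [(block[0],) + tail for tail in permutations(block[1:])]

def cycle_type_sign(partition):
    """prod_l (-1)^(|F_l| - 1) for a set partition given as a list of blocks."""
    sign = 1
    for block in partition:
        sign *= (-1) ** (len(block) - 1)
    return sign

def permutation_from_cycles(cycles, n):
    """Build sigma on {1..n} (as a dict) from a list of disjoint cycles."""
    sigma = {i: i for i in range(1, n + 1)}
    for cyc in cycles:
        # each element maps to its successor in the cycle, the last to the first
        for a, b in zip(cyc, cyc[1:] + cyc[:1]):
            sigma[a] = b
    return sigma

def sign_by_inversions(sigma):
    """Sign of sigma computed independently from the parity of its inversions."""
    keys = sorted(sigma)
    inv = sum(1 for i in keys for j in keys if i < j and sigma[i] > sigma[j])
    return -1 if inv % 2 else 1
```

For $F = [\{1,3,4,6\},\{2,5,7\}]$ this reproduces the 12 permutations of the example, all of sign $-1$ since the 4-cycle is odd and the 3-cycle is even.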
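Expanding the product of $K_\varepsilon$'s, we obtain $2^m$ terms
\[ \sum_{a} \varepsilon^{\beta(a)} \int_{\mathbb{R}^m} \frac{\sin(\pi(x_{i_1} - a_1 x_{i_2}))}{\pi(x_{i_1} - a_1 x_{i_2})} \cdots \frac{\sin(\pi(x_{i_m} - a_m x_{i_1}))}{\pi(x_{i_m} - a_m x_{i_1})}\, e^{2\pi i \sum_{j=1}^{m} u_{i_j} x_{i_j}}\, dx_{i_1} \cdots dx_{i_m}. \tag{3.34} \]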
Here $a$ ranges over all $2^m$ $m$-tuples $(a_1, \dots, a_m)$ with $a_j \in \{1, -1\}$, and $\beta(a) = \#\{j \mid a_j = -1\}$. According to Lemma 3, if $\sum |u_{i_j}| < 1$, then (3.34) is
\[ 2^{m-2}\varepsilon + \sum_{c} \delta\Bigl( \sum_{j=1}^{m} c_j u_{i_j} \Bigr) \bigl( 1 - V(c_1 u_{i_1}, \dots, c_m u_{i_m}) \bigr), \tag{3.35} \]
where $c$ is over all $2^{m-1}$ $m$-tuples $(c_1, \dots, c_m)$ with $c_j \in \{1, -1\}$, $c_m = 1$, and where
\[ V(y) = M(y) - m(y), \tag{3.36} \]
\[ M(y) = \max\{s_k(y),\ k = 1, \dots, n\}, \qquad m(y) = \min\{s_k(y),\ k = 1, \dots, n\}, \qquad s_k(y) = \sum_{j=1}^{k} y_j. \]
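Applying Parseval's formula to (3.31) and recalling the assumption that the support of $\prod_{i=1}^{n} \hat{f}_i(u_i)$ is in $\sum_{i=1}^{n} |u_i| < 1$ (so in the integral below, we are restricted to the region where Lemma 3 applies), we find that (3.31) equals
\[ \int_{\mathbb{R}^n} \Bigl( \prod_{i=1}^{n} du_i\, \hat{f}_i(u_i) \Bigr) \sum_{F} \prod_{\ell=1}^{\nu(F)} (-1)^{|F_\ell|-1} \cdot \sideset{}{'}\sum_{\{i \mid i\in F_\ell\}} \Bigl( 2^{|F_\ell|-2}\varepsilon + \sum_{c} \delta\Bigl( \sum_{j=1}^{|F_\ell|} c_j u_{i_j} \Bigr) \bigl( 1 - V(c_1 u_{i_1}, \dots, c_{|F_\ell|} u_{i_{|F_\ell|}}) \bigr) \Bigr), \tag{3.37} \]
where $\sum'_{\{i \mid i\in F_\ell\}}$ is over all $(|F_\ell| - 1)!$ cyclic permutations of the elements of $F_\ell$. Next, in the inner sum, change variables $w_{i_j} = c_j u_{i_j}$. Recalling that the $\hat{f}$'s are assumed to be even functions, we find that the above becomes
\[ \int_{\mathbb{R}^n} \Bigl( \prod_{i=1}^{n} dw_i\, \hat{f}_i(w_i) \Bigr) \sum_{F} \prod_{\ell=1}^{\nu(F)} (-2)^{|F_\ell|-1} \cdot \sideset{}{'}\sum_{\{i \mid i\in F_\ell\}} \Bigl( \frac{\varepsilon}{2} + \delta\Bigl( \sum_{j=1}^{|F_\ell|} w_{i_j} \Bigr) \bigl( 1 - V(w_{i_1}, \dots, w_{i_{|F_\ell|}}) \bigr) \Bigr). \]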
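Applying the combinatorial identity [15, (4.35)], we get
\[ \int_{\mathbb{R}^n} \Bigl( \prod_{i=1}^{n} dw_i\, \hat{f}_i(w_i) \Bigr) \sum_{F} \prod_{\ell=1}^{\nu(F)} (-2)^{|F_\ell|-1} \cdot \Biggl( (|F_\ell| - 1)!\,\frac{\varepsilon}{2} + \delta\Bigl( \sum_{i\in F_\ell} w_i \Bigr) \cdot \Bigl( (|F_\ell| - 1)! - \sum_{[H,H^c]} (|H| - 1)!\,(|F_\ell| - 1 - |H|)!\, \Bigl| \sum_{k\in H} w_k \Bigr| \Bigr) \Biggr). \]
Here, $[H, H^c]$ runs over all $(2^{|F_\ell|} - 2)/2$ ways of decomposing $F_\ell$ into two disjoint proper subsets: $H \cup H^c = F_\ell$, $H \cap H^c = \emptyset$, with $H \ne \emptyset, F_\ell$. Since $\sum |F_\ell| = n$, we can rewrite the above as
\[ \int_{\mathbb{R}^n} \Bigl( \prod_{i=1}^{n} du_i\, \hat{f}_i(u_i) \Bigr) \sum_{F} (-2)^{n-\nu(F)} \prod_{\ell=1}^{\nu(F)} \Biggl( (|F_\ell| - 1)!\,\frac{\varepsilon}{2} + \delta\Bigl( \sum_{i\in F_\ell} u_i \Bigr) \Bigl( (|F_\ell| - 1)! - \sum_{[H,H^c]} (|H| - 1)!\,(|F_\ell| - 1 - |H|)!\, \Bigl| \sum_{k\in H} u_k \Bigr| \Bigr) \Biggr). \tag{3.38} \]
We now prove the lemma that was required in deriving the above.
LEMMA 3
Let $\sum_{j=1}^{m} |u_j| < 1$. Then
\[ \sum_{a} \varepsilon^{\beta(a)} \int_{\mathbb{R}^m} \frac{\sin(\pi(x_1 - a_1 x_2))}{\pi(x_1 - a_1 x_2)} \cdots \frac{\sin(\pi(x_m - a_m x_1))}{\pi(x_m - a_m x_1)}\, e^{2\pi i u\cdot x}\, dx = 2^{m-2}\varepsilon + \sum_{c} \delta\Bigl( \sum_{j=1}^{m} c_j u_j \Bigr) \bigl( 1 - V(c_1 u_1, \dots, c_m u_m) \bigr). \tag{3.39} \]
The notation here is defined between (3.34) and (3.36). Note: In the degenerate case $m = 1$, the above should be read as
\[ \int_{\mathbb{R}} \Bigl( 1 + \varepsilon\,\frac{\sin(2\pi x)}{2\pi x} \Bigr) e^{2\pi i u x}\, dx = \frac{\varepsilon}{2} + \delta(u), \qquad |u| < 1. \]
Proof
The $m = 1$ case is easy to check and follows from the fact that $(1/2)\chi_{[-1,1]}(u) = \int_{\mathbb{R}} (\sin(2\pi x)/(2\pi x))\, e^{2\pi i u x}\, dx$. So, assume that $m \ge 2$, and consider a typical
\[ \int_{\mathbb{R}^m} \frac{\sin(\pi(x_1 - a_1 x_2))}{\pi(x_1 - a_1 x_2)} \cdots \frac{\sin(\pi(x_m - a_m x_1))}{\pi(x_m - a_m x_1)}\, e^{2\pi i u\cdot x}\, dx. \tag{3.40} \]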
Let
\[ t_i = x_i - a_i x_{i+1}, \quad i = 1, \dots, m-1, \qquad t_m = x_m, \tag{3.41} \]
so that
\[
\begin{pmatrix} x_1 \\ \vdots \\ x_m \end{pmatrix}
=
\begin{pmatrix}
1 & a_1 & a_1 a_2 & a_1 a_2 a_3 & \cdots & a_1 \cdots a_{m-1} \\
0 & 1 & a_2 & a_2 a_3 & \cdots & a_2 \cdots a_{m-1} \\
0 & 0 & 1 & a_3 & \cdots & a_3 \cdots a_{m-1} \\
\vdots & & & \ddots & & \vdots \\
0 & \cdots & \cdots & 0 & 1 & a_{m-1} \\
0 & \cdots & \cdots & \cdots & 0 & 1
\end{pmatrix}
\begin{pmatrix} t_1 \\ \vdots \\ t_m \end{pmatrix}.
\]
Let
\[ K(y) \overset{\mathrm{def}}{=} \sin(\pi y)/(\pi y). \]
Changing variables, (3.40) is
\[ \int_{\mathbb{R}^m} K(t_1) \cdots K(t_{m-1})\, K\bigl( t_m - a_m(t_1 + a_1 t_2 + a_1 a_2 t_3 + \cdots + a_1 \cdots a_{m-1} t_m) \bigr) \cdot e^{2\pi i (t_1 s_1 + \cdots + t_m s_m)}\, dt_1 \cdots dt_m, \tag{3.42} \]
where
\[
\begin{aligned}
s_1 &= u_1, \\
s_2 &= a_1 u_1 + u_2, \\
s_3 &= a_1 a_2 u_1 + a_2 u_2 + u_3, \\
&\;\;\vdots \\
s_k &= a_1 \cdots a_{k-1} u_1 + a_2 \cdots a_{k-1} u_2 + \cdots + a_{k-1} u_{k-1} + u_k, \\
&\;\;\vdots
\end{aligned}
\tag{3.43}
\]
Now, $K(y) = K(-y)$, so, because $a_m \in \{1, -1\}$, we find that (3.42) equals
\[ \int_{\mathbb{R}^m} K(t_1) \cdots K(t_{m-1})\, K\bigl( a_m t_m - t_1 - a_1 t_2 - a_1 a_2 t_3 - \cdots - a_1 \cdots a_{m-1} t_m \bigr) \cdot e^{2\pi i (t_1 s_1 + \cdots + t_m s_m)}\, dt_1 \cdots dt_m. \]
Applying [15, (4.28)] (to the variable $t_1$ with $\tau = -a_m t_m + a_1 t_2 + a_1 a_2 t_3 + \cdots + a_1 \cdots a_{m-1} t_m$), the above becomes
\[ \int_{\mathbb{R}^m} \chi_{[-1/2,1/2]}(v)\, \chi_{[-1/2,1/2]}(v + s_1)\, e^{2\pi i v(-a_m t_m + a_1 t_2 + a_1 a_2 t_3 + \cdots + a_1 \cdots a_{m-1} t_m)} \cdot K(t_2) \cdots K(t_{m-1})\, e^{2\pi i (t_2 s_2 + \cdots + t_m s_m)}\, dv\, dt_2 \cdots dt_m. \]
Integrating over $t_2, \dots, t_{m-1}$, we get
\[ \int_{\mathbb{R}^2} \chi_{[-1/2,1/2]}(v)\, \chi_{[-1/2,1/2]}(v + s_1)\, \chi_{[-1/2,1/2]}(a_1 v + s_2) \cdot \chi_{[-1/2,1/2]}(a_1 a_2 v + s_3) \cdots \chi_{[-1/2,1/2]}(a_1 \cdots a_{m-2} v + s_{m-1}) \cdot e^{2\pi i t_m (s_m + v(a_1 \cdots a_{m-1} - a_m))}\, dv\, dt_m. \tag{3.44} \]
Now, if $\beta(a) = \#\{i \mid a_i = -1\}$ is even, then $a_1 \cdots a_m = 1$, so $a_1 \cdots a_{m-1} = a_m$ and thus $a_1 \cdots a_{m-1} - a_m = 0$. Hence the integral over $t_m$ pulls out a $\delta(s_m)$ from the integral. Next, if $\beta(a)$ is odd, then $a_1 \cdots a_m = -1$, so $a_1 \cdots a_{m-1} = -a_m$ and thus $a_1 \cdots a_{m-1} - a_m = -2a_m$. Hence the integral over $t_m$ gives us a $\delta(s_m - 2a_m v)$, which, when integrated over $v$, pulls out a product of characteristic functions.
Hence, we find that (3.44) (and hence that (3.40)) is
\[ \delta(s_m) \int_{\mathbb{R}} \chi_{[-1/2,1/2]}(v)\, \chi_{[-1/2,1/2]}(v + s_1)\, \chi_{[-1/2,1/2]}(a_1 v + s_2) \cdots \chi_{[-1/2,1/2]}(a_1 \cdots a_{m-2} v + s_{m-1})\, dv \quad \text{if } \beta(a) \text{ is even}, \tag{3.45} \]
\[ \frac{1}{2}\, \chi_{[-1/2,1/2]}\Bigl( \frac{s_m}{2a_m} \Bigr) \chi_{[-1/2,1/2]}\Bigl( \frac{s_m}{2a_m} + s_1 \Bigr) \chi_{[-1/2,1/2]}\Bigl( a_1 \frac{s_m}{2a_m} + s_2 \Bigr) \cdots \chi_{[-1/2,1/2]}\Bigl( a_1 \cdots a_{m-2} \frac{s_m}{2a_m} + s_{m-1} \Bigr) \quad \text{if } \beta(a) \text{ is odd}. \tag{3.46} \]
We require the following two claims.
CLAIM 4
Let $\beta(a)$ be odd, and assume that $\sum_{i=1}^{m} |u_i| < 1$. Then
\[ \chi_{[-1/2,1/2]}\Bigl( a_1 \cdots a_{k-1} \frac{s_m}{2a_m} + s_k \Bigr) = 1, \qquad k = 1, \dots, m-1. \tag{3.47} \]
Thus, (3.46) equals $1/2$.
Proof
Because $a_k \in \{1, -1\}$, we have, from (3.43),
\[ s_k = a_1 \cdots a_{k-1} \bigl( u_1 + a_1 u_2 + a_1 a_2 u_3 + \cdots + a_1 \cdots a_{k-1} u_k \bigr). \tag{3.48} \]
So the coefficient of $u_j$ in (3.47) is
\[ \frac{(a_1 \cdots a_{k-1})(a_1 \cdots a_{m-1})(a_1 \cdots a_{j-1})}{2a_m} + (a_1 \cdots a_{k-1})(a_1 \cdots a_{j-1}). \tag{3.49} \]
When $\beta(a)$ is odd, $\prod_{i=1}^m a_i = -1$; hence (3.49) equals
\[
\frac{(a_1 \cdots a_{k-1})(a_1 \cdots a_{j-1})}{2} \in \Bigl\{\frac{1}{2}, -\frac{1}{2}\Bigr\}.
\]
So
\[
\Bigl| a_1 \cdots a_{k-1} \frac{s_m}{2a_m} + s_k \Bigr| < 1/2
\]
(since we are assuming $\sum_{i=1}^m |u_i| < 1$), and hence the claim is proved.
CLAIM 5
Let $\beta(a)$ be even, and assume that $\sum_{i=1}^m |u_i| < 1$. Then (3.45) equals
\[
\delta(s_m) \bigl(1 - V(u_1, a_1 u_2, \dots, a_1 \cdots a_{m-1} u_m)\bigr)
\]
with $V(y)$ defined in (3.36).

Proof
In (3.45), we have, by (3.48),
\[
\chi_{[-1/2,1/2]}(a_1 \cdots a_{k-1} v + s_k) = \chi_{[-1/2,1/2]}\bigl(a_1 \cdots a_{k-1}(v + u_1 + a_1 u_2 + a_1 a_2 u_3 + \cdots + a_1 \cdots a_{k-1} u_k)\bigr),
\]
and we can drop the $a_1 \cdots a_{k-1} \in \{1, -1\}$ since $\chi_{[-1/2,1/2]}(y)$ is even. Furthermore, the $\delta(s_m)$ restricts us to $u_1 + a_1 u_2 + a_1 a_2 u_3 + \cdots + a_1 \cdots a_{m-1} u_m = 0$. And because we are assuming $\sum_{i=1}^m |u_i| < 1 < 2$, we may apply [15, Lemma 4.3], obtaining the claim.

Note: In [15, (4.32)], $n$ could read $n - 1$ without affecting the truth of the equation since, in the notation of that paper, $f_2(v) f_2(v + u_1 + \cdots + u_n) = f_2(v)$.

We are now ready to complete the proof of this lemma. By Claim 4, the contribution to (3.39) from $a$ with $\beta(a)$ odd is
\[
\sum_{\substack{a \\ \beta(a)\ \mathrm{odd}}} \frac{1}{2}\, \varepsilon^{\beta(a)}. \tag{3.50}
\]
But we are assuming $\varepsilon \in \{1, -1\}$, so the above is $2^{m-2} \varepsilon$. The contribution to (3.39) from $a$ with $\beta(a)$ even is, by Claim 5,
\[
\sum_{\substack{a \\ \beta(a)\ \mathrm{even}}} \delta(s_m) \bigl(1 - V(u_1, a_1 u_2, \dots, a_1 \cdots a_{m-1} u_m)\bigr). \tag{3.51}
\]
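The evaluation of the odd contribution (3.50) as $2^{m-2}\varepsilon$ is a count over sign vectors, and it can be checked by brute force. The following is a minimal sketch, not part of the original argument; the function names `beta` and `odd_contribution` are illustrative choices.

```python
from itertools import product


def beta(a):
    """Number of entries of the sign vector a that equal -1."""
    return sum(1 for x in a if x == -1)


def odd_contribution(m, eps):
    """Sum of (1/2) * eps**beta(a) over a in {1,-1}^m with beta(a) odd."""
    return sum(
        0.5 * eps ** beta(a)
        for a in product((1, -1), repeat=m)
        if beta(a) % 2 == 1
    )


# There are 2**(m-1) sign vectors with beta(a) odd, each contributing eps/2.
for m in range(2, 8):
    for eps in (1, -1):
        assert odd_contribution(m, eps) == 2 ** (m - 2) * eps
```

The check only uses that $\varepsilon^{\beta(a)} = \varepsilon$ whenever $\beta(a)$ is odd and $\varepsilon = \pm 1$.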
Now,
\[
s_m = a_1 \cdots a_{m-1} \bigl(u_1 + a_1 u_2 + a_1 a_2 u_3 + \cdots + a_1 \cdots a_{m-1} u_m\bigr) = a_m u_1 + a_m a_1 u_2 + a_m a_1 a_2 u_3 + \cdots + a_m a_1 \cdots a_{m-1} u_m
\]
because $\prod_{i=1}^m a_i = 1$ when $\beta(a)$ is even. Let
\[
c = (c_1, \dots, c_m) = (a_m, a_m a_1, a_m a_1 a_2, \dots, a_m a_1 \cdots a_{m-1}).
\]
Now, because $\prod_{i=1}^m a_i = 1$, $c$ ranges over all $m$-tuples with $c_j \in \{1, -1\}$ and $c_m = 1$. So, summing over such $c$, we find that (3.51) equals
\[
\sum_c \delta\Bigl(\sum_{j=1}^m c_j u_j\Bigr) \bigl(1 - V(a_m c_1 u_1, \dots, a_m c_m u_m)\bigr). \tag{3.52}
\]
But, because $V(-y) = V(y)$, the above is (regardless of the value of $a_m = \pm 1$)
\[
\sum_c \delta\Bigl(\sum_{j=1}^m c_j u_j\Bigr) \bigl(1 - V(c_1 u_1, \dots, c_m u_m)\bigr). \tag{3.53}
\]
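This, in combination with (3.50), establishes the lemma.

3.4. l.h.s. = r.h.s.

LEMMA 4
We have
\[
\int_{\mathbb{R}^{|F_\ell|}} \prod_{i \in F_\ell} du_i\, \hat f_i(u_i) = \int_{\mathbb{R}} \hat F_\ell(u)\, du.
\]
Proof
Both are equal, by Fourier inversion, to $\prod_{i \in F_\ell} f_i(0)$.

LEMMA 5
We have
\[
\int_{\mathbb{R}^{|F_\ell|}} \prod_{i \in F_\ell} du_i\, \hat f_i(u_i)\, \delta\Bigl(\sum_{i \in F_\ell} u_i\Bigr) = \int_{\mathbb{R}} F_\ell(x)\, dx.
\]
Proof
We obtain the lemma by Parseval's formula.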
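LEMMA 6
Let $H \subset F_\ell$, $H \neq \emptyset$. Then
\[
\int_{\mathbb{R}^{|F_\ell|}} \prod_{i \in F_\ell} du_i\, \hat f_i(u_i)\, \delta\Bigl(\sum_{i \in F_\ell} u_i\Bigr) \Bigl|\sum_{k \in H} u_k\Bigr|
= \int_{\mathbb{R}} \widehat{\Bigl(\prod_{i \in H^c} f_i\Bigr)}(u)\, \widehat{\Bigl(\prod_{i \in H} f_i\Bigr)}(u)\, |u|\, du.
\]
Proof
We obtain the lemma by Parseval's formula.

Now, $W^{(n)}_{\mathrm{USp}} = W_{-1}$, so we need to compare (3.38), with $\varepsilon = -1$, to (3.13). By Lemmas 4–6, write (3.38) as
\[
\sum_F (-2)^{n - \nu(F)} \prod_{\ell=1}^{\nu(F)} (P_\ell + Q_\ell + R_\ell) \tag{3.54}
\]
with
\[
\begin{aligned}
P_\ell &= (|F_\ell| - 1)!\, \Bigl(\frac{-1}{2}\Bigr) \int_{\mathbb{R}} \hat F_\ell(u)\, du, \\
Q_\ell &= (|F_\ell| - 1)! \int_{\mathbb{R}} F_\ell(x)\, dx, \\
R_\ell &= -\sum_{[H, H^c]} (|H| - 1)!\, (|F_\ell| - 1 - |H|)! \int_{\mathbb{R}} \widehat{\Bigl(\prod_{i \in H^c} f_i\Bigr)}(u)\, \widehat{\Bigl(\prod_{i \in H} f_i\Bigr)}(u)\, |u|\, du.
\end{aligned} \tag{3.55}
\]
Expanding the product over $\ell$, we get
\[
\sum_F (-2)^{n - \nu(F)} \sum_S \Bigl(\prod_{\ell \in S^c} Q_\ell\Bigr) \sum_{T \subseteq S} \Bigl(\prod_{\ell \in T^c} P_\ell\Bigr) \Bigl(\prod_{\ell \in T} R_\ell\Bigr), \tag{3.56}
\]
where $S$ ranges over all subsets of $\{1, \dots, \nu(F)\}$. (We take empty products to be 1.) Expanding the product $\prod_{\ell \in T} R_\ell$, we find that (3.56) is
\[
\sum_F (-2)^{n - \nu(F)} \sum_S \Bigl(\prod_{\ell \in S^c} Q_\ell\Bigr) \sum_{T \subseteq S} \Bigl(\prod_{\ell \in T^c} P_\ell\Bigr) \cdot (-1)^{|T|} \Biggl[ \sum_H \prod_{j=1}^{|T|} (|H_j| - 1)!\, (|F_{\ell_j}| - 1 - |H_j|)! \int_{\mathbb{R}} \widehat{\Bigl(\prod_{i \in H_j} f_i\Bigr)}(u)\, \widehat{\Bigl(\prod_{i \in H_j^c} f_i\Bigr)}(u)\, |u|\, du \Biggr], \tag{3.57}
\]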
where $\sum_H$ is over all $|T|$-tuples $[H_1, H_1^c], \dots, [H_{|T|}, H_{|T|}^c]$ and where $T = \{\ell_1, \dots, \ell_{|T|}\}$. (If $T = \emptyset$, we take the large bracketed factor to be 1. And if $T \neq \emptyset$, but $\sum_H$ contains no terms, we take it to be zero.)

We have thus expressed, in (3.57), the r.h.s. of (3.7) in a form that can easily be compared with the l.h.s., as expressed in (3.13). More precisely, a typical term in (3.13) is specified by $F_{\mathrm{l.h.s.}}$, $S_{\mathrm{l.h.s.}}$, $S_2$, $(A; B)$. The sum over $F$ arises from combinatorial sieving, and the sum over $S \subseteq \{1, \dots, \nu(F)\}$ arises from multiplying out the explicit formula (3.12). The sum over $S_2 \subseteq S$ comes from deciding which prime powers are paired up to produce squares and which are already squares ($S_2^c$). $(A; B)$ accounts for all ways of pairing up $S_2$. The contribution to (3.13) from a typical term is
\[
\begin{aligned}
&(-2)^{n - \nu(F_{\mathrm{l.h.s.}})} \prod_{\ell \in S^c_{\mathrm{l.h.s.}}} \Bigl( (|F_\ell| - 1)! \int_{\mathbb{R}} F_\ell(x)\, dx \Bigr) \cdot \prod_{\ell \in S_2^c} \Bigl( (|F_\ell| - 1)!\, \Bigl(\frac{-1}{2}\Bigr) \int_{\mathbb{R}} \hat F_\ell(u)\, du \Bigr) \\
&\qquad \cdot\, 2^{|S_2|/2} \prod_{j=1}^{|S_2|/2} \Bigl( (|F_{a_j}| - 1)!\, (|F_{b_j}| - 1)! \int_{\mathbb{R}} \hat F_{a_j}(u)\, \hat F_{b_j}(u)\, |u|\, du \Bigr) \\
&= (-2)^{n - \nu(F_{\mathrm{l.h.s.}})} \prod_{\ell \in S^c_{\mathrm{l.h.s.}}} Q_\ell \prod_{\ell \in S_2^c} P_\ell \cdot 2^{|S_2|/2} \prod_{j=1}^{|S_2|/2} \Bigl( (|F_{a_j}| - 1)!\, (|F_{b_j}| - 1)! \int_{\mathbb{R}} \hat F_{a_j}(u)\, \hat F_{b_j}(u)\, |u|\, du \Bigr).
\end{aligned} \tag{3.58}
\]
On the other hand, in (3.57), a typical term is specified by $F_{\mathrm{r.h.s.}}$, $S_{\mathrm{r.h.s.}}$, $T$, $[H_1, H_1^c], \dots, [H_{|T|}, H_{|T|}^c]$. Set
\[
\begin{aligned}
F_{\mathrm{r.h.s.}} &= \{F_\ell \mid \ell \in S^c_{\mathrm{l.h.s.}}\} \cup \{F_\ell \mid \ell \in S_2^c\} \cup \{F_{a_j} \cup F_{b_j} \mid j = 1, \dots, |S_2|/2\}, \\
H_1 &= F_{a_1}, \qquad\qquad\;\; H_1^c = F_{b_1}, \\
&\;\;\vdots \\
H_{|S_2|/2} &= F_{a_{|S_2|/2}}, \qquad H^c_{|S_2|/2} = F_{b_{|S_2|/2}}.
\end{aligned} \tag{3.59}
\]
$S_{\mathrm{r.h.s.}}$ and $T$ are chosen in the obvious way (so that both products of $Q$'s match, and both products of $P$'s match). Notice that $|T| = |S_2|/2$ and that $\nu(F_{\mathrm{l.h.s.}}) = \nu(F_{\mathrm{r.h.s.}}) + |S_2|/2$.
The contribution to (3.57) from this term is thus
\[
(-2)^{n - \nu(F_{\mathrm{l.h.s.}}) + |S_2|/2} \prod_{\ell \in S^c_{\mathrm{l.h.s.}}} Q_\ell \prod_{\ell \in S_2^c} P_\ell \cdot (-1)^{|S_2|/2} \prod_{j=1}^{|S_2|/2} \Bigl( (|F_{a_j}| - 1)!\, (|F_{b_j}| - 1)! \int_{\mathbb{R}} \hat F_{a_j}(u)\, \hat F_{b_j}(u)\, |u|\, du \Bigr), \tag{3.60}
\]
which is equal, because $|S_2|$ is even, to (3.58). So every term on the l.h.s. has a corresponding term on the r.h.s. Conversely, this method of matching (i.e., (3.59)) produces for every term on the r.h.s. its corresponding term on the l.h.s. (with the convention that we disregard, on the r.h.s., any term with $|T| \geq 1$ but $\sum_H$ empty; we can do so since these terms contribute nothing to (3.57)). Thus (3.13) = (3.38) and Theorem 3.1 is proved. □
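3.5. Examples

One term for n = 17
Let $n = 17$, and let
\[
\begin{aligned}
F_{\mathrm{l.h.s.}} &= [F_1, F_2, F_3, F_4, F_5, F_6, F_7] \\
&= [\{1, 2, 13\}, \{4\}, \{3, 6, 7, 9, 17\}, \{8, 10, 11\}, \{5, 12\}, \{14\}, \{15, 16\}], \\
S_{\mathrm{l.h.s.}} &= \{1, 2, 3, 5, 6\}, \qquad S^c_{\mathrm{l.h.s.}} = \{4, 7\}, \\
S_2 &= \{1, 2, 5, 6\}, \qquad\;\;\, S_2^c = \{3\}, \\
(A; B) &= (1, 5; 2, 6).
\end{aligned} \tag{3.61}
\]
This corresponds on the r.h.s. to (writing $F'_k$ for the sets making up $F_{\mathrm{r.h.s.}}$, to distinguish them from the $F_k$ of (3.61))
\[
\begin{aligned}
F_{\mathrm{r.h.s.}} &= [F'_1, F'_2, F'_3, F'_4, F'_5], \\
F'_1 &= F_4, \quad F'_2 = F_7, \quad F'_3 = F_3, \quad F'_4 = F_1 \cup F_2, \quad F'_5 = F_5 \cup F_6, \\
S_{\mathrm{r.h.s.}} &= \{3, 4, 5\}, \qquad S^c_{\mathrm{r.h.s.}} = \{1, 2\}, \\
T &= \{4, 5\}, \qquad\quad\; T^c = \{3\}, \\
H_1 &= F_1, \quad H_1^c = F_2, \quad H_2 = F_5, \quad H_2^c = F_6.
\end{aligned} \tag{3.62}
\]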
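The matching rule (3.59) is mechanical, so the $n = 17$ example can be replayed in a few lines. The sketch below is an illustration under stated assumptions: the function name `match_lhs_to_rhs` is hypothetical, and the ordering of the r.h.s. blocks ($S^c_{\mathrm{l.h.s.}}$ first, then $S_2^c$, then the merged pairs) is one labeling consistent with (3.62), since the text only says that $S_{\mathrm{r.h.s.}}$ and $T$ are chosen "in the obvious way".

```python
def match_lhs_to_rhs(F_lhs, S_lhs, S2, pairs):
    """Build r.h.s. data from l.h.s. data following (3.59).

    F_lhs : list of disjoint sets F_1, ..., F_nu (block l is F_lhs[l - 1])
    S_lhs : set of block indices
    S2    : subset of S_lhs of even size
    pairs : list of (a_j, b_j) pairing up the elements of S2
    """
    nu = len(F_lhs)

    def block(l):
        return frozenset(F_lhs[l - 1])

    S_c = sorted(set(range(1, nu + 1)) - set(S_lhs))  # blocks giving Q factors
    S2_c = sorted(set(S_lhs) - set(S2))               # blocks giving P factors
    assert sorted(x for p in pairs for x in p) == sorted(S2)

    # Assumed labeling: Q blocks, then P blocks, then merged pairs.
    F_rhs = (
        [block(l) for l in S_c]
        + [block(l) for l in S2_c]
        + [block(a) | block(b) for a, b in pairs]
    )
    q, p = len(S_c), len(S2_c)
    S_rhs = set(range(q + 1, len(F_rhs) + 1))
    T = set(range(q + p + 1, len(F_rhs) + 1))
    H = [(block(a), block(b)) for a, b in pairs]  # (H_j, H_j^c)
    return F_rhs, S_rhs, T, H


F_lhs = [{1, 2, 13}, {4}, {3, 6, 7, 9, 17}, {8, 10, 11}, {5, 12}, {14}, {15, 16}]
F_rhs, S_rhs, T, H = match_lhs_to_rhs(
    F_lhs, S_lhs={1, 2, 3, 5, 6}, S2={1, 2, 5, 6}, pairs=[(1, 2), (5, 6)]
)
```

With this labeling the output reproduces (3.62), and the identities $|T| = |S_2|/2$ and $\nu(F_{\mathrm{l.h.s.}}) = \nu(F_{\mathrm{r.h.s.}}) + |S_2|/2$ can be read off directly.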
Tables 3.1 and 3.2 show the correspondence between terms on the l.h.s. (as expressed in (3.58)) and the r.h.s. (as expressed in (3.60)).

3.6. Analogous results for $GL_M/\mathbb{Q}$
Let $L(s, \pi)$ be the L-function attached to a self-contragredient ($\pi = \tilde\pi$) automorphic cuspidal representation of $GL_M$ over $\mathbb{Q}$. Such an L-function is given initially (for $\Re s$ sufficiently large) as an Euler product of the form
\[
L(s, \pi) = \prod_p L(s, \pi_p) = \prod_p \prod_{j=1}^M \bigl(1 - \alpha_\pi(p, j)\, p^{-s}\bigr)^{-1}.
\]
The condition $\pi = \tilde\pi$ implies that $\alpha_\pi(p, j) \in \mathbb{R}$. The Rankin-Selberg L-function $L(s, \pi \otimes \tilde\pi)$ factors as the product of the symmetric and exterior square L-functions (see [1]):
\[
L(s, \pi \otimes \tilde\pi) = L(s, \pi \otimes \pi) = L(s, \pi, \vee^2)\, L(s, \pi, \wedge^2)
\]
and has a simple pole at $s = 1$ which is carried by one of the two factors. Write the order of the pole of $L(s, \pi, \wedge^2)$ as $(\delta(\pi) + 1)/2$ (so that $\delta(\pi) = \pm 1$). We desire to generalize Theorem 3.1 to the zeros of $L(s, \pi \otimes \chi_d)$ whose Euler product is given by
\[
L(s, \pi \otimes \chi_d) = \prod_p \prod_{j=1}^M \bigl(1 - \chi_d(p)\, \alpha_\pi(p, j)\, p^{-s}\bigr)^{-1}.
\]
Now, when $\pi = \tilde\pi$, $L(s, \pi \otimes \chi_d)$ has a functional equation of the form
\[
\Phi(s, \pi \otimes \chi_d) := \pi^{-Ms/2} \prod_{j=1}^M \Gamma\bigl((s + \mu_{\pi \otimes \chi_d}(j))/2\bigr)\, L(s, \pi \otimes \chi_d) = \varepsilon(s, \pi \otimes \chi_d)\, \Phi(1 - s, \pi \otimes \chi_d),
\]
where the $\mu_{\pi \otimes \chi_d}(j)$'s are complex numbers that are known to satisfy $\Re \mu_{\pi \otimes \chi_d}(j) > -1/2$ (and are conjectured to satisfy $\Re \mu_{\pi \otimes \chi_d}(j) \geq 0$). We also have
\[
\varepsilon(s, \pi \otimes \chi_d) = \varepsilon(\pi \otimes \chi_d)\, Q_{\pi \otimes \chi_d}^{-s + 1/2} = \pm Q_{\pi \otimes \chi_d}^{-s + 1/2}
\]
with $\varepsilon(\pi \otimes \chi_d) = \chi'(d)$, where $\chi'$ is a quadratic character that depends only on $\pi$. When $\delta(\pi) = -1$, all twists have $\varepsilon(\pi \otimes \chi_d) = 1$. If $\delta(\pi) = 1$, then half the $L(s, \pi \otimes \chi_d)$'s have $\varepsilon(\pi \otimes \chi_d) = 1$ and the other half have $\varepsilon(\pi \otimes \chi_d) = -1$ (with
Table 3.1. Matching the l.h.s. with the r.h.s. for n = 1, 2, 3. Here S_l.h.s. ⊆ {1, ..., ν(F_l.h.s.)}, S_2 ⊆ S_l.h.s., with |S_2| even. (A; B) accounts for all ways of pairing up S_2. Further, S_r.h.s. ⊆ {1, ..., ν(F_r.h.s.)}, T ⊆ S_r.h.s., and H is over all |T|-tuples [H_1, H_1^c], ..., [H_|T|, H_|T|^c]. The matching is as described in (3.59).

n   F_l.h.s.         S_l.h.s.   S_2     (A;B)   F_r.h.s.         S_r.h.s.   T     H
1   [{1}]            ∅          ∅       —       [{1}]            ∅          ∅     —
1   [{1}]            {1}        ∅       —       [{1}]            {1}        ∅     —
2   [{1,2}]          ∅          ∅       —       [{1,2}]          ∅          ∅     —
2   [{1,2}]          {1}        ∅       —       [{1,2}]          {1}        ∅     —
2   [{1},{2}]        ∅          ∅       —       [{1},{2}]        ∅          ∅     —
2   [{1},{2}]        {1}        ∅       —       [{1},{2}]        {1}        ∅     —
2   [{1},{2}]        {2}        ∅       —       [{1},{2}]        {2}        ∅     —
2   [{1},{2}]        {1,2}      ∅       —       [{1},{2}]        {1,2}      ∅     —
2   [{1},{2}]        {1,2}      {1,2}   (1;2)   [{1,2}]          {1}        {1}   [{1},{2}]
3   [{1,2,3}]        ∅          ∅       —       [{1,2,3}]        ∅          ∅     —
3   [{1,2,3}]        {1}        ∅       —       [{1,2,3}]        {1}        ∅     —
3   [{1,2},{3}]      ∅          ∅       —       [{1,2},{3}]      ∅          ∅     —
3   [{1,2},{3}]      {1}        ∅       —       [{1,2},{3}]      {1}        ∅     —
3   [{1,2},{3}]      {2}        ∅       —       [{1,2},{3}]      {2}        ∅     —
3   [{1,2},{3}]      {1,2}      ∅       —       [{1,2},{3}]      {1,2}      ∅     —
3   [{1,2},{3}]      {1,2}      {1,2}   (1;2)   [{1,2,3}]        {1}        {1}   [{1,2},{3}]
3   [{1,3},{2}]      ∅          ∅       —       [{1,3},{2}]      ∅          ∅     —
3   [{1,3},{2}]      {1}        ∅       —       [{1,3},{2}]      {1}        ∅     —
3   [{1,3},{2}]      {2}        ∅       —       [{1,3},{2}]      {2}        ∅     —
3   [{1,3},{2}]      {1,2}      ∅       —       [{1,3},{2}]      {1,2}      ∅     —
3   [{1,3},{2}]      {1,2}      {1,2}   (1;2)   [{1,2,3}]        {1}        {1}   [{1,3},{2}]
3   [{2,3},{1}]      ∅          ∅       —       [{2,3},{1}]      ∅          ∅     —
3   [{2,3},{1}]      {1}        ∅       —       [{2,3},{1}]      {1}        ∅     —
3   [{2,3},{1}]      {2}        ∅       —       [{2,3},{1}]      {2}        ∅     —
3   [{2,3},{1}]      {1,2}      ∅       —       [{2,3},{1}]      {1,2}      ∅     —
3   [{2,3},{1}]      {1,2}      {1,2}   (1;2)   [{1,2,3}]        {1}        {1}   [{2,3},{1}]
3   [{1},{2},{3}]    ∅          ∅       —       [{1},{2},{3}]    ∅          ∅     —
3   [{1},{2},{3}]    {1}        ∅       —       [{1},{2},{3}]    {1}        ∅     —
3   [{1},{2},{3}]    {2}        ∅       —       [{1},{2},{3}]    {2}        ∅     —
3   [{1},{2},{3}]    {3}        ∅       —       [{1},{2},{3}]    {3}        ∅     —
3   [{1},{2},{3}]    {1,2}      ∅       —       [{1},{2},{3}]    {1,2}      ∅     —
3   [{1},{2},{3}]    {1,2}      {1,2}   (1;2)   [{1,2},{3}]      {1}        {1}   [{1},{2}]
3   [{1},{2},{3}]    {1,3}      ∅       —       [{1},{2},{3}]    {1,3}      ∅     —
3   [{1},{2},{3}]    {1,3}      {1,3}   (1;3)   [{1,3},{2}]      {1}        {1}   [{1},{3}]
3   [{1},{2},{3}]    {2,3}      ∅       —       [{1},{2},{3}]    {2,3}      ∅     —
3   [{1},{2},{3}]    {2,3}      {2,3}   (2;3)   [{2,3},{1}]      {1}        {1}   [{2},{3}]
3   [{1},{2},{3}]    {1,2,3}    ∅       —       [{1},{2},{3}]    {1,2,3}    ∅     —
3   [{1},{2},{3}]    {1,2,3}    {1,2}   (1;2)   [{1,2},{3}]      {1,2}      {1}   [{1},{2}]
3   [{1},{2},{3}]    {1,2,3}    {1,3}   (1;3)   [{1,3},{2}]      {1,2}      {1}   [{1},{3}]
3   [{1},{2},{3}]    {1,2,3}    {2,3}   (2;3)   [{2,3},{1}]      {1,2}      {1}   [{2},{3}]
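The rows of Table 3.1 can be regenerated by enumerating the l.h.s. data: a set partition F of {1, ..., n}, a subset S of its blocks, an even-sized subset S_2 of S, and a pairing (A; B) of S_2. The following sketch counts these terms; it assumes that (A; B) ranges over unordered pairings of S_2, which is all that can occur for n ≤ 3, and the helper names are illustrative.

```python
from itertools import combinations


def set_partitions(elems):
    """Yield all partitions of the list elems into nonempty blocks."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in set_partitions(rest):
        # Put `first` into an existing block ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + part


def pairings(items):
    """Yield all ways of splitting items into unordered pairs."""
    if not items:
        yield []
        return
    a, rest = items[0], items[1:]
    for i, b in enumerate(rest):
        for tail in pairings(rest[:i] + rest[i + 1:]):
            yield [(a, b)] + tail


def subsets(items):
    """Yield all subsets of items as tuples."""
    for r in range(len(items) + 1):
        yield from combinations(items, r)


def count_lhs_terms(n):
    """Number of quadruples (F, S, S_2, (A;B)) for a given n."""
    total = 0
    for F in set_partitions(list(range(1, n + 1))):
        blocks = list(range(1, len(F) + 1))
        for S in subsets(blocks):
            for S2 in subsets(S):
                if len(S2) % 2 == 0:
                    total += sum(1 for _ in pairings(list(S2)))
    return total


counts = [count_lhs_terms(n) for n in (1, 2, 3)]
```

The counts agree with the number of rows listed in Table 3.1 for n = 1, 2, 3.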
Table 3.2. Terms on the r.h.s. that are discarded since they contribute nothing to (3.57).

n   F_r.h.s.         S_r.h.s.   T         H
1   [{1}]            {1}        {1}       none
2   [{1},{2}]        {1}        {1}       none
2   [{1},{2}]        {2}        {2}       none
2   [{1},{2}]        {1,2}      {1,2}     none
3   [{1,2},{3}]      {2}        {2}       none
3   [{1,2},{3}]      {1,2}      {2}       none
3   [{1,2},{3}]      {1,2}      {1,2}     none
3   [{1,3},{2}]      {2}        {2}       none
3   [{1,3},{2}]      {1,2}      {2}       none
3   [{1,3},{2}]      {1,2}      {1,2}     none
3   [{2,3},{1}]      {2}        {2}       none
3   [{2,3},{1}]      {1,2}      {2}       none
3   [{2,3},{1}]      {1,2}      {1,2}     none
3   [{1},{2},{3}]    {1}        {1}       none
3   [{1},{2},{3}]    {2}        {2}       none
3   [{1},{2},{3}]    {3}        {3}       none
3   [{1},{2},{3}]    {1,2}      {1}       none
3   [{1},{2},{3}]    {1,2}      {2}       none
3   [{1},{2},{3}]    {1,2}      {1,2}     none
3   [{1},{2},{3}]    {1,3}      {1}       none
3   [{1},{2},{3}]    {1,3}      {3}       none
3   [{1},{2},{3}]    {1,3}      {1,3}     none
3   [{1},{2},{3}]    {2,3}      {2}       none
3   [{1},{2},{3}]    {2,3}      {3}       none
3   [{1},{2},{3}]    {2,3}      {2,3}     none
3   [{1},{2},{3}]    {1,2,3}    {1}       none
3   [{1},{2},{3}]    {1,2,3}    {2}       none
3   [{1},{2},{3}]    {1,2,3}    {3}       none
3   [{1},{2},{3}]    {1,2,3}    {1,2}     none
3   [{1},{2},{3}]    {1,2,3}    {1,3}     none
3   [{1},{2},{3}]    {1,2,3}    {2,3}     none
3   [{1},{2},{3}]    {1,2,3}    {1,2,3}   none
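the corresponding $d$'s lying in fixed arithmetic progressions to the modulus of the character $\chi'$). When $\varepsilon(\pi \otimes \chi_d) = 1$, we write the nontrivial zeros of $L(s, \pi \otimes \chi_d)$ as
\[
1/2 + i\gamma^{(j)}_{\pi \otimes \chi_d}, \qquad j = \pm 1, \pm 2, \pm 3, \dots,
\]
with
\[
\cdots \leq \Re\gamma^{(-2)}_{\pi \otimes \chi_d} \leq \Re\gamma^{(-1)}_{\pi \otimes \chi_d} \leq 0 \leq \Re\gamma^{(1)}_{\pi \otimes \chi_d} \leq \Re\gamma^{(2)}_{\pi \otimes \chi_d} \leq \cdots
\]
and
\[
\gamma^{(-k)}_{\pi \otimes \chi_d} = -\gamma^{(k)}_{\pi \otimes \chi_d}.
\]
When $\varepsilon(\pi \otimes \chi_d) = -1$, $\gamma = 0$ is a zero of $L(s, \pi \otimes \chi_d)$, and we index the zeros as
\[
1/2 + i\gamma^{(j)}_{\pi \otimes \chi_d}, \qquad j \in \mathbb{Z},
\]
with
\[
\cdots \leq \Re\gamma^{(-2)}_{\pi \otimes \chi_d} \leq \Re\gamma^{(-1)}_{\pi \otimes \chi_d} \leq \gamma^{(0)}_{\pi \otimes \chi_d} = 0 \leq \Re\gamma^{(1)}_{\pi \otimes \chi_d} \leq \Re\gamma^{(2)}_{\pi \otimes \chi_d} \leq \cdots
\]
and
\[
\gamma^{(-k)}_{\pi \otimes \chi_d} = -\gamma^{(k)}_{\pi \otimes \chi_d}.
\]
Next, let $D(X)$ be as in (3.4), and let
\[
D_{\pi,+}(X) = \{d \in D(X) : \varepsilon(\pi \otimes \chi_d) = 1\}, \qquad D_{\pi,-}(X) = \{d \in D(X) : \varepsilon(\pi \otimes \chi_d) = -1\}.
\]
Then, assuming, for $M \geq 4$, the Ramanujan conjecture $|\alpha_\pi(p, j)| \leq 1$, we have the following theorem.

THEOREM 3.2
Let $f(x_1, \dots, x_n) = \prod_{i=1}^n f_i(x_i)$ be even in all its variables with each $f_i$ in $S(\mathbb{R})$. Assume further that $\hat f(u_1, \dots, u_n)$ is supported in $\sum_{i=1}^n |u_i| < 1/M$. Then if $\delta(\pi) = 1$,
\[
\lim_{X \to \infty} \frac{1}{|D_{\pi,\pm}(X)|} \sum_{d \in D_{\pi,\pm}(X)} \; \sideset{}{^*}\sum_{j_1, \dots, j_n} f\bigl(L_M \gamma^{(j_1)}_{\pi \otimes \chi_d}, L_M \gamma^{(j_2)}_{\pi \otimes \chi_d}, \dots, L_M \gamma^{(j_n)}_{\pi \otimes \chi_d}\bigr) = \int_{\mathbb{R}^n} f(x)\, W^{(n)}_{\pm,O}(x)\, dx, \tag{3.63}
\]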
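and if $\delta(\pi) = -1$ (so that all twists have $\varepsilon(\pi \otimes \chi_d) = 1$),
\[
\lim_{X \to \infty} \frac{1}{|D(X)|} \sum_{d \in D(X)} \; \sideset{}{^*}\sum_{j_1, \dots, j_n} f\bigl(L_M \gamma^{(j_1)}_{\pi \otimes \chi_d}, L_M \gamma^{(j_2)}_{\pi \otimes \chi_d}, \dots, L_M \gamma^{(j_n)}_{\pi \otimes \chi_d}\bigr) = \int_{\mathbb{R}^n} f(x)\, W^{(n)}_{\mathrm{USp}}(x)\, dx, \tag{3.64}
\]
where
\[
\begin{aligned}
L_M &= \frac{M \log X}{2\pi}, \\
W^{(n)}_{\mathrm{USp}}(x_1, \dots, x_n) &= \det\bigl(K_{-1}(x_j, x_k)\bigr)_{1 \leq j, k \leq n}, \\
W^{(n)}_{+,O}(x_1, \dots, x_n) &= \det\bigl(K_1(x_j, x_k)\bigr)_{1 \leq j, k \leq n}, \\
W^{(n)}_{-,O}(x_1, \dots, x_n) &= \det\bigl(K_{-1}(x_j, x_k)\bigr)_{1 \leq j, k \leq n} + \sum_{\nu=1}^n \delta(x_\nu) \det\bigl(K_{-1}(x_j, x_k)\bigr)_{1 \leq j \neq \nu \leq n,\; 1 \leq k \neq \nu \leq n}, \\
K_\varepsilon(x, y) &= \frac{\sin(\pi(x - y))}{\pi(x - y)} + \varepsilon\, \frac{\sin(\pi(x + y))}{\pi(x + y)}
\end{aligned}
\]
($W^{(1)}_{-,O}(x) = 1 - \sin(2\pi x)/(2\pi x) + \delta(x)$) and where $\sum^*_{j_1, \dots, j_n}$ is over $j_k = (0), \pm 1, \pm 2, \dots$, with $j_{k_1} \neq \pm j_{k_2}$ if $k_1 \neq k_2$.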
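For readers who want to experiment with these densities, the kernel $K_\varepsilon$ and the determinantal part of the $n$-level densities are easy to evaluate numerically. The sketch below is illustrative only: the names `sinc`, `K`, `det` and `W` are choices made here, and the $\delta(x_\nu)$ terms of $W^{(n)}_{-,O}$ are distributions and are deliberately left out.

```python
import math


def sinc(t):
    """sin(pi t)/(pi t), with the removable singularity at t = 0 filled in."""
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)


def K(eps, x, y):
    """The kernel K_eps(x, y) of Theorem 3.2, for eps = 1 or -1."""
    return sinc(x - y) + eps * sinc(x + y)


def det(mat):
    """Determinant by Laplace expansion along the first row (small n only)."""
    n = len(mat)
    if n == 0:
        return 1.0
    if n == 1:
        return mat[0][0]
    total = 0.0
    for c in range(n):
        minor = [row[:c] + row[c + 1:] for row in mat[1:]]
        total += (-1) ** c * mat[0][c] * det(minor)
    return total


def W(eps, xs):
    """det(K_eps(x_j, x_k)): W_USp for eps = -1, W_{+,O} for eps = +1."""
    return det([[K(eps, xj, xk) for xk in xs] for xj in xs])


# One-level check: K_eps(x, x) = 1 + eps * sin(2 pi x)/(2 pi x).
x = 0.25
assert abs(W(-1, [x]) - (1 - math.sin(2 * math.pi * x) / (2 * math.pi * x))) < 1e-12
assert abs(W(+1, [x]) - (1 + math.sin(2 * math.pi * x) / (2 * math.pi * x))) < 1e-12
```

For $\varepsilon = -1$ the kernel vanishes identically when one argument is 0, so the determinantal part of the density vanishes whenever some $x_j = 0$.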
Remark. Again, as in Theorem 3.1, the assumptions $f_i$ even and $f$ of the form $\prod f_i$ can be removed.
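Proof
The proof is similar to that of Theorem 3.1. The main difference is in the explicit formula that, for $L(s, \pi \otimes \chi_d)$, reads
\[
\sum_{\gamma_{\pi \otimes \chi_d}} F_\ell\bigl(L_M \gamma_{\pi \otimes \chi_d}\bigr) = \int_{\mathbb{R}} F_\ell(x)\, dx + O(1/\log X) - \frac{2}{M \log X} \sum_{m=1}^\infty \frac{\Lambda(m)\, a_\pi(m)}{m^{1/2}}\, \chi_d(m)\, \hat F_\ell\Bigl(\frac{\log m}{M \log X}\Bigr), \tag{3.65}
\]
where
\[
a_\pi(p^k) = \sum_{j=1}^M \alpha_\pi^k(p, j).
\]
We consider the two cases, $\delta(\pi) = -1$ and $\delta(\pi) = 1$, separately.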
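For both cases we require the estimates
\[
\begin{aligned}
\sum_{m \leq T} |a_\pi(m) \Lambda(m)|^2 / m &\sim \log^2(T)/2, \\
\sum_{p \leq T} a_\pi(p^2) \log p &\sim -\delta(\pi)\, T, \\
\sum_{p \leq T} |a_\pi(p) \log p|^2 / p &\sim \log^2(T)/2
\end{aligned} \tag{3.66}
\]
(see [15] and [6]). For these estimates, and $M \geq 4$, the Ramanujan conjecture is assumed; these three are needed in the analogs of Claim 2, Subclaim 3.22, and Subclaim 3.24.

When $\delta(\pi) = -1$, all twists have $\varepsilon(\pi \otimes \chi_d) = 1$. The combinatorics work out exactly the same. The smaller support of $\hat f$ compensates for the presence of the $M$ in the explicit formula.

When $\delta(\pi) = 1$, we need to examine the two subcases, $\varepsilon(\pi \otimes \chi_d) = 1$ and $\varepsilon(\pi \otimes \chi_d) = -1$, separately. As the analog of Lemma 1, we have the following lemma.

LEMMA 7
When $\delta(\pi) = 1$,
\[
\begin{aligned}
&\lim_{X \to \infty} \frac{1}{|D_{\pi,+}(X)|} \sum_{d \in D_{\pi,+}(X)} \prod_{j=1}^k \Biggl( \frac{-2}{M \log X} \sum_{m=1}^\infty \frac{\Lambda(m)\, a_\pi(m)}{m^{1/2}}\, \chi_d(m)\, \hat F_{\ell_j}\Bigl(\frac{\log m}{M \log X}\Bigr) \Biggr) \\
&\qquad = \sum_{\substack{S_2 \subseteq S \\ |S_2|\ \mathrm{even}}} \Biggl( \Bigl(\frac{1}{2}\Bigr)^{|S_2^c|} \prod_{\ell \in S_2^c} \int_{\mathbb{R}} \hat F_\ell(u)\, du \Biggr) \cdot \Biggl( \sum_{(A;B)} 2^{|S_2|/2} \prod_{j=1}^{|S_2|/2} \int_{\mathbb{R}} |u|\, \hat F_{a_j}(u)\, \hat F_{b_j}(u)\, du \Biggr),
\end{aligned} \tag{3.67}
\]
where $S = \{\ell_1, \dots, \ell_k\}$. The sum over $S_2 \subseteq S$, $|S_2|$ even, is over all subsets $S_2$ of $S$ whose size is even. $\sum_{(A;B)}$ is over all ways of pairing up the elements of $S_2$. $F_\ell(x)$ is defined in (3.11).

Proof
Notice that the only difference in the r.h.s. of this lemma as compared to Lemma 1 is in the factor
\[
\Bigl(\frac{1}{2}\Bigr)^{|S_2^c|} \prod_{\ell \in S_2^c} \int_{\mathbb{R}} \hat F_\ell(u)\, du.
\]
The difference in sign is accounted for by the opposite sign in (3.66).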
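So, we have that the l.h.s. of (3.63), for $D_{\pi,+}$, tends, as $X \to \infty$, to
\[
\sum_F (-2)^{n - \nu(F)} \Biggl( \prod_{\ell=1}^{\nu(F)} (|F_\ell| - 1)! \Biggr) \sum_S \Biggl( \prod_{\ell \in S^c} \int_{\mathbb{R}} F_\ell(x)\, dx \Biggr) \cdot \sum_{\substack{S_2 \subseteq S \\ |S_2|\ \mathrm{even}}} \Biggl( \Bigl(\frac{1}{2}\Bigr)^{|S_2^c|} \prod_{\ell \in S_2^c} \int_{\mathbb{R}} \hat F_\ell(u)\, du \Biggr) \cdot \Biggl( \sum_{(A;B)} 2^{|S_2|/2} \prod_{j=1}^{|S_2|/2} \int_{\mathbb{R}} |u|\, \hat F_{a_j}(u)\, \hat F_{b_j}(u)\, du \Biggr).
\]
This expression matches (3.38) with $\varepsilon = 1$; that is, this equals (in the notation of Section 3.3)
\[
\int_{\mathbb{R}^n} f(x)\, W_+(x)\, dx = \int_{\mathbb{R}^n} f(x)\, W^{(n)}_{+,O}(x)\, dx.
\]
For the $\varepsilon(\pi \otimes \chi_d) = -1$ case, there is always a zero at $s = 1/2$,
\[
\gamma^{(0)}_{\pi \otimes \chi_d} = 0,
\]
and, before applying the combinatorial sieving of Section 3.2, we need to isolate this zero. Now
\[
\begin{aligned}
\sideset{}{^*}\sum_{j_1, \dots, j_n} f\bigl(L_M \gamma^{(j_1)}_{\pi \otimes \chi_d}, L_M \gamma^{(j_2)}_{\pi \otimes \chi_d}, \dots, L_M \gamma^{(j_n)}_{\pi \otimes \chi_d}\bigr)
&= \sideset{}{^*}\sum_{j_1 \neq 0, \dots, j_n \neq 0} f\bigl(L_M \gamma^{(j_1)}_{\pi \otimes \chi_d}, L_M \gamma^{(j_2)}_{\pi \otimes \chi_d}, \dots, L_M \gamma^{(j_n)}_{\pi \otimes \chi_d}\bigr) \\
&\quad + \sum_{\nu=1}^n \; \sideset{}{^*}\sum_{\substack{j_\nu = 0 \\ j_k \neq 0,\ k \neq \nu}} f\bigl(L_M \gamma^{(j_1)}_{\pi \otimes \chi_d}, \dots, L_M \gamma^{(j_{\nu-1})}_{\pi \otimes \chi_d}, 0, L_M \gamma^{(j_{\nu+1})}_{\pi \otimes \chi_d}, \dots, L_M \gamma^{(j_n)}_{\pi \otimes \chi_d}\bigr).
\end{aligned}
\]
We only focus on the first sum on the r.h.s. above. The same technique applies to the remaining sums.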
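By combinatorial sieving and the explicit formula, we find that
\[
\begin{aligned}
&\sideset{}{^*}\sum_{j_1 \neq 0, \dots, j_n \neq 0} f\bigl(L_M \gamma^{(j_1)}_{\pi \otimes \chi_d}, L_M \gamma^{(j_2)}_{\pi \otimes \chi_d}, \dots, L_M \gamma^{(j_n)}_{\pi \otimes \chi_d}\bigr) \\
&\qquad = \sum_F (-2)^{n - \nu(F)} \prod_{\ell=1}^{\nu(F)} (|F_\ell| - 1)! \cdot \Biggl( \int_{\mathbb{R}} F_\ell(x)\, dx - \frac{2}{M \log X} \sum_{m=1}^\infty \frac{\Lambda(m)\, a_\pi(m)}{m^{1/2}}\, \chi_d(m)\, \hat F_\ell\Bigl(\frac{\log m}{M \log X}\Bigr) - F_\ell(0) + O(1/\log X) \Biggr).
\end{aligned} \tag{3.68}
\]
But, by (3.66),
\[
-F_\ell(0) = \lim_{X \to \infty} \frac{4}{M \log X} \sum_p \frac{\Lambda(p^2)\, a_\pi(p^2)}{p}\, \hat F_\ell\Bigl(\frac{2 \log p}{M \log X}\Bigr),
\]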
and this has the effect, in (3.68), of changing the sign of the contribution from the squares of primes.

Acknowledgments. I wish to thank Peter Sarnak for involving me in this project, and Zeev Rudnick and Andrew Odlyzko for many discussions and comments. I thank Rudnick further for inviting me to warm and sunny Israel, where part of this work was done.

References
[1] D. BUMP and D. GINZBURG, Symmetric square L-functions on GL(r), Ann. of Math. (2) 136 (1992), 137–205. MR 93i:11058
[2] S. GELBART, An elementary introduction to the Langlands program, Bull. Amer. Math. Soc. (N.S.) 10 (1984), 177–219. MR 85e:11094
[3] A. E. INGHAM, The Distribution of Prime Numbers, Cambridge Math. Lib., Cambridge Univ. Press, Cambridge, 1990. MR 91f:11064
[4] M. JUTILA, On the mean value of L(1/2, χ) for real characters, Analysis 1 (1981), 149–161. MR 82m:10065
[5] N. M. KATZ and P. SARNAK, Random Matrices, Frobenius Eigenvalues, and Monodromy, Amer. Math. Soc. Colloq. Publ. 45, Amer. Math. Soc., Providence, 1999. MR 2000b:11070
[6] N. M. KATZ and P. SARNAK, Zeroes of zeta functions and symmetry, Bull. Amer. Math. Soc. (N.S.) 36 (1999), 1–26. MR 2000f:11114
[7] A. W. KNAPP, "Introduction to the Langlands program" in Representation Theory and Automorphic Forms (Edinburgh, 1996), Proc. Sympos. Pure Math. 61, Amer. Math. Soc., Providence, 1997, 245–302. MR 99d:11123
[8] M. L. MEHTA, Random Matrices, 2d ed., Academic Press, Boston, 1991. MR 92f:82002
[9] H. L. MONTGOMERY, "The pair correlation of zeros of the zeta function" in Analytic Number Theory (St. Louis, Mo., 1972), Proc. Sympos. Pure Math. 24, Amer. Math. Soc., Providence, 1973, 181–193. MR 49:2590
[10] M. R. MURTY, "A motivated introduction to the Langlands program" in Advances in Number Theory (Kingston, Ontario, 1991), Oxford Sci. Publ., Oxford Univ. Press, New York, 1993, 37–66. MR 96j:11157
[11] A. M. ODLYZKO, On the distribution of spacings between zeros of the zeta function, Math. Comp. 48 (1987), 273–308. MR 88d:11082
[12] A. M. ODLYZKO, The 10^20-th zero of the Riemann zeta function and 70 million of its neighbors, A.T.&T., 1989, http://www.research.att.com/~amo/unpublished/index.html
[13] A. E. ÖZLÜK and C. SNYDER, Small zeros of quadratic L-functions, Bull. Austral. Math. Soc. 47 (1993), 307–319. MR 94c:11080
[14] M. RUBINSTEIN, Evidence for a spectral interpretation of the zeros of L-functions, Ph.D. thesis, Princeton Univ., 1998, http://www.ma.utexas.edu/users/miker/thesis/thesis.html
[15] Z. RUDNICK and P. SARNAK, Zeros of principal L-functions and random matrix theory, Duke Math. J. 81 (1996), 269–322. MR 97f:11074
Department of Mathematics, University of Texas at Austin, Austin, Texas 78705, USA;
[email protected]; current: American Institute of Mathematics, 360 Portage Avenue, Palo Alto, California 94306, USA.
DUKE MATHEMATICAL JOURNAL c 2001 Vol. 109, No. 1,
HOMOTOPICAL DYNAMICS, III: REAL SINGULARITIES AND HAMILTONIAN FLOWS OCTAVIAN CORNEA
Abstract
On the space of nondepraved (see [8]) real, isolated singularities, we consider the stable equivalence relation induced by smooth deformations whose asymptotic behaviour is controlled by the Palais-Smale condition. It is shown that the resulting space of equivalence classes admits a canonical semiring structure and is isomorphic to the semiring of stable homotopy classes of CW-complexes. In an application to Hamiltonian dynamics, we relate the existence of bounded and periodic orbits on noncompact level hypersurfaces of Palais-Smale Hamiltonians with just one singularity that is nondepraved to the lack of self-duality (in the sense of E. Spanier and J. Whitehead) of the sublink of the singularity.

Received 23 February 2000. Revision received 10 September 2000.
2000 Mathematics Subject Classification. Primary 37B30, 57R45; Secondary 37J45.

1. Introduction
This paper is concerned with the dynamics of the gradient and Hamiltonian flows associated to real, isolated, nondepraved (in the sense of M. Goresky and R. MacPherson [8]) singularities. This is a fairly general class of singularities that includes locally analytic singularities. Its definition is recalled in the second section.

1.1. Deformations
We first study an equivalence relation on the set of these singularities. The motivation and origin of this relation lie in nonlinear analysis and the calculus of variations and, in particular, in the notion of continuation in Conley index theory. From our point of view, two nondepraved singular germs $f$ and $g$ are equivalent if the asymptotic geometric behaviour of their gradient flows is (stably and uniformly) cobordant. One can formalize this as follows. For the two germs $f$ and $g$, consider extensions to $\mathbb{R}^n$ that do not add any new critical points and that have asymptotically ample gradients that define global flows. (The "ampleness" condition is that they satisfy a version of the Palais-Smale condition; it is shown that such extensions always exist.) Such extensions are called Palais-Smale (PS)-extensions. The two germs are stably
Palais-Smale equivalent if there are such extensions that, after adding a quadratic form in independent variables, can be deformed one to the other via a deformation that satisfies a uniform variant of the condition above. Let $\mathrm{Sing}^{PS}_n$ be the set of equivalence classes of germs of nondepraved singularities $f : D^n \to \mathbb{R}$ with respect to stable Palais-Smale equivalence. (It is a nontrivial fact that this is indeed an equivalence relation.) As only asymptotic restrictions are imposed along the deformation relating $f$ to $g$ and, therefore, bifurcations and nonisolated singular points can occur, it might appear that this equivalence relation is very weak. The first key idea of this paper is that, in fact, it is precisely as rigid as needed to reduce the study of these singularities to homotopy theory.

To state the first main theorem, let $CW^S_n$ be the abelian semigroup of stable homotopy classes of CW-complexes that admit a thickening (see [23]) in $S^{n-1}$. (The operation is the wedge.) This semigroup supports an obvious involution induced by $X \to X^*$ with $X^*$ the complement of $X$ in $S^{n-1}$. There is also an operation $CW^S_n \times CW^S_k \to CW^S_{k+n}$ given by the join of complexes. This is again compatible with the involution. The equatorial inclusion $S^{n-1} \hookrightarrow S^n$ allows one to assemble this data in a semiring $CW^S_*$.

THEOREM 1.1
On $\mathrm{Sing}^{PS}_n$ there is a "fusion of singularities" operation such that the application associating to a singularity germ $f$ its sublink $A_f$ induces a semigroup isomorphism $\Phi_n : \mathrm{Sing}^{PS}_n \approx CW^S_n$ compatible with the involution $f \to -f$. Moreover, there is also an "exterior sum operation" $\oplus : \mathrm{Sing}^{PS}_n \times \mathrm{Sing}^{PS}_k \to \mathrm{Sing}^{PS}_{n+k}$ induced by $(f, g) \to f + g$. There are semigroup inclusions $\mathrm{Sing}^{PS}_n \hookrightarrow \mathrm{Sing}^{PS}_{n+1}$ given by $[f(x)] \to [f(x)] \oplus [y^2]$ with $(x, y) \in \mathbb{R}^n \times \mathbb{R}$. Let $\mathrm{Sing}^{PS}_*$ be the resulting filtered semiring. The maps $\Phi_*$ induce a semiring isomorphism $\mathrm{Sing}^{PS}_* \approx CW^S_*$.

For a nondepraved germ $f$ and for a sufficiently small $\delta > 0$, the sublink $A_f$ of the singularity is the homeomorphism type of the space $f^{-1}(-\infty, f(0)] \cap S(\delta, 0)$ and is independent of $\delta$. (The argument is recalled in the second section.) The Conley index of the critical point of $f$ is the homotopy type of $\Sigma A_f$. Therefore, it results from Theorem 1.1 that if the Conley indexes of two nondepraved critical points are the same, then, after possibly adding some quadratic form, the two singularity germs can be related by continuation. This gives a partial answer to C. Conley [1, §IV.8.1, question A].
PS-DEFORMATIONS OF REAL SINGULARITIES
185
1.2. Application to Hamiltonian dynamics
Our methods apply to $C^2$-Hamiltonians $f : \mathbb{R}^{2n} \to \mathbb{R}$ that are Palais-Smale and that have a single singularity that is of the type above. In other words, we consider the Hamiltonian flow associated to a $C^2$-PS-extension of some nondepraved singular germ $f$ with respect to some symplectic form $\omega$ on $\mathbb{R}^{2n}$.

The major question about this flow is whether it has closed orbits on some constant energy hypersurface of the form $S_a = f^{-1}(a)$. Most known results detecting closed orbits of Hamiltonian flows apply to the case when $S_a$ is compact (for a survey, see [9]). Compactness of $S_a$ corresponds in our context to the very special case of $f$ being a local extremum. We assume in the following that this is not the case. Then the hypersurface $S_a$ is noncompact (for all $a$). In this situation it is natural to look first for the existence of bounded orbits.

A second central idea of the paper is that the phenomenon of persistent, infinitely many bounded orbits is implied by a lack of self-duality. Here is how to make this idea precise. We say that a nondepraved germ $f$ has bounded orbits if the following property holds for any symplectic form $\omega$: for any PS-extension of $f$, the associated $\omega$-Hamiltonian flow has an orbit inside $D(R, 0)$ which intersects $S(R, 0)$ nontrivially. ($D(R, 0)$ is the closed euclidean disk of radius $R$ and center zero, and $S(R, 0)$ is its boundary.) We say that a nondepraved germ $f$ is self-dual if $f$ and $-f$ are stably PS-equivalent.

THEOREM 1.2
Assume $f : D^{2n} \to \mathbb{R}$ is a nondepraved germ. If $f$ is not self-dual, then $f$ has bounded orbits.

We do not know under what precise circumstances the presence of self-duality leads to the nonexistence of bounded orbits. Notice that, by Theorem 1.1, $f$ is self-dual if and only if $A_f$ is self-dual in the sense of Spanier and Whitehead [20] (or, in other words, possibly after some suspensions, $A_f$ and the closure of its complement in $S^{n-1}$ have the same homotopy type). Therefore, lack of self-duality depends only on the stable PS-equivalence class of $f$. Other consequences of the fundamental relation between flow inversion and Spanier-Whitehead duality have been exploited in the first papers of this series (see [4], [5]).

We are also interested in detecting bounded orbits that persist not only under PS-deformations but also with respect to stabilization. We say that the nondepraved germ $f$ has stable bounded orbits if, for any nondegenerate quadratic form $q$, any nondepraved germ in $[f] \oplus [q]$ has bounded orbits.
It is natural to neglect self-dual germs. Consider the following equivalence relation on the space of all singular nondepraved germs. We say that two such germs $f$ and $g$ are tamely equivalent if there are self-dual, nondepraved germs $T_i$ and nondegenerate quadratic forms $q_i$ for $i = 1, 2$ such that $f + T_1 + q_1$ is stably Palais-Smale equivalent to $g + T_2 + q_2$. (This is easily seen to be an equivalence relation.) Let $\mathrm{Sing}^T$ be the resulting set of equivalence classes. Let $CW$ be the set of equivalence classes obtained from $CW^S_*$ by identifying $X$ and $\Sigma X$ for each CW-complex $X$ and by inverting all self-dual complexes. ($\Sigma X$ is the suspension of $X$.) It is easy to see that $CW$ is a group.

COROLLARY 1.3
The exterior sum produces a group structure on $\mathrm{Sing}^T$, and the semiring morphism $\Phi$ induces a group isomorphism $\Phi : \mathrm{Sing}^T \to CW$. If $\Phi(f) \neq 0$, then $f$ has stable bounded orbits.

Obviously, we are also interested in producing some closed orbits. We understand here by closed orbit one in the usual sense or one that closes at a stationary point.

COROLLARY 1.4
Let $f : D^{2n} \to \mathbb{R}$ be a nondepraved germ. Every PS-extension of $f$ has a Whitney $C^2$-neighbourhood containing a dense family of functions, each of whose induced Hamiltonian flow has at least $\mathrm{rk}(H_*(A_f; \mathbb{Z})) - \mathrm{rk}(H_{n-1}(A_f; \mathbb{Z}))$ nonconstant closed orbits.
1.3. Structure of the paper
The proofs of Theorems 1.1 and 1.2 are based on elements of singularity theory and some basic facts concerning the Conley index (see [1], [19]). That these two techniques have something in common has been suggested before, for example, in [8] and [2]. The steps leading to these proofs have some interest in themselves. They are presented as follows.

Section 2 contains, besides basic reviews of properties of nondepraved singularities and the Palais-Smale condition, one key fact relating these two notions: any nondepraved germ admits a PS-extension.

The purpose of Section 3 is to prove that if two nondepraved germs $f$, $g$ are related by a deformation satisfying a Palais-Smale-type condition, then $\Sigma A_f \simeq \Sigma A_g$ (where $\simeq$ is homotopy equivalence). Conversely, if two germs have sublinks of the same stable homotopy type, then their stabilizations can be related by a PS-deformation. For the proof of the first implication, we make use of the continuation properties of the Conley index. As a corollary we introduce stable PS-equivalence.

In Section 4 we describe the operations in $\mathrm{Sing}^{PS}_*$ and prove Theorem 1.1. We then discuss the application to Hamiltonian dynamics and prove Theorem 1.2 and its corollaries. This again makes use of the properties of the Conley index. The closed orbit statement is based on the $C^1$-closing lemma of C. Pugh and C. Robinson [15].

To increase the accessibility of the paper, we have tried to make it as self-contained as possible. We review and indicate the proofs of the needed elements of singularity theory. We also recall the basic definitions and properties relative to the Conley index.
2. All nondepraved singularities have Palais-Smale extensions

2.1. Nondepraved singularities
We denote by $\|\cdot\|$ the standard euclidean norm on $\mathbb{R}^n$. Let $D^n = D(\delta, 0) = \{y : \|y\| \leq \delta\}$, $S(\delta, 0) = \partial D(\delta, 0)$. We use below the term smooth to mean $C^r$ with $r \geq 2$. Following [8], we recall the definition of nondepraved singularities.

Definition 1
Let $f : (D^n, 0) \to (\mathbb{R}, 0)$ be a smooth function. Point zero is a nondepraved singularity of $f$ if it is an isolated singularity and if for any sequence $x_n$ such that $\lim_{n \to \infty} x_n = 0$, $\lim_{n \to \infty} (\nabla f(x_n)/\|\nabla f(x_n)\|) = v \in S(1, 0)$, $\lim_{n \to \infty} x_n/\|x_n\| = w$, we have either $v \bullet w = 0$ or $\mathrm{sign}((v \bullet w) f(x_n)) = 1$ for all sufficiently big $n$. (Here $\bullet$ is the scalar product of vectors in $\mathbb{R}^n$; $\mathrm{sign}(a)$ is the sign of $a$ and is taken to be zero if $a = 0$.)

Obviously, one could make the same definition for a metric different from the standard one. However, the nondepravity condition is independent of the metric used on $\mathbb{R}^n$ and is invariant with respect to diffeomorphisms leaving invariant the origin (see [8]). If zero is a nondepraved singularity of $f$, the following condition is also satisfied:

(∗) there exists $\delta > 0$ such that for all $\epsilon < \delta$ the singular hypersurface $f^{-1}(0)$ intersects transversely the sphere $S(\epsilon, 0)$.

Indeed, if (∗) were false, there would be a sequence $x_n$ with $\lim_{n \to +\infty} x_n = 0$ and $f(x_n) = 0$, $\nabla f(x_n) = \lambda_n x_n$. We may assume that $x_n/\|x_n\|$ and $\nabla f(x_n)/\|\nabla f(x_n)\|$ both converge. But this immediately leads to a contradiction with the nondepravity condition.
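For a concrete feel for Definition 1, one can test the sign condition pointwise on simple examples. The sketch below is an illustration and not part of the paper: it samples points near the origin and evaluates $(v \bullet w) f(x)$ with $v = \nabla f/\|\nabla f\|$ and $w = x/\|x\|$. For a homogeneous polynomial of degree $k$, Euler's identity $x \bullet \nabla f(x) = k f(x)$ shows this quantity is $k f(x)^2/(\|x\| \|\nabla f(x)\|) \geq 0$, which is what the sampling confirms for the two example germs chosen here (a Morse saddle and a monkey saddle). A pointwise check of this kind is only suggestive, since the definition is stated in terms of limits along sequences.

```python
import math
import random


def nondepravity_quantity(f, grad, x):
    """Return (v . w) * f(x), with v = grad f/|grad f| and w = x/|x|."""
    g = grad(x)
    norm_x = math.hypot(*x)
    norm_g = math.hypot(*g)
    v_dot_w = sum(gi * xi for gi, xi in zip(g, x)) / (norm_g * norm_x)
    return v_dot_w * f(x)


def min_quantity(f, grad, trials=2000, radius=0.1, seed=0):
    """Smallest sampled value of (v . w) * f on a punctured disk around 0."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        r = radius * (0.01 + 0.99 * rng.random())  # stay away from the origin
        theta = 2 * math.pi * rng.random()
        x = (r * math.cos(theta), r * math.sin(theta))
        worst = min(worst, nondepravity_quantity(f, grad, x))
    return worst


# Example germs (choices made for this sketch):
saddle = (lambda p: p[0] ** 2 - p[1] ** 2,
          lambda p: (2 * p[0], -2 * p[1]))
monkey = (lambda p: p[0] ** 3 - 3 * p[0] * p[1] ** 2,
          lambda p: (3 * p[0] ** 2 - 3 * p[1] ** 2, -6 * p[0] * p[1]))

assert min_quantity(*saddle) >= -1e-12
assert min_quantity(*monkey) >= -1e-12
```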
In the following we say that the isolated singularity zero of a smooth function f : (D n , 0) −→ (R, 0) is neat if it satisfies the condition (∗) above. (In particular, nondepraved implies neat.) The basic fact that we need to recall is the following lemma. 2.1 If zero is a neat singularity of f , then for some small > 0 the homeomorphism type T of the sublink of f , f −1 (−∞, 0] S(δ, 0), is independent of δ < . This homeomorphism type is denoted by A f (0) (or A f ). LEMMA
The argument for this is classical (see [13]). Indeed, let H = f −1 (0). Assume that T for δ ≤ the intersection of H with S(δ, 0) is transversal. Let H 0 = H D(, 0). Let V be the vector field on D(, 0) given by the projection of the gradient vector field X of the function x −→ ||x||2 in the direction of −∇ f /||∇ f ||. Similarly, let W be obtained by projecting X on the hyperplane orthogonal to ∇ f . Let ν : D(, 0) −→ [0, 1] be a smooth function such that ν −1 (0) = H 0 , ν −1 (1) is a S T neighbourhood of f −1 ((−∞, −τ ] [τ, ∞)) S(, 0) with τ so small that the intersection of S(, 0) with f −1 (τ 0 ) is transverse for all τ 0 ∈ [−τ, τ ]. (Such a ν may be constructed by a suitable modification of f 2 .) Consider the vector field U = W + νV . This is smooth except at zero, is tangent to H 0 , and points out on S( 0 , 0) for all 0 < 0 ≤ . It immediately follows that the flow induced by U provides the T T homeomorphisms between f −1 (−∞, 0] S(δ, 0) and f −1 (−∞, 0] S(, 0) for all 0 < δ ≤ . Remark 1 (a) A main problem with neat singularities is that the neatness condition, in contrast to the nondepravity one, depends strongly on the metric chosen on Rn . (b) A nondepraved singularity f also has the property that the pair ( f −1 (0) − {0}, 0) forms a Whitney stratification (see [24], [22]). This last condition is again independent of the metric, is invariant under diffeomorphisms, and implies neatness. A singularity satisfying only this stratification condition was called reasonable in [2]. T (c) Given a neat singularity germ f , consider the link of f , f −1 (0) S(δ, 0). (This is, of course, the boundary of the sublink.) The same arguments as above show that the homeomorphism type of the link is also invariant with respect to (sufficiently small) δ. We denote the link of f by L f . The following local structure result is useful. Assume that f : Rn −→ R is a smooth function with a single critical point at the origin and that f (0) = 0. Consider the
flow γ induced by −∇f, the negative of the gradient of f. Consider the cylindrical neighbourhoods of zero defined by
\[ U(\epsilon,\delta) = f^{-1}([-\epsilon,\epsilon]) \cap \{\, x \in \mathbb{R}^n : \exists\, t \in \mathbb{R} \cup \{\infty,-\infty\},\ f(\gamma_t x) = 0,\ \|\gamma_t x\| \le \delta \,\}. \]
Let A(ε, δ) = U(ε, δ) ∩ f^{-1}(−ε).
LEMMA 2.2 If zero is a nondepraved singularity and δ, ε are small enough, then there is a homeomorphism of pairs (U(ε, δ), A(ε, δ)) ≈ (D^n, A_f(0)).
Proof
It is clear that the different small neighbourhoods U(ε, δ) are homeomorphic. This is true even for neat singularities due to the conical structure of f^{-1}(0) ∩ D(τ, 0) for small τ. (This is an immediate consequence of Lemma 2.1.) For this reason we take ε, δ, and δ′ small enough such that U(ε, δ) ⊂ D(δ′, 0) and f^{-1}(0) intersects transversely all spheres of radius smaller than δ′. As above, construct a vector field U smooth (except at zero) in D(δ′, 0) that points out on the boundary and is tangent to f^{-1}(0) − {0}. Let V be the smooth vector field that is the component of U orthogonal to ∇f. Consider the flow associated to V. If we take ε small enough, this flow carries U(ε, δ) into D(δ′, 0) ∩ f^{-1}([−ε, ε]), providing a homeomorphism between these two sets which restricts to a homeomorphism A(ε, δ) ≈ D(δ′, 0) ∩ f^{-1}(−ε). On the other hand, another modification of the vector field U provides a homeomorphism of pairs
\[ \bigl( D(\delta',0) \cap f^{-1}([-\epsilon,\epsilon]),\ D(\delta',0) \cap f^{-1}(-\epsilon) \bigr) \approx \bigl( D(\delta',0),\ S(\delta',0) \cap f^{-1}((-\infty,0]) \bigr). \]
More precisely, one adds to U a vector field that is null outside a neighbourhood of |f|^{-1}([ε′, ε]), where it points in the direction of sign(f)∇f. Let X″ be the resulting vector field. Nondepravity implies that, for sufficiently small ε and for all x satisfying ||x|| ≤ ε, ∇f(x) = λx, λ ∈ R, we have sign(λ) = sign(f). Because of this, when ε′ < ε are sufficiently small, we see that the vector field X″ points out on the boundary of D(δ′, 0) ∩ f^{-1}([−ε, ε]) without adding any new stationary points. The induced flow provides the homeomorphism indicated.
Remark 2
(a) The cylindrical neighbourhoods U(ε, δ) appear to have been first introduced in [18]. Their properties have also been exploited in [6].
(b) For singularities that are only neat, a simple invertible cobordism argument (as in [2]) shows the existence of a homotopy equivalence of pairs (U(ε, δ), A(ε, δ)) ≃ (D^n, A_f(0)).
2.2. Palais-Smale extensions
Let M be a smooth manifold, and let α be a Riemannian metric on M. Let f : M −→ R be a smooth function. We denote by ∇^α f the α-gradient of f. Recall that the function f satisfies the Palais-Smale condition (or is PS) with respect to α (see [14]) if for any m ∈ R a sequence x_n that verifies ∇^α f(x_n) → 0 and |f(x_n)| ≤ m contains a convergent subsequence. Whenever we speak about a PS-function, we assume implicitly that the relevant metric has already been fixed. A global flow is one that is defined for infinite negative and positive times.
PROPOSITION 2.3 For any nondepraved singular germ f, there are a disk D^n, a Riemannian metric α on R^n, and a function f̄ : R^n −→ R (which is called a Palais-Smale extension of f) with a single critical point and such that ∇^α f̄ induces a global flow, f̄|_{D^n} = f, and f̄ is PS with respect to α.
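To make the Palais-Smale condition concrete, here is a minimal numerical sketch for functions on the real line with the standard metric. The functions `f_bad` and `f_good` and the helper `ps_witness` are hypothetical names chosen for this illustration. The sketch only exhibits one escaping sequence along which the condition fails for arctan; it is not a proof of anything in the text.

```python
import math

def f_bad(x):
    # arctan is bounded and has no critical point, yet f'(x) -> 0 as x -> infinity
    return math.atan(x)

def df_bad(x):
    return 1.0 / (1.0 + x * x)

def f_good(x):
    # x^2 has a single critical point; |f| <= m forces x into a bounded interval
    return x * x

def df_good(x):
    return 2.0 * x

def ps_witness(f, df, points, bound):
    """Keep the sample points with |f| <= bound, paired with |f'| at those points."""
    return [(x, abs(df(x))) for x in points if abs(f(x)) <= bound]

# A sequence escaping to infinity: 10, 100, ..., 10^6 (no convergent subsequence).
points = [float(10 ** k) for k in range(1, 7)]

# For arctan every point satisfies |f| <= 2 and the gradients tend to zero,
# so the Palais-Smale condition fails along this sequence.
bad = ps_witness(f_bad, df_bad, points, bound=2.0)

# For x^2 no point of the escaping sequence satisfies |f| <= 2,
# so this sequence cannot violate the condition.
good = ps_witness(f_good, df_good, points, bound=2.0)
```

The list `bad` contains the whole escaping sequence with gradients shrinking towards zero, while `good` is empty.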
Proof
Let γ be the (partial) flow induced by −∇f. As above, for ε and δ small enough, let U = U(ε, δ) be a cylindrical neighbourhood of zero. The interior of U is homeomorphic to R^n. Notice that ∇f induces only a partially defined flow on Int(U) (as each flow line is defined only for finite time). We intend to change the canonical metric on U as well as the function f such that this deficiency is corrected and, with respect to the new metric, the modified function satisfies the desired property. Let ε′ < ε. Notice that W = Int(U(ε, δ)) − Int(U(ε′, δ)) is diffeomorphic to the disjoint union A × (0, 1] ⊔ B × (0, 1], where A = A × {0} = U ∩ f^{-1}(ε) and B = B × {0} = U ∩ f^{-1}(−ε). We use, for the set W, coordinates of the form (y, t), where y ∈ A ⊔ B and t gives the direction of the negative of the gradient of f. In these coordinates, let ν be the standard metric, and let ∂/∂t be the unit tangent vector in the direction of t. We focus only on the part, W′, of W diffeomorphic to A × (0, 1]. Let u, v : U −→ R be smooth functions such that u(x) = 1 = v(x) for x ∈ U − Int(W′), u and v depend only on t inside W′, u(0) = 0 = v(0), u(1) = 1 = v(1), u′(t), v′(t) > 0 for t ∈ [0, 1), u(t) = tv(t) in a small neighbourhood of zero, and lim_{t→0} tv′(t)/v(t) = 1. Define a new metric on Int(U) by ν′ = (1/u²)ν. Modify the function f by defining f′ = (1/v)f on W′. Clearly, the negative of the gradient of f′ with respect to ν′, −∇′f′, points in the direction t. Its value is given by ν′(∇′f′, ∂/∂t) = ∂f′/∂t.
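The computation of −∇′f′ and of its ν′-norm used in this proof can be spelled out as follows. This is a sketch under the conventions of the proof (ν′ = u^{-2}ν, f′ = f/v, with u and v functions of t only on W′), together with the assumptions that f′ varies only in the t-direction in the coordinates (y, t) and that ∂f/∂t stays bounded as t → 0.

```latex
% Step 1: a conformal change of metric rescales gradients. For any vector Y,
%   \nu'(\nabla' f', Y) = u^{-2}\,\nu(\nabla' f', Y) = df'(Y) = \nu(\nabla f', Y),
% where \nabla denotes the gradient with respect to \nu. Hence
\[
  \nabla' f' = u^{2}\,\nabla f' = u^{2}\,\frac{\partial f'}{\partial t}\,\frac{\partial}{\partial t}.
\]
% Step 2: differentiate f' = f/v in the t-direction.
\[
  \frac{\partial f'}{\partial t}
  = \frac{1}{v}\,\frac{\partial f}{\partial t} - \frac{v'}{v^{2}}\,f .
\]
% Step 3: combine, and use \|Z\|_{\nu'} = u^{-1}\|Z\|_{\nu}. Since f > 0, v' > 0
% and \partial f/\partial t < 0 on W', the coefficient is positive and
\[
  -\nabla' f'
  = \Bigl( \frac{v' u^{2}}{v^{2}}\, f - \frac{u^{2}}{v}\,\frac{\partial f}{\partial t} \Bigr)\frac{\partial}{\partial t},
  \qquad
  \|\nabla' f'\|_{\nu'}
  = \frac{v' u}{v^{2}}\, f - \frac{u}{v}\,\frac{\partial f}{\partial t}.
\]
% Step 4: near t = 0 we have u = t v, so v'u/v^2 = t v'/v -> 1 and u/v = t -> 0,
% while f -> \epsilon on A x {0}. Therefore
\[
  \lim_{t \to 0} \|\nabla' f'\|_{\nu'} = \epsilon .
\]
```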
This means that −∇′f′ = ((v′u²/v²)f − (u²/v)∂f/∂t)∂/∂t on W′. Notice that ∂f/∂t is negative on W′. Hence, the ν′-norm of ∇′f′ is equal to (v′u/v²)f − (u/v)∂f/∂t. When t tends to zero, the sum goes to ε. Therefore, this expression is bounded from above on Int(U). Moreover, when x ≠ 0, the sum is strictly positive. Therefore, for some small ω > 0, on Int(U) − D(ω, 0) the ν′-norm of ∇′f′ is bounded both from above and from below. The existence of a bound from above implies that the flow induced by −∇′f′ is well defined for infinite negative time. A similar construction applied to W − W′ leads to the desired metric α and function f̄. Notice that the metric α coincides with the canonical one on a small disk around the origin. Of course, the same construction could have been performed by starting with any other Riemannian metric around the origin (because the nondepravity condition is independent of the metric).
Remark 3
(a) At first sight the existence of Palais-Smale extensions might seem an obvious fact. However, it is easy to construct a Morse function on D^n with only two critical points, both nondegenerate, and which does not admit a PS-extension to R^n. Indeed, consider the height function h on the sphere S^n. Let S and N be its two critical points, and let E ∈ h^{-1}(0). Let p be the stereographic projection from E onto a plane tangent to the antipodal of E. Let D^n be the image of a complement of a disk around E. The function h′ = h ∘ p^{-1} : D^n −→ R does not admit a PS-extension to R^n. It remains open what the exact class of singularities is for which the extension result of the proposition remains valid.
(b) For any two nondepraved germs f and g there is a smooth function F : R^n × [0, 1] −→ R with F_0 and F_1 Palais-Smale with unique critical points extending, respectively, f and g.
Indeed, one just has to use a smooth homotopy of the two extensions f̄ and ḡ provided by the proposition as well as a smooth deformation of the metrics.
We also need some local properties of sums of nondepraved singularities. We recall that the join X ∗ Y of two spaces X, Y is given by CX × Y ∪_{X×Y} X × CY, where CX is the cone on X.
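For orientation, here are two standard instances of the join, stated as background facts rather than results of the text, together with a consistency check on quadratic forms. The forms q_1, q_2 and the identification of their sublinks with spheres are assumptions of the example.

```latex
% Standard instances of the join:
\[
  X * S^{0} \simeq \Sigma X, \qquad S^{p} * S^{q} \cong S^{p+q+1},
  \qquad\text{hence}\qquad X * S^{k-1} \simeq \Sigma^{k} X .
\]
% Consistency check (assumption: q_1, q_2 are nondegenerate quadratic forms of
% indices k, l >= 1, whose sublinks are homotopy equivalent to S^{k-1}, S^{l-1}).
% The sum q_1 + q_2 is a nondegenerate quadratic form of index k + l, and
\[
  A_{q_1} * A_{q_2} \simeq S^{k-1} * S^{l-1} \cong S^{k+l-1} \simeq A_{q_1 + q_2} .
\]
```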
LEMMA 2.4
(a) If h : D^n −→ R and g : D^k −→ R are nondepraved germs, then h + g : D^n × D^k −→ R is neat with respect to a metric on D^n × D^k which restricts on a neighbourhood of (0, 0) ∈ D^n × D^k to a metric α_1 + α_2, with α_i Riemannian metrics on D^n and, respectively, D^k for i = 1, 2. For such a metric we have A_{h+g} ≃ A_h ∗ A_g.
(b) A singular germ f = h + g with h and g nondepraved verifies the conclusions of Lemma 2.2 and of Proposition 2.3 for any metric of the type above.
Proof
Recall that a nondepraved germ f has the following property:
(∗∗) for sufficiently small ε, and for all x satisfying ||x|| ≤ ε, ∇f(x) = λx, λ ∈ R, we have sign(λ) = sign(f).
Our germ f = h + g continues to satisfy this condition for any metric α on D^n × D^k of the sort considered. (This discussion is necessary because h + g might not be nondepraved.) Indeed, let α_i, i = 1, 2, be as above. Without loss of generality we may assume that α_1 and α_2 are the standard metrics. Clearly, ∇^α(g + h) = ∇(h) + ∇(g). As g and h are both nondepraved and this condition is independent of the metric, we get that if (z, y) ∈ D^n × D^k is sufficiently close to (0, 0) and (∇(h)(z), ∇(g)(y)) = λ(z, y), then sign(λ) = sign(g(y)) = sign(h(z)) = sign((h + g)(z, y)). Assume now that the claim at Lemma 2.4(a) is false. It follows that for any n ∈ N there is a point (x_n, y_n) such that {(x_n, y_n)} converges to (0, 0), ∇^α(h + g)(x_n, y_n) = λ_n(x_n, y_n) for some λ_n ∈ R, and (h + g)(x_n, y_n) = 0. In particular, we have ∇h(x_n) = λ_n x_n, ∇g(y_n) = λ_n y_n. This means that sign(h(x_n)) = sign(g(y_n)) for sufficiently large n and contradicts h(x_n) + g(y_n) = 0. The homotopy equivalence A_{h+g} ≃ A_h ∗ A_g follows easily as in [2] or [12]. For the claim at (b), by inspecting the proof of Lemma 2.2 we see that, besides the neatness of the singular germ, the only property that is used is (∗∗). Thus, the statement of Lemma 2.2 remains valid for h + g with respect to any metric on D^n × D^k which is of the type discussed. The proof of Proposition 2.3 uses only this fact (besides neatness) and therefore also carries over to h + g.
Remark 4
(a) Whenever talking about the sublink of a sum h + g as above, we assume the metric used to define this sublink to be as in Lemma 2.4(a). Notice that for a nondepraved germ the sublink is independent of the metric.
(b) The homotopy equivalence in Lemma 2.4(a) has also been discussed in [3].
We point out that all the constructions of that paper also work for neat singularities. However, there is an imprecision in [3, Lemmas 3.1 and 3.3]. One should in fact assume that the functions involved have locally analytic or at least nondepraved singularities.
3. The sublink and PS-deformations
We need some elements of Conley index theory which we now recall (see [1], [19]). Consider a continuous (local) flow γ : X × R −→ X, X being a locally compact metric space. Let S ⊂ X be a compact invariant set that is isolated in the sense that there is a compact neighbourhood N of S such that S ⊂ Int(N) and S is the maximal invariant set of γ inside N. Such a neighbourhood is called an isolating neighbourhood of S. A pair (N_1, N_0) of compact sets in N is an index pair for S in N if N_0 ⊂ N_1, N_1 − N_0 is a neighbourhood of S, S is the maximal invariant set in the closure of N_1 − N_0, N_0 is positively invariant in N_1, and if for x ∈ N_1 there is some T ≥ 0 such that γ_T(x) ∉ N_1, then there exists a τ > 0, τ < T with γ_t(x) ∈ N_1 for 0 ≤ t ≤ τ and γ_τ(x) ∈ N_0. There are index pairs inside any isolating neighbourhood of S. The Conley index of S, c_γ(S), is the homotopy type of the quotient space N_1/N_0. It is independent of the choice of the index pair. Moreover, suppose γ^λ : X × R −→ X is a family of flows depending continuously on the parameter λ ∈ [0, 1]. Assume that S is an isolated invariant set of γ viewed as a flow on X × [0, 1]. Then S_λ = S ∩ (X × {λ}) is an isolated invariant set of γ^λ, and c_γ(S) = c_{γ^λ}(S_λ) for all λ ∈ [0, 1]. This last property of the Conley index is referred to as invariance to continuation (the invariant set S_0 of γ^0 being continued to the invariant set S_1 of γ^1). We make use of the following definition.
Definition 2
Fix a Riemannian metric β on R^n × [0, 1]. We denote by β_t the metrics on R^n given by β|_{R^n×{t}}, t ∈ [0, 1]. Let F : R^n × [0, 1] −→ R, and let F_t = F|_{R^n×{t}}.
A β-Palais-Smale deformation is a smooth function F as above such that ∇^{β_t}F_t induces a global flow, the union of the critical sets of F_t for all t ∈ [0, 1] is compact, and, for any m ∈ R, a sequence (x_n, τ_n) ∈ R^n × [0, 1] that verifies ∇^{β_{τ_n}}F_{τ_n}(x_n, τ_n) → 0, |F(x_n, τ_n)| ≤ m contains a convergent subsequence. Two smooth, singular germs f, g : D^n −→ R are related by a Palais-Smale deformation if there are a metric β and a β-PS-deformation F such that F_0 has a single critical point and extends f and F_1 has a single critical point and extends g. We say that the germs f and g are related by a stable PS-deformation if there is some nondegenerate quadratic form q : R^k −→ R such that f + q, g + q : D^n × D^k −→ R are related by a β-PS-deformation with β_t a product metric in a neighbourhood of (0, 0) ∈ D^n × D^k for all t ∈ [0, 1].
Remark 5
The simplest example of a β-PS-deformation is that of a function F : R^n × [0, 1] −→ R which has the property that there is some ε > 0 such that for some fixed compact set K ⊂ R^n we have ||∇^{β_t}F_t(x)|| ≥ ε for all t ∈ [0, 1] and x ∉ K, and ∇^{β_t}F_t induces a global flow. All the PS-deformations that are constructed below are of this type.
3.1. Reduction to homotopy theory
Denote by ≃ homotopy equivalence, and denote by ≃_S stable homotopy equivalence. (Two finite CW-complexes X, Y verify X ≃_S Y if and only if for some q ∈ N we have Σ^q X ≃ Σ^q Y.)
THEOREM 3.1
(a) If two neat germs f and g are related by a PS-deformation, then ΣA_f ≃ ΣA_g.
(b) Assume that for two nondepraved germs f and g we have A_f ≃_S A_g. Then f and g are related by a stable PS-deformation.
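As a sanity check on Theorem 3.1(a), the Conley index can be computed by hand for a nondegenerate critical point. The following is a standard computation and not part of the text; the quadratic form q, its index k, and the choice of index pair are assumptions of the example.

```latex
% Worked example (assumption: q is a nondegenerate quadratic form of index k on
% R^n with 1 <= k <= n-1, and \gamma is the flow of -\nabla q for the standard
% metric). A standard index pair for the origin is a box together with its exit
% set, where D^k lies in the negative (unstable) directions of q:
\[
  N_1 = D^{k} \times D^{n-k}, \qquad N_0 = \partial D^{k} \times D^{n-k}.
\]
% Collapsing the exit set gives
\[
  c_\gamma(0) \simeq N_1 / N_0 \simeq D^{k} / \partial D^{k} \cong S^{k}
  = \Sigma S^{k-1} \simeq \Sigma A_q ,
\]
% using the standard fact that the sublink of q is homotopy equivalent to S^{k-1}.
```

In particular, under these assumptions two nondegenerate quadratic forms related by a PS-deformation must have the same index, since S^k ≃ S^{k′} forces k = k′.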
Proof
Assume that f is a neat germ and that γ, U(ε, δ), and A(ε, δ) are as in Lemma 2.2. The pair (U(ε, δ), A(ε, δ)) is an obvious index pair of zero viewed as an isolated invariant set of γ. Therefore, as U(ε, δ) is contractible and A(ε, δ) ≃ A_f (by Remark 2), the corresponding Conley index verifies c_γ(0) ≃ ΣA_f. Let β be a Riemannian metric on R^n × [0, 1], and let F : R^n × [0, 1] −→ R be a β-PS-deformation relating f to g. Now, f might not be neat with respect to β_0. However, one can relate β_0 to the canonical metric by a path of metrics β′ with β′_0 = β_0 and β′_1 the canonical metric. A look at the corresponding induced gradient flows shows by continuation that the Conley index of zero with respect to the β_0-gradient flow of f coincides with ΣA_f. A similar argument shows that the Conley index of zero with respect to the β_1-gradient flow of g coincides with ΣA_g. Denote by γ^τ the flow induced by the negative of the β_τ-gradient of F_τ, −∇^τF_τ. The restriction imposed on the family F_τ implies that there is a compact set Q such that for all τ the critical set of F_τ is included in Q. Let T = max{|F(x, t)| : x ∈ Q, t ∈ [0, 1]}. Moreover, there exist a compact set K ⊂ R^n and a constant ε > 0 such that the gradient of F_τ has β_τ-norm bigger than ε for x ∉ K, |F_τ(x)| ≤ 2T, ∀τ. Consider two points a, b ∈ Q. For a fixed τ, assume that there are t_0, t_1, and x such that γ^τ_{t_0}(x) = a and γ^τ_{t_1}(x) = b. We have
\[ 2T \ge |F_\tau(a) - F_\tau(b)| = \left| \int_{t_0}^{t_1} \frac{d}{dt}\bigl( F_\tau(\gamma^\tau_t(x)) \bigr)\, dt \right| = \left| \int_{t_0}^{t_1} \beta_\tau\Bigl( \nabla^\tau F_\tau, \frac{d\gamma^\tau}{dt} \Bigr)\, dt \right| \ge \epsilon^2 H^\tau_{a,b}, \]
where H^τ_{a,b} is the time that the flow line passing through a spends outside K before reaching b. The compactness of Q and this uniform bound for the times H^τ_{a,b} imply that there is another compact set K′ such that, for any τ, a flow line of γ^τ starting in Q and returning to Q remains in the meanwhile in K′. (If this did not happen, then there would exist sequences x_n ∈ Q, t_n ∈ R such that lim(x_n) = x_0, lim(t_n) = t_0, and ||γ^τ_{t_n}(x_n)|| → ∞. But this would imply ||γ^τ_{t_0}(x_0)|| = ∞.) As the critical points of F_τ are inside Q, this means that K′ × [0, 1] is an isolating neighbourhood for the maximal invariant set of the flow γ. By the continuation property of the Conley index, we have that the Conley index of the maximal invariant set of γ^0 coincides with the Conley index of the maximal invariant set of γ^1. However, it was noticed above that the first Conley index is, up to homotopy, ΣA_f and that the second is ΣA_g.
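The time estimate in this proof rests on the fact that, along a flow line of the negative gradient, the function decreases at rate equal to the squared norm of the gradient, which is at least ε² outside K. The following minimal sketch checks the resulting inequality numerically for one flow line. The function F(x, y) = x² − y², the disk K, the starting point, and all helper names are hypothetical choices made for this illustration; the metric is the standard one.

```python
import math

K_RADIUS = 0.3            # K is the closed disk of this radius (hypothetical choice)
EPS = 2.0 * K_RADIUS      # |grad F| = 2|(x, y)| >= EPS outside K

def F(x, y):
    return x * x - y * y

def flow(x0, y0, t):
    # exact flow of -grad F = (-2x, 2y) starting at (x0, y0)
    return x0 * math.exp(-2.0 * t), y0 * math.exp(2.0 * t)

def time_outside_K(x0, y0, t_end, steps=20000):
    # midpoint-rule estimate of the time the flow line spends outside K
    dt = t_end / steps
    outside = 0
    for i in range(steps):
        x, y = flow(x0, y0, (i + 0.5) * dt)
        if math.hypot(x, y) > K_RADIUS:
            outside += 1
    return outside * dt

x0, y0, t_end = 1.0, 0.01, 2.0
x1, y1 = flow(x0, y0, t_end)

drop = F(x0, y0) - F(x1, y1)      # total decrease of F along the flow line
H = time_outside_K(x0, y0, t_end)  # time spent outside K (the line passes through K)
```

With these choices the flow line starts outside K, crosses K near the origin, and leaves again, so `H` is strictly between 0 and `t_end`, and `drop >= EPS**2 * H` as the estimate predicts.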
Remark 6
The use of the Conley index in similar settings is relatively standard. Another such application appears in [6]. The relation between the Conley index of an isolated critical point and its sublink appears in [2]. We do not know how to prove the invariance of the homotopy type of the suspension of the sublink without using the Conley index.
The second part of the theorem follows from the next result.
PROPOSITION 3.2 If two nondepraved germs f, g : D^n −→ R have the property that A_{f+q} and A_{g+q} are isotopic in S^{n−1} for some nondegenerate quadratic form q : D^k −→ R, then they are related by a stable PS-deformation.
Assuming this, we proceed as follows. By hypothesis there is a k ∈ N such that Σ^k A_f ≃ Σ^k A_g. Let q : R^{2k+n+2} −→ R be a nondegenerate quadratic form of index k. Of course, q is nondepraved. Therefore, A_{f+q} ≃ A_f ∗ A_q ≃ Σ^k A_f ≃ Σ^k A_g ≃ A_{g+q} by Lemma 2.4. This means that A_{f+q} and A_{g+q} are two thickenings in S^{2k+2n+1} of a CW-complex of dimension at most n + k − 1. By results of [23] and [10] these two thickenings are isotopic in S^{2k+2n+1}, and by applying Proposition 3.2 we obtain that f and g are related by a stable PS-deformation. To end the proof we need to prove Proposition 3.2. A first step is given by the next statement.
LEMMA 3.3
(a) If A ⊂ S^{n−1} is a compact (n − 1)-manifold with boundary, then there is a nondepraved germ f with A_f = A.
(b) Consider three nondepraved germs f : D^n −→ R, h : D^k −→ R, and g : D^{n+k} −→ R. If A_{f+h} = A_g, then f + h and g are related by a PS-deformation.
Proof
The first part of the lemma is similar to a result in [2] and is inspired by a general technique for constructing isolated critical points that appears in [21]. Here is the construction. Let φ : [0, 1] −→ [0, 1] be a C^∞-function that satisfies (dφ/dt)(t) > 0 if t ≠ 0, (d^nφ/dt^n)(0) = 0 for all n ∈ N, and φ(1) = 1. Let g : S^{n−1} −→ R be a C^∞-function having zero as a regular value and such that g^{-1}((−∞, 0]) = A. Define f : D(δ, 0) −→ R by f(y) = φ(||y||²)g(x), where y = (x, ||y||) ∈ S^{n−1} × [0, 1] in polar coordinates. As all the derivatives of φ vanish at zero, it follows that f is C^∞. Clearly, f^{-1}((−∞, 0]) ∩ S(δ, 0) = A. Now, in polar coordinates (x, t), the gradient of f has the form ((φ(t²)/t)∇g, 2t(dφ/dt)(t²)g). This means that the only singularity of f is at zero. Moreover, whenever g ≠ 0 the product of this gradient with the position vector has sign equal to sign(g) = sign(f). Together with the fact that if x ∈ f^{-1}(0), x ≠ 0, then the vector x belongs to T_x(f^{-1}(0)), this implies that zero is a nondepraved critical point of f.
For the second part we first assume that k = 0 = h. Notice that if f : D^n −→ R has a single singularity at zero which is nondepraved, then there is a new smooth function f̄ with a single critical point and small disks (all centered at zero) D′ ⊂ D″ ⊂ D^n such that f̄ = f on D′, sign(f̄) = sign(f) on D″, and sign(∇f̄(x) • x) = sign(f̄) for x ∈ ∂D″. Indeed, consider a cylindrical neighbourhood U(ε, δ) as in Lemma 2.2. We may round the corners of U(ε, δ), getting a new neighbourhood U′ in such a way that −∇f points in on ∂U′ ∩ f^{-1}((0, ∞)) and out on ∂U′ ∩ f^{-1}((−∞, 0)). We have, as in Lemma 2.2, a diffeomorphism of pairs (U′, A(ε, δ)) ≈ (D″, A_f) for some small disk D″ containing U′. Moreover, this diffeomorphism can be taken to be equal to the identity on some small disk D′ ⊂ U′.
We define f̄ as the composition of the restriction of f to U′ with the inverse of this diffeomorphism. We perform the same construction for g, thus getting a function ḡ. We have that −∇ḡ and −∇f̄ point inside (respectively, outside) D″ precisely on A_{−f} = A_{−g} (respectively, A_f = A_g). This implies that the function G_τ = τf̄ + (1 − τ)ḡ has the same properties on ∂D″. In particular, its gradient never vanishes on ∂D″ for any τ ∈ [0, 1]. Because of this we may use the same blow-up technique as in the proof of Proposition 2.3 to extend the function G : D″ × [0, 1] −→ R to a smooth function F : R^n × [0, 1] −→ R such that F_τ extends G_τ, with respect to a suitable metric F is a Palais-Smale deformation such that F_0 extends f and F_1 extends g, and the critical points of F_τ coincide with those of G_τ. Notice that we may assume that the relevant metrics restrict to the canonical one in a neighbourhood of 0 ∈ R^n.
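The construction f(y) = φ(||y||²)g(x) from the first part of Lemma 3.3 can be tried out numerically in the plane. The sketch below is only an illustration under stated assumptions: the flat function `phi`, the function `g` on the unit circle, and the helper names are hypothetical choices, and the check covers the sign condition sign(∇f(y) • y) = sign(f(y)) at sample points away from the zero set of g.

```python
import math

def phi(s):
    # smooth, flat at 0 (all derivatives vanish there), increasing, phi(1) = 1
    return 0.0 if s <= 0.0 else math.exp(1.0 - 1.0 / s)

def g(x1, x2):
    # function on the unit circle with 0 as a regular value; A = {g <= 0}
    return x1

def f(y1, y2):
    # f(y) = phi(|y|^2) * g(y / |y|), extended by 0 at the origin
    r = math.hypot(y1, y2)
    if r == 0.0:
        return 0.0
    return phi(r * r) * g(y1 / r, y2 / r)

def radial_derivative(y1, y2, h=1e-6):
    # central difference of s -> f(s * y) at s = 1, which equals grad f(y) . y
    return (f((1 + h) * y1, (1 + h) * y2) - f((1 - h) * y1, (1 - h) * y2)) / (2 * h)

def sign(v, tol=1e-12):
    return 0 if abs(v) < tol else (1 if v > 0 else -1)

def sign_condition_holds(radii=(0.5, 0.8, 1.0), angles=16, margin=0.1):
    # compare the sign of grad f . y with the sign of f where |g| > margin
    for r in radii:
        for k in range(angles):
            th = 2.0 * math.pi * k / angles
            if abs(g(math.cos(th), math.sin(th))) > margin:
                y1, y2 = r * math.cos(th), r * math.sin(th)
                if sign(radial_derivative(y1, y2)) != sign(f(y1, y2)):
                    return False
    return True
```

Here the sublink on each small circle is the left half circle {x_1 ≤ 0}, and `sign_condition_holds()` compares the two signs on a grid of sample points.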
We now come back to the general case when k > 0 and h ≠ 0. Because of Lemma 2.4, sums of nondepraved singularities are neat with respect to metrics restricting to products in a neighbourhood of the origin and have cylindrical neighbourhoods satisfying the diffeomorphism of Lemma 2.2. Therefore, the construction above can also be applied to f + h. The statement follows.
Proof of Proposition 3.2
Let h : A_{f+q} × [0, 1] −→ S^{n−1} × [0, 1] be the smooth embedding provided by the isotopy of A_{f+q} and A_{g+q}. It has the properties that h_t is an embedding for all t ∈ [0, 1], h_0 is the inclusion of A_{f+q}, and h_1 is a diffeomorphism A_{f+q} ≈ A_{g+q}. Denote by A the image of h. We may use a parametrized version of the construction in Lemma 3.3(a) to obtain a function G : D^n × [0, 1] −→ R such that, for each t ∈ [0, 1], the function G_t has a single singularity at zero which is nondepraved. Of course, the sublink of G_0 is A_{f+q}, and that of G_1 is A_{g+q}. A parametrized version of Proposition 2.3 shows that G can be extended to a PS-deformation of G_0 to G_1. By applying the second part of Lemma 3.3, we conclude the proposition.
COROLLARY 3.4 Under the assumptions of Theorem 3.1(a) we also have ΣL_f ≃ ΣL_g.
Proof
It is easy to see that L_f is the empty set if and only if L_g is empty. We now assume that L_f is not empty. We have an obvious homotopy pushout:
\[
\begin{array}{ccc}
L_f & \longrightarrow & A_f \\
\downarrow & & \downarrow {\scriptstyle i} \\
A_{-f} & \stackrel{j}{\longrightarrow} & S^{n-1}
\end{array}
\]
The maps i and j are null-homotopic. This implies that up to homotopy ΣL_f ≃ ΣA_f ∨ ΣA_{−f} ∨ S^{n−1}. By applying Theorem 3.1, we have ΣA_f ≃ ΣA_g and also ΣA_{−f} ≃ ΣA_{−g}.
COROLLARY 3.5 The relation “f is related to g by a stable PS-deformation” is an equivalence relation on the class of nondepraved germs.
Proof
Obviously, our relation is well defined, reflexive, and symmetric. The only difficulty is transitivity. Assume that f is related to g by a stable PS-deformation and that g is related to h by another such deformation. Then A_f ≃_S A_g ≃_S A_h by Theorem 3.1(a). But now Theorem 3.1(b) implies that h and f can be related by a stable PS-deformation.
Remark 7
Recall that the standard equivalence relation for singularities (see, for example, [11]) is right equivalence. Two singularity germs f and g are right equivalent if there is a diffeomorphism germ h : R^n −→ R^n such that f = g ∘ h. Obviously, if f and g are nondepraved and right equivalent, then A_f ≈ A_g and therefore f and g are stably PS-equivalent. It is also worth mentioning that right equivalence is not implied only by the existence of an isotopy of the inclusions A_f ⊂ S^{n−1} and A_g ⊂ S^{n−1} (see [11]).
4. The main theorems and Hamiltonian flows
4.1. Proof of Theorem 1.1
Let Sing^{PS}_n be the set of stable PS-equivalence classes of nondepraved germs f : D^n −→ R. Let Φ : Sing^{PS}_n −→ CW_n be defined by Φ([f]) = [A_f], where [f] is the stable PS-equivalence class of f and [X] is the stable homotopy equivalence class of the complex X ⊂ S^{n−1}. (CW_n is the set of stable homotopy classes of complexes admitting a thickening in S^{n−1}.) By Theorem 3.1 and Lemma 3.3 we already know that Φ is a bijection. The map Φ is clearly compatible with the involution f → −f as A_{−f} is the closure of the complement of A_f in S^{n−1}. To end the proof of Theorem 1.1 we only need to discuss the two operations that are transported by Φ to the wedge and, respectively, to the join of CW-complexes. The existence of these two operations is obvious as they can be defined using the bijection Φ^{-1}. However, it is useful to have a definition that is more geometric.
Fusion of singularities
Given two nondepraved germs f, g : D^n −→ R, we construct a function k in the following way. Let U = U(ε, δ) and U′ = U′(ε, δ) be cylindrical neighbourhoods for f and g. We take the connected sum of U and U′ by first identifying a disk D′ ⊂ f^{-1}(0) ∩ ∂U with a disk D″ ⊂ g^{-1}(0) ∩ ∂U′ and then extending this identification to the set W of points in ∂U and points in ∂U′ situated on flow lines that cross D′, respectively, D″. Of course, W is again a disk (of dimension n − 1), and the union U ∪_W U′ is a disk of dimension n. We define the function k : U ∪_W U′ −→ R by pasting together f and g on W. Of course, by composing with a diffeomorphism we
may assume that k is defined on D^n. We see that k has exactly two critical points both with the same critical value equal to zero, that it is negative or zero on a set ambiently isotopic to the connected sum of A_f and A_g, and that it extends (translations of) both f and g. Assume now that [f] and [g] are in Sing^{PS}_n. Let [f] ∨ [g] ∈ Sing^{PS}_n be the stable PS-equivalence class of a nondepraved germ h : D^n −→ R which admits a PS-deformation H such that H_0 extends h, H_1 extends the function k constructed above, and H_0 as well as H_1 do not have more critical points than h, respectively, k. By a method similar to the proof of Theorem 3.1(a), we see that ΣA_h ≃ ΣA_f ∨ ΣA_g. (This is because the Conley index of the maximal invariant set of k is precisely this wedge.) As Φ is bijective, it follows that this operation is well defined once we prove that such functions h, H exist. This follows immediately by the same methods as in Proposition 3.2. (The only thing that is crucial is that even if k has two critical points, it still admits cylindrical neighbourhoods homeomorphic to disks; in this case they correspond to connected sums of cylindrical neighbourhoods of f and g.)
Exterior sum
For [f] ∈ Sing^{PS}_n and [g] ∈ Sing^{PS}_k, let [f] ⊕ [g] = [h] ∈ Sing^{PS}_{n+k}, where h is a nondepraved germ such that there is a PS-deformation relating h to f + g. (Such a germ h exists by Lemma 3.3.) By Lemma 2.4, for a metric restricting to a product in a neighbourhood of the origin, we have that f + g is neat and A_{f+g} ≃ A_f ∗ A_g = A_h. Assume that h′ is any other nondepraved germ with the same property relative to f + g. Then A_{h′} ≃_S A_h, and therefore h and h′ are stably PS-equivalent. As a consequence, the operation is well defined. As an immediate consequence of the properties of these two operations, we obtain Theorem 1.1. □
4.2. Proofs of Theorem 1.2 and of Corollary 1.3
For completeness, we start by recalling the basic notions needed.
A symplectic form ω on R^{2n} is a 2-form that is closed and nowhere degenerate. Assume that α is some Riemannian metric on R^{2n}. As before, for a smooth function g : R^{2n} −→ R, let ∇^α g be the α-gradient of g. The Hamiltonian vector field H_g, induced by g, is defined by the equation ω(H_g, X) = dg(X) that holds for all smooth vector fields X. It has the property that ∇^α g and H_g are α-orthogonal, since α(∇^α g, H_g) = dg(H_g) = ω(H_g, H_g) = 0. Therefore, H_g is tangent to the hypersurfaces g^{-1}(a). The Hamiltonian flow induced by g and ω is the flow obtained by integrating H_g. We recall that we assume our isolated singularities to be different from local extrema.
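The defining equation ω(H_g, X) = dg(X) can be made explicit in the simplest case. The sketch below assumes the standard symplectic form ω = dq ∧ dp on R² and the Euclidean metric, neither of which is fixed by the text; the function `g_saddle` and the helper names are hypothetical. With this convention H_g = (∂g/∂p, −∂g/∂q), and the code checks that H_g is orthogonal to the gradient and that g is conserved along a numerically integrated orbit.

```python
def grad(g, q, p, h=1e-6):
    # central-difference gradient of g at (q, p)
    dq = (g(q + h, p) - g(q - h, p)) / (2 * h)
    dp = (g(q, p + h) - g(q, p - h)) / (2 * h)
    return dq, dp

def hamiltonian_field(g, q, p):
    # omega = dq ^ dp gives omega(H, X) = H_q X_p - H_p X_q.
    # Requiring omega(H, X) = dg(X) for all X yields H = (g_p, -g_q).
    dq, dp = grad(g, q, p)
    return dp, -dq

def rk4_step(g, q, p, dt):
    # one classical Runge-Kutta step for the Hamiltonian flow of g
    k1 = hamiltonian_field(g, q, p)
    k2 = hamiltonian_field(g, q + 0.5 * dt * k1[0], p + 0.5 * dt * k1[1])
    k3 = hamiltonian_field(g, q + 0.5 * dt * k2[0], p + 0.5 * dt * k2[1])
    k4 = hamiltonian_field(g, q + dt * k3[0], p + dt * k3[1])
    return (q + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
            p + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)

def g_saddle(q, p):
    # a saddle: an isolated singularity that is not a local extremum
    return q * q - p * p

def integrate(g, q, p, dt, steps):
    for _ in range(steps):
        q, p = rk4_step(g, q, p, dt)
    return q, p

q0, p0 = 1.0, 0.5
q1, p1 = integrate(g_saddle, q0, p0, dt=0.01, steps=100)
```

The orbit through (1.0, 0.5) stays on the level set of `g_saddle` through that point, up to integration error, which is the tangency statement above in this special case.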
Proof of Theorem 1.2
The idea of the proof is simple and is similar to [1, Chapter I, Section 9.2]: assuming that f does not have bounded orbits, we show that there exists a continuation from the gradient flow of f̄ to the gradient flow of −f̄ in such a way that there is some global, compact, isolating neighbourhood. (Here f̄ is an arbitrary PS-extension of f.) This implies, by the continuation properties of the Conley index, that ΣA_f ≃ ΣA_{−f}, which shows by Theorem 1.1 that f and −f are stably PS-equivalent. It is useful to note that the intermediate stages of the continuation that is constructed are not gradient flows, and, therefore, we do not construct directly a PS-deformation of f to −f. Fix a metric α on R^{2n}. Let f̄ be some PS-extension of f (relative to α). Fix also a symplectic form ω on R^{2n}. Let φ, ψ, ν : [−1, 1] −→ [0, 1] be smooth functions such that φ(x) = 0 for x ≥ 0, ψ(x) = 0 for x ≤ 0, ν(x) = 0 for |x| ≥ 1/2, φ is decreasing, φ(−1) = 1, ψ is increasing, and ψ(1) = 1; ν is increasing for x < 0 and decreasing for x > 0, and ν(0) = 1; φ(x) + ψ(x) + ν(x) = 1 for all x ∈ [−1, 1]. Let X = −∇^α f̄, let H = H_{f̄} be the Hamiltonian vector field induced by f̄, and let h be the associated Hamiltonian flow. Define a new vector field on [−1, 1] × R^{2n} by V(t, x) = φ(t)X(x) + ν(t)H(x) − ψ(t)X(x). Let γ be the associated flow on [−1, 1] × R^{2n}, and let γ^τ be the restriction of γ to {τ} × R^{2n}. For a flow η and a subset K of its domain, we denote by I_η(K) the maximal invariant set of η inside K. We consider the set N = [−1, 1] × D(R, 0). This set is certainly compact. Notice that I_γ(N) = [−1, 1] × {0} ∪ I_h(D(R, 0)). This happens because
I_{γ^τ}(D(R, 0)) = {0} if τ ≠ 0 (1)
and γ^0 = h. Assume that for all x ∈ S(R, 0) the h-orbit of x is not bounded. This is equivalent to the fact that I_h(D(R, 0)) ∩ S(R, 0) = ∅. It follows that I_γ(N) ⊂ Int([−1, 1] × D(R, 0)), and therefore [−1, 1] × D(R, 0) is an isolating neighbourhood of this invariant set. The continuation properties of the Conley index, the identity (1), and the particular form of the Conley index of a neat singularity imply that ΣA_f ≃ c_{γ^{−1}}(0) = c_{γ^1}(0) ≃ ΣA_{−f}. □
Remark 8
Clearly, the same argument as above shows that, if U(R), R ∈ R_+, is any family of compact neighbourhoods of zero with mutually disjoint boundaries and such that R < R′ implies U(R) ⊂ U(R′) and ⋃ U(R) = R^n, then there is a bounded orbit in U(R) that intersects ∂U(R).
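The cutoff functions φ, ψ, ν used in the proof of Theorem 1.2 are only required to exist; one possible explicit choice is sketched below. The construction through the smooth step `smooth_step` and all function names are assumptions made for this illustration, and the vectors passed to `interpolated_field` stand in for the values X(x) and H(x) at a single point.

```python
import math

def bump(u):
    # smooth function that vanishes for u <= 0 and is positive for u > 0
    return math.exp(-1.0 / u) if u > 0.0 else 0.0

def smooth_step(u):
    # smooth, nondecreasing, equal to 0 for u <= 0 and to 1 for u >= 1
    return bump(u) / (bump(u) + bump(1.0 - u))

def phi(t):
    # 1 on [-1, -1/2], decreasing to 0 on [-1/2, 0], 0 for t >= 0
    return smooth_step(-2.0 * t)

def psi(t):
    # 0 for t <= 0, increasing to 1 on [0, 1/2], 1 on [1/2, 1]
    return smooth_step(2.0 * t)

def nu(t):
    # 1 at t = 0, vanishing for |t| >= 1/2
    return 1.0 - smooth_step(2.0 * abs(t))

def interpolated_field(t, X, H):
    # V(t, x) = phi(t) X(x) + nu(t) H(x) - psi(t) X(x), evaluated at one point x
    return tuple(phi(t) * xi + nu(t) * hi - psi(t) * xi for xi, hi in zip(X, H))

X_sample = (1.0, 2.0)    # stands for the value of -grad f at some point
H_sample = (-2.0, 1.0)   # stands for the Hamiltonian field at the same point
```

With this choice the field equals X at t = −1, the Hamiltonian field H at t = 0, and −X at t = 1, which is the continuation from the gradient flow of the extension to that of its negative through the Hamiltonian flow.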
Proof of Corollary 1.3
We first verify that tame equivalence is indeed an equivalence relation. Recall that we say that two nondepraved germs f and g are tamely equivalent if there are self-dual, nondepraved germs T_1 and T_2 and quadratic forms q_1, q_2 such that f + T_1 + q_1 and g + T_2 + q_2 are related by a PS-deformation. By Theorem 3.1 this is equivalent to A_f ∗ A_{T_1} ≃_S A_g ∗ A_{T_2}. Notice that if T_i is a nondepraved germ that is self-dual for i = 1, 2, then T_1 ⊕ T_2 is self-dual. This immediately implies that our relation is an equivalence. By Theorems 1.2 and 1.1, the statement is now obvious. Indeed, we have that Sing^T is isomorphic, via Φ, to CW^S_∗ factorized by the equivalence relation identifying X to ΣX for every CW-complex X and, additionally, killing all homotopy types of sublinks of self-dual germs. All these germs correspond via Φ to self-dual complexes. Moreover, as germs of the form f ⊕ (−f) are self-dual (as they have a sublink of the form A ∗ A^∗ with A^∗ the Spanier-Whitehead dual of A), we obtain that Sing^T is indeed a group, and the required isomorphism is immediate. If g is a nondepraved germ in the stable PS-class of [f] + [q], then A_g ≃_S A_f and, as Φ(f) ≠ 0, it follows that A_g is not (Spanier-Whitehead) self-dual; hence g has bounded orbits.
4.3. Closed orbits
In this subsection we show Corollary 1.4 and discuss its relations with results in the literature. As above, we assume a symplectic form ω fixed on R^{2n}. Recall that by the closed orbit of a flow γ we mean either an orbit generated by a point x such that there is a period T ∈ R with γ_T(x) = x or the closure of an orbit generated by a point x such that the two limits lim_{t→+∞} γ_t(x) and lim_{t→−∞} γ_t(x) exist and are equal. Recall also the ω-limits of a point x in the flow γ:
\[ \omega^{+}(x) = \bigcap_{t>0} \overline{\gamma_{[t,\infty)}(x)}, \qquad \omega^{-}(x) = \bigcap_{t>0} \overline{\gamma_{(-\infty,-t]}(x)}. \]
We start with a particular case.
LEMMA 4.1 Generically, any PS-function g : R^{2n} −→ R with a single critical point which is nondegenerate of index different from n induces a Hamiltonian flow with at least one nonconstant closed orbit.
Proof
From Theorem 1.2 we know that the function g has bounded orbits. To show that any generic such g induces a Hamiltonian flow with at least one nonconstant closed orbit, we are going to use the C^1-closing lemma of Pugh and Robinson [15] in the following particular form (see [15, Section 11.3]): there is a dense set of Hamiltonian vector fields, S, on R^{2n} that satisfy Ω_c(S) ⊂ Γ(S). Density is understood here in the C^1-Whitney strong topology; Γ(S) is the closure of the set of points on periodic trajectories; Ω_c(S) is the set of nonwandering points with at least one nonvoid ω-limit. We make a distinction here between closed orbits and periodic trajectories: the first class contains the second but also contains the orbits that close at a stationary point. Assume that the index of the critical point of g is k. There is a C^2-(Whitney-compact open topology) neighbourhood of g consisting of functions that have a single critical point that is nondegenerate and of index k. Therefore, generically, we may assume that g induces a Hamiltonian vector field H_g that satisfies Ω_c(H_g) ⊂ Γ(H_g). We may also assume that the critical point of g is zero. Assume that x ∈ R^{2n} generates a bounded orbit of the flow induced by H_g. There are two possibilities: either ω^+(x) = ω^−(x) = 0, and in this case the orbit of x closes at zero, or there is a point y ∈ ω^+(x) ∪ ω^−(x) different from zero. This point is nonwandering; notice that the orbit generated by y is also bounded, and hence its ω-limits are nonempty. Therefore y is in the closure of the space of closed orbits, and therefore this space contains more than the point zero.
Proof of Corollary 1.4
Let f : D^{2n} −→ R be a nondepraved germ, and let f̄ be any PS-extension of f.
Close to f we may find a generic family of smooth functions g such that the induced Hamiltonian vector fields are $C^1$-generic in the sense above when restricted to a closed disk D and g is a Morse-Smale function with critical points in D of distinct critical values. (This happens because these Morse functions form an open and dense set.) Assume that $a_1, \dots, a_m$ are the critical values corresponding, respectively, to the critical points $x_1, \dots, x_m$. We may apply Lemma 4.1 to each critical point $x_i$ at a time. Indeed, this lemma depends on the existence of bounded orbits of the Hamiltonian flow inside $f^{-1}([a_i - \epsilon, a_i + \epsilon])$ for small $\epsilon$ and all i. But the same argument as that used in the first step of the proof of Theorem 1.2 can be applied in this situation (see Remark 8), and it leads to the existence of closed orbits inside this set whenever the index of the critical point $x_i$ is not n (or, in other words, whenever $x_i$, as a singularity, is not self-dual). It is easy to see that the number of critical points of index k of g is bounded from below by $\mathrm{rk}(H_{k-1}(A_f; \mathbb{Z}))$. Indeed, the maximal invariant set of the flow induced by the gradient of g is compact, and, as g is very close to f, by continuation its Conley index is $\Sigma A_f$. On the other hand, the Morse complex of
g computes the homology of this Conley index, and our statement is implied by the Morse inequalities.

Remark 9
(a) It is useful to recall that the problem of finding closed orbits on some hypersurface $h^{-1}(a)$ for the Hamiltonian flow induced by a function $h : \mathbb{R}^{2n} \to \mathbb{R}$ depends only on the set $h^{-1}(a)$ and not on h. This provides the connection between our result and, for example, that of P. Rabinowitz [16] claiming that if $h^{-1}(a)$ is diffeomorphic to $S^{n-1}$ under radial projection, then it contains a periodic orbit. Indeed, if $h^{-1}(a)$ has this property, then by possibly replacing h with a different function without modifying the preimage of a, we may assume that h has a single critical point that is nondepraved and a minimum. Corollary 1.4 implies that, generically, we can find closed orbits on a hypersurface $h^{-1}(a)$. This is much weaker than Rabinowitz's result, of course. However, our result becomes of interest in the cases when the hypersurfaces in question are not compact and the singularity of h is complicated. In this noncompact setting, it appears that no other tools are available. Of course, one would like to strengthen Corollary 1.4 by producing closed orbits whenever self-duality is not present without the genericity assumption. However, some additional conditions are probably necessary as there are compact hypersurfaces that do not carry any closed orbits (see [7]).
(b) Estimating precisely the number of critical points in close Morse approximations is rather subtle. The question turns out to depend on whether the approximation needs to be only $C^0$-close or higher (see [12]). For a more general discussion on morsification, see also [17].

References
[1] C. CONLEY, Isolated Invariant Sets and the Morse Index, CBMS Regional Conf. Ser. Math. 38, Amer. Math. Soc., Providence, 1978. MR 80c:58009
[2] O. CORNEA, Cone-decompositions and degenerate critical points, Proc. London Math. Soc. (3) 77 (1998), 437–461. MR 99j:57036
[3] O. CORNEA, "Spanier-Whitehead duality and critical points" in Homotopy Theory via Algebraic Geometry and Group Representations (Evanston, Ill., 1997), Contemp. Math. 220, Amer. Math. Soc., Providence, 1998, 47–63. MR 99g:55011
[4] O. CORNEA, Homotopical dynamics: Suspension and duality, Ergodic Theory Dynam. Systems 20 (2000), 379–391. MR CMP 1 756 976
[5] O. CORNEA, Homotopical dynamics, II: Hopf invariants, smoothings and the Morse complex, preprint, 1998, arXiv:math.GT/9812103
[6] E. N. DANCER, Degenerate critical points, homotopy indices and Morse inequalities, J. Reine Angew. Math. 350 (1984), 1–22. MR 85i:58033
[7] V. GINZBURG, Some remarks on symplectic actions of compact groups, Math. Z. 210 (1992), 625–640. MR 93h:57053
[8] M. GORESKY and R. MACPHERSON, Stratified Morse Theory, Ergeb. Math. Grenzgeb. (3) 14, Springer, Berlin, 1988. MR 90d:57039
[9] H. HOFER and E. ZEHNDER, Symplectic Invariants and Hamiltonian Dynamics, Birkhäuser Adv. Texts Basler Lehrbücher, Birkhäuser, Basel, 1994. MR 96g:58001
[10] J. F. P. HUDSON, Concordance, isotopy, and diffeotopy, Ann. of Math. (2) 91 (1970), 425–448. MR 41:4549
[11] H. KING, Real analytic germs and their varieties at isolated singularities, Invent. Math. 37 (1976), 193–199. MR 54:13114
[12] H. KING, The number of critical points in Morse approximations, Compositio Math. 34 (1977), 285–288. MR 56:1330
[13] J. W. MILNOR, Singular Points of Complex Hypersurfaces, Ann. of Math. Stud. 61, Princeton Univ. Press, Princeton, 1968. MR 39:969
[14] R. PALAIS, Lusternik-Schnirelman theory on Banach manifolds, Topology 5 (1966), 115–132. MR 41:4584
[15] C. C. PUGH and C. ROBINSON, The $C^1$ closing lemma, including Hamiltonians, Ergodic Theory Dynam. Systems 3 (1983), 261–313. MR 85m:58106
[16] P. H. RABINOWITZ, Periodic solutions of Hamiltonian systems, Comm. Pure Appl. Math. 31 (1978), 157–184. MR 57:7674
[17] J. REINECK, Continuation to the minimal number of critical points in gradient flows, Duke Math. J. 68 (1992), 185–194. MR 93i:58028
[18] E. ROTHE, A relation between the type numbers of a critical point and the index of the corresponding field of gradient vectors, Math. Nachr. 4 (1951), 12–17. MR 12:720c
[19] D. SALAMON, Connected simple systems and the Conley index of isolated invariant sets, Trans. Amer. Math. Soc. 291 (1985), 1–41. MR 87e:58182
[20] E. H. SPANIER, Function spaces and duality, Ann. of Math. (2) 70 (1959), 338–378. MR 21:6584
[21] F. TAKENS, The minimal number of critical points of a function on a compact manifold and the Lusternik-Schnirelman category, Invent. Math. 6 (1968), 197–244. MR 38:5235
[22] R. THOM, Ensembles et morphismes stratifiés, Bull. Amer. Math. Soc. 75 (1969), 240–284. MR 39:970
[23] C. T. C. WALL, Classification problems in differential topology, IV: Thickenings, Topology 5 (1966), 73–94. MR 33:734
[24] H. WHITNEY, Elementary structure of real algebraic varieties, Ann. of Math. (2) 66 (1957), 545–556. MR 20:2342
Université de Lille 1, Unité de Formation et de Recherche de Mathématiques, 59655 Villeneuve d'Ascq, France;
[email protected], http://www-gat.univ-lille1.fr/~cornea/octav.html
DUKE MATHEMATICAL JOURNAL c 2001 Vol. 109, No. 2,
THE ASYMPTOTICS OF MONOTONE SUBSEQUENCES OF INVOLUTIONS JINHO BAIK AND ERIC M. RAINS
Abstract
We compute the limiting distributions of the lengths of the longest monotone subsequences of random (signed) involutions with or without conditions on the number of fixed points (and negated points) as the sizes of the involutions tend to infinity. The resulting distributions are, depending on the number of fixed points, (1) the Tracy-Widom distributions for the largest eigenvalues of random GOE, GUE, GSE matrices, (2) the normal distribution, or (3) new classes of distributions which interpolate between pairs of the Tracy-Widom distributions. We also consider the second rows of the corresponding Young diagrams. In each case the convergence of moments is also shown. The proof is based on the algebraic work of J. Baik and E. Rains in [7], which establishes a connection between the statistics of random involutions and a family of orthogonal polynomials, and on an asymptotic analysis of the orthogonal polynomials, which is obtained by extending the Riemann-Hilbert analysis for the orthogonal polynomials by P. Deift, K. Johansson, and Baik in [3].

Received 23 February 2000. Revision received 5 February 2001.
2000 Mathematics Subject Classification. Primary 60C05; Secondary 45E05, 05A05.
Baik's work supported in part by a Sloan Doctoral Dissertation Fellowship during the academic year 1998–1999 as a graduate student at Courant Institute of Mathematical Sciences.

1. Introduction

β-Plancherel measure
In the last few years, it has been observed by many authors that there are certain connections between random permutations and/or Young tableaux, and random matrices. One of the earliest clues to this relationship appeared in the work of A. Regev [41] in 1981. A Young diagram, or equivalently a partition $\lambda = (\lambda_1, \lambda_2, \dots) \vdash n$ ($\lambda_1 \ge \lambda_2 \ge \cdots$, $\sum_j \lambda_j = n$), is an array of n boxes with top and left adjusted as in the first picture of Figure 1, which represents the example $\lambda = (4, 3, 1) \vdash 8$. A standard Young tableau Q is a filling of the diagram λ by the numbers 1, 2, . . . , n such that the numbers are increasing along each row and along each column. In this case, we say that the tableau Q has the shape λ. The second picture in Figure 1 is an example
Figure 1. Young diagram and standard Young tableau
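of a standard Young tableau with shape $\lambda = (4, 3, 1)$. Let $d_\lambda$ denote the number of standard Young tableaux of shape λ. A result of [41] is that for fixed β > 0 and fixed l, as n → ∞,
$$\sum_{\substack{\lambda \vdash n \\ \lambda_1 \le l}} d_\lambda^{\beta} \sim \biggl[ \frac{l^{l^2/2}\, l^{n}}{(\sqrt{2\pi})^{(l-1)/2}\, n^{(l-1)(l+2)/4}} \biggr]^{\beta} \frac{n^{(l-1)/2}}{l!} \int_{\mathbb{R}^l} e^{-(1/2)\beta l \sum_j x_j^2} \prod_{j<k} |x_j - x_k|^{\beta} \, d^l x. \tag{1.1}$$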
The multiple integral on the right-hand side is called the Selberg integral, which can be computed exactly for each β in terms of the gamma function. In particular, when β = 1, 2, 4, this integral is the normalization constant of the eigenvalue density of a random matrix taken from the Gaussian orthogonal ensemble (GOE), Gaussian unitary ensemble (GUE), Gaussian symplectic ensemble (GSE), respectively (see, e.g., [36]). Motivated by this result, we define the β-Plancherel measure on the set $Y_n$ of Young diagrams (or partitions) of size n by
$$M_n^{\beta}(\lambda) := \frac{d_\lambda^{\beta}}{\sum_{\mu \vdash n} d_\mu^{\beta}}, \qquad \lambda \in Y_n. \tag{1.2}$$
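As a small illustration of (1.2), the following sketch computes $d_\lambda$ by the hook length formula and tabulates $M_n^\beta$ exactly for small n. The hook length formula is a standard fact that is not stated in the text above, the function names are illustrative only, and the sketch assumes a positive integer β so that the weights stay exact.

```python
from fractions import Fraction
from math import factorial


def partitions(n, largest=None):
    """Yield all partitions of n as non-increasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def num_syt(shape):
    """Number d_lambda of standard Young tableaux (hook length formula)."""
    n = sum(shape)
    # column_len[j] = number of rows that reach column j
    column_len = [sum(1 for part in shape if part > j) for j in range(shape[0])] if shape else []
    hook_product = 1
    for i, row_len in enumerate(shape):
        for j in range(row_len):
            arm = row_len - j - 1        # boxes to the right in the same row
            leg = column_len[j] - i - 1  # boxes below in the same column
            hook_product *= arm + leg + 1
    return factorial(n) // hook_product


def beta_plancherel(n, beta):
    """Return {lambda: M_n^beta(lambda)} as exact fractions (integer beta assumed)."""
    weights = {lam: num_syt(lam) ** beta for lam in partitions(n)}
    total = sum(weights.values())
    return {lam: Fraction(w, total) for lam, w in weights.items()}
```

For instance, `num_syt((4, 3, 1))` counts the standard Young tableaux of the shape used in Figure 1, and `beta_plancherel(4, 1)` and `beta_plancherel(4, 2)` give the two measures on partitions of 4 that are compared in the text.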
A natural question is the limiting statistics of a random $\lambda \in Y_n$ under $M_n^\beta$ as n → ∞. The case when β = 2 is quite well studied. In this case, $M_n^2$ is called the Plancherel measure that arises in the representation theory of the symmetric group $S_n$. Denote by $L_n^{(k)}$ the random variable $\lambda_k$ under the Plancherel measure $M_n^2$, and set $L_n = L_n^{(1)}$. In 1977, the limiting expected shape of λ under $M_n^2$ was obtained in [48], and independently in [35] for the so-called Poissonized Plancherel measure. In particular, it was shown that
$$\lim_{n \to \infty} \frac{E(L_n)}{\sqrt{n}} = 2. \tag{1.3}$$
A central limit theorem for $L_n$ was then obtained in [3]:
$$\lim_{n \to \infty} \Pr\biggl( \frac{L_n - 2\sqrt{n}}{n^{1/6}} \le x \biggr) = F_2(x), \tag{1.4}$$
where $F_2$ is the so-called Tracy-Widom distribution function, which is expressed in terms of a solution to the Painlevé II equation (see Definition 2 in Section 2). The connection to random matrix theory comes from this function $F_2$: in 1994, C. Tracy and H. Widom [44] proved that after proper centering and scaling (which is different from the scaling for $L_n$ in (1.4)), the largest eigenvalue of a random matrix taken from the GUE has the same limiting distribution given by $F_2$. In other words, after proper centering and scaling, the first row of a random Young diagram under the Plancherel measure behaves statistically for large n like the largest eigenvalue of a random GUE matrix. In the same paper [3], it was conjectured that $L_n^{(k)}$ of a random $\lambda \in Y_n$ under $M_n^2$ has the same limiting distribution as the kth largest eigenvalue of a random GUE matrix for each k. This conjecture was supported by numerical simulations of A. Odlyzko and Rains, and it was proved to be true for the second row $L_n^{(2)}$ in [4]. The full conjecture for the general row $L_n^{(k)}$ was subsequently proved in [38], [10], and [32], independently. The authors of [38], [10], and [32] proved the convergence in joint distribution for general rows, and the authors of [10] and [32] obtained discrete sine kernel representations for the so-called bulk scaling limit of correlation functions, an analogue of the sine kernel that appears in the GUE matrix case. The authors of [3] and [4], in addition to convergence in distribution, also proved convergence of moments for $L_n^{(1)}$ and $L_n^{(2)}$, respectively. Convergence of (joint) moments for the general rows was obtained recently in [5]. We also mention that there are many works on similar relationships between tableaux/combinatorics and GUE random matrices (see, e.g., [46], [9], [31], [47], [28], [34], [43]).
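The first-row statistic discussed above is easy to experiment with numerically, because the first row of the Plancherel-distributed diagram has the same law as the longest increasing subsequence of a uniform random permutation. The sketch below is only an illustration: it uses patience sorting (a standard algorithm, not taken from this paper) to compute that length and estimates $E(L_n)/\sqrt{n}$ by Monte Carlo; the function names and the default seed are arbitrary choices.

```python
import bisect
import math
import random


def lis_length(seq):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []  # tails[k] = smallest last element of an increasing subsequence of length k + 1
    for value in seq:
        pos = bisect.bisect_left(tails, value)
        if pos == len(tails):
            tails.append(value)
        else:
            tails[pos] = value
    return len(tails)


def mean_lis_ratio(n, trials, seed=0):
    """Monte Carlo estimate of E(L_n) / sqrt(n) for uniform random permutations."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        perm = list(range(1, n + 1))
        rng.shuffle(perm)
        total += lis_length(perm)
    return total / (trials * math.sqrt(n))
```

For moderate n the estimate sits visibly below the limiting value 2 in (1.3), which is consistent with a correction of order $n^{1/6}$ as suggested by the scaling in (1.4).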
We refer readers to [1], [37], and [12] for a survey and history of $L_n$, and to [36] as a general reference on random matrix theory (see also the recent book [11]). One of the main topics in this paper is the limiting statistics of $\lambda \in Y_n$ under $M_n^1$. From (1.1) and the results for the case when β = 2, one might guess that for β = 1 the limiting statistics of a random λ are the same as the limiting statistics of the eigenvalues of GOE matrices. We establish this fact for the first two rows. More precisely, denoting by $\tilde{L}_n^{(k)}$ the random variable $\lambda_k$ of a random $\lambda \in Y_n$ under $M_n^1$, we prove (see Theorems 3.4 and 3.6 together with the remark that follows) that
$$\lim_{n \to \infty} \Pr\biggl( \frac{\tilde{L}_n^{(k)} - 2\sqrt{n}}{n^{1/6}} \le x \biggr) = F_1^{(k)}(x), \qquad k = 1, 2, \tag{1.5}$$
where $F_1^{(k)}(x)$ is the limiting distribution function (see [45]) for the (scaled) kth largest eigenvalue of a random matrix taken from GOE. We also prove convergence
of moments. As in the case of β = 2, we expect that the above result should extend to the general rows k ≥ 3 and also to the joint distributions. For general values of β > 0, again from (1.1), we expect that in the large n limit the rows of a random Young diagram under $M_n^\beta$ correspond to the Coulomb charges on the real line with the quadratic potential at the inverse temperature β, which specializes to GOE, GUE, GSE eigenvalue distributions for the cases β = 1, 2, 4, respectively. This conjecture seems natural from the perspective of the discrete Coulomb gas interpretation for the Plancherel measure case $M_n^2$ given by Johansson [31], [32].

Random involutions
The Plancherel measure $M_n^2$ has a nice combinatorial interpretation. The well-known Robinson-Schensted correspondence in [42] establishes a bijection between the permutations π of size n and the pairs of standard Young tableaux (P, Q) where the shape of P and the shape of Q are the same and the shape of P (or Q), denoted by λ(π), is a partition of n. Thus the Plancherel measure $M_n^2$ on $Y_n$ is the pushforward of the uniform probability measure on $S_n$. Moreover, under this correspondence, $\lambda_1(\pi)$ is equal to the length of the longest increasing subsequence of π. More generally, a theorem of C. Greene [26] says that $\lambda_1(\pi) + \cdots + \lambda_k(\pi)$ is equal to the length of the longest so-called k-increasing subsequence of π. Thus the difference of the lengths of the longest k-increasing subsequence and the longest (k − 1)-increasing subsequence of $\pi \in S_n$ under the uniform probability measure is equal to $\lambda_k$ of $\lambda \in Y_n$ under the Plancherel measure $M_n^2$ in the sense of joint distributions. Thus, for example, (1.3) and (1.4) can be restated as results on the longest increasing subsequence of a random permutation. On the other hand, the sum of the lengths of the first k columns of λ is equal to the length of the longest k-decreasing subsequence of the corresponding π.
But since the transpose $\lambda^t$ has the same statistics as λ under $M_n^2$, the results (1.3) and (1.4) also hold for the longest decreasing subsequence of a random permutation. The measure $M_n^1$ also has a combinatorial interpretation. If π is mapped to (P, Q) under the Robinson-Schensted correspondence, then $\pi^{-1}$ is mapped to (Q, P) (see, e.g., [33, Sec. 5.1.4]). Therefore the set of involutions $\pi = \pi^{-1} \in S_n$ is in bijection with the set of standard Young tableaux whose shapes are partitions of n. Consequently, the uniform probability measure on the set of involutions
$$\tilde{S}_n = \{\pi \in S_n : \pi = \pi^{-1}\} \tag{1.6}$$
is pushed forward to the 1-Plancherel measure $M_n^1$ on $Y_n$. Thus the result (1.5) for k = 1 implies that in the large n limit the length of the longest increasing (also decreasing) subsequence of a random involution behaves statistically like the largest eigenvalue of a random GOE matrix. An involution $\pi \in \tilde{S}_n$ consists of only 1-cycles and 2-cycles. It turns out that if we put a condition on the number of 1-cycles (or fixed points) of π, the limiting
distribution is different. Introduce a new ensemble,
$$S_{n,m} = \{\pi \in \tilde{S}_{2n+m} : |\{x : \pi(x) = x\}| = m\}. \tag{1.7}$$
For an involution π, the number of fixed points is equal to the number of odd parts of $\lambda^t$ (see [33]). Equivalently, the number of fixed points of π is equal to $\lambda_1 - \lambda_2 + \lambda_3 - \cdots$. Thus the uniform probability measure on the set $S_{n,m}$ is pushed forward to the measure
$$\frac{d_\lambda}{\sum_{\mu \in Y_{n,m}} d_\mu}, \qquad \lambda \in Y_{n,m}, \tag{1.8}$$
where
$$Y_{n,m} = \Bigl\{\lambda = (\lambda_1, \lambda_2, \dots) \in Y_{2n+m} : \sum_j (-1)^{j-1} \lambda_j = m\Bigr\}. \tag{1.9}$$
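The facts used above (an involution is sent to a pair (P, P), and its number of fixed points equals the alternating sum of the row lengths) can be checked on small examples. The following is a minimal sketch of Robinson-Schensted row insertion written for this purpose; the helper names are illustrative and not taken from the paper.

```python
import bisect


def rsk(perm):
    """Robinson-Schensted row insertion; returns (P, Q) as lists of rows."""
    P, Q = [], []
    for step, value in enumerate(perm, start=1):
        row = 0
        while True:
            if row == len(P):
                # start a new row at the bottom
                P.append([value])
                Q.append([step])
                break
            pos = bisect.bisect_left(P[row], value)
            if pos == len(P[row]):
                # value is larger than everything in this row: append it
                P[row].append(value)
                Q[row].append(step)
                break
            # bump the first larger entry into the next row
            P[row][pos], value = value, P[row][pos]
            row += 1
    return P, Q


def shape(tableau):
    """Row lengths of a tableau."""
    return [len(row) for row in tableau]


def fixed_points(perm):
    """Number of x with perm(x) = x (perm in one-line notation, 1-indexed)."""
    return sum(1 for x, image in enumerate(perm, start=1) if x == image)


def alternating_sum(lam):
    """lambda_1 - lambda_2 + lambda_3 - ..."""
    return sum((-1) ** j * part for j, part in enumerate(lam))
```

For the involution 1 5 3 7 2 6 4 (one-line notation), `rsk` returns the same tableau twice, of shape (3, 2, 2), and the alternating sum 3 − 2 + 2 = 3 matches its three fixed points 1, 3, 6.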
Note that the rows and columns of $\lambda \in Y_{n,m}$ now have different distributions. We denote by $L^{,(k)}_{n,m}$ and $L^{,(k)}_{n,m}$ the random variables given by the lengths of the kth row and the kth column, respectively, of a random $\lambda \in Y_{n,m}$ under the measure (1.8). We also set $L_{n,m} = L^{,(1)}_{n,m}$ and $L_{n,m} = L^{,(1)}_{n,m}$, the length of the longest increasing and decreasing subsequences of a random $\pi \in S_{n,m}$ under the uniform probability measure. Set
$$\alpha = \frac{m}{\sqrt{2n}}. \tag{1.10}$$
The limiting distribution of $L_{n,m}$ differs depending on α. Indeed, we prove in Theorem 3.1 and Theorem 9.2 that
$$\lim_{n \to \infty} \Pr\biggl( \frac{L_{n,[\alpha\sqrt{2n}]} - 2\sqrt{2n+m}}{(2n+m)^{1/6}} \le x \biggr) = F_4(x), \qquad 0 \le \alpha < 1, \tag{1.11}$$
$$\lim_{n \to \infty} \Pr\biggl( \frac{L_{n,[\sqrt{2n}]} - 2\sqrt{2n+m}}{(2n+m)^{1/6}} \le x \biggr) = F_1(x), \qquad \alpha = 1, \tag{1.12}$$
$$\lim_{n \to \infty} \Pr\biggl( \frac{L_{n,[\alpha\sqrt{2n}]} - (\alpha + 1/\alpha)\sqrt{2n+m}}{\sqrt{1/\alpha - 1/\alpha^3}\,(2n+m)^{1/4}} \le x \biggr) = \operatorname{erf}(x), \qquad \alpha > 1, \tag{1.13}$$
where $F_4$ and $F_1$ are the distributions for the limiting fluctuations of the largest eigenvalues of random GSE and GOE matrices, respectively, and erf is the standard normal distribution. Again, we also prove convergence of moments. We note that $F_4 = F_1^{(2)}$; the limiting distributions of the largest eigenvalue of GSE and the second largest eigenvalue of GOE are the same (see the discussion at the end of Section 3). The role of the number of fixed points for the limiting distribution can be seen from the following point selection picture. Consider a unit square [0, 1] × [0, 1] in the plane, and set δ = {(x, x) : 0 ≤ x ≤ 1}, the diagonal. Suppose we select n points at
Figure 2. Point selection process (the example shown corresponds to the permutation sending 1 2 3 4 5 6 7 to 1 5 3 7 2 6 4)
random in the lower triangle 0 ≤ x < y ≤ 1, and suppose we take the mirror image of the points about the diagonal δ. We also select m points at random on the diagonal δ. Hence there is a total of 2n + m points in the square. As illustrated in Figure 2, one such choice of points gives rise to a permutation π satisfying $\pi^2 = 1$ with m fixed points; that is, $\pi \in S_{n,m}$. The length $L_{n,m}(\pi)$ of the longest increasing subsequence of π is then equal to the "length" of the longest (piecewise linear) up/right path in the square from (0, 0) to (1, 1), where the "length" of a path is defined by the number of points on the path. The length of the longest up/right path in the above point selection process has the same distribution as $L_{n,m}$. Now, note that the points on δ form an increasing path. When m is large compared to n, there are many points on δ and we expect that the longest path consists mostly of diagonal points. Hence we are in the linear statistics situation, and thus the order of fluctuation of the length of the longest path is expected to be $(\text{mean})^{1/2}$ by the usual central limit theorem. On the other hand, when m is small compared to n, the longest path contains few diagonal points (none if m = 0) and we are in the situation of a 2-dimensional maximization problem. In this case it has been believed, and in a few cases (e.g., [3], [32], [25]) it has been proved, that the fluctuation has order $(\text{mean})^{1/3}$. (For random permutations, which have a similar interpretation as a point selection process, one can see from the scaling in (1.4) that the fluctuation is of order $(\text{mean})^{1/3}$.) Thus there must be a transition of the limiting distribution as the size of m varies. The results (1.11)–(1.13) show that α = 1 is the transition point. The fixed points play the role of adding a special line in the 2-dimensional maximization problem (see [6] for a relevant work where two special lines are added to a 2-dimensional maximization problem). We note that when α = 1, $L_{n,[\sqrt{2n}]}$ in (1.12) has the same limiting distribution as $\tilde{L}_n^{(1)}$ (see (1.5)). This is because the typical number of fixed points of a random involution of size k is $\sqrt{k}$.
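The heuristic that the diagonal points always form an increasing path can be tested directly on samples. The sketch below is an illustration under stated assumptions: rather than simulating points in the square, it samples the induced permutation directly, drawing a uniform involution with n two-cycles and m fixed points, and then computes its longest increasing subsequence by patience sorting. All names are hypothetical.

```python
import bisect
import random


def random_involution(n, m, rng):
    """Uniform random involution of {1, ..., 2n+m} with n two-cycles and m fixed points."""
    size = 2 * n + m
    labels = list(range(1, size + 1))
    rng.shuffle(labels)
    perm = [0] * (size + 1)  # 1-indexed scratch array; perm[0] is unused
    for x in labels[:m]:     # the first m shuffled labels become fixed points
        perm[x] = x
    rest = labels[m:]        # the remaining 2n labels are paired off consecutively
    for a, b in zip(rest[0::2], rest[1::2]):
        perm[a], perm[b] = b, a
    return perm[1:]


def lis_length(seq):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []
    for value in seq:
        pos = bisect.bisect_left(tails, value)
        if pos == len(tails):
            tails.append(value)
        else:
            tails[pos] = value
    return len(tails)
```

Every sample satisfies `lis_length(perm) >= m`, since the fixed points alone form an increasing subsequence; comparing the sampled lengths for small and large m/√(2n) gives a rough picture of the two regimes described above.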
Indeed, the result (1.5) is proved by using (1.11)–(1.13) and by taking a summation over the number of fixed points (see Section 8), to which only α = 1 gives the main contribution. Once the transition point α = 1 is known, it is of interest to investigate the transition more carefully. We set
$$\alpha = 1 - \frac{2w}{(2n)^{1/6}}, \tag{1.14}$$
and we take n → ∞ while keeping w fixed. We prove (see Theorem 3.2) that there is a one-parameter family of distribution functions $F(x; w)$, which is expressed in terms of the Riemann-Hilbert representation for the Painlevé II equation (see Definition 4), such that
$$\lim_{n \to \infty} \Pr\biggl( \frac{L_{n,[\alpha\sqrt{2n}]} - 2\sqrt{2n+m}}{(2n+m)^{1/6}} \le x \biggr) = F(x; w). \tag{1.15}$$
The new class of distributions $F(x; w)$ interpolates $F_4$ and $F_1$ as w → ∞ and w = 0, respectively, and satisfies $\lim_{w \to -\infty} F(x; w) = 0$, so (1.15) is consistent with (1.11)–(1.13). Alternatively, since $F_4 = F_1^{(2)}$, $F(x; w)$ interpolates the limiting distributions of the second and first eigenvalues of a random GOE matrix. The meaning of $F(x; w)$ in terms of random matrices is not clear, but there is a Coulomb gas interpretation. In [7, (7.64)–(7.65)], the following density function is introduced. Suppose 2N ordered particles on the positive real line, $0 < \xi_{2N} < \cdots < \xi_2 < \xi_1$, are distributed according to the density function
$$\frac{1}{Z_{2N}}\, e^{A \sum_{j=1}^{2N} (-1)^j \xi_j} \prod_{1 \le i < j \le 2N} (\xi_i - \xi_j) \prod_{j=1}^{2N} e^{-\xi_j/2} \, d\xi_j, \tag{1.16}$$
where $Z_{2N}$ is the normalization constant. Hence, in addition to the usual Coulomb gas interaction, there is an additional attraction between neighboring pairs $\xi_{2j-1}$ and $\xi_{2j}$, j = 1, . . . , N. When A = 0, this attraction vanishes, and one sees that (1.16) is the eigenvalue density for the 2N × 2N Laguerre orthogonal ensemble (LOE). On the other hand, when A → ∞, the neighboring particles $\xi_{2j-1}$, $\xi_{2j}$, j = 1, 2, . . . , N, coalesce, and (1.16) becomes the eigenvalue density function for the N × N Laguerre symplectic ensemble (LSE). Thus (1.16) interpolates LOE and LSE eigenvalue distributions. This density function arises in a symmetric version of the growth model with a growth rule given by the exponential distribution considered in [31, Prop. 1.4]. A discrete version of the above density function was considered in [8, (4.27)], and the limiting distribution of the largest particle was precisely $F(x; w)$. Now formally taking the exponential limit (set q = 1 − 1/L and α = 1 − A/L, and take L → ∞) of [8, (4.27)] (which is convincing but is not justified yet), we see that if we set
$$A = \frac{w}{N^{1/3}} \tag{1.17}$$
in (1.16) and if we take N → ∞, the scaled largest particle (ξ1 −4N )/(2N 1/3 ) has the limiting distribution F (x; w). We plan to exploit the justification of the exponential limit in a later publication. On the other hand, the longest decreasing subsequence corresponds to the longest down/right path from (0, 1) to (1, 0) in the above point selection process. Thus it is clear that the distribution of L n,m is insensitive to the number of fixed points (see Section 3 for results and discussions). The other ensemble we consider is the set of signed involutions. A signed permutation π is a bijection from {−n, . . . , −1, 1, . . . , n} onto itself which satisfies π(x) = −π(−x). The limiting distribution for a random signed permutation is obtained by [46] and [9]. In this paper we consider random signed involutions with/without constraints on fixed points and also on negated points (π(x) = −x) (see Section 3 for results). Especially, we obtain another one-parameter family of distributions which now interpolates F2 and F12 . Here F12 means the limiting distribution for the largest “eigenvalue” of the superimposition of the eigenvalues of two random GOE matrices. We note that in this case, F2 is equal to the second largest “eigenvalue” of such superimposition (see the discussion at the end of Section 3). For convenience of future reference, we summarize various definitions introduced above. By the kth row/column of π we mean the kth row/column of the corresponding Young diagram under the Robinson-Schensted map. Definition 1 Let Sn be the symmetric group of n letters, and let Sn be the set of bijections from {−n, . . . , −2, −1, 1, 2, . . . , n} onto itself satisfying π(x) = −π(−x). We define S˜n = {π ∈ Sn : π = π −1 }, Sn,m = π ∈ S˜2n+m : |{x : π(x) = x}| = m ,
S˜n = {π ∈ Sn : π = π −1 }, Sn,m + ,m − = π ∈ S˜2n+m + +m − : |{x : π(x) = x}| = 2m + , |{x : π(x) = −x}| = 2m − , ˜ L˜ (k) n (π) = the length of the kth row of π ∈ Sn ,
L˜ n
,(k)
(π) = the length of the kth row of π ∈ S˜n ,
,(k) L n,m (π) = the length of the kth row of π ∈ Sn,m , ,(k) (π) L n,m
,(k) L n,m (π) + ,m −
(1.18) (1.19) (1.20)
(1.21) L˜ n = L˜ (1) n , L˜ n = L˜ n
,(1) L n,m = L n,m ,
,(1)
(1.22) ,
(1.23)
(1.24)
= the length of the kth column of π ∈ Sn,m , ,(1) L n,m = L n,m ,
= the length of the kth row of π ∈ Sn,m + ,m − ,
(1.25)
,(1) L n,m + ,m − = L n,m . + ,m −
(1.26)
The results of this paper were announced in [8]. Since we completed this paper, there have been two applications. One is to random vicious walker models (see [23], [8]), and the other is to polynuclear growth models (see [40], [39], [6]). Indeed, there are bijections between the above two applications and various ensembles considered in this paper, and thus the results in this paper can be employed to answer asymptotic questions in the above applications. The proofs of our theorems use the Poissonization and de-Poissonization scheme of [30] and [3]. We define the Poisson generating function, for example, for $L_{n,m}$ by (see Definition 6)
$$Q_l(\lambda_1, \lambda_2) := e^{-\lambda_1 - \lambda_2} \sum_{n_1, n_2 \ge 0} \frac{\lambda_1^{n_1} \lambda_2^{n_2}}{n_1!\, n_2!} \Pr\bigl( L_{n_2, n_1} \le l \bigr). \tag{1.27}$$
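Read probabilistically, (1.27) is the expectation of $\Pr(L_{n_2,n_1} \le l)$ when $n_1$ and $n_2$ are independent Poisson variables with means $\lambda_1$ and $\lambda_2$. A minimal numerical sketch of the truncated double series is given below; the callable `prob` and the truncation parameter `cutoff` are assumptions introduced for illustration and do not appear in the paper.

```python
import math


def poisson_generating_function(prob, lam1, lam2, cutoff):
    """Truncated form of (1.27), summing over 0 <= n1, n2 < cutoff.

    prob(n2, n1) is a user-supplied (hypothetical) callable returning
    Pr(L_{n2,n1} <= l) for the fixed value of l under consideration.
    """
    total = 0.0
    for n1 in range(cutoff):
        # Poisson weight for the number n1 of fixed points
        w1 = math.exp(-lam1) * lam1 ** n1 / math.factorial(n1)
        for n2 in range(cutoff):
            # Poisson weight for the number n2 of two-cycles
            w2 = math.exp(-lam2) * lam2 ** n2 / math.factorial(n2)
            total += w1 * w2 * prob(n2, n1)
    return total
```

Two sanity checks follow from the definition: with `prob` identically 1 the sum returns the total Poisson mass (close to 1 for a large enough cutoff), and for l = 0 only the empty permutation contributes, so the value is $e^{-\lambda_1 - \lambda_2}$.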
A generalization of the de-Poissonization lemma due to Johansson [30] yields that $\Pr(L_{n_2,n_1} \le l) \sim Q_l(n_1, n_2)$ as $n_1, n_2 \to \infty$ (see Section 6 for the precise statement). Thus if we obtain the asymptotics of the generating function, the asymptotics of the coefficients follow. The point of the scheme is that the Poisson generating functions can be expressed in terms of Toeplitz and/or Hankel determinants. The necessary algebraic work for this purpose was done in our earlier paper [7]. The general theory of orthogonal polynomials then tells us that Toeplitz/Hankel determinants can be expressed in terms of orthogonal polynomials. In [7], it turned out that for all the ensembles discussed in this paper, we need only one family of orthogonal polynomials $\pi_n(z; t) = z^n + \cdots$, which is orthogonal with respect to the weight $e^{t\cos\theta}\, d\theta/(2\pi)$ on the unit circle. This orthogonal polynomial is precisely the same orthogonal polynomial used in [3] to analyze the random permutation problem. The authors of [3] computed the uniform asymptotics of the normalization constant $N_n(t)$ of $\pi_n$ as n, t → ∞ using the steepest-descent analysis for the corresponding Riemann-Hilbert problem (see (5.3)). The difference between the present paper and [3] is that we need $\pi_n(-\alpha; t)$ for all α ≥ 0, whereas in [3] only one quantity, $N_n(t)$, was needed. But in order to analyze $N_n(t)$, the authors of [3] controlled in a uniform way the asymptotic behavior of the solution to the associated Riemann-Hilbert problem. Therefore the asymptotics of $\pi_n(-\alpha; t)$ for α uniformly away from 1 can be (almost) directly read off from the analysis of [3] and eventually imply (1.11). The point z = −1 (α = 1) in the complex plane plays a special role in this Riemann-Hilbert analysis, as discussed in [3]: it is the point where a gap starts to open up in the support of the associated equilibrium measure as the relation of t to n varies.
When α → 1 as n → ∞ according to the scaling (1.14) (which is required for (1.15)), we need a more careful analysis of the Riemann-Hilbert problem, which is the new part
of the asymptotic analysis of the orthogonal polynomials and the Riemann-Hilbert problem. In this paper we establish this goal by extending the analysis of [3]. In the analysis of the Riemann-Hilbert problem in Section 10, we give a rather sketchy presentation for the parts that overlap the work of [3], but we give a full proof for the new analysis required for the case α → 1. We also rework portions of [3] as necessary for consistency of presentation. This paper is organized as follows. Section 2 defines the Tracy-Widom distribution functions as well as new classes of distribution functions which are used to state the main results, and their properties are discussed (see Lemma 2.1). The main results of the paper are then stated in Section 3. Determinantal formulae and orthogonal polynomial expressions for the Poisson generating functions are taken from [7] and summarized in Section 4. In Section 5 we state the main estimates of the relevant quantities of orthogonal polynomials; these estimates are key to the proofs of the theorems of Section 3. The de-Poissonization lemmas are stated in Section 6. Proofs of the main theorems are given in Section 7 for involutions with constraints on the number of fixed points (see Theorems 3.1, 3.2, 3.3, and 3.5), and in Section 8 for general involutions and equivalently for Mn1 (see Theorems 3.4 and 3.6), respectively. The case when α > 1 is considered in Section 9 (see remark to Theorem 3.3). Finally, the Riemann-Hilbert analysis is given in Section 10, which proves the propositions in Section 5. Notational remarks The ensemble S˜n in the present paper is identical to S˜n in [7]. In [7], S˜n was introduced to denote the ensemble of “neginvolutions,” and we investigated the longest increasing subsequence of π ∈ S˜n . But there is a bijection between S˜n and S˜n , and the longest increasing subsequence of π ∈ S˜n corresponds to the longest decreasing subsequence of the image of π. 
In the present paper we choose the viewpoint of considering both the increasing and decreasing subsequences of involutions of the same ensemble rather than considering only the increasing subsequences of involutions of the different ensembles.

2. Limiting distribution functions
Let u(x) be the solution of the Painlevé II (PII) equation
$$u_{xx} = 2u^3 + xu \tag{2.1}$$
with the boundary condition
$$u(x) \sim -\mathrm{Ai}(x) \quad \text{as } x \to +\infty, \tag{2.2}$$
where Ai is the Airy function. The (global) existence and uniqueness of the solution were first established in [27]; the asymptotics as x → ±∞ are (see, e.g., [27], [19])
$$u(x) = -\mathrm{Ai}(x) + O\biggl( \frac{e^{-(4/3)x^{3/2}}}{x^{1/4}} \biggr) \quad \text{as } x \to +\infty, \tag{2.3}$$
$$u(x) = -\sqrt{\frac{-x}{2}} \biggl( 1 + O\Bigl( \frac{1}{x^2} \Bigr) \biggr) \quad \text{as } x \to -\infty. \tag{2.4}$$
Recall that $\mathrm{Ai}(x) \sim e^{-(2/3)x^{3/2}} / (2\sqrt{\pi}\, x^{1/4})$ as x → +∞. Define
$$v(x) := \int_{\infty}^{x} (u(s))^2 \, ds, \tag{2.5}$$
so that $v'(x) = (u(x))^2$. We can now introduce the Tracy-Widom (TW) distributions. (Note that q := −u, which Tracy and Widom used in their papers, solves the same differential equation with the boundary condition q(x) ∼ +Ai(x) as x → ∞.)

Definition 2 (TW distribution functions)
Set
$$F(x) := \exp\biggl( \frac{1}{2} \int_x^{\infty} v(s)\, ds \biggr) = \exp\biggl( -\frac{1}{2} \int_x^{\infty} (s - x)(u(s))^2 \, ds \biggr), \tag{2.6}$$
$$E(x) := \exp\biggl( \frac{1}{2} \int_x^{\infty} u(s)\, ds \biggr), \tag{2.7}$$
and set
$$F_2(x) := F(x)^2 = e^{-\int_x^{\infty} (s - x)(u(s))^2 \, ds}, \tag{2.8}$$
$$F_1(x) := F(x)E(x) = \bigl(F_2(x)\bigr)^{1/2} e^{(1/2)\int_x^{\infty} u(s)\, ds}, \tag{2.9}$$
$$F_4(x) := F(x)\bigl[E(x)^{-1} + E(x)\bigr]/2 = \bigl(F_2(x)\bigr)^{1/2} \bigl[ e^{-(1/2)\int_x^{\infty} u(s)\, ds} + e^{(1/2)\int_x^{\infty} u(s)\, ds} \bigr]/2. \tag{2.10}$$
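The equality of the two expressions for F(x) in (2.6) is stated without comment; the following short derivation, added here as a sketch, shows that it follows from (2.5) by exchanging the order of integration. The exchange is justified by the decay of u at +∞ given in (2.3).

```latex
% From (2.5): v(s) = \int_\infty^s u(t)^2\,dt = -\int_s^\infty u(t)^2\,dt.
\begin{align*}
\int_x^\infty v(s)\,ds
  &= -\int_x^\infty \int_s^\infty (u(t))^2 \,dt\,ds \\
  &= -\int_x^\infty (u(t))^2 \Bigl(\int_x^t ds\Bigr) dt
     && \text{(Fubini on } x \le s \le t < \infty\text{)} \\
  &= -\int_x^\infty (t - x)\,(u(t))^2 \,dt .
\end{align*}
% Multiplying by 1/2 and exponentiating gives the second form of F(x) in (2.6).
```

In particular F(x) lies strictly between 0 and 1 and is increasing in x, because $v(s) < 0$ for every s.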
In [44] and [45], Tracy and Widom proved that under proper centering and scaling, the distribution of the largest eigenvalue of a random GUE/GOE/GSE matrix converges to $F_2(x)$ / $F_1(x)$ / $F_4(x)$ as the size of the matrix becomes large. We note that from the asymptotics (2.3) and (2.4), for some positive constant c,
$$F(x) = 1 + O\bigl( e^{-cx^{3/2}} \bigr) \quad \text{as } x \to +\infty, \tag{2.11}$$
$$E(x) = 1 + O\bigl( e^{-cx^{3/2}} \bigr) \quad \text{as } x \to +\infty, \tag{2.12}$$
$$F(x) = O\bigl( e^{-c|x|^{3}} \bigr) \quad \text{as } x \to -\infty, \tag{2.13}$$
$$E(x) = O\bigl( e^{-c|x|^{3/2}} \bigr) \quad \text{as } x \to -\infty. \tag{2.14}$$
Hence, in particular, $\lim_{x\to+\infty} F_\beta(x) = 1$ and $\lim_{x\to-\infty} F_\beta(x) = 0$, β = 1, 2, 4. Monotonicity of Fβ(x) follows from the fact that Fβ(x) is the limit of a sequence of distribution functions. Therefore Fβ(x) is indeed a distribution function.

Definition 3
Define χGOE, χGUE, and χGSE to be random variables whose distribution functions are given by F1(x), F2(x), and F4(x), respectively. Define $\chi_{\mathrm{GOE}^2}$ to be a random variable with the distribution function $F_1(x)^2$.

As indicated in the introduction, we need new classes of distribution functions to describe the transitions from χGSE to χGOE and from χGUE to $\chi_{\mathrm{GOE}^2}$. First, we consider the Riemann-Hilbert problem (RHP) for the Painlevé II equation (see [20], [29]). Let Γ be the real line R, oriented from +∞ to −∞, and let m(· ; x) be the solution of the following RHP:
\[ \begin{cases} m(z;x) \text{ is analytic in } z \in \mathbb{C}\setminus\Gamma, \\ m_+(z;x) = m_-(z;x)\begin{pmatrix} 1 & -e^{-2i((4/3)z^3+xz)} \\ e^{2i((4/3)z^3+xz)} & 0 \end{pmatrix} \quad \text{for } z \in \Gamma, \\ m(z;x) = I + O(1/z) \quad \text{as } z \to \infty. \end{cases} \tag{2.15} \]
Here $m_+(z;x)$ (resp., $m_-$) is the limit of $m(z';x)$ as $z' \to z$ from the left (resp., right) of the contour Γ: $m_\pm(z;x) = \lim_{\epsilon\downarrow 0} m(z \mp i\epsilon; x)$. Relation (2.15) corresponds to the RHP for the PII equation with the special monodromy data p = −q = 1, r = 0 (see [20], [29], also [22], [19]). In particular, if the solution is expanded at z = ∞,
\[ m(z;x) = I + \frac{m_1(x)}{z} + O\Bigl(\frac{1}{z^2}\Bigr) \quad \text{as } z \to \infty, \tag{2.16} \]
we have
\[ 2i(m_1(x))_{12} = -2i(m_1(x))_{21} = u(x), \tag{2.17} \]
\[ 2i(m_1(x))_{22} = -2i(m_1(x))_{11} = v(x), \tag{2.18} \]
where u(x) and v(x) are defined in (2.1)–(2.5). Now we define two one-parameter families of distribution functions.

Definition 4
Let m(z; x) be the solution of RHP (2.15), and denote by $m_{jk}(z;x)$ the (jk)-entry of m(z; x). For w > 0, define
\[ F (x; w) := F(x)\bigl\{\bigl[m_{22}(-iw;x) - m_{12}(-iw;x)\bigr]E(x)^{-1} + \bigl[m_{22}(-iw;x) + m_{12}(-iw;x)\bigr]E(x)\bigr\}/2, \tag{2.19} \]
and for w < 0, define
\[ F (x; w) := e^{(8/3)w^3 - 2xw} F(x)\bigl\{\bigl[-m_{21}(-iw;x) + m_{11}(-iw;x)\bigr]E(x)^{-1} - \bigl[m_{21}(-iw;x) + m_{11}(-iw;x)\bigr]E(x)\bigr\}/2. \tag{2.20} \]
Also, define
\[ F (x; w) := m_{22}(-iw;x)F_2(x), \quad w > 0, \tag{2.21} \]
\[ F (x; w) := -e^{(8/3)w^3 - 2xw}\, m_{21}(-iw;x)F_2(x), \quad w < 0. \tag{2.22} \]
First, F (x; w) and F (x; w) are real from Lemma 2.1(i). Note that F (x; w) and F (x; w) are continuous at w = 0 since at z = 0, the jump condition of RHP (2.15) implies $(m_{12})_+(0;x) = -(m_{11})_-(0;x)$ and $(m_{22})_+(0;x) = -(m_{21})_-(0;x)$. In fact, F (x; w) and F (x; w) are entire in w ∈ C from RHP (2.15). From (2.11)–(2.14) and (2.24)–(2.27), we see that
\[ \lim_{x\to+\infty} F (x; w),\ F (x; w) = 1, \qquad \lim_{x\to-\infty} F (x; w),\ F (x; w) = 0 \tag{2.23} \]
for any fixed w ∈ R. Also, Theorem 3.2 shows that F (x; w) and F (x; w) are limits of distribution functions, implying that they are monotone in x. Therefore F (x; w) and F (x; w) are indeed distribution functions for each w ∈ R.

Definition 5
Define χw and χw to be random variables with distribution functions F (x; w) and F (x; w), respectively.

We close this section by summarizing some properties of m(−iw; x) in the following lemma. In particular, the lemma implies that F (x; w) interpolates between F4(x) and F1(x), and F (x; w) interpolates between F2(x) and $F_1(x)^2$ (see Corollary 2.2).

LEMMA 2.1
Let $\sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$, $\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, and set [a, b] = ab − ba.
(i) For real w, m(−iw; x) is real.
(ii) For fixed w ∈ R, we have
\[ m(-iw;x) = \bigl(I + O(e^{-cx^{3/2}})\bigr)\begin{pmatrix} 1 & -e^{(8/3)w^3 - 2xw} \\ 0 & 1 \end{pmatrix}, \quad w > 0,\ x \to +\infty, \tag{2.24} \]
\[ m(-iw;x) = \bigl(I + O(e^{-cx^{3/2}})\bigr)\begin{pmatrix} 1 & 0 \\ -e^{-(8/3)w^3 + 2xw} & 1 \end{pmatrix}, \quad w < 0,\ x \to +\infty, \tag{2.25} \]
\[ m(-iw;x) \sim \frac{1}{\sqrt2}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} e^{(-(4/3)w^3 + xw)\sigma_3}\, e^{((\sqrt2/3)(-x)^{3/2} + \sqrt2 w^2(-x)^{1/2})\sigma_3}, \quad w > 0,\ x \to -\infty, \tag{2.26} \]
\[ m(-iw;x) \sim \frac{1}{\sqrt2}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} e^{(-(4/3)w^3 + xw)\sigma_3}\, e^{(-(\sqrt2/3)(-x)^{3/2} - \sqrt2 w^2(-x)^{1/2})\sigma_3}, \quad w < 0,\ x \to -\infty. \tag{2.27} \]
(iii) For any x, we have
\[ \lim_{w\to 0+} m(-iw;x) = \lim_{w\to 0-} \sigma_1 m(-iw;x)\sigma_1 = \begin{pmatrix} (1/2)\bigl(E(x)^2 + E(x)^{-2}\bigr) & -E(x)^2 \\ (1/2)\bigl(-E(x)^2 + E(x)^{-2}\bigr) & E(x)^2 \end{pmatrix}. \tag{2.28} \]
(iv) For fixed w ∈ R \ {0}, m(−iw; x) solves the differential equation
\[ \frac{d}{dx} m = w[m, \sigma_3] + u(x)\sigma_1 m, \tag{2.29} \]
where u(x) is the solution of the PII equation (2.1) and (2.2).
(v) For fixed x, m(−iw; x) solves
\[ \frac{\partial}{\partial w} m = (-4w^2 + x)[m, \sigma_3] - 4wu(x)\sigma_1 m - 2\begin{pmatrix} u^2 & -u' \\ u' & -u^2 \end{pmatrix} m. \tag{2.30} \]
(vi) For any x, we have
\[ m(z;x) = \sigma_1 m(-z;x)\sigma_1. \tag{2.31} \]

COROLLARY 2.2
We have
\[ F (x; 0) = F_1(x), \tag{2.32} \]
\[ \lim_{w\to\infty} F (x; w) = F_4(x), \tag{2.33} \]
\[ \lim_{w\to-\infty} F (x; w) = 0, \tag{2.34} \]
\[ F (x; 0) = F_1(x)^2, \tag{2.35} \]
\[ \lim_{w\to\infty} F (x; w) = F_2(x), \tag{2.36} \]
\[ \lim_{w\to-\infty} F (x; w) = 0. \tag{2.37} \]
Proof
The values at w = 0 follow from (2.28). For w → ±∞, note from RHP (2.15) that we have $\lim_{z\to\infty} m(z;x) = I$.

Proof of Lemma 2.1
Let v(z) = v(z; x) denote the jump matrix of RHP (2.15). Since $\overline{v(-\bar z)} = v(z)$ for z ∈ R, $M(z) := \overline{m(-\bar z; x)}$ also solves the same RHP. By the uniqueness of the solution of RHP (2.15), we have
\[ \overline{m(-\bar z; x)} = m(z;x), \quad z \in \mathbb{C}\setminus\mathbb{R}. \tag{2.38} \]
Thus, m(−iw; x) is real for w ∈ R, thus proving (i). By the symmetry of the jump matrix, $\sigma_1 v(-z)^{-1}\sigma_1 = v(z)$, we obtain, by an argument similar to (i),
\[ \sigma_1 m(-z;x)\sigma_1 = m(z;x), \tag{2.39} \]
which is (vi). The asymptotics results (ii) as x → ±∞ follow from the calculations in [19, Sec. 6, pp. 329–333].

For the proof of (iv), define a new matrix function
\[ f(z;x) := m(z;x)e^{-i\theta\sigma_3}, \qquad \theta := \frac43 z^3 + xz. \tag{2.40} \]
Then f(· ; x) satisfies the jump condition $f_+(z;x) = f_-(z;x)\begin{pmatrix} 1 & -1 \\ 1 & 0 \end{pmatrix}$ for z ∈ R, and $f(z;x)e^{i\theta\sigma_3} \to I$ as z → ∞. Since the jump matrix for f(z; x) is independent of x, $f'(z;x)$, the derivative with respect to x, satisfies $f'_+(z;x) = f'_-(z;x)\begin{pmatrix} 1 & -1 \\ 1 & 0 \end{pmatrix}$, and $f'e^{i\theta\sigma_3} + i\theta' f\sigma_3 e^{i\theta\sigma_3} \to 0$ as z → ∞. Hence $f'f^{-1}$ has no jump across R, and it satisfies $f'f^{-1} + i\theta' f\sigma_3 f^{-1} \to 0$ as z → ∞. If we write $m(z;x) = I + m_1(x)/z + O(1/z^2)$ as z → ∞, we have $i\theta' f\sigma_3 f^{-1} = iz\sigma_3 + i[m_1, \sigma_3] + O(z^{-1})$ as z → ∞. Thus $f'f^{-1}$ is entire and as z → ∞, $f'f^{-1} \sim -iz\sigma_3 - i[m_1, \sigma_3]$. Therefore, by Liouville's theorem, we obtain
\[ f'(z;x)(f(z;x))^{-1} = -iz\sigma_3 - i[m_1, \sigma_3]. \tag{2.41} \]
Recalling that $u(x) = 2i(m_1(x))_{12} = -2i(m_1(x))_{21}$ in (2.17), we have $[m_1, \sigma_3] = iu(x)\sigma_1$. Changing f to m from (2.40), (2.41) is
\[ \frac{d}{dx} m(z;x) = iz[m(z;x), \sigma_3] + u(x)\sigma_1 m(z;x). \tag{2.42} \]
This is (2.29) when z = −iw. The proof of (v) is very similar to that of (iv), and the details are left to the reader. We note only that in the derivation of (v) we need the identity
\[ \frac{d}{dx} m_1 = i[m_2, \sigma_3] - i[m_1, \sigma_3]m_1, \tag{2.43} \]
which can be obtained from (2.42) by setting $m(z;x) = I + m_1(x)/z + m_2(x)/z^2 + O(1/z^3)$ as z → ∞.

Finally, we prove (iii). Note that $\lim_{w\to 0\pm} m(-iw;x) = m_\pm(0;x)$. From the jump condition at z = 0, we have
\[ m_+(0;x) = m_-(0;x)\begin{pmatrix} 1 & -1 \\ 1 & 0 \end{pmatrix}. \tag{2.44} \]
Letting z → 0, Im z > 0, in (vi), we have $\sigma_1 m_+(0;x)\sigma_1 = m_-(0;x)$, which together with (2.44) implies that $m_+(0;x) = \sigma_1 m_+(0;x)\sigma_1\begin{pmatrix} 1 & -1 \\ 1 & 0 \end{pmatrix}$. Thus we have
\[ m_+(0;x) = \begin{pmatrix} a(x) & b(x) \\ a(x) + b(x) & -b(x) \end{pmatrix} \tag{2.45} \]
for some a(x), b(x). Also, the condition det v(z) = 1 for all z ∈ R implies that det m(z; x) = 1 for all z ∈ C \ R, and hence we have
\[ b^2 + 2ab + 1 = 0. \tag{2.46} \]
Now letting z → 0, Im z < 0, in (2.42), we obtain
\[ m'_+(0;x)(m_+(0;x))^{-1} = \begin{pmatrix} 0 & u(x) \\ u(x) & 0 \end{pmatrix}. \tag{2.47} \]
Thus from (2.45) and (2.46), $b'/b = -u$, which has the solution
\[ b(x) = b(y)e^{-\int_y^x u(s)\,ds}. \tag{2.48} \]
From (2.24) with w = 0+, we have
\[ b(x) = (m_{12})_+(0;x) \to -1 \quad \text{as } x \to +\infty. \tag{2.49} \]
Therefore $b(x) = -e^{\int_x^\infty u(s)\,ds}$, which is $-E(x)^2$ from (2.7). Now (2.46) gives $a(x) = (1/2)(E(x)^2 + E(x)^{-2})$, proving (2.28).
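3. Statement of results

3.1. Involutions with constraints on the number of fixed points

Recall (see Definition 1 in the introduction) the ensembles S˜n,m , S˜n,m + ,m − of (signed) involutions with constraints on the number of fixed (and negated) points. We scale the random variables:
\[ \chi_{n,m} := \frac{L_{n,m} - 2\sqrt{2n+m}}{(2n+m)^{1/6}}, \tag{3.1} \]
\[ \chi_{n,m} := \frac{L_{n,m} - 2\sqrt{2n+m}}{(2n+m)^{1/6}}, \tag{3.2} \]
\[ \chi_{n,m_+,m_-} := \frac{L_{n,m_+,m_-} - 2\sqrt{4n+2m_++2m_-}}{2^{2/3}(4n+2m_++2m_-)^{1/6}}. \tag{3.3} \]

THEOREM 3.1
For fixed α and β, we have
\[ \lim_{n\to\infty} \Pr\bigl(\chi_{n,[\sqrt{2n}\alpha]} \le x\bigr) = F_4(x), \quad 0 \le \alpha < 1, \tag{3.4} \]
\[ \lim_{n\to\infty} \Pr\bigl(\chi_{n,[\sqrt{2n}]} \le x\bigr) = F_1(x), \tag{3.5} \]
\[ \lim_{n\to\infty} \Pr\bigl(\chi_{n,[\sqrt{2n}\alpha]} \le x\bigr) = 0, \quad \alpha > 1; \tag{3.6} \]
\[ \lim_{n\to\infty} \Pr\bigl(\chi_{n,[\sqrt{2n}\beta]} \le x\bigr) = F_1(x), \quad \beta \ge 0; \tag{3.7} \]
\[ \lim_{n\to\infty} \Pr\bigl(\chi_{n,[\sqrt{n}\alpha],[\sqrt{n}\beta]} \le x\bigr) = F_2(x), \quad 0 \le \alpha < 1,\ \beta \ge 0, \tag{3.8} \]
\[ \lim_{n\to\infty} \Pr\bigl(\chi_{n,[\sqrt{n}],[\sqrt{n}\beta]} \le x\bigr) = F_1(x)^2, \quad \beta \ge 0, \tag{3.9} \]
\[ \lim_{n\to\infty} \Pr\bigl(\chi_{n,[\sqrt{n}\alpha],[\sqrt{n}\beta]} \le x\bigr) = 0, \quad \alpha > 1,\ \beta \ge 0. \tag{3.10} \]

As indicated in the introduction, as α → 1 at a certain rate, we see smooth transitions.

THEOREM 3.2
For fixed w ∈ R and β ≥ 0, we have
\[ \lim_{n\to\infty} \Pr\bigl(\chi_{n,m} \le x\bigr) = F (x; w), \quad m = [\sqrt{2n} - 2w(2n)^{1/3}], \tag{3.11} \]
\[ \lim_{n\to\infty} \Pr\bigl(\chi_{n,m_+,m_-} \le x\bigr) = F (x; w), \quad m_+ = [\sqrt{n} - 2wn^{1/3}],\ m_- = [\sqrt{n}\beta]. \tag{3.12} \]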
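From Corollary 2.2, this result is consistent with Theorem 3.1. We also have convergence of moments.

THEOREM 3.3
For any p = 1, 2, 3, . . ., the following hold. For fixed α and β,
\[ \lim_{n\to\infty} E\bigl((\chi_{n,[\sqrt{2n}\alpha]})^p\bigr) = E\bigl((\chi_{\mathrm{GSE}})^p\bigr), \quad 0 \le \alpha < 1, \tag{3.13} \]
\[ \lim_{n\to\infty} E\bigl((\chi_{n,[\sqrt{2n}]})^p\bigr) = E\bigl((\chi_{\mathrm{GOE}})^p\bigr), \tag{3.14} \]
\[ \lim_{n\to\infty} E\bigl((\chi_{n,[\sqrt{2n}\beta]})^p\bigr) = E\bigl((\chi_{\mathrm{GOE}})^p\bigr), \quad 0 \le \beta, \tag{3.15} \]
\[ \lim_{n\to\infty} E\bigl((\chi_{n,[\sqrt{n}\alpha],[\sqrt{n}\beta]})^p\bigr) = E\bigl((\chi_{\mathrm{GUE}})^p\bigr), \quad 0 \le \alpha < 1,\ \beta \ge 0, \tag{3.16} \]
\[ \lim_{n\to\infty} E\bigl((\chi_{n,[\sqrt{n}],[\sqrt{n}\beta]})^p\bigr) = E\bigl((\chi_{\mathrm{GOE}^2})^p\bigr), \quad \beta \ge 0. \tag{3.17} \]
Also, for fixed w ∈ R and β ≥ 0,
\[ \lim_{n\to\infty} E\bigl((\chi_{n,m})^p\bigr) = E\bigl((\chi_w)^p\bigr), \quad m = [\sqrt{2n} - 2w(2n)^{1/3}], \tag{3.18} \]
\[ \lim_{n\to\infty} E\bigl((\chi_{n,m_+,m_-})^p\bigr) = E\bigl((\chi_w)^p\bigr), \quad m_+ = [\sqrt{n} - 2wn^{1/3}],\ m_- = [\sqrt{n}\beta]. \tag{3.19} \]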
Remark. Theorem 3.1 shows that when α > 1 is fixed, we have used incorrect scaling. When properly scaled, the resulting limiting distribution is Gaussian (see Section 9 for the statement and the proof). The proofs of Theorems 3.1, 3.2, and 3.3 are provided in Section 7. In terms of the point selection process, which is a version of (directed site) percolation, mentioned in the introduction, the above results show that the limiting distribution of the longest path depends on the geometry of the domain, while the order of fluctuation is the same: (mean)1/3 . From (1.4), the longest up/right path in a rectangle 0 ≤ x, y ≤ 1 has F2 in the limit, while the longest up/right path in a lower triangle 0 ≤ x < y ≤ 1 has F4 (see (3.4)) in the limit if there are no points on the edge 0 ≤ x = y ≤ 1. If there are points on 0 ≤ x = y ≤ 1, they affect the length of the longest up/right path. On the other hand, the longest down/right path corresponding to L n,m can be thought of as the longest path from the point (0, 1) to the line 0 ≤ x = y ≤ 1. Thus result (3.7) shows that the point-to-line maximizing path has different limiting distribution from the point-to-point maximizing path, F2 from (1.4), though the fluctuation order is identical. One can also state similar results for the Poisson process (see Proposition 7.3) and certain directed site percolation processes considered in [31] (see [8]). This observation came from discussions between Baik and Charles Newman, to whom we are especially grateful.
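The scaling in (3.1) can be tried out on simulated data. The following is a minimal sketch, not part of the paper: it assumes that (3.1) refers to the longest increasing subsequence of an involution on 2n + m letters with n two-cycles and m fixed points, and the helper names (`random_involution`, `longest_increasing_subsequence`, `scaled_length`) are hypothetical.

```python
import bisect
import random


def random_involution(n, m, rng):
    """Uniform random involution on 2n+m letters with n two-cycles and m fixed points."""
    size = 2 * n + m
    letters = list(range(size))
    rng.shuffle(letters)
    perm = list(range(size))  # start from the identity
    # Pair off the first 2n shuffled letters; the remaining m letters stay fixed.
    for i in range(n):
        a, b = letters[2 * i], letters[2 * i + 1]
        perm[a], perm[b] = b, a
    return perm


def longest_increasing_subsequence(seq):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []  # tails[j] = smallest possible last entry of an increasing run of length j+1
    for value in seq:
        pos = bisect.bisect_left(tails, value)
        if pos == len(tails):
            tails.append(value)
        else:
            tails[pos] = value
    return len(tails)


def scaled_length(length, n, m):
    """Centering and scaling as in (3.1): (L - 2 sqrt(2n+m)) / (2n+m)^(1/6)."""
    size = 2 * n + m
    return (length - 2.0 * size ** 0.5) / size ** (1.0 / 6.0)


if __name__ == "__main__":
    rng = random.Random(2001)
    n, m = 2000, 0  # alpha = 0: no fixed points
    samples = [
        scaled_length(longest_increasing_subsequence(random_involution(n, m, rng)), n, m)
        for _ in range(200)
    ]
    print(sum(samples) / len(samples))
```

Pairing consecutive entries of a uniform shuffle yields each admissible involution equally often, which is why the sampler is uniform. For moderate n the empirical distribution is only a rough approximation of the limit in Theorem 3.1.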
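3.2. General involutions

Now we consider general involutions and signed involutions without any conditions on the number of fixed or negated points.

THEOREM 3.4
For any fixed x ∈ R, we have
\[ \lim_{n\to\infty} \Pr\Bigl(\tilde\chi_n := \frac{\tilde L_n - 2\sqrt{n}}{n^{1/6}} \le x\Bigr) = F_1(x), \tag{3.20} \]
\[ \lim_{n\to\infty} \Pr\Bigl(\tilde\chi_n := \frac{\tilde L_n - 2\sqrt{2n}}{2^{2/3}(2n)^{1/6}} \le x\Bigr) = F_1(x)^2. \tag{3.21} \]
Also, for any p = 1, 2, 3, . . .,
\[ \lim_{n\to\infty} E\bigl((\tilde\chi_n)^p\bigr) = E\bigl((\chi_{\mathrm{GOE}})^p\bigr), \tag{3.22} \]
\[ \lim_{n\to\infty} E\bigl((\tilde\chi_n)^p\bigr) = E\bigl((\chi_{\mathrm{GOE}^2})^p\bigr). \tag{3.23} \]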
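As mentioned in the introduction, this result proves that the first row of a random Young diagram under the 1-Plancherel measure Mn1 behaves statistically like the largest eigenvalue of a random GOE matrix as n → ∞. The proof of Theorem 3.4 is given in Section 8.

3.3. Second rows

For the second row, we scale the same way as in (3.1)–(3.3), and we denote the scaled random variables by $\chi^{\,,(2)}_{n,m}$, $\chi^{\,,(2)}_{n,m}$, and $\chi^{\,,(2)}_{n,m_+,m_-}$, respectively.

THEOREM 3.5
Let α, β ≥ 0 be fixed. Then
\[ \lim_{n\to\infty} \Pr\bigl(\chi^{\,,(2)}_{n,[\sqrt{2n}\alpha]} \le x\bigr) = F_4(x), \tag{3.24} \]
\[ \lim_{n\to\infty} \Pr\bigl(\chi^{\,,(2)}_{n,[\sqrt{2n}\beta]} \le x\bigr) = F_4(x), \tag{3.25} \]
\[ \lim_{n\to\infty} \Pr\bigl(\chi^{\,,(2)}_{n,[\sqrt{n}\alpha],[\sqrt{n}\beta]} \le x\bigr) = F_2(x), \tag{3.26} \]
and for any p = 1, 2, 3, . . .,
\[ \lim_{n\to\infty} E\bigl((\chi^{\,,(2)}_{n,[\sqrt{2n}\alpha]})^p\bigr) = E\bigl((\chi_{\mathrm{GSE}})^p\bigr), \tag{3.27} \]
\[ \lim_{n\to\infty} E\bigl((\chi^{\,,(2)}_{n,[\sqrt{2n}\beta]})^p\bigr) = E\bigl((\chi_{\mathrm{GSE}})^p\bigr), \tag{3.28} \]
\[ \lim_{n\to\infty} E\bigl((\chi^{\,,(2)}_{n,[\sqrt{n}\alpha],[\sqrt{n}\beta]})^p\bigr) = E\bigl((\chi_{\mathrm{GUE}})^p\bigr). \tag{3.29} \]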
Theorem 3.5 is proved in Section 7. As in the first row, these results yield the following theorem on the second rows of general (signed) involutions. The proof is very similar to the proof of Theorem 3.4, and we skip the details.

THEOREM 3.6
For any fixed x ∈ R, we have
\[ \lim_{n\to\infty} \Pr\Bigl(\tilde\chi^{(2)}_n := \frac{\tilde L^{(2)}_n - 2\sqrt{n}}{n^{1/6}} \le x\Bigr) = F_4(x), \tag{3.30} \]
\[ \lim_{n\to\infty} \Pr\Bigl(\tilde\chi^{\,,(2)}_n := \frac{\tilde L^{\,,(2)}_n - 2\sqrt{2n}}{2^{2/3}(2n)^{1/6}} \le x\Bigr) = F_2(x). \tag{3.31} \]
Also, for any p = 1, 2, 3, . . .,
\[ \lim_{n\to\infty} E\bigl((\tilde\chi^{(2)}_n)^p\bigr) = E\bigl((\chi_{\mathrm{GSE}})^p\bigr), \tag{3.32} \]
\[ \lim_{n\to\infty} E\bigl((\tilde\chi^{\,,(2)}_n)^p\bigr) = E\bigl((\chi_{\mathrm{GUE}})^p\bigr). \tag{3.33} \]
We conclude this section with some remarks on GOE and GSE. If the conjecture given in the introduction that the kth row of a random involution behaves in the limit like the kth largest eigenvalue of a random GOE matrix is true, the result (3.30) suggests that the limiting distribution, $F_1^{(2)}$, of the second largest eigenvalue of GOE is equal to the limiting distribution, $F_4$, of the largest eigenvalue of GSE. Equivalently, since a GSE matrix has double eigenvalues, the second eigenvalues of GOE and GSE are expected to have the same limiting distribution. An indication for this is [36, Th. 10.6.1], which says that the distributions of N alternate angles of the eigenvalues of a random (2N × 2N)-matrix taken from the circular orthogonal ensemble (COE) are identical to those of the N angles of the eigenvalues of a random (N × N)-matrix taken from the circular symplectic ensemble (CSE). Indeed, for 2N × 2N Laguerre ensembles, we have proved that the joint distributions of the second, fourth, sixth, . . . largest eigenvalues of the Laguerre orthogonal ensemble (LOE) and the Laguerre symplectic ensemble (LSE) are identical (see [7, Rem. 1 to Cor. 7.6]). In particular, since the kth largest eigenvalue of a Laguerre ensemble has the same limiting distribution as the corresponding quantity for a Gaussian ensemble, the above remark implies that
\[ F_4^{(2k-1)} = F_4^{(2k)} = F_1^{(2k)}, \quad k = 1, 2, \ldots. \tag{3.34} \]
Thus (3.20) and (3.30) imply that the first and second rows of a random involution have the same limiting distribution as the first and second eigenvalues of GOE, respectively.
Recently the authors of [24] proved that the same property holds true for GOE and GSE. They also proved, among many other things, that the (2k)th “eigenvalue” of a superimposition of two random GOE matrices has the same distribution as the kth eigenvalues of a random GUE matrix. In particular, when k = 1, this implies that
\[ (F_1^2)^{(2)} = F_2, \tag{3.35} \]
and hence (3.21) and (3.31) state that the first and second rows of a random signed involution have the same limiting distribution as the first and second “eigenvalues” of the superimposition of two random GOE matrices, respectively.

4. Poisson generating functions

We review the results from [7] which we need in the proof of the theorems in Section 3. As in [7], throughout the paper the notation ~ indicates an arbitrary member of the set { , , }.

Definition 6
We define the Poisson generating functions for the distributions introduced above:
\[ Q_l(\lambda_1, \lambda_2) := e^{-\lambda_1-\lambda_2} \sum_{n_1,n_2 \ge 0} \frac{\lambda_1^{n_1}\lambda_2^{n_2}}{n_1!\,n_2!} \Pr\bigl(L_{n_2,n_1} \le l\bigr), \tag{4.1} \]
\[ Q_l(\lambda_1, \lambda_2) := e^{-\lambda_1-\lambda_2} \sum_{n_1,n_2 \ge 0} \frac{\lambda_1^{n_1}\lambda_2^{n_2}}{n_1!\,n_2!} \Pr\bigl(L_{n_2,n_1} \le l\bigr), \tag{4.2} \]
\[ Q_l(\lambda_1, \lambda_2, \lambda_3) := e^{-\lambda_1-\lambda_2-\lambda_3} \sum_{n_1,n_2,n_3 \ge 0} \frac{\lambda_1^{n_1}\lambda_2^{n_2}\lambda_3^{n_3}}{n_1!\,n_2!\,n_3!} \Pr\bigl(L_{n_3,n_1,n_2} \le l\bigr). \tag{4.3} \]
As in [7], let $\tilde f_{nml}$ (resp., $\tilde f_{nml}$) be the number of involutions on n numbers with m fixed points with no increasing (resp., decreasing) subsequence of length greater than l. Thus $\tilde f_{(2n_2+n_1)n_1 l} = \Pr(L_{n_2,n_1} \le l)\cdot|S_{n_2,n_1}|$, and so on. Also, let $\tilde f_{nm_+m_-l}$ be the number of signed involutions on 2n letters with $2m_+$ fixed points and $2m_-$ negated points with no increasing subsequence of length greater than l: $\tilde f_{(2n_3+n_1+n_2)n_1n_2l} = \Pr(L_{n_3,n_1,n_2} \le l)\cdot|S_{n_3,n_1,n_2}|$. We also define
\[ P_l(t;\alpha) := e^{-\alpha t - t^2/2} \sum_{0 \le n} \frac{t^n}{n!} \sum_{0 \le m} \alpha^m \tilde f_{nml}, \tag{4.4} \]
\[ P_l(t;\beta) := e^{-\beta t - t^2/2} \sum_{0 \le n} \frac{t^n}{n!} \sum_{0 \le m} \beta^m \tilde f_{nml}, \tag{4.5} \]
\[ P_l(t;\alpha,\beta) := e^{-\alpha t - \beta t - t^2} \sum_{0 \le n} \frac{t^n}{n!} \sum_{0 \le m_+,m_-} \alpha^{m_+}\beta^{m_-} \tilde f_{nm_+m_-l}. \tag{4.6} \]
Using $|S_{n,m}| = (2n+m)!/(n!\,m!\,2^n)$ and $|S_{n,m_+,m_-}| = (2n+m_++m_-)!/(n!\,m_+!\,m_-!)$, it is easy to check that
\[ P_l(t;\alpha) = Q_l(\alpha t, t^2/2), \tag{4.7} \]
\[ P_l(t;\beta) = Q_l(\beta t, t^2/2), \tag{4.8} \]
\[ P_l(t;\alpha,\beta) = Q_l(\alpha t, \beta t, t^2). \tag{4.9} \]
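The counting formula behind (4.7) can be checked by brute force for small cases. The sketch below is illustrative only; `count_involutions` and `involution_count_formula` are hypothetical helper names, and the enumeration is feasible only for a handful of letters.

```python
from itertools import permutations
from math import factorial


def count_involutions(size, m):
    """Brute-force count of involutions on `size` letters with exactly m fixed points."""
    count = 0
    for p in permutations(range(size)):
        if all(p[p[i]] == i for i in range(size)):
            if sum(1 for i in range(size) if p[i] == i) == m:
                count += 1
    return count


def involution_count_formula(n, m):
    """(2n+m)! / (n! m! 2^n): involutions with n two-cycles and m fixed points."""
    return factorial(2 * n + m) // (factorial(n) * factorial(m) * 2 ** n)


if __name__ == "__main__":
    for n in range(0, 4):
        for m in range(0, 3):
            if 2 * n + m <= 7:
                print(n, m, count_involutions(2 * n + m, m), involution_count_formula(n, m))
```

With this count, the coefficient of $\alpha^m t^{2n+m}$ on both sides of (4.7) is $\Pr(L_{n,m} \le l)/(n!\,m!\,2^n)$, which is the substitution $\lambda_1 = \alpha t$, $\lambda_2 = t^2/2$ in (4.1).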
It turns out that the P-formulae in (4.4)–(4.6) are useful for algebraic manipulations (see [7]), while the Q-formulae (4.1)–(4.3) are well adapted to asymptotic analysis. The following results from [7] provide the starting point for our analysis in this paper. For a nonnegative integer k, define $\pi_k(z;t) = z^k + \cdots$ to be the monic orthogonal polynomial of degree k with respect to the weight function $\exp(t(z + 1/z))\,dz/(2\pi i z)$ on the unit circle Σ. Let the norm of $\pi_k(z;t)$ be $N_k(t)$:
\[ \int_\Sigma \pi_n(z;t)\,\overline{\pi_m(z;t)}\,e^{t(z+1/z)}\,\frac{dz}{2\pi i z} = N_n(t)\delta_{nm}. \tag{4.10} \]
We note that all the coefficients of $\pi_n(z;t)$ are real. Define
\[ \pi_n^*(z;t) := z^n \pi_n(z^{-1};t). \tag{4.11} \]
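Then we have the following theorem.

THEOREM 4.1 ([7, Cors. 4.3 and 2.7])
For α, β ≥ 0,
\[ P_{2l}(t;\alpha) = e^{-\alpha t - t^2/2}\,\frac12\Bigl\{\bigl[\pi^*_{2l-1}(-\alpha;t) - \alpha\pi_{2l-1}(-\alpha;t)\bigr]D_l^{--}(t) + \bigl[\pi^*_{2l-1}(-\alpha;t) + \alpha\pi_{2l-1}(-\alpha;t)\bigr]D_{l-1}^{++}(t)\Bigr\}, \tag{4.12} \]
\[ P_{2l+1}(t;\alpha) = e^{-\alpha t - t^2/2}\,\frac12\Bigl\{\bigl[\pi^*_{2l}(-\alpha;t) + \alpha\pi_{2l}(-\alpha;t)\bigr]e^{t}D_l^{+-}(t) + \bigl[\pi^*_{2l}(-\alpha;t) - \alpha\pi_{2l}(-\alpha;t)\bigr]e^{-t}D_l^{-+}(t)\Bigr\}, \tag{4.13} \]
\[ P_{2l+1}(t;\beta) = e^{-t^2/2}D_l^{++}(t), \tag{4.14} \]
\[ P_{2l+1}(t;\alpha,\beta) = e^{-\alpha t - t^2}\,\pi_l^*(-\alpha;t)D_l(t), \tag{4.15} \]
where for any real t ≥ 0, $D_l(t)$ and $D_l^{\pm\pm}(t)$ are certain Toeplitz and Hankel determinants which in turn can be written as
\[ e^{-t^2}D_l(t) = \prod_{j \ge l} N_j(t)^{-1}, \tag{4.16} \]
\[ e^{-t^2/2}D_l^{--}(t) = \prod_{j \ge l} N_{2j+2}(t)^{-1}\bigl(1 + \pi_{2j+2}(0;t)\bigr), \tag{4.17} \]
\[ e^{-t^2/2}D_l^{++}(t) = \prod_{j \ge l} N_{2j+2}(t)^{-1}\bigl(1 - \pi_{2j+2}(0;t)\bigr), \tag{4.18} \]
\[ e^{-t^2/2+t}D_l^{+-}(t) = \prod_{j \ge l} N_{2j+1}(t)^{-1}\bigl(1 - \pi_{2j+1}(0;t)\bigr), \tag{4.19} \]
\[ e^{-t^2/2-t}D_l^{-+}(t) = \prod_{j \ge l} N_{2j+1}(t)^{-1}\bigl(1 + \pi_{2j+1}(0;t)\bigr). \tag{4.20} \]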
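For small l and t, the product formula (4.16) can be explored numerically. The following is a rough sketch under an assumption that the text does not spell out: that $D_l(t)$ is the l × l Toeplitz determinant built from the moments of the weight in (4.10), which are the modified Bessel functions $I_n(2t)$, so that $N_k(t) = D_{k+1}(t)/D_k(t)$. All function names are hypothetical, and the elimination routine is only adequate for small, well-conditioned matrices.

```python
from math import exp, factorial


def bessel_i(order, x, terms=40):
    """Modified Bessel function I_order(x) for integer order, via its power series."""
    order = abs(order)  # I_{-n} = I_n for integer n
    return sum(
        (x / 2.0) ** (2 * j + order) / (factorial(j) * factorial(j + order))
        for j in range(terms)
    )


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    size = len(a)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det


def toeplitz_det(l, t):
    """Assumed form of D_l(t): det(I_{i-j}(2t)), 0 <= i, j < l (empty determinant is 1)."""
    rows = [[bessel_i(i - j, 2.0 * t) for j in range(l)] for i in range(l)]
    return determinant(rows)


def norms(t, k_max):
    """N_k(t) = D_{k+1}(t) / D_k(t) for k = 0, ..., k_max, under the assumption above."""
    dets = [toeplitz_det(k, t) for k in range(k_max + 2)]
    return [dets[k + 1] / dets[k] for k in range(k_max + 1)]


if __name__ == "__main__":
    t = 0.5
    for l in range(1, 7):
        # e^{-t^2} D_l(t) should increase towards 1 as l grows, consistent with (4.16).
        print(l, exp(-t * t) * toeplitz_det(l, t))
```

Under this assumption, $e^{-t^2}D_l(t)$ increases to 1 as l grows, which matches the right-hand side of (4.16) being a tail product of factors $N_j(t)^{-1}$ that tend to 1.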
Remark. The absence of β on the right-hand side of (4.14) is fairly simple to explain. Observe that in the point selection model, the longest decreasing subsequence can always be chosen to be symmetric about the diagonal; moreover, any decreasing subsequence can contain at most one diagonal point. Thus if the longest decreasing subsequence has l points, then removing the diagonal points results in a longest decreasing subsequence with 2[l/2] points. The independence from β is thus special to $P_l$ for l odd; for l even, we do indeed have β-dependence. But by the monotonicity of $P_l$ in l, we need only (4.14) to compute the limiting distribution; in particular, the limiting distribution does not depend on β. A similar remark applies to (4.15).

As a special case, we have the following theorem.

THEOREM 4.2 ([7, Th. 2.5 and Cor. 4.3])
For l ≥ 0, we have the following formulae:
\[ P_{2l+2}(t;0) = e^{-t^2/2}\bigl[D_{l+1}^{--}(t) + D_l^{++}(t)\bigr]/2, \tag{4.21} \]
\[ P_{2l+1}(t;0) = e^{-t^2/2}\bigl[e^{t}D_l^{+-}(t) + e^{-t}D_l^{-+}(t)\bigr]/2, \tag{4.22} \]
\[ P_{2l}(t;0) = e^{-t^2/2}D_l^{++}(t), \tag{4.23} \]
\[ P_{2l}(t;1) = P_{2l}(t;1) = e^{-t-t^2/2}D_l^{-+}(t), \tag{4.24} \]
\[ P_{2l+1}(t;1) = P_{2l+1}(t;1) = e^{-t^2/2}D_l^{++}(t), \tag{4.25} \]
\[ P_{2l}(t;0,0) = e^{-t^2}D_l(t), \tag{4.26} \]
\[ P_{4l+1}(t;1,\beta) = e^{-t-t^2}D_l^{++}(t)D_l^{-+}(t), \tag{4.27} \]
\[ P_{4l+3}(t;1,\beta) = e^{-t-t^2}D_l^{++}(t)D_{l+1}^{-+}(t). \tag{4.28} \]
Also, $P_0(t;0) = e^{-t^2/2}D_0^{--}(t) = e^{-t^2/2}$.

For the second row, we define the Poisson generating functions in a similar manner. Then we have the following theorem.
THEOREM 4.3 ([7, Th. 5.8 and Cor. 5.12])
For α, β ≥ 0,
\[ P_l^{\,,(2)}(t;\alpha) = P_l(t;0), \tag{4.29} \]
\[ P_{2l+1}^{\,,(2)}(t;\beta) = P_{2l}(t;0), \tag{4.30} \]
\[ P_{2l+1}^{\,,(2)}(t;\alpha,\beta) = P_{2l}(t;0,0). \tag{4.31} \]
5. Asymptotics of orthogonal polynomials

Let Σ = {z ∈ C : |z| = 1} be the unit circle in the complex plane, oriented counterclockwise. Set
\[ \psi(z;t) := e^{t(z+z^{-1})}. \tag{5.1} \]
Let $\pi_n(z;t) = z^n + \cdots$ be the nth monic orthogonal polynomial with respect to the measure $\psi(z;t)\,dz/(2\pi i z)$ on the unit circle. From Theorems 4.1 and 4.2, in order to obtain the asymptotics of the Poisson generating functions, we need the asymptotics, as k, t → ∞, of
\[ N_k(t), \quad \pi_k(z;t), \quad \pi_k^*(z;t). \tag{5.2} \]
In this section, we summarize the asymptotic results for these quantities. Define the (2 × 2)-matrix-valued function of z in C \ Σ by
\[ Y(z;k;t) := \begin{pmatrix} \pi_k(z;t) & \displaystyle\int_\Sigma \frac{\pi_k(s;t)}{s-z}\,\frac{\psi(s;t)\,ds}{2\pi i s^k} \\ -N_{k-1}(t)^{-1}\pi^*_{k-1}(z;t) & \displaystyle -N_{k-1}(t)^{-1}\int_\Sigma \frac{\pi^*_{k-1}(s;t)}{s-z}\,\frac{\psi(s;t)\,ds}{2\pi i s^k} \end{pmatrix}, \quad k \ge 1. \tag{5.3} \]
Then Y(· ; k; t) solves the following RHP (see [3, Lem. 4.1]):
\[ \begin{cases} Y(z;k;t) \text{ is analytic in } z \in \mathbb{C}\setminus\Sigma, \\ Y_+(z;k;t) = Y_-(z;k;t)\begin{pmatrix} 1 & (1/z^k)\psi(z;t) \\ 0 & 1 \end{pmatrix} \quad \text{on } z \in \Sigma, \\ Y(z;k;t)\begin{pmatrix} z^{-k} & 0 \\ 0 & z^k \end{pmatrix} = I + O(1/z) \quad \text{as } z \to \infty. \end{cases} \tag{5.4} \]
Here the notation $Y_+(z;k)$ (resp., $Y_-$) denotes the limiting value $\lim_{z'\to z} Y(z';k)$ with |z'| < 1 (resp., |z'| > 1). Note that k and t play the role of external parameters in RHP (5.4); in particular, the term O(1/z) does not imply a uniform bound in k and t. One can easily show that the solution of RHP (5.4) is unique; hence (5.3) is the unique solution of RHP (5.4). This RHP formulation of orthogonal polynomials on the unit circle given in [3] is an adaptation of a result of A. Fokas, A. Its, and A. Kitaev in [21], where they considered orthogonal polynomials on the real line.
From (5.3), the quantities in (5.2) are equal to
\[ N_{k-1}(t)^{-1} = -Y_{21}(0;k;t), \tag{5.5} \]
\[ \pi_k(z;t) = Y_{11}(z;k;t), \tag{5.6} \]
\[ \pi_k^*(z;t) = z^k Y_{11}(z^{-1};k;t) = Y_{21}(z;k+1;t)\bigl(Y_{21}(0;k+1;t)\bigr)^{-1}. \tag{5.7} \]
(For the other entries of Y, one can check directly from (5.3) that $Y_{12}(0;k;t) = N_k(t)$, $Y_{22}(0;k;t) = \pi_k(0;t)$.) Thus the asymptotic analysis of RHP (5.4) would yield the asymptotics of the above quantities, and hence eventually the theorems in Section 3.

The asymptotic analysis of RHP (5.4) was conducted in [3] with special interest in $Y_{21}(0;k;t)$. But as mentioned in the introduction, [3] controlled the solution Y(z) to RHP (5.4) in a uniform way. In [3] and Proposition 5.1, it is natural to distinguish five different regimes of k and t. From the analysis of [3], the following results for Y(0; k; t), except for $\pi_k(0;t)$ in Proposition 5.1(ii), can be directly read off. For example, [3, (5.34)–(5.35)] yield the estimates for Proposition 5.1(iii) when x ≥ 0. For Proposition 5.1(ii), we need to improve the $L^1$-norm bound (see [3, (5.23)]) of the associated jump matrix. If one is interested only in $N_{k-1}(t)$, the first integral involving $w^{(3)}(s)$ in [3, displayed equation before (5.19)] vanishes, and hence [3, bound (5.23)] is enough. But for $\pi_k(0;t)$, this integral does not vanish, and we need an improved bound (see the discussion in (10.43)–(10.45)).

PROPOSITION 5.1 ([3])
There exists $M_0 > 0$ such that as k, t → ∞, we have the following asymptotic results for $N_{k-1}(t)$ and $\pi_k(0;t)$ in each different region of k and t.
(i) If 0 ≤ 2t ≤ ak with 0 < a < 1, then
\[ \bigl|N_{k-1}(t)^{-1} - 1\bigr|,\ \bigl|\pi_k(0;t)\bigr| \le Ce^{-ck} \tag{5.8} \]
for some constants C, c > 0.
(ii) If $ak \le 2t \le k - Mk^{1/3}$ with some $M > M_0$ and 0 < a < 1, then
\[ \bigl|N_{k-1}(t)^{-1} - 1\bigr|,\ \bigl|\pi_k(0;t)\bigr| \le \frac{C}{k^{1/3}}\,e^{-(2\sqrt2/3)k(1-2t/k)^{3/2}} \tag{5.9} \]
for some constant C > 0.
(iii) If $2t = k - (x/2^{1/3})k^{1/3}$ with −M ≤ x ≤ M for some constant M > 0, then
\[ \Bigl|N_{k-1}(t)^{-1} - 1 - \frac{2^{1/3}}{k^{1/3}}v(x)\Bigr|,\ \Bigl|\pi_k(0;t) + (-1)^k\frac{2^{1/3}}{k^{1/3}}u(x)\Bigr| \le \frac{C}{k^{2/3}} \tag{5.10} \]
for some constant C > 0, where u(x) and v(x) are defined in (2.1) and (2.5), respectively.
(iv) If $k + Mk^{1/3} \le 2t \le ak$ with some $M > M_0$ and a > 1, then
\[ \Bigl|\sqrt{\frac{2t}{k}}\,e^{k(2t/k - \log(2t/k) - 1)}N_{k-1}(t)^{-1} - 1\Bigr|,\ \Bigl|\sqrt{\frac{2t}{2t-k}}\,(-1)^k\pi_k(0;t) - 1\Bigr| \le \frac{C}{2t-k} \tag{5.11} \]
for some constant C > 0.
(v) If ak ≤ 2t ≤ bk with 1 < a < b, then
\[ \Bigl|\sqrt{\frac{2t}{k}}\,e^{k(2t/k - \log(2t/k) - 1)}N_{k-1}(t)^{-1} - 1\Bigr|,\ \Bigl|\sqrt{\frac{2t}{2t-k}}\,(-1)^k\pi_k(0;t) - 1\Bigr| \le \frac{C}{k} \tag{5.12} \]
for some constant C > 0.
Now we are interested in $\pi_k(z;t)$. If z is apart from −1 and is fixed, then similar estimates for Y(z; k; t) can be obtained from the analysis of [3]. The result below when x ≥ 0 is (almost) direct from the work of [3]. For the case when x < 0, the analysis of [3] expresses the bound in terms of the so-called g-function, and we need further analysis for this g-function. When z = 0, this g-function becomes very simple: g(0) = πi (see (10.134)–(10.139)).

PROPOSITION 5.2
For $2t = k - x(k/2)^{1/3}$, x fixed, and for each fixed z ∈ C \ Σ, we have
\[ \lim_{k\to\infty} e^{tz}\pi_k(z;t) = 0, \qquad \lim_{k\to\infty} e^{tz}\pi_k^*(z;t) = 1, \qquad |z| < 1, \tag{5.13} \]
\[ \lim_{k\to\infty} z^{-k}e^{tz^{-1}}\pi_k(z;t) = 1, \qquad \lim_{k\to\infty} z^{-k}e^{tz^{-1}}\pi_k^*(z;t) = 0, \qquad |z| > 1. \tag{5.14} \]
COROLLARY 5.3
For $2t = k - x(k/2)^{1/3}$, x fixed, we have for fixed α > 1,
\[ \lim_{k\to\infty} e^{-\alpha t}\pi_k(-\alpha;t) = 0, \qquad \lim_{k\to\infty} e^{-\alpha t}\pi_k^*(-\alpha;t) = 0. \tag{5.15} \]

Proof
Write
\[ e^{-\alpha t}\pi_k(-\alpha;t) = \alpha^k e^{t(-\alpha+\alpha^{-1})}\,\alpha^{-k}e^{-t\alpha^{-1}}\pi_k(-\alpha;t) = e^{kf(\alpha;2t/k)}\,\alpha^{-k}e^{-t\alpha^{-1}}\pi_k(-\alpha;t), \tag{5.16} \]
where
\[ f(\alpha;\gamma) = \frac{\gamma}{2}(-\alpha+\alpha^{-1}) + \log\alpha. \tag{5.17} \]
The function f(α; 1) is strictly decreasing for α > 0, and f(1; 1) = 0. Hence f(α; 1) < 0 for α > 1. Note that
\[ f(\alpha;\gamma) = f(\alpha;1) + \frac{\gamma-1}{2}(-\alpha+\alpha^{-1}). \tag{5.18} \]
When x ≤ 0, 2t/k ≥ 1 and hence f(α; 2t/k) ≤ f(α; 1). On the other hand, when x > 0, since $2t/k - 1 = -x/(2^{1/3}k^{2/3})$, we have f(α; 2t/k) ≤ (1/2) f(α; 1) if $k > \bigl(2^{2/3}x(-\alpha+\alpha^{-1})/f(\alpha;1)\bigr)^{3/2}$. Therefore (5.14) implies that
\[ \bigl|e^{-\alpha t}\pi_k(-\alpha;t)\bigr| \le e^{(k/2)f(\alpha;1)}\bigl|(-\alpha)^{-k}e^{-t\alpha^{-1}}\pi_k(-\alpha;t)\bigr| \to 0, \tag{5.19} \]
as k → ∞. Similar calculations give the desired result for $\pi_k^*$.
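The elementary facts about f used in this proof are easy to confirm numerically. The sketch below (function names are illustrative) evaluates f from (5.17) and uses the closed form $\partial_\alpha f(\alpha;1) = -\tfrac12(1-\alpha^{-1})^2$, obtained by direct differentiation, which is nonpositive and vanishes only at α = 1.

```python
from math import log


def f(alpha, gamma):
    """f(alpha; gamma) = (gamma / 2) * (-alpha + 1/alpha) + log(alpha), as in (5.17)."""
    return 0.5 * gamma * (-alpha + 1.0 / alpha) + log(alpha)


def df_dalpha_at_gamma_one(alpha):
    """d/d(alpha) of f(alpha; 1), which simplifies to -(1/2) * (1 - 1/alpha)^2."""
    return -0.5 * (1.0 - 1.0 / alpha) ** 2


if __name__ == "__main__":
    for alpha in (1.0, 1.5, 2.0, 4.0):
        print(alpha, f(alpha, 1.0), df_dalpha_at_gamma_one(alpha))
```

Since the derivative is negative away from α = 1, f(α; 1) is strictly decreasing, and f(1; 1) = 0 then gives f(α; 1) < 0 for α > 1, as used before (5.18).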
When z → −1 (which is required for the proof of Theorem 3.2), the estimates for Y(z; k; t) cannot be directly read off from the result of [3]. However, with more detailed estimates, the same procedure as in [3] gives us the following results (see Section 10.1.3, (10.73)–(10.83), for the case when x ≥ 0, and see Section 10.2.3, (10.140)–(10.150), for the case when x < 0). Recall from Section 2 that m(· , x) solves the RHP for the PII equation (2.15).

PROPOSITION 5.4
Let $2t = k - x(k/2)^{1/3}$, where x is a fixed number. Set
\[ \alpha = 1 - \frac{2^{4/3}w}{k^{1/3}}. \tag{5.20} \]
We have for w > 0 fixed,
\[ \lim_{k\to\infty} (-1)^k e^{-t\alpha}\pi_k(-\alpha;t) = -m_{12}(-iw;x), \tag{5.21} \]
\[ \lim_{k\to\infty} e^{-t\alpha}\pi_k^*(-\alpha;t) = m_{22}(-iw;x), \tag{5.22} \]
and for w < 0 fixed,
\[ \lim_{k\to\infty} (-\alpha)^{-k}e^{-t\alpha^{-1}}\pi_k(-\alpha;t) = m_{11}(-iw;x), \tag{5.23} \]
\[ \lim_{k\to\infty} \alpha^{-k}e^{-t\alpha^{-1}}\pi_k^*(-\alpha;t) = -m_{21}(-iw;x). \tag{5.24} \]
COROLLARY 5.5
For w < 0, under the same condition as Proposition 5.4, we have
\[ \lim_{k\to\infty} (-1)^{-k}e^{-t\alpha}\pi_k(-\alpha;t) = m_{11}(-iw;x)\,e^{(8/3)w^3 - 2xw}, \tag{5.25} \]
\[ \lim_{k\to\infty} e^{-t\alpha}\pi_k^*(-\alpha;t) = -m_{21}(-iw;x)\,e^{(8/3)w^3 - 2xw}. \tag{5.26} \]

Proof
Note that under the stated conditions we have
\[ e^{t(\alpha^{-1}-\alpha)}\alpha^k = e^{(8/3)w^3 - 2xw + O(k^{-1/3})}. \tag{5.27} \]
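Identity (5.27) follows from a Taylor expansion in $\epsilon = 2^{4/3}w/k^{1/3}$, and it can also be checked in floating point. The sketch below uses hypothetical helper names; it evaluates $\alpha^{-1} - \alpha$ as $\epsilon(2-\epsilon)/(1-\epsilon)$ and uses `log1p` to limit cancellation error for large k.

```python
from math import log1p


def log_prefactor(k, x, w):
    """log of e^{t(1/alpha - alpha)} * alpha^k with 2t = k - x (k/2)^{1/3}, alpha as in (5.20)."""
    eps = 2.0 ** (4.0 / 3.0) * w / k ** (1.0 / 3.0)   # alpha = 1 - eps
    t = 0.5 * (k - x * (k / 2.0) ** (1.0 / 3.0))
    inv_minus_alpha = eps * (2.0 - eps) / (1.0 - eps)  # equals 1/alpha - alpha
    return t * inv_minus_alpha + k * log1p(-eps)


def limit_exponent(x, w):
    """The limiting exponent (8/3) w^3 - 2 x w on the right-hand side of (5.27)."""
    return (8.0 / 3.0) * w ** 3 - 2.0 * x * w


if __name__ == "__main__":
    x, w = 1.0, -0.7
    for k in (1e3, 1e6, 1e9):
        print(k, log_prefactor(k, x, w) - limit_exponent(x, w))
```

The printed differences should shrink roughly like $k^{-1/3}$, in line with the error term in (5.27).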
Remark. As noted in Section 2, it follows from RHP (2.15) that $(m_{12})_+(0;x) = -(m_{11})_-(0;x)$ and $(m_{22})_+(0;x) = -(m_{21})_-(0;x)$, and hence by Corollary 5.5 the limits in (5.21) and (5.22) are in fact continuous across w = 0.

For convergence of moments, we need a uniform bound of $\pi_k(z;t)$ for |x| ≥ M for a fixed number M > 0. The results (5.29) and (5.30) are essentially in the analysis of [3], while (5.31)–(5.34) are new estimates. We again need to extend the method of [3] to obtain the results below. The proof is provided in Section 10 (see Sections 10.1.1 and 10.1.2 for the case when x ≥ M, and see Sections 10.2.1 and 10.2.2 for the case when x ≤ −M).

PROPOSITION 5.6
Define x through the relation
\[ \frac{2t}{k} = 1 - \frac{x}{2^{1/3}k^{2/3}}. \tag{5.28} \]
Then there exists $M_0$ such that the following holds for any fixed $M > M_0$. Let 0 < b < 1 and $0 < L < 2^{-3/2}\sqrt{M}$ be fixed. Then as k, t → ∞, we have for x ≥ M,
\[ \bigl|e^{tz}\pi_k(z;t)\bigr| \le Ce^{-c|x|^{3/2}}, \quad |z| \le b, \tag{5.29} \]
\[ \bigl|e^{tz^{-1}}z^{-k}\pi_k(z;t) - 1\bigr| \le Ce^{-c|x|^{3/2}}, \quad |z| \ge b^{-1}, \tag{5.30} \]
\[ \bigl|e^{-t\alpha}\pi_k(-\alpha;t)\bigr| \le Ce^{c|x|}, \quad \alpha = 1 - 2^{4/3}k^{-1/3}w,\ -L \le w \le L, \tag{5.31} \]
\[ \bigl|e^{-t\alpha^{-1}}(-\alpha)^{-k}\pi_k(-\alpha;t) - 1\bigr| \le Ce^{-c|x|^{3/2}}, \quad \alpha = 1 - 2^{4/3}k^{-1/3}w,\ -L \le w \le L, \tag{5.32} \]
and for x ≤ −M,
\[ \bigl|e^{-t\alpha}\pi_k(-\alpha;t)\bigr| \le C, \quad 0 < \alpha \le 1, \tag{5.33} \]
\[ \bigl|e^{-t\alpha^{-1}}(-\alpha)^{-k}\pi_k(-\alpha;t)\bigr| \le C, \quad \alpha \ge 1. \tag{5.34} \]
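COROLLARY 5.7
Let $\alpha = 1 - 2^{4/3}wk^{-1/3}$, and let −L ≤ w ≤ L for fixed L > 0. Under the assumption of Proposition 5.6, for x ≤ −M, we have
\[ \bigl|e^{-t\alpha}\pi_k(-\alpha;t)\bigr| \le Ce^{c|x|}, \tag{5.35} \]
\[ \bigl|e^{-t\alpha^{-1}}(-\alpha)^{-k}\pi_k(-\alpha;t)\bigr| \le Ce^{c|x|}, \tag{5.36} \]
for some positive constants C and c.

Proof
As in (5.27), we have
\[ \bigl|e^{-t(\alpha-\alpha^{-1})}\alpha^k\bigr| = e^{-2xw + (8/3)w^3 + O(k^{-1})}, \tag{5.37} \]
\[ \bigl|e^{t(\alpha-\alpha^{-1})}\alpha^{-k}\bigr| = e^{2xw - (8/3)w^3 + O(k^{-1})}. \tag{5.38} \]
Proposition 5.6 shows that (5.35) is true for w ≥ 0. For w < 0, write
\[ e^{-t\alpha}\pi_k(-\alpha;t) = e^{-t(\alpha-\alpha^{-1})}(-\alpha)^k\bigl[e^{-t\alpha^{-1}}(-\alpha)^{-k}\pi_k(-\alpha;t)\bigr]. \tag{5.39} \]
Now (5.35) follows from (5.34) and (5.37), since |w| ≤ L. The estimate (5.36) is proved similarly from (5.33) and (5.38).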
The result below is new and is used for the asymptotics of $L_{n,[\sqrt{2n}\alpha]}$ when α > 1 (see Section 10.3 for the proof).

PROPOSITION 5.8
Let α > 1 be fixed. When
\[ \frac{t}{k} = \frac{\alpha}{\alpha^2+1} - \frac{\alpha(\alpha^2-1)^{1/2}}{(\alpha^2+1)^{3/2}}\cdot\frac{x}{\sqrt{k}}, \quad x \text{ fixed}, \tag{5.40} \]
we have
\[ \lim_{k\to\infty} e^{-\alpha t}(-\alpha)^k\pi_k(-\alpha^{-1};t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^x e^{-(1/2)y^2}\,dy. \tag{5.41} \]

6. De-Poissonization lemmas

In this section, we describe a series of Tauberian-type de-Poissonization lemmas, which enable us to extract the asymptotics of the coefficient from the knowledge of the asymptotics of its generating function. Lemma 6.1 is due to Johansson [30], and Lemma 6.2 is taken from [3, Sec. 8]. Lemmas 6.3 and 6.4 are multi-index versions. Lemmas 6.1 and 6.3 are enough for both the convergence in distribution and the convergence of moments, but for convenience of computation we use Lemmas 6.2 and 6.4 for the convergence of moments in subsequent sections.
For a sequence $q = \{q_n\}_{n\ge 0}$, we define its Poisson generating function by
\[ \phi(\lambda) = e^{-\lambda}\sum_{0 \le n} q_n\frac{\lambda^n}{n!}. \tag{6.1} \]

LEMMA 6.1
For any fixed real number d > 0, set
\[ \mu_n^{(d)} = n + (2\sqrt{d+1}+1)\sqrt{n\log n}, \tag{6.2} \]
\[ \nu_n^{(d)} = n - (2\sqrt{d+1}+1)\sqrt{n\log n}. \tag{6.3} \]
Then there are constants C and $n_0$ such that for any sequence $q = \{q_n\}_{n\ge 0}$ satisfying (i) $q_n \ge q_{n+1}$, (ii) $0 \le q_n \le 1$, for all n ≥ 0,
\[ \phi(\mu_n^{(d)}) - Cn^{-d} \le q_n \le \phi(\nu_n^{(d)}) + Cn^{-d} \tag{6.4} \]
for all $n \ge n_0$.

LEMMA 6.2
For any fixed real number d > 0, there exist constants C and $n_0$ such that for any sequence $q = \{q_n\}_{n\ge 0}$ satisfying Lemma 6.1(i) and (ii),
\[ q_n \le C\phi(n - d\sqrt{n}), \tag{6.5} \]
\[ 1 - q_n \le C\bigl(1 - \phi(n + d\sqrt{n})\bigr), \tag{6.6} \]
for all $n \ge n_0$.

For multi-indexed sequences there are similar results. For $q = \{q_{n_1,n_2}\}_{n_1,n_2\ge 0}$, define
\[ \phi(\lambda_1,\lambda_2) = e^{-\lambda_1-\lambda_2}\sum_{n_1,n_2\ge 0} q_{n_1n_2}\frac{\lambda_1^{n_1}\lambda_2^{n_2}}{n_1!\,n_2!}. \tag{6.7} \]
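To get a feel for the de-Poissonization window in Lemma 6.1, the following sketch evaluates the single-index Poisson generating function (6.1) for a monotone test sequence and the shifted arguments (6.2) and (6.3). It is only an illustration: the step sequence, the truncation level `n_max`, and all function names are choices made here, and the constants C and $n_0$ of the lemma are not computed.

```python
from math import exp, lgamma, log, sqrt


def poisson_generating_function(q, lam, n_max):
    """phi(lam) = e^{-lam} * sum_{n <= n_max} q(n) lam^n / n!, with terms formed in log space."""
    if lam == 0.0:
        return q(0)
    total = 0.0
    for n in range(n_max + 1):
        total += q(n) * exp(-lam + n * log(lam) - lgamma(n + 1))
    return total


def mu(n, d):
    """mu_n^{(d)} from (6.2)."""
    return n + (2.0 * sqrt(d + 1.0) + 1.0) * sqrt(n * log(n))


def nu(n, d):
    """nu_n^{(d)} from (6.3); positive only once n is moderately large."""
    return n - (2.0 * sqrt(d + 1.0) + 1.0) * sqrt(n * log(n))


if __name__ == "__main__":
    def q_step(n):
        # A nonincreasing test sequence with values in [0, 1].
        return 1.0 if n <= 1000 else 0.0

    n, d = 1000, 1.0
    print(nu(n, d), mu(n, d))
    print(poisson_generating_function(q_step, mu(n, d), 1000),
          q_step(n),
          poisson_generating_function(q_step, nu(n, d), 1000))
```

For this step sequence the truncation at `n_max = 1000` is exact, since all later terms vanish; for a general sequence the truncation level has to be chosen well beyond λ.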
From the above two lemmas, we easily obtain the following lemmas.

LEMMA 6.3
For any real number d > 0, define $\mu_n^{(d)}$ and $\nu_n^{(d)}$ as in Lemma 6.1. Then there exist constants C and $n_0$ such that for any $q = \{q_{n_1,n_2}\}_{n_1,n_2\ge 0}$ satisfying (i) $q_{n_1,n_2} \ge q_{n_1+1,n_2}$, $q_{n_1,n_2} \ge q_{n_1,n_2+1}$, (ii) $0 \le q_{n_1,n_2} \le 1$, for all $n_1, n_2 \ge 0$,
\[ \phi(\mu_{n_1}^{(d)}, \mu_{n_2}^{(d)}) - C(n_1^{-d} + n_2^{-d}) \le q_{n_1n_2} \le \phi(\nu_{n_1}^{(d)}, \nu_{n_2}^{(d)}) + C(n_1^{-d} + n_2^{-d}) \tag{6.8} \]
for all $n_1, n_2 \ge n_0$.

Similarly, we have the following lemma.

LEMMA 6.4
For any fixed real number d > 0, there exist constants C and $n_0$ such that for any $q = \{q_{n_1,n_2}\}_{n_1,n_2\ge 0}$ satisfying the two conditions in Lemma 6.3,
\[ q_{n_1n_2} \le C\phi(n_1 - d\sqrt{n_1},\ n_2 - d\sqrt{n_2}), \tag{6.9} \]
\[ 1 - q_{n_1n_2} \le C\bigl(1 - \phi(n_1 + d\sqrt{n_1},\ n_2 + d\sqrt{n_2})\bigr), \tag{6.10} \]
for $n_1, n_2 \ge n_0$.

Remark. Similar lemmas hold true for sequences of arbitrarily many indices.

7. Proofs of Theorems 3.1, 3.2, 3.3, and 3.5

The following results follow from Proposition 5.1. Result (7.1) is derived in [3, Lem. 7.1(iii)], and the other cases are similar. We omit the details.

COROLLARY 7.1
Let $M > M_0$, where $M_0$ is given in Proposition 5.6. Then there exist positive constants C and c which are independent of M, and a positive constant C(M) which may depend on M, such that the following results hold for large l.
(i) (See [3].) Define x by $2t = l - x(l/2)^{1/3}$. For −M < x < M,
\[ \Bigl|\sum_{j\ge l}\log N_j(t)^{-1} - 2\log F(x)\Bigr| \le \frac{C(M)}{l^{1/3}} + Ce^{-cM^{3/2}}. \tag{7.1} \]
(ii) Define x by $t = l - (x/2)l^{1/3}$. For −M < x < M,
\[ \Bigl|\sum_{j\ge l}\log N_{2j+2}(t)^{-1} - \log F(x)\Bigr|,\ \Bigl|\sum_{j\ge l}\log N_{2j+1}(t)^{-1} - \log F(x)\Bigr| \le \frac{C(M)}{l^{1/3}} + Ce^{-cM^{3/2}}, \tag{7.2} \]
\[ \Bigl|\sum_{j\ge l}\log\bigl(1 - \pi_{2j+2}(0;t)\bigr) - \log E(x)\Bigr|,\ \Bigl|\sum_{j\ge l}\log\bigl(1 + \pi_{2j+1}(0;t)\bigr) - \log E(x)\Bigr| \le \frac{C(M)}{l^{1/3}} + Ce^{-cM^{3/2}}, \tag{7.3} \]
\[ \Bigl|\sum_{j\ge l}\log\bigl(1 + \pi_{2j+2}(0;t)\bigr) + \log E(x)\Bigr|,\ \Bigl|\sum_{j\ge l}\log\bigl(1 - \pi_{2j+1}(0;t)\bigr) + \log E(x)\Bigr| \le \frac{C(M)}{l^{1/3}} + Ce^{-cM^{3/2}}. \tag{7.4} \]

These results yield the asymptotics of the determinants in Theorem 4.1.
COROLLARY 7.2
There exists $M_1$ such that for $M > M_1$, there exist positive constants $C$ and $c$ which are independent of $M$, and a positive constant $C(M)$ which may depend on $M$, such that the following results hold for large $l$.
(i) Define $x$ by $2t = l - x(l/2)^{1/3}$. For $-M < x < M$,
\[
\bigl| e^{-t^2} D_l(t) - F(x)^2 \bigr| \le \frac{C(M)}{l^{1/3}} + Ce^{-cM^{3/2}}. \tag{7.5}
\]
(ii) Define $x$ by $t = l - (x/2)l^{1/3}$. For $-M < x < M$,
\[
\bigl| e^{-t^2/2} D_l^{--}(t) - F(x)E(x)^{-1} \bigr| \le \frac{C(M)}{l^{1/3}} + Ce^{-cM^{3/2}}, \tag{7.6}
\]
\[
\bigl| e^{-t^2/2} D_{l-1}^{++}(t) - F(x)E(x) \bigr| \le \frac{C(M)}{l^{1/3}} + Ce^{-cM^{3/2}}, \tag{7.7}
\]
\[
\bigl| e^{-t^2/2+t} D_l^{+-}(t) - F(x)E(x)^{-1} \bigr| \le \frac{C(M)}{l^{1/3}} + Ce^{-cM^{3/2}}, \tag{7.8}
\]
\[
\bigl| e^{-t^2/2-t} D_l^{-+}(t) - F(x)E(x) \bigr| \le \frac{C(M)}{l^{1/3}} + Ce^{-cM^{3/2}}. \tag{7.9}
\]
Proof
For $C$ and $c$ in Corollary 7.1, take $M_1 > M_0$ such that $Ce^{-cM_1^{3/2}} \le 1/2$. Once we fix $M > M_1$, then for $l$ large, $C(M)/l^{1/3} + Ce^{-cM^{3/2}} < 1$, and hence by (7.1), $\bigl|\sum_{j \ge l} \log N_j(t)^{-1} - 2\log F(x)\bigr| \le 1$. Using $|e^x - 1| \le (e-1)|x|$ for $|x| \le 1$,
\[
\bigl| e^{-t^2} D_l(t) - F(x)^2 \bigr| = F(x)^2 \Bigl| e^{\sum_{j \ge l} \log N_j(t)^{-1} - 2\log F(x)} - 1 \Bigr| \le (e-1)F(x)^2 \Bigl| \sum_{j \ge l} \log N_j(t)^{-1} - 2\log F(x) \Bigr|. \tag{7.10}
\]
But from (2.11) and (2.13), $F(x)$ is bounded for $x \in \mathbb{R}$. Hence, using (7.1), we obtain the result for (i) with new constants $C$, $c$, and $C(M)$. For (ii), we note that $F(x)E(x)$ and $F(x)E(x)^{-1}$ are bounded for $x \in \mathbb{R}$ from (2.11)–(2.14).

From Proposition 5.2, Corollary 5.3, and Theorems 4.1 and 4.2, Corollary 7.2 immediately yields the following asymptotics for Poisson generating functions.
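PROPOSITION 7.3
Let $2t = l - x(l/2)^{1/3}$, where $x$ is fixed. As $l \to \infty$, for each fixed $\alpha$, $\beta$,
\[
P_l(t;\alpha) \to F_4(x), \qquad 0 \le \alpha < 1, \tag{7.11}
\]
\[
P_l(t;1) \to F_1(x), \tag{7.12}
\]
\[
P_l(t;\alpha) \to 0, \qquad \alpha > 1, \tag{7.13}
\]
\[
P_l(t;\beta) \to F_1(x), \qquad \beta \ge 0. \tag{7.14}
\]
Let $4t = l - x(2l)^{1/3}$, where $x$ is fixed. As $l \to \infty$, for each fixed $\alpha$, $\beta$,
\[
P_l(t;\alpha,\beta) \to F_2(x), \qquad 0 \le \alpha < 1,\ \beta \ge 0, \tag{7.15}
\]
\[
P_l(t;1,\beta) \to F_1(x)^2, \qquad \beta \ge 0, \tag{7.16}
\]
\[
P_l(t;\alpha,\beta) \to 0, \qquad \alpha > 1,\ \beta \ge 0. \tag{7.17}
\]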
Similarly, using Proposition 5.4, Corollary 5.5, and Theorem 4.1, we have the following theorem.

THEOREM 7.4
Let $2t = l - x(l/2)^{1/3}$, where $x$ is fixed. As $l \to \infty$, we have for any fixed $w \in \mathbb{R}$,
\[
P_l(t;\alpha) \to F(x;w), \qquad \alpha = 1 - \frac{2^{4/3} w}{l^{1/3}}. \tag{7.18}
\]
Let $4t = l - x(2l)^{1/3}$, where $x$ is fixed. As $l \to \infty$, we have for each fixed $\beta$ and $w \in \mathbb{R}$,
\[
P_l(t;\alpha,\beta) \to F(x;w), \qquad \alpha = 1 - \frac{2^{5/3} w}{l^{1/3}},\ \beta \ge 0. \tag{7.19}
\]
Recall the relation between $Q_l^{\sim}(\lambda)$ and $P_l^{\sim}(t)$ in (4.7)–(4.9). We now use the de-Poissonization Lemma 6.3 to obtain the asymptotic results of Theorems 3.1 and 3.2. In order to apply the de-Poissonization lemma, we need the following monotonicity results.

LEMMA 7.5 (Monotonicity)
For any $l$, $\Pr(L_{k,m} \le l)$, $\Pr(L_{k,m} \le l)$, and $\Pr(L_{k,m_+,m_-} \le l)$ are monotone decreasing in $k$, $m$, $m_+$, and $m_-$.
Proof We first consider Pr(L k,m ≤ l). Let f km := Pr(L k,m ≤ l) · |Sk,m | be the number of elements in Sk,m with no increasing subsequence greater than l. Consider the map h : Sk,m−1 × {1, 2, . . . , 2k + m} → Sk,m defined as follows: for (π, j) ∈ Sk,m−1 ×
$\{1, 2, \dots, 2k+m\}$, set $h(\pi,j)(x) = \pi(x)$ for $1 \le x < j-1$, $h(\pi,j)(j) = j$, and set $h(\pi,j)(x) = \pi(x-1)$ for $j < x \le 2k+m$. Then it is easy to see that $h^{-1}(\sigma)$ consists of $m$ elements and hence that $(2k+m)|S_{k,m-1}| = m|S_{k,m}|$. Moreover, if $\pi \in S_{k,m-1}$ has an increasing subsequence of length greater than $l$, then $h(\pi,j)$ has an increasing subsequence of length greater than $l$. Thus $(2k+m) f_{k(m-1)} \ge m f_{km}$. But since $|S_{k,m}| = (2k+m)!/(2^k k!\, m!)$, we obtain $\Pr(L_{k,m-1} \le l) \ge \Pr(L_{k,m} \le l)$. A similar argument works for the other cases. Note that $|S_{k,m_+,m_-}| = (2k+m_++m_-)!/(k!\, m_+!\, m_-!)$.

Thus Lemma 6.3 can be applied to obtain the asymptotic results in Theorems 3.1 and 3.2. The proofs are similar to that in [3, Sec. 9].

Now we consider convergence of moments. For this we first obtain the following estimates that follow from Proposition 5.1(i), (ii), (iv), and (v). The proof is very similar to that of [3, Lem. 7.1(i), (ii), (iv), (v)]. Compare the results with (2.11)–(2.14), noting Corollary 7.1.

COROLLARY 7.6
Set
\[
t = l - \frac{x}{2}\, l^{1/3}. \tag{7.20}
\]
There exists $M_2$ such that for a fixed $M > M_2$, there are positive constants $C = C(M)$ and $c = c(M)$ such that the following results hold.
(i) For $x \ge M$,
\[
1 - \prod_{j \ge l} N_{2j+2}(t)^{-1},\quad 1 - \prod_{j \ge l} N_{2j+1}(t)^{-1} \le Ce^{-c|x|^{3/2}}, \tag{7.21}
\]
\[
1 - \prod_{j \ge l} (1 - \pi_{2j+2}(0;t)),\quad 1 - \prod_{j \ge l} (1 + \pi_{2j+1}(0;t)) \le Ce^{-c|x|^{3/2}}, \tag{7.22}
\]
\[
1 - \prod_{j \ge l} (1 + \pi_{2j+2}(0;t)),\quad 1 - \prod_{j \ge l} (1 - \pi_{2j+1}(0;t)) \le Ce^{-c|x|^{3/2}}. \tag{7.23}
\]
(ii) For $x \le -M$,
\[
\prod_{j \ge l} N_{2j+2}(t)^{-1},\quad \prod_{j \ge l} N_{2j+1}(t)^{-1} \le Ce^{-c|x|^{3}}, \tag{7.24}
\]
\[
\prod_{j \ge l} (1 - \pi_{2j+2}(0;t)),\quad \prod_{j \ge l} (1 + \pi_{2j+1}(0;t)) \le Ce^{-c|x|^{3/2}}, \tag{7.25}
\]
\[
\prod_{j \ge l} (1 + \pi_{2j+2}(0;t)),\quad \prod_{j \ge l} (1 - \pi_{2j+1}(0;t)) \le Ce^{+c|x|^{3/2}}. \tag{7.26}
\]
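Returning briefly to the monotonicity lemma (Lemma 7.5), whose proof inserts a fixed point into an involution: the counting identity $(2k+m)|S_{k,m-1}| = m|S_{k,m}|$ and the resulting inequality can be checked by brute force for small sizes. The sketch below is only an illustration under stated assumptions. It takes $L_{k,m}$ to be the longest strictly increasing subsequence of an involution with $k$ 2-cycles and $m$ fixed points, and the helper names (`lis_length`, `involution_classes`, and so on) are hypothetical, not taken from the paper.

```python
from bisect import bisect_left
from fractions import Fraction
from itertools import permutations
from math import factorial


def lis_length(seq):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []
    for value in seq:
        pos = bisect_left(tails, value)
        if pos == len(tails):
            tails.append(value)
        else:
            tails[pos] = value
    return len(tails)


def involution_classes(n_max):
    """Map (k, m) -> list of involutions with k 2-cycles and m fixed points, for 2k + m <= n_max."""
    classes = {}
    for n in range(n_max + 1):
        for p in permutations(range(n)):
            if all(p[p[i]] == i for i in range(n)):
                m = sum(1 for i in range(n) if p[i] == i)
                classes.setdefault(((n - m) // 2, m), []).append(p)
    return classes


def prob_at_most(perms, l):
    """Exact probability that the longest increasing subsequence is at most l."""
    return Fraction(sum(1 for p in perms if lis_length(p) <= l), len(perms))


def sizes_match_formula(classes):
    """Check |S_{k,m}| = (2k+m)! / (2^k k! m!) for every enumerated class."""
    return all(
        len(perms) == factorial(2 * k + m) // (2 ** k * factorial(k) * factorial(m))
        for (k, m), perms in classes.items()
    )


def monotone_in_fixed_points(classes):
    """Check Pr(L_{k,m-1} <= l) >= Pr(L_{k,m} <= l) for all enumerated k, m, l."""
    for (k, m), perms in classes.items():
        if m == 0:
            continue
        fewer = classes[(k, m - 1)]
        for l in range(2 * k + m + 1):
            if prob_at_most(fewer, l) < prob_at_most(perms, l):
                return False
    return True


CLASSES = involution_classes(7)
print(sizes_match_formula(CLASSES), monotone_in_fixed_points(CLASSES))
```

Exact rational arithmetic is used so that the comparison of probabilities is not affected by rounding; the enumeration is exponential in size and is meant only for very small $2k+m$.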
Remark. From the definitions of $P_l^{\sim}(t)$ and the equalities of Theorem 4.1, we know that all the infinite products above are between 0 and 1.

Now as in [3, Sec. 9], using Lemma 6.4 and Theorems 3.1 and 3.2, this implies Theorem 3.3. Theorem 3.5 follows from Theorem 4.3.

8. Proofs of Theorems 3.4 and 3.6
In this section, we prove Theorem 3.4 by summing up the asymptotic results of Theorems 3.1, 3.2, and 3.3. Theorem 3.6 can be proved in a similar way from Theorem 4.3.

Proof of (3.20)
Note that we have a disjoint union
\[
\tilde{S}_n = \bigcup_{2k+m=n} S_{k,m}. \tag{8.1}
\]
Set $p_{kml} = \Pr(L_{km} \le l)$, the probability that the length of the longest decreasing subsequence of $\pi \in S_{k,m}$ is less than or equal to $l$. As the first row and the first column of $\pi$ in $\tilde{S}_n$ have the same statistics, we have
\[
\Pr(\tilde{L}_n \le l) = \frac{1}{|\tilde{S}_n|} \sum_{2k+m=n} p_{kml} |S_{k,m}|. \tag{8.2}
\]
Note that
\[
|S_{k,m}| = \binom{2k+m}{2k} \frac{(2k)!}{2^k k!}. \tag{8.3}
\]
As $n \to \infty$ (see [33, pp. 66–67]), we have
\[
|\tilde{S}_n| = \sum_{2k+m=n} |S_{k,m}| = \frac{1}{\sqrt{2}}\, n^{n/2} e^{-n/2+\sqrt{n}-1/4} \Bigl( 1 + \frac{7}{24} n^{-1/2} + O\bigl(n^{-3/4}\bigr) \Bigr), \tag{8.4}
\]
and the main contribution to the sum comes from $\sqrt{n} - n^{\varepsilon+1/4} \le m \le \sqrt{n} + n^{\varepsilon+1/4}$. Fix $0 < a < 1 < b$. We split the sum in (8.2) into two pieces:
\[
\Pr(\tilde{L}_n \le l) = \frac{1}{|\tilde{S}_n|} \Bigl[ \sum_{(*)} p_{kml} |S_{k,m}| + \sum_{(**)} p_{kml} |S_{k,m}| \Bigr], \tag{8.5}
\]
where $(*)$ is the region $a\sqrt{n} \le m \le b\sqrt{n}$ and where $(**)$ is the rest. For $2k+m = n$, the quantity $|S_{k,m}| = \binom{n}{2k}(2k)!/(2^k k!)$ is unimodal for $0 \le k \le n$, and the maximum is achieved when $k \sim (n - \sqrt{n})/2$ as $n \to \infty$. Hence
\[
\sum_{(**)} p_{kml} |S_{k,m}| \le n \cdot \max\bigl( |S_{k,[a\sqrt{n}]}|,\ |S_{k,[b\sqrt{n}]}| \bigr). \tag{8.6}
\]
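The counting facts (8.3) and (8.4) are easy to probe numerically. The following sketch, with hypothetical helper names, sums the class sizes $|S_{k,m}|$ over $2k+m=n$, compares the total against the right-hand side of (8.4) with the $O(n^{-3/4})$ term dropped, and locates the most likely number of fixed points. Working with logarithms is an implementation choice made here to avoid floating-point overflow; it is not part of the paper's argument.

```python
from math import comb, exp, factorial, log, sqrt


def class_size(k, m):
    """|S_{k,m}| as in (8.3): choose the 2k moved points, then pair them up."""
    return comb(2 * k + m, 2 * k) * factorial(2 * k) // (2 ** k * factorial(k))


def involution_count(n):
    """|S~_n|: total number of involutions of n letters, summed over 2k + m = n."""
    return sum(class_size(k, n - 2 * k) for k in range(n // 2 + 1))


def log_asymptotic(n):
    """Logarithm of the right-hand side of (8.4) without the O(n^{-3/4}) term."""
    return (-0.5 * log(2) + 0.5 * n * log(n) - 0.5 * n + sqrt(n) - 0.25
            + log(1 + 7 / (24 * sqrt(n))))


def relative_error(n):
    """(asymptotic value / exact count) - 1, computed through logarithms."""
    return exp(log_asymptotic(n) - log(involution_count(n))) - 1


def most_likely_fixed_points(n):
    """The m (with n - m even) that maximizes |S_{k,m}| subject to 2k + m = n."""
    return max(range(n % 2, n + 1, 2), key=lambda m: class_size((n - m) // 2, m))


print(involution_count(10), relative_error(100), most_likely_fixed_points(100))
```

For $n = 100$ the maximizing $m$ lands at $\sqrt{n} = 10$, in line with the statement that the maximum is achieved when $k \sim (n - \sqrt{n})/2$.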
Using Stirling’s formula for (8.3), for any fixed $c$, when $2k + [c\sqrt{n}] = n$, we have
\[
|S_{k,[c\sqrt{n}]}| \sim \frac{e^{-1/2+c^2/4}}{\sqrt{\pi c}\, n^{1/4}}\, n^{n/2} e^{-n/2+\sqrt{n}(c - c\log c)}. \tag{8.7}
\]
Hence, using (8.4), we have
\[
\frac{1}{|\tilde{S}_n|} \sum_{(**)} p_{kml} |S_{k,m}| \le C n^{3/4} \cdot \max\bigl( e^{\sqrt{n}(a-1-a\log a)},\ e^{\sqrt{n}(b-1-b\log b)} \bigr). \tag{8.8}
\]
But $f(x) = x - 1 - x\log x$ is increasing in $0 < x < 1$, is decreasing in $x > 1$, and $f(1) = 0$. Therefore there are positive constants $C$ and $c$ such that for large $n$,
\[
\frac{1}{|\tilde{S}_n|} \sum_{(**)} p_{kml} |S_{k,m}| \le Ce^{-c\sqrt{n}}. \tag{8.9}
\]
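On the other hand, Lemma 6.3 says that (recall (4.8)) for any fixed real number $d > 0$, there is a constant $C$ such that for $a\sqrt{n} \le m \le b\sqrt{n}$,
\[
P_l\bigl( (2\mu_k^{(d)})^{1/2};\ \mu_m^{(d)} (2\mu_k^{(d)})^{-1/2} \bigr) - Cn^{-d/2} \le p_{kml} \le P_l\bigl( (2\nu_k^{(d)})^{1/2};\ \nu_m^{(d)} (2\nu_k^{(d)})^{-1/2} \bigr) + Cn^{-d/2} \tag{8.10}
\]
for sufficiently large $n$. Since $P_l(t;\beta) \le P_{l+1}(t;\beta)$, Theorem 4.1 for $P_{2l+1}(t;\beta)$ yields
\[
e^{-\mu_k^{(d)}} D^{++}_{[(l-1)/2]}\bigl( (2\mu_k^{(d)})^{1/2} \bigr) - Cn^{-d/2} \le p_{kml} \le e^{-\nu_k^{(d)}} D^{++}_{[l/2]}\bigl( (2\nu_k^{(d)})^{1/2} \bigr) + Cn^{-d/2}. \tag{8.11}
\]
Let $l = [2\sqrt{n} + xn^{1/6}]$. For $a\sqrt{n} \le m \le b\sqrt{n}$ and hence for $(n - b\sqrt{n})/2 \le k \le (n - a\sqrt{n})/2$,
\[
\bigl( l/2 - (2\mu_k^{(d)})^{1/2} \bigr)\, 2(l/2)^{-1/3},\quad \bigl( l/2 - (2\nu_k^{(d)})^{1/2} \bigr)\, 2(l/2)^{-1/3} = x + O\bigl( n^{-1/6} \sqrt{\log n} \bigr). \tag{8.12}
\]
Also, note that from the asymptotics (2.3), (2.4), and (2.11)–(2.14),
\[
(F(x)E(x))' = -\frac{1}{2}\bigl( v(x) + u(x) \bigr) F(x)E(x) \tag{8.13}
\]
is bounded for $x \in \mathbb{R}$. Hence, using (7.7) in Corollary 7.2, (8.12), and (8.13), we obtain
\[
\begin{aligned}
\bigl| e^{-\nu_k^{(d)}} D^{++}_{[l/2]}\bigl( (2\nu_k^{(d)})^{1/2} \bigr) - (FE)(x) \bigr|
&\le \bigl| e^{-\nu_k^{(d)}} D^{++}_{[l/2]}\bigl( (2\nu_k^{(d)})^{1/2} \bigr) - (FE)\bigl( (l/2 - (2\nu_k^{(d)})^{1/2})\, 2(l/2)^{-1/3} \bigr) \bigr| \\
&\quad + \bigl| (FE)\bigl( (l/2 - (2\nu_k^{(d)})^{1/2})\, 2(l/2)^{-1/3} \bigr) - (FE)(x) \bigr| \\
&\le C(M)n^{-1/6} + Ce^{-cM^{3/2}} + Cn^{-1/6}\sqrt{\log n}.
\end{aligned} \tag{8.14}
\]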
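Therefore we have
\[
\sum_{(*)} p_{kml} |S_{k,m}| \le \Bigl[ F(x)E(x) + C(M)n^{-1/6} + Ce^{-cM^{3/2}} + Cn^{-1/6}\sqrt{\log n} \Bigr] \sum_{(*)} |S_{k,m}|. \tag{8.15}
\]
Similarly,
\[
\sum_{(*)} p_{kml} |S_{k,m}| \ge \Bigl[ F(x)E(x) - C(M)n^{-1/6} - Ce^{-cM^{3/2}} - Cn^{-1/6}\sqrt{\log n} \Bigr] \sum_{(*)} |S_{k,m}|. \tag{8.16}
\]
But from (8.9),
\[
\frac{1}{|\tilde{S}_n|} \sum_{(*)} |S_{k,m}| = 1 - \frac{1}{|\tilde{S}_n|} \sum_{(**)} |S_{k,m}| = 1 + O\bigl( e^{-c\sqrt{n}} \bigr). \tag{8.17}
\]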
Thus, using (8.5), (8.9), (8.15), (8.16), and (8.17), we obtain (3.20).

Proof of (3.22)
As in [3, Sec. 9], integrating by parts, we have
\[
E\bigl( (\tilde{\chi}_n)^p \bigr) = \int_{-\infty}^{\infty} x^p \, dF_n(x) = -\int_{-\infty}^{0} p x^{p-1} F_n(x) \, dx + \int_{0}^{\infty} p x^{p-1} (1 - F_n(x)) \, dx, \tag{8.18}
\]
where $F_n(x) := \Pr(\tilde{\chi}_n \le x) = \Pr(\tilde{L}_n \le 2\sqrt{n} + xn^{1/6})$. From Theorem 4.1 and Corollary 7.6, we have
\[
1 - e^{-t^2/2} D_l^{++}(t) \le Ce^{-c|x|^{3/2}}, \qquad x \ge M, \tag{8.19}
\]
\[
e^{-t^2/2} D_l^{++}(t) \le Ce^{-c|x|^{3}}, \qquad x \le -M, \tag{8.20}
\]
for a fixed $M > M_2$ where $t = l - (x/2)l^{1/3}$. Noting that $P_{2l+1}(t;\beta) = e^{-t^2/2} D_l^{++}(t)$ for all $\beta \ge 0$, from (8.2), Lemma 6.4, (8.19), and (8.20), we obtain
\[
1 - F_n(x) \le Ce^{-c|x|^{3/2}}, \qquad x \ge M, \tag{8.21}
\]
\[
F_n(x) \le Ce^{-c|x|^{3}}, \qquad x \le -M. \tag{8.22}
\]
Now using convergence in distribution, the dominated convergence theorem gives (3.22).
Remark. We could also proceed using
\[
\Pr(\tilde{L}_n \le l) = \frac{1}{|\tilde{S}_n|} \sum_{2k+m=n} p_{kml} |S_{k,m}|. \tag{8.23}
\]
The main contribution to the sum from $|S_{k,m}|$ comes from the region $|m - \sqrt{n}| \le n^{1/4+\varepsilon}$. On the other hand, from Theorem 3.2, when $m = \sqrt{n} - 2wn^{1/3}$, the quantity $p_{kml}$ converges to $F(x;w)$. Since the region $m = \sqrt{n} + cn^{1/4+\varepsilon}$ is much narrower than the region $m = \sqrt{n} + cn^{1/3}$, the main contribution to the sum comes from the case when $w = 0$, implying that
\[
\Pr(\tilde{L}_n \le l) \sim \frac{1}{|\tilde{S}_n|} \sum_{m=0}^{n} F(x;0) |S_{k,m}| = F_1(x). \tag{8.24}
\]
In the following proof for signed involutions, we make this argument rigorous.

Proof of (3.21)
We have a disjoint union
\[
\tilde{S}_n = \bigcup_{2k+m_++m_-=n} S_{k,m_+,m_-}. \tag{8.25}
\]
Hence again
\[
\Pr(\tilde{L}_n \le l) = \frac{1}{|\tilde{S}_n|} \sum_{2k+m_++m_-=n} p_{km_+m_-l} |S_{k,m_+,m_-}|. \tag{8.26}
\]
One can check that
\[
|S_{k,m_+,m_-}| = \frac{(2k+m_++m_-)!}{k!\, m_+!\, m_-!}. \tag{8.27}
\]
Hence we have
\[
|\tilde{S}_n| = \sum_{2k+m_++m_-=n} |S_{k,m_+,m_-}| = \sum_{0 \le k \le [n/2]}\ \sum_{0 \le m_+ \le n-2k} f(m_+,k), \tag{8.28}
\]
where
\[
f(m_+,k) := \frac{n!}{m_+!\,(n - m_+ - 2k)!\, k!}. \tag{8.29}
\]
For fixed $0 \le k \le [n/2]$, $f(m_+,k)$ is unimodal in $m_+$ and achieves its maximum when $m_+ \sim n/2 - k$. Also, $f(n/2-k, k)$ is unimodal in $k$, and the maximum is attained when $k \sim n/2 - \sqrt{n/2}$. Hence $f(m_+,k)$ has its maximum when $(m_+,k) \sim (\sqrt{n/2},\ n/2 - \sqrt{n/2})$. Consider the disk $D$ of radius $n^{1/4+\varepsilon}$ centered at $(\sqrt{n/2},\ n/2 - \sqrt{n/2})$. We show that the main contribution to the sum in (8.28) comes from $D$. Set
\[
m_+ = \sqrt{\frac{n}{2}} + x, \qquad k = \frac{n}{2} - \sqrt{\frac{n}{2}} + y, \qquad |x|, |y| \le n^{1/4+\varepsilon}. \tag{8.30}
\]
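As a numerical sanity check on (8.28)–(8.29), one can sum $f(m_+,k)$ directly, locate its maximizer, and compare the total with the leading term $(2e)^{-1/2}(2n)^{n/2}e^{-n/2+\sqrt{2n}}$ that the argument goes on to derive in (8.34). The sketch below uses hypothetical helper names; the 10% tolerance mentioned afterwards is an ad hoc choice for illustration, not a bound from the paper.

```python
from math import exp, factorial, log, sqrt


def f(m_plus, k, n):
    """f(m_+, k) from (8.29); m_- = n - m_+ - 2k is implied."""
    m_minus = n - m_plus - 2 * k
    return factorial(n) // (factorial(m_plus) * factorial(m_minus) * factorial(k))


def index_pairs(n):
    """All (m_+, k) in the double sum (8.28)."""
    return ((mp, k) for k in range(n // 2 + 1) for mp in range(n - 2 * k + 1))


def signed_involution_count(n):
    """|S~_n| in the signed case, via the double sum (8.28)."""
    return sum(f(mp, k, n) for mp, k in index_pairs(n))


def argmax_f(n):
    """The pair (m_+, k) at which f is largest."""
    return max(index_pairs(n), key=lambda pair: f(pair[0], pair[1], n))


def log_leading_term(n):
    """log of (2e)^{-1/2} (2n)^{n/2} exp(-n/2 + sqrt(2n)), the leading term in (8.34)."""
    return -0.5 * (log(2) + 1) + 0.5 * n * log(2 * n) - 0.5 * n + sqrt(2 * n)


def relative_error(n):
    """(leading term / exact count) - 1, computed through logarithms."""
    return exp(log_leading_term(n) - log(signed_involution_count(n))) - 1


print([signed_involution_count(n) for n in range(5)], argmax_f(200))
```

For $n = 200$ the maximizer sits within a couple of units of $(\sqrt{n/2},\ n/2 - \sqrt{n/2}) = (10, 90)$, and for $n = 400$ the leading term is within about ten percent of the exact count, consistent with a slowly decaying relative correction.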
By Stirling’s formula, √ √ 1 2 2 (2n)n/2 e−n/2+ 2n e−(x +(x+2y) )/ 2n 1 + O(n −1/4+3/2 ) . enπ (8.31) Hence from the unimodality discussed above,
f (m + , k) = √
X
(m + ,k)∈D /
f (m + , k) ≤ n 2
max
(m + ,k)∈∂ D
√ 2 √ n2 f (m + , k) ≤ √ (2n)n/2 e−n/2+ 2n e−5 2n , enπ
(8.32)
and by summing up using (8.31), X
(m + ,k)∈D
√ 1 f (m + , k) = √ (2n)n/2 e−n/2+ 2n 1 + O(n −1/4+3/2 ) . 2e
(8.33)
Hence we have √ S˜ = √1 (2n)n/2 e−n/2+ 2n 1 + O(n −1/4+3/2 ) , n 2e
and the main contribution to the sum in (8.28) comes from D. As in Theorem 3.4, we write X X 1 pkm + m − l S˜k,m + ,m − + pkm + m − l S˜k,m + ,m − . Pr( L˜ n ≤ l) = S˜ c n
D
(8.34)
(8.35)
D
From (8.32) and (8.34),
1 X 2 pnm + m − l S˜n,m + ,m − ≤ e−10n . S˜ c n
(8.36)
D
On the other hand, by the remark to Lemma 6.3 and Theorem 4.1 (recall (4.9)), (d) (d) ∗ pkm + m − l ≤ e−νm + −νk π[l/2] −
νm(d)+
(νk(d) )1/2
; (νk(d) )1/2 D[l/2] ((νk(d) )1/2 ) + Cn −d/2
(8.37)
for large n. We have a similar inequality of the other direction with ν, l, and +Cn −d/2 replaced by µ,√ l − 1, and −Cn −d/2 . Let l = [2 2n + x22/3 (2n)1/6 ]. In the region D, p (d) (l/2 − 4(νk )1/2 )(l/4)−1/3 = x + O(n −1/6 log n) (8.38) and
(1 − νm(d)+ /(νk(d) )1/2 )2−4/3 (l/2)1/3 = O(n −1/12+ ).
(8.39)
Hence, as in (8.14), using (7.5) in Corollary 7.2, Proposition 5.4, and Corollary 5.5, −ν (d) −ν (d) ∗ e m + k π [l/2] − ≤ Cn
(d)
νm +
; (νk(d) )1/2 D[l/2] ((νk(d) )1/2 ) − (νk(d) )1/2 p −1/6 −1/12+ −1/6 log n + Cn
+ C(M)n
F (x; 0)
+ Ce
(8.40)
−cM 3/2
for a constant C(M) which may depend on M and constants C and c which are independent of M. Thus for large n, √ ˜ L n − 2 2n Pr 2/3 ≤ x ≤ F (x; 0) + e(n, M) (8.41) 2 (2n)1/6 with some error e(n, M) such that lim M→∞ limn→∞ e(n, M) = 0. Similarly we have an inequality for the other direction. Recalling F (x; 0) = F1 (x)2 from (2.35), we obtain (3.21). Proof of (3.23) Integrating by parts, Z E (χ˜ n ) p =
∞
−∞
=−
Z
x p d Fn (x) 0
px −∞
p−1
Fn (x) d x +
Z
∞
(8.42) px
p−1
0
(1 − Fn (x)) d x,
√ where Fn (x) := Pr(χ˜ n ≤ x) = Pr( L˜ n ≤ 2 2n + x22/3 (2n)1/6 ). Note that when x < −(4n)1/3 , Fn (x) = 0, and that when x > 21/6 n 5/6 − (4n)1/3 , Fn (x) = 1. Let M > M0 fixed. Consider the case when −(4n)1/3 ≤ x ≤ −M. From (8.35) and (8.36), 1 X 2 Fn (x) ≤ pkm + m − l S˜k,m + ,m − + Ce−10n , (8.43) S˜ n D √ where l = [2 2n + x22/3 (2n)1/6 ]. We apply Lemma 6.4, Corollary 7.6, and (5.36) in Corollary 5.7. Note that we are in the region α → 1 faster than k −1/3 , and hence w is bounded, say, −1 ≤ w ≤ 1. So we can apply (5.36) in Corollary 5.7. Then we obtain 3 2 (8.44) Fn (x) ≤ Ce−c|x| + Ce−10n . Since −(4n)1/3 ≤ x ≤ −M, we have e−10n
2
≤ e−(10/2
4 )|x|6
;
(8.45)
thus 3
Fn (x) ≤ Ce−c|x| + Ce−(10/2
4 )|x|6
.
(8.46)
On the other hand, when M ≤ x ≤ 21/6 n 5/6 − (4n)1/3 , similarly we have 1 X 2 1 − Fn (x) ≤ (1 − pkm + m − l ) S˜k,m + ,m − + Ce−10n . (8.47) S˜ n
D
Using Lemma 6.4, Corollary 7.6, and (5.32) in Proposition 5.6, we obtain 1 − Fn (x) ≤ Ce−c|x|
3/2
Since M ≤ x ≤ 21/6 n 5/6 − (4n)1/3 , e−10n
2
≤ e−(10/2
2
+ Ce−10n .
2/5 )|x|12/5
(8.48)
;
(8.49)
thus 1 − Fn (x) ≤ Ce−c|x|
3/2
+ Ce−(10/2
2/5 )|x|12/5
.
(8.50)
Therefore, using the dominated convergence theorem, we obtain (3.23).

9. Asymptotics for α > 1
As we remarked after Theorem 3.3, when $\alpha > 1$, we must use a different scaling to obtain useful results. Let $L(t;\alpha)$ and $L(t;\alpha,\beta)$ be random variables with the distribution functions given by $\Pr(L(t;\alpha) \le l) = P_l(t;\alpha)$ and $\Pr(L(t;\alpha,\beta) \le l) = P_l(t;\alpha,\beta)$, respectively: the Poissonized versions of the corresponding statistics $L$. Under appropriate scalings, we obtain the Gaussian distribution in the limit.

THEOREM 9.1
For $\alpha > 1$ and $\beta \ge 0$ fixed,
\[
\lim_{t \to \infty} \Pr\Bigl( \frac{L(t;\alpha) - (\alpha + \alpha^{-1})t}{\sqrt{(\alpha - \alpha^{-1})t}} \le x \Bigr) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-(1/2)y^2} \, dy, \tag{9.1}
\]
\[
\lim_{t \to \infty} \Pr\Bigl( \frac{L(t;\alpha,\beta) - 2(\alpha + \alpha^{-1})t}{\sqrt{2(\alpha - \alpha^{-1})t}} \le x \Bigr) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-(1/2)y^2} \, dy. \tag{9.2}
\]
Proof p Let l = (α + α −1 )t + (α − α −1 )t for L . For large t, 2t/l ≤ c < 1 for some c > 0. From Proposition 5.1(i), using Theorem 4.1, it is easy to see that 2 2 e−t /2 Dl±± (t), e−t ±t Dl±∓ (t) → 1 exponentially as l → ∞. Now Theorem 4.1, p (5.29), and Proposition 5.8 imply (9.1). For L , let l = 2(α +α −1 )t + 2(α − α −1 )t. 2 Similarly, e−t Dl (t) → 1 exponentially as l → ∞, and we obtain (9.2). Unfortunately, we can no longer apply the de-Poissonization technique; the difficulty is that (α + α −1 )t depends too strongly on small perturbations in α. Indeed, as we see, the asymptotics of the non-Poisson processes are different.
Consider the case of involutions with $[\alpha t]$ fixed points and $[t^2/2]$ 2-cycles; the case of signed involutions is analogous. By symmetry, this is the same as the largest increasing subset distribution for the point selection process in the triangle $0 \le y \le x \le 1$ with $[t^2/2]$ generic points and $[\alpha t]$ diagonal points. As was observed in [7, Rem. 2 to Cor. 7.6], it is equivalent to consider weakly increasing subsets where the extra points are added to the line $y = 0$ instead of to the diagonal. As in (3.1), let
\[
\chi_{[t^2/2],[\alpha t]} = \frac{L_{[t^2/2],[\alpha t]} - (\alpha + 1/\alpha)t}{\sqrt{(1/\alpha - 1/\alpha^3)t}}. \tag{9.3}
\]

THEOREM 9.2
As $t \to \infty$, the variable $\chi_{[t^2/2],[\alpha t]}$ converges in distribution and moments to $N(0,1)$.
Proof
Let $S(t)$ be the set of points at time $t$, and let $I$ be a largest increasing subset of $S(t)$. Then there exists some number $0 \le s_+ \le 1$ (not unique) such that
\[
\bigl( S(t) \cap \{ y = 0,\ 0 \le x \le s_+ \} \bigr) \subset I \tag{9.4}
\]
and such that every other point of $I$ has $x > s_+$ and $y > 0$. For any $0 \le s \le 1$, we thus have
\[
f_1(s) + f_2(s) \le |I|, \tag{9.5}
\]
where $f_1(s)$ is the number of points of $S(t)$ with $y = 0$ and $0 \le x \le s$, and where $f_2(s)$ is the largest increasing subset of $S(t)$ lying entirely in the (part-open) trapezoid with $x \ge s$, $y > 0$. Since $f_1(s)$ is binomial with parameters $[\alpha t]$ and $s$, we have the following lemma.

LEMMA 9.3
Let $M > 0$ be sufficiently large and fixed. For all $0 \le s < 1$, there exist positive constants $C$ and $c$ independent of $s$ such that for $w \ge M$,
\[
\Pr\bigl( f_1(s) > \alpha s t + w t^{1/2} \bigr) \le Ce^{-c|w|^2}, \tag{9.6}
\]
while for $w \le -M$,
\[
\Pr\bigl( f_1(s) < \alpha s t + w t^{1/2} \bigr) \le Ce^{-c|w|^2}. \tag{9.7}
\]
For $f_2$ we have the following lemma.

LEMMA 9.4
Let $M > 0$ be sufficiently large and fixed. For $0 \le s < 1$, there exist positive constants $C$ and $c$ independent of $s$ such that for all $w \ge M$,
\[
\Pr\bigl( f_2(s) > 2\sqrt{1-s}\, t + w t^{1/3} \bigr) \le Ce^{-c|w|^{3/2}}, \tag{9.8}
\]
and for all $w \le -M$,
\[
\Pr\bigl( f_2(s) < 2\sqrt{1-s}\, t + w t^{1/3} \bigr) \le Ce^{-c|w|^{3}}. \tag{9.9}
\]
Proof
We first show the corresponding large-deviation result for the Poissonization. Define $f_2'(s,t)$ to be the length of the longest increasing subsequence when the number of points in the trapezoid is Poisson with parameter $t^2(1-s^2)/2$. Then $f_2'(s,t)$ is bounded between the corresponding processes for the rectangle $s \le x \le 1$, $0 \le y \le 1$ and for the triangle $0 \le (x-s)/(1-s) \le y \le 1$. In particular, if $f_2'(s,t)$ deviates significantly from $\sqrt{1-s}\, t$, so must the appropriate bounding process; the result follows immediately from the corresponding results for rectangles and triangles. The corresponding large-deviation result when the number of points is fixed then follows from Lemma 6.2. In our case the number of points in the trapezoid is binomial with parameters $t^2/2$ and $(1-s^2)$; the lemma follows via essentially the same argument used to prove Lemma 6.2.

As we see, the value $s = 1 - \alpha^{-2}$ deserves special attention.

LEMMA 9.5
Let $M > 0$ be sufficiently large and fixed. There exist positive constants $C$ and $c$ such that for $w \ge M$,
\[
\Pr\bigl( f_1(1-\alpha^{-2}) + f_2(1-\alpha^{-2}) > (\alpha + 1/\alpha)t + w t^{1/2} \bigr) \le Ce^{-c \min(|w|^2,\ |w|^{3/2} t^{1/4})}, \tag{9.10}
\]
and for $w \le -M$,
\[
\Pr\bigl( f_1(1-\alpha^{-2}) + f_2(1-\alpha^{-2}) < (\alpha + 1/\alpha)t + w t^{1/2} \bigr) \le Ce^{-c|w|^2}. \tag{9.11}
\]
Moreover, if we define
\[
\chi_0(t) = \frac{f_1(1-\alpha^{-2}) + f_2(1-\alpha^{-2}) - (\alpha + 1/\alpha)t}{\sqrt{(1/\alpha - 1/\alpha^3)t}}, \tag{9.12}
\]
then $\chi_0(t)$ converges to a standard normal distribution, both in distribution and moments.
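The special role of $s = 1 - \alpha^{-2}$ can be seen from the leading-order centerings in Lemmas 9.3 and 9.4: $f_1(s)$ is centered at $\alpha s t$ and $f_2(s)$ at $2\sqrt{1-s}\,t$, so the sum grows like $(\alpha s + 2\sqrt{1-s})\,t$. A minimal numerical sketch (function names are hypothetical, and the grid search merely stands in for the one-line calculus argument) confirms that this rate is maximized at $s = 1 - \alpha^{-2}$ with value $\alpha + 1/\alpha$, and that the binomial variance of $f_1$ at that point matches the normalization $(1/\alpha - 1/\alpha^3)t$ in (9.12).

```python
from math import isclose, sqrt


def mean_rate(s, alpha):
    """Leading-order growth rate of f_1(s) + f_2(s) per unit t: alpha*s + 2*sqrt(1 - s)."""
    return alpha * s + 2 * sqrt(1 - s)


def best_split(alpha, grid=100_000):
    """Grid point s in [0, 1] that maximizes mean_rate(s, alpha)."""
    return max((i / grid for i in range(grid + 1)), key=lambda s: mean_rate(s, alpha))


def f1_variance_rate(s, alpha):
    """Variance per unit t of a Binomial(alpha*t, s) count: alpha * s * (1 - s)."""
    return alpha * s * (1 - s)


alpha = 2.0
s_star = 1 - alpha ** -2
print(best_split(alpha), s_star)
print(mean_rate(s_star, alpha), alpha + 1 / alpha)
print(f1_variance_rate(s_star, alpha), 1 / alpha - 1 / alpha ** 3)
```

Only $f_1$ contributes to the variance at this scale, since the fluctuations of $f_2$ are of order $t^{1/3}$, which is why the normalization in (9.12) involves the binomial variance alone.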
Proof
That $\chi_0(t)$ converges as stated follows from the fact that if we write $\chi_0(t) = \chi_1(t) + \chi_2(t)$, with
\[
\chi_1(t) = \frac{f_1(1-\alpha^{-2}) - (\alpha - 1/\alpha)t}{\sqrt{(1/\alpha - 1/\alpha^3)t}}, \tag{9.13}
\]
\[
\chi_2(t) = \frac{f_2(1-\alpha^{-2}) - (2/\alpha)t}{\sqrt{(1/\alpha - 1/\alpha^3)t}}, \tag{9.14}
\]
then $\chi_1(t)$ converges in distribution and moments to a standard normal distribution, and $\chi_2(t)$ converges in distribution and moments to zero. For the large-deviation bounds, we note that if $x + y > z + w$, then either $x > z$ or $y > w$. Thus for any $0 \le b \le 1$, we have
\[
\begin{aligned}
\Pr\bigl( f_1(1-\alpha^{-2}) + f_2(1-\alpha^{-2}) > (\alpha + 1/\alpha)t + w t^{1/2} \bigr)
&\le \Pr\bigl( f_1(1-\alpha^{-2}) > (\alpha - 1/\alpha)t + b w t^{1/2} \bigr) \\
&\quad + \Pr\bigl( f_2(1-\alpha^{-2}) > (2/\alpha)t + (1-b) w t^{1/2} \bigr) \\
&\le Ce^{-c|bw|^2} + Ce^{-c|(1-b)w|^{3/2} t^{1/4}};
\end{aligned} \tag{9.15}
\]
the result follows by balancing the two terms. In the other case, the $Ce^{-c|w|^2}$-term always dominates.

LEMMA 9.6
For any sufficiently small $\varepsilon > 0$, there exist positive constants $C$ and $c$ such that
\[
\Pr\bigl( s_+ - (1 - 1/\alpha^2) > t^{\varepsilon/3-1/3} \bigr) < Ce^{-ct^{\varepsilon}} \tag{9.16}
\]
and
\[
\Pr\bigl( (1 - 1/\alpha^2) - s_+ > t^{\varepsilon/2-1/2} \bigr) < Ce^{-ct^{\varepsilon}} \tag{9.17}
\]
for all sufficiently large $t$.

Proof
Define a sequence $s_i$ by taking
\[
s_i = 1 - (1 - 2/(i+2))^2/\alpha^2 \tag{9.18}
\]
for all $i \ge 0$. Similarly, define a sequence $s_i'$ by
\[
s_i' = \max(t_i, 0), \tag{9.19}
\]
with
\[
t_i = 1 - \bigl( 1 + 2e^{2^{-1-i}} \bigr)^2/\alpha^2 \tag{9.20}
\]
for $i < 0$ and
\[
t_i = 1 - (1 + 4/(i+1))^2/\alpha^2 \tag{9.21}
\]
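for $i \ge 0$. Note that $s_i$ is strictly decreasing and $t_i$ is strictly increasing.

LEMMA 9.7
For all $i \ge 0$,
\[
\alpha s_i + 2\sqrt{1 - s_{i+1}} < \alpha + 1/\alpha. \tag{9.22}
\]
For all $i$,
\[
\alpha s_{i+1}' + 2\sqrt{1 - s_i'} < \alpha + 1/\alpha. \tag{9.23}
\]

Proof
In the first case, we have
\[
\alpha + 1/\alpha - \bigl( \alpha s_i + 2\sqrt{1 - s_{i+1}} \bigr) = \frac{4}{(i+2)^2 (i+3)\alpha}. \tag{9.24}
\]
In the second case, it suffices to verify the formula with $s'$ replaced by $t$. For $i < -1$,
\[
\alpha + 1/\alpha - \bigl( \alpha t_{i+1} + 2\sqrt{1 - t_i} \bigr) = 4e^{2^{-i}/4}/\alpha. \tag{9.25}
\]
For $i = -1$,
\[
\alpha + 1/\alpha - \bigl( \alpha t_{i+1} + 2\sqrt{1 - t_i} \bigr) = (24 - 4e)/\alpha. \tag{9.26}
\]
Finally, for $i \ge 0$,
\[
\alpha + 1/\alpha - \bigl( \alpha t_{i+1} + 2\sqrt{1 - t_i} \bigr) = \frac{8i}{(i+1)(i+2)^2 \alpha}. \tag{9.27}
\]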
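The closed forms (9.24)–(9.27) are pure algebra in the sequences $s_i$ and $t_i$, so they can be spot-checked in floating point. The sketch below does this for a few values of $\alpha$; the function names and the index ranges are choices made here for illustration (in particular, very negative $i$ is avoided only because $e^{2^{-1-i}}$ overflows double precision).

```python
from math import e, exp, isclose, sqrt


def s(i, alpha):
    """s_i from (9.18), defined for i >= 0."""
    return 1 - (1 - 2 / (i + 2)) ** 2 / alpha ** 2


def t(i, alpha):
    """t_i from (9.20) for i < 0 and from (9.21) for i >= 0."""
    if i < 0:
        return 1 - (1 + 2 * exp(2.0 ** (-1 - i))) ** 2 / alpha ** 2
    return 1 - (1 + 4 / (i + 1)) ** 2 / alpha ** 2


def gap_s(i, alpha):
    """Left-hand side of (9.24)."""
    return alpha + 1 / alpha - (alpha * s(i, alpha) + 2 * sqrt(1 - s(i + 1, alpha)))


def gap_t(i, alpha):
    """Left-hand side of (9.25)-(9.27)."""
    return alpha + 1 / alpha - (alpha * t(i + 1, alpha) + 2 * sqrt(1 - t(i, alpha)))


def closed_form_t(i, alpha):
    """Right-hand sides of (9.25), (9.26), and (9.27), selected by the range of i."""
    if i < -1:
        return 4 * exp(2.0 ** (-i) / 4) / alpha
    if i == -1:
        return (24 - 4 * e) / alpha
    return 8 * i / ((i + 1) * (i + 2) ** 2 * alpha)


def identities_hold(alpha, i_min=-5, i_max=50):
    ok_s = all(
        isclose(gap_s(i, alpha), 4 / ((i + 2) ** 2 * (i + 3) * alpha), rel_tol=1e-6, abs_tol=1e-9)
        for i in range(0, i_max + 1)
    )
    ok_t = all(
        isclose(gap_t(i, alpha), closed_form_t(i, alpha), rel_tol=1e-6, abs_tol=1e-9)
        for i in range(i_min, i_max + 1)
    )
    return ok_s and ok_t


print(identities_hold(1.5), identities_hold(3.0))
```

One detail the check makes visible: the right-hand side of (9.27) vanishes at $i = 0$, so the gap in that single case is zero rather than strictly positive.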
Let $i_1 = t^{1/6-\varepsilon/6}$, $i_2 = t^{1/4-\varepsilon/4}$. Then there exist constants $C$ and $c$ such that for $0 \le i \le i_1$,
\[
\Pr\bigl( f_1(s_i) + f_2(s_{i+1}) > f_1(1-\alpha^{-2}) + f_2(1-\alpha^{-2}) \bigr) < Ce^{-ct^{\varepsilon}}. \tag{9.28}
\]
Since $f_1(s_i) + f_2(s_{i+1})$ is an upper bound on $f_1(s) + f_2(s)$ with $s_{i+1} \le s \le s_i$, it follows that
\[
\Pr\bigl( s_+ \in [s_{i+1}, s_i] \bigr) < Ce^{-ct^{\varepsilon}}. \tag{9.29}
\]
Similarly, for $i \le i_2$,
\[
\Pr\bigl( f_1(s_{i+1}') + f_2(s_i') > f_1(1-\alpha^{-2}) + f_2(1-\alpha^{-2}) \bigr) < Ce^{-ct^{\varepsilon}}, \tag{9.30}
\]
and thus
\[
\Pr\bigl( s_+ \in [s_i', s_{i+1}'] \bigr) < Ce^{-ct^{\varepsilon}}. \tag{9.31}
\]
Since there are only $i_1 + i_2 + \log\log\alpha$ such events to consider, Lemma 9.6 is proved.
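In particular, with probability $1 - Ce^{-ct^{\varepsilon}}$, we have
\[
f_1(1 - \alpha^{-2} - t^{\varepsilon/2-1/2}) + f_2(1 - \alpha^{-2} + t^{\varepsilon/3-1/3}) \le L(t) \le f_1(1 - \alpha^{-2} + t^{\varepsilon/3-1/3}) + f_2(1 - \alpha^{-2} - t^{\varepsilon/2-1/2}). \tag{9.32}
\]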
But then, using the fact that f 1 (1 − α −2 ) − f 1 (1 − α −2 − t /2−1/2 )
(9.33)
f 1 (1 − α −2 − t /2−1/2 ) − f 1 (1 − α −2 )
(9.34)
and are Poisson and using the large-deviation behavior of f 2 (s), we find that 0 Pr L(t) − f 1 (1 − α −2 − t /2−1/2 ) + f 2 (1 − α −2 + t /3−1/3 ) ≥ t 1/2− ≤ Ce−ct (9.35) and 0 Pr f 1 (1 − α −2 − t /3−1/3 ) + f 2 (1 − α −2 + t /2−1/2 ) − L(t) ≥ t 1/2− ≤ Ce−ct . (9.36) So χ(t)−χ0 (t) converges to zero in a fairly strong sense; in particular, χ(t) and χ 0 (t) must have the same limiting distribution and limiting moments. Thus Theorem 9.2 is proved. Remark. The above proof could be applied equally well to the Poisson process; the beta distribution would then be replaced by a Poisson distribution. For the signed involution case, with [2αt] fixed points, [2βt] negated points, and [2t 2 ] 2-cycles, again we let χ[t 2 ],[2αt],[2βt] =
L [t 2 /2],[αt],[βt] − 2(α + 1/α)t p . 2(1/α − 1/α 3 )t
(9.37)
Then the analogous argument proves that χ 0 (t) also converges in moments and distribution to a standard normal. 10. Steepest descent–type analysis for Riemann-Hilbert problems In this section, we prove the asymptotics of orthogonal polynomials results stated in Section 5 by applying the steepest descent–type method to RHP (5.4). The steepest descent method for RHP’s, the Deift-Zhou method, was introduced by Deift and X. Zhou in [18], developed further in [19] and [16], and finally placed in a systematic
form by Deift, S. Venakides, and Zhou in [17]. The steepest descent analysis of RHP (5.4) was first conducted in [3]. The analysis of [3] has many similarities with [13], [14], and [15] where the asymptotics of orthogonal polynomials on the real line with respect to a general weight is obtained, leading to a proof of universality conjectures in random matrix theory. As mentioned in the introduction and Section 5, in this section we extend the analysis of [3] and obtain new estimates on the orthogonal polynomials πk (z; t). The extension is done roughly in two categories. In [3], the quantity of interest was Y21 (0; k; t), and so the z-dependence of the error bound of Y (z; k; t) was not considered carefully. But in the present paper we need the asymptotics of Y (z) for general z ∈ C and also for the case when z → −1 as k, t → ∞. Hence the first category of our extension is to investigate how the error estimate depends on z. This task sometimes requires improved estimates of the solution Y (z) (see, e.g., (10.42), where an improved L 1 -norm bound of the jump matrix is needed). On the other hand, as we see, the asymptotic solution Y (z) is expressed in terms of the so-called g-function. Thus we need detailed analysis of the g-function to obtain the asymptotics of the orthogonal polynomials. In the special case z = 0, we have g(0) = πi (see [3, Lem. 4.2]). Hence in [3], the analysis of the g-function was quite simple. But in the present paper, we need general values of g(z), and this in some cases requires further analysis. Hence the analysis of the g-function is the second category of our extension (e.g., see (10.109)–(10.121), where we need a further analysis of the g-function). Again the analysis in this section relies heavily on the analysis of [3] and we extend the method of [3]. For continuity of presentation and also for the convenience of readers, we nevertheless include some calculations that overlap [3]. 
When the analysis overlaps that of [3], we only sketch the method, and instead we focus on new features to indicate how to prove the propositions in Section 5. We say that an RHP is normalized at ∞ if the solution m satisfies the condition m → I as z → ∞. Thus, for instance, RHP’s (2.15) and (10.1) are normalized at ∞, while RHP (5.4) is not. In [3] it turned out that the asymptotic analysis differs critically when (2t)/k ≤ 1 and (2t)/k > 1, due to the difference of (the support of) the associated equilibrium measure (see [3, Lem. 4.3]). Hence we discuss those two cases separately in Sections 10.1 and 10.2, which extend [3, Secs. 5 and 6], respectively. Each section is also divided into three subcases. In each subcase the corresponding case of the propositions in Section 5 (except Proposition 5.8) is proved. Section 10.3 is new, and Proposition 5.8 is proved there.
10.1. When $(2t)/k \le 1$
The following algebraic transformations (10.1)–(10.10) of RHP’s are taken from [3, (5.1)–(5.3)]. Define
\[
\begin{aligned}
m^{(1)}(z;k;t) &:= Y(z;k;t) \begin{pmatrix} (-1)^k e^{tz} & 0 \\ 0 & (-1)^k e^{-tz} \end{pmatrix}, & |z| < 1, \\
m^{(1)}(z;k;t) &:= Y(z;k;t) \begin{pmatrix} z^{-k} e^{tz^{-1}} & 0 \\ 0 & z^{k} e^{-tz^{-1}} \end{pmatrix}, & |z| > 1.
\end{aligned} \tag{10.1}
\]
Then $m^{(1)}$ solves a new RHP that is equivalent to RHP (5.4) in the sense that a solution of one RHP yields algebraically a solution of the other RHP, and vice versa:
\[
\begin{cases}
m^{(1)}(z;k;t) \text{ is analytic in } \mathbb{C} \setminus \Sigma, \\[2pt]
m^{(1)}_+(z;k;t) = m^{(1)}_-(z;k;t) \begin{pmatrix} (-1)^k z^k e^{t(z-z^{-1})} & (-1)^k \\ 0 & (-1)^k z^{-k} e^{-t(z-z^{-1})} \end{pmatrix} \text{ on } \Sigma, \\[2pt]
m^{(1)}(z;k;t) = I + O(1/z) \text{ as } z \to \infty,
\end{cases} \tag{10.2}
\]
where $\Sigma$ is the unit circle oriented counterclockwise as before. Here and in the sequel, $m_+(z)$ (resp., $m_-(z)$) is understood as the limit from the left-hand (resp., right-hand) side of the contour as one goes along the orientation of the contour. Now we define $m^{(2)}(z;k;t)$ in terms of $m^{(1)}(z;k;t)$ as follows: for even $k$,
\[
m^{(2)} \equiv m^{(1)}, \quad |z| > 1, \qquad m^{(2)} \equiv m^{(1)} \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \quad |z| < 1;
\]
for odd $k$,
\[
m^{(2)} \equiv \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} m^{(1)} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \quad |z| > 1, \qquad
m^{(2)} \equiv \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} m^{(1)} \begin{pmatrix} 0 & -1 \\ -1 & 0 \end{pmatrix}, \quad |z| < 1. \tag{10.3}
\]
Then $m^{(2)}(\cdot;k;t)$ solves another RHP
\[
\begin{cases}
m^{(2)}_+(z;k;t) = m^{(2)}_-(z;k;t)\, v^{(2)}(z;k;t) \text{ on } \Sigma, \\[2pt]
m^{(2)}(z;k;t) = I + O(1/z) \text{ as } z \to \infty,
\end{cases} \tag{10.4}
\]
where
\[
v^{(2)}(z;k;t) = \begin{pmatrix} 1 & -(-1)^k z^k e^{t(z-z^{-1})} \\ (-1)^k z^{-k} e^{-t(z-z^{-1})} & 0 \end{pmatrix}. \tag{10.5}
\]
The jump matrix has the following factorization:
\[
v^{(2)} = \begin{pmatrix} 1 & 0 \\ (-1)^k z^{-k} e^{-t(z-z^{-1})} & 1 \end{pmatrix} \begin{pmatrix} 1 & -(-1)^k z^k e^{t(z-z^{-1})} \\ 0 & 1 \end{pmatrix} =: (b_-^{(2)})^{-1} b_+^{(2)}. \tag{10.6}
\]
We note that through the changes $Y \to m^{(1)} \to m^{(2)}$, we have
\[
Y_{11}(z;k;t) = -(-1)^k e^{-tz} m^{(2)}_{12}(z;k;t), \qquad |z| < 1, \tag{10.7}
\]
\[
Y_{21}(z;k;t) = -e^{-tz} m^{(2)}_{22}(z;k;t), \qquad |z| < 1, \tag{10.8}
\]
\[
Y_{11}(z;k;t) = z^k e^{-tz^{-1}} m^{(2)}_{11}(z;k;t), \qquad |z| > 1, \tag{10.9}
\]
\[
Y_{21}(z;k;t) = (-z)^k e^{-tz^{-1}} m^{(2)}_{21}(z;k;t), \qquad |z| > 1. \tag{10.10}
\]
As in [3, (5.4)], the absolute value of the (12)-entry of the jump matrix $v^{(2)}$ is $e^{kF(\rho,\theta;2t/k)}$ where
\[
F(z;\gamma) = F(\rho e^{i\theta};\gamma) := \frac{\gamma}{2}(\rho - \rho^{-1})\cos\theta + \log\rho, \qquad z = \rho e^{i\theta}. \tag{10.11}
\]
The absolute value of the (21)-entry of $v^{(2)}$ is $e^{-kF(\rho e^{i\theta};2t/k)}$. Note that
\[
F(\rho,\theta;\gamma) = -F(\rho^{-1},\theta;\gamma). \tag{10.12}
\]
Figure 3 shows the curves $F(z;\gamma) = 0$. In $\Omega_1^{\mathrm{sig}} \cup \Omega_3^{\mathrm{sig}}$, $F > 0$, and in $\Omega_2^{\mathrm{sig}} \cup \Omega_4^{\mathrm{sig}}$, $F < 0$. The region $\Omega_2^{\mathrm{sig}}$ becomes smaller as $\gamma$ increases, and when $\gamma = 1$, the curve $F(z;\gamma) = 0$ contacts the unit circle $\Sigma$ at $z = -1$ with the angle $\pi/3$. We distinguish three subcases, as in [3, Sec. 5].

10.1.1. The case $0 \le 2t \le ak$ for some $0 < a < 1$
It is possible to fix $\rho_a < 1$ such that the circle $\{z : |z| = \rho_a\}$ is in the region $\Omega_2^{\mathrm{sig}}$ for all such $t$ and $k$. Define $m^{(3)}(z;k;t)$ by (see [3, (5.9)])
\[
\begin{cases}
m^{(3)} = m^{(2)} (b_+^{(2)})^{-1}, & \rho_a < |z| < 1, \\
m^{(3)} = m^{(2)} (b_-^{(2)})^{-1}, & 1 < |z| < \rho_a^{-1}, \\
m^{(3)} = m^{(2)}, & |z| < \rho_a,\ |z| > \rho_a^{-1}.
\end{cases} \tag{10.13}
\]
Then $m^{(3)}$ satisfies a new jump condition $m^{(3)}_+ = m^{(3)}_- v^{(3)}$ on $\Sigma^{(3)} := \{z : |z| = \rho_a, \rho_a^{-1}\}$, where $v^{(3)} = b_+^{(2)}$, $|z| = \rho_a$ and $v^{(3)} = (b_-^{(2)})^{-1}$, $|z| = \rho_a^{-1}$. This $\Sigma^{(3)}$ is not the best choice (see Section 10.1.2). But for a simple and direct estimate, we use this choice in this section. From the choice of $\rho_a$, we have (see [3, (5.13)–(5.14)])
\[
|v^{(3)}(z;k;t) - I| \le e^{-ck} \quad \text{for all } z \in \Sigma^{(3)}, \tag{10.14}
\]
Figure 3. Curves of $F(z;\gamma) = 0$ when $0 < \gamma < 1$
which implies that $I - C_{w^{(3)}}$ is invertible and the norm of the inverse is uniformly bounded, where $w^{(3)} := v^{(3)} - I$ and $C_{w^{(3)}}(f) := C_-(f w^{(3)})$ on $L^2(\Sigma^{(3)}, |dz|)$, $C_\pm$ being Cauchy operators (see [3, (2.5)–(2.9) and references therein]). From the general theory of RHP’s, we have
\[
m^{(3)}(z) = I + \frac{1}{2\pi i} \int_{\Sigma^{(3)}} \frac{\bigl( (I - C_{w^{(3)}})^{-1} I \bigr)(s)\, w^{(3)}(s)}{s - z} \, ds, \qquad z \notin \Sigma^{(3)}. \tag{10.15}
\]
This implies the estimates (see [3, (5.16)])
\[
|m^{(3)}_{22}(0;k;t) - 1|,\quad |m^{(3)}_{12}(0;k;t)| \le Ce^{-ck}, \tag{10.16}
\]
which, using (10.13), (10.7), (10.8) and (5.5), (5.6), yield Proposition 5.1(i). This is precisely the result contained in [3, (5.17)]. From (10.13), (10.7), (10.9), and (5.6), we have
\[
\pi_k(z;t) = -(-1)^k e^{-tz} m^{(3)}_{12}(z;k;t), \qquad |z| < \rho_a, \tag{10.17}
\]
\[
\pi_k(z;t) = z^k e^{-tz^{-1}} m^{(3)}_{11}(z;k;t), \qquad |z| > \rho_a^{-1}, \tag{10.18}
\]
\[
\pi_k(z;t) = z^k e^{-tz^{-1}} m^{(3)}_{11}(z;k;t) - (-1)^k e^{-tz} m^{(3)}_{12}(z;k;t), \qquad \rho_a < |z| < \rho_a^{-1}. \tag{10.19}
\]
Let $0 < b < 1$ be a fixed number. From Figure 3, we could have chosen $\rho_a$ such that $\rho_a > b$. When $|z| \le b$ and $|z| \ge b^{-1}$, we have $\operatorname{dist}(z, \Sigma^{(3)}) \ge c > 0$. Since $z$ is uniformly bounded away from the contour, we can extend the argument leading to [3, (5.17)] where the uniform boundedness of zero from the contour is used. Hence (10.15), (10.17), and (10.18) imply that
\[
|e^{tz} \pi_k(z;t)| \le Ce^{-ck}, \qquad |z| \le b, \tag{10.20}
\]
\[
|e^{tz^{-1}} z^{-k} \pi_k(z;t)| \le Ce^{-ck}, \qquad |z| \ge b^{-1}. \tag{10.21}
\]
These are (5.29) and (5.30) in Proposition 5.6 for the special case $x \ge 2^{1/3}(1-a)k^{2/3}$. On the other hand, let $L > 0$ be a fixed number. Set $\alpha = 1 - 2^{4/3} k^{-1/3} w$ with $-L \le w \le L$, as in Proposition 5.6. Since $\rho_a$ is fixed, when $k$ is large, $\operatorname{dist}(-\alpha, \Sigma^{(3)}) \ge c > 0$. Then from (10.15),
\[
|m^{(3)}_{11}(-\alpha;k;t) - 1|,\quad |m^{(3)}_{12}(-\alpha;k;t)| \le Ce^{-ck}. \tag{10.22}
\]
Note that
\[
\frac{1}{2}(s - s^{-1}) \le s - 1, \qquad s > 0, \tag{10.23}
\]
\[
-\frac{1}{2}(s - s^{-1}) + \log s \le \frac{2}{3}(1-s)^3, \qquad \frac{1}{2} \le s \le 1, \tag{10.24}
\]
\[
-\frac{1}{2}(s - s^{-1}) + \log s \le 0, \qquad s \ge 1. \tag{10.25}
\]
Thus for $\gamma \le 1$, $s \ge 1/2$,
\[
F(-s;\gamma) = -\frac{\gamma}{2}(s - s^{-1}) + \log s = \frac{1-\gamma}{2}(s - s^{-1}) - \frac{1}{2}(s - s^{-1}) + \log s \le (1-\gamma)(s-1) + \frac{2}{3}|s-1|^3. \tag{10.26}
\]
For large $k$, $\alpha \ge 1/2$ for all $-L \le w \le L$, and hence
\[
\bigl| (-\alpha)^k e^{-t(\alpha-\alpha^{-1})} \bigr| = e^{kF(-\alpha;2t/k)} \le e^{(32/2)|w|^3 - k(1-2t/k)(2^{4/3}w)/k^{1/3}} \le e^{(32/2)L^3} e^{2^{4/3}Lk^{2/3}} = Ce^{ck^{2/3}}. \tag{10.27}
\]
Similarly, since $\alpha^{-1} \ge 1/2$ for large $k$,
\[
\bigl| (-\alpha)^{-k} e^{t(\alpha-\alpha^{-1})} \bigr| = e^{kF(-\alpha^{-1};2t/k)} \le e^{k(1-2t/k)(2^{4/3}w)/(k^{1/3}\alpha) + (32/2)|w|^3\alpha^{-3}} \le Ce^{ck^{2/3}}. \tag{10.28}
\]
Therefore, from (10.19) and (10.22),
\[
\bigl| e^{-t\alpha} \pi_k(-\alpha;t) \bigr| = \bigl| (-\alpha)^k e^{-t(\alpha-\alpha^{-1})} m^{(3)}_{11}(-\alpha;k;t) - (-1)^k m^{(3)}_{12}(-\alpha;k;t) \bigr| \le Ce^{ck^{2/3}}, \tag{10.29}
\]
\[
\bigl| e^{-t\alpha^{-1}} (-\alpha)^{-k} \pi_k(-\alpha;t) - 1 \bigr| = \bigl| m^{(3)}_{11}(-\alpha;k;t) - 1 - \alpha^{-k} e^{t(\alpha-\alpha^{-1})} m^{(3)}_{12}(-\alpha;k;t) \bigr| \le Ce^{-ck}. \tag{10.30}
\]
Noting $x \sim k^{2/3}$, these are (5.31) and (5.32) in Proposition 5.6 for the special case $x \ge 2^{1/3}(1-a)k^{2/3}$. Thus we have extended the argument of [3, (5.17)] to the case when $z \to -1$. This is an example of the extension of the first category mentioned at the beginning of Section 10 (though it is straightforward to extend in this case).
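The estimates of this subsection rest on a few elementary properties of $F$: the identification of $kF(\rho,\theta;2t/k)$ with $\log|z^k e^{t(z-z^{-1})}|$, the antisymmetry (10.12), and the scalar bound (10.26) on the negative real axis. A small numerical sketch of these three facts follows; the function names and the sample grids are hypothetical choices for illustration only.

```python
import cmath
from math import cos, isclose, log, pi


def F(rho, theta, gamma):
    """F(rho e^{i theta}; gamma) from (10.11)."""
    return 0.5 * gamma * (rho - 1 / rho) * cos(theta) + log(rho)


def log_abs_entry(z, k, t):
    """log |z^k exp(t (z - 1/z))|, computed directly from the complex number z."""
    return k * log(abs(z)) + t * (z - 1 / z).real


def modulus_identity_holds(k=40, t=12.5):
    """Check k F(rho, theta; 2t/k) = log |z^k e^{t(z - 1/z)}| on a small polar grid."""
    for rho in (0.3, 0.8, 1.0, 1.7):
        for j in range(12):
            theta = 2 * pi * j / 12
            z = cmath.rect(rho, theta)
            if not isclose(k * F(rho, theta, 2 * t / k), log_abs_entry(z, k, t),
                           rel_tol=1e-9, abs_tol=1e-9):
                return False
    return True


def antisymmetry_holds(gamma=0.7):
    """Check (10.12): F(rho, theta; gamma) = -F(1/rho, theta; gamma)."""
    return all(
        isclose(F(rho, th, gamma), -F(1 / rho, th, gamma), rel_tol=1e-9, abs_tol=1e-12)
        for rho in (0.2, 0.5, 0.9, 1.4) for th in (0.0, 1.0, pi, 4.0)
    )


def bound_10_26_holds(steps=400):
    """Check F(-s; gamma) <= (1 - gamma)(s - 1) + (2/3)|s - 1|^3 for gamma <= 1, s >= 1/2."""
    for g in range(11):
        gamma = g / 10
        for j in range(steps + 1):
            s = 0.5 + 3.5 * j / steps
            lhs = F(s, pi, gamma)  # z = -s corresponds to rho = s, theta = pi
            rhs = (1 - gamma) * (s - 1) + (2 / 3) * abs(s - 1) ** 3
            if lhs > rhs + 1e-12:
                return False
    return True


print(modulus_identity_holds(), antisymmetry_holds(), bound_10_26_holds())
```

The bound (10.26) is what converts the choice $\alpha = 1 - 2^{4/3}k^{-1/3}w$ into the $e^{ck^{2/3}}$ growth in (10.27): the cubic term contributes a constant, and the linear term contributes the $k^{2/3}$ exponent.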
10.1.2. The case $ak \le 2t \le k - M2^{-1/3}k^{1/3}$ for some $0 < a < 1$ and $M > M_0$
In Section 10.1.1, the contour $\Sigma^{(3)}$ was not the best choice. We could have chosen the steepest descent curve for $F(z;\gamma)$. For the previous case, it was not necessary to use the steepest descent curve to obtain the desired results, but for the case at hand and in the future calculations, we need to use the steepest descent curve. For fixed $\theta$ satisfying $0 \le \theta < \pi/2$ or $(3\pi)/2 < \theta < 2\pi$, $F(\rho,\theta;\gamma)$ is always negative for $0 < \rho < 1$, and as $\rho \downarrow 0$, it decreases to minus infinity. On the other hand, one can check that (see [3, (5.5)]) when $\gamma \le 1$, the minimum of $F(\rho,\theta;\gamma)$, $0 < \rho \le 1$, is attained, for fixed $\pi/2 \le \theta \le (3\pi)/2$, at
\[
\rho = \rho_\theta := \frac{1 - \sqrt{1 - \gamma^2\cos^2\theta}}{-\gamma\cos\theta}, \tag{10.31}
\]
and $F(\rho_\theta,\theta;\gamma) < 0$. And it is straightforward to check that for $0 \le \gamma \le 1$, $\pi/2 \le \theta \le 3\pi/2$,
\[
F(\rho_\theta,\theta;\gamma) = \sqrt{1 - \gamma^2\cos^2\theta} + \log\frac{1 - \sqrt{1 - \gamma^2\cos^2\theta}}{-\gamma\cos\theta} \le -\frac{2\sqrt{2}}{3}(1 + \gamma\cos\theta)^{3/2}. \tag{10.32}
\]
This is an extension of [3, (5.13)], where only the case when $\theta = \pi$ was considered. We need this improved version to obtain a better $L^1$-estimate of $v^{(3)} - I$ in the sequel. Also, $F(\rho_\theta,\theta;\gamma)$ is increasing in $\pi/2 \le \theta \le \pi$ and is decreasing in $\pi \le \theta \le 3\pi/2$. In fact, the saddle points for $(\gamma/2)(z - z^{-1}) + \log z$ are $z = -\rho_\pi$ and $z = -\rho_\pi^{-1}$. This time, define $\Sigma^{(3)} := \Sigma^{(3)}_{\mathrm{in}} \cup \Sigma^{(3)}_{\mathrm{out}}$, as in [3, (5.6)], by
\[
\begin{aligned}
\Sigma^{(3)}_{\mathrm{in}} &= \{\rho_\theta e^{i\theta} : 3\pi/4 \le \theta \le 5\pi/4\} \cup \{\rho_{3\pi/4} e^{i\theta} : 0 \le \theta \le 3\pi/4,\ 5\pi/4 \le \theta < 2\pi\}, \\
\Sigma^{(3)}_{\mathrm{out}} &= \{\rho_\theta^{-1} e^{i\theta} : 3\pi/4 \le \theta \le 5\pi/4\} \cup \{\rho_{3\pi/4}^{-1} e^{i\theta} : 0 \le \theta \le 3\pi/4,\ 5\pi/4 \le \theta < 2\pi\},
\end{aligned} \tag{10.33}
\]
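Figure 4. $\Sigma^{(3)}$ and $\Omega^{(3)}$ when $\gamma < 1$ (contours $\Sigma^{(3)}_{\mathrm{in}}$, $\Sigma$, $\Sigma^{(3)}_{\mathrm{out}}$ and regions $\Omega^{(3)}_1,\dots,\Omega^{(3)}_4$, with the points $-1$ and $0$ marked)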
where $\rho_\theta$ is defined in (10.31) with $\gamma = (2t)/k$. Orient $\Sigma^{(3)}$ as in Figure 4. Note that $\Sigma^{(3)}$ lies in $\Omega^{\mathrm{sig}}_2$, and for $3\pi/4 \le \theta \le 5\pi/4$, it is the steepest descent curve. The reason why we choose a part of the circle as the contour for the remaining angles is to ensure the uniform boundedness of the Cauchy operators (see [3, Sec. 5]). This does not affect the asymptotics since, as we see, the main contribution to the asymptotics comes from the neighborhood of $z = -1$.
Define the regions $\Omega^{(3)}_j$, $j = 1,\dots,4$, as in Figure 4. Define $m^{(3)}(z;k;t)$, as in [3, (5.9)], by
\[ \begin{cases} m^{(3)} = m^{(2)}\,(b^{(2)}_+)^{-1} & \text{in } \Omega^{(3)}_2, \\ m^{(3)} = m^{(2)}\,(b^{(2)}_-)^{-1} & \text{in } \Omega^{(3)}_3, \\ m^{(3)} = m^{(2)} & \text{in } \Omega^{(3)}_1,\ \Omega^{(3)}_4, \end{cases} \tag{10.34} \]
where $b^{(2)}_\pm$ are defined in (10.6). Then $m^{(3)}$ solves a new RHP with the jump matrix $v^{(3)}(z;k;t)$ where
\[ \begin{aligned} v^{(3)} &= \begin{pmatrix} 1 & -(-1)^k z^k e^{t(z-z^{-1})} \\ 0 & 1 \end{pmatrix} && \text{on } \Sigma^{(3)}_{\mathrm{in}}, \\ v^{(3)} &= \begin{pmatrix} 1 & 0 \\ (-1)^k z^{-k} e^{-t(z-z^{-1})} & 1 \end{pmatrix} && \text{on } \Sigma^{(3)}_{\mathrm{out}}. \end{aligned} \tag{10.35} \]
Set $w^{(3)} := v^{(3)} - I$. For $z \in \Sigma^{(3)}_{\mathrm{in}}$, from the choice of $\Sigma^{(3)}$ and (10.32), the (12)-entry of the jump matrix satisfies for $3\pi/4 \le \arg z \le 5\pi/4$,
\[ \bigl|z^k e^{t(z-z^{-1})}\bigr| = e^{kF(\rho_\theta,\theta;2t/k)} \le e^{-(2\sqrt{2}/3)k(1+(2t/k)\cos\theta)^{3/2}} \le e^{-(2\sqrt{2}/3)k(1-2t/k)^{3/2}} \le e^{-(2/3)M^{3/2}}, \tag{10.36} \]
and for $0 \le \arg z \le 3\pi/4$ or $5\pi/4 \le \arg z < 2\pi$,
\[ \bigl|z^k e^{t(z-z^{-1})}\bigr| = e^{kF(\rho_{3\pi/4},\theta;2t/k)} \le e^{kF(\rho_{3\pi/4},3\pi/4;2t/k)} \le e^{-(2\sqrt{2}/3)k(1+(2t/k)\cos(3\pi/4))^{3/2}} \le e^{-(1/24)k}. \tag{10.37} \]
From (10.12), similar estimates hold for $z^{-k} e^{-t(z-z^{-1})}$, $z \in \Sigma^{(3)}_{\mathrm{out}}$. Thus we have
\[ \|w^{(3)}\|_{L^\infty(\Sigma^{(3)})} \le Ce^{-(2\sqrt{2}/3)k(1-2t/k)^{3/2}}. \tag{10.38} \]
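The bound (10.32), which drives the estimate (10.36), can be examined numerically on a grid. This is a rough sketch under the assumption that $F(\rho,\theta;\gamma) = (\gamma/2)(\rho - \rho^{-1})\cos\theta + \log\rho$, that is, the real part of $(\gamma/2)(z - z^{-1}) + \log z$ at $z = \rho e^{i\theta}$ (definition (10.11) is not reproduced in this section); the function names are illustrative.

```python
import math

def rho_theta(theta, gamma):
    """rho_theta from (10.31); needs cos(theta) < 0 and 0 < gamma <= 1."""
    c = -gamma * math.cos(theta)
    return (1.0 - math.sqrt(max(0.0, 1.0 - c * c))) / c

def F(rho, theta, gamma):
    """Assumed form of F: (gamma/2)(rho - 1/rho) cos(theta) + log(rho)."""
    return 0.5 * gamma * (rho - 1.0 / rho) * math.cos(theta) + math.log(rho)

def max_violation(n_theta=200, n_gamma=20):
    """Largest value on the grid of F(rho_theta, theta; gamma) minus the bound in (10.32)."""
    worst = -math.inf
    for j in range(1, n_gamma + 1):
        gamma = j / n_gamma
        for i in range(1, n_theta):            # theta strictly inside (pi/2, 3pi/2)
            theta = math.pi / 2 + math.pi * i / n_theta
            lhs = F(rho_theta(theta, gamma), theta, gamma)
            rhs = -(2.0 * math.sqrt(2.0) / 3.0) * max(0.0, 1.0 + gamma * math.cos(theta)) ** 1.5
            worst = max(worst, lhs - rhs)
    return worst
```

A non-positive return value (up to rounding) is consistent with (10.32); equality is approached at $\gamma = 1$, $\theta = \pi$.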
Also, there exists $M_0$ such that for $M > M_0$, $\|C_{w^{(3)}}\|_{L^2(\Sigma^{(3)}) \to L^2(\Sigma^{(3)})} \le c_1 < 1$, and hence (10.15) holds. This is precisely [3, (5.18)]. For this derivation we do not need the extension (10.32) of [3, (5.13)]. But for the improved $L^1$-norm estimate of $w^{(3)}$, which we do now, we need (10.32). Note that $|dz| \le C|d\theta|$ on $\Sigma^{(3)}$. Using the estimates in (10.36) and (10.37),
\[ \int_{\Sigma^{(3)}_{\mathrm{in}}} \bigl|z^k e^{t(z-z^{-1})}\bigr|\,|dz| \le C\int_{3\pi/4}^{5\pi/4} e^{-(2\sqrt{2}/3)k(1+(2t/k)\cos\theta)^{3/2}}\,d\theta + C\int_{[0,2\pi)\setminus[3\pi/4,5\pi/4]} e^{-(1/24)k}\,d\theta. \tag{10.39} \]
The second integral is clearly less than $Ce^{-(1/24)k}$. For the first integral, recall the inequality $(x+y)^a \ge x^a + y^a$, $x, y > 0$, $a \ge 1$. Then using the inequality $1 + \cos\theta \ge (1/(2\sqrt{2}))(\theta - \pi)^2$ for $\theta \in [3\pi/4, 5\pi/4]$, together with the condition $ak \le 2t$, the first integral is less than or equal to
\[ \int_{3\pi/4}^{5\pi/4} e^{-(2\sqrt{2}/3)k[(1-2t/k)^{3/2} + ((2t/k)(1+\cos\theta))^{3/2}]}\,d\theta \le e^{-(2\sqrt{2}/3)k(1-2t/k)^{3/2}} \int_{3\pi/4}^{5\pi/4} e^{-(a^{3/2}/(3\cdot 2^{3/4}))k|\theta-\pi|^3}\,d\theta, \tag{10.40} \]
where the last integral is less than or equal to $Ck^{-1/3}$ for some constant $C > 0$. Therefore, adjusting constants, we obtain
\[ \int_{\Sigma^{(3)}_{\mathrm{in}}} \bigl|z^k e^{t(z-z^{-1})}\bigr|\,|dz| \le \frac{C}{k^{1/3}}\, e^{-(2\sqrt{2}/3)k(1-2t/k)^{3/2}}. \tag{10.41} \]
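The three ingredients behind (10.40) and (10.41) can be tested numerically: the quadratic lower bound for $1 + \cos\theta$, the constant $a^{3/2}/(3\cdot 2^{3/4})$, and the $k^{-1/3}$ decay of the remaining integral. The sketch below is illustrative only; the sample value $a = 0.5$ used in the test and all function names are assumptions.

```python
import math

def cos_gap_min(n=1000):
    """Minimum over [3pi/4, 5pi/4] of (1 + cos t) - (t - pi)^2 / (2 sqrt 2)."""
    lo, hi = 0.75 * math.pi, 1.25 * math.pi
    best = math.inf
    for i in range(n + 1):
        t = lo + (hi - lo) * i / n
        best = min(best, (1.0 + math.cos(t)) - (t - math.pi) ** 2 / (2.0 * math.sqrt(2.0)))
    return best

def exponent_constants(a):
    """(2 sqrt2 / 3) * (a / (2 sqrt2))^{3/2} and the closed form a^{3/2} / (3 * 2^{3/4})."""
    lhs = (2.0 * math.sqrt(2.0) / 3.0) * (a / (2.0 * math.sqrt(2.0))) ** 1.5
    rhs = a ** 1.5 / (3.0 * 2.0 ** 0.75)
    return lhs, rhs

def cubic_exponential_integral(c, k, n=20000):
    """Midpoint rule for the integral of exp(-c k |phi|^3) over |phi| <= pi/4."""
    half = math.pi / 4
    step = 2.0 * half / n
    return step * sum(math.exp(-c * k * abs(-half + (i + 0.5) * step) ** 3) for i in range(n))
```

For large $k$ the integral behaves like $2\Gamma(4/3)(ck)^{-1/3}$, which is the source of the factor $k^{-1/3}$ in (10.41).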
We have similar estimates on $\Sigma^{(3)}_{\mathrm{out}}$. Therefore
\[ \|w^{(3)}\|_{L^1(\Sigma^{(3)})} \le \frac{C}{k^{1/3}}\, e^{-(2\sqrt{2}/3)k(1-2t/k)^{3/2}}. \tag{10.42} \]
The estimate (10.42) is a refinement of [3, (5.23)]. Now from (10.15), we have
\[ m^{(3)}(z) = I + \frac{1}{2\pi i}\int_{\Sigma^{(3)}} \frac{w^{(3)}(s)}{s - z}\,ds + \frac{1}{2\pi i}\int_{\Sigma^{(3)}} \frac{[(I - C_{w^{(3)}})^{-1} C_{w^{(3)}} I](s)\, w^{(3)}(s)}{s - z}\,ds, \qquad z \notin \Sigma^{(3)}. \tag{10.43} \]
In [3], only the term $m^{(3)}_{22}(z)$ (at $z = 0$) was of interest. But then $w^{(3)}_{22} = 0$, and the first integral in (10.43) was zero. As computed in [3, (5.20)], the second integral was bounded by the product of the $L^\infty$- and $L^1$-norms of $w^{(3)}$, and then, due to (10.38), the estimate $\|w^{(3)}\|_{L^1} \le Ck^{-1/3}$ in [3, (5.23)] was enough to control $m^{(3)}_{22}$. But in the present paper, we need estimates of $m^{(3)}_{12}$ for $\pi_k(z)$, and hence we need an estimate of the first integral, which amounts to a bound on the $L^1$-norm of $w^{(3)}$. The $L^1$ bound obtained in [3, (5.23)] is not good enough for this purpose, and we need an improved estimate on the $L^1$-norm of $w^{(3)}$. Now by (10.42), the first integral is less than or equal to
\[ \frac{C}{\operatorname{dist}(z,\Sigma^{(3)})\,k^{1/3}}\, e^{-(2\sqrt{2}/3)k(1-2t/k)^{3/2}}, \tag{10.44} \]
while, as in [3, (5.20)], the second integral is less than or equal to, by using (10.38) and (10.42),
\[ \begin{aligned} &\frac{1}{2\pi\operatorname{dist}(z,\Sigma^{(3)})}\, \|(I - C_{w^{(3)}})^{-1} C_{w^{(3)}} I\|_{L^2}\, \|w^{(3)}\|_{L^2} \\ &\quad\le \frac{1}{2\pi\operatorname{dist}(z,\Sigma^{(3)})}\, \|(I - C_{w^{(3)}})^{-1}\|_{L^2 \to L^2}\, \|C_{w^{(3)}} I\|_{L^2}\, \|w^{(3)}\|_{L^2} \\ &\quad\le \frac{C}{\operatorname{dist}(z,\Sigma^{(3)})}\, \|w^{(3)}\|^2_{L^2} \le \frac{C}{\operatorname{dist}(z,\Sigma^{(3)})}\, \|w^{(3)}\|_{L^\infty}\, \|w^{(3)}\|_{L^1} \\ &\quad\le \frac{C}{\operatorname{dist}(z,\Sigma^{(3)})\,k^{1/3}}\, e^{-(4\sqrt{2}/3)k(1-2t/k)^{3/2}}. \end{aligned} \tag{10.45} \]
When $z = 0$, $\operatorname{dist}(z,\Sigma^{(3)}) \ge c_1 > 0$; hence, using (10.34), (10.7), (10.8) and (5.5), (5.6), we obtain Proposition 5.1(ii). As in (10.17)–(10.19), from (10.13), (10.7), (10.9), and (5.6), we have
\[ \pi_k(z;t) = -(-1)^k e^{-tz}\, m^{(3)}_{12}(z;k;t), \qquad z \in \Omega^{(3)}_1, \tag{10.46} \]
\[ \pi_k(z;t) = z^k e^{-tz^{-1}}\, m^{(3)}_{11}(z;k;t), \qquad z \in \Omega^{(3)}_4, \tag{10.47} \]
\[ \pi_k(z;t) = z^k e^{-tz^{-1}}\, m^{(3)}_{11}(z;k;t) - (-1)^k e^{-tz}\, m^{(3)}_{12}(z;k;t), \qquad z \in \Omega^{(3)}_2 \cup \Omega^{(3)}_3. \tag{10.48} \]
Define $x$ by
\[ \frac{2t}{k} = 1 - \frac{x}{2^{1/3}k^{2/3}}, \tag{10.49} \]
as in Proposition 5.6. Let $0 < b < 1$ be a fixed number. Given $b$, from the beginning, we could have chosen $0 < a < 1$ such that
\[ \rho_{\theta_b} = \frac{1 - \sqrt{1 - a^2\cos^2\theta_b}}{-a\cos\theta_b} \tag{10.50} \]
is strictly greater than $b$ for some $\pi/2 \le \theta_b < \pi$. Note that in (10.33) the choice of $3\pi/4$ and $5\pi/4$ was arbitrary in defining $\Sigma^{(3)}$. Instead of $3\pi/4$ and $5\pi/4$, this time we use $\theta_b$ and $2\pi - \theta_b$, and we carry this forward through the later calculations. Thus we obtain the same estimates of (10.44) and (10.45) with different constants $C$. Now $|z| \le b$ lies in $\Omega^{(3)}_1$, and $|z| \ge b^{-1}$ lies in $\Omega^{(3)}_4$. Since the distance $\operatorname{dist}(z,\Sigma^{(3)}) \ge c_2 > 0$, using (10.43), (10.44), (10.45), (10.46), and (10.47), we obtain (5.29) and (5.30) in Proposition 5.6 for $M \le x \le (1-a)2^{1/3}k^{2/3}$. Since in (10.20) and (10.21) in Section 10.1.1 the choice of $0 < a < 1$ was arbitrary, we obtain (5.29) and (5.30) in Proposition 5.6 for all $x \ge M$.
On the other hand, let $0 < L < 2^{-3/2}\sqrt{M}$ be a fixed number. Set $\alpha = 1 - 2^{4/3}k^{-1/3}w$ with $-L \le w \le L$ as in Proposition 5.6. From the inequality $(1 - \sqrt{1-\gamma^2})/\gamma \le 1 - \sqrt{1-\gamma}$ for all $0 \le \gamma \le 1$, we have
\[ \rho_\pi = \frac{1 - \sqrt{1 - (2t/k)^2}}{2t/k} \le 1 - \sqrt{1 - \frac{2t}{k}} \le 1 - \frac{\sqrt{M}}{2^{1/6}k^{1/3}}. \tag{10.51} \]
But
\[ \alpha = 1 - \frac{2^{4/3}w}{k^{1/3}} \ge 1 - \frac{2^{4/3}L}{k^{1/3}}. \tag{10.52} \]
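Hence $\operatorname{dist}(-\alpha,\Sigma^{(3)}) \ge Ck^{-1/3}$. Thus from (10.43), (10.44), and (10.45), we have
\[ |m^{(3)}(-\alpha;k;t) - I| \le Ce^{-(2/3)x^{3/2}}, \tag{10.53} \]
which together with (10.48) (note that $-\alpha \in \Omega^{(3)}_2 \cup \Omega^{(3)}_3$) implies that
\[ \bigl|e^{-t\alpha}\pi_k(-\alpha;t)\bigr| \le \bigl|(-\alpha)^k e^{-t(\alpha-\alpha^{-1})}\bigr|\bigl(1 + Ce^{-c|x|^{3/2}}\bigr) + Ce^{-c|x|^{3/2}}, \tag{10.54} \]
\[ \bigl|e^{-t\alpha^{-1}}(-\alpha)^{-k}\pi_k(-\alpha;t) - 1\bigr| \le Ce^{-c|x|^{3/2}} + Ce^{-c|x|^{3/2}}\bigl|(-\alpha)^{-k} e^{t(\alpha-\alpha^{-1})}\bigr|. \tag{10.55} \]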
For large $k$, $\alpha \ge 1/2$ for all $-L \le w \le L$, and hence, using (10.26), we obtain, as in (10.27) and (10.28),
\[ \bigl|(-\alpha)^k e^{-t(\alpha-\alpha^{-1})}\bigr| = e^{kF(-\alpha;2t/k)} \le e^{-2wx + (32/2)|w|^3} \le Ce^{c|x|} \tag{10.56} \]
and
\[ \bigl|(-\alpha)^{-k} e^{t(\alpha-\alpha^{-1})}\bigr| = e^{kF(-\alpha^{-1};2t/k)} \le Ce^{c|x|}. \tag{10.57} \]

Figure 5. $\Sigma^{\mathrm{PII},2}$ and $\Omega^{\mathrm{PII},2}_j$
Thus from (10.54) and (10.55), we obtain (5.31) and (5.32) in Proposition 5.6.
10.1.3. The case $k - M2^{-1/3}k^{1/3} \le 2t \le k$ for some $M > 0$
In this case, as $k \to \infty$, the point $z = -\rho_\pi$ on the deformed contour $\Sigma^{(3)}$ defined in (10.33) approaches $z = -1$ rapidly, and so we need to pay special attention to the neighborhood of $z = -1$. More precisely, we need to introduce the so-called parametrix for the RHP around $z = -1$, which is an approximate local solution. Recall RHP (2.15) for the Painlevé II equation. Let $\Sigma^{\mathrm{PII},2} = \Sigma^{\mathrm{PII},2}_1 \cup \Sigma^{\mathrm{PII},2}_2$ be a contour of the general shape indicated in Figure 5. Asymptotically for large $z$, the curves are straight lines of angle less than $\pi/3$ (see [3, paragraph after (2.18)] for more precise discussions on the curve). We define the exact shape of $\Sigma^{\mathrm{PII},2}$ below. Define $m^{\mathrm{PII},2}(z;x)$ by
\[ \begin{cases} m^{\mathrm{PII},2}(z,x) = m(z;x)\begin{pmatrix} 1 & e^{-2i((4/3)z^3 + xz)} \\ 0 & 1 \end{pmatrix} & \text{in } \Omega^{\mathrm{PII},2}_2, \\[2mm] m^{\mathrm{PII},2}(z,x) = m(z;x)\begin{pmatrix} 1 & 0 \\ e^{2i((4/3)z^3 + xz)} & 1 \end{pmatrix} & \text{in } \Omega^{\mathrm{PII},2}_3, \\[2mm] m^{\mathrm{PII},2}(z,x) = m(z;x) & \text{in } \Omega^{\mathrm{PII},2}_1,\ \Omega^{\mathrm{PII},2}_4, \end{cases} \tag{10.58} \]
where $m(z;x)$ is the solution of the RHP for the PII equation given in (2.15). Then
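$m^{\mathrm{PII},2}$ solves a new RHP (see [3, (2.19)])
\[ \begin{cases} m^{\mathrm{PII},2} \text{ is analytic in } \mathbb{C} \setminus \Sigma^{\mathrm{PII},2}, \\[1mm] m^{\mathrm{PII},2}_+ = m^{\mathrm{PII},2}_- \begin{pmatrix} 1 & -e^{-2i((4/3)z^3 + xz)} \\ 0 & 1 \end{pmatrix} & \text{in } \Sigma^{\mathrm{PII},2}_1, \\[2mm] m^{\mathrm{PII},2}_+ = m^{\mathrm{PII},2}_- \begin{pmatrix} 1 & 0 \\ e^{2i((4/3)z^3 + xz)} & 1 \end{pmatrix} & \text{in } \Sigma^{\mathrm{PII},2}_2, \\[2mm] m^{\mathrm{PII},2} = I + O(1/z) & \text{as } z \to \infty. \end{cases} \tag{10.59} \]
Also, $m^{\mathrm{PII},2}_1(x)$ defined by $m^{\mathrm{PII},2}(z;x) = I + m^{\mathrm{PII},2}_1(x)/z + O(z^{-2})$ satisfies
\[ m^{\mathrm{PII},2}_1(x) = m_1(x), \tag{10.60} \]
where $m_1(x)$ is defined in a manner similar to that in (2.16). Set $x$ by
\[ \frac{2t}{k} = 1 - \frac{x}{2^{1/3}k^{2/3}}. \tag{10.61} \]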
We define $\Sigma^{(3)}$ and $m^{(3)}$ as in (10.33) and (10.34). Now we proceed as in [3, (5.25)–(5.35)]. Let $O_\epsilon$ be the ball of radius $\epsilon$ around $z = -1$, where $\epsilon > 0$ is a small fixed number. Define the map (see [3, equation displayed between (5.26) and (5.27)])
\[ \lambda(z) := -i2^{-4/3}k^{1/3}\,\tfrac{1}{2}(z - z^{-1}) \tag{10.62} \]
in $O_\epsilon$. Define $\Sigma^{\mathrm{PII},2}$ by $\Sigma^{\mathrm{PII},2} \cap \lambda(O_\epsilon) := \lambda(\Sigma^{(3)} \cap O_\epsilon)$, and extend it smoothly outside $\lambda(O_\epsilon)$ as indicated in [3, (5.29)]. Define $m^{\mathrm{PII},2}$ as above using this contour. Now we define the parametrix by (see [3, equation displayed between (5.30) and (5.31)])
\[ \begin{cases} m_p(z;k;t) = m^{\mathrm{PII},2}(\lambda(z),x) & \text{in } O_\epsilon \setminus \Sigma^{(3)}, \\ m_p(z;k;t) = I & \text{in } \bar{O}_\epsilon^{\,c} \setminus \Sigma^{(3)}. \end{cases} \tag{10.63} \]
It is proved in [3, (5.25)–(5.34)] that if we take $\epsilon$ small enough but fix it, then the ratio
\[ R(z;k;t) := m^{(3)} m_p^{-1} \tag{10.64} \]
solves a new RHP
\[ \begin{cases} R(z;k;t) \text{ is analytic in } \mathbb{C} \setminus \Sigma_R, \\ R_+(z;k;t) = R_-(z;k;t)\, v_R(z;k;t) & \text{on } \Sigma_R, \\ R(z;k;t) = I + O(1/z) & \text{as } z \to \infty, \end{cases} \tag{10.65} \]
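where $\Sigma_R := \partial O_\epsilon \cup \Sigma^{(3)}$, and the jump matrix satisfies (see [3, (5.34)])
\[ \begin{cases} \|v_R - I\|_{L^\infty} \le C/k^{2/3} & \text{on } O_\epsilon \cap \Sigma^{(3)}, \\ \|v_R - I\|_{L^\infty} \le Ce^{-ck} & \text{on } O_\epsilon^{c} \cap \Sigma^{(3)}, \\ \|v_R - I + m^{\mathrm{PII},2}_1(x)/\lambda(z)\|_{L^\infty} \le C/k^{2/3} & \text{on } \partial O_\epsilon, \text{ as } k \to \infty, \end{cases} \tag{10.66} \]
with some positive constants $C$ and $c$ which may depend on $M$. Set $w_R := v_R - I$. Using (10.15), which holds generally, we have (see [3, (5.35) and the preceding calculations])
\[ \begin{aligned} R(z;k;t) &= I + \frac{1}{2\pi i}\int_{\Sigma_R} \frac{((I - C_{w_R})^{-1} I)(s)\,w_R(s)}{s - z}\,ds \\ &= I + \frac{1}{2\pi i}\int_{\Sigma_R} \frac{v_R(s) - I}{s - z}\,ds + \frac{1}{2\pi i}\int_{\Sigma_R} \frac{[(I - C_{w_R})^{-1} C_{w_R} I](s)\,w_R(s)}{s - z}\,ds. \end{aligned} \tag{10.67} \]
Now the absolute value of the second integral is less than or equal to (recall that $|\lambda(z)|^{-1} = O(k^{-1/3})$ for $z \in \partial O_\epsilon$)
\[ \frac{C}{\operatorname{dist}(z,\Sigma_R)}\, \|(I - C_{w_R})^{-1}\|_{L^2(\Sigma_R) \to L^2(\Sigma_R)}\, \|C_{w_R} I\|_{L^2(\Sigma_R)}\, \|w_R\|_{L^2(\Sigma_R)} \le \frac{C}{\operatorname{dist}(z,\Sigma_R)}\, \|w_R\|^2_{L^2(\Sigma_R)} \le \frac{C}{\operatorname{dist}(z,\Sigma_R)\,k^{2/3}}, \tag{10.68} \]
and similarly, the first integral satisfies
\[ \left| \frac{1}{2\pi i}\int_{\Sigma_R} \frac{v_R(s) - I}{s - z}\,ds + \frac{1}{2\pi i}\int_{\partial O_\epsilon} \frac{m^{\mathrm{PII},2}_1(x)}{\lambda(s)(s - z)}\,ds \right| \le \frac{C}{\operatorname{dist}(z,\Sigma_R)\,k^{2/3}}. \tag{10.69} \]
Hence
\[ \left| m^{(3)}(z;k;t)\, m_p(z;k;t)^{-1} - I + \frac{1}{2\pi i}\int_{\partial O_\epsilon} \frac{m^{\mathrm{PII},2}_1(x)}{\lambda(s)(s - z)}\,ds \right| \le \frac{C}{\operatorname{dist}(z,\Sigma_R)\,k^{2/3}}. \tag{10.70} \]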
This is an extension of [3, (5.35)] to the case when $z \ne 0$. For $z = 0$, from (10.63) and (10.64), $R(0) = m^{(3)}(0)$ and $\operatorname{dist}(0,\Sigma_R) \ge c_1 > 0$. Note that $1/\lambda(s)$ is analytic in $O_\epsilon$ except for a simple pole at $s = -1$, and
\[ \lambda(s) = -i2^{-4/3}k^{1/3}\bigl[(s+1) + \tfrac{1}{2}(s+1)^2 + \cdots\bigr], \qquad s \sim -1. \tag{10.71} \]
By a residue calculation for (10.70), we have, as in [3, (5.35)],
\[ m^{(3)}(0;k;t) = I + \frac{i2^{4/3}\, m^{\mathrm{PII},2}_1(x)}{k^{1/3}} + O\Bigl(\frac{1}{k^{2/3}}\Bigr). \tag{10.72} \]
Thus, using (2.17) and (2.18), from (5.5), (5.6), (10.7), (10.8), and (10.34), we obtain Proposition 5.1(iii) in the case when $0 \le x \le M$.
We now prove Proposition 5.2 when $x \ge 0$. Since the choice of $M$ was arbitrary in our calculations, for fixed $x$ we choose $M > 0$ large enough so that $x < M$. Let $z \in \mathbb{C} \setminus \Sigma$ be fixed. We first assume $|z| < 1$. By modifying the contour $\Sigma^{(3)}$ if necessary, as in (10.50) and the following paragraph, we have $z \in \Omega^{(3)}_1$ and $\operatorname{dist}(z,\Sigma^{(3)}) \ge c_1 > 0$. Thus from (10.70), $|R(z) - I| \le Ck^{-1/3}$ with some constant $C$ that depends on $x$. Thus from (5.6), (10.7), (10.34), and (10.64), we obtain the first limit of (5.13). A similar calculation applies to the case $|z| > 1$, and we obtain the first limit of (5.14). The second limits of (5.13) and (5.14) follow from the first limits of (5.14) and (5.13), respectively, by replacing $z \to 1/z$. Hence this extends the calculation [3, (5.35)] to the case when $z$ is bounded away from the contour.
Finally, we prove Proposition 5.4 when $x > 0$. Set
\[ \alpha = 1 - \frac{2^{4/3}w}{k^{1/3}}, \qquad w \text{ fixed}, \tag{10.73} \]
and
\[ \frac{2t}{k} = 1 - \frac{x}{2^{1/3}k^{2/3}}, \qquad x > 0 \text{ fixed}. \tag{10.74} \]
In this case, $-\alpha \in O_\epsilon$. By a residue calculation again, for $w$ not equal to zero,
\[ \frac{1}{2\pi i}\int_{\partial O_\epsilon} \frac{ds}{\lambda(s)(s + \alpha)} = \frac{1}{\lambda(-\alpha)} + \frac{i2^{4/3}}{(-1+\alpha)k^{1/3}} = -\frac{i}{w} + \frac{i}{w + (2^{1/3}/k^{1/3})w^2 + \cdots} = O\Bigl(\frac{1}{k^{1/3}}\Bigr). \tag{10.75} \]
When $w = 0$, we have the same order $O(k^{-1/3})$ by a similar calculation. On the other hand, since
\[ \rho_\pi = \frac{1 - \sqrt{1 - (2t/k)^2}}{2t/k} = 1 - \frac{2^{1/3}\sqrt{x}}{k^{1/3}} + \frac{x}{2^{1/3}k^{2/3}} + O\Bigl(\frac{1}{k}\Bigr), \tag{10.76} \]
using (10.73) we have $\operatorname{dist}(-\alpha,\Sigma_R) \ge Ck^{-2/3}$. Thus we obtain from (10.70),
\[ |R(-\alpha;k;t) - I| \le C. \tag{10.77} \]
Using $\lambda(-\alpha) \sim -iw$, from (10.63) and (10.64), we have
\[ \lim_{k\to\infty} m^{(3)}(-\alpha;k;t) = m^{\mathrm{PII},2}(-iw,x) \tag{10.78} \]
since $\epsilon$ is arbitrarily small. On the other hand, from the conditions on $t$ and $\alpha$, we have
\[ \lim_{k\to\infty} \alpha^k e^{-t(\alpha-\alpha^{-1})} = e^{(8/3)w^3 - 2xw}. \tag{10.79} \]
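Thus, using (10.34), we obtain
\[ \lim_{k\to\infty} m^{(2)}(-\alpha;k;t) = m^{\mathrm{PII},2}(-iw,x), \qquad -\alpha \in \Omega^{(3)}_1,\ \Omega^{(3)}_4, \tag{10.80} \]
\[ \lim_{k\to\infty} m^{(2)}(-\alpha;k;t) = m^{\mathrm{PII},2}(-iw,x)\begin{pmatrix} 1 & -e^{(8/3)w^3 - 2xw} \\ 0 & 1 \end{pmatrix}, \qquad -\alpha \in \Omega^{(3)}_2, \tag{10.81} \]
\[ \lim_{k\to\infty} m^{(2)}(-\alpha;k;t) = m^{\mathrm{PII},2}(-iw,x)\begin{pmatrix} 1 & 0 \\ -e^{-(8/3)w^3 + 2xw} & 1 \end{pmatrix}, \qquad -\alpha \in \Omega^{(3)}_3. \tag{10.82} \]
Now finally using (10.58), for each fixed $w$ and $x$, we have
\[ \lim_{k\to\infty} m^{(2)}(-\alpha;k;t) = m(-iw;x). \tag{10.83} \]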
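As a numerical aside, the scaling limit (10.79) can be observed directly for large $k$. The sketch below evaluates $k\log\alpha - t(\alpha - \alpha^{-1})$ with $\alpha$ and $t$ as in (10.73) and (10.74) and compares it with $(8/3)w^3 - 2xw$; the function names and the sample values of $k$, $w$, $x$ are illustrative assumptions.

```python
import math

def scaled_exponent(k, w, x):
    """k*log(alpha) - t*(alpha - 1/alpha), with alpha as in (10.73) and t as in (10.74)."""
    u = 2.0 ** (4.0 / 3.0) * w / k ** (1.0 / 3.0)            # alpha = 1 - u
    t = 0.5 * k * (1.0 - x / (2.0 ** (1.0 / 3.0) * k ** (2.0 / 3.0)))
    # alpha - 1/alpha = -u (2 - u) / (1 - u); this form limits cancellation error
    return k * math.log1p(-u) + t * u * (2.0 - u) / (1.0 - u)

def limit_exponent(w, x):
    """Exponent on the right-hand side of (10.79)."""
    return (8.0 / 3.0) * w ** 3 - 2.0 * x * w
```

A Taylor expansion in $u = 1 - \alpha$ gives the same answer: the terms of order $u$ and $u^2$ cancel, leaving $ku^3/6 = (8/3)w^3$ and the cross term $-2xw$, with a remainder of order $k^{-1/3}$.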
From (10.7)–(10.10) and (10.79), the limit (10.83) implies Proposition 5.4 in the case when $x > 0$. This is a new computation that we had to carry out in the present paper in order to include the case when $\alpha \to 1$.
10.2. When $(2t)/k > 1$
Throughout this section we set
\[ \gamma := \frac{2t}{k} > 1. \tag{10.84} \]
We need some definitions from [3]. Set $0 < \theta_c < \pi$ by $\sin^2(\theta_c/2) = 1/\gamma$. Define a probability measure on an arc (see [3, (4.13)]),
\[ d\mu(\theta) := \frac{\gamma}{\pi}\cos\frac{\theta}{2}\sqrt{\frac{1}{\gamma} - \sin^2\frac{\theta}{2}}\;d\theta, \qquad -\theta_c \le \theta \le \theta_c, \tag{10.85} \]
and define a constant (see [3, (4.14)])
\[ l := -\gamma + \log\gamma + 1. \tag{10.86} \]
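That (10.85) defines a probability measure can be confirmed by a quick quadrature. The following is a small sketch using a midpoint rule; the function names, the number of nodes, and the sample values of $\gamma$ are assumptions made for illustration.

```python
import math

def mu_density(theta, gamma):
    """Density of d mu in (10.85) with respect to d theta, for |theta| <= theta_c."""
    s2 = math.sin(0.5 * theta) ** 2
    return (gamma / math.pi) * math.cos(0.5 * theta) * math.sqrt(max(0.0, 1.0 / gamma - s2))

def total_mass(gamma, n=50000):
    """Midpoint-rule approximation of the total mass of d mu on [-theta_c, theta_c]."""
    theta_c = 2.0 * math.asin(1.0 / math.sqrt(gamma))
    step = 2.0 * theta_c / n
    return step * sum(mu_density(-theta_c + (i + 0.5) * step, gamma) for i in range(n))
```

The substitution $u = \sqrt{\gamma}\sin(\theta/2)$ turns $d\mu$ into the semicircle law $(2/\pi)\sqrt{1-u^2}\,du$ on $[-1,1]$, which explains why the total mass is 1 for every $\gamma > 1$.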
Now we introduce the so-called $g$-function (see [3, (4.8)])
\[ g(z;k;t) := \int_{-\theta_c}^{\theta_c} \log(z - e^{i\theta})\,d\mu(\theta), \qquad z \in \mathbb{C} \setminus \bigl(\Sigma \cup (-\infty,-1]\bigr). \tag{10.87} \]
The measure $d\mu(\theta)$ is the equilibrium measure of a certain variational problem, and the constant $l$ is a related constant (see [3, Sec. 4]). For each $|\theta| \le \theta_c$, the branch is chosen such that $\log(z - e^{i\theta})$ is analytic in $\mathbb{C} \setminus \bigl((-\infty,-1] \cup \{e^{i\phi} : -\pi \le \phi \le \theta\}\bigr)$ and behaves like $\log z$ as $z \in \mathbb{R} \to +\infty$. The basic properties of $g(z)$ are summarized in [3, Lem. 4.2]. In general, the role of the $g$-function in RHP analysis, first introduced in [19] and then generalized in [17], is to replace exponentially growing terms in the jump matrix by oscillating or exponentially decaying terms. The authors in [13] introduced a $g$-function of a form similar to (10.87) to analyze an RHP associated to orthogonal polynomials on the real line. The above $g$-function (10.87) introduced in [3] is an adaptation of their work to the circle case. When $0 \le \gamma \le 1$, the related equilibrium measure is (see [3, (4.12)])
\[ d\mu(\theta) = \frac{1}{2\pi}(1 + \gamma\cos\theta)\,d\theta, \qquad -\pi \le \theta < \pi, \tag{10.88} \]
with the related constant $l = 0$, and hence (see [3, (4.15)])
\[ g(z) = \begin{cases} \log z - \gamma/(2z), & |z| > 1,\ z \notin (-\infty,-1), \\ -(\gamma/2)z + \pi i, & |z| < 1. \end{cases} \tag{10.89} \]
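Since $g(z)$ is explicit in this case, we did not introduce it in the form (10.87) in Section 10.1. Then up to (10.104), we follow the procedure in [3, Sec. 6]. Write $\Sigma = \bar{C}_1 \cup C_2$, where $C_2 := \{e^{i\theta} : -\theta_c < \theta < \theta_c\}$ and $C_1 := \Sigma \setminus \bar{C}_2$. Define $m^{(1)}(z;k;t)$ by
\[ m^{(1)}(z;k;t) := e^{(kl/2)\sigma_3}\, Y(z;k;t)\, e^{-kg(z;k;t)\sigma_3}\, e^{-(kl/2)\sigma_3}, \tag{10.90} \]
where $\sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$. Then $m^{(1)}$ solves (see [3, (6.1)]) a new RHP
\[ \begin{cases} m^{(1)}(z;k;t) \text{ is analytic in } \mathbb{C} \setminus \Sigma, \\[1mm] m^{(1)}_+(z;k;t) = m^{(1)}_-(z;k;t)\begin{pmatrix} e^{-2k\tilde\alpha(z;k;t)} & (-1)^k \\ 0 & e^{2k\tilde\alpha(z;k;t)} \end{pmatrix} & \text{on } C_2, \\[2mm] m^{(1)}_+(z;k;t) = m^{(1)}_-(z;k;t)\begin{pmatrix} 1 & (-1)^k e^{-2k\tilde\alpha_-(z;k;t)} \\ 0 & 1 \end{pmatrix} & \text{on } C_1, \\[2mm] m^{(1)}(z;k;t) = I + O(1/z) & \text{as } z \to \infty, \end{cases} \tag{10.91} \]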
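Figure 6. $\Sigma^{(3)}$ and $\Omega^{(3)}$ when $\gamma > 1$ (contours $\tilde{C}_{\mathrm{inside}}$, $\tilde{C}_{\mathrm{outside}}$, $C_1$ with endpoints $e^{i\theta_c}$, $e^{-i\theta_c}$, and regions $\Omega^{(3)}_1,\dots,\Omega^{(3)}_4$)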
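where $\tilde\alpha(z;k;t)$ is defined by (see [3, Lem. 6.1])
\[ \tilde\alpha(z;k;t) := -\frac{\gamma}{4}\int_{e^{i\theta_c}}^{z} \frac{s+1}{s^2}\sqrt{(s - e^{i\theta_c})(s - e^{-i\theta_c})}\;ds, \qquad \xi := e^{i\theta_c}. \tag{10.92} \]
(Notation: We use $\tilde\alpha$ here instead of $\alpha$ in [3] to avoid confusion with $\alpha$ in (10.140).) The branch is chosen such that $\sqrt{(s - e^{i\theta_c})(s - e^{-i\theta_c})}$ is analytic in $\mathbb{C} \setminus \bar{C}_1$ and behaves like $s$ as $s \in \mathbb{R} \to +\infty$. Define $m^{(2)}(z;k;t)$ as in (10.3). Then $m^{(2)}$ solves a new RHP, normalized as $z \to \infty$, with the jump matrices (see [3, (6.2)])
\[ \begin{aligned} v^{(2)}(z;k;t) &= \begin{pmatrix} 1 & -e^{-2k\tilde\alpha(z;k;t)} \\ e^{2k\tilde\alpha(z;k;t)} & 0 \end{pmatrix} && \text{on } C_2, \\ v^{(2)}(z;k;t) &= \begin{pmatrix} e^{-2k\tilde\alpha_-(z;k;t)} & -1 \\ 1 & 0 \end{pmatrix} && \text{on } C_1. \end{aligned} \tag{10.93} \]
Through the changes $Y \to m^{(1)} \to m^{(2)}$, we have
\[ Y_{11}(z;k;t) = -e^{kg(z;k;t)}\, m^{(2)}_{12}(z;k;t), \qquad |z| < 1, \tag{10.94} \]
\[ Y_{21}(z;k;t) = -(-1)^k e^{kg(z;k;t)+kl}\, m^{(2)}_{22}(z;k;t), \qquad |z| < 1, \tag{10.95} \]
\[ Y_{11}(z;k;t) = e^{kg(z;k;t)}\, m^{(2)}_{11}(z;k;t), \qquad |z| > 1, \tag{10.96} \]
\[ Y_{21}(z;k;t) = (-1)^k e^{kg(z;k;t)+kl}\, m^{(2)}_{21}(z;k;t), \qquad |z| > 1. \tag{10.97} \]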
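Set $\Sigma^{(3)} := \bar{C}_1 \cup \tilde{C}_{\mathrm{inside}} \cup \tilde{C}_{\mathrm{outside}}$ as in Figure 6, which divides $\mathbb{C}$ into four regions, $\Omega^{(3)}_j$, $j = 1,\dots,4$. Again, there is a certain freedom in choosing the shape of $\tilde{C}_{\mathrm{inside}}$ and $\tilde{C}_{\mathrm{outside}}$. For example, $\tilde{C}_{\mathrm{inside}}$ (resp., $\tilde{C}_{\mathrm{outside}}$) can be any smooth curve lying in $\Omega^{(3)}_2$ (resp., $\Omega^{(3)}_3$) connecting $e^{i\theta_c}$ and $e^{-i\theta_c}$; the precise requirement is given in [3] (see also (10.100)–(10.102)). Define $m^{(3)}(z;k;t)$ by (see [3, p. 1151])
\[ \begin{cases} m^{(3)} = m^{(2)}\begin{pmatrix} 1 & -e^{-2k\tilde\alpha(z;k;t)} \\ 0 & 1 \end{pmatrix}^{-1} & \text{in } \Omega^{(3)}_2, \\[2mm] m^{(3)} = m^{(2)}\begin{pmatrix} 1 & 0 \\ e^{2k\tilde\alpha(z;k;t)} & 1 \end{pmatrix} & \text{in } \Omega^{(3)}_3, \\[2mm] m^{(3)} = m^{(2)} & \text{in } \Omega^{(3)}_1,\ \Omega^{(3)}_4. \end{cases} \tag{10.98} \]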
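Then $m^{(3)}$ solves an RHP, normalized as $z \to \infty$, with the jump matrix given by
\[ v^{(3)}(z;k;t) = \begin{cases} \begin{pmatrix} 1 & -e^{-2k\tilde\alpha(z;k;t)} \\ 0 & 1 \end{pmatrix} & \text{on } \tilde{C}_{\mathrm{inside}}, \\[2mm] \begin{pmatrix} 1 & 0 \\ e^{2k\tilde\alpha(z;k;t)} & 1 \end{pmatrix} & \text{on } \tilde{C}_{\mathrm{outside}}, \\[2mm] \begin{pmatrix} e^{-2k\tilde\alpha_-(z;k;t)} & -1 \\ 1 & 0 \end{pmatrix} & \text{on } C_1. \end{cases} \tag{10.99} \]
From the properties of $g(z)$, it is proved in [3, between (6.3) and (6.4)] that
\[ e^{-k\tilde\alpha_-(z;k;t)} \to 0 \quad \text{as } k \to \infty,\ z \in C_1, \tag{10.100} \]
\[ e^{-k\tilde\alpha(z;k;t)} \to 0 \quad \text{as } k \to \infty,\ z \in \tilde{C}_{\mathrm{inside}}, \tag{10.101} \]
\[ e^{k\tilde\alpha(z;k;t)} \to 0 \quad \text{as } k \to \infty,\ z \in \tilde{C}_{\mathrm{outside}}. \tag{10.102} \]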
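The choice of $\tilde{C}_{\mathrm{inside}}$ and $\tilde{C}_{\mathrm{outside}}$ is precisely for these properties. Here the convergence is uniform for any compact part of each contour away from the end points $e^{i\theta_c}$ and $e^{-i\theta_c}$, but it is not uniform on the whole contour. This gives rise to a technical difficulty that is overcome below using the idea of the parametrix. Formally, $v^{(3)} \to v^\infty$ as $k \to \infty$ where
\[ \begin{aligned} v^\infty(z) &= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} && \text{on } \tilde{C}_{\mathrm{inside}} \cup \tilde{C}_{\mathrm{outside}}, \\ v^\infty(z) &= \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} && \text{on } C_1. \end{aligned} \tag{10.103} \]
Thus we expect that $m^{(3)}$ converges to $m^\infty$, the solution of the RHP $m^\infty_+ = m^\infty_- v^\infty$ with $m^\infty \to I$ as $z \to \infty$. The solution $m^\infty$ is easily given by (see [3, Lem. 6.2])
\[ m^\infty(z) = \begin{pmatrix} (1/2)(\beta + \beta^{-1}) & (1/2i)(\beta - \beta^{-1}) \\ -(1/2i)(\beta - \beta^{-1}) & (1/2)(\beta + \beta^{-1}) \end{pmatrix}, \tag{10.104} \]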
where $\beta(z) := ((z - e^{i\theta_c})/(z - e^{-i\theta_c}))^{1/4}$, which is analytic in $\mathbb{C} \setminus \bar{C}_1$ and $\beta \sim +1$ as $z \in \mathbb{R} \to +\infty$.
10.2.1. The case $2t \ge ak$ for some $a > 1$
The parametrix $m_p(z)$ is introduced in [3, (6.25)–(6.31)], and it has the following properties. In the neighborhood $O_\epsilon$ of size $\epsilon$ around the points $e^{i\theta_c}$ and $e^{-i\theta_c}$, $m_p(z)$ is constructed using the Airy function in such a way that $(m_p(z))_+ = (m_p(z))_- v^{(3)}(z)$ for $z \in \Sigma^{(3)} \cap O_\epsilon$, and $\|m_p (m^\infty)^{-1} - I\|_{L^\infty(\partial O_\epsilon)} = O(k^{-1})$. In $\mathbb{C} \setminus O_\epsilon$, we set $m_p(z) := m^\infty(z)$. Then the ratio $R(z) := m^{(3)}(z)\, m_p^{-1}(z)$ has no jump on $\Sigma^{(3)} \cap O_\epsilon$, and it has a jump $v_R := w_R + I$ converging to $I$ uniformly of order $O(k^{-1})$ on $\partial O_\epsilon$, and of order $O(e^{-ck})$ on $\Sigma^{(3)} \cap O_\epsilon^c$ as $k \to \infty$. This implies that $R(z) = I + O(k^{-1})$ for any $z \in \mathbb{C} \setminus \Sigma_p$, $\Sigma_p := (\Sigma^{(3)} \cap O_\epsilon^c) \cup \partial O_\epsilon$. Moreover, following the arguments in [14, Sec. 8], the error is uniform up to the boundary in each open region in $\mathbb{C} \setminus \Sigma_p$. In particular, for $z \in \Omega^{(3)}_1 \cup \Omega^{(3)}_4$ (see [3, (6.34)–(6.40)]),
\[ m^{(3)}(z) = \bigl(I + O(k^{-1})\bigr)\, m^\infty(z). \tag{10.105} \]
Here the error is uniform for $ak \le 2t \le bk$ for some $0 < a < b$. For the case $(2t)/k \to \infty$, by shrinking the size of $O_\epsilon$ properly, we again obtain uniform error (see [2]). Therefore, for any $a > 0$, we obtain uniformity in (10.105) for $ak \le 2t$. When $z = 0$, $\beta(0) = -ie^{i\theta_c/2}$ and $g(0) = \pi i$ (see [3, Lem. 4.2(vi)]). Also, (10.98) says that $m^{(3)}(0) = m^{(2)}(0)$. Thus Proposition 5.1(v) follows from (10.94) and (10.95), as in [3, (6.4)].
Now we consider Proposition 5.6 in the case where $x \le -2^{1/3}(a-1)k^{2/3}$. For $z = -\alpha$ real, $|\beta(-\alpha)| = 1$, so $m^\infty(-\alpha)$ is bounded. Hence from (10.98) and (10.94), (10.96), we have for $\alpha \ge 1$,
\[ |\pi_k(-\alpha;k;t)| \le C\,|e^{kg(-\alpha;k;t)}|. \tag{10.106} \]
Then we proceed as in (10.109)–(10.121) of the following section to obtain the proper estimate.
10.2.2. The case $k + M2^{-1/3}k^{1/3} \le 2t \le ak$ for some $a > 1$ and $M > M_0$
In this case, the points $e^{i\theta_c}$ and $e^{-i\theta_c}$ are allowed to approach $-1$, but the rate is restricted:
\[ |e^{i\theta_c} + 1| = 2\Bigl(1 - \frac{k}{2t}\Bigr)^{1/2} \ge \frac{\sqrt{M}}{k^{1/3}} \tag{10.107} \]
for $k$ large. We now take the neighborhood $O_\epsilon$ to be of size $\epsilon\sqrt{2t/k - 1}$ around $e^{i\theta_c}$ and $e^{-i\theta_c}$. From (10.107), $O_\epsilon$ consists of two disjoint disks and their boundaries do not touch the real axis. We introduce the same parametrix $m_p$ as in Section 10.2.1. Then
we have a similar result: there is $M_0 > 0$ such that for $M > M_0$,
\[ m^{(3)}(z) = \Bigl(I + O\Bigl(\frac{1}{k(2t/k - 1)}\Bigr)\Bigr)\, m^\infty(z) \tag{10.108} \]
for $z \in \Omega^{(3)}_1 \cup \Omega^{(3)}_4$. This is proved in [3, (6.34)–(6.40)]. When $z = 0$, as in Section 10.2.1, we obtain Proposition 5.1(iv), as in [3].
Now we prove Proposition 5.6 when $x \le -M$. As in Section 10.2.1, we have
\[ |\pi_k(-\alpha;k;t)| \le C\,|e^{kg(-\alpha;k;t)}|. \tag{10.109} \]
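Now we need an estimate of $g(-\alpha;k;t)$; as we mentioned at the beginning of Section 10, this is the second way in which we must extend [3]. Note that
\[ \operatorname{Re} g(-\alpha) = \frac{1}{2}\int_{-\theta_c}^{\theta_c} \log(1 + \alpha^2 + 2\alpha\cos\theta)\,d\mu(\theta) = \log 2 + \frac{1}{2}\log\frac{\alpha}{\gamma} + I(s), \tag{10.110} \]
where
\[ I(s) = \frac{1}{\pi}\int_{-1}^{1} \log(s^2 - x^2)\sqrt{1 - x^2}\;dx, \qquad s := \frac{\sqrt{\gamma}\,(1+\alpha)}{2\sqrt{\alpha}} > 1. \tag{10.111} \]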
The inequality $s > 1$ follows from the arithmetic-geometric mean inequality and the assumption $\gamma > 1$. A residue calculation gives us
\[ I'(y) = \frac{1}{\pi}\int_{-1}^{1} \frac{2y}{y^2 - x^2}\sqrt{1 - x^2}\;dx = 2y - 2\sqrt{y^2 - 1}, \qquad y > 1. \tag{10.112} \]
Integrating from 1 to $s > 1$, we have
\[ I(s) = s^2 - 1 - 2\int_1^s \sqrt{y^2 - 1}\;dy + I(1). \tag{10.113} \]
The constant $I(1)$ can be evaluated (cf. [3, Lem. 4.3(ii)–(a)]):
\[ I(1) = \frac{1}{\pi}\int_{-\pi/2}^{\pi/2} \log(\sin^2\theta)\sin^2\theta\;d\theta = \frac{1}{2} - \log 2. \tag{10.114} \]
Thus we have
\[ \operatorname{Re} g(-\alpha) = -\frac{1}{2} + \frac{1}{2}\log\frac{\alpha}{\gamma} + s^2 - s\sqrt{s^2 - 1} + \log\bigl(s + \sqrt{s^2 - 1}\bigr). \tag{10.115} \]
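The closed form (10.115) can be compared against a direct quadrature of the first integral in (10.110). This is a minimal sketch with a midpoint rule; the function names, the node count, and the sample pairs $(\alpha, \gamma)$ in the check are illustrative assumptions.

```python
import math

def re_g_numeric(alpha, gamma, n=50000):
    """Midpoint rule for (1/2) * integral of log(1 + alpha^2 + 2 alpha cos(theta)) d mu(theta)."""
    theta_c = 2.0 * math.asin(1.0 / math.sqrt(gamma))
    step = 2.0 * theta_c / n
    total = 0.0
    for i in range(n):
        th = -theta_c + (i + 0.5) * step
        # density of d mu from (10.85)
        dens = (gamma / math.pi) * math.cos(0.5 * th) * math.sqrt(
            max(0.0, 1.0 / gamma - math.sin(0.5 * th) ** 2))
        total += 0.5 * math.log(1.0 + alpha * alpha + 2.0 * alpha * math.cos(th)) * dens
    return step * total

def re_g_closed(alpha, gamma):
    """Right-hand side of (10.115), with s as in (10.111)."""
    s = math.sqrt(gamma) * (1.0 + alpha) / (2.0 * math.sqrt(alpha))
    r = math.sqrt(s * s - 1.0)
    return -0.5 + 0.5 * math.log(alpha / gamma) + s * s - s * r + math.log(s + r)
```

The two values should agree up to quadrature error, which here is governed by the square-root behavior of the density at $\pm\theta_c$.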
Assume $0 < \alpha \le 1$. We change the variables $\gamma$, $\alpha$ into $s$, $\xi$, where $s$ is defined in (10.111) and
\[ \xi := \Bigl(\frac{\gamma}{\alpha}\Bigr)^{1/2} > 1. \tag{10.116} \]
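Then
\[ F(\xi) := \operatorname{Re} g(-\alpha) - \frac{\gamma}{2}\alpha = -\frac{1}{2} - \log\xi - \frac{1}{2}\xi^2 + 2s\xi - s^2 - s\sqrt{s^2 - 1} + \log\bigl(s + \sqrt{s^2 - 1}\bigr). \tag{10.117} \]
Differentiating with respect to $\xi$, we find
\[ F'(\xi) = -\frac{1}{\xi} - \xi + 2s. \tag{10.118} \]
Thus the maximum of $F$ occurs at $\xi = s + \sqrt{s^2 - 1}$. But $F(s + \sqrt{s^2 - 1}) = 0$; hence $F(\xi) \le 0$. Thus we obtain
\[ |e^{-t\alpha}\pi_k(-\alpha;k)| \le Ce^{k\operatorname{Re}(g(-\alpha;k;t) - (\gamma/2)\alpha)} \le C, \qquad 0 < \alpha \le 1. \tag{10.119} \]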
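The sign argument for $F(\xi)$ lends itself to a quick numerical check: $F$ should vanish at $\xi = s + \sqrt{s^2-1}$ and be non-positive for the pairs $(s,\xi)$ produced by $0 < \alpha \le 1$, $\gamma > 1$. The sketch below is illustrative; the helper names and the sampled parameter values are assumptions.

```python
import math

def F_xi(xi, s):
    """F(xi) from (10.117), with s > 1 treated as a parameter."""
    r = math.sqrt(s * s - 1.0)
    return (-0.5 - math.log(xi) - 0.5 * xi * xi + 2.0 * s * xi
            - s * s - s * r + math.log(s + r))

def variables(alpha, gamma):
    """(s, xi) from (10.111) and (10.116); note s = (xi / 2)(1 + alpha)."""
    xi = math.sqrt(gamma / alpha)
    s = 0.5 * xi * (1.0 + alpha)
    return s, xi

def max_F_on_grid(alphas, gammas):
    """Largest value of F over the given samples of (alpha, gamma)."""
    return max(F_xi(*reversed(variables(a, g))) for a in alphas for g in gammas)
```

Since $F'(\xi) = -(\xi^2 - 2s\xi + 1)/\xi$ is positive between the roots $s \pm \sqrt{s^2-1}$ and negative beyond the larger one, and the smaller root is less than 1, the value at $s + \sqrt{s^2-1}$ is the maximum over $\xi > 1$.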
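For $\alpha \ge 1$, note that
\[ \operatorname{Re}(g(-\alpha)) = \log\alpha + \operatorname{Re}(g(-\alpha^{-1})). \tag{10.120} \]
Thus, using (10.119), we have
\[ |e^{-t\alpha^{-1}}(-\alpha)^{-k}\pi_k(-\alpha;k)| \le Ce^{k\operatorname{Re}(g(-\alpha;k;t) - (\gamma/2)\alpha^{-1} - \log\alpha)} = Ce^{k\operatorname{Re}(g(-\alpha^{-1};k;t) - (\gamma/2)\alpha^{-1})} \le C. \tag{10.121} \]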
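10.2.3. Case $k < 2t \le k + M2^{-1/3}k^{1/3}$ for some $M > 0$
First, we introduce $m^{\mathrm{PII},3}$ as in [3, (2.22)–(2.28)]. Set
\[ g^{\mathrm{PII}}(z) := \frac{4}{3}\Bigl(z^2 + \frac{x}{2}\Bigr)^{3/2}, \tag{10.122} \]
which is analytic in $\mathbb{C} \setminus [-\sqrt{-x/2}, \sqrt{-x/2}]$ and behaves like $(4/3)z^3 + xz + x^2/(8z) + O(z^{-3}) =: \theta_{\mathrm{PII}}(z) + O(z^{-1})$ as $z \to +\infty$. Let $\Sigma^{\mathrm{PII},3} := \cup_{j=1}^{5} \Sigma^{\mathrm{PII},3}_j$ as shown in Figure 7. The angles of the rays with the real line are between zero and $\pi/3$. Recall that $m(z;x)$ solves (2.15), the RHP for the PII equation. Define $m^{\mathrm{PII},3}(z;x)$ by
\[ \begin{cases} m^{\mathrm{PII},3} = m(z;x)\, e^{i(g^{\mathrm{PII}} - \theta_{\mathrm{PII}})\sigma_3}, & z \in \Omega^{\mathrm{PII},3}_1,\ \Omega^{\mathrm{PII},3}_4, \\[1mm] m^{\mathrm{PII},3} = m(z;x)\begin{pmatrix} 1 & e^{-2i\theta_{\mathrm{PII}}} \\ 0 & 1 \end{pmatrix} e^{i(g^{\mathrm{PII}} - \theta_{\mathrm{PII}})\sigma_3}, & z \in (\Omega^{\mathrm{PII},3}_2 \cup \Omega^{\mathrm{PII},3}_3) \cap \mathbb{C}_-, \\[2mm] m^{\mathrm{PII},3} = m(z;x)\begin{pmatrix} 1 & 0 \\ e^{2i\theta_{\mathrm{PII}}} & 1 \end{pmatrix} e^{i(g^{\mathrm{PII}} - \theta_{\mathrm{PII}})\sigma_3}, & z \in (\Omega^{\mathrm{PII},3}_2 \cup \Omega^{\mathrm{PII},3}_3) \cap \mathbb{C}_+. \end{cases} \tag{10.123} \]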
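Figure 7. $\Sigma^{\mathrm{PII},3}$ and $\Omega^{\mathrm{PII},3}_j$ (rays $\Sigma^{\mathrm{PII},3}_1,\dots,\Sigma^{\mathrm{PII},3}_4$, the segment $\Sigma^{\mathrm{PII},3}_5$ between $-(-x/2)^{1/2}$ and $(-x/2)^{1/2}$, and regions $\Omega^{\mathrm{PII},3}_1,\dots,\Omega^{\mathrm{PII},3}_4$)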
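Then $m^{\mathrm{PII},3}$ solves the RHP (see [3, (2.25)]) normalized at $\infty$ with the jump matrix
\[ v^{\mathrm{PII},3}(z;x) = \begin{cases} \begin{pmatrix} 1 & 0 \\ e^{2ig^{\mathrm{PII}}} & 1 \end{pmatrix} & \text{on } \Sigma^{\mathrm{PII},3}_1,\ \Sigma^{\mathrm{PII},3}_2, \\[2mm] \begin{pmatrix} 1 & -e^{-2ig^{\mathrm{PII}}} \\ 0 & 1 \end{pmatrix} & \text{on } \Sigma^{\mathrm{PII},3}_3,\ \Sigma^{\mathrm{PII},3}_4, \\[2mm] \begin{pmatrix} e^{-2ig^{\mathrm{PII}}_-} & -1 \\ 1 & 0 \end{pmatrix} & \text{on } \Sigma^{\mathrm{PII},3}_5. \end{cases} \tag{10.124} \]
Also, we have
\[ m_1(x) = m^{\mathrm{PII},3}_1(x) - \frac{ix^2}{8}\sigma_3, \tag{10.125} \]
where $m^{\mathrm{PII},3}(z;x) = I + m^{\mathrm{PII},3}_1(x)/z + O(z^{-2})$ as $z \to \infty$. Now as before, set $x$ by
\[ \frac{2t}{k} = 1 - \frac{x}{2^{1/3}k^{2/3}}. \tag{10.126} \]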
Hence we have $-M \le x < 0$ in this section. We now proceed as in [3, case (iii) of Sec. 6]. Define the parametrix
\[ \begin{cases} m_p(z;k;t) = m^{\mathrm{PII},3}(\lambda(z),x) & \text{in } O_\epsilon \setminus \Sigma^{(3)}, \\ m_p(z;k;t) = I & \text{in } \bar{O}_\epsilon^{\,c} \setminus \Sigma^{(3)}, \end{cases} \tag{10.127} \]
where $\lambda(z)$ is defined in (10.62) and $O_\epsilon$ is a small neighborhood of size $\epsilon > 0$ around $z = -1$ (see [3, case (iii) of Sec. 6] for details). As in Section 10.1.3, the ratio $R(z;k;t) := m^{(3)} m_p^{-1}$ satisfies a new RHP, normalized at $\infty$, with jump matrix $v_R$ satisfying the estimate (10.66), where $m^{\mathrm{PII},2}_1(x)$ is replaced by $m^{\mathrm{PII},3}_1(x)$. Hence we have
\[ \left| m^{(3)}(z;k;t)\, m_p(z;k;t)^{-1} - I + \frac{1}{2\pi i}\int_{\partial O_\epsilon} \frac{m^{\mathrm{PII},3}_1(x)}{\lambda(s)(s - z)}\,ds \right| \le \frac{C}{\operatorname{dist}(z,\Sigma_R)\,k^{2/3}}, \tag{10.128} \]
which is hidden in the derivation of [3, (6.19)]. Then as in (10.72), we have
\[ m^{(3)}(0;k;t) = I + \frac{i2^{4/3}\, m^{\mathrm{PII},3}_1(x)}{k^{1/3}} + O\Bigl(\frac{1}{k^{2/3}}\Bigr), \tag{10.129} \]
which is a (direct) extension of [3, (6.19)]. Now from (10.98) and (10.125) (see [3, (6.19)]),
\[ m^{(2)}(0;k;t) = I + \frac{i2^{4/3}\, m_1(x) - 2^{-5/3}x^2\sigma_3}{k^{1/3}} + O\Bigl(\frac{1}{k^{2/3}}\Bigr). \tag{10.130} \]
Hence, using $e^{kl} = 1 - x^2/(2^{5/3}k^{1/3}) + O(k^{-2/3})$ and $g(0) = \pi i$, (10.94) and (10.95) yield Proposition 5.1(iii) in the case when $-M \le x < 0$, as in [3].
For the proof of Proposition 5.2, note that, as before, for each fixed $z \in \mathbb{C} \setminus \Sigma$, we can use the freedom of the shape of $\Sigma^{(3)}$ (and $\Sigma_R$) so that $z \in \Omega^{(3)}_1 \cup \Omega^{(3)}_4$ and $\operatorname{dist}(z,\Sigma_R) \ge c_1 > 0$. Thus we obtain
\[ \lim_{k\to\infty} m^{(2)}(z;k;t) = I, \qquad z \in \mathbb{C} \setminus \Sigma \text{ fixed}. \tag{10.131} \]
From (10.94) and (10.96), we have
\[ \lim_{k\to\infty} e^{-kg(z;k;t)}\pi_k(z) = 0, \qquad |z| < 1, \tag{10.132} \]
\[ \lim_{k\to\infty} e^{-kg(z;k;t)}\pi_k(z) = 1, \qquad |z| > 1. \tag{10.133} \]
This is an extension of the calculation in [3] where $z = 0$ is given. Now in order to prove Proposition 5.2, we need further analysis that is an extension in the second category as mentioned at the beginning of Section 10. Since $\gamma = (2t)/k$, for the proof of Proposition 5.2, it is enough to show that
\[ \lim_{k\to\infty} (-1)^k e^{k[g(z;k;t) + (\gamma/2)z]} = 1, \qquad |z| < 1, \tag{10.134} \]
\[ \lim_{k\to\infty} e^{k[g(z;k;t) + (\gamma/2)z^{-1} - \log z]} = 1, \qquad |z| > 1. \tag{10.135} \]
But the proof of [3, Lem. 4.3(ii)] says that for $|z| > 1$, $z \notin (-\infty,-1)$,
\[ g(z) = \frac{1}{2}\log z - \frac{\gamma}{4}(z + z^{-1}) + \frac{\gamma}{2} + \frac{\gamma}{4}\int_{1+0}^{z} \frac{s+1}{s^2}\sqrt{(s - e^{i\theta_c})(s - e^{-i\theta_c})}\;ds + g_-(1), \tag{10.136} \]
where the integral is taken over a curve from $1+0$ to $z$ lying in $\{z \in \mathbb{C} : |z| > 1,\ z \notin (-\infty,-1)\}$. Here $\sqrt{(s - e^{i\theta_c})(s - e^{-i\theta_c})}$ is analytic in $\mathbb{C} \setminus \bar{C}_1$ and behaves like $s$ as $s \in \mathbb{R} \to +\infty$, and $\log z$ is analytic in $\mathbb{C} \setminus (-\infty,0]$ and is real for $z \in \mathbb{R}_+$. Calculations in the same proof, together with [3, Lem. 4.2(viii)], give us $g_-(1) = -1/2 - (1/2)\log\gamma$. Also, using $\sin^2(\theta_c/2) = 1/\gamma$, for $|s| > 1$, $s \notin (-\infty,-1)$,
\[ \sqrt{(s - e^{i\theta_c})(s - e^{-i\theta_c})} = (s+1) - \frac{2s}{s+1}(\gamma - 1) + O((\gamma-1)^2). \tag{10.137} \]
Thus, expanding in $\gamma - 1$, we have
\[ g(z) + \frac{\gamma}{2}z^{-1} - \log z = O((\gamma-1)^2) = O(k^{-4/3}), \tag{10.138} \]
which implies (10.135). A similar computation implies that for $|z| < 1$, $z \notin (-1,0]$, we have
\[ g(z) = \frac{1}{2}\log z - \frac{\gamma}{4}(z + z^{-1}) + \frac{\gamma}{2} + \frac{\gamma}{4}\int_{1+0}^{z} \frac{s+1}{s^2}\sqrt{(s - e^{i\theta_c})(s - e^{-i\theta_c})}\;ds + g_+(1) \tag{10.139} \]
and $g_+(1) = -1/2 - (1/2)\log\gamma + \pi i$, which yield (10.134).
For the proof of Proposition 5.4 when $x < 0$, set
\[ \alpha = 1 - \frac{2^{4/3}w}{k^{1/3}}. \tag{10.140} \]
In this case, we need a new argument as $\alpha \to 1$. When $w$ and $x$ are fixed, again as in (10.77), we have $\lim_{k\to\infty} R(-\alpha;k;t) = I$, which implies that
\[ \lim_{k\to\infty} m^{(3)}(-\alpha;k;t) = m^{\mathrm{PII},3}(-iw,x). \tag{10.141} \]
From [3, (6.8)], we have
\[ \lim_{k\to\infty} k\tilde\alpha(-\alpha;k;t) = i\,\frac{4}{3}\Bigl((-iw)^2 + \frac{x}{2}\Bigr)^{3/2} = ig^{\mathrm{PII}}(-iw), \tag{10.142} \]
which from (10.98) and (10.123) implies
\[ \lim_{k\to\infty} m^{(2)}(-\alpha;k;t) = m(-iw,x)\, e^{i(g^{\mathrm{PII}}(-iw) - \theta_{\mathrm{PII}}(-iw))\sigma_3}. \tag{10.143} \]
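Now we compute the large $k$ limit of $kg(-\alpha;k;t) - t\alpha$ when $w > 0$, and of $kg(-\alpha;k;t) - t\alpha^{-1} - k\log\alpha$ when $w < 0$. For $-\pi < \theta < \pi$, $\lim_{\epsilon\downarrow 0} \arg(-\alpha + i\epsilon - e^{i\theta}) = \pi + \tan^{-1}(\sin\theta/(\alpha + \cos\theta))$, where $-\pi < \tan^{-1}\phi < \pi$. Since $\tan^{-1}(\sin\theta/(\alpha + \cos\theta))$ is odd in $\theta$, we have from (10.115),
\[ \lim_{\epsilon\downarrow 0} g(-\alpha + i\epsilon) = \lim_{\epsilon\downarrow 0} \operatorname{Re} g(-\alpha + i\epsilon) + \pi i = -\frac{1}{2} + \frac{1}{2}\log\frac{\alpha}{\gamma} + s^2 - s\sqrt{s^2 - 1} + \log\bigl(s + \sqrt{s^2 - 1}\bigr) + \pi i, \tag{10.144} \]
where $s = \sqrt{\gamma}(1+\alpha)/(2\sqrt{\alpha}) > 1$. Under the stated conditions on $\gamma$ and $\alpha$, as $k \to \infty$,
\[ \frac{\sqrt{\gamma}\,(1+\alpha)}{2\sqrt{\alpha}} = 1 + \Bigl(\frac{w^2}{2^{1/3}} - \frac{x}{2^{4/3}}\Bigr)\cdot\frac{1}{k^{2/3}} + \frac{2w^3}{k} + O(k^{-4/3}), \tag{10.145} \]
\[ \frac{\alpha}{\gamma} = 1 - \frac{2^{4/3}w}{k^{1/3}} + \frac{x}{2^{1/3}k^{2/3}} - \frac{2xw}{k} + O(k^{-4/3}). \tag{10.146} \]
Note that $\lim_{\epsilon\downarrow 0} g(-\alpha + i\epsilon) - \lim_{\epsilon\downarrow 0} g(-\alpha - i\epsilon)$ is $2\pi i$ for $\alpha > 1$, and it is zero for $0 < \alpha < 1$. Therefore we obtain
\[ \lim_{k\to\infty} (-1)^k e^{kg(-\alpha) - t\alpha} = e^{(4/3)w^3 - xw - (4/3)(w^2 - x/2)^{3/2}}, \qquad w > 0, \tag{10.147} \]
\[ \lim_{k\to\infty} (-1)^k e^{kg(-\alpha) - t\alpha^{-1} - k\log\alpha} = e^{-(4/3)w^3 + xw - (4/3)(w^2 - x/2)^{3/2}}, \qquad w < 0. \tag{10.148} \]
Also, being careful of branch, we have
\[ i\bigl(g^{\mathrm{PII}}(-iw) - \theta_{\mathrm{PII}}(-iw)\bigr) = \frac{4}{3}w^3 - xw - \frac{4}{3}\Bigl(w^2 - \frac{x}{2}\Bigr)^{3/2}, \qquad w > 0, \tag{10.149} \]
\[ i\bigl(g^{\mathrm{PII}}(-iw) - \theta_{\mathrm{PII}}(-iw)\bigr) = \frac{4}{3}w^3 - xw + \frac{4}{3}\Bigl(w^2 - \frac{x}{2}\Bigr)^{3/2}, \qquad w < 0. \tag{10.150} \]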
Since $\lim_{k\to\infty} e^{kl} = 1$, using (10.94)–(10.97) and (10.143), this implies (5.21)–(5.24) when $x < 0$.
10.3. Proof of Proposition 5.8
The analysis in this section is new, and it is needed for the proof of Proposition 5.8. Let $\alpha > 1$ be fixed. Set
\[ \frac{t}{k} = \frac{\alpha}{\alpha^2 + 1} - \frac{\alpha(\alpha^2 - 1)^{1/2}}{(\alpha^2 + 1)^{3/2}}\cdot\frac{x}{\sqrt{k}}, \qquad x \in \mathbb{R} \setminus \{0\} \text{ fixed}. \tag{10.151} \]
We are interested in the asymptotics of $e^{-\alpha t}(-\alpha)^k \pi_k(-\alpha^{-1};t)$. Since $2\alpha/(\alpha^2 + 1) < 1$ and $x$ is fixed, we are in the case of Sections 10.1.1 and/or 10.1.2. We define $m^{(1)}$ and $m^{(2)}$ as in (10.1) and (10.3). Recall that we have a certain freedom in the choice of $\Sigma^{(3)}$. We choose a contour passing through the saddle points of
\[ f(z) := \frac{t}{k}(z - z^{-1}) + \log(-z), \tag{10.152} \]
the exponent of the (12)-entry of $v^{(2)}$ divided by $k$. The saddle points (see (10.31) and the following discussion) are $-\rho_\pi$ and $-\rho_\pi^{-1}$, where
\[ \rho_\pi = \frac{1 - \sqrt{1 - (2t/k)^2}}{2t/k} = \frac{1}{\alpha} - \frac{(\alpha^2 + 1)^{1/2}}{\alpha(\alpha^2 - 1)^{1/2}}\cdot\frac{x}{\sqrt{k}} + O\Bigl(\frac{1}{k}\Bigr) =: \rho_c + O\Bigl(\frac{1}{k}\Bigr). \tag{10.153} \]
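The expansion (10.153) of the saddle parameter can be checked numerically by comparing the exact $\rho_\pi$, computed from (10.151), with the two-term approximation $\rho_c$. The following sketch is illustrative; the function names and the sample values $\alpha = 2$, $x = \pm 1$ are assumptions.

```python
import math

def rho_pi(alpha, x, k):
    """Exact rho_pi = (1 - sqrt(1 - g^2)) / g with g = 2t/k taken from (10.151)."""
    g = 2.0 * (alpha / (alpha ** 2 + 1.0)
               - alpha * math.sqrt(alpha ** 2 - 1.0) / (alpha ** 2 + 1.0) ** 1.5 * x / math.sqrt(k))
    return (1.0 - math.sqrt(1.0 - g * g)) / g

def rho_c(alpha, x, k):
    """Two-term approximation rho_c from (10.153)."""
    return (1.0 / alpha
            - math.sqrt(alpha ** 2 + 1.0) / (alpha * math.sqrt(alpha ** 2 - 1.0)) * x / math.sqrt(k))

def scaled_error(alpha, x, k):
    """k * |rho_pi - rho_c|, which should stay bounded as k grows if the remainder is O(1/k)."""
    return k * abs(rho_pi(alpha, x, k) - rho_c(alpha, x, k))
```

The leading term follows from inverting $2t/k = 2\rho/(1+\rho^2)$ near $\rho = 1/\alpha$, where $d\rho/d\gamma = (\alpha^2+1)^2/(2\alpha^2(\alpha^2-1))$.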
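Figure 8. $\Sigma^{(3)}$ and $\Omega^{(3)}$ (contours $\Sigma_c$, $\Sigma_r$, $\Sigma$, $\Sigma^{(3)}_{\mathrm{out}}$ and regions $\Omega^{(3)}_1,\dots,\Omega^{(3)}_4$, with the points $-1$ and $0$ marked)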
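Take $\delta > 0$ and $\epsilon > 0$ small such that $\Sigma_c := \{-\rho_c + is : -\epsilon k^{\delta-1/2} \le s \le \epsilon k^{\delta-1/2}\}$ lies inside the open unit disk for all $k \ge 1$. Define (see Figure 8) $\Sigma^{(3)} := \Sigma^{(3)}_{\mathrm{in}} \cup \Sigma^{(3)}_{\mathrm{out}}$ by $\Sigma^{(3)}_{\mathrm{in}} := \Sigma_c \cup \Sigma_r$ and $\Sigma^{(3)}_{\mathrm{out}} := \{r^{-1}e^{i\phi} : re^{i\phi} \in \Sigma^{(3)}_{\mathrm{in}}\}$, where $\Sigma_r := \{|-\rho_c + i\epsilon k^{\delta-1/2}|\,e^{i\theta} : |\theta| < \theta_0\}$, $-\rho_c + i\epsilon k^{\delta-1/2} = |-\rho_c + i\epsilon k^{\delta-1/2}|\,e^{i\theta_0}$. Let $\Omega^{(3)}_j$ be as in Figure 8, and define $m^{(3)}$ as in (10.34). As in (10.46)–(10.48), the quantities we are interested in are
\[ e^{-\alpha t}(-\alpha)^k \pi_k(-\alpha^{-1};t) = -\alpha^k e^{-t(\alpha-\alpha^{-1})}\, m^{(3)}_{12}(-\alpha^{-1};k;t), \qquad x < 0, \tag{10.154} \]
\[ e^{-\alpha t}(-\alpha)^k \pi_k(-\alpha^{-1};t) = -\alpha^k e^{-t(\alpha-\alpha^{-1})}\, m^{(3)}_{12}(-\alpha^{-1};k;t) + m^{(3)}_{11}(-\alpha^{-1};k;t), \qquad x > 0. \tag{10.155} \]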
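For the estimates of $w^{(3)} := v^{(3)} - I$, note that for any fixed $0 < \rho < 1$, $\operatorname{Re} f(\rho e^{i\theta}) = F(\rho e^{i\theta}; 2t/k)$ (recall (10.11)) is increasing in $0 < \theta < \pi$ and is decreasing in $\pi < \theta < 2\pi$; hence $\|e^{kf(z)}\|_{L^\infty(\Sigma_r)} = e^{k\operatorname{Re} f(-\rho_c + i\epsilon k^{\delta-1/2})}$. But we have
\[ f(-\rho_c + ia) = \frac{t}{k}(\alpha - \alpha^{-1}) - \log\alpha - \frac{1}{2}\Bigl(\frac{\alpha^2(\alpha^2 - 1)}{\alpha^2 + 1}\,a^2 + \frac{x^2}{k}\Bigr) + O(k^{-3/2+2\delta}), \qquad |a| \le \epsilon k^{\delta-1/2}. \tag{10.156} \]
Thus
\[ \|\alpha^k e^{-t(\alpha-\alpha^{-1})} e^{kf(z)}\|_{L^\infty(\Sigma_r)} \le Ce^{-ck^{2\delta}}, \tag{10.157} \]
and also using
\[ \alpha^{-k} e^{t(\alpha-\alpha^{-1})} = e^{-k[\log\alpha - (\alpha^2-1)/(\alpha^2+1) + O(1/\sqrt{k})]}, \qquad \log\alpha - \frac{\alpha^2 - 1}{\alpha^2 + 1} > 0 \text{ for } \alpha > 1, \tag{10.158} \]
we have
\[ \|e^{kf(z)}\|_{L^\infty(\Sigma_r)} \le Ce^{-ck}. \tag{10.159} \]
On the other hand, one can directly check that $\operatorname{Re} f(-\rho_c + ia)$ has its maximum at $a = 0$ for $-\epsilon k^{\delta-1/2} \le a \le \epsilon k^{\delta-1/2}$; hence $\|e^{kf(z)}\|_{L^\infty(\Sigma_c)} = e^{k\operatorname{Re} f(-\rho_c)}$. Again (10.156) and (10.158) yield
\[ \|e^{kf(z)}\|_{L^\infty(\Sigma_c)} \le e^{-ck}. \tag{10.160} \]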
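Similarly, we have $\|e^{-kf(z)}\|_{L^\infty(\Sigma^{(3)}_{\mathrm{out}})} \le e^{-ck}$. Now calculations as in Section 10.1.2 give us the result (10.43). Hence, using (10.158) and (10.160) and noting $\operatorname{dist}(-\alpha^{-1}, \Sigma^{(3)}) = ((\alpha^2+1)^{1/2}/(\alpha(\alpha^2-1)^{1/2})) \cdot (x/\sqrt{k})$, we have
\[ m^{(3)}_{11}(-\alpha^{-1}; k; t) = 1 + O(\sqrt{k}\,e^{-2(1-\epsilon_1)c_1 k}), \tag{10.161} \]
\[ \alpha^k e^{-t(\alpha-\alpha^{-1})} m^{(3)}_{12}(-\alpha^{-1}; k; t) = \alpha^k e^{-t(\alpha-\alpha^{-1})} \int_{\Sigma_c} \frac{(-s)^k e^{t(s-s^{-1})}}{s + \alpha^{-1}} \cdot \frac{ds}{2\pi i} + O(\sqrt{k}\,e^{-ck^{2\delta}}). \tag{10.162} \]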
To evaluate the integral asymptotically, first we change the variable by $s = -\rho_c - (i(\alpha^2+1)^{1/2}/(\alpha(\alpha^2-1)^{1/2})) \cdot (y/\sqrt{k})$. Then from (10.156), the numerator of the integrand becomes
\[ \alpha^{-k} e^{t(\alpha-\alpha^{-1})} e^{-(1/2)(y^2+x^2) + O(k^{-1/2+2\delta})}. \tag{10.163} \]
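Hence, setting $A := \alpha(\alpha^2-1)^{1/2}/(\alpha^2+1)^{1/2}$,
\[ \frac{1}{2\pi i}\int_{\Sigma_c} \alpha^k e^{-t(\alpha-\alpha^{-1})} \frac{(-s)^k e^{t(s-s^{-1})}}{s + \alpha^{-1}}\,ds = \frac{1}{2\pi i}\int_{-Ak^\delta}^{Ak^\delta} \frac{e^{-(1/2)(y^2+x^2)}}{y + ix}\,dy\,\bigl(1 + O(k^{-(1/2)+2\delta})\bigr) + O(e^{-ck^{2\delta}}). \tag{10.164} \]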
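Thus from (10.154) and (10.155), we obtain
\[ \lim_{k\to\infty} e^{-\alpha t}(-\alpha)^k \pi_k(-\alpha^{-1}; t) = \frac{1}{2\pi i}\int_{-\infty}^{\infty} \frac{e^{-(1/2)(y^2+x^2)}}{y + ix}\,dy, \qquad x < 0, \tag{10.165} \]
\[ \lim_{k\to\infty} e^{-\alpha t}(-\alpha)^k \pi_k(-\alpha^{-1}; t) = \frac{1}{2\pi i}\int_{-\infty}^{\infty} \frac{e^{-(1/2)(y^2+x^2)}}{y + ix}\,dy + 1, \qquad x > 0. \tag{10.166} \]
The function $h(x) := (1/(2\pi i))\int_{-\infty}^{\infty} e^{-(1/2)(y^2+x^2)}/(y + ix)\,dy$ is smooth in $x > 0$ and $x < 0$. The derivative is $h'(x) = (1/\sqrt{2\pi})e^{-(1/2)x^2}$. As $x \to \pm\infty$, $h(x) \to 0$. Therefore we see that
\[ \frac{1}{2\pi i}\int_{-\infty}^{\infty} \frac{e^{-(1/2)(y^2+x^2)}}{y + ix}\,dy = \begin{cases} (1/\sqrt{2\pi})\int_{-\infty}^{x} e^{-(1/2)y^2}\,dy, & x < 0, \\ (1/\sqrt{2\pi})\int_{\infty}^{x} e^{-(1/2)y^2}\,dy, & x > 0. \end{cases} \tag{10.167} \]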
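The identity (10.167) is easy to test numerically. The following Python sketch is an illustration only: it approximates $h(x)$ by Simpson's rule on a truncated interval and compares it with the Gaussian distribution function computed from `math.erfc`. The helper names `h` and `normal_cdf`, the truncation width, and the number of subintervals are all assumptions of this sketch.

```python
import math

def h(x, half_width=12.0, n=20_000):
    """Simpson approximation of (1/(2*pi*i)) * int exp(-(y^2+x^2)/2) / (y + i*x) dy.

    Since 1/(y + i*x) = (y - i*x)/(y^2 + x^2) and the part odd in y integrates
    to zero, only the term -i*x/(y^2 + x^2) contributes, so the value is real.
    The integer n must be even; x must be nonzero.
    """
    def integrand(y):
        return -x * math.exp(-0.5 * (y * y + x * x)) / (2.0 * math.pi * (y * y + x * x))

    step = 2.0 * half_width / n
    total = integrand(-half_width) + integrand(half_width)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * integrand(-half_width + j * step)
    return total * step / 3.0

def normal_cdf(x):
    """(1/sqrt(2*pi)) * integral of exp(-y^2/2) from -infinity to x."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

for x in (-1.0, -0.5, 0.5, 1.0):
    expected = normal_cdf(x) if x < 0 else normal_cdf(x) - 1.0
    print(x, h(x), expected)
```

For $x > 0$ the right-hand side of (10.167) equals the distribution function minus 1, so adding the 1 in (10.166) gives the Gaussian distribution function in both cases.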
Thus Proposition 5.8 is proved. □
Acknowledgments. We would like to thank Percy Deift for helpful discussions and encouragement, especially for his help in proving Lemma 2.1. We would also like to acknowledge many useful conversations and communications with Peter Forrester, Kurt Johansson, Charles Newman, and Harold Widom. Special thanks are due the referee who gave us crucial advice, improving the exposition of the paper significantly.
Baik
Mathematics Department, Princeton University, Princeton, New Jersey 08544-1000, USA; School of Mathematics, Institute for Advanced Study, Princeton, New Jersey 08540, USA

Rains
AT&T Labs-Research, Florham Park, New Jersey 07932, USA
DUKE MATHEMATICAL JOURNAL © 2001, Vol. 109, No. 2

ON ICOSAHEDRAL ARTIN REPRESENTATIONS

KEVIN BUZZARD, MARK DICKINSON, NICK SHEPHERD-BARRON, AND RICHARD TAYLOR

Abstract
If $\rho : \mathrm{Gal}(\mathbf{Q}^{\mathrm{ac}}/\mathbf{Q}) \to \mathrm{GL}_2(\mathbf{C})$ is a continuous odd irreducible representation with nonsolvable image, then under certain local hypotheses we prove that $\rho$ is the representation associated to a weight 1 modular form and hence that the L-function of $\rho$ has an analytic continuation to the entire complex plane.

Introduction
E. Artin [A] conjectured that the L-series $L(r, s)$ of any continuous representation $r : \mathrm{Gal}(\mathbf{Q}^{\mathrm{ac}}/\mathbf{Q}) \to \mathrm{GL}_n(\mathbf{C})$ is entire except possibly for a pole at $s = 1$ when $r$ contains the trivial representation. The case when $n = 1$ is simply a restatement of the Kronecker-Weber theorem and standard results on the analytic continuation of Dirichlet L-series. Artin proved his conjecture when $r$ is induced from a 1-dimensional representation of an open subgroup of $\mathrm{Gal}(\mathbf{Q}^{\mathrm{ac}}/\mathbf{Q})$. Moreover, R. Brauer [Br] was able to show in general that $L(r, s)$ is meromorphic on the whole complex plane. Since then, the only real progress has been for $n = 2$, although very recently D. Ramakrishnan [Ra] has dealt with some $n = 4$ cases.

When $n = 2$, such representations can be classified according to the image of the projectivised representation $\mathrm{proj}\, r : \mathrm{Gal}(\mathbf{Q}^{\mathrm{ac}}/\mathbf{Q}) \to \mathrm{PGL}_2(\mathbf{C})$. This image is either cyclic, dihedral, the alternating group $A_4$ (the tetrahedral case), the symmetric group $S_4$ (the octahedral case), or the alternating group $A_5$ (the icosahedral case). When the image of $\mathrm{proj}\, r$ is cyclic, then $r$ is reducible and Artin's conjecture follows from the $n = 1$ case. When the image of $\mathrm{proj}\, r$ is dihedral, then $r$ is induced from a character of an open subgroup of index 2, and so Artin himself proved the conjecture in this case, although the result is implicit in earlier work of E. Hecke [He]. R. Langlands [Langl] proved Artin's conjecture for tetrahedral and some octahedral representations. J. Tunnell [Tu] extended this to all octahedral representations.
Received 29 December 1999. Revision received 13 October 2000. 2000 Mathematics Subject Classification. Primary 11F11, 11F80; Secondary 11F33, 11G18, 14G22. Taylor's work partially supported by National Science Foundation grant number DMS-9702885 and by the Miller Institute at the University of California at Berkeley.

These results are based on Langlands's theory of cyclic base change for automorphic representations of $\mathrm{GL}_2$, and so the method seems to be restricted (at best) to cases where the image of $r$ is soluble. The icosahedral case has until now largely been attacked using computational methods, where one can hope to construct an explicit weight 1 modular form to deal with any particular case. There is a growing literature on the computational side of the subject, started by J. Buhler in [Buh] and continued by G. Frey and others in [F], E. Goins [Go], K. Buzzard and W. Stein [BS], and A. Jehanne and M. Müller [JM]. In each case finitely many (up to twist) icosahedral cases of Artin's conjecture are treated. The contribution here to the problem is to treat infinitely many icosahedral cases using a theoretical approach. More precisely, we prove the following theorem.

THEOREM A
Suppose that $r : \mathrm{Gal}(\mathbf{Q}^{\mathrm{ac}}/\mathbf{Q}) \to \mathrm{GL}_2(\mathbf{C})$ is a continuous irreducible representation and that $r$ is odd, that is, that the determinant of complex conjugation is $-1$. If $r$ is icosahedral, suppose that
• $\mathrm{proj}\, r$ is unramified at 2 and that the image of a Frobenius element at 2 under $\mathrm{proj}\, r$ has order 3 and that
• $\mathrm{proj}\, r$ is unramified at 5.
Then there is a weight 1 newform $f$ such that for all prime numbers $p$ the $p$th Fourier coefficient of $f$ equals the trace of Frobenius at $p$ on the space of coinvariants for the inertia group at $p$ in the representation $r$. In particular, the Artin L-series for $r$ is the Mellin transform of a weight 1 newform and is an entire function.

The proof follows a strategy outlined to A. Wiles by R. Taylor in 1992 (see [Ta2]), which we have now carried out in three main steps (see [ST], [Di], and [BT]). The purpose of this article is simply to pull these results together and document some technical results that we require but that do not seem to be available in the literature. The result is that this paper is rather technical. The reader who simply desires to get an overview of the main ideas of the proof should consult [Ta2], perhaps followed by [ST], [BT], and [Di], rather than this paper. We remark also that by using arguments mod 5 rather than mod 2, Taylor has proved in [Ta3] a theorem similar to Theorem A but with different local conditions.

One might hope that extensions of our method may treat all odd 2-dimensional icosahedral representations of $\mathrm{Gal}(\mathbf{Q}^{\mathrm{ac}}/\mathbf{Q})$, although considerable work remains to be done. On the other hand, our method seems to offer no prospect of treating the general Artin conjecture.
1. mod 2 icosahedral representations

In this section we give a slight extension of results in [ST]. This could be avoided by appealing to the results of B. Gross in [Gr]. However, Gross's results depend on certain "unchecked compatibilities," and so we prefer to make our result unconditional by using this more ad hoc argument. We remark that the hypotheses in our main theorem could be weakened if one could make Gross's theorem unconditional. We start with a strengthening of [ST, Th. 3.4].

THEOREM 1.1
Fix a continuous homomorphism
\[ \rho : \mathrm{Gal}(\mathbf{Q}^{\mathrm{ac}}/\mathbf{Q}) \to \mathrm{SL}_2(\mathbf{F}_4). \]
Suppose that $\rho$ is unramified at 2 and that $\rho(\mathrm{Frob}_2)$ has distinct eigenvalues $\alpha, \beta \in \mathbf{F}_4^\times$. Then there is an abelian surface $A/\mathbf{Q}$ together with a principal polarisation $\lambda : A \to A^\vee$ and an embedding $i : \mathbf{Z}[(1+\sqrt{5})/2] \hookrightarrow \mathrm{End}(A)$ (both defined over $\mathbf{Q}$) such that
(1) $\lambda \circ i(a) = i(a)^\vee \circ \lambda$ for all $a \in \mathbf{Z}[(1+\sqrt{5})/2]$;
(2) the action of $\mathrm{Gal}(\mathbf{Q}^{\mathrm{ac}}/\mathbf{Q})$ on $A[2] \cong \mathbf{F}_4^2$ is equivalent to $\rho$;
(3) $A$ has good ordinary reduction at 2 and $\mathrm{Frob}_2 = \alpha$ on $A[2]^{\mathrm{et}}$ (the generic fibre of the maximal étale quotient of the 2-torsion on the Néron model of $A$ over $\mathbf{Z}$); and
(4) the action of $\mathrm{Gal}(\mathbf{Q}^{\mathrm{ac}}/\mathbf{Q})$ on the $\sqrt{5}$-division points, $A[\sqrt{5}]$, is via a surjection $\mathrm{Gal}(\mathbf{Q}^{\mathrm{ac}}/\mathbf{Q}) \twoheadrightarrow \mathrm{GL}_2(\mathbf{F}_5)$.

Proof
With the third condition removed, this is the main result of [ST]. The proof of this strengthening is a slight variant of the argument of that paper. We start by recalling some of the constructions there.

We fix an identification of $\mathbf{F}_4$ with $\mathbf{Z}[(1+\sqrt{5})/2]/(2)$ and of $\mathrm{SL}_2(\mathbf{F}_4)$ with $A_5$. We let $Y/\mathbf{Q}$ denote the smooth cubic surface given in $\mathbf{P}^4$ by
\[ \sum_{i=1}^{5} y_i = \sum_{i=1}^{5} y_i^3 = 0. \]
The group $A_5$ acts on $Y$ by permuting the variables. We let $Y^0 \subset Y$ (resp., $Y^1 \subset Y$) denote the complement of the 15 lines conjugate to $(s : -s : t : -t : 0)$ (resp., the complement of the 10 points conjugate to $(1 : -1 : 0 : 0 : 0)$). We let $Y_\rho$ (resp., $Y_\rho^0$, resp., $Y_\rho^1$) denote the twist of $Y$ (resp., $Y^0$, resp., $Y^1$) by $\rho : \mathrm{Gal}(\mathbf{Q}^{\mathrm{ac}}/\mathbf{Q}) \to A_5$. There is an étale $\mathbf{P}^1$-bundle $C_\rho \to Y_\rho^1$ together with 6 distinguished sections $s_1, \ldots, s_6 : Y_\rho^1 \times \mathbf{Q}^{\mathrm{ac}} \to C_\rho \times \mathbf{Q}^{\mathrm{ac}}$ such that the set $\{s_1, \ldots, s_6\}$ is $\mathrm{Gal}(\mathbf{Q}^{\mathrm{ac}}/\mathbf{Q})$-invariant.

SUBLEMMA
Over $Y_\rho^0$, these sections are distinct.

Proof
We use without comment some notation from [ST, Sec. 2]. By the formulae in [DO, pp. 15–17], the locus in $\mathbf{P}_1^6$ where $s_1, \ldots, s_6$ are distinct is identified with the complement, $Z^0$, in $\sum_{i=1}^{6} z_i = \sum_{i=1}^{6} z_i^3 = 0$ of the 15 $S_6$-conjugates of the plane $(s : -s : t : -t : u : -u)$. Then using [ST, Lem. 2.4] it is easy to see that $j^{-1}Z^0 = Y^0$, and the result follows.

We let $W_\rho/\mathbf{Q}$ denote the $\mathbf{F}_4$-vector space scheme corresponding to $\rho : \mathrm{Gal}(\mathbf{Q}^{\mathrm{ac}}/\mathbf{Q}) \to \mathrm{GL}_2(\mathbf{F}_4)$. It comes with a standard pairing
\[ W_\rho \times W_\rho \to \mu_2 \]
which on $\mathbf{Q}^{\mathrm{ac}}$-points sends
\[ (a, b) \times (c, d) \mapsto (-1)^{\mathrm{tr}_{\mathbf{F}_4/\mathbf{F}_2}(ad - bc)}. \]
Then there is a coarse moduli space $H_\rho/\mathbf{Q}$ parametrising quadruples $(A, \lambda, i, \alpha)$, where $(A, \lambda)$ is a principally polarised abelian surface, where $i : \mathbf{Z}[(1+\sqrt{5})/2] \hookrightarrow \mathrm{End}(A)$ has image fixed by the $\lambda$-Rosati involution, and where $\alpha : W_\rho \xrightarrow{\sim} A[2]$ is an isomorphism of $\mathbf{F}_4$-vector space schemes taking the standard pairing to the $\lambda$-Weil pairing. There is a Zariski-open subset $H_\rho^0 \subset H_\rho$ consisting of those geometric points for which the corresponding $(A, \lambda)$ is a Jacobian. We claim that there is an isomorphism
\[ Y_\rho^0 \cong H_\rho^0 \]
so that a geometric point $y$ of $Y_\rho^0$ maps to the point parametrising a quadruple $(A, \lambda, i, \alpha)$ such that $(A, \lambda)$ is the Jacobian of the curve which maps 2 : 1 to $C_{\rho,y}$ ramified exactly at $s_1(y), \ldots, s_6(y)$. Unfortunately, this is not explicitly stated in [ST]. To prove it, one may assume that $\rho = 1$. Recall from [ST] that we have maps
\[ Y \dashrightarrow H_2^* \to A_2^* \dashrightarrow \mathbf{P}_1^6. \]
(We keep the notation of [ST], so in particular $H_2^*$ is a compactification of what we are now calling $H_1$.) The locus of Jacobians in $A_2^*$ is the locus of points where $A_2^* \dashrightarrow \mathbf{P}_1^6$ is regular and which map to $Z^0 \subset \mathbf{P}_1^6$. Thus $Y^0$ maps to $H_1^0 \subset H_2^*$. On the other hand, $H_2^*$ is the disjoint union of the image of $Y^0$ and some $\mathbf{P}^1$'s which get contracted to the points of $\mathbf{P}_1^6 - (\mathbf{P}_1^6)^s$ (see [ST, Sec. 2]). Thus, if $y$ is a point of $H_2^*$ not in the image of $Y^0$, then either $H_2^* \dashrightarrow \mathbf{P}_1^6$ is not regular at $y$ or $y$ gets mapped outside $Z^0$. In either case, $y$ does not lie in $H_1^0$, establishing the claim.
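To make the standard pairing on $W_\rho$ concrete, here is a small Python sketch (an illustration, not taken from [ST]) that models $\mathbf{F}_4 = \mathbf{F}_2[w]/(w^2+w+1)$ with elements encoded as the integers 0 to 3, checks that the exponent $\mathrm{tr}_{\mathbf{F}_4/\mathbf{F}_2}(ad - bc)$ defines an alternating nondegenerate pairing on $\mathbf{F}_4^2$, and counts the matrices of determinant 1. The encoding and all function names are assumptions of this sketch.

```python
from itertools import product

def f4_mul(a, b):
    """Multiply in F_4 = F_2[w]/(w^2 + w + 1); bit 0 is the constant term, bit 1 the w term."""
    r = 0
    if b & 1:
        r ^= a
    if b & 2:
        r ^= a << 1
    if r & 4:          # reduce w^2 to w + 1
        r ^= 0b111
    return r

def f4_trace(a):
    """Trace from F_4 to F_2, namely a + a^2; the result is 0 or 1."""
    return a ^ f4_mul(a, a)

def pairing_exponent(v, w):
    """Exponent tr(ad - bc) of the pairing for v = (a, b) and w = (c, d); minus equals plus in characteristic 2."""
    (a, b), (c, d) = v, w
    return f4_trace(f4_mul(a, d) ^ f4_mul(b, c))

vectors = list(product(range(4), repeat=2))
alternating = all(pairing_exponent(v, v) == 0 for v in vectors)
nondegenerate = all(
    any(pairing_exponent(v, w) == 1 for w in vectors)
    for v in vectors if v != (0, 0)
)
sl2_order = sum(
    1 for a, b, c, d in product(range(4), repeat=4)
    if f4_mul(a, d) ^ f4_mul(b, c) == 1
)
print(alternating, nondegenerate, sl2_order)
```

The count of determinant 1 matrices agrees with the order of $A_5$, consistent with the identification of $\mathrm{SL}_2(\mathbf{F}_4)$ with $A_5$ fixed earlier in the proof.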
If $X'_\rho$ denotes the blow-up of $Y_\rho \times Y_\rho$ along the diagonal, then $X'_\rho$ has an involution $t$ that exchanges the two factors. We let $X_\rho$ denote the twist of $X'_\rho$ by
\[ \mathrm{Gal}(\mathbf{Q}^{\mathrm{ac}}/\mathbf{Q}) \twoheadrightarrow \mathrm{Gal}(\mathbf{Q}(\sqrt{5})/\mathbf{Q}) \xrightarrow{\sim} \{1, t\}, \]
and we let $X_\rho^2$ be the complement in $X_\rho$ of the strict transforms of $L \times L$ as $L$ runs over lines on $Y_\rho$. Then there is a morphism
\[ \theta : X_\rho^2 \to Y_\rho \]
which (loosely speaking) sends $(P, Q)$ to the third point of intersection of the line through $P$ and $Q$ with $Y_\rho$ (see [ST] for details). We let $X_\rho^0$ (resp., $X_\rho^1$, resp., $D_\rho/X_\rho^1$) denote the preimage of $Y_\rho^0$ (resp., the preimage of $Y_\rho^1$, resp., the pullback of $C_\rho$) under $\theta$. Then it is proved in [ST, Lem. 3.1 and Prop. 3.2] that $X_\rho/\mathbf{Q}$ is rational and that $D_\rho/X_\rho^1$ is a Zariski $\mathbf{P}^1$-bundle.

The argument preceding [ST, Lem. 2.7] shows that given $x \in X_\rho^0$ we can find a Zariski-open subset $U \subset X_\rho^0$ containing $x$ and a principally polarised abelian surface $(A_U, \lambda_U)/U$ such that
(1) for all $x_1 \in U$ the fibre $(A_U, \lambda_U)_{x_1}$ is the Jacobian of a curve which maps 2 : 1 to $D_{\rho,x_1}$ ramified exactly at $s_1(x_1), \ldots, s_6(x_1)$;
(2) there is an isomorphism $\alpha_U : W_\rho \xrightarrow{\sim} A_U[2]$ of finite flat group schemes over $U$ with alternating pairings; and
(3) there exists $i_U : \mathbf{Z}[(1+\sqrt{5})/2] \hookrightarrow \mathrm{End}(A_U)$ which is compatible with $\alpha_U$ and the action of $\mathbf{F}_4$ on $W_\rho$.
We remark that in [ST] the existence of $i_U$ is explained only over a nonempty open subset of $U$, but by [CF, Rem. 1.10(a) of Chap. I], $i_U$ extends to $U$. We remind the reader that $A_U$ is not canonical.

Suppose that $x$ is a geometric point of $U$. If $f$ is an automorphism of $(A_U, \lambda_U, i_U, \alpha_U)_x$, then $T_2(f) \equiv 1 \bmod 2$ and so $T_2(f^2) \equiv 1 \bmod 4$. As $f$ has finite order, this implies that $f^2 = 1$. If $f \ne \pm 1$, then
\[ A_{U,x} \cong \tfrac{1+f}{2}A_{U,x} \oplus \tfrac{1-f}{2}A_{U,x} \]
and $\lambda_U$ correspondingly decomposes as the direct sum of two polarisations. This contradicts the fact that $\theta(x) \in Y_\rho^0 \cong H_\rho^0$. Thus we must have
\[ \mathrm{Aut}((A_U, \lambda_U, i_U, \alpha_U)_x) = \{\pm 1\}. \]
In particular, if we set
\[ \widetilde{U} = \{(a, b) \in (A_U \times A_U)[\sqrt{5}] \mid \langle a, b\rangle \ne 1\}/\sim, \]
where $(a, b) \sim (a', b')$ if and only if $(a, b) = \pm(\mu a', b')$ for some $\mu \in \mathbf{F}_5^\times$, then the construction of $\widetilde{U}$ is canonical and so we can glue the $\widetilde{U}/U$ to give an étale cover $\widetilde{X}_\rho^0/X_\rho^0$ of degree 60. The argument of [ST, Lem. 2.7] shows that $\widetilde{X}_\rho^0$ is geometrically irreducible.
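The degree 60 of this cover can be recovered by a direct count on a single fibre. The Python sketch below is a hedged illustration: it models the fibre of $A_U[\sqrt{5}]$ as the plane $\mathbf{F}_5^2$ and the condition $\langle a, b\rangle \ne 1$ as a nonzero determinant (an assumption standing in for the nondegenerate alternating Weil pairing), then counts classes under $(a, b) \sim \pm(\mu a', b')$. All names are hypothetical.

```python
from itertools import product

P = 5
plane = list(product(range(P), repeat=2))  # stand-in for one fibre of A_U[sqrt(5)]

def symplectic(a, b):
    """Determinant form on F_5^2, modelling the exponent of the Weil pairing."""
    return (a[0] * b[1] - a[1] * b[0]) % P

# Ordered pairs with nontrivial pairing.
pairs = [(a, b) for a in plane for b in plane if symplectic(a, b) != 0]

def orbit(a, b):
    """Equivalence class of (a, b) under (a, b) ~ +/-(mu * a, b) with mu in F_5^x."""
    out = set()
    for mu in range(1, P):
        for sign in (1, P - 1):
            out.add((
                tuple((sign * mu * c) % P for c in a),
                tuple((sign * c) % P for c in b),
            ))
    return frozenset(out)

orbits = {orbit(a, b) for a, b in pairs}
print(len(pairs), len(orbits))
```

There are 480 admissible ordered pairs and each class has 8 elements, which gives the 60 sheets of the cover.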
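Suppose for the moment that we can find a point $x_2 \in X_\rho^0(\mathbf{Q}_2)$, a Zariski-open $U_2 \subset X_\rho^0 \times \mathbf{Q}_2$ as above, and a continuous character $\chi_2 : \mathrm{Gal}(\mathbf{Q}_2^{\mathrm{ac}}/\mathbf{Q}_2) \to \{\pm 1\}$ such that
• the twist $A_{U_2,x_2}(\chi_2)$ of $A_{U_2,x_2}$ by $\chi_2$ has good reduction, and
• if $\overline{A}_{U_2,x_2}(\chi_2)$ denotes the mod 2 reduction of the Néron model of $A_{U_2,x_2}(\chi_2)$ over $\mathbf{Z}_2$, then $\overline{A}_{U_2,x_2}(\chi_2)[2]^{\mathrm{et}} \ne (0)$ and $\mathrm{Frob}_2$ acts on $\overline{A}_{U_2,x_2}(\chi_2)[2]^{\mathrm{et}}$ by $\alpha$.
Then we can find a neighbourhood (for the 2-adic topology) $\mathcal{U} \subset X_\rho^0(\mathbf{Q}_2)$ of $x_2$ such that if $x \in \mathcal{U}$, then
• $x \in U_2$,
• $A_{U_2,x}(\chi_2)$ has good reduction at 2, and
• $\overline{A}_{U_2,x}(\chi_2)[2] \cong \overline{A}_{U_2,x_2}(\chi_2)[2]$.
Because $X_\rho$ is rational, it follows from T. Ekedahl's version of the Hilbert irreducibility theorem (see [E, Th. 1.3]) that we can find a point $x \in X_\rho^0(\mathbf{Q})$ such that
• $x \in \mathcal{U}$, and
• if $\widetilde{x}$ is a point of $\widetilde{X}_\rho^0$ above $x$, then $[\mathbf{Q}(\widetilde{x}) : \mathbf{Q}] = 60$.
Suppose that $U$ is a Zariski neighbourhood of $x$ in $X_\rho^0$ as above. Then $(A_U, \lambda_U, i_U, \alpha_U)_x \times \mathbf{Q}_2$ is a twist by some character $\chi_2' : \mathrm{Gal}(\mathbf{Q}_2^{\mathrm{ac}}/\mathbf{Q}_2) \to \{\pm 1\}$ of $(A_{U_2}, \lambda_{U_2}, i_{U_2}, \alpha_{U_2})_x$. Choose a character
\[ \chi : \mathrm{Gal}(\mathbf{Q}^{\mathrm{ac}}/\mathbf{Q}) \to \{\pm 1\} \]
which restricts to $\chi_2\chi_2'$ on $\mathrm{Gal}(\mathbf{Q}_2^{\mathrm{ac}}/\mathbf{Q}_2)$. Then $A_{U,x}(\chi)$ has the following properties:
• $(A_{U,x}(\chi), \lambda_{U,x})/\mathbf{Q}$ is a principally polarised abelian surface;
• $i_{U,x} : \mathbf{Z}[(1+\sqrt{5})/2] \hookrightarrow \mathrm{End}(A_{U,x}(\chi))$, and the image is fixed by the $\lambda_{U,x}$-Rosati involution;
• as an $\mathbf{F}_4[\mathrm{Gal}(\mathbf{Q}^{\mathrm{ac}}/\mathbf{Q})]$-module, $A_{U,x}(\chi)[2](\mathbf{Q}^{\mathrm{ac}})$ is equivalent to $\rho$;
• $A_{U,x}(\chi) \times \mathbf{Q}_2 \cong A_{U_2,x}(\chi_2)$, and so $A_{U,x}(\chi)$ has good reduction at 2;
• $\overline{A}_{U,x}(\chi)[2] \cong \overline{A}_{U_2,x_2}(\chi_2)[2]$, and so $\overline{A}_{U,x}(\chi)[2]^{\mathrm{et}} \ne (0)$ and $\mathrm{Frob}_2$ acts on $\overline{A}_{U,x}(\chi)[2]^{\mathrm{et}}$ by $\alpha$;
• if $G$ denotes the image of $\mathrm{Gal}(\mathbf{Q}^{\mathrm{ac}}/\mathbf{Q})$ in $\mathrm{Aut}_{\mathbf{F}_5}(A_{U,x}(\chi)[\sqrt{5}]) \cong \mathrm{GL}_2(\mathbf{F}_5)$, then $\det G = \mathbf{F}_5^\times$ (because of the $\lambda$-Weil pairing) and
\[ \#\, G \Big/ \Big( G \cap \Big\{ \begin{pmatrix} \mu & 0 \\ 0 & \nu \end{pmatrix} : \nu = \pm 1,\ \mu \in \mathbf{F}_5^\times \Big\} \Big) = 60; \]
it is then elementary to check that $G = \mathrm{GL}_2(\mathbf{F}_5)$.

It remains to explain the construction of $x_2$. This we do in two steps. More precisely, we show the following two results.
(1) There is a quadruple $(A, \lambda, i, \alpha)$ (as above) defined over $K$ such that $A$ has good reduction, and if $\overline{A}$ denotes the reduction of its Néron model, then $\overline{A}[2]^{\mathrm{et}} \ne (0)$ and $\mathrm{Frob}_2$ acts on $\overline{A}[2]^{\mathrm{et}}$ by $\alpha$.
(2) If $y \in Y_\rho^0(\mathbf{Q}_2)$, then there is a point of $X_\rho^0(\mathbf{Q}_2)$ mapping to $y$ under $\theta$.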
The first assertion gives a point $y_2 \in H_\rho^0(\mathbf{Q}_2) = Y_\rho^0(\mathbf{Q}_2)$ and the second a point $x_2 \in X_\rho^0(\mathbf{Q}_2)$ mapping to $y_2$ under $\theta$. This point $x_2$ suffices.

We initially establish the second assertion. Suppose $y \in Y_\rho^0(\mathbf{Q}_2)$, and let $Y_\rho(y)^0$ denote the complement in $Y_\rho$ of the intersection of $Y_\rho$ with the tangent plane to $Y_\rho$ at $y$. Thus $Y_\rho(y)^0$ is a smooth affine cubic surface. There is an involution $\iota_y$ of $Y_\rho(y)^0$ which sends any point $z$ to the third point of intersection of the line through $y$ and $z$ with the cubic surface $Y_\rho$. We let $Y_\rho(y)$ denote the twist of $Y_\rho(y)^0$ by $\iota_y$ over $\mathrm{Gal}(\mathbf{Q}_2(\sqrt{5})/\mathbf{Q}_2)$. We may identify $Y_\rho(y)$ as a Zariski-open subset of the fibre of $\theta : X_\rho^0 \to Y_\rho^0$ above $y$, and so it suffices to show that $Y_\rho(y)(\mathbf{Q}_2) \ne \emptyset$.

Note that the equations defining $Y$ also define a smooth projective surface over $\mathbf{Z}_2$, which we also denote by $Y$. The constructions of $Y_\rho$, $Y_\rho(y)^0$, and $Y_\rho(y)$ from $Y$ all make sense over $\mathbf{Z}_2$ and give rise to smooth relative surfaces over $\mathbf{Z}_2$, which we denote by the same symbols. Here we are using the fact that $\rho$ is unramified, and we are not asserting that these integral models have any moduli theoretic meaning. By Hensel's lemma it suffices to show that $Y_\rho(y)(\mathbf{F}_2)$ is nonempty.

Without loss of generality, the surface $Y_\rho \times \mathbf{F}_2$ is given in $\mathbf{P}^3$ by the equation
\[ X_1^3 + X_1X_2^2 + X_2^3 + X_3^2X_4 + X_3X_4^2 = 0. \]
(If $\gamma$ is a root of $T^3 + T + 1 = 0$, then $(X_1 : X_2 : X_3 : X_4)$ corresponds to the point
\[ \bigl((X_3 + X_4) + X_1\gamma + X_2\gamma^2 : (X_3 + X_4) + X_1\gamma^2 + X_2\gamma^4 : (X_3 + X_4) + X_1\gamma^4 + X_2\gamma : X_3 : X_4\bigr) \]
of $Y \times \mathbf{F}_2$.) Thus $Y_\rho(\mathbf{F}_2)$ has three points: $P = (0 : 0 : 1 : 0)$, $Q = (0 : 0 : 0 : 1)$, and $R = (0 : 0 : 1 : 1)$.

First, suppose that $y$ reduces to $P$. Then $Y_\rho(y)^0 \times \mathbf{F}_2$ is the surface given in affine 3-space by the equation
\[ x_1^3 + x_1x_2^2 + x_2^3 + x_3 + x_3^2 = 0 \]
and $\iota_y$ maps $(x_1, x_2, x_3)$ to $(x_1, x_2, x_3 + 1)$. (Here we set $x_i = X_i/X_4$.) Thus $Y_\rho(y) \times \mathbf{F}_2$ is given in affine 3-space by the equation
\[ y_1^3 + y_1y_2^2 + y_2^3 + 1 + y_3 + y_3^2 = 0. \]
(Here we let $(y_1, y_2, y_3)$ correspond to the point $(x_1, x_2, x_3) = (y_1, y_2, y_3 + (1+\sqrt{5})/2)$.) Thus $Y_\rho(y)(\mathbf{F}_2)$ consists of 6 points.

Second, suppose that $y$ reduces to $Q$. This case is exactly analogous, and again we see that $Y_\rho(y)(\mathbf{F}_2)$ consists of 6 points.
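The point counts over $\mathbf{F}_2$ used in these cases can be confirmed by brute force. The following Python sketch (an illustration with hypothetical function names) enumerates the $\mathbf{F}_2$-points of the projective cubic and of the two affine surfaces written above.

```python
from itertools import product

def cubic(X1, X2, X3, X4):
    """The cubic form defining the reduction of Y_rho in P^3, evaluated mod 2."""
    return (X1**3 + X1 * X2**2 + X2**3 + X3**2 * X4 + X3 * X4**2) % 2

def untwisted_affine(x1, x2, x3):
    """Affine equation of the complement of the tangent plane at P, mod 2."""
    return (x1**3 + x1 * x2**2 + x2**3 + x3 + x3**2) % 2

def twisted_affine(y1, y2, y3):
    """Affine equation after twisting by the involution x3 -> x3 + 1, mod 2."""
    return (y1**3 + y1 * y2**2 + y2**3 + 1 + y3 + y3**2) % 2

# Over F_2 the only nonzero scalar is 1, so projective points are the nonzero vectors.
projective_points = [p for p in product(range(2), repeat=4) if any(p)]
surface_points = [p for p in projective_points if cubic(*p) == 0]

affine_space = list(product(range(2), repeat=3))
untwisted_points = [p for p in affine_space if untwisted_affine(*p) == 0]
twisted_points = [p for p in affine_space if twisted_affine(*p) == 0]

print(surface_points)
print(len(untwisted_points), len(twisted_points))
```

The enumeration returns exactly the three points $P$, $Q$, $R$ on the projective surface and 6 points on the twisted affine surface, in line with the counts stated in the text.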
Third, suppose that $y$ reduces to $R$. Introducing a new variable $X_4' = X_3 + X_4$, we see that $Y_\rho \times \mathbf{F}_2$ can also be described in $\mathbf{P}^3$ by the equation
\[ X_1^3 + X_1X_2^2 + X_2^3 + X_3^2X_4' + X_3(X_4')^2 = 0 \]
and that in these new coordinates $R$ becomes the point $(0 : 0 : 1 : 0)$. Thus the analysis is the same again and we see that $Y_\rho(y)(\mathbf{F}_2)$ again consists of 6 points.

Finally, we turn to our first assertion. Let $K$ denote the field $\mathbf{Q}(a)$, where $a$ is a root of $T^4 + 13T^2 + 41 = 0$. Then $13 + 2a^2$ is a square root of 5, which we denote $\sqrt{5}$. Moreover, $K$ is a CM field with totally real subfield $\mathbf{Q}(\sqrt{5})$. The inverse different $\mathfrak{d}_{K/\mathbf{Q}}^{-1}$ is principal with generator $\xi = (13a + 2a^3)^{-1}$. We have the prime factorisation
\[ 2\mathcal{O}_K = \bigl(((1+\sqrt{5})/2 + a)/2\bigr)\bigl(((1+\sqrt{5})/2 - a)/2\bigr). \]
As $\pm 1$ are the only roots of unity in $K$, the only elements of $K^\times$ with norm down to $\mathbf{Q}(\sqrt{5})$ equal to 2 are $(\pm(1+\sqrt{5})/2 \pm a)/2$.

The normal closure of $K/\mathbf{Q}$ is $K(\sqrt{41})/\mathbf{Q}$, and $\mathrm{Gal}(K(\sqrt{41})/\mathbf{Q})$ is generated by two elements $\sigma$ and $\tau$, where
\[ \sigma(a) = \sqrt{41}/a, \qquad \tau(a) = a, \qquad \sigma(\sqrt{41}) = -\sqrt{41}, \qquad \tau(\sqrt{41}) = -\sqrt{41}. \]
Thus $\sigma^4 = \tau^2 = 1$, $\tau\sigma\tau = \sigma^3$, and $\sigma^2 = c$. By the Chebotarev density theorem, we may choose a prime $\wp$ of $\mathcal{O}_K$ which is split completely and lies above a rational prime $p \equiv 3 \bmod 4$. Let $\alpha_0$ denote the character
\[ \mathcal{O}_{K,p}^\times \twoheadrightarrow \mathcal{O}_{K,\wp}^\times \twoheadrightarrow \{\pm 1\}. \]
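The elementary identities about $K$ stated above can be verified with exact arithmetic in $\mathbf{Q}(\sqrt{5})$. The sketch below is an illustration only: the class `QSqrt5` is a hypothetical minimal helper, and the script uses the relation $a^2 = (-13 + \sqrt{5})/2$, which is equivalent to $13 + 2a^2 = \sqrt{5}$.

```python
from fractions import Fraction

class QSqrt5:
    """Element p + q*sqrt(5) of Q(sqrt(5)) with rational p and q."""

    def __init__(self, p, q=0):
        self.p, self.q = Fraction(p), Fraction(q)

    def __add__(self, other):
        return QSqrt5(self.p + other.p, self.q + other.q)

    def __sub__(self, other):
        return QSqrt5(self.p - other.p, self.q - other.q)

    def __mul__(self, other):
        return QSqrt5(self.p * other.p + 5 * self.q * other.q,
                      self.p * other.q + self.q * other.p)

    def __eq__(self, other):
        return (self.p, self.q) == (other.p, other.q)

    def __repr__(self):
        return f"{self.p} + {self.q}*sqrt(5)"

# a^2 = (-13 + sqrt(5))/2, equivalent to 13 + 2*a^2 = sqrt(5).
a_squared = QSqrt5(Fraction(-13, 2), Fraction(1, 2))

# a is a root of T^4 + 13*T^2 + 41.
quartic = a_squared * a_squared + QSqrt5(13) * a_squared + QSqrt5(41)

# 13 + 2*a^2 should be sqrt(5).
sqrt5_from_a = QSqrt5(13) + QSqrt5(2) * a_squared

# Relative norm of ((1 + sqrt(5))/2 + a)/2 down to Q(sqrt(5)) is (omega^2 - a^2)/4.
omega = QSqrt5(Fraction(1, 2), Fraction(1, 2))
factor_norm = (omega * omega - a_squared) * QSqrt5(Fraction(1, 4))

print(quartic, sqrt5_from_a, factor_norm)
```

The relative norm coming out as 2 is consistent with the displayed factorisation of $2\mathcal{O}_K$ into two conjugate factors.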
Fix an embedding $K(\sqrt{41}) \hookrightarrow \mathbf{C}$ such that $a$ has negative imaginary part, $13 + 2a^2 > 0$, and $\sqrt{41} > 0$. Then $\Phi = \{1, \sigma\}$ is a CM-type with reflex $(L, \Phi')$, where $L = K(\sqrt{41})^{\{1, \sigma\tau\}}$ and $\Phi' = \{1, \sigma^3\}$. The field $L$ is also a CM field and has a totally real subfield $\mathbf{Q}(\sqrt{41})$. It is isomorphic to the field obtained by adjoining a root of $T^4 + 26T^2 + 5$ to $\mathbf{Q}$. Then $L$ has class number 1 and $\mathcal{O}_L^\times$ is generated by $-1$ and $32 + 5\sqrt{41}$. We have a prime factorisation $2\mathcal{O}_L = I I^c J$ with $\#\mathcal{O}_L/I = 2$ and $\#\mathcal{O}_L/J = 4$. We have a homomorphism
\[ N_{\Phi'} : L^\times \to K^\times, \qquad x \mapsto x\sigma^3(x). \]
Then $N_{\Phi'}$ extends to a map $\mathbf{A}_L^\times \to \mathbf{A}_K^\times$. Define a continuous homomorphism
\[ \alpha : \mathbf{A}_L^\times \to K^\times \]
by setting
• $\alpha|_{L^\times} = N_{\Phi'}$,
• $\alpha|_{\mathcal{O}_{L,p}^\times} = \alpha_0 \circ N_{\Phi'}$,
• $\alpha|_{\mathcal{O}_{L,p'}^\times} = 1$ for any rational prime $p' \ne p$, and
• $\alpha|_{L_v^\times} = 1$ for any infinite place $v$ of $L$.
(This makes sense because the class number of $L$ is 1 and because $(\alpha_0 \circ N_{\Phi'})|_{\mathcal{O}_L^\times} = N_{\Phi'}|_{\mathcal{O}_L^\times}$.)

By results in [Lang], especially [Lang, Chap. 1, Ths. 3.6 and 4.5 and Chap. 5, Cor. 5.3], we see that there is a triple $(A, \lambda, i)/L$, where $(A, \lambda)$ is a principally polarised simple abelian surface with an action $i$ of $\mathcal{O}_K$, which has type $(K, \Phi, \mathcal{O}_K, \xi)$ and character $\alpha$. Because $\alpha$ is trivial on $\mathcal{O}_{L,I}^\times$, we see from the fundamental theorem of complex multiplication (see [Lang, Th. 1.1 of Chap. 4]) that, for a rational prime $l > 2$, inertia at $I$ acts trivially on $T_lA$, the $l$-adic Tate module of $A$. Thus $A$ has good reduction at $I$. Let $\overline{A}$ denote the reduction mod $I$ of the Néron model of $A$. Moreover, if $I = (a)$, then $\mathrm{Frob}_2$ acts on $T_l\overline{A}$ via $\pm N_{\Phi'}a$. As $N_{K/\mathbf{Q}(\sqrt{5})}N_{\Phi'}a = 2$, we see that
\[ \pm N_{\Phi'}a = (\pm(1+\sqrt{5})/2 \pm a)/2, \]
and so
\[ \pm N_{\Phi'}a \equiv (1+\sqrt{5})/2 \bmod (N_{\Phi'}I)^c. \]
Thus $\overline{A}[N_{\Phi'}I^c]$ is étale and $\mathrm{Frob}_2$ acts on it as $(1+\sqrt{5})/2$. If $\alpha = (1+\sqrt{5})/2$, then $(A, \lambda, i|_{\mathbf{Z}[(1+\sqrt{5})/2]})/L_I$ suffices to give the desired example. If on the other hand $\alpha = (1-\sqrt{5})/2$, then $(A, \lambda, i|_{\mathbf{Z}[(1+\sqrt{5})/2]} \circ \sigma)/L_I$ suffices to give the desired example.
We now apply this theorem to deduce the modularity of certain mod 2 representations. If $N$, $M$, and $k$ are positive integers, we denote by $h_k(N; M)$ the $\mathbf{Z}$-algebra generated by the Hecke operators $T_p$ and $\langle p\rangle$ for any prime $p \nmid NM$, and by the Hecke operators $U_p$ for any prime $p \mid NM$ acting on the space of weight $k$ cusp forms for $\Gamma_1(N) \cap \Gamma_0(M)$. If $M \mid N$, we drop it from the notation and write simply $h_k(N)$. If $p \nmid NM$, set $S(p) = p^{k-2}\langle p\rangle$. Also, for every positive integer $n$, define $T(n)$ by the relations
• $T(n_1n_2) = T(n_1)T(n_2)$ if $n_1$ and $n_2$ are coprime,
• $(1 - T_pX + pS(p)X^2)\sum_{r=0}^{\infty} T(p^r)X^r = 1$ for any prime $p \nmid NM$, and
• $T(p^r) = U_p^r$ for every prime $p \mid NM$.

COROLLARY 1.2
Fix a continuous homomorphism
\[ \rho : \mathrm{Gal}(\mathbf{Q}^{\mathrm{ac}}/\mathbf{Q}) \to \mathrm{SL}_2(\mathbf{F}_4). \]
Suppose that $\rho$ is unramified at 2 and 5 and that $\rho(\mathrm{Frob}_2)$ has distinct eigenvalues $\alpha, \beta \in \mathbf{F}_4^\times$. Then there is an odd positive integer $N$ divisible by all primes at which $\rho$
ramifies and a homomorphism f α : h 2 (N ) −→ F4 which takes (1) T p to tr ρ(Frob p ) for all primes p6 | 2N , (2) T2 to α, and (3) U p to zero for all p|N . Proof First, note that [ST, Th. 4.1] is improved in [BCDT] to suppress the condition on ρ(I3 ). Thus [ST, Th. 4.2] can be improved to suppress the condition that A has semistable reduction at 3. The proof of this corollary is then the same as the proof of [ST, Th. 4.3] except that we replace references to [ST, Th. 4.2] by this improvement and references to [ST, Th. 3.4] by references to Theorem 1.1 of this paper. 2. l-adic modular forms Let l be a prime. In this section we recall some facts about l-adic modular forms, which are applied later in the case when l = 2. The most important for us is the assertion that an l-adic limit of ordinary classical modular forms is overconvergent (see Lemma 2.12). Many of these assertions appear in the literature, but we have not been able to locate proofs for them. For primes l > 3, such results are due to N. Katz [K], but we follow Coleman’s approach via rigid geometry. Fix an integer N ≥ 5 which is not divisible by l. Let X 1 (N )/Zl denote the usual compactification of the moduli scheme for pairs (E, i), where E is an elliptic curve and i is an embedding µ N ,→ E[N ]. Also, let X 1 (N ; l)/Zl denote the usual α compactification of the moduli scheme for pairs (E, i, E → E 0 ), where E is an elliptic curve, i is an embedding µ N ,→ E[N ], and α : E → E 0 is an isogeny of degree l. There are two natural projections, π1 and π2 : X 1 (N ; l) → X 1 (N ), which α take (E, i, E → E 0 ) to (E, i) and (E 0 , α ◦ i), respectively. We let ω X 1 (N ) (resp., ω X 1 (N ;l) ) denote the canonical extension to the cusps of the pullback by the identity section of the sheaf of relative differentials of the universal elliptic curve over the noncuspidal locus of X 1 (N ) (resp., X 1 (N ; l)). 
Then $\pi_1^* \omega_{X_1(N)} = \omega_{X_1(N;l)}$ and there is a natural map $j = (\alpha^\vee)^* : \omega_{X_1(N;l)} \to \pi_2^* \omega_{X_1(N)}$. After one inverts $l$, $j$ becomes an isomorphism. We let $SS$ denote the finite set of points in $X_1(N)(\mathbf{F}_l^{ac})$ corresponding to supersingular elliptic curves. For $s \in SS$, choose $T_s \in \mathcal{O}_{X_1(N) \times W(\mathbf{F}_l^{ac}),s}$ so that
$$(X_1(N) \times W(\mathbf{F}_l^{ac}))^\wedge_s \cong \mathrm{Spf}\, W(\mathbf{F}_l^{ac})[[T_s]]$$
and so that if $\sigma \in \mathrm{Gal}(\mathbf{F}_l^{ac}/\mathbf{F}_l)$ and $s \in SS$, then $(1 \times \sigma^*)^*(T_{(1 \times \sigma^*)(s)}) = T_s$.
ON ICOSAHEDRAL ARTIN REPRESENTATIONS
(Here $W(k)$ denotes the Witt vectors of $k$.) Let $\mathbf{C}_l$ denote the completion of $\mathbf{Q}_l^{ac}$. We let $X_1(N)^{an}$ denote the rigid analytic space over $\mathbf{C}_l$ associated to $X_1(N)$. It is connected. If $r \in l^{\mathbf{Q}}$ and $1 \ge r \ge 1/l$, we let $X_1(N)_{\ge r}$ (if $r \ne 1/l$) (resp., $X_1(N)_{>r}$ (if $r \ne 1$)) denote the rigid analytic subspace of $X_1(N)^{an}$ where for each $s \in SS$ we remove all points $x$ in the residue disc of $s$ with $|T_s(x)|_l < r$ (resp., $\le r$). (Here $|\ |_l$ is the $l$-adic absolute value normalised by $|l|_l = 1/l$.)
LEMMA 2.1 The rigid space $X_1(N)_{\ge r}$ is connected.
Proof Suppose that $X_1(N)_{\ge r}$ has an admissible open cover $\{U, V\}$ with $U$ and $V$ nonempty and disjoint. For each $s \in SS$ the preimage of $s$ in $X_1(N)_{\ge r}$ is an annulus and hence connected and contained in either $U$ or $V$. Let $\widetilde{U}$ (resp., $\widetilde{V}$) denote the union of $U$ (resp., $V$) with the residue disc of each $s \in SS$ for which the preimage of $s$ in $X_1(N)_{\ge r}$ is contained in $U$ (resp., $V$). Then $\{\widetilde{U}, \widetilde{V}\}$ is an admissible open cover of $X_1(N)^{an}$ by disjoint nonempty sets, a contradiction.
We let $M_k^{\ge r}(N)$ (resp., $M_k^{>r}(N)$) denote the space of sections of $(\omega^{an}_{X_1(N)})^{\otimes k}$ over $X_1(N)_{\ge r}$ (resp., $X_1(N)_{>r}$). The spaces $M_k^{\ge r}(N)$ have natural norms making them Banach spaces. More precisely, we set
$$|f|_r = \sup_{x \in X_1(N)_{\ge r}(\mathbf{C}_l)} |f|_x,$$
where we define $|f|_x$ as follows. Let $\bar{x} \in X_1(N)(\mathbf{F}_l^{ac})$ denote the reduction of $x$, and let $f_0$ denote a local generator for $\omega^{\otimes k}_{X_1(N)}$ near $\bar{x}$. Then we set $|f|_x = |(f/f_0)(x)|_l$, which is easily checked to be independent of the choice of $f_0$. Note that if $r_1 \ge r_2$ and if $f \in M_k^{\ge r_2}(N)$, then $|f|_{r_1} \le |f|_{r_2}$. We let $X_1(N)^0$ denote the formal completion of $X_1(N)$ along its locally closed subscheme $X_1(N) \times \mathbf{F}_l - SS$. It is a formal scheme over $\mathbf{Z}_l$. The base change to $\mathbf{C}_l$ of the rigid analytic space associated to $X_1(N)^0$ is just $X_1(N)_{\ge 1}$. Thus we get an identification
$$\Gamma(X_1(N)^0, \omega^{\otimes k}_{X_1(N)}) \widehat{\otimes}_{\mathbf{Z}_l} \mathbf{C}_l \xrightarrow{\ \sim\ } M_k^{\ge 1}(N),$$
under which $\Gamma(X_1(N)^0, \omega^{\otimes k}_{X_1(N)}) \widehat{\otimes}_{\mathbf{Z}_l} \mathcal{O}_{\mathbf{C}_l}$ is identified to the unit ball in $M_k^{\ge 1}(N)$. There is a map $\mathrm{Spec}\, \mathbf{Z}_l((q)) \to X_1(N)$
corresponding to the pair $(\mathbf{G}_m/q^{\mathbf{Z}}, i_{can})$, where $\mathbf{G}_m/q^{\mathbf{Z}}$ denotes the Tate curve (Tate($q$) in the notation of [KM, Sec. 8.8]) and where $i_{can}$ comes from the tautological embedding $\mu_N \hookrightarrow \mathbf{G}_m$ (see [KM, Prop. 8.11.7]). This map extends to a map $\mathrm{Spec}\, \mathbf{Z}_l[[q]] \to X_1(N)$ (use [KM, Th. 8.11.10]), and this gives rise to a map $\mathrm{Spf}\, \mathbf{Z}_l[[q]] \to X_1(N)^0$. If $f \in \Gamma(X_1(N)^0, \omega^{\otimes k}_{X_1(N)})$, then its pullback to $\mathrm{Spf}\, \mathbf{Z}_l[[q]]$ has the form
$$\sum_{n=0}^{\infty} c_n(f) q^n (dt/t)^{\otimes k},$$
where $t$ is the usual parameter on $\mathbf{G}_m$ and where we refer to $\sum_{n=0}^{\infty} c_n(f) q^n$ as the q-expansion at infinity of $f$. This extends to a map
$$M_k^{\ge 1}(N) \to \mathbf{C}_l[[q]], \qquad f \mapsto \sum_{n=0}^{\infty} c_n(f) q^n.$$
From the q-expansion principle (see [K, Sec. 1.6] and note that $X_1(N) \times \mathbf{F}_l^{ac}$ is irreducible), we deduce that for $f \in M_k^{\ge 1}(N)$ we have
$$|f|_1 = \sup_n |c_n(f)|_l.$$
If $l \ge 5$, we let $E$ denote the section of $\omega^{\otimes (l-1)}_{X_1(N)}$ over $X_1(N)$ with q-expansion at infinity
$$1 - (2(l-1)/B_{l-1}) \sum_{n=1}^{\infty} \sigma_{l-2}(n) q^n,$$
where $B_k$ denotes the Bernoulli number, and $\sigma_t(n) = \sum_{0 < d \mid n} d^t$.
then the $E/f_s$ for $s \in SS$ form one possible choice for a collection of local parameters $T_s$ at $s \in SS$ satisfying $(1 \times \sigma^*)^* T_{(1 \times \sigma^*)(s)} = T_s$ for all $\sigma \in \mathrm{Gal}(\mathbf{F}_l^{ac}/\mathbf{F}_l)$ and $s \in SS$. Hence $E$ has no zero on $X_1(N)_{>1/l}$. If $l \ge 3$, we set $E' = E$. If $l = 2$, we take $E'$ to be the section of $\omega^{\otimes 4}_{X_1(N)}$ over $X_1(N)$ with q-expansion at infinity
$$1 + 240 \sum_{n=1}^{\infty} \sigma_3(n) q^n.$$
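As a sanity check on the two Eisenstein q-expansions just written down, the following Python sketch computes their first coefficients exactly and tests the congruences to 1 that the text uses. The function names (`bernoulli`, `sigma`, `eisenstein_coefficients`, `l_adic_valuation`) are illustrative choices and not notation from the paper; the Bernoulli numbers are computed with the standard recurrence, using the convention $B_1 = -1/2$.

```python
from fractions import Fraction
from math import comb


def bernoulli(n):
    """Return the Bernoulli number B_n (convention B_1 = -1/2) as an exact Fraction."""
    values = [Fraction(1)]
    for m in range(1, n + 1):
        # Recurrence: sum_{j=0}^{m} C(m+1, j) B_j = 0 for m >= 1.
        acc = sum(comb(m + 1, j) * values[j] for j in range(m))
        values.append(-acc / (m + 1))
    return values[n]


def sigma(t, n):
    """Divisor power sum: sum of d**t over the positive divisors d of n."""
    return sum(d ** t for d in range(1, n + 1) if n % d == 0)


def eisenstein_coefficients(weight, n_max):
    """Coefficients c_0..c_{n_max} of 1 - (2*weight/B_weight) * sum sigma_{weight-1}(n) q^n."""
    factor = -2 * weight / bernoulli(weight)
    return [Fraction(1)] + [factor * sigma(weight - 1, n) for n in range(1, n_max + 1)]


def l_adic_valuation(x, l):
    """Valuation at the prime l of a nonzero rational number x."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("valuation of 0 is infinite")
    num, den, v = x.numerator, x.denominator, 0
    while num % l == 0:
        num //= l
        v += 1
    while den % l == 0:
        den //= l
        v -= 1
    return v
```

With weight $l - 1$ this reproduces the expansion of $E$, and with weight 4 it reproduces $1 + 240 \sum \sigma_3(n) q^n$; the check is only numerical, on finitely many coefficients.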
In either case the q-expansion at infinity of $E'$ is congruent to 1 modulo $l$ and $E'$ has no zeros in $X_1(N)_{>l^{-1/4}}$. We recall some elementary results about rigid analytic functions on annuli. The set of analytic functions on the annulus $\beta \le |z|_l \le \alpha$ is the set of functions
$$f(z) = \sum_{n=-\infty}^{\infty} a_n z^n$$
for which $|a_n|_l \beta^n \to 0$ as $n \to -\infty$ and $|a_n|_l \alpha^n \to 0$ as $n \to \infty$.
LEMMA 2.2 If $r \in l^{\mathbf{Q}}$ and $\beta \le r \le \alpha$, then the supremum of $|f(z)|_l$ on $|z|_l = r$ equals
$$\sup_n |a_n|_l r^n.$$
Proof Set $A = \sup_n |a_n|_l r^n$. Then
$$\sup_{|z|_l = r} |f(z)|_l = A \sup_{|w|_l = 1} \Big| \sum_{|a_n|_l r^n = A} c_n w^n \Big|_l$$
for some $c_n$ with $|c_n|_l = 1$. However, for $|w|_l = 1$ we see that
$$\Big| \sum_{|a_n|_l r^n = A} c_n w^n \Big|_l \le 1$$
with equality for some such choice of $w$, which is enough.
In particular, we see that if $f(z) = \sum_{n=-\infty}^{\infty} a_n z^n$ is a function on the annulus $\beta \le |z|_l \le \alpha$, then $|f(z)|_l$ always achieves its maximum on either $|z|_l = \alpha$ or $|z|_l = \beta$ (or possibly on both). In the former case this maximum equals
$$\sup_n |a_n|_l \alpha^n = \sup_{n \ge 0} |a_n|_l \alpha^n,$$
and in the latter case it equals
$$\sup_n |a_n|_l \beta^n = \sup_{n \le 0} |a_n|_l \beta^n.$$
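The quantity $\sup_n |a_n|_l r^n$ from Lemma 2.2 is easy to compute for a Laurent polynomial with rational coefficients. The following Python sketch does this exactly; it is only an illustration, restricted (as an assumption made here for exactness) to rational coefficients and rational radii, and the names `l_adic_abs` and `circle_sup_norm` are hypothetical.

```python
from fractions import Fraction


def l_adic_abs(x, l):
    """l-adic absolute value of a rational x, normalised by |l|_l = 1/l."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    num, den, v = x.numerator, x.denominator, 0
    while num % l == 0:
        num //= l
        v += 1
    while den % l == 0:
        den //= l
        v -= 1
    return Fraction(l) ** (-v)


def circle_sup_norm(coeffs, l, r):
    """sup_n |a_n|_l r^n for a Laurent polynomial given as a dict {n: a_n}."""
    r = Fraction(r)
    return max(l_adic_abs(a, l) * r ** n for n, a in coeffs.items() if a != 0)
```

For instance, with $l = 3$ and $f(z) = 9 z^{-1} + 1 + z^2/3$ on the annulus $1/3 \le |z|_3 \le 3$, the norm on the outer circle comes from the term with $n \ge 0$, in line with the remark above.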
Suppose now that $f$ is an analytic function on the annulus $\beta \le |z|_l < \alpha$ such that $|f(z)|_l$ is bounded by $A$. Then we have
$$f(z) = \sum_{n=-\infty}^{\infty} a_n z^n,$$
where $|a_n|_l \beta^n \to 0$ as $n \to -\infty$ and where for all $n$ we have $|a_n|_l \le A \alpha^{-n}$ and $|a_n|_l \le A \beta^{-n}$. If $|f(z)|_l$ achieves its supremum, it does so on $|z|_l = \beta$ and the supremum equals
$$\sup_n |a_n|_l \beta^n = \sup_{n \le 0} |a_n|_l \beta^n.$$
LEMMA 2.3 Suppose that 1 > r > 1/l, that r ∈ l Q , and that f is a rigid analytic function on X 1 (N )≥r . Then | f (x)|l achieves its supremum and does so at some point y that reduces to an element s ∈ SS and satisfies |Ts (y)|l = r .
Proof Because X 1 (N )≥r is a finite union of affinoids, the maximum modulus principle tells us that | f (x)|l does achieve its supremum. Thus we may assume that this supremum equals 1. If | f (x)|l does not achieve its supremum in X 1 (N )≥1 , then it does so in the inverse image under reduction of some s ∈ SS and the lemma follows from the facts about rigid analytic functions on annuli which we recalled above. Thus, suppose that f achieves its maximum in X 1 (N )≥1 . As | f (x)|l ≤ 1 on X 1 (N )≥1 , f is a global section of the structure sheaf of the formal completion of X 1 (N ) × OCl along X 1 (N ) × Flac − SS, and it thus reduces to give a regular function f on X 1 (N ) × Flac − SS. Thus we may choose s ∈ SS such that either f has a pole at s or f is constant. Choose also an affine neighbourhood U of s in X 1 (N ) × Flac which contains no other element of SS and which admits a regular function g that has a simple zero at s and no other zero on U . Let the formal completion of X 1 (N )×W (Flac ) along U equal Spf A, and let g ∈ A be a lift of g. Note that the formal completion of X 1 (N )× W (Flac ) at s is isomorphic to Spf W (Flac )[[g]].
The formal completion of $X_1(N) \times \mathcal{O}_{\mathbf{C}_l}$ along $U - \{s\}$ is $\mathrm{Spf}\, (A \widehat{\otimes} \mathcal{O}_{\mathbf{C}_l})\langle\langle S \rangle\rangle/(gS - 1)$. Thus we may expand $f$ as
$$\sum_{i=0}^{\infty} f_i S^i$$
with $f_i \in (A \widehat{\otimes} \mathcal{O}_{\mathbf{C}_l})$ and $f_i \to 0$ as $i \to \infty$. The same expansion holds on the rigid analytic subspace of $X_1(N)_{\ge r}$ consisting of points that reduce to $U$ (as this space is connected, being the inverse image under reduction of a Zariski-connected space). Moreover, on $U$ we see that
$$\bar{f} = \sum_{i=0}^{\infty} \bar{f}_i \bar{g}^{-i},$$
where $\bar{f}_i$ denotes the reduction of $f_i$ and where now the sum is finite. In the formal completion of $X_1(N) \times \mathcal{O}_{\mathbf{C}_l}$ at $s$, we may expand
$$f_i = \sum_{j=0}^{\infty} a_{ij} g^j$$
with $a_{ij} \in \mathcal{O}_{\mathbf{C}_l}$. Thus, on the rigid analytic subspace of $X_1(N)_{\ge r}$ consisting of points that reduce to $s$, we see that
$$f = \sum_{k=-\infty}^{\infty} \Big( \sum_i a_{i,i+k} \Big) g^k.$$
(The second sum is over $i \in \mathbf{Z}$ such that $i \ge 0$ and $i + k \ge 0$.) Similarly, we see that in the formal completion of $X_1(N) \times \mathbf{F}_l^{ac}$ at $s$ we have
$$\bar{f} = \sum_{k=-\infty}^{\infty} \Big( \sum_i \bar{a}_{i,i+k} \Big) \bar{g}^k.$$
Write $b_k$ for $\sum_i a_{i,i+k}$. Then $b_k \in \mathcal{O}_{\mathbf{C}_l}$ and either
• $b_k$ is a unit for some $k < 0$, or
• $b_0$ is a unit and $b_k$ reduces to zero for all $k \ne 0$.
In either case we see that the supremum of $|f(x)|_l$ on $|g(x)|_l = r$ (i.e., on $|T_s(x)|_l = r$) is greater than or equal to 1, as desired.
LEMMA 2.4 If $1 > r > 1/l$, then there is a constant $C$ (depending on $k$, $N$, and $r$) such that for all $f \in M_k^{\ge r}(N)$ we have
$$|f|_r \le C \sup_{s,x} |f|_x,$$
where $s$ runs over $SS$ and where $x$ runs over elements of the residue disc of $s$ with $|T_s(x)|_l = r$.
Proof If $l = 2$, reduce to the case when $5 \mid N$ by passing to a cover. By Lemma 2.3 we see that $|f^{l-1}/E^k|_l$ on $X_1(N)_{\ge r}$ achieves its supremum at some point $x$ that reduces to some $s \in SS$ and that satisfies $|T_s(x)|_l = r$. Thus for all $y \in X_1(N)_{\ge r}$ we have
$$|f|_y^{l-1}/|E|_y^k \le \sup_{s,x} |f|_x^{l-1}/|E|_x^k,$$
where $s$ and $x$ run over the sets described in the statement of the lemma. Hence
$$|f|_r^{l-1} \le |E|_r^k \sup_{s,x} \big( |f|_x^{l-1}/r^k \big),$$
where again $s$ and $x$ run over the sets described in the statement of the lemma. The lemma follows with $C = (|E|_r/r)^{k/(l-1)}$.
For each $s \in SS$, choose a local generator $f_s$ of $\omega^{\otimes k}_{X_1(N)}$ near $s$. If $f \in M_k^{\ge r}(N)$ and $s \in SS$, then restricting $f$ to the annulus $1 > |T_s(x)|_l \ge r$ in the residue disc of $s$ we see that $f/f_s$ can be expanded as
$$f/f_s = \sum_{n=-\infty}^{\infty} a_n(s,f) T_s^n,$$
where the $a_n(s,f)$ are bounded for $n > 0$ and where $|a_n(s,f)|_l r^n \to 0$ as $n \to -\infty$. Choose a nonnegative integer $M$ such that $r^{-M} > C$ (the constant from the lemma above), and choose $\pi_r \in \mathbf{C}_l$ with $|\pi_r|_l = r$. Now consider the map $\Theta$ from $M_k^{\ge r}(N)$ to the direct sum of $\#SS$ Tate algebras $\mathbf{C}_l\langle T \rangle^{SS}$ which sends $f$ to
$$\Big( \sum_{n=0}^{\infty} a_{M-n}(s,f) \pi_r^{M-n} T^n \Big)_{s \in SS}.$$
One clearly has $|\Theta(f)| \le |f|_r$. (Here, as usual, we set $|(\sum_n b_n(s) T^n)_{s \in SS}| = \sup_{s,n} |b_n(s)|_l$.) On the other hand, for all $n \in \mathbf{Z}$ and $s \in SS$ we have $|a_n(s,f) \pi_r^n|_l \le |\Theta(f)|$; if this were false, then we could choose $s$ and $n$ so that $|a_n(s,f) \pi_r^n|_l$ is maximal. Then we must have $n > M$ and we see that
$$|f|_r \ge |a_n(s,f)|_l = r^{-n} \sup_{s,x} |f|_x > C \sup_{s,x} |f|_x \ge |f|_r,$$
a contradiction. Thus
$$C |\Theta(f)| \ge C \sup_{s,x} |f|_x \ge |f|_r.$$
We deduce that $\Theta$ is a homeomorphism onto a closed subspace of $\mathbf{C}_l\langle T \rangle^{SS}$.
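LEMMA 2.5 Suppose that $1 > r_1 > r_2 > 1/l$. Then the natural inclusion
$$M_k^{\ge r_2}(N) \hookrightarrow M_k^{\ge r_1}(N)$$
is completely continuous.
Proof We have a commutative diagram
$$\begin{array}{ccc} M_k^{\ge r_2}(N) & \hookrightarrow & M_k^{\ge r_1}(N) \\ \downarrow & & \downarrow \\ \mathbf{C}_l\langle T \rangle^{SS} & \longrightarrow & \mathbf{C}_l\langle T \rangle^{SS} \\ \big( \sum_{n=0}^{\infty} b_n(s) T^n \big)_{s \in SS} & \longmapsto & \big( \sum_{n=0}^{\infty} b_n(s) (\pi_{r_2}/\pi_{r_1})^n T^n \big)_{s \in SS} \end{array}$$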
where the vertical arrows are homeomorphisms onto closed subspaces (and where we have made the same choice of $M$ to define both vertical arrows). The lower horizontal arrow is a limit of continuous operators with finite range and hence completely continuous. It follows that the upper horizontal arrow is completely continuous.
The reduction $X_1(N;l) \times \mathbf{F}_l^{ac}$ of $X_1(N;l)$ has two irreducible components that we denote $X_1(N;l)^{\infty}$ and $X_1(N;l)^0$. We choose the labelling so that
• $\pi_1 : X_1(N;l)^{\infty} \xrightarrow{\ \sim\ } X_1(N) \times \mathbf{F}_l^{ac}$,
• $\pi_2 : X_1(N;l)^{\infty} \to X_1(N) \times \mathbf{F}_l^{ac}$ has degree $l$,
• $\pi_1 : X_1(N;l)^0 \to X_1(N) \times \mathbf{F}_l^{ac}$ has degree $l$, and
• $\pi_2 : X_1(N;l)^0 \xrightarrow{\ \sim\ } X_1(N) \times \mathbf{F}_l^{ac}$.
The two curves $X_1(N;l)^{\infty}$ and $X_1(N;l)^0$ intersect in a finite number of points which we denote $SS_l$. Then $\pi_1 : SS_l \xrightarrow{\sim} SS$ and $\pi_2 : SS_l \xrightarrow{\sim} SS$ are both bijections (see, for instance, [KM, Lem. 5.3.1] for these assertions). If $s \in SS_l$, we write $T_{s,i}$ for $\pi_i^* T_{\pi_i s}$.
LEMMA 2.6 If $s \in SS_l$, then $(X_1(N;l) \times W(\mathbf{F}_l^{ac}))^\wedge_s$ is isomorphic to
$$\mathrm{Spf}\, W(\mathbf{F}_l^{ac})[[T_{s,1}, T_{s,2}]]/((T_{s,1} - T_{s,2}^l)(T_{s,2} - T_{s,1}^l) - l u_s)$$
for some $u_s \in W(\mathbf{F}_l^{ac})[[T_{s,1}, T_{s,2}]]^\times$.
Proof [KM, Th. 6.6.2] tells us that $(X_1(N;l) \times W(\mathbf{F}_l^{ac}))^\wedge_s \cong \mathrm{Spf}\, R$ for some 2-dimensional,
regular complete local ring $R$, which is flat over $W(\mathbf{F}_l^{ac})$. [KM, Th. 13.4.7] tells us that
$$R/lR \cong \mathbf{F}_l^{ac}[[T_{s,1}, T_{s,2}]]/((T_{s,1} - T_{s,2}^l)(T_{s,2} - T_{s,1}^l)).$$
Thus we have a surjection $W(\mathbf{F}_l^{ac})[[T_{s,1}, T_{s,2}]] \twoheadrightarrow R$ and the kernel must be generated by one element $f$ with
• $f \equiv (T_{s,1} - T_{s,2}^l)(T_{s,2} - T_{s,1}^l) \bmod l$, and
• $f \notin (l, T_{s,1}, T_{s,2})^2$.
The lemma follows.
COROLLARY 2.7 If $s \in SS_l$, then
$$(X_1(N;l) \times W(\mathbf{F}_l^{ac}))^\wedge_s \cong \mathrm{Spf}\, W(\mathbf{F}_l^{ac})[[X_1, X_2]]/(X_1 X_2 - l).$$
Proof Take, for instance, $X_1 = (T_{s,1} - T_{s,2}^l)$ and $X_2 = (T_{s,2} - T_{s,1}^l) u_s^{-1}$.
For $r \in l^{\mathbf{Q}}$ and $1 \ge r > 1/l$, we define $X_1(N;l)^{\infty}_{\ge r}$ (resp., $X_1(N;l)^0_{\ge r}$) to be the admissible open subset of $X_1(N;l)^{an}$ consisting of
• all points of $X_1(N;l)^{an}$ which reduce to a point of $X_1(N;l)^{\infty} - SS_l$ (resp., $X_1(N;l)^0 - SS_l$), and
• all points $x \in X_1(N;l)^{an}$ which reduce to some $s \in SS_l$ and for which $|T_{s,1}(x) - T_{s,2}(x)^l|_l \ge r$ (resp., $|T_{s,2}(x) - T_{s,1}(x)^l|_l \ge r$).
If in fact $1 > r^2 > 1/l$ and $s \in SS_l$, then we let $U_s(r)$ denote the admissible open subset of $X_1(N;l)^{an}$ consisting of points that reduce to $s$ and that satisfy $|T_{s,1}(x) - T_{s,2}(x)^l|_l \le r$ and $|T_{s,2}(x) - T_{s,1}(x)^l|_l \le r$. It is easy to check that these sets do not depend on the choice of $\{T_s\}$ as long as the choices satisfy $(1 \times \sigma^*)^*(T_{(1 \times \sigma^*)(s)}) = T_s$ for $\sigma \in \mathrm{Gal}(\mathbf{F}_l^{ac}/\mathbf{F}_l)$.
LEMMA 2.8 If $r_1, r_2, r_3 \in l^{\mathbf{Q}}$, $1 > r_1^2 > 1/l$, $r_1 > r_2 > 1/l$, and $r_1 > r_3 > 1/l$, then the sets
• $X_1(N;l)^{\infty}_{\ge r_2}$,
• $X_1(N;l)^0_{\ge r_3}$, and
• for each $s \in SS_l$ the set $U_s(r_1)$
form an admissible cover of $X_1(N;l)^{an}$ by connected admissible open subsets.
Proof This seems to be very well known, but as we are unable to find a reference, let us sketch the argument. Take an affine Zariski cover U 0 , U ∞ , and Us for s ∈ SSl of X 1 (N ; l) × Flac , where for s ∈ SSl we have SSl ∩ Us = {s}, where U 0 = X 1 (N ; l) × Flac − X 1 (N ; l)∞ , and where U ∞ = X 1 (N ; l) × Flac − X 1 (N ; l)0 . Shrinking Us if necessary, choose a regular function xs0 on Us which is identically zero on X 1 (N ; l)∞ ∩ Us and nonzero on (X 1 (N ; l)0 ∩ Us ) − {s} with a simple zero at s. We can lift xs0 to some affine open subset of X 1 (N ; l) × W (Flac ) which intersects the special fibre in Us . Set xs∞ = p/xs0 . In (X 1 (N ; l) × W (Flac ))∧ s we have P∞ P∞ xs0 = i=1 ai X 2i + l f = X 2 ( i=1 ai X 2i−1 + X 1 f ); that is, xs0 is X 2 times a unit (the same X 1 , X 2 as in Corollary 2.7). Thus, again shrinking Us if necessary, we may assume that xs∞ is regular on Us , identically zero on X 1 (N ; l)0 ∩ Us , and nonzero on ∞ (X 1 (N ; l)∞ ∩ Us ) − {s}. Moreover, in (X 1 (N ; l) × W (Flac ))∧ s , x s is a unit times X 1 . We let U∞ (resp., U0 , resp., Us ) denote the preimage in X 1 (N ; l)an of U∞ (resp., U0 , resp., Us ). They form an admissible affinoid cover of X 1 (N ; l)an . For r ∈ l Q and 0 ∞ ⊂ U ) to be the locus where |x 0 | ≥ r 1 ≥ r > 1/l, set Us,≥r ⊂ Us (resp., Us,≥r s s l ∞ (resp., |xs |l ≥ r ). Note also that Us (r ) is the subspace of Us where |xs0 |l ≤ r and 0 |xs∞ |l ≤ r . Note that X 1 (N ; l)0≥r (resp., X 1 (N ; l)∞ ≥r ) is the union of U0 and Us,≥r ∞ for s ∈ SS ). If r , r , r ∈ l Q , 1 > r 2 > 1/l, for s ∈ SSl (resp., U∞ and Us,≥r l 1 2 3 1 ∞ ,U0 r1 > r2 > 1/l, and r1 > r3 > 1/l, then Us,≥r , and U (r ) form an admiss 1 s,≥r3 2 0 , and U (r ) for s ∈ SS sible affinoid cover of Us . Thus X 1 (N ; l)∞ , X (N ; l) s 1 l 1 ≥r2 ≥r3 form an admissible open cover of X 1 (N ; l)an . It remains to show that for r ∈ l Q and 1 ≥ r > 1/l the spaces X 1 (N ; l)0≥r and X 1 (N ; l)∞ ≥r are connected. 
To save on notation, we explain only the case of $X_1(N;l)^0_{\ge r}$. It suffices to check that $U_0$ and $U^0_{s,\ge r}$ for $s \in SS_l$ are all connected. This follows because in each case the reduction map gives a continuous map with connected fibres to a connected (in the Zariski topology) space.
If $r \in l^{\mathbf{Q}}$ and $1 \ge r > l^{-l/(1+l)}$, then it is easy to check that
$$\pi_1^{-1} X_1(N)_{\ge r} = X_1(N;l)^{\infty}_{\ge r} \amalg X_1(N;l)^0_{\ge r^{1/l}}$$
and
$$\pi_2^{-1} X_1(N)_{\ge r} = X_1(N;l)^{\infty}_{\ge r^{1/l}} \amalg X_1(N;l)^0_{\ge r}.$$
Moreover, $X_1(N;l)^{\infty}_{\ge r}$ and $X_1(N;l)^0_{\ge r^{1/l}}$ (resp., $X_1(N;l)^0_{\ge r}$ and $X_1(N;l)^{\infty}_{\ge r^{1/l}}$) form an admissible open cover of $\pi_1^{-1} X_1(N)_{\ge r}$ (resp., $\pi_2^{-1} X_1(N)_{\ge r}$). As $X_1(N;l) \to X_1(N)$ is finite flat of degree $l + 1$, the same is true of the analytifications. Thus
$$\pi_1 : X_1(N;l)^{\infty}_{\ge r} \amalg X_1(N;l)^0_{\ge r^{1/l}} \to X_1(N)_{\ge r}$$
and
$$\pi_2 : X_1(N;l)^{\infty}_{\ge r^{1/l}} \amalg X_1(N;l)^0_{\ge r} \to X_1(N)_{\ge r}$$
are both finite and flat of degree $l + 1$. Looking at the cardinality of the preimages of points, we deduce the following lemma.
LEMMA 2.9
(1) Suppose that $r \in l^{\mathbf{Q}}$ and $1 \ge r > l^{-l/(1+l)}$; then
$$\pi_1 : X_1(N;l)^{\infty}_{\ge r} \xrightarrow{\ \sim\ } X_1(N)_{\ge r} \quad \text{and} \quad \pi_2 : X_1(N;l)^0_{\ge r} \xrightarrow{\ \sim\ } X_1(N)_{\ge r}.$$
(2) Suppose that $r \in l^{\mathbf{Q}}$ and $1 \ge r > l^{-1/(1+l)}$; then
$$\pi_2 : X_1(N;l)^{\infty}_{\ge r} \to X_1(N)_{\ge r^l} \quad \text{and} \quad \pi_1 : X_1(N;l)^0_{\ge r} \to X_1(N)_{\ge r^l}$$
are both finite flat of degree $l$.
We define a bounded linear map
$$U = (1/l)\, \mathrm{tr}_{\pi_2} \circ j \circ \pi_1|^{-1}_{X_1(N;l)^{\infty}_{\ge r}} : M_k^{\ge r}(N) \to M_k^{\ge r^l}(N).$$
One may check that $U$ is compatible with the map on q-expansions which sends
$$\sum_{n=0}^{\infty} a_n q^n \mapsto \sum_{n=0}^{\infty} a_{nl} q^n.$$
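On truncated q-expansions the map $\sum a_n q^n \mapsto \sum a_{nl} q^n$ is simply "keep every $l$-th coefficient". A minimal Python sketch follows; the names `u_operator` and `l_valuation` are illustrative and not from the paper. The usage note builds a toy expansion with $a_{nl} = a_l a_n$ for all $n$, the shape of hypothesis that appears later in Lemma 2.12, and checks that it is an eigenvector with eigenvalue $a_l$ on the coefficients available.

```python
def u_operator(coeffs, l):
    """Map [a_0, a_1, ..., a_m] to [a_0, a_l, a_{2l}, ...] (coefficients of q^n are a_{nl})."""
    return coeffs[::l]


def l_valuation(n, l):
    """Exponent of the prime l in the positive integer n."""
    v = 0
    while n % l == 0:
        n //= l
        v += 1
    return v
```

For example, taking $l = 2$, $a_0 = 0$ and $a_n = 3^{v_2(n)}$ for $n \ge 1$ gives $a_{2n} = 3 a_n$, so applying `u_operator` to the first 17 coefficients returns 3 times the first 9 coefficients.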
Note that for 1 ≥ r ≥ l −l/(1+l) , using π1 to identify X 1 (N ; l)∞ ≥r and X 1 (N )≥r , we get a map Hom(h k (N ; l), Cl ) ,→ Mk≥r (N )
which sends $f$ to the form with q-expansion at infinity
$$\sum_{n=1}^{\infty} f(T(n)) q^n.$$
Under this map the Hecke operator $U_l$ corresponds to the linear map $U$.
Suppose that $1 > r > l^{-1/(1+l)}$. Combining $U : M_k^{\ge r}(N) \to M_k^{\ge r^l}(N)$ with the inclusion $M_k^{\ge r^l}(N) \hookrightarrow M_k^{\ge r}(N)$, we get a continuous endomorphism of $M_k^{\ge r}(N)$, which we also denote $U$. It follows from Lemma 2.5 that $U$ is completely continuous as an endomorphism of $M_k^{\ge r}(N)$. From the theory of completely continuous operators on p-adic Banach spaces (see [S1]), we see that we may write
$$M_k^{\ge r}(N) = M_k^{\ge r}(N)^0 \oplus M_k^{\ge r}(N)^1$$
as a direct sum of $U$-invariant subspaces, where $M_k^{\ge r}(N)^0$ is finite-dimensional, all the eigenvalues of $U|_{M_k^{\ge r}(N)^0}$ are $l$-adic units, and $U|_{M_k^{\ge r}(N)^1}$ is topologically nilpotent (i.e., if $f \in M_k^{\ge r}(N)^1$, then $U^r f \to 0$ as $r \to \infty$). We let $e$ denote projection onto the summand $M_k^{\ge r}(N)^0$, so that
$$e f = \lim_{r \to \infty} U^{r!} f.$$
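The limit $e = \lim U^{r!}$ can be seen concretely in a finite-dimensional toy model. The Python sketch below is an illustration only: it replaces the Banach space by integer matrices modulo $l^m$, and the function names and the cutoff `t_max` are assumptions made for the example, not part of the paper. It computes $U^{t!}$ by repeated powering and returns the result, which for the sample matrix is the projector onto the unit-eigenvalue part.

```python
def mat_mul(a, b, mod):
    """Product of two square integer matrices, reduced modulo mod."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) % mod for j in range(n)]
            for i in range(n)]


def mat_pow(a, e, mod):
    """a**e modulo mod by binary exponentiation."""
    n = len(a)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    base = [[x % mod for x in row] for row in a]
    while e > 0:
        if e & 1:
            result = mat_mul(result, base, mod)
        base = mat_mul(base, base, mod)
        e >>= 1
    return result


def ordinary_projector(u, l, precision, t_max=12):
    """Approximate lim U^{t!} modulo l**precision by computing U^{t_max!}."""
    mod = l ** precision
    p = [[x % mod for x in row] for row in u]
    for t in range(2, t_max + 1):
        p = mat_pow(p, t, mod)  # U^{t!} = (U^{(t-1)!})^t
    return p
```

For $U = \begin{pmatrix} 3 & 1 \\ 0 & 2 \end{pmatrix}$ and $l = 2$, the eigenvalue 3 is a 2-adic unit and the eigenvalue 2 is topologically nilpotent; modulo $2^6$ the sketch returns the idempotent with image the 3-eigenline and kernel the 2-eigenline.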
LEMMA 2.10 If $f \in M_k^{\ge r}(N)^0$ for some $1 > r > l^{-1/(1+l)}$, then
$$f \in M_k^{>l^{-l/(1+l)}}(N).$$
Proof Choose a minimal integer $i$ such that $r^{l^i} \le l^{-1/(1+l)}$, and write $f = U^{i+1} f'$ for some $f' \in M_k^{\ge r}(N)^0$. Then we see that
$$U^i f' \in M_k^{\ge r^{l^i}}(N) \subset M_k^{>l^{-1/(1+l)}}(N)$$
and hence that
$$f = U(U^i f') \in M_k^{>l^{-l/(1+l)}}(N).$$
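The choice of the minimal $i$ in this proof is a small piece of arithmetic on exponents. Writing $r = l^{-a}$ with $0 < a < 1/(1+l)$ (this parametrisation is an assumption made here for illustration), the condition $r^{l^i} \le l^{-1/(1+l)}$ becomes $a l^i \ge 1/(1+l)$. The following Python sketch, with the hypothetical name `minimal_iterations`, finds that $i$ exactly.

```python
from fractions import Fraction


def minimal_iterations(a, l):
    """Smallest i >= 0 with a * l**i >= 1/(1+l), where r = l**(-a) and 0 < a < 1/(1+l)."""
    a = Fraction(a)
    threshold = Fraction(1, 1 + l)
    if not 0 < a < threshold:
        raise ValueError("need 0 < a < 1/(1+l), i.e. 1 > r > l^(-1/(1+l))")
    i = 0
    while a * l ** i < threshold:
        i += 1
    return i
```

Minimality means $a l^{i-1} < 1/(1+l)$, so each of the first $i$ applications of $U$ starts from a radius still above $l^{-1/(1+l)}$, which is the range in which $U$ was defined.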
LEMMA 2.11 Suppose that $1 > r \ge l^{-1/(1+l)}$, that $f \in M_k^{\ge r}(N)$, that $a \in \mathbf{C}_l$ is an $l$-adic unit, and that $\epsilon \in \mathbf{R}_{>0}$. If
$$|Uf - af|_1 \le \epsilon,$$
then $|f - ef|_1 \le \epsilon$.
Proof For all positive integers $t$, we see that $|U^{t!} f - a^{t!} f|_1 \le \epsilon$. Taking the limit as $t \to \infty$ and noting that $|\ |_1 \le |\ |_r$, the lemma follows. (We remark that $a^{t!} \to 1$ as $t \to \infty$.)
LEMMA 2.12 Suppose we are given an integer $k$ and a formal q-expansion
$$\sum_{n=1}^{\infty} a_n q^n \in \mathbf{C}_l[[q]]$$
such that for all $n$ we have $a_{nl} = a_l a_n$ and such that $a_l$ is an $l$-adic unit. Suppose we also have two series of positive integers $t_i$ and $k_i$ and a series of abelian group homomorphisms $f_i : h_{k_i}(N;l) \to \mathbf{C}_l$ such that
(1) $t_i \to \infty$ as $i \to \infty$,
(2) $k_i \equiv k \bmod (l-1) l^{t_i - 1}$, and
(3) for all positive integers $n$ and for all $i$, we have $f_i(T(n)) \equiv a_n \bmod l^{t_i}$.
Then $\sum a_n q^n$ is the q-expansion at infinity of an element of $M_k^{>l^{-l/(1+l)}}(N)$.
Proof By Lemma 2.10 we need only show that $\sum a_n q^n$ is the q-expansion at infinity of an element of $M_k^{\ge r}(N)$ for some $r < 1$. Choose such an $r$ with $r > l^{-1/4}$ and $r > l^{-1/(1+l)}$. We may suppose that each $t_i \ge 3$. Set $h = 4$ if $l = 2$, and $h = l - 1$ otherwise. Then $f_i$ corresponds to an element of $M_{k_i}^{\ge r}(N)$ which we also denote by $f_i$. Moreover, $f_i/(E')^{(k_i - k)/h} \in M_k^{\ge r}(N)$ and has q-expansion at infinity congruent to $\sum_n a_n q^n$ modulo $l^{t_i}$. (If $l = 2$, note that $E'$ is congruent to 1 modulo $2^4$.) Thus $e(f_i/(E')^{(k_i - k)/h}) \in M_k^{\ge r}(N)^0$ also has q-expansion at infinity congruent to $\sum_n a_n q^n$ modulo $l^{t_i}$. As $M_k^{\ge r}(N)^0$ is finite-dimensional, all $l$-adic norms are equivalent. The $e(f_i/(E')^{(k_i - k)/h})$ form a Cauchy sequence for $|\ |_1$ and hence also for $|\ |_r$. Let $f \in M_k^{\ge r}(N)^0$ denote the limit of the $e(f_i/(E')^{(k_i - k)/h})$ in both of these norms. Then $f$ has q-expansion at infinity $\sum_n a_n q^n$, as desired.
Finally, we state the generalisation of [BT, Th. 4] to $l = 2$ and 3. Although in [BT] there is a running hypothesis that $l \ge 5$, the proof of this theorem given there makes no use of that hypothesis.
THEOREM 2.13 Let $N$ and $k$ denote integers with $N \ge 5$. Let $l \nmid N$ be a prime. Suppose $\alpha$ and $\beta$ are distinct nonzero elements of $\mathbf{C}_l$ and that $f_\alpha, f_\beta \in M_k^{>l^{-l/(1+l)}}(N)$ are eigenvectors for $U$ with eigenvalues $\alpha$ and $\beta$. Suppose also that $f_\alpha$ (resp., $f_\beta$) have q-expansions at infinity $\sum_{n \ge 1} a_n(f_\alpha) q^n$ (resp., $\sum_{n \ge 1} a_n(f_\beta) q^n$), and that for all positive integers $n$ not divisible by $l$, we have $a_n(f_\alpha) = a_n(f_\beta)$.
Then $f = (\alpha f_\alpha - \beta f_\beta)/(\alpha - \beta)$ is classical; that is, there is an abelian group homomorphism $f' : h_k(N) \to \mathbf{C}_l$ such that for all $n$,
$$f'(T(n)) = (\alpha a_n(f_\alpha) - \beta a_n(f_\beta))/(\alpha - \beta).$$
3. 2-adic Hida theory and deformation theory
In this section we draw together some results about 2-adic Hida theory which are not well documented in the literature, and we deduce some slight extensions of the results of [Di]. If $N \ge 5$ is an odd positive integer, we let $h^0(N)$ denote
$$\varprojlim_r e(h_2(2^r N) \otimes_{\mathbf{Z}} \mathbf{Z}_2),$$
where $e$ denotes H. Hida's idempotent $e = \lim_{t \to \infty} U_2^{t!}$. Taking the limit of the homomorphisms
$$\langle\ \rangle : (\mathbf{Z}/2^r N \mathbf{Z})^\times \to e(h_2(2^r N) \otimes_{\mathbf{Z}} \mathbf{Z}_2)^\times,$$
we get a continuous homomorphism
$$S = S^2 \times S_2 : (\mathbf{Z}/N\mathbf{Z})^\times \times \mathbf{Z}_2^\times \to h^0(N)^\times.$$
We let $\Lambda$ denote the completed group ring $\mathbf{Z}_2[[(1 + 4\mathbf{Z}_2)]] \cong \mathbf{Z}_2[[T]]$, where $T + 1$ is identified with the element 5 of $1 + 4\mathbf{Z}_2$. Then $S_2$ induces a continuous homomorphism $\Lambda \to h^0(N)$. According to [Hi, Ths. 3.3 and 3.4], $h^0(N)$ is a finitely generated, torsion-free $\Lambda$-module and for any integer $k \ge 2$ we have a surjection
$$h^0(N)/(S_2(5) - 5^{k-2}) \twoheadrightarrow e(h_k(4N) \otimes_{\mathbf{Z}} \mathbf{Z}_2)$$
which sends $T(n)$ to $T(n)$ for all $n$ and which becomes an isomorphism after tensoring with $\mathbf{Q}_2$.
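The identification of $T + 1$ with 5 relies on 5 being a topological generator of $1 + 4\mathbf{Z}_2$. This can be checked at each finite level: modulo $2^m$ the powers of 5 should be exactly the residues congruent to 1 modulo 4. The Python sketch below does this check; the function name `powers_of_five` is a hypothetical choice, and the finite-level check is of course only an illustration of the 2-adic statement.

```python
def powers_of_five(m):
    """Return the set of residues 5**i modulo 2**m, for i >= 0."""
    mod = 2 ** m
    seen = set()
    x = 1 % mod
    while x not in seen:
        seen.add(x)
        x = x * 5 % mod
    return seen
```

Under the same identification, reducing modulo $S_2(5) - 5^{k-2}$ corresponds to specialising $T$ to $5^{k-2} - 1$, which is divisible by 4 for every integer $k \ge 2$.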
Set $e_\pm = (1 \pm S_2(-1))/2$ and $h^0(N)_\pm = e_\pm h^0(N) \subset h^0(N) \otimes_{\mathbf{Z}_2} \mathbf{Q}_2$. Then $h^0(N) \subset h^0(N)_+ \oplus h^0(N)_- \subset (1/2) h^0(N)$. Thus we see that $h^0(N)_\pm$ are finitely generated torsion-free $\Lambda$-modules, and so from the structure theory of finitely generated $\Lambda$-modules we see that we have exact sequences of $\Lambda$-modules
$$(0) \to h^0(N)_\pm \to \Lambda^{r_\pm} \to X_\pm \to (0),$$
where $r_\pm$ are nonnegative integers and where $X_\pm$ have finite cardinality $2^{a_\pm}$.
LEMMA 3.1 If $k \ge 2$ is an integer with the same parity as $(1 \mp 1)/2$, then there is a surjection
$$h^0(N)_\pm/(S_2(5) - 5^{k-2}) \twoheadrightarrow e(h_k(2N) \otimes_{\mathbf{Z}} \mathbf{Z}_2)$$
which sends $T(n)$ to $T(n)$ for all $n$, and with kernel of finite order divisible by $2^{a_\pm}$.
Proof Observe first that $U_2$ maps the space of modular forms of weight $k$ and level $\Gamma_1(2N) \cap \Gamma_0(4)$ to the space of modular forms of weight $k$ and level $\Gamma_1(2N)$ (cf., for instance, [Hi, Prop. 8.3]). We deduce that for $k \equiv (1 \mp 1)/2 \bmod 2$ we have an equality
$$e e_\pm (h_k(4N) \otimes \mathbf{Q}_2) = e(h_k(2N) \otimes \mathbf{Q}_2),$$
and the lemma follows.
Similarly, set $h_2(4N)_- = e_- h_2(4N) \subset h_2(4N) \otimes \mathbf{Q}$.
LEMMA 3.2 Suppose that $f : h^0(N)_- \to \mathbf{Q}_2^{ac}$ is a continuous $\mathbf{Z}_2$-algebra homomorphism such that $f(S_2(5)) = 1/5$. Then
$$\sum_{n=1}^{\infty} f(T(n)) q^n$$
is the q-expansion at infinity of an element of $M_1^{>2^{-2/3}}(N)$.
Proof For each integer $r \ge 1$, set $k(r) = 1 + 2^{a_- + r}$. Then we can find a continuous homomorphism of $\mathbf{Z}_2$-modules
$$f_r : e(h_{k(r)}(2N) \otimes_{\mathbf{Z}} \mathbf{Z}_2) \to \mathbf{Q}_2^{ac}$$
such that fr (T (n)) ≡ f (T (n)) mod 2r +2 for all n. The lemma follows from Lemma 2.12. Suppose that k ≥ 2 is an integer. If ℘ is a minimal prime ideal of h 0 (N )± containing S2 (5) − 5k−2 , then h 0 (N )± /℘ is a 1-dimensional integral domain. Thus ℘ contains ker h 0 (N )± → → ee± (h k (4N ) ⊗Z Z2 ) . Thus contraction gives a bijection between prime ideals of ee± (h k (4N ) ⊗Z Z2 ) and prime ideals of h 0 (N )± containing S2 (5) − 5k−2 , and hence also a bijection between maximal ideals of ee± (h k (4N ) ⊗Z Z2 ) and maximal ideals of h 0 (N )± . Hence to any maximal ideal m of h 0 (N )± we can associate a continuous semisimple representation ρ m : Gal(Qac /Q) −→ GL2 (h 0 (N )± /m) such that for all but finitely many primes p we have tr ρ m (Frob p ) = T p , and • det ρ m (Frob p ) = pS( p). We call m Eisenstein if ρ m is not absolutely irreducible. Note that the intersection over all integers k ≥ 2 with k ≡ (1 ∓ 1)/2 of ker h 0 (N )± → → e(h k (2N ) ⊗ Q2 ) •
equals \
S2 (5) − 5k−2 3r± ∩ h 0 (N ) = (0).
k
Thus h 0 (N )± = lim h 0 (N )± /
\
←
ker h 0 (N )± → → e(h k (2N ) ⊗ Z2 ) ,
k∈K
where the inverse limit is over finite sets K of integers k ≥ 2 with k ≡ (1 ∓ 1)/2 mod 2. For each k ≥ 2 there is a continuous 2-dimensional pseudorepresentation (see [Ta1] for the definition of pseudorepresentation) T : Gal(Qac /Q) −→ e(h k (2N ) ⊗ Z2 ) such that for all primes p6 | 2N the pseudorepresentation T is trivial on I p , and, moreover, T (Frob p ) = T p and T (Frob2p ) = T p2 − 2 pS( p). We remind the reader that I p is the inertia group at p, and to say that T is trivial on I p means that T (σ τ ) = T (τ ) for all σ ∈ I p and τ ∈ Gal(Qac /Q). By the Chebotarev density theorem, we see that T is
uniquely defined by these properties. Thus for any finite set K as in the last paragraph we get a continuous pseudorepresentation \ T : Gal(Qac /Q) −→ h 0 (N )± / ker h 0 (N )± → → e(h k (2N ) ⊗ Z2 ) k∈K
⊂
M
e(h k (2N ) ⊗Z Z2 )
k∈K
such that for all primes p6 | 2N the pseudorepresentation T is trivial on I p , T (Frob p ) = T p , and T (Frob2p ) = T p2 − 2 pS( p). Taking the limit over K , we find a continuous pseudorepresentation T : Gal(Qac /Q) −→ h 0 (N )± such that for all primes p6 | 2N the pseudorepresentation T is trivial on I p , T (Frob p ) = T p , and T (Frob2p ) = T p2 − 2 pS( p). By [N, main theorem] (see also [Ro]), we see that if m is a non-Eisenstein maximal ideal of h 0 (N )± , then there is a continuous representation ord ρm : Gal(Qac /Q) −→ GL2 (h 0 (N )±,m ) ord is unramified at all primes p6 | 2N and satisfies such that ρm ord • tr ρm (Frob p ) = T p , and ord (Frob ) = pS( p). • det ρm p It is known (by [De] or [W, Th. 2.1.4]) that ρ m |ss is unramified. We Gal(Qac 2 /Q2 ) ss suppose that ρ m |Gal(Qac /Q ) (Frob2 ) has two distinct eigenvalues α and β. Then it is 2
2
also known that α, β ∈ h 0 (N )± /m and that either U p − α ∈ m or U p − β ∈ m (see [De] or [W, Th. 2.1.4]). We suppose it is the former. Choose an element σ0 ∈ ord Gal(Qac 2 /Q2 ) above Frob2 . It follows from Hensel’s lemma that ρm (σ0 ) has distinct 0 eigenvalues A and B in h (N )±,m with A ≡ α mod m and B ≡ β mod m. Choose ord (σ ) with eigenvalues a basis (e B , e A ) of h 0 (N )2±,m consisting of eigenvectors of ρm 0 B and A, respectively. With respect to this basis, write a(σ ) b(σ ) ord ρm (σ ) = . c(σ ) d(σ ) Also, write ψa for the unramified character of Gal(Qac 2 /Q2 ) which takes Frob2 to a, • χ2 for the 2-adic cyclotomic character, and • S for the composite •
∼
S
× 0 × Gal(Qac /Q) −→ Gal(Q(µ2∞ N )/Q) → Z× 2 × (Z/N Z) −→ h (N ) .
Then by [W, Th. 2.1.4] we see that for any integer k ≥ 2 with k ≡ (1 ∓ 1)/2 mod 2 and for any σ ∈ Gal(Qac 2 /Q2 ), we have • a(σ ) ≡ (χ2 SψU−1 )(σ ), 2 • c(σ ) ≡ 0, and • d(σ ) ≡ ψU2 (σ ), all modulo ker h 0 (N )±,m → → e(h k (2N ) ⊗Z Z2 )m . We conclude that ord ρm |Gal(Qac 2 /Q2 )
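$$\sim \begin{pmatrix} \chi_2 S \psi_{U_2}^{-1} & * \\ 0 & \psi_{U_2} \end{pmatrix}$$
and that $A = U_2$.
Now, suppose that $\rho : \mathrm{Gal}(\mathbf{Q}^{ac}/\mathbf{Q}) \to \mathrm{GL}_2(\mathbf{F}_2^{ac})$ is a continuous representation such that
• $\rho(c) \ne 1$,
• $\rho|^{ss}_{\mathrm{Gal}(\mathbf{Q}_2^{ac}/\mathbf{Q}_2)}$ is unramified and $\rho|^{ss}_{\mathrm{Gal}(\mathbf{Q}_2^{ac}/\mathbf{Q}_2)}(\mathrm{Frob}_2)$ has distinct eigenvalues $\alpha$ and $\beta$,
• $\rho|_{\mathrm{Gal}(\mathbf{Q}(\sqrt{-1})^{ac}/\mathbf{Q}(\sqrt{-1}))}$ is irreducible, and
• such that there exists an odd integer $N \ge 5$ and a homomorphism $f : h_2(N) \to \mathbf{F}_2^{ac}$ satisfying
(1) $f(T_2) = \alpha$,
(2) $f(T_p) = \mathrm{tr}\, \rho(\mathrm{Frob}_p)$ for all primes $p \nmid 2N$, and
(3) $f(pS(p)) = \det \rho(\mathrm{Frob}_p)$ for all primes $p \nmid 2N$.
We let $N(\rho)$ denote the conductor of $\rho$. Suppose also that $S$ is a finite set of odd primes containing all the primes where $\rho$ ramifies and some prime $p \ge 5$. Then set
$$N_S(\rho) = N(\rho) \prod_{p \in S} p^{\dim \rho^{I_p}}.$$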
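The level $N_S(\rho)$ is obtained from the conductor by multiplying in one factor $p^{\dim \rho^{I_p}}$ for each $p \in S$, where the dimension of inertia invariants is 2 at unramified primes and 0 or 1 at ramified ones. The Python sketch below computes this product from given data; the function name `augmented_level` and the sample numbers are hypothetical and do not correspond to any specific representation.

```python
def augmented_level(conductor, invariant_dims):
    """Return conductor * prod(p ** d), where invariant_dims maps each p in S to d = dim of inertia invariants."""
    level = conductor
    for p, d in invariant_dims.items():
        if d not in (0, 1, 2):
            raise ValueError("a 2-dimensional representation has invariants of dimension 0, 1, or 2")
        level *= p ** d
    return level
```

For example, with a conductor of 7, $S = \{5, 7\}$, inertia invariants of dimension 2 at 5 and dimension 1 at 7, the sketch returns $7 \cdot 5^2 \cdot 7 = 1225$.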
It follows from [Buz, Th. 3.1] that we can find a ring homomorphism h 2 (2N S (ρ)) −→ Fac 2 such that • U2 maps to α, • U p maps to zero if p ∈ S, • T p maps to tr ρ(Frob p ) if p6 | 2N S (ρ), and • pS( p) maps to det ρ(Frob p ) if p6 | 2N S (ρ).
(It is here we use the fact that ρ|Gal(Q(√−1)ac /Q(√−1)) is irreducible, rather than the weaker assumption that ρ is irreducible.) We let m S (ρ, α)+ denote the kernel of this homomorphism. LEMMA 3.3 Keep the above notation and assumptions. (1) There is a ring homomorphism h 2 (4N S (ρ))− → Fac 2 such that • U2 maps to α, • U p maps to zero if p ∈ S, • T p maps to tr ρ(Frob p ) if p6 | 2N S (ρ), and • pS( p) maps to det ρ(Frob p ) if p6 | 2N S (ρ). We denote its kernel m S (ρ, α)− . (2) There is a surjection
→ h 2 (2N S (ρ))m S (ρ,α)+ /(2) h 2 (4N S (ρ))−,m S (ρ,α)− /(2) → which takes T (n) to T (n) for all n. Proof Let T denote the polynomial algebra over Z2 generated by variables t p and s p for p6 | 2N S (ρ) and u p for p|2N S (ρ). Then there is a natural map T → h 2 (2N S (ρ))m S (ρ,α)+ /(2) which sends t p to T p , and so on. Let m denote the pullback of m S (ρ, α)+ . It is a maximal ideal of T. Let Y denote the open (i.e., with the cusps removed) modular curve of level 01 (2N S (ρ)) ∩ 00 (4). Let denote the character of 01 (4N S (ρ))/(01 (2N S (ρ)) ∩ 00 (4)) of order 2, thought of as a character of the fundamental group of Y . It is known that H 1 (Y, Z2 )m ∼ = h 2 (2N S (ρ))2m S (ρ,α)+ , where T acts on the cohomology by sending t p to T p , and so on (see [Gr, Prop. 12.10]). Because H 2 (Y, Z2 ) = (0) (as Y is affine), we conclude that 2 H 1 (Y, F2 )m ∼ = h 2 (2N S (ρ))m S (ρ,α)+ /(2) . Thus to prove the lemma it suffices to see that the action of T on H 1 (Y, F2 )m factors through h 2 (4N S (ρ))−,m S (ρ,α)− . However, H 1 (Y, F2 )m = H 1 (Y, F2 ())m ∼ = H 1 (Y, Z2 ())m ⊗ F2 because H 2 (Y, Z2 ()) = (0) (as Y is affine.) Finally, the action of T on H 1 (Y, Z2 ())m factors through h 2 (4N S (ρ))−,m S (ρ,α)− because H 1 (Y, Z2 ())m is torsion-free (because in turn H 0 (Y, F2 ())m = H 0 (Y, F2 )m = (0), as m is nonEisenstein).
We remark that by our choice of N S (ρ), for p ∈ S we have U p = 0 in each of h 2 (2N S (ρ))m S (ρ,α)+ , h 2 (4N S (ρ))−,m S (ρ,α)− , and h 0 (N S (ρ))±,m S (ρ,α)± . To see this it suffices to check that for p ∈ S we have U p = 0 in h k (4N S (ρ))m S (ρ,α)± whenever k ≥ 2 and k ≡ (1 ∓ 1)/2 mod 2. This is, however, standard (see, e.g., [CDT, Cor. 4.2.3 and proof of Lem. 5.1.1]). We let (2) (2) ρ S,α,± : Gal(Qac /Q) −→ GL2 (R S,α,± ) denote the universal deformation of ρ to a representation that is unramified outside S ∪ {2} and that when restricted to Gal(Qac 2 /Q2 ) is of the form
$$\begin{pmatrix} \phi_1 & * \\ 0 & \phi_2 \end{pmatrix},$$
where φ2 is unramified and φ2 (Frob2 ) ≡ α modulo the maximal ideal, and where, thinking of φ1 as a character of Q× 2 by local class field theory, we have φ1 (−1) = ∓1 and φ1 (x) = x for all x ∈ (1 + 4Z2 ). Similarly, we let ord ord ρ S,α,± : Gal(Qac /Q) −→ GL2 (R S,α,± )
denote the universal deformation of ρ to a representation that is unramified outside S ∪ {2} and that when restricted to Gal(Qac 2 /Q2 ) is of the form
$$\begin{pmatrix} \phi_1 & * \\ 0 & \phi_2 \end{pmatrix},$$
where φ2 is unramified and φ2 (Frob2 ) ≡ α modulo the maximal ideal, and where, thinking of φ1 as a character of Q× 2 by local class field theory, we have φ1 (−1) = ∓1. −1 The character φ1 χ2 gives a continuous homomorphism, which we denote S2 , ord (1 + 4Z2 ) −→ (R S,α,± )× ord and so makes R S,α,± into a 3-algebra. From the definitions one sees that (2)
ord R S,α,± = R S,α,± /(S2 (5) − 1).
From the universal properties we get maps (2) • R S,α,+ −→ h 2 (2N S (ρ))m(ρ,α)+ , (2)
R S,α,− −→ h 2 (4N S (ρ))−,m(ρ,α)− , and ord • R S,α,± −→ h 0 (N S (ρ))±,m(ρ,α)± , which we claim are surjections. To see this, note that U p = 0 if p ∈ S, that T p = (2) ord tr ρm(ρ,α)± (Frob p ) or tr ρm(ρ,α) (Frob p ) is in the image for p6 | 2N S (ρ), that S( p) is ± similarly in the image for p6 | 2N S (ρ), and that U2 is in the image by Hensel’s lemma •
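(as it is an eigenvalue for an element of $\mathrm{Gal}(\mathbf{Q}_2^{\mathrm{ac}}/\mathbf{Q}_2)$ above $\mathrm{Frob}_2$ in one of these representations). [Di, Th. 4 and Prop. 6] show that the map
$$R^{(2)}_{S,\alpha,+} \longrightarrow h_2(2N_S(\rho))_{\mathfrak{m}(\rho,\alpha)_+}$$
is an isomorphism.

PROPOSITION 3.4
The natural maps
$$R^{(2)}_{S,\alpha,-} \twoheadrightarrow h_2(4N_S(\rho))_{-,\mathfrak{m}(\rho,\alpha)_-}$$
and
$$R^{\mathrm{ord}}_{S,\alpha,\pm} \twoheadrightarrow h^0(N_S(\rho))_{\pm,\mathfrak{m}(\rho,\alpha)_\pm}$$
are isomorphisms.

Proof
Consider the first of these maps. We have a commutative diagram
$$\begin{array}{ccc}
R^{(2)}_{S,\alpha,-}/(2) & \twoheadrightarrow & h_2(4N_S(\rho))_{-,\mathfrak{m}(\rho,\alpha)_-}/(2) \\
\downarrow & & \downarrow \\
R^{(2)}_{S,\alpha,+}/(2) & \stackrel{\sim}{\longrightarrow} & h_2(2N_S(\rho))_{\mathfrak{m}(\rho,\alpha)_+}/(2)
\end{array}$$
where the left-hand vertical arrow is an isomorphism from the definitions. Thus
$$R^{(2)}_{S,\alpha,-}/(2) \stackrel{\sim}{\longrightarrow} h_2(4N_S(\rho))_{-,\mathfrak{m}(\rho,\alpha)_-}/(2),$$
and, because $h_2(4N_S(\rho))_{-,\mathfrak{m}(\rho,\alpha)_-}$ is torsion-free over $\mathbf{Z}_2$, we deduce that
$$R^{(2)}_{S,\alpha,-} \stackrel{\sim}{\longrightarrow} h_2(4N_S(\rho))_{-,\mathfrak{m}(\rho,\alpha)_-}.$$
Now the composite
$$R^{\mathrm{ord}}_{S,\alpha,-}/(S_2(5) - 1) \twoheadrightarrow h^0(N_S(\rho))_{-,\mathfrak{m}(\rho,\alpha)_-}/(S_2(5) - 1) \twoheadrightarrow h_2(4N_S(\rho))_{-,\mathfrak{m}(\rho,\alpha)_-}$$
is an isomorphism, and so
$$R^{\mathrm{ord}}_{S,\alpha,-}/(S_2(5) - 1) \stackrel{\sim}{\longrightarrow} h^0(N_S(\rho))_{-,\mathfrak{m}(\rho,\alpha)_-}/(S_2(5) - 1).$$
Because $h^0(N_S(\rho))_{-,\mathfrak{m}(\rho,\alpha)_-}$ is $\Lambda$-torsion-free, we deduce that
$$R^{\mathrm{ord}}_{S,\alpha,-} \stackrel{\sim}{\longrightarrow} h^0(N_S(\rho))_{-,\mathfrak{m}(\rho,\alpha)_-}.$$
The same argument also shows that
$$R^{\mathrm{ord}}_{S,\alpha,+} \stackrel{\sim}{\longrightarrow} h^0(N_S(\rho))_{+,\mathfrak{m}(\rho,\alpha)_+}.$$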
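The two "we deduce" steps in the proof above follow one standard pattern, which can be sketched as follows. The symbols $\varphi$, $I$, $x$, $a$, $b$ are introduced here only for illustration, and the sketch assumes (as is usual for such deformation rings, but not stated in the text at this point) that the source ring is a complete Noetherian local ring with $x$ in its maximal ideal.

```latex
Let $\varphi : R \twoheadrightarrow h$ be a surjection of rings with kernel $I$, and let
$x$ lie in the maximal ideal of $R$ (here $x = 2$, respectively $x = S_2(5) - 1$).
Assume that $x$ is a non-zero-divisor on $h$ (torsion-freeness) and that the induced
map $R/xR \to h/xh$ is an isomorphism.

Take $a \in I$. Its class in $R/xR$ maps to $0$ in $h/xh$, so $a = xb$ for some $b \in R$.
Then $x\,\varphi(b) = \varphi(a) = 0$, hence $\varphi(b) = 0$, that is, $b \in I$. Therefore
\[
  I = xI \quad\Longrightarrow\quad I = 0
\]
by Nakayama's lemma ($I$ is finitely generated because $R$ is Noetherian), and
$\varphi$ is an isomorphism.
```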
Putting Proposition 3.4 together with Corollary 1.2 and Lemma 3.2, we obtain the following corollary.

COROLLARY 3.5
Suppose that $K/\mathbf{Q}_2$ is a finite extension with ring of integers $\mathcal{O}_K$, maximal ideal $\wp_K$, and residue field containing $\mathbf{F}_4$. Suppose also that
$$\rho : \mathrm{Gal}(\mathbf{Q}^{\mathrm{ac}}/\mathbf{Q}) \longrightarrow GL_2(\mathcal{O}_K)$$
is a continuous representation such that
(1) $(\rho \bmod \wp_K)$ has image equal to a conjugate of $SL_2(\mathbf{F}_4) \subseteq SL_2(\mathcal{O}_K/\wp_K)$,
(2) $(\rho \bmod \wp_K)(c) \neq 1$,
(3) $(\rho \bmod \wp_K)$ is unramified at 5,
(4) $\rho$ is ramified at only finitely many primes,
(5) $\rho$ is unramified at 2 and $\rho(\mathrm{Frob}_2)$ has eigenvalues $\alpha$ and $\beta$ in $\mathcal{O}_K$ with distinct reduction modulo $\wp_K$.
Then there exists an odd integer $N \geq 5$ divisible by all primes at which $\rho$ ramifies and a normalised eigenform $f_\alpha \in M_1^{>2^{-2/3}}(N)$ such that
• $T_p f_\alpha = (\mathrm{tr}\,\rho(\mathrm{Frob}_p)) f_\alpha$ for all primes $p \nmid 2N$,
• $pS(p) f_\alpha = (\det \rho(\mathrm{Frob}_p)) f_\alpha$ for all primes $p \nmid 2N$,
• $U_2 f_\alpha = \alpha f_\alpha$, and
• $U_p f_\alpha = 0$ for all $p \mid N$.

(We remark that it is presumably not hard to weaken the fifth assumption to simply require that $\rho|^{\mathrm{ss}}_{\mathrm{Gal}(\mathbf{Q}_2^{\mathrm{ac}}/\mathbf{Q}_2)}$ be unramified and that $\alpha$ be an eigenvalue of $\rho_{I_2}(\mathrm{Frob}_2)$. We do not do so, as we do not need this result.)

4. The main theorem
We now turn to the proof of Theorem A. By the previous work cited in the introduction, it suffices to check the following special case, which is our only contribution.

THEOREM 4.1
Suppose that $K/\mathbf{Q}$ is a Galois extension with Galois group $A_5$. Suppose also that
• 2 is unramified in $K$ and $\mathrm{Frob}_2 \in \mathrm{Gal}(K/\mathbf{Q})$ has order 3,
• 5 is unramified in $K$, and
• $K$ is not totally real.
If $r : \mathrm{Gal}(\mathbf{Q}^{\mathrm{ac}}/\mathbf{Q}) \longrightarrow GL_2(\mathbf{C})$ is a continuous icosahedral representation such that $\mathrm{proj}\, r$ factors through $\mathrm{Gal}(K/\mathbf{Q})$, then there is a weight 1 newform $f$ such that for all prime numbers $p$ the $p$th Fourier coefficient of $f$ equals the trace of Frobenius at $p$ on the space of coinvariants for the inertia group at $p$ in the representation $r$. In particular, the Artin L-series for $r$ is the Mellin transform of a weight 1 newform and is an entire function.

Proof
Twisting $r$ by a character of finite order, we may suppose that the image of $\det r$ has two-power order, that $r$ is unramified at 2 and 5, and that $r(\mathrm{Frob}_2)$ has order 3. Choose an isomorphism of fields $\mathbf{Q}_2^{\mathrm{ac}} \cong \mathbf{C}$, so that we may think of $r$ as a representation $\mathrm{Gal}(\mathbf{Q}^{\mathrm{ac}}/\mathbf{Q}) \longrightarrow GL_2(\mathcal{O}_K)$ for some finite extension $K/\mathbf{Q}_2$ inside $\mathbf{Q}_2^{\mathrm{ac}}$. By Corollary 3.5 we see that we may find an odd integer $N \geq 5$ divisible by all primes at which $r$ ramifies and normalised eigenforms $f_\alpha, f_\beta \in M_1^{>2^{-2/3}}(N)$ such that
• $T_p f_\alpha = (\mathrm{tr}\, r(\mathrm{Frob}_p)) f_\alpha$ and $T_p f_\beta = (\mathrm{tr}\, r(\mathrm{Frob}_p)) f_\beta$ for all primes $p \nmid 2N$,
• $pS(p) f_\alpha = (\det r(\mathrm{Frob}_p)) f_\alpha$ and $pS(p) f_\beta = (\det r(\mathrm{Frob}_p)) f_\beta$ for all primes $p \nmid 2N$,
• $U_2 f_\alpha = \alpha f_\alpha$ and $U_2 f_\beta = \beta f_\beta$, and
• $U_p f_\alpha = U_p f_\beta = 0$ for all $p \mid N$.
Theorem 2.13 tells us that $f = (\alpha f_\alpha - \beta f_\beta)/(\alpha - \beta)$ is classical, and Theorem A follows from this.

Lastly, let us give some examples. To that end we call a number field $K$ suitable if
• $K$ is Galois over $\mathbf{Q}$ with group $A_5$,
• 2 is unramified in $K$ and $\mathrm{Frob}_2 \in \mathrm{Gal}(K/\mathbf{Q})$ has order 3,
• 5 is unramified in $K$, and
• $K$ is totally complex.
If $K$ is such a number field, then we can find a continuous homomorphism $r : \mathrm{Gal}(\mathbf{Q}^{\mathrm{ac}}/\mathbf{Q}) \longrightarrow GL_2(\mathbf{C})$ such that the image of $\mathrm{proj}\, r$ is $\mathrm{Gal}(K/\mathbf{Q})$ (see, for instance, [S2, corollary to Th. 4]). For any such $r$ we have just shown that $L(r, s)$ has analytic continuation to the whole complex plane. Thus to give examples of our theorem, one need only give examples of suitable number fields $K$.

Suppose that $S$ is a finite set of places of $\mathbf{Q}$ including 2, 5, and $\infty$. For $v \in S$, let $L_v/\mathbf{Q}_v$ be a finite Galois extension such that $\mathrm{Gal}(L_v/\mathbf{Q}_v)$ embeds into $A_5$. Suppose that
• $L_2/\mathbf{Q}_2$ is unramified of degree 3,
• $L_5/\mathbf{Q}_5$ is unramified, and
• $L_\infty = \mathbf{C}$.
According to [M], the quotient of affine 5-space over $\mathbf{Q}$ by the action of $A_5$ which simply permutes the variables is rational. Hence, by, for example, the discussion in [S3, p. xiv] (see, in particular, [S3, Th. 2 and the remark that follows]), we see that there is a number field $K$ that is Galois over $\mathbf{Q}$ with group $A_5$ and such that for $v \in S$ we have $K_v \cong L_v^{60/[L_v:\mathbf{Q}_v]}$. By varying $S$ we see in particular that there are infinitely many suitable number fields. More concrete examples can be found in the literature. For example, according to Buhler [Buh], the splitting fields of the following are suitable:
• $x^5 + 4x^4 + 25x^3 + 17x^2 + 5x + 2$,
• $x^5 + 6x^4 + 19x^3 + 25x^2 + 11x + 2$,
• $x^5 + 3x^4 + 7x^3 + 6x^2 - 11x - 24$,
• $x^5 + 3x^4 + x^3 - 4x^2 + 17x - 8$,
• $x^5 + 2x^4 + 37x^3 - 7x^2 + 25x - 4$.

Corrigenda for [Ta2]. Taylor would like to take the opportunity to record some corrections to [Ta2]. He would like to thank Kevin Buzzard, Henri Darmon, and Nick Shepherd-Barron for pointing these out.
• Page 339, line 10: the formula defining $T_p$ should have a factor $p^{k-1}$ multiplying the second sum.
• Page 339, line −4: between "if and only if" and "$f$ (as an element . . . " insert "$c_1(f) = 1$ and".
• Page 340, line −11: the SS>r should read SS r.
• Page 342, line −9: in Theorem 1 we need to assume that the image of $G_l$ under the projective representation associated to $\rho$ has order divisible by a prime other than $l$.
• Page 343, lines 10–11: in Conjecture 1 we should have assumed that the image of $G_l$ under the projective representation associated to $\rho$ has order divisible by a prime other than $l$.
• Page 344, lines 4–6: "together with an embedding $i : \mathbf{Z}[(1 + \sqrt{5})/2] \hookrightarrow \mathrm{End}((A, \psi)/\mathbf{Q})$ such that the representation of $G_{\mathbf{Q}}$ on $A[2]$ is equivalent to $\rho$." should read "together with an embedding $i : \mathbf{Z}[(1 + \sqrt{5})/2] \hookrightarrow \mathrm{End}(A/\mathbf{Q})$ such that the image of $i$ is fixed by the $\psi$-Rosati involution and the representation of $G_{\mathbf{Q}}$ on $A[2]$ is equivalent to $\rho$."
• Page 344, line 19: "$A[\sqrt{2}]$" should be "$A[2]$".
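Returning to the five quintics attributed to Buhler before the corrigenda: one of the conditions for a suitable field, that 2 is unramified with Frob_2 of order 3, can be given a quick consistency check by factoring each quintic modulo 2. The sketch below is illustrative and not part of the original text; the helper names (`reduce_mod2`, `gf2_divmod`, `factor_mod2`, and so on) are hypothetical. It relies on Dedekind's theorem: if the reduction mod 2 is squarefree with irreducible factors of degrees 1, 1, 3, then 2 is unramified in the splitting field and Frobenius acts on the roots as a 3-cycle. It does not verify the Galois group, the behaviour at 5, or the signature.

```python
# Polynomials over GF(2) are stored as integer bitmasks:
# bit i of the integer is the coefficient of x^i.

# Coefficients of the five quintics, highest degree first.
BUHLER_QUINTICS = [
    [1, 4, 25, 17, 5, 2],
    [1, 6, 19, 25, 11, 2],
    [1, 3, 7, 6, -11, -24],
    [1, 3, 1, -4, 17, -8],
    [1, 2, 37, -7, 25, -4],
]


def reduce_mod2(coeffs):
    """Reduce integer coefficients (highest degree first) to a GF(2) bitmask."""
    mask = 0
    for c in coeffs:
        mask = (mask << 1) | (c % 2)
    return mask


def degree(f):
    """Degree of a nonzero GF(2) polynomial given as a bitmask."""
    return f.bit_length() - 1


def gf2_divmod(a, b):
    """Quotient and remainder of a by b (b nonzero) in GF(2)[x]."""
    q = 0
    while a and degree(a) >= degree(b):
        shift = degree(a) - degree(b)
        q |= 1 << shift
        a ^= b << shift
    return q, a


def factor_mod2(f):
    """Irreducible factors of f (with multiplicity) by trial division.

    Candidates are tried in increasing order, so any divisor found is
    irreducible: all smaller factors have already been divided out.
    """
    factors = []
    g = 2  # the polynomial x
    while degree(f) >= 2 * degree(g):
        q, r = gf2_divmod(f, g)
        if r == 0:
            factors.append(g)
            f = q
        else:
            g += 1
    if degree(f) >= 1:
        factors.append(f)  # what remains has no smaller factor
    return factors


def cycle_type_mod2(coeffs):
    """Sorted degrees of the irreducible factors of the reduction mod 2."""
    return sorted(degree(g) for g in factor_mod2(reduce_mod2(coeffs)))


def is_squarefree_mod2(coeffs):
    """True if no irreducible factor of the reduction mod 2 repeats."""
    factors = factor_mod2(reduce_mod2(coeffs))
    return len(set(factors)) == len(factors)


if __name__ == "__main__":
    for coeffs in BUHLER_QUINTICS:
        print(coeffs, cycle_type_mod2(coeffs), is_squarefree_mod2(coeffs))
```

Each quintic reduces mod 2 to a product of two distinct linear factors and an irreducible cubic, which is consistent with the stated condition at 2.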
References
[A] E. ARTIN, Zur Theorie der L-Reihen mit allgemeinen Gruppencharakteren, Abh. Math. Sem. Univ. Hamburg 8 (1930), 292–306.
[Br] R. BRAUER, On Artin's L-series with general group characters, Ann. of Math. (2) 48 (1947), 502–514. MR 8:503g
[BCDT] C. BREUIL, B. CONRAD, F. DIAMOND, and R. TAYLOR, On the modularity of elliptic curves over Q, to appear in J. Amer. Math. Soc.
[Buh] J. P. BUHLER, Icosahedral Galois Representations, Lecture Notes in Math. 654, Springer, Berlin, 1978. MR 58:22019
[Buz] K. BUZZARD, On level-lowering for mod 2 representations, Math. Res. Lett. 7 (2000), 95–110. MR 2001a:11080
[BS] K. BUZZARD and W. STEIN, A mod five approach to modularity of icosahedral Galois representations, to appear in Pacific J. Math.
[BT] K. BUZZARD and R. TAYLOR, Companion forms and weight one forms, Ann. of Math. (2) 149 (1999), 905–919. MR 2000j:11062
[CDT] B. CONRAD, F. DIAMOND, and R. TAYLOR, Modularity of certain potentially Barsotti-Tate Galois representations, J. Amer. Math. Soc. 12 (1999), 521–567. MR 99i:11037
[De] P. DELIGNE, unpublished letter to J.-P. Serre, 28 May 1974.
[Di] M. DICKINSON, On the modularity of certain 2-adic Galois representations, Duke Math. J. 109 (2001), 319–382.
[DO] I. DOLGACHEV and D. ORTLAND, Point Sets in Projective Spaces and Theta Functions, Astérisque 165, Soc. Math. France, Montrouge, 1988. MR 90i:14009
[E] T. EKEDAHL, "An effective version of Hilbert's irreducibility theorem" in Séminaire de Théorie des Nombres (Paris, 1988/89), Progr. Math. 91, Birkhäuser, Boston, 1990, 241–249. MR 92f:14018
[CF] G. FALTINGS and C.-L. CHAI, Degeneration of Abelian Varieties, Ergeb. Math. Grenzgeb. (3) 22, Springer, Berlin, 1990. MR 92d:14036
[F] G. FREY, ed., On Artin's Conjecture for Odd 2-dimensional Representations, Lecture Notes in Math. 1585, Springer, Berlin, 1994. MR 95i:11001
[Go] E. GOINS, Elliptic curves and icosahedral Galois representations, Ph.D. dissertation, Stanford University, Stanford, Calif., 1999, http://www.alumni.caltech.edu/~nubian/.
[Gr] B. H. GROSS, A tameness criterion for Galois representations associated to modular forms (mod p), Duke Math. J. 61 (1990), 445–517. MR 91i:11060
[He] E. HECKE, Eine neue Art von Zetafunktionen und ihre Beziehungen zur Verteilung der Primzahlen, Math. Z. 6 (1920), 11–51.
[Hi] H. HIDA, On p-adic Hecke algebras for GL2 over totally real fields, Ann. of Math. (2) 128 (1988), 295–384. MR 89m:11046
[JM] A. JEHANNE and M. MÜLLER, "Modularity of an odd icosahedral representation" in Colloque international de théorie des nombres (Talence, 1999), J. Théor. Nombres Bordeaux 12 (2000), 475–482. MR 1 823 197
[K] N. M. KATZ, "p-adic properties of modular schemes and modular forms" in Modular Functions of One Variable, III (Antwerp, 1972), ed. W. Kuyk and J.-P. Serre, Lecture Notes in Math. 350, Springer, Berlin, 1973, 69–190. MR 56:5434
[KM] N. M. KATZ and B. MAZUR, Arithmetic Moduli of Elliptic Curves, Ann. of Math. Stud. 108, Princeton Univ. Press, Princeton, 1985. MR 86i:11024
[Lang] S. LANG, Complex Multiplication, Grundlehren Math. Wiss. 255, Springer, New York, 1983. MR 85f:11042
[Langl] R. P. LANGLANDS, Base Change for GL(2), Ann. of Math. Stud. 96, Princeton Univ. Press, Princeton, 1980. MR 82a:10032
[M] T. MAEDA, Noether's problem for A5, J. Algebra 125 (1989), 418–430. MR 91c:12004
[N] L. NYSSEN, Pseudo-représentations, Math. Ann. 306 (1996), 257–283. MR 98a:20013
[Ra] D. RAMAKRISHNAN, Modularity of solvable Artin representations of GO(4)-type, preprint, 2001, http://math.caltech.edu/people/dinakar.html
[Ro] R. ROUQUIER, Caractérisation des caractères et pseudo-caractères, J. Algebra 180 (1996), 571–586. MR 97a:20010
[S1] J.-P. SERRE, Endomorphismes complètement continus des espaces de Banach p-adiques, Inst. Hautes Études Sci. Publ. Math. 12 (1962), 69–85. MR 26:1733
[S2] J.-P. SERRE, "Modular forms of weight one and Galois representations" in Algebraic Number Fields: L-Functions and Galois Properties (Durham, England, 1975), ed. A. Fröhlich, Academic Press, London, 1977, 193–268. MR 56:8497
[S3] J.-P. SERRE, Topics in Galois Theory, Res. Notes Math. 1, Jones and Bartlett, Boston, 1992. MR 94d:12006
[ST] N. I. SHEPHERD-BARRON and R. TAYLOR, mod 2 and mod 5 icosahedral representations, J. Amer. Math. Soc. 10 (1997), 283–298. MR 97h:11060
[Ta1] R. TAYLOR, Galois representations associated to Siegel modular forms of low weight, Duke Math. J. 63 (1991), 281–332. MR 92j:11044
[Ta2] R. TAYLOR, Icosahedral Galois representations, Pacific J. Math. 181, no. 3 (1997), Olga Taussky-Todd Memorial Issue, 337–347, http://nyjm.albany.edu:8000/PacJ/1997/181-3-20.html. MR 99d:11057
[Ta3] R. TAYLOR, On icosahedral Artin representations, II, preprint, 2000, http://www.math.harvard.edu/~rtaylor/
[Tu] J. TUNNELL, Artin's conjecture for representations of octahedral type, Bull. Amer. Math. Soc. (N.S.) 5 (1981), 173–175. MR 82j:12015
[W] A. WILES, On ordinary λ-adic representations associated to modular forms, Invent. Math. 94 (1988), 529–573. MR 89j:11051
Buzzard
Department of Mathematics, Imperial College, London SW7 2BZ, United Kingdom

Dickinson
Department of Mathematics, Harvard University, Cambridge, Massachusetts 02138, USA; current: Department of Mathematics, University of Michigan, Ann Arbor, Michigan 48109-1109, USA

Shepherd-Barron
Department of Pure Mathematics and Mathematical Statistics, Cambridge University, Cambridge CB3 0WB, United Kingdom

Taylor
Department of Mathematics, Harvard University, Cambridge, Massachusetts 02138, USA
DUKE MATHEMATICAL JOURNAL, Vol. 109, No. 2, © 2001
ON THE MODULARITY OF CERTAIN 2-ADIC GALOIS REPRESENTATIONS
MARK DICKINSON
Abstract
We prove some results of the form “r residually irreducible and residually modular implies r is modular,” where r is a suitable continuous odd 2-dimensional 2-adic representation of the absolute Galois group of Q. These results are analogous to those obtained by A. Wiles, R. Taylor, F. Diamond, and others for p-adic representations in the case when p is odd; some extra work is required to overcome the technical difficulties present in their methods when p = 2. The results are subject to the assumption that any choice of complex conjugation element acts nontrivially on the residual representation, and the results are also subject to an ordinariness hypothesis on the restriction of r to a decomposition group at 2. Our main theorem (Theorem 4) plays a major role in a programme initiated by Taylor to give a proof of Artin’s conjecture on the holomorphicity of L-functions for 2-dimensional icosahedral odd representations of the absolute Galois group of Q; some results of this programme are described in a paper that appears in this issue, jointly authored with K. Buzzard, N. Shepherd-Barron, and Taylor.

Introduction
Let $\ell$ be a rational prime, and let $G_{\mathbf{Q}} = \mathrm{Gal}(\overline{\mathbf{Q}}/\mathbf{Q})$ be an absolute Galois group of $\mathbf{Q}$. Conjectures of J.-M. Fontaine and B. Mazur [FM] and of R. Langlands predict that any irreducible, continuous, odd representation $\rho : G_{\mathbf{Q}} \to GL_2(\overline{\mathbf{Q}}_\ell)$ which is unramified almost everywhere and potentially semistable at $\ell$ arises from a modular form, while a conjecture of J.-P. Serre [S2] predicts that every irreducible, continuous, odd representation $\bar\rho : G_{\mathbf{Q}} \to GL_2(\overline{\mathbf{F}}_\ell)$ also arises from a modular form. In both these cases, the term “odd” means that the image of a complex conjugation element of $G_{\mathbf{Q}}$ should have determinant −1.
A key step of the proof that every semistable elliptic curve defined over $\mathbf{Q}$ is modular (see [Wi2], [TW]) relates these two conjectures by showing that if $\ell$ is odd, then for certain $\ell$-adic representations $\rho : G_{\mathbf{Q}} \to GL_2(\overline{\mathbf{Q}}_\ell)$ with irreducible mod-$\ell$ reduction $\bar\rho : G_{\mathbf{Q}} \to GL_2(\overline{\mathbf{F}}_\ell)$ the modularity of $\rho$ can be deduced from that of $\bar\rho$. Subsequent work (see [Dia1], [CDT], [BCDT]) relaxed the original hypotheses on $\rho$, to the extent that we now know that every elliptic curve over $\mathbf{Q}$ is modular.

Received 22 December 1999. 2000 Mathematics Subject Classification. Primary 11F80.

Our aim in this paper is to establish analogues of some of these results in the case $\ell = 2$. Our results are subject to the assumption that the image of $\bar\rho$ applied to a complex conjugation element is nontrivial, and also to an ordinariness hypothesis on the restriction of $\rho$ to a decomposition group at 2. The main results are given as Theorem 4, Corollary 5, and Proposition 6 of Section 1.2.

This work forms part of a programme, conceived by Taylor and described in [Ta], to prove the strong Artin conjecture for continuous odd representations $\rho : G_{\mathbf{Q}} \to GL_2(\mathbf{C})$ whose image in $PGL_2(\mathbf{C})$ is isomorphic to the group $A_5$ of rotational symmetries of the icosahedron: for such a representation $\rho$ the conjecture states that $\rho$ should arise from a weight 1 classical newform $f$; it then follows that the L-function associated to $\rho$ is entire. The results here, combined with work of Buzzard, Shepherd-Barron, and Taylor [BT], [ST], allow a proof of the strong Artin conjecture for an infinite class of such representations $\rho$; this is explained in [BDST].

The essential differences from existing methods are twofold. Recall that the Taylor-Wiles method involves finding an integer $r$ and a sequence of carefully chosen sets of primes $\{Q_n\}_{n \geq 1}$, each of cardinality $r$; for each set $Q_n$ one considers deformations of $\bar\rho$ with prescribed determinant which are “minimally ramified” away from primes in $Q_n$. The size of the universal deformation ring (in terms of the number of elements required to topologically generate it over its base) is equal to the dimension of a suitable Selmer subgroup of $H^1(\mathbf{Q}, \mathrm{ad}\,\bar\rho)$, where $\mathrm{ad}\,\bar\rho$ denotes the space of endomorphisms of the underlying representation space of $\bar\rho$. It is a direct consequence of the determinant condition on deformations that one can regard the Selmer group as a subgroup of the direct summand $H^1(\mathbf{Q}, \mathrm{ad}^0\,\bar\rho)$ of $H^1(\mathbf{Q}, \mathrm{ad}\,\bar\rho)$, where now $\mathrm{ad}^0\,\bar\rho$ denotes the 3-dimensional space of trace zero elements of $\mathrm{ad}\,\bar\rho$, and this facilitates some of the computations. When $\ell = 2$, the Selmer groups under consideration are more naturally represented, and their dimensions are correspondingly easier to compute, when considered as subgroups of $H^1(\mathbf{Q}, \mathrm{ad}\,\bar\rho)$ instead of $H^1(\mathbf{Q}, \mathrm{ad}^0\,\bar\rho)$, which we note is no longer a direct summand. Given this, it is natural and convenient to remove the restriction on the determinant of deformations, and this we do. It is not clear to the author whether the arguments in this paper actually require the removal of the determinant condition.

The second difference lies in the Galois cohomology arguments; in general, the dimension of the Selmer group mentioned above can be computed modulo a contribution from the dual Selmer group; Wiles shows that with careful choice of the sets $Q_n$ this dual group can be made to vanish. In our case, it turns out that as a result of the removal of the determinant condition it is no longer possible to completely eliminate the dual Selmer group, but one can still find suitable sets of primes $Q_n$ for which the corresponding dual Selmer group is “smallest possible,” and it is then possible to compute its dimension. These computations can be found in Section 11.2.

The overall scheme of proof in this paper is closely related to that presented in [CDT, Secs. 3 to 6], and we frequently refer to that article for proofs.

Notation and conventions. Let $G_{\mathbf{Q}} = \mathrm{Gal}(\overline{\mathbf{Q}}/\mathbf{Q})$ denote an absolute Galois group of $\mathbf{Q}$, and, for each prime $p$, let $G_p$ and $I_p$ denote choices of decomposition subgroup and inertia subgroup at $p$, respectively. Let $\mathrm{Frob}_p$ be the arithmetic Frobenius element of $G_p/I_p$, and write $W_p$ for the absolute Weil group of $\mathbf{Q}_p$; it is by definition the inverse image under $G_p \to G_p/I_p$ of the subgroup generated by $\mathrm{Frob}_p$. We use the identification $\omega_p : W_p^{\mathrm{ab}} \to \mathbf{Q}_p^\times$ of topological groups of local class field theory, normalised so that $\omega_p^{-1}(p)$ is a lift of $\mathrm{Frob}_p$. Let $c$ denote a choice of complex conjugation element of $G_{\mathbf{Q}}$. Let $\varepsilon_p : G_{\mathbf{Q}} \to \mathbf{Z}_p^\times$ denote the cyclotomic character given by the action of $G_{\mathbf{Q}}$ on $p$-power roots of unity in $\overline{\mathbf{Q}}$. We also write $\varepsilon : G_{\mathbf{Q}}^{\mathrm{ab}} \to \hat{\mathbf{Z}}^\times$ for the isomorphism defined by the formula $g\zeta = \zeta^{\varepsilon(g)}$ for any root of unity $\zeta$ in $\overline{\mathbf{Q}}^\times$ and any element $g$ of $G_{\mathbf{Q}}$. Given a ring $R$, we freely identify a character $\chi : \hat{\mathbf{Z}}^\times \to R^\times$ which factors through a finite discrete quotient of $\hat{\mathbf{Z}}^\times$ with the corresponding primitive Dirichlet character $\mathbf{Z} \to R$. We write $\mathbf{A}^\infty = \hat{\mathbf{Z}} \otimes_{\mathbf{Z}} \mathbf{Q}$ for the finite adeles of $\mathbf{Q}$, and we write $\mathbf{A} = \mathbf{A}^\infty \times \mathbf{R}$ for the full ring of adeles. We use Diamond and J. Im [DI] as a reference for basic facts, definitions, and notation relating to modular forms.

1. Results
Let $K$ be a finite extension of the field $\mathbf{Q}_2$ with ring of integers $\mathcal{O}$ and residue field $k$, and let
$$\bar\rho : G_{\mathbf{Q}} \to GL_2(k)$$
be an absolutely irreducible continuous representation of $G_{\mathbf{Q}}$. Fix embeddings $\overline{\mathbf{Q}} \to \mathbf{C}$ and $\overline{\mathbf{Q}} \to \overline{K}$. Using these embeddings, we regard any element of $\mathbf{C}$ which is algebraic over $\mathbf{Q}$ as an element of $\overline{K}$; in particular, if $f$ is a classical cuspidal eigenform and $p$ is a prime not dividing the level of $f$, then we regard the eigenvalue $t_p(f)$ of the Hecke operator $T_p$ acting on $f$ as an element of $\overline{K}$, and its reduction as an element of the residue field $\bar{k}$ of $\overline{K}$. Given a cuspidal eigenform $f$ of some weight and level, we say that $\bar\rho$ arises from $f$ if the reduction of $t_p(f)$ to $\bar{k}$ is equal to $\mathrm{tr}\,\bar\rho(\mathrm{Frob}_p)$ for all but finitely many primes $p$ that do not divide the level of $f$ and at which $\rho$ is unramified. We say that $\bar\rho$ is modular if it arises from some cuspidal eigenform. A conjecture of Serre [S2] predicts that every representation $\bar\rho$ as above is modular. (Since $k$ has characteristic 2, the representation $\bar\rho$ is automatically odd.)

Now, suppose that $\rho : G_{\mathbf{Q}} \to GL_2(K)$ is a continuous absolutely irreducible odd
representation of $G_{\mathbf{Q}}$ whose reduction (obtained by reducing a conjugate of $\rho$ with image contained in $GL_2(\mathcal{O})$; the semisimplification of the result is independent of the choice of conjugate) is isomorphic to $\bar\rho$; we call $\rho$ modular if there is a cuspidal eigenform $f$ such that $t_p(f) = \mathrm{tr}\,\rho(\mathrm{Frob}_p)$ for all but finitely many primes $p$ that do not divide the level of $f$ and at which $\bar\rho$ is unramified. [FM, Conj. 3c] predicts that every such $\rho$ that is unramified almost everywhere and potentially semistable at 2 should be modular.

1.1. The residual representation
We place restrictions on the local behaviour of $\bar\rho$ at 2 and $\infty$; specifically, we assume that $\bar\rho$ satisfies the following conditions:
(1) $\bar\rho(c)$ is not the identity matrix, and
(2) the semisimplification of the restriction $\bar\rho|_{G_2}$ of $\bar\rho$ to $G_2$ is a direct sum of distinct unramified characters.
We note that there are potential problems with some of the later arguments when $\bar\rho$ is induced from a character of the absolute Galois group of one of the fields $\mathbf{Q}(i)$, $\mathbf{Q}(\sqrt{2})$, or $\mathbf{Q}(\sqrt{-2})$, but we also note that the hypothesis on $\bar\rho|_{G_2}$ above excludes this possibility.

If $\bar\rho$ is unramified at 2, then choose an eigenvalue $\alpha$ of $\bar\rho(\mathrm{Frob}_2)$. If $\bar\rho$ is ramified at 2, then let $\alpha$ be the eigenvalue of $\mathrm{Frob}_2$ acting on the 1-dimensional space of $I_2$-coinvariants of $\bar\rho$. In addition to the conditions on $\bar\rho$ above, we need the following modularity hypotheses:
(1) $\bar\rho$ is modular,
(2) furthermore, if $\bar\rho$ is unramified at 2, then it arises from a weight 2 newform $f$ of odd level for which the reduction of $t_2(f)$ to $\bar{k}$ is equal to the chosen eigenvalue $\alpha$ of $\bar\rho(\mathrm{Frob}_2)$.
One can show that if $\bar\rho$ is modular and unramified at 2, then it arises from a weight 2 newform $f$ of odd level and the reduction of $t_2(f)$ to $\bar{k}$ must be equal to one of the eigenvalues of $\bar\rho(\mathrm{Frob}_2)$; then B. Gross’s results in [Gr] on the existence of companion forms can be used to show that $\bar\rho$ also arises from a weight 2 newform $g$ of odd level such that $t_2(g)$ reduces to the other eigenvalue of $\bar\rho(\mathrm{Frob}_2)$, so that the second assumption follows from the first. However, Gross’s proofs depend on some unchecked compatibilities (though in the case of mod-$\ell$ representations for odd $\ell$ this problem is resolved in [CV]); it is purely to avoid any dependency on these that we make the second assumption.

Our results depend on recent work of Buzzard [B] (see also his appendix to [RS]) generalising K. Ribet’s level-lowering results to the case of mod-2 representations (at least, to those representations for which multiplicity 1 is known). The restriction on the local behaviour of $\bar\rho$ at 2 ensures that $\bar\rho|_{G_2}$ is peu ramifié and hence that $\bar\rho$ is finite
at 2 (see [E, Props. 8.5 and 8.2]). Let $N(\bar\rho)$ be the conductor of $\bar\rho$ (see [DDT, Sec. 2.1] for the definition of the conductor of a continuous representation of $G_{\mathbf{Q}}$ over $k$ or $K$). [B, Prop. 2.4 and Th. 3.1] yield the following proposition.

PROPOSITION 1
Suppose that $\bar\rho$ satisfies the modularity assumptions above. Then $\bar\rho$ arises from a weight 2 newform $f$ of level $N(\bar\rho)$. Furthermore, if $\bar\rho$ is unramified at 2, then we may assume that the reduction of $t_2(f)$ to $\bar{k}$ is equal to the chosen eigenvalue $\alpha$ of $\bar\rho(\mathrm{Frob}_2)$.
As a consequence of results of Shepherd-Barron and Taylor, the modularity assumptions are satisfied when $k = \mathbf{F}_4$ and $\bar\rho$ is unramified at 5 (see [ST] and [BDST, Sec. 1] for details).

1.2. Deformations of the residual representation
Let $\mathcal{C}_{\mathcal{O}}$ be the full subcategory of the category of topological $\mathcal{O}$-algebras whose objects are those that arise as an inverse limit of finite-length local $\mathcal{O}$-algebras with residue field $k$ and with the discrete topology; these are the coefficient rings for deformations of $\bar\rho$.

Remark. One could instead work with the smaller category whose objects are Noetherian local $\mathcal{O}$-algebras $R$ with residue field $k$ and maximal ideal $\mathfrak{m}_R$ which are complete with respect to the $\mathfrak{m}_R$-adic topology. However, the Noetherian condition seems somewhat artificial in this context, and working in the larger category $\mathcal{C}_{\mathcal{O}}$ facilitates the proof of existence of a universal deformation; it emerges later (see Corollary 38) that all the universal deformation rings considered are in fact Noetherian.

Let $S$ be a finite set of odd primes, and assume that $S$ contains all odd primes at which $\bar\rho$ is ramified; we define a deformation problem associated to $S$ as follows.

Definition
A lifting of $\bar\rho$ is a pair $(R, \rho)$ consisting of an object $R$ of $\mathcal{C}_{\mathcal{O}}$ together with a continuous representation $\rho : G_{\mathbf{Q}} \to GL_2(R)$ whose reduction modulo the maximal ideal of $R$ is conjugate to $\bar\rho$. An $S$-lifting of $\bar\rho$ is a lifting $(R, \rho)$ of $\bar\rho$ satisfying the following conditions:
(1) $\rho|_{G_2}$ has the form $\begin{pmatrix} \chi\varepsilon_2 & * \\ 0 & \psi \end{pmatrix}$, where $\chi$ and $\psi$ are unramified characters, and if $\bar\rho$ is unramified at 2, then the reduction of $\psi(\mathrm{Frob}_2)$ modulo the maximal ideal of $R$ is equal to $\alpha$;
(2) if $p$ is not in $S \cup \{2\}$, then $\rho$ is unramified at $p$.
Definition
Say that two liftings $(R, \rho)$ and $(R, \sigma)$ of $\bar\rho$ to a ring $R$ are conjugate if the representations $\rho$ and $\sigma$ are; we define a deformation (respectively, an $S$-deformation) of $\bar\rho$ to $R$ to be a conjugacy class of liftings of $\bar\rho$ (respectively, $S$-liftings of $\bar\rho$) to $R$. We abuse notation by writing $(R, \rho)$ for the deformation represented by $(R, \rho)$.

Note that the property of being an $S$-lifting $(R, \rho)$ depends only on the conjugacy class of the representation $\rho$, so that the notion of an $S$-deformation makes sense. If $(R, \rho)$ is an $S$-deformation of $\bar\rho$ and if $\phi : R \to R'$ is a map in $\mathcal{C}_{\mathcal{O}}$, then the pushforward $(R', \rho \otimes_R R')$ of $(R, \rho)$ by $\phi$ is also an $S$-deformation. Thus we have a well-defined functor $\mathrm{Def}_S : \mathcal{C}_{\mathcal{O}} \to \mathrm{Sets}$ which sends an object $R$ of $\mathcal{C}_{\mathcal{O}}$ to the set of $S$-deformations of $\bar\rho$ to $R$. A representability criterion due to A. Grothendieck [Gro] yields the following proposition, whose proof is sketched in Section 2.

PROPOSITION 2
The functor $\mathrm{Def}_S : \mathcal{C}_{\mathcal{O}} \to \mathrm{Sets}$ is representable.
Thus there is a universal $S$-deformation $(R_S^{\mathrm{univ}}, \rho_S^{\mathrm{univ}})$ of $\bar\rho$ with the property that for any $S$-deformation $(R, \rho)$ of $\bar\rho$ there is a unique morphism $\phi : R_S^{\mathrm{univ}} \to R$ in $\mathcal{C}_{\mathcal{O}}$ such that the pushforward of the universal deformation by $\phi$ is equal to $(R, \rho)$.

Let $f$ be a weight 2 newform from which $\bar\rho$ arises. Then a construction of G. Shimura associates to $f$ a compatible system of 2-adic representations, and with our choice of embeddings this yields the following result, which is explained more fully in Section 3.

PROPOSITION 3
Suppose $f$ is a weight 2 newform from which $\bar\rho$ arises, and let $R_f$ be the $\mathcal{O}$-subalgebra of $\overline{K}$ generated by the Hecke eigenvalues $t_p(f)$ of $T_p$ acting on $f$ for odd primes $p$ not dividing the level of $f$. Then there is a unique deformation $(R_f, \rho_f)$ of $\bar\rho$ to $R_f$ which is unramified at $p$ and satisfies $\mathrm{tr}\,\rho_f(\mathrm{Frob}_p) = t_p(f)$ for all odd $p$ not dividing the level of $f$.
We use this to create a modular $S$-deformation of $\bar\rho$. Let $\mathcal{N}_S$ be the set of all weight 2 newforms $f$ of odd level from which $\bar\rho$ arises and for which the deformation $(R_f, \rho_f)$ of $\bar\rho$ is an $S$-deformation. This set is finite (see Proposition 18) and nonempty (see Proposition 17), and we define an object $R_S^{\mathrm{mod}}$ of $\mathcal{C}_{\mathcal{O}}$ to be the $\mathcal{O}$-subalgebra of the product $\prod_{f \in \mathcal{N}_S} R_f$ generated by elements $T_p = (t_p(f))_{f \in \mathcal{N}_S}$ for all odd primes $p$
not in $S$ at which $\bar\rho$ is unramified. Then the product of all the representations $\rho_f$ can be shown to factor through $GL_2(R_S^{\mathrm{mod}})$ (for instance, by [dSL, Prop. 2.6]) and we get an $S$-deformation $(R_S^{\mathrm{mod}}, \rho_S^{\mathrm{mod}})$ of $\bar\rho$. Call this the modular $S$-deformation of $\bar\rho$. By the universal property of $(R_S^{\mathrm{univ}}, \rho_S^{\mathrm{univ}})$, there is a map $\phi_S : R_S^{\mathrm{univ}} \to R_S^{\mathrm{mod}}$ which takes $\mathrm{tr}\,\rho_S^{\mathrm{univ}}(\mathrm{Frob}_p)$ to $\mathrm{tr}\,\rho_S^{\mathrm{mod}}(\mathrm{Frob}_p) = T_p$ for each prime $p$ as above and so is surjective. The purpose of this paper is to prove the following theorem.

THEOREM 4 (Main theorem)
The surjective map $\phi_S : R_S^{\mathrm{univ}} \to R_S^{\mathrm{mod}}$ is an isomorphism, and the $\mathcal{O}$-algebra $R_S^{\mathrm{univ}}$ is a complete intersection ring.
This allows us to give the following 2-adic analogue of results of Diamond, Taylor, and Wiles relating a conjecture of Fontaine and Mazur [FM, Conj. 3c] to Serre’s conjecture.

COROLLARY 5
Let $\rho : G_{\mathbf{Q}} \to GL_2(\overline{\mathbf{Q}}_2)$ be a continuous Galois representation that is unramified at all but finitely many primes, let $\bar\rho : G_{\mathbf{Q}} \to GL_2(\overline{\mathbf{F}}_2)$ denote its reduction, and suppose that
(1) $\rho|_{G_2}$ has the form $\begin{pmatrix} \chi\varepsilon_2 & * \\ 0 & \psi \end{pmatrix}$, where $\chi$ and $\psi$ are unramified characters with distinct reduction to $\overline{\mathbf{F}}_2$;
(2) $\bar\rho$ is irreducible and $\bar\rho(c)$ is nontrivial;
(3) $\bar\rho$ is modular, and if $\bar\rho$ is unramified at 2, then it furthermore arises from a weight 2 newform $f$ of odd level for which $\bar{t}_2(f) = \bar\psi(\mathrm{Frob}_2)$.
Then $\rho$ is modular; more precisely, there is a weight 2 newform $f$ of odd level such that $\mathrm{tr}\,\rho(\mathrm{Frob}_p) = t_p(f)$ for all odd primes $p$ at which $\rho$ is unramified.
Proof
We begin by showing that the image of $\rho$ lies in $GL_2(K)$ for some subfield $K$ of $\overline{\mathbf{Q}}_2$ which has finite degree over $\mathbf{Q}_2$. The following proof of this fact is due to Florian Pop, and it was explained to me by Kevin Buzzard. The image $H$ of $\rho$ is compact and metrisable, and hence it can be considered as a complete metric space; it can also be expressed as a countable union $H = \bigcup_{K'} H \cap GL_2(K')$ of closed subgroups of $GL_2(\overline{\mathbf{Q}}_2)$, where $K'$ ranges over all subfields of $\overline{\mathbf{Q}}_2$ which are finite over $\mathbf{Q}_2$. By Baire’s category theorem, one of these subgroups $H \cap GL_2(K')$ has nonempty interior and hence is open, and so has finite index, in $H$. Now, let $K$ be the subfield of $\overline{\mathbf{Q}}_2$ generated by $K'$ and the entries of each of some set of coset representatives for $H/H \cap GL_2(K')$; then $K$ is a finite extension of $\mathbf{Q}_2$ and $H$ is contained in $GL_2(K)$.
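The topological steps used implicitly in this Baire category argument are standard; the following sketch records them. It is an illustrative expansion rather than part of the original argument, and the symbols $H'$, $U$, $u$ are introduced here only for this purpose.

```latex
\begin{itemize}
  \item \emph{Countability.} $\mathbf{Q}_2$ has only finitely many extensions of each
        degree inside $\overline{\mathbf{Q}}_2$, so the finite extensions $K'$ form a
        countable family.
  \item \emph{Closedness.} Each $K'$ is complete, hence closed in
        $\overline{\mathbf{Q}}_2$, so $H \cap GL_2(K')$ is closed in $H$.
  \item \emph{Nonempty interior implies open.} If a subgroup $H' \le H$ contains a
        nonempty open set $U$ and $u \in U$, then
        \[ H' = \bigcup_{h \in H'} h\,u^{-1}U \]
        is a union of open sets.
  \item \emph{Open implies finite index.} The cosets of an open subgroup $H'$ form a
        disjoint open cover of the compact group $H$, so $[H : H'] < \infty$.
\end{itemize}
```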
Since the image of $\rho$ is compact, it stabilises a lattice in $K^2$; so after conjugation we may assume that the image of $\rho$ is contained in $\mathrm{GL}_2(\mathcal{O})$, where $\mathcal{O}$ is the ring of integers of $K$. One can now check that the mod-2 reduction $\bar\rho$ of $\rho$ satisfies the local conditions at 2 and $\infty$ and the modularity hypotheses given in Section 1.1 above, and that $\rho$ is an $S$-deformation of $\bar\rho$ where $S$ consists of all odd primes at which $\rho$ is ramified. So by the universal property we get a map $R_S^{\mathrm{univ}} \to \mathcal{O}$ which sends $\mathrm{tr}\,\rho_S^{\mathrm{univ}}(\mathrm{Frob}_p)$ to $\mathrm{tr}\,\rho(\mathrm{Frob}_p)$ for all odd $p$ not in $S$, and by the main theorem this corresponds to a map $R_S^{\mathrm{mod}} \to \mathcal{O} \subset \bar{\mathbf{Q}}_2$ sending $T_p$ to $\mathrm{tr}\,\rho(\mathrm{Frob}_p)$ for all odd $p$ not in $S$. But, by the construction of $R_S^{\mathrm{mod}}$, any map $R_S^{\mathrm{mod}} \to \bar{\mathbf{Q}}_2$ of $\mathcal{O}$-algebras has the form $T_p \mapsto t_p(f)$ for some newform $f$ in $\mathcal{N}_S$; thus $t_p(f) = \mathrm{tr}\,\rho(\mathrm{Frob}_p)$ for all odd $p$ at which $\rho$ is unramified.

Remark. The restriction to forms of odd level in the definition of $\mathcal{N}_S$ is redundant: given any weight 2 newform $f$ that gives rise to an $S$-deformation $(R_f, \rho_f)$ of $\bar\rho$, we can apply the corollary above to $\rho_f \otimes_{R_f} \bar{K}$ to deduce that there is a weight 2 newform $g$ of odd level with $t_p(g) = \mathrm{tr}\,\rho_f(\mathrm{Frob}_p) = t_p(f)$ for all but finitely many primes $p$. By multiplicity one, $f$ and $g$ are identical.

The ring $R_S^{\mathrm{mod}}$ can be identified with the localisation at a maximal ideal of a Hecke algebra acting on a particular space of cusp forms of odd level, and we have a “multiplicity one” result in this case. For the applications to Artin’s conjecture in [BDST], we also need a variant of this which identifies $R_S^{\mathrm{mod}}$ with the localisation of the Hecke algebra acting on a space of cusp forms of even level.

PROPOSITION 6
Let $S$ be a finite set of odd primes containing all those odd primes at which $\bar\rho$ is ramified. Let $N$ be either
$$N(\bar\rho) \prod_{p \in S} p^{\dim_k \bar\rho^{I_p}} \quad\text{or}\quad 2N(\bar\rho) \prod_{p \in S} p^{\dim_k \bar\rho^{I_p}},$$
where $N(\bar\rho)$ is the conductor of $\bar\rho$. Let $\mathbf{T}_N$ be the polynomial ring over $\mathcal{O}$ generated by indeterminates $T_p$ for $p$ not dividing $N$ and $U_p$ for $p$ dividing $N$, and let $\mathfrak{m}_N$ be the kernel of the map $\mathbf{T}_N \to k$ which sends $T_p$ to $\mathrm{tr}\,\bar\rho(\mathrm{Frob}_p)$ for odd $p$ not dividing $N$, $U_p$ to zero for odd $p$ dividing $N$, and $T_2$ or $U_2$ (as appropriate) to $\alpha$. Let $X_1(N)$ be the standard compactification of the modular curve associated to the congruence subgroup $\Gamma_1(N)$ of $\mathrm{SL}_2(\mathbf{Z})$. Then $\mathbf{T}_N$ acts naturally on the cohomology group $H^1(X_1(N), \mathcal{O})$ and
(1) the action of $R_S^{\mathrm{mod}}$ on $H^1(X_1(N), \mathcal{O})_{\mathfrak{m}_N}$ under which $T_p \in R_S^{\mathrm{mod}}$ acts by the double coset operator $T_p$ is well defined, and it identifies $R_S^{\mathrm{mod}}$ with the image of $\mathbf{T}_N$ in $\mathrm{End}_{\mathcal{O}}\, H^1(X_1(N), \mathcal{O})_{\mathfrak{m}_N}$;
(2) $H^1(X_1(N), \mathcal{O})_{\mathfrak{m}_N}$ is free of rank 2 over $R_S^{\mathrm{mod}}$;
(3) for each odd $p$ dividing $N$, the operator $U_p$ annihilates $H^1(X_1(N), \mathcal{O})_{\mathfrak{m}_N}$.
Proof
In the case where $N$ is odd, this follows from Theorem 28, Corollary 25, and Proposition 26; when $N$ is even it follows from Theorem 28 and the variant described at the end of Section 7.

The remainder of this article is organised as follows. In Section 2 we demonstrate the existence of the universal deformation ring $R_S^{\mathrm{univ}}$; in Section 3 we give some results describing the local behaviour of the Galois representation associated to a weight 2 newform; and in Section 4 we give some results about sheaf cohomology on modular curves. In Section 5 we describe the modular deformation ring $R_S^{\mathrm{mod}}$. Section 6 introduces some representations of $\mathrm{GL}_2(\mathbf{Z}_p)$ which are used in Section 7 to relate the conditions of the deformation problems to properties of modular forms; Section 7 also introduces modules for the rings $R_S^{\mathrm{mod}}$, based on cohomology groups of modular curves. In Section 8 we restate the main theorem, and we give proofs in the minimal and nonminimal cases in Sections 9 and 10, respectively, deferring the proof of some results about dimensions of Selmer groups and cardinalities of sets of newforms to Sections 11 and 12, respectively.

2. Deformation theory
In this section we give a criterion for the existence of a universal deformation ring, based on a representability criterion of Grothendieck, and we use this to demonstrate the existence of a universal $S$-deformation $(R_S^{\mathrm{univ}}, \rho_S^{\mathrm{univ}})$ of $\bar\rho$. We also describe the behaviour of the universal deformation under extension of scalars and under twisting of $\bar\rho$, and we describe how to give $R_S^{\mathrm{univ}}$ the structure of an algebra over a group ring for some special sets $S$.

2.1. A criterion for existence of the universal deformation
Let $d$ be a positive integer, let $\bar\rho : G \to \mathrm{GL}_d(k)$ be a continuous representation of a profinite group $G$, and suppose that the centraliser in $M_d(k)$ of the image of $\bar\rho$ contains only the scalar matrices. Define a deformation of $\bar\rho$ to $R$ as in Section 1 to be a conjugacy class of liftings $(R, \rho)$ of $\bar\rho$. Suppose that certain deformations are designated “$\mathcal{P}$-deformations” and that if $(R, \rho)$ is a $\mathcal{P}$-deformation of $\bar\rho$ to $R$ and if $\varphi : R \to R'$ is any map in $\mathcal{C}_{\mathcal{O}}$, then the pushforward $(R', \rho \otimes_R R')$ is also a $\mathcal{P}$-deformation. Then we get a well-defined functor
$$\mathrm{Def}_{\mathcal{P}} : \mathcal{C}_{\mathcal{O}} \to \mathrm{Sets}$$
which sends an object $R$ of $\mathcal{C}_{\mathcal{O}}$ to the set of $\mathcal{P}$-deformations of $\bar\rho$ to $R$, and this functor is representable if and only if there is a universal $\mathcal{P}$-deformation of $\bar\rho$. The following proposition uses a representability criterion due to Grothendieck to give a criterion for the existence of a universal deformation.

PROPOSITION 7
Let $\mathcal{C}_{\mathcal{O}}^{\mathrm{fl}}$ be the category of finite-length local $\mathcal{O}$-algebras with residue field $k$, regarded as a full subcategory of $\mathcal{C}_{\mathcal{O}}$. The following three conditions are necessary and sufficient for the existence of a universal $\mathcal{P}$-deformation of $\bar\rho$:
(1) the deformation $(k, \bar\rho)$ is a $\mathcal{P}$-deformation;
(2) given a diagram $R \xrightarrow{\varphi} T \xleftarrow{\psi} S$ in $\mathcal{C}_{\mathcal{O}}^{\mathrm{fl}}$ and a deformation $(R \times_T S, \rho)$ of $\bar\rho$, if the pushforwards of this deformation to $R$ and to $S$ are both $\mathcal{P}$-deformations, then so is $(R \times_T S, \rho)$; and
(3) if $R$ is a filtered limit of objects $(R_i)_{i \in I}$ of $\mathcal{C}_{\mathcal{O}}^{\mathrm{fl}}$ and if the pushforward of a deformation $(R, \rho)$ to $R_i$ is a $\mathcal{P}$-deformation for each $i$ in $I$, then $(R, \rho)$ is a $\mathcal{P}$-deformation.
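As a reminder of what the fibre product in condition (2) looks like concretely, the following display recalls the standard definition together with one small instance. It is an illustrative aside rather than part of the original argument; the symbols $\varepsilon_1, \varepsilon_2$ are introduced here only for the example.

```latex
% Standard description of the fibre product of R --phi--> T <--psi-- S.
R \times_T S \;=\; \{\, (r, s) \in R \times S \;:\; \varphi(r) = \psi(s) \,\}.

% Example (assumed notation): gluing two copies of the dual numbers
% k[\varepsilon] = k[X]/(X^2) along their common residue field k.
% The pair (a + b\varepsilon, a + c\varepsilon) corresponds to
% a + b\varepsilon_1 + c\varepsilon_2.
k[\varepsilon] \times_k k[\varepsilon]
  \;\cong\; k[\varepsilon_1, \varepsilon_2] / (\varepsilon_1, \varepsilon_2)^2 .
```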
By a filtered limit we mean the limit of a system of objects and maps indexed by a category $I$ whose opposite is filtered in the sense of [ML, §IX.1].

Proof
The category $\mathcal{C}_{\mathcal{O}}$ can be identified with the category of pro-objects of $\mathcal{C}_{\mathcal{O}}^{\mathrm{fl}}$, in the sense of [Gro, Sec. A2], and it follows from [Gro, corollary to Prop. 3.1, Sec. A] that a covariant set-valued functor on $\mathcal{C}_{\mathcal{O}}$ is representable if and only if it preserves filtered limits (taken in $\mathcal{C}_{\mathcal{O}}$) of objects of $\mathcal{C}_{\mathcal{O}}^{\mathrm{fl}}$ and its restriction to $\mathcal{C}_{\mathcal{O}}^{\mathrm{fl}}$ is left exact (that is, preserves finite limits). One can check directly (see [G, appendix to Chap. 3]) that the functor $\mathrm{Def} : \mathcal{C}_{\mathcal{O}} \to \mathrm{Sets}$ which sends an object $R$ to the set of all deformations of $\bar\rho$ to $R$ satisfies these conditions and hence is representable. Now the first two conditions of the proposition assert that the subfunctor $\mathrm{Def}_{\mathcal{P}}$ of $\mathrm{Def}$ is left exact when restricted to $\mathcal{C}_{\mathcal{O}}^{\mathrm{fl}}$, and the third condition ensures that $\mathrm{Def}_{\mathcal{P}}$ commutes with filtered limits of objects of $\mathcal{C}_{\mathcal{O}}^{\mathrm{fl}}$.

A measure of the size of the universal deformation ring can be obtained by considering its tangent space, $\mathrm{Hom}_{\mathcal{C}_{\mathcal{O}}}(R_{\mathcal{P}}^{\mathrm{univ}}, k[\varepsilon]) \cong \mathrm{Def}_{\mathcal{P}}(k[\varepsilon])$. Here $k[\varepsilon]$ denotes the $k$-vector space object $k[X]/(X^2)$ of $\mathcal{C}_{\mathcal{O}}$, which gives each of $\mathrm{Def}_{\mathcal{P}}(k[\varepsilon])$ and $\mathrm{Hom}_{\mathcal{C}_{\mathcal{O}}}(R_{\mathcal{P}}^{\mathrm{univ}}, k[\varepsilon])$ the structure of a vector space over $k$. One can show that, for a general object $R$ of $\mathcal{C}_{\mathcal{O}}$, the tangent space $\mathrm{Hom}_{\mathcal{C}_{\mathcal{O}}}(R, k[\varepsilon])$ is finite-dimensional if and only if $R$ is Noetherian; more precisely, if $\mathrm{Hom}_{\mathcal{C}_{\mathcal{O}}}(R, k[\varepsilon])$ is finite-dimensional of dimension $d$, then $R$ is a quotient of the power-series ring $\mathcal{O}[[X_1, \ldots, X_d]]$ in $d$ variables over $\mathcal{O}$. Thus the dimension of $\mathrm{Def}_{\mathcal{P}}(k[\varepsilon])$ over $k$ controls the number of elements required to generate $R_{\mathcal{P}}^{\mathrm{univ}}$ as a topological $\mathcal{O}$-algebra.

2.2. Universal S-deformations
In this section we show that the universal deformation ring of Section 1 exists. We first extend the notion of an $S$-deformation by giving a definition valid for any finite set of odd primes $S$, not necessarily containing the odd primes at which $\bar\rho$ is ramified. In particular, we describe a deformation problem for the special case $S = \emptyset$, corresponding to deformations of $\bar\rho$ which are everywhere “minimally ramified.” As in [Wi2], we prove the main theorem of Section 1 in two major steps: first we prove the analogous result when $S = \emptyset$, then we use an inductive argument to deduce the result for larger $S$.

Definition
Let $S$ be a finite set of odd primes. An $S$-lifting of $\bar\rho$ is a lifting $(R, \rho)$ satisfying the following conditions:
(1) $\rho|_{G_2}$ has the form $\begin{pmatrix} \chi\varepsilon_2 & * \\ 0 & \psi \end{pmatrix}$, where $\chi$ and $\psi$ are unramified characters, and if $\bar\rho$ is unramified at 2, then the reduction of $\psi(\mathrm{Frob}_2)$ modulo the maximal ideal of $R$ is equal to $\alpha$;
(2) if $p$ is an odd prime not in $S$ and if $\bar\rho|_{I_p}$ is absolutely irreducible, then $(\det\rho)|_{I_p}$ has finite odd order;
(3) if $p$ is an odd prime not in $S$ and if $\bar\rho|_{I_p}$ is semisimple but not absolutely irreducible, then the reduction map $\rho(I_p) \to \bar\rho(I_p)$ is an isomorphism;
(4) if $p$ is an odd prime not in $S$ and if $\bar\rho|_{I_p}$ is not semisimple, then $\bar\rho|_{I_p}$ has the form $\begin{pmatrix} \chi & * \\ 0 & \chi \end{pmatrix}$ and we require that the restriction of $\rho$ to $I_p$ have the form $\begin{pmatrix} \tilde\chi & * \\ 0 & \tilde\chi \end{pmatrix}$, where $\tilde\chi$ is the Teichmüller lift of $\chi$.
As before, an $S$-deformation of $\bar\rho$ is a conjugacy class of $S$-liftings of $\bar\rho$.

By applying Proposition 7 to the deformation problem above, we obtain the following result.
PROPOSITION 8
For a finite set $S$ of odd primes, the following are true:
(1) a universal $S$-deformation $(R_S^{\mathrm{univ}}, \rho_S^{\mathrm{univ}})$ of $\bar\rho$ exists;
(2) if $K'$ is a finite extension of $K$ with ring of integers $\mathcal{O}'$ and residue field $k'$, then $(R_S^{\mathrm{univ}}, \rho_S^{\mathrm{univ}}) \otimes_{\mathcal{O}} \mathcal{O}'$ is a universal $S$-deformation of $\bar\rho' = \bar\rho \otimes_k k'$;
(3) let $\delta : G_{\mathbf{Q}} \to k^\times$ be a continuous character of odd conductor, and let $\tilde\delta$ be its Teichmüller lift; then if $(R_S^{\mathrm{univ}}, \rho_S^{\mathrm{univ}})$ is a universal $S$-deformation of $\bar\rho$, the pair $(R_S^{\mathrm{univ}}, \rho_S^{\mathrm{univ}} \otimes_{\mathcal{O}} \tilde\delta)$ is a universal $S$-deformation of $\bar\rho \otimes_k \delta$ (defined with respect to the eigenvalue $\delta(\mathrm{Frob}_2)\alpha$ of $(\bar\rho \otimes_k \delta)(\mathrm{Frob}_2)$ if $\bar\rho$ is unramified at 2).
Furthermore, if $S \subset S'$ is an inclusion of finite sets of odd primes, then every $S$-deformation of $\bar\rho$ is also an $S'$-deformation of $\bar\rho$, and the corresponding map $R_{S'}^{\mathrm{univ}} \to R_S^{\mathrm{univ}}$ of universal deformation rings is surjective.

Proof
The first three results all follow from various invariance properties of the condition of being an $S$-deformation. The first result follows on checking the conditions of Proposition 7. These checks are for the most part straightforward; perhaps the most awkward point is to show that the restrictions at 2 and at odd $p$ for which $\bar\rho|_{I_p}$ is not semisimple on the behaviour of a deformation $(R, \rho)$ satisfy the conditions of Proposition 7. For the condition at 2, we may replace $\bar\rho$ by a conjugate to suppose that $\bar\rho|_{G_2}$ has the form $\begin{pmatrix} \chi_1 & * \\ 0 & \chi_2 \end{pmatrix}$ and that if $\bar\rho$ is unramified at 2, then $\chi_2(\mathrm{Frob}_2) = \alpha$. Let $(R, \rho)$ be any lifting of $\bar\rho$ which satisfies the condition at 2 and for which the reduction of $\rho$ is actually equal to (rather than just conjugate to) $\bar\rho$; now show that there is a unique element $\lambda$ of the maximal ideal of $R$ such that $\begin{pmatrix} 1 & 0 \\ \lambda & 1 \end{pmatrix} \rho \begin{pmatrix} 1 & 0 \\ -\lambda & 1 \end{pmatrix}$ has the form $\begin{pmatrix} \psi_1\varepsilon_2 & * \\ 0 & \psi_2 \end{pmatrix}$ and if $\bar\rho$ is unramified at 2, then $\psi_2(\mathrm{Frob}_2)$ lifts $\alpha$; the required results then follow easily. The argument at the odd $p$ mentioned above is similar.

For the second result, check that if $(R, \rho)$ is an $S$-deformation of $\bar\rho$, then $(R \otimes_{\mathcal{O}} \mathcal{O}', \rho \otimes_{\mathcal{O}} \mathcal{O}')$ is an $S$-deformation of $\bar\rho'$, and conversely that if $(R', \rho')$ is an $S$-deformation of $\bar\rho'$, then the deformation $(R' \times_{k'} k, \rho' \times_{k'} k)$ of $\bar\rho$ obtained by conjugating $\rho'$ so that it reduces to $\bar\rho'$ and hence factors through $\mathrm{GL}_2(R' \times_{k'} k)$ is also an $S$-deformation. Then the operations $(R, \rho) \mapsto (R \otimes_{\mathcal{O}} \mathcal{O}', \rho \otimes_{\mathcal{O}} \mathcal{O}')$ and $(R', \rho') \mapsto (R' \times_{k'} k, \rho' \times_{k'} k)$ provide a pair of adjoint functors between the (suitably defined) category of $S$-deformations of $\bar\rho$ and the category of $S$-deformations of $\bar\rho'$, and the result follows.
The third part is a consequence of the fact that if $(R, \rho)$ is an $S$-deformation of $\bar\rho$, then $(R, \rho \otimes_{\mathcal{O}} \tilde\delta)$ is an $S$-deformation of $\bar\rho \otimes_k \delta$ and vice versa.

Now, suppose that $S \subset S'$ is an inclusion of finite sets of odd primes. It is clear from the definition that every $S$-deformation is an $S'$-deformation, so that there is a natural inclusion $\mathrm{Def}_S \to \mathrm{Def}_{S'}$ of functors. It follows purely formally that the corresponding map $R_{S'}^{\mathrm{univ}} \to R_S^{\mathrm{univ}}$ is an epimorphism in the category $\mathcal{C}_{\mathcal{O}}$, and one can check that every epimorphism in $\mathcal{C}_{\mathcal{O}}$ is in fact a surjection.

The restrictions $\bar\rho|_{G_p}$ of $\bar\rho$ for various $p$ can exhibit several different kinds of behaviour, and many of the later arguments involve examining each possibility in turn; in order to simplify the proof of the main theorem, we first reduce the number of cases by twisting to remove any unnecessary ramification in $\bar\rho$.
Definition
Say $\bar\rho$ is minimal if no twist of $\bar\rho \otimes_k \bar{k}$ by a continuous $\bar{k}^\times$-valued character of $G_{\mathbf{Q}}$ has conductor strictly smaller than that of $\bar\rho$.

After suitable extension of scalars, any $\bar\rho$ satisfying the assumptions of Section 1 becomes minimal after twisting by a $k^\times$-valued character of odd conductor. (Recall that the conductor of $\bar\rho$ is odd by definition.) If $\bar\rho$ is minimal, then for every odd prime $p$ exactly one of the following occurs:
(1) $\bar\rho$ is unramified at $p$;
(2) $\bar\rho|_{G_p}$ has the form $\begin{pmatrix} \chi_1 & 0 \\ 0 & \chi_2 \end{pmatrix}$, where $\chi_1$ is ramified and $\chi_2$ is unramified;
(3) $\bar\rho|_{G_p}$ is ramified and has the form $\begin{pmatrix} \chi & * \\ 0 & \chi \end{pmatrix}$, where $\chi$ is unramified;
(4) $\bar\rho|_{I_p} \otimes_k \bar{k}$ is decomposable and $\bar\rho|_{G_p}$ is absolutely irreducible; or
(5) $\bar\rho|_{I_p}$ is absolutely irreducible.
For any $\bar\rho$, let $T(\bar\rho)$ denote the set of $p$ for which $\bar\rho|_{G_p}$ is absolutely irreducible but $\bar\rho|_{I_p}$ is not; for minimal $\bar\rho$ these are the $p$ satisfying the fourth condition of the list above. By replacing $K$ with its unique unramified quadratic extension if necessary, we can ensure that $\bar\rho|_{I_p}$ is decomposable over $k$ for each $p$ in $T(\bar\rho)$. We now give a more explicit description of some of the lifting conditions in the case when $\bar\rho$ is minimal.

PROPOSITION 9
Suppose that $\bar\rho$ is minimal, and suppose that $\bar\rho|_{I_p}$ is decomposable for each $p$ in $T(\bar\rho)$. Let $(R, \rho)$ be a deformation of $\bar\rho$, and let $p$ be an odd prime.
(1) If $\bar\rho|_{G_p}$ is decomposable and is either unramified with $\bar\rho(\mathrm{Frob}_p)$ having distinct eigenvalues or is ramified, then $\rho$ is the direct sum of two characters. If $p$ is not in $S$, then $(R, \rho)$ satisfies the lifting condition at $p$ if and only if the restrictions to $I_p$ of the summands of $\rho$ are the Teichmüller lifts of those of $\bar\rho$.
(2) If $p$ is in $T(\bar\rho)$, then $\rho$ is induced from a character $\chi : H \to R^\times$ of the unique index 2 subgroup $H$ of $G_p$ which contains $I_p$; furthermore, the conductors of $\chi$, $\chi \circ \mathrm{Frob}_p$, and $\chi/\chi \circ \mathrm{Frob}_p$ are equal, where $\mathrm{Frob}_p$ denotes the automorphism of $H^{\mathrm{ab}}$ given by conjugation by $\mathrm{Frob}_p$ in $G_p$. If $p$ is not in $S$, then $(R, \rho)$ satisfies the lifting condition at $p$ if and only if the summands of $\rho|_{I_p}$ are equal to the Teichmüller lifts of the summands of $\bar\rho|_{I_p}$.
The matrix $\rho(c)$ is conjugate to $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$.
For an odd prime $p$ not in $S$ and a deformation $(\mathcal{O}, \rho)$ of $\bar\rho$, a necessary condition for this deformation to satisfy the condition at $p$ is that the $p$-part of the conductor of $\rho \otimes_{\mathcal{O}} K$ be equal to the $p$-part of the conductor of $\bar\rho$ and that $\det\rho|_{I_p}$ be the Teichmüller lift of $\det\bar\rho|_{I_p}$. If $p$ is not in $T(\bar\rho)$, then this necessary condition is also sufficient.
Proof
See [Dia2, Secs. 2 and 3].

In the proof of the main theorem for the case $S = \emptyset$, we need to consider a sequence of universal deformation rings $(R_{Q_n}^{\mathrm{univ}})_{n \ge 1}$, each associated to a finite set $Q_n$ consisting entirely of odd primes $p$ at which $\bar\rho$ is unramified and for which $\bar\rho(\mathrm{Frob}_p)$ has distinct $k$-rational eigenvalues. For these particular sets of primes, the associated universal deformation ring has the natural structure of an algebra over a particular group ring, as we see below. For clarity we use the letter $Q$ in place of $S$ for sets of primes appearing in this situation. We associate to any such $Q$ an abelian 2-group $G_Q$.

Notation. If $Q$ is a finite set consisting of odd primes $p$ at which $\bar\rho$ is unramified and for which $\bar\rho(\mathrm{Frob}_p)$ has distinct $k$-rational eigenvalues, then write $G_Q$ for the maximal 2-power quotient of the abelian group
$$\prod_{(p,\alpha)} (\mathbf{Z}/p\mathbf{Z})^\times / \{\pm 1\}$$
where the product runs over all pairs $(p, \alpha)$ consisting of an element $p$ of $Q$ and an eigenvalue $\alpha$ of $\bar\rho(\mathrm{Frob}_p)$.

We describe how to define a map $\mathcal{O}[G_Q] \to R_Q^{\mathrm{univ}}$ in $\mathcal{C}_{\mathcal{O}}$, where $\mathcal{O}[G_Q]$ is the group ring of the abelian group $G_Q$. Let $(R, \rho)$ be any lifting of $\bar\rho$; then by Proposition 9 the restriction $\rho|_{W_p}$ is decomposable for each $p$ in $Q$; for each pair $(p, \alpha)$, let $\nu_{p,\alpha} : W_p^{\mathrm{ab}} \to R^\times$ denote the summand of $\rho$ which reduces to the unramified character sending $\mathrm{Frob}_p$ to $\alpha$. We can collect the maps $\nu_{p,\alpha}$ together to obtain a map
$$\nu_\rho = \prod_{(p,\alpha)} (\nu_{p,\alpha} \circ \omega_p^{-1})|_{\mathbf{Z}_p^\times} : \prod_{(p,\alpha)} \mathbf{Z}_p^\times \to R^\times.$$
By the following lemma, this gives a group homomorphism of $G_Q$ into $R^\times$.

LEMMA 10
Let $(R, \rho)$ be any $Q$-deformation of $\bar\rho$. The map $\nu_\rho$ factors through the natural projection $\prod_{(p,\alpha)} \mathbf{Z}_p^\times \to G_Q$.
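To get a feel for the size of the group $G_Q$ defined in the notation above, the following minimal Python sketch counts its elements. It assumes only the definition given there: each prime $p$ in $Q$ contributes two copies of $(\mathbf{Z}/p\mathbf{Z})^\times$ (one per eigenvalue), the element $-1$ is embedded diagonally, and the maximal 2-power quotient of a finite abelian group has order equal to the 2-part of the group order. The function names are hypothetical and chosen for this illustration.

```python
def two_adic_valuation(n):
    """Return the exponent of the largest power of 2 dividing the positive integer n."""
    v = 0
    while n % 2 == 0:
        n //= 2
        v += 1
    return v


def order_of_G_Q(primes):
    """Order of the maximal 2-power quotient of prod_{(p, alpha)} (Z/pZ)^x / {+-1}.

    `primes` is the list of odd primes in Q; each prime occurs in two pairs
    (p, alpha), one for each eigenvalue of Frobenius. Illustrative helper only.
    """
    if not primes:
        # Empty product: the group is trivial.
        return 1
    # The 2-part of prod (p - 1)^2, divided by 2 for the quotient by {+-1}.
    exponent = sum(2 * two_adic_valuation(p - 1) for p in primes) - 1
    return 2 ** exponent


if __name__ == "__main__":
    for Q in ([], [3], [5], [3, 5], [17]):
        print(Q, order_of_G_Q(Q))
```

For example, with $Q = \{3\}$ the group is $(C_2 \times C_2)/\{\pm 1\}$, which has order 2.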
Proof
Each character $\nu_{p,\alpha}$ has unramified reduction, and so its restriction to $I_p$ has 2-power order. Hence $\nu_\rho$ factors through the maximal 2-power order quotient of $\prod_{(p,\alpha)} (\mathbf{Z}/p\mathbf{Z})^\times$. In order to show that $\nu_\rho$ factors through $G_Q$, it remains to show that the image of the diagonally embedded element $-1$ is trivial in $R^\times$. Since $(R, \rho)$ is a $Q$-deformation of $\bar\rho$, it follows from Proposition 9 that $\det\rho/\varepsilon_2$ is the product of the Teichmüller lift of $\det\bar\rho$ with some finite order character $\psi : G_{\mathbf{Q}} \to R^\times$ which is unramified outside $Q$, and which satisfies $\psi|_{I_p} = (\nu_{p,\alpha}\nu_{p,\beta})|_{I_p}$ for each $p$ in $Q$. Then, since $\rho(c)$ is conjugate to $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, both $\det\rho/\varepsilon_2$ and the Teichmüller lift of $\det\bar\rho$ send $c$ to 1, and so
$$1 = \psi(c) = \prod_{p \in Q} (\psi|_{W_p} \circ \omega_p^{-1})(-1) = \prod_{(p,\alpha)} (\nu_{p,\alpha} \circ \omega_p^{-1})(-1) = \nu_\rho(c),$$
as required.

In particular, for the universal deformation $(R_Q^{\mathrm{univ}}, \rho_Q^{\mathrm{univ}})$ we obtain a natural map
$$\mathcal{O}[G_Q] \to R_Q^{\mathrm{univ}},$$
as desired.

Finally, in this section, suppose that $(R, \rho)$ is a $Q$-deformation that is unramified at all primes $p$ in $Q$; then the characters $\nu_{p,\alpha}$ defined above are all unramified, and so the character $\nu_\rho : G_Q \to R^\times$ is trivial. Since any $\emptyset$-deformation of $\bar\rho$ satisfies this condition, the composite map
$$G_Q \to (R_Q^{\mathrm{univ}})^\times \to (R_\emptyset^{\mathrm{univ}})^\times$$
has trivial image. (It is not hard to see that the natural map $R_Q^{\mathrm{univ}} \to R_\emptyset^{\mathrm{univ}}$ identifies $R_\emptyset^{\mathrm{univ}}$ with the $G_Q$-coinvariants of $R_Q^{\mathrm{univ}}$, but we do not need to use this fact.)

3. Modular forms and Galois representations
In this section we explain how to construct the deformation $(R_f, \rho_f)$ of $\bar\rho$ associated to a weight 2 newform $f$ from which $\bar\rho$ arises. In order to decide which newforms give rise to $S$-deformations of $\bar\rho$ for some given set of odd primes $S$, we need a good understanding of the local behaviour of $\rho_f$, that is, of the restrictions $\rho_f|_{G_p}$ for various primes $p$. For odd $p$, an excellent description of $\rho_f|_{G_p}$ is provided by a theorem of H. Carayol, via the adelic description of modular forms and the local Langlands correspondence for $\mathrm{GL}_2$. The results we need in order to understand $\rho_f|_{G_2}$ are provided by Wiles [Wi1] and Fontaine.

First we explain how to construct the deformation described in Proposition 3. We start with the following well-known result of Shimura.

THEOREM 11 (Shimura)
Let $f$ be a weight 2 newform of level $N_f$ and character $\chi_f$, and, for each prime $p$ not dividing $N_f$, write $t_p(f)$ for the eigenvalue of the Hecke operator $T_p$ acting on $f$, regarded as an element of $\bar{K}$. Let $K_f$ be the finite extension of $K$ generated by
these $t_p(f)$. Then there is a continuous absolutely irreducible Galois representation
$$\rho_f : G_{\mathbf{Q}} \to \mathrm{GL}_2(K_f)$$
with the property that for all odd primes $p$ not dividing $N_f$ the representation $\rho_f$ is unramified at $p$ and $\rho_f(\mathrm{Frob}_p)$ has characteristic polynomial $X \mapsto X^2 - t_p(f)X + p\chi_f(p)$.

Now, suppose that the representation $\bar\rho : G_{\mathbf{Q}} \to \mathrm{GL}_2(k)$ arises from some particular weight 2 newform $f$; then we construct a deformation of $\bar\rho$ from $\rho_f$ as follows. Let $\mathcal{O}_f$ be the ring of integers of $K_f$, and let $k_f$ be its residue field. Since the image of $\rho_f$ is compact, one can find a conjugate of $\rho_f$ whose image lies in $\mathrm{GL}_2(\mathcal{O}_f)$, and then one can reduce this to obtain a representation $\bar\rho_f : G_{\mathbf{Q}} \to \mathrm{GL}_2(k_f)$. By assumption, $\mathrm{tr}\,\bar\rho(\mathrm{Frob}_p)$ is equal to the reduction of $t_p(f)$, which by the above theorem is equal to $\mathrm{tr}\,\bar\rho_f(\mathrm{Frob}_p)$, and so by the Čebotarev density theorem and the Brauer-Nesbitt theorem the semisimplifications of $\bar\rho_f$ and $\bar\rho$ are isomorphic. But $\bar\rho$ is absolutely irreducible; hence $\bar\rho_f$ is irreducible, and it follows that any two $\mathrm{GL}_2(K_f)$-conjugates of $\rho_f$ with image in $\mathrm{GL}_2(\mathcal{O}_f)$ are in fact conjugate by an element of $\mathrm{GL}_2(\mathcal{O}_f)$, so that we get a well-defined conjugacy class of representations over $\mathcal{O}_f$. After conjugating by a suitable element of $\mathrm{GL}_2(\mathcal{O}_f)$, the representation $\rho_f$ reduces to $\bar\rho \otimes_k k_f$ and hence can be written with coefficients in the subobject $\mathcal{O}_f \times_{k_f} k$ of $\mathcal{O}_f$. Now, let $R_f$ be the $\mathcal{O}$-subalgebra of $\mathcal{O}_f \times_{k_f} k$ generated by elements $t_p(f)$ for all $p$ not dividing the level of $f$; then by [dSL, Prop. 2.6], the representation $\rho_f$ factors through $\mathrm{GL}_2(R_f)$ and we obtain a deformation $(R_f, \rho_f)$, as required.

3.1. Automorphic representations
To describe the representations $\rho_f|_{G_p}$, we need to use the adelic description of the theory of modular forms (for an overview of this and for additional references, see [DI, Secs. 11 and 12]). The main results that we require are as follows.
A weight 2 newform $f$ naturally gives rise to an irreducible admissible representation $\pi_f$ of $\mathrm{GL}_2(\mathbf{A}^\infty)$ over $\mathbf{C}$. This representation is defined over the finite extension of $\mathbf{Q}$ generated by the elements $t_p(f)$ for odd primes $p$ not dividing the level of $f$; thus we can and do regard $\pi_f$ as a representation over $\bar{K}$. There is a decomposition
$$\pi_f \cong {\bigotimes_p}' \pi_p,$$
where each $\pi_p$ is an infinite-dimensional irreducible admissible representation of $\mathrm{GL}_2(\mathbf{Q}_p)$ over $\bar{K}$. (The prime on the symbol $\bigotimes_p'$ indicates that we are taking a restricted tensor product with respect to the subspaces $\pi_p^{\mathrm{GL}_2(\mathbf{Z}_p)}$ of $\pi_p$.)

We define open compact subgroups of $\mathrm{GL}_2(\mathbf{Q}_p)$ as follows. For $n \ge 0$, consider the natural reduction map
$$\mathrm{GL}_2(\mathbf{Z}_p) \to \mathrm{GL}_2(\mathbf{Z}/p^n\mathbf{Z});$$
we define subgroups $U(p^n)$, $U_2(p^n)$, $U_1(p^n)$, and $U_0(p^n)$ to be the inverse images under this map of matrices of the form $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, $\begin{pmatrix} 1 & * \\ 0 & 1 \end{pmatrix}$, $\begin{pmatrix} * & * \\ 0 & 1 \end{pmatrix}$, and $\begin{pmatrix} * & * \\ 0 & * \end{pmatrix}$, respectively. If $U_2(p^n) \subset V \subset U_0(p^n)$ for some $n > 0$ and if $\pi$ is an infinite-dimensional irreducible admissible representation of $\mathrm{GL}_2(\mathbf{Q}_p)$, then write $U_p$ for the Hecke operator $V \begin{pmatrix} p & 0 \\ 0 & 1 \end{pmatrix} V$ acting on $\pi^V$. We also write $T_p$ and $S_p$ for the Hecke operators $\mathrm{GL}_2(\mathbf{Z}_p) \begin{pmatrix} p & 0 \\ 0 & 1 \end{pmatrix} \mathrm{GL}_2(\mathbf{Z}_p)$ and $\mathrm{GL}_2(\mathbf{Z}_p) \begin{pmatrix} p & 0 \\ 0 & p \end{pmatrix} \mathrm{GL}_2(\mathbf{Z}_p)$ acting on $\pi^{\mathrm{GL}_2(\mathbf{Z}_p)}$. We then have the following proposition, which occurs as a corollary to [Ca, proof of Th. 1].

PROPOSITION 12
Let $\pi$ be an infinite-dimensional irreducible admissible representation of $\mathrm{GL}_2(\mathbf{Q}_p)$. Then there is an integer $c \ge 0$ such that for every $n \ge 0$ the dimension of $\pi^{U_1(p^n)}$ is equal to $\max\{0, n - c + 1\}$.
We call the integer $p^c$ with $c$ as in the above proposition the conductor of $\pi$. If $c = 0$, then we say that $\pi$ is unramified.

Each infinite-dimensional irreducible admissible representation of $\mathrm{GL}_2(\mathbf{Q}_p)$ is classified as principal series, special, or supercuspidal, as described in [DI]. If $\chi : \mathbf{Q}_p^\times \to \bar{K}^\times$ and $\psi : \mathbf{Q}_p^\times \to \bar{K}^\times$ are continuous characters, then we write $\pi(\chi, \psi)$ for the space of locally constant functions $f : \mathrm{GL}_2(\mathbf{Q}_p) \to \bar{K}$ which satisfy
$$f\left(\begin{pmatrix} a & b \\ 0 & d \end{pmatrix} g\right) = \chi(a)\psi(d) f(g)$$
for all $g$ in $\mathrm{GL}_2(\mathbf{Q}_p)$ and any matrix $\begin{pmatrix} a & b \\ 0 & d \end{pmatrix}$ in $\mathrm{GL}_2(\mathbf{Q}_p)$. (Note that this notation differs from the notation in [DI].) If $\chi/\psi$ is not equal to the identity or to $x \mapsto p^{-2v_p(x)}$, then $\pi(\chi, \psi)$ is irreducible and is a principal series representation, of conductor equal to the product of the conductors of $\chi$ and $\psi$. Otherwise, $\pi(\chi, \psi)$ has a unique infinite-dimensional irreducible subquotient $\mathrm{sp}(\chi, \psi)$, which is a special representation. If $\chi$ and $\psi$ are unramified, then $\mathrm{sp}(\chi, \psi)$ is called unramified special and it has conductor $p$; otherwise $\chi$ and $\psi$ have the same conductor and the conductor of $\mathrm{sp}(\chi, \psi)$ is equal to the product of the conductors of $\chi$ and $\psi$. The supercuspidal representations are more difficult to construct; they all have conductor at least $p^2$.

To relate the above definitions to the classical situation, if $\pi = \bigotimes_p' \pi_p$ is an automorphic representation arising from a weight 2 newform $f$ of level $N_f$ and character $\chi_f$, then for any $p$ the conductor $p^{c_p}$ of $\pi_p$ is equal to the $p$-part of $N_f$; thus $\pi_p$ is unramified if and only if $p$ does not divide $N_f$. Moreover, the eigenvalues of the classical Hecke operators $T_p$ (when $p$ does not divide $N_f$) and $U_p$ (when $p$ does divide $N_f$) acting on the newform $f$ agree with the eigenvalues of the actions of the corresponding operators defined above on the 1-dimensional space $\pi_p^{U_1(p^{c_p})}$, and for $p$ not dividing $N_f$ the eigenvalue of $S_p$ on $\pi_p^{U_1(p^{c_p})}$ is equal to $\chi_f(p)$.
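The dimension formula of Proposition 12 and the conductor rule for principal series can be tabulated with a few lines of Python. This is only an illustrative sketch: conductors are represented by their exponents (so $p^c$ is stored as the integer $c$), and both function names are hypothetical.

```python
def dim_U1_invariants(c, n):
    """Dimension of the U_1(p^n)-invariants of a representation of conductor p^c.

    Implements the formula max{0, n - c + 1} from Proposition 12.
    """
    if c < 0 or n < 0:
        raise ValueError("c and n must be non-negative")
    return max(0, n - c + 1)


def principal_series_conductor_exponent(c_chi, c_psi):
    """Conductor exponent of pi(chi, psi) in the irreducible (principal series) case.

    The conductor is the product of the conductors of chi and psi,
    so the exponents add.
    """
    return c_chi + c_psi


if __name__ == "__main__":
    c = principal_series_conductor_exponent(1, 1)  # two characters of conductor p
    for n in range(5):
        print(n, dim_U1_invariants(c, n))
```

In this sketch the invariants first become nonzero at $n = c$, where they are 1-dimensional; this is the space on which the eigenvalues of $T_p$ or $U_p$ are read off in the paragraph above.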
3.2. The local Langlands correspondence for GL2 and Carayol’s theorem
Recall that the local Langlands correspondence for $\mathrm{GL}_2$ (whose proof was completed by P. Kutzko [Ku]) gives for any odd prime $p$ a natural bijection between isomorphism classes of irreducible admissible representations of $\mathrm{GL}_2(\mathbf{Q}_p)$ over $\bar{K}$ and isomorphism classes of continuous 2-dimensional representations of the absolute Weil group $W_p$ of $\mathbf{Q}_p$ over $\bar{K}$ for which any choice of $\Phi$ in $W_p$ lifting $\mathrm{Frob}_p$ acts semisimply. There are various normalisations used in the literature for this correspondence; we use the normalisation described by Carayol in [C, Sec. 0]. (Note that Carayol uses the opposite convention for the identification $W_p^{\mathrm{ab}} \cong \mathbf{Q}_p^\times$ and identifies $p$ with a lift of the geometric Frobenius, so his results look a little different.) We are also using here the correspondence described in [T, Sec. 4.2.1] between continuous 2-dimensional representations of $W_p$ over a finite extension $K'$ of $K$ and 2-dimensional representations of the Weil-Deligne group of $\mathbf{Q}_p$ over $K'$. (This is the reason for our restriction to odd primes $p$.)

If $\pi$ is an infinite-dimensional irreducible admissible representation of $\mathrm{GL}_2(\mathbf{Q}_p)$, then the conductor of $\pi$ is equal to the conductor of the corresponding representation of $W_p$, and the determinant of the representation corresponding to $\pi$ is equal to $\varepsilon_2(\chi_\pi \circ \omega_p)$, where $\chi_\pi$ is the central character of $\pi$. The correspondence also preserves $L$-factors and $\varepsilon$-factors (see [K] for more details and additional properties of the correspondence).

The representation of $W_p$ corresponding to a particular $\pi$ is decomposable, reducible but not decomposable, or irreducible according as $\pi$ is principal series, special, or supercuspidal, respectively. More precisely, if $\pi = \pi(\chi, \psi)$ is principal series, then the corresponding representation of $W_p$ has the form $\begin{pmatrix} (\chi \circ \omega_p)\varepsilon_2 & 0 \\ 0 & \psi \circ \omega_p \end{pmatrix}$. If $\pi = \mathrm{sp}(\chi, \chi)$ is special, then the corresponding representation has the form $\begin{pmatrix} (\chi \circ \omega_p)\varepsilon_2 & * \\ 0 & \chi \circ \omega_p \end{pmatrix}$.
Carayol showed in [C] that if $p$ is an odd prime and $f$ is a weight 2 newform with corresponding automorphic representation $\pi = \bigotimes_p' \pi_p$, then $\pi_p$ corresponds under the local Langlands correspondence above to the restriction of the representation $\rho_f$ to the Weil group $W_p$ at $p$. This theorem combined with the properties of the local Langlands correspondence has many useful consequences; we collect some of them here for later use.

THEOREM 13 (Carayol)
Let $f$ be a weight 2 newform, and let $\pi = \bigotimes_p' \pi_p$ be the corresponding automorphic representation of $\mathrm{GL}_2(\mathbf{A}^\infty)$. Let $\rho_f : G_{\mathbf{Q}} \to \mathrm{GL}_2(\bar{K})$ be the representation associated to $f$. Then for an odd prime $p$,
(1) $\rho_f|_{W_p}$ is the representation corresponding to $\pi_p$ under the local Langlands correspondence as described above;
(2) let $\chi_\pi : (\mathbf{A}^\infty)^\times \to \bar{K}^\times$ be the central character of $\pi$; then, using the identifications of class field theory described in the notation and conventions section, $\det\rho_f = (\chi_\pi \circ \varepsilon)\varepsilon_2$ and $\det\rho_f|_{W_p} = (\chi_\pi|_{\mathbf{Q}_p^\times} \circ \omega_p)\,\varepsilon_2|_{W_p}$;
(3) the odd part of the conductor of $\rho_f$ is equal to the odd part of the level of $f$;
(4) let $p^{c_p}$ be the conductor of $\pi_p$, and let $e_p$ be the dimension of $(\rho_f)_{I_p}$; then the characteristic polynomial $g(X)$ of $U_p$ acting on the $(e_p + 1)$-dimensional space $\pi_p^{U_1(p^{c_p + e_p})}$ is equal to $X$ times the characteristic polynomial of the action of $\mathrm{Frob}_p$ on $(\rho_f)_{I_p}$.

Theorem 13 gives the facts we need about the restrictions $\rho_f|_{G_p}$ for odd $p$; we also need to say something about the structure of $\rho_f|_{G_2}$ when the level of $f$ is not divisible by 4.

PROPOSITION 14
Let $f$ be a weight 2 newform. Then we have the following:
(1) if $f$ has odd level and the reduction of $t_2(f)$ to $\bar{k}$ is nonzero, then
$$\rho_f|_{G_2} \cong \begin{pmatrix} * & * \\ 0 & \psi \end{pmatrix},$$
where $\psi$ is unramified and sends $\mathrm{Frob}_2$ to the unit root of $X \mapsto X^2 - t_2(f)X + 2s_2(f)$;
(2) if $f$ has odd level and the reduction of $t_2(f)$ to $\bar{k}$ is zero, then $\rho_f|_{G_2}$ is absolutely irreducible; and
(3) if $f$ has level exactly divisible by 2, then
$$\rho_f|_{G_2} \cong \begin{pmatrix} \chi\varepsilon_2 & * \\ 0 & \chi \end{pmatrix},$$
where $\chi$ is unramified and sends $\mathrm{Frob}_2$ to the eigenvalue of $U_2$ acting on $f$.
Proof
The first part follows from [Wi1, Th. 2]. The third part also follows from this theorem along with the fact that the character $\chi_f$ of $f$ has odd conductor and the square of the eigenvalue of $U_2$ on $f$ is equal to $\chi_f(2)$ (see [DDT, Th. 1.27]). The second part follows from a theorem of Fontaine (see [E, Th. 2.6]).

4. Cohomology of modular curves
In this section we give some basic results about sheaf cohomology on modular curves. Let $V$ be an open compact subgroup of $\mathrm{GL}_2(\mathbf{A}^\infty)$. As in [CDT], we define the open modular curve $Y_V$ to be the real 2-manifold
$$Y_V = \mathrm{GL}_2(\mathbf{Q}) \backslash \mathrm{GL}_2(\mathbf{A}) / V U_\infty,$$
where $U_\infty = \mathrm{SO}_2(\mathbf{R})\mathbf{R}^\times \subset \mathrm{GL}_2(\mathbf{R})$ is the stabiliser of $i$ for the transitive action of $\mathrm{GL}_2(\mathbf{R})$ on $\mathbf{C} - \mathbf{R}$, and we define $X_V$ to be the standard compactification of $Y_V$ obtained by the addition of cusps. One can show that the number of connected components of $Y_V$ is equal to the index of $\det V$ in $\hat{\mathbf{Z}}^\times$ and that each component can be identified with a quotient of the upper-half complex plane by some congruence subgroup of $\mathrm{SL}_2(\mathbf{Z})$.

We define some particular open compact subgroups of $\mathrm{GL}_2(\mathbf{A}^\infty)$ as follows. Let $N$ be a positive integer, and let $\pi_N$ be the natural quotient map $\mathrm{GL}_2(\hat{\mathbf{Z}}) \to \mathrm{GL}_2(\mathbf{Z}/N\mathbf{Z})$. Then we define $U_0(N)$, $U_1(N)$, $U_2(N)$, and $U(N)$ to be the inverse images under $\pi_N$ of all matrices of the form $\begin{pmatrix} * & * \\ 0 & * \end{pmatrix}$, $\begin{pmatrix} * & * \\ 0 & 1 \end{pmatrix}$, $\begin{pmatrix} 1 & * \\ 0 & 1 \end{pmatrix}$, and $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, respectively. One can check that $Y_{U_0(N)}$ and $Y_{U_1(N)}$ can be identified with the classical curves $Y_0(N)$ and $Y_1(N)$, respectively.

Definition
We say that an open compact subgroup $V$ of $\mathrm{GL}_2(\mathbf{A}^\infty)$ is sufficiently small if it does not contain any conjugate of either of the matrices $\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ or $\begin{pmatrix} -1 & -1 \\ 1 & 0 \end{pmatrix}$.

Note that we do not demand that $V$ not contain $\begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$ in the above definition. If $V$ is sufficiently small, then $Y_V$ has no elliptic points; that is, for every $g$ in $\mathrm{GL}_2(\mathbf{A})$, the stabiliser in $\mathrm{GL}_2(\mathbf{Q})$ of the coset $gVU_\infty$ is contained in $\{\pm I\}$. If $V$ is contained in $U_1(N)$ for some integer $N \ge 4$, then both $V$ and $V\{\pm I\}$ are sufficiently small.

Now suppose that $V = \prod_p V_p$ is sufficiently small, that $M$ is a finitely generated $\mathcal{O}$-module equipped with an action of $V$ on the right, that the action of $V$ on $M$ factors through some finite quotient of $V$, and that $V \cap \{\pm 1\}$ acts trivially on $M$. Extend the action of $V$ to an action of $VU_\infty$ by letting $U_\infty$ act trivially on $M$. Then we can define a sheaf (in the sense of [Wa, Chap. 5])
$$\mathcal{F}_M = \big( (\mathrm{GL}_2(\mathbf{Q}) \backslash \mathrm{GL}_2(\mathbf{A})) \times M \big) / V U_\infty \to Y_V$$
on $Y_V$, using the obvious projection map $\mathcal{F}_M \to Y_V$.
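The four level subgroups $U(N) \subset U_2(N) \subset U_1(N) \subset U_0(N)$ introduced above are cut out by congruence conditions on the reduction modulo $N$. The short Python sketch below makes those conditions explicit. It is an illustration under stated assumptions: an element of $\mathrm{GL}_2(\hat{\mathbf{Z}})$ is represented only through an integer matrix with the same image in $\mathrm{GL}_2(\mathbf{Z}/N\mathbf{Z})$, the matrix shapes are taken as upper triangular with the indicated diagonal entries equal to 1, and the function name is hypothetical.

```python
from math import gcd


def level_subgroups(matrix, N):
    """Report which of U(N), U_2(N), U_1(N), U_0(N) contain an element with this reduction.

    `matrix` is ((a, b), (c, d)) with integer entries, read modulo N.
    Raises ValueError if the reduction is not invertible modulo N.
    """
    (a, b), (c, d) = matrix
    a, b, c, d = a % N, b % N, c % N, d % N
    if gcd(a * d - b * c, N) != 1:
        raise ValueError("matrix is not invertible modulo N")
    one = 1 % N  # equals 0 when N == 1, where every condition is vacuous
    in_U0 = c == 0                 # shape (* *; 0 *)
    in_U1 = in_U0 and d == one     # shape (* *; 0 1)
    in_U2 = in_U1 and a == one     # shape (1 *; 0 1)
    in_U = in_U2 and b == 0        # shape (1 0; 0 1)
    return {"U": in_U, "U2": in_U2, "U1": in_U1, "U0": in_U0}


if __name__ == "__main__":
    print(level_subgroups(((1, 2), (0, 1)), 5))
    print(level_subgroups(((2, 1), (5, 3)), 5))
```

The nesting of the four conditions in the code mirrors the chain of inclusions between the subgroups.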
The conditions that $V$ is sufficiently small and that $V \cap \{\pm 1\}$ acts trivially on $M$ ensure that this sheaf is locally constant and that each of its stalks is isomorphic to $M$. Let $N$ and $D$ be coprime positive integers such that $U_2(N) \cap U(D) \subset V \subset U_0(N)$ and such that $V \cap U(D)$ acts trivially on $M$; then the Hecke operators $T_p$ for $p$ not dividing $ND$ and $U_p$ for $p$ dividing $N$ act naturally on the cohomology group $H^1(Y_V, \mathcal{F}_M)$.

We note that [CDT, Lems. 6.1.2 and 6.3.1] adapt without change to the case when $\ell = 2$, and that, furthermore, the hypothesis that $V$ is contained in $U_1(r^2)$ for a suitable prime $r$ can be weakened to allow any sufficiently small $V$. Thus we have the following two results.
PROPOSITION 15
Let $N$ and $D$ be coprime positive integers. Let $V = \prod_p V_p$ be an open compact subgroup of $\mathrm{GL}_2(\hat{\mathbf{Z}})$ satisfying
$$U_2(N) \cap U(D) \subset V \subset U_0(N),$$
and let $M$ be a finitely generated $\mathcal{O}$-module with a right action of $V$ such that both $V \cap U(D)$ and $V \cap \{\pm 1\}$ act trivially on $M$. Suppose that $V$ is sufficiently small. Let $\mathbf{T}$ be the polynomial ring over $\mathcal{O}$ generated by indeterminates $T_p$ for primes $p$ not dividing $ND$ and $U_p$ for $p$ dividing $N$; we consider $H^1(Y_V, \mathcal{F}_M)$ as a module for $\mathbf{T}$. Let $\mathfrak{m}$ be a non-Eisenstein maximal ideal of $\mathbf{T}$ with finite residue field. (Recall that a maximal ideal $\mathfrak{m}$ of $\mathbf{T}$ is said to be Eisenstein if there is an integer $N'$ such that $T_p - 2$ is in $\mathfrak{m}$ for all but finitely many $p$ congruent to 1 modulo $N'$.) Then we have the following:
(1) the natural maps $H^1_c(Y_V, \mathcal{F}_M)_{\mathfrak{m}} \to H^1(Y_V, \mathcal{F}_M)_{\mathfrak{m}}$ and $H^1(X_V, \mathcal{F}_M)_{\mathfrak{m}} \to H^1(Y_V, \mathcal{F}_M)_{\mathfrak{m}}$ are isomorphisms;
(2) there is an isomorphism
$$H^1(Y_V, \mathcal{F}_M)_{\mathfrak{m}} \cong \big( M \otimes_{\mathcal{O}} H^1(Y_{U_2(N) \cap U(D)}, \mathcal{O})_{\mathfrak{m}} \big)^{V/U_2(N) \cap U(D)};$$
(3) if $0 \to M' \to M \to M'' \to 0$ is a short exact sequence of right $\mathcal{O}[V]$-modules, then the sequence
$$0 \to H^1(Y_V, \mathcal{F}_{M'})_{\mathfrak{m}} \to H^1(Y_V, \mathcal{F}_M)_{\mathfrak{m}} \to H^1(Y_V, \mathcal{F}_{M''})_{\mathfrak{m}} \to 0$$
of $\mathbf{T}_{\mathfrak{m}}$-modules is exact;
(4) if $M$ is free as an $\mathcal{O}$-module, then the natural map
$$H^1(Y_V, \mathcal{F}_M)_{\mathfrak{m}} \otimes_{\mathcal{O}} k \to H^1(Y_V, \mathcal{F}_{M \otimes_{\mathcal{O}} k})_{\mathfrak{m}}$$
is an isomorphism.
Furthermore, all the maps above are maps of $\mathbf{T}_{\mathfrak{m}}$-modules.

PROPOSITION 16 (Ihara, Wiles)
Let $N$, $D$, $V$, and $M$ be as above, and let $p$ be a prime not dividing $2ND$. Let $\mathbf{T}$ be as above but with the indeterminate $T_p$ removed, and let $\mathfrak{m}$ be a non-Eisenstein maximal ideal of $\mathbf{T}$. Suppose in addition that the maximal ideal of $\mathcal{O}$ annihilates $M$, so that $M$ is a $k[V/V \cap U(D)]$-module. For $s \ge 1$, write $\gamma_p$ and $\delta_p$ for the maps $H^1(X_{V \cap U_1(p^{s-1})}, \mathcal{F}_M) \to H^1(X_{V \cap U_1(p^s)}, \mathcal{F}_M)$ arising from the maps $X_{V \cap U_1(p^s)} \to X_{V \cap U_1(p^{s-1})}$ given by multiplication by the matrices $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ and $\begin{pmatrix} p & 0 \\ 0 & 1 \end{pmatrix}^{-1}$ in $\mathrm{GL}_2(\mathbf{Q}_p) \subset \mathrm{GL}_2(\mathbf{A})$, respectively. Then we have the following:
340
(1)
MARK DICKINSON
the map H 1 (X V , F M )2m
(2)
γ p ⊕δ p
/ H 1 (X V ∩U ( p) , F M )m 1
is injective; if s ≥ 1, then the sequence 0
/ H 1 (X V ∩U1 ( p s−1 ) , F M )m
(−δ p ,γ p )
/ H 1 (X V ∩U ( ps ) , F M )2 m 1
γ p ⊕δ p
/ H 1 (X V ∩U1 ( p s+1 ) , F M )m
is exact. Again, all the maps are maps of T m -modules. Proof For a proof of this, see [CDT, Lem. 6.3.1]. If V1 ⊂ V2 is an inclusion of open compact subgroups of GL2 (A∞ ), then there is a corresponding map X V1 → X V2 of modular curves and hence a map H 1 (X V2 , O ) → H 1 (X V1 , O ) of cohomology groups. We denote by H the limit limV H 1 (X V , O ); this is an admis− → sible GL2 (A∞ )-module. The admissible representation H ⊗O K¯ of GL2 (A∞ ) over K¯ has a decomposition M H ⊗O K¯ ∼ π 2f , = f
where f runs over the set of all weight 2 newforms (see [CDT, Sec. 5.3]). 5. The modular deformation ring Let S be any finite set of odd primes. As in Section 1, we define the set N S to be the set of all weight 2 newforms f of odd level from which ρ¯ arises and for which the corresponding deformation (R f , ρ f ) is an S-deformation of ρ, ¯ and we use this set of newforms to define a modular S-deformation (R Smod , ρ Smod ) of ρ. ¯ In this section we give some properties of the sets N S and of the ring R Smod , and we define some new elements of R Smod . The proof of the following result is delayed until Section 12. PROPOSITION 17 The set N S is nonempty. Furthermore, if S = Q consists entirely of primes p for
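which $\bar\rho$ is unramified at $p$ and $\bar\rho(\mathrm{Frob}_p)$ has distinct $k$-rational eigenvalues, then we have the equality $\#\mathcal{N}_Q = \#G_Q \, \#\mathcal{N}_\emptyset$, where $G_Q$ is the group defined at the end of Section 2.

The first part of this follows essentially from Buzzard's level-lowering result (Proposition 1).

PROPOSITION 18
The ring $R_S^{\mathrm{mod}}$ has the following properties.
(1) $R_S^{\mathrm{mod}}$ is reduced, and it is free as an $\mathcal{O}$-module of finite rank equal to the cardinality of $\mathcal{N}_S$.
(2) If $K'$ is a finite extension of $K$ with ring of integers $\mathcal{O}'$ and if $(R_S^{\mathrm{mod}})'$ is the corresponding ring defined for $K'$, then the natural map
\[ R_S^{\mathrm{mod}} \otimes_{\mathcal{O}} \mathcal{O}' \to (R_S^{\mathrm{mod}})' \]
of $\mathcal{O}'$-algebras which sends $T_p \otimes_{\mathcal{O}} 1$ to $T_p$ is an isomorphism.
(3) Let $\delta : G_{\mathbb{Q}} \to k^\times$ be a continuous character of odd conductor, and let $\tilde\delta$ be its Teichmüller lift, and suppose that $\bar\rho = \bar\sigma \otimes_k \delta$ for some representation $\bar\sigma$. Let $(R_{S,\bar\sigma}^{\mathrm{univ}}, \rho_{S,\bar\sigma}^{\mathrm{univ}})$ and $(R_{S,\bar\sigma}^{\mathrm{mod}}, \rho_{S,\bar\sigma}^{\mathrm{mod}})$ be the universal deformation and modular deformation, respectively, for $S$-deformations of $\bar\sigma$ (defined with respect to the eigenvalue $\alpha_{\bar\sigma} = \alpha\delta^{-1}(\mathrm{Frob}_2)$ of $\bar\sigma(\mathrm{Frob}_2)$ if $\bar\rho$ is unramified at 2), and let $\phi_{S,\bar\sigma} : R_{S,\bar\sigma}^{\mathrm{univ}} \to R_{S,\bar\sigma}^{\mathrm{mod}}$ be the map arising from the universal property. Then there is a commutative diagram
\[ \begin{array}{ccc} R_S^{\mathrm{univ}} & \longrightarrow & R_{S,\bar\sigma}^{\mathrm{univ}} \\ \downarrow{\scriptstyle \phi_S} & & \downarrow{\scriptstyle \phi_{S,\bar\sigma}} \\ R_S^{\mathrm{mod}} & \longrightarrow & R_{S,\bar\sigma}^{\mathrm{mod}} \end{array} \]
of objects of $\mathcal{C}_{\mathcal{O}}$ where the horizontal maps are isomorphisms, and the bottom map sends $T_p$ to $\tilde\delta(\mathrm{Frob}_p) T_p$ for all odd $p$ at which $\bar\rho$ is unramified.

Proof
First, note that the set $\mathcal{N}_S$ is finite since if $f$ is a form of odd level which gives rise to an $S$-deformation of $\bar\rho$, then by Carayol's theorem the level $N_f$ of $f$ is equal to the conductor of $\rho_f$, and this is bounded by $N(\bar\rho) \prod_{p \in S} p^{\dim_k \bar\rho^{I_p}}$.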
Since $R_S^{\mathrm{mod}}$ is by definition a subring of $\prod_{f \in \mathcal{N}_S} K_f$, it is clear that $R_S^{\mathrm{mod}}$ is reduced and free of finite rank as an $\mathcal{O}$-module. Now, consider the inclusion map
\[ R_S^{\mathrm{mod}} \otimes_{\mathcal{O}} K \to \prod_{f \in \mathcal{N}_S} K_f. \]
The algebra on the left is a reduced finite-dimensional $K$-algebra, so it is a product of finite extensions of $K$. Each maximal ideal of $R_S^{\mathrm{mod}} \otimes_{\mathcal{O}} K$ arises as the inverse image of a maximal ideal of the product on the right; that is, it arises from some particular $f$ in $\mathcal{N}_S$ as the kernel of the surjective map $R_S^{\mathrm{mod}} \otimes_{\mathcal{O}} K \to K_f$ which sends $T_p$ to $t_p(f)$ for all odd $p$ not in $S$. If two newforms $f$ and $g$ give rise to the same maximal ideal $\mathfrak{p}$ of $R_S^{\mathrm{mod}} \otimes_{\mathcal{O}} K$, then the corresponding field extensions $(R_S^{\mathrm{mod}} \otimes_{\mathcal{O}} K)/\mathfrak{p} \to \bar{K}$ are conjugate by an element of $\mathrm{Gal}(\bar{K}/K)$ and so $f$ and $g$ are conjugate by this same element, by multiplicity one. Conversely, any two newforms $f$ and $g$ which are conjugate by an element of $\mathrm{Gal}(\bar{K}/K)$ give rise to the same maximal ideal, and if $f$ is an element of $\mathcal{N}_S$, then so is any $\mathrm{Gal}(\bar{K}/K)$-conjugate of $f$; thus the number of conjugates of $f$ in $\mathcal{N}_S$ is equal to the degree of $K_f$ over $K$. Summing over the various conjugacy classes, we find that the $K$-dimension of $R_S^{\mathrm{mod}} \otimes_{\mathcal{O}} K$, and hence the $\mathcal{O}$-rank of $R_S^{\mathrm{mod}}$, is equal to the number of elements of $\mathcal{N}_S$.

Now $R_S^{\mathrm{mod}} \otimes_{\mathcal{O}} \mathcal{O}' \to (R_S^{\mathrm{mod}})'$ is a surjection of $\mathcal{O}'$-modules of equal rank, and so it is an isomorphism.

For the last part of the proposition, note that the set $\mathcal{N}_S$ is in bijection with the set $\mathcal{N}_{S,\bar\sigma}$ of weight 2 newforms of odd level giving rise to $S$-deformations of $\bar\sigma$; the bijection is essentially given by twisting by the Dirichlet character corresponding to the Teichmüller lift of $\delta$. The result then follows easily from the definitions of $R_S^{\mathrm{mod}}$ and $R_{S,\bar\sigma}^{\mathrm{mod}}$.

In addition to the elements $T_p$ (for $p$ not in $S$ and not dividing $2N(\bar\rho)$) of $R_S^{\mathrm{mod}}$ already defined in Section 1, we let $S_p$ be the element $(s_p(f))_{f \in \mathcal{N}_S}$ for $p = 2$ and for odd primes $p$ not in $S$ at which $\bar\rho$ is unramified.
For odd $p$, $S_p = p^{-1} \det \rho_S^{\mathrm{mod}}(\mathrm{Frob}_p)$ is clearly contained in $R_S^{\mathrm{mod}}$; since the map $p \mapsto S_p$ factors through some Dirichlet character of odd conductor, the element $S_2$ is equal to $S_p$ for some odd $p$ and so is also in $R_S^{\mathrm{mod}}$. We also need to know that the element $T_2 = (t_2(f))_{f \in \mathcal{N}_S}$ is in $R_S^{\mathrm{mod}}$. Let $\Phi$ be an element of $G_2$ which lifts $\mathrm{Frob}_2$; then by our assumptions on $\bar\rho|_{G_2}$ the matrix $\bar\rho(\Phi)$ has distinct eigenvalues and so by Hensel's lemma the characteristic polynomial $X \mapsto X^2 - \operatorname{tr} \rho_S^{\mathrm{mod}}(\Phi) X + \det \rho_S^{\mathrm{mod}}(\Phi)$ of $\rho_S^{\mathrm{mod}}(\Phi)$ has distinct roots in $R_S^{\mathrm{mod}}$. Let $u$ be the root that lies above $\alpha$. From Proposition 14 we know that for each $f$ in $\mathcal{N}_S$ the eigenvalue of the matrix $\rho_f(\Phi)$ which lies over $\alpha$ is equal to the unit root of the polynomial $X^2 - t_2(f) X + 2 s_2(f)$.
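The unit root of a quadratic of the shape $X^2 - tX + 2s$ can be made concrete with a small numerical sketch. It assumes that $t$ is an odd integer (standing in for a 2-adic unit such as $t_2(f)$) and that $s$ is an integer (standing in for $s_2(f)$); these sample values and the function name `unit_root` are illustrative assumptions, not taken from the text. Modulo 2 the polynomial is $X(X - t)$, so there is a single root congruent to 1, and Newton iteration lifts it modulo $2^k$, as in Hensel's lemma.

```python
def unit_root(t, s, k):
    """Return the unit root of X^2 - t*X + 2*s modulo 2**k.

    Assumes t is odd (a 2-adic unit); t, s, k are illustrative integers.
    """
    if t % 2 == 0:
        raise ValueError("t must be odd for a unit root to exist")
    mod = 2 ** k
    u = 1  # modulo 2 the polynomial is X(X - t), whose nonzero root is 1
    for _ in range(k):
        f = (u * u - t * u + 2 * s) % mod
        df = (2 * u - t) % mod  # odd, hence invertible modulo 2**k
        u = (u - f * pow(df, -1, mod)) % mod  # Newton step (Python 3.8+)
    return u


if __name__ == "__main__":
    k = 20
    u = unit_root(1, 1, k)
    # Dividing u^2 - t*u + 2*s = 0 by the unit u recovers t = u + 2*s/u.
    recovered_t = (u + 2 * pow(u, -1, 2 ** k)) % 2 ** k
    print(u, recovered_t)
```

Dividing the quadratic relation by the unit root is the same manipulation that expresses the trace in terms of the unit root and the determinant.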
Thus $u^2 - T_2 u + 2S_2 = 0$, and $(t_2(f))_{f \in \mathcal{N}_S} = T_2 = u + 2S_2/u$ is contained in $R_S^{\mathrm{mod}}$.

Variant. We could also define the set $\mathcal{N}_S$ to be the set of forms giving rise to $S$-deformations of $\bar\rho$ and of level not divisible by 4. By Proposition 14 and our assumptions on the behaviour of $\bar\rho$ at 2, this gives the same set. Now, instead of defining $T_2$, define an element $U_2$ of $R_S^{\mathrm{mod}}$ to be equal to the element $u$ found above. If $f$ is an element of $\mathcal{N}_S$ with corresponding automorphic representation $\pi_f = \bigotimes'_p \pi_p$, then $\pi_2$ is unramified and $\pi_2^{U_1(2)}$ is 2-dimensional; thus $U_2 = (u_2(f))_{f \in \mathcal{N}_S}$, where $u_2(f)$ is the eigenvalue of $U_2$ on $\pi_2^{U_1(2)}$ which lies above $\alpha$.

6. Representations of $\mathrm{GL}_2(\mathbb{Z}_p)$
In this section we recall the definition and properties of some representations of $\mathrm{GL}_2(\mathbb{Z}_p)$ introduced in [CDT]. Let $p$ be an odd prime, let $A$ be the ring of Witt vectors of $\mathbb{F}_{p^2}$, and let $\sigma$ denote the $\mathbb{Z}_p$-automorphism of $A$ corresponding to the Frobenius automorphism of $\mathbb{F}_{p^2}$. Let $F$ be any algebraically closed field of characteristic zero, and let $\chi : A^\times \to F^\times$ be a finite-order character of conductor $p^n$, some $n \ge 1$, such that $\chi/\chi \circ \sigma$ also has conductor $p^n$. Then B. Conrad, Diamond, and Taylor define in [CDT, Sec. 3.2] a representation $\Theta(\chi)$ of $\mathrm{GL}_2(\mathbb{Z}_p)$ over $F$.

PROPOSITION 19
The representation $\Theta(\chi)$ has the following properties:
(1) $\Theta(\chi)$ factors through $\mathrm{GL}_2(\mathbb{Z}/p^n\mathbb{Z})$ to give an irreducible representation of degree $(p-1)p^{n-1}$;
(2) if $\delta : \mathbb{Z}_p^\times \to F^\times$ is a character of finite order and conductor at most $p^n$, then $\Theta(\chi(\delta \circ \mathrm{norm}))$ is isomorphic to $\Theta(\chi) \otimes_F F(\delta \circ \det)$, where $\mathrm{norm} : A^\times \to \mathbb{Z}_p^\times$ is the norm map;
(3) $\Theta(\chi)$ is isomorphic to $\Theta(\psi)$ if and only if $\chi$ is equal to $\psi$ or $\psi \circ \sigma$;
(4) the restriction of $\Theta(\chi)$ to $U_1(p)$ is irreducible, and the restrictions of $\Theta(\chi)$ and $\Theta(\psi)$ to $U_1(p)$ are isomorphic if and only if $\chi|_{1+pA}$ is equal to one of $\psi|_{1+pA}$ or $\psi \circ \sigma|_{1+pA}$;
(5) the central character of $\Theta(\chi)$ is equal to the restriction of $\chi$ to $\mathbb{Z}_p^\times$;
(6) let $V$ be the open compact subgroup of $\mathrm{GL}_2(\mathbb{Z}_p)$ consisting of all elements that reduce modulo $p^n$ to a matrix of the form $\begin{pmatrix} * & 0 \\ 0 & 1 \end{pmatrix}$; then $\Theta(\chi)^V$ has dimension 1 over $F$;
(7) $\Theta(\chi)$ can be realised over the subfield of $F$ generated by values of $(\chi + \chi \circ \sigma)$.
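As a sanity check on the degree in property (1) for $n = 1$, the following sketch counts degrees of irreducible complex representations of $\mathrm{GL}_2(\mathbb{F}_p)$. It relies on the standard classification of these representations and on the assumption (not spelled out in the text here) that for $n = 1$ the representations $\Theta(\chi)$ are the cuspidal ones, of degree $p - 1$, with $(p^2 - p)/2$ isomorphism classes in line with property (3). The function names are hypothetical.

```python
def gl2_order(p):
    """Order of GL_2(F_p)."""
    return (p * p - 1) * (p * p - p)


def degree_census(p):
    """(degree, count) pairs for irreducible complex representations of
    GL_2(F_p), p an odd prime, following the standard classification."""
    return [
        (1, p - 1),                        # characters factoring through det
        (p, p - 1),                        # twists of the Steinberg representation
        (p + 1, (p - 1) * (p - 2) // 2),   # irreducible principal series
        (p - 1, (p * p - p) // 2),         # cuspidal: degree (p-1)p^(n-1) with n = 1
    ]


def sum_of_squares(p):
    """Sum of squared degrees; equals the group order for a complete list."""
    return sum(count * degree ** 2 for degree, count in degree_census(p))


def number_of_classes(p):
    """Total number of irreducible representations in the census."""
    return sum(count for _, count in degree_census(p))


if __name__ == "__main__":
    for p in (3, 5, 7, 11):
        print(p, sum_of_squares(p), gl2_order(p))
```

The sum of squared degrees matching the group order shows only that the census is consistent; it does not by itself identify $\Theta(\chi)$ with a particular cuspidal representation.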
Proof
Although the statements above are slightly different from those in [CDT] (since we deal with restrictions to $U_1(p)$ rather than to $U_0(p)$), the proof of [CDT, Lem. 3.2.1]
also adapts to prove the analogous statements above. The last two statements are not proved in [CDT]; to give a somewhat ad hoc proof of the first of these, we may assume that $F = \bar{K}$; then we may use Proposition 21 to find a 2-dimensional representation of $W_p$ of conductor $p^{2n}$ such that if $\pi$ is the corresponding representation of $\mathrm{GL}_2(\mathbb{Q}_p)$, then $(\pi|_{\mathrm{GL}_2(\mathbb{Z}_p)})^{U(p^n)}$ is isomorphic to $\Theta(\chi)$. Then $\pi$ also has conductor $p^{2n}$ and its central character has conductor $p^n$; it follows that the subspace of $\pi$ fixed by
\[ (1 + p^n\mathbb{Z}_p)\, U_1(p^{2n}) = \begin{pmatrix} p^n & 0 \\ 0 & 1 \end{pmatrix} V \begin{pmatrix} p^{-n} & 0 \\ 0 & 1 \end{pmatrix} \]
is 1-dimensional over $F$ and hence that $\pi^V = \Theta(\chi)^V$ is 1-dimensional. For the last statement, it is evident from the construction in [CDT] and the fact that $\Theta(\chi)$ is isomorphic to $\Theta(\chi \circ \sigma)$ that the trace of $\Theta(\chi)$ takes values in $\mathbb{Q}(\chi + \chi \circ \sigma)$; now we use the previous part of the proposition along with [W, Lem. I.1].

If $\chi : A^\times \to F^\times$ is such that $\chi/\chi \circ \sigma$ has conductor $p^n$ for some $n \ge 1$, but $\chi$ itself has strictly greater conductor, then for some character $\delta : \mathbb{Z}_p^\times \to F^\times$ the twist $\chi(\delta^{-1} \circ \mathrm{norm})$ of $\chi$ has conductor $p^n$ and we define $\Theta(\chi)$ to be $\Theta(\chi(\delta^{-1} \circ \mathrm{norm})) \otimes_F F(\delta \circ \det)$; by Proposition 19 this definition does not depend on the choice of $\delta$.

Now, suppose that $F = \bar{K}$ and that $\chi : A^\times \to \bar{K}^\times$ is a continuous (hence finite order) character. By a model of $\Theta(\chi)$ over $\mathcal{O}$ we mean an $\mathcal{O}[\mathrm{GL}_2(\mathbb{Z}_p)]$-module $L$, free over $\mathcal{O}$, such that $L \otimes_{\mathcal{O}} \bar{K}$ is isomorphic to $\Theta(\chi)$.

PROPOSITION 20
Let $\chi : A^\times \to \bar{K}^\times$ be a character as above, and suppose there is a model $L$ for $\Theta(\chi)$ over $\mathcal{O}$. Then we have the following:
(1) $L$ is the unique model of $\Theta(\chi)$ up to isomorphism;
(2) the reduction $\bar{L} = L \otimes_{\mathcal{O}} k$ of $L$ is an absolutely irreducible $k[U_1(p)]$-module;
(3) there is an isomorphism
\[ L \cong \operatorname{Hom}_{\mathcal{O}}(L, \mathcal{O}) \otimes (\chi|_{\mathbb{Z}_p^\times} \circ \det) \]
of $\mathcal{O}[\mathrm{GL}_2(\mathbb{Z}_p)]$-modules; and
(4) if $L'$ is a model for $\Theta(\chi')$ over $\mathcal{O}$ and if the reduction of the pair $\{\chi', \chi' \circ \sigma\}$ is equal to that of $\{\chi, \chi \circ \sigma\}$, then $\bar{L}'$ is isomorphic to $\bar{L}$.
Proof
The second part follows from [S1, Sec. 16.4, Prop. 46] using the fact that $\Theta(\chi)|_{U_1(p)}$ is absolutely irreducible; the first part then follows from [S1, Exercise 15.3]. To prove the third part, note that there is a corresponding nondegenerate pairing on $V$, as described in [CDT, Sec. 3.3], so that both sides are models for $V$ and hence isomorphic. The last part comes from the fact that both $\bar{L}'$ and $\bar{L}$ have the same Brauer character; this follows from the construction of $\Theta(\chi)$ in [CDT].
Finally, we have the following supplement to the description of the local Langlands correspondence given in Section 3.

PROPOSITION 21
Suppose that $\psi : I_p \to \bar{K}^\times$ is the restriction of a character of the Galois group $H \lhd G_p$ of the unramified quadratic extension of $\mathbb{Q}_p$, and write $\chi$ for the corresponding character on $A^\times$. Assume that the conductors of $\chi$ and $\chi/\chi \circ \sigma$ are both equal to $p^n$ for some $n \ge 1$. Let $\pi$ be an irreducible admissible representation of $\mathrm{GL}_2(\mathbb{Q}_p)$ over $\bar{K}$. Then the representation $\pi^{U(p^n)}$ of $\mathrm{GL}_2(\mathbb{Z}_p)$ is isomorphic to $\Theta(\chi)$ if and only if the restriction to $I_p$ of the corresponding representation of $W_p$ is isomorphic to $\begin{pmatrix} \psi & 0 \\ 0 & \psi \circ \mathrm{Frob}_p \end{pmatrix}$; if this is not the case, then the module $\operatorname{Hom}_{\bar{K}[\mathrm{GL}_2(\mathbb{Z}_p)]}(\Theta(\chi), \pi^{U(p^n)})$ is trivial.

Here $\mathrm{Frob}_p$ denotes the automorphism of $H^{\mathrm{ab}}$ given by conjugation by $\mathrm{Frob}_p$.

Proof
This is [CDT, part 3 of Lem. 4.2.4].

7. A module for the modular deformation ring
In this section we explain how to define a module $H_S$ for the modular deformation ring $R_S^{\mathrm{mod}}$, and we give a useful reinterpretation of the definition of $H_S$ in the special case when $S$ contains all odd primes at which $\bar\rho$ is ramified. The module $H_S$ eventually turns out to be free of rank 2 over $R_S^{\mathrm{mod}}$. These results are used along with the main theorem to establish the multiplicity one result given in Section 1.

For convenience we assume in this section that $K$ is large enough that some twist of $\bar\rho$ by a $k^\times$-valued character is minimal, in the sense of Section 2.2. This can be achieved, for example, by adding all roots of unity of order $M$ to $K$, where $M$ is the odd part of $\varphi(N(\bar\rho))$. Thus, in general, our definition of $H_S$ is valid only after replacing $K$ by some finite (unramified) extension and making the corresponding replacements for $\mathcal{O}$, $k$, $\bar\rho$, and $\mathcal{C}_{\mathcal{O}}$.

Begin by choosing, once and for all, a representation $\bar\sigma : G_{\mathbb{Q}} \to \mathrm{GL}_2(k)$ and a character $\delta : G_{\mathbb{Q}} \to k^\times$ such that $\bar\rho = \bar\sigma \otimes_k \delta$ and $\bar\sigma$ is minimal. We may assume that $\delta$ is ramified only at those primes dividing $N(\bar\rho)/N(\bar\sigma)$. Write $\Delta : \hat{\mathbb{Z}}^\times \to \mathcal{O}^\times$ for the Teichmüller lift of the character $\delta \circ \varepsilon^{-1}$ corresponding to $\delta$. For each odd prime $p$, let $p^{c_p}$ be the conductor of $\bar\sigma|_{G_p}$ and let $e_p$ be the dimension of $\bar\sigma^{I_p}$. Let $A$ and $\sigma$ be as in the previous section. Recall that in Section 2 we defined $T(\bar\rho)$ to be the set of odd primes for which the restriction of $\bar\rho$ to $I_p$ is decomposable over $\bar{k}$ but the restriction to $G_p$ is absolutely irreducible. For each such prime the representation $\bar\rho|_{G_p} \otimes_k \bar{k}$ is induced from a character of the absolute Galois group of the degree 2 unramified
extension of $\mathbb{Q}_p$, whose restriction to $I_p$ corresponds under local class field theory to a character $A^\times \to \bar{k}^\times$. We let $\chi_p : A^\times \to \bar{K}^\times$ be the Teichmüller lift of this character; note that $\chi_p + \chi_p \circ \sigma$ takes values in $K$.

We describe how to construct for each prime $p$ an $\mathcal{O}[\mathrm{GL}_2(\mathbb{Z}_p)]$-module $M_p$.
(1) If $p$ is in $T(\bar\rho)$, then let $M_p$ be a model over $\mathcal{O}$ for the representation $\Theta(\chi_p)$ described in Section 6. (By the last part of Proposition 19 the representation $\Theta(\chi_p)$ is realisable over $K$, so that a model over $\mathcal{O}$ exists.)
(2) If $p$ is not in $T(\bar\rho)$, then let $M_p = \mathcal{O}(\Delta_p \circ \det)$, where $\Delta_p$ denotes the restriction of $\Delta$ to $\mathbb{Z}_p^\times$.
Note, in particular, that $M_p$ has trivial $\mathrm{GL}_2(\mathbb{Z}_p)$-action when $p = 2$ or $\bar\rho$ is unramified at $p$. Let $M$ be the module $\bigotimes_p M_p$ for $\mathrm{GL}_2(\hat{\mathbb{Z}})$.

Let $S$ be a finite set of odd primes; then we also define an open compact subgroup $V_S$ of $\mathrm{GL}_2(\hat{\mathbb{Z}})$ of the form $V_S = \prod_p V_{S,p}$, where each $V_{S,p}$ is an open compact subgroup of $\mathrm{GL}_2(\mathbb{Z}_p)$. For $p$ not in $S$, define $V_{S,p}$ as follows:
(1) let $V_{S,2} = \mathrm{GL}_2(\mathbb{Z}_2)$;
(2) if $p$ is an odd prime not in $T(\bar\rho)$, then let $V_{S,p}$ be the set of elements $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ in $U_0(p^{c_p})$ for which $d$ has order a power of 2 in $(\mathbb{Z}/p^{c_p}\mathbb{Z})^\times$;
(3) if $p$ is an odd prime in $T(\bar\rho)$, then let $V_{S,p} = \mathrm{GL}_2(\mathbb{Z}_p)$.
If $p$ is in $S$, then we use the following definitions:
(1) if $p$ is not in $T(\bar\rho)$, then let $V_{S,p} = U_1(p^{c_p + e_p})$;
(2) if $p$ is in $T(\bar\rho)$, then let $V_{S,p} = U_1(p)$.

We now explain how to use the choices of $V_S$ and $M$ to identify the set $\mathcal{N}_S$ of weight 2 newforms of odd level giving rise to $S$-deformations of $\bar\rho$. First we do some local analysis. Suppose that $f$ is a weight 2 newform from which $\bar\rho$ arises, with corresponding automorphic representation $\pi_f = \bigotimes'_p \pi_p$. Then for each prime $p$ we define a finite-dimensional $\bar{K}$-module
\[ H_{p,f} = \operatorname{Hom}_{\bar{K}[V_{S,p}]}(M_p \otimes_{\mathcal{O}} \bar{K}, \pi_p). \]
For $p = 2$ and for odd $p$ not in $S$ at which $\bar\rho$ is unramified, $H_{p,f} \cong \pi_p^{\mathrm{GL}_2(\mathbb{Z}_p)}$ and there is a natural action of the double coset operator $T_p$ on $H_{p,f}$.
For $p$ in $S - T(\bar\rho)$ the $\mathrm{GL}_2(\mathbb{Z}_p)$-module $M_p$ is equal to $\mathcal{O}(\Delta_p \circ \det)$, and we can define an action of $U_p$ as follows. Extend $\Delta_p$ to a character of $\mathbb{Q}_p^\times$ by setting $\Delta_p(p) = 1$; then $M_p$ can be considered as a module for $\mathrm{GL}_2(\mathbb{Q}_p)$. Hence $\operatorname{Hom}_{\bar{K}}(M_p \otimes_{\mathcal{O}} \bar{K}, \pi_p)$ can also be considered as a module for $\mathrm{GL}_2(\mathbb{Q}_p)$ (by defining $(gf)(x) = g(f(g^{-1}x))$ for $g$ in $\mathrm{GL}_2(\mathbb{Z}_p)$, $x$ in $M_{\bar{K}}$, and $f : M_{\bar{K}} \to \pi_f$), and we define $U_p$ to be the usual double coset operator $V_{S,p} \begin{pmatrix} p & 0 \\ 0 & 1 \end{pmatrix} V_{S,p}$.

LEMMA 22
The module $H_{p,f}$ is nontrivial if and only if $p = 2$ and $f$ has odd level, or $p$ is odd and the deformation $(R_f, \rho_f)$ corresponding to $f$ satisfies the condition at $p$ in the definition of an $S$-deformation. Furthermore, if this condition is satisfied, then
(1) for $p = 2$ or $p$ not in $S$ at which $\bar\rho$ is unramified, $H_{p,f}$ is 1-dimensional and $T_p$ acts by $t_p(f)$;
(2) for $p$ in $S - T(\bar\rho)$, write $\mathfrak{m}_p$ for the ideal $(U_p) + \mathfrak{m}_{\mathcal{O}}$ of the polynomial algebra $\mathcal{O}[U_p]$; then there is a decomposition
\[ H_{p,f} = H_{p,f,0} \oplus H_{p,f,1} \]
of $\bar{K}[U_p]$-modules such that $H_{p,f,0}$ is 1-dimensional and annihilated by $U_p$ and $(H_{p,f,1})_{\mathfrak{m}_p} = 0$;
(3) for the remaining $p$, the module $H_{p,f}$ is 1-dimensional.
Note that if $p$ is in $S$, then there is no condition imposed at $p$ in the definition of an $S$-deformation, so that for these primes Lemma 22 implies that $H_{p,f}$ should always be nontrivial.

Proof
If $p = 2$ or $p$ is an odd prime not in $S$ at which $\bar\rho$ is unramified, then, using the results described in Section 3, the space $H_{p,f} = \pi_p^{\mathrm{GL}_2(\mathbb{Z}_p)}$ is trivial unless $\pi_p$ is unramified, in which case it has dimension 1. But by Theorem 13, for odd $p$ the representation $\pi_p$ is unramified if and only if $\rho_f|_{G_p}$ is, while for $p = 2$ the representation $\pi_2$ is unramified if and only if $f$ has odd level.

Now, suppose that $p$ is in $S - T(\bar\rho)$. Then there is no condition imposed at $p$ in the definition of an $S$-deformation, so it is enough to check that the stated decomposition holds, and then $H_{p,f} \cong (\pi_p \otimes_{\bar{K}} \bar{K}(\Delta_p^{-1} \circ \det))^{U_1(p^{c_p + e_p})}$ is nontrivial. By Theorem 13 the representation $\pi_p \otimes_{\bar{K}} \bar{K}(\Delta_p^{-1} \circ \det)$ (considered as a representation of $\mathrm{GL}_2(\mathbb{Q}_p)$ by setting $\Delta(p) = 1$) corresponds under the local Langlands correspondence to a lift $\sigma_p : W_p \to \mathrm{GL}_2(\bar{K})$ of $\bar\sigma|_{W_p}$. If $p^{c_p(\sigma_p)}$ is the conductor of $\sigma_p$ and $e_p(\sigma_p) = \dim_{\bar{K}} \sigma_p^{I_p}$, then from the definitions $c_p(\sigma_p) + e_p(\sigma_p) = c_p + e_p$, and then the last part of Theorem 13 provides the required decomposition.

The last part of the lemma again follows from Theorem 13 along with Proposition 9 and, for $p$ in $T(\bar\rho)$, Proposition 21. We give details when $p$ is in $S \cap T(\bar\rho)$. In this case, we must show that the module $H_{p,f} = \operatorname{Hom}_{\bar{K}[U_1(p)]}(\Theta(\chi_p), \pi_p)$ is always 1-dimensional; the argument is mildly complicated by the need to twist representations so that the conductor conditions of Proposition 21 are satisfied.
The representation $\pi_p$ corresponds under the local Langlands correspondence to a representation $\rho_p : W_p \to \mathrm{GL}_2(\bar{K})$ which lifts $\bar\rho|_{W_p}$; by Proposition 9 this representation is induced from some character of the index 2 subgroup of $W_p$ containing $I_p$, and the restriction of this character to $I_p$ corresponds by local class field theory to a character $\psi_p : A^\times \to \bar{K}^\times$ whose reduction (replacing $\psi_p$ with $\psi_p \circ \sigma$ if necessary) is equal to that of $\chi_p$. Note that $\psi_p|_{1+pA} = \chi_p|_{1+pA}$ since the quotient $\psi_p/\chi_p$ has 2-power order and so is trivial on the pro-$p$-group $1 + pA$. From Proposition 21, applied with $\chi = \psi_p(\Delta_p^{-1} \circ \mathrm{norm})$ and $\pi = \pi_p \otimes (\Delta_p^{-1} \circ \det)$, we find that $(\pi_p \otimes (\Delta_p^{-1} \circ \det))^{U(p^n)}$ is isomorphic to $\Theta(\psi_p(\Delta_p^{-1} \circ \mathrm{norm}))$. Here $p^n$ is the conductor of $\psi_p(\Delta_p^{-1} \circ \mathrm{norm})$, which is also equal to the conductor of $\chi_p(\Delta_p^{-1} \circ \mathrm{norm})$. Since the representation $\Theta(\chi_p(\Delta_p^{-1} \circ \mathrm{norm}))$ of $\mathrm{GL}_2(\mathbb{Z}_p)$ has trivial $U(p^n)$-action, it follows that
\[ H_{p,f} \cong \operatorname{Hom}_{\bar{K}[U_1(p)]}\bigl( \Theta(\chi_p(\Delta_p^{-1} \circ \mathrm{norm})),\ \Theta(\psi_p(\Delta_p^{-1} \circ \mathrm{norm})) \bigr), \]
which is 1-dimensional since the two representations $\Theta(\chi_p)$ and $\Theta(\psi_p)$ of $U_1(p)$ are irreducible and isomorphic by Proposition 19.

Now, let $\mathbb{T}_S$ be the polynomial $\mathcal{O}$-algebra generated by indeterminates $T_p$ for $p = 2$ and for odd primes $p$ not in $S$ at which $\bar\rho$ is unramified and $U_p$ for $p$ in $S - T(\bar\rho)$. Let $I_S$ be the kernel of the surjective map $\mathbb{T}_S \to R_S^{\mathrm{mod}}$ which sends the indeterminate $U_p$ to zero for $p$ in $S - T(\bar\rho)$ and the indeterminate $T_p$ to the element $T_p$ of $R_S^{\mathrm{mod}}$ for $p = 2$ and for odd primes not in $S$ at which $\bar\rho$ is unramified. Let $\mathfrak{m}_S$ be the preimage in $\mathbb{T}_S$ of the maximal ideal of $R_S^{\mathrm{mod}}$. Let $M_{\bar{K}} = \bigotimes_p (M_p)_{\bar{K}}$ denote the (irreducible) representation $M \otimes_{\mathcal{O}} \bar{K}$ of $\mathrm{GL}_2(\hat{\mathbb{Z}})$ over $\bar{K}$, and let $f$ be a weight 2 newform with corresponding automorphic representation $\pi_f = \bigotimes'_p \pi_p$. Then the actions of $T_p$ and $U_p$ described above give the $\bar{K}$-vector space $\operatorname{Hom}_{\bar{K}[V_S]}(M_{\bar{K}}, \pi_f)$ an action of $\mathbb{T}_S$.

The following lemma indicates when a weight 2 newform $f$ gives rise to an $S$-deformation of $\bar\rho$, in terms of the local components of the automorphic representation associated to $f$.
LEMMA 23
Let $f$ be a weight 2 newform with corresponding automorphic representation $\pi_f = \bigotimes'_p \pi_p$ over $\bar{K}$. Then $f$ is in $\mathcal{N}_S$ if and only if the finite-dimensional $\bar{K}$-vector space
\[ \operatorname{Hom}_{\bar{K}[V_S]}(M_{\bar{K}}, \pi_f)_{\mathfrak{m}_S} \]
is nontrivial. In this case, this vector space has dimension 1 and the natural map
\[ \operatorname{Hom}_{\bar{K}[V_S]}(M_{\bar{K}}, \pi_f)[I_S] \to \operatorname{Hom}_{\bar{K}[V_S]}(M_{\bar{K}}, \pi_f)_{\mathfrak{m}_S} \]
is an isomorphism.

Proof
Write $H$ for the $\bar{K}$-vector space $\operatorname{Hom}_{V_S}(M_{\bar{K}}, \pi_f)$; we have an isomorphism
\[ H \cong \bigotimes_p H_{p,f} \]
of finite-dimensional $\bar{K}$-vector spaces where $H_{p,f}$ was defined above, and for each prime $p$ the operator $T_p$ or $U_p$ acts only through $H_{p,f}$. One can check that the $\bar{K}$-modules $H_{\mathfrak{m}_S}$ and $H[I_S]$ are unaffected by base change (that is, by replacing $K$ with a finite extension and making corresponding changes for all related data), and so without loss of generality we may assume that $\mathcal{O}$, and hence $\mathbb{T}_S$, contains the Hecke eigenvalues of $f$.

First, suppose that $H_{\mathfrak{m}_S}$ is nonzero, so that $H_{p,f}$ must be nonzero for each $p$. For all but finitely many $p$, the element $T_p - t_p(f)$ of $\mathbb{T}_S$ annihilates $H_{p,f}$ and hence also $H$; so if $H_{\mathfrak{m}_S}$ is to be nonzero, then $T_p - t_p(f)$ must be in $\mathfrak{m}_S$ for these $p$ and it follows that $\bar\rho$ arises from $f$. Now we can apply the first part of Lemma 22 to deduce that $f$ must in fact be an element of $\mathcal{N}_S$.

Conversely, suppose that $f$ is in $\mathcal{N}_S$. Since $\mathfrak{m}_S$ is the unique maximal ideal of $\mathbb{T}_S$ containing $I_S$, it follows that, for any $\mathbb{T}_S$-module $L$, the natural map $L[I_S] \to L_{\mathfrak{m}_S}$ is an inclusion. We describe a decomposition $H = H_0 \oplus H_1$ of $\mathbb{T}_S$-modules such that $H_0$ is 1-dimensional and equal to $H_0[I_S]$ while $(H_1)_{\mathfrak{m}_S} = 0$; then $H[I_S] = H_0 = H_{\mathfrak{m}_S}$, and the result follows.

For each $p$ in $S - T(\bar\rho)$, Lemma 22 gives a decomposition of $H_{p,f}$ into two components $H_{p,f,0}$ and $H_{p,f,1}$; this gives a corresponding decomposition of $H = \bigotimes_p H_{p,f}$ into $2^{\#(S - T(\bar\rho))}$ components. Let $H_0$ be the single component involving the product of the $H_{p,f,0}$, and let $H_1$ be the sum of the remaining components. Let $\mathfrak{p}_f$ be the kernel of the map $\mathbb{T}_S \to \bar{K}$ which sends $T_p$ to $t_p(f)$ for $p = 2$ and odd $p$ at which $\bar\rho$ is unramified, and sends $U_p$ to zero for $p$ in $S - T(\bar\rho)$; then from Lemma 22 it follows that $H_0 = H_0[\mathfrak{p}_f]$ and that $(H_1)_{\mathfrak{m}_S} = 0$. Furthermore, since $f$ is in $\mathcal{N}_S$ by assumption, the map $\mathbb{T}_S \to \bar{K}$ factors through the map $\mathbb{T}_S \to R_S^{\mathrm{mod}}$, and so $I_S \subset \mathfrak{p}_f$ and $H_0[I_S] = H_0$. This gives the required decomposition of $H$ and completes the proof of the lemma.
The preceding result motivates the following definition of the module $H_S$ for $R_S^{\mathrm{mod}}$. Recall that in Section 4 we defined an admissible $\mathrm{GL}_2(\mathbb{A}^\infty)$-module $H$ as the direct limit $\varinjlim_U H^1(X_U, \mathcal{O})$ of cohomology groups of modular curves. Now using the definitions of $V_S$, $M$, $\mathbb{T}_S$, $I_S$, and $\mathfrak{m}_S$ above, we have a natural action of $\mathbb{T}_S$ on $\operatorname{Hom}_{\mathcal{O}[V_S]}(M, H)$ defined in the same way as the action on $\operatorname{Hom}_{\mathcal{O}[V_S]}(M_{\bar{K}}, \pi_f)$, and we define
\[ H_S = \operatorname{Hom}_{\mathcal{O}[V_S]}(M, H)[I_S]. \]
Since $H_S$ is supported only at the maximal ideal $\mathfrak{m}_S$ of $\mathbb{T}_S$, there is an inclusion
\[ H_S \to \operatorname{Hom}_{\mathcal{O}[V_S]}(M, H)_{\mathfrak{m}_S} \]
of $\mathbb{T}_S$-modules, which is an isomorphism since both sides are free $\mathcal{O}$-modules of the same rank (from Lemma 23 and the decomposition of $H \otimes_{\mathcal{O}} \bar{K}$) and $H_S$ has torsion-free cokernel in $\operatorname{Hom}_{\mathcal{O}[V_S]}(M, H)$.
PROPOSITION 24
The module $H_S \otimes_{\mathcal{O}} K$ is a free $R_S^{\mathrm{mod}} \otimes_{\mathcal{O}} K$-module of rank 2.

Proof
It is enough to show this result after base change to $\bar{K}$, and then it follows easily from Lemma 23 and from the decomposition of $H \otimes_{\mathcal{O}} \bar{K}$.

COROLLARY 25
The ring $R_S^{\mathrm{mod}}$ can be identified with the $\mathcal{O}$-subalgebra of $\operatorname{End}_{\mathcal{O}} H_S$ generated by the Hecke operators $T_p$ and $S_p$ for $p = 2$ and for odd $p$ not in $S$ at which $\bar\rho$ is unramified, and by operators $U_p$ for the remaining odd primes $p$, excluding those in $T(\bar\rho)$.
Proof
The natural map $R_S^{\mathrm{mod}} \to \operatorname{End}_{\mathcal{O}} H_S$ of free $\mathcal{O}$-modules is injective by Proposition 24, and its image is generated by $T_p$ for some cofinite set of primes $p$. We need only show that the images of all the Hecke operators actually lie in $R_S^{\mathrm{mod}}$. For $p = 2$ we have already shown in Section 5 that there is an element $T_2$ of $R_S^{\mathrm{mod}}$ whose action on $H_S$ corresponds to the action of the Hecke operator $T_2$. If $p$ is odd and not in $S$ and $\bar\rho$ is unramified at $p$, then $T_p = \operatorname{tr} \rho_S^{\mathrm{mod}}(\mathrm{Frob}_p)$. If $p$ is odd and an element of $S - T(\bar\rho)$, then $U_p = 0$. If $p$ is odd and not in $S$ or $T(\bar\rho)$, then either $\bar\rho^{I_p} = 0$ and $U_p = 0$ or $\bar\rho^{I_p}$ has dimension 1 over $k$, $(\rho_S^{\mathrm{mod}})^{I_p}$ is a free rank 1 $R_S^{\mathrm{mod}}$-module, and $U_p = \operatorname{tr} (\rho_S^{\mathrm{mod}})^{I_p}(\mathrm{Frob}_p)$.

If $S$ contains $T(\bar\rho)$ and all primes at which $\delta$ is ramified (in other words, if $S$ contains all those primes $p$ for which the component $M_p$ of $M$ is not just $\mathcal{O}$ with trivial $\mathrm{GL}_2(\mathbb{Z}_p)$-action), it is possible to give a simpler description of $H_S$ which does not involve the module $M$. The most useful case of this is that in which $S$ contains all odd primes at which $\bar\rho$ is ramified.

PROPOSITION 26
Suppose that $S$ contains all odd primes $p$ at which $\bar\rho$ is ramified. Define an integer $N$ by the formula
\[ N = N(\bar\rho) \prod_{p \in S} p^{\dim_k \bar\rho^{I_p}}. \]
Then there are isomorphisms
\[ H_S \cong (H^{U_1(N)})_{\mathfrak{m}_S} \cong H^1(X_1(N), \mathcal{O})_{\mathfrak{m}_S} \]
of $\mathbb{T}_S$-modules. Furthermore, for all $p$ in $S$ (including those $p$ in $T(\bar\rho)$) the natural action of $U_p$ on $H^1(X_1(N), \mathcal{O})_{\mathfrak{m}_S}$ is by zero.
Proof
First, note that $N$ is at least 11 by Proposition 1 (since there are no weight 2 cusp forms of level smaller than 11), and so $U_1(N)$ is sufficiently small. Thus the second isomorphism of the proposition is immediate from Proposition 15.

For the first isomorphism, we only have to deal with primes $p$ in $T(\bar\rho)$ and primes $p$ at which $\delta$ is ramified, since these are the only $p$ for which $M_p$ has nontrivial $\mathrm{GL}_2(\mathbb{Z}_p)$-action. First, suppose that $p$ is in $T(\bar\rho)$. Let $V'_{S,p}$ be the subgroup of $\mathrm{GL}_2(\mathbb{Z}_p)$ consisting of all matrices that reduce modulo $p^{c_p/2}$ to something of the form $\begin{pmatrix} * & 0 \\ 0 & 1 \end{pmatrix}$. By Proposition 19 the space $\operatorname{Hom}_{\mathcal{O}[V'_{S,p}]}(\mathcal{O}(\Delta_p \circ \det), M_p)$ is a free $\mathcal{O}$-module of rank 1; let $f_p$ be a generator for this. Then composition with $f_p$ gives a map $H_S \to \operatorname{Hom}_{\mathcal{O}[V'_{S,p}]}(\mathcal{O}(\Delta \circ \det), H)$; since $\begin{pmatrix} p^{-c_p/2} & 0 \\ 0 & 1 \end{pmatrix} V'_{S,p} \begin{pmatrix} p^{c_p/2} & 0 \\ 0 & 1 \end{pmatrix}$ contains $U_1(p^{c_p})$, we can compose further with the map $H \to H$ given by $x \mapsto \begin{pmatrix} p^{-c_p/2} & 0 \\ 0 & 1 \end{pmatrix} x$ to obtain an injective map
\[ H_S \to \operatorname{Hom}_{\mathcal{O}[U_1(p^{c_p})]}(\mathcal{O}(\Delta \circ \det), H)_{\mathfrak{m}_S}. \]
(Here the matrix $\begin{pmatrix} p^{c_p} & 0 \\ 0 & 1 \end{pmatrix}$ should be thought of as an element of $\mathrm{GL}_2(\mathbb{Q}_p)$.) This map is easily seen to be injective with torsion-free cokernel. (The latter fact follows from the fact that the reduction of $M_p$ modulo the maximal ideal of $\mathcal{O}$ is an absolutely irreducible $k[U_1(p)]$-module and hence that $f_p$ generates $M_p$ as an $\mathcal{O}[U_1(p)]$-module.) Combining these maps for all $p$ in $T(\bar\rho)$, we obtain a map
\[ H_S \to \operatorname{Hom}_{\mathcal{O}[U_1(N')]}(\mathcal{O}(\Delta \circ \det), H)_{\mathfrak{m}_S} \]
which is again injective with torsion-free cokernel, where $N'$ denotes the integer $N(\bar\sigma) \prod_{p \in S} p^{\dim_k \bar\sigma^{I_p}} = \prod_{p \in S} p^{c_p + e_p}$. Now, using analogues of Lemmas 22 and 23, one can show that the domain and codomain of this map both have the same rank over $\mathcal{O}$ and hence that the map is an isomorphism.
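The remark at the start of this proof, that there are no weight 2 cusp forms of level smaller than 11, can be checked numerically. The sketch below assumes the standard facts that the space of weight 2 cusp forms for $\Gamma_1(n)$ has dimension equal to the genus of $X_1(n)$, that this genus is 0 for $n \le 4$, and that for $n \ge 5$ it is given by $1 + \frac{n^2}{24}\prod_{p \mid n}(1 - p^{-2}) - \frac{1}{4}\sum_{d \mid n}\varphi(d)\varphi(n/d)$. The function names are hypothetical and none of this is taken from the paper itself.

```python
def prime_factors(n):
    """Set of prime divisors of n, by trial division."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors


def euler_phi(n):
    """Euler's totient function."""
    result = n
    for p in prime_factors(n):
        result = result // p * (p - 1)
    return result


def genus_x1(n):
    """Genus of X_1(n), i.e. the dimension of weight 2 cusp forms on Gamma_1(n)."""
    if n < 5:
        return 0  # standard fact for n <= 4, not computed by the formula below
    index_term = n * n
    for p in prime_factors(n):
        index_term = index_term // (p * p) * (p * p - 1)  # n^2 * prod (1 - p^-2)
    cusp_term = sum(euler_phi(d) * euler_phi(n // d)
                    for d in range(1, n + 1) if n % d == 0)
    numerator = 24 + index_term - 6 * cusp_term  # 24 * genus
    assert numerator % 24 == 0
    return numerator // 24


if __name__ == "__main__":
    print([genus_x1(n) for n in range(1, 14)])
```

The first level with positive genus in this computation is 11, which is consistent with the bound used in the proof.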
Now we untwist to remove the factor of $\Delta \circ \det$: let $f$ be any element of $\operatorname{Hom}_{U_1(N')}(\mathcal{O}(\Delta \circ \det), H)[I_S]$; then the image of $f$ is contained in the subspace $H' \subset H$ consisting of all elements $x$ of $H$ for which
(1) $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ acts trivially on $x$, and
(2) the Hecke operator $U_p$ annihilates $x$ for each $p$ dividing the conductor $N(\Delta)$ of $\Delta$.
For any $x$ satisfying these two properties, note that $\sum_{i=0}^{N(\Delta)-1} \begin{pmatrix} 1 & i/N(\Delta) \\ 0 & 1 \end{pmatrix} x = 0$. Now, consider the map
\[ \theta : H' \to H' \]
given by multiplication by the Gauss sum $\sum_{i=0}^{N(\Delta)-1} \Delta(i) \begin{pmatrix} 1 & i/N(\Delta) \\ 0 & 1 \end{pmatrix}$. This map is an isomorphism, with inverse given by
\[ x \mapsto \frac{1}{N(\Delta)} \sum_{i=0}^{N(\Delta)-1} \Delta^{-1}(i) \begin{pmatrix} 1 & -i/N(\Delta) \\ 0 & 1 \end{pmatrix} x. \]
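The Gauss sum operator and its stated inverse can be tested in a toy model. The sketch below makes several assumptions that are not in the text: the conductor is the prime 7, the character is the order 3 Dirichlet character sending the primitive root 3 to a primitive cube root of unity, the unipotent matrix with upper-right entry $1/N$ acts as a cyclic shift on complex vectors of length 7, and the vanishing of the sum over all shifts is modelled by restricting to vectors whose entries sum to zero. Condition (2) on the operators $U_p$ is not modelled separately.

```python
import cmath

N = 7   # hypothetical prime conductor
G = 3   # a primitive root modulo 7
OMEGA = cmath.exp(2j * cmath.pi / 3)  # primitive cube root of unity


def delta(i):
    """Order 3 character mod N with delta(G**a) = OMEGA**a and delta(0) = 0."""
    i %= N
    if i == 0:
        return 0j
    exponent = next(a for a in range(N - 1) if pow(G, a, N) == i)
    return OMEGA ** exponent


def translate(x, i):
    """Toy action of the i-th power of the unipotent element: a cyclic shift."""
    return [x[(j - i) % N] for j in range(N)]


def theta(x):
    """Gauss sum operator: sum over i of delta(i) * (shift by i)."""
    total = [0j] * N
    for i in range(N):
        shifted = translate(x, i)
        total = [t + delta(i) * v for t, v in zip(total, shifted)]
    return total


def theta_inverse(y):
    """Candidate inverse: (1/N) * sum over units i of delta(i)^(-1) * (shift by -i)."""
    total = [0j] * N
    for i in range(1, N):
        shifted = translate(y, -i)
        total = [t + v / delta(i) for t, v in zip(total, shifted)]
    return [t / N for t in total]


if __name__ == "__main__":
    x = [1.0, -2.0, 0.5, 3.0, -1.0, 0.0, -1.5]  # entries sum to zero
    back = theta_inverse(theta(x))
    print(max(abs(a - b) for a, b in zip(x, back)))
```

In this model the composite differs from the identity by a multiple of the sum over all shifts, so it is the identity exactly on the zero-sum vectors, while constant vectors are killed by the Gauss sum operator.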
Hence the map
\[ \operatorname{Hom}_{\mathcal{O}[V_S]}(\mathcal{O}(\Delta \circ \det), H)[I_S] \to \operatorname{Hom}_{\mathcal{O}}(\mathcal{O}, H) \]
given by composition with $\theta$ is injective with torsion-free cokernel. A straightforward but tedious case-by-case check shows that the image of this map is contained in $\operatorname{Hom}_{\mathcal{O}[U_1(N)]}(\mathcal{O}, H)[I_S] = (H^{U_1(N)})_{\mathfrak{m}_S}$; it is also straightforward to check (again using the methods of Lemmas 22 and 23) that both modules are free $\mathcal{O}$-modules of the same rank, so that this map is also an isomorphism.

In order to reduce statements of the main theorem to the minimal case, we also need to understand the behaviour of $H_S$ under twisting and under change of base.

PROPOSITION 27
The module $H_S$ has the following two properties.
(1) Let $K'$ be a finite extension of $K$ with ring of integers $\mathcal{O}'$ and residue field $k'$, and, using these data, define a module $H'_S$ as above. Then $H'_S$ is isomorphic as an $(R_S^{\mathrm{mod}})' = R_S^{\mathrm{mod}} \otimes_{\mathcal{O}} \mathcal{O}'$-module to $H_S \otimes_{\mathcal{O}} \mathcal{O}'$.
(2) Let $H_{S,\bar\sigma}$ denote the module defined as above using the representation $\bar\sigma$ in place of $\bar\rho$. Then there is an isomorphism
\[ H_S \to H_{S,\bar\sigma} \]
of $\mathcal{O}$-modules such that for each odd prime $p$ at which $\bar\rho$ is unramified the action of $T_p$ in $R_S^{\mathrm{mod}}$ on $H_S$ corresponds to the action of $T_p \tilde\delta(\mathrm{Frob}_p)$ on $H_{S,\bar\sigma}$. Equivalently, the module $H_S$ regarded as a module over $R_{S,\bar\sigma}^{\mathrm{mod}}$ via the inverse of the natural isomorphism $R_S^{\mathrm{mod}} \to R_{S,\bar\sigma}^{\mathrm{mod}}$ is isomorphic to $H_{S,\bar\sigma}$.

Thus if we can show that $H_{S,\bar\sigma}$ is free over $R_{S,\bar\sigma}^{\mathrm{mod}}$ of rank 2, then it follows that $H_S$ is free of rank 2 over $R_S^{\mathrm{mod}}$.
Proof
The proof of the first statement is straightforward from the definition of $H_S$ and from the fact that the module $M$ is a flat and finitely presented $\mathcal{O}$-module.

For the proof of the second statement, let $N(\delta)$ be the conductor of $\delta$ and suppose that $V$ is any open compact subgroup of $\mathrm{GL}_2(\mathbb{A}^\infty)$ whose determinant is contained in the kernel of the map $\hat{\mathbb{Z}}^\times \to (\mathbb{Z}/N(\delta)\mathbb{Z})^\times$. Write $\Delta : \hat{\mathbb{Z}}^\times \to \mathcal{O}^\times$ for the character corresponding to the Teichmüller lift of $\delta$, and extend $\Delta$ to $\mathbb{A}^\times = \mathbb{Q}^\times(\hat{\mathbb{Z}}^\times \times \mathbb{R}^\times_{>0})$ by making it trivial on $\mathbb{Q}^\times$ and $\mathbb{R}^\times_{>0}$. Then the map $[g] \mapsto \Delta(\det g)$ from $Y_V$ to $\mathcal{O}$ (where $[g]$ is the element of $Y_V$ represented by an element $g$ of $\mathrm{GL}_2(\mathbb{A})$) is a well-defined locally constant map, which extends to the cusps to give a map $\theta : X_V \to \mathcal{O}$; consider the endomorphism of the space of $\mathcal{O}$-valued divisors of $X_V$ which sends a point $[g]$ to $\theta(g) \cdot [g]$. This gives a natural map $H^1(X_V, \mathcal{O}) \to H^1(X_V, \mathcal{O})$; putting these maps together for all $V$ as above, we obtain a map
\[ \Theta : H \to H \]
of $\mathcal{O}$-modules satisfying $\Theta(gx) = \Delta(\det g)\Theta(x)$ for $x$ in $H$ and $g$ in $\mathrm{GL}_2(\mathbb{A}^\infty)$.

Now, let $M_{\bar\sigma}$ be the module defined in the same way as $M$ but with respect to $\bar\sigma$ in place of $\bar\rho$; then $M = M_{\bar\sigma} \otimes_{\mathcal{O}} \Delta \circ \det$ as a $U_S$-module. Let $\mathfrak{m}_{S,\bar\sigma}$ be the maximal ideal of $\mathbb{T}_S$ corresponding to $\bar\sigma$. Then one can check that composition with $\Theta$ gives the required isomorphism
\[ \operatorname{Hom}_{U_S}(M, H)_{\mathfrak{m}_S} \to \operatorname{Hom}_{U_S}(M_{\bar\sigma}, H)_{\mathfrak{m}_{S,\bar\sigma}}. \]
Variant. Building on the variant at the end of Section 5, one can take $V_{S,2}$ to be $U_1(2)$ instead of $\mathrm{GL}_2(\mathbb{Z}_2)$, include an indeterminate $U_2$ in $\mathbb{T}_S$ in place of $T_2$, and send it to $U_2$ in $R_S^{\mathrm{mod}}$ in the definition of $I_S$ and $\mathfrak{m}_S$. Then the corresponding definition of $H_S$ gives a module that is isomorphic to the original $H_S$, but the proof of Proposition 26 naturally identifies this new $H_S$ with $H^1(X_1(2N), \mathcal{O})_{\mathfrak{m}_S}$.

8. The main theorem
We are now in a position to state a stronger version of the main theorem. Let $\bar\rho$ and $S$ be as in Section 1. We prove the following theorem.

THEOREM 28
We have the following:
(1) the map $\phi_S : R_S^{\mathrm{univ}} \to R_S^{\mathrm{mod}}$ is an isomorphism;
(2) the algebra $R_S^{\mathrm{univ}}$ is a complete intersection ring; and
(3) after any change of base necessary to define $H_S$, this module is free over $R_S^{\mathrm{univ}}$ of rank 2.
The first step in the proof is to reduce to the case where $\bar\rho$ is minimal; this reduces the number of possible cases for the local behaviour of $\bar\rho$ and so makes some of the later calculations easier. In order to make $\bar\rho$ minimal, we must first replace $K$ by some finite
MARK DICKINSON
extension, with corresponding replacements for $\mathcal{O}$, $k$, $\bar\rho$, and $\mathcal{C}_{\mathcal{O}}$, and then we must twist $\bar\rho$ by a suitably chosen $k^\times$-valued character. That we can do both of these things follows from Propositions 8, 18, and 27, along with standard base change results. We assume from this point onwards that $\bar\rho$ is minimal.

We also explain here how to add auxiliary structure that is necessary in later arguments. Let $r$ be an odd prime congruent to 3 modulo 4, and assume that $r$ is not in $S$, that $\bar\rho$ is unramified at $r$, and that $\bar\rho(\mathrm{Frob}_r)$ has distinct $k$-rational eigenvalues. That such a prime exists follows from the Čebotarev density theorem together with the fact that the restriction of $\bar\rho$ to $G_{\mathbb{Q}(i)}$ is absolutely irreducible. (If this were false, then $\bar\rho$ would be induced from a character of $G_{\mathbb{Q}(i)}$ and this would contradict the hypothesis on the restriction of $\bar\rho$ to $G_2$.) Now in the definition of $H_S$ we alter the subgroup $V_S$ of $\mathrm{GL}_2(\hat{\mathbb{Z}})$ by replacing $V_{S,r} = \mathrm{GL}_2(\mathbb{Z}_r)$ with $U_1(r)\{\pm I\}$ and by removing the generator $T_r$ from $T_S$, and we write $L_S$ for the $R_S^{\mathrm{mod}}$-module thus obtained.

PROPOSITION 29
There is an isomorphism $L_S \cong H_S \oplus H_S$ of $R_S^{\mathrm{mod}}$-modules.
Proof Let $V_S'$ be the subgroup obtained from $V_S$ by adding auxiliary structure at $r$. Thus $V_S'$ is a product of subgroups of $\mathrm{GL}_2(\mathbb{Z}_p)$, and these component subgroups are identical to those of $V_S$, except at the place $r$ where we use $U_1(r)\{\pm I\}$ in place of $\mathrm{GL}_2(\mathbb{Z}_r)$. The module $L_S$ is therefore defined as $\mathrm{Hom}_{\mathcal{O}[V_S']}(M, H)_{\mathfrak{m}_S}$. Since $(R_S^{\mathrm{mod}}, \rho_S^{\mathrm{mod}})$ is an $S$-deformation of $\bar\rho$, the representation $\rho_S^{\mathrm{mod}}$ is unramified at $r$. Let $\tilde\alpha_r$ and $\tilde\beta_r$ in $R_S^{\mathrm{mod}}$ be the eigenvalues of $\rho_S^{\mathrm{mod}}(\mathrm{Frob}_r)$ lifting the eigenvalues $\alpha_r$ and $\beta_r$ of $\bar\rho(\mathrm{Frob}_r)$. The module $L_S$ has actions of $R_S^{\mathrm{mod}}$ and of $U_r$ which commute with each other.

Now, define a map $H_S \to L_S$ sending an element $f$ of $H_S$ to $(U_r - \tilde\beta_r) f$, and a map $L_S \to H_S$ which sends an element $f$ of $L_S$ to $\sum_g g f$ as $g$ runs over a set of coset representatives for $\mathrm{GL}_2(\mathbb{Z}_r)/U_0(r)$. We claim that the module $L_S$ is unaltered if we define it using $U_0(r)$ in place of $U_1(r)\{\pm I\}$, so that the latter map is well defined and $L_S$ decomposes into two pieces on which $U_r$ acts by $\tilde\alpha_r$ and $\tilde\beta_r$, respectively; we claim further that the composite $H_S \to H_S$ of the two maps is equal to multiplication by the unit $r\tilde\alpha_r - \tilde\beta_r$ of $R_S^{\mathrm{mod}}$. Granted these claims, the maps given identify $H_S$ with the submodule of $L_S$ on which $U_r$ acts by $\tilde\alpha_r$, and by symmetry we can also identify $H_S$ with the submodule of $L_S$ on which $U_r$ acts by $\tilde\beta_r$. This proves the proposition.

It is enough to check the claims after base change from $\mathcal{O}$ to $\bar K$. But then, using the decomposition of $H \otimes_{\mathcal{O}} \bar K$, both $H_S \otimes_{\mathcal{O}} \bar K$ and $L_S \otimes_{\mathcal{O}} \bar K$ decompose into components corresponding to weight 2 newforms from which $\bar\rho$ arises, and the problem is reduced to checking the claims for any one component.
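The assertion that $r\tilde\alpha_r - \tilde\beta_r$ is a unit can be checked by reducing modulo the maximal ideal. The following is a brief sketch, assuming (as for objects of $\mathcal{C}_{\mathcal{O}}$) that $R_S^{\mathrm{mod}}$ is a local ring with residue field $k$ of characteristic 2; the symbol $\mathfrak{m}_R$ for its maximal ideal is notation introduced here only for illustration.

```latex
% Assumption: R_S^{mod} is local with maximal ideal \mathfrak{m}_R and
% residue field k of characteristic 2 (notation introduced for this sketch).
% Since r is an odd prime, r \equiv 1 in k, so
\[
  r\tilde\alpha_r - \tilde\beta_r \;\equiv\; \alpha_r - \beta_r
  \pmod{\mathfrak{m}_R}.
\]
% The eigenvalues \alpha_r and \beta_r of \bar\rho(\mathrm{Frob}_r) are
% distinct by the choice of r, so the right-hand side is nonzero in k.
% Hence r\tilde\alpha_r - \tilde\beta_r \notin \mathfrak{m}_R, and an element
% of a local ring outside the maximal ideal is a unit.
```

The same reduction shows that $\tilde\alpha_r - \tilde\beta_r$ is a unit, which is what allows $L_S$ to split into the two $U_r$-eigenpieces described in the proof.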
By Lemma 23 the only newforms contributing to $H_S \otimes_{\mathcal{O}} \bar K$ are those in $N_S$, and in fact the same is true for $L_S$; to see this, note that if $f$ is a weight 2 newform from which $\bar\rho$ arises and $\pi_f = \otimes'_p \pi_p$ is the corresponding automorphic representation, then $\pi_r$ is a principal series representation. For each weight 2 newform $f$ from which $\bar\rho$ arises, the $r$th component of the corresponding representation is principal series, equal to $\pi(\chi, \psi)$, where $\chi$ and $\psi$ are tamely ramified, and the restriction of its central character to $\mathbb{Z}_r^\times$ is of 2-power order. This form can contribute nontrivially to $L_S \otimes_{\mathcal{O}} \bar K$ only if $\pi(\chi, \psi)^{U_1(r)\{\pm I\}}$ is nonzero. But then, since $r$ is congruent to 3 modulo 4, the central character restricted to $\mathbb{Z}_r^\times$ has odd order and so is trivial. So $\pi_r^{U_1(r)\{\pm I\}}$ is equal to $\pi_r^{U_0(r)}$. Furthermore, if $\pi_r^{U_0(r)}$ is nontrivial, then $\pi_r$ has conductor at worst $r$, so that at least one of the characters $\chi$ and $\psi$ is unramified, and the central character of $\pi_r$ is also unramified. Hence both $\chi$ and $\psi$ are unramified, and so is $\pi_r$. It follows that the representation associated to $f$ is unramified at $r$ and hence that the only newforms contributing nontrivially to either of $H_S$ or $L_S$ are those in $N_S$. Furthermore, the map $L_S \to H_S$ above is well defined, as claimed. One can also check, using the explicit description of $\pi(\chi, \psi)$, that the composite of the two maps defined above is equal to multiplication by $r^2\chi(r) - \psi(r)$, or $r\tilde\alpha_r - \tilde\beta_r$, as required.

Thus it is enough to show that $L_S$ is a free $R_S^{\mathrm{mod}}$-module (of rank 4), and then $H_S$ is projective, so also free, necessarily of rank 2. A key step in proving the main theorem, both in the case $S = \emptyset$ and the case $S \neq \emptyset$, requires relating the modules $L_S$ for various $S$.
Suppose that $S \subset S'$ is an inclusion of finite sets of odd primes; then for each $p$ not in $S$ we define an element $\mu_p$ of $R_S^{\mathrm{mod}}$ by the following formulas:
(1) if $e_p = 2$, then $\mu_p = (p - 1)^2 (T_p^2 - S_p (1 + p)^2)$;
(2) if $e_p = 1$ and $\bar\rho|_{I_p}$ is not semisimple, then $\mu_p = (p - 1)^2 (p + 1)$;
(3) if $e_p = 1$ and $\bar\rho|_{I_p}$ is semisimple, then $\mu_p = (p - 1)^2$;
(4) if $e_p = 0$ and $p \in T(\bar\rho)$, then $\mu_p = p^2 - 1$; and
(5) if $e_p = 0$ and $p \notin T(\bar\rho)$, then $\mu_p = p - 1$.
Now we define an element $\mu_{S,S'}$ of $R_S^{\mathrm{mod}}$ by setting $\mu_{S,S'} = \prod_{p \in S' - S} \mu_p$, except in the special case $S = \emptyset \neq S'$ when we define instead $\mu_{S,S'} = (1/2) \prod_{p \in S' - S} \mu_p$.

Note that if $p$ is a prime at which $\bar\rho$ is unramified and $\bar\rho(\mathrm{Frob}_p)$ has distinct eigenvalues, then $T_p^2 - S_p (1 + p)^2$ is a unit in $R_\emptyset^{\mathrm{mod}}$, so $\mu_p$ is a unit times $(p - 1)^2$. In particular, if $S = Q$ is of the form considered toward the end of Section 2, then the element $\mu_{\emptyset,Q}$ of $R_\emptyset^{\mathrm{mod}}$ is equal to a unit times the order of the group $G_Q$.

When $S \subset S'$, we have a natural surjection $R_{S'}^{\mathrm{univ}} \to R_S^{\mathrm{univ}}$ arising from the universal property of $(R_{S'}^{\mathrm{univ}}, \rho_{S'}^{\mathrm{univ}})$, and we also have a surjection $R_{S'}^{\mathrm{mod}} \to R_S^{\mathrm{mod}}$ induced by the projection $\prod_{f \in N_{S'}} \bar K \to \prod_{f \in N_S} \bar K$; these maps are compatible with the maps
$\phi_{S'} : R_{S'}^{\mathrm{univ}} \to R_{S'}^{\mathrm{mod}}$ and $\phi_S : R_S^{\mathrm{univ}} \to R_S^{\mathrm{mod}}$. We have the following analogue of [CDT, Prop. 5.5.1], which is used to establish the main theorem both in the case $S = \emptyset$ and in the case $S \neq \emptyset$.
PROPOSITION 30
There is a perfect pairing
\[ L_S \otimes_{\mathcal{O}} L_S \to \mathcal{O} \]
of $\mathcal{O}$-modules which induces an isomorphism $L_S \to \mathrm{Hom}_{\mathcal{O}}(L_S, \mathcal{O})$ of $R_S^{\mathrm{mod}}$-modules. For each inclusion $S \subset S'$ of finite sets of odd primes, with $r$ not in $S'$, there is a map $i_{S,S'} : L_S \to L_{S'}$ of $R_{S'}^{\mathrm{mod}}$-modules such that
(1) if $S \subset S' \subset S''$, then $i_{S,S''} = i_{S',S''} \circ i_{S,S'}$;
(2) $i_{S,S'}$ is injective with torsion-free cokernel; and
(3) if $j_{S',S} : L_{S'} \to L_S$ is the adjoint of $i_{S,S'}$ with respect to the pairings of Proposition 30, then the composite $j_{S',S} \circ i_{S,S'} : L_S \to L_S$ is given by multiplication by an element of $R_S^{\mathrm{mod}}$; this element is a unit times the element $\mu_{S,S'}$ defined above.

Proof These results are proved in exactly the same way as in [CDT, Sec. 6.3], using Propositions 15 and 16 of Section 4 in place of [CDT, Lems. 6.1.2 and 6.3.1]. Note that each of the elements $\mu_p$ defined above is equal to $(p - 1)$ times the corresponding factor defined in [CDT]. This difference is accounted for by the fact that we impose no global restriction on the determinant of a deformation of $\bar\rho$, with the result that the maps of modular curves used to define the inclusion $L_S \to L_{S'}$ may have larger degree than the corresponding maps in [CDT]. The larger degree just results in an extra constant factor in the computation of the composite $j_{S',S} \circ i_{S,S'}$, and it does not affect the calculations otherwise. Similarly, in the special case when $S = \emptyset \neq S'$, an extra factor of $1/2$ appears in the computations; this is a result of the fact that the index of $V_{S'}$ in $V_\emptyset$ is twice the degree of the map $Y_{S'} \to Y_\emptyset$ because $V_\emptyset$ contains the element $\begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$ while $V_{S'}$ does not.

9. Proof of the main theorem when $S = \emptyset$

In this section we assemble the ingredients needed to prove the main theorem when $S = \emptyset$. The backbone of the argument is the following commutative algebra lemma, due to Diamond [Dia3], which simplifies the original commutative algebra arguments described in [Wi2] and [TW].
We restate it here in a less general form that is more immediately applicable to our situation.

LEMMA 31
Let $R$ be an object of $\mathcal{C}_{\mathcal{O}}$, and let $H$ be a nonzero $R$-module that is free of finite
rank as an $\mathcal{O}$-module. Suppose that for some fixed integer $r \ge 0$ and for each of an unbounded set of $n \ge 0$ we have the following data:
(1) an object $R_n$ of $\mathcal{C}_{\mathcal{O}}$ which can be generated as a topological $\mathcal{O}$-algebra by $r$ elements and a surjection $R_n \to R$;
(2) a finite abelian 2-group $G_n$ that is the product of $r$ cyclic groups, each of order at least $2^n$, and a map $\mathcal{O}[G_n] \to R_n$ of objects of $\mathcal{C}_{\mathcal{O}}$ such that the kernel of the composite $\mathcal{O}[G_n] \to R_n \to R$ contains the augmentation ideal of $\mathcal{O}[G_n]$; and
(3) a module $H_n$ over $R_n$ which is free over $\mathcal{O}[G_n]$ and a map $H_n \to H$ of $R_n$-modules inducing an isomorphism $(H_n)_{G_n} \cong H$.
Then $R$ is a complete intersection ring and $H$ is a free $R$-module.

Proof See Theorem 2.1 and [Dia3, proof of Th. 3.1].

Recall that $L_\emptyset$ was defined as a module over $R_\emptyset^{\mathrm{mod}}$ and that it becomes a module over $R_\emptyset^{\mathrm{univ}}$ via the surjection $\phi_\emptyset : R_\emptyset^{\mathrm{univ}} \to R_\emptyset^{\mathrm{mod}}$. We apply this lemma with $R = R_\emptyset^{\mathrm{univ}}$ and $H = L_\emptyset$ to deduce that $L_\emptyset$ is free as an $R_\emptyset^{\mathrm{univ}}$-module and that $R_\emptyset^{\mathrm{univ}}$ is a complete intersection. It follows trivially that $\phi_\emptyset$ is an isomorphism: the kernel of $\phi_\emptyset$ annihilates the nonzero free module $L_\emptyset$ and so must vanish. We prove in Section 11 the following theorem.

THEOREM 32
There is an integer $s$ such that for every integer $n \ge 3$ we can find a set $Q_n$ consisting of $s$ primes, with the property that for each $p$ in $Q_n$ the representation $\bar\rho$ is unramified at $p$, $\bar\rho(\mathrm{Frob}_p)$ has distinct $k$-rational eigenvalues, and $p$ is congruent to 1 modulo $2^{n+1}$, and such that the universal deformation ring $R_{Q_n}^{\mathrm{univ}}$ can be generated by $2s$ elements as a topological $\mathcal{O}$-algebra.
The ring $R_{Q_n}^{\mathrm{univ}}$ for $n \ge 3$ plays the role of $R_n$, and $L_{Q_n}$ takes the part of $H_n$. Let $G_n$ be the group $G_{Q_n}$, and let $\mathcal{O}[G_n] \to R_n$ be the natural map defined toward the end of Section 2.2. Let $r = 2s$. It remains only to find a map $L_{Q_n} \to L_\emptyset$ of $R_{Q_n}^{\mathrm{univ}}$-modules which induces an isomorphism $(L_{Q_n})_{G_{Q_n}} \to L_\emptyset$. By Proposition 30 we have dualities $L_\emptyset \to \mathrm{Hom}_{\mathcal{O}}(L_\emptyset, \mathcal{O})$ and $L_{Q_n} \to \mathrm{Hom}_{\mathcal{O}}(L_{Q_n}, \mathcal{O})$ and an injection $i_n = i_{\emptyset,Q_n} : L_\emptyset \to L_{Q_n}$
with torsion-free cokernel; furthermore, these maps are all maps of $R_{Q_n}^{\mathrm{mod}}$-modules and hence also maps of $R_{Q_n}^{\mathrm{univ}}$-modules, and if $j_n = j_{Q_n,\emptyset}$ is the adjoint of $i_n$ with respect to these pairings, then the composite $j_n i_n : L_\emptyset \to L_\emptyset$ is given by multiplication by a unit times the order of $G_{Q_n}$. Recall also that from Proposition 17 the cardinality of $N_{Q_n}$ is equal to the cardinality of $G_{Q_n}$ times that of $N_\emptyset$, and hence the $\mathcal{O}$-rank of $L_{Q_n}$ is equal to the cardinality of $G_{Q_n}$ times the $\mathcal{O}$-rank of $L_\emptyset$. One can also check that the $\mathcal{O}$-rank of $L_{Q_n}^{G_{Q_n}}$ is equal to $4\#N_\emptyset$; this follows from the observation that a newform $f$ in $N_{Q_n}$ is in $N_\emptyset$ if and only if the image of $G_{Q_n}$ in $R_f^\times$ is trivial. Now it follows, using the same pure commutative algebra argument as in [CDT, last paragraph of Sec. 6.4], that $L_{Q_n}$ is a free $\mathcal{O}[G_{Q_n}]$-module and that the map $j_n$ of $R_{Q_n}^{\mathrm{univ}}$-modules induces an isomorphism, as required. This completes the proof of the main theorem in the minimal case.

10. Proof of the main theorem when $S \neq \emptyset$

We prove the main theorem when $S \neq \emptyset$ using a freeness criterion of Diamond [Dia3, Th. 2.4], based on H. Lenstra's generalisation of the numerical isomorphism criterion of Wiles. We reproduce Diamond's result here for convenience.

THEOREM 33 (Diamond)
Let $R$ be an object of $\mathcal{C}_{\mathcal{O}}$ equipped with an $\mathcal{O}$-algebra homomorphism $R \to \mathcal{O}$, and let $\mathfrak{p}$ denote the kernel of this homomorphism. Suppose that $H$ is an $R$-module, finitely generated and free over $\mathcal{O}$, and suppose that $\mathfrak{p}$ is in the support of $H$. Let $T = R/\mathrm{Ann}_R H$, write $\mathfrak{p}_T$ for the image of $\mathfrak{p}$ in $T$, and write $J_T$ for $\mathrm{Ann}_T \mathfrak{p}_T$. Let $\Omega$ denote $H/(H[\mathfrak{p}_T] + H[J_T])$. Let $d$ denote the rank of $H[\mathfrak{p}] = H[\mathfrak{p}_T]$ over $\mathcal{O}$. If $\Omega$ has finite length over $\mathcal{O}$, then the following are equivalent:
(1) $\mathrm{rank}_{\mathcal{O}} H \le d\,\mathrm{rank}_{\mathcal{O}} T$ and $\mathrm{length}_{\mathcal{O}}\,\Omega \ge d\,\mathrm{length}_{\mathcal{O}}(\mathfrak{p}/\mathfrak{p}^2)$,
(2) $\mathrm{rank}_{\mathcal{O}} H = d\,\mathrm{rank}_{\mathcal{O}} T$ and $\Omega \cong (\mathcal{O}/\mathrm{Fitt}_{\mathcal{O}}(\mathfrak{p}/\mathfrak{p}^2))^d$, and
(3) $R$ is a complete intersection ring and $H$ is free of rank $d$ over $R$.
Proof See [Dia3, Th. 2.4].

Let $S$ be a set of odd primes, and let $f$ be an element of the (nonempty) set $N_\emptyset$. Assume that $K$ contains the eigenvalues of $f$. We apply the above criterion with $R = R_S^{\mathrm{univ}}$, $H = L_S$, and with the map $\theta_S : R_S^{\mathrm{univ}} \to \mathcal{O}$ corresponding to $f$; then, since $R_S^{\mathrm{univ}} \to R_S^{\mathrm{mod}}$ is surjective and $L_S$ is a faithful $R_S^{\mathrm{mod}}$-module, $T$ is isomorphic to $R_S^{\mathrm{mod}}$, $d$ is equal to 4, and we know from Propositions 24 and 29 that $\mathrm{rank}_{\mathcal{O}} H = 4\,\mathrm{rank}_{\mathcal{O}} T$. Write $\mathfrak{p}_S$ for the kernel of $\theta_S$, and write $J_S$ for the ideal $J_T$ of the theorem; then, in order to apply the theorem, all we have to show is that the
$\mathcal{O}$-length of $\Omega$ is at least 4 times the $\mathcal{O}$-length of $\mathfrak{p}_S/\mathfrak{p}_S^2$.
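LEMMA 34
The module $C_S = L_S/(L_S[\mathfrak{p}_S] + L_S[J_S])$ fits into a short exact sequence
\[ 0 \to L_S[\mathfrak{p}_S] \to \mathrm{Hom}_{\mathcal{O}}(L_S[\mathfrak{p}_S], \mathcal{O}) \to C_S \to 0 \]
of $R_S^{\mathrm{mod}}$-modules.

Proof Since $R_S^{\mathrm{mod}}$ is reduced, $\mathfrak{p}_S \cap J_S = 0$ and $\mathfrak{p}_S + J_S$ has finite index in $R_S^{\mathrm{mod}}$. It follows that the submodules $L_S[\mathfrak{p}_S]$ and $L_S[J_S]$ have trivial intersection and that their sum is a finite-index submodule of $L_S$. There is a natural map of $R_S^{\mathrm{mod}}$-modules
\[ L_S \cong \mathrm{Hom}_{\mathcal{O}}(L_S, \mathcal{O}) \to \mathrm{Hom}_{\mathcal{O}}(L_S[\mathfrak{p}_S], \mathcal{O}) \]
which is surjective because the inclusion $L_S[\mathfrak{p}_S] \to L_S$ has torsion-free cokernel. The kernel of $L_S \to \mathrm{Hom}_{\mathcal{O}}(L_S[\mathfrak{p}_S], \mathcal{O})$ is easily seen to be contained in $L_S[J_S]$, and hence it is equal to $L_S[J_S]$ since its $\mathcal{O}$-rank is equal to that of $L_S[J_S]$. Now $L_S/L_S[J_S] \cong \mathrm{Hom}_{\mathcal{O}}(L_S[\mathfrak{p}_S], \mathcal{O})$, and so $L_S/(L_S[J_S] + L_S[\mathfrak{p}_S])$ is isomorphic to the cokernel of the map $L_S[\mathfrak{p}_S] \to \mathrm{Hom}_{\mathcal{O}}(L_S[\mathfrak{p}_S], \mathcal{O})$. So this cokernel has finite length, and it follows that the map $L_S[\mathfrak{p}_S] \to \mathrm{Hom}_{\mathcal{O}}(L_S[\mathfrak{p}_S], \mathcal{O})$ is injective since it is a map of $\mathcal{O}$-modules of equal rank.

Let $i = i_{\emptyset,S} : L_\emptyset \to L_S$ be the inclusion map described in Proposition 30, and let $j = j_{S,\emptyset}$ be its adjoint. We have a diagram of $R_S^{\mathrm{mod}}$-modules
\[
\begin{array}{ccccccccc}
0 & \to & L_\emptyset[\mathfrak{p}_\emptyset] & \to & \mathrm{Hom}_{\mathcal{O}}(L_\emptyset[\mathfrak{p}_\emptyset], \mathcal{O}) & \to & C_\emptyset & \to & 0 \\
 & & {\scriptstyle i}\downarrow\uparrow{\scriptstyle j} & & {\scriptstyle j^*}\downarrow\uparrow{\scriptstyle i^*} & & \downarrow & & \\
0 & \to & L_S[\mathfrak{p}_S] & \to & \mathrm{Hom}_{\mathcal{O}}(L_S[\mathfrak{p}_S], \mathcal{O}) & \to & C_S & \to & 0 \\
 & & & & \downarrow & & \downarrow & & \\
 & & & & C_1 & \to & C_2 & & \\
 & & & & \downarrow & & \downarrow & & \\
 & & & & 0 & & 0 & &
\end{array}
\]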
in which all rows and columns are exact and every module is killed by $\mathfrak{p}_S$, so that $R_S^{\mathrm{mod}}$ acts through $\theta_S$. (Note that $\mathfrak{p}_S$ maps onto $\mathfrak{p}_\emptyset$, so that $L_\emptyset[\mathfrak{p}_\emptyset] = L_\emptyset[\mathfrak{p}_S]$.) One can check that the map denoted $i$ is injective with torsion-free cokernel and that $i^*$ is surjective;
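it follows that both maps are isomorphisms since in each case the $\mathcal{O}$-rank of the domain is equal to that of the codomain. The composites $ji$ and $i^* j^*$ are both given by multiplication by the nonzero element $\theta_\emptyset(\mu_{\emptyset,S})$ of $\mathcal{O}$, and so $j^*$ is injective and $j^* i^*$ is also equal to multiplication by $\theta_\emptyset(\mu_{\emptyset,S})$. Thus $C_1$ can be identified with the cokernel of multiplication by $\theta_\emptyset(\mu_{\emptyset,S})$ on the free rank-4 $\mathcal{O}$-module $\mathrm{Hom}_{\mathcal{O}}(L_S[\mathfrak{p}_S], \mathcal{O})$. It follows from a diagram chase that the map $C_1 \to C_2$ of cokernels is an isomorphism and that the map $C_\emptyset \to C_S$ is injective. So
\[ \mathrm{length}_{\mathcal{O}}\, C_S = \mathrm{length}_{\mathcal{O}}\, C_\emptyset + 4\,\mathrm{length}_{\mathcal{O}}\, \mathcal{O}/\theta_\emptyset(\mu_{\emptyset,S})\mathcal{O}. \]
From Theorem 33, since we know already from the proof of the minimal case of the main theorem that $L_\emptyset$ is free of rank 4 over the complete intersection ring $R_\emptyset^{\mathrm{mod}}$, we have $\mathrm{length}_{\mathcal{O}}\, C_\emptyset = 4\,\mathrm{length}_{\mathcal{O}}\, \mathfrak{p}_\emptyset/\mathfrak{p}_\emptyset^2$. In Section 11 we prove the following result.

THEOREM 35
Suppose that $f$ is a newform in $N_\emptyset$, all of whose Hecke eigenvalues lie in $K$, with corresponding representation $\rho_f : G_{\mathbb{Q}} \to \mathrm{GL}_2(\mathcal{O})$. Let $S \subset S'$ be two finite sets of odd primes, and let $\mathfrak{p}_S$ and $\mathfrak{p}_{S'}$ be the kernels of the maps $\theta_S : R_S^{\mathrm{univ}} \to \mathcal{O}$ and $\theta_{S'} : R_{S'}^{\mathrm{univ}} \to \mathcal{O}$ corresponding to $\rho_f$. Then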
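\[ \mathrm{length}_{\mathcal{O}}\, \mathfrak{p}_{S'}/\mathfrak{p}_{S'}^2 \le \mathrm{length}_{\mathcal{O}}\, \mathfrak{p}_S/\mathfrak{p}_S^2 + \mathrm{length}_{\mathcal{O}}\, \mathcal{O}/\theta_S(\mu_{S,S'})\mathcal{O}, \]
where $\mu_{S,S'}$ is the element of $R_S^{\mathrm{mod}}$ defined in Section 8.

Applying this inequality to the inclusion $\emptyset \subset S$ gives $\mathrm{length}_{\mathcal{O}}\, \mathcal{O}/\theta_\emptyset(\mu_{\emptyset,S})\mathcal{O} \ge \mathrm{length}_{\mathcal{O}}\, \mathfrak{p}_S/\mathfrak{p}_S^2 - \mathrm{length}_{\mathcal{O}}\, \mathfrak{p}_\emptyset/\mathfrak{p}_\emptyset^2$. So we now have
\[
\begin{aligned}
\mathrm{length}_{\mathcal{O}}\, C_S &= 4\,\mathrm{length}_{\mathcal{O}}\, \mathfrak{p}_\emptyset/\mathfrak{p}_\emptyset^2 + 4\,\mathrm{length}_{\mathcal{O}}\, \mathcal{O}/\theta_\emptyset(\mu_{\emptyset,S})\mathcal{O} \\
&\ge 4\,\mathrm{length}_{\mathcal{O}}\, \mathfrak{p}_\emptyset/\mathfrak{p}_\emptyset^2 + 4\bigl(\mathrm{length}_{\mathcal{O}}\, \mathfrak{p}_S/\mathfrak{p}_S^2 - \mathrm{length}_{\mathcal{O}}\, \mathfrak{p}_\emptyset/\mathfrak{p}_\emptyset^2\bigr) \\
&= 4\,\mathrm{length}_{\mathcal{O}}\, \mathfrak{p}_S/\mathfrak{p}_S^2,
\end{aligned}
\]
and we deduce again from Theorem 33 that $R_S^{\mathrm{univ}}$ is a complete intersection ring and that $L_S$ is free of rank 4 over $R_S^{\mathrm{univ}}$; hence $R_S^{\mathrm{univ}} \to R_S^{\mathrm{mod}}$ is an isomorphism, as required. This completes the proof of the main theorem in the case $S \neq \emptyset$.

11. Galois cohomology and dimension calculations

In this section we give proofs of Theorems 32 and 35, which are used, respectively, to prove the minimal case and the nonminimal case of the main theorem.

11.1. Selmer groups

Here we give an interpretation of the tangent space of the universal deformation ring $R_S^{\mathrm{univ}}$ as a Selmer group contained in $H^1(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$.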
Let $M$ be any finite discrete abelian group with a continuous action of $G_{\mathbb{Q}}$, and let $M^* = \mathrm{Hom}(M, \mu(\bar{\mathbb{Q}}))$ denote its Cartier dual.

Definition
A family of local conditions for $M$ is a map that assigns to each place $v$ of $\mathbb{Q}$ a subgroup $L_v$ of $H^1(G_v, M)$ in such a way that for all but finitely many places $v$ the subgroup $L_v$ is equal to $H^1(G_v/I_v, M^{I_v})$.

If $\mathcal{L}$ is a family of local conditions, then so is the dual $\mathcal{L}^* = \{L_v^\perp\}_v$, where $L_v^\perp$ is the annihilator of $L_v$ under the perfect pairing
\[ H^1(G_v, M) \otimes H^1(G_v, M^*) \to \mathbb{Q}/\mathbb{Z} \]
given by J. Tate's local duality theorem.

Given a family $\mathcal{L}$ of local conditions for $M$, we define the corresponding Selmer group $H^1_{\mathcal{L}}(G_{\mathbb{Q}}, M)$ or $H^1_{\mathcal{L}}(\mathbb{Q}, M)$ to be the set of all elements of $H^1(G_{\mathbb{Q}}, M)$ whose restriction to $H^1(G_v, M)$ lies in $L_v$ for each place $v$. The following theorem of Wiles relates the size of such a Selmer group to the size of its dual.

THEOREM 36 (Wiles)
Let $\mathcal{L}$ be a family of local conditions for a finite discrete $G_{\mathbb{Q}}$-module $M$. Then the order of the Selmer group $H^1_{\mathcal{L}}(\mathbb{Q}, M)$ is finite and
\[ \frac{\#H^1_{\mathcal{L}}(\mathbb{Q}, M)}{\#H^0(\mathbb{Q}, M)} = \frac{\#H^1_{\mathcal{L}^*}(\mathbb{Q}, M^*)}{\#H^0(\mathbb{Q}, M^*)} \prod_v \frac{\#L_v}{\#H^0(G_v, M)}. \]
To make sense of the infinite product on the right-hand side, note that for all but finitely many places $v$ the cardinalities of $L_v = H^1(G_v/I_v, M^{I_v})$ and $H^0(G_v, M)$ are equal.

Proof For a proof of this theorem, see [DDT, Sec. 2.3].

Now suppose that $S$ is a set of odd primes and that $(R, \rho)$ is an $S$-deformation of $\bar\rho$, and let $M$ be an $R$-module of finite cardinality. Let $R[M]$ denote the object $R \oplus M$ of $\mathcal{C}_{\mathcal{O}}$ in which $M$ is an ideal whose square is zero. Then there is a correspondence (see [DDT, Sec. 2.4]) between deformations of $\rho$ to $R[M]$ (that is, deformations of $\bar\rho$ to $R[M]$ which lift $\rho$) and elements of the Galois cohomology group $H^1(\mathbb{Q}, \mathrm{ad}\,\rho \otimes_R M)$, which sends a cocycle $\xi : G_{\mathbb{Q}} \to M_2(M)$ to the representation $(1 + \xi)\rho : G_{\mathbb{Q}} \to \mathrm{GL}_2(R[M])$. This correspondence restricts to give a correspondence between the set of $S$-deformations of $\rho$ to $R[M]$ and some submodule of $H^1(\mathbb{Q}, \mathrm{ad}\,\rho \otimes_R M)$ which
we denote $H^1_S(\mathbb{Q}, \mathrm{ad}\,\rho \otimes_R M)$. Standard arguments give the following interpretation of $H^1_S(\mathbb{Q}, \mathrm{ad}\,\rho \otimes_R M)$ as a Selmer group.

PROPOSITION 37
The subspace $H^1_S(G_{\mathbb{Q}}, \mathrm{ad}\,\rho \otimes_R M)$ of $H^1(G_{\mathbb{Q}}, \mathrm{ad}\,\rho \otimes_R M)$ consists of all those cohomology classes $x$ such that
(1) for each odd prime $p$ not in $S$, the restriction of $x$ to a decomposition group at $p$ is contained in the submodule $H^1(G_p/I_p, (\mathrm{ad}\,\bar\rho)^{I_p})$ of $H^1(G_p, \mathrm{ad}\,\bar\rho)$, and
(2) the restriction of $x$ to $G_2$ satisfies the local condition at 2.
By Theorem 36 the module $H^1_S(G_{\mathbb{Q}}, \mathrm{ad}\,\rho \otimes_R M)$ has finite cardinality. In particular, in the special case when $(R, \rho) = (k, \bar\rho)$ and $M = k$, this Selmer group is naturally isomorphic to the tangent space $\mathrm{Hom}_{\mathcal{C}_{\mathcal{O}}}(R_S^{\mathrm{univ}}, k[\varepsilon])$ of $R_S^{\mathrm{univ}}$; hence we have the following.

COROLLARY 38
The universal deformation ring $R_S^{\mathrm{univ}}$ is Noetherian, equal to a quotient of the power series ring $\mathcal{O}[[X_1, \ldots, X_r]]$ in $r = \dim_k H^1_S(G_{\mathbb{Q}}, \mathrm{ad}\,\bar\rho)$ indeterminates over $\mathcal{O}$.
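The passage from the tangent space to the number of generators in Corollary 38 rests on a standard description of the tangent space of an object of $\mathcal{C}_{\mathcal{O}}$. The following sketch records it, assuming that $R$ is a complete local Noetherian $\mathcal{O}$-algebra with residue field $k$; the symbols $\mathfrak{m}_R$ (maximal ideal of $R$) and $\lambda$ (a uniformizer of $\mathcal{O}$) are notation introduced here for illustration and are not taken from the text.

```latex
% Assumed notation: \mathfrak{m}_R = maximal ideal of R,
% \lambda = uniformizer of \mathcal{O}; \twoheadrightarrow needs amssymb.
% A morphism R -> k[\varepsilon] in C_O kills \lambda and \mathfrak{m}_R^2,
% and is determined by its k-linear restriction to \mathfrak{m}_R:
\[
  \mathrm{Hom}_{\mathcal{C}_{\mathcal{O}}}(R, k[\varepsilon])
  \;\cong\;
  \mathrm{Hom}_k\bigl(\mathfrak{m}_R/(\mathfrak{m}_R^2 + \lambda R),\, k\bigr).
\]
% Lift a k-basis of \mathfrak{m}_R/(\mathfrak{m}_R^2 + \lambda R) to elements
% x_1, ..., x_r of \mathfrak{m}_R. By completeness and Nakayama's lemma the
% map X_i -> x_i is a surjection
\[
  \mathcal{O}[[X_1, \dots, X_r]] \twoheadrightarrow R,
  \qquad
  r = \dim_k \mathfrak{m}_R/(\mathfrak{m}_R^2 + \lambda R)
    = \dim_k \mathrm{Hom}_{\mathcal{C}_{\mathcal{O}}}(R, k[\varepsilon]).
\]
```

With $R = R_S^{\mathrm{univ}}$, the right-hand dimension is the dimension of the Selmer group appearing in the corollary.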
We also need to analyse further the local condition at 2 in this case. Let $N$ be the 2-dimensional $k[G_{\mathbb{Q}}]$-module corresponding to $\bar\rho$, and let $N_1$ be the 1-dimensional subspace that is fixed by inertia and on which $\mathrm{Frob}_2$ acts by the eigenvalue of $\bar\rho(\mathrm{Frob}_2)$ which is not equal to $\alpha$. Write $\mathrm{ad}^{-1}\bar\rho$ for the submodule $\mathrm{Hom}_k(N/N_1, N_1)$ of $\mathrm{ad}\,\bar\rho$. Then we have the following result.

PROPOSITION 39
Let $\sigma = (1 + \varepsilon\xi)\bar\rho$ be a lifting of $\bar\rho$ to $k[\varepsilon]$. Then $\sigma$ satisfies the local condition at 2 if and only if the following apply:
(1) if $\bar\rho$ is ramified at 2, then $\xi|_{G_2}$ is contained in the kernel of the natural map
\[ H^1(G_2, \mathrm{ad}\,\bar\rho) \to H^1(I_2, \mathrm{ad}\,\bar\rho/\mathrm{ad}^{-1}\bar\rho); \]
(2) if $\bar\rho$ is unramified at 2, then $\xi|_{G_2}$ is contained in the submodule
\[ H^1(G_2/I_2, (\mathrm{ad}\,\bar\rho)^{G_2}) \oplus H^1(G_2, \mathrm{ad}^{-1}\bar\rho) \]
of $H^1(G_2, \mathrm{ad}\,\bar\rho)$.
Proof First, suppose that $\bar\rho$ is ramified at 2. Without loss of generality we may suppose that
\[ \bar\rho|_{G_2} = \begin{pmatrix} \chi_1 & * \\ 0 & \chi_2 \end{pmatrix}. \]
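Then $\mathrm{ad}^{-1}\bar\rho$ is the set of matrices of the form $\begin{pmatrix} 0 & * \\ 0 & 0 \end{pmatrix}$. The condition imposed on $\sigma$ at 2 was that it should be conjugate to a representation
\[ \begin{pmatrix} \psi_1 & * \\ 0 & \psi_2 \end{pmatrix} \]
for some unramified characters $\psi_1$ and $\psi_2$. If $\sigma$ has this form, then $\xi|_{G_2}$ is clearly in the kernel of the natural map $H^1(G_2, \mathrm{ad}\,\bar\rho) \to H^1(I_2, \mathrm{ad}\,\bar\rho/\mathrm{ad}^{-1}\bar\rho)$. Conversely, suppose that $\xi|_{G_2}$ is in this kernel; then lifting coboundaries, we may replace $\xi$ by an equivalent cocycle and assume that $\xi|_{I_2}$ takes values of the form $\begin{pmatrix} 0 & * \\ 0 & 0 \end{pmatrix}$, so that $\sigma(g)$ is upper triangular for elements $g$ of the inertia group $I_2$. Now, since we have assumed that $\bar\rho$ is ramified at 2, there is an element $\begin{pmatrix} \alpha & \beta \\ 0 & 1 \end{pmatrix} = \bar\rho(g)$ for some $g$ in the inertia group $I_2$, with either $\alpha - 1$ or $\beta$ nonzero. For any element $h$ of the decomposition group $G_2$, we have the cocycle identity $\xi(g) + g\xi(h) = \xi(h) + h\xi(h^{-1}gh)$ in which both sides are just $\xi(gh) = \xi(h \cdot h^{-1}gh)$. But $\xi(h^{-1}gh)$ and $\xi(g)$ both have the form $\begin{pmatrix} 0 & * \\ 0 & 0 \end{pmatrix}$, and conjugation by $\bar\rho(h)$ preserves this; hence the matrix $\bar\rho(g)\xi(h)\bar\rho(g^{-1}) - \xi(h)$ also has this form, that is,
\[ \begin{pmatrix} \alpha & \beta \\ 0 & 1 \end{pmatrix} \xi(h) \begin{pmatrix} \alpha^{-1} & -\beta/\alpha \\ 0 & 1 \end{pmatrix} - \xi(h) = \begin{pmatrix} 0 & \gamma \\ 0 & 0 \end{pmatrix} \]
for some $\gamma$. It follows that $\sigma(h)$ is upper triangular for every $h$ in $G_2$, and so that $\sigma|_{G_2}$ has the required form.

Now, suppose that $\bar\rho$ is unramified at 2, so that, without loss of generality,
\[ \bar\rho(\mathrm{Frob}_2) = \begin{pmatrix} \beta & 0 \\ 0 & \alpha \end{pmatrix} \]
with $\alpha$ and $\beta$ distinct. Suppose that $\sigma = (1 + \varepsilon\xi)\bar\rho$ is a lifting that satisfies the local condition at 2. Then $\xi|_{G_2}$ is contained in the submodule $H^1(G_2, \mathrm{ad}^{-1}\bar\rho) \oplus H^1(G_2/I_2, (\mathrm{ad}\,\bar\rho)^{G_2})$ of $H^1(G_2, \mathrm{ad}\,\bar\rho)$. Conversely, if $\xi|_{G_2}$ is contained in this submodule, then, altering $\xi$ by a coboundary if necessary, it has the form $\begin{pmatrix} \pi_1 & * \\ 0 & \pi_2 \end{pmatrix}$, where $\pi_1$ and $\pi_2$ are unramified additive characters, and hence $(1 + \varepsilon\xi)\bar\rho$ has the required form.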
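The matrix identity used in the ramified case of the proof above can be expanded entry by entry. In the sketch below, the entries $x, y, z, w$ of $\xi(h)$ are names chosen here for illustration; the computation is valid over any field in which $\alpha$ is invertible.

```latex
% Write \xi(h) = \begin{pmatrix} x & y \\ z & w \end{pmatrix}
% (entry names are illustrative). Direct multiplication gives
\[
\begin{pmatrix} \alpha & \beta \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} x & y \\ z & w \end{pmatrix}
\begin{pmatrix} \alpha^{-1} & -\beta/\alpha \\ 0 & 1 \end{pmatrix}
-
\begin{pmatrix} x & y \\ z & w \end{pmatrix}
=
\begin{pmatrix}
\beta z/\alpha & (\alpha - 1)y - \beta x + \beta w - \beta^{2} z/\alpha \\
(\alpha^{-1} - 1) z & -\beta z/\alpha
\end{pmatrix}.
\]
% For the result to have the form \begin{pmatrix} 0 & \gamma \\ 0 & 0 \end{pmatrix}
% we need (\alpha^{-1} - 1) z = 0 and \beta z/\alpha = 0.
% Since \alpha \neq 1 or \beta \neq 0, this forces z = 0,
% i.e. \xi(h) is upper triangular.
```

This is the step that yields upper triangularity for every $h$ in $G_2$, not only for $h$ in $I_2$.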
Now, define subgroups $L_v \subset H^1(G_v, \mathrm{ad}\,\bar\rho)$ for each place $v$ of $\mathbb{Q}$ as follows: for odd $p$ not in $S$, let $L_p = H^1(G_p/I_p, (\mathrm{ad}\,\bar\rho)^{I_p})$; for $p$ in $S$, let $L_p = H^1(G_p, \mathrm{ad}\,\bar\rho)$; let $L_\infty = H^1(G_\infty, \mathrm{ad}\,\bar\rho)$; and finally, let $L_2$ be an appropriate submodule of $H^1(G_2, \mathrm{ad}\,\bar\rho)$ as described in the statement of Proposition 39. Then Propositions 37 and 39 identify $H^1_S(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ with the Selmer group $H^1_{\mathcal{L}}(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ defined by the family $\mathcal{L} = \{L_v\}_v$ of local conditions for $\mathrm{ad}\,\bar\rho$.

11.2. Proof of Theorem 32

In this section we show that we can find a special series of sets of primes $S$ such that the dimension of the tangent space of the corresponding universal deformation ring $R_S^{\mathrm{univ}}$ (and hence the number of elements required to generate $R_S^{\mathrm{univ}}$ as a topological $\mathcal{O}$-algebra) is equal to $2\#S$. We continue to assume that $\bar\rho$ satisfies the hypotheses of Section 1, although the modularity assumption is not used anywhere in this section.

As explained in Section 2, the number of elements required to generate $R_S^{\mathrm{univ}}$ can be expressed as the dimension of the space of $S$-deformations of $\bar\rho$ to $k[\varepsilon]$, and this in turn can be identified with the Selmer group $H^1_S(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ defined in Section 11. We use Wiles's result (Theorem 36) along with Propositions 37 and 39 to relate the dimension of $H^1_S(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ to the dimension of the dual Selmer group $H^1_{S^*}(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$, and we then compute the latter dimension for certain carefully chosen sets $S$.

For each integer $n \ge 0$, let $E_n$ be the cyclotomic extension $\mathbb{Q}(\zeta_{2^n})$ obtained by adjoining a primitive $2^n$th root of unity to $\mathbb{Q}$, and let $F_n$ be the extension of $E_n$ cut out by the representation $\mathrm{ad}\,\bar\rho$, so that $G_{F_n}$ is the kernel of the map $\mathrm{proj}\,\bar\rho : G_{E_n} \to \mathrm{PGL}_2(k)$. The Galois group $\mathrm{Gal}(F_n/E_n)$ is thus isomorphic to the projective image in $\mathrm{PGL}_2(k)$ of the representation $\bar\rho$ restricted to $G_{E_n}$. The subgroups of $\mathrm{PGL}_2(k)$ were classified by L. Dickson in [Dic]; using his classification and our assumptions on $\bar\rho$, we find that the projective image of $\bar\rho(G_{\mathbb{Q}})$ is either dihedral, of order dividing $2(\#k - 1)$, or it is simple, conjugate to $\mathrm{PGL}_2(\kappa)$ for some subfield $\kappa \neq \mathbb{F}_2$ of $k$.

The main result of this section is the following lemma, which allows us to turn the contribution of $H^1_{\emptyset^*}(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ to the dimension calculation into something more manageable.

LEMMA 40
Let $\psi$ be an element of $H^1_{\emptyset^*}(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$. If the restriction of $\psi$ to the group $H^1(E_n, \mathrm{ad}\,\bar\rho)$ is nontrivial, then there are infinitely many primes $q$ such that
(1) $q$ is congruent to 1 modulo $2^n$,
(2) $\bar\rho$ is unramified at $q$ and $\bar\rho(\mathrm{Frob}_q)$ has distinct $k$-rational eigenvalues, and
(3) $\psi$ maps to a nontrivial element of $H^1(\mathbb{F}_q, \mathrm{ad}\,\bar\rho) \subset H^1(\mathbb{Q}_q, \mathrm{ad}\,\bar\rho)$.
Conversely, if $\psi$ is an element of $H^1(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ which restricts to the zero element of $H^1(E_n, \mathrm{ad}\,\bar\rho)$ and whose restriction to a decomposition group at 2 is contained in the
dual local condition $L_2^\perp$, then $\psi$ is in the Selmer group $H^1_{\emptyset^*}(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ and there are no primes $q$ having the above properties.
Here $L_2$ is the subgroup of $H^1(G_2, \mathrm{ad}\,\bar\rho)$ defined following the proof of Proposition 39. Before proving this lemma, we establish two preliminary results.

LEMMA 41
The projective image of $\bar\rho$ is the same as that of $\bar\rho$ restricted to $G_{E_n}$.
Proof The projective image of $\bar\rho$ is isomorphic to $\mathrm{Gal}(F_0/\mathbb{Q})$, while the projective image of $\bar\rho$ restricted to $G_{E_n}$ is isomorphic to $\mathrm{Gal}(F_n/E_n)$. The statement amounts to saying that the natural inclusion $\mathrm{Gal}(F_n/E_n) \to \mathrm{Gal}(F_0/\mathbb{Q})$ is a bijection, which is equivalent to the statement that the extensions $E_n$ and $F_0$ of $\mathbb{Q}$ are linearly disjoint. The Galois group $\mathrm{Gal}(E_n \cap F_0/\mathbb{Q})$ is a simultaneous quotient of $\mathrm{Gal}(F_0/\mathbb{Q})$ and of $\mathrm{Gal}(E_n/\mathbb{Q}) \cong (\mathbb{Z}/2^n\mathbb{Z})^\times$. If the projective image of $\bar\rho$ is $\mathrm{PGL}_2(\kappa)$ for some subfield $\kappa$ of $k$ of cardinality at least 4, then it is simple and so has no nontrivial abelian quotients. Thus $\mathrm{Gal}(E_n \cap F_0/\mathbb{Q})$ is trivial, and $E_n$ and $F_0$ are linearly disjoint. If the projective image is isomorphic to the dihedral group $D_{2h}$, then $h$ is odd and the only nontrivial abelian quotient of order a power of 2 is the quotient by the cyclic subgroup $C_h$ of index 2. So $\mathrm{Gal}(F_n/E_n)$ is isomorphic either to $D_{2h}$, in which case $E_n$ and $F_0$ are again linearly disjoint, or to $C_h$. In the latter case, the projective image of $\bar\rho$ becomes cyclic on restricting to the absolute Galois group of one of the three fields $\mathbb{Q}(\sqrt{2})$, $\mathbb{Q}(\sqrt{-2})$, or $\mathbb{Q}(\sqrt{-1})$, and it follows that $\bar\rho$ must be induced from a character of one of these Galois groups. One can check that then the semisimplification of the restriction of $\bar\rho$ to $G_2$ is a scalar representation, and this violates our assumption on the restriction of $\bar\rho$ to $G_2$. So the projective image of $\bar\rho$ restricted to $G_{E_n}$ cannot be isomorphic to $C_h$.

LEMMA 42
The cohomology group $H^1(F_n/E_n, \mathrm{ad}\,\bar\rho)$ is trivial.
Proof By Lemma 41, $\mathrm{Gal}(F_n/E_n)$ is isomorphic to $\mathrm{Gal}(F_0/\mathbb{Q})$ with compatible actions on $\mathrm{ad}\,\bar\rho$, so it is enough to prove this result when $n = 0$. We treat first the case when $\mathrm{Gal}(F_0/\mathbb{Q})$ is isomorphic to $\mathrm{PGL}_2(\kappa)$. In this case, the cohomology group above is $H^1(\mathrm{PGL}_2(\kappa), M_2(k))$, where $M_2(k)$ is the set of two-by-two matrices with entries in $k$ and $\mathrm{PGL}_2(\kappa)$ acts on $M_2(k)$ by
conjugation. By decomposing $M_2(k)$ using a $\kappa$-basis of $k$, one sees that it is enough to show that $H^1(\mathrm{PGL}_2(\kappa), M_2(\kappa)) = 0$. In fact, this result is true for any finite field $\kappa$ with the exceptions of $\mathbb{F}_3$ and $\mathbb{F}_5$. See [DDT] for the proof when the order of $\kappa$ is odd. Here we give the proof where the cardinality of $\kappa$ is divisible by 4; when $\kappa = \mathbb{F}_2$, the group $\mathrm{PGL}_2(\kappa)$ is dihedral; this case is treated along with the other dihedral cases below.

Write $G$ for the group $\mathrm{PGL}_2(\kappa) \cong \mathrm{SL}_2(\kappa)$, $M$ for the $G$-module $M_2(\kappa)$, and $M^0$ for the set of elements of $M$ of trace zero. There is a filtration $0 \subset \kappa \subset M^0 \subset M$, in which the quotient $M^0/\kappa$ is isomorphic to $\kappa^2$ with an element $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ of $\mathrm{SL}_2(\kappa)$ acting on $\kappa^2$ as left multiplication by the matrix $\begin{pmatrix} a^2 & b^2 \\ c^2 & d^2 \end{pmatrix}$. The other two quotients of this filtration have trivial $G$-action. The cohomology group $H^1(G, \kappa) = \mathrm{Hom}(G, \kappa)$ is trivial because $G$ is simple, so it follows from the long exact sequence associated to the short exact sequence
\[ 0 \to \kappa \to M \to M/\kappa \to 0 \]
that $H^1(G, M)$ injects into $H^1(G, M/\kappa)$, and it suffices to prove that the latter group vanishes. One can check easily that $H^0(G, M/\kappa) = 0$; for instance, there are no nontrivial elements of $M/\kappa$ which are fixed under conjugation by both the matrix $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ and the matrix $\begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}$ with $a \neq b$. So, from the short exact sequence
\[ 0 \to M^0/\kappa \to M/\kappa \to \kappa \to 0 \]
of $G$-modules, we get an exact sequence
\[ 0 \to \kappa \to H^1(G, M^0/\kappa) \to H^1(G, M/\kappa) \to 0, \]
and it suffices to show that $H^1(G, M^0/\kappa)$ has dimension 1. Let $B$ be the image of the upper-triangular matrices in $\mathrm{PGL}_2(\kappa)$; then the restriction map
\[ H^1(\mathrm{PGL}_2(\kappa), M^0/\kappa) \to H^1(B, M^0/\kappa) \]
is injective since its composition with the corestriction map is just multiplication by the index of $B$ in $G$, which is odd. Write $M^1$ for the $B$-submodule of $M^0$ consisting of upper-triangular matrices with trace zero. We have a short exact sequence
\[ 0 \to M^1/\kappa \to M^0/\kappa \to M^0/M^1 \to 0 \]
in which an element $\begin{pmatrix} a & b \\ 0 & a^{-1} \end{pmatrix}$ of $B$ acts by $a^2$ on the first term and $a^{-2}$ on the third. A portion of the associated long exact sequence of cohomology groups is
\[ H^0(B, M^0/M^1) \to H^1(B, M^1/\kappa) \to H^1(B, M^0/\kappa) \to H^1(B, M^0/M^1) \]
in which the first term is zero because $B$ acts nontrivially on $M^0/M^1$. Let $U \triangleleft B$ be the subgroup of unipotent matrices of $B$; then from the five-term restriction-inflation
sequence it follows that $H^1(B, N) \cong H^1(U, N)^{B/U}$ for any finite $\kappa$-module $N$ since the order of $B/U$ is coprime to that of $N$. In particular,
\[ H^1(B, M^1/\kappa) = H^1(U, M^1/\kappa)^{B/U} = \mathrm{Hom}_B(U, M^1/\kappa) \]
is 1-dimensional over $\kappa$ since an element $\begin{pmatrix} a & b \\ 0 & d \end{pmatrix}$ of $B$ acts by $a^2$ on $U$ and on $M^1/\kappa$. Similarly, $H^1(B, M^0/M^1) = \mathrm{Hom}_B(U, M^0/M^1)$ is the trivial module. So $H^1(B, M^0/\kappa)$ is 1-dimensional over $\kappa$.

The case where the projective image of $\bar\rho$ is dihedral is easier. We have to show that $H^1(D_{2h}, M_2(k))$ is zero, where the cyclic subgroup $C_h$ of $D_{2h}$ acts by conjugation by diagonal matrices in $\mathrm{PGL}_2(k)$ and $D_{2h}$ is generated by $C_h$ and the element $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$. Since the order of $C_h$ is odd, the inflation map
\[ H^1(D_{2h}/C_h, M_2(k)^{C_h}) \to H^1(D_{2h}, M_2(k)) \]
is an isomorphism. Now $M_2(k)^{C_h}$ is the space of diagonal matrices, and from the explicit description of cohomology for a cyclic group we see that $H^1(D_{2h}/C_h, M_2(k)^{C_h}) = 0$.
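The final vanishing in the dihedral case can be written out using the standard formula for the first cohomology of a cyclic group. The sketch below assumes that $s$ denotes the generator of $D_{2h}/C_h$ (the class of the antidiagonal element), acting on diagonal matrices by swapping the two entries; the names $s$, $A$, $x$, $y$ are introduced here for illustration.

```latex
% Assumed notation: A = M_2(k)^{C_h} = \{\mathrm{diag}(x, y)\}, and
% s \cdot \mathrm{diag}(x, y) = \mathrm{diag}(y, x). For the cyclic group
% \langle s \rangle of order 2,
\[
  H^1(\langle s \rangle, A) \;=\; \ker(1 + s) \,/\, (s - 1)A .
\]
% Since k has characteristic 2,
\[
  (1 + s)\,\mathrm{diag}(x, y) = (s - 1)\,\mathrm{diag}(x, y)
  = \mathrm{diag}(x + y,\, x + y).
\]
% Kernel of 1 + s: x + y = 0, i.e. x = y, the scalar matrices.
% Image of s - 1: all \mathrm{diag}(t, t) (take x = t, y = 0),
% again the scalar matrices. Hence
\[
  H^1(\langle s \rangle, A) = 0 .
\]
```

The kernel and the image coincide precisely because the characteristic is 2; in odd characteristic the same formula would give a different computation.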
Proof of Lemma 40 Suppose that $\psi$ is a cocycle representing an element of $H^1(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ which does not restrict to zero in $H^1(E_n, \mathrm{ad}\,\bar\rho)$. If $\bar\rho$ is unramified at $q$, then the conditions on the prime $q$ can be translated into the following conditions on any lifting $\Phi$ of $\mathrm{Frob}_q$:
(1) $\Phi$ is contained in $G_{E_n}$,
(2) $\bar\rho(\Phi)$ has distinct $k$-rational eigenvalues, and
(3) $\psi(\Phi)$ is not contained in $(\Phi - 1)\,\mathrm{ad}\,\bar\rho$.
The last condition arises because the cohomology group $H^1(\mathbb{F}_q, \mathrm{ad}\,\bar\rho)$ can be expressed as the cokernel of $\mathrm{Frob}_q - 1$ acting on $\mathrm{ad}\,\bar\rho$. By the Čebotarev density theorem, to find a prime $q$ satisfying the above conditions it suffices to show that the open subset of $G_{\mathbb{Q}}$ that consists of all elements $\sigma$ in $G_{E_n}$ for which $\bar\rho(\sigma)$ has distinct $k$-rational eigenvalues and $\psi(\sigma) \notin (\sigma - 1)\,\mathrm{ad}\,\bar\rho$ is nonempty.

Lemma 42 shows that the first term in the restriction-inflation sequence
\[ 0 \to H^1(F_n/E_n, \mathrm{ad}\,\bar\rho) \to H^1(E_n, \mathrm{ad}\,\bar\rho) \to H^1(F_n, \mathrm{ad}\,\bar\rho) \]
is zero, so that the map on the right-hand side is an injection. It follows that $\psi$ is nontrivial when restricted to an element of $H^1(F_n, \mathrm{ad}\,\bar\rho)$. Since $G_{F_n}$ acts trivially on $\mathrm{ad}\,\bar\rho$, this last cohomology group is just the continuous homomorphisms from $G_{F_n}$ to
$\operatorname{ad}\bar\rho$. Thus it makes sense to talk about the image $\psi(G_{F_n})$ of $\psi$ restricted to $G_{F_n}$, and this image is a nontrivial subgroup of $\operatorname{ad}\bar\rho$. It follows from the explicit descriptions of the possible projective images of $\bar\rho$ above that we can find elements α0 β0 , β0 α0 , and α0 β0 in the projective image of $\bar\rho$ restricted to $G_{E_n}$, with $\alpha \neq \beta$. The intersection of the subspaces of $\operatorname{ad}\bar\rho$ of the form $(\sigma - 1)\operatorname{ad}\bar\rho$ as $\sigma$ ranges over these three elements is $(0)$. But $\psi(G_{F_n})$ is nontrivial, so for at least one of these elements $\sigma$, $\psi(G_{F_n})$ is not contained in $(\sigma - 1)\operatorname{ad}\bar\rho$. Now if $\tau$ is any element of $G_{F_n}$, then $(\sigma - 1)\operatorname{ad}\bar\rho = (\tau\sigma - 1)\operatorname{ad}\bar\rho$, $\psi(\tau\sigma) = \tau\psi(\sigma) + \psi(\tau) = \psi(\sigma) + \psi(\tau)$, and $\bar\rho(\tau\sigma) = \bar\rho(\sigma)$ has distinct $k$-rational eigenvalues. Since $\psi(G_{F_n})$ is not contained in $(\sigma - 1)\operatorname{ad}\bar\rho$, it follows that there is an open subset of $G_{F_n}$ consisting of elements $\tau$ such that $\psi(\tau\sigma)$ is not contained in $(\tau\sigma - 1)\operatorname{ad}\bar\rho$, so that $\tau\sigma$ is an element satisfying the properties listed above. This completes the proof of the first part of Lemma 40.
Now, suppose that $\psi$ is an element of $H^1(\mathbf{Q}, \operatorname{ad}\bar\rho)$ which restricts to the zero element of $H^1(E_n, \operatorname{ad}\bar\rho)$. Then, for every prime $q$ at which $\bar\rho$ is unramified and which is congruent to 1 modulo $2^n$, the decomposition group $G_q$ is contained in $G_{E_n}$, and so $\psi$ automatically restricts to the zero element of $H^1(\mathbf{Q}_q, \operatorname{ad}\bar\rho)$. Hence in this case there are no primes $q$ satisfying the listed properties. Finally, note that for any prime $p \neq 2$ the inertia group $I_p$ is contained in $G_{E_n}$ since $E_n$ is ramified only at 2. So $\psi$ restricts to an element of $H^1(\mathbf{F}_p, (\operatorname{ad}\bar\rho)^{I_p})$ in $H^1(\mathbf{Q}_p, \operatorname{ad}\bar\rho)$ and all the local conditions $\emptyset^*$ are satisfied, except possibly the condition at 2. This completes the proof of Lemma 40.
We now use Theorem 36 and the Selmer group interpretation of $H^1_Q(\mathbf{Q}, \operatorname{ad}\bar\rho)$ given at the end of Section 11.1 to compute the size of $H^1_Q(\mathbf{Q}, \operatorname{ad}\bar\rho)$ for particular $Q$.
Suppose that $Q$ is a finite set of odd primes with the property that for each $q$ in $Q$ the representation $\bar\rho$ is unramified at $q$ and $\bar\rho(\operatorname{Frob}_q)$ has distinct $k$-rational eigenvalues. Then in the formula of Theorem 36,
(1) the odd primes $p$ not in $Q$ contribute nothing;
(2) the primes $p$ in $Q$ each contribute a factor of $(\#k)^2$;
(3) the $G_{\mathbf{Q}}$-module $\operatorname{ad}\bar\rho$ is self-dual, so that $\#H^0(G_{\mathbf{Q}}, \operatorname{ad}\bar\rho) = \#H^0(\mathbf{Q}, \operatorname{ad}\bar\rho^*)$;
(4) the $k$-vector space $H^1(G_\infty, \operatorname{ad}\bar\rho)$ is trivial, so that the local condition $L_\infty$ is also trivial and the contribution to the dimension count at the place $\infty$ is $-\dim_k H^0(G_\infty, \operatorname{ad}\bar\rho) = -2$.
These facts give us the following formula for the dimension:
\[ \dim_k H^1_Q(\mathbf{Q}, \operatorname{ad}\bar\rho) = \dim_k H^1_{Q^*}(\mathbf{Q}, \operatorname{ad}\bar\rho) + \dim_k L_2 - \dim_k H^0(G_2, \operatorname{ad}\bar\rho) + 2\#Q - 2. \]
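The bookkeeping behind this formula can be laid out place by place. The following is only a sketch: it assumes that Theorem 36 has the usual Greenberg-Wiles shape, in which the difference of the two Selmer dimensions equals the difference of the two global $H^0$ dimensions plus a sum of local terms. The symbol $c_v$ for the contribution of the place $v$ is introduced here for illustration and does not appear in the text.

```latex
\begin{align*}
\dim_k H^1_Q(\mathbf{Q},\operatorname{ad}\bar\rho) - \dim_k H^1_{Q^*}(\mathbf{Q},\operatorname{ad}\bar\rho)
  &= \underbrace{\dim_k H^0(G_{\mathbf{Q}},\operatorname{ad}\bar\rho)
     - \dim_k H^0(\mathbf{Q},\operatorname{ad}\bar\rho^{*})}_{=\,0 \text{ by (3)}}
     + \sum_{v} c_v, \\
c_v &= \dim_k L_v - \dim_k H^0(G_v,\operatorname{ad}\bar\rho), \\
c_p &= 0 \quad (p \text{ odd},\ p \notin Q) && \text{by (1)}, \\
c_q &= 2 \quad (q \in Q) && \text{by (2)}, \\
c_\infty &= -2 && \text{by (4)}, \\
c_2 &= \dim_k L_2 - \dim_k H^0(G_2,\operatorname{ad}\bar\rho), \\
\sum_v c_v &= \dim_k L_2 - \dim_k H^0(G_2,\operatorname{ad}\bar\rho) + 2\#Q - 2 .
\end{align*}
```

Only the prime 2 contributes a term that depends on the local condition of the deformation problem, which is why the later lemmas concentrate on $L_2$.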
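The idea now is to use Lemma 40 to find sets of primes $Q$ for which the dual Selmer group $H^1_{Q^*}(\mathbf{Q}, \operatorname{ad}\bar\rho)$ is as small as possible. We first prove a lemma that makes no use of the local information at 2 in the deformation problem.
LEMMA 43
There is an integer $r \geq 0$ and for each $n \geq 3$ a set $Q_n$ of $r$ primes, each congruent to 1 modulo $2^n$, such that the dimension of $H^1_{Q_n}(\mathbf{Q}, \operatorname{ad}\bar\rho)$ is equal to
\[ 2\#Q_n + \dim_k \gamma^{-1}(L_2) - \dim_k H^0(G_2, \operatorname{ad}^0\bar\rho) \]
if $H^1(\operatorname{Gal}(\mathbf{Q}_2(\zeta_{2^\infty})/\mathbf{Q}_2), k)^\perp \subset \beta L_2$, and
\[ 2\#Q_n - 1 + \dim_k \gamma^{-1}(L_2) - \dim_k H^0(G_2, \operatorname{ad}^0\bar\rho) \]
if $H^1(\operatorname{Gal}(\mathbf{Q}_2(\zeta_{2^\infty})/\mathbf{Q}_2), k)^\perp \not\subset \beta L_2$, where $\gamma$ and $\beta$ are the maps
\[ H^1(G_2, \operatorname{ad}^0\bar\rho) \xrightarrow{\ \gamma\ } H^1(G_2, \operatorname{ad}\bar\rho) \xrightarrow{\ \beta\ } H^1(G_2, k) \]
in the long exact sequence of $G_2$-modules associated to the short exact sequence $0 \to \operatorname{ad}^0\bar\rho \to \operatorname{ad}\bar\rho \to k \to 0$ in which the right-hand map is the trace map.
Proof
Let $r$ be the dimension of the vector space
\[ \operatorname{im}\bigl(H^1_{\emptyset^*}(\mathbf{Q}, \operatorname{ad}\bar\rho) \to H^1(E_n, \operatorname{ad}\bar\rho)\bigr). \]
For any choice of primes $Q$ at which $\bar\rho$ is unramified, we have an exact sequence
\[ 0 \to H^1_{Q^*}(\mathbf{Q}, \operatorname{ad}\bar\rho) \to H^1_{\emptyset^*}(\mathbf{Q}, \operatorname{ad}\bar\rho) \to \bigoplus_{q \in Q} H^1(\mathbf{F}_q, \operatorname{ad}\bar\rho) \]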
since the only local conditions that differ are those at $q$. Now, starting with $Q_n = \emptyset$ and using Lemma 40, we can repeatedly reduce the dimension of $H^1_{Q_n^*}(\mathbf{Q}, \operatorname{ad}\bar\rho)$ by adding a suitably chosen prime $q$ to the set $Q_n$; after doing this at most $r$ times, we can assume that $H^1_{Q_n^*}(\mathbf{Q}, \operatorname{ad}\bar\rho)$ has been reduced to the portion of $H^1_{\emptyset^*}(\mathbf{Q}, \operatorname{ad}\bar\rho)$ which lies in the kernel of the map $H^1_{\emptyset^*}(\mathbf{Q}, \operatorname{ad}\bar\rho) \to H^1(E_n, \operatorname{ad}\bar\rho)$; we can also then add extra primes to $Q_n$ so that it contains exactly $r$ elements. By Lemma 40 any cohomology class in $H^1(\mathbf{Q}, \operatorname{ad}\bar\rho)$ which lies in this kernel automatically satisfies all the local conditions away from 2, so $H^1_{Q_n^*}(\mathbf{Q}, \operatorname{ad}\bar\rho)$ can be described as the intersection of the kernel of the map $H^1(\mathbf{Q}, \operatorname{ad}\bar\rho) \to H^1(E_n, \operatorname{ad}\bar\rho)$ with the set of elements of $H^1(\mathbf{Q}, \operatorname{ad}\bar\rho)$ satisfying the local condition $L_2^\perp$ at 2. Using the restriction-inflation sequence for the normal subgroup $G_{E_n}$ of $G_{\mathbf{Q}}$, this is
\[ \operatorname{im}\bigl(H^1(E_n/\mathbf{Q}, k) \to H^1(\mathbf{Q}, \operatorname{ad}\bar\rho)\bigr) \cap \ker\bigl(H^1(\mathbf{Q}, \operatorname{ad}\bar\rho) \to H^1(G_2, \operatorname{ad}\bar\rho)/L_2^\perp\bigr), \]
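which is isomorphic to its preimage
\[ \ker\bigl(H^1(E_n/\mathbf{Q}, k) \to H^1(\mathbf{Q}_2, \operatorname{ad}\bar\rho)/L_2^\perp\bigr) \]
in $H^1(E_n/\mathbf{Q}, k)$. But the inclusion $G_2 \to G_{\mathbf{Q}}$ induces a bijection $\operatorname{Gal}(E_n/\mathbf{Q}) \to \operatorname{Gal}(\mathbf{Q}_2(\zeta_{2^n})/\mathbf{Q}_2)$ of Galois groups, and since $n \geq 3$, the group
\[ H^1\bigl(\operatorname{Gal}(\mathbf{Q}_2(\zeta_{2^n})/\mathbf{Q}_2), k\bigr) = \operatorname{Hom}\bigl(\operatorname{Gal}(\mathbf{Q}_2(\zeta_{2^n})/\mathbf{Q}_2), k\bigr) \]
can be identified with $H^1(\operatorname{Gal}(\mathbf{Q}_2(\zeta_{2^\infty})/\mathbf{Q}_2), k)$. So the dimension of $H^1_{Q_n^*}(\mathbf{Q}, \operatorname{ad}\bar\rho)$ can now be expressed as
\[ \dim_k \ker\bigl(H^1(\operatorname{Gal}(\mathbf{Q}_2(\zeta_{2^\infty})/\mathbf{Q}_2), k) \to H^1(\mathbf{Q}_2, \operatorname{ad}\bar\rho)/L_2^\perp\bigr). \]
By local class field theory, the maximal abelian extension of $\mathbf{Q}_2$ is the compositum of the maximal unramified extension and $\mathbf{Q}_2(\zeta_{2^\infty})$, which are linearly disjoint, and so the abelianisation of $G_2$ is isomorphic to the direct product of $G_2/I_2 \cong \hat{\mathbf{Z}}$ and $\operatorname{Gal}(\mathbf{Q}_2(\zeta_{2^\infty})/\mathbf{Q}_2) \cong \mathbf{Z}_2^\times$. Therefore the dimension above is equal to
\[ \dim_k \ker\bigl(H^1(\mathbf{Q}_2, k) \to H^1(\mathbf{Q}_2, \operatorname{ad}\bar\rho)/L_2^\perp\bigr) \]
if $\ker(H^1(\mathbf{Q}_2, k) \to H^1(\mathbf{Q}_2, \operatorname{ad}\bar\rho)/L_2^\perp)$ is contained in $H^1(\operatorname{Gal}(\mathbf{Q}_2(\zeta_{2^\infty})/\mathbf{Q}_2), k)$, and
\[ -1 + \dim_k \ker\bigl(H^1(\mathbf{Q}_2, k) \to H^1(\mathbf{Q}_2, \operatorname{ad}\bar\rho)/L_2^\perp\bigr) \]
otherwise. We have two perfect pairings
\[ H^1(\mathbf{Q}_2, k) \times H^1(\mathbf{Q}_2, k) \to \mathbf{Q}/\mathbf{Z}, \qquad H^1(\mathbf{Q}_2, \operatorname{ad}\bar\rho) \times H^1(\mathbf{Q}_2, \operatorname{ad}\bar\rho) \to \mathbf{Q}/\mathbf{Z} \]
of $k$-vector spaces, linked by maps $\alpha : H^1(\mathbf{Q}_2, k) \to H^1(\mathbf{Q}_2, \operatorname{ad}\bar\rho)$ and $\beta : H^1(\mathbf{Q}_2, \operatorname{ad}\bar\rho) \to H^1(\mathbf{Q}_2, k)$ (together with $\gamma : H^1(\mathbf{Q}_2, \operatorname{ad}^0\bar\rho) \to H^1(\mathbf{Q}_2, \operatorname{ad}\bar\rho)$), which are compatible in the sense that $(\alpha x, y) = (x, \beta y)$ for $x$ in $H^1(\mathbf{Q}_2, k)$ and $y$ in $H^1(\mathbf{Q}_2, \operatorname{ad}\bar\rho)$. Now the kernel above is $\alpha^{-1}(L_2^\perp) = (\beta L_2)^\perp$, by compatibility of these pairings. So $\dim_k \alpha^{-1}(L_2^\perp)$ is equal to $\dim_k H^1(\mathbf{Q}_2, k) - \dim_k \beta L_2$, which is $3 - \dim_k \beta L_2$. Furthermore, the condition that $\alpha^{-1}(L_2^\perp)$ be contained in $H^1(\operatorname{Gal}(\mathbf{Q}_2(\zeta_{2^\infty})/\mathbf{Q}_2), k)$ can now be replaced by the equivalent dual condition that $H^1(\operatorname{Gal}(\mathbf{Q}_2(\zeta_{2^\infty})/\mathbf{Q}_2), k)^\perp$ be contained in $\beta L_2$. The expression for the dimension of $H^1_{Q_n}(\mathbf{Q}, \operatorname{ad}\bar\rho)$ now becomes
\[ 2\#Q_n + \dim_k L_2 - \dim_k H^0(G_2, \operatorname{ad}\bar\rho) - \dim_k \beta L_2 \]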
when $H^1(\operatorname{Gal}(\mathbf{Q}_2(\zeta_{2^\infty})/\mathbf{Q}_2), k)^\perp$ is not contained in $\beta L_2$, and one more than this otherwise. From the long exact sequence
\[ 0 \to H^0(\mathbf{Q}_2, \operatorname{ad}^0\bar\rho) \to H^0(\mathbf{Q}_2, \operatorname{ad}\bar\rho) \to H^0(\mathbf{Q}_2, k) \to \gamma^{-1}(L_2) \to L_2 \to \beta L_2 \to 0, \]
this is equal to
\[ 2\#Q_n - 1 + \dim_k \gamma^{-1}(L_2) - \dim_k H^0(G_2, \operatorname{ad}^0\bar\rho) \]
when $H^1(\operatorname{Gal}(\mathbf{Q}_2(\zeta_{2^\infty})/\mathbf{Q}_2), k)^\perp$ is not contained in $\beta L_2$, and to
\[ 2\#Q_n + \dim_k \gamma^{-1}(L_2) - \dim_k H^0(G_2, \operatorname{ad}^0\bar\rho) \]
when it is. This completes the proof of the lemma.
We now apply this lemma to the specific cases that we are considering. First, suppose that $\bar\rho$ is unramified at 2, so that $\bar\rho|_{G_2}$ looks like $\begin{pmatrix} \chi_1 & 0 \\ 0 & \chi_2 \end{pmatrix}$ with $\chi_1$ and $\chi_2$ distinct and $\chi_2(\operatorname{Frob}_2) = \alpha$. Then we require that any lifting $\rho$ look like $\begin{pmatrix} \psi_1 & * \\ 0 & \psi_2 \end{pmatrix}$ when restricted to $G_2$, where $\psi_i$ is unramified and lifts $\chi_i$, and the corresponding local condition is the subspace
\[ H^1(G_2/I_2, \{\text{diagonal matrices}\}) \oplus H^1(G_2, k(\chi_1/\chi_2)) \]
of the space
\[ H^1(G_2, \operatorname{ad}\bar\rho) = H^1(G_2, \{\text{diagonal matrices}\}) \oplus H^1(G_2, k(\chi_1/\chi_2)) \oplus H^1(G_2, k(\chi_2/\chi_1)). \]
The pullback $\gamma^{-1}(L_2)$ of this local condition to $H^1(G_2, \operatorname{ad}^0\bar\rho)$ is
\[ H^1(G_2/I_2, k) \oplus H^1(G_2, k(\chi_1/\chi_2)), \]
which has dimension 2, the dimension of $H^0(G_2, \operatorname{ad}^0\bar\rho)$ is 1, and the trace map $\beta$ takes $L_2$ into $H^1(G_2/I_2, k)$, so that $\beta L_2$ does not contain $H^1(\operatorname{Gal}(\mathbf{Q}_2(\zeta_{2^\infty})/\mathbf{Q}_2), k)^\perp$. So, by Lemma 43, the total dimension count is $2\#Q_n$.
The other case we must consider is the case in which $\bar\rho$ is ramified at 2 and the restriction of $\bar\rho$ to a decomposition group at 2 has the form $\begin{pmatrix} \chi_1 & * \\ 0 & \chi_2 \end{pmatrix}$, where $\chi_1$ and $\chi_2$ are distinct and unramified. We require the same property of a lifting $\rho$ as above. In this case, the local condition is
\[ L_2 = \ker\bigl(H^1(G_2, \operatorname{ad}\bar\rho) \to H^1(I_2, \operatorname{ad}\bar\rho/\operatorname{ad}^{-1}\bar\rho)\bigr), \]
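and its pullback $\gamma^{-1}(L_2)$ to $H^1(G_2, \operatorname{ad}^0\bar\rho)$ is equal to
\[ \ker\bigl(H^1(G_2, \operatorname{ad}^0\bar\rho) \to H^1(I_2, \operatorname{ad}^0\bar\rho/\operatorname{ad}^{-1}\bar\rho)\bigr) \]
since it follows from the long exact sequence associated to
\[ 0 \to \operatorname{ad}^0\bar\rho/\operatorname{ad}^{-1}\bar\rho \to \operatorname{ad}\bar\rho/\operatorname{ad}^{-1}\bar\rho \to k \to 0 \]
that the map $H^1(I_2, \operatorname{ad}^0\bar\rho/\operatorname{ad}^{-1}\bar\rho) \to H^1(I_2, \operatorname{ad}\bar\rho/\operatorname{ad}^{-1}\bar\rho)$ is an injection. The trace map $\beta$ again maps the local condition $L_2$ into $H^1(G_2/I_2, k)$, so that $\beta L_2$ cannot contain $H^1(\operatorname{Gal}(\mathbf{Q}_2(\zeta_{2^\infty})/\mathbf{Q}_2), k)^\perp$. It remains to compute $\dim_k \gamma^{-1}(L_2) - \dim_k H^0(G_2, \operatorname{ad}^0\bar\rho)$. We have two compatible short exact sequences of $G_2$-modules:
\[ \begin{array}{ccccccccc}
0 & \to & \operatorname{ad}^{-1}\bar\rho & \to & (\operatorname{ad}^0\bar\rho)^{I_2} & \to & (\operatorname{ad}^0\bar\rho/\operatorname{ad}^{-1}\bar\rho)^{I_2} & \to & 0 \\
0 & \to & \operatorname{ad}^{-1}\bar\rho & \to & \operatorname{ad}^0\bar\rho & \to & \operatorname{ad}^0\bar\rho/\operatorname{ad}^{-1}\bar\rho & \to & 0
\end{array} \]
in which the top sequence can be regarded as a sequence of $G_2/I_2$-modules. Now, consider the diagram of $k$-modules (Figure 1), in which the rows and columns are exact. We want to compute the dimension $\dim_k \ker\delta - \dim_k H^0(G_2, \operatorname{ad}^0\bar\rho)$. Write $Z$ for the cohomology group $H^1(G_2/I_2, (\operatorname{ad}^0\bar\rho/\operatorname{ad}^{-1}\bar\rho)^{I_2})$; then, since the square at the bottom left of the above diagram commutes and since $H^1(G_2/I_2, \operatorname{ad}^{-1}\bar\rho) = 0$, the image of $Z$ is contained in the image of $u$. So $\dim_k \ker\delta = \dim_k \ker u + \dim_k Z$ and
\[ \dim_k \ker\delta - \dim_k H^0(G_2, \operatorname{ad}^0\bar\rho) = \dim_k \ker u + \dim_k Z - \dim_k H^0(G_2, \operatorname{ad}^0\bar\rho). \]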
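Figure 1. Diagram of $k$-modules. Its entries are the long exact sequence $0 \to H^0(G_2, \operatorname{ad}^{-1}\bar\rho) \to H^0(G_2, \operatorname{ad}^0\bar\rho) \to H^0(G_2, \operatorname{ad}^0\bar\rho/\operatorname{ad}^{-1}\bar\rho) \to H^1(G_2, \operatorname{ad}^{-1}\bar\rho) \to H^1(G_2, \operatorname{ad}^0\bar\rho) \xrightarrow{\,u\,} H^1(G_2, \operatorname{ad}^0\bar\rho/\operatorname{ad}^{-1}\bar\rho) \to H^2(G_2, \operatorname{ad}^{-1}\bar\rho)$, the maps $H^1(G_2/I_2, (\operatorname{ad}^0\bar\rho/\operatorname{ad}^{-1}\bar\rho)^{I_2}) \to H^1(G_2, \operatorname{ad}^0\bar\rho/\operatorname{ad}^{-1}\bar\rho) \to H^1(I_2, \operatorname{ad}^0\bar\rho/\operatorname{ad}^{-1}\bar\rho)$ and $H^2(G_2/I_2, \operatorname{ad}^{-1}\bar\rho) \to H^2(G_2, \operatorname{ad}^{-1}\bar\rho)$, and the diagonal map $\delta : H^1(G_2, \operatorname{ad}^0\bar\rho) \to H^1(I_2, \operatorname{ad}^0\bar\rho/\operatorname{ad}^{-1}\bar\rho)$.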
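From the exact sequence
\[ 0 \to H^0(G_2, \operatorname{ad}^{-1}\bar\rho) \to H^0(G_2, \operatorname{ad}^0\bar\rho) \to H^0(G_2, \operatorname{ad}^0\bar\rho/\operatorname{ad}^{-1}\bar\rho) \to H^1(G_2, \operatorname{ad}^{-1}\bar\rho) \to \ker u \to 0 \]
and from the fact that $\dim_k Z - \dim_k H^0(G_2, \operatorname{ad}^0\bar\rho/\operatorname{ad}^{-1}\bar\rho) = 0$, this dimension is equal to
\[ \dim_k H^1(G_2, \operatorname{ad}^{-1}\bar\rho) - \dim_k H^0(G_2, \operatorname{ad}^{-1}\bar\rho), \]
which by the local Euler characteristic formula is in turn equal to $1 + \dim_k H^2(G_2, \operatorname{ad}^{-1}\bar\rho)$, and so to $1 + \dim_k H^0(G_2, (\operatorname{ad}^{-1}\bar\rho)^*) = 1$ since $\chi_1$ and $\chi_2$ are distinct. So the total dimension count is $2\#Q_n$, and this completes the proof of Theorem 32.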
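As a consistency check, the two local computations can be substituted into Lemma 43. In both cases $H^1(\operatorname{Gal}(\mathbf{Q}_2(\zeta_{2^\infty})/\mathbf{Q}_2), k)^\perp$ is not contained in $\beta L_2$, so the second formula of the lemma applies. The sketch below uses only the dimensions quoted in the text, taking $\dim_k H^0(G_2, \operatorname{ad}^0\bar\rho) = 1$ in the unramified case.

```latex
\begin{align*}
\dim_k H^1_{Q_n}(\mathbf{Q},\operatorname{ad}\bar\rho)
  &= 2\#Q_n - 1 + \dim_k\gamma^{-1}(L_2) - \dim_k H^0(G_2,\operatorname{ad}^0\bar\rho), \\[4pt]
\bar\rho \text{ unramified at } 2:\quad
  & \dim_k\gamma^{-1}(L_2) = 2,\quad \dim_k H^0(G_2,\operatorname{ad}^0\bar\rho) = 1, \\
  & 2\#Q_n - 1 + 2 - 1 = 2\#Q_n, \\[4pt]
\bar\rho \text{ ramified at } 2:\quad
  & \dim_k\gamma^{-1}(L_2) - \dim_k H^0(G_2,\operatorname{ad}^0\bar\rho) = 1, \\
  & 2\#Q_n - 1 + 1 = 2\#Q_n .
\end{align*}
```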
11.3. Proof of Theorem 35
In this section we prove Theorem 35. Let $p$ be an odd prime. We begin with the following lemma.
LEMMA 44
Suppose that $f$ is an element of $N_\emptyset$ all of whose Hecke eigenvalues are in $K$. Then the determinant of $\operatorname{Frob}_p - 1$ acting on $(\operatorname{ad}\rho_f)^{I_p}(1)$ is equal to a unit of $\mathcal{O}$ times $\mu_p$, where $\mu_p$ is as defined in Section 8.
Proof
This is a straightforward case-by-case check.
LEMMA 45
Let $\rho : G_p \to \mathrm{GL}_2(\mathcal{O})$ be any continuous representation. Then the natural map
\[ H^1(I_p, \operatorname{ad}\rho \otimes_{\mathcal{O}} K/\mathcal{O})^{G_p/I_p} \to H^1(I_p, K/\mathcal{O})^{G_p/I_p} \]
induced by the trace map $\operatorname{ad}\rho \to \mathcal{O}$ is a surjection.
Proof
We have a diagram
\[ \begin{array}{ccc}
H^1(G_p, \operatorname{ad}\rho \otimes_{\mathcal{O}} K/\mathcal{O}) & \to & H^1(I_p, \operatorname{ad}\rho \otimes_{\mathcal{O}} K/\mathcal{O})^{G_p/I_p} \\
\downarrow & & \downarrow \\
H^1(G_p, K/\mathcal{O}) & \to & H^1(I_p, K/\mathcal{O})^{G_p/I_p}
\end{array} \]
It follows from the five-term restriction-inflation sequence and from the fact that $G_p/I_p \cong \hat{\mathbf{Z}}$ has cohomological dimension 1 that the horizontal maps are surjective. Thus it would be enough to show that the left-hand vertical map is surjective, and, from the long exact sequence in cohomology associated to the sequence
\[ 0 \to \operatorname{ad}^0\rho \otimes_{\mathcal{O}} K/\mathcal{O} \to \operatorname{ad}\rho \otimes_{\mathcal{O}} K/\mathcal{O} \to K/\mathcal{O} \to 0, \]
the surjectivity of the left-hand vertical map is equivalent to the injectivity of the map
\[ H^2(G_p, \operatorname{ad}^0\rho \otimes_{\mathcal{O}} K/\mathcal{O}) \to H^2(G_p, \operatorname{ad}\rho \otimes_{\mathcal{O}} K/\mathcal{O}). \]
For any finite $G_p$-module $M$ of cardinality prime to $p$, Tate's local duality theorem gives a perfect pairing
\[ H^2(G_p, M) \otimes H^0(G_p, M^*) \to \mathbf{Q}/\mathbf{Z}, \]
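where $M^* = \operatorname{Hom}(M, \mu(\bar{\mathbf{Q}}_p))$ denotes the Cartier dual of $M$. Furthermore, for any $n$, the (Galois-equivariant) trace pairing on $\operatorname{ad}\rho$ can be used to identify the Cartier dual of the $G_p$-module $\operatorname{ad}\rho \otimes_{\mathcal{O}} \mathfrak{m}_{\mathcal{O}}^{-n}/\mathcal{O}$ with the $G_p$-module $\operatorname{ad}\rho(1) \otimes_{\mathcal{O}} \mathcal{O}/\mathfrak{m}_{\mathcal{O}}^{n}$, and the dual of $\operatorname{ad}^0\rho \otimes_{\mathcal{O}} \mathfrak{m}_{\mathcal{O}}^{-n}/\mathcal{O}$ with the module $(\operatorname{ad}\rho/\mathcal{O})(1) \otimes_{\mathcal{O}} \mathcal{O}/\mathfrak{m}_{\mathcal{O}}^{n}$. So the Pontryagin dual of the map
\[ \varinjlim_n H^2(G_p, \operatorname{ad}^0\rho \otimes_{\mathcal{O}} \mathfrak{m}_{\mathcal{O}}^{-n}/\mathcal{O}) \to \varinjlim_n H^2(G_p, \operatorname{ad}\rho \otimes_{\mathcal{O}} \mathfrak{m}_{\mathcal{O}}^{-n}/\mathcal{O}) \]
can be identified with the map
\[ \varprojlim_n H^0\bigl(G_p, \operatorname{ad}\rho(1) \otimes_{\mathcal{O}} \mathcal{O}/\mathfrak{m}_{\mathcal{O}}^{n}\bigr) \to \varprojlim_n H^0\bigl(G_p, (\operatorname{ad}\rho/\mathcal{O})(1) \otimes_{\mathcal{O}} \mathcal{O}/\mathfrak{m}_{\mathcal{O}}^{n}\bigr), \]
which is equal to
\[ (\operatorname{ad}\rho(1))^{G_p} \to (\operatorname{ad}\rho/\mathcal{O})(1)^{G_p}, \]
and to prove the lemma it is enough to show that this map is surjective. This we now do. Suppose that $A$ is an element of $M_2(\mathcal{O})$ which represents an element of $(\operatorname{ad}\rho/\mathcal{O})(1)^{G_p}$. It would be enough to show that the trace of $A$ is divisible by 2, for then $A - (1/2)\operatorname{tr} A$ would be an element of $\operatorname{ad}\rho(1)^{G_p}$ lifting $A$. Take any lift $\Phi$ in $G_p$ of $\operatorname{Frob}_p$ in $G_p/I_p$, and let $B$ be the invertible matrix $\rho(\Phi)$. Then $pBAB^{-1} = A + \lambda$ for some scalar matrix $\lambda$. Taking determinant and trace of this equation, we find that $(p^2 - 1)\det A = \lambda(\lambda + \operatorname{tr} A)$ and $(p - 1)\operatorname{tr} A = 2\lambda$. It follows that $4\det A = (\operatorname{tr} A)^2$, and so $\operatorname{tr} A$ is divisible by 2. This completes the proof of the lemma.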
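The last step of the proof of Lemma 45 compresses a short calculation, which can be written out as follows. This is a sketch starting from the relation $pBAB^{-1} = A + \lambda$ stated in the proof; it uses the identity $\det(A + \lambda) = \det A + \lambda \operatorname{tr} A + \lambda^2$ for $2 \times 2$ matrices and assumes, as is standard for the ring of integers $\mathcal{O}$ of $K$, that $\mathcal{O}$ is an integrally closed domain of characteristic 0.

```latex
\begin{align*}
\text{trace:}\quad
  & p\operatorname{tr}A = \operatorname{tr}A + 2\lambda
  && \Longrightarrow\quad (p-1)\operatorname{tr}A = 2\lambda, \\
\text{determinant:}\quad
  & p^{2}\det A = \det A + \lambda\operatorname{tr}A + \lambda^{2}
  && \Longrightarrow\quad (p^{2}-1)\det A = \lambda(\lambda + \operatorname{tr}A), \\
\text{substitute } \lambda = \tfrac{p-1}{2}\operatorname{tr}A:\quad
  & \lambda(\lambda + \operatorname{tr}A)
    = \tfrac{p-1}{2}\operatorname{tr}A \cdot \tfrac{p+1}{2}\operatorname{tr}A
  && = \tfrac{p^{2}-1}{4}(\operatorname{tr}A)^{2}, \\
\text{cancel } p^{2}-1 \neq 0:\quad
  & 4\det A = (\operatorname{tr}A)^{2}
  && \Longrightarrow\quad \bigl(\tfrac{1}{2}\operatorname{tr}A\bigr)^{2} = \det A \in \mathcal{O}.
\end{align*}
```

The final equation shows that $\tfrac{1}{2}\operatorname{tr} A$ is a root of a monic polynomial over $\mathcal{O}$, hence lies in $\mathcal{O}$, which is the divisibility of $\operatorname{tr} A$ by 2 used in the proof.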
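We now turn to the proof of Theorem 35.
Proof of Theorem 35
It is enough to prove the theorem in the case when $S' = S \cup \{p\}$. Let $N$ be the $G_{\mathbf{Q}}$-module $\operatorname{ad}\rho \otimes_{\mathcal{O}} K/\mathcal{O}$. The dual $\operatorname{Hom}_{\mathcal{O}}(\pi_S/\pi_S^2, K/\mathcal{O})$ of $\pi_S/\pi_S^2$ has the same length as $\pi_S/\pi_S^2$ and can be identified with the cohomology group $H^1_S(\mathbf{Q}, N)$. From Proposition 37 we have a diagram
\[ \begin{array}{ccccccccc}
0 & \to & H^1_S(\mathbf{Q}, N) & \to & H^1_{S'}(\mathbf{Q}, N) & \to & C & \to & 0 \\
 & & \downarrow & & \downarrow & & \downarrow & & \\
0 & \to & H^1(G_p/I_p, N^{I_p}) & \to & H^1(G_p, N) & \to & H^1(I_p, N)^{G_p/I_p} & \to & 0
\end{array} \]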
in which the rows are exact and the left-hand square is Cartesian. The module $C$ is by definition the cokernel of the top left-hand map; thus
\[ \operatorname{length}_{\mathcal{O}} \pi_{S'}/\pi_{S'}^2 - \operatorname{length}_{\mathcal{O}} \pi_S/\pi_S^2 = \operatorname{length}_{\mathcal{O}} C, \]
and so to prove the theorem we must bound the length of $C$. By a diagram chase one sees that the right-hand vertical map is injective, so the length of $C$ is bounded by that of $H^1(I_p, N)^{G_p/I_p}$, and, as described in [CDT], the length of this is equal to the length of $\mathcal{O}/d_p\mathcal{O}$, where $d_p$ is the determinant of $\operatorname{Frob}_p - 1$ acting on $(\operatorname{ad}\rho)^{I_p}(1)$. By Lemma 44 this determinant is equal to a unit times $\mu_p$. In the case when $S$ is nonempty, the theorem follows.
In the case when $S$ is empty, we need a little more. Suppose that $S = \emptyset$ and $S' = \{p\}$, and let $(R', \sigma)$ be any $S'$-deformation of $\bar\rho$. Since $\sigma(c)$ necessarily has the form $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, it follows that
\[ 1 = (\det\sigma/\varepsilon_2)(c) = \prod_q \bigl((\det\sigma/\varepsilon_2)|_{W_q} \circ \omega_q^{-1}\bigr)(-1), \]
where the product runs over all primes $q$. For every $q$ not equal to $p$, it follows from Proposition 9 that $(\det\sigma/\varepsilon_2)|_{I_q}$ is the Teichmüller lift of $\det\bar\rho|_{I_q}$, which has odd order, and so $((\det\sigma/\varepsilon_2)|_{W_q} \circ \omega_q^{-1})(-1) = 1$ for each $q \neq p$ and hence also for $q = p$. Now, consider the map
\[ H^1_{S'}(\mathbf{Q}, N) \to H^1(I_p, K/\mathcal{O})^{G_p/I_p} \]
given by $\xi \mapsto \operatorname{tr}\xi|_{I_p}$. The codomain of this map can be identified with the $\mathcal{O}$-module $\operatorname{Hom}(\mathbf{Z}_p^\times, K/\mathcal{O})$; let $D \subset H^1(I_p, K/\mathcal{O})^{G_p/I_p}$ be the submodule corresponding to maps $f : \mathbf{Z}_p^\times \to K/\mathcal{O}$ which send $-1$ to 0; then the inclusion $D \subset H^1(I_p, K/\mathcal{O})^{G_p/I_p}$ has cokernel isomorphic to $\mathcal{O}/2\mathcal{O}$. An element $\xi$ of $H^1_{S'}(\mathbf{Q}, N)$ corresponds to an $S'$-deformation $(1 + \xi)\rho$ of $\rho$ to $R[\mathfrak{m}_{\mathcal{O}}^{-n}/\mathcal{O}]$ for some $n$, and, from the above calculation and the fact that $\det((1 + \xi)\rho) = (1 + \operatorname{tr}\xi)\det\rho$, we see that the image of the map above is contained in $D$. Hence the inverse image of $D$ under the surjective map
\[ H^1(I_p, \operatorname{ad}\rho \otimes_{\mathcal{O}} K/\mathcal{O})^{G_p/I_p} \to H^1(I_p, K/\mathcal{O})^{G_p/I_p} \]
contains $C$ and
\[ \operatorname{length}_{\mathcal{O}} C \leq \operatorname{length}_{\mathcal{O}} H^1(I_p, \operatorname{ad}\rho \otimes K/\mathcal{O})^{G_p/I_p} - \operatorname{length}_{\mathcal{O}} \mathcal{O}/2\mathcal{O}, \]
which is equal to $\operatorname{length}_{\mathcal{O}} \mathcal{O}/(1/2)\mu_p\mathcal{O}$, as required.

12. Numbers of newforms
In this section we prove the two results stated in Proposition 17. Since the cardinality of the set $N_S$ is not affected either by replacing $K$ with a finite extension of $K$ or by replacing $\bar\rho$ with a twist by a character of odd conductor, we assume in this section that $\bar\rho$ is minimal.
12.1. Existence of a minimal modular lift
Recall that $N_\emptyset$ is the set of classical weight 2 newforms of odd level whose associated deformation is an $\emptyset$-deformation. We show that the modularity assumption on $\bar\rho$, together with the assumption on $\bar\rho|_{G_2}$, implies that $N_\emptyset$ is nonempty. From Buzzard's level-lowering result (Proposition 1), there is a weight 2 newform $f$ of odd level equal to the conductor $N(\bar\rho)$ of $\bar\rho$ and for which the reduction of $t_2(f)$ is equal to $\alpha$ if $\bar\rho$ is unramified at 2. From Wiles's result (Proposition 14) we see that the deformation $(R_f, \rho_f)$ satisfies the lifting condition at 2.
Recall that the free $\mathcal{O}$-module $H_\emptyset = \operatorname{Hom}_{V_\emptyset}(M, H)_{\mathfrak{m}_\emptyset}$ has $\mathcal{O}$-rank equal to twice the cardinality of $N_\emptyset$; thus in order to show that $N_\emptyset$ is nonempty it is enough to show that this module is nonzero. We do this by constructing a substitute module $M'$ for $M$ such that $M' \otimes_{\mathcal{O}} k \cong M \otimes_{\mathcal{O}} k$ and $\operatorname{Hom}_{V_\emptyset}(M', H)_{\mathfrak{m}_\emptyset}$ is demonstrably nonzero. Then from the fourth part of Proposition 15 it follows that $H_\emptyset$ is nonzero and $N_\emptyset$ is nonempty.
The definition of the substitute module $M' = \otimes_{\mathcal{O}} M'_p$ is as follows. Let $f$ be the newform above which satisfies the local condition at 2, and assume that $K$ contains $K_f$. Then we have the following.
(1) At $p$ in $T(\bar\rho)$, $\rho_f|_{G_p}$ is induced from some character $\psi_p$ whose reduction is equal to that of $\chi_p$. Let $M'_p$ be a model over $\mathcal{O}$ for $\Theta(\psi_p)$ as described earlier.
(2) If $p$ is not in $T(\bar\rho)$, then $\det\rho_f|_{I_p}$ is a lift of $\bar\rho|_{I_p}$ and corresponds to some character $\psi_p : \mathbf{Z}_p^\times \to \mathcal{O}^\times$. Let $M'_p$ be a rank-1 $\mathcal{O}$-module on which $V_p$ acts by $\psi_p \circ \det$.
Note that for $p$ not in $T(\bar\rho)$ the group $\psi_p(\det V_p)$ has 2-power order and so has trivial reduction. Given these definitions, one can check that the automorphic representation corresponding to $f$ as above contributes nontrivially to $\operatorname{Hom}_{\mathcal{O}[V_\emptyset]}(M', H)_{\mathfrak{m}_\emptyset} \otimes_{\mathcal{O}} K$, and it follows as described above that $N_\emptyset$ is nonempty.

12.2. Cardinalities of sets of newforms
Now, suppose that $Q$ is a finite set of odd primes such that for each $p$ in $Q$ the representation $\bar\rho$ is unramified at $p$ and $\bar\rho(\operatorname{Frob}_p)$ has distinct eigenvalues. We prove the second part of Proposition 17, namely, that $\#N_Q = \#G_Q \#N_\emptyset$, where $G_Q$ is the abelian 2-group defined in Section 2.
We define modules $L_i$ for $0 \leq i \leq 2$ as follows. Let $r$ be a prime congruent to 3 modulo 4 not in $Q$ at which $\bar\rho$ is unramified and $\bar\rho(\operatorname{Frob}_r)$ has distinct $k$-rational eigenvalues. Also, let $s$ be an odd prime (but not a Fermat prime) not in $Q \cup \{r\}$
at which $\bar\rho$ is unramified and $\bar\rho(\operatorname{Frob}_s)$ has distinct $k$-rational eigenvalues, and fix a nontrivial character $\psi : (\mathbf{Z}/s\mathbf{Z})^\times \to \mathcal{O}^\times$ of order not a power of 2 (enlarge $K$ if necessary).
Define subgroups $V_2 \subset V_1 \subset V_0$ with $V_i = \prod_p V_{i,p}$ and an $\mathcal{O}[V_0]$-module $M' = \otimes_p M'_p$ as follows:
(1) at $p$ not in $Q \cup \{r, s\}$, let $V_{i,p} = V_{\emptyset,p}$ and $M'_p = M_p$;
(2) at $r$, let $V_{i,r} = U_1(r)\{\pm I\}$, and let $M'_r$ be a rank-1 $\mathcal{O}$-module with trivial $U_1(r)$-action;
(3) at $s$, let $V_{i,s} = U_0(s)$, and let $M'_s$ be a rank-1 $\mathcal{O}$-module on which $U_0(s)$ acts by the character $U_0(s) \to U_0(s)/U_1(s) \cong (\mathbf{Z}/s\mathbf{Z})^\times \to \mathcal{O}^\times$ arising from $\psi$;
(4) at $p$ in $Q$, let $V_{0,p} = U_0(p)$, let $V_{1,p}$ be the set of all $\begin{pmatrix} a & b \\ cp & d \end{pmatrix}$ in $U_0(p)$ such that $d$ has odd order in $(\mathbf{Z}/p\mathbf{Z})^\times$, and let $V_{2,p}$ be the set of all $\begin{pmatrix} a & b \\ cp & d \end{pmatrix}$ in $U_0(p)$ such that both $a$ and $d$ have odd order in $(\mathbf{Z}/p\mathbf{Z})^\times$; let $M'_p = \mathcal{O}$ have trivial $U_{0,p}$-action.
Let $T$ be the polynomial algebra over $\mathcal{O}$ generated by operators $T_p$ for odd primes $p$ not in $Q \cup \{r, s\}$ at which $\bar\rho$ is unramified and by $T_2$, and let $\mathfrak{m}$ be the maximal ideal of $T$ which sends $T_2$ to $\alpha$ and $T_p$ to $\operatorname{tr}\bar\rho(\operatorname{Frob}_p)$ for $p$ odd, as before. Now, define modules $L_i$ by
\[ L_i = \operatorname{Hom}_{V_i}(M, H)_{\mathfrak{m}} \cong H^1(Y_{V_i}, \mathcal{F}_{\check{M}})_{\mathfrak{m}} \]
for $0 \leq i \leq 2$, where $\check{M}$ denotes the right $V_i$-module $\operatorname{Hom}_{\mathcal{O}}(M, \mathcal{O})$ and the isomorphism follows from Proposition 15. Then, using the methods of proof of Lemmas 22 and 23, we can show that an automorphic representation $\pi$ contributes nontrivially to $L_0 \otimes_{\mathcal{O}} \bar{K}$ if and only if $\pi \otimes \psi$ corresponds to an element of $N_\emptyset$, and it contributes nontrivially to the larger module $L_2 \otimes_{\mathcal{O}} \bar{K}$ if and only if $\pi \otimes \psi$ corresponds to an element of $N_Q$; in both cases the dimension of the contribution is exactly $2^{\#Q+1}$. The key observation is the following.
LEMMA 46
Suppose that $f$ is a weight 2 newform from which $\bar\rho$ arises, with corresponding automorphic representation $\pi_f = \otimes'_p \pi_p$, and let $p$ be a prime in $Q$. Then we have the following:
(1) the subspace $\pi_p^{V_{2,p}}$ of $\pi_p$ has dimension 2 over $\bar{K}$;
(2) if $\pi_p^{V_{0,p}}$ is nontrivial, then $\pi_p$ is unramified and $\pi_p^{V_{0,p}}$ has dimension 2 over $\bar{K}$.
Proof
By our assumption on $Q$, $\bar\rho$ is unramified at $p$ and the restriction $\bar\rho|_{G_p}$ has distinct $k$-rational eigenvalues; it follows that the deformation $\rho_f|_{G_p}$ of $\bar\rho$ is diagonalisable by Proposition 9, with both components at worst tamely ramified. Using Carayol's
theorem and the description of the local Langlands correspondence, it follows that $\pi_p = \pi(\chi_1, \chi_2)$ is principal series, corresponding to characters $\chi_1 : \mathbf{Q}_p^\times \to \mathcal{O}^\times$ and $\chi_2 : \mathbf{Q}_p^\times \to \mathcal{O}^\times$ which are trivial on $1 + p\mathbf{Z}_p$.
Write $B_2(\mathbf{Q}_p)$ for the subgroup of upper-triangular matrices of $\mathrm{GL}_2(\mathbf{Q}_p)$. Let $V$ be any open compact subgroup of $\mathrm{GL}_2(\mathbf{Q}_p)$; then using the explicit description of the principal series representation given in Section 3, $\pi_p^V$ can be described as the space of maps $f : \mathrm{GL}_2(\mathbf{Q}_p)/V \to \bar{K}$ of $B_2(\mathbf{Q}_p)$-sets, where $B_2(\mathbf{Q}_p)$ acts on $\mathrm{GL}_2(\mathbf{Q}_p)/V$ by left multiplication and on $\bar{K}$ by the character $(\chi_1, \chi_2) : B_2(\mathbf{Q}_p) \to \bar{K}^\times$ given by $\begin{pmatrix} a & b \\ 0 & d \end{pmatrix} \mapsto \chi_1(a)\chi_2(d)$. Choose a set $S \subset \mathrm{GL}_2(\mathbf{Q}_p)$ of representatives for $B_2(\mathbf{Q}_p)\backslash \mathrm{GL}_2(\mathbf{Q}_p)/V$; then restriction to $S$ identifies $\pi(\chi_1, \chi_2)^V$ with the space of maps $f : S \to \bar{K}$ such that for each $s$ in $S$ the image $f(s)$ is fixed by $sVs^{-1} \cap B_2(\mathbf{Q}_p)$. If $V$ is either $V_{0,p}$ or $V_{2,p}$, then we may take $S$ to be the set $\left\{ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \right\}$, and then it is straightforward to check that $\pi^{U_2(p)}$ has dimension 2 over $\bar{K}$ and that $\pi^{U_0(p)}$ is trivial unless $\chi_1$ and $\chi_2$ are unramified, in which case it is 2-dimensional.
Since each $\pi$ occurs twice in $H \otimes_{\mathcal{O}} \bar{K}$, we have the following.
LEMMA 47
The $\mathcal{O}$-rank of $L_0$ is equal to $2^{\#Q+2}\#N_\emptyset$, and the $\mathcal{O}$-rank of $L_2$ is equal to $2^{\#Q+2}\#N_Q$.
Thus all we have to do is show that the $\mathcal{O}$-rank of $L_2$ is equal to $\#G_Q$ times the $\mathcal{O}$-rank of $L_0$. Note that $Y_{V_2}$ is disconnected, having a number of components equal to the index of $V_2$ in $V_1$, which is equal to the cardinality of the maximal 2-power quotient of $\prod_{p \in Q} (\mathbf{Z}/p\mathbf{Z})^\times$, while $Y_{V_1}$ is connected and the natural map $Y_{V_2} \to Y_{V_1}$ induced by the inclusion $V_2 \to V_1$ is an isomorphism when restricted to any connected component of $Y_{V_2}$. The natural map $Y_{V_i} \to Y_{V_j}$ for $j \leq i$ is unramified since $Y_{V_j}$ has no elliptic points, and there is an obvious isomorphism of the sheaf $\mathcal{F}_{\check{M}}$ on $Y_{V_i}$ with the pullback of the sheaf $\mathcal{F}_{\check{M}}$ on $Y_{V_j}$. Thus for every connected component $Z$ of $Y_{V_2}$ we obtain an isomorphism
\[ H^1(Z, (\mathcal{F}_{\check{M}})|_Z) \to H^1(Y_{V_1}, \mathcal{F}_{\check{M}}) \]
of $\mathcal{O}$-modules from which it follows that the $\mathcal{O}$-rank of $L_2$ is equal to the cardinality of the maximal 2-power quotient of $\prod_{p \in Q} (\mathbf{Z}/p\mathbf{Z})^\times$ times the $\mathcal{O}$-rank of $L_1$. It remains to show that the $\mathcal{O}$-rank of $L_1$ is equal to the cardinality of the maximal 2-power quotient of $\bigl(\prod_{p \in Q} (\mathbf{Z}/p\mathbf{Z})^\times\bigr)/\{\pm 1\}$ times the $\mathcal{O}$-rank of $L_0$, and this can be proved exactly as in [CDT, Lem. 6.4.3]. This completes the proof of Proposition 17.

Acknowledgments
I would like to thank my thesis adviser Richard Taylor for suggesting this project and for his constant availability and readiness to answer questions during my work on it. I
would also like to thank Fred Diamond for helpful conversations, and the referee for pointing out a serious error in the original version of Lemma 43 and for making many other corrections and useful suggestions.

References
[BCDT] C. BREUIL, B. CONRAD, F. DIAMOND, and R. TAYLOR, On the modularity of elliptic curves over Q, to appear in J. Amer. Math. Soc.
[B] K. BUZZARD, On level-lowering for mod 2 representations, Math. Res. Lett. 7 (2000), 95–110. MR 2001a:11080
[BDST] K. BUZZARD, M. DICKINSON, N. SHEPHERD-BARRON, and R. TAYLOR, On icosahedral Artin representations, Duke Math. J. 109 (2001), 283–318.
[BT] K. BUZZARD and R. TAYLOR, Companion forms and weight one forms, Ann. of Math. (2) 149 (1999), 905–919. MR 2000j:11062
[C] H. CARAYOL, Sur les représentations l-adiques associées aux formes modulaires de Hilbert, Ann. Sci. École Norm. Sup. (4) 19 (1986), 409–468. MR 89c:11083
[Ca] W. CASSELMAN, On some results of Atkin and Lehner, Math. Ann. 201 (1973), 301–314. MR 49:2558
[CV] R. F. COLEMAN and J. F. VOLOCH, Companion forms and Kodaira-Spencer theory, Invent. Math. 110 (1992), 263–281. MR 93i:11063
[CDT] B. CONRAD, F. DIAMOND, and R. TAYLOR, Modularity of certain potentially Barsotti-Tate Galois representations, J. Amer. Math. Soc. 12 (1999), 521–567. MR 99i:11037
[CR] B. CONRAD and K. RUBIN, eds., Arithmetic Algebraic Geometry (Park City, Utah, 1999), Amer. Math. Soc., Providence, to appear.
[CSS] G. CORNELL, J. H. SILVERMAN, and G. STEVENS, eds., Modular Forms and Fermat's Last Theorem (Boston, 1995), Springer, New York, 1997. MR 99k:11004
[DDT] H. DARMON, F. DIAMOND, and R. TAYLOR, "Fermat's last theorem" in Current Developments in Mathematics (Cambridge, Mass., 1995), Internat. Press, Cambridge, Mass., 1994, 1–154. MR 99d:11067a
[Dia1] F. DIAMOND, On deformation rings and Hecke rings, Ann. of Math. (2) 144 (1996), 137–166. MR 97d:11172
[Dia2] F. DIAMOND, "An extension of Wiles' results" in Modular Forms and Fermat's Last Theorem (Boston, 1995), Springer, New York, 1997, 475–489. MR CMP 1 638 490
[Dia3] F. DIAMOND, The Taylor-Wiles construction and multiplicity one, Invent. Math. 128 (1997), 379–391. MR 98c:11047
[DI] F. DIAMOND and J. IM, "Modular forms and modular curves" in Seminar on Fermat's Last Theorem (Toronto, 1993/1994), CMS Conf. Proc. 17, Amer. Math. Soc., Providence, 1995, 39–133. MR 97g:11044
[Dic] L. E. DICKSON, Linear Groups: With an Exposition of the Galois Field Theory, Teubner, Leipzig, 1901; reprint, Dover, New York, 1958. MR 21:3488
[E] B. EDIXHOVEN, The weight in Serre's conjectures on modular forms, Invent. Math. 109 (1992), 563–594. MR 93h:11124
[FM] J.-M. FONTAINE and B. MAZUR, "Geometric Galois representations" in Elliptic Curves, Modular Forms, and Fermat's Last Theorem (Hong Kong, 1993), Ser. Number Theory 1, Internat. Press, Cambridge, Mass., 1995, 41–78. MR 96h:11049
[G] F. Q. GOUVÊA, "Deformation theory" to appear in Arithmetic Algebraic Geometry (Park City, Utah, 1999), ed. B. Conrad and K. Rubin, Amer. Math. Soc., Providence.
[Gr] B. H. GROSS, A tameness criterion for Galois representations associated to modular forms (mod p), Duke Math. J. 61 (1990), 445–517. MR 91i:11060
[Gro] A. GROTHENDIECK, Technique de descente et théorèmes d'existence en géométrie algébrique, II: Le théorème d'existence en théorie formelle des modules, Séminaire Bourbaki 5, 1958/59–1959/60, Soc. Math. France, Montrouge, 1995, 369–390, exp. no. 195. MR CMP 1 603 480
[K] S. S. KUDLA, "The local Langlands correspondence: The non-Archimedean case" in Motives (Seattle, 1991), Proc. Sympos. Pure Math. 55, Part 2, Amer. Math. Soc., Providence, 1994, 365–391. MR 95d:11065
[Ku] P. KUTZKO, The Langlands conjecture for Gl_2 of a local field, Ann. of Math. (2) 112 (1980), 381–412. MR 82e:12019
[ML] S. MACLANE, Categories for the Working Mathematician, Grad. Texts in Math. 5, Springer, New York, 1971. MR 50:7275
[RS] K. A. RIBET and W. A. STEIN, "Lectures on Serre's conjectures" to appear in Arithmetic Algebraic Geometry (Park City, Utah, 1999), ed. B. Conrad and K. Rubin, Amer. Math. Soc., Providence.
[S1] J.-P. SERRE, Linear Representations of Finite Groups, Grad. Texts in Math. 42, Springer, New York, 1977. MR 56:8675
[S2] J.-P. SERRE, Sur les représentations modulaires de degré 2 de Gal(Q̄/Q), Duke Math. J. 54 (1987), 179–230. MR 88g:11022
[ST] N. I. SHEPHERD-BARRON and R. TAYLOR, mod 2 and mod 5 icosahedral representations, J. Amer. Math. Soc. 10 (1997), 283–298. MR 97h:11060
[dSL] B. DE SMIT and H. W. LENSTRA, JR., "Explicit construction of universal deformation rings" in Modular Forms and Fermat's Last Theorem (Boston, 1995), Springer, New York, 1997, 313–326. MR CMP 1 638 482
[T] J. TATE, "Number theoretic background" in Automorphic Forms, Representations and L-Functions (Corvallis, Ore., 1977), Part 2, Proc. Sympos. Pure Math. 33, Amer. Math. Soc., Providence, 1979, 3–26. MR 80m:12009
[Ta] R. TAYLOR, "Icosahedral Galois representations" in Olga Taussky-Todd: In Memoriam, Pacific J. Math. 1997, special issue, 337–347. MR 99d:11057
[TW] R. TAYLOR and A. WILES, Ring-theoretic properties of certain Hecke algebras, Ann. of Math. (2) 141 (1995), 553–572. MR 96d:11072
[W] J.-L. WALDSPURGER, Sur les valeurs de certaines fonctions L automorphes en leur centre de symétrie, Compositio Math. 54 (1985), 173–242. MR 87g:11061b
[Wa] F. W. WARNER, Foundations of Differentiable Manifolds and Lie Groups, Grad. Texts in Math. 94, Springer, New York, 1983. MR 84k:58001
[Wi1] A. WILES, On ordinary λ-adic representations associated to modular forms, Invent. Math. 94 (1988), 529–573. MR 89j:11051
[Wi2] A. WILES, Modular elliptic curves and Fermat's last theorem, Ann. of Math. (2) 141 (1995), 443–551. MR 96d:11071
Department of Mathematics, University of Michigan, 2074 East Hall, 525 East University Avenue, Ann Arbor, Michigan 48109-1109, USA; [email protected]
DUKE MATHEMATICAL JOURNAL, Vol. 109, No. 2, © 2001
CUBIC RINGS AND THE EXCEPTIONAL JORDAN ALGEBRA
NOAM D. ELKIES AND BENEDICT H. GROSS
Abstract
In a previous paper [EG] we described an integral structure $(J, E)$ on the exceptional Jordan algebra of Hermitian $3 \times 3$ matrices over the Cayley octonions. Here we use modular forms and Niemeier's classification of even unimodular lattices of rank 24 to further investigate $J$ and the integral, even lattice $J_0 = (\mathbf{Z}E)^\perp$ in $J$. Specifically, we study ring embeddings of totally real cubic rings $A$ into $J$ which send the identity of $A$ to $E$, and we give a new proof of R. Borcherds's result that $J_0$ is characterized as a Euclidean lattice by its rank, type, discriminant, and minimal norm.

Contents
0. Preface
1. The integral structure $(J, E)$
2. Embeddings of cubic rings
3. The $A$-module structure on $L$
4. A Hilbert modular form
5. The case $D = p^2$
6. The case $D = 49$
7. The case $D = 16$
8. The case $D = 32$
9. The uniqueness of $J_0$
References
0. Preface
In a previous paper [EG] we described an integral structure $(J, E)$ on the exceptional cone in $\mathbf{R}^{27}$ and studied the integral, even lattice $J_0 = (\mathbf{Z}E)^\perp$ of rank 26 and discriminant 3. In this paper we study ring embeddings $f : A \to J$ of totally real cubic rings $A$ into $J$, mapping the identity element 1 of $A$ to the polarization $E = f(1)$ of $J$.

Received 21 June 2000. Revision received 8 November 2000. 2000 Mathematics Subject Classification. Primary 11F30, 11H55, 11H56. Elkies's work supported in part by the Packard Foundation.
ELKIES AND GROSS
We first show how such an embedding gives rise to an integral, even lattice $L = A^\perp$ of rank 24, as well as a holomorphic Hilbert modular form $F(\tau)$ of weight $(4,4,4)$ for the discrete group $\mathrm{SL}_2(A) \subset \mathrm{SL}_2(\mathbb{R})^3$. We then establish some general results on the lattice $L$ and the form $F(\tau)$. In particular, when the discriminant of $A$ is a square, we show there is a Niemeier lattice $M$ between $L$ and its dual lattice $L^\vee$, which is determined by the embedding $f : A \to J$.

We then give some examples. In particular, when $A = \mathbb{Z}[\cos(2\pi/7)] = \mathbb{Z}[\alpha]/(\alpha^3 + \alpha^2 - 2\alpha - 1)$ is the Dedekind domain of discriminant $D = 49$, and when $A = \{(a, b, c) \in \mathbb{Z}^3 : a \equiv b \equiv c \pmod 2\} = \mathbb{Z} + 2\mathbb{Z}^3$ has discriminant $D = 16$, the embedding $f$ is unique up to conjugation by the finite group $\mathrm{Aut}(J, E)$. In these cases we determine the lattices $L$ and $M$ and the Hilbert modular form $F(\tau)$.

In [B, Ch. 5.7], Borcherds proved that $J_0$ is characterized by the following properties: it is an even integral lattice of rank 26, discriminant 3, and minimal norm 4. His proof requires detailed knowledge of the Lorentzian lattice $II_{25,1}$. But one can also prove this uniqueness result using only theta functions and elementary Euclidean arguments, somewhat in the spirit of J. Conway's characterization in [C] of the Leech lattice by its rank, discriminant, and minimal norm. Starting from any even lattice $L \subset \mathbb{R}^{26}$ of discriminant 3 and minimal norm 4, the proof examines the configuration of minimal vectors of $L$ and its dual, showing that $L$ shares further combinatorial properties with $J_0$ until $L$ is forced to coincide with $J_0$. Most of these properties are also needed in our investigation of the algebra $J$; in particular, the Niemeier lattice for $D = 16$ arises naturally in the course of the uniqueness proof. Thus several steps in that proof also provide alternative explanations for facts about $J$ and $J_0$ that we used in [EG] to analyze embeddings of cubic rings. In the final section of our paper we indicate these steps of the proof; a fuller treatment of that uniqueness proof will appear elsewhere.

1. The integral structure (J, E)
In this section we recall the results from [EG] which we need. Let $R \subset \mathbb{O}$ be the Coxeter order in Cayley's octonion algebra (see [EG, (5.1)]), and let $\beta$ in $R$ be defined by
\[ \beta = -\tfrac{1}{2} + \tfrac{1}{2}(e_1 + e_2 + \cdots + e_7). \tag{1.1} \]
Let $J$ be the $\mathbb{Z}$-lattice, of rank 27, consisting of the $3 \times 3$ Hermitian symmetric matrices over $R$. (This lattice was denoted $L$ in [EG], where $J$ was used for the real vector space containing the lattice, here denoted $J \otimes \mathbb{R}$.) An element $B$ of $J$ has the form
\[ B = \begin{pmatrix} a & z & \bar{y} \\ \bar{z} & b & x \\ y & \bar{x} & c \end{pmatrix} \]
CUBIC RINGS AND THE EXCEPTIONAL JORDAN ALGEBRA
with $a, b, c$ in $\mathbb{Z}$ and $x, y, z$ in $R$. In particular, we define the element $E$ in $J$ by
\[ E = \begin{pmatrix} 2 & \beta & \bar{\beta} \\ \bar{\beta} & 2 & \beta \\ \beta & \bar{\beta} & 2 \end{pmatrix} \tag{1.2} \]
(see [EG, (5.4)]). The function
\[ d(B) = abc + \mathrm{Tr}(xyz) - a\,x\bar{x} - b\,y\bar{y} - c\,z\bar{z} \tag{1.3} \]
(see [EG, (1.7)]) defines a cubic form $d : J \to \mathbb{Z}$. This gives a symmetric trilinear form $(B, B', B'')$ with $(B, B, B) = 6d(B)$. Since $\beta\bar{\beta} = 2$ and $\mathrm{Tr}(\beta^3) = 5$, we find that $d(E) = 1$. Since $E$ is also positive definite (see [EG, (1.12)]), it defines a polarization of $J$. The finite group $\mathrm{Aut}(J, d, E)$ has order $2^{12} \cdot 3^5 \cdot 7^2 \cdot 13$ and is isomorphic to ${}^3D_4(2).3$ (see [EG, (7.7)]). From $E$ and the cubic form, we obtain a linear form on $J$:
\[ T(B) = \tfrac{1}{2}(E, E, B) = 2(a + b + c) + \mathrm{Tr}(x\beta) + \mathrm{Tr}(y\beta) + \mathrm{Tr}(z\beta). \tag{1.3a} \]
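The arithmetic behind $d(E) = 1$ can be checked by machine. The following is a minimal sketch, assuming only that all entries of $E$ lie in the commutative subring $\mathbb{Z}[\beta]$ with $\beta^2 = -\beta - 2$ (which follows from $\mathrm{Tr}(\beta) = -1$ and $\beta\bar{\beta} = 2$); it does not model general octonion entries. The helper names (`mul`, `conj`, `tr`, `norm`, `cubic_form`, `linear_form`) are illustrative and do not come from [EG].

```python
# Elements u + v*beta of Z[beta] are stored as integer pairs (u, v),
# with the relation beta^2 = -beta - 2.

def mul(x, y):
    """Multiply two elements of Z[beta], reducing with beta^2 = -beta - 2."""
    (a, b), (c, d) = x, y
    return (a * c - 2 * b * d, a * d + b * c - b * d)

def conj(x):
    """Conjugation, using conj(beta) = -1 - beta."""
    u, v = x
    return (u - v, -v)

def tr(x):
    """Trace x + conj(x), an ordinary integer."""
    u, v = x
    return 2 * u - v

def norm(x):
    """Norm x * conj(x), an ordinary integer."""
    u, v = mul(x, conj(x))
    assert v == 0
    return u

BETA = (0, 1)

def cubic_form(a, b, c, x, y, z):
    """d(B) = abc + Tr(xyz) - a N(x) - b N(y) - c N(z), as in (1.3)."""
    return (a * b * c + tr(mul(mul(x, y), z))
            - a * norm(x) - b * norm(y) - c * norm(z))

def linear_form(a, b, c, x, y, z):
    """T(B) = 2(a+b+c) + Tr(x beta) + Tr(y beta) + Tr(z beta), as in (1.3a)."""
    return 2 * (a + b + c) + sum(tr(mul(w, BETA)) for w in (x, y, z))

beta_cubed = mul(mul(BETA, BETA), BETA)
print(norm(BETA), tr(beta_cubed))               # 2 5
print(cubic_form(2, 2, 2, BETA, BETA, BETA))    # 1
print(linear_form(2, 2, 2, BETA, BETA, BETA))   # 3
```

For $E$ one has $a = b = c = 2$ and $x = y = z = \beta$, so the last two lines evaluate $d(E)$ and $T(E)$.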
We also obtain two symmetric bilinear forms on $J$:
\[ (B, B') = (E, B, B'), \qquad \langle B, B' \rangle = -(B, B') + T(B)T(B'). \]
The first is even, of signature (1, 26) and discriminant 2. The second is positive definite and unimodular (see [EG, (7.2)]). We have $\langle B, B \rangle \equiv \langle E, B \rangle \pmod 2$ for all $B$ in $J$; that is, $E$ is a characteristic vector of the lattice. On the lattice of rank 26,
\[ J_0 = \{B \in J : T(B) = 0\} = \{B \in J : \langle B, E \rangle = 0\} = \{B \in J : (B, E) = 0\}, \]
we have the formula $(B, B') = -\langle B, B' \rangle$. (This lattice was denoted $L_0$ in [EG, §8].) It is even, of discriminant 3, and has no roots (see [EG, (8.4)]). Its theta function was determined in [EG, (8.7)].
Recall that the Jordan roots $S$ in $J$ are the matrices of rank 1 (see [EG, (1.8)]) which satisfy $T(S) = 2$. There are $819 = 3^2 \cdot 7 \cdot 13$ Jordan roots, permuted transitively by the group $\mathrm{Aut}(J, d, E)$ (see [EG, (7.8)]). If $S$ is a fixed Jordan root, then $\langle S, S' \rangle = 4, 2, 1, 0$ for precisely $1, 288, 512, 18$ Jordan roots $S'$, respectively (see [EG, (8.9)]). Moreover, the 18 roots $S'$ orthogonal to $S$ come in 9 pairs $(S', S'')$, with $\langle S', S'' \rangle = 0$ and $2E = S + S' + S''$. These are the root triples containing $S$ (see [EG, (7.8)]). If $S'$ and $T'$ are orthogonal to $S$, with $S' \neq T'$ and $\langle S', T' \rangle \neq 0$, then $\langle S', T' \rangle = 2$. Indeed, $4 = \langle 2E, T' \rangle = 0 + \langle S', T' \rangle + \langle S'', T' \rangle$, so $\langle S', T' \rangle = \langle S'', T' \rangle = 2$.

In [EG, §8], we showed that the short vectors $v$ in $J_0^\vee$ have the form
\[ v = \pm\left(S - \tfrac{2}{3}E\right), \]
where $S$ is a Jordan root. From this we shall determine all elements $B$ in $J$ with $\langle B, B \rangle \leq 4$.

PROPOSITION 1.4
If $\langle B, B \rangle = 0$, then $B = 0$. There are no $B$ in $J$ with $\langle B, B \rangle = 1$ or $\langle B, B \rangle = 2$. If $\langle B, B \rangle = 3$, then either
\[ B = \pm E \quad\text{and}\quad T(B) = \pm 3, \]
or
\[ B = \pm(E - S) \quad\text{and}\quad T(B) = \pm 1 \]
for a unique Jordan root $S$. If $\langle B, B \rangle = 4$, then either
\[ B = \pm S \quad\text{and}\quad T(B) = \pm 2 \]
for a unique Jordan root $S$, or
\[ B = S_1 - S_2 \quad\text{and}\quad T(B) = 0 \]
for a pair $(S_1, S_2)$ of Jordan roots with $\langle S_1, S_2 \rangle = 2$. There are precisely 2 representations of $B$ as a difference of Jordan roots.
Proof
The first statement is clear, as $\langle \cdot, \cdot \rangle$ is a positive-definite pairing. Since
\[ \mathbb{Z}E \oplus J_0 \subset_3 J \subset_3 \tfrac{1}{3}\mathbb{Z}E \oplus J_0^\vee \]
(each inclusion being of index 3), we may write
\[ B = \frac{a}{3}E + v \]
with $v$ in $J_0^\vee$, and $a$ an integer. The class of $a$ mod 3 is determined by the class of $v$ in $J_0^\vee/J_0$. Since $\langle E, E \rangle = 3$,
\[ \langle B, B \rangle = \frac{a^2}{3} + \langle v, v \rangle. \]
If $v = 0$, then $a = 3b$ for some nonzero integer $b$, and $B = bE$. Then $\langle B, B \rangle \geq 3$, with equality if and only if $b = \pm 1$. Otherwise, $\langle B, B \rangle \geq 12$. If $v \neq 0$, we have $\langle v, v \rangle \geq 8/3$, with equality if and only if $v = \mp(S - (2/3)E)$ for a unique Jordan root $S$ (see [EG, (8.4)]). Taking $a = \pm 1$, we obtain the elements $B = \pm(E - S)$, with
\[ \langle B, B \rangle = 3, \qquad T(B) = \pm 1. \]
Taking $a = \mp 2$, we obtain the elements $B = \mp S$, with
\[ \langle B, B \rangle = 4, \qquad T(B) = \mp 2. \]
If $\langle v, v \rangle > 8/3$, then $\langle v, v \rangle \geq 4$, with equality implying that $v$ lies in $J_0$. We conclude the proof by showing that
\[ v = S_1 - S_2, \qquad \langle S_1, S_2 \rangle = 2, \]
in precisely two distinct ways. Since there are $144 \cdot 819$ short vectors $v$ in $J_0$ (see [EG, (8.7)]) and $288 \cdot 819$ ordered pairs $(S_1, S_2)$ of Jordan roots with $\langle S_1, S_2 \rangle = 2$, we will get all the short vectors, provided that we show each $S_1 - S_2$ has precisely one further representation as $T_1 - T_2$. If $S_1 - S_2 = T_1 - T_2$ with $S_i \neq T_i$, we have
\[ 2 = \langle S_1, T_1 - T_2 \rangle = \langle S_1, T_1 \rangle - \langle S_1, T_2 \rangle. \]
Hence $\langle S_1, T_1 \rangle = 2$ and $\langle S_1, T_2 \rangle = 0$. Similarly, $\langle T_1, S_2 \rangle = 0$, so we have
\[ S_1 + T_2 + R = 2E, \qquad T_1 + S_2 + R' = 2E \]
for Jordan roots $R$ (orthogonal to $S_1$ and $T_2$) and $R'$ (orthogonal to $T_1$ and $S_2$). But $S_1 + T_2 = T_1 + S_2$, so $R = R'$ is orthogonal to both $(S_1, S_2)$ and $(T_1, T_2)$. Conversely, such an $R$ orthogonal to $(S_1, S_2)$ gives us another pair $(T_1, T_2)$. We conclude the proof by proving the following lemma.

LEMMA 1.5
If $S_1$ and $S_2$ are Jordan roots with $\langle S_1, S_2 \rangle = 2$, there is a unique Jordan root $R$ orthogonal to $S_1$ and $S_2$.

Proof
As we noted in [EG, (8.9)] (by invoking the Atlas [A]), $\mathrm{Aut}(J_0)$ acts transitively on pairs $(S_1, S_2)$ of Jordan roots such that $\langle S_1, S_2 \rangle = 2$. Thus the number of Jordan roots orthogonal to both $S_1, S_2$ is a constant independent of the choice of $S_1, S_2$; call this constant $n$. We determine $n$ by counting in two ways the triples $(S_1, S_2, R)$ of Jordan roots with the above inner products. On the one hand, there are 819 choices for $S_1$, then 288 choices for $S_2$, then $n$ choices for $R$, for a total of $819 \cdot 288 \cdot n$ triples. On the other hand, there are 819 choices for $R$, then 18 choices for $S_2$, then 16 choices for $S_1$, so $819 \cdot 18 \cdot 16 = 819 \cdot 288$ triples. Hence $n = 1$, as claimed.

2. Embeddings of cubic rings
Let $A$ be a cubic ring, by which we mean a commutative ring with unit which is isomorphic as an additive group with $\mathbb{Z}^3$. Assume that $A$ is totally real, that is, that $A \otimes \mathbb{R} \simeq \mathbb{R}^3$. Let $\mathbf{N} : A \to \mathbb{Z}$ be the norm, which is a cubic form. Let $f : A \to J$ be a homomorphism of abelian groups. We say $f$ is an embedding if the following three conditions hold:
\[ d(f(a)) = \mathbf{N}a \quad\text{for all } a \in A, \tag{2.1} \]
\[ f(1) = E, \tag{2.2} \]
\[ \text{the abelian group } J/f(A) \text{ is torsion-free.} \tag{2.3} \]
The first two conditions imply that $f$ is a ring homomorphism
\[ f(a \cdot b) = f(a) \circ_E f(b) \tag{2.4} \]
(see [GG, Lem. 2]), where $B \circ_E B'$ is the Jordan product on $J \otimes \mathbb{Z}[1/2]$ defined in [EG, (2.15)]. The condition (2.3) implies that the embedding $f$ does not extend to a larger order $A_0 \supset A$ in the étale algebra $A \otimes \mathbb{Q}$. Since only maximal orders were considered in [GG], this condition was unnecessary there.

By (2.1) and (2.2), we have $d(xE - f(a)) = \mathbf{N}(x - a)$ as cubic polynomials over $\mathbb{Z}$. Hence
\[ T(f(a)) = \mathrm{Tr}(a), \qquad \langle f(a), f(b) \rangle = T(f(a) \circ_E f(b)) = \mathrm{Tr}(a \cdot b). \]
Let $A^\vee \subset A \otimes \mathbb{Q}$ be the lattice dual to $A$ under the form $\langle a, b \rangle = \mathrm{Tr}(a \cdot b)$; the finite $A$-module $A^\vee/A$ has order $D = \mathrm{disc}(A)$. Let $L = f(A)^\perp$ be the subgroup, of rank 24, of elements $B$ of $J$ which are orthogonal to the image of $A$. Then $L \subset J_0$ is an even lattice, and $L^\perp = f(A)$, by (2.3). We have inclusions
\[ A \oplus L \subset J \subset A^\vee \oplus L^\vee. \tag{2.5} \]

PROPOSITION 2.6
The projections onto the first and second components define isomorphisms of finite abelian groups
\[ \alpha : J/(A \oplus L) \simeq A^\vee/A, \qquad \beta : J/(A \oplus L) \simeq L^\vee/L. \]
If $\beta \circ \alpha^{-1} = \gamma : A^\vee/A \simeq L^\vee/L$, then for all $a, b$ in $A^\vee$ we have
\[ \langle \gamma a, \gamma b \rangle \equiv -\langle a, b \rangle \pmod{\mathbb{Z}}. \]

Proof
We have $(A \oplus L)^\vee = A^\vee \oplus L^\vee$. Since $J$ is unimodular, the index $d$ of $A \oplus L$ in $J$ is equal to the index of $J$ in $A^\vee \oplus L^\vee$. Since the maps $\alpha$ and $\beta$ are both injective, we have $d \leq \#(A^\vee/A)$ and $d \leq \#(L^\vee/L)$. But by the above remark, $d^2 = \#(A^\vee/A) \cdot \#(L^\vee/L)$. Hence the maps $\alpha$ and $\beta$ are both isomorphisms, and we have $d = \#(L^\vee/L) = \#(A^\vee/A) = D$.

Define $t_A : J \to A^\vee$ as follows: $t_A(B)$ is the first component of $B$ in the decomposition $J \subset A^\vee + L^\vee$. Then
\[ \mathrm{Tr}(t_A(B)) = \langle 1, t_A(B) \rangle = \langle E, B \rangle = T(B) \in \mathbb{Z}. \tag{2.7} \]
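As an illustration of the invariant $D = \mathrm{disc}(A) = \#(A^\vee/A)$, the sketch below computes the determinant of the trace form $\mathrm{Tr}(a \cdot b)$ for the two rings named in the preface. The function names are illustrative, and the basis $(1,1,1), (2,0,0), (0,2,0)$ chosen for $\mathbb{Z} + 2\mathbb{Z}^3$ is an assumption of this sketch (any $\mathbb{Z}$-basis gives the same determinant).

```python
def det3(m):
    """Determinant of a 3x3 integer matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def power_sums(c2, c1, c0, count):
    """Power sums p_0, p_1, ... of the roots of x^3 + c2 x^2 + c1 x + c0
    (Newton's identities)."""
    p = [3, -c2, c2 * c2 - 2 * c1]
    while len(p) < count:
        p.append(-(c2 * p[-1] + c1 * p[-2] + c0 * p[-3]))
    return p[:count]

def disc_monogenic(c2, c1, c0):
    """Discriminant of Z[alpha]/(alpha^3 + c2 alpha^2 + c1 alpha + c0),
    computed as det of the Gram matrix Tr(alpha^(i+j)), 0 <= i, j <= 2."""
    p = power_sums(c2, c1, c0, 5)
    return det3([[p[i + j] for j in range(3)] for i in range(3)])

def disc_from_basis(basis):
    """Discriminant of a subring of Z^3 with the given Z-basis.
    On Z^3 the trace form Tr(xy) is the ordinary dot product."""
    gram = [[sum(x * y for x, y in zip(u, v)) for v in basis] for u in basis]
    return det3(gram)

# A = Z[alpha]/(alpha^3 + alpha^2 - 2 alpha - 1)
print(disc_monogenic(1, -2, -1))                            # 49
# A = Z + 2 Z^3 = {(a, b, c) : a = b = c mod 2}
print(disc_from_basis([(1, 1, 1), (2, 0, 0), (0, 2, 0)]))   # 16
```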
3. The A-module structure on L
The lattice $J_0$ is even, so $q(v) = \langle v, v \rangle/2$ defines a positive-definite quadratic form $q : J_0 \to \mathbb{Z}$. In this section we define an $A$-module structure on the lattice $L = f(A)^\perp \subset J_0$ and define a positive-definite quadratic map of $A$-modules
\[ q_A : L \to A^\vee \tag{3.1} \]
such that $\mathrm{Tr}(q_A) = q$ on $L$.

The $A$-module structure on $L$ is due to T. Springer (cf. [KMRT]). It comes from the $E$-adjoint map $B \mapsto B^\#$ on $J$ (cf. [EG, (2.21)], where $B^\#$ was denoted $B^*_E$). This is a quadratic map, which satisfies
\[ B \circ_E B^\# = B^\# \circ_E B = d(B) \cdot E, \]
\[ q(v) = -\langle v^\#, E \rangle \quad\text{for all } v \in J_0 \]
(see [EG, 2.22]). There is a similar map $a \mapsto a^\#$ on $A$, with $a \cdot a^\# = \mathbf{N}(a)$, and if $f : A \to J$ is an embedding, we have $f(a^\#) = f(a)^\#$. The Freudenthal product $B \times C$ is the symmetric bilinear map $J \times J \to J$ defined by the formulas
\[ B \times C = (B + C)^\# - B^\# - C^\# = 2(B \circ_E C) - T(B)C - T(C)B + (B, C)E. \]
Note that $E \times C = -C$ for $C$ in $J_0$. A key identity, which can be verified using [EG, (2.15)], is the inner product formula
\[ \langle B \times C, D \rangle = \langle B, C \times D \rangle. \]
For $a \in A$ and $v \in L$, we define $a \cdot v$ in $L$ by the formula
\[ a \cdot v = -(f(a) \times v). \]
This lies in $L = f(A)^\perp$, as for any $b \in A$,
\[ \langle f(b), a \cdot v \rangle = -\langle f(b), f(a) \times v \rangle = -\langle f(b) \times f(a), v \rangle = -\langle f(b \times a), v \rangle = 0. \]
It endows $L$ with an $A$-module structure. (The relevant identities can be checked for the action of $A \otimes \mathbb{R}$ on $L \otimes \mathbb{R}$, as in [EG, §1–3].)

Now for $v \in L \subset J \subset A^\vee + L^\vee$, write
\[ v^\# = -q_A(v) + \beta(v), \quad\text{with } q_A : L \to A^\vee, \ \beta : L \to L^\vee. \tag{3.2} \]
Again, by extending scalars to $\mathbb{R}$, one can check that $q_A(a \cdot v) = a^2 \cdot q_A(v)$, and
\[ \langle v, w \rangle_A = q_A(v + w) - q_A(v) - q_A(w) \]
is a bilinear form with values in $A^\vee$; thus $q_A$ is a quadratic form. Moreover,
\[ \mathrm{Tr}\, q_A(v) = -T(v^\#) = -\langle v^\#, E \rangle = q(v). \]
From this it follows that $L^\vee$ is also an $A$-module inside $L \otimes \mathbb{Q}$. Indeed, $L^\vee = \mathrm{Hom}(L, \mathbb{Z})$ under $\langle \cdot, \cdot \rangle$, so
\[ L^\vee = \mathrm{Hom}_A(L, A^\vee) \tag{3.3} \]
under the bilinear form $\langle \cdot, \cdot \rangle_A$. Some further identities include
\[ \beta(a \cdot v) = a^\# \cdot \beta(v) \quad\text{in } L^\vee \]
as well as the formula for the cubic form on $L$,
\[ d(v) = \langle v, \beta(v) \rangle_A \]
(see [KMRT, pp. 522–523]). The right-hand side miraculously takes values in the subring $\mathbb{Z}$ of $A$. In particular, for $a \in A$, $d(a \cdot v) = \mathbf{N}a \cdot d(v)$.

Even though the quotient $L^\vee/L$ has the structure of a finite $A$-module, the isomorphism $\gamma : A^\vee/A \simeq L^\vee/L$ of finite abelian groups in Proposition 2.6 is not an $A$-module homomorphism, as the action of $A$ on $A^\vee + L^\vee$ does not stabilize the sublattice $J$.

4. A Hilbert modular form
Let $A$ be a totally real cubic ring. An element of $A \otimes \mathbb{R} \simeq \mathbb{R}^3$ is said to be totally positive if each of its three $\mathbb{R}^3$-coordinates is nonnegative. We denote by $(A \otimes \mathbb{R})_+$ the self-dual cone of such elements. Fix an embedding $f : A \to J$. Since $(A \otimes \mathbb{R})_+ = (A \otimes \mathbb{R})^2$ and $(J \otimes \mathbb{R})^2$ is the cone of positive-semidefinite matrices $B$ in $J \otimes \mathbb{R}$, $f$ maps totally positive $\alpha$ in $A$ to positive-semidefinite $B = f(\alpha)$ in $J$. Conversely, if $B \geq 0$ in $J$, then $\alpha = t_A(B)$ lies in $A^\vee_+$. To verify this it suffices to check that $\mathrm{Tr}(\alpha\alpha') \geq 0$ for all $\alpha' \in A_+$. But
\[ \mathrm{Tr}(\alpha\alpha') = \langle t_A(B), f(\alpha') \rangle = \langle B, f(\alpha') \rangle \geq 0 \]
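as $f(\alpha') \geq 0$ in $J$. Let $\mathcal{H}$ be the upper half-plane.

PROPOSITION 4.1
The holomorphic function $F : \mathcal{H}^3 \to \mathbb{C}$, defined by the convergent Fourier expansion
\[ F(\tau) = F(\tau_1, \tau_2, \tau_3) = \sum_{\alpha \in A^\vee_+} c(\alpha)\, e^{2\pi i(\alpha_1\tau_1 + \alpha_2\tau_2 + \alpha_3\tau_3)} \]
with $c(0) = 1$ and
\[ c(\alpha) = 240 \sum_{\substack{S \text{ in } J \\ \mathrm{rank}(S) = 1 \\ t_A(S) = \alpha}} \ \sum_{d \mid c(S)} d^3, \tag{4.2} \]
is a Hilbert modular form of weight $(4, 4, 4)$ for $\mathrm{SL}_2(A)$. That is,
\[ F\!\left(\frac{a\tau + b}{c\tau + d}\right) = \mathbf{N}(c\tau + d)^4 F(\tau) \quad\text{for all } \begin{pmatrix} a & b \\ c & d \end{pmatrix} \text{ in } \mathrm{SL}_2(A). \tag{4.3} \]

In the proposition, $c(S)$ is the largest positive integer dividing $S$ in $J$. If $\alpha$ is primitive in $A^\vee$, then $c(S) = 1$ and
\[ c(\alpha) = 240 \cdot \#\{S : \mathrm{rank}\, S = 1, \ t_A(S) = \alpha\}. \]
Writing
\[ S = \alpha + v \quad\text{in } J \subset A^\vee + L^\vee \]
with $\alpha = t_A(S)$, we find
\[ S^\# = (\alpha^\# - q_A(v)) + (\beta(v) - \alpha \cdot v). \]
The condition that $\mathrm{rank}(S) = 1$ is equivalent to the fact that $S^\# = 0$ (see [EG, (1.11)]), so
\[ \begin{cases} q_A(v) = \alpha^\# & \text{in } A^\vee \otimes A^\vee, \\ \beta(v) = \alpha \cdot v & \text{in } A^\vee \otimes L^\vee. \end{cases} \tag{4.4} \]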
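To prove the proposition, we begin with a description of H. Kim's singular form $F(Z)$ on the exceptional tube domain
\[ \mathcal{D} = \{Z = X + iY, \text{ with } X \in J \otimes \mathbb{R} \text{ and } Y \in (J \otimes \mathbb{R})_+\}. \]
This is a holomorphic function $F : \mathcal{D} \to \mathbb{C}$ which satisfies
\[ F(Z + B) = F(Z) \quad\text{for all } B \in J, \]
\[ F(gZ) = F(Z) \quad\text{for all } g \in \mathrm{Aut}(J, d), \]
\[ F(-Z^\#/d(Z)) = d(Z)^4 F(Z). \]
It has Fourier expansion
\[ F(Z) = 1 + 240 \sum_{\substack{S \geq 0 \\ \mathrm{rank}(S) = 1}} \ \sum_{d \mid c(S)} d^3\, e^{2\pi i \langle S, Z \rangle}. \tag{4.5} \]
These facts were established by Kim [K, p. 146], using the identity $I$ to polarize $J$; in fact, any polarization $E$ determines an isomorphic discrete subgroup of automorphisms of the exceptional domain.

The form $F(\tau)$ is simply the restriction of $F(Z)$ to the sub-tube-domain
\[ \mathcal{H}^3 = (A \otimes \mathbb{R}) + i(A \otimes \mathbb{R})_+ \ \overset{f}{\hookrightarrow}\ (J \otimes \mathbb{R}) + i(J \otimes \mathbb{R})_+ = \mathcal{D}. \]
This satisfies
\[ F(\tau + b) = F(\tau) \quad\text{for all } b \in A, \]
\[ F(\alpha^2 \cdot \tau) = F(\tau) \quad\text{for all } \alpha \in A^*, \]
\[ F(-1/\tau) = (\mathbf{N}\tau)^4 F(\tau). \]
The matrices
\[ \begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix}, \qquad \begin{pmatrix} \alpha & 0 \\ 0 & \alpha^{-1} \end{pmatrix}, \qquad \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \]
generate $\mathrm{SL}_2(A)$, so $F(\tau)$ has weight $(4,4,4)$ for this discrete group acting on $\mathcal{H}^3$. Its Fourier expansion is given by
\[ F(\tau) = 1 + 240 \sum_{\substack{S \geq 0 \\ \mathrm{rank}(S) = 1}} \ \sum_{d \mid c(S)} d^3\, e^{2\pi i\, \mathrm{Tr}(t_A(S) \cdot \tau)} = 1 + 240 \sum_{\substack{\alpha \in A^\vee_+ \\ \alpha \neq 0}} \Bigg( \sum_{\substack{\mathrm{rank}(S) = 1 \\ t_A(S) = \alpha}} \ \sum_{d \mid c(S)} d^3 \Bigg) e^{2\pi i\, \mathrm{Tr}(\alpha \cdot \tau)}, \]
as claimed.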
5. The case D = p^2
In this section we consider the case when the cubic ring $A$ is maximal and has discriminant $D = p^2$, with $p$ a prime. Then $p \equiv 1 \pmod 3$, and $A$ is the ring of integers in the cubic subfield of the $p$th cyclotomic field. Thus $p$ is tamely ramified in $A$ and lies under a unique prime $\mathfrak{p}$ of $A$; we have
\[ pA = \mathfrak{p}^3, \qquad A^\vee = \mathfrak{p}^{-2}A, \qquad A^\vee/A \simeq (\mathbb{Z}/p\mathbb{Z})^2. \tag{5.1} \]
The quadratic space $A^\vee/A$ with form $p \cdot \mathrm{Tr}(xy) \pmod p$ is split over $\mathbb{Z}/p\mathbb{Z}$, with one of its isotropic lines the sub-$A$-module
\[ \bar{N} = \mathfrak{p}^{-1}A/A. \tag{5.2} \]
Let $\bar{N}'$ be the other isotropic line in $A^\vee/A$, and let $N$ and $N'$ be the corresponding unimodular lattices contained in $A^\vee$. Since $\mathrm{rank}(N) = \mathrm{rank}(N') = 3$, both are isomorphic to the lattice $\mathbb{Z}^3$ (see [MH, p. 19]). Thus $N$ has six short vectors $\pm e_1, \pm e_2, \pm e_3$ which satisfy $\langle e_i, e_j \rangle = \mathrm{Tr}(e_i e_j) = \delta_{ij}$, and likewise $N'$ has six short vectors $\pm e_1', \pm e_2', \pm e_3'$ with $\mathrm{Tr}(e_i' e_j') = \delta_{ij}$. Both $N$ and $N'$ contain the element 1 of $A$, with $\langle 1, 1 \rangle = 3$. We may normalize the signs of the $e_i$ so that $e_1 + e_2 + e_3 = 1$, as the only $v$ with $\langle v, v \rangle = 3$ have the form $v = \pm e_1 \pm e_2 \pm e_3$.

We next determine the cubic equations satisfied by the $e_i$, $e_i'$, and in the process we obtain a novel proof of the existence of integers $s, t$ such that $4p = s^2 + 27t^2$. Let $\sigma$ generate the cyclic group (of order 3) of automorphisms of $A$. Then $\sigma(A^\vee) = A^\vee$, $\sigma(N) = N$, and $\sigma(N') = N'$. Since $\langle e_i^\sigma, e_i^\sigma \rangle = 1$, we see that $e_i^\sigma$ is a short vector not equal to $\pm e_i$. Since $e_1^\sigma + e_2^\sigma + e_3^\sigma = 1^\sigma = 1$, $\sigma$ cyclically permutes the set $\{e_1, e_2, e_3\}$. Hence $\mathrm{Tr}(e_i) = 1$. Since $\mathrm{Tr}(e_i^2) = 1$, the $e_i$ are the three roots of a cubic equation of the form
\[ f_m(x) = x^3 - x^2 - m = 0 \tag{5.3} \]
with $m = \mathbf{N}(e_i)$. The same argument shows that the $e_i'$ are roots of a cubic equation $f_{m'}(x) = 0$ with $m' = \mathbf{N}(e_i')$. These norms $m, m'$ are rational numbers of $p$-valuation $-1$ and $-2$, respectively.

Now the discriminant $d_\mu$ of $f_\mu(x)$ is given by
\[ d_\mu = -4\mu - 27\mu^2 = -\mu(4 + 27\mu). \]
Since $A$ has square discriminant, $d_\mu$ must be a rational square for both $\mu = m$ and $\mu = m'$. Thus if we write $m = -m_1/p$, $m' = -m_2/p^2$, then both $m_1(4p - 27m_1)$ and $m_2(4p^2 - 27m_2)$ are squares. Now $\gcd(m_i, 4p^i - 27m_i) \mid 4$ for $i = 1, 2$ since the $m_i$ are integers not divisible by $p$. Thus each $m_i$ must be either a square or twice a square. But the latter is impossible because then $4p^i - 27m_i$ would also be twice a square, yet $4p^i - 27m_i \equiv 1 \bmod 3$. Therefore each $m_i$ is a square. Writing $m_1 = t^2$, we find that $4p - 27t^2 = s^2$ for some integer $s$, and we have thus solved $4p = s^2 + 27t^2$; hence
\[ m = -\frac{t^2}{p}, \qquad d_m = \frac{s^2 t^2}{p^2}. \tag{5.4} \]
Next, if $m_2 = t'^2$, then $4p^2 - 27t'^2 = s'^2$ for some integer $s'$. Having solved $4p = s^2 + 27t^2$, we have represented $p$ as a norm of the algebraic integer $(s + t\sqrt{-27})/2$ in the quadratic imaginary ring $\mathbb{Z} + \mathbb{Z}(1 + \sqrt{-27})/2$; squaring, we obtain the representation of $p^2$ and conclude that $t' = st$ (and $s' = (s^2 - 27t^2)/2$), so
\[ m' = -\frac{s^2 t^2}{p^2}, \qquad d_{m'} = \frac{s^2 t^2 \left((s^2 - 27t^2)/2\right)^2}{p^4}. \tag{5.5} \]
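The identities (5.4) and (5.5) can be confirmed numerically for small primes. The sketch below is a brute-force check, not part of the argument: `solve_4p` and `disc` are illustrative names, and the list of primes $p \equiv 1 \pmod 3$ is an arbitrary sample.

```python
from fractions import Fraction
from math import isqrt

def solve_4p(p):
    """Return positive integers (s, t) with 4p = s^2 + 27 t^2, or None."""
    t = 1
    while 27 * t * t < 4 * p:
        s2 = 4 * p - 27 * t * t
        s = isqrt(s2)
        if s * s == s2:
            return s, t
        t += 1
    return None

def disc(mu):
    """Discriminant -4 mu - 27 mu^2 of f_mu(x) = x^3 - x^2 - mu."""
    return -4 * mu - 27 * mu * mu

for p in (7, 13, 19, 31, 37, 43):          # sample primes with p = 1 mod 3
    s, t = solve_4p(p)
    m = Fraction(-t * t, p)                 # (5.4)
    m_prime = Fraction(-(s * t) ** 2, p * p)  # (5.5)
    # d_m = (st/p)^2 and d_m' = (st (s^2 - 27 t^2) / (2 p^2))^2
    assert disc(m) == Fraction(s * t, p) ** 2
    assert disc(m_prime) == Fraction(s * t * (s * s - 27 * t * t), 2 * p * p) ** 2
    print(p, s, t)
```

For $p = 7$ the search returns $s = t = 1$, which gives $m = -1/7$ and $m' = -1/49$.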
Fix an embedding $f : A \to J$, and let $L = f(A)^\perp$ as in §2. By Proposition 2.6, the map $\gamma : A^\vee/A \simeq L^\vee/L$ identifies the isotropic lines in these rank-2 quadratic spaces over $\mathbb{Z}/p\mathbb{Z}$. We let $M/L$ be the line corresponding to $N/A$, and we let $M'/L$ be the line corresponding to $N'/A$. Then $M$ and $M'$ are two even, integral, unimodular lattices of rank 24 which lie between $L$ and $L^\vee$. The abelian group $L^\vee/L$ has the structure of a finite $A$-module, by the results of §3.

PROPOSITION 5.6
The $A$-module $L^\vee/L$ is cyclic and is isomorphic to $A/\mathfrak{p}^2$. It has a unique nontrivial $A$-submodule $\mathfrak{p}(L^\vee/L)$, which is equal to the isotropic line $M/L$. The quadratic form $q_A$ on $M$ takes values in $A^\vee$, and the $A$-bilinear map $\langle \cdot, \cdot \rangle_A : M \times M \to A^\vee$ identifies the $A$-module $M$ with $\mathrm{Hom}_A(M, A^\vee)$.

Proof
Since $L^\vee/L$ has order $p^2$, it is isomorphic to either the cyclic $A$-module $A/\mathfrak{p}^2$ or the $A$-module $(A/\mathfrak{p})^2$. In the latter case, $\mathfrak{p}L^\vee \subset L$, which we use to derive a contradiction. Indeed, $\langle L, L^\vee \rangle_A \subset A^\vee$, so if $\mathfrak{p}L^\vee \subset L$, we have $\langle L^\vee, L^\vee \rangle_A \subset \mathfrak{p}^{-1}A^\vee$. Since $p > 2$, this means that for any $y \in L^\vee$ we would have
\[ \mathrm{ord}_\mathfrak{p}\, q_A(y) = \mathrm{ord}_\mathfrak{p}\!\left(\tfrac{1}{2}\langle y, y \rangle_A\right) \geq -3. \]
On the other hand, take $a$ in $A^\vee$ with $\mathrm{ord}_\mathfrak{p}(a) = -2$, and find $y$ in $L^\vee$ such that $v = a + y$ is in $J$. Then $v^\# = (a^\# - q_A(y)) + (\beta(y) - a \cdot y)$ is also in $J$, so it has first component $a^\# - q_A(y)$ in $A^\vee$. Since $\mathrm{ord}_\mathfrak{p}(a^\#) = -4$, this forces $\mathrm{ord}_\mathfrak{p}(q_A(y)) = -4$, a contradiction.

Hence $L^\vee/L$ is cyclic, and its unique $A$-submodule is $\mathfrak{p}(L^\vee/L)$. We show that this submodule is isotropic for the quadratic form $p \cdot q(y) : L^\vee/L \to \mathbb{Z}/p\mathbb{Z}$. Let $\pi$ be a uniformizing element at $\mathfrak{p}$ in $A$, and write a basis of $\mathfrak{p}(L^\vee/L)$ as $e = \pi \cdot \lambda$, with $\lambda \in L^\vee$. Then
\[ p \cdot q(e) = p \cdot \mathrm{Tr}\, q_A(e) = p \cdot \mathrm{Tr}(\pi^2 q_A(\lambda)). \]
It suffices to show that $\pi^2 q_A(\lambda)$ lies in $A^\vee$. Since $L^\vee = \mathrm{Hom}_A(L, A^\vee)$ and $\mathfrak{p}^2(L^\vee) \subset L$, the quadratic form $q_A$ takes elements of $L^\vee$ to elements in $\mathfrak{p}^{-2}A^\vee = (A^\vee)^{\otimes 2}$. Hence $\pi^2 q_A(\lambda) \in A^\vee$.

Since $\mathfrak{p}(L^\vee/L)$ is isotropic, it is equal to the line $M/L$ or the line $M'/L$ in $L^\vee/L$. If it is equal to $M/L$, then $M$ is an $A$-module, $q_A : M \to A^\vee$, and $\langle \cdot, \cdot \rangle_A$ identifies $M$ with the $A$-module $\mathrm{Hom}_A(M, A^\vee)$. If not, $\mathfrak{p}(L^\vee/L) = M'/L$ and $q_A : M' \to A^\vee$. From this we derive a contradiction. Indeed, $M'$ corresponds to the unimodular lattice $A \subset N' \subset A^\vee$, and if $a \in N' - A$, then $\mathrm{ord}_\mathfrak{p}(a) = -2$ and $\mathrm{ord}_\mathfrak{p}(a^\#) = -4$. By the definition of $M'$, we may find an element $m'$ in $M'$ with $a + m'$ in $J$. Then
\[ (a + m')^\# = (a^\# - q_A(m')) + (\beta(m') - a \cdot m') \]
also lies in $J$. Hence its first component $a^\# - q_A(m')$ lies in $A^\vee = \mathfrak{p}^{-2}A$. This contradicts the fact that $\mathrm{ord}_\mathfrak{p}(a^\#) = -4$ and $\mathrm{ord}_\mathfrak{p}(q_A(m')) \geq -2$.

Note 5.7
The results in this section extend, with minor modifications, to every cubic $A$ of square discriminant. There is always a canonical $A$-module $M$ with $L \subset M \subset L^\vee$ which is unimodular for $\langle \cdot, \cdot \rangle$ and on which $\langle \cdot, \cdot \rangle_A$ takes values in $A^\vee$.

6. The case D = 49
We now study the case when $A$ is the Dedekind domain $\mathbb{Z}[\cos(2\pi/7)] = \mathbb{Z}[\alpha]/(\alpha^3 + \alpha^2 - 2\alpha - 1)$ of discriminant $D = 49$. In this case there are $2^9 \cdot 3^4 \cdot 13$ embeddings $f : A \to J$, all conjugate under the finite group $\mathrm{Aut}(J, E) = {}^3D_4(2).3$ of order $2^{12} \cdot 3^5 \cdot 7^2 \cdot 13$ (see [GG, §8]). The stabilizer of a fixed embedding is the subgroup $7^2 : 2A_4$ of order $2^3 \cdot 3 \cdot 7^2$, and the normalizer of this subgroup is the maximal subgroup $7^2 : 2A_4 \times 3$. The quotient is the cyclic group $\mathrm{Aut}(A)$ of order 3. In particular, Galois conjugate embeddings are conjugate under $\mathrm{Aut}(J, E)$.
We may specify one embedding by taking
\[ f(\alpha) = \begin{pmatrix} -1 & 1 & -\bar{\beta} \\ 1 & -1 & -\beta \\ -\beta & -\bar{\beta} & -1 \end{pmatrix}, \tag{6.1} \]
where we recall that
\[ f(1) = E = \begin{pmatrix} 2 & \beta & \bar{\beta} \\ \bar{\beta} & 2 & \beta \\ \beta & \bar{\beta} & 2 \end{pmatrix}, \]
and $\beta^2 + \beta + 2 = 0$. The image $f(A)$ consists of the $\mathbb{Z}$-module of matrices
\[ \begin{pmatrix} f + p + r & p - r + p\beta & f - r - r\beta \\ p - r + p\bar{\beta} & f + p + r & f + r\beta \\ f - r - r\bar{\beta} & f + r\bar{\beta} & f + p + r \end{pmatrix} \]
with $f, p, r$ all integers. The element $E$ corresponds to the triple $(0, 1, 1)$, and the element $f(\alpha)$ corresponds to the triple $(0, 0, -1)$. The trace form is $4f + 2p + r$, so $f(A^\vee)$ consists of matrices with $f, p, r$ in $(1/7)\mathbb{Z}$ and $4f + 2p + r$ in $\mathbb{Z}$.

Since the trace form $\mathrm{Tr}(x^2)$ on $A$ takes the values $0, 3, 5, 6, \ldots$, the six nontrivial cosets of $N/A = \mathfrak{p}^{-1}A/A$ are represented uniquely by the six short vectors $n$ in $N$ with $\mathrm{Tr}(n^2) = 1$. Let $M$ (with $L \subset M \subset L^\vee$) be the even unimodular lattice of rank 24 corresponding to the lattice $N$. If $\lambda \in M$ is a root, we may find a unique short vector $n \in N$ which satisfies $\mathrm{Tr}(n) = 1$ and a unique choice of sign $\pm$ such that $S = E \pm \lambda - n$ is a Jordan root in $J$. Indeed, choose $n$ and the sign uniquely so that the sum
\[ v = n \pm \lambda \quad\text{in } A^\vee + L^\vee \tag{6.2} \]
lies in $J$. Since $\langle v, v \rangle = \langle n, n \rangle + \langle \lambda, \lambda \rangle = 1 + 2 = 3$ and $T(v) = \mathrm{Tr}(n) = 1$, we have
\[ v = E - S \]
for a Jordan root $S$ by Proposition 1.4. Consequently, we have shown the following proposition.

PROPOSITION 6.3
The number of roots $\lambda$ in the Niemeier lattice $M$ is equal to twice the number of Jordan roots $S$ in $J$ which satisfy $t_A(S) = 1 - n$ in $A^\vee_+$, where $n$ is any short vector in $N$ with $\mathrm{Tr}(n) = 1$.
Since the three short vectors in $N$ with $\mathrm{Tr}(n) = 1$ are Galois conjugate, and Galois conjugate embeddings of $A$ are conjugate by $\mathrm{Aut}(J, E)$, the number of $S$ with $t_A(S) = 1 - n$ is equal to the number of $S$ with $t_A(S) = 1 - n^\sigma$. Hence we obtain the following corollary.

COROLLARY 6.4
Fix a short vector $n$ in $N$ with $\mathrm{Tr}(n) = 1$, and let $a = 1 - n$ be the corresponding (totally positive) element in $A^\vee_+$. Then
\[ \#\{\text{roots } \lambda \text{ in } M\} = 6 \cdot \#\{\text{Jordan roots } S \text{ in } J \text{ with } t_A(S) = a\}. \]

A similar argument works for the lattices $M'$ and $N'$. Fix a short vector $n'$ in $N'$ with $\mathrm{Tr}(n') = 1$, and let $a' = 1 - n'$. Then
\[ \#\{\text{roots } \lambda' \text{ in } M'\} = 6 \cdot \#\{\text{Jordan roots } S \text{ with } t_A(S) = a'\}. \tag{6.5} \]
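The explicit embedding of this section can be tested against the identity $d(xE - f(a)) = \mathbf{N}(x - a)$ from §2. The following is a sketch under two assumptions: all entries lie in the commutative ring $\mathbb{Z}[\beta]$ with $\beta^2 = -\beta - 2$, and a Hermitian matrix is encoded by its diagonal $(a, b, c)$ together with $x$, $y$, $z$ taken from positions $(2,3)$, $(3,1)$, $(1,2)$. The helper names are illustrative.

```python
# Z[beta] arithmetic with beta^2 = -beta - 2; u + v*beta is stored as (u, v).

def mul(x, y):
    (a, b), (c, d) = x, y
    return (a * c - 2 * b * d, a * d + b * c - b * d)

def conj(x):
    u, v = x
    return (u - v, -v)           # conj(beta) = -1 - beta

def tr(x):
    u, v = x
    return 2 * u - v             # x + conj(x)

def norm(x):
    return mul(x, conj(x))[0]    # x * conj(x) is an integer

def cubic_form(a, b, c, x, y, z):
    """d(B) as in (1.3)."""
    return (a * b * c + tr(mul(mul(x, y), z))
            - a * norm(x) - b * norm(y) - c * norm(z))

def linear_form(a, b, c, x, y, z):
    """T(B) as in (1.3a)."""
    return 2 * (a + b + c) + sum(tr(mul(w, (0, 1))) for w in (x, y, z))

# f(alpha) from (6.1): diagonal -1, with x = y = -beta and z = 1.
F_ALPHA = (-1, -1, -1, (0, -1), (0, -1), (1, 0))

def shifted(t):
    """Entries of t*E - f(alpha), using E = (2, 2, 2, beta, beta, beta)."""
    diag = 2 * t + 1
    return (diag, diag, diag, (0, t + 1), (0, t + 1), (-1, t))

print(cubic_form(*F_ALPHA), linear_form(*F_ALPHA))   # 1 -1
for t in range(-5, 6):
    # N(t - alpha) is the minimal polynomial of alpha evaluated at t.
    assert cubic_form(*shifted(t)) == t**3 + t**2 - 2 * t - 1
```

The two printed values agree with $\mathbf{N}(\alpha) = 1$ and $\mathrm{Tr}(\alpha) = -1$ for a root $\alpha$ of $\alpha^3 + \alpha^2 - 2\alpha - 1$.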
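We can calculate these numbers by a determination of the Hilbert modular form $F(\tau)$. This has weight $(4,4,4)$ for $\mathrm{SL}_2(A)$, and since the Galois conjugate embeddings are conjugate, it is invariant under the action of $\mathrm{Aut}(A)$. One can show, using the trace formula, that the space of such forms is 2-dimensional and is spanned by the forms $E_2^2$ and $E_4$. Here $E_k$ is the weight-$(k, k, k)$ Eisenstein series studied by Siegel, with the Fourier expansion
\[ E_k = \frac{1}{2^3}\zeta_A(1 - k) + \sum_{a > 0 \text{ in } A^\vee} \ \sum_{\mathfrak{c} \mid (a)\mathfrak{p}^2} (\mathbf{N}\mathfrak{c})^{k-1} q^a \tag{6.6} \]
(see [vdG, pp. 19–20]). From the values
\[ \zeta_A(-1) = -\frac{1}{3 \cdot 7}, \qquad \zeta_A(-3) = \frac{79}{2 \cdot 3 \cdot 5 \cdot 7}, \]
we find
\[ E_2 = -\frac{1}{2^3 \cdot 3 \cdot 7} + \sum_{a > 0 \text{ in } A^\vee} \ \sum_{\mathfrak{c} \mid (a)\mathfrak{p}^2} \mathbf{N}\mathfrak{c}\; q^a, \]
\[ E_4 = \frac{79}{2^4 \cdot 3 \cdot 5 \cdot 7} + \sum_{a > 0 \text{ in } A^\vee} \ \sum_{\mathfrak{c} \mid (a)\mathfrak{p}^2} (\mathbf{N}\mathfrak{c})^3 q^a. \]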
There is a unique $\mathrm{Aut}(A)$ orbit of elements $a > 0$ in $A^\vee$ with $\mathrm{Tr}(a) = 1$, represented by the squares $n^2$ of short vectors in $N$. Since the space of modular forms is 2-dimensional, there is a unique form $F(\tau)$ with constant Fourier coefficient $c(0) = 1$ and coefficient $c(n^2) = 0$. Some calculation shows that this is the linear combination
\[ F(\tau) = 2^4 \cdot 3 \cdot 5 \cdot 7\, E_2(\tau)^2 + 2^2 \cdot 5\, E_4(\tau). \tag{6.7} \]
There are five orbits of $\mathrm{Aut}(A)$ on elements $a > 0$ in $A^\vee$ with $\mathrm{Tr}(a) = 2$, and we tabulate the Fourier coefficients $c(a)$ of $F(\tau)$ on these orbits (see Table 1). As before, $n$ is a short vector in $N$ with $\mathrm{Tr}(n) = 1$, and $n'$ is a short vector in $N'$ with $\mathrm{Tr}(n') = 1$. We have $n^3 - n^2 + (1/7) = 0$, and $(n')^3 - (n')^2 + (1/49) = 0$ by (5.4) and (5.5). Indeed, for $p = 7$, $s^2 = t^2 = 1$.

Table 1
\[ \begin{array}{lll}
a > 0 \text{ in } A^\vee,\ \mathrm{Tr}(a) = 2 & (a)\mathfrak{p}^2 & c(a) \text{ of } F(\tau) \\
\hline
2 \cdot n^2 & (2) & 240 \cdot 49 \\
1 - n & \mathfrak{p} & 240 \cdot 28 \\
1 - n' & 1 & 0 \\
1 - n^2 & \text{a prime of norm } 13 & 240 \cdot 196 \\
1 - 2n + n^2 & 1 & 0
\end{array} \]
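The normalization in (6.7) and the entries of Table 1 can be cross-checked from the Fourier expansions of $E_2$ and $E_4$. This is a sketch under stated assumptions: the second argument `r` (the number of ordered ways to write $a = a_1 + a_2$ with $a_1, a_2 > 0$ of trace 1) is not given in the text; the values used here are inferred from $e_1^2 + e_2^2 + e_3^2 = 1$, and each trace-1 element is taken to have divisor sum 1. The ideal $(2)$ is taken to be prime of norm 8. The name `coeff_F` is illustrative.

```python
from fractions import Fraction

E2_CONST = Fraction(-1, 2**3 * 3 * 7)       # constant term of E_2
E4_CONST = Fraction(79, 2**4 * 3 * 5 * 7)   # constant term of E_4
W2, W4 = 2**4 * 3 * 5 * 7, 2**2 * 5         # weights of E_2^2 and E_4 in (6.7)

def coeff_F(divisor_norms, r):
    """Coefficient of q^a in F for a > 0 with Tr(a) <= 2.

    divisor_norms: norms of the ideals dividing (a)p^2.
    r: assumed number of ordered decompositions a = a1 + a2 into
       trace-1 totally positive elements, each with divisor sum 1.
    """
    sigma1 = sum(divisor_norms)
    sigma3 = sum(n**3 for n in divisor_norms)
    e2_squared = 2 * E2_CONST * sigma1 + r   # coefficient of q^a in E_2^2
    return W2 * e2_squared + W4 * sigma3

constant_term = W2 * E2_CONST**2 + W4 * E4_CONST
print(constant_term)             # 1
print(coeff_F([1], 0))           # 0      (a = n^2, and also a = 1 - n')
print(coeff_F([1, 8], 1))        # 11760  (a = 2 n^2)
print(coeff_F([1, 7], 0))        # 6720   (a = 1 - n)
print(coeff_F([1, 13], 2))       # 47040  (a = 1 - n^2)
```

The printed values match $240 \cdot 49$, $240 \cdot 28$ and $240 \cdot 196$.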
The form $F(\tau)$ we have determined is the one that appears in §4, as that satisfies $c(0) = 1$, $c(n^2) = 0$. Hence, for $\mathrm{Tr}(a) = 2$ we have
\[ c(a) = 240 \cdot \#\{S = \text{Jordan roots of } J \text{ with } t_A(S) = a\}. \]
In particular, by Corollary 6.4 and (6.5), this shows that the lattice $M$ has $6 \cdot 28 = 168$ roots and that the lattice $M'$ has $6 \cdot 0 = 0$ roots. Hence, as Niemeier lattices,
\[ \begin{cases} M \simeq A_6^4, \\ M' \simeq \text{Leech lattice}, \end{cases} \tag{6.8} \]
as claimed in [GG, §8]. Indeed, $A_6^4$ is the unique Niemeier lattice with 168 roots (and Coxeter number $h = 7$), and the Leech lattice is the unique Niemeier lattice with no roots (see [N], [V]).

7. The case D = 16
We treat another case, when $A$ is the subring of index 4 in $\mathbb{Z}^3$ consisting of the triples $(a, b, c)$ with $a \equiv b \equiv c \pmod 2$. This ring has discriminant $D = 16$ and admits an embedding $f : A \to J$ which is unique up to conjugacy by $\mathrm{Aut}(J, d, E)$. Indeed, an embedding of $A$ is given by the images
\[ f(2, 0, 0) = S_1, \qquad f(0, 2, 0) = S_2, \qquad f(0, 0, 2) = S_3, \tag{7.1} \]
which satisfy $S_i^2 = 2S_i$, $S_i S_j = 0$, $S_1 + S_2 + S_3 = 2E$. Thus $(S_1, S_2, S_3)$ forms a root triple in the sense of [EG, §3], and by [EG, Prop. 7.8], the group $\mathrm{Aut}(J, d, E) = {}^3D_4(2).3$ of order $2^{12} \cdot 3^5 \cdot 7^2 \cdot 13$ acts transitively on root triples, with fixer the subgroup $2^{2+3+6}.7.3$. Hence there are $2 \cdot 3^4 \cdot 7 \cdot 13 = 14742$ distinct embeddings $f : A \to J$.

Fix an embedding, and let $L = A^\perp$. Then $L^\vee/L \simeq A^\vee/A$. Since $A^\vee$ is the subgroup of $((1/2)\mathbb{Z})^3$ consisting of triples $(a, b, c)$ with $a + b + c$ in $\mathbb{Z}$, we find $A^\vee/A \simeq (\mathbb{Z}/4\mathbb{Z})^2$. The unimodular lattice $\mathbb{Z}^3$, $A \subset \mathbb{Z}^3 \subset A^\vee$, corresponds to the subgroup $(2\mathbb{Z}/4\mathbb{Z})^2$ killed by 2, and in turn it yields a Niemeier lattice $M$ between $L$ and $L^\vee$. We show that $M$ is isomorphic to the Niemeier lattice whose root system is $A_1^{24}$ by showing that $M$ contains precisely 48 roots. To do this, we first determine the modular form $F(\tau)$ of weight $(4, 4, 4)$ for $\mathrm{SL}_2(A)$, defined in §4.

Let $\Gamma(2) \triangleleft \mathrm{SL}_2(\mathbb{Z})$ be the subgroup of integral matrices that reduce to the identity mod 2. Then $\mathrm{SL}_2(A) \triangleright \Gamma(2)^3$, so $F(\tau)$ has weight $(4, 4, 4)$ for $\Gamma(2)^3$ and enjoys some additional invariance properties. Let $W$ be the complex vector space of holomorphic modular forms of weight 4 for $\Gamma(2)$. This has dimension 3 and is spanned by the Eisenstein series at the three cusps
\[ f_1 = \tfrac{1}{16} - q + 7q^2 + \cdots, \]
\[ f_2 = -q^{1/2} + 8q - 28q^{3/2} + 64q^2 + \cdots, \]
\[ f_3 = q^{1/2} + 8q + 28q^{3/2} + 64q^2 + \cdots \]
(see [R, pp. 232–235]). Here $q^{1/2} = e^{\pi i \tau}$. The form $f_1$ is a power series in $q = e^{2\pi i \tau}$, as is $f_2 + f_3$. The general Fourier coefficient $a_n$ of $q^{n/2}$ (for $n \geq 1$) is given by
\[ a_n = \sum_{\substack{d \mid n \\ n/d \text{ even}}} (-1)^d d^3 \quad\text{for } f_1, \]
\[ a_n = \sum_{\substack{d \mid n \\ n/d \text{ odd}}} (-1)^d d^3 \quad\text{for } f_2, \]
\[ a_n = \sum_{\substack{d \mid n \\ n/d \text{ odd}}} d^3 \quad\text{for } f_3 \]
(see [R, Th. 7.3.1]). The group $\mathrm{SL}_2(\mathbb{Z})/\Gamma(2) \simeq S_3$ acts on this space by permuting the forms $f_1, f_2, f_3$. The unique invariant is the sum
\[ f_1 + f_2 + f_3 = \tfrac{1}{16} + 15q + 135q^2 + \cdots = \tfrac{1}{16}E_4 \quad\text{of weight 4 for } \mathrm{SL}_2(\mathbb{Z}). \]
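The divisor-sum formulas for the coefficients $a_n$ can be checked against the displayed expansions and against the identity $f_1 + f_2 + f_3 = \tfrac{1}{16}E_4$. The sketch below uses the standard expansion $E_4 = 1 + 240\sum_{m \geq 1} \sigma_3(m) q^m$, so the coefficient of $q^m$ in $\tfrac{1}{16}E_4$ is $15\sigma_3(m)$; the function names are illustrative.

```python
def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def a_f1(n):
    """Coefficient of q^(n/2) in f_1, n >= 1."""
    return sum((-1) ** d * d**3 for d in divisors(n) if (n // d) % 2 == 0)

def a_f2(n):
    """Coefficient of q^(n/2) in f_2, n >= 1."""
    return sum((-1) ** d * d**3 for d in divisors(n) if (n // d) % 2 == 1)

def a_f3(n):
    """Coefficient of q^(n/2) in f_3, n >= 1."""
    return sum(d**3 for d in divisors(n) if (n // d) % 2 == 1)

def sigma3(m):
    return sum(d**3 for d in divisors(m))

print([a_f1(n) for n in range(1, 5)])   # [0, -1, 0, 7]
print([a_f2(n) for n in range(1, 5)])   # [-1, 8, -28, 64]
print([a_f3(n) for n in range(1, 5)])   # [1, 8, 28, 64]

for n in range(1, 41):
    total = a_f1(n) + a_f2(n) + a_f3(n)
    # Half-integral powers cancel; q^m has coefficient 15 * sigma_3(m).
    expected = 15 * sigma3(n // 2) if n % 2 == 0 else 0
    assert total == expected
```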
The space of forms of weight $(4, 4, 4)$ for $\Gamma(2)^3$ is isomorphic to $W \otimes W \otimes W$. This has dimension 27 and basis $f_i \otimes f_j \otimes f_k$. The group $\mathrm{SL}_2(\mathbb{Z})/\Gamma(2)$ acts diagonally and has invariant subspace of dimension 5. A basis for the invariants is given by
\[ g_1 = f_1 \otimes f_1 \otimes f_1 + f_2 \otimes f_2 \otimes f_2 + f_3 \otimes f_3 \otimes f_3, \]
\[ g_2 = f_1 \otimes f_1 \otimes f_2 + f_1 \otimes f_1 \otimes f_3 + f_2 \otimes f_2 \otimes f_1 + f_2 \otimes f_2 \otimes f_3 + f_3 \otimes f_3 \otimes f_1 + f_3 \otimes f_3 \otimes f_2, \]
\[ g_3 = f_1 \otimes f_2 \otimes f_1 + f_1 \otimes f_3 \otimes f_1 + f_2 \otimes f_1 \otimes f_2 + f_2 \otimes f_3 \otimes f_2 + f_3 \otimes f_1 \otimes f_3 + f_3 \otimes f_2 \otimes f_3, \]
\[ g_4 = f_2 \otimes f_1 \otimes f_1 + f_3 \otimes f_1 \otimes f_1 + f_1 \otimes f_2 \otimes f_2 + f_3 \otimes f_2 \otimes f_2 + f_1 \otimes f_3 \otimes f_3 + f_2 \otimes f_3 \otimes f_3, \]
\[ g_5 = f_1 \otimes f_2 \otimes f_3 + f_1 \otimes f_3 \otimes f_2 + f_2 \otimes f_1 \otimes f_3 + f_2 \otimes f_3 \otimes f_1 + f_3 \otimes f_1 \otimes f_2 + f_3 \otimes f_2 \otimes f_1. \]
This is precisely the space of forms of weight $(4, 4, 4)$ on $\mathrm{SL}_2(A)$, as $\mathrm{SL}_2(A)$ is the subgroup of $\mathrm{SL}_2(\mathbb{Z})^3$ consisting of triples of matrices with the same reduction mod 2.

The form $F(\tau)$ has an additional invariance property. Since the $\mathrm{Aut}(A) = S_3$ conjugate embeddings $f : A \to J$ are conjugate under $\mathrm{Aut}(J, d, E)$, the number of $S$ in $J$ of rank 1 with $t_A(S) = a$ is equal to the number with $t_A(S) = a^\sigma$, for all $\sigma \in S_3 = \mathrm{Aut}(A)$. Hence the Fourier coefficient $c(a)$ of $F(\tau)$ is equal to $c(a^\sigma)$ for all $\sigma \in S_3$ and $a \in A^\vee_+$, and
\[ F(\tau_{\sigma 1}, \tau_{\sigma 2}, \tau_{\sigma 3}) = F(\tau_1, \tau_2, \tau_3). \]
The subspace of forms of weight $(4, 4, 4)$ for $\mathrm{SL}_2(A)$ with this extra invariance property has dimension 3. As a basis, we may take
\[ h_1 = 2^{12} g_1 \quad\text{with } c(0, 0, 0) = 1, \]
\[ h_2 = 2^4 (g_2 + g_3 + g_4) \quad\text{with } c(0, 0, 0) = 0, \ c(1, 0, 0) = 1, \]
\[ h_3 = -2^3 g_5 \quad\text{with } c(0, 0, 0) = 0, \ c(1, 0, 0) = 0, \ c(1/2, 1/2, 0) = 1. \]
We tabulate the coefficients of the basis elements $h_i$, at orbits of $\mathrm{Aut}(A)$ on $A^\vee_+$ with $\mathrm{Tr}(a) \leq 2$ (see Table 2).

Table 2
\[ \begin{array}{c|rrrrrrr}
a = & (0,0,0) & (1,0,0) & (\tfrac12, \tfrac12, 0) & (2,0,0) & (1,1,0) & (\tfrac12, \tfrac32, 0) & (\tfrac12, \tfrac12, 1) \\
\hline
h_1 & 1 & -16 & 0 & 112 & 256 & 0 & 65536 \\
h_2 & 0 & 1 & 2 & 8 & 96 & 56 & -288 \\
h_3 & 0 & 0 & 1 & 0 & -64 & 28 & -16
\end{array} \]

The form $F(\tau)$ has Fourier coefficients
\[ c(0, 0, 0) = 1, \qquad c(1, 0, 0) = 0, \qquad c\!\left(\tfrac12, \tfrac12, 0\right) = 0. \]
Hence we must have
\[ F = h_1 + 16h_2 - 32h_3. \tag{7.2} \]
The coefficients of $F$ at elements $a$ in $A^\vee_+$ with $\mathrm{Tr}(a) = 2$ are
\[ c(2, 0, 0) = 240, \qquad c(1, 1, 0) = 16 \cdot 240, \qquad c\!\left(\tfrac12, \tfrac32, 0\right) = 0, \qquad c\!\left(\tfrac12, \tfrac12, 1\right) = 256 \cdot 240. \]
Hence there are 16 Jordan roots $S$ with $t_A(S) = (1, 1, 0)$, 16 roots with $t_A(S) = (0, 1, 1)$, and 16 roots with $t_A(S) = (1, 0, 1)$.
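Table 2 and the coefficients of $F$ can be recomputed from the first few Fourier coefficients of $f_1, f_2, f_3$. In the sketch below, exponents are stored in units of $1/2$ (so $a = (\tfrac12, \tfrac12, 1)$ becomes `(1, 1, 2)`), and the sum $g_2 + g_3 + g_4$ is taken as the sum over all index triples with exactly two distinct entries, which is how the 18 terms listed for $g_2, g_3, g_4$ are read here. The names `F_SERIES`, `h_coeffs` and `F_coeff` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Coefficients of q^(n/2), n = 0..4, read off from the expansions of f_1, f_2, f_3.
F_SERIES = {
    1: [Fraction(1, 16), 0, -1, 0, 7],
    2: [0, -1, 8, -28, 64],
    3: [0, 1, 8, 28, 64],
}

def tensor_coeff(indices, exps):
    """Coefficient of q1^(n1/2) q2^(n2/2) q3^(n3/2) in f_i (x) f_j (x) f_k."""
    c = Fraction(1)
    for i, n in zip(indices, exps):
        c *= F_SERIES[i][n]
    return c

def h_coeffs(exps):
    """Coefficients of (h_1, h_2, h_3) at the exponent vector exps."""
    g1 = g234 = g5 = Fraction(0)
    for idx in product((1, 2, 3), repeat=3):
        c = tensor_coeff(idx, exps)
        distinct = len(set(idx))
        if distinct == 1:
            g1 += c          # terms of g_1
        elif distinct == 2:
            g234 += c        # terms of g_2 + g_3 + g_4
        else:
            g5 += c          # terms of g_5
    return 2**12 * g1, 2**4 * g234, -(2**3) * g5

def F_coeff(exps):
    """Coefficient of F = h_1 + 16 h_2 - 32 h_3, as in (7.2)."""
    h1, h2, h3 = h_coeffs(exps)
    return h1 + 16 * h2 - 32 * h3

TABLE_2 = {
    (0, 0, 0): (1, 0, 0), (2, 0, 0): (-16, 1, 0), (1, 1, 0): (0, 2, 1),
    (4, 0, 0): (112, 8, 0), (2, 2, 0): (256, 96, -64),
    (1, 3, 0): (0, 56, 28), (1, 1, 2): (65536, -288, -16),
}
for exps, row in TABLE_2.items():
    assert h_coeffs(exps) == row

print([F_coeff(e) for e in [(4, 0, 0), (2, 2, 0), (1, 3, 0), (1, 1, 2)]])
```

The four printed coefficients are $240$, $16 \cdot 240$, $0$ and $256 \cdot 240$.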
The three nontrivial cosets of $\mathbb{Z}^3/A$ are represented by short vectors $n$ with $\langle n, n \rangle = 1$ and $\mathrm{Tr}\, n = 1$. An argument similar to that in §6 shows that if $n$ is such a short vector,
\[ \#\{\text{roots } \lambda \text{ in } M\} = 3 \cdot \#\{\text{Jordan roots with } t_A(S) = 1 - n\} = 3 \cdot 16 = 48. \]
This completes the proof that $M$ is the Niemeier lattice with root system $A_1^{24}$, as that is the unique Niemeier lattice with 48 roots (and Coxeter number $h = 2$; again, see [N], [V]).

8. The case D = 32
Another interesting case, which merits further study, is the cubic ring
\[ A = \{(b, c + d\sqrt{2}) : b \equiv c \pmod 2\}, \]
which has index 2 in the maximal order
\[ A_0 = \mathbb{Z} + \mathbb{Z}[\sqrt{2}] \quad\text{of discriminant 8.} \]
Hence $A$ has discriminant $D = 32$. The embeddings $f : A \to J$ are all conjugate. To construct one, choose a triple of Jordan roots $(S_1, S_2, S)$ with $\langle S_1, S_2 \rangle = 2$ and $S$ orthogonal to $S_1$ and $S_2$, as in the proof of Lemma 1.5. We embed $A$ into $J$ by mapping
\[ f(1, 1) = E, \qquad f(2, 0) = S, \qquad f(0, 2 + \sqrt{2}) = S_1 + S_2. \]
Since $B = S_1 + S_2$ satisfies
\[ B^3 - 4B^2 + 2B = 0 \quad\text{in } J, \]
and $S \cdot B = B \cdot S = 0$, this is a ring embedding. The abelian group $A^\vee/A$ is isomorphic to $\mathbb{Z}/8 + \mathbb{Z}/4$. Indeed,
\[ A^\vee = \left\{ \left( \frac{b}{2}, \frac{c + d\sqrt{2}}{4} \right) : b \equiv c \pmod 2 \right\}. \]
Hence the exponent of $A^\vee/A$ is 8. The 2-torsion in $A^\vee/A$ is given by the image of the lattice
\[ N = \left\{ \left( b, \frac{c}{\sqrt{2}} + d \right) \right\}, \]
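and $N/A \simeq \mathbb{Z}/2 + \mathbb{Z}/2$. The subgroup $N/A$ is also isotropic for the induced pairing $A^\vee/A \times A^\vee/A \to \mathbb{Q}/\mathbb{Z}$, as
\[ \mathrm{Tr}(n^2) = \mathrm{Tr}\left(b^2,\ c^2/2 + d^2 + \sqrt{2}\,cd\right) = b^2 + c^2 + 2d^2 \]
is integral for $n \in N$. We have $N \supset_2 A_0 \supset_2 A$, where the subscript 2 indicates index 2. The lattice $N$ has discriminant 2 and is isomorphic to $\mathbb{Z} + \mathbb{Z} + \mathbb{Z}(2)$. By the results of §2, the lattice $L = f(A)^\perp$ has index 32 in $L^\vee$ and is contained in the intermediate integral lattices $L'$, $M$ with
\[ L \subset_2 L' \subset_2 M \subset L^\vee, \]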
corresponding to $A_0$ and $N$, respectively. The discriminant of $M$ is 2. Can one determine these lattices explicitly and identify the Hilbert modular form $F(\tau)$ inside the space of forms of weight (4, 4, 4) for $\Gamma(2) \times \Gamma(\sqrt{2})$, where $\Gamma(\sqrt{2})$ is the normal subgroup of $SL_2(\mathbb{Z}[\sqrt{2}])$ consisting of matrices reducing to the identity (mod $\sqrt{2}$)?

9. The uniqueness of $J_0$
We outline here how $J_0$ can be proved unique without using the Lorentzian lattice $II_{25,1}$. We remark that this uniqueness result was recently used (see [BV]) in the classification of rootless lattices in dimensions 27 and 28. We postpone to a future paper some details of the spaces of modular forms that can arise as weighted theta functions and of the reconstruction of $J_0$ from the Niemeier lattice with root system $A_1^{24}$.

Let $\Lambda$ be a positive-definite even integral lattice of rank 26, discriminant 3, and minimal norm 4. To prove that $\Lambda$ is isometric with $J_0$, we show the following proposition.

PROPOSITION 9.1
(i) Every vector of the dual lattice $\Lambda^\vee$ is either in $\Lambda$ or has norm congruent to 2/3 mod $2\mathbb{Z}$.
(ii) No vector of $\Lambda^\vee$ has norm 2/3.
(iii) The theta series of $\Lambda$, $\Lambda^\vee$ coincide with those of $J_0$, $J_0^\vee$. In particular, each of the two nontrivial cosets of $\Lambda$ in $\Lambda^\vee$ has 819 vectors of minimal norm 8/3.
(iv) Those 819 vectors constitute a spherical 2-design $\Delta$, and the 2 · 819 minimal vectors of $\Lambda^\vee$ constitute a spherical 4-design $\Delta \cup (-\Delta)$.
(v) The inner product of any $w_1, w_2 \in \Delta$ is one of 8/3, 2/3, −1/3, −4/3. For each $w_1$ these inner products occur, respectively, for 1, 288, 512, 18 of the 819 choices of $w_2$.
(vi) For each $i, j, k \in \{8/3, 2/3, -1/3, -4/3\}$ there exists an integer $n^k_{i,j}$, independent of $\Lambda$, such that for any $w_1, w_2 \in \Delta$ with $\langle w_1, w_2\rangle = k$ the number of $w \in \Delta$ with $\langle w_1, w\rangle = i$ and $\langle w_2, w\rangle = j$ is $n^k_{i,j}$.
(For part (iv), recall that a nonempty finite subset $S$ of a sphere in $\mathbb{R}^n$ is a spherical $t$-design if, for every polynomial $P$ on $\mathbb{R}^n$ of degree at most $t$, $|S|^{-1}\sum_{x\in S} P(x)$ equals the average of $P$ over the sphere. Part (vi) is the statement that $\Delta$ is a 4-class association scheme indexed by {8/3, 2/3, −1/3, −4/3}, with parameters $n^k_{i,j}$ independent of $\Lambda$. That $\Delta$ is such an association scheme when $\Lambda = J_0$ follows from the fact that $\mathrm{Aut}(J_0)$ acts distance-transitively on $\Delta$; but of course this argument, which we used earlier in this paper, is not yet available to us for arbitrary $\Lambda$.)
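The counts in part (v) are forced by the design property, as the proof below explains: for fixed $w_1$, the first and second moments of $\langle w_1, x\rangle$ over $\Delta$ are those of the sphere of squared radius 8/3 in $\mathbb{R}^{26}$. A minimal sketch of that computation in exact rational arithmetic follows; the helper `solve` and the variable names are illustrative, and the moment formulas used (mean of $x_1^2$ on the unit sphere equal to $1/n$, mean of $x_1^4$ equal to $3/(n(n+2))$) are standard facts assumed here rather than quoted from the paper.

```python
from fractions import Fraction as Fr


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over the rationals."""
    n = len(A)
    M = [[Fr(v) for v in row] + [Fr(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


NORM = Fr(8, 3)   # norm of each vector of Delta
DIM = 26          # rank of the lattice
SIZE = 819        # number of vectors in Delta
VALUES = [Fr(2, 3), Fr(-1, 3), Fr(-4, 3)]  # inner products other than 8/3

# Unknowns: the counts n_{2/3}, n_{-1/3}, n_{-4/3}; the value 8/3 occurs once.
A = [[v ** k for v in VALUES] for k in range(3)]
b = [
    SIZE - 1,                                 # total count
    0 - NORM,                                 # first moment vanishes
    SIZE * NORM * NORM / DIM - NORM ** 2,     # second moment of a 2-design
]
counts = solve(A, b)

# Consistency check against the 4-design property of Delta and -Delta.
fourth_moment = NORM ** 4 + sum(n * v ** 4 for n, v in zip(counts, VALUES))
expected_fourth = SIZE * 3 * NORM ** 4 / (DIM * (DIM + 2))

print(counts, fourth_moment, expected_fourth)
```

The solution is (288, 512, 18), and the fourth moment agrees with the value 512/3 mentioned in the proof of (v).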
Proof
(i) This is known to be true for any positive-definite even integral lattice $\Lambda$ of discriminant 3 and rank $8n + 2$. Since $[\Lambda^\vee : \Lambda] = \det(\Lambda) = 3$, we have either $v^* - w^* \in \Lambda$ or $v^* + w^* \in \Lambda$ for any $v^*, w^* \in \Lambda^\vee - \Lambda$. Thus $\langle v^*, v^*\rangle \equiv \langle w^*, w^*\rangle$ mod $\mathbb{Z}$. Moreover, $3v^* \in \Lambda$, so $3\langle v^*, v^*\rangle = \langle 3v^*, v^*\rangle \in \mathbb{Z}$ and $9\langle v^*, v^*\rangle = \langle 3v^*, 3v^*\rangle \in 2\mathbb{Z}$. Thus there exists an integer $c$ such that $\langle v^*, v^*\rangle \equiv 2c/3$ mod $2\mathbb{Z}$ for all $v^* \in \Lambda^\vee - \Lambda$. If we had $c = 0$, then $\Lambda^\vee$ would be an integral lattice, which is impossible because $\det \Lambda^\vee = 1/3 \notin \mathbb{Z}$. If $c = 2$, then we could glue $\Lambda^\vee$ to the $A_2$ lattice, obtaining a positive-definite even unimodular lattice of rank $8n + 4$, which is impossible. Thus $c = 1$, as claimed.

(ii) Assume on the contrary that $\langle v^*, v^*\rangle = 2/3$ for some $v^* \in \Lambda^\vee$. Let $\Lambda_1 = \Lambda \cap (\mathbb{Z}v^*)^\perp$. This is a positive-definite even integral lattice of rank 25 and discriminant $\langle v^*, v^*\rangle(\det \Lambda) = 2$ containing no vectors of norm 2. But no such lattice exists. As with $J_0$, but more simply, the nonexistence of $\Lambda_1$ was proved by Borcherds via $II_{25,1}$ (see [B, Lem. 4.3.1]), and it can also be established using theta series without invoking hyperbolic lattices. To do this, first show, as in (i), that all vectors in $\Lambda_1^\vee - \Lambda_1$ would have norm $\equiv 1/2$ mod $2\mathbb{Z}$, and note that $\Lambda_1^\vee$ has minimal norm at least 5/2 because if $w^* \in \Lambda_1^\vee$ has norm 1/2, then $2w^* \in \Lambda_1$ has norm 2. Then show, as we do for $\Lambda$ in (iii) and (iv), that $\Lambda_1^\vee$ has minimal norm 5/2 and that its minimal vectors constitute a spherical 2-design. Thus, for any minimal vector $w_0^* \in \Lambda_1^\vee$, the average of $\langle w^*, w_0^*\rangle^2$ over all minimal vectors $w^*$ is $(5/2)^2/25 = 1/4$. But $\langle w^*, w_0^*\rangle \in \mathbb{Z} + 1/2$ for all minimal vectors $w^* \in \Lambda_1^\vee$, and $\langle w_0^*, w_0^*\rangle = 5/2$. Thus each $\langle w^*, w_0^*\rangle^2 \ge 1/4$, and the inequality is strict at least for $w^* = w_0^*$. Therefore the average of $\langle w^*, w_0^*\rangle^2$ exceeds 1/4. This contradiction proves that $\Lambda_1$ cannot exist, and thus it proves that $\Lambda^\vee$ has no vectors of norm 2/3.
(iii) This is in effect already proved in [EG, Prop. 8.6], in which the theta series of $J_0$, $J_0^\vee$ was determined using only the facts about $J_0$ that we assumed for $\Lambda$ or proved in (i) and (ii). Since the involution $x \leftrightarrow -x$ switches the two nontrivial cosets of $\Lambda$ in $\Lambda^\vee$, each of these two cosets has the same number of minimal vectors; thus the theta series of $\Lambda^\vee$ also determines the number of minimal vectors in each nontrivial coset.

(iv) We use the fact that $S$ is a $t$-design if and only if $\sum_{x\in S} P(x) = 0$ whenever $P$ is a spherical harmonic of positive degree at most $t$. Let $C$ be one of the nontrivial cosets of $\Lambda$ in $\Lambda^\vee$, and consider the weighted theta series $\sum_{x\in C} P(x) q^{\langle x,x\rangle/2}$. This is a modular form of weight $13 + \deg(P)$ for $\Gamma(3)$. Because $\Lambda^\vee$ has minimal norm 8/3, this form vanishes at each cusp at least to the same order as $\eta(z)^{32}$, a form of weight 16. Thus, if $13 + \deg(P) < 16$, then the form is identically zero. In particular, its $q^{4/3}$ coefficient vanishes; since this coefficient is the sum of $P(x)$ over the 819 minimal vectors of $C$, we confirm that these vectors constitute a spherical 2-design.
(This is the argument we suggested in [EG, p. 693]; see also the second part of (v) below.) As for the minimal vectors of $\Lambda^\vee$, the same construction yields a modular form $\vartheta(z) := \sum_{x\in\Lambda^\vee} P(x) q(z)^{\langle x,x\rangle/2}$ of weight $13 + \deg(P)$ for $\Gamma_0(3)$. It still vanishes at least as $\eta^{32}$ at each cusp, but at the cusp $z = 0$ we have $\vartheta(z) = O(q(-1/z)^2)$, not just $O(q(-1/z)^{4/3})$, because $\vartheta(-1/z)$ is proportional to $\sum_{x\in\Lambda} P(x) q(z)^{\langle x,x\rangle/2}$ and $\Lambda$ has minimal norm 4. This lets us conclude that $\vartheta \equiv 0$ if $13 + \deg(P) < 18$ and thus that $\Delta \cup (-\Delta)$, the set of minimal vectors of $\Lambda^\vee$, is a spherical 4-design, as claimed. (Since this design is symmetric about the origin, it is automatically a 5-design as well, but we do not use this.)

(v) Since $w_2 \equiv w_1$ mod $\Lambda$, we have $\langle w_1, w_2\rangle \equiv \langle w_1, w_1\rangle = 8/3 \equiv 2/3$ mod $\mathbb{Z}$. By Cauchy-Schwarz $|\langle w_1, w_2\rangle| \le 8/3$, with equality if and only if $w_1, w_2$ are proportional. Since $w_1, w_2 \in \Delta$, this equality condition is equivalent to $w_1 = w_2$. If $w_1 \ne w_2$, then $w_1 - w_2 \in \Lambda - \{0\}$, so $w_1 - w_2$ has norm at least 4, whence
\[ \langle w_1, w_2\rangle = \left(|w_1|^2 + |w_2|^2 - |w_1 - w_2|^2\right)/2 \le (16/3 - 4)/2 = 2/3. \]
Likewise, $w_1 + w_2$, a nonzero vector in $\Lambda^\vee$, has norm at least 8/3, whence $\langle w_1, w_2\rangle \ge -4/3$. Thus the only possibilities for $\langle w_1, w_2\rangle$ are −4/3, −1/3, 2/3, and 8/3, the last occurring if and only if $w_2 = w_1$. Now fix $w_1$ and apply (iv) with $P(x) = \langle w_1, x\rangle$ and $P(x) = \langle w_1, x\rangle^2$. This yields two linear equations on the four counts
\[ n_i := \#\{w_2 \in \Delta : \langle w_1, w_2\rangle = i\} \qquad (i = 8/3,\ 2/3,\ -1/3,\ -4/3). \]
These are already known to satisfy the two linear conditions $n_{8/3} = 1$ and $\sum_i n_i = 819$. Solving these simultaneous linear equations yields $(n_{2/3}, n_{-1/3}, n_{-4/3}) = (288, 512, 18)$, as claimed. For a check on the computation we may verify that $\sum_i i^4 n_i = 512/3$ is consistent with $\Delta \cup (-\Delta)$ being a spherical 4-design.

(vi) (Sketch) We may assume that $k \ne 8/3$. For each of the remaining three values of $k$, and $w_1, w_2 \in \Delta$ such that $\langle w_1, w_2\rangle = k$, let
\[ n_{i,j} := \#\{w \in \Delta : \langle w_1, w\rangle = i,\ \langle w_2, w\rangle = j\}. \]
We know $\sum_i n_{i,j}$ for each $j$, and $\sum_j n_{i,j}$ for each $i$, from (v). We can also calculate $\sum_{i,j} ij\, n_{i,j}$ and $\sum_{i,j} (ij)^2 n_{i,j}$ using the fact that $\Delta \cup (-\Delta)$ is a 4-design. In each case this gives us enough independent linear equations to determine all the $n_{i,j}$, and, in particular, to show that they depend only on $k$ and not on the choice of $w_1, w_2$. This completes the proof of the proposition.

We can now obtain the uniqueness of $J_0$ in two ways. The first is to use a combinatorial characterization of a regular graph $G$ of order 819 and degree 18 obtained from $\Delta$. This graph has vertex set $\Delta$ and an edge connecting any $w_1, w_2 \in \Delta$ if and only if $\langle w_1, w_2\rangle = -4/3$. It turns out that the $n^k_{i,j}$ of Proposition 9.1 are equivalent to the condition that $G$ be a generalized hexagon of order (2, 8). A. Cohen and J.
Tits showed in [CT] that every such generalized hexagon is isomorphic to the graph obtained from the norm-(8/3) vectors of $J_0^\vee$. (They actually reduced this result in turn to M. Ronan's characterization (see [Ro1], [Ro2]) of this graph, and they offered an alternative proof by showing that for each vertex of such a graph there is a graph involution fixing only the vertex and its neighbors and then citing F. Timmesfeld's group-theoretic characterization in [T, (3.3)] of ${}^3D_4(2)$.) But $G$ determines the inner products of all pairs of vectors in $\Delta$: two vertices at distance $d$ on the graph correspond to vectors in $\Delta$ with inner product $(-1)^d 2^{3-d}/3$. Thus $\Delta$ is isometric with the configuration of minimal vectors of $J_0^\vee$, and since these vectors generate $J_0^\vee$, it follows that $\Lambda^\vee \cong J_0^\vee$ and thus that $\Lambda \cong J_0$, as claimed.

In the second approach we use the counts $n^k_{i,j}$ for $k = -4/3$ to find a copy of the $A_1^{24}$ Niemeier lattice in $\Lambda$. Fix $w_1, w_2 \in \Delta$ such that $\langle w_1, w_2\rangle = -4/3$, and let $w_3 = -(w_1 + w_2)$. Then $w_3 \in \Delta$ also, and $w_1, w_2, w_3$ form an equilateral triangle. When $\Lambda = J_0$, such triangles are precisely the projections to $J_0$ of root triples in $J$. Let $L$ be the 24-dimensional slice of $\Lambda$ orthogonal to $w_1, w_2, w_3$. As in §7, we show that this is an even lattice of discriminant 16 with $L^\vee/L \cong (\mathbb{Z}/4\mathbb{Z})^2$ and thus that the preimage of $(2\mathbb{Z}/4\mathbb{Z})^2$ in $L^\vee$ is a self-dual lattice $M$. This lattice can also be described as the projection to $L \otimes \mathbb{R}$ of all vectors of $\Lambda$ whose inner product with each $w_i$ is even. The norm of such a projection must be even; thus $M$ is a Niemeier lattice. The next step is to show that $M$ contains 48 roots. We cannot use the methods of §7 to count the roots, so instead we reduce the problem to the results of Proposition 9.1. If $r \in M$ has norm 2, then since $r \notin \Lambda$ we must have $r + 3w_i/2 \in \Lambda$ for exactly one of $i = 1, 2, 3$. Thus $w := r - w_i/2 \in \Lambda^\vee$. Then $w$ has norm 8/3, and being congruent to $w_i$ mod $\Lambda$ it must be contained in $\Delta$.
Since $\langle r, w_i\rangle = 0$ for each $i$, the projection of $w$ to the $(w_1, w_2, w_3)$ plane is $-w_i/2$. Conversely, given $w \in \Delta$ with that projection, we can reconstruct the root $r = w + w_i/2$. Enumerating the roots thus reduces to enumerating the $w$'s. But this is done in part (vi) of Proposition 9.1 (actually, in this case part (v) would suffice); the multiplicities of the projections of $\Delta$ to that plane are given by the diagram (see Figure 1). In particular, the number of roots is 16 + 16 + 16 = 48, as claimed.

We finish this proof of the uniqueness of $J_0$ by showing that, up to the automorphisms of $M$, there is a unique suitable choice for its index-4 sublattice $L$, from which we recover $\Lambda$ and thus identify it with $J_0$. We can even obtain the size, if not the structure, of $\mathrm{Aut}(J_0)$ by multiplying $\#\mathrm{Aut}(L)$ by the number of choices we have made along the way. This analysis, too, we relegate to a future paper.
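As a numerical cross-check of the first approach, the counts of part (v) match the distance distribution of a generalized hexagon of order (s, t) = (2, 8). The sketch below assumes the standard formulas for such a hexagon (point count $(1+s)(1+st+s^2t^2)$ and $s(t+1)$, $s^2t(t+1)$, $s^3t^2$ points at distances 1, 2, 3 in the collinearity graph); these are general facts about generalized hexagons and are not taken from the paper. The function names are illustrative.

```python
from fractions import Fraction


def hexagon_point_count(s, t):
    """Number of points of a generalized hexagon of order (s, t)."""
    return (1 + s) * (1 + s * t + s * s * t * t)


def hexagon_distance_counts(s, t):
    """Numbers of points at distance 0, 1, 2, 3 from a fixed point."""
    return [1, s * (t + 1), s * s * t * (t + 1), s ** 3 * t * t]


def inner_product_at_distance(d):
    """Inner product (-1)^d 2^(3-d) / 3 assigned to graph distance d."""
    return Fraction((-1) ** d * 2 ** (3 - d), 3)


counts = hexagon_distance_counts(2, 8)
by_inner_product = {
    inner_product_at_distance(d): n for d, n in enumerate(counts)
}

print(hexagon_point_count(2, 8), counts)
for value, n in by_inner_product.items():
    print(value, n)
```

With (s, t) = (2, 8) this gives 819 points, degree 18, and the pairing 8/3 ↔ 1, −4/3 ↔ 18, 2/3 ↔ 288, −1/3 ↔ 512, in agreement with Proposition 9.1(v).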
Figure 1. [Diagram not reproduced: the triangle with vertices $w_1$, $w_2$, $w_3$ and the projections of $\Delta$ to its plane, marked with multiplicities 16 and 256.]
Acknowledgments. It is a pleasure to thank Wee Teck Gan for his help and Richard Borcherds for a copy and discussion of his thesis.

References
[BV] R. BACHER and B. VENKOV, Réseaux entiers unimodulaires sans racine en dimension 27 et 28, preprint, 2000, http://www.fourier.ujf-grenoble.fr
[B] R. E. BORCHERDS, The Leech lattice and other lattices, Ph.D. dissertation, Trinity College, Cambridge, 1984, http://www.math.berkeley.edu/~reb
[CT] A. M. COHEN and J. TITS, On generalized hexagons and a near octagon whose lines have three points, European J. Combin. 6 (1985), 13–27. MR 86j:51021
[C] J. H. CONWAY, A characterisation of Leech's lattice, Invent. Math. 7 (1969), 137–142. MR 39:6824
[A] J. H. CONWAY, R. T. CURTIS, S. P. NORTON, R. A. PARKER, and R. A. WILSON, Atlas of Finite Groups: Maximal Subgroups and Ordinary Characters for Simple Groups, Oxford Univ. Press, Eynsham, 1985. MR 88g:20025
[CS] J. H. CONWAY and N. J. A. SLOANE, Sphere Packings, Lattices and Groups, 2d ed., Grundlehren Math. Wiss. 290, Springer, New York, 1993. MR 93h:11069
[EG] N. D. ELKIES and B. H. GROSS, The exceptional cone and the Leech lattice, Internat. Math. Res. Notices 1996, 665–698. MR 97g:11070
[GG] B. H. GROSS and W. T. GAN, Commutative subrings of certain non-associative rings, Math. Ann. 314 (1999), 265–283. MR 2000j:11050
[K] H. KIM, Exceptional modular form of weight 4 on an exceptional tube domain contained in $\mathbb{C}^{27}$, Rev. Mat. Iberoamericana 9 (1993), 139–200. MR 94c:11040
[KMRT] M.-A. KNUS, A. MERKURJEV, M. ROST, and J. TIGNOL, The Book of Involutions, Amer. Math. Soc. Colloq. Publ. 44, Amer. Math. Soc., Providence, 1998. MR 2000a:16031
[MH] J. MILNOR and D. HUSEMOLLER, Symmetric Bilinear Forms, Ergeb. Math. Grenzgeb. (3) 73, Springer, New York, 1973. MR 58:22129
[N] H.-V. NIEMEIER, Definite quadratische Formen der Dimension 24 und Diskriminante 1, J. Number Theory 5 (1973), 142–178. MR 47:4931
[R] R. RANKIN, Modular Forms and Functions, Cambridge Univ. Press, Cambridge, 1977. MR 58:16518
[Ro1] M. A. RONAN, A note on the ${}^3D_4(q)$ generalized hexagons, J. Combin. Theory Ser. A 29 (1980), 249–250. MR 82g:51015
[Ro2] M. A. RONAN, A combinatorial characterization of the dual Moufang hexagons, Geom. Dedicata 11 (1981), 61–67. MR 82i:51016
[T] F. G. TIMMESFELD, A characterization of the Chevalley- and Steinberg-groups over $F_2$, Geometriae Dedicata 1 (1973), 269–321. MR 48:8616
[vdG] G. VAN DER GEER, Hilbert modular surfaces, Ergeb. Math. Grenzgeb. (3) 16, Springer, Berlin, 1988. MR 89c:11073
[V] B. B. VENKOV, "Even unimodular 24-dimensional lattices" in Sphere Packings, Lattices and Groups, 2d ed., Grundlehren Math. Wiss. 290, Springer, New York, 1993, 429–440.
Elkies
Department of Mathematics, Harvard University, One Oxford Street, Cambridge, Massachusetts 01238, USA; [email protected]

Gross
Department of Mathematics, Harvard University, One Oxford Street, Cambridge, Massachusetts 01238, USA; [email protected]
DUKE MATHEMATICAL JOURNAL Vol. 109, No. 2, © 2001

CORRECTION TO: "INTERNAL LIFSHITS TAILS FOR RANDOM PERTURBATIONS OF PERIODIC SCHRÖDINGER OPERATORS"
FRÉDÉRIC KLOPP
In [1] we studied the existence of Lifshitz tails for internal gaps of a randomly perturbed periodic Schrödinger operator and concluded that, for both long-range and short-range single-site perturbation, one has Lifshitz tails at a band edge if and only if a suitably chosen underlying periodic operator has a nondegenerate density of states at the corresponding band edge. This result is correct only in the case of short-range potentials. In the case of long-range potentials, one finds that the Lifshitz tails hold without any assumptions on the underlying periodic potential.

More precisely, if one assumes only [1, (H.4)], then the results stated in [1, Th. 2.1] are correct. If one assumes [1, (H.4s)], then the result stated in [1, Th. 2.1] in the case [1, (H.2bis(2))] is correct if one asks for a slightly faster decay, that is, if assumption [1, (H.2bis(2))] is replaced with

(H.2bis(2)) there exists $\nu > d + 2$ such that one has, for any $\gamma \in \Gamma$ and almost every $x \in C_0$, $V > 0$ on some open set and
\[ 0 \le V(x+\gamma)\cdot(1+|\gamma|)^{\nu} \le g_+(x). \tag{0.1} \]

In the case when one has, for any $\gamma \in \Gamma$ and almost every $x \in C_0$, $V > 0$ on some open set and
\[ 0 \le V(x+\gamma)\cdot(1+|\gamma|)^{d+2} \le g_+(x), \tag{0.2} \]
then the correct statement (which is proved in [1]) is
\[ \lim_{E\to 0^+} \frac{\log(n(E) - n(0))}{\log E} = \frac{d}{2} \;\Longrightarrow\; \lim_{E\to 0^+} \frac{\log|\log(N(E) - N(0))|}{\log E} = -\frac{d}{2}. \tag{0.3} \]
Received 10 November 2000. Revision received 9 April 2001.
2000 Mathematics Subject Classification. Primary 82B44; Secondary 47B80, 60H25.
Author's work supported by European Training and Mobility of Researchers network grant number ERBFMRXCT960001.

Let us now state the correct result in the case of long-range single-site perturbations; that is, we assume

(H.2bis(1)) for some $\nu \in (d, d+2]$, there exist $0 \le g_- \le g_+$, $g_+ \in L^p(\mathbb{R}^d)$ (here $p$ is taken as in [1, (H.2)]), and $0 < g_-$ on some open set, such that, for any $\gamma \in \Gamma$ and almost every $x \in C_0$, one has
\[ g_-(x) \le V(x+\gamma)\cdot(1+|\gamma|)^{\nu} \le g_+(x). \]
Then we prove that if [1, (H.4s)] holds, then zero is a continuity point for $N$ and we have
\[ \lim_{E\to 0^+} \frac{\log|\log(N(E) - N(0))|}{\log E} = -\frac{d}{\nu - d}. \tag{0.4} \]
So (0.4) is true without any assumption on the underlying periodic operator. The proof of this result will appear elsewhere (see [2]). Note that (0.4) also implies that the converse of (0.3) is not true if one assumes only (0.2).

References
[1] F. KLOPP, Internal Lifshits tails for random perturbations of periodic Schrödinger operators, Duke Math. J. 98 (1999), 335–396. MR 2000m:82029
[2] F. KLOPP, Internal Lifshitz tails for long range single site potentials, preprint, 2001, http://zeus.math.univ-paris13.fr/~klopp/publi.html
Département de Mathématiques, Institut Galilée, Unité Mixte de Recherche (UMR) 7539 Centre National de la Recherche Scientifique (CNRS), Université de Paris-Nord, 99 avenue Jean-Baptiste Clément, F-93430 Villetaneuse, France; [email protected]
DUKE MATHEMATICAL JOURNAL Vol. 109, No. 3, © 2001
DIRECT AND INVERSE SPECTRAL PROBLEM FOR A SYSTEM OF DIFFERENTIAL EQUATIONS DEPENDING RATIONALLY ON THE SPECTRAL PARAMETER R. MENNICKEN, A. L. SAKHNOVICH, AND C. TRETTER
Abstract
A nonclassical skew-selfadjoint system of two linear differential equations is considered, which depends rationally on the spectral parameter. Systems of this type are related to the sine-Gordon equation. We introduce the notion of $W_p$-functions (Weyl functions) which are defined in a neighborhood of the poles. The main results are theorems on the existence and uniqueness of the Weyl functions, on the uniqueness of the solutions of the inverse problem, and on explicit solutions for the direct and the inverse problem.

Received 11 May 2000. Revision received 10 October 2000.
2000 Mathematics Subject Classification. Primary 34B07; Secondary 34A55, 34B20, 47E05.
Authors' work supported by the Deutsche Forschungsgemeinschaft (German Research Foundation).

1. Introduction
The classical Gelfand-Levitan-Krein-Marchenko approach to inverse spectral and scattering problems was intensely and variously developed during the last decades (see [K], [M], [LS], and the references therein). This development was stimulated essentially by the creation of the famous method of inverse scattering transformation in the theory of integrable nonlinear equations (see, e.g., [AS], [FT]). Some other interesting nonclassical (direct) spectral problems with rational dependence on the spectral parameter were studied in [AL], [LMM], and [LT] (see also the references therein). An inverse problem for an analogue of a canonical system with a rational dependence like $(\lambda - x)^{-1}$ on the spectral parameter was treated in [SaL2] and [SaL3]. Direct and inverse scattering problems for potentials with general rational dependence on the spectral parameter $\lambda$ were studied in the important paper [Z] under some natural restrictions.

In this paper we consider (2 × 2)-systems of first-order differential equations of the form
\[ y_x(x,\lambda) = i\left( \frac{b_1\,\beta_1(x)^*\beta_1(x)}{\lambda - d_1} + \frac{b_2\,\beta_2(x)^*\beta_2(x)}{\lambda - d_2} \right) y(x,\lambda), \qquad x \in [0,\infty),\ \lambda \in \mathbb{C}, \tag{1.1} \]
where $y_x = dy/dx$ denotes the derivative with respect to the variable $x$, $b_p = \pm 1$, $p = 1, 2$, and $d_p = \overline{d_p}$, $p = 1, 2$, $d_1 \ne d_2$, are constants, and the vector functions $\beta_p = (\beta_{p1}\ \ \beta_{p2})$ have the property
\[ \beta_p(x)\beta_p(x)^* = 1, \qquad x \in [0,\infty),\ p = 1, 2. \tag{1.2} \]
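Problems of the form (1.1) are intimately related to nonlinear differential equations. For example, the well-known sine-Gordon equation in laboratory coordinates
\[ \omega_{xx} - \omega_{tt} = \sin\omega \]
leads to two families of auxiliary problems (1.1) depending on the parameter $t$ with $d_1 = 1$, $d_2 = -1$, with either $b_1 = b_2 = 1$ or $b_1 = 1$, $b_2 = -1$, and with coefficients
\[ \beta_1(\cdot,t) := \frac{1}{\sqrt{2}}\,(1\ \ i)\, e^{i\omega(\cdot,t)/2}\, h(\cdot,t), \qquad \beta_2(\cdot,t) := \frac{1}{\sqrt{2}}\,(1\ \ i)\, e^{-i\omega(\cdot,t)/2}\, h(\cdot,t), \]
where the function $h$ is given by
\[ h_x = -i\left( \frac{\omega_t}{4}\, j + \frac{1}{2}\sin\frac{\omega}{2}\, J \right) h, \qquad h(0,0) = I_2, \qquad h_t = -i\left( \frac{\omega_x}{4}\, j + \frac{1}{2}\cos\frac{\omega}{2}\, J j \right) h, \]
with
\[ j := \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad J := \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \]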
For details we refer the reader to [FT], [AS], and [SaA2]. The (2 × 2)-matrix function $w(\cdot,\lambda)$ satisfying differential equation (1.1) and the normalization condition
\[ w(0,\lambda) = I_2 \tag{1.3} \]
is called the fundamental solution of (1.1). Here $I_2$ denotes the (2 × 2)-unit matrix. Important transformations of the fundamental solutions are obtained via Bäcklund-Darboux transformations. For the role of Bäcklund-Darboux transformations and different representations of the fundamental solutions and eigenfunctions in
spectral theory we refer the reader to [D], [AM], [DIKZ], [RS], and the references therein. Different generalizations of the notion of a Weyl function (especially for the nonselfadjoint case) are based on the asymptotics of the fundamental solution of differential equation (1.1) (see, e.g., [L], [BC], [Y], [BDZ], [FI], [SaA2], [GKS]). In the present paper we use the following definition.

Definition 1.1
Let $p \in \{1, 2\}$. A function $\varphi_p$ is called a $W_p$-function ($p$th Weyl function) of system (1.1) with the property (1.2) if and only if there exists an $M > 0$ such that $\varphi_p$ is defined on the complex domain
\[ D_M := \left\{ \lambda \in \mathbb{C} : \left| \lambda - d_p - i\,\frac{b_p}{M} \right| < \frac{1}{M} \right\} \]
and for all $x \in [0,\infty)$,
\[ \sup_{\lambda\in D_M} \left\| w(x,\lambda) \begin{pmatrix} \varphi_p(\lambda) \\ 1 \end{pmatrix} \right\| < \infty. \tag{1.4} \]
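The disc $D_M$ is tangent to the real axis at the pole $d_p$, and Section 2 uses the substitution $\mu = b_p/(2(\lambda - d_p))$ of (2.3), under which $\lambda \in D_M$ corresponds to $\Im(\mu) < -M/4$. The following minimal sketch tests that equivalence on random sample points; the function names, the sampling ranges, and the tolerance used to skip points near the boundary circle are illustrative assumptions.

```python
import random


def mu_of(lam, b_p, d_p):
    """The substitution mu = b_p / (2 (lambda - d_p)) from (2.3)."""
    return b_p / (2 * (lam - d_p))


def in_disc(lam, b_p, d_p, M):
    """Membership of lambda in the disc D_M of Definition 1.1."""
    return abs(lam - d_p - 1j * b_p / M) < 1 / M


def disc_matches_half_plane(trials=2000, seed=0):
    """Compare 'lambda in D_M' with 'Im(mu) < -M/4' on random samples."""
    rng = random.Random(seed)
    for _ in range(trials):
        b_p = rng.choice([1, -1])
        d_p = rng.uniform(-2.0, 2.0)
        M = rng.uniform(0.5, 5.0)
        lam = complex(d_p + rng.uniform(-3, 3) / M, rng.uniform(-3, 3) / M)
        # Skip points too close to the boundary circle to decide in floats.
        if abs(abs(lam - d_p - 1j * b_p / M) - 1 / M) < 1e-6:
            continue
        if in_disc(lam, b_p, d_p, M) != (mu_of(lam, b_p, d_p).imag < -M / 4):
            return False
    return True


print(disc_matches_half_plane())
```

For example, with $b_p = 1$, $d_p = 0$, $M = 2$, the centre $\lambda = i/2$ of the disc maps to $\mu = -i$, whose imaginary part is below $-M/4 = -1/2$.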
It turns out that the Weyl functions are closely related to the spectral properties of a certain auxiliary system associated with system (1.1). For example, if $\lambda_0$ is a zero of a $W_p$-function, then $\mu_0 = b_p/(2(\lambda_0 - d_p))$ is an eigenvalue of this auxiliary system. We also would like to mention that for the system (1.1) induced by the sine-Gordon equation the evolution of the Weyl functions $\varphi_p(\cdot,t)$, $p = 1, 2$, can be described via the boundary values $w(0,t)$ and $w_x(0,t)$.

The present paper is organized as follows. In Section 2 we prove the existence of unique Weyl functions $\varphi_1$, $\varphi_2$ of system (1.1). In Section 3 we show the uniqueness of the solution of the inverse problem, which consists of recovering the coefficient functions $\beta_p$ of system (1.1) from its $W_p$-functions. The scheme used here may be used without modifications for a system with $r$ summands $b_p\beta_p(x)^*\beta_p(x)$ instead of two as in (1.1). In Section 4 explicit solutions of the direct and the inverse spectral problem are established for $\lambda$-rational systems (1.1) as given in [GKS] for pseudocanonical systems with linear dependence on the spectral parameter.

2. Existence and uniqueness of the Weyl functions (direct problem)
Suppose that the functions $\beta_p$, $p = 1, 2$, are absolutely continuous and that
\[ \sup_{0<x<\infty} \|\beta_p'(x)\| < \infty, \qquad p = 1, 2. \tag{2.1} \]
In order to study system (1.1), we first construct two auxiliary systems of differential equations for $p = 1, 2$ as follows. For fixed $p \in \{1, 2\}$, we define
\[ W(x,\mu) := W_p(x,\mu) := e^{-ix\mu}\, Q(x)\, w(x,\lambda)\, Q(0)^*, \tag{2.2} \]
where
\[ \mu = \frac{b_p}{2(\lambda - d_p)} \tag{2.3} \]
and $Q$ is the (2 × 2)-matrix function given by
\[ Q(x) := Q_p(x) := \begin{pmatrix} \beta_{p1}(x) & \beta_{p2}(x) \\ -\overline{\beta_{p2}(x)} & \overline{\beta_{p1}(x)} \end{pmatrix}, \qquad x \in [0,\infty), \tag{2.4} \]
which has unitary values, that is,
\[ Q(x)^* Q(x) = Q(x) Q(x)^* = I_2, \qquad x \in [0,\infty). \tag{2.5} \]
Notice that by (1.2) and (2.4),
\[ i\,\frac{b_p}{\lambda - d_p}\, Q(x)\beta_p(x)^*\beta_p(x) Q(x)^* = 2i\mu \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}. \tag{2.6} \]
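Identities (2.5) and (2.6) rest on the fact that $Q\beta_p^*$ is the first unit vector whenever $\beta_p\beta_p^* = 1$. The following minimal sketch checks this numerically for one sample unit row vector; the parametrization `unit_row` and the helper names are illustrative, and the second row of $Q$ is taken with complex conjugates, as unitarity requires.

```python
import cmath
import math


def unit_row(theta, a, b):
    """A row vector (beta1, beta2) with |beta1|^2 + |beta2|^2 = 1."""
    return (math.cos(theta) * cmath.exp(1j * a),
            math.sin(theta) * cmath.exp(1j * b))


def Q_matrix(beta):
    """The matrix Q of (2.4) built from the row vector beta."""
    b1, b2 = beta
    return [[b1, b2], [-b2.conjugate(), b1.conjugate()]]


def dagger(A):
    """Conjugate transpose of a 2x2 matrix."""
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]


def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def rank_one(beta):
    """The 2x2 matrix beta^* beta (column times row)."""
    return [[beta[i].conjugate() * beta[j] for j in range(2)]
            for i in range(2)]


def max_dev(A, B):
    """Largest entrywise absolute difference between two 2x2 matrices."""
    return max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2))


beta = unit_row(0.7, 0.3, -1.1)
Q = Q_matrix(beta)
unitary_defect = max_dev(matmul(Q, dagger(Q)), [[1, 0], [0, 1]])
projector_defect = max_dev(
    matmul(matmul(Q, rank_one(beta)), dagger(Q)), [[1, 0], [0, 0]]
)
print(unitary_defect, projector_defect)
```

Both defects vanish up to rounding, so multiplying by $i b_p/(\lambda - d_p) = 2i\mu$ gives the right-hand side of (2.6).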
Using the facts that $w(\cdot,\lambda)$ is a solution of system (1.1) and that $Q$ fulfills (2.6), we obtain that $W(\cdot,\mu)$ is a solution of the system
\[ W_x(x,\mu) = \bigl(i\mu j + \xi(x,\mu)\bigr) W(x,\mu), \qquad x \in [0,\infty), \tag{2.7} \]
where
\[ \xi(x,\mu) := Q'(x)Q(x)^* + i\,\frac{b_k}{b_p/(2\mu) + d_p - d_k}\, Q(x)\beta_k(x)^*\beta_k(x)Q(x)^*, \qquad x \in [0,\infty),\ \mu \in \mathbb{C}, \tag{2.8} \]
with $k \in \{1, 2\}$, $k \ne p$. The system (2.7) associated with (1.1) has the form of a canonical (Dirac-type) system, but with a more complicated coefficient $\xi$ which depends on $\mu$.

In what follows we denote by $\Re$ and $\Im$ the real and imaginary part of a complex number. According to (1.2) and (2.1), we can choose a value $M > 0$ such that the inequality
\[ \sup_{x\in[0,\infty),\ \Im(\mu)<-M/4} \|\xi(x,\mu)\| < \frac{M}{4} \tag{2.9} \]
holds. By (1.3), (2.2), and (2.5), we have $W(0,\mu) = I_2$, that is, $W(\cdot,\mu)$ is the fundamental solution of system (2.7). In the sequel we need the following representation of $W(\cdot,\mu)$.
THEOREM 2.1
Let $\beta_p$, $p = 1, 2$, be absolutely continuous (1 × 2)-vector functions satisfying (1.2) with bounded first derivatives. Let $W(\cdot,\mu) = W_p(\cdot,\mu)$ be the fundamental solution of system (2.7) with $\xi$ of form (2.8). Then $W(\cdot,\mu)$ admits a representation
\[ W(x,\mu) = e^{i\mu x j}\left( D_0(x) + \left(\mu - \frac{b_p}{2(d_k - d_p)}\right)^{-1} D_k(x) \right) + \int_{-x}^{x} e^{i\mu u} N(x,u)\,du + \left(\mu - \frac{b_p}{2(d_k - d_p)}\right)^{-1} \int_{-x}^{x} e^{i\mu u} N_k(x,u)\,du + O(\mu^{-2}) \tag{2.10} \]
for $\mu = \zeta + i\eta$, $\eta \ne 0$, $|\zeta| \to \infty$, where $k \in \{1, 2\}$, $k \ne p$, $D_0$, $D_k$ are continuous diagonal matrix functions, $D_0^* = D_0^{-1}$, and for any $l \in [0,\infty)$,
\[ \sup_{x\in[|u|,\,l]} \bigl(\|N(x,u)\| + \|N_k(x,u)\|\bigr) < \infty. \tag{2.11} \]
The proof of this theorem is similar to the construction of the transformation operator for the classical Dirac-type system (see [SaL1]) and is given in the appendix. Now we can formulate the main theorem of this section.

THEOREM 2.2
Let (1.1) be a system with coefficients $\beta_p$, $p = 1, 2$, which are absolutely continuous vector functions satisfying (1.2), (2.1), and the additional condition
\[ \beta_{p1}(0) \ne 0, \qquad p = 1, 2. \tag{2.12} \]
Then there exist unique $W_p$-functions $\varphi_p$, $p = 1, 2$, of (1.1).

We would like to mention that the spectral problems (direct and inverse) for system (1.1) can be treated analogously if we assume $\beta_{p2}(0) \ne 0$ instead of $\beta_{p1}(0) \ne 0$ for $p = 1, 2$.

Proof
We fix $p \in \{1, 2\}$. From differential equation (2.7) for $W(\cdot,\mu)$ and estimate (2.9) it follows that
\[ \bigl(W(x,\mu)^* j W(x,\mu)\bigr)_x = W(x,\mu)^* \bigl( i(\mu - \bar\mu) I_2 + j\xi(x,\mu) + \xi(x,\mu)^* j \bigr) W(x,\mu) > 0 \tag{2.13} \]
for $x \in [0,\infty)$ when $\Im(\mu) < -M/4$. By (2.3) it is easy to see that the inequality $\Im(\mu) < -M/4$ is equivalent to $|\lambda - d_p - (ib_p/M)| < 1/M$, that is, $\lambda \in D_M$. Since $W(0,\mu) = I_2$, (2.13) implies that $W(x,\mu)^* j W(x,\mu) \ge j$, or, equivalently,
\[ \bigl(W(x,\mu)^{-1}\bigr)^* j\, W(x,\mu)^{-1} \le j, \qquad \Im(\mu) < -\frac{M}{4}, \tag{2.14} \]
that is, $W(x,\mu)^{-1}$ is $j$-contractive. We set
\[ W(x,\mu)^{-1} =: \begin{pmatrix} v_{11}(x,\mu) & v_{12}(x,\mu) \\ v_{21}(x,\mu) & v_{22}(x,\mu) \end{pmatrix}. \tag{2.15} \]
For a fixed $x \in [0,\infty)$ we define a family of linear fractional transformations
\[ \psi_p(x,\mu) := \frac{v_{11}(x,\mu)P(\mu) + v_{12}(x,\mu)}{v_{21}(x,\mu)P(\mu) + v_{22}(x,\mu)}, \tag{2.16} \]
where $P$ is an analytic function which is bounded by 1, and we denote this family by
\[ \mathcal{N}(x) := \{\psi_p(x,\cdot) : P \text{ analytic},\ |P(\mu)| \le 1,\ \Im(\mu) < -M/4\}. \]
Note that according to (2.14) and (2.15) the denominator in (2.16) does not vanish and, moreover, $|\psi_p(x,\mu)| \le 1$ since $W^{-1}$ is $j$-contractive. From (2.13) we infer
\[ W(x_1,\mu)^* j W(x_1,\mu) > W(x_2,\mu)^* j W(x_2,\mu), \qquad x_1 > x_2, \tag{2.17} \]
that is, $W(x_2,\mu)W(x_1,\mu)^{-1}$ is $j$-contractive. From this observation it follows that
\[ \mathcal{N}(x_1) \subset \mathcal{N}(x_2), \qquad x_1 > x_2. \tag{2.18} \]
If we let
\[ R(x,\mu) := \begin{pmatrix} r_{11}(x,\mu) & r_{12}(x,\mu) \\ r_{21}(x,\mu) & r_{22}(x,\mu) \end{pmatrix} := W(x,\mu)^* j W(x,\mu), \tag{2.19} \]
we see that by definition (2.16) of $\psi_p$ the condition $\psi_p(x,\cdot) \in \mathcal{N}(x)$ is equivalent to
\[ \begin{pmatrix} \overline{\psi_p(x,\mu)} & 1 \end{pmatrix} R(x,\mu) \begin{pmatrix} \psi_p(x,\mu) \\ 1 \end{pmatrix} \le 0, \qquad \Im(\mu) < -\frac{M}{4}. \tag{2.20} \]
Hence we can parametrize the values $\psi_p(x,\cdot) \in \mathcal{N}(x)$ in the form of a so-called Weyl disc,
\[ \psi_p(x,\mu) = \rho_1(x,\mu)^{-1/2}\, u(x,\mu)\, \rho_2(x,\mu)^{-1/2} + \rho_0(x,\mu), \qquad x \in [0,\infty),\ \Im(\mu) < -\frac{M}{4}, \tag{2.21} \]
with the center of the disc given by $\rho_0(x,\mu) = -r_{11}(x,\mu)^{-1} r_{12}(x,\mu)$, the radii $\rho_1^{-1/2}$ and $\rho_2^{-1/2}$ given by
\[ \rho_1(x,\mu) = r_{11}(x,\mu), \qquad \rho_2(x,\mu) = \bigl( r_{21}(x,\mu) r_{11}(x,\mu)^{-1} r_{12}(x,\mu) - r_{22}(x,\mu) \bigr)^{-1}, \]
and a bounded parameter function $|u(x,\mu)| \le 1$. By means of (2.17) and the definition of $R(\cdot,\mu)$ (see (2.19), and noting that $r_{21}(\cdot,\mu) = r_{12}(\cdot,\mu)^*$), it is easy to see that the functions $\rho_1^{-1/2}$ and $\rho_2^{-1/2}$ are decreasing. Moreover, by (2.9) and (2.13) we have
\[ R(x,\mu) \ge j - 2\left(\Im(\mu) + \frac{M}{4}\right) \int_0^x W(s,\mu)^* W(s,\mu)\,ds. \tag{2.22} \]
In particular, as $W(s,\mu)^* W(s,\mu) \ge W(s,\mu)^* j W(s,\mu) \ge j$, formula (2.22) yields
\[ \rho_1(x,\mu) = r_{11}(x,\mu) \ge 1 - 2x\left(\Im(\mu) + \frac{M}{4}\right) \to \infty, \qquad x \to \infty. \tag{2.23} \]
Therefore, for $\Im(\mu) < -M/4$, the discs of functions $\mathcal{N}(x)$ converge to a point (the so-called Weyl point):
\[ \bigcap_{0\le x<\infty} \mathcal{N}(x) = \lim_{x\to\infty} \rho_0(x,\cdot) =: \psi_p(\cdot). \tag{2.24} \]
≤ 0,
=(µ) < −
M , 4
(2.25)
for all x ≥ 0. Therefore ψ p (µ) i(µ−µ)x e ≤ κ(x, µ) e M x , ψ p (µ) 1 W (x, µ) W (x, µ) 1
∗
(2.26)
where ψ p (µ) (i(µ−µ)−M)x κ(x, µ) := ψ p (µ) 1 W (x, µ)∗ (I2 − j)W (x, µ) e . 1 On the other hand, by (2.7), (2.9), and (2.25), we see that κ 0 ( · , µ) < 0, and together with (2.26) we obtain ψ p (µ) i(µ−µ)x ∗ e ≤ κ(0, µ) e M x = 2 e M x . ψ p (µ) 1 W (x, µ) W (x, µ) 1 (2.27) From (2.2) and (2.27) it follows that
√
w(x, λ)Q(0)∗ ψ p (µ) ≤ 2e M x/2 < ∞ sup
1 =(µ)<−M/4 for all x ≥ 0. Taking into account definition (2.4) of Q, we find
β (0)ψ (µ)−β (0) p p1 p2
β p2 (0)ψ p (µ) + β p1 (0) < ∞. β (0)ψ (µ)+β (0) p p2 p1 sup w(x, λ)
=(µ)<−M/4
1 (2.28)
Next we show that
\[ \lim_{\Im(\mu)\to-\infty} \psi_p(\mu) = 0. \tag{2.29} \]
For this purpose we need some auxiliary considerations. By Theorem 2.1 on the representation of $W(x,\mu)$, we obtain for $\mu = \zeta + i\eta$, $\eta < -M/4$,
\[ \lim_{\zeta\to\infty} e^{-i\mu x} W_{11}(x,\mu) =: d(x) \ne 0, \qquad \int_{-\infty}^{\infty} |W_{12}(x,\mu)|^2\,d\zeta < \infty. \tag{2.30} \]
The function
\[ \widehat\psi(x,\mu) := -\frac{W_{12}(x,\mu)}{W_{11}(x,\mu)} = \frac{v_{12}(x,\mu)}{v_{22}(x,\mu)} \tag{2.31} \]
belongs to $\mathcal{N}(x)$ (with $P \equiv 0$) and is hence bounded (by 1) on the domain $\{\mu \in \mathbb{C} : \Im(\mu) < -M/4\}$. In view of (2.30) we also have
\[ \int_{-\infty}^{\infty} |\widehat\psi(x,\mu)|^2\,d\zeta < \infty, \qquad \zeta = \Re(\mu),\ \Im(\mu) < -\frac{M}{4}. \tag{2.32} \]
By a Phragmén-Lindelöf-type theorem (see, e.g., [PW, Theorem VIII]), $\widehat\psi(x,\cdot)$ has a representation
\[ \widehat\psi(x,\mu) = \int_0^\infty e^{-is\mu} g(s)\,ds \tag{2.33} \]
if $\Im(\mu) < -M/4$, where the function $h$ given by $h(s) := e^{\eta s} g(s)$ belongs to $L^2(0,\infty)$. Thus $\lim_{\eta\to-\infty} \widehat\psi(x,\mu) = 0$, and by (2.23) all the functions from $\mathcal{N}(x)$, including $\psi_p(x,\cdot)$, have the same limit for $\eta \to -\infty$ and this limit is uniform in $\Re(\mu)$. This proves (2.29) and proves that the limit therein is uniform in $\Re(\mu)$.

By (2.29) and by the assumption that $\beta_{p1}(0) \ne 0$, $p = 1, 2$ (see (2.12)), we can choose $M > 0$ sufficiently large such that
\[ \left| \overline{\beta_{p2}(0)}\,\psi_p(\mu) + \beta_{p1}(0) \right| > \varepsilon, \qquad \Im(\mu) < -\frac{M}{4}. \tag{2.34} \]
Now we introduce the function
\[ \varphi_p(\lambda) := \frac{\overline{\beta_{p1}(0)}\,\psi_p(\mu) - \beta_{p2}(0)}{\overline{\beta_{p2}(0)}\,\psi_p(\mu) + \beta_{p1}(0)}, \tag{2.35} \]
where $\lambda$ and $\mu$ are related by (2.3). (Recall that $\Im(\mu) < -M/4$ if and only if $\lambda \in D_M$.) Then formulas (2.28) and (2.34) imply that $\varphi_p$ satisfies condition (1.4) and is hence a $W_p$-function of system (1.1).
It remains to be proved that the $W_p$-function is unique. To this end, suppose that $\tilde\varphi_p$ is another $W_p$-function of (1.1) defined on a domain $D_{\tilde M}$. Without loss of generality, we can assume that $\tilde M \le M$, which implies that $D_M \subset D_{\tilde M}$. Then it follows from (1.4) and (2.2) that for each $x > 0$,
\[ \sup_{\Im(\mu)<-M/4} \left\| e^{ix\mu} Q(x)^* W(x,\mu) Q(0) \begin{pmatrix} \varphi_p(\lambda) & \tilde\varphi_p(\lambda) \\ 1 & 1 \end{pmatrix} \right\| < \infty. \tag{2.36} \]
By (2.29) and definition (2.35) of $\varphi_p$, we have
\[ \lim_{\Im(\mu)\to-\infty} \varphi_p(\lambda) = -\frac{\beta_{p2}(0)}{\beta_{p1}(0)}. \tag{2.37} \]
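Inequality (2.23) implies that
\[ \lim_{\Im(\mu)\to-\infty} \bigl( |W_{11}(x,\mu)|^2 - |W_{21}(x,\mu)|^2 \bigr) = \lim_{\Im(\mu)\to-\infty} r_{11}(x,\mu) = +\infty \]
uniformly in $\Re(\mu)$ and hence
\[ \lim_{\Im(\mu)\to-\infty} |W_{11}(x,\mu)| = \infty \tag{2.38} \]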
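uniformly in $\Re(\mu)$. Relations (2.36)–(2.38) yield
\[ \lim_{\Im(\mu)\to-\infty} \tilde\varphi_p(\lambda) = \lim_{\Im(\mu)\to-\infty} \varphi_p(\lambda) = -\frac{\beta_{p2}(0)}{\beta_{p1}(0)}. \tag{2.39} \]
By (2.35) we have
\[ \psi_p(\mu) = \frac{\beta_{p1}(0)\varphi_p(\lambda) + \beta_{p2}(0)}{-\beta_{p2}(0)\varphi_p(\lambda) + \beta_{p1}(0)}, \tag{2.40} \]
and we put
\[ \tilde\psi_p(\mu) := \frac{\beta_{p1}(0)\tilde\varphi_p(\lambda) + \beta_{p2}(0)}{-\beta_{p2}(0)\tilde\varphi_p(\lambda) + \beta_{p1}(0)}. \tag{2.41} \]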
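The passage from (2.35) to (2.40) is the inversion of a linear fractional map. The following minimal sketch checks this numerically for the formulas as they are printed here; the names `phi_from_psi` and `psi_from_phi` and the sample values are hypothetical, and any complex conjugations that the original typesetting may carry on the coefficients are not modelled.

```python
def phi_from_psi(psi, b1, b2):
    """Formula (2.35) as printed: phi = (b1*psi - b2) / (b2*psi + b1)."""
    return (b1 * psi - b2) / (b2 * psi + b1)


def psi_from_phi(phi, b1, b2):
    """Formula (2.40) as printed: psi = (b1*phi + b2) / (-b2*phi + b1)."""
    return (b1 * phi + b2) / (-b2 * phi + b1)


# Hypothetical sample data: b1, b2 stand for beta_p1(0), beta_p2(0), b1 != 0.
b1, b2 = 0.8, 0.6
psi = 0.3 - 0.2j

phi = phi_from_psi(psi, b1, b2)
# (2.40) should undo (2.35).
roundtrip_error = abs(psi_from_phi(phi, b1, b2) - psi)
# psi -> 0 should give the limit -b2/b1 stated in (2.37).
limit_error = abs(phi_from_psi(0.0, b1, b2) - (-b2 / b1))
```

The round trip only needs $\beta_{p1}(0)^2 + \beta_{p2}(0)^2 \neq 0$ in this unconjugated form, and the second check mirrors how (2.29) leads to (2.37).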
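According to (2.36) and (2.39)–(2.41), we can choose $M$ sufficiently large such that
\[ \sup_{\Im(\mu)<-M/4} \left\| e^{ix\mu} W(x,\mu) \begin{pmatrix} \psi_p(\mu) & \tilde\psi_p(\mu) \\ 1 & 1 \end{pmatrix} \right\| < \infty \]
for each $x > 0$, that is,
\[ \sup_{\Im(\mu)<-M/4} \left\| e^{ix\mu} W(x,\mu) \begin{pmatrix} h(\mu) \\ 0 \end{pmatrix} \right\| < \infty \quad\text{with}\quad h(\mu) := \psi_p(\mu) - \tilde\psi_p(\mu). \tag{2.42} \]
By (2.39)–(2.41) the function $h$ is bounded on $\{\mu \in \mathbb{C} : \Im(\mu) < -M/4\}$. Hence, again by [PW, Theorem VIII],
\[ h(\mu) = \mu \int_0^{\infty} e^{-it\mu} f(t)\,dt, \qquad \Im(\mu) < -\frac{M}{4}, \tag{2.43} \]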
where the function $\tilde h$ given by $\tilde h(s) := e^{\eta s} f(s)$ belongs to $L^2(0,\infty) \cap L^1(0,\infty)$. On the other hand, by (2.38) and (2.42) we have
\[ \sup_{\Im(\mu)<-M/4} \bigl| e^{ix\mu} h(\mu) \bigr| < \infty \tag{2.44} \]
for each $x > 0$. From (2.43) and (2.44) it follows that $h \equiv 0$, that is, $\tilde\psi \equiv \psi$ and hence $\tilde\varphi_p = \varphi_p$.
From the proof of Theorem 2.2 we derive the following auxiliary lemma.
LEMMA 2.3
Suppose the conditions of Theorem 2.2 are fulfilled. Let $\psi_p(x,\cdot)$ and $\hat\psi_p(x,\cdot)$ be two arbitrary functions from $N(x)$ for a fixed $x > 0$. Then
\[ |\psi_p(x,\mu) - \hat\psi_p(x,\mu)| \le 2\, e^{(i(\bar\mu-\mu)+(M/2))x}, \qquad \Im(\mu) < -\frac{M}{4}. \tag{2.45} \]
Proof
According to (2.21), we have
\[ |\psi_p(x,\mu) - \hat\psi_p(x,\mu)| \le 2\, |\rho_1(x,\mu)\rho_2(x,\mu)|^{-1/2}, \qquad \Im(\mu) < -\frac{M}{4}. \tag{2.46} \]
From (2.7), (2.19), and (2.9) it follows that for $\Im(\mu) < -M/4$,
\[ \bigl( e^{(i(\bar\mu-\mu)+(M/2))x} R(x,\mu) \bigr)_x > 0, \qquad -\bigl( e^{(i(\bar\mu-\mu)+(M/2))x} R(x,\mu)^{-1} \bigr)_x > 0. \tag{2.47} \]
Note that the element $(R(x,\mu)^{-1})_{22}$ equals $(r_{22}(x,\mu) - r_{21}(x,\mu) r_{11}(x,\mu)^{-1} r_{12}(x,\mu))^{-1}$, and recall that
\[ \rho_1(x,\mu) = r_{11}(x,\mu), \qquad \rho_2(x,\mu) = -\bigl( r_{22}(x,\mu) - r_{21}(x,\mu) r_{11}(x,\mu)^{-1} r_{12}(x,\mu) \bigr)^{-1}. \tag{2.48} \]
Hence from (2.47) and (2.48) it follows that
\[ e^{(i(\bar\mu-\mu)+(M/2))x} \rho_1(x,\mu) \ge r_{11}(0,\mu) = 1, \qquad e^{(i(\bar\mu-\mu)+(M/2))x} \rho_2(x,\mu) \ge -(R(0,\mu)^{-1})_{22} = 1, \]
that is,
\[ \rho_1(x,\mu) \ge e^{(i(\mu-\bar\mu)-(M/2))x}, \qquad \rho_2(x,\mu) \ge e^{(i(\mu-\bar\mu)-(M/2))x}, \tag{2.49} \]
which, together with (2.46), yields the desired estimate (see (2.45)).
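As a small worked illustration of why the bound in Lemma 2.3 is useful, the sketch below evaluates its right-hand side under the assumption that the exponent reads $i(\bar\mu - \mu) + M/2$, which equals $2\eta + M/2$ for $\mu = \zeta + i\eta$ and is negative when $\eta < -M/4$. The function name and the sample values are hypothetical.

```python
import math


def lemma_2_3_bound(mu: complex, M: float, x: float) -> float:
    """Right-hand side of (2.45), read as 2*exp((i*(conj(mu) - mu) + M/2) * x)."""
    eta = mu.imag
    # i*(conj(mu) - mu) = i*(-2i*eta) = 2*eta, so the exponent is real.
    return 2.0 * math.exp((2.0 * eta + M / 2.0) * x)


M = 4.0
mu = 1.0 - 3.0j  # hypothetical point with Im(mu) = -3 < -M/4 = -1
bounds = [lemma_2_3_bound(mu, M, x) for x in (0.0, 1.0, 2.0)]
```

With these values the bound starts at 2 for $x = 0$ and decays exponentially in $x$, which is the behaviour the estimates following (3.8) rely on.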
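The following theorem deals with the connection of the Weyl functions with the spectral properties of problem (2.7).
THEOREM 2.4
Let $p \in \{1, 2\}$, and let $\varphi_p$ be the $W_p$-function of system (1.1) satisfying conditions (1.2), (2.1), and (2.12). Then the inequality
\[ \int_0^{\infty} \begin{pmatrix} \overline{\varphi_p(\lambda)} & 1 \end{pmatrix} \hat W(s,\mu)^* \hat W(s,\mu) \begin{pmatrix} \varphi_p(\lambda) \\ 1 \end{pmatrix} ds < \infty, \qquad \Im(\mu) < -\frac{M}{4}, \tag{2.50} \]
holds, where
\[ \hat W(x,\mu) := e^{-ix\mu} Q(x) w(x,\lambda), \qquad \hat W(0,\mu) = Q(0), \tag{2.51} \]
is the solution of system (2.7) equivalent to system (1.1). Moreover, if $\lambda_r \in D_M$ is a zero of the function $\varphi_p$, then $\mu_r = b_p/(2(\lambda_r - d_p))$ is an eigenvalue of system (2.7), that is, of the operator $L$ in $L^2(0,\infty)^2$ given by
\[ L := -ij\frac{d}{dx} - \xi(\,\cdot\,,\mu) \]
on the domain
\[ D(L) := \bigl\{ f \in W_2^1(0,\infty)^2 : \begin{pmatrix} \beta_{p1}(0) & -\beta_{p2}(0) \end{pmatrix} f(0) = 0 \bigr\}, \]
and the corresponding eigenfunction is given by $\hat W(x,\mu_r) \begin{pmatrix} 0 \\ 1 \end{pmatrix}$.
Proof
From estimates (2.22) and (2.25) it follows that
\[ \int_0^{x} \begin{pmatrix} \overline{\psi_p(\mu)} & 1 \end{pmatrix} W(s,\mu)^* W(s,\mu) \begin{pmatrix} \psi_p(\mu) \\ 1 \end{pmatrix} ds \le \frac{1 - |\psi_p(\mu)|^2}{-2(\Im(\mu) + (M/4))} \]
for $\Im(\mu) < -M/4$. Since the right-hand side does not depend on $x$, this gives
\[ \int_0^{\infty} \begin{pmatrix} \overline{\psi_p(\mu)} & 1 \end{pmatrix} W(s,\mu)^* W(s,\mu) \begin{pmatrix} \psi_p(\mu) \\ 1 \end{pmatrix} ds < \infty. \tag{2.52} \]
In view of (2.4) and (2.35) we have
\[ Q(0) \begin{pmatrix} \varphi_p(\lambda) \\ 1 \end{pmatrix} \bigl( \beta_{p2}(0)\psi_p(\mu) + \beta_{p1}(0) \bigr) = Q(0)Q(0)^* \begin{pmatrix} \psi_p(\mu) \\ 1 \end{pmatrix} = \begin{pmatrix} \psi_p(\mu) \\ 1 \end{pmatrix}. \tag{2.53} \]
If we substitute (2.53) into (2.52), invoke differential equation (2.2) for $W(\,\cdot\,,\mu)$, and estimate (2.34), we obtain (2.50) and (2.51).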
3. Inverse problem (uniqueness of the solution)
The inverse problem consists of recovering the coefficients $\beta_p$, $p = 1, 2$, of system (1.1) with properties (1.2), (2.1), and (2.12) from its $W_1$-function $\varphi_1$ and from its $W_2$-function $\varphi_2$ when the constants $b_p$ and $d_p$, $p = 1, 2$, are given. We note that system (1.1) does not change (more exactly, the expressions $\beta_p(x)^*\beta_p(x)$ do not change) if we multiply the vector functions $\beta_p$ by scalars $c_p$ of modulus 1. Therefore we can assume without loss of generality that
\[ \beta_{p1} > 0, \qquad p = 1, 2. \tag{3.1} \]
THEOREM 3.1
For given $W_p$-functions $\varphi_p$, $p = 1, 2$, there is at most one system (1.1) satisfying conditions (1.2), (2.1), and (3.1).
Proof
Suppose there are two systems, (1.1) and
\[ \tilde y_x(x,\lambda) = i \left( \frac{b_1 \tilde\beta_1(x)^* \tilde\beta_1(x)}{\lambda - d_1} + \frac{b_2 \tilde\beta_2(x)^* \tilde\beta_2(x)}{\lambda - d_2} \right) \tilde y(x,\lambda), \qquad x \in [0,\infty),\ \lambda \in \mathbb{C}, \tag{3.2} \]
that satisfy the conditions of the theorem with the same $W_p$-functions $\varphi_p$, $p = 1, 2$. We denote all quantities connected with system (3.2) with a tilde (e.g., $\tilde Q(x)$, $\tilde w(x,\lambda)$, etc.). Without loss of generality, we may assume that for the constants $M$ and $\tilde M$ in the definition of the Weyl functions we have $\tilde M \le M$ (which implies $D_M \subset D_{\tilde M}$). We recall that by Theorem 2.2 the $W_p$-functions $\varphi_p$, $p = 1, 2$, are unique for each system.
In what follows we use some of the properties of $\varphi_p$ which have been established in the proof of Theorem 2.2. In particular, by (2.37) it follows that
\[ \frac{\tilde\beta_{p2}(0)}{\tilde\beta_{p1}(0)} = \frac{\beta_{p2}(0)}{\beta_{p1}(0)}. \tag{3.3} \]
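As $\beta_p(0)\beta_p(0)^* = \tilde\beta_p(0)\tilde\beta_p(0)^* = 1$ by (1.2), and $\beta_{p1}(0) > 0$, $p = 1, 2$, formula (3.3) yields
\[ \beta_{p1}(0) = \sqrt{ \left( 1 + \frac{\beta_{p2}(0)\overline{\beta_{p2}(0)}}{\beta_{p1}(0)^2} \right)^{-1} } = \sqrt{ \left( 1 + \frac{\tilde\beta_{p2}(0)\overline{\tilde\beta_{p2}(0)}}{\tilde\beta_{p1}(0)^2} \right)^{-1} } = \tilde\beta_{p1}(0). \tag{3.4} \]
According to (3.3) and (3.4), we obtain
\[ \beta_p(0) = \tilde\beta_p(0), \qquad Q(0) = \tilde Q(0). \tag{3.5} \]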
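The step from (3.3) to (3.5) says that a unit vector with positive first entry is determined by the ratio of its entries. A minimal sketch of that recovery follows; the function name `beta_from_ratio` and the sample ratio are hypothetical.

```python
import math


def beta_from_ratio(r: complex):
    """Recover (beta1, beta2) from r = beta2/beta1, assuming beta1 > 0 and
    |beta1|^2 + |beta2|^2 = 1, as in (3.4)."""
    beta1 = 1.0 / math.sqrt(1.0 + abs(r) ** 2)
    return beta1, r * beta1


ratio = 0.5 + 1.5j  # hypothetical value of beta_p2(0)/beta_p1(0)
beta1, beta2 = beta_from_ratio(ratio)
norm_sq = beta1 ** 2 + abs(beta2) ** 2
```

Two systems that share the ratio in (3.3) therefore share the whole initial vector, which is the content of (3.5).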
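Now, consider $w(x,\lambda)$. Due to (2.35) we have
\[ \begin{pmatrix} \varphi_p(\lambda) \\ 1 \end{pmatrix} = Q(0)^* \begin{pmatrix} \psi_p(\mu) \\ 1 \end{pmatrix} \bigl( \beta_{p2}(0)\psi_p(\mu) + \beta_{p1}(0) \bigr)^{-1}. \tag{3.6} \]
As $\psi_p$ is bounded, (1.4) and (3.6) imply
\[ \sup_{\lambda \in D_M} \left\| w(x,\lambda) Q(0)^* \begin{pmatrix} \psi_p(\mu) \\ 1 \end{pmatrix} \right\| < \infty \tag{3.7} \]
for each $x > 0$. According to (2.7) and (2.9), we obtain
\[ \bigl( e^{(i(\bar\mu-\mu)-(M/2))x}\, W(x,\mu)^* W(x,\mu) \bigr)_x < 0, \qquad \Im(\mu) < 0, \]
and consequently,
\[ \sup_{\Im(\mu)<0} \bigl\| e^{-i\mu x} W(x,\mu) \bigr\| < \infty. \]
In view of (2.2) the last relation yields
\[ \sup_{\Im(\mu)<0} \bigl\| e^{-2i\mu x} w(x,\lambda) \bigr\| < \infty. \tag{3.8} \]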
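By Lemma 2.3 and (3.8), estimate (3.7) also holds with the function $\hat\psi_p$ given by (2.31) instead of $\psi_p$, that is,
\[ \sup_{\Im(\mu)<-M/4} \left\| w(x,\lambda) Q(0)^* \begin{pmatrix} \hat\psi_p(\mu) \\ 1 \end{pmatrix} \right\| < \infty. \tag{3.9} \]
Inequalities (3.8) and (3.9) together yield
\[ \sup_{\Im(\mu)<-M/4} \bigl\| w(x,\lambda) Q(0)^* \Psi(x,\mu) \bigr\| < \infty \quad\text{with}\quad \Psi(x,\mu) := \begin{pmatrix} e^{-2i\mu x} & \hat\psi_p(x,\mu) \\ 0 & 1 \end{pmatrix}. \tag{3.10} \]
Analogously, we obtain
\[ \sup_{\Im(\mu)<-M/4} \bigl\| \tilde w(x,\lambda) \tilde Q(0)^* \tilde\Psi(x,\mu) \bigr\| < \infty \quad\text{with}\quad \tilde\Psi(x,\mu) := \begin{pmatrix} e^{-2i\mu x} & \hat{\tilde\psi}_p(x,\mu) \\ 0 & 1 \end{pmatrix}. \tag{3.11} \]
Moreover, according to (3.5), inequality (3.11) also holds with $Q$ instead of $\tilde Q$. As $\varphi_1 = \tilde\varphi_1$, $\varphi_2 = \tilde\varphi_2$, and $Q(0) = \tilde Q(0)$, we also have $\psi_p(\mu) = \tilde\psi_p(\mu)$. Applying Lemma 2.3 to both systems gives
\[ \sup_{\Im(\mu)<-M/4} \bigl| \hat\psi_p(x,\mu) - \hat{\tilde\psi}_p(x,\mu) \bigr| \le \sup_{\Im(\mu)<-M/4} \Bigl( \bigl| \hat\psi_p(x,\mu) - \psi_p(\mu) \bigr| + \bigl| \hat{\tilde\psi}_p(x,\mu) - \tilde\psi_p(\mu) \bigr| \Bigr) \le 4\, e^{(i(\bar\mu-\mu)+(M/2))x}. \tag{3.12} \]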
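In view of the inequality analogous to (3.8),
\[ \sup_{\Im(\mu)<0} \bigl\| e^{-2i\mu x} \tilde w(x,\lambda) \bigr\| < \infty, \]
and formula (3.12), inequality (3.11) also holds with $\hat\psi_p$ instead of $\hat{\tilde\psi}_p$. Altogether we obtain
\[ \sup_{\Im(\mu)<-M/4} \bigl\| \tilde w(x,\lambda) Q(0)^* \Psi(x,\mu) \bigr\| < \infty. \tag{3.13} \]
From representation (2.10) of $W$, property (2.11), and formulas (2.2) and (2.33), we conclude
\[ \int_{-\infty}^{\infty} \bigl\| w(x,\lambda) Q(0)^* \Psi(x,\mu) - Q(x)^* D_0(x) \bigr\|^2 d\zeta < \infty, \qquad \Im(\mu) < -\frac{M}{4}, \tag{3.14} \]
where $\zeta = \Re(\mu)$. Analogously, the inequality
\[ \int_{-\infty}^{\infty} \bigl\| \tilde w(x,\lambda) Q(0)^* \Psi(x,\mu) - \tilde Q(x)^* \tilde D_0(x) \bigr\|^2 d\zeta < \infty, \qquad \Im(\mu) < -\frac{M}{4}, \tag{3.15} \]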
for system (3.2) follows. In a way similar to the way we arrived at (2.33), we obtain from (3.10) and (3.14) that
\[ w(x,\lambda) Q(0)^* \Psi(x,\mu) = Q(x)^* D_0(x) + \int_0^{\infty} e^{-is\mu} g(s)\,ds \tag{3.16} \]
where the elements $g_{jk}$ of the $(2 \times 2)$-matrix function $g$ are such that the functions $k_{jk}$ given by $k_{jk}(s) := e^{\eta s} g_{jk}(s)$ belong to $L^2(0,\infty)$ for $\eta < -M/4$. From (3.13) and (3.15) we find
\[ \tilde w(x,\lambda) Q(0)^* \Psi(x,\mu) = \tilde Q(x)^* \tilde D_0(x) + \int_0^{\infty} e^{-is\mu} \tilde g(s)\,ds \tag{3.17} \]
where the elements $\tilde g_{jk}$ of the $(2 \times 2)$-matrix function $\tilde g$ have the same properties as $g_{jk}$.
According to (3.16) and (3.17), there exists an $\hat M > 0$ such that the right-hand sides in (3.16) and (3.17) are invertible and their inverses are uniformly bounded in the half-plane $\{\mu \in \mathbb{C} : \Im(\mu) < -\hat M\}$. Hence
\[ \sup_{\Im(\mu)<-\hat M} \bigl\| w(x,\lambda) \tilde w(x,\lambda)^{-1} \bigr\| < \infty, \qquad \sup_{\Im(\mu)<-\hat M} \bigl\| \tilde w(x,\lambda) w(x,\lambda)^{-1} \bigr\| < \infty. \tag{3.18} \]
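By (1.1) and (3.2) we have
\[ w(x,\bar\lambda)^* = w(x,\lambda)^{-1}, \qquad \tilde w(x,\bar\lambda)^* = \tilde w(x,\lambda)^{-1}. \tag{3.19} \]
From this and (2.3) we obtain
\[ \sup_{\Im(\mu)<-\hat M} \bigl\| \tilde w(x,\lambda) w(x,\lambda)^{-1} \bigr\| = \sup_{\Im(\mu)<-\hat M} \bigl\| (\tilde w(x,\bar\lambda)^*)^{-1} w(x,\bar\lambda)^* \bigr\| = \sup_{\Im(\mu)<-\hat M} \bigl\| \bigl( w(x,\bar\lambda) \tilde w(x,\bar\lambda)^{-1} \bigr)^* \bigr\| = \sup_{\Im(\mu)<-\hat M} \bigl\| w(x,\bar\lambda) \tilde w(x,\bar\lambda)^{-1} \bigr\|. \]
Therefore we also have
\[ \sup_{\Im(\mu)>\hat M} \bigl\| w(x,\lambda) \tilde w(x,\lambda)^{-1} \bigr\| < \infty. \tag{3.20} \]
Let $K(x,\lambda) := w(x,\lambda) \tilde w(x,\lambda)^{-1}$. By (3.18) and (3.20), $K(x,\cdot)$ is bounded in the half-planes $\{\mu \in \mathbb{C} : |\Im(\mu)| > \hat M\}$:
\[ \sup_{|\Im(\mu)|>\hat M} \| K(x,\lambda) \| < \infty. \tag{3.21} \]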
According to the theorem of Phragmen and Lindelöf, formula (3.21) implies the boundedness of $K(x, (b_p/2\mu) + d_p)$ in the neighborhood of $\mu = \infty$, that is, $K(x,\lambda)$ is bounded in the neighborhood of $\lambda = d_p$. Since $p \in \{1, 2\}$ is arbitrary, $K(x,\lambda)$ is analytic in $\lambda = d_1$ and $\lambda = d_2$, that is, $K(x,\cdot)$ is an entire function. From (1.1) and (3.2) it follows that
\[ \lim_{\lambda\to\infty} w(x,\lambda) = \lim_{\lambda\to\infty} \tilde w(x,\lambda) = I_2. \tag{3.22} \]
Thus $K(x,\cdot) = w(x,\cdot) \tilde w(x,\cdot)^{-1}$ is a bounded entire function and hence constant by Liouville's theorem. By (3.22), $K(x,\cdot) \equiv I_2$, and hence $w(x,\cdot) = \tilde w(x,\cdot)$. Consequently, systems (1.1) and (3.2) coincide.
In the next section we construct explicit solutions of the inverse problem for system (1.1) satisfying the weaker condition
\[ \sup_{0 \le x \le l} \| \beta_p'(x) \| < \infty, \qquad p = 1, 2,\ 0 \le l < \infty, \tag{3.23} \]
instead of (2.1). For this case we have the following uniqueness result.
THEOREM 3.2
For given $W_p$-functions $\varphi_p$, $p = 1, 2$, which admit asymptotic representations
\[ \varphi_p\!\left( d_p + \frac{b_p}{2\mu} \right) = c_p + O\!\left( \frac{1}{\mu} \right), \qquad \Im(\mu) < -\frac{M}{4},\ \mu \to \infty, \tag{3.24} \]
with some constants $c_p \in \mathbb{C}$ there is at most one system (1.1) satisfying conditions (1.2), (3.23), and (3.1).
Proof
Similarly as in the proof of Theorem 3.1, we suppose that systems (1.1) and (3.2) both have $\varphi_1$ and $\varphi_2$ as $W_p$-functions and that they satisfy (1.2), (3.1), and (3.23) for all $0 < l < \infty$. We also assume without loss of generality that $\tilde M \le M$. We fix an arbitrary number $l \in (0,\infty)$, and we define vector functions $\beta_{+p}$ and $\tilde\beta_{+p}$ by
\[ \beta_{+p}(0) = \beta_p(0), \quad \beta_{+p}'(x) = \begin{cases} \beta_p'(x), & 0 \le x < l, \\ 0, & x \ge l, \end{cases} \qquad \tilde\beta_{+p}(0) = \tilde\beta_p(0), \quad \tilde\beta_{+p}'(x) = \begin{cases} \tilde\beta_p'(x), & 0 \le x < l, \\ 0, & x \ge l. \end{cases} \tag{3.25} \]
If we substitute the new vector functions $\beta_{+p}$ and $\tilde\beta_{+p}$ into (1.1) and (3.2) instead of $\beta_p$ and $\tilde\beta_p$, the new systems satisfy conditions (1.2), (3.1), and (2.1). According to Theorem 3.1, these new systems have $W_p$-functions $\varphi_{+1}$, $\varphi_{+2}$ and $\tilde\varphi_{+1}$, $\tilde\varphi_{+2}$, respectively. Furthermore, by (2.37) and (3.25) we have
\[ \lim_{\Im(\mu)\to-\infty} \varphi_{+p}(\lambda) = -\frac{\beta_{p2}(0)}{\beta_{p1}(0)}, \qquad \lim_{\Im(\mu)\to-\infty} \tilde\varphi_{+p}(\lambda) = -\frac{\tilde\beta_{p2}(0)}{\tilde\beta_{p1}(0)} \tag{3.26} \]
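for $p = 1, 2$. In view of (3.25) the fundamental solutions of the new systems and the corresponding original systems coincide at $x = l$, and by Definition 1.1 of the $W_p$-function we obtain
\[ \sup_{\lambda \in D_M} \left\| w(l,\lambda) \begin{pmatrix} \varphi_p(\lambda) & \varphi_{+p}(\lambda) \\ 1 & 1 \end{pmatrix} \right\| < \infty, \tag{3.27} \]
\[ \sup_{\lambda \in D_M} \left\| \tilde w(l,\lambda) \begin{pmatrix} \varphi_p(\lambda) & \tilde\varphi_{+p}(\lambda) \\ 1 & 1 \end{pmatrix} \right\| < \infty \tag{3.28} \]
for $p = 1, 2$. By condition (1.4) for $x = 0$, the functions $\varphi_p$ are bounded on $D_M$. Next we show that
\[ c_p = \lim_{\mu\to\infty} \varphi_p(\lambda) = -\frac{\beta_{p2}(0)}{\beta_{p1}(0)}. \tag{3.29} \]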
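Suppose (3.29) does not hold. Then, invoking (3.26), we see that (3.27) yields, for any real $\zeta$,
\[ \sup_{\eta<-M/4} \| w(l, \zeta + i\eta) \| < \infty. \tag{3.30} \]
From definitions (2.2) and (2.19) and inequality (3.30) we infer
\[ \sup_{\eta<-M/4} \| r_{11}(l, \zeta + i\eta) \| < \infty, \]
which contradicts (2.23). Thus (3.29) is true. In the same way, using (3.28), one can show that
\[ \lim_{\mu\to\infty} \varphi_p(\lambda) = -\frac{\tilde\beta_{p2}(0)}{\tilde\beta_{p1}(0)}, \]
that is, relation (3.3) again holds and hence (3.5) (see the proof of Theorem 3.1). In view of (3.24), (3.29), and (2.12) we can choose $M$ sufficiently large such that
\[ |-\beta_{p2}(0)\varphi_p(\lambda) + \beta_{p1}(0)| > \varepsilon, \qquad \Im(\mu) < -\frac{M}{4}, \tag{3.31} \]
for some $\varepsilon > 0$. We set
\[ \psi_p(\mu) := \frac{\beta_{p1}(0)\varphi_p(\lambda) + \beta_{p2}(0)}{-\beta_{p2}(0)\varphi_p(\lambda) + \beta_{p1}(0)}, \tag{3.32} \]
\[ \Psi(l,\mu) := \begin{pmatrix} e^{-2i\mu l} & \psi_p(l,\mu) \\ 0 & 1 \end{pmatrix}. \tag{3.33} \]
From (2.4) and (3.32) it follows that
\[ \begin{pmatrix} \psi_p(\mu) \\ 1 \end{pmatrix} = Q(0) \begin{pmatrix} \varphi_p(\lambda) \\ 1 \end{pmatrix} \bigl( -\beta_{p2}(0)\varphi_p(\lambda) + \beta_{p1}(0) \bigr)^{-1}. \tag{3.34} \]
In view of (3.31) and (3.34), formulas (3.27) and (3.28) yield
\[ \sup_{\lambda \in D_M} \left\| w(l,\lambda) Q(0)^* \begin{pmatrix} \psi_p(\mu) \\ 1 \end{pmatrix} \right\| < \infty, \qquad \sup_{\lambda \in D_M} \left\| \tilde w(l,\lambda) Q(0)^* \begin{pmatrix} \psi_p(\mu) \\ 1 \end{pmatrix} \right\| < \infty. \tag{3.35} \]
Notice that condition (3.23) implies that instead of (2.9) we have
\[ \sup_{x \in [0,l),\ \Im(\mu)<-M/4} \| \xi(x,\mu) \| < \frac{M}{4}. \]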
Hence inequality (3.8) holds here as well for any $x < l$. By inequality (3.8) applied to both systems (1.1) and (3.2) and by formulas (3.33) and (3.35), we obtain
\[ \sup_{\lambda \in D_M} \bigl\| w(l,\lambda) Q(0)^* \Psi(l,\mu) \bigr\| < \infty, \qquad \sup_{\lambda \in D_M} \bigl\| \tilde w(l,\lambda) Q(0)^* \Psi(l,\mu) \bigr\| < \infty, \tag{3.36} \]
that is, the analogues of (3.10) and (3.13). From (3.24), (3.31), and (3.32) we find
\[ \sup_{\Im(\mu)<-M/4} |\psi_p(\mu)| < \infty, \qquad \psi_p(\mu) = O\!\left( \frac{1}{\mu} \right), \quad \mu \to \infty. \tag{3.37} \]
Moreover, assumption (3.23) also implies representation (2.10) from Theorem 2.1 for any $x < l$. This and (3.37) allow us to also derive the analogues of (3.14) and (3.15). Therefore all further arguments of the proof of Theorem 3.1 can be used without change. This shows that $\tilde w(l,\lambda) = w(l,\lambda)$. Since $l$ was arbitrary, we finally obtain $\tilde w \equiv w$.
The general procedure of solving the inverse problem for system (1.1) was presented in [SaA2]. In the next section another approach is used, which allows us to construct explicit solutions.
4. Explicit solutions of the direct and inverse problem
In special cases it is possible to obtain explicit solutions of the direct and inverse problem for a system (1.1). For this purpose it is important to construct the fundamental solution of a system with constant coefficients associated with (1.1) explicitly (see [GKS] and the references therein). We start with the simple initial system
\[ w_0'(x,\lambda) = i \left( \frac{b_1 (\beta_1^0)^* \beta_1^0}{\lambda - d_1} + \frac{b_2 (\beta_2^0)^* \beta_2^0}{\lambda - d_2} \right) w_0(x,\lambda), \quad x \in [0,\infty), \qquad w_0(0,\lambda) = I_2, \tag{4.1} \]
where $\lambda \in \mathbb{C}$ and $\beta_1^0$, $\beta_2^0$ are constants. First we formulate and directly prove an auxiliary lemma that can also be deduced as a particular case from the paper [SaA1] on the generalized Bäcklund-Darboux transformation. To this end, we choose an arbitrary $(n \times n)$-matrix $A$ such that $d_1, d_2 \notin \sigma(A)$. We introduce an $(n \times 2)$-matrix function $\Pi$ on $[0,\infty)$ by the arbitrary initial condition $\Pi(0)$ satisfying
\[ A - A^* = i\,\Pi(0)\Pi(0)^* \tag{4.2} \]
and the linear differential equation
\[ \Pi'(x) = -i \sum_{p=1}^{2} b_p (A - d_p I_n)^{-1} \Pi(x) (\beta_p^0)^* \beta_p^0. \tag{4.3} \]
We also introduce an $(n \times n)$-matrix function $S$ on $[0,\infty)$ by
\[ S(0) = I_n, \qquad S'(x) = -\sum_{p=1}^{2} b_p (A - d_p I_n)^{-1} \Pi(x) (\beta_p^0)^* \beta_p^0 \Pi(x)^* (A^* - d_p I_n)^{-1}. \tag{4.4} \]
From (4.2)–(4.4) we derive the operator equality
\[ A S(x) - S(x) A^* = i\,\Pi(x)\Pi(x)^*. \tag{4.5} \]
Indeed, by (4.3) and (4.4) the derivatives of both sides of (4.5) coincide, and at $x = 0$ identity (4.5) coincides with (4.2). Now we suppose that $\det S(x) \neq 0$, $x \in [0,\infty)$, and we set
\[ w_A(x,\lambda) := I_2 - i\,\Pi(x)^* S(x)^{-1} (A - \lambda I_n)^{-1} \Pi(x). \tag{4.6} \]
In view of (4.5), $w_A$ is a transfer matrix function from system theory in the form given by L. Sakhnovich (see [SaL2], [SaL3]).
LEMMA 4.1
The matrix function $w_A$ given by (4.6) with $\Pi$ and $S$ defined by (4.2)–(4.4) satisfies the differential equation
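\[ \frac{\partial w_A(x,\lambda)}{\partial x} = G(x,\lambda) w_A(x,\lambda) - w_A(x,\lambda) G_0(\lambda), \tag{4.7} \]
where
\[ G_0(\lambda) := i \sum_{p=1}^{2} b_p (\lambda - d_p)^{-1} (\beta_p^0)^* \beta_p^0, \tag{4.8} \]
\[ G(x,\lambda) := i \sum_{p=1}^{2} b_p (\lambda - d_p)^{-1} \beta_p(x)^* \beta_p(x), \qquad \beta_p(x) := \beta_p^0\, w_A(x,d_p)^*, \quad p = 1, 2. \tag{4.9} \]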
Proof
First we calculate $(\Pi^* S^{-1})'$. In view of (4.3) and (4.4) we obtain
\[ (\Pi(x)^* S(x)^{-1})' = i \sum_{p=1}^{2} b_p (\beta_p^0)^* \beta_p^0\, \Pi(x)^* (A^* - d_p I_n)^{-1} S(x)^{-1} + \Pi(x)^* S(x)^{-1} \sum_{p=1}^{2} b_p (A - d_p I_n)^{-1} \Pi(x) (\beta_p^0)^* \beta_p^0 \cdot \Pi(x)^* (A^* - d_p I_n)^{-1} S(x)^{-1}. \tag{4.10} \]
Notice that (4.5) yields
\[ (A^* - \lambda I_n)^{-1} S(x)^{-1} - S(x)^{-1} (A - \lambda I_n)^{-1} = i (A^* - \lambda I_n)^{-1} S(x)^{-1} \Pi(x) \Pi(x)^* S(x)^{-1} (A - \lambda I_n)^{-1}. \tag{4.11} \]
From (4.6) and (4.11) with $\lambda = d_p$ it follows that
\[ \Pi(x)^* (A^* - d_p I_n)^{-1} S(x)^{-1} = \bigl( I_2 + (w_A(x,d_p) - I_2)^* \bigr) \Pi(x)^* S(x)^{-1} (A - d_p I_n)^{-1} = w_A(x,d_p)^*\, \Pi(x)^* S(x)^{-1} (A - d_p I_n)^{-1}. \tag{4.12} \]
If we substitute (4.12) into (4.10) and use (4.6), we arrive at
\[ (\Pi(x)^* S(x)^{-1})' = i \sum_{p=1}^{2} b_p \bigl( I_2 - i\,\Pi(x)^* S(x)^{-1} (A - d_p I_n)^{-1} \Pi(x) \bigr) (\beta_p^0)^* \beta_p^0 \cdot w_A(x,d_p)^*\, \Pi(x)^* S(x)^{-1} (A - d_p I_n)^{-1} \]
\[ = i \sum_{p=1}^{2} b_p\, w_A(x,d_p) (\beta_p^0)^* \beta_p^0 \cdot w_A(x,d_p)^*\, \Pi(x)^* S(x)^{-1} (A - d_p I_n)^{-1}. \tag{4.13} \]
From (4.9) and (4.13) one concludes that
\[ (\Pi(x)^* S(x)^{-1})' = i \sum_{p=1}^{2} b_p\, \beta_p(x)^* \beta_p(x)\, \Pi(x)^* S(x)^{-1} (A - d_p I_n)^{-1}. \]
Using the resolvent identity
\[ (A - d_p I_n)^{-1} - (A - \lambda I_n)^{-1} = -(\lambda - d_p) (A - d_p I_n)^{-1} (A - \lambda I_n)^{-1}, \tag{4.14} \]
differentiating the right-hand side of formula (4.6) defining $w_A(x,\lambda)$, and using (4.3), we see
\[ \frac{\partial w_A(x,\lambda)}{\partial x} = i \sum_{p=1}^{2} b_p (\lambda - d_p)^{-1} \beta_p(x)^* \beta_p(x) \bigl( w_A(x,\lambda) - w_A(x,d_p) \bigr) - i \sum_{p=1}^{2} b_p (\lambda - d_p)^{-1} \bigl( w_A(x,\lambda) - w_A(x,d_p) \bigr) (\beta_p^0)^* \beta_p^0. \tag{4.15} \]
By means of (4.11) it follows that
\[ w_A(x,\bar\lambda)^*\, w_A(x,\lambda) = I_2. \tag{4.16} \]
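Identity (4.16) can be checked by hand in the smallest case. The following minimal sketch takes $n = 1$, $x = 0$ (so $S(0) = 1$), and a real spectral parameter, for which (4.16) says that $w_A(0,\lambda)$ is unitary; the helper names and all sample values are hypothetical, and the scalar $a$ is chosen so that (4.2) holds.

```python
def transfer_matrix_x0(a: complex, p, lam: float):
    """w_A(0, lam) = I_2 - i * Pi(0)^* (a - lam)^{-1} Pi(0) for n = 1, S(0) = 1.
    Here p = [p1, p2] is the 1x2 matrix Pi(0)."""
    c = 1.0 / (a - lam)
    return [[(1.0 if j == k else 0.0) - 1j * c * p[j].conjugate() * p[k]
             for k in range(2)] for j in range(2)]


def gram(w):
    """Return w^* w for a 2x2 complex matrix given as nested lists."""
    return [[sum(w[m][j].conjugate() * w[m][k] for m in range(2))
             for k in range(2)] for j in range(2)]


p = [0.7 + 0.1j, -0.4 + 0.5j]      # hypothetical Pi(0)
s = sum(abs(z) ** 2 for z in p)     # Pi(0) Pi(0)^*
a = 0.3 + 0.5j * s                  # then a - conj(a) = i * Pi(0) Pi(0)^*, cf. (4.2)
w = transfer_matrix_x0(a, p, lam=1.25)
g = gram(w)
defect = max(abs(g[j][k] - (1.0 if j == k else 0.0))
             for j in range(2) for k in range(2))
```

In the text the identity is used at the real points $\lambda = d_p$, which is the situation this sketch mimics.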
According to (4.9) and (4.16), we have $w_A(x,d_p) (\beta_p^0)^* \beta_p^0 - \beta_p(x)^* \beta_p(x) w_A(x,d_p) = 0$. Thus, by definitions (4.8) and (4.9), (4.7) follows from (4.15).
A consequence of Lemma 4.1 is the subsequent proposition.
PROPOSITION 4.2
Let $b_p = \pm 1$, $p = 1, 2$, and $d_p = \bar d_p$, $p = 1, 2$, $d_1 \neq d_2$ be constants. Suppose that the $(n \times n)$-matrix $A$, the $(n \times 2)$-matrix $\Pi(0)$, and vectors $\beta_p^0 \in \mathbb{C}^2$, $p = 1, 2$, are given and that they satisfy the relations
\[ A - A^* = i\,\Pi(0)\Pi(0)^*, \qquad d_p \notin \sigma(A), \qquad \beta_p^0 (\beta_p^0)^* = 1, \qquad p = 1, 2. \tag{4.17} \]
Assume in addition that for the $(n \times n)$-matrix function $S$ defined by the initial problem (4.4) we have $\det S(x) \neq 0$ for all $x \in [0,\infty)$. Then the parameters $A$, $\Pi(0)$, and $\beta_p^0$, $p = 1, 2$, generate a system (1.1) with coefficients
\[ \beta_p(x) = \beta_p^0\, w_A(x,d_p)^*, \qquad p = 1, 2, \tag{4.18} \]
and fundamental solution
\[ w(x,\lambda) = w_A(x,\lambda)\, w_0(x,\lambda)\, w_A(0,\lambda)^{-1}, \tag{4.19} \]
where $w_0(\,\cdot\,,\lambda)$ is the fundamental solution of system (4.1) with the constant coefficients $\beta_p^0$, $p = 1, 2$.
Proof
By (4.1), the matrix function $w(\,\cdot\,,\lambda)$ given by (4.19) satisfies $w(0,\lambda) = I_2$ (see
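(1.3)). Due to (4.1), (4.7), and (4.8) we obtain
\[ \frac{\partial w(x,\lambda)}{\partial x} = \bigl( G(x,\lambda) w_A(x,\lambda) - w_A(x,\lambda) G_0(\lambda) \bigr) w_0(x,\lambda) w_A(0,\lambda)^{-1} + w_A(x,\lambda) G_0(\lambda) w_0(x,\lambda) w_A(0,\lambda)^{-1} = G(x,\lambda) w(x,\lambda). \]
By the definition of $G(\,\cdot\,,\lambda)$ in (4.9), this shows that $w(\,\cdot\,,\lambda)$ satisfies (1.1).
In the following lemma we present two cases where the condition $\det S(x) \neq 0$, $x \in [0,\infty)$, is satisfied and $w_0(\,\cdot\,,\lambda)$ can be constructed explicitly.
LEMMA 4.3
If conditions (4.17) of Proposition 4.2 are fulfilled and the coefficients $\beta_p^0$, $p = 1, 2$, are either equal,
\[ \beta_1^0 = \beta_2^0 = \begin{pmatrix} u_1 & u_2 \end{pmatrix}, \tag{4.20} \]
or orthogonal to each other,
\[ \beta_1^0 = \begin{pmatrix} u_1 & u_2 \end{pmatrix}, \qquad \beta_2^0 = \begin{pmatrix} -\bar u_2 & \bar u_1 \end{pmatrix}, \tag{4.21} \]
with constants $u_1, u_2 \in \mathbb{C}$, then
\[ \det S(x) \neq 0, \qquad x \in [0,\infty), \]
and the fundamental solution $w_0(\,\cdot\,,\lambda)$ of (4.1) is given by
\[ w_0(x,\lambda) = U^* \operatorname{diag}\bigl( e^{ix(b_1(\lambda-d_1)^{-1} + b_2(\lambda-d_2)^{-1})},\ 1 \bigr)\, U \tag{4.22} \]
in the case (4.20) and
\[ w_0(x,\lambda) = U^* \operatorname{diag}\bigl( e^{ixb_1(\lambda-d_1)^{-1}},\ e^{ixb_2(\lambda-d_2)^{-1}} \bigr)\, U \tag{4.23} \]
in the case (4.21), where
\[ U := \begin{pmatrix} u_1 & u_2 \\ -\bar u_2 & \bar u_1 \end{pmatrix}. \tag{4.24} \]
Proof
By (4.17) we have $|u_1|^2 + |u_2|^2 = 1$, and hence the matrix $U$ is unitary. With the $(2 \times 2)$-matrix function $G_0$ defined in (4.8), the differential equation (4.1) can be written as
\[ \frac{\partial w_0(x,\lambda)}{\partial x} = G_0(\lambda) w_0(x,\lambda) \tag{4.25} \]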
and the columns of $U^*$ form a basis of eigenvectors of $G_0(\lambda)$; more exactly,
\[ G_0(\lambda) = U^* \begin{pmatrix} i\bigl( b_1(\lambda-d_1)^{-1} + b_2(\lambda-d_2)^{-1} \bigr) & 0 \\ 0 & 0 \end{pmatrix} U \quad \text{in case (4.20)} \tag{4.26} \]
and
\[ G_0(\lambda) = U^* \begin{pmatrix} i b_1 (\lambda-d_1)^{-1} & 0 \\ 0 & i b_2 (\lambda-d_2)^{-1} \end{pmatrix} U \quad \text{in case (4.21)}. \tag{4.27} \]
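The factorization (4.27) can be confirmed numerically for the orthogonal case (4.21). The sketch below assumes $U$ has rows $(u_1, u_2)$ and $(-\bar u_2, \bar u_1)$, which is the reading of (4.24) consistent with (4.20) and (4.21); the helper names and sample values are hypothetical.

```python
def matmul(x, y):
    """Product of two 2x2 complex matrices given as nested lists."""
    return [[sum(x[i][m] * y[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]


def adjoint(x):
    """Conjugate transpose of a 2x2 complex matrix."""
    return [[x[j][i].conjugate() for j in range(2)] for i in range(2)]


def outer(row):
    """beta^* beta for a 1x2 row vector beta."""
    return [[row[i].conjugate() * row[j] for j in range(2)] for i in range(2)]


u1, u2 = 0.6 + 0.0j, 0.48 + 0.64j          # |u1|^2 + |u2|^2 = 1
U = [[u1, u2], [-u2.conjugate(), u1.conjugate()]]
beta1, beta2 = U[0], U[1]                  # orthogonal case (4.21)
b, d, lam = (1.0, -1.0), (0.5, -2.0), 0.3 + 1.1j

k = [1j * b[p] / (lam - d[p]) for p in range(2)]
P1, P2 = outer(beta1), outer(beta2)
# G_0 from its definition (4.8)
G0 = [[k[0] * P1[i][j] + k[1] * P2[i][j] for j in range(2)] for i in range(2)]
# G_0 from the factorization (4.27)
G0_factored = matmul(matmul(adjoint(U), [[k[0], 0j], [0j, k[1]]]), U)

defect = max(abs(G0[i][j] - G0_factored[i][j])
             for i in range(2) for j in range(2))
UUs = matmul(U, adjoint(U))
unit_defect = max(abs(UUs[i][j] - (1.0 if i == j else 0.0))
                  for i in range(2) for j in range(2))
```

Because $G_0(\lambda)$ does not depend on $x$, exponentiating the diagonal factor gives (4.23) directly.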
Formulas (4.25)–(4.27) show the asserted representations (4.22) and (4.23) for $w_0(\,\cdot\,,\lambda)$. For case (4.20), the relations
\[ U (\beta_1^0)^* \beta_1^0 U^* = U (\beta_2^0)^* \beta_2^0 U^* = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \]
and differential equation (4.3) imply
\[ (\Pi(x) U^*)' = -i \sum_{p=1}^{2} b_p (A - d_p I_n)^{-1}\, \Pi(x) U^* \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}. \]
Hence in this case
\[ \Pi(x) = \begin{pmatrix} e^{-ix(b_1(A-d_1 I_n)^{-1} + b_2(A-d_2 I_n)^{-1})} \pi_1 & \pi_2 \end{pmatrix} U, \tag{4.28} \]
where
\[ \begin{pmatrix} \pi_1 & \pi_2 \end{pmatrix} := \Pi(0) U^*. \tag{4.29} \]
Now, consider the case (4.21). In a similar way, using
\[ U (\beta_1^0)^* \beta_1^0 U^* = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad U (\beta_2^0)^* \beta_2^0 U^* = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \]
we obtain the representation
\[ \Pi(x) = \begin{pmatrix} e^{-ixb_1(A-d_1 I_n)^{-1}} \pi_1 & e^{-ixb_2(A-d_2 I_n)^{-1}} \pi_2 \end{pmatrix} U \tag{4.30} \]
with $\pi_1$, $\pi_2$ given as in (4.29).
Suppose now that $\det S(l) = 0$ for some $l \in [0,\infty)$; that is, suppose there exists a vector $f \in \mathbb{C}^n$, $f \neq 0$, such that $S(l) f = 0$. Then, due to (4.5), it follows that
\[ f^* \Pi(l) \Pi(l)^* f = -i f^* \bigl( A S(l) - S(l) A^* \bigr) f = 0, \tag{4.31} \]
that is,
\[ \Pi(l)^* f = 0. \tag{4.32} \]
From (4.5) applied to $f$ and from (4.31) and (4.32), we also get $S(l) A^* f = 0$. Inductively, by analogous arguments, one can show that
\[ \Pi(l)^* (A^*)^k f = 0, \qquad S(l) (A^*)^k f = 0, \qquad k = 0, 1, \ldots. \tag{4.33} \]
From the first equality in (4.33), we obtain that for any analytic function $\phi$,
\[ \Pi(l)^* \phi(A^*) f = 0, \tag{4.34} \]
and thus in the case (4.20), for $k = 0, 1, \ldots$, and any $x \in [0,\infty)$,
\[ 0 = \Pi(l)^* e^{i(x-l)(b_1(A^*-d_1 I_n)^{-1} + b_2(A^*-d_2 I_n)^{-1})} (A^*)^k f = U^* \begin{pmatrix} \pi_1^*\, e^{ix(b_1(A^*-d_1 I_n)^{-1} + b_2(A^*-d_2 I_n)^{-1})} \\ \pi_2^*\, e^{i(x-l)(b_1(A^*-d_1 I_n)^{-1} + b_2(A^*-d_2 I_n)^{-1})} \end{pmatrix} (A^*)^k f. \]
Since $U$ is unitary, we see that
\[ \pi_1^*\, e^{ix(b_1(A^*-d_1 I_n)^{-1} + b_2(A^*-d_2 I_n)^{-1})} (A^*)^k f = 0, \qquad \pi_2^* (A^*)^k f = 0. \]
For case (4.21) we proceed in a similar way and obtain in both cases that
\[ \Pi(x)^* (A^*)^k f = 0, \qquad k = 0, 1, \ldots,\ x \in [0,\infty). \tag{4.35} \]
Again this implies that for any analytic function $\phi$ and for $x \in [0,\infty)$ we have $\Pi(x)^* \phi(A^*) f = 0$. From this and (4.4) we easily infer $S(x) f = S(0) f = f$, a contradiction to (4.31).
In the next theorem the solution of the direct spectral problem for systems (1.1) with coefficients $\beta_p$, $p = 1, 2$, generated by parameters $\beta_p^0$, $p = 1, 2$, satisfying (4.20) is given.
THEOREM 4.4
Let $b_p = \pm 1$, $p = 1, 2$, and $d_p = \bar d_p$, $p = 1, 2$, $d_1 \neq d_2$, be constants. Suppose that the parameters $A$, $\Pi(0)$, and $\beta_p^0$, $p = 1, 2$, satisfy (4.17) and (4.20). Then the vector functions $\beta_p$, $p = 1, 2$, given by (4.18) are well defined and satisfy (1.2). Moreover, if (2.12) holds, the functions
\[ \varphi_1(\lambda) = \varphi_2(\lambda) = \frac{-u_2 - i\theta_1^* (A - \lambda I_n)^{-1} \pi_2}{u_1 - i\theta_2^* (A - \lambda I_n)^{-1} \pi_2}, \tag{4.36} \]
where $\theta_1, \theta_2$ and $\pi_1, \pi_2$ denote the columns of $\Pi(0)$ and $\Pi(0) U^*$, respectively, are $W_p$-functions of the system (1.1) with coefficients $\beta_p$, $p = 1, 2$, given by (4.18). If $u_1 \neq 0$, we have the realization
\[ \varphi_1(\lambda) = \varphi_2(\lambda) = -\frac{u_2}{u_1} - \frac{i}{u_1^2}\, \pi_1^* (A^\times - \lambda I_n)^{-1} \pi_2, \tag{4.37} \]
where
\[ A^\times = A - \frac{i}{u_1}\, \pi_2 \theta_2^*. \tag{4.38} \]
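Proof
By Lemma 4.3 and Proposition 4.2, the functions $\beta_p$, $p = 1, 2$, are well defined. By (4.16), the matrix $w_A(x,d_p)$ is unitary. Hence the third formula in (4.17) and the definition of $\beta_p$ in (4.18) imply that $\beta_p(x) \beta_p(x)^* = 1$, $p = 1, 2$, that is, (1.2) holds. In view of (4.19) and (4.22) we have
\[ w(x,\lambda)\, w_A(0,\lambda)\, U^* \begin{pmatrix} 0 \\ 1 \end{pmatrix} = w_A(x,\lambda)\, U^* \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \tag{4.39} \]
As $d_p \notin \sigma(A)$, we conclude from (4.39) and the definition of $w_A(x,\lambda)$ in (4.6) that
\[ \sup_{\lambda \in D_M} \left\| w(x,\lambda)\, w_A(0,\lambda)\, U^* \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\| < \infty. \tag{4.40} \]
According to definitions (4.4), (4.6), and (4.29), we have
\[ w_A(0,\lambda) U^* = U^* - i \begin{pmatrix} \theta_1^* \\ \theta_2^* \end{pmatrix} (A - \lambda I_n)^{-1} \begin{pmatrix} \pi_1 & \pi_2 \end{pmatrix}. \]
By the definition of the matrix $U$ in (4.24), this shows
\[ w_A(0,\lambda) U^* \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -u_2 \\ u_1 \end{pmatrix} - i \begin{pmatrix} \theta_1^* \\ \theta_2^* \end{pmatrix} (A - \lambda I_n)^{-1} \pi_2 = \begin{pmatrix} \varphi_p(\lambda) \\ 1 \end{pmatrix} \bigl( u_1 - i\theta_2^* (A - \lambda I_n)^{-1} \pi_2 \bigr), \tag{4.41} \]
where $\varphi_p$ is given by (4.36). If
\[ u_1 - i\theta_2^* (A - d_p I_n)^{-1} \pi_2 \neq 0, \qquad p = 1, 2, \tag{4.42} \]
then (4.40) and (4.41) yield (1.4), that is, $\varphi_p$ are $W_p$-functions of (1.1). But condition (4.42) is equivalent to the condition $\beta_{p1}(0) \neq 0$, $p = 1, 2$ (see (2.12)). Indeed, by (4.18), (4.20), and (4.24), inequalities (2.12) are equivalent to
\[ \begin{pmatrix} 1 & 0 \end{pmatrix} w_A(0,d_p)\, U^* \begin{pmatrix} 1 \\ 0 \end{pmatrix} \neq 0, \qquad p = 1, 2. \tag{4.43} \]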
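By (4.41) inequalities (4.42) can be written as
\[ \begin{pmatrix} 0 & 1 \end{pmatrix} w_A(0,d_p)\, U^* \begin{pmatrix} 0 \\ 1 \end{pmatrix} \neq 0, \qquad p = 1, 2. \tag{4.44} \]
Finally, since the matrices $w_A(0,d_p)$, $p = 1, 2$, and $U^*$ are unitary, we conclude that (4.43) and (4.44) are equivalent.
It remains to derive the realization (4.37) from (4.36). It is well known and easily checked that
\[ \left( 1 - \frac{i}{u_1}\, \theta_2^* (A - \lambda I_n)^{-1} \pi_2 \right)^{-1} = 1 + \frac{i}{u_1}\, \theta_2^* (A^\times - \lambda I_n)^{-1} \pi_2. \]
Hence (4.36) can be rewritten equivalently as
\[ \varphi_1(\lambda) = \varphi_2(\lambda) = \left( -\frac{u_2}{u_1} - \frac{i}{u_1}\, \theta_1^* (A - \lambda I_n)^{-1} \pi_2 \right) \left( 1 + \frac{i}{u_1}\, \theta_2^* (A^\times - \lambda I_n)^{-1} \pi_2 \right) \]
\[ = -\frac{u_2}{u_1} + \frac{i}{u_1}\, \theta_1^* (A - \lambda I_n)^{-1} (A^\times - A) (A^\times - \lambda I_n)^{-1} \pi_2 - \frac{i}{u_1}\, \theta_1^* (A - \lambda I_n)^{-1} \pi_2 - \frac{i u_2}{u_1^2}\, \theta_2^* (A^\times - \lambda I_n)^{-1} \pi_2. \]
Using the relation $A^\times - A = (A^\times - \lambda I_n) - (A - \lambda I_n)$, we find
\[ \varphi_1(\lambda) = \varphi_2(\lambda) = -\frac{u_2}{u_1} - \frac{i}{u_1}\, \theta_1^* (A^\times - \lambda I_n)^{-1} \pi_2 - \frac{i u_2}{u_1^2}\, \theta_2^* (A^\times - \lambda I_n)^{-1} \pi_2. \tag{4.45} \]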
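The inversion identity used in the derivation of (4.45) is an instance of the rank-one (Sherman-Morrison type) resolvent formula. A minimal numerical sketch for $n = 2$ follows; the matrices, vectors, and helper names are hypothetical sample data, not taken from the text.

```python
def inv2(m):
    """Inverse of a 2x2 complex matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def bilinear(row, m, col):
    """row * m * col for a 1x2 row, a 2x2 matrix and a 2x1 column."""
    return sum(row[i] * m[i][j] * col[j] for i in range(2) for j in range(2))


def shifted(m, lam):
    """m - lam * I_2."""
    return [[m[i][j] - (lam if i == j else 0) for j in range(2)]
            for i in range(2)]


A = [[0.2 + 0.3j, 1.0 + 0j], [-0.5 + 0j, 0.1 - 0.4j]]  # hypothetical A
theta2_star = [0.3 - 0.2j, 0.7 + 0.1j]                 # row vector theta_2^*
pi2 = [0.5 + 0.5j, -0.2 + 0.9j]                        # column vector pi_2
u1, lam = 0.8, 1.5 + 0.25j

# A^x = A - (i/u1) * pi_2 * theta_2^*, as in (4.38)
A_cross = [[A[i][j] - (1j / u1) * pi2[i] * theta2_star[j] for j in range(2)]
           for i in range(2)]

lhs = 1.0 / (1.0 - (1j / u1) * bilinear(theta2_star, inv2(shifted(A, lam)), pi2))
rhs = 1.0 + (1j / u1) * bilinear(theta2_star, inv2(shifted(A_cross, lam)), pi2)
gap = abs(lhs - rhs)
```

The check is independent of the relation (4.2); it only needs $\lambda$ outside the spectra of $A$ and $A^\times$.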
If we note that $u_1 \theta_1^* + u_2 \theta_2^* = \pi_1^*$, we obtain that formula (4.37) is equivalent to (4.45) and hence to (4.36).
Now we consider the corresponding inverse problem. For this purpose we need some basic facts from system theory (see, e.g., [KaFA], [BGK]). A rational function is called proper if it has a limit at $\infty$. It is well known that a proper rational function $\varphi$ can be represented in the form
\[ \varphi(\lambda) = D - \tilde C (\tilde A - \lambda I_n)^{-1} \tilde B \tag{4.46} \]
with a scalar $D$, a $(1 \times n)$-vector $\tilde C$, an $(n \times n)$-matrix $\tilde A$, and an $(n \times 1)$-vector $\tilde B$ for some $n \in \mathbb{N}$. A representation of form (4.46) is called a realization of $\varphi$. A realization (4.46) is called minimal if the order $n$ of the matrix $\tilde A$ is minimal. The minimality of the realization (4.46) is equivalent to the relations
\[ \operatorname{rank} \begin{pmatrix} \tilde C \\ \tilde C \tilde A \\ \vdots \\ \tilde C \tilde A^{n-1} \end{pmatrix} = \operatorname{rank} \begin{pmatrix} \tilde B & \tilde A \tilde B & \cdots & \tilde A^{n-1} \tilde B \end{pmatrix} = n. \tag{4.47} \]
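The rank conditions (4.47) are straightforward to test for a concrete realization. The sketch below uses exact rational arithmetic; the function names `rank` and `is_minimal` and the sample triples are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix via exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][col] / m[r][col]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def is_minimal(C, A, B):
    """Check (4.47): both the rows C, CA, ..., CA^{n-1} and the columns
    B, AB, ..., A^{n-1}B must have rank n."""
    n = len(A)
    obs, row = [], list(C)
    ctrl, col = [], list(B)
    for _ in range(n):
        obs.append(row)
        row = [sum(row[i] * A[i][j] for i in range(n)) for j in range(n)]
        ctrl.append(col)  # stored as rows; the rank of the transpose is the same
        col = [sum(A[i][j] * col[j] for j in range(n)) for i in range(n)]
    return rank(obs) == n and rank(ctrl) == n


A = [[0, 1], [-2, -3]]
minimal = is_minimal([1, 0], A, [0, 1])      # no cancellation occurs
not_minimal = is_minimal([1, 1], A, [0, 1])  # C and CA are proportional
```

In the second example the rows $\tilde C$ and $\tilde C \tilde A$ are proportional, so the realization can be reduced to order one.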
We also need the following statement, which may be found in [Ka] (see also [LR] or [GKS, Proposition 2.2]). If $T_1$, $T_2$, and $R$ are $(n \times n)$-matrices such that $T_p \ge 0$, $p = 1, 2$, and
\[ \operatorname{rank} \begin{pmatrix} T_2 \\ T_2 R \\ \vdots \\ T_2 R^{n-1} \end{pmatrix} = \operatorname{rank} \begin{pmatrix} T_1 & R T_1 & \cdots & R^{n-1} T_1 \end{pmatrix} = n, \tag{4.48} \]
then there exists a positive solution $X$ of the algebraic Riccati equation
\[ R X - X R^* = i (X T_2 X - T_1). \tag{4.49} \]
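THEOREM 4.5
A proper rational function $\varphi$ which is analytic at $d_1$ and $d_2$ is both a $W_1$- and a $W_2$-function for system (1.1) with coefficients $\beta_p$, $p = 1, 2$, given by (4.18), where the parameters $A$, $\Pi(0) = \begin{pmatrix} \theta_1 & \theta_2 \end{pmatrix}$, and $\beta_1^0 = \beta_2^0$ are defined as follows: We choose a minimal realization (4.46) of $\varphi$ and set
\[ \beta_1^0 := \beta_2^0 := \begin{pmatrix} u_1 & u_2 \end{pmatrix}, \qquad u_1 := \frac{1}{\sqrt{1 + |D|^2}}, \qquad u_2 := -\frac{D}{\sqrt{1 + |D|^2}}. \tag{4.50} \]
Furthermore, we take a positive solution $X$ of the Riccati equation
\[ R X - X R^* = i \bigl( u_1^4\, X \tilde C^* \tilde C X - \tilde B \tilde B^* \bigr), \qquad R = \tilde A + u_1 \bar u_2\, \tilde B \tilde C \tag{4.51} \]
(which always exists). Finally, we set
\[ \pi_1 := i u_1^2 X^{1/2} \tilde C^*, \qquad \pi_2 := X^{-1/2} \tilde B, \tag{4.52} \]
\[ \Pi(0) := \begin{pmatrix} \theta_1 & \theta_2 \end{pmatrix} := \begin{pmatrix} \pi_1 & \pi_2 \end{pmatrix} U, \qquad U := \begin{pmatrix} u_1 & u_2 \\ -\bar u_2 & \bar u_1 \end{pmatrix}, \qquad A := X^{-1/2} \tilde A X^{1/2} + \frac{i}{u_1}\, \pi_2 \theta_2^*. \tag{4.53} \]
The coefficients $\beta_p$, $p = 1, 2$, given by (4.18) via these parameters satisfy condition (2.12), that is, $\beta_{p1}(0) \neq 0$, $p = 1, 2$.
Proof
From the minimality of realization (4.46) we can derive the existence of a positive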
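solution of the Riccati equation (4.51). Indeed, if we set
\[ \kappa_r := u_1 \bar u_2 \begin{pmatrix} \tilde C \tilde B \\ \tilde C R \tilde B \\ \vdots \\ \tilde C R^{n-r-1} \tilde B \end{pmatrix} \begin{pmatrix} 0 & \cdots & 0 & 1 \end{pmatrix}, \qquad K_r := \begin{pmatrix} I_r & 0 \\ -\kappa_r & I_{n-r} \end{pmatrix}, \]
$r = 1, 2, \ldots$, where the row vector has $r$ entries, then it follows by induction that for $s = 1, 2, \ldots, n-1$,
\[ K_s \cdots K_1 \begin{pmatrix} \tilde C \\ \tilde C R \\ \vdots \\ \tilde C R^{n-1} \end{pmatrix} = \begin{pmatrix} Y_s \\ Z_s \end{pmatrix}, \qquad Y_s := \begin{pmatrix} \tilde C \\ \tilde C \tilde A \\ \vdots \\ \tilde C \tilde A^{s-1} \end{pmatrix}, \qquad Z_s := \begin{pmatrix} \tilde C \tilde A^s \\ \tilde C R \tilde A^s \\ \vdots \\ \tilde C R^{n-1-s} \tilde A^s \end{pmatrix}. \]
In particular,
\[ K_{n-1} \cdots K_1 \begin{pmatrix} \tilde C \\ \tilde C R \\ \vdots \\ \tilde C R^{n-1} \end{pmatrix} = \begin{pmatrix} \tilde C \\ \tilde C \tilde A \\ \vdots \\ \tilde C \tilde A^{n-1} \end{pmatrix}. \]
The last relation shows that
\[ \operatorname{rank} \begin{pmatrix} \tilde C \\ \tilde C R \\ \vdots \\ \tilde C R^{n-1} \end{pmatrix} = \operatorname{rank} \begin{pmatrix} \tilde C \\ \tilde C \tilde A \\ \vdots \\ \tilde C \tilde A^{n-1} \end{pmatrix}. \tag{4.54} \]
In a similar way we get
\[ \operatorname{rank} \begin{pmatrix} \tilde B & R \tilde B & \cdots & R^{n-1} \tilde B \end{pmatrix} = \operatorname{rank} \begin{pmatrix} \tilde B & \tilde A \tilde B & \cdots & \tilde A^{n-1} \tilde B \end{pmatrix}. \tag{4.55} \]
It is easy to see that
\[ \operatorname{rank} \begin{pmatrix} \tilde B & R \tilde B & \cdots & R^{n-1} \tilde B \end{pmatrix} = \operatorname{rank} \begin{pmatrix} \tilde B \tilde B^* & R \tilde B \tilde B^* & \cdots & R^{n-1} \tilde B \tilde B^* \end{pmatrix}, \qquad \operatorname{rank} \begin{pmatrix} \tilde C \\ \tilde C R \\ \vdots \\ \tilde C R^{n-1} \end{pmatrix} = \operatorname{rank} \begin{pmatrix} \tilde C^* \tilde C \\ \tilde C^* \tilde C R \\ \vdots \\ \tilde C^* \tilde C R^{n-1} \end{pmatrix}. \tag{4.56} \]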
If we let $T_1 := \tilde B \tilde B^*$ and $T_2 := u_1^4\, \tilde C^* \tilde C$, then $T_1$ and $T_2$ satisfy condition (4.48). Hence the corresponding Riccati equation (4.49), which coincides with (4.51) for our choice of $T_1$, $T_2$, has a positive solution $X$. Therefore the parameters $A$, $\Pi(0)$, and $\beta_p^0$, $p = 1, 2$, given by formulas (4.50), (4.52), and (4.53) are well defined.
Now we show that they satisfy the conditions in (4.17). Using (4.51) and (4.52) and noting that $u_1 = \bar u_1$, we obtain
\[ \tilde A X - X \tilde A^* = i X^{1/2} \left( \pi_1 \pi_1^* - \pi_2 \pi_2^* - \frac{\bar u_2}{u_1}\, \pi_2 \pi_1^* - \frac{u_2}{u_1}\, \pi_1 \pi_2^* \right) X^{1/2}. \tag{4.57} \]
By (4.53) we infer
\[ A - A^* = i \bigl( \pi_1 \pi_1^* + \pi_2 \pi_2^* \bigr) - \frac{i}{u_1}\, \pi_2 \bigl( \bar u_2 \pi_1^* + u_1 \pi_2^* \bigr) - \frac{i}{u_1} \bigl( u_2 \pi_1 + u_1 \pi_2 \bigr) \pi_2^* + \frac{i}{u_1}\, \pi_2 \theta_2^* + \frac{i}{u_1}\, \theta_2 \pi_2^*. \tag{4.58} \]
As $U$ is unitary and $u_2 \pi_1 + u_1 \pi_2 = \theta_2$, formula (4.58) is equivalent to $A - A^* = i\,\Pi(0)\Pi(0)^*$, which is the first relation in (4.17). By the definition of $\beta_p^0$ in (4.50), we are in case (4.20) and $\beta_p^0 (\beta_p^0)^* = 1$, $p = 1, 2$, which is the third relation in (4.17). Finally, we suppose that the second relation in (4.17) does not hold, that is,
\[ (A - d_p I_n) f = 0 \tag{4.59} \]
for some $f \in \mathbb{C}^n$, $f \neq 0$. From the first relation in (4.17) and from (4.59) it follows that $\Pi(0)^* f = 0$; in particular, $\theta_2^* f = 0$. Thus, by (4.38) and (4.59), we conclude that $(A^\times - d_p I_n) f = 0$, that is, $d_p \in \sigma(A^\times)$. If we note that according to (4.38) and the second relation in (4.53) we have $A^\times = X^{-1/2} \tilde A X^{1/2}$ and use (4.52), we can rewrite (4.46) as
\[ \varphi(\lambda) = D - \frac{i}{u_1^2}\, \pi_1^* X^{-1/2} (\tilde A - \lambda I_n)^{-1} X^{1/2} \pi_2 = D - \frac{i}{u_1^2}\, \pi_1^* (A^\times - \lambda I_n)^{-1} \pi_2. \tag{4.60} \]
Since the realization on the right-hand side of (4.60) is minimal, it follows that $\varphi$ has a pole at $\lambda = d_p$, a contradiction to the assumption. This proves the second relation in (4.17).
In order to show that all assumptions of Theorem 4.4 are satisfied, (2.12) remains to be proved. In the proof of Theorem 4.4 it has been shown that the representations (4.36) and (4.37) are equivalent, and hence $\varphi(\lambda)$ can also be written as
\[ \varphi(\lambda) = \frac{-u_2 - i\theta_1^* (A - \lambda I_n)^{-1} \pi_2}{u_1 - i\theta_2^* (A - \lambda I_n)^{-1} \pi_2}. \]
It has also been shown there that (2.12) is equivalent to condition (4.42). Now, assume that (4.42) does not hold, that is,
\[ u_1 - i\theta_2^* (A - d_p I_n)^{-1} \pi_2 = 0, \qquad p = 1, 2. \tag{4.61} \]
Then, since $\varphi$ is assumed to be analytic in $d_p$, $p = 1, 2$, it also follows that
\[ -u_2 - i\theta_1^* (A - d_p I_n)^{-1} \pi_2 = 0, \qquad p = 1, 2. \tag{4.62} \]
By (4.41), relations (4.61) and (4.62) would imply that the last column of the unitary matrix $w_A(0,d_p) U^*$ is zero, a contradiction. Now Theorem 4.4 yields the assertion.
Analogously to Theorem 4.4, we obtain the following theorem for the case (4.21) when the coefficients $\beta_p^0$, $p = 1, 2$, are orthogonal.
THEOREM 4.6
Let $b_p = \pm 1$, $p = 1, 2$, and $d_p = \bar d_p$, $p = 1, 2$, $d_1 \neq d_2$ be constants. Suppose that the parameters $A$, $\Pi(0)$, and $\beta_p^0$, $p = 1, 2$, satisfy (4.17) and (4.21). Then the vector functions $\beta_p$, $p = 1, 2$, given by (4.18) are well defined and satisfy (1.2). Moreover, if (2.12) holds, the functions
\[ \varphi_1(\lambda) = \frac{-u_2 - i\theta_1^* (A - \lambda I_n)^{-1} \pi_2}{u_1 - i\theta_2^* (A - \lambda I_n)^{-1} \pi_2}, \qquad \varphi_2(\lambda) = -\bigl( \overline{\varphi_1(\bar\lambda)} \bigr)^{-1}, \tag{4.63} \]
where $\theta_1, \theta_2$ and $\pi_1, \pi_2$ denote the columns of $\Pi(0)$ and $\Pi(0) U^*$, respectively, are $W_p$-functions of the system (1.1) with coefficients $\beta_p$, $p = 1, 2$, given by (4.18). If $u_1 \neq 0$, $\varphi_1$ admits the realization
\[ \varphi_1(\lambda) = -\frac{u_2}{u_1} - \frac{i}{u_1^2}\, \pi_1^* (A^\times - \lambda I_n)^{-1} \pi_2, \tag{4.64} \]
where again $A^\times$ is given by (4.38). If $u_2 \neq 0$, $\varphi_2$ admits the realization
\[ \varphi_2(\lambda) = \frac{\bar u_1}{\bar u_2} + \frac{i}{\bar u_2^{\,2}}\, \pi_2^* (A^\circ - \bar\lambda I_n)^{-*} \pi_1, \tag{4.65} \]
where
\[ A^\circ := A^\times + \frac{i}{u_1 u_2}\, \pi_2 \pi_1^*. \tag{4.66} \]
Proof
We proceed similarly as in the proof of Theorem 4.4. Instead of (4.39), we obtain, in
view of (4.19) and (4.23),
\[ w(x,\lambda)\, w_A(0,\lambda)\, U^* \begin{pmatrix} 0 \\ 1 \end{pmatrix} = e^{ixb_2(\lambda-d_2)^{-1}}\, w_A(x,\lambda)\, U^* \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \qquad w(x,\lambda)\, w_A(0,\lambda)\, U^* \begin{pmatrix} 1 \\ 0 \end{pmatrix} = e^{ixb_1(\lambda-d_1)^{-1}}\, w_A(x,\lambda)\, U^* \begin{pmatrix} 1 \\ 0 \end{pmatrix}. \tag{4.67} \]
From the first relation in (4.67) we then derive a $W_1$-function $\varphi_1$ as before. Indeed, if we choose $M > 0$ such that $\sigma(A) \cup \{d_2\} \not\subset D_M$, we see that again inequality (4.40) is valid. Therefore all further considerations of the proof of Theorem 4.4 can be applied here for the case $p = 1$; hence we obtain the same representation and realization for $\varphi_1$ as in Theorem 4.4, and the equivalence of the conditions
\[ u_1 - i\theta_2^* (A - d_1 I_n)^{-1} \pi_2 \neq 0 \tag{4.68} \]
and
\[ \beta_{11}(0) \neq 0. \tag{4.69} \]
If we denote the entries of the $(2 \times 2)$-matrix $w_A(0,\lambda) U^*$ by $w_{jk}(\lambda)$, $j, k = 1, 2$, then, by (4.41), $\varphi_1$ can be written as
\[ \varphi_1(\lambda) = \frac{w_{12}(\lambda)}{w_{22}(\lambda)}. \tag{4.70} \]
In a similar way, from the second equation in (4.67) we obtain the expression
\[ \varphi_2(\lambda) = \frac{w_{11}(\lambda)}{w_{21}(\lambda)} \tag{4.71} \]
for a $W_2$-function if
\[ w_{21}(d_2) \neq 0. \tag{4.72} \]
According to (4.18), (4.21), and (4.24), the inequality β21 (0) ≠ 0 (i.e., (2.12) for p = 2) is equivalent to w12 (d2 ) ≠ 0, which, due to the fact that w A (0, λ)U ∗ is unitary, implies (4.72). Hence (4.71) indeed defines a W2 -function. By (4.16), we have w11 (λ)w21 (λ) + w12 (λ)w22 (λ) = 0, which yields the representation of ϕ2 in (4.63). The realization of ϕ2 is an easy consequence of the realization of ϕ1 in (4.64).
We would like to mention that (4.63) reveals the connection between the analytically continued rational W1 - and W2 -functions which takes place on the whole complex plane excluding the poles. The next theorem is the analogue of Theorem 4.5 for the case (4.21) that the coefficients β 0p , p = 1, 2, are orthogonal.
THEOREM 4.7
A proper rational function $\varphi$ which is analytic at $d_1$ and takes nonzero values at $d_2$ and at $\infty$ is a $W_1$-function, and $-\overline{\varphi(\bar\lambda)}^{\,-1}$ is a $W_2$-function for system (1.1) with coefficients $\beta_p$, $p = 1, 2$, given by (4.18) where the parameters $A$, $\Pi(0) = \begin{pmatrix} \theta_1 & \theta_2 \end{pmatrix}$, and $\beta_1^0$, $\beta_2^0$ satisfying (4.21) are defined through the minimal realization (4.46) of $\varphi$ by means of the formulas
$$u_1 := \frac{1}{\sqrt{1 + |D|^2}}, \qquad u_2 = -\frac{D}{\sqrt{1 + |D|^2}}, \tag{4.73}$$
(4.52), and (4.53). The coefficients $\beta_p$, $p = 1, 2$, given by (4.18) via these parameters satisfy condition (2.12), that is, $\beta_{p1}(0) \neq 0$, $p = 1, 2$.

Proof
Most of the proof coincides with the proof of Theorem 4.5. By the same reasoning as therein, we get the existence of a positive solution $X$ of the Riccati equation (4.51), the validity of the first and the third relation in (4.17), the representation
$$\varphi(\lambda) = -\frac{u_2}{u_1} - \frac{i}{u_1^2}\,\pi_1^*(A^\times - \lambda I_n)^{-1}\pi_2 = \frac{-u_2 - i\theta_1^*(A - \lambda I_n)^{-1}\pi_2}{u_1 - i\theta_2^*(A - \lambda I_n)^{-1}\pi_2}, \tag{4.74}$$
and the properties
$$u_1 \neq 0, \qquad d_1 \notin \sigma(A), \qquad \beta_{11}(0) \neq 0.$$
In order to satisfy all assumptions of Theorem 4.6 and to obtain thus the claim of the present theorem, it remains to be shown that
$$u_2 \neq 0, \qquad d_2 \notin \sigma(A), \qquad \beta_{21}(0) \neq 0. \tag{4.75}$$
That $u_2 \neq 0$ follows from the assumption that $\varphi(\infty) \neq 0$. Thus (4.74) implies a minimal realization for $\varphi^{-1}$, namely,
$$\varphi^{-1}(\lambda) = -\frac{u_1}{u_2}\left(1 - \frac{i}{u_1 u_2}\,\pi_1^*(A^\circ - \lambda I_n)^{-1}\pi_2\right), \tag{4.76}$$
where $A^\circ$ is given by (4.66). By the assumptions, $\varphi^{-1}$ has no singularity at $\lambda = d_2$, that is, $d_2 \notin \sigma(A^\circ)$. On the other hand, if $d_2 \in \sigma(A)$, then there exists $f \in \mathbb{C}^n$, $f \neq 0$, such that $(A^\times - d_2 I_n) f = 0$ and $\Pi(0)^* f = 0$ (see the proof of Theorem 4.5), and in particular, $\pi_1^* f = 0$. Then, by the definition of $A^\circ$ in (4.66), it follows that $(A^\circ - d_2 I_n) f = 0$, that is, $d_2 \in \sigma(A^\circ)$, a contradiction. This proves the second relation in (4.75).
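The passage from (4.74) to (4.76) is the usual inversion of a realization. As a rough numerical sanity check, the following sketch treats only the scalar case $n = 1$, where $A^\times$, $\pi_1$, and $\pi_2$ are complex numbers; all numerical values are hypothetical and were chosen only so that $u_1^2 + u_2^2 = 1$ and no sample point is a pole.

```python
# Scalar (n = 1) sanity check of the inversion step (4.74) -> (4.76).
# All numerical values are hypothetical and serve only as an illustration.

def phi(lam, a_x, pi1, pi2, u1, u2):
    """phi(lambda) from the realization in (4.74), with A^x = a_x a scalar."""
    return -u2 / u1 - (1j / u1**2) * pi1.conjugate() * pi2 / (a_x - lam)

def phi_inverse(lam, a_x, pi1, pi2, u1, u2):
    """phi^{-1}(lambda) from (4.76), with A^o built from A^x as in (4.66)."""
    coupling = (1j / (u1 * u2)) * pi2 * pi1.conjugate()
    a_circ = a_x + coupling
    return -(u1 / u2) * (1 - coupling / (a_circ - lam))

params = dict(a_x=0.3 + 0.7j, pi1=1.2 - 0.4j, pi2=-0.5 + 0.9j, u1=0.6, u2=-0.8)
sample_points = [0.0, 1.5, -2.0 + 0.5j, 3.0 - 1.0j]

# phi * phi^{-1} should equal 1 away from the poles.
max_error = max(
    abs(phi(lam, **params) * phi_inverse(lam, **params) - 1)
    for lam in sample_points
)
```

In this scalar setting `max_error` stays at the level of rounding error, which is consistent with $A^\circ$ in (4.66) being the state matrix of the inverse realization.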
Finally, according to (4.18), (4.21), and (4.24), the last relation in (4.75) is equivalent to $w_{12}(d_2) \neq 0$. This is, in turn, equivalent to
$$\varphi(d_2) = \frac{w_{12}(d_2)}{w_{22}(d_2)} \neq 0,$$
which is satisfied due to the assumption.

Remark 4.8
Notice that the Weyl functions $\varphi_1$ and $\varphi_2$ considered in Theorems 4.5 and 4.7 are analytic at $d_1$ and $d_2$, respectively. Therefore the conditions of Theorem 3.2 are fulfilled, and hence the solutions of the inverse problems constructed in the above-mentioned theorems are unique.

Appendix. Proof of Theorem 2.1
We fix $p \in \{1, 2\}$, and we set
$$\begin{aligned}
q(x) &:= (q_{ij}(x))_{i,j=1}^{2} := Q'(x)Q(x)^* + i b_k (d_p - d_k)^{-1} Q(x)\beta_k(x)^*\beta_k(x)Q(x)^*, \\
q_0(x) &:= \operatorname{diag}(q_{11}(x), q_{22}(x)),
\end{aligned} \tag{A.1}$$
for $x \in [0, \infty)$ where $k \in \{1, 2\}$ was defined by $\{1, 2\} = \{p, k\}$. Furthermore, we introduce the diagonal matrix function $D_0$ by the initial value problem
$$D_0(0) = I_2, \qquad D_0'(x) = q_0(x)D_0(x), \qquad x \in [0, \infty), \tag{A.2}$$
and put
$$\widetilde W(x, \mu) := D_0(x)^{-1} W(x, \mu), \qquad x \in [0, \infty),\ \mu \in \mathbb{C}. \tag{A.3}$$
Note that according to (A.1) and (A.2) we have
$$q_0(x)^* = -q_0(x), \qquad D_0(x)^* = D_0(x)^{-1}. \tag{A.4}$$
From (2.7), (2.8), and (A.3) we easily obtain
$$\widetilde W_x(x, \mu) = \bigl(i\mu j + \widetilde\xi(x, \mu)\bigr)\widetilde W(x, \mu), \qquad \widetilde\xi(x, \mu) := D_0(x)^{-1}\bigl(\xi(x, \mu) - q_0(x)\bigr)D_0(x). \tag{A.5}$$
By (A.5), the representation
$$\widetilde W(x, \mu) = \sum_{r=0}^{\infty} v_r(x, \mu) \tag{A.6}$$
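holds with
$$v_0(x, \mu) = e^{i\mu x j}, \qquad v_r(x, \mu) = \int_0^x e^{i\mu(x-u)j}\,\widetilde\xi(u, \mu)\,v_{r-1}(u, \mu)\,du, \qquad r = 1, 2, \ldots. \tag{A.7}$$
Taking into account (A.7), it can be checked that
$$\begin{aligned}
v_1(x, \mu) ={}& \int_{-x}^{x} e^{i\mu u}\,\widetilde N(x, u, 1)\,du + \left(\mu - \frac{b_p}{2(d_k - d_p)}\right)^{-1} \int_{-x}^{x} e^{i\mu u}\,\widetilde N_k(x, u, 1)\,du \\
&+ \left(\mu - \frac{b_p}{2(d_k - d_p)}\right)^{-1} e^{i\mu x j} \int_0^x \operatorname{diag}\bigl(\chi_{k11}(u), \chi_{k22}(u)\bigr)\,du,
\end{aligned} \tag{A.8}$$
where
$$\widetilde N(x, u, 1) := \frac{1}{2}\begin{pmatrix} 0 & \widetilde q_{12}((x-u)/2) \\ \widetilde q_{21}((x+u)/2) & 0 \end{pmatrix}, \qquad
\widetilde N_k(x, u, 1) := \frac{1}{2}\begin{pmatrix} 0 & \chi_{k12}((x-u)/2) \\ \chi_{k21}((x+u)/2) & 0 \end{pmatrix} \tag{A.9}$$
and
$$\begin{aligned}
\widetilde q(x) &:= (\widetilde q_{ij}(x))_{i,j=1}^{2} := D_0(x)^{-1} q(x) D_0(x), \\
\chi_k(x) &:= (\chi_{kij}(x))_{i,j=1}^{2} := -\frac{i}{2}\, b_k b_p (d_p - d_k)^{-2}\, D_0(x)^{-1} Q(x)\beta_k(x)^*\beta_k(x)Q(x)^* D_0(x)
\end{aligned}$$
for $x, u \in [0, \infty)$, $|u| \le x$. By induction, using (A.6)–(A.9), we obtain the representation
$$v_r(x, \mu) = \int_{-x}^{x} e^{i\mu u}\,\widetilde N(x, u, r)\,du + \left(\mu - \frac{b_p}{2(d_k - d_p)}\right)^{-1} \int_{-x}^{x} e^{i\mu u}\,\widetilde N_k(x, u, r)\,du + \Theta(x, \mu, r), \tag{A.10}$$
for $r = 2, 3, \ldots$, with some functions $\widetilde N$, $\widetilde N_k$, and $\Theta$ satisfying the inequalities
$$\|\widetilde N(x, u, r)\| \le C^r \frac{x^{r-1}}{(r-1)!}, \qquad \|\widetilde N_k(x, u, r)\| \le C^r \frac{x^{r-1}}{(r-1)!}, \qquad \|\mu^2\,\Theta(x, \mu, r)\| \le C^r \frac{x^{r-1}}{(r-1)!} \tag{A.11}$$
for all $|u| \le x \le l$ with some constant $C = C(l, \Im(\mu))$ for any $l < \infty$. In particular, the function $\widetilde N$ in (A.10) is given by
$$\widetilde N(x, u, r) := \begin{pmatrix} \displaystyle\int_{(x-u)/2}^{x} \widetilde q_{12}(s)\,\begin{pmatrix} 0 & 1 \end{pmatrix} \widetilde N(s, u + s - x, r - 1)\,ds \\[2ex] \displaystyle\int_{(x+u)/2}^{x} \widetilde q_{21}(s)\,\begin{pmatrix} 1 & 0 \end{pmatrix} \widetilde N(s, u + x - s, r - 1)\,ds \end{pmatrix},$$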
and the first inequality in (A.11) is immediate. The other two inequalities can be proved analogously. From (A.3) and (A.6)–(A.11), representation (2.10) follows, and (A.11) also implies (2.11).
Mennicken
Department of Mathematics, University of Regensburg, 93040 Regensburg, Germany

Sakhnovich
Branch of Hydroacoustics, Marine Institute of Hydrophysics, National Academy of Sciences of Ukraine, Preobrajenskaja 3, 65100 Odessa, Ukraine

Tretter
Department of Mathematics and Computer Science, University of Leicester, University Road, Leicester LE1 7RH, United Kingdom
DUKE MATHEMATICAL JOURNAL © 2001 Vol. 109, No. 3

SUB-GAUSSIAN ESTIMATES OF HEAT KERNELS ON INFINITE GRAPHS

ALEXANDER GRIGOR’YAN AND ANDRAS TELCS
Abstract
We prove that a two-sided sub-Gaussian estimate of the heat kernel on an infinite weighted graph takes place if and only if the volume growth of the graph is uniformly polynomial and the Green kernel admits a uniform polynomial decay.

Contents
1. Introduction
2. Statement of the main result
3. Preliminaries
4. Outline of the proof and its consequences
5. The Faber-Krahn inequality and on-diagonal upper bounds
6. The mean exit time and the Green kernel
7. Sub-Gaussian term
8. Off-diagonal upper bound of the heat kernel
9. On-diagonal lower bound
10. The Harnack inequality and the Green kernel
11. Oscillation inequalities
12. Time derivative of the heat kernel
13. Off-diagonal lower bound
14. Parity matters
15. Consequences of the heat kernel estimates
Appendix. The list of the lettered conditions
References
Received 5 July 2000. Revisions received 18 October 2000.
2000 Mathematics Subject Classification. Primary 60B99, 60J35, 60G50.
Grigor’yan’s work supported by Engineering and Physical Sciences Research Council research fellowship number B/94/AF/1782.
Telcs’s work partially supported by a visiting grant of the London Mathematical Society.
1. Introduction
Consider the heat equation
$$\frac{\partial f}{\partial t} = \Delta f, \tag{1.1}$$
where $f = f(t, x)$ is a function of $t > 0$ and $x \in \mathbb{R}^n$, and where $\Delta$ is the Laplace operator in $\mathbb{R}^n$. The fundamental solution to (1.1) is given by the classical Gauss-Weierstrass formula
$$f(t, x) = \frac{1}{(4\pi t)^{n/2}} \exp\left(-\frac{|x|^2}{4t}\right).$$
The function $p_t(x, y) = f(t, x - y)$ is called the heat kernel of the Laplace operator.

In the past three decades, there have been numerous works devoted to estimates of heat kernels in various settings (see, e.g., books and surveys [4], [15], [16], [25], [35], [52], [55], [65], [66]). These are parabolic equations with variable coefficients, the heat equation on Riemannian manifolds, the discrete heat equation on graphs, and the heat semigroups on general metric measure spaces including fractal-like sets. Despite the high diversity of the underlying spaces and equations, in many important cases the heat kernel is naturally defined and, moreover, admits the so-called Gaussian estimates.

For any metric measure space $M$ with distance $d$ and measure $\mu$, denote by $B(x, r)$ the open metric ball of radius $r$ centered at $x$, and denote by $V(x, r)$ its measure $\mu$. Suppose first that $M$ is either a discrete group or a Lie group, with properly defined $d$, $\mu$ and the heat kernel $p_t(x, y)$. Assume that the volume growth of $M$ is polynomial; that is, for some $\alpha > 0$,
$$V(x, r) \simeq r^{\alpha}. \tag{1.2}$$
(Here the sign $\simeq$ means that the ratio of both sides of (1.2) stays between two positive constants.) Then the heat kernel on $M$ admits the following Gaussian estimate (see [64], [37]):
$$p_t(x, y) \simeq t^{-\alpha/2} \exp\left(-\frac{d^2(x, y)}{ct}\right) \tag{1.3}$$
(where the positive constant $c$ may be different for the upper and lower bounds). The heat kernel in $\mathbb{R}^n$ obviously satisfies (1.3) with $\alpha = n$.

Suppose now that $M$ is a complete manifold with nonnegative Ricci curvature. Then the following estimate of P. Li and S.-T. Yau [47] is well known:
$$p_t(x, y) \simeq \frac{1}{V(x, \sqrt{t})} \exp\left(-\frac{d^2(x, y)}{ct}\right). \tag{1.4}$$
In particular, if $V(x, r) \simeq r^{\alpha}$, then the heat kernel again satisfies the estimate (1.3).
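As a quick illustration of the Gauss-Weierstrass formula, the following sketch checks numerically, in dimension $n = 1$ only, that $f$ satisfies the heat equation (1.1) up to discretization error and that $f(t, \cdot)$ has total mass 1. The step sizes and sample points are arbitrary choices made for this illustration.

```python
import math

def gauss_weierstrass(t, x):
    """Fundamental solution of (1.1) on the real line (n = 1)."""
    return (4 * math.pi * t) ** -0.5 * math.exp(-x * x / (4 * t))

def heat_residual(t, x, h=1e-3):
    """Central-difference approximation of df/dt - d^2 f/dx^2 at (t, x)."""
    dt = (gauss_weierstrass(t + h, x) - gauss_weierstrass(t - h, x)) / (2 * h)
    dxx = (gauss_weierstrass(t, x + h) - 2 * gauss_weierstrass(t, x)
           + gauss_weierstrass(t, x - h)) / h**2
    return dt - dxx

# Largest residual over a few sample points (should be small, of order h^2).
residual = max(abs(heat_residual(t, x))
               for t in (0.5, 1.0, 2.0) for x in (-1.0, 0.0, 1.5))

# Total mass of f(1, .) by the trapezoid rule on [-40, 40]; the endpoint
# values are negligible, so their half-weights are not treated separately.
step = 0.01
mass = step * sum(gauss_weierstrass(1.0, -40 + k * step) for k in range(8001))
```

The same kernel satisfies (1.3) with $\alpha = n = 1$ and $d(x, y) = |x - y|$, with $c = 4$ in both bounds.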
As we see, for groups of polynomial growth and for nonnegatively curved manifolds, the heat kernel is fully determined (up to constant factors) by the volume growth function. In other words, the potential theory on such spaces is characterized by a single parameter $\alpha$, the exponent of the volume growth.

The presence of Gaussian estimates (1.3) or (1.4) reflects certain properties of the space $M$. In particular, (1.4) implies that the Markov process $X_t$ with the transition density $p_t(x, y)$ has the diffusion speed of the order $t^{1/2}$. The latter means that the process $X_t$, started at a point $x$, first exits the ball $B(x, R)$ at the time $t \simeq R^2$.

The development of Markov processes on fractals and fractal-like graphs (see [10], [7], [30], [36], [40], [41], [42], [44], [45], [59], [67], etc.) has led to construction of homogeneous metric spaces $M$ where the process $X_t$ has the diffusion speed of the order $t^{1/\beta}$, with some $\beta > 2$. Such a process $X_t$ is referred to as subdiffusive, and it is characterized by two parameters $\alpha$ and $\beta$, which determine sub-Gaussian estimates of the heat kernel:
$$p_t(x, y) \simeq t^{-\alpha/\beta} \exp\left(-\left(\frac{d^{\beta}(x, y)}{ct}\right)^{1/(\beta - 1)}\right). \tag{1.5}$$
Here $\alpha$ is the exponent of the volume growth as in (1.2). The Gaussian estimate (1.3) is a particular case of (1.5) for $\beta = 2$. M. Barlow and R. Bass [7] showed that sub-Gaussian estimates (1.5) with $\beta > 2$ can take place not only on singular spaces such as fractals but also on smooth Riemannian manifolds, for a certain range of time. Similar estimates hold for random walks on certain fractal-like graphs (see [8], [39]).

It has become apparent that a large and interesting class of homogeneous spaces features sub-Gaussian estimates of the heat kernel. The potential theory on such spaces is determined by the two parameters and hence cannot be recovered only from the volume growth.∗ A natural question arises: How do we characterize those spaces that admit sub-Gaussian estimates (1.5) of the heat kernel?
∗ The parameters $\alpha$ and $\beta$ must satisfy the inequalities $2 \le \beta \le \alpha + 1$, which seem to be the only constraints on $\alpha$ and $\beta$. We are indebted to Martin Barlow for providing us with the evidence for the latter.

If $M$ is a complete noncompact Riemannian manifold, then the validity of Gaussian estimate (1.3) is known to be equivalent to the following two conditions: volume growth (1.2) and the Poincaré inequality
$$\lambda_1^{(N)}(B(x, r)) \ge \frac{c}{r^2}, \tag{1.6}$$
where $\lambda_1^{(N)}(B)$ is the first nonzero eigenvalue of the Neumann boundary value problem in the ball $B$ (see [53], [31]; similar results are known also for graphs (see [28]) and for abstract local Dirichlet spaces (see [57])). It may be tempting to conjecture that by replacing $r^2$ by $r^{\beta}$ in (1.6), one obtains equivalent conditions for sub-Gaussian estimates. However, this conjecture is false. At the present time, no similar characterization of spaces with sub-Gaussian estimates seems to be known. All examples of spaces where (1.5) is proved are fractal-like spaces featuring a self-similarity structure.

The purpose of this paper is to provide a new approach to obtaining sub-Gaussian estimates of the heat kernel. Our point of departure is the understanding that, apart from the uniform volume growth $V(x, r) \simeq r^{\alpha}$, we have to introduce additional hypotheses, which would contain the second parameter $\beta$ and provide the necessary homogeneity of the space. (Just the uniform volume growth is not enough for the latter.) Let $g(x, y)$ be the Green kernel on $M$; that is,
$$g(x, y) = \int_0^{\infty} p_t(x, y)\,dt.$$
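Recall that, in $\mathbb{R}^n$, $g(x, y) = c_n |x - y|^{-(n-2)}$ if $n > 2$ and $g \equiv \infty$ if $n \le 2$. Our general result says the following: Given the parameters $\alpha > \beta \ge 2$, the two-sided sub-Gaussian estimate
$$p_t(x, y) \simeq t^{-\alpha/\beta} \exp\left(-\left(\frac{d^{\beta}(x, y)}{ct}\right)^{1/(\beta - 1)}\right) \tag{1.7}$$
holds if and only if
$$V(x, r) \simeq r^{\alpha} \quad \text{and} \quad g(x, y) \simeq d(x, y)^{-(\alpha - \beta)}. \tag{1.8}$$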
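The Euclidean case gives a concrete instance of (1.8) with $\alpha = n$ and $\beta = 2$. The sketch below integrates the Gauss-Weierstrass kernel in time numerically and compares the result with $c_n r^{-(n-2)}$. The closed form $c_n = \Gamma(n/2 - 1)/(4\pi^{n/2})$ is the standard value and is not stated in the text above; the quadrature parameters are arbitrary choices for this illustration.

```python
import math

def heat_kernel(t, r, n):
    """Gauss-Weierstrass kernel in R^n at distance r = |x - y|."""
    return (4 * math.pi * t) ** (-n / 2) * math.exp(-r * r / (4 * t))

def green_kernel(r, n, s_min=-40.0, s_max=40.0, steps=16000):
    """Approximate the integral of p_t over t in (0, inf).

    Uses the substitution t = exp(s), so dt = t ds, and the trapezoid
    rule in s on [s_min, s_max].
    """
    ds = (s_max - s_min) / steps
    total = 0.0
    for k in range(steps + 1):
        t = math.exp(s_min + k * ds)
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * heat_kernel(t, r, n) * t
    return total * ds

def exact_green(r, n):
    """c_n * r^{-(n-2)} with c_n = Gamma(n/2 - 1) / (4 pi^{n/2}), for n > 2."""
    return math.gamma(n / 2 - 1) / (4 * math.pi ** (n / 2)) * r ** (-(n - 2))

# Relative error of the numerical Green kernel for n = 3, 4 and a few radii.
relative_errors = [
    abs(green_kernel(r, n) / exact_green(r, n) - 1)
    for n in (3, 4) for r in (0.5, 1.0, 2.0)
]
```

For $n = 3$ this reproduces the Newtonian kernel $1/(4\pi r)$, whose decay exponent is $\alpha - \beta = 3 - 2 = 1$.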
We do not specify here the ranges of the variables x, y, t, r because they are different for different settings. In the present paper, we treat the case when the underlying space is a graph, and the time is also discrete. However, the graph case already contains all difficulties. We present the proof in such a way that only minimal changes are required to pass to a general setting of abstract metric spaces, which will be dealt with elsewhere. The exact statements are given in Section 2. Note that our result is new, even for the Gaussian case β = 2. Hypothesis (1.8) consists of two conditions of different nature. The first one is a geometric condition of the volume growth, whereas the second is an estimate of a fundamental solution to an elliptic equation. Neither of them separately implies heat kernel bounds (1.7). Surprisingly enough, the exponent β, which provides the scaling of space and time variables for a parabolic equation, can be recovered from an elliptic equation, although combined with the volume growth. The paper is arranged as follows. In Section 2 we state the main result—Theorem 2.1. In Section 3 we introduce necessary tools such as the discrete Laplace operator,
its eigenvalues, the mean exit time, and so on. In Section 4 we describe the scheme of the proof of Theorem 2.1, as well as some consequences. In particular, we mention some other conditions equivalent to (1.7). The actual proof of Theorem 2.1 consists of many steps that are considered in detail in Sections 5–15.

Notation. The letters $c$, $C$ are reserved for positive constants not depending on the variables in question. They may be different on different occurrences, even within the same formula. All results of the paper are quantitative in the sense that the constants in conclusions depend only on the constants in hypotheses. The relation $f \simeq g$ means that the ratio of functions $f$ and $g$ is bounded from above and below by positive constants, for the specified range of the variables. If one of those functions contains a sub-Gaussian factor $\exp\bigl(-(d^{\beta}/(ct))^{1/(\beta - 1)}\bigr)$, then the constant $c$ in $\exp$ may be different for the upper and lower bounds (cf. (1.7)). We use a number of lettered formulas such as $(UE)$, $(LE)$, and so on, to refer to the most important and frequently used conditions. In the appendix we provide a complete list of all such formulas.

2. Statement of the main result
Throughout the paper, $\Gamma$ denotes an infinite, connected, locally finite graph. If $x, y \in \Gamma$, then we write $x \sim y$, provided $x$ and $y$ are connected by an edge. The graph is always assumed nonoriented; that is, $x \sim y$ is equivalent to $y \sim x$. We do not exclude loops so that $x \sim x$ is possible. If $x \sim y$, then $xy$ denotes the edge connecting $x$ and $y$. The distance $d(x, y)$ is the minimal number of edges in any edge path connecting $x$ and $y$.

Assume that graph $\Gamma$ is endowed by a weight $\mu_{xy}$, which is a symmetric nonnegative function on $\Gamma \times \Gamma$ such that $\mu_{xy} > 0$ if and only if $x \sim y$. Given $\mu_{xy}$, we also define a measure $\mu$ on vertices by
$$\mu(x) := \sum_{y \sim x} \mu_{xy}$$
and
$$\mu(A) := \sum_{x \in A} \mu(x),$$
for any finite set $A \subset \Gamma$. The couple $(\Gamma, \mu)$ is called a weighted graph. Here $\mu$ refers both to the weight $\mu_{xy}$ and to the measure $\mu$. Any graph $\Gamma$ admits a standard weight, which is defined by $\mu_{xy} = 1$ for all edges $xy$. For such a weight, $\mu(x)$ is equal to the degree of the vertex $x$, which is the number of its neighbors.
Any weighted graph has a natural Markov operator $P(x, y)$ defined by
$$P(x, y) := \frac{\mu_{xy}}{\mu(x)}. \tag{2.1}$$
Clearly, we have
$$\sum_{y \in \Gamma} P(x, y) = 1 \tag{2.2}$$
and
$$P(x, y)\mu(x) = P(y, x)\mu(y). \tag{2.3}$$
For the Markov operator $P$, there is an associated random walk $X_n$, jumping at each time $n \in \mathbb{N}$ from a current vertex $x$ to a neighboring vertex $y$ with probability $P(x, y)$. The process $X_n$ is Markov and reversible with respect to measure $\mu$. If $\mu$ is the standard weight on $\Gamma$, then $X_n$ is called a simple random walk on $\Gamma$.

Conversely, given a countable set $\Gamma$ with a measure $\mu$ and a Markov operator $P(x, y)$ on $\Gamma$ satisfying (2.3), identity (2.1) uniquely determines a symmetric weight $\mu_{xy}$ on $\Gamma \times \Gamma$. Then one defines edges $xy$ as those pairs of vertices for which $\mu_{xy} \neq 0$, and one obtains a weighted graph $(\Gamma, \mu)$. One has to assume in addition that the resulting graph $\Gamma$ is connected and locally finite.

Let $P_n$ denote the $n$th convolution power of the operator $P$. Alternatively, $P_n(x, y)$ is the transition function of the random walk $X_n$; that is,
$$P_n(x, y) = \mathbb{P}_x(X_n = y).$$
Define also the transition density of $X_n$, or the heat kernel, by
$$p_n(x, y) := \frac{P_n(x, y)}{\mu(y)}.$$
As obviously follows from (2.3), $p_n(x, y) = p_n(y, x)$.

The only a priori assumption that we normally make about the transition probability is the following:
$$P(x, y) \ge p_0, \quad \forall x \sim y, \tag{$p_0$}$$
where $p_0$ is a positive constant. Due to (2.2), hypothesis $(p_0)$ implies that the degree of each vertex $x \in \Gamma$ is uniformly bounded from above. The latter is in fact equivalent to $(p_0)$, provided $X_n$ is a simple random walk.

By sub-Gaussian heat kernel estimates on graphs, we mean the following inequalities:
$$p_n(x, y) \le C n^{-\alpha/\beta} \exp\left(-\left(\frac{d(x, y)^{\beta}}{Cn}\right)^{1/(\beta - 1)}\right) \tag{$UE$}$$
and
$$p_n(x, y) + p_{n+1}(x, y) \ge c n^{-\alpha/\beta} \exp\left(-\left(\frac{d(x, y)^{\beta}}{cn}\right)^{1/(\beta - 1)}\right), \quad n \ge d(x, y), \tag{$LE$}$$
where $x, y$ are arbitrary points on $\Gamma$ and where $n$ is a positive integer.

Let us comment on the differences between $(UE)$ and $(LE)$. First, observe that $p_n(x, y) = 0$ whenever $n < d(x, y)$. (Indeed, the random walk cannot get from $x$ to $y$ in a number of steps smaller than $d(x, y)$.) Therefore, the restriction $n \ge d(x, y)$ in $(LE)$ is necessary. We could assume the same restriction in $(UE)$, but if $p_n(x, y) = 0$, then $(UE)$ is true anyway.

Another difference, the use of $p_n + p_{n+1}$ in $(LE)$ in place of $p_n$ in $(UE)$, is due to the parity problem. Indeed, if graph $\Gamma$ is bipartite (e.g., $\mathbb{Z}^D$), then $p_n(x, y) = 0$ whenever $n$ and $d(x, y)$ have different parities. Therefore, the lower bound for $p_n$ cannot hold in general, and we state it for $p_n + p_{n+1}$ instead. Alternatively, one could say that the lower bound holds either for $p_n$ or for $p_{n+1}$. The structure of the graph may cause one of $p_n$, $p_{n+1}$ to be small (or even vanish), but it is not possible to decide a priori which of these two terms admits the lower bound (see Section 14 for more details).

Denote by $B(x, R)$ a ball on $\Gamma$ of radius $R$ centered at $x$, and denote by $V(x, R)$ its measure, that is,
$$B(x, R) := \{y \in \Gamma : d(x, y) < R\}, \qquad V(x, R) := \mu(B(x, R)).$$
We say that graph $(\Gamma, \mu)$ has the regular volume growth of degree $\alpha$ if
$$V(x, R) \simeq R^{\alpha}, \quad \forall x \in \Gamma,\ R \ge 1. \tag{$V$}$$
The Green kernel of $(\Gamma, \mu)$ is defined by
$$g(x, y) := \sum_{n=0}^{\infty} p_n(x, y).$$
Assuming that $\alpha > \beta$, the estimates $(UE)$ and $(LE)$ imply, upon summation in $n$,
$$g(x, y) \simeq d(x, y)^{-\gamma}, \quad \forall x \neq y, \tag{$G$}$$
where $\gamma = \alpha - \beta$. It turns out that $(G)$, together with the volume growth condition $(V)$, is sufficient to recover the heat kernel estimates $(UE)$ and $(LE)$, as is stated in the following main theorem.

THEOREM 2.1
Let $\alpha > \beta > 1$, and let $\gamma = \alpha - \beta$. For any infinite connected weighted graph $(\Gamma, \mu)$ satisfying $(p_0)$, the following equivalence holds:
$$(V) + (G) \iff (UE) + (LE).$$
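The definitions of this section can be sketched on a small example. The code below builds the Markov operator (2.1) and the heat kernel $p_n$ on a 6-cycle with alternating edge weights and exposes the quantities needed to check (2.2), (2.3), the symmetry of $p_n$, and the parity phenomenon behind $(LE)$. The finite cycle and its weights are hypothetical stand-ins chosen for illustration; the theorem itself concerns infinite graphs.

```python
from fractions import Fraction

N = 6  # vertices 0..5 of a cycle (bipartite, since N is even)

# Symmetric edge weights mu_xy (hypothetical values); zero means "no edge".
mu_xy = [[Fraction(0)] * N for _ in range(N)]
for x in range(N):
    y = (x + 1) % N
    mu_xy[x][y] = mu_xy[y][x] = Fraction(1 + x % 2)   # weights alternate 1, 2

mu = [sum(row) for row in mu_xy]                        # mu(x) = sum_y mu_xy
P = [[mu_xy[x][y] / mu[x] for y in range(N)] for x in range(N)]   # (2.1)

def matmul(a, b):
    """Product of two N x N matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def heat_kernel(n):
    """p_n(x, y) = P_n(x, y) / mu(y), where P_n is the n-th power of P."""
    Pn = [[Fraction(int(i == j)) for j in range(N)] for i in range(N)]
    for _ in range(n):
        Pn = matmul(Pn, P)
    return [[Pn[x][y] / mu[y] for y in range(N)] for x in range(N)]

def dist(x, y):
    """Graph distance on the cycle."""
    return min((x - y) % N, (y - x) % N)
```

Exact rational arithmetic is used so that identities such as (2.3) and $p_n(x, y) = p_n(y, x)$ can be checked without rounding error. On this bipartite example, $p_n(x, y)$ vanishes whenever $n$ and $d(x, y)$ have different parities, while $p_n + p_{n+1}$ is positive for every $n \ge d(x, y)$.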
Remark 2.1
Under hypotheses $(V)$ and $(G)$, some partial heat kernel estimates were obtained by A. Telcs [62].

It is well known that a simple random walk in $\mathbb{Z}^D$ admits the Gaussian estimate
$$c n^{-D/2} \exp\left(-\frac{d^2(x, y)}{cn}\right) \le p_n(x, y) \le C n^{-D/2} \exp\left(-\frac{d^2(x, y)}{Cn}\right), \tag{2.4}$$
subject to the restrictions $n \equiv d(x, y) \pmod 2$ and $d(x, y) \le n$. Similar Gaussian estimates were also proved for more general graphs, under various assumptions (see [37], [54], [22], [28]). It is easy to see that (2.4) is equivalent to $(UE) + (LE)$ for $\alpha = D$ and $\beta = 2$ (see Section 14 for the parity matters).

Barlow and Bass [8] constructed a family of graphs, the graphical Sierpiński carpets (resembling in the large scale the multi-dimensional Sierpiński carpet), which are characterized by the parameters $\alpha$ and $\beta$. Heat kernels on those graphs satisfy sub-Gaussian estimates $(UE)$ and $(LE)$.

In general, the parameters $\alpha$ and $\beta$ in $(UE)$ and $(LE)$ must satisfy the inequalities
$$2 \le \beta \le \alpha + 1, \tag{2.5}$$
which can be seen as follows. By [9, Th. 2.1], the lower bound in $(V)$ implies the on-diagonal upper bound $p_n(x, x) \le C n^{-\alpha/(\alpha + 1)}$. By the result of [48], the upper bound in $(V)$ implies the on-diagonal lower bound $p_n(x, x) \ge c\,(n \log n)^{-\alpha/2}$. Comparing these estimates with the on-diagonal lower and upper bounds implied by $(LE)$ and $(UE)$, we obtain (2.5) (cf. [4, Th. 3.20 and Rem. 3.22], [59], as well as Lemma 5.4).

The sub-Gaussian estimates for different $\alpha$ and $\beta$ are related as follows. Consider the right-hand side of $(UE)$ and $(LE)$ as a function of $\alpha$ and $\beta$. It is easy to see that it decreases as $\beta$ and $\alpha/\beta$ simultaneously increase (assuming $d(x, y) \le n$). In particular, $(UE)$ gets stronger (and $(LE)$ gets weaker) as $\alpha$ increases with $\beta$ held constant, whereas in general there is no monotonicity in $\beta$.

Estimates $(UE)$ and $(LE)$ were proved by O. Jones [39] for the graphical Sierpiński gasket. The latter is a graph that is obtained from an equilateral triangle by a fractal-like construction (see Figure 1). The reason for a subdiffusive behaviour of the random walk on such graphs is that they contain plenty of “holes” of all sizes, which causes the random walk to spend more time on circumventing the obstacles rather than on moving away from the origin.

It is possible to show that $(V)$ and $(G)$ imply $\beta \ge 2$ (see Lemma 5.4). The assumption $\alpha > \beta$ is necessary to ensure the finiteness of the Green function. It is known that either $g(x, y)$ is finite for all $x, y$ or $g \equiv \infty$. In the first case the graph $(\Gamma, \mu)$ is called transient and in the second case recurrent (e.g., $\mathbb{Z}^D$ is transient if $D \ge 3$ and recurrent otherwise). Hence, Theorem 2.1 serves only transient graphs. The question of finding equivalent conditions for sub-Gaussian estimates $(UE)$ and $(LE)$ is equally interesting for recurrent graphs. Note that the graph in Figure 1 is recurrent.∗ Indeed, the volume function on this graph obviously admits the estimate $V(x, r) \le C r^2$, which implies the recurrence (see [18], [66]). Alternatively, one can see directly that $\alpha < \beta$ because the parameters $\alpha$ and $\beta$ for the Sierpiński gasket are $\alpha = \log 3/\log 2$ and $\beta = \log 5/\log 2$ (see [4]). Some hints on the recurrent case are given in Section 4.

Figure 1. A fragment of the graphical Sierpiński gasket

∗ Plenty of examples of transient graphs and fractals with sub-Gaussian heat kernel bounds can be found in [4], [7], and [8].

3. Preliminaries
If $P$ is the Markov operator of a weighted graph $(\Gamma, \mu)$ and if $I$ is the identity operator, then $\Delta := P - I$ is called the Laplace operator of $(\Gamma, \mu)$. For any set $A \subset \Gamma$, denote by $\overline{A}$ the set containing all vertices of $A$ and all their neighbors. If a function $f$ is defined on $\overline{A}$, then $\Delta f$ is defined on $A$ and
$$\Delta f(x) = \sum_{y \sim x} P(x, y) f(y) - f(x) = \frac{1}{\mu(x)} \sum_{y \in \Gamma} (\nabla_{xy} f)\,\mu_{xy}, \tag{3.1}$$
where ∇x y f := f (y) − f (x). Note that although the summation in the second sum in (3.1) runs over all vertices y, the summand is nonvanishing only if y ∼ x. The following is a discrete analogue of the Green formula: for any finite set A and for all functions f and g defined on A, X X 1 X 1 f (x)g(x)µ(x) = ∇x y f g(x)µx y − ∇x y f ∇x y g µx y . 2 x∈A
x∈A,y ∈A /
x,y∈A
(3.2) We say that a function v is harmonic in set A if v is defined in A and 1v = 0 in A. Similarly, we say that a function v is superharmonic if 1v ≤ 0. Observe that the inequality 1v ≤ 0 is equivalent to X v(x) ≥ P(x, y)v(y). y∼x
The latter implies, in particular, that the infimum of a family of superharmonic functions is again superharmonic. For any nonempty set $A \subset \Gamma$, let $c_0(A)$ be the set of functions on Γ whose support is finite and is in $A$. Denote by $\Delta_A$ the Laplace operator with the vanishing Dirichlet boundary condition on $A$; that is,

$$\Delta_A f(x) := \begin{cases} \Delta f(x), & x \in A, \\ 0, & x \notin A. \end{cases}$$

The operator $\Delta_A$ is symmetric with respect to the measure µ and is nonpositive definite. Moreover, it is essentially self-adjoint in $L^2(A, \mu)$. For a finite set $A$, denote by $|A|$ its cardinality. If $A$ is finite and nonempty, then the operator $-\Delta_A$ has $|A|$ nonnegative eigenvalues that we enumerate in increasing order and denote as follows: $\lambda_1(A) \le \lambda_2(A) \le \cdots \le \lambda_{|A|}(A)$. It is known that all eigenvalues $\lambda_i(A)$ lie in the interval $[0, 2]$ and that $\lambda_1(A) \in [0, 1]$ (see, e.g., [19], [22, Sec. 3.3]). The smallest eigenvalue $\lambda_1(A)$ admits the variational definition

$$\lambda_1(A) = \inf_{f \in c_0(A)} \frac{-(\Delta f, f)}{(f, f)} = \inf_{f \in c_0(A)} \frac{\frac{1}{2} \sum_{x \sim y} (\nabla_{xy} f)^2 \mu_{xy}}{\sum_{x} f^2(x) \mu(x)}, \tag{3.3}$$

where

$$(f, g) := \sum_{x \in \Gamma} f(x)\, g(x)\, \mu(x).$$
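As an illustration of these definitions (not part of the original text), the following sketch builds the killed Markov operator and the Dirichlet Laplace operator for the simple random walk on $\mathbb{Z}$ with unit edge weights, restricted to an interval $A$ of $N$ vertices, and estimates $\lambda_1(A)$ through the Rayleigh quotient in (3.3). The choice of graph, the function names, and the iteration count are assumptions made for the example. For this particular graph the closed form $1 - \cos(\pi/(N+1))$ is classical and serves as a check.

```python
import math

def killed_markov_operator(n_vertices):
    """P^A for simple random walk on Z killed outside A = {0, ..., n_vertices-1}.

    Assumed setting: unit edge weights, so mu(x) = 2 and P(x, y) = 1/2 for neighbors.
    """
    P = [[0.0] * n_vertices for _ in range(n_vertices)]
    for x in range(n_vertices):
        for y in (x - 1, x + 1):
            if 0 <= y < n_vertices:
                P[x][y] = 0.5
    return P

def matvec(M, f):
    return [sum(M[x][y] * f[y] for y in range(len(f))) for x in range(len(f))]

def dirichlet_laplacian(P, f):
    """Delta_A f = P^A f - f for f in c_0(A)."""
    Pf = matvec(P, f)
    return [Pf[x] - f[x] for x in range(len(f))]

def lambda_1(P, iterations=2000):
    """Smallest eigenvalue of -Delta_A, via power iteration on (I + P^A)/2.

    The lazy operator (I + P^A)/2 has nonnegative spectrum, so the iteration
    converges to the eigenfunction of the largest eigenvalue of P^A.
    """
    n = len(P)
    f = [1.0] * n
    for _ in range(iterations):
        Pf = matvec(P, f)
        f = [(f[x] + Pf[x]) / 2 for x in range(n)]
        norm = math.sqrt(sum(v * v for v in f))
        f = [v / norm for v in f]
    # Rayleigh quotient -(Delta_A f, f)/(f, f); the constant weight mu(x) = 2 cancels.
    Lf = dirichlet_laplacian(P, f)
    return -sum(Lf[x] * f[x] for x in range(n)) / sum(v * v for v in f)

N = 5
lam = lambda_1(killed_markov_operator(N))
exact = 1 - math.cos(math.pi / (N + 1))
```

Here `lam` and `exact` should agree to high precision, and both lie in $[0, 1]$ as stated above.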
If $A = B(x, R)$, then we write, for simplicity, $\lambda(x, R) := \lambda_1(B(x, R))$. Given a nonempty set $A \subset \Gamma$, let $X_n^A$ be the random walk on $(\Gamma, \mu)$ with the killing condition outside $A$. Its Markov operator $P^A(x, y)$ is defined by

$$P^A(x, y) := \begin{cases} P(x, y), & x, y \in A, \\ 0, & \text{otherwise}. \end{cases}$$

The transition function $P_n^A(x, y)$ of $X_n^A$ is defined inductively: $P_0^A(x, y) = \delta_{xy}$ and

$$P_{n+1}^A(x, y) = \sum_{z \in \Gamma} P_n^A(x, z)\, P^A(z, y) = \sum_{z \in \Gamma} P^A(x, z)\, P_n^A(z, y). \tag{3.4}$$

As easily follows from (3.4), the function $u_n(x) = P_n^A(x, y)$ satisfies in $A \times \mathbb{N}$ the discrete heat equation

$$u_{n+1} - u_n = \Delta_A u_n. \tag{3.5}$$

The heat kernel $p_n^A(x, y)$ of $X_n^A$ is defined by

$$p_n^A(x, y) := \frac{P_n^A(x, y)}{\mu(y)}.$$

As follows from (2.1), $p^A$ is symmetric in $x$ and $y$. In particular, the kernel $p_n^A(x, y)$ satisfies heat equation (3.5) both in $(n, x)$ and $(n, y)$. If $f(x)$ is a function on $A$, then the function

$$u_n(x) := P_n^A f(x) = \sum_{y \in A} p_n^A(x, y)\, f(y)\, \mu(y)$$

solves, in $A \times \mathbb{N}$, heat equation (3.5) with initial data $u_0 = f$ and boundary data $u_n(x) = 0$ if $x \notin A$. The Green function of $X_n^A$ is defined by

$$G^A(x, y) := \sum_{n=0}^{\infty} P_n^A(x, y).$$

The alternative definition is that the function $G^A(x, y)$ is the infimum of all positive fundamental solutions of the Laplace equation in $A$. If the Green function is finite, then, for any $y \in A$, we have $\Delta_A G^A(\cdot, y) = -\delta_y$. The opposite case, when $G^A(x, y) \equiv +\infty$, is equivalent to the recurrence of the process $X_n^A$. The Green kernel $g^A(x, y)$ is defined by

$$g^A(x, y) = \frac{G^A(x, y)}{\mu(y)} = \sum_{n=0}^{\infty} p_n^A(x, y).$$
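The series defining the Green function can be summed numerically on a small example. The sketch below is an illustration only: it assumes the simple random walk on $\mathbb{Z}$ killed outside $A = \{0, 1, 2\}$, uses hypothetical helper names, and truncates the series after a fixed number of terms. It then checks that $(I - P^A) G^A$ is the identity, which is the relation $\Delta_A G^A(\cdot, y) = -\delta_y$ in matrix form.

```python
def killed_walk_matrix(n_vertices):
    """P^A for simple random walk on Z killed outside A = {0, ..., n_vertices-1}."""
    return [[0.5 if abs(x - y) == 1 else 0.0 for y in range(n_vertices)]
            for x in range(n_vertices)]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def green_function(P, terms=400):
    """Truncated series G^A = sum_{n>=0} P_n^A, with P_{n+1}^A = P_n^A P^A as in (3.4)."""
    n = len(P)
    Pn = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # P_0^A = delta
    G = [[0.0] * n for _ in range(n)]
    for _ in range(terms):
        G = [[G[i][j] + Pn[i][j] for j in range(n)] for i in range(n)]
        Pn = matmul(Pn, P)
    return G

P = killed_walk_matrix(3)
G = green_function(P)
# (I - P^A) G^A should be the identity matrix.
PG = matmul(P, G)
residual = max(abs(G[i][j] - PG[i][j] - (1.0 if i == j else 0.0))
               for i in range(3) for j in range(3))
```

Because the vertex measure is constant in this example, $G^A$ itself is symmetric; in general only the Green kernel $g^A$ is.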
Clearly, the Green kernel is symmetric in $x, y$. Therefore, if $g^A$ is finite, then $g^A$ is superharmonic in $A$ with respect to both $x$ and $y$, and it is harmonic away from the diagonal $x = y$. Observe that if $\mu(x) \simeq 1$ (which in particular follows from (V)), then $G^A(x, y) \simeq g^A(x, y)$ and $p_n^A(x, y) \simeq P_n^A(x, y)$. It is easy to see that kernels $p_n^A(x, y)$ and $g^A(x, y)$ increase when $A$ is enlarged and tend to global kernels $p_n(x, y)$ and $g(x, y)$ (defined in Section 2) as an increasing sequence of sets $A$ exhausts Γ. If $A$ is finite and nonempty, then it makes sense to consider the Dirichlet problem in $A$,

$$\begin{cases} \Delta u = f & \text{in } A, \\ u = h & \text{in } \overline{A} \setminus A, \end{cases} \tag{3.6}$$

where $f$ and $h$ are given functions on $A$ and $\overline{A} \setminus A$, respectively. As follows easily from the maximum principle, the solution $u$ exists and is unique. For a finite set $A$, $c_0(A)$ is identified with all functions on $A$ extended by zero outside $A$. Then the equation $\Delta_A u = f$, where $u$ and $f$ are in $c_0(A)$, is equivalent to Dirichlet problem (3.6) with $h = 0$. Its solution is given by means of the Green operator $G^A$ as follows:

$$u(x) = -G^A f(x) = -\sum_{y} G^A(x, y)\, f(y). \tag{3.7}$$

In other words, we have $G^A = (-\Delta_A)^{-1}$. For any set $A \subset \Gamma$ and a point $x \in \Gamma$, define the mean exit time $E_A(x)$ by

$$E_A(x) := \sum_{y \in A} G^A(x, y). \tag{3.8}$$

As follows from the above discussion, the function $E_A(x)$ solves the following boundary value problem in $A$:

$$\begin{cases} \Delta u = -1 & \text{in } A, \\ u = 0 & \text{outside } A. \end{cases} \tag{3.9}$$

Denote by $T_A$ the first exit time from set $A$ for the process $X_n$; that is, $T_A := \min\{k : X_k \notin A\}$. We claim that $E_A(x) = \mathbb{E}_x(T_A)$, which justifies the term "mean exit time" for $E_A$. Indeed, $T_A$ coincides with the cardinality of all $n = 0, 1, 2, \ldots$ for which $X_n^A$ is in $A$; that is,

$$T_A = \sum_{n=0}^{\infty} \mathbf{1}_{\{X_n^A \in A\}},$$
whence

$$\mathbb{E}_x(T_A) = \sum_{n=0}^{\infty} \mathbb{P}_x\{X_n^A \in A\} = \sum_{n=0}^{\infty} \sum_{y \in A} P_n^A(x, y) = \sum_{y \in A} G^A(x, y) = E_A(x).$$
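The identity between the mean exit time and the solution of the boundary value problem with right-hand side $-1$ can be illustrated numerically. The sketch below is an assumption-laden example rather than anything from the original text: it takes the simple random walk on $\mathbb{Z}$, uses the open-ball convention $B(0, R) = \{y : |y| < R\}$, and iterates $u \leftarrow 1 + Pu$, whose partial sums reproduce the series for $\sum_y G^A(x, y)$. For this walk the exact solution is $u(x) = R^2 - x^2$.

```python
def mean_exit_time(R, sweeps=5000):
    """E_A(x) for simple random walk on Z and A = {-(R-1), ..., R-1}.

    Solves Delta u = -1 in A with u = 0 outside A by iterating u <- 1 + P u.
    After k sweeps, u equals the partial sum of P_n^A applied to 1 for n < k.
    """
    sites = range(-(R - 1), R)
    u = {x: 0.0 for x in sites}
    for _ in range(sweeps):
        # Values outside A are 0, which u.get(..., 0.0) supplies.
        u = {x: 1.0 + 0.5 * (u.get(x - 1, 0.0) + u.get(x + 1, 0.0)) for x in sites}
    return u

R = 5
u = mean_exit_time(R)
```

In this example `u[0]` is close to $R^2 = 25$, so the mean exit time from a ball of radius $R$ grows like $R^2$ on $\mathbb{Z}$.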
If $A = B(x, R)$, then we use a shorter notation $E(x, R) := E_{B(x,R)}(x)$. Another function associated with the exit time is the exit probability, defined by

$$\Psi_n^A(x) := \mathbb{P}_x\{X_k \notin A \text{ for some } k \le n\} = \mathbb{P}_x\{T_A \le n\}. \tag{3.10}$$

In other words, $\Psi_n^A(x)$ is the probability that the random walk $X_k$ started at $x$ will at least once exit $A$ by time $n$. Alternatively, $\Psi_n^A(x)$ can be defined as the solution $u_n(x)$ to the following initial boundary value problem in $\overline{A} \times \mathbb{N}$:

$$\begin{cases} u_{n+1} - u_n = \Delta u_n, \\ u_0(x) = 0, & x \in A, \\ u_n(x) = 1, & x \notin A \text{ and } n \ge 0. \end{cases} \tag{3.11}$$
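The recursive description of the exit probability is easy to run on a small example. The following sketch is illustrative only; it assumes the simple random walk on $\mathbb{Z}$ started at the origin and the set $A = \{y : |y| < R\}$, and the function name is hypothetical. It applies $u_{n+1} = P u_n$ inside $A$ while holding $u_n = 1$ outside $A$.

```python
def exit_probability(R, n):
    """Psi_n^A(0) for simple random walk on Z and A = {-(R-1), ..., R-1}.

    Recursion: u_0 = 0 in A, u_k = 1 outside A for all k, and
    u_{k+1}(x) = (u_k(x-1) + u_k(x+1)) / 2 for x in A.
    """
    sites = range(-(R - 1), R)
    u = {x: 0.0 for x in sites}
    for _ in range(n):
        # Values outside A are 1, which u.get(..., 1.0) supplies.
        u = {x: 0.5 * (u.get(x - 1, 1.0) + u.get(x + 1, 1.0)) for x in sites}
    return u[0]

psi_2 = exit_probability(3, 2)   # the walk cannot reach distance 3 in 2 steps
psi_3 = exit_probability(3, 3)   # three steps in the same direction: 2 * (1/2)**3
```

The two sample values reflect that the walk needs at least $R$ steps to leave the ball, and that the exit probability is nondecreasing in $n$.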
If $A = B(x, R)$, then we use the shorter notation $\Psi_n(x, R) := \Psi_n^{B(x,R)}(x)$. To conclude this section, we prove two useful consequences of condition $(p_0)$:

$$P(x, y) \ge p_0, \quad \forall x \sim y. \tag{$p_0$}$$

PROPOSITION 3.1
If $(p_0)$ holds, then, for all $x \in \Gamma$ and $R > 0$ and for some $C = C(p_0)$,

$$V(x, R) \le C^R \mu(x). \tag{3.12}$$

Remark 3.1
Inequality (3.12) implies that, for a bounded range of $R$, $V(x, R) \simeq \mu(x)$.

Proof
Let $x \sim y$. Since $P(x, y) = \mu_{xy}/\mu(x)$ and $\mu_{xy} \le \mu(y)$, hypothesis $(p_0)$ implies $p_0 \mu(x) \le \mu(y)$. Similarly, $p_0 \mu(y) \le \mu(x)$. Iterating these inequalities, we obtain, for arbitrary $x$ and $y$,

$$p_0^{d(x,y)} \mu(y) \le \mu(x). \tag{3.13}$$

Another consequence of $(p_0)$ is that any point $x$ has at most $p_0^{-1}$ neighbors. Therefore, any ball $B(x, R)$ has at most $C^R$ vertices inside. By (3.13), any point $y \in B(x, R)$ has measure at most $p_0^{-R} \mu(x)$, whence (3.12) follows.
PROPOSITION 3.2
Assume that hypothesis $(p_0)$ holds on $(\Gamma, \mu)$. Let function $v$ be nonnegative in $\overline{A}$ and superharmonic in $A$. Then, for all points $x, y \in A$ such that $x \sim y$, we have $v(x) \simeq v(y)$.

Proof
Indeed, the superharmonicity of $v$ implies

$$v(x) \ge \sum_{z \sim x} P(x, z)\, v(z) \ge P(x, y)\, v(y),$$

whence $v(x) \ge p_0 v(y)$ by $(p_0)$. In the same way, $v(y) \ge p_0 v(x)$, whence the claim follows.

4. Outline of the proof and its consequences

The proof of Theorem 2.1 consists of many steps. Here we describe the logical order of these steps. The rest of the paper is arranged so that each section treats a certain topic corresponding to one or more steps in the proof of Theorem 2.1. Apart from conditions (V), (G), (UE), and (LE) described in Section 2, we introduce here some more lettered conditions that are widely used in the proof. We say that the Faber-Krahn inequality holds on $(\Gamma, \mu)$ if, for some positive exponent ν,

$$\lambda_1(A) \ge c\, \mu(A)^{-1/\nu} \tag{FK}$$

for all nonempty finite sets $A \subset \Gamma$. In particular, (FK) holds in $\mathbb{Z}^D$ with $\nu = D/2$. If Γ is infinite and connected and if µ is the standard weight on Γ, then (FK) automatically holds with $\nu = 1/2$ (see [9, Prop. 2.5]). We are interested in (FK) with $\nu = \alpha/\beta$, where α and β are the parameters from (UE) and (LE), in which case we have $\nu > 1$. An easy consequence of (UE) is the diagonal upper estimate

$$p_n(x, x) \le C n^{-\alpha/\beta} \tag{DUE}$$

for all $x \in \Gamma$ and $n \ge 1$. Consider the following estimates for the mean exit time and the exit probability:

$$E(x, R) \simeq R^{\beta} \tag{E}$$

for all $x \in \Gamma$, $R \ge 1$, and

$$\Psi_n(x, R) \le C \exp\left( -\left( \frac{R^{\beta}}{Cn} \right)^{1/(\beta-1)} \right) \tag{$\Psi$}$$

for all $x \in \Gamma$, $R > 0$, and $n \ge 1$. For example, (E) and (Ψ) hold in $\mathbb{Z}^D$ with $\beta = 2$.
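Part (V) + (G) ⟹ (UE) of Theorem 2.1 is proved by the following chain of implications:

$$\begin{array}{ccc}
 & (V) + (G) & \\
\Downarrow \text{Prop. 5.5} & & \Downarrow \text{Prop. 6.3} \\
(FK) & & (E) \\
\Downarrow \text{Prop. 5.1} & & \Downarrow \text{Prop. 7.1} \\
(DUE) & & (\Psi) \\
 & \Downarrow \text{Prop. 8.1} & \\
 & (UE) &
\end{array}$$

The relations among exponents α, β, γ, and ν involved in all conditions are as follows:

$$\alpha - \beta = \gamma \quad \text{and} \quad \alpha/\beta = \nu.$$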
Given (DUE) and (Ψ), one easily obtains full upper bound (UE) using the approach of Barlow and Bass [6] (see Section 8). The method of obtaining Faber-Krahn inequality (FK) from (V) and (G) is based on ideas of G. Carron [14]. The implication (FK) ⟹ (DUE) is a discrete modification of the approach of A. Grigor'yan [32]. The implication (V) + (G) ⟹ (E) was originally proved by Telcs [59], and here we give a simpler proof for that. The crucial part of the proof of upper estimate (UE) is the implication (E) ⟹ (Ψ). The following nearly Gaussian estimate is true always, without assuming (E) or anything else:

$$\Psi_n(x, R) \le C\, \frac{V(x, R)}{\mu(x)} \exp\left( -\frac{R^2}{Cn} \right) \tag{4.1}$$

(see [58] and [33, p. 355]). However, (4.1) is not good enough for us even if we neglect the factor $V(x, R)$ in front of the exponential. Indeed, the range of $n$, for which we apply (Ψ), is $n > R$ (see the proof of Proposition 8.1). Assuming $\beta > 2$, we have in this range

$$\left( \frac{R^{\beta}}{n} \right)^{1/(\beta-1)} > \frac{R^2}{n},$$

so that (Ψ) is stronger than (4.1). We provide here an entirely new argument for (E) ⟹ (Ψ), which is based on an investigation of solutions of the equation $\Delta v = \lambda v$. Function $v$ can be estimated by comparing it to $\Delta u = -1$ (and the latter is related to the mean exit time). On the other hand, function $(1 + \lambda)^n v(x)$ satisfies the discrete heat equation and hence can be compared to $\Psi_n^A(x)$ by using the parabolic comparison principle (see Section 7 for details). Another proof of (E) ⟹ (Ψ) can be obtained by using the probabilistic method of Barlow and Bass [5], [6], [7].
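The comparison between the sub-Gaussian exponent and the Gaussian exponent $R^2/n$ made above can be verified in one line. The following computation is added for convenience and uses only the assumptions $n > R > 0$ and $\beta > 2$ stated in the text; each step raises both sides to a positive power, so the inequalities are equivalent.

```latex
\[
\left(\frac{R^{\beta}}{n}\right)^{\frac{1}{\beta-1}} > \frac{R^{2}}{n}
\iff \frac{R^{\beta}}{n} > \frac{R^{2\beta-2}}{n^{\beta-1}}
\iff n^{\beta-2} > R^{\beta-2}
\iff n > R .
\]
```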
Before we consider the proof of lower bound (LE), let us introduce the following conditions. The near-diagonal lower estimate is

$$p_n(x, y) + p_{n+1}(x, y) \ge c\, n^{-\alpha/\beta} \quad \text{if } d(x, y) \le \delta n^{1/\beta} \tag{NLE}$$

for some positive constant δ. Obviously, (NLE) is equivalent to (LE) in the range $d(x, y) \le \delta n^{1/\beta}$. As an intermediate step, we use the following diagonal lower estimate for the killed random walk:

$$p_{2n}^{B(x,R)}(x, x) \ge c\, n^{-\alpha/\beta} \quad \text{if } n \le \varepsilon R^{\beta} \tag{DLE}$$

for some positive constant ε. We say that the Harnack inequality holds on $(\Gamma, \mu)$ if, for any ball $B(x, 2R) \subset \Gamma$ and for any nonnegative function $u$ in $\overline{B(x, 2R)}$ which is harmonic in $B(x, 2R)$,

$$\max_{B(x,R)} u \le H \min_{B(x,R)} u \tag{H}$$

for some constant $H \ge 1$. The Harnack inequality reflects a certain homogeneity of the graph. For example, it holds for $\mathbb{Z}^D$ with the standard weight but fails on the connected sum of two copies of $\mathbb{Z}^D$ as well as on a binary tree. The scheme of the proof of (V) + (G) ⟹ (LE) is shown on the diagram below. From the previous diagram, we already know that conditions (FK) and (E) follow from (V) + (G), as well as the implications (FK) ⟹ (DUE) and (E) ⟹ (Ψ):

$$\begin{array}{ccc}
\multicolumn{2}{c}{(V) + (G)} & (G) \\
\multicolumn{2}{c}{\Downarrow \text{Props. 5.5, 6.3}} & \Downarrow \text{Prop. 10.1} \\
(FK) & (E) & (H) \\
\Downarrow \text{Prop. 5.1} & \Downarrow \text{Prop. 7.1} & \Downarrow \text{Prop. 11.2} \\
(DUE) & (\Psi) + (V) & [\mathrm{osc}] \\
\Downarrow \text{Prop. 12.3} & \Downarrow \text{Prop. 9.1} & \\
[\mathrm{deriv}] & (DLE) + (E) & \\
\multicolumn{3}{c}{\Downarrow \text{Prop. 13.1}} \\
\multicolumn{3}{c}{(NLE) + (V)} \\
\multicolumn{3}{c}{\Downarrow \text{Prop. 13.2}} \\
\multicolumn{3}{c}{(LE)}
\end{array}$$
The central point in the diagram is Proposition 13.1, where (N L E) is obtained from (DU E), (DL E), (E), and (H ). The proof goes through the intermediate steps that
are denoted here by [osc] and [deriv]. The former refers to oscillation inequality (11.3) obtained from (H) in Propositions 11.1 and 11.2, and the latter refers to upper estimate (12.5) for $|p_{n+2} - p_n|$ obtained from (DUE) in Proposition 12.3. The idea of obtaining (NLE) by means of an elliptic Harnack inequality seems to have appeared independently in papers by P. Auscher [2], [3], Barlow and Bass [6], [7], [8], and W. Hebisch and L. Saloff-Coste [38]. Basically, one views the heat equation for the heat kernel as an elliptic equation

$$\Delta(p_n + p_{n+1}) = f, \quad \text{where } f = p_{n+2} - p_n.$$
The elliptic Harnack inequality and the upper bound for $E(x, r)$ allow one to estimate the oscillation of $p_n + p_{n+1}$ via $f$. (In the continuous setting, the latter argument is classical and is due to J. Moser [49].) On the other hand, the on-diagonal upper bound for $p_n$ implies a suitable estimate for the discrete time derivative $p_{n+2} - p_n$. The fact that (DUE) implies a certain estimate of the time derivative of the heat kernel is well known. In the context of manifolds it goes back to S. Cheng, Li, and Yau [17] and E. Davies [26], [27] (see also [34]); in the discrete setting it follows from the results of E. Carlen, S. Kusuoka, and D. Stroock [13] and T. Coulhon and Saloff-Coste [23]; and in the setting of fractals it is proved by Barlow and Bass [7]. Having an upper bound for the oscillation of $p_n + p_{n+1}$ and the on-diagonal lower bound for $p_n + p_{n+1}$, one obtains (NLE). The final step in the proof, the implication (NLE) + (V) ⟹ (LE), is done by using the classical chaining argument of Moser [50] and D. Aronson [1].

The method of obtaining (DLE) from (Ψ) and (V) used in Proposition 9.1 is well known. Its various modifications can be found in [6], [11], [21], [24], [48], [56], and possibly in other places.

The claim that Green kernel estimate (G) implies elliptic Harnack inequality (H) would not surprise experts. In the context of the uniformly elliptic operators in $\mathbb{R}^D$, this was first observed by E. Landis [46, pp. 145–146] and then was elaborated by N. Krylov and M. Safonov [43] and E. Fabes and Stroock [29]. However, this claim becomes rather nontrivial for arbitrary graphs (and manifolds) because of topological difficulties. We provide here a new, simple, and general proof of the implication (G) ⟹ (H), which is based on the potential theoretic approach of A. Boukricha [12].

Finally, the converse implication (UE) + (LE) ⟹ (V) + (G) is quite straightforward and is proved in Proposition 15.1.
As a consequence of the above diagrams, we see that the following equivalence takes place: (F K ) + (V ) + (E) + (H ) ⇐⇒ (U E) + (L E).
It is possible to show that this equivalence is also true for recurrent graphs. Furthermore, Faber-Krahn inequality (FK) turns out to follow from (V) + (E) + (H), so that

$$(V) + (E) + (H) \iff (UE) + (LE). \tag{4.2}$$

Condition (H) ensures here a necessary homogeneity of the graph, whereas (V) and (E) provide the exponents α and β, respectively. Another consequence of the proof is that

$$(V) + (UE) + (H) \iff (UE) + (LE) \tag{4.3}$$

(see Remark 15.1). There are a number of conditions given in terms of capacities, eigenvalues, and so on, which can replace (E) or (UE) in (4.2) and (4.3), respectively. In the presence of (V) and (H), the purpose of the other condition is to recover the exponent β in (UE) and (LE). Note that if $\beta = 2$, then (UE) in (4.3) can be replaced by (DUE) (cf. [38]). The complete proofs of (4.2), (4.3), and other related statements will be given elsewhere.

5. The Faber-Krahn inequality and on-diagonal upper bounds

Recall that a Faber-Krahn inequality holds on $(\Gamma, \mu)$ if there are constants $c > 0$ and $\nu > 0$ such that, for all nonempty finite sets $A \subset \Gamma$,

$$\lambda_1(A) \ge c\, \mu(A)^{-1/\nu}. \tag{FK}$$
We discuss here relationships between eigenvalue estimates like (FK) and estimates of the Green kernel, heat kernel, and volume growth. The outcome is the following implications:

$$(V) + (G) \Longrightarrow (FK) \Longrightarrow (DUE),$$

which are contained in Propositions 5.5 and 5.1, respectively, and which constitute a part of the proof of Theorem 2.1.

PROPOSITION 5.1
Let $(\Gamma, \mu)$ satisfy $(p_0)$, and let ν be a positive number. Then the following conditions are equivalent:
(a) Faber-Krahn inequality (FK);
(b) the on-diagonal heat kernel upper bound, for all $x \in \Gamma$ and $n \ge 1$,

$$p_n(x, x) \le C n^{-\nu}; \tag{DUE}$$

(c) the estimate of the level sets of the Green kernel, for all $x \in \Gamma$ and $t > 0$,

$$\mu\{y : g(x, y) > t\} \le C t^{-\nu/(\nu-1)}, \tag{5.1}$$

provided $\nu > 1$.

The analogue of Proposition 5.1 for manifolds was proved by Carron [14]. The equivalence (a) ⟺ (b) was also proved in [32] for heat kernels on manifolds, and in [20, Prop. V.1] for random walks satisfying in addition the condition $\inf_x P(x, x) > 0$. We provide detailed proof only for the implications (a) ⟹ (b) and (c) ⟹ (a) which we use in this paper. The implication (b) ⟹ (c) can be proved in the following way. By a theorem of N. Varopoulos [63], (DUE) implies a Sobolev inequality. Then one applies an argument of [14, Prop. 1.14] (adapted to the discrete setting) to show that (5.1) follows from the Sobolev inequality. Note that our proof of (a) ⟹ (b) goes through for any $\nu > 0$. If $\nu > 1$, then one could apply the approach of [14] using a Sobolev inequality as an intermediate step between (a) and (b). In general, we use instead a Nash-type inequality that is obtained in the following lemma.

LEMMA 5.2
Let $(\Gamma, \mu)$ be a weighted graph (which is not necessarily connected). Assume that, for any nonempty finite set $A \subset \Gamma$,

$$\lambda_1(A) \ge \Lambda(\mu(A)), \tag{5.2}$$

where $\Lambda(\cdot)$ is a nonnegative nonincreasing function on $(0, \infty)$. Let $f(x)$ be a nonnegative function on Γ with finite support. Denote

$$\sum_{x \in \Gamma} f(x)\, \mu(x) = a \quad \text{and} \quad \sum_{x \in \Gamma} f^2(x)\, \mu(x) = b.$$

Then, for any $s > 0$,

$$\frac{1}{2} \sum_{x \sim y} (\nabla_{xy} f)^2 \mu_{xy} \ge (b - 2sa)\, \Lambda(a/s). \tag{5.3}$$
Proof
If $b - 2sa < 0$, then (5.3) trivially holds. So, we can assume in the sequel that

$$s \le \frac{b}{2a}. \tag{5.4}$$

Since $b \le a \max f$, (5.4) implies $s < \max f$ and, therefore, the set $A_s = \{x \in \Gamma : f(x) > s\}$ is nonempty (see Figure 2).

[Figure 2. Sets $A$ and $A_s = \{f > s\}$]

Consider function $h = (f - s)_+$. This function belongs to $c_0(A_s)$ whence we obtain, by variational property (3.3) of eigenvalues,

$$\frac{1}{2} \sum_{x \sim y} (\nabla_{xy} h)^2 \mu_{xy} \ge \lambda_1(A_s) \sum_{x \in \Gamma} h^2(x)\, \mu(x). \tag{5.5}$$

Let us estimate all terms in (5.5) via $f$. We start with the obvious inequality

$$f^2 \le (f - s)_+^2 + 2sf = h^2 + 2sf,$$

which holds for any $s \ge 0$. It implies $h^2 \ge f^2 - 2sf$ whence

$$\sum_{x \in \Gamma} h^2(x)\, \mu(x) \ge b - 2sa. \tag{5.6}$$

The definition of $A_s$ implies $\mu(A_s) \le a/s$ whence, by (5.2),

$$\lambda_1(A_s) \ge \Lambda(\mu(A_s)) \ge \Lambda(a/s). \tag{5.7}$$

Clearly, we also have

$$\sum_{x \sim y} (\nabla_{xy} h)^2 \mu_{xy} \le \sum_{x \sim y} (\nabla_{xy} f)^2 \mu_{xy}.$$

Combining this with (5.7), (5.6), and (5.5), we obtain (5.3).

We apply Lemma 5.2 for function $\Lambda(v) = c v^{-1/\nu}$. Choosing $s = b/(4a)$ in (5.3), we obtain

$$\frac{1}{2} \sum_{x \sim y} (\nabla_{xy} f)^2 \mu_{xy} \ge c\, a^{-2/\nu}\, b^{1+1/\nu}. \tag{5.8}$$

This is a discrete version of the Nash inequality (cf. [51], [13]).
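For convenience, the substitution that turns (5.3) into the Nash inequality (5.8) can be written out explicitly. The computation below is a worked step added here, with $\Lambda(v) = c v^{-1/\nu}$ and $s = b/(4a)$ as in the text; the symbol $c'$ for the resulting constant is introduced only for this illustration.

```latex
\[
s = \frac{b}{4a}: \qquad b - 2sa = \frac{b}{2}, \qquad
\Lambda\!\left(\frac{a}{s}\right) = c\left(\frac{4a^{2}}{b}\right)^{-1/\nu}
= c\,4^{-1/\nu}\, a^{-2/\nu}\, b^{1/\nu},
\]
\[
\frac{1}{2}\sum_{x\sim y}(\nabla_{xy} f)^{2}\mu_{xy}
\;\ge\; \frac{b}{2}\cdot c\,4^{-1/\nu}\, a^{-2/\nu}\, b^{1/\nu}
= c'\, a^{-2/\nu}\, b^{1+1/\nu},
\qquad c' = \tfrac{1}{2}\,4^{-1/\nu}\, c .
\]
```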
Proof of (a) ⟹ (b) in Proposition 5.1
Step 1. Let $f$ be a nonnegative function on Γ with finite support. Denote, for simplicity,

$$b = \sum_{x \in \Gamma} f^2(x)\, \mu(x) \quad \text{and} \quad b' = \sum_{x \in \Gamma} [Pf(x)]^2 \mu(x),$$

where $P$ is the Markov operator of $(\Gamma, \mu)$. Then we have

$$b - b' = (f, f)_{L^2(\Gamma,\mu)} - (Pf, Pf)_{L^2(\Gamma,\mu)} = (f, (I - P^2) f)_{L^2(\Gamma,\mu)}.$$

Clearly, $Q := P^2$ is also a Markov operator on Γ reversible with respect to µ, and it is associated with another structure of a weighted graph on the set Γ. Denote this weighted graph by $(\Gamma^*, \mu^*)$. As a set, $\Gamma^*$ coincides with Γ, and the measures µ and $\mu^*$ on vertices are the same. On the other hand, points $x, y$ are connected by an edge on $\Gamma^*$ if there is a path of length 2 from $x$ to $y$ in Γ, and the weight $\mu^*_{xy}$ on edges of $\Gamma^*$ is defined by $\mu^*_{xy} = Q(x, y)\mu(x)$. Denote by $\Delta^*$ the Laplace operator of $(\Gamma^*, \mu^*)$. Then $\Delta^* = P^2 - I$ and, by Green formula (3.2),

$$b - b' = -\sum_{x \in \Gamma} f(x)\, \Delta^* f(x)\, \mu(x) = \frac{1}{2} \sum_{x, y \in \Gamma} (\nabla_{xy} f)^2 \mu^*_{xy}. \tag{5.9}$$

Step 2. If $A$ is a nonempty finite subset of Γ, then [22, Lem. 4.3] says that∗

$$\lambda_1^*(A) \ge \lambda_1(\overline{A}), \tag{5.10}$$

where $\lambda_1^*(A)$ is the first eigenvalue of $-\Delta^*_A$. By Faber-Krahn inequality (FK) for the graph $(\Gamma, \mu)$, we obtain

$$\lambda_1^*(A) \ge c\, \mu(\overline{A})^{-1/\nu}. \tag{5.11}$$

Since $(p_0)$ and Proposition 3.1 imply

$$\mu(\overline{A}) \le \sum_{x \in A} V(x, 2) \le C \sum_{x \in A} \mu(x) = C\mu(A) = C\mu^*(A),$$

(5.11) yields (FK) for the graph $(\Gamma^*, \mu^*)$.

∗ The proof of (5.10) is based on variational property (3.3) and on the fact that all eigenvalues of $-\Delta_{\overline{A}}$ belong to the interval $[\lambda_1(\overline{A}), 2 - \lambda_1(\overline{A})]$.

Remark 5.1
The only place where $(p_0)$ is used in the proof of (a) ⟹ (b) is to ensure that $\mu(\overline{A}) \le C\mu(A)$. If this inequality holds for another reason, then the rest of the proof goes in the same way.
Step 3. For some fixed $y \in \Gamma$, denote $f_n(x) = p_n(x, y)$ and

$$b_n = \sum_{x \in \Gamma} f_n^2(x)\, \mu(x) = p_{2n}(y, y).$$

Then $f_{n+1} = P f_n$ and we obtain, by (5.9),

$$b_n - b_{n+1} = \frac{1}{2} \sum_{x, z \in \Gamma} (\nabla_{xz} f_n)^2 \mu^*_{xz}.$$

The graph $(\Gamma^*, \mu^*)$ satisfies (FK) so that Lemma 5.2 can be applied. Since, by the symmetry of the heat kernel,

$$\sum_{x \in \Gamma} f_n(x)\, \mu(x) = \sum_{x \in \Gamma} P_n(y, x) = 1,$$

(5.8) yields

$$\frac{1}{2} \sum_{x, z \in \Gamma} (\nabla_{xz} f_n)^2 \mu^*_{xz} \ge c\, b_n^{1+1/\nu},$$

whence

$$b_n - b_{n+1} \ge c\, b_n^{1+1/\nu}. \tag{5.12}$$

In particular, we see that $b_n > b_{n+1}$. Next we apply an elementary inequality

$$\nu(x - y) \ge \frac{x^{\nu} - y^{\nu}}{x^{\nu-1} + y^{\nu-1}}, \tag{5.13}$$

which is true for all $x > y > 0$ and $\nu > 0$. Taking $x = b_{n+1}^{-1/\nu}$ and $y = b_n^{-1/\nu}$, we obtain, from (5.13) and (5.12),

$$\nu\left(b_{n+1}^{-1/\nu} - b_n^{-1/\nu}\right) \ge \frac{b_{n+1}^{-1} - b_n^{-1}}{b_{n+1}^{-(\nu-1)/\nu} + b_n^{-(\nu-1)/\nu}} = \frac{b_n - b_{n+1}}{b_n b_{n+1}^{1/\nu} + b_n^{1/\nu} b_{n+1}} \ge \frac{c\, b_n^{1+1/\nu}}{2 b_n^{1+1/\nu}} = \frac{c}{2},$$

whence

$$b_{n+1}^{-1/\nu} - b_n^{-1/\nu} \ge \frac{c}{2\nu} = \text{const}.$$

Summing up this inequality in $n$, we conclude that $b_n^{-1/\nu} \ge cn$ and $b_n \le C n^{-\nu}$. Since $b_n = p_{2n}(y, y)$, we have proved that, for all $y \in \Gamma$ and $n \ge 1$,

$$p_{2n}(y, y) \le C n^{-\nu}, \tag{5.14}$$

which is (DUE) for all even times.
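The passage from the difference inequality for $b_n$ to polynomial decay can be illustrated on the extremal sequence that satisfies it with equality. The sketch below is an illustration under assumed values ($b_0 = 1$, $c = 0.1$, $\nu = 1.5$, all chosen arbitrarily) and a hypothetical function name; it checks the summed form $b_n^{-1/\nu} \ge b_0^{-1/\nu} + nc/(2\nu)$ and the resulting bound $b_n \le (2\nu/(cn))^{\nu}$.

```python
def extremal_sequence(b0, c, nu, steps):
    """Iterate b_{n+1} = b_n - c * b_n**(1 + 1/nu), the equality case of the
    difference inequality b_n - b_{n+1} >= c * b_n**(1 + 1/nu)."""
    b = [b0]
    for _ in range(steps):
        b.append(b[-1] - c * b[-1] ** (1 + 1 / nu))
    return b

nu, c = 1.5, 0.1
b = extremal_sequence(1.0, c, nu, 1000)

# Summed increments: b_n**(-1/nu) >= b_0**(-1/nu) + n * c / (2 * nu).
increments_hold = all(
    b[n] ** (-1 / nu) >= b[0] ** (-1 / nu) + n * c / (2 * nu) - 1e-12
    for n in range(len(b))
)
# Consequence: b_n <= (2 * nu / (c * n))**nu for n >= 1.
decay_holds = all(b[n] <= (2 * nu / (c * n)) ** nu for n in range(1, len(b)))
```

The constants are small enough that the sequence stays positive and strictly decreasing, which is what the argument in Step 3 requires.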
Step 4. By the semigroup identity, we have, for any $0 < k < m$,

$$p_m(x, y) = \sum_{z \in \Gamma} p_{m-k}(x, z)\, p_k(z, y)\, \mu(z). \tag{5.15}$$

In particular, if $m = 2n$, $k = n$, and $y = x$, then

$$p_{2n}(x, x) = \sum_{z \in \Gamma} p_n^2(x, z)\, \mu(z). \tag{5.16}$$

On the other hand, (5.15), the Cauchy-Schwarz inequality, and (5.16) imply

$$p_{2n}(x, y) = \sum_{z \in \Gamma} p_n(x, z)\, p_n(z, y)\, \mu(z) \le \left[ \sum_{z \in \Gamma} p_n^2(x, z)\, \mu(z) \right]^{1/2} \left[ \sum_{z \in \Gamma} p_n^2(y, z)\, \mu(z) \right]^{1/2},$$

whence

$$p_{2n}(x, y) \le p_{2n}(x, x)^{1/2}\, p_{2n}(y, y)^{1/2}. \tag{5.17}$$

Together with (5.14), this yields $p_{2n}(x, y) \le C n^{-\nu}$ for all $x, y \in \Gamma$. This implies (DUE) also for odd times if we observe that, by (5.15) and (2.2),

$$p_{2n+1}(x, y) = \sum_{z \in \Gamma} p_{2n}(x, z)\, P(y, z) \le \max_{z \in \Gamma} p_{2n}(x, z). \tag{5.18}$$
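The symmetry of the heat kernel and inequality (5.17) can be checked numerically on a small example. The sketch below is illustrative only: the paper works with infinite graphs, whereas this example assumes a finite 4-cycle with arbitrarily chosen unequal edge weights, and the function name is hypothetical. It computes $p_n(x, y) = P_n(x, y)/\mu(y)$ by repeated multiplication with $P(x, y) = \mu_{xy}/\mu(x)$.

```python
def heat_kernel(weights, n):
    """p_n(x, y) = P_n(x, y) / mu(y) for the walk with P(x, y) = mu_xy / mu(x).

    `weights` is a symmetric matrix of edge weights mu_xy (0 where there is no edge).
    """
    size = len(weights)
    mu = [sum(row) for row in weights]
    P = [[weights[x][y] / mu[x] for y in range(size)] for x in range(size)]
    Pn = [[1.0 if x == y else 0.0 for y in range(size)] for x in range(size)]
    for _ in range(n):
        Pn = [[sum(Pn[x][z] * P[z][y] for z in range(size)) for y in range(size)]
              for x in range(size)]
    return [[Pn[x][y] / mu[y] for y in range(size)] for x in range(size)]

# A 4-cycle with unequal edge weights (an arbitrary illustrative choice).
W = [[0, 1, 0, 3],
     [1, 0, 2, 0],
     [0, 2, 0, 5],
     [3, 0, 5, 0]]
p = heat_kernel(W, 6)  # even time 2n = 6

symmetric = all(abs(p[x][y] - p[y][x]) < 1e-12 for x in range(4) for y in range(4))
cauchy_schwarz = all(p[x][y] <= (p[x][x] * p[y][y]) ** 0.5 + 1e-12
                     for x in range(4) for y in range(4))
```

Both flags should evaluate to true: the first reflects reversibility of the walk, the second is the off-diagonal bound by the geometric mean of the diagonal values.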
Proof of (c) ⟹ (a) in Proposition 5.1
Let $A$ be a nonempty finite subset of Γ, and let $f \in c_0(A)$ be the first eigenfunction of $-\Delta_A$. We may assume that $f \ge 0$. Let us normalize $f$ so that $\max f = 1$, and let $x_0 \in A$ be the maximum point of $f$. The equation $-\Delta_A f = \lambda_1(A) f$ implies, by (3.7),

$$f(x) = \lambda_1(A) \sum_{y \in A} G^A(x, y)\, f(y),$$

whence, for $x = x_0$,

$$1 = \lambda_1(A) \sum_{y \in A} G^A(x_0, y)\, f(y) \le \lambda_1(A) \sum_{y \in A} G^A(x_0, y)$$

and

$$\lambda_1(A) \ge \left( \max_{x \in A} \sum_{y \in A} G^A(x, y) \right)^{-1}. \tag{5.19}$$
On the other hand, for any $x \in A$,

$$\sum_{y \in A} G^A(x, y) = \sum_{y \in A} g^A(x, y)\, \mu(y) = \int_0^{\infty} \mu\{g^A(x, \cdot) > t\}\, dt.$$

Fix some $t_0 > 0$, and estimate the integral above using (5.1), $g^A \le g$, and the fact that $\mu\{g^A(x, \cdot) > t\} \le \mu(A)$. Then we obtain

$$\sum_{y \in A} G^A(x, y) \le \int_0^{t_0} \mu(A)\, dt + \int_{t_0}^{\infty} C t^{-\nu/(\nu-1)}\, dt = \mu(A)\, t_0 + C t_0^{-1/(\nu-1)}.$$

Let us choose $t_0 \simeq \mu(A)^{-(\nu-1)/\nu}$ to equate the two terms on the right-hand side, whence

$$\sum_{y \in A} G^A(x, y) \le C \mu(A)^{1/\nu}. \tag{5.20}$$
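The choice of $t_0$ can be made explicit by a short computation, added here as a worked step. It uses only $\nu > 1$; multiplicative constants are ignored when balancing the two terms, in line with the use of $\simeq$ in the text.

```latex
\[
\int_{t_0}^{\infty} t^{-\nu/(\nu-1)}\,dt = (\nu-1)\, t_0^{-1/(\nu-1)} \qquad (\nu > 1),
\]
\[
\mu(A)\,t_0 = t_0^{-1/(\nu-1)}
\iff t_0^{\nu/(\nu-1)} = \mu(A)^{-1}
\iff t_0 = \mu(A)^{-(\nu-1)/\nu},
\qquad\text{so that}\qquad \mu(A)\,t_0 = \mu(A)^{1/\nu}.
\]
```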
Finally, (5.20) and (5.19) imply (FK).

The second result of this section is preceded by two lemmas. We say that a weighted graph $(\Gamma, \mu)$ satisfies the doubling volume condition if

$$V(x, 2R) \le C\, V(x, R), \quad \forall x \in \Gamma,\ R > 0. \tag{D}$$

Clearly, (D) is a weaker assumption than (V).

LEMMA 5.3
If $(\Gamma, \mu)$ satisfies (D), then, for all $x \in \Gamma$ and $R > 0$,

$$\lambda(x, R) \le \frac{C}{R^2}. \tag{5.21}$$

Proof
Let us apply variational property (3.3) with the test function $f(y) = (R - d(x, y))_+ \in c_0(B(x, R))$. Since $|\nabla_{yz} f| \le 1$, (3.3) and (D) imply

$$\lambda(x, R) \le \frac{\frac{1}{2} \sum_{y \sim z} (\nabla_{yz} f)^2 \mu_{yz}}{\sum_{y} f^2(y)\, \mu(y)} \le \frac{C\, V(x, R)}{R^2\, V(x, R/2)} \le \frac{C'}{R^2},$$

which was to be proved.
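The test-function argument can be tried out on the simplest example. The sketch below is an illustration, assuming the graph $\mathbb{Z}$ with unit edge weights (so $\mu(y) = 2$) and integer $R$; the function name is hypothetical. It evaluates the Rayleigh quotient from (3.3) for the tent function $f(y) = (R - |y|)_+$ and rescales it by $R^2$.

```python
def rayleigh_quotient_tent(R):
    """Rayleigh quotient of f(y) = (R - |y|)_+ on Z with unit edge weights.

    Here mu_yz = 1 for neighbors and mu(y) = 2; by the variational property
    the returned value is an upper bound for lambda(0, R).
    """
    def f(y):
        return max(R - abs(y), 0)
    # (1/2) * sum over ordered neighbor pairs equals the sum over edges {y, y+1}.
    numerator = sum((f(y + 1) - f(y)) ** 2 for y in range(-R - 1, R + 1))
    denominator = sum(2 * f(y) ** 2 for y in range(-R, R + 1))
    return numerator / denominator

# R**2 times the quotient stays bounded, as the lemma predicts.
scaled = [rayleigh_quotient_tent(R) * R ** 2 for R in range(1, 200)]
```

In this example the quotient equals $3/(2R^2 + 1)$, so the rescaled values stay below $3/2$ for every $R$.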
The next lemma was proved in [59], but we give here a shorter proof.

LEMMA 5.4
Let $(\Gamma, \mu)$ satisfy $(p_0)$. If (V) and (G) hold, with some positive parameters α and γ, then $\alpha - \gamma \ge 2$.

Proof
By (5.19), we have

$$\lambda(x, R)^{-1} \le \max_{y \in B(x,R)} \sum_{z \in B(x,2R)} G(y, z). \tag{5.22}$$

By (G) and Proposition 3.2, $G(y, y)$ is uniformly bounded from above. Using (G) to estimate $G(y, z)$ for $y \ne z$ and (V), we obtain

$$\begin{aligned}
\sum_{z \in B(y,2R)} G(y, z) &= G(y, y) + \sum_{i=-1}^{\lceil \log_2 R \rceil} \; \sum_{z \in B(y, 2^{-i}R) \setminus B(y, 2^{-i-1}R)} g(y, z)\, \mu(z) \\
&\le C + C \sum_{i=-1}^{\lceil \log_2 R \rceil} \left(2^{-i}R\right)^{-\gamma} V(y, 2^{-i}R) \\
&\le C \left( 1 + \sum_{i=-1}^{\lceil \log_2 R \rceil} \left(2^{-i}R\right)^{\alpha-\gamma} \right).
\end{aligned} \tag{5.23}$$

A straightforward computation of sum (5.23) yields, for large $R$,

$$\sum_{z \in B(y,2R)} G(y, z) \le C \begin{cases} R^{\alpha-\gamma}, & \alpha > \gamma, \\ \log_2 R, & \alpha = \gamma, \\ 1, & \alpha < \gamma. \end{cases} \tag{5.24}$$

Combining (5.22) and (5.24), we obtain

$$\lambda(x, R) \ge c \begin{cases} R^{-(\alpha-\gamma)}, & \alpha > \gamma, \\ (\log_2 R)^{-1}, & \alpha = \gamma, \\ 1, & \alpha < \gamma. \end{cases} \tag{5.25}$$

By Lemma 5.3, we have (5.21), which together with (5.25) implies $\alpha - \gamma \ge 2$.

PROPOSITION 5.5
Let $(\Gamma, \mu)$ satisfy $(p_0)$. If (V) and (G) hold with some positive parameters α and γ, then Faber-Krahn inequality (FK) holds with the parameter $\nu = \alpha/(\alpha - \gamma)$.
Proof
Note that, by Lemma 5.4, we have $\alpha > \gamma$ so that ν is positive and, moreover, $\nu > 1$. Let us verify that

$$\mu\{y : g(x, y) > t\} \le \text{const}\; t^{-\alpha/\gamma}. \tag{5.26}$$

Then (5.1) would follow with $\nu = \alpha/(\alpha - \gamma)$, which implies (FK), by Proposition 5.1. The upper bound in (G) and $(p_0)$ implies that, for all $x, y$ (including the case $x = y$; see Proposition 3.2),

$$g(x, y) \le C \min(1, d(x, y)^{-\gamma}). \tag{5.27}$$

If $t \ge C$, then the set $\{y : g(x, y) > t\}$ is empty, and (5.26) is trivially true. Assume now that $t \le C$. Then (5.27) implies

$$\mu\{y : g(x, y) > t\} \le \mu\{y : d(x, y) < (t/C)^{-1/\gamma}\} = V(x, (t/C)^{-1/\gamma}).$$

Since $R := (t/C)^{-1/\gamma} \ge 1$, we can apply here the upper bound from (V) and obtain (5.26).

6. The mean exit time and the Green kernel

The purpose of this section is to verify part (V) + (G) ⟹ (E) of the proof of Theorem 2.1. Recall that (E) stands for the condition

$$E(x, R) \simeq R^{\beta}, \quad \forall x \in \Gamma,\ R \ge 1. \tag{E}$$

Alongside the mean exit time $E_A(x)$, consider the maximal mean exit time $\overline{E}_A$ defined by

$$\overline{E}_A := \sup_{y} E_A(y). \tag{6.1}$$

If $A = B(x, R)$, then we write $\overline{E}(x, R) := \overline{E}_{B(x,R)}$. We also use the following hypothesis:

$$\overline{E}(x, R) \le C\, E(x, R), \quad \forall x \in \Gamma,\ R > 0. \tag{$\overline{E}$}$$

PROPOSITION 6.1
The upper bound in (E) implies, for all $x \in \Gamma$ and $R \ge 1$,

$$\overline{E}(x, R) \le C R^{\beta}. \tag{6.2}$$

The lower bound in (E) implies

$$\overline{E}(x, R) \ge c R^{\beta}. \tag{6.3}$$

Consequently, (E) implies $(\overline{E})$ and

$$\overline{E}(x, R) \simeq R^{\beta}. \tag{6.4}$$
Proof
To show (6.2), let us observe that, for any point $y \in B(x, R)$, we have $B(x, R) \subset B(y, 2R)$, whence

$$\overline{E}(x, R) = \sup_{y \in B(x,R)} E_{B(x,R)}(y) \le \sup_{y \in B(x,R)} E_{B(y,2R)}(y) = \sup_{y \in B(x,R)} E(y, 2R) \le C R^{\beta}.$$

Lower bound (6.3) is obvious by $E \le \overline{E}$. Finally, $(\overline{E})$ follows from (E) and (6.4) if $R \ge 1$, and $(\overline{E})$ holds trivially if $R < 1$.

PROPOSITION 6.2
For any nonempty finite set $A \subset \Gamma$, we have

$$\lambda_1(A) \ge (\overline{E}_A)^{-1}. \tag{6.5}$$

Proof
Indeed, this is a combination of (5.19) and the definition of $\overline{E}$ (see (3.8) and (6.1)).

The next statement was proved in [59].

PROPOSITION 6.3
Let $(\Gamma, \mu)$ satisfy $(p_0)$. If (V) and (G) hold, with some positive parameters α and γ, then (E) holds as well with $\beta = \alpha - \gamma$.

Proof
Denote $A = B(x, R)$. Applying (3.8), the obvious inequality $g^A \le g$, as well as (V) and (G), we obtain (cf. (5.23) and (5.24))

$$E(x, R) = \sum_{y \in A} g^A(x, y)\, \mu(y) \le \sum_{y \in A} g(x, y)\, \mu(y) \le C R^{\alpha-\gamma}.$$

Observe that, by Lemma 5.4, we already know that $\alpha > \gamma$. For the lower bound of $E(x, R)$, let us prove that

$$g^A(x, y) \ge c\, d(x, y)^{-\gamma}, \quad \forall y \in B(x, \varepsilon R) \setminus \{x\}, \tag{6.6}$$

provided $\varepsilon > 0$ is small enough. Consider the function $u(y) = g(x, y) - g^A(x, y)$ which is harmonic in $A$. By the maximum principle, its maximum is attained at the boundary of $A$, whence, by (G),

$$0 \le u(y) \le C R^{-\gamma}.$$
Therefore,

$$g^A(x, y) = g(x, y) - u(y) \ge c\, d(x, y)^{-\gamma} - C R^{-\gamma}. \tag{6.7}$$

If $R$ is large enough and if $d(x, y) \le \varepsilon R$ with a small enough ε, then the second term in (6.7) is absorbed by the first one, whence (6.6) follows. Summing up (6.6) over $y$, we obtain (cf. (5.23) and (5.24))

$$E(x, R) = \sum_{y \in A} g^A(x, y)\, \mu(y) \ge \sum_{y \in B(x, \varepsilon R) \setminus \{x\}} g^A(x, y)\, \mu(y) \ge c R^{\alpha-\gamma}.$$

If $R$ is not big enough, then the above argument does not work. However, in this case we argue as follows. If the random walk starts at $x$, then $T_{B(x,R)} \ge R$. Hence, we always have $E(x, R) = \mathbb{E}_x(T_{B(x,R)}) \ge R$ which yields the lower bound in (E), provided $R \le \text{const}$.

Assuming that (V) and (E) hold, there are the following general relations between the exponents α and β: if the graph is transient, then $2 \le \beta \le \alpha$; and if it is recurrent, then $2 \le \beta \le \alpha + 1$ (see [59]; see also [60], [61] for various definitions of dimensions of graphs).

7. Sub-Gaussian term

The following statement is crucial for obtaining the off-diagonal upper bound of the heat kernel. It contains the part (E) ⟹ (Ψ) of the proof of Theorem 2.1.

PROPOSITION 7.1
Assume that the graph $(\Gamma, \mu)$ possesses property (E). Then, for all $x \in \Gamma$, $R > 0$, and $n \ge 1$, we have

$$\Psi_n(x, R) \le C \exp\left( -\left( \frac{R^{\beta}}{Cn} \right)^{1/(\beta-1)} \right). \tag{$\Psi$}$$

We start with the following lemma.

LEMMA 7.2
Assume that hypothesis $(\overline{E})$ holds on $(\Gamma, \mu)$. Let $A = B(x_0, r)$ be an arbitrary ball on Γ, and let $v$ be a function on $\overline{A}$ such that $0 \le v \le 1$. Suppose that $v$ satisfies in $A$ the equation

$$\Delta v = \lambda v, \tag{7.1}$$

where λ is a constant such that

$$\lambda \ge (\overline{E}_A)^{-1}. \tag{7.2}$$

Then

$$v(x_0) \le 1 - \varepsilon, \tag{7.3}$$

where $\varepsilon > 0$ depends on the constants in hypothesis $(\overline{E})$ (see Figure 3).

[Figure 3. The value of the function $v$ at the point $x_0$ does not exceed $1 - \varepsilon$]

Proof
Denote for simplicity $u(x) = E_A(x)$, and recall that $u \in c_0(A)$ and $\Delta u = -1$ in $A$ (cf. (3.9)). Also, denote

$$\lambda_0 := (\overline{E}_A)^{-1} = \frac{1}{\max u}.$$

Consider the function $w = 1 - (\lambda_0/2) u$. Then $1/2 \le w \le 1$ and, in $A$,

$$\Delta w = \frac{\lambda_0}{2} \le \lambda_0 w \le \lambda w.$$

Since $v \le 1$ and $w = 1$ outside $A$, the maximum principle for the operator $\Delta - \lambda$ implies that $v \le w$ in $A$. In particular,

$$v(x_0) \le w(x_0) = 1 - \frac{\lambda_0 u(x_0)}{2} \le 1 - \frac{u(x_0)}{2 \max u}.$$

Hypothesis $(\overline{E})$ yields

$$\frac{u(x_0)}{\max u} = \frac{E(x_0, r)}{\overline{E}(x_0, r)} \ge c,$$

whence (7.3) follows.

LEMMA 7.3
Assume that $(\Gamma, \mu)$ satisfies (E). Let $A = B(x_0, R)$ be an arbitrary ball on Γ, and let $v$ be a function on $\overline{A}$ such that $0 \le v \le 1$. If $v$ satisfies, in $A$, equation (7.1) with a constant λ such that

$$C R^{-\beta} \le \lambda < \overline{\lambda}, \tag{7.4}$$
then

$$v(x_0) \le \exp\left( -c \lambda^{1/\beta} R \right). \tag{7.5}$$

Here $\overline{\lambda}$ is an arbitrary constant, $C$ is some constant depending on condition (E), and $c > 0$ is some constant depending on $\overline{\lambda}$ and on condition (E).

Proof
Condition (E) implies $(\overline{E})$ and $\overline{E}(x, R) \simeq R^{\beta}$ (see Proposition 6.1). Choose the constant $C$ in (7.4) so big that the lower bound in (7.4) implies $\lambda \ge \overline{E}(x, R)^{-1}$. Then, by Lemma 7.2, we obtain $v(x_0) \le 1 - \varepsilon$. If we have, in addition,

$$\lambda^{1/\beta} R \le \text{const}, \tag{7.6}$$

then (7.5) is trivially satisfied. In particular, if $R$ is in the bounded range, then (7.6) is true because λ is bounded from above by (7.4). Hence, we may assume in the sequel that

$$R > C' \quad \text{and} \quad \lambda > C'' R^{-\beta}, \tag{7.7}$$

with large enough constants $C'$ and $C''$ (in particular, $C'' \gg C$). The point of the present lemma is that it improves the previous one for this range of $R$ and λ. Choose a number $r$ from the equation $\lambda = C r^{-\beta}$, where $C$ is the same constant as in (7.4). The above argument shows that Lemma 7.2 applies in any ball of radius $r$. Let $x_i$, $i \ge 1$, be a point in the ball $B(x_0, (r + 1)i)$ in which $v$ takes the maximum value in this ball, and denote $m_i = v(x_i)$ (see Figure 4). For $i = 0$, we set $m_0 = v(x_0)$.

[Figure 4. The points $x_i$ where $v(x)$ takes the maximum values]

For each $i \ge 0$, consider the ball $A_i = B(x_i, r)$. Since

$$\overline{A_i} \subset B(x_i, r + 1) \subset B(x_0, (r + 1)(i + 1)),$$
we have

$$\max_{\overline{A_i}} v \le m_{i+1}.$$

Applying Lemma 7.2 to the function $v/m_{i+1}$ in the ball $A_i$, we obtain $m_i \le (1 - \varepsilon) m_{i+1}$. Iterating this inequality $k := \lfloor R/(r + 1) \rfloor$ times and using $m_k \le 1$, we conclude

$$v(x_0) = m_0 \le (1 - \varepsilon)^k. \tag{7.8}$$

By conditions (7.7) and (7.4) and by the choice of $r$, we have

$$k \simeq \frac{R}{r} \simeq \lambda^{1/\beta} R,$$

so that (7.8) implies (7.5).

LEMMA 7.4
Assume that $(\Gamma, \mu)$ satisfies (E). Let $A = B(x_0, R)$ be an arbitrary ball on Γ, and let $w_n(x)$ be a function in $\overline{A} \times \mathbb{N}$ such that $0 \le w \le 1$. Suppose that $w$ solves in $A \times \mathbb{N}$ the heat equation

$$w_{n+1} - w_n = \Delta w_n \tag{7.9}$$

with initial data $w_0 \equiv 0$ in $A$ (see Figure 5). Then, for all $n \ge 1$,

$$w_n(x_0) \le \exp\left( -c \left( \frac{R^{\beta}}{n} \right)^{1/(\beta-1)} + 1 \right). \tag{7.10}$$

[Figure 5. The value of the function $w$ at the point $(x_0, n)$ is affected by the initial value $w = 0$ and by the boundary condition $w \le 1$]

Proof
First, consider two trivial cases. If $R^{\beta} \le Cn$, then (7.10) is true just by $w \le 1$, provided $c$ is small enough. Since $\Delta w(x)$ depends only on the immediate neighbors of $x$, one gets by induction that $w_k(x) = 0$ for all $x \in B(x_0, R - k)$. Therefore, if $R > n$, then $w_n(x_0) = 0$, and (7.10) is true again. Hence, we may assume in the sequel that, for a large enough $C$,

$$C n^{1/\beta} < R \le n. \tag{7.11}$$

Fix some $\lambda > 0$, and find a function $v(x)$ on $\overline{A}$ solving the boundary value problem

$$\begin{cases} \Delta v = \lambda v & \text{in } A, \\ v = 1 & \text{in } \overline{A} \setminus A. \end{cases}$$

The function $u_n(x) := (1 + \lambda)^n v(x)$ solves heat equation (7.9) and satisfies the following boundary conditions: $u_n(x) \ge 1$ for $x \in \overline{A} \setminus A$ and $u_0(x) \ge 0$ for $x \in A$. By the parabolic comparison principle, we have $w \le u$. Assume for a moment that λ satisfies hypothesis (7.4) of Lemma 7.3. Then we estimate $v(x_0)$ by (7.5) and obtain

$$w_n(x_0) \le (1 + \lambda)^n v(x_0) \le \exp\left( \lambda n - c \lambda^{1/\beta} R \right).$$

Now, choose λ from the condition $c \lambda^{1/\beta} R = 2\lambda n$; that is,

$$\lambda = \left( \frac{cR}{2n} \right)^{\beta/(\beta-1)}. \tag{7.12}$$
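The effect of this choice of λ on the exponent can be written out in full. The following computation is a worked step added for the reader; it solves the balancing condition for λ and substitutes back, using only $\beta > 1$.

```latex
\[
c\lambda^{1/\beta}R = 2\lambda n
\iff \lambda^{(\beta-1)/\beta} = \frac{cR}{2n}
\iff \lambda = \left(\frac{cR}{2n}\right)^{\beta/(\beta-1)},
\]
\[
\lambda n - c\lambda^{1/\beta}R = -\lambda n
= -\left(\frac{c}{2}\right)^{\beta/(\beta-1)} \frac{R^{\beta/(\beta-1)}}{n^{1/(\beta-1)}}
= -\left(\frac{c}{2}\right)^{\beta/(\beta-1)} \left(\frac{R^{\beta}}{n}\right)^{1/(\beta-1)} .
\]
```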
As follows from (7.11), this particular λ satisfies (7.4). Therefore, the above application of Lemma 7.3 is justified, and we obtain β 1/(β−1) R wn (x0 ) ≤ exp(−λn) = exp − c0 , n finishing the proof. Proof of Proposition 7.1 Denote A = B(x0 , R). By (3.11), the function wn (x) := 9nA (x) satisfies all the hypotheses of Lemma 7.4. Hence, (9) follows from (7.10). 8. Off-diagonal upper bound of the heat kernel Here we prove the following implication: (F K ) + (E) =⇒ (U E),
(8.1)
SUB-GAUSSIAN ESTIMATES OF HEAT KERNELS
483
which finishes the proof of the heat kernel upper bound in Theorem 2.1. Indeed, together with the implications
$$(V)+(G)\overset{\text{Prop. 5.5}}{\Longrightarrow}(FK)\quad\text{and}\quad(V)+(G)\overset{\text{Prop. 6.3}}{\Longrightarrow}(E),$$
(8.1) yields the part $(V)+(G)\Longrightarrow(UE)$ of Theorem 2.1.

PROPOSITION 8.1
On any graph $(\Gamma,\mu)$, we have
$$(DUE)+(\Psi)\Longrightarrow(UE). \tag{8.2}$$
In particular, if $(p_0)$ holds on $(\Gamma,\mu)$, then
$$(FK)+(E)\Longrightarrow(UE). \tag{8.3}$$
Proof
By Proposition 5.1, $(p_0)$ and $(FK)$ imply $(DUE)$. By Proposition 7.1, $(E)$ implies $(\Psi)$. Hence, implication (8.3) is a consequence of (8.2).
To prove (8.2), let us fix some points $x,y\in\Gamma$ and denote $r=d(x,y)/2$. Since balls $B(x,r)$ and $B(y,r)$ do not intersect, every point $z$ lies outside at least one of them. Hence the semigroup identity (5.15) and the symmetry of the heat kernel imply, for any triple of nonnegative integers $k,m,n$ such that $k+m=n$,
$$\begin{aligned}p_n(x,y)&\le\sum_{z\notin B(x,r)}p_m(x,z)p_k(z,y)\mu(z)+\sum_{z\notin B(y,r)}p_m(x,z)p_k(z,y)\mu(z)\\&\le\sup_z p_k(z,y)\sum_{z\notin B(x,r)}P_m(x,z)+\sup_z p_m(x,z)\sum_{z\notin B(y,r)}P_k(y,z)\\&=\sup_z p_k(y,z)\,\mathbb{P}_x\bigl(X_m\notin B(x,r)\bigr)+\sup_z p_m(x,z)\,\mathbb{P}_y\bigl(X_k\notin B(y,r)\bigr).\end{aligned}$$
As follows from definition (3.10) of $\Psi$, $\mathbb{P}_x(X_m\notin B(x,r))\le\Psi_m(x,r)$. Hence, we obtain the following general inequality, which is true for all reversible random walks:
$$p_n(x,y)\le\sup_z p_k(y,z)\,\Psi_m(x,r)+\sup_z p_m(x,z)\,\Psi_k(y,r). \tag{8.4}$$
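The splitting argument behind (8.4) can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: it uses the simple random walk on a finite cycle (so $\mu\equiv 2$), takes balls to be open, $B(x,r)=\{z: d(x,z)<r\}$, and evaluates the last line of the display preceding (8.4) with the exact escape probabilities $\mathbb{P}_x(X_m\notin B(x,r))$ in place of $\Psi_m(x,r)$. The function names are hypothetical.

```python
from fractions import Fraction

def cycle_transition(N):
    """Simple random walk on the N-cycle with unit edge weights (N >= 3)."""
    P = [[Fraction(0)] * N for _ in range(N)]
    for x in range(N):
        P[x][(x + 1) % N] = Fraction(1, 2)
        P[x][(x - 1) % N] = Fraction(1, 2)
    return P

def n_step(P, n):
    """n-step transition matrix P_n, with P_0 the identity."""
    N = len(P)
    R = [[Fraction(int(i == j)) for j in range(N)] for i in range(N)]
    for _ in range(n):
        R = [[sum(R[i][k] * P[k][j] for k in range(N)) for j in range(N)]
             for i in range(N)]
    return R

def split_bound(N, x, y, m, k):
    """Return (p_n(x, y), bound) for n = m + k, where bound is the right-hand
    side obtained by splitting the semigroup sum over the two disjoint balls."""
    mu = 2  # assumption: unit edge weights, so mu(z) = 2 at every vertex

    def dist(a, b):
        return min((a - b) % N, (b - a) % N)

    P = cycle_transition(N)
    Pm, Pk, Pn = n_step(P, m), n_step(P, k), n_step(P, m + k)
    r = Fraction(dist(x, y), 2)
    # Open balls: z is outside B(x, r) exactly when d(x, z) >= r.
    out_x = sum(Pm[x][z] for z in range(N) if dist(x, z) >= r)
    out_y = sum(Pk[y][z] for z in range(N) if dist(y, z) >= r)
    sup_pk = max(Pk[y]) / mu   # sup_z p_k(y, z)
    sup_pm = max(Pm[x]) / mu   # sup_z p_m(x, z)
    return Pn[x][y] / mu, sup_pk * out_x + sup_pm * out_y
```

Calling `split_bound(8, 0, 4, 2, 2)` returns the pair (heat kernel value, bound); the first entry never exceeds the second, in line with the argument above.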
As follows from (5.17), diagonal upper bound $(DUE)$ implies, for all $x,y\in\Gamma$,
$$p_n(x,y)\le Cn^{-\alpha/\beta}, \tag{8.5}$$
provided $n$ is even. Using inequality (5.18), we see that (8.5) also holds for odd $n$. Assuming $n\ge 2$, choosing $k\simeq m\simeq n/2$, and applying (8.5) and $(\Psi)$ to estimate the right-hand side of (8.4), we obtain $(UE)$. If $n=1$, then $(UE)$ follows trivially from (8.5) and the fact that $p_n(x,y)=0$ whenever $d(x,y)>n$.

9. On-diagonal lower bound
In this section we prove part $(\Psi)+(V)\Longrightarrow(DLE)$ of Theorem 2.1.

PROPOSITION 9.1
Assume that hypothesis $(\Psi)$ holds on $(\Gamma,\mu)$. For arbitrary $x\in\Gamma$ and $R>0$, denote $A=B(x,R)$. Then the following on-diagonal lower bound is true:
$$p^A_{2n}(x,x)\ge\frac{c}{V(x,Cn^{1/\beta})}, \tag{9.1}$$
provided $n\le\varepsilon R^\beta$, where $\varepsilon$ is a sufficiently small positive constant depending only on the constants from $(\Psi)$. If in addition $(V)$ holds, then
$$p^A_{2n}(x,x)\ge cn^{-\alpha/\beta},\qquad\forall n\le\varepsilon R^\beta. \tag{DLE}$$
Remark 9.1
Since $p_{2n}\ge p^A_{2n}$ for any $A=B(x,R)$, inequality $(DLE)$ implies $p_{2n}(x,x)\ge cn^{-\alpha/\beta}$ for all positive integers $n$.

Proof
Let us fix some $r\in(0,R)$ and denote $B=B(x,r)$. Since $p^B\le p^A$, it suffices to prove (9.1) for $p^B$ instead of $p^A$, for some $r<R$. Semigroup identity (5.15) for $p^B$ and the Cauchy-Schwarz inequality imply
$$p^B_{2n}(x,x)=\sum_{z\in B}p^B_n(x,z)^2\mu(z)\ge\frac{1}{\mu(B)}\left(\sum_{z\in B}p^B_n(x,z)\mu(z)\right)^2. \tag{9.2}$$
Let us observe that
$$\sum_{z\in B}p^B_n(\cdot,z)\mu(z)+\Psi^B_n(\cdot)=1. \tag{9.3}$$
Indeed, the first term in (9.3) is the probability that the random walk $X_k$ stays in $B$ up to the time $k=n$, whereas $\Psi^B_n$ is the probability of the opposite event.
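Identities (9.2) and (9.3) are easy to confirm on a toy example. The following sketch assumes the simple random walk on $\mathbb{Z}$ with unit edge weights (so $\mu\equiv 2$) started at $0$ and killed on leaving the open ball $B=B(0,r)=\{-(r-1),\dots,r-1\}$; the exit probability is accumulated step by step, independently of the surviving mass. The helper names are hypothetical and exact rational arithmetic is used.

```python
from fractions import Fraction

def killed_walk(r, n):
    """Mass distribution after n steps of the simple random walk on Z started
    at 0 and killed on leaving B = {-(r-1), ..., r-1}, together with the
    probability that the walk has left B by time n."""
    sites = range(-(r - 1), r)
    mass = {z: Fraction(int(z == 0)) for z in sites}
    exited = Fraction(0)
    for _ in range(n):
        new = {z: Fraction(0) for z in sites}
        for z, w in mass.items():
            for nb in (z - 1, z + 1):
                if nb in new:
                    new[nb] += w / 2
                else:
                    exited += w / 2  # the first exit from B happens at this step
        mass = new
    return mass, exited

def identities(r, n):
    """Quantities entering (9.2) and (9.3) at the starting point x = 0."""
    mu = 2                                   # assumption: mu(z) = 2 for every z
    mass, psi = killed_walk(r, n)
    survive = sum(mass.values())             # sum_z p_n^B(0, z) mu(z)
    p2n = killed_walk(r, 2 * n)[0][0] / mu   # p_{2n}^B(0, 0)
    sum_sq = sum((w / mu) ** 2 * mu for w in mass.values())
    cs_bound = survive ** 2 / (mu * (2 * r - 1))  # right-hand side of (9.2)
    return survive, psi, p2n, sum_sq, cs_bound
```

For any `r >= 1` and `n >= 0`, the first two returned values sum to one as in (9.3), the third equals the fourth by the semigroup identity, and the fourth dominates the fifth by Cauchy-Schwarz.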
By hypothesis $(\Psi)$, we have
$$\Psi^B_n(x)=\Psi_n(x,r)\le C\exp\left(-\left(\frac{r^\beta}{Cn}\right)^{1/(\beta-1)}\right). \tag{9.4}$$
Choosing $r=Cn^{1/\beta}$ for large enough $C$ and assuming $n\le\varepsilon R^\beta$ for sufficiently small $\varepsilon>0$ (the latter ensures $r<R$), we obtain, from (9.4), $\Psi_n(x,r)\le 1/2$, whence, by (9.3),
$$\sum_{z\in B}p^B_n(x,z)\mu(z)\ge\frac12.$$
Therefore, (9.2) yields
$$p^B_{2n}(x,x)\ge\frac{1/4}{V(x,r)}=\frac{1/4}{V(x,Cn^{1/\beta})},$$
finishing the proof.

10. The Harnack inequality and the Green kernel
Recall that the weighted graph $(\Gamma,\mu)$ satisfies the elliptic Harnack inequality if, for all $x\in\Gamma$, $R>0$, and for any nonnegative function $u$ in $\overline{B(x,2R)}$ which is harmonic in $B(x,2R)$,
$$\max_{B(x,R)}u\le H\min_{B(x,R)}u \tag{H}$$
with some constant $H>1$. In this section we establish that $(H)$ is implied by condition $(G)$. Recall that the latter refers to
$$g(x,y)\simeq d(x,y)^{-\gamma},\qquad\forall x\ne y. \tag{G}$$
Consider the following annulus Harnack inequality for the Green kernel: for all $x\in\Gamma$ and $R>1$,
$$\max_{y\in A(x,R)}g(x,y)\le C\min_{y\in A(x,R)}g(x,y), \tag{HG}$$
where $A(x,R):=B(x,R)\setminus B(x,R/2)$.

PROPOSITION 10.1
Assume that $(p_0)$ holds and the graph $(\Gamma,\mu)$ is transient. Then
$$(G)\Longrightarrow(HG)\Longrightarrow(H).$$
Since the implication $(G)\Longrightarrow(HG)$ is obvious, we need to prove only the second implication. The main part of the proof is contained in the following lemma.
Figure 6. The sets $B=U_0$, $A=U_2\setminus U_1$, and $U=U_3$

LEMMA 10.2
Let $U_0\subset U_1\subset U_2\subset U_3$ be a sequence of finite sets in $\Gamma$ such that $\overline{U_i}\subset U_{i+1}$, $i=0,1,2$. Denote $A=U_2\setminus U_1$, $B=U_0$, and $U=U_3$. Then, for any function $u$ that is nonnegative in $\overline{U_2}$ and harmonic in $U_2$, we have
$$\max_B u\le H\min_B u, \tag{10.1}$$
where
$$H:=\max_{x\in B}\max_{y\in B}\max_{z\in A}\frac{G_U(y,z)}{G_U(x,z)} \tag{10.2}$$
(see Figure 6).
Remark 10.1
Note that no a priori assumption has been made about the graph $(\Gamma,\mu)$ (except for connectedness and unboundedness). If the graph is transient, then, by exhausting $\Gamma$ by a sequence of finite sets $U$, we can replace $G_U$ in (10.2) by $G$. Note also that, without loss of generality, one can take $U_2=\overline{U_1}$.

Proof
The following potential-theoretic argument is borrowed from [12]. We use the notation of Section 3. Given a nonnegative harmonic function $u$ in $U_2$, denote by $S_u$ the following class of superharmonic functions:
$$S_u=\left\{v: v\ge 0\text{ in }\overline{U},\ \Delta v\le 0\text{ in }U,\text{ and }v\ge u\text{ in }\overline{U_1}\right\}.$$
Define the function $w$ on $\overline{U}$ by
$$w(x)=\min\{v(x): v\in S_u\}. \tag{10.3}$$
Figure 7. The function $u$, a function $v\in S_u$, and the function $w=\min_{S_u}v$. The latter is harmonic in $U_1$ and in $U\setminus\overline{U_1}$.
Clearly, $w\in S_u$. Since the function $u$ itself is also in $S_u$, we have $w\le u$ in $U$. On the other hand, by definition of $S_u$, $w\ge u$ in $\overline{U_1}$, whence we see that $u=w$ in $\overline{U_1}$ (see Figure 7). In particular, it suffices to prove (10.1) for $w$ instead of $u$.
Let us show that $w\in c_0(U)$. Indeed, let $v(x)=E_U(x)$. Then, by (3.9) and the strong minimum principle, $v$ is superharmonic and strictly positive in $U$. Hence, for a large enough constant $C$, we have $Cv\ge u$ in $\overline{U_1}$, whence $Cv\in S_u$ and $w\le Cv$. Since $v=0$ in $\overline{U}\setminus U$, this implies $w=0$ in $\overline{U}\setminus U$ and $w\in c_0(U)$.
Denote $f:=-\Delta w$, and observe that $f\ge 0$ in $U$. Since $w\in c_0(U)$, we have, for any $x\in U$,
$$w(x)=\sum_{z\in U}G_U(x,z)f(z). \tag{10.4}$$
Next we prove that $f=0$ outside $A$ so that the summation in (10.4) can be restricted to $z\in A$. Given this, we obtain, for all $x,y\in B$,
$$\frac{w(y)}{w(x)}=\frac{\sum_{z\in A}G_U(y,z)f(z)}{\sum_{z\in A}G_U(x,z)f(z)}\le H,$$
whence (10.1) follows.
We are left to verify that $w$ is harmonic in $U_1$ and outside $\overline{U_1}$. Indeed, if $x\in U_1$, then $\Delta w(x)=\Delta u(x)=0$ because $w=u$ in $\overline{U_1}$. Let $\Delta w(x)\ne 0$ for some $x\in U\setminus\overline{U_1}$. Since $w$ is superharmonic, we have $\Delta w(x)<0$ and
$$w(x)>Pw(x)=\sum_{y\sim x}P(x,y)w(y).$$
Consider the function $w'$, which is equal to $w$ everywhere in $\overline{U}$ except for the point $x$; and $w'$ at $x$ is defined to satisfy
$$w'(x)=\sum_{y\sim x}P(x,y)w'(y).$$
Clearly, $w'(x)<w(x)$, and $w'$ is superharmonic in $U$. Since $w'=w=u$ in $\overline{U_1}$, we have $w'\in S_u$. Hence, by definition (10.3) of $w$, $w\le w'$ in $U$, which contradicts $w(x)>w'(x)$.

Proof of Proposition 10.1
Now we assume $(HG)$ and prove $(H)$. Given any ball $B(x_0,2R)$ of radius $R>4$ and a nonnegative harmonic function $u$ in $B(x_0,2R)$, define the sequence of radii $R_0=R$, $R_1=3R/2$, and $R_2=2R$, and denote $U_i=B(x_0,R_i)$ for $i=0,1,2$ and $U_3=\Gamma$. By Lemma 10.2, we have inequality (10.1), which implies $(H)$, provided we can show that the Harnack constant $H$ from (10.2) is bounded from above, uniformly in $x_0$ and $R$. Indeed, if $x,y\in B(x_0,R)$ and $z\in A=B(x_0,2R)\setminus B(x_0,3R/2)$, then both distances $d(z,x)$ and $d(z,y)$ are between $R/2$ and $7R/2$. By iterating $(HG)$ in the annuli centered at $z$, we obtain
$$\frac{G(y,z)}{G(x,z)}=\frac{g(z,y)}{g(z,x)}\le\mathrm{const},$$
whence we see that $H$ is indeed uniformly bounded from above.
The condition $R>4$, which we have imposed above, ensures that $\overline{U_i}\subset U_{i+1}$, which is required for Lemma 10.2. If $R\le 4$, then $(H)$ simply follows from $(p_0)$ and Proposition 3.2.

11. Oscillation inequalities
For any nonempty finite set $U$ and a function $u$ on $U$, denote
$$\operatorname*{osc}_U u:=\max_U u-\min_U u.$$
The purpose of this section is to prove estimate (11.3), which provides the step $(H)\Longrightarrow[\mathrm{osc}]$ of the proof of Theorem 2.1.

PROPOSITION 11.1
Assume that elliptic Harnack inequality $(H)$ holds on $(\Gamma,\mu)$. Then, for any $\varepsilon>0$, there exists $\sigma=\sigma(\varepsilon,H)<1$ such that, for any ball $B(x,R)$ and for any function $u$ defined in $\overline{B(x,R)}$ and harmonic in $B(x,R)$, we have
$$\operatorname*{osc}_{B(x,\sigma R)}u\le\varepsilon\operatorname*{osc}_{B(x,R)}u. \tag{11.1}$$
Proof
Fix a ball $B(x,R)$, and denote for simplicity $B_r=B(x,r)$. Let us prove that, for any $r\in(0,R/3]$,
$$\operatorname*{osc}_{B_r}u\le(1-\delta)\operatorname*{osc}_{B_{3r}}u, \tag{11.2}$$
where $\delta=\delta(H)\in(0,1)$. Then (11.1) follows from (11.2) by iterating.
If $r\le 1$, then the left-hand side of (11.2) vanishes and (11.2) is trivially satisfied. If $r>1$, then $\overline{B_{2r}}\subset B_{3r}$, and the function $u-\min_{B_{3r}}u$ is nonnegative in $\overline{B_{2r}}$ and harmonic in $B_{2r}$. Applying Harnack inequality $(H)$ to this function, we obtain
$$\max_{B_r}u-\min_{B_{3r}}u\le H\left(\min_{B_r}u-\min_{B_{3r}}u\right)$$
and
$$\operatorname*{osc}_{B_r}u\le(H-1)\left(\min_{B_r}u-\min_{B_{3r}}u\right).$$
Similarly, we have
$$\operatorname*{osc}_{B_r}u\le(H-1)\left(\max_{B_{3r}}u-\max_{B_r}u\right).$$
Summing up these two inequalities, we conclude
$$\operatorname*{osc}_{B_r}u\le C\left(\operatorname*{osc}_{B_{3r}}u-\operatorname*{osc}_{B_r}u\right),$$
whence (11.2) follows.

PROPOSITION 11.2
Assume that elliptic Harnack inequality $(H)$ holds on $(\Gamma,\mu)$. Let $u\in c_0(B(x,R))$ satisfy in $B(x,R)$ the equation $\Delta u=f$. Then, for any positive $r<R$,
$$\operatorname*{osc}_{B(x,\sigma r)}u\le 2\left(\overline{E}(x,r)+\varepsilon\overline{E}(x,R)\right)\max|f|, \tag{11.3}$$
where $\sigma$ and $\varepsilon$ are the same as in Proposition 11.1.

Proof
Denote for simplicity $B_r=B(x,r)$. By definition of the Green function, we have
$$u(y)=-\sum_{z\in B_R}G_{B_R}(y,z)f(z),$$
whence, using (3.8), we obtain
$$\max|u|\le\overline{E}(x,R)\max|f|.$$
Figure 8. The functions $u$ and $v$ in the case $f\le 0$

Let $v\in c_0(B_r)$ solve the Dirichlet problem $\Delta v=f$ in $B_r$ (see Figure 8). In the same way, we have
$$\max|v|\le\overline{E}(x,r)\max|f|.$$
The function $w=u-v$ is harmonic in $B_r$ whence, by Proposition 11.1,
$$\operatorname*{osc}_{B_{\sigma r}}w\le\varepsilon\operatorname*{osc}_{B_r}w.$$
Since $w=u$ on $\overline{B_r}\setminus B_r$, the maximum principle implies that
$$\operatorname*{osc}_{B_r}w=\operatorname*{osc}_{\overline{B_r}\setminus B_r}w=\operatorname*{osc}_{\overline{B_r}\setminus B_r}u\le 2\max|u|.$$
Hence,
$$\operatorname*{osc}_{B_{\sigma r}}u\le\operatorname*{osc}_{B_{\sigma r}}v+\operatorname*{osc}_{B_{\sigma r}}w\le 2\max|v|+2\varepsilon\max|u|\le 2\left(\overline{E}(x,r)+\varepsilon\overline{E}(x,R)\right)\max|f|,$$
which was to be proved.

12. Time derivative of the heat kernel
Given a function $u_n(x)$ on $\Gamma\times\mathbb{N}$, by the “time derivative” of $u$ we mean the difference
$$\partial_n u:=u_{n+2}-u_n.$$
The main result of this section is Proposition 12.3, which provides upper bound (12.5) for $\partial_n p$ and thus constitutes the part $(DUE)\Longrightarrow[\mathrm{deriv}]$ of the proof of Theorem 2.1. The crucial point is that $\partial_n p$ decays as $n\to\infty$ faster than $p_n$.
The analogue of the time derivative in the discrete case is $\partial_n p=p_{n+2}-p_n$ rather than $p_{n+1}-p_n$. Indeed, in $\mathbb{Z}^D$ (as well as in any other bipartite graph), $p_n(x,x)=0$ if $n$ is odd. Therefore, the difference $p_{n+1}(x,x)-p_n(x,x)$ is equal either to $p_{n+1}(x,x)$ or to $-p_n(x,x)$, and hence it decays as $n\to\infty$ at the same rate as $p_n(x,x)$.

PROPOSITION 12.1
Let $A$ be a nonempty finite subset of $\Gamma$, and let $f$ be a function on $A$. Define
$$u_n(x)=P^A_n f(x).$$
Then, for all integers $1\le k\le n$,
$$\|\partial_n u\|_{L^2(A,\mu)}\le\frac1k\|u_{n-k}\|_{L^2(A,\mu)}.$$

Proof
The proof follows the argument from [17]. Let $\varphi_1,\varphi_2,\dots,\varphi_{|A|}$ be the eigenfunctions of the Laplace operator $-\Delta_A$, and let $\lambda_1,\lambda_2,\dots,\lambda_{|A|}$ be the corresponding eigenvalues. Let us normalize the $\varphi_i$'s to form an orthonormal basis in $L^2(A,\mu)$. The function $f$ can be expanded in this basis:
$$f=\sum_i c_i\varphi_i.$$
Since $P^A=I-(-\Delta_A)$, we obtain
$$u_n=\sum_i c_i\rho_i^n\varphi_i, \tag{12.1}$$
where $\rho_i:=1-\lambda_i$ are eigenvalues of the Markov operator $P^A$. From (12.1), we obtain
$$u_n-u_{n+2}=\sum_i c_i\left(1-\rho_i^2\right)\rho_i^n\varphi_i$$
and
$$\|u_n-u_{n+2}\|^2_{L^2(A,\mu)}=\sum_i c_i^2\left(1-\rho_i^2\right)^2\rho_i^{2n}. \tag{12.2}$$
Note that $|\rho_i|\le 1$ and hence $\rho_i^2\in[0,1]$. For any $a\in[0,1]$, we have
$$1\ge(1+a+a^2+\dots+a^k)(1-a)\ge ka^k(1-a),$$
whence
$$(1-a)a^k\le\frac1k.$$
Applying this inequality for $a=\rho_i^2$, we obtain, from (12.2),
$$\|u_n-u_{n+2}\|^2_{L^2(A,\mu)}\le\frac{1}{k^2}\sum_i c_i^2\rho_i^{2(n-k)}=\frac{1}{k^2}\|u_{n-k}\|^2_{L^2(A,\mu)},$$
which was to be proved.

PROPOSITION 12.2
Let $A$ be a nonempty finite subset of $\Gamma$. Then, for all $x,y\in A$,
$$\left|\partial_n p^A(x,y)\right|\le\frac1k\sqrt{p^A_{2m}(x,x)\,p^A_{2(n-m-k)}(y,y)} \tag{12.3}$$
for all positive integers $n,m,k$ such that $m+k\le n$.
Proof
From semigroup identity (5.15) for $p^A$, we obtain
$$\partial_n p^A(x,y)=\sum_{z\in A}p^A_m(x,z)\,\partial_{n-m}p^A(z,y)\mu(z),$$
whence
$$\left|\partial_n p^A(x,y)\right|\le\left\|p^A_m(x,\cdot)\right\|_{L^2(A,\mu)}\left\|\partial_{n-m}p^A(y,\cdot)\right\|_{L^2(A,\mu)}.$$
By Proposition 12.1,
$$\left\|\partial_{n-m}p^A(y,\cdot)\right\|_{L^2(A,\mu)}\le\frac1k\left\|p^A_{n-m-k}(y,\cdot)\right\|_{L^2(A,\mu)}$$
for any $1\le k\le n-m$. Since
$$\left\|p^A_m(x,\cdot)\right\|^2_{L^2(A,\mu)}=\sum_{z\in A}p^A_m(x,z)^2\mu(z)=p^A_{2m}(x,x),$$
we obtain (12.3).

PROPOSITION 12.3
Suppose that $(DUE)$ holds; that is, for all $x\in\Gamma$ and $n\ge 1$,
$$p_n(x,x)\le Cn^{-\nu}. \tag{12.4}$$
Then, for all $x,y\in\Gamma$ and $n\ge 1$,
$$|\partial_n p(x,y)|\le Cn^{-\nu-1}. \tag{12.5}$$

Proof
First, assume $n>3$. Then we can choose $k$ and $m$ in (12.3) so that $k\simeq m\simeq n/3$ and $n-m-k\simeq n/3$. As follows from (12.4), for any nonempty finite set $A\subset\Gamma$,
$$p^A_{2m}(x,x)\le Cn^{-\nu}\quad\text{and}\quad p^A_{2(n-m-k)}(y,y)\le Cn^{-\nu},$$
whence, by Proposition 12.2, $\left|\partial_n p^A(x,y)\right|\le Cn^{-\nu-1}$. By letting $A\to\Gamma$, we obtain (12.5). If $n\le 3$, then (12.5) follows from the trivial inequality $|\partial_n p|\le p_n+p_{n+2}$ and the fact that (12.4) implies a similar bound for $p_n(x,y)$ (cf. (5.17) and (5.18)).
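The gain of one power of $n$ in (12.5) can be seen exactly for the simple random walk on $\mathbb{Z}$, where $\nu=1/2$. The sketch below is an illustration only, not part of the proof: it works with the return probabilities $P_n(0,0)=\binom{n}{n/2}2^{-n}$ (which differ from $p_n(0,0)$ by the constant factor $\mu(0)=2$, assuming unit edge weights) and compares the two-step difference with the one-step difference. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def return_prob(n):
    """P_n(0, 0) for the simple random walk on Z (zero when n is odd)."""
    if n % 2 == 1:
        return Fraction(0)
    return Fraction(comb(n, n // 2), 2 ** n)

def time_derivative(n):
    """Two-step difference P_{n+2}(0, 0) - P_n(0, 0)."""
    return return_prob(n + 2) - return_prob(n)

def one_step_difference(n):
    """One-step difference P_{n+1}(0, 0) - P_n(0, 0)."""
    return return_prob(n + 1) - return_prob(n)
```

For even `n`, `time_derivative(n)` equals `-return_prob(n) / (n + 2)`, so the two-step difference is smaller than $P_n$ by a factor of order $n$, whereas `one_step_difference(n)` equals `-return_prob(n)` and gains nothing. This matches the parity remark at the beginning of the section.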
13. Off-diagonal lower bound
An important intermediate step in proving the lower estimate $(LE)$ is a near-diagonal lower estimate
$$p_n(x,y)+p_{n+1}(x,y)\ge cn^{-\alpha/\beta} \tag{NLE}$$
for all $x,y\in\Gamma$ and $n\ge 1$ such that
$$d(x,y)\le\delta n^{1/\beta}. \tag{13.1}$$
In this section we finish the proof of lower bound $(LE)$ in Theorem 2.1 as on the following diagram:
$$(V)+(G)\Longrightarrow(FK)+(V)+(E)+(H)\Longrightarrow(NLE)+(V)\Longrightarrow(LE).$$
The first implication here is given by Propositions 5.5, 6.3, and 10.1, whereas the other two are proved below.
Let us recall that $(DLE)$ refers to the lower bound
$$p^{B(x,R)}_{2n}(x,x)\ge cn^{-\alpha/\beta},\qquad\forall n\le\varepsilon R^\beta, \tag{DLE}$$
with some small enough $\varepsilon>0$, and $(DUE)$ refers to the upper bound
$$p_n(x,x)\le Cn^{-\alpha/\beta}. \tag{DUE}$$
Denote for simplicity by $(E\le)$ the upper bound in $(E)$; that is,
$$E(x,R)\le CR^\beta,\qquad\forall x\in\Gamma,\ R\ge 1. \tag{$E\le$}$$

PROPOSITION 13.1
For any graph $(\Gamma,\mu)$, we have
$$(DUE)+(DLE)+(E\le)+(H)\Longrightarrow(NLE). \tag{13.2}$$
Consequently, if $(p_0)$ holds on $(\Gamma,\mu)$, then
$$(FK)+(V)+(E)+(H)\Longrightarrow(NLE). \tag{13.3}$$

Proof
Let us first show how the second claim follows from the first one. Recall that, by Proposition 5.1, $(FK)\Longrightarrow(DUE)$; by Proposition 7.1, $(E)\Longrightarrow(\Psi)$; and, by Proposition 9.1, $(\Psi)+(V)\Longrightarrow(DLE)$. Hence, the hypotheses of (13.3) imply the hypotheses of (13.2).
To prove (13.2), fix $x\in\Gamma$, $n\ge 1$, and set
$$R=\left(\frac{n}{\varepsilon}\right)^{1/\beta} \tag{13.4}$$
for a small enough positive $\varepsilon$. So far we only assume that $\varepsilon$ satisfies $(DLE)$, but later one more upper bound on $\varepsilon$ is imposed. Denote $A=B(x,R)$, and introduce the function
$$u(y):=p^A_n(x,y)+p^A_{n+1}(x,y).$$
By hypothesis $(DLE)$, we have $u(x)\ge cn^{-\alpha/\beta}$. Let us show that
$$|u(x)-u(y)|\le\frac{c}{2}n^{-\alpha/\beta} \tag{13.5}$$
for all $y$ such that $d(x,y)\le\delta n^{1/\beta}$, which would imply $u(y)\ge(c/2)n^{-\alpha/\beta}$ and hence prove $(NLE)$.
The function $u(y)$ is in the class $c_0(A)$ and solves the equation $\Delta u(y)=f(y)$ where
$$f(y):=p^A_{n+2}(x,y)-p^A_n(x,y).$$
On-diagonal upper bound $(DUE)$ implies, by Proposition 12.3,
$$\max_y|f(y)|\le\frac{C}{n^{\alpha/\beta+1}}. \tag{13.6}$$
By $(H)$ and Proposition 11.2, we have, for any $0<r<R$ and for some $\sigma\in(0,1)$,
$$\operatorname*{osc}_{B(x,\sigma r)}u\le 2\left(\overline{E}(x,r)+\varepsilon^2\overline{E}(x,R)\right)\max|f|. \tag{13.7}$$
By Proposition 6.1, $(E\le)$ implies a similar upper bound for $\overline{E}$. Estimating $\max|f|$ by (13.6), we obtain, from (13.7),
$$\operatorname*{osc}_{B(x,\sigma r)}u\le C\,\frac{r^\beta+\varepsilon^2R^\beta}{n^{\alpha/\beta+1}}.$$
Choosing $r$ to satisfy $r^\beta=\varepsilon^2R^\beta$ and substituting from (13.4) $n=\varepsilon R^\beta$, we obtain
$$\operatorname*{osc}_{B(x,\sigma r)}u\le C\,\frac{\varepsilon^2R^\beta}{n^{\alpha/\beta+1}}=C\varepsilon n^{-\alpha/\beta},$$
which implies
$$\operatorname*{osc}_{B(x,\sigma r)}u\le\frac{c}{2}n^{-\alpha/\beta}, \tag{13.8}$$
provided $\varepsilon$ is small enough. Note that
$$\sigma r=\sigma\varepsilon^{2/\beta}R=\sigma\varepsilon^{2/\beta}\left(\frac{n}{\varepsilon}\right)^{1/\beta}=\sigma\varepsilon^{1/\beta}n^{1/\beta}=\delta n^{1/\beta},$$
where $\delta:=\sigma\varepsilon^{1/\beta}$. Hence, (13.8) implies (13.5), provided $d(x,y)\le\delta n^{1/\beta}$, which was to be proved.
The final step in proving part $(V)+(G)\Longrightarrow(LE)$ of Theorem 2.1 is covered by the following statement. Denote by $(V\ge)$ the lower bound in $(V)$; that is,
$$V(x,R)\ge cR^\alpha,\qquad\forall x\in\Gamma,\ R\ge 1. \tag{13.9}$$

PROPOSITION 13.2
Assume that $(\Gamma,\mu)$ satisfies $(p_0)$. Then
$$(NLE)+(V\ge)\Longrightarrow(LE).$$
We precede the proof with the following lemmas. Denote for simplicity
$$\tilde{P}_n=P_n+P_{n+1}, \tag{13.10}$$
where $P_n$ is the $n$-convolution power of the Markov operator $P$. In particular, we have
$$P_nP_m=P_{n+m}. \tag{13.11}$$
We need a replacement for this property for the operator $\tilde{P}_n$, which is stated in Lemma 13.5.

LEMMA 13.3
Assume that $(p_0)$ holds on $(\Gamma,\mu)$. Then, for all integers $n\ge l\ge 1$ such that
$$n\equiv l\pmod 2, \tag{13.12}$$
we have
$$P_l(x,y)\le C^{n-l}P_n(x,y) \tag{13.13}$$
for all $x,y\in\Gamma$, with a constant $C=C(p_0)$.

Proof
By semigroup property (5.15), we have
$$P_{k+2}(x,y)=\sum_{z\in\Gamma}P_k(x,z)P_2(z,y)\ge P_k(x,y)P_2(y,y).$$
Using $(p_0)$, we obtain
$$P_2(y,y)=\sum_{z\sim y}P(y,z)P(z,y)\ge p_0\sum_{z\sim y}P(y,z)=p_0,$$
whence $P_{k+2}(x,y)\ge p_0P_k(x,y)$. Iterating this inequality, we obtain (13.13) with $C=p_0^{-1/2}$.
LEMMA 13.4
Assume that $(\Gamma,\mu)$ satisfies $(p_0)$. Then, for all integers $n\ge l\ge 1$ and all $x,y\in\Gamma$,
$$\tilde{P}_l(x,y)\le C^{n-l}\tilde{P}_n(x,y), \tag{13.14}$$
where $C=C(p_0)$.

Remark 13.1
Note that no parity condition is required here in contrast to condition (13.12) of Lemma 13.3.

Proof
This is an immediate consequence of Lemma 13.3 because both $P_l(x,y)$ and $P_{l+1}(x,y)$ can be estimated from above via either $P_n(x,y)$ or $P_{n+1}(x,y)$ depending on the parity of $n$ and $l$.

LEMMA 13.5
Assume that $(\Gamma,\mu)$ satisfies $(p_0)$. Then, for all $n,m\in\mathbb{N}$ and $x,y\in\Gamma$, we have the following inequality:
$$\tilde{P}_n\tilde{P}_m(x,y)\le C\tilde{P}_{n+m+1}(x,y), \tag{13.15}$$
where $C=C(p_0)$.

Proof
Observe that, by (13.10) and (13.11),
$$\tilde{P}_n\tilde{P}_m=(P_n+P_{n+1})(P_m+P_{m+1})=P_{n+m}+2P_{n+m+1}+P_{n+m+2}.$$
By Lemma 13.3, $P_{n+m}(x,y)\le CP_{n+m+2}(x,y)$, whence
$$\tilde{P}_n\tilde{P}_m(x,y)\le C(P_{n+m+1}+P_{n+m+2})(x,y)=C\tilde{P}_{n+m+1}(x,y).$$
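The two ingredients of Lemma 13.5 can be verified by direct matrix computation. The sketch below is only a sanity check on a hypothetical four-vertex graph (a 4-cycle with one chord, simple random walk, so $p_0=1/3$): it checks the step $P_{k+2}(x,y)\ge p_0P_k(x,y)$ from the proof of Lemma 13.3, the algebraic expansion of $\tilde{P}_n\tilde{P}_m$, and inequality (13.15) with the explicit constant $C=1+1/p_0$, which is one admissible choice obtained by following the proof rather than a constant stated in the text.

```python
from fractions import Fraction

EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]  # hypothetical example graph

def transition(edges, size):
    """Markov operator P(x, y) = 1/deg(x) of the simple random walk."""
    deg = [0] * size
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    P = [[Fraction(0)] * size for _ in range(size)]
    for a, b in edges:
        P[a][b] = Fraction(1, deg[a])
        P[b][a] = Fraction(1, deg[b])
    return P

def mul(A, B):
    """Matrix product of two square matrices of the same size."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def powers(P, n_max):
    """List [P_0, P_1, ..., P_{n_max}] with P_0 the identity."""
    size = len(P)
    out = [[[Fraction(int(i == j)) for j in range(size)] for i in range(size)]]
    for _ in range(n_max):
        out.append(mul(out[-1], P))
    return out

def tilde(pw, n):
    """Entrywise sum P_n + P_{n+1}, i.e. the operator from (13.10)."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(pw[n], pw[n + 1])]
```

With `pw = powers(transition(EDGES, 4), 10)`, the product `mul(tilde(pw, n), tilde(pw, m))` can be compared entrywise with `tilde(pw, n + m + 1)` scaled by `1 + 1/p0`.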
LEMMA 13.6
Assume that $(\Gamma,\mu)$ satisfies $(p_0)$. Then, for all $x,y\in\Gamma$ and $k,m,n\in\mathbb{N}$ such that $n\ge km+k-1$, we have the following inequality:
$$\left(\tilde{P}_m\right)^k(x,y)\le C^{n-km}\tilde{P}_n(x,y). \tag{13.16}$$

Proof
By induction, (13.15) implies
$$\left(\tilde{P}_m\right)^k(x,y)\le C^{k-1}\tilde{P}_{km+k-1}(x,y).$$
From inequality (13.14) with $l=km+k-1$, we obtain
$$\tilde{P}_{km+k-1}(x,y)\le C^{n-km-(k-1)}\tilde{P}_n(x,y),$$
whence (13.16) follows.

Proof of Proposition 13.2
Since $\tilde{P}_n(x,y)=(p_n(x,y)+p_{n+1}(x,y))\mu(y)$, $(NLE)$ can be stated as follows:
$$\tilde{P}_n(x,y)\ge cn^{-\alpha/\beta}\mu(y)\quad\text{if } d(x,y)\le\delta n^{1/\beta}. \tag{13.17}$$
The required $(LE)$ takes the form
$$\tilde{P}_n(x,y)\ge cn^{-\alpha/\beta}\mu(y)\exp\left(-\left(\frac{d^\beta(x,y)}{cn}\right)^{1/(\beta-1)}\right). \tag{13.18}$$
To prove (13.18), fix $x,y\in\Gamma$, $n\ge d(x,y)$, and consider the following cases:
Case 1: $d(x,y)\le\delta n^{1/\beta}$,
Case 2: $\delta n^{1/\beta}<d(x,y)\le\varepsilon n$,
Case 3: $\varepsilon n<d(x,y)\le n$.
Here $\delta$ is the constant from (13.17) and $\varepsilon>0$ is a small constant to be chosen later. In the first case, (13.18) coincides with (13.17). In the third case, (13.18) becomes
$$\tilde{P}_n(x,y)\ge cn^{-\alpha/\beta}\mu(y)\exp(-Cn), \tag{13.19}$$
which can be deduced directly from $(p_0)$. Indeed, depending on the parity of $n$, there is a path from $x$ to $y$ of length either $n$ or $n+1$. The $\mathbb{P}_x$-probability that the random walk follows this path is at least $p_0^{\,n+1}$, whence $\tilde{P}_n(x,y)\ge\exp(-Cn)$. This implies (13.19), using the fact that $\mu(y)\le C$. The latter is proved as follows. Take, in (13.17), $x\sim y$ and $n\simeq\delta^{-\beta}$. Then (13.17) implies
$$1\ge\tilde{P}_n(x,y)\ge c\delta^\alpha\mu(y),$$
whence $\mu(y)\le C$.
Consider the main second case. Denote $d=d(x,y)$, take a positive integer $k$ such that
$$k\le d, \tag{13.20}$$
Figure 9. The chain of balls $B(o_i,r)$

and define $m$ by
$$m=\left[\frac{n}{k}\right]-1. \tag{13.21}$$
Since $k\le d\le\varepsilon n$, we see that $n/k\ge\varepsilon^{-1}$ and that $m$ is positive. Since $n\ge k(m+1)$, Lemma 13.6 applies and yields
$$C^{n-mk}\tilde{P}_n(x,y)\ge\left(\tilde{P}_m\right)^k(x,y). \tag{13.22}$$
In order to estimate $(\tilde{P}_m)^k(x,y)$, observe that there exists a sequence $o_1,o_2,\dots,o_k$ of points on $\Gamma$ such that $x=o_1$, $y=o_k$, and, for all $i=1,2,\dots,k-1$,
$$d(o_i,o_{i+1})\le\frac{d(x,y)}{k}=:r \tag{13.23}$$
(see Figure 9). Clearly, we have
$$\left(\tilde{P}_m\right)^k(x,y)\ge\sum_{z_1\in B(o_1,r)}\cdots\sum_{z_{k-1}\in B(o_{k-1},r)}\tilde{P}_m(x,z_1)\tilde{P}_m(z_1,z_2)\cdots\tilde{P}_m(z_{k-1},y). \tag{13.24}$$
Assume that we have, in addition,
$$3r\le\delta m^{1/\beta}. \tag{13.25}$$
Since $d(z_{i-1},z_i)\le 3r$, each $\tilde{P}_m(z_{i-1},z_i)$ can be estimated by (13.17) as follows:
$$\tilde{P}_m(z_{i-1},z_i)\ge cm^{-\alpha/\beta}\mu(z_i).$$
The same applies to $\tilde{P}_m(x,z_1)$ and $\tilde{P}_m(z_{k-1},y)$. Using the lower bound of volume (13.9), we obtain, from (13.22) and (13.24),
$$C^{n-mk}\tilde{P}_n(x,y)\ge\left(cm^{-\alpha/\beta}\right)^{k}V(o_1,r)\cdots V(o_{k-1},r)\mu(y)\ge c^km^{-(\alpha/\beta)k}r^{\alpha(k-1)}\mu(y).$$
Hence,
$$\tilde{P}_n(x,y)\ge c^{\,n-mk+k}m^{-(\alpha/\beta)k}r^{\alpha(k-1)}\ge c^km^{-\alpha/\beta}\left(\frac{r}{m^{1/\beta}}\right)^{\alpha(k-1)}, \tag{13.26}$$
where we have used the fact that $n-mk+k\le 3k$, which follows from (13.21).
Before we go further, let us specify the choice of $k$ to ensure that both (13.20) and (13.25) hold. Using definitions (13.21) and (13.23) of $m$ and $r$, we see that (13.25) is equivalent to
$$C\,\frac{d}{k}\le\delta\left(\frac{n}{k}\right)^{1/\beta}$$
or
$$k\ge C\delta^{-\beta/(\beta-1)}\left(\frac{d^\beta}{n}\right)^{1/(\beta-1)}. \tag{13.27}$$
Let $k$ be the minimal possible integer satisfying (13.27). By the hypothesis $d\ge\delta n^{1/\beta}$, we have
$$k\simeq\left(\frac{d^\beta}{n}\right)^{1/(\beta-1)}. \tag{13.28}$$
Condition (13.20) follows from the hypothesis $n\ge\varepsilon^{-1}d$, provided $\varepsilon$ is small enough. From (13.28), (13.21), and (13.25), we obtain
$$m\simeq\left(\frac{n}{d}\right)^{\beta/(\beta-1)}\quad\text{and}\quad r\simeq\left(\frac{n}{d}\right)^{1/(\beta-1)}.$$
Hence, by (13.26) and $m\le n/k$,
$$\tilde{P}_n(x,y)\ge c^km^{-\alpha/\beta}\ge n^{-\alpha/\beta}k^{\alpha/\beta}\exp(-Ck)\ge n^{-\alpha/\beta}\exp(-C'k).$$
Substituting here $k$ from (13.28), we obtain (13.18).

14. Parity matters
Let us recall that $(LE)$ contains the estimate for $p_n+p_{n+1}$ rather than for $p_n$. In this section we discuss to what extent it is possible to estimate $p_n$ from below. In general, there is no lower bound for $p_n(x,y)$ for the parity reason. Indeed, on any bipartite graph, the length of any path from $x$ to $y$ has the same parity as $d(x,y)$. Therefore, $p_n(x,y)=0$ if $n\not\equiv d(x,y)\pmod 2$. We immediately obtain the following result for bipartite graphs.

PROPOSITION 14.1
If $(\Gamma,\mu)$ is bipartite and satisfies $(LE)$, then
$$p_n(x,y)\ge cn^{-\alpha/\beta}\exp\left(-\left(\frac{d(x,y)^\beta}{cn}\right)^{1/(\beta-1)}\right) \tag{14.1}$$
for all $x,y\in\Gamma$ and $n\ge 1$ such that
$$n\ge d(x,y)\quad\text{and}\quad n\equiv d(x,y)\pmod 2. \tag{14.2}$$

Proof
Indeed, assuming (14.2), $n+1$ and $d(x,y)$ have different parities whence $p_{n+1}(x,y)=0$, and (14.1) follows from $(LE)$.

If there is enough “mixing of parity” in the graph, then one does get the lower bound regardless of the parity of $n$ and $d(x,y)$.

PROPOSITION 14.2
Assume that graph $(\Gamma,\mu)$ satisfies $(p_0)$, $(LE)$, and the following “mixing” condition: there is an odd positive integer $n_0$ such that
$$\inf_{x\in\Gamma}P_{n_0}(x,x)>0. \tag{14.3}$$
Then lower bound (14.1) holds for all $n>n_0$ and $x,y\in\Gamma$, provided $n\ge d(x,y)$.

For example, if $n_0=1$, then hypothesis (14.3) means that each point $x\in\Gamma$ has a loop edge $xx$. If $n_0=3$ and there are no loops, then (14.3) means that, for each point $x\in\Gamma$, there is an edge triangle $xy$, $yz$, $zx$. This property holds, in particular, for the graphical Sierpiński gasket (see Figure 1).

Proof
By (9.2) we obtain, for any positive integer $m$,
$$p_{2m}(x,x)\ge\frac{1}{V(x,m+1)}\left(\sum_{z\in B(x,m+1)}p_m(x,z)\mu(z)\right)^2=\frac{1}{V(x,m+1)}.$$
Condition $(p_0)$ and Proposition 3.1 imply $V(x,m+1)\le C^{m+1}\mu(x)$, whence
$$P_{2m}(x,x)=p_{2m}(x,x)\mu(x)\ge C^{-m-1}.$$
Since we use this lower estimate only for the bounded range of $m\le m_0$, we can rewrite it as
$$P_{2m}(x,x)\ge c, \tag{14.4}$$
where $c=c(m_0)>0$. Assuming $n>n_0$, we have, by semigroup property (5.15),
$$p_n(x,y)=\sum_{z\in\Gamma}p_{n-n_0}(x,z)P_{n_0}(z,y)\ge p_{n-n_0}(x,y)P_{n_0}(y,y) \tag{14.5}$$
Figure 10. Every path of odd length from $x$ to $y$ goes through $o$ and $\xi$

and, in the same way,
$$p_n(x,y)\ge p_{n-n_0+1}(x,y)P_{n_0-1}(y,y). \tag{14.6}$$
By hypothesis (14.3), we can estimate $P_{n_0}(y,y)$ from below by a positive constant. Also, $P_{n_0-1}(y,y)$ is bounded below by a constant, as in (14.4). Hence, adding up (14.5) and (14.6), we obtain
$$p_n(x,y)\ge c\left(p_{n-n_0}(x,y)+p_{n-n_0+1}(x,y)\right). \tag{14.7}$$
The right-hand side of (14.7) can be estimated from below by $(LE)$, whence (14.1) follows.

Finally, let us show an example that explains why in general one cannot replace $p_n+p_{n+1}$ in $(LE)$ by $p_n$, even assuming the parity condition $n\equiv d(x,y)\pmod 2$.

Example 14.1
Let $(\Gamma,\mu)$ be $\mathbb{Z}^D$ with the standard weight $\mu_{xy}=1$ for $x\sim y$, and let $D>4$. We modify $\Gamma$ by adding one more edge $\xi$ of weight 1, which connects the origin $o=(0,0,\dots,0)$ to the point $(1,1,0,0,\dots,0)$, and we denote the new graph by $(\Gamma',\mu')$. Clearly, the volume growth and the Green kernel on $(\Gamma',\mu')$ are of the same order
as on $(\Gamma,\mu)$; that is,
$$V(x,r)\simeq r^D\quad\text{and}\quad g(x,y)\simeq d(x,y)^{2-D}.$$
Hence, for both graphs one has, by Theorem 2.1,
$$p_n(x,y)\le Cn^{-D/2}\exp\left(-\frac{d^2(x,y)}{Cn}\right) \tag{14.8}$$
and a similar lower bound $(LE)$ for $p_n+p_{n+1}$. Since $\mathbb{Z}^D$ is bipartite, we have for $(\Gamma,\mu)$, by Proposition 14.1,
$$p_n(x,y)\ge cn^{-D/2}\exp\left(-\frac{d^2(x,y)}{cn}\right)\quad\text{if } n\ge d(x,y)\text{ and } n\equiv d(x,y)\pmod 2. \tag{14.9}$$
Let us show that $(\Gamma',\mu')$ does not satisfy (14.9). Fix some (large) odd integer $m$, and consider points $x=(m,m,0,0,\dots,0)$ and $y=-x$ (see Figure 10). The distance $d(x,y)$ on $\Gamma$ is equal to $4m$, whereas the distance $d'(x,y)$ on $\Gamma'$ is $4m-1$, due to the shortcut $\xi$. Denote $n=m^2$. Then $n\equiv d'(x,y)\pmod 2$ and $n>d'(x,y)$. Let us estimate from above $p_n(x,y)$ on $(\Gamma',\mu')$ and show that it does not satisfy lower bound (14.9).
Since $n$ is odd and all odd paths from $x$ to $y$ have to go through the edge $\xi$, the strong Markov property yields
$$p_n(x,y)=\sum_{k=0}^{n}\mathbb{P}_x(\tau=k)\,p_{n-k}(o,y), \tag{14.10}$$
where $\tau$ is the first time the random walk hits the point $o$. If $n-k<m$, then $p_{n-k}(o,y)=0$. If $n-k\ge m$, then we estimate $p_{n-k}(o,y)$ by (14.8) as follows:
$$p_{n-k}(o,y)\le\frac{C}{(n-k)^{D/2}}\le\frac{C}{m^{D/2}}.$$
Therefore, (14.10) implies
$$p_n(x,y)\le Cm^{-D/2}\,\mathbb{P}_x\{\tau<\infty\}.$$
The $\mathbb{P}_x$-probability to hit $o$ is of the order $g(x,o)\simeq m^{2-D}$. Hence, we obtain
$$p_n(x,y)\le Cm^{-(3D/2-2)}=Cn^{-(3D/4-1)}=o\left(n^{-D/2}\right)$$
so that lower bound (14.9) cannot hold. A more careful argument shows that, in fact, $p_n(x,y)\simeq n^{-(D-1)}$.
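The exponent bookkeeping at the end of Example 14.1 can be double-checked mechanically. The sketch below is only an arithmetic aid with hypothetical function names: it combines the factors $m^{-D/2}$ and $m^{2-D}$, converts to powers of $n=m^2$, and compares the result with the exponent $D/2$ of the lower bound (14.9). Since $d'(x,y)^2/n$ stays bounded for the chosen points, the exponential factor in (14.9) is treated as a constant.

```python
from fractions import Fraction

def shortcut_exponent(D):
    """Exponent a in p_n(x, y) <= C n^(-a) obtained in Example 14.1."""
    power_of_m = Fraction(D, 2) + (D - 2)  # m^(-D/2) * m^(2-D) = m^(-(3D/2 - 2))
    return power_of_m / 2                  # n = m^2, so the exponent in n is halved

def bipartite_exponent(D):
    """Exponent D/2 of the on-scale lower bound (14.9)."""
    return Fraction(D, 2)

def contradicts_lower_bound(D):
    """True when the upper bound decays strictly faster than n^(-D/2)."""
    return shortcut_exponent(D) > bipartite_exponent(D)
```

`contradicts_lower_bound(D)` is true exactly when $D>4$, which is the dimension restriction imposed in the example.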
15. Consequences of the heat kernel estimates
Here we prove the remaining part of Theorem 2.1, as stated in the next proposition.

PROPOSITION 15.1
Assuming $(p_0)$, we have
$$(LE)+(UE)\Longrightarrow(V)+(G).$$

Proof
The Green kernel is related to the heat kernel by
$$g(x,y)=\sum_{n=0}^{\infty}p_n(x,y). \tag{15.1}$$
Let $x\ne y$. Then $p_0(x,y)=0$, and the upper bound $(UE)$ for $p_n$ implies the upper bound for $g$ as follows:
$$g(x,y)\le C\sum_{n=1}^{\infty}n^{-\alpha/\beta}\exp\left(-c\left(\frac{d^\beta}{n}\right)^{1/(\beta-1)}\right),$$
where $d=d(x,y)$. By estimating the sum via an integral, we obtain $g(x,y)\le Cd^{-\gamma}$ with $\gamma=\alpha-\beta$. Similarly, one proves the lower bound $g(x,y)\ge cd^{-\gamma}$ using $(LE)$ and the obvious consequence of (15.1):
$$g(x,y)\ge\frac12\sum_{n=1}^{\infty}\left(p_n(x,y)+p_{n+1}(x,y)\right).$$
Let us prove the upper bound for the volume
$$V(x,R)\le CR^\alpha \tag{$V\le$}$$
for any $x\in\Gamma$ and $R\ge 1$. Indeed, for any $n\in\mathbb{N}$, we have
$$\sum_{y\in\Gamma}p_n(x,y)\mu(y)\equiv 1, \tag{15.2}$$
whence
$$\sum_{y\in B(x,R)}\left(p_n(x,y)+p_{n+1}(x,y)\right)\mu(y)\le 2$$
and
$$V(x,R)\le 2\left(\inf_{y\in B(x,R)}\left(p_n(x,y)+p_{n+1}(x,y)\right)\right)^{-1}.$$
Taking $n\simeq R^\beta$ and applying $(LE)$, we see that the inf is bounded below by $cn^{-\alpha/\beta}\simeq R^{-\alpha}$ whence $(V\le)$ follows.
Let us prove the lower bound for the volume
$$V(x,R)\ge cR^\alpha. \tag{$V\ge$}$$
We first show that $(UE)$ and $(V\le)$ imply the following inequality:
$$\sum_{y\notin B(x,R)}p_n(x,y)\mu(y)\le\frac12,\qquad\forall n\le\varepsilon R^\beta, \tag{15.3}$$
provided $\varepsilon>0$ is sufficiently small. Denoting $R_k=2^kR$, we have
$$\begin{aligned}\sum_{y\notin B(x,R)}p_n(x,y)\mu(y)&\le C\sum_{y\notin B(x,R)}n^{-\alpha/\beta}\exp\left(-c\left(\frac{d(x,y)^\beta}{n}\right)^{1/(\beta-1)}\right)\mu(y)\\&\le C\sum_{k=0}^{\infty}\ \sum_{y\in B(x,R_{k+1})\setminus B(x,R_k)}n^{-\alpha/\beta}\exp\left(-c\left(\frac{R_k^\beta}{n}\right)^{1/(\beta-1)}\right)\mu(y)\\&\le C\sum_{k=0}^{\infty}R_k^\alpha n^{-\alpha/\beta}\exp\left(-c\left(\frac{R_k^\beta}{n}\right)^{1/(\beta-1)}\right)\\&=C\sum_{k=0}^{\infty}\left(\frac{2^kR}{n^{1/\beta}}\right)^{\alpha}\exp\left(-c\left(\frac{2^kR}{n^{1/\beta}}\right)^{\beta/(\beta-1)}\right).\end{aligned} \tag{15.4}$$
If $R/n^{1/\beta}$ is large enough, then the right-hand side of (15.4) is majorized by a geometric series, and the sum can be made arbitrarily small, in particular, smaller than $1/2$.
From (15.2) and (15.3), we conclude that
$$\sum_{y\in B(x,R)}p_n(x,y)\mu(y)\ge\frac12, \tag{15.5}$$
whence
$$V(x,R)\ge\frac12\left(\sup_{y\in B(x,R)}p_n(x,y)\right)^{-1}.$$
Finally, choosing $n=[\varepsilon R^\beta]$ and using the upper bound $p_n(x,y)\le Cn^{-\alpha/\beta}$, we obtain $(V\ge)$.
This argument works only if $\varepsilon R^\beta\ge 1$. Let us now prove $(V\ge)$ for the opposite case when $\varepsilon R^\beta<1$. To that end, define $R_0$ by $\varepsilon R_0^\beta=1$. Then we have $R<R_0$. By hypothesis $(p_0)$ and Proposition 3.1, we have $V(x,R_0)\le C\mu(x)$. Combining with the lower bound $(V\ge)$ for $V(x,R_0)$, we obtain $\mu(x)\ge c>0$. In particular, for any $R>0$, we have $V(x,R)\ge c$, which implies $(V\ge)$ for the bounded range of $R$.
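The step "by estimating the sum via an integral" in the proof above, which turns $(UE)$ into $g(x,y)\le Cd^{-\gamma}$ with $\gamma=\alpha-\beta$, can be illustrated numerically. The sketch below is a floating-point illustration under stated assumptions: the constant $c$ in the exponent is set to 1, the series is truncated at a hypothetical cutoff `n_max`, and the function names are not from the paper. It evaluates the series and rescales it by $d^{\alpha-\beta}$; the rescaled values should stay between two positive constants as $d$ varies.

```python
import math

def green_sum(d, alpha, beta, c=1.0, n_max=200_000):
    """Truncated series sum_{n=1}^{n_max} n^(-alpha/beta) * exp(-c (d^beta / n)^(1/(beta-1)))."""
    nu = alpha / beta
    power = 1.0 / (beta - 1.0)
    return sum(n ** (-nu) * math.exp(-c * (d ** beta / n) ** power)
               for n in range(1, n_max + 1))

def scaled(d, alpha, beta):
    """The series multiplied by d^(alpha - beta); expected to stay bounded above and below."""
    return green_sum(d, alpha, beta) * d ** (alpha - beta)
```

For instance, with the Gaussian exponents $\alpha=3$, $\beta=2$ (so $\gamma=1$), the values `scaled(d, 3, 2)` for `d = 2, 4, 8, 16` remain of the same order; the truncation requires `n_max` to be much larger than `d ** beta`.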
Remark 15.1
Using a similar argument, one can also show the following implication:
$$(V)+(UE)+(H)\Longrightarrow(LE). \tag{15.6}$$
Indeed, as we have seen in the proof of Proposition 15.1, $(UE)$ implies $(G\le)$, which, together with $(V)$, is enough to obtain $(E\le)$ (see Proposition 6.3). From $(UE)$ and $(V)$, one obtains the diagonal lower bound $p_{2n}(x,x)\ge cn^{-\alpha/\beta}$. Indeed, from (9.2) and (15.5) with $R=Cn^{1/\beta}$, we deduce
$$p_{2n}(x,x)\ge\frac{1}{V(x,R)}\left(\sum_{y\in B(x,R)}p_n(x,y)\mu(y)\right)^2\ge\frac{1}{4V(x,R)}\simeq n^{-\alpha/\beta}.$$
From this estimate, one gets $(DLE)$ (see [56]; the argument is similar to the proof of (6.6)). Also, $(DUE)$ follows trivially from $(UE)$. Hence, having $(DUE)$, $(DLE)$, $(E\le)$, and $(H)$, we obtain $(NLE)$ by Proposition 13.1, and then we deduce $(LE)$ from $(NLE)+(V)$ by Proposition 13.2.
Implication (15.6) yields that $(V)+(UE)+(H)$ is equivalent to either of our main conditions $(V)+(G)$ and $(UE)+(LE)$. Indeed, we have
$$(V)+(G)\Longrightarrow(V)+(UE)+(H)\Longrightarrow(UE)+(LE),$$
where the first implication follows by Theorem 2.1 and Proposition 10.1, and the second is the same as (15.6). We are left to close the circle by Theorem 2.1 or Proposition 15.1.

Appendix. The list of the lettered conditions
Here we provide a list of the lettered conditions frequently used in the paper. The relations among the exponents $\alpha,\beta,\gamma,\nu$ are as follows:
$$\alpha>\beta\ge 2,\qquad\gamma=\alpha-\beta,\qquad\text{and}\qquad\nu=\alpha/\beta.$$
In all conditions, $n$ is an arbitrary positive integer, $R$ is an arbitrary positive real number, and $x,y$ are arbitrary points on $\Gamma$, subject to additional restrictions if any. The constants $C,c,\delta,\varepsilon,p_0$ are positive. We have the following list:
$$V(x,R)\simeq R^\alpha,\qquad\forall R\ge 1, \tag{V}$$
$$E(x,R)\simeq R^\beta,\qquad\forall R\ge 1, \tag{E}$$
$$g(x,y)\simeq d(x,y)^{-\gamma},\qquad x\ne y, \tag{G}$$
$$V(x,2R)\le CV(x,R), \tag{D}$$
$$\overline{E}(x,R)\le CE(x,R), \tag{$\overline{E}$}$$
$$\lambda_1(A)\ge c\mu(A)^{-1/\nu}\quad\text{for all nonempty finite sets } A\subset\Gamma, \tag{FK}$$
$$p_n(x,x)\le Cn^{-\nu}, \tag{DUE}$$
$$p_n(x,y)\le Cn^{-\alpha/\beta}\exp\left(-\left(\frac{d(x,y)^\beta}{Cn}\right)^{1/(\beta-1)}\right), \tag{UE}$$
$$(p_n+p_{n+1})(x,y)\ge cn^{-\alpha/\beta}\exp\left(-\left(\frac{d(x,y)^\beta}{cn}\right)^{1/(\beta-1)}\right)\quad\text{if } n\ge d(x,y), \tag{LE}$$
$$p^{B(x,R)}_{2n}(x,x)\ge cn^{-\alpha/\beta}\quad\text{if } n\le\varepsilon R^\beta, \tag{DLE}$$
$$p_n(x,y)+p_{n+1}(x,y)\ge cn^{-\alpha/\beta}\quad\text{if } d(x,y)\le\delta n^{1/\beta}, \tag{NLE}$$
$$\Psi_n(x,R):=\mathbb{P}_x\left(T_{B(x,R)}\le n\right)\le C\exp\left(-\left(\frac{R^\beta}{Cn}\right)^{1/(\beta-1)}\right), \tag{$\Psi$}$$
$$P(x,y)\ge p_0\quad\text{if } x\sim y, \tag{$p_0$}$$
$$\max_{B(x,R)}u\le H\min_{B(x,R)}u \tag{H}$$
for any function $u$ nonnegative in $\overline{B(x,2R)}$ and harmonic in $B(x,2R)$.

References
[1]
[2] [3]
D. G. ARONSON, Non-negative solutions of linear parabolic equations, Ann. Scuola
Norm. Sup. Pisa (3) 22 (1968), 607–694, MR 55:8553; Addendum, Ann. Scuola Norm. Sup. Pisa (3) 25 (1971), 221–228. MR 55:8554 467 P. AUSCHER, Regularity theorems and heat kernel for elliptic operators, J. London Math. Soc. (2) 54 (1996), 284–296. MR 97f:35034 467 P. AUSCHER and T. COULHON, Gaussian lower bounds for random walks from elliptic regularity, Ann. Inst. H. Poincar´e Probab. Statist. 35 (1999), 605–630. MR 2000m:60086 467
SUB-GAUSSIAN ESTIMATES OF HEAT KERNELS
[4]
[5]
[6] [7] [8]
[9] [10] [11]
[12]
[13]
[14]
[15] [16] [17]
[18]
[19] [20] [21]
507
[4] M. T. BARLOW, “Diffusions on fractals” in Lectures on Probability Theory and Statistics (Saint-Flour, France, 1995), Lecture Notes in Math. 1690, Springer, Berlin, 1998, 1–121. MR 2000a:60148 452, 458, 459
[5] M. T. BARLOW and R. F. BASS, The construction of Brownian motion on the Sierpiński carpet, Ann. Inst. H. Poincaré Probab. Statist. 25 (1989), 225–257. MR 91d:60183 465
[6] M. T. BARLOW and R. F. BASS, Transition densities for Brownian motion on the Sierpiński carpet, Probab. Theory Related Fields 91 (1992), 307–330. MR 93k:60203 465, 467
[7] M. T. BARLOW and R. F. BASS, Brownian motion and harmonic analysis on Sierpinski carpets, Canad. J. Math. 51 (1999), 673–744. MR 2000i:60083 453, 459, 465, 467
[8] M. T. BARLOW and R. F. BASS, “Random walks on graphical Sierpinski carpets” in Random Walks and Discrete Potential Theory (Cortona, Italy, 1997), Sympos. Math. 39, Cambridge Univ. Press, Cambridge, 1999, 26–55. MR CMP 1 802 425 453, 458, 459, 467
[9] M. T. BARLOW, T. COULHON, and A. GRIGOR’YAN, Manifolds and graphs with slow heat kernel decay, to appear in Invent. Math. 458, 464
[10] M. T. BARLOW and E. A. PERKINS, Brownian motion on the Sierpiński gasket, Probab. Theory Related Fields 79 (1988), 543–623. MR 89g:60241 453
[11] I. BENJAMINI, I. CHAVEL, and E. A. FELDMAN, Heat kernel lower bounds on Riemannian manifolds using the old ideas of Nash, Proc. London Math. Soc. (3) 72 (1996), 215–240. MR 97c:58150 467
[12] A. BOUKRICHA, Das Picard-Prinzip und verwandte Fragen bei Störung von harmonischen Räumen, Math. Ann. 239 (1979), 247–270. MR 81h:31018 467, 486
[13] E. A. CARLEN, S. KUSUOKA, and D. W. STROOCK, Upper bounds for symmetric Markov transition functions, Ann. Inst. H. Poincaré Probab. Statist. 23 (1987), 245–287. MR 88i:35066 467, 470
[14] G. CARRON, “Inégalités isopérimétriques de Faber-Krahn et conséquences” in Actes de la table ronde de géométrie différentielle (Luminy, France, 1992), Sémin. Congr. 1, Soc. Math. France, Montrouge, 1996, 205–232. MR 97m:58198 465, 469
[15] I. CHAVEL, Eigenvalues in Riemannian Geometry, Pure Appl. Math. 115, Academic Press, Orlando, 1984. MR 86g:58140 452
[16] I. CHAVEL, Isoperimetric inequalities and heat diffusion on Riemannian manifolds, lecture notes, 1999. 452
[17] S. Y. CHENG, P. LI, and S.-T. YAU, On the upper estimate of the heat kernel of a complete Riemannian manifold, Amer. J. Math. 103 (1981), 1021–1063. MR 83c:58083 467, 491
[18] S. Y. CHENG and S.-T. YAU, Differential equations on Riemannian manifolds and their geometric applications, Comm. Pure Appl. Math. 28 (1975), 333–354. MR 52:6608 459
[19] F. R. K. CHUNG, Spectral Graph Theory, CBMS Regional Conf. Ser. in Math. 92, Amer. Math. Soc., Providence, 1997. MR 97k:58183 460
[20] T. COULHON, Ultracontractivity and Nash type inequalities, J. Funct. Anal. 141 (1996), 510–539. MR 97j:47055 469
[21] T. COULHON and A. GRIGOR’YAN, On-diagonal lower bounds for heat kernels and Markov chains, Duke Math. J. 89 (1997), 133–199. MR 98e:58159 467
[22] T. COULHON and A. GRIGOR’YAN, Random walks on graphs with regular volume growth, Geom. Funct. Anal. 8 (1998), 656–701. MR 99e:60153 458, 460, 471
[23] T. COULHON and L. SALOFF-COSTE, Puissances d’un opérateur régularisant, Ann. Inst. H. Poincaré Probab. Statist. 26 (1990), 419–436. MR 91j:43002 467
[24] T. COULHON and L. SALOFF-COSTE, Minorations pour les chaînes de Markov unidimensionnelles, Probab. Theory Related Fields 97 (1993), 423–431. MR 95b:60085 467
[25] E. B. DAVIES, Heat Kernels and Spectral Theory, Cambridge Tracts in Math. 92, Cambridge Univ. Press, Cambridge, 1989. MR 90e:35123 452
[26] E. B. DAVIES, Pointwise bounds on the space and time derivatives of heat kernels, J. Operator Theory 21 (1989), 367–378. MR 90k:58214 467
[27] E. B. DAVIES, Non-Gaussian aspects of heat kernel behaviour, J. London Math. Soc. (2) 55 (1997), 105–125. MR 97i:58169 467
[28] T. DELMOTTE, Parabolic Harnack inequality and estimates of Markov chains on graphs, Rev. Mat. Iberoamericana 15 (1999), 181–232. MR 2000b:35103 453, 458
[29] E. B. FABES and D. W. STROOCK, A new proof of Moser’s parabolic Harnack inequality using the old ideas of Nash, Arch. Rational Mech. Anal. 96 (1986), 327–338. MR 88b:35037 467
[30] S. GOLDSTEIN, “Random walks and diffusion on fractals” in Percolation Theory and Ergodic Theory of Infinite Particle Systems (Minneapolis, 1984/85), ed. H. Kesten, IMA Vol. Math. Appl. 8, Springer, New York, 1987, 121–129. MR 88g:60245 453
[31] A. A. GRIGOR’YAN, The heat equation on non-compact Riemannian manifolds (in Russian), Mat. Sb. 182, no. 1 (1991), 55–87; English translation in Math. USSR-Sb. 72, no. 1 (1992), 47–77. MR 92h:58189 453
[32] A. A. GRIGOR’YAN, Heat kernel upper bounds on a complete non-compact manifold, Rev. Mat. Iberoamericana 10 (1994), 395–452. MR 96b:58107 465, 469
[33] A. A. GRIGOR’YAN, Integral maximum principle and its applications, Proc. Roy. Soc. Edinburgh Sect. A 124 (1994), 353–362. MR 95c:35045 465
[34] A. A. GRIGOR’YAN, Upper bounds of derivatives of the heat kernel on an arbitrary complete manifold, J. Funct. Anal. 127 (1995), 363–389. MR 96a:58183 467
[35] A. A. GRIGOR’YAN, “Estimates of heat kernels on Riemannian manifolds” in Spectral Theory and Geometry (Edinburgh, 1998), ed. B. Davies and Yu. Safarov, London Math. Soc. Lecture Note Ser. 273, Cambridge Univ. Press, Cambridge, 1999, 140–225. MR 2001b:58040 452
[36] B. M. HAMBLY and T. KUMAGAI, Transition density estimates for diffusion processes on post critically finite self-similar fractals, Proc. London Math. Soc. (3) 78 (1999), 431–458. MR 99m:60118 453
[37] W. HEBISCH and L. SALOFF-COSTE, Gaussian estimates for Markov chains and random walks on groups, Ann. Probab. 21 (1993), 673–709. MR 94m:60144 452, 458
[38] W. HEBISCH and L. SALOFF-COSTE, On the relation between elliptic and parabolic Harnack inequalities, preprint, 2000. 467, 468
[39] O. D. JONES, Transition probabilities for the simple random walk on the Sierpiński graph, Stochastic Process. Appl. 61 (1996), 45–69. MR 97b:60115 453, 458
[40] J. KIGAMI, Harmonic calculus on p.c.f. self-similar sets, Trans. Amer. Math. Soc. 335 (1993), 721–755. MR 93d:39008 453
[41] J. KIGAMI, Harmonic calculus on limits of networks and its application to dendrites, J. Funct. Anal. 128 (1995), 48–86. MR 96e:60130 453
[42] J. KIGAMI and M. L. LAPIDUS, Weyl’s problem for the spectral distribution of Laplacians on p.c.f. self-similar fractals, Comm. Math. Phys. 158 (1993), 93–125. MR 94m:58225 453
[43] N. V. KRYLOV and M. V. SAFONOV, A certain property of solutions of parabolic equations with measurable coefficients (in Russian), Izv. Akad. Nauk SSSR Ser. Mat. 44, no. 1 (1980), 161–175, 239; English translation in Math. USSR-Izv. 16 (1981), 151–164. MR 83c:35059 467
[44] T. KUMAGAI, Estimates of transition densities for Brownian motion on nested fractals, Probab. Theory Related Fields 96 (1993), 205–224. MR 94e:60068 453
[45] S. KUSUOKA and Z. X. YIN [X. Y. ZHOU], Dirichlet forms on fractals: Poincaré constant and resistance, Probab. Theory Related Fields 93 (1992), 169–196. MR 94e:60069 453
[46] E. M. LANDIS, The Second Order Equations of Elliptic and Parabolic Type (in Russian), Izdat. “Nauka,” Moscow, 1971, MR 47:9044; English translation in Transl. Math. Monogr. 171, Amer. Math. Soc., Providence, 1998. MR 98k:35034 467
[47] P. LI and S.-T. YAU, On the parabolic kernel of the Schrödinger operator, Acta Math. 156 (1986), 153–201. MR 87f:58156 452
[48] F. LUST-PIQUARD, Lower bounds on ‖K^n‖_{1→∞} for some contractions K of L^2(μ), with applications to Markov operators, Math. Ann. 303 (1995), 699–712. MR 96m:47055 458, 467
[49] J. MOSER, On Harnack’s theorem for elliptic differential equations, Comm. Pure Appl. Math. 14 (1961), 577–591. MR 28:2356 467
[50] J. MOSER, A Harnack inequality for parabolic differential equations, Comm. Pure Appl. Math. 17 (1964), 101–134, MR 28:2357; Correction, Comm. Pure Appl. Math. 20 (1967), 231–236. MR 34:3121 467
[51] J. NASH, Continuity of solutions of parabolic and elliptic equations, Amer. J. Math. 80 (1958), 931–954. MR 20:6592 470
[52] F. O. PORPER and S. D. ÈĬDEL’MAN, Two-side estimates of fundamental solutions of second-order parabolic equations and some applications (in Russian), Uspekhi Mat. Nauk 39, no. 3 (1984), 107–156; English translation in Russian Math. Surveys 39, no. 3 (1984), 119–179. MR 86b:35078 452
[53] L. SALOFF-COSTE, A note on Poincaré, Sobolev, and Harnack inequalities, Internat. Math. Res. Notices 1992, 27–38. MR 93d:58158 453
[54] L. SALOFF-COSTE, Isoperimetric inequalities and decay of iterated kernels for almost-transitive Markov chains, Combin. Probab. Comput. 4 (1995), 419–442. MR 97c:60171 458
[55] R. SCHOEN and S.-T. YAU, Lectures on Differential Geometry, Conf. Proc. Lecture Notes Geom. Topology 1, International Press, Cambridge, Mass., 1994. MR 97d:53001 452
[56] D. W. STROOCK, “Estimates on the heat kernel for second order divergence form operators” in Probability Theory (Singapore, 1989), ed. L. H. Y. Chen, K. P. Choi, K. Hu, and J. H. Lou, de Gruyter, Berlin, 1992, 29–44. MR 93m:35092 467, 505
[57] K-T. STURM, Analysis on local Dirichlet spaces, III: The parabolic Harnack inequality, J. Math. Pures Appl. (9) 75 (1996), 273–297. MR 97k:31010 454
[58] M. TAKEDA, On a martingale method for symmetric diffusion processes and its applications, Osaka J. Math. 26 (1989), 605–623. MR 91d:60193 465
[59] A. TELCS, Random walks on graphs, electric networks and fractals, Probab. Theory Related Fields 82 (1989), 435–449. MR 90h:60065 453, 458, 465, 475, 477, 478
[60] A. TELCS, Spectra of graphs and fractal dimensions, I, Probab. Theory Related Fields 85 (1990), 489–497. MR 91k:60075 478
[61] A. TELCS, Spectra of graphs and fractal dimensions, II, J. Theoret. Probab. 8 (1995), 77–96. MR 96d:60107 478
[62] A. TELCS, Transition probability estimates for reversible Markov chains, Electron. Comm. Probab. 5 (2000), 29–37, http://www.math.washington.edu/~ejpecp/ MR 2001b:60088 458
[63] N. TH. VAROPOULOS, Hardy-Littlewood theory for semigroups, J. Funct. Anal. 63 (1985), 240–260. MR 87a:31011 469
[64] N. TH. VAROPOULOS, Analysis on nilpotent groups, J. Funct. Anal. 66 (1986), 406–431. MR 88h:22014 452
[65] N. TH. VAROPOULOS, L. SALOFF-COSTE, and T. COULHON, Analysis and Geometry on Groups, Cambridge Tracts in Math. 100, Cambridge Univ. Press, Cambridge, 1992. MR 95f:43008 452
[66] W. WOESS, Random walks on infinite graphs and groups - a survey on selected topics, Bull. London Math. Soc. 26 (1994), 1–60. MR 94i:60081 452, 459
[67] X. Y. ZHOU, Resistance dimension, random walk dimension and fractal dimension, J. Theoret. Probab. 6 (1993), 635–652. MR 95d:60119 453
Grigor’yan
Department of Mathematics, Imperial College, 180 Queen’s Gate, London SW7 2BZ, United Kingdom; [email protected]

Telcs
International Management Center, Graduate School of Business, Budapest, Zrinyi u. 14, Budapest H-1051, Hungary; [email protected]
DUKE MATHEMATICAL JOURNAL © 2001 Vol. 109, No. 3

THE GLOBAL NILPOTENT VARIETY IS LAGRANGIAN
VICTOR GINZBURG
Abstract
The purpose of this paper is to present a short elementary proof of a theorem due to G. Faltings and G. Laumon, which says that the global nilpotent cone is a Lagrangian substack in the cotangent bundle of the moduli space of $G$-bundles on a complex compact curve. This result plays a crucial role in the geometric Langlands program (see [BD]) since it ensures that the $\mathcal{D}$-modules on the moduli space of $G$-bundles whose characteristic variety is contained in the global nilpotent cone are automatically holonomic and, in particular, have finite length.

Received 16 August 2000. Revision received 13 October 2000.
2000 Mathematics Subject Classification. Primary 53D12, 14D20.

Let $(M, \omega)$ be a smooth symplectic algebraic variety. A (possibly singular) algebraic subvariety $Y \subset M$ is said to be isotropic, respectively, Lagrangian, if the tangent space $T_yY$ at any regular point $y \in Y$ is an isotropic, respectively, Lagrangian, vector subspace in the symplectic vector space $T_yM$. (We always assume $Y$ to be reduced but not necessarily irreducible.) The following characterisation of isotropic subvarieties, proved, for example, in [CG, Prop. 1.3.30], is used later: $Y \subset M$ is isotropic if and only if, for any smooth locally closed subvariety $W \subset Y$, we have $\omega|_W = 0$. (Here $W$ is possibly contained in the singular locus of $Y$.) An advantage of this characterisation is that it allows us to extend the notion of “being isotropic” from algebraic subvarieties to semialgebraic constructible subsets. Thus, we call a semialgebraic constructible subset $Y \subset M$ isotropic if $\omega|_W = 0$ for any smooth locally closed algebraic variety $W \subset Y$.

Now, let $M$ be a smooth stack that can be locally presented as $p : \tilde M \twoheadrightarrow M$, where $\tilde M$ is a smooth algebraic variety and $p$ is a smooth surjective morphism; for example, $M$ is locally isomorphic to the quotient of a smooth algebraic variety modulo an algebraic action of an algebraic group (see, e.g., [LMB]). We have a natural diagram
$$T^*M \twoheadleftarrow T^*M \times_M \tilde M \hookrightarrow T^*\tilde M .$$
A substack $Z \subset T^*M$ is said to be constructible, respectively, isotropic or Lagrangian, if $Z \times_M \tilde M$ is a constructible, respectively, isotropic or Lagrangian, subset of $T^*\tilde M$ relative to the standard symplectic structure on the cotangent bundle of a smooth variety.

Let $X$ be a smooth complex compact connected algebraic curve of genus $g > 1$, and let $G$ be a complex semisimple* group. Below we write $\mathrm{Bun}_G$ for the moduli space of principal algebraic $G$-bundles on $X$, regarded as a stack (cf. [LMB]) rather than a scheme. In particular, no stability conditions on principal bundles $P \in \mathrm{Bun}_G$ are imposed. Given a principal $G$-bundle $P$, let $\mathfrak{g}_P$ and $\mathfrak{g}^*_P$ denote the associated vector bundles corresponding to the adjoint and coadjoint representations of $G$, respectively.

Let $\mathcal{P}$ be the universal bundle on $\mathrm{Bun}_G \times X$, and let $p : \mathrm{Bun}_G \times X \to \mathrm{Bun}_G$ be the projection. The cotangent stack, $T^*\mathrm{Bun}_G$, is a stack (see [BD, Sec. 1.1.1]) that is relatively representable over $\mathrm{Bun}_G$ by the affine spectrum of the sheaf of algebras $\mathrm{Sym}(R^1p_*\mathfrak{g}_{\mathcal{P}})$, the symmetric algebra of the first derived pushforward sheaf. Note that for all $i > 1$ we have $R^ip_*\mathfrak{g}_{\mathcal{P}} = 0$ since $\dim X = 1$. Hence, the formation $R^1p_*(-)$ is right-exact and therefore commutes with base change. For a scheme $S$ and a morphism $S \to \mathrm{Bun}_G$, write $P(S)$ for the pullback of the universal bundle to $S \times X$. Using the base change, one obtains the following (stack version of the) Kodaira-Spencer formula for the set of $S$-points of the fiber of $T^*\mathrm{Bun}_G$ over $P(S)$:
$$
\begin{aligned}
T^*_{P(S)}\mathrm{Bun}_G
&= \Gamma\bigl(S, \mathcal{H}om_{\mathrm{Coh}}(R^1p_*\mathfrak{g}_{P(S)}, \mathcal{O}_S)\bigr)
 = \Gamma\bigl(S, \mathcal{H}om_{D^b(\mathrm{Coh})}(Rp_*\mathfrak{g}_{P(S)}[1], \mathcal{O}_S)\bigr) \\
&= \Gamma\bigl(S, \mathcal{H}om_{D^b(\mathrm{Coh})}(\mathfrak{g}_{P(S)}, Rp^!\mathcal{O}_S[-1])\bigr)
 = \Gamma\bigl(S \times X, \mathfrak{g}^*_{P(S)} \otimes \Omega_{X\times S/S}\bigr) \\
&\simeq \Gamma\bigl(S \times X, \mathfrak{g}_{P(S)} \otimes \Omega_{X\times S/S}\bigr),
\end{aligned}
\qquad (1)
$$
where the second isomorphism exploits the fact that the complex $Rp_*\mathfrak{g}_{P(S)}[1]$ is concentrated in nonpositive degrees and the last isomorphism uses the identification $\mathfrak{g}^*_P \simeq \mathfrak{g}_P$ induced by the Killing form on $\mathfrak{g}$.
* It is not hard to extend our results to any reductive group, but that would lead to unpleasant dimension shifts in various formulas below, so we restrict ourselves to the semisimple case.

Write $\mathcal{N}$ for the nilpotent cone in $\mathfrak{g}$, the zero variety of the set of $\operatorname{Ad} G$-invariant polynomials on $\mathfrak{g}$ without constant term. Choose a Borel subgroup $B \subset G$ with Lie algebra $\mathfrak{b}$, and let $\mathfrak{n}$ denote the nilradical of $\mathfrak{b}$. B. Kostant proved in [Ko] (see also [CG, Chap. 6]) that $\mathcal{N}$ is equal, as a subscheme of $\mathfrak{g}$, to the image of the Springer resolution, the morphism $G \times_B \mathfrak{n} \to \mathfrak{g}$ given by the assignment $(g, x) \mapsto \operatorname{Ad} g(x)$.

Given a scheme $S$ and a $G$-bundle $P$ on $S \times X$, we choose local trivialisations of the vector bundles $\mathfrak{g}_P$ and $\Omega_X$, and we view a local section $x$ of $\mathfrak{g}_P \otimes \Omega_X$ as a function $S \times X \to \mathfrak{g}$. The section $x$ is called nilpotent if the corresponding function gives a morphism $S \times X \to \mathcal{N} \subset \mathfrak{g}$. The notion of a nilpotent section does not depend on the choices of trivialisations involved. Following Laumon [La], define the global nilpotent cone as a closed (nonreduced) substack $\mathcal{N}ilp \subset T^*\mathrm{Bun}_G$ whose set of $S$-points is
$$\mathcal{N}ilp(S) = \{(P, x) \mid x \in \Gamma(S \times X, \mathfrak{g}_P \otimes \Omega_X),\ x \text{ is a nilpotent section}\},$$
where $P$ runs over $G$-bundles on $S \times X$.

MAIN THEOREM
$\mathcal{N}ilp$ is a Lagrangian substack in $T^*\mathrm{Bun}_G$.
Remark. This theorem was first proved, in the special case $G = SL_n$, by Laumon [La]. His argument cannot be generalised to arbitrary semisimple groups. In the general case, the theorem was proved by Faltings [Fa, Th. II.5]. The proof below seems to be more elementary than that of Faltings; it is based on nothing but a few general results of symplectic geometry. Another proof of the theorem is given in [BD]. That proof is more complicated; however, it potentially leads to a description of the irreducible components of $\mathcal{N}ilp$.

We begin with a few general lemmas. Let $(M_1, \omega_1)$ and $(M_2, \omega_2)$ be complex algebraic symplectic manifolds, and let $\mathrm{pr}_i : M_1 \times M_2 \to M_i$ be the projections. We regard $M_1 \times M_2$ as a symplectic manifold with symplectic form $\mathrm{pr}_1^*\omega_1 - \mathrm{pr}_2^*\omega_2$, involving the minus sign on the second factor. The following result is a special case of [CG, Prop. 2.7.51].

LEMMA 1
Let $\Lambda_1 \subset M_1$ and $\Lambda \subset M_1 \times M_2$ be smooth algebraic isotropic subvarieties. Then $\mathrm{pr}_2\bigl(\mathrm{pr}_1^{-1}(\Lambda_1) \cap \Lambda\bigr) \subset M_2$ is an isotropic subvariety.
Proof
Set $Y := \mathrm{pr}_1^{-1}(\Lambda_1) \cap \Lambda$. Simple linear algebra shows that, for any $y \in Y$, the image of the tangent map $(\mathrm{pr}_2)_* : T_yY \to T_{\mathrm{pr}_2(y)}M_2$ is isotropic. We use the characterisation of isotropic subvarieties mentioned at the beginning of the paper. Let $W \subset \Lambda_2 := \mathrm{pr}_2(Y)$ be an irreducible smooth subvariety. Observe that the map $\mathrm{pr}_2 : \mathrm{pr}_2^{-1}(W) \cap Y \to W$ is surjective. Hence, there exists a nonempty smooth Zariski-open dense subset $Y' \subset \bigl(\mathrm{pr}_2^{-1}(W) \cap Y\bigr)_{\mathrm{red}}$ such that the restriction $\mathrm{pr}_2 : Y' \to W$ has surjective differential at any point of $Y'$. Therefore, the tangent space at the generic point of $W$ is isotropic. Hence, the tangent space at every point of $W$ is isotropic by continuity. It follows that any smooth subvariety of $\Lambda_2$ is isotropic, and the lemma follows.
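The "simple linear algebra" invoked at the start of the proof can be spelled out as follows. This is only a sketch, under the assumption that $y$ is a point at which $T_yY \subset T_y\Lambda$ and $(\mathrm{pr}_1)_*(T_yY) \subset T_{\mathrm{pr}_1(y)}\Lambda_1$; the tangent vectors $v, w$ are notation introduced here for illustration.

```latex
% v, w \in T_y Y are arbitrary tangent vectors (illustrative notation)
\begin{align*}
0 &= (\mathrm{pr}_1^*\omega_1 - \mathrm{pr}_2^*\omega_2)(v, w)
   && \text{since } v, w \in T_y\Lambda \text{ and } \Lambda \text{ is isotropic} \\
  &= \omega_1\bigl((\mathrm{pr}_1)_* v, (\mathrm{pr}_1)_* w\bigr)
   - \omega_2\bigl((\mathrm{pr}_2)_* v, (\mathrm{pr}_2)_* w\bigr) \\
  &= -\,\omega_2\bigl((\mathrm{pr}_2)_* v, (\mathrm{pr}_2)_* w\bigr)
   && \text{since } (\mathrm{pr}_1)_* v,\ (\mathrm{pr}_1)_* w \in T\Lambda_1
      \text{ and } \Lambda_1 \text{ is isotropic.}
\end{align*}
```

Hence $\omega_2$ vanishes on the image $(\mathrm{pr}_2)_*(T_yY)$, which is the isotropy statement used in the proof.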
Given a manifold $N$, we write $\lambda_N$ for the canonical 1-form on $T^*N$, usually denoted “$p\,dq$,” such that $d\lambda_N$ is the canonical symplectic 2-form on $T^*N$. Let $f : N_1 \to N_2$ be a morphism of smooth algebraic varieties. Identify $T^*(N_1 \times N_2)$ with $T^*N_1 \times T^*N_2$ via the standard map multiplied by $(-1)$ on the factor $T^*N_2$. The canonical 1-form on $T^*(N_1 \times N_2)$ becomes, under the above identification, equal to $\mathrm{pr}_1^*(\lambda_{N_1}) - \mathrm{pr}_2^*(\lambda_{N_2})$. We endow $T^*N_1 \times T^*N_2$ with the corresponding symplectic form $\mathrm{pr}_1^*(d\lambda_{N_1}) - \mathrm{pr}_2^*(d\lambda_{N_2})$; it is induced from the canonical symplectic form on $T^*(N_1 \times N_2)$ via the identification. Introduce the following closed subvariety:
$$Y_f = \{\bigl((n_1, \alpha_1), (n_2, \alpha_2)\bigr) \in T^*N_1 \times T^*N_2 \mid n_2 = f(n_1),\ \alpha_1 = 0 = f^*(\alpha_2)\}. \qquad (3)$$

LEMMA 2
The image of $Y_f$ under the second projection $\mathrm{pr}_2 : T^*N_1 \times T^*N_2 \to T^*N_2$ is an isotropic subvariety in $T^*N_2$.
Proof
Using the identification of $T^*(N_1 \times N_2)$ with $T^*N_1 \times T^*N_2$ explained above (which involves a sign), the conormal bundle to the graph of $f$ can be written as the subvariety
$$\Lambda = \{\bigl((n_1, \alpha_1), (n_2, \alpha_2)\bigr) \in T^*N_1 \times T^*N_2 \mid n_2 = f(n_1),\ \alpha_1 = f^*(\alpha_2)\}.$$
Observe that the canonical 1-form $\mathrm{pr}_1^*(\lambda_{N_1}) - \mathrm{pr}_2^*(\lambda_{N_2})$ on $T^*N_1 \times T^*N_2$ vanishes identically on $\Lambda$. Hence, $\Lambda$ is an isotropic subvariety, and we may apply Lemma 1 to $M_1 = T^*N_1$, $M_2 = T^*N_2$, $\Lambda_1 = T^*_{N_1}N_1$ (the zero-section), and $\Lambda$ as above. Observe now that we have by definition $Y_f = \Lambda \cap \mathrm{pr}_1^{-1}(T^*_{N_1}N_1)$. Hence, by Lemma 1 the subvariety $\mathrm{pr}_2(Y_f)$ is isotropic.
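The vanishing of the canonical 1-form on the conormal bundle of the graph can be checked pointwise. The following is a sketch with notation introduced here for illustration: $\xi$ is a tangent vector to the conormal bundle at the point $\bigl((n_1, f^*\alpha_2), (f(n_1), \alpha_2)\bigr)$, and $u \in T_{n_1}N_1$ is its image under the projection to $N_1$, so that its image in $TN_2$ is $df(u)$.

```latex
% \xi and u are illustrative names, not notation from the paper
\begin{align*}
\bigl(\mathrm{pr}_1^*\lambda_{N_1} - \mathrm{pr}_2^*\lambda_{N_2}\bigr)(\xi)
  &= \langle f^*\alpha_2,\, u \rangle - \langle \alpha_2,\, df(u) \rangle \\
  &= \langle \alpha_2,\, df(u) \rangle - \langle \alpha_2,\, df(u) \rangle = 0 .
\end{align*}
```

Since the symplectic form is the differential of this 1-form, its restriction to the conormal bundle vanishes as well.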
LEMMA 3 If N1 and N2 are smooth algebraic stacks and f : N1 → N2 is a representable morphism of finite type, then the assertion of Lemma 2 remains valid.
Proof
Due to locality of the claim, we may (and do) assume that $N_2$ is quasi-compact. Let $\tilde N_2$ be a smooth algebraic variety, and let $\tilde N_2 \to N_2$ be a smooth surjective equidimensional morphism. Set $\tilde N_1 := \tilde N_2 \times_{N_2} N_1$. Note that the set $Y_f \subset T^*N_1 \times T^*N_2$ defined in (3) may be viewed as a subset in $T^*N_2 \times_{N_2} N_1$. Therefore, we have
$$Y_f \times_{N_2} \tilde N_2 \subset T^*N_2 \times_{N_2} \tilde N_2 \times_{N_2} N_1 = T^*N_2 \times_{N_2} \tilde N_1 .$$
We must show that the image of $Y_f \times_{N_2} \tilde N_2$ is an isotropic subvariety in $T^*\tilde N_2$. Let $F : \tilde N_1 \to \tilde N_2$ be the natural morphism, and let $Y_F \subset T^*\tilde N_2 \times_{\tilde N_2} \tilde N_1$ be the corresponding subvariety of Lemma 2. Observe that $T^*\tilde N_2 \times_{\tilde N_2} \tilde N_1 = T^*\tilde N_2 \times_{N_2} N_1$ and $Y_f \times_{N_2} \tilde N_2 = Y_F$. Hence, Lemma 2 applied to $F$ shows that the image of $Y_F = Y_f \times_{N_2} \tilde N_2$ in $T^*\tilde N_2$ is isotropic. The claim follows.

Choose a Borel subgroup $B \subset G$ with Lie algebra $\mathfrak{b}$, and let $\mathfrak{n}$ denote the nilradical of $\mathfrak{b}$. Given a field $K$ of characteristic zero, write $G(K)$, $B(K)$, $\mathfrak{b}(K)$, and so on, for the corresponding sets of $K$-rational points. The following result seems to be well known; it is included here for the reader’s convenience.

LEMMA 4
For any field $K \supset \mathbb{C}$ and any $x \in \mathcal{N}(K)$, there exists an element $g \in G(K)$ such that $\operatorname{Ad} g(x) \in \mathfrak{n}(K)$.
Proof
Since the Jacobson-Morozov theorem holds for any field of characteristic zero, one may find an $\mathfrak{sl}_2$-triple $(x, h, x^-) \subset \mathfrak{g}(K)$ associated to the given nilpotent element $x \in \mathfrak{g}(K)$. The eigenspaces of the semisimple endomorphism $\operatorname{ad} h : \mathfrak{g} \to \mathfrak{g}$ corresponding to nonnegative eigenvalues span a parabolic subalgebra $\mathfrak{p}_x \subset \mathfrak{g}$, which is defined over $K$. Writing $\mathfrak{u}_x$ for the nilradical of $\mathfrak{p}_x$, by construction we have $x \in \mathfrak{u}_x(K)$. Clearly, if $\mathfrak{b}_x \subset \mathfrak{p}_x$ is a Borel subalgebra defined over $K$ and $\mathfrak{n}_x$ is its nilradical, then $x \in \mathfrak{u}_x(K) \subset \mathfrak{n}_x(K)$. Thus, it suffices to prove that the parabolic $\mathfrak{p}_x$ contains a Borel subalgebra defined over $K$.

To this end, let $\mathcal{P}$ denote the partial flag variety of all parabolics in $\mathfrak{g}$ of type $\mathfrak{p}_x$. There is a unique $\mathfrak{p} \in \mathcal{P}(K)$ such that $\mathfrak{p} \supset \mathfrak{b}$, where $\mathfrak{b}$ is our fixed Borel subalgebra. Now, the group $G(K)$ acts transitively on $\mathcal{P}(K)$. (This follows easily from the Bruhat decomposition; see [Ja].) We deduce that there exists $g \in G(K)$ such that $\operatorname{Ad} g(\mathfrak{p}) = \mathfrak{p}_x$. But then $\operatorname{Ad} g(\mathfrak{b}) \subset \mathfrak{p}_x$ is a Borel subalgebra defined over $K$, and we are done.

Let $f_1, \dots, f_r$ ($r = \operatorname{rk} \mathfrak{g}$) be a set of homogeneous free generators of $\mathbb{C}[\mathfrak{g}]^G$, the algebra of $G$-invariant polynomials on $\mathfrak{g}$. Let $d_i = \deg f_i$; thus the numbers $d_i - 1$ are the exponents of $\mathfrak{g}$. Following N. Hitchin [Hi], we put
$$\mathrm{Hitch} := \Gamma\bigl(X, \Omega_X^{\otimes d_1}\bigr) \oplus \cdots \oplus \Gamma\bigl(X, \Omega_X^{\otimes d_r}\bigr).$$
This is an affine space of dimension equal to $\dim \mathrm{Bun}_G$. (At this point it is used (see [Hi]) that $\operatorname{genus}(X)$ is greater than 1.) Hitchin has defined a morphism $\pi : T^*\mathrm{Bun}_G \to \mathrm{Hitch}$ by assigning to any pair $(s, P) \in T^*\mathrm{Bun}_G$, where $P \in \mathrm{Bun}_G$ and $s \in T^*_P\mathrm{Bun}_G \simeq \Gamma(X, \mathfrak{g}_P \otimes \Omega_X)$ (see (1)), the element $\pi(s, P) = \bigoplus_{i=1}^{r} f_i(s) \in \mathrm{Hitch}$. It is immediate from the construction that the global nilpotent variety is the fiber of $\pi$ over the zero element $0 \in \mathrm{Hitch}$.
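The dimension count for the Hitchin base can be verified with the Riemann-Roch theorem. The following is a sketch under standard assumptions that the text does not spell out: $g$ denotes the genus of $X$, the group $G$ is semisimple (so every $d_i \ge 2$), $\dim \mathrm{Bun}_G = (g-1)\dim G$, and $\sum_i (2d_i - 1) = \dim G$.

```latex
\begin{align*}
\dim \Gamma\bigl(X, \Omega_X^{\otimes d_i}\bigr)
  &= d_i(2g-2) - g + 1 = (2d_i - 1)(g-1)
  && \bigl(\deg \Omega_X^{\otimes d_i} > 2g-2 \text{ for } d_i \ge 2,\ g > 1\bigr), \\
\dim \mathrm{Hitch}
  &= (g-1)\sum_{i=1}^{r} (2d_i - 1) = (g-1)\dim G = \dim \mathrm{Bun}_G .
\end{align*}
```

For instance, for $G = SL_2$ one has $r = 1$ and $d_1 = 2$, and both sides equal $3g - 3$.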
Remark. Hitchin actually worked in the setup of stable Higgs bundles and not in the setup of stacks. But his construction of the map $\pi$ extends to the stack setup verbatim. We make no use of any additional properties of the map $\pi$ established in [Hi].

LEMMA 5
$\mathcal{N}ilp$ is an isotropic substack in $T^*\mathrm{Bun}_G$.
Proof
Write $\mathrm{Bun}_B$ for the moduli stack of principal $B$-bundles. By an old result of G. Harder [Ha], any $G$-bundle on a curve has a $B$-reduction; hence the natural morphism of stacks $f : \mathrm{Bun}_B \to \mathrm{Bun}_G$ is surjective. Let $P$ be an algebraic $G$-bundle on the curve $X$, and let $s$ be a nilpotent regular section of $\mathfrak{g}_P \otimes \Omega_X$. Harder showed, further, using a key rationality result of R. Steinberg [St], that the bundle $P$ is locally trivial in the Zariski topology. Thus, trivializing $P$ at the generic point of $X$, one may identify the restriction of $s$ to the generic point with a nilpotent element of $\mathfrak{g}(K)$, where $K = \mathbb{C}(X)$ is the field of rational functions on $X$. Hence, Lemma 4 implies that there exists a $B$-reduction of $P$ over the generic point of $X$ such that $s \in \mathfrak{n}_P \otimes \Omega_X$. Here, a $B$-reduction is a section of the associated bundle $B\backslash P$. The fibers of $B\backslash P$ being projective varieties (isomorphic to $B\backslash G$), any section of $B\backslash P$ defined over the generic point of $X$ extends to the whole of $X$. Thus, there exists a $B$-reduction over $X$ of the $G$-bundle $P$ such that $s \in \mathfrak{n}_P \otimes \Omega_X$.

Further, let $k \supset \mathbb{C}$ be a field, and let $S = \operatorname{Spec}(k)$. For any $P \in \mathrm{Bun}_B(k)$, we have
$$T^*_P\mathrm{Bun}_B = H^1(X, \mathfrak{b}_P)^* = H^0(X, \mathfrak{b}^*_P \otimes \Omega_X) = H^0\bigl(X, (\mathfrak{g}_P/\mathfrak{n}_P) \otimes \Omega_X\bigr).$$
It follows that in the notation of Lemma 3, for $N_1 = \mathrm{Bun}_B$ and $N_2 = \mathrm{Bun}_G$, we have $\mathcal{N}ilp = \mathrm{pr}_2(Y_f)$. Observe further that $\mathrm{Bun}_B$ is the union of a countable family of open substacks, each of finite type over $\mathrm{Bun}_G$. Thus, Lemma 3 implies that $\mathcal{N}ilp$ is the union of a countable family of isotropic substacks.

We claim that if an algebraic stack $\mathcal{S}$ is the union of a countable family $\{\mathcal{S}_i\}_{i\in\mathbb{N}}$ of locally closed substacks, then any quasi-compact substack of $\mathcal{S}$ can be covered by finitely many $\mathcal{S}_i$’s. To prove this, we may assume without loss of generality that $\mathcal{S}$ is itself quasi-compact. Choose a smooth surjective morphism $\tilde S \twoheadrightarrow \mathcal{S}$, where $\tilde S$ is a scheme of finite type. By our assumptions there exists a countable family $\{\tilde S_i\}_{i\in\mathbb{N}}$ of locally closed subschemes of $\tilde S$ such that $\tilde S = \bigcup_i \tilde S_i$. Hence, there exists $n \gg 0$ such that $\tilde S$ equals the union of the closures of $\tilde S_1, \dots, \tilde S_n$, thanks to the Baire category theorem. Hence, $\tilde S_1 \cup \cdots \cup \tilde S_n$ is Zariski dense in $\tilde S$, and $\dim\bigl(\tilde S \smallsetminus (\tilde S_1 \cup \cdots \cup \tilde S_n)\bigr) < \dim \tilde S$. Arguing by induction on $\dim \tilde S$, we deduce that $\tilde S$ is covered by finitely many $\tilde S_i$’s. Hence, for any field $k \supset \mathbb{C}$, the quasi-compact set $\mathcal{S}(k)$ is covered by finitely many subsets $\mathcal{S}_i(k)$.
Thus, we have proved that any open quasi-compact substack of $\mathcal{N}ilp$ can be covered by finitely many isotropic substacks. This implies that $\mathcal{N}ilp$ is itself isotropic.

PROPOSITION 1
There is an equality $\dim \mathcal{N}ilp = \dim \mathrm{Bun}_G$.

Proof
We observe that, since $\mathrm{Bun}_G$ is an equidimensional smooth stack, each irreducible component of $T^*\mathrm{Bun}_G$ has dimension greater than or equal to $2\dim \mathrm{Bun}_G$. To see this, one may replace $\mathrm{Bun}_G$ by an open quasi-compact substack $Y$, which admits a smooth surjective morphism $p : \tilde Y \twoheadrightarrow Y$, where $\tilde Y$ is a smooth algebraic variety and fibers of $p$ are purely $m$-dimensional. Then, $T^*Y \times_Y \tilde Y$ is a closed subscheme of $T^*\tilde Y$ locally defined by $m$ equations. Hence, for each irreducible component $T^*_j$ of $T^*Y$ we find
$$\dim T^*_j = \dim(T^*_j \times_Y \tilde Y) - m \ge (\dim T^*\tilde Y - m) - m = 2(\dim \tilde Y - m) = 2\dim Y .$$
It follows that each irreducible component of any fiber of the Hitchin morphism $\pi : T^*\mathrm{Bun}_G \to \mathrm{Hitch}$ has dimension greater than or equal to $2\dim \mathrm{Bun}_G - \dim \mathrm{Hitch} = \dim \mathrm{Bun}_G$. But we have proved that each component of $\mathcal{N}ilp = \pi^{-1}(0)$ is an isotropic subvariety and hence has dimension at most $\dim \mathrm{Bun}_G$. Thus, $\dim \mathcal{N}ilp = \dim \mathrm{Bun}_G$.

Remark. Although the inequality $\dim T^*\mathrm{Bun}_G \ge 2\dim \mathrm{Bun}_G$ is no longer true if $X$ has genus one or zero, it has been shown in [BD] that the stack $\mathcal{N}ilp$ still has pure dimension equal to $\dim \mathrm{Bun}_G$.

COROLLARY 1
Every irreducible component of any fiber of $\pi$ has dimension equal to $\dim \mathrm{Bun}_G$. In particular, the Hitchin morphism $\pi$ is flat.
Proof
There is a natural $\mathbb{C}^*$-action on $\mathrm{Hitch}$ such that $t \in \mathbb{C}^*$ acts on the direct summand $\Gamma(X, \Omega_X^{\otimes d_i}) \subset \mathrm{Hitch}$ via multiplication by $t^{d_i}$. The map $\pi : T^*\mathrm{Bun}_G \to \mathrm{Hitch}$ is $\mathbb{C}^*$-equivariant relative to the above-defined $\mathbb{C}^*$-action on $\mathrm{Hitch}$ and to the standard $\mathbb{C}^*$-action on $T^*\mathrm{Bun}_G$ by dilations along the fibers, respectively. Clearly, zero is the only fixed point of the $\mathbb{C}^*$-action on $\mathrm{Hitch}$ and, moreover, it is contained in the closure of any other $\mathbb{C}^*$-orbit on $\mathrm{Hitch}$. It follows, since the dimension of any irreducible component of any fiber is less than or equal to the dimension of the special fiber, that for any $h \in \mathrm{Hitch}$ we have $\dim \pi^{-1}(h) \le \dim \pi^{-1}(0)$. But for curves of genus greater than 1, Proposition 1 yields $\dim \pi^{-1}(0) = \dim \mathcal{N}ilp = \dim \mathrm{Bun}_G$. Thus, for any $h \in \mathrm{Hitch}$ we get $\dim \pi^{-1}(h) \le \dim \mathrm{Bun}_G$. On the other hand, the dimension of each irreducible component of any fiber of the morphism $\pi : T^*\mathrm{Bun}_G \to \mathrm{Hitch}$ is no less than $\dim T^*\mathrm{Bun}_G - \dim \mathrm{Hitch} = \dim \mathrm{Bun}_G$. This proves the opposite inequality.

The proof of the main theorem is completed by the following stronger result.

THEOREM 1
The stack $T^*\mathrm{Bun}_G$ is a local complete intersection, and $\mathcal{N}ilp$ is a Lagrangian complete intersection in $T^*\mathrm{Bun}_G$.
Proof
The claims being local, we may replace $\mathrm{Bun}_G$ by an open quasi-compact substack $Y$ which admits a presentation of the form $p : \tilde Y \twoheadrightarrow Y$, where $\tilde Y$ is a smooth algebraic variety and $p$ is a smooth surjective morphism with fibers of pure dimension $m$. As we have observed in the proof of Proposition 1, $T^*Y \times_Y \tilde Y$ is a closed subscheme of $T^*\tilde Y$ locally defined by $m$ equations and, moreover, $\dim T^*Y \ge 2\dim \mathrm{Bun}_G$. On the other hand, since fibers of the Hitchin map $\pi$ have dimension equal to $\dim \mathrm{Bun}_G$, we find $\dim T^*Y \le \dim \mathrm{Bun}_G + \dim \mathrm{Hitch} = 2\dim \mathrm{Bun}_G$. This proves that the stack $T^*\mathrm{Bun}_G$ is a local complete intersection. Further, $\mathcal{N}ilp$ being the zero fiber of the surjective morphism $\pi$, it is defined by $\dim \mathrm{Bun}_G$ equations in $T^*\mathrm{Bun}_G$. The equality $\dim \mathcal{N}ilp = \dim \mathrm{Bun}_G$ (Proposition 1) implies that $\mathcal{N}ilp$ is a complete intersection in $T^*\mathrm{Bun}_G$. Finally, since $\mathcal{N}ilp$ is equidimensional (Corollary 1), Lemma 5 implies that $\mathcal{N}ilp$ is Lagrangian.

Acknowledgments. I am grateful to V. Drinfeld for his invaluable help and also for his extreme patience while answering my numerous foolish questions.

References
[BD] A. BEILINSON and V. DRINFELD, Quantization of Hitchin’s integrable system and Hecke eigensheaves, preprint, 2000, http://www.math.uchicago.edu/~benzvi 511, 512, 513, 517
[CG] N. CHRISS and V. GINZBURG, Representation Theory and Complex Geometry, Birkhäuser, Boston, 1997. MR 98i:22021 511, 512, 513
[Fa] G. FALTINGS, Stable G-bundles and projective connections, J. Algebraic Geom. 2 (1993), 507–568. MR 94i:14015 513
[Ha] G. HARDER, Halbeinfache Gruppenschemata über Dedekindringen, Invent. Math. 4 (1967), 165–191. MR 37:1378 516
[Hi] N. HITCHIN, Stable bundles and integrable systems, Duke Math. J. 54 (1987), 91–114. MR 88i:58068 515, 516
[Ja] J. C. JANTZEN, Representations of Algebraic Groups, Pure Appl. Math. 131, Academic Press, Boston, 1987. MR 89c:20001 515
[Ko] B. KOSTANT, Lie group representations on polynomial rings, Amer. J. Math. 85 (1963), 327–404. MR 28:1252 512
[La] G. LAUMON, Un analogue global du cône nilpotent, Duke Math. J. 57 (1988), 647–671. MR 90a:14012 513
[LMB] G. LAUMON and L. MORET-BAILLY, Champs algébriques, Ergeb. Math. Grenzgeb. (3) 39, Springer, Berlin, 2000. MR CMP 1 771 927 511, 512
[St] R. STEINBERG, Regular elements of semisimple algebraic groups, Inst. Hautes Études Sci. Publ. Math. 25 (1965), 49–80. MR 31:4788 516
University of Chicago, Department of Mathematics, Chicago, Illinois 60637, USA; [email protected]
DUKE MATHEMATICAL JOURNAL © 2001 Vol. 109, No. 3

SHARP ESTIMATES FOR THE ARITHMETIC NULLSTELLENSATZ
TERESA KRICK, LUIS MIGUEL PARDO, AND MARTÍN SOMBRA
Abstract
We present sharp estimates for the degree and the height of the polynomials in the Nullstellensatz over the integer ring $\mathbb{Z}$. The result improves previous work of P. Philippon, C. Berenstein and A. Yger, and T. Krick and L. M. Pardo. We also present degree and height estimates of intrinsic type, which depend mainly on the degree and the height of the input polynomial system. As an application we derive an effective arithmetic Nullstellensatz for sparse polynomial systems. The proof of these results relies heavily on the notion of local height of an affine variety defined over a number field. We introduce this notion and study its basic properties.

Contents
Introduction
1. Height of polynomials and varieties
   1.1. Height of polynomials
   1.2. Height of varieties
2. Estimates for local and global heights
   2.1. Estimates for Chow forms
   2.2. Basic properties of the height
   2.3. Local height of norms and traces
3. An effective arithmetic Nullstellensatz
   3.1. Division modulo complete intersection ideals
   3.2. An effective arithmetic Nullstellensatz
4. Intrinsic type estimates
   4.1. Equations in general position
   4.2. An intrinsic arithmetic Nullstellensatz
References

Received 11 February 2000. Revision received 30 October 2000.
2000 Mathematics Subject Classification. Primary 11G35; Secondary 13P10.
Krick and Sombra’s work partially supported by Consejo Nacional de Investigaciones Científicas y Técnicas, Universidad de Buenos Aires Ciencia y Técnica, and Agencia Nacional de Promoción Científica y Tecnológica (Argentina), and by the Mathematical Sciences Research Institute at Berkeley (USA). Sombra’s work also partially supported by National Science Foundation grant number DMS-97-29992 to the Institute for Advanced Study, Princeton, New Jersey. Pardo’s work partially supported by PB 96-0671-C02-02 (Spain), and by Centre National de la Recherche Scientifique 1026 Mathématiques Effectives, Développements Informatiques, Calcul, Ingénierie et Systèmes (France).
Introduction
Hilbert’s Nullstellensatz is a cornerstone of algebraic geometry. In a simplified form, its statement is the following: Let $f_1, \dots, f_s \in \mathbb{Z}[x_1, \dots, x_n]$ be polynomials such that the equation system
$$f_1(x) = 0, \ \dots, \ f_s(x) = 0 \qquad (1)$$
has no solution in $\mathbb{C}^n$. Then there exist $a \in \mathbb{Z} \setminus \{0\}$ and $g_1, \dots, g_s \in \mathbb{Z}[x_1, \dots, x_n]$ satisfying the Bézout identity
$$a = g_1 f_1 + \cdots + g_s f_s . \qquad (2)$$

As for many central results in commutative algebra and algebraic geometry, it is an existential noneffective statement. The estimation of both the degree and the height of polynomials satisfying identity (2) became an important and widely considered question. Effective versions of Hilbert’s Nullstellensatz apply to a wide range of situations in number theory and theoretical computer science. In particular, they decide the consistency of a given polynomial system. In their arithmetic presentation they apply to Łojasiewicz inequalities (see [51], [26]) and to the consistency problem over finite fields (see [28], [22]).

Let $h(f)$ denote the height of an arbitrary polynomial $f \in \mathbb{Z}[x_1, \dots, x_n]$, defined as the logarithm of the maximum modulus of its coefficients. The main result of this paper is the following effective arithmetic Nullstellensatz.

THEOREM 1
Let $f_1, \dots, f_s \in \mathbb{Z}[x_1, \dots, x_n]$ be polynomials without common zeros in $\mathbb{C}^n$. Set $d := \max_i \deg f_i$ and $h := \max_i h(f_i)$. Then there exist $a \in \mathbb{Z} \setminus \{0\}$ and $g_1, \dots, g_s \in \mathbb{Z}[x_1, \dots, x_n]$ such that
• $a = g_1 f_1 + \cdots + g_s f_s$,
• $\deg g_i \le 4\,n\,d^n$,
• $h(a),\ h(g_i) \le 4\,n\,(n+1)\,d^n\,\bigl(h + \log s + (n+7)\log(n+1)\,d\bigr)$.
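To get a feel for the size of the bounds in Theorem 1, they can be evaluated numerically. The following is a minimal sketch, not part of the paper: the function name `nullstellensatz_bounds` is hypothetical, and the logarithm is assumed to be the natural one.

```python
import math


def nullstellensatz_bounds(n, d, h, s):
    """Evaluate the degree and height bounds of Theorem 1.

    n: number of variables, d: maximal degree of the f_i,
    h: maximal height of the f_i, s: number of polynomials.
    Assumption: "log" is the natural logarithm.
    """
    # deg g_i <= 4 n d^n
    degree_bound = 4 * n * d**n
    # h(a), h(g_i) <= 4 n (n+1) d^n (h + log s + (n+7) log(n+1) d)
    height_bound = 4 * n * (n + 1) * d**n * (
        h + math.log(s) + (n + 7) * math.log(n + 1) * d
    )
    return degree_bound, height_bound


# Three quadrics in two variables with coefficients of modulus at most 1 (h = 0).
degree_bound, height_bound = nullstellensatz_bounds(n=2, d=2, h=0.0, s=3)
print(degree_bound, height_bound)
```

For fixed $n$ both bounds grow like $d^n$ in the degree, and the height bound is linear in $h$.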
As we see below, this result substantially improves all previously known estimates for the arithmetic Nullstellensatz.
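To get a feeling for the size of these estimates, the two bounds of Theorem 1 can be evaluated numerically. The following is only an illustrative sketch: the function name `theorem1_bounds` and the sample parameters are assumptions made here, and logarithms are taken to be natural.

```python
import math

def theorem1_bounds(n, d, h, s):
    """Evaluate the degree and height bounds of Theorem 1.

    n: number of variables, d: maximal degree of the f_i,
    h: maximal height of the f_i (natural log of the largest |coefficient|),
    s: number of input polynomials.
    """
    deg_bound = 4 * n * d**n
    height_bound = 4 * n * (n + 1) * d**n * (
        h + math.log(s) + (n + 7) * math.log(n + 1) * d
    )
    return deg_bound, height_bound

# Hypothetical input: 4 polynomials in 3 variables, degree 2, coefficients up to 10.
deg_bound, height_bound = theorem1_bounds(n=3, d=2, h=math.log(10), s=4)
```

Both bounds grow like $d^n$, so doubling $d$ multiplies the degree bound by $2^n$.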
SHARP ESTIMATES FOR THE ARITHMETIC NULLSTELLENSATZ
The following variant of a well-known example due to D. Masser and to Philippon (see [6]) yields a lower bound for any general degree and height estimate. Set
$$f_1 := x_1^d, \quad f_2 := x_1 x_n^{d-1} - x_2^d, \quad \dots, \quad f_{n-1} := x_{n-2} x_n^{d-1} - x_{n-1}^d, \quad f_n := x_{n-1} x_n^{d-1} - H$$
for any positive integers $n$, $d$, and $H$. These are polynomials of degree $d$ and height bounded by $h := \log H$ without common zeros in $\mathbb{C}^n$. Let $a \in \mathbb{Z} \setminus \{0\}$ and $g_1, \dots, g_n \in \mathbb{Z}[x_1, \dots, x_n]$ such that
$$a = g_1 f_1 + \cdots + g_n f_n.$$
Specializing this identity at $x_1 := H^{d^{n-2}} t^{d^{n-1}-1}, \dots, x_{n-1} := H\, t^{d-1}, x_n := 1/t$, we obtain
$$a = g_1\big(H^{d^{n-2}} t^{d^{n-1}-1}, \dots, H\, t^{d-1}, 1/t\big)\, H^{d^{n-1}} t^{d^n - d}.$$
We conclude that $\deg g_1 \ge d^n - d$ and $h(a) \ge d^{n-1} h$. In fact, a modified version of this example gives the improved lower bound $h(a) \ge d^n h$ (see Example 3.10). This shows that our estimate is essentially optimal.

The earlier work on the effective Nullstellensatz dealt with the degree bounds. Let $k$ be a field, and let $\bar{k}$ be its algebraic closure; let $f_1, \dots, f_s \in k[x_1, \dots, x_n]$ be polynomials of degree bounded by $d$ without common zeros in $\bar{k}^n$. In 1926, G. Hermann [25] (see also [23], [43]) proved that there exist $g_1, \dots, g_s \in k[x_1, \dots, x_n]$ such that
$$1 = g_1 f_1 + \cdots + g_s f_s$$
with $\deg g_i f_i \le 2\,(2d)^{2^{n-1}}$. After a conjecture of O.-H. Keller and W. Gröbner, this estimate was dramatically improved by W. Brownawell [6] to $\deg g_i f_i \le n^2 d^n + n\, d$ in case $\mathrm{char}(k) = 0$, while L. Caniglia, A. Galligo, and J. Heintz [7] showed that $\deg g_i f_i \le d^{n^2}$ holds in the general case. These results were then independently refined by J. Kollár [29] and by N. Fitchas and Galligo [12] to $\deg g_i f_i \le \max\{3, d\}^n$, which is optimal in case $d \ge 3$. For $d = 2$, M. Sombra [53] recently showed that the bound $\deg g_i f_i \le 2^{n+1}$ holds.

Now, let us consider the height aspect: assume that $f_1, \dots, f_s \in \mathbb{Z}[x_1, \dots, x_n]$ are polynomials of degree and height bounded, respectively, by $d$ and $h$. The previous degree bound reduces Bézout identity (2) to a system of $\mathbb{Q}$-linear equations. Applying Cramer rule to this linear system, one obtains an estimate for the height of $a$ and the polynomials $g_i$ of type $s\, d^{n^2} (h + \log s + d)$.
KRICK, PARDO, AND SOMBRA
However, it was soon conjectured that the true height bound should be much smaller. Philippon [48] obtained the following sharper estimate for the denominator $a$ in the Bézout equation:
$$\deg g_i \le (n+2)\, d^n, \qquad h(a) \le \kappa(n)\, d^n (h + d),$$
where $\kappa(n)$ depends exponentially on $n$. The first essential progress on height estimates for all the polynomials $g_i$ was achieved by Berenstein and Yger [2], who obtained
$$\deg g_i \le n\,(2n+1)\, d^n, \qquad h(a), h(g_i) \le \lambda(n)\, d^{8n+3} (h + \log s + d \log d),$$
where $\lambda(n)$ is a (nonexplicit) constant that depends exponentially on $n$. Their proof relies on the previous work of Philippon and on techniques from complex analysis. Later on, Krick and Pardo [31], [32] obtained
$$\deg g_i \le (n\, d)^{c\, n}, \qquad h(a), h(g_i) \le (n\, d)^{c\, n} (h + \log s + d),$$
where c is a universal constant (c ≤ 35). Their proof, based on duality theory for Gorenstein algebras, is completely algebraic. Finally, Berenstein and Yger [3] improved their height bound to λ(n) d 4n+2 (h + log s + d ) and extended it to the case when Z is replaced by an arbitrary diophantine ring. It should be said, however, that the possibility of such an extension was already clear from the arguments of [32]. We refer the reader to the surveys [58], [1], and [45] for a broad introduction to the history of the effective Nullstellensatz, main results, and open questions. Aside from degree and height estimates, there is a strong current area of research on computational issues (see [19], [13], [32], [18], [17], [22]). There are other results in the recent research papers [50], [30], and [9]. With respect to previous work, in this paper we improve in an almost optimal way the dependence of the height estimate on d n and we eliminate the extraneous exponential constants depending on n. We remark that the polynomials arising in Theorem 1 are a slight variant of the polynomials which appear in [32] and can thus be effectively computed by their algorithm. Although the exponential behavior of the degree and height estimates is—in the worst case—unavoidable, it has been observed that there are many particular instances in which these estimates can be essentially improved. This has motivated the introduction of parameters associated to the input system which identify special families whose behavior with respect to our problem is polynomial instead of exponential. In this spirit, M. Giusti, Heintz, J. Morais, J. Morgenstern, and Pardo [18] introduced the notion of degree of a polynomial system f 1 , . . . , f s . Roughly speaking, this parameter measures the degree of the varieties cut out by f 1 , . . . , f i for
$i = 1, \dots, s-1$. It was soon realized that the degrees in the Nullstellensatz can be controlled in terms of this parameter, giving rise to the so-called “intrinsic Nullstellensätze” (see [18], [33], [17], [52]). Recently K. Hägele, Morais, Pardo, and Sombra [22] (see also [21]) obtained an arithmetic analogue of these intrinsic Nullstellensätze. To this aim, they introduced the notion of height of a polynomial system, the arithmetic analogue of the degree of the system. They obtained degree and height estimates which depend polynomially on the number of variables and on the degree, height and complexity of the input system. This result followed from their study of the computational complexity of the Nullstellensatz.

In this paper we obtain a dramatic improvement over this result, bringing it to an (apparently) almost optimal form. In particular, we show that the dependence on the degree and the height of the system is linear, and we eliminate the influence of the complexity of the input.

THEOREM 2
Let $f_1, \dots, f_s \in \mathbb{Z}[x_1, \dots, x_n]$ be polynomials without common zeros in $\mathbb{C}^n$. Set $d := \max_i \deg f_i$ and $h := \max_i h(f_i)$. Let $\delta$ and $\eta$ denote the degree and the height of the polynomial system $f_1, \dots, f_s$. Then there exist $a \in \mathbb{Z} \setminus \{0\}$ and $g_1, \dots, g_s \in \mathbb{Z}[x_1, \dots, x_n]$ such that
• $a = g_1 f_1 + \cdots + g_s f_s$,
• $\deg g_i \le 2\, n^2\, d\, \delta$,
• $h(a), h(g_i) \le (n+1)^2\, d\, \big(2\,\eta + (h + \log s)\,\delta + 21\,(n+1)^2\, d \log(d+1)\,\delta\big)$.
Since $\delta \le d^{n-1}$ and $\eta \le n\, d^{n-1} (h + \log s + 3\, n\, (n+1)\, d)$ (see Lemma 4.8), one recovers from this statement essentially the same estimates as those of Theorem 1. However, we remark that Theorem 2 is a more flexible result as there are many situations in which the degree and the height of the input system are smaller than the Bézout bounds. When this is the case, it yields a much more accurate estimate (see Sec. 4.2.2).

An example of the situation when both the degree and the height of the system are smaller than the expected worst-case bounds is the sparse case. To state the result, we first need to introduce some standard notation. The support $\mathrm{Supp}(f_1, \dots, f_s)$ of a polynomial system $f_1, \dots, f_s \in \mathbb{C}[x_1, \dots, x_n]$ is defined as the set of exponents of all the nonzero monomials of all $f_i$'s, and the Newton polytope $\mathcal{N}(f_1, \dots, f_s) \subset \mathbb{R}^n$ is the convex hull of this support. The (normalized) volume of $f_1, \dots, f_s$ equals $n!$ times the volume of the corresponding Newton polytope.
The notions of Newton polytope and volume of a polynomial system give a sharper characterization of its monomial structure than the degree alone. These concepts were introduced in the context of root counting by D. Bernstein [4] and A. Kushnirenko [35] and are now in the basis of sparse elimination theory (see, e.g., [56]). As an application of Theorem 2 we derive the following effective arithmetic Nullstellensatz for sparse polynomial systems.

COROLLARY 3
Let $f_1, \dots, f_s \in \mathbb{Z}[x_1, \dots, x_n]$ be polynomials without common zeros in $\mathbb{C}^n$. Set $d := \max_i \deg f_i$ and $h := \max_i h(f_i)$. Let $\mathcal{V}$ denote the volume of the polynomial system $1, x_1, \dots, x_n, f_1, \dots, f_s$. Then there exist $a \in \mathbb{Z} \setminus \{0\}$ and $g_1, \dots, g_s \in \mathbb{Z}[x_1, \dots, x_n]$ such that
• $a = g_1 f_1 + \cdots + g_s f_s$,
• $\deg g_i \le 2\, n^2\, d\, \mathcal{V}$,
• $h(a), h(g_i) \le 2\, (n+1)^3\, d\, \mathcal{V}\, (h + \log s + 2^{2n+3}\, d \log(d+1))$.

The crucial observation here is that both the degree and the height of a polynomial system are essentially controlled by the normalized volume. This follows from an adequate arithmetic version of the Bernstein-Kushnirenko theorem (see Prop. 2.12). Our result follows then from Theorem 2 in a straightforward way. As before, we can apply the worst-case bound $\mathcal{V} \le d^n$ to recover from this result an estimate similar to the one presented in Theorem 1. However, this result gives sharper estimates for both the degree and the height when the input system is sparse (see Ex. 4.13).

The sparse aspect in the Nullstellensatz was previously considered by J. Canny and I. Emiris [8, Th. 8.2] for the case of $n+1$ $n$-variate Laurent polynomials without common roots at toric infinity. Their result is the sparse analogue of F. Macaulay's effective Nullstellensatz [40]. The first general sparse Nullstellensatz was obtained by Sombra [53]. In both cases the authors give bounds for the Newton polytopes of the output polynomials in terms of the Newton polytopes of the input ones. We refer to the original papers for the exact statements. It is quite difficult to make a definite comparison between these results and ours. The latter does not give sharp bounds for Newton polytopes. But on the other hand, our degree estimate for the general case is better, while the height estimate is completely new.

The key ingredient in our treatment of the arithmetic Nullstellensatz is the notion of local height of a variety defined over a number field $K$.
Let $V \subset \mathbb{A}^n(\overline{\mathbb{Q}})$ be an equidimensional affine variety defined over $K$. For each absolute value $v$ over $K$, we introduce the local height $h_v(V)$ of $V$ at $v$ as a Mahler measure of a suitable normalized Chow form of $V$. This is consistent with the Faltings height $h(V)$ of $V$, namely,
$$h(V) = \frac{1}{[K:\mathbb{Q}]} \sum_{v \in M_K} N_v\, h_v(V),$$
where M K denotes the set of canonical absolute values of K and Nv the multiplicity of v. We study the basic properties of this notion. In particular, we are able to estimate the local height of the trace and the norm of a polynomial f ∈ K [x1 , . . . , xn ] with respect to an integral extension K [Ar ] ,→ K [V ]. We also obtain local analogues of many of the global results of J.-B. Bost, H. Gillet, and C. Soul´e [5] and Philippon [49]. Our proof of the arithmetic Nullstellensatz is based on duality theory for Gorenstein algebras (trace formula). This technique was introduced in the context of the effective Nullstellensatz in [19] and [13]. Here we follow mostly the lines of J. Sabia and P. Solern´o [50] and of Krick and Pardo [32]. The trace formula allows us to perform division modulo complete intersection ideals, with good control of the degree and height of the involved polynomials. Local arithmetic intersection theory plays, with respect to the height estimates, the role of classical intersection theory with respect to the degree bounds. Finally, we remark that all of our results are valid not just for Q but for arbitrary number fields. In fact, the general analysis over number fields is necessary to obtain the sharpest estimates for the case K := Q. We also remark that the estimates in the general version of Theorem 1 (Theorem 3.6) do not depend on the involved number field. The outline of the paper is the following: In Section 1, we recall the basic definitions and properties of the height of polynomials, and we introduce the notion of local height of a variety defined over a number field. In Section 2, we derive useful estimates for the local heights of the trace and the norm of a polynomial in K [V ], and we study the behavior of the local heights of the intersection of a variety with a hypersurface. In Section 3, we recall the basic facts of duality theory which are useful in our context, and we prove Theorem 1. 
In Section 4, we focus on the intrinsic and sparse versions of the arithmetic Nullstellensatz. 1. Height of polynomials and varieties Throughout this paper Q denotes the field of rational numbers, Z the ring of rational integers, K a number field, and O K its ring of integers. We also denote by R the field
of real numbers, by $\mathbb{C}$ the field of complex numbers, by $k$ an arbitrary field, and by $\bar{k}$ an algebraic closure of $k$. As usual, $\mathbb{A}^n$ and $\mathbb{P}^n$ denote, respectively, the affine and the projective space of $n$ dimensions over $k$. For every rational prime $p$ we denote by $|\cdot|_p$ the $p$-adic absolute value over $\mathbb{Q}$ such that $|p|_p = p^{-1}$. We also denote the ordinary absolute value over $\mathbb{Q}$ by $|\cdot|_\infty$ or simply by $|\cdot|$. These form a complete set of independent absolute values over $\mathbb{Q}$; we identify the set $M_{\mathbb{Q}}$ of these absolute values with the set $\{\infty, p;\ p \text{ prime}\}$. For $v \in M_{\mathbb{Q}}$ we denote by $\mathbb{Q}_v$ the completion of $\mathbb{Q}$ with respect to the absolute value $v$. In case $v = \infty$ we have $\mathbb{Q}_\infty = \mathbb{R}$, while in case $p$ is prime we have that $\mathbb{Q}_p$ is the $p$-adic field. There exists a unique extension of $v$ to an absolute value over the algebraic closure $\overline{\mathbb{Q}}_v$. We denote by $\mathbb{C}_v$ the completion of $\overline{\mathbb{Q}}_v$ with respect to this absolute value. This field is algebraically closed and complete with respect to the induced absolute value, which we also denote by $v$. We have $\mathbb{C}_\infty = \mathbb{C}$.

1.1. Height of polynomials
In this section we introduce the different measures for the size of a multivariate polynomial, both over $\mathbb{C}_v$ and over a number field. We establish the link between the different notions and study their basic properties.

1.1.1. Height of polynomials over $\mathbb{C}_v$
We fix an absolute value $v \in \{\infty, p;\ p \text{ prime}\}$ for the rest of this section. Let $\mathcal{A} \subset \mathbb{C}_v$ be a finite set. We denote by $|\mathcal{A}|_v := \max\{|a|_v,\ a \in \mathcal{A}\}$ its absolute value. Then we define the (logarithmic) height of $\mathcal{A}$ as $h_v(\mathcal{A}) := \max\{0, \log|\mathcal{A}|_v\}$, that is, $h_v(\mathcal{A}) = \log|\{1\} \cup \mathcal{A}|_v$.

For a polynomial $f = \sum_\alpha a_\alpha x^\alpha \in \mathbb{C}_v[x_1, \dots, x_n]$, we define its absolute value $|f|_v$ as the absolute value of its set of coefficients, that is, $|f|_v := \max_\alpha \{|a_\alpha|_v\}$. In the same way we define the height $h_v(f)$ of $f$ as the height of its set of coefficients: $h_v(f) := \max\{0, \log|f|_v\}$.
When $v = \infty$, that is, when $f$ has complex coefficients, we make use of the (logarithmic) Mahler measure of $f$ defined as
$$m(f) := \int_0^1 \cdots \int_0^1 \log |f(e^{2\pi i t_1}, \dots, e^{2\pi i t_n})| \, dt_1 \cdots dt_n.$$
This integral is well defined, as $\log|f|$ is a plurisubharmonic function on $\mathbb{C}^n$ (see [39, Appendix I]).
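For a univariate polynomial the defining integral can be approximated by averaging $\log|f|$ over equally spaced points of the unit circle. The sketch below does this and compares the result with the classical closed form $\log|a_d| + \sum_i \max\{0, \log|\alpha_i|\}$, which holds by Jensen's formula. The function names and the sample polynomial are assumptions chosen for illustration, and the quadrature is only reliable when $f$ has no roots on the unit circle.

```python
import cmath
import math

def mahler_measure_numeric(coeffs, samples=4096):
    """Approximate m(f) for a univariate f by a midpoint rule on the unit circle.

    coeffs[k] is the coefficient of x**k; f must not vanish on |x| = 1.
    """
    total = 0.0
    for j in range(samples):
        z = cmath.exp(2j * math.pi * (j + 0.5) / samples)
        value = sum(c * z**k for k, c in enumerate(coeffs))
        total += math.log(abs(value))
    return total / samples

def mahler_measure_from_roots(leading, roots):
    """Closed form log|a_d| + sum_i max(0, log|alpha_i|) for f = a_d * prod (x - alpha_i)."""
    return math.log(abs(leading)) + sum(math.log(max(1.0, abs(r))) for r in roots)

# Sample polynomial (an assumption): f = 2 (x - 3)(x - 1/2) = 2 x^2 - 7 x + 3.
coeffs = [3, -7, 2]
numeric = mahler_measure_numeric(coeffs)
closed_form = mahler_measure_from_roots(2, [3, 0.5])

# Gap between m(f) and log|f|, where |f| is the largest coefficient modulus.
gap = closed_form - math.log(max(abs(c) for c in coeffs))
```

Here only the root $3$ lies outside the unit disk, so the closed form equals $\log 2 + \log 3 = \log 6$.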
The Mahler measure was introduced by D. Lehmer [37] for the case of a univariate polynomial $f := a_d \prod_{i=1}^d (x - \alpha_i) \in \mathbb{C}[x]$ as
$$m(f) = \log|a_d| + \sum_{i=1}^d \max\{0, \log|\alpha_i|\}.$$
The link between both expressions of $m(f)$ is given by Jensen's formula. The general case was introduced and studied by K. Mahler [41]. The key property of the Mahler measure is its additivity: $m(f g) = m(f) + m(g)$. We have the following relation between $\log|f|$ and $m(f)$:
$$-\log(n+1) \deg f \le m(f) - \log|f| \le \log(n+1) \deg f. \tag{1.1}$$
The right inequality follows from the definition of $m$ and the fact that the number of monomials of $f$ is bounded by $\binom{n + \deg f}{n} \le (n+1)^{\deg f}$. For the left inequality, we refer to [47, Lem. 1.13] and its proof. When $f$ has total degree bounded by 1, the inequality is refined to $\log|f| \le m(f)$. Also, for any degree, $m(f(x_1, \dots, x_{n-1}, 0)) \le m(f)$. We make frequent use of the following more precise relation.

LEMMA 1.1
Let $f \in \mathbb{C}[X_1, \dots, X_r]$ be a polynomial in $r$ groups of $n_i$ variables each, for $i = 1, \dots, r$. Let $d_i$ denote the degree of $f$ in the group of variables $X_i$. Then
$$-\sum_{i=1}^r \log(n_i+1)\, d_i \le m(f) - \log|f| \le \sum_{i=1}^r \log(n_i+1)\, d_i.$$

Proof
The right inequality follows directly from the definition of $m(f)$ and the fact that we can bound by $\prod_i (n_i+1)^{d_i}$ the number of monomials of $f$. Thus we only consider the left inequality. Let $f_{\alpha_1 \cdots \alpha_i} \in \mathbb{C}[X_{i+1}, \dots, X_r]$ denote the coefficient of $f$ with respect to the monomial $X_1^{\alpha_1} \cdots X_i^{\alpha_i}$. Applying inequality (1.1), we obtain for all $(\xi_{i+1}, \dots, \xi_r) \in \mathbb{C}^{n_{i+1} + \cdots + n_r}$,
$$\log|f_{\alpha_1 \cdots \alpha_{i-1}}(X_i, \xi_{i+1}, \dots, \xi_r)| \le m(f_{\alpha_1 \cdots \alpha_{i-1}}(X_i, \xi_{i+1}, \dots, \xi_r)) + \log(n_i+1)\, d_i.$$
We have $|f_{\alpha_1 \cdots \alpha_{i-1}}(X_i, \xi_{i+1}, \dots, \xi_r)| = \max_{\alpha_i} |f_{\alpha_1 \cdots \alpha_i}(\xi_{i+1}, \dots, \xi_r)|$. We integrate both sides of the last inequality on $S_1^{n_{i+1} + \cdots + n_r}$, and we deduce
$$\max\{m(f_{\alpha_1 \cdots \alpha_i}) \,;\ \alpha_i \in \mathbb{Z}^{n_i}\} \le m(f_{\alpha_1 \cdots \alpha_{i-1}}) + \log(n_i+1)\, d_i.$$
We apply this relation recursively, and we obtain
$$\log|f| = \max\{m(f_{\alpha_1 \cdots \alpha_r}) \,;\ \alpha_1 \in \mathbb{Z}^{n_1}, \dots, \alpha_r \in \mathbb{Z}^{n_r}\} \le m(f) + \sum_{i=1}^r \log(n_i+1)\, d_i.$$
Let $f \in \mathbb{C}[X_1, \dots, X_r]$ be a multihomogeneous polynomial in $r$ groups of $n_i + 1$ variables each, and set $f^a$ for a dehomogenization of $f$ with respect to these groups of variables. Then
$$m(f^a) = m(f), \qquad \log|f^a| = \log|f|.$$
Thus the estimates of the preceding lemma also hold for $f$.

Next we introduce the (logarithmic) $S_n$-Mahler measure of a polynomial $f \in \mathbb{C}[x_1, \dots, x_n]$ as
$$m(f; S_n) := \int_{S_n} \log|f(x)| \, \mu_n(x),$$
where $S_n := \{(z_1, \dots, z_n) \in \mathbb{C}^n : |z_1|^2 + \cdots + |z_n|^2 = 1\}$ is the unit sphere in $\mathbb{C}^n$, and $\mu_n$ is the measure of total mass 1, invariant with respect to the unitary group $U(n)$. More generally, let $f \in \mathbb{C}[X_1, \dots, X_r]$ be a polynomial in $r$ groups of $n$ variables each. Its $S_n^r$-Mahler measure is then defined as
$$m(f; S_n^r) := \int_{S_n^r} \log|f(X)| \, \mu_n^r(X)$$
with $S_n^r := S_n \times \cdots \times S_n$. This alternative Mahler measure was introduced by Philippon [49, I]. With this notation the ordinary Mahler measure $m(f)$ of $f \in \mathbb{C}[x_1, \dots, x_n]$ coincides with $m(f; S_1^n)$. When $f \in \mathbb{C}$ is a constant, we agree that $m(f; S_n^0) = \log|f|$.

The $S_n^r$-Mahler measure is related to the ordinary Mahler measure by the following inequalities (see [38, Th. 4]):
$$0 \le m(f) - m(f; S_n^r) \le r\, d \sum_{i=1}^{n-1} \frac{1}{2i}, \tag{1.2}$$
where $d$ is a bound for the degree of $f$ in each group of variables. Finally, we summarize in the following lemma the basic properties of the notion of height of polynomials in $\mathbb{C}_v[x_1, \dots, x_n]$.

LEMMA 1.2
Let $v \in M_{\mathbb{Q}}$ and $f_1, \dots, f_s \in \mathbb{C}_v[x_1, \dots, x_n]$.
(1) If $v = \infty$, then
(a) $h_\infty(\sum_i f_i) \le \max_i \{h_\infty(f_i)\} + \log s$;
(b) $h_\infty(\prod_{i=1}^s f_i) \le \sum_{i=1}^s h_\infty(f_i) + \log(n+1) \sum_{i=1}^{s-1} \deg f_i$; $h_\infty(f_1 f_2) \le h_\infty(f_1) + h_\infty(f_2) + \log(n+1) \min\{\deg f_1, \deg f_2\}$.
(c) Let $g \in \mathbb{C}[y_1, \dots, y_s]$. Set $d := \max_i \{\deg f_i\}$ and $h_\infty := \max_i \{h_\infty(f_i)\}$. Then $h_\infty(g(f_1, \dots, f_s)) \le h_\infty(g) + \deg g \,\big(h_\infty + \log(s+1) + \log(n+1)\, d\big)$;
(d) $\log|\prod_i f_i|_\infty \ge \sum_i \log|f_i|_\infty - 2 \log(n+1) \sum_i \deg f_i$.
(2) If $v = p$ for some prime $p$, then
(a) $h_p(\sum_i f_i) \le \max_i \{h_p(f_i)\}$;
(b) $h_p(\prod_i f_i) \le \sum_i h_p(f_i)$.
(c) Let $g \in \mathbb{C}_p[y_1, \dots, y_s]$. Set $d := \max_i \{\deg f_i\}$ and $h_p := \max_i \{h_p(f_i)\}$. Then $h_p(g(f_1, \dots, f_s)) \le h_p(g) + \deg g \; h_p$;
(d) $\log|\prod_i f_i|_p = \sum_i \log|f_i|_p$.
Proof
The different behavior for $v = \infty$ and $v = p$ is simply due to the fact that $|\cdot|_p$ is non-Archimedean, that is, it verifies the stronger inequality $|a + b|_p \le \max\{|a|_p, |b|_p\}$ for any $a, b \in \mathbb{C}_p$. Inequalities (1.a), (1.b), (2.a), and (2.b) are now immediate from the definition of $h_v$. For (1.c) and (2.c), let us first consider the case $v = \infty$. Set $c(n) := \log(n+1)$. First we compute $h_v(f_1^{\alpha_1} \cdots f_s^{\alpha_s})$ for the exponent $(\alpha_1, \dots, \alpha_s)$ of a monomial of $g$. Applying (1.b), we obtain
$$h_\infty(f_1^{\alpha_1} \cdots f_s^{\alpha_s}) \le \big(c(n)\, d + h_\infty\big) \sum_i \alpha_i \le \big(c(n)\, d + h_\infty\big) \deg g.$$
The polynomial $g$ has at most $(s+1)^{\deg g}$ monomials, and so
$$h_\infty(g(f_1, \dots, f_s)) \le h_\infty(g) + (c(n)\, d + h_\infty) \deg g + c(s) \deg g.$$
The case $v \ne \infty$ follows in a similar way. For (1.d), we apply inequality (1.1) directly:
$$\sum_i \log|f_i|_\infty \le \sum_i m(f_i) + c(n) \sum_i \deg f_i = m\Big(\prod_i f_i\Big) + c(n) \sum_i \deg f_i \le \log\Big|\prod_i f_i\Big|_\infty + 2\, c(n) \sum_i \deg f_i.$$
For (2.d), the Gauss lemma implies that $\sum_i \log|f_i|_p = \log|\prod_i f_i|_p$.
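Two items of Lemma 1.2 are easy to check on a small example with rational coefficients: the Gauss lemma identity (2.d) and the two-factor Archimedean bound in (1.b). The following sketch does so for univariate polynomials ($n = 1$); the helper names and the sample polynomials are assumptions chosen only for illustration.

```python
import math
from fractions import Fraction

def poly_mul(f, g):
    """Multiply univariate polynomials given as coefficient lists (index = degree)."""
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += Fraction(a) * Fraction(b)
    return out

def ord_p(a, p):
    """Order of the prime p in the nonzero rational number a."""
    a = Fraction(a)
    num, den, order = a.numerator, a.denominator, 0
    while num % p == 0:
        num //= p
        order += 1
    while den % p == 0:
        den //= p
        order -= 1
    return order

def log_abs_p(f, p):
    """log |f|_p: log of the largest p-adic absolute value among nonzero coefficients."""
    return max(-ord_p(c, p) for c in f if c != 0) * math.log(p)

def height_inf(f):
    """h_inf(f) = max(0, log of the largest ordinary absolute value of a coefficient)."""
    return max(0.0, math.log(max(abs(Fraction(c)) for c in f)))

# Sample polynomials (assumptions): f = 2x^2 + 3x + 1/2 and g = 6x^2 + (2/3)x + 4.
f = [Fraction(1, 2), 3, 2]
g = [4, Fraction(2, 3), 6]
fg = poly_mul(f, g)

# (2.d): log|fg|_p - log|f|_p - log|g|_p should vanish for every prime p.
gauss_gap = {p: log_abs_p(fg, p) - log_abs_p(f, p) - log_abs_p(g, p) for p in (2, 3, 5)}

# (1.b) with two factors and n = 1: h(fg) <= h(f) + h(g) + log(2) * min(deg f, deg g).
arch_bound = height_inf(f) + height_inf(g) + math.log(2) * min(len(f) - 1, len(g) - 1)
```

The primes 2 and 3 are the interesting ones here, since they are the only primes dividing a denominator of the sample coefficients.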
We make frequent use of the following particular case of the previous lemma: Let $(f_{ij})_{ij}$ be an $(s \times s)$-matrix of polynomials in $\mathbb{C}_v[x_1, \dots, x_n]$ of degrees and heights bounded by $d$ and $h_v$, respectively. From Lemma 1.2(a,b) we obtain
• $h_\infty(\det(f_{ij})_{ij}) \le s\,(h_\infty + \log s + d \log(n+1))$,
• $h_p(\det(f_{ij})_{ij}) \le s\, h_p$.

1.1.2. Height of polynomials over a number field
The set $M_K$ of absolute values over $K$ which extend the absolute values in $M_{\mathbb{Q}}$ is called the canonical set. We denote by $M_K^\infty$ the set of Archimedean absolute values in $M_K$, that is, the absolute values extending $\infty$. If $v \in M_K$ extends an absolute value $v_0 \in M_{\mathbb{Q}}$ (which is denoted by $v \mid v_0$), there exists a (not necessarily unique) embedding $\sigma_v : K \hookrightarrow \mathbb{C}_{v_0}$ corresponding to $v$, that is, such that $|a|_v = |\sigma_v(a)|_{v_0}$ for every $a \in K$. In the $p$-adic case, there is a one-to-one correspondence $\mathcal{P} \mapsto v(\mathcal{P})$ between prime ideals of $\mathcal{O}_K$ which divide $p$ and absolute values extending $p$, defined by
$$|a|_{v(\mathcal{P})} := p^{-\mathrm{ord}_{\mathcal{P}}(a)/e_{\mathcal{P}}} = N(\mathcal{P})^{-\mathrm{ord}_{\mathcal{P}}(a)/(e_{\mathcal{P}} f_{\mathcal{P}})}$$
for $a \in K^*$. Here $\mathrm{ord}_{\mathcal{P}}(a)$ denotes the order of $\mathcal{P}$ in the factorization of $a$, and $N(\mathcal{P})$ denotes the norm of the ideal $\mathcal{P}$. Also, $e_{\mathcal{P}} := \mathrm{ord}_{\mathcal{P}}(p)$ denotes the ramification index, and $f_{\mathcal{P}} := [\mathcal{O}_K/\mathcal{P} : \mathbb{Z}/(p)]$ denotes the residual degree of the prime ideal $\mathcal{P}$. Note that $a \in \mathcal{O}_K$ if and only if $\log|a|_v \le 0$ for every $v \in M_K \setminus M_K^\infty$. We denote by $K_v$ the completion of $K$ in $\mathbb{C}_{v_0}$. The local degree of $K$ at $v$ is defined as $N_v := [K_v : \mathbb{Q}_{v_0}]$, and it coincides with the number of different embeddings $\sigma : K \hookrightarrow \mathbb{C}_{v_0}$ which correspond to $v$. When $v$ is Archimedean, $K_v$ is either $\mathbb{R}$ or $\mathbb{C}$, and $N_v$ equals 1 or 2 accordingly. When $v$ is non-Archimedean, $N_v = e_{\mathcal{P}} f_{\mathcal{P}}$, where $\mathcal{P}$ is the prime ideal that corresponds to $v$. In any case,
$$[K : \mathbb{Q}] = \sum_{v \mid v_0} N_v$$
for $v_0 \in M_{\mathbb{Q}}$. The canonical set $M_K$ satisfies the product formula with multiplicities $N_v$:
$$\prod_{v \in M_K} |a|_v^{N_v} = 1, \qquad \forall\, a \in K^*. \tag{1.3}$$
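In the simplest case $K = \mathbb{Q}$, where every multiplicity $N_v$ equals 1, the product formula (1.3) can be verified with exact rational arithmetic. The sketch below is an illustration only; the helper names and the sample number are assumptions.

```python
from fractions import Fraction

def prime_factors(n):
    """Return the set of primes dividing the positive integer n (trial division)."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors

def p_adic_abs(a, p):
    """|a|_p = p**(-ord_p(a)) for a nonzero rational a, as an exact Fraction."""
    a = Fraction(a)
    num, den, order = a.numerator, a.denominator, 0
    while num % p == 0:
        num //= p
        order += 1
    while den % p == 0:
        den //= p
        order -= 1
    return Fraction(1, p) ** order

def product_formula(a):
    """Product of |a|_v over the Archimedean place and all primes dividing a (a nonzero)."""
    a = Fraction(a)
    primes = prime_factors(abs(a.numerator)) | prime_factors(a.denominator)
    result = abs(a)
    for p in primes:
        result *= p_adic_abs(a, p)
    return result

# Sample value (an assumption): a = -360/77 = -(2^3 * 3^2 * 5) / (7 * 11).
check = product_formula(Fraction(-360, 77))
```

Only the primes dividing the numerator or the denominator contribute, since $|a|_p = 1$ for every other prime.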
Let $\mathcal{A} \subset K$ be a finite set. Let $v \in M_K$ be an absolute value that extends $v_0 \in M_{\mathbb{Q}}$, and let $\sigma_v$ be an embedding corresponding to $v$. The local absolute value of $\mathcal{A}$ at $v$ is defined as
$$|\mathcal{A}|_v := |\sigma_v(\mathcal{A})|_{v_0} = \max\{|\sigma_v(a)|_{v_0},\ a \in \mathcal{A}\}.$$
Then we define the local height of $\mathcal{A}$ as
$$h_v(\mathcal{A}) := \max\{0, \log|\mathcal{A}|_v\} = h_{v_0}(\sigma_v(\mathcal{A})).$$
We note that this notion behaves well with respect to extensions. Let $K \hookrightarrow L$ be a finite extension, and let $w \in M_L$ be an absolute value extending $v$. Then $h_v(\mathcal{A}) = h_w(\mathcal{A})$.

For a polynomial $f = \sum_\alpha a_\alpha x^\alpha \in K[x_1, \dots, x_n]$, we define the local absolute value of $f$ at $v$ (denoted by $|f|_v$) as the absolute value at $v$ of its set of coefficients, and the local height of $f$ at $v$ (denoted by $h_v(f)$) as the local height at $v$ of its set of coefficients. Finally, we define the (global) height of a finite set $\mathcal{A} \subset K$ as
$$h(\mathcal{A}) := \frac{1}{[K : \mathbb{Q}]} \sum_{v \in M_K} N_v\, h_v(\mathcal{A}).$$
In classical terms this is the affine height of $\mathcal{A}$; if we set $\mathcal{A} := \{a_1, \dots, a_N\}$, then $h(\mathcal{A})$ equals the Weil absolute height of the point $(1 : a_1 : \cdots : a_N) \in \mathbb{P}^N$. Because of the imposed normalization, this quantity does not depend on the field $K$ in which we consider the set $\mathcal{A}$. This allows us to extend the definition of $h$ to subsets of $\overline{\mathbb{Q}}$. We also define the (global) height of $f_1, \dots, f_s \in K[x_1, \dots, x_n]$ as the global height of its set of coefficients; that is,
$$h(f_1, \dots, f_s) := \frac{1}{[K : \mathbb{Q}]} \sum_{v \in M_K} N_v \max_i h_v(f_i). \tag{1.4}$$
We have $h_v(a) \le h_v(\mathcal{A})$ for every $a \in \mathcal{A}$ and every $v \in M_K$, and so
$$\max_{a \in \mathcal{A}} h(a) \le h(\mathcal{A}).$$
In case $\mathcal{A} \subset \mathcal{O}_K$, we have that $h_v(\mathcal{A}) = 0$ for every $v \in M_K \setminus M_K^\infty$, and so $h(\mathcal{A}) = (1/[K : \mathbb{Q}]) \sum_{v \in M_K^\infty} N_v\, h_v(\mathcal{A})$. We also have $h_v(\mathcal{A}) \le [K : \mathbb{Q}] \max_{a \in \mathcal{A}} h(a)$ for all $v \in M_K$ and hence
$$h(\mathcal{A}) \le [K : \mathbb{Q}] \max_{a \in \mathcal{A}} h(a).$$
Both inequalities are sharp. Equality is attained in the first one when, for instance, $\mathcal{A}$ has only one element.
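For the second one, set $\mathcal{A} = \{1 + \sqrt{2}, 1 - \sqrt{2}\} \subset \mathbb{Q}(\sqrt{2})$. Then $h(\mathcal{A}) = \log(1 + \sqrt{2})$ while $h(1 + \sqrt{2}) = h(1 - \sqrt{2}) = (1/2) \log(1 + \sqrt{2})$. Hence $h(\mathcal{A}) = 2 \max_{a \in \mathcal{A}} h(a)$. More generally, if $a \in \mathbb{C}$ is a Pisot number, namely, an algebraic integer such that $|a| > 1$ and all its conjugates lie inside the unit disk, and $K := \mathbb{Q}(a)$ is Galois, then, for $\mathcal{A} := \{\sigma(a) : \sigma \in \mathrm{Gal}(K/\mathbb{Q})\} \subset K$, we have $h(\mathcal{A}) = [K : \mathbb{Q}]\, h(a)$.

Let $a = m/n \in \mathbb{Q}^*$ be a rational number, where $m \in \mathbb{Z}$ and $n \in \mathbb{N}$ are coprime. Then $h(a) = \log \max\{|m|, n\}$, that is, the height of $a$ controls the size of both the minimal numerator and the minimal denominator of $a$. More generally, let $\mathcal{A} \subset \mathbb{Q}$ be a finite set, and let $b \in \mathbb{N}$ be a minimal common denominator for all the elements of $\mathcal{A}$. Then $h(\mathcal{A}) = \log \max\{|b\,\mathcal{A}|, b\}$. The following is the analogous statement for the general case.

LEMMA 1.3
Let $\mathcal{A} \subset K$ be a finite set. Then there exist $b \in \mathbb{Z} \setminus \{0\}$ and $\mathcal{B} \subset \mathcal{O}_K$ such that
$$b\,\mathcal{A} = \mathcal{B}, \qquad h(\mathcal{A}) \le h(\{b\} \cup \mathcal{B}) \le [K : \mathbb{Q}]\, h(\mathcal{A}).$$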
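For $K = \mathbb{Q}$ the formula $h(\mathcal{A}) = \log \max\{|b\,\mathcal{A}|, b\}$ stated before the lemma can be checked by computing the height in two ways: as the sum of the local heights over all places, and through the minimal common denominator. The sketch below is a minimal illustration; all function names and the sample set are assumptions, and `math.lcm` requires Python 3.9 or later.

```python
import math
from fractions import Fraction

def prime_factors(n):
    """Return the set of primes dividing the positive integer n (trial division)."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors

def ord_p(a, p):
    """Order of the prime p in the nonzero rational number a."""
    num, den, order = a.numerator, a.denominator, 0
    while num % p == 0:
        num //= p
        order += 1
    while den % p == 0:
        den //= p
        order -= 1
    return order

def global_height(values):
    """h(A) for a finite set of rationals, as the sum of the local heights h_v(A)."""
    values = [Fraction(a) for a in values if a != 0]
    if not values:
        return 0.0
    # Archimedean place: max(0, log max |a|).
    height = max(0.0, math.log(max(abs(a) for a in values)))
    # Only primes dividing some denominator can give a positive local height.
    primes = set()
    for a in values:
        primes |= prime_factors(a.denominator)
    for p in primes:
        height += max(0, max(-ord_p(a, p) for a in values)) * math.log(p)
    return height

def height_via_denominator(values):
    """log max{|bA|, b}, where b is the least common denominator of A."""
    values = [Fraction(a) for a in values]
    b = math.lcm(*(a.denominator for a in values))
    return math.log(max(b, *(abs(b * a) for a in values)))

# Sample set (an assumption): A = {3/4, -5/6, 7}, so b = 12 and bA = {9, -10, 84}.
A = [Fraction(3, 4), Fraction(-5, 6), 7]
h_local = global_height(A)
h_denominator = height_via_denominator(A)
```

For this sample set the Archimedean place contributes $\log 7$, the prime 2 contributes $2 \log 2$, and the prime 3 contributes $\log 3$, which adds up to $\log 84$.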
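Proof
Let $v \in M_K \setminus M_K^\infty$, and set $\mathcal{P}$ for the corresponding prime ideal of $\mathcal{O}_K$. Let $a_v \in \mathcal{A}$ such that $h_v(\mathcal{A}) = h_v(a_v)$, and set $c(\mathcal{P}) = \max\{0, -\mathrm{ord}_{\mathcal{P}}(a_v)\}$. Then $\mathrm{ord}_{\mathcal{P}}(a) \ge -c(\mathcal{P})$ for every $a \in \mathcal{A}$, and $h_v(\mathcal{A}) = c(\mathcal{P}) \log N(\mathcal{P}) / (e_{\mathcal{P}} f_{\mathcal{P}})$. Set
$$b := \prod_{\mathcal{P}} N(\mathcal{P})^{c(\mathcal{P})}, \qquad \mathcal{B} := \{b\, a \,;\ a \in \mathcal{A}\},$$
where $\mathcal{P}$ runs over all prime ideals of $\mathcal{O}_K$. Clearly $b \in \mathbb{N} \setminus \{0\}$. We have $\mathrm{ord}_{\mathcal{P}}(b) = e_{\mathcal{P}} f_{\mathcal{P}}\, c(\mathcal{P})$, and so $\mathrm{ord}_{\mathcal{P}}(b\, a) \ge e_{\mathcal{P}} f_{\mathcal{P}}\, c(\mathcal{P}) - c(\mathcal{P}) \ge 0$ for every $a \in \mathcal{A}$. Hence $\mathcal{B} \subset \mathcal{O}_K$. For $v \in M_K^\infty$ we have $h_v(\{b\} \cup \mathcal{B}) = h_v(\mathcal{A}) + \log b$, and so
$$h(\{b\} \cup \mathcal{B}) = \frac{1}{[K : \mathbb{Q}]} \sum_{v \in M_K^\infty} N_v\, h_v(\{b\} \cup \mathcal{B}) = \frac{1}{[K : \mathbb{Q}]} \sum_{v \in M_K^\infty} N_v\, h_v(\mathcal{A}) + \log b.$$
We have
$$\log b = \sum_{\mathcal{P}} c(\mathcal{P}) \log N(\mathcal{P}) = \sum_{v \notin M_K^\infty} N_v\, h_v(\mathcal{A}),$$
and therefore
$$h(\{b\} \cup \mathcal{B}) = \frac{1}{[K : \mathbb{Q}]} \sum_{v \in M_K^\infty} N_v\, h_v(\mathcal{A}) + \sum_{v \notin M_K^\infty} N_v\, h_v(\mathcal{A}) \le [K : \mathbb{Q}]\, h(\mathcal{A}).$$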
On the other hand, we have $h_v(\mathcal{A}) + \log|b|_v \le h_v(\{b\} \cup \mathcal{B})$ for all $v \in M_K$. Applying the product formula (1.3) to $b$, we obtain
$$h(\mathcal{A}) = \frac{1}{[K : \mathbb{Q}]} \sum_v N_v \big(h_v(\mathcal{A}) + \log|b|_v\big) \le \frac{1}{[K : \mathbb{Q}]} \sum_v N_v\, h_v(\{b\} \cup \mathcal{B}) = h(\{b\} \cup \mathcal{B}).$$

Finally, let $a \in \overline{\mathbb{Q}}^*$ be a nonzero algebraic number, and set $p_a \in \mathbb{Z}[t]$ for its primitive minimal polynomial. We have $h(a) = m(p_a) / \deg a$. More generally, the height of a finite set can be seen as the height of the minimal polynomial of a generic linear combination of its elements. This gives a partial motivation for the notion of global height of a finite set.

LEMMA 1.4
Let $\mathcal{A} := \{a_1, \dots, a_N\} \subset K$ be a finite set, and set
$$p_{\mathcal{A}} := \prod_\sigma \big(u_0 + \sigma(a_1)\, u_1 + \cdots + \sigma(a_N)\, u_N\big) \in \mathbb{Q}[u_0, \dots, u_N],$$
where the product is taken over all $\mathbb{Q}$-embeddings $\sigma : K \hookrightarrow \overline{\mathbb{Q}}$. Then
$$-\log(N+1) \le h(\mathcal{A}) - h(p_{\mathcal{A}}) / [K : \mathbb{Q}] \le \log(N+1).$$

Proof
Set $L(u) := u_0 + a_1 u_1 + \cdots + a_N u_N \in K[u]$, so that
$$p_{\mathcal{A}} = \prod_\sigma \sigma(L).$$
For $v_0 \in M_{\mathbb{Q}}$ we choose an inclusion $\overline{\mathbb{Q}} \hookrightarrow \mathbb{C}_{v_0}$. Then for each $v \in M_K$ such that $v \mid v_0$ there are $N_v$ embeddings $\sigma : K \hookrightarrow \overline{\mathbb{Q}}$ which correspond to it. We note that for each such $\sigma$, $\log|\sigma(L)| = h_v(\mathcal{A})$ holds. Applying Lemma 1.2(b), we obtain
$$h_\infty(p_{\mathcal{A}}) \le \sum_\sigma \log|\sigma(L)| + [K : \mathbb{Q}] \log(N+1) = \sum_{v \in M_K^\infty} N_v\, h_v(\mathcal{A}) + [K : \mathbb{Q}] \log(N+1).$$
In the same way we obtain $h_p(p_{\mathcal{A}}) \le \sum_{v \mid p} N_v\, h_v(\mathcal{A})$ for $p$ prime, and hence
$$h(p_{\mathcal{A}}) \le \sum_{v \in M_K} N_v\, h_v(\mathcal{A}) + [K : \mathbb{Q}] \log(N+1) = [K : \mathbb{Q}] \big(h(\mathcal{A}) + \log(N+1)\big).$$
On the other hand, $\log|\sigma(L)| \le m(\sigma(L))$ for every $\sigma$, as $L$ has total degree 1. Thus
$$[K : \mathbb{Q}]\, h(\mathcal{A}) = \sum_{v \in M_K} N_v\, h_v(\mathcal{A}) = \sum_\sigma \log|\sigma(L)|_\infty + \sum_p \sum_\sigma \log|\sigma(L)|_p \le m(p_{\mathcal{A}}) + \sum_p h_p(p_{\mathcal{A}}) \le h(p_{\mathcal{A}}) + [K : \mathbb{Q}] \log(N+1)$$
by application of Lemma 1.2(d) and inequality (1.1), and the definition of the height.
1.2. Height of varieties
In this section we introduce the notions of local and global height of an affine variety defined over a number field. For this aim we recall the basic facts of the degree and Chow form of varieties. As an important particular case, we study the height of an affine toric variety.

1.2.1. Degree of varieties
Let $k$ be an arbitrary field, and let $V \subset \mathbb{A}^n$ be an affine equidimensional variety of dimension $r$. We recall that the degree of $V$ is defined as the number of points in the intersection of $V$ with a generic linear variety of dimension $n - r$. This coincides with the sum of the degrees of its irreducible components. For an arbitrary variety $V \subset \mathbb{A}^n$ we set $V = \cup_i V_i$ for its decomposition into equidimensional varieties. Following Heintz [23], we define the degree of $V$ as
$$\deg V := \sum_i \deg V_i.$$
For $V = \emptyset$ we agree $\deg V := 1$. This is a positive integer, and we have $\deg V = 1$ if and only if $V$ is a linear variety. The degree of a hypersurface equals the degree of any generator of its defining ideal. The degree of a finite variety equals its cardinality. For a linear morphism $\varphi : \mathbb{A}^n \to \mathbb{A}^m$ and a variety $V \subset \mathbb{A}^n$, we have $\deg \overline{\varphi(V)} \le \deg V$, where $\overline{\varphi(V)}$ denotes the Zariski closure of $\varphi(V)$ in $\mathbb{A}^m$. The basic aspect of this notion of degree is its behavior with respect to intersections. It verifies the Bézout inequality
$$\deg(V \cap W) \le \deg V \deg W$$
for $V, W \subset \mathbb{A}^n$, without any restriction on the intersection type of $V$ and $W$ (see [23, Th. 1], [14, Exam. 8.4.6]).

1.2.2. Normalization of Chow forms
Let $V \subset \mathbb{A}^n$ be an affine equidimensional variety of dimension $r$ defined over a field $k$. Let $F_V$ be a Chow form of $V$, that is, a Chow form of its projective closure $\overline{V} \subset \mathbb{P}^n$. This is a squarefree polynomial over $k$ in $r + 1$ groups $U_0, \dots, U_r$ of $n + 1$ variables each. It is multihomogeneous of degree $D := \deg V$ in each group of variables, and it is uniquely determined up to a scalar factor. In the case when $V$ is irreducible, $F_V$ is an irreducible polynomial, and in the general case of an equidimensional variety, the product of Chow forms of its irreducible components is a Chow form of $V$. In order to avoid this indeterminacy of $F_V$, we fix one of its coefficients under a technical assumption on the variety $V$. For purpose of reference, we state it in the following assumption.

ASSUMPTION 1.5
We assume that the projection $\pi_V : V \to \mathbb{A}^r$ defined by $x \mapsto (x_1, \dots, x_r)$ verifies $\#\pi_V^{-1}(0) = \deg V$.

This assumption implies that $\pi_V : V \to \mathbb{A}^r$ is a dominant map of degree $\deg V$, by the theorem of dimension of fibers. Later on, we prove that in fact this assumption implies that the projection $\pi_V$ is finite, that is, the variables $x_1, \dots, x_r$ are in Noether normal position with respect to $V$ (Lemma 2.14). We remark that the previous condition is satisfied by any variety under a generic linear change of variables. Each group of variables $U_i$ is associated to the coefficients of a generic linear form $L_i(U_i) := U_{i0} + U_{i1} x_1 + \cdots + U_{in} x_n$. The main feature of a Chow form is that
$$F_V(\nu_0, \dots, \nu_r) = 0 \iff \overline{V} \cap \{L_0^h(\nu_0) = 0\} \cap \cdots \cap \{L_r^h(\nu_r) = 0\} \ne \emptyset$$
holds for $\nu_i \in \bar{k}^{n+1}$. Here $L_i^h := U_{i0} x_0 + \cdots + U_{in} x_n$ stands for the homogenization of $L_i$. Assumption 1.5 implies that $\overline{V} \cap \{x_1 = 0\} \cap \cdots \cap \{x_r = 0\}$ is a zero-dimensional variety of $\mathbb{P}^n$ lying in the affine space $\{x_0 \ne 0\}$. Set $e_i$ for the $(i+1)$-vector of the canonical basis of $k^{n+1}$. Then $F_V(e_0, \dots, e_r)$, that is, the coefficient of the monomial $U_{00}^D \cdots U_{rr}^D$, is nonzero. We then define the (normalized) Chow form $Ch_V$ of $V$ by fixing the choice of $F_V$ through the condition $Ch_V(e_0, \dots, e_r) = 1$. Under this normalization, $Ch_V$ equals the product of the normalized Chow forms of the irreducible components of $V$.
1.2.3. Height of varieties over $\mathbb{C}_v$
Let $v \in \{\infty, p;\ p \text{ prime}\}$ be an absolute value over $\mathbb{Q}$, and let $V \subset \mathbb{A}^n(\mathbb{C}_v)$ be an equidimensional variety of dimension $r$ which satisfies Assumption 1.5. We introduce the height of $V$ as a Mahler measure of its normalized Chow form.

Definition 1.6
The height of the affine variety $V \subset \mathbb{A}^n(\mathbb{C}_v)$ is defined as
\[ h_v(V) := m(\mathrm{Ch}_V; S_{n+1}^{r+1}) + (r+1) \Big( \sum_{i=1}^{n} \frac{1}{2i} \Big) \deg V \]
in case $v = \infty$ and as $h_v(V) := h_v(\mathrm{Ch}_V)$ in case $v = p$ for some prime $p$.

This definition coincides in the non-Archimedean cases with the local height of $\overline{V} \subset \mathbb{P}^n$ with respect to the divisors $\mathrm{div}(x_0), \dots, \mathrm{div}(x_r) \in \mathrm{Div}(\mathbb{P}^n)$ as it is introduced in [20, Sec. 9]. In general, it is also closely related to Philippon's local height of a projective variety (see [49, II]).

Let us consider some examples:
• We have that $h_\infty(\mathbb{A}^n(\mathbb{C}))$ equals the Stoll number $\sum_{i=1}^{n} \sum_{j=1}^{i} \frac{1}{2j}$, while $h_p(\mathbb{A}^n(\mathbb{C}_p)) = 0$. This follows from [5, Lem. 3.3.1], [55, Th. 3], [49, I, Th. 2], and the fact that $\mathrm{Ch}_{\mathbb{A}^n} = \det(U_0, \dots, U_n)$.
• Let $V \subset \mathbb{A}^n(\mathbb{C}_v)$ be a hypersurface verifying Assumption 1.5, defined by a squarefree polynomial $f \in \mathbb{C}_v[x_1, \dots, x_n]$. Then the coefficient of the monomial $x_n^{\deg V}$ is nonzero, and we can suppose without loss of generality that it equals 1. Let $f^h$ denote the homogenization of $f$. Then
\[ h_v(V) = m(f^h; S_{n+1}) + \Big( \sum_{i=1}^{n-1} \sum_{j=1}^{i} \frac{1}{2j} \Big) \deg V \]
in case $v = \infty$, while in case $v = p$ for some prime $p$, $h_v(V) = h_v(f)$ (see [49, I, Cor. 4]).
• In case $V = \{\xi\}$ for some $\xi \in \mathbb{A}^n$, we have (see, e.g., [49, I, Prop. 4])
\[ h_\infty(V) = \frac{1}{2} \log(1 + |\xi_1|^2 + \cdots + |\xi_n|^2), \qquad h_p(V) = h_p(\xi). \]
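The first and third examples are explicit enough to evaluate directly. The following is a minimal sketch under that reading; the function names `stoll_number` and `archimedean_height_of_point` are illustrative and do not come from the paper.

```python
from fractions import Fraction
from math import log


def stoll_number(n):
    """Exact value of sum_{i=1}^{n} sum_{j=1}^{i} 1/(2j), i.e. h_infty(A^n(C))."""
    return sum(Fraction(1, 2 * j) for i in range(1, n + 1) for j in range(1, i + 1))


def archimedean_height_of_point(xi):
    """h_infty({xi}) = (1/2) log(1 + |xi_1|^2 + ... + |xi_n|^2) for a point xi in C^n."""
    return 0.5 * log(1 + sum(abs(z) ** 2 for z in xi))


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, stoll_number(n))
    print(archimedean_height_of_point([1, 1j]))
```

Exact rational arithmetic is used for the Stoll number so that small cases can be compared by hand; the point height is a floating-point value.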
SHARP ESTIMATES FOR THE ARITHMETIC NULLSTELLENSATZ
1.2.4. Height of varieties over a number field
Let $V \subset \mathbb{A}^n(\overline{\mathbb{Q}})$ be an equidimensional variety of dimension $r$ defined over a number field $K$. We define the (global) height $h(V)$ of $V$ as the Faltings height (see [11]) of its projective closure $\overline{V} \subset \mathbb{P}^n$. It verifies the identity
\[ h(V) = \frac{1}{[K:\mathbb{Q}]} \Big( \sum_{v \in M_K^\infty} N_v \, m(\sigma_v(F_V); S_{n+1}^{r+1}) + \sum_{v \notin M_K^\infty} N_v \log |F_V|_v \Big) + (r+1) \Big( \sum_{i=1}^{n} \frac{1}{2i} \Big) \deg V, \]
where $F_V$ denotes any Chow form of $V$ (see [55, Th. 3], [49, I, Th. 2]). Following Philippon [49, III], we introduce $h$ through this identity without appealing to Arakelov theory.

For an arbitrary affine variety, we define its (global) height as the sum of the heights of its equidimensional components. It coincides with the sum of the heights of its irreducible components. We agree that $h(\emptyset) := 0$.

We also introduce the local counterpart of this notion. Let $v \in M_K$ be an absolute value over $K$, and suppose that $V$ satisfies Assumption 1.5. Let $v_0 \in M_{\mathbb{Q}}$ such that $v \mid v_0$, and let $\sigma_v : K_v \to \mathbb{C}_{v_0}$ be an embedding corresponding to $v$. We define the local height of $V$ at $v$ as $h_v(V) := h_{v_0}(\sigma_v(V))$. This is consistent with the global height:
\[ h(V) = \frac{1}{[K:\mathbb{Q}]} \sum_{v \in M_K} N_v \, h_v(V). \]

The global height $h$ is related to the height $h_{\mathrm{BGS}}$ of Bost, Gillet, and Soulé by the formula
\[ h(V) = h_{\mathrm{BGS}}(V) + \Big( \sum_{i=1}^{r} \sum_{j=1}^{i} \frac{1}{2j} \Big) \deg V \]
(see [5, Prop. 4.1.2 (i)]). It is also related to the height introduced in [17] in terms of the so-called geometric solution of a variety. The two heights are polynomially equivalent (see [54, Th. 1.3.26]); namely, each of them is bounded above by $(n \deg V \cdot H)^c$, where $H$ stands for the other one, for some constant $c > 0$.

We have $h(V) \geq \big( \sum_{i=1}^{r} \sum_{j=1}^{i} \frac{1}{2j} \big) \deg V$, with equality only in the case when $V$ is defined by the vanishing of $n - r$ standard coordinates (see [5, Th. 5.2.3]). For instance, $h(\mathbb{A}^n) = \sum_{i=1}^{n} \sum_{j=1}^{i} \frac{1}{2j}$. In particular, $h(V) \geq 0$.
This notion of height satisfies the arithmetic Bézout inequality (see [5, Th. 5.5.1 (iii)], [49, III, Th. 3])
\[ h(V \cap W) \leq h(V) \deg W + \deg V \, h(W) + c \deg V \deg W \]
for $V, W \subset \mathbb{A}^n(\overline{\mathbb{Q}})$, with
\[ c := \Big( \sum_{i=0}^{\dim V} \sum_{j=0}^{\dim W} \frac{1}{2(i+j+1)} \Big) + \Big( n - \frac{\dim V + \dim W}{2} \Big) \log 2. \]
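For orientation, the constant $c$ can be evaluated numerically. The sketch below simply transcribes the formula as reconstructed above; the name `bezout_constant` is illustrative.

```python
from math import log


def bezout_constant(dim_v, dim_w, n):
    """c = sum_{i=0}^{dim V} sum_{j=0}^{dim W} 1/(2(i+j+1)) + (n - (dim V + dim W)/2) log 2."""
    double_sum = sum(
        1.0 / (2 * (i + j + 1))
        for i in range(dim_v + 1)
        for j in range(dim_w + 1)
    )
    return double_sum + (n - (dim_v + dim_w) / 2.0) * log(2)


if __name__ == "__main__":
    # Two curves in A^3, then two points in A^2.
    print(bezout_constant(1, 1, 3))
    print(bezout_constant(0, 0, 2))
```

For two zero-dimensional varieties the double sum reduces to the single term $1/2$, so $c = 1/2 + n \log 2$.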
1.2.5. Height of affine toric varieties
Now we consider the case of affine toric varieties. The obtained height estimate is crucial in our treatment of the sparse arithmetic Nullstellensatz (see Corollary 4.12). In what follows we recall some basic notation and results of affine toric varieties and sparse resultants. References are [15] and [57].

Let $\mathcal{A} = \{\alpha_1, \dots, \alpha_N\} \subset \mathbb{Z}^n$ be a finite set of integer vectors. Let $r := \dim \mathcal{A}$ denote the dimension of $\mathcal{A}$, that is, the dimension of the free $\mathbb{Z}$-module $\mathbb{Z}\mathcal{A}$. We normalize the volume form of $\mathbb{R}\mathcal{A}$ in order that any elementary simplex of the lattice $\mathbb{Z}\mathcal{A}$ have volume 1. The (normalized) volume $\mathrm{Vol}(\mathcal{A})$ of $\mathcal{A}$ is defined as the volume of the convex hull $\mathrm{Conv}(\{0\} \cup \mathcal{A})$ with respect to this volume form. In case $\mathbb{Z}\mathcal{A} = \mathbb{Z}^n$, $\mathrm{Vol}(\mathcal{A})$ then equals $n!$ times the volume of $\mathrm{Conv}(\{0\} \cup \mathcal{A})$ with respect to the Euclidean volume form of $\mathbb{R}^n$.

We associate to the set $\mathcal{A}$ a map $(\overline{\mathbb{Q}}^*)^n \to \overline{\mathbb{Q}}^N$ defined by $\xi \mapsto (\xi^{\alpha_1}, \dots, \xi^{\alpha_N})$. The Zariski closure of the image of this map is the affine toric variety $X_{\mathcal{A}} \subset \mathbb{A}^N$. This is an irreducible variety of dimension $r$ and degree $\mathrm{Vol}(\mathcal{A})$.

For $i = 0, \dots, r$, we denote by $U_i$ a group of variables indexed by the elements of $\mathcal{A}$ and we set
\[ F_i := \sum_{\alpha \in \mathcal{A}} U_{i\alpha} \, x^\alpha \]
for the generic Laurent polynomial with support contained in $\mathcal{A}$. Let $W \subset (\mathbb{P}^{N-1})^{r+1} \times (\overline{\mathbb{Q}}^*)^n$ be the incidence variety of $F_0, \dots, F_r$ in $(\overline{\mathbb{Q}}^*)^n$, that is,
\[ W = \{ (\nu_0, \dots, \nu_r; \xi) \ ;\ F_i(\nu_i)(\xi) = 0 \ \ \forall i \}, \]
and let $\pi : (\mathbb{P}^{N-1})^{r+1} \times (\overline{\mathbb{Q}}^*)^n \to (\mathbb{P}^{N-1})^{r+1}$ be the canonical projection. Then $\overline{\pi(W)}$ is an irreducible variety of codimension 1. Any of its defining polynomials $R_{\mathcal{A}} \in \mathbb{Q}[U_0, \dots, U_r]$ is called the $\mathcal{A}$-resultant or sparse resultant, and it coincides with a Chow form of the affine toric variety $X_{\mathcal{A}}$ (see [27]). It is a multihomogeneous polynomial of degree $\mathrm{Vol}(\mathcal{A})$ in each group of variables, and it is uniquely defined up to its sign, if we assume it to be a primitive polynomial with integer coefficients.

We obtain the following bound for the height of $X_{\mathcal{A}}$. Our argument relies on the Canny-Emiris determinantal formula for the sparse resultant (see [8]).
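To make the normalized volume concrete, here is a small sketch restricted to the planar case $n = 2$ and assuming $\mathbb{Z}\mathcal{A} = \mathbb{Z}^2$, so that $\mathrm{Vol}(\mathcal{A})$ is $2!$ times the Euclidean area of $\mathrm{Conv}(\{0\} \cup \mathcal{A})$. The helper names and the choice of a monotone-chain convex hull are illustrative and not taken from the paper.

```python
def convex_hull(points):
    """Vertices of the convex hull of planar integer points, counterclockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def normalized_volume_2d(support):
    """Vol(A) = 2 * area(Conv({0} u A)), assuming Z A = Z^2 (hypothesis of this sketch)."""
    hull = convex_hull([(0, 0)] + list(support))
    twice_area = 0
    for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]):
        twice_area += x1 * y2 - x2 * y1  # shoelace formula
    return abs(twice_area)


if __name__ == "__main__":
    # A = {(1,0), (0,1), (1,1)}: X_A is the closure of {(s, t, st)}, the quadric z = xy.
    print(normalized_volume_2d([(1, 0), (0, 1), (1, 1)]))
```

In the example, the hull is the unit square, so the normalized volume is 2, which matches the degree of the quadric surface $z = xy$.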
PROPOSITION 1.7
Let $\mathcal{A} \subset \mathbb{Z}^n$ be a finite set of dimension $r$ and cardinality $\#\mathcal{A} \geq 2$. Then
\[ h(X_{\mathcal{A}}) \leq 2^{2r+2} \log(\#\mathcal{A}) \, \mathrm{Vol}(\mathcal{A}). \]

Proof
Let $R_{\mathcal{A}}$ denote the $\mathcal{A}$-resultant, which we assume to be primitive with integer coefficients. Thus
\[ h(X_{\mathcal{A}}) = m(R_{\mathcal{A}}; S_{N+1}^{r+1}) + (r+1) \Big( \sum_{i=1}^{N} \frac{1}{2i} \Big) \mathrm{Vol}(\mathcal{A}), \]
and so it suffices to estimate the $S_{N+1}^{r+1}$-Mahler measure of $R_{\mathcal{A}}$.

Let $\mathcal{M}$ be the Canny-Emiris matrix associated to the generic polynomial system $F_0, \dots, F_r$. This is a nonsingular square matrix of order $M$, where $M$ denotes the cardinality of the set
\[ \mathcal{E} := \big( (r+1) Q + \varepsilon \big) \cap \mathbb{Z}^n. \]
Here $Q := \mathrm{Conv}(\{0\} \cup \mathcal{A})$, and $\varepsilon \in \mathbb{R}^n$ is any vector such that each point in $\mathcal{E}$ is contained in the interior of a cell in a given triangulation of the polytope $(r+1) Q$. In particular, $\varepsilon$ can be arbitrarily chosen in a nonempty open set of $\mathbb{R}^n$.

Every nonzero entry of $\mathcal{M}$ is a variable $U_{i\alpha}$. In fact, each row has exactly $N$ nonzero entries, which consist of the variables in some group $U_i$. We refer to [8, Sec. 4] for the precise construction. Thus $\det \mathcal{M} \in \mathbb{Z}[U_0, \dots, U_r]$ is a multihomogeneous polynomial of total degree $M$ and height bounded by $M \log N$. This polynomial is a nonzero multiple of the sparse resultant $R_{\mathcal{A}}$ (see [8, Th. 5.2]). The assumption that $R_{\mathcal{A}}$ is primitive implies that $\det \mathcal{M} / R_{\mathcal{A}}$ lies in $\mathbb{Z}[U_0, \dots, U_r]$, and so $m(R_{\mathcal{A}}) \leq m(\det \mathcal{M})$.

Let $\{T_j\}_{j \in J}$ be a unimodular triangulation of $Q$, so that $\{(r+1) T_j\}_{j \in J}$ is a triangulation of $(r+1) Q$. For every $\varepsilon \in \mathbb{R}^n$, the set of integer points contained in $(r+1) T_j + \varepsilon$ is in correspondence with a subset of those of $(r+1) T_j$. Moreover, for a generic choice of $\varepsilon$ we lose at least the set of integer points in a facet of codimension 1. Thus
\[ \# \big( ((r+1) T_j + \varepsilon) \cap \mathbb{Z}^n \big) \leq \# ( r \, T_j \cap \mathbb{Z}^n ) = \binom{2r}{r} \leq 2^{2r-1} \]
and so
\[ M \leq \sum_{j \in J} \# \big( ((r+1) T_j + \varepsilon) \cap \mathbb{Z}^n \big) \leq 2^{2r-1} \, \#J = 2^{2r-1} \, \mathrm{Vol}(\mathcal{A}). \]
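The lattice-point count used in this step can be checked by brute force in small dimension. The sketch below assumes that, after a unimodular change of coordinates, $T_j$ is the standard $r$-simplex, so that $r\,T_j \cap \mathbb{Z}^r$ consists of the nonnegative integer vectors with coordinate sum at most $r$; the function name is illustrative.

```python
from itertools import product
from math import comb


def lattice_points_in_dilated_simplex(r, k):
    """Count x in Z^r with all x_i >= 0 and x_1 + ... + x_r <= k (brute force)."""
    return sum(1 for x in product(range(k + 1), repeat=r) if sum(x) <= k)


if __name__ == "__main__":
    for r in range(1, 6):
        count = lattice_points_in_dilated_simplex(r, r)
        # Compare with binom(2r, r) and with the bound 2^(2r-1) used in the proof.
        print(r, count, comb(2 * r, r), 2 ** (2 * r - 1))
```

The brute-force enumeration is exponential in $r$ and is only meant as a sanity check for small values.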
Applying Lemma 1.1, we obtain
\[ m(R_{\mathcal{A}}) \leq \log |\det \mathcal{M}| + \deg(\det \mathcal{M}) \log N \leq 2 M \log N \leq 2^{2r} \log N \, \mathrm{Vol}(\mathcal{A}). \]
We conclude that
\[ h(X_{\mathcal{A}}) = m(R_{\mathcal{A}}; S_{N+1}^{r+1}) + (r+1) \Big( \sum_{i=1}^{N} \frac{1}{2i} \Big) \mathrm{Vol}(\mathcal{A}) \leq m(R_{\mathcal{A}}) + 2 (r+1) \log N \, \mathrm{Vol}(\mathcal{A}) \leq 2^{2r+1} \log N \, \mathrm{Vol}(\mathcal{A}), \]
as $N = \#\mathcal{A} \geq 2$.

In case $\mathcal{A} \subset (\mathbb{Z}_{\geq 0})^n$, that is, when $F_0, \dots, F_r$ are polynomials, we set $d := \max\{|\alpha| : \alpha \in \mathcal{A}\} = \deg F_0$. We then have $N \leq \binom{d+n}{n} \leq (n+1)^d$ and so
\[ h(X_{\mathcal{A}}) \leq 2^{2r+1} \log(n+1) \, d \, \mathrm{Vol}(\mathcal{A}). \]

2. Estimates for local and global heights
In this chapter we study the basic properties of local and global heights that we need for our purposes. The key result is a precise estimate for the local height of the trace and the norm of a polynomial $f \in K[x_1, \dots, x_n]$ with respect to an integral extension $K[\mathbb{A}^r] \hookrightarrow K[V]$. We also study some of the basic properties of the height of a variety, in particular, its behavior under intersection with hypersurfaces and under affine maps.

2.1. Estimates for Chow forms
In this section we recall the notion of generalized Chow forms of a variety in the sense of Philippon [47], and we prove a technical estimate for its local height.

2.1.1. Generalized Chow forms
Let $V \subset \mathbb{A}^n$ be an affine equidimensional variety of dimension $r$ and degree $D$ defined over a field $k$. For $d \in \mathbb{N}$ we denote by $U(d)_0$ a group of $\binom{d+n}{n}$ variables. Also, for $1 \leq i \leq r$ we denote by $U_i$ a group of $n+1$ variables, and we set $U(d) := \{U(d)_0, U_1, \dots, U_r\}$. Set
\[ F := \sum_{|\alpha| \leq d} U(d)_{0\alpha} \, x^\alpha, \qquad L_i := U_{i0} + U_{i1} x_1 + \cdots + U_{in} x_n \]
for the generic polynomial in $n$ variables of degree $d$ and 1 associated, respectively, to $U(d)_0$ and $U_i$.
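The size of the group $U(d)_0$ is the number of exponents $\alpha \in (\mathbb{Z}_{\geq 0})^n$ with $|\alpha| \leq d$. As a small sanity check, the sketch below enumerates these exponents and compares their number with $\binom{d+n}{n}$ and with the bound $(n+1)^d$; the function name is illustrative.

```python
from itertools import product
from math import comb


def exponents_up_to_degree(n, d):
    """All alpha in (Z_{>=0})^n with |alpha| <= d, i.e. the index set of U(d)_0."""
    return [alpha for alpha in product(range(d + 1), repeat=n) if sum(alpha) <= d]


if __name__ == "__main__":
    for n, d in [(1, 3), (2, 2), (3, 2), (3, 4)]:
        count = len(exponents_up_to_degree(n, d))
        print(n, d, count, comb(d + n, n), (n + 1) ** d)
```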
Set $N := \binom{d+n}{n} + r (n+1)$, and let $W \subset \mathbb{A}^N \times V$ be the incidence variety of $F, L_1, \dots, L_r$ with respect to $V$; that is,
\[ W := \big\{ (\nu(d)_0, \nu_1, \dots, \nu_r; \xi) \ ;\ \xi \in V,\ F(\nu(d)_0)(\xi) = 0,\ L_i(\nu_i)(\xi) = 0,\ 1 \leq i \leq r \big\}. \]
Let $\pi : \mathbb{A}^N \times \mathbb{A}^n \to \mathbb{A}^N$ denote the canonical projection. Then $\overline{\pi(W)} \subset \mathbb{A}^N$ is a hypersurface (see [47, Prop. 1.5]), and any of its defining equations $F_{d,V} \in k[U(d)]$ is called a generalized Chow form or a $d$-Chow form of $V$.

A $d$-Chow form is uniquely defined up to a scalar factor. It shares many properties with the usual Chow form, which corresponds to the case $d = 1$. We have
\[ F_{d,V}(\nu(d)_0, \nu_1, \dots, \nu_r) = 0 \iff \overline{V} \cap \{F^h(\nu(d)_0) = 0\} \cap \{L_1^h(\nu_1) = 0\} \cap \cdots \cap \{L_r^h(\nu_r) = 0\} \neq \emptyset \]
for $\nu(d)_0 \in k^{\binom{d+n}{n}}$ and $\nu_i \in k^{n+1}$. Here $\overline{V} \subset \mathbb{P}^n$ denotes the projective closure of $V$, while $F^h$ and $L_i^h$ stand for the homogenization of $F$ and $L_i$, respectively.

A $d$-Chow form $F_{d,V} \in k[U(d)]$ is a multihomogeneous polynomial of degree $D$ in the group of variables $U(d)_0$ and of degree $d D$ in each group $U_i$ (see [47, Lem. 1.8]). When $V$ is an irreducible variety, $F_{d,V}$ is an irreducible polynomial of $k[U(d)]$. When $V$ is equidimensional, it coincides with the product of $d$-Chow forms of its irreducible components.

Now, let $U_0$ be another group of $n+1$ variables, and consider the morphism
\[ \varrho_d : k[U(d)] \to k[U_0, U_1, \dots, U_r] \]
defined by $\varrho_d(F) = L_0^d$ and $\varrho_d(L_i) = L_i$ for $i = 1, \dots, r$, where $L_0$ stands for the generic linear form associated to $U_0$. In other terms,
\[ \varrho_d(U(d)_{0\alpha}) = \binom{d}{\alpha} U_{00}^{d - |\alpha|} U_{01}^{\alpha_1} \cdots U_{0n}^{\alpha_n}, \qquad \text{where } \binom{d}{\alpha} := \frac{d!}{(d - |\alpha|)! \, \alpha_1! \cdots \alpha_n!} \]
for $|\alpha| \leq d$, and $\varrho_d(U_{ij}) = U_{ij}$ for $i = 1, \dots, r$ and $j = 0, \dots, n$. This morphism gives the following relation between a $d$-Chow form $F_{d,V}$ and the usual one (see [47, proof of Prop. 2.8]).

LEMMA 2.1
Let $V \subset \mathbb{A}^n$ be an equidimensional variety. Then $\varrho_d(F_{d,V}) = \lambda F_V^d$ for some $\lambda \in k^*$.
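The explicit formula for $\varrho_d(U(d)_{0\alpha})$ is just the multinomial expansion of $L_0^d$. The following sketch checks this at integer values of the variables; the function names are illustrative, and the tuple `u0` stands for chosen values of $(U_{00}, U_{01}, \dots, U_{0n})$.

```python
from itertools import product
from math import factorial


def multinomial(d, alpha):
    """d! / ((d - |alpha|)! * alpha_1! * ... * alpha_n!)."""
    denominator = factorial(d - sum(alpha))
    for a in alpha:
        denominator *= factorial(a)
    return factorial(d) // denominator


def rho_d_of_F(d, u0, x):
    """Evaluate sum_{|alpha| <= d} rho_d(U(d)_{0 alpha}) x^alpha at integer points."""
    n = len(x)
    total = 0
    for alpha in product(range(d + 1), repeat=n):
        if sum(alpha) > d:
            continue
        term = multinomial(d, alpha) * u0[0] ** (d - sum(alpha))
        for j in range(n):
            term *= (u0[j + 1] * x[j]) ** alpha[j]
        total += term
    return total


def generic_linear_form_power(d, u0, x):
    """L_0(u0)(x)^d with L_0 = U_00 + U_01 x_1 + ... + U_0n x_n."""
    return (u0[0] + sum(u * xi for u, xi in zip(u0[1:], x))) ** d


if __name__ == "__main__":
    u0, x = (2, -1, 5), (7, -3)
    print(rho_d_of_F(3, u0, x), generic_linear_form_power(3, u0, x))
```

Integer arithmetic keeps the comparison exact, so the two printed values agree identically rather than up to rounding.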
Proof
It is enough to consider the case when $V$ is irreducible. Set $r := \dim V$. The polynomials $\varrho_d(F_{d,V})$ and $F_V$ both have the same zero locus: let $\nu_i \in \mathbb{A}^{n+1}$ for $i = 0, \dots, r$. As $\varrho_d(F_{d,V}) = F_{d,V}\big( (\varrho_d(U(d)_{0\alpha}))_\alpha, U_1, \dots, U_r \big)$, we have $\varrho_d(F_{d,V})(\nu_0, \dots, \nu_r) = 0$ if and only if
\[ \overline{V} \cap \{\varrho_d(F^h)(\nu_0) = 0\} \cap \{L_1^h(\nu_1) = 0\} \cap \cdots \cap \{L_r^h(\nu_r) = 0\} \neq \emptyset, \]
that is, if and only if
\[ \overline{V} \cap \{L_0^h(\nu_0)^d = 0\} \cap \{L_1^h(\nu_1) = 0\} \cap \cdots \cap \{L_r^h(\nu_r) = 0\} \neq \emptyset, \]
which is clearly equivalent to $F_V(\nu_0, \dots, \nu_r) = 0$. On the other hand, as $V$ is irreducible, $F_V$ is an irreducible polynomial, and thus $\varrho_d(F_{d,V})$ is a power of $F_V$ (up to a constant $\lambda$). Since $\deg F_V = (r+1) \deg V$ and $\deg \varrho_d(F_{d,V}) = (r+1) \, d \deg V$, we derive that $\varrho_d(F_{d,V}) = \lambda F_V^d$ for some $\lambda \in k^*$.

Now, assume that $V$ satisfies Assumption 1.5. Then
\[ \overline{V} \cap \{x_0^d = 0\} \cap \{x_1 = 0\} \cap \cdots \cap \{x_r = 0\} = \emptyset. \]
Setting $e(d)_\alpha$ and $e_i$ for the $\alpha$-vector and the $(i+1)$-vector of the canonical bases of $k^{\binom{d+n}{n}}$ and $k^{n+1}$, respectively, we infer that $F_{d,V}(e(d)_0, e_1, \dots, e_r)$, that is, the coefficient of the monomial $U(d)_{00}^{D} \, U_{11}^{dD} \cdots U_{rr}^{dD}$, is nonzero.

We define the (normalized) $d$-Chow form $\mathrm{Ch}_{d,V}$ of $V$ by fixing the choice of $F_{d,V}$ with the condition $\mathrm{Ch}_{d,V}(e(d)_0, e_1, \dots, e_r) = 1$.

In the previous construction, $U(d)_{00}^{D} \, U_{11}^{dD} \cdots U_{rr}^{dD}$ is the only monomial of $k[U(d)]$ which maps through $\varrho_d$ to $U_{00}^{dD} \cdots U_{rr}^{dD}$. The imposed normalizations then imply
\[ \varrho_d(\mathrm{Ch}_{d,V}) = \mathrm{Ch}_V^d. \]

2.1.2. An estimate for generalized Chow forms
The following technical result is crucial to our local height estimates for the trace and the norm of a polynomial (see Sec. 2.3.2), as well as for the intersection of a variety with a hypersurface (see Sec. 2.2.2). The proof follows the lines of [47, Prop. 2.8].

We adopt the following convention: Let $f \in k[x_1, \dots, x_n]$ be a polynomial of degree $d$. We denote by $F_{d,V}(f)$ and $\mathrm{Ch}_{d,V}(f)$ the specialization of $U(d)_0$ into the coefficients of $f$ in $F_{d,V}$ and $\mathrm{Ch}_{d,V}$, respectively.

LEMMA 2.2
Let $V \subset \mathbb{A}^n(\mathbb{C}_v)$ be an equidimensional variety of dimension $r$ which satisfies Assumption 1.5. Let $f \in \mathbb{C}_v[x_1, \dots, x_n]$. Then
• $m(\mathrm{Ch}_{\deg f, V}(f); S_{n+1}^{r}) + r \big( \sum_{i=1}^{n} \frac{1}{2i} \big) \deg f \deg V \leq \deg f \, h_v(V) + h_v(f) \deg V + \log(n+1) \deg f \deg V$ for $v = \infty$,
• $h_v(\mathrm{Ch}_{\deg f, V}(f)) \leq \deg f \, h_v(V) + h_v(f) \deg V$ for $v = p$ for some prime $p$.
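We need the following lemma in order to treat the non-Archimedean case.

LEMMA 2.3
Let $g \in \mathbb{C}_p[y_1, \dots, y_m]$, and let $\Omega \subset \mathbb{A}^m(\mathbb{C}_p)$ be a Zariski open set. Then
\[ |g|_p = \max \{ |g(\nu)|_p \ ;\ \nu \in \Omega,\ |\nu|_p = 1 \}. \]

Proof
For $q \in \mathbb{N}$ we denote by $G_q$ the set of $q$-th roots of 1 in $\overline{\mathbb{Q}} \hookrightarrow \mathbb{C}_p$. Let $\alpha = (\alpha_1, \dots, \alpha_m) \in \mathbb{Z}^m$ such that $|\alpha_i| < q$. Then
\[ \sum_{\xi \in G_q^m} \xi^\alpha = \begin{cases} 0 & \text{if } \alpha \neq 0, \\ q^m & \text{if } \alpha = 0. \end{cases} \]
Set $g = \sum_\alpha a_\alpha x^\alpha$. Let $q > \deg g$ such that $|q|_p = 1$, that is, $p \nmid q$. Then for any $\omega = (\omega_1, \dots, \omega_m) \in (\mathbb{C}_p^*)^m$ we have
\[ a_\alpha = \frac{1}{\omega^\alpha \, q^m} \sum_{\xi \in G_q^m} g(\omega \, \xi) \, \xi^{-\alpha}. \]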
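The coefficient formula is a purely algebraic identity, so it can be illustrated over $\mathbb{C}$ instead of $\mathbb{C}_p$. The sketch below does this numerically with complex roots of unity; the polynomial `g`, the point `omega`, and the function names are illustrative choices, not data from the paper.

```python
import cmath
from itertools import product


def roots_of_unity(q):
    """The q complex q-th roots of 1."""
    return [cmath.exp(2j * cmath.pi * k / q) for k in range(q)]


def recover_coefficient(g, alpha, omega, q):
    """a_alpha = (omega^alpha q^m)^(-1) * sum over xi in G_q^m of g(omega xi) xi^(-alpha)."""
    m = len(alpha)
    total = 0
    for xi in product(roots_of_unity(q), repeat=m):
        point = [w * x for w, x in zip(omega, xi)]
        weight = 1
        for x, a in zip(xi, alpha):
            weight *= x ** (-a)
        total += g(point) * weight
    scale = q ** m
    for w, a in zip(omega, alpha):
        scale *= w ** a
    return total / scale


def g(y):
    """Example polynomial of degree 2 in two variables: 3 + 2 y1 y2 - 5 y2^2."""
    return 3 + 2 * y[0] * y[1] - 5 * y[1] ** 2


if __name__ == "__main__":
    omega, q = (2.0, -0.5), 3  # q > deg g
    for alpha in [(0, 0), (1, 1), (0, 2), (1, 0)]:
        print(alpha, recover_coefficient(g, alpha, omega, q))
```

The recovered values agree with the coefficients of `g` up to floating-point rounding; in the $p$-adic setting of the lemma the same identity is combined with $|q|_p = 1$ to bound $|a_\alpha|_p$.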
From the previous expression we derive that for each $\omega \in S := \{\omega \ ;\ |\omega_i|_p = 1\}$ there exists $\xi_\omega \in G_q^m$ such that $|g|_p \leq \max_\xi |g(\omega \, \xi)|_p = |g(\omega \, \xi_\omega)|_p$. But on the other hand, $|g(\omega \, \xi_\omega)|_p \leq \max_\alpha |a_\alpha|_p = |g|_p$. Thus $|g|_p = |g(\omega \, \xi_\omega)|_p$.

The set $S$ is Zariski dense in $\mathbb{A}^m(\mathbb{C}_p)$, and as $G_q^m$ is finite, the set $\{\omega \, \xi_\omega \ ;\ \omega \in S\} \cap \Omega$ is also dense and, in particular, is nonempty. For any $\nu_0$ in this set we have $|g|_p = |g(\nu_0)|_p$ and therefore
\[ |g|_p \leq \max \{ |g(\nu)|_p \ ;\ \nu \in \Omega,\ |\nu|_p = 1 \}. \]
The other inequality is straightforward.

Proof of Lemma 2.2
First, we consider the case when $V$ is a zero-dimensional variety. We may assume without loss of generality that $V$ is irreducible; that is, $V = \{\xi\}$ for some $\xi = (\xi_1, \dots, \xi_n) \in \mathbb{C}_v^n$. Set $d := \deg f$. Then
\[ \mathrm{Ch}_V = L(\xi) := U_0 + U_1 \xi_1 + \cdots + U_n \xi_n, \qquad \mathrm{Ch}_{d,V} = F(\xi) := \sum_\alpha U_\alpha \, \xi^\alpha, \]
where $L$ and $F$ denote generic polynomials in $n$ variables of degree 1 and $d$, respectively. Then
\[ h_\infty(F(\xi)) = \log \max_{|\alpha| \leq d} \{|\xi^\alpha|\} = \log \max_i \{1, |\xi_i|^d\} = d \, h_\infty(L(\xi)) \leq d \, m(L(\xi); S_{n+1}) + d \sum_{i=1}^{n} \frac{1}{2i}. \]
The last inequality follows from inequality (1.2). Now, a direct computation shows that
\[ h_\infty(\mathrm{Ch}_{d,V}(f)) \leq h_\infty(F(\xi)) + h_\infty(f) + \log(n+1) \, d. \]
In this case, $\mathrm{Ch}_{d,V}(f) \in \mathbb{C}$ and so
\[ m(\mathrm{Ch}_{d,V}(f)) = h_\infty(\mathrm{Ch}_{d,V}(f)) \leq d \, m(L(\xi); S_{n+1}) + d \sum_{i=1}^{n} \frac{1}{2i} + h_\infty(f) + \log(n+1) \, d \leq d \, h_\infty(V) + h_\infty(f) + \log(n+1) \, d. \]
Analogously, $h_p(F(\xi)) \leq d \, h_p(L(\xi))$ and so
\[ h_p(\mathrm{Ch}_{d,V}(f)) \leq d \, h_p(V) + h_p(f). \]

Now, we consider the general case. Set $\nu = (\nu_1, \dots, \nu_r) \in \mathbb{C}_v^{r(n+1)}$, $L(\nu_i) := \nu_{i0} + \nu_{i1} x_1 + \cdots + \nu_{in} x_n$, and
\[ V(\nu) := V \cap V(L(\nu_1), \dots, L(\nu_r)) \subset \mathbb{A}^n(\mathbb{C}_v). \]
Then $V(\nu)$ is a zero-dimensional variety of degree $\deg V$ for $\nu$ in a Zariski open set $\Omega_v$ of $\mathbb{A}^{r(n+1)}(\mathbb{C}_v)$. Let $\nu \in \Omega_v$. By [47, Prop. 2.4] there exist $\lambda_\nu, \theta_\nu \in \mathbb{C}_v^*$ such that
\[ \mathrm{Ch}_{V(\nu)}(U_0) = \lambda_\nu \, \mathrm{Ch}_V(U_0, \nu), \qquad \mathrm{Ch}_{d,V(\nu)}(U(d)_0) = \theta_\nu \, \mathrm{Ch}_{d,V}(U(d)_0, \nu), \tag{2.1} \]
where $\mathrm{Ch}_V(U_0, \nu)$ and $\mathrm{Ch}_{d,V}(U(d)_0, \nu)$ stand for the specialization of $U_1, \dots, U_r$ into $\nu_1, \dots, \nu_r$. Applying the morphism $\varrho_d$ linking the $d$-Chow form with the usual one, we obtain
\[ \mathrm{Ch}_{V(\nu)}^d = \varrho_d(\mathrm{Ch}_{d,V(\nu)}) = \theta_\nu \, \varrho_d(\mathrm{Ch}_{d,V}(\nu)) = \theta_\nu \, \mathrm{Ch}_V^d(\nu), \]
and so $\theta_\nu = \lambda_\nu^d$ in identities (2.1).
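We consider the case $v = \infty$. Any Zariski closed set of $\mathbb{A}^{r(n+1)}(\mathbb{C})$ intersects $S_{n+1}^{r}$ in a set of $\mu_{n+1}^{r}$-measure zero, and so the previous relation holds for almost every $\nu \in S_{n+1}^{r}$, which means that for those $\nu$, $\mathrm{Ch}_{d,V}(f, \nu) = \mathrm{Ch}_{d,V(\nu)}(f) / \lambda_\nu^d$. Therefore
\begin{align*}
m(\mathrm{Ch}_{d,V}(f); S_{n+1}^{r})
&= \int_{S_{n+1}^{r}} \log |\mathrm{Ch}_{d,V}(f, \nu)| \, \mu_{n+1}^{r} \\
&= \int_{S_{n+1}^{r}} \big( \log |\mathrm{Ch}_{d,V(\nu)}(f)| - d \log |\lambda_\nu| \big) \, \mu_{n+1}^{r} \\
&\leq \int_{S_{n+1}^{r}} \big( d \, h_\infty(V(\nu)) + h_\infty(f) \deg V(\nu) + \log(n+1) \, d \deg V(\nu) - d \log |\lambda_\nu| \big) \, \mu_{n+1}^{r} \\
&= d \int_{S_{n+1}^{r}} \big( m(\mathrm{Ch}_{V(\nu)}(U_0); S_{n+1}) - \log |\lambda_\nu| \big) \, \mu_{n+1}^{r} + \Big( \sum_{i=1}^{n} \frac{1}{2i} \Big) d \deg V + h_\infty(f) \deg V + \log(n+1) \, d \deg V \\
&= d \int_{S_{n+1}^{r}} m(\mathrm{Ch}_V(U_0, \nu); S_{n+1}) \, \mu_{n+1}^{r} + \Big( \sum_{i=1}^{n} \frac{1}{2i} \Big) d \deg V + h_\infty(f) \deg V + \log(n+1) \, d \deg V \\
&= d \, h_\infty(V) + h_\infty(f) \deg V + \log(n+1) \, d \deg V - r \Big( \sum_{i=1}^{n} \frac{1}{2i} \Big) d \deg V.
\end{align*}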
The case $v = p$ follows analogously from the previous lemma, identities (2.1), and the zero-dimensional case. As before, let $\Omega_v \subset \mathbb{A}^{r(n+1)}(\mathbb{C}_v)$ be a Zariski open set such that $\nu \in \Omega_v$ implies that $V(\nu)$ is a zero-dimensional variety of degree $\deg V$. By Lemma 2.3 we can take $\nu \in \Omega_v$ such that $|\nu|_p = 1$ and $|\mathrm{Ch}_{d,V}(f)|_p = |\mathrm{Ch}_{d,V}(f, \nu)|_p$. Thus
\begin{align*}
\log |\mathrm{Ch}_{d,V}(f)|_p
&= \log |\mathrm{Ch}_{d,V(\nu)}(f)|_p - d \log |\lambda_\nu|_p \\
&\leq d \log |\mathrm{Ch}_{V(\nu)}|_p + h_p(f) \deg V - d \log |\lambda_\nu|_p \\
&= d \log |\mathrm{Ch}_V(U_0, \nu)|_p + h_p(f) \deg V \\
&\leq d \log |\mathrm{Ch}_V|_p + h_p(f) \deg V.
\end{align*}

The hypothesis that $V$ satisfies Assumption 1.5 is essential in order to properly normalize the involved Chow forms and to define the local height of $V$. If we disregard normalization, we obtain altogether the following global result.

LEMMA 2.4
Let $V \subset \mathbb{A}^n$ be an equidimensional variety of dimension $r$ defined over a number field $K$, and let $F_{d,V}$ be a $d$-Chow form of $V$. Let $f \in K[x_1, \dots, x_n]$ be a polynomial of degree $d$. Then
\[ \frac{1}{[K:\mathbb{Q}]} \Big( \sum_{v \in M_K^\infty} N_v \, m\big( \sigma_v(F_{d,V}(f)); S_{n+1}^{r} \big) + \sum_{v \notin M_K^\infty} N_v \log |F_{d,V}(f)|_v \Big) + r \Big( \sum_{i=1}^{n} \frac{1}{2i} \Big) d \deg V \leq d \, h(V) + h(f) \deg V + \log(n+1) \, d \deg V. \]

Proof
Note first that the product formula implies that the left-hand side of the inequality does not depend on the choice of the $d$-Chow form $F_{d,V}$.

In the case when $V$ is zero-dimensional, it satisfies Assumption 1.5 trivially. Thus the result follows from direct application of the previous lemma.

For the general case, we let $F_{d,V}$ be an arbitrary $d$-Chow form of $V$ and we choose $F_V$ such that $\varrho_d(F_{d,V}) = F_V^d$ holds. Fix an absolute value $v \in M_K$. Following the notation in the proof of the previous lemma, for any $\nu \in \Omega_v$ there exists $\lambda_\nu \in \mathbb{C}_v^*$ such that
\[ \mathrm{Ch}_{V(\nu)}(U_0) = \lambda_\nu \, F_V(U_0, \nu), \qquad \mathrm{Ch}_{d,V(\nu)}(U(d)_0) = \lambda_\nu^d \, F_{d,V}(U(d)_0, \nu). \]
We then proceed as in the previous lemma, and we obtain the corresponding estimate for $v$. Adding up these estimates, we derive the estimate in terms of the height of the variety.

2.2. Basic properties of the height
We derive some of the basic properties of the notion of height of a variety. In particular, we study the behavior of the height of a variety under intersection with a hypersurface and under an affine map. We also obtain an arithmetic version of the Bernstein-Kushnirenko theorem.
2.2.1. Height of varieties under affine maps
Let $\varphi : \mathbb{A}^n \to \mathbb{A}^m$ be a regular map defined by polynomials $\varphi_1, \dots, \varphi_m \in K[x_1, \dots, x_n]$. We recall that the height of $\varphi$ is defined as $h(\varphi) := h(\varphi_1, \dots, \varphi_m)$. We obtain the following estimate for the height of the image of a variety under an affine map.

PROPOSITION 2.5
Let $V \subset \mathbb{A}^n$ be a variety of dimension $r$, and let $\varphi : \mathbb{A}^n \to \mathbb{A}^N$ be an affine map. Then
\[ h(\overline{\varphi(V)}) \leq h(V) + (r+1) \big( h(\varphi) + 8 \log(n + N + 1) \big) \deg V. \]

The proof of this result follows from the study of the particular cases of a linear projection and an injective affine map.

The following estimate for the height of a linear projection of a variety generalizes [11, Prop. 2.10] and [5, Sec. 3.3.2]. Its proof is essentially based on the description of the Chow form of such a projection variety, due to P. Pedersen and B. Sturmfels [46, Prop. 4.1].

LEMMA 2.6
Let $V \subset \mathbb{A}^n \times \mathbb{A}^m$ be a variety of dimension $r$, and let $\pi : \mathbb{A}^n \times \mathbb{A}^m \to \mathbb{A}^n$ denote the projection $(x, y) \mapsto x$. Then
\[ h(\overline{\pi(V)}) \leq h(V) + 3 (r+1) \log(n + m + 1) \deg V. \]

Proof
We assume without loss of generality that $V$ is irreducible. Set $W := \overline{\pi(V)} \subset \mathbb{A}^n$ and $s := \dim W$.

The case $s = r$ follows directly from [46, Prop. 4.1]. In this case, there exists a partial monomial order $\prec$ such that
\[ F_W \mid \mathrm{init}\, F_V, \]
where $\mathrm{init}\, F_V$ denotes the initial polynomial of $F_V$ with respect to $\prec$. In particular, $\mathrm{init}\, F_V$ is the sum of some of the terms in the monomial expansion of $F_V$.

The general case $s \leq r$ reduces to the previous one. We choose standard coordinates $z_{s+1}, \dots, z_r$ of $\mathbb{A}^m$ such that the projection
\[ \varpi : \mathbb{A}^n \times \mathbb{A}^m \to \mathbb{A}^n \times \mathbb{A}^{r-s}, \qquad (x, y) \mapsto (x, z) \]
verifies $\dim Z = r$ for $Z := \overline{\varpi(V)}$.

Let $\varrho : \mathbb{A}^n \times \mathbb{A}^{r-s} \to \mathbb{A}^n$ denote the canonical projection. Then $F_Z \mid \mathrm{init}\, F_V$, $\pi = \varrho \circ \varpi$, and $W = \overline{\varrho(Z)}$. We have that $\varrho^{-1}(\xi) = \{\xi\} \times \mathbb{A}^{r-s}$ for $\xi \in \varrho(Z)$ by the theorem of dimension of fibers. Thus $Z = W \times \mathbb{A}^{r-s}$, and, in particular,
\[ i(W) = Z \cap V(z_{s+1}, \dots, z_r) \subset \mathbb{A}^n \times \mathbb{A}^{r-s}, \]
where $i$ denotes the canonical inclusion $\mathbb{A}^n \hookrightarrow \mathbb{A}^n \times \mathbb{A}^{r-s}$. We have $\deg W = \deg Z$ and so $F_W := F_Z(z_{s+1}, \dots, z_r)$ is a Chow form of $W$ (see [47, Prop. 2.4]).

Now, we estimate the height of $F_W$. Let $K$ be a number field of definition of $V$, and set $\mathrm{init}\, F_V = Q \, F_Z$ for some polynomial $Q$. From the proof of [47, Lem. 1.12(v)], there is a nonzero coefficient $\lambda$ of $Q$ such that $\log |\lambda|_v \leq m(\sigma_v(Q))$ for all $v \in M_K^\infty$. Clearly $\log |\lambda|_v \leq \log |Q|_v$ also holds for all $v \notin M_K^\infty$. Thus
\[ m(\sigma_v(F_Z)) \leq m(\sigma_v(\mathrm{init}\, F_V)) - \log |\lambda|_v \]
for $v \in M_K^\infty$, while $\log |F_Z|_v \leq \log |\mathrm{init}\, F_V|_v - \log |\lambda|_v$ for $v \notin M_K^\infty$.

Let $v \in M_K^\infty$. From [47, Lem. 1.13] we obtain $m(\sigma_v(F_W)) \leq m(\sigma_v(F_Z))$. Hence
\begin{align*}
m(\sigma_v(F_W); S_{n+1}^{s+1}) &\leq m(\sigma_v(F_W)) \\
&\leq m(\sigma_v(\mathrm{init}\, F_V)) - \log |\lambda|_v \\
&\leq \log |\mathrm{init}\, F_V|_v + (r+1) \log(n+m+1) \deg V - \log |\lambda|_v \\
&\leq \log |F_V|_v + (r+1) \log(n+m+1) \deg V - \log |\lambda|_v \\
&\leq m(\sigma_v(F_V); S_{n+m+1}^{r+1}) + (r+1) \Big( \sum_{i=1}^{n+m} \frac{1}{2i} \Big) \deg V + 2 (r+1) \log(n+m+1) \deg V - \log |\lambda|_v
\end{align*}
by application of Lemma 1.1 and inequality (1.2). In case $v \notin M_K^\infty$ we have analogously $\log |F_W|_v \leq \log |F_V|_v - \log |\lambda|_v$, and so
\[ h(W) \leq h(V) + (s+1) \Big( \sum_{i=1}^{n} \frac{1}{2i} \Big) \deg V + 2 (r+1) \log(n+m+1) \deg V \leq h(V) + 3 (r+1) \log(n+m+1) \deg V. \]

The following lemma is a variant of [49, I, Prop. 7].
LEMMA 2.7
Let $V \subset \mathbb{A}^m$ be a variety of dimension $r$, and let $\psi : \mathbb{A}^m \to \mathbb{A}^n$ be an injective affine map. Then
\[ h(\psi(V)) \leq h(V) + (r+1) \big( h(\psi) + 5 \log(n+1) \big) \deg V. \]

Proof
We assume again without loss of generality that $V$ is irreducible. Let $K$ be a number field of definition of both $V$ and $\psi$, and set $\psi(x) = a + A x$ for some $(n \times m)$-matrix $A$ of maximal rank and $a \in K^n$. Then let $\psi^* : \mathbb{A}^{n+1} \to \mathbb{A}^{m+1}$ be the linear map $y \mapsto (a, A)^t \, y$ defined by the transpose of the matrix associated to $\psi$.

Set $W := \psi(V)$, and let $\overline{V} \subset \mathbb{P}^m$, $\overline{W} \subset \mathbb{P}^n$ denote the projective closures of $V$ and $W$, respectively. For $i = 0, \dots, r$ we let $\nu_i \in \overline{\mathbb{Q}}^{n+1}$, and we set $L^h(\nu_i) := \nu_{i0} x_0 + \cdots + \nu_{in} x_n$ for the homogenization of the associated linear form. Then $F_W(\nu_0, \dots, \nu_r) = 0$ if and only if there exists $\xi \in \overline{V}$ such that $\psi(\xi)$ lies in the linear space determined by $\nu_0, \dots, \nu_r$. Equivalently, $\xi$ lies in the linear space determined by $\psi^*(\nu_0), \dots, \psi^*(\nu_r)$. We conclude that
\[ F_W = F_V \circ (\psi^*)^{r+1}. \]
Let $v \in M_K^\infty$. Then
\begin{align*}
m(\sigma_v(F_W); S_{n+1}^{r+1}) &\leq \log |F_W|_v + (r+1) \log(n+1) \deg V \\
&\leq \log |F_V|_v + (r+1) \big( h_v(\psi) + 2 \log(n+1) \big) \deg V + (r+1) \log(n+1) \deg V \\
&\leq m(\sigma_v(F_V)) + (r+1) \log(m+1) \deg V + (r+1) \big( h_v(\psi) + 3 \log(n+1) \big) \deg V \\
&\leq m(\sigma_v(F_V); S_{m+1}^{r+1}) + \Big( \sum_{i=1}^{m} \frac{1}{2i} \Big) (r+1) \deg V + (r+1) \big( h_v(\psi) + 4 \log(n+1) \big) \deg V.
\end{align*}
Here we have applied Lemma 1.1, inequality (1.2), and the proof of Lemma 1.2(c), using the fact that the number of monomials of $F_V$ is bounded by $(n+1)^{(r+1) \deg V}$. In case $v \notin M_K^\infty$ we obtain analogously $\log |F_W|_v \leq \log |F_V|_v + (r+1) \, h_v(\psi) \deg V$, and hence
\[ h(\psi(V)) \leq h(V) + (r+1) \big( h(\psi) + 5 \log(n+1) \big) \deg V. \]
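The identity $F_W = F_V \circ (\psi^*)^{r+1}$ can be seen concretely in the simplest case $r = 0$, where $V = \{\xi\}$ and $F_V(U_0) = U_{00} + U_{01} \xi_1 + \cdots + U_{0m} \xi_m$. The sketch below takes $\psi^*$ to be the transpose of the $(n+1) \times (m+1)$ matrix sending $(1, \xi)$ to $(1, a + A \xi)$; this homogenized matrix and all names are our own reading of the construction, not notation from the paper.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def homogenized_matrix(a, A):
    """(n+1) x (m+1) matrix H with H (1, xi) = (1, a + A xi)."""
    m = len(A[0])
    return [[1] + [0] * m] + [[a_i] + list(row) for a_i, row in zip(a, A)]


def psi_star(H, nu):
    """Transpose of H applied to nu in k^(n+1); the result lies in k^(m+1)."""
    return [sum(H[i][j] * nu[i] for i in range(len(H))) for j in range(len(H[0]))]


def chow_form_of_point(xi, u0):
    """F_V(u0) for V = {xi}: the linear form u00 + u01 xi_1 + ... evaluated at u0."""
    return dot(u0, [1] + list(xi))


if __name__ == "__main__":
    a = [1, -2, 0]
    A = [[1, 0], [0, 1], [2, 3]]          # injective affine map A^2 -> A^3
    xi = [4, 5]
    image = [a_i + dot(row, xi) for a_i, row in zip(a, A)]
    nu = [7, -1, 2, 3]
    H = homogenized_matrix(a, A)
    # F_W(nu) for W = {psi(xi)} versus F_V(psi^*(nu)).
    print(chow_form_of_point(image, nu), chow_form_of_point(xi, psi_star(H, nu)))
```

Both evaluations give the same number because $\nu \cdot H(1, \xi) = (H^t \nu) \cdot (1, \xi)$; the general case applies the same substitution in each of the $r+1$ groups of variables.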
Proof of Proposition 2.5
Let $\psi : \mathbb{A}^n \to \mathbb{A}^N \times \mathbb{A}^n$ be the injective map $x \mapsto (\varphi(x), x)$. Then $\varphi$ decomposes as $\varphi = \pi \circ \psi$, where $\pi : \mathbb{A}^N \times \mathbb{A}^n \to \mathbb{A}^N$ denotes the canonical projection. Thus
\begin{align*}
h(\overline{\varphi(V)}) &\leq h(\psi(V)) + 3 (r+1) \log(n + N + 1) \deg \psi(V) \\
&\leq h(V) + (r+1) \big( h(\psi) + 5 \log(n + N + 1) \big) \deg V + 3 (r+1) \log(n + N + 1) \deg V \\
&= h(V) + (r+1) \big( h(\varphi) + 8 \log(n + N + 1) \big) \deg V.
\end{align*}

2.2.2. Local height of the intersection of varieties
We obtain the following estimate for the local height of the intersection of a variety with a hypersurface. This is a consequence of our previous estimate for generalized Chow forms. This result can be seen as the local analogue of [47, Prop. 2.8], and its proof closely follows it.

PROPOSITION 2.8
Let $V \subset \mathbb{A}^n$ be an equidimensional variety of dimension $r$ defined over a number field $K$. Let $f \in K[x_1, \dots, x_n]$ be a polynomial that is not a zero divisor in $K[V]$. We assume that both $V$ and $V \cap V(f)$ satisfy Assumption 1.5. Then there exists $\lambda \in K^*$ such that
• $h_v(V \cap V(f)) \leq \deg f \, h_v(V) + h_v(f) \deg V + \log(n+1) \deg f \deg V - \log |\lambda|_v$ for $v \in M_K^\infty$,
• $h_v(V \cap V(f)) \leq \deg f \, h_v(V) + h_v(f) \deg V - \log |\lambda|_v$ for $v \notin M_K^\infty$.

Proof
Set $d := \deg f$ and $W := V \cap V(f) \subset \mathbb{A}^n$. By [47, Prop. 2.4] there exists $Q \in K[U_1, \dots, U_r] \setminus \{0\}$ such that
\[ \mathrm{Ch}_{d,V}(f) = Q \, \mathrm{Ch}_W. \]
Then, as in the proof of Lemma 2.6, there exists a nonzero coefficient $\lambda$ of $Q$ such that $\log |\lambda|_v \leq m(\sigma_v(Q))$ for all $v \in M_K^\infty$ and $\log |\lambda|_v \leq \log |Q|_v$ for all $v \notin M_K^\infty$.

Now, let $v \in M_K^\infty$. From inequality (1.2) we obtain
\[ \log |\lambda|_v \leq m(\sigma_v(Q)) \leq m(\sigma_v(Q); S_{n+1}^{r}) + r \Big( \sum_{i=1}^{n} \frac{1}{2i} \Big) (d \deg V - \deg W) \]
since $Q$ has degree $d \deg V - \deg W$ in each group of variables. Then
\begin{align*}
h_v(W) &= m(\sigma_v(\mathrm{Ch}_W); S_{n+1}^{r}) + r \Big( \sum_{i=1}^{n} \frac{1}{2i} \Big) \deg W \\
&= m\big( \sigma_v(\mathrm{Ch}_{d,V}(f)); S_{n+1}^{r} \big) + r \Big( \sum_{i=1}^{n} \frac{1}{2i} \Big) d \deg V - m(\sigma_v(Q); S_{n+1}^{r}) - r \Big( \sum_{i=1}^{n} \frac{1}{2i} \Big) (d \deg V - \deg W) \\
&\leq d \, h_v(V) + h_v(f) \deg V + \log(n+1) \, d \deg V - \log |\lambda|_v
\end{align*}
by straightforward application of Lemma 2.2. The case $v \notin M_K^\infty$ follows in an analogous way.

Proposition 2.8 can be immediately generalized to families of polynomials.

COROLLARY 2.9
Let $V \subset \mathbb{A}^n$ be an equidimensional variety of dimension $r$ defined over $K$. Let $f_1, \dots, f_s \in K[x_1, \dots, x_n]$ be polynomials that form a complete intersection in $V$. We assume that $V \cap V(f_1, \dots, f_i)$ satisfies Assumption 1.5 for $i = 0, \dots, s$. Set $d_i := \deg f_i$. Then there exists $\lambda \in K^*$ such that
• $h_v(V \cap V(f_1, \dots, f_s)) \leq \big( h_v(V) + \sum_i \frac{h_v(f_i)}{d_i} \deg V + s \log(n+1) \deg V \big) \prod_i d_i - \log |\lambda|_v$ for $v \in M_K^\infty$,
• $h_v(V \cap V(f_1, \dots, f_s)) \leq \big( h_v(V) + \sum_i \frac{h_v(f_i)}{d_i} \deg V \big) \prod_i d_i - \log |\lambda|_v$ for $v \notin M_K^\infty$.

Proof
We just consider the case when $v$ is Archimedean, as the other one follows similarly. From the preceding result we obtain
\begin{align*}
h_v(V \cap V(f_1, \dots, f_i)) \leq{} & d_i \, h_v(V \cap V(f_1, \dots, f_{i-1})) + h_v(f_i) \deg\big( V \cap V(f_1, \dots, f_{i-1}) \big) \\
& + \log(n+1) \, d_i \deg\big( V \cap V(f_1, \dots, f_{i-1}) \big) - \log |\lambda_i|_v
\end{align*}
for some $\lambda_i \in K^*$. For the final estimate we apply this inequality iteratively and we set $\lambda := \prod_{i=1}^{s} \lambda_i^{d_{i+1} \cdots d_s}$.

COROLLARY 2.10
Let $f_1, \dots, f_s \in K[x_1, \dots, x_n]$ be polynomials that form a complete intersection in $\mathbb{A}^n$. We assume that $V(f_1, \dots, f_i)$ satisfies Assumption 1.5 for $i = 1, \dots, s$. Set $d_i := \deg f_i$. Then there exists $\lambda \in K^*$ such that
• $h_v(V(f_1, \dots, f_s)) \leq \big( \sum_i \frac{h_v(f_i)}{d_i} + (n + s) \log(n+1) \big) \prod_i d_i - \log |\lambda|_v$ for $v \in M_K^\infty$,
• $h_v(V(f_1, \dots, f_s)) \leq \big( \sum_i \frac{h_v(f_i)}{d_i} \big) \prod_i d_i - \log |\lambda|_v$ for $v \notin M_K^\infty$.

Proof
We apply the previous result to $V := \mathbb{A}^n$, using the fact that
\[ h_\infty(\mathbb{A}^n) = \sum_{i=1}^{n} \sum_{j=1}^{i} \frac{1}{2j} \leq n \log(n+1), \qquad h_p(\mathbb{A}^n) = 0. \]
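The inequality $h_\infty(\mathbb{A}^n) \leq n \log(n+1)$ used in this proof is easy to confirm numerically, and the Archimedean bound of Corollary 2.10 can then be evaluated for given data. In the sketch below the term $-\log|\lambda|_v$ is left out, since $\lambda$ is not explicit; the function names are illustrative.

```python
from math import log, prod


def stoll_number(n):
    """sum_{i=1}^{n} sum_{j=1}^{i} 1/(2j), the Archimedean height of A^n."""
    return sum(1.0 / (2 * j) for i in range(1, n + 1) for j in range(1, i + 1))


def corollary_2_10_main_term(n, degrees, local_heights):
    """(sum_i h_v(f_i)/d_i + (n + s) log(n+1)) * prod_i d_i for an Archimedean v.

    The correction term -log|lambda|_v from the statement is deliberately omitted.
    """
    s = len(degrees)
    weighted = sum(h / d for h, d in zip(local_heights, degrees))
    return (weighted + (n + s) * log(n + 1)) * prod(degrees)


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, stoll_number(n), n * log(n + 1))
    print(corollary_2_10_main_term(2, [2, 3], [log(5), log(7)]))
```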
The following corollary is the global counterpart of the previous results. It can be seen as an arithmetic analogue of [24, Prop. 2.3]. We remark that in the global situation we do not need to assume Assumption 1.5 for the intermediate varieties. In particular, $f_1, \dots, f_s$ do not need to be a complete intersection in $V$.

COROLLARY 2.11
Let $V \subset \mathbb{A}^n$ be a variety of dimension $r$, and let $f_1, \dots, f_s \in \overline{\mathbb{Q}}[x_1, \dots, x_n]$. Set $d_i := \deg f_i$, $h := h(f_1, \dots, f_s)$, and $n_0 := \min\{r, s\}$. We assume that $d_1 \geq \cdots \geq d_s$ holds. Then
\[ h(V \cap V(f_1, \dots, f_s)) \leq \Big( h(V) + \Big( \sum_{i=1}^{n_0} \frac{1}{d_i} \Big) h \deg V + n_0 \log(n+1) \deg V \Big) \prod_{i=1}^{n_0} d_i. \]

Proof
We proceed by induction on $(r, s)$ with respect to the product order on $\mathbb{N} \times \mathbb{N}$ defined by $(r, s) \succeq (r', s') \iff r \geq r'$ and $s \geq s'$.

The cases when $r = 0$ or $s = 0$ are both trivial. Now, let $r, s \geq 1$; we assume that the statement holds for all $(r', s') \preceq (r, s)$ such that $(r', s') \neq (r, s)$.

Let $V = \cup_C \, C$ be the decomposition of $V$ into irreducible components. In case $C \subset V(f_s)$ we have that $C \cap V(f_1, \dots, f_s) = C \cap V(f_1, \dots, f_{s-1})$ and by the inductive hypothesis,
\[ h(C \cap V(f_1, \dots, f_s)) \leq \Big( h(C) + \Big( \sum_{i=1}^{m_0} \frac{1}{d_i} \Big) h \deg C + m_0 \log(n+1) \deg C \Big) \prod_{i=1}^{m_0} d_i \]
with $m_0 := \min\{r, s-1\}$.

In case $C \not\subset V(f_s)$ we have either $C \cap V(f_s) = \emptyset$ or $\dim C \cap V(f_s) \leq r - 1$. The first case is trivial. For the second case, we proceed as in the proof of Proposition 2.8, applying Lemma 2.4 instead of Lemma 2.2, and we obtain
\[ h(C \cap V(f_s)) \leq d_s \, h(C) + h \deg C + \log(n+1) \, d_s \deg C. \]
Since $\dim(C \cap V(f_s)) = r - 1$, we can apply the inductive hypothesis to the variety $C \cap V(f_s)$, and we obtain
\begin{align*}
h(C \cap V(f_1, \dots, f_s)) &\leq \Big( h(C \cap V(f_s)) + \Big( \sum_{i=1}^{n_0 - 1} \frac{1}{d_i} \Big) h \deg\big( C \cap V(f_s) \big) + (n_0 - 1) \log(n+1) \deg\big( C \cap V(f_s) \big) \Big) \prod_{i=1}^{n_0 - 1} d_i \\
&\leq \Big( h(C) + \Big( \sum_{i=1}^{n_0} \frac{1}{d_i} \Big) h \deg C + n_0 \log(n+1) \deg C \Big) \prod_{i=1}^{n_0} d_i.
\end{align*}
Finally,
\begin{align*}
h(V \cap V(f_1, \dots, f_s)) &\leq \sum_C h(C \cap V(f_1, \dots, f_s)) \\
&\leq \sum_C \Big( h(C) + \Big( \sum_{i=1}^{n_0} \frac{1}{d_i} \Big) h \deg C + n_0 \log(n+1) \deg C \Big) \prod_{i=1}^{n_0} d_i \\
&= \Big( h(V) + \Big( \sum_{i=1}^{n_0} \frac{1}{d_i} \Big) h \deg V + n_0 \log(n+1) \deg V \Big) \prod_{i=1}^{n_0} d_i.
\end{align*}

With the same notation as in Corollary 2.11, for $V := \mathbb{A}^n$ we obtain
\[ h(V(f_1, \dots, f_s)) \leq \Big( \Big( \sum_{i=1}^{n_0} \frac{1}{d_i} \Big) h + (n + n_0) \log(n+1) \Big) \prod_{i=1}^{n_0} d_i. \]
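As a consistency check on the two displayed bounds, the sketch below evaluates the right-hand side of Corollary 2.11 and its specialization to $V = \mathbb{A}^n$, where $\deg V = 1$, $r = n$, and $h(\mathbb{A}^n)$ is replaced by its upper bound $n \log(n+1)$. The function names and the sample values are illustrative.

```python
from math import log, prod


def corollary_2_11_bound(h_V, deg_V, r, n, degrees, h):
    """(h(V) + (sum_{i<=n0} 1/d_i) h deg V + n0 log(n+1) deg V) * prod_{i<=n0} d_i."""
    n0 = min(r, len(degrees))
    d = sorted(degrees, reverse=True)[:n0]   # d_1 >= ... >= d_s, keep the first n0
    inverse_sum = sum(1.0 / d_i for d_i in d)
    return (h_V + inverse_sum * h * deg_V + n0 * log(n + 1) * deg_V) * prod(d)


def affine_space_bound(n, degrees, h):
    """((sum_{i<=n0} 1/d_i) h + (n + n0) log(n+1)) * prod_{i<=n0} d_i for V = A^n."""
    n0 = min(n, len(degrees))
    d = sorted(degrees, reverse=True)[:n0]
    inverse_sum = sum(1.0 / d_i for d_i in d)
    return (inverse_sum * h + (n + n0) * log(n + 1)) * prod(d)


if __name__ == "__main__":
    n, degrees, h = 3, [3, 2], 1.5
    general = corollary_2_11_bound(n * log(n + 1), 1, n, n, degrees, h)
    print(general, affine_space_bound(n, degrees, h))
```

The two printed values coincide, which reflects how the second bound is obtained from the first.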
2.2.3. An arithmetic Bernstein-Kushnirenko theorem
From our estimate for the height of an affine toric variety (see Proposition 1.7) and the previous results of this section, we derive the following arithmetic version of the Bernstein-Kushnirenko theorem. We refer to Section 1.2.5 for the notation.

PROPOSITION 2.12
Let $f_1, \dots, f_s \in K[x_1, \dots, x_n]$, and set
\[ \mathcal{A} := \mathrm{Supp}(1, x_1, \dots, x_n, f_1, \dots, f_s) \subset (\mathbb{Z}_{\geq 0})^n. \]
Also, set $d := \max_i \deg f_i$ and $h := h(f_1, \dots, f_s)$. Then
• $\deg V(f_1, \dots, f_s) \leq \mathrm{Vol}(\mathcal{A})$,
• $h(V(f_1, \dots, f_s)) \leq \big( n \, h + 2^{2(n+1)} \log(n+1) \, d \big) \mathrm{Vol}(\mathcal{A})$.

Proof
Set $\mathcal{A} := \{\alpha_1, \dots, \alpha_N\}$. The case $N = 1$ is trivial, and so we assume $N \geq 2$. We also assume that $\alpha_1, \dots, \alpha_n$ are the vectors of the canonical basis of $\mathbb{R}^n$.

The map $\varphi_{\mathcal{A}} : \mathbb{A}^n \to \mathbb{A}^N$ induces an isomorphism between $\mathbb{A}^n$ and the affine toric variety $X_{\mathcal{A}} \subset \mathbb{A}^N$. The projection map $\pi_{\mathcal{A}} : \mathbb{A}^N \to \mathbb{A}^n$ defined by $y \mapsto (y_1, \dots, y_n)$ restricted to $X_{\mathcal{A}}$ is the inverse map of $\varphi_{\mathcal{A}}$.

For $i = 1, \dots, s$ we set $f_i = \sum_{j=1}^{N} a_{ij} \, x^{\alpha_j}$, and we let
\[ \ell_i := \sum_{j=1}^{N} a_{ij} \, y_j \in K[y_1, \dots, y_N] \]
be the associated linear form. Set
\[ V := V(f_1, \dots, f_s) \subset \mathbb{A}^n \qquad \text{and} \qquad W := X_{\mathcal{A}} \cap V(\ell_1, \dots, \ell_s) \subset \mathbb{A}^N. \]
We have $\varphi_{\mathcal{A}}(V) = W$, and so $V = \pi_{\mathcal{A}}(W)$. Then $\deg V \leq \deg W \leq \deg X_{\mathcal{A}} = \mathrm{Vol}(\mathcal{A})$ and
\begin{align*}
h(V) &\leq h(W) + 3 (n+1) \log(N+1) \deg W \\
&\leq h(X_{\mathcal{A}}) + n \, h \deg(X_{\mathcal{A}}) + 4 (n+1) \log(N+1) \deg(X_{\mathcal{A}}) \\
&\leq \big( n \, h + 2^{2n+1} \log N + 4 (n+1) \log(N+1) \big) \mathrm{Vol}(\mathcal{A})
\end{align*}
by successive application of Lemma 2.6, Corollary 2.11, and Proposition 1.7. Finally, $N \leq \binom{d+n}{n}$, and so $h(V) \leq \big( n \, h + 2^{2(n+1)} \log(n+1) \, d \big) \mathrm{Vol}(\mathcal{A})$.

It seems that the factor $2^{2n}$ in the estimate of $h(X_{\mathcal{A}})$ is superfluous. If this is the case, the above estimate can be considerably improved. V. Maillot has recently obtained another estimate for the height of the isolated points of $V(f_1, \dots, f_s)$, which is more precise in some particular cases (see [42, Cor. 8.2.3]).

2.3. Local height of norms and traces
Let $V \subset \mathbb{A}^n$ be an equidimensional variety of dimension $r$ and degree $D$ defined over a field $k$ which satisfies Assumption 1.5. As we see below, this implies that the projection $\pi_V : V \to \mathbb{A}^r$ defined by $x \mapsto (x_1, \dots, x_r)$ is finite (see Lemma 2.14).

Set $L := k(\mathbb{A}^r)$ and $M := L \otimes_{k[\mathbb{A}^r]} k[V]$, so that $M$ is a finite $L$-algebra of dimension $D$. Let $f \in k[x_1, \dots, x_n]$. We identify $f \in k[V]$ with the multiplication map $M \to M$ defined by $q \mapsto f q$. From the Hamilton-Cayley theorem we derive that the characteristic polynomial $\mathcal{X}_f \in L[t]$ of this map verifies $\mathcal{X}_f(f) = 0$.
The fact that the inclusion $\pi_V^* : k[\mathbb{A}^r] \hookrightarrow k[V]$ is integral implies that the minimal polynomial $m_f$ of this map lies in $k[\mathbb{A}^r][t]$. We have that $\mathcal{X}_f \mid m_f^D$ in $L[t]$, and so Gauss lemma implies that $\mathcal{X}_f$ lies in fact in $k[\mathbb{A}^r][t]$. Moreover, the natural map $k[V] \to M$ is an inclusion, as $V$ is an equidimensional variety, and so $\mathcal{X}_f(f) = 0$ in $k[V]$. Set $\mathcal{X}_f = t^D + b_{D-1}\, t^{D-1} + \cdots + b_0 \in k[\mathbb{A}^r][t]$. Then the norm $N_V(f)$ and the trace $\mathrm{Tr}_V(f)$ of $f$ are defined as
\[ N_V(f) := (-1)^D\, b_0 \in k[\mathbb{A}^r], \qquad \mathrm{Tr}_V(f) := -b_{D-1} \in k[\mathbb{A}^r]. \]
They equal, respectively, the determinant and the trace of the $L$-linear map $f : M \to M$. We also define the adjoint polynomial $f^*$ of $f$ as
\[ f^* := (-1)^{D-1} \bigl(f^{D-1} + b_{D-1}\, f^{D-2} + \cdots + b_1\bigr) \in k[x_1, \dots, x_n]. \]
From the identity $\mathcal{X}_f(f) = 0$ we obtain that $f^* f = N_V(f)$ in $k[V]$. The key result of this section is a precise bound for the height of the norm and the trace of a polynomial in the case when $k$ is a number field.

2.3.1. Characteristic polynomials

Let $V \subset \mathbb{A}^n$ be an equidimensional variety of dimension $r$ and degree $D$ defined over $k$. We keep notation as in Section 2.1.1: for $d \in \mathbb{N}$ we denote by
\[ F := \sum_{|\alpha| \le d} U(d)_{0\alpha}\, x^{\alpha} \quad \text{and} \quad L_i := U_{i0} + U_{i1}\, x_1 + \cdots + U_{in}\, x_n \]
the generic polynomial of degree $d$ and 1 associated to the group of variables $U(d)_0$ and $U_i$, respectively. As before, we set $U(d) := \{U(d)_0, U_1, \dots, U_r\}$ and $N := \binom{d+n}{n} + r\,(n+1)$. Also, we introduce an additional group $T := \{T_0, \dots, T_r\}$ of $r + 1$ variables which correspond to the coordinate functions of $\mathbb{A}^{r+1}$. We consider the map
\[ \psi : \mathbb{A}^N \times \mathbb{A}^n \to \mathbb{A}^N \times \mathbb{A}^{r+1}, \qquad (\nu(d), \xi) \mapsto \bigl(\nu(d), F(\nu(d)_0)(\xi), L_1(\nu_1)(\xi), \dots, L_r(\nu_r)(\xi)\bigr), \]
where $\nu(d) := (\nu(d)_0, \nu_1, \dots, \nu_r) \in \mathbb{A}^N$ and $\xi \in \mathbb{A}^n$. Then the Zariski closure $\overline{\psi(\mathbb{A}^N \times V)} \subset \mathbb{A}^N \times \mathbb{A}^{r+1}$ is a hypersurface, and any of its defining equations $P_{d,V} \in k[U(d)][T]$ is called a $d$-characteristic polynomial of $V$. Also, we define the characteristic polynomial of $V$ by $P_V := P_{1,V}$. A $d$-characteristic polynomial is uniquely defined up to a scalar factor. In the case when $V$ is irreducible, $\overline{\psi(\mathbb{A}^N \times V)}$ is an irreducible hypersurface and thus $P_{d,V}$ is an irreducible polynomial. When $V$ is equidimensional, it coincides with the product of $d$-characteristic polynomials of its irreducible components.
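The norm, trace, and adjoint polynomial defined at the start of Section 2.3 can be made concrete in the simplest situation. The following is a minimal sketch, assuming $r = 0$, $k = \mathbb{Q}$, and $V \subset \mathbb{A}^1$ the zero set of a monic squarefree polynomial $p$, so that $M = \mathbb{Q}[x]/(p)$ and $D = \deg p$. The function names (`companion`, `charpoly`, `norm_trace_adjoint`) and the choice of the Faddeev-LeVerrier recursion are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def companion(p):
    """Matrix of multiplication by x on Q[x]/(p) in the basis 1, x, ..., x^(D-1).
    p lists the coefficients of a monic polynomial, lowest degree first."""
    D = len(p) - 1
    C = [[Fraction(0)] * D for _ in range(D)]
    for j in range(D - 1):
        C[j + 1][j] = Fraction(1)       # x * x^j = x^(j+1)
    for i in range(D):
        C[i][D - 1] = -Fraction(p[i])   # x * x^(D-1) reduced modulo p
    return C

def evaluate(f, C):
    """Matrix f(C) for f given by coefficients, lowest degree first (Horner's rule)."""
    n = len(C)
    M = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(f):
        M = matmul(M, C)
        M = [[M[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return M

def charpoly(A):
    """Coefficients [b_0, ..., b_(D-1), 1] of det(t*I - A) (Faddeev-LeVerrier recursion)."""
    n = len(A)
    coeffs = [Fraction(0)] * n + [Fraction(1)]
    M = identity(n)
    for k in range(1, n + 1):
        AM = matmul(A, M)
        c = -sum(AM[i][i] for i in range(n)) / k
        coeffs[n - k] = c
        M = [[AM[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return coeffs

def norm_trace_adjoint(f, p):
    """Return N_V(f), Tr_V(f), the matrix of f, and the matrix of the adjoint f*."""
    D = len(p) - 1
    Mf = evaluate(f, companion(p))
    b = charpoly(Mf)
    norm = (-1) ** D * b[0]                 # N_V(f)  := (-1)^D b_0
    trace = -b[D - 1]                       # Tr_V(f) := -b_(D-1)
    sign = (-1) ** (D - 1)
    # f* := (-1)^(D-1) (f^(D-1) + b_(D-1) f^(D-2) + ... + b_1)
    adj = [[sign * e for e in row] for row in evaluate(b[1:], Mf)]
    return norm, trace, Mf, adj

# Hypothetical example: V = {-1, 0, 1} (p = x^3 - x) and f = x^2 + x + 1.
norm, trace, Mf, adj = norm_trace_adjoint([1, 1, 1], [0, -1, 0, 1])
```

In this example $f$ takes the values 1, 1, 3 on the three points of $V$, so the norm is their product and the trace is their sum, and the matrix product `matmul(adj, Mf)` is the norm times the identity, which is the identity $f^* f = N_V(f)$ in $k[V]$.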
The following construction links the characteristic polynomial of a variety with its generalized Chow form. Set
\[ \zeta(d)_{0\alpha} := \begin{cases} U(d)_{00} - T_0 & \text{for } \alpha = 0, \\ U(d)_{0\alpha} & \text{for } \alpha \ne 0. \end{cases} \]
Analogously, for $i = 1, \dots, r$ we set $\zeta_{i0} := U_{i0} - T_i$ and $\zeta_{ij} := U_{ij}$ for $j \ne 0$. Finally, we set $\zeta(d) := (\zeta(d)_0, \zeta_1, \dots, \zeta_r)$.

LEMMA 2.13 Let $V \subset \mathbb{A}^n$ be an equidimensional variety of dimension $r$ and degree $D$. Let $F_{d,V}$ be a $d$-Chow form of $V$. Then $F_{d,V} \circ \zeta(d)$ is a $d$-characteristic polynomial of $V$.
Proof It is enough to consider the case when $V$ is irreducible. Let $P_{d,V}$ be a $d$-characteristic polynomial of $V$. For $(\nu(d), \xi) \in \mathbb{A}^N \times V$ we set
\[ \vartheta := \bigl(F(\nu(d)_0)(\xi), L_1(\nu_1)(\xi), \dots, L_r(\nu_r)(\xi)\bigr) \in \mathbb{A}^{r+1}, \]
so that $P_{d,V}(\nu(d))(\vartheta) = 0$. We observe that
\[ \xi \in V \cap \{F(\nu(d)_0)(x) = \vartheta_0\} \cap \{L_1(\nu_1)(x) = \vartheta_1\} \cap \cdots \cap \{L_r(\nu_r)(x) = \vartheta_r\} \subset \mathbb{A}^n. \]
In particular, this variety is nonempty, and so we infer that $F_{d,V} \circ \zeta(d)\,(\nu(d), \vartheta) = 0$. This implies that $P_{d,V} \mid F_{d,V} \circ \zeta(d)$, as $P_{d,V}$ is an irreducible polynomial. On the other hand, $F_{d,V} \circ \zeta(d)$ is also irreducible, as it is multihomogeneous and $F_{d,V} \circ \zeta(d)\,(U(d), 0) = F_{d,V}(U(d))$. We conclude that $P_{d,V}$ and $F_{d,V} \circ \zeta(d)$ coincide up to a factor in $k^*$.

The previous construction shows that a $d$-characteristic polynomial of $V$ is multihomogeneous of degree $D$ in the group of variables $U(d)_0 \cup \{T_0\}$ and of degree $d\,D$ in each group $U_i \cup \{T_i\}$. Set $k_d := k(U(d))$, and set
\[ \varphi : \mathbb{A}^n(k_d) \to \mathbb{A}^{r+1}(k_d), \qquad x \mapsto \bigl(F(x), L_1(x), \dots, L_r(x)\bigr). \]
Then $P_{d,V} \in k_d[T]$ is also a minimal equation for the hypersurface $\varphi(V)$, and by Bézout inequality we have also $\deg_T P_{d,V} \le d\,D$ (see, e.g., [50, Prop.1]).

We assume from now on that $V$ satisfies Assumption 1.5, that is, $\#\pi_V^{-1}(0) = \deg V$. In order to avoid the indeterminacy of the $d$-characteristic polynomial, we fix it as
\[ P_{d,V} := (-1)^D\, \mathrm{Ch}_{d,V} \circ \zeta(d). \]
In particular, we set $P_V := (-1)^D\, \mathrm{Ch}_V \circ \zeta(1)$ for the characteristic polynomial of $V$. Set $P_V := a_D\, T_0^D + \cdots + a_0$ for the expansion of $P_V$ with respect to $T_0$. We have that $P_V$ is multihomogeneous of degree $D$ in each group $U_i \cup \{T_i\}$. This implies that $a_D$ lies in fact in $k[U_1, \dots, U_r]$ and is multihomogeneous of degree $D$ in each $U_i$ for $i = 1, \dots, r$. Moreover, $a_D$ coincides with the coefficient of $U_{00}^D$ in $\mathrm{Ch}_V$, and the imposed normalization on $\mathrm{Ch}_V$ implies that $a_D(e_1, \dots, e_r) = \mathrm{Ch}_V(e_0, e_1, \dots, e_r) = 1$.

We extend the morphism $\varrho_d$ of Section 2.1.1 to a morphism $k[U(d)][T] \to k[U_0, \dots, U_r][T]$ defining $\varrho_d(U(d)_{00} - T_0) := (U_{00} - T_0)^d$ and $\varrho_d(T_i) := T_i$ for $1 \le i \le r$. In other terms,
\[ \varrho_d(T_0) = \sum_{j=1}^{d} (-1)^{j-1} \binom{d}{j}\, U_{00}^{d-j}\, T_0^{j}. \]
We obtain
\[ \varrho_d(P_{d,V}) = \varrho_d\bigl((-1)^D\, \mathrm{Ch}_{d,V} \circ \zeta(d)\bigr) = (-1)^D \bigl(\mathrm{Ch}_V \circ \zeta(1)\bigr)^d = (-1)^{(d+1)D}\, P_V^d. \]
Now, set
\[ P_{d,V} = a_{d,D}\, T_0^D + \cdots + a_{d,0} \]
for the expansion of $P_{d,V}$ with respect to $T_0$. The previous remark implies that $a_{d,D} = \varrho_d(a_{d,D}) = a_D^d$. In particular, $a_{d,D} \in k[U_1, \dots, U_r]$ and $a_{d,D}(e_1, \dots, e_r) = 1$.

The following lemma allows us to obtain a characteristic polynomial of $f \in k[x_1, \dots, x_n]$ from the $d$-characteristic polynomial of the variety $V$. We introduce the following convention: Given a polynomial $f \in k[x_1, \dots, x_n]$ of degree $d$ and linear forms $\ell_1, \dots, \ell_r \in k[x_1, \dots, x_n]$, we denote by $P_{d,V}(f, \ell_1, \dots, \ell_r)$ the specialization of the variables in $U(d)$ into the coefficients of $f, \ell_1, \dots, \ell_r$.

LEMMA 2.14 Let $V \subset \mathbb{A}^n$ be an equidimensional variety of dimension $r$ and degree $D$ which satisfies Assumption 1.5. Then the projection $\pi_V : V \to \mathbb{A}^r$ is finite. Moreover, for a polynomial $f \in k[x_1, \dots, x_n]$ of degree $d$, the characteristic polynomial of $f$ is given by
\[ \mathcal{X}_f = P_{d,V}(f, e_1, \dots, e_r)(t, x_1, \dots, x_r) \in k[\mathbb{A}^r][t]. \]
Proof We have that $P_V(U_0, \dots, U_r)(L_0, \dots, L_r) = 0$ in $k[U] \otimes k[V]$, and so
\[ P_V(e_j, e_1, \dots, e_r)(t, x_1, \dots, x_r) \in k[\mathbb{A}^r][t] \]
is a monic equation for $x_j$ in $k[V]$, for $j = r+1, \dots, n$. Thus the projection $\pi_V$ is finite.

For the second assertion, set
\[ P_F(t) := P_{d,V}\bigl(U(d)_0, e_1, \dots, e_r\bigr)(t, x_1, \dots, x_r) \in k[U(d)_0][\mathbb{A}^r][t]. \]
This is a polynomial of degree $D$. It is monic with respect to $t$, as $a_{d,D} \in k[U_1, \dots, U_r]$ and $a_{d,D}(e_1, \dots, e_r) = 1$. We have $P_F(F) = 0$ in $k[U(d)_0] \otimes k[V]$.

Now, let $m_F$ be the monic minimal polynomial of $F$. Let $U'(d)_0$ be a group of $\binom{d+n-r}{n-r}$ variables, and set $F_0$ for the generic polynomial of degree $d$ in the variables $x_{r+1}, \dots, x_n$. Then $m_F(U'(d)_0, 0) \in k[U'(d)_0][t]$ is an equation for $F_0$ over $\pi_V^{-1}(0)$. Since $\pi_V^{-1}(0)$ is a zero-dimensional variety of degree $D$ and $F_0$ separates its points, we infer that $\deg_{T_0} m_F = D$, and so $P_F = m_F$. Finally, we obtain
\[ \mathcal{X}_f = \mathcal{X}_F(f) = P_F(f) = P_{d,V}(f, e_1, \dots, e_r)(t, x_1, \dots, x_r). \]
2.3.2. Estimates for norms and traces

Now, we prove the announced estimates for the height of the norm and the trace of a polynomial.

LEMMA 2.15 Let $V \subset \mathbb{A}^n$ be an equidimensional variety of dimension $r$ defined over $K$ which satisfies Assumption 1.5. Let $f \in K[x_1, \dots, x_n]$. Then
• $\deg N_V(f) \le \deg f \deg V$,
• $h_v(N_V(f)) \le \deg f\, h_v(V) + h_v(f) \deg V + (r+1) \log(n+1) \deg f \deg V$ for $v \in M_K^\infty$,
• $h_v(N_V(f)) \le \deg f\, h_v(V) + h_v(f) \deg V$ for $v \notin M_K^\infty$.

Proof We keep notation as in Section 2.3.1. Set $d := \deg f$ and $D := \deg V$. We then have
\[ N_V(f) = (-1)^D\, P_{d,V}(f, e_1, \dots, e_r)(0, x_1, \dots, x_r) = \mathrm{Ch}_{d,V}(f, e_1 - e_0\, x_1, \dots, e_r - e_0\, x_r) \]
by Lemmas 2.14 and 2.13. Then $\deg N_V(f) \le \deg_T P_{d,V} \le d\,D$. From the previous expression we also obtain that the coefficients of $N_V(f)$ are some of the coefficients of $\mathrm{Ch}_{d,V}(f)$, and so $|N_V(f)|_v \le |\mathrm{Ch}_{d,V}(f)|_v$ for every absolute value $v$ of $K$. Let $v \in M_K^\infty$. Then
\[ \log |N_V(f)|_v \le \log |\mathrm{Ch}_{d,V}(f)|_v \le m\bigl(\sigma_v(\mathrm{Ch}_{d,V}(f)); S_{n+1}^{r}\bigr) + r \sum_{i=1}^{n} \frac{1}{2i}\, d\,D + r \log(n+1)\, d\,D \le d\, h_v(V) + h_v(f)\, D + (r+1) \log(n+1)\, d\,D \]
by inequalities (1.1) and (1.2) and Lemma 2.2. In a similar way we obtain $h_v(N_V(f)) \le d\, h_v(V) + h_v(f)\, D$ for $v \notin M_K^\infty$.

The proof of the following lemma follows closely that of [50, Lem. 9]. We slightly improve the degree estimate obtained therein, and we get the corresponding height estimate.

LEMMA 2.16 Let $V \subset \mathbb{A}^n$ be an equidimensional variety of dimension $r$ defined over $K$ which satisfies Assumption 1.5. Let $f, g \in K[x_1, \dots, x_n]$ such that $f$ is not a zero divisor in $K[V]$. Set $d := \max\{\deg f, \deg g\}$ and $h_v := \max\{h_v(f), h_v(g)\}$ for $v \in M_K$. Then
• $\deg \mathrm{Tr}_V(f^* g) \le d \deg V$,
• $h_v(\mathrm{Tr}_V(f^* g)) \le d\, h_v(V) + (h_v + \log 2) \deg V + (r+1) \log(n+1)\, d \deg V$ for $v \in M_K^\infty$,
• $h_v(\mathrm{Tr}_V(f^* g)) \le d\, h_v(V) + h_v \deg V$ for $v \notin M_K^\infty$.
Proof Let $D := \deg V$, and let $t$ be a new variable. Then $K[x_1, \dots, x_r, t] \hookrightarrow K[V \times \mathbb{A}^1]$ is again an integral inclusion and $N_{V \times \mathbb{A}^1}(t - f^* g) = \mathcal{X}_{f^* g}(t)$. Set $Q(t) := N_{V \times \mathbb{A}^1}(t\,f - g) \in K[x_1, \dots, x_r, t]$. Since $f^* f = N_V(f)$, we have that $N_V(f^*) = N_V(f)^{D-1}$, and so
\[ N_V(f)^{D-1}\, Q = N_V(f^*)\, Q = \mathcal{X}_{f^* g}\bigl(N_V(f)\, t\bigr). \]
Set $Q = c_D\, t^D + \cdots + c_0$ with $c_i \in K[\mathbb{A}^r]$. The last identity then implies $\mathrm{Tr}_V(f^* g) = -c_{D-1}$.
Set $q > D$, and let $G_q$ denote the group of $q$-roots of 1. Then $Q(\omega) = N_V(\omega f - g)$ for $\omega \in G_q$, and so
\[ \mathrm{Tr}_V(f^* g) = -\frac{1}{q} \sum_{\omega \in G_q} N_V(\omega f - g)\, \omega^{1-D}. \]
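The last formula recovers the coefficient $c_{D-1}$ of $Q$ by averaging its values over the $q$-th roots of unity. A minimal numerical sketch follows, assuming the zero-dimensional case $r = 0$ with $V$ a finite set of rational points, where $N_V(h)$ is simply the product of the values of $h$ on $V$. The set `points` and the functions `f`, `g` are hypothetical choices made only for illustration.

```python
import cmath

points = [-1, 0, 1]               # V: three rational points in A^1, so r = 0 and D = 3
f = lambda x: x * x + x + 1       # f does not vanish on V
g = lambda x: x

def norm(h):
    """N_V(h) for a function h on the finite set V: the product of its values."""
    result = 1
    for p in points:
        result *= h(p)
    return result

def trace_adjoint_times_g(q):
    """-(1/q) * sum over q-th roots of unity omega of N_V(omega*f - g) * omega^(1 - D).
    Valid for q > D, since then no other coefficient of Q aliases onto c_(D-1)."""
    D = len(points)
    total = 0
    for m in range(q):
        omega = cmath.exp(2j * cmath.pi * m / q)
        total += norm(lambda x: omega * f(x) - g(x)) * omega ** (1 - D)
    return -total / q

# Direct computation for comparison: f* takes the value N_V(f) / f(p) at each point p.
direct = sum(norm(f) / f(p) * g(p) for p in points)
```

Up to floating-point error, `trace_adjoint_times_g(q)` agrees with `direct` for any `q` greater than 3. Exact arithmetic would need an exact representation of the roots of unity, which this sketch does not attempt.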
From Lemma 2.15 we get $\deg \mathrm{Tr}_V(f^* g) \le d\,D$. For $v \in M_K^\infty$, we then obtain
\[ h_v(\mathrm{Tr}_V(f^* g)) \le \max_{\omega \in G_q} h_v\bigl(N_V(\omega f - g)\bigr) \le d\, h_v(V) + (h_v + \log 2)\, D + (r+1) \log(n+1)\, d\,D. \]
Analogously, for $v \notin M_K^\infty$ we take $q > D$ such that $|q|_v = 1$, and we obtain $h_v(\mathrm{Tr}_V(f^* g)) \le d\, h_v(V) + h_v\, D$.

3. An effective arithmetic Nullstellensatz

In this section we obtain the announced estimates for the arithmetic Nullstellensatz over the ring of integers of a number field $K$. Theorem 1 in the introduction corresponds to the case $K := \mathbb{Q}$. These estimates depend on the number of variables and on the degree and height of the input polynomials.

3.1. Division modulo complete intersection ideals

A crucial tool in our treatment of the arithmetic Nullstellensatz is the trace formula. One of its outstanding features is that it performs effective division modulo complete intersection ideals (see [19], [13], [32], [50], [17], [22]). In this section we apply the trace formula to obtain sharp height estimates in the division procedure.

3.1.1. Trace formula

We describe in what follows the basic aspects of duality theory for complete intersection algebras that we need in the sequel. We refer to E. Kunz [34, Appendix F] for a more complete presentation of this theory.

Let $k$ be a perfect field, and set $A := k[t_1, \dots, t_r]$ and $A[x] := A[x_1, \dots, x_n]$. Let $F := \{F_1, \dots, F_n\} \subset A[x]$ be a reduced complete intersection that defines a radical ideal $(F)$ of dimension $r$. We consider the $A$-algebra
\[ B := A[x]/(F) = A[x_1, \dots, x_n]/(F_1, \dots, F_n). \]
We assume that the inclusion $A \hookrightarrow B$ is finite; that is, the variables $t_1, \dots, t_r$ are in Noether normal position with respect to the variety $V := V(F) \subset \mathbb{A}^{r+n}$. This is
the case, for instance, if $V$ satisfies Assumption 1.5. Thus $B$ is a projective $A$-module that turns out to be free of rank bounded by $\deg V$ by the Quillen-Suslin theorem.

The dual $A$-module $B^* := \mathrm{Hom}_A(B, A)$ can be seen as a $B$-module with scalar multiplication defined by $f \cdot \tau(g) := \tau(f g)$ for $f, g \in B$ and $\tau \in B^*$. It is a free $B$-module of rank 1, and any of its generators is called a trace of $B$. The following construction yields a trace $\sigma$ canonically associated to the complete intersection $F$.

We take new variables $y := \{y_1, \dots, y_n\}$, and we set $F_i^{(x)} := F_i(x) \in A[x]$ and $F_i^{(y)} := F_i(y) \in A[y]$. Then $F_i^{(y)} - F_i^{(x)}$ belongs to the ideal $(y_1 - x_1, \dots, y_n - x_n)$, and so there exist (nonunique) $P_{ij} \in A[x, y]$ such that
\[ F_i^{(y)} - F_i^{(x)} = \sum_{j=1}^{n} P_{ij}(x, y)\, (y_j - x_j) \]
for $i = 1, \dots, n$. We consider the determinant $\Delta \in A[x, y]$ of the square matrix $(P_{ij})_{ij}$, and we write it as
\[ \Delta = \sum_m a_m\, b_m \]
with $a_m \in A[x]$ and $b_m \in A[y]$. Again, the polynomials $a_m$, $b_m$ are not uniquely defined. The polynomial $\Delta \in A[x, y]$ is called a pseudo-Jacobian determinant of the complete intersection $F$. Set $c_m := b_m(x) \in A[x]$. Then there exists a unique trace $\sigma \in B^*$ such that for $g \in A[x]$,
\[ \bar{g} = \sum_m \sigma(\bar{g}\, \bar{a}_m)\, \bar{c}_m, \]
where the bar denotes class modulo $(F)$. This identity is known as the trace formula. Let $J := \det(\partial F_i / \partial x_j)_{ij}$ be the Jacobian determinant of the complete intersection $F$ with respect to the variables $x_1, \dots, x_n$. Then the following identity, which justifies the name of pseudo-Jacobian for $\Delta$, holds:
\[ \bar{J} = \sum_m \bar{a}_m\, \bar{c}_m. \]
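As a sanity check on this construction, here is a small worked instance of the trace formula. The example is not taken from the paper: it assumes one dependent variable ($r = n = 1$), a field $k$ of characteristic different from 2, and the hypothetical equation $F = x^2 - t - 1$, chosen only because every object can be written down by hand. It also uses the standard relation $\mathrm{Tr}_V(g) = \sigma(J g)$ between $\sigma$ and the usual trace.

```latex
% Illustrative example (an assumption for exposition, not from the source):
% A = k[t], F = x^2 - t - 1, B = A[x]/(F), free over A with basis 1, x.
\begin{align*}
F^{(y)} - F^{(x)} &= y^2 - x^2 = (x + y)\,(y - x), \\
\Delta = P_{11} &= x + y = a_1 b_1 + a_2 b_2
   \quad\text{with } a_1 = x,\ b_1 = 1,\ a_2 = 1,\ b_2 = y, \\
c_1 = 1, \quad c_2 &= x, \qquad
J = \partial F / \partial x = 2x = a_1 c_1 + a_2 c_2 .
\end{align*}
% Multiplication by x has matrix ((0, t+1), (1, 0)) in the basis 1, x,
% so Tr_V(1) = 2 and Tr_V(x) = 0. From Tr_V(g) = \sigma(J g):
\begin{align*}
2\,\sigma(x) &= \mathrm{Tr}_V(1) = 2 &&\Longrightarrow\quad \sigma(x) = 1, \\
2\,(t+1)\,\sigma(1) &= \sigma(2x^2) = \mathrm{Tr}_V(x) = 0 &&\Longrightarrow\quad \sigma(1) = 0 .
\end{align*}
% Trace formula: g = \sigma(g a_1) c_1 + \sigma(g a_2) c_2 = \sigma(g x) + \sigma(g)\,x,
% checked on the A-basis of B:
\begin{align*}
g = 1 &: \quad \sigma(x) + \sigma(1)\,x = 1, \\
g = x &: \quad \sigma(x^2) + \sigma(x)\,x = (t+1)\,\sigma(1) + x = x .
\end{align*}
```

Since both sides of the trace formula are $A$-linear in $g$, checking it on the basis $1, x$ is enough in this example.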
The standard trace $\mathrm{Tr}_V$ is related to $\sigma$ by the equality $\mathrm{Tr}_V(g) = \sigma(J g)$ for all $g \in A[x]$.

3.1.2. A division lemma

Throughout this section we keep notation and assumptions as in the previous one, but we replace $k$ by a number field $K$. Set $d := \max_i \deg F_i$ and $h_v := \max_i h_v(F_i)$
for $v \in M_K$. Here $\deg F_i$ denotes the total degree of $F_i$ as an element of $K[t_1, \dots, t_r][x_1, \dots, x_n]$.

We choose concrete polynomials $a_m$, $c_m$ which satisfy the trace formula, and we estimate their degree and local height. First, we choose the polynomials $P_{ij}$. Remarking that
\[ F_i^{(y)} - F_i^{(x)} = \sum_{j=1}^{n} \bigl( F_i(x_1, \dots, x_{j-1}, y_j, \dots, y_n) - F_i(x_1, \dots, x_j, y_{j+1}, \dots, y_n) \bigr), \]
we set
\[ P_{ij} := \bigl( F_i(x_1, \dots, x_{j-1}, y_j, \dots, y_n) - F_i(x_1, \dots, x_j, y_{j+1}, \dots, y_n) \bigr) / (y_j - x_j). \]
Here we perform the division through the formula
\[ (y_j^k - x_j^k)/(y_j - x_j) = y_j^{k-1} + y_j^{k-2}\, x_j + \cdots + y_j\, x_j^{k-2} + x_j^{k-1}. \]
We set $\Delta := \det(P_{ij})_{ij}$. Finally, we choose $b_m \in A[y]$ as the monomials in the expansion of $\Delta$ with respect to $y$, $a_m \in A[x]$ as the corresponding coefficient, and we set $c_m := b_m(x)$.

Set $F_i = \sum_{\alpha} A_{i\alpha}\, x^{\alpha}$ with $A_{i\alpha} \in A$. Then
\[ P_{ij} = \sum_{\alpha} A_{i\alpha}\, x_1^{\alpha_1} \cdots x_{j-1}^{\alpha_{j-1}}\, y_{j+1}^{\alpha_{j+1}} \cdots y_n^{\alpha_n}\, \bigl( y_j^{\alpha_j - 1} + \cdots + x_j^{\alpha_j - 1} \bigr) \in A[x, y]. \]
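The explicit formula for $P_{ij}$ translates directly into a computation on monomials. The sketch below is an illustration under simplifying assumptions: a single polynomial $F$ with integer coefficients, stored as a dictionary from exponent tuples to coefficients, with the coefficient ring $A$ replaced by the integers. The names `divided_differences`, `evaluate`, and `check_identity` are hypothetical.

```python
from collections import defaultdict

def divided_differences(F, n):
    """Return [P_1, ..., P_n] with F(y) - F(x) = sum_j P_j(x, y) * (y_j - x_j).
    F maps exponent tuples (a_1, ..., a_n) to coefficients; each P_j maps
    exponent tuples in the 2n variables (x_1, ..., x_n, y_1, ..., y_n) to coefficients."""
    P = []
    for j in range(n):
        Pj = defaultdict(int)
        for alpha, coeff in F.items():
            # (y_j^a - x_j^a) / (y_j - x_j) = sum_{k=0}^{a-1} y_j^(a-1-k) * x_j^k
            for k in range(alpha[j]):
                ex = list(alpha[:j]) + [k] + [0] * (n - j - 1)                  # x_1..x_(j-1), x_j^k
                ey = [0] * j + [alpha[j] - 1 - k] + list(alpha[j + 1:])         # y_j^(a-1-k), y_(j+1)..y_n
                Pj[tuple(ex + ey)] += coeff
        P.append(dict(Pj))
    return P

def evaluate(poly, values):
    """Evaluate a polynomial given as {exponent tuple: coefficient} at a point."""
    total = 0
    for exps, coeff in poly.items():
        term = coeff
        for v, e in zip(values, exps):
            term *= v ** e
        total += term
    return total

def check_identity(F, n, x, y):
    """Check F(y) - F(x) == sum_j P_j(x, y) * (y_j - x_j) at integer points x, y (tuples)."""
    P = divided_differences(F, n)
    lhs = evaluate(F, y) - evaluate(F, x)
    rhs = sum(evaluate(P[j], x + y) * (y[j] - x[j]) for j in range(n))
    return lhs == rhs

# Hypothetical example: F = x1^2 * x2 + 3 * x2^3 - x1 in n = 2 variables.
F_example = {(2, 1): 1, (0, 3): 3, (1, 0): -1}
```

Setting $y = x$ in $P_j$ gives the partial derivative $\partial F / \partial x_j$, which is the reason the determinant of the matrix $(P_{ij})$ restricts to the Jacobian determinant on the diagonal.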
We deduce that $\deg P_{ij} \le d - 1$ and $h_v(P_{ij}) \le h_v$ for every $v \in M_K$. Then $\deg \Delta \le n\,(d-1)$, and so $\deg a_m + \deg c_m \le n\,(d-1)$. We have also $h_v(c_m) = 0$ and $h_v(a_m) \le h_v(\Delta)$.

Finally, we can write $P_{ij} = C_0 + \cdots + C_{d-1}\, y_j^{d-1}$, where each $C_k \in A[x_1, \dots, x_j, y_{j+1}, \dots, y_n]$ is a polynomial in $n + r$ variables of total degree bounded by $\deg P_{ij} \le d - 1$. This implies that the number of monomials of $P_{ij}$ is bounded by $d \binom{n+r+d-1}{n+r} \le d\,(n+r+1)^{d-1}$.

Therefore, for $v \in M_K^\infty$ we have
\[ h_v(a_m) \le h_v(\Delta) \le n\,h_v + (n-1) \bigl(\log d + (d-1) \log(n+r+1)\bigr) + n \log n \le n\, \bigl(h_v + d \log(n+r+1) + \log d\bigr). \tag{3.1} \]
Analogously, we have $h_v(a_m) \le n\,h_v$ for $v \notin M_K^\infty$.
The following lemma is a sharp estimate for the degree and local height of the polynomials in the division procedure. It is a substantial improvement over [32, Th. 29]. We introduce the notation $\deg_t f$ and $\deg_x f$ for the degree of a polynomial $f \in A[x]$ with respect to the groups of variables $t$ and $x$, respectively.

MAIN LEMMA 3.1 (Division lemma) Set $A := K[t_1, \dots, t_r]$ and $A[x] := A[x_1, \dots, x_n]$. Let $F := \{F_1, \dots, F_n\} \subset A[x]$ be a reduced complete intersection defining a variety $V := V(F) \subset \mathbb{A}^{r+n}$ which satisfies Assumption 1.5. Set $B := K[V] = A[x]/(F)$. Let $f, g \in A[x]$ be polynomials such that $f \in B$ is a nonzero divisor and $f \mid g$ in $B$. Set $d := \max\{\deg f, \deg F_1, \dots, \deg F_n\}$ and $h_v := \max\{h_v(f), h_v(F_1), \dots, h_v(F_n)\}$ for $v \in M_K$. Then there exist $q \in A[x]$ and $\xi \in K^*$ such that
• $q\,f = g$,
• $\deg_x q \le n\,d$,
• $\deg q \le \deg_t g + \bigl(n\,d + \max\{(n+1)\,d, \deg_x g\}\bigr) \deg V$,
• $h_v(q) \le h_v(g) + (n\,d + \max\{d, \deg_x g\})\, h_v(V) + \bigl((n+1)\,h_v + (r+6) \log(n+r+1)\, (n\,d + \max\{(n+1)\,d, \deg_x g\})\bigr) \deg V + 2 \log(r+1) \deg_t g - \log|\xi|_v$ for $v \in M_K^\infty$,
• $h_v(q) \le h_v(g) + (n\,d + \max\{d, \deg_x g\})\, h_v(V) + (n+1)\,h_v \deg V - \log|\xi|_v$ for $v \notin M_K^\infty$.
Proof Set $L := K(t_1, \dots, t_r)$ for the quotient field of $A$ and $M := L \otimes_A B$. Then $M$ is a finite $L$-algebra of dimension $\deg V$ and $\sigma$ can be uniquely extended to a $L$-linear map $\sigma : M \to M$. The fact that $B$ is a torsion-free $A$-algebra implies that the canonical map $B \to M$ is an inclusion.

We only consider the case $n \ge 1$. For the case $n = 0$ we refer to Remark 3.2. Whenever it is clear from the context, we avoid explicit reference to the ring in which we are considering a given element of $A[x]$.

Let $q_0 \in A[x]$ be any polynomial such that $q_0\, f = g$ in $B$. We have that $f$ is a nonzero divisor in $B$, and so it is invertible in $M$. Then $q_0 = f^{-1} g$ in $M$ and therefore $\sigma(f^{-1} g\, p) = \sigma(q_0\, p) \in A$ for all $p \in A[x]$. Then we set
\[ q := \sum_m \sigma(f^{-1} g\, a_m)\, c_m \in A[x]. \]
Trace formula implies that $q \equiv q_0 \pmod{(F)}$, and so $q\,f = g$ in $B$.
Let $J \in A[x]$ denote the Jacobian determinant of the complete intersection $F$ with respect to the group of variables $x$. This is a nonzero divisor because of the Jacobian criterion (see, for instance, [10, Th. 18.15]), and so it is also invertible in $M$. Let $(Jf)^*$ be the adjoint polynomial of $Jf$, and set
\[ \Lambda_m := \mathrm{Tr}_V\bigl((Jf)^*\, g\, a_m\bigr) \in A. \]
We have $Jf\,(Jf)^* = N(Jf) \in A \setminus \{0\}$, and so
\[ \Lambda_m / N(Jf) = \mathrm{Tr}\bigl((Jf)^{-1} g\, a_m\bigr) = \sigma(f^{-1} g\, a_m) \in A. \]
In particular, $N(Jf) \mid \Lambda_m$ in $A$, and we have the expression
\[ q = \frac{1}{N(Jf)} \sum_m \Lambda_m\, c_m. \]
In the sequel, $\xi \in K^*$ is any nonzero coefficient of $N(Jf)$.

Let us consider degrees. Clearly $\deg_x q \le \max_m \deg c_m \le n\,(d-1) \le n\,d$.

Next we analyze the total degree of $q$. Let $g := \sum_{\alpha} p_{\alpha}\, x^{\alpha}$ be the monomial expansion of $g$ with respect to $x$. Then
\[ \Lambda_m = \sum_{\alpha} p_{\alpha}\, \mathrm{Tr}\bigl((Jf)^*\, x^{\alpha} a_m\bigr), \tag{3.2} \]
as $\mathrm{Tr}$ is an $A$-linear map. We have the estimates $\deg(Jf) \le n\,(d-1) + d \le (n+1)\,d$ and $\deg(x^{\alpha} a_m) \le \deg_x g + \deg a_m$, from which we get
\[ \deg \mathrm{Tr}\bigl((Jf)^*\, x^{\alpha} a_m\bigr) \le \max\{(n+1)\,d, \deg_x g + \deg a_m\} \deg V \]
by Lemma 2.16. Thus
\begin{align*}
\deg q &\le \deg_t g + \max_m \bigl\{ \max\{(n+1)\,d, \deg_x g + \deg a_m\} \deg V + \deg c_m \bigr\} \\
&\le \deg_t g + \max_m \bigl\{ \max\{(n+1)\,d + \deg c_m, \deg_x g + \deg a_m + \deg c_m\} \bigr\} \deg V \\
&\le \deg_t g + \max\{(n+1)\,d + n\,d, \deg_x g + n\,d\} \deg V \\
&\le \deg_t g + \bigl(n\,d + \max\{(n+1)\,d, \deg_x g\}\bigr) \deg V.
\end{align*}
For the rest of the proof, we use the following basic estimates several times:
\[ \max\{\deg(Jf), \deg(x^{\alpha} a_m)\} \le n\,d + \max\{d, \deg_x g\}, \qquad \deg \mathrm{Tr}\bigl((Jf)^*\, x^{\alpha} a_m\bigr) \le (n\,d + \max\{d, \deg_x g\}) \deg V. \]

Finally, we estimate the local height of $q$. Let $v \in M_K^\infty$. We have $h_v(\partial F_i / \partial x_j) \le h_v + \log d$, and so
\[ h_v(J) \le n\,(h_v + \log d) + (n-1) \log(n+r+1)\,(d-1) + n \log n \le n\, \bigl(h_v + \log(n+r+1)\, d + \log d\bigr). \]
Therefore
\[ h_v(Jf) \le n\, \bigl(h_v + \log(n+r+1)\, d + \log d\bigr) + h_v + \log(n+r+1)\, d \le (n+1)\,h_v + (n+1) \log(n+r+1)\, d + n \log d \tag{3.3} \]
by Lemma 1.2(b). We recall that $h_v(x^{\alpha} a_m) \le n\, \bigl(h_v + \log(n+r+1)\, d + \log d\bigr)$ by inequality (3.1), and so
\[ \max\{h_v(Jf), h_v(x^{\alpha} a_m)\} \le (n+1)\,h_v + (n+1) \log(n+r+1)\, d + n \log d. \]
Then
\begin{align*}
h_v\bigl(\mathrm{Tr}((Jf)^*\, x^{\alpha} a_m)\bigr) &\le (n\,d + \max\{d, \deg_x g\})\, h_v(V) + \bigl((n+1)\,h_v + (n+1) \log(n+r+1)\, d + n \log d + \log 2\bigr) \deg V \\
&\quad + (r+1) \log(n+r+1)\, (n\,d + \max\{d, \deg_x g\}) \deg V \\
&\le (n\,d + \max\{d, \deg_x g\})\, h_v(V) + \bigl((n+1)\,h_v + (2n+1) \log(n+r+1)\, d\bigr) \deg V \\
&\quad + (r+1) \log(n+r+1)\, (n\,d + \max\{d, \deg_x g\}) \deg V
\end{align*}
by Lemma 2.16. By considering separately the cases $\deg_x g \le (n+1)\,d$ and $\deg_x g > (n+1)\,d$, we obtain
\[ (2n+1)\,d + (r+1)\,(n\,d + \deg_x g) \le \deg_x g + n\,d + (r+1)\,(n\,d + \deg_x g) \le (r+2)\,(n\,d + \deg_x g). \]
We conclude that
\[ h_v\bigl(\mathrm{Tr}((Jf)^*\, x^{\alpha} a_m)\bigr) \le (n\,d + \max\{d, \deg_x g\})\, h_v(V) + \bigl((n+1)\,h_v + (r+2) \log(n+r+1)\, (n\,d + \max\{(n+1)\,d, \deg_x g\})\bigr) \deg V. \]
Hence
\begin{align*}
h_v(\Lambda_m) &\le \max_{\alpha} \bigl\{ h_v\bigl(p_{\alpha}\, \mathrm{Tr}((Jf)^*\, x^{\alpha} a_m)\bigr) \bigr\} + \log(n+1) \deg_x g \\
&\le h_v(g) + \max_{\alpha} \bigl\{ h_v\bigl(\mathrm{Tr}((Jf)^*\, x^{\alpha} a_m)\bigr) \bigr\} + \log(r+1)\, (n\,d + \max\{d, \deg_x g\}) \deg V + \log(n+1) \deg_x g \\
&\le h_v(g) + (n\,d + \max\{d, \deg_x g\})\, h_v(V) + \bigl((n+1)\,h_v + (r+2) \log(n+r+1)\, (n\,d + \max\{(n+1)\,d, \deg_x g\})\bigr) \deg V \\
&\quad + \log(r+1)\, (n\,d + \max\{d, \deg_x g\}) \deg V + \log(n+1) \deg_x g \\
&\le h_v(g) + (n\,d + \max\{d, \deg_x g\})\, h_v(V) + \bigl((n+1)\,h_v + (r+4) \log(n+r+1)\, (n\,d + \max\{(n+1)\,d, \deg_x g\})\bigr) \deg V
\end{align*}
by application of identity (3.2) and Lemma 1.2(a,b). We have
\[ h_v(q) \le \max_m \bigl\{ h_v\bigl(\Lambda_m / N(Jf)\bigr) \bigr\} \]
as each $c_m$ is a different monomial in $x$. Thus it remains only to estimate the local height of each $\Lambda_m / N(Jf)$. Recall that $\xi \in K^*$ is any nonzero coefficient of $N(Jf)$. Then
\begin{align*}
\log |\Lambda_m / N(Jf)|_v &\le h_v(\Lambda_m) + 2 \log(r+1)\, \bigl(\deg_t g + (n\,d + \max\{d, \deg_x g\}) \deg V\bigr) - \log |N(Jf)|_v \\
&\le h_v(g) + (n\,d + \max\{d, \deg_x g\})\, h_v(V) + \bigl((n+1)\,h_v + (r+6) \log(n+r+1)\, (n\,d + \max\{(n+1)\,d, \deg_x g\})\bigr) \deg V \\
&\quad + 2 \log(r+1) \deg_t g - \log|\xi|_v \tag{3.4}
\end{align*}
by Lemma 1.2(d) and the fact that $\log|\xi|_v \le \log|N(Jf)|_v$. From Lemma 2.15 and inequality (3.3) we obtain
\begin{align*}
\log|\xi|_v \le h_v\bigl(N(Jf)\bigr) &\le (n+1)\,d\, h_v(V) + \bigl((n+1)\,h_v + (n+1) \log(n+r+1)\, d + n \log d\bigr) \deg V + (r+1)\,(n+1) \log(n+r+1)\, d \deg V \\
&\le (n+1)\,d\, h_v(V) + \bigl((n+1)\,h_v + (r+3)\,(n+1) \log(n+r+1)\, d\bigr) \deg V. \tag{3.5}
\end{align*}
This implies that the right-hand side of inequality (3.4) is nonnegative. So the inequality also holds for $h_v\bigl(\Lambda_m / N(Jf)\bigr)$ and thus for $h_v(q)$.

The case $v \notin M_K^\infty$ is treated in exactly the same way. The obtained estimates do not involve any constant terms with respect to $h_v$, $h_v(g)$, and $h_v(V)$; in particular, $\deg_t g$ does not appear in the estimate. This follows simply from Lemma 1.2. In this case, inequality (3.5) reads as follows:
\[ \log|\xi|_v \le (n+1)\,d\, h_v(V) + (n+1)\,h_v \deg V. \tag{3.6} \]
We remark that the choice of $\xi$ is independent of $v$, and so it can be done uniformly.

Remark 3.2 Let notation be as in the previous lemma. In case $n = 0$ we have the sharper estimates
• $\deg q \le \deg g$,
• $h_v(q) \le h_v(g) + h_v + 2 \log(r+1) \deg g - \log|\xi|_v$ for $v \in M_K^\infty$,
• $h_v(q) \le h_v(g) + h_v - \log|\xi|_v$ for $v \notin M_K^\infty$.
Here $\xi \in K^*$ denotes any nonzero coefficient of $f$. The local height estimates follow from Lemma 1.2(d) and the fact that $h_v - \log|\xi|_v \ge 0$.

3.2. An effective arithmetic Nullstellensatz

3.2.1. Estimates for the complete intersection case

The following result gives estimates for the degree and local height of the polynomials arising in the Nullstellensatz over a number field $K$ in the case when the input is a reduced weak regular sequence. It is a direct consequence of the division lemma above. These estimates depend mainly on the degree and height of the varieties successively cut out by the input polynomials. They are quite flexible, and they apply to other situations as well, as we see in Section 4.

We recall that $f_1, \dots, f_s \in K[x_1, \dots, x_n]$ is a weak regular sequence if $f_{i+1}$ is not a zero divisor modulo the ideal $(f_1, \dots, f_i)$ for $i = 0, \dots, s-1$. Furthermore, it is called reduced when all these ideals are radical.

LEMMA 3.3 Let $f_1, \dots, f_s \in K[x_1, \dots, x_n]$ be polynomials without common zeros in $\mathbb{A}^n$ which form a reduced weak regular sequence. Assume that $V_j := V(f_1, \dots, f_j)$ satisfies Assumption 1.5 for $j = 1, \dots, s-1$. Set $d := \max_i \deg f_i$ and $h_v := \max_i h_v(f_i)$ for $v \in M_K$. Assume also $n, d \ge 2$. Then there exist $p_1, \dots, p_s \in K[x_1, \dots, x_n]$ and $\xi \in K^*$ such that
• $1 = p_1 f_1 + \cdots + p_s f_s$,
• $\deg p_i \le 2\,n\,d\, \bigl(1 + \sum_{j=1}^{\min\{n,s\}-1} \deg V_j\bigr)$,
• $h_v(p_i) \le 2\,n\,d \sum_{j=1}^{s-1} h_v(V_j) + \bigl((n+1)\,h_v + 2\,n\,(2n+5) \log(n+1)\, d\bigr) \bigl(1 + \sum_{j=1}^{s-1} \deg V_j\bigr) - \log|\xi|_v$ for $v \in M_K^\infty$,
• $h_v(p_i) \le 2\,n\,d \sum_{j=1}^{s-1} h_v(V_j) + (n+1)\,h_v\, \bigl(1 + \sum_{j=1}^{s-1} \deg V_j\bigr) - \log|\xi|_v$ for $v \notin M_K^\infty$.
Proof Set $I_i := I(V_i) = (f_1, \dots, f_i)$ for $i = 1, \dots, s-1$. Also, set $f_0 := 0$, $V_0 := V(f_0) = \mathbb{A}^n$, and $I_0 := I(V_0) = (0)$. Finally, set $A_i := K[x_1, \dots, x_{n-i}]$ and $B_i := K[V_i] = K[x_1, \dots, x_n]/I_i$ for $0 \le i \le s-1$. The fact that $V_i$ satisfies Assumption 1.5 implies that the inclusion $A_i \hookrightarrow B_i$ is integral.

We note that the sets of free and dependent variables of $B_i$ have cardinality $n - i$ and $i$, respectively. Also, the set of dependent variables of $B_i$ is contained in that of $B_j$ for $i \le j$. For $f \in K[x_1, \dots, x_n]$ we denote by $\deg_{x(i)} f$ the degree of $f$ in the dependent variables $x_{n-i+1}, \dots, x_n$ of $B_i$ with respect to the integral inclusion $A_i \hookrightarrow B_i$. For $i \le j$, the previous observation implies that $\deg_{x(i)} f \le \deg_{x(j)} f$.

Applying the Division Lemma 3.1, we construct inductively polynomials $p_1, \dots, p_s$. First, we take $p_s$ such that
\[ p_s f_s \equiv 1 \pmod{I_{s-1}}. \]
For $0 \le i \le s-2$ we assume that $p_{i+2}, \dots, p_s$ are already constructed and we set $b_{i+1} := 1 - (p_{i+2} f_{i+2} + \cdots + p_s f_s)$. Then $f_{i+1}$ is a nonzero divisor and $f_{i+1} \mid b_{i+1}$ in $B_i$. We again apply the division lemma to obtain $p_{i+1}$ such that
\[ p_{i+1} f_{i+1} \equiv b_{i+1} \pmod{I_i}. \]
Continuing this procedure until $i = 0$, we get $1 = p_1 f_1 + \cdots + p_s f_s$ in $K[x_1, \dots, x_n]$.

Let us analyze degrees. First, we consider the case $s \le n$. Again we proceed by induction. The estimates from the division lemma for $A_{s-1} := K[x_1, \dots, x_{n-(s-1)}]$, $g := 1$, and $f := f_s$ give $\deg_{x(s-1)} p_s \le (s-1)\,d \le (n-1)\,d$ and $\deg p_s \le (2s-1)\,d \deg V_{s-1}$.

Now, let $1 \le i \le s-2$. Then $\deg_{x(i)} p_{i+1} \le i\,d$ and
\[ \deg p_{i+1} \le \deg b_{i+1} + \bigl(i\,d + \max\{(i+1)\,d, \deg_{x(i)} b_{i+1}\}\bigr) \deg V_i, \]
where
\[ \deg_{x(i)} b_{i+1} \le \max_{j \ge i+2} \{\deg_{x(i)} p_j + \deg f_j\} \le \max_{j \ge i+2} \deg_{x(j-1)} p_j + d \le s\,d. \]
Applying recursively the previous inequality, we obtain
\begin{align*}
\deg p_{i+1} &\le \max_{j \ge i+2} \deg p_j + d + (s+i)\,d \deg V_i \\
&\le (2s-1)\,d \deg V_{s-1} + \sum_{j=i}^{s-2} \bigl(d + (s+j)\,d \deg V_j\bigr) \\
&= (s-i-1)\,d + \sum_{j=i}^{s-1} (s+j)\,d \deg V_j.
\end{align*}
For $i = 0$ we have $p_1 \mid b_1$ and therefore $\deg p_1 \le \deg b_1 \le \max_{j \ge 2} \deg p_j + d$. Then for all $i$,
\[ \deg p_i \le (s-1)\,d + \sum_{j=1}^{s-1} (s+j)\,d \deg V_j \le 2\,n\,d\, \Bigl(1 + \sum_{j=1}^{s-1} \deg V_j\Bigr). \]
Next, we consider the case $s = n+1$. In this case $V_s$ is a zero-dimensional variety, and so $\deg p_{n+1} = \deg_{x(n)} p_{n+1} \le n\,d$. Let $1 \le i \le n-1$. Then $\deg_{x(i)} p_{i+1} \le i\,d$. Again we apply recursively the previous inequality, and we get
\begin{align*}
\deg p_{i+1} &\le \max_{j \ge i+2} \deg p_j + d + (n+1+i)\,d \deg V_i \\
&\le n\,d + \sum_{j=i}^{n-1} \bigl(d + (n+1+j)\,d \deg V_j\bigr) \\
&= (2n-i)\,d + \sum_{j=i}^{n-1} (n+1+j)\,d \deg V_j.
\end{align*}
We have also $\deg p_1 \le \deg b_1 \le \max_{j \ge 2} \deg p_j + d$. We conclude that for all $i$,
\[ \deg p_i \le 2\,n\,d + \sum_{j=1}^{n-1} (n+1+j)\,d \deg V_j \le 2\,n\,d\, \Bigl(1 + \sum_{j=1}^{n-1} \deg V_j\Bigr). \]

Finally, we estimate the local height of these polynomials. In the rest of the proof we make repeated use of the following degree bounds:
\[ \deg_{x(i-1)} p_i \le n\,d, \qquad \deg p_i \le 2\,n\,d\, \Bigl(1 + \sum_{j=i-1}^{\min\{n,s\}-1} \deg V_j\Bigr). \]
As usual, we concentrate on the case $v \in M_K^\infty$; the case $v \notin M_K^\infty$ can be treated analogously. We apply the division lemma to $A_{s-1} := K[x_1, \dots, x_{n-(s-1)}]$, $g := 1$, and $f := f_s$, and we obtain
\[ h_v(p_s) \le s\,d\, h_v(V_{s-1}) + \bigl(s\,h_v + (n - (s-1) + 6)\,(s + (s-1)) \log(n+1)\, d\bigr) \deg V_{s-1} - \log|\xi_{s-1}|_v \]
for some $\xi_{s-1} \in K^*$.

Let $1 \le i \le s-2$, and set $n' := \min\{n, s\}$. Then there exists $\xi_i \in K^*$ such that
\begin{align*}
h_v(p_{i+1}) &\le h_v(b_{i+1}) + (i\,d + \max\{d, \deg_{x(i)} b_{i+1}\})\, h_v(V_i) \\
&\quad + \bigl((i+1)\,h_v + (n-i+6) \log(n+1)\, (i\,d + \max\{(i+1)\,d, \deg_{x(i)} b_{i+1}\})\bigr) \deg V_i \\
&\quad + 2 \log(n-i+1) \deg b_{i+1} - \log|\xi_i|_v \\
&\le \max_{j \ge i+2} h_v(p_j) + h_v + \log(n+1)\, d + \log(s-i) \\
&\quad + (s+i)\,d\, h_v(V_i) + (i+1)\,h_v \deg V_i + (n-i+6)\,(s+i) \log(n+1)\, d \deg V_i \\
&\quad + 2 \log(n+1)\, \Bigl(2\,n\,d\, \Bigl(1 + \sum_{j=i+1}^{n'-1} \deg V_j\Bigr) + d\Bigr) - \log|\xi_i|_v.
\end{align*}
Applying the inductive hypothesis, we obtain
\begin{align*}
h_v(p_{i+1}) &\le s\,d\, h_v(V_{s-1}) + d \sum_{j=i}^{s-2} (s+j)\, h_v(V_j) + (s-i-1)\,h_v + h_v \sum_{j=i}^{s-1} (j+1) \deg V_j \\
&\quad + 4\,(s-i-1)\,(n+1) \log(n+1)\, d + \log(n+1)\, d \sum_{j=i}^{s-1} (n-j+6)\,(s+j) \deg V_j \\
&\quad + 4\,n \log(n+1)\, d \sum_{j=i+1}^{n'-1} (j-i) \deg V_j - \sum_{j=i}^{s-1} \log|\xi_j|_v.
\end{align*}
For $i = 0$ we apply Remark 3.2. There exists $\xi_0 \in K^*$ such that
\begin{align*}
h_v(p_1) &\le h_v(b_1) + h_v + 2 \log(n+1) \deg b_1 - \log|\xi_0|_v \\
&\le \max_{j \ge 2} h_v(p_j) + 2\,h_v + \log(n+1)\, d + \log s + 2 \log(n+1)\, \Bigl(2\,n\,d\, \Bigl(1 + \sum_{j=1}^{n'-1} \deg V_j\Bigr) + d\Bigr) - \log|\xi_0|_v.
\end{align*}
We set $\xi := \prod_{j=0}^{s-1} \xi_j$. As $2 \le s \le n+1$, we have
\begin{align*}
h_v(p_1) &\le 2\,n\,d \sum_{j=1}^{s-1} h_v(V_j) + (n+1)\,h_v\, \Bigl(1 + \sum_{j=1}^{s-1} \deg V_j\Bigr) + 4\,n\,(n+1) \log(n+1)\, d \\
&\quad + \log(n+1)\, d \sum_{j=1}^{s-1} (n-j+6)\,(n+1+j) \deg V_j + 4\,n \log(n+1)\, d \sum_{j=1}^{n'-1} j \deg V_j - \log|\xi|_v \\
&\le 2\,n\,d \sum_{j=1}^{s-1} h_v(V_j) + \bigl((n+1)\,h_v + 2\,n\,(2n+5) \log(n+1)\, d\bigr) \Bigl(1 + \sum_{j=1}^{s-1} \deg V_j\Bigr) - \log|\xi|_v.
\end{align*}
This last inequality follows from the fact that $4\,n\,j + (n-j+6)\,(j+s) \le 2\,n\,(2n+5)$ for $j \le n-1$, and $6\,(2n+1) \le 2\,n\,(2n+5)$ as $n \ge 2$.

To conclude the proof, observe that for $i = 1, \dots, s-1$, inequality (3.5) guarantees that the obtained estimate for $p_i$ differs from the one for $p_{i+1}$ by a positive term. Thus the same estimate holds for $h_v(p_i)$, $1 \le i \le s$.

The non-Archimedean case is treated in exactly the same way. The conclusion of the proof comes in this case from inequality (3.6).

By means of Bézout inequality, we can now estimate the degree and height of the varieties $V_j$. In this way we obtain an estimate which depends only on the degree and height of the input polynomials.
COROLLARY 3.4 Let notation and assumptions be as in Lemma 3.3, and assume $n, d \ge 2$. Then there exist $p_1, \dots, p_s \in K[x_1, \dots, x_n]$ and $\gamma \in K^*$ such that
• $1 = p_1 f_1 + \cdots + p_s f_s$,
• $\deg p_i \le 4\,n\,d^n$,
• $h_v(p_i) \le 4\,n\,(n+1)\, d^n\, h_v + 4\,n\,(4n+5) \log(n+1)\, d^{n+1} - \log|\gamma|_v$ for $v \in M_K^\infty$,
• $h_v(p_i) \le 4\,n\,(n+1)\, d^n\, h_v - \log|\gamma|_v$ for $v \notin M_K^\infty$.

Proof Let us first consider degrees. From the preceding result we obtain
\[ \deg(p_i) \le 2\,n\,d\, \Bigl(1 + \sum_{j=1}^{\min\{n,s\}-1} \deg V_j\Bigr) \le 2\,n\,d\, (1 + \cdots + d^{n-1}) \le 4\,n\,d^n. \]
Here we applied the inequality $1 + \cdots + d^{n-1} \le 2\,d^{n-1}$ to obtain the last estimate.

Next we consider the local height estimates. Let $v \in M_K^\infty$. We have
\[ h_v(p_i) \le 2\,n\,d \sum_{j=1}^{s-1} h_v(V_j) + \bigl((n+1)\,h_v + 2\,n\,(2n+5) \log(n+1)\, d\bigr) \Bigl(1 + \sum_{j=1}^{s-1} \deg V_j\Bigr) - \log|\xi|_v \]
for some $\xi \in K^*$. Applying Corollary 2.10,
\[ h_v(V_j) \le j\, d^{j-1}\, h_v + (n+j) \log(n+1)\, d^j - \log|\lambda_j|_v \]
for some $\lambda_j \in K^*$. Therefore
\begin{align*}
h_v(p_i) &\le 2\,n\,d \sum_{j=1}^{s-1} \bigl(j\, d^{j-1}\, h_v + (n+j) \log(n+1)\, d^j - \log|\lambda_j|_v\bigr) + \bigl((n+1)\,h_v + 2\,n\,(2n+5) \log(n+1)\, d\bigr) \sum_{j=0}^{n} d^j - \log|\xi|_v \\
&\le 4\,n^2 d^n\, h_v + 8\,n^2 \log(n+1)\, d^{n+1} + 2\,(n+1)\, d^n\, h_v + 4\,n\,(2n+5) \log(n+1)\, d^{n+1} - 2\,n\,d \sum_{j=1}^{s-1} \log|\lambda_j|_v - \log|\xi|_v \\
&\le 4\,n\,(n+1)\, d^n\, h_v + 4\,n\,(4n+5) \log(n+1)\, d^{n+1} - \log|\gamma|_v,
\end{align*}
where $\gamma \in K^*$ is defined as $\gamma := \xi \prod_{j=1}^{s-1} \lambda_j^{2nd}$.

The case $v \notin M_K^\infty$ follows analogously:
\[ h_v(p_i) \le 2\,n\,d \sum_{j=1}^{s-1} h_v(V_j) + (n+1)\,h_v\, \Bigl(1 + \sum_{j=1}^{s-1} \deg V_j\Bigr) - \log|\xi|_v. \]
We have $h_v(V_j) \le j\, d^{j-1}\, h_v - \log|\lambda_j|_v$, and therefore
\[ h_v(p_i) \le 2\,n\,d \sum_{j=1}^{s-1} \bigl(j\, d^{j-1}\, h_v - \log|\lambda_j|_v\bigr) + (n+1)\,h_v \sum_{j=0}^{n} d^j - \log|\xi|_v \le 4\,n\,(n+1)\, d^n\, h_v - \log|\gamma|_v. \]
3.2.2. Proof of Theorem 1

In order to prove Theorem 1, it remains only to put the general case into the hypothesis of Corollary 3.4. This is accomplished by replacing the input polynomials and variables by generic linear combinations. The coefficients of the linear combinations are chosen to be roots of 1. Remarkably, we do not need any control on the degree of the finite extension involved.

Let $L$ be a finite extension of $K$, and let $B := \{e_1, \dots, e_N\}$ be a basis of $L$ as a $K$-linear space. We recall that $B^* := \{e_1^*, \dots, e_N^*\}$ is the dual basis of $B$ if $\mathrm{Tr}^L_K(e_i\,e_j^*) = 1$ for $i = j$ and zero otherwise.

LEMMA 3.5
Let $\omega \in \overline{\mathbb{Q}}$ be a primitive $p$-root of 1 for some prime $p$. Then the basis $B^* := \{ (\omega^{-j} - \omega)/p : j = 0, \dots, p-2 \}$ of $\mathbb{Q}(\omega)$ is dual to $B := \{ \omega^{i} : i = 0, \dots, p-2 \}$.

Proof
A direct computation shows that for $i, j = 0, \dots, p-2$,
\[
\mathrm{Tr}\bigl( \omega^{i} (\omega^{-j} - \omega) \bigr) = \sum_{l=1}^{p-1} \omega^{l i} \bigl( \omega^{-l j} - \omega^{l} \bigr) =
\begin{cases}
p, & \text{for } i = j, \\
0, & \text{for } i \ne j.
\end{cases}
\]
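The computation in the proof of Lemma 3.5 can be checked numerically for small primes. The following is a minimal sketch, not part of the original argument: the function names `trace_pairing` and `is_dual` are hypothetical, and the trace is approximated in floating point as the sum over the conjugates $\omega^{l}$, $l = 1, \dots, p-1$.

```python
import cmath


def trace_pairing(i, j, p):
    """Approximate Tr(omega^i (omega^-j - omega)) as a sum over the conjugates omega^l."""
    total = 0
    for l in range(1, p):
        w = cmath.exp(2j * cmath.pi * l / p)  # the conjugate omega^l
        total += w ** i * (w ** (-j) - w)
    return total


def is_dual(p, tol=1e-9):
    """Check that the pairing equals p on the diagonal and 0 elsewhere (up to tol)."""
    for i in range(p - 1):
        for j in range(p - 1):
            expected = p if i == j else 0
            if abs(trace_pairing(i, j, p) - expected) > tol:
                return False
    return True
```

Dividing the pairing by $p$ gives the Kronecker delta, which is the duality asserted in the lemma; the check is only a floating-point sanity test for the chosen primes, not a proof.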
We use this result in the following way: Let $\omega$ be a primitive $p$-root of 1, and set $L := K(\omega)$. Let us assume that $\mathbb{Q}(\omega)$ and $K$ are linearly independent and that $p$ does not divide the discriminant of $K$. Both conditions are satisfied by all but a finite number of $p$. Then $[L : K] = p - 1$ and $\mathcal{O}_L = \mathcal{O}_K[\omega]$ (see [36, Chap. 3, Prop. 17]).

Now, let $\nu \in \mathcal{O}_L \setminus \{0\}$. By the preceding lemma
\[
\nu = \frac{1}{p}\,\mathrm{Tr}\bigl( \nu\,(1 - \omega) \bigr) + \cdots + \frac{1}{p}\,\mathrm{Tr}\bigl( \nu\,(\omega^{2-p} - \omega) \bigr)\,\omega^{p-2} \in \mathcal{O}_K[\omega] \setminus \{0\},
\]
and so there exists $0 \le j \le p-2$ such that $\mathrm{Tr}\bigl( \nu\,(\omega^{-j} - \omega) \bigr)/p \in \mathcal{O}_K \setminus \{0\}$.

THEOREM 3.6 (Effective arithmetic Nullstellensatz)
Let $K$ be a number field, and let $f_1, \dots, f_s \in \mathcal{O}_K[x_1, \dots, x_n]$ be polynomials without common zeros in $\mathbb{A}^n$. Set $d := \max_i \deg f_i$ and $h := h(f_1, \dots, f_s)$. Then there exist $a \in \mathcal{O}_K \setminus \{0\}$ and $g_1, \dots, g_s \in \mathcal{O}_K[x_1, \dots, x_n]$ such that
• $a = g_1 f_1 + \cdots + g_s f_s$,
• $\deg g_i \le 4\,n\,d^{n}$,
• $h(a, g_1, \dots, g_s) \le 4\,n\,(n+1)\,d^{n} \bigl( h + \log s + (n+7)\log(n+1)\,d \bigr)$.
Theorem 1 in the introduction corresponds to the case $K := \mathbb{Q}$. The extremal cases $n = 1$ and $d = 1$ are both simple. We treat them directly in the following lemmas.

LEMMA 3.7
Let $\ell_1, \dots, \ell_s \in \mathcal{O}_K[x_1, \dots, x_n]$ be polynomials of degree bounded by 1 without common zeros in $\mathbb{A}^n$. Set $h := h(\ell_1, \dots, \ell_s)$. Then there exist $a \in \mathcal{O}_K \setminus \{0\}$ and $a_1, \dots, a_s \in \mathcal{O}_K$ such that
• $a = a_1 \ell_1 + \cdots + a_s \ell_s$,
• $h(a, a_1, \dots, a_s) \le (n+1) \bigl( h + \log(n+1) \bigr)$.

Proof
The equation $a = a_1 \ell_1 + \cdots + a_s \ell_s$ is equivalent to an $\mathcal{O}_K$-linear system of $n+1$ equations in $s$ unknowns. The height estimate then follows from an application of the Cramer rule.

LEMMA 3.8
Let $f_1, \dots, f_s \in \mathcal{O}_K[x]$ be polynomials without common zeros in $\mathbb{A}^1$. Set $d := \max_i \deg f_i$ and $h := h(f_1, \dots, f_s)$. Then there exist $a \in \mathcal{O}_K \setminus \{0\}$ and $g_1, \dots, g_s \in \mathcal{O}_K[x]$ such that
• $a = g_1 f_1 + \cdots + g_s f_s$,
• $\deg g_i \le d - 1$,
• $h(a, g_1, \dots, g_s) \le 2\,d\,(h + d)$.
Proof
Let $f := \sum_i a_i f_i$ and $g := \sum_i b_i f_i \in K[x]$ be generic linear combinations of $f_1, \dots, f_s$. Then $f$ and $g$ are coprime polynomials, and so there exist $p, q \in K[x]$ with $\deg p < \deg g$ and $\deg q < \deg f$ such that
\[
1 = p\,f + q\,g.
\]
Expanding this identity, we see that there exist $p_1, \dots, p_s \in K[x]$ with $\deg p_i \le d - 1$ such that $1 = p_1 f_1 + \cdots + p_s f_s$. Thus the above Bézout identity translates to a consistent system of $K$-linear equations. The numbers of equations and of variables equal $2\,d$ and $s\,d$, respectively. This system can be solved by the Cramer rule. The integer $a$ is the determinant of a nonsingular $(2d \times 2d)$-submatrix of the matrix of the linear system.

Proof of Theorem 3.6
We assume $n, d \ge 2$. Let $G_p \subset \overline{\mathbb{Q}}$ denote the group of $p$-roots of 1 for a prime $p$. For $a_{ij} \in G_p$ and $i = 1, \dots, \min\{n+1, s\}$ we set
\[
q_i := a_{i1} f_1 + \cdots + a_{is} f_s .
\]
Also, for $b_{kl} \in G_p$ and $k = 1, \dots, n$ we set
\[
y_k := b_{k0} + b_{k1} x_1 + \cdots + b_{kn} x_n .
\]
We assume that for a specific choice of $p$, $a_{ij}$, and $b_{kl}$ there exists $t \le \min\{n+1, s\}$ such that $(q_1, \dots, q_i) \subset K[x_1, \dots, x_n]$ is a radical ideal of dimension $n - i$ for $i = 1, \dots, t-1$ and $1 \in (q_1, \dots, q_t)$. We also assume that $y_1, \dots, y_n$ is a linear change of variables and that $V_i := V(q_1, \dots, q_i) \subset \mathbb{A}^n$ satisfies Assumption 1.5 for $i = 1, \dots, t-1$ with respect to $y_1, \dots, y_{n-i}$.

This is guaranteed by the fact that these conditions are generically satisfied: there exists a hypersurface $H$ of the coefficient space such that $(a_{ij}, b_{kl}) \notin H$ implies that $q_1, \dots, q_s$ satisfy the stated conditions with respect to the variables $y_1, \dots, y_n$ (see, for instance, [16, Ths. 3.5 and 3.7.2], [19, Sec. 3.2], [50, Prop. 18 and proof of Th. 19]). As $\cup_p G_p$ is Zariski dense in $\mathbb{A}^1$, it follows that these coefficients can be chosen to lie in $G_p$ for some $p$. Moreover, $p$ can be chosen such that for $\omega$ a primitive $p$-root of 1 and $L := K(\omega)$, $\mathbb{Q}(\omega)$ and $K$ are linearly independent and $p$ does not divide the discriminant of $K$. We also refer the reader to Section 4.1, where we give a self-contained treatment of this topic.

Set $b := (b_{k0})_k \in G_p^{\,n}$ and $B := (b_{kl})_{k,l \ge 1} \in \mathrm{GL}_n(\overline{\mathbb{Q}})$, so that $x = B^{-1}(y - b)$. For $j = 1, \dots, t$, set
\[
F_j(y) := q_j\bigl( B^{-1}(y - b) \bigr) \in L[y_1, \dots, y_n].
\]
Then $F_1, \dots, F_t$ satisfy the hypothesis of Corollary 3.4. Let $\gamma \in L^*$ and $P_1, \dots, P_t \in L[y_1, \dots, y_n]$ be the nonzero element and the polynomials satisfying the Bézout identity we obtain there.
Now, for $i = 1, \dots, s$, set
\[
p_i := \sum_{j=1}^{t} a_{ji}\,P_j(B\,x + b) \in L[x_1, \dots, x_n]
\]
(the coefficient of $f_i$ in $q_j$ is $a_{ji}$),
so that $1 = p_1 f_1 + \cdots + p_s f_s$ holds.

Finally, set $\mu := (\det B)^{4\,n\,(n+1)\,d^{n+1}}\,\gamma \in L^*$. By Lemma 3.5 there exists $0 \le \ell \le p-2$ such that $\mathrm{Tr}\bigl( \mu\,(\omega^{-\ell} - \omega) \bigr) \ne 0$. We define
\[
a := \mathrm{Tr}\bigl( \mu\,(\omega^{-\ell} - \omega) \bigr)/p \in K^*, \qquad
g_i := \mathrm{Tr}\bigl( \mu\,p_i\,(\omega^{-\ell} - \omega) \bigr)/p \in K[x_1, \dots, x_n]
\]
for $i = 1, \dots, s$. Then $a = g_1 f_1 + \cdots + g_s f_s$, as $f_1, \dots, f_s \in K[x_1, \dots, x_n]$ and $\mathrm{Tr}$ is a $K$-linear map. Besides the degree and height bounds, we show that $a \in \mathcal{O}_K$ and $g_i \in \mathcal{O}_K[x_1, \dots, x_n]$, since $f_1, \dots, f_s \in \mathcal{O}_K[x_1, \dots, x_n]$.

Let us first analyze degrees and local heights. As $\deg F_j \le d$,
\[
\deg g_i \le \deg p_i \le \max_j \deg P_j \le 4\,n\,d^{n}.
\]
Now, let $v \in M_K^\infty$, and let $w \in M_L$ such that $w \mid v$. We have $h_w\bigl( B^{-1}(y - b) \bigr) \le n \log n - \log|\det B|_w$, and so
\begin{align*}
h_w(F_j) &\le h_w(q_j) + \bigl( n \log n - \log|\det B|_w + 2 \log(n+1) \bigr)\,d \\
&\le h_v + \log s + (n+2)\log(n+1)\,d - \log|\det B|_w\,d
\end{align*}
by Lemma 1.2(c). From Corollary 3.4,
\begin{align*}
h_w(P_j) &\le 4\,n\,(n+1)\,d^{n} \max_k h_w(F_k) + 4\,n\,(4n+5)\log(n+1)\,d^{n+1} - \log|\gamma|_w \\
&\le 4\,n\,(n+1)\,d^{n} \bigl( h_v + \log s + (n+2)\log(n+1)\,d - \log|\det B|_w\,d \bigr) \\
&\quad + 4\,n\,(4n+5)\log(n+1)\,d^{n+1} - \log|\gamma|_w \\
&= 4\,n\,(n+1)\,d^{n} (h_v + \log s) + 4\,n\,(n^2 + 7n + 7)\log(n+1)\,d^{n+1} - \log|\mu|_w .
\end{align*}
Therefore
\begin{align*}
h_w(\mu\,p_i) &\le \max_j h_w(P_j) + 2 \log(n+1) \max_j \deg P_j + \log t + \log|\mu|_w \\
&\le 4\,n\,(n+1)\,d^{n} (h_v + \log s) + 4\,n\,(n^2 + 7n + 7)\log(n+1)\,d^{n+1} + 8\,n \log(n+1)\,d^{n} + \log(n+1) \\
&\le 4\,n\,(n+1)\,d^{n} \bigl( h_v + \log s + (n+7)\log(n+1)\,d \bigr) - \log 2 \tag{3.7}
\end{align*}
again by Lemma 1.2(c) and the fact that $d, n \ge 2$. We have
\[
g_i = \frac{1}{p}\,\mathrm{Tr}\bigl( \mu\,p_i\,(\omega^{-\ell} - \omega) \bigr) = \frac{1}{p} \sum_{\sigma \in \mathrm{Gal}(L/K)} \sigma\bigl( \mu\,p_i\,(\omega^{-\ell} - \omega) \bigr),
\]
and so
\[
h_v(g_i) \le \max_{w \mid v} h_w(\mu\,p_i) + \log 2 \le 4\,n\,(n+1)\,d^{n} \bigl( h_v + \log s + (n+7)\log(n+1)\,d \bigr).
\]
We have $h_w(\mu) \le 4\,n\,(n+1)\,d^{n} (h_v + \log s) + 4\,n\,(n^2 + 7n + 7)\log(n+1)\,d^{n+1}$, and so the previous estimate also holds for $h_v(a)$.

Now let $v \notin M_K^\infty$ and $w \mid v$. Analogously, we have
\[
h_w(\mu),\ h_w(\mu\,p_i) \le 4\,n\,(n+1)\,d^{n} h_v = 0,
\]
as $f_1, \dots, f_s \in \mathcal{O}_K[x_1, \dots, x_n]$. Then $\mu \in \mathcal{O}_L \setminus \{0\}$ and $\mu\,p_i \in \mathcal{O}_L[x_1, \dots, x_n]$, which in turn implies that $a \in \mathcal{O}_K \setminus \{0\}$ and $g_i \in \mathcal{O}_K[x_1, \dots, x_n]$ as desired. The global height estimate then follows from the expression
\[
h(a, g_1, \dots, g_s) = \frac{1}{[K : \mathbb{Q}]} \sum_{v \in M_K^\infty} N_v \max\{ h_v(a), h_v(g_1), \dots, h_v(g_s) \}.
\]
Remark 3.9
The fact that the bound (3.7) is uniform on $w$ for $w \mid v$ is the key that allows us to get rid of the roots of 1. This is no longer the case in our treatment of the more refined arithmetic Nullstellensätze in Section 4.

The following example improves the lower bound for a general height estimate given in the introduction and thus shows that the term $d^{n} h$ is unavoidable.

Example 3.10
Set
\[
f_1 := x_1 - H, \quad f_2 := x_2 - x_1^{d}, \quad \dots, \quad f_n := x_n - x_{n-1}^{d}, \quad f_{n+1} := x_n^{d}
\]
for any positive integers $d, H$. These are polynomials without common zeros in $\mathbb{A}^n$ of degree and height bounded, respectively, by $d$ and $h := \log H$.

Let $a \in \mathbb{Z} \setminus \{0\}$ and $g_1, \dots, g_{n+1} \in \mathbb{Z}[x_1, \dots, x_n]$ such that $a = g_1 f_1 + \cdots + g_{n+1} f_{n+1}$. We evaluate this identity at $(H, H^{d}, \dots, H^{d^{n-1}})$, and we obtain
\[
a = g_{n+1}\bigl( H, H^{d}, \dots, H^{d^{n-1}} \bigr)\,H^{d^{n}},
\]
from which we deduce $h(a) \ge d^{n} h$.
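The evaluation step in Example 3.10 can be illustrated with exact integer arithmetic. The sketch below is only an illustration under the stated definitions; the function name `example_3_10_values` is hypothetical. It evaluates $f_1, \dots, f_{n+1}$ at the point $(H, H^{d}, \dots, H^{d^{n-1}})$ and shows that all values vanish except the last one, which equals $H^{d^{n}}$.

```python
def example_3_10_values(n, d, H):
    """Values of f_1, ..., f_{n+1} from Example 3.10 at (H, H^d, ..., H^(d^(n-1)))."""
    point = [H ** (d ** k) for k in range(n)]  # coordinates x_1, ..., x_n
    values = [point[0] - H]  # f_1 = x_1 - H
    # f_k = x_k - x_{k-1}^d for k = 2, ..., n
    values += [point[k] - point[k - 1] ** d for k in range(1, n)]
    values.append(point[-1] ** d)  # f_{n+1} = x_n^d
    return values
```

Since $f_1, \dots, f_n$ vanish at this point, any identity $a = g_1 f_1 + \cdots + g_{n+1} f_{n+1}$ over $\mathbb{Z}$ forces $H^{d^{n}}$ to divide $a$, which is the source of the lower bound $h(a) \ge d^{n} h$.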
4. Intrinsic type estimates

Theorem 1 is essentially optimal in the general case. There are, however, many particular instances in which these estimates can be improved. Consider the following example:
\[
f_1 := x_1 - 1, \quad f_2 := x_2 - x_1^{d}, \quad \dots, \quad f_n := x_n - x_{n-1}^{d}, \quad f_{n+1} := H - x_n^{d}
\]
for any positive integers $d$ and $H$. These are polynomials without common zeros in $\mathbb{A}^n$ of degree and height bounded, respectively, by $d$ and $h := \log H$. Theorem 1 says that there exist $a \in \mathbb{Z} \setminus \{0\}$ and $g_1, \dots, g_{n+1} \in \mathbb{Z}[x_1, \dots, x_n]$ such that
\[
a = g_1 f_1 + \cdots + g_{n+1} f_{n+1}
\]
with $\deg g_i \le 4\,n\,d^{n}$ and $h(a), h(g_i) \le 4\,n\,(n+1)\,d^{n} \bigl( h + (n+7)\log(n+1)\,d \bigr)$. However, the following Bézout identity holds:
\[
H - 1 = \frac{x_1^{d} - 1}{x_1 - 1} \cdots \frac{x_n^{d} - 1}{x_n - 1}\,f_1 + \cdots + \frac{x_n^{d} - 1}{x_n - 1}\,f_n + f_{n+1}.
\]
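The Bézout identity above can be tested at arbitrary integer points. The following sketch is an illustration only, with hypothetical function names; it assumes, as the displayed pattern suggests, that the coefficient of $f_i$ is the product of $(x_j^{d} - 1)/(x_j - 1)$ over $j = i, \dots, n$, and it uses the expansion $(x^{d} - 1)/(x - 1) = 1 + x + \cdots + x^{d-1}$ to stay in exact integer arithmetic.

```python
def geometric_sum(x, d):
    """(x^d - 1)/(x - 1) written as 1 + x + ... + x^(d-1), valid for every integer x."""
    return sum(x ** k for k in range(d))


def bezout_rhs(xs, d, H):
    """Right-hand side of the Bezout identity evaluated at the integer point xs."""
    n = len(xs)
    # f_1 = x_1 - 1, f_i = x_i - x_{i-1}^d, f_{n+1} = H - x_n^d
    f = [xs[0] - 1]
    f += [xs[i] - xs[i - 1] ** d for i in range(1, n)]
    f.append(H - xs[-1] ** d)
    total = f[n]  # the coefficient of f_{n+1} is 1
    for i in range(n):
        coeff = 1
        for j in range(i, n):  # assumed coefficient: product over j >= i
            coeff *= geometric_sum(xs[j], d)
        total += coeff * f[i]
    return total
```

For every choice of integers the result should be $H - 1$; for instance, `bezout_rhs([3, -2, 7], 4, 11)` telescopes to `10`.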
Note that the polynomials arising in this identity have degree and height bounded, respectively, by $n\,(d-1)$ and $h$. There is in this case an exponential gap between the a priori general estimates and the actual ones.

The explanation is simple: for $i = 1, \dots, n$, the varieties
\[
V_i := V(f_1, \dots, f_i) = V(x_1 - 1, x_2 - 1, \dots, x_i - 1) \subset \mathbb{A}^n
\]
verify $\deg(V_i) = 1$ and $h(V_i) \le 2\,n \log(n+1)$. Namely, both the degree and the height of the varieties successively cut out by the input polynomials are much smaller than the corresponding Bézout estimate. As the varieties $V_i$ verify the assumptions of Lemma 3.3, a direct application together with Lemma 1.3 produces the more realistic estimates
\[
\deg g_i \le 2\,n^2 d, \qquad h(a), h(g_i) \le (n+1)^2 \bigl( h + 8\,n \log(n+1)\,d \bigr).
\]
Based on this idea, we devote this section to the study of more refined arithmetic Nullstellensätze which can deal with such situations.

4.1. Equations in general position

This section deals with the preparation of the input data. To apply Lemma 3.3, we need to prepare the polynomials and the variables of the ambient space.

Let $f_1, \dots, f_s \in K[x_1, \dots, x_n]$ be polynomials without common zeros in $\mathbb{A}^n$. For $i = 1, \dots, s$ and $a_{ij} \in \mathbb{Z}$ we set
\[
q_i := a_{i1} f_1 + \cdots + a_{is} f_s .
\]
We estimate the height of rational integers $a_{ij}$ in order that there exist $t \le \min\{n+1, s\}$ such that $(q_1, \dots, q_i) \subset K[x_1, \dots, x_n]$ is a radical ideal of dimension $n - i$ for $i = 1, \dots, t-1$ and $1 \in (q_1, \dots, q_t)$.

Also, we set $y_k := b_{k0} + b_{k1} x_1 + \cdots + b_{kn} x_n$ for $k = 1, \dots, n$ and $b_{kl} \in \mathbb{Z}$. Again we want to estimate the height of rational integers $b_{kl}$ such that $V_i := V(q_1, \dots, q_i) \subset \mathbb{A}^n$ satisfies Assumption 1.5 with respect to this set of variables for $i = 1, \dots, t-1$. Namely, the projection
\[
\pi_i : V_i \to \mathbb{A}^{n-i}, \qquad x \mapsto (y_1, \dots, y_{n-i})
\]
must verify $\# \pi_i^{-1}(0) = \deg V_i$; that is, $\# \bigl( V_i \cap V(y_1, \dots, y_{n-i}) \bigr) = \deg V_i$ for $i = 1, \dots, t-1$. Lemma 2.14 would then imply that the variables $y_1, \dots, y_{n-i}$ are in Noether normal position with respect to $V_i$.

It is well known that these conditions are satisfied by a generic choice of $a_{ij}$ and $b_{kl}$ (see, for instance, [19, Sec. 3.2], [50, Prop. 18 and proof of Th. 19]). We have already applied such a preparation to obtain the classic style version of the effective arithmetic Nullstellensatz presented in Theorem 3.6. There we chose roots of 1 as coefficients of the linear combinations since their existence was sufficient in our proof. However, technical reasons (see Remark 3.9) prevent us from applying the same principle in Section 4, and we need to carry out a more careful analysis.

We note that all aspects of this preparation were previously covered in the research papers [2, Sec. 4], [19, Sec. 3.2], [32, Sec. 6], and [22, Sec. 5.2]. However, the bounds presented therein are either nonexplicit or not precise enough for our purposes. Here we choose to give a self-contained presentation, which yields another proof of the existence of such linear combinations. The obtained estimates substantially improve the previously known ones.

4.1.1. An effective Bertini theorem

This section is devoted to the preparation of the polynomials. We first establish some auxiliary results.

The following is a version of the so-called shape lemma representation of a zero-dimensional radical ideal. The main difference here is that we choose a generic linear form, instead of a particular one, as a primitive element.

For a polynomial $f = c_D\,t^{D} + \cdots + c_0 \in k[t]$ we denote its discriminant by $\mathrm{discr}(f) \in k$. We recall that $\mathrm{discr}(f) \ne 0$ if and only if $c_D \ne 0$ and $f$ is squarefree, that is, $f$ has exactly $D$ distinct roots.

LEMMA 4.1 (Shape lemma)
Let $V \subset \mathbb{A}^n$ be a zero-dimensional variety defined over $k$. Let $U := (U_0, \dots, U_n)$ be
a group of $n+1$ variables, and set $L := U_0 + U_1 x_1 + \cdots + U_n x_n$ for the associated generic linear form. Let $P \in k[U][T]$ be a characteristic polynomial of $V$. Set $P' := \partial P/\partial T \in k[U][T]$ and $\rho := \mathrm{discr}_T P \in k[U] \setminus \{0\}$. Also, set $I$ for the extension of $I(V)$ to $k[U][x]$. Then there exist $v_1, \dots, v_n \in k[U][T]$ with $\deg v_i \le \deg V - 1$ such that
\[
I_\rho = \bigl( P(L),\ P'(L)\,x_1 - v_1(L),\ \dots,\ P'(L)\,x_n - v_n(L) \bigr)_\rho \subset k[U]_\rho[x].
\]
Here $I_\rho$ denotes the localization of $I$ at $\rho$.

Proof
We note first that $I(V)$ is a radical ideal, and so $I = k[U] \otimes_k I(V)$ is also radical. We readily obtain from the definition of $P$ that $I \cap k[U][L] = (P(L))$, and so $P(L) \in I$.

We can write $P(L) = \sum_\alpha a_\alpha(x)\,U^{\alpha}$ with $a_\alpha(x) \in I(V)$. Therefore $\partial P(L)/\partial U_i$ also lies in $I$ for all $i$. A direct computation shows that for $i = 1, \dots, n$,
\[
\partial P(L)/\partial U_i = P'(L)\,x_i - v_i(L)
\]
for some $v_i \in k[U][T]$ with $\deg v_i \le \deg P - 1 = \deg V - 1$. Set
\[
J := \bigl( P(L),\ P'(L)\,x_1 - v_1(L),\ \dots,\ P'(L)\,x_n - v_n(L) \bigr) \subset k[U][x].
\]
The previous argument shows the inclusion $I \supset J$. On the other hand, $\rho = A\,P + B\,P'$ for some $A, B \in k[U][T]$. Set $w_i := B\,v_i$. Then $x_i \equiv w_i(L)/\rho \pmod{J_\rho}$, and so for every $f \in k[U][x]$ we have that $f \equiv f\bigl( U, w_1(L)/\rho, \dots, w_n(L)/\rho \bigr)$ modulo $J_\rho$, and hence modulo $I_\rho$. For $f \in I$,
\[
\rho^{\deg f}\,f\bigl( U, w_1(L)/\rho, \dots, w_n(L)/\rho \bigr) \in I \cap k[U][L] = (P(L)),
\]
which implies $I_\rho \subset J_\rho$ as desired.

Let $\nu \in k^{n+1}$ such that $\rho(\nu) \ne 0$. It follows that $I(V)$ can be represented as
\[
I(V) = \bigl( P(L),\ P'(L)\,x_1 - v_1(L),\ \dots,\ P'(L)\,x_n - v_n(L) \bigr)(\nu) \subset k[x].
\]
Now, let $f_1, \dots, f_s \in k[x_1, \dots, x_n]$ be polynomials without common zeros in $\mathbb{A}^n$. For $i = 1, \dots, s$ we let $Z_i := (Z_{i1}, \dots, Z_{is})$ denote a group of $s$ variables, and we set
\[
Q_i := Z_{i1} f_1 + \cdots + Z_{is} f_s \in k[Z][x]
\]
for the associated generic linear combination of $f_1, \dots, f_s$.
LEMMA 4.2
For $\ell = 1, \dots, s$, the ideal $(Q_1, \dots, Q_\ell)$ is a complete intersection prime ideal of $k[Z][x]$.

Proof
Set $I := (Q_1, \dots, Q_\ell)$ and $V := V(I) \subset \mathbb{A}^{s\ell} \times \mathbb{A}^n$. First we observe that $V$ is a linear bundle over $\mathbb{A}^n$: the projection
\[
\pi : V \to \mathbb{A}^n, \qquad (z, x) \mapsto x
\]
is surjective, and the fibers are affine spaces of dimension $(s-1)\,\ell$. This follows from the assumption that the $f_j$ have no common zeros. This implies that $\dim V = (s-1)\,\ell + n$ because of the theorem of dimension of fibers. Namely, $Q_1, \dots, Q_\ell$ is a complete intersection, and in particular the ideal $I$ is unmixed.

Set $I = I_1 \cap \cdots \cap I_m$ for the primary decomposition of this ideal. We show that $I_j$ is prime for all $j$ and then that $m = 1$. First we have that
\[
I_{f_j} = (Q_1/f_j, \dots, Q_\ell/f_j) = (Z_{1j} + H_{1j}, \dots, Z_{\ell j} + H_{\ell j}),
\]
where $H_{ij} \in k[Z_i][x]_{f_j}$ does not depend on $Z_{ij}$. Therefore
\[
\bigl( k[\mathbb{A}^{s\ell} \times \mathbb{A}^n]/I \bigr)_{f_j} \cong k[\mathbb{A}^{(s-1)\ell} \times \mathbb{A}^n]_{f_j}
\]
is a domain; that is, $I_{f_j}$ is prime. We have $I_{f_j} = (I_1)_{f_j} \cap \cdots \cap (I_m)_{f_j}$, and so there exists $1 \le n(j) \le m$ such that
\[
I_{f_j} = (I_{n(j)})_{f_j}, \qquad V(I_i) \subset \{ f_j = 0 \} \quad \text{for } i \ne n(j).
\]
In particular, $I_{n(j)} = I_{f_j} \cap k[\mathbb{A}^{s\ell} \times \mathbb{A}^n]$ is prime. The fact that $\cap_j \{ f_j = 0 \} = \emptyset$ ensures that $n(j)$ runs over all $1 \le i \le m$, and so $I$ is radical.

The expression $I_{f_j} = (Z_{1j} + H_{1j}, \dots, Z_{\ell j} + H_{\ell j})$ implies that $\pi(V(I_{f_j})) \subset \mathbb{A}^n$ contains the dense open set $\{ f_j \ne 0 \}$. In particular, $V(I_{f_j})$ is not contained in any of the hypersurfaces $\{ f_i = 0 \}$, and so $n(j) = n(1)$ for all $j$. This implies that $m = 1$, and so $I = I_1$ is prime.

The following proposition shows that $(Q_1(a_1), \dots, Q_\ell(a_\ell))$ is a radical ideal for a generic choice of $a_i := (a_{i1}, \dots, a_{is})$. Unlike Lemmas 4.1 and 4.2, this result does not hold for arbitrary characteristic. For instance, let $x^{p}, 1 - x^{p} \in \mathbb{F}_p[x]$ for some prime $p$. Then $Q_1(a_1) = b + c\,x^{p}$ for some $b, c \in \mathbb{F}_p$, and so $Q_1(a_1) = (b^{1/p} + c^{1/p}\,x)^{p}$ is not squarefree.
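The characteristic $p$ counterexample can be verified by a direct computation in $\mathbb{F}_p[x]$. The sketch below is an illustration only, with hypothetical helper names; it works over the prime field, where $b^{1/p} = b$ and $c^{1/p} = c$ by Fermat's little theorem, and represents a polynomial by its list of coefficients, lowest degree first.

```python
def poly_mul_mod(a, b, p):
    """Product of two coefficient lists (lowest degree first) with coefficients mod p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out


def poly_pow_mod(a, e, p):
    """e-th power of a polynomial with coefficients mod p, by repeated multiplication."""
    result = [1]
    for _ in range(e):
        result = poly_mul_mod(result, a, p)
    return result


def frobenius_power(b, c, p):
    """Coefficients of (b + c x)^p over F_p; the expected result is b + c x^p."""
    return poly_pow_mod([b % p, c % p], p, p)
```

All middle binomial coefficients vanish modulo $p$, so $(b + c\,x)^{p}$ collapses to $b + c\,x^{p}$: every combination $b + c\,x^{p}$ of $x^{p}$ and $1 - x^{p}$ is a $p$-th power and hence not squarefree.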
PROPOSITION 4.3
Let $\mathrm{char}(k) = 0$, and set $I := (Q_1, \dots, Q_\ell) \subset k[Z][x]$.
• In case $I \cap k[Z] \ne \{0\}$ there exists $F \in k[Z] \setminus \{0\}$ with $\deg F \le (d+1)^{\ell}$ such that $F(a_1, \dots, a_\ell) \ne 0$ for $a_1, \dots, a_\ell \in k^{s}$ implies that $1 \in \bigl( Q_1(a_1), \dots, Q_\ell(a_\ell) \bigr)$.
• In case $I \cap k[Z] = \{0\}$ there exists $F \in k[Z] \setminus \{0\}$ with $\deg F \le 2\,(d+1)^{2\ell}$ such that $F(a_1, \dots, a_\ell) \ne 0$ for $a_1, \dots, a_\ell \in k^{s}$ implies that $\bigl( Q_1(a_1), \dots, Q_\ell(a_\ell) \bigr) \subset k[x]$ is a radical ideal of dimension $n - \ell$.
Proof
Set $V := V(I) \subset \mathbb{A}^{s\ell} \times \mathbb{A}^n$. We have $\dim V = (s-1)\,\ell + n$ and $\deg V \le (d+1)^{\ell}$.

First, we consider the case $I \cap k[Z] \ne \{0\}$. This occurs, for instance, when $\ell \ge n+1$ since then $\dim I = s\,\ell + n - \ell < \dim k[Z] = s\,\ell$. Let $\pi : \mathbb{A}^{s\ell} \times \mathbb{A}^n \to \mathbb{A}^{s\ell}$ be the canonical projection. Then $\pi(V)$ is a proper subvariety of $\mathbb{A}^{s\ell}$, and thus it is contained in a hypersurface of degree bounded by $\deg V$. This can be seen by taking a generic projection of this variety into an affine space of dimension $s\,\ell + n - \ell + 1$ (see [23, Rem. 4]). Let $F \in k[Z]$ be a defining equation of this hypersurface. Then $F \in I$ as $I$ is prime, and we have $\deg F \le (d+1)^{\ell}$. Thus $1 \in I_F \subset k[Z]_F[x]$, and therefore
\[
1 \in I(a) := \bigl( Q_1(a_1), \dots, Q_\ell(a_\ell) \bigr)
\]
for $a \in k^{s\ell}$ such that $F(a) \ne 0$.

Next, we consider the case $I \cap k[Z] = \{0\}$. We adopt the following convention: For an ideal $J \subset k[x]$ and for $\zeta$ any new group of variables, we denote by $J^{[\zeta]}$ and $J^{(\zeta)}$ the extension of $J$ to the polynomial rings $k[\zeta][x]$ and $k(\zeta)[x]$, respectively.

We assume for the moment $\ell = n$. Then $\dim I = s\,\ell$, and so the extended ideal $I^{(Z)} \subset k(Z)[x]$ is a zero-dimensional prime ideal. We have then that $\overline{k(Z)} \otimes_k I^{(Z)} \subset \overline{k(Z)}[x]$ is a zero-dimensional radical ideal, as $\mathrm{char}(k) = 0$ (see [44, Th. 26.3]).

Our approach to this case is based on Shape Lemma 4.1. We determine a polynomial $F \in k[Z]$ such that $F(a) \ne 0$ implies that the shape lemma representation of $I^{(Z)}$ can be transferred to a shape lemma representation of $I(a)$.

Let $U$ be a group of $n+1$ variables, and set $L := U_0 + U_1 x_1 + \cdots + U_n x_n$ for the associated generic linear form. Consider the morphism
\[
\Psi : \mathbb{A}^{n+1} \times \mathbb{A}^{s\ell} \times \mathbb{A}^n \to \mathbb{A}^{n+1} \times \mathbb{A}^{s\ell} \times \mathbb{A}^1, \qquad (u, z, x) \mapsto \bigl( u, z, L(x) \bigr),
\]
and let $W$ be the variety defined by $I$ in $\mathbb{A}^{n+1} \times \mathbb{A}^{s\ell} \times \mathbb{A}^n$, that is, $W = \mathbb{A}^{n+1} \times V$. The Zariski closure $\overline{\Psi(W)}$ is then an irreducible hypersurface. We set $P \in k[U, Z][T]$ for one of its defining equations.
If $I^{[U](Z)}$ is the extension of $I^{(Z)}$ to $k[U](Z)[x]$, the polynomial $P$ can be equivalently defined through the condition that $P(L)$ is a generator of the principal ideal $I^{[U](Z)} \cap k[U, Z][L]$. Namely, $P$ is a characteristic polynomial of the zero-dimensional variety $W_0$ defined by $I^{(Z)}$ in $\mathbb{A}^n(\overline{k(Z)})$.

Let $v_1, \dots, v_n \in k[U](Z)[T]$ denote the polynomials arising in the shape lemma applied to $W_0$. From the proof of this lemma we have that
\[
\partial P(L)/\partial U_i = P'(L)\,x_i - v_i(L) \in k[U, Z][L],
\]
and so $v_i \in k[U, Z][T]$. Set
\[
J := \bigl( P(L),\ P'(L)\,x_1 - v_1(L),\ \dots,\ P'(L)\,x_n - v_n(L) \bigr) \subset k[U, Z][x]
\]
and $\rho := \mathrm{discr}_T P \in k[U, Z] \setminus \{0\}$. Then
\[
\bigl( I^{[U](Z)} \bigr)_\rho = \bigl( J^{[U](Z)} \bigr)_\rho \subset k[U](Z)_\rho[x].
\]
We have that both $I_\rho^{[U,Z]}$ and $J_\rho^{[U,Z]}$ are prime ideals of $k[U, Z]_\rho[x]$ with trivial intersection with the ring $k[U, Z]$. Thus they coincide, respectively, with the contraction of $I_\rho^{[U](Z)}$ and $J_\rho^{[U](Z)}$ to $k[U, Z]_\rho[x]$, and so
\[
I_\rho^{[U,Z]} = J_\rho^{[U,Z]} \subset k[U, Z]_\rho[x].
\]
Define $F \in k[Z] \setminus \{0\}$ as any of the nonzero coefficients of the monomial expansion of $\rho$ with respect to $U$. Let $a \in k^{s\ell}$ such that $F(a) \ne 0$. Then $\rho(U, a) \ne 0$ and so $P(U, a)[T]$ is squarefree in $k(U)[T]$. Then
\[
\bigl( I(a)^{[U]} \bigr)_{\rho(U,a)} = \bigl( P(L),\ P'(L)\,x_1 - v_1(L),\ \dots,\ P'(L)\,x_n - v_n(L) \bigr)(a)_{\rho(U,a)} \subset k[U]_{\rho(U,a)}[x]
\]
is radical, which implies in turn that $I(a) = \bigl( I(a)^{[U]} \bigr)_{\rho(U,a)} \cap k[x]$ is a zero-dimensional radical ideal of $k[x]$, as desired.

It remains to estimate the degree of $F$. To this end, it suffices to bound the degree of $\rho$ with respect to the group of variables $Z$. We recall that $P$ was defined as a defining equation of the hypersurface $\overline{\Psi(W)}$. The map $\Psi$ is linear in the variables $Z$ and $x$, and so $\deg_Z P \le \deg W = \deg V \le (d+1)^{n}$. This implies that
\[
\deg F \le \deg_Z \rho \le \deg_Z P\,(2 \deg_Z P - 1) \le 2\,(d+1)^{2n}.
\]
Finally, we consider the case $\ell < n$ for $I \cap k[Z] = \{0\}$. Let $U_1, \dots, U_{n-\ell}$ be groups of $n+1$ variables each, and set
\[
L_i := U_{i0} + U_{i1} x_1 + \cdots + U_{in} x_n
\]
for $i = 1, \dots, n - \ell$. Set $U := (U_1, \dots, U_{n-\ell})$, $L := (L_1, \dots, L_{n-\ell})$, and $k_0 := k(U, L)$. The extended prime ideal $I_0 \subset k_0[Z][x_1, \dots, x_\ell]$ verifies $I_0 \cap k_0[Z] = \{0\}$ and thus falls into the previously considered case.
Thus there exists $F_0 \in k_0[Z] \setminus \{0\}$ with $\deg F_0 \le 2\,(d+1)^{2\ell}$ such that $F_0(a) \ne 0$ for $a \in k^{s\ell}$ implies that $I_0(a)$ is a radical ideal of $k_0[x_1, \dots, x_\ell]$. This implies in turn that $I(a)$ is a radical ideal of dimension $n - \ell$ of $k[x]$, as $I(a) = I_0(a) \cap k[x]$.

We can assume without loss of generality that $F_0$ lies in $k[U, L][Z]$. We conclude by taking $F$ as any nonzero coefficient of the monomial expansion of $F_0$ with respect to the variables $U$ and $L$.

COROLLARY 4.4
Let $\mathrm{char}(k) = 0$, and let $f_1, \dots, f_s \in k[x_1, \dots, x_n]$ be polynomials without common zeros in $\mathbb{A}^n$. Set $d := \max_i \deg f_i$. Then there exist $t \le \min\{n+1, s\}$ and $a_1, \dots, a_t \in \mathbb{Z}^{s}$ such that
• $\bigl( Q_1(a_1), \dots, Q_i(a_i) \bigr)$ is a radical ideal of dimension $n - i$ for $1 \le i \le t-1$,
• $1 \in \bigl( Q_1(a_1), \dots, Q_t(a_t) \bigr)$,
• $h(a_i) \le 2\,(n+1)\log(d+1)$.
Proof
Set $t$ for the minimal $i$ such that $I_i := (Q_1, \dots, Q_i) \cap k[Z] \ne \{0\}$. Then $t \le n+1$, and by the previous result there exists $F_t \in k[Z]$ with $\deg F_t \le (d+1)^{t}$ such that $F_t(a) \ne 0$ implies that $1 \in (Q_1(a_1), \dots, Q_t(a_t))$. On the other hand, for $i < t$ we take a polynomial $F_i \in k[Z]$ of degree bounded by $2\,(d+1)^{2i}$ such that $F_i(a) \ne 0$ implies that $\bigl( Q_1(a_1), \dots, Q_i(a_i) \bigr)$ is a radical ideal of dimension $n - i$. Then we take $F := F_1 \cdots F_t$, and so
\begin{align*}
\deg F &\le 2\,(d+1)^{2} + \cdots + 2\,(d+1)^{2(t-1)} + (d+1)^{t} \\
&\le (d+1)^{2n} + 2\,(d+1)^{2n} + (d+1)^{n+1} \le 4\,(d+1)^{2n}.
\end{align*}
Finally, $F \ne 0$ implies there exist $a_1, \dots, a_t \in \mathbb{Z}^{s}$ such that $h(a_i) \le \log(\deg F)$ and $F(a) \ne 0$.

4.1.2. Effective Noether normal position

Now we devote ourselves to the preparation of the variables. For $k = 0, \dots, n$ we let $U_k := (U_{k0}, \dots, U_{kn})$ be a group of $n+1$ variables and we set
\[
Y_k := U_{k0} + U_{k1} x_1 + \cdots + U_{kn} x_n .
\]

PROPOSITION 4.5
Let $V \subset \mathbb{A}^n$ be an equidimensional variety of dimension $r$ defined over $k$. Then there exists $G \in k[U_1, \dots, U_r] \setminus \{0\}$ with $\deg_{U_k} G \le 2\,(\deg V)^{2}$ such that $G(b_1, \dots, b_r) \ne 0$ for $b_1, \dots, b_r \in k^{n+1}$ implies that
\[
\# \bigl( V \cap V(Y_1(b_1), \dots, Y_r(b_r)) \bigr) = \deg V.
\]

Proof
Let $F_V$ be a Chow form of $V$, and let $P_V \in k[U, T]$ be the characteristic polynomial of $V$ associated to $F_V$ given by Lemma 2.13. Set $D := \deg V$, and let $P_V = c_D\,T_0^{D} + \cdots + c_0$ be its expansion with respect to $T_0$. Also, set
\[
\rho := \mathrm{discr}_{T_0} P_V \in k[U_0, \dots, U_r][T_1, \dots, T_r] \setminus \{0\}
\]
for the discriminant of $P_V$ with respect to $T_0$. Observe that as $P_V$ is multihomogeneous of degree $D$ in each group of variables $U_i \cup \{T_i\}$, the degree of $\rho$ in each of these groups of variables is bounded by $D\,(2D - 1)$.

Now, let $\nu_1, \dots, \nu_r \in k^{n+1}$ such that $V(\nu) := V \cap V(Y_1(\nu_1), \dots, Y_r(\nu_r))$ is a zero-dimensional variety of cardinality $D$, and let $F_{V(\nu)}$ be a Chow form of $V(\nu)$. Set $\zeta_0 := (T_0 - U_{00}, U_{01}, \dots, U_{0n})$. Then applying [47, Prop. 2.4], there exists $\lambda \in k^*$ such that
\begin{align*}
P_V(U_0, \nu_1, \dots, \nu_r)(T_0, 0, \dots, 0) &= F_V\bigl( \zeta_0(U_0, T_0), \nu_1, \dots, \nu_r \bigr) \\
&= \lambda\,F_{V(\nu)}\bigl( \zeta_0(U_0, T_0) \bigr) \\
&= \lambda\,P_{V(\nu)}(U_0)(T_0),
\end{align*}
where $P_{V(\nu)}$ is a characteristic polynomial of $V(\nu)$. This implies that $P_V(U)(T_0, 0, \dots, 0) \in k[U][T_0]$ is a squarefree polynomial and so $\rho(U)(0) \ne 0$.

We take $G \in k[U_1, \dots, U_r]$ as any nonzero coefficient of the expansion of $\rho(U)(0)$ with respect to $U_0$. Therefore
\[
\deg G \le \deg_{U_i} \rho(U)(0) \le D\,(2D - 1).
\]
The condition $G(b) \ne 0$ implies that $\rho(U_0, b_1, \dots, b_r)(0) \ne 0$, and so $\# V(b) = D$.

As we noted before, this implies that the variables $Y_1(b_1), \dots, Y_r(b_r)$ are in Noether normal position with respect to the variety $V$.
COROLLARY 4.6
Let $\mathrm{char}(k) = 0$, and let $q_1, \dots, q_t \in k[x_1, \dots, x_n]$ be polynomials without common zeros in $\mathbb{A}^n$ which form a reduced weak regular sequence. Set $d := \max_i \deg q_i$. Then there exist $b_1, \dots, b_n \in \mathbb{Z}^{n+1}$ such that $V(q_1, \dots, q_i)$ satisfies Assumption 1.5 with respect to the variables $Y_1(b_1), \dots, Y_{n-i}(b_{n-i})$ for $i = 1, \dots, t$, and
\[
h(b_k) \le 2\,(n+1)\log(d+1).
\]

Proof
This follows readily from the previous result. We take $G_i$ as the polynomial corresponding to the variety $V(q_1, \dots, q_i)$, and we set $G := G_1 \cdots G_{t-1} \in k[U_1, \dots, U_n]$. We have $\deg_{U_j} G_i \le 2\,d^{2i}$, and so
\[
\deg_{U_j} G \le 2\,d^{2} + \cdots + 2\,d^{2(t-1)} \le 4\,d^{2(t-1)} \le 4\,d^{2n}.
\]
We conclude by taking $b_1, \dots, b_n \in \mathbb{Z}^{n+1}$ such that $h(b_i) \le \log(\deg G)$ and $G(b) \ne 0$. (If $t < n+1$, we complete with vectors of the canonical basis of $k^{n+1}$.)

4.2. An intrinsic arithmetic Nullstellensatz

In this section we introduce the notions of degree and height of a polynomial system defined over a number field $K$. Modulo setting the input equations in general position, these parameters measure the degree and height of the varieties successively cut out. The resulting estimates for the arithmetic Nullstellensatz are linear in these parameters. As an important particular case, we derive a sparse arithmetic Nullstellensatz.

4.2.1. Intrinsic parameters

Let $f_1, \dots, f_s \in K[x_1, \dots, x_n]$ be polynomials of degree bounded by $d$ without common zeros in $\mathbb{A}^n$. For $i = 1, \dots, s$ we let $Z_i$ denote a group of $s$ variables and we set
\[
Q_i(Z) := Z_{i1} f_1 + \cdots + Z_{is} f_s \in K[Z][x]
\]
for the associated generic linear combination of $f_1, \dots, f_s$.

Let $\Gamma$ be the set of integer $(s \times s)$-matrices $a = (a_{ij})_{ij} \in \mathbb{Z}^{s \times s}$ of height bounded by $2\,(n+1)\log(d+1)$ such that
\[
I_i(a) := \bigl( Q_1(a_1), \dots, Q_i(a_i) \bigr) \subset K[x_1, \dots, x_n]
\]
is a radical ideal of dimension $n - i$ for $i = 1, \dots, t-1$ and $1 \in I_t(a)$ for some $t \le \min\{n+1, s\}$. Corollary 4.4 implies that $\Gamma \ne \emptyset$. For $a \in \Gamma$ we set
\[
\delta(a) := \max \{ \deg V\bigl( I_i(a) \bigr) \ ;\ 1 \le i \le \min\{t, n\} - 1 \},
\]
\[
\eta(a) := \max \{ h\bigl( V( I_i(a) ) \bigr) \ ;\ 1 \le i \le t - 1 \}.
\]
We set $\Gamma_{\min} \subset \mathbb{Z}^{s \times s}$ for the subset of matrices $a \in \Gamma$ such that $\eta(a) + d\,\delta(a)$ is minimum. Finally, let $a_{\min} \in \Gamma_{\min}$ be a matrix that attains the minimum of $\delta(a)$ for $a \in \Gamma_{\min}$.

Definition 4.7
Let notation be as in the previous paragraph. Then we define the degree and the height, respectively, of the polynomial system $f_1, \dots, f_s$ as
\[
\delta(f_1, \dots, f_s) := \delta(a_{\min}), \qquad \eta(f_1, \dots, f_s) := \eta(a_{\min}).
\]
We restrict ourselves to integer matrices of bounded height in order to keep control of the height of $Q_1(a_1), \dots, Q_t(a_t)$. The choice of $\eta(a) + d\,\delta(a)$ as the defining invariant comes from the need to estimate the degree and height simultaneously.

We note that in the case when $f_1, \dots, f_s$ is already a reduced weak regular sequence we have
\[
\eta(f_1, \dots, f_s) + d\,\delta(f_1, \dots, f_s) \le \eta(\mathrm{Id}) + d\,\delta(\mathrm{Id}).
\]
We can estimate these parameters through the following arithmetic Bézout inequality.

LEMMA 4.8
Let $f_1, \dots, f_s \in K[x_1, \dots, x_n]$ be polynomials without common zeros in $\mathbb{A}^n$. Set $d_i := \deg f_i$, and assume that $d_1 \ge \cdots \ge d_s$ holds. Set $d := d_1 = \max_i \deg f_i$ and $h := h(f_1, \dots, f_s)$. Also, set $n_0 := \min\{n, s\}$ and $n_1 := \min\{n+1, s\}$. Then
• $\delta(f_1, \dots, f_s) \le \prod_{j=1}^{n_0 - 1} d_j$,
• $\eta(f_1, \dots, f_s) \le n \bigl( h + \log s + 3\,n\,(n+1)\,d \bigr) \prod_{j=1}^{n_1 - 2} d_j$.
Proof
Let $a := a_{\min} = (a_{ij})_{ij} \in \mathbb{Z}^{s \times s}$ be a coefficient matrix such that $\delta(f_1, \dots, f_s) = \delta(a)$ and $\eta(f_1, \dots, f_s) = \eta(a)$, and set
\[
q_i := a_{i1} f_1 + \cdots + a_{is} f_s, \qquad 1 \le i \le s.
\]
Let $t \le n_1 = \min\{n+1, s\}$ be minimum such that $1 \in (q_1, \dots, q_t)$. Let $\tilde{a} \in \mathbb{Z}^{(t-1) \times s}$ be the matrix formed by the first $t-1$ rows of $a$, and let $c \in \mathbb{Z}^{(t-1) \times s}$ be a staircase matrix equivalent to $\tilde{a}$. The polynomial system
\[
\tilde{q}_i := c_{i1} f_1 + \cdots + c_{is} f_s
\]
is then equivalent to $q_1, \dots, q_{t-1}$; that is, $(\tilde{q}_1, \dots, \tilde{q}_i) = (q_1, \dots, q_i)$ for $i = 1, \dots, t-1$. Also, we have $\deg \tilde{q}_i \le d_i$, and so
\[
\delta := \max \{ \deg V_i \ ;\ 1 \le i \le \min\{n, t\} - 1 \} \le \prod_{j=1}^{n_0 - 1} d_j .
\]
We have also that each coefficient of $c$ is a subdeterminant of $\tilde{a}$. Thus
\begin{align*}
\tilde{h} := h(\tilde{q}_1, \dots, \tilde{q}_{t-1}) &\le h + \log s + h(c) \\
&\le h + \log s + (t-1) \bigl( 2\,(n+1)\log(d+1) + \log(t-1) \bigr) \\
&\le h + \log s + n\,(3n+1)\,d,
\end{align*}
and so, applying Corollary 2.11,
\begin{align*}
\eta &\le \max \{ h(V_i) \ ;\ 1 \le i \le \min\{n+1, t\} - 1 \} \\
&\le \Bigl( \sum_{l=1}^{n_1 - 1} \tilde{h}/d_l + (n + n_1 - 1)\log(n+1) \Bigr) \prod_{j=1}^{n_1 - 1} d_j \\
&\le \Bigl( n \bigl( h + \log s + n\,(3n+1)\,d \bigr) + 2\,n \log(n+1)\,d \Bigr) \prod_{j=1}^{n_1 - 2} d_j \\
&\le n \bigl( h + \log s + 3\,n\,(n+1)\,d \bigr) \prod_{j=1}^{n_1 - 2} d_j .
\end{align*}
In the third line, the factor $d$ multiplying $2\,n \log(n+1)$ bounds $d_{n_1 - 1}$, the factor dropped from the product.
We can also estimate these parameters through the following arithmetic Bernstein-Kushnirenko inequality.

LEMMA 4.9
Let $f_1, \dots, f_s \in K[x_1, \dots, x_n]$ be polynomials without common zeros in $\mathbb{A}^n$. Set $d := \max_i \deg f_i$ and $h := h(f_1, \dots, f_s)$. Also, let $\mathcal{V}$ denote the volume of $1, x_1, \dots, x_n, f_1, \dots, f_s$. Then
• $\delta(f_1, \dots, f_s) \le \mathcal{V}$,
• $\eta(f_1, \dots, f_s) \le n\,\mathcal{V}\,(h + \log s + 2^{2n+3}\,d)$.

Proof
Let $a := a_{\min} = (a_{ij})_{ij} \in \mathbb{Z}^{s \times s}$, and set $q_i := a_{i1} f_1 + \cdots + a_{is} f_s$ for $i = 1, \dots, s$. Then $\mathrm{Supp}(q_i) \subset \mathrm{Supp}(f_1, \dots, f_s)$, and so $\mathcal{V}(1, x_1, \dots, x_n, q_1, \dots, q_s) \le \mathcal{V}$. Applying Proposition 2.12, we obtain $\delta \le \mathcal{V}$ and
\begin{align*}
\eta &\le n \Bigl( \max_i h(q_i) + 2^{2n+2} \log(n+1)\,d \Bigr) \mathcal{V} \\
&\le n \Bigl( h + \log s + 2\,(n+1)\log(d+1) + 2^{2n+2} \log(n+1)\,d \Bigr) \mathcal{V} \\
&\le n\,\mathcal{V}\,(h + \log s + 2^{2n+3}\,d).
\end{align*}

4.2.2. Proof of Theorem 2

Modulo the preparation of the input data, the proof of Theorem 2 follows the lines of the example introduced at the beginning of Section 4. The following is the general version of Theorem 2 over number fields.

THEOREM 4.10 (Intrinsic arithmetic Nullstellensatz)
Let $K$ be a number field, and let $f_1, \dots, f_s \in \mathcal{O}_K[x_1, \dots, x_n]$ be polynomials without common zeros in $\mathbb{A}^n$. Set $d := \max_i \deg f_i$ and $h := h(f_1, \dots, f_s)$. Also, let $\delta$ and $\eta$ denote the degree and the height of the polynomial system $f_1, \dots, f_s$. Then there exist $a \in \mathbb{Z} \setminus \{0\}$ and $g_1, \dots, g_s \in \mathcal{O}_K[x_1, \dots, x_n]$ such that
• $a = g_1 f_1 + \cdots + g_s f_s$,
• $\deg g_i \le 2\,n^2 d\,\delta$,
• $h(a, g_1, \dots, g_s) \le (n+1)^2\,[K : \mathbb{Q}]\,d\,\bigl( 2\,\eta + (h + \log s)\,\delta + 21\,(n+1)^2\,d \log(d+1)\,\delta \bigr)$.
Proof
Let $a_{\min} = (a_{ij})_{ij} \in \mathbb{Z}^{s \times s}$ be a coefficient matrix such that $\delta = \delta(a_{\min})$, $\eta = \eta(a_{\min})$, and $h(a_{\min}) \le 2\,(n+1)\log(d+1)$. We set
\[
q_i := a_{i1} f_1 + \cdots + a_{is} f_s
\]
for $i = 1, \dots, s$. Then $(q_1, \dots, q_i)$ is a radical ideal of dimension $n - i$ for $i = 1, \dots, t-1$ and $1 \in (q_1, \dots, q_t)$ for some $t \le \min\{n+1, s\}$.

For $1 \le k \le n$, $0 \le l \le n$, we also let $b_{kl} \in \mathbb{Z}$ be integers with $h(b_{kl}) \le 2\,(n+1)\log(d+1)$ such that $V_i := V(q_1, \dots, q_i)$ satisfies Assumption 1.5 with respect to the variables
\[
y_k := b_{k0} + b_{k1} x_1 + \cdots + b_{kn} x_n
\]
for $i = 1, \dots, t-1$. Set $b := (b_{k0})_k \in \mathbb{Z}^{n}$ and $B := (b_{kl})_{k,l \ge 1} \in \mathrm{GL}_n(\mathbb{Q})$, and set $\varphi : \mathbb{A}^n \to \mathbb{A}^n$ for the affine map $\varphi(x) := B\,x + b$.

For $j = 1, \dots, t$ we then set
\[
F_j(y) := q_j(x) = q_j\bigl( \varphi^{-1}(y) \bigr) \in K[y_1, \dots, y_n].
\]
Thus $F_1, \dots, F_t$ are in the hypothesis of Lemma 3.3 with respect to $y_1, \dots, y_n$, and we let $P_1, \dots, P_t \in K[x_1, \dots, x_n]$ be the polynomials satisfying the Bézout identity we obtain there.
Finally, for $i = 1, \dots, s$, we set
\[
p_i := \sum_{j=1}^{t} a_{ji}\,P_j\bigl( \varphi(x) \bigr) \in K[x_1, \dots, x_n],
\]
where $a_{ji}$ is the coefficient of $f_i$ in $q_j$.
We have $1 = p_1 f_1 + \cdots + p_s f_s$. Now we analyze the degree and the height of these polynomials. We assume $n, d \ge 2$, as the remaining cases have already been considered in Lemmas 3.7 and 3.8. Set $W_l := V(F_1, \dots, F_l) \subset \mathbb{A}^n$ for $l = 1, \dots, t-1$. We have $W_l = \varphi(V_l)$, and so $\deg W_l = \deg V_l$. We also have $\deg F_j = \deg q_j \le d$, and so
\[
\deg p_i \le \max_j \deg P_j \le 2\, n\, d\, \Bigl( 1 + \sum_{l=1}^{\min\{n,s\}-1} \deg W_l \Bigr) \le 2\, n^2\, d\, \delta
\]
as $\deg W_l \le \delta$ for $l \le n - 1$.

Now, let $v \in M_K^\infty$. We have $h_\infty(\varphi) \le 2\,(n+1) \log(d+1)$, and so
\[
h_\infty(\varphi^{-1}) \le n\, h_\infty(\varphi) + \log n - \log |\det B|_\infty
\le 2\, n\,(n+1) \log(d+1) + \log n - \log |\det B|_\infty
\le 3\, n\,(n+1) \log(d+1) - \log |\det B|_\infty .
\]
Set $h_v := \max_i h_v(f_i)$. Then
\begin{align*}
h_v(F_i) &\le h_v(q_i) + \bigl( h_\infty(\varphi^{-1}) + 2 \log(n+1) \bigr) \deg q_i \\
&\le h_v + 2\,(n+1) \log(d+1) + \log s + \bigl( 3\, n\,(n+1) \log(d+1) - \log |\det B|_\infty + 2 \log(n+1) \bigr)\, d \\
&\le h_v + \log s + \bigl( n + 1 + 3\, n\,(n+1) + 2\, n \bigr)\, d \log(d+1) - \log |\det B|_\infty\, d \\
&\le h_v + \log s + 3\,(n+1)^2\, d \log(d+1) - \log |\det B|_\infty\, d
\end{align*}
by Lemma 1.2(c) and the fact that $\log(n+1) \le n$ and $\log(d+1) \ge 1$ for $d \ge 2$. Next, applying Lemma 2.7, we obtain
\begin{align*}
h(W_l) &\le h(V_l) + (n - l + 1) \bigl( h(\varphi) + 5 \log(n+1) \bigr) \deg V_l \\
&\le h(V_l) + n \bigl( 2\,(n+1) \log(d+1) + 5 \log(n+1) \bigr) \deg V_l \\
&\le \eta + n\,(7\, n + 2)\, d \log(d+1)\, \delta
\end{align*}
as $\deg W_l = \deg V_l \le d\, \delta$ and $h(V_l) \le \eta$ for $l = 1, \dots, t-1$.

By Lemma 3.3 there exists $\xi \in K^*$ such that
\begin{align*}
h_v(P_j) &\le 2\, n\, d \sum_{l=1}^{t-1} h_v(W_l) + \Bigl( (n+1) \max_l h_v(F_l) + 2\, n\,(2\, n + 5) \log(n+1)\, d \Bigr) \Bigl( 1 + \sum_{l=1}^{t-1} \deg W_l \Bigr) - \log |\xi|_v \\
&\le 2\, n\, d \sum_{l=1}^{t-1} h_v(W_l) + (n+1)^2\, (h_v + \log s)\, d\, \delta \\
&\qquad + \bigl( 3\,(n+1)^4 + 2\, n^2\,(2\, n + 5)\,(n+1) \bigr)\, d^2 \log(d+1)\, \delta - \log |\mu|_v
\end{align*}
with $\mu := (\det B)^{(n+1)^2 d^2 \delta}\, \xi \in K^*$. From the previous estimates we deduce
\begin{align*}
h_v(p_i) &\le \max_j h_v(P_j) + \bigl( h_\infty(\varphi) + 2 \log(n+1) \bigr) \max_j \deg P_j + 2\,(n+1) \log(d+1) + \log t \\
&\le 2\, n\, d \sum_l h_v(W_l) + (n+1)^2\, (h_v + \log s)\, d\, \delta \\
&\qquad + \bigl( 3\,(n+1)^4 + 2\, n^2\,(2\, n + 5)\,(n+1) \bigr)\, d^2 \log(d+1)\, \delta - \log |\mu|_v \\
&\qquad + \bigl( 2\,(n+1) \log(d+1) + 2 \log(n+1) \bigr)\, 2\, n^2\, d\, \delta + 2\,(n+1) \log(d+1) + \log(n+1) \\
&\le 2\, n\, d \sum_l h_v(W_l) + (n+1)^2\, (h_v + \log s)\, d\, \delta + 7\,(n+1)^3\,(n+2)\, d^2 \log(d+1)\, \delta - \log |\mu|_v .
\end{align*}
Analogously, $h_v(p_i) \le 2\, n\, d \sum_l h_v(W_l) + (n+1)^2\, h_v\, d\, \delta - \log |\mu|_v$ for $v \notin M_K^\infty$. Hence
\begin{align*}
h(p_1, \dots, p_s) &\le 2\, n\, d \sum_l h(W_l) + (n+1)^2\, (h + \log s)\, d\, \delta + 7\,(n+1)^3\,(n+2)\, d^2 \log(d+1)\, \delta \\
&\le 2\, n^2\, d\, \eta + 2\, n^3\,(7\, n + 2)\, d^2 \log(d+1)\, \delta + (n+1)^2\, (h + \log s)\, d\, \delta + 7\,(n+1)^3\,(n+2)\, d^2 \log(d+1)\, \delta \\
&\le 2\, n^2\, d\, \eta + (n+1)^2\, (h + \log s)\, d\, \delta + 21\,(n+1)^4\, d^2 \log(d+1)\, \delta .
\end{align*}
Finally, we apply Lemma 1.3 to obtain $a \in \mathbb{Z} \setminus \{0\}$ such that $g_i := a\, p_i \in \mathcal{O}_K[x_1, \dots, x_n]$. Thus $a = g_1 f_1 + \cdots + g_s f_s$, and the corresponding height estimates are multiplied by $[K : \mathbb{Q}]$.

We derive from this result and Lemma 4.8 the following estimate in terms of the degree and the height of the input polynomials.
COROLLARY 4.11
Let $f_1, \dots, f_s \in \mathcal{O}_K[x_1, \dots, x_n]$ be polynomials without common zeros in $\mathbb{A}^n$. Set $d_i := \deg f_i$, and assume that $d_1 \ge \cdots \ge d_s$ holds. Also, set $d := d_1 = \max_i \deg f_i$, $h := h(f_1, \dots, f_s)$, and $n_0 := \min\{n, s\}$. Then there exist $a \in \mathbb{Z} \setminus \{0\}$ and $g_1, \dots, g_s \in \mathcal{O}_K[x_1, \dots, x_n]$ such that
• $a = g_1 f_1 + \cdots + g_s f_s$,
• $\deg g_i \le 2\, n^2\, d \prod_{j=1}^{n_0 - 1} d_j$,
• $h(a, g_1, \dots, g_s) \le 2\,(n+1)^3\, [K : \mathbb{Q}]\, \bigl( h + \log s + 3\, n\,(n+7)\, d \log(d+1) \bigr)\, d \prod_{j=1}^{n_0 - 1} d_j$.
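To see how the bounds of Corollary 4.11 depend on the individual degrees, the following minimal sketch evaluates the two right-hand sides for a given degree vector. The function name `corollary_4_11_bounds`, the parameter `field_degree` (standing for $[K : \mathbb{Q}]$), and the use of the natural logarithm are assumptions made for illustration; they are not taken from the text.

```python
import math

def corollary_4_11_bounds(degrees, n, h, field_degree=1):
    """Evaluate the degree and height bounds of Corollary 4.11.

    degrees      -- the degrees deg f_1, ..., deg f_s (any order)
    n            -- number of variables
    h            -- height h(f_1, ..., f_s) of the input system
    field_degree -- [K : Q] (hypothetical parameter name)
    The logarithm is taken to be natural; this is an assumption.
    """
    s = len(degrees)
    ds = sorted(degrees, reverse=True)      # enforce d_1 >= ... >= d_s
    d = ds[0]
    n0 = min(n, s)
    prod = math.prod(ds[: n0 - 1])          # product of the n0 - 1 largest degrees
    deg_bound = 2 * n**2 * d * prod
    height_bound = (2 * (n + 1) ** 3 * field_degree
                    * (h + math.log(s) + 3 * n * (n + 7) * d * math.log(d + 1))
                    * d * prod)
    return deg_bound, height_bound

# Four polynomials of degrees 3, 2, 2, 1 in three variables.
deg_bound, height_bound = corollary_4_11_bounds([3, 2, 2, 1], n=3, h=5.0)
```

In this instance $n_0 = 3$, so only the two largest degrees enter the product, and the degree bound is $2 \cdot 9 \cdot 3 \cdot 6 = 324$, below the value $2\, n^2\, d^{n_0} = 486$ obtained by replacing every $d_j$ with $d$.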
4.2.3. Estimates for the sparse case
Our arithmetic Bernstein-Kushnirenko inequality (see Proposition 2.12 and Lemma 4.9) shows that both the degree and the height of a system are controlled by its volume. We then derive from Theorem 4.10 the following arithmetic Nullstellensatz for sparse polynomial systems. Corollary 3 in the introduction corresponds to the case $K := \mathbb{Q}$.

COROLLARY 4.12 (Sparse arithmetic Nullstellensatz)
Let $f_1, \dots, f_s \in \mathcal{O}_K[x_1, \dots, x_n]$ be polynomials without common zeros in $\mathbb{A}^n$. Set $d := \max_i \deg f_i$ and $h := h(f_1, \dots, f_s)$. Also, let $V$ denote the volume of the polynomial system $1, x_1, \dots, x_n, f_1, \dots, f_s$. Then there exist $a \in \mathbb{Z} \setminus \{0\}$ and $g_1, \dots, g_s \in \mathcal{O}_K[x_1, \dots, x_n]$ such that
• $a = g_1 f_1 + \cdots + g_s f_s$,
• $\deg g_i \le 2\, n^2\, d\, V$,
• $h(a, g_1, \dots, g_s) \le 2\,(n+1)^3\, [K : \mathbb{Q}]\, d\, V\, \bigl( h + \log s + 2^{2n+3}\, d \log(d+1) \bigr)$.
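The passage from Theorem 4.10 to Corollary 4.12 can be sanity-checked numerically. The sketch below substitutes the estimates $\delta \le V$ and $\eta \le n\, V\, (h + \log s + 2^{2n+3}\, d)$ of Lemma 4.9 into the height bound of Theorem 4.10 and compares the result with the bound of Corollary 4.12 on a small grid with $n, d \ge 2$. The function names, the grid, and the use of the natural logarithm are assumptions; this is a consistency check, not the authors' derivation.

```python
import math

def theorem_4_10_height(n, d, h, s, delta, eta, field_degree=1):
    """Right-hand side of the height bound in Theorem 4.10."""
    log_d = math.log(d + 1)
    return ((n + 1) ** 2 * field_degree * d
            * (2 * eta + (h + math.log(s)) * delta
               + 21 * (n + 1) ** 2 * d * log_d * delta))

def corollary_4_12_height(n, d, h, s, volume, field_degree=1):
    """Right-hand side of the height bound in Corollary 4.12."""
    log_d = math.log(d + 1)
    return (2 * (n + 1) ** 3 * field_degree * d * volume
            * (h + math.log(s) + 2 ** (2 * n + 3) * d * log_d))

def largest_delta_eta(n, d, h, s, volume):
    """Largest delta and eta allowed by the volume estimates of Lemma 4.9."""
    delta = volume
    eta = n * volume * (h + math.log(s) + 2 ** (2 * n + 3) * d)
    return delta, eta

def corollary_follows_on_grid(n_max=8, d_max=8):
    """Check that the substituted Theorem 4.10 bound stays below Corollary 4.12."""
    for n in range(2, n_max + 1):
        for d in range(2, d_max + 1):
            for h in (0.0, 1.0, 50.0):
                for s in (1, 2, 10):
                    for volume in (1, 7, 1000):
                        delta, eta = largest_delta_eta(n, d, h, s, volume)
                        lhs = theorem_4_10_height(n, d, h, s, delta, eta)
                        rhs = corollary_4_12_height(n, d, h, s, volume)
                        if lhs > rhs:
                            return False
    return True

grid_ok = corollary_follows_on_grid()
```

Since the bound of Theorem 4.10 is increasing in both $\delta$ and $\eta$, it is enough to test the largest admissible values. The degree bound needs no check: $2\, n^2\, d\, \delta \le 2\, n^2\, d\, V$ is immediate from $\delta \le V$.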
Example 4.13
For $1 \le i \le s$ we let
\[
f_i := a_{i0} + a_{i1} x_1 + \cdots + a_{in} x_n + b_{i1}\, x_1 \cdots x_n + \cdots + b_{id}\, (x_1 \cdots x_n)^d \in \mathbb{Z}[x_1, \dots, x_n]
\]
be polynomials of degree bounded by $n\, d$ without common zeros in $\mathbb{A}^n$. Set $h := \max_i h(f_i)$. Also, set $P_d := \operatorname{Conv}\bigl( 0, e_1, \dots, e_n, d\,(e_1 + \cdots + e_n) \bigr) \subset \mathbb{R}^n$, so that $P_d$ contains the Newton polytope of the polynomials $1, x_1, \dots, x_n, f_1, \dots, f_s$. Then
\[
V \le \operatorname{Vol}(P_d) = n!\, d / (n-1)! = n\, d.
\]
We conclude that there exist $a \in \mathbb{Z} \setminus \{0\}$ and $g_1, \dots, g_s \in \mathbb{Z}[x_1, \dots, x_n]$ such that
• $a = g_1 f_1 + \cdots + g_s f_s$,
• $\deg g_i \le 2\, n^4\, d^2$,
• $h(a), h(g_i) \le 2\, n^2\,(n+1)^3\, d^2\, \bigl( h + \log s + n\, 2^{2n+3}\, d \log(n\, d + 1) \bigr)$.
This estimate is sharper than the one given by Theorem 1.
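The value $\operatorname{Vol}(P_d) = n\, d$ can be confirmed by exact arithmetic. The sketch below assumes that the volume is normalized so that the standard simplex has volume $1$ (that is, $n!$ times the Euclidean volume), which is consistent with the stated value. It also uses a decomposition that is not in the text: for $n\, d > 1$ the point $d\,(e_1 + \cdots + e_n)$ lies beyond the facet $\operatorname{Conv}(e_1, \dots, e_n)$ of the standard simplex and beneath all its other facets, so $P_d$ is the union of the standard simplex and the simplex spanned by that facet and the point. All function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    size = len(m)
    result = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result            # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return result

def simplex_volume(vertices):
    """Euclidean volume of the simplex with the given n + 1 vertices in R^n."""
    base = vertices[0]
    edges = [[v[k] - base[k] for k in range(len(base))] for v in vertices[1:]]
    return abs(det(edges)) / factorial(len(base))

def normalized_volume_of_P(n, d):
    """n! times the Euclidean volume of Conv(0, e_1, ..., e_n, d(e_1 + ... + e_n))."""
    origin = [0] * n
    unit = [[1 if k == i else 0 for k in range(n)] for i in range(n)]
    apex = [d] * n
    euclidean = simplex_volume([origin] + unit) + simplex_volume([apex] + unit)
    return factorial(n) * euclidean

volumes_ok = all(normalized_volume_of_P(n, d) == n * d
                 for n in range(1, 6) for d in range(1, 6))
```

With $V \le n\, d$ and degree bound $n\, d$, the degree estimate of Corollary 4.12 becomes $2\, n^2 \cdot (n\, d) \cdot (n\, d) = 2\, n^4\, d^2$, matching the bound stated in the example.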
Krick
Departamento de Matemática, Universidad de Buenos Aires, Ciudad Universitaria, 1428 Buenos Aires, Argentina; [email protected]

Pardo
Departamento de Matemáticas, Estadística y Computación, Universidad de Cantabria, E-39071 Santander, España; [email protected]

Sombra
Departamento de Matemática, Universidad Nacional de La Plata, Calle 50 y 115, 1900 La Plata, Argentina, and School of Mathematics, Institute for Advanced Study, Princeton, NJ 08540, USA; [email protected], [email protected]
DUKE MATHEMATICAL JOURNAL © 2001 Vol. 109, No. 3
POSITIVITY IN EQUIVARIANT SCHUBERT CALCULUS WILLIAM GRAHAM
Abstract
We prove a positivity property for the cup product in the $T$-equivariant cohomology of the flag variety. This was conjectured by D. Peterson and has as a consequence a conjecture of S. Billey. The result for the flag variety follows from a more general result about algebraic varieties with an action of a solvable linear algebraic group such that the unipotent radical acts with finitely many orbits. The methods are those used by S. Kumar and M. Nori.

1. Introduction
Let $X = G/B$ be the flag variety of a complex semisimple group $G$ with $B \supset T$ a Borel subgroup and a maximal torus, respectively. The homology $H_*(X)$ has as a basis the fundamental classes $[X_w]$ of Schubert varieties $X_w \subset X$. If $\{x_w\} \subset H^*(X)$ is the corresponding dual basis for cohomology, then the cup product, expressed in this basis, has nonnegative coefficients:
\[
x_u x_v = \sum_w a_{uv}^w\, x_w, \tag{1.1}
\]
where the $a_{uv}^w$ are nonnegative integers.

The $T$-equivariant cohomology and Chow groups of the flag variety have been described by [A], [KK], and [Br1]. One reason to study these groups is that they provide a way to compute the coefficients in the multiplication in ordinary cohomology. In addition, the equivariant groups are related to degeneracy loci in algebraic geometry (see [F2], [F3], [PR], [G]); these in turn are related to double Schubert polynomials (see [LS]), of interest in combinatorics.

Peterson [P] recently conjectured that the equivariant cohomology groups of the flag variety have a positivity property generalizing (1.1). The $T$-equivariant cohomology $H_T^*(X)$ is a free module over $H_T^*(\mathrm{pt})$ with a basis dual (in a suitable sense; see Section 2) to the equivariant fundamental classes $[X_w]_T$; again we call this basis $\{x_w\}$. Now $H_T^*(\mathrm{pt})$ is isomorphic to the polynomial ring $S(\hat{T}) = \mathbb{Z}[\lambda_1, \dots, \lambda_n]$, where
Received 13 June 2000. Revision received 11 October 2000. 2000 Mathematics Subject Classification. Primary 14M, 22E. Author's work partially supported by the National Science Foundation.
$\lambda_1, \dots, \lambda_n$ is a basis for the free abelian group $\hat{T}$ of characters of $T$. Let $\alpha_1, \dots, \alpha_n$ denote the simple roots in $\hat{T}$ (chosen so that the roots of $\mathfrak{b} = \operatorname{Lie} B$ are positive). In the equivariant setting, we can again expand the product $x_u x_v$ in the form (1.1), but now the $a_{uv}^w$ are in $H_T^*(\mathrm{pt})$; in other words, they are polynomials. Peterson's conjecture is that when each $a_{uv}^w$ is written as a sum of monomials in the $\alpha_i$, the coefficients are all nonnegative. In this paper we prove the conjecture, not just for finite-dimensional flag varieties but in the general Kac-Moody setting. An immediate corollary is a conjecture of Billey [Bi].

The methods of this paper are those used by Kumar and Nori [KN]. In that paper, the authors prove the nonnegativity result (1.1) in ordinary cohomology for the flag variety of a Kac-Moody group. As they observe, the difficulty in proving this result is that in the Kac-Moody case, unlike the finite-dimensional case, the flag variety is not, in general, a homogeneous space. However, it is approximated by finite-dimensional varieties, each of which has an action of a unipotent group with finitely many orbits. The main result of [KN] is that, for such varieties, the cup product has nonnegative coefficients (with respect to a suitable basis); the result for the flag variety follows.

A similar problem arises in equivariant cohomology. The equivariant cohomology of $X$ is by definition the cohomology of a "mixed space" $X_T$, which, although infinite-dimensional, can be approximated by finite-dimensional varieties. As in the situation considered by Kumar and Nori, the space $X_T$ is not a homogeneous space. But unlike their situation, the finite-dimensional approximations to $X_T$ do not (as far as I know) have actions of unipotent groups with finitely many orbits, so we cannot apply their result. Instead, by adapting their proof to the equivariant setting and using a relation in equivariant cohomology (or Chow groups) observed by M.
Brion, we are able to deduce an equivariant analogue of the main result of [KN]. The equivariant nonnegativity result for the flag variety follows immediately. 2. Preliminaries We work with schemes over the ground field C and assume (to freely apply the results of [F, Chapter 19]) that all schemes considered admit closed embeddings into nonsingular schemes. For schemes with group actions, we assume that equivariant embeddings exist. We use equivariant cohomology and Borel-Moore homology with integer coefficients as our main tools; H∗ X denotes the Borel-Moore homology of X . As general references for functorial properties of Borel-Moore homology, see [F] and [FM]. For smooth varieties we could alternatively use equivariant Chow groups, but for nonsmooth varieties the Chow “cohomology” theory is not as well understood, and for this reason we use (equivariant) cohomology and Borel-Moore homology groups. In this section we recall some basic facts about these groups (for more background, see [Br2] or [EG]). We also prove, for lack of a reference, equivariant versions of
several familiar nonequivariant results.

Let $X$ be a scheme with an action of a linear algebraic group $G$. Let $V$ be a representation of $G$, and let $U$ be an open subset of $V$ such that $G$ acts freely on $U$. View $G$ as acting on the right on $U$ and on the left on $X$; then $G$ acts on $U \times X$ by $g \cdot (u, x) = (u g^{-1}, g x)$.* Define $U \times^G X$ to be $(U \times X)/G$. The equivariant cohomology and Borel-Moore homology of $X$ are, by definition,
\[
H_G^i(X) = H^i(U \times^G X), \qquad H_j^G(X) = H_{j + 2(\dim V - \dim G)}(U \times^G X),
\]
provided the (complex) codimension of $V - U$ in $V$ is greater than $i/2$ (for the first equation) or $\dim X - j/2$ (for the second equation). These groups are independent of the choice of $V$ and $U$, provided the codimension condition is satisfied. For this reason we often denote $U \times^G X$ by $X_G$ (omitting $U$ from the notation). The quotient $U/G$ is a finite-dimensional approximation to the classifying space $BG$ introduced in Chow theory by B. Totaro [T]. We frequently write $BG$ when we mean such a finite-dimensional approximation. Note that $H_G^i(X)$ vanishes for negative $i$; $H_j^G(X)$ vanishes for $j > 2 \dim X$ but can be nonzero for negative $j$.

* Alternatively, we could let $G$ act on the left on $U$ and then take the diagonal action on $U \times X$.

The equivariant cohomology of a point we denote by $H_G^*$. Both $H_G^*(X)$ and $H_*^G(X)$ are modules for $H_G^*$. $H_G^*(X)$ has a natural ring structure, and $H_*^G(X)$ is a module for this ring. Any $G$-stable closed subvariety $Y \subset X$ has a fundamental class $[Y]_G$ in $H_{2 \dim Y}^G(X)$. There is a natural map $\cap [X]_G : H_G^*(X) \to H_*^G(X)$; if $X$ is smooth, this is an isomorphism. In particular, we always identify $H_*^G(\mathrm{pt})$ with $H_G^*$.

Let $\pi^X : X \to \mathrm{pt}$ denote the projection. If $X$ is proper, this induces an $H_G^*$-linear map $\pi_*^X : H_*^G(X) \to H_*^G(\mathrm{pt}) \cong H_G^*$. In this case there is a pairing $(\ ,\ ) : H_G^*(X) \otimes H_*^G(X) \to H_G^*$ taking $x \otimes C$ to $\pi_*^X(x \cap C)$. We sometimes write this pairing as $\int_C x$, and, if $C = [Y]_G$, we abuse notation and write it as $\int_Y x$. The pairing has the property that, given $f : X_1 \to X_2$, we have
\[
(f^* x_2, C_1) = (x_2, f_* C_1). \tag{2.2}
\]
(Proof: $(f^* x_2, C_1) = \pi_*^{X_1}(f^* x_2 \cap C_1) = \pi_*^{X_2} f_*(f^* x_2 \cap C_1) = \pi_*^{X_2}(x_2 \cap f_* C_1) = (x_2, f_* C_1)$.)

The map $X \times^G U \to U/G$ is a fibration with fiber $X$, and pullback to a fiber yields a map $v_0 : H_G^*(X) \to H^*(X)$. There is also a Gysin morphism $H_*^G(X) \to H_*(X)$, which we again denote by $v_0$. Properties of Gysin morphisms (see [FM, Section 2.5]) imply that if $b \in H_G^*(X)$ and $a \in H_*^G(X)$, then $v_0(b \cap a) = v_0(b) \cap v_0(a)$.

A variety $X$ is said to be paved by affines if it can be written as a finite disjoint union $X = \coprod X_i^0$, where each $X_i^0$ is a locally closed subvariety isomorphic to affine space $\mathbb{A}^{d_i}$ for some $d_i$ such that, for some partial order on the indexing set,
\[
X_i \subseteq \bigcup_{j \le i} X_j^0 .
\]
Here $X_i$ is the closure of $X_i^0$. As is well known (see, e.g., [KN]), the Borel-Moore homology $H_*(X)$ is the free $\mathbb{Z}$-module generated by the fundamental classes $[X_i]$ (where $X_i$ is the closure of $X_i^0$); the odd-dimensional Borel-Moore homology vanishes. Part (b) of the next proposition is from [A, Propositions 2.5.1 and 2.4.1].

PROPOSITION 2.1
Suppose the $G$-variety $X$ has a paving by $G$-invariant affines $X_i^0$. Then
(a) $H_*^G(X)$ is a free $H_G^*$-module with basis $\{[X_i]_G\}$.
(b) Suppose in addition that $X$ is complete and that $H_G^*$ is torsion-free and vanishes in odd degrees. Then there exist classes $x_i$ (of degree $\dim X_i$) in $H_G^*(X)$ which form a basis for $H_G^*(X)$ as $H_G^*$-module, such that the bases $\{[X_i]_G\}$ and $\{x_i\}$ are dual in the sense that $\int_{X_i} x_j = \delta_{ij}$.
Proof
(a) Let $X_k^0$ be open in $X$, and let $Y = X - X_k^0$; then there is a long exact sequence of $H_G^*$-modules
\[
\cdots \to H_{i+1}^G(X_k^0) \to H_i^G(Y) \to H_i^G(X) \to H_i^G(X_k^0) \to \cdots .
\]
Since $X_k^0$ is isomorphic to affine space, $H_*^G(X_k^0)$ is a free $H_G^*$-module of rank 1, generated by $[X_k^0]_G$. Hence all the odd equivariant homology of $X_k^0$ vanishes; by induction the same holds for $Y$, and then by the long exact sequence it holds for $X$. Thus we have a short exact sequence of $H_G^*$-modules
\[
0 \to H_*^G(Y) \to H_*^G(X) \to H_*^G(X_k^0) \to 0.
\]
This is split by the $H_G^*$-linear map $H_*^G(X_k^0) \to H_*^G(X)$ taking $[X_k^0]_G$ to $[X_k]_G$. Induction implies (a).
(b) See [A, Propositions 2.5.1 and 2.4.1].

Remarks
(1) Although A. Arabia assumes that $G$ is connected, his proof is valid under the hypotheses stated. If $G$ is connected, then [Bo1, Sections 7 and 19] (cf. [A, p. 136]) shows that if $H^*(G)$ is torsion-free, then $H_G^*$ is also torsion-free and vanishes in odd degrees. In particular (as observed in [A]), this holds with coefficients in a field, and Propositions 2.1 and 2.2 are also valid with field coefficients. If $G$ is allowed to
be disconnected, with identity component $G_0$, then the finite covering $BG_0 \to BG$ induces an isomorphism $H_G^*(\mathrm{pt}; \mathbb{Q}) \simeq H_{G_0}^*(\mathrm{pt}; \mathbb{Q})^{G/G_0}$.
(2) As noted in [A], the conditions $\int_{X_i} x_j = \delta_{ij}$ imply that under the map $H_G^*(X) \to H^*(X)$ the images $x_i^0$ of $x_i$ form a basis of $H^*(X)$ dual to the basis $\{[X_i]\}$ of $H_*(X)$.

For a variety $X$ paved by $G$-invariant affines as above, we have the following description of the product on $H_G^*(X)$ in terms of the diagonal morphism. The nonequivariant version of this result was used by [KN]. The equivariant version was mentioned in [P] for the flag variety; the general proof is the same. Note that the diagonal morphism $\delta : X \to X \times X$ is $G$-equivariant ($G$ acting diagonally on $X \times X$).

PROPOSITION 2.2
Let $X$ be a $G$-variety with a paving by $G$-invariant affines $X_i^0$; assume that $H_G^*$ is torsion-free and vanishes in odd degrees. Let $X_i$ and $x_i$ be as in the previous proposition. We can write $\delta_* [X_k]_G = \sum_{i,j} a_{ij}^k\, [X_i \times X_j]_G$, where $a_{ij}^k \in H_G^*$. The product in $H_G^*(X)$ is given by
\[
x_i x_j = \sum_k a_{ij}^k\, x_k .
\]

Proof
We can write $\delta_* [X_k]_G$ in the form claimed because the classes $[X_i \times X_j]_G$ form a basis for $H_*^G(X \times X)$ as $H_G^*$-module. Let $q_i : X \times X \to X$ denote the $i$th projection. As in the nonequivariant case, the product on $H_G^*(X)$ is given by $c_1 \cdot c_2 = \delta^*(q_1^* c_1 \cdot q_2^* c_2)$ for $c_1, c_2 \in H_G^*(X)$. (This can be seen by considering the composition
\[
X_G \xrightarrow{\ \delta_G\ } (X \times X)_G \cong X_G \times_{BG} X_G \overset{i}{\hookrightarrow} X_G \times X_G
\]
and noting that the product on $H^*(X_G)$ is given by $\zeta_1 \cdot \zeta_2 = (i \circ \delta_G)^*(\mathrm{pr}_1^* \zeta_1 \cdot \mathrm{pr}_2^* \zeta_2)$, where $\mathrm{pr}_i : X_G \times X_G \to X_G$ is the projection and $\zeta_i \in H^*(X_G)$. Choosing $\zeta_i$ to represent $c_i \in H_G^*(X)$, the assertion follows easily.)

The preceding proposition shows that if $X$ is paved by invariant affines, then $H_G^*(X)$ and $H_*^G(X)$ are free $H_G^*$-modules with a perfect pairing
\[
(\ ,\ ) : H_G^*(X) \otimes_{H_G^*} H_*^G(X) \to H_G^* .
\]
Using this, we can identify $H_G^*(X) = \operatorname{Hom}_{H_G^*}(H_*^G(X), H_G^*)$. Therefore, to show that $x_i x_j = \sum_k a_{ij}^k\, x_k$, it is enough to show that for all $\nu \in H_*^G(X)$ we have
\[
(x_i x_j, \nu) = \Bigl( \sum_k a_{ij}^k\, x_k, \nu \Bigr) = \sum_k a_{ij}^k\, (x_k, \nu).
\]
In fact, it is enough to check this when $\nu$ is one of the basis elements $[X_k]_G$; that is, it is enough to show $(x_i x_j, [X_k]_G) = a_{ij}^k$. Now
\begin{align*}
(x_i x_j, [X_k]_G) &= (\delta^*(q_1^* x_i \cdot q_2^* x_j), [X_k]_G) \\
&= (q_1^* x_i \cdot q_2^* x_j, \delta_* [X_k]_G) \\
&= \sum_{m,n} a_{mn}^k\, (q_1^* x_i \cdot q_2^* x_j, [X_m \times X_n]_G).
\end{align*}
By definition of the pairing,
\[
(q_1^* x_i \cdot q_2^* x_j, [X_m \times X_n]_G) = \pi_*^{X \times X}(q_1^* x_i \cdot q_2^* x_j \cap [X_m \times X_n]_G).
\]
This is computed using the fibrations $X_G \to BG$ and $(X \times X)_G = X_G \times_{BG} X_G \xrightarrow{\ \pi_G\ } BG$. By the next lemma, the result is equal to
\[
\pi_*^X(x_i \cap [X_m]_G) \cdot \pi_*^X(x_j \cap [X_n]_G),
\]
which is 1 if $i = m$ and $j = n$, and zero otherwise. We conclude that $(x_i x_j, [X_k]_G) = a_{ij}^k$, as desired.

LEMMA 2.3
Let $\rho_i : X_i \to Y$ ($i = 1, 2$) be fibrations with $\rho_i$ proper and with $\pi : X_1 \times_Y X_2 \to Y$, $q_i : X_1 \times_Y X_2 \to X_i$ the projections. Let $Z_i \subset X_i$ be closed subvarieties such that $\rho_i|_{Z_i} : Z_i \to Y$ are fibrations, and let $\alpha_i \in H^*(X_i)$. Assume that $Y$ is smooth, and identify $H_*(Y)$ with $H^*(Y)$. Then
\[
\pi_*(q_1^* \alpha_1 \cdot q_2^* \alpha_2 \cap [Z_1 \times_Y Z_2]) = \rho_{1*}(\alpha_1 \cap [Z_1]) \cdot \rho_{2*}(\alpha_2 \cap [Z_2]),
\]
where on the right-hand side the product is taken in $H^*(Y)$.

Proof
We have a Cartesian diagram
\[
\begin{array}{ccc}
X_1 \times_Y X_2 & \xrightarrow{\ \Delta\ } & X_1 \times X_2 \\
\downarrow \pi & & \downarrow \Pi \\
Y & \xrightarrow{\ \delta\ } & Y \times Y
\end{array}
\]
Because $Y$ is smooth, $\delta$ (and hence $\Delta$) are regular embeddings, so there are Gysin maps $\delta^*$ and $\Delta^*$ on homology [F, Example 19.2.1]. These satisfy the relation $\pi_* \Delta^* = \delta^* \Pi_*$ [FM, p. 26].

Claim: In $H_*(X_1 \times_Y X_2)$,
\[
q_1^* \alpha_1 \cdot q_2^* \alpha_2 \cap [Z_1 \times_Y Z_2] = \Delta^*\bigl( (\alpha_1 \cap [Z_1]) \times (\alpha_2 \cap [Z_2]) \bigr).
\]
To prove this, first note that (with $\mathrm{pr}_i : X_1 \times X_2 \to X_i$ denoting the projection) $q_1^* \alpha_1 \cdot q_2^* \alpha_2 = \Delta^*(\mathrm{pr}_1^* \alpha_1 \cdot \mathrm{pr}_2^* \alpha_2) = \Delta^*(\alpha_1 \times \alpha_2)$ (cf. [M, p. 351]). Next, $[Z_1 \times_Y Z_2] = \Delta^* [Z_1 \times Z_2]$ since $Z_1 \times Z_2$ and $\Delta(X_1 \times_Y X_2)$ are subvarieties of $X_1 \times X_2$ whose intersection at smooth points is transverse. Hence (noting that $[Z_1 \times Z_2] = [Z_1] \times [Z_2]$ by [F, p. 377]),
\begin{align*}
q_1^* \alpha_1 \cdot q_2^* \alpha_2 \cap [Z_1 \times_Y Z_2] &= \Delta^*(\alpha_1 \times \alpha_2) \cap \Delta^* [Z_1 \times Z_2] \\
&= \Delta^*\bigl( (\alpha_1 \times \alpha_2) \cap ([Z_1] \times [Z_2]) \bigr) \\
&= \Delta^*\bigl( (\alpha_1 \cap [Z_1]) \times (\alpha_2 \cap [Z_2]) \bigr),
\end{align*}
proving the claim. To complete the proof of the lemma, we compute
\begin{align*}
\pi_*(q_1^* \alpha_1 \cdot q_2^* \alpha_2 \cap [Z_1 \times_Y Z_2]) &= \pi_* \Delta^*\bigl( (\alpha_1 \cap [Z_1]) \times (\alpha_2 \cap [Z_2]) \bigr) \\
&= \delta^* \Pi_*\bigl( (\alpha_1 \cap [Z_1]) \times (\alpha_2 \cap [Z_2]) \bigr) \\
&= \delta^*\bigl( \rho_{1*}(\alpha_1 \cap [Z_1]) \times \rho_{2*}(\alpha_2 \cap [Z_2]) \bigr) \\
&= \rho_{1*}(\alpha_1 \cap [Z_1]) \cdot \rho_{2*}(\alpha_2 \cap [Z_2]).
\end{align*}
This proves the lemma.

3. The positivity theorem
In this section we prove the positivity result about multiplication in equivariant cohomology (Theorem 3.1). As in the nonequivariant case considered by Kumar and Nori, it is deduced from a result about invariant cycles (Theorem 3.2). In the nonequivariant setting, A. Hirschowitz [H] proved that for a projective scheme with an action of a connected solvable group $B$, any effective cycle is rationally equivalent to a $B$-invariant effective cycle. Kumar and Nori gave a different proof of this result (without assuming projectivity) in the special case of unipotent groups, and the proof of Theorem 3.2 is adapted from their proof.

In this section, $T$ denotes an algebraic torus (i.e., a product of multiplicative groups $\mathbb{G}_m$) with Lie algebra $\mathfrak{t} = \operatorname{Lie} T$ and with $\hat{T} \subset \mathfrak{t}^*$ the group of characters of $T$. The equivariant cohomology group $H_T^*$ can be identified with the polynomial ring $S(\hat{T})$, the symmetric algebra on the free abelian group $\hat{T}$.
THEOREM 3.1
Let $B$ be a connected solvable group with unipotent radical $N$ and Levi decomposition $B = T N$. Let $\alpha_1, \dots, \alpha_d \in \hat{T}$ denote the weights of $T$ on $\mathfrak{n} = \operatorname{Lie} N$. Let $X$ be a complete $B$-variety on which $N$ acts with finitely many orbits $X_1^0, \dots, X_n^0$. These are a paving of $X$ by $B$-stable affines; let $X_1, \dots, X_n$ denote the closures, so that $\{[X_1]_T, \dots, [X_n]_T\}$ are a basis for $H_*^T(X)$. Let $\{x_1, \dots, x_n\}$ denote the dual basis of $H_T^*(X)$. Write
\[
x_i x_j = \sum_k a_{ij}^k\, x_k
\]
with $a_{ij}^k \in H_T^* = S(\hat{T})$. Then each $a_{ij}^k$ can be written as a sum of monomials $\alpha_1^{i_1} \cdots \alpha_d^{i_d}$, with nonnegative integer coefficients.
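The smallest instance of Theorem 3.1 can be worked out by hand, and the sketch below does so in code. It assumes $X = \mathbb{P}^1$ with $T = \mathbb{G}_m$ and $N = \mathbb{G}_a$, so that $N$ has two orbits, a fixed point $p_0$ and its complement, and $T$ acts on $\operatorname{Lie} N$ with a single weight $\alpha$. Classes are represented by their restrictions to the two $T$-fixed points $(p_0, p_\infty)$; the restriction values $x_1 \mapsto (1, 1)$ and $x_2 \mapsto (0, \alpha)$ are standard localization facts that are not stated in the text and are taken here as assumptions, with the sign chosen to match the weight on $\operatorname{Lie} N$. Polynomials in $\alpha$ are stored as coefficient lists, and all function names are hypothetical.

```python
# A polynomial c0 + c1*alpha + c2*alpha^2 + ... is stored as [c0, c1, c2, ...].

def trim(p):
    """Drop trailing zero coefficients, so the zero polynomial is []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def mul(p, q):
    """Product of two polynomials in alpha."""
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def sub(p, q):
    """Difference p - q of two polynomials in alpha."""
    size = max(len(p), len(q))
    p = list(p) + [0] * (size - len(p))
    q = list(q) + [0] * (size - len(q))
    return trim([a - b for a, b in zip(p, q)])

def div_alpha(p):
    """Exact division by alpha; fails if the constant term is nonzero."""
    if p and p[0] != 0:
        raise ValueError("polynomial is not divisible by alpha")
    return trim(p[1:])

ONE, ALPHA = [1], [0, 1]
X1 = (ONE, ONE)      # assumed restrictions of x_1 to (p_0, p_infinity)
X2 = ([], ALPHA)     # assumed restrictions of x_2 to (p_0, p_infinity)

def product(u, v):
    """Cup product, computed pointwise on restrictions."""
    return (mul(u[0], v[0]), mul(u[1], v[1]))

def expand(c):
    """Coefficients (a, b) with c = a*x_1 + b*x_2."""
    a = trim(c[0])
    b = div_alpha(sub(c[1], a))
    return a, b

def structure_constants():
    """Table mapping (i, j, k) to the coefficient a_{ij}^k."""
    basis = {1: X1, 2: X2}
    table = {}
    for i, u in basis.items():
        for j, v in basis.items():
            a, b = expand(product(u, v))
            table[(i, j, 1)] = a
            table[(i, j, 2)] = b
    return table

table = structure_constants()
all_nonnegative = all(c >= 0 for poly in table.values() for c in poly)
```

Under these assumptions the only nontrivial product is $x_2 \cdot x_2 = \alpha\, x_2$, so $a_{22}^2 = \alpha$ has nonnegative coefficients, and setting $\alpha = 0$ recovers $x_2^2 = 0$ in $H^*(\mathbb{P}^1)$.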
Note that the constant term in each $a_{ij}^k$ (i.e., the coefficient of $\alpha_1^0 \cdots \alpha_d^0$) is nonnegative by the above theorem. This is the coefficient that occurs in the multiplication in the ordinary cohomology $H^*(X)$. The reason is that our hypotheses imply $H^*(X) = H_T^*(X) / H_T^{>0} \cdot H_T^*(X)$ (see [GKM]).

The next result is the key ingredient in the proof of Theorem 3.1. In this theorem, $N$ is not assumed to act with finitely many orbits. The result also holds with equivariant Chow groups in place of equivariant Borel-Moore homology.

THEOREM 3.2
Let $B$ be a connected solvable group with unipotent radical $N$, and let $T \subset B$ be a maximal torus, so that $B = T N$. Let $\alpha_1, \dots, \alpha_d \in \hat{T}$ denote the weights of $T$ acting on $\mathfrak{n} = \operatorname{Lie} N$. Let $X$ be a scheme with a $B$-action, and let $Y$ be a $T$-stable subvariety of $X$. Then there exist $B$-stable subvarieties $Y_1, \dots, Y_r$ of $X$ such that in $H_*^T(X)$,
\[
[Y]_T = \sum f_i\, [Y_i]_T ,
\]
where each $f_i \in H_T^*$ can be written as a linear combination of monomials in $\alpha_1, \dots, \alpha_d$ with nonnegative integer coefficients.

The following lemma was pointed out to me by Michel Brion.

LEMMA 3.3
Suppose that the connected solvable group $B = T N$ acts on $X$ and that $N$ has finitely many orbits on $X$. Then each $N$-orbit is $B$-stable (in fact, the $B$-orbit of a $T$-fixed point).
Proof
$B$ has finitely many orbits on $X$ (as the subgroup $N$ does); as each $B$-orbit is $N$-stable, it is a finite union of $N$-orbits. Let $B \cdot x' \simeq B/B'$ be an orbit, where $B'$ is the stabilizer of $x'$. As each $N$-orbit is isomorphic to affine space (see, e.g., [KN]), the odd cohomology of $B \cdot x'$ vanishes, so $B'$ must contain a maximal torus of $B$. As all maximal tori of $B$ are conjugate [Bo2, Corollary 11.3], there is some $b \in B$ such that $B' = bB_1b^{-1}$, where $B_1 \supset T$. Then $B \cdot x' = B \cdot x$, where $x = b^{-1}x'$; moreover, $B_1$ is the stabilizer of $x$. Hence $B \cdot x$ is the $N$-orbit of the $T$-fixed point $x$.

Proof of Theorem 3.1
The group $\tilde{B} = T \cdot (N \times N)$ (semidirect product) acts on $X \times X$ by $t \cdot (n_1, n_2)(p_1, p_2) = (tn_1p_1, tn_2p_2)$. The unipotent radical $N \times N$ has finitely many orbits $X_i^0 \times X_j^0$ on $X \times X$ with closures $X_i \times X_j$, so $H_*^T(X \times X)$ is a free $H_T^*$-module with basis $[X_i \times X_j]_T$. By Proposition 2.2, if $x_i x_j = \sum_k a_{ij}^k x_k$, then
$$\delta_*[X_k]_T = [\delta(X_k)]_T = \sum_{i,j} a_{ij}^k [X_i \times X_j]_T.$$
The coefficients $a_{ij}^k$ are uniquely determined by the expansion of $\delta_*[X_k]_T$ because the classes $[X_i \times X_j]_T$ are linearly independent over $H_T^*$. By Theorem 3.2, these coefficients can be written as linear combinations of monomials in $\alpha_1, \ldots, \alpha_d$ with nonnegative integer coefficients, where $\alpha_1, \ldots, \alpha_d$ are the weights of $T$ on $\operatorname{Lie}(N \times N)$ (which are the same as the weights of $T$ on $\mathfrak{n}$).

Proof of Theorem 3.2
First, consider the case where $\dim N = 1$; then $B/T \xrightarrow{\ \sim\ } N \xrightarrow[\ \sim\ ]{\ \varphi\ } \mathbb{G}_a$, where $\mathbb{G}_a \cong \mathbb{A}^1$ is the additive group. Write $\alpha = \alpha_1$. We have $B = NT$, and the map $B/T \to N$ sends $nT \mapsto n$. Now, $B$ acts on $B/T$ by left multiplication. Via the isomorphism of $B/T$ with $N$, we obtain an action of $B$ on $N$; the subgroup $T \subset B$ acts on $N$ by conjugation, and the subgroup $N$ acts by left multiplication. The action of $T$ by conjugation on $N$ corresponds under $\varphi$ to an action of $T$ on $\mathbb{A}^1$ with weight $\alpha$. Embed $B/T \hookrightarrow \mathbb{P}^1$ by $nT \mapsto [\varphi(n) : 1]$. The action of $B$ on $B/T$ extends to an action on $\mathbb{P}^1$; the element $tn \in B$ acts by the matrix
$$\begin{pmatrix} \alpha(t) & \varphi(n) \\ 0 & 1 \end{pmatrix}.$$
The point $\infty = [1 : 0]$ is fixed by $B$, while the point $0 = [0 : 1]$ is fixed by $T$.

Now, $B$ acts on $B \times^T X$ by left multiplication: $b \cdot (b', x) = (bb', x)$. Under the isomorphism $\theta : B \times^T X \to B/T \times X$, taking $(b, x)$ to $(bT, bx)$, the $B$-action corresponds to the product action on $B/T \times X$. This extends to a $B$-action on $\mathbb{P}^1 \times X$. The projections $\pi : \mathbb{P}^1 \times X \to \mathbb{P}^1$ and $\rho : \mathbb{P}^1 \times X \to X$ are $B$-equivariant.

If $Y \subset X$ is a $T$-invariant subvariety, then $B \times^T Y$ is a $B$-invariant subvariety of $B \times^T X$. Let $Z$ be the Zariski closure of $\theta(B \times^T Y)$ in $\mathbb{P}^1 \times X$; $\theta(B \times^T Y)$ and $Z$ are $B$-invariant subvarieties of $\mathbb{P}^1 \times X$. Let $\pi_Z$ denote the restriction of $\pi$ to $Z$. Let $[w_0 : w_1]$ be projective coordinates on $\mathbb{P}^1$, and let $w$ be the rational function $w_0/w_1$. Let $g = \pi_Z^* w$; then $w$ (and hence $g$) are rational functions that are
$T$-eigenvectors of weight $-\alpha$. By [Br1, Theorem 2.1]∗ we have in $H_*^T(\mathbb{P}^1 \times X)$ the relation $[\operatorname{div}_Z g]_T = \alpha[Z]_T$. Therefore in $H_*^T X$ we have the relation
$$\rho_*[\operatorname{div}_Z g]_T = \alpha\rho_*[Z]_T. \tag{3.3}$$
Now, $\pi_Z^{-1}(0) = \{0\} \times Y$ (cf. [KN]). Also, $\pi_Z^{-1}(\infty) = \{\infty\} \times D$, where $D$ is a subscheme of $X$. Therefore (3.3) yields $[Y]_T = [D]_T + \alpha\rho_*[Z]_T$. As $\pi_Z$ is $B$-equivariant and $\infty \in \mathbb{P}^1$ is $B$-fixed, it follows that $\{\infty\} \times D$, and hence $D$, are $B$-invariant. Each irreducible component $Y_i$ ($i = 1, \ldots, r$) of $D$ is therefore $B$-invariant (as $B$ is connected), and if $m_i$ is the multiplicity of $Y_i$ in $D$, then $[D]_T = \sum_{i=1}^r m_i [Y_i]_T$. Likewise, $\rho$ is $B$-equivariant, and $Z$ is $B$-invariant. If $Z_i$ is a component of $Z$, then the map $\rho|_{Z_i}$ of $Z_i$ onto its image in $X$ is finite if and only if the map $\rho_T|_{(Z_i)_T}$ of $(Z_i)_T$ onto its image in $X_T$ is finite; in that case the degrees of the maps are the same. If we list the components of $\rho(Z)$ which are finite images of components of $Z$ as $Y_{r+1}, \ldots, Y_s$, it follows that each of these components is $B$-invariant and that $\rho_*[Z]_T = \sum_{i=r+1}^s m_i [Y_i]_T$, where $m_i$ are positive integers. We conclude that
$$[Y]_T = \sum_{i=1}^r m_i [Y_i]_T + \sum_{i=r+1}^s m_i \alpha [Y_i]_T, \tag{3.4}$$
where the $Y_i$ are $B$-invariant. This proves the result if $\dim N = 1$.

To prove the result in general, we can find a subgroup $N' \subset N$ such that $N'$ is normal in $B$ and $\dim N/N' = 1$. Let $\alpha$ be the weight of $T$ on $\operatorname{Lie}(N/N')$. Define $B' = N'T \subset B = NT$. By induction we may assume the result is true for $B'$. It is enough to show that, given a $B'$-invariant subvariety $Y \subset X$, we can write $[Y]_T$ as in (3.4), with $B$-invariant $Y_i$. For this we modify the above proof as follows. Replace $B/T$, $B \times^T X$, and $B \times^T Y$ by $B/B'$, $B \times^{B'} X$, and $B \times^{B'} Y$; the map $\theta$ now takes $B \times^{B'} X$ to $B/B' \times X$. Again $\varphi : B/B' \xrightarrow{\ \cong\ } \mathbb{G}_a = \mathbb{A}^1$, and $T$ acts by weight $\alpha$ on $\mathbb{A}^1$. We can embed $B/B' \hookrightarrow \mathbb{P}^1$ as before; the point $\infty = [1 : 0]$ is fixed by $B$, and $[0 : 1]$ is fixed by $B'$. With these modifications, (3.4) is proved as above. This proves the theorem.
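To see the shape of (3.4) in the smallest case, take $X = \mathbb{P}^1$ with the $B$-action used in the proof ($\dim N = 1$) and $Y = \{0\}$, which is $T$-stable but not $B$-stable. The sketch below is an illustration rather than part of the proof. It assumes that classes in $H_*^T(\mathbb{P}^1)$ can be compared through their restrictions to the fixed points $0$ and $\infty$ (using Poincaré duality on the smooth variety $\mathbb{P}^1$), and that the tangent weights at $0$ and $\infty$ are $\alpha$ and $-\alpha$.

```latex
% Restrictions (at 0, at \infty), under the assumptions stated above:
%   [\{0\}]_T        -> (\alpha, 0)
%   [\{\infty\}]_T   -> (0, -\alpha)
%   [\mathbb{P}^1]_T -> (1, 1)
\begin{align*}
[\{\infty\}]_T + \alpha\,[\mathbb{P}^1]_T
  &\;\longleftrightarrow\; (0,\,-\alpha) + (\alpha,\,\alpha) = (\alpha,\,0), \\
\text{hence}\qquad [\{0\}]_T &= [\{\infty\}]_T + \alpha\,[\mathbb{P}^1]_T .
\end{align*}
% This has the form of (3.4) with B-stable Y_1 = \{\infty\}, Y_2 = \mathbb{P}^1
% and multiplicities m_1 = m_2 = 1.
```

Here $\{\infty\}$ plays the role of $D$, and $\mathbb{P}^1$ is the image under $\rho$ of the closure $Z$ of the $B$-orbit of $(T, 0)$.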
∗ M. Brion is using the convention that if $X$ is a $T$-space, then $T$ acts on functions on $X$ by $(t \cdot f)(x) = f(tx)$, while we are using the convention that $T$ acts on functions by $(t \cdot f)(x) = f(t^{-1}x)$. Under Brion's convention, our function $g$ would be an eigenvector of weight $\alpha$.
4. Schubert varieties

4.1. Peterson's conjecture

Let $G$ be a complex semisimple group, and let $B \supset T$ be a Borel subgroup and a maximal torus, respectively. Let $N$ be the unipotent radical of $B$; let $B^- = TN^-$ be the opposite Borel. Choose a system of positive roots so that the roots in $\mathfrak{n}$ are positive. Let $W = N(T)/T$ denote the Weyl group; we abuse notation and write $w$ for an element of $W$ and also for a representative in $N(T)$. Let $X = G/B$ denote the flag variety. The $T$-fixed points are $\{wB\}_{w \in W}$. Let $X_w^0 = N \cdot wB \subset X$, and let $Y_w^0 = N^- \cdot wB$. Then $X = \coprod_w X_w^0$ (resp., $X = \coprod_w Y_w^0$) is a decomposition of $X$ as a disjoint union of finitely many $N$ (resp., $N^-$)-orbits. Let $X_w$ and $Y_w$ denote the closures of $X_w^0$ and $Y_w^0$, and let $\{x_w\}$ and $\{y_w\}$ be the bases of $H_T^* X$ dual (in the sense of Proposition 2.1) to $\{[X_w]_T\}$ and $\{[Y_w]_T\}$. Let $\alpha_1, \ldots, \alpha_\ell$ denote the simple roots. Any weight of $T$ on $\mathfrak{n}$ (resp., $\mathfrak{n}^-$) is a nonnegative (resp., nonpositive) linear combination of the simple roots. Therefore the next corollary is an immediate consequence of Theorem 3.1.

COROLLARY 4.1
With notation as above, write $x_u x_v = \sum_w a_{uv}^w x_w$ and $y_u y_v = \sum_w b_{uv}^w y_w$, with $a_{uv}^w$ and $b_{uv}^w$ in $H_T^*$. Then $a_{uv}^w$ (resp., $b_{uv}^w$) is a linear combination of monomials in the $\alpha_i$ (resp., $-\alpha_i$), with nonnegative coefficients.
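For orientation, the smallest case $G = SL_2$, $X = \mathbb{P}^1$, $W = \{e, s\}$ with a single simple root $\alpha$ can be worked out by hand. The following is an illustrative sketch, not part of the source. It assumes that classes are compared via restriction to the fixed points $eB$ and $sB$, and that the tangent weights there are $-\alpha$ at $eB$ (the weight on $\mathfrak{n}^-$) and $\alpha$ at $sB$ (the weight on $\mathfrak{n}$).

```latex
% Restrictions are listed as (value at eB, value at sB); assumptions as stated above.
% X_e = \{eB\}, X_s = X;  Y_s = \{sB\}, Y_e = X.
\begin{align*}
x_e &= 1, & x_s &\;\longleftrightarrow\; (0,\ \alpha), & x_s^2 &= \alpha\, x_s
  &&\Rightarrow\ a_{ss}^{s} = \alpha, \\
y_s &= 1, & y_e &\;\longleftrightarrow\; (-\alpha,\ 0), & y_e^2 &= -\alpha\, y_e
  &&\Rightarrow\ b_{ee}^{e} = -\alpha .
\end{align*}
```

The two signs match the two halves of the corollary: the coefficient for the $x$ basis is a monomial in $\alpha$, and the coefficient for the $y$ basis is a monomial in $-\alpha$.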
Remark
Theorem 3.1 can be applied to the varieties $X_w$ and $Y_w$, which are in general singular, to yield an analogue of Corollary 4.1 for $H_T^*(X_w)$ and $H_T^*(Y_w)$. The analogous result also holds for partial flag varieties.

Because $X$ is smooth, the map $H_T^*(X) \xrightarrow{\cap [X]_T} H_*^T(X)$ is an isomorphism. The next lemma is known (cf. [P]), but for lack of reference we give a proof.

LEMMA 4.2
The map $H_T^*(X) \xrightarrow{\cap [X]_T} H_*^T(X)$ takes $y_w$ to $[X_w]_T$.

Proof
We can identify $H_T^*(X)$ with $\operatorname{Hom}_{H_T^*}(H_*^T(X), H_T^*)$ (see the proof of Proposition 2.2). Hence any $\gamma \in H_T^*(X)$ is uniquely determined by the values $\pi_*^T(\gamma \cap h')$ as $h'$ ranges over the basis $\{[Y_{w'}]_T\}$ of $H_*^T(X)$. Now, if $\gamma \in H_T^*(X)$ satisfies $\gamma \cap [X]_T = h$, then $\gamma \cap h' = h \cdot h'$. Indeed,
the intersection product on $H_*^T(X)$ satisfies the following: if $\gamma' \cap [X]_T = h'$, then $\gamma \cdot \gamma' \cap [X]_T = h \cdot h'$; but $\gamma \cdot \gamma' \cap [X]_T = \gamma \cap (\gamma' \cap [X]_T) = \gamma \cap h'$. Combining these facts, we see that to show $y_w \cap [X]_T = [X_w]_T$, it suffices to show
$$\pi_*^X([X_w]_T [Y_{w'}]_T) = \pi_*^X(y_w \cap [Y_{w'}]_T) = \delta_{ww'}.$$
Now, for any $w, w'$, the intersection $X_w \cap Y_{w'}$ is $T$-invariant and is known to satisfy $\operatorname{codim} X_w \cap Y_{w'} = \operatorname{codim} X_w + \operatorname{codim} Y_{w'}$. (Indeed, by [KL], $X_w \cap Y_{w'}^0$ is irreducible and of dimension $\dim X - \operatorname{codim} X_w - \operatorname{codim} Y_{w'}$, but by [F, p. 137] each component of $X_w \cap Y_{w'}$ has at least that dimension. It follows that $X_w \cap Y_{w'}^0$ is dense in $X_w \cap Y_{w'}$.) Hence $[X_w]_T [Y_{w'}]_T$ is a multiple of $[X_w \cap Y_{w'}]_T$. If $\dim X_w \cap Y_{w'} > 0$, then $\dim (X_w \cap Y_{w'})_T > \dim BT$, so $\pi_*^X([X_w \cap Y_{w'}]_T) = 0$. If $\dim X_w \cap Y_{w'} = 0$, then $w = w'$ and $X_w$ and $Y_w$ intersect with multiplicity 1 at the point $wB$ [C, Prop. 2]. Hence $\pi_T^X : X_T \to BT$ maps $(X_w \cap Y_w)_T$ isomorphically onto $BT$, and therefore $\pi_*^X([X_w]_T [Y_w]_T) = \pi_*^X([X_w \cap Y_w]_T) = 1$. This proves the lemma.

The intersection product on $H_*^T(X)$ is induced by the product on $H_T^*(X)$, via the isomorphism $\cap [X]_T$. Lemma 4.2 and Corollary 4.1 therefore imply the following corollary.

COROLLARY 4.3
The intersection product on $H_*^T(X)$ is given by $[Y_u]_T [Y_v]_T = \sum_w a_{uv}^w [Y_w]_T$ (resp., $[X_u]_T [X_v]_T = \sum_w b_{uv}^w [X_w]_T$), where each $a_{uv}^w$ (resp., $b_{uv}^w$) in $H_T^*$ is a sum of monomials in $\alpha_1, \ldots, \alpha_\ell$ (resp., $-\alpha_1, \ldots, -\alpha_\ell$), with nonnegative coefficients.
Corollaries 4.1 and 4.3 were conjectured by Dale Peterson.

4.2. Billey's conjecture

B. Kostant and S. Kumar [KK] defined functions (for each $w \in W$) $\xi^w : W \to S(\hat{T}) \subset S(\mathfrak{t}^*)$, and showed that, for any $u, v \in W$, one can write
$$\xi^u \xi^v = \sum_w p_w^{uv} \xi^w$$
for unique $p_w^{uv} \in S(\mathfrak{t}^*)$. Billey [Bi] observed in examples that if $\nu \in \mathfrak{t}$ satisfies $\alpha(\nu) > 0$ for all positive roots $\alpha$, then $p_w^{uv}(\nu) \geq 0$, and asked if a geometric proof was possible. Arabia [A] proved the following relation of the functions $\xi^w$ to the $T$-equivariant cohomology of the flag variety. We use the notation of the preceding subsection; thus $i_w : wB \to G/B = X$ denotes the inclusion, and $i_w^* : H_T^*(X) \to H_T^*(wB) = H_T^*$ denotes the pullback. As usual, we identify $H_T^*(X)$ with $H_*^T(X)$.
THEOREM 4.4
(1) The functions $\xi^w$ are related to pullbacks of cohomology classes by $i_u^* x_w = \xi^{w^{-1}}(u^{-1})$.
(2) The polynomials $p_w^{uv}$ are related to the multiplication in $H_T^*(X)$ by $p_{w^{-1}}^{u^{-1},v^{-1}} = a_{uv}^w$.
This is proved (in the general Kac-Moody case) in [A, Theorem 4.2.1]. We have stated this theorem using the conventions of [KK] for the functions $\xi^w$; below we explain the relationship between the conventions of [A] and [KK]. Note that (2) follows immediately from (1) since (as noted by Arabia) the pullback $\oplus i_w^* : H_T^*(X) \to \oplus_{w \in W} H_T^*$ is injective. As a consequence, we obtain Billey's conjecture.

COROLLARY 4.5
If $\nu \in \mathfrak{t}$ satisfies $\alpha(\nu) > 0$ for all positive roots $\alpha$, then $p_w^{uv}(\nu) \geq 0$.
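A hedged illustration of the corollary in the rank-one case $G = SL_2$, $W = \{e, s\}$, where every element of $W$ is its own inverse, so the inverses in Theorem 4.4 play no role. The values below are obtained from Theorem 4.4(1) as stated, together with the assumption (not proved here) that $x_e = 1$ and that $x_s$ restricts to $0$ at $eB$ and to the positive root $\alpha$ at $sB$.

```latex
% Rank-one sketch: functions on W = \{e, s\} written as (value at e, value at s).
% Assumption: i_e^* x_s = 0 and i_s^* x_s = \alpha, so by Theorem 4.4(1):
\begin{align*}
\xi^e &= (1,\ 1), & \xi^s &= (0,\ \alpha), \\
\xi^s \xi^s &= (0,\ \alpha^2) = \alpha\,\xi^s,
  & \text{so}\quad p^{ss}_s &= \alpha, \quad p^{ss}_e = 0 .
\end{align*}
% For \nu \in \mathfrak{t} with \alpha(\nu) > 0 this gives
% p^{ss}_s(\nu) = \alpha(\nu) > 0 and p^{ss}_e(\nu) = 0, both nonnegative.
```

The remaining products involve $\xi^e = 1$ and give coefficients $0$ or $1$, so every $p_w^{uv}(\nu)$ is nonnegative in this case.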
Proof
This follows immediately from Theorem 4.4 and Corollary 4.1.

We now discuss the conventions of [A] and [KK]. Let $\mathbb{C}[W]$ denote the group algebra over $\mathbb{C}$ of $W$; let $Q$ be the quotient field of $S(\mathfrak{t}^*)$. Kostant and Kumar set $Q_W = \mathbb{C}[W] \otimes Q$; Arabia defines $Q$ and $Q_W$ with rational rather than complex coefficients, but we ignore this difference. Both [KK] and [A] define elements $\xi^w \in \operatorname{Hom}_Q(Q_W, Q)$, but with different conventions; if we use $\xi^w$ for the elements defined in [KK] and $\xi_A^w$ for the elements defined in [A], then $\xi^w = \xi_A^{w^{-1}}$.

Let $F(W, Q)$ denote the set of functions from $W$ to $Q$. Both [KK] and [A] use identifications $F(W, Q) \xrightarrow{\ \simeq\ } \operatorname{Hom}_Q(Q_W, Q)$; we denote their respective identifications by
$$\begin{aligned}
f &\mapsto f_K, && \text{where } f_K(\delta_u \otimes 1) = f(u) && \text{(see [KK, (4.17)])}, \\
f &\mapsto f_A, && \text{where } f_A(\delta_u \otimes 1) = f(u^{-1}) && \text{(see [A, Section 4.1])}.
\end{aligned}$$
If we define $f^w$ and $g^w$ in $F(W, Q)$ by $f_K^w = \xi^w$, $g_A^w = \xi_A^w$, then $f^w(u) = g^{w^{-1}}(u^{-1})$. Arabia uses the injection
$$\oplus i_u^* : H_T^*(X) \hookrightarrow \oplus H_T^* \simeq F(W, S(\mathfrak{t}^*)) \subset F(W, Q)$$
to identify $H_T^*(X)$ with a subset of $F(W, Q)$. In his paper he proves that, under this identification, $g^w$ corresponds to what we have denoted by $x_w \in H_T^*(X)$. In [KK]
there is no separate notation introduced for the $f^w$, but rather they are identified with $\xi^w$; that is, the $\xi^w$ are viewed as elements of $F(W, Q)$. If we return to their notation, we see $\xi^{w^{-1}}(u^{-1}) = i_u^* x_w$, as stated in Theorem 4.4.

Note that if we let $\xi_B^w$ denote the functions used by Billey, then $\xi_B^w(u) = \xi^{w^{-1}}(u^{-1})$. We also remark that the notation $x_w$ in this paper does not have the same meaning as it does in [A] and [KK].

4.3. The Kac-Moody case

The analogues of Corollaries 4.1 and 4.5 are also valid for flag varieties (complete or partial) of Kac-Moody groups. The key point is that such a flag variety, although in general infinite-dimensional, can be approximated by finite-dimensional varieties for which the hypotheses of Theorem 3.1 are satisfied. Indeed, this was exactly the geometric motivation of Kumar and Nori. We briefly sketch how this works in equivariant cohomology. The basic facts we need can be found in [Sl], to which we refer for a more detailed explanation of the notation.

Let $G$ be a Kac-Moody group, and let $B$ be a Borel subgroup; let $X = G/B$ denote the flag variety. The group $B$ is a proalgebraic group (inverse limit of algebraic groups), and it has a Levi decomposition $B = TN$, where $N$ is a proalgebraic prounipotent group (denoted by $U$ in [Sl] and [KN]) and $T$ is a finite-dimensional torus. The space $X$ has the structure of an ind-variety; it is realized as a union $X = \cup_{k \geq 0} X_k$, where each $X_k$ is a finite-dimensional variety embedded as a closed subvariety of $X_{k+1}$. Here $X_k$ is defined as follows. We have $X = \coprod X_w^0$, realizing $X$ as a disjoint union of Schubert cells $X_w^0 = B \cdot wB$. The union is over all elements of the Weyl group $W$; each $X_w^0$ is isomorphic to the affine space $\mathbb{A}^{l(w)}$, where $l(w)$ is the length of $w$. By definition, $X_k = \coprod_{l(w) \leq k} X_w^0$; this is a finite-dimensional projective variety that is paved by affines.
Moreover, each $X_k$ is $B$-stable, and there exists a subgroup $N_k \subset N$, normal in $B$, such that $B_k = B/N_k$ is a finite-dimensional solvable group, and the action of $B$ on $X_k$ factors through the map $B \to B_k$. Each $X_k$ therefore satisfies the hypotheses of Theorem 3.1. As in the finite case, there is a set of simple roots $\alpha_1, \ldots, \alpha_l$ in $\mathfrak{t}^*$, and, moreover, for any $k$ every weight in $\operatorname{Lie}(N/N_k)$ is a nonnegative linear combination of simple roots.

Now, for any fixed $i$ the pullback $H_T^i(X) \to H_T^i(X_k)$ is a canonical isomorphism for $k$ sufficiently large (as the decomposition of $X$ into Schubert cells makes $X$ a CW-complex and $X_k$ contains all cells in $X$ of dimension less than or equal to $2k$, and the same is true for the mixed spaces $(X_k)_T$ and $X_T$). There is a basis $\{x_w\}$ of $H_T^*(X)$ dual to the fundamental classes $[X_w]_T$ in the sense that the pullbacks to $H_T^*(X_k)$ form a basis dual to the $[X_w]_T \in H_*^T(X_k)$, for $l(w) \leq k$. This basis does not depend on $k$, as can be seen using property (2.2) of the pairing, applied to the inclusion map of $X_k$ into $X_{k+1}$. Theorem 3.1 therefore implies the following corollary, also conjectured by Peterson.
COROLLARY 4.6
With notation as above, if $X$ is the flag variety of a Kac-Moody group, with basis $\{x_w\}$ of $H_T^*(X)$, then $x_u x_v = \sum_w a_{uv}^w x_w$, with $a_{uv}^w \in H_T^*$ a linear combination of monomials in the $\alpha_i$, with nonnegative coefficients.
Acknowledgments. The author would like to thank Michel Brion and James Carrell for some useful e-mail.

References

[A] A. ARABIA, Cohomologie T-équivariante de la variété de drapeaux d'un groupe de Kac-Moody, Bull. Soc. Math. France 117 (1989), 129–165. MR 90i:32042
[Bi] S. BILLEY, Kostant polynomials and the cohomology ring for G/B, Duke Math. J. 96 (1999), 205–224. MR 2000a:14060
[Bo1] A. BOREL, Sur la cohomologie des espaces fibrés principaux et des espaces homogènes de groupes de Lie compacts, Ann. of Math. (2) 57 (1953), 115–207. MR 14:490e
[Bo2] A. BOREL, Linear Algebraic Groups, 2d ed., Grad. Texts in Math. 126, Springer, New York, 1991. MR 92d:20001
[Br1] M. BRION, Equivariant Chow groups for torus actions, Transform. Groups 2 (1997), 225–267. MR 99c:14005
[Br2] M. BRION, "Equivariant cohomology and equivariant intersection theory" in Representation Theories and Algebraic Geometry (Montreal, 1997), ed. A. Broer and A. Daigneault, NATO Adv. Sci. Inst. Ser. C Math. Phys. Sci. 514, Kluwer, Dordrecht, 1998, 1–37. MR 99m:14005
[C] C. CHEVALLEY, "Sur les décompositions cellulaires des espaces G/B" in Algebraic Groups and Their Generalizations: Classical Methods (University Park, Pa., 1991), Proc. Sympos. Pure Math. 56, Amer. Math. Soc., Providence, 1994, 1–23. MR 95e:14041
[EG] D. EDIDIN and W. GRAHAM, Equivariant intersection theory, Invent. Math. 131 (1998), 595–634. MR 99j:14003a
[F] W. FULTON, Intersection Theory, Ergeb. Math. Grenzgeb. (3) 2, Springer, Berlin, 1984. MR 85k:14004
[F2] W. FULTON, Flags, Schubert polynomials, degeneracy loci, and determinantal formulas, Duke Math. J. 65 (1992), 381–420. MR 93e:14007
[F3] W. FULTON, Determinantal formulas for orthogonal and symplectic degeneracy loci, J. Differential Geom. 43 (1996), 276–290. MR 98d:14004
[FM] W. FULTON and R. MACPHERSON, Categorical framework for the study of singular spaces, Mem. Amer. Math. Soc. 31 (1981), no. 243. MR 83a:55015
[GKM] M. GORESKY, R. KOTTWITZ, and R. MACPHERSON, Equivariant cohomology, Koszul duality, and the localization theorem, Invent. Math. 131 (1998), 25–83. MR 99c:55009
[G] W. GRAHAM, The class of the diagonal in flag bundles, J. Differential Geom. 45 (1997), 471–487. MR 98j:14070
[H] A. HIRSCHOWITZ, Le groupe de Chow équivariant, C. R. Acad. Sci. Paris Sér. I Math. 298 (1984), 87–89. MR 85j:14007
[KL] D. KAZHDAN and G. LUSZTIG, "Schubert varieties and Poincaré duality" in Geometry of the Laplace Operator (Honolulu, 1979), Proc. Sympos. Pure Math. 36, Amer. Math. Soc., Providence, 1980, 185–203. MR 84g:14054
[KK] B. KOSTANT and S. KUMAR, The nil Hecke ring and cohomology of G/P for a Kac-Moody group G, Adv. in Math. 62 (1986), 187–237. MR 88b:17025b
[KN] S. KUMAR and M. NORI, Positivity of the cup product in cohomology of flag varieties associated to Kac-Moody groups, Internat. Math. Res. Notices 1998, 757–763. MR 99i:14061
[LS] A. LASCOUX and M.-P. SCHÜTZENBERGER, "Interpolation de Newton à plusieurs variables" in Séminaire d'algèbre Paul Dubreil et Marie-Paule Malliavin (Paris, 1983/84), Lecture Notes in Math. 1146, Springer, Berlin, 1985, 161–175. MR 88h:05020
[M] J. MUNKRES, Elements of Algebraic Topology, Addison-Wesley, Menlo Park, Calif., 1984. MR 85m:55001
[P] D. PETERSON, lectures, 1997.
[PR] P. PRAGACZ and J. RATAJSKI, Formulas for Lagrangian and orthogonal degeneracy loci: Q̃-polynomial approach, Compositio Math. 107 (1997), 11–87. MR 98g:14063
[Sl] P. SLODOWY, "On the geometry of Schubert varieties attached to Kac-Moody Lie algebras" in Proceedings of the 1984 Vancouver Conference in Algebraic Geometry, ed. J. Carrell, A. Geramita, and P. Russell, CMS Conf. Proc. 6, Amer. Math. Soc., Providence, 1986, 405–442. MR 87i:14043
[Sp] E. SPANIER, Algebraic Topology, McGraw-Hill, New York, 1966. MR 35:1007
[T] B. TOTARO, "The Chow ring of a classifying space" in Algebraic K-Theory (Seattle, 1997), Proc. Sympos. Pure Math. 67, Amer. Math. Soc., Providence, 1999, 249–281. MR CMP 1 743 244
University of Georgia, Department of Mathematics, Boyd Graduate Studies Research Center, Athens, Georgia 30602, USA