$f + \psi g$ that divides both $f$ and $g$. Since $f$ is irreducible and $d$ is positive, $d = 1$ or $f$. By (3), $d \neq f$. Therefore $\phi f + \psi g = 1$. Thus $\phi f h + \psi g h = h$ is divisible by $f$, as was to be shown, because $f$ divides both terms on the left.

Next consider the case of Theorem 2 in which $f$ contains fewer than $m$ indeterminates:

A Special Case. An irreducible element of $\mathbf{Z}[x_1, x_2, \ldots, x_{m-1}]$ is prime as an element of $\mathbf{Z}[x_1, x_2, \ldots, x_m]$.

Let $f$ be an irreducible polynomial in $x_1, x_2, \ldots, x_{m-1}$ with integer coefficients, and let $g(x)$ and $h(x)$ be polynomials in $x, x_1, x_2, \ldots, x_{m-1}$ with integer coefficients for which (2) and (3) hold. Then all coefficients of $g(x)h(x)$ are divisible by $f$, but at least one of the coefficients of $g(x)$ is not divisible by $f$. Let $g(x) = a_0 x^{\mu} + a_1 x^{\mu-1} + \cdots + a_{\mu}$, where the $a_i$ are polynomials in $x_1, x_2, \ldots, x_{m-1}$, and let $I$ be the least index for which $a_I$ is not divisible by $f$. If $h(x)$ were not divisible by $f$, then, in the same way, when $h(x)$ was written in the form $h(x) = b_0 x^{\nu} + b_1 x^{\nu-1} + \cdots + b_{\nu}$ there would be a least index $J$ for which $b_J$ was not divisible by $f$. Then the coefficient of $x^{\mu+\nu-I-J}$ in $g(x)h(x)$ would be $a_I b_J$ plus terms divisible by $f$ (this coefficient is $a_I b_J$ plus terms that are products in which one factor is divisible by $f$). But $a_I b_J$ would not be divisible by $f$ by the inductive hypothesis, so $g(x)h(x)$ would not then be divisible by $f$, contrary to hypothesis. Therefore $h(x)$ must be divisible by $f$, as was to be shown.

The general case, in which $f$ may contain $x$, can now be deduced from the case of $m - 1$ indeterminates as follows: The Euclidean algorithm gives equations $d(x) = \phi(x)f(x) + \psi(x)g(x)$, $f(x) = q_1(x)d(x)$, $g(x) = q_2(x)d(x)$, where $d$, $\phi$, $\psi$, $q_1$, $q_2$ are polynomials in $x$ with coefficients in the field of rational functions in $x_1, x_2, \ldots, x_{m-1}$. Let $\delta$ be a common denominator of all five of these polynomials.
(For example, $\delta$ could be taken to be the product of all denominators of all coefficients of the five.) Then $D(x) = \delta \cdot d(x)$, $\Phi(x) = \delta \cdot \phi(x)$, $\Psi(x) = \delta \cdot \psi(x)$, $Q_1(x) = \delta \cdot q_1(x)$, and $Q_2(x) = \delta \cdot q_2(x)$ all are in $\mathbf{Z}[x, x_1, x_2, \ldots, x_{m-1}]$ and they satisfy $D(x) = \Phi(x)f(x) + \Psi(x)g(x)$, $\delta^2 \cdot f(x) = Q_1(x)D(x)$ and $\delta^2 \cdot g(x) = Q_2(x)D(x)$. By the special case already proved, each irreducible factor $\varepsilon$ of $\delta^2$ divides either $Q_1(x)$ or $D(x)$. Therefore, $\delta^2 \cdot f(x) = Q_1(x)D(x)$ can be divided by each of the irreducible factors of $\delta^2$ in succession to find $f(x) = (Q_1(x)/\varepsilon_1) \cdot (D(x)/\varepsilon_2)$ where $\delta^2 = \varepsilon_1\varepsilon_2$. By (1), $D(x)/\varepsilon_2$ must be $\pm 1$ or $\pm f(x)$. By (3), $D(x)/\varepsilon_2 \neq \pm f(x)$. Therefore,
1 A Fundamental Theorem
$D(x) = \pm\varepsilon_2$, so $\pm\varepsilon_2 h(x) = \Phi(x) \cdot f(x) \cdot h(x) + \Psi(x) \cdot g(x) \cdot h(x)$, which shows that $f(x)$ divides $\varepsilon_2 h(x)$, say $\varepsilon_2 h(x) = Q_3(x) \cdot f(x)$. By the special case, each irreducible factor of $\varepsilon_2$ divides $Q_3(x)$, so $f(x)$ divides $h(x)$, as was to be shown. Thus Theorem 2 follows by induction.

Corollary 1 (Unique factorization of polynomials with integer coefficients). If $\phi_1\phi_2 \cdots \phi_{\mu} = \psi_1\psi_2 \cdots \psi_{\nu}$ where the factors on both sides are irreducible polynomials with integer coefficients, then $\mu = \nu$, and the factors can be so ordered that $\phi_i = -\psi_i$ for an even number of values of $i$, and $\phi_i = \psi_i$ for all others.

Deduction. Let such an equation $\phi_1\phi_2 \cdots \phi_{\mu} = \psi_1\psi_2 \cdots \psi_{\nu}$ be given in which $\mu \ge 1$. Since Theorem 2 implies that $\phi_1$ divides $\psi_j$ for some $j$, the $\psi$'s can be rearranged to make $\phi_1$ divide $\psi_1$, say $\psi_1 = q_1\phi_1$. Then $\phi_2\phi_3 \cdots \phi_{\mu} = q_1\psi_2\psi_3 \cdots \psi_{\nu}$ is a product of $\nu$ factors, at least $\nu - 1$ of which are irreducible. If $\mu$ were less than $\nu$, $\mu$ iterations of this step would express 1 as a product of $\nu$ factors, at least $\nu - \mu$ of which were irreducible, contrary to the fact that the only factors of 1, the units 1 and $-1$, are not irreducible. Therefore, $\mu \ge \nu$. For the same reason, $\nu \ge \mu$, so $\mu$ and $\nu$ must be equal in any such equation. In the first equation $\phi_2\phi_3 \cdots \phi_{\mu} = q_1\psi_2\psi_3 \cdots \psi_{\nu}$ found by the process above, $q_1$ must therefore have no irreducible factors and therefore must be a unit. Thus, $\mu$ steps rearrange the $\psi$'s in such a way that $\psi_i = q_i\phi_i$ for each $i$, where $q_i$ is a unit and $1 = q_1 q_2 \cdots q_{\mu}$. The last equation shows that the number of $q$'s that are $-1$ is even, and the corollary follows.

Corollary 2 (Gauss's* lemma).
If an element of $\mathbf{Z}[x, x_1, x_2, \ldots, x_{m-1}]$ is reducible over the field of rational functions of $x_1, x_2, \ldots, x_{m-1}$ in the sense that it can be written as a product of two polynomials of positive degree in $x$ with coefficients in this field, then it is reducible as an element of $\mathbf{Z}[x, x_1, x_2, \ldots, x_{m-1}]$.
Deduction. Let $f(x)$ be reducible over the field of rational functions, say $f(x) = g(x)h(x)$, and let $d_1$ and $d_2$ be elements of $\mathbf{Z}[x_1, x_2, \ldots, x_{m-1}]$ that clear the denominators of $g(x)$ and $h(x)$ respectively, in other words, elements such that $G(x) = d_1 g(x)$ and $H(x) = d_2 h(x)$ are in $\mathbf{Z}[x, x_1, x_2, \ldots, x_{m-1}]$. Then $d_1 d_2 f(x) = G(x)H(x)$; this equation can be divided successively by the irreducible factors of $d_1 d_2$ to produce an equation $f(x) = \frac{G(x)}{\varepsilon_1} \cdot \frac{H(x)}{\varepsilon_2}$ (where

* Gauss's original statement was that a product of monic polynomials with rational coefficients can have integer coefficients only if the factors do. The same is true for $m > 1$: A product of monic polynomials whose coefficients are rational functions in $x_1, x_2, \ldots, x_{m-1}$ can have coefficients in $\mathbf{Z}[x_1, x_2, \ldots, x_{m-1}]$ only if the factors do. This statement can be proved in the same way as the statement above: When $g(x)$ and $h(x)$ are monic, $f(x)$ is monic, so its factors $\frac{G(x)}{\varepsilon_1}$ and $\frac{H(x)}{\varepsilon_2}$ are monic, which implies that $\frac{d_1}{\varepsilon_1} = 1$ and $\frac{d_2}{\varepsilon_2} = 1$ and therefore that $g(x) = \frac{G(x)}{\varepsilon_1}$ and $h(x) = \frac{H(x)}{\varepsilon_2}$. For more on Gauss's lemma, see Essay 2.5.
Essay 1.4 Factorization of Polynomials with Integer Coefficients
$\varepsilon_1$ and $\varepsilon_2$ are integers for which $\varepsilon_1\varepsilon_2 = d_1 d_2$, and both factors have integer coefficients), which shows that $f(x)$ is reducible in $\mathbf{Z}[x, x_1, x_2, \ldots, x_{m-1}]$.

The methods used to prove Corollary 1 prove another proposition:

Proposition. If $\phi_1\phi_2 \cdots$
Essay 1.5 A Factorization Algorithm

Die im Art. 1 aufgestellte Definition der Irreductibilität entbehrt so lange einer sicheren Grundlage, als nicht eine Methode angegeben ist, mittels deren bei einer bestimmten, vorgelegten Function entschieden werden kann, ob dieselbe der aufgestellten Definition gemäss irreductibel ist oder nicht. (The definition of irreducibility given in Art. 1 lacks a firm foundation until a method is given that makes it possible to determine whether a given example does or does not satisfy it.)
L. Kronecker [39, §4]

The naive method of constructing a splitting field that was sketched at the beginning of the preceding essay suggests that the factorization into irreducibles of a polynomial with coefficients in a root field (see Essay 1.3) might be a key tool in the proof of the fundamental theorem. In fact, as later essays will show, it suffices to be able to factor monic, irreducible polynomials with coefficients in $\mathbf{Z}[c_1, c_2, \ldots, c_{\nu}]$ itself, which is the problem treated by the following theorem:

Theorem. Given monic, irreducible polynomials $f$ and $g$ in one indeterminate with coefficients in the ring $\mathbf{Z}[c_1, c_2, \ldots, c_{\nu}]$ of polynomials in $c_1, c_2, \ldots, c_{\nu}$ with integer coefficients, factor $f(x)$ as a polynomial with coefficients in the root field of $g(y)$.
In other words, one is to construct a congruence

(1)  $f(x) \equiv \phi_1(x,y)\,\phi_2(x,y) \cdots \phi_k(x,y) \bmod g(y)$
in which the factors $\phi_i$ are polynomials in two indeterminates with coefficients in the field of rational functions in $c_1, c_2, \ldots, c_{\nu}$ that are monic and of positive degree in $x$, that have degree in $y$ less than* the degree of $g$, and that cannot be written, mod $g(y)$, as products of polynomials of lower degree in $x$.

Notation: The ring of polynomials in $c_1, c_2, \ldots, c_{\nu}$ with integer coefficients will be denoted by $R = \mathbf{Z}[c_1, c_2, \ldots, c_{\nu}]$. Its field of quotients, the field of rational functions in $c_1, c_2, \ldots, c_{\nu}$, will be denoted by $K$. When $\nu = 0$, $R$ is the ring of integers and $K$ is the field of rational numbers.

This essay is devoted to a description of the algorithm for finding the factorization of $f(x)$ mod $g(y)$ in the form of a congruence (1). The validity of the algorithm will be proved in the next essay.

The factorization algorithm will make use of computations in the ring $R[x,y]$ mod $(f(x), g(y))$ of polynomials in $x, y, c_1, c_2, \ldots, c_{\nu}$ with integer coefficients, where two such polynomials represent the same ring element, by definition, if their difference is a multiple of $f(x)$ plus a multiple of $g(y)$

* As far as (1) is concerned, $\phi_i$ can be replaced by any polynomial that is congruent to it mod $g(y)$, and this condition on the degrees of the $\phi$'s need not be satisfied. Restricting the degree in this way serves to determine the factors $\phi_i(x,y)$ once $f(x)$ and $g(y)$ are given, as is shown by the final proposition of Essay 1.4.
(the multipliers being, of course, in $R[x,y]$). Since $f(x)$ is monic of degree $m$, $x^m - f(x)$ is a polynomial of degree less than $m$ in $x$ that represents the same ring element as $x^m$, so any ring element that is represented by a polynomial of degree $m + j$ in $x$ for $j \ge 0$ can also be represented by a polynomial whose degree in $x$ is less than $m + j$ (replace the leading term $\phi(y)x^{m+j}$ in $x$ with $\phi(y)x^j(x^m - f(x))$ while leaving the other terms unchanged). Thus, every ring element can be represented by a polynomial whose degree in $x$ is less than $m$. In fact, every element of $R[x,y]$ is congruent mod $f(x)$ to one and only one polynomial of degree less than $m$ in $x$, because an element of $R[x,y]$ whose degree in $x$ is less than $m$ can be a multiple of $f(x)$ only if it is zero. In the same way, any element of $R[x,y]$ is congruent mod $g(y)$ to just one element whose degree in $y$ is less than $n$. Moreover, since the reduction method can be applied to each coefficient $\phi_i(y)$ of a polynomial $\phi_1(y)x^{m-1} + \phi_2(y)x^{m-2} + \cdots + \phi_m(y)$ that has already been reduced mod $f(x)$, every element of $R[x,y]$ mod $(f(x), g(y))$ is represented by one and only one element of $R[x,y]$ whose degree in $x$ is less than $m$ and whose degree in $y$ is less than $n$.

Each element of this ring $R[x,y]$ mod $(f(x), g(y))$ is a root of a monic polynomial with coefficients in $R$. Specifically, if $\phi(x,y)$ is an element of $R[x,y]$, a monic polynomial $\mathcal{F}(z)$ of degree $mn$ with coefficients in $R$ for which $\mathcal{F}(\phi(x,y)) \equiv 0 \bmod (f(x), g(y))$ can be constructed in the following way: For each of the $mn$ monomials $x^i y^j$ in which $0 \le i < m$ and $0 \le j < n$, the polynomial $\phi(x,y)x^i y^j$ is congruent mod $f(x)$ and $g(y)$, as was just seen, to a sum of multiples of $x^{\alpha} y^{\beta}$, where $0 \le \alpha < m$ and $0 \le \beta < n$, in which the multipliers are in $R$. Thus, the congruence

$$\phi(x,y)\,x^i y^j \equiv \sum_{\alpha=0}^{m-1} \sum_{\beta=0}^{n-1} M_{ij,\alpha\beta}\, x^{\alpha} y^{\beta} \bmod (f(x), g(y))$$
determines an $mn \times mn$ matrix $M$ of elements $M_{ij,\alpha\beta}$ of $R$ once an ordering of the $mn$ monomials $x^i y^j$ is decided upon. Otherwise stated, $M$ is the matrix that represents multiplication by $\phi(x,y)$ relative to the basis $x^i y^j$ of $R[x,y]$ mod $(f(x), g(y))$ over $R$. The characteristic polynomial of this matrix, which is to say the polynomial $\mathcal{F}(z) = \det(zI - M)$, is monic of degree $mn$ in $z$; by the Cayley-Hamilton theorem, it satisfies $\mathcal{F}(\phi(x,y)) \equiv 0 \bmod (f(x), g(y))$. (A proof will be given in the next essay.)

Let this construction be applied not to a single polynomial $\phi(x,y)$ but to $tx + uy$, regarded as a polynomial in new indeterminates $t$ and $u$ whose coefficients are in $R[x,y]$. The result is a polynomial $\mathcal{F}(z,t,u)$ in $z$, $t$, and $u$ with coefficients in $R$. Specifically, $\mathcal{F}$ is the characteristic polynomial $\det(zI - M)$ of the $mn \times mn$ matrix $M$ determined by $C \cdot (tx + uy) \equiv MC \bmod (f(x), g(y))$, where $C$ is the column matrix of length $mn$ whose entries are the monomials $x^i y^j$ in which $0 \le i < m$ and $0 \le j < n$ arranged in some order and $tx + uy$ is a $1 \times 1$ matrix. The entries of $M$ are homogeneous polynomials of degree 1 in $t$ and $u$ with coefficients in $R$, so the entries of $zI - M$ are
homogeneous of degree 1 in $z$, $t$, and $u$. Thus, $\mathcal{F}(z,t,u)$ is homogeneous of degree $mn$ in these indeterminates and has coefficients in $R$; moreover, it is monic in $z$.

As was seen in the last essay, the irreducible factors of $\mathcal{F}(z,t,u)$ as a polynomial with integer coefficients (in $3 + \nu$ indeterminates) can be found, say $\mathcal{F}(z,t,u) = \prod_i \mathcal{F}_i(z,t,u)$, where the $\mathcal{F}_i(z,t,u)$ are irreducible. Because $\mathcal{F}$ is homogeneous, its irreducible factors $\mathcal{F}_i$ are homogeneous. Because $\mathcal{F}$ is monic in $z$, the leading coefficient of each of its irreducible factors $\mathcal{F}_i$ as a polynomial in $z$ is $\pm 1$, so one can stipulate that each $\mathcal{F}_i$ is monic in $z$, and this condition determines the $\mathcal{F}_i$ completely.

The required factorization

(1)  $f(x) \equiv \phi_1(x,y)\,\phi_2(x,y) \cdots \phi_k(x,y) \bmod g(y)$

contains one factor $\phi_i(x,y)$ for each $\mathcal{F}_i(z,t,u)$. It is constructed as follows: As will be shown, the degree of $\mathcal{F}_i$ (it is homogeneous in $z$, $t$, and $u$) is a multiple of $n$, say it is $\mu_i n$. (By symmetry, this degree is also a multiple of $m$, a fact that is not of interest here.) Substitute $tx + uy$ for $z$ and 1 for $t$ in $\mathcal{F}_i$ and write the result in the form

(2)  $\mathcal{F}_i(x + uy, 1, u) = B_{i,0}u^{\mu_i n} + B_{i,1}u^{\mu_i n - 1} + B_{i,2}u^{\mu_i n - 2} + \cdots + B_{i,\mu_i n}.$

Each coefficient $B_{i,j}$ is a polynomial in $x, y, c_1, c_2, \ldots, c_{\nu-1}$, and $c_{\nu}$ with integer coefficients. The first $\mu_i$ of these coefficients are all zero mod $g(y)$, which is to say that reduction mod $g(y)$ gives $\mathcal{F}_i(x + uy, 1, u) \equiv \psi_i u^{\mu_i(n-1)} + \cdots \bmod g(y)$, where the omitted terms are of lower degree in $u$ and $\psi_i \equiv B_{i,\mu_i} \bmod g(y)$. The factor $\phi_i(x,y)$ of $f(x)$ mod $g(y)$ corresponding to this factor $\mathcal{F}_i$ of $\mathcal{F}$ is

(3)  $\phi_i(x,y) \equiv \dfrac{\psi_i(x,y)}{g'(y)^{\mu_i}} \bmod g(y),$
where $g'(y)$ is the derivative of $g(y)$. (Implicit in this statement, since $\phi_i(x,y)$ is monic in $x$, is the statement that $\psi_i \equiv g'(y)^{\mu_i}x^{\mu_i} + \cdots \bmod g(y)$ where the omitted terms have lower degree in $x$.)

Example 1. $f(x) = x^2 - 2$ and $g(y) = y^2 - 3$. The first step is to find $\mathcal{F}(z,t,u)$ for this $f$ and $g$. When the monomials $x^{\alpha}y^{\beta}$ for $0 \le \alpha < 2$ and $0 \le \beta < 2$ are put in the order $1, x, y, xy$, the matrix that represents multiplication by $tx + uy$ becomes

$$\begin{bmatrix} 0 & t & u & 0 \\ 2t & 0 & 0 & u \\ 3u & 0 & 0 & t \\ 0 & 3u & 2t & 0 \end{bmatrix}.$$

Therefore $\mathcal{F}$ is the determinant of

$$\begin{bmatrix} z & -t & -u & 0 \\ -2t & z & 0 & -u \\ -3u & 0 & z & -t \\ 0 & -3u & -2t & z \end{bmatrix},$$
which can be found without too much paper-and-pencil calculation to be $z^4 - (4t^2 + 6u^2)z^2 + 4t^4 - 12t^2u^2 + 9u^4$. This polynomial $\mathcal{F}(z,t,u)$ is irreducible because $\mathcal{F}(z,1,1) = z^4 - 10z^2 + 1$ obviously has no root mod 5, so it can only have a factorization of the form $(z^2 + az + b)(z^2 + cz + d) = z^4 - 10z^2 + 1$, and this would imply $a = -c$, $d + ac + b = -10$, and $b = d = \pm 1$, so $a^2 = -ac = b + d + 10 = \pm 2 + 10$, which is impossible. Therefore, $x^2 - 2$ is irreducible mod $y^2 - 3$ (the factorization algorithm produces only the one factor corresponding to $\mathcal{F}$ itself). To determine this factor (which must, of course, be $x^2 - 2$ itself) one computes the coefficient $\psi$ of $u^2$ in $\mathcal{F}(x + uy, 1, u) = (x + uy)^4 - (4 + 6u^2)(x + uy)^2 + 4 - 12u^2 + 9u^4$ because $\deg\mathcal{F}/\deg g = 2$. (As expected, the coefficient of $u^4$ is $y^4 - 6y^2 + 9 \equiv 0 \bmod (y^2 - 3)$, and the coefficient of $u^3$ is $4xy^3 - 12xy \equiv 0 \bmod (y^2 - 3)$.) Because $\psi = 6x^2y^2 - 4y^2 - 6x^2 - 12$, formula (3) gives the factor

$$\frac{6x^2y^2 - 4y^2 - 6x^2 - 12}{(2y)^2} \equiv \frac{18x^2 - 12 - 6x^2 - 12}{12} = \frac{12x^2 - 24}{12} = x^2 - 2 \bmod g(y)$$

as expected.

Example 2. $f(x) = x^2 - 2$ and $g(y) = y^2 - 18$. In this case, $\mathcal{F}(z,t,u)$ is the determinant of

$$\begin{bmatrix} z & -t & -u & 0 \\ -2t & z & 0 & -u \\ -18u & 0 & z & -t \\ 0 & -18u & -2t & z \end{bmatrix},$$

which can be found (the calculation is a variation of the one in Example 1) to be $z^4 - (4t^2 + 36u^2)z^2 + 4t^4 - 72t^2u^2 + 324u^4$. The factorization $\mathcal{F}(z,t,u) = \mathcal{F}_1(z,t,u)\mathcal{F}_2(z,t,u)$ can be found by completing the square to put $\mathcal{F}$ in the form $\mathcal{F}(z,t,u) = (z^2 - 2t^2 - 18u^2)^2 - 144t^2u^2 = (z^2 - 2t^2 - 18u^2 - 12tu)(z^2 - 2t^2 - 18u^2 + 12tu)$. The factor of $f(x)$ mod $g(y)$ corresponding to $\mathcal{F}_1(z,t,u)$ is, because in this case $\mu_1 = \deg\mathcal{F}_1/\deg g = 1$, the coefficient of $u$ in $(x + uy)^2 - 2 - 18u^2 - 12u$ divided by $g'(y)$, which is

$$\frac{2xy - 12}{2y} = \frac{2xy^2 - 12y}{2y^2} \equiv \frac{36x - 12y}{36} = x - \tfrac{1}{3}y \bmod (y^2 - 18).$$

(As expected, the coefficient of $u^2$, which is $y^2 - 18$, is zero mod $g(y)$.) In the same way, the factor of $f(x)$ mod $g(y)$ corresponding to $\mathcal{F}_2$ is $x + \tfrac{1}{3}y$. Indeed, $(x - \tfrac{1}{3}y)(x + \tfrac{1}{3}y) = x^2 - \tfrac{1}{9}y^2 \equiv x^2 - 2 \bmod (y^2 - 18)$, so $f(x) = x^2 - 2$ splits mod $g(y) = y^2 - 18$ into linear factors. (If $y = \sqrt{18}$, then $\tfrac{y}{3} = \sqrt{2}$.)
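The characteristic polynomials in Examples 1 and 2 can be checked by machine once $t$ and $u$ are given integer values. The following is a minimal sketch, not part of the essay: the helper names (`companion`, `kron`, `char_poly`, `multiplication_matrix`) are hypothetical, the monomials are ordered with the exponent of $y$ varying fastest (the characteristic polynomial does not depend on the order), and the Faddeev-LeVerrier recursion is used in place of the hand computation of the determinant.

```python
from fractions import Fraction

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def companion(coeffs):
    """Matrix of multiplication by x on the basis 1, x, ..., x^(n-1) modulo the
    monic polynomial x^n + a_1 x^(n-1) + ... + a_n, with coeffs = [a_1, ..., a_n]."""
    n = len(coeffs)
    rows = [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n - 1)]
    rows.append([-coeffs[n - 1 - j] for j in range(n)])  # x^n = -a_n - ... - a_1 x^(n-1)
    return rows

def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def multiplication_matrix(f, g, t, u):
    """Matrix of multiplication by t*x + u*y modulo (f(x), g(y)), t and u integers."""
    Mx = kron(companion(f), identity(len(g)))
    My = kron(identity(len(f)), companion(g))
    n = len(Mx)
    return [[t * Mx[i][j] + u * My[i][j] for j in range(n)] for i in range(n)]

def char_poly(A):
    """Coefficients [1, c_1, ..., c_n] of det(zI - A) = z^n + c_1 z^(n-1) + ... + c_n,
    computed by the Faddeev-LeVerrier recursion (A is assumed to have integer entries)."""
    n = len(A)
    coeffs = [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = mat_mul(A, M)
        M = [[AM[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)] for i in range(n)]
        AMk = mat_mul(A, M)
        coeffs.append(-sum(AMk[i][i] for i in range(n)) / k)
    return [int(c) for c in coeffs]

# Example 1: f = x^2 - 2, g = y^2 - 3, t = u = 1
example1 = char_poly(multiplication_matrix([0, -2], [0, -3], 1, 1))
# Example 2: f = x^2 - 2, g = y^2 - 18, t = u = 1
example2 = char_poly(multiplication_matrix([0, -2], [0, -18], 1, 1))
# F(z, 1, 1) of Example 1 has no root mod 5
no_root_mod_5 = all((r**4 - 10 * r**2 + 1) % 5 != 0 for r in range(5))
```

With $t = u = 1$ the first list encodes $z^4 - 10z^2 + 1$, and the second encodes $z^4 - 40z^2 + 256 = (z^2 - 32)(z^2 - 8)$, which is the product of the two factors found in Example 2 evaluated at $t = u = 1$.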
Example 3. $f(x) = x^2 + c_1x + c_2$, $g(y) = y^2 - c_1^2 + 4c_2$. The factorization depends on factoring the characteristic polynomial of

$$\begin{bmatrix} 0 & t & u & 0 \\ -c_2t & -c_1t & 0 & u \\ du & 0 & 0 & t \\ 0 & du & -c_2t & -c_1t \end{bmatrix}$$

where $d = c_1^2 - 4c_2$. The computation of this characteristic polynomial is not too onerous. (One method is to use the formula $z^4 - A_1z^3 + A_2z^2 - A_3z + A_4$ for the characteristic polynomial, where $A_i$ is the sum of the $i \times i$ principal minors of the matrix.) The result is

$$\mathcal{F}(z,t,u) = z^4 + 2c_1tz^3 + (2c_2t^2 + c_1^2t^2 - 2du^2)z^2 + c_1t(2c_2t^2 - 2du^2)z + c_2^2t^4 + (2c_2d - c_1^2d)t^2u^2 + d^2u^4,$$

a homogeneous polynomial in $z$, $t$, and $u$ with coefficients in $\mathbf{Z}[c_1, c_2]$ when $c_1^2 - 4c_2$ is substituted for $d$. The difficult step is the factorization of $\mathcal{F}(z,t,u)$. When $t = 0$ it is $z^4 - 2du^2z^2 + d^2u^4 = (z^2 - du^2)^2$, and when $u = 0$ it is $z^4 + 2c_1tz^3 + (2c_2 + c_1^2)t^2z^2 + 2c_1c_2t^3z + c_2^2t^4 = (z^2 + c_1tz + c_2t^2)^2$. Therefore, the factorization of $\mathcal{F}$, if there is one, must be of the form $(z^2 + c_1tz + c_2t^2 + ptu - du^2)(z^2 + c_1tz + c_2t^2 + qtu - du^2)$, where $p$ and $q$ are in $\mathbf{Z}[c_1, c_2]$. The coefficient of $t^3u$ is 0 on the one hand and $c_2(p + q)$ on the other, so $p = -q$. Then the coefficient of $t^2u^2$ is $2c_2$ … mod $g(y)$.
(Note that the coefficient of $u^2$ is $y^2 - d \equiv 0 \bmod g(y)$, as expected.) In the same way, $\mathcal{F}_2$ leads to the other linear factor, so that $f(x) \equiv (x + \tfrac{c_1}{2} - \tfrac{y}{2})(x + \tfrac{c_1}{2} + \tfrac{y}{2}) \bmod g(y)$. Note that this process merely factors $f(x)$ mod $g(y)$. It does not construct the polynomial $g(y)$ modulo which $f(x)$ is a product of linear factors.

Example 4. $f(x) = x^3 - 2$ and $g(y) = y^3 - 2$. When the monomials $x^{\alpha}y^{\beta}$ with $0 \le \alpha < 3$ and $0 \le \beta < 3$ are put in the order $1, y, y^2, x, xy, xy^2, x^2, x^2y, x^2y^2$, the matrix of which $\mathcal{F}(z,t,u)$ is the determinant is* $zI_9 - tM_x - uM_y$, where $M_y$ is the $9 \times 9$ matrix of $3 \times 3$ blocks in which the diagonal blocks are all equal to

$$G = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 2 & 0 & 0 \end{bmatrix}$$

* Here $I_k$ is the $k \times k$ identity matrix.
and the blocks off the diagonal are zero, and where $M_x$ is the $9 \times 9$ matrix of $3 \times 3$ blocks that has the form of the matrix $G$; that is, the first block in the first row is zero, the second block in the first row is the identity matrix $I_3$, and so forth, the first block in the last row being $2I_3$. Hand computation of this $9 \times 9$ determinant is straightforward but tedious. An easier method of hand computation uses the formula $\mathcal{F}(z,1,u) = \det\bigl((zI_3 - uG)^3 - 2I_3\bigr)$ that is proved in the next essay. Since $G^3 = 2I_3$ (this is the main property of $G$), it follows that $\mathcal{F}(z,1,u)$ is the product of the determinant of $(zI - uG) - G$ and the determinant of $(zI - uG)^2 + (zI - uG)G + G^2$.
The first determinant is

$$\det\bigl(zI - (u+1)G\bigr) = (u+1)^3 \det\Bigl(\tfrac{z}{u+1}I - G\Bigr) = (u+1)^3\Bigl(\bigl(\tfrac{z}{u+1}\bigr)^3 - 2\Bigr) = z^3 - 2(u+1)^3,$$

which gives the factor $z^3 - 2(t+u)^3$ of $\mathcal{F}(z,t,u)$; call it $\mathcal{F}_1(z,t,u)$. The second determinant is

$$\begin{vmatrix} z^2 & (-2u+1)z & u^2-u+1 \\ 2(u^2-u+1) & z^2 & (-2u+1)z \\ 2(-2u+1)z & 2(u^2-u+1) & z^2 \end{vmatrix} = z^6 + 2(-2u+1)^3z^3 + 4(u^2-u+1)^3 - 3 \cdot 2(u^2-u+1)(-2u+1)z^3$$
$$= z^6 + (-4u^3 + 6u^2 + 6u - 4)z^3 + 4(u^2-u+1)^3,$$

which gives the factor $z^6 + (-4u^3 + 6tu^2 + 6t^2u - 4t^3)z^3 + 4(u^2 - tu + t^2)^3$ of $\mathcal{F}(z,t,u)$; call it $\mathcal{F}_2(z,t,u)$. The first factor $\mathcal{F}_1(z,t,u)$ is irreducible because its degree is 3, and any factor of $\mathcal{F}$ has degree divisible by 3. That $\mathcal{F}_2(z,t,u)$ is irreducible follows from the irreducibility of $\mathcal{F}_2(z,1,-1) = z^6 + 108$. (Since $z^6 + 108 \equiv (z^2 + 2)(z^2 + z + 2)(z^2 - z + 2) \bmod 5$ and the factors on the right are irreducible mod 5, all factors of $z^6 + 108$ must have even degree. Similarly, $z^6 + 108 \equiv (z^3 - 2)(z^3 + 2) \bmod 7$ and the factors on the right are irreducible mod 7, so all factors of $z^6 + 108$ must have degree divisible by 3. Thus, $z^6 + 108$ has no proper factor.) Therefore, $x^3 - 2$ is a product of a factor of degree 1 and a factor of degree 2 mod $y^3 - 2$.

The actual factorization $x^3 - 2 \equiv (x - y)(x^2 + xy + y^2) \bmod (y^3 - 2)$ is in fact obvious once the factor of degree 2 is known to be irreducible. Its derivation using the algorithm is as follows: The coefficient of $u^2$ in $\mathcal{F}_1(x + uy, 1, u)$ is $3xy^2 - 6$, so the factor is

$$\frac{3xy^2 - 6}{3y^2} = \frac{3xy^3 - 6y}{3y^3} \equiv \frac{6x - 6y}{6} = x - y \bmod g(y).$$

(As expected, the coefficient of $u^3$ is $y^3 - 2 \equiv 0 \bmod g(y)$.) The coefficient of $u^4$ in $(x + uy)^6 + (-4u^3 + 6u^2 + 6u - 4)(x + uy)^3 + 4(u^2 - u + 1)^3$ is $15x^2y^4 + (-4)(3x^2y) + 6(3xy^2) + 6y^3 + 4(3 + 3)$, so the corresponding factor of $f(x)$ mod $g(y)$ is

$$\frac{30x^2y - 12x^2y + 18xy^2 + 12 + 24}{(3y^2)^2} \equiv \frac{18x^2y + 18xy^2 + 18y^3}{18y} = x^2 + xy + y^2 \bmod (y^3 - 2),$$

as expected. (Also as expected, the coefficient of $u^6$, which is $y^6 - 4y^3 + 4$, and the coefficient of $u^5$, which is $6xy^5 + 6y^3 - 12xy^2 - 12$, are both zero mod $(y^3 - 2)$.)

Example 5. $f(x) = x^3 - c$, $g(y) = y^3 - c$. Replacement of 2 with $c$ in the matrix $G$ of Example 4 gives as the two factors of $\mathcal{F}(z,1,u)$ the polynomials $z^3 - c(u+1)^3$, which is the determinant of
$$\begin{bmatrix} z & -(u+1) & 0 \\ 0 & z & -(u+1) \\ -c(u+1) & 0 & z \end{bmatrix},$$

and $z^6 + c(-2u^3 + 3u^2 + 3u - 2)z^3 + c^2(u^2 - u + 1)^3$, which is the determinant of

$$\begin{bmatrix} z^2 & (-2u+1)z & u^2-u+1 \\ c(u^2-u+1) & z^2 & (-2u+1)z \\ c(-2u+1)z & c(u^2-u+1) & z^2 \end{bmatrix}.$$

The factorization of $\mathcal{F}(z,t,u)$ is derived from these factors by making the factors homogeneous in $t$. Both factors are irreducible because they are irreducible when $c = 2$. They prove that the obvious factorization $x^3 - c \equiv (x - y)(x^2 + xy + y^2) \bmod (y^3 - c)$ is a factorization into factors that are irreducible mod $y^3 - c$.
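The modular arithmetic behind the irreducibility of $z^6 + 108$ in Example 4 is easy to confirm by machine. The sketch below is an illustration only, with hypothetical helper names; polynomials in one variable are stored as lists of integer coefficients with the constant term first, and irreducibility of the quadratic and cubic factors is tested by searching for roots, which suffices in degrees 2 and 3 over a field.

```python
def poly_mul_mod(a, b, p):
    """Product of the polynomials a and b (constant term first), coefficients reduced mod p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

def has_root_mod(a, p):
    """True if the polynomial a has a root among the residues 0, 1, ..., p - 1."""
    return any(sum(c * r**k for k, c in enumerate(a)) % p == 0 for r in range(p))

target = [108, 0, 0, 0, 0, 0, 1]  # z^6 + 108

# mod 5: (z^2 + 2)(z^2 + z + 2)(z^2 - z + 2)
quadratics = [[2, 0, 1], [2, 1, 1], [2, -1, 1]]
prod5 = [1]
for q in quadratics:
    prod5 = poly_mul_mod(prod5, q, 5)

# mod 7: (z^3 - 2)(z^3 + 2)
cubics = [[-2, 0, 0, 1], [2, 0, 0, 1]]
prod7 = poly_mul_mod(cubics[0], cubics[1], 7)

factorization_mod5_ok = prod5 == [c % 5 for c in target]
factorization_mod7_ok = prod7 == [c % 7 for c in target]
quadratics_irreducible = not any(has_root_mod(q, 5) for q in quadratics)
cubics_irreducible = not any(has_root_mod(c, 7) for c in cubics)
```

All four flags being true reproduces the two facts used in the example: every factor of $z^6 + 108$ has even degree (from the factorization mod 5) and degree divisible by 3 (from the factorization mod 7).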
Essay 1.6 Validation of the Factorization Algorithm
Let $f(x)$ and $g(y)$ be monic, irreducible polynomials with coefficients in the ring $R = \mathbf{Z}[c_1, c_2, \ldots, c_{\nu}]$ of polynomials in $c_1, c_2, \ldots, c_{\nu}$ with integer coefficients. The last essay showed that if $C$ is a column matrix of length $mn$ whose entries are the monomials $x^{\alpha}y^{\beta}$ with $0 \le \alpha < m = \deg f$ and $0 \le \beta < n = \deg g$, arranged in some order, then the congruence

$$MC \equiv C \cdot (tx + uy) \bmod (f(x), g(y))$$

determines an $mn \times mn$ matrix $M$ whose entries are homogeneous linear polynomials in $t$ and $u$ with coefficients in $R$. (Here $tx + uy$ is to be regarded as a $1 \times 1$ matrix, so that the right side of the congruence, like the left, is a matrix product.) Let $\mathcal{F}(z,t,u) = \det(zI - M)$ be the characteristic polynomial of $M$ (which, by the way, is independent of the choice of the order of the entries of $C$) and let $\mathcal{F}(z,t,u) = \prod_{i=1}^{k} \mathcal{F}_i(z,t,u)$ be its factorization into irreducible factors that are monic in $z$. Clearly $\mathcal{F}$ is homogeneous of degree $mn$ in $z$, $t$, and $u$ with coefficients in $R$, so the $\mathcal{F}_i$ are homogeneous in $z$, $t$, and $u$. It is to be shown that the degree of each $\mathcal{F}_i(z,t,u)$ is a multiple of $n$, say $\deg\mathcal{F}_i = \mu_i n$, that

(1)  $g'(y)^m f(x) \equiv \psi_1(x,y)\,\psi_2(x,y) \cdots \psi_k(x,y) \bmod g(y)$

when $\psi_i(x,y)$ is defined to be the coefficient of $u^{\mu_i(n-1)}$ in $\mathcal{F}_i(x + uy, 1, u)$, and that the factors $\psi_i(x,y)$ on the right are irreducible as polynomials in $x$ with coefficients in the root field of $g(y)$; moreover, it is to be shown that $\psi_i(x,y) \equiv g'(y)^{\mu_i}x^{\mu_i} + \cdots \bmod g(y)$, where the omitted terms have lower degree in $x$, so that division of (1) by $g'(y)^m$ gives the required factorization $f(x) \equiv \phi_1(x,y)\phi_2(x,y) \cdots \phi_k(x,y) \bmod g(y)$ of $f(x)$ mod $g(y)$ into factors $\phi_i(x,y) \equiv \psi_i(x,y)/g'(y)^{\mu_i} \bmod g(y)$ with coefficients in the root field of $g(y)$ that are monic as well as irreducible.

That $\deg\mathcal{F}_i$ is divisible by $n$ can be proved as follows: Set $t = 0$ and $u = 1$ in $M$ to find a matrix, call it $M_y$, that satisfies $M_y \cdot C \equiv C \cdot y \bmod g(y)$. Let the monomials $x^{\alpha}y^{\beta}$ in $C$ be ordered by putting $x^{\alpha}y^{\beta}$ ahead of $x^{\alpha'}y^{\beta'}$ if $\alpha < \alpha'$ or if $\alpha = \alpha'$ and $\beta < \beta'$. Then $M_y$ is a matrix of $n \times n$ blocks in which the blocks off the diagonal are all zero and the blocks on the diagonal are all equal to the matrix $G$ whose first $n - 1$ rows are the last $n - 1$ rows of the $n \times n$ identity matrix $I_n$ and whose last row is $-b_n, -b_{n-1}, \ldots, -b_1$, where the $b_j$ are the coefficients of $g(y) = y^n + b_1y^{n-1} + b_2y^{n-2} + \cdots + b_n$. As is easily* shown, $g(y) = \det(yI_n - G)$. Then $\mathcal{F}(z,0,u) = \det(zI - uM_y) = \det(zI_n - uG)^m = u^{mn}\det\bigl(\tfrac{z}{u}I_n - G\bigr)^m = u^{mn}g\bigl(\tfrac{z}{u}\bigr)^m = g(z,u)^m$, where $g(z,u)$

* This is the special case $f(x) = x$ and $u = 1$ of the formula $\mathcal{F}(z,1,u) = \det f(zI - uG)$ proved at the end of this essay.
is the homogeneous polynomial in $z$ and $u$ with coefficients in $R$ for which $g(z) = g(z,1)$. Thus $\prod_i \mathcal{F}_i(z,0,u) = g(z,u)^m$, so, because $g(z,u)$ is irreducible over $R$ and because $\mathcal{F}_i(z,0,u)$ and $g(z,u)$ are both monic in $z$, each $\mathcal{F}_i(z,0,u)$ is $g(z,u)^{\mu_i}$ for some $\mu_i$, where $\sum \mu_i = m$. Thus, $\deg\mathcal{F}_i = \mu_i n$, as was to be shown.

When $\mathcal{F}_i(x + uy, 1, u)$ is expressed as a polynomial in $u$ whose coefficients are polynomials in $x$ with coefficients in the root field of $g(y)$, the coefficient of $u^{\mu_i(n-1)}$ is $g'(y)^{\mu_i}x^{\mu_i} + \cdots \bmod g(y)$, where the omitted terms have lower degree in $x$. In particular, its degree as a polynomial in $u$ is at least $\mu_i(n - 1)$. This statement follows from the observation that $\mathcal{F}_i(x + uy, 1, u)$ is a sum of terms $\lambda_{\alpha\beta\gamma}(x + uy)^{\alpha}1^{\beta}u^{\gamma}$, where $\lambda_{\alpha\beta\gamma}$ is in $\mathbf{Z}[c_1, c_2, \ldots, c_{\nu}]$ and $\alpha + \beta + \gamma = \mu_i n$. Thus, this polynomial contains no terms whose combined degree in $x$ and $u$ is greater than $\mu_i n$, and the terms whose combined degree in $x$ and $u$ is exactly $\mu_i n$ are the terms of $\mathcal{F}_i(x + uy, 0, u) = g(x + uy, u)^{\mu_i}$. This homogeneous polynomial in $x$ and $u$ with coefficients in the root field of $g(y)$ can also be written in the form

$$u^{\mu_i n}\, g\Bigl(y + \frac{x}{u}, 1\Bigr)^{\mu_i} = u^{\mu_i n}\Bigl(g(y) + g'(y)\frac{x}{u} + \tfrac{1}{2}g''(y)\Bigl(\frac{x}{u}\Bigr)^2 + \cdots\Bigr)^{\mu_i}$$
$$= \bigl(g(y)u^n + g'(y)xu^{n-1} + \tfrac{1}{2}g''(y)x^2u^{n-2} + \cdots\bigr)^{\mu_i}$$
$$\equiv \bigl(g'(y)xu^{n-1} + \cdots\bigr)^{\mu_i} = g'(y)^{\mu_i}x^{\mu_i}u^{\mu_i(n-1)} + \cdots \bmod g(y),$$

where the omitted terms all have combined degree $\mu_i n$ in $x$ and $u$ and degree less than $\mu_i(n - 1)$ in $u$. Therefore, the coefficient of $u^{\mu_i(n-1)}$ in $\mathcal{F}_i(x + uy, 1, u)$ is as described above.

When $\mathcal{F}(x + uy, 1, u)$ is expressed as a polynomial in $u$ whose coefficients are polynomials in $x$ with coefficients in the root field of $g(y)$, the coefficient of $u^{m(n-1)}$ is $g'(y)^m f(x) \bmod g(y)$ and the coefficients of all higher powers of $u$ are zero mod $g(y)$. The main step in the proof of these statements is the proof that, as was stated in the preceding essay, $\mathcal{F}(tx + uy, t, u) \equiv 0 \bmod (f(x), g(y))$. For the proof of this congruence, let $M$ be as above and let $\mathcal{M}^*$ be the adjoint of the matrix $\mathcal{M} = (tx + uy)I_{mn} - M$ of which $\mathcal{F}(tx + uy, t, u)$ is the determinant. (That is, the entry in the $i$th row of the $j$th column of $\mathcal{M}^*$ is $(-1)^{i+j}$ times the determinant of the $(mn - 1) \times (mn - 1)$ matrix that remains when the $i$th column and the $j$th row of $\mathcal{M}$ are deleted.) Then $\mathcal{M}^* \cdot \mathcal{M} = \mathcal{F}(tx + uy, t, u)I_{mn}$. By the definition of $M$, all entries of the column matrix $\mathcal{M}C$ are zero mod $(f(x), g(y))$. Therefore all entries of $\mathcal{F}(tx + uy, t, u)I_{mn}C$ are zero mod $(f(x), g(y))$. Since 1 is an entry of $C$, it follows that $\mathcal{F}(tx + uy, t, u) \equiv 0 \bmod (f(x), g(y))$.

Thus, $\mathcal{F}(x + uy, 1, u) \equiv 0 \bmod (f(x), g(y))$. Since the combined degree in $x$ and $u$ of any term of $\mathcal{F}(x + uy, 1, u)$ is at most $mn$, the coefficient of $u^{mn-j}$ has degree at most $j$ in $x$; when $j < m$ it follows that this coefficient has degree less than $m$, which
means that it is already reduced mod $f(x)$ and therefore, because reduction of it both mod $f(x)$ and mod $g(y)$ gives the result zero, it must be zero mod $g(y)$. Thus, the coefficients of $u^k$ for $k > m(n - 1)$ are all zero mod $g(y)$, as was to be shown. As in the previous paragraph, the coefficient of $u^{m(n-1)}$ in $\mathcal{F}(x + uy, 1, u)$ is congruent to $g'(y)^m x^m + \cdots \bmod g(y)$, where the omitted terms have degree less than $m$ in $x$. Since $f(x)$ is monic of degree $m$ in $x$, the polynomial $\mathcal{F}(x + uy, 1, u) - g'(y)^m f(x)u^{m(n-1)}$ is zero mod $(f(x), g(y))$ and the coefficient of $u^{m(n-1)}$ in it has no terms of degree $m$ or greater in $x$, so this coefficient is zero mod $g(y)$, as was to be shown.

Thus, when the two sides of the congruence $\mathcal{F}(x + uy, 1, u) \equiv \prod_i \mathcal{F}_i(x + uy, 1, u) \bmod g(y)$ are regarded as polynomials in $u$ whose coefficients are polynomials in $x$ with coefficients in the root field of $g(y)$, the left side is a polynomial of degree $m(n - 1)$ whose leading coefficient is $g'(y)^m f(x)$. The $i$th factor on the right has been shown to have degree at least $\mu_i(n - 1)$. If it had greater degree for any $i$, the degree of the product would be greater than $\sum \mu_i(n - 1) = m(n - 1)$, which is not the case. (Since the ring of polynomials in $x$ with coefficients in the root field of $g(y)$ is an integral domain, the degree of a product is the sum of the degrees of the factors, and the leading term of a product is the product of the leading terms.) Therefore, $\mathcal{F}_i(x + uy, 1, u) \equiv \psi_i(x,y)u^{\mu_i(n-1)} + \cdots \bmod g(y)$, where the omitted terms have degree less than $\mu_i(n - 1)$ in $u$, from which the desired congruence $g'(y)^m f(x) \equiv \prod_i \psi_i(x,y) \bmod g(y)$ follows.

Since it has already been shown that $\psi_i(x,y) \equiv g'(y)^{\mu_i}x^{\mu_i} + \cdots \bmod g(y)$, it remains only to show that the $\psi_i(x,y)$ are irreducible over the root field of $g(y)$. This fact will follow from a quite different description of $\mathcal{F}(z,1,u)$, namely, as $\det f(zI_n - uG)$, where $G$ is the $n \times n$ matrix described above.
As was noted above, the matrix $M_y$ obtained from $M$ by setting $t = 0$ and $u = 1$ in $M$ is a matrix of $n \times n$ blocks in which the nondiagonal blocks are all zero and the diagonal blocks are all the matrix $G$ described above. Similarly, $M_x$, the matrix obtained by setting $t = 1$, $u = 0$ in $M$, is a matrix of $n \times n$ blocks; the first $(m - 1)n$ rows are the last $(m - 1)n$ rows of $I_{mn}$, and the last $n$ rows contain the matrices $-a_mI_n, -a_{m-1}I_n, \ldots, -a_1I_n$, where the $a_i$ are the coefficients of $f(x) = x^m + a_1x^{m-1} + \cdots + a_m$. What is to be shown is that $\mathcal{F}(z,1,u)$, which is the determinant of $zI_{mn} - M_x - uM_y$ by the definition of $\mathcal{F}$, is $\det f(zI_n - uG)$. Let $\mathcal{L}$ be the matrix of $n \times n$ blocks in which blocks above the diagonal are zero, blocks on the diagonal are $I_n$, and blocks $i$ steps below the diagonal are $(zI_n - uG)^i$. By direct computation, $(zI_{mn} - M_x - uM_y)\mathcal{L}$ has the last $(m - 1)n$ rows of $-I_{mn}$ in its first $(m - 1)n$ rows, and has $f(zI_n - uG)$ in the first $n$ columns of its last $n$ rows. Since the determinant of $\mathcal{L}$ is 1, it follows that $\mathcal{F}(z,1,u) = \pm\det f(zI_n - uG)$. Since $\mathcal{F}(z,1,u)$ and $\det f(zI_n - uG)$ are both monic in $z$, the sign is plus.
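The identity $\mathcal{F}(z,1,u) = \det f(zI_n - uG)$ can be spot-checked for integer values of $z$ and $u$. The sketch below is only a numerical illustration under stated assumptions: the helper names are hypothetical, $M_x$ and $M_y$ are built as Kronecker products of companion matrices (which matches the block description above), and determinants are computed by Gaussian elimination over the rationals.

```python
from fractions import Fraction

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def companion(coeffs):
    """First n-1 rows are the last n-1 rows of I_n; last row is -a_n, ..., -a_1
    for the monic polynomial x^n + a_1 x^(n-1) + ... + a_n, coeffs = [a_1, ..., a_n]."""
    n = len(coeffs)
    rows = [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n - 1)]
    rows.append([-coeffs[n - 1 - j] for j in range(n)])
    return rows

def kron(A, B):
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def det(A):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    A = [[Fraction(x) for x in row] for row in A]
    n = len(A)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            result = -result
        result *= A[c][c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= factor * A[c][k]
    return result

def both_sides(f, g, z, u):
    """Return (det(z I_mn - M_x - u M_y), det f(z I_n - u G)) for integers z and u."""
    m, n = len(f), len(g)
    G = companion(g)
    Mx = kron(companion(f), identity(n))
    My = kron(identity(m), G)
    big = [[(z if i == j else 0) - Mx[i][j] - u * My[i][j] for j in range(m * n)]
           for i in range(m * n)]
    W = [[(z if i == j else 0) - u * G[i][j] for j in range(n)] for i in range(n)]
    P = identity(n)
    for a in f:  # Horner's rule: P becomes W^m + a_1 W^(m-1) + ... + a_m I
        P = mat_mul(P, W)
        P = [[P[i][j] + (a if i == j else 0) for j in range(n)] for i in range(n)]
    return det(big), det(P)

lhs4, rhs4 = both_sides([0, 0, -2], [0, 0, -2], 5, 1)  # Example 4 at z = 5, u = 1
lhs1, rhs1 = both_sides([0, -2], [0, -3], 1, 1)        # Example 1 at z = 1, u = 1
```

For Example 4 both values equal $(5^3 - 2 \cdot 2^3)(5^6 + 4 \cdot 5^3 + 4) = 109 \cdot 16129$, in agreement with the two factors $\mathcal{F}_1$ and $\mathcal{F}_2$ found there.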
Since the matrix $G$ satisfies $g(G) = 0$ (by direct computation,* or, because $g$ is the characteristic polynomial of $G$, an application of the Cayley-Hamilton theorem), a factorization $f(x) \equiv \theta_1(x,y)\theta_2(x,y) \cdots \theta_l(x,y) \bmod g(y)$ in which each $\theta_i(x,y)$ has coefficients in the field $K$ of rational functions in $c_1, c_2, \ldots, c_{\nu}$ and is monic in $x$ implies $\mathcal{F}(z,1,u) = \det f(zI_n - uG) =$
* Let $\rho_i$ denote the $i$th row of $I_n$. Then $\rho_1G^j = \rho_{1+j}$ for $j = 0, 1, \ldots, n - 1$, but $\rho_1G^n = \rho_nG = [-a_0 \;\; -a_1 \;\; \cdots \;\; -a_{n-1}] = \rho_1(-a_0I - a_1G - \cdots - a_{n-1}G^{n-1})$, which proves that $\rho_1 \cdot g(G) = 0$. Therefore, $\rho_{1+j} \cdot g(G) = \rho_1 \cdot G^j \cdot g(G) = \rho_1 \cdot g(G) \cdot G^j = 0 \cdot G^j = 0$ for $j = 0, 1, \ldots, n - 1$, which is to say $g(G) = I_ng(G) = 0$.
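The "direct computation" of $g(G) = 0$ can also be carried out mechanically. The following sketch uses the footnote's convention $g(y) = y^n + a_{n-1}y^{n-1} + \cdots + a_0$; the function names and the second test polynomial are assumptions made for illustration.

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def companion(a):
    """n x n matrix whose first n-1 rows are the last n-1 rows of I_n and whose
    last row is [-a_0, ..., -a_(n-1)], for g(y) = y^n + a_(n-1) y^(n-1) + ... + a_0."""
    n = len(a)
    rows = [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n - 1)]
    rows.append([-c for c in a])
    return rows

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def evaluate_at_matrix(a, G):
    """g(G) = G^n + a_(n-1) G^(n-1) + ... + a_0 I, computed by Horner's rule."""
    n = len(G)
    P = identity(n)
    for c in reversed(a):  # a_(n-1), ..., a_0
        P = mat_mul(P, G)
        P = [[P[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return P

# g(y) = y^3 - 2, the polynomial of Example 4
a = [-2, 0, 0]
G = companion(a)
G_cubed = mat_mul(G, mat_mul(G, G))
g_of_G = evaluate_at_matrix(a, G)

# An arbitrary monic quartic, y^4 + y^3 + 7y^2 - 3y + 5, chosen only as a second test
b = [5, -3, 7, 1]
h_of_H = evaluate_at_matrix(b, companion(b))
```

In the first case the computation also exhibits $G^3 = 2I_3$, the property of $G$ used in Example 4 of the preceding essay.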
Essay 1.7 About the Factorization Algorithm
The method of factoring $f(x)$ mod $g(y)$ in the preceding essays has as an immediate corollary:

The Kronecker-Kneser* Theorem. Let $f$ and $g$ be monic, irreducible polynomials with coefficients in $\mathbf{Z}[c_1, c_2, \ldots, c_{\nu}]$, and let

$$f(x) \equiv \phi_1(x,y)\,\phi_2(x,y) \cdots \phi_k(x,y) \bmod g(y)$$

and

$$g(y) \equiv \psi_1(x,y)\,\psi_2(x,y) \cdots \psi_l(x,y) \bmod f(x)$$

be the factorizations of each modulo the other. Then $k = l$, and the factors can be so ordered that $\deg_x \phi_i / \deg_y \psi_i = \deg f / \deg g$ for each $i$.
Proof. To factor $g(y)$ mod $f(x)$, one constructs the characteristic polynomial $\mathcal{G}(z,t,u)$ of the matrix $tM_y + uM_x$ for which $(tM_y + uM_x)C \equiv C(ty + ux) \bmod (g(y), f(x))$, where $C$ is the column matrix of length $mn$ that contains the monomials $y^{\alpha}x^{\beta}$ in which $0 \le \alpha < n = \deg g$ and $0 \le \beta < m = \deg f$. Because this characteristic polynomial is independent of the order chosen for the entries of $C$, it is clear that $\mathcal{G}(z,t,u) = \mathcal{F}(z,u,t)$. The factorization algorithm proves that $k = l$ is the number of irreducible factors of $\mathcal{F}(z,t,u)$, and the integers $\deg f \cdot \deg_y \psi_i = \deg g \cdot \deg_x \phi_i$ are the degrees of those factors.

In addition to its aesthetic appeal, this theorem is a powerful tool. See Essay 2.1, where it is used in the proof of Galois's fundamental theorem.

Inevitably, some readers will object that the algorithm is impractical. The construction of $\mathcal{F}(z,t,u)$ is already a formidable task, and the factorization of this polynomial in three indeterminates with integer coefficients is even more daunting. But the practicality of the algorithm is irrelevant, because its purpose is to prove the existence of the factorization, not to effect it. Once the factorization is known to exist, methods for constructing it can be addressed. A similar situation occurs in the case of the fundamental theorem of algebra (see Essay 5.1); Newton's method is in most cases the best way to construct the roots of a polynomial, but other methods are needed to prove that there are roots to be constructed.

Kronecker emphasized the importance of the problem of factoring $f(x)$ mod $g(y)$ in a footnote to his 1887 paper "Über den Zahlbegriff" [43, p. 262 of vol.

* I called this theorem "Dedekind's reciprocity theorem" in Galois Theory [18, p. 66], but I have since learned that it already had the name "Kronecker-Kneser theorem" (see [8] and [50]). Richard Dedekind discovered it in 1855, but the discovery was not published until Scharlau's paper [59] appeared in 1982. Kronecker included the theorem in his university lectures ([32, p. 309]). He might have known of Dedekind's work, but since he does not seem to have cited Dedekind, he probably discovered the theorem independently. The first publication of the theorem was by A. Kneser [36].
1 A Fundamental Theorem
IIIa of the republication in Mathematische Werke]. In 1881, he had already described an algorithm for such factorizations in the following way:

"It can be assumed that $f(x)$ has no repeated factors, because otherwise one could free it of repeated factors by dividing it by its greatest common divisor with its derivative. One sets $z + uy$ in place of $x$ in $f(x)$, where $u$ is an indeterminate; at the same time, one treats $f$ itself as a function of $x$ and the algebraic quantity $y$, which may figure in its coefficients. Therefore, denote $f$ by $f(x, y)$ and form the product of all the conjugate expressions $f(z + uy, y)$, that is, all of them that arise when $y$ is replaced by its conjugate values. This product is a polynomial in $z$ whose coefficients are rational functions in $c_1, c_2, \ldots, c_\nu$ [the presence of $u$ in the coefficients is ignored] and therefore, as has been shown, can be decomposed into irreducible factors. If these factors are $\mathcal{F}_1(z)$, $\mathcal{F}_2(z)$, $\ldots$, then, as is easy to see, the greatest common divisors of $f(z + uy, y)$ and $\mathcal{F}_i(z)$ for $i = 1, 2, \ldots$ give the irreducible factors of $f(z + uy, y)$, from which the irreducible factors of $f(x)$ itself can be found when $x - uy$ is substituted for $z$. It remains to remark that substitution of $z + uy$ for $x$ ensures that $y$ actually occurs in the coefficients of $f$."†

† Dabei kann angenommen werden, dass die Function $f(x)$ keine gleichen Factoren enthält; denn anderenfalls würde man dieselbe von gleichen Factoren dadurch befreien können, dass man sie durch den grössten Theiler, den die Function $f(x)$ mit ihrer Ableitung gemein hat, dividirt.
Man setze nun zuvörderst $z + uy$ an Stelle von $x$ in $f(x)$, wo $u$ eine unbestimmte Grösse bedeutet; man betrachte ferner $f$ selbst als Function von $x$ und der zum Rationalitäts-Bereich gehörigen algebraischen Grösse $y$, welche also auch in den Coefficienten vorkommen kann, bezeichne demnach die Function $f$ durch $f(x, y)$ und bilde das Product aller mit einander conjugirten Ausdrücke $f(z + uy, y)$, d. h. aller derjenigen, welche entstehen, wenn man die mit $y$ conjugirten algebraischen Grössen an Stelle von $y$ setzt. Dieses Product ist eine ganze Function von $z$, deren Coefficienten rationale Functionen der Variabeln $c_1, c_2, \ldots, c_\nu$ sind, kann also nach dem Vorhergehenden in irreductible Factoren zerlegt werden. Sind diese Factoren: $\mathcal{F}_1(z)$, $\mathcal{F}_2(z)$, $\ldots$, so bilden, wie leicht zu sehen, die grössten gemeinschaftlichen Theiler von $f(z + uy, y)$ und $\mathcal{F}_h(z)$ für $h = 1, 2, \ldots$ die irreductibeln Factoren von $f(z + uy, y)$, aus denen die Factoren von $f(x)$ selbst unmittelbar hervorgehen, wenn wieder $x - uy$ an Stelle von $z$ gesetzt wird. Es ist noch zu bemerken, dass die Einführung von $z + uy$ an Stelle von $x$ zu dem Zwecke erfolgt ist, das Vorkommen von $y$ in den Coefficienten zu sichern. (From §4 of [39]. The translation above is somewhat free, and Kronecker's notation $F$, $\mathfrak{R}$, $\mathfrak{R}'$, $\mathfrak{R}''$, $\mathfrak{R}'''$, $\ldots$ has been changed to $f$, $y$, $c_1$, $c_2$, $\ldots$, $c_\nu$ to agree with the notation of these essays.)
Essay 1.7 About the Factorization Algorithm
My discussion of this subject in [18, §§60-61] shows that I found the exact algorithm Kronecker had in mind (not to mention its validity) far from "easy to see." In retrospect, however, I do see that it is essentially the algorithm of Essay 1.5. Instead of factoring a polynomial in $x$ alone as in Essay 1.5, Kronecker changes $f(x)$ to $f(z + uy)$, where $u$ is an indeterminate, in order to be sure that the polynomial to be factored does involve $y$. He then forms the "product of its conjugates," by which he surely means (see his §2) the norm of $f(z + uy)$ as a polynomial with coefficients in the root field of $g(y)$, which is to say that it is plus or minus the constant term of the polynomial of which $f(z + uy)$ is a root. The polynomial of which $f(z + uy)$ is a root is the characteristic polynomial of the matrix $M$ of elements of $R[z, u]$ defined by $C \cdot f(z + uy) \equiv MC \bmod g(y)$, where $C$ is the column matrix with entries $1, y, y^2, \ldots, y^{n-1}$ ($n$ being the degree of $g$). Thus, Kronecker's $\mathcal{F}_1(z)$, $\mathcal{F}_2(z)$, $\ldots$ are the irreducible factors of the constant term of the characteristic polynomial of this $M$. But this is $\pm\det M$, and $M$ is the $n \times n$ matrix $f(zI + uG)$, where $G$ is the matrix determined by $g(y)$ as in Essay 1.6. Thus, the $\mathcal{F}_i(z)$ are the irreducible factors of $\det f(zI + uG)$.

Since, as was shown in Essay 1.6, $\det f(zI + uG) = \mathcal{F}(z, 1, -u)$, he is saying that the desired irreducible factors $\phi_i(x, y)$ are the greatest common divisors of $f(z + uy)$ with $\mathcal{F}_1(z, 1, -u)$, $\mathcal{F}_2(z, 1, -u)$, $\ldots$, or, better, the greatest common divisors of $f(x)$ with $\mathcal{F}_1(x - uy, 1, -u)$, $\mathcal{F}_2(x - uy, 1, -u)$, $\ldots$. When one changes the sign of $u$ and notes that a common divisor of $f(x)$ and $\mathcal{F}_i(x + uy, 1, u)$ must be independent of $u$ and must therefore divide all coefficients of $\mathcal{F}_i(x + uy, 1, u)$, Kronecker's claim becomes the statement that $\phi_i(x, y)$ is the greatest common divisor of $f(x)$ and the coefficients of $\mathcal{F}_i(x + uy, 1, u)$ when it is expanded in powers of $u$. Now, $\phi_i(x, y)$ is the greatest common divisor of $f(x)$ and the leading coefficient $\psi_i(x, y) \bmod g(y)$ in this expansion, so his claim comes down to the statement that $\phi_i(x, y)$ divides all the other coefficients of $\mathcal{F}_i(x + uy, 1, u)$ when they are regarded as polynomials in $x$ with coefficients in the root field of $g(y)$.

Proposition. As a polynomial in $x$ and $u$ with coefficients in the root field of $g(y)$, $\mathcal{F}_i(x + uy, 1, u)$ is divisible by $\phi_i(x, y)$.

Proof. Let $\mathcal{K}$ be the field $K[x, y] \bmod (\phi_i(x, y), g(y))$, which is the ring of polynomials in $x$ with coefficients in the root field of $g(y)$ modulo the irreducible polynomial $\phi_i(x, y)$ with coefficients in this root field. (As before, $K$ denotes the field of rational functions in $c_1, c_2, \ldots, c_\nu$.) Since $f(x)$ is 0 as a polynomial with coefficients in $\mathcal{K}$ (because $f(x)$ is divisible by $\phi_i(x, y) \bmod g(y)$), and since $\mathcal{F}(z, t, u) \equiv 0 \bmod (f(x), g(y))$ as was shown in Essay 1.6, $\mathcal{F}(z, t, u)$ is zero as a polynomial with coefficients in $\mathcal{K}$. Therefore at least one of its factors $\mathcal{F}_j(z, t, u)$ must be zero as a polynomial with coefficients in $\mathcal{K}$. For any such value of $j$, $\mathcal{F}_j(x + uy, 1, u)$ must be zero as a polynomial in $u$ with coefficients in $\mathcal{K}$. In particular, $\psi_j(x, y)$ must be zero as an element of $\mathcal{K}$, which is to say that $\psi_j(x, y)$ is divisible by $\phi_i(x, y) \bmod g(y)$. But $\psi_j(x, y)$
is a unit times $\phi_j(x, y)$ and the irreducible factors $\phi_j(x, y)$ of $f(x) \bmod g(y)$ are distinct because $f(x)$ is irreducible. Therefore, $\psi_j(x, y)$ is not divisible by $\phi_i(x, y) \bmod g(y)$ unless $j = i$, and the proposition follows.
Essay 1.8 Proof of the Fundamental Theorem
As before, $R$ will denote the ring $\mathbf{Z}[c_1, c_2, \ldots, c_\nu]$ of polynomials in $c_1, c_2, \ldots, c_\nu$ with integer coefficients and $K$ will denote its field of quotients, the field of rational functions of $c_1, c_2, \ldots, c_\nu$. When $\nu = 0$, $R$ is the ring of integers and $K$ is the field of rational numbers. The theorem to be proved was stated in Essay 1.2:

Fundamental Theorem. Given a polynomial $f(x) = a_0x^n + a_1x^{n-1} + \cdots + a_n$ of positive degree $n$ with coefficients in $R$, construct a monic, irreducible polynomial $g(y)$ with coefficients in $R$ with the property that $f(x)$ is a product of linear factors with coefficients in the root field of $g(y)$.

In other words, when the factors of $f(x) \bmod g(y)$ are taken to be monic in $x$, the factorization is to have the form $f(x) \equiv a_0(x - \rho_1(y))(x - \rho_2(y)) \cdots (x - \rho_n(y)) \bmod g(y)$, where $a_0$ is the leading coefficient of $f(x)$ and the $\rho_i(y)$ are elements of the root field of $g(y)$. Such a polynomial $g(y)$ will be said to split $f(x)$. As the proposition at the end of Essay 1.4 implies, the roots $\rho_i(y)$ are determined, as elements of the root field of $g(y)$, by $f(x)$. Loosely speaking, the root field of $g(y)$ extends computations in $K$ in such a way that the given $f(x)$ with coefficients in $R$ splits into linear factors.

The factorization algorithm of the preceding essays, which assumes that $f(x)$ is monic and irreducible, can be used to factor an arbitrary $f$ by taking the change of variable $x_1 = a_0x$ and writing $a_0^{n-1}f(x) = x_1^n + a_1x_1^{n-1} + a_0a_2x_1^{n-2} + \cdots + a_0^{j-1}a_jx_1^{n-j} + \cdots + a_0^{n-1}a_n$. A factorization of $a_0^{n-1}f(x)$ as a polynomial in $x_1$ becomes a factorization of $f(x)$ as a polynomial in $x$ when it is divided by the nonzero element $a_0^{n-1}$ of $K$ and the substitution $x_1 = a_0x$ is made. In this way, the theorem is reduced to the case in which $f(x)$ is monic.
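The reduction to the monic case can be sketched in a few lines. This is only an illustration for integer coefficients (the case in which $R$ is the ring of integers); the function names `monic_in_x1` and `evaluate` are hypothetical helpers introduced here, not notation from the essays.

```python
def monic_in_x1(coeffs):
    """Given [a0, a1, ..., an] for f(x) = a0*x^n + ... + an with a0 != 0,
    return the coefficients of a0^(n-1) * f(x) written as a polynomial
    in x1 = a0*x, namely [1, a1, a0*a2, ..., a0^(n-1)*an]."""
    a0 = coeffs[0]
    if a0 == 0:
        raise ValueError("leading coefficient must be nonzero")
    return [1] + [a0 ** (j - 1) * coeffs[j] for j in range(1, len(coeffs))]


def evaluate(coeffs, x):
    """Evaluate a polynomial given by its coefficients (highest degree first)."""
    value = 0
    for c in coeffs:
        value = value * x + c  # Horner's rule
    return value


# f(x) = 2x^2 + 3x + 1 becomes x1^2 + 3*x1 + 2 with x1 = 2x.
f = [2, 3, 1]
monic = monic_in_x1(f)
n = len(f) - 1
x = 5
# Check the identity a0^(n-1) * f(x) = monic(a0 * x) at one sample point.
same = f[0] ** (n - 1) * evaluate(f, x) == evaluate(monic, f[0] * x)
```

Here $x_1^2 + 3x_1 + 2 = (x_1 + 1)(x_1 + 2)$, and substituting $x_1 = 2x$ and dividing by $a_0^{n-1} = 2$ gives $f(x) = (2x + 1)(x + 1)$.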
The iteration theorem below proves this case of the theorem using the factorization algorithm for monic, irreducible polynomials, which obviously implies a factorization algorithm for arbitrary monic polynomials. This theorem differs from Kronecker's theorem in Ein Fundamentalsatz der allgemeinen Arithmetik [42] in that it specifies that the splitting field is to be described as the root field of $g(y)$, whereas Kronecker left the form of the description open and in fact preferred a "prime module system" of an altogether different type. Nor is the proof below similar to Kronecker's, which constructed specific relations satisfied by the roots in a splitting field. Instead, it constructs a splitting polynomial $g(y)$ for $f(x)$ in an iterative way that follows the naive proof sketched at the beginning of Essay 1.4.

Iteration Theorem. Given a monic polynomial $f(x)$ with coefficients in $R$, and given a monic, irreducible polynomial $g(y)$ with coefficients in $R$ that does not split $f(x)$, construct a monic, irreducible polynomial $h(z)$ with coefficients in $R$ for which the factorization of $f(x) \bmod h(z)$ contains more linear factors than does the factorization of $f(x) \bmod g(y)$.
Proof. The factorization of $f(x) \bmod g(y)$ is accomplished by applying the factorization algorithm to each of the monic, irreducible factors of $f(x)$ and taking the product of the results. By assumption, at least one of the irreducible factors of $f(x) \bmod g(y)$ obtained in this way has degree greater than 1. With the notation as before, at least one of the polynomials $\mathcal{F}(z, t, u)$ used in the factorization of $f(x) \bmod g(y)$ (there is an $\mathcal{F}$ for each irreducible factor of $f(x)$) must, by assumption, have at least one factor $\mathcal{F}_i(z, t, u)$ that gives rise to a monic factor $\phi_i(x, y)$ of $f(x) \bmod g(y)$ of degree greater than 1. Let $\mathcal{F}_1(z, t, u)$ be such a factor, and let
at most $\mu_1 n - \sigma$ in $u$, so $\det \mathcal{N}(u)$ has degree at most $1 + 2 + \cdots + (\mu_1 n - 1) = \frac{1}{2}\mu_1 n(\mu_1 n - 1)$ in $u$. If $\det \mathcal{N}(u)$ were zero, there would be* a nontrivial solution $v_1(u), v_2(u), \ldots, v_{\mu_1 n}(u)$ (polynomials with coefficients in $K$) of the $\mu_1 n \times \mu_1 n$ homogeneous system of linear congruences $v_1(u)(x + uy)^{\mu_1 n - 1} + v_2(u)(x + uy)^{\mu_1 n - 2} + \cdots + v_{\mu_1 n}(u) \equiv 0 \bmod (\phi_1(x, y), g(y))$. (This single congruence is equivalent to $\mu_1 n$ congruences, one for each row $x^\alpha y^\beta$ of $C_1$.) In other words, there would be a nonzero polynomial $F(z, u)$ with coefficients in $K$, and therefore one with coefficients in $R$, whose degree in $z$ was less than $\mu_1 n = \deg \mathcal{F}_1$ and for which reduction of $F(x + uy, u) \bmod (\phi_1(x, y), g(y))$ gave zero.

Application of the Euclidean algorithm to $\mathcal{F}_1(z, 1, u)$ and $F(z, u)$ as polynomials in $z$ with coefficients in the field of quotients of $K[u]$ would give polynomials $\alpha(z)$ and $\beta(z)$ in $z$ with coefficients in this field for which $d(z) = \alpha(z)\mathcal{F}_1(z, 1, u) + \beta(z)F(z, u)$ was a common factor of $\mathcal{F}_1(z, 1, u)$ and $F(z, u)$, say $\mathcal{F}_1(z, 1, u) = d(z)q_1(z)$ and $F(z, u) = d(z)q_2(z)$. There would be a polynomial $\Delta(u)$ in $u$ with coefficients in $R$ that cleared the denominators in all three equations, say $D(z, u) = A(z, u)\mathcal{F}_1(z, 1, u) + B(z, u)F(z, u)$, $\Delta(u)\mathcal{F}_1(z, 1, u) = D(z, u)Q_1(z, u)$ and $\Delta(u)F(z, u) = D(z, u)Q_2(z, u)$, where $\deg_z D = \deg d$. Since $\mathcal{F}_1(z, 1, u)$ is irreducible, it would be a factor either of $D(z, u)$ or of $Q_1(z, u)$. It cannot be a factor of $D(z, u)$, because this would imply that it was a factor of $\Delta(u)F(z, u)$, contrary to $\deg_z F < \mu_1 n = \deg_z \mathcal{F}_1$. Nor can it be a factor of $Q_1(z, u)$, because then $D(z, u)$ would divide $\Delta(u)$ and therefore be independent of $z$, contrary to $D(x + uy, u) = A(x + uy, u)\mathcal{F}_1(x + uy, 1, u) + B(x + uy, u)F(x + uy, u) \equiv 0 \bmod (\phi_1(x, y), g(y))$. Therefore, $\det \mathcal{N}(u) \ne 0$.

Given an integer $a$, consider the homomorphism $\iota$ from $K[z]$ to $\mathcal{K}$ that carries $z$ to $x + ay$ and carries elements of $K$ to themselves. Since $\iota$ carries $h_a(z) = \mathcal{F}_1(z, 1, a)$ to $\mathcal{F}_1(x + ay, 1, a)$, which represents the zero† element of $\mathcal{K}$, $\iota$ defines a homomorphism from $K[z] \bmod h_a(z)$ to $\mathcal{K}$. The matrix of coefficients of $\iota$ relative to the basis $x^\alpha y^\beta$ of $\mathcal{K}$ and the basis $z^\gamma$ ($0 \le \gamma < \mu_1 n$) of $K[z] \bmod h_a(z)$ is $\mathcal{N}(a)$. If its determinant is nonzero, then $\iota$ is an isomorphism. In this case, because $\mathcal{K}$ is a field, $K[z] \bmod h_a(z)$ is a field, which implies that $h_a(z)$ is irreducible over $K$ and therefore, by Gauss's lemma (see Essay 1.4), is irreducible over $R$. The degree of $\det \mathcal{N}(u)$ is at most $\frac{1}{2}\mu_1 n(\mu_1 n - 1) < (\mu_1 n)^2 = (\deg \mathcal{F}_1)^2$, so $\det \mathcal{N}(u)$ is zero for fewer than $(\deg \mathcal{F}_1)^2$ integers $a$, and the lemma follows.

* If the rank of $\mathcal{N}(u)$ is $\mu_1 n - 1$, any nonzero row of the adjoint matrix of $\mathcal{N}(u)$ is such a solution. Otherwise, choose an $(r + 1) \times (r + 1)$ subsystem of the original $\mu_1 n \times \mu_1 n$ system whose rank $r$ is the same as that of the original system. A nonzero row of the adjoint of the matrix of coefficients of the chosen subsystem, filled out with zeros, is a nontrivial solution of the original system.

† In fact, according to the proposition of Essay 1.7, each coefficient of $\mathcal{F}_1(x + uy, 1, u)$ as a polynomial in $u$ is zero mod $(\phi_1(x, y), g(y))$.

Proof of the Iteration Theorem. Suppose that $h(z) = h_a(z) = \mathcal{F}_1(z, 1, a)$, where $a$ is chosen in such a way that $h$ is irreducible, and let $\mathcal{K}$ be the root
field of $h(z)$. It is to be shown that the factorization of $f(x) \bmod h(z)$ contains more linear factors than does the factorization of $f(x) \bmod g(y)$. In other words, it is to be shown that the root field of $h(z)$ contains more roots of $f(x)$ than does the root field of $g(y)$. As was shown in the proof of the lemma, $K[z] \bmod h(z)$ is isomorphic to $\mathcal{K}$, which is by definition $K[x, y] \bmod (g(y), \phi_1(x + ay, y))$. Since this last field contains a root $y$ of $g(y)$, it contains a field isomorphic to $K[y] \bmod g(y)$, so it certainly contains at least as many roots of $f(x)$ as $K[y] \bmod g(y)$ does. But $f(x) \equiv \phi_1(x, y)\phi_2(x, y) \cdots \phi_k(x, y) \bmod g(y)$ implies that $f(x + ay) \equiv 0 \bmod (g(y), \phi_1(x + ay, y))$, so $\mathcal{K}$ contains a root of $f(x)$, namely the element represented by $x + ay$, that is not in the subfield corresponding to $K[y] \bmod g(y)$ (because it is reduced mod $(g(y), \phi_1(x + ay, y))$ and contains $x$). Thus, $f(x)$ has at least one more root in the new field than it did in the old, as was to be shown.

Proof of the fundamental theorem. Given $f(x)$, start with $g(y) = y$ and apply the following algorithm. If $g(y)$ splits $f(x)$, the algorithm terminates. Otherwise, use the iteration theorem to construct $h(z)$ such that the factorization of $f(x) \bmod h(z)$ has more linear factors than does the factorization of $f(x) \bmod g(y)$. Replace $g(y)$ with $h(z)$ and repeat. Since the number of linear factors of $f(x) \bmod g(y)$ increases with each step and can never exceed $\deg f$, the algorithm must terminate after at most $\deg f$ steps with a polynomial that splits $f(x)$.
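The control flow of this termination argument can be sketched as follows. This is only a schematic, not an implementation of the factorization algorithm: the callbacks `count_linear_factors` and `iterate` are hypothetical stand-ins for "count the linear factors of $f(x) \bmod g(y)$" and "apply the iteration theorem", and the toy usage below replaces polynomials by plain integers.

```python
def find_splitting_polynomial(deg_f, start, count_linear_factors, iterate):
    """Repeat the iteration step until the current polynomial splits f.

    deg_f                -- degree of f(x)
    start                -- initial polynomial, g(y) = y in the text
    count_linear_factors -- hypothetical callback: number of linear factors
                            of f(x) mod the given polynomial
    iterate              -- hypothetical callback playing the role of the
                            iteration theorem
    Returns the splitting polynomial and the number of steps taken.
    """
    g = start
    steps = 0
    while count_linear_factors(g) < deg_f:
        h = iterate(g)
        # The iteration theorem guarantees a strict increase.
        if count_linear_factors(h) <= count_linear_factors(g):
            raise RuntimeError("iterate() must add at least one linear factor")
        g = h
        steps += 1
    return g, steps


# Toy stand-ins: a "polynomial" is just the integer count of linear factors,
# and each iteration happens to add two of them (never more than deg f).
result, steps = find_splitting_polynomial(
    deg_f=4,
    start=0,
    count_linear_factors=lambda g: g,
    iterate=lambda g: min(g + 2, 4),
)
```

Because the count rises by at least one per pass and is bounded by `deg_f`, the loop runs at most `deg_f` times, which is the bound stated in the proof.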
Essay 1.9 Minimal Splitting Polynomials
The theorem of the preceding essay puts the statement "every polynomial has a splitting field" in a very specific and concrete form: Given a polynomial $f(x)$ in one indeterminate with coefficients in $R = \mathbf{Z}[c_1, c_2, \ldots, c_\nu]$, the iterative algorithm that proves the fundamental theorem in the last essay constructs a monic, irreducible polynomial in one indeterminate $g(y)$, with coefficients in the same ring $R$, with the property that all factors of the factorization of $f(x) \bmod g(y)$ are linear in $x$.

The splitting field of a polynomial, as opposed to a splitting field, is the field generated by the roots. In other words, it is the field implicit in the assertion (see Essay 1.2) that there is a valid way to do rational computations with the roots of a polynomial. A specific and concrete description of the splitting field of $f(x)$ is given by an amended version of the theorem of the preceding essay:

Theorem. Given a polynomial $f(x) = a_0x^n + a_1x^{n-1} + \cdots + a_n$ with coefficients in $R$, construct a monic, irreducible polynomial $g(y)$ with coefficients in $R$ that splits $f(x)$ and is itself split by any polynomial that splits $f(x)$.

In particular, $g(y)$ splits itself. Galois wrote,* "... every equation depends on an auxiliary equation with the property that all the roots of this new equation are rational functions of one another," which is to say that the polynomial in the new equation splits itself. He went on to write, "... this remark is a mere curiosity; in fact, an equation which has this property is not in general any easier to solve than any other," but it is hard to understand how Galois could call his observation a "mere curiosity," because, as Essay 2.1 will explain, his brilliant insight into the algebraic solution of equations is based on the existence of such an "auxiliary equation" and on the fact that the solution of such an equation can be analyzed using the automorphisms of its splitting field. A polynomial that splits itself is called a Galois polynomial.

Proof. The construction that proved the theorem of the preceding essay in fact proves this stronger theorem, because the polynomial it constructs is split by any polynomial that splits $f(x)$. Since any polynomial that splits $f(x)$ also splits the identity polynomial $g(y) = y$, the proof of this statement comes down to proving the following statement: Let a monic, irreducible polynomial with coefficients in $R$ split both $f(x)$ and $g(y)$; then it also splits any polynomial $h(z)$ constructed by the iteration theorem of the preceding essay. Since $h(z)$ has the form $\mathcal{F}_1(z, 1, a)$ for some integer $a$, it will suffice to prove that any polynomial $g_1(v)$ (monic, irreducible, and with coefficients in $R$) that splits the monic, irreducible polynomials $f(x)$ and $g(y)$ also splits the polynomial $\mathcal{F}(z, t, u)$ that is constructed from $f(x)$ and $g(y)$ by the factorization algorithm in Essay 1.5.

* Quoted from my translation of Galois's memoir in Appendix 1 of [18].
Let $\mathcal{K}$ denote the root field of $g_1(v)$. Since $g_1(v)$ splits $f(x)$, $\mathcal{K}$ contains $m = \deg f$ distinct (because $f$ is irreducible, it is relatively prime to its derivative, so it has no multiple roots) roots of $f$; call them $a_1, a_2, \ldots, a_m$. Similarly, $\mathcal{K}$ contains $n = \deg g$ distinct roots of $g$; call them $b_1, b_2, \ldots, b_n$. That $g_1(v)$ splits $\mathcal{F}(z, t, u)$ will be proved by proving that $\mathcal{F}(z, t, u) = \prod (z - a_it - b_ju)$, where the product is over all $mn$ pairs $(i, j)$ in which $1 \le i \le m$ and $1 \le j \le n$. The number $mn$ of factors on the right is the degree of the homogeneous polynomial on the left, and they are distinct (to say that two polynomials of the form $z - a_it - b_ju$ are equal means that their coefficients are equal, which occurs only when both $i$ and $j$ have the same values in them), so the formula will be proved if it is proved that each $z - a_it - b_ju$ is a factor of $\mathcal{F}(z, t, u)$. In other words, what is to be shown is that $\mathcal{F}(a_it + b_ju, t, u) = 0$. To put it yet another way, the determinant of the $mn \times mn$ matrix $(a_it + b_ju)I_{mn} - tM_x - uM_y$ is to be shown to be zero for all pairs $(i, j)$.

Here $M_x$ is by definition the $mn \times mn$ matrix of elements of $R$ for which $M_xC \equiv Cx \bmod (f(x), g(y))$, where $C$ is a column matrix of length $mn$ whose entries are $x^\alpha y^\beta$ for $0 \le \alpha < m$, $0 \le \beta < n$. Thus, each of the $mn$ entries of the column matrix $M_xC$ differs from the corresponding entry of $Cx$ by a polynomial of the form $\phi(x, y)f(x) + \psi(x, y)g(y)$. Because $a_i$ is a root of $f(x)$ and $b_j$ is a root of $g(y)$, it follows that $M_xC_{ij}$ and $C_{ij}a_i$ are equal as matrices whose entries are in $\mathcal{K}$, where $C_{ij}$ is $C$ with $a_i$ substituted for $x$ and $b_j$ substituted for $y$. In the same way, $M_yC_{ij} = C_{ij}b_j$. Therefore $((a_it + b_ju)I_{mn} - tM_x - uM_y)C_{ij} = t(a_iI_{mn} - M_x)C_{ij} + u(b_jI_{mn} - M_y)C_{ij} = 0$. Since $1 = a_i^0b_j^0$ is one of the entries of $C_{ij}$, the determinant of $(a_it + b_ju)I_{mn} - tM_x - uM_y$ must be zero, as was to be shown.
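The key identity $((a_it + b_ju)I_{mn} - tM_x - uM_y)C_{ij} = 0$ can be checked numerically in a small case. The sketch below is an illustration only: it assumes monic $f$ and $g$ with integer coefficients, approximates the roots by floating-point numbers, and uses the hypothetical helper names `multiplication_matrices` and `residual`, which do not come from the essays.

```python
import math


def multiplication_matrices(f, g):
    """f = [f_0, ..., f_(m-1)] encodes the monic x^m + f_(m-1) x^(m-1) + ... + f_0,
    and likewise g in y. Returns (Mx, My) relative to the basis x^alpha y^beta,
    ordered by index alpha * n + beta, so that Mx C = C x mod (f(x), g(y))."""
    m, n = len(f), len(g)
    size = m * n
    index = lambda alpha, beta: alpha * n + beta
    Mx = [[0] * size for _ in range(size)]
    My = [[0] * size for _ in range(size)]
    for alpha in range(m):
        for beta in range(n):
            row = index(alpha, beta)
            if alpha + 1 < m:            # x * x^alpha is still a basis power
                Mx[row][index(alpha + 1, beta)] = 1
            else:                        # reduce x^m using f(x) = 0
                for k in range(m):
                    Mx[row][index(k, beta)] = -f[k]
            if beta + 1 < n:
                My[row][index(alpha, beta + 1)] = 1
            else:                        # reduce y^n using g(y) = 0
                for k in range(n):
                    My[row][index(alpha, k)] = -g[k]
    return Mx, My


def residual(f, g, a, b, t, u):
    """Largest absolute entry of ((a t + b u) I - t Mx - u My) C_ab,
    where C_ab is the column C with a substituted for x and b for y."""
    m, n = len(f), len(g)
    Mx, My = multiplication_matrices(f, g)
    C = [a ** alpha * b ** beta for alpha in range(m) for beta in range(n)]
    worst = 0.0
    for row in range(m * n):
        image = sum((t * Mx[row][col] + u * My[row][col]) * C[col]
                    for col in range(m * n))
        worst = max(worst, abs((a * t + b * u) * C[row] - image))
    return worst


# f(x) = x^2 - 2 and g(y) = y^2 - 3, with roots +-sqrt(2) and +-sqrt(3).
f, g = [-2, 0], [-3, 0]
residuals = [residual(f, g, sa * math.sqrt(2), sb * math.sqrt(3), 1.5, -0.7)
             for sa in (1, -1) for sb in (1, -1)]
```

For each of the four root pairs the residual is zero up to rounding error, whereas a value that is not a root of $f$ (for example $a = 1$) leaves a residual that is clearly nonzero.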
A polynomial that splits $f(x)$ and is split by any other polynomial that splits $f(x)$ is a minimal splitting polynomial of $f(x)$. There are infinitely many minimal splitting polynomials of $f(x)$, but only one splitting field. In other words, the root fields of two minimal splitting polynomials are isomorphic, as follows from the fact that each polynomial splits the other. This observation is the key to the notion of the Galois group of $f(x)$, which is introduced in the first essay of Part 2.
Topics in Algebra
Essay 2.1 Galois's Fundamental Theorem

Théorème. Soit une équation donnée, dont a, b, c, ... sont les m racines. Il y aura toujours un groupe de permutations des lettres a, b, c, ... qui jouira de la propriété suivante: 1° que toute fonction des racines, invariables par les substitutions de ce groupe, soit rationellement connue; 2° réciproquement, toute fonction des racines, déterminable rationellement, soit invariable par ces substitutions.
E. Galois [27] (English translation [18, p. 104])

A Galois polynomial is a monic, irreducible polynomial that splits itself, and a Galois field is the root field of a Galois polynomial. Here the Galois polynomial, call it $g$, is assumed to have its coefficients in a ring of the form $R = \mathbf{Z}[c_1, c_2, \ldots, c_\nu]$, the polynomials with integer coefficients in some set $c_1, c_2, \ldots, c_\nu$ of indeterminates, so the associated Galois field extends the field of quotients $K$ of $R$, which is the field of rational functions in $c_1, c_2, \ldots, c_\nu$.

As is explained in Essay 1.3, computations in the Galois field are done with expressions of the form $p(y)/q$, where $p(y)$ is a polynomial in $y$ with coefficients in $R$ whose degree is less than $\deg g$, and $q$ is a nonzero element of $R$. Such an expression can also be regarded as a polynomial in $y$ with coefficients in $K$, in which case the notation $K[y] \bmod g(y)$ becomes a natural one for the root field of $g(y)$.

Because $g$ splits itself, $g(x) \equiv \prod_{i=1}^{n}(x - \rho_i(y)) \bmod g(y)$, where $n$ is the degree of $g$ and $\rho_1(y), \rho_2(y), \ldots, \rho_n(y)$ are elements of the field. The polynomials $\rho_i(y)$ represent distinct elements of the root field of $g(y)$. When they are reduced mod $g(y)$, they are determined by $g(y)$. One of them is $y$.

Substitution of one of the roots $\rho_i(y)$ in place of $y$ gives an automorphism of $K[y] \bmod g(y)$. (Since $g(\rho_i(y)) \equiv 0 \bmod g(y)$, $\phi(y) \equiv \psi(y) \bmod g(y)$ implies $\phi(\rho_i(y)) \equiv \psi(\rho_i(y)) \bmod g(y)$, so substitution of $\rho_i(y)$ for $y$ gives a homomorphism of $K[y] \bmod g(y)$ to itself. It is represented by an $n \times n$ matrix
of elements of $K$, so to prove that it is one-to-one and onto it will suffice to prove $\phi(\rho_i(y)) \equiv 0 \bmod g(y)$ only when $\phi(y) \equiv 0 \bmod g(y)$. But if $\phi(y) \not\equiv 0 \bmod g(y)$, then $\phi(y)$ has a reciprocal mod $g(y)$, so $\phi(\rho_i(y))$ has a reciprocal mod $g(y)$ and is therefore not zero mod $g(y)$.) Conversely, since an automorphism of the field must carry the root $y$ of $g$ to another root of $g$, there are precisely $n$ automorphisms of $K[y] \bmod g(y)$, and an automorphism is determined by the root $\rho_i(y)$ of $g$ to which it carries $y$. The group of these automorphisms is the Galois group of the field. More generally, the Galois group of a polynomial $f(x)$ is the Galois group of a minimal splitting polynomial of $f(x)$ (see Essay 1.9).

Fig. 2.1. Galois.

The modern version of the fundamental theorem of Galois theory states that the subgroups of the Galois group of a given Galois polynomial $g(y)$ correspond one-to-one to subfields of the Galois field $K[y] \bmod g(y)$ that contain $K$. This is nearly Galois's statement of his Proposition 1. Galois was thinking of the field as being built up by a succession of what he called "adjunctions" (see Essay 2.3), and the main problem was to determine, after certain adjunctions had been made, which elements of the field could then be "determined rationally." He said (see above) that there is a subgroup of the group of automorphisms that leaves fixed the elements that can be determined rationally and leaves only these fixed. (He described the automorphisms in terms of the way that they permute the roots of some polynomial of which the Galois field is the splitting field. For him, the elements of the Galois field were rational functions of the roots, so an automorphism of the field was tantamount to the permutation of the roots that it effected.) In more modern terminology, he was saying that the subfield generated by the adjoined elements $x_1$, $x_2$, $\ldots$,
$x_m$ contains exactly the elements that are left fixed by the automorphisms that leave all of the $x$'s fixed:

Galois's Theorem. Let $x_1, x_2, \ldots, x_m$ be elements of a Galois field $K[y] \bmod g(y)$. The obvious necessary condition for another element $z$ of the field to be expressible as a polynomial in $x_1, x_2, \ldots, x_m$ with coefficients in $K$ (namely, that $z$ be unmoved by any automorphism of the field that leaves each of $x_1, x_2, \ldots, x_m$ unmoved) is also sufficient.

Proof. Suppose first that $x_1, x_2, \ldots, x_m$ are all in $K$, or, what is effectively the same, that $m = 0$. Then all automorphisms leave the $x$'s unmoved, and what is to be shown is that all automorphisms leave $z$ unmoved only if $z$ is in $K$. Let $z$ be expressed as a polynomial in $y$ of degree less than $n = \deg g$ with coefficients in $K$, say $z(y) = a_1y^{n-1} + a_2y^{n-2} + \cdots + a_n$. Consider the polynomial $C(X) = a_1X^{n-1} + a_2X^{n-2} + \cdots + a_n - z(y)$, a polynomial in the indeterminate $X$ with coefficients in the Galois field. (All coefficients but the constant term $a_n - z(y)$ are in the smaller field $K$.) To say that substitution of any root $\rho_i(y)$ in place of $y$ in $z(y)$ does not change $z(y) \bmod g(y)$ is to say that $a_1(\rho_i(y))^{n-1} + a_2(\rho_i(y))^{n-2} + \cdots + a_n - z(y) = z(\rho_i(y)) - z(y) \equiv 0 \bmod g(y)$. In other words, each $\rho_i(y)$ is a root of $C(X)$ in the Galois field. Since the degree of $C(X)$ is less than $n$, the fact that it has $n$ distinct roots in the root field of $g(y)$ (that is, $n$ distinct linear factors $X - \rho_i(y)$) implies* that it is zero. In particular, its constant term $a_n - z(y)$ must be zero, so $z(y) = a_n$ is in $K$, as was to be shown.

* See the first footnote of Essay 1.4.

Suppose next that $m = 1$, and let $x(y)$ denote the one given field element. Let the list of images $x(\rho_i(y))$ of $x(y)$ under the Galois group consist of $\sigma$ distinct field elements, each occurring $\tau = n/\sigma$ times, and let $\phi(X) = \prod (X - x(\rho_i(y)))$ be the polynomial of degree $\sigma$ in the indeterminate $X$ with coefficients in the Galois field obtained by taking the product of the distinct factors $X - x(\rho_i(y))$. For any $j = 1, 2, \ldots, n$, changing $y$ to $\rho_j(y)$ in $\phi(X)$ merely permutes the factors and does not change $\phi(X)$. Therefore, each coefficient of $\phi(X)$ is unchanged by the automorphisms in the Galois group, so, by the case $m = 0$ that has already been proved, $\phi(X)$ is a polynomial with coefficients in $K$. Therefore, an element $a_0$ in $R$ can be found for which $a_0\phi(X)$ has coefficients in $R$. If $x(y)$ is replaced by $a_0x(y)$, the new $x(y)$ is a root of the monic polynomial $a_0^{\sigma}\phi(X/a_0)$, and an element of the Galois field can be expressed as a polynomial in the old $x(y)$ with coefficients in $K$ if and only if it can be expressed as a polynomial in the new one with coefficients in $K$. Therefore, one can assume without loss of generality that $\phi(X)$ is monic in $X$ with coefficients in $R$. It is irreducible, because a monic factor of $\phi(X) = \prod (X - x(\rho_i(y)))$ over $K$ is a product of some subset of its factors $X - x(\rho_i(y))$ over the Galois field, so, if it is not 1, it has at least one $x(\rho_i(y))$ as a root, which implies that it has all $x(\rho_i(y))$ as roots (because an automorphism carries roots to roots, and the automorphisms act
transitively on the $\rho_i(y)$) and is therefore $\phi(X)$ itself. If $g(Y)$ is the Galois polynomial that defines the field, the statement that $\phi(X)$ has $\sigma = \deg \phi$ roots in the Galois field is the statement that $g(Y)$ splits $\phi(X)$. Therefore, by the Kronecker-Kneser theorem,* the factorization of $g(Y) \bmod \phi(X)$ consists of $\sigma$ factors, each of degree $\tau = n/\sigma$, say $g(Y) \equiv \prod \psi_i(Y, X) \bmod \phi(X)$, where each $\psi_i$ is a polynomial in two indeterminates with coefficients in $K$. Since $\phi(x(y)) \equiv 0 \bmod g(y)$, $g(Y) \equiv \prod \psi_i(Y, x(y)) \bmod g(y)$ (the difference of $g(Y)$ and $\prod \psi_i(Y, X)$ is a multiple of $\phi(X)$ so it is a polynomial in $X$ and $Y$ in which the substitution of $x(y)$ for $X$ gives a polynomial in $Y$ and $y$ that is zero mod $g(y)$), so each $\psi_i(Y, x(y))$ has precisely $\tau$ distinct roots $\rho_i(y) \bmod g(y)$, and the roots of $g \bmod g(y)$ are partitioned in this way into $\sigma$ sets of $\tau$ each. Let $\psi_1(Y, X)$ be the factor for which $\psi_1(y, x(y)) \equiv 0 \bmod g(y)$.

Suppose now that $z(y)$ is unchanged by all automorphisms $y \mapsto \rho_i(y)$ that leave $x(y)$ unmoved; i.e., $z(\rho_i(y)) \equiv z(y) \bmod g(y)$ for every $i$ for which $x(\rho_i(y)) \equiv x(y) \bmod g(y)$. Division of $z(Y)$ by $\psi_1(Y, X)$ (where capital letters are used to emphasize that the division is division of a polynomial in $Y$ with coefficients in $K$ by a monic polynomial in $Y$ with coefficients in $K[X]$ and has nothing to do with operations in the Galois field) gives an equation $z(Y) = q(Y, X)\psi_1(Y, X) + r(Y, X)$, where the degree of the remainder $r(Y, X)$ in $Y$ is less than $\tau$. Since $\psi_1(\rho_i(y), x(y)) \equiv 0 \bmod g(y)$ for each of the $\tau$ values of $i$ for which $x(\rho_i(y)) \equiv x(y) \bmod g(y)$ (because the automorphism $y \mapsto \rho_i(y)$ carries $\psi_1(y, x(y)) \bmod g(y)$ to $\psi_1(\rho_i(y), x(y))$ and $\psi_1(y, x(y)) \equiv 0 \bmod g(y)$ by the definition of $\psi_1(Y, X)$), and since by assumption $z(\rho_i(y)) \equiv z(y) \bmod g(y)$ for each of them, $r(\rho_i(y), x(y))$ is the same element of the Galois field for each of $\tau$ distinct values of $i$, which is to say that $r(Y, x(y))$, as a polynomial in $Y$ with coefficients in the Galois field, has the same value $z(y)$ for $\tau$ distinct values $\rho_i(y)$ of $Y$. Because the degree of $r$ in $Y$ is less than $\tau$, the argument given in the case $m = 0$ shows that $r(Y, x(y)) - z(y)$ is the zero polynomial (in $Y$ with coefficients in the Galois field), which gives the required expression $z(y) = r(0, x(y))$ of $z(y)$ as a polynomial in $x(y)$ with coefficients in $K$.

Finally, the general case follows from the case $m = 1$ just proved once one proves the lemma below, because a polynomial in $u_1x_1 + u_2x_2 + \cdots + u_mx_m$, where $u_1, u_2, \ldots, u_m$ are integers, can obviously be expressed as a polynomial in $x_1, x_2, \ldots, x_m$.
* See Essay 1.7.

Lemma (Theorem of the primitive element). Let $x_1, x_2, \ldots, x_m$ be elements of a Galois field $K[y] \bmod g(y)$. Construct integers $u_1, u_2, \ldots, u_m$ for which the automorphisms of the Galois field that leave all of $x_1, x_2, \ldots, x_m$ unmoved coincide with the automorphisms that leave $x = u_1x_1 + u_2x_2 + \cdots + u_mx_m$ unmoved.

Proof. Consider first the case $m = 2$. Let $\nu$ be the number of automorphisms $y \mapsto \rho_i(y)$ for which $x_1(\rho_i(y)) \ne x_1(y)$. For each of these $\nu$ automorphisms,
let $z_i$ be the element $\dfrac{x_2(y) - x_2(\rho_i(y))}{x_1(\rho_i(y)) - x_1(y)}$ of the Galois field, and let $u$ be an integer that is not equal to any such $z_i$. (The number of $z$'s is at most $n - 1$, so there is sure to be such an integer $u$ among $1, 2, \ldots, n$.) Then $x = ux_1 + x_2$ has the required property, because an automorphism that does not move $x$ cannot move $x_1$ (by the choice of $u$, $u \cdot x_1(\rho_i(y)) + x_2(\rho_i(y)) \ne u \cdot x_1(y) + x_2(y)$ when $x_1(\rho_i(y)) \ne x_1(y)$) and therefore cannot move $x_2$ (because $u \cdot x_1(\rho_i(y)) + x_2(\rho_i(y)) = u \cdot x_1(y) + x_2(y)$ and $x_1(\rho_i(y)) = x_1(y)$ imply $x_2(\rho_i(y)) = x_2(y)$).

If the lemma is true for $m$, it is true for $m + 1$, because one can use the inductive hypothesis first to find integers $u_1, u_2, \ldots, u_m$ such that the only automorphisms that leave $u_1x_1 + u_2x_2 + \cdots + u_mx_m$ unmoved also leave all of $x_1, x_2, \ldots, x_m$ unmoved, and then use the case $m = 2$ to find integers $U_1$ and $U_2$ such that the only automorphisms that leave $U_1(u_1x_1 + u_2x_2 + \cdots + u_mx_m) + U_2x_{m+1}$ unmoved leave both $u_1x_1 + u_2x_2 + \cdots + u_mx_m$ and $x_{m+1}$ unmoved and therefore leave all $x$'s unmoved.
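The case $m = 2$ of the lemma can be tried out on a toy example. The sketch below is an illustration under stated assumptions, not part of the essays: it models the Galois field generated by $\sqrt{2}$ and $\sqrt{3}$ over the rationals, represents an element $a + b\sqrt{2} + c\sqrt{3} + d\sqrt{6}$ by the integer tuple `(a, b, c, d)`, and searches $u = 1, 2, \ldots, n$ directly instead of computing the excluded values $z_i$. All function names are hypothetical.

```python
from itertools import product

# The four automorphisms are the sign changes sqrt2 -> s*sqrt2, sqrt3 -> t*sqrt3.
AUTOMORPHISMS = list(product((1, -1), repeat=2))


def apply(automorphism, element):
    """Image of a + b*sqrt2 + c*sqrt3 + d*sqrt6 under the sign change (s, t)."""
    s, t = automorphism
    a, b, c, d = element
    return (a, s * b, t * c, s * t * d)


def fixes(automorphism, element):
    return apply(automorphism, element) == element


def combine(u, x1, x2):
    """The element u*x1 + x2."""
    return tuple(u * p + q for p, q in zip(x1, x2))


def find_multiplier(x1, x2):
    """Smallest u in 1..n such that every automorphism fixing u*x1 + x2
    also fixes both x1 and x2."""
    n = len(AUTOMORPHISMS)
    for u in range(1, n + 1):
        x = combine(u, x1, x2)
        if all(fixes(s, x1) and fixes(s, x2)
               for s in AUTOMORPHISMS if fixes(s, x)):
            return u
    raise ValueError("no suitable u in 1..n; this would contradict the lemma")


# x1 = sqrt2 and x2 = sqrt3 - sqrt2: u = 1 gives sqrt3, which is fixed by the
# automorphism that moves sqrt2, so the search has to go on to u = 2.
u = find_multiplier((0, 1, 0, 0), (0, -1, 1, 0))
```

With $u = 2$ the combination is $\sqrt{2} + \sqrt{3}$, which only the identity leaves unmoved, in agreement with the lemma's guarantee that a suitable $u$ exists among $1, 2, \ldots, n$.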
2 Topics in Algebra
Essay 2.2 Algebraic Quantities
. . . the deep meaning of Kronecker's view, according to which the absolutely algebraic fields [finite fields or algebraic number fields of finite degree] are the natural ground-fields of algebraic geometry, at any rate as long as purely algebraic methods are being used.
André Weil [62]
Kronecker asserted, in substance, in §2 of his treatise Grundzüge einer arithmetischen Theorie der algebraischen Grössen [39] that the field of rational functions in any finite set of algebraic quantities is isomorphic to a root field as that term is defined in Essay 1.3. Because the term "algebraic quantity" is vague, this is not an assertion that can be proved. However, it can be deduced from a few natural assumptions about "algebraic quantities" as follows.
Suppose first that just one "algebraic quantity" $q$ is involved. For any polynomial $F(X)$ with integer coefficients, $F(q)$ should be a meaningful algebraic quantity, because one must surely be able to add, subtract, and multiply algebraic quantities. Moreover, one should be able to determine when two such polynomials in $q$ have the same value, or, what is the same, able to determine which such polynomials in $q$ are zero. If only the zero polynomial $F(X)$ satisfies $F(q) = 0$, then the field of rational functions in $q$ is simply the field of rational functions in a single indeterminate, which is the root field of the polynomial $y - c_1$ with coefficients in $\mathbf{Z}[c_1]$. Otherwise, there is an irreducible polynomial $F(X)$ with integer coefficients for which $F(q) = 0$, provided one makes the natural assumption that a product of "algebraic quantities" can be zero only when one of the factors is zero. (Of course, $F(q) = 0$ implies that $F$ is not a unit.) Let $F(q) = a_0q^n + a_1q^{n-1} + \cdots + a_n$ be zero, where the coefficients $a_i$ are integers and $F(y)$ is irreducible, and let $f$ be the monic polynomial with integer coefficients defined by $f(y) = a_0^{n-1}F(y/a_0)$.
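The passage from $F$ to the monic polynomial $f(y) = a_0^{n-1}F(y/a_0)$ is easy to carry out on coefficient lists: the coefficient of $y^{n-i}$ in $f$ is $a_ia_0^{i-1}$. The following sketch uses hypothetical helper names and a sample polynomial chosen only for illustration; it also checks that $a_0q$ is a root of $f$ when $q$ is a root of $F$.

```python
from fractions import Fraction

def monic_transform(coeffs):
    """Map the coefficients [a0, a1, ..., an] of F (highest degree first)
    to those of f(y) = a0^(n-1) * F(y / a0), which is monic."""
    a0 = coeffs[0]
    if a0 == 0:
        raise ValueError("leading coefficient must be nonzero")
    # The coefficient of y^(n-i) is a_i * a0^(i-1); for i = 0 this is 1.
    return [1] + [a * a0 ** (i - 1) for i, a in enumerate(coeffs) if i >= 1]

def evaluate(coeffs, x):
    """Horner evaluation for coefficients listed highest degree first."""
    value = 0
    for a in coeffs:
        value = value * x + a
    return value

# Sample input (an assumption for illustration): F(X) = 2X^2 - 3X + 1,
# which has the root q = 1/2.
F = [2, -3, 1]
f = monic_transform(F)      # y^2 - 3y + 2
q = Fraction(1, 2)
```

Here $a_0q = 1$ is a root of $f(y) = y^2 - 3y + 2$, in line with the observation that rational functions of $q$ and of $a_0q$ are the same.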
Then, because rational functions of $q$ are rational functions of $a_0q$ and conversely, and because $a_0q$ is a root of $f$, the field of rational functions in $q$ is isomorphic to the root field of $f(y)$, which completes the "proof" of Kronecker's assertion in the case of a single algebraic quantity.
Suppose now that the assertion is true for a set $q_1, q_2, \ldots, q_m$ of $m$ algebraic quantities and let an $(m+1)$st algebraic quantity $q_{m+1}$ be given. Let $\Omega$ and $\Omega_0$ denote the fields of rational functions in $q_1, q_2, \ldots, q_{m+1}$ and in $q_1, q_2, \ldots, q_m$, respectively. By the inductive hypothesis, $\Omega_0$ is isomorphic to the root field of some monic, irreducible polynomial $g(y)$ with coefficients in $\mathbf{Z}[c_1, c_2, \ldots, c_\nu]$ for some $\nu$. An element of $\Omega$ can be expressed as a quotient of polynomials in $q_{m+1}$ with coefficients in $\Omega_0$. As before, there are two cases, depending on whether there is a nonzero polynomial $F(X)$ with coefficients in $\Omega_0$ for which $F(q_{m+1}) = 0$. If not, $\Omega$ is the field of quotients of the ring $\mathbf{Z}[c_1, c_2, \ldots, c_\nu, q_{m+1}] \bmod g(y)$, which is the root field of $g(y)$ when it is regarded as a polynomial with coefficients in $\mathbf{Z}[c_1, c_2, \ldots, c_{\nu+1}]$ and $c_{\nu+1}$ is identified with $q_{m+1}$. If so, then $q_{m+1}$ is a root of a polynomial with coefficients in the root
field of $g(y)$, say $F(q, y) = 0$, where $F$ is a polynomial in two indeterminates with coefficients in $\mathbf{Z}[c_1, c_2, \ldots, c_\nu]$ and $y$ is a root of $g(y)$ in a splitting field of $g$; then $q$ is a root of $\mathcal{F}(X) = \prod_{i=1}^n F(X, y_i)$, where $n$ is the degree of $g$ and the $y_i$ are the roots of $g$ in the splitting field, so that $\mathcal{F}(X)$ has coefficients that are symmetric polynomials in $y_1, y_2, \ldots, y_n$ and are therefore in $\mathbf{Z}[c_1, c_2, \ldots, c_\nu]$. The given field $\Omega$ is then isomorphic to a subfield of the splitting field of $g(X)\mathcal{F}(X)$, namely, the subfield obtained by adjoining first a root $y$ of $g$ and then a root $q$ of $F(X, y)$. By the theorem of the primitive element, this field is the root field of a monic, irreducible polynomial with coefficients in $\mathbf{Z}[c_1, c_2, \ldots, c_\nu]$. In either case, then, $\Omega$ is a root field, as was to be shown.
The definition of an "algebraic quantity" is problematic, because algebraic quantities by their very nature do not exist in isolation; they are "things on which rational computations can be performed," and therefore are items in entire systems of computation, which is to say, elements of entire fields. All in all, it seems best to adopt Kronecker's later view (see Essay 1.1) and to abandon the notion of algebraic quantities in favor of a formulation of the subject in terms of "general arithmetic" and to make the above assertion a definition: An algebraic field is a field that can be described as the field of quotients of an integral domain of the form $\mathbf{Z}[c_1, c_2, \ldots, c_\nu] \bmod F$, where $F$ is a nonconstant irreducible element of $\mathbf{Z}[c_1, c_2, \ldots, c_\nu]$. (If $F$ is constant and irreducible, this field of quotients is the field of rational functions in $c_1, c_2, \ldots, c_\nu$ with coefficients in the field with $p$ elements for some prime $p$, a field that does not really seem to merit the name "algebraic field." The field of rational functions in $c_1, c_2, \ldots$
, $c_\nu$ with integer coefficients is an algebraic field; for example, it is the field of quotients of $\mathbf{Z}[c_1, c_2, \ldots, c_\nu, c_{\nu+1}] \bmod c_{\nu+1}$.)
Note that there is no stipulation that the irreducible polynomial be monic in one of its indeterminates. Such a stipulation would involve the indeterminates asymmetrically. As was seen above, one can adjoin a root of a polynomial $F(X) = a_0X^n + a_1X^{n-1} + \cdots + a_n$ by adjoining a root of the monic polynomial $f(X) = a_0^{n-1}F(X/a_0)$, from which it follows easily that every algebraic field can be presented as the root field of an irreducible element of $\mathbf{Z}[c_1, c_2, \ldots, c_\nu]$ that is monic and of positive degree in $c_i$ for some $i$.
A set of elements $u_1, u_2, \ldots, u_k$ of an algebraic field is algebraically independent if no nonzero polynomial $\phi$ in $k$ indeterminates with integer coefficients satisfies $\phi(u_1, u_2, \ldots, u_k) = 0$. If the polynomial $F$ that determines an algebraic field involves $c_\nu$, then the indeterminates $c_1, c_2, \ldots, c_{\nu-1}$ represent algebraically independent elements of the algebraic field. (The construction of this field can be described as the adjunction of a root of a monic irreducible polynomial to the field of rational functions of $c_1, c_2, \ldots, c_{\nu-1}$.) Therefore, such an algebraic field always contains a set of $\nu - 1$ algebraically independent elements. The transcendence degree of an algebraic field is the maximum number of elements in an algebraically independent subset. The following proposition implies that an algebraic field defined by some irreducible $F$ in $\mathbf{Z}[c_1, c_2, \ldots, c_\nu]$ has transcendence degree $\nu - 1$:
Proposition. Let a nonconstant irreducible element $F$ of $\mathbf{Z}[c_1, c_2, \ldots, c_\nu]$ be given. Assume that $F$ involves $c_\nu$ and let $u$ be an element of the algebraic field $F$ determines. For a given number $l < \nu$, one can determine whether the elements $c_1, c_2, \ldots, c_l, u$ are algebraically independent, and if they are, one can describe the algebraic field as being determined by an irreducible element $F_1$ of $\mathbf{Z}[d_1, d_2, \ldots, d_\nu]$, where the first $l + 1$ indeterminates $d_1, d_2, \ldots, d_{l+1}$ are identified with $c_1, c_2, \ldots, c_l, u$.
Proof. Assume without loss of generality that $F$ is monic in $c_\nu$. Every element of the algebraic field can be written in exactly one way in the form $\sum_{i=1}^N \eta_i(c_1, c_2, \ldots, c_{\nu-1})c_\nu^{N-i}$, where $N$ is the degree of $F$ in $c_\nu$ and the $\eta_i(c_1, c_2, \ldots, c_{\nu-1})$ are rational functions in the first $\nu - 1$ indeterminates. When the $N + 1$ elements $1, u, u^2, \ldots, u^N$ of the field are written in this form, one can construct a linear dependence of these elements of the field over the field of rational functions in $c_1, c_2, \ldots, c_{\nu-1}$, and from this one can construct a polynomial in $\nu$ indeterminates with integer coefficients in which substitution of $c_1, c_2, \ldots, c_{\nu-1}, u$, regarded as elements of the given root field, results in zero. One of the irreducible factors of this polynomial, call it $G$, must have the same property. This $G$ is determined, up to sign, by $u$. Since $c_1, c_2, \ldots, c_l, u$ are algebraically independent if and only if at least one of the indeterminates $c_{l+1}, c_{l+2}, \ldots, c_{\nu-1}$ occurs in $G$, what is to be shown is that when this is the case, the given algebraic field can be described as stated in the proposition. Suppose $c_h$ occurs in $G$ for some $h > l$. Assume without loss of generality that $h = \nu - 1$. The field of quotients of $\mathbf{Z}[u, c_1, c_2, \ldots, c_{\nu-1}] \bmod G$ is an algebraic field that can be identified with a subfield of the given algebraic field.
It can also be described as the field obtained by adjoining $c_{\nu-1}$ to the field of rational functions in $u, c_1, c_2, \ldots, c_{\nu-2}$. Since the given algebraic field can then be obtained by adjoining $c_\nu$ to this field, the desired conclusion follows from the theorem of the primitive element, which implies that the given algebraic field, which can be obtained by two successive adjunctions to the field of rational functions in $u, c_1, c_2, \ldots, c_{\nu-2}$, can be obtained by a single adjunction.
Thus, given $k$ algebraically independent elements of an algebraic field, one can successively alter the presentation of the field so that the first $k$ indeterminates of the polynomial that describes it represent these $k$ field elements. In particular, because the $\nu$ indeterminates of a defining relation do satisfy an algebraic relation, $k$
Essay 2.3 Adjunctions and the Factorization of Polynomials
Cela posé, nous appellerons rationnelle toute quantité qui s'exprimera en fonction rationnelle des coefficients de l'équation et d'un certain nombre de quantités adjointes à l'équation et convenues arbitrairement.
E. Galois, [27] (English translation, [18, p. 101])
Loosely speaking, the assertion of the preceding essay states that any algebraic computation can be regarded as taking place inside the root field of some polynomial. Since the theorem of Part 1 implies that every such field is a subfield of a Galois field (the root field of a Galois polynomial), every algebraic computation can be regarded as taking place inside a Galois field. For example, the factorization of $f(x) \bmod g(y)$, where $f$ and $g$ are monic and irreducible, can be described, now that the theorem of Part 1 and Galois's fundamental theorem have been proved, in the following way: In a Galois field in which $fg$ splits, adjoin to* $K$ a root $b$ of $g$ to obtain a subfield $K(b)$ of the Galois field that contains a root of $g$. The factorization problem is to determine the irreducible factors of $f$ when elements of $K(b)$ are permitted as coefficients. Over the Galois field, $f(x)$ is a product of linear factors, say $f(x) = \prod (x - a_i)$. Because $f(x)$ is irreducible, it is relatively prime to its derivative, which implies that the roots $a_i$ of $f(x)$ are distinct. The automorphisms of the Galois field that leave $b$ unmoved partition these roots into orbits, two roots being in the same orbit if and only if there is an automorphism that carries one to the other without moving $b$. Thus, $f(x) = \prod \phi_j(x)$, where the factors $\phi_j(x)$ on the right are the products of the factors $x - a_i$ over all roots $a_i$ in one orbit. Each factor $\phi_j(x)$ of $f(x)$ found in this way is unchanged by the automorphisms that leave $b$ fixed (such automorphisms merely permute the factors of $\phi_j(x)$), so by Galois's theorem each $\phi_j(x)$ can be expressed in the form $\phi_j(x, b)$, where $\phi_j$ is a polynomial in two indeterminates with coefficients in $K$ that is monic in $x$. Each $\phi_j(x, b)$ is irreducible over $K(b)$, because any monic factor other than 1 over $K(b)$ must be divisible by at least one of the monic factors $x - a_i$ of $\phi_j(x, b)$ over $K$ and therefore must, by the definition of $\phi_j(x, b)$, be divisible by all such factors, which implies that it is divisible by $\phi_j(x, b)$ (because its linear factors are distinct). In conclusion, $f(x) = \prod \phi_j(x, b)$ is the unique factorization of $f(x)$ into monic factors irreducible over $K(b)$.
* As before, $K$ is the field of quotients of the ring $R = \mathbf{Z}[c_1, c_2, \ldots, c_\nu]$ in which $f$ and $g$ have their coefficients.
Similarly, the factorization $\mathcal{F}(z, t, u) = \prod (z - a_it - b_ju)$, where the product is over all $mn$ pairs $(a_i, b_j)$ in which $a_i$ is a root of $f$ and $b_j$ is a root of $g$ in a splitting field for $fg$ (see Essay 1.9), implies that the factorization of $\mathcal{F}$ into factors irreducible over $R$ is obtained by grouping† together factors $z - a_it - b_ju$ that are in the same orbit under the action of the Galois group.
† Here the notion of "grouping" is very close to Galois's original use of the word "group" in a similar context.
The Galois
group acts transitively on the $b_j$ (because $g$ is irreducible), and the $a_i$ for which $z - a_it - b_ju$ are in a given orbit for any one $b_j$ are, as was just shown, the roots $a_i$ of one of the irreducible factors $\phi_k(x, b_j)$ of $f(x)$ when $b_j$ is adjoined. From this it follows that the factorization $f(x) \equiv \prod \phi_k(x, y) \bmod g(y)$ implies that the factorization $\mathcal{F}(z, t, u) = \prod \mathcal{F}_k(z, t, u)$ of $\mathcal{F}$ into factors irreducible over $R$ is found by defining the factors $\mathcal{F}_k(z, t, u)$ to be $\prod (z - a_it - b_ju)$, where, for a given $k$, the product is over all roots $b_j$ of $g$ and, for each $b_j$, over all roots $a_i$ of $f$ for which $\phi_k(a_i, b_j) = 0$. Thus, the degree of $\mathcal{F}_k$ is $\mu_kn$, where $\mu_k = \deg_x \phi_k$. Moreover, $\mathcal{F}_k(x + uy, 1, u) = \prod ((x - a_i) + u(y - b_j))$ when the product is over all $\mu_kn$ pairs $(a_i, b_j)$ for which $\phi_k(a_i, b_j) = 0$. When $\mathcal{F}_k(x + uy, 1, u)$ is expanded as a polynomial in $u$ whose coefficients are polynomials in $x$ and $y$ with coefficients in $R$, the terms of degree greater than $\mu_kn - \mu_k$ in $u$ are zero mod $g(y)$, because, as is clear from the product representation, each is a sum of terms, each of which contains all $n$ factors $y - b_j$ of $g(y)$. (The portion $x - a_i$ of fewer than $\mu_k$ factors is used in forming the term, and each $b_j$ occurs in $\mu_k$ factors, so all distinct portions $u(y - b_j)$ are used at least once.) By the same token, the coefficient $\psi_k(x, y)$ of $u^{\mu_k(n-1)}$ in $\mathcal{F}_k(x + uy, 1, u)$ (in the notation of Essay 1.5) contains only $n$ terms that are nonzero mod $g(y)$; explicitly,
$$\psi_k(x, y) \equiv \sum_{j=1}^n \phi_k(x, b_j)\left(\frac{g(y)}{y - b_j}\right)^{\mu_k} \bmod g(y).$$
Therefore, $\psi_k(x, y)$ has degree $\mu_k$ in $x$ and its leading coefficient is
$$\sum_{j=1}^n \left(\frac{g(y)}{y - b_j}\right)^{\mu_k} \equiv \left(\sum_{j=1}^n \frac{g(y)}{y - b_j}\right)^{\mu_k} \equiv g'(y)^{\mu_k} \bmod g(y).$$
(The first step follows from $\frac{g(y)}{y - b_j} \cdot \frac{g(y)}{y - b_k} \equiv 0 \bmod g(y)$ when $j \ne k$, and the second follows from the observation that $\sum_j \frac{g(y)}{y - b_j}$ has degree $n - 1$ and agrees with $g'(y)$ when $y = b_j$.) Similarly, $\mathcal{F}(x + uy, 1, u) \equiv f(x)g'(y)^mu^{m(n-1)} + \cdots \bmod g(y)$, which implies $f(x)g'(y)^m \equiv \prod \psi_k(x, y) \bmod g(y)$. By unique factorization mod $g(y)$, it follows that each $\psi_k(x, y)$ is of the form $g'(y)^{\mu_k}\phi_l(x, y) \bmod g(y)$ for some $l$. That $l = k$ follows from the observation that if $(a_i, b_j)$ is a pair for which $\phi_l(a_i, b_j) = 0$, then substitution of $(a_i, b_j)$ for $(x, y)$ in $\psi_k(x, y) \equiv \sum_{j=1}^n \phi_k(x, b_j)\left(\frac{g(y)}{y - b_j}\right)^{\mu_k} \bmod g(y)$ gives 0 only when $k = l$, so $\phi_l(x, y)$ cannot divide $\psi_k(x, y) \bmod g(y)$ unless $k = l$. Therefore, $\psi_k(x, y) \equiv g'(y)^{\mu_k}\phi_k(x, y) \bmod g(y)$, as the algorithm of Essay 1.5 asserts.
Galois theory can be used to describe computations in a subfield $K(\alpha_1, \alpha_2, \ldots, \alpha_n)$ of a Galois field in the following way: First, if $\mu_1$ is the number of distinct images of $\alpha_1$ under the Galois group, then the product of the linear polynomials $x - S\alpha_1$ over all distinct images $S\alpha_1$ of $\alpha_1$ under the Galois group is a monic polynomial in $x$ with coefficients in $K$ that is irreducible over $K$ by the argument above. Call it $f_1(x)$. Then $f_1(\alpha_1) = 0$ provides a relation
that can be used to replace any element of the Galois field expressed as a polynomial in $\alpha_1$ with coefficients in $K$, say $\psi(\alpha_1)$, by another expression of the same element of the same form in which the degree of the polynomial is less than $\mu_1$; one has only to divide $\psi(\alpha_1)$ by $f_1(\alpha_1)$ regarded as a monic polynomial with coefficients in $K$ to find $\psi(\alpha_1) = q(\alpha_1)f_1(\alpha_1) + r(\alpha_1)$ and then to note that $\psi(\alpha_1) = r(\alpha_1)$ as elements of the Galois field.* In this way, the relation $f_1(\alpha_1) = 0$ can be used to find the unique representation of any element of $K(\alpha_1)$ as a polynomial in $\alpha_1$ with coefficients in $K$ of degree less than $\mu_1$.
* Alternatively, one can iteratively replace $\alpha_1^{\mu_1}$ with $\alpha_1^{\mu_1} - f_1(\alpha_1)$ until the degree is less than $\mu_1 = \deg f_1$.
Similarly, if $\mu_2$ is the number of distinct images of $\alpha_2$ under the elements of the Galois group that leave $\alpha_1$ unmoved, the product of the $\mu_2$ distinct linear polynomials of the form $x - S\alpha_2$, where $S\alpha_2$ is the image of $\alpha_2$ under an element $S$ of the Galois group that leaves $\alpha_1$ unmoved, is a monic polynomial $f_2(\alpha_1, x)$ of degree $\mu_2$ with coefficients in the field $K(\alpha_1)$ that is irreducible over this field and that satisfies $f_2(\alpha_1, \alpha_2) = 0$. Using the relation $f_2(\alpha_1, \alpha_2) = 0$ one can (division with remainder by a monic polynomial) find, for any polynomial in $\alpha_1$ and $\alpha_2$ with coefficients in $K$, another polynomial representing the same element of $K(\alpha_1, \alpha_2)$ whose degree in $\alpha_2$ is less than $\mu_2$. By the irreducibility of $f_2$, the only way that a polynomial in $\alpha_1$ and $\alpha_2$ of degree less than $\mu_2$ in $\alpha_2$ can be zero is for the coefficient of each power of $\alpha_2$ to be zero. Division by $f_1(\alpha_1)$ to reduce the degree in $\alpha_1$ does not increase the degree in $\alpha_2$ and proves that each element of $K(\alpha_1, \alpha_2)$ is represented by one and only one polynomial in $\alpha_1$ and $\alpha_2$ with coefficients in $K$ whose degree in $\alpha_1$ is less than $\mu_1$ and whose degree in $\alpha_2$ is less than $\mu_2$. Continuation of this process leads to the following description of the Galois field:
Proposition. Given elements $\alpha_1, \alpha_2, \ldots, \alpha_n$ of a Galois field $K[y] \bmod g(y)$, and given an element $\beta$ of this Galois field that is unmoved by the elements of the Galois group that leave all of $\alpha_1, \alpha_2, \ldots, \alpha_n$ unmoved, express $\beta$ as a polynomial in $\alpha_1, \alpha_2, \ldots, \alpha_n$ with coefficients in $K$ whose degree in $\alpha_i$ for each $i$ is less than the number of distinct images of $\alpha_i$ under elements of the Galois group that leave all of $\alpha_1, \alpha_2, \ldots, \alpha_{i-1}$ unmoved.
This polynomial is determined by $\beta$. It will be called the representation in canonical form of $\beta$ as an element of the subfield $K(\alpha_1, \alpha_2, \ldots, \alpha_n)$ of the Galois field generated by $\alpha_1, \alpha_2, \ldots, \alpha_n$ over $K$. Computations in $K(\alpha_1, \alpha_2, \ldots, \alpha_n)$ can be done by dealing with polynomials in canonical form; addition is the usual addition of polynomials, while multiplication is the usual multiplication of polynomials followed by reduction to canonical form. The reduction of a polynomial to canonical form uses $n$ relations $f_1(\alpha_1) = 0$, $f_2(\alpha_1, \alpha_2) = 0$, $\ldots$, $f_n(\alpha_1, \alpha_2, \ldots, \alpha_n) = 0$ that will be called adjunction relations. The relation $f_i = 0$ can be described either as $\alpha_i^{\mu_i} = \phi_i(\alpha_1, \alpha_2, \ldots, \alpha_i)$, where $\phi_i(\alpha_1, \alpha_2, \ldots, \alpha_i)$ is the expression of $\alpha_i^{\mu_i}$ in canonical form, or as the statement that $x = \alpha_i$ is a root of the
polynomial $\prod (x - S\alpha_i)$, where the product is over all distinct images of $\alpha_i$ under elements $S$ of the Galois group that leave $\alpha_1, \alpha_2, \ldots, \alpha_{i-1}$ fixed and where the coefficients of this polynomial are written in canonical form. The adjunction relations are naturally used in reverse order: $f_n = 0$ is used to reduce the degree in $\alpha_n$, then $f_{n-1} = 0$ is used to reduce the degree in $\alpha_{n-1}$ without increasing the degree in $\alpha_n$, then $f_{n-2} = 0$ is used to reduce the degree in $\alpha_{n-2}$ without increasing the degree in either $\alpha_n$ or $\alpha_{n-1}$, and so forth, to end with a polynomial in canonical form.
The theorem of the primitive element (Essay 2.1) implies, of course, that the field $K(\alpha_1, \alpha_2, \ldots, \alpha_n)$ described in this way can be described by a single adjunction $K(\beta)$ with a single adjunction relation $f(\beta) = 0$. However, a construction of a field by a number of simple adjunctions may be preferable to a single adjunction, because it may describe the field more simply. For example, the classical question of whether a given equation can be solved by radicals simply asks whether a splitting field for a given polynomial $f(x)$ can be described by adjunction relations of the special form $\alpha_i^{\mu_i} = \phi_i(\alpha_1, \alpha_2, \ldots, \alpha_{i-1})$ in which the right side does not involve $\alpha_i$.
Example 1. The splitting field of $f(x) = x^4 - x^2 - 1$. This polynomial is irreducible over $\mathbf{Z}$, because a factorization would have the form $(x^2 + Ax + 1)(x^2 + Bx - 1)$, and $A + B = 0$, $-1 + AB + 1 = -1$, and $-A + B = 0$ would all hold. The factorization of $f(x) \bmod f(y)$ can be found by the elementary calculation
$$x^4 - x^2 - 1 \equiv x^4 - x^2 - y^4 + y^2 = (x^2 - y^2)(x^2 + y^2) - (x^2 - y^2) = (x - y)(x + y)(x^2 + y^2 - 1) \bmod (y^4 - y^2 - 1).$$
Thus, adjunction of a root $a$ of $f(x)$ to $\mathbf{Q}$ gives $f(x)$ either 2 or 4 roots, depending on whether $x^2 + a^2 - 1$ splits over $\mathbf{Q}(a)$. If one assumes that $f(x)$ is not a Galois polynomial, which is to say that $x^2 + y^2 - 1$ is irreducible mod $y^4 - y^2 - 1$, this factorization of $f(x) \bmod f(y)$ implies that the adjunction relations
$$a^4 = a^2 + 1, \quad b = -a, \quad c^2 = 1 - a^2, \quad d = -c$$
describe the splitting field. These relations show that the permutation $(ab)$ is in the Galois group, because they easily imply the relations $b^4 = b^2 + 1$, $a = -b$, $c^2 = 1 - b^2$, $d = -c$ that result when $a$ and $b$ are interchanged and $c$ and $d$ are unmoved. Similarly, the four-cycle $(acbd)$ is in the Galois group because the adjunction relations also imply all the relations $c^4 = c^2 + 1$, $d = -c$, $b^2 = 1 - c^2$, $a = -b$ obtained when the roots of $f(x)$ in the splitting field are permuted in this way. These two permutations generate a group of 8 motions (a dihedral group). Since the field described by the adjunction relations obviously has degree 8 over $\mathbf{Q}$, these 8 motions account for the entire Galois group.
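Reduction to canonical form with these adjunction relations can be mechanized. The sketch below is an illustration only, with hypothetical names: an element of the splitting field is stored as a dictionary mapping exponent pairs $(i, j)$ to the coefficient of $a^ic^j$, the relation $c^2 = 1 - a^2$ is applied first and $a^4 = a^2 + 1$ second (the reverse order described above), and the relations implied by the four-cycle $(acbd)$ are then checked.

```python
def reduce(poly):
    """Bring {(i, j): coeff}, meaning the sum of coeff * a^i * c^j,
    to canonical form: degree < 4 in a and degree < 2 in c."""
    poly = dict(poly)
    while True:
        bad = next((m for m in poly if m[0] >= 4 or m[1] >= 2), None)
        if bad is None:
            return {m: k for m, k in poly.items() if k != 0}
        i, j = bad
        k = poly.pop(bad)
        if j >= 2:
            # c^2 = 1 - a^2 lowers the degree in c first ...
            pieces = [((i, j - 2), k), ((i + 2, j - 2), -k)]
        else:
            # ... then a^4 = a^2 + 1 lowers the degree in a
            # without raising the degree in c.
            pieces = [((i - 2, j), k), ((i - 4, j), k)]
        for m, v in pieces:
            poly[m] = poly.get(m, 0) + v

def add(p, q):
    out = dict(p)
    for m, k in q.items():
        out[m] = out.get(m, 0) + k
    return reduce(out)

def mul(p, q):
    out = {}
    for (i1, j1), k1 in p.items():
        for (i2, j2), k2 in q.items():
            m = (i1 + i2, j1 + j2)
            out[m] = out.get(m, 0) + k1 * k2
    return reduce(out)

def scale(k, p):
    return reduce({m: k * v for m, v in p.items()})

ONE = {(0, 0): 1}
A = {(1, 0): 1}        # the root a
C = {(0, 1): 1}        # the root c
B = scale(-1, A)       # b = -a
D = scale(-1, C)       # d = -c

C2 = mul(C, C)         # canonical form of c^2, namely 1 - a^2
C4 = mul(C2, C2)       # canonical form of c^4, namely 2 - a^2
```

Because canonical forms are unique, two elements are equal exactly when their dictionaries are equal, so a relation such as $c^4 = c^2 + 1$ is checked by comparing `C4` with `add(C2, ONE)`.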
As for the proof that $f(x)$ is not a Galois polynomial, which comes down to the statement that $1 - a^2$ is not a square in the field $\mathbf{Q}(a)$ obtained by adjoining a root of $a^4 - a^2 - 1$ to the rationals, the means to do it are given by the algorithms of Part 1, but the computations they require are not easily done with pencil and paper. As is shown in Essay 1.6, $x^2 + y^2 - 1$ is irreducible mod $y^4 - y^2 - 1$ if and only if the determinant of $(zI_4 - uG)^2 + G^2 - I_4$ is irreducible in $\mathbf{Z}[z, u]$, where $G$ is the $4 \times 4$ matrix whose first 3 rows are the last 3 rows of $I_4$ and whose last row is 1, 0, 1, 0. This determinant, call it $\mathcal{F}_3(z, 1, u)$, is
$$z^8 + (-2u^2 - 2)z^6 + (-u^4 + 16u^2 - 1)z^4 + (2u^6 - 12u^4 - 12u^2 + 2)z^2 + (u^8 + 6u^6 + 11u^4 + 6u^2 + 1).$$
It is irreducible because $\mathcal{F}_3(z, 1, 2) = z^8 - 10z^6 + 47z^4 - 110z^2 + 841$ is in fact irreducible. (The proof can be accomplished using the primitive methods of Essay 1.4, but of course a computer algebra package will find the answer much more quickly.) Therefore, the algorithms of Part 1 lead to the conclusion that $\mathcal{F}_3(z, 1, 2)$ is a minimal splitting polynomial of $x^4 - x^2 - 1$.
An easier way to prove that $f(x)$ is not a Galois polynomial, by an altogether different method, is to observe that $f(x)$ has the real root
$$\sqrt{\frac{1 + \sqrt5}{2}},$$
so that $\mathbf{Q}(a)$ can be embedded in the field of real numbers; relative to this embedding, $1 - a^2$ is a negative number, so it is not a square in $\mathbf{Q}(a)$.
Once the adjunction relations are known, the computation of $\mathcal{F}_3(z, 1, 2)$ can be accomplished as follows: This polynomial is the polynomial of which $a + 2c$ and its conjugates (a set of 8 distinct elements of the splitting field) are the roots. Therefore, it is the product of the eight conjugates of $z - a - 2c$ under the Galois group. The product of this linear polynomial and its conjugate $z - c - 2a$ under $(ac)(bd)$ is $z^2 - 3(a + c)z + 5ac + 2$. The motion $(ab)(cd)$ carries $z - a - 2c$ to $z + a + 2c$ and $z - c - 2a$ to $z + c + 2a$ and their product to $z^2 + 3(a + c)z + 5ac + 2$; thus, the product of all four factors is
$$z^4 + (10ac + 4 - 9(a + c)^2)z^2 + (5ac + 2)^2 = z^4 - (5 + 8i)z^2 - 21 + 20i,$$
where use is made of the identity $a^2c^2 = a^2(1 - a^2) = -a^4 + a^2 = -1$ to write $i$ for $ac$. Since $(ab)$ carries $i$ to $-i$, the product of all 8 conjugates is therefore the norm of $z^4 - (5 + 8i)z^2 - 21 + 20i$, which is the polynomial $z^8 - 10z^6 + 47z^4 - 110z^2 + 841$ found above.
The explicit factorization of $x^4 - x^2 - 1$ in the field $\mathbf{Q}[z] \bmod (z^8 - 10z^6 + 47z^4 - 110z^2 + 841)$ can be found using a method described by Galois (see [18, §37]). Because $(X - b)(X - c)(X - d) = (X^4 - X^2 - 1)/(X - a) = X^3 + aX^2 + (a^2 - 1)X + (a^3 - a)$, a polynomial relation between $a$ and $t = a + 2c$ is given by
$$0 = (t - a - 2c)(t - a - 2b)(t - a - 2d) = 8\left(\tfrac{t-a}{2} - b\right)\left(\tfrac{t-a}{2} - c\right)\left(\tfrac{t-a}{2} - d\right)$$
$$= 8\left(\left(\tfrac{t-a}{2}\right)^3 + a\left(\tfrac{t-a}{2}\right)^2 + (a^2 - 1)\tfrac{t-a}{2} + a^3 - a\right) = (t - a)^3 + 2a(t - a)^2 + 4(a^2 - 1)(t - a) + 8(a^3 - a)$$
$$= 5a^3 + 3ta^2 - (t^2 + 4)a + t^3 - 4t.$$
In other words, $a$ is a root of the polynomial $5Y^3 + 3tY^2 - (t^2 + 4)Y + t^3 - 4t$ in $Y$ with
coefficients in the splitting field. Since it is also a root of $Y^4 - Y^2 - 1$, it is a root of the greatest common divisor of these two polynomials, which can be found using the Euclidean algorithm. Explicitly, if $A = Y^4 - Y^2 - 1$ and $B = 5Y^3 + 3tY^2 - (t^2 + 4)Y + t^3 - 4t$, then $a$ is a root of $A$, $B$,
$$C = YB - 5A = 3tY^3 - (t^2 - 1)Y^2 + (t^3 - 4t)Y + 5,$$
$$D = 3tB - 5C = (14t^2 - 5)Y^2 - (8t^3 - 8t)Y + (3t^4 - 12t^2 - 25),$$
$$E = \left(3tYD - (14t^2 - 5)C\right)/5 = (-2t^4 + t^2 + 1)Y^2 + (-t^5 + 5t^3 - 19t)Y - 14t^2 + 5,$$
and
$$F = \left((-2t^4 + t^2 + 1)D - (14t^2 - 5)E\right)/3t = (10t^6 - 33t^4 + 97t^2 - 29)Y + (-2t^7 + 9t^5 + 79t^3 - 59t).$$
Thus, $a$ is expressed rationally in terms of $t$ as
$$a = \frac{2t^7 - 9t^5 - 79t^3 + 59t}{10t^6 - 33t^4 + 97t^2 - 29},$$
and so are $b = -a$, $c = \frac{t - a}{2}$, and $d = -c$. The explicit splitting is therefore
$$x^4 - x^2 - 1 \equiv \left(x - \frac{2t^7 - 9t^5 - 79t^3 + 59t}{10t^6 - 33t^4 + 97t^2 - 29}\right)\left(x + \frac{2t^7 - 9t^5 - 79t^3 + 59t}{10t^6 - 33t^4 + 97t^2 - 29}\right)$$
$$\times \left(x - \frac{4t^7 - 12t^5 + 88t^3 - 44t}{10t^6 - 33t^4 + 97t^2 - 29}\right)\left(x + \frac{4t^7 - 12t^5 + 88t^3 - 44t}{10t^6 - 33t^4 + 97t^2 - 29}\right) \bmod (t^8 - 10t^6 + 47t^4 - 110t^2 + 841).$$
To express the four roots as polynomials in $t$ with rational coefficients, one must 'rationalize the denominator' $10t^6 - 33t^4 + 97t^2 - 29$, which is simple in theory but lies beyond the range of hand computation. Doing the algebra on a computer (I used Maple), one can obtain
$$(674t^6 - 6363t^4 - 7501t^2 + 179117)(10t^6 - 33t^4 + 97t^2 - 29) = (6740t^4 - 18472t^2 - 301153)(t^8 - 10t^6 + 47t^4 - 110t^2 + 841) + 248075280.$$
Multiplication of numerator and denominator of $a$ by $674t^6 - 6363t^4 - 7501t^2 + 179117$ and simplification then gives
$$a = \frac{7t^7 + 220t^5 - 1846t^3 + 9003t}{18966}.$$
Because $c = \frac{t - a}{2}$, the factorization
$$2^2 \times 18966^4(x^4 - x^2 - 1) \equiv \left(18966x - (7t^7 + 220t^5 - 1846t^3 + 9003t)\right) \times \left(18966x + (7t^7 + 220t^5 - 1846t^3 + 9003t)\right)$$
$$\times \left(2 \cdot 18966x - (7t^7 + 220t^5 - 1846t^3 - 9963t)\right) \times \left(2 \cdot 18966x + (7t^7 + 220t^5 - 1846t^3 - 9963t)\right) \bmod (t^8 - 10t^6 + 47t^4 - 110t^2 + 841)$$
then follows easily.
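Two of the integer-polynomial identities used in Example 1 can be rechecked with nothing more than list arithmetic. The sketch below (hypothetical helper names; polynomials written in $T = t^2$ or $T = z^2$ with the lowest degree first) verifies that the norm of $z^4 - (5 + 8i)z^2 - 21 + 20i$ is the octic splitting polynomial and that the rationalizing identity holds with remainder 248075280. It does not replace the field computations that produce the numerator of $a$.

```python
def poly_mul(p, q):
    """Multiply polynomials given as ascending coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    """Add polynomials given as ascending coefficient lists."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

# T^4 - 10T^3 + 47T^2 - 110T + 841, the splitting polynomial in T.
SPLITTING = [841, -110, 47, -10, 1]

# z^4 - (5 + 8i)z^2 - 21 + 20i = (T^2 - 5T - 21) + i(-8T + 20);
# its norm is (real part)^2 + (imaginary part)^2.
REAL_PART = [-21, -5, 1]
IMAG_PART = [20, -8]
NORM = poly_add(poly_mul(REAL_PART, REAL_PART),
                poly_mul(IMAG_PART, IMAG_PART))

# Rationalizing the denominator 10T^3 - 33T^2 + 97T - 29.
DENOMINATOR = [-29, 97, -33, 10]
COFACTOR = [179117, -7501, -6363, 674]
QUOTIENT = [-301153, -18472, 6740]
LEFT = poly_mul(COFACTOR, DENOMINATOR)
RIGHT = poly_add(poly_mul(QUOTIENT, SPLITTING), [248075280])
```

The constant 248075280 equals $13080 \times 18966$, which is where the denominator 18966 in the final expression for $a$ comes from after cancellation.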
Example 2. The splitting field of $f(x) = x^4 + 3x^2 + 7x + 4$. In this case, my derivation of the adjunction relations is rather long and ad hoc. The end result is that the Galois group is the alternating group, which makes the first two adjunction relations and the last easy. The third one is the hard one:
$$a^4 + 3a^2 + 7a + 4 = 0,$$
$$b^3 + ab^2 + a^2b + a^3 + 3a + 3b + 7 = 0,$$
$$c = \tfrac{1}{133}\left(-24a^3b^2 + 84a^3b - 36a^3 + 42a^2b^2 - 14a^2b + 63a^2 - 22ab^2 + 210ab - 299a - 63b^2 + 154b - 294\right),$$
$$d = -a - b - c = \tfrac{1}{133}\left(24a^3b^2 - 84a^3b + 36a^3 - 42a^2b^2 + 14a^2b - 63a^2 + 22ab^2 - 210ab + 166a + 63b^2 - 287b + 294\right).$$
Essay 2.4 The Splitting Field of $x^n + c_1x^{n-1} + c_2x^{n-2} + \cdots + c_n$
Dans le cas des équations algébriques, ce groupe n'est autre chose que l'ensemble des $1 \cdot 2 \cdot 3 \cdots m$ permutations possibles sur les $m$ lettres, puisque dans ce cas, les fonctions symétriques sont seules déterminables rationnellement.
E. Galois, [27] (English translation, [18, p. 104])
Theorem. Construct the splitting field of the polynomial $f(x) = x^n + c_1x^{n-1} + c_2x^{n-2} + \cdots + c_n$ in which the coefficients $c_1, c_2, \ldots, c_n$ are indeterminates.
As the preceding essay explains, a natural way in which to describe the splitting field of a polynomial is to give the relations that tell how to adjoin each new root to the field obtained by adjoining the roots that precede it. Given the splitting field of any polynomial $f(x)$, one can write
$$(x - \alpha_i)(x - \alpha_{i+1}) \cdots (x - \alpha_n) = \frac{f(x)}{(x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_{i-1})}, \qquad (1)$$
where $\alpha_1, \alpha_2, \ldots, \alpha_n$ are the roots of $f(x)$ in the splitting field, arranged in some order. The right side can be regarded as specifying a monic polynomial of degree $n - i + 1$ with coefficients in the field obtained by adjoining $\alpha_1, \alpha_2, \ldots, \alpha_{i-1}$, namely, the polynomial that results from simple division of the monic polynomial in the numerator by the monic polynomial in the denominator, because the numerator does not involve any $\alpha$'s at all, and the denominator involves just $\alpha_1, \alpha_2, \ldots, \alpha_{i-1}$. As the left side shows, $\alpha_i$ is a root of this monic polynomial. Since the adjunction relation is the irreducible monic polynomial with coefficients in the field obtained by adjoining $\alpha_1, \alpha_2, \ldots, \alpha_{i-1}$ of which $\alpha_i$ is a root, it must be an irreducible factor of the polynomial indicated by the right-hand side of formula (1).
The key to the theorem above is that these polynomials themselves are irreducible, so they are the adjunction relations that describe the splitting field. Galois, with his customary terseness, said this when he wrote (see above) that "In the case of algebraic equations, this group is none other than the set of all $1 \cdot 2 \cdot 3 \cdots n$ permutations of the $n$ letters, because in this case the symmetric functions are the only ones that can be determined rationally"; the order of the Galois group is the product of the degrees of the adjunction relations, so Galois's statement that the order of the Galois group is $n \cdot (n - 1) \cdot (n - 2) \cdots 1$ implies that the polynomial in (1) is the adjunction relation at each step.
How Galois might have justified his assertion can only be guessed. The proof that follows is inspired by Kronecker.*
* See Kronecker, [37, Sec V]. Also [39, §12].
The adjunction relations are certain polynomials in $\alpha_1, \alpha_2, \ldots, \alpha_n$ and $c_1, c_2, \ldots, c_n$ that are zero as elements of the splitting field of $f(x) = x^n + c_1x^{n-1} + c_2x^{n-2} + \cdots + c_n$. In theory they can of course be found by
constructing the splitting field as the root field of a Galois polynomial with coefficients in $\mathbf{Z}[c_1, c_2, \ldots, c_n]$ and then doing the computations sketched in Essay 2.3. The objective is to find them without constructing the splitting field.
Let $R$ be the ring of polynomials in $2n$ indeterminates $a_1, a_2, \ldots, a_n, c_1, c_2, \ldots, c_n$ with integer coefficients. Imagine that a splitting field of $f(x) = x^n + c_1x^{n-1} + c_2x^{n-2} + \cdots + c_n$ has been constructed and that an order has been chosen for the roots $\alpha_1, \alpha_2, \ldots, \alpha_n$ of $f(x)$ in the splitting field. Substituting $\alpha_i$ for $a_i$ for $i = 1, 2, \ldots, n$ while sending each $c_i$ to itself defines a homomorphism from $R$ to the splitting field. The objective is to determine the kernel of this homomorphism, and, more generally, to determine conditions under which two elements $\phi$ and $\psi$ of $R$ have the same image in the splitting field. The following proposition gives sufficient conditions for this to be the case:
Proposition 1. Let $\sigma_i$ for $i = 1, 2, \ldots, n$ be the elementary symmetric polynomials in $a_1, a_2, \ldots, a_n$, which is to say the coefficients of $x^{n-1}, x^{n-2}, \ldots, 1$, respectively, in the polynomial $\prod_{i=1}^n (x + a_i)$. Then the elements $c_i$ and $(-1)^i\sigma_i$ of $R$ have the same image in the splitting field of $f(x) = x^n + c_1x^{n-1} + c_2x^{n-2} + \cdots + c_n$ under any homomorphism from $R$ to the splitting field constructed in the way that was just described. Consequently, elements $\phi$ and $\psi$ of $R$ have the same image in the splitting field whenever their difference is a sum of multiples of the polynomials $c_i - (-1)^i\sigma_i$.
Proof. Change $x$ to $-x$ in the definition of the elementary symmetric polynomials and multiply by $(-1)^n$ to obtain $x^n - \sigma_1x^{n-1} + \sigma_2x^{n-2} - \cdots + (-1)^n\sigma_n = (x - a_1)(x - a_2) \cdots (x - a_n)$. Under a homomorphism of $R$ into the splitting field that comes from ordering the roots $\alpha_1, \alpha_2, \ldots, \alpha_n$ of $f(x)$ and sending $a_i$ to $\alpha_i$ for each $i$ while sending each $c_i$ to itself, this polynomial is carried to $f(x)$.
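The sign relation $c_i = (-1)^i\sigma_i$ in Proposition 1 can be seen on a numerical instance in which the $a_i$ are replaced by actual numbers. The sketch below is only such an instance, not part of the proof; the roots 1, 2, 3 and the helper names are assumptions chosen for illustration.

```python
def expand_from_roots(roots):
    """Coefficients, highest degree first, of the product of (x - r)."""
    coeffs = [1]
    for r in roots:
        # Multiply the current polynomial by (x - r).
        coeffs = [a - r * b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs

def elementary_symmetric(values):
    """[sigma_1, ..., sigma_n], read off as the coefficients of
    x^(n-1), ..., 1 in the product of (x + a_i)."""
    return expand_from_roots([-v for v in values])[1:]

ROOTS = [1, 2, 3]                        # sample values for a_1, a_2, a_3
C = expand_from_roots(ROOTS)[1:]         # c_1, c_2, c_3 of f(x)
SIGMA = elementary_symmetric(ROOTS)      # sigma_1, sigma_2, sigma_3
SIGNED = [(-1) ** i * s for i, s in enumerate(SIGMA, start=1)]
```

For these values $f(x) = x^3 - 6x^2 + 11x - 6$, so the $c_i$ are $-6, 11, -6$ while the $\sigma_i$ are $6, 11, 6$.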
Since $f(x) = x^n + c_1x^{n-1} + c_2x^{n-2} + \cdots + c_n$ has this same image under such a homomorphism, the corresponding coefficients $(-1)^i\sigma_i$ and $c_i$ must also have the same image, as was to be shown. Let $A_i = c_i - (-1)^i\sigma_i$ for $i = 1, 2, \ldots, n$. Then such homomorphisms send $A_1, A_2, \ldots, A_n$ all to zero. Therefore, if $\phi$ and $\psi$ are elements of $R$ for which $\phi - \psi = \sum D_iA_i$, where $D_1, D_2, \ldots, D_n$ are elements of $R$, then $\phi$ and $\psi$ must have the same image under any such homomorphism, and the proof is complete.

Let the statement that $\phi$ and $\psi$ satisfy $\phi - \psi = \sum D_iA_i$ for some elements $D_1, D_2, \ldots, D_n$ of $R$ be abbreviated $\phi \equiv \psi \bmod A$. Since this congruence relation is consistent with addition and multiplication, a ring $R$ mod $A$ is defined in this way.

Proposition 2. As a polynomial with coefficients in the ring $R$ mod $A$, the polynomial $(x - a_i)(x - a_{i+1}) \cdots (x - a_n)$, call it $f_i(x)$, for $i = 1, 2, \ldots, n$, is congruent to

$$\phi_{i,0}(x) + \phi_{i,1}(x)c_1 + \phi_{i,2}(x)c_2 + \cdots + \phi_{i,n-i}(x)c_{n-i} + c_{n-i+1} \bmod A \tag{2}$$

where $\phi_{i,j}(x)$ is defined by

$$\phi_{i,j}(x) = \sum x^{e_1}a_1^{e_2}a_2^{e_3} \cdots a_{i-1}^{e_i} \tag{3}$$

for $i \ge 1$ and for $j = 0, 1, 2, \ldots, n - i$, the sum being over all monomials in which the exponents $e_1, e_2, \ldots, e_i$ are nonnegative integers whose sum is $n - i - j + 1$. (In particular, $\phi_{i,j}(x)$ is monic of degree $n - i - j + 1$ in $x$. When $i = 1$, it is simply $x^{n-j}$.)

Proof. To say that two elements of $R[x]$ (polynomials in $x, a_1, a_2, \ldots, a_n, c_1, c_2, \ldots, c_n$ with integer coefficients) are congruent mod $A$ means, of course, that their difference can be written in the form $\sum_{j=1}^{n} D_jA_j$ where $D_1, D_2, \ldots, D_n$ are in $R[x]$. The case $i = 1$ of (2) follows from the observation that the right side minus the left side is $\sum_{j=1}^{n} x^{n-j}A_j$. When the formula is proved for one value of $i$, one knows in particular, because

$$(x - a_i)f_{i+1}(x) = f_i(x) = f_i(x) - f_i(a_i),$$

that $(x - a_i)f_{i+1}(x)$ is congruent to

$$(\phi_{i,0}(x) - \phi_{i,0}(a_i)) + (\phi_{i,1}(x) - \phi_{i,1}(a_i))c_1 + \cdots + (\phi_{i,n-i}(x) - \phi_{i,n-i}(a_i))c_{n-i}$$

mod $A$, say their difference is $\sum_{j=1}^{n} D_{i,j}A_j$, where the elements $D_{i,1}, D_{i,2}, \ldots, D_{i,n}$ of $R[x]$ are defined in this way. Each of the $n - i + 1$ summands $(\phi_{i,\ell}(x) - \phi_{i,\ell}(a_i))c_\ell$ is divisible by $x - a_i$, as is $(x - a_i)f_{i+1}(x)$, so $\sum_{j=1}^{n} D_{i,j}A_j$ is divisible by $x - a_i$. The quotient is $\sum E_{i,j}A_j$, where $E_{i,j}$ is found by striking from $D_{i,j}$ all terms that are not divisible by $x$ and dividing what is left by $x$. In particular, $f_{i+1}(x)$ and

$$\frac{\phi_{i,0}(x) - \phi_{i,0}(a_i)}{x - a_i} + \frac{\phi_{i,1}(x) - \phi_{i,1}(a_i)}{x - a_i}c_1 + \cdots + \frac{\phi_{i,n-i}(x) - \phi_{i,n-i}(a_i)}{x - a_i}c_{n-i} \tag{4}$$

are congruent mod $A$ (their difference is $\sum E_{i,j}A_j$). But this is the statement to be proved, because, $\phi_{i,n-i}(x)$ being monic of degree 1 in $x$, the numerator of the last term is $x - a_i$, while the numerators of the other terms are $(x - a_i)\phi_{i+1,j}(x)$, as follows easily from $x^k - a^k = (x - a)(x^{k-1} + x^{k-2}a + \cdots + a^{k-1})$.

Let an element of $R$ be said to be in canonical form if its degree in $a_i$ is at most $n - i$ for each $i = 1, 2, \ldots, n$. The following proposition makes computations in the ring $R$ mod $A$ possible:
Proposition 3. Each element of $R$ is congruent mod $A$ to one and only one element in canonical form.

Proof. Let $T_i = \phi_{i,0}(a_i) + \phi_{i,1}(a_i)c_1 + \phi_{i,2}(a_i)c_2 + \cdots + \phi_{i,n-i}(a_i)c_{n-i} + c_{n-i+1}$. Then $T_i = a_i^{n-i+1} + \cdots + c_{n-i+1}$, where the omitted terms have degree at most $n - i$ in $a_i$, and do not contain $a_j$ for $j > i$ or $c_j$ for $j > n - i + 1$. Moreover, $T_i \equiv 0 \bmod A$ (because $f_i(a_i) = 0$). Division of a given element $\phi$ of $R$ by $T_n = a_n + \cdots$ regarded as a monic polynomial in $a_n$ leaves a remainder that is congruent to $\phi$ mod $A$, call it $\phi_1$, from which $a_n$ has been eliminated. Division of $\phi_1$ by $T_{n-1} = a_{n-1}^2 + \cdots$ regarded as a monic polynomial in $a_{n-1}$ leaves a remainder that is congruent to $\phi_1 \equiv \phi \bmod A$, call it $\phi_2$, in which the degree of $a_{n-1}$ is at most 1 and $a_n$ has not been reintroduced. Continuing in this way (on the $i$th step dividing $\phi_{i-1}$ by $T_{n+1-i}$ regarded as a monic polynomial in $a_{n+1-i}$ and calling the remainder $\phi_i$) produces a sequence $\phi = \phi_0, \phi_1, \phi_2, \ldots, \phi_n$ of polynomials congruent to $\phi$ mod $A$. Since the degree in $a_i$ is reduced to at most $n - i$ by the $(n + 1 - i)$th step and is not increased by any subsequent step, $\phi_n$ is in canonical form. Thus, every element $\phi$ of $R$ is congruent mod $A$ to an element $\phi_n$ in canonical form.

Any element $\psi$ of $R$ is congruent mod $A$ to an element from which the $c$'s have been eliminated, because division of $\psi$ by $T_1$ regarded as a monic polynomial in $c_n$ leaves a remainder that does not contain $c_n$, then division of this remainder by $T_2$ regarded as a monic polynomial in $c_{n-1}$ leaves a remainder that contains neither $c_{n-1}$ nor $c_n$, and so forth. (In other words, at the $i$th step, one substitutes $c_{n+1-i} - T_i$ in place of $c_{n+1-i}$ to obtain an element that is unchanged mod $A$ in which $c_{n+1-i}$ is no longer present and no $c$ with a larger index is reintroduced.) When the input to the first algorithm is a polynomial in $a_1, a_2, \ldots, a_n$ alone and the input to the second algorithm is a polynomial in canonical form, these two algorithms are inverse to one another and establish a one-to-one correspondence between polynomials in $a_1, a_2, \ldots, a_n$ and polynomials in canonical form in which corresponding polynomials are congruent mod $A$, because the first algorithm produces a sequence of equations

$$\begin{aligned} \phi &= Q_1T_n + \phi_1, \\ \phi_1 &= Q_2T_{n-1} + \phi_2, \\ &\;\;\vdots \\ \phi_{n-1} &= Q_nT_1 + \phi_n, \end{aligned}$$

in which, when $\phi$ contains none of $c_1, c_2, \ldots, c_n$, $\phi_i$ contains none of $c_{i+1}, c_{i+2}, \ldots, c_n$ for $i = 1, 2, \ldots, n - 1$, and $\phi_n$ is in canonical form. Thus, $\phi_i$ is the remainder when $\phi_{i+1}$ is divided by $T_{n-i}$ regarded as a monic polynomial in $c_{i+1}$, as was to be shown.

To say that a polynomial $\phi$ in $a_1, a_2, \ldots, a_n$ alone is congruent to zero mod $A$ means that it has the form $\phi = \sum D_jA_j$; this means that $\phi = 0$ because
substitution of $(-1)^i\sigma_i$ for $c_i$ in this equation leaves the left side unchanged and makes the right side zero. Thus, two elements of $R$ in canonical form are congruent mod $A$ if and only if the two polynomials in $a_1, a_2, \ldots, a_n$ alone to which they correspond are congruent mod $A$, which is true if and only if these two polynomials are equal. This shows that two elements of $R$ in canonical form are congruent mod $A$ only if they are equal and completes the proof of the proposition.

Proof of the Theorem. The formula $x^n + c_1x^{n-1} + c_2x^{n-2} + \cdots + c_n \equiv (x - a_1)(x - a_2) \cdots (x - a_n) \bmod A$ gives an explicit factorization of this polynomial with indeterminate coefficients into linear factors over an integral domain that contains $\mathbf{Z}[c_1, c_2, \ldots, c_n]$ as a subring. (Elements of $R$ that do not contain $a_1, a_2, \ldots, a_n$ are in canonical form, so they are congruent mod $A$ only if they are equal.) Thus, the field of quotients of this integral domain is a splitting field of the polynomial. Since it is generated over $\mathbf{Z}[c_1, c_2, \ldots, c_n]$ by the roots of the polynomial in the ring, it is a minimal splitting field and is therefore the splitting field.

Corollary 1. The Galois group of $x^n + c_1x^{n-1} + c_2x^{n-2} + \cdots + c_n$, where the $c$'s are indeterminates, permutes the roots in the splitting field in all $n!$ possible ways.

Deduction. Each element of the integral domain $R$ mod $A$ has a unique representation in the form $\sum B_{e_1,e_2,\ldots,e_n}(c_1, c_2, \ldots, c_n)\,a_1^{e_1}a_2^{e_2} \cdots a_n^{e_n}$, where the coefficients $B_{e_1,e_2,\ldots,e_n}(c_1, c_2, \ldots, c_n)$ are in $\mathbf{Z}[c_1, c_2, \ldots, c_n]$ and the monomials $a_1^{e_1}a_2^{e_2} \cdots a_n^{e_n}$ range over all $n!$ such monomials in which $e_i \le n - i$ for each $i$, which shows that the degree of the splitting field as an extension of the field of rational functions in $c_1, c_2, \ldots, c_n$ is $n!$ and therefore that $n!$ is the order of the Galois group.

Corollary 2. A polynomial in $a_1, a_2, \ldots, a_n$ that is unchanged by all $n!$ permutations of $a_1, a_2, \ldots, a_n$ has one and only one representation as a polynomial with integer coefficients in the elementary symmetric polynomials $\sigma_1, \sigma_2, \ldots, \sigma_n$ in $a_1, a_2, \ldots, a_n$.

Deduction. Let $\phi$ be a given polynomial in $a_1, a_2, \ldots, a_n$ that is unchanged by permutations of the $a$'s. When $\phi$ is regarded as an element of $R = \mathbf{Z}[a_1, a_2, \ldots, a_n, c_1, c_2, \ldots, c_n]$ it is congruent mod $A$ to one and only one element in canonical form; call it $\phi_n$. Since $\phi$ and $A_1, A_2, \ldots, A_n$ are unchanged by permutations of the $a$'s, $\phi_n$ is unchanged by permutations of the $a$'s. Since $\phi_n$ does not contain $a_n$, it cannot contain any $a$. Therefore $\phi_n$ is a polynomial $\Phi(c_1, c_2, \ldots, c_n)$ in the $c$'s alone. Since it is congruent mod $A$ to $\phi$ and to no other element of $\mathbf{Z}[a_1, a_2, \ldots, a_n]$, it follows that $\phi$ is equal to $\Phi(-\sigma_1, \sigma_2, \ldots, (-1)^n\sigma_n)$ and to no other polynomial in the $\sigma$'s with integer coefficients.
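The reduction to canonical form in Proposition 3, and its use in Corollary 2, can be tried out in the smallest case $n = 2$. The following is only a sketch under stated assumptions, not a construction taken from the text: polynomials in $(a_1, a_2, c_1, c_2)$ are stored as Python dictionaries, the helper names `add`, `mul`, `eliminate`, and `canonical_form` are hypothetical, and the two divisions are carried out as repeated substitutions $a_2 \to -a_1 - c_1$ (from $T_2 = a_2 + a_1 + c_1$) and $a_1^2 \to -c_1a_1 - c_2$ (from $T_1 = a_1^2 + c_1a_1 + c_2$).

```python
from collections import defaultdict

# A polynomial in (a1, a2, c1, c2) is a dict {(e_a1, e_a2, e_c1, e_c2): integer coefficient}.

def add(p, q):
    """Sum of two polynomials, dropping zero coefficients."""
    total = defaultdict(int, p)
    for monomial, coeff in q.items():
        total[monomial] += coeff
    return {m: k for m, k in total.items() if k != 0}

def mul(p, q):
    """Product of two polynomials."""
    result = defaultdict(int)
    for m1, k1 in p.items():
        for m2, k2 in q.items():
            result[tuple(x + y for x, y in zip(m1, m2))] += k1 * k2
    return {m: k for m, k in result.items() if k != 0}

def eliminate(p, var, power, replacement):
    """Replace var**power by `replacement` until the degree in var is below `power`.

    This yields the remainder on division by the monic polynomial
    var**power - replacement (illustrative helper, not from the text).
    """
    while any(m[var] >= power for m in p):
        reduced = {}
        for m, k in p.items():
            if m[var] >= power:
                lowered = list(m)
                lowered[var] -= power
                reduced = add(reduced, mul({tuple(lowered): k}, replacement))
            else:
                reduced = add(reduced, {m: k})
        p = reduced
    return p

# For n = 2: T_2 = a2 + a1 + c1 and T_1 = a1^2 + c1*a1 + c2 are congruent to 0 mod A.
A2_REPLACEMENT = {(1, 0, 0, 0): -1, (0, 0, 1, 0): -1}      # a2   -> -a1 - c1
A1SQ_REPLACEMENT = {(1, 0, 1, 0): -1, (0, 0, 0, 1): -1}    # a1^2 -> -c1*a1 - c2

def canonical_form(p):
    """Divide by T_2 (as a polynomial in a2), then by T_1 (as a polynomial in a1)."""
    p = eliminate(p, 1, 1, A2_REPLACEMENT)
    return eliminate(p, 0, 2, A1SQ_REPLACEMENT)

power_sum = {(2, 0, 0, 0): 1, (0, 2, 0, 0): 1}     # a1^2 + a2^2
print(canonical_form(power_sum))                    # the dictionary encoding c1^2 - 2*c2
```

The symmetric polynomial $a_1^2 + a_2^2$ reduces to $\Phi(c_1, c_2) = c_1^2 - 2c_2$, which contains no $a$, and substituting $c_1 \to -\sigma_1$, $c_2 \to \sigma_2$ gives the familiar $\sigma_1^2 - 2\sigma_2$. A polynomial that is not symmetric, such as $a_2$ alone, reduces to $-a_1 - c_1$, which still contains $a_1$.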
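Example. The adjunction relations that describe the splitting field of $x^5 + c_1x^4 + c_2x^3 + c_3x^2 + c_4x + c_5$, where $c_1, c_2, \ldots, c_5$ are indeterminates, are

$$\alpha_1^5 + c_1\alpha_1^4 + c_2\alpha_1^3 + c_3\alpha_1^2 + c_4\alpha_1 + c_5 = 0,$$

$$(\alpha_2^4 + \alpha_2^3\alpha_1 + \alpha_2^2\alpha_1^2 + \alpha_2\alpha_1^3 + \alpha_1^4) + c_1(\alpha_2^3 + \alpha_2^2\alpha_1 + \alpha_2\alpha_1^2 + \alpha_1^3) + c_2(\alpha_2^2 + \alpha_2\alpha_1 + \alpha_1^2) + c_3(\alpha_2 + \alpha_1) + c_4 = 0,$$

$$(\alpha_3^3 + \alpha_3^2\alpha_2 + \alpha_3^2\alpha_1 + \alpha_3\alpha_2^2 + \alpha_3\alpha_2\alpha_1 + \alpha_3\alpha_1^2 + \alpha_2^3 + \alpha_2^2\alpha_1 + \alpha_2\alpha_1^2 + \alpha_1^3) + c_1(\alpha_3^2 + \alpha_3\alpha_2 + \alpha_3\alpha_1 + \alpha_2^2 + \alpha_2\alpha_1 + \alpha_1^2) + c_2(\alpha_3 + \alpha_2 + \alpha_1) + c_3 = 0,$$

$$(\alpha_4^2 + \alpha_4\alpha_3 + \alpha_4\alpha_2 + \alpha_4\alpha_1 + \alpha_3^2 + \alpha_3\alpha_2 + \alpha_3\alpha_1 + \alpha_2^2 + \alpha_2\alpha_1 + \alpha_1^2) + c_1(\alpha_4 + \alpha_3 + \alpha_2 + \alpha_1) + c_2 = 0,$$

$$\alpha_5 + \alpha_4 + \alpha_3 + \alpha_2 + \alpha_1 + c_1 = 0,$$

where $\alpha_1, \alpha_2, \ldots, \alpha_5$ are the roots of the equation in a splitting field. (In the above construction, $\alpha_i$ is the element of $R$ mod $A$ represented by $a_i$.)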
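The pattern in these relations is that the $i$th one is $\sum_j h_{n-i-j+1}(\alpha_1, \ldots, \alpha_i)\,c_j + c_{n-i+1} = 0$ with $c_0 = 1$, where $h_k$ denotes the sum of all monomials of degree $k$. A numerical check is sketched below under assumptions of my own: the indeterminate coefficients are specialized to those of a polynomial with known integer roots, and the names `h` and `adjunction_relations` are hypothetical.

```python
from itertools import combinations_with_replacement
from math import prod

def h(k, values):
    """Sum of all monomials of total degree k in the given values (h(0, ...) is 1)."""
    return sum(prod(combo) for combo in combinations_with_replacement(values, k))

def adjunction_relations(roots, coeffs):
    """Evaluate the left sides of the n adjunction relations.

    roots  = [alpha_1, ..., alpha_n] in the chosen order,
    coeffs = [c_1, ..., c_n] of x^n + c_1 x^(n-1) + ... + c_n.
    """
    n = len(roots)
    c = [1] + list(coeffs)              # c[0] = 1 is the leading coefficient
    values = []
    for i in range(1, n + 1):
        first_i = roots[:i]             # the i-th relation involves alpha_1, ..., alpha_i only
        values.append(sum(h(n - i - j + 1, first_i) * c[j] for j in range(n - i + 2)))
    return values

# (x-1)(x-2)(x-3)(x-4)(x-5) = x^5 - 15x^4 + 85x^3 - 225x^2 + 274x - 120
quintic = [-15, 85, -225, 274, -120]
print(adjunction_relations([1, 2, 3, 4, 5], quintic))
print(adjunction_relations([4, 1, 5, 3, 2], quintic))   # any ordering of the roots
```

Both calls should print five zeros, since the relations hold for every ordering of the roots; replacing one root by a number that is not a root makes at least one relation fail.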
2 Topics in Algebra
Essay 2.5 A Fundamental Theorem of Divisor Theory

In 1894, when Richard Dedekind's theory of ideals was first beginning to gain widespread recognition, Adolph Hurwitz published a paper [33] in which he proposed a new approach to the theory. Unfortunately for him, Dedekind did not accept his proposal as a friendly amendment. In fact, Dedekind replied [13] rather sharply that he too had discovered the approach Hurwitz described (he had even published a paper [12] on it in an out-of-the-way journal) but that he had firmly decided against it on philosophical grounds. Neither Hurwitz nor Dedekind seems to have been petty or disagreeable as a general rule, but both of them felt they had a vital stake in the matter. Hurwitz replied [34] that he was not persuaded by Dedekind's philosophical objections to his proposal and that the published theorem to which Dedekind had referred was in fact a special case of a theorem Kronecker had published [41] long before Dedekind had, although Hurwitz himself had only lately learned of Kronecker's theorem.

The interesting part of this story is that three eminent mathematicians (Kronecker, Dedekind, and Hurwitz, in that order) discovered and focused on this one theorem and its importance for the theory of ideals (or, as Kronecker termed it, the theory of divisors), yet the theorem is not widely known today and is rarely included in modern treatments of the theory.

Dedekind regarded the theorem as a generalization of Gauss's lemma [28, Art. 42]: If the product of two monic polynomials with rational coefficients has integer coefficients, then the factors must have integer coefficients. He restated Gauss's lemma in the following form: If the product of two polynomials (not necessarily monic) with rational coefficients has integer coefficients, then the product of any coefficient of the first and any coefficient of the second is an integer. This statement obviously implies Gauss's lemma (1 is a coefficient of both polynomials in Gauss's case). The reverse implication can be proved fairly easily, so the two statements are essentially the same. The advantage of Dedekind's version is that it is true with algebraic numbers and algebraic integers in place of rational numbers and rational integers. Thus, Dedekind's theorem states that if the product of two polynomials whose coefficients are algebraic numbers has coefficients that are algebraic integers, then the product of any coefficient of the first and any coefficient of the second is an algebraic integer.

Another way to put it is the following: Let $a_0, a_1, \ldots, a_m$ and $b_0, b_1, \ldots, b_n$ be two sequences of algebraic numbers. There are two ways to "multiply" them, call them the "polynomial" way, in which the product is the sequence $a_0b_0$, $a_0b_1 + a_1b_0$, $a_0b_2 + a_1b_1 + a_2b_0$, $\ldots$, $a_mb_n$ of coefficients of the product $(a_0x^m + a_1x^{m-1} + \cdots + a_m)(b_0x^n + b_1x^{n-1} + \cdots + b_n)$ of polynomials of which the given sequences are the coefficients, and the "pairwise" way, in which the product is the sequence of products $a_ib_j$, where $0 \le i \le m$ and $0 \le j \le n$, arranged in some order. Each term of the polynomial product is a sum of terms of the pairwise product, so if all terms of the pairwise product sequence
are algebraic integers, the terms of the polynomial product sequence are too. Dedekind's theorem states the converse: if all terms of the polynomial product of two sequences of algebraic numbers are algebraic integers, then so are all terms of the pairwise product.

Here an algebraic number is a "quantity" $x$ that satisfies an equation $x^N + A_1x^{N-1} + \cdots + A_N = 0$ in which the coefficients $A_i$ are rational numbers. After this definition is made, one can prove that in fact $x$ is an algebraic number if the coefficients $A_i$ are merely algebraic numbers. An algebraic integer is an algebraic number that satisfies such an equation in which the $A_i$ are integers. Again, one can prove that in fact $x$ is an algebraic integer if the $A_i$ are merely algebraic integers. Dedekind's version of Gauss's lemma is then the statement that each term $a_ib_j$ of the pairwise product satisfies an equation of the form $(a_ib_j)^N + A_1(a_ib_j)^{N-1} + A_2(a_ib_j)^{N-2} + \cdots + A_N = 0$ in which the coefficients $A_i$ can be chosen to be algebraic integers whenever all terms of the polynomial product are algebraic integers. Kronecker made the stronger statement [41] that one can give (theoretically, at least) explicit formulas in which the $A$'s are expressed as polynomials with integer coefficients in the terms of the polynomial product and are therefore algebraic integers whenever the terms of the polynomial product are algebraic integers. Stated in this way, the theorem becomes a very concrete theorem of "general arithmetic":

Theorem. Let $a_0, a_1, \ldots, a_m, b_0, b_1, \ldots, b_n$ be indeterminates and let $R = \mathbf{Z}[a_0, a_1, \ldots, a_m, b_0, b_1, \ldots, b_n]$ be the ring of polynomials in these indeterminates with integer coefficients. Let $c_0, c_1, \ldots, c_{m+n}$ be the elements of $R$ defined by

$$c_i = \sum_{j+k=i} a_jb_k.$$

For each product $a_jb_k$ of one $a$ and one $b$ (where $0 \le j \le m$ and $0 \le k \le n$), construct a relation of the form $F(a_jb_k) = 0$ in which $F(X) = X^N + p_1X^{N-1} + p_2X^{N-2} + \cdots + p_N$ is a monic polynomial whose coefficients $p_1, p_2, \ldots, p_N$ are elements of $R$ that are polynomials in $c_0, c_1, \ldots, c_{m+n}$ with integer coefficients.

Proof. Let $f$, $g$, and $h$ be the polynomials with coefficients in $R$ defined by $f(x) = a_0x^m + a_1x^{m-1} + \cdots + a_m$, $g(x) = b_0x^n + b_1x^{n-1} + \cdots + b_n$, $h(x) = f(x)g(x)$. Construct a splitting field for $h(x)$, and let $\xi_1, \xi_2, \ldots, \xi_{m+n}$ be the negatives of the roots of $h(x)$ in this field. Then $h(x) = c_0x^{m+n} + c_1x^{m+n-1} + \cdots + c_{m+n} = c_0\prod_{i=1}^{m+n}(x + \xi_i)$, so $c_i = c_0\Sigma_i$, where $\Sigma_i$ is the $i$th elementary symmetric polynomial in $\xi_1, \xi_2, \ldots, \xi_{m+n}$. The equation $h(x) = f(x)g(x)$ partitions the factors $x + \xi_i$ of $h(x)$ into two subsets, say $f(x) = a_0(x + \xi_1)(x + \xi_2) \cdots (x + \xi_m)$ and $g(x) = b_0(x + \xi_{m+1})(x + \xi_{m+2}) \cdots (x + \xi_{m+n})$. Then $a_jb_k = a_0b_0\sigma_j\tau_k = c_0\sigma_j\tau_k$, where $\sigma_j$ is the $j$th elementary symmetric polynomial in $\xi_1, \xi_2, \ldots, \xi_m$ and $\tau_k$ is the $k$th elementary symmetric polynomial in $\xi_{m+1}, \xi_{m+2}, \ldots, \xi_{m+n}$. Let $F(X) = \prod_S (X - c_0S(\sigma_j\tau_k))$, where $S$ runs over all
$(m + n)!$ permutations $S$ of $\xi_1, \xi_2, \ldots, \xi_{m+n}$. Since $a_jb_k$ is a root of the monic polynomial $F(X)$ (it is the root of the factor in which $S$ is the identity), the theorem will be proved if the coefficients of $F$ are shown to be polynomials in $c_0, c_1, \ldots, c_{m+n}$ with integer coefficients.

The coefficient of $X^{(m+n)!-\rho}$ in $F$ is $(-c_0)^\rho$ times the sum of all products of $\rho$ distinct conjugates of $\sigma_j\tau_k$ under permutation of $\xi_1, \xi_2, \ldots, \xi_{m+n}$. The degree of any conjugate of $\sigma_j\tau_k$ in $\xi_1$ is 1, so this coefficient is $c_0^\rho$ times a symmetric polynomial in $\xi_1, \xi_2, \ldots, \xi_{m+n}$ whose degree in $\xi_1$, and therefore in any one $\xi_i$, is at most $\rho$. Because the polynomial is symmetric, this coefficient of $F$ has the form $c_0^\rho\,C(\Sigma_1, \Sigma_2, \ldots, \Sigma_{m+n})$, where $C$ is a polynomial with integer coefficients, by Corollary 2 of Essay 2.4. Let each term $\gamma\,\Sigma_1^{d_1}\Sigma_2^{d_2} \cdots \Sigma_{m+n}^{d_{m+n}}$ of $C$, where $\gamma$ is the integer coefficient of the term, be expressed as a polynomial in $\xi_1, \xi_2, \ldots, \xi_{m+n}$, and let the terms be arranged in lexicographic order (terms with highest degree in $\xi_1$ come first, among terms with the same degree in $\xi_1$ the terms with highest degree in $\xi_2$ come first, and so forth). The leading term of the result is $\gamma\,\xi_1^{e_1}\xi_2^{e_2} \cdots \xi_{m+n}^{e_{m+n}}$, where

$$e_1 = d_1 + d_2 + \cdots + d_{m+n},\quad e_2 = d_2 + d_3 + \cdots + d_{m+n},\quad \ldots,\quad e_{m+n-1} = d_{m+n-1} + d_{m+n},\quad e_{m+n} = d_{m+n},$$

because the leading term of a product is the product of the leading terms and the leading term of $\Sigma_k$ is $\xi_1\xi_2 \cdots \xi_k$. No two of these leading terms contain the same exponents (the $e$'s determine the $d$'s as differences of successive $e$'s), so the leading term of $C$, when it is expressed in terms of $\xi_1, \xi_2, \ldots, \xi_{m+n}$ and written in lexicographic order, is one of these leading terms $\gamma\,\xi_1^{e_1}\xi_2^{e_2} \cdots \xi_{m+n}^{e_{m+n}}$. In particular, its degree in $\xi_1$ is $d_1 + d_2 + \cdots + d_{m+n}$ for some term, so its degree in $\xi_1$ is the total degree of $C$ in the $\Sigma$'s. Therefore, $c_0^\rho\,C(\Sigma_1, \Sigma_2, \ldots, \Sigma_{m+n})$ can be expressed as a polynomial in $c_0, c_1, \ldots, c_{m+n}$ with integer coefficients, as was to be shown.

See Part 0 of [19] and [23, Nr. 20] for fuller accounts and other references.
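In the smallest case $m = n = 1$ the construction can be carried out by hand, and the result checked numerically. The following sketch is my own illustration and not part of the text: for the products $a_1b_0 = c_0\xi_1$ and $a_0b_1 = c_0\xi_2$ the two permutations of $\xi_1, \xi_2$ give $F(X) = (X - c_0\xi_1)(X - c_0\xi_2) = X^2 - c_1X + c_0c_2$. The function names `F` and `products_are_roots` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def F(X, c0, c1, c2):
    """F(X) = X^2 - c1*X + c0*c2, the case m = n = 1 for the products a0*b1 and a1*b0."""
    return X * X - c1 * X + c0 * c2

def products_are_roots(a0, a1, b0, b1):
    """Check that a0*b1 and a1*b0 are roots of F built from the coefficients of f*g."""
    c0 = a0 * b0                 # coefficients of (a0*x + a1)(b0*x + b1)
    c1 = a0 * b1 + a1 * b0
    c2 = a1 * b1
    return F(a0 * b1, c0, c1, c2) == 0 and F(a1 * b0, c0, c1, c2) == 0

# The relation is a polynomial identity, so it holds for every choice of integers.
identity_holds = all(products_are_roots(*values)
                     for values in product(range(-3, 4), repeat=4))
print(identity_holds)

# Dedekind's restatement of Gauss's lemma in one rational example:
# (2/3 x + 2)(3/2 x + 3) = x^2 + 5x + 6 has integer coefficients, and the
# pairwise products 2 and 3 are the roots of X^2 - 5X + 6.
a0, a1, b0, b1 = Fraction(2, 3), Fraction(2), Fraction(3, 2), Fraction(3)
print(a0 * b1, a1 * b0, products_are_roots(a0, a1, b0, b1))
```

Because the coefficients of $F$ are polynomials in $c_0, c_1, c_2$ with integer coefficients, the pairwise products are algebraic integers whenever $c_0, c_1, c_2$ are, which is the content of the theorem in this case.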
Some Quadratic Problems
Essay 3.1 The Problem $A\Box + B = \Box$ and "Hypernumbers"

The problem that motivates the study of "hypernumbers" in the next few essays comes from the prehistory of mathematics. In "The Measurement of the Circle," Archimedes states that $\frac{265}{153} < \sqrt{3}$ and $\frac{1351}{780} > \sqrt{3}$ without giving any derivation. The closeness of these approximations becomes clear when one compares $265^2 = 70225$ to $3 \cdot 153^2 = 70227$ in the case of the first and $1351^2 = 1825201$ to $3 \cdot 780^2 = 1825200$ in the case of the second to find that $265^2 + 2 = 3 \cdot 153^2$ and $1351^2 = 3 \cdot 780^2 + 1$. There have been many attempts to guess how Archimedes might have derived these estimates. One can be certain that they were not found by trial and error; very probably, they involve some analogue of what is today called the continued fractions algorithm, but there is no documentary evidence on which to base such speculations.

A similar problem is treated in earlier Greek mathematics. As early as the time of Pythagoras, Greek mathematicians are said to have derived* an entire sequence of approximations to $\sqrt{2}$ in the form of "side and diagonal" numbers. If $d$ is the length of the diagonal of a square and $s$ is the length of its side, then $d^2 = 2s^2$ by the Pythagorean theorem. The followers of Pythagoras are thought to have discovered that there are no whole-number solutions $(d, s)$ of this equation, and to have been very dismayed to learn that numbers, in the simplest sense, are not sufficient for the description of this simple geometrical construction. But their study of the problem probably went well beyond the impossibility of $d^2 = 2s^2$ in whole numbers to the following sequence of approximate solutions $d^2 = 2s^2 \pm 1$. A solution $(d_n, s_n)$ of $d^2 = 2s^2 \pm 1$ implies a solution $(d_{n+1}, s_{n+1}) = (d_n + 2s_n, d_n + s_n)$ of $d^2 = 2s^2 \mp 1$, as is easy, using modern
Fig. 3.1. The line segment AB is the hypotenuse of two different right triangles.
algebraic notation, to verify.† Let $(d_0, s_0) = (1, 1)$. Then $d_0^2 = 2s_0^2 - 1$, and the formula generates the sequence (1, 1), (3, 2), (7, 5), (17, 12), (41, 29), (99, 70), (239, 169), (577, 408), ..., which alternates between solutions of $d^2 + 1 = 2s^2$ and solutions of $d^2 = 2s^2 + 1$; that is, $1^2 + 1 = 2 \cdot 1^2$, $3^2 = 2 \cdot 2^2 + 1$, $7^2 + 1 = 2 \cdot 5^2$, and so forth. In this way one derives, for example, $577^2 = 2 \cdot 408^2 + 1$, which shows that $\frac{577}{408}$ is a very good approximation to $\sqrt{2}$, because $\left(\frac{577}{408}\right)^2 = 2 + \frac{1}{408^2}$, but that it is a bit too large.

Plato's reference in Theaetetus to the study of the irrationality of square roots up to $\sqrt{17}$ probably indicates that mathematicians of his time studied rational approximations of these other square roots as well. There are other indications that techniques of finding approximate solutions of $y^2 = Ax^2$ were studied by ancient mathematicians whose works are lost.

† The statement that $d_n^2 = 2s_n^2 \pm 1$ implies $d_{n+1}^2 = 2s_{n+1}^2 \mp 1$ can be deduced from the formula $(2s + d)^2 + d^2 = 2s^2 + 2(s + d)^2$: $d^2 > 2s^2$ implies $(2s + d)^2 < 2(s + d)^2$ and $2(s + d)^2 - (2s + d)^2 = d^2 - 2s^2$; in the same way, $d^2 < 2s^2$ implies $(2s + d)^2 - 2(s + d)^2 = 2s^2 - d^2$.

In India, Brahmagupta, in the 7th century, stated the formula that will be studied later in
this essay, and Bhascara Acharya, in the 12th century, mentioned the spectacular fact that the smallest number $x$ for which $61x^2 + 1$ is a square is $x = 226153980$. Problems similar to this one are connected with the famous "cattle problem" of Archimedes [49], causing scholars to believe that Archimedes knew far more about such number-theoretic problems than our usual view of Greek mathematics as being primarily geometrical would lead us to expect.

These problems will be studied in this group of essays in the form of the problem that will be indicated by the symbolic equation $A\Box + B = \Box$, which is to say the problem "Given numbers $A$ and $B$, find numbers $x$ for which $Ax^2 + B$ is a square," say $Ax^2 + B = y^2$. (At first glance, Archimedes' solution $265^2 + 2 = 3 \cdot 153^2$ does not appear to be an instance of this problem, because $A = 3$ and $B = 2$ are on opposite sides of the equation, but if the equation is multiplied by 3, it becomes $3 \cdot 265^2 + 6 = (3 \cdot 153)^2$, which is a solution of $3\Box + 6 = \Box$. Conversely, in any solution $(x, y)$ of $3x^2 + 6 = y^2$, $y$ must be divisible by 3, say $z = y/3$, and division of $3x^2 + 6 = 9z^2$ by 3 gives a solution of $x^2 + 2 = 3z^2$.) Because the solution of this problem is easy when $A$ is a square,* the case in which $A$ is a square will be ignored.

Brahmagupta stated (but in words, not as an algebraic formula) the crucial tool that is used in the solution of $A\Box + B = \Box$. It is the observation† that a solution of $A\Box + B = \Box$ can be combined with a solution of $A\Box + C = \Box$ to find a solution of $A\Box + BC = \Box$. Specifically, if $Ax^2 + B = y^2$ and $Au^2 + C = v^2$, then $A(xv + yu)^2 + BC = (Axu + yv)^2$. It seems likely that some version of this remarkable fact was known in Greek times and that it was involved in the calculation of approximations to square roots.
* When $A$ is a square, the problem is to write the given $B$ in the form $s^2 - t^2 = (s - t)(s + t)$, where $t$ is a multiple of the square root of $A$. Thus $s - t = B_1$ and $s + t = B_2$, where $B = B_1B_2$ is one of the finite set of factorizations of $B$ in which $B_1 \le B_2$. Since $B_1 + B_2 = 2s$ is even, the problem is thus to find all factorizations $B_1B_2 = B$, if any, in which $B_1 \le B_2$, $B_1 \equiv B_2 \bmod 2$, and $(B_2 - B_1)/2$ is a multiple of the square root of $A$. For each of them, $\left(\frac{B_2 + B_1}{2}\right)^2 = \left(\frac{B_2 - B_1}{2}\right)^2 + B$ is a solution, and there are no others.

† See [10, p. 363]. The proof, using modern algebraic notation, is a simple calculation. How Brahmagupta might have proved it without algebraic notation, or how he might have known it is true, is a mystery. Certainly Euclid's Proposition 10 of Book 2 (see note above) indicates a Greek awareness of a similar phenomenon many centuries earlier, but there is no reason to suppose that the Greeks were the first.

Archimedes' approximations to $\sqrt{3}$ can be derived using Brahmagupta's formula in the following way: Combine the simple equation $3 \cdot 1^2 + 1 = 2^2$ with itself to obtain $3 \cdot 4^2 + 1 = 7^2$, then combine this new equation with $3 \cdot 1^2 + 1 = 2^2$ to obtain $3 \cdot 15^2 + 1 = 26^2$, and so forth, to obtain the infinite sequence

$$\begin{aligned} 3 \cdot 1^2 + 1 &= 2^2, \\ 3 \cdot 4^2 + 1 &= 7^2, \\ 3 \cdot 15^2 + 1 &= 26^2, \\ 3 \cdot 56^2 + 1 &= 97^2, \\ 3 \cdot 209^2 + 1 &= 362^2, \\ 3 \cdot 780^2 + 1 &= 1351^2, \end{aligned}$$

of solutions of $3\Box + 1 = \Box$ that includes the one Archimedes used. Each of these equations can be combined with $3 \cdot 1^2 + 6 = 3^2$ to obtain

$$\begin{aligned} 3 \cdot 5^2 + 6 &= (3 \cdot 3)^2, \\ 3 \cdot 19^2 + 6 &= (3 \cdot 11)^2, \\ 3 \cdot 71^2 + 6 &= (3 \cdot 41)^2, \\ 3 \cdot 265^2 + 6 &= (3 \cdot 153)^2, \\ 3 \cdot 989^2 + 6 &= (3 \cdot 571)^2, \\ 3 \cdot 3691^2 + 6 &= (3 \cdot 2131)^2, \end{aligned}$$

and then divided by 3 to obtain an infinite sequence

$$\begin{aligned} 5^2 + 2 &= 3 \cdot 3^2, \\ 19^2 + 2 &= 3 \cdot 11^2, \\ 71^2 + 2 &= 3 \cdot 41^2, \\ 265^2 + 2 &= 3 \cdot 153^2, \\ 989^2 + 2 &= 3 \cdot 571^2, \\ 3691^2 + 2 &= 3 \cdot 2131^2, \end{aligned}$$
of solutions of $\Box + 2 = 3\Box$ that includes, of course, the other solution Archimedes used. (One naturally wonders whether $\Box + 1 = 3\Box$ is possible; it is not, because $-1$ is not a square mod 3.)

In modern terminology, Brahmagupta's formula has become the statement that for expressions of the form $y + x\sqrt{A}$, the product of the norms is the norm of the product. Here the norm of $y + x\sqrt{A}$ is by definition its product with its conjugate $y - x\sqrt{A}$, which is to say that it is $(y + x\sqrt{A})(y - x\sqrt{A}) = y^2 - Ax^2$. When one computes with these expressions using the normal rules of algebra (the Buchstabenrechnung of Essay 1.1) one finds that the conjugate of a product is the product of the conjugates, because

$$(y + x\sqrt{A})(v + u\sqrt{A}) = (yv + Axu) + (yu + xv)\sqrt{A},$$

whereas

$$(y - x\sqrt{A})(v - u\sqrt{A}) = (yv + Axu) - (yu + xv)\sqrt{A},$$
so the norm of a product can be computed in either of two ways: the product can be expanded $(y + x\sqrt{A})(v + u\sqrt{A}) = (yv + Axu) + (yu + xv)\sqrt{A}$ to find that the norm of the product is $(yv + Axu)^2 - A(yu + xv)^2$, or the norm of the product can be computed by multiplying the product $(y + x\sqrt{A})(v + u\sqrt{A})$ by its conjugate $(y - x\sqrt{A})(v - u\sqrt{A})$. Thus

$$(yv + Axu)^2 - A(yu + xv)^2 = (y^2 - Ax^2)(v^2 - Au^2).$$

With $B = y^2 - Ax^2$ and $C = v^2 - Au^2$, this is Brahmagupta's formula rewritten as $(yv + Axu)^2 - A(yu + xv)^2 = BC$.

A solution of $A\Box + B = \Box$ is an expression $y + x\sqrt{A}$, in which $x$ and $y$ are numbers, whose norm is $B$. For want of a better term, I will call such an expression $y + x\sqrt{A}$ a hypernumber for $A$, so that the problem "find all solutions of $A\Box + B = \Box$" for given numbers $A$ and $B$, with $A$ not a square, becomes "find all hypernumbers for $A$ whose norms are $B$."

More precisely, the hypernumbers for a given number $A$ not a square can be described in the following way: As in Essay 1.1, a number is a term in the sequence 0, 1, 2, .... For a given number $A$ not a square, a hypernumber is an expression $y + x\sqrt{A}$ in which $x$ and $y$ are numbers and $\sqrt{A}$ is a mere symbol. Hypernumbers for the same $A$ are added in the obvious way,

$$(y_1 + x_1\sqrt{A}) + (y_2 + x_2\sqrt{A}) = (y_1 + y_2) + (x_1 + x_2)\sqrt{A},$$

and they are multiplied using the rule $(\sqrt{A})^2 = A$ to obtain

$$(y_1 + x_1\sqrt{A})(y_2 + x_2\sqrt{A}) = (y_1y_2 + Ax_1x_2) + (y_1x_2 + y_2x_1)\sqrt{A}.$$
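The two rules just stated are easy to mirror in code. The class below is a minimal sketch, with the name `Hyper` and its fields chosen here for illustration; it stores $y$ and $x$ as Python integers, so the norm is allowed to be negative, which goes slightly beyond the strict sense of "number" used in the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hyper:
    """The expression y + x*sqrt(A), with sqrt(A) treated as a mere symbol."""
    y: int
    x: int
    A: int

    def __add__(self, other):
        if self.A != other.A:
            raise ValueError("hypernumbers must be for the same A")
        return Hyper(self.y + other.y, self.x + other.x, self.A)

    def __mul__(self, other):
        if self.A != other.A:
            raise ValueError("hypernumbers must be for the same A")
        # (y1 + x1*sqrt(A)) * (y2 + x2*sqrt(A)), using sqrt(A)^2 = A
        return Hyper(self.y * other.y + self.A * self.x * other.x,
                     self.y * other.x + other.y * self.x,
                     self.A)

    def norm(self):
        """y^2 - A*x^2; an integer here, possibly negative (an assumption of this sketch)."""
        return self.y * self.y - self.A * self.x * self.x

p = Hyper(2, 1, 3)          # 2 + sqrt(3), norm 1, i.e. 3*1^2 + 1 = 2^2
q = Hyper(3, 1, 3)          # 3 + sqrt(3), norm 6, i.e. 3*1^2 + 6 = 3^2
print(p * p, (p * p).norm())
print(p * q, (p * q).norm(), p.norm() * q.norm())
```

Here `p * p` is $7 + 4\sqrt{3}$, corresponding to $3 \cdot 4^2 + 1 = 7^2$, and the norm of `p * q` equals the product of the norms, which is Brahmagupta's formula in this notation.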
Otherwise stated, the hypernumbers for $A$ are $\mathbf{N}[X]$ mod $(X^2 - A)$, the set of all polynomials in $X$ whose coefficients are numbers, when two such polynomials are considered to be equal if they are congruent mod $(X^2 - A)$. Every polynomial in $X$ whose coefficients are numbers is congruent mod $(X^2 - A)$ to one and only one polynomial of degree less than 2 (replace $X^2$ with $A$, $X^3$ with $AX$, $X^4$ with $A^2$, and so forth), and the defining relation $X^2 \equiv A$ justifies writing $\sqrt{A}$ in place of $X$. This definition of course implies the rules of addition and multiplication of hypernumbers just stated. (The assumption that $A$ is not a square guarantees that nonzero factors can be canceled in the arithmetic of hypernumbers $y + x\sqrt{A}$, because it guarantees that $X^2 - A$ is irreducible, from which it follows that $\mathbf{Z}[X]$ mod $(X^2 - A)$ is an integral domain; then for integers $r, s, x, y, u, v$ the congruence $(s + rX)(y + xX) \equiv (s + rX)(v + uX) \bmod (X^2 - A)$ implies $s + rX \equiv 0$ or $y + xX \equiv v + uX \bmod (X^2 - A)$ and therefore implies $s + rX = 0$ or $y + xX = v + uX$. On the other hand, if $A = r^2$ then $(r + X)r \equiv (r + X)X \bmod (X^2 - A)$ even though $r + X \not\equiv 0$ and $r \not\equiv X \bmod (X^2 - A)$.)
The exclusion of negative numbers is a bit inconvenient (the norm $y^2 - Ax^2$ of a hypernumber $y + x\sqrt{A}$ may not be a number in this strict sense because $Ax^2$ may be larger than $y^2$, and the conjugate $y - x\sqrt{A}$ of $y + x\sqrt{A}$ will be a hypernumber only when $x = 0$), but insistence on the narrow definition of "number" can be maintained with very little real difficulty and gives the theory a pleasing economy of structure. All of the results in the first five essays of this section, including the law of quadratic reciprocity in Essay 3.5, are deduced using only the arithmetic of numbers 0, 1, 2, ... in the narrowest sense.
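The sequences of solutions derived in this essay can be regenerated by applying Brahmagupta's formula repeatedly. The sketch below is an illustration only; the function names `compose` and `repeated` are hypothetical, and a solution of $Ax^2 + B = y^2$ is represented as the pair `(x, y)`.

```python
def compose(A, first, second):
    """Brahmagupta's formula: from A*x^2 + B = y^2 and A*u^2 + C = v^2
    produce (X, Y) with A*X^2 + B*C = Y^2."""
    (x, y), (u, v) = first, second
    return (x * v + y * u, A * x * u + y * v)

def repeated(A, base, count):
    """The solutions obtained by combining `base` with itself again and again."""
    solutions = [base]
    while len(solutions) < count:
        solutions.append(compose(A, solutions[-1], base))
    return solutions

# Solutions of 3*x^2 + 1 = y^2, starting from 3*1^2 + 1 = 2^2.
unit_solutions = repeated(3, (1, 2), 6)
print(unit_solutions)

# Combine each with 3*1^2 + 6 = 3^2, then divide y by 3 to get x^2 + 2 = 3*z^2.
archimedes = [(x, y // 3) for x, y in (compose(3, s, (1, 3)) for s in unit_solutions)]
print(archimedes)

# Side and diagonal numbers: 2*s^2 - 1 = d^2 combined repeatedly with itself
# (here B = -1 is a negative integer, outside the strict sense of "number").
side_diagonal = repeated(2, (1, 1), 8)
print(side_diagonal)
```

The last pair of `unit_solutions` is (780, 1351) and the fourth pair of `archimedes` is (265, 153), the two approximations Archimedes used; `side_diagonal` lists the pairs $(s, d)$ whose reversals $(d, s)$ appear earlier in this essay.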
Essay 3.2 Modules

The notion of a module of hypernumbers that is introduced in this essay is used in the next essay to solve $A\Box + B = \Box$ and in the following essays to deal with other questions in number theory. Very simply put, a module of hypernumbers for a given $A$ is a list of hypernumbers for that $A$, written between square brackets to indicate that the list is to be used to define a congruence relation. The concept is motivated by the following reexamination of the Euclidean algorithm.

Gauss's notion of what it means to say that $a \equiv b \bmod m$ (that is, two numbers $a$ and $b$ are congruent modulo a third number $m$) was generalized by Kronecker* as follows: Given a list of numbers $m_1, m_2, \ldots, m_\mu$, two numbers $a$ and $b$ are congruent modulo $[m_1, m_2, \ldots, m_\mu]$, written $a \equiv b \bmod [m_1, m_2, \ldots, m_\mu]$, if there are numbers $i_1, i_2, \ldots, i_\mu$ and $j_1, j_2, \ldots, j_\mu$ such that $a + \sum_{\alpha=1}^{\mu} i_\alpha m_\alpha = b + \sum_{\alpha=1}^{\mu} j_\alpha m_\alpha$. A module is a (nonempty, finite) list of numbers $[m_1, m_2, \ldots, m_\mu]$ written between square brackets to indicate that they are to be used to define a congruence relation in this way. Two modules are equal if they define the same congruence relation.

Clearly, two modules are equal if the lists of numbers they contain can be obtained from one another by a sequence of steps in which (1) terms are rearranged, or (2) a zero is omitted from the list or annexed to it, or (3) a term is added to or subtracted from another term. (A subtraction assumes, of course, that the term being subtracted is less than or equal to the term from which it is being subtracted.) In the case of operations of types (1) or (2) the assertion is obvious.
In t h e case of an operation of t y p e (3), it follows from t h e observations t h a t a -f- i i ( m i + 7712) + Z2^2 + ^3^3 + • • • = 6+ji(miH-7n2)-f J 2 ^ 2 + J 3 ^ 3 H implies a - h i i m i + ( 2 1 + ^ 2 ) ^ 2 ^ - i s ^ s H = 6 + j i m i H - ( j i H - J 2 ) ^ 2 + J 3 ^ 3 + - • • and, conversely, a+Zi777,i+i2^2-f ^WsH = 6+Jimi+J2^2+J3^3H implies a+ii(?77.i+777,2)+ (^'(+^2)^2+^3^3H ^ b-\-j'i{mi +777-2) + (^1 + ^ 2 ) ^ 2 + J 3 ^ 3 + • • • when i^777,2 + J W 2 is added t o b o t h sides. These simple observations lead t o a version of the Euclidean algorithm: The Euclidean Algorithm. Input: A list of numbers describing a module. Algorithm: While the list contains more than one number If the first entry is zero, drop it from the list. If the first entry is greater than the second entry, interchange entries. Otherwise, subtract the first entry from the second. End
the first
two
See, for example, [44, p. 144]. Kronecker did not go to the extreme that I have of insisting that the multipliers all be natural numbers, so he did not need to put sums of multiples of m's on both sides of the equation.
3 Some Quadratic Problems
Output: The list with one entry that remains.

For example, $[21, 15, 6] = [15, 21, 6] = [15, 6, 6] = [6, 15, 6] = [6, 9, 6] = [6, 3, 6] = [3, 6, 6] = [3, 3, 6] = [3, 0, 6] = [0, 3, 6] = [3, 6] = [3, 3] = [3, 0] = [0, 3] = [3]$. Each step results in a new module equal to the preceding one; it either reduces the length of the list (the first alternative holds), or it reduces the sum of the entries (the third alternative), or it is followed by a step in which one of these two types of reduction occurs (the second alternative). Therefore, the algorithm eventually terminates, and reduces the module to a very simple form:

Theorem 1. Given any module $[m_1, m_2, \ldots, m_\mu]$, there is a number $n$ for which $[m_1, m_2, \ldots, m_\mu] = [n]$.

This theorem gives a canonical form for modules, because $[n_1] = [n_2]$ only if $n_1 = n_2$. (If $[n_1] = [n_2]$, then each of the numbers $n_1$ and $n_2$ is a multiple of the other, which implies $n_1 = n_2$, because it implies that if one is zero, then both are, and otherwise, each is less than or equal to the other.) One can determine whether two given modules are equal by putting them both in canonical form; they are equal if and only if the canonical forms are identical. The number $n$ is obviously the greatest common divisor of $m_1, m_2, \ldots, m_\mu$, except when the numbers $m_i$ are all zero, in which case there is no greatest common divisor because all numbers are common divisors.

Corollary. If two lists determine the same module, they can be transformed into one another by a sequence of steps of types (1), (2), and (3) described above.

Deduction. One can pass from either of them to their common canonical form and back by a sequence of such steps, so one can pass from either of them to the other.

The "Euclidean algorithm" of Essay 1.4 shows that if $[m_1, m_2] = [n]$, there are integers $\phi$ and $\psi$ for which $\phi m_1 + \psi m_2 = n$. Without using integers, this fact can be stated and generalized as follows:

Proposition 1. If $[m_1, m_2, \ldots, m_\mu] = [n]$ and $m_\mu \neq 0$, then there are numbers $k_1, k_2, \ldots, k_\mu$ for which $k_1 m_1 + k_2 m_2 + \cdots + k_{\mu-1} m_{\mu-1} + n = k_\mu m_\mu$.

Proof. Because $n \equiv 0 \bmod [n]$ and therefore $n \equiv 0 \bmod [m_1, m_2, \ldots, m_\mu]$, there are numbers $i_1, i_2, \ldots, i_\mu$ and $j_1, j_2, \ldots, j_\mu$ such that $n + \sum i_\alpha m_\alpha = \sum j_\alpha m_\alpha$. What is to be shown is that there is an equation of this form in which $i_\alpha \ge j_\alpha$ for $\alpha = 1, 2, \ldots, \mu - 1$. If $i_\alpha < j_\alpha$ for some $\alpha$, one can add $m_\alpha m_\mu$ to both sides by adding $m_\mu$ to $i_\alpha$ and $m_\alpha$ to $j_\mu$, increasing $i_\alpha$ without changing any other $i$ and without changing any $j$ other than $j_\mu$. Repetition of this step enough times makes $i_\alpha \ge j_\alpha$ without changing the relation between $i_\beta$ and $j_\beta$ for any $\beta < \mu$ other than $\alpha$. Since this can be done for each $\alpha < \mu$, the desired conclusion follows. (One then takes $k_\alpha = i_\alpha - j_\alpha$ for $\alpha < \mu$ and $k_\mu = j_\mu - i_\mu$; the latter is a number because $m_\mu \neq 0$ and the left side of the equation is at least $i_\mu m_\mu$.)
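The list version of the Euclidean algorithm stated earlier in this essay can be sketched in a few lines of Python. This is only an illustrative sketch: the names `euclid_steps` and `canonical_number` are hypothetical and do not come from the text. It records every intermediate list so that the chain of equal modules in the example $[21, 15, 6] = \cdots = [3]$ can be reproduced.

```python
def euclid_steps(module):
    """Yield every list produced by the subtractive Euclidean algorithm.

    `module` is a nonempty list of natural numbers; each yielded list
    describes a module equal to the input module.
    """
    m = list(module)
    yield list(m)
    while len(m) > 1:
        if m[0] == 0:
            m.pop(0)                      # type (2): drop a zero
        elif m[0] > m[1]:
            m[0], m[1] = m[1], m[0]       # type (1): interchange the first two entries
        else:
            m[1] -= m[0]                  # type (3): subtract the first entry from the second
        yield list(m)


def canonical_number(module):
    """Return the number n for which the module equals [n] (Theorem 1)."""
    *_, last = euclid_steps(module)
    return last[0]


if __name__ == "__main__":
    print(" = ".join(str(step) for step in euclid_steps([21, 15, 6])))
```

Two modules of numbers can then be compared by comparing the values of `canonical_number`, which is the test for equality described after Theorem 1.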
Two numbers $m_1$ and $m_2$ are relatively prime if $[m_1, m_2] = [1]$. Proposition 1 implies that if $m_1$ and $m_2$ are relatively prime and nonzero then each is invertible mod the other. It also implies an important theorem of elementary number theory:

The Chinese remainder theorem. If $f > 0$ and $F > 0$ are relatively prime and $g$ and $G$ are given numbers, the congruences $x \equiv g \bmod f$ and $x \equiv G \bmod F$ determine a number $x \bmod fF$ in the sense that there is a solution $x$ of the congruences and any two solutions are congruent mod $fF$.

Proof. By the proposition, there are numbers $k_1$, $k_2$, $l_1$, and $l_2$ for which $k_1 f + 1 = k_2 F$ and $l_1 F + 1 = l_2 f$. Then $x = g \cdot k_2 F + G \cdot l_2 f$ satisfies $x \equiv g \cdot k_2 F = g \cdot (k_1 f + 1) \equiv g \bmod f$ and $x \equiv G \cdot l_2 f = G \cdot (l_1 F + 1) \equiv G \bmod F$, as required. The uniqueness of $x \bmod fF$ follows from simple counting: Since one of the $fF$ numbers $x$ less than $fF$ solves each of the $fF$ possible problems $x \equiv g \bmod f$ and $x \equiv G \bmod F$ in which $g < f$ and $G < F$, no two of them solve the same problem.

There is a natural way to multiply modules: The product of $[m_1, m_2, \ldots, m_\mu]$ and $[n_1, n_2, \ldots, n_\nu]$ is by definition the module described by a list $[\ldots, m_\alpha n_\beta, \ldots]$ made up of all products of one $m$ and one $n$, arranged in some order. Multiplication is well defined for modules in the sense that if one list is replaced by another list describing the same module, then the product list may change, but the module it describes will not. (This statement is clear if the passage from one factor to an equal factor involves rearranging the list or omitting or annexing zeros. If the passage involves adding a term to or subtracting a term from another, it is only slightly less obvious.) Multiplication of modules is obviously commutative and associative.

All of the same ideas apply without change to modules of hypernumbers, except that there is no Euclidean algorithm in the case of hypernumbers, and the problem of establishing a canonical form for modules of hypernumbers is more challenging.
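The construction in the proof of the Chinese remainder theorem can be sketched as follows. The function names `unit_multipliers` and `crt` are hypothetical, and the brute-force search for the multipliers of Proposition 1 is an assumption made for brevity; it is not the procedure given in the proof of that proposition. Only natural numbers are used, in the spirit of the essay.

```python
from math import gcd


def unit_multipliers(a, b):
    """Return numbers (k1, k2) with k1*a + 1 == k2*b.

    Assumes a > 0 and b > 0 are relatively prime; the search is a simple
    stand-in for the repeated adjustment used in the proof of Proposition 1.
    """
    if gcd(a, b) != 1:
        raise ValueError("a and b must be relatively prime")
    for k2 in range(1, a + 1):
        if (k2 * b - 1) % a == 0:
            return (k2 * b - 1) // a, k2
    raise AssertionError("unreachable when a and b are relatively prime")


def crt(g, f, G, F):
    """Return the solution x < f*F of x = g mod f and x = G mod F."""
    _k1, k2 = unit_multipliers(f, F)   # k1*f + 1 == k2*F
    _l1, l2 = unit_multipliers(F, f)   # l1*F + 1 == l2*f
    x = g * k2 * F + G * l2 * f        # the number written down in the proof
    return x % (f * F)


if __name__ == "__main__":
    print(crt(2, 3, 3, 5))
```

The counting argument for uniqueness can be observed directly: for fixed $f$ and $F$, the values `crt(g, f, G, F)` for $g < f$ and $G < F$ are all different.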
Let a number $A$, not a square, be fixed throughout the discussion. A module of hypernumbers is simply a list $[m_1, m_2, \ldots, m_\mu]$ of hypernumbers (for the given $A$) enclosed in square brackets. Two hypernumbers $a$ and $b$ are congruent modulo a given module, written $a \equiv b \bmod [m_1, m_2, \ldots, m_\mu]$, if there are hypernumbers $i_1, i_2, \ldots, i_\mu$ and $j_1, j_2, \ldots, j_\mu$ such that $a + \sum i_\alpha m_\alpha = b + \sum j_\alpha m_\alpha$. Two modules are equal if they determine the same congruence relation. Again it is easy to see that two modules are equal if the lists of numbers they contain can be obtained from one another by a sequence of steps in which (1) terms are rearranged, or (2) a zero is omitted from the list or annexed to it, or (3) a term is added to or subtracted from another term. (A subtraction is of course possible only when the coefficients of the term being subtracted are no larger than the corresponding coefficients of the term from which it is being subtracted.) In the hypernumber case, there is another elementary
operation that does not change the module, namely, (4) $\sqrt{A}$ times a term is added to or subtracted from another term. This set of four types of transformations that change a module into an equal module is sufficient to establish a method for determining whether two given modules are equal:

Theorem 2. Let $A$ be a fixed number, not a square. Every module of hypernumbers for $A$ that is not* equal to $[0]$ is equal to a module of the form $[ef, eg + e\sqrt{A}]$, where $e$, $f$, and $g$ are numbers for which $ef \neq 0$, $g < f$, and $g^2 \equiv A \bmod f$. Two modules of this form are equal only if they are identical.
A module $[ef, eg + e\sqrt{A}]$ in which $ef \neq 0$, $g < f$, and $g^2 \equiv A \bmod f$ will be said to be in canonical form.

Proof. The following elaboration of the Euclidean algorithm puts any module that is not equal to $[0]$ in canonical form after a finite number of steps.

By assumption, the list that presents the given module contains at least one nonzero entry, call it $y + x\sqrt{A}$. Because the number $|y^2 - Ax^2|$ can be annexed to the list, one can assume without loss of generality that the list that presents the given module contains a nonzero number. (The annexed number is $y(y + x\sqrt{A}) - x\sqrt{A}(y + x\sqrt{A})$ if $y^2 > Ax^2$ and $x\sqrt{A}(y + x\sqrt{A}) - y(y + x\sqrt{A})$ if $y^2 < Ax^2$. Therefore, annexing it to the list does not change the module. It cannot be zero because $A$ is not a square, so $Ax^2$ cannot be a square† unless $x = 0$.) Therefore, provided the given module is not $[0]$, one can assume without loss of generality that the list representing the given module has a nonzero number as its first entry. Moreover, because the first entry times $\sqrt{A}$ can be annexed to the list if necessary, one can also assume without loss of generality that the list contains at least one hypernumber that is not a number.

* The module $[0]$ is equal only to modules of the form $[0, 0, \ldots, 0]$. It is a very trivial sort of module (congruence mod it is simply equality) which for the most part will be ignored. A module in canonical form is not $[0]$.

† If $y^2 = Ax^2$ and $x \neq 0$, then $A$ must be a square, as can be seen as follows: Let $[x, y] = [d] \neq [0]$. If $y = 0$, then $A = 0$ is a square. Otherwise, by Proposition 1, there are numbers $\alpha$ and $\beta$ for which $\alpha x + d = \beta y$, from which it follows that $[xd] = [xd, Ax^2] = [xd, y^2] = [xd, y^2, \beta^2 y^2] = [xd, y^2, (\alpha x + d)^2] = [xd, y^2, d^2] = [d^2]$. Thus, $xd = d^2$, $x = d$, $y^2 = Ad^2$, and $(y/d)^2 = A$. Or if one is willing to take the unique factorization of numbers as known, one can simply observe that some prime factor of $A$ divides $A$ an odd number of times, and therefore divides $Ax^2$ an odd number of times, so $Ax^2$ is not a square.

Reduction to Canonical Form
Input: A presentation of a module in which the first entry is a nonzero number, and at least one entry is a hypernumber that is not a number.
Algorithm:
While the module is not in canonical form
  If any number in the list is preceded by an entry that is not a number, interchange the two.
  Otherwise, if the second entry is a number, use the Euclidean algorithm to replace the first two entries with their greatest common divisor.
  Otherwise, if there is a third term (in which case the first term is a number and the second and third terms are not numbers), make use of the first term to perform the Euclidean algorithm on the coefficients of $\sqrt{A}$ in the second and third terms. Specifically, if the coefficient of $\sqrt{A}$ in the second term is less than or equal to the coefficient of $\sqrt{A}$ in the third term, add the first term to the third term as many times as necessary and then subtract the second term from the third; otherwise, interchange the second and third terms.
  Otherwise (in which case there are just two entries, the first a number and the second not), if the coefficient of $\sqrt{A}$ in the second entry does not divide the other coefficient of the second entry, annex $\sqrt{A}$ times the second entry to the list as a third entry.
  Otherwise, if the coefficient of $\sqrt{A}$ in the second entry does not divide the first entry, annex $\sqrt{A}$ times the first entry to the list as a third entry.
  Otherwise (in which case the module has the form $[ef, eg + e\sqrt{A}]$ but is not in canonical form), subtract the first term from the second if possible.
  Otherwise, annex the difference of the numbers $eA$ and $eg^2$, which is the difference of the hypernumbers $\sqrt{A}(eg + e\sqrt{A})$ and $g(eg + e\sqrt{A})$, to the list as a third entry.
End
Output: The module in canonical form with which the algorithm terminates.

Example. To apply the algorithm to $[7 + 5\sqrt{3}]$ one must first annex $|7^2 - 3 \cdot 5^2| = 26$. The succeeding steps are $[26, 7 + 5\sqrt{3}] = [26, 7 + 5\sqrt{3}, 15 + 7\sqrt{3}] = [26, 7 + 5\sqrt{3}, 8 + 2\sqrt{3}] = [26, 8 + 2\sqrt{3}, 7 + 5\sqrt{3}] = [26, 8 + 2\sqrt{3}, 25 + 3\sqrt{3}] = [26, 8 + 2\sqrt{3}, 17 + \sqrt{3}] = [26, 17 + \sqrt{3}, 8 + 2\sqrt{3}] = [26, 17 + \sqrt{3}, 17 + \sqrt{3}] = [26, 17 + \sqrt{3}, 0] = [26, 0, 17 + \sqrt{3}] = [26, 17 + \sqrt{3}]$.
The final module is in canonical form because 1 divides both 17 and 26, 17 is less than 26, and $17^2 \equiv 3 \bmod 26$.

This algorithm terminates for the following reasons: By assumption, the input list contains both a number and a hypernumber that is not a number. A step that changes a hypernumber into a number leaves a hypernumber in the list unchanged, and the only step that reduces the number of numbers in the list replaces two numbers with a single one (the greatest common divisor of the first two entries). Therefore, at each step
there is at least one number and at least one hypernumber not a number. They are arranged by the first step to put all numbers first, and the second step eventually reduces the number of numbers in the list to one. Steps of the first three types do not change the greatest common divisor of the coefficients of $\sqrt{A}$ that occur in entries of the list. Since they reduce the total of the numbers in the list or the total of the coefficients of $\sqrt{A}$ (except for the finitely many steps that rearrange terms), eventually a step beyond the first three must be reached. Each such step reduces either the greatest common divisor of the coefficients of $\sqrt{A}$ or the greatest common divisor of the numbers in the list (except for finitely many steps that reduce the first coefficient of the second term), so only a finite number of them can occur before canonical form is achieved.

Lemma. Let $[ef, eg + e\sqrt{A}]$ be a module in canonical form. The congruence $y + x\sqrt{A} \equiv 0 \bmod [ef, eg + e\sqrt{A}]$ is equivalent to the pair of congruences $y \equiv gx \bmod ef$ and $x \equiv 0 \bmod e$.

Proof.
Because $ef\sqrt{A} + efg = f(eg + e\sqrt{A})$ and $\sqrt{A}(eg + e\sqrt{A}) \equiv g(eg + e\sqrt{A}) \bmod ef$, an equation of the form

$y + x\sqrt{A} + i_1 ef + i_2(eg + e\sqrt{A}) = y' + x'\sqrt{A} + j_1 ef + j_2(eg + e\sqrt{A})$

in which $i_1$, $i_2$, $j_1$, $j_2$ are hypernumbers implies another equation of the same form in which $i_1$, $i_2$, $j_1$, $j_2$ are numbers. (For example, if $i_1 = \alpha + \beta\sqrt{A}$, one can add $\beta efg$ to both sides and replace $\beta\sqrt{A}\,ef + \beta efg$ with $\beta f(eg + e\sqrt{A})$ to obtain another equation of the same form in which $i_1$ is $\alpha$, $i_2$ is increased by $\beta f$, and $j_1$ is increased by $\beta g$. If $i_2 = \alpha + \beta\sqrt{A}$, one can use the fact that $\sqrt{A}(eg + e\sqrt{A}) + \gamma ef = g(eg + e\sqrt{A}) + \delta ef$ for suitable $\gamma$ and $\delta$ to add $\beta\gamma ef$ to both sides and replace $\beta\sqrt{A}(eg + e\sqrt{A}) + \beta\gamma ef$ with $\beta g(eg + e\sqrt{A}) + \beta\delta ef$ to obtain another equation of the same form in which $i_2$ is $\alpha + \beta g$, while $\beta\delta$ is added to $i_1$ and $\beta\gamma$ is added to $j_1$, and so forth.) Therefore, $y + x\sqrt{A} \equiv 0 \bmod [ef, eg + e\sqrt{A}]$ if and only if $y + x\sqrt{A} + i_1 ef + i_2(eg + e\sqrt{A}) = j_1 ef + j_2(eg + e\sqrt{A})$ for some numbers $i_1$, $i_2$, $j_1$, $j_2$. Comparison of the coefficients of $\sqrt{A}$ shows not only that $x \equiv 0 \bmod e$ but also that $x + i_2 e = j_2 e$; then comparison of the other terms shows that $y + i_1 ef + i_2 eg = j_1 ef + j_2 eg = j_1 ef + (x + i_2 e)g$ and therefore that $y \equiv gx \bmod ef$. Conversely, if $x + ie = je$ and $y + kef = gx + lef$ for numbers $i$, $j$, $k$, $l$, then $y + x\sqrt{A} + kef + i(eg + e\sqrt{A}) = gx + lef + je\sqrt{A} + ieg = gje + lef + je\sqrt{A} = j(eg + e\sqrt{A}) + lef$, so $y + x\sqrt{A} \equiv 0 \bmod [ef, eg + e\sqrt{A}]$.
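The lemma turns membership in a module in canonical form into two ordinary congruences, which is easy to express in code. The following is a minimal sketch; the function name `is_zero_mod` and the representation of $y + x\sqrt{A}$ by the pair `(y, x)` are assumptions made for illustration.

```python
def is_zero_mod(y, x, e, f, g):
    """Test y + x*sqrt(A) = 0 mod [e*f, e*g + e*sqrt(A)] using the lemma.

    The module is assumed to be in canonical form (e*f != 0, g < f,
    g*g = A mod f).  By the lemma the congruence holds exactly when
    x = 0 mod e and y = g*x mod e*f; A itself does not appear in the test.
    """
    return x % e == 0 and (y - g * x) % (e * f) == 0


if __name__ == "__main__":
    # Entries met while reducing [7 + 5*sqrt(3)] to [26, 17 + sqrt(3)]:
    for y, x in [(7, 5), (15, 7), (8, 2), (25, 3), (17, 1), (26, 0)]:
        print((y, x), is_zero_mod(y, x, 1, 26, 17))
```

Every entry that occurs in the worked reduction of $[7 + 5\sqrt{3}]$ passes the test for the module $[26, 17 + \sqrt{3}]$, as it must, while a hypernumber such as 1 does not.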
Completion of the Proof of Theorem 2. Thus, if $[ef, eg + e\sqrt{A}] = [e'f', e'g' + e'\sqrt{A}]$, then $e'g' + e'\sqrt{A} \equiv 0 \bmod [ef, eg + e\sqrt{A}]$, which implies in particular that $e' \equiv 0 \bmod e$. By symmetry, $e \equiv 0 \bmod e'$, so $e = e'$. Then $e'f' \equiv g \cdot 0 \bmod ef$ implies $f' \equiv 0 \bmod f$, and $f' = f$ follows as before by symmetry. Finally, $eg' + e\sqrt{A} \equiv 0 \bmod [ef, eg + e\sqrt{A}]$ implies $eg' \equiv ge \bmod ef$, which is to say $g' \equiv g \bmod f$. Since both $g$ and $g'$ are less than $f = f'$, $g = g'$ follows.

Corollary. If two modules are equal, each can be transformed into the other by a sequence of steps of types (1)-(4).

Deduction. Such steps suffice to transform a module into its canonical form and vice versa.

The product of two modules of hypernumbers can be defined, exactly as in the case of modules of numbers, to be the module described by the list containing all products in which one factor is from a list describing the first module and the other factor is from a list describing the second. Products are easily shown to be well defined for modules; that is, if a factor is replaced by an equal module, the new product is equal to the old one. The product operation defined in this way is commutative and associative, which is to say that it makes the set of modules of hypernumbers for a given $A$ into a commutative semigroup. In this semigroup, $[1]$ is an identity.

The "modules" described here are closely related to Dedekind's "ideals" in the ring $\mathbf{Z}[\sqrt{A}]$, but the underlying attitude is opposite to Dedekind's. His goal was to divorce the theory as much as possible from algorithmic techniques, and he felt that he had achieved his goal by considering the infinite set of all ring elements that are zero mod $[ef, eg + e\sqrt{A}]$ to be a mathematical entity. To me, it borders on the absurd to believe that a mathematical idea is made "concrete" [23, Remark 21, p. 60] by describing it as an infinite set whose elements are themselves abstractions.
Modules of hypernumbers for a given $A$ are made concrete by specifying how they are to be described (as finite, nonempty lists of hypernumbers between square brackets) and how to compute with them (they are multiplied by the familiar rule, and one determines whether two given modules are equal by reducing them both to canonical form).

Examples. When $A = 3$, some modules in canonical form are $[2, 1 + \sqrt{3}]$, $[3, \sqrt{3}]$, $[11, 5 + \sqrt{3}]$, $[11, 6 + \sqrt{3}]$. Some products of such modules are

$[2, 1 + \sqrt{3}][2, 1 + \sqrt{3}] = [4, 2(1 + \sqrt{3}), 4 + 2\sqrt{3}] = [2][2, 1 + \sqrt{3}, 2 + \sqrt{3}] = [2][2, 1 + \sqrt{3}, 1] = [2]$,

$[11, 5 + \sqrt{3}][11, 5 + \sqrt{3}] = [11^2, 11(5 + \sqrt{3}), 28 + 10\sqrt{3}] = [121, 55 - 28 + (11 - 10)\sqrt{3}, 28 + 10\sqrt{3}] = [121, 27 + \sqrt{3}, 28 + 10\sqrt{3} + 2 \cdot 121 - 10(27 + \sqrt{3})] = [121, 27 + \sqrt{3}]$,

$[11, 5 + \sqrt{3}][11, 6 + \sqrt{3}] = [121, 11(5 + \sqrt{3}), 11(6 + \sqrt{3}), 33 + 11\sqrt{3}] = [11][11, 5 + \sqrt{3}, 6 + \sqrt{3}, 3 + \sqrt{3}] = [11][11, 2, 3, 3 + \sqrt{3}] = [11][11, 2, 1, 3 + \sqrt{3}] = [11]$.
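The products in the examples above can be checked mechanically. The sketch below is not the reduction algorithm of this essay: as a shortcut it allows negative integers and row-reduces the integer combinations of the entries and of $\sqrt{A}$ times the entries, which (on the assumption that congruence mod a module amounts to the difference lying in the integer span of these combinations) yields the same triple $(e, f, g)$. The function names and the encoding of $y + x\sqrt{A}$ as the pair `(y, x)` are hypothetical.

```python
from math import gcd


def mul(A, h, k):
    """Product of the hypernumbers h = (y, x) and k = (v, u)."""
    (y, x), (v, u) = h, k
    return (y * v + A * x * u, y * u + x * v)


def product(A, M, N):
    """List of all products of one entry of M and one entry of N."""
    return [mul(A, m, n) for m in M for n in N]


def canonical_form(A, M):
    """Return (e, f, g) such that M equals [e*f, e*g + e*sqrt(A)].

    Integer shortcut (an assumption, not the essay's natural-number steps):
    Euclid's algorithm is run on the sqrt(A)-coefficients of every entry m
    and of sqrt(A)*m, collecting the leftover numbers in a running gcd n.
    A is assumed not to be a square.
    """
    pivot, n = (0, 0), 0
    for m in M:
        for (y, x) in (m, mul(A, m, (0, 1))):
            py, px = pivot
            while x != 0:
                q = px // x
                py, px, y, x = y, x, py - q * y, px - q * x
            pivot, n = (py, px), gcd(n, abs(y))
    py, e = pivot
    if e == 0:
        raise ValueError("the module is equal to [0]")
    return e, n // e, (py % n) // e


if __name__ == "__main__":
    M = [(11, 0), (5, 1)]                       # [11, 5 + sqrt(3)]
    print(canonical_form(3, product(3, M, M)))  # compare with [121, 27 + sqrt(3)]
```

With this encoding, two modules are equal exactly when `canonical_form` returns the same triple for both, so `[2]` and the square of `[2, 1 + sqrt(3)]` can be compared directly.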
Essay 3.3 The Class Semigroup. Solution of $A\square + B = \square$.

... die schwierigste Frage ... nämlich die, ob zwei reducirte Formen derselben Determinante, welche verschiedenen Perioden angehören, äquivalent sein können oder nicht. (... the most difficult question ..., namely, whether two reduced forms with the same determinant that belong to different periods can be equivalent.)
P. G. Lejeune Dirichlet [16, §80]

Again let $A$ be a fixed number, not a square. As was seen in the previous essay, the modules of hypernumbers for $A$ form a commutative semigroup under multiplication, or, to put it more simply, the operation of multiplication of modules is commutative and associative. Computations in the semigroup of modules will be used in this essay to solve $A\square + B = \square$. A key role will be played by the following notion of equivalence of modules.

A module will be called principal* if it can be expressed in the form $[y + x\sqrt{A}]$ for some hypernumber $y + x\sqrt{A}$ that satisfies $y^2 > Ax^2$. The principal modules form a subsemigroup (in other words, a product of principal modules is principal) by virtue of Brahmagupta's formula $(y^2 - Ax^2)(v^2 - Au^2) = (yv + Axu)^2 - A(yu + xv)^2$, because this formula shows that the product $[(y + x\sqrt{A})(v + u\sqrt{A})] = [(yv + Axu) + (yu + xv)\sqrt{A}]$ of $[y + x\sqrt{A}]$ and $[v + u\sqrt{A}]$ satisfies $(yv + Axu)^2 > A(yu + xv)^2$ when $y^2 > Ax^2$ and $v^2 > Au^2$.

Two modules $M_1$ and $M_2$ will be called equivalent, written $M_1 \sim M_2$, if there are principal modules $P_1$ and $P_2$ for which $M_1 P_1 = M_2 P_2$. This is an equivalence relation (transitivity follows from the fact that a product of principal modules is principal, because $M_1 P_1 = M_2 P_2$ and $M_2 P_1' = M_3 P_2'$ imply $M_1 P_1 P_1' = M_2 P_2 P_1' = M_3 P_2' P_2$) that is consistent with multiplication of modules ($M_1 \sim M_2$ implies $M_1 M_3 \sim M_2 M_3$ for any module $M_3$). The class semigroup is simply the set of equivalence classes, multiplied by multiplying representatives.
Otherwise stated, the class semigroup is the quotient semigroup of the semigroup of modules relative to the subsemigroup of principal modules.

Computations in the class semigroup depend on solving the problem of determining whether two given modules are equivalent; this problem, which I will call the equivalence problem, is solved by the theorem of this essay. It is the main step in the solution of $A\square + B = \square$.

The equivalence problem cannot be solved by giving a canonical form that picks one representative out of each equivalence class, because there is no natural canonical form for this particular equivalence relation. Instead, the solution of the equivalence problem will follow a procedure like the one Gauss used in Section 5 of Disquisitiones Arithmeticae to determine whether two given binary quadratic forms are equivalent; it consists of two parts, the first establishing that every module is equivalent to one in a certain finite set of stable† modules, and the second giving a method of determining whether two stable modules are equivalent.

Specifically, an algorithm (the "comparison algorithm") will be given for generating a sequence of modules equivalent to a given one. A sequence of equivalent modules generated by the comparison algorithm eventually begins to cycle, as will be shown; a module will be called stable if the sequence of modules obtained by applying the comparison algorithm to it cycles back to this module itself. The equivalence problem will be solved by showing that the obvious sufficient condition for the equivalence of two modules (namely, that application of the comparison algorithm to them leads to the same cycle of stable modules) is also necessary. Thus, the answer to Dirichlet's "most difficult question" is no: Reduced forms in different periods are not equivalent, or, in the present formulation, stable modules in different cycles are not equivalent.

By the definition of equivalence, a module $[e][f, g + \sqrt{A}]$ in canonical form is equivalent to $[f, g + \sqrt{A}]$. Therefore, in solving the equivalence problem one can assume without loss of generality that the given modules in canonical form have $e = 1$.

Fig. 3.2. Gauss.

* This term derives from the fact that the module is in the principal class of the class group. It has always seemed to me peculiar to apply the adjective "principal" to the module itself, but the usage is universal among mathematicians.

† I have avoided Gauss's term "reduced" because it conflicts with my term "reduction algorithm," an algorithm for reducing a coefficient of $\sqrt{A}$, not for reducing the module.
Comparison Algorithm.
Input: A module $[f, g + \sqrt{A}]$ in canonical form with $e = 1$.
Algorithm: Let $r$ be the smallest solution of $r + g \equiv 0 \bmod f$ for which $r^2 > A$. Let $f_1 = (r^2 - A)/f$. Let $g_1$ be the smallest solution of $g_1 \equiv r \bmod f_1$.
Output: A module $[f_1, g_1 + \sqrt{A}]$ in canonical form with $e = 1$ and an equation $[r + \sqrt{A}][f, g + \sqrt{A}] = [f][f_1, g_1 + \sqrt{A}]$ showing that it is equivalent to the input module.

That the definition of $f_1$ makes sense (that is, that $r^2 \equiv A \bmod f$) follows from $r \equiv -g \bmod f$ and the fact that $g$ is a square root of $A \bmod f$. That $g_1$ is a square root of $A \bmod f_1$ follows from $r \equiv g_1 \bmod f_1$ and $r^2 - A = f f_1$. Of course $g_1 < f_1$, because $g_1$ is the smallest number in its class mod $f_1$. Finally, when $q$ is defined by $qf = r + g$ one obtains the output equation

$[r + \sqrt{A}][f, g + \sqrt{A}] = [f(r + \sqrt{A}),\ rg + A + (r + g)\sqrt{A}]$
$= [f(r + \sqrt{A}),\ fq(r + \sqrt{A}),\ rg + r^2 - f f_1 + (r + g)\sqrt{A}]$
$= [f(r + \sqrt{A}),\ fq(r + \sqrt{A}),\ rfq - f f_1 + fq\sqrt{A}]$
$= [f(r + \sqrt{A}),\ f f_1,\ rfq - f f_1 + fq\sqrt{A}]$
$= [f(r + \sqrt{A}),\ f f_1] = [f][f_1, r + \sqrt{A}] = [f][f_1, g_1 + \sqrt{A}]$.

Let the output $[f_1, g_1 + \sqrt{A}]$ of the comparison algorithm be called the immediate successor of the input $[f, g + \sqrt{A}]$, and let the successors of $[f, g + \sqrt{A}]$ be the modules in the sequence generated by repeated application of the comparison algorithm. Not only is each successor of $[f, g + \sqrt{A}]$ equivalent to $[f, g + \sqrt{A}]$, but the algorithm gives an explicit equivalence

(1) $\left[\prod_{i=1}^{k} (r_i + \sqrt{A})\right][f, g + \sqrt{A}] = \left[\prod_{i=0}^{k-1} f_i\right][f_k, g_k + \sqrt{A}]$

where $[f_i, g_i + \sqrt{A}]$ is the $i$th successor of $[f, g + \sqrt{A}] = [f_0, g_0 + \sqrt{A}]$ and where $r_i$ is the value of $r$ used by the comparison algorithm to go from $[f_{i-1}, g_{i-1} + \sqrt{A}]$ to $[f_i, g_i + \sqrt{A}]$.

Theorem. Let $[f, g + \sqrt{A}]$ and $[F, G + \sqrt{A}]$ be modules in canonical form with $e = 1$, and let $[F, G + \sqrt{A}]$ be stable. Formula (1) describes all equivalences between $[f, g + \sqrt{A}]$ and $[F, G + \sqrt{A}]$ in the sense that any equivalence $[Y + X\sqrt{A}][f, g + \sqrt{A}] = [V + U\sqrt{A}][F, G + \sqrt{A}]$ in which $Y^2 > AX^2$ and $V^2 > AU^2$ must satisfy

$(V + U\sqrt{A}) \prod_{i=1}^{k} (r_i + \sqrt{A}) = (Y + X\sqrt{A}) \prod_{i=0}^{k-1} f_i$,
where $k$ is a number for which $[F, G + \sqrt{A}]$ is the $k$th successor of $[f, g + \sqrt{A}]$. In particular, there are no equivalences when $[F, G + \sqrt{A}]$ is not a successor of $[f, g + \sqrt{A}]$.

Equation (1) implies that both coefficients of the hypernumber $f \prod_{i=1}^{k} (r_i + \sqrt{A})$ are divisible by $\prod_{i=0}^{k-1} f_i$. Therefore both coefficients of $(r_1 + \sqrt{A})(r_2 + \sqrt{A}) \cdots (r_k + \sqrt{A})$ are divisible by $f_1 f_2 \cdots f_{k-1}$. Thus, (1) can be divided by $f_1 f_2 \cdots f_{k-1}$, which will normally be a very large number, to put it in the form

(2) $[y + x\sqrt{A}][f, g + \sqrt{A}] = [f][f_k, g_k + \sqrt{A}]$ where $y + x\sqrt{A} = \dfrac{\prod_{i=1}^{k} (r_i + \sqrt{A})}{\prod_{i=1}^{k-1} f_i}$.
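The comparison algorithm and equation (2) lend themselves to a short computational sketch. The function names below (`successor`, `stable_cycle`, `equation_2`) are hypothetical, modules $[f, g + \sqrt{A}]$ are encoded as pairs `(f, g)`, and $A$ is assumed not to be a square. The values used in the demonstration are taken from the $A = 7$ example given later in this essay.

```python
def successor(A, f, g):
    """One step of the comparison algorithm on [f, g + sqrt(A)].

    Returns (r, f1, g1): r is the least solution of r + g = 0 mod f with
    r*r > A, f1 = (r*r - A)/f, and g1 is the least number congruent to
    r mod f1.
    """
    r = (-g) % f
    while r * r <= A:
        r += f
    f1 = (r * r - A) // f
    return r, f1, r % f1


def stable_cycle(A, f, g):
    """Follow successors of [f, g + sqrt(A)] until they repeat; return the cycle."""
    seen, m = [], (f, g)
    while m not in seen:
        seen.append(m)
        _, f1, g1 = successor(A, *m)
        m = (f1, g1)
    return seen[seen.index(m):]


def equation_2(A, f, g, k):
    """Return ((y, x), (fk, gk)) as in equation (2), for k >= 1.

    y + x*sqrt(A) is the product of the r_i + sqrt(A) for i = 1..k divided
    by f_1 * ... * f_(k-1); the divisions are carried out one at a time and
    are checked to be exact.
    """
    y, x, divisor = 1, 0, 1
    fi, gi = f, g
    for _ in range(k):
        r, f_next, g_next = successor(A, fi, gi)
        ny, nx = y * r + A * x, y + x * r      # multiply by r + sqrt(A)
        assert ny % divisor == 0 and nx % divisor == 0
        y, x = ny // divisor, nx // divisor
        divisor, fi, gi = f_next, f_next, g_next
    return (y, x), (fi, gi)


if __name__ == "__main__":
    print(stable_cycle(7, 83, 16))
    print(equation_2(7, 83, 16, 9))
```

For $A = 7$ and the module $[83, 16 + \sqrt{7}]$, nine steps lead to $[3, 1 + \sqrt{7}]$ with $y + x\sqrt{A} = 236 + 89\sqrt{7}$, in agreement with the example equation $[236 + 89\sqrt{7}][83, 16 + \sqrt{7}] = [83][3, 1 + \sqrt{7}]$.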
The theorem shows that the most general equivalence $[Y + X\sqrt{A}][f, g + \sqrt{A}] = [V + U\sqrt{A}][F, G + \sqrt{A}]$ is obtained from (2) by multiplying by $\left[(V + U\sqrt{A}) \prod_{i=1}^{k-1} f_i\right]$ and dividing by $\left[\prod_{i=0}^{k-1} f_i\right]$.

Proposition 1. The number of stable modules is finite, and every module has a stable successor.

Proof. As above, let $[f_i, g_i + \sqrt{A}]$ be the $i$th successor of the given module, and let $r_i$ be the number used in passing from $[f_{i-1}, g_{i-1} + \sqrt{A}]$ to $[f_i, g_i + \sqrt{A}]$. In other words, let $r_i$ be the least solution of $r_i + g_{i-1} \equiv 0 \bmod f_{i-1}$ whose square is greater than $A$. It will be shown that if $|r_i - f_{i-1}|^2 > A$, then $|r_i - f_{i-1}|^2 > |r_{i+1} - f_i|^2$.

Note first that $|r_i - f_{i-1}|^2 < A$ if and only if $|r_i - f_i|^2 < A$, because both are equivalent to $f_{i-1} + f_i < 2r_i$, as one sees when one writes them as $r_i^2 + f_{i-1}^2 < 2r_i f_{i-1} + A$ and $r_i^2 + f_i^2 < 2r_i f_i + A$, respectively, subtracts $A$ from both sides, and uses $r_i^2 - A = f_{i-1} f_i$ to obtain $f_{i-1} f_i + f_{i-1}^2 < 2r_i f_{i-1}$ and $f_{i-1} f_i + f_i^2 < 2r_i f_i$, respectively. In the same way, the three inequalities $|r_i - f_{i-1}|^2 > A$, $|r_i - f_i|^2 > A$, and $f_{i-1} + f_i > 2r_i$ all imply one another.

Also, on successive steps, the inequality $r_i + r_{i+1} \ge 2f_i$ holds, as can be seen as follows: Because $r_i + r_{i+1} \equiv g_i + r_{i+1} \equiv 0 \bmod f_i$, it will suffice to prove that $r_i + r_{i+1} > f_i$. This is true if $r_i \ge f_i$. It is also true if $r_i \le f_{i-1}$, because then $r_i^2 > r_i^2 - A = f_i f_{i-1} \ge f_i r_i$, so $r_i > f_i$. Otherwise, $f_{i-1} < r_i < f_i$, in which case $(r_i - f_{i-1})^2 < A$ by the definition of $r_i$ ($A$ is not a square, so $(r_i - f_{i-1})^2 \neq A$), which implies $|r_i - f_i|^2 < A$, as was just seen. Thus, $(f_i - r_i)^2 < r_{i+1}^2$ and $f_i - r_i < r_{i+1}$ in this case as well.

Suppose now that $|r_i - f_{i-1}|^2 > A$. If $f_i \le r_{i+1}$, the definition of $r_{i+1}$ implies $|r_{i+1} - f_i|^2 < A$, so of course $|r_i - f_{i-1}|^2 > |r_{i+1} - f_i|^2$ in that case. Otherwise, $r_{i+1} < f_i$, in which case the inequality of the last paragraph implies that $r_i - f_i \ge f_i - r_{i+1} > 0$. On the other hand, the assumption $|r_i - f_{i-1}|^2 > A$
implies $f_{i-1} + f_i > 2r_i$, as was seen above. Therefore $f_{i-1} - r_i > r_i - f_i$, which combines with the previous inequality to give $f_{i-1} - r_i > f_i - r_{i+1} > 0$, from which the desired inequality $|r_i - f_{i-1}|^2 > |r_{i+1} - f_i|^2$ follows.

Therefore $|r_i - f_{i-1}|^2$ decreases as long as it is greater than $A$, so a step must be reached at which $|r_i - f_{i-1}|^2 < A$. That the same inequality holds on all subsequent steps (which is to say that $|r_i - f_{i-1}|^2 < A$ implies $|r_{i+1} - f_i|^2 < A$) can be proved as follows: If $|r_i - f_{i-1}|^2 < A$, then $|r_i - f_i|^2 < A$, as was seen. If $|r_{i+1} - f_i|^2$ were greater than $A$, then $f_i$ would be greater than $r_{i+1}$ ($r_{i+1}$ is the least number in its class mod $f_i$ whose square is greater than $A$), in which case the above inequality $r_i + r_{i+1} \ge 2f_i$ would imply $r_i - f_i \ge f_i - r_{i+1} > 0$, from which $|r_i - f_i|^2 \ge |f_i - r_{i+1}|^2 > A$ would follow. Therefore, $|f_i - r_{i+1}|^2$ must be less than $A$.

Thus, the sequence of successors of any module eventually reaches a module $[f, g + \sqrt{A}]$ in canonical form for which $|r - f|^2 < A$, where $r$ is the least solution of $r + g \equiv 0 \bmod f$ for which $r^2 > A$. Let $\mathcal{M}$ denote the set of such modules. The set $\mathcal{M}$ is finite, as one sees when one sets $\phi = |r - f|$ and notes that then $\phi^2 < A$ and $\phi \equiv \pm r \equiv \pm g \bmod f$, so $f$ divides $A - \phi^2$. In particular, $f \le A$. Since canonical form requires that $g$ be less than $f$, $\mathcal{M}$ is therefore finite.

The comparison algorithm defines a function from $\mathcal{M}$ to itself, as was shown above. Since it carries $[f, g + \sqrt{A}]$ in $\mathcal{M}$ to a module $[f_1, g_1 + \sqrt{A}]$ for which $|r - f_1|^2 < A$, $f_1$ and $g_1$ determine $r$ as the least number in the class of $g_1 \bmod f_1$ whose square is greater than $A$. (If $g_1^2 > A$, then $r = g_1$; otherwise, $r = g_1 + \mu f_1$ for $\mu > 0$.) Therefore, $[f_1, g_1 + \sqrt{A}]$ determines $r$ and determines $[f, g + \sqrt{A}]$ by the rules $f = (r^2 - A)/f_1$ and $g \equiv -r \bmod f$. In short, the function from $\mathcal{M}$ to itself defined by the comparison algorithm is one-to-one.

Therefore, the comparison algorithm permutes the finite set $\mathcal{M}$, which implies that every module in $\mathcal{M}$ is stable (application of the comparison algorithm to it cycles back to this module itself) and the proof of the proposition is complete.

Moreover, it has been shown that the stable modules are precisely those in $\mathcal{M}$. (It is not difficult to show that these are the modules $[f, g + \sqrt{A}]$ in canonical form in which $f$ divides a number of the form $A - \phi^2$ and the square root $g$ of $A \bmod f$ satisfies either $g^2 < A$ or $(f - g)^2 < A$.) See the table at the end of the essay for a list of the stable modules for a few values of $A$ and the cycles into which they are partitioned by the comparison algorithm.

The first step in finding all equivalences between $[f, g + \sqrt{A}]$ and $[F, G + \sqrt{A}]$, where $[F, G + \sqrt{A}]$ is stable, will be to find all equivalences of the special form $[y + x\sqrt{A}][f, g + \sqrt{A}] = [n][F, G + \sqrt{A}]$ in which $y + x\sqrt{A} \equiv 0 \bmod [F, G + \sqrt{A}]$. The solution of this problem will use the following algorithm:

Reduction Algorithm.
Input: An equation $[y + x\sqrt{A}][f, g + \sqrt{A}] = [n][F, G + \sqrt{A}]$ in which $x > 0$, $y^2 > Ax^2$, the modules $[f, g + \sqrt{A}]$ and $[F, G + \sqrt{A}]$ are in canonical form, and $y + x\sqrt{A} \equiv 0 \bmod [F, G + \sqrt{A}]$. ($[F, G + \sqrt{A}]$ need not be stable.)
Algorithm: Determine $p$ as the least number congruent to $G \bmod F$ for which $y \le px$. Define $y_1 + x_1\sqrt{A}$ to be $(p - \sqrt{A})(y + x\sqrt{A})/F$. Define $F_1$ to be $(p^2 - A)/F$ and $G_1$ to be the least solution of $p + G_1 \equiv 0 \bmod F_1$.
Output: A new equation $[y_1 + x_1\sqrt{A}][f, g + \sqrt{A}] = [n][F_1, G_1 + \sqrt{A}]$, with $x_1 < x$, which can be used as a new input equation (that is, $[F_1, G_1 + \sqrt{A}]$ is in canonical form, $y_1^2 > Ax_1^2$, and $y_1 + x_1\sqrt{A} \equiv 0 \bmod [F_1, G_1 + \sqrt{A}]$) unless $x_1 = 0$.

Justification. By the choice of $p$, $px \ge y$ and $p \equiv G \bmod F$, so the definition of $x_1$ as $(px - y)/F$ is valid by virtue of $y \equiv Gx \equiv px \bmod F$ (because $y + x\sqrt{A} \equiv 0 \bmod [F, G + \sqrt{A}]$). Moreover, $Fx_1 = px - y < Fx$ because $px - y \ge Fx$ would imply $p \ge F$ and $(p - F)x \ge y$, contrary to the definition of $p$. Thus, $x_1 < x$. Since $p^2 x^2 \ge y^2 > Ax^2$ implies $p^2 > A$ (because $x > 0$), it follows that $(py)^2 = p^2 \cdot y^2 > A \cdot Ax^2 = (Ax)^2$ and $py > Ax$; at the same time, $py \equiv Gy \equiv G^2 x \equiv Ax \bmod F$, which shows that the definition of $y_1$ as $(py - Ax)/F$ is valid. That $y_1^2 > Ax_1^2$ follows from $(p^2 - A)(y^2 - Ax^2) > 0$ when one rewrites this inequality first as $p^2 y^2 + A^2 x^2 > Ax^2 p^2 + Ay^2$, then as $(py - Ax)^2 > A(xp - y)^2$, and divides by $F^2$. Since $p^2 \equiv G^2 \equiv A \bmod F$, the definition of $F_1$ as $(p^2 - A)/F$ is valid and $F_1 > 0$. Also, $G_1^2 \equiv (-p)^2 = p^2 \equiv A \bmod F_1$ by virtue of $p^2 - A = FF_1$, so $[F_1, G_1 + \sqrt{A}]$ is in canonical form. When $q$ is defined by $p + G_1 = qF_1$, one deduces

$[F_1][F, G + \sqrt{A}] = [F_1][F, p + \sqrt{A}] = [(p - \sqrt{A})(p + \sqrt{A}),\ F_1(p + \sqrt{A})]$
$= [(qF_1 - G_1 - \sqrt{A})(p + \sqrt{A}),\ F_1(p + \sqrt{A})]$
$= [(qF_1 - G_1 - \sqrt{A})(p + \sqrt{A}),\ F_1(p + \sqrt{A}),\ qF_1(p + \sqrt{A}) - (qF_1 - G_1 - \sqrt{A})(p + \sqrt{A})]$ (the third entry is $q$ times the second minus the first)
$= [(qF_1 - G_1 - \sqrt{A})(p + \sqrt{A}),\ F_1(p + \sqrt{A}),\ (G_1 + \sqrt{A})(p + \sqrt{A})]$
$= [F_1(p + \sqrt{A}),\ (G_1 + \sqrt{A})(p + \sqrt{A})] = [p + \sqrt{A}][F_1, G_1 + \sqrt{A}]$.

Since $(p + \sqrt{A})(y_1 + x_1\sqrt{A}) = (p^2 - A)(y + x\sqrt{A})/F = F_1(y + x\sqrt{A}) \equiv 0 \bmod [F_1][F, G + \sqrt{A}]$, the equation $[F_1][F, G + \sqrt{A}] = [p + \sqrt{A}][F_1, G_1 + \sqrt{A}]$ implies $(p + \sqrt{A})(y_1 + x_1\sqrt{A}) \equiv 0 \bmod [p + \sqrt{A}][F_1, G_1 + \sqrt{A}]$ and therefore implies* $y_1 + x_1\sqrt{A} \equiv 0 \bmod [F_1, G_1 + \sqrt{A}]$. Finally, multiplication of the input equation by $[F_1]$ gives $[F_1][y + x\sqrt{A}][f, g + \sqrt{A}] = [n][p + \sqrt{A}][F_1, G_1 + \sqrt{A}]$; multiply by $[p - \sqrt{A}]$ (which is valid even though $p - \sqrt{A}$ is not a hypernumber because the hypernumbers $y + x\sqrt{A}$ and $p + \sqrt{A}$ can both be multiplied by $p - \sqrt{A}$) to put this equation in the form $[F_1][F(y_1 + x_1\sqrt{A})][f, g + \sqrt{A}] = [n][FF_1][F_1, G_1 + \sqrt{A}]$ and divide by $[FF_1]$ to conclude that the output equation holds.

* The definitions imply (when use is made of the fact that if $a$, $b$ and $c$ are hypernumbers with $c \neq 0$ then $ac = bc$ implies $a = b$) that, for any nonzero hypernumber $c$ and any module $M$, a congruence $ac \equiv bc \bmod [c]M$ implies $a \equiv b \bmod M$.

The theorem will be proved by proving that if $[F, G + \sqrt{A}]$ is stable and if the reduction algorithm is applied iteratively until $x$ is reduced to zero, then (1) the terminal equation is obvious from the original equation and (2) the steps of the algorithm can be retraced using the comparison algorithm
Essay 3.3 The Class Semigroup. Solution of A□ + B = □.
to go from the terminal equation back to the original, thereby determining the possible original equations and showing that $[F, G + \sqrt{A}]$ is a successor of $[f, g + \sqrt{A}]$.

For example, the input equation $[236 + 89\sqrt{7}][83, 16 + \sqrt{7}] = [83][3, 1 + \sqrt{7}]$ leads to

$$
\begin{array}{rcll}
[236 + 89\sqrt{7}]\,[83, 16 + \sqrt{7}] &=& [83]\,[3, 1 + \sqrt{7}] & \\
[107 + 40\sqrt{7}]\,[83, 16 + \sqrt{7}] &=& [83]\,[3, 2 + \sqrt{7}] & (p_1 = 4), \\
[85 + 31\sqrt{7}]\,[83, 16 + \sqrt{7}] &=& [83]\,[6, 1 + \sqrt{7}] & (p_2 = 5), \\
[63 + 22\sqrt{7}]\,[83, 16 + \sqrt{7}] &=& [83]\,[7, \sqrt{7}] & (p_3 = 7), \\
[41 + 13\sqrt{7}]\,[83, 16 + \sqrt{7}] &=& [83]\,[6, 5 + \sqrt{7}] & (p_4 = 7), \\
[19 + 4\sqrt{7}]\,[83, 16 + \sqrt{7}] &=& [83]\,[3, 1 + \sqrt{7}] & (p_5 = 5), \\
[35 + 3\sqrt{7}]\,[83, 16 + \sqrt{7}] &=& [83]\,[14, 7 + \sqrt{7}] & (p_6 = 7), \\
[51 + 2\sqrt{7}]\,[83, 16 + \sqrt{7}] &=& [83]\,[31, 10 + \sqrt{7}] & (p_7 = 21), \\
[67 + \sqrt{7}]\,[83, 16 + \sqrt{7}] &=& [83]\,[54, 13 + \sqrt{7}] & (p_8 = 41), \\
[83]\,[83, 16 + \sqrt{7}] &=& [83]\,[83, 16 + \sqrt{7}] & (p_9 = 67).
\end{array}
$$
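The reduction steps in this example can be checked mechanically. The following Python sketch is not part of the original text; the function names `reduction_step` and `reduce_fully` are illustrative choices, and the sketch assumes that $p$ is taken to be the least number congruent to $G$ mod $F$ with $px \ge y$, which is the reading under which the last step of the table reaches $x = 0$.

```python
def reduction_step(A, y, x, F, G):
    """Apply one step to [y + x*sqrt(A)][f, g + sqrt(A)] = [n][F, G + sqrt(A)].

    Returns (p, y1, x1, F1, G1). Assumes x > 0 and a valid input equation.
    """
    assert x > 0, "the equation is already fully reduced"
    # p is the least number congruent to G mod F with p*x >= y (assumed reading).
    p = G % F
    while p * x < y:
        p += F
    # (p - sqrt(A)) * (y + x*sqrt(A)) = (p*y - A*x) + (p*x - y)*sqrt(A)
    y1, rem_y = divmod(p * y - A * x, F)
    x1, rem_x = divmod(p * x - y, F)
    F1, rem_F = divmod(p * p - A, F)
    assert rem_y == rem_x == rem_F == 0, "not a valid input equation"
    G1 = (-p) % F1  # least solution of p + G1 = 0 mod F1
    return p, y1, x1, F1, G1


def reduce_fully(A, y, x, F, G):
    """Iterate reduction_step until x = 0 and return the list of steps."""
    steps = []
    while x > 0:
        p, y, x, F, G = reduction_step(A, y, x, F, G)
        steps.append((p, y, x, F, G))
    return steps


if __name__ == "__main__":
    # Input equation [236 + 89 sqrt(7)][83, 16 + sqrt(7)] = [83][3, 1 + sqrt(7)]
    for p, y, x, F, G in reduce_fully(7, 236, 89, 3, 1):
        print(f"p = {p:2d}:  [{y} + {x} sqrt(7)]   [{F}, {G} + sqrt(7)]")
```

Only the first factor on the left and the module on the right are tracked, since the other two factors do not change from step to step.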
Each step leaves the second factor on the left and the first factor on the right unchanged. At the last step, the uniqueness of canonical form implies that the two sides are identical. Therefore, the terminal equation $[83][83, 16 + \sqrt{7}] = [83][83, 16 + \sqrt{7}]$ is determined without computation by the original one. Moreover, at each step the module on the right will be seen to be the immediate successor of the module below it. In fact, the number $p_i$ used to go from equation $i - 1$ to equation $i$ is the number $r$ used by the comparison algorithm to go from the module in equation $i$ to the one in equation $i - 1$, which implies that the input equation at the top of the list can be obtained by starting with the identity at the bottom and successively multiplying by $67 + \sqrt{7}$, dividing by 83, multiplying by $41 + \sqrt{7}$, dividing by 54, and so forth, applying the operations to the hypernumbers in the first factors on the left and to the modules in the second factors on the right.

As this example indicates, the key fact used to determine the possible input equations is that application of the reduction algorithm to an equation $[y + x\sqrt{A}][f, g + \sqrt{A}] = [n][F, G + \sqrt{A}]$ in which $[F, G + \sqrt{A}]$ is stable produces a sequence of equations $[y_i + x_i\sqrt{A}][f, g + \sqrt{A}] = [n][F_i, G_i + \sqrt{A}]$ in which the immediate successor of $[F_i, G_i + \sqrt{A}]$ is $[F_{i-1}, G_{i-1} + \sqrt{A}]$ and the number $p_i$ used by the reduction algorithm to go from equation $i - 1$ to equation $i$ is the number used by the comparison algorithm to go from $[F_i, G_i + \sqrt{A}]$ to its immediate successor. Let a step of the reduction algorithm be called traceable if the number $p$ used to perform it is equal to the number $r$ used by the comparison algorithm to determine the immediate successor of $[F_1, G_1 + \sqrt{A}]$.

Lemma. A step of the reduction algorithm is traceable if $[F, G + \sqrt{A}]$ is stable or if it follows a traceable step.

Proof. Let $[y + x\sqrt{A}][f, g + \sqrt{A}] = [n][F, G + \sqrt{A}]$ be an input to the reduction algorithm.
To say that the resulting step of the reduction algorithm is
3 Some Quadratic Problems
traceable is to say that $p = r$ where $p$ is the number used by the reduction algorithm and $r$ is the number used by the comparison algorithm to determine the immediate successor of $[F_1, G_1 + \sqrt{A}]$. Let $s$ be the number used by the comparison algorithm to determine the immediate successor of $[F, G + \sqrt{A}]$. The first step of the proof will be to show that if the step is not traceable, then $p + s = F$.

If the step is not traceable, then because $p^2 > A$ and $p + G_1 \equiv 0 \bmod F_1$, and because $r$ is by definition the smallest number for which $r^2 > A$ and $r + G_1 \equiv 0 \bmod F_1$, $p$ must be at least as great as $F_1$, and the square of $p - F_1$ must be greater than $A$. Then $(p - F_1)^2 > A$, or $p^2 + F_1^2 > 2pF_1 + A$, from which it follows (subtract $A$ and divide by $F_1$) that $F + F_1 > 2p$ and $F > p + (p - F_1) \ge p$, so $F - p > p - F_1 \ge 0$. If $p + s$ were greater than or equal to $2F$, it would follow that $s \ge F + (F - p) > F$ and $(s - F)^2 \ge (F - p)^2 > (p - F_1)^2 > A$, contrary to the definition of $s$. Thus, $p + s < 2F$. Since $p + s \equiv G + s \equiv 0 \bmod F$, the desired conclusion $p + s = F$ follows from the assumption that the step is not traceable.

Since $p + s = F$ implies $|s - F|^2 = p^2 > A$, which in turn implies that $[F, G + \sqrt{A}]$ is not stable (see the proof of Proposition 1), the first statement of the lemma, that if $[F, G + \sqrt{A}]$ is stable then the step is traceable, follows.

Suppose, finally, that the step follows a traceable step of the reduction algorithm. Then the step that it follows is retraced by multiplying by $s + \sqrt{A}$ and dividing by $F$. Thus, the previous $x$ is $(y + sx)/F$. Since the reduction algorithm reduces $x$, it follows that $x < (y + sx)/F$, which is to say $Fx < y + sx$. If the step were not traceable, $F$ would be $s + p$, so $px + sx$ would be less than $y + sx$, and $px$ would be less than $y$, contrary to the definition of $p$. Therefore, the proof of the lemma is complete.

Proposition 2 (Solution of Pell's equation). The only solutions of Pell's equation $Ax^2 + 1 = y^2$ are those given by the reduction algorithm, namely, the pairs $(x, y)$ given by formula (2) when $[f, g + \sqrt{A}] = [f_k, g_k + \sqrt{A}] = [1]$.

Proof. Putting $[y + x\sqrt{A}]$ in canonical form when $y$ and $x$ are relatively prime and $y^2 > Ax^2$ easily gives $[y + x\sqrt{A}] = [y^2 - Ax^2, g + \sqrt{A}]$, where $g$ is determined by $y \equiv gx \bmod (y^2 - Ax^2)$. Therefore, $Ax^2 + 1 = y^2$ implies $[y + x\sqrt{A}] = [1]$. Conversely, if $y^2 > Ax^2$ and $[y + x\sqrt{A}] = [1]$, then $x$ and $y$ are relatively prime and $y^2 - Ax^2 = 1$. In short, solutions of Pell's equation correspond one-to-one to hypernumbers $y + x\sqrt{A}$ for which $y^2 > Ax^2$ and $[y + x\sqrt{A}] = [1]$. Since $[1]$ is stable, infinitely many of its successors are $[1]$. Each such successor implies a solution $(x_k, y_k)$ of Pell's equation given by the formula

$$y_k + x_k\sqrt{A} = \frac{\prod_{i=1}^{k}(r_i + \sqrt{A})}{\prod_{i=0}^{k-1} f_i}.$$

What is to be shown is that there are no others.
But a solution of Pell's equation is, as was just shown, a hypernumber $y + x\sqrt{A}$ that satisfies $[y + x\sqrt{A}][1] = [1]$. This equation is an input to the reduction algorithm (write the right side as $[1][1, \sqrt{A}]$), and repeated application of the reduction algorithm reduces it to $[1][1, \sqrt{A}] = [1][1, \sqrt{A}]$. The input $y + x\sqrt{A} = 1 + 0 \cdot \sqrt{A}$ of course is already reduced and corresponds to the trivial solution $A \cdot 0^2 + 1 = 1^2$ of Pell's equation. Otherwise, the reduction requires $k \ge 1$ steps, all steps are traceable, and $y + x\sqrt{A}$ is obtained when 1 is multiplied by $p_k + \sqrt{A}$ and divided by 1, multiplied by $p_{k-1} + \sqrt{A}$ and divided by $f_{k-1}$, and so forth. Since the sequence of $p$'s is the sequence, in reverse order, of $r$'s obtained by applying the comparison algorithm to $[1]$, Proposition 2 follows.

For example, when $A = 13$, the cycle of $[1]$ is $[1]$, $[3, 1 + \sqrt{13}]$, $[4, 1 + \sqrt{13}]$, $[9, 7 + \sqrt{13}]$, $[12, 11 + \sqrt{13}]$, $[13, \sqrt{13}]$, $[12, 1 + \sqrt{13}]$, $[9, 2 + \sqrt{13}]$, $[4, 3 + \sqrt{13}]$, $[3, 2 + \sqrt{13}]$, after which the sequence returns to $[1]$ and repeats. The $r$'s used at the successive steps are 4, 5, 7, 11, 13, 13, 11, 7, 5, 4, after which they repeat. Thus the smallest solution of $13x^2 + 1 = y^2$ other than the trivial one is given by the coefficients of

$$\frac{(4 + \sqrt{13})^2 (5 + \sqrt{13})^2 (7 + \sqrt{13})^2 (11 + \sqrt{13})^2 (13 + \sqrt{13})^2}{3^2 \cdot 4^2 \cdot 9^2 \cdot 12^2 \cdot 13}$$
which is easily found to be $649 + 180\sqrt{13}$. That is, the smallest solution of Pell's equation when $A = 13$ is $13 \cdot 180^2 + 1 = 649^2$. Since $(649 + 180\sqrt{13})^2 = 842401 + 233640\sqrt{13}$, the next smallest solution is $13 \cdot 233640^2 + 1 = 842401^2$, and so forth. (For any $A$, as for $A = 13$, the sequence of $r$'s is in fact a palindrome, so the sequence of $p$'s is identical to the sequence of $r$'s.)

Proposition 3. If $[f, g + \sqrt{A}]$ is principal, then $[f, f - g + \sqrt{A}]$ is also principal, and the product of these modules is $[f]$.

Proof. Suppose $[f, g + \sqrt{A}] = [y + x\sqrt{A}]$, where $y^2 > Ax^2$. Since any common divisor of $x$ and $y$ divides the coefficient of $\sqrt{A}$ in $g + \sqrt{A}$, $x$ and $y$ are relatively prime. Therefore, the number $y^2 - Ax^2$, call it $N$, is relatively prime to $x$, and $x$ has a reciprocal, call it $r$, mod $N$. Then $[y + x\sqrt{A}] = [N, y + x\sqrt{A}, r(y + x\sqrt{A})] = [N, G + \sqrt{A}]$, where $G \equiv ry \bmod N$, so $f = N$, $g = G$, and $y \equiv xry \equiv gx \bmod N$.

The solutions $(X, Y)$ of Pell's equation obviously grow without bound, so there is a solution $Y^2 = AX^2 + 1$ of Pell's equation in which $X > x$. Since $y^2 - Ax^2 = f > 0$, it follows that $X^2y^2 = AX^2x^2 + fX^2 > AX^2x^2 + x^2 = Y^2x^2$, which implies $Xy > Yx$. Also, $Y^2y^2 = A^2x^2X^2 + Ax^2 + fAX^2 + f > A^2X^2x^2$, so $Yy > AXx$. Therefore, the formula $z + w\sqrt{A} = (Y + X\sqrt{A})(y - x\sqrt{A})$ defines a hypernumber $z + w\sqrt{A}$ (even though, by the strict definition being used here, $y - x\sqrt{A}$ is not a hypernumber). This hypernumber satisfies $(z + w\sqrt{A})(y + x\sqrt{A}) = (Y + X\sqrt{A})(y - x\sqrt{A})(y + x\sqrt{A}) = (Y + X\sqrt{A})f$. Thus,
$$[z + w\sqrt{A}][y + x\sqrt{A}] = [Y + X\sqrt{A}][f] = [f],$$

and what is to be shown is that $[z + w\sqrt{A}] = [f, f - g + \sqrt{A}]$.

Now, $z^2 = (Yy - AxX)^2 = Y^2y^2 - 2AxyXY + A^2x^2X^2 = Y^2y^2 - AY^2x^2 + AY^2x^2 - 2AxyXY + AX^2y^2 - AX^2y^2 + A^2x^2X^2 = (Y^2 - AX^2)(y^2 - Ax^2) + A(Xy - Yx)^2 = Aw^2 + f$. Thus $z^2 - Aw^2 = f$, and it remains only to show that $z \equiv (f - g)w \bmod f$. But equating coefficients of $\sqrt{A}$ in $(z + w\sqrt{A})(y + x\sqrt{A}) = (Y + X\sqrt{A})f$ gives $wy + zx = fX$, so $0 \equiv wy + zx \equiv wgx + zx \bmod f$, which implies, because $x$ is relatively prime to $f$, that $wg + z \equiv 0 \bmod f$, or $z \equiv (f - g)w \bmod f$, as was to be shown.

Corollary. A module that is equivalent to $[1]$ is principal.

Deduction. To say that $M$ is equivalent to $[1]$ means that there are principal modules $P_1$ and $P_2$ for which $MP_1 = P_2$. By Proposition 3, there is a principal module $P_3$ such that $P_1P_3 = [n]$ for some number $n$. Thus $M[n] = P_2P_3$, which implies $M[n] = [z + w\sqrt{A}]$, where $z^2 > Aw^2$. This equation implies that $n$ divides both $z$ and $w$, so $M = [\frac{z}{n} + \frac{w}{n}\sqrt{A}]$ is principal.

Proof of the Theorem. Suppose that $[f, g + \sqrt{A}]$ and $[F, G + \sqrt{A}]$ are equivalent, say $[y + x\sqrt{A}][f, g + \sqrt{A}] = [v + u\sqrt{A}][F, G + \sqrt{A}]$ where $y^2 > Ax^2$ and $v^2 > Au^2$, and that $[F, G + \sqrt{A}]$ is stable. By Proposition 3, there is a hypernumber $z + w\sqrt{A}$ for which $z^2 > Aw^2$ and $[z + w\sqrt{A}][v + u\sqrt{A}] = [n]$ for some number $n$. Let the given equivalence between $[f, g + \sqrt{A}]$ and $[F, G + \sqrt{A}]$ be multiplied by $[F(z + w\sqrt{A})]$ to yield an equation of the form $[Y + X\sqrt{A}][f, g + \sqrt{A}] = [N][F, G + \sqrt{A}]$, where $N = Fn$, that is an input to the reduction algorithm. Application of the reduction algorithm reduces this equation to $[N][f, g + \sqrt{A}] = [N][f, g + \sqrt{A}]$. Since $[F, G + \sqrt{A}]$ is stable, the steps of the algorithm can be retraced by applying the comparison algorithm to $[f, g + \sqrt{A}]$, from which it follows that $Y + X\sqrt{A}$ can be obtained by multiplying $N$ by $r_1 + \sqrt{A}$, dividing by $f$, multiplying by $r_2 + \sqrt{A}$, dividing by $f_1$, and so forth, stopping with the $k$th step, where $[F, G + \sqrt{A}]$ is the $k$th successor of $[f, g + \sqrt{A}]$. In short,
$$Y + X\sqrt{A} = N\,\frac{\prod_{i=1}^{k}(r_i + \sqrt{A})}{\prod_{i=0}^{k-1} f_i}.$$

Since $Y + X\sqrt{A} = F(z + w\sqrt{A})(y + x\sqrt{A})$ and $N = Fn$, the equation $F(z + w\sqrt{A})(y + x\sqrt{A})\prod_{i=0}^{k-1} f_i = Fn\prod_{i=1}^{k}(r_i + \sqrt{A})$ follows. The equation

$$(y + x\sqrt{A})\prod_{i=0}^{k-1} f_i = (v + u\sqrt{A})\prod_{i=1}^{k}(r_i + \sqrt{A})$$

of the theorem follows when one multiplies by $v + u\sqrt{A}$ and divides by $Fn$.

Corollary (Solution of the equivalence problem). Stable modules in different cycles are not equivalent.
Deduction. If two stable modules are equivalent, the theorem implies that each is a successor of the other, so they are in the same cycle.

Solution of A□ + B = □. A solution of $Ax^2 + B = y^2$ is called primitive if $x$ and $y$ are relatively prime. The primitive solutions are found in the following way: For each square root $p$ of $A$ mod $B$, use formula (2) to find all solutions

(3)  $y + x\sqrt{A}$

of the problem $[y + x\sqrt{A}][B, p + \sqrt{A}] = [B]$. Each pair $(x, y)$ found in this way is a primitive solution of $Ax^2 + B = y^2$, and there are no others. The solutions $(x, y)$ that are not primitive are of the form $(ud, vd)$, where $d^2$ is a square factor of $B$ and $(u, v)$ is a primitive solution of $Au^2 + \frac{B}{d^2} = v^2$. Therefore, they can be found by finding all square factors $d^2$ of $B$ that are greater than 1 and, for each of them, using the method just described to find all primitive solutions $(u, v)$ of $Au^2 + \frac{B}{d^2} = v^2$.

Proof. The hypernumbers (3) are the solutions of $[y + x\sqrt{A}][B, p + \sqrt{A}] = [B][1, \sqrt{A}]$ found by the construction of the theorem. Multiplication by $[B, B - p + \sqrt{A}]$ then gives $[y + x\sqrt{A}][B] = [B][B, B - p + \sqrt{A}]$, which implies $[y + x\sqrt{A}] = [B, B - p + \sqrt{A}]$, so $x$ and $y$ are relatively prime and satisfy $y^2 - Ax^2 = B$. Conversely, if $x$ and $y$ satisfy these conditions, then reduction of $[y + x\sqrt{A}]$ to canonical form gives $[B, g + \sqrt{A}]$ for some square root $g$ of $A$ mod $B$, so $[y + x\sqrt{A}][B, B - g + \sqrt{A}] = [B]$ for another square root $B - g$ of $A$ mod $B$, and $y + x\sqrt{A}$ is among the solutions given by (3).

For example, the solution of 79□ + 21 = □ requires finding the square roots of 79 mod 21. (Note that 21 has no square factors, so all solutions are primitive solutions.) These are easily found by finding the square roots $\pm 1$ of 79 mod 3 and the square roots $\pm 3$ of 79 mod 7 and putting them together using the Chinese remainder theorem to find the four square roots 4, 10, 11, and 17 of 79 mod 21. The module $[21, 4 + \sqrt{79}]$ is stable, and its cycle under the comparison algorithm contains 8 stable modules; since $[1]$ is not among them, the square root 4 of 79 mod 21 gives rise to no solutions of 79□ + 21 = □. Similarly, the module $[21, 17 + \sqrt{79}]$ is stable. Its cycle, which contains the conjugates of the modules in the cycle of $[21, 4 + \sqrt{79}]$, as is shown in the table below, also has length 8 and does not contain $[1]$, so this square root does not give rise to any solutions of 79□ + 21 = □ either.

Application of the comparison algorithm to $[21, 10 + \sqrt{79}]$, on the other hand, reaches $[1]$ in two steps, from $[21, 10 + \sqrt{79}]$ to $[2, 1 + \sqrt{79}]$ to $[1]$; the values of $r$ are first 11 and then 9, so $y + x\sqrt{79} = \frac{(11 + \sqrt{79})(9 + \sqrt{79})}{2} = 89 + 10\sqrt{79}$, and the smallest solution of 79□ + 21 = □ in the sequence of solutions corresponding to $[21, 10 + \sqrt{79}]$ is $79 \cdot 10^2 + 21 = 89^2$. The next solution is found by taking
the comparison algorithm two steps further, which multiplies $89 + 10\sqrt{79}$ by $\frac{(9 + \sqrt{79})(9 + \sqrt{79})}{2} = 80 + 9\sqrt{79}$, which of course describes the smallest solution $79 \cdot 9^2 + 1 = 80^2$ of Pell's equation in the case $A = 79$. Since $(89 + 10\sqrt{79})(80 + 9\sqrt{79}) = 14230 + 1601\sqrt{79}$, the next solution of 79□ + 21 = □ in this sequence is $79 \cdot 1601^2 + 21 = 14230^2$. More generally, the $n$th solution in the sequence is contained in the coefficients of $(89 + 10\sqrt{79})(80 + 9\sqrt{79})^{n-1}$. In the same way, the fact that $[21, 11 + \sqrt{79}] \sim [1]$ leads to an infinite sequence of solutions of $79x^2 + 21 = y^2$, namely, the solutions in which $x$ is the coefficient of $\sqrt{79}$ in $(10 + \sqrt{79})(80 + 9\sqrt{79})^{n-1}$. All solutions are contained in these two infinite sequences.

Orbits of Stable Modules for Various Values of A

A = 2. (2 modules, 1 cycle)
$[1] \sim [2, \sqrt{2}]$;

A = 3. (3 modules, 2 cycles)
$[1]$ (Cycle contains just one module.)
$[2, 1+\sqrt{3}] \sim [3, \sqrt{3}]$;

A = 5. (5 modules, 2 cycles)
$[1] \sim [4, 3+\sqrt{5}] \sim [5, \sqrt{5}] \sim [4, 1+\sqrt{5}]$,
$[2, 1+\sqrt{5}]$;

A = 6. (6 modules, 2 cycles)
$[1] \sim [3, \sqrt{6}]$,
$[2, \sqrt{6}] \sim [5, 4+\sqrt{6}] \sim [6, \sqrt{6}] \sim [5, 1+\sqrt{6}]$;

A = 7. (7 modules, 2 cycles)
$[1] \sim [2, 1+\sqrt{7}]$,
$[3, 1+\sqrt{7}] \sim [6, 5+\sqrt{7}] \sim [7, \sqrt{7}] \sim [6, 1+\sqrt{7}] \sim [3, 2+\sqrt{7}]$;

A = 8. (7 modules, 3 cycles)
$[1]$,
$[2, \sqrt{8}] \sim [4, \sqrt{8}]$,
$[7, 1+\sqrt{8}] \sim [4, 2+\sqrt{8}] \sim [7, 6+\sqrt{8}] \sim [8, \sqrt{8}]$;

A = 10. (10 modules, 2 cycles)
$[1] \sim [6, 4+\sqrt{10}] \sim [9, 8+\sqrt{10}] \sim [10, \sqrt{10}] \sim [9, 1+\sqrt{10}] \sim [6, 2+\sqrt{10}]$,
$[2, \sqrt{10}] \sim [3, 1+\sqrt{10}] \sim [5, \sqrt{10}] \sim [3, 2+\sqrt{10}]$;

A = 11. (9 modules, 2 cycles)
$[1] \sim [5, 4+\sqrt{11}] \sim [5, 1+\sqrt{11}]$,
$[2, 1+\sqrt{11}] \sim [7, 5+\sqrt{11}] \sim [10, 9+\sqrt{11}] \sim [11, \sqrt{11}] \sim [10, 1+\sqrt{11}] \sim [7, 2+\sqrt{11}]$;

A = 12. (11 modules, 4 cycles)
$[1] \sim [4, \sqrt{12}]$,
$[2, \sqrt{12}]$,
$[3, \sqrt{12}] \sim [8, 6+\sqrt{12}] \sim [11, 10+\sqrt{12}] \sim [12, \sqrt{12}] \sim [11, 1+\sqrt{12}] \sim [8, 2+\sqrt{12}]$,
$[6, \sqrt{12}] \sim [4, 2+\sqrt{12}]$;

A = 13. (13 modules, 2 cycles)
$[1] \sim [3, 1+\sqrt{13}] \sim [4, 1+\sqrt{13}] \sim [9, 7+\sqrt{13}] \sim [12, 11+\sqrt{13}] \sim [13, \sqrt{13}] \sim [12, 1+\sqrt{13}] \sim [9, 2+\sqrt{13}] \sim [4, 3+\sqrt{13}] \sim [3, 2+\sqrt{13}]$,
$[2, 1+\sqrt{13}] \sim [6, 5+\sqrt{13}] \sim [6, 1+\sqrt{13}]$;

A = 14. (10 modules, 2 cycles)
$[1] \sim [2, \sqrt{14}]$,
$[13, 1+\sqrt{14}] \sim [10, 2+\sqrt{14}] \sim [5, 3+\sqrt{14}] \sim [7, \sqrt{14}] \sim [5, 2+\sqrt{14}] \sim [10, 8+\sqrt{14}] \sim [13, 12+\sqrt{14}] \sim [14, \sqrt{14}]$;

A = 15. (12 modules, 4 cycles)
$[1]$,
$[5, \sqrt{15}] \sim [2, 1+\sqrt{15}]$,
$[3, \sqrt{15}] \sim [7, 6+\sqrt{15}] \sim [7, 1+\sqrt{15}]$,
$[15, \sqrt{15}] \sim [14, 1+\sqrt{15}] \sim [11, 2+\sqrt{15}] \sim [6, 3+\sqrt{15}] \sim [11, 9+\sqrt{15}] \sim [14, 13+\sqrt{15}]$;

A = 17. (13 modules, 2 cycles)
$[1] \sim [8, 5+\sqrt{17}] \sim [13, 11+\sqrt{17}] \sim [16, 15+\sqrt{17}] \sim [17, \sqrt{17}] \sim [16, 1+\sqrt{17}] \sim [13, 2+\sqrt{17}] \sim [8, 3+\sqrt{17}]$,
$[2, 1+\sqrt{17}] \sim [4, 1+\sqrt{17}] \sim [8, 7+\sqrt{17}] \sim [8, 1+\sqrt{17}] \sim [4, 3+\sqrt{17}]$;

A = 18. (12 modules, 2 cycles)
$[1] \sim [7, 5+\sqrt{18}] \sim [9, \sqrt{18}] \sim [7, 2+\sqrt{18}]$,
$[2, \sqrt{18}] \sim [9, 6+\sqrt{18}] \sim [14, 12+\sqrt{18}] \sim [17, 16+\sqrt{18}] \sim [18, \sqrt{18}] \sim [17, 1+\sqrt{18}] \sim [14, 2+\sqrt{18}] \sim [9, 3+\sqrt{18}]$;

A = 19. (17 modules, 2 cycles)
$[1] \sim [6, 5+\sqrt{19}] \sim [5, 2+\sqrt{19}] \sim [9, 8+\sqrt{19}] \sim [9, 1+\sqrt{19}] \sim [5, 3+\sqrt{19}] \sim [6, 1+\sqrt{19}]$,
$[2, 1+\sqrt{19}] \sim [3, 2+\sqrt{19}] \sim [10, 7+\sqrt{19}] \sim [15, 13+\sqrt{19}] \sim [18, 17+\sqrt{19}] \sim [19, \sqrt{19}] \sim [18, 1+\sqrt{19}] \sim [15, 2+\sqrt{19}] \sim [10, 3+\sqrt{19}] \sim [3, 1+\sqrt{19}]$;

A = 20. (14 modules, 3 cycles)
$[1] \sim [5, \sqrt{20}]$,
$[2, \sqrt{20}] \sim [8, 6+\sqrt{20}] \sim [10, \sqrt{20}] \sim [8, 2+\sqrt{20}]$,
$[20, \sqrt{20}] \sim [19, 1+\sqrt{20}] \sim [16, 2+\sqrt{20}] \sim [11, 3+\sqrt{20}] \sim [4, \sqrt{20}] \sim [11, 8+\sqrt{20}] \sim [16, 14+\sqrt{20}] \sim [19, 18+\sqrt{20}]$;

A = 21. (18 modules, 4 cycles)
$[1] \sim [4, 1+\sqrt{21}] \sim [7, \sqrt{21}] \sim [4, 3+\sqrt{21}]$,
$[3, \sqrt{21}] \sim [5, 1+\sqrt{21}] \sim [12, 9+\sqrt{21}] \sim [17, 15+\sqrt{21}] \sim [20, 19+\sqrt{21}] \sim [21, \sqrt{21}] \sim [20, 1+\sqrt{21}] \sim [17, 2+\sqrt{21}] \sim [12, 3+\sqrt{21}] \sim [5, 4+\sqrt{21}]$,
$[2, 1+\sqrt{21}]$,
$[6, 3+\sqrt{21}] \sim [10, 9+\sqrt{21}] \sim [10, 1+\sqrt{21}]$.
Finally, an example that is frequently cited by Gauss.*

A = 79. (51 modules, 6 cycles)
$[1] \sim [2, 1+\sqrt{79}]$,
$[3, 1+\sqrt{79}] \sim [14, 11+\sqrt{79}] \sim [15, 2+\sqrt{79}] \sim [6, 1+\sqrt{79}] \sim [7, 4+\sqrt{79}]$,
$[9, 4+\sqrt{79}] \sim [13, 1+\sqrt{79}] \sim [5, 2+\sqrt{79}] \sim [18, 13+\sqrt{79}] \sim [25, 23+\sqrt{79}] \sim [26, 1+\sqrt{79}] \sim [21, 4+\sqrt{79}] \sim [10, 7+\sqrt{79}]$,
$[27, 22+\sqrt{79}] \sim [35, 32+\sqrt{79}] \sim [39, 38+\sqrt{79}] \sim [39, 1+\sqrt{79}] \sim [35, 3+\sqrt{79}] \sim [27, 5+\sqrt{79}] \sim [15, 7+\sqrt{79}] \sim [30, 23+\sqrt{79}] \sim [43, 37+\sqrt{79}] \sim [54, 49+\sqrt{79}] \sim [63, 59+\sqrt{79}] \sim [70, 67+\sqrt{79}] \sim [75, 73+\sqrt{79}] \sim [78, 77+\sqrt{79}] \sim [79, \sqrt{79}] \sim [78, 1+\sqrt{79}] \sim [75, 2+\sqrt{79}] \sim [70, 3+\sqrt{79}] \sim [63, 4+\sqrt{79}] \sim [54, 5+\sqrt{79}] \sim [43, 6+\sqrt{79}] \sim [30, 7+\sqrt{79}] \sim [15, 8+\sqrt{79}]$,
$[9, 5+\sqrt{79}] \sim [10, 3+\sqrt{79}] \sim [21, 17+\sqrt{79}] \sim [26, 25+\sqrt{79}] \sim [25, 2+\sqrt{79}] \sim [18, 5+\sqrt{79}] \sim [5, 3+\sqrt{79}] \sim [13, 12+\sqrt{79}]$,
$[3, 2+\sqrt{79}] \sim [7, 3+\sqrt{79}] \sim [6, 5+\sqrt{79}] \sim [15, 13+\sqrt{79}] \sim [14, 3+\sqrt{79}]$.

* Disquisitiones Arithmeticae, §§185, 186, 187, 195, 196, 198, 223. The reason 79 is of interest is that it is the smallest value of $A$ for which the class group contains a square that is not the identity; for example, $[3, 1+\sqrt{79}]^2 \sim [9, 4+\sqrt{79}] \not\sim [1]$. Perhaps Gauss's attention was drawn to this case by the fact that it occurs in a counterexample Lagrange gave to a conjecture of Euler [48, Article 84]. Lagrange notes that the problem 79 · □ + 733 = □ has a solution (the comparison algorithm gives $[733, 476+\sqrt{79}] \sim [90, 77+\sqrt{79}] \sim [1]$) but the problem 79 · □ + 101 = □ does not ($[101, 33+\sqrt{79}] \sim [45, 23+\sqrt{79}] \sim [9, 4+\sqrt{79}] \not\sim [1]$), contrary to a conjecture of Euler that would have implied that the answer to "Does A□ + B = □ have a solution?" might, for prime $B$, depend only on the class of $B$ mod $4A$.
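The cycles tabulated above, the Pell solutions, and the solutions of 79□ + 21 = □ can all be recomputed from the rule for the comparison algorithm as it is used in these essays: the successor of $[f, g + \sqrt{A}]$ is $[f_1, g_1 + \sqrt{A}]$, where $r$ is the least solution of $r + g \equiv 0 \bmod f$ with $r^2 > A$, $ff_1 = r^2 - A$, and $g_1 \equiv r \bmod f_1$. The Python sketch below is an illustration only; the names `successor`, `cycle`, `pell`, and `first_solution` are hypothetical, $A$ is assumed not to be a square, and `cycle` assumes that its starting module is stable (otherwise it would not terminate).

```python
def successor(A, f, g):
    """Return (r, f1, g1): the number r and the successor [f1, g1 + sqrt(A)]."""
    r = (-g) % f
    while r * r <= A:          # least r = -g mod f whose square exceeds A
        r += f
    f1 = (r * r - A) // f      # exact, because r^2 = g^2 = A mod f
    return r, f1, r % f1


def cycle(A, f, g):
    """Modules and r's met from [f, g + sqrt(A)] until it recurs (assumed stable)."""
    start, modules, rs = (f, g), [], []
    while True:
        modules.append((f, g))
        r, f, g = successor(A, f, g)
        rs.append(r)
        if (f, g) == start:
            return modules, rs


def pell(A):
    """Smallest nontrivial (y, x) with A*x^2 + 1 = y^2, following the cycle of [1]."""
    y, x, f, g = 1, 0, 1, 0
    while True:
        r, f1, g1 = successor(A, f, g)
        # multiply y + x*sqrt(A) by r + sqrt(A), then divide by the current f
        y, x = (y * r + A * x) // f, (y + x * r) // f
        f, g = f1, g1
        if f == 1:
            return y, x


def first_solution(A, B, p):
    """First (y, x) with A*x^2 + B = y^2 attached to the square root p of A mod B,
    or None if the successors of [B, p + sqrt(A)] never reach [1]."""
    y, x = B, 0                # start from the hypernumber B
    f, g = B, p % B
    seen = set()
    while f != 1:
        if (f, g) in seen:     # a module recurred, so [1] is not a successor
            return None
        seen.add((f, g))
        r, f1, g1 = successor(A, f, g)
        y, x = (y * r + A * x) // f, (y + x * r) // f
        f, g = f1, g1
    return y, x


if __name__ == "__main__":
    print(cycle(13, 1, 0)[1])                            # the r's for A = 13
    print(pell(13), pell(79))
    print([first_solution(79, 21, p) for p in (4, 10, 11, 17)])
```

Further solutions in each sequence are obtained, as in the text, by multiplying the first one by powers of the hypernumber returned by `pell`.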
Essay 3.4 Multiplication of Modules and Module Classes
In this essay, the semigroup of modules and the class semigroup are examined more closely. In the semigroup of modules, modules can be decomposed as products of their "$p$-parts" where $p$ ranges over the primes, and in this way the semigroup can be described quite fully, except for the crucial problem of determining the primes $p$ mod which $A$ is a square, which are the primes for which $[p]$ is a product in which neither factor is $[1]$. This question is the subject of Essay 3.5. The structure of the class semigroup depends on more subtle considerations, and general statements are harder to come by. The theorem that is proved in this essay and that is used in the next to prove the law of quadratic reciprocity merely describes the subgroups of index 2 of the class group in a few simple cases (namely, the cases in which $A$ is an odd prime or twice an odd prime or a product of two primes that are congruent mod 4).

The computation of products of modules comes down to the computation of products $[f, g + \sqrt{A}][F, G + \sqrt{A}]$ (where $g^2 \equiv A \bmod f$ and $G^2 \equiv A \bmod F$), because multiplication of a module by $[e]$ is easy. The following theorem determines these products when $f$ and $F$ are relatively prime:

Theorem 1. If $f$ and $F$ are relatively prime, $[f, g + \sqrt{A}][F, G + \sqrt{A}] = [fF, z + \sqrt{A}]$, where $z$ is determined by $z \equiv g \bmod f$ and $z \equiv G \bmod F$.

Proof. By the Chinese remainder theorem,* the congruences $z \equiv g \bmod f$ and $z \equiv G \bmod F$ determine a unique $z$ mod $fF$, so the formula $[fF, z + \sqrt{A}]$ determines a module. The desired product is $[fF, f(G + \sqrt{A}), F(g + \sqrt{A}), (g + \sqrt{A})(G + \sqrt{A})]$, which is $[fF, f(z + \sqrt{A}), F(z + \sqrt{A}), gG + A + (g + G)\sqrt{A}]$, because $fG \equiv fz \bmod fF$ and $Fg \equiv Fz \bmod fF$. If $\beta f = \alpha F + 1$, then $z + \sqrt{A}$ is the difference of $\beta f(z + \sqrt{A})$ and $\alpha F(z + \sqrt{A})$, so the desired product is $[fF, z + \sqrt{A}, gG + A + (g + G)\sqrt{A}]$.
Since $A \equiv z^2 \bmod f$ and $A \equiv z^2 \bmod F$ and since $f$ and $F$ are relatively prime, $A \equiv z^2 \bmod fF$; moreover, $(z - g)(z - G) \equiv 0 \bmod fF$, so $A + gG \equiv z(g + G) \bmod fF$, which implies that $gG + A$ in the last term can be replaced by $z(g + G)$, and the desired product becomes $[fF, z + \sqrt{A}, (g + G)(z + \sqrt{A})] = [fF, z + \sqrt{A}]$, as was to be shown.

Given a prime $p$ and a module $[f, g + \sqrt{A}]$, let the $p$-part of $[f, g + \sqrt{A}]$ be the module $[p^n, g + \sqrt{A}]$, where $n$ is the number of times $p$ divides $f$. (If $p$ does not divide $f$, the $p$-part of $[f, g + \sqrt{A}]$ is $[1]$ by this definition.) By Theorem 1, every module is the product of its $p$-parts, and one can find the product of $[f, g + \sqrt{A}]$ and $[F, G + \sqrt{A}]$ by finding the product of their $p$-parts for each prime divisor $p$ of $fF$, which reduces to finding the product of the $p$-parts for the primes $p$ that divide both $f$ and $F$. In short, the computation of $[f, g + \sqrt{A}][F, G + \sqrt{A}]$ can be done prime by prime. For all but a very few primes, the needed products are given by the three propositions that follow.

* See Essay 3.2.
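As a small illustration of Theorem 1, the number $z$ can be computed with the Chinese remainder theorem. The Python sketch below is not from the text; the function name `multiply_coprime` is a hypothetical choice, and the modular inverse `pow(f, -1, F)` requires Python 3.8 or later. It reproduces the four square roots 4, 10, 11, 17 of 79 mod 21 from the square roots of 79 mod 3 and mod 7.

```python
from math import gcd


def multiply_coprime(f, g, F, G):
    """Return (fF, z) with [f, g + sqrt(A)][F, G + sqrt(A)] = [fF, z + sqrt(A)].

    Assumes gcd(f, F) = 1; z is the solution of z = g mod f, z = G mod F.
    """
    assert gcd(f, F) == 1, "Theorem 1 needs relatively prime f and F"
    inv = pow(f, -1, F) if F > 1 else 0   # inverse of f mod F
    k = ((G - g) * inv) % F               # choose k so that g + f*k = G mod F
    return f * F, (g + f * k) % (f * F)


if __name__ == "__main__":
    A = 79
    for g in (1, 2):          # square roots of 79 mod 3
        for G in (3, 4):      # square roots of 79 mod 7
            fF, z = multiply_coprime(3, g, 7, G)
            assert (z * z - A) % fF == 0
            print(f"[3, {g} + sqrt(79)][7, {G} + sqrt(79)] = [{fF}, {z} + sqrt(79)]")
```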
Proposition 1. Let $p$ be an odd prime that does not divide $A$. If $A$ is not a square mod $p$, there are no modules $[f, g + \sqrt{A}]$ in canonical form other than $[1]$ in which $f$ is a power of $p$. If $A$ is a square mod $p$, there are exactly two modules $[p^n, g + \sqrt{A}]$ in canonical form for each $n > 0$, and they are the $n$th powers of those in which $n = 1$. The product of the two in which $n = 1$ is $[p]$. Thus, a product of the form $[p^\mu, g + \sqrt{A}][p^\nu, G + \sqrt{A}]$ can be found by writing it as $[p, g + \sqrt{A}]^\mu [p, G + \sqrt{A}]^\nu$ and observing that the theorem implies that their product is $[p, g + \sqrt{A}]^{\mu+\nu}$ if $[p, g + \sqrt{A}] = [p, G + \sqrt{A}]$, and is $[p]^\mu [p, G + \sqrt{A}]^{\nu-\mu}$ or $[p]^\nu [p, g + \sqrt{A}]^{\mu-\nu}$ in the obvious way if $[p, g + \sqrt{A}] \ne [p, G + \sqrt{A}]$.

Proof. Because a polynomial of degree $n$ with integer coefficients that is not zero mod $p$ has at most $n$ roots mod $p$, $A$ has at most two square roots mod $p$. Because $-g$ is a square root of $A$ mod $p$ whenever $g$ is, and because $g$ and $-g$ are different mod $p$ when this is the case (if they were the same, then $2g$ would be zero mod $p$ so $4A \equiv (2g)^2 \equiv 0 \bmod p$, contrary to hypothesis), $A$ has either no square roots mod $p$ or exactly two. When it has two, and when $g < p$ is one of them, the product $[p, g + \sqrt{A}][p, p - g + \sqrt{A}]$ is $[p^2, p(p - g + \sqrt{A}), p(g + \sqrt{A}), pg - g^2 + A + p\sqrt{A}]$. The third term minus the second is $2pg$ mod $p^2$ and $[p^2, 2pg] = [p]$ (again, because the square of $2g$ is not divisible by $p$ and $p$ is prime), so the first term $p^2$ can be replaced by $p$, and the module is equal to $[p]$.

If, for some $n > 1$, $g_{n-1}$ is a square root of $A$ mod $p^{n-1}$, then there is a unique square root of $A$ mod $p^n$ that is congruent to $g_{n-1}$ mod $p^{n-1}$, because the formula $(g_{n-1} + \beta p^{n-1})^2 \equiv A \bmod p^n$ implies $2\beta g_{n-1} \equiv (A - g_{n-1}^2)/p^{n-1} \bmod p$, which determines $\beta$ mod $p$ because $2g_{n-1}$ is relatively prime to $p$. Therefore, for each value of $n$ and for each square root $g$ of $A$ mod $p$ there is exactly one square root of $A$ mod $p^n$ that is $g$ mod $p$; call it $g_n$.
If $[p, g + \sqrt{A}]^{n-1} = [p^{n-1}, g_{n-1} + \sqrt{A}]$, then the same formula holds with $n$ in place of $n - 1$, because $[p, g + \sqrt{A}]^n = [p^{n-1}, g_{n-1} + \sqrt{A}][p, g + \sqrt{A}] = [p^{n-1}, g_n + \sqrt{A}][p, g_n + \sqrt{A}] = [p^n, p(g_n + \sqrt{A}), g_n^2 + A + 2g_n\sqrt{A}] = [p^n, p(g_n + \sqrt{A}), 2g_n(g_n + \sqrt{A})] = [p^n, g_n + \sqrt{A}]$ (because $[p, 2g_n] = [1]$). Thus $[p, g + \sqrt{A}]^n$ is the unique module $[p^n, G + \sqrt{A}]$ in canonical form in which $G \equiv g \bmod p$, as was to be shown.

Proposition 2. If $p$ is a prime that divides $A$ once but not twice, the only module $[p^n, g + \sqrt{A}]$ in canonical form in which $n > 0$ is $[p, \sqrt{A}]$; the square of $[p, \sqrt{A}]$ is $[p]$.

Proof. If $g$ were a square root of $A$ mod $p^n$ for $n > 1$, then $p$ would divide $g^2$ and therefore would divide $g$ itself, so $p^2$ would divide $g^2 \equiv A \bmod p^2$, contrary to the assumption that $A \not\equiv 0 \bmod p^2$. Therefore, the only module $[p^n, g + \sqrt{A}]$ in canonical form in which $n > 0$ is $[p, \sqrt{A}]$, because $g^2 \equiv A \equiv 0 \bmod p$ implies $g \equiv 0 \bmod p$. The square of this module is $[p^2, p\sqrt{A}, A] = [p]$, because $[p^2, A] = [p]$ by assumption.
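The lifting step in the proof of Proposition 1 (passing from a square root $g_{n-1}$ of $A$ mod $p^{n-1}$ to the unique square root $g_n$ of $A$ mod $p^n$ congruent to it) is easy to carry out by machine. The following Python sketch is illustrative only; the name `lift_square_root` is hypothetical, $p$ is assumed to be an odd prime not dividing $A$, and `pow(..., -1, p)` requires Python 3.8 or later.

```python
def lift_square_root(A, p, g, n):
    """Return g_n with g_n^2 = A mod p^n and g_n = g mod p.

    Assumes p is an odd prime not dividing A and g^2 = A mod p.
    """
    assert (g * g - A) % p == 0, "g must be a square root of A mod p"
    gk, pk = g % p, p                      # gk is a square root of A mod pk
    for _ in range(1, n):
        # (gk + beta*pk)^2 = A mod p*pk  forces  2*beta*gk = (A - gk^2)/pk mod p
        beta = ((A - gk * gk) // pk) * pow(2 * gk, -1, p) % p
        gk, pk = gk + beta * pk, pk * p
    return gk


if __name__ == "__main__":
    # [3, 1 + sqrt(79)]^2 = [9, g_2 + sqrt(79)] by Proposition 1
    print(lift_square_root(79, 3, 1, 2))
    # the two square roots of 7 mod 27 lying over 1 and 2 mod 3
    print(lift_square_root(7, 3, 1, 3), lift_square_root(7, 3, 2, 3))
```

For $A = 79$, $p = 3$, $g = 1$ the lifted root mod 9 is 4, in agreement with the relation $[3, 1 + \sqrt{79}]^2 \sim [9, 4 + \sqrt{79}]$ quoted in the footnote on Gauss's example.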
Proposition 3. If $A$ is odd, the modules in canonical form $[2^n, g + \sqrt{A}]$ with $n > 0$ are as follows:
(i) When $A \equiv 3 \bmod 4$, $[2, 1 + \sqrt{A}]$ is the only one; its square is $[2]$.
(ii) When $A \equiv 5 \bmod 8$, there are three: $\alpha = [2, 1 + \sqrt{A}]$, $\beta = [4, 1 + \sqrt{A}]$, and $\overline{\beta} = [4, 3 + \sqrt{A}]$. The product of $\alpha$ with any of the three is $[2]\alpha$, whereas $\beta^2 = [2]\overline{\beta}$, $\overline{\beta}^2 = [2]\beta$ and $\beta\overline{\beta} = [4]$.
(iii) When $A \equiv 1 \bmod 16$, let $\alpha = [2, 1 + \sqrt{A}]$, $\beta = [8, 5 + \sqrt{A}]$, and $\overline{\beta} = [8, 3 + \sqrt{A}]$. The modules in question are $\alpha$ and four infinite sequences of modules $\beta^n/[2^{n-1}]$, $\overline{\beta}^n/[2^{n-1}]$, $\alpha\beta^n/[2^n]$, and $\alpha\overline{\beta}^n/[2^n]$ for $n > 0$, these modules being distinct from one another. They can be multiplied using $\alpha^2 = [2]\alpha$ and $\beta\overline{\beta} = [8]$.
(iv) When $A \equiv 9 \bmod 16$, the answer is given by (iii) when $\beta$ is changed to $[8, 1 + \sqrt{A}]$ and $\overline{\beta}$ to $[8, 7 + \sqrt{A}]$.

Proof. (i) The square of any odd number is 1 mod 4, so $A \equiv 3 \bmod 4$ implies that no module $[2^n, g + \sqrt{A}]$ is in canonical form when $n > 1$. The square of $[2, 1 + \sqrt{A}]$ is $[4, 2 + 2\sqrt{A}, A + 1 + 2\sqrt{A}] = [4, 2 + 2\sqrt{A}, A - 1]$, which is $[2]$ because $[4, A - 1] = [2]$.

(ii) Similarly, the square of any odd number is 1 mod 8, so $A \equiv 5 \bmod 8$ implies that there are no modules $[2^n, g + \sqrt{A}]$ in canonical form in which $n > 2$. The multiplication formulas that are given are easy to verify. For example, $[4, 1 + \sqrt{A}]^2 = [16, 4(1 + \sqrt{A}), 1 + A + 2\sqrt{A}]$; twice the third term minus the second is $2 + 2A - 4 = 2(A - 1) \equiv 8 \bmod 16$, so the 16 in the first term can be replaced by 8, after which the second term can be dropped because it can be expressed in terms of the first and third, resulting in $[8, 1 + A + 2\sqrt{A}] = [2][4, 3 + \sqrt{A}]$.

(iii) and (iv) Let $A$ be 1 mod 8. For each $n \ge 1$, there is a unique solution $g_n < 2^{n+2}$ of $g_n^2 \equiv A + 2^{n+2} \bmod 2^{n+3}$ for which $g_n \equiv 1 \bmod 4$. In fact, $g_1 = 5$ is determined by these conditions when $A \equiv 1 \bmod 16$, and $g_1 = 1$ is determined by them when $A \equiv 9 \bmod 16$. For $n > 1$, knowledge of $g_{n-1}$ enables one to find $g_n$ in the following way. The congruence $h^2 \equiv A + 2^{n+2} \bmod 2^{n+3}$ of which $g_n$ is a root implies, because $g_{n-1}^2 \equiv A + 2^{n+1} \bmod 2^{n+2}$, that $h^2 - g_{n-1}^2 \equiv 2^{n+1} \bmod 2^{n+2}$. If $h \equiv 1 \equiv g_{n-1} \bmod 4$, then $h + g_{n-1}$ is divisible by 2 but not 4, so the congruence $(h + g_{n-1})(h - g_{n-1}) \equiv 2^{n+1} \bmod 2^{n+2}$ implies that $h - g_{n-1}$ is divisible by $2^n$ but not $2^{n+1}$. Thus $h \equiv g_{n-1} + 2^n \bmod 2^{n+1}$. In short, if the conditions on $g_n$ can be met, then $g_n \equiv g_{n-1} + 2^n \bmod 2^{n+1}$. The two possible values $g_{n-1} \pm 2^n$ of $g_n$ mod $2^{n+2}$ determined in this way have squares that differ by $2^{n+2}$ mod $2^{n+3}$, so the condition $g_n^2 \equiv A + 2^{n+2} \bmod 2^{n+3}$ is satisfied by exactly one of them.

The sequence $g_1, g_2, g_3, \ldots$ determined in this way describes the modules $\beta^n/[2^{n-1}] = [2^{n+2}, g_n + \sqrt{A}]$, as can be seen as follows: By the very definition of $g_1$, $\beta = [8, g_1 + \sqrt{A}]$ in both (iii) and (iv). What is to be shown, then, is that $[8, g_1 + \sqrt{A}][2^{n+1}, g_{n-1} + \sqrt{A}] = [2][2^{n+2}, g_n + \sqrt{A}]$. The product on the left side is $[2^{n+4}, 8(g_{n-1} + \sqrt{A}), 2^{n+1}(g_1 + \sqrt{A}), g_1g_{n-1} + A + (g_1 + g_{n-1})\sqrt{A}]$. Because $g_{n-1} \equiv g_n + 2^n \bmod 2^{n+1}$, one has $8g_{n-1} \equiv 8g_n + 2^{n+3} \bmod 2^{n+4}$,
so the second term can be changed to $8(g_n + \sqrt{A}) + 2^{n+3}$. Similarly, the third term can be changed to $2^{n+1}(g_n + \sqrt{A}) + 2^{n+3}$. (Use $g_1 \equiv g_2 + 4 \bmod 8$ and $g_2 \equiv g_n \bmod 8$.) Finally, the congruences $g_n \equiv g_1 + 4 \bmod 8$ and $g_n \equiv g_{n-1} + 2^n \bmod 2^{n+1}$ imply $(g_n - g_1)(g_n - g_{n-1}) \equiv 2^{n+2} \bmod 2^{n+3}$, which combines with $A \equiv g_n^2 + 2^{n+2} \bmod 2^{n+3}$ to give $A + g_1g_{n-1} \equiv g_n(g_{n-1} + g_1) \bmod 2^{n+3}$, and the product $[8, g_1 + \sqrt{A}][2^{n+1}, g_{n-1} + \sqrt{A}]$ can be written as $[2^{n+4}, 8(g_n + \sqrt{A}) + 2^{n+3}, 2^{n+1}(g_n + \sqrt{A}) + 2^{n+3}, (g_1 + g_{n-1})(g_n + \sqrt{A}) + 2^{n+3}a]$, where $a = 0$ or 1. The difference between $\frac{g_1 + g_{n-1}}{2}$ times the second term and 4 times the last term is $2^{n+3}$ mod $2^{n+4}$, because $\frac{g_1 + g_{n-1}}{2}$ is odd. Thus the first term can be changed to $2^{n+3}$, and the desired product can be expressed as $[2^{n+3}, 8(g_n + \sqrt{A}), 2^{n+1}(g_n + \sqrt{A}), (g_1 + g_{n-1})(g_n + \sqrt{A})]$. The third term is a multiple of the second and can be dropped. Because $[8, g_1 + g_{n-1}] = [2]$, this brings $[8, g_1 + \sqrt{A}][2^{n+1}, g_{n-1} + \sqrt{A}]$ to the desired form $[2 \cdot 2^{n+2}, 2(g_n + \sqrt{A})]$.

The sequence $\overline{g}_1, \overline{g}_2, \overline{g}_3, \ldots$ defined by $\overline{g}_n \equiv -g_n \bmod 2^{n+2}$ satisfies the same conditions as the sequence $g_1, g_2, g_3, \ldots$, except that the condition $g_n \equiv 1 \bmod 4$ is replaced by $\overline{g}_n \equiv 3 \bmod 4$. Therefore, the same argument gives $\overline{\beta}^n/[2^{n-1}] = [2^{n+2}, -g_n + \sqrt{A}] = [2^{n+2}, \overline{g}_n + \sqrt{A}]$.

For $n \ge 1$ there are four modules of the form $[2^{n+2}, g + \sqrt{A}]$, one for each of the four square roots of $A$ mod $2^{n+2}$, which are $\pm g_n$ and $\pm g_n + 2^{n+1}$. The first two have been accounted for. The remaining two are $\alpha\beta^{n+1}/[2^{n+1}]$ and $\alpha\overline{\beta}^{n+1}/[2^{n+1}]$, as follows from the observation that the last term in $[2, 1 + \sqrt{A}][2^{n+3}, \pm g_{n+1} + \sqrt{A}] = [2^{n+4}, 2(\pm g_{n+1} + \sqrt{A}), 2^{n+3}(1 + \sqrt{A}), \pm g_{n+1} + A + (1 \pm g_{n+1})\sqrt{A}]$ can be changed to $(1 \pm g_{n+1})(\pm g_{n+1}) + 2^{n+3} + (1 \pm g_{n+1})\sqrt{A}$, which is $2^{n+3}$ plus a multiple of the second term; therefore, the first term can be replaced with $2^{n+3}$, so that the module becomes $[2^{n+3}, 2(\pm g_{n+1} + \sqrt{A})] = [2][2^{n+2}, \pm g_n + 2^{n+1} + \sqrt{A}]$, as was to be shown. When $n = 0$ the result is $[2][4, \pm 1 + \sqrt{A}]$, which accounts for the two modules $\alpha\beta/[2] = [4, 1 + \sqrt{A}]$ and $\alpha\overline{\beta}/[2] = [4, 3 + \sqrt{A}]$ and completes the proof.

These three propositions cover all products $[f, g + \sqrt{A}][F, G + \sqrt{A}]$ except those in which some prime $p$ divides both $f$ and $F$ and divides $A$ twice. In particular, if $A$ is square-free, it describes all products.

The description of products of equivalence classes of modules is in principle much easier, because the number of equivalence classes is finite, so one can simply compile a multiplication table showing all products of all pairs of classes. However, multiplication tables are rarely very enlightening. More insight into the class semigroup is obtained by considering specific features. In particular, as Gauss's work in Section 5 of Disquisitiones Arithmeticae showed, enough information about the class semigroup on which to base a proof of quadratic reciprocity is provided by analyzing the classes that are ambiguous in the sense defined below.

Lemma. If $[f, g + \sqrt{A}]$ is stable and if $[f_1, g_1 + \sqrt{A}]$ is its successor in the comparison algorithm, then $[f, f - g + \sqrt{A}]$ is the successor of $[f_1, f_1 - g_1 + \sqrt{A}]$ in the comparison algorithm.
Proof. By definition, $[f_1, g_1 + \sqrt{A}]$ is determined by $ff_1 = r^2 - A$ and $g_1 \equiv r \bmod f_1$ where $r$ is the least solution of $r + g \equiv 0 \bmod f$ whose square is greater than $A$. Similarly, the successor of $[f_1, f_1 - g_1 + \sqrt{A}]$ is $[f', g' + \sqrt{A}]$ where $f'f_1 = r_1^2 - A$, $g' \equiv r_1 \bmod f'$ and $r_1$ is the least solution of $r_1 \equiv g_1 \bmod f_1$ whose square is greater than $A$. It was shown in Essay 3.3, in the proof that the comparison algorithm permutes stable modules, that $r$ is the least number in its class mod $f_1$ whose square is greater than $A$. Thus, since $r_1$ and $r$ are both $g_1$ mod $f_1$, $r_1 = r$, which implies $f' = f$ and $g' \equiv r_1 \equiv -g \bmod f$, as was to be shown.

Thus, if $[f, g + \sqrt{A}]$ is stable, the cycle of $[f, f - g + \sqrt{A}]$ contains the modules obtained by changing $g$ to $f - g$ in the modules in the cycle of $[f, g + \sqrt{A}]$, but they are traversed in the opposite direction. In particular, if $[f, f - g + \sqrt{A}] \sim [f, g + \sqrt{A}]$ holds for one module in a cycle it holds for all. When this is the case, the cycle is called ambiguous.

Let $M_1, M_2, \ldots, M_\nu$ be the modules in an ambiguous cycle in the order given them by the comparison algorithm. Let the definition of $M_i$ be extended to all integers $i$ by setting $M_i = M_j$ whenever $i \equiv j \bmod \nu$, and, for $M_i = [f_i, g_i + \sqrt{A}]$, let $\overline{M}_i$ denote $[f_i, f_i - g_i + \sqrt{A}]$. Call a stable module $M_i$ pivotal if (1) $\overline{M}_i = M_i$ or (2) $\overline{M}_i = M_{i-1}$. An ambiguous cycle contains exactly two pivotal modules, unless it contains just one module, as can be seen as follows: Let the given cycle be ambiguous and let $\mu \ge 0$ satisfy $\overline{M}_0 = M_\mu$. By the lemma, $\overline{M}_1 = M_{\mu-1}$, $\overline{M}_2 = M_{\mu-2}$, and so forth. If $\mu > 1$, renumber the modules in the cycle by setting $M'_i = M_{i+1}$. Then $\overline{M'_i} = \overline{M}_{i+1} = M_{\mu-i-1} = M'_{\mu-i-2}$, so the renumbering of the modules has the effect of reducing $\mu$ by 2. In this way, $\mu$ can be successively reduced until it is 0 or 1.
If $\mu = 0$, then $\bar{M}_i = M_{\mu - i} = M_{-i}$ for each $i$, which is to say $\bar{M}_i = M_{\nu - i}$, so $M_i$ is pivotal of type (1) if and only if $i \equiv -i \bmod \nu$ and pivotal of type (2) if and only if $i \equiv 1 - i \bmod \nu$. Thus, when $\nu = 2\sigma$ for $\sigma > 0$ the only pivotal modules are $M_0$ and $M_\sigma$ (because 0 and $\sigma$ are the only numbers $i$ less than $\nu$ for which $2i \equiv 0 \bmod 2\sigma$, and $2i \equiv 1 \bmod 2\sigma$ has no solutions at all), and when $\nu = 2\tau + 1$ the only pivotal modules are $M_0$ and $M_{\tau+1}$. Similarly, if $\mu = 1$, then $\bar{M}_i = M_{\nu+1-i}$, so $M_i$ is pivotal of type (1) if and only if $i \equiv 1 - i \bmod \nu$ and pivotal of type (2) if and only if $i \equiv 2 - i \bmod \nu$, from which it follows that there are two pivotal modules when $\nu = 2\sigma$ for $\sigma > 0$, namely, $M_1$ and $M_{\sigma+1}$, and two pivotal modules when $\nu = 2\tau + 1$, namely, $M_1$ and $M_{\tau+1}$. That there are two pivotal modules unless $\nu = 1$ follows from the observation that the modules given by these formulas are distinct unless $\nu = 1$.

Theorem 2. (a) If $A$ is an odd prime, there are four pivotal modules, $[1, \sqrt{A}]$, $[2, 1 + \sqrt{A}]$, $[A, \sqrt{A}]$, and $[\frac{A-1}{2}, 1 + \sqrt{A}]$ (except that in the cases $A = 3$ and $A = 5$ the last of these coincides with one of the first two to form a cycle of length 1).

(b) If $A$ is twice an odd prime, say $A = 2p$, there are four pivotal modules, $[1, \sqrt{A}]$, $[2, \sqrt{A}]$, $[p, \sqrt{A}]$, and $[A, \sqrt{A}]$.
3 Some Quadratic Problems
(c) If $A$ is a product of two odd primes, say $A = pq$, where $p < q$, there are eight pivotal modules, $[1, \sqrt{A}]$, $[p, \sqrt{A}]$, $[q, \sqrt{A}]$, $[A, \sqrt{A}]$, $[2, 1 + \sqrt{A}]$, $[2p, p + \sqrt{A}]$, $[\frac{A-1}{2}, 1 + \sqrt{A}]$, and $[\frac{q-p}{2}, p + \sqrt{A}]$ (except that the last of these coincides with one of the others to form a cycle of length 1 if $q - p = 2$ or 4).

Thus, in cases (a) and (b) there are two ambiguous cycles, and in case (c) there are four.

Proof. (a) A module $[f, g + \sqrt{A}]$ in canonical form is pivotal of type (1) if and only if $g \equiv -g \bmod f$. When this is the case, $2g \equiv 0 \bmod f$ and $4A \equiv (2g)^2 \equiv 0 \bmod f$; that is, $f$ is a factor of $4A$. Since $A$ is an odd prime, $f = 1$, 2, 4, or $A$, because $f \leq A$ in a stable module. Since $f = 4$ would imply $2g \equiv 0 \bmod 4$ and $g \equiv 0 \bmod 2$, it is impossible in view of $g^2 \equiv A \bmod f$. Therefore, the only possible pivotal modules of type (1) in this case are $[1, \sqrt{A}]$, $[2, 1 + \sqrt{A}]$, and $[A, \sqrt{A}]$, all of which are clearly pivotal of type (1). If $[f, g + \sqrt{A}]$ is pivotal of type (2), then its predecessor in the comparison algorithm is $[f, f - g + \sqrt{A}]$, and the $r$ that determines the step from $[f, f - g + \sqrt{A}]$ to $[f, g + \sqrt{A}]$ satisfies $f^2 = r^2 - A$. Thus, $A = (r - f)(r + f)$, which implies that $r - f = 1$ and $r + f = A$ and therefore that $f = \frac{A-1}{2}$, $r = \frac{A+1}{2}$. Since the new $g$ is congruent to $r$ mod $f$, and since $r \equiv 1 \bmod f$, the only possible pivotal module of type (2) is $[\frac{A-1}{2}, 1 + \sqrt{A}]$, which is in fact pivotal of type (2).

(b) As in the proof of (a), if $[f, g + \sqrt{A}]$ is pivotal of type (1), then $4A \equiv 0 \bmod f$ and $f \leq A$, so $f = 1$, 2, 4, $p$, or $2p$. Again, $f = 4$ is impossible because it would imply $A \equiv 0 \bmod 4$, so the only pivotal modules of type (1) are the ones listed in (b). There are no pivotal modules of type (2) because $A \equiv 2 \bmod 4$, so $r^2 - f^2 = A$ is impossible.

(c) If $f$ is a factor of $4A$ less than or equal to $A$, then $f = 1$, 2, 4, $p$, $q$, $2p$, $2q$, or $A = pq$.
Since $[2q, q + \sqrt{A}]$ is not stable (both $q^2$ and $(2q - q)^2$ are $q^2 > pq = A$), the only possible pivotal modules of type (1) are $[1]$, $[2, 1 + \sqrt{A}]$, $[p, \sqrt{A}]$, $[q, \sqrt{A}]$, $[2p, p + \sqrt{A}]$, and $[A, \sqrt{A}]$, all of which are indeed pivotal of type (1). Those that are pivotal of type (2) satisfy $A = (r - f)(r + f)$ as in the proof of (a), where $r$ is the number used by the comparison algorithm to go to the pivotal module from its predecessor. Thus, either $r - f = 1$ and $r + f = A$, or $r - f = p$ and $r + f = q$. In the first case, $r \equiv 1 \bmod f$, so the module must be $[\frac{A-1}{2}, 1 + \sqrt{A}]$, and in the second case, $r \equiv p \bmod f$, so the module must be $[\frac{q-p}{2}, p + \sqrt{A}]$, which completes the list.
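The step of the comparison algorithm used in the proof of the lemma, and the two types of pivotal module, can be tried out numerically. The following is only a sketch under stated assumptions: a module $[f, g + \sqrt{A}]$ in canonical form is represented by the pair `(f, g)`, the function names are mine, and the type (2) test checks that the successor of the conjugate module is the module itself, which agrees with the definition only for stable modules.

```python
def successor(A, f, g):
    """One step of the comparison algorithm, as in the proof of the lemma.

    Assumes [f, g + sqrt(A)] is in canonical form (0 <= g < f and
    g*g = A mod f) and that A is not a square.
    """
    r = (-g) % f              # least r >= 0 with r + g = 0 mod f
    while r * r <= A:         # move up within the class until r^2 > A
        r += f
    f1 = (r * r - A) // f     # f * f1 = r^2 - A
    return f1, r % f1         # g1 = r mod f1


def conjugate(f, g):
    """The module [f, f - g + sqrt(A)], with f - g reduced mod f."""
    return f, (f - g) % f


def cycle(A, f, g):
    """Iterate the successor step until a module repeats; return the cycle."""
    seen = []
    m = (f, g)
    while m not in seen:
        seen.append(m)
        m = successor(A, *m)
    return seen[seen.index(m):]


def pivotal_type(A, f, g):
    """Return 1 or 2 for a pivotal module of that type, otherwise None.

    Assumption: (f, g) is stable, so the conjugate being the predecessor
    is the same as the successor of the conjugate being (f, g).
    """
    m = (f, g)
    if conjugate(f, g) == m:
        return 1
    if successor(A, *conjugate(f, g)) == m:
        return 2
    return None


if __name__ == "__main__":
    print(cycle(7, 1, 0))   # [(1, 0), (2, 1)]
    print(cycle(7, 7, 0))   # [(7, 0), (6, 1), (3, 2), (3, 1), (6, 5)]
    print([pivotal_type(7, f, g) for f, g in [(1, 0), (2, 1), (7, 0), (3, 1)]])
```

For $A = 7$ this produces two cycles, each containing two of the four pivotal modules $[1, \sqrt{7}]$, $[2, 1 + \sqrt{7}]$, $[7, \sqrt{7}]$, $[3, 1 + \sqrt{7}]$ named in part (a) of Theorem 2.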
The complications that arise in the cases $A \equiv 1 \bmod 4$ of Proposition 3 stem from the fact that in these cases the class semigroup is not a group, which is to say that there are classes without inverses. For example, the square of $[2, 1 + \sqrt{5}]$ is $[2][2, 1 + \sqrt{5}]$, but $[2, 1 + \sqrt{5}]$ constitutes a cycle of length 1 and is therefore not equivalent to $[1]$. Therefore, the class of $\alpha = [2, 1 + \sqrt{5}]$ has no inverse, because $\alpha\gamma \sim [1]$ would imply $\alpha = \alpha[1] \sim \alpha^2\gamma = [2]\alpha\gamma \sim \alpha\gamma \sim [1]$. A module whose class is invertible in the class semigroup is called primitive.
The class group for a number $A$, not a square, consists of the invertible elements of the class semigroup, those classes whose elements are primitive. For any given $A$, the class semigroup is found by finding the cycles of stable modules. One can then find the class group by using the following theorem to determine which cycles contain primitive modules:

Theorem 3. Let $[e][f, g + \sqrt{A}]$ be a module in canonical form, and let $d$ be the greatest common divisor of $f$, $2g$, and $\frac{|g^2 - A|}{f}$. If $d = 1$, then $[f, g + \sqrt{A}][f, f - g + \sqrt{A}] = [f]$; in particular, $[e][f, g + \sqrt{A}]$ is primitive, because the class of $[f, f - g + \sqrt{A}]$ is inverse to its class. If $d > 1$, $[e][f, g + \sqrt{A}]$ is not primitive. In particular, if $A \equiv 1 \bmod 4$, the module $[2, 1 + \sqrt{A}]$ is not primitive, because in this case $d = 2$.

Proof. By direct computation
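$$\begin{aligned}
[f, g + \sqrt{A}][f, f - g + \sqrt{A}] &= [f^2,\ -fg + f\sqrt{A},\ fg + f\sqrt{A},\ fg - g^2 + A + f\sqrt{A}] \\
&= [f^2,\ -2fg,\ fg + f\sqrt{A},\ |g^2 - A|] \\
&= [f]\left[f,\ 2g,\ \tfrac{|g^2 - A|}{f},\ g + \sqrt{A}\right] \\
&= [f][d, g + \sqrt{A}],
\end{aligned}$$
where use is made of the first term $f^2$ to compute with the coefficients of the other terms as numbers mod $f^2$. If $d = 1$, this product is $[f]$, which proves the first statement.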
Now,
$$\begin{aligned}
[d, g + \sqrt{A}][f, g + \sqrt{A}] &= [df,\ d(g + \sqrt{A}),\ g^2 + A + 2g\sqrt{A}] \\
&= [df,\ d(g + \sqrt{A}),\ A - g^2] \quad \text{(subtract $2g/d$ times the second term from the third, computing mod $df$)} \\
&= [d][f, g + \sqrt{A}] \quad \text{(because $g^2 \equiv A \bmod df$)}.
\end{aligned}$$
Therefore $[d, g + \sqrt{A}][f, g + \sqrt{A}] = [d][f, g + \sqrt{A}] \sim [f, g + \sqrt{A}]$, which shows that if $[f, g + \sqrt{A}]$ is primitive, then $[d, g + \sqrt{A}] \sim [1]$, which is to say that repeated application of the comparison algorithm to $[d, g + \sqrt{A}]$ must eventually reach $[1]$.

But if $d$ divides $f$, $2g$, and $\frac{|g^2 - A|}{f}$ for any module $[f, g + \sqrt{A}]$ in canonical form (as is the case with the module $[d, g_1 + \sqrt{A}]$ when $g_1$ is the smallest number congruent to $g$ mod $d$, because $g_1 + g$ and $g_1 - g$ are both zero mod $d$, so $g_1^2 - A = g^2 - A + (g_1 + g)(g_1 - g) \equiv 0 \bmod d^2$), then $d$ divides $F$, $2G$, and $\frac{|G^2 - A|}{F}$, where $[F, G + \sqrt{A}]$ is the successor of $[f, g + \sqrt{A}]$ in the comparison algorithm, as can be seen as follows: Let $r = uf - g$ be the number used to find $[F, G + \sqrt{A}]$. Then $fF = r^2 - A = u^2f^2 - 2guf + g^2 - A \equiv 0 \bmod df$, which implies $F \equiv 0 \bmod d$. Moreover, $G = \mu F + r$ for some $\mu$, which gives $2G = 2\mu F + 2uf - 2g \equiv 0 \bmod d$. Finally, $G^2 - A = r^2 - A + 2\mu Fr + \mu^2F^2 = Ff + 2\mu Fr + \mu^2F^2 \equiv Ff + 2\mu FG \equiv 0 \bmod dF$, so $d$ divides $\frac{|G^2 - A|}{F}$. (Note that $\mu$ may be negative in these congruences, so that the argument is still valid when $G < r$.) Therefore, if $[f, g + \sqrt{A}]$ is primitive, $d$ must divide 1, as was to be shown.
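The criterion of Theorem 3 is easy to apply mechanically. Here is a small sketch, assuming a module $[f, g + \sqrt{A}]$ in canonical form is given by the triple `(A, f, g)`; the function names are mine and are not part of the text.

```python
from math import gcd


def theorem3_d(A, f, g):
    """gcd of f, 2g and |g^2 - A|/f for a module [f, g + sqrt(A)].

    Assumes canonical form, so that f divides g^2 - A.
    """
    if (g * g - A) % f != 0:
        raise ValueError("f must divide g^2 - A")
    return gcd(gcd(f, 2 * g), abs(g * g - A) // f)


def is_primitive(A, f, g):
    """Theorem 3: the module is primitive exactly when d = 1."""
    return theorem3_d(A, f, g) == 1


if __name__ == "__main__":
    print(is_primitive(5, 2, 1))    # [2, 1 + sqrt(5)]: d = 2
    print(is_primitive(7, 2, 1))    # [2, 1 + sqrt(7)]: d = 1
    print(is_primitive(21, 6, 3))   # [6, 3 + sqrt(21)]: d = 2
```

The first and third modules are the ones the text identifies as not primitive; the second has $d = 1$.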
Corollary. Elements of the class group whose squares are the identity correspond one-to-one to ambiguous cycles whose modules are primitive.

Deduction. An equivalence class whose square is the identity contains modules whose squares are equivalent to $[1]$; in particular, the modules it contains are all primitive. By the theorem just proved, if $[f, g + \sqrt{A}]$ is a stable module in canonical form that is in such a class, then both $[f, g + \sqrt{A}]$ and $[f, f - g + \sqrt{A}]$ are in the class inverse to the class of $[f, g + \sqrt{A}]$, which shows that the cycle of $[f, g + \sqrt{A}]$ is ambiguous. Conversely, if $[f, g + \sqrt{A}]$ is both primitive and ambiguous, then $[f, g + \sqrt{A}]^2 \sim [f, g + \sqrt{A}][f, f - g + \sqrt{A}] = [f] \sim [1]$.

When this corollary is combined with Theorem 2, it determines the elements of the class group of order 2 in a few cases:

Theorem 4. If $A$ is prime and congruent to 1 mod 4, the class group has no element of order 2. If $A$ is prime and congruent to 3 mod 4, or if $A$ is a product of two primes $A = pq$ for which $p + q \not\equiv 0 \bmod 4$, the class group has a unique element of order 2.

Proof. When $A$ is prime and 1 mod 4, $[2, 1 + \sqrt{A}]$ and $[\frac{A-1}{2}, 1 + \sqrt{A}]$ are not primitive, because the greatest common divisor of $f$, $2g$, and $\frac{|g^2 - A|}{f}$ is 2 for each of them, so at most two pivotal modules are primitive. The cycle of $[1]$ therefore is the only one that represents an element of the class group whose square is the identity. (The few cases in which there are cycles of length 1 are enumerated in Theorem 2.) Therefore, the class group contains no elements of order 2.

On the other hand, when $A$ is prime and 3 mod 4, or when $A = 2p$ where $p$ is an odd prime, all four of the pivotal modules identified in Theorem 2 are primitive. Therefore, there are two primitive, ambiguous classes. The case $A = pq$ in which $p$ and $q$ are odd primes that satisfy $p \equiv q \bmod 4$ is similar, because the four pivotal modules other than $[1]$, $[p, \sqrt{A}]$, $[q, \sqrt{A}]$, and $[A, \sqrt{A}]$ are not primitive (for all of them, $f$, $2g$, and $\frac{|g^2 - A|}{f}$ are all even). Thus, in these cases, the class group has just one element of order 2.
The Class Semigroup for Various Values of A (Compare to the table of Essay 3.3)

Value of A   Class group          Representatives c
2            Trivial group        none
3            Group of order 2     none
5            Trivial group        [2, 1 + √5]
6            Group of order 2     none
7            Group of order 2     none
8            Group of order 2     [2, √8]
10           Group of order 2     none
11           Group of order 2     none
12           Group of order 2     [2, √12], [6, √12]
13           Trivial group        [2, 1 + √13]
14           Group of order 2     none
15           Four-group           none
17           Trivial group        [2, 1 + √17]
18           Group of order 2     none
19           Group of order 2     none
20           Group of order 2     [2, √20]
21           Group of order 2     [2, 1 + √21], [6, 3 + √21]
79           Cyclic of order 6    none
Essay 3.5 Is A a Square Mod p?

. . . eine noch höhere Bedeutung haben sie [die Reciprocitätsgesetze] in der geschichtlichen Entwickelung dieser mathematischen Disciplin [Zahlentheorie] dadurch erlangt, daß die Beweise derselben, so weit sie überhaupt gefunden sind, fast durchgängig aus neuen, bis dahin noch unerforschten Gebieten haben geschöpft werden müssen, welche so der Wissenschaft aufgeschlossen worden sind. (. . . the reciprocity laws attained an even greater significance in the historical development of number theory by the fact that their proofs, insofar as proofs have been found, had to be sought in areas that were hitherto almost completely unexplored and that in this way were opened to science.) E. E. Kummer [47, Introduction]

Essay 3.4 gives a description of the semigroup of modules for a given $A$ ($A$ not a square) that is virtually complete except that it leaves untouched the obvious question raised by its Proposition 1: For which odd primes $p$ not dividing $A$ are there modules $[p, g + \sqrt{A}]$ in canonical form? This is the question "What is the value of $\chi_p(A)$?" where, for a given odd prime $p$, $\chi_p$ is defined to be the quadratic character of numbers mod $p$, which is to say that $\chi_p$ is the function that assigns to a number $A$ the value* $-1$ if the congruence $A \equiv x^2 \bmod p$ has no solution $x$, the value 0 if $A \equiv 0 \bmod p$, and the value 1 otherwise. In this way, the evaluation of $\chi_p(A)$ is essential to computation in the semigroup of modules of hypernumbers for a given $A$.

The problem of evaluating $\chi_p(A)$ engaged Euler's interest rather early in his career (see [17]), when he discovered empirically the amazing fact that if $p$ and $q$ are primes that satisfy $p \equiv q \bmod 4A$, then $\chi_p(A) = \chi_q(A)$. In other words, the answer to the question "Is $A$ a square mod $p$?" depends only on the class of the prime $p$ mod $4A$.
* Use of the "negative number" $-1$ can be avoided by treating the values of $\chi_p$ as numbers mod 4, so that $-1 = 3$, $(-1)^2 = 1$.

Euler made many attempts to prove what he had found empirically; in the process, he found refinements and generalizations of the phenomenon, thereby setting much of the agenda for number theory for the next hundred years, but the hoped-for proof eluded him.

What Euler knew but couldn't prove about the values of $\chi_p(A)$ implies the law of quadratic reciprocity fairly easily. This law, which is stated below, was put in its usual form by Legendre and was first proved by the young Gauss, who gave two proofs in Disquisitiones Arithmeticae, published in 1801 when he was 24 years old. The second of these uses his theory of composition of binary quadratic forms. The proof that will be given in this essay is inspired by Gauss's second proof, but it will use modules and their multiplication instead of quadratic forms and their composition.

The law of quadratic reciprocity is one case of a general formula for $\chi_p(A)$ of the form
$$\chi_p(A) = \sigma_A(p) \prod_{A_i} \chi_{A_i}(p) \tag{1}$$
in which $p$ is an odd prime, $A$ is a square-free number that is not divisible by $p$, the product on the right is a product over all odd* prime factors $A_i$ of $A$, and $\sigma_A(p)$ depends only on the classes of $p$ and $A$ mod 8; in fact $\sigma_A(p)$ depends only on the classes of $p$ and $A$ mod 4 when $A$ is odd. The formula for $\sigma_A(p)$ is given at the end of the essay. Euler's observation that $\chi_p(A) = \chi_q(A)$ when $p \equiv q \bmod 4A$ follows immediately from (1), because $p \equiv q \bmod 4A$ implies $p \equiv q \bmod A_i$ for all odd prime factors $A_i$ of $A$ and implies $p \equiv q \bmod 4$ when $A$ is odd, $p \equiv q \bmod 8$ when $A$ is even.

* Since $\chi_2(p) = 1$ for all odd primes $p$, it makes no difference whether $A_i = 2$ is included in the product in formula (1) when $A$ is even.

The derivation of formula (1) is the subject of the present essay. The law of quadratic reciprocity is simply the case in which $A$ is an odd prime. However, the derivation will begin with this special case. It will use two lemmas:

Lemma 1. If $p$ is an odd prime, $\chi_p(p - 1)$ is 1 when $p \equiv 1 \bmod 4$ and $-1$ when $p \equiv 3 \bmod 4$.

Proof. I will use without proof the fundamental fact that for any prime $p$ there is a primitive root $\gamma$ mod $p$, which is to say a number $\gamma$ with the property that every number not divisible by $p$ is congruent to a power of $\gamma$ mod $p$. (See Section 3 of Disquisitiones Arithmeticae, which contains two proofs, the first in Articles 39 and 54, the second in Article 55.) If $\gamma$ is a primitive root mod $p$, then each of the $p - 1$ numbers between 0 and $p$ is congruent mod $p$ to a unique power of $\gamma$ in which the exponent is less than $p$. Since $\gamma^{2\mu} \equiv \gamma^{\lambda} \bmod p$ if and only if $2\mu \equiv \lambda \bmod (p - 1)$, a power $\gamma^{\lambda}$ of $\gamma$ is a square mod $p$ for $\lambda < p$ if and only if $\lambda$ is even. Since the roots $\gamma^{p-1}$ and $\gamma^{(p-1)/2}$ of $x^2 \equiv 1 \bmod p$ coincide with $\pm 1 \bmod p$, $p - 1 \equiv \gamma^{(p-1)/2} \bmod p$, which is a square mod $p$ if and only if $(p - 1)/2$ is even, or in other words, $\chi_p(p - 1) = 1$ if and only if $p \equiv 1 \bmod 4$, as was to be shown.

Let $\chi_4$ denote the function that assigns the value 0 to even numbers, the value 1 to numbers congruent to 1 mod 4, and the value $-1$ to numbers congruent to 3 mod 4. Then Lemma 1 can be stated as $\chi_p(p - 1) = \chi_4(p)$ for any odd prime $p$.

As is easily proved, $\chi_p(mn) = \chi_p(m)\chi_p(n)$ for all numbers $m$ and $n$ whenever $p$ is prime and also when $p = 4$.

Lemma 2. Given a primitive module and given a number $N$, construct a module $[f, g + \sqrt{A}]$ in canonical form for which (1) $f$ is relatively prime to $N$ and (2) the product of $[f, g + \sqrt{A}]$ and the given module is equivalent to $[1]$.
Proof. Let $[E][F, G + \sqrt{A}]$ be the given primitive module. Since it is equivalent to $[F, G + \sqrt{A}]$, one may as well assume $E = 1$. Choose $\mu$ large enough that $(\mu F + G)^2 > A$. From $G^2 \equiv A \bmod F$ it follows that $(\mu F + G)^2 - A \equiv 0 \bmod F$, say $HF = (\mu F + G)^2 - A$. For all numbers $\nu$, $(\nu F + \mu F + G)^2 - A$ can be written in the form $F \cdot q(\nu)$, where $q(\nu)$ is a polynomial of degree 2 with coefficients that are numbers, namely, $q(\nu) = F\nu^2 + 2(\mu F + G)\nu + H$. A common divisor $d$ of the coefficients of $q(\nu)$ divides $F$, $2G$, and $H$, so $(\mu F + G)^2 \equiv A \bmod dF$ and $G^2 \equiv A \bmod dF$; thus, $d$ divides $F$, $2G$, and $\frac{|G^2 - A|}{F}$, which implies $d = 1$ because $[F, G + \sqrt{A}]$ is primitive by assumption. Thus, for any prime $p$, $q(\nu)$ is a nonzero polynomial mod $p$ whose degree is 2 at most. Therefore, $q(\nu)$ has at most 2 roots mod $p$ that are less than $p$. Moreover, since $F$ and $H$ cannot both be even (because at least one of $F$, $2G$, and $H$ is odd), $q(\nu)$ is either odd when $\nu$ is even (when $H$ is odd) or odd when $\nu$ is odd (when $H$ is even).

Let $p_1, p_2, \ldots, p_\sigma$ list the distinct prime divisors of $FN$. For each $p_i$, choose a number $\nu_i < p_i$ for which $q(\nu_i) \not\equiv 0 \bmod p_i$. Use the Chinese remainder theorem to construct a number $\nu < \prod p_i$ such that $\nu \equiv \nu_i \bmod p_i$ for each $i$. Then $q(\nu) \equiv q(\nu_i) \not\equiv 0 \bmod p_i$ for each $i$. In other words, $q(\nu)$ is relatively prime to $FN$. Let $\rho = (\nu + \mu)F + G$ for the $\nu$ chosen in this way. Then $[\rho + \sqrt{A}] = [\rho^2 - A, \rho + \sqrt{A}] = [F \cdot q(\nu), \rho + \sqrt{A}] = [F, \rho + \sqrt{A}][q(\nu), \rho + \sqrt{A}]$ (because $q(\nu)$ and $F$ are relatively prime) $= [F, G + \sqrt{A}][q(\nu), \rho + \sqrt{A}]$ (because $\rho \equiv G \bmod F$). Thus, because $[\rho + \sqrt{A}] \sim [1]$, the module $[q(\nu), \rho + \sqrt{A}]$ has the required properties.

Proposition. If $p$ is an odd prime divisor of $A$, the value of $\chi_p(f)$ is the same for any two equivalent primitive modules $[f, g + \sqrt{A}]$ in which $f \not\equiv 0 \bmod p$. In this way, $\chi_p$ determines a homomorphism from the class group to the group with two elements $\pm 1$.
Similarly, if $A \equiv 3 \bmod 4$, the value of $\chi_4(f)$ is the same for any two equivalent primitive modules $[f, g + \sqrt{A}]$ in which $f$ is odd, and the function from the class group to $\pm 1$ that $\chi_4$ determines in this way is a homomorphism.

Proof. Let $[f, g + \sqrt{A}]$ be a given primitive module in canonical form. Use Lemma 2 to find a module $[F, G + \sqrt{A}]$ in canonical form with $[f, g + \sqrt{A}][F, G + \sqrt{A}] \sim [1]$ and $F$ relatively prime to $pf$. Since Lemma 2 implies that there is a module in canonical form $[\mathcal{F}, \mathcal{G} + \sqrt{A}]$ with $[\mathcal{F}, \mathcal{G} + \sqrt{A}][F, G + \sqrt{A}] \sim [1]$ and $\mathcal{F}$ relatively prime to $pF$, the equivalence class of $[f, g + \sqrt{A}]$ contains a module $[\mathcal{F}, \mathcal{G} + \sqrt{A}]$ (because both $[f, g + \sqrt{A}]$ and $[\mathcal{F}, \mathcal{G} + \sqrt{A}]$ are in the class inverse to the class of $[F, G + \sqrt{A}]$) in canonical form in which $\mathcal{F}$ is relatively prime to $pF$. What is to be shown is that $\chi_p(\mathcal{F}) = \chi_p(f)$ whenever $f \not\equiv 0 \bmod p$. This will be done by proving that if $f \not\equiv 0 \bmod p$, then $\chi_p(f) = \chi_p(F)$. Now $[fF, z + \sqrt{A}] = [f, g + \sqrt{A}][F, G + \sqrt{A}] \sim [1]$ for $z \equiv g \bmod f$ and $z \equiv G \bmod F$, so there are principal modules $[v + u\sqrt{A}]$ and $[y + x\sqrt{A}]$ such
that $[v + u\sqrt{A}][fF, z + \sqrt{A}] = [y + x\sqrt{A}][1]$. Since there is a principal module $[t + s\sqrt{A}]$ for which $[t + s\sqrt{A}][v + u\sqrt{A}] = [N]$ for some number $N$ (Proposition 3 of Essay 3.3), one can assume without loss of generality that $u = 0$; that is, $[v][fF, z + \sqrt{A}] = [y + x\sqrt{A}]$ where $y^2 > Ax^2$. Since this equation implies that $x \equiv 0 \bmod v$ and $y \equiv zx \bmod vfF$ (see Essay 3.2), the equation can be divided by $v$ to give one of the form $[fF, z + \sqrt{A}] = [y + x\sqrt{A}]$. This equation implies that $x$ is relatively prime to $y$ (any common divisor divides the coefficient of $\sqrt{A}$ in $z + \sqrt{A}$) and therefore that $x$ is relatively prime to $y^2 - Ax^2$, from which it follows that $[y + x\sqrt{A}] = [y^2 - Ax^2, y + x\sqrt{A}] = [y^2 - Ax^2, \rho + \sqrt{A}]$ for some $\rho$ ($x$ is invertible mod $y^2 - Ax^2$). Therefore $fF = y^2 - Ax^2$. Since $A \equiv 0 \bmod p$ and $fF \not\equiv 0 \bmod p$, it follows that $fF \equiv y^2 \not\equiv 0 \bmod p$. Thus $\chi_p(f) = \chi_p(fF^2) = \chi_p(y^2F) = \chi_p(F)$, as was to be shown.

The proof of the analogous theorem for $\chi_4$ in the case $A \equiv 3 \bmod 4$ follows the same steps, except that one needs to prove that if $fF = y^2 - Ax^2$ and $fF$ is odd, then $\chi_4(f) = \chi_4(F)$. This follows easily from the observation that $y$ and $x$ must have opposite parity (because $fF$ is odd), so one of the terms $y^2$ and $-Ax^2$ is 0 mod 4 and the other is 1 mod 4, resulting in $\chi_4(fF) = 1$, from which $\chi_4(f) = \chi_4(F)$ follows.

Theorem. Let $p$ and $q$ be distinct odd primes. If $p \equiv 1 \bmod 4$, then $\chi_q(p) = 1$ implies $\chi_p(q) = 1$. If $p \equiv 3 \bmod 4$, then $\chi_q(p) = 1$ implies $\chi_p(q) = \chi_4(q)$. If $p \equiv q \equiv 3 \bmod 4$, then $\chi_p(q) = -\chi_q(p)$.

Proof. If $\chi_q(p) = 1$, there is a module $[q, g + \sqrt{p}] \neq [1]$. By the proposition, the value of $\chi_p(q)$ depends only on the class of $[q, g + \sqrt{p}]$. The kernel of the homomorphism defined by $\chi_p$ from the class group to the group with two elements is either a subgroup of index two or it is the whole group.

When $p \equiv 1 \bmod 4$, Theorem 4 of the preceding essay states that the class group has no element of order two. Therefore, the group has odd order (see Essay 5.2), so it can have no subgroup of index two, which implies that the kernel is the whole group, so $\chi_q(p) = 1$ implies $\chi_p(q) = 1$, as was to be shown.

When $p \equiv 3 \bmod 4$, on the other hand, the class group contains a single element of order two, so the operation of squaring is a two-to-one homomorphism from the class group to itself whose image is a subgroup of index two. This subgroup is necessarily the kernel of the homomorphism determined by $\chi_p$, because this kernel contains all squares but does not contain the class of $[p - 1, 1 + \sqrt{p}]$, because $\chi_p(p - 1) = -1$. The homomorphism determined by $\chi_4$ also has the subgroup of squares as its kernel, because its value for the class of $[p, \sqrt{p}]$ is $\chi_4(p) = -1$. Therefore, these two homomorphisms are identical, and the statement to be proved (that $\chi_q(p) = 1$ implies $\chi_p(q) = \chi_4(q)$) follows.

When $p \equiv q \equiv 3 \bmod 4$ one finds in a similar way that if either $\chi_p(q)$ or $\chi_q(p)$ is 1, then the other is $-1$, but the possibility that both might be $-1$ remains. Consider the class group in the case $A = pq$. By Theorem 4 of the last essay, the class group in this case has a single element of order 2, so the squares form a subgroup of index two, as before, that is obviously
contained in the kernel of the homomorphisms from the class group to the group with two elements that are determined by either $\chi_p$ or $\chi_q$. In fact, since the class of $[pq - 1, 1 + \sqrt{pq}]$ is not in either kernel, these homomorphisms have the same kernel (the subgroup of squares) and are therefore identical. The stable modules $[1]$, $[p, \sqrt{pq}]$, $[q, \sqrt{pq}]$, and $[pq, \sqrt{pq}]$ are partitioned between the two ambiguous, primitive cycles. Since $[1]$ is in the principal cycle and $[pq, \sqrt{pq}]$ is not (one step of the comparison algorithm shows that $[pq, \sqrt{pq}] \sim [pq - 1, 1 + \sqrt{pq}]$), exactly one of $[p, \sqrt{pq}]$ and $[q, \sqrt{pq}]$ is in the kernel of the homomorphism in question, which is to say that exactly one of $\chi_q(p)$ and $\chi_p(q)$ is 1, as was to be shown.

The Law of Quadratic Reciprocity. If $p$ and $q$ are distinct odd primes, then $\chi_p(q) = \chi_q(p)$ unless $p \equiv q \equiv 3 \bmod 4$, in which case $\chi_p(q) = -\chi_q(p)$.

Proof. The last statement is of course part of the previous theorem. Since $\chi_p(q) = -1$ is the negation of $\chi_p(q) = 1$, the statement that $\chi_p(q) = \chi_q(p)$ is the statement that $\chi_p(q) = 1$ if and only if $\chi_q(p) = 1$. When $p \equiv q \equiv 1 \bmod 4$, the theorem proves that $\chi_q(p) = 1$ implies $\chi_p(q) = 1$, and the desired conclusion follows by symmetry. When $p \equiv 1 \bmod 4$ and $q \equiv 3 \bmod 4$, the theorem proves that $\chi_q(p) = 1$ implies $\chi_p(q) = 1$ and that $\chi_p(q) = 1$ implies $\chi_q(p) = \chi_4(p) = 1$, as was to be shown.

Evaluation of $\chi_p(2)$. If $p$ is an odd prime, $\chi_p(2) = 1$ if and only if $p \equiv \pm 1 \bmod 8$.
Proof. Consider the class group in the case $A = 8$. It has two elements, the class of $[7, 1 + \sqrt{8}]$ and the principal class. An odd prime $p$ satisfies $\chi_p(2) = 1$ if and only if it satisfies $\chi_p(8) = 1$, which is true if and only if there is a module $[p, g + \sqrt{8}] \neq [1]$ for some $g$. When this is the case, either $[p, g + \sqrt{8}]$ or $[p, g + \sqrt{8}][7, 1 + \sqrt{8}]$ is principal, which implies (unless $p = 7$) that either $p$ or $7p$ is of the form $y^2 - 8x^2$ and therefore that $p \equiv \pm 1 \bmod 8$.

If $p \equiv 1 \bmod 8$, then either $[8, 1 + \sqrt{p}]$ or $[8, 5 + \sqrt{p}]$ is primitive (because if $(p - 1)/8$ is even, then $|p - 25|/8$ is odd). The homomorphism from the class group to $\pm 1$ determined by $\chi_p$ is trivial (since $p \equiv 1 \bmod 4$, the class group has no element of order 2), so $\chi_p(8) = 1$ and $\chi_p(2) = 1$.

Finally, if $p \equiv 7 \bmod 8$, then the last two of the four pivotal modules $[1, \sqrt{p}]$, $[2, 1 + \sqrt{p}]$, $[p, \sqrt{p}]$, and $[\frac{p-1}{2}, 1 + \sqrt{p}]$ are not principal: $[p, \sqrt{p}]$ because it is equivalent to $[p - 1, 1 + \sqrt{p}]$, for which $\chi_p(p - 1) = -1$, and $[\frac{p-1}{2}, 1 + \sqrt{p}]$ because $\chi_4(\frac{p-1}{2}) = -1$. Therefore $[2, 1 + \sqrt{p}]$ must be principal, which implies that $[2, 1 + \sqrt{p}] = [y + x\sqrt{p}]$, where $2 = y^2 - px^2$. Thus, $\chi_p(2) = \chi_p(y^2) = 1$, as was to be shown.

Let $\chi_8$ denote the function that assigns the value 0 to even numbers, the value 1 to numbers congruent to $\pm 1$ mod 8, and the value $-1$ to numbers congruent to $\pm 3$ mod 8, so that $\chi_p(2)$ is $\chi_8(p)$ for all odd primes $p$.
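Both the law of quadratic reciprocity and the evaluation of $\chi_p(2)$ can be spot-checked by brute force for small primes. The sketch below is an illustration only, not part of the proofs; the function names and the list of primes are my own choices, and `chi(p, a)` stands for $\chi_p(a)$, computed directly from the definition.

```python
def chi(p, a):
    """Quadratic character chi_p(a) for an odd prime p, from the definition."""
    a %= p
    if a == 0:
        return 0
    # a is a square mod p exactly when some x in 1..p-1 has x^2 = a mod p
    return 1 if any(x * x % p == a for x in range(1, p)) else -1


def chi8(n):
    """0 for even n, 1 for n = +-1 mod 8, -1 for n = +-3 mod 8."""
    if n % 2 == 0:
        return 0
    return 1 if n % 8 in (1, 7) else -1


def reciprocity_holds(p, q):
    """Check chi_p(q) = chi_q(p), with a sign change when p = q = 3 mod 4."""
    expected = chi(q, p)
    if p % 4 == 3 and q % 4 == 3:
        expected = -expected
    return chi(p, q) == expected


ODD_PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

if __name__ == "__main__":
    print(all(reciprocity_holds(p, q)
              for p in ODD_PRIMES for q in ODD_PRIMES if p != q))
    print(all(chi(p, 2) == chi8(p) for p in ODD_PRIMES))
```

Both checks print `True` for the primes listed, as the two statements require.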
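Evaluation of $\chi_p(A)$. Let $A$ be a square-free number and let $p$ be an odd prime that does not divide $A$. The coefficient $\sigma_A(p)$ in formula (1) at the beginning of the essay is given by
$$\sigma_A(p) = \begin{cases}
1 & \text{if } A \equiv 1 \bmod 4, \\
\chi_4(p) & \text{if } A \equiv 3 \bmod 4, \\
\chi_8(p) & \text{if } A \equiv 2 \bmod 8, \\
\chi_4(p)\chi_8(p) & \text{if } A \equiv 6 \bmod 8.
\end{cases}$$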
In particular, $\sigma_A(p)$ depends only on the classes of $p$ and $A$ mod 4 when $A$ is odd and only on their classes mod 8 when $A$ is even.

Proof. Since $\chi_p(A) = \prod \chi_p(A_i)$, where the product is over the prime factors $A_i$ of $A$, the evaluations of $\chi_p(q)$ for prime $q$ given above imply that
$$\sigma_A(p) = \chi_4(p)^{\nu} \text{ if } A \text{ is odd} \quad \text{or} \quad \sigma_A(p) = \chi_8(p)\chi_4(p)^{\nu} \text{ if } A \text{ is even},$$
where $\nu$ is the number of prime factors $A_i$ of $A$ that are 3 mod 4. Because $\chi_4(p)^{\nu}$ is 1 when $\nu$ is even and $\chi_4(p)$ when $\nu$ is odd, the given formulas follow when one observes that an odd $A$ is 3 mod 4 if and only if $\nu$ is odd, and an even $A$ is 6 mod 8 if and only if $\nu$ is odd.
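Formula (1), with $\sigma_A(p)$ evaluated as above, can be compared with a direct computation of $\chi_p(A)$ for small values. The following sketch is a check of this kind under my own naming; `chi(p, a)` stands for $\chi_p(a)$ and is computed from the definition, and the lists of test values are arbitrary.

```python
def chi(p, a):
    """Quadratic character chi_p(a) for an odd prime p, from the definition."""
    a %= p
    if a == 0:
        return 0
    return 1 if any(x * x % p == a for x in range(1, p)) else -1


def chi4(n):
    """0 for even n, 1 for n = 1 mod 4, -1 for n = 3 mod 4."""
    return 0 if n % 2 == 0 else (1 if n % 4 == 1 else -1)


def chi8(n):
    """0 for even n, 1 for n = +-1 mod 8, -1 for n = +-3 mod 8."""
    return 0 if n % 2 == 0 else (1 if n % 8 in (1, 7) else -1)


def sigma(A, p):
    """The coefficient sigma_A(p) for square-free A, by the four cases."""
    if A % 4 == 1:
        return 1
    if A % 4 == 3:
        return chi4(p)
    if A % 8 == 2:
        return chi8(p)
    return chi4(p) * chi8(p)          # remaining case: A = 6 mod 8


def odd_prime_factors(A):
    """Odd prime factors of A by trial division."""
    factors, n, d = [], A, 3
    while n % 2 == 0:
        n //= 2
    while d * d <= n:
        if n % d == 0:
            factors.append(d)
            while n % d == 0:
                n //= d
        d += 2
    if n > 1:
        factors.append(n)
    return factors


def formula_one(A, p):
    """Right side of formula (1): sigma_A(p) times the product of chi_{A_i}(p)."""
    value = sigma(A, p)
    for q in odd_prime_factors(A):
        value *= chi(q, p)
    return value


SQUARE_FREE = [2, 3, 5, 6, 7, 10, 11, 13, 14, 15, 21, 30]
ODD_PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

if __name__ == "__main__":
    print(all(chi(p, A) == formula_one(A, p)
              for A in SQUARE_FREE for p in ODD_PRIMES if A % p != 0))
```

The check prints `True` for the values listed.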
Essay 3.6 Gauss's Composition of Forms

The structure of Gauss's Disquisitiones Arithmeticae suggests that the original purpose of his theory of composition of forms was to put the law of quadratic reciprocity in a setting that would make it seem clearer and more natural. In the first three sections of the book he introduces the elementary theory of congruences and proves the important theorem that there is a primitive root mod $p$ for every prime $p$. In Section 4 he goes on to the statement and proof of what he calls the "fundamental theorem," essentially the law of quadratic reciprocity. His proof in Section 4 was described by H. J. S. Smith [60, Part 1, Art. 18] as "repulsive to all but the most laborious students." Perhaps Gauss felt the same way about it, because he gave a second and altogether different proof in Section 5; in later years he gave other proofs, indicating that even then he was not satisfied that he had grasped the true basis of the phenomenon.

It is misleading to think of Section 5 as just one of seven sections of the book, because in number of pages it is more than half of the book. A large part of it is devoted to the theory of "composition" of binary quadratic forms, which is used (Article 262) to prove the "fundamental theorem," but which is also used in the study of ternary forms and is studied for its own sake. Surely "composition" represents an early step in Gauss's quest for the deeper secrets of number theory.

Section 5 had a profound effect on the development of number theory in the 19th century. Kummer's proof of his generalized reciprocity law in midcentury, a proof that was found only after years of intense effort, was directly inspired [47, p. 20 (700)] by Gauss's proof of quadratic reciprocity in Section 5. But beyond that, Section 5 was fundamental to the development of Dedekind's theory of "ideals" in the second half of the century, and in that way directly influenced the core ideas of modern abstract algebra. Moreover, the use of the structure of the group (without the name) of equivalence classes of binary quadratic forms in Section 5 (together with another implicit use of groups in Section 7) contributed to the development of the theory of groups. But perhaps the profoundest way in which Section 5 affected the development of mathematics lay in the challenge that it presented. Starting with Dirichlet, and continuing with Kummer, Dedekind, Kronecker, Hermite, and countless others, the unwieldy but fruitful theory of composition of forms called forth great efforts of study and theory-building that shaped modern mathematics.

The "forms" in Gauss's theory are binary quadratic forms, which is to say that they are homogeneous polynomials (forms) of degree 2 (quadratic) in 2 variables (binary) with integer coefficients. He used the notation $ax^2 + 2bxy + cy^2$ for such forms and, as this notation indicates, he only considered forms with even middle coefficients. Because I prefer not to impose this restriction, I have chosen different letters altogether and will denote binary quadratic forms by $rx^2 + sxy + ty^2$, where $x$ and $y$ are the variables of the form and $r$, $s$, and $t$ are its integer coefficients. (In this essay and the next, I will do as Gauss
did and use integers instead of the numbers in the narrowest sense that were used in the other essays of Part 3.) I will also write the form as $(r, s, t)$ when the variables can remain unnamed.

The notion of "composition" generalizes Brahmagupta's formula
$$(x^2 - Dy^2)(u^2 - Dv^2) = X^2 - DY^2 \quad \text{when } X = xu + Dyv \text{ and } Y = xv + yu$$
(where $D$ is a fixed integer). Other examples are
$$(x^2 + xy + y^2)(u^2 + uv + v^2) = X^2 + XY + Y^2 \quad \text{when } X = xu - yv \text{ and } Y = xv + yu + yv$$
and
$$(16x^2 + 4xy - y^2)(4u^2 + 2uv - v^2) = 4X^2 + 6XY + Y^2 \quad \text{when } X = 4xu - 2xv - yu \text{ and } Y = 4xv + 2yu + yv.$$
These formulas can be verified by the lengthy but simple process of performing the prescribed substitutions for $X$ and $Y$ on the right and expanding to find that the resulting polynomial in $x$, $y$, $u$, and $v$ is indeed the product of the two polynomials on the left.

In general, given two binary quadratic forms $rx^2 + sxy + ty^2$ and $\rho u^2 + \sigma uv + \tau v^2$ with integer coefficients, a third form $RX^2 + SXY + TY^2$ is transformable into their product (see Gauss's §235) if one can define $X$ and $Y$ to be sums of integer multiples of the monomials $xu$, $xv$, $yu$, and $yv$ in such a way that
$$(rx^2 + sxy + ty^2)(\rho u^2 + \sigma uv + \tau v^2) = RX^2 + SXY + TY^2.$$
A form that is transformable into the product of two forms composes those forms if the six $2 \times 2$ minors of the $2 \times 4$ matrix of coefficients of the expressions of $X$ and $Y$ in terms of $xu$, $xv$, $yu$, and $yv$ that effect the transformation have no common divisor greater than 1. (It is easy to check that the three formulas above are compositions. For example, the last formula shows that $4X^2 + 6XY + Y^2$ composes $16x^2 + 4xy - y^2$ and $4u^2 + 2uv - v^2$ because the minors of the matrix of coefficients
$$\begin{pmatrix} 4 & -2 & -1 & 0 \\ 0 & 4 & 2 & 1 \end{pmatrix}$$
that effects the transformation are 16, 8, 4, 0, $-2$, and $-1$.)

To modern ears, the phrase "composition of forms" suggests a binary operation, assigning a composed form to each pair of given forms, but Gauss's compositions do not conform to this expectation.
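The verification of the three composition formulas above, and of the condition on the $2 \times 2$ minors, can be mechanized. The following sketch is an illustration under my own conventions: a form $(r, s, t)$ is a triple, and the substitution is a pair of rows giving the coefficients of $X$ and $Y$ on the monomials $xu$, $xv$, $yu$, $yv$. Because both sides have degree at most 2 in each variable, agreement at all integer points with coordinates from $-2$ to 2 is enough to establish the identity.

```python
from functools import reduce
from itertools import combinations, product
from math import gcd


def evaluate(form, x, y):
    """Value of the form (r, s, t) = r*x^2 + s*x*y + t*y^2."""
    r, s, t = form
    return r * x * x + s * x * y + t * y * y


def is_transformable(f1, f2, F, rows):
    """Check f1(x, y) * f2(u, v) == F(X, Y) on a grid of integer points.

    rows[0] and rows[1] hold the coefficients of X and Y on xu, xv, yu, yv.
    """
    for x, y, u, v in product(range(-2, 3), repeat=4):
        monomials = (x * u, x * v, y * u, y * v)
        X = sum(c * m for c, m in zip(rows[0], monomials))
        Y = sum(c * m for c, m in zip(rows[1], monomials))
        if evaluate(f1, x, y) * evaluate(f2, u, v) != evaluate(F, X, Y):
            return False
    return True


def minors(rows):
    """The six 2 x 2 minors of the 2 x 4 coefficient matrix."""
    return [rows[0][i] * rows[1][j] - rows[0][j] * rows[1][i]
            for i, j in combinations(range(4), 2)]


def composes(f1, f2, F, rows):
    """Transformable, and the minors have no common divisor greater than 1."""
    return is_transformable(f1, f2, F, rows) and reduce(gcd, map(abs, minors(rows))) == 1


D = 2  # any fixed integer serves for Brahmagupta's formula
EXAMPLES = [
    ((1, 0, -D), (1, 0, -D), (1, 0, -D), ((1, 0, 0, D), (0, 1, 1, 0))),
    ((1, 1, 1), (1, 1, 1), (1, 1, 1), ((1, 0, 0, -1), (0, 1, 1, 1))),
    ((16, 4, -1), (4, 2, -1), (4, 6, 1), ((4, -2, -1, 0), (0, 4, 2, 1))),
]

if __name__ == "__main__":
    print([composes(*example) for example in EXAMPLES])
    print(minors(EXAMPLES[2][3]))   # [16, 8, 4, 0, -2, -1]
```

All three examples pass the test, and the minors of the last matrix come out as 16, 8, 4, 0, $-2$, $-1$, in agreement with the text.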
On the one hand, there may be no form that composes two given forms, while on the other hand, if some form does compose them, then infinitely many others also do,* because infinitely many others can be obtained by a unimodular change of variables in the composed form. Specifically, a matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ with determinant 1 can be used to define $U$ and $V$ as sums of multiples of $xu$, $xv$, $yu$, and $yv$ by $U = aX + bY$ and $V = cX + dY$; then substitution of $X = dU - bV$ and $Y = -cU + aV$ in the known composition using $X$ and $Y$ produces a composition using $U$ and $V$.

* Both the English and the German translations of the Disquisitiones wrongly translate the theorems of §236 and §249, among others, when they use definite articles rather than indefinite ones; the original Latin of course has no articles.

It seems fair to say that Gauss's theory in its full generality is largely forgotten. When André Weil writes [63, p. 334] that Dirichlet "restored its original simplicity" he is overlooking the fact that Dirichlet composes only certain pairs of forms (pairs that are concordant or einig) and that Dirichlet justified this limitation by shifting all emphasis from the composition of forms to the composition of equivalence classes of forms, thus disregarding Gauss's success in developing the theory in the greatest possible generality. One could certainly argue that in this case Gauss's insistence on generality was excessive, that the purposes to which the theory is put are served just as well by the mere composition of equivalence classes, and that the classification of forms is so natural that the binary operation of composition of equivalence classes is a legitimate subject of study, but Gauss evidently disagreed.

The technical demands of developing the theory in the way Gauss does are indeed formidable. This becomes clear from Gauss's very statement of what amounts to the associative law, not to mention the difficulty of proving the statement that if $F$ composes $f$ and $f'$, if $\mathfrak{F}$ composes $F$ and $f''$, if $F'$ composes $f$ and $f''$, and if $\mathfrak{F}'$ composes $F'$ and $f'$, then $\mathfrak{F}$ and $\mathfrak{F}'$ are properly equivalent (§240, where it is assumed that all forms enter directly into all compositions in the sense defined in Essay 3.7).
The theory of multiplication of modules of hypernumbers resolves this conflict between the wish to preserve the full* generality of Gauss's theory and the wish to avoid its technical difficulties. Kummer's first paper [46, p. 324 (208)] on "ideal prime factors" mentions the possibility of applying his new theory to Gauss's (at least to justify Gauss's belief that the forms $ax^2 + 2bxy + cy^2$ and $ax^2 - 2bxy + cy^2$ should be considered to be inequivalent), but he never laid out the exact relation between the two theories. Similarly, Dedekind was well aware that Gauss's composition of forms was, in essence, the multiplication of modules ("ideals" in his terminology), but he did not develop the correspondence in detail. Nor, as far as I know, has anyone since Dedekind. In all probability, the reason is that it was felt that Gauss's approach was a false start that could be disregarded and replaced with Dedekind's. But such an attitude has the great disadvantage that it destroys the access of modern readers to Gauss's classic.

Essay 3.7 is meant to bridge the gap between the modern theory and Gauss's. On the one hand, modern readers will certainly see that the multiplication of modules of hypernumbers is the multiplication of ideals in quadratic

* In truth, Essay 3.7 does not preserve the full generality of Gauss's theory because it ignores forms $(r, s, t)$ for which $s^2 - 4rt$ is a square.
Essay 3.6 Gauss's Composition of Forms
number fields with the added technicality of dealing not with all integers in the field but with what Dedekind called orders (Ordnungen) of integers in the field. On the other hand, as Essay 3.7 shows, multiplication of modules makes possible the complete description of Gauss's compositions in the sense that it solves the problem: Given two binary quadratic forms, determine whether they can be composed, and if so, find all possible compositions. In brief, Gauss himself showed that two forms can be composed if and only if they pertain to the same square-free integer (see Essay 3.7) and showed that knowledge of one composition implies knowledge of all, because any two differ by a unimodular change of variables. Thus, the problem is solved by Proposition 3 of Essay 3.7, which describes how to use multiplication of modules to find an explicit composition of two given binary quadratic forms, provided they pertain to the same square-free integer.
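The unimodular change of variables described in this essay can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the sample data (the form (4, 6, 1) composing (16, 4, −1) and (4, 2, −1) by the substitution with rows 4, −2, −1, 0 and 0, 4, 2, 1) is the example worked in Essay 3.7. The helper `satisfies_identity` tests only the polynomial identity on a grid of integer points; it does not test the condition on the 2 × 2 minors.

```python
def unimodular_change(form, matrix, m):
    """Change variables by U = aX + bY, V = cX + dY with ad - bc = 1.

    form   -- (R, S, T), the composed form R X^2 + S XY + T Y^2
    matrix -- two rows giving X and Y as combinations of xu, xv, yu, yv
    m      -- ((a, b), (c, d)), assumed to have determinant 1
    """
    (R, S, T), ((a, b), (c, d)) = form, m
    if a * d - b * c != 1:
        raise ValueError("the matrix must have determinant 1")
    # Substitute X = dU - bV, Y = -cU + aV into R X^2 + S XY + T Y^2.
    new_form = (R * d * d - S * c * d + T * c * c,
                -2 * R * b * d + S * (a * d + b * c) - 2 * T * a * c,
                R * b * b - S * a * b + T * a * a)
    row_x, row_y = matrix
    new_matrix = [[a * px + b * py for px, py in zip(row_x, row_y)],
                  [c * px + d * py for px, py in zip(row_x, row_y)]]
    return new_form, new_matrix


def satisfies_identity(form, matrix, f, phi, span=range(-2, 3)):
    """Check form(X, Y) == f(x, y) * phi(u, v) at every grid point.

    Both sides have degree 2 in each variable, so agreement on a grid
    with at least three values per variable proves the identity.
    """
    R, S, T = form
    for x in span:
        for y in span:
            for u in span:
                for v in span:
                    mono = (x * u, x * v, y * u, y * v)
                    X = sum(p * t for p, t in zip(matrix[0], mono))
                    Y = sum(q * t for q, t in zip(matrix[1], mono))
                    left = R * X * X + S * X * Y + T * Y * Y
                    right = ((f[0] * x * x + f[1] * x * y + f[2] * y * y)
                             * (phi[0] * u * u + phi[1] * u * v + phi[2] * v * v))
                    if left != right:
                        return False
    return True


f, phi = (16, 4, -1), (4, 2, -1)
known_form, known_matrix = (4, 6, 1), [[4, -2, -1, 0], [0, 4, 2, 1]]
other_form, other_matrix = unimodular_change(known_form, known_matrix,
                                             ((1, 1), (0, 1)))
```

With the matrix ((1, 1), (0, 1)) chosen here purely for illustration, the known composition is carried to a second form that satisfies the same identity with the same two factors.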
Essay 3.7 The Construction of Compositions

Binary quadratic forms with integer coefficients will be called "forms" in this essay. For simplicity, all forms will be assumed to be irreducible; in other words, forms $(r, s, t)$ whose discriminants $s^2 - 4rt$ are squares are excluded.*

Theorem. Given two forms, construct all forms that compose them.
If $(R, S, T)$ composes $(r, s, t)$ and $(\rho, \sigma, \tau)$, then $(-R, -S, -T)$ composes $(-r, -s, -t)$ and $(\rho, \sigma, \tau)$. Therefore, one can assume without loss of generality that $r$ is nonnegative. Since $s^2 - 4rt$ is not a square, $r$ is positive in this case. Similarly, $\rho$ can be assumed to be positive. The required construction is accomplished by the four propositions that follow.

A form $(r, s, t)$ will be said to pertain to a given square-free integer if its discriminant $s^2 - 4rt$ is a square times that integer. In this way, each form pertains to one and only one square-free integer. (An integer is square-free if it is not divisible by any square greater than 1.) The reducible forms that have been excluded from consideration are simply the forms that pertain to the square-free number 1 together with forms whose discriminant is zero.

Proposition 1. If a form composes two others, then all three pertain to the same square-free number.

Corollary. If two given forms pertain to different square-free integers, then no form composes them.
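The definition of "pertains" can be made concrete with a short Python sketch. The function names are hypothetical, and the trial-division loop is simply one straightforward way to strip the largest square factor from a discriminant; it is meant for small illustrative inputs such as the forms of the example at the end of this essay.

```python
def discriminant(form):
    """Discriminant s^2 - 4rt of the form (r, s, t)."""
    r, s, t = form
    return s * s - 4 * r * t


def squarefree_part(n):
    """Divide the nonzero integer n by its largest square factor."""
    if n == 0:
        raise ValueError("a zero discriminant pertains to no square-free integer")
    sign = -1 if n < 0 else 1
    m = abs(n)
    k = 2
    while k * k <= m:
        while m % (k * k) == 0:
            m //= k * k
        k += 1
    return sign * m


def pertains_to(form):
    """Square-free integer to which the form pertains (1 means reducible)."""
    return squarefree_part(discriminant(form))


def can_be_composed(f, phi):
    """Necessary condition of Proposition 1: both pertain to the same integer."""
    return pertains_to(f) == pertains_to(phi)
```

For instance, (16, 4, −1) has discriminant 80 and (4, 2, −1) has discriminant 20, so both pertain to 5, whereas a form such as (1, 0, 1), with discriminant −4, pertains to −1 and by the Corollary cannot be composed with either of them.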
Proof. Given are three forms $(r, s, t)$, $(\rho, \sigma, \tau)$, and $(R, S, T)$ and a substitution

$X = p_0 xu + p_1 xv + p_2 yu + p_3 yv$,
$Y = q_0 xu + q_1 xv + q_2 yu + q_3 yv$,

in which the $2 \times 2$ minors of the matrix of coefficients have no common divisor greater than 1, whose substitution in $RX^2 + SXY + TY^2$ results in $(rx^2 + sxy + ty^2)(\rho u^2 + \sigma uv + \tau v^2)$. This last statement, when the coefficients of the various monomials $x^2u^2$, $x^2v^2$, ..., $xyuv$ are compared, amounts to nine equations:

(1) $R p_0^2 + S p_0 q_0 + T q_0^2 = r\rho$,
(2) $R p_1^2 + S p_1 q_1 + T q_1^2 = r\tau$,
(3) $R p_2^2 + S p_2 q_2 + T q_2^2 = t\rho$,
(4) $R p_3^2 + S p_3 q_3 + T q_3^2 = t\tau$,
(5) $2R p_0 p_1 + S(p_0 q_1 + p_1 q_0) + 2T q_0 q_1 = r\sigma$,
(6) $2R p_0 p_2 + S(p_0 q_2 + p_2 q_0) + 2T q_0 q_2 = s\rho$,
(7) $2R p_1 p_3 + S(p_1 q_3 + p_3 q_1) + 2T q_1 q_3 = s\tau$,
(8) $2R p_2 p_3 + S(p_2 q_3 + p_3 q_2) + 2T q_2 q_3 = t\sigma$,
(9) $2R(p_0 p_3 + p_1 p_2) + S(p_0 q_3 + p_1 q_2 + p_2 q_1 + p_3 q_0) + 2T(q_0 q_3 + q_1 q_2) = s\sigma$.

* Characteristically, Gauss does not exclude reducible forms, as is shown by the point he makes in §235 of avoiding the assumption that the first coefficients of his forms are nonzero.

Gauss's theory is based entirely on virtuosic algebraic deductions of general properties of transformations and compositions of forms from these nine equations. His very first step is to subtract 4 times the product of equations (1) and (2) from the square of (5). In the notation above, the result is

$(S^2 - 4RT)(p_0 q_1 - p_1 q_0)^2 = r^2(\sigma^2 - 4\rho\tau)$.
The right side is obvious; the left is not, but a little work with pencil and paper will confirm it. Since $r > 0$ and $\sigma^2 - 4\rho\tau$ is not a square, this equation shows that $p_0 q_1 \ne p_1 q_0$, so $(\rho, \sigma, \tau)$ and $(R, S, T)$ pertain to the same square-free integer, namely, the integer obtained by dividing $(S^2 - 4RT)(p_0 q_1 - p_1 q_0)^2 = r^2(\sigma^2 - 4\rho\tau)$ by its largest square factor. By the symmetry of the definition of composition, $(r, s, t)$ and $(R, S, T)$ must also pertain to the same square-free integer, and the proposition is proved.*

The content $c$ of a form $(r, s, t)$ is the greatest common divisor of its coefficients.
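The nine equations and Gauss's first step lend themselves to a mechanical check. The following Python sketch (function names are hypothetical) compares the two sides of equations (1) through (9) for a given form, substitution, and pair of factors, and then tests the identity obtained from (1), (2), and (5). The sample data is the composition worked in the example at the end of this essay, where (4, 6, 1) composes (16, 4, −1) and (4, 2, −1).

```python
from functools import reduce
from math import gcd


def nine_equations_hold(composed, matrix, f, phi):
    """Compare both sides of equations (1)-(9) coefficient by coefficient."""
    (R, S, T), (p, q) = composed, matrix
    (r, s, t), (rho, sigma, tau) = f, phi

    def square(i):
        # Coefficient contributed by the i-th monomial squared.
        return R * p[i] ** 2 + S * p[i] * q[i] + T * q[i] ** 2

    def cross(i, j):
        # Coefficient contributed by the product of monomials i and j.
        return (2 * R * p[i] * p[j] + S * (p[i] * q[j] + p[j] * q[i])
                + 2 * T * q[i] * q[j])

    left = [square(0), square(1), square(2), square(3),
            cross(0, 1), cross(0, 2), cross(1, 3), cross(2, 3),
            cross(0, 3) + cross(1, 2)]
    right = [r * rho, r * tau, t * rho, t * tau,
             r * sigma, s * rho, s * tau, t * sigma, s * sigma]
    return left == right


def first_step_holds(composed, matrix, f, phi):
    """(S^2 - 4RT)(p0 q1 - p1 q0)^2 == r^2 (sigma^2 - 4 rho tau)."""
    (R, S, T), (p, q) = composed, matrix
    r, (rho, sigma, tau) = f[0], phi
    minor = p[0] * q[1] - p[1] * q[0]
    return (S * S - 4 * R * T) * minor ** 2 == r * r * (sigma ** 2 - 4 * rho * tau)


def minors_gcd(matrix):
    """Greatest common divisor of the six 2 x 2 minors of the matrix."""
    p, q = matrix
    minors = [p[i] * q[j] - p[j] * q[i] for i in range(4) for j in range(i + 1, 4)]
    return reduce(gcd, minors)


composed = (4, 6, 1)
matrix = [[4, -2, -1, 0], [0, 4, 2, 1]]
f, phi = (16, 4, -1), (4, 2, -1)
```

For this data both sides of the first-step identity come to 5120, and the six minors 16, 8, 4, 0, −2, −1 have no common divisor greater than 1.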
Proposition 2. Let $(r, s, t)$ and $(\rho, \sigma, \tau)$ be forms for which $s^2 - 4rt = \sigma^2 - 4\rho\tau$ and for which $s$ and $\sigma$ are even, say $s = 2s'$ and $\sigma = 2\sigma'$. In this case, $s^2 - 4rt = \sigma^2 - 4\rho\tau = 4A$, where $A = (s')^2 - rt = (\sigma')^2 - \rho\tau$ is not a square (by assumption). Let the module $[r, s' + \sqrt{A}][\rho, \sigma' + \sqrt{A}]$ be put in canonical form, say it is $[E][F, G + \sqrt{A}]$, and let $H$ be defined to be $(G^2 - A)/F$. Then a multiple of $(F, 2G, H)$ composes $(r, s, t)$ and $(\rho, \sigma, \tau)$. The specific multiplier is determined by the rule that the content of the composed form is the product of the contents of its "factors."

(The theory of modules of hypernumbers was developed in the earlier essays only in the case in which $A$ is a positive, nonsquare integer. Its extension to the case in which $A$ is negative presents no difficulties at all. Therefore, forms with negative discriminants can be included in this proposition.)

Proof. The actual substitution that accomplishes the composition is given implicitly by the formula

(10) $\bigl(rx + (s' + \sqrt{A})y\bigr)\bigl(\rho u + (\sigma' + \sqrt{A})v\bigr) = E\bigl(FX + (G + \sqrt{A})Y\bigr)$.

* By doing a good deal more algebraic work, Gauss proves Proposition 1 (his "first consequence") also in the case $r = 0$.
Multiplication of this equation by its conjugate (the equation obtained by changing $\sqrt{A}$ to $-\sqrt{A}$) gives

$\bigl((rx + s'y)^2 - Ay^2\bigr)\bigl((\rho u + \sigma' v)^2 - Av^2\bigr) = E^2\bigl((FX + GY)^2 - AY^2\bigr)$,

or, what is the same,

$r\rho(rx^2 + sxy + ty^2)(\rho u^2 + \sigma uv + \tau v^2) = E^2F(FX^2 + 2GXY + HY^2)$.

Thus, the substitution implicitly defined by (10) transforms $FX^2 + 2GXY + HY^2$ times $E^2F/r\rho$ into the product $(rx^2 + sxy + ty^2)(\rho u^2 + \sigma uv + \tau v^2)$. This is the multiple of $(F, 2G, H)$ described in the proposition. What is to be shown is that (1) the coefficients of the substitution implicitly defined by (10) are integers, that (2) $c\gamma r\rho = E^2FC$, where $c$, $\gamma$, and $C$ are the contents of $(r, s, t)$, $(\rho, \sigma, \tau)$, and $(F, 2G, H)$ respectively, and that (3) the greatest common divisor of the six $2 \times 2$ minors of the matrix of coefficients of the substitution is 1.

Comparison of the coefficients of $\sqrt{A}$ on the two sides of (10) shows that $rxv + \rho yu + (s' + \sigma')yv = EY$, after which a comparison of the remaining terms shows that $r\rho xu + r\sigma' xv + \rho s' yu + (s'\sigma' + A)yv = EFX + G(rxv + \rho yu + (s' + \sigma')yv)$. Therefore, the matrix of coefficients of the substitution determined by (10) is

$$\begin{bmatrix} \dfrac{r\rho}{EF} & \dfrac{r(\sigma' - G)}{EF} & \dfrac{\rho(s' - G)}{EF} & \dfrac{s'\sigma' + A - G(s' + \sigma')}{EF} \\[2ex] 0 & \dfrac{r}{E} & \dfrac{\rho}{E} & \dfrac{s' + \sigma'}{E} \end{bmatrix}.$$
That the entries of this matrix are integers can be seen as follows: By definition, $[r, s' + \sqrt{A}][\rho, \sigma' + \sqrt{A}] = [E][F, G + \sqrt{A}]$. Thus, $E$, $F$, and $G$ are found by putting $[r\rho,\ r\sigma' + r\sqrt{A},\ \rho s' + \rho\sqrt{A},\ s'\sigma' + A + (s' + \sigma')\sqrt{A}]$ in canonical form. Let $\mathcal{A}$, $\mathcal{B}$, and $\mathcal{C}$ be integers for which $\mathcal{A}r + \mathcal{B}\rho + \mathcal{C}(s' + \sigma')$ is the greatest common divisor of $r$, $\rho$, and $s' + \sigma'$; call it $d$. Clearly, $d$ divides both coefficients of all four numbers in the product module $[r\rho,\ r\sigma' + r\sqrt{A},\ \rho s' + \rho\sqrt{A},\ s'\sigma' + A + (s' + \sigma')\sqrt{A}]$ with the possible exception of $s'\sigma' + A$, and $d$ divides this coefficient as well because $A \equiv (s')^2 \bmod r$, so $s'\sigma' + A \equiv s'\sigma' + (s')^2 = s'(\sigma' + s') \equiv 0 \bmod d$. Let $G_0$ be defined by the equation

$d(G_0 + \sqrt{A}) = \mathcal{A}(r\sigma' + r\sqrt{A}) + \mathcal{B}(\rho s' + \rho\sqrt{A}) + \mathcal{C}(s'\sigma' + A + (s' + \sigma')\sqrt{A})$.

Then $d(G_0 + \sqrt{A})$ is a sum of multiples of hypernumbers already in the product module, so it can be annexed to the list, and multiples of it can be subtracted from the previous entries to put the product module in the form

$[r\rho,\ r\sigma' + r\sqrt{A},\ \rho s' + \rho\sqrt{A},\ s'\sigma' + A + (s' + \sigma')\sqrt{A},\ d(G_0 + \sqrt{A})] = [r\rho,\ r\sigma' - rG_0,\ \rho s' - \rho G_0,\ s'\sigma' + A - (s' + \sigma')G_0,\ d(G_0 + \sqrt{A})]$.

This module is $[dF_0, d(G_0 + \sqrt{A})]$, where by definition $[dF_0] = [r\rho,\ r\sigma' - rG_0,\ \rho s' - \rho G_0,\ s'\sigma' + A - (s' + \sigma')G_0]$. (Note that all entries in the module on the right are divisible by $d$.) In short, $[E][F, G + \sqrt{A}] = [d][F_0, G_0 + \sqrt{A}]$. By the uniqueness of canonical form, it follows that $E = d$, $F = F_0$, and $G \equiv G_0 \bmod F$, provided it is shown that $G_0^2 \equiv A \bmod F_0$.

Let a representation of a module be called full if $\sqrt{A}$ times any hypernumber in the list can be written as a sum of integer multiples of hypernumbers in the list. If $g^2 \equiv A \bmod f$, then $[f, g + \sqrt{A}]$ is full, because $f\sqrt{A} = -gf + f(g + \sqrt{A})$ and $(g + \sqrt{A})\sqrt{A} = qf + g(g + \sqrt{A})$, where $q$ is defined by $q = (A - g^2)/f$. Thus, the representation of any module in canonical form is full. A product of full representations is full, and any representation obtained from a full representation by adding one hypernumber in the list to another or subtracting one from another is full, and the same is true for the operations of rearranging or of annexing to or deleting from the list a zero entry. Therefore, $[dF_0, dG_0 + d\sqrt{A}]$ is full, which implies that $(dG_0 + d\sqrt{A})\sqrt{A} = m \cdot dF_0 + n \cdot (dG_0 + d\sqrt{A})$ for integers $m$ and $n$, which implies $dG_0 = nd$ and $dA = mdF_0 + ndG_0 = mdF_0 + dG_0^2$, so $A \equiv G_0^2 \bmod F_0$, as was to be shown.

The identity $E = d$ shows that the entries in the last row of the substitution matrix are integers. The identity $EF = dF_0$ shows that $[EF] = [r\rho,\ r\sigma' - rG_0,\ \rho s' - \rho G_0,\ s'\sigma' + A - (s' + \sigma')G_0]$ and therefore shows, because $dG_0 \equiv dG \bmod dF$, that the entries in the first row of the substitution matrix are integers.

By direct computation, the six $2 \times 2$ minors of this matrix of coefficients are seen to be $\dfrac{r\rho}{E^2F}$ times $r$, $\rho$, $s' + \sigma'$, $\sigma' - s'$, $\tau$, and $t$. Thus, the greatest common divisor of the six minors is 1 if and only if $[E^2F] = [r\rho][r, \rho, s' + \sigma', \sigma' - s', \tau, t]$. That this is the case can be proved as follows: It was seen above that $[EF] = [dF_0]$ is the greatest common divisor of $r\rho$, $r\sigma' - rG_0$, $\rho s' - \rho G_0$, and $s'\sigma' + A - (s' + \sigma')G_0$, so what is to be shown is that

$[E][r\rho,\ r\sigma' - rG_0,\ \rho s' - \rho G_0,\ s'\sigma' + A - (s' + \sigma')G_0] = [r\rho][r, \rho, s' + \sigma', \sigma' - s', \tau, t]$.
The first three terms on the right ($r^2\rho$, $r\rho^2$, and $r\rho(s' + \sigma')$) are all divisible by $Er\rho$ and are therefore zero modulo the module on the left. That the remaining terms on the right are zero modulo the module on the left follows from the identities

$r\rho(\sigma' - s') = \dfrac{\rho}{E} \cdot E(r\sigma' - rG_0) - \dfrac{r}{E} \cdot E(\rho s' - \rho G_0)$,

$r\rho\tau = \dfrac{\sigma' + s'}{E} \cdot E(r\sigma' - rG_0) - \dfrac{r}{E} \cdot E(s'\sigma' + A - (s' + \sigma')G_0)$,

$r\rho t = \dfrac{\sigma' + s'}{E} \cdot E(\rho s' - \rho G_0) - \dfrac{\rho}{E} \cdot E(s'\sigma' + A - (s' + \sigma')G_0)$.
On the other hand, the four terms in the module on the left are zero modulo the module on the right by virtue of

$Er\rho = \mathcal{A}r^2\rho + \mathcal{B}r\rho^2 + \mathcal{C}r\rho(s' + \sigma')$,
$E(\sigma' - G_0) = \mathcal{B}\rho(\sigma' - s') + \mathcal{C}\rho\tau$,
$E(s' - G_0) = \mathcal{A}r(s' - \sigma') + \mathcal{C}rt$,
$E(s'\sigma' + A - (s' + \sigma')G_0) = -\mathcal{A}r\rho\tau - \mathcal{B}r\rho t$.

(The last three equations are found by eliminating one of $\mathcal{A}$, $\mathcal{B}$, and $\mathcal{C}$ from $E = \mathcal{A}r + \mathcal{B}\rho + \mathcal{C}(s' + \sigma')$ and $EG_0 = \mathcal{A}r\sigma' + \mathcal{B}\rho s' + \mathcal{C}(s'\sigma' + A)$.)

Only the equation $c\gamma r\rho = E^2FC$ of step (2) remains to be proved. As was shown in the proof of Theorem 3 of Essay 3.4, if $[f, g + \sqrt{A}]$ is in canonical form, then $[f, g + \sqrt{A}][f, -g + \sqrt{A}] = [f][d, g + \sqrt{A}]$, where $[d] = [f, 2g, (g^2 - A)/f]$. Thus, since $[c] = [r, 2s', t]$, the product of $[r, s' + \sqrt{A}]$ with its conjugate is $[r][c, s' + \sqrt{A}]$. In the same way, the product of $[\rho, \sigma' + \sqrt{A}]$ with its conjugate is $[\rho][\gamma, \sigma' + \sqrt{A}]$, and the product of $[F, G + \sqrt{A}]$ with its conjugate is $[F][C, G + \sqrt{A}]$. Therefore, multiplication of $[r, s' + \sqrt{A}][\rho, \sigma' + \sqrt{A}] = [E][F, G + \sqrt{A}]$ by its conjugate gives the equation $[r\rho][c, s' + \sqrt{A}][\gamma, \sigma' + \sqrt{A}] = [E^2F][C, G + \sqrt{A}]$. Since $s' \equiv -s' \bmod c$, $\sigma' \equiv -\sigma' \bmod \gamma$, and $G \equiv -G \bmod C$, all modules in this last equation are self-conjugate, so the product of this equation with its conjugate is simply $[r^2\rho^2][c][c, s' + \sqrt{A}][\gamma][\gamma, \sigma' + \sqrt{A}] = [E^4F^2][C][C, G + \sqrt{A}]$. Replacement of $[r\rho][c, s' + \sqrt{A}][\gamma, \sigma' + \sqrt{A}]$ with $[E^2F][C, G + \sqrt{A}]$ on the left side of this equation gives $[r\rho c\gamma E^2F][C, G + \sqrt{A}] = [E^4F^2C][C, G + \sqrt{A}]$. Reduction of $G$ mod $C$ puts both sides in canonical form, so $r\rho c\gamma E^2F = E^4F^2C$, from which the desired equation $r\rho c\gamma = E^2FC$ follows.

Proposition 3. Given two forms pertaining to the same square-free integer, construct a form that composes them.
Proof. Let $(r, s, t)$ and $(\rho, \sigma, \tau)$ be the given forms. Because they pertain to the same square-free integer, positive integers $m$ and $\mu$ can be chosen to make $m^2(s^2 - 4rt) = \mu^2(\sigma^2 - 4\rho\tau)$. Moreover, because $m$ and $\mu$ can be doubled, if necessary, one can assume without loss of generality that $ms$ and $\mu\sigma$ are even. Proposition 2 then constructs a form, call it $(F, 2G, H)$, that composes $(mr, ms, mt)$ and $(\mu\rho, \mu\sigma, \mu\tau)$ whose content is the product of the contents of $(mr, ms, mt)$ and $(\mu\rho, \mu\sigma, \mu\tau)$. Therefore, $F$, $2G$, and $H$ are all divisible by $m\mu$, and the same substitution that gives $(F, 2G, H)$ as a composition of $(mr, ms, mt)$ and $(\mu\rho, \mu\sigma, \mu\tau)$ gives $\left(\dfrac{F}{m\mu}, \dfrac{2G}{m\mu}, \dfrac{H}{m\mu}\right)$ as a composition of $(r, s, t)$ and $(\rho, \sigma, \tau)$, as was to be constructed.

It remains to find all compositions of two given forms that pertain to the same square-free number. The compositions constructed by Proposition 2, and therefore those constructed by Proposition 3, contain both forms directly in the sense Gauss defines that term, which is to say that the first two of the six minors, the minors $p_0q_1 - p_1q_0$ and $p_0q_2 - p_2q_0$, are both positive. (These minors are the positive integers $r^2\rho/E^2F$ and $r\rho^2/E^2F$.) More generally, the first form $(r, s, t)$ enters directly if $p_0q_2 - p_2q_0$ is positive, inversely if it is negative. (As was seen in the proof of Proposition 1, it cannot be zero.) Whether $(\rho, \sigma, \tau)$ enters directly or inversely is similarly determined by the sign of $p_0q_1 - p_1q_0$. Since reversing the sign of $y$ changes $(r, s, t)$ to $(r, -s, t)$
and changes the signs in the last two columns of the matrix of coefficients of the substitution while leaving everything else unchanged, it changes a composition in which $(r, s, t)$ enters directly into one in which it enters inversely and vice versa. Similarly, a composition in which $(\rho, \sigma, \tau)$ enters directly can be changed into one in which it enters inversely and vice versa. In this way, all compositions of $(r, s, t)$ and $(\rho, \sigma, \tau)$ can be found by finding one composition of each of the four combinations of $(r, \pm s, t)$ and $(\rho, \pm\sigma, \tau)$ in which both forms enter directly and applying the final proposition:

Proposition 4. Given one composition of a pair of forms in which both forms enter directly, every other such composition is obtained by a unimodular change of variables.

Proof. Let the given forms be $(r, s, t)$ and $(\rho, \sigma, \tau)$ and let the given composition $RX^2 + SXY + TY^2$ be effected by the substitution whose matrix of coefficients is

$$\begin{bmatrix} p_0 & p_1 & p_2 & p_3 \\ q_0 & q_1 & q_2 & q_3 \end{bmatrix}.$$

Let $\Delta_{ij}$ be the six $2 \times 2$ minors of this matrix, where $0 \le i < j \le 3$ and $\Delta_{ij} = p_iq_j - p_jq_i$. It is given that $\Delta_{01}$ and $\Delta_{02}$ are positive. It is to be shown that any other composition matrix for the same two forms in which $\Delta_{01}$ and $\Delta_{02}$ are both positive can be obtained by multiplying the given matrix on the left by a $2 \times 2$ matrix of integers whose determinant is 1.

Every composition is accomplished, as was seen in the proof of Proposition 3, by a substitution that accomplishes the composition of two forms with the same discriminant. In this case, the formulas (1)-(9) above imply, as Gauss proves in §235, that $\Delta_{01}^2 = r^2$, $\Delta_{01}(\Delta_{03} - \Delta_{12}) = rs$, $\Delta_{01}\Delta_{23} = rt$, $\Delta_{02}^2 = \rho^2$, $\Delta_{02}(\Delta_{03} + \Delta_{12}) = \rho\sigma$, and $\Delta_{02}\Delta_{13} = \rho\tau$. (These are Gauss's equations (12)-(14) and (18)-(20) when use is made of his identity $\Delta^2 = dd'$.) Since it is given that $\Delta_{01} > 0$, the first equation determines $\Delta_{01}$, after which the next two equations determine $\Delta_{03} - \Delta_{12}$ and $\Delta_{23}$. Similarly, the remaining three equations determine $\Delta_{02}$, $\Delta_{03} + \Delta_{12}$, and $\Delta_{13}$. In short, in any two compositions of the same forms (in which both forms enter directly), the six $2 \times 2$ minors of the matrix of coefficients of the substitution are identical. Then, as the elementary lemma Gauss proves in §234 shows, one matrix can be obtained from the other by multiplication on the left by a matrix of integers with determinant 1, as was to be shown.

Example: Compose $(16, 4, -1)$ and $(4, 2, -1)$. The discriminants of these forms are 80 and 20, respectively, so both pertain to 5. Multiply the second by 2 so that the problem is to compose $(16, 4, -1)$ and $(8, 4, -2)$ with discriminant 80. By Proposition 2, the composed form is computed by putting the product $[16, 2 + \sqrt{20}][8, 2 + \sqrt{20}]$ in canonical form.
The computation of this canonical form is $[128,\ 32 + 16\sqrt{20},\ 16 + 8\sqrt{20},\ 24 + 4\sqrt{20}] = [4][32,\ 4 + 2\sqrt{20},\ 6 + \sqrt{20}] = [4][32,\ 24,\ 6 + \sqrt{20}] = [4][8,\ 6 + \sqrt{20}]$. Thus, $E = 4$, $F = 8$, $G = 6$, so $H = (6^2 - 20)/8 = 2$ and some multiple of $(8, 12, 2)$ composes $(16, 4, -1)$ and $(8, 4, -2)$. The product of the contents 1 and 2 of the "factors" is the content 2 of $(8, 12, 2)$, so this form itself composes $(16, 4, -1)$ and $(8, 4, -2)$. Division by 2 then implies that $(4, 6, 1)$ composes the given forms $(16, 4, -1)$ and $(4, 2, -1)$. The matrix of coefficients that describes the substitution that accomplishes the composition is

$$\begin{bmatrix} \dfrac{r\rho}{EF} & \dfrac{r(\sigma' - G)}{EF} & \dfrac{\rho(s' - G)}{EF} & \dfrac{s'\sigma' + A - G(s' + \sigma')}{EF} \\[2ex] 0 & \dfrac{r}{E} & \dfrac{\rho}{E} & \dfrac{s' + \sigma'}{E} \end{bmatrix} = \begin{bmatrix} 4 & -2 & -1 & 0 \\ 0 & 4 & 2 & 1 \end{bmatrix},$$

as can be found either by evaluating the matrix entries or by solving the equation

$\bigl(16x + (2 + \sqrt{20})y\bigr)\bigl(8u + (2 + \sqrt{20})v\bigr) = 4\bigl(8X + (6 + \sqrt{20})Y\bigr)$

for $X$ and $Y$ in terms of $x$, $y$, $u$, and $v$. This is the third example of a composition given in Essay 3.6.

In the composition just given, both forms enter directly. The same method gives

$$\begin{bmatrix} 8 & 2 & -1 & 1 \\ 0 & 2 & 1 & 0 \end{bmatrix}$$

as a matrix that describes a substitution giving $(1, 0, -5)$ as a composition of $(16, -4, -1)$ and $(4, 2, -1)$ in which both forms enter directly. Reversing the sign of $y$ then gives the substitution

$$\begin{bmatrix} 8 & 2 & 1 & -1 \\ 0 & 2 & -1 & 0 \end{bmatrix},$$

which gives $(1, 0, -5)$ as a composition of $(16, 4, -1)$ and $(4, 2, -1)$ in which the first form enters inversely and the second enters directly. To find all compositions of $(16, 4, -1)$ and $(4, 2, -1)$, one would also find a composition in which the first entered directly and the second entered inversely, and a composition in which both entered inversely. Every composition of the two forms is then obtained from one of these four by a unimodular change of variables.
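The construction of Propositions 2 and 3 is explicit enough to be sketched as a short program. The Python below is a minimal sketch, not a definitive implementation: the function names are hypothetical, elements $a + b\sqrt{A}$ are stored as pairs (a, b), the first coefficients of both forms are assumed positive as in the essay, and the canonical form is computed by following the steps of the proof of Proposition 2 (find $d$ and $G_0$ from a linear combination, then take a greatest common divisor to find $dF_0$).

```python
from functools import reduce
from math import gcd


def egcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b) and g >= 0."""
    if b == 0:
        return (abs(a), 1 if a >= 0 else -1, 0)
    g, x, y = egcd(b, a % b)
    return (g, y, x - (a // b) * y)


def canonical_product(r, s1, rho, sigma1, A):
    """Write [r, s1 + sqrt(A)][rho, sigma1 + sqrt(A)] as [E][F, G + sqrt(A)]."""
    gens = [(r * rho, 0), (r * sigma1, r), (rho * s1, rho),
            (s1 * sigma1 + A, s1 + sigma1)]
    g1, x, y = egcd(r, rho)
    d, u, v = egcd(g1, s1 + sigma1)
    cA, cB, cC = u * x, u * y, v          # cA*r + cB*rho + cC*(s1+sigma1) == d
    G0 = (cA * r * sigma1 + cB * rho * s1 + cC * (s1 * sigma1 + A)) // d
    dF0 = reduce(gcd, (a - b * G0 for a, b in gens))
    F = dF0 // d
    return d, F, G0 % F


def compose_equal_discriminants(f, phi):
    """Proposition 2: equal discriminants 4A, even middle coefficients."""
    (r, s, t), (rho, sigma, tau) = f, phi
    s1, sigma1 = s // 2, sigma // 2
    A = s1 * s1 - r * t
    if s % 2 or sigma % 2 or A != sigma1 * sigma1 - rho * tau:
        raise ValueError("hypotheses of Proposition 2 are not satisfied")
    E, F, G = canonical_product(r, s1, rho, sigma1, A)
    H = (G * G - A) // F
    form = tuple(c * E * E * F // (r * rho) for c in (F, 2 * G, H))
    EF = E * F
    matrix = [[r * rho // EF, r * (sigma1 - G) // EF, rho * (s1 - G) // EF,
               (s1 * sigma1 + A - G * (s1 + sigma1)) // EF],
              [0, r // E, rho // E, (s1 + sigma1) // E]]
    return form, matrix


def square_part(n):
    """Largest k > 0 for which k*k divides the nonzero integer n."""
    n, k, p = abs(n), 1, 2
    while p * p <= n:
        while n % (p * p) == 0:
            n //= p * p
            k *= p
        p += 1
    return k


def compose(f, phi):
    """Proposition 3: scale by m and mu, apply Proposition 2, divide by m*mu."""
    D1 = f[1] ** 2 - 4 * f[0] * f[2]
    D2 = phi[1] ** 2 - 4 * phi[0] * phi[2]
    k1, k2 = square_part(D1), square_part(D2)
    if D1 // (k1 * k1) != D2 // (k2 * k2):
        raise ValueError("the forms pertain to different square-free integers")
    g = gcd(k1, k2)
    m, mu = k2 // g, k1 // g              # m^2 * D1 == mu^2 * D2
    if (m * f[1]) % 2 or (mu * phi[1]) % 2:
        m, mu = 2 * m, 2 * mu
    big, matrix = compose_equal_discriminants(tuple(m * c for c in f),
                                              tuple(mu * c for c in phi))
    return tuple(c // (m * mu) for c in big), matrix
```

Applied to the forms of the example, `compose((16, 4, -1), (4, 2, -1))` reproduces the form (4, 6, 1) and the substitution matrix found above, and `compose((16, -4, -1), (4, 2, -1))` reproduces (1, 0, −5) with the matrix whose rows are 8, 2, −1, 1 and 0, 2, 1, 0.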
The Genus of an Algebraic Curve
Essay 4.1 Abel's Memoir

It appears to me that if one wants to make progress in the study of mathematics one should study the masters and not the pupils. (Niels Henrik Abel, quoted from an unpublished source in [52, p. 138])

Niels Henrik Abel's submission of his Mémoire sur une propriété générale d'une classe très-étendue de fonctions transcendantes to the Paris Academy in October 1826 should have been a high point in the history of mathematics. Instead, it was a low point in the history of the Paris Academy. Abel, lonely and unknown, was temporarily in Paris thanks to a travel grant from the government of Norway, and he hoped to win recognition in the city that was then the mathematical capital of Europe. Unfortunately, he naively believed that recognition could be won by submitting a work of undeniable genius to Europe's leading mathematical institution. He did not understand that works of undeniable genius are inherently difficult to read, even for the most learned readers, and he did not understand that the members of Europe's leading mathematical institution would not devote the needed time and thought to the work of a 24-year-old mathematician who was unknown to them and who came from a country they had scarcely heard of.

Of course, one of the famous men of the Academy might by some lucky accident have taken notice of the memoir long enough to realize that it was worth pursuing, but none did. In 1837, eight years after Abel's untimely death, the Norwegian scholars charged with publishing Abel's collected works applied to the Academy via the Norwegian government and its diplomatic representatives in Paris for a copy of the memoir (Abel had apparently not kept a copy for himself), but the effort did not succeed, and the memoir is absent from the first publication of Abel's works in 1839. Finally, the Academy did publish the memoir in 1841, making it available to eager readers like C. G. J. Jacobi for the first time.
Fig. 4.1. Abel.
In the two and a half years Abel lived after submitting the memoir, he enjoyed a growing reputation based on his publications in Crelle's Journal, but he patiently awaited publication of the Paris memoir, believing it would ensure his fame. He even alluded to the memoir in one of his published works, piquing the curiosity and indignation of Jacobi, who read the allusion too late to write to Abel about it. That Abel's memoir remained unpublished in his lifetime* deprived him of the challenge and encouragement of readers' responses and therefore probably deprived mathematics of important further work. (Incredibly, the tragedy was repeated only three years after Abel died when Galois went to an early grave ignored by the same Paris Academy.)

Abel's memoir deals with integrals of algebraic differentials, a topic that is not at all easy to understand from the point of view of naive geometry and integration along a curve. Because an algebraic differential like dx/^/l — x^ is "many-valued" and because, moreover, an integral of such a differential depends on choosing both a path and a constant of integration, modern readers

* The last work Abel published was a brief note that contained a theorem from the memoir. Abel's biographer Oystein Ore says that the theorem of that last brief note is the theorem of the memoir [52, p. 219], but it is far short of the theorem in the introduction of the memoir that I am discussing in this essay and that I take to be, in Ore's phrase, "the main theorem from the Paris memoir."
may well despair of understanding even what Abel means by the sum of a finite number of integrals of a given algebraic differential, much less why questions about such sums might be interesting or significant. But there is another way to describe the main idea that makes better sense to modern readers and explains the main theorem of the memoir more clearly.

Abel's "algebraic differentials" are differentials of the form $f(x, y)\,dx$, where $f$ is a rational function of two variables and where $y$ is an "algebraic function" of $x$. The notion of an "algebraic function" has become a source of unease for modern readers because an algebraic function is normally "many-valued" and the property of being single-valued is the essence of the set-theoretic notion of a "function." But of course there are modern ways to deal with algebraic functions. One is to give the functions their own special domain; this is the source of the theory of Riemann surfaces. The other is to regard an "algebraic function" not as a function at all, but simply as an element of an algebraic function field, which is to say an algebraic field whose transcendence degree is positive (see Essay 2.2). The subject of Abel's memoir is algebraic functions of one variable, which is to say, in the terminology of Essay 2.2, elements of an algebraic field of transcendence degree 1. In other words, Abel is dealing with the field of rational functions on an algebraic curve defined over the rationals.

The concept that I propose as an aid to understanding Abel's memoir is that of an algebraic variation of a set of points on an algebraic curve. Abel describes such a variation as the solutions of a pair of equations

$\chi(x, y) = 0, \qquad \theta(x, y, a, a', a'', \ldots) = 0$,

where $\chi(x, y)$ is the irreducible polynomial with integer coefficients, monic in $y$, that defines the algebraic curve under discussion, and $\theta(x, y, a)$ is an auxiliary polynomial in $x$ and $y$ whose coefficients $a$, $a'$, $a''$, ... are indeterminates. For each fixed value of the coefficients $a$, $a'$, $a''$, ... the pair of equations determines a set of points $\{(x_k, y_k)\}$ on the curve $\chi = 0$, and as the coefficients vary, these points vary along the curve. A variation of points on the curve that can be generated in this way is an algebraic variation.

Somewhat more precisely, let $C^N$ denote the set of all $N$-tuples of points on the curve $C$ defined by $\chi(x, y) = 0$. An algebraic variation of a point of $C^N$ is determined by choosing a $\theta(x, y, a)$ of the form $\theta(x, y, a) = \sum a_{ij}x^iy^j$, where the exponent pairs $(i, j)$ are in some specified finite set. To say that $\theta(x, y, a) = 0$ at a particular point of the curve $\chi(x, y) = 0$ means that the parameters $a_{ij}$ in $\theta$ satisfy a certain (linear) condition. Choose values for the $a_{ij}$ that make $\theta = 0$ at all $N$ of the given points. There will be other points of $\chi(x, y) = 0$ where $\theta = 0$ for these values of $a_{ij}$, say there are $M$ of them. An allowable variation of the $N$ given points is one that results when the $a_{ij}$ are allowed to vary from their fixed values in such a way that the $M$ additional zeros all remain at zero while the $N$ original ones are allowed to move. For each point of $C^N$, the points of $C^N$ that can be reached from it by a sequence of algebraic variations lie on an algebraic subvariety of $C^N$.
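A very small instance of this setup can be sketched in Python, under assumptions that go beyond the text: the curve is taken to be of the special form $y^2 = x^3 + g_2x + g_3$, the auxiliary polynomial is the linear one $\theta = ax + by + c$, and $N = 2$, so that $M = 1$. The function names are hypothetical. The two given points impose two linear conditions on $(a, b, c)$, which determine the line through them, and the one additional zero is then found from the fact that the three $x$-coordinates of the zeros sum to the square of the slope of the line.

```python
from fractions import Fraction


def on_curve(point, g2, g3):
    """True if the point satisfies y^2 = x^3 + g2*x + g3."""
    x, y = point
    return y * y == x ** 3 + g2 * x + g3


def line_through(P, Q):
    """Coefficients (a, b, c) for which a*x + b*y + c vanishes at P and Q."""
    (x1, y1), (x2, y2) = P, Q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)


def additional_zero(P, Q, g2, g3):
    """The one remaining zero of theta on the curve (requires x1 != x2)."""
    a, b, c = line_through(P, Q)
    if b == 0:
        raise ValueError("vertical line: the remaining zero is at infinity")
    slope = Fraction(-a) / Fraction(b)
    # Substituting y = slope*x + const in the cubic gives a polynomial in x
    # whose three roots sum to slope^2.
    x3 = slope * slope - P[0] - Q[0]
    y3 = slope * (x3 - P[0]) + P[1]
    return (x3, y3)


# Illustrative data: the curve y^2 = x^3 + 1 and two of its points.
P, Q = (0, 1), (2, 3)
S = additional_zero(P, Q, 0, 1)
```

Here the line is $y = x + 1$ and the additional zero is the point $(-1, 0)$. Moving $P$ and $Q$ along the curve while holding that additional zero fixed is the kind of variation the text calls allowable.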
Abel probably had some geometric conception of such variations of sets of points on $\chi(x, y) = 0$, but exactly what it might have been can only be guessed. Today one would never discuss intersection points without first specifying an algebraically closed ground field, but Abel would probably not have thought of curves as ordered pairs of complex numbers in anything like the modern way. More likely, he would have just imagined sets of points of intersection of an ordinary plane curve with an auxiliary curve and considered constraints on variations of the intersection points produced by varying the auxiliary curve. In modern terms, the number of constraints on the variation of $N$ points of a curve is the codimension of the subvariety of algebraic variations within the $N$-dimensional variety $C^N$ of all variations. This codimension is very nearly the same as the genus of the curve, and whatever his geometric conception of the problem setting may have been, it is this number that Abel successfully investigated.

In terms somewhat closer to Abel's, if $f(x, y)\,dx$ is an algebraic differential (which is to say, a rational function $f$ on the curve $C$ times the symbol $dx$), and if an infinitesimal algebraic variation of the points $\{(x_k, y_k)\}$ is performed, Abel asserts that the resulting variation of $\sum f(x_k, y_k)\,dx_k$ is a differential that can be expressed rationally in terms of the parameters $a_{ij}$ and their differentials.* Thus, if the point $(P_1, P_2, \ldots, P_N)$ can be moved to the point $(Q_1, Q_2, \ldots, Q_N)$ of $C^N$ by an algebraic variation, then

$$\int_{P_1}^{Q_1} f(x, y)\,dx + \int_{P_2}^{Q_2} f(x, y)\,dx + \cdots + \int_{P_N}^{Q_N} f(x, y)\,dx$$

is equal to the integral of a rational differential in the $a_{ij}$ and can therefore be expressed in terms of elementary functions (logarithms and trigonometric functions, as well as rational functions) of the $a_{ij}$.

Now let $g$ be the codimension of the subvarieties of algebraic variations. Then $N - g$ of the points $(P_1, P_2, \ldots, P_N)$ can be moved in arbitrary ways by an algebraic variation, provided the remaining $g$ points move in such a way as to keep the new $(P_1', P_2', \ldots, P_N')$ on the same subvariety. Thus, if $O$ is a chosen base point on the curve $C$, there is an algebraic variation, or at least a succession of algebraic variations, of a point $(P_1, P_2, \ldots, P_N)$ of $C^N$ that connects it to a point of $C^N$ of the form $(O, O, \ldots, O, Q_1, Q_2, \ldots, Q_g)$. Then

$$\int_O^{P_1} f(x, y)\,dx + \int_O^{P_2} f(x, y)\,dx + \cdots + \int_O^{P_{N-g}} f(x, y)\,dx + \int_{Q_1}^{P_{N-g+1}} f(x, y)\,dx + \cdots + \int_{Q_g}^{P_N} f(x, y)\,dx$$

can be expressed in terms of elementary functions of the parameters used in the variation, so that when the $g$ integrals from $O$ to $Q_i$ are added, one obtains

* This, in essence, is the theorem of Abel's last published note that Ore mistook for the main theorem of the memoir. See the note above.
$$\int_O^{P_1} f(x, y)\,dx + \int_O^{P_2} f(x, y)\,dx + \cdots + \int_O^{P_N} f(x, y)\,dx = \int_O^{Q_1} f(x, y)\,dx + \cdots + \int_O^{Q_g} f(x, y)\,dx + E,$$

where $E$ can be expressed in terms of elementary functions of the parameters of the variation. (The paths of integration are, of course, the ones determined by the algebraic variation from $(O, O, \ldots, O, Q_1, Q_2, \ldots, Q_g)$ to $(P_1, P_2, \ldots, P_N)$ that is assumed.) Thus, disregarding elementary functions, a sum of any number $N$ of integrals of $f(x, y)\,dx$ can be expressed as a sum of just $g$ integrals, where $g$ depends only on the differential $f(x, y)\,dx$ being integrated (and in fact depends only on the algebraic curve $\chi(x, y) = 0$ on which the differential has its existence), not on $N$. This is the main theorem of Abel's Paris memoir. In Abel's own words, "The number of these conditions [the number $g$ above] does not depend at all on the number of summands, but only on the nature of the particular integrands that one considers. Thus, for example, for an elliptic integrand this number is 1; for an integrand that contains no irrationalities but a radical of the second degree, under which the variable has degree at most six, the number of necessary conditions is 2, and so forth."*

I have said above that the crucial number of conditions $g$ is "roughly" the genus of the curve $C$. Abel's statement that $g$ is 1 in the elliptic case, 2 in case $y^2$ is a polynomial of degree 5 or 6 in $x$, and so forth, of course suggests that $g$ is connected to the genus and is the genus in many cases. It fails to be the genus only because Abel bases his variation of the points on the variation of parameters $a_{ij}$ in functions of the form $\theta(x, y, a) = \sum a_{ij}x^iy^j$, which is not quite general enough and in some cases gives too large a value for $g$ because it omits certain variations that deserve to be called algebraic variations. When $\theta$ is instead taken to have the form $\theta(x, y, a) = \sum a_i\theta_i(x, y)$ where the "functions" $\theta_i(x, y)$ are integral over $x$ (which may reduce $g$ because it may include more variations), $g$ becomes the actual genus, as will be shown in Essay 4.6.
In an effort to clarify Abel's statement, I have taken some liberties with the translation. His actual words were, "Le nombre de ces relations ne depend nullement du nombre des fonctions, mais seulement de la nature des fonctions particuliere qu'on considere. Ainsi, par exemple, pour une fonction elliptique ce nombre est 1; pour une fonction dont la derivee ne contient d'autres irrationalites qu'un radical du second degre, sous lequel la variable ne passe pas le cinquieme ou sixieme degre, le nombre des relations necessaires est 2, et ainsi de suite." His "fonctions" are the integrals above, and his "derivees" are the integrands. What he is calling "une fonction elliptique" is what is today called an elliptic integral.
4 The Genus of an Algebraic Curve
Essay 4.2 Euler's Addition Formula

Man sollte weniger danach streben, die Grenzen der mathematischen Wissenschaften zu erweitern, als vielmehr danach, den bereits vorhandenen Stoff aus umfassenderen Gesichtspunkten zu betrachten. (One should strive less to extend the boundaries of the mathematical sciences and much more to treat the already available material from more comprehensive viewpoints.)
E. Study [24, p. 140]

Euler stated his addition formula for elliptic integrals in a variety of ways, none of which shed enough light on the formula to suggest a generalization to other kinds of integrands. The great achievement of Abel's Paris memoir was to describe Euler's formula as the case $g = 1$ of a more general phenomenon.

It is customary today to describe an elliptic curve by a formula of the form $y^2 = x^3 + g_2 x + g_3$, in which $g_2$ and $g_3$ are rational numbers, called its "Weierstrass normal form." When the curve is written in this form, the "addition" or "group law" on the curve is described as follows: Let $P$ and $Q$ be given points on the curve, and let $S$ be the third point in which the line through $P$ and $Q$ intersects the curve. (The curve, being a cubic, intersects a line in the $xy$-plane in three points when they are counted in the right way.) The sum $R = P + Q$ of $P$ and $Q$ is defined to be the third point in which the line through $S$ and the point at infinity intersects the curve. (The lines through the point at infinity are the lines $x = \text{constant}$, which are the lines that intersect the curve in only two finite points, so $R$ is the point whose $x$-coordinate is the same as that of $S$ and whose $y$-coordinate is the $y$-coordinate of $S$ with the sign reversed.)

This construction is connected to the theorem of Abel's memoir in the following way: Let $\theta(x, y, a, b, c) = ax + by + c$.
Algebraic variations of the given points $P$ and $Q$ are obtained by choosing initial values for $a$, $b$, and $c$ that make $\theta$ zero at $P$ and $Q$ and allowing $a$, $b$, and $c$ to vary in such a way that the third point of intersection of the line with the curve, call it $S$, remains fixed. In other words, the algebraic variations of the pair $(P, Q)$ are the pairs of points $(P', Q')$ on the curve for which $P'$, $Q'$, and $S$ are collinear. In particular, if a point $O$ of the curve is chosen as the origin (or the identity of the group law), then the algebraic variation of $(P, Q)$ that carries $P$ to $O$ carries $Q$ to the third point $R$ in which the line through $O$ and $S$ intersects the curve. When $O$ is chosen to be the point at infinity, $R$ is the point $P + Q$ described above.

Abel's point of view explains why this "addition" is useful and shows that it is intrinsic to the curve. According to Abel, for any rational function $f(x,y)$ of $x$ and $y = \sqrt{x^3 + g_2 x + g_3}$, the sum $\int_P^{P'} f(x,y)\,dx + \int_Q^{Q'} f(x,y)\,dx$ can be expressed in terms of integrals of rational functions, or, what is the same, $\int_O^P f(x,y)\,dx + \int_O^Q f(x,y)\,dx = \int_O^R f(x,y)\,dx$ plus an integral of rational functions. In particular, in the special case $f(x,y) = 1/y$, in which the integrand is holomorphic in the sense explained in Essay 4.6, the formula is
Fig. 4.2. Euler.
Fig. 4.3. The operation of addition on an elliptic curve.
$$\int_O^P \frac{dx}{y} + \int_O^Q \frac{dx}{y} = \int_O^R \frac{dx}{y},$$
which is one form of Euler's addition theorem. More precisely, these integrals depend on the paths of integration, and for the formula to hold, these paths must be chosen correctly. Thus, the sum of two integrals of dx/y can be expressed as just one integral of the same integrand, provided the limits of integration satisfy a certain algebraic relation and the paths of integration are chosen correctly. Once it is known that two such integrals can be reduced to one, it follows that any number of such integrals can similarly be reduced to one. Abel's
construction describes this all at once, rather than as a step-by-step reduction. An algebraic variation of a set of points $(P_1, P_2, \ldots, P_N)$ on the curve is described by a function $\theta(x, y, a)$ of the form $\sum a_{ij} x^i y^j$ for some selection of exponent pairs $(i, j)$. Since $y^2$ is a polynomial in $x$, it is natural to assume that all of the chosen values of $j$ are less than 2, so that $\theta$ takes the form $\phi_1(x) + \phi_2(x)y$, where $\phi_1$ and $\phi_2$ are polynomials in $x$ containing terms of certain specified degrees whose coefficients are indeterminates $a_{i1}$ and $a_{i2}$. The procedure is to give $\theta$ enough terms that values can be chosen for the parameters $a_{ij}$ that make $\theta$ zero at the given points $P_1, P_2, \ldots, P_N$, and then to allow the parameters to vary from their chosen values in such a way that the value of $\theta$ remains at zero for the zeros of $\theta$ other than $P_1, P_2, \ldots, P_N$, while the $N$ given zeros of $\theta$ are allowed to vary. The main question is: How many conditions are imposed on the variation of the $N$ points along the curve by the requirement that the variation be describable in this way? That the answer is 1 (that is, that the genus of this curve is 1) can be seen in the following way.

The crucial step is to determine the number of zeros of $\theta(x, y) = \phi_1(x) + \phi_2(x)y$ on the curve. A simple way to do this is to make use of the idea that a rational function on an algebraic curve assumes each value the same number of times, when they are counted properly, and in particular that the number of zeros is equal to the number of poles. The function $x$ assumes every value twice, and in particular, it has a double pole at the one point where $x = \infty$. The function $y$, on the other hand, assumes every value three times and has a triple pole at the one point where $x = \infty$.
(These statements can be justified in various ways, but since they are used here only as heuristic devices, no formal justification will be given.) It follows that a polynomial $\phi(x)$ of degree $\nu$ has $2\nu$ poles, all of them at the point where $x = \infty$, and that, when $\theta = \phi_1(x) + \phi_2(x)y$ is taken with $\deg \phi_1 = \nu$ and $\deg \phi_2 = \nu - 2$ (so that $\theta$ has $2\nu$ coefficients and $2\nu$ poles) and $2\nu > N$, the $N$ conditions on the $2\nu$ coefficients of $\theta$ imposed by the requirement that $\theta$ be zero at $N$ given points can be satisfied by some choice of $\theta$. Since $\theta$ has $2\nu$ zeros, it has $2\nu - N$ zeros other than the $N$ required ones, and the algebraic variations of the $N$ given points are found by varying the $2\nu$ coefficients of $\theta$ in such a way that these $2\nu - N$ extra zeros remain as zeros. The $2\nu - N$ conditions stating that $\theta$ must have these zeros are independent, so the coefficients of $\theta$ then vary with $2\nu - (2\nu - N) = N$ degrees of freedom. However, multiplication of $\theta$ by a constant does not change its zeros, so varying the coefficients of $\theta$ with $N$ degrees of freedom varies its zeros with only $N - 1$ degrees of freedom. In short, an algebraic variation of $N$ given points moves them in only $N - 1$ different directions, which is to say that algebraic variations satisfy one constraint in this case. Otherwise stated, algebraic variations describe subvarieties of codimension 1 in $C^N$.

This description of the phenomenon is in no way tied to the Weierstrass normal form. Gauss alludes indirectly to the elliptic curve $y^2 = 1 - x^4$ in the
introduction to Section 7 of the Disquisitiones Arithmeticae when he mentions the transcendental functions related to integrals of $dx/\sqrt{1 - x^4}$. Euler too dealt with the curve $y^2 = 1 - x^4$ [26], for which explicit and beautiful formulas can be developed for the addition law, and it is clear from Abel's published papers that this particular curve is one that he studied intensely. To require that it be put in Weierstrass normal form before the group law is described loses certain symmetries that deserve to be kept. But the above heuristic derivation of the fact that a curve in Weierstrass normal form has genus 1 also proves that $y^2 = 1 - x^4$ has genus 1, because in this case $x$ is $\infty$ at two points, both of them simple poles, whereas $y$ has double poles at these points ($(y/x^2)^2 = (1/x)^4 - 1$ is finite when $x = \infty$), so $\theta(x,y) = \phi_1(x) + \phi_2(x)y$ has a $\nu$-fold pole at each, and therefore $2\nu$ zeros, when $\deg \phi_1 = \nu$ and $\deg \phi_2 = \nu - 2$. Again the number of parameters in such a function $\phi_1(x) + \phi_2(x)y$ is $2\nu$, and the same arguments then show that the algebraic variation of $N$ points on the curve moves them with only $N - 1$ degrees of freedom and therefore determines subvarieties of $C^N$ of codimension 1.

In the same way, Abel's construction generalizes the Euler addition formula to any curve $C$ for which the algebraic variations describe subvarieties of $C^N$ of codimension 1. If $(P_1, P_2, \ldots, P_N)$ is moved to $(O, O, \ldots, O, R)$ by means of an algebraic variation, then, as before,

$$\int_O^{P_1} f(x,y)\,dx + \int_O^{P_2} f(x,y)\,dx + \cdots + \int_O^{P_N} f(x,y)\,dx = \int_O^{R} f(x,y)\,dx + E, \tag{1}$$
where $E$ is a quantity that can be expressed in terms of integrals of rational functions. (Moreover, $E$ is zero when the integrand is holomorphic in the sense defined in Essay 4.6. This is true, as will be shown, of the integrand $dx/y$ for curves in Weierstrass normal form or for the curve $y^2 = 1 - x^4$.)
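The chord construction of the group law described at the start of this essay can be tried out on a concrete curve. The following is a minimal sketch, not taken from the text: the function names `add_points` and `on_curve` are hypothetical, exact rational arithmetic is used, `None` stands for the point at infinity, and the tangent line is used when $P = Q$ (one way of counting intersections "in the right way," assumed here).

```python
from fractions import Fraction


def on_curve(P, g2, g3):
    """Check that P lies on y^2 = x^3 + g2*x + g3 (None = point at infinity)."""
    if P is None:
        return True
    x, y = map(Fraction, P)
    return y * y == x ** 3 + g2 * x + g3


def add_points(P, Q, g2, g3):
    """Sum R = P + Q on y^2 = x^3 + g2*x + g3, with the point at infinity as O.

    S is the third intersection of the line through P and Q with the curve;
    R has the same x-coordinate as S and the y-coordinate with sign reversed.
    """
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = map(Fraction, P)
    x2, y2 = map(Fraction, Q)
    if x1 == x2 and y1 == -y2:
        # The line is x = constant, so its third intersection is at infinity.
        return None
    if (x1, y1) == (x2, y2):
        # Assumption of this sketch: use the tangent line when P = Q.
        slope = (3 * x1 * x1 + g2) / (2 * y1)
    else:
        slope = (y2 - y1) / (x2 - x1)
    # Substituting the line into the cubic gives x1 + x2 + xS = slope^2.
    xs = slope * slope - x1 - x2
    ys = y1 + slope * (xs - x1)
    return (xs, -ys)


if __name__ == "__main__":
    print(add_points((1, 2), (3, 4), -7, 10))
```

On $y^2 = x^3 - 7x + 10$, the line through $(1, 2)$ and $(3, 4)$ meets the curve again at $S = (-3, -2)$, so the sketch returns $R = (-3, 2)$.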
Essay 4.3 An Algebraic Definition of the Genus

Modern treatments of the genus of a curve normally describe it in terms of the topology of the associated Riemann surface. Therefore, modern mathematicians are usually amazed to learn that the idea stems from Abel, who lived and worked at a time when even the notion of a complex function of a complex variable was in its early infancy and the notion of a Riemann surface was still in the future. (Riemann surfaces first appeared in Riemann's dissertation [57] of 1851.) But as the discussion in the preceding essay shows, Abel's point of view does not depend on complex numbers. The geometric picture of $N$ points on the curve varying with $N - g$ degrees of freedom that was presented in the preceding essays does depend on complex numbers, because the coordinates of the intersection points defined by $\chi = 0$, $\theta = 0$ exist only in some algebraically closed field, and the notion of continuous variation requires something like real numbers. But the actual determination of the genus depends on purely algebraic considerations, at least in the examples of the preceding essay. All that is needed is to construct, for a large number $\nu$, a formula $\theta(x, y, a, a', a'', \ldots)$ for the most general "function" in the field that has poles only at points where $x = \infty$ and no longer has poles at those points when it is divided by $x^\nu$ (although this division will probably cause it to have poles at $x = 0$). The genus $g$ is determined by the condition that the number of zeros of $\theta$ is $g - 1$ greater than the number of arbitrary constants in the formula for $\theta$.

Of course elements of a function field are not really functions in the usual sense, so they do not really have zeros and poles, and the condition that an element have poles only where $x = \infty$ is far from rigorous. Therefore, this description of the genus needs more explanation.
Starting with the field of rational functions on an algebraic curve $\chi(x,y) = 0$, which is simply the root field of a monic, irreducible polynomial $\chi(x,y)$ in $y$ with coefficients in $\mathbf{Z}[x]$, one needs to define what it means to say that an element $\theta$ of the field has no poles where $x$ is finite and that $\theta/x^\nu$ has no poles where $1/x$ is finite, and then one needs to determine how many zeros such a $\theta$ has and how many arbitrary constants there are in the formula for the most general such $\theta$. The idea of an element $\theta$ having no poles where $x$ is finite has a standard algebraic formulation: An element $\theta$ of the field of rational functions on a curve $\chi(x,y) = 0$ is integral over $x$ if some power of $\theta$ is equal to a sum of multiples of lower powers in which the multipliers are elements of the ring $\mathbf{Q}[x]$ of polynomials in $x$ with rational coefficients.*

* One could also use the more restrictive, but perhaps more natural, definition in which the multipliers are required to be in $\mathbf{Z}[x]$. Then an element would be integral over $x$ in the sense defined above if and only if some integer multiple of it was integral in the more restrictive sense. Since $\frac{1}{2}$ is certainly a "function" without poles, the definition given above is the one that describes "rational functions without poles" on an algebraic curve.

The justification of this
definition is, in the last analysis, pragmatic: it works in the sense that it suggests correct theorems and is useful in proofs. (The analogous definition of an algebraic integer in an algebraic number field, described in Essay 2.5, emerged in the work of Kronecker and Dedekind in the 1860s and 1870s. Bourbaki [6] claims it is in the work of Eisenstein as early as 1852, but I do not find it there. Kronecker [38, §1] used the above definition of integrality over $x$ of an "algebraic function" in his study of function fields, but as far as I have found he does not explain or motivate it.)

It is easy to see that the elements of the field of rational functions on $\chi(x,y) = 0$ that are integral over $x$ form a ring in the field and that this ring contains $\mathbf{Q}[x]$.† If $\theta$ is integral over $x$, then dividing an equation that demonstrates its integrality, say $\theta^n = a_1(x)\theta^{n-1} + \cdots + a_n(x)$, by $x^{n\mu}$ for $\mu$ larger than the maximum degree of the $a_i(x)$ shows that $\theta/x^\mu$ is integral over $1/x$ for all such values of $\mu$.‡ The order of $\theta$ at $x = \infty$ is by definition the smallest $\nu$ for which $\theta/x^\nu$ is integral over $1/x$. Let $\Theta(x^\nu)$ denote the elements $\theta$ of the field of rational functions on $\chi(x,y) = 0$ that are integral over $x$ and have order at most $\nu$ where $x = \infty$. The goal is to find a formula for the most general element $\theta$ of $\Theta(x^\nu)$, and to compare the "number of zeros" of $\theta$ to the "number of arbitrary constants it contains."

The "number of zeros" of such a $\theta$ has a very plausible meaning. By assumption, $\chi(x,y)$ is monic in $y$, say of degree $n$ in $y$. Then the values of $y$ for a given $x$ are the roots of a monic polynomial of degree $n$, so there are $n$ of them, counted with multiplicities. For this reason, there are $n$ points on the curve for each $x$, so $x$ assumes each value exactly $n$ times on the curve, counted with multiplicities.
† If $z_1^{n_1}$ can be expressed as a sum of multiples of lower powers of $z_1$ in which the multipliers are in $\mathbf{Q}[x]$, and $z_2^{n_2}$ can be expressed as a sum of multiples of lower powers of $z_2$ in which the multipliers are in $\mathbf{Q}[x]$, then every polynomial in $z_1$ and $z_2$ with coefficients in $\mathbf{Q}[x]$ can be expressed as a sum of multiples of $z_1^i z_2^j$ with coefficients in $\mathbf{Q}[x]$, where $i < n_1$ and $j < n_2$. Therefore, multiplication by any such polynomial in $z_1$ and $z_2$ (in particular multiplication by $z_1 + z_2$ and $z_1 z_2$) can be represented by the $n_1 n_2 \times n_1 n_2$ matrix of elements of $\mathbf{Q}[x]$ that gives its effect on these $n_1 n_2$ monomials $z_1^i z_2^j$. Therefore, since the polynomial in $z_1$ and $z_2$ is a root of the (monic) characteristic polynomial of this $n_1 n_2 \times n_1 n_2$ matrix by the Cayley-Hamilton theorem, $z_1 + z_2$ and $z_1 z_2$, and, in the same way, all polynomials in $z_1$ and $z_2$ with coefficients in $\mathbf{Q}[x]$, are integral over $x$.

‡ Note that $\chi(x,y)/x^{n\kappa}$ has the form $\chi_1(1/x, y/x^\kappa)$, where $\chi_1$ is irreducible with integer coefficients and monic in its second variable, when $\kappa$ is large enough, and that $x$ and $y$ can be expressed rationally in terms of $u = 1/x$ and $v = y/x^\kappa$, so the field of rational functions on $\chi(x,y) = 0$ can also be regarded as the field of rational functions on the curve $\chi_1(u,v) = 0$. To say that an element of this field is integral over $1/x$ means, of course, that it is integral over $u$.

For this reason, it is reasonable to take the view that $x$ also assumes the value $\infty$ exactly $n$ times, that is, that $x$ has $n$ poles on the
curve, counted with multiplicities. Therefore, $x^\nu$ has $n$ poles of order $\nu$ for a total of $n\nu$ poles. Since $\theta$ has the same poles as $x^\nu$ (except when $\theta$ is in the subset of $\Theta(x^\nu)$ containing "functions" that have fewer than the maximum number of poles allowed, which is sparse in $\Theta(x^\nu)$), it follows that $\theta$ should be regarded as having $n\nu$ poles; reversing the above reasoning then leads to the conclusion that $\theta$ assumes each value $n\nu$ times, including the value zero. In short, it is plausible to take $n\nu$ to be the number of zeros of a typical element of $\Theta(x^\nu)$.

This analysis of the number of zeros of a typical element of $\Theta(x^\nu)$ for large $\nu$ overlooks, however, a phenomenon that is exhibited by the example $\chi(x,y) = (x^2 + y^2)^2 - 2(x + y)^2$. The field of rational functions on this "curve" (the root field of this $\chi(x,y)$) contains the element $\frac{x^2 + y^2}{x + y}$, which is a root of $X^2 - 2$ (by definition, the square of $x^2 + y^2$ is $2(x + y)^2$). Therefore, it is reasonable to let $\sqrt{2}$ denote this element of the field. Then $\chi(x,y) = (x^2 + y^2 - \sqrt{2}(x + y))(x^2 + y^2 + \sqrt{2}(x + y))$, which shows that the field is an extension of degree two, not four, of the field of rational functions in $x$ with coefficients in the number field $\mathbf{Q}(\sqrt{2})$. Geometrically, the curve $(x^2 + y^2)^2 - 2(x + y)^2 = 0$ is quite simple, because the reduction $(x^2 + y^2 - \sqrt{2}(x + y))(x^2 + y^2 + \sqrt{2}(x + y)) = 0$ shows that it is a union of two circles, namely, the circle whose diameter is the line from the origin to $(\sqrt{2}, \sqrt{2})$ and the one whose diameter is the line from the origin to $(-\sqrt{2}, -\sqrt{2})$.

Geometers traditionally use algebraically closed ground fields in part to avoid situations like this in which a curve described by a simple irreducible polynomial becomes a union of two curves when the field of constants is extended. The simple constructive solution to this difficulty is not to make the giant leap to an algebraically closed ground field (the usual choice being the field of complex numbers, which is not an algebraic but a transcendental extension) but to adjoin new constants as needed. In the example, the constant $\sqrt{2}$ is not just needed, it is already present as $\frac{x^2 + y^2}{x + y}$, and when it is used the curve is reducible, and the geometric picture of the "curve" whose field of rational functions is the root field of $(x^2 + y^2)^2 - 2(x + y)^2$ is a single circle $x^2 + y^2 = \sqrt{2}(x + y)$. This revision of the picture makes it clear that the number of zeros of $x$ on the curve is two, not four. Consequently, the number of poles of $x^\nu$, which is the same as the number of zeros, is $2\nu$, not $4\nu$, and a typical element of $\Theta(x^\nu)$ has $2\nu$ zeros, not $4\nu$.

More generally, one needs to take into consideration the possibility that the root field of $\chi(x,y)$ may contain constants other than the obvious constants in $\mathbf{Q}$. Here a "constant" is an element of the root field that is a root of a polynomial with integer coefficients, or, what is the same, an element of $\Theta(x^0)$. (A polynomial in $x$ with rational coefficients is equal to a polynomial in $1/x$ with rational coefficients if and only if it is a rational number, and a root of a monic polynomial with rational coefficients is a root of a polynomial with integer coefficients.) For this reason, $\Theta(x^0)$ will be called the field of
constants of the root field of $\chi(x,y)$. (Note that $\Theta(x^\nu)$ is a vector space over $\mathbf{Q}$ for all $\nu$, and that $\Theta(x^0)$ is in fact a field.) The example then suggests the following definition: The number of zeros of a typical element of $\Theta(x^\nu)$ for large values of $\nu$ is $n_0\nu$, where $n_0$ is the degree of the root field of $\chi$ as an extension, not of the field $\mathbf{Q}(x)$ of rational functions in $x$ with integer coefficients, but of the field of rational functions of $x$ with coefficients in the field of constants $\Theta(x^0)$. When $\Theta(x^0) = \mathbf{Q}$, $n_0$ is simply the degree of $\chi$ in $y$, but in the general case it is this degree divided by $[\Theta(x^0) : \mathbf{Q}]$.

Similarly, when one counts the "number of constants" in a formula for a typical element of $\Theta(x^\nu)$, one thinks of the constants as being in $\Theta(x^0)$, not $\mathbf{Q}$. Because the product of an element of $\Theta(x^\mu)$ and an element of $\Theta(x^\nu)$ is an element of $\Theta(x^{\mu+\nu})$, $\Theta(x^\nu)$ is a vector space over $\Theta(x^0)$. The number of constants in a formula for a typical element of $\Theta(x^\nu)$ is quite simply the dimension of $\Theta(x^\nu)$ as a vector space over the field $\Theta(x^0)$.

In this way, Abel's conception of the number of integrals to which a sum of integrals of an algebraic integrand can be reduced leads to the definition of the genus of the root field of $\chi(x,y)$ as the number $g$ for which $\dim \Theta(x^\nu) = n_0\nu - g + 1$, where $\nu$ is a large integer, where $n_0$ is the degree $n$ of $\chi$ in $y$ divided by the degree $c$ of the field of constants $\Theta(x^0)$ as an extension of $\mathbf{Q}$, where $\Theta(x^\nu)$ denotes the subset of the root field containing elements $\theta$ that are integral over $x$ and have order at most $\nu$ where $x = \infty$, and where the dimension is the dimension of $\Theta(x^\nu)$ as a vector space over the field of constants $\Theta(x^0)$.
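As a check on this definition, the count can be carried out for curves of the form $y^2 = f(x)$ with $f$ square-free of degree $d$. The sketch below rests on assumptions of the example rather than statements from the text: the field of constants is taken to be $\mathbf{Q}$, so that $n_0 = 2$, and $\Theta(x^\nu)$ is taken to be spanned by the monomials $x^i$ with $i \le \nu$ and $x^i y$ with $i + \lceil d/2 \rceil \le \nu$. The function names `dim_theta` and `genus` are hypothetical.

```python
def dim_theta(d, nu):
    """Dimension of Theta(x^nu) for y^2 = f(x), deg f = d, f square-free.

    Assumption of this sketch: a basis consists of the monomials x^i
    (order i at x = infinity) and x^i * y (order i + k), where k is the
    smallest integer with 2k >= d, so that (y / x^k)^2 is a polynomial in 1/x.
    """
    k = (d + 1) // 2
    monomials = [(i, 0) for i in range(nu + 1)]
    monomials += [(i, 1) for i in range(nu + 1) if i + k <= nu]
    return len(monomials)


def genus(d, nu=10):
    """Solve dim Theta(x^nu) = n0 * nu - g + 1 for g, with n0 = 2."""
    n0 = 2
    return n0 * nu - dim_theta(d, nu) + 1


if __name__ == "__main__":
    for d in (3, 4, 5, 6):
        print(d, genus(d))
```

For $d = 3$ or $4$ the count gives genus 1, and for $d = 5$ or $6$ it gives genus 2, in agreement with Abel's values of 1 for the elliptic case and 2 for a radicand of degree 5 or 6.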
The underlying idea is that an element of $\Theta(x^\nu)$ has $n_0\nu$ zeros and contains $\dim \Theta(x^\nu)$ parameters; variation of all $\dim \Theta(x^\nu)$ parameters in such a $\theta$ varies its zeros with only $\dim \Theta(x^\nu) - 1$ degrees of freedom (one degree of freedom is lost because multiplication of a function by a constant does not change its zeros), so the number of constraints $g$ on the motion of the $n_0\nu$ zeros under an algebraic variation is determined by the equation $\dim \Theta(x^\nu) - 1 = n_0\nu - g$.

The main theorem will be to show that this genus is intrinsic to the curve $\chi(x,y) = 0$ in the sense that if the root fields of two polynomials $\chi(x,y)$ are isomorphic (that is, if the two corresponding curves are birationally equivalent), then the fields have the same genus. Although the proof will be somewhat long, the underlying reason that the genus is intrinsic stems from the above discussions: It is the codimension of the subvarieties of $C^N$ determined by the algebraic variations of $N$ points on the curve.
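The factorization in the example of the two circles earlier in this essay can be confirmed by direct multiplication. This is only a sketch: elements $a + b\sqrt{2}$ of $\mathbf{Q}(\sqrt{2})$ are stored as pairs `(a, b)`, polynomials in $x$ and $y$ as dictionaries from exponent pairs to coefficients, and the helper names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def c_add(u, v):
    """Add two elements (a, b) = a + b*sqrt(2) of Q(sqrt 2)."""
    return (u[0] + v[0], u[1] + v[1])


def c_mul(u, v):
    """Multiply two elements of Q(sqrt 2), using sqrt(2)^2 = 2."""
    return (u[0] * v[0] + 2 * u[1] * v[1], u[0] * v[1] + u[1] * v[0])


def p_mul(p, q):
    """Multiply polynomials stored as {(deg_x, deg_y): coefficient}."""
    out = defaultdict(lambda: (Fraction(0), Fraction(0)))
    for (i1, j1), c1 in p.items():
        for (i2, j2), c2 in q.items():
            key = (i1 + i2, j1 + j2)
            out[key] = c_add(out[key], c_mul(c1, c2))
    # Drop terms whose coefficients cancelled.
    return {k: v for k, v in out.items() if v != (0, 0)}


ONE, ROOT2, NEG_ROOT2 = (1, 0), (0, 1), (0, -1)

# x^2 + y^2 - sqrt(2)(x + y)  and  x^2 + y^2 + sqrt(2)(x + y)
factor_minus = {(2, 0): ONE, (0, 2): ONE, (1, 0): NEG_ROOT2, (0, 1): NEG_ROOT2}
factor_plus = {(2, 0): ONE, (0, 2): ONE, (1, 0): ROOT2, (0, 1): ROOT2}

# (x^2 + y^2)^2 - 2(x + y)^2, expanded by hand
chi = {(4, 0): (1, 0), (2, 2): (2, 0), (0, 4): (1, 0),
       (2, 0): (-2, 0), (1, 1): (-4, 0), (0, 2): (-2, 0)}

if __name__ == "__main__":
    print(p_mul(factor_minus, factor_plus) == chi)
```

The product has only rational coefficients, which is why the polynomial is irreducible over $\mathbf{Q}$ and yet splits once $\sqrt{2}$ is adjoined.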
Essay 4.4 Newton's Polygon

. . . ses [Newton's] principaux Guides dans ces Recherches [on cubic curves] ont ete la Doctrine des Series infinies, qui lui doit presque tout, & l'usage du Parallelogramme analytique, dont il est l'Inventeur. . . . Il est facheux que Mr. Newton se soit contente d'etaler ses decouvertes sans y joindre les Demonstrations, et qu'il ait prefere le plaisir de se faire admirer a celui d'instruire. (Newton's main guides in his researches on cubic curves were the doctrine of infinite series, which owes him practically everything, and the use of the analytic parallelogram, of which he is the inventor. . . . It is annoying that Mr. Newton contented himself with laying out his discoveries without accompanying them with proofs, and that he preferred the pleasure of making himself admired to that of instructing.)
Gabriel Cramer, [11, Preface]

The program outlined at the end of the last essay for constructing the genus of an algebraic curve (or, more precisely, the genus of the root field of a given $\chi(x,y)$) will be carried out in the following essays using an algorithm of Isaac Newton* for expanding an algebraic function of $x$ as a power series in fractional powers of $x$. Known as Newton's polygon, or sometimes Newton's parallelogram, it constructs, for a given polynomial equation $\chi(x,y) = 0$, infinite series expansions of $y$ in fractional powers of $x$. It involves choices and results in $n$ different expansions, where $n = \deg_y \chi$. It will be useful to expand $y$ not only in powers of $x$ but also in powers of $x - \alpha$ for various algebraic numbers $\alpha$, something that can be accomplished by the same method, since setting $x_1 = x - \alpha$ and $\chi_1(x_1, y) = \chi(x_1 + \alpha, y)$ gives an algebraic relation between $x_1$ and $y$ that can be used to expand $y$ in (fractional) powers of $x_1$ using Newton's polygon.
Let $\chi(x,y)$ be an irreducible polynomial with integer coefficients that has positive degree in both $x$ and $y$ and is monic in $y$, and let $\alpha$ be a given value for $x$. The objective is to find infinite series "solutions" $y = \theta_0 (x - \alpha)^{\varepsilon_0} + \theta_1 (x - \alpha)^{\varepsilon_1} + \theta_2 (x - \alpha)^{\varepsilon_2} + \cdots$ of $\chi(x,y) = 0$ in which the coefficients $\theta_i$ are algebraic numbers and the exponents $\varepsilon_0 < \varepsilon_1 < \cdots < \varepsilon_k < \cdots$ are an increasing sequence of rational numbers. It will also be assumed that the exponents increase without bound in the sense that for any given $N$ one can find a value of $k$ for which $\varepsilon_k > N$.

* Newton's presentation is quite sketchy. My main source was Walker [61]. See also Newton [51, vol. 3, p. 50 and p. 360, vol. 4, p. 629], Hensel-Landsberg [31], and Chebotarev [9]. Chebotarev cites (end of §2) the Hensel-Landsberg book as his basic source, but he examines the Newton polygon much more fully than that book does, dealing thoroughly, for instance, with the history of the method. Unfortunately, his article is available only in Russian, and is difficult to find. Chebotarev advocates calling it Newton's "diagram" as Hensel and Landsberg do, saying that the "polygon" was not present in Newton's formulation, but the name "Newton's polygon" now seems firmly established.

The meaning of the statement that such a
Fig. 4.4. Newton.

series "solves" $\chi(x,y) = 0$ is clear, if somewhat nonconstructive: Such series can be added and multiplied term by term, and $x = \alpha + (x - \alpha)$ is such a series (a terminating one), so $\chi(x,y)$ represents such a series, the coefficients of which can be computed in an open-ended way by finding, for any given upper bound, all terms of the series $\chi(x,y)$ in which the exponent is less than that bound. To say that $\chi(x,y) = 0$ means simply that the result is always zero.

Since $y$ is integral over $x$, the exponents $\varepsilon_i$ are to be expected to be nonnegative. Therefore, a knowledge of the terms of the series for $y$ through the term $\theta_k (x - \alpha)^{\varepsilon_k}$ is all that is needed to compute all terms of $\chi(x,y)$ in which the exponents are less than or equal to $\varepsilon_k$, because all omitted terms contain $(x - \alpha)^{\varepsilon_{k+i}}$ for some $i > 0$ and $\varepsilon_{k+i} > \varepsilon_k$. What is sought, then, are infinite sequences $\theta_0, \theta_1, \theta_2, \ldots$ and $0 \le \varepsilon_0 < \varepsilon_1 < \cdots$ for which all terms of the terminating series $\chi(x, \theta_0 (x - \alpha)^{\varepsilon_0} + \theta_1 (x - \alpha)^{\varepsilon_1} + \cdots + \theta_k (x - \alpha)^{\varepsilon_k})$ have exponents greater than $\varepsilon_k$.

A constructive solution of this problem must of course be an algorithm for generating such sequences. "Newton's polygon" is such an algorithm. More specifically, given the initial terms $\theta_0 (x - \alpha)^{\varepsilon_0} + \theta_1 (x - \alpha)^{\varepsilon_1} + \cdots + \theta_k (x - \alpha)^{\varepsilon_k}$ of an infinite series solution $y$ of $\chi(x,y) = 0$ in the sense just described, the algorithm should give all possible values $\theta_{k+1} (x - \alpha)^{\varepsilon_{k+1}}$ for the next term of the sequence. They can be completely described in the following way:

To avoid fractional exponents, let $m$ be the least common denominator of $\varepsilon_0, \varepsilon_1, \ldots, \varepsilon_k$ and let $s = (x - \alpha)^{1/m}$, so that the initial terms that are assumed to be known take the form $\beta_0 + \beta_1 s + \cdots + \beta_h s^h$, where $h$ is the
134
4 The Genus of an Algebraic Curve
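integer $m\varepsilon_k$ and where $\beta_i$ is zero unless $i$ is of the form $m\varepsilon_j$ for some $j$, in which case $\beta_i = \theta_j$. Let the term following $\beta_h s^h$ be $\gamma s^{h+\rho}$, so that the required equation is $\chi(\alpha + s^m, \beta_0 + \beta_1 s + \cdots + \beta_h s^h + \gamma s^{h+\rho} + \cdots) = 0$, where $\rho$ is a positive rational number. To determine the possible values of $\gamma$ and $\rho$, expand $\chi(\alpha + s^m, \beta_0 + \beta_1 s + \cdots + \beta_h s^h + t s^h)$, a polynomial in $s$ and $t$ whose coefficients are algebraic numbers (because they are polynomials in $\beta_0, \beta_1, \ldots, \beta_h$, $\alpha$, and the coefficients of $\chi$), as a polynomial in $t$, $\Phi_0(s) + \Phi_1(s)t + \Phi_2(s)t^2 + \cdots + \Phi_n(s)t^n$, whose coefficients $\Phi_i(s)$ are polynomials in $s$. Again, to avoid fractional exponents, let $\rho$ be written $\rho = \sigma/\tau$, where $\sigma$ and $\tau$ are positive integers, and let $s_1 = s^{1/\tau}$, so that the required identity becomes $\chi(\alpha + s_1^{m\tau}, \beta_0 + \beta_1 s_1^{\tau} + \cdots + \beta_h s_1^{h\tau} + \gamma s_1^{h\tau+\sigma} + \cdots) = 0$, which is to say

$$\Phi_0(s_1^\tau) + \Phi_1(s_1^\tau)(\gamma s_1^\sigma + \cdots) + \Phi_2(s_1^\tau)(\gamma s_1^\sigma + \cdots)^2 + \cdots + \Phi_n(s_1^\tau)(\gamma s_1^\sigma + \cdots)^n = 0.$$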
The simple idea that underlies Newton's polygon is the observation that this infinite series in $s_1$ with algebraic number coefficients, which is a sum of $n + 1$ such series, can be identically zero only if all terms in the sum cancel, and, in particular, only if the lowest-order terms of these series cancel. If the polynomial $\Phi_i(s)$ is nonzero, it has the form $C_i s^{j_i} + \cdots$, where $C_i \ne 0$ and the omitted terms all have degree greater than $j_i$. With this notation, the term of $\Phi_i(s_1^\tau)(\gamma s_1^\sigma + \cdots)^i$ of lowest degree, when $\Phi_i(s) \ne 0$, is $C_i \gamma^i s_1^{\sigma i + \tau j_i}$. The required cancellation dictates that the positive integers $\sigma$ and $\tau$ must have the property that $\sigma i + \tau j_i$ assumes its minimum value for at least two different pairs $(i, j_i)$ (note that these pairs are determined by $\chi$, $\alpha$, $m$, and $\beta_0 + \beta_1 s + \cdots + \beta_h s^h$). These conditions limit the pairs $(\sigma, \tau)$ to a finite number of possibilities (the geometrical picture is the one described below) and even give strong information about the coefficient $\gamma$ of the next term, namely, that it is a nonzero root of the polynomial $\sum C_i \gamma^i$, where the sum is extended over just those values of $i$ for which $\Phi_i(s) \ne 0$ and $\sigma i + \tau j_i$ assumes its minimum value.

Some term of some series $\Phi_i(s_1^\tau)(\gamma s_1^\sigma + \cdots)^i$ for $i > 0$ must cancel the first term $C_0 s_1^{\tau j_0}$ of $\Phi_0(s_1^\tau)$, so $\tau j_0 \ge \sigma i + \tau j_i$ for some $i > 0$. Since $\sigma$ and $\tau$ are both positive, $j_0$ must be greater than $j_i$ for at least one $i > 0$. Therefore, the above discussion shows that the series $\beta_0 + \beta_1 s + \cdots + \beta_h s^h$ can be extended to be an infinite series solution $y$ of $\chi(x,y) = 0$ when $x = \alpha + s^m$ only if the polynomial in two variables $\chi(\alpha + s^m, \beta_0 + \beta_1 s + \cdots + \beta_h s^h + t s^h) = \Phi_0(s) + \Phi_1(s)t + \Phi_2(s)t^2 + \cdots + \Phi_n(s)t^n$ has the property that $s$ divides $\Phi_0(s)$ more times than it divides $\Phi_i(s)$ for at least one $i > 0$. Otherwise stated, the term or terms of this polynomial of lowest degree in $s$ must all contain $t$.
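The condition that $\sigma i + \tau j_i$ attain its minimum at two or more of the pairs $(i, j_i)$ singles out the edges of negative slope on the lower boundary of the convex hull of these points. The sketch below, with the hypothetical name `newton_segments`, finds those edges for a given set of pairs. It takes $\sigma$ and $\tau$ to be the differences of the $j$-values and of the $i$-values at the endpoints of each edge, without reducing them to lowest terms (a choice made for this example), and it is an illustration rather than Newton's or the author's own formulation.

```python
from fractions import Fraction


def newton_segments(points):
    """Edges of negative slope on the lower hull of the points (i, j_i).

    `points` maps each index i with Phi_i != 0 to j_i and must contain i = 0.
    Returns a list of tuples (sigma, tau, k, on_segment), where the edge lies
    on the line sigma*i + tau*j = k and on_segment lists the pairs on it.
    """
    pts = sorted(points.items())
    segments = []
    cur = pts[0]  # the walk starts at (0, j_0)
    while True:
        i1, j1 = cur
        best = None
        for i2, j2 in pts:
            if i2 <= i1 or j2 >= j1:
                continue  # only points to the right and strictly lower
            slope = Fraction(j2 - j1, i2 - i1)
            # Steepest descent wins; among equal slopes take the farthest point.
            if best is None or slope < best[0] or (slope == best[0] and i2 > best[1][0]):
                best = (slope, (i2, j2))
        if best is None:
            break  # the minimum value of j_i has been reached
        i2, j2 = best[1]
        sigma, tau = j1 - j2, i2 - i1
        k = sigma * i1 + tau * j1
        on_segment = [(i, j) for i, j in pts if sigma * i + tau * j == k]
        segments.append((sigma, tau, k, on_segment))
        cur = (i2, j2)
    return segments


if __name__ == "__main__":
    pairs = {0: 10, 1: 9, 2: 4, 3: 3, 4: 2, 5: 2, 7: 3}
    for seg in newton_segments(pairs):
        print(seg)
```

For the seven pairs (0,10), (1,9), (2,4), (3,3), (4,2), (5,2), (7,3) the sketch returns two edges, one from (0,10) to (2,4) and one from (2,4) to (4,2), the second of which also passes through (3,3).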
As will be shown, these necessary conditions on the constants that describe the next term when a certain number of terms are known permit one to construct all possible solutions $y$ of $\chi(x,y) = 0$ as infinite series of fractional powers of $x - \alpha$. A truncated solution $y$ of $\chi(x,y) = 0$ at $x = \alpha$ will by definition consist of (1) an algebraic number field $A$ containing $\alpha$, (2) a positive integer $m$, and (3) a finite sequence $\beta_0, \beta_1, \ldots, \beta_h$ in $A$ with the property that the term or
terms of $\chi(\alpha + s^m, \beta_0 + \beta_1 s + \cdots + \beta_h s^h + t s^h)$ of lowest degree in $s$ all contain $t$. In addition, it will be assumed that the result $\Phi_0(s)$ of setting $t = 0$ in this polynomial is not zero; otherwise, $y = \beta_0 + \beta_1 (x - \alpha)^{1/m} + \cdots + \beta_h (x - \alpha)^{h/m}$ is an actual solution of $\chi(x,y) = 0$ and there is no need to use higher powers of $x - \alpha$.

Newton's Polygon

Input: A truncated solution $y$ of $\chi(x,y) = 0$ at $x = \alpha$, as that term was just defined.

Algorithm: As above, let $\chi(\alpha + s^m, \beta_0 + \beta_1 s + \cdots + \beta_h s^h + t s^h)$ be written in the form $\Phi_0(s) + \Phi_1(s)t + \Phi_2(s)t^2 + \cdots + \Phi_n(s)t^n$ of a polynomial in $t$ whose coefficients $\Phi_i(s)$ are polynomials in $s$ with coefficients in the field $A$ specified by the input. Consider the set of pairs $(i, j_i)$ of integers, where $i$ is in the range $0 \le i \le n$, where $\Phi_i(s) \ne 0$, and where $j_i$ is the number of times that $s$ divides $\Phi_i(s)$. By assumption, $j_0$ is defined and greater than at least one other $j_i$. The segments of the Newton polygon corresponding to this input are the line segments that join two points $(i, j_i)$, say those corresponding to the indices $i_1$ and $i_2 > i_1$, in such a way that (1) the segment has negative slope, so it is described by the equation $\sigma i + \tau j = k$ where $\sigma = j_{i_1} - j_{i_2}$ and $\tau = i_2 - i_1$ are both positive and where $k$ is the common value of $\sigma i + \tau j$ for these two indices, (2) $\sigma i + \tau j_i \ge k$ for all indices $i$ for which $j_i$ is defined, and (3) $\sigma i + \tau j_i > k$ whenever $j_i$ is defined and $i < i_1$ or $i > i_2$. With each such segment, associate the polynomial
$$\eta(c) = \sum_{\sigma i + \tau j_i = k} C_i c^i$$
with coefficients in $A$, where the sum is over just those values of $i$ for which $(i, j_i)$ lies on the segment, of which there are at least two, and where $C_i$ is the coefficient of $s^{j_i}$ in $\Phi_i(s)$. Extend the input field $A$, if necessary, to split all polynomials $\eta(c)$ that result in this way from segments of the polygon. (Geometrically, the segments join to form a polygonal path that joins the point $(0, j_0)$ to the first point, call it $(I, J)$, of the form $(i, j_i)$ at which $j_i$ assumes its minimum value. This path is determined by the fact that it joins points of the form $(i, j_i)$ in such a way that none of these points are in the interior of the closed polygon formed by it and the segments from $(0, j_0)$ to $(0, J)$ and from $(0, J)$ to $(I, J)$.)

Output: A truncated solution of $\chi(x, y) = 0$ for $x = \alpha$ for each nonzero root $\gamma$ of each polynomial $\eta(c)$ in the extended field constructed by the algorithm, namely, the truncated solution (1)
$$y = \beta_0 + \beta_1 (x - \alpha)^{1/m} + \beta_2 (x - \alpha)^{2/m} + \cdots + \beta_h (x - \alpha)^{h/m} + \gamma (x - \alpha)^{(h\tau + \sigma)/(\tau m)}$$
in which one term with coefficient $\gamma$ and exponent $\frac{h\tau + \sigma}{\tau m} = \frac{h + \rho}{m}$ is added to the input truncated solution, where $\rho = \frac{\sigma}{\tau}$. In other words, the output
4 The Genus of an Algebraic Curve
Fig. 4.5. When there are seven points $(i, j_i)$ = (0,10), (1,9), (2,4), (3,3), (4,2), (5,2), (7,3), Newton's polygon has two segments. They join (0,10) and (4,2) via (2,4).
truncated solution corresponding to $\gamma$ consists of the (possibly) extended field $A$ constructed by the algorithm, the positive integer $\tau m$, and the sequence $\beta'_0, \beta'_1, \beta'_2, \ldots, \beta'_{h\tau + \sigma}$, in which $\beta'_{i\tau} = \beta_i$ for $i = 0, 1, \ldots, h$ and $\beta'_{h\tau + \sigma} = \gamma$ but all other coefficients $\beta'_i$ are zero.

That each output (1) is a truncated solution (unless, of course, it is an actual solution) can be proved as follows: Set $s = s_1^{\tau}$ and $t = s_1^{\sigma}(\gamma + t_1)$ in the definition of $\Phi_0(s), \Phi_1(s), \ldots, \Phi_n(s)$ to put the new equation in the form (2)
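$$\chi\left(\alpha + s_1^{\tau m},\ \beta_0 + \beta_1 s_1^{\tau} + \beta_2 s_1^{2\tau} + \cdots + \beta_h s_1^{h\tau} + \gamma s_1^{h\tau + \sigma} + t_1 s_1^{h\tau + \sigma}\right) = \sum_{i=0}^{n} \Phi_i(s_1^{\tau})\, s_1^{\sigma i} (\gamma + t_1)^i.$$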
By the choice of $\sigma$ and $\tau$, no term on the right contains $s_1$ to a power less than the minimum value of $\sigma i + \tau j_i$, call it $k$, and the terms that contain $s_1$ to the power $k$ exactly are $s_1^k \eta(\gamma + t_1)$ by the definition of $\eta$. Since $\eta(\gamma + t_1)$ is a nonzero polynomial (its degree in $t_1$ is the largest value of $i$ for which the point $(i, j_i)$ lies on the corresponding segment of the Newton polygon) with constant term zero (by the choice of $\gamma$), (2) is a polynomial in which the terms of lowest degree $k$ in $s_1$ all contain $t_1$, as was to be shown.

The ambiguity of a truncated solution is, in the notation used above, the least index $i$ for which $j_i$ attains its minimum. Otherwise stated, it is the $i$-coordinate $I$ of the endpoint $(I, J)$ of the Newton polygon other than $(0, j_0)$.
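The geometric description of the polygon and the definition of ambiguity can be sketched in code. This is only an illustration and not part of the original text: the names `newton_polygon` and `ambiguity` are hypothetical, the input is assumed to be the list of pairs $(i, j_i)$ already computed, and the segments are found by a standard lower convex hull walk rather than by testing conditions (1) to (3) one by one.

```python
from fractions import Fraction

def newton_polygon(points):
    """Return the segments as tuples (sigma, tau, k, points_on_segment).

    `points` lists the pairs (i, j_i). The path starts at (0, j_0) and
    ends at the first point at which j_i attains its minimum value.
    """
    pts = sorted(points)
    current = pts[0]
    assert current[0] == 0, "j_0 must be defined"
    segments = []
    while True:
        i1, j1 = current
        # Candidates lie to the right of and strictly below the current vertex.
        lower = [(i, j) for (i, j) in pts if i > i1 and j < j1]
        if not lower:
            break
        # The next vertex has the steepest negative slope; ties go to the largest i.
        steepest = min(Fraction(j - j1, i - i1) for (i, j) in lower)
        i2, j2 = max(p for p in lower
                     if Fraction(p[1] - j1, p[0] - i1) == steepest)
        sigma, tau = j1 - j2, i2 - i1          # both positive, not reduced
        k = sigma * i1 + tau * j1
        on_segment = [p for p in pts
                      if i1 <= p[0] <= i2 and sigma * p[0] + tau * p[1] == k]
        segments.append((sigma, tau, k, on_segment))
        current = (i2, j2)
    return segments

def ambiguity(points):
    """Least index i at which j_i attains its minimum value."""
    lowest = min(j for _, j in points)
    return min(i for i, j in points if j == lowest)

# The seven points of Fig. 4.5.
fig_points = [(0, 10), (1, 9), (2, 4), (3, 3), (4, 2), (5, 2), (7, 3)]
fig_segments = newton_polygon(fig_points)
```

For the points of Fig. 4.5 the sketch yields two segments, $6i + 2j = 20$ and $2i + 2j = 12$, with $(3, 3)$ lying on the second one, and the ambiguity is 4.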
A truncated solution will be called unambiguous if its ambiguity is 1. In this case, the polygon consists of a single line segment, and $\eta(c)$ is a polynomial of degree 1 whose single root is nonzero, so the algorithm produces a single output; moreover, the algorithm does not increase $m$, and it results in no extension of $A$ because the root of $\eta(c)$ is already in $A$.

The ambiguity of an output solution is the multiplicity of its $\gamma$ as a root of its $\eta(c)$, as follows from the above observation that the terms of lowest degree in $s_1$ are $s_1^k \eta(\gamma + t_1)$, because the multiplicity of $\gamma$ as a root of $\eta(c)$ is the number of times $t_1$ divides $\eta(\gamma + t_1)$. Thus, among the nonzero terms $e\, s_1^p t_1^q$ of (2) in which $p$ assumes its minimum value $k$, the one in which $q$ has its least value is the one in which $q$ is the multiplicity of $\gamma$ as a root of its $\eta(c)$. In particular, if the input truncated solution is unambiguous, so is the output truncated solution. Thus, if it begins with an unambiguous truncated solution, the algorithm constructs an infinite series solution (which may in rare cases be an actual terminating solution) with coefficients in the same $A$. In short, the construction of infinite series solutions is reduced by the Newton's polygon algorithm to the construction of unambiguous truncated solutions.

Theorem 1. Construct $n$ distinct infinite series solutions $y$ of $\chi(x, y) = 0$ at $x = \alpha$.

As above, $\alpha$ is a given algebraic number and $\chi(x, y)$ is a given polynomial with integer coefficients that is irreducible, contains both $x$ and $y$, and is monic of degree $n$ in $y$. For the reason just stated, an infinite series solution can be regarded as having been constructed when an unambiguous truncated solution has been constructed. The proof of the theorem will follow an example: Let $\chi(x, y) = y^3 - xy + x^3$ (the curve $\chi = 0$ is the folium of Descartes; see Fig. 4.6, p. 151) and let $\alpha = 0$.
If one begins with the truncated solution $m = 1$, $y = 0$, one begins with $\chi(0 + s, 0 + t) = s^3 - st + t^3$, and Newton's polygon joins the points (0, 3) and (3, 0) via the point (1, 1). The two segments of the polygon are described by the equations $2i + j = 3$ and $i + 2j = 3$. The first segment gives just one output truncated solution $y = x^2$, because $m = 1$, $h = 0$, $\sigma = 2$, $\tau = 1$ and because the polynomial $\eta(c) = 1 - c$ in this case has just one nonzero root 1. It is a simple root, so this output solution is unambiguous. The second segment gives two output truncated solutions $y = \gamma\sqrt{x}$ (because $m = 1$, $h = 0$, $\sigma = 1$, $\tau = 2$), where $\gamma$ is a nonzero root of $\eta(c) = -c + c^3$. Thus $\gamma = \pm 1$, and the output consists of two truncated solutions $y = \pm\sqrt{x}$. Both are unambiguous. Thus, Newton's polygon constructs three unambiguous truncated solutions and therefore constructs the three required infinite series solutions of $y^3 - xy + x^3 = 0$ for $x = 0$.

These infinite series solutions can be found by repeated application of the Newton polygon algorithm, but the first few terms can be found more easily by the following method. The truncated solution $y = \pm\sqrt{x}$ calls for the computation of $\chi(s^2, \pm s + st) = (s^2)^3 - s^2(\pm s + st) + (\pm s + st)^3 = s^3\left(s^3 - (\pm 1 + t) + (\pm 1 + t)^3\right) =$
$s^3(s^3 + 2t \pm 3t^2 + t^3)$. The term $2t$ shows that this truncated solution is unambiguous. The continuation of the truncated solution $y = \pm\sqrt{x} + \cdots$ can be found using the equation $s^3 + 2t \pm 3t^2 + t^3 = 0$ to express $t$ as a power series in $s = \sqrt{x}$ and substituting the result in $y = \pm s + st$. Consider first the case in which the sign is plus. The relation $s^3 + 2t + 3t^2 + t^3 = 0$ can be written $t = -\frac{1}{2}s^3 - \frac{3}{2}t^2 - \frac{1}{2}t^3$, and repeated substitution of this expression for $t$ into its own right side gives $t = -\frac{1}{2}s^3 - \frac{3}{8}s^6 - \frac{1}{2}s^9 + \cdots$, where the omitted terms all contain $s^{12}$, from which $y = s + st = s - \frac{1}{2}s^4 - \frac{3}{8}s^7 - \frac{1}{2}s^{10} + \cdots$. When $s^3 + 2t + 3t^2 + t^3 = 0$ is changed to $s^3 + 2t - 3t^2 + t^3 = 0$, the corresponding solution is found by changing $s$ to $-s$ and $t$ to $-t$. In summary, the second segment $i + 2j = 3$ corresponds to two infinite series solutions of $y^3 - xy + x^3 = 0$; they begin $y = \pm\sqrt{x} - \frac{1}{2}x^2 \mp \frac{3}{8}x^3\sqrt{x} - \frac{1}{2}x^5 + \cdots$.

The infinite series solution $y = x^2 + \cdots$ corresponding to the first segment $2i + j = 3$ calls for computing the polynomial $\chi(s, s^2 + s^2 t) = s^3 - s^3(1 + t) + s^6(1 + t)^3 = s^3(-t + s^3(1 + t)^3)$. The term $-t$ shows that the truncated solution $y = x^2$ is unambiguous. The expansion of $y$ in powers of $x$ can be found by using the relation $-t + s^3(1 + t)^3 = 0$ to expand $t$ in powers of $s = x$ and substituting the result in $y = s^2 + s^2 t$. Now, $t = s^3(1 + t)^3$ implies

$$\begin{aligned}
t &= s^3\left(1 + s^3(1 + t)^3\right)^3 = s^3\left(1 + 3s^3(1 + t)^3 + 3s^6(1 + t)^6 + s^9(1 + t)^9\right) \\
&= s^3 + 3s^6(1 + t)^3 + 3s^9(1 + t)^6 + s^{12}(1 + t)^9 \\
&= s^3 + 3s^6 + 9s^6 t + 9s^6 t^2 + \cdots + 3s^9 + 18s^9 t + \cdots + s^{12} + \cdots \\
&= s^3 + 3s^6 + 12s^9 + 55s^{12} + \cdots,
\end{aligned}$$

(the coefficient of $s^{12}$ is $27 + 9 + 18 + 1 = 55$, the contributions coming from $9s^6 t$, $9s^6 t^2$, $18s^9 t$, and $s^{12}$ respectively), so $y = x^2 + x^5 + 3x^8 + 12x^{11} + 55x^{14} + \cdots$ is the beginning of this infinite series solution of $y^3 - xy + x^3 = 0$. (Note that the sum of the three series is zero, at least up to the terms in $x^5$, in accord with the fact that the coefficient of $y^2$ in $y^3 - xy + x^3$ is zero.)

Proof of Theorem 1. A truncated solution of $\chi(x, y)$ at $x = \alpha$ in which $m = 1$ and $h = 0$ is an algebraic number $\beta_0$ for which the terms of $\chi(\alpha + s, \beta_0 + t)$ of lowest degree in $s$ all contain $t$; since $\chi(\alpha + s, \beta_0 + t)$ contains the term $t^n$ with no $s$ at all, $y = \beta_0$ is a truncated solution if and only if $\chi(\alpha + s, \beta_0)$ does not contain a term without $s$ or, to put it more simply, if and only if $\chi(\alpha, \beta_0) = 0$. In short, these truncated solutions $y = \beta_0$ are the roots of $\chi(\alpha, y)$. The ambiguity of such a truncated solution $y = \beta_0$ of $\chi(x, y)$ at $x = \alpha$ is equal to the multiplicity of $\beta_0$ as a root of $\chi(\alpha, y)$, because the ambiguity of the truncated solution is by definition the least index $i$ for which $\Phi_i(0) \ne 0$, where $\chi(\alpha + s, \beta_0 + t) = \Phi_0(s) + \Phi_1(s)t + \Phi_2(s)t^2 + \cdots + t^n$, which is the multiplicity of $\beta_0$ as a root of $\chi(\alpha, y)$. In particular, if all roots of $\chi(\alpha, y)$ are simple, the
Newton polygon algorithm applied to any one of the $n$ unambiguous truncated solutions $y = \beta_0$ generates an infinite series solution $y$ of $\chi(x, y) = 0$ at $x = \alpha$, which proves the theorem in this case.

In the general case, one can apply the following algorithm:

Input: A set of truncated solutions of $\chi(x, y) = 0$ at $x = \alpha$.

Algorithm: While the set contains a truncated solution whose ambiguity is greater than 1, let the Newton polygon algorithm be used to replace one such truncated solution with one or more longer truncated solutions.

The theorem will be proved by proving that this algorithm terminates (that is, it reaches a stage at which all truncated solutions in the set that has been found are unambiguous) and by proving that each step leaves the sum of the ambiguities unchanged, so that if the algorithm starts with the truncated solutions $y = \beta_0$, the sum of whose ambiguities is $\deg_y \chi(x, y) = n$ (because this sum is the sum of the multiplicities of the roots $\beta_0$ of $\chi(\alpha, y)$), it terminates with a set of $n$ unambiguous truncated solutions, which then imply $n$ infinite series solutions.

That the sum of the ambiguities does not change can be seen as follows: Let the notation be as in the description of Newton's polygon. The ambiguity of the input truncated solution is the least index $I$ for which $j_i$ attains its minimum value $J$. Since the segments of the Newton polygon join $(0, j_0)$ to $(I, J)$ and since the number of nonzero roots, counted with multiplicities, of any $\eta(c)$ is its degree minus the number of times $c$ divides it, which is the difference $i_2 - i_1$ of the $i$-coordinates of the endpoints of the corresponding segment, the ambiguity of the input truncated solution is the total number of nonzero roots, counted with multiplicities, of the polynomials $\eta(c)$ corresponding to segments of the polygon.
Since, as was noted above, the multiplicity of $\gamma$ as a root of $\eta(c)$ is the ambiguity of the output truncated solution corresponding to $\gamma$, the sum of the ambiguities of the output solutions is the ambiguity of the input solution, as was to be shown.

Each step of the above algorithm increases the number of truncated solutions in the list unless the input truncated solution, which has ambiguity greater than 1 by assumption, yields a single output truncated solution, which means that $\eta(c)$ is a constant times $(c - \gamma)^{\mu}$ for some nonzero algebraic number $\gamma$, where $\mu$ is the ambiguity of the input solution. It will be shown that the number of steps of this type is bounded above, so that repeated application of the Newton polygon algorithm eventually must increase the number of truncated solutions in the set. Since the total of their ambiguities is $n$ at each step, it will follow that the process must terminate with $n$ unambiguous truncated solutions, and the theorem will be proved.

Suppose, therefore, that the (ambiguous) input truncated solution is one that produces a single output truncated solution. It is to be shown that iteration of the algorithm eventually produces more than one output truncated solution. Let $\mu$ be the ambiguity of the input solution, which is therefore the ambiguity of each subsequent output solution as long as there is only one of
them. As was just noted, when there is only one output truncated solution, $\eta(c)$ is a constant times $(c - \gamma)^{\mu}$ for some algebraic number $\gamma$, which implies that Newton's polygon consists of a single segment that passes through pairs $(i, j_i)$ in which $i$ has all values from 0 to $\mu$ because $\eta(c)$ contains terms in which $c$ has all of these exponents. The single segment of the polygon is $(j_0 - j_\mu)i + \mu j = k$, where $k = \mu j_0$. Then $(j_0 - j_\mu) \cdot 1 + \mu j_1 = \mu j_0$, which shows that $j_0 - j_\mu$ is divisible by $\mu$. Therefore, the segment can also be written $\sigma i + j = j_0$, where $\sigma = \frac{j_0 - j_\mu}{\mu}$; that is, $\tau$ can be taken to be 1, so that $s_1 = s$. Then (2) is divisible at least $j_0$ times by $s$ (because $k = j_0$), whereas $\chi(\alpha + s^m, \beta_0 + \beta_1 s + \cdots + \beta_h s^h + t s^h)$ is, by the definition of $j_\mu$, divisible exactly $j_\mu$ times by $s$. In other words, adding the next term $\gamma s^{h + \sigma}$ to the truncated solution increases the number of times $s$ divides $\chi(\alpha + s^m, \beta_0 + \beta_1 s + \cdots + \beta_h s^h + t s^h)$ from $j_\mu$ to at least $j_0 = j_\mu + \sigma\mu$. Thus, if $\nu$ successive steps repeat the phenomenon of producing a single output truncated solution, it produces a truncated solution $y = \beta_0 + \cdots + \beta_h s^h + \gamma_1 s^{h + \sigma_1} + \gamma_2 s^{h + \sigma_1 + \sigma_2} + \cdots + \gamma_\nu s^{h + \Sigma}$, where $\Sigma = \sigma_1 + \sigma_2 + \cdots + \sigma_\nu$, for which $\chi(\alpha + s^m, y + t s^{h + \Sigma})$ is divisible $j_\mu + \mu\Sigma$ times by $s$, say

$$\chi\left(\alpha + s^m,\ \beta_0 + \beta_1 s + \cdots + \beta_h s^h + \cdots + \gamma_\nu s^{h + \Sigma} + t s^{h + \Sigma}\right) = s^{j_\mu + \mu\Sigma}\, q(s, t).$$
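Differentiation with respect to $t$ gives

$$s^{h + \Sigma}\, \frac{\partial \chi}{\partial y}\left(\alpha + s^m,\ \beta_0 + \beta_1 s + \cdots + \beta_h s^h + \cdots + \gamma_\nu s^{h + \Sigma} + t s^{h + \Sigma}\right) = s^{j_\mu + \mu\Sigma}\, \frac{\partial q}{\partial t}(s, t).$$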
On the other hand, elimination of $y$ between $\chi(x, y)$ and $\frac{\partial \chi}{\partial y}(x, y)$ (see Essay 1.3) gives, because the irreducibility of $\chi$ implies that these polynomials are relatively prime, an equation of the form

$$A(x, y)\,\chi(x, y) + B(x, y)\,\frac{\partial \chi}{\partial y}(x, y) = D(x),$$
in which $A(x, y)$, $B(x, y)$, and $D(x)$ are polynomials with integer coefficients. Substitution of $x = \alpha + s^m$ and $y = \beta_0 + \beta_1 s + \cdots + \beta_h s^h + \cdots + \gamma_\nu s^{h + \Sigma} + t s^{h + \Sigma}$ in $A(x, y)\chi(x, y) + B(x, y)\frac{\partial \chi}{\partial y}(x, y) = D(x)$ gives $D(\alpha + s^m)$ on the right and on the left gives a polynomial in $s$ and $t$ that is divisible at least $(j_\mu + \mu\Sigma) - (h + \Sigma)$ times by $s$. Thus $j_\mu + (\mu - 1)\Sigma - h$ is bounded above by the number of times $s$ divides $D(\alpha + s^m)$. Since $\mu > 1$, this implies an upper bound on $\Sigma$; but $\Sigma \ge \nu$, because $\Sigma$ is a sum of $\nu$ terms, each of which is at least 1, so $\nu$ is bounded above, and the proof of Theorem 1 is complete.*

* Walker's proof of this point [61, p. 102] is not constructive, because he jumps from the observation that the ambiguity can never increase and can never go below 1 to the conclusion that he can find a step beyond which the ambiguity never decreases.
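Spelled out, the bound used in the last step of the proof of Theorem 1 reads as follows. The symbol $N$ is an assumed name, not used in the original, for the number of times $s$ divides $D(\alpha + s^m)$; it is finite because $D(x) \ne 0$.

$$(j_\mu + \mu\Sigma) - (h + \Sigma) \le N \quad\Longrightarrow\quad \nu \le \Sigma \le \frac{N + h - j_\mu}{\mu - 1}.$$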
Theorem 2. Every truncated solution of $\chi(x, y) = 0$ for $x = \alpha$ is a truncation of one of the infinite series solutions constructed by Theorem 1.

Proof. As was shown prior to the statement of the Newton polygon algorithm, if an infinite series solution is truncated, and the algorithm is applied to the result, one of the outputs is the truncated series with the next nonzero term after the truncation added. Therefore, any truncated solution is among the outputs if one starts with the truncated solutions $y = \beta_0$ in which $\chi(\alpha, \beta_0) = 0$ and repeatedly applies the algorithm. Since these are the truncated solutions constructed by Theorem 1, Theorem 2 follows.
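As a check on the folium example worked earlier in this essay, the series can be recomputed by fixed-point iteration on truncated power series. The sketch below is an illustration only: the helper names (`mul`, `folium_t`, `branch_t`, `residual`) are hypothetical, series are plain coefficient lists with the constant term first, and the iteration is a simple substitution scheme rather than the Newton polygon algorithm itself.

```python
from fractions import Fraction

def mul(a, b, n):
    """Product of two power series (coefficient lists), truncated below s**n."""
    c = [0] * n
    for i, ai in enumerate(a[:n]):
        if ai:
            for j, bj in enumerate(b[:n - i]):
                c[i + j] += ai * bj
    return c

def folium_t(n):
    """Solve t = s**3 * (1 + t)**3 for t as a series in s, modulo s**n."""
    t = [0] * n
    for _ in range(n // 3 + 1):        # each pass fixes at least three more orders
        u = t[:]
        u[0] += 1                      # u = 1 + t
        cube = mul(mul(u, u, n), u, n)
        t = ([0, 0, 0] + cube)[:n]     # multiply by s**3 and truncate
    return t

def branch_t(n):
    """Solve s**3 + 2t + 3t**2 + t**3 = 0 for t, modulo s**n (plus sign)."""
    half = Fraction(1, 2)
    t = [Fraction(0)] * n
    for _ in range(n // 3 + 1):
        t2 = mul(t, t, n)
        t3 = mul(t2, t, n)
        t = [-3 * half * a - half * b for a, b in zip(t2, t3)]
        t[3] -= half                   # the term -(1/2) s**3
    return t

def residual(y, n):
    """Coefficients of y**3 - x*y + x**3 below x**n, for a series y in x."""
    y = (y + [0] * n)[:n]
    y3 = mul(mul(y, y, n), y, n)
    xy = ([0] + y)[:n]
    x3 = [0, 0, 0, 1] + [0] * (n - 4)
    return [a - b + c for a, b, c in zip(y3, xy, x3)]

t_first = folium_t(15)                 # branch y = x**2 (1 + t), with s = x
y_first = [0, 0, 1] + t_first[1:]      # coefficients of y in powers of x
t_second = branch_t(12)                # branch y = s + s*t, with s = sqrt(x)
```

Under these assumptions `t_first` has coefficients 1, 3, 12, 55 at $s^3, s^6, s^9, s^{12}$, `t_second` has $-\frac{1}{2}, -\frac{3}{8}, -\frac{1}{2}$ at $s^3, s^6, s^9$, and `residual(y_first, 17)` vanishes identically.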
Essay 4.5 Determination of the Genus

On doit donner au problème une forme telle qu'il soit toujours possible de le résoudre, ce qu'on peut toujours faire d'un problème quelconque. (One should give the problem a form in which it will always be possible to solve it, which can always be done for any problem whatever.)
Niels Henrik Abel [2, p. 217]

I confess that the meaning of this dictum of Abel's is not altogether clear to me. Certainly it sounds like good advice, if one can understand what it means. My best guess is that he means something like what Kronecker meant when he said that one should require of one's definitions that one be able to determine by a finite calculation whether the definition is fulfilled in any given case. In the case of the determination of the genus of an algebraic field of transcendence degree one (the genus of a given algebraic curve) I believe both men would focus on constructive techniques like the ones given in this essay.

The genus was described in Essay 4.3 in terms of the dimensions of the spaces $\Theta(x^{\nu})$ of elements of the root field of $\chi(x, y)$ (as always, an irreducible polynomial with integer coefficients that contains both $x$ and $y$ and is monic in $y$) that are integral over $x$ and become integral over $1/x$ when they are divided by $x^{\nu}$. These dimensions can be determined easily once one constructs what Dedekind and Weber [14] called a normal basis of the root field.

Theorem. For a given $\chi(x, y)$, construct a subset $y_1, y_2, \ldots, y_n$ of its root field with the property that $y_1, y_2, \ldots, y_n$ is a basis of the field over the field of rational functions in $x$ in the sense that each element $w$ of the root field has a unique representation in the form (1)
$$w = \phi_1(x) y_1 + \phi_2(x) y_2 + \cdots + \phi_n(x) y_n,$$
where the coefficients $\phi_i(x)$ are rational functions of $x$, and is an integral basis in the sense that $w$ is integral over $x$ if and only if each coefficient $\phi_i(x)$ in its representation (1) is a polynomial with rational coefficients, and further is a normal basis in the sense that $w$ is in $\Theta(x^{\nu})$ if and only if each coefficient $\phi_i(x)$ in its representation (1) is not only a polynomial but also satisfies $\deg \phi_i + \lambda_i \le \nu$, where $\lambda_i$ is the order of $y_i$ at $x = \infty$ for each $i$, that is, the least integer for which $y_i$ is in $\Theta(x^{\lambda_i})$.

Proof. Dedekind and Weber gave what appears to be an algorithm for constructing an integral basis (their §3), but their construction relies on the assumption that for a given constant $\alpha$ one can either find an element $y$ that is integral over $x$ and remains integral over $x$ when it is divided by $x - \alpha$ or prove that there is no such $y$. The proof below uses, in essence, the method of Newton's polygon to justify this assumption and then constructs an integral basis using a method similar to theirs. However, they also assume that
a polynomial with rational coefficients can be written as a product of linear factors (they assume complex number coefficients) and the proof below is a modified version of theirs that adjoins only the constants that are needed.

The first step will be to find a common denominator of the elements integral over $x$. The operation of multiplication by an element of the field can be described by the $n \times n$ matrix of rational functions of $x$ that describes it with respect to the basis $1, y, \ldots, y^{n-1}$ of the root field as a vector space over the field of rational functions in $x$. In other words, an element $z$ of the root field of $\chi(x, y)$ can be described by the matrix whose entry $m_{ij}$ in the $i$th row of the $j$th column is the rational function of $x$ that is the coefficient of $y^{j-1}$ in the representation of $z y^{i-1}$ with respect to the basis $1, y, \ldots, y^{n-1}$. The trace $\sum_{i=1}^{n} m_{ii}$ of the matrix obtained in this way is the trace of $z$ with respect to $x$.

Lemma. If an element of the root field of $\chi(x, y)$ is integral over $x$, its trace is a polynomial in $x$ with rational coefficients.

Let $\psi = p(x, y)/q(x)$ be integral over $x$. Then, by the definition of integrality, there is a relation of the form $F(\psi) = 0$, in which $F$ is a monic polynomial with coefficients in $\mathbf{Q}[x]$. Since $F$ can be written as a product of irreducible, monic polynomials with coefficients in $\mathbf{Q}[x]$, $\psi$ must be a root of an irreducible monic polynomial with coefficients in $\mathbf{Q}[x]$; call it $F_1$. By the proposition of Essay 2.3, the root field of $\chi(x, y)$, because it contains $\psi$ and is generated over $\mathbf{Q}(x)$ by $y$, can be described by two adjunction relations $f_1(\psi) = 0$ and $f_2(y, \psi) = 0$, where $f_1$ and $f_2$ have coefficients that are rational functions of $x$, $f_1$ is monic of degree $\nu_1$, say, and is irreducible, while $f_2$ is monic of degree $\nu_2$, say, in $y$ and is irreducible as a polynomial in $y$ with coefficients in the field of rational functions in $x$ with $\psi$ adjoined.
Because $f_1$ and $F_1$ both have $\psi$ as a root, because both are monic with coefficients that are rational functions of $x$, and because both are irreducible over the field of rational functions in $x$ ($F_1$ is irreducible in this sense by virtue of Gauss's lemma), $f_1 = F_1$. In particular, the coefficients of $f_1$ are not just rational functions of $x$, they are polynomials in $x$ with rational coefficients. The trace of $\psi$ is by definition the trace of the matrix that represents multiplication by $\psi$ relative to the basis $1, y, \ldots, y^{n-1}$ of the root field over the field of rational functions in $x$. Therefore (because $\operatorname{tr}(AB) = \operatorname{tr}(BA)$, so $\operatorname{tr}(M^{-1}AM) = \operatorname{tr}(AMM^{-1}) = \operatorname{tr}(A)$), it is the trace of the matrix that represents multiplication by $\psi$ relative to any basis. In particular, it is the trace of the matrix that represents multiplication by $\psi$ relative to the basis $\psi^i y^j$, $0 \le i < \nu_1$, $0 \le j < \nu_2$. When the elements of this basis are suitably ordered, the matrix that represents multiplication by $\psi$ becomes a $\nu_2 \times \nu_2$ matrix of $\nu_1 \times \nu_1$ blocks (note that $\nu_1 \nu_2 = n$) in which the blocks off the diagonal are all 0 and the blocks on the diagonal are all the same matrix: Its first $\nu_1 - 1$ rows are the last $\nu_1 - 1$ rows of $I_{\nu_1}$, and its last row contains the negatives of the coefficients (after the first) of the polynomial $f_1 = F_1$,
listed in reverse order. In particular, its entries are all in $\mathbf{Q}[x]$, so its trace is in $\mathbf{Q}[x]$. (In fact, its trace, and therefore the trace of $\psi$, is simply $-\nu_2$ times the second coefficient of $F_1$.)

The matrix, call it $S$, whose entry in the $i$th row of the $j$th column is the trace of $y^{i+j-2}$, is a matrix of polynomials in $x$ with integer coefficients. Therefore, its determinant, call it $D(x)$, is a polynomial in $x$ with integer coefficients. The lemma implies that $D(x)$ is a common denominator of the elements of the root field integral over $x$. In fact, if $p(x, y)/q(x)$ is integral over $x$, where $p(x, y)$ and $q(x)$ are polynomials with rational coefficients and $q(x) \ne 0$, and if it is in lowest terms, then not only does $q(x)$ divide $D(x)$, but so does $q(x)^2$. This can be proved as follows:

The matrix $S$ of which $D(x)$ is the determinant represents the bilinear form "the trace of the product" on the root field of $\chi(x, y)$ relative to the basis $1, y, \ldots, y^{n-1}$. This observation implies that $D(x) \ne 0$, because if $D(x)$ were zero, there would be a solution $v(x)$ of $S \cdot v(x) = 0$ that was a nonzero column matrix whose entries $v_i(x)$ were rational functions of $x$, and this would imply $\operatorname{tr}_x(wv) = 0$ for all elements $w$ of the root field, where $v = \sum_{i=1}^{n} v_i(x) y^{i-1}$, contrary to the fact that $\operatorname{tr}_x(wv) = n$ when $w$ is the reciprocal of $v$.

If $p(x, y)/q(x)$ is integral over $x$ and in lowest terms in the sense that $q(x)$ and the coefficients $p_i(x)$ of $p(x, y) = p_0(x) + p_1(x)y + \cdots + p_{n-1}(x)y^{n-1}$ have no common divisor of positive degree, and if one of the coefficients $p_i(x)$ is nonzero, then a new basis of elements integral over $x$ is obtained by replacing $y^i$ with $p(x, y)/q(x)$ in the basis $1, y, \ldots, y^{n-1}$. The entries of the matrix that represents the bilinear form "the trace of the product" relative to this new basis are polynomials in $x$ with rational coefficients, because they are traces of elements integral over $x$. Therefore, its determinant is a polynomial in $x$.
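The matrix $S$ and its determinant $D(x)$ can be computed mechanically in a small case. The sketch below does so for the folium $\chi(x, y) = y^3 - xy + x^3$ of Essay 4.4; it is an illustration only. Polynomials in $x$ are represented as integer coefficient lists with the constant term first (the empty list standing for 0), all helper names are hypothetical, and the determinant is expanded over permutations, which is practical only for small $n$.

```python
from functools import reduce
from itertools import permutations

def padd(a, b):
    """Sum of two polynomials in x."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
            for i in range(n)]

def pmul(a, b):
    """Product of two polynomials in x."""
    c = [0] * (len(a) + len(b) - 1) if a and b else []
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def trim(a):
    """Drop trailing zero coefficients."""
    a = list(a)
    while a and a[-1] == 0:
        a.pop()
    return a

def matmul(A, B):
    n = len(A)
    return [[trim(reduce(padd, (pmul(A[i][k], B[k][j]) for k in range(n)), []))
             for j in range(n)] for i in range(n)]

def trace(M):
    return trim(reduce(padd, (M[i][i] for i in range(len(M))), []))

def det(M):
    """Determinant by expansion over permutations (small matrices only)."""
    n = len(M)
    total = []
    for perm in permutations(range(n)):
        inversions = sum(perm[a] > perm[b]
                         for a in range(n) for b in range(a + 1, n))
        term = [(-1) ** inversions]
        for i in range(n):
            term = pmul(term, M[i][perm[i]])
        total = padd(total, term)
    return trim(total)

n = 3
# Multiplication by y on the basis 1, y, y^2, using y^3 = x*y - x^3.
C = [[[], [1], []],
     [[], [], [1]],
     [[0, 0, 0, -1], [0, 1], []]]
powers = [[[[1] if i == j else [] for j in range(n)] for i in range(n)]]
for _ in range(2 * n - 2):
    powers.append(matmul(powers[-1], C))
# Entry (i, j), counted from 1, is the trace of y^(i+j-2).
S = [[trace(powers[i + j]) for j in range(n)] for i in range(n)]
D = det(S)
```

With these conventions the computation gives $D(x) = 4x^3 - 27x^6$. As an example chosen here for illustration, $z = y^2/x$ satisfies $z^2 - z + xy = 0$ and so is integral over $x$; its denominator is $x$, and $x^2$ does divide $D(x)$.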
On the other hand, its determinant is $\left(\frac{p_i(x)}{q(x)}\right)^2 D(x)$, because the matrix that makes the transition from one basis to the other is the identity matrix with the $(i+1)$st row replaced by a new row consisting of the coefficients of $p(x, y)/q(x)$ and which therefore has $\frac{p_i(x)}{q(x)}$ in its $(i+1)$st column, so both the transition matrix and its transpose have determinant $\frac{p_i(x)}{q(x)}$. Therefore, $q(x)^2$ divides $p_i(x)^2 D(x)$ for each $i$ (trivially so when $p_i(x) = 0$). Thus, $q(x)^2$ divides the greatest common divisor of these polynomials $p_i(x)^2 D(x)$, which is the greatest common divisor of the $p_i(x)^2$ times $D(x)$. Since $p(x, y)/q(x)$ is in lowest terms by assumption, $q(x)^2$ is relatively prime to the greatest common divisor of the $p_i(x)^2$, so $q(x)^2$ divides $D(x)$, as was to be shown.

Since every element $p(x, y)/q(x)$ of the root field can be written in the form $P(x, y) + \frac{r(x, y)}{q(x)}$, where $P(x, y)$ is a polynomial in $x$ and $y$ and where $\frac{r(x, y)}{q(x)}$ is a proper fraction in the sense that $\deg_x r < \deg q$ (and, as it is natural to assume, $\deg_y r$
... coefficients, the following proposition reduces this determination to the solution of a system of homogeneous linear equations.

Proposition. A rational function $p(x, y)/q(x)$ with rational coefficients is integral over $x$ if and only if for each algebraic number $\alpha$ that is a root of $q(x)$ and for each infinite series solution $y$ of $\chi(x, y) = 0$ in fractional powers of $x - \alpha$ given by Newton's polygon, all terms of the power series in $s$ that results from substituting the series for $y$ in $p(x, y)$ and then substituting $\alpha + s^m$ for $x$, where $m$ clears the denominators in the fractional exponents, are divisible by $s$ at least as many times as the polynomial $q(\alpha + s^m)$ is.

Loosely speaking, the condition is that each expression of $p(x, y)/q(x)$ obtained by using an expansion of $y$ as a power series in fractional powers of $x - \alpha$, where $\alpha$ is a root of $q(x)$, and writing the reciprocal of $q(x)$ as a negative power of $x - \alpha$ times a power series in $x - \alpha$ with nonzero constant term, contains nonnegative exponents exclusively; in short, $p(x, y)/q(x)$ has no poles where $x$ is finite.

Proof. Let $\Psi(x, p) = p^{\nu} + c_1(x)p^{\nu - 1} + \cdots + c_{\nu}(x)$ be the irreducible, monic polynomial* whose coefficients $c_i(x)$ are rational functions of $x$ of which $p(x, y)$, regarded as an element of the root field, is a root. Because $p(x, y)$ is integral over $x$ (it is a polynomial in $y$ with coefficients in $\mathbf{Q}[x]$), the coefficients $c_i(x)$ are polynomials in $x$ with rational coefficients. To say that $p(x, y)/q(x)$ is integral over $x$ means that $c_i(x)$ is divisible by $q(x)^i$ for each $i$. It is to be shown that this is true if and only if $p(\alpha + s^m, \beta_0 + \beta_1 s + \beta_2 s^2 + \cdots) \equiv 0 \bmod s^e$ for all infinite series solutions $y = \beta_0 + \beta_1 s + \beta_2 s^2 + \cdots$ of $\chi(x, y) = 0$, where $\alpha$ is a root of $q(x)$, where $s = (x - \alpha)^{1/m}$, and where $e$ is the number of times that $s$ divides $q(\alpha + s^m)$.
By definition, to say that $\Psi(x, p) = 0$, where $x$ and $p$ are regarded as elements of the root field, means that $\Psi(x, p(x, y)) \equiv 0 \bmod \chi(x, y)$. In other words, it means that $\Psi(x, p(x, y)) = g(x, y)\chi(x, y)$ for some polynomial $g(x, y)$ with rational coefficients. Since $\chi(\alpha + s^m, \beta_0 + \beta_1 s + \cdots + \beta_h s^h) \equiv 0 \bmod s^{h+1}$ for each $h$, it follows that $\Psi(\alpha + s^m, p(\alpha + s^m, \beta_0 + \beta_1 s + \cdots + \beta_h s^h)) \equiv 0 \bmod s^{h+1}$ for each $h$. Therefore, $p(\alpha + (x - \alpha), \beta_0 + \beta_1 (x - \alpha)^{1/m} + \cdots + \beta_h (x - \alpha)^{h/m})$, when it is regarded as a polynomial in $(x - \alpha)^{1/m}$ and truncated by omitting all terms in which the exponent is larger than $h/m$, is a truncated solution of $\Psi(x, p) = 0$ for $x = \alpha$. By Theorem 2 of the last essay, it is therefore a truncation of one of the infinite series solutions $p$ of $\Psi(x, p) = 0$ for $x = \alpha$ found by the construction† of Theorem 1 of the last essay. It is to be shown,

* This polynomial can be found because $\Psi(x, p)$ is a factor of the characteristic polynomial of the matrix that represents multiplication by $p(x, y)$ relative to the basis $1, y, \ldots, y^{n-1}$.

† Strictly speaking, this construction does not apply to $\Psi(x, p)$, because its coefficients are rational and the description of the Newton polygon algorithm in Essay 4.4 assumes that the coefficients of the given equation $\chi(x, y) = 0$ are integers, but the algorithm applies without modification to the case of rational coefficients.
therefore, that $q(x)^i$ divides $c_i(x)$ for each $i$ if and only if for each root $\alpha$ of $q(x)$ in an algebraic number field, every infinite series solution $p$ of $\Psi(x, p) = 0$ in fractional powers of $x - \alpha$ is divisible by the highest power of $x - \alpha$ that divides $q(x) = q(\alpha + (x - \alpha))$.

Suppose first that $q(x)^i$ divides $c_i(x)$ for each $i$. For a given root $\alpha$ of $q(x)$ whose multiplicity is $e$, $(x - \alpha)^{ei}$ then divides $c_i(x)$. The initial term of any infinite series solution $p$ of $\Psi(x, p) = 0$ in fractional powers of $x - \alpha$ can be found using the method by which the Newton's polygon algorithm finds the next term of a truncated series solution. Specifically, the equation $\Psi(\alpha + s, p) = c_{\nu}(\alpha + s) + c_{\nu - 1}(\alpha + s)p + \cdots + c_1(\alpha + s)p^{\nu - 1} + p^{\nu} = 0$ shows, because the terms of lowest degree cancel, that the lowest order term of a series expansion $p = \gamma s^{\sigma/\tau} + \cdots$ corresponds to a segment of the "Newton polygon" dictated by the points $(i, j_i)$, where $j_i$ for $i = 0, 1, \ldots, \nu$ is the number of times $s = x - \alpha$ divides $c_{\nu - i}(\alpha + s)$, except that $j_i$ is undefined when $c_{\nu - i}(x) = 0$. Since $(x - \alpha)^{e(\nu - i)}$ divides $c_{\nu - i}(x)$, $j_i$ is at least $e(\nu - i)$ whenever it is defined. In particular, the minimum value 0 of $j_i$ occurs only for $i = \nu$. The rightmost segment of the polygon, call it $\sigma i + \tau j = k$, therefore has $(\nu, 0)$ as its right end; its other end is at a point $(i, j_i)$ for which $\sigma i + \tau j_i = k = \sigma\nu + \tau \cdot 0$. For this index $i$, both $j_i = \frac{\sigma}{\tau}(\nu - i)$ and $j_i \ge e(\nu - i)$ hold. Therefore, for this segment of the polygon, $\frac{\sigma}{\tau} \ge e$. All infinite series solutions $p = \gamma(x - \alpha)^{\sigma/\tau} + \cdots$ that correspond to this segment of the polygon are therefore divisible by $(x - \alpha)^e$. As is easily shown, the ratio $\frac{\sigma}{\tau}$ is smallest for this rightmost segment,* so all solutions $p = \gamma(x - \alpha)^{\sigma/\tau} + \cdots$ are divisible by $(x - \alpha)^e$, as was to be shown.
Conversely, if $q(x)^i$ fails to divide $c_i(x)$ for some $i$, then $(x - \alpha)^{ei}$ fails to divide $c_i(x)$ for some root $\alpha$ of multiplicity $e$ of $q(x)$ and some index $i$.

† (continued) Moreover, $\chi(x, y)$ was assumed in Essay 4.4 to be irreducible. The series expansions of a reducible polynomial can be found by finding the expansions of its irreducible factors.

* What is to be shown is that the ratio $\sigma/\tau$ for any segment of the polygon is larger than the ratio $\sigma/\tau$ for the segment to its right. Since $\sigma/\tau$ is minus the slope of the segment, this is the statement that the slopes of the segments increase as one moves from left to right, which is evident. In actual inequalities, the three endpoints of two successive segments of Newton's polygon, call them $(r, j_r)$, $(s, j_s)$, $(t, j_t)$, satisfy

$$\sigma r + \tau j_r = \sigma s + \tau j_s < \sigma t + \tau j_t,$$
$$\sigma' r + \tau' j_r > \sigma' s + \tau' j_s = \sigma' t + \tau' j_t,$$

where $\sigma'$ and $\tau'$ pertain to the segment from $(s, j_s)$ to $(t, j_t)$, from which

$$\tau(j_r - j_s) = \sigma(s - r) \quad\text{and}\quad \tau'(j_r - j_s) > \sigma'(s - r)$$

follow. Therefore

$$\frac{\sigma}{\tau} = \frac{j_r - j_s}{s - r} > \frac{\sigma'}{\tau'},$$

as was to be shown.
For such an $\alpha$ the points $(i, j_i)$ of the polygon arising from $\Psi(\alpha + s, p) = c_{\nu}(\alpha + s) + c_{\nu - 1}(\alpha + s)p + \cdots + p^{\nu}$ include at least one for which $e(\nu - i) > j_i$. If $j_i = 0$ for some $i < \nu$, then $\Psi(\alpha, p)$ contains a term of degree less than $\nu$ in $p$, so this polynomial in $p$ has a nonzero root, call it $\beta_0$, and there is a solution $p = \beta_0 + \cdots$ of $\Psi(x, p) = 0$ that is not divisible by $s = x - \alpha$, and therefore not divisible by $(x - \alpha)^e$. Otherwise, as before, the rightmost segment of the polygon, call it $\sigma i + \tau j = k$, passes through $(\nu, 0)$ and at least one other point of the form $(i, j_i)$. At least one point $(i, j_i)$ lies below the line $j = e(\nu - i)$ of slope $-e$ passing through $(\nu, 0)$; since all points $(i, j_i)$ lie on or above any segment of the polygon, the rightmost segment $j = \frac{\sigma}{\tau}(\nu - i)$ must lie under the line $j = e(\nu - i)$ for $i < \nu$. Thus, $\frac{\sigma}{\tau} < e$, so no solution $p = \gamma(x - \alpha)^{\sigma/\tau} + \cdots$ arising from this segment of the polygon is divisible by $(x - \alpha)^e$, and the proof is complete.

Thus, in a proper fraction $r(x, y)/q(x)$ that is integral over $x$, the coefficients of $r(x, y)$ satisfy a homogeneous system of linear equations, so the most general such fraction can be written as a linear combination of a finite number of them, say of $\xi_1, \xi_2, \ldots, \xi_k$, with rational coefficients. When these elements $\xi_1, \xi_2, \ldots, \xi_k$ together with $1, y, y^2, \ldots, y^{n-1}$ are taken as input to the following algorithm of Kronecker ([39, §7]), the algorithm produces an integral basis of the root field of $\chi(x, y)$ as described in the statement of the theorem.

Construction of an Integral Basis

Input: Elements $y_1, y_2, \ldots, y_l$ of the root field of $\chi(x, y)$ integral over $x$ that span the elements integral over $x$ in the sense that each element integral over $x$ can be expressed in the form $\sum_{i=1}^{l} \phi_i(x) y_i$ where the coefficients $\phi_i(x)$ are polynomials in $x$ with rational coefficients. (At the outset, $l = n + k$, and the coefficients of the $\xi_i$ can be taken to be rational numbers.)
Algorithm: As long as the number $l$ of elements in the spanning set is greater than $n$, carry out the following operations. Consider the $l \times l$ symmetric matrix $[\operatorname{tr}_x(y_i y_j)]$ and consider its symmetric $n \times n$ minor determinants, that is, those $n \times n$ minor determinants in which the indices of the $n$ columns selected coincide with those of the $n$ rows selected. Each such minor determinant is a polynomial in $x$ with rational coefficients because all of its entries are. Rearrange $y_1$, $y_2$, ..., $y_l$, if necessary, to make the first such minor (the one formed by selecting the first $n$ rows and columns) nonzero and of degree no greater than that of any other nonzero symmetric $n \times n$ minor. Then the first $n$ entries of $y_1$, $y_2$, ..., $y_l$ are linearly independent over $\mathbf{Q}(x)$, which means that each remaining entry $y_{n+1}$, $y_{n+2}$, ..., $y_l$ can be expressed as a sum of multiples of the first $n$ in which the multipliers are rational functions of $x$. Each multiplier in each of these expressions can be written as a polynomial in $x$ plus a proper rational function of $x$, one in which the degree of the numerator is less than the degree of the denominator. Let polynomial multiples of the first $n$ of the $y$'s be subtracted from the later $y$'s in order to make the multipliers in the
representations of the later $y$'s in terms of the first $n$ all proper rational functions. Delete any $y$'s that have become zero as a result of these subtractions, rearrange the list again, and repeat.

Output: A list $y_1$, $y_2$, ..., $y_n$ of just $n$ elements integral over $x$ that span, over $\mathbf{Q}[x]$, the set of all elements integral over $x$.

The operations of the algorithm (rearrange the $y$'s, delete zeros, and subtract one $y$ times a polynomial in $x$ with rational coefficients from another $y$) do not change the condition satisfied by the original set of $y$'s that they span the elements integral over $x$ when coefficients that are polynomials in $x$ with rational coefficients are used. An argument like the one above that proves that $D(x)$ is a common denominator of the elements integral over $x$ proves that each iteration of the algorithm reduces the degree of the determinant of the first $n \times n$ symmetric minor. Specifically, if, after the multipliers in the representations of $y_{n+1}$, $y_{n+2}$, ..., $y_l$ as sums of multiples of $y_1$, $y_2$, ..., $y_n$ have been reduced so that they are proper rational functions, and after zeros have been deleted, there are more than $n$ items in the list, then one of the coefficients (say the coefficient of $y_1$) in the representation of $y_{n+1}$ is a nonzero proper fraction, call it $\frac{p(x)}{q(x)}$, where $\deg p < \deg q$. The symmetric $n \times n$ minor for any selection of $n$ indices is a polynomial. As before, $M_1 = \left(\frac{p(x)}{q(x)}\right)^2 M_0$ when $M_1$ is the minor in which the selected indices are $2, 3, \ldots, n+1$ and $M_0$ is the one in which they are $1, 2, \ldots, n$. Thus, $q(x)^2 M_1 = p(x)^2 M_0$, which shows that $\deg M_1 < \deg M_0$. Thus, the minor of least degree has degree less than $\deg M_0$, and $\deg M_0$ decreases with each step, as was to be shown. In this way, the algorithm continues to reduce the degree of the first $n \times n$ minor. By the principle of infinite descent, the algorithm must terminate.
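The degree comparison used in the termination argument can be written out in one line. The display below is only a restatement of that step, using the same $p$, $q$, $M_0$, $M_1$ and the fact that $p$ and $M_0$ are nonzero, so that $M_1$ is nonzero as well:

```latex
q(x)^2 M_1 = p(x)^2 M_0
\;\Longrightarrow\;
2\deg q + \deg M_1 = 2\deg p + \deg M_0
\;\Longrightarrow\;
\deg M_1 = \deg M_0 - 2(\deg q - \deg p) \le \deg M_0 - 2 .
```

Since $\deg q - \deg p \ge 1$ for a proper fraction, the degree of the smallest nonzero symmetric minor drops by at least 2 in each pass, which is what the infinite descent needs.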
In other words, a stage must be reached at which the list contains only $n$ elements. Clearly, they are an integral basis of the root field.

The proof of the theorem will be completed by a second algorithm, which starts with an integral basis and produces a normal basis. It requires that one also construct an integral basis relative to the parameter $u = \frac{1}{x}$; in other words, it uses a set $z_1$, $z_2$, ..., $z_n$ of elements of the root field of $\chi(x, y)$ with the property that every element of the root field has a unique representation in the form $\sum \psi_i(x) z_i$, where the coefficients $\psi_i(x)$ are rational functions of $x$, and that the element is integral over $u = \frac{1}{x}$ if and only if each $\psi_i(x)$ is a polynomial in $\frac{1}{x}$. The algorithm just given can be used to construct such a set $z_1$, $z_2$, ..., $z_n$; simply describe the root field as the root field of $\chi_1(u, v) = \chi(x, y)/x^{\lambda n}$, where $u = \frac{1}{x}$, $v = \frac{y}{x^\lambda}$, and $\lambda$ is large enough to make $\chi_1$ a polynomial in $u$ and $v$. Such an integral basis $z_1$, $z_2$, ..., $z_n$ relative to $\frac{1}{x}$ will be used to determine, given an integral basis $y_1$, $y_2$, ..., $y_n$, whether the basis
\[ \frac{y_1}{x^{\lambda_1}},\ \frac{y_2}{x^{\lambda_2}},\ \ldots,\ \frac{y_n}{x^{\lambda_n}} \]
is an integral basis relative to $\frac{1}{x}$, where $\lambda_i$, for each $i$, is the order of $y_i$ at $x = \infty$; that is, $\lambda_i$ is the least integer for which $y_i/x^{\lambda_i}$ is integral over $\frac{1}{x}$.

Construction of a Normal Basis

Input: An integral basis $y_1$, $y_2$, ..., $y_n$ of the root field of $\chi(x, y)$ relative to $x$.
Algorithm: Find the orders $\lambda_1$, $\lambda_2$, ..., $\lambda_n$ of $y_1$, $y_2$, ..., $y_n$ at $x = \infty$. As long as $\frac{y_1}{x^{\lambda_1}}$, $\frac{y_2}{x^{\lambda_2}}$, ..., $\frac{y_n}{x^{\lambda_n}}$ (which is a basis consisting of elements integral over $\frac{1}{x}$) is not an integral basis relative to $\frac{1}{x}$, construct a new integral basis in which one $y_k$ is replaced by a new $y'_k$ whose order $\lambda'_k$ at $x = \infty$ is less than $\lambda_k$ in the following way. Write each $z_i$ of an integral basis relative to $\frac{1}{x}$ in the form $\sum_j \psi_{ij}(x) \frac{y_j}{x^{\lambda_j}}$, where the $\psi_{ij}(x)$ are rational functions of $x$. By assumption, at least one $\psi_{ij}(x)$ is not a polynomial in $\frac{1}{x}$. (If all were polynomials in $\frac{1}{x}$, then each $z_i$ and therefore each element integral over $\frac{1}{x}$ would be a sum of multiples of the $\frac{y_j}{x^{\lambda_j}}$ with coefficients that were polynomials in $\frac{1}{x}$.) Choose a value of $i$ for which at least one $\psi_{ij}(x)$ is not a polynomial in $\frac{1}{x}$. Since $x^\nu z_i = \sum_j \psi_{ij}(x) x^{\nu - \lambda_j} y_j$ is integral over $x$ for sufficiently large $\nu$, and $y_1$, $y_2$, ..., $y_n$ is an integral basis, the denominator of $\psi_{ij}(x)$ is a power of $x$ for each $j = 1, 2, \ldots, n$, say $\psi_{ij}(x) = x\,\xi_j(x) + \theta_j(\frac{1}{x})$, where $\xi_j(x)$ is a polynomial in $x$, and $\theta_j(\frac{1}{x})$ is a polynomial in $\frac{1}{x}$. By the choice of $i$, $\xi_j(x) \ne 0$ for at least one $j$. Let $\sigma \ge 0$ be the maximum of the degrees of $\xi_1(x)$, $\xi_2(x)$, ..., $\xi_n(x)$. Among those indices $j$ for which $\deg \xi_j = \sigma$, let $k$ be one for which $\lambda_k$ is as large as possible and set $y'_k = \sum_j c_j x^{\lambda_k - \lambda_j} y_j$, where $c_j$ is the coefficient of $x^\sigma$ in $\xi_j(x)$ (which is zero if $\deg \xi_j < \sigma$).

Output: An integral basis $y_1$, $y_2$, ..., $y_n$ with the property that
\[ \frac{y_1}{x^{\lambda_1}},\ \frac{y_2}{x^{\lambda_2}},\ \ldots,\ \frac{y_n}{x^{\lambda_n}} \]
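is an integral basis relative to $\frac{1}{x}$.

Justification. Replacement of $y_k$ with $y'_k$ gives an integral basis, as is shown by the two formulas $y'_k = \sum_j c_j x^{\lambda_k - \lambda_j} y_j$ (note that $\lambda_k \ge \lambda_j$ for all $j$ with $c_j \ne 0$, by the choice of $k$) and $y_k = \frac{1}{c_k} y'_k - \sum_{j \ne k} \frac{c_j}{c_k} x^{\lambda_k - \lambda_j} y_j$ (note that $c_k \ne 0$ by the choice of $k$). All that is to be shown, then, is that $\lambda'_k < \lambda_k$. To this end, note that
\[ \frac{z_i}{x^{\sigma+1}} = \sum_j (c_j + \cdots) \cdot \frac{y_j}{x^{\lambda_j}}, \]
where the omitted terms contain $\frac{1}{x}$, $\frac{1}{x^2}$, $\frac{1}{x^3}$, .... Multiply by $x$ and use the definition of $y'_k$ to obtain
\[ \frac{z_i}{x^{\sigma}} = x \cdot \frac{y'_k}{x^{\lambda_k}} + \sum_j \eta_j\!\left(\tfrac{1}{x}\right) \cdot \frac{y_j}{x^{\lambda_j}}, \]
where $\eta_j(\frac{1}{x})$ for each $j$ is $x \cdot \frac{\psi_{ij}(x) - c_j x^{\sigma+1}}{x^{\sigma+1}}$, which is a polynomial in $\frac{1}{x}$. Thus, $x \cdot \frac{y'_k}{x^{\lambda_k}}$ is a difference of elements integral over $\frac{1}{x}$, which implies that the order of $y'_k$ at $x = \infty$ is at most $\lambda_k - 1$, as was to be shown.

Since the algorithm reduces the sum of the $\lambda_i$ at each step, it must terminate by the principle of infinite descent. When it terminates, the integral basis $y_1$, $y_2$, ..., $y_n$ is a normal basis, because $w = \sum \phi_i(x) y_i$ has order at most $\nu$ if and only if all coefficients of $\frac{w}{x^\nu} = \sum \frac{\phi_i(x)}{x^{\nu - \lambda_i}} \cdot \frac{y_i}{x^{\lambda_i}}$ are polynomials in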
$\frac{1}{x}$, which is true if and only if $\deg \phi_i \le \nu - \lambda_i$, and the proof of the theorem is complete.

If $y_1$, $y_2$, ..., $y_n$ is a normal basis of the root field of $\chi(x, y)$, the elements of $\Theta(x^\nu)$ are those whose representations in the form $\sum \phi_i(x) y_i$ have coefficients $\phi_i(x)$ that are polynomials in $x$, with rational coefficients, of degree at most $\nu - \lambda_i$ for each $i$. When $\nu < \lambda_i$ this condition of course means that $\phi_i(x) = 0$. Therefore, the dimension of $\Theta(x^\nu)$ as a vector space over $\mathbf{Q}$ is the sum of the numbers $\nu - \lambda_i + 1$ over all indices $i$ for which $\lambda_i \le \nu$. For large $\nu$, then, the dimension of $\Theta(x^\nu)$ as a vector space over $\mathbf{Q}$ is exactly $(\nu + 1)n - \sum \lambda_i$. At the other extreme, when $\nu = 0$ this dimension (which is the degree of the field of constants $\Theta(x^0)$ as an extension of $\mathbf{Q}$, denoted by $c$ in Essay 4.3) is simply the number of indices $i$ for which $\lambda_i = 0$. In the notation of Essay 4.3, the genus of the root field of $\chi(x, y)$ is $g = n_0\nu - \dim \Theta(x^\nu) + 1$ for all sufficiently large $\nu$, where $n_0 = n/c$ and the dimension is the dimension as a vector space over the field of constants, which is the dimension as a vector space over $\mathbf{Q}$ divided by $c$; thus,
\[ g = n_0\nu - \frac{1}{c}\Bigl((\nu + 1)n - \sum \lambda_i\Bigr) + 1 = \frac{1}{c}\sum \lambda_i - (n_0 - 1). \]
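The simplification in the formula for $g$ uses only $n = c\,n_0$. Written out step by step, with the same symbols as in the text:

```latex
\begin{aligned}
g &= n_0\nu - \frac{1}{c}\Bigl((\nu+1)\,n - \sum_i \lambda_i\Bigr) + 1 \\
  &= n_0\nu - (\nu+1)\,n_0 + \frac{1}{c}\sum_i \lambda_i + 1
     \qquad (n = c\,n_0) \\
  &= \frac{1}{c}\sum_i \lambda_i - (n_0 - 1).
\end{aligned}
```

The terms in $\nu$ cancel, as they must, since the genus does not depend on the choice of the large integer $\nu$.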
In particular, when $\mathbf{Q}$ is the field of constants of the root field of $\chi(x, y)$, the genus of the root field is simply
\[ \Bigl(\sum \lambda_i\Bigr) - (n - 1), \]
where $n = \deg_y \chi$ and $\lambda_1$, $\lambda_2$, ..., $\lambda_n$ are the orders of the elements $y_1$, $y_2$, ..., $y_n$ of a normal basis of the field.

As the discussion of Essay 4.3 already shows, the natural description of the genus uses the field of constants of the root field under consideration instead of the field of rational numbers:

Determination of the Genus. As was just explained, the construction of the theorem gives a basis over $\mathbf{Q}$ of the field of constants of the root field of $\chi(x, y)$, namely, the elements $y_i$ of order zero in a normal basis. When the field $\mathbf{Q}$ is replaced by the (possibly) larger field of constants in the theorem, the construction gives a subset $y_1$, $y_2$, ..., $y_{n_0}$ of the root field of $\chi(x, y)$ and nonnegative integers $\mu_1$, $\mu_2$, ..., $\mu_{n_0}$ with the property that the elements of $\Theta(x^\nu)$ for any given $\nu$ are precisely those of the form
\[ \phi_1(x) y_1 + \phi_2(x) y_2 + \cdots + \phi_{n_0}(x) y_{n_0} \]
where $\phi_i(x)$ is a polynomial of degree at most $\nu - \mu_i$ in $x$ whose coefficients are in the field of constants of the root field of $\chi(x, y)$. Thus, for large $\nu$, the dimension of $\Theta(x^\nu)$ as a vector space over the field of constants is $\sum_{i=1}^{n_0} (\nu - \mu_i + 1) = n_0\nu - \sum \mu_i + n_0$. By the definition of the genus, this dimension is $n_0\nu - g + 1$, from which it follows that
\[ g = \Bigl(\sum \mu_i\Bigr) - (n_0 - 1). \]
Fig. 4.6. The folium of Descartes, $x^3 + y^3 = xy$.
In particular, $\sum \mu_i \ge n_0 - 1$.
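Example 1: $\chi(x, y) = y^3 - xy + x^3$ (the folium of Descartes). Multiplication by $y$ is represented by
\[ \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -x^3 & x & 0 \end{pmatrix} \]
relative to the basis $1$, $y$, $y^2$ of the root field over $\mathbf{Q}(x)$. Therefore, the trace of $y$ is $0$. The trace of $y^2$ is the trace of
\[ \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -x^3 & x & 0 \end{pmatrix}^{2} = \begin{pmatrix} 0 & 0 & 1 \\ -x^3 & x & 0 \\ 0 & -x^3 & x \end{pmatrix}, \]
which is $2x$. The trace of $y^3 = xy - x^3$ is $x$ times the trace of $y$ plus $-x^3$ times the trace of $1$, which is $x \cdot 0 - x^3 \cdot 3 = -3x^3$. Similarly, the trace of $y^4 = xy^2 - x^3 y$ is $2x^2$, from which it follows that
\[ S = \begin{pmatrix} 3 & 0 & 2x \\ 0 & 2x & -3x^3 \\ 2x & -3x^3 & 2x^2 \end{pmatrix} \]
and
\[ D(x) = 12x^3 - 8x^3 - 27x^6 = x^3(4 - 27x^3). \]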
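The trace computation of Example 1 can be checked mechanically. The following is a minimal sketch, not part of the original text: it represents polynomials in $x$ as Python lists of integer coefficients (lowest degree first), and the helper names `padd`, `pmul`, `matmul`, `det3` are made up for this illustration. It builds the matrix of multiplication by $y$, takes traces of its powers to fill in $S$, and expands the $3 \times 3$ determinant.

```python
# Polynomials in x are lists of integer coefficients, lowest degree first;
# the zero polynomial is the empty list.  All helper names are illustrative.

def trim(a):
    a = list(a)
    while a and a[-1] == 0:
        a.pop()
    return a

def padd(a, b):
    out = [0] * max(len(a), len(b))
    for i, c in enumerate(a):
        out[i] += c
    for i, c in enumerate(b):
        out[i] += c
    return trim(out)

def psub(a, b):
    return padd(a, [-c for c in b])

def pmul(a, b):
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, c in enumerate(a):
        for j, d in enumerate(b):
            out[i + j] += c * d
    return trim(out)

def matmul(A, B):
    n = len(A)
    C = [[[] for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] = padd(C[i][j], pmul(A[i][k], B[k][j]))
    return C

def trace(A):
    t = []
    for i in range(len(A)):
        t = padd(t, A[i][i])
    return t

def det3(S):
    # Cofactor expansion of a 3 x 3 determinant along the first row.
    def minor(c1, c2):
        return psub(pmul(S[1][c1], S[2][c2]), pmul(S[1][c2], S[2][c1]))
    d = pmul(S[0][0], minor(1, 2))
    d = psub(d, pmul(S[0][1], minor(0, 2)))
    return padd(d, pmul(S[0][2], minor(0, 1)))

# Multiplication by y on the basis 1, y, y^2, using y^3 = x*y - x^3.
M = [[[], [1], []],
     [[], [], [1]],
     [[0, 0, 0, -1], [0, 1], []]]

# powers[k] represents multiplication by y^k, so its trace is tr(y^k).
identity = [[[1] if i == j else [] for j in range(3)] for i in range(3)]
powers = [identity]
for _ in range(4):
    powers.append(matmul(powers[-1], M))
traces = [trace(P) for P in powers]

# S has tr(y^(i+j)) in row i, column j; D is its determinant.
S = [[traces[i + j] for j in range(3)] for i in range(3)]
D = det3(S)
print(S)
print(D)
```

With this encoding the list `D` holds the coefficients of $4x^3 - 27x^6$, in agreement with $D(x) = x^3(4 - 27x^3)$. Replacing the last row of `M` by the coefficients of another monic cubic relation gives the corresponding matrix $S$ for that curve.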
The square of the denominator $q(x)$ of an element of the root field integral over $x$ must divide $x^3(4 - 27x^3)$, so $x$ is a common denominator of these integral elements. A proper fraction integral over $x$ must therefore be of the form $\frac{a + by + cy^2}{x}$ where $a$, $b$, and $c$ are rational numbers. By the proposition, and by the fact that $y = \pm\sqrt{x} - \cdots$ and $y = x^2 + \cdots$ are the series expansions of $y$ in fractional powers of $x$, such an expression is integral over $x$ if and only if $a + b(\pm s) + c(\pm s)^2 \equiv 0 \bmod s^2$ and $a + bs^2 + cs^4 \equiv 0 \bmod s$. These conditions hold if and only if $a = b = 0$, so the proper fractions integral over $x$ are the rational multiples of $\frac{y^2}{x}$. Thus, $1$, $y$, $\frac{y^2}{x}$ are an integral basis. For this basis, $\lambda_1 = 0$ and $\lambda_2 = 1$. To find
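the order $\lambda_3$ of $y_3 = \frac{y^2}{x}$ at $x = \infty$, one needs to find the equation of which it is a root, which is the characteristic polynomial of
\[ \frac{1}{x} \begin{pmatrix} 0 & 0 & 1 \\ -x^3 & x & 0 \\ 0 & -x^3 & x \end{pmatrix}. \]
This characteristic polynomial is $X^3 - 2X^2 + X - x^3$, so $y_3^3 - 2y_3^2 + y_3 - x^3 = 0$, and $\left(\frac{y_3}{x}\right)^3 - 2 \cdot \frac{1}{x} \cdot \left(\frac{y_3}{x}\right)^2 + \frac{1}{x^2} \cdot \left(\frac{y_3}{x}\right) - 1 = 0$, which makes it clear that $\lambda_3 = 1$. With $u = \frac{1}{x}$ and $v = \frac{y}{x}$ the equation $v^3 - uv + 1 = 0$ holds. That $1$, $v$, $v^2$ is an integral basis of the root field of $v^3 - uv + 1$ follows from the fact that in this case
\[ S = \begin{pmatrix} 3 & 0 & 2u \\ 0 & 2u & -3 \\ 2u & -3 & 2u^2 \end{pmatrix}, \]
from which
\[ D(u) = 4u^3 - 27. \]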
Since $D(u)$ is square-free, $1$, $v$, $v^2$ is an integral basis over $u$. Thus, $1$, $y$, $y^2/x$ is a normal basis, because $1$, $\frac{y}{x}$, $\frac{y^2}{x^2}$ is the integral basis $1$, $v$, $v^2$ over $u$. In this case, then, $\mathbf{Q}$ is the field of constants, and the genus is $(0 + 1 + 1) - (3 - 1) = 0$.

Example 2: $\chi(x, y) = y^3 + x^3 y + x$ (the Klein curve). In this case, $D(x) = -4x^9 - 27x^2$, whose only square factor is $x^2$, so again the proper fractions integral over $x$ have the form $\frac{a + by + cy^2}{x}$ where $a$, $b$, and $c$ are rational numbers. Application of Newton's polygon in the case $\alpha = 0$ leads easily to three unambiguous truncated solutions of $y^3 + x^3 y + x = 0$, namely, $y = \gamma\sqrt[3]{x}$ where $\gamma$ is a cube root of $-1$. Substitution of $y = -s + \cdots$ for $y$ and of $s^3$ for $x$ in $a + by + cy^2$ gives a series divisible by $x = s^3$ only if $a = b = c = 0$, so $1$, $y$, $y^2$ is an integral basis over $x$. The orders of the first two are $0$ and $2$, respectively. The third, call it $w = y^2$, is a root of the characteristic polynomial of
\[ \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -x & -x^3 & 0 \end{pmatrix}^{2} = \begin{pmatrix} 0 & 0 & 1 \\ -x & -x^3 & 0 \\ 0 & -x & -x^3 \end{pmatrix}; \]
therefore, $w^3 + 2x^3 w^2 + x^6 w - x^2 = 0$, from which it is clear that the order of $w$ at $x = \infty$ is $3$. (Division by $x^9$ gives an equation showing that $w/x^3$ is integral over $1/x$, but division by $x^6$ gives one that shows that $w/x^2$ is not integral over $1/x$.) That $1$, $y$, $y^2$ is a normal basis follows from the observation that $1$, $\frac{y}{x^2}$, $\frac{y^2}{x^3}$ is an integral basis over $u = \frac{1}{x}$, because division of $y^3 + x^3 y + x = 0$ by $x^6$ gives $v^3 + uv + u^5 = 0$, where $v = \frac{y}{x^2}$, and because, as is easily shown, $1$, $v = \frac{y}{x^2}$, $\frac{v^2}{u} = \frac{y^2}{x^3}$ is an integral basis over $u$. Since $\lambda_1 = 0$, $\lambda_2 = 2$, and $\lambda_3 = 3$, it follows that $\mathbf{Q}$ is the field of constants, and the genus is $(0 + 2 + 3) - (3 - 1) = 3$.

Example 3: $\chi(x, y) = (x^2 + y^2)^2 - 2(x + y)^2$ (see Essay 4.3). As was noted in Essay 4.3, the algebraic analysis of this example should begin with the observation that the root field of $\chi(x, y)$ contains a square root of $2$ in the form of the element $\frac{x^2 + y^2}{x + y}$, which enables one to treat the root field
as the root field of the polynomial $x^2 + y^2 - \sqrt{2}(x + y)$, whose degree in $y$ is $2$ instead of $4$. (The irrational constants in the root field, if there are any, can be found by constructing one solution $y$ of $\chi(x, y) = 0$ for one rational value $\alpha$ of $x$; the field of constants $A$ needed to express such a solution must contain all constants in the root field, because the solution makes it possible to express any element of the root field as a power series, possibly with some negative exponents, with coefficients in $A$, and in particular to express any constant in the root field as an element of $A$. For example, when $\alpha = 0$ the roots of $\chi(0, y) = y^4 - 2y^2$ yield two unambiguous truncated solutions $y = \pm\sqrt{2} \cdot x^0$ and the truncated solution $y = 0 \cdot x^0$, whose ambiguity is $2$. If the ambiguous solution $y = 0 \cdot x^0$ is used as an input to Newton's polygon, the output is the truncated solution $y = -1 \cdot x$, with ambiguity $2$. If this truncated solution is the input, there are two unambiguous outputs $y = -x \pm \sqrt{2} \cdot x^2$, for each of which $\sqrt{2}$ must be adjoined. Thus, $A = \mathbf{Q}(\sqrt{2})$ for any one of the four infinite series solutions for $\alpha = 0$, and no cleverness is needed to discover the irrational constant $\sqrt{2}$ in the root field. For any $\chi(x, y)$, the construction of a single unambiguous solution $x = \alpha + s^m$, $y = \beta_0 + \beta_1 s + \cdots$ of $\chi(x, y) = 0$ gives a number field $A$ that contains, for the same reason, all constants in the root field of $\chi(x, y)$; factorization of $\chi(x, y)$ over such an $A$ will then show the extent to which the adjunction of constants can reduce the degree in $y$ of $\chi(x, y)$, or, more precisely, will determine the degree of the root field as an extension of $A(x)$.) The elements $1$, $\sqrt{2}$, $y$, $\sqrt{2}\,y$ are easily shown to be a normal basis in which the $\lambda$'s are $0$, $0$, $1$, $1$, respectively, so the genus is $\frac{1}{2}\sum \lambda_i - (n_0 - 1) = \frac{1}{2}(0 + 0 + 1 + 1) - (2 - 1) = 0$. When $\mathbf{Q}$ is replaced by $\mathbf{Q}(\sqrt{2})$, $1$ and $y$ are a normal basis in which the $\lambda$'s are $0$ and $1$ respectively, and the genus is $(0 + 1) - (2 - 1) = 0$.
Of course, the genus is $0$ geometrically, because the curve is a circle, which is birationally equivalent to a line.

Example 4: $\chi(x, y) = y^2 + x^4 - 1$ (the elliptic curve mentioned in Essay 4.2). Here the trace of $1$ is $2$, and the trace of $y$ is $0$, so the trace of $y^2 = 1 - x^4$ is $2(1 - x^4)$ and $D(x) = 4(1 - x^4)$. Since this polynomial has distinct roots, $1$, $y$ is an integral basis. The order of $1$ at $x = \infty$ is of course $0$, and the order of $y$ is $2$ (because division of $y^2 + x^4 - 1$ by $x^4$ gives a polynomial in $\frac{1}{x}$ and $\frac{y}{x^2}$). Since $1$ and $\frac{y}{x^2}$ are an integral basis relative to $\frac{1}{x}$, as is easily shown, $1$ and $y$ are a normal basis and the field of constants is $\mathbf{Q}$, which implies that the genus is $(0 + 2) - (2 - 1) = 1$.

Example 5: $\chi(x, y) = y^2 + x^6 - 1$ (a frequently cited hyperelliptic curve). By considerations similar to those in the last example, $D(x) = 4(1 - x^6)$ has distinct roots, so $1$ and $y$ form an integral basis. The orders at $x = \infty$ are $0$ and $3$ respectively, and this basis is a normal basis. Therefore the genus is $(0 + 3) - (2 - 1) = 2$.
Example 6: $\chi(x, y) = y^2 - f(x)$, where $f(x)$ is a polynomial of degree $2n - 1$ or $2n$ with distinct roots (a general hyperelliptic curve). As in the previous examples, $1$ and $y$ are a normal basis for which the orders at $x = \infty$ are $0$ and $n$, so the genus is $(0 + n) - (2 - 1) = n - 1$, as is implied by the passage from Abel's memoir quoted in Essay 4.1.
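Once the orders at $x = \infty$ of a normal basis are known, the genus in each of the examples is pure arithmetic. The sketch below simply evaluates $g = \frac{1}{c}\sum \lambda_i - (n_0 - 1)$; the function name `genus` and its argument names are hypothetical, chosen for this illustration, and the function assumes the orders have already been found by the constructions described in the text.

```python
from fractions import Fraction

def genus(orders, n0, c=1):
    """Evaluate (1/c) * sum(orders) - (n0 - 1).

    orders -- orders at x = infinity of a normal basis over Q
    n0     -- degree of the root field over (field of constants)(x)
    c      -- degree of the field of constants over Q
    """
    g = Fraction(sum(orders), c) - (n0 - 1)
    if g.denominator != 1:
        raise ValueError("inputs do not describe a normal basis")
    return int(g)

# The examples of this essay, with the orders quoted in the text.
examples = {
    "folium of Descartes": genus([0, 1, 1], n0=3),
    "Klein curve": genus([0, 2, 3], n0=3),
    "Example 3 over Q": genus([0, 0, 1, 1], n0=2, c=2),
    "Example 3 over Q(sqrt 2)": genus([0, 1], n0=2),
    "elliptic curve": genus([0, 2], n0=2),
    "hyperelliptic, degree 6": genus([0, 3], n0=2),
}
for name, g in examples.items():
    print(name, g)
```

The two entries for Example 3 illustrate that the same genus results whether the constants are taken to be $\mathbf{Q}$ (with $c = 2$) or $\mathbf{Q}(\sqrt{2})$ (with $c = 1$).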
Essay 4.6 Holomorphic Differentials
Given an algebraic curve $C$, the method of the preceding essay determines its genus $g$ regarded as in Essay 4.3 as the codimension of the subvarieties of $C^N$ swept out by algebraic variations of $N$ points on the curve. The objective of the present essay is to express this idea in terms of differential equations
\[ \sum_{i=1}^{N} h_j(x_i, y_i)\,dx_i = 0 \qquad (j = 1, 2, \ldots, g) \tag{1} \]
describing these subvarieties of $C^N$. Here the differentials $h_j(x, y)\,dx$ for $j = 1, 2, \ldots, g$ are to be a basis, over the field of constants, of the space of holomorphic differentials on the curve, a concept that is to be defined. The equations (1) state that algebraic variations satisfy $g$ infinitesimal conditions, where $g$ is the dimension of the space of holomorphic differentials; therefore, not only do the algebraic variations partition $C^N$ into subvarieties of codimension $g$, but this partition is expressed by $g$ explicit differential equations. In these equations, $(x_i, y_i)$ for $i = 1, 2, \ldots, N$ are given solutions of $\chi(x_i, y_i) = 0$, where $\chi(x, y) = 0$ is the equation of the curve $C$.

The heuristic idea of (1) is the following: If (1) correctly describes the possible algebraic variations of $N$ points, it certainly describes the possible algebraic variations of fewer than $N$ points: Just add conditions $dx_i = 0$ for a certain number of the points. Therefore, there is no loss of generality in assuming that $N$ is the number $n_0\nu$ of zeros of an element of $\Theta(x^\nu)$ for some large $\nu$. (Here $n_0$ again denotes the degree of the root field as an extension of the field obtained by adjoining all its constants to $\mathbf{Q}(x)$, or, in the notation used before, $n_0 = n/c$.) As has been shown, the most general element of $\Theta(x^\nu)$ is given by an explicit formula $\theta$ that contains $N - g + 1$ unknown constants, call them $a_1$, $a_2$, ..., $a_{N-g+1}$ (and in fact contains them linearly); the conditions $\chi(x, y) = 0$ and $\theta(x, y, a_1, a_2, \ldots, a_{N-g+1}) = 0$ define, implicitly, $N$ solutions $(x_i, y_i)$ of $\chi(x_i, y_i) = 0$ as functions of $a_1$, $a_2$, ..., $a_{N-g+1}$, where $N = n_0\nu$. Since multiplication of $\theta$ by a constant does not change its common zeros with $\chi$, one of the parameters in $\theta$, say $a_{N-g+1}$, can be set equal to $1$. Then the $N$ moving points depend on $N - g$ parameters, and they sweep out a subvariety of codimension $g$.
In principle, the equations (1) result from implicit differentiation of the defining equations $\chi = 0$, $\theta = 0$ of the $N$ moving points $(x_i, y_i)$ in the following way. For fixed values of the $x$'s, $y$'s and $a$'s, the $2N$ relations $d\chi(x_i, y_i) = 0$, $d\theta(x_i, y_i, a) = 0$ give $2N$ homogeneous, linear equations in the $3N - g$ differentials $dx_i$, $dy_i$, $da_j$, whose coefficients are rational functions of the $3N - g$ variables. The relation $d\chi(x_i, y_i) = 0$ involves just one pair of values $(x_i, y_i)$ and $a_1$, $a_2$, ..., $a_{N-g}$, so it can be used (provided $x$ is a local parameter at $(x_i, y_i)$) to express each $dy_i$ in terms of the corresponding $dx_i$ and $da_1$, $da_2$, ..., $da_{N-g}$ and in this way to reduce the differential equations to just $N$ equations in $2N - g$ differentials $dx_i$ and $da_i$. These equations can be solved,
in the generic case, to express $da_1$, $da_2$, ..., $da_{N-g}$ in terms of the $dx_i$ and to eliminate them, leaving $g$ relations among the $dx_i$. These $g$ relations are the required differential equations (1) because they describe the relations satisfied by the $dx_i$ when the parameters $a_i$ are allowed to vary. In other words, these are the infinitesimal relations satisfied by algebraic variations of the $N$ points.

In practice, the actual elimination of the $da_i$ to find the relations among the $dx_i$ seems impractical, even in the simplest examples. Instead, the derivation of the equations (1) will depend on observing that the holomorphic differentials, the ones that express the crucial relations (1), are the differentials that have no poles. Heuristically, such differentials lead to relations (1) in the following way. If $\theta(x, y)$ has $n_0\nu$ zeros on the curve $\chi(x, y) = 0$ and no zeros where $x = \infty$, and if $h(x, y)\,dx$ has no poles, even when $x = \infty$, then the differential $\frac{h(x, y)\,dx}{\theta(x, y)}$ has poles only at the $n_0\nu$ zeros of $\theta(x, y)$. Thus, one can make use of the fact that the sum of the residues of a differential is zero to find that
\[ \sum_{\text{zeros of } \theta} \left(\text{residue of } \frac{h(x, y)\,dx}{\theta(x, y)} \text{ at that zero of } \theta\right) = 0. \]
As a function on the curve $\chi(x, y) = 0$, $\theta(x, y)$ can be regarded, locally, as a function of $x$ near each of its zeros (provided these zeros avoid places on the curve where $x$ is not a local parameter), so $\frac{d\theta}{dx}$ is meaningful at each zero of $\theta(x, y)$ on the curve. When this derivative is not zero, its reciprocal is the residue of $\frac{dx}{\theta(x, y)}$ at the zero of $\theta$ because $\theta(x, y) = a_1 x + a_2 x^2 + \cdots$ implies that this residue is, by definition, $\frac{1}{a_1}$ when $a_1 \ne 0$. Thus
\[ 0 = \sum_{\text{zeros of } \theta} \left(\text{residue of } \frac{h(x, y)\,dx}{\theta(x, y)} \text{ at that zero of } \theta\right) = \sum_{\text{zeros of } \theta} h(x_i, y_i)\,\frac{dx_i}{d\theta}, \]
where $dx_i$ for each $i$ is the infinitesimal change in $x_i$ that results from an infinitesimal change $d\theta$ in $\theta$. In other words, if the $n_0\nu$ points where $\theta$ is zero are moved to the nearby points where $\theta$ is $d\theta$, then the $n_0\nu$ changes $dx_i$ in the $x$-coordinates of the intersection points satisfy $\sum h(x_i, y_i)\,dx_i = 0$, as was to be shown, provided the zeros of $\theta(x, y)$ are at points where both $x$ and $\theta$ are local parameters on the curve. That this necessary condition for the $dx_i$ to result from an algebraic variation of the intersection points is also a sufficient condition follows from (or at any rate is made plausible by) the fact that the number of linearly independent holomorphic differentials is $g$, so that the system of differential equations (1) describes a subvariety containing the algebraic variations that has the same dimension $N - g$ (at generic points) as the subvariety of algebraic variations and that therefore must coincide with it.

With this geometric motivation, the remainder of this essay will (a) define the notion of "holomorphic differential" in a precise algebraic way that accords
with the notion of "no poles," (b) prove that the dimension of the holomorphic differentials as a vector space over the field of constants is $g$, (c) prove that the sum of the residues of a differential is zero, and (d) flesh out the implicit differentiation sketched above to reach the conclusion $\sum h_j(x_i, y_i)\,dx_i = 0$.

As before, let $\chi(x, y)$ be an irreducible polynomial in two indeterminates with integer coefficients that contains both indeterminates and is monic in the indeterminate $y$. Let $K$ denote the root field of $\chi(x, y)$. A differential in $K$ is an expression of the form $f(x, y)\,dx$, where $f(x, y)$ is an element of $K$ and $dx$ is merely a symbol. More precisely, $f(x, y)\,dx$ is a differential expressed with respect to the parameter $x$; it is easy to guess how a differential expressed with respect to the parameter $x$ might be expressed with respect to another parameter of $K$, but in this essay all differentials will be expressed with respect to the preferred parameter $x$.

As before, the root field $K$ of $\chi(x, y)$ will be regarded as an extension not of $\mathbf{Q}(x)$, the field of rational functions in $x$, but of $K_0(x)$, the field of rational functions in $x$ with coefficients in the field of constants $K_0$ of $K$. (The symbol $K_0$ thus replaces the symbol $\Theta(x^0)$ as the notation for the field of constants of the root field.) As before, let $n_0$ be the degree of $K$ as an extension of $K_0(x)$. By the definition of the genus, the dimension of $\Theta(x^\nu)$ as a vector space over $K_0$ is $n_0\nu - g + 1$ for all sufficiently large $\nu$.
In this essay, instead of differentials $f(x, y)\,dx$ themselves, their traces will be considered; the trace of $f(x, y)\,dx$ is by definition the differential $\operatorname{tr}_x(f(x, y))\,dx$, where $dx$ is a symbol and $\operatorname{tr}_x(f(x, y))$ is the element of $K_0(x)$ that is the trace of $f(x, y)$ with respect to the field extension $K \supset K_0(x)$; in other words, $\operatorname{tr}_x(f(x, y))$ is the trace of the $n_0 \times n_0$ matrix that represents multiplication by $f(x, y)$ with respect to the basis $1$, $y$, $y^2$, ..., $y^{n_0 - 1}$ of $K$ over $K_0(x)$ (or, for that matter, with respect to any basis of $K$ over $K_0(x)$). The heuristic idea behind this definition is that $\operatorname{tr}_x(f(x, y)\,dx)$ is the sum over all $n_0$ values at $x$ of the differential $f(x, y)\,dx$, which, being symmetric in the $n_0$ values of $y$ for any given $x$, is a rational function of $x$ alone.

Holomorphic differentials were described above as differentials without poles. Certainly, the trace of a holomorphic differential must therefore be $dx$ times an element of $K_0(x)$ without poles; in other words, if $h(x, y)\,dx$ is holomorphic, then $\operatorname{tr}_x(h(x, y))$ must be a polynomial in $x$. However, this necessary condition should not be expected to be sufficient, because $f(x, y)\,dx$ might have two poles at the same value of $x$ that cancel when the sum is taken over all $y$. In the case of two canceling poles, however, one would expect to be able to choose an element $\theta(x, y)$ of $K$ that was zero at just one of the poles and that had no poles for finite values of $x$, so that $\theta(x, y) f(x, y)\,dx$ would be a differential that had no poles where $f(x, y)\,dx$ did not and for which the poles that canceled in $\operatorname{tr}_x(f(x, y)\,dx)$ no longer canceled in $\operatorname{tr}_x(\theta(x, y) f(x, y)\,dx)$. Therefore, such a differential would not satisfy the stronger necessary condition for a differential $f(x, y)\,dx$ to be holomorphic: For every $\theta(x, y)$ that is integral over $x$, $\operatorname{tr}_x(\theta(x, y) f(x, y))$ is a polynomial in $x$. But this necessary
condition, too, should not be expected to be sufficient, because it would not detect poles of $f(x, y)\,dx$ at places where $x = \infty$.

For these reasons, a differential $f(x, y)\,dx$ will be said to be holomorphic for finite $x$ if $\operatorname{tr}_x(\theta(x, y) f(x, y))$ is a polynomial in $x$ whenever $\theta(x, y)$ is integral over $x$, and will be said to be holomorphic if it is holomorphic for finite $x$ and if $f\,d(\frac{1}{u}) = -\frac{f}{u^2}\,du$ is holomorphic for finite $u$. (Here $d(\frac{1}{u}) = -\frac{du}{u^2}$ is a definition. It will be justified by Corollary 1 below. Since $\operatorname{tr}_u$ is the same as $\operatorname{tr}_x$ when $u = \frac{1}{x}$ (both are found using a basis of the field over $\mathbf{Q}(x) = \mathbf{Q}(u)$), to say that $-\frac{f}{u^2}\,du$ is holomorphic for finite $u$ means that $\operatorname{tr}_u(\theta \cdot \frac{f}{u^2}) = x^2 \operatorname{tr}_x(\theta f)$ is a polynomial in $u = \frac{1}{x}$ whenever $\theta$ is integral over $u$.)

Theorem. Construct the holomorphic differentials for a given $\chi(x, y)$ and prove that their dimension as a vector space over the field of constants $K_0$ is the genus of the root field of $\chi(x, y)$.

Proof. Let the construction that was used to determine the genus in Essay 4.5 be used to construct a normal basis $y_1$, $y_2$, ..., $y_{n_0}$ of the root field as an extension of the field $K_0(x)$ of rational functions of $x$ with coefficients in the field of constants $K_0$ and to construct nonnegative integers $\mu_1$, $\mu_2$, ..., $\mu_{n_0}$ for which an element of the root field is in $\Theta(x^\nu)$ if and only if its unique expression in the form $\phi_1(x) y_1 + \phi_2(x) y_2 + \cdots + \phi_{n_0}(x) y_{n_0}$, where the coefficients are in $K_0(x)$, has coefficients that are polynomials in $x$ with coefficients in $K_0$ and the degrees of these polynomials satisfy $\deg \phi_i + \mu_i \le \nu$. Let $S_1$ denote the symmetric $n_0 \times n_0$ matrix of polynomials in $x$ with coefficients in $K_0$ whose entry in the $i$th row of the $j$th column is $\operatorname{tr}_x(y_i y_j)$, where the trace is taken relative to the extension $K \supset K_0(x)$. (In other words, this entry is the trace of the matrix that represents multiplication by $y_i y_j$ relative to the basis $y_1$, $y_2$, ..., $y_{n_0}$ of $K$ over $K_0(x)$.)
The symmetric bilinear form "the trace of the product" is represented by $S_1$ in the sense that if $h = h_1 y_1 + h_2 y_2 + \cdots + h_{n_0} y_{n_0}$ and $\theta = \theta_1 y_1 + \theta_2 y_2 + \cdots + \theta_{n_0} y_{n_0}$ are the representations of two elements $h$ and $\theta$ of $K$ relative to this basis, then $\operatorname{tr}_x(h\theta) = [h] S_1 [\theta]$ where $[h]$ represents the row matrix whose entries are $h_1$, $h_2$, ..., $h_{n_0}$, and $[\theta]$ represents the column matrix whose entries are $\theta_1$, $\theta_2$, ..., $\theta_{n_0}$.
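As a small illustration of this matrix description of the trace form (not part of the original argument), take the elliptic curve $y^2 + x^4 - 1 = 0$ of Example 4 in Essay 4.5, for which $K_0 = \mathbf{Q}$, $n_0 = 2$, and $1$, $y$ is a normal basis. Since the trace of $1$ is $2$, the trace of $y$ is $0$, and the trace of $y^2$ is $2(1 - x^4)$:

```latex
S_1 = \begin{pmatrix} 2 & 0 \\ 0 & 2(1 - x^4) \end{pmatrix},
\qquad
[h]\,S_1\,[\theta]
  = \begin{pmatrix} h_1 & h_2 \end{pmatrix}
    \begin{pmatrix} 2 & 0 \\ 0 & 2(1 - x^4) \end{pmatrix}
    \begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix}
  = 2h_1\theta_1 + 2(1 - x^4)\,h_2\theta_2 .
```

This agrees with the direct computation: $h\theta = \bigl(h_1\theta_1 + (1 - x^4)h_2\theta_2\bigr) + (h_1\theta_2 + h_2\theta_1)\,y$, and taking the trace multiplies the first term by $2$ and annihilates the multiple of $y$.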
With this notation, to say that $h\,dx$ is holomorphic is to say that $[h] S_1 [\theta]$ is a polynomial of degree at most $\nu - 2$ whenever the $i$th entry $\theta_i$ of the column matrix $[\theta]$ is a polynomial whose degree is at most $\nu - \mu_i$, because $\operatorname{tr}_x(h\theta)$ must be a polynomial in $x$, while $\operatorname{tr}_x(-x^2 h \cdot \frac{\theta}{x^\nu}) = -\frac{\operatorname{tr}_x(h\theta)}{x^{\nu - 2}}$ must be a polynomial in $\frac{1}{x}$. (Note that the $h_i$ need not be polynomials.) In other words, the row matrix $[h] S_1$ has the property that its product with a column matrix $[\theta]$ is a polynomial of degree at most $\nu - 2$ when the $i$th entry of $[\theta]$ is a polynomial of degree at most $\nu - \mu_i$. If one takes all entries but one of $[\theta]$ to be zero and that one to be a polynomial of degree $\nu - \mu_i$ for some large $\nu$, one sees that the $i$th entry of $[h] S_1$ must be a rational function whose product
with any polynomial of degree $\nu - \mu_i$ is a polynomial of degree at most $\nu - 2$. Thus, the $i$th entry of $[h] S_1$ must be a polynomial of degree at most $\mu_i - 2$ when $\mu_i \ge 2$ and must be zero if $\mu_i$ is $0$ or $1$. In other words, $[h]$ must have the form $[c] S_1^{-1}$, where $[c]$ is a row matrix whose $i$th entry is a polynomial in $x$ of degree at most $\mu_i - 2$ with coefficients in $K_0$. (In particular, the $i$th entry is zero when $\mu_i$ is $0$ or $1$.) This formula $[h] = [c] S_1^{-1}$ completely describes the holomorphic differentials $h\,dx$. The number of constants in the coefficients of the entries of $[c]$ is the sum of the numbers $\mu_i - 1$ over all values of $i$ for which $\mu_i > 0$. Since exactly one $\mu_i$ is zero (because $K_0 = \Theta(x^0)$ consists of all elements $\phi_1 y_1 + \phi_2 y_2 + \cdots + \phi_{n_0} y_{n_0}$ in which $\phi_i = 0$ when $\mu_i > 0$ and $\phi_i$ is constant when $\mu_i = 0$), it follows that the number of arbitrary constants in this formula for $h$ is $(\sum \mu_i) - (n_0 - 1)$, which is the genus, as was to be shown.

The proof that the sum of the residues of any differential $f(x, y)\,dx$ is zero reduces, by virtue of the definition of the sum of the residues as the sum of the residues of the rational differential $\operatorname{tr}_x(f(x, y))\,dx$, to the same statement for rational differentials $\frac{p(x)}{q(x)}\,dx$, where $p(x)$ and $q(x)$ are polynomials with coefficients in some algebraic number field $K_0$ and $q(x) \ne 0$. To define the sum of the residues of such a differential, it will be convenient to assume that the denominator $q(x)$ splits into linear factors over $K_0$, although, as will be seen, the sum of the residues can be expressed rationally in terms of the coefficients of $p(x)$ and $q(x)$ even when this condition is not fulfilled. By the method of partial fractions, one can see that if $q(x) = \prod (x - \alpha_i)^{\nu_i}$, where the $\alpha_i$ are distinct constants, then
\[ \frac{p(x)}{q(x)} = P(x) + \sum_i \sum_{\sigma = 1}^{\nu_i} \frac{\rho_{i\sigma}}{(x - \alpha_i)^\sigma}, \]
where $P(x)$ is a polynomial,
for suitable constants $\rho_{i\sigma}$. (One can assume without loss of generality that $\deg p < \deg q$, so that $P(x) = 0$. Multiplication of both sides of the required equation by $q(x) = \prod (x - \alpha_i)^{\nu_i}$ gives an equation of the form $p(x) = \sum \rho_{i\sigma} A_{i\sigma}(x)$ in which the polynomials $A_{i\sigma}(x)$ have degree less than $k = \deg q$ and depend only on $q(x)$. This gives an inhomogeneous $k \times k$ system of linear equations satisfied by the $k$ required coefficients $\rho_{i\sigma}$. When $p(x) = 0$, these equations have only the trivial solution,* so for any $p(x)$ of degree less than $k$ they have

* Multiplication of $\sum_{i=1}^{\mu} \frac{\phi_i(x)}{(x - \alpha_i)^{\nu_i}} = 0$, where $\alpha_1$, $\alpha_2$, ..., $\alpha_\mu$ are distinct algebraic numbers and $\deg \phi_i < \nu_i$ for each $i$, by $\prod_{i=1}^{\mu} (x - \alpha_i)^{\nu_i}$ gives an equation $\sum_{i=1}^{\mu} \psi_i(x) = 0$ in which the $\psi_i(x)$ are polynomials. All but one of these polynomials is divisible by $(x - \alpha_1)^{\nu_1}$, so the remaining one must also be divisible by $(x - \alpha_1)^{\nu_1}$, from which it follows that $\phi_1(x)$ must be zero. In the same way, $\phi_i(x) = 0$ for each $i$.

160
4 The Genus of an Algebraic Curve
a unique solution.) The residue at x = a of ^Idx
is defined to be p^i, the
coefficient of ^^^ in the partial fractions expansion of ^7^, when a is one of the roots a^ of q{x); otherwise, the residue at x = a of ^^dx Note that the residue at x = a of ( 4 \ +
is zero.
/ ^ 1 dx is the sum of the
residues at x = a of ~^dx and ^4|y(ix. (The partial fractions decomposition of a sum is the sum of the partial fractions decomposition when terms with the same denominators are combined.) The conventional statement that the sum of the residues of a rational differential is zero assumes that the "residue at x = cxo" is included in the sum. In this way, the conventional statement can be seen, as the corollary below shows, as a method of evaluating the sum of the residues of ^^dx over all finite values of a. This evaluation is in fact quite easy: Proposition. The sum of the residues at a of a rational differential over all finite values of a is lim
^-00
X'r{x)
_
q{x)
v^r{^) u^q{^)
^^dx
(e = degg), u=0
where $r(x)$ is the remainder when $p(x)$ is divided by $q(x)$, where the limit on the left is merely a mnemonic standing for the expression on the right, and the expression on the right denotes the quotient of constants in which the denominator is the leading coefficient of $q$ and the numerator is the coefficient of $x^{\nu-1}$ in $r(x)$.

Proof. Since the residues of $\left(P(x) + \frac{r(x)}{q(x)}\right)dx$ are the same as those of $\frac{r(x)}{q(x)}\,dx$, one can assume without loss of generality that the quotient $\frac{p(x)}{q(x)}$ in the given differential is a proper fraction; i.e., one can assume $r(x) = p(x)$. The residue of $\frac{\rho\,dx}{(x - \alpha)^{\nu}}$ is $\rho$ if $\nu = 1$ and 0 if $\nu > 1$, so for fractions of the particular form $\frac{r(x)}{q(x)} = \frac{\rho}{(x - \alpha)^{\nu}}$ the residue is given by the formula $\lim_{x \to \infty} \frac{x \cdot r(x)}{q(x)}$. The theorem therefore follows from the observation that if $\frac{r(x)}{q(x)}$ and $\frac{r_1(x)}{q_1(x)}$ are proper fractions, then

$$\frac{x \cdot r(x)}{q(x)} + \frac{x \cdot r_1(x)}{q_1(x)} = \frac{x \cdot r(x)\,q_1(x) + x \cdot r_1(x)\,q(x)}{q(x)\,q_1(x)},$$

so the same is true of their limits as $x \to \infty$, interpreted as in the statement of the theorem. (Note also that $\lim_{x \to \infty} \frac{x \cdot r(x)}{q(x)}$ is unchanged if a common factor is canceled from numerator and denominator.)

Corollary 1. The sum of the residues of $\frac{p(x)}{q(x)}\,dx$ over all finite values of $x$ is minus the residue at $x = \infty$, which residue is by definition the residue at $u = 0$ of

$$\frac{p(1/u)}{q(1/u)}\,d\!\left(\frac{1}{u}\right) = -\frac{p(1/u)}{u^{2}\,q(1/u)}\,du.$$

(The expression on the left is a mere mnemonic that takes advantage of the formula $d\left(\frac{1}{u}\right) = -\frac{du}{u^{2}}$ of elementary calculus.)

Deduction. What is to be shown is that the value of $\frac{u^{\nu-1}\,r(1/u)}{u^{\nu}\,q(1/u)}$ at $u = 0$ is the residue at $u = 0$ of

$$\frac{r(1/u)}{u^{2}\,q(1/u)}\,du = \frac{u^{\nu-1}\,r(1/u)}{u \cdot u^{\nu}\,q(1/u)}\,du$$

(as in the proof of the proposition, one can assume $p(x) = r(x)$). Since this differential has the form $\frac{R(u)}{u\,Q(u)}\,du$, where $R(u) = u^{\nu-1}\,r(1/u)$ and $Q(u) = u^{\nu}\,q(1/u)$ are polynomials and $Q(0) \ne 0$, this conclusion follows immediately from the definition.

Corollary 2. When the residue of $\frac{p(x)}{q(x)}\,dx$ at $x = \infty$ is defined as in Corollary 1, the sum of the residues of a rational differential is zero.

These algebraic facts make possible a plausible implicit differentiation of $\chi(x, y) = 0$ and $\theta(x, y, a_1, a_2, \ldots, a_{N-g}) = 0$ that leads to

$$\sum_{i=1}^{N} h_j(x_i, y_i)\,dx_i = 0 \qquad (j = 1, 2, \ldots, g) \tag{2}$$
when the $dy$'s and $da$'s are eliminated. As before, there is no loss of generality in assuming that $N = n\nu$ for some large $\nu$ and that the $(x_i, y_i)$ are the intersection points $\chi(x_i, y_i) = 0$, $\theta(x_i, y_i) = 0$ for some fixed $\theta = a_1\theta_1 + a_2\theta_2 + \cdots + a_{N-g+1}\theta_{N-g+1}$, where the $\theta_i$ are a basis of $\Theta(x^{\nu})$ over $K_0$ and $a_1, a_2, \ldots, a_{N-g+1}$ are fixed constants. In addition, it will be assumed that the chosen $\theta$ is in "general position" in the sense that $x$ is a local parameter at each of the $N$ intersection points $(x_i, y_i)$ and $\theta$ has poles of order $\nu$ at each of the $n$ points where $x = \infty$.

Each of the $N$ intersection points $(x_i, y_i)$ implies a pair of differential equations

$$\chi_x\,dx_i + \chi_y\,dy_i = 0, \qquad \theta_x\,dx_i + \theta_y\,dy_i + \theta_1\,da_1 + \theta_2\,da_2 + \cdots + \theta_{N-g+1}\,da_{N-g+1} = 0, \tag{3}$$

where subscripts $x$ and $y$ denote partial derivatives, and these partial derivatives are to be evaluated at the point $(x_i, y_i, a_1, a_2, \ldots, a_{N-g+1})$ at which $a_1, a_2, \ldots, a_{N-g+1}$ have the given values that determine the $N$ points $(x_i, y_i)$, and $x_i$ and $y_i$ are the coordinates of one of these points. Elimination of $dy_i$ from the pair of equations (3) gives the single equation

$$dx_i + Q\,(\theta_1\,da_1 + \theta_2\,da_2 + \cdots + \theta_{N-g+1}\,da_{N-g+1}) = 0$$

in which $Q$ denotes the quotient $\frac{\chi_y}{\chi_y\theta_x - \chi_x\theta_y}$. This quotient is in fact the reciprocal of the derivative of $\theta$ with respect to $x$ (eliminate $dy$ from the equations $\chi_x\,dx + \chi_y\,dy = 0$ and $\theta_x\,dx + \theta_y\,dy = d\theta$, a computation that assumes $x$ is a parameter on the curve at the point in question). Otherwise stated, it is the residue of the differential $\frac{dx}{\theta}$ at this zero of the denominator $\theta$, because it is the value of the quotient $\frac{x - x_i}{\theta}$ at the point $(x_i, y_i)$ where numerator and denominator, taken separately, are both zero. (It is natural to think of this number as a limit, but of course it can be described algebraically as the value of the rational function of $x$ and $y$ when it is put in canonical form: a numerator in which $y$ has degree less than $n = \deg_y \chi$ and a denominator that is a polynomial in $x$ alone that is relatively prime to the numerator.)

Therefore, if each equation $dx_i + Q\,(\theta_1\,da_1 + \theta_2\,da_2 + \cdots + \theta_{N-g+1}\,da_{N-g+1}) = 0$ is multiplied by the value $h(x_i, y_i)$ of $h$ at the corresponding point $(x_i, y_i)$ and all $N$ of these equations are added, the result is

$$\sum_{i=1}^{N} h(x_i, y_i)\,dx_i + C_1\,da_1 + C_2\,da_2 + \cdots + C_{N-g+1}\,da_{N-g+1} = 0,$$

where the coefficient $C_j$ of $da_j$ is the sum over all $N$ zeros of $\theta$ on the curve $\chi = 0$ of $h\theta_j$ times the residue of $\frac{dx}{\theta}$ at that point. It is to be shown that each such coefficient $C_j$ is zero. Since neither $\theta_j$ nor $h\,dx$ has poles for finite $x$, the differential $\theta_j h\,dx/\theta$ has residues for finite $x$ only at the zeros $(x_i, y_i)$ of $\theta$, and these residues are the values at $(x_i, y_i)$ of $\theta_j h$ times the residue of $dx/\theta$ at $(x_i, y_i)$. In short, $C_j$ is the sum of all residues of the differential $\theta_j h\,dx/\theta$ at points where $x$ is finite. Therefore, it is minus the sum of the residues at $x = \infty$ of the differential $\theta_j h\,dx/\theta$.
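The step just taken trades the residues at finite points for the residue at $x = \infty$, which rests on the Proposition and its corollaries earlier in this essay. For an ordinary rational differential $p(x)\,dx/q(x)$ the Proposition can be checked by direct computation. The following is a minimal sketch in Python, not a definitive implementation: the helper names `trim`, `remainder`, and `finite_residue_sum` are illustrative and do not come from the text, polynomials are assumed to be coefficient lists with the constant term first, and the coefficients are assumed to be rational numbers.

```python
from fractions import Fraction


def trim(poly):
    """Convert coefficients to Fractions and drop trailing zero coefficients."""
    poly = [Fraction(c) for c in poly]
    while poly and poly[-1] == 0:
        poly.pop()
    return poly


def remainder(p, q):
    """Remainder r(x) of p(x) divided by q(x), padded to length deg q."""
    p, q = trim(p), trim(q)
    if not q:
        raise ZeroDivisionError("q(x) must be a nonzero polynomial")
    rem = p[:]
    # Ordinary long division, cancelling the highest remaining term each time.
    for shift in range(len(rem) - len(q), -1, -1):
        factor = rem[shift + len(q) - 1] / q[-1]
        for k, c in enumerate(q):
            rem[shift + k] -= factor * c
    rem = rem[:len(q) - 1]
    return rem + [Fraction(0)] * (len(q) - 1 - len(rem))


def finite_residue_sum(p, q):
    """Coefficient of x^(nu-1) in r(x) over the leading coefficient of q(x)."""
    q = trim(q)
    nu = len(q) - 1          # nu = deg q
    if nu == 0:
        return Fraction(0)   # a polynomial differential has no finite residues
    return remainder(p, q)[nu - 1] / q[-1]


# x/(x^2 - 1) = (1/2)/(x - 1) + (1/2)/(x + 1): the residues sum to 1.
example_simple_poles = finite_residue_sum([0, 1], [-1, 0, 1])
# (3x + 1)/(x - 1)^2 = 3/(x - 1) + 4/(x - 1)^2: the only residue is 3.
example_double_pole = finite_residue_sum([1, 3], [1, -2, 1])
# 1/(x(x - 1)) = -1/x + 1/(x - 1): the residues cancel.
example_cancelling = finite_residue_sum([1], [0, -1, 1])
```

The three sample quotients were expanded in partial fractions by hand, so the values returned can be compared with the residues read off from those expansions. Note that no roots of $q(x)$ are ever computed, in line with the remark that the sum can be expressed rationally in the coefficients of $p(x)$ and $q(x)$.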
Since $\theta_j$ has order at most $\nu$ at $x = \infty$ (it is in $\Theta(x^{\nu})$) and $\theta$ has order $\nu$ at $x = \infty$ (by assumption), $\theta_j/\theta$ is finite at $x = \infty$, so $\theta_j h\,dx/\theta$ has no pole at $x = \infty$, which implies $C_j = 0$ and $\sum h(x_i, y_i)\,dx_i = 0$, as was to be shown.

Example 1: $\chi(x, y) = y^3 + x^3 y + x$ (the Klein curve). As was seen in Essay 4.5, $1, y, y^2$ are a normal basis over $\mathbf{Q}(x)$ for which the $\mu$'s are 0, 2, 3. Therefore $(h_1 + h_2 y + h_3 y^2)\,dx$ is a holomorphic differential if and only if $[h_1\ h_2\ h_3]\,S = [0\ \ a\ \ bx + c]$, where $a$, $b$, and $c$ are rational numbers and the matrix $S$, which has $\operatorname{tr}(y^{i+j-2})$ in the $i$th row of the $j$th column, is easily found to be

$$S = \begin{bmatrix} 3 & 0 & -2x^3 \\ 0 & -2x^3 & -3x \\ -2x^3 & -3x & 2x^6 \end{bmatrix}.$$

When $c = 1$ and $a = b = 0$, this gives a $3 \times 3$ linear system whose solution is $[h_1\ h_2\ h_3] = \frac{1}{4x^9 + 27x^2}\,[4x^6\ \ {-9x}\ \ 6x^3]$. Thus, $h = \frac{4x^6 - 9xy + 6x^3y^2}{4x^9 + 27x^2}$, which can be written more simply as $h = \frac{1}{3y^2 + x^3}$. It is easy to see that the solution in which $b = 1$ and $c = a = 0$ is $x$ times this one, and the solution in which $a = 1$ and $b = c = 0$ is $y$ times this one, which leads to the formula

$$\frac{(c + bx + ay)\,dx}{3y^2 + x^3}$$

for the most general holomorphic differential on this curve. The formula has three parameters $a$, $b$, $c$ because the genus is 3. (For an easier derivation of this formula, see the examples of Essay 4.8.)

Example 2: $\chi(x, y) = y^2 - f(x)$, where $f(x)$ is a polynomial of degree $2n - 1$ or $2n$ with distinct roots (a general hyperelliptic curve). As was seen in Essay 4.5, 1 and $y$ are a normal basis for which the orders at $x = \infty$ are 0 and $n$. (The matrix $S(x)$ is $\begin{bmatrix} 2 & 0 \\ 0 & 2f(x) \end{bmatrix}$, whose determinant $D(x) = 4f(x)$ has distinct roots, so $1, y$ is an integral basis over $x$. Division of $y^2 - f(x) = 0$ by $x^{2n}$ gives $\left(\frac{y}{x^n}\right)^2 - \frac{f(x)}{x^{2n}} = 0$, which, when $v = \frac{y}{x^n}$ and $u = \frac{1}{x}$, is a curve of the same form $v^2 - F(u) = 0$ of which $1, v$ is an integral basis over $u$. It follows that $1, y$ is a normal basis relative to $x$ in which the order of $y$ at $x = \infty$ is $n$.) Therefore, $(h_1 + h_2 y)\,dx$ is a holomorphic differential if and only if

$$[h_1\ \ h_2] \begin{bmatrix} 2 & 0 \\ 0 & 2f(x) \end{bmatrix} = [0\ \ q(x)],$$

where $q(x)$ is a polynomial of degree at most $n - 2$. Thus,

$$h\,dx = \frac{q(x)\,y\,dx}{2f(x)} = \frac{q(x)\,dx}{2y}$$

is the most general holomorphic differential, where $q(x)$ is a polynomial of degree at most $n - 2$. The genus is $n - 1$.
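The simplification claimed in Example 1 for the Klein curve can be checked mechanically. The sketch below is only an illustration under stated assumptions: it takes the unsimplified solution to be $h = (4x^6 - 9xy + 6x^3y^2)/(4x^9 + 27x^2)$ (this reading of the printed formula is an assumption, obtained here by solving $[h_1\ h_2\ h_3]S = [0\ 0\ 1]$), and it verifies both that this row vector solves the system and that $(3y^2 + x^3)$ times the numerator reduces to the denominator modulo $y^3 + x^3y + x$. The dictionary representation of polynomials and all function names are choices made for this example.

```python
from collections import defaultdict
from fractions import Fraction

# A polynomial in x and y is a dict {(i, j): c} standing for the sum of c * x^i * y^j.


def add(f, g):
    total = defaultdict(Fraction)
    for poly in (f, g):
        for key, c in poly.items():
            total[key] += Fraction(c)
    return {key: c for key, c in total.items() if c != 0}


def multiply(f, g):
    product = defaultdict(Fraction)
    for (i1, j1), c1 in f.items():
        for (i2, j2), c2 in g.items():
            product[(i1 + i2, j1 + j2)] += Fraction(c1) * Fraction(c2)
    return {key: c for key, c in product.items() if c != 0}


def reduce_on_klein_curve(f):
    """Apply y^3 = -x^3*y - x until every term has y-degree below 3."""
    pending = {key: Fraction(c) for key, c in f.items()}
    reduced = defaultdict(Fraction)
    while pending:
        (i, j), c = pending.popitem()
        if j < 3:
            reduced[(i, j)] += c
        else:
            # x^i * y^j = x^i * y^(j-3) * (-x^3*y - x)
            for key in ((i + 3, j - 2), (i + 1, j - 3)):
                pending[key] = pending.get(key, Fraction(0)) - c
    return {key: c for key, c in reduced.items() if c != 0}


# Trace matrix S of Example 1: tr(y^(i+j-2)) in row i, column j.
S = [
    [{(0, 0): 3}, {}, {(3, 0): -2}],
    [{}, {(3, 0): -2}, {(1, 0): -3}],
    [{(3, 0): -2}, {(1, 0): -3}, {(6, 0): 2}],
]
row = [{(6, 0): 4}, {(1, 0): -9}, {(3, 0): 6}]   # assumed numerators of [h1 h2 h3]
denominator = {(9, 0): 4, (2, 0): 27}            # assumed denominator 4x^9 + 27x^2


def row_times_column(col):
    total = {}
    for k in range(3):
        total = add(total, multiply(row[k], S[k][col]))
    return total


row_times_S = [row_times_column(col) for col in range(3)]

numerator_of_h = {(6, 0): 4, (1, 1): -9, (3, 2): 6}   # 4x^6 - 9xy + 6x^3*y^2
chi_y = {(0, 2): 3, (3, 0): 1}                        # 3y^2 + x^3
cross_product = reduce_on_klein_curve(multiply(chi_y, numerator_of_h))
```

If `row_times_S` equals `[{}, {}, denominator]`, the row vector solves the system with right-hand side $[0\ 0\ 1]$ after division by the denominator, and if `cross_product` equals `denominator`, the two expressions for $h$ agree on the curve.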
Essay 4.7 The Riemann-Roch Theorem

Dedekind and Weber say in their classic treatise that the Riemann-Roch theorem, in its usual formulation, determines the number of arbitrary constants in a function with given poles [14, §28]. Indeed, that is exactly the way Roch himself formulated the theorem [58], as his title "On the Number of Arbitrary Constants in Algebraic Functions" indicates. The answer, a formula for the dimension of the vector space of rational functions with (at most) given poles, is a corollary of the theorem of this essay, which describes the principal parts of rational functions on an algebraic curve.

Let $f(x, y)$ be a rational function on a curve $\chi(x, y) = 0$, say $f(x, y) = p(x, y)/q(x)$, where $p$ and $q$ are polynomials with integer coefficients and $f$ is regarded as an element of the root field of $\chi(x, y)$. The principal parts of $f$ at finite values of $x$ are, by definition, the terms with negative exponents in the expansions of $f$ in powers of $x - \alpha$ for algebraic numbers $\alpha$. Such expansions are obtained by applying Newton's polygon to expand $y$ in $n = \deg_y \chi$ ways in (possibly fractional) powers of $x - \alpha$, substituting these expansions in $p(x, y)$, and multiplying the result by the expansion of $1/q(x)$ in powers of $x - \alpha$; they can contain negative powers of $x - \alpha$ only if the expansion of $1/q(x)$ does, which is to say, only if $\alpha$ is a root of $q(x)$. The principal parts of $f(x, y)$ thus amount simply to a list of the roots $\alpha$ of $q(x)$ and, for each of them, a list of the terms, if any, with negative exponents in the $n$ series found by substituting expansions of $y$ in powers of $x - \alpha$ in $f(\alpha + (x - \alpha), y)$. One can define the principal parts of $f$ at $x = \infty$ as the principal parts at $u = 0$ when $u = \frac{1}{x}$, but for the sake of simplicity this essay will deal only with rational functions that are finite at $x = \infty$, so that there are no principal parts at $x = \infty$.

Specifically, the only functions considered will be those of the form $f(x, y) = p(x, y)/q(x)$, where $p(x, y)$ is in $\Theta(x^{\nu})$ for $\nu = \deg q$. Expansion of numerator and denominator of $f(x, y) = \frac{p(x, y)/x^{\nu}}{q(x)/x^{\nu}}$ in powers of $1/x$ then gives a quotient of power series in $1/x$ in which neither numerator nor denominator contains terms with negative exponents (the numerator is integral over $1/x$) and the denominator is not zero when $1/x = 0$, so the expansions of $f(x, y)$ in powers of $\frac{1}{x}$ contain no terms with negative exponents.

For values $\alpha$ of $x$ at which $\chi(x, y) = 0$ ramifies (which is to say that at least one of the expansions of $y$ in powers of $x - \alpha$ involves fractional powers) the principal parts of a function satisfy an obvious consistency requirement, namely, since one solution $y = \beta_0 + \beta_1 s + \beta_2 s^2 + \cdots$ in which $s = \sqrt[m]{x - \alpha}$ for $m > 1$ implies $m - 1$ other solutions obtained by multiplying $s$ by some $m$th root of 1 other than 1, it must be true that a term of any one of the corresponding expansions of $f(x, y)$ determines the term with the same exponent in any of the other $m - 1$ expansions: One needs merely to change $s$ to $\omega s$ for a suitable root of unity $\omega$.

For these reasons, a set of proposed principal parts of a rational function on the curve $\chi(x, y) = 0$ will be defined to consist of (1) an algebraic number field $A$, (2) a finite set of elements $\alpha_1, \alpha_2, \ldots, \alpha_k$ of $A$, and (3) for
each of the $\alpha_i$ and for each of the $n$ ways of expanding $y$ in powers of $x - \alpha_i$ an expression of the form $\gamma_1 (x - \alpha_i)^{-1} + \gamma_2 (x - \alpha_i)^{-2} + \cdots + \gamma_l (x - \alpha_i)^{-l}$, where $l$ is a positive integer and the $\gamma$'s are in $A$, except that in the case of expansions of $y$ that are in powers of $\sqrt[m]{x - \alpha_i}$ with $m > 1$ the expressions must take the form $\gamma_1 (\sqrt[m]{x - \alpha_i})^{-1} + \gamma_2 (\sqrt[m]{x - \alpha_i})^{-2} + \cdots + \gamma_l (\sqrt[m]{x - \alpha_i})^{-l}$ and must satisfy the consistency requirement just described. (A natural way to handle this consistency requirement is to prescribe the expression $\gamma_1 (\sqrt[m]{x - \alpha_i})^{-1} + \gamma_2 (\sqrt[m]{x - \alpha_i})^{-2} + \cdots + \gamma_l (\sqrt[m]{x - \alpha_i})^{-l}$ for just one expansion of $y$ in powers of $\sqrt[m]{x - \alpha_i}$ in each set of $m$ and to derive the others from it.)

The problem is to determine whether, for a given set of proposed principal parts, there is a rational function on the curve that is finite where $x = \infty$ and that has the stated principal parts. The answer given by the theorem below is that the holomorphic differentials give simple necessary and sufficient conditions for there to be such a function.

If $h\,dx$ is a holomorphic differential and if $f(x, y) = \frac{p(x, y)}{q(x)}$ is finite when $x = \infty$, then $\operatorname{tr}_x(fh)$ on the one hand is a rational function of $x$ whose denominator $q(x)$ has degree $\nu$ and whose numerator $\operatorname{tr}_x(ph)$ has degree at most $\nu - 2$, so the sum of the residues of $\operatorname{tr}_x(fh)$ over all finite values of the variable is zero, and on the other hand is a rational function whose residue at any finite value $\alpha$ of $x$ is a linear function of the principal parts of $f$. In this way, a certain linear function of the principal parts of $f$ is necessarily zero. Explicitly, the following lemma can be used to express the residue of $\operatorname{tr}_x(fh)$ at $x = \alpha$ in terms of the principal parts of $f$:

Lemma. Let $g(x, y)$ be a rational function on the curve $\chi(x, y) = 0$. The expansion of $\operatorname{tr}_x(g)$ in powers of $x - \alpha$ is the sum of the $n$ expansions in (possibly fractional) powers of $x - \alpha$ obtained by substituting the $n$ solutions of $\chi(x, y) = 0$ at $x = \alpha$ in $g(x, y)$.

Proof. Let the main theorem of Part 1 be used to construct a minimal splitting polynomial, call it $F(x, z)$, of $\chi(x, y)$ regarded as a polynomial in $y$ with coefficients in $\mathbf{Z}[x]$ and let $K$ be the root field of $F(x, z)$, which is to say that $K$ is the splitting field of $\chi(x, y)$. Finally, let $\bar{z}$ be an expansion of a solution $z$ of $F(x, z) = 0$ in (possibly fractional) powers of $x - \alpha$, say $\bar{z} = \delta_0 + \delta_1 s + \delta_2 s^2 + \cdots$, where $x = \alpha + s^m$. (Let $m$ be determined by the condition that the indices $i$ of terms in $\bar{z}$ in which $\delta_i \ne 0$ have no common divisor greater than 1.) The substitution $x = \alpha + s^m$, $z = \bar{z}$ embeds $K$ in the field of quotients of the ring of power series in $s$ with coefficients in $A$, where $A$ is an algebraic number field containing $\alpha$ that is constructed by the Newton polygon algorithm. Let $A(s)$ denote this field of quotients; it is, in effect, the ring of formal power series in $s$ with coefficients in $A$ enlarged to include power series with a finite number of terms with negative exponents, because the reciprocal of a power series $\gamma_i s^i + \gamma_{i+1} s^{i+1} + \cdots$ in which the term of lowest degree has degree $i$ can be expressed as a power series $\frac{1}{\gamma_i} s^{-i} + \cdots$ in which there are $i$ or fewer terms with negative exponents.
In short, the splitting field $K$ of $\chi(x, y)$ can be regarded* as a subfield of $A(s)$ for a sufficiently large algebraic number field $A$ under an embedding that carries $x$ to $\alpha + s^m$.

Let $g_1, g_2, \ldots, g_\mu$ be the distinct images of $g(x, y)$ under the Galois group of $K$. The irreducible polynomial $\psi(X)$ with coefficients in† $\mathbf{Q}(x)$ of which $g(x, y)$ is a root is then $\prod_{i=1}^{\mu} (X - g_i)$, and $\operatorname{tr}_x(g)$ is $-j$ times the coefficient of $X^{\mu-1}$ in $\psi(X)$, where $j$ is the degree of the root field $K$ of $\chi(x, y)$ as an extension of its subfield generated by $g$, because the matrix that represents multiplication by $g$ relative to a basis of $K$ over $\mathbf{Q}(x)$ can be arranged as a $j \times j$ matrix of $\mu \times \mu$ blocks in which the blocks off the diagonal are all zero and the diagonal blocks are all the matrix whose first $\mu - 1$ rows are the last $\mu - 1$ rows of $I_\mu$ and whose last row is the negatives of the coefficients of $\psi$ (except the leading coefficient 1) listed in reverse order. Thus, since the coefficient of $X^{\mu-1}$ is $-(g_1 + g_2 + \cdots + g_\mu)$, the expansion of $\operatorname{tr}_x(g)$ in powers of $s$ is given by $\operatorname{tr}_x(g) = j(g_1 + g_2 + \cdots + g_\mu)$ when the $g_i$ are represented as elements of $A(s)$. What is to be shown, then, is that substitution of the $n$ expansions of $y$ in powers of $s$, along with substitution of $\alpha + s^m$ for $x$, into $g(x, y)$ gives each of the $\mu$ expansions $g_i$ for $i = 1, 2, \ldots, \mu$ exactly $j$ times.

The Galois group of $K$ expresses each root $z$ of $F(x, z)$ as a polynomial in one such root with coefficients in $\mathbf{Q}(x)$. Substitution of $\bar{z}$ in these polynomials, together with substitution of $\alpha + s^m$ for $x$, gives $\deg_z F(x, z)$ distinct embeddings of $K$ in $A(s)$. The possible expansions of $y$ in powers of $s$ and the possible expansions $g_i$ of $g(x, y)$ all occur as images of $y$ or $g(x, y)$, respectively, under these embeddings. The action of the Galois group on the embeddings implies the desired conclusion that each $g_i$ occurs for the same number of different expansions of $y$.

(Note in particular that all fractional powers of $x - \alpha$ cancel when the sum $g_1 + g_2 + \cdots + g_\mu$ is computed. This is a clear consequence of the "consistency requirement" described above, because the sum $1 + \omega + \omega^2 + \cdots + \omega^{m-1}$ is zero whenever $\omega$ is an $m$th root of unity and $\omega \ne 1$.)

Thus, the residue of $\operatorname{tr}_x(fh)\,dx$ at any $\alpha$ is the sum of the coefficients of $(x - \alpha)^{-1}$ over all $n$ expansions of $fh$ in powers of $s = \sqrt[m]{x - \alpha}$. Not only does this sum depend linearly on the principal parts of $f$, but the entries in the matrix that describes it are coefficients in the expansion of $h$ in powers of $s$.

* Such an embedding of an algebraic field of transcendence degree 1 in $A(s)$ is analogous to an embedding of an algebraic number field in the field of complex numbers (see Essay 5.1). As in the latter case, the field $K$ loses much of its constructive meaning when it is regarded merely as a subfield of $A(s)$, because elements of $A(s)$ are infinite series. An element of $K$ is a root of a polynomial whose coefficients are rational functions of $x$, so the infinite series that represents it as an element of $A(s)$ can be specified by giving enough terms to determine an unambiguous truncated solution of the equation in question, after which all later terms are determined by Newton's polygon.

† As always, $\mathbf{Q}(x)$ denotes the field of quotients of $\mathbf{Z}[x]$, which is to say the field of rational functions in $x$.
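A small worked instance may make the Lemma concrete. The curve below is chosen here purely for illustration and is not one of the text's examples: take $\chi(x, y) = y^2 - x$ and $\alpha = 0$, so that $n = 2$, $m = 2$, $s = \sqrt{x}$, and the two expansions of $y$ are $y = s$ and $y = -s$. The traces are computed from the matrix of multiplication by $g$ on the basis $1, y$ (an assumption about how one would carry out the check, not a method taken from the text).

```latex
% Illustration only: \chi(x,y) = y^2 - x, \alpha = 0, s = \sqrt{x}, expansions y = s and y = -s.
\begin{align*}
g = y:\quad
  & \operatorname{tr}_x(g) = 0,
  & s + (-s) &= 0,\\
g = y^2 + y:\quad
  & \operatorname{tr}_x(g) = 2x,
  & (s^2 + s) + (s^2 - s) &= 2s^2 = 2x,\\
g = \frac{1 + y}{x}:\quad
  & \operatorname{tr}_x(g) = \frac{2}{x},
  & (s^{-2} + s^{-1}) + (s^{-2} - s^{-1}) &= 2s^{-2} = \frac{2}{x}.
\end{align*}
```

In each row the odd powers of $s$ cancel, as the consistency requirement predicts, and in the last row the residue of $\operatorname{tr}_x(g)\,dx$ at $x = 0$ is 2, the sum of the coefficients of $(x - 0)^{-1} = s^{-2}$ in the two expansions.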
Explicitly, if $h = h_0 + h_1 s + h_2 s^2 + \cdots$ and if $f = \gamma_1 s^{-1} + \gamma_2 s^{-2} + \cdots + \gamma_l s^{-l}$, then the coefficient of $(x - \alpha)^{-1}$ in $fh$ is $h_0\gamma_m + h_1\gamma_{m+1} + h_2\gamma_{m+2} + \cdots + h_{l-m}\gamma_l$, a linear function of $\gamma_1, \gamma_2, \ldots, \gamma_l$. When this formula is summed over all principal parts of $f$ it gives, for each holomorphic differential $h\,dx$, an explicit linear function of the principal parts of $f$ that must be zero.

Theorem. If a set of proposed principal parts satisfies the condition just described for each holomorphic differential $h_i\,dx$ in a basis $h_1\,dx, h_2\,dx, \ldots, h_g\,dx$ of the holomorphic differentials on $\chi(x, y) = 0$, then it in fact gives the principal parts of some rational function on the curve. In short, these $g$ necessary conditions for a set of proposed principal parts to be the principal parts of a function are sufficient.

Proof. Let a set of proposed principal parts be called subordinate to a polynomial $q(x)$ if each $\alpha$ for which it specifies a polynomial $\gamma_1 s^{-1} + \gamma_2 s^{-2} + \cdots + \gamma_l s^{-l}$ is a root of $q(x)$ and if, moreover, the multiplicity of $\alpha$ as a root of $q(x)$ is at least $l/m$, so that multiplication of $\gamma_1 s^{-1} + \gamma_2 s^{-2} + \cdots + \gamma_l s^{-l}$ by $q(x) = q(\alpha + s^m)$ makes all exponents nonnegative. Every set of proposed principal parts is subordinate to some $q(x)$, so it will suffice to prove that the theorem holds for all sets of proposed principal parts subordinate to a given $q(x)$. Moreover, those subordinate to $q(x)$ are also subordinate to $q(x)r(x)$ for any polynomial $r(x)$, so one can assume without loss of generality that $q(x)$ is a polynomial of high degree.

If $f(x, y)$ is finite at $x = \infty$ and its principal parts are subordinate to $q(x)$, then $q(x)f(x, y)$ has order at most $\deg q$ at $x = \infty$ and has no poles for finite $x$, which is to say that $f(x, y) = p(x, y)/q(x)$, where $p(x, y)$ is in $\Theta(x^{\nu})$ for $\nu = \deg q$. But for large $\nu$ the set of such functions $f(x, y)$ is a vector space of dimension $n\nu - g + 1$ over the field of constants.

Functions that differ by a constant have the same principal parts and conversely, so the vector space of principal parts that actually occur is seen in this way to have dimension $n\nu - g$ for large $\nu$.

On the other hand, the dimension of the space of proposed principal parts subordinate to $q(x)$ can be found in the following way. If $\alpha$ is a root of $q(x)$ of multiplicity $\mu$, and if none of the $n$ expansions of $y$ in powers of $x - \alpha$ involve fractional powers, the proposed principal parts subordinate to $q(x)$ contain $n\mu$ coefficients corresponding to this $\alpha$, $\mu$ coefficients in each expansion (those of $(x - \alpha)^{-1}, (x - \alpha)^{-2}, \ldots, (x - \alpha)^{-\mu}$). The same formula $\mu n$ holds even when some expansions involve fractional powers, because the proposed principal part corresponding to an expansion in powers of $\sqrt[m]{x - \alpha}$ contains $m$ times as many coefficients in the required range, but by the "consistency requirement" the coefficients of just one determine those of a set of $m$ of them. Therefore, a set of proposed principal parts subordinate to $q(x)$ contains $n\nu$ unknown coefficients $\gamma$, where $n = \deg_y \chi$ and $\nu = \deg q$.

In short, the principal parts that actually occur are a subspace of codimension $g$ in the $n\nu$-dimensional space of proposed principal parts subordinate to
$q(x)$ when $\nu = \deg q$ is sufficiently large. Since the conditions imposed by the holomorphic differentials (the sum of the residues of $\operatorname{tr}_x(fh)$ is zero for all holomorphic differentials $h\,dx$) are expressed by $g$ homogeneous linear conditions on the coefficients of the principal parts, the actually occurring ones account for all of those that satisfy the necessary conditions provided the necessary conditions are independent (because then they determine a subspace of codimension $g$). In short, it will suffice to prove that every polynomial divides one for which the $g$ necessary conditions are independent as conditions on sets of proposed principal parts subordinate to that polynomial.

The $g$ homogeneous linear conditions imposed by the holomorphic differentials on proposed principal parts subordinate to $q(x)$ are expressed by a $g \times (n\nu)$ matrix of elements of $A$, call it $C_q$. What is to be shown is that every polynomial divides a polynomial $q(x)$ for which the rank of $C_q$ is $g$. This will be done by showing that if the rank of $C_q$ is less than $g$ and if $\beta$ is any element of $A$ that is not a root of $q(x)$, then replacing $q(x)$ with $(x - \beta)q(x)$ increases the rank of $C_q$, except for very extraordinary coincidences in the choice of $\beta$ which can easily be avoided.

In fact, changing $q(x)$ to $(x - \beta)q(x)$ increases the degree of $q(x)$ by 1 and therefore adds $n$ columns to $C_q$. Each of the new columns contains the $g$ values of the coefficients $h_i$ of the basis $h_1\,dx, h_2\,dx, \ldots, h_g\,dx$ of holomorphic differentials at one of the $n$ points on the curve at which $x = \beta$. More precisely, let $\beta$ be required to be an element of $A$ ($\beta$ can be taken to be a positive integer) for which $\chi(\beta, X)$ has distinct roots; then each column of the new $C_q$ corresponds to one of the roots $\gamma$ of $\chi(\beta, X)$ and it contains the values of $h_1, h_2, \ldots, h_g$ when $(\beta, \gamma)$ is substituted for $(x, y)$.

It is to be shown that if the original $C_q$ has rank less than $g$, then the extended $C_q$ has rank greater than that of the original, except under extraordinary circumstances. The ranks of the original and the extended $C_q$ are unchanged by a change of basis of the holomorphic differentials. If the rank of the original $C_q$ is less than $g$, then there are constants $c_1, c_2, \ldots, c_g$, not all zero, such that multiplication of the original $C_q$ on the left by the row matrix with entries $c_1, c_2, \ldots, c_g$ gives a row of zeros. If a new basis of the holomorphic differentials is used in which the first holomorphic differential is $\sum_{i=1}^{g} c_i h_i\,dx$, the original $C_q$ becomes a matrix whose first row is zero and the extended $C_q$ becomes a matrix in which the first row contains $n\nu$ zeros and $n$ new entries that are the values of the new $h_1$ at the $n$ points $(x, y) = (\beta, \gamma)$. Thus, the extended $C_q$ has greater rank unless all $n$ of these values are zero. Clearly, it would be an extraordinary coincidence if a value of $\beta$ chosen at random were to result in even one value of $h_1$ that was zero, much less $n$ of them. Since the number of zeros of the rational function $h_1$ (which is nonzero and does not have a pole at $(\beta, \gamma)$ by the choice of $\beta$) is finite, one can easily find a new $\beta$ for which the first row of the extended $C_q$ contains a nonzero entry, and therefore find a $\beta$ for which the rank of $C_q$ is increased.
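The linear algebra at the end of the proof can be illustrated with a toy computation. The sketch below uses hypothetical numbers only (the matrices are not derived from any actual curve): it assumes $g = 3$, $n = 2$, $n\nu = 4$, an original matrix whose first row is already zero after the change of basis, and two new columns standing for the values of $h_1, h_2, h_3$ at the two points over $x = \beta$. The function name `rank` and the elimination strategy are choices made for this example.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix of rational numbers, by exact Gaussian elimination."""
    rows = [[Fraction(entry) for entry in row] for row in matrix]
    found = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        # Look for a row at or below position `found` with a nonzero entry here.
        pivot = next(
            (r for r in range(found, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found


# Hypothetical original C_q (g = 3 rows, n*nu = 4 columns), first row zero.
original = [
    [0, 0, 0, 0],
    [1, 2, 0, 1],
    [0, 1, 3, 1],
]
# Hypothetical new columns: values of h_1, h_2, h_3 at the n = 2 points over x = beta.
lucky_columns = [[5, 0], [1, 4], [2, 2]]     # h_1 is nonzero at one of the points
unlucky_columns = [[0, 0], [1, 4], [2, 2]]   # h_1 vanishes at both points

extended = [row + new for row, new in zip(original, lucky_columns)]
unchanged = [row + new for row, new in zip(original, unlucky_columns)]

rank_original = rank(original)
rank_extended = rank(extended)
rank_unchanged = rank(unchanged)
```

With these numbers the rank goes from 2 to 3 as soon as one new entry in the first row is nonzero, and stays at 2 only in the coincidental case in which $h_1$ vanishes at every point over $x = \beta$.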
This theorem determines exactly which proposed principal parts are actual principal parts and therefore solves the Riemann-Roch problem of determining the dimension of the space of rational functions with prescribed poles with, at most, prescribed multiplicities. In the notation and terminology of the proof, one can say that the vector space of functions whose principal parts are subordinate to $q(x)$ has dimension $n\nu - \rho + 1$, where $\rho$ is the rank of $C_q$, because the proposed principal parts are a space of dimension $n\nu$, those that actually occur satisfy $\rho$ independent conditions, and the linear function that carries functions to their principal parts has a one-dimensional kernel.

This formula $n\nu - \rho + 1$ gives the answer only in the special cases in which, roughly speaking, a pole is allowed at one point only if a pole is allowed at all other points where $x$ has the same value (multiplicities counted). A more general case of the formula can be stated by introducing a little more terminology. Let one set of proposed principal parts be said to be subordinate to another if they make use of the same algebraic number field $A$, if each $\alpha$ of the first also occurs in the second, and if for each proposed expansion of $y$ in powers of $x - \alpha$, the terms in the corresponding expression $\gamma_1 (\sqrt[m]{x - \alpha_i})^{-1} + \gamma_2 (\sqrt[m]{x - \alpha_i})^{-2} + \cdots + \gamma_l (\sqrt[m]{x - \alpha_i})^{-l}$ (where in most cases $m$ is 1) of the first all have exponents at least as great (bearing in mind that $-1$ is greater than $-2$) as the smallest exponent of a nonzero term in the corresponding expression of the second. In short, the first set of proposed principal parts calls for no poles of greater multiplicity than are called for by the second.

Let the number of coefficients in a set of proposed principal parts be the number of elements of $A$ that need to be specified to describe a set of proposed principal parts that is subordinate to it (bearing in mind that if it calls for terms with fractional exponents, then the coefficients of one of the $m$ expressions $\gamma_1 (\sqrt[m]{x - \alpha_i})^{-1} + \gamma_2 (\sqrt[m]{x - \alpha_i})^{-2} + \cdots + \gamma_l (\sqrt[m]{x - \alpha_i})^{-l}$ determine those in the other $m - 1$).

Corollary 1 (Riemann-Roch Theorem). The functions whose principal parts are subordinate to a given set of proposed principal parts form a vector space of dimension $N - \rho + 1$, where $N$ is the number of coefficients in the given set of proposed principal parts and $\rho$ is the rank of the $g \times N$ matrix that describes the necessary and sufficient conditions of the theorem.

Deduction. The sets of principal parts subordinate to the given set form a space of dimension $N$; the necessary and sufficient conditions describe a subspace of codimension $\rho$, which is the space of possible principal parts of actual functions; and the space of functions with these principal parts has dimension one greater, because two functions with the same principal parts differ by a constant.

The theorem also implies that the conditions (1) of Essay 4.6 satisfied by algebraic variations of $N$ points on a curve are sufficient, with a few added assumptions, for a proposed variation to be algebraic:
Corollary 2. Let $(x_i, y_i)$ for $i = 1, 2, \ldots, N$ be pairs of algebraic numbers that satisfy $\chi(x_i, y_i) = 0$, and suppose that $\chi(x_i, X)$ has $n$ distinct roots for each $x_i$, $i = 1, 2, \ldots, N$, so that Newton's polygon gives a unique power series solution $y = y_i + \beta(x - x_i) + \cdots$ of $\chi(x, y) = 0$ for each $i$. For any list of nonzero* algebraic numbers $\delta_1, \delta_2, \ldots, \delta_N$ that satisfy $\sum_{i=1}^{N} h(x_i, y_i)\,\delta_i = 0$ for all holomorphic differentials, there is a rational function $f$ on $\chi(x, y) = 0$ whose zeros are precisely at the points $(x_i, y_i)$ and whose expansion $f = \gamma_i (x - x_i) + \cdots$ in powers of $x - x_i$ at each such point shows that $\frac{dx}{df}(x_i, y_i) = \delta_i$ in the sense that $\delta_i = \frac{1}{\gamma_i}$. More picturesquely, prescribed infinitesimal changes $dx_i$ in the $x$-coordinates of the points are generated by changing $f$ from 0 to $df$, which changes $x_i$ to $x_i + \delta_i\,df$, provided the prescribed changes satisfy the necessary conditions $\sum h\,dx_i = 0$ for all holomorphic differentials $h\,dx$.

Deduction. The required function $f$ is found by using the theorem to construct a function $\phi$ finite at $x = \infty$ whose principal parts are $\frac{\delta_i}{x - x_i}$ for $i = 1, 2, \ldots, N$ and setting $f = \frac{1}{\phi}$. Then $f$ is zero only at the $(x_i, y_i)$, and at these points $\phi = \frac{\delta_i}{x - x_i} + \cdots$, from which $f = \frac{1}{\delta_i}(x - x_i) + \cdots$ follows.

Corollary 3. When $(x_i, y_i)$ for $i = 1, 2, \ldots, N$ are points on $\chi(x, y) = 0$ as in Corollary 2, the rational functions on $\chi(x, y) = 0$ that have simple poles, at most, at these $N$ points and no other poles form a vector space of dimension $N - g + 1 + \mu$, where $g$ is the genus of the curve and $\mu$ is the dimension of the vector space of holomorphic differentials that are zero at all $N$ points.

Deduction. By Corollary 2, the dimension is $N - \rho + 1$, where $\rho$ is the rank of the $g \times N$ matrix whose columns correspond to the $N$ given points and whose $g$ entries in each column are the values at the corresponding point of the coefficients $h_i$ of a basis $h_1\,dx$, $h_2$

* It is natural to exclude $\delta_i = 0$, because the points $(x_i, y_i)$ for which $\delta_i = 0$ can be omitted from the list.
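The dimension formula of Corollary 3 can be tried out on the simplest curves of positive genus. The following worked instance is an illustration under stated assumptions, not an example from the text: it takes $\chi(x, y) = y^2 - f(x)$ with $\deg f = 3$ and distinct roots, so that by Example 2 of Essay 4.6 (with $n = 2$ there) the genus is 1 and every holomorphic differential is a constant multiple of $dx/(2y)$. The points are assumed to satisfy $y_i \ne 0$, which is what the hypothesis that $\chi(x_i, X)$ has distinct roots amounts to here.

```latex
% Illustration only: \chi(x,y) = y^2 - f(x), \deg f = 3, distinct roots, so g = 1 and h\,dx = c\,dx/(2y).
\begin{align*}
&\text{Points: } (x_i, y_i),\quad i = 1, \dots, N,\quad N \ge 1,\quad y_i^2 = f(x_i),\quad y_i \ne 0,\\
&h(x_i, y_i) = \frac{1}{2y_i} \ne 0 \quad\Longrightarrow\quad \mu = 0,\\
&\dim = N - g + 1 + \mu = N - 1 + 1 + 0 = N.
\end{align*}
```

For $N = 1$ the space is one-dimensional, so it contains only the constants: on such a curve no function has a single simple pole and no other poles. For $N = 2$ the condition of Corollary 2 reads $\frac{\delta_1}{2y_1} + \frac{\delta_2}{2y_2} = 0$, which ties the two residues together and leaves one nonconstant function up to scaling and the addition of constants.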
Essay 4.8 The Genus Is a Birational Invariant So far in these essays, the genus of the field of rational functions on an algebraic curve xi^^y) — 0 tias been described in ways t h a t used the special element X of the field, first when the description was in terms of the dimension of the vector space 0 ( x ^ ) , then when it was in terms of the dimension of t h e space of holomorphic differentials h dx over the field of constants. However, the geometric motivation of the concept in terms of algebraic variations leads one to believe t h a t the genus depends only on the curve, not on the p a r a m e t e r X used to describe the curve. If z is any element of the root field of x ( x , y) t h a t is not a constant, one can construct* an element w of this root field such t h a t every element of t h e field can be expressed rationally in terms of z and w and such t h a t z and w satisfy a relation of the form Xi(z, w) = 0, in which Xi{z, w) is an irreducible polynomial with integer coefficients t h a t is monic in w. In short, the root field of x(^5 y) can also be described as t h e root field of Xi{z, w). W h a t is to be expected, and what will be proved in this essay, is t h a t the genus is t h e same whether x or 2; is t h e special parameter used to define it. In other words, t h e genus depends only on the field of rational functions on t h e curve, which is what it means to say t h a t t h e genus is a birational invariant. The needed connection is obvious from the point of view of differential calculus: T h e rule hdx ^^ [h^) dz establishes a one-to-one correspondence between differentials expressed with respect to x and differentials expressed with respect to z. 
The heuristic meaning of "holomorphic" is "no poles," so this correspondence between differentials with respect to $x$ and those with respect to $z$ should be expected to put the holomorphic differentials in the two cases in one-to-one, linear correspondence, implying that the dimension of these spaces of holomorphic differentials (the genus in the two cases) should be the same.

It is easy to give algebraic meaning to $\frac{dx}{dz}$. Let $x$ and $z$ be given elements of the root field $K$ of $\chi(x, y)$. There is† an irreducible polynomial $\phi(X, Z)$ in two indeterminates with integer coefficients (it is uniquely determined, up to its sign, by $x$ and $z$) with the property that $\phi(x, z) = 0$ in $K$. The derivative of $x$ with respect to $z$ is found algebraically by implicit differentiation: differentiation of $\phi(x, z) = 0$ gives $\phi_x(x, z)\,dx + \phi_z(x, z)\,dz = 0$ and therefore gives

$$\frac{dx}{dz} = -\frac{\phi_z(x, z)}{\phi_x(x, z)},$$

where the subscripts indicate partial derivatives.

* See Essay 2.2. The field $\mathbf{Q}(z)$ of rational functions in $z$ is isomorphic to a subfield of the root field, provided $z$ is not a constant. Adjunction of $x$ and $y$ to $\mathbf{Q}(z)$ gives an explicit extension of $\mathbf{Q}(z)$ that is isomorphic to the root field of $\chi(x, y)$. By the theorem of the primitive element, such a double adjunction can be obtained by a simple adjunction, and one can describe a simple adjunction as the root field of an irreducible monic polynomial with coefficients in $\mathbf{Z}[z]$ in the usual way.

† Since the root field is an extension of $\mathbf{Q}(x)$ of finite degree $n$, the powers $1, z, z^2, \ldots, z^n$ are linearly dependent over $\mathbf{Q}(x)$, which is to say that one can find a nonzero polynomial of degree at most $n$ with coefficients in $\mathbf{Q}(x)$ of which $z$ is a root. The needed relation $\phi(x, z) = 0$ is found by clearing denominators and passing to an irreducible factor if necessary. Because the root field of $\chi(x, y)$ contains $\mathbf{Q}(x)$, the relation $\phi(x, z) = 0$ must involve $z$. To say that $z$ is a parameter (that it is not a constant) means that $\phi$ also involves $x$; if this is not the case, the denominator of $\frac{dx}{dz}$ is zero and the derivative is not defined ($x$ is not a function of $z$).

The theorem to be proved states that when $\frac{dx}{dz}$ is defined algebraically in this way, $h\,dx$ is holomorphic if and only if $h\,\frac{dx}{dz}\,dz$ is, when, naturally, one determines whether $h_1\,dz$ is holomorphic for $h_1 = h\,\frac{dx}{dz}$ by dealing with it as a differential in the root field of $\chi_1(z, w)$ rather than the root field of $\chi(x, y)$.

The notion of "principal parts" of an element of $K$ that was defined in the preceding essay will play an important role in the proof, but now the dependence of these principal parts on $x$ needs to be emphasized: The principal parts of $f$ relative to a parameter $x$ are the terms with negative exponents‡ in all possible expansions of $f$ in (possibly fractional) powers of $x - \alpha$ for algebraic numbers $\alpha$. (Possible principal parts at $x = \infty$ will be ignored.) As was shown in the last essay, for a given $f$ one can find a polynomial $q(x)$ with the property that the only possible principal parts of $f$ relative to $x$ occur when $\alpha$ is a root of $q(x)$. Therefore, the determination of the principal parts relative to $x$ is a finite (and usually quite simple) calculation. One can then use the following lemma to determine whether $f\,dx$ is holomorphic for finite $x$:

Lemma 1. A differential $f\,dx$ is holomorphic for finite $x$ if and only if all terms of all principal parts of $f$ relative to $x$ have exponents greater than $-1$.

This criterion says roughly that $f\,dx$ has no poles, because at a point of the curve where $x = \alpha$ there is a local parameter $s$ for which $x - \alpha = s^m$ for some $m$; to say that $f\,dx$ has no pole at this place where $s = 0$ is to say that $f\,d(s^m) = m f s^{m-1}\,ds$ has no pole, which is to say that multiplication of $f$ by $s^{m-1}$ clears its denominator, or, what is the same, that its expansion in powers of $x - \alpha = s^m$ contains no terms whose exponents are less than or equal to $-1$.

‡ This definition is imprecise in that it ignores the question of multiplicities. In Essay 4.7, $n = \deg_y \chi$ distinct embeddings of the root field of $\chi$ in $A(s)$ were constructed for each given $\alpha$. A given $f$ may have fewer than $n$ distinct images, so the same principal parts may occur for more than one embedding. Obviously, Lemma 1 is not affected by the way in which multiplicities are treated.

Proof. Suppose first that all terms of all principal parts of $f$ relative to $x$ do have exponents greater than $-1$. If $\theta$ is integral over $x$, then its image under any embedding of $K$ in $A(s)$ that takes $x$ to $\alpha + s^m$ for some $m$ is a series
that contains no powers of $s$ with negative exponents. Therefore, it contains no powers of $x - \alpha = s^m$ with negative exponents, so the image of $f\theta$ under any such embedding contains no terms in which the exponent on $x - \alpha$ is $-1$ or less. As was seen in Essay 4.7, the expression of $\operatorname{tr}_x(f\theta)$ as a power series in $x - \alpha$ is the sum of the $n$ images of $f\theta$ in $A(s)$. Therefore, it is a series in which no term has an exponent less than or equal to $-1$. Thus, since it is a series expansion of a rational function of $x$, which implies that it contains no terms with fractional exponents, it is a power series in $x - \alpha$. Since this is true for every $\alpha$, it follows that $\operatorname{tr}_x(f\theta)$ must in fact be a polynomial in $x$ whenever $\theta$ is integral over $x$. In short, $f\,dx$ is holomorphic for finite $x$, which proves one direction of Lemma 1.

Conversely, suppose that some embedding of the root field $K$ of $\chi(x, y)$ in $A(s)$ that carries $x$ to $\alpha + s^m$ (where $\alpha$ is an algebraic number and $m$ is a positive integer) carries $f$ to a series in which the exponent on $s$ is less than or equal to $-m$, so that the exponent on $x - \alpha$ is less than or equal to $-1$. Let $z_1, z_2, \ldots, z_n$ be an integral basis of the root field over $x$, and let $\theta = \sum c_i z_i$, where $c_1, c_2, \ldots, c_n$ are constants to be determined. Consider the terms in the $n$ expansions of $\theta$ in (possibly fractional) powers of $x - \alpha$ in which the exponents are less than 1. When the $n$ expansions do not involve fractional powers, the expansion of each $z_i$ has just one term, the constant term, in which the exponent is less than 1 in each of the $n$ embeddings, and the same is true of $\theta$; in this case, the $n$ constant terms in the series representations of $\theta$ are the entries of the column matrix $Mc$, where $c$ is the column matrix with entries $c_1, c_2, \ldots, c_n$ and $M$ is the $n \times n$ matrix whose $j$th column contains the $n$ constant terms in the images of $z_j$ under the $n$ embeddings.
When one or more of the $n$ expansions do involve fractional exponents (when the curve is ramified at $x = \alpha$) a similar statement is true. An embedding involving powers of $s = \sqrt[m]{x - \alpha}$ in which $m > 1$ implies $m - 1$ others in which $s$ is replaced by $\omega s$ for the $m$th roots of unity $\omega$ other than 1. There are precisely $m$ terms (some of which may be zero) in any one of these series in which the exponent on $x - \alpha$ is less than 1, namely, the terms in $s^0, s^1, \ldots, s^{m-1}$. Since the sum of the various values of $m$ is $n$, it follows that all coefficients of all $n$ expansions of $\theta$ in which the exponents on $x - \alpha$ are less than 1 are determined by just $n$ such coefficients, namely, the coefficients of a selection of the expansions when just one expansion is selected from each set of $m$ related expansions. The same is true of each $z_j$, and all coefficients of $\theta$ are determined by those given by a formula $Mc$, as before, in which the $j$th column of $M$ gives the coefficients of the selected expansions of $z_j$. (For example, in the case of the integral basis $1$, $y$, $y^2/x$ of the root field of $y^3 - xy + x^3$ over $x$, it was shown in Essay 4.4 that there are three expansions of $y$ when $\alpha = 0$, namely, $y = \pm\sqrt{x} + \cdots$ and $y = 0 + \cdots$, where the omitted terms are divisible by $x$. Therefore, every $\theta$ integral over $x$ has three expansions, but the one corresponding to $y = -\sqrt{x} + \cdots$ can be derived from the one corresponding to $y = \sqrt{x} + \cdots$ by changing $\sqrt{x}$ to $-\sqrt{x}$. When just two expansions, those corresponding to $y = \sqrt{x} + \cdots$ and $y = 0 + \cdots$, are selected,
the selected expansions of $\theta = c_1 + c_2 y + c_3 y^2/x$ are given by the formula

$$Mc = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix},$$

where the first two rows contain the coefficients of $1$ and $\sqrt{x}$ in the first selected expansion, in which $y^2/x = 1 + \cdots$, and the last row contains the coefficient of $1$ in the second, in which $y^2/x$ is divisible by $x$.)

That the matrix $M$ is invertible can be proved as follows: If $Mc = 0$, then the $n$ expansions of $\theta$ contain no terms in which the exponent on $x - \alpha$ is less than 1, so $\theta/(x - \alpha)$ is integral over $x$. If any $c_i$ were nonzero, one of the coefficients $c_i/(x - \alpha)$ in the representation of $\theta/(x - \alpha)$ relative to the integral basis $z_1, z_2, \ldots, z_n$ would not be a polynomial, contrary to the definition of an integral basis. Therefore, $Mc = 0$ implies $c = 0$, so the square matrix $M$ is invertible. In other words, the coefficients in the terms of the expansion of $\theta$ in which the exponent on $x - \alpha$ is less than 1 can be given arbitrarily chosen values (subject to the relations among $m$ such expansions when $m > 1$ that were just noted) by taking the column matrix $c$ to be the column matrix of chosen values multiplied on the left by $M^{-1}$.

Because the principal parts of $f$ are assumed to contain a nonzero term in which the exponent on $x - \alpha$ is less than or equal to $-1$, the least such exponent has the form $e = -i - \frac{j}{m}$, where $i \ge 1$ and $0 \le j < m$. Let $\theta = \sum c_k z_k$ be chosen so that the coefficient of $(x - \alpha)^{j/m}$ in the expansion of $\theta$ in the embedding in $A(s)$ that gives rise to the nonzero term with exponent $e$ in the expansion of $f$ is nonzero, but all other expansion coefficients of terms with exponent less than 1 are zero. Then the expansion of $f\theta$ in $m$ embeddings begins $\gamma(x - \alpha)^{-i} + \cdots$, where $\gamma \ne 0$, while in the remaining embeddings the expansion of $f\theta$ contains no terms in which the exponent on $x - \alpha$ is less than or equal to $-i$. Therefore, the sum of the $n$ expansions of $f\theta$ is $m\gamma(x - \alpha)^{-i} + \cdots$, where $i \ge 1$. In particular, this sum is not a polynomial in $x$.
The bilinear form "the trace of the product" from $K \times K$ to $\mathbf{Q}(x)$ is described, relative to the integral basis $z_1, z_2, \ldots, z_n$, by an $n \times n$ symmetric matrix $S$ of polynomials in $x$ with integer coefficients, namely, the matrix whose entry in the $i$th row of the $j$th column is $\operatorname{tr}_x(z_i z_j)$. To say that $f\,dx$ is holomorphic for finite $x$ means simply that all entries of $[f]S$ are polynomials in $x$ when $[f]$ denotes the row matrix whose entries are the coefficients that represent $f$ in the integral basis $z_1, z_2, \ldots, z_n$. If this were the case, the sum of the expansions of $f\theta$, which is $[f]S[c]$ where $[c]$ contains the coefficients of $\theta = \sum c_i z_i$ as above, would also be a polynomial in $x$. Since it is not, $f\,dx$ must not be holomorphic for finite $x$, which completes the proof of Lemma 1.

Theorem. Let $z$ be a parameter in the root field of $\chi(x, y)$ and let $\frac{dx}{dz}$ be the element of the root field defined using implicit differentiation as above. A differential $f\,dx$ is holomorphic if and only if $f\,\frac{dx}{dz}\,dz$ is holomorphic.
Proof. The reciprocal of $\frac{dx}{dz}$ is $\frac{dz}{dx}$, so it will suffice to prove that "$f\,dx$ is holomorphic" implies "$f\,\frac{dx}{dz}\,dz$ is holomorphic." Let $\delta$ be a given algebraic number and let an embedding of $K$ in $A(\sigma)$ be given that carries $z$ to $\delta + \sigma^{\mu}$ for some $\mu > 0$. It is to be shown that if $h\,dx$ is holomorphic, then the image of $h \cdot \frac{dx}{dz}$ under this embedding contains no terms in which the exponent on $z - \delta$ is less than or equal to $-1$, or, what is the same, that all exponents in the expansion of $(z - \delta) \cdot h \cdot \frac{dx}{dz}$ are positive.

Assume first that the image of $x$ under the given embedding has no terms in which the exponent on $\sigma$ is negative; say it is $\alpha + \alpha'\sigma + \alpha''\sigma^2 + \cdots$. In this case, let $m$ be the exponent of the first nonzero term in the expansion of $x - \alpha$. (There is such a term because $x$ is not a constant.) When an $m$th root $\epsilon_1$ of the reciprocal of $\alpha^{(m)}$ is adjoined to $A$, if necessary, the following lemma constructs a substitution $\sigma = \epsilon_1 s + \epsilon_2 s^2 + \epsilon_3 s^3 + \cdots$ that carries $x = \alpha + \alpha^{(m)}\sigma^m + \cdots$ to $\alpha + s^m$.

Lemma 2. Given a nonzero power series $A_m x^m + A_{m+1}x^{m+1} + A_{m+2}x^{m+2} + \cdots$ in which the coefficients are algebraic numbers and the first nonzero term contains $x$ to the power $m > 0$, and given an $m$th root $C_1$ of $1/A_m$, construct an infinite series $x = C_1 s + C_2 s^2 + C_3 s^3 + \cdots$ with algebraic number coefficients whose substitution in the series results in $s^m$.

Proof. Substitution of $x = C_1 s + C_2 s^2 + C_3 s^3 + \cdots$ in $A_m x^m + A_{m+1}x^{m+1} + A_{m+2}x^{m+2} + \cdots$ gives $B_m s^m + B_{m+1}s^{m+1} + B_{m+2}s^{m+2} + \cdots$, where $B_m = A_m C_1^m = 1$, $B_{m+1} = mA_m C_1^{m-1}C_2 + A_{m+1}C_1^{m+1}$, and so on. The formula for $B_{m+i}$ when $i > 0$ contains the terms $mA_m C_1^{m-1}C_{i+1}$ and $A_{m+i}C_1^{m+i}$; the remaining terms in the formula constitute a polynomial in $C_1, C_2, \ldots, C_i$ and $A_m, A_{m+1}, \ldots, A_{m+i-1}$ with integer coefficients. Thus, the requirement $B_{m+i} = 0$ for $i > 0$ is the statement that $C_{i+1}$ is a polynomial in $C_1, C_2, \ldots, C_i$ and $A_m, A_{m+1}, \ldots, A_{m+i}$ divided by $mA_m C_1^{m-1} = m/C_1$. Since $A_m = 1/C_1^m$, it follows that each successive $C_{i+1}$ can be expressed rationally in terms of $C_1, A_{m+1}, A_{m+2}, \ldots, A_{m+i}$. The series $C_1 s + C_2 s^2 + C_3 s^3 + \cdots$ constructed in this way has the required property.

Because the given embedding $K \to A(\sigma)$ followed by the substitution $\sigma = \epsilon_1 s + \epsilon_2 s^2 + \cdots$ carries $x$ to $\alpha + s^m$, and because $h\,dx$ is holomorphic, the resulting embedding $K \to A(s)$ carries $h$ to a series in $s$ in which no term has an exponent less than or equal to $-m$ on $s$. Otherwise stated, all exponents in the expansion of $(x - \alpha) \cdot h$ in powers of $s$ are positive. Since this expansion is found by substituting the expansion of $\sigma$ in powers of $s$ into the expansion of $(x - \alpha) \cdot h$ in powers of $\sigma$, it follows that all exponents in the expansion of $(x - \alpha) \cdot h$ in powers of $\sigma$ are positive. Let this expansion be multiplied by the expansion of $\frac{z - \delta}{x - \alpha} \cdot \frac{dx}{dz}$ in powers of $\sigma$. On the one hand, the result is $(z - \delta) \cdot h \cdot \frac{dx}{dz}$. On the other hand, if $\phi(x, z) = 0$ is the equation satisfied by $x$ and $z$, then $\phi(\alpha + \alpha^{(m)}\sigma^m + \cdots, \delta + \sigma^{\mu})$ is identically zero, so differentiation with respect to $\sigma$ gives

$$\phi_x(x, z)\left(m\alpha^{(m)}\sigma^{m-1} + \cdots\right) + \phi_z(x, z)\,\mu\sigma^{\mu - 1} = 0,$$

where $x$ and $z$ stand for their expansions as power series in $\sigma$ and the omitted terms are divisible by $\sigma^m$. Multiplication by $\sigma$ then gives

$$\phi_x(x, z)\left(m(x - \alpha) + \cdots\right) + \phi_z(x, z)\,\mu(z - \delta) = 0,$$

where the omitted terms are divisible by $\sigma^{m+1}$. Division by $\phi_x(x, z)$ (which is not zero, because $z$ is not a constant) times $x - \alpha$ gives

$$m + \cdots - \mu \cdot \frac{z - \delta}{x - \alpha} \cdot \frac{dx}{dz} = 0,$$

where the omitted terms are all divisible by $\sigma$. This equation shows that the expansion in powers of $\sigma$ of $\frac{z - \delta}{x - \alpha} \cdot \frac{dx}{dz}$ is the constant $\frac{m}{\mu}$ plus terms in $\sigma$. Therefore, $(z - \delta) \cdot h \cdot \frac{dx}{dz} = \left((x - \alpha) \cdot h\right)\left(\frac{m}{\mu} + \cdots\right)$ is a product of two series in $\sigma$, one with positive exponents and one with no negative exponents, which shows that all terms in the expansion of $(z - \delta) \cdot h \cdot \frac{dx}{dz}$ in powers of $\sigma$ have positive exponents, a conclusion that holds for any embedding $K \to A(\sigma)$ that carries $z$ to $\delta + \sigma^{\mu}$ and carries $x$ to a series with no negative exponents.

If an embedding that carries $z$ to $\delta + \sigma^{\mu}$ carries $x$ to a series with some negative exponents, it carries $u = \frac{1}{x}$ to a series in which all exponents are positive. Since $h\,\frac{dx}{du}\,du$ is holomorphic for finite $u$ by virtue of the assumption that $h\,dx$ is holomorphic, it follows that all exponents in the expansion of $(z - \delta) \cdot h\,\frac{dx}{du} \cdot \frac{du}{dz}$ in powers of $\sigma$ are positive. By the chain rule, $\frac{dx}{dz} = -\frac{1}{u^2}\frac{du}{dz} = \frac{dx}{du} \cdot \frac{du}{dz}$ when $x = \frac{1}{u}$, so it follows that all exponents in the expansion of $(z - \delta) \cdot h \cdot \frac{dx}{dz} = (z - \delta) \cdot h \cdot \frac{dx}{du} \cdot \frac{du}{dz}$ in powers of $\sigma$ are positive in this case too.

Thus, Lemma 1 implies that $h \cdot \frac{dx}{dz} \cdot dz$ is holomorphic for finite $z$. By the same token, $h \cdot \frac{dx}{dv} \cdot dv$ is holomorphic for finite $v$ for any parameter $v$ and in particular when $v = \frac{1}{z}$. Therefore $h \cdot \frac{dx}{dz} \cdot \frac{dz}{dv} \cdot dv$ is holomorphic for finite $v = \frac{1}{z}$, which completes the proof that $h \cdot \frac{dx}{dz} \cdot dz$ is holomorphic.

Corollary. The genus is a birational invariant.

The determination of the genus can be accomplished by finding holomorphic differentials, for which the following proposition is useful.
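Returning briefly to Lemma 2: its proof is a recursion that can be carried out mechanically. The following is a minimal sketch, assuming rational coefficients so that exact arithmetic with Python's `Fraction` suffices (the lemma itself allows algebraic numbers) and assuming only finitely many coefficients $A_m, A_{m+1}, \ldots$ are supplied, the rest being treated as zero. The function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def mul(p, q, n):
    """Product of two power series given as coefficient lists, truncated to degree < n."""
    r = [Fraction(0)] * n
    for i, a in enumerate(p[:n]):
        if a:
            for j, b in enumerate(q[:n - i]):
                r[i + j] += a * b
    return r

def substitute(A, m, C, n):
    """Coefficients (degree < n) of sum_k A[k] * x**(m + k), where x = sum_i C[i-1] * s**i."""
    x = ([Fraction(0)] + list(C) + [Fraction(0)] * n)[:n]
    power = [Fraction(1)] + [Fraction(0)] * (n - 1)
    for _ in range(m):
        power = mul(power, x, n)          # power = x**m
    total = [Fraction(0)] * n
    for a in A:
        total = [t + a * c for t, c in zip(total, power)]
        power = mul(power, x, n)          # next power of x
    return total

def lemma2_coefficients(A, m, C1, count):
    """Return [C_1, ..., C_count] making A_m x^m + A_{m+1} x^{m+1} + ... equal to s^m
    up to the order determined by these coefficients.
    A lists A_m, A_{m+1}, ...; C1 must satisfy A_m * C1**m == 1."""
    A = [Fraction(a) for a in A]
    C1 = Fraction(C1)
    assert A[0] * C1 ** m == 1
    C = [C1]
    pivot = m * A[0] * C1 ** (m - 1)      # coefficient of C_{i+1} in B_{m+i}
    for i in range(1, count):
        # B_{m+i} computed with C_{i+1} provisionally zero; it is linear in C_{i+1}.
        B = substitute(A, m, C, m + i + 1)
        C.append(-B[m + i] / pivot)
    return C
```

For the series $x + x^2$ (so $m = 1$, $A_1 = A_2 = 1$, $C_1 = 1$) the call `lemma2_coefficients([1, 1], 1, 1, 4)` yields the first four coefficients of the series $x = s - s^2 + 2s^3 - 5s^4 + \cdots$ that inverts $s = x + x^2$.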
An algebraic curve $\chi(x, y) = 0$ is nonsingular for finite $x$ if no pair $(\alpha, \beta)$ of algebraic numbers satisfies all three conditions $\chi(\alpha, \beta) = 0$, $\chi_x(\alpha, \beta) = 0$, and $\chi_y(\alpha, \beta) = 0$.

Proposition. If $\chi(x, y) = 0$ is nonsingular for finite $x$, then $h\,dx$ is holomorphic for finite $x$ if and only if $h \cdot \chi_y$ is integral over $x$. In other words, when $\chi(x, y) = 0$ is nonsingular for finite $x$, the differentials holomorphic for finite $x$ are those of the form $\frac{\phi(x, y)\,dx}{\chi_y}$, where $\phi(x, y)$ is integral over $x$.

Proof. First assume that $h\,dx$ is holomorphic. By the proposition of Essay 4.5, it will suffice to prove that the image of $h \cdot \chi_y$ in each embedding of $K$ in $A(s)$ that carries $x$ to $\alpha + s^m$, and carries rational numbers to themselves, is without negative exponents. When $\chi_y(\alpha, \beta) \ne 0$, $\beta$ is a simple root of $\chi(\alpha, y)$, which implies, as was shown in Essay 4.4, that $x = \alpha + s$, $y = \beta$ is an unambiguous truncated solution of $\chi(x, y) = 0$. Such a truncated solution implies an infinite series
solution $y = \beta + \beta'(x - \alpha) + \beta''(x - \alpha)^2 + \cdots$. The corresponding embedding $K \to A(s)$ does not involve fractional powers of $x - \alpha$. The assumption that $h\,dx$ is holomorphic implies that the image of $h$ in $A(s)$ contains no exponents less than or equal to $-1$, so all exponents are greater than or equal to zero. The same is true of the image of $\chi_y$ (it is a polynomial in $x$ and $y$ and is therefore integral over $x$) so the image of $h \cdot \chi_y$ under the embedding has no negative exponents, as was to be shown.

Otherwise, $\chi_x(\alpha, \beta) \ne 0$, because the curve is nonsingular for finite $x$. In this case, the polynomial $\Phi_0(s)$ in $\chi(\alpha + s, \beta + t) = \Phi_0(s) + \Phi_1(s)t + \cdots + t^n$ is divisible by $s$ but not $s^2$, so the Newton polygon algorithm leads to a "polygon" with one segment from $(0, 1)$ to a point where $j = 0$; call it $(r, 0)$. The ambiguity of the truncated solution $x = \alpha + s$, $y = \beta$ is then $r$, and the output of Newton's polygon is $r$ unambiguous truncated solutions $x = \alpha + s_1^r$, $y = \beta + \sqrt[r]{C_0} \cdot s_1$ (one solution for each of the $r$ possible values of $\sqrt[r]{C_0}$). By Lemma 2, the infinite series expansion $y - \beta = \sqrt[r]{C_0} \cdot s_1 + \beta'' s_1^2 + \cdots$ implies an infinite series expansion $s_1 = \epsilon_1(y - \beta) + \epsilon_2(y - \beta)^2 + \cdots$, whose substitution in $\sqrt[r]{C_0} \cdot s_1 + \beta'' s_1^2 + \cdots$ gives $y - \beta$ and whose substitution in the embedding $K \to A(s_1)$ therefore gives an embedding $K \to A(y - \beta)$ that carries $y$ to $\beta + (y - \beta)$. Because $h \cdot \frac{dx}{dy} \cdot dy$ is holomorphic, it follows that the image of $h \cdot \frac{dx}{dy} = -h \cdot \frac{\chi_y}{\chi_x}$ under this embedding has no exponents less than or equal to $-1$. It has no fractional exponents, so all exponents in the expansion of $h \cdot \frac{\chi_y}{\chi_x}$ in powers of $y - \beta$ are at least zero. Therefore, the same is true of its expansion in powers of $s_1$. Since $\chi_x$ is a polynomial in $x = \alpha + s_1^r$ and $y = \beta + \sqrt[r]{C_0} \cdot s_1 + \cdots$, the expansion of $h \cdot \chi_y = h \cdot \frac{\chi_y}{\chi_x} \cdot \chi_x$ in powers of $s_1$ has no terms with negative exponents, as was to be shown.

Thus, the proof that "$h\,dx$ is holomorphic" implies "$h \cdot \chi_y$ is integral over $x$" is complete. To prove, conversely, that all differentials of the form $\frac{\phi\,dx}{\chi_y}$ in which $\phi$ is integral over $x$ are holomorphic for finite $x$, it will suffice to prove that $\frac{dx}{\chi_y}$ is holomorphic for finite $x$. Certainly for any embedding $x = \alpha + s$, $y = \beta + \beta' s + \beta'' s^2 + \cdots$ for which $\chi_y(\alpha, \beta) \ne 0$ the expansion of $\frac{1}{\chi_y}$ in powers of $s$ contains no negative exponents. All other embeddings have the form $x = \alpha + s_1^r$, $y = \beta + \beta' s_1 + \beta'' s_1^2 + \cdots$, where $\beta' \ne 0$ (as was just seen) by virtue of the assumption that $\chi(x, y)$ has no singularities for finite $x$. Differentiation of $\chi(\alpha + s_1^r, \beta + \beta' s_1 + \cdots) = 0$ gives

$$\chi_x(\alpha + s_1^r, \beta + \beta' s_1 + \cdots) \cdot r s_1^{r-1} + \chi_y(\alpha + s_1^r, \beta + \beta' s_1 + \cdots) \cdot (\beta' + \cdots) = 0.$$

Thus, $\frac{1}{\chi_y} = -\frac{\beta' + \cdots}{\chi_x \cdot r s_1^{r-1}}$. Since $\chi_x(\alpha, \beta) \ne 0$, it follows that $-r + 1$ is the least exponent in the expansion of $\frac{1}{\chi_y}$ in powers of $s_1$. Since $s_1^{-r+1} = (s_1^r)^{-1 + \frac{1}{r}} = (x - \alpha)^{-1 + \frac{1}{r}}$, the desired conclusion that the principal parts of $\frac{1}{\chi_y}$ relative to this embedding $K \to A(s)$ contain no exponents that are $-1$ or less follows.

Example 1: $\chi(x, y) = y^2 + x^4 - 1$ (the elliptic curve mentioned in Essay 4.2)

Since $\chi_y = 0$ implies $y = 0$ and $\chi_x = 0$ implies $x = 0$, this curve is nonsingular for finite $x$, because $(\alpha, \beta) = (0, 0)$ does not satisfy $\beta^2 + \alpha^4 - 1 = 0$.
Therefore, the differentials holomorphic for finite $x$ are those of the form $\frac{\phi\,dx}{2y}$, where $\phi$ is integral over $x$. The substitution $x = \frac{1}{u}$, $y = \frac{v}{u^2}$ puts this curve in the form $v^2 + 1 - u^4 = 0$ and puts $dx/2y$ in the form $-\left(\frac{du}{u^2}\right)\big/\,2\left(\frac{v}{u^2}\right) = -du/2v$, so $\phi\,dx/2y$ is holomorphic if and only if $\phi$ is integral over $x$ and over $u = 1/x$, which is to say, if and only if $\phi$ is constant.

Example 2: $\chi(x, y) = y^3 + x^3 y + x$ (the Klein curve)

This curve is nonsingular for finite $x$, because $3\beta^2 + \alpha^3 = 0$ and $3\alpha^2\beta + 1 = 0$ imply that $\beta = -\frac{1}{3\alpha^2}$ and $\alpha^3 = -3\beta^2 = -\frac{1}{3\alpha^4}$, so that $\alpha^7 = -\frac{1}{3}$; therefore $\beta^3 + \alpha^3\beta + \alpha \ne 0$, because $\alpha^3\beta + \alpha = \alpha^3 \cdot \frac{-1}{3\alpha^2} + \alpha = \frac{2\alpha}{3}$ is not the negative of

$$\beta^3 = -\frac{1}{27\alpha^6} = \frac{-\alpha}{27 \cdot (-1/3)} = \frac{\alpha}{9}.$$

Since $\chi_y(x, y) = 3y^2 + x^3$, every holomorphic differential can be written $\frac{\phi\,dx}{3y^2 + x^3}$ for some $\phi$ integral over $x$. Because $1$, $y$, $y^2$ is an integral basis over $x$ (Essay 4.5), $\phi$ must be a polynomial in $x$ and $y$. Determining the holomorphic differentials on the Klein curve therefore amounts to determining the polynomials $\phi(x, y)$ in $x$ and $y$ for which $\frac{\phi\,dx}{3y^2 + x^3}$ is holomorphic. Such a differential is holomorphic for finite $x$, and the problem is to determine the conditions under which it is without poles at $x = \infty$.

The substitution $x = \frac{1}{u}$, $y = \frac{v}{u^3}$ transforms the curve into $v^3 + u^3 v + u^8 = 0$, a curve with a singularity at $(u, v) = (0, 0)$. The first step of the Newton polygon algorithm in the case in which the value of $u$ is 0 calls for setting $u = s$ and $v = 0 + t$, which leads to $s^8 + s^3 t + t^3$. The polygon is based on the points $(0, 8)$, $(1, 3)$, and $(3, 0)$, so it consists of two segments $5i + j = 8$ and $3i + 2j = 9$. The first segment furnishes an unambiguous truncated solution $u = s$, $v = -s^5$, which implies an infinite series solution, and the second furnishes the remaining two solutions in the form of the unambiguous truncated solutions $u = \sigma^2$, $v = \pm i\sigma^3$. The expression of $\frac{dx}{3y^2 + x^3}$ relative to the first of these is

$$\frac{dx}{3y^2 + x^3} = \frac{-\frac{du}{u^2}}{\frac{3v^2}{u^6} + \frac{1}{u^3}} = \frac{-s^4\,ds}{(3s^{10} + \cdots) + s^3} = (-s + \cdots)\,ds.$$

Therefore, there is no pole for this embedding if this differential is multiplied by $x = \frac{1}{s}$ or by any power of $y = \frac{v}{u^3} = -s^2 + \cdots$. The expression of $\frac{dx}{3y^2 + x^3}$ relative to the second is

$$\frac{dx}{3y^2 + x^3} = \frac{-\frac{2\,d\sigma}{\sigma^3}}{3\left(-\frac{1}{\sigma^6} + \cdots\right) + \frac{1}{\sigma^6}} = \frac{\frac{2\,d\sigma}{\sigma^3}}{\frac{2}{\sigma^6} + \cdots} = (\sigma^3 + \cdots)\,d\sigma.$$

Therefore, it remains finite for this embedding if it is multiplied by $x = \frac{1}{\sigma^2}$ or by $y = \pm\frac{i}{\sigma^3} + \cdots$ but not if it is multiplied by any polynomial of higher degree in $x$ or $y$. In conclusion, the holomorphic differentials in this case are

$$\frac{(a + bx + cy)\,dx}{3y^2 + x^3},$$

as was already found in Essay 4.6.
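The polygon step used in Example 2 can be checked mechanically. The following is a minimal sketch, not the algorithm of Essay 4.4 in full: it assumes each term of the polynomial is recorded as a point $(i, j)$, with $i$ the exponent of $t$ and $j$ the exponent of $s$, and it computes only the descending lower boundary of those points together with the equation $ai + bj = c$ of each segment. The function names are hypothetical.

```python
from math import gcd

def newton_polygon(points):
    """Vertices of the descending lower boundary of the given exponent pairs (i, j)."""
    hull = []
    for p in sorted(set(points)):
        while len(hull) >= 2:
            (i1, j1), (i2, j2) = hull[-2], hull[-1]
            cross = (i2 - i1) * (p[1] - j1) - (j2 - j1) * (p[0] - i1)
            if cross <= 0:       # hull[-1] lies on or above the chord to p: discard it
                hull.pop()
            else:
                break
        hull.append(p)
    # Keep only the part along which j strictly decreases.
    k = 1
    while k < len(hull) and hull[k][1] < hull[k - 1][1]:
        k += 1
    return hull[:k]

def segment_equations(vertices):
    """For each segment return (a, b, c) with a*i + b*j = c and gcd(a, b) = 1."""
    equations = []
    for (i1, j1), (i2, j2) in zip(vertices, vertices[1:]):
        a, b = j1 - j2, i2 - i1
        g = gcd(a, b)
        a, b = a // g, b // g
        equations.append((a, b, a * i1 + b * j1))
    return equations

# Terms of s^8 + s^3 t + t^3 as (exponent of t, exponent of s).
klein_at_infinity = [(0, 8), (1, 3), (3, 0)]
vertices = newton_polygon(klein_at_infinity)
print(vertices, segment_equations(vertices))
```

For the three points of Example 2 this reproduces the two segments $5i + j = 8$ and $3i + 2j = 9$ stated above.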
Miscellany
Essay 5.1 On the So-Called Fundamental Theorem of Algebra

Nor does this account of infinity rob the mathematicians of their study; for all that it denies is the actual existence of anything so great that you can never get to the end of it. And as a matter of fact, mathematicians never ask for or introduce an infinite magnitude; they only claim that the finite line shall be of any length they please; and it is possible to divide any magnitude whatsoever in the same proportion as the greatest magnitude.
Aristotle [3, p. 261]

Regrettably, the name "fundamental theorem of algebra" has become firmly attached to the statement that the field of complex numbers is algebraically closed, that is, the statement that a polynomial $f(x)$ in one indeterminate $x$ whose coefficients are complex numbers can be written in the form $a_0\prod_{j=1}^{n}(x - x_j)$, where $x_1, x_2, \ldots, x_n$ are complex numbers and $a_0$ is the leading coefficient of $f(x)$. From a constructive point of view, this "theorem" is false for the following reason.

A "complex number" by definition has the form $a + bi$, where $a$ and $b$ are real numbers and $i = \sqrt{-1}$. A real number is a convergent sequence of rational numbers. Thus, computations with real numbers are computations with convergent sequences, so they are computations with approximations, and it is not constructively true that a real number is either zero or nonzero. For example, the sequence of partial sums of the series $\sum_{i=1}^{\infty}\frac{n_i}{10^i}$, where $n_i = 1$ if $i$ is an even number greater than 2 that is not the sum of two primes and $n_i = 0$ for all other numbers $i$, is a well-defined real number, call it $r$, because its decimal expansion to a huge number of places is easily computable, and its computation to any number of places is a finite calculation. For as far as the computation has been carried, $r$ appears to be zero. However, the statement that $r = 0$ amounts simply to the Goldbach conjecture that every even number greater than 2 is a sum of two primes, a statement that up to the
present time no one has been able to prove or disprove.* To write $rx^2 + x$ in the form $a_0\prod(x - x_j)$ one must determine whether the Goldbach conjecture is true: If it is, then $rx^2 + x = x$, but otherwise, $rx^2 + x = rx\left(x + \frac{1}{r}\right)$. At the present time, then, no known construction finds the representation of $rx^2 + x$ in the required form.

* If one accepts Kronecker's criterion quoted in the epigraph to Essay 1.5, one must say that the definition of "positive real number" lacks a firm foundation because no method can be given for determining whether the real number $r$ is positive.

As will be shown in this essay, the theorem is true constructively when the coefficients of the given polynomial are known exactly.

Theorem. Given a polynomial $f(x)$ whose coefficients are complex rational numbers (numbers of the form $u + vi$ in which $u$ and $v$ are rational numbers and $i = \sqrt{-1}$), express it in the form $a_0\prod_{j=1}^{n}(x - x_j)$, where $x_1, x_2, \ldots, x_n$ are complex numbers and $a_0$ is its leading coefficient.

If $\bar{f}(x)$ is the complex conjugate of $f(x)$, then $f(x)\bar{f}(x)$ is a polynomial with rational coefficients, so the fundamental theorem of Essay 1.2 provides an irreducible, monic polynomial $G(y)$ with integer coefficients that splits $f(x)\bar{f}(x)(x^2 + 1)$. When the term $i$ in the coefficients of $f(x)$ is identified with one of the two square roots of $-1$ in the root field of $G(y)$, $f(x)$ becomes a polynomial with coefficients in the root field that splits into linear factors, because the polynomial $f(x)\bar{f}(x)$ of which it is a factor splits into linear factors. Therefore, $f(x) \equiv a_0\prod(x - \rho_j(y)) \bmod G(y)$, where the $\rho_j(y)$ are elements of the root field. Because the root field is isomorphic to the subfield $\mathbf{Q}[y^*]$ of the complex numbers when $y^*$ is a complex root of $G(y)$, these observations imply that the theorem to be proved is a corollary of the following simpler theorem:

Main Theorem. Given a monic, irreducible polynomial with integer coefficients, construct a complex root.

(More generally, let the field $\mathbf{Q}[i]$ of complex rational numbers be replaced by the root field of a monic, irreducible polynomial with integer coefficients. The Main Theorem implies that such a root field is isomorphic to a subfield of the complex numbers. In other words, every algebraic number field is isomorphic to a subfield of the complex numbers. However, because algebraic computations can be done exactly in a root field, such subfields of the complex numbers are very special; in them, as in $\mathbf{Q}[i]$, algebraic computation can be done exactly. If $f(x)$ is a polynomial with coefficients in such a subfield of the complex numbers, then $f(x)$ can be written as a factor of a polynomial (its norm) with rational coefficients, and the method just explained shows that $f(x)$ can be written in the form $a_0\prod(x - x_j)$, where the $x_j$ are complex numbers, but where the $x_j$ are in fact contained in some algebraic number field viewed as a subfield of the complex numbers.)
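To make the earlier example of the real number $r$ concrete, the sketch below computes its partial sums exactly. It assumes that the series defining $r$ has terms $n_i/10^i$, so that $n_i$ is the $i$th decimal digit of $r$; the function names are hypothetical. Each partial sum is a finite calculation, yet no amount of such computation decides whether $r = 0$.

```python
from fractions import Fraction

def is_prime(k):
    """Trial-division primality test, adequate for small k."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True

def goldbach_digit(i):
    """n_i: 1 if i is an even number greater than 2 that is NOT a sum of two primes, else 0."""
    if i <= 2 or i % 2 == 1:
        return 0
    for p in range(2, i // 2 + 1):
        if is_prime(p) and is_prime(i - p):
            return 0
    return 1

def partial_sum(places):
    """Exact partial sum n_1/10 + n_2/10**2 + ... + n_places/10**places."""
    total = Fraction(0)
    for i in range(1, places + 1):
        total += Fraction(goldbach_digit(i), 10 ** i)
    return total

print(partial_sum(200))   # every digit examined so far is 0
```

A nonzero digit would be a counterexample to the Goldbach conjecture, which is why the factorization of $rx^2 + x$ cannot currently be written down.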
Lemma. Given a monic, irreducible polynomial $f(x)$ with integer coefficients, and given a positive rational number $\epsilon$, find a rational complex number $x_0$ for which $|f(x_0)| < \epsilon$.

Proof. Let $n$ be the degree of the given polynomial $f(x)$. Because $\frac{f'(x)}{f(x)} - \frac{n}{x}$ can be written as a quotient of polynomials in which the numerator has degree $n - 1$ and the denominator has degree $n + 1$, $\frac{f'(x)}{f(x)} = \frac{n}{x} + O(|x|^{-2})$ as $|x| \to \infty$; that is, there is a constant $K$ for which $\left|\frac{f'(x)}{f(x)} - \frac{n}{x}\right| < \frac{K}{|x|^2}$ for all sufficiently large $|x|$. It follows that if $R$ is a positive integer and $S$ is the square in the complex plane whose vertices are $\pm R \pm Ri$, then

$$\int_{\partial S}\frac{f'(x)}{f(x)}\,dx = \int_{\partial S}\frac{n}{x}\,dx + O\!\left(\frac{1}{R}\right)$$

as $R \to \infty$. Since $\int_{\partial S}\frac{n}{x}\,dx = 2\pi i \cdot n$ for all $R$, it follows that there is a positive integer $R$, call it $R_0$, for which $\int_{\partial S_0}\frac{f'(x)}{f(x)}\,dx \ne 0$, where $S_0$ is the square in the complex plane whose vertices are $\pm R_0 \pm R_0 i$.

Because $\frac{f(x) - f(y)}{x - y}$ is a polynomial in $x$ and $y$ with integer coefficients, there is an integer $M$ such that $\left|\frac{f(x) - f(y)}{x - y}\right|$ is at most $M$ on the square $S_0$. For such an integer, $|f(x) - f(y)| \le M|x - y|$ for all $x$ and $y$ in $S_0$. Let $T$ be an integer large enough that $\frac{\sqrt{2}}{T}M < \frac{1}{2}\epsilon$, and let $S_0$ be partitioned into subsquares that are $\frac{1}{T}$ on a side. If $x$ and $y$ are complex numbers in the same subsquare, then $|f(x) - f(y)| \le \frac{\sqrt{2}}{T}M < \frac{1}{2}\epsilon$. Thus, if $|f(x)| \ge \epsilon$ at the midpoint of a subsquare, then $f(x)$ is nonzero throughout the subsquare, which means that $\log f(x)$ is defined as a function of $x$ throughout the subsquare; since the derivative of $\log f(x)$ is $\frac{f'(x)}{f(x)}$, it follows that the integral of $\frac{f'(x)}{f(x)}\,dx$ around the boundary of each such subsquare is zero. If $|f(x)| \ge \epsilon$ were true for the midpoints of all the subsquares, then $\int_{\partial S_0}\frac{f'(x)}{f(x)}\,dx$, which is the sum of the integrals around the boundaries of all the subsquares, would be zero, contrary to the choice of $S_0$. Thus, the finite set of midpoints of the subsquares contains at least one complex rational number $x_0$ for which $|f(x_0)| < \epsilon$, as required.

(Note that the coordinates of the midpoints are rational, so $|f(x)|^2$ at the midpoints can be computed exactly and compared to the rational number $\epsilon^2$. Obviously, the amount of computation required to find an $x_0$ by this construction could be huge even for a rather simple $f(x)$ and a moderate $\epsilon$. The method described here is not a practical way of finding an $x_0$, which in most cases would be easily accomplished by simple bisection methods to get a rough estimate, followed by Newton's method, but a way that can be succinctly described and is a finite calculation.)

Proof of the Main Theorem. Let $f(x)$ be the given monic, irreducible polynomial with integer coefficients. Since $|f(x)| \to \infty$ as $|x| \to \infty$, a positive integer $N$ can be chosen for which $|x| \ge N$ implies $|f(x)| > 1$.
By the Euclidean algorithm, there are polynomials a{x) and P{x) with rational coefficients for which a{x)f{x) + [3{x)f'{x) divides b o t h f{x) and f'{x). Because f{x) is irreducible and the degree of f'{x) is less t h a n t h a t of / ( x ) , a{x)f{x) + j3{x)f'{x) must be a nonzero rational number, so it can be assumed without loss of generality to be 1. Let A and B be positive integers for which \2a{x)\ < A and \2(3{x)\ < B throughout the disk \x\ < A^ in the complex plane. Finally, let C be a positive integer t h a t is an upper b o u n d for the modulus of t h e polynomial ^ IZ when x and y are complex numbers whose moduli are less t h a n N ^1. Use the lemma to find a rational complex number XQ for which | / ( x o ) | is less t h a n b o t h ^ and J^JQ ^ ^ ^ define a sequence x i , X2^ . . . , x^, . . . by the formula
This sequence converges, as is proved by t h e estimate (1)
|Xn+l - ^ n | < ^ l ^ r i - ^ n - l |
( f o m = 1, 2, . . . )
which will be proved inductively using the estimate (2) \f'{x)
— f'{xQ)\
< —— 2B
(for X on the line segment from Xn-i
to Xn)-
Because $2 = 2\alpha(x_0)f(x_0) + 2\beta(x_0)f'(x_0)$ and because $|f(x_0)| < \frac{1}{A} \le 1$, the estimate $|x_0| \le N$ holds, so $2 = |2\alpha(x_0)f(x_0) + 2\beta(x_0)f'(x_0)| < A \cdot \frac{1}{A} + B \cdot |f'(x_0)|$, which implies $\frac{1}{|f'(x_0)|} < B$. Thus, because $x_1 - x_0 = -\frac{f(x_0)}{f'(x_0)}$, the estimate $|x_1 - x_0| < B \cdot \frac{1}{4B^2C} = \frac{1}{4BC}$ holds. In particular, because $|x_0| \le N$, the line segment from $x_0$ to $x_1$ lies inside the disk for which the estimate $|f'(x) - f'(x_0)| \le C \cdot |x - x_0|$ applies, and $|f'(x) - f'(x_0)|$ on this segment is at most $C \cdot \frac{1}{4BC} = \frac{1}{4B}$, which implies (2) in the case $n = 1$. When (2) is used to estimate the modulus of

$$x_2 - x_1 = x_1 - x_0 - \frac{f(x_1) - f(x_0)}{f'(x_0)} = \frac{1}{f'(x_0)} \int_{x_0}^{x_1} \bigl(f'(x_0) - f'(x)\bigr)\,dx,$$

one obtains $|x_2 - x_1| < B\,|x_1 - x_0| \cdot \frac{1}{2B} = \frac{1}{2}|x_1 - x_0|$, which proves (1) in the case $n = 1$.

Now if (1) holds for all numbers less than $n$, then

$$|x_n - x_0| \le |x_n - x_{n-1}| + |x_{n-1} - x_{n-2}| + \cdots + |x_1 - x_0| \le \left(\frac{1}{2^{n-1}} + \frac{1}{2^{n-2}} + \cdots + 1\right)|x_1 - x_0| < 2 \cdot \frac{1}{4BC} = \frac{1}{2BC}.$$

It follows that all values of $x$ on the segment from $x_{n-1}$ to $x_n$ satisfy $|x| < N + 1$, so $|f'(x_0) - f'(x)| < C \cdot \frac{1}{2BC} = \frac{1}{2B}$, which proves the estimate (2) for $n$. When this estimate is applied to the formula

$$x_{n+1} - x_n = \frac{1}{f'(x_0)} \int_{x_{n-1}}^{x_n} \bigl(f'(x_0) - f'(x)\bigr)\,dx,$$
it gives (1) for this $n$ (the modulus of the constant in front is less than $B$ and the modulus of the integrand is less than $\frac{1}{2B}$). By (1), the sequence of complex rational numbers $x_0, x_1, x_2, \ldots$ satisfies the Cauchy criterion, which is to say that it defines a complex number. Moreover, $|x_{n+1} - x_n| = \frac{|f(x_n)|}{|f'(x_0)|}$, so $|f(x_n)| = |f'(x_0)|\,|x_{n+1} - x_n| < \frac{1}{2^n}|f'(x_0)|\,|x_1 - x_0|$ approaches 0 as $n \to \infty$, which is to say that the complex number defined by the sequence is a root of $f$.

Alternative Proof (See Gauss, [30]). Let $f(x)$ be a monic, irreducible polynomial with integer coefficients. If $\deg f$ is odd, a simple bisection argument proves that $f$ must have not just a complex root but even a real root. (Since $f(N)$ is positive and $f(-N)$ is negative for all sufficiently large $N$, one can construct an interval on which $f(x)$ changes sign. Repeated bisections produce nested intervals that become arbitrarily short on which $f$ changes sign; each bisection point is rational, so the value of $f$ there can be determined exactly. The sequence of upper ends of these intervals converges, and its limit is a root of $f$.) The idea of the proof is to use the quadratic formula to reduce the problem of finding a complex root of a given polynomial whose degree is even to the problem of finding a complex root of a polynomial whose degree is divisible fewer times by 2. Specifically, given a monic, irreducible $f$ with integer coefficients whose degree is divisible $\nu$ times by 2, an auxiliary monic, irreducible polynomial with integer coefficients will be constructed with two properties:
(1) Its degree is divisible fewer than $\nu$ times by 2.
(2) A complex root of the original polynomial can be constructed if a complex root of the auxiliary polynomial is given.
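The bisection argument for odd degree can be sketched in exact rational arithmetic. The following is only an illustration under stated assumptions: the polynomial $x^3 - 2$, the starting interval $[-2, 2]$, and the function names `poly_eval` and `bisect_sign_change` are choices made here, not part of the essay.

```python
from fractions import Fraction

def poly_eval(coeffs, x):
    """Evaluate a polynomial by Horner's rule; coeffs run from the leading term down."""
    value = Fraction(0)
    for c in coeffs:
        value = value * x + c
    return value

def bisect_sign_change(coeffs, lo, hi, steps):
    """Halve [lo, hi] repeatedly, keeping f(lo) < 0 < f(hi) at every step."""
    assert poly_eval(coeffs, lo) < 0 < poly_eval(coeffs, hi)
    for _ in range(steps):
        mid = (lo + hi) / 2          # rational, so f(mid) is computed exactly
        f_mid = poly_eval(coeffs, mid)
        if f_mid == 0:               # cannot occur for an irreducible f of degree > 1
            return mid, mid
        if f_mid < 0:
            lo = mid
        else:
            hi = mid
    return lo, hi

# Hypothetical example: x^3 - 2 is monic and irreducible of odd degree.
coeffs = [1, 0, 0, -2]
lo, hi = bisect_sign_change(coeffs, Fraction(-2), Fraction(2), 20)
```

Every quantity in the loop is a rational number, so the sign test at each midpoint is an exact finite computation; the real root is the number defined by the nested intervals.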
Let the given monic, irreducible polynomial with integer coefficients be written in the form $f(x) = \prod_{j=1}^{n}(x - \rho_j)$ and define $F(u,v) = \prod_{j<k}\bigl(u - (\rho_j + \rho_k)v + \rho_j\rho_k\bigr)$, where the product is over all pairs $(j,k)$ of indices for which $1 \le j < k \le n$. Here the $\rho_i$ are elements of some Galois algebraic number field, so the coefficients of $F$, being symmetric in the $\rho_i$, are integers. The auxiliary polynomial will be $F(u, V)$, where $V$ is an integer for which $F(u, V)$ (a polynomial in $u$ with integer coefficients) is relatively prime to its derivative in the sense that there are polynomials $\alpha(u)$ and $\beta(u)$ with rational coefficients for which $\alpha(u)F(u, V) + \beta(u)\frac{\partial F}{\partial u}(u, V) = 1$.

The expression of $f(x)$ in the form $\prod(x - \rho_j)$ of course requires the construction of a splitting field for $f(x)$, but this construction is needed only to provide the rationale for the following construction of $F(u, v)$ using computations with symmetric polynomials (Essay 2.4). According to its definition, $F(u,v)$ is the polynomial in $u$ in which the coefficient of $u^{\binom{n}{2}-i}$ is $(-1)^i$ times the $i$th elementary symmetric polynomial in the $\binom{n}{2}$ expressions $(\rho_j + \rho_k)v - \rho_j\rho_k$, where $\rho_j$ and $\rho_k \ne \rho_j$ are in the splitting field. These are polynomials in $v$ whose coefficients are symmetric polynomials in the $\rho_j$, so $F(u, v)$ is a polynomial in $u$ and $v$ whose coefficients are symmetric polynomials in the $\rho_j$. Since any symmetric polynomial is a polynomial with integer coefficients in the elementary symmetric polynomials, and since the $i$th elementary symmetric polynomial in $\rho_1, \rho_2, \ldots, \rho_n$ is $(-1)^i$ times the coefficient of $x^{n-i}$ in $f(x)$, $F(u,v)$ is expressed in this way as a polynomial in two indeterminates with integer coefficients without using the splitting field in the calculation, except as a rationale.

That an integer $V$ can be chosen in such a way that $F(u, V)$ is relatively prime to its derivative in the sense described can be proved as follows: Again, a splitting field for $f(x)$ is needed to justify the proof but is not needed for the calculations that constitute the proof. For any integer $V$, the Euclidean algorithm for polynomials with coefficients in $\mathbf{Q}$ can be applied to $F(u, V)$ and $\frac{\partial F}{\partial u}(u, V)$ to find a nonzero polynomial of the form $A(u) = \alpha(u)F(u, V) + \beta(u)\frac{\partial F}{\partial u}(u, V)$ that divides both $F(u, V)$ and $\frac{\partial F}{\partial u}(u, V)$. An integer $V$ is to be found for which this "greatest common divisor" $A(u)$ has degree 0. Of course the computation of $A(u)$ involves only rational numbers, but it can be regarded as a calculation with polynomials whose coefficients are in a splitting field for $f(x)$. When it is regarded in this way, $A(u)$ can fail to be of degree 0 only if it is divisible by at least one of the linear factors $u - (\rho_j + \rho_k)V + \rho_j\rho_k$ of $F(u, V)$ with coefficients in that field. Thus, it can fail to be of degree 0 only if one of the roots $(\rho_j + \rho_k)V - \rho_j\rho_k$ of $F(u, V)$ in this field is also a root of $\frac{\partial F}{\partial u}(u, V)$. But the formula for the derivative of a product implies that $\frac{\partial F}{\partial u}\bigl((\rho_j + \rho_k)V - \rho_j\rho_k, V\bigr)$ is equal to the product of the differences between this root and the other roots of $F(u, V)$, so it is a factor of the value when $v = V$ of the polynomial that is the product of the $\binom{n}{2}\bigl(\binom{n}{2} - 1\bigr)$ linear polynomials in $v$ with coefficients in the splitting field of $f(x)$ that are differences of distinct roots of $F(u, v)$ as a polynomial in $u$.
This product is a polynomial in $v$ of degree at most $\binom{n}{2}\bigl(\binom{n}{2} - 1\bigr)$, which is nonzero, because the factors are all nonzero (if $\rho_j + \rho_k = \rho_m + \rho_n$ and $\rho_j\rho_k = \rho_m\rho_n$, then $(x - \rho_j)(x - \rho_k) = (x - \rho_m)(x - \rho_n)$, which implies when both $j < k$ and $m < n$ that $j = m$ and $k = n$, because $f$ is irreducible and therefore has distinct roots), so it is a nonzero polynomial in $v$ with rational coefficients. The number of roots of such a polynomial is at most equal to its degree, so there is certainly an integer $V$ that is not a root.

If the degree $n$ of $f(x)$ is even (so that $\nu > 0$), the degree $\binom{n}{2}$ of $F(u, V)$ is divisible exactly $\nu - 1$ times by 2. Therefore, at least one of its irreducible factors has a degree that is divisible fewer than $\nu$ times by 2. The Main Theorem will therefore be proved if $F(u, V)$ is shown to have property (2) stated above, namely, that a complex root of $F(u, V)$ (a complex root of an irreducible factor of $F(u, V)$ is of course a root of $F(u, V)$) enables one to construct a complex root of $f(x)$.

Since $F(u, v)$ has the special form $F(u, v) = \prod_i(\alpha_i + \beta_i u + \gamma_i v)$, the polynomial in three variables $F\bigl(u + \frac{\partial F}{\partial v}w,\ v - \frac{\partial F}{\partial u}w\bigr)$ can be written in the form $F(u,v)\,\phi(u, v, w)$. In fact, if one defines $Q_i(u,v)$ to be $\frac{F(u,v)}{\alpha_i + \beta_i u + \gamma_i v}$, then $F(u,v) = (\alpha_i + \beta_i u + \gamma_i v)\,Q_i(u,v)$, so $\frac{\partial F}{\partial u} = \beta_i Q_i + (\alpha_i + \beta_i u + \gamma_i v)\frac{\partial Q_i}{\partial u}$ and $\frac{\partial F}{\partial v} = \gamma_i Q_i + (\alpha_i + \beta_i u + \gamma_i v)\frac{\partial Q_i}{\partial v}$, and a typical factor of $F\bigl(u + \frac{\partial F}{\partial v}w,\ v - \frac{\partial F}{\partial u}w\bigr)$ is

$$\alpha_i + \beta_i\Bigl(u + \frac{\partial F}{\partial v}w\Bigr) + \gamma_i\Bigl(v - \frac{\partial F}{\partial u}w\Bigr) = \alpha_i + \beta_i u + \gamma_i v + w\Bigl(\beta_i\frac{\partial F}{\partial v} - \gamma_i\frac{\partial F}{\partial u}\Bigr).$$

Since $\beta_i\frac{\partial F}{\partial v} - \gamma_i\frac{\partial F}{\partial u} = (\alpha_i + \beta_i u + \gamma_i v)\bigl(\beta_i\frac{\partial Q_i}{\partial v} - \gamma_i\frac{\partial Q_i}{\partial u}\bigr)$, this factor is divisible by $\alpha_i + \beta_i u + \gamma_i v$, and the entire product is divisible by $F(u, v)$. (Explicitly, this calculation proves that the quotient $\phi(u, v, w)$ is $\prod_i\bigl(1 + w\bigl(\beta_i\frac{\partial Q_i}{\partial v} - \gamma_i\frac{\partial Q_i}{\partial u}\bigr)\bigr)$.)

In the case of the auxiliary polynomial $F(u, V)$ constructed for a given $f(x)$ as above, it follows that if $U$ is a complex root of $F(u, V)$ and if $U'$ and $V'$ are the complex values of $\frac{\partial F}{\partial u}$ and $\frac{\partial F}{\partial v}$, respectively, when $(u, v) = (U, V)$, then $F(U + V'w,\ V - U'w)$, a polynomial in $w$ with complex coefficients, is identically zero. Since $U' \ne 0$ by the construction of $V$, one can set $w = \frac{V - x}{U'}$ in this identity to find that $F\bigl(U + V'\frac{V - x}{U'},\ x\bigr)$, a polynomial in $x$ with complex coefficients, is identically zero. The quadratic formula can be applied to the equation $U + V'\frac{V - x}{U'} = x^2$ to find a complex root, call it $X$. This complex number is a root of $f(x)$, because

$$0 = F(X^2, X) = \prod_{j<k}\bigl(X^2 - (\rho_j + \rho_k)X + \rho_j\rho_k\bigr) = \prod_{j<k}(X - \rho_j)(X - \rho_k) = f(X)^{n-1}.$$
Thus, if every monic, irreducible polynomial with integer coefficients whose degree is divisible fewer than $\nu$ times by 2 has a complex root, the same is true of every monic, irreducible polynomial whose degree is divisible exactly $\nu$ times by 2, and the proof is complete.

Both proofs show clearly that the algebraic issues involved in the theorem are handled by the theorem of Essay 1.2 and have nothing to do with complex numbers. The complex numbers enter only because every algebraic number field (which is to say, in the terminology of Essay 2.2, every algebraic field of transcendence degree 0) is isomorphic to a subfield of the complex numbers. However, from a constructive point of view, to regard the field elements as complex numbers instead of algebraic quantities is a mistake, because it replaces quantities for which exact computation is possible with quantities for which computation involves limits. In short, the so-called fundamental theorem of algebra results when the "Main Theorem" above is used to debase the meaning of the theorem of Essay 1.2.
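The identity $F(X^2, X) = f(X)^{n-1}$ that closes the alternative proof is purely algebraic in the roots, so it can be checked exactly on a toy case. The sketch below assumes integer roots (so the polynomial is reducible and is not one to which the theorem applies); the root list and the helper names `f_value` and `F_value` are hypothetical and serve only to test the identity.

```python
from fractions import Fraction
from itertools import combinations

def f_value(roots, x):
    """Value at x of the monic polynomial with the given roots."""
    value = 1
    for r in roots:
        value *= (x - r)
    return value

def F_value(roots, u, v):
    """Value of F(u, v), the product over j < k of (u - (r_j + r_k) v + r_j r_k)."""
    value = 1
    for rj, rk in combinations(roots, 2):
        value *= (u - (rj + rk) * v + rj * rk)
    return value

# Assumed toy data: integer roots chosen only so that arithmetic is exact.
roots = [1, 2, 3, 5]
n = len(roots)
samples = [Fraction(7, 3), -4, 10]
identity_holds = all(
    F_value(roots, x * x, x) == f_value(roots, x) ** (n - 1) for x in samples
)
```

Each root occurs in $n - 1$ of the pairs $j < k$, which is why the exponent $n - 1$ appears.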
Essay 5.2 Proof by Contradiction and the Sylow Theorems

The widely held belief that "proofs by contradiction" are "not constructive" is mistaken. One can constructively prove that a certain construction is impossible by proving that if it were possible, then another construction that is clearly impossible would also be possible. Perhaps the oldest such proof is the proof that $\sqrt{2}$ is irrational: Let a unit of length be fixed, and suppose an isosceles right triangle $ABC$ could be given whose sides $AB$ and $BC$ and hypotenuse $AC$ had whole number lengths. One could then construct a smaller isosceles right triangle with the same property in the following way. Let $D$ be the point on the hypotenuse $AC$ for which the length $AD$ is equal to the length $AB$ of the sides of $ABC$. Let $E$ be the point where the perpendicular to the hypotenuse $AC$ through the point $D$ intersects the side $BC$. Then $CDE$ would be an isosceles right triangle smaller than $ABC$ whose sides and hypotenuse had whole number lengths, as can be seen in the following way: The side $CD$ of $CDE$ is the difference of the lengths $AC$ and $AD = AB$, both of which would be whole number lengths, so it would be a whole number length. Moreover, the new triangle $CDE$ is an isosceles right triangle because angle $DCE$ is angle $ACB$ and angle $CDE$ is right, so the new triangle is similar to the original one. Finally, when $AE$ is joined, the right triangles $ADE$
and $ABE$ are congruent because the hypotenuse and one side are equal, so the lengths $DE$ and $EB$ are equal, and the hypotenuse $CE$, being the difference of the lengths $CB$ and $EB = ED = CD$, would be a difference of whole number lengths and would therefore be a whole number length. Thus, if there were such an isosceles right triangle, one would be able to construct an infinite decreasing sequence of positive whole numbers, for example, the whole number lengths of the hypotenuses of the sequence of isosceles right triangles constructed in this way. Since there can be no such sequence, there can be no such isosceles right triangle. By the Pythagorean theorem, such an isosceles right triangle is the same as a pair of whole numbers $s$ and $d$ for which $d^2 = 2s^2$, so another way to state the same conclusion is to say that no rational number $d/s$ can have 2 as its square.

The main Sylow theorem, Theorem 1 below, can be proved by the same method, which Fermat called the method of infinite descent.

Theorem 1. If $d$ divides the order $|G|$ of a finite group $G$, and if $d$ is a power of a prime number, then $G$ has a subgroup of order $d$.

Lemma. If the order of an abelian group is divisible by a prime number $p$, then the group contains an element of order $p$.

Note that the lemma is a very special case of Theorem 1.

Proof. Suppose a finite abelian group $G$ is given whose order is divisible by a prime $p$ but that contains no element of order $p$. It will be shown that one would then be able to construct a smaller group of the same type. Choose an element $a$ of $G$ that is not the identity ($|G|$ is divisible by $p > 1$, so $G$ contains elements other than 1) and let $q > 1$ be the order of $a$. If $p$ divided $q$, then $(a^{q/p})^p = 1$ would imply, since $G$ contains no element of order $p$, that $a^{q/p} = 1$, contrary to the definition of $q$. Therefore, $p$ does not divide $q$. Because $p$ is prime, $p$ and $q$ are relatively prime, so the Euclidean algorithm implies that some multiple of $q$ is one more than a multiple of $p$, say $uq = vp + 1$.
Let $C$ be the subgroup of $G$ generated by $a$, and let $G'$ be the quotient group $G/C$. (Since $G$ is abelian, $C$ is a normal subgroup.) Let $b$ be an element of $G$ that represents an element of $G'$ whose $p$th power is 1. Then $b^p = a^i$ for some $i$, and $(a^{iv}b)^p = a^{ivp+i} = a^{iuq} = 1^{iu} = 1$. Since $G$ has no elements of order $p$, it follows that $a^{iv}b = 1$. Therefore, $b$ is a power of $a$, which means that it represents the identity element of $G'$. Therefore, $G'$ contains no element of order $p$. The order $|G'|$ is $|G|/q$, which is divisible by $p$ because $|G|$ is divisible by $p$ and $q$ is not. Thus, $G'$, like $G$, is abelian and has order divisible by $p$ but contains no element of order $p$. Since $|G'| < |G|$, the impossibility of such a group $G$ follows from the observation that the existence of such a $G$ would imply the existence of an infinite sequence of positive integers $|G| > |G'| > |G''| > \cdots$, which is impossible.
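The descent step in the $\sqrt{2}$ argument can be written out in numbers. In the triangle construction, a triangle with side $s$ and hypotenuse $d$ yields one with side $CD = d - s$ and hypotenuse $CE = s - (d - s) = 2s - d$. The sketch below, whose function names `descend` and `defect` are labels chosen here, checks on a range of integers that this step negates $d^2 - 2s^2$ (so it preserves $d^2 = 2s^2$) and strictly shrinks the side.

```python
def descend(s, d):
    """Map (side, hypotenuse) of triangle ABC to those of the smaller triangle CDE."""
    return d - s, 2 * s - d

def defect(s, d):
    """d^2 - 2 s^2, which is zero exactly when (s, d) is an isosceles right triangle."""
    return d * d - 2 * s * s

def check_descent(limit):
    """Check the descent step for all whole numbers s < d < 2s with s < limit."""
    for s in range(1, limit):
        for d in range(s + 1, 2 * s):
            s2, d2 = descend(s, d)
            if defect(s2, d2) != -defect(s, d):
                return False
            if not (0 < s2 < s and d2 > 0):
                return False
    return True

descent_ok = check_descent(60)
```

The restriction $s < d < 2s$ is the range in which a hypotenuse could lie; any pair with zero defect would therefore generate the impossible infinite decreasing sequence.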
Proof of Theorem 1. As in the proof of the lemma, a counterexample will be shown to imply a smaller counterexample. Thus, suppose $G$ is a counterexample, which is to say that $G$ is a finite group for which there is a power $p^k$ of a prime $p$ that divides $|G|$ but that is not the order of any subgroup of $G$. It is to be shown that from such a group one can construct a smaller such group. Let $Z$ be the center of $G$.

If $p$ divides $|Z|$, a smaller counterexample can be constructed as follows: In this case, the lemma implies (because $Z$ is abelian) that $Z$ contains an element of order $p$, call it $a$. Let $C$ be the subgroup generated by $a$. Since $G$ has no subgroup of order $p^k$ and $C$ has order $p$, $k$ must be at least 2. Since $C$ is a normal subgroup, $G' = G/C$ is a group. The order of $G'$ is $|G|/p$, so $p^{k-1}$ divides $|G'|$. If $H'$ is a subgroup of $G'$ of order $m$, then the elements of $G$ that represent elements of $H'$ form a subgroup of $G$ of order $mp$. Therefore, since $G$ has no subgroup of order $p^k$, $G'$ has no subgroup of order $p^{k-1}$, which shows that $G'$ is a smaller counterexample than $G$.

If $p$ does not divide $|Z|$, a smaller counterexample can be constructed even more easily. Since $G$ is the union of $Z$ and all the conjugacy classes of $G$ that contain more than one element ($Z$ is the union of the conjugacy classes that contain one element), since $|G|$ is divisible by $p$, and since $|Z|$ is not divisible by $p$, the number of elements in at least one conjugacy class with more than one element must not be divisible by $p$. If $g$ is in such a conjugacy class, then, because the number of elements in the conjugacy class of $g$ is $|G|/|Z(g)|$, where $Z(g)$ is the centralizer of $g$, $|Z(g)| < |G|$ and $p$ divides $|Z(g)|$ as many times as it divides $|G|$. Thus, $p^k$ divides $|Z(g)|$ and $Z(g)$ has no subgroup of order $p^k$ (because $G$ has no such subgroup), so $Z(g)$ is a smaller counterexample. Thus, Theorem 1 follows by the principle of infinite descent.
For the sake of completeness, the other Sylow theorems will be deduced from Theorem 1. Theorem 1 shows that if $p$ is a prime divisor of $|G|$, then $G$ has a subgroup whose order is the largest power of $p$ that divides $|G|$. Such a subgroup is called a Sylow $p$-subgroup of $G$.

Theorem 2. If $H$ is a Sylow $p$-subgroup of a finite group $G$, and if $K$ is a subgroup of $G$ whose order is a power of $p$, then some conjugate of $K$ is contained in $H$. In particular, any two Sylow $p$-subgroups of $G$ are conjugate.

Theorem 3. Let $s$ be the number of Sylow $p$-subgroups of $G$. Then $s$ divides $|G|$ and $s \equiv 1 \bmod p$.
(Theorems 2 and 3 are true for primes that do not divide the order of $G$ when one adopts the natural convention that in this case the only Sylow $p$-subgroup is $\{1\}$. Similarly, Theorem 1 is trivially true for the zeroth power 1 of any prime.)

Proof of Theorem 2. The subgroup $K$ acts on the left cosets of $H$ in $G$ by left multiplication, and this action partitions the left cosets into orbits. (To
say that $g$ and $g'$ represent cosets in the same orbit means that $kg$ is in the coset represented by $g'$ for some $k$ in $K$, or to put it another way, $(kg)^{-1}g'$ is in $H$ for some $k$ in $K$.) The number of elements in any orbit divides the order of $K$, which is a power of $p$, so the number of elements in any orbit is $p^i$ for some $i \ge 0$. The sum over all orbits of these numbers $p^i$ is the number of left cosets of $H$ in $G$, which, because $H$ is a Sylow $p$-subgroup, is not divisible by $p$. Therefore, $p^i = 1$ for at least one orbit. In other words, for at least one $g$ in $G$, the coset represented by $kg$ is the same as the coset represented by $g$ for all $k$ in $K$. In short, $g^{-1}kg$ is in $H$ for all $k$ in $K$. Since $g^{-1}Kg$ is a subgroup conjugate to $K$, the theorem follows.

Proof of Theorem 3. Let $G$ act on the $s$ Sylow $p$-subgroups by conjugation. By Theorem 2, this action is transitive: There is just one orbit of size $s$. Therefore, $s$ divides $|G|$. Let $H$ be a Sylow $p$-subgroup of $G$, and let $H$ act on the $s$ Sylow $p$-subgroups by conjugation. This action partitions these $s$ subgroups into orbits. Suppose the number of elements in the orbit of $H'$ is 1. Then $hH'h^{-1} = H'$ for all $h$ in $H$. In other words, the normalizer of $H'$ in $G$, call it $N$, contains $H$. Since $N$ also contains $H'$ and both $H$ and $H'$ are Sylow $p$-subgroups of $N$ ($p$ can divide $|N|$ no more times than it divides $|G|$), Theorem 2 implies that $H$ and $H'$ are conjugate in $N$, which is to say that $nH'n^{-1} = H$ for some $n$ in $N$. But $nH'n^{-1} = H'$ by the definition of $N$. Therefore, $H' = H$, which shows that only one orbit, the orbit of $H$, consists of a single element. The number of elements in any orbit divides the order of $H$ and is therefore a power of $p$. Therefore, all orbits other than the one with 1 element have $p^i$ elements, where $i \ge 1$, from which $s \equiv 1 \bmod p$ follows.
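Theorem 3 can be checked by brute force in small symmetric groups. The sketch below is an illustration under a simplifying assumption: it only handles primes $p$ that divide the group order exactly once, so that every Sylow $p$-subgroup has order $p$ and is cyclic. The function names are chosen here for the example.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation 'apply q, then p', with permutations as tuples of images."""
    return tuple(p[q[i]] for i in range(len(q)))

def cyclic_subgroup(g):
    """Return the set of all powers of g."""
    identity = tuple(range(len(g)))
    elems = {identity}
    x = g
    while x != identity:
        elems.add(x)
        x = compose(g, x)
    return frozenset(elems)

def count_subgroups_of_prime_order(n, p):
    """Return (|S_n|, number of subgroups of prime order p in the symmetric group S_n)."""
    group = list(permutations(range(n)))
    subgroups = {cyclic_subgroup(g) for g in group}   # every group of prime order is cyclic
    return len(group), sum(1 for h in subgroups if len(h) == p)

# Assumed example: 3 divides |S_4| = 24 exactly once, so Sylow 3-subgroups have order 3.
order, s = count_subgroups_of_prime_order(4, 3)
```

With these values one can confirm directly that `order % s == 0` and `s % 3 == 1`, as the theorem requires.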
Essay 5.3 Overview of 'Linear Algebra'

So ist es nicht erstaunlich, daß ein großer Teil der modernen algebraischen Lehrbücher sich der abstrakten Richtung angeschlossen hat, welche im Bereich der Forschung so große Erfolge zu verzeichnen hatte. Jedoch mehr als einmal hatte ich Gelegenheit zu beobachten, daß dies im Bereich der Lehre nicht durchweg der Fall ist. (It is therefore not surprising that a great many modern textbooks have followed the abstract direction that has registered such great successes in the realm of research. However, I have had more than one opportunity to observe that in the realm of teaching this is not invariably the case.)
N. Chebotarev [8, Author's preface to the translation]

Some years ago, Sheldon Axler published a book with the audacious title Linear Algebra Done Right [4]. I was probably more struck by the audacity of his title than most readers, because only a few years earlier I had published my own book called Linear Algebra, in which the subject had in fact been done right, but I had never thought to say so in the title. Of course, Axler's idea of doing it right turned out to have nothing to do with mine.

Without doubt the most attractive quality of mathematics is its apparent lack of subjectivity. "It must be easy for you mathematicians to grade papers," my friends often tell me, "because in mathematics there's only one right answer." In mathematics it can even happen that the student is right and the teacher wrong, and the teacher can be forced to admit it (usually, we hope, cheerfully). The other side of this pleasant coin is that mathematics attracts people who have a great need for certainty and encourages them to develop into rigidly dogmatic thinkers. The charge is made against advocates of constructive mathematics (it was made against Kronecker, against Brouwer, against Bishop) that they are dogmatists who implacably advocate unreasonably extreme views.
But what distinguishes them from their accusers is neither the extremity of their views nor the tenacity with which they hold them, but the mere fact that their views differ from those of their accusers. The feeling on both sides too often is, "I am not convinced by your arguments because your arguments are unconvincing; you are not convinced by my arguments because you are dogmatic." Of course mathematicians feel that mathematics is pure reason and therefore immune to such controversy. But there are plenty of controversies in mathematics. How else can Axler's and my difference regarding linear algebra be described? His choice of title is, I assume, intended as a joke, just as I am joking when I say that my linear algebra book had already done it right. But, in both cases, not really. And I expect that if you ask the first mathematician who comes along which of us is right, the reply will be that both are wrong, and the right way to do linear algebra is ....

So, having established that it is a mere matter of opinion, let me explain, if not why I am right, at least why my opinion differs from Axler's. His main
goal is to avoid determinants, for the reason that the formula for determinants is difficult to motivate, contrary to the modern style of mathematics and, as Axler shows, avoidable. I agree with him that the formula for determinants is daunting, but I believe that determinants, like a boulder in the path, need to be dealt with, not avoided. They are central to linear algebra, specifically to the solution of systems of linear equations, and the sooner students can be brought to use them and be comfortable with them, the better. My main goal in Linear Algebra, by contrast, was to deal with the subject in an algorithmic way that I have found through teaching makes sense to students and gives them the tools they need to solve problems in linear algebra. (Also, the book defines determinants without the formula in a natural way that is explained below.)

The early chapters are largely devoted to the following theorem. Let two $m \times n$ matrices of integers be called equivalent if one can be transformed into the other by a sequence of steps in which a row is added to or subtracted from an adjacent row or a column is added to or subtracted from an adjacent column.

Theorem. Given two $m \times n$ matrices of integers, determine whether they are equivalent.

Let an $m \times n$ matrix of integers be called strongly diagonal if it is diagonal (that is, the entry in the $i$th row of the $j$th column is zero whenever $i \ne j$), if each diagonal entry is a multiple of its predecessor on the diagonal, and if the diagonal entries are nonnegative, except that the entry in the lower right corner may be negative when the matrix is square. The theorem is proved by giving an algorithm that transforms a given $m \times n$ matrix of integers into a strongly diagonal one and proving that two strongly diagonal matrices are equivalent only if they are equal. In short, the theorem is proved by showing that strongly diagonal form is a canonical form for matrices with respect to this equivalence relation.
(Strongly diagonal form is very close to what is often called Smith normal form in honor of H. J. S. Smith.) The algorithm is simple. (In the book it is given in two stages: the rules for reducing to diagonal form in Chapter 2 and the additional rules for reducing to strongly diagonal form in Chapter 5.) The hard part of the proof is the proof that if two square diagonal matrices are equivalent, then the products of their diagonal entries are equal. That the absolute values of these products are equal is comparatively easy to prove, so the proof comes down to showing that the signs are the same. The investigation of the sign of the product of the diagonal entries in equivalent diagonal square matrices motivates the definition of the determinant of a square matrix as the product of the diagonal entries of an equivalent diagonal matrix; the main thing to be proved then becomes the theorem needed to make this definition valid, namely, the statement that if two square diagonal matrices are equivalent, then the products of their diagonal entries are the same. In fact, the difficult point of the proof can be put even more starkly: Let $J$ be the strongly diagonal matrix that is the $n \times n$ identity
matrix $I_n$ with the last diagonal entry changed to $-1$. Prove that $J$ is not equivalent to $I_n$.

At first glance, this theorem seems to have little to do with linear algebra as it is generally thought of (vector spaces, linear maps, bases, etc.), but it provides an algorithmic solution to the core problem of linear algebra: Given an $m \times n$ matrix $A$ and a column matrix $Y$ of length $m$, find all column matrices $X$ of length $n$ for which $AX = Y$. In linear algebra courses, the matrices are usually assumed to have real number entries, but the limit process inherent in the notion of a real number has nothing to do with linear algebra per se, and a more reasonable assumption is that the entries are rational numbers. The denominators can be cleared in order to translate the problem into one in which $A$ and $Y$ have integer entries, and the problem becomes that of finding all solutions $X$ of $AX = Y$ with rational entries (with, naturally, a preference for solutions whose entries are integers). If matrices $A$ and $B$ are equivalent, the solution of $AX = Y$ is equivalent to the solution of $BX' = Y'$, because the column operations used to transform $A$ into $B$ can be regarded as invertible transformations of $X$ into $X'$, while the row operations are invertible transformations of $Y$ into $Y'$. Therefore, it suffices to solve $AX = Y$ for diagonal matrices, which can be done by inspection. For example, $DX = Y$ for a diagonal matrix $D$ (not necessarily square) has a solution $X$ for every $Y$ if and only if $D$ has a nonzero entry in each row, which implies, in particular, that the number $m$ of rows of $D$ is no greater than the number $n$ of columns. Similarly, $DX$ determines $X$ if and only if $D$ has a nonzero entry in each column, which implies, in particular, that $m \ge n$. Thus, the equation $AX = Y$ can be inverted to express $X$ as a function of $Y$ only when $m = n$, whether or not $A$ is diagonal.
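Solving $DX = Y$ "by inspection" for a diagonal matrix can be made concrete. The sketch below assumes that $D$ is given by its list of diagonal entries together with its shape; the function name `solve_diagonal` and the convention of marking free entries with `None` are choices made for this illustration.

```python
from fractions import Fraction

def solve_diagonal(diag, m, n, y):
    """Solve DX = Y, where D is m x n with D[i][i] = diag[i] and zeros elsewhere.

    Returns None when there is no solution. Otherwise returns a list of length n
    whose entries are Fractions, with None marking an entry that DX = Y leaves free.
    """
    r = min(m, n)                     # number of diagonal positions
    x = [None] * n
    for i in range(m):
        d = diag[i] if i < r else 0   # rows beyond the diagonal are entirely zero
        if d == 0:
            if y[i] != 0:
                return None           # the row reads 0 = y_i with y_i nonzero
        else:
            x[i] = Fraction(y[i], d)  # the row reads d * x_i = y_i
    return x

# More columns than rows: always solvable here, but x_3 is not determined.
wide = solve_diagonal([2, 3], 2, 3, [4, 9])
# More rows than columns: solvable only if the extra entry of Y is zero.
tall_bad = solve_diagonal([2, 3], 3, 2, [4, 9, 1])
tall_ok = solve_diagonal([2, 3], 3, 2, [4, 9, 0])
```

The two failure modes visible in the code, a zero row facing a nonzero entry of $Y$ and a column with no nonzero entry, are exactly the conditions $m \le n$ and $m \ge n$ are needed to avoid.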
Moreover, a square matrix is invertible if and only if the product of the diagonal entries of an equivalent diagonal matrix (which, at this point of the development, has not yet been shown to be independent of the choice of the equivalent diagonal matrix) is nonzero.

If students found it helpful to think of mathematics in terms of sets and functions, this could all be told to them in the usual way: An $m \times n$ matrix describes a particular kind of function (a linear function) from $\mathbf{Q}^n$ to $\mathbf{Q}^m$. It can be onto only if $n \ge m$ and can be one-to-one only if $n \le m$. If a linear function is both one-to-one and onto, then its inverse function is a linear function, which is to say that the square matrix of coefficients of the given function has an inverse matrix. Time after time in teaching the course I have decided that I surely could explain these facts of linear algebra in terms of linear functions in a way that the students would find helpful, and time after time the effort has failed. The statement that a function is one-to-one seems indistinguishable from the definition of a function for most students. Confusion about the difference between the statement that $f$ is a function from $\mathbf{Q}^n$ to $\mathbf{Q}^m$ and the statement that $f$ is onto $\mathbf{Q}^m$ is compounded by the fact that different mathematicians mean different things when they talk about the "range" of a function. Class
discussions bog down in terminological and conceptual issues that have nothing to do with linear algebra. These experiences have convinced me that the set-function conceptualization is not helpful for students of linear algebra. Perhaps it will work the other way around (a knowledge of linear algebra may help teach notions of sets and functions), but in my experience it does not work the way it is currently supposed to.

The above theorem and related topics form the substance of the first six chapters. Chapter 7 is on Moore-Penrose generalized inverses. For every $m \times n$ matrix $A$ of rational numbers, there is a unique $n \times m$ matrix of rational numbers $B$ for which $AB$ and $BA$ are both symmetric and the equations $ABA = A$ and $BAB = B$ both hold. This matrix $B$ is the Moore-Penrose generalized inverse of $A$, or the "mate" of $A$, as I call it for short. Clearly, if $B$ is the mate of $A$, then $A$ is the mate of $B$. The main property of mates is that $BY$ is the best solution, in the least squares sense, of the equation $AX = Y$ for any column matrix $Y$ of length $m$. (More precisely, when $\|M\|^2$ denotes the sum of the squares of the entries of a matrix $M$, $\|Y - AX\|^2$ attains its minimum value when $X = BY$ and, among all column matrices $X$ of length $n$ for which this minimum is attained, $X = BY$ is the one for which $\|X\|^2$ is smallest.)

Chapter 8 generalizes the theorem stated above from the case of matrices of integers to the case of matrices of polynomials in one indeterminate $x$ with rational coefficients. In this case, rather than restricting to addition or subtraction of an adjacent row or column, one allows subtraction of any multiple of a row or column from an adjacent row or column, where the multipliers are polynomials in $x$ with rational coefficients. (In the case of integer matrices, the two definitions are the same because subtraction of an arbitrary integer multiple can be achieved by repeated additions or subtractions.)
The condition that the nonzero diagonal entries of a strongly diagonal matrix must be positive is replaced by the condition that they must be monic unless, again, they occur in the lower right corner of a square matrix. Once again, every matrix is equivalent to a strongly diagonal matrix, and strongly diagonal matrices are equivalent only if they are equal. In this way, the problem of determining whether two given matrices are equivalent is solved.

This solution leads to the proof of the following important theorem of intermediate linear algebra. Two $n \times n$ matrices of rational numbers $A$ and $B$ are similar if there is an invertible $n \times n$ matrix of rational numbers $P$ for which $A = P^{-1}BP$.

Theorem. Given two $n \times n$ matrices of rational numbers, determine whether they are similar.
Proof. It is not difficult to prove* that $A$ is similar to $B$ if and only if $xI - A$ is equivalent to $xI - B$ when both are regarded as matrices whose entries are polynomials in $x$ with rational coefficients. (Here $I$ denotes the $n \times n$ identity matrix.) Therefore, the algorithm of Chapter 8 for solving this latter problem solves the problem of the theorem.

The characteristic polynomial of a square matrix $A$ is the determinant of $xI - A$. As follows from what has just been said, similar matrices have the same characteristic polynomial. The converse of this statement is false, as is shown by the fact that the matrix $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ is not similar to the $2 \times 2$ identity matrix. In fact, the strongly diagonal matrix equivalent to $xI - A$ in this case is $\begin{bmatrix} 1 & 0 \\ 0 & (x-1)^2 \end{bmatrix}$, not $\begin{bmatrix} x-1 & 0 \\ 0 & x-1 \end{bmatrix}$.

The minimum polynomial of a square matrix $A$ is the last diagonal entry of the strongly diagonal matrix equivalent to $xI - A$. The minimum polynomial of a matrix is easily shown to be the greatest common divisor of the polynomials of which it is a root (when the constant term of the polynomial is interpreted as a multiple of $A^0$, and $A^0$ is interpreted as $I$). For example, in the case of the matrix $A$ just considered, $A$ is a root of $f(x)$ if and only if $f(x)$ is a multiple of $x^2 - 2x + 1$. By the above, similar matrices have the same minimum polynomial, but this necessary condition for two matrices to be similar is still not sufficient. Such considerations lead to the study of the elementary divisors of a matrix $A$, which are certain powers of irreducible polynomials (the elementary divisors of $I$ are $x - 1$ and $x - 1$, while $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ has just one elementary divisor $(x - 1)^2$) whose product is the characteristic polynomial; they are easily described in terms of the strongly diagonal matrix equivalent to $xI - A$, and they do determine the similarity class of $A$.
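The $2 \times 2$ example can be verified by direct matrix arithmetic. The following sketch uses plain integer lists and helper names (`mat_mul`, `mat_sub`, `is_zero`, `char_poly_2x2`) that are chosen here for illustration; it checks that $A$ and $I$ share the characteristic polynomial $x^2 - 2x + 1$, while $x - 1$ annihilates $I$ but only $(x-1)^2$ annihilates $A$.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_sub(a, b):
    """Entrywise difference of two matrices of the same shape."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def is_zero(a):
    """True when every entry of the matrix is zero."""
    return all(entry == 0 for row in a for entry in row)

def char_poly_2x2(a):
    """Coefficients (1, -trace, det) of det(xI - a) for a 2 x 2 matrix."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return (1, -trace, det)

I2 = [[1, 0], [0, 1]]
A = [[1, 1], [0, 1]]

same_char_poly = char_poly_2x2(A) == char_poly_2x2(I2) == (1, -2, 1)
A_minus_I = mat_sub(A, I2)
linear_kills_A = is_zero(A_minus_I)                         # is x - 1 enough for A?
square_kills_A = is_zero(mat_mul(A_minus_I, A_minus_I))     # is (x - 1)^2 enough?
linear_kills_I = is_zero(mat_sub(I2, I2))                   # is x - 1 enough for I?
```

Since the two matrices have different minimum polynomials, they cannot be similar, even though their characteristic polynomials agree.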
The elementary divisors are closely related to the rational canonical form of a matrix. The Jordan canonical form of a matrix, a subject that is much taught, and in my opinion overemphasized, in intermediate linear algebra courses, is the rational canonical form if one works over the complex numbers rather than the rational numbers, or, better, if one works over an algebraic extension of $\mathbf{Q}$ that splits the characteristic polynomial of the matrix.
A matrix is diagonalizable if it is similar to a diagonal matrix, or, what is the same, if its elementary divisors all have degree 1. The methods of Chapter 9 not only make it possible to determine whether a given matrix is diagonalizable, they make it possible, when it is diagonalizable, to construct a similar diagonal matrix. However, this solution of the problem "determine whether a given matrix is diagonalizable" does not solve the problem of "diagonalizing" symmetric matrices in the sense of the spectral theorem because it tells only whether a symmetric matrix of rational numbers is similar to a diagonal matrix of rational numbers.
* This proof is marred by a misstatement in the first (and so far only) printing of the book. On p. 92, $E$ should be assumed to have polynomial entries and $D$ to have rational number entries.
Essay 5.3 Overview of 'Linear Algebra'
Chapter 10 is devoted to the (finite-dimensional) spectral theorem, which states that symmetric matrices are similar to diagonal matrices of real numbers. In the strict sense, this is not a theorem of linear algebra because it involves limits in an essential way: The equivalent diagonal matrix normally contains irrational numbers. This topic warrants an essay of its own.
Essay 5.4 The Spectral Theorem
I remember trying unsuccessfully to concoct a constructive proof of the spectral theorem for symmetric matrices as long ago as 1964. It was only while writing my linear algebra book in the early 1990s that I realized that the eigenvectors would follow easily once the eigenvalues had been constructed, and that the eigenvalues could be described constructively, in most cases, as places where the characteristic polynomial changed sign. This does not cover the case of multiple eigenvalues, but while I was developing the ideas in Chapter 9 of Linear Algebra it became clear to me that the important polynomial is not the characteristic polynomial of the symmetric matrix but its minimum polynomial, which in the case of a symmetric matrix has no multiple roots. In this way, the geometrically fascinating principal axes theorem for symmetric matrices reappeared as the rather modest assertion that the minimum polynomial of a symmetric matrix changes sign a number of times equal to its degree. Once this is known to be true, simple bisection determines all of the eigenvalues as real numbers, and simple linear algebra over a splitting field of the minimum polynomial suffices to determine the corresponding eigenvectors. Once the problem was reduced in this way, Kronecker's work on Sturm's theorem helped me solve it finally to my satisfaction. The solution was included in the last chapter of Linear Algebra, where, as far as I know, no one has ever read it. Here it is once again, with a few simplifications and improvements.
Theorem. Given a symmetric matrix $S$ whose entries are integers, let $f(x)$ be its minimum polynomial, and let $m$ be the degree of $f(x)$. Construct $m+1$ rational numbers $x_0 < x_1 < x_2 < \cdots < x_m$ with the property that $f(x_{i-1})$ and $f(x_i)$ have opposite signs for $i = 1, 2, \ldots, m$.
Proof. If $m = 1$, then $f(x) = x + c$, and one can simply set $x_0 = -N$ and $x_1 = N$ for a sufficiently large number $N$.
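The remark that simple bisection determines the eigenvalues once the sign changes are known can be illustrated with exact rational arithmetic. This is a sketch under stated assumptions, not code from the book: the matrix $S = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$ is chosen here only as an example, its minimum polynomial is $x^2 - x - 1$, and the rational numbers $-1 < 0 < 2$ are picked by hand as points where that polynomial alternates in sign. The names `evaluate` and `bisect_sign_change` are invented for the illustration.

```python
from fractions import Fraction

def evaluate(coeffs, x):
    """Horner evaluation; coefficients are listed from the highest degree down."""
    value = Fraction(0)
    for c in coeffs:
        value = value * x + c
    return value

def bisect_sign_change(coeffs, lo, hi, steps):
    """Shrink [lo, hi] while keeping opposite signs at the two endpoints."""
    lo, hi = Fraction(lo), Fraction(hi)
    f_lo = evaluate(coeffs, lo)
    if f_lo * evaluate(coeffs, hi) >= 0:
        raise ValueError("the polynomial must have opposite signs at lo and hi")
    for _ in range(steps):
        mid = (lo + hi) / 2
        f_mid = evaluate(coeffs, mid)
        while f_mid == 0:
            # Move the midpoint slightly if it happens to be a root.
            mid = (mid + hi) / 2
            f_mid = evaluate(coeffs, mid)
        if f_lo * f_mid < 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return lo, hi

f = [1, -1, -1]        # x^2 - x - 1, assumed minimum polynomial of the example S
points = [-1, 0, 2]    # f(-1) = 1, f(0) = -1, f(2) = 1
intervals = [bisect_sign_change(f, a, b, 20)
             for a, b in zip(points, points[1:])]
```

Each entry of `intervals` is a pair of rational numbers enclosing one eigenvalue of the example matrix; more steps give narrower enclosures, and no floating-point arithmetic is involved.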
Assume, therefore, that $m > 1$. The function* $\operatorname{tr}(g(S)h(S))$, defined for pairs of polynomials $(g(x), h(x))$ with rational coefficients, can be regarded as a symmetric bilinear function from $\mathbf{Q}[x] \bmod f(x)$ to $\mathbf{Q}$. If $h(x)$ has the property that $\operatorname{tr}(g(S)h(S)) = 0$ for all polynomials $g(x)$, then $h(x) \equiv 0 \bmod f(x)$, because $\operatorname{tr}(h(S)^2)$ is the sum of the squares of the entries of the matrix $h(S)$ (because $S$ and therefore $h(S)$ are symmetric), so the case $g = h$ of $\operatorname{tr}(g(S)h(S)) = 0$ implies $h(S) = 0$. Therefore, the $m \times m$ matrix of rational numbers that represents this symmetric bilinear form with respect to the basis $1, x, x^2, \ldots, x^{m-1}$ of $\mathbf{Q}[x] \bmod f(x)$ is invertible, from which it follows that every linear form $\mathbf{Q}[x] \bmod f(x) \to \mathbf{Q}$ can be expressed as $g(x) \mapsto \operatorname{tr}(g(S)h(S))$ for some polynomial $h(x)$ with rational coefficients.
* Here the trace $\operatorname{tr}(M)$ of a square matrix $M$ is of course the sum of its diagonal entries.
In this way, a polynomial $h(x)$ with rational coefficients can be constructed for which $\operatorname{tr}(g(S)h(S)) = g_1$ for any
polynomial $g(x) = g_1x^{m-1} + g_2x^{m-2} + \cdots + g_m$ in which the $g_i$ are rational numbers. If it is stipulated that $\deg h < m$, this property determines $h$ when $f(x)$ is given. When $h(x)$ is defined to be the polynomial determined in this way, $h(x)$ is relatively prime to $f(x)$, because a common divisor $d(x)$ of $f(x)$ and $h(x)$ of positive degree, say $f(x) = q_1(x)d(x)$ and $h(x) = q_2(x)d(x)$, where $\deg d > 0$, would imply $\deg q_1 < m$, say $\deg q_1 + t = m$, where $t > 0$, so $x^{t-1}q_1(x)$ would be a polynomial of degree $m-1$ and $\operatorname{tr}(S^{t-1}q_1(S)h(S))$ would be nonzero, contrary to $S^{t-1}q_1(S)h(S) = S^{t-1}q_1(S)q_2(S)d(S) = S^{t-1}q_2(S)f(S) = 0$. Therefore, $h(x)$ is invertible mod $f(x)$.
Let $s_m(x) = f(x)$, let $s_{m-1}$ be the unique inverse of $h(x)$ mod $f(x)$ whose degree is less than $m$, and let later terms of the sequence $s_m(x), s_{m-1}(x), s_{m-2}(x), \ldots, s_k(x)$ be defined by defining $s_i(x)$ to be the negative of the remainder when $s_{i+2}(x)$ is divided by $s_{i+1}(x)$. The sequence terminates with the last nonzero term $s_k(x)$ generated in this way. It will be shown that each $s_i$ has the form $s_i(x) = s_i^{(i)}x^i + s_{i-1}^{(i)}x^{i-1} + \cdots + s_0^{(i)}$, where the first coefficient $s_i^{(i)}$, call it $c_i$, is positive. In particular, $s_1(x)$ has degree 1 with a positive leading coefficient, and the final nonzero term $s_0$ is a positive constant.
For any $g(x) = g_1x^{m-1} + g_2x^{m-2} + \cdots + g_m$, the identity
$$\operatorname{tr}\bigl(g(S)s_{m-1}(S)h(S)^2\bigr) = g_1$$
follows from the definitions of $h(x)$ and $s_{m-1}(x)$, because $s_{m-1}(S)h(S) = I$. Use of $s_{m-2}(x) = -s_m(x) + q_{m-1}(x)s_{m-1}(x)$, where $q_{m-1}(x)$ is the quotient in the division that defines $s_{m-2}(x)$, in $\operatorname{tr}(g(S)s_{m-2}(S)h(S)^2)$ gives $-0 + \operatorname{tr}(g(S)q_{m-1}(S)s_{m-1}(S)h(S)^2)$, which is the coefficient of $x^{m-1}$ in $g(x)q_{m-1}(x)$, provided $g(x)q_{m-1}(x)$ has degree less than $m$. Since
$$s_m(x) = x^m + \cdots \quad\text{and}\quad s_{m-1}(x) = mx^{m-1} + \cdots$$
(the latter because $\operatorname{tr}(s_{m-1}(S)s_{m-1}(S)h(S)^2) = \operatorname{tr}(I) = m$), it is clear that $q_{m-1}(x)$ has degree 1 and leading coefficient $1/m$, so
$$\operatorname{tr}\bigl(g(S)s_{m-2}(S)h(S)^2\bigr) = g_1/m$$
whenever $g(x) = g_1x^{m-2} + \cdots$. The general case of this formula, which states that
$$\operatorname{tr}\bigl(g(S)s_i(S)h(S)^2\bigr) = g_1/c_{i+1} \tag{1}$$
($c_{i+1}$ is the leading coefficient of $s_{i+1}(x)$), where $g(x) = g_1x^i + \cdots$ is a polynomial whose degree is at most $i$, will now be proved. Note first that the case $i = m-2$ already proved implies, since $s_{m-2}(x)$ has degree at most $m-2$, that the coefficient $c_{m-2}$ of $x^{m-2}$ in $s_{m-2}(x)$ satisfies $c_{m-2}/m = \operatorname{tr}((s_{m-2}(S)h(S))^2)$, which is the sum of the squares of the entries of $s_{m-2}(S)h(S)$ and is therefore positive unless $s_{m-2}(S)h(S) = 0$; thus $c_{m-2}$ is positive, because $h(S)$ is invertible and $s_{m-2}(S) = 0$ would imply
$\operatorname{tr}(g(S)s_{m-2}(S)h(S)^2) = 0$ for all $g(x)$, but for $g(x) = x^{m-2}$ this trace is $1/m$. Therefore, the assertion that $s_i(x)$ has degree $i$ and positive leading coefficient is proved for $i = m, m-1, m-2$. Similarly, if this assertion and (1) are proved for both $i+2$ and $i+1$, say $s_{i+2}(x) = c_{i+2}x^{i+2} + \cdots$ and $s_{i+1}(x) = c_{i+1}x^{i+1} + \cdots$, the same method proves them for $i$, provided $i \ge 0$, because if $g(x) = g_1x^i + \cdots$, then
$$\operatorname{tr}\bigl(g(S)s_i(S)h(S)^2\bigr) = -\operatorname{tr}\bigl(g(S)s_{i+2}(S)h(S)^2\bigr) + \operatorname{tr}\bigl(g(S)q_{i+1}(S)s_{i+1}(S)h(S)^2\bigr) = -0 + \frac{g_1c_{i+2}}{c_{i+1}} \cdot \frac{1}{c_{i+2}} = \frac{g_1}{c_{i+1}}$$
because $g(x)q_{i+1}(x) = \frac{g_1c_{i+2}}{c_{i+1}} \cdot x^{i+1} + \cdots$. In particular, $s_i(S) \ne 0$. The case $g(x) = s_i(x)$ of this identity implies that $c_i/c_{i+1}$ is the positive rational number $\operatorname{tr}((s_i(S)h(S))^2)$, so $s_i(x)$ has degree $i$ and a positive leading coefficient, as was to be shown.
A polynomial of degree $i$ cannot change sign more than $i$ times, as can be seen as follows: If $F(x)$ is a polynomial, if $a$ and $b$ are rational numbers for which $F(a) < F(b)$, and if $c$ is the midpoint of the interval $[a, b]$, then the rate of increase of $F$ on the interval is the average of the rate of increase on the two halves of the interval $[a, c]$ and $[c, b]$, which is to say that
$$\frac{F(b) - F(a)}{b - a} = \frac{1}{2}\left(\frac{F(c) - F(a)}{c - a} + \frac{F(b) - F(c)}{b - c}\right),$$
as follows from $c - a = b - c = \frac{1}{2}(b - a)$. Select the half interval on which the rate of increase is larger, or, if the two rates are the same, select the half interval on the right. Iteration of this bisection and selection rule determines a nested sequence of subintervals of $[a, b]$, each half as long as its predecessor, and therefore determines a real number (their "intersection"). Since the derivative* $F'(x)$ of $F(x)$ at this real number is the limit of a nondecreasing sequence of positive rational numbers (namely, of the values of $(F(b) - F(a))/(b - a)$, where $a$ and $b$ are the endpoints of the successive intervals), the real number determined by the nested intervals is one at which $F'(x)$ is positive. In this way, any interval on which $F(x)$ increases contains a real number at which $F'(x)$ is positive, and therefore contains a rational number at which $F'(x)$ is positive. Similarly, any interval on which $F$ decreases contains a rational number at which $F'(x)$ is negative. Therefore, if a polynomial with rational coefficients of degree $i$ changes sign at least $\sigma > 0$ times, then $i > 0$, and its derivative is a polynomial of degree $i - 1$ that changes sign at least $\sigma - 1$ times. Repetition of this argument $\sigma$ times gives a polynomial of degree $i - \sigma$, so $i \ge \sigma$.
* The derivative of $F(x)$ is of course the coefficient of $h$ in the polynomial $F(x + h) - F(x) = F'(x)h + \cdots$, where the omitted terms all contain $h^2$.
Moreover, if $i$ nonoverlapping intervals on which a polynomial of degree $i$ changes sign are given, bisection of each of them (but moving the midpoint
slightly if it happens to be a root of the polynomial) gives $2i$ nonoverlapping intervals; the polynomial changes sign on at least $i$ of them and, as was just shown, on no more than $i$ of them. Therefore, repetition of the bisection process constructs $i$ real roots of the polynomial and shows that the values of such a polynomial for two rational numbers $a$ and $b$ have opposite signs if and only if the interval $[a, b]$ contains an odd number of its $i$ real roots, provided neither $a$ nor $b$ is a root.
For each $i = 1, 2, \ldots, m$, $s_i(x)$ has $i$ real roots, and the number of real roots of $s_i(x)$ that are greater than a given one of them is equal to the number of real roots of $s_{i-1}(x)$ that are greater than it, as can be proved inductively as follows: This statement is obviously true in the case $i = 1$, because a polynomial of degree 1 has just one root, and no roots of $s_0$ are greater than it. Suppose now that it is true for a given $i$, and let $\rho_1, \rho_2, \ldots, \rho_i$ be the real roots of $s_i(x)$, in ascending order. By the inductive hypothesis, the number of real roots of $s_{i-1}(x)$ greater than $\rho_j$ is $i - j$, so, since $s_{i-1}(x)$ is positive for all sufficiently large values of $x$, $s_{i-1}(\rho_j)$ has the sign $(-1)^{i-j}$. (A polynomial of degree $i - 1$ has at most $i - 1$ roots, counted with multiplicities, so the roots of $s_{i-1}(x)$ are simple and $s_{i-1}(x)$ changes sign at each root.) Since the formula $s_{i+1}(x) + s_{i-1}(x) = q_i(x)s_i(x)$ implies* that $s_{i+1}(x)$ and $s_{i-1}(x)$ have opposite signs at a root of $s_i(x)$, it follows that the sign of $s_{i+1}(x)$ at $\rho_j$ is $(-1)^{i-j+1}$. Because the leading coefficient of $s_{i+1}(x)$ is positive, when $\rho_{i+1}$ is chosen to be a large enough number and when $-\rho_0$ is chosen to be a large enough number, the same rule describes the sign of $s_{i+1}(x)$ at all of the real numbers $\rho_0, \rho_1, \ldots, \rho_{i+1}$. Since these signs alternate, it follows that $s_{i+1}(x)$ changes sign $i + 1$ times, and therefore has $i + 1$ real roots.
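The rule that defines the sequence $s_m, s_{m-1}, \ldots$ (each term is the negative of the remainder when the term two steps earlier is divided by the term one step earlier) is easy to carry out exactly. The sketch below is an illustration under assumptions that are not in the text: it takes the example matrix $S = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$, for which $f(x) = x^2 - x - 1$ and for which one can check by hand that $h(x) = (2x - 1)/5$ satisfies $\operatorname{tr}(h(S)) = 0$ and $\operatorname{tr}(S\,h(S)) = 1$, so that $s_1(x) = 2x - 1$. The function names are invented for the illustration.

```python
from fractions import Fraction

def poly_divmod(num, den):
    """Divide polynomials given as coefficient lists, highest degree first.

    Returns (quotient, remainder); the remainder has leading zeros stripped,
    so a zero remainder is the empty list.
    """
    num = [Fraction(c) for c in num]
    den = [Fraction(c) for c in den]
    quotient = []
    while len(num) >= len(den):
        factor = num[0] / den[0]
        quotient.append(factor)
        for k in range(len(den)):
            num[k] -= factor * den[k]
        num.pop(0)  # the leading coefficient is now zero
    while num and num[0] == 0:
        num.pop(0)
    return quotient, num

def negative_remainder_sequence(f, s_next):
    """Start from s_m = f and s_{m-1} = s_next; stop at the last nonzero term."""
    sequence = [[Fraction(c) for c in f], [Fraction(c) for c in s_next]]
    while True:
        _, remainder = poly_divmod(sequence[-2], sequence[-1])
        if not remainder:
            return sequence
        sequence.append([-c for c in remainder])

# Assumed example: f = x^2 - x - 1 and s_1 = 2x - 1 (see the lead-in).
sequence = negative_remainder_sequence([1, -1, -1], [2, -1])
degrees = [len(p) - 1 for p in sequence]
leading_coefficients = [p[0] for p in sequence]
```

In this small case the sequence is $x^2 - x - 1$, $2x - 1$, $5/4$: the degrees drop by one at each step and every leading coefficient is positive, which is the pattern the proof establishes in general.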
Moreover, the $j$th one of these roots lies in the $j$th interval where $s_{i+1}(x)$ changes sign, which places it between $\rho_j$ and $\rho_{j+1}$ and shows that the number of roots of $s_{i+1}(x)$ greater than a given root of $s_{i+1}(x)$ is the number of $\rho$'s greater than it, as was to be shown. Thus, sufficiently close rational approximations to the roots of $s_{m-1}(x)$, together with a pair of values $\pm N$ for a large number $N$, demonstrate $m$ changes of sign in $s_m(x) = f(x)$, as required.
* A common divisor of $s_i(x)$ and $s_{i-1}(x)$ divides all of the $s_j(x)$, and therefore divides the nonzero constant $s_0$, and must therefore be a nonzero constant. Therefore, there are polynomials $\alpha(x)$ and $\beta(x)$ with rational coefficients for which $\alpha(x)s_i(x) + \beta(x)s_{i-1}(x) = 1$. If $\rho$ is a real root of $s_i(x)$, then $1 = \beta(\rho)s_{i-1}(\rho)$, which implies that the real number $s_{i-1}(\rho)$ is nonzero. Similarly, a root of $s_{i-1}(x)$ is not a zero of $s_i(x)$. Thus, when $\rho$ is a root of $s_i(x)$, the equation $s_{i+1}(\rho) + s_{i-1}(\rho) = 0$ implies that $s_{i+1}(\rho)$ and $s_{i-1}(\rho)$ have opposite signs.
Corollary (The spectral theorem). Given a symmetric matrix $S$ whose entries are integers, find real numbers $\rho_1, \rho_2, \ldots, \rho_m$ and symmetric matrices of real numbers $P_1, P_2, \ldots, P_m$ that satisfy
$$S = \rho_1P_1 + \rho_2P_2 + \cdots + \rho_mP_m, \qquad I = P_1 + P_2 + \cdots + P_m$$
and
$$P_iP_j = \begin{cases} P_i & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases}$$
Deduction. All that is needed is to construct the matrix that is orthogonal projection on the eigenspace corresponding to each eigenvalue $\rho_i$, which is to say orthogonal projection on the kernel of $S - \rho_iI$ for all roots $\rho_i$ of the minimum polynomial of $S$. (To say that a matrix is an orthogonal projection means that it is symmetric and idempotent.) Each of these orthogonal projections is easy to find, because the orthogonal projection on the kernel of a symmetric matrix $M$ is $I - Q$, where $Q$ is orthogonal projection on the image of $M$, which is to say that $Q$ is $M$ multiplied on the right by the Moore-Penrose generalized inverse of $M$. Therefore, the spectral decomposition of $S$ can be given once the Moore-Penrose generalized inverses of the matrices $S - \rho_iI$ are found. Note that the computation of the Moore-Penrose generalized inverse of a matrix requires exact computations with the entries, so it becomes possible only after a splitting field for the minimum polynomial is constructed; the interpretation of the $\rho$'s and $P$'s as real numbers requires an identification of the splitting field with a subfield of the field of real numbers.
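The three conditions of the corollary can be verified exactly in a case where the eigenvalues happen to be rational, so that no splitting field has to be constructed. The sketch below does not use the Moore-Penrose route described in the deduction; it swaps in the familiar interpolation formula $P_i = \prod_{j \ne i} (S - \rho_jI)/(\rho_i - \rho_j)$, which gives the same projections when the $\rho$'s are the distinct roots of the minimum polynomial. The matrix $S = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$, with minimum polynomial $(x-1)(x-3)$, and the function names are assumptions made for the illustration.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def spectral_projections(S, roots):
    """Projections P_i for the distinct rational roots of the minimum polynomial.

    Uses P_i = prod over j != i of (S - rho_j I) / (rho_i - rho_j),
    an interpolation formula substituted here for the generalized inverse.
    """
    n = len(S)
    projections = []
    for rho_i in roots:
        P = identity(n)
        for rho_j in roots:
            if rho_j == rho_i:
                continue
            factor = [[(Fraction(S[r][c]) - (rho_j if r == c else 0))
                       / (rho_i - rho_j) for c in range(n)] for r in range(n)]
            P = mat_mul(P, factor)
        projections.append(P)
    return projections

S = [[2, 1], [1, 2]]
roots = [1, 3]                      # roots of (x - 1)(x - 3)
P1, P2 = spectral_projections(S, roots)

# Check S = 1*P1 + 3*P2 and I = P1 + P2 entry by entry.
recombined = [[1 * P1[r][c] + 3 * P2[r][c] for c in range(2)] for r in range(2)]
total = [[P1[r][c] + P2[r][c] for c in range(2)] for r in range(2)]
```

Here `P1` and `P2` come out as the symmetric matrices with entries $\pm\frac12$, `recombined` reproduces $S$, and `total` is the identity. For a matrix whose minimum polynomial has irrational roots the same arithmetic would have to be carried out in a splitting field, as the deduction notes.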
Essay 5.5 Kronecker as One of E. T. Bell's "Men of Mathematics"
Kronecker laid himself out in 1891 to criticize Cantor's work to his students in Berlin, and it became clear that there was no room for them both under one roof. As Kronecker was already in possession, Cantor resigned himself to staying out in the cold.
E. T. Bell, Men of Mathematics, p. 570.
Discussing Lindemann's proof that $\pi$ is transcendental, Kronecker asked, "Of what use is your beautiful investigation regarding $\pi$? Why study such problems, since irrational [and hence transcendental] numbers do not exist?"
Ibid., p. 568.
It is a mistake to take Eric Temple Bell's book Men of Mathematics too seriously. Bell set out to write a popular book about the history of mathematics, and he succeeded admirably. From its publication in 1937 until today, the book has amused and inspired several generations of amateur and professional mathematicians, including mine. His outrageousness is part of his winning style. But in spicing up his stories he did create distortions, extrapolations, and outright falsehoods that have since become common "knowledge."
On the whole, the picture Bell paints of Kronecker is not negative. "His skepticism was his greatest contribution," the table of contents says of Kronecker, and at many points Kronecker is made the respectable spokesman for those who objected to the growing use of the transfinite. Kronecker would probably have preferred to have less about his philosophy and more about his mathematical achievements in the book, but his importance and his contributions are certainly not slighted. Unfortunately, the word "vicious" is used more than once to describe his criticism of others, principally Weierstrass and Cantor, and, as I have said in the Preface, I do not feel this word is justified.
I have recently come to understand what lies behind the two statements of Bell that are quoted above, and in explaining them in this essay I hope to shed some light on Kronecker and his ideas, as well as on Bell, his methods, and his lack of credibility. In the first quotation Bell implies that it was one thing to criticize professional colleagues and quite another to do it in front of students. That was surely Cantor's view of it when he bitterly complained to W. Thome in a letter dated 21 September 1891 that Kronecker in public lectures had told his "immature audience" that Cantor's work was "mathematical sophistry" [56, document 38]. Bell was undoubtedly referring to this letter of Cantor's when he wrote of Kronecker's 1891 criticism of Cantor, but I doubt that his evidence of the alleged criticism was as reliable as that contained in the transcript of Kronecker's 1891 lectures that was recently published [45]. Despite Cantor's claim that Kronecker had denounced a specific work of his, the published version of Kronecker's lectures does not even mention Cantor, much less the
specific work Cantor cites. The word "sophistry" is indeed used [45, p. 247] in connection with a transformation of $\nu$-dimensional space into a space of some other dimension, which Cantor might reasonably imagine to be a reference to his work, but, as the editors of the lectures point out, the lectures were given just one year after Peano published his famous curve that fills an area of the plane. It is of course possible that Cantor knew more than we do about Kronecker's actual words; he says in the letter that he had obtained a copy of the lectures by chance, but nothing of the sort is to be found among his surviving papers. However, it is equally possible that he was overreacting to a simple statement of opinion that may not even have been directed at him and that had, in the version that has survived, no tinge of personal animosity. For his part, Cantor says Kronecker's "entire course of lectures is a muddled and superficial mix of undigested ideas, boasts, unmotivated name-calling, and rotten jokes."* If Bell's impression of Kronecker's remarks came from this characterization of them it is easy to see why he used the word "vicious," but the surviving version of the lectures in no way deserves Cantor's description.
Finally, before Kronecker's hostility can be taken, as Bell does take it, to be the cause of Cantor's spending his entire career at Halle instead of being called to Berlin, one must show that somewhere there was someone who felt Cantor was a qualified and desirable candidate for appointment at Berlin. The fact is that Kronecker died in the very year 1891 that these lectures were given, so there was very soon no question of their needing to live "under one roof." Weierstrass survived Kronecker, and when Weierstrass died he was succeeded by H. A. Schwarz, no friend of Kronecker's views, but I am unaware of any effort to bring Cantor to Berlin.
* Die ganze Vorlesung ist ein wirres oberflächliches Gemisch von unverdauten Ideen, Prahlereien, unmotivierten Schimpfereien und faulen Witzen.
When, as a graduate student, I first read the passage in Bell's book about Kronecker's attitude toward $\pi$, I think I was as indignant as Bell wanted me to be with the claim that $\pi$ might not "exist." Years later, when I encountered the same anecdote in Constance Reid's book Hilbert, I had come to take a great interest in Kronecker and his ideas, so this time rather than being indignant, I was puzzled and unsure that the anecdote was authentic. Neither Bell nor Reid cites a source. Kronecker's works show no inhibition about the use of $\pi$. His papers on analytic number theory and those on elliptic functions are full of $\pi$'s. In the first lecture in his course of lectures on number theory [44], he refers without apology to "the transcendental number $\pi$ from geometry" and notes that it can be defined by $\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots$. I am not aware that he ever expressed any reservations about any particular transcendental number. What he had reservations about was the notion that the totality of real numbers could be treated as a mathematical entity. Why would he have had any reservations about the "existence" of $\pi$, or even about the meaningfulness of Lindemann's
theorem that $\pi$ was transcendental?
On the other hand, Bell's quotation of Kronecker could not be a mere invention. A few years ago, I found what I took to be Bell's source in Florian Cajori's History of Mathematics [7], where Cajori writes, "[Kronecker] once paradoxically remarked to Lindemann: 'Of what use is your beautiful research on the number $\pi$? Why cogitate over such problems, when really there are no irrational numbers whatever?'" But Cajori gives no source either. His book was originally published in 1894, which would put it only three years away from Kronecker and make plausible the hypothesis that Cajori learned the story through word of mouth. Only much later did I read carefully the copyright page of the Chelsea reprint edition I had. There was a "second, revised and enlarged edition" in 1919, of which the Chelsea edition was a reprint. When I finally tracked down a copy of the 1894 edition in microfiche, I learned that the story of Kronecker and $\pi$ had been added in 1919, twenty-eight years after Kronecker's death.
Then, in June of 2003, while writing to Professor David Rowe of Mainz about a different question, I had the happy thought to ask him whether he knew where Cajori might have heard the story. By return e-mail he was able to give me what seems certain to be the correct source.* In 1904, Teubner published a German translation of Poincaré's Science et Hypothèse annotated by Lindemann [54]. One of Lindemann's notes, note (4) to page 20, cites Kronecker's advocacy of restating the theory of algebraic quantities entirely in terms of the theory of polynomials with integer coefficients (see Essay 1.1) and goes on to say:
Später ging Kronecker noch weiter, indem er die Existenz irrationaler Zahlen leugnete; so sagte er mir in seiner lebhaften und zu Paradoxen geneigten Art einmal: "Was nützt uns Ihre schöne Untersuchung über die Zahl $\pi$? Wozu das Nachdenken über solche Probleme, wenn es doch gar keine irrationalen Zahlen gibt?" (Later, Kronecker went even further and denied the existence of irrational numbers; thus, he once said to me in his lively and paradoxical way, "Of what use to us are your beautiful researches about the number $\pi$? Why consider such problems when in fact there are no irrational numbers?")
Since Lindemann published this more than ten years after Kronecker's death, we are entitled to take his quotation marks with a grain of salt. Clearly, Lindemann felt that Kronecker was teasing him (not unkindly, it would seem, in view of the appearance of the word "beautiful"), but Kronecker's exact words, which are essential to an understanding of the underlying criticism, would probably have been difficult for Lindemann to recall after ten minutes, not to mention ten years.
* Their different English versions of the alleged quotation suggest that Bell may have based his telling of the story directly on Lindemann, not on Cajori's retelling.
That Kronecker would prefer to state Lindemann's result in a way that made no reference to the totality of transcendental numbers comes as no surprise. If his meaning was simply that he would prefer to state the result in a form something like, "For any polynomial $f(x)$ in one variable with integer coefficients the sequence of rational numbers $f\left(1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots + \frac{(-1)^n}{2n+1}\right)$ can be bounded away from zero," no one would be scandalized, and the statement could not be used to ridicule Kronecker's views. I certainly do not claim to know what the point of Kronecker's criticism might have been (it could have been the form in which Lindemann stated his result or the methods he used or many things in between), but it seems certain to me that he would have regarded the result as having meaning and even as having considerable interest.
For the fun of it, we can indulge Bell in his extravagant caricatures of our mathematical forebears, but we should be careful not to let them affect our understanding of the history of our subject. In particular, we should not let Kronecker's role as Bell's gadfly obscure his true-life role as a great mathematician whose works are classics.
References
[1] N. H. Abel, Mémoire sur une propriété générale d'une classe très-étendue de fonctions transcendantes, Mémoires présentés par divers savants à l'Académie des sciences, Paris, 1841, Oeuvres Complètes, vol. 1, 145-211, (4.1).
[2] N. H. Abel, Sur la résolution algébrique des équations, Oeuvres Complètes, vol. 2, pp. 217-243, (4.5).
[3] Aristotle, The Physics, P. H. Wicksteed and F. M. Cornford, translators, Harvard Univ. Press, Cambridge, Mass., 1957, (5.1).
[4] S. Axler, Linear Algebra Done Right, Springer-Verlag, New York, 1996, (5.3).
[5] E. T. Bell, Men of Mathematics, Simon and Schuster, New York, 1937, 1962, (5.5).
[6] N. Bourbaki, Éléments d'Histoire des Mathématiques, 2nd edition, Hermann, Paris, 1969, (4.3).
[7] F. Cajori, History of Mathematics, Macmillan, New York, 1894, 1919, Chelsea Reprint 1980, 1985, Landmarks of Science Microform, (5.5).
[8] N. G. Chebotarev (Tschebotarow), Grundzüge der Galois'schen Theorie, Noordhoff, Groningen-Djakarta, 1950, Translation of Osnovie Teorii Galua, Gosudarstvennoe Techniko-Teoreticheskoe-Isdatelstvo, Moscow-Leningrad, 1934-1937, (Synopsis, 1.7, 5.3).
[9] N. G. Chebotarev, Newton's Polygon and its Role in the Present Development of Mathematics (Russian), Isaac Newton, 1643-1727 (S. I. Vavilova [transliterated Wawilow on the English version of the title page], ed.), Izdatelstvo Akademii Nauk, Moscow-Leningrad, 1943, pp. 99-126, (4.4).
[10] H. T. Colebrooke, Algebra, with Arithmetic and Mensuration, from the Sanscrit of Brahmegupta and Bhascara, J. Murray, London, 1817, (3.1).
[11] Gabriel Cramer, Introduction à l'analyse des lignes courbes algébriques, Frères Cramer, Geneva, 1750, Landmarks of Science Microform, (4.4).
[12] R. Dedekind, Über einen arithmetischen Satz von Gauss, Mitt. Deut. Math. Ges. Prag, (1892), 1-11, Werke, vol. 2, 28-38, (2.5).
[13] R. Dedekind, Über die Begründung der Idealtheorie, Nachr. Kön. Ges. Wiss. Göttingen, (1895), 106-113, Werke, vol. 2, 50-58, (2.5).
[14] R. Dedekind and H. Weber, Theorie der algebraischen Funktionen einer Veränderlichen, Jour. für Math., 92 (1882), 181-290, Dedekinds Werke, vol. 1, 238-349, (4.5, 4.7).
[15] L. E. Dickson, History of the Theory of Numbers, Carnegie Institute, Washington, 1920, Chelsea reprint, 1971, (3.1).
[16] P. G. L. Dirichlet, Vorlesungen über Zahlentheorie (R. Dedekind, ed.), Vieweg, Braunschweig, 1863, 1871, 1879, 1894, Chelsea reprint, 1968, (3.3).
[17] H. M. Edwards, Euler and Quadratic Reciprocity, Mathematics Magazine, 56 (1983), 285-291, (3.5).
[18] H. M. Edwards, Galois Theory, Springer-Verlag, New York, 1984, (1.7, 1.9, 2.1, 2.3, 2.4).
[19] H. M. Edwards, Divisor Theory, Birkhäuser, Boston, 1990, (2.5).
[20] H. M. Edwards, Linear Algebra, Birkhäuser, Boston, 1995, (5.3, 5.4).
[21] H. M. Edwards, Kronecker on the Foundations of Mathematics, From Dedekind to Gödel (Jaakko Hintikka, ed.), Kluwer, 1995, pp. 45-52, (Preface, 1.1).
[22] H. M. Edwards, Kronecker's Fundamental Theorem of General Arithmetic, Proceedings of a conference held at MSRI, Berkeley, in April 2003, (to appear), (Synopsis).
[23] H. M. Edwards, O. Neumann and W. Purkert, Dedekinds "Bunte Bemerkungen" zu Kroneckers "Grundzüge", Arch. Hist. Exact Sci., 27 (1982), 49-85, (2.5).
[24] F. Engel, Eduard Study, Jahres. der DMV 40 (1931), (4.2).
[25] Euclid, The Thirteen Books of Euclid's Elements, T. L. Heath, translator and editor, 2nd Edition, Cambridge Univ. Press, 1925, Dover reprint, 1956, (1.2, 1.4, 3.1).
[26] L. Euler, Observationes de Comparatione Arcuum Curvarum Irrectificabilium, Novi Comm. acad. sci. Petropolitanae, 6 (1761), 58-84, Opera, ser. 1, vol. 21, pp. 80-107, Eneström listing 252, (4.2).
[27] E. Galois, Mémoire sur les conditions de résolubilité des équations par radicaux, J. Math. Pures et Appl., 11, 1846, 381-444, see [18] for other citations and for English translation, (Synopsis, 1.2, 1.9, 2.1, 2.3, 2.4).
[28] C. F. Gauss, Disquisitiones Arithmeticae, Braunschweig, 1801, (Synopsis, 2.5, 3.3, 3.4, 3.5, 3.6, 3.7).
[29] C. F. Gauss, Demonstratio Nova Theorematis Omnem Functionem Rationalem Integram Unius Variabilis in Factores Reales Primi vel Secundi Gradus Resolvi Posse (1799), Helmstadt, Werke, vol. 3, pp. 1-30, (5.1).
[30] C. F. Gauss, Demonstratio Nova Altera Theorematis Omnem Functionem Rationalem Integram Unius Variabilis in Factores Reales Primi vel Secundi Gradus Resolvi Posse, Comm. soc. reg. sci. Gottingensis (1815), Werke, vol. 3, 31-56, (5.1).
[31] K. Hensel and G. Landsberg, Theorie der algebraischen Funktionen einer Variabeln, Leipzig, 1902, Chelsea Reprint, 1965, (4.4).
[32] O. Hölder, Über den Casus Irreducibilis, Math. Annalen, 38 (1891), 307-312, (1.7).
[33] A. Hurwitz, Über die Theorie der Ideale, Nachr. kön. Ges. Wiss. Göttingen (1894), 291-298, Werke, vol. 2, 191-197, (2.5).
[34] A. Hurwitz, Über einen Fundamentalsatz der arithmetischen Theorie der algebraischen Größen, Nach. kön. Ges. Wiss. Göttingen (1895), 230-240, Werke, vol. 2, 198-207, (2.5).
[35] A. N. Kolmogorov and A. P. Yuskevich (ed.), Mathematics in the 19th Century (Russian), Nauk, Moscow, 1978, English translation by A. Shenitzer, Birkhäuser, 1992, (1.2).
[36] A. Kneser, Über die Gattung niedrigster Ordnung . . . , Math. Annalen, 30 (1887), 179-202, (1.7).
[37] L. Kronecker, Über die verschiedenen Sturm'schen Reihen und ihre gegenseitigen Beziehungen, Monatsber. Akad. Wiss. Berlin (1873), 117-154, Werke, I, 303-348, (2.4).
[38] L. Kronecker, Über die Discriminante algebraischer Functionen einer Variablen, Jour. für Math., 91 (1881), 301-334, Werke, II, 193-236, (4.3).
[39] L. Kronecker, Grundzüge einer arithmetischen Theorie der algebraischen Größen, Jour. für Math. 92 (1882), 1-122, Werke, II, 237-388, (1.1, 1.4, 1.5, 1.7, 2.2, 2.4, 4.5).
[40] L. Kronecker, Die Zerlegung der ganzen Grössen eines natürlichen Rationalitäts-bereichs in ihre irreductibeln Factoren, Jour. für Math. 94 (1883), 344-348, Werke, II, 409-416, (1.4).
[41] L. Kronecker, Zur Theorie der Formen höherer Stufen, Monatsber. Akad. Wiss. Berlin (1883), 957-960, Werke, II, 419-424, (2.5).
[42] L. Kronecker, Ein Fundamentalsatz der allgemeinen Arithmetik, Jour. für Math. 100 (1887), 490-510, Werke, IIIa, 209-240, (1.1, 1.2, 1.8).
[43] L. Kronecker, Über den Zahlbegriff, Jour. für Math. 101 (1887), 260-272, Werke, IIIa, 249-274, (1.1, 1.7).
[44] L. Kronecker, Vorlesungen über Zahlentheorie, Teubner, Leipzig, 1901, Reprint, Springer, New York, 1978, (3.2, 5.5).
[45] L. Kronecker, Über den Begriff der Zahl in der Mathematik (Sur le concept de nombre en mathématiques), Retranscribed and annotated by J. Boniface and N. Schappacher, Revue d'histoire des mathématiques, 7 (2001), 207-275, (1.1, 5.5).
[46] E. E. Kummer, Zur Theorie der complexen Zahlen, Jour. für Math. 35 (1847), 319-326, Collected Papers, 1, 203-210, (3.6).
[47] E. E. Kummer, Über die allgemeinen Reciprocitätsgesetze unter den Resten und den Nichtresten, Math. Abh. Kön. Akad. Wiss. Berlin, 1859, Collected Papers, 1, 699-839, (3.5, 3.6).
[48] J. L. Lagrange, Additions to Euler's Algebra, republished in vol. 7 of Lagrange's Oeuvres and vol. 1 (1) of Euler's Opera, (3.3).
[49] H. W. Lenstra, Jr., Solving the Pell Equation, AMS Notices, 49 (2002), pp. 182-192, (3.1).
[50] A. Loewy, Algebraische Gleichungen mit reellen Wurzeln, Math. Zeitschrift 11 (1921), 108-114, (1.7).
[51] I. Newton, The Mathematical Papers of Isaac Newton (D. T. Whiteside, ed.), Cambridge Univ. Press, 1969, (4.4).
[52] O. Ore, Niels Henrik Abel, Mathematician Extraordinary, Univ. of Minnesota, Minneapolis, 1957, Chelsea reprint, 1974, (4.1).
[53] H. Poincaré, L'Oeuvre Mathématique de Weierstrass, Acta Mathematica 22 (1899), 1-18, (Preface).
[54] H. Poincaré, Wissenschaft und Hypothese (F. Lindemann, ed.), Teubner, Leipzig, 1904, (5.5).
[55] Princeton University Bicentennial Conferences, Series 2, Conference 2, Problems of Mathematics, Reprinted in A Century of Mathematics in America, P. Duren et al., eds., AMS, Providence, 1989, (1.4).
[56] W. Purkert and H. J. Ilgauds, Georg Cantor, Birkhäuser, Basel, 1987, (5.5).
[57] B. Riemann, Grundlagen für eine allgemeine Theorie der Functionen einer veränderlichen complexen Grösse, Riemann's gesammelte mathematische Werke, 1892, Dover reprint, 1953, (4.3).
[58] G. Roch, Ueber die Anzahl der willkürlichen Constanten in algebraischen Functionen, Jour. f. Math. (1864), 372-376, (4.7).
[59] W. Scharlau, Unveröffentlichte algebraische Arbeiten Richard Dedekinds aus seiner Göttinger Zeit 1855-1858, Arch. Hist. Exact Sci. 27 (1982), 335-367, (1.7).
[60] H. J. S. Smith, Report on the Theory of Numbers, Reports of the British Association for the Advancement of Science, 1859-1865, (3.6).
[61] R. J. Walker, Algebraic Curves, Princeton Univ. Press, Princeton, 1950, Springer-Verlag reprint, 1978, (4.4).
[62] A. Weil, Number-theory and algebraic geometry, Proceedings of the International Mathematics Congress, vol. II, 1950, pp. 90-100, Collected Papers, vol. 2, 442-452, (Preface, 2.2).
[63] A. Weil, Number Theory, Birkhäuser, Boston, 1984, (3.6).
Index
Abel, Niels Henrik (1802-1829), xvii, xviii, 119, 120, 122-124, 127, 128, 142
addition formula, see Euler's addition formula
adjunction relations, xv, 51
adjunctions, xiii, 42
algebraic field, 47
algebraic integer, 63, 129
algebraic number, 63
algebraic number field, 48
algebraic quantities, xv, 1, 46, 47
algebraic variation, xvii, 121-124, 126, 127, 155
ambiguity of truncated solutions, 136, 138
ambiguous module classes, 96-98, 100
Archimedes, 65, 67, 68
Aristotle, ix, 179
Axler, Sheldon, 190
Bell, Eric Temple (1883-1960), 201-204
Bhascara Acharya, 67
binary quadratic forms, 108, 112
Bishop, Errett (1928-1983), ix, 190
Brahmagupta, 66-68, 109
Brouwer, L.E.J. (1881-1966), ix, 190
Buchstabenrechnung, 2, 4, 68
Cajori, Florian (1859-1930), 203
canonical form, 74, 191
Cantor, Georg (1845-1918), ix, x, 201, 202
Chebotarev, Nikolai G. (1894-1947), xv, 132
Chinese remainder theorem, 73, 93, 104
class group, 99
class semigroup, xvi, 79, 98
Comparison Algorithm, 80
completed infinites, ix, x
complex numbers, 122, 128, 130, 179, 194
composition of forms, 102, 108, 109, 112
congruence relations, xvi, 71, 73
consistency requirement, 164-166
constructive mathematics, x, xi, 6, 133, 179-180, 186
content of a form, 113
continued fractions algorithm, 65
Dedekind, Richard (1831-1916), ix, x, 31, 62, 108, 110, 111, 129, 142, 164
differential, 121, 157
Dirichlet, G. Lejeune (1805-1859), ix, xvii, 80, 108, 110
Disquisitiones Arithmeticae, xv, 79, 96, 102, 103, 108, 109, 127
double adjunction, xiv, 36
elementary divisors, 194
elementary symmetric polynomials, 57
elimination, 10
elliptic curves, xviii
equivalence problem, 79, 88
equivalent modules, xvi, 79
Euclid's Elements, xi, 6, 16, 66
Euclidean algorithm, 16, 71, 73
Euler, Leonhard (1707-1783), xviii, 91, 102, 124, 127
Euler's addition formula, 124-127
field of constants, 131, 150, 152, 157
field of quotients, 4
folium of Descartes, 137, 151
full representations of modules, 115
Fundamental Theorem of Algebra, xi, xiv, xix, 179, 185
Galois, Evariste (1811-1832), xiii, xv, 6, 7, 39, 42, 56, 120
Galois field, 41, 49-51
Galois group, 42
Galois polynomial, 39, 41
Gauss, Carl Friedrich (1777-1855), ix, xv-xvii, 3, 6, 7, 79, 91, 102, 108-110, 126, 183
Gauss's lemma, xiii, 18, 62
general arithmetic, xi, xv, xvii, 4, 63
genus, xvii, xviii, 122, 123, 126-128, 131, 150, 152, 153, 157-159, 171
genus of an algebraic curve, xvii
Gröbner bases, 5
Hensel, Kurt (1861-1941), 132
Hermite, Charles (1822-1901), 108
Hilbert, David (1862-1943), ix
holomorphic differentials, xviii, xix, 124, 127, 155-158, 171-174, 176-178
Hurwitz, Adolph (1859-1919), 62
hyperelliptic curve, 154, 163
hypernumber, xv-xvi, 69
implicit differentiation, 155, 157, 161
infinite descent, 148, 149, 187, 188
infinity, x, xi
integers, 3
integral basis, 142, 147
integral domain, 4
integral over x, 128
irreducible polynomial, 15
Jacobi, Carl J.G. (1804-1851), 119
/^o, 157
Klein curve, 162, 178
Kronecker, Leopold (1823-1891), ix-xi, xiv, xv, xix, 1-8, 13, 31-33, 35, 46-47, 56, 62-63, 71, 108, 129, 142, 147, 190, 196, 201-204
Kronecker-Kneser Theorem, 31, 44
Kummer, Ernst (1810-1893), 102, 108, 110
Lagrange, Joseph-Louis (1736-1813), 92
Landsberg, Georg (1865-1912), 132
Legendre, Adrien-Marie (1752-1833), 102
Lindemann, Ferdinand (1852-1939), 203
linear algebra, xix, 190-193
minimal splitting polynomial, xiv, 39-40
module systems, 4-5
modules, xv-xvi, 71-78, 102
modules of hypernumbers, 73, 110
Moore-Penrose generalized inverse, 193, 200
multiplication of modules, xvi-xviii, 73, 77
Newton's polygon, xi, xvii, 132, 135, 139, 142, 152
Newton, Isaac (1642-1727), xvii, xviii, 132
normal basis, 142, 149
number, 2
order of a function at ∞, 129
Ore, Oystein (1899-1968), 120, 122
p-parts, 93
Pascal, Blaise (1623-1662), 1
Pell's equation, 86
pivotal modules, 97, 98
Plato, 66
Poincaré, Henri (1854-1912), x
primitive modules, 98-100
principal modules, xvi, 79
principal parts of a function, 164, 172
proof by contradiction, 186
Pythagoras, 65
quadratic character mod p, 102
quadratic forms, xv, 108
quadratic reciprocity, xv, xvi, 70, 93, 102, 106, 108
Reduction Algorithm, 83
Reid, Constance, 202
residues, 156, 159, 160, 162
Riemann surfaces, xvii, 121, 128
Riemann-Roch theorem, xviii, 164, 169
root fields, xiii, 5, 10
Rowe, David, 203
similar matrices, 193
simple algebraic extension, 10
Smith, H.J.S. (1826-1883), 108, 191
spectral theorem, xi, 195, 196, 199
splitting field, xiii, 6, 39, 40
stable modules, 80
successors, 81
Sylow theorems, xix, 186
6)(x"), 129
the Euclidean algorithm, 16, 71
theorem of the primitive element, 44, 48, 52, 171
trace, 143, 157
transcendence degree, 47
truncated solution, 134-139
Walker, Robert J. (1909-1992), 132, 140
Weber, Heinrich (1842-1913), 142, 164
Weierstrass, Karl (1815-1897), ix, x, 201, 202
Weierstrass normal form, 124, 127
Weil, André (1906-1998), ix, 46, 110