ON ITERATIVE METHODS FOR SOLVING NONLINEAR LEAST SQUARES PROBLEMS OVER CONVEX SETS
BY
ADI BEN-ISRAEL
ABSTRACT
Nonlinear least squares problems over convex sets in $R^n$ are treated here by iterative methods which extend the classical Newton, gradient and steepest descent methods and the methods studied recently by Pereyra and the author. Applications are given to nonlinear least squares problems under linear constraints, and to linear and nonlinear inequalities.

Introduction. Iterative methods for the solution of nonlinear least squares problems are extended here to problems over convex sets, and are applied in particular to linear and nonlinear inequalities and to nonlinear least squares problems with linear constraints.

Linearization plays a dual role in our approach: convex sets are linearized, in the sense that their proximity maps are considered as perpendicular projections on supporting hyperplanes, and nonlinear functions are linearized, i.e. replaced by their linear approximations. Similar approaches were successfully used by Cheney and Goldstein [5], Goldstein [7], Bellman and Kalaba [1], Rosen [18], [19], Poljak [17] and many others.

In the problems considered below, a certain closed convex set plays the role that the origin plays in the classical problems: belonging to that set replaces vanishing, and minimizing the distance from that set replaces least squares minimization. Our methods are natural extensions of the classical methods of Newton, gradient and steepest descent, and of the methods recently studied by Pereyra [16] and the author [2], [3].

The paper has 4 sections. Notations and preliminaries are collected in Section 0. In Section 1 the problems and methods of solution are introduced and their relations to well-known methods are shown. Convergence theorems are stated in Section 2. Selected applications are given in Section 3. Applications to mathematical programming and numerical experience will be given elsewhere.

Received March 21, 1967.
Part of the research underlying this report was undertaken for the Office of Naval Research, Contract Nonr-1228(10), Project NR047-021, and for the U.S. Army Research Office-Durham, Contract No. DA-31-124-ARO-D-322 at Northwestern University. Reproduction of this paper in whole or in part is permitted for any purpose of the United States Government.
[October
§0. Preliminaries and Notations

0.1 We denote by $R^n$ the real Euclidean n-space with
inner product: $(x, y) = \sum_{i=1}^{n} x_i y_i$,
norm: $\|x\| = (x, x)^{1/2}$,
distance between points x, y: $d(x, y) = \|x - y\|$,
distance between a point x and a closed set K: $d(x, K) = \inf\{d(x, y) : y \in K\}$,
the closed sphere with center x and radius r: $S(x, r) = \{y : d(x, y) \le r\}$.
The space of linear transformations from $R^n$ into $R^m$ is denoted by $L(R^n, R^m)$.

0.2 For an $m \times n$ real matrix A we denote by
$A^T$ the transpose,
$\|A\|$ the spectral norm, e.g. [10],
$R(A)$ the range space,
$N(A)$ the null space,
$A^+$ the generalized inverse, e.g. [15], [4].
0.3 A function $f: R^n \to R^m$ is differentiable at $x_0 \in R^n$ if there is an $f'(x_0) \in L(R^n, R^m)$ such that
$$\lim_{h \to 0} \frac{\|f(x_0 + h) - f(x_0) - f'(x_0)h\|}{\|h\|} = 0.$$
$f'(x_0)$ is the derivative of f at $x_0$, and is represented by the Jacobian matrix
$$f'(x_0) = \left(\frac{\partial f_i}{\partial x_j}(x_0)\right), \quad i = 1, \dots, m, \; j = 1, \dots, n.$$
The gradient $\nabla f(x_0)$ of a function $f: R^n \to R$ at a point $x_0 \in R^n$ is $f'(x_0)^T$. If $\mathcal{D}$ is an open set in $R^n$ and the mapping $x \mapsto f'(x)$ of $R^n$ into $L(R^n, R^m)$ is defined and continuous for all $x \in \mathcal{D}$, we say that f is continuously differentiable in $\mathcal{D}$ and write $f \in C'(\mathcal{D})$.

0.4 With a closed convex set K in $R^n$ we associate the proximity map $P_K: R^n \to R^n$ defined by $P_K(x) \in K$ and $d(x, P_K(x)) = d(x, K)$ for all $x \in R^n$. For properties, applications and further references on proximity maps see [6], [13] and [20]. We write $P_K^\perp(x)$ for $x - P_K(x)$, and recall the following properties:
(i) $P_K^\perp(x) = 0$ if, and only if, $x \in K$.
(ii) If K is a linear subspace in $R^n$, then $P_K$ is the perpendicular projection on K, and $P_K^\perp$ is the perpendicular projection on $K^\perp$, the orthogonal complement of K. In this case we write $P_K x$ for $P_K(x)$.
(iii) If K is a closed convex cone in $R^n$ (with vertex at the origin), then $P_K^\perp$ is the proximity map of the cone $K^\perp = \{y : x \in K \Rightarrow (y, x) \le 0\}$, e.g. [13].
(iv) If $x \notin K$, then $P_K(x)$ is a boundary point of K and $P_K^\perp(x)$ is the (outward pointing) normal to a supporting hyperplane of K at $P_K(x)$, e.g. [6], p. 448, Lemma.
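The proximity maps of two simple sets have closed forms, which makes properties (i), (iii) and (iv) easy to check on small examples. The following is a minimal pure-Python sketch; the helper names (`prox_nonpositive_orthant`, `prox_sphere`, `perp`, and so on) are hypothetical and not taken from the paper.

```python
import math

def prox_nonpositive_orthant(x):
    """P_K(x) for K = {x : x_i <= 0}: clip every coordinate at 0 from above."""
    return [min(xi, 0.0) for xi in x]

def prox_nonnegative_orthant(x):
    """Proximity map of the cone {y : (y, x) <= 0 for all x in K} when K is
    the nonpositive orthant; that cone is the nonnegative orthant."""
    return [max(xi, 0.0) for xi in x]

def prox_sphere(x, center, radius):
    """P_K(x) for the closed sphere K = S(center, radius)."""
    diff = [xi - ci for xi, ci in zip(x, center)]
    dist = math.sqrt(sum(di * di for di in diff))
    if dist <= radius:
        return list(x)  # x already lies in K
    return [ci + radius * di / dist for ci, di in zip(center, diff)]

def perp(prox, x):
    """P_K^perp(x) = x - P_K(x) for the proximity map `prox`."""
    return [xi - pi for xi, pi in zip(x, prox(x))]

# Property (i): P_K^perp vanishes on K.
print(perp(prox_nonpositive_orthant, [-1.0, 0.0, -2.0]))  # [0.0, 0.0, 0.0]
# Property (iii): P_K^perp coincides with the proximity map of the cone K^perp.
x = [3.0, -2.0, 0.5]
print(perp(prox_nonpositive_orthant, x), prox_nonnegative_orthant(x))
# Property (iv): outside S(0, 1), P_K^perp(x) points along the outward normal P_K(x).
print(perp(lambda z: prox_sphere(z, [0.0, 0.0], 1.0), [3.0, 4.0]))  # approx [2.4, 3.2]
```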
1967]
NONLINEAR LEAST SQUARES PROBLEMS
§1. Problems and Methods

1.1. Problems. Let $f: R^n \to R^m$, K a closed convex set in $R^m$, $\mathcal{D}$ an open subset of $R^n$, and let $d(f(x), K)$ be differentiable in $\mathcal{D}$.
We consider the following:
Problem 1: Find an $x \in \mathcal{D}$ for which
(1) $\nabla d^2(f(x), K) = 0$.
The significance of Problem 1 is that, if solvable, its solutions may include the solutions (provided they exist) of the following problems:
Problem 2: Find an $x \in R^n$ for which
(2) $d(f(x), K)$ is minimized.
Problem 3: Find an $x \in R^n$ such that
(3) $f(x) \in K$.
We note that Problem 3 includes:
Problem 4: Find an $x \in R^n$ which satisfies (3) and in addition
(4) $x \in L$,
where L is a closed convex set in $R^n$.
Indeed, Problem 4 is rewritten as: Find an $x \in R^n$ which satisfies
(5) $g(x) \in M$,
where $g: R^n \to R^m \times R^n$ is
(6) $g(x) = \begin{pmatrix} f(x) \\ x \end{pmatrix}$,
and
(7) $M = K \times L$, a convex set in $R^m \times R^n$.
1.2. Methods. For the solution of Problem 1 and the treatment of Problems 2 and 3 we propose the iterative methods
(8) $x_{\nu+1} = x_\nu - f'(x_\nu)^+ P_K^\perp(f(x_\nu)), \quad \nu = 0, 1, \dots$
and
(9) $x_{\nu+1} = x_\nu - T_\nu^{-1} f'(x_\nu)^T P_K^\perp(f(x_\nu)), \quad \nu = 0, 1, \dots$
where $x_0$ is an approximate solution, and $\{T_\nu\}$ is a suitably chosen sequence of nonsingular operators in $L(R^n, R^n)$. In treating Problem 3 it is always possible, and often desirable, to replace method (8) by method (9) with positive definite $T_\nu$. This is done by considering an equivalent (artificial) Problem 4 in which the convex set L is taken sufficiently large to include all points of interest, i.e., all relevant x satisfy
(10) $\alpha x \in L$,
where $\alpha$ is a fixed positive number, or equivalently
(11) $P_L^\perp(\alpha x) = 0$.
For this artificial problem method (8) becomes
(12) $x_{\nu+1} = x_\nu - g'(x_\nu)^+ P_M^\perp(g(x_\nu)), \quad \nu = 0, 1, \dots$
where
(13) $g(x) = \begin{pmatrix} f(x) \\ \alpha x \end{pmatrix}$,
(14) $g'(x) = \begin{pmatrix} f'(x) \\ \alpha I \end{pmatrix}$,
(15) $g'(x)^+ = (f'(x)^T f'(x) + \alpha^2 I)^{-1} (f'(x)^T, \alpha I)$,
(7) $M = K \times L$, and accordingly
(16) $P_M^\perp(g(x)) = \begin{pmatrix} P_K^\perp(f(x)) \\ P_L^\perp(\alpha x) \end{pmatrix}$.
Substituting (15), (16) and (11) in (12) we get
(17) $x_{\nu+1} = x_\nu - (f'(x_\nu)^T f'(x_\nu) + \alpha^2 I)^{-1} f'(x_\nu)^T P_K^\perp(f(x_\nu)), \quad \nu = 0, 1, \dots$
which is method (9) with
(18) $T_\nu = f'(x_\nu)^T f'(x_\nu) + \alpha^2 I, \quad \nu = 0, 1, \dots$
from which the (fictitious) set L has justly disappeared. Comparing (8) with (17), we see that the latter uses the inverse of the positive definite matrix (18), whereas the former uses the generalized inverse of the arbitrary matrix $f'(x_\nu)$. Another advantage of (17) is that it combines characteristics of both the Newton and gradient methods; see e.g. the discussion in Marquardt [11] for the special case K = {0}.
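Formula (15) and the passage from (12) to (17) can be checked numerically. The sketch below is a hypothetical check using NumPy, with a random matrix standing in for $f'(x)$ and a random vector standing in for $P_K^\perp(f(x))$; it compares $g'(x)^+$ computed by `numpy.linalg.pinv` with the right side of (15), and the step of (12) with that of (17).

```python
import numpy as np

def check_eq15_and_eq17(m=4, n=3, alpha=0.5, seed=0):
    """Return (does (15) match pinv of (14)?, does the (12) step match the (17) step?)."""
    rng = np.random.default_rng(seed)
    J = rng.standard_normal((m, n))      # stands in for f'(x)
    r = rng.standard_normal(m)           # stands in for P_K^perp(f(x))
    I = np.eye(n)
    g_prime = np.vstack([J, alpha * I])  # (14)
    formula_15 = np.linalg.solve(J.T @ J + alpha**2 * I, np.hstack([J.T, alpha * I]))
    same_inverse = np.allclose(np.linalg.pinv(g_prime), formula_15)
    # (16) together with (11): the L-block of P_M^perp(g(x)) is zero.
    step_12 = np.linalg.pinv(g_prime) @ np.concatenate([r, np.zeros(n)])
    step_17 = np.linalg.solve(J.T @ J + alpha**2 * I, J.T @ r)
    return same_inverse, np.allclose(step_12, step_17)

print(check_eq15_and_eq17())  # (True, True)
```

Because $\alpha > 0$, the stacked matrix $g'(x)$ always has full column rank, which is why the generalized inverse reduces to the ordinary inverse in (15), even when $m < n$.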
1.3. Relations with known methods. In the special case K = {0} we have
(19) $P_K^\perp(y) = y$ for all y.
For K = {0}, method (8) therefore reduces to the author's variant of Newton's method [2]
(20) $x_{\nu+1} = x_\nu - f'(x_\nu)^+ f(x_\nu), \quad \nu = 0, 1, \dots$
which for nonsingular $f'(x_\nu)$ is the Newton method [9], [1]
(21) $x_{\nu+1} = x_\nu - f'(x_\nu)^{-1} f(x_\nu), \quad \nu = 0, 1, \dots$
Method (9) reduces for K = {0} to Pereyra's method [16]
(22) $x_{\nu+1} = x_\nu - T_\nu^{-1} f'(x_\nu)^T f(x_\nu), \quad \nu = 0, 1, \dots$
which includes:
(i) the Gauss-Newton method, with
(23) $T_\nu = f'(x_\nu)^T f'(x_\nu), \quad \nu = 0, 1, \dots$
(ii) the modified Newton method, with
(24) $T_\nu = f'(x_\nu)^T f'(x_0), \quad \nu = 0, 1, \dots$
(iii) gradient methods, since
(25) $f'(x)^T f(x) = \tfrac{1}{2} \nabla (f(x), f(x))$,
and in particular
(iv) the steepest descent method, with scalar $T_\nu$.
Method (17) reduces for K = {0} to Marquardt's method [11]
(26) $x_{\nu+1} = x_\nu - (f'(x_\nu)^T f'(x_\nu) + \alpha^2 I)^{-1} f'(x_\nu)^T f(x_\nu), \quad \nu = 0, 1, \dots$
see also Morrison [14] and Meeter [12]. The least-squares method of [3]
(27) $x_{\nu+1} = x_\nu - (f'(x_\nu)^T f'(x_\nu) + \alpha^2 I)^{-1} (f'(x_\nu)^T f(x_\nu) + \alpha^2 (x_\nu - u)), \quad \nu = 0, 1, \dots$
is obtained for K = {0}, L = {0} from (12) by taking
(28) $g(x) = \begin{pmatrix} f(x) \\ \alpha (x - u) \end{pmatrix}$.
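The special cases (22)-(26) differ only in the choice of $T_\nu$, so they can share one loop. The following is a sketch assuming NumPy and a hypothetical 2 x 2 test system with K = {0}; the functions `F`, `Jac`, `method_22`, `gauss_newton` and `marquardt` are illustrative names, not from the paper. The last lines check numerically that the choice (24) reproduces the modified Newton step $f'(x_0)^{-1} f(x_\nu)$ when the Jacobians are square and nonsingular.

```python
import numpy as np

def F(x):
    # Hypothetical test system: the circle of radius 2 meets the line x1 = x2.
    return np.array([x[0]**2 + x[1]**2 - 4.0, x[0] - x[1]])

def Jac(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])

def method_22(x0, T_of, iters=50):
    """Iterate (22): x <- x - T^{-1} f'(x)^T f(x), with T = T_of(f'(x))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        J = Jac(x)
        x = x - np.linalg.solve(T_of(J), J.T @ F(x))
    return x

def gauss_newton(J):
    return J.T @ J                                   # (23)

def marquardt(J, alpha=0.1):
    return J.T @ J + alpha**2 * np.eye(J.shape[1])   # (26)

print(method_22([1.0, 2.0], gauss_newton))  # close to [sqrt(2), sqrt(2)]
print(method_22([1.0, 2.0], marquardt))     # close to [sqrt(2), sqrt(2)]

# (24): with T = f'(x_nu)^T f'(x_0), the step of (22) equals f'(x_0)^{-1} f(x_nu).
J0 = np.array([[2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [1.0, 0.0, 1.0]])
Jv = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 2.0]])
fv = np.array([1.0, -1.0, 2.0])
print(np.allclose(np.linalg.solve(Jv.T @ J0, Jv.T @ fv), np.linalg.solve(J0, fv)))  # True
```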
§2. Convergence Theorems

2.1 Method (8). Sufficient conditions for the convergence of method (8), analogous to those of [2] in the special case K = {0}, are given in:

THEOREM 1. Assumptions: $f: R^n \to R^m$, $x_0$ a point in $R^n$, K a closed convex set in $R^m$, and r, M, N positive constants such that:
(29) $f \in C'(\mathcal{D})$, where $\mathcal{D}$ is an open set containing $S(x_0, r)$.
For all $u, v \in S(x_0, r)$ with $u - v \in R(f'(v)^T)$:
(30) $\|f'(v)(u - v) - P_K^\perp(f(u)) + P_K^\perp(f(v))\| \le M\|u - v\|$
and
(31) $\|(f'(v)^+ - f'(u)^+) P_K^\perp(f(u))\| \le N\|u - v\|$.
For all $x \in S(x_0, r)$:
(32) $M\|f'(x)^+\| + N \le k < 1$.
(33) $\|f'(x_0)^+ P_K^\perp(f(x_0))\| < (1 - k) r$.
Conclusions: The sequence (8) converges to a solution $x^*$ of (1) which lies in $S(x_0, r)$ and is unique in $S(x_0, r) \cap \{x^* + R(f'(x^*)^T)\}$.
Proof. The proof follows closely that of Theorems 1, 2 of [2]. We prove first that
(34) $x_\nu \in S(x_0, r), \quad \nu = 1, 2, \dots$
For $\nu = 1, 2, \dots$ we write
(35) $x_{\nu+1} - x_\nu = -f'(x_\nu)^+ P_K^\perp(f(x_\nu))$
$= x_\nu - x_{\nu-1} - f'(x_\nu)^+ P_K^\perp(f(x_\nu)) + f'(x_{\nu-1})^+ P_K^\perp(f(x_{\nu-1}))$
$= f'(x_{\nu-1})^+ [f'(x_{\nu-1})(x_\nu - x_{\nu-1}) - P_K^\perp(f(x_\nu)) + P_K^\perp(f(x_{\nu-1}))] + [f'(x_{\nu-1})^+ - f'(x_\nu)^+] P_K^\perp(f(x_\nu)),$
where we used
(36) $x_\nu - x_{\nu-1} = f'(x_{\nu-1})^+ f'(x_{\nu-1})(x_\nu - x_{\nu-1})$,
which follows from
(37) $x_\nu - x_{\nu-1} \in R(f'(x_{\nu-1})^+) = R(f'(x_{\nu-1})^T)$,
e.g. [4].
From (35), (30), (31) and (32) we get
(38) $\|x_{\nu+1} - x_\nu\| \le (M\|f'(x_{\nu-1})^+\| + N)\|x_\nu - x_{\nu-1}\| \le k\|x_\nu - x_{\nu-1}\| < \|x_\nu - x_{\nu-1}\|$.
Now, by (33) we get
(39) $\|x_1 - x_0\| < (1 - k) r$,
so that (34) holds for $\nu = 1$, and from (38)
(40) $\|x_{\nu+1} - x_0\| \le \sum_{j=0}^{\nu} \|x_{j+1} - x_j\| \le \sum_{j=0}^{\nu} k^j \|x_1 - x_0\| < \frac{1}{1 - k}\|x_1 - x_0\|$,
which by (39) proves (34). The convergence of (8) to a point $x^* \in S(x_0, r)$ follows now from (38). Using (8) we note that the limit $x^*$ must satisfy
(41) $f'(x^*)^+ P_K^\perp(f(x^*)) = 0$,
and from
(42) $N(A^+) = N(A^T)$,
e.g. [4], we get
(43) $f'(x^*)^T P_K^\perp(f(x^*)) = 0$.
Let S be a hyperplane whose translate $P_K(f(x^*)) + S$ supports K at $P_K(f(x^*))$, where its normal is $P_K^\perp(f(x^*))$. Since $P_K^\perp(f(x^*))$ and $P_S^\perp f(x^*)$ are collinear,
(44) $P_K^\perp(f(x^*)) = t\, P_S^\perp f(x^*)$ for some real t,
and since $P_S^\perp$ is idempotent and symmetric, we rewrite (43) as
(45) $f'(x^*)^T P_S^\perp f(x^*) = f'(x^*)^T P_S^\perp P_S^\perp f(x^*) = [(P_S^\perp f(x^*))']^T (P_S^\perp f(x^*)) = 0$.
As in (25) we note that
(46) $[(P_S^\perp f(x))']^T (P_S^\perp f(x)) = \tfrac{1}{2} \nabla (P_S^\perp f(x), P_S^\perp f(x))$.
Combining now (43), (44), (45) and (46), we verify that the limit $x^*$ of (8) is a solution of (1). The claimed uniqueness of $x^*$ holds because for any other limit $x^{**}$ of (8) in $S(x_0, r) \cap \{x^* + R(f'(x^*)^T)\}$ we have
(47) $\|x^{**} - x^*\| = \|x^{**} - x^* + f'(x^*)^+ P_K^\perp(f(x^*))\|$
$\le \|f'(x^*)^+ f'(x^*)(x^{**} - x^*) - f'(x^*)^+ P_K^\perp(f(x^{**})) + f'(x^*)^+ P_K^\perp(f(x^*))\| + \|(f'(x^*)^+ - f'(x^{**})^+) P_K^\perp(f(x^{**}))\|$
$\le (M\|f'(x^*)^+\| + N)\|x^{**} - x^*\|$, by (30), (31),
$< \|x^{**} - x^*\|$, by (32),
a contradiction.
Q.E.D.
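A minimal sketch of method (8) for an instance of Problem 3 in which K is a closed sphere, assuming NumPy. The map f, the sphere and the starting point are hypothetical, and `numpy.linalg.pinv` supplies $f'(x)^+$. The sketch does not check the hypotheses (29)-(33) of Theorem 1; it only runs the iteration.

```python
import numpy as np

def prox_sphere(y, center, radius):
    """P_K(y) for the closed sphere K = S(center, radius) in R^m."""
    d = y - center
    dist = np.linalg.norm(d)
    return y.copy() if dist <= radius else center + radius * d / dist

def method_8(f, jac, prox, x0, iters=50, tol=1e-12):
    """Iterate (8): x <- x - f'(x)^+ P_K^perp(f(x)); stop once f(x) is (numerically) in K."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        y = f(x)
        residual = y - prox(y)                 # P_K^perp(f(x))
        if np.linalg.norm(residual) <= tol:
            break
        x = x - np.linalg.pinv(jac(x)) @ residual
    return x

# Hypothetical instance: f(x) = (x1^2, x2), K = S((2, 0), 0.5).
def f(x):
    return np.array([x[0]**2, x[1]])

def jac(x):
    return np.array([[2.0 * x[0], 0.0], [0.0, 1.0]])

c = np.array([2.0, 0.0])
x_star = method_8(f, jac, lambda y: prox_sphere(y, c, 0.5), [1.0, 1.0])
print(x_star, np.linalg.norm(f(x_star) - c))  # distance at most 0.5, i.e. f(x_star) in K
```

From this starting point the iteration lands in $f^{-1}(K)$ after two steps, after which $P_K^\perp(f(x)) = 0$ by property (i) and the loop stops.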
2.2 Method (9). We use the notations
(48) $\varphi(x) = f'(x)^T f(x)$,
(49) $\psi(x) = f'(x)^T P_K^\perp(f(x))$,
in adapting Pereyra's Theorem 2.1 of [16] to give sufficient conditions for the convergence of (9).

THEOREM 2. Assumptions: $f: R^n \to R^m$, $x_0$ a point in $R^n$, K a closed convex set in $R^m$, $\{T_\nu : \nu = 0, 1, \dots\}$ a sequence of nonsingular operators in $L(R^n, R^n)$, and $\rho, \lambda, \varepsilon, \beta, \eta, k, r$ positive numbers such that
(50) f is twice differentiable in an open subset of $R^n$ which contains $S(x_0, \rho)$,
(51) $\|T_\nu^{-1}\| \le \lambda, \quad \nu = 0, 1, \dots$
(52) $\|T_\nu - \psi'(x_0)\| \le \eta, \quad \nu = 0, 1, \dots$
(53) $\|\psi(u) - \psi(v) - \psi'(x_0)(u - v)\| \le \beta\|u - v\|$ for all $u, v \in S(x_0, r)$,
(54) $\|\psi(x_0)\| \le \varepsilon$,
(55) $k = \lambda(\beta + \eta) < 1$,
(56) $r = \dfrac{\lambda\varepsilon}{1 - k} < \rho$.
Conclusions: The sequence (9) converges to a solution $x^*$ of (1) which lies in $S(x_0, r)$ and is unique in $S(x_0, r)$.
Proof. The method (9) is rewritten as
(57) $x_{\nu+1} = x_\nu - T_\nu^{-1} \psi(x_\nu), \quad \nu = 0, 1, \dots$
From (51), (54), (55) and (56) we see that
(58) $\|x_1 - x_0\| \le \lambda\|\psi(x_0)\| \le \lambda\varepsilon = (1 - k) r$,
and using (57), (53) and (52) we verify that
(59) $\|\psi(x_1)\| = \|\psi(x_1) - \psi(x_0) - T_0(x_1 - x_0)\| \le \|\psi(x_1) - \psi(x_0) - \psi'(x_0)(x_1 - x_0)\| + \|(\psi'(x_0) - T_0)(x_1 - x_0)\| \le (\beta + \eta)\|x_1 - x_0\|$.
We prove now, by induction, that the relations
(60.ν) $\|x_\nu - x_0\| \le r$,
(61.ν) $\|x_\nu - x_{\nu-1}\| \le \lambda\|\psi(x_{\nu-1})\|$,
(62.ν) $\|\psi(x_\nu)\| \le (\beta + \eta)\|x_\nu - x_{\nu-1}\|$
hold for all $\nu = 1, 2, \dots$. Suppose indeed that (60.ν), (61.ν) and (62.ν) hold for $\nu = 1, 2, \dots, p$; we will show them to hold for $\nu = p + 1$:
(61.p+1) $\|x_{p+1} - x_p\| \le \lambda\|\psi(x_p)\|$, by (51).
Using (61.p+1) and (62.ν), $\nu = 1, \dots, p$, we get
(63) $\|x_{p+1} - x_p\| \le k^p \|x_1 - x_0\|$,
so that
(60.p+1) $\|x_{p+1} - x_0\| \le \sum_{\nu=0}^{p} \|x_{\nu+1} - x_\nu\| \le \sum_{\nu=0}^{p} k^\nu \|x_1 - x_0\| < \dfrac{\lambda\varepsilon}{1 - k} = r$.
And finally, using (57), (53) and (52):
(62.p+1) $\|\psi(x_{p+1})\| = \|\psi(x_{p+1}) - \psi(x_p) - T_p(x_{p+1} - x_p)\|$
$\le \|\psi(x_{p+1}) - \psi(x_p) - \psi'(x_0)(x_{p+1} - x_p)\| + \|(\psi'(x_0) - T_p)(x_{p+1} - x_p)\|$
$\le (\beta + \eta)\|x_{p+1} - x_p\|$.
The convergence of (57) to a point $x^*$ in $S(x_0, r)$ is now guaranteed. At the point $x^*$ we have
(64) $\psi(x^*) = f'(x^*)^T P_K^\perp(f(x^*)) = 0$,
and reasoning as in the proof of Theorem 1 we verify that $x^*$ is a solution of (1). The uniqueness of $x^*$ follows from the fact that for any other solution $x^{**}$ of $\psi(x) = 0$ in $S(x_0, r)$ we have
(65) $\|x^{**} - x^*\| = \|T_0^{-1} T_0 (x^{**} - x^*)\|$
$\le \lambda\|\psi(x^{**}) - \psi(x^*) - \psi'(x_0)(x^{**} - x^*)\| + \lambda\|(T_0 - \psi'(x_0))(x^{**} - x^*)\|$, by (51),
$\le \lambda(\beta + \eta)\|x^{**} - x^*\|$, by (52) and (53),
$< \|x^{**} - x^*\|$, by (55),
a contradiction.
Q.E.D.
2.3 Error bounds. If the conditions of Theorem 1 are satisfied, then an error bound for (8) is given by
(66) $\|x_\nu - x^*\| \le k^\nu r$,
where k is given by (32). Indeed, from (38) and (39) we get
(67) $\|x_{\nu+p} - x_\nu\| \le \sum_{j=\nu}^{\nu+p-1} \|x_{j+1} - x_j\| \le \sum_{j=\nu}^{\nu+p-1} k^j \|x_1 - x_0\| < \dfrac{k^\nu}{1 - k}\|x_1 - x_0\| < k^\nu r$,
and (66) follows by letting $p \to \infty$.
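The bound (66) can be turned around to give the number of iterations that guarantees a prescribed accuracy. A small pure-Python sketch follows; the function names and the values of k, r and the tolerance are hypothetical.

```python
def error_bound(k, r, nu):
    """Right side of (66): ||x_nu - x*|| <= k**nu * r, meaningful when 0 < k < 1."""
    if not 0.0 < k < 1.0:
        raise ValueError("Theorems 1 and 2 require 0 < k < 1")
    return k ** nu * r

def iterations_for_tolerance(k, r, tol):
    """Smallest nu >= 0 with k**nu * r <= tol, found by direct search."""
    if tol <= 0.0:
        raise ValueError("tol must be positive")
    nu = 0
    while error_bound(k, r, nu) > tol:
        nu += 1
    return nu

print(error_bound(0.5, 2.0, 3))                  # 0.25
print(iterations_for_tolerance(0.5, 1.0, 1e-3))  # 10
```

A direct search is used instead of $\lceil \log(\mathrm{tol}/r) / \log k \rceil$ to avoid off-by-one results from floating-point rounding of the logarithms.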
Similarly, (66) is an error bound for the method (9) with k given by (55), provided the conditions of Theorem 2 are satisfied.

§3. Applications

3.1 Proximity maps: special cases. The promise of methods (8) and (9) lies in problems where the proximity maps are readily available. Two such cases are when the convex set considered is (i) a linear manifold, or (ii) an orthant in $R^n$. The proximity maps in these cases are given below.
(i) Linear manifold. If L is a linear manifold in $R^n$, it can be represented as
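(68) $L = \{x : Ax = b\}$
for some m, $A \in L(R^n, R^m)$, $b \in R^m$. Equivalently we write
(69) $L = A^+ b + N(A)$,
and the proximity map is
(70) $P_L(x) = A^+ b + P_{N(A)} x$, for all $x \in R^n$,
from which
(71) $P_L^\perp(x) = P_{R(A^T)} x - A^+ b$.
(ii) Orthant. If K is the nonpositive orthant $R^n_-$ of $R^n$, i.e.
(72) $K = \{x : x \le 0$, i.e. $x_i \le 0$ for $i = 1, \dots, n\}$,
then for any $x \in R^n$, written after a rearrangement as
(73) $x = \begin{pmatrix} x_+ \\ x_- \end{pmatrix}$,
where the subvector $x_+$ is positive and the subvector $x_-$ is nonpositive, we have
(74) $P_K(x) = \begin{pmatrix} 0 \\ x_- \end{pmatrix}$,
(75) $P_K^\perp(x) = \begin{pmatrix} x_+ \\ 0 \end{pmatrix}$.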
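Formulas (70), (71), (74) and (75) translate directly into code. The sketch below assumes NumPy, with $A^+$ computed by `numpy.linalg.pinv` and $P_{R(A^T)} = A^+ A$; the function names and the manifold $x_1 + x_2 = 2$ are hypothetical.

```python
import numpy as np

def prox_linear_manifold(A, b, x):
    """(70): P_L(x) = A^+ b + P_N(A) x, for L = {x : Ax = b} assumed nonempty."""
    A_pinv = np.linalg.pinv(A)
    P_null = np.eye(A.shape[1]) - A_pinv @ A   # perpendicular projection on N(A)
    return A_pinv @ b + P_null @ x

def prox_perp_linear_manifold(A, b, x):
    """(71): P_L^perp(x) = P_R(A^T) x - A^+ b, using P_R(A^T) = A^+ A."""
    A_pinv = np.linalg.pinv(A)
    return A_pinv @ (A @ x) - A_pinv @ b

def prox_perp_nonpositive_orthant(x):
    """(75): keep the positive components x_+ and zero the rest, with no explicit rearrangement."""
    return np.maximum(x, 0.0)

A = np.array([[1.0, 1.0]])
b = np.array([2.0])
x = np.array([3.0, 0.0])
print(prox_linear_manifold(A, b, x))       # approx [2.5, -0.5], foot of the perpendicular on x1 + x2 = 2
print(prox_perp_linear_manifold(A, b, x))  # approx [0.5, 0.5]
```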
3.2 Nonlinear least-squares problems with linear constraints. Let $f: R^n \to R^m$, $A \in L(R^n, R^m)$, $b \in R^m$. Consider:
Problem 5: Find an $x \in R^n$ which minimizes
(76) $(f(x), f(x)) = \sum_{i=1}^{m} f_i^2(x)$
subject to
(77) $Ax = b$.
We will instead consider Problem 4 of §1, rewritten as:
Problem 6: Find an $x \in R^n$ for which
(78) $g(x) = \begin{pmatrix} f(x) \\ x \end{pmatrix} \in M = K \times L$,
where
(79) $K = \{0\}$
and L is given by
(68) $L = \{x : Ax = b\}$.
This problem is stricter than Problem 5 because, in addition to (77), we require here that
(80) $f(x) = 0$.
Applying method (12) to Problem 6 we get, using (13)-(16) with $\alpha = 1$ together with (19) and (71):
(81) $x_{\nu+1} = x_\nu - (f'(x_\nu)^T f'(x_\nu) + I)^{-1} [f'(x_\nu)^T f(x_\nu) + P_{R(A^T)} x_\nu - A^+ b], \quad \nu = 0, 1, \dots$
Indeed, if (81) converges to a point $x^*$, then at $x^*$
(82) $-f'(x^*)^T f(x^*) = P_{R(A^T)} x^* - A^+ b$,
and by (25) and (71) we conclude that the gradient $\nabla (f(x^*), f(x^*))$ is perpendicular to the linear manifold L, which, if $x^* \in L$, is the classical necessary condition for the minimization of (76) subject to (77). Conversely, we conclude from (82) that if the limit $x^*$ is a stationary point for (76), then, from the vanishing of the right side of (82), $x^*$ satisfies (77). Finally, in order to keep the sequence (81) "closer" to L, we can replace a subsequence of (81) by
(83) $x_{\nu_k + 1} = P_L(x_{\nu_k}) = A^+ b + P_{N(A)} x_{\nu_k}, \quad k = 1, 2, \dots$
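A sketch of iteration (81), assuming NumPy; the function `method_81` and the test instance ($f(x) = x_1^2 + x_2^2 - 2$ on $L = \{x : x_1 = x_2\}$, whose solutions in Problem 6 are $\pm(1, 1)$) are hypothetical.

```python
import numpy as np

def method_81(f, jac, A, b, x0, iters=100):
    """Iterate (81): x <- x - (J^T J + I)^{-1} [J^T f(x) + P_R(A^T) x - A^+ b]."""
    A_pinv = np.linalg.pinv(A)
    P_row = A_pinv @ A                      # P_R(A^T)
    x = np.asarray(x0, dtype=float)
    n = x.size
    for _ in range(iters):
        J = jac(x)
        rhs = J.T @ f(x) + P_row @ x - A_pinv @ b
        x = x - np.linalg.solve(J.T @ J + np.eye(n), rhs)
    return x

def f(x):
    return np.array([x[0]**2 + x[1]**2 - 2.0])

def jac(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]]])

A = np.array([[1.0, -1.0]])
b = np.array([0.0])
print(method_81(f, jac, A, b, [1.5, 0.8]))  # approaches [1, 1]
```

As the discussion of (82) suggests, a limit of (81) need not solve Problem 6: in this instance the origin is also a fixed point (there $f'(x) = 0$ and $P_{R(A^T)} x = 0$), so the starting point matters.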
3.3 Linear inequalities. Let $C \in L(R^n, R^k)$, $d \in R^k$ and consider
Problem 7: Find an $x \in R^n$ such that
(84) $Cx - d \le 0$.
This is Problem 3 with
(85) $f(x) = Cx - d$
and K the nonpositive orthant in $R^k$, given as in (72). In applying our methods to this problem we encounter terms like
(86) $B P_K^\perp(y)$,
where $B \in L(R^k, R^p)$, $y \in R^k$. Rearranging y as in (73), and likewise the columns of B,
(87) $B = (B_+, B_-)$,
we get from (75)
(88) $B P_K^\perp(y) = (B_+, B_-) P_K^\perp\begin{pmatrix} y_+ \\ y_- \end{pmatrix} = B_+ y_+$.
Method (8) now becomes
(89) $x_{\nu+1} = x_\nu - (C^+)_+ (Cx_\nu - d)_+, \quad \nu = 0, 1, \dots$
where the subvector $(Cx_\nu - d)_+$ consists of the positive components of $Cx_\nu - d$, and the matrix $(C^+)_+$ consists of the corresponding columns of $C^+$. Similarly, method (9) gives
(90) $x_{\nu+1} = x_\nu - T_\nu^{-1} (C^T)_+ (Cx_\nu - d)_+, \quad \nu = 0, 1, \dots$
and, in particular, from (17)
(91) $x_{\nu+1} = x_\nu - (C^T C + \alpha^2 I)^{-1} (C^T)_+ (Cx_\nu - d)_+, \quad \nu = 0, 1, \dots$
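By (88), $(C^T)_+ (Cx - d)_+$ is simply $C^T$ restricted to the currently violated rows, so (91) needs no explicit rearrangement. The following is a sketch assuming NumPy; `B_Pperp`, `method_91` and the test system are hypothetical.

```python
import numpy as np

def B_Pperp(B, y):
    """(88): B P_K^perp(y) = B_+ y_+ for K the nonpositive orthant."""
    pos = y > 0
    return B[:, pos] @ y[pos]

def method_91(C, d, x0, alpha=1.0, iters=100):
    """Iterate (91): x <- x - (C^T C + alpha^2 I)^{-1} (C^T)_+ (Cx - d)_+."""
    C = np.asarray(C, dtype=float)
    d = np.asarray(d, dtype=float)
    x = np.asarray(x0, dtype=float)
    T = C.T @ C + alpha**2 * np.eye(C.shape[1])   # fixed matrix, formed once
    for _ in range(iters):
        r = C @ x - d
        if np.all(r <= 0):
            break                                 # (84) holds
        x = x - np.linalg.solve(T, B_Pperp(C.T, r))
    return x

# Hypothetical system: x1 + x2 <= 1, x1 >= 0, x2 >= 0.
C = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
d = np.array([1.0, 0.0, 0.0])
x = method_91(C, d, [2.0, 2.0])
print(x, C @ x - d)
```

Started at (2, 2), the violation $x_1 + x_2 - 1$ of this hypothetical system is halved at every step, so (84) is attained only in the limit (up to rounding).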
3.4 Linear equations and inequalities. Let $A \in L(R^n, R^m)$, $b \in R^m$, $C \in L(R^n, R^k)$, $d \in R^k$, and consider:
Problem 8: Find an $x \in R^n$ such that
(77) $Ax - b = 0$, and (84) $Cx - d \le 0$.
This is Problem 4 with f as in (85), K as in (72), and, if (77) is solvable,
(92) $L = \{x : Ax = b\} = A^+ b + N(A)$.
Using (12)-(16) with $\alpha = 1$, we get from (71) and (88):
(93) $x_{\nu+1} = x_\nu - (C^T C + I)^{-1} [(C^T)_+ (Cx_\nu - d)_+ + P_{R(A^T)} x_\nu - A^+ b], \quad \nu = 0, 1, \dots$
Alternatively, one could treat (77) and (84) separately, by alternating the iterations (83) and (91). In the case where $b = 0$, $d = 0$, $A = (B, -I)$, $C = (0, -I)$ and strict inequality holds in (84), Problem 8 was similarly solved by Ho and Kashyap [8].
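A sketch of (93) along the same lines, assuming NumPy; `method_93` and the data (the system $x_1 - x_2 = 0$, $x_1 + x_2 \le 1$) are hypothetical.

```python
import numpy as np

def method_93(A, b, C, d, x0, iters=200):
    """Iterate (93): x <- x - (C^T C + I)^{-1} [(C^T)_+ (Cx - d)_+ + P_R(A^T) x - A^+ b]."""
    A, b, C, d = (np.asarray(M, dtype=float) for M in (A, b, C, d))
    A_pinv = np.linalg.pinv(A)
    P_row = A_pinv @ A                           # P_R(A^T)
    T = C.T @ C + np.eye(C.shape[1])
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = C @ x - d
        pos = r > 0                              # violated inequalities
        rhs = C[pos].T @ r[pos] + P_row @ x - A_pinv @ b
        x = x - np.linalg.solve(T, rhs)
    return x

x = method_93([[1.0, -1.0]], [0.0], [[1.0, 1.0]], [1.0], [3.0, 0.0])
print(x)  # a point with x1 = x2 and x1 + x2 <= 1, up to rounding
```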
3.5 Nonlinear inequalities. Let $f: R^n \to R^m$ and consider:
Problem 9: Find an $x \in R^n$ for which
(94) $f(x) \le 0$.
This is Problem 3 with K given by (72). Methods (8), (9) and (17) give respectively:
(95) $x_{\nu+1} = x_\nu - (f'(x_\nu)^+)_+ f(x_\nu)_+, \quad \nu = 0, 1, \dots$
(96) $x_{\nu+1} = x_\nu - T_\nu^{-1} (f'(x_\nu)^T)_+ f(x_\nu)_+, \quad \nu = 0, 1, \dots$
(97) $x_{\nu+1} = x_\nu - (f'(x_\nu)^T f'(x_\nu) + \alpha^2 I)^{-1} (f'(x_\nu)^T)_+ f(x_\nu)_+, \quad \nu = 0, 1, \dots$
If any of the above methods converges, then its limit $x^*$ satisfies
(98) $(f'(x^*)^T)_+ f(x^*)_+ = 0$,
so that $x^*$ is a stationary point for the partial sum of squares
$\sum f_i^2(x)$ over those i for which $f_i(x^*) > 0$,
which thus depends on $x^*$.
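A sketch of (97), assuming NumPy; `method_97` and the test problem (the unit disk intersected with the half-plane $x_1 \ge 1/2$) are hypothetical. As in (97), $T_\nu$ uses the full Jacobian, while only the violated rows enter the right side.

```python
import numpy as np

def method_97(f, jac, x0, alpha=1.0, iters=200):
    """Iterate (97): x <- x - (J^T J + alpha^2 I)^{-1} (J^T)_+ f(x)_+."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        fx, J = f(x), jac(x)
        pos = fx > 0                        # the currently violated inequalities
        if not pos.any():
            break                           # (94) holds
        T = J.T @ J + alpha**2 * np.eye(x.size)
        x = x - np.linalg.solve(T, J[pos].T @ fx[pos])
    return x

def f(x):
    return np.array([x[0]**2 + x[1]**2 - 1.0, 0.5 - x[0]])

def jac(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]], [-1.0, 0.0]])

x = method_97(f, jac, [2.0, 2.0])
print(x, f(x))  # both components of f(x) at most (numerically) zero
```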
REFERENCES
1. R. Bellman and R. Kalaba, Quasilinearization and Nonlinear Boundary-Value Problems, American Elsevier, New York, 1965.
2. A. Ben-Israel, A Newton-Raphson Method for the Solution of Systems of Equations, J. Math. Anal. Appl. 15 (1966), 243-252.
3. A. Ben-Israel, On the Newton-Raphson Method, Systems Research Memorandum No. 162, Northwestern University, Evanston, September 1966.
4. A. Ben-Israel and A. Charnes, Contributions to the Theory of Generalized Inverses, J. Soc. Indust. Appl. Math. 11 (1963), 667-699.
5. E. W. Cheney and A. A. Goldstein, Newton's Method for Convex Programming and Tchebycheff Approximation, Numerische Mathematik 1 (1959), 253-268.
6. E. W. Cheney and A. A. Goldstein, Proximity Maps for Convex Sets, Proc. Amer. Math. Soc. 10 (1959), 448-450.
7. A. A. Goldstein, Minimizing Functionals on Hilbert Space, pp. 159-165 in A. V. Balakrishnan and L. W. Neustadt (eds.), Computing Methods in Optimization Problems, Academic Press, New York, 1964.
8. Y-C. Ho and R. L. Kashyap, A Class of Iterative Procedures for Linear Inequalities, J. SIAM Control 4 (1966), 112-115.
9. A. S. Householder, Principles of Numerical Analysis, McGraw-Hill, New York, 1953.
10. A. S. Householder, Theory of Matrices in Numerical Analysis, Blaisdell, New York, 1964.
11. D. W. Marquardt, An Algorithm for Least Squares Estimation of Nonlinear Parameters, J. Soc. Indust. Appl. Math. 11 (1963), 431-441.
12. D. A. Meeter, On a Theorem Used in Nonlinear Least Squares, J. SIAM Appl. Math. 14 (1966), 1176-1179.
13. J. J. Moreau, Proximité et dualité dans un espace hilbertien, Bull. Soc. Math. France 93 (1965), 273-299.
14. D. D. Morrison, Methods for Nonlinear Least Squares Problems and Convergence Proofs, Proc. Jet Propulsion Lab. Seminar: Tracking Problems and Orbit Determination, 1960, 1-9.
15. R. Penrose, A Generalized Inverse for Matrices, Proc. Cambridge Philos. Soc. 51 (1955), 406-413.
16. V. Pereyra, Iterative Methods for Solving Nonlinear Least Squares Problems, SIAM J. Numer. Anal. 4 (1967), 27-36.
17. B. T. Poljak, Gradient Methods for Solving Equations and Inequalities, Z. Vycisl. Mat. i Mat. Fiz. 4 (1964), 995-1005.
18. J. B. Rosen, Optimal Control and Convex Programming, pp. 223-237 in Proceedings of the IBM Scientific Computing Symposium on Control Theory and Applications, T. J. Watson Research Center, Yorktown Heights, N.Y., October 1964.
19. J. B. Rosen, Iterative Solution of Nonlinear Optimal Control Problems, J. SIAM Control 4 (1966), 223-244.
20. W. J. Stiles, Closest-Point Maps and Their Products, Nieuw Archief voor Wiskunde (3) 13 (1965), 19-29 and 212-225.
NORTHWESTERN UNIVERSITY, EVANSTON, ILLINOIS
SOME COMPLEMENTS TO BROUWER'S FIXED POINT THEOREM
BY
HERBERT ROBBINS
ABSTRACT
The sets which can be the fixed point sets of a continuous function or a homeomorphism of $B^n$ are investigated.
We present some elementary complements to Brouwer's fixed point theorem in n-space. Let $B^n$ = all points $P = (x_1, \dots, x_n)$ with $\|P\|^2 = \sum x_i^2 \le 1$, and $S^{n-1}$ = all P with $\|P\| = 1$. If $f: B^n \to B^n$ is any continuous map of $B^n$ into itself, the fixed point set A of f is the set of all $P \in B^n$ such that $f(P) = P$. Clearly, A is closed, and by Brouwer's theorem, non-empty.

THEOREM 1. For any $n \ge 1$ and any non-empty closed set $A \subset B^n$, there is a continuous map $f: B^n \to B^n$ with A as its fixed point set.
Proof. Define
$$d(P, A) = \inf_{Q \in A} \|P - Q\|.$$
Then d(P, A) is a continuous function of P, and d(P, A) = 0 iff $P \in A$. Choose any $Q \in A$, and define $f: B^n \to B^n$ by setting
(1) $f(P) = P + d(P, A)\,\dfrac{Q - P}{\|Q - P\|}$ for $P \ne Q$, and $f(Q) = Q$.
Then f is continuous and has A as its fixed point set.

Received April 15, 1967.

THEOREM 2. For any odd n there is a non-empty closed set $A \subset B^n$ which is not the fixed point set of any homeomorphism $f: B^n \to B^n$.

Proof. Let A consist of all points P with $\|P\| \le 1/2$. Suppose $f: B^n \to B^n$ is a homeomorphism with A as its fixed point set. Consider the family of continuous maps $f_t: S^{n-1} \to S^{n-1}$ defined by setting
$$f_t(P) = \frac{f(tP)}{\|f(tP)\|}, \qquad \tfrac{1}{2} \le t \le 1;$$
then $f_{1/2}(P) = P$ and $f_1(P) = f(P)$.
Hence the restriction of f to $S^{n-1}$ is homotopic to the identity, yet has no fixed points, which is impossible for n odd.

THEOREM 3. For any non-empty closed set $A \subset B^2$, there exists a homeomorphism $f: B^2 \to B^2$ with A as its fixed point set.

Proof. Case 1. A contains an interior point of $B^2$, which we may assume to be the origin. Define $f: B^2 \to B^2$ by setting, for any $P = (x_1, x_2) \in B^2$, $f(P) = (x_1', x_2')$ with
(2) $x_1' = x_1 \cos t + x_2 \sin t, \quad x_2' = -x_1 \sin t + x_2 \cos t$, where $t = d(P, A)$.
Clearly, f is continuous, with A as its fixed point set, and it is easy to verify that f is a homeomorphism of $B^2$.
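The two explicit maps above, (1) from Theorem 1 and the rotation (2) from Case 1 of Theorem 3, can be evaluated directly. The following is a pure-Python sketch in which a finite set of points stands in for the closed set A (an assumption made only so that d(P, A) is a finite minimum); the function names are hypothetical.

```python
import math

def dist_to_set(P, A):
    """d(P, A) for a finite set A of points in the plane."""
    return min(math.dist(P, Q) for Q in A)

def theorem1_map(P, A, Q):
    """Map (1): move P toward the chosen point Q of A by the distance d(P, A)."""
    if P == Q:
        return Q
    t = dist_to_set(P, A)
    norm = math.dist(P, Q)
    return tuple(p + t * (q - p) / norm for p, q in zip(P, Q))

def case1_rotation(P, A):
    """Map (2): rotate P about the origin (assumed to lie in A) through the angle t = d(P, A)."""
    t = dist_to_set(P, A)
    x1, x2 = P
    return (x1 * math.cos(t) + x2 * math.sin(t), -x1 * math.sin(t) + x2 * math.cos(t))

A = [(0.0, 0.0), (0.5, 0.0)]   # hypothetical A, containing the origin
P = (0.3, 0.6)                 # a point of B^2 not in A
print(theorem1_map((0.5, 0.0), A, A[0]))  # (0.5, 0.0): points of A stay fixed
print(theorem1_map(P, A, A[0]))           # moved toward Q, still inside B^2
print(case1_rotation(P, A))               # same norm as P, but a different point
```

Since $d(P, A) \le \|P - Q\|$, map (1) sends P to a point of the segment from P to Q, which is why it stays in the ball.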
Case 2. A contains a boundary point of $B^2$, which we may assume to be the point (1, 0). We then replace (2) by
(3) $x_1' - r = (x_1 - r)\cos t + x_2 \sin t, \quad x_2' = -(x_1 - r)\sin t + x_2 \cos t,$
where
r 2=xl 2+x22
t = d(P, A),
and argue as before.

REMARKS. 1. Theorem 3 is true for any $B^{2n}$, at least in Case 1. To see this, define $f: B^{2n} \to B^{2n}$ by putting $f(P) = (x_1', x_2', \dots, x_{2n}')$, where $x_1', x_2'$ are defined by (2), $x_3', x_4'$ by (2) with 1 replaced by 3 and 2 replaced by 4, etc. The construction of f in Case 2 for arbitrary $B^{2n}$ remains to be supplied.
2. By taking as $B^3$ the points P with $x_1^2 + x_2^2 + (x_3 - 1)^2 \le 1$, considering the sections of this by planes through the $x_3$-axis, and applying the transformation analogous to (3) to each of these, we see that if A is a closed subset of $B^3$ containing at least one boundary point of $B^3$ (in this case, the origin), then there exists a homeomorphism $f: B^3 \to B^3$ with A as its fixed point set. Presumably the same holds for any odd n (certainly for n = 1).

I am indebted to A. Dvoretzky and P. A. Smith for helpful suggestions.

PURDUE UNIVERSITY, AND UNIVERSITY OF CALIFORNIA, BERKELEY, CALIFORNIA, U.S.A.
AVERAGED NORMS
BY
EDGAR ASPLUND
ABSTRACT
A method is given to construct an equivalent norm with both a rotundity and a smoothness property in a Banach space having two different equivalent norms, one with the rotundity and one with the smoothness property.

In a number of cases it has been proved that in some class of reflexive Banach spaces one can, in each space, introduce an equivalent norm with some special property. For example, Kadec [4] has proved that each separable reflexive Banach space admits an equivalent locally uniformly rotund norm, and recently Lindenstrauss [5] has proved that each reflexive Banach space admits an equivalent rotund norm. In these cases the classes are closed with respect to taking dual spaces, and then the theorems say that one may find other equivalent norms with the properties dual to those mentioned. A natural question is then whether one can find an equivalent norm satisfying both properties. We will show in this note a simple averaging procedure which yields an affirmative answer in the two cases mentioned above. In order to be applicable in other cases, we will state the basic estimating lemma in a more general setting.

Let E be a (real) vector space and let $f_0$, $g_0$ denote convex functions on E, taking values in $R \cup \{+\infty\}$ but not identically $+\infty$. Furthermore, we assume that the functions are homogeneous of second degree:
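$f_0(tx) = t^2 f_0(x)$, $g_0(tx) = t^2 g_0(x)$ for all t in R and x in E, hence nonnegative and vanishing at the origin of E. Finally, we assume that $f_0$ and $g_0$ are equivalent in the sense that there exists a positive number C such that
$$g_0 \le f_0 \le (1 + C) g_0.$$
We form two averages of $f_0$ and $g_0$. The first is the arithmetic average $f_1 = \tfrac{1}{2}(f_0 + g_0)$.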
The second is the "inf-convolution average" $g_1$ defined by
$$g_1(x) = \inf\{\tfrac{1}{2}(f_0(x + y) + g_0(x - y)) : y \in E\}.$$
It is easy to see that $f_1$ and $g_1$ are convex, homogeneous of second degree, and that they satisfy the relations
$$g_0 \le g_1 \le f_1, \qquad f_1 \le (1 + 2^{-1}C) g_1.$$
Now iterate this procedure:
(1) $f_{n+1}(x) = \tfrac{1}{2}(f_n(x) + g_n(x)), \quad g_{n+1}(x) = \inf\{\tfrac{1}{2}(f_n(x + y) + g_n(x - y)) : y \in E\}$, for $n \ge 0$.
The result is two sequences of functions that satisfy the relations
$$g_n \le g_{n+1} \le f_{n+1} \le f_n, \qquad f_n \le (1 + 2^{-n}C) g_n \quad \text{for all } n \ge 0.$$
Hence the two sequences converge (uniformly on each set on which $f_0$ is bounded) to a likewise second degree homogeneous convex function h,
$$h(x) = \lim_{n \to \infty} f_n(x) = \lim_{n \to \infty} g_n(x),$$
(2) $(1 + 2^{-n}C)^{-1} h \le g_n \le h \le f_n \le (1 + 2^{-n}C) h$.
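For quadratic forms $f(x) = \tfrac{1}{2} x^T F x$ and $g(x) = \tfrac{1}{2} x^T G x$ with F, G symmetric positive definite (a special case chosen here for illustration, not taken from the paper), both averages in (1) are explicit: the arithmetic average has matrix $(F + G)/2$, and minimizing over y shows that the inf-convolution average has matrix $2(F^{-1} + G^{-1})^{-1}$. The NumPy sketch below runs (1) in that setting and checks $g_n \le f_n$ together with the sharper bound $f_n \le (1 + 4^{-n}C) g_n$ of the Lemma; the function names and the starting matrices are hypothetical.

```python
import numpy as np

def average_step(F, G):
    """One step of (1) for f(x) = x^T F x / 2, g(x) = x^T G x / 2.
    Arithmetic average: (F + G)/2.  Inf-convolution average: 2 (F^{-1} + G^{-1})^{-1}."""
    F_next = (F + G) / 2.0
    G_next = 2.0 * np.linalg.inv(np.linalg.inv(F) + np.linalg.inv(G))
    return F_next, G_next

def iterate(F0, G0, C, steps=6):
    """Run (1), asserting g_n <= f_n <= (1 + 4^{-n} C) g_n as quadratic forms."""
    F, G = F0.copy(), G0.copy()
    for n in range(steps):
        assert np.linalg.eigvalsh(F - G).min() >= -1e-12
        assert np.linalg.eigvalsh((1.0 + 4.0**(-n) * C) * G - F).min() >= -1e-12
        F, G = average_step(F, G)
    return F, G

F0 = np.diag([3.0, 1.5])   # hypothetical f_0
G0 = np.eye(2)             # hypothetical g_0; then g_0 <= f_0 <= (1 + C) g_0 with C = 2
F, G = iterate(F0, G0, C=2.0)
print(np.diag(F), np.diag(G))  # both approach [sqrt(3), sqrt(1.5)]
```

For diagonal forms the iteration acts coordinatewise as the arithmetic-harmonic mean of two numbers a, b, which preserves the product ab and therefore converges to $\sqrt{ab}$; this is why the limit h here has matrix $\mathrm{diag}(\sqrt{3}, \sqrt{1.5})$.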
The estimates (2) of the speed of this convergence are, however, too crude to be useful. The object of the main lemma is to improve these estimates.

LEMMA. $f_n \le (1 + 4^{-n}C) g_n$ for all $n \ge 0$.
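Proof. The proof is by induction, starting with n = 0, where the lemma is the assumption $f_0 \le (1 + C) g_0$. Assume the lemma has been proved for n, and put, for the time being, $a = 1 + 4^{-n}C/2$. Then $f_{n+1} = \tfrac{1}{2}(f_n + g_n) \le a g_n$ and $f_n \ge f_{n+1}$, and we have the following estimates, using homogeneity and convexity:
$$\tfrac{1}{2}\big(f_n(x + y) + g_n(x - y)\big) \ge \tfrac{1}{2}\Big(\tfrac{1}{a^2} f_{n+1}(ax + ay) + \tfrac{1}{a} f_{n+1}(x - y)\Big) = \tfrac{1 + a}{2a^2}\Big(\tfrac{1}{1 + a} f_{n+1}(ax + ay) + \tfrac{a}{1 + a} f_{n+1}(x - y)\Big) \ge \tfrac{1 + a}{2a^2} f_{n+1}\Big(\tfrac{2a}{1 + a} x\Big) = \tfrac{2}{1 + a} f_{n+1}(x).$$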
Taking the inf over y of both sides above, one obtains $g_{n+1}(x) \ge \tfrac{2}{1 + a} f_{n+1}(x)$, which is the desired conclusion:
$$f_{n+1}(x) \le \tfrac{1 + a}{2} g_{n+1}(x) = (1 + 4^{-(n+1)}C) g_{n+1}(x).$$
Thus the lemma is proved. From now on we will assume that the functions are finite-valued everywhere on E and vanish only at the origin. We first prove that strict convexity of either of the starting functions $f_0$ or $g_0$ is inherited by h.

THEOREM 1. If $f_0$ is strictly convex, then so is h.

Proof.
Because of the iteration procedure, one can write the function $f_n$ as
$$f_n = 2^{-n} f_0 + h_n,$$
where $h_n$ is another convex function. Using the lemma we have
$$0 \le f_n - h \le 4^{-n}C h \le 4^{-n}C f_0.$$
Now suppose x and y are points in E such that $y \ne x$. We have the estimate
(3) $h(x) - 2h\big(\tfrac{x + y}{2}\big) + h(y) \ge 2^{-n}\Big[f_0(x) - 2f_0\big(\tfrac{x + y}{2}\big) + f_0(y) - 2^{-n}C\big(f_0(x) + f_0(y)\big)\Big]$.
Since $f_0$ is assumed to be strictly convex, the right-hand side above is strictly positive for n large enough, proving that h is strictly convex, as asserted.

As indicated at the beginning of this paper, the intended use of our averaging procedure was to work with equivalent norms of a Banach space (or at least a normed space, but for our purposes here there is no loss of generality in assuming it complete), so we will now suppose that E is a Banach space and that $f_0$ is related to one of the equivalent norms by the formula
(4) $f_0(x) = \tfrac{1}{2}\|x\|^2$.
The reason for taking this relationship is that the same one will then connect the associated dual norm in E*, the conjugate Banach space, with the function $f_0^*$ conjugate to $f_0$, which is defined by
$$f_0^*(x) = \sup\{\langle x, y \rangle - f_0(y) : y \in E\} \quad \text{for all } x \text{ in } E^*.$$
The same remark applies to the other functions $f_n$, $g_n$ and h: the related norms are all equivalent; in fact, they are given by the expressions $(2f_n(x))^{1/2}$, $(2g_n(x))^{1/2}$ and $(2h(x))^{1/2}$ respectively. Also, the conjugate functions $f_n^*$, $g_n^*$ and $h^*$ correspond in that way to equivalent norms of E*. Moreover, the iteration relations (1) become inverted on the conjugate side:
$$f_{n+1}^*(x) = \inf\{\tfrac{1}{2}(f_n^*(x + y) + g_n^*(x - y)) : y \in E^*\}, \qquad g_{n+1}^*(x) = \tfrac{1}{2}(f_n^*(x) + g_n^*(x)),$$
whereas the estimates related to the Lemma become
$$(1 + 4^{-n}C)^{-1} h^* \le f_n^* \le h^* \le g_n^* \le (1 + 4^{-n}C) h^*.$$
All this follows from the general theory of conjugate functions in Banach spaces, for which we refer to Brondsted [1], together with the elementary fact that if f is convex and homogeneous of second degree and C is a positive constant, then $(Cf)^* = C^{-1} f^*$. Note that with respect to the general theory we are in a particularly simple case, since the functions involved are all everywhere continuous and a fortiori finite-valued. We are now ready to state the first main application.

THEOREM 2. If in a Banach space E there exists one equivalent rotund norm and another equivalent norm whose dual is a rotund norm for E*, then there exists a third equivalent norm with both these properties. In particular, each reflexive Banach space has an equivalent norm which is rotund and smooth.

Proof. If $\|x\|$ is a rotund norm for E, then $f_0$ given by (4) is strictly convex. In the same way we may suppose that $g_0^*$ is strictly convex on E*. Therefore, by Theorem 1, h is strictly convex on E and $h^*$ is strictly convex on E*. Thus the norm related to h is rotund and its dual is rotund on E*. By the result of Lindenstrauss referred to in the first paragraph, the hypotheses of the theorem are fulfilled if E is reflexive, in which case smoothness and rotundity are dual properties, so that the averaged norm will be both rotund and smooth.

Prof. Lindenstrauss has pointed out to the author that Theorem 2 settles the only remaining open question in the table of Day [2], in that it shows that $c_0(I)$, for an arbitrary index set I, is sere, i.e. that it can be renormed with a norm which is both rotund and smooth. Actually, in [2] Day constructs an equivalent rotund norm for $c_0(I)$ and also an equivalent rotund norm for the dual space
$l_1(I)$ which is moreover a "conjugate norm", i.e. derives by duality from some norm on $c_0(I)$. Thus our averaging procedure applied to the two norms on $c_0(I)$ gives a third equivalent norm with both desired properties.

We will now investigate uniformity conditions on the strict convexity of the considered functions. A convex function f will be called locally uniformly strictly convex at x if the quantity
(5) $\inf\{f(x) - 2f\big(\tfrac{x + y}{2}\big) + f(y) : \|x - y\| \ge \varepsilon\}$
is strictly positive for all $\varepsilon > 0$. Note that since all our norms are equivalent, it does not matter which one is used for $\|x - y\|$ above. Also, supposing that f is one of our second degree homogeneous norm-related functions, we say that it is uniformly strictly convex if the quantity
is positive for all $\varepsilon > 0$. We then have the following theorem.

THEOREM 3. If $f_0$ is (locally) uniformly strictly convex, then so is h.

Proof. Take the appropriate inf of both sides of (3), then choose n large enough.

It is intuitively evident that any of the functions $f_0$, h, $g_0^*$, $h^*$ is (locally) uniformly strictly convex if and only if the related norm is (locally) uniformly rotund. To check this in detail requires some quite elementary but messy computations, which we outline here for the local case. Since we are free to choose our norm in (5), we take the one related to f as in (4) and express f in it. Then
$\inf\{\, \tfrac{1}{2}\|x\|^2 - \|\tfrac{1}{2}(x+y)\|^2 + \tfrac{1}{2}\|y\|^2 : \|x - y\| \ge \varepsilon \,\}$
is strictly positive for all $\varepsilon > 0$. By specialization to the case $\|x\| = \|y\| = 1$, this implies local uniform rotundity of the norm, which is usually expressed as follows. For each $u$ in $E$ with $\|u\| = 1$ there is a function $\delta(\varepsilon)$ with strictly positive values for $\varepsilon > 0$, such that
(7) $\|v\| = \|u\| = 1,\ \|u - v\| \ge \varepsilon$ implies $1 - \|\tfrac{1}{2}(u+v)\| \ge \delta(\varepsilon)$.
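Conversely, to see that (7) implies (6), we put $u = x/\|x\|$, $v = y/\|y\|$, but then we have to distinguish between three cases:
I: $|\,\|x\| - \|y\|\,| \ge \varepsilon/2$,  II: $0 \le \|x\| - \|y\| < \varepsilon/2$,  III: $0 < \|y\| - \|x\| < \varepsilon/2$.
232 EDGAR ASPLUND [October
Case I:
$\tfrac{1}{2}\|x\|^2 - \|\tfrac{1}{2}(x+y)\|^2 + \tfrac{1}{2}\|y\|^2 \ge \tfrac{1}{2}\|x\|^2 - \tfrac{1}{4}(\|x\| + \|y\|)^2 + \tfrac{1}{2}\|y\|^2 = \tfrac{1}{4}(\|x\| - \|y\|)^2.$
Case II: Put $k = \|x\|/\|y\|$ and use $x + y = \|y\|(u + v + (k - 1)u)$:
$\tfrac{1}{2}\|x\|^2 - \|\tfrac{1}{2}(x+y)\|^2 + \tfrac{1}{2}\|y\|^2 \ge \|y\|^2\bigl(\tfrac{1}{2}(k^2 + 1) - \tfrac{1}{4}(\|u + v\| + k - 1)^2\bigr) \ge \|x\|\,\|y\|\,(1 - \|\tfrac{1}{2}(u+v)\|).$
Case III:
$\tfrac{1}{2}\|x\|^2 - \|\tfrac{1}{2}(x+y)\|^2 + \tfrac{1}{2}\|y\|^2 \ge \|x\|\,\|y\|\,(1 - \|\tfrac{1}{2}(u+v)\|).$
Since in Cases II and III one can estimate $\|u - v\|$:
$\|u - v\| \ge \dfrac{\|x - y\| - |\,\|x\| - \|y\|\,|}{\|y\|} \ge \dfrac{\varepsilon}{2\|y\|}$ (Case II), $\quad \|u - v\| \ge \dfrac{\|x - y\| - |\,\|x\| - \|y\|\,|}{\|x\|} \ge \dfrac{\varepsilon}{2\|x\|}$ (Case III),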
it follows from (7) that (6) admits a strictly positive infimum. Thus $f$ is locally uniformly strictly convex if and only if the related norm is locally uniformly rotund. It follows that, as in Theorem 2, one can "mix" any of the three properties rotundity, locally uniform rotundity and uniform rotundity in $E$ with any of the three in $E^*$. We leave it to the reader to imagine the details. Also, one can easily prove that other norm properties, like the "uniformly non-square" property invented by James [3] and related properties, are inherited from the norm related to $f_0$ to that related to $h$. Here we will only state the result on separable reflexive Banach spaces mentioned in the beginning. THEOREM 4. If $E$ is a separable reflexive Banach space, then there exists an equivalent norm for $E$ such that both it and its dual are locally uniformly rotund. Consequently, the norm and its dual are both Fréchet differentiable. Proof. The cited result of Kadec says that any separable Banach space has an equivalent locally uniformly rotund norm. For reflexive spaces, local uniform rotundity in one of the spaces implies Fréchet differentiability of the dual (but not conversely).
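The estimates used above to show that (7) implies (6) only rely on the triangle inequality, so they hold for every pair of non-zero vectors in any normed space, and the three-case split merely decides which of them is useful. The following Python sketch is a numerical sanity check, not part of the argument. It uses the $l^p$ norms on $\mathbb{R}^3$ as an assumed test case and checks three things on random pairs: the Case I bound $\ge \tfrac{1}{4}(\|x\| - \|y\|)^2$, the Case II/III bound $\ge \|x\|\,\|y\|\,(1 - \|\tfrac{1}{2}(u+v)\|)$, and the estimate $\|u - v\| \ge (\|x - y\| - |\,\|x\| - \|y\|\,|)/\min(\|x\|, \|y\|)$. It also checks the scaling rule $(Cf)^* = C^{-1}f^*$ quoted before Theorem 2, by approximating conjugates on a grid. The rule only needs $f(x/C) = C^{-2}f(x)$ for $C > 0$, so the sketch deliberately uses a convex function that is positively homogeneous of degree two but not even. All helper names and grid parameters are illustrative choices.

```python
import random


def pnorm(v, p):
    """The l^p norm on R^n; for 1 < p < infinity it is rotund."""
    return sum(abs(c) ** p for c in v) ** (1.0 / p)


def midpoint(x, y):
    return [(a + b) / 2 for a, b in zip(x, y)]


def gap(x, y, p):
    """The quantity whose infimum is taken in (6)."""
    return 0.5 * pnorm(x, p) ** 2 - pnorm(midpoint(x, y), p) ** 2 + 0.5 * pnorm(y, p) ** 2


def check_pair(x, y, p, tol=1e-9):
    """Check the Case I bound, the Case II/III bound and the ||u - v|| estimate."""
    nx, ny = pnorm(x, p), pnorm(y, p)
    u = [c / nx for c in x]
    v = [c / ny for c in y]
    g = gap(x, y, p)
    m = pnorm(midpoint(u, v), p)                    # ||(u + v)/2||
    dxy = pnorm([a - b for a, b in zip(x, y)], p)   # ||x - y||
    duv = pnorm([a - b for a, b in zip(u, v)], p)   # ||u - v||
    ok_case_i = g >= 0.25 * (nx - ny) ** 2 - tol
    ok_case_ii_iii = g >= nx * ny * (1 - m) - tol
    ok_uv = duv >= (dxy - abs(nx - ny)) / min(nx, ny) - tol
    return ok_case_i and ok_case_ii_iii and ok_uv


def conjugate_on_grid(f, y, lo=-10.0, hi=10.0, steps=20000):
    """Approximate f*(y) = sup_x (x*y - f(x)) by a maximum over a uniform grid."""
    h = (hi - lo) / steps
    return max(y * (lo + i * h) - f(lo + i * h) for i in range(steps + 1))


def f_asym(x):
    """Convex and positively homogeneous of degree two, but not even."""
    return x * x if x >= 0 else 3 * x * x


if __name__ == "__main__":
    rng = random.Random(0)
    for p in (1.5, 2.0, 4.0):
        for _ in range(1000):
            x = [rng.uniform(-1, 1) for _ in range(3)]
            y = [rng.uniform(-1, 1) for _ in range(3)]
            assert check_pair(x, y, p)
    C = 0.5
    for y in (-2.0, -0.5, 1.0, 2.5):
        lhs = conjugate_on_grid(lambda t: C * f_asym(t), y)   # (Cf)*(y)
        rhs = conjugate_on_grid(f_asym, y) / C                 # C^{-1} f*(y)
        assert abs(lhs - rhs) < 1e-4
    print("all checks passed")
```

The grid search is crude. It is adequate here because the maximizers lie well inside $[-10, 10]$ for the chosen $y$, and the error of a grid maximum for a quadratic is of order $h^2$.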
REFERENCES
1. A. Brøndsted, Conjugate convex functions in topological vector spaces, Mat.-Fys. Medd. Dansk. Vid. Selsk. 34 (1964), no. 2, 27 pp.
2. M. M. Day, Strict convexity and smoothness of normed spaces, Trans. Amer. Math. Soc. 78 (1955), 516-528.
3. R. C. James, Uniformly non-square Banach spaces, Ann. of Math. 80 (1964), 542-550.
4. M. I. Kadec, Spaces isomorphic to a locally uniformly convex space (Russian), Izv. Vysš. Učebn. Zaved. Matematika 13 (1959), no. 6, 51-57.
5. J. Lindenstrauss, On non-separable reflexive Banach spaces, Bull. Amer. Math. Soc. 72 (1966), 967-970.
UNIVERSITY OF WASHINGTON, SEATTLE, WASH.
MEASURABLE CARDINALS AND THE CONTINUUM HYPOTHESIS BY
A. LÉVY AND R. M. SOLOVAY¹ ABSTRACT
Let ZFM be the set theory ZF together with an axiom which asserts the existence of a measurable cardinal. It is shown that if ZFM is consistent then ZFM is consistent with every sentence $\varphi$ whose consistency is proved by Cohen's forcing method with a set of conditions of cardinality $< \kappa$. In particular, if ZFM is consistent then it is consistent with the continuum hypothesis and with its negation. 1. Introduction. The two major questions which are left unanswered by the Zermelo-Fraenkel axiom system, ZF,² of set theory are, roughly: How "many" subsets does a given set have, and how "big" can cardinal numbers get? Various axioms have been considered which answer, to some extent, these questions. This suggests the following problem: How do suggested answers to one of these questions affect the answer to the other one? It turned out that the strong axioms of infinity traditionally considered did not have any effect on the number of subsets of a given set; these axioms did not even settle the question of whether a given set has a non-constructible subset. (These results are due to Cohen and Gödel.) However, when one considers the axiom which asserts the existence of a measurable cardinal, the situation changes radically. It was first shown by Scott [9] that this axiom entails the existence of non-constructible sets; later work by Rowbottom, Gaifman, Silver and Solovay showed that from this axiom one can draw far-reaching conclusions concerning the existence of non-constructible sets of integers. Thus one can show, assuming there are measurable cardinals, that there are many non-constructible subsets of $\omega$. Can one show, using the same assumption, the existence of more than $\aleph_1$ subsets of $\omega$, and thereby refute the continuum hypothesis? Unfortunately the answer is no.
We shall prove, in this paper, the following theorem: ¹ The research of the first named author has been sponsored in part by the Information Systems Branch, Office of Naval Research, Washington, D.C. under Contract F--61052 67 C 0055; the second named author was partially supported by an NAS-NRC post-doctoral fellowship and by National Science Foundation grant GP-5632. ² Throughout this paper, we consider the axiom of choice to be an axiom of ZF. Received April 16, 1967
THEOREM 1. Suppose that ZF + "there is a measurable cardinal" is a consistent theory. Then both the continuum hypothesis and its negation can be consistently added to this theory. Our principal tool in proving Theorem 1 is Cohen's forcing method [1]. In order to verify that there are measurable cardinals in the Cohen extension, we prove a technical theorem, Theorem 3 below, which states, roughly, that if $\kappa$ is a measurable cardinal and the set of conditions has power less than $\kappa$, then $\kappa$ remains measurable in the Cohen extension. We shall prove a strengthened version of Theorem 1, which considers the value of $2^{\aleph_\alpha}$ for a suitable initial segment of the $\alpha$'s. Before stating our result, we recall a result of Easton [2]. Given a function $G$ such that one can prove in ZF (i) $\alpha < \beta \to G(\alpha) \le G(\beta)$ and (ii) $\aleph_{G(\alpha)}$ is not cofinal with cardinals $\le \aleph_\alpha$, then if ZF is consistent, it remains consistent when we add the axiom $\forall\alpha\,(\aleph_\alpha \text{ is regular} \to 2^{\aleph_\alpha} = \aleph_{G(\alpha)})$, provided that the definition of $G$ is of a particularly simple form. The rigorous requirement on $G$ is that $G$ should be absolute with respect to the Cohen extension used in [2], but for our purposes it is enough that $G(\alpha)$ can be given by a term which involves only functions on the ordinals which have definitions referring only to constructible sets, such as the constants 0, 1 and the binary operations $x + y$ and $x \cdot y$, together with the (possibly non-constructible) unary operation $\omega_x$. For example, we can take $G(\alpha) =$ co,., + co,.2+7. These restrictions on $G$ are essential; if, for example, we are allowed to refer to the operation of cardinal exponentiation in the definition of $G$, we could choose $G(\alpha) = 2^{2^{\aleph_\alpha}}$; for this $G$, $2^{\aleph_\alpha} = \aleph_{G(\alpha)}$ is obviously contradictory. We denote by ZFM the theory obtained from ZF by adding to it the statement that there exists a measurable cardinal. The following theorem gives a partial description of the possible behaviour, relative to ZFM, of the function $2^{\aleph_\alpha}$ (for regular $\aleph_\alpha$'s). THEOREM 2.
Let $G_1$ and $G_2$ be functions on the ordinals for which one can prove in ZFM (i) and (ii) above; let the functions $G_1$, $G_2$ and a particular ordinal $\Gamma$ be given by definitions which are absolute as required above; let the following facts be provable in ZFM (for brevity, we denote by $\kappa_0$ the least measurable cardinal): (iii) $\Gamma < \kappa_0$.
(iv) $\forall\alpha\,(\alpha < \Gamma \to G_1(\alpha) < \Gamma)$.
(v) $\forall\alpha\,(\Gamma \le \alpha \le \kappa_0 \to G_1(\alpha) = \alpha + 1)$.
Define a function $G\colon \mathrm{On} \to \mathrm{On}$ by $G(\alpha) = G_1(\alpha)$ for $\alpha \le \kappa_0$, $G(\alpha) = G_2(\alpha)$ for $\alpha > \kappa_0$.
(It is clear that $G$ satisfies (i) and (ii) above.) Then if ZFM is consistent, it is compatible with the axiom $\forall\alpha\,(\aleph_\alpha \text{ regular} \to 2^{\aleph_\alpha} = \aleph_{G(\alpha)})$. The proof of Theorem 2 uses a recent result of Silver [11] which states that the generalized continuum hypothesis is relatively consistent with ZFM. Theorem 2 allows us to manipulate the values of $2^{\aleph_\alpha}$ (for $\aleph_\alpha$ regular) in the ranges $\alpha < \Gamma$ and $\alpha > \kappa_0$ but gives no information (beyond that given by Silver's result) in the range $\Gamma \le \alpha \le \kappa_0$. Thus the following questions are open: 1) Is $2^{\kappa_0} = \aleph_{\kappa_0+2}$ consistent with ZFM? 2) Is $\forall\alpha\,(\alpha < \kappa_0 \text{ and } \aleph_\alpha \text{ regular} \to 2^{\aleph_\alpha} = \aleph_{\alpha+2})$ consistent with ZFM? 3) More generally, let $G(\alpha)$ be given by an absolute definition of the sort described preceding Theorem 2, and suppose that in ZFM one can prove (i) and (ii) above. (For example, let $G(\alpha) = \alpha + 2$.) Is $\forall\alpha\,(\aleph_\alpha \text{ regular} \to 2^{\aleph_\alpha} = \aleph_{G(\alpha)})$ compatible with ZFM? (It seems likely that the answer to each of these questions is "Yes".) Our formulation of 3) does not allow for the notion of a measurable cardinal to appear in the definition of $G$. If we do allow this notion, then requirements (i) and (ii) are definitely not sufficient. Of course we have to rule out anything like $G(0) = \kappa_0$, but there are more subtle requirements relating $G(\kappa_0)$ to the values of $G(\alpha)$ for $\alpha < \kappa_0$. For example, it is shown in Hanf-Scott [3] that if $2^{\aleph_\alpha} < \aleph_{\alpha+2}$ for all regular $\aleph_\alpha$ less than $\kappa_0$, then $2^{\kappa_0} < \aleph_{\kappa_0+2}$. The remainder of this paper is organized as follows: in §2, we write down axioms for a forcing relation, which allow us to consider simultaneously the case of Cohen extensions and that of Boolean valued models. (Cf. [10]. That we can treat these two cases together is not surprising in view of the close relations between them established in [10].) In §3, we reduce the proofs of Theorems 1 and 2 to the proof of a certain technical result (Theorem 3 below). In §4, we give a proof of Theorem 3 together with a partial converse.
Theorem 3 allows one to extend many other results from the case of ZF to that of ZFM: For example, Souslin's Hypothesis and its negation, Kurepa's Conjecture and its negation ([12]), and the proposition that all projective sets are Lebesgue measurable ([14]) are all relatively consistent with ZFM. 2. Axioms for a forcing relation. It is well-known to people working with Cohen's method that the use of countable models is not essential for consistency results. All that is really needed is a "forcing relation" having suitable properties.
In this section, we write down a set of axioms for a forcing relation, sufficient to carry out the proofs in this paper. These axioms will be simultaneously applicable to the forcing relations introduced by Cohen and to the Boolean valued models introduced by Scott and Solovay [10]. In reading the proofs in later sections, the reader may prefer to check our arguments using his intuitive knowledge of forcing, rather than the axioms. By a class, in this paper, we always mean a subcollection of the universe of sets of the form $\{x \mid \varphi(x, t)\}$. Here $\varphi$ is a formula of ZF with two free variables and the set $t$ serves as a parameter. For example, each set $t$ is a class since $t = \{x \mid x \in t\}$. By a Cohen extension of set theory we mean the following. We are given a class $C$, the members of which we call conditions; a reflexive partial ordering $\le$ on $C$ with a minimal member which we denote by 0; a class $T$, the members of which we call terms; finally, there is a one-one mapping $x \mapsto \check{x}$ of the class of all sets into $T$. We let $\mathcal{L}$ be a first order language which includes the language of set theory (and which possibly contains extra relation symbols and special quantifiers). For every formula $\varphi(x_1, \ldots, x_n)$ of $\mathcal{L}$, with no free variables other than $x_1, \ldots, x_n$, we suppose given a subclass $\Vdash_\varphi$ of $C \times T^n$. We write (1)
$p \Vdash \varphi(t_1, \ldots, t_n)$
for $\langle p, t_1, \ldots, t_n \rangle \in {\Vdash_\varphi}$. We read (1) as "$p$ forces $\varphi(t_1, \ldots, t_n)$" (where by 'forcing' we intend what is frequently referred to as 'weak forcing', i.e. 'forcing of the double negation'). This completes our description of the "data" of the concept "Cohen extension". When one starts with a countable model $M$ of ZF and extends it, by Cohen's method, to another model $N$ of ZF, the sentences $\varphi$ for which $0 \Vdash \varphi$ is true in $M$ are true in $N$; moreover if $\varphi$ is a sentence of set theory (which does not contain the additional symbols of $\mathcal{L}$) and $\varphi$ is true in $N$, then in many instances of the use of Cohen's method, $0 \Vdash \varphi$ is true in $M$. Each set in $N$ is denoted by some term $t \in T$. In particular, if $x \in M \subseteq N$, $x$ is denoted by the term $\check{x}$. We shall assume that $\mathcal{L}$ contains a unary predicate $S(x)$, which reads '$x$ is standard' and which is satisfied in $N$ exactly by the members of $M$. Even though in the following we shall not be dealing with countable models $M$ and $N$ (but talking only of forcing relations), we shall say that '$\varphi$ is true in the extension' for $0 \Vdash \varphi$. We shall say '$\varphi$ is true in the ground model' if $\varphi$ is true. If $\varphi$ is a formula of the language of set theory, let $\varphi^S$ be the formula obtained by relativizing the quantifiers with $S$; i.e.
$(\exists x\,\varphi)^S = \exists x\,(S(x) \wedge \varphi^S)$; relativization commutes with the other logical connectives, and is the identity
on atomic formulas. Then $\varphi(x_1, \ldots, x_n)$ holds in the ground model iff $\varphi^S(\check{x}_1, \ldots, \check{x}_n)$ is true in the extension. Thus the predicate $S$ allows us to talk about the "ground model" in $\mathcal{L}$. We use the letters $p, q$ (possibly with primes attached) to denote members of $C$. The classes $\Vdash_\varphi$ are required to satisfy the following axioms: (a) $p \Vdash \neg\varphi(t_1, \ldots, t_n)$ if and only if for no $q \ge p$ does $q \Vdash \varphi(t_1, \ldots, t_n)$. (b) $p \Vdash \varphi(t_1, \ldots, t_m) \vee \psi(t_1, \ldots, t_m)$ if and only if for every $q \ge p$ there is a $q' \ge q$ such that $q' \Vdash \varphi(t_1, \ldots, t_m)$ or $q' \Vdash \psi(t_1, \ldots, t_m)$; $p \Vdash \exists x\,\varphi(t_1, \ldots, t_n, x)$ if and only if for every $q \ge p$ there is a $q' \ge q$ and a $t \in T$ such that $q' \Vdash \varphi(t_1, \ldots, t_n, t)$. (c) $p \Vdash \varphi(t_1, \ldots, t_n) \wedge \psi(t_1, \ldots, t_n)$ if and only if $p \Vdash \varphi(t_1, \ldots, t_n)$ and $p \Vdash \psi(t_1, \ldots, t_n)$; $p \Vdash \forall x\,\varphi(t_1, \ldots, t_n, x)$ if and only if for every $t \in T$, $p \Vdash \varphi(t_1, \ldots, t_n, t)$. (d) $p \Vdash S(t)$ if and only if for every $q \ge p$ there is a $q' \ge q$ and a set $x$ such that $q' \Vdash t = \check{x}$. $p \Vdash t \in \check{a}$ if and only if for every $q \ge p$ there is a $q' \ge q$ and a set $x \in a$ such that $q' \Vdash t = \check{x}$. (e) Let $\varphi(x_1, \ldots, x_n)$ and $\psi(x_1, \ldots, x_k)$ be formulas of $\mathcal{L}$. Let $t_1, \ldots, t_n, s_1, \ldots, s_k$ be terms from $T$ such that the sentences $\varphi(t_1, \ldots, t_n)$ and $\psi(s_1, \ldots, s_k)$ coincide. Then $p \Vdash \varphi(t_1, \ldots, t_n)$ if and only if $p \Vdash \psi(s_1, \ldots, s_k)$. (Example: let $\varphi$ be "$x_1 = x_2$", $\psi$ be "$x_1 = x_1$" and let $t_1 = t_2 = s_1$.) (f) If $\varphi(x_1, \ldots, x_n)$ is an axiom of logic or ZF then $p \Vdash \varphi(t_1, \ldots, t_n)$. (In the case of the replacement schema, we include all the instances expressible in $\mathcal{L}$; i.e., we allow predicates other than $\in$ or $=$ to occur.) If $p \Vdash \varphi(t_1, \ldots, t_n)$ and $p \Vdash \varphi(t_1, \ldots, t_n) \to \psi(t_1, \ldots, t_n)$ then $p \Vdash \psi(t_1, \ldots, t_n)$. Hence if $\varphi(x_1, \ldots, x_n)$ is a theorem of ZF then $p \Vdash \varphi(t_1, \ldots, t_n)$. REMARKS.
We can now give a sense to $p \Vdash \varphi(t_1, \ldots, t_n)$ even if $\varphi$ contains symbols which do not belong to the primitive language of ZF but are defined symbols of ZF (or, for that matter, if $\varphi$ is formulated in English); in this case we mean by "$p \Vdash \varphi(t_1, \ldots, t_n)$" that $p \Vdash \varphi'(t_1, \ldots, t_n)$, where $\varphi'$ is a formula which uses only primitive symbols of ZF and which is equivalent to $\varphi$; from (f) it follows that the truth or falsity of $p \Vdash \varphi'(t_1, \ldots, t_n)$ does not depend on the particular choice of $\varphi'$. Notice also that if $\varphi(t_1, \ldots, t_n)$ and $\psi(t_1, \ldots, t_n)$ contradict one another in ZF then we cannot have $p \Vdash \varphi(t_1, \ldots, t_n)$ together with $p \Vdash \psi(t_1, \ldots, t_n)$. In fact, from $p \Vdash \varphi(t_1, \ldots, t_n)$ and $p \Vdash \varphi(t_1, \ldots, t_n) \to \neg\psi(t_1, \ldots, t_n)$ (true by (f)) we get $p \Vdash \neg\psi(t_1, \ldots, t_n)$, and this contradicts $p \Vdash \psi(t_1, \ldots, t_n)$, by (a). The following assumption is not needed for the proof of Theorem 3, but it holds in all the applications: (g) $0 \Vdash \forall x\,(x \text{ is an ordinal} \to S(x))$. From (a)-(f) we obtain the following consequences. (h) If $p \Vdash \varphi(t_1, \ldots, t_n)$ and $q \ge p$, then also $q \Vdash \varphi(t_1, \ldots, t_n)$ (this follows by applying (a) and (f) to $\neg\neg\varphi(t_1, \ldots, t_n)$). (i) $p \Vdash \varphi(t_1, \ldots, t_n) \to \psi(t_1, \ldots, t_n)$ if and only if for every $q \ge p$ such that
$q \Vdash \varphi(t_1, \ldots, t_n)$ there is $q' \ge q$ such that $q' \Vdash \psi(t_1, \ldots, t_n)$. (This follows from (a), (b), (f) and (h).) (j) $p \Vdash \forall x\,(S(x) \to \varphi(t_1, \ldots, t_n, x))$ if and only if for every $x$, $p \Vdash \varphi(t_1, \ldots, t_n, \check{x})$; $p \Vdash \forall x\,(x \in \check{a} \to \varphi(t_1, \ldots, t_n, x))$ if and only if for every $x \in a$, $p \Vdash \varphi(t_1, \ldots, t_n, \check{x})$. (This follows from (d), (c), (f), (h) and (i).) There is an analogous fact concerning existential quantification. We say that a formula $\varphi(x_1, \ldots, x_n)$ of the language of ZF with no free variable other than $x_1, \ldots, x_n$ is absolute with respect to the extension if for all $x_1, \ldots, x_n$
$0 \Vdash \varphi(\check{x}_1, \check{x}_2, \ldots, \check{x}_n)$ if $\varphi(x_1, \ldots, x_n)$ and $0 \Vdash \neg\varphi(\check{x}_1, \ldots, \check{x}_n)$ if $\neg\varphi(x_1, \ldots, x_n)$. Using (d), as well as (c), (f), (h), (i) and (j), one can show the following formulas are absolute: $x \in y$, $x = y$. (The proof is a simultaneous transfinite induction on $\max(\|x\|, \|y\|)$, where $\|x\|$ is the rank of the set $x$.) If $\varphi(x_1, \ldots, x_n)$ and $\psi(x_1, \ldots, x_n)$ are absolute, so are $\neg\varphi(x_1, \ldots, x_n)$ and $\varphi(x_1, \ldots, x_n) \wedge \psi(x_1, \ldots, x_n)$. Moreover it follows from (j) that $(\exists x_1 \in y)\,\varphi(x_1, \ldots, x_n)$ and $(\forall x_1 \in y)\,\varphi(x_1, \ldots, x_n)$ are absolute if $\varphi$ is. Using these remarks, it is easy to verify the following: (k) The following formulas are absolute: $x \in y$, $x = y$, $x \subseteq y$, $x \cap y = 0$, $\{x\} = y$, $x = y \cup z$, $\bigcup x = y$, "$f$ is a (one-one) function from $x$ onto (into) $y$ (into the power-set of $y$)", $z = \bigcup_{x \in y} f(x)$, "$x$ is an ordinal". This completes our discussion of the consequences of (a)-(g). We now discuss how we will use the formalism discussed here in later sections. Suppose that we have theories $T_1$ and $T_2$ at least as strong as ZF and we wish to prove (2)
$\mathrm{Con}(T_1) \to \mathrm{Con}(T_2)$.
($\mathrm{Con}(T_i)$ is the number theoretic sentence expressing the consistency of $T_i$.) Then within the theory $T_1$ we construct a Cohen extension by giving definitions of $C$, $T$, $\Vdash_\varphi$, etc. (One only has to give $\Vdash_\varphi$ for atomic formulas $\varphi$; $\Vdash_\varphi$ for more complicated formulas is then determined by (a)-(g).) We do this in such a way that (a)-(g) are theorem schemes of $T_1$. Finally, we show that for each axiom $\theta$ of $T_2$, $0 \Vdash \theta$ is a theorem of $T_1$. From this, (2) follows easily. In practice, it is not necessary to explicitly describe the class of terms $T$. Suppose, for example, that our conceptual picture is that we get the Cohen extension by adjoining a class of ordinals $A$. Then to describe the Cohen extension, $M[A]$, we have only to specify 1) the partially ordered class $C$ of conditions and 2) the forcing relation on the atomic sentences $\lambda \in A$ (for $\lambda$ an ordinal). The construction of $T$, of $\Vdash_\varphi$ for arbitrary formulas $\varphi$, etc. can be carried out exactly as in [7]. The requirements (a)-(g) hold by standard arguments if $C$ is a set; if $C$ is a class, it will not in general be true that $0 \Vdash \varphi$ for each axiom $\varphi$ of ZF.
However, this will be true, by results of Easton, in the one case when we make use of a class of conditions. It is absolutely vital for the applications in §3 that we allow $V \ne L$ to hold in the ground model, since $V = L$ is incompatible with the existence of a measurable cardinal. The presentation of forcing in [7] meets this requirement. Our axioms (a)-(g) are also satisfied by the Boolean algebraic-valued interpretation of ZF of Scott-Solovay [10]. One takes $C$ to be the set of non-zero elements of a "dense" subalgebra of the Boolean algebra. (By "dense" we mean that for every non-zero element $a$ of the Boolean algebra, there is a $p \in C$ with $0 < p \le a$.) We put $p \le q$ in $C$ if $q \le p$ in the Boolean algebra; $T$ is the class of all "sets of the model"; $\check{x}$ is the "standard set corresponding to $x$", and $p \Vdash \varphi(t_1, \ldots, t_n)$ means that $p \le$ the truth value of $\varphi(t_1, \ldots, t_n)$. Both Cohen's method and the method of Boolean-valued models apply to cases when the axiom of choice fails. Our axioms for forcing still hold in this case if one now interprets ZF to be Zermelo-Fraenkel set theory excluding the axiom of choice. 3. Proofs of Theorems 1 and 2. Let $P(\alpha)$ be a property of ordinals expressible in ZF. DEFINITION. $P$ is preserved under mild Cohen extensions if whenever $\kappa$ is a cardinal such that $P(\kappa)$ holds and the class of conditions $C$ is a set of power less than $\kappa$, then $P(\kappa)$ holds in the Cohen extension. DEFINITION. A cardinal $\kappa$ is measurable if it is uncountable and the Boolean algebra $S(\kappa)$ of all subsets of $\kappa$ has a non-principal $\kappa$-complete prime ideal. ("$\kappa$-complete" means closed under unions of power less than $\kappa$.) Alternatively, an uncountable cardinal $\kappa$ is measurable if there is a $\kappa$-additive measure $\mu\colon S(\kappa) \to \{0, 1\}$ vanishing on points and giving $\kappa$ the measure 1. Our definition of measurable cardinal differs from that in Scott [9] in that he only requires $\mu$ to be countably additive. However the least measurable cardinal in the sense of [9] is measurable in our sense.
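The Boolean-valued reading of forcing described above can be tried out on a toy case. The following Python sketch is an illustration under stated assumptions, not the Scott-Solovay construction. It takes a finite Boolean algebra, the power set of a four-element set (so the "dense" subalgebra is simply the whole algebra), and lets $C$ be its non-zero elements, with $q \ge p$ in $C$ exactly when $q \le p$ (here $q \subseteq p$) in the algebra. It then reads $p \Vdash \varphi$ as "$p \le$ the truth value of $\varphi$". Sentences are represented only by their truth values, and $\neg$, $\vee$, $\wedge$ are interpreted as complement, join and meet. Terms, quantifiers and the predicate $S$ are not modelled. The names `UNIVERSE`, `extensions` and `forces` are hypothetical. The check covers (a), the disjunction clause of (b), the conjunction clause of (c) and the consequence (h) for every pair of truth values.

```python
from itertools import combinations

# Hypothetical finite example: the power set of a four-element set.
UNIVERSE = frozenset(range(4))


def nonzero_elements(universe):
    """All non-empty subsets; C is the whole algebra minus 0 (trivially dense)."""
    items = sorted(universe)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


CONDITIONS = nonzero_elements(UNIVERSE)
ALGEBRA = [frozenset()] + CONDITIONS


def extensions(p):
    """Conditions q with q >= p in C, i.e. q <= p (q a subset of p) in the algebra."""
    return [q for q in CONDITIONS if q <= p]


def forces(p, value):
    """p forces a sentence whose Boolean truth value is `value` iff p <= value."""
    return p <= value


def check_axioms():
    """Check (a), the disjunction clause of (b), the conjunction clause of (c) and (h)."""
    for b1 in ALGEBRA:
        for b2 in ALGEBRA:
            for p in CONDITIONS:
                # (a): the truth value of "not phi" is the complement of b1.
                assert forces(p, UNIVERSE - b1) == (
                    not any(forces(q, b1) for q in extensions(p)))
                # (b): for every q >= p some r >= q forces one of the disjuncts.
                densely = all(any(forces(r, b1) or forces(r, b2) for r in extensions(q))
                              for q in extensions(p))
                assert forces(p, b1 | b2) == densely
                # (c): p forces a conjunction iff it forces both conjuncts.
                assert forces(p, b1 & b2) == (forces(p, b1) and forces(p, b2))
                # (h): whatever p forces is also forced by every q >= p.
                if forces(p, b1):
                    assert all(forces(q, b1) for q in extensions(p))
    return True


if __name__ == "__main__":
    print(check_axioms())  # True
```

In this atomic finite algebra the "for every $q \ge p$ there is $q' \ge q$" clause of (b) is always witnessed by a one-element subset, which is why the weak (double-negation) reading and the Boolean reading agree so directly here.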
If $\kappa$ is measurable, $\kappa$ is the $\kappa$th strongly inaccessible cardinal. (All these results are proved in [5]. In [5], the measurable cardinals are referred to as "uncountable cardinals not in $C_1$".) The following theorem is the key new idea of the present paper. It will be proved in §4.
THEOREM 3. The property of measurability is preserved under mild Cohen extensions. We now prove Theorem 1. Theorem 1 will follow immediately from Theorem 3 and the following theorem. THEOREM 4. Let $P(\alpha)$ be a property of cardinals which is preserved under mild Cohen extensions. Suppose further that $\vdash_{\mathrm{ZF}} \forall\alpha\,(P(\alpha) \to \alpha \text{ is a strongly inaccessible cardinal})$.
Then if ZF + "$\exists\alpha P(\alpha)$" is consistent, then so are the theories
ZF + "$\exists\alpha P(\alpha)$" + "$2^{\aleph_0} = \aleph_1$" and
ZF + "$\exists\alpha P(\alpha)$" + "$2^{\aleph_0} > \aleph_1$". Proof. To get the second result we use the original Cohen extension of [1] to "adjoin" $\aleph_2$ generic subsets of $\omega$. Then it is clear that $2^{\aleph_0} \ge \aleph_2$ in this Cohen extension. Moreover, the set of conditions has power $\aleph_2$. From our assumptions on $P$ it is clear that $\exists\alpha P(\alpha)$ holds in the Cohen extension if it holds in the ground model. We now prove the first result: We are going to "adjoin" a map $F\colon \aleph_1 \to S(\omega)$ to the ground model. ($S(\omega)$ is the power set of $\omega$.) A condition on $F$ will simply specify the restriction of $F$ to some countable ordinal, i.e. a condition will be a function on a countable ordinal into $S(\omega)$. Thus it is clear that $F$ maps $\aleph_1$ into the set of standard subsets of $\omega$. An easy argument shows that $F$ maps onto the standard subsets. Now the conditions are closed under countable increasing unions. It follows (by Lemma 7 below) that every subset of $\omega$ in the Cohen extension is standard. Thus "$2^{\aleph_0} \le \aleph_1$" holds in the extension, so the continuum hypothesis holds in the extension. Finally, the cardinality of the set of conditions is $2^{\aleph_0}$. Thus our assumptions on $P$ imply that $\exists\alpha P(\alpha)$ holds in the extension if it holds in the ground model. The proof of Theorem 4 is complete. DISCUSSION: Nearly all of the large cardinal properties that have been considered to date are preserved under mild Cohen extensions. For example this is true of the properties "Ramsey", "Strongly inaccessible", "Measurable", "Mahlo", "n-th-order indescribable", "Strongly compact", etc. For many of these properties, this fact is well-known. For "Strongly compact" this is a theorem of McAloon [8]. Let $P$ be some property of cardinals. We can express that there are many cardinals with the property $P$ by the axiom
(3)
$\forall\alpha\,\exists\beta\,(\beta > \alpha \text{ and } P(\beta))$
which asserts that there are arbitrarily large cardinals with the property $P$. An even stronger principle in this direction can be expressed using the concept of a normal function. A function $F\colon \mathrm{On} \to \mathrm{On}$ (On is the class of all ordinals) is normal if (1) $\alpha < \beta \to F(\alpha) < F(\beta)$, and (2) if $\lambda$ is a limit ordinal, $F(\lambda) = \sup\{F(\beta) : \beta < \lambda\}$. The following principle gives a scheme of first order axioms: (4)
$(\forall F)\,(F \text{ normal} \to \exists\alpha\,(F(\alpha) = \alpha \text{ and } P(\alpha)))$.
It is not difficult to show that (4) implies (3). (The function $\lambda \mapsto \alpha + \lambda$ is
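normal.) Lévy has shown in [6] that (4) is equivalent to the following reflection schema: $\forall\alpha\,(\exists\beta > \alpha)\,\bigl(P(\beta) \wedge (\forall x_1, \ldots, x_n \in R(\beta))\,(\varphi(x_1, \ldots, x_n) \leftrightarrow \varphi^{R(\beta)}(x_1, \ldots, x_n))\bigr)$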
(Here $R(\beta)$ is the set of all sets of rank less than $\beta$; $\varphi^{R(\beta)}$ is obtained from $\varphi$ by relativizing all quantifiers to $R(\beta)$.) LEMMA 5. Suppose that the class of conditions $C$ for some Cohen extension is a set and that the property $P$ of cardinals is preserved under mild Cohen extensions. Then if (3) (resp. (4)) holds in the ground model it holds in the extension. Proof. For (3) this is clear. Suppose then that (4) holds in the ground model. We show that it holds in the extension. Let $F$ be a class of ordered pairs in the extension. That is, $F$ is given by some formula $\psi(x, y, z)$ of $\mathcal{L}$ and a parameter $t$: $F = \{\langle x, y \rangle : \psi(x, y, t)\}$. Suppose also that some condition $p$ forces "$F$ is a normal function". It suffices to show that there is an ordinal $\lambda$ such that $p \Vdash \psi(\check{\lambda}, \check{\lambda}, t)$ and $P(\lambda)$, since then we get, by the absoluteness of "$x$ is an ordinal" (see (k)), $p \Vdash \check{\lambda} \text{ is an ordinal} \wedge \psi(\check{\lambda}, \check{\lambda}, t) \wedge P(\check{\lambda})$,
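and (4) holds in the Cohen extension by the second part of (b) and by (i). Suppose that $q \ge p$ and $q \Vdash (\psi(\check{\alpha}, \check{\beta}, t) \text{ and } \psi(\check{\alpha}, \check{\gamma}, t))$. Then since $p \Vdash$ "$F$ is a function", $q \Vdash \check{\beta} = \check{\gamma}$ and hence (by (k)) $\beta = \gamma$. It follows that $S_\alpha = \{\beta \mid (\exists q \ge p)\ q \Vdash \psi(\check{\alpha}, \check{\beta}, t)\}$ is a set. We define a function $G(\alpha)$ by transfinite induction: Let $\gamma$ be the least infinite cardinal following the cardinal of $C$. We put: $G(0) = \max(\mathrm{lub}\, S_0, \gamma)$; $G(\alpha + 1) = \max(\mathrm{lub}\, S_{\alpha+1}, G(\alpha) + 1)$; $G(\lambda) = \mathrm{lub}\{G(\alpha) : \alpha < \lambda\}$ for $\lambda$ a limit ordinal. Then $G$ is clearly normal. For $\alpha$ not a limit ordinal, we have, by construction, (5)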
$p \Vdash F(\check{\alpha}) \le \check{\beta}$, where $\beta = G(\alpha)$.
(Cf. the definition of $S_\alpha$.) Since $p \Vdash$ "$F$ is normal", it is easy to see that (5) holds also when $\alpha$ is a limit ordinal. Since (4) holds in the ground model, there is an ordinal $\lambda$ such that $G(\lambda) = \lambda$ and $P(\lambda)$. Since $G(\lambda) \ge G(0) \ge \gamma >$ the cardinal of $C$, $P(\lambda)$ holds in the extension. Since $p \Vdash$ "$F$ is increasing", and since $G(\lambda) = \lambda$, $p \Vdash \check{\lambda} \le F(\check{\lambda}) \le \check{\lambda}$. Thus $p \Vdash F(\check{\lambda}) = \check{\lambda}$ and $P(\check{\lambda})$. q.e.d. Using Theorem 3, Lemma 5 and the method of proof of Theorem 4, we can derive the following corollary.
THEOREM 6. If the theory ZF + "Every normal function has a measurable fixed point" is consistent then both "$2^{\aleph_0} = \aleph_1$" and "$2^{\aleph_0} \ne \aleph_1$" can be consistently added to it.
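The recursion defining $G$ in the proof of Lemma 5 can be mimicked on a finite initial segment of the ordinals. The Python sketch below is only an illustration. Each finite set of natural numbers stands in for $S_n$, lub becomes a maximum (with lub of the empty set equal to 0), and the limit clause is omitted because no limit ordinals occur below $\omega$. These are assumptions of the sketch, since in the proof the $S_\alpha$ are sets of ordinals and $\alpha$ ranges over all of On. The function name `finite_majorant` is hypothetical. The output is at least $\gamma$, strictly increasing, and dominates every $S_n$. This domination is the property behind (5) at non-limit stages.

```python
def finite_majorant(possible_values, gamma):
    """Finite analogue of G from the proof of Lemma 5 (illustrative only).

    possible_values[n] stands in for S_n, the values that some q >= p may
    force for F(n); only the zero and successor clauses of the recursion
    are used.
    """
    g = []
    for n, values in enumerate(possible_values):
        bound = max(values, default=0)  # "lub S_n" for a finite set of naturals
        if n == 0:
            g.append(max(bound, gamma))      # G(0) = max(lub S_0, gamma)
        else:
            g.append(max(bound, g[-1] + 1))  # G(n+1) = max(lub S_{n+1}, G(n) + 1)
    return g


def is_strictly_increasing(seq):
    return all(a < b for a, b in zip(seq, seq[1:]))


if __name__ == "__main__":
    S = [{1, 3}, {2}, set(), {10}, {4, 11}]
    G = finite_majorant(S, gamma=5)
    print(G)  # [5, 6, 7, 10, 11]
    assert is_strictly_increasing(G)
    assert all(g >= max(s, default=0) for g, s in zip(G, S))
```

Taking the maximum with $G(n) + 1$ is what forces strict monotonicity even when some $S_n$ is small or empty. In the transfinite version, continuity at limits is then imposed by the lub clause.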
We now prove Theorem 2. We first invoke a recent theorem of Silver [11] which states that if ZFM is consistent, so is ZFM + GCH. (GCH is the generalized continuum hypothesis, which states $2^{\aleph_\alpha} = \aleph_{\alpha+1}$ for all ordinals $\alpha$.) Our proof will be in two steps. In the first, we handle the values of $2^{\aleph_\alpha}$ for $\alpha \ge \kappa_0 + 1$. In the second step we handle the values of $2^{\aleph_\alpha}$ for $\alpha$ … $\kappa_0$ hold in the extension. A priori, the ordinal $\kappa_0$ need not be measurable in the extension. We show next, however, that the statement 3) $\kappa_0$ is the least measurable cardinal holds in the extension. Our proof will use the following lemma. LEMMA 7 (SOLOVAY [13]). Let $\lambda$ be an infinite cardinal. Suppose that the class of conditions has the property that every well-ordered increasing sequence of conditions of length at most $\lambda$ has an upper bound in $C$. Then $\forall x\,(x \cap \check{\lambda} \text{ is standard})$ holds in the Cohen extension.
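Proof. Let $p$ be a condition and $t$ a term. We show that there is $p' \ge p$ such that $p' \Vdash$ "$t \cap \check{\lambda}$ is standard". We first construct an increasing sequence $\{p_\alpha \mid 0 \le \alpha < \lambda\}$ of conditions such that 1) $p \le p_0$; 2) either $p_\alpha \Vdash \check{\alpha} \in t$ or $p_\alpha \Vdash \check{\alpha} \notin t$. There is no difficulty in constructing such a sequence by transfinite induction in view of our assumption on $C$. Let $p'$ be an upper bound for $\{p_\alpha \mid \alpha < \lambda\}$ in $C$. Let $a = \{\alpha < \lambda \mid p' \Vdash \check{\alpha} \in t\}$. We prove now $p' \Vdash t \cap \check{\lambda} = \check{a}$ as follows. By (f) it is enough to show $p' \Vdash \forall x\,(x \in \check{\lambda} \to (x \in t \leftrightarrow x \in \check{a})) \wedge \forall x\,(x \in \check{a} \to x \in t \wedge x \in \check{\lambda})$;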
this can be easily shown by means of the second part of (j). In view of (d), Lemma 7 is proved. Since the conditions we are considering are closed under unions of length at most $\kappa_0$, Lemma 7 implies that every subset of $\kappa_0$ in the Cohen extension is
standard. In exactly the same way we can also show that every subset of $\kappa_0 \times \kappa_0$ in the Cohen extension is standard, and therefore every function on a cardinal $\lambda < \kappa_0$ into $S(\kappa_0)$ in the Cohen extension is standard. It follows that if $I$ is a $\kappa_0$-complete non-principal prime ideal in $S(\kappa_0)$, $I$ is a $\kappa_0$-complete non-principal prime ideal on $S(\kappa_0)$ in the Cohen extension. Thus $\kappa_0$ remains measurable in the extension. Similarly, if $\lambda$ is a cardinal $< \kappa_0$, every subset of $\lambda$ is standard. It follows easily (since $2^\lambda$ is also less than $\kappa_0$) that $\lambda$ is measurable iff $\lambda$ is measurable in the Cohen extension. But $\kappa_0$ was the least measurable cardinal; therefore $\kappa_0$ is the least measurable cardinal in the Cohen extension. We have proved the following lemma. LEMMA 8. Assume ZFM is consistent. Then ZFM remains consistent if we add the following axioms: 1) $2^{\aleph_\alpha} = \aleph_{\alpha+1}$ if $\alpha \le \kappa_0$, 2) $2^{\aleph_\alpha} = \aleph_{G_2(\alpha)}$ if $\alpha > \kappa_0$ and $\aleph_\alpha$ is regular. (In 1), 2) $\kappa_0$ is an abbreviation for "the least measurable cardinal".) We shall now use another Cohen extension to deduce the consistency of the theory described in Theorem 2 from the consistency of the theory described in Lemma 8. The extension we use is again of the type considered in Easton [2]. For each regular cardinal $\aleph_\alpha < \Gamma$ (cf. the statement of Theorem 2 in §1) we adjoin $\aleph_{G_1(\alpha)}$ generic subsets of $\aleph_\alpha$. Then, assuming that 1) and 2) hold in the ground model, it follows by arguments in [2] that the statement "$2^{\aleph_\alpha} = \aleph_{G_1(\alpha)}$ for $\alpha \le \kappa_0$ and $\aleph_\alpha$ regular; $2^{\aleph_\alpha} = \aleph_{G_2(\alpha)}$ for $\alpha > \kappa_0$ and $\aleph_\alpha$ regular" holds in the Cohen extension. To complete the proof it suffices to show that "$\kappa_0$ is the least measurable cardinal" holds in the Cohen extension. Now an examination of Easton [2] shows that the cardinality of the set of conditions used is at most $2^\Gamma$. Since ZFM $\vdash \Gamma < \kappa_0$, and $\kappa_0$ is strongly inaccessible, Theorem 3 implies that $\kappa_0$ is measurable in the Cohen extension. Suppose next that for some condition $p$ and for some $\lambda < \kappa_0$, (6)
$p \Vdash \check{\lambda}$ is measurable.
We shall derive a contradiction. This will show that $\kappa_0$ is the least measurable cardinal in the Cohen extension and so complete the proof of Theorem 2. From (6), it follows that $p \Vdash \check{\lambda}$ is strongly inaccessible. It follows from the definition of strong inaccessibility and from (k) that $\lambda$ is strongly inaccessible. Moreover, since the definition of $\Gamma$ is "absolute" and ZFM $\vdash$ "$\Gamma$ is less than the least measurable cardinal", (6) implies that $p \Vdash \check{\Gamma} < \check{\lambda}$. It follows that $2^\Gamma < \lambda$. Since $\lambda < \kappa_0$, $\lambda$ is not measurable. Thus (6) contradicts the following lemma, which will be proved in §4 (and which is a partial converse to Theorem 3).
LEMMA 9. The property of being a "non-measurable cardinal" is preserved under mild Cohen extensions. This completes our present discussion of the proof of Theorem 2. REMARKS 1. R. Jensen [4] has announced the construction of a Cohen extension of an arbitrary model of ZF in which the generalized continuum hypothesis holds. It seems likely that, by using his techniques, one could prove a version of Theorem 2 for an arbitrary property, $P$, preserved under mild Cohen extensions. (The version would allow us, roughly speaking, to set $2^{\aleph_\alpha} = \aleph_{G_1(\alpha)}$ for regular $\aleph_\alpha < \Gamma$, where $G_1$ is as in Theorem 2, $\Gamma$ is a cardinal with an "absolute" definition and $\vdash_{\mathrm{ZF}} (\forall\alpha)(P(\alpha) \to \Gamma < \alpha \text{ and } \alpha \text{ is strongly inaccessible})$.) 2. We know of no systematic technique for proving "large cardinal" axioms compatible with GCH. In particular, the following questions are open, as far as we know: 1) Is ZF + GCH + "There is a compact cardinal" consistent (relative to ZF + "There is a compact cardinal")? 2) Same question as 1), but replace "There is a compact cardinal" by the axiom scheme "Every normal function has a measurable fixed point". §4. $\kappa$-complete prime ideals in Cohen extensions. The following theorem is a sharpened version of Theorem 3. THEOREM 10. Let $\kappa$ be a measurable cardinal and $I$ a non-principal $\kappa$-complete prime ideal of the power set of $\kappa$, $S(\kappa)$. Suppose that the class, $C$, of conditions of some Cohen extension has power less than $\kappa$. Then it is true in the extension that $I$ generates a non-principal $\kappa$-complete prime ideal $J = \{x \subseteq \kappa : (\exists y \in I)\ x \subseteq y\}$ of $S(\kappa)$. Proof.
Let t be a term and p a condition. Let T= {a< r:(3q~p)(q
II-0t~ t)}.
(Intuitively, T is the set of "possible" members of t N K.) Consider now the two possibilities: Te I and T ~ I: Case 1. T ~ I. It is now easy to prove p I[-Vx(x ~ K ~ (X E t ~ X ~ T)), using the second part of (j), (i), and the absoluteness of x e y which is stated in (r). Therefore we have, by the remark which follows (f), p II- tNK _~T. If we let J be an abbreviation for "the ideal generated by ! in S(x)," then it follows that p II-(tt3 K)~ J. Case 2. T ~ I. Since I is r-complete and the set of conditions, C, has power less than r, it follows that for some q ~ p, the set W={a
IF Qt~ t}
246
A. LI~VY A N D R. M. S O L O V A Y
[October
is not in I. Using the second part of (j) wc easily get q ll-W ~ t. Thus q fi-( z - t)~ Y, where V = r - W. Since I is prime, (r - W ) e I. It follows that
q II-(z -- t)~J" Our discussion shows that for any term t and any condition p, some extension of p forces (t n z) ~ J or (z - t) ~ J. It follows from (b) and (c) that 0 [~-J is prime. Since I is non-principal, 0 H-J is non-principal. To complete the proof, we show " J is u-complete" holds in the Cohen extension. We suppose the contrary and derive a contradiction. Let then p be a condition, and f a term such that p H-f is a function and domain (f) is an ordinal less than K and (Vx) (x ~ domain ( f ) -~ f ( x ) ~ J) and U range (jr) = u. By extending p, if necessary, we may suppose p IF- domain (f) = X, for some ordinal A < r. We let T~ be the set of "possible" members of f(~), given p. That is
T~={fl
IF"I~ef(0t))}
(~<2)
Suppose that for some 0~< 2 T~ 61 then, by the discussion of Case 2 above, there is a q ~ p such that W = {fl < x: q I}-pef(ot)} 6 I and hence q I}-f(0t)6J, which contradicts p I}-f(at)e J. Thus we have T~e I for every ~ < 2, and since I is K-complete also
We show this contradicts (8)
p i}-Lj range ( f ) = I¢.
Pick T e ( r - {,.J~<~T~). (Possible, since r ¢ I . ) For some q ~ p , and some ~ < ~, we have (by (8)) q It-T ef(ct). Thus ? e T~, which gives a contradiction. Theorem 10 is no longer true if the cardinality of C is > r. (In fact, following Cohen [1], we can choose a set of conditions of power r, so that 2 ~° >_-u in the Cohen extension. Measurable cardinals are strongly inaccessible so u is not measurable in the extension.) We now prove a strengthcned version of Lcmma 9. LEMMA 11. Suppose that r is a cardinal number and that the class of conditions, C, has cardinality less than r. It is true in the extension that if J is a K-complete non-principal prime ideal on the power set of K, then J is generated by some standard set 1 such that I is a non-principal r-complete prime ideal in S(x) in the ground model. Proof. Let p be a condition and J a term such that p H-J is a K-complete non-principal prime ideal in S(K). We must produce q ~ p, and an ideal I such that 1) I is a u-complete nonprincipal prime ideal in S(r), 2) q IV-l generates J in S(K).
1967] MEASURABLECARDINALS AND THE CONTINUUM HYPOTHESIS
247
The key fact is the following: (*) there is a q ~ p such that for every subset a of x, q II-aeJ or q I I - K - - a e J . Granted this fact, it is routine to check that I = {a
Iq tFa J}
is a r-complete non-principal prime ideal such that (9)
q It-I _ J.
By Theorem 10, I generates some x-complete prime ideal. By (9), we see that q forces this ideal to be J. It remains to prove (*). Suppose, to the contrary, that for each q ~ p, there is a decomposition x = aoq U a q1 ;
a qo O a ~l = 0
such that neither q I~-a°~ J nor q It-a~ ~ J . We divide x into equivalence classes by putting ~ ~ fl if for all q ~ p o ~ a q0 =- p
aO.
Then there are at most 2 cara~c)equivalence classes. Since K is strongly inaccessible, so is r, by (k), and therefore 2 cara
248
A. L]~VY AND R. M. SOLOVAY
4. R. Jensen, An imbedding theorem for countable Z F models (abstract), Ibid., 12 (1965), 720. 5. H. J. Keisler and A. Tarski, From accessible to inaccessible cardinals, Fund. Math., 53 (1964), 225-308. 6. A. l_~vy, Axiom schemata o f strong infinity in axiomatic set theory, Pacific L Math., 10 (1960), 223-238. 7. A. I.~vy, Definability in Axiomatic Set Theory I, Logic, Methodology, and Philosophy of Science, Proceedings of the 1964 International Congress (Y. Bar-Hillel, ed.), Amsterdam, 1966, 127-151. 8. K. McAloon, Some applications o f Cohen's method, doctoral dissertation, University of California, Berkeley, 1966. 9. D. Scott, Measurable cardinals and constructible sets, Bull. Acad. Polon. Sci., Ser. des Sci. Math., Astr. et Phys., 9 (1961), 521-524. 10. D. Scott and R. Solovay, Boolean-valued people looking at set theory, To appear in the Proceedings of the 1967 Summer Institute on Set Theory in Los Angeles. 11. J. H. Silver, The consistency o f the generalized continuum hypothesis with the existence o f a measurable cardinal (abstract), Notices Amer. Math. Soc., 13 (1966), 721. 12. J. H. Silver, The independence of Kurepa's conjecture and the unprovability o f a twocardinal conjecture in model theory (abstract), Ibid., 14 (1967), 415. 13. R. Solovay, Independence results in the theory of cardinals. I, IL Preliminary Report (abstract), Ibid., 10 (1963), 595. 14. R. Solovay, A model of set theory in which all sets of reals are Lebesgue measurable. To appear. THE HEBREWUNIVERSITYOF JERUSALEM, UNIVERSITYOF CALIFORNIA, BERKELEY,CALIFORNIA
¢p BY
CHARLES A. McCARTHY(I) ABSTRACT
The space cp is the class of operators on a Hilbert space for which the cp norm I Ttp= ltrace(T* T)p/2]1/p is finite. We prove many of the known results concermng cp in an elementary fashion, together with the result (new for 1 < p < 2) that cp is as uniformly convex a Banach space as lp. In spite of the remarkable parallel of norm inequalities in the spaces cp and In, we show that p ~ 2, no cn built on an infinite dimensional Hilbert space is equivalent to any subspace of any In or Lp space. 1. Introduction. This paper is devoted to a systematic study of the classes of compact operators on a Hilbert space known as c r Briefly, cp is the linear space of those operators T for which ] Tip = [tr(T*T) p/2] 1/p is finite. We show that there is a complete parallel between the spaces of operators cp and the sequence spaces lp, all the more surprising because no non-trivial cp is isometric to any subspace of any lp or Lp space nor is an infinite dimensional cp even bicontinuously imbeddable in any Ip or Lp space by a linear map. Our principal new result is that for 1 < p < 2, cp is uniformly convex and has the same modulus of convexity as lr In spite of the remarkable parallels in norm inequalities in the theory of cp and Ip spaces - - (analogues of the H r l d e r and Minkowski inequalities as well as Clarkson's inequalities which verify the uniform convexity of lp) h c, and Ip are very different as Banach spaces. The situation seems to be that extremal cases of the norm inequalities studied occur in commutative *-subalgebras of cp which are neces, sarily isometric to Ip. The non-commutativity of cp as an algebra (operator multiplication) seems to serve only to make the proofs of the theorems more involved. Most of our other results are not new, but we hope that our techniques are of some interest in themselves. We use principally the spectral theorem for self-adjoint operators and the polar decomposition. 
In particular we do not prove theorems in the finite dimensional case and then pass to a limit, nor do we make explicit use of the concept of tensor product, nor of the more sophisticatedinterpolation techniques. Most of the known results concerning cp may be found in Gohberg and Krein [5"], Dunford and Schwartz [4, pp. 1088-!144], Grothendieck [6], Dixmier [2], and Schatten [13, 14], and the references [therein. The beginnings of the subject, together with a number of related special theorems, Received, March 22, 1967 and in revised form July 20, 1967 (i) The author was supported by National ScienceFoundation Grant GP-5707. 249
250
CHARLES A. McCARTHY
[October
seems to be [11]. The spaces we consider are called Sp by Gohberg and Krein, Cp by Dunford and Schwartz, and Lap by Dixmier. We have chosen c for compact, lower case because the inclusion relations between the spaces cr are those of lp, not L r In fact, L~ has operator analogues which are spaces consisting mostly of unbounded operators; we shall study these in a later paper. Gohberg and Krein study the Orlicz space analogues of these spaces; the ratio of complexity to novelty remains high. It is a pleasure to thank Professor G. K. Kalisch for a number of interesting comments and suggestions, Miss Frances Frost and Professor N. Rivi6re who called our attention to errors of substance in an earlier version of this paper, and to Professors I. S. Gohberg, R. Kunze, and J. Stampfli who supplied invaluable references to the literature. Throughout this paper H will denote a fixed Hilbert space with noim[ • [ and inner product ( •, • ), the dimension of H is unimportant, p will range in the interval 0 < p < oo, and for 1 < p < oo, p' will always denote the conjugate exponent to p: (I/p) + (1/p') = 1. For linear operators A,B on H, we write A > B to mean that A and B are both self-adjoint and (Ax, x) > (Bx, x) for all x in H. We say A is positive if A > 0. We shall make frequent use of the polar decomposition of an operator, but in forms which are not quite usual; thus we include this development. We denote the range of an operator T by ~t(T) and the null space of T by .h/'(T); recall that ~ ( T ) " = Jlr(T*), ~ ' ( T ) " = 9t(T*). First notice that for any x in H, ITxl 2= (T*Tx, x) = ((T*T)*/2x, (T*T)X/2x) = I(T*T)I/2xI2. ThusrI(T)= rI((T*T) t/2, and upon taking orthogonal complements, ~t(T*) = 9I((T*T)*/2). Define U to be that linear operator for which U[.(T*T)I/2y + z]---Ty (y I X ( T ) , z EJff(T)), elements of the form (T*T)*/2y + z are dense in H and U has bound 1, so U may be extended uniquely to an operator (again denoted by U) on all of H. 
We also note that T = U(T*T)1/2 and further, U T*T = TT*U. It follows immediately by induction that U(T*T)"= (TT*)"U for all integral n > 0; thus for any polynomal tk, and hence for any Borel function ~p, Ut~(T*T) = ~p(TT*)U. In particular, with ¢k(t) = t 1/2 we have T U(T*T) 1/2 = (TT*)I/2U and with ~b(t)= t 1/4 we have T = U(T*T)*/4(T*T) 1/4 =(TT*)I/4U(T*T) 1/4. It may also be shown that U*U~(T*T) = ~b(T*T) = 4~(T*T)U*U for any Borel function ¢k which vanishes at zero. We will also use the trace of an operator. Suppose that an operator A is either positive or satisfies ~*l (Aft,, ~b,) I < oo for some orthonormal basis {~b,} of H. Then if {¢p} is any other orthonormal basis of H, the interchange of the order of summation is permissible in =
--
(A+,,¢,)
I(+..¢,)I*=
(A¢,.+,).
19671
cp
251
Thus the quantity ~,(A~b~, q~ ) is independent of the orthonormal basis {~b~}of H; we call this quantity the trace of A, denoted trA. If A is positive and compact, we may choose {~b~} to be an orthonormal basis for H consisting of eigenvectors for A; thus tr A is simply the sum of the eigenvalues of A enumerated with their multiplicities. It is also true that if ~ [ (Arks, ~b~)I < oo for an orthonormal basis {~b~}of H, that the eigenvalues of such an A are absolutely summable and that their sum is trA, but we will neither use nor need this fact. Now suppose that T is a compact operator on H. The operator T*T is positive and compact and has a unique positive square root which is also compact. The characteristic numbers of T are defined to be the eigenvalues p~ of ( T ' T ) 1/2 enumerated with their multiplicity; we arrange them in a decreasing sequence, at most countably many being greater than zero, as gl(T) > #2(T) > . . . > 0,
#n(T) ~ 0.
For 0 < p < 0% we define I TIp then cp-norm of T, whether finite or infinite, to be
I oo
I1/P
]Tip = {~t [~t.(T)]P I
I
= , ~ [.,(T)]'} I" = [tr(T*T)P/2] '/p.
We set [ T[ oo to be simply the operator norm of T. The class cp is the set of all T for which 1 TIp is finite. At this point, we wish to observe three facts which will be used throughout this work. Their proofs are sufficiently immediate to be omitted. LEMMA 1.1 a. I r b - - I ( r * T ) "
b.
b. I r A >=O, and r is a positive number, then {A Iw, = [ A [~. c. I f p < q ,
lTlp>]Tlq.
We now prove the preliminary theorem that ] Tip = ] T* IS" LEMMA 1.2. Let U be unitary. Then UT and T U have the same characteristic numbers as T. Thus for every p, I T]p = ] UT[p = ] TU[,. Proof. The squares of the characteristic numbers of UT are the eigenvalues of (UT)*(UT) = T*U*UT = T ' T , so that I~(UT) = #~Z(T); since characteristic numbers are non-negative, it follows that IA(UT) =/~(T). The squares of the characteristic numbers of T U are the eigenvalues of (TU)*(TU)= U*TT*TU; this operator is unitarily equivalent to, hence has the same eigenvalues as, T*T. THEOREM 1.3. T and T* have the same characteristic numbers (zero possibly excepted). Thus for every p, ]T]p = [ T* [p . Proof. It is a well-known fact that for any two e~ements a, b in a Banach algebra (such as a = T, b = T* in the algebra of all bounded operators on H),
252
CHARLES A. McCARTHY
[October
the spectrum of ab is equal to the spectrum of ba (zero possibly excepted). For our purposes however, we need also to keep track of the multiplicity of the spectrom; the theorem may be demonstrated by means of a perturbation argument from this general Banach-algebraic fact, but we prefer to give the following alternative proof. If H is finite-dimensional, then in the polar decomposition of T : T = U(T*T) 1/2 we may take U to be unitary. Thus T = U(T*T) ~/2 and T* = (T*T)~I2U * ; by lemma 1.2, both T and T* have the same characteristic numbers as ( T ' T ) ~/2. If H is not finite dimensional, the operator U appearing in the polar decomposition of T need not be unitary nor admit of replacement by a unitary. Thus we adopt the following procedure which is valid for H of any dimension. Consider the Hilbert space ~ which is the direct sum of H with itself :/~ = H@H. Let ~ = T @ 0 so that T*T = T * T @ 0; thus T and ~have the same characteristic numbers (zero possibly excepted). Similarly T* and ~* have the same characteristic numbers (zero possibly excepted). Let U be the partial isometry which appears in the polar decomposition of T: T = U(T*T) a/2. U is an isometry of ~((T*T) 1/2) onto ~(T) and vanishes on,W'((T*T)I/2). The operator 0 = U ~ 0 then maps :~((~,~)~/2) isometrically onto & ( ~ ) a n d vanishes on ,~V (( T* T) 1/2) The orthogonal complements of &((~* ~) i/2): ~((T* T) 2/2)± ~ H, and of ~ ( ~ ) : ~(T)'L @H, are of the same dimension and thus there exists an isometry 1~ of ~((~,~)~/2)_L onto ~ ( ~ ) ± which vanishes on ~((~,~)~/2. The operator 1~ = /~ + 1~ is an isometry o f ~ o n t o Dand hence is unitary; also ~ = g/(T* 2V)~/2, hence ~* = (~,~)1/2g/,. By Lemma 1,2, the characteristic numbers of ~ are those of (~,,f)~/2 which are those of T*. Thus (zero possibly excepted), the characteristic numbers of T are those of T*. 2. Norm inequalities in cp. 
In this section we will prove analogues of the H61der and Minkowski inequalities for cp as well as Clarkson's inequalities which demonstrate the uniform convexity of %(1 < p < o0). The importance of our first lemma cannot be over-emphasized. While it is no more than an elementary observation, it is the result which underlies all our computations. LEMMA 2.1.
Let A > O. Let x ~ H. Let ~ be a given positive real number. Then i f O < v < 1,
( A ~ x , x ) < ( A x , x)'Ix[2(I-~);
/f 1 =< r < oo, (a x,x) _>_(ax, x)l I f v ~ 1, equality implies that x is an eigenvector of A.
Proof. First suppose V >-1. Let E(. ) denote the spectral resolution of A. Then using the H/51der inequality we have
19671
et,
(Ax, x) = f ;
253
,~.(E(d2)x,x)
< [fo°° 2;'(E(d2)x,x)]'/' . [ £ °° 1 • (E(d2) x, x)] (r- 1)/v --
(A'x,x)*". I xl ="'-*)"
If 0 < ~ ~ 1, then apply what has just been proved to the operator A ~ and the number 1/v:(Arx, x) <=(Ar/v)x,x)~[x] m-r). Equality in the case ~ # 1 requires that the ratio of 2 ~ to 1 be constant on the support of the measure (E(d2)x,x); this support may thus contain at most one point Ao so we have Ax = 2oX. As an immediate consequence of Lemma 2.1 we obtain the following useful expressions for the ep norm of an operator. LEMMA2.2.*
IfO
thenlTl~=inf ~ [T4),[~';
/fE=
then [ Tl~= sup ~ ] T4),] ~
(The inf or sup is to be taken over all orthonormal bases of H). If p ~ 2, equality occurs if and only if {4),} is an orthonormal basis for H consisting of eigenvectors for T*T. If p~_ 1, then [Tip = sup { ~ , [ (T4),,~,,)IP) 1/~ where 4), and ff~ run over all pairs of orthonormal bases for H. Equality holds if and only if {4),} is an is an orthonormal basis for H consisting of eigenvectors for T*T and {tp,} an orthonormal basis for /-/ obtained by completing the orthonormal set
{(r4),/] r~,]): r4), # o}. Proof. Using Lemma 2.1 with y = p/2, we have for any orthonormal basis {4),} of H that
((T*T)'/z4),,4),)__< (T'T4),,,4),)p/2 = IT4),,[p
(p < 2),
((r'r:'=4),,,,) s (r'r4),,4,) ',= = [r4)=l"
{p_>_2).
Summing on ~, we see that
l rlg
= tr(T*T)'12<
[TIg = tr(T*T) ' / z >
•
i r4),l,
E IT4),1"
(p __<2),
(p >__2).
~t
* Part of Lemma 2.2 appears in Dunford and Schwartz [3] as Lemma XI.9.32. The condition "2 <=p" given there should read "p ~ 2." The lemma appears in toto in Gohberg and Krein
[5,p. 155].
254
CHARLES A. McCARTHY
[October
The conditions for equality in Lemma 2.1 show us that if p/2 # 1, ITI~= ~IT~I' if and only if every ~b~is an eigenvector of T*T. The last assertion is proved by using the polar decomposition
T = (TT*) TM U(T*T) 114, so
that E [(T~b~,0,)[P = E [((TT*)'/gU(T*T)'/4~,O~)I p
<= E t(T*T)~/'G l" I(TT*)"'GI" 1/2
1/2
= I Tl~/Zlr *1~/2 = I TI;. The conditions necesary and sufficient for equality may be seen directly from this chain of inequalities. Unfortunately, this last assertion fails for p < 1, as may be seen by considering
G
,
')
,
x~ 2
•
We may now prove analogues of the H~51der and Minkowski inequalities. TI-rEOREM2.3*. Let
TEcp,
SEcq.
Then
TSEc,
with
1/r= l / p + l/q,
0 < p, q, r < ~, and ITS I, <=I T[p IS 1~.Equality holds if and only if: if p, q < oo,
ITILP
( T ' T ) ~ is a multiple of (SS*)~ ; if p = ~ > q, P T * T e = where P is the orthogonal projection of H onto the range of SS*; if p < ~ = q, QSS*Q -a where Q is the orthogonal projection of H onto the range of T*T.
Isl~
Proof. First suppose p = ~ , q = r < oo. If q = r < 2, let {tp~} be an orthonormal basis for H consisting of eigenvectors for S*S. Then we have by Lemma 2.2
(2.3,1)
Irs[~< Y, ITs~Iq
Ot
q
q
g
To investigate the case of equality, we first notice that for the second inequality to be an equality we must have I rs~l = I TI~Is~I for every eigenvector of S*S. Consider those tk~ for which S*S~p~ = 2 ~# 0 and let the polar decomposition * Theorem 2.3 appears in A. Horn [7], derived from some inequalitiesof Weyl [4, p. 1079].
19671
cp
255
of S be S = U(S*S)1/2. Since U is an isometry on this set of ~b,, we have that {U~bat: 2at ¢ 0} = {271/2 SqUat:2, # 0} is an orthonormal set. Let us denote this set by {~bat},and notice that {~at} spans precisely ~(S) = ~(SS*) = ~(P). The equality ITSO, I=ITI~ISoatl yields I T ~ a t l - - I T I ~ I ~ , , I ; since T attains its norm on each ~kat,it must attain its norm everywhere on the span of the ~b,, so that T/[ is an isometry on the range of P; this is equivalent to PT*TP = [T]~P. Conversely, the condition PT*TP implies that S*T*TS~at=S*PT*TPS~kat --- ITl s*Psoat = I l s*s at so that if that is an eigenvector for S*S and the second inequality in (2.3.1) is an equality, then necessarily ~at is an eigenvector of T ' S ' S T and the first inequality in (2.3.1) must also be an equality. Continuing with the case p = oo q = r < ~ , suppose q > 2. Let {~bat} be an orthonormal basis for H consisting of eigenvectors for T ' S ' S T . Then by Lemma 2.2,
7"[
=
(2.3.2)
I TILP
ITslg= r,at ITS atl'<-lTl r,at Is atl lTl lsl
To have equality throughout forces the last inequality to be an equality. By Lemma 2.2, each ~, must be an eigenvector of S*S. That the middle inequality of (2.3.2) be an equality is equivalent to PT*TP = [T[LP just as previously. Conversely, if ~bat is an eigenvector of S*T*TS and PT*TP = then S*T*TS~p~---S*PT*TPS~pat = ]TI2S*S~pat so that ~batmust be an eigenvector of S*S and the last inequality of 2.3.2 must be an equality. Next we consider the case q = ~ , p = r < ~ . The isometry of the adjoint P__ $ ~ $ P mapping in every cp class (Theorem 1.3) shows that [TS[p-[S T * P[p=]S [o~]T* P]p = [S[~ I T]~, the middle inequality being wht we have just proved. We have also just proved that equality holds if and only if QSS*Q = Q(S*)*S*Q = IS ]2Q where Q is the orthogonal projection of H on ~(T*) = ~(T*T). This completes the proof in the case p = oo or q = o% so that throughout the remainder of the proof we take p < ~ , q < oo, r < p,q. We next prove the theorem in the case r _< 2. Together with r __<2, let us also assume for the moment that p >__2. Let {~bat}be an orthonormal basis for H consisting of eigenvectors for S'S: S*Sdpat = ;t~2if,. Using the polar decomposition of S : S = U(S*S) 1/2 where U is an isometry on the range of S'S, we see that {U~b,: 2at # 0} is again an orthonormal set; we complete this set of vectors to an orthonormal basis for H, denoted {Cat}. Now we use Lemma 2.2 first for r __<2, then for p ____2, and the H61der inequality for sequences (r/q + r/p = 1) to obtain
I T*ILP,
(2.3.3)
I rsl;
--- E
Z [TU2~q~at["
at
=
at
at
a:lz0atl'=<
--- ]s];I TI;.
a:
Iz0atl'
256
CHARLES A. McCARTHY
[October
If we have equality throughout, in particular we must have equality in the application of the HNder inequality, which requires that the ratio of I Sips]q = 2~ to I r0~l ~ be some constant independent of ~, generically denoted throughout by c. For the last inequality to be an equality requires ~b~ to be eigenvectors of T*T. Now2~SS*0~ = SS*S¢~ = S~2¢~ = ;t~a ~/~ so that 0~ are also eigenvectors for SS*. The facts that T * T and SS* are positive, have a common basis of eigenvector and that cI(T*T)P~/~ 11/2 = c I TO, Ip = JS*O,l ~ , together imply (T* T) P= c(SS*)'. Conversely, ( T ' T ) p = c(SS*) g may be seen to imply equality throughout (2.3.3); we leave the details to the reader. Continuing with the case r < 2, now assume p =< 2. Using the polar decomposition of T, we have
T = U(T*T) 1/2n... ( T ' T ) 112~, where there are n factors of (T'T)i/2n; n is chosen so that n > 2/p. Since T ~ cp, ( T * T ) ' I 2 ~ c , p and 1.1). Using what we have already proved ( n p > 2), we have
I(T*T)'~e'l~,=lTl~,/'(I.emma
(2.3.4)
I(r*r)~+.,:~sl .~÷<~÷,,.~,-, Z
I r~'*l
I(T*T¢'~'SI,,,~+~,., -, (k = 0,1,2,...,n - 1),
which yields
l(r*r)"es l,--< Irl, Isl,,
(2.3.5) Since I U 1~ -< 1, we have
ITsl,<-_ lv(r*TO'/~slr,<= lvl~lTlplSl,=lrl,lslq. Equality holds if and only if equality holds at every step of (2.3.4) and at (2.3.5). Equality at (2.3.4)implies in particular (k=0) that ((T'T)1/,),p = c(SS,),, which may be seen conversely to imply I TSl, = I T I , Islq. Finally we consider the case 2 < r. If n is an integer n ~ 1"/2, then
(s*r*rs~'~ ~,~,(r s
2)
and from what we have already proved, the operators (S*T*TS)"S*T*TS, (S*T*TS)nS*T*T, (S*T*TS)nS*T * and (S*T*TS)~S * also belong to various c classes, t < 2. After a certain amount of arithmetic, we find
I(S*T*TS) "+'1,/¢.+,, <=l(s*r*rs)"],/.] s* I,I r*l,I ml, Isl,. Using Lemma 1.1 and Theorem 1.3, we have
I rsl, ~`"÷'' z l rslf" I rl~ Isl~ fro~ which Its l, sir I,I sl~. We can have equality only if
19671
co
257
{ [ T , T S ( S , T , T S ) . ] [ ( S , T , TS)~S.T,T]}((z(.+ I)/')-OIq))-'= c(SS*)q ;
this (with c generically denoting some constant) is equivalent to S*{ T * T S ( S * T * T S ) Z " S * T * T}S = S*(SS*)- 1+(2q(.+~)/o S = c(S*S) (2q/°(~+l). or
( S * T * T S ) = c(S*S) q/'.
Thus the restriction of T * T to the range of S must be (SS*) q/'-I = (SS*)q/P; but then T * T must be zero on the orthogonal complement of &(S*S) if we are to have [TS[, = [Tlp[ SIq, so we must in fact have (T*IO p = c(SS*) q. That the condition ( T ' T ) p = c(SS*)qis sufficient for I Tsl, = I rl, IS]q may againbe easily verified and the theorem is complete. TamREM 2.4. Let T, S e % , l < p < o o . Then I r + s l p < - I r l , + l s l p , so that cp is a linear space and ] • Ip is a norm on %. I f p > 1, equality holds if and only if a T = bS f o r some a,b > O, a + b > 0. I f p = 1, equality holds if and only if T * T and S*S have a common set of eigenveetors {qS~} which f o r m an orthonormal basis for H such that for each ~ there exist a~,b~ > O, a~ + b~ > O, with a~rc~ = b~Sq6~, Proof.
From Lemma 2.2, let {4~}, {O~} be two orthonormal bases for H such
that ÷
The Minkowski inequality for sequences yields
IT
{
}l/p
{
~l/p
and by Lemma 2.2 again, this is dominated by [ r [ , + I Slp" If we have IT + S[p we see first from Lemma 2.2 that ~b, must be eigenvectors of both T * T and S ' S , and that (unless ZqS, = 0)~b~= Tdp~,/[Zdp~,I and (unless SqS~= 0) ~k, = SdpJ[ S~b~I" Equality in the use of the Minkowski inequality forces, for p > 1, constants a, b>=0 a + b > 0 , such that a(T~b~,~b,)= b(S~b~,~); since (T~,~p)=0=(Sq~,,~ba) for a ¢ f l , we must have a T = bS. Conversely, it is clear that these conditions imply IT + Sip = I Tip + Isl,. For p-- 1, equality in the use of the Minkowski inequality forces only constants a,,,b~> 0, a~ + b, > 0, such that a~(T~b,,ff~) = b,(S~b~,~k~); again (Z4~,,~bp)= 0 = (S~,~b~) for a ~ fl shows a,T4)~ = b~Sd?~ for all a. Conversely these conditions are surely sufficient to imply [ T l + $11 = [ Tit + I s l , . w e now continue with the estimates which show that %(1 < p < oo) is uniformly
--Irlp÷Islp,
258
CHARLES A. McCARTHY
[October
convex. Recall that a normed linear space X = {x} is said to be uniformly convex if the function 6(~) =
inf Ixl =IYl=l, Ix-Yl =6
1 - ~] x + y
is strictly positive in some range 0 < ~ < so. J. A. Clarkson [1] showed that the sequence spaces lp(1 < p < 0o) are uniformly convex by proving a number of sharp inequalities concerning the norm of elements in Ip; those which imply that lp is strictly convex are:
I x + y l r + l x - y l , ' < = ( l x l , + l y ] , ) ,/,'
(1 < p < 2 ) ,
Ix+yIp+ix-ylp<2p-,(ixIp+Iy]~
(2_<_p < oo).
and
If we take Ix] = l y l = 1, we have
I x + r l °' ==_2" - I x - y],'
<1 < p < 2),
I x + Y]" ----2' - Ix - Y]"
( 2 = < p < oo).
The modulus of convexity is thus given by 26(e) = 2 - (2 p" - eP') ~/p'
(1 < p < 2),
26(e) = 2 - ( 2 p - eP) 1/p
(2 __
We will now show that cp has the same modulus of convexity as Ip by demonstrating that Clarkson's inequalities hold in cr The conditions under which equality holds will simply be stated without proof; they only require checking the cases of equality in the inequalities within the proofs. Dixmier [2] used an interpolation theorem to show that cp and Ip have the same modulus of convexity in the range 2 < p < ~ ; our technique in this range uses only the HNder inequality. In view of the fact that the only known proof Clarkson's inequality for lp(1 < p < 2) seems to be the lengthly original demonstration of Clarkson, it is not suprising that the analogous result for cp must be at least as troublesome. We first state for reference some inequalities concerning real numbers. The proofs of the first two may be safely left to the reader. I n e q u a l i t y a: Let - a < b < a. Then
if0<~,
2~-I(a ~ + b ~ ) < ( a + b ) ~ + ( a - b )
if l < r < o o ,
2(a ~ + b ~ ) < ( a + b )
+(a-b)
~<2(a y+by);
~_~2~-l(a~+b~).
Cp
1967]
259
Inequality b: Let a > 0 b > 0. Then if0<~
(a+b)r
if 1 < ~ < ~ ,
( a + b ) ~ > a ~ + b y.
The third inequality which we require is the very deep inequality of Clarkson [1]. The forms in which we use this are
Inequality c:
Let - a~ < b, < a . Then for 1 < p < 2
{ ~ (a~+ b~)p} P'/P+ { ~ (a~- b~)P}p'/p < 2{ ~ a:+ I b~lPlp'Ip Let-a
Then f o r 2 < p < o o
la + bl + l a - b l
>__2(a"+ Ibis)
We now generalize these numerical inequalities to inequalities for operators. LEMMA 2.5.
Cpnorms of
Suppose A,B are operators on H and - A <_B < A. Then if0
tr(A+B)V+tr(A-B)~<2trAV;
ifl<e<~,
tr(A+B)V+tr(A-B)r>2trAL
If ~ # 1 and the quantities involved are finite, then equality holds if and only if AB = O. Proof. The lemma has non-trivial content only if A and B are compact; also A and B must be self-adjoint and A positive. Let {¢~} be an orthonormal basis for H consisting of eigenvectors for A:A¢~ = I ~ . Since -(A¢~,¢~) < (Bq~, ¢~) < (A¢~, ~b~), inequality a yields ((A + B)q~,,,~b.)r + ((A - B)~b,Ab,~)r < 2(Aq~,,,~b.)~' = 22~
(? < 1),
((A + B)~b,, q~,)~ + ((A - B)~b,, q~,)r> 2(A~b,, q~,)~ = 21~
(~ > 1).
Also, A + B ~ 0 and A - B > 0, so Lemma 2.1 yields ((a + B)~b,, q~,) + ((A - B)~b,, q~,) < ((A + B)qS,, ~b,)~+ ((A - e)~b,, ~b,)~ < 22~
(~, < 1)
((a + B)rq~,,, q~,,) + ((A - B)~b,~,~b,,) > ((a + B)~b,, ¢,)r + ((A - B)ck,, ~b,)' > 22~
(y > 1).
260
CHARLES A, McCARTHY
[October
Summing on 0c we finally obtain
(r < 1),
t r ( A + B ) ~ + t r ( A - B) ~< 2 ~ 2 v = 2 t r A ~ ~t
tr(A+B)~+tr(A-B) LEMMA 2.6.
~>_ 2 ~ 2 ~ = 2 t r A r
(r > 1).
Suppose A, B are operators on H and A >=0, B >=O. Then if0<~<_l,
t r ( A + B ) ~ =< t r A r + t r B r ;
i f l < = y < o% t r ( A + B ) ~ _>_ t r A V + t r B r. I f ~ ~ 1 and the quantities involved are finite, equality holds if and only if AB = O. Proof. First we find operators C, D on H such that C(A +B)I/2=A ~/2, D(A + B) ~/2 = B 1/2 and C*C + D*D = 1. [We do this as follows: B > 0 implies A < A + B, so 1,4 '/2x] = (Ax, x) '/2 < ((A + B)x,x) ':2 = I(A + B)'/2x I; similarly [B1/zx I < I( A + B)'/2xl " Let x = (A + B)l/Zu + v, where v e d ( A + B), u .I.~:(A + B); such x are dense in H, so C and D are uniquely determined if we require Cx = A1/2u, Dx = B1/2u + v. Clearly C(A + B) 1/2 = A 1/2 and D ( A + B ) I/2 =B1/2; C * C + D * D = I follows from / / 5 +c Dx x 2 = A,/2u 2 + B1/2u2+Iv2=(Au, u)+(Bu, u)+[v2= ( A - ( - B ) V 2 u Z + I v 2 = x] 2. Note also that Icxl-_ Ixl, Io l _-_Ixl] Then we have (A + B)' = (A + B) 'lz C*C(A + B) ~/2 + (A + B)r/2D*D(A + B)v/2, and tr(A + B)' = 1C(A + B) ':2 I~ + I D(A + B)':2I~
= I(A + B)'/2C* II +IrA + = trC(A + B f C * + trD(A + BfD*. To estimate tr C(A + B)~C *, let {¢~} be an orthonormal basis for H consisting of eigenvectors for A: A¢~ = 2~b~. Then by Lemma 2.1,
(C(A + BfC*¢~, ¢~) ~ ((A + B)C*¢~,C*¢~)~1C*¢~12(~-v)
= (A¢~,¢~) ~ = 2]
(y < 1),
(C(A + BfC*¢~, ¢~) >_
((A+B)C*C,,C*f,Ylc*¢,I*"-'>_>_(A~b,,¢,)~=
2~
(~>1).
(Recall that 1C*¢~12(1-v)=< I¢~[ 2 ( ' - ~ ) = 1 for ? =< 1, and IC*¢~[ - 2 ( ' - ) < l(k, I-2 1 - . = I for ? => 1). Thus summing on a yields trC(A + B)~C*~< trA ~
(y < 1),
trC(A + B)~C * >>- trA ~
(~ >= 1).
19671
cp
261
Similarly, trD(A + B)rD * < trB v
(~, _~ 1),
tr D(A + B)rD* > tr B'
(~, > 1).
and the lemma is proved. Lemmas 2.5 and 2.6 yield all of Clarkson's inequalities for %(1 < p < m), which we formulate as: THEOREM 2.7. Let T, S be operators on H. Then (i) . 2v-I(}T[~+]SI~ < [ T + S [ ~ + [ T - S ] ~ (ii).
__<2(]T[;+IS}~ )
[T+S[;'+[T-S[;'<=2([T[;+[S[~) v'/~
aii).2(Irl~+lSig<=lw+slg+lr-sl~<=2~-'([wl~+lSl9 (iv).
2(I TI; + I Sl~'"'=< I T+ sl;'+ I T - SI;'
(0
(2__
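If $p = 2$, equality always holds. If $p \ne 2$ and the quantities involved are finite, equality holds in (i) or (iii) if and only if $T^*TS^*S = 0$, and in (ii) or (iv) if and only if $T = S$ or $T = 0$ or $S = 0$.

Proof of (i) and (iii). First consider $p \le 2$. Then
$$|T+S|_p^p + |T-S|_p^p = \operatorname{tr}\big((T^*T + S^*S) + (T^*S + S^*T)\big)^{p/2} + \operatorname{tr}\big((T^*T + S^*S) - (T^*S + S^*T)\big)^{p/2}.$$
Lemma 2.5 is applicable with $A = T^*T + S^*S$, $B = T^*S + S^*T$ and $\gamma = p/2 \le 1$, because
$$|(Bx,x)| = |2\operatorname{Re}(Sx,Tx)| \le 2|Sx|\,|Tx| \le |Sx|^2 + |Tx|^2 = (Ax,x).$$
Thus, using Lemma 2.6 for the last step,
$$|T+S|_p^p + |T-S|_p^p \le 2\operatorname{tr}(T^*T + S^*S)^{p/2} \le 2(|T|_p^p + |S|_p^p).$$
Applying this to $T+S$ and $T-S$ in place of $T$ and $S$, we also obtain
$$|2T|_p^p + |2S|_p^p \le 2(|T+S|_p^p + |T-S|_p^p),$$
so that
$$2^{p-1}(|T|_p^p + |S|_p^p) \le |T+S|_p^p + |T-S|_p^p \le 2(|T|_p^p + |S|_p^p) \qquad (0 < p \le 2).$$
(iii) follows in exactly the same way, with the sense of all inequalities reversed.

Proof of (ii). Let $A$ and $B$ be as above, and let $\{\phi_\alpha\}$ be an orthonormal basis for $H$. We use inequality e, with $a_\alpha$ and $b_\alpha$ defined by $a_\alpha \pm b_\alpha = ((A \pm B)^{p/2}\phi_\alpha, \phi_\alpha)^{1/p}$, to obtain
$$|T+S|_p^{p'} + |T-S|_p^{p'} \le 2\Big\{\sum_\alpha \Big[\Big|\tfrac12\big(((A+B)^{p/2}\phi_\alpha,\phi_\alpha)^{1/p} + ((A-B)^{p/2}\phi_\alpha,\phi_\alpha)^{1/p}\big)\Big|^{p'} + \Big|\tfrac12\big(((A+B)^{p/2}\phi_\alpha,\phi_\alpha)^{1/p} - ((A-B)^{p/2}\phi_\alpha,\phi_\alpha)^{1/p}\big)\Big|^{p'}\Big]^{p/p'}\Big\}^{p'/p}.$$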
262
CHARLES A. McCARTHY
[October
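Apply inequality a to each of the summands on the right-hand side (with $\gamma = p > 1$) to obtain
$$|T+S|_p^{p'} + |T-S|_p^{p'} \le 2\Big\{\sum_\alpha 2^{-p}\,2^{p-1}\big[((A+B)^{p/2}\phi_\alpha,\phi_\alpha) + ((A-B)^{p/2}\phi_\alpha,\phi_\alpha)\big]\Big\}^{p'/p} = 2\big\{2^{-1}[\operatorname{tr}(A+B)^{p/2} + \operatorname{tr}(A-B)^{p/2}]\big\}^{p'/p}.$$
Use of Lemma 2.5, then Lemma 2.6 (with $\gamma = p/2 \le 1$), yields
$$|T+S|_p^{p'} + |T-S|_p^{p'} \le 2(\operatorname{tr}A^{p/2})^{p'/p} = 2\{\operatorname{tr}(T^*T + S^*S)^{p/2}\}^{p'/p} \le 2\{\operatorname{tr}(T^*T)^{p/2} + \operatorname{tr}(S^*S)^{p/2}\}^{p'/p} = 2(|T|_p^p + |S|_p^p)^{p'/p}.$$
Proof of (iv). Again set $A = T^*T + S^*S$ and $B = T^*S + S^*T$, and let $\{\phi_\alpha\}$ be an orthonormal basis for $H$. Since $p \ge 2$, we have $p/p' \ge 1$, and the triangle inequality in $l_{p/p'}$ yields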
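$$|T+S|_p^{p'} + |T-S|_p^{p'} \ge \Big\{\sum_\alpha \big[((A+B)^{p/2}\phi_\alpha,\phi_\alpha)^{p'/p} + ((A-B)^{p/2}\phi_\alpha,\phi_\alpha)^{p'/p}\big]^{p/p'}\Big\}^{p'/p}. \tag{2.7.1}$$
Inequality a, with $\gamma = p'$, yields a lower bound for each bracket $((A+B)^{p/2}\phi_\alpha,\phi_\alpha)^{p'/p} + ((A-B)^{p/2}\phi_\alpha,\phi_\alpha)^{p'/p}$ in terms of $((A+B)^{p/2}\phi_\alpha,\phi_\alpha)^{1/p} \pm ((A-B)^{p/2}\phi_\alpha,\phi_\alpha)^{1/p}$. Application of inequality c with $p \ge 2$ then yields
$$((A+B)^{p/2}\phi_\alpha,\phi_\alpha)^{p'/p} + ((A-B)^{p/2}\phi_\alpha,\phi_\alpha)^{p'/p} \ge 2\cdot 2^{-p'/p}\Big(\big[((A+B)^{p/2}\phi_\alpha,\phi_\alpha)^{1/p}\big]^p + \big[((A-B)^{p/2}\phi_\alpha,\phi_\alpha)^{1/p}\big]^p\Big)^{p'/p} = 2^{1-p'/p}\big[((A+B)^{p/2}\phi_\alpha,\phi_\alpha) + ((A-B)^{p/2}\phi_\alpha,\phi_\alpha)\big]^{p'/p}. \tag{2.7.2}$$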
The insertion of (2.7.2) into (2.7.1) then yields
$$|T+S|_p^{p'} + |T-S|_p^{p'} \ge 2\cdot 2^{-p'/p}\{\operatorname{tr}(A+B)^{p/2} + \operatorname{tr}(A-B)^{p/2}\}^{p'/p} \ge 2\cdot 2^{-p'/p}\{2\operatorname{tr}A^{p/2}\}^{p'/p} = 2(\operatorname{tr}A^{p/2})^{p'/p} \ge 2(|T|_p^p + |S|_p^p)^{p'/p},$$
the last two inequalities being applications of Lemmas 2.5 and 2.6 with $\gamma = p/2 \ge 1$.
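Lemma 2.6 and the Clarkson bounds (i) and (iii) can be sanity-checked numerically on small matrices. The following is a minimal sketch under our own assumptions. It works only with $2 \times 2$ complex matrices and computes $\operatorname{tr}X^\gamma$ from the closed-form eigenvalues of a $2 \times 2$ Hermitian matrix. The test matrices are drawn at random, and all function names are ours. A passing run is evidence, not a proof.

```python
import math
import random


def herm_eigs(h):
    """Eigenvalues of the 2x2 Hermitian matrix [[p, r], [conj(r), q]], given h = (p, q, r)."""
    p, q, r = h
    mid = (p + q) / 2
    rad = math.sqrt(((p - q) / 2) ** 2 + abs(r) ** 2)
    return mid + rad, mid - rad


def gram(m):
    """Return M*M as a Hermitian triple (p, q, r) for M = ((a, b), (c, d))."""
    (a, b), (c, d) = m
    return (abs(a) ** 2 + abs(c) ** 2,
            abs(b) ** 2 + abs(d) ** 2,
            a.conjugate() * b + c.conjugate() * d)


def tr_power(h, gamma):
    """tr H^gamma for a positive semidefinite Hermitian triple h."""
    # Clamp tiny negative eigenvalues produced by rounding.
    return sum(max(lam, 0.0) ** gamma for lam in herm_eigs(h))


def cp_pp(m, p):
    """|M|_p^p = tr (M*M)^{p/2}."""
    return tr_power(gram(m), p / 2)


def combine(x, y, sign):
    """Entrywise X + sign * Y for 2x2 matrices."""
    return tuple(tuple(u + sign * v for u, v in zip(rx, ry)) for rx, ry in zip(x, y))


def rand_matrix(rng):
    return tuple(tuple(complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2))
                 for _ in range(2))


def check_lemma_2_6(trials=500, seed=0):
    """tr(A+B)^g <= tr A^g + tr B^g for g <= 1, and >= for g >= 1 (A, B >= 0)."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = gram(rand_matrix(rng)), gram(rand_matrix(rng))
        a_plus_b = tuple(u + v for u, v in zip(a, b))
        for g in (0.25, 0.5, 1.0, 1.5, 3.0):
            lhs = tr_power(a_plus_b, g)
            rhs = tr_power(a, g) + tr_power(b, g)
            tol = 1e-9 * (1 + lhs + rhs)
            if (g <= 1 and lhs > rhs + tol) or (g >= 1 and lhs < rhs - tol):
                return False
    return True


def check_clarkson(trials=500, seed=1):
    """Theorem 2.7 (i) for 0 < p <= 2 and (iii) for p >= 2, on random 2x2 matrices."""
    rng = random.Random(seed)
    for _ in range(trials):
        t, s = rand_matrix(rng), rand_matrix(rng)
        for p in (0.5, 1.0, 1.5, 2.0, 3.0, 4.0):
            base = cp_pp(t, p) + cp_pp(s, p)
            mid = cp_pp(combine(t, s, 1), p) + cp_pp(combine(t, s, -1), p)
            low, high = (2 ** (p - 1), 2.0) if p <= 2 else (2.0, 2 ** (p - 1))
            tol = 1e-9 * (1 + mid)
            if not (low * base - tol <= mid <= high * base + tol):
                return False
    return True


print(check_lemma_2_6(), check_clarkson())
```

At $p = 2$ the two Clarkson bounds coincide, so the check also exercises the parallelogram law in $c_2$.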
To conclude this section, we prove the following special result for $0 < p < 1$. Professor Gohberg has informed us that this is a special case of a theorem of S. Ju. Rotfel'd [to appear in Functional Analysis and its Applications, 1, fasc. 3 (1967)].

THEOREM 2.8. If $0 < p < 1$, then $|T+S|_p^p \le |T|_p^p + |S|_p^p$. If $T$ and $S$ are in $c_p$, equality holds if and only if $T^*TS^*S = 0$. Thus, if $p < 1$, $\rho(S,T) = |T-S|_p^p$ is a metric on $c_p$.

Proof. First we demonstrate the theorem in the case $0 < p \le 2/3$. Since $(T^*T)^{1-p}$ and $(S^*S)^{1-p}$ are both dominated by $(T^*T)^{1-p} + (S^*S)^{1-p}$, there are operators $C$ and $D$, both of operator norm at most one, such that
$$(T^*T)^{(1-p)/2} = C[(T^*T)^{1-p} + (S^*S)^{1-p}]^{1/2} \quad\text{and}\quad (S^*S)^{(1-p)/2} = D[(T^*T)^{1-p} + (S^*S)^{1-p}]^{1/2}$$
(cf. the proof of Lemma 2.6). Using the polar decompositions of the operators $T$ and $S$, we have
$$T + S = U(T^*T)^{1/2} + V(S^*S)^{1/2} = [U(T^*T)^{p/2}C + V(S^*S)^{p/2}D]\,[(T^*T)^{1-p} + (S^*S)^{1-p}]^{1/2}.$$
Using our generalized Hölder inequality (Theorem 2.3) followed by the triangle inequality in $c_1$ (Theorem 2.4), we have
$$|T+S|_p \le |U(T^*T)^{p/2}C + V(S^*S)^{p/2}D|_1\,\big|[(T^*T)^{1-p} + (S^*S)^{1-p}]^{1/2}\big|_{p/(1-p)} \le \{|U(T^*T)^{p/2}C|_1 + |V(S^*S)^{p/2}D|_1\}\,\{\operatorname{tr}[(T^*T)^{1-p} + (S^*S)^{1-p}]^{p/2(1-p)}\}^{(1-p)/p}.$$
Using the fact that the operator norms of $C$, $D$, $U$ and $V$ are at most one, we have
$$|T+S|_p \le \{|(T^*T)^{p/2}|_1 + |(S^*S)^{p/2}|_1\}\,\{\operatorname{tr}[(T^*T)^{1-p} + (S^*S)^{1-p}]^{p/2(1-p)}\}^{(1-p)/p} = (|T|_p^p + |S|_p^p)\,\{\operatorname{tr}[(T^*T)^{1-p} + (S^*S)^{1-p}]^{p/2(1-p)}\}^{(1-p)/p}.$$
Since $0 < p \le 2/3$, we have $p/2(1-p) \le 1$, so that by Lemma 2.6
$$\operatorname{tr}[(T^*T)^{1-p} + (S^*S)^{1-p}]^{p/2(1-p)} \le \operatorname{tr}(T^*T)^{p/2} + \operatorname{tr}(S^*S)^{p/2} = |T|_p^p + |S|_p^p.$$
We thus have
$$|T+S|_p \le (|T|_p^p + |S|_p^p)(|T|_p^p + |S|_p^p)^{(1-p)/p} = (|T|_p^p + |S|_p^p)^{1/p}.$$
Now consider the case $2/3 < p < 1$. Let $Q$ denote the operator $[(T^*T)^{p/2} + (S^*S)^{p/2}]^{1/2}$, and let $C$, $D$ be operators such that $C^*C + D^*D = 1$, $(T^*T)^{p/4} = CQ = QC^*$ and $(S^*S)^{p/4} = DQ = QD^*$. The restriction $2/3 < p < 1$ allows us to write
$$T + S = U(T^*T)^{1/2} + V(S^*S)^{1/2} = [U(CQ)^{2/p-2}QC^*CQ^{3-2/p} + V(DQ)^{2/p-2}QD^*DQ^{3-2/p}]\,Q^{2/p-2}.$$
Using our generalized Hölder inequality, the triangle inequality in $c_1$, and the facts that $|U|_\infty \le 1$ and $|V|_\infty \le 1$, we have
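$$|T+S|_p \le \big|U(CQ)^{2/p-2}QC^*CQ^{3-2/p} + V(DQ)^{2/p-2}QD^*DQ^{3-2/p}\big|_1\,|Q^{2/p-2}|_{p/(1-p)} \le \{|(CQ)^{2/p-2}QC^*CQ^{3-2/p}|_1 + |(DQ)^{2/p-2}QD^*DQ^{3-2/p}|_1\}\,|Q^2|_1^{(1-p)/p}.$$
Assume for the moment that $|(CQ)^{2/p-2}QC^*CQ^{3-2/p}|_1 \le |QC^*CQ|_1$ and $|(DQ)^{2/p-2}QD^*DQ^{3-2/p}|_1 \le |QD^*DQ|_1$; we prove this below. Then
$$|T+S|_p \le \{\operatorname{tr}QC^*CQ + \operatorname{tr}QD^*DQ\}\,|Q^2|_1^{(1-p)/p} = \operatorname{tr}Q(C^*C + D^*D)Q\cdot(\operatorname{tr}Q^2)^{(1-p)/p} = (\operatorname{tr}Q^2)^{1/p} = (|T|_p^p + |S|_p^p)^{1/p}.$$
To show that $|(CQ)^{2/p-2}QC^*CQ^{3-2/p}|_1 \le |QC^*CQ|_1$, we use the generalized Hölder inequality once more:
$$|(CQ)^{2/p-2}QC^*CQ^{3-2/p}|_1 \le |(CQ)^{2/p-2}|_{p/(1-p)}\,|QC^*CQ^{3-2/p}|_{p/(2p-1)} = |QC^*CQ|_1^{(1-p)/p}\,|Q^{3-2/p}C^*CQ|_{p/(2p-1)}.$$
Let now $\{\phi_\alpha\}$ be an orthonormal basis for $H$ consisting of eigenvectors for $QC^*CQ$, so that $QC^*CQ\phi_\alpha = \lambda_\alpha\phi_\alpha$. For $2/3 < p < 1$ we have $1 < p/(2p-1) < 2$ and $0 < 3 - 2/p < 1$, so that Lemmas 2.2 and 2.1 yield
$$|Q^{3-2/p}C^*CQ|_{p/(2p-1)}^{p/(2p-1)} \le \sum_\alpha (Q^{2(3-2/p)}C^*CQ\phi_\alpha, C^*CQ\phi_\alpha)^{p/2(2p-1)} \le \sum_\alpha (Q^2C^*CQ\phi_\alpha, C^*CQ\phi_\alpha)^{(3p-2)/2(2p-1)}\,|C^*CQ\phi_\alpha|^{(2-2p)/(2p-1)}.$$
Since $|C^*|_\infty \le 1$ and $(2-2p)/(2p-1) > 0$,
$$|C^*CQ\phi_\alpha|^{(2-2p)/(2p-1)} \le |CQ\phi_\alpha|^{(2-2p)/(2p-1)} = \lambda_\alpha^{(1-p)/(2p-1)}.$$
Also, $(Q^2C^*CQ\phi_\alpha, C^*CQ\phi_\alpha)^{(3p-2)/2(2p-1)} = |QC^*CQ\phi_\alpha|^{(3p-2)/(2p-1)} = \lambda_\alpha^{(3p-2)/(2p-1)}$. It follows that
$$|Q^{3-2/p}C^*CQ|_{p/(2p-1)}^{p/(2p-1)} \le \sum_\alpha \lambda_\alpha^{(3p-2)/(2p-1)}\lambda_\alpha^{(1-p)/(2p-1)} = \sum_\alpha \lambda_\alpha,$$
so that
$$|Q^{3-2/p}C^*CQ|_{p/(2p-1)} \le \Big(\sum_\alpha\lambda_\alpha\Big)^{(2p-1)/p} = |QC^*CQ|_1^{(2p-1)/p}.$$
Therefore $|(CQ)^{2/p-2}QC^*CQ^{3-2/p}|_1 \le |QC^*CQ|_1^{(1-p)/p}\,|QC^*CQ|_1^{(2p-1)/p} = |QC^*CQ|_1$.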
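Theorem 2.8 can be sanity-checked numerically in the same way. The following is a minimal sketch, assuming random $2 \times 2$ complex matrices; this is our own test setting, not the paper's. Singular values are obtained from $s_1^2 + s_2^2 = |M|_F^2$ and $s_1 s_2 = |\det M|$. The sketch checks $|T+S|_p^p \le |T|_p^p + |S|_p^p$ for several $p < 1$. It also uses two orthogonal diagonal projections of our choosing to show why the $p$-th power is needed: $|\cdot|_p$ itself fails the triangle inequality when $p < 1$.

```python
import math
import random


def singular_values(m):
    """Singular values of M = ((a, b), (c, d)) from |M|_F^2 = s1^2 + s2^2 and |det M| = s1 s2."""
    (a, b), (c, d) = m
    fro2 = abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2 + abs(d) ** 2
    det2 = abs(a * d - b * c) ** 2
    disc = math.sqrt(max(fro2 ** 2 / 4 - det2, 0.0))
    return math.sqrt(fro2 / 2 + disc), math.sqrt(max(fro2 / 2 - disc, 0.0))


def cp_pp(m, p):
    """|M|_p^p, the sum of the p-th powers of the singular values."""
    return sum(s ** p for s in singular_values(m))


def add(x, y):
    return tuple(tuple(u + v for u, v in zip(rx, ry)) for rx, ry in zip(x, y))


def check_theorem_2_8(trials=1000, seed=2):
    """|T+S|_p^p <= |T|_p^p + |S|_p^p on random 2x2 complex matrices, 0 < p < 1."""
    rng = random.Random(seed)

    def rand_matrix():
        return tuple(tuple(complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2))
                     for _ in range(2))

    for _ in range(trials):
        t, s = rand_matrix(), rand_matrix()
        for p in (0.2, 0.5, 2 / 3, 0.8, 0.95):
            if cp_pp(add(t, s), p) > (cp_pp(t, p) + cp_pp(s, p)) * (1 + 1e-9):
                return False
    return True


# Two orthogonal rank-one projections, chosen for illustration.
P1 = ((1, 0), (0, 0))
P2 = ((0, 0), (0, 1))
p = 0.5
print(check_theorem_2_8())
print(cp_pp(add(P1, P2), p), cp_pp(P1, p) + cp_pp(P2, p))  # p-th powers: 2.0 2.0
print(cp_pp(add(P1, P2), p) ** (1 / p),
      cp_pp(P1, p) ** (1 / p) + cp_pp(P2, p) ** (1 / p))  # "norms": 4.0 2.0
```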
This demonstration used only the fact that $|C|_\infty \le 1$, so $C$ may be replaced by $D$ throughout.

3. $c_p$ is complete. Having shown that $c_p$ is a metric space, we proceed to prove that $c_p$ is complete in this metric. First notice that since $|T|_p \ge |T|_\infty$, any Cauchy sequence $\{T_n\}$ in the metric of $c_p$ must be a Cauchy sequence in operator norm. It follows that $T_n$ must converge uniformly to an operator $T$. Lemma 3.1 below will then show that $T \in c_p$ and that $|T - T_n|_p \to 0$.
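LEMMA 3.1. Let $T_n$ be a sequence of operators which converges uniformly to $T$. Then $|T|_p \le \liminf_{n\to\infty}|T_n|_p$.

Proof. Since $T_n$ converges uniformly to $T$, $T_n^*$ converges uniformly to $T^*$, and $A_n = T_n^*T_n$ thus converges uniformly to $A = T^*T$. It follows that for any fixed $\gamma > 0$, $A_n^\gamma$ converges uniformly to $A^\gamma$. [To see this, let $M = \sup_n |A_n|_\infty$, and let $p(t)$ be a polynomial such that $|p(t) - t^\gamma| < \varepsilon$ for $0 \le t \le M$. Then $|p(A_n) - A_n^\gamma|_\infty \le \varepsilon$ and $|p(A) - A^\gamma|_\infty \le \varepsilon$. Since $A_n$ converges uniformly to $A$, $p(A_n)$ converges uniformly to $p(A)$. Thus $|A_n^\gamma - A^\gamma|_\infty < 3\varepsilon$ for all sufficiently large $n$.] Now let $\{\phi_\alpha\}$ be any orthonormal basis for $H$ and let $\sigma$ be any finite set of indices. Since $A_n^\gamma$ converges to $A^\gamma$, $|A_n^\gamma\phi_\alpha|^2$ converges to $|A^\gamma\phi_\alpha|^2$, and thus
$$\sum_{\alpha\in\sigma}|A^\gamma\phi_\alpha|^2 = \lim_{n\to\infty}\sum_{\alpha\in\sigma}|A_n^\gamma\phi_\alpha|^2 \le \liminf_{n\to\infty}\sum_\alpha|A_n^\gamma\phi_\alpha|^2 = \liminf_{n\to\infty}|A_n^\gamma|_2^2.$$
Since this holds for all finite sets $\sigma$, we must have
$$|A^\gamma|_2^2 = \sup_\sigma\sum_{\alpha\in\sigma}|A^\gamma\phi_\alpha|^2 \le \liminf_{n\to\infty}|A_n^\gamma|_2^2.$$
If we take $\gamma = p/4$, we then have
$$|T|_p^p = |A^{p/4}|_2^2 \le \liminf_{n\to\infty}|A_n^{p/4}|_2^2 = \liminf_{n\to\infty}|T_n|_p^p.$$

COROLLARY 3.2. $c_p$ is complete.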
Proof. Let $\{T_n\}$ be a Cauchy sequence in $c_p$ and let $T$ be the operator to which $T_n$ converges uniformly. Then for each fixed $n$, the operators $T_n - T_m$ converge uniformly to $T_n - T$ as $m \to \infty$. Using Lemma 3.1, we have $|T_n - T|_p \le \liminf_{m\to\infty}|T_n - T_m|_p$ … Also, we have demonstrated the uniform convexity of $c_p$ for $1 < p < \infty$; thus, by a theorem of Pettis [12], $c_p$ must be reflexive for $1 < p < \infty$. The norm inequalities of Theorem 2.7 show that the norm in $c_2$ satisfies the parallelogram law, and thus $c_2$ is a Hilbert space. We now show that, in a natural way, the dual space of $c_p$ is $c_{p'}$ $(1 < p < \infty)$. We first show that operators in $c_1$ possess a trace.

LEMMA 4.1. If $T \in c_1$, then $\operatorname{tr}T$ exists and $|\operatorname{tr}T| \le |T|_1$.
Proof. Lemma 2.2 actually shows that if $\{\phi_\alpha\}$ and $\{\psi_\alpha\}$ are any two orthonormal bases for $H$, then $\sum_\alpha|(T\phi_\alpha,\psi_\alpha)| \le |T|_1$. In particular, we may take $\psi_\alpha = \phi_\alpha$.

Our next two theorems show that $c_p^* = c_{p'}$ in a natural way. First we show $c_{p'} \subset c_p^*$, and then $c_p^* \subset c_{p'}$.

THEOREM 4.2. Let $1 < p < \infty$, and let $S \in c_{p'}$. Then $F(T) = \operatorname{tr}(ST)$ is a continuous linear functional on $c_p$ with norm precisely $|S|_{p'}$, attained when $T = V^*(SS^*)^{p'/2p}$ ($V$ is the partial isometry occurring in the polar decomposition of $S$).

Proof. For any $T \in c_p$, $ST$ is in $c_1$ (Theorem 2.3) and
$$|ST|_1 \le |S|_{p'}\cdot|T|_p,$$
so that $\operatorname{tr}(ST)$ exists and may be computed by $\operatorname{tr}ST = \sum_\alpha(ST\phi_\alpha,\phi_\alpha)$ for any orthonormal basis $\{\phi_\alpha\}$ of $H$. Because $\operatorname{tr}ST$ may be computed this way, $\operatorname{tr}S(T_1 + T_2) = \operatorname{tr}ST_1 + \operatorname{tr}ST_2$ and $\operatorname{tr}S(aT) = a\operatorname{tr}ST$, so $\operatorname{tr}ST$ is in fact linear on $c_p$. That $\operatorname{tr}ST$ is bounded with bound $|S|_{p'}$ follows from $|\operatorname{tr}ST| \le |S|_{p'}\cdot|T|_p$. It is easy to check that $\operatorname{tr}ST = |S|_{p'}\cdot|T|_p$ when $T = V^*(SS^*)^{p'/2p}$.

THEOREM 4.3. Let $F(T)$ be any bounded linear functional on $c_p$ $(1 < p < \infty)$. Then there exists an $S$ in $c_{p'}$ such that $F(T) = \operatorname{tr}ST$. (By Theorem 4.2, $|S|_{p'}$ is the bound of $F$; it is also clear that $S$ is uniquely defined by $F$.)
Proof. It is no loss of generality to assume that $F$ has bound 1. Choose $T_n \in c_p$ so that $|T_n|_p = 1$ and $F(T_n) \to 1$. Then $|F(T_n + T_m)| \le |T_n + T_m|_p$, but $F(T_n + T_m) \to 2$ as $n, m \to \infty$. Since $c_p$ is uniformly convex, $|T_n - T_m|_p \to 0$, and thus $T_n$ converges to some operator $T \in c_p$ with $|T|_p = 1$. We need now only to show that $F(R) = \operatorname{tr}SR$ for every $R \in c_p$. Consider the functional $G$ on $c_p$ given by $G(R) = F(R) - \operatorname{tr}SR$. Let $a = \frac12|G|$, and let $A \in c_p$ with $|A|_p = 1$ be such that $G(A) = 2a$. [That $G$ attains its bound follows from the uniform convexity of $c_p$, just as did the fact that $F$ attains its bound.] We then have $F(A) = z + a$ and $\operatorname{tr}SA = z - a$ for some complex number $z$. Consider first the case $1 < p \le 2$. Since $|F| = 1$, $|S|_{p'} = 1$, $|A|_p = 1$ and $|T|_p = 1$, Theorem 2.7 (i) gives
$$|F(T + \varepsilon A)|^p + |\operatorname{tr}S(T - \varepsilon A)|^p \le |T + \varepsilon A|_p^p + |T - \varepsilon A|_p^p \le 2(1 + \varepsilon^p) \qquad (\varepsilon > 0,\ 1 < p \le 2). \tag{4.3.1}$$
The left-hand side of (4.3.1) is
$$\{(1 + \varepsilon\operatorname{Re}z + \varepsilon a)^2 + (\varepsilon\operatorname{Im}z)^2\}^{p/2} + \{(1 - \varepsilon\operatorname{Re}z + \varepsilon a)^2 + (\varepsilon\operatorname{Im}z)^2\}^{p/2} = 2 + 2p\varepsilon a + O(\varepsilon^2)$$
as $\varepsilon \to 0$.
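The expansion of the left-hand side of (4.3.1) can be checked numerically. The following is a minimal sketch with function names and sample values of $p$, $z$, $a$ that are ours, chosen only for illustration. It evaluates the left-hand side as a function of $\varepsilon$ and estimates its slope at $\varepsilon = 0$ by a central difference; the slope should be close to $2pa$. It then shows that, for these sample values with $a > 0$, the bound $2(1 + \varepsilon^p)$ already fails at a small $\varepsilon$ when $p > 1$, which is the mechanism that forces $a = 0$. Replacing $p$ by $p'$ gives the left-hand side of (4.3.2).

```python
def lhs(eps, p, z, a):
    """{(1 + eps Re z + eps a)^2 + (eps Im z)^2}^{p/2} + {(1 - eps Re z + eps a)^2 + (eps Im z)^2}^{p/2}."""
    x, y = z.real, z.imag
    plus = ((1 + eps * x + eps * a) ** 2 + (eps * y) ** 2) ** (p / 2)
    minus = ((1 - eps * x + eps * a) ** 2 + (eps * y) ** 2) ** (p / 2)
    return plus + minus


def slope_at_zero(p, z, a, h=1e-6):
    """Central-difference estimate of the derivative in eps at eps = 0 (expected: 2 p a)."""
    return (lhs(h, p, z, a) - lhs(-h, p, z, a)) / (2 * h)


p, z, a = 1.5, 0.3 + 0.7j, 0.2      # sample values, for illustration only
print(lhs(0.0, p, z, a))            # 2.0
print(slope_at_zero(p, z, a), 2 * p * a)
eps = 1e-3
print(lhs(eps, p, z, a) > 2 * (1 + eps ** p))  # True: (4.3.1) would fail if a > 0
```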
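Thus $2 + 2p\varepsilon a + O(\varepsilon^2) \le 2(1 + \varepsilon^p)$. Dividing by $2\varepsilon$, letting $\varepsilon$ tend to zero, and recalling that $p > 1$, we see that $a \le 0$. Since $a = \frac12|G| \ge 0$, it follows that $a = 0$. That is to say, $|G| = 0$, and thus $F(R)$ is in fact equal to $\operatorname{tr}SR$ for all $R \in c_p$. Now consider the case $p > 2$. We have again from Theorem 2.7 (iv) that
$$|F(T + \varepsilon A)|^{p'} + |\operatorname{tr}S(T - \varepsilon A)|^{p'} \le |T + \varepsilon A|_p^{p'} + |T - \varepsilon A|_p^{p'} \le 2(1 + \varepsilon^{p'})^{p/p'}. \tag{4.3.2}$$
The left-hand side of (4.3.2) is equal to
$$\{(1 + \varepsilon\operatorname{Re}z + \varepsilon a)^2 + (\varepsilon\operatorname{Im}z)^2\}^{p'/2} + \{(1 - \varepsilon\operatorname{Re}z + \varepsilon a)^2 + (\varepsilon\operatorname{Im}z)^2\}^{p'/2} = 2 + 2p'\varepsilon a + O(\varepsilon^2) \qquad (\varepsilon \to 0),$$
while the right-hand side of (4.3.2) is equal to $2 + (2p/p')\varepsilon^{p'} + O(\varepsilon^{2p'})$. Again letting $\varepsilon \to 0$, and recalling that $p < \infty$ so that $p' > 1$, we see that $a = \frac12|G| = 0$. Hence $F(R) = \operatorname{tr}SR$ for all $R$.

5. Concluding remarks. We have exhibited a remarkable parallel between the spaces of operators $c_p$ and the sequence spaces $l_p$. In fact, all our norm inequalities for $c_p$ contain the same inequalities for $l_p$. To see this, it suffices to show that there exists an isometric imbedding of $l_p$ into $c_p$. Let $\{\phi_\alpha\}$ be any orthonormal basis for $H$ and let $P_\alpha$ be the orthogonal projection of $H$ onto the span of $\phi_\alpha$, defined by $P_\alpha\phi_\beta = \delta_{\alpha\beta}\phi_\alpha$. For every sequence $\xi = \{\xi_\alpha\} \in l_p$, let $T = T_\xi = \sum_\alpha\xi_\alpha P_\alpha$. Then $T^*T = \sum_\alpha|\xi_\alpha|^2P_\alpha$, so the characteristic numbers of $T$ are the $|\xi_\alpha|$, and $|T_\xi|_p = |\xi|_p$.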
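Here is a small numerical illustration of the imbedding $\xi \mapsto T_\xi$. It assumes a two-dimensional space and a sample $\xi$ of our choosing. The helper computes singular values of an arbitrary $2 \times 2$ matrix, so nothing specific to diagonal matrices is built in. For the diagonal $T_\xi$ the singular values come out as $|\xi_1|$ and $|\xi_2|$, so its $c_p$ norm agrees with the $l_p$ norm of $\xi$ for every $p$.

```python
import math


def singular_values(m):
    """Singular values of M = ((a, b), (c, d)) from |M|_F^2 = s1^2 + s2^2 and |det M| = s1 s2."""
    (a, b), (c, d) = m
    fro2 = abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2 + abs(d) ** 2
    det2 = abs(a * d - b * c) ** 2
    disc = math.sqrt(max(fro2 ** 2 / 4 - det2, 0.0))
    return math.sqrt(fro2 / 2 + disc), math.sqrt(max(fro2 / 2 - disc, 0.0))


def cp_norm(m, p):
    return sum(s ** p for s in singular_values(m)) ** (1 / p)


def lp_norm(xi, p):
    return sum(abs(x) ** p for x in xi) ** (1 / p)


xi = (3 + 4j, -2.0)                # sample sequence (xi_1, xi_2)
T_xi = ((xi[0], 0), (0, xi[1]))    # T_xi = xi_1 P_1 + xi_2 P_2 in the basis {phi_1, phi_2}
print(singular_values(T_xi))       # (5.0, 2.0): the moduli |xi_1|, |xi_2|
for p in (0.5, 1.0, 3.0):
    print(p, cp_norm(T_xi, p), lp_norm(xi, p))
```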
Since $c_p$ shares so many properties with $l_p$, one would like to be reassured that $c_p$ is not, in fact, an $l_p$ or $L_p$ space, or even some subspace thereof. Of course $c_2$ is a Hilbert space. For $p \ne 2$, however, Lemma 1.2 shows that $c_p$ has far too many isometries to be an $l_p$ or $L_p$ space. Perhaps the most convincing demonstration is to actually calculate some norms when $p \ne 2$. Any class $c_p$ of operators on a Hilbert space of dimension exceeding 1 has a subspace which may be represented as $2 \times 2$ matrices of the form $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$. Suppose that this subspace of $c_p$ were isometric with some four-dimensional subspace of some space $L_p(\Omega, d\mu)$, with $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ corresponding to the function $af + bg + ch + dk$. Take $a = 1$ and $b = c = d = 0$, and equate the $c_p$ norm of $\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ with the $L_p$ norm of $f$; this gives $\int|f|^p = 1$, and similarly $\int|g|^p = \int|h|^p = \int|k|^p = 1$.
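The argument exploits the fact that, under the presumed isometry, the $c_p$ norm of $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ would depend only on $|a|$, $|b|$, $|c|$ and $|d|$. A direct computation shows that this fails for $p \ne 2$. The sketch below uses a pair of matrices of our own choosing, $J = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$ and $W = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$, whose entries all have modulus 1. $J$ has singular values $2, 0$ and $W$ has singular values $\sqrt2, \sqrt2$. Hence $|J|_p^p = 2^p$ and $|W|_p^p = 2\cdot 2^{p/2}$, which agree only at $p = 2$.

```python
import math


def singular_values(m):
    """Singular values of M = ((a, b), (c, d)) from |M|_F^2 = s1^2 + s2^2 and |det M| = s1 s2."""
    (a, b), (c, d) = m
    fro2 = abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2 + abs(d) ** 2
    det2 = abs(a * d - b * c) ** 2
    disc = math.sqrt(max(fro2 ** 2 / 4 - det2, 0.0))
    return math.sqrt(fro2 / 2 + disc), math.sqrt(max(fro2 / 2 - disc, 0.0))


def cp_pp(m, p):
    """|M|_p^p, the sum of the p-th powers of the singular values."""
    return sum(s ** p for s in singular_values(m))


J = ((1, 1), (1, 1))    # rank one: singular values 2, 0
W = ((1, 1), (1, -1))   # sqrt(2) times a unitary: singular values sqrt(2), sqrt(2)
for p in (1.0, 1.5, 2.0, 3.0):
    # Same entry moduli, yet the c_p quantities differ unless p = 2.
    print(p, cp_pp(J, p), cp_pp(W, p), 2 ** p, 2 * 2 ** (p / 2))
```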
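Next we consider $\begin{pmatrix} 1 & 0 \\ 0 & e^{i\theta} \end{pmatrix}$ to see that $2 = \int|f + e^{i\theta}k|^p$ for every $\theta$. Integrating in $\theta$ and using the Hölder inequality, we have for $p < 2$
$$4\pi = \int_\Omega\int_0^{2\pi}|f + e^{i\theta}k|^p\,d\theta\,d\mu \le \int_\Omega (2\pi)^{1-p/2}\Big[\int_0^{2\pi}|f + e^{i\theta}k|^2\,d\theta\Big]^{p/2}d\mu = 2\pi\int_\Omega(|f|^2 + |k|^2)^{p/2}\,d\mu \le 2\pi\int_\Omega(|f|^p + |k|^p)\,d\mu = 4\pi,$$
so equality holds throughout. Since $(|f|^2 + |k|^2)^{p/2} < |f|^p + |k|^p$ wherever $fk \ne 0$, the functions $f$ and $k$ have disjoint supports; similarly, $g$ and $h$ have disjoint supports. For $p > 2$, the sense of all inequalities above is reversed, but the conclusion is the same. Now let $\Omega_1$ be the intersection of the supports of $f$ and $g$, $\Omega_2$ that of $f$ and $h$, $\Omega_3$ that of $g$ and $k$, and $\Omega_4$ that of $h$ and $k$. Then $\Omega_1, \ldots, \Omega_4$ are mutually disjoint.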
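The pointwise inequalities behind this step can be checked numerically. Fix complex values $f$ and $k$ at one point of $\Omega$; the sample values below are ours. The sketch approximates $\int_0^{2\pi}|f + e^{i\theta}k|^p\,d\theta$ with the midpoint rule, an assumption of this sketch that is accurate here because the integrand is smooth and periodic. It compares the result with $2\pi(|f|^2 + |k|^2)^{p/2}$ and $2\pi(|f|^p + |k|^p)$. For $p < 2$ the three quantities increase, for $p > 2$ they decrease, and they coincide when $fk = 0$.

```python
import cmath
import math


def theta_integral(f, k, p, n=4096):
    """Midpoint-rule approximation of the integral of |f + e^{i theta} k|^p over [0, 2 pi]."""
    h = 2 * math.pi / n
    return h * sum(abs(f + cmath.exp(1j * (j + 0.5) * h) * k) ** p for j in range(n))


def chain(f, k, p):
    """The three quantities compared in the text, at a single point of Omega."""
    holder = 2 * math.pi * (abs(f) ** 2 + abs(k) ** 2) ** (p / 2)
    split = 2 * math.pi * (abs(f) ** p + abs(k) ** p)
    return theta_integral(f, k, p), holder, split


print(chain(1.0, 0.5, 1.0))   # increasing (p < 2)
print(chain(1.0, 0.5, 3.0))   # decreasing (p > 2)
print(chain(1.0, 0.0, 1.0))   # f k = 0: all three equal 2 pi
```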
By taking $c = d = 0$ and equating norms, we see that
$$(|a|^2 + |b|^2)^{p/2} = \int_{\Omega_1}|af + bg|^p + |a|^p\int_{\Omega\setminus\Omega_1}|f|^p + |b|^p\int_{\Omega\setminus\Omega_1}|g|^p.$$
Thus $\int_{\Omega_1}|af + bg|^p$ is a function only of $|a|$ and $|b|$. Similarly, $\int_{\Omega_2}|af + ch|^p$ is a function only of $|a|$ and $|c|$, $\int_{\Omega_3}|bg + dk|^p$ is a function only of $|b|$ and $|d|$, and $\int_{\Omega_4}|ch + dk|^p$ is a function only of $|c|$ and $|d|$. Hence $\int_\Omega|af + bg + ch + dk|^p = \int_{\Omega_1}|af + bg|^p + \cdots + \int_{\Omega_4}|ch + dk|^p$ is a function only of $|a|$, $|b|$, $|c|$ and $|d|$. The presumed isometric imbedding of $c_p$ into $L_p$ thus yields the equality of the $c_p$ norms of … or $4 = 2^{p/2} + 2^{p/2}$, which is impossible for $p \ne 2$.

We have seen that for $p \ne 2$, $c_p$ on a two-dimensional Hilbert space is not isometric with any subspace of any $L_p$ space. Of course, between any two Banach spaces of the same finite dimension there exists a bicontinuous linear transformation. We now show that if $p \ne 2$, there is no bicontinuous map between $c_p$ on an infinite-dimensional Hilbert space and any subspace of any $L_p$ space. In passing, we will estimate how far from an isometry any linear one-to-one map between $c_p$ on a finite-dimensional Hilbert space and any subspace of any $L_p$ must be. Our example is derived from that of S. Kakutani [8]. Let $\{\phi_\alpha\}$ be an orthonormal basis for $H$, fixed once and for all, and denote by $P_\alpha$ the orthogonal projection on $H$ defined by $P_\alpha\phi_\beta = \delta_{\alpha\beta}\phi_\alpha$; the operator norm of $\sum_\alpha a_\alpha P_\alpha$ is $\sup_\alpha|a_\alpha|$. Define the operators $E_\alpha$ and $F_\alpha$ on $c_p$ by $E_\alpha(T) = P_\alpha T$ and $F_\alpha(T) = TP_\alpha$. By Theorem 2.3,
$$\Big|\sum_\alpha a_\alpha E_\alpha(T)\Big|_p \le \Big|\sum_\alpha a_\alpha P_\alpha\Big|_\infty|T|_p = \sup_\alpha|a_\alpha|\cdot|T|_p,$$
and similarly for the $\{F_\alpha\}$. Thus $\{E_\alpha\}$ and $\{F_\alpha\}$ are the atoms, respectively, of two Boolean algebras $\{E\}$, $\{F\}$ of projections of bound one on $c_p$. Let $\{G\}$ denote the Boolean algebra of projections on $c_p$ generated by $\{E\}$ and $\{F\}$. We first obtain estimates for the norms of some elements of $\{G\}$. If we think of operators $T$ in $c_p$ as given by matrices $(t_{\alpha\beta}) = ((T\phi_\beta, \phi_\alpha))$, we see that an operator on $c_p$ of the form $\sum_{\alpha,\beta}a_{\alpha\beta}E_\alpha F_\beta$ carries $(t_{\alpha\beta})$ into $(a_{\alpha\beta}t_{\alpha\beta})$. Suppose that $T$ is given by $t_{\alpha\beta} = n^{-1/2}$ for $1 \le \alpha, \beta \le n$ and $0$ otherwise. Suppose also that $U$ is given by $u_{\alpha\beta} = n^{-1/2}\omega^{\alpha\beta}$ for $1 \le \alpha, \beta \le n$ and $0$ otherwise, where $\omega$ is a primitive $n$-th root of unity. $T$ is simply $n^{1/2}Q$ with $Q$ a self-adjoint projection of rank one, and hence $|T|_p = n^{1/2}$. $U$ is the direct sum of an $n \times n$ unitary with zero, and hence $|U|_p = n^{1/p}$. It follows that for $0 < p < 2$ the operator $\sum_{\alpha,\beta=1}^n\omega^{\alpha\beta}E_\alpha F_\beta$ on $c_p$ has norm at least $n^{1/p-1/2}$. Now suppose that $A$ is a linear one-to-one operator from $c_p$ into some subspace of an $L_p$ space. Then $\{AEA^{-1}\}$ and $\{AFA^{-1}\}$ are Boolean algebras of projections on some subspace of an $L_p$ space, of bound at most $\|A\|\,\|A^{-1}\|$. It follows from [10] in the case $2 < p < \infty$, or better from the estimates of [9, Section 6], which are valid uniformly in the range $0 < p < 2$, that $\{AGA^{-1}\}$ is a Boolean algebra of projections with bound at most $\text{const.}\,\|A\|^2\|A^{-1}\|^2$. Hence $\{G\}$ has bound at most $\text{const.}\,\|A\|^3\|A^{-1}\|^3$. But we have already shown that $\{G\}$ has bound at least $n^{1/p-1/2}$ for any $n$ no greater than the dimension of $H$, and thus we must have
$$\|A\|\,\|A^{-1}\| \ge \text{const.}\,(\dim H)^{\frac13(1/p-1/2)}.$$
In the case that $H$ is infinite-dimensional, we see that $A$ cannot be bicontinuous; in the case that $H$ is finite-dimensional, we have a lower bound for $\|A\|\,\|A^{-1}\|$. (The constant in this last estimate may be taken to be 14-s uniformly for $0 < p < 2$, although much better constants are undoubtedly available. We also expect that the exponent $\frac13(1/p - 1/2)$ may be improved to $(1/p - 1/2)$, but no further.) The analogous result for $2 < p < \infty$ follows by considering adjoints, yielding $\|A\|\,\|A^{-1}\| \ge \text{const.}\,(\dim H)^{\frac13(1/2-1/p)}$. There is a partial converse to the Hölder inequality for sequences: if $\{a_n\}$ is a sequence such that $\{a_nb_n\} \in l_r$ for every $\{b_n\} \in l_p$, then $\{a_n\} \in l_q$ $(1/r = 1/p + 1/q)$.
Although everywhere defined, linear, but not continuous functionals abound on $l_p$, this theorem says that no such functionals are given by sequences. A similar statement holds for $c_p$:

THEOREM 5.1. Let $T$ be an everywhere defined linear operator on $H$ such that $TS \in c_r$ for every $S \in c_p$. Then $T \in c_q$ $(1/r = 1/p + 1/q)$.

Proof. We first show that $T$ is bounded. If $T$ is not bounded, then there exists an orthonormal set $\{\phi_n\}$ in $H$ such that $|T\phi_n| \ge 3^n$. [To see this, note that we can clearly choose $\phi_1$. Having selected orthonormal $\phi_1, \ldots, \phi_{n-1}$, select $\phi_n$ of norm 1 in the orthogonal complement of their span such that $|T\phi_n| \ge 3^n\max_{1\le k<n}|T\phi_k|$. If this cannot be done, the linearity of $T$ alone shows that $T$ must be bounded.] Now define $S$ to be the continuous linear operator for which $S\phi_n = 2^{-n}\phi_n$, with $S = 0$ on the orthogonal complement of $\{\phi_n\}$. $S$ belongs to every $c_p$ $(p > 0)$. But $TS \in c_r$ says in particular that $TS$ is bounded, whereas $|TS\phi_n| = 2^{-n}|T\phi_n| \ge (3/2)^n$ for every $n$, which is impossible. Thus $T$ must be bounded. If $r = q$, this is clearly all that can be said.

Now suppose $r < q$. We assert that $T$ must be compact. If $T$ were not compact, then $T^*T$ would not be compact either. Decomposing the spectrum of $T^*T$ (with its multiplicity), we could then find a countable orthonormal set $\{\phi_n\}$ in $H$ whose supports with respect to the spectral measure of $T^*T$ are disjoint and bounded away from zero. Thus $(T\phi_n, T\phi_m) = 0$ if $n \ne m$, and $\inf_n|T\phi_n| = a > 0$. Complete $\{\phi_n\}$ to an orthonormal basis $\{\psi_\alpha\}$ for $H$ in any manner whatever. Define $S$ by $S\phi_n = b_n\phi_n$ for the originally chosen $\phi_n$, and $S\psi_\alpha = 0$ for the remaining $\psi_\alpha$, where $\{b_n\} \in l_p$. Then $|TS\phi_n| = |b_n|\,|T\phi_n| \ge a|b_n|$. If, however, we take $\{b_n\}$ to be in $l_p$ but not in $l_r$, then $\sum_n|TS\phi_n|^r = \infty$. Since $(S^*T^*TS\psi_\alpha, \psi_\beta) = 0$ unless $\alpha = \beta$, $\{\psi_\alpha\}$ must be an orthonormal basis of eigenvectors for $S^*T^*TS$. Hence, by Lemma 2.2, $TS$ cannot be in $c_r$, contrary to our hypothesis. Finally, knowing that $T$ is compact, let $\{\phi_\alpha\}$ be an orthonormal basis for $H$ consisting of eigenvectors for $T^*T$. Define $S$ by $S\phi_\alpha = b_\alpha\phi_\alpha$, where $\{b_\alpha\} \in l_p$. Then $\{\phi_\alpha\}$ is also an orthonormal basis of eigenvectors for $S^*T^*TS$, and we have
$$\infty > |TS|_r^r = \sum_\alpha|TS\phi_\alpha|^r = \sum_\alpha|b_\alpha|^r|T\phi_\alpha|^r.$$
By the theorem on sequences, $\{|T\phi_\alpha|\} \in l_q$, and Lemma 2.2 again yields $T \in c_q$.

To conclude, we prove the often-used lemma that operators of finite rank are dense in every $c_p$.

LEMMA 5.2. Let $T \in c_p$. Then for every $\varepsilon > 0$ there exists an operator $T_\varepsilon$ such that the range of $T_\varepsilon$ is finite-dimensional and $|T_\varepsilon - T|_p < \varepsilon$.

Proof. Let $\mu_\alpha$ be the characteristic numbers of $T$, and let $\{\phi_\alpha\}$ be an orthonormal basis for $H$ consisting of eigenvectors for $T^*T$, with $T^*T\phi_\alpha = \mu_\alpha^2\phi_\alpha$. Given $\varepsilon > 0$, let $\sigma$ be a finite set of indices such that $\sum_{\alpha\notin\sigma}\mu_\alpha^p < \varepsilon^p$. Define $T_\varepsilon$ by $T_\varepsilon\phi_\alpha = T\phi_\alpha$ for $\alpha \in \sigma$ and $T_\varepsilon\phi_\alpha = 0$ for $\alpha \notin \sigma$. Then $|T_\varepsilon - T|_p^p = \sum_{\alpha\notin\sigma}\mu_\alpha^p < \varepsilon^p$.
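Once the characteristic numbers are known, the choice of $\sigma$ in this proof is easy to mechanize. The following is a minimal sketch under our own assumptions. Finitely many characteristic numbers $\mu_\alpha$ are given as a list in decreasing order, so that $\sigma$ can be taken to be an initial segment. The function returns the size of the shortest such segment whose discarded tail satisfies $\sum_{\alpha\notin\sigma}\mu_\alpha^p < \varepsilon^p$, together with the resulting error $|T_\varepsilon - T|_p$. The function name and the sample sequence are ours.

```python
def truncation(mu, p, eps):
    """Return (n, err): keep the first n characteristic numbers; err = |T_eps - T|_p.

    Assumes eps > 0 and that mu is a finite list of characteristic numbers in
    decreasing order. The discarded tail satisfies sum(mu[alpha]**p for alpha >= n) < eps**p.
    """
    # tails[n] = sum of mu[alpha]**p over alpha >= n, accumulated from the end.
    tails = [0.0] * (len(mu) + 1)
    for n in range(len(mu) - 1, -1, -1):
        tails[n] = tails[n + 1] + mu[n] ** p
    for n, tail in enumerate(tails):
        if tail < eps ** p:          # always reached, since tails[-1] == 0 < eps**p
            return n, tail ** (1 / p)


mu = [1.0 / (alpha + 1) for alpha in range(10000)]   # sample characteristic numbers 1, 1/2, 1/3, ...
n, err = truncation(mu, 2.0, 0.1)
print(n, err)                                        # err < 0.1
```

Because `mu` is sorted in decreasing order, the initial segment returned is the smallest index set with this property. For an unsorted list the same loop still returns a valid, though possibly larger, $\sigma$.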
REFERENCES
1. J. A. Clarkson, Uniformly convex spaces, Trans. Amer. Math. Soc. 40 (1936), 396-414.
2. J. Dixmier, Formes linéaires sur un anneau d'opérateurs, Bull. Soc. Math. France 81 (1953), 9-39.
3. N. Dunford, Spectral operators, Pacific J. Math. 4 (1954), 321-354.
4. N. Dunford and J. T. Schwartz, Linear Operators, Part II, Interscience, New York (1963).
5. I. C. Gohberg and M. G. Krein, Introduction to the Theory of Linear Non-Self-Adjoint Operators in Hilbert Space, Moscow (1965). (In Russian.)
6. A. Grothendieck, Produits tensoriels topologiques et espaces nucléaires, Memoirs Amer. Math. Soc. 16 (1955).
7. A. Horn, On the singular values of a product of completely continuous operators, Proc. Nat. Acad. Sci. U.S.A. 36 (1950), 373-375.
8. S. Kakutani, An example concerning uniform boundedness of spectral measures, Pacific J. Math. 4 (1954), 363-372.
9. W. Littman, C. McCarthy and N. Rivière, $L^p$ multiplier theorems of Marcinkiewicz type, to appear.
10. C. McCarthy, Commuting Boolean algebras of projections II, Proc. Amer. Math. Soc. 15 (1964), 781-787.
11. J. von Neumann, Some matrix-inequalities and metrization of matric-space, Tomsk Univ. Rev. 1 (1937), 286-300.
12. B. J. Pettis, A proof that every uniformly convex space is reflexive, Duke Math. J. 5 (1939), 249-253.
13. R. Schatten, A Theory of Cross-Spaces, Ann. of Math. Studies, No. 26, Princeton University Press, Princeton (1950).
14. R. Schatten, Norm Ideals of Completely Continuous Operators, Ergebnisse der Math., Neue Folge, 27, Springer-Verlag (1960).

UNIVERSITY OF MINNESOTA, MINNEAPOLIS, MINNESOTA 55455