INTRODUCTION TO ANALYSIS, 2002–2003
Tim Traynor University of Windsor
timtraynor.com/analysis
13/1/2003 1054 mam 1
Many thanks to Maria Pap, Aleksandra Katafiasz, and students of several years. T.T.
Contents

Titlepage
Acknowledgements
Axioms for the real number system
Consequences of the field axioms
Consequences of the ordered field axioms
Absolute value
The natural numbers
The integers and rational numbers
The space Rⁿ
The existence of roots — a consequence of completeness
Cantor's Principle and the uncountability of the reals
Suprema, infima, and the Archimedean Property
Exponents
Topology in the reals and n-space
Interior and boundary
Metric spaces and subspaces
Closure and accumulation: the Bolzano-Weierstrass Theorem (set form)
Compactness and the Heine-Borel Theorem
Compactness in subspaces
Convergence of sequences: Definition and examples
Limit theorems for sequences of reals (including uniqueness)
Existence: Monotone sequences
Limit superior and limit inferior
Cluster points and subsequences: the Bolzano-Weierstrass theorem (sequence form)
Existence: Cauchy sequences
The number e, an application of Monotone Convergence
Series of numbers
Limits of functions
Continuity of functions
Continuity and compactness
The Intermediate Value Theorem
Uniform continuity
Differentiation
The Darboux Property of derivatives
Mean Value Theorems
L'Hopital's rule
Taylor's Theorem
The Riemann integral
Existence of the Riemann integral
Equivalence of the Riemann and Darboux integrals
Linearity and order properties of the Riemann integral
The integral of composites, products et cetera
The Fundamental Theorem of Calculus
Pointwise and uniform convergence
Uniform convergence: Continuity, integral and derivative
Power series
The Exponential and trigonometric functions
Differentiation of vector valued functions and complex valued functions
Integration of vector valued functions
Banach's Contraction Mapping Theorem
Inverse Function Theorem
Implicit Function Theorem
Appendix: Countability
Axioms for the real number system

We begin by assuming the existence of a set R, called the set of real numbers, with certain properties. We will see that everything is based on these.

The real numbers form a complete ordered field. This means that the set R satisfies the following axioms:

The field axioms. R is a field; that is, it has two binary operations (x, y) ↦ x + y and (x, y) ↦ xy, called addition and multiplication, and two distinguished members 0 and 1, such that:

(A1) For all x, y ∈ R, x + y = y + x (commutativity).
(A2) For all x, y, z ∈ R, (x + y) + z = x + (y + z) (associativity).
(A3) For all x ∈ R, x + 0 = x = 0 + x (0 is an identity for addition).
(A4) To each x ∈ R corresponds an element −x ∈ R with x + (−x) = 0 = (−x) + x (additive inverse).
(M1) For all x, y ∈ R, xy = yx (commutativity).
(M2) For all x, y, z ∈ R, (xy)z = x(yz) (associativity).
(M3) 1 ≠ 0 and for all x ∈ R, x1 = x = 1x (1 is an identity for multiplication).
(M4) To each x ∈ R with x ≠ 0 corresponds an x⁻¹ ∈ R with xx⁻¹ = 1 = x⁻¹x (multiplicative inverse).
(DL) For all x, y, z ∈ R, x(y + z) = xy + xz (distributive law).

The Order Axioms. In addition to the above, there is a relation < on R making it an ordered field; that is, it satisfies:

(O1) For all x, y ∈ R, exactly one of x < y, x = y, y < x holds (trichotomy).
(O2) If x < y and y < z, then x < z (transitivity).
(O3) x < y implies x + z < y + z (addition preserves order).
(O4) x < y and z > 0 implies xz < yz (multiplication by a number > 0 preserves order).

The Completeness Axiom. If A and B are non-empty subsets of R such that for all a ∈ A and b ∈ B, a < b, then there is an x ∈ R such that a ≤ x for all a ∈ A and x ≤ b for all b ∈ B. (Here x ≤ y is an abbreviation for x < y or x = y.)

(There are many other ways of expressing the completeness axiom and we will meet some of them. We have chosen one that can be stated with very little theory.)
In analysis books, it is traditional to write only x + 0 = x in the axiom for the additive identity, since the other part, x = 0 + x, follows from commutativity. Similar statements apply for the multiplicative identity and the additive and multiplicative inverses. Here, however, we have elected to use the conventions usually used in algebra books, to facilitate comparison with other algebraic systems.

The distributive law (DL) is described by saying that multiplication is distributive over addition. This distinguishes it from x + (y · z) = (x + y) · (x + z), which does not hold. In elementary set theory we do have two distributive laws (intersection over union and union over intersection). (DL) is actually what is known as a left distributive law. Using the commutativity of multiplication, we may deduce from it a right distributive law of multiplication over addition, namely that for all x, y, z ∈ R, (y + z)x = yx + zx.

From the field axioms we can deduce all the usual rules of algebraic manipulation that we know so well. The order axioms add the ability to handle inequalities in the way we are accustomed to, and the Completeness Axiom adds the power of Calculus and much more.

Even though the field axioms give us all the rules of the algebra of numbers, they don't guarantee the existence of very many numbers. In fact, the field could have only the members 0 and 1. (You might want to amuse yourself by figuring out what the addition and multiplication tables would have to look like then.) But, when we add the order axioms, we find that 0 < 1 < 1 + 1 < (1 + 1) + 1 < ((1 + 1) + 1) + 1 < . . . . This gives an infinite number of numbers. We will be able to produce the set N = {1, 2, . . . } of natural numbers, complete with the Principle of Mathematical Induction. From them and their additive inverses and 0 we will get the set Z of integers and, using multiplicative inverses, we can upgrade to Q, the rational numbers. This set is again an ordered field.
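For the amusement suggested above, here is one way the two-element field works out: addition must be "exclusive or" (forcing 1 + 1 = 0, so that 1 has an additive inverse) and multiplication must be "and". The sketch below is our own illustration, not part of the text; it brute-force checks several of the axioms over the two members.

```python
# A sketch (ours, not the text's) of the two-element field {0, 1}:
# addition is XOR (so 1 + 1 = 0, giving 1 an additive inverse),
# multiplication is AND.
F = (0, 1)
add = lambda x, y: x ^ y   # the only addition table consistent with A1-A4
mul = lambda x, y: x & y   # the only multiplication table consistent with M1-M4

# Spot-check some axioms by brute force over all members.
assert all(add(x, y) == add(y, x) for x in F for y in F)                # A1
assert all(add(add(x, y), z) == add(x, add(y, z))
           for x in F for y in F for z in F)                            # A2
assert all(add(x, 0) == x for x in F)                                   # A3
assert all(any(add(x, m) == 0 for m in F) for x in F)                   # A4
assert all(mul(x, add(y, z)) == add(mul(x, y), mul(x, z))
           for x in F for y in F for z in F)                            # DL
```

Note that this field cannot be ordered: 1 + 1 = 0 is incompatible with 0 < 1 < 1 + 1.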
But it is the Completeness Axiom that really gives us lots of numbers. You probably have heard that √2 and π are not rational numbers. It is the Completeness Axiom that guarantees they exist as real numbers. It is completeness that will tell us that a continuous function that is negative at some point and positive somewhere else must be 0 somewhere in between (without completeness, the cosine function would NEVER cross the x-axis). It will tell us that many integrals exist, even if we can't figure out their values. It will enable us to solve differential equations, etc.

Look closely at this axiom: it is of a different nature than the others. The others talk only about one, two, or three real numbers at a time. The Completeness Axiom, however, talks about two sets of real numbers. That is what gives it its power.
Consequences of the field axioms

We won't spend much time on these, since the student will have ample practice with them in Algebra courses; we just give a few examples, which should convince you that all the usual algebraic properties will be deducible.

Before we begin, look at the first axiom. It starts "for all x, y ∈ R". What this really means is "for all x ∈ R and for all y ∈ R". This is a common way to abbreviate. Make sure you don't get "for all x, y . . . " mixed up with "for all (x, y) . . . "; the latter is talking about an ordered pair of elements. An expression such as "for all x" is an example of a universal quantifier. Logically, "for each x" and "for every x" (∀x) also mean the same thing. The other type of quantifier is the existential quantifier: "there exists", "for some" (∃). Quantifiers play a big role in Analysis.

The first results we will establish are
• Uniqueness of the identity elements
• Cancellation laws for addition and multiplication
• Uniqueness of the inverses

Theorem. (Uniqueness of the identity elements.)
(1) If a ∈ R and for all x ∈ R, x + a = x, then a = 0.
(2) If b ∈ R and for all x ∈ R, xb = x, then b = 1.

Proof. (1) Since 0 is an identity for addition, (A3), for all x ∈ R, x = 0 + x. In particular,

a = 0 + a.  (∗)

By hypothesis, for all x ∈ R, x + a = x. Therefore,

0 + a = 0,

which combined with (∗) gives a = 0, as required.
(2) The corresponding statement for multiplication and 1 is proved in the same way.

Here is a more compact version of the proof of (1):

Proof.
a = 0 + a,  by A3,
  = 0,      by hypothesis,

and hence a = 0.

Theorem. (Cancellation)
(1) For all x, y, z ∈ R, if x + z = y + z, then x = y.
(2) Similarly, for all x, y and w in R, if w ≠ 0 and xw = yw, then x = y.

Proof. (1) Fix arbitrary x, y, z ∈ R for which x + z = y + z. Then,

(x + z) + (−z) = (y + z) + (−z)
x + (z + (−z)) = y + (z + (−z)),  by A2 (associativity)
x + 0 = y + 0,  by A4
x = y,  by A3.

The proof of (2) is left as an exercise.

Inverses, too, are unique:

Theorem. The only additive inverse of an element x in R is −x. If x ∈ R is not 0, then x⁻¹ is its only multiplicative inverse.

Proof. Let x ∈ R. Suppose a is an additive inverse of x. Then, x + a = 0. But also x + (−x) = 0, hence x + a = x + (−x). By commutativity,

a + x = (−x) + x,
and then a = −x, by cancellation (the previous result). The proof of the corresponding result for multiplicative inverse is left to the reader.

The word inverse by itself usually refers to the multiplicative inverse, while −x is called minus x, or "the negative of x". In the latter case, we must beware that the negative of a negative number (to be defined later) is not negative, but positive.

We define x − y to be x + (−y) and, if y ≠ 0, x/y to be xy⁻¹.

Once we get going, of course, we will not even bother with the justifications of simple algebraic calculations, since we will have much more immediate concerns. The student should, however, achieve a level of competence such that tiny gaps in reasoning can be filled anytime.

Here are some more consequences of the field properties:

Theorem. Let a, b, c, . . . be elements of R (or another fixed field). Then,
(1) −0 = 0;
(2) 1⁻¹ = 1;
(3) −(−a) = a;
(4) if a ≠ 0, then (a⁻¹)⁻¹ = a;
(5) a · 0 = 0;
(6) (−a)b = −(ab) = a(−b);
(7) (−a)(−b) = ab;
(8) ab = 0 implies either a = 0 or b = 0;
(9) if b ≠ 0 and d ≠ 0, then (a/b) · (c/d) = ac/(bd);
(10) if b ≠ 0 and d ≠ 0, then a/b + c/d = (ad + bc)/(bd);
(11) if b ≠ 0 and d ≠ 0, then a/b = c/d if and only if ad = bc.

We prove a few of these and leave the rest to the reader.

Proof. (1) Since 0 + 0 = 0, by (A3), 0 is an additive inverse of 0. But such an inverse is unique, hence 0 = −0.

(5) Start with 0 + 0 = 0. Multiply by a on the left to obtain a(0 + 0) = a0.
Now use the distributive law:

a0 + a0 = a0.

The right side here is also 0 + a0, by A3 and A1, so

a0 + a0 = 0 + a0
a0 = 0,  by cancellation.

(6) As with (1), we show (−a)b is an additive inverse of ab.

ab + (−a)b = b(a + (−a)),  by commutativity and distributivity
           = b0,  by A4
           = 0,  by the previous result.
Thus ab + ((−a)b) = 0, so by uniqueness of additive inverse, (−a)b = −(ab). That a(−b) = −(ab) is proved in the same way.
Consequences of the ordered field axioms

We begin with a very simple, but very striking, consequence of these seemingly innocent assumptions. It illustrates the role of the trichotomy axiom.

Theorem. 0 < 1.

Proof. By trichotomy, 0 < 1, 0 = 1, or 1 < 0. We know 0 ≠ 1, by the axiom. Now, suppose 1 < 0. Then, since addition preserves order (O3), 1 + (−1) < 0 + (−1). And, since 1 + (−1) = 0 and 0 + (−1) = −1, we have 0 < −1. This and O4 yields 0(−1) < (−1)(−1). But 0(−1) = 0 and (−1)(−1) = 1 · 1 = 1, so 0 < 1. This contradicts 1 < 0. The only possibility remaining is 1 > 0, as required.
Let us officially define 2 = 1 + 1, 3 = 2 + 1, 4 = 3 + 1, 5 = 4 + 1, 6 = 5 + 1, 7 = 6 + 1, 8 = 7 + 1, 9 = 8 + 1. From the previous result we get 1 + 0 < 1 + 1, so 1 < 2. In general, for all x, x < x + 1, so 1 < 2 < 3 < 4 and so on. This shows that there are lots of numbers going out to the right. Since taking negatives reverses the order (as we will see), there are also lots of numbers going to the left.

Here is how we can also get lots of numbers between two numbers. It is surprising how useful this fact is.

Theorem. If a < b, then a < (a + b)/2 < b.

Proof. (Practice exercise.)

As you are aware, a < b < c means a < b and b < c. One of the conveniences of this notation is that, by transitivity, we can drop the middle one, so it includes the statement a < c.
The expression a ≤ b is an abbreviation for a < b or a = b. The new relation ≤ on R is still transitive, and addition preserves this order:

a ≤ b implies a + c ≤ b + c.

And similarly, if c ≥ 0,

a ≤ b implies ac ≤ bc.
The trichotomy axiom O1 breaks into two conditions:

(O1a) For all x, y ∈ R, x < y or x = y or y < x, and
(O1b) For all x, y ∈ R, not (x < y and x = y), not (x = y and y < x), and not (x < y and y < x).

These can be stated in terms of the relation ≤ as the following two statements:

(TOa) For all x, y ∈ R, either x ≤ y or y ≤ x.
(TOb) For all x, y ∈ R, if x ≤ y and y ≤ x, then x = y.

(A relation that satisfies TOa and TOb is called a total order.) The student should prove the above statements as an exercise in simple logic. We will assume them in what follows.

As you expect, the expression a > b is defined to mean b < a, and a ≥ b is defined to mean b ≤ a.

Our first Analysis result.
(1) If c ≤ x for all x > a, then c ≤ a.
(2) Let t > a. If c ≤ x for all x with a < x < t, then c ≤ a.
(3) If c ≤ a + ε for all ε > 0, then c ≤ a.
(4) If a ≥ 0 and a ≤ ε for all ε > 0, then a = 0.

Proof. (1) By trichotomy, the negation of c ≤ a is c > a. Suppose c > a. Then,

a < (a + c)/2 < c.

Thus, choosing x = (a + c)/2 yields x > a and c > x. This proves that

if c > a, then there exists x > a with c > x,

which is the contrapositive of

if c ≤ x for all x > a, then c ≤ a.
This is the result to be proved.

(2) Let t > a and suppose c ≤ x for all x with a < x < t. Take first x = (a + t)/2, so that a < x < t; by the hypothesis, c ≤ x, and by transitivity c < t. Now, suppose c > a. Put x₀ = (a + c)/2. Then, a < x₀ < c, and by transitivity a < x₀ < t, so again by hypothesis c ≤ x₀. Since x₀ < c, this contradicts trichotomy. Thus, c > a is false, and by trichotomy, c ≤ a.

(3) Suppose c ≤ a + ε for all ε > 0. Let x > a. Then x − a > 0, so c ≤ a + (x − a) = x. Thus, c ≤ x for all x > a, hence c ≤ a, by (1).

(4) Let a ≥ 0 and a ≤ ε for all ε > 0. By (1) (or by (3)), we have a ≤ 0, which combined with a ≥ 0 yields a = 0.

A real number a is called positive if a > 0, negative if a < 0, non-negative if a ≥ 0, and non-positive if a ≤ 0.
Theorem. For real numbers a, b:
(1) if a < b, then −b < −a;
(2) if a is negative, then −a is positive;
(3) if a is positive, then −a is negative.

Proof. (1) Let a < b. Adding (−a) + (−b) to both sides gives

a + ((−a) + (−b)) < b + ((−a) + (−b)),  by O3
(a + (−a)) + (−b) < (b + (−b)) + (−a),  by A2 and A1
0 + (−b) < 0 + (−a)
−b < −a,

as required.

(2) Let a be negative. Then a < 0, so by (1), −0 < −a. But −0 = 0. Thus, −a is positive by definition.
Theorem. Let a, b, c, d be real numbers.
(1) If a < b and c < d, then a + c < b + d.
(2) If 0 < a < b and 0 < c < d, then ac < bd.

Proof. (1) Start with a < b. By O3 we may add c to both sides, yielding

a + c < b + c.  (∗)

But also c < d, so adding b to both sides yields

b + c < b + d.  (∗∗)

Combining (∗) and (∗∗), using transitivity (O2), we have a + c < b + d, as required.

(2) This is proved similarly, using O4 instead of O3. Note carefully how the positivity is used.

Here are some more familiar properties. The proofs should now present no serious difficulties. They are left as exercises. Make sure you try some.

Theorem. Let a, b, c, d be real numbers. Then:
(1) a < b and c < 0 implies ac > bc.
(2) a > 0 and b > 0 implies ab > 0.
(3) a < 0 and b < 0 implies ab > 0.
(4) a < 0 and b > 0 implies ab < 0.
(5) a² ≥ 0; and a² > 0 if a ≠ 0.
(6) if a > 0, then a⁻¹ > 0.
(7) if 0 < a < b, then 0 < b⁻¹ < a⁻¹.
(8) if 0 < a < 1, then a² < a.
(9) if a > 1, then a < a².
(10) 0 < a < b implies a² < b².
(11) 0 < b, d implies a/b < c/d iff ad < bc.
(12) 0 ≤ a < b implies a/(a + 1) < b/(b + 1).
(13) 0 < b, d and a/b < c/d implies a/b < (a + c)/(b + d) < c/d.
Absolute Value

Definition. For each x ∈ R, the absolute value of x is defined to be |x| = x if x ≥ 0, and −x if x < 0.

An immediate consequence (which you should prove for practice) is:

Lemma. |x| ≤ a iff −a ≤ x ≤ a. (Similarly for <.)

Theorem.
(1) |x| ≥ 0, for all x; |x| = 0 iff x = 0.
(2) |x + y| ≤ |x| + |y|.
(3) |xy| = |x||y|.

We are using the common abbreviation "iff" for "if and only if".

Proof. (1) If x ≥ 0, then |x| = x ≥ 0. If x < 0, |x| = −x > 0. In both cases, |x| ≥ 0. Now, if x = 0, |x| = x = 0 by definition. Conversely, suppose x ≠ 0. Then either x < 0 or x > 0. If x < 0 then |x| = −x > 0, while if x > 0 then |x| = x > 0. Thus, in both cases |x| ≠ 0.

(2) Since |x| ≤ |x|, we have −|x| ≤ x ≤ |x|, and similarly −|y| ≤ y ≤ |y|; adding gives −(|x| + |y|) ≤ x + y ≤ |x| + |y|. Now using the Lemma again with a = |x| + |y|, we have |x + y| ≤ |x| + |y|, as required.

Alternative proof. There are 4 cases: In case x ≥ 0,
y ≥ 0, we have |x| = x, |y| = y, and x + y ≥ 0, so that |x + y| = x + y = |x| + |y|.
In case x < 0, y < 0, we have |x| = −x, |y| = −y and x + y < 0, so |x + y| = −(x + y) = −x + (−y) = |x| + |y|. In case x ≥ 0, y < 0, we have |x| = x, |y| = −y > 0. If x + y ≥ 0, this gives |x + y| = x + y < x + (−y) = |x| + |y|, while if x + y < 0 it gives |x + y| = −x − y ≤ x + (−y) = |x| + |y|. The case x < 0, y ≥ 0 is similar. In all 4 cases then, we had |x + y| ≤ |x| + |y|.
(3) This is similar to the alternative proof of (2), but easier.

Property (2) here is called the triangle inequality for absolute value. Often the following variant of the triangle inequality is useful.
Theorem. Let x, y ∈ R. Then | |x| − |y| | ≤ |x − y|.

Proof. |x| = |x − y + y| ≤ |x − y| + |y|, by the triangle inequality, so |x| − |y| ≤ |x − y|. Similarly, |y| − |x| ≤ |x − y|. Now | |x| − |y| | is one of |x| − |y| or |y| − |x| = −(|x| − |y|), depending on the sign, so in both cases | |x| − |y| | ≤ |x − y|.

For real numbers x and y, the quantity |x − y| is called the distance from x to y. We will write d(x, y) or dist(x, y) for this.

Theorem. For all a, b, c ∈ R, d(a, c) ≤ d(a, b) + d(b, c).

Proof. Let a, b, c ∈ R. Then, |a − c| = |(a − b) + (b − c)| ≤ |a − b| + |b − c|, by the triangle inequality for absolute value. The rest is just translation into the distance notation.

The property stated in the above theorem is called the triangle inequality for distance. The name really comes from the analogous result in two dimensions. It says that the length of one side of a triangle is less than or equal to the sum of the lengths of the other two sides.
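The three inequalities of this section — the triangle inequality, its variant, and the triangle inequality for distance — lend themselves to a quick numerical spot-check. The sketch below is our own illustration (not part of the text); it samples random real numbers and verifies each inequality, with a tiny tolerance on the distance form to absorb floating-point rounding.

```python
# Our own numeric sanity check of the inequalities proved above.
import random

random.seed(0)  # deterministic sampling
for _ in range(1000):
    x = random.uniform(-100, 100)
    y = random.uniform(-100, 100)
    # triangle inequality: |x + y| <= |x| + |y|
    assert abs(x + y) <= abs(x) + abs(y)
    # variant: ||x| - |y|| <= |x - y|
    assert abs(abs(x) - abs(y)) <= abs(x - y)
    # triangle inequality for distance: d(a, c) <= d(a, b) + d(b, c)
    a, b, c = x, y, random.uniform(-100, 100)
    assert abs(a - c) <= abs(a - b) + abs(b - c) + 1e-9
```

Such a check proves nothing, of course — that is what the proofs above are for — but it is a useful habit for catching misremembered inequalities.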
The natural numbers

For a subset A of R, A is called inductive (or inductively closed) if
(i) 1 ∈ A, and
(ii) for all x, x ∈ A implies x + 1 ∈ A.

One inductive set is R itself. The set of natural numbers is defined to be the set

N = ⋂{A : A is inductive}.

Theorem. N is an inductive subset of R and N is contained in each inductive set.

Proof. First, there is an inductive subset of R, namely R itself, so N exists and is a subset of R. Since 1 ∈ A for all inductive sets A, 1 ∈ ⋂{A : A is inductive} = N. To see that N satisfies the other half of the definition, let x ∈ N. Then, x ∈ A for all inductive sets A. Hence x + 1 ∈ A for all inductive sets A. Therefore,

x + 1 ∈ ⋂{A : A is inductive} = N,

by definition. Thus x ∈ N =⇒ x + 1 ∈ N, so N is inductive.

For the second statement, let A be inductive. If x ∈ N, then x ∈ A, by definition of intersection. So N ⊂ A.

The end of the proof is a special case of the general fact that the intersection of a family of sets is always contained in each set in the family.
by definition. Thus x ∈ N =⇒ x + 1 ∈ N, so N is inductive. For the second statement, let A be inductive. If x ∈ N, then x ∈ A, by definition of intersection. So N ⊂ A. The end of the proof is a special case of the general fact that the intersection of a family of sets is always contained in each set in the family. The above immediately yields the Principle of Mathematical Induction: Theorem. (PMI) (1) If A is an inductive set of natural numbers, then A = N. (2) If P (n) is a statement about natural numbers n, such that (i) P (1) is true, and (ii) whenever P (n) is true, P (n + 1) is true, then P (n) is true for all n ∈ N. Proof. (1) The hypothesis was that A ⊂ N and A is inductive, so N ⊂ A by the previous result. Thus A = N. (2) Let A = {n ∈ N : P (n) is true }. By hypothesis, 1 ∈ A and x ∈ A =⇒ x + 1 ∈ A. Thus A is inductive, so A = N, by part (1). That is, for all n ∈ N, P (n) is true.
Theorem. For all n ∈ N, n ≥ 1.

Proof. Here the statement to be proved by induction on n is: n ≥ 1. Now, (i) 1 ≥ 1 is true, of course, and (ii) if n ≥ 1, then n + 1 ≥ 1 + 0 ≥ 1. Thus, by PMI, n ≥ 1 for all n ∈ N.

Corollary. 0 ∉ N.

Proof. If 0 ∈ N, we would have 0 ≥ 1, by the previous theorem. But 0 < 1, so this would be a contradiction.

So, if n = 1, its predecessor n − 1 is not a natural number, but this is the only such case.

Theorem. If n ∈ N and n ≠ 1, then n − 1 ∈ N.

Proof. Let A be {n ∈ N : n = 1 or n − 1 ∈ N}. Then A contains 1, by definition. Suppose now n ∈ A. Then n ∈ N; hence, (n + 1) − 1 = n ∈ N. Thus n ∈ A =⇒ n + 1 ∈ A, so A is inductive. Hence N = A, by PMI. Thus, for all n ∈ N, n = 1 or n − 1 ∈ N. In other words, n ≠ 1 implies n − 1 ∈ N.

The last step of the proof used the fact, from elementary logic, that the statement p ∨ q is equivalent to the statement (not p) =⇒ q. (∨ means "or" and =⇒ means "implies".)

Theorem. If n ∈ N and n < m ∈ N, then n + 1 ≤ m.

Proof. We use induction on n. So let

A = {n ∈ N : for all m ∈ N, if n < m then n + 1 ≤ m}.

Let m ∈ N. If 1 < m, then m − 1 ∈ N, so 1 ≤ m − 1, hence 1 + 1 ≤ m. Since m was arbitrary, 1 ∈ A.

Suppose n ∈ A, and let m ∈ N again be arbitrary. If n + 1 < m, then 1 < m, so m − 1 ∈ N and n < m − 1; hence, n + 1 ≤ m − 1, by the inductive hypothesis (that is, since n ∈ A). Thus, (n + 1) + 1 ≤ m. And this was true for arbitrary m, so A is inductive, and A = N, as required.

In the above proof we used a general principle. If we have a statement to prove about two natural numbers n and m, we try to do induction on just one of them (in the above case, n). We get more power by making the inductive hypothesis strong, saying "for all m ∈ N". The idea is that if you assume more, you can prove more.
Corollary. If n ∈ N and n < a < n + 1, then a ∉ N.

Proof. Let n ∈ N and n < a < n + 1. If a ∈ N, then by the theorem n + 1 ≤ a, which contradicts a < n + 1.
Well-ordering property. Each non-empty subset of N has a least element.

If A is a set of real numbers, to say that a₀ is a least element of A means a₀ ∈ A and for each a ∈ A, a₀ ≤ a. Clearly, a set A can have no more than one least element (why?).

Proof. Let M ⊂ N and let M be non-empty. Suppose M has no least element. We will show that this leads to a contradiction.

Let B be the set of n ∈ N for which n ≤ m for all m ∈ M. We claim that B is inductive. First, 1 ∈ B. Indeed, 1 ≤ m for all m ∈ M, since M is a set of natural numbers. Now let n ∈ B. Then n ≤ m for all m ∈ M. But n ∉ M, for otherwise n would be the least element of M, which we are assuming doesn't exist. Thus, n < m for all m ∈ M, so n + 1 ≤ m for all m ∈ M, by the previous result; that is, n + 1 ∈ B. Thus, n ∈ B implies n + 1 ∈ B.

This shows B is inductive, so contains all the elements of N. In other words, for all natural numbers n, n ≤ m for all m ∈ M. Since M ≠ ∅, it contains some element, say m₀. But m₀ is a natural number, so m₀ ≤ m for all m ∈ M. So m₀ is the least member of M after all, a contradiction.
There is a strengthening of the Principle of Mathematical Induction which is often more convenient, known as the 2nd Principle of Mathematical Induction or the Principle of Complete Induction. As usual, {1, . . . , n} means the set of all natural numbers k with 1 ≤ k ≤ n.

Theorem. (2nd PMI)
(1) Let B be a set of natural numbers such that 1 ∈ B and, for all n, {1, . . . , n} ⊂ B implies n + 1 ∈ B. Then B = N.
(2) If P(n) is a statement about natural numbers n such that
(i) P(1) is true, and
(ii) whenever P(k) is true for all natural numbers k ≤ n, P(n + 1) is also true,
then P(n) is true for all n ∈ N.

Proof. Let A be the set of all n ∈ N such that {1, . . . , n} ⊂ B. We will show that A is inductive. Since 1 ∈ B, {1} ⊂ B, so 1 ∈ A. Let n ∈ A. Then {1, . . . , n} ⊂ B, so n + 1 ∈ B. But then, {1, . . . , n + 1} ⊂ B, so that n + 1 ∈ A. Thus A is inductive, so A = N; and since n ∈ {1, . . . , n} ⊂ B for each n ∈ N, B = N.

Theorem. N is closed under addition.

The statement means that for all m, n ∈ N, m + n ∈ N. In the proof, we will use the general principle mentioned earlier: we will choose one of m and n and prove the statement with the other universally quantified.

Proof. Let

A = {n ∈ N : m + n ∈ N, for all m ∈ N}.

Then 1 ∈ A, because this just says m + 1 ∈ N for all m ∈ N, which we already know. Now suppose n ∈ A. Then, for each m ∈ N, m + n ∈ N, so m + (n + 1) = (m + n) + 1 ∈ N, so that n + 1 ∈ A. Thus, A is inductive, so each n ∈ N belongs to A, which says that for all n ∈ N and for all m ∈ N, m + n ∈ N, as required.

In the above proof, how did we know to add n + 1 to the m? Because we assumed n ∈ A and we were trying to show n + 1 ∈ A. For that, we needed to show n + 1 satisfied the definition of the members of the set A. Thus, we fixed one m and added n + 1 to it to see if m + (n + 1) still belonged to N. It did. We also followed a common custom of not bothering to write out the line: n ∈ A implies n + 1 ∈ A.
Theorem. N is closed under multiplication.

Proof. Let

A = {n ∈ N : mn ∈ N, for all m ∈ N}.
Then, 1 ∈ A, since 1 is a multiplicative identity. So suppose n ∈ A and fix m ∈ N. Then, m(n + 1) = mn + m, by the distributive law. Since n ∈ A, mn ∈ N and thus, by closure under addition, mn + m ∈ N. This proves n + 1 ∈ A, so A is inductive. Therefore, by PMI, for all n ∈ N and for all m ∈ N, mn ∈ N.

Definition. For n ∈ N and x ∈ R we define xⁿ using the idea of induction: x¹ = x, and, assuming xⁿ defined, we let xⁿ⁺¹ = xⁿx. A definition of this type is called a recursive definition or a definition by recursion.

One also lets x⁰ = 1. (In some books, 0⁰ is left undefined, but these still tend to use it as though its definition were 0⁰ = 1, for example in an expression such as ∑ₖ₌₀ⁿ xᵏ, when x is allowed to be 0, so we take 0⁰ = 1 always. We have to be very careful, though, not to give this expression properties that it doesn't have.)

1. Sometimes the 2nd principle of induction is stated in only one step: If B is a set of natural numbers such that, for every n ∈ N, (k ∈ B for all k ∈ N with k < n) implies n ∈ B, then B = N. Show that in using this version one still has to prove 1 ∈ B.

2. If m, n ∈ N, with m > n, then m − n ∈ N.
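The recursive definition of powers above translates directly into a recursive program; the base case mirrors x⁰ = 1 (including the convention 0⁰ = 1) and the recursive case mirrors xⁿ⁺¹ = xⁿx. The sketch below is our own transcription (the name `power` is ours, not the text's).

```python
# Our transcription of the text's recursive definition of x^n.
def power(x, n):
    """Compute x**n for an integer n >= 0 by the recursion
    x^0 = 1, x^(n+1) = x^n * x."""
    if n == 0:
        return 1              # the convention 0^0 = 1 adopted in the text
    return power(x, n - 1) * x

assert power(2, 10) == 1024
assert power(0, 0) == 1       # 0^0 = 1 by the stated convention
assert power(5, 1) == 5       # the base case x^1 = x follows from x^0 = 1
```

Definitions by recursion and proofs by induction go hand in hand: a property of `power(x, n)` for all n is naturally proved by PMI, following the shape of the recursion.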
The integers and rational numbers

We now define the set Z of integers by

Z = N − N = {n − m : n ∈ N and m ∈ N},

and the set Q of rational numbers by

Q = {m/n : m, n ∈ Z, n ≠ 0}.

It is easy to show that Z = N ∪ {0} ∪ (−N) and that Q = {m/n : m ∈ Z, n ∈ N}. We find that Z is closed under addition and taking of additive inverses (that is, n ∈ Z implies −n ∈ Z). Moreover:

Theorem. Q is closed under addition and under multiplication and taking of additive and multiplicative inverses. Hence, Q satisfies the axioms of an ordered field.

The above unproved statements should be considered exercises. Do a few of them from time to time.

Incompleteness of the rationals. Although Q is an ordered field, it still has lots of holes. We expect there to be a number (√2) whose square is 2, for example. But:

Theorem. There is no rational number a such that a² = 2.

Proof. (We assume here a tiny bit of number theory. See [ ].) Suppose that a = m/n, in lowest terms, and that a² = 2. Then,

(m/n)² = 2,

so that m² = 2n². This says m² is even, so m itself must be even. (If m = 2k + 1, odd, then m² = 4k² + 4k + 1, odd.) But if m = 2k, then 4k² = 2n², so 2k² = n², so n² is even, and hence n is also even. Thus, m and n are both even, which contradicts the fact that m/n was in lowest terms.

The above may be extended to show that no prime has a rational square root. This result will be often used in examples. If you would like to prove this, remember the words "even" and "odd" mean "divisible by 2" and "not divisible by 2".

(The general result is: If n is a natural number that is not a perfect square, then √n is not rational.)
In any case, we see that √2, if it exists, is irrational; that is, not rational. We will prove, however, that it does exist in R as a consequence of completeness. See the section EXISTENCE OF ROOTS.
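The heart of the proof is the equation m² = 2n² and the parity step "m² even forces m even". A brute-force search — our own illustration, not part of the text — confirms both on a finite range: no pair of natural numbers up to the bound satisfies m² = 2n², and every odd m has an odd square.

```python
# Our brute-force companion to the parity argument (illustration only).
# m^2 = 2 n^2 has no solution in natural numbers m, n up to a bound,
# so no fraction m/n with small terms squares to exactly 2.
solutions = [(m, n)
             for n in range(1, 500)
             for m in range(1, 1000)
             if m * m == 2 * n * n]
assert solutions == []

# The parity step itself: the square of an odd number is odd,
# so m^2 even forces m even.
assert all((m * m) % 2 == 1 for m in range(1, 200, 2))
```

Of course no finite search can prove the theorem — the proof above does that for all m and n at once — but the search makes the claim concrete.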
A bit about Rⁿ

As the reader will know from first year Linear Algebra, Rⁿ (real Euclidean n-space) is the set of n-tuples x = (x₁, . . . , xₙ) of real numbers. There is an addition x + y and a multiplication cx defined for x, y ∈ Rⁿ and c ∈ R by

x + y = (x₁ + y₁, . . . , xₙ + yₙ)
cx = (cx₁, . . . , cxₙ),

making Rⁿ a vector space (we won't reproduce the definition here).

There is a dot product defined by x · y = ∑ᵢ₌₁ⁿ xᵢyᵢ and a norm ‖x‖ = (∑ᵢ₌₁ⁿ xᵢ²)^(1/2).

After establishing simple properties of the dot product, one proves the Cauchy-Schwarz inequality,

|x · y| ≤ ‖x‖‖y‖,

as follows. If x or y is 0, both sides are 0. If ‖x‖ = 1 and ‖y‖ = 1,

0 ≤ ‖x − y‖² = (x − y) · (x − y) = x · x − 2x · y + y · y = ‖x‖² − 2x · y + ‖y‖² = 2 − 2x · y,

so that x · y ≤ 1. Similarly, using 0 ≤ ‖x + y‖², we have −x · y ≤ 1. Thus, if ‖x‖ = ‖y‖ = 1,

|x · y| ≤ 1.  (∗)

If x ≠ 0 and y ≠ 0, we can replace x and y in (∗) by x/‖x‖ and y/‖y‖ to obtain

|(x/‖x‖) · (y/‖y‖)| ≤ 1,

which is the same as |x · y| ≤ ‖x‖‖y‖, as required.

The triangle inequality

‖x + y‖ ≤ ‖x‖ + ‖y‖

follows from the Cauchy-Schwarz inequality via the calculation

‖x + y‖² = (x + y) · (x + y) = x · x + 2x · y + y · y ≤ x · x + 2‖x‖‖y‖ + y · y = ‖x‖² + 2‖x‖‖y‖ + ‖y‖² = (‖x‖ + ‖y‖)².
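Both inequalities can be spot-checked numerically on random vectors. The sketch below is our own illustration (the helper names `dot` and `norm` are ours); it implements the definitions x · y = ∑xᵢyᵢ and ‖x‖ = (x · x)^(1/2) directly, with a small tolerance for floating-point rounding.

```python
# Our numeric spot-check of Cauchy-Schwarz and the triangle inequality in R^n.
import math
import random

def dot(x, y):
    """Dot product: sum of coordinatewise products."""
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    """Euclidean norm: square root of x . x."""
    return math.sqrt(dot(x, x))

random.seed(1)
for _ in range(200):
    n = random.randint(1, 6)
    x = [random.uniform(-10, 10) for _ in range(n)]
    y = [random.uniform(-10, 10) for _ in range(n)]
    # Cauchy-Schwarz: |x . y| <= ||x|| ||y||
    assert abs(dot(x, y)) <= norm(x) * norm(y) + 1e-9
    # triangle inequality: ||x + y|| <= ||x|| + ||y||
    assert norm([a + b for a, b in zip(x, y)]) <= norm(x) + norm(y) + 1e-9
```

The normalization trick of the proof — reducing to unit vectors x/‖x‖, y/‖y‖ — is worth remembering; it reappears throughout analysis.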
The existence of roots — a consequence of completeness

We now find out that the irrational number √2 does indeed exist.

Theorem. Let y be a positive real number. Then, for every n ∈ N, there exists a unique positive real number x such that xⁿ = y.

Proof. (1) First, note that for positive numbers a, b,

a < b =⇒ aⁿ < bⁿ.

This is proved by induction on n. (Exercise.)

(2) This implies uniqueness: Suppose x₁ⁿ = y and x₂ⁿ = y, with x₁ and x₂ positive but not equal. Then one must be smaller, by trichotomy. Say x₁ < x₂. Then x₁ⁿ < x₂ⁿ, so we can't have x₁ⁿ = x₂ⁿ. The contradiction shows x₁ = x₂.

(3) Let A = {a > 0 : aⁿ ≤ y}, B = {b > 0 : bⁿ ≥ y}. I claim that A and B are not empty and every element of A is ≤ every element of B. Indeed, since y > 0,

0 < y/(y + 1) < 1,

so we have

0 < (y/(y + 1))ⁿ ≤ y/(y + 1) < y.

Thus, y/(y + 1) ∈ A. In the same way, y + 1 > 1, so (y + 1)ⁿ ≥ y + 1 > y, and hence y + 1 ∈ B.
Now, if a ∈ A and b ∈ B we have a^n ≤ y ≤ b^n, so a^n ≤ b^n, and therefore a ≤ b. This comes from step 1, because if a > b we would have a^n > b^n.

(4) Step 3 sets us up for our completeness axiom. There must exist an x with a ≤ x ≤ b, for all a ∈ A and all b ∈ B. We will now show that for this x, x^n = y.

Let 0 < a < x. Then, a ∈ A. For suppose not; then, a^n > y, which makes a ∈ B and x ≤ each element of B, a contradiction. Similarly, if we let b > x, then b ∈ B. Thus,
a^n < x^n < b^n and a^n ≤ y ≤ b^n.
If we multiply the second string of inequalities by −1, they turn around:
−b^n ≤ −y ≤ −a^n,
and adding gives
a^n − b^n < x^n − y < b^n − a^n.
In other words, |x^n − y| ≤ b^n − a^n ≤ (b − a)n b^(n−1). Here we used the fact that
b^n − a^n = (b − a) Σ_{i=1}^{n} b^(n−i) a^(i−1) ≤ (b − a) n b^(n−1).
Now take any ε such that 0 < ε < x, and let a = x − ε, b = x + ε. Then b − a = 2ε and b < 2x, and so
|x^n − y| ≤ 2εn(2x)^(n−1).
Hence,
0 ≤ |x^n − y| / (2n(2x)^(n−1)) ≤ ε.

Now ε here was arbitrary satisfying 0 < ε < x. So,
|x^n − y| / (2n(2x)^(n−1)) = 0.
Hence |x^n − y| = 0. But then x^n − y = 0, so x^n = y, which completes the proof.
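The proof is constructive in spirit: x is squeezed between the sets A and B, and the estimate |x^n − y| ≤ (b − a)n b^(n−1) says that shrinking b − a forces x^n toward y. Here is a minimal numerical sketch of that squeeze using bisection; the function name, tolerance, and starting bracket are my own choices, not from the text.

```python
def nth_root(y, n, tol=1e-12):
    """Approximate the unique positive x with x**n == y (y > 0, n in N)
    by bisection: keep a in A = {a : a**n <= y} and b in B = {b : b**n >= y}."""
    a, b = 0.0, max(1.0, y)   # a**n <= y <= b**n, as in step (3) of the proof
    while b - a > tol:
        m = (a + b) / 2.0
        if m ** n <= y:
            a = m             # m belongs to A
        else:
            b = m             # m belongs to B
    return (a + b) / 2.0
```

Each iteration halves b − a, so the error bound above shrinks geometrically.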
Cantor's Principle and the uncountability of R

A closed interval in R is a set of the form [a, b] = {x : a ≤ x ≤ b}.

Cantor's Principle of nested intervals. For each n ∈ N, let I_n be a (nonempty) closed interval of the real number system and suppose I_n ⊃ I_{n+1}, for all n. Then, ⋂_{n∈N} I_n ≠ ∅.

Proof. Let I_n = [a_n, b_n], for each n. The condition I_n ⊃ I_{n+1} says that for each n, a_n ≤ a_{n+1} ≤ b_{n+1} ≤ b_n:
a_1 ≤ a_2 ≤ · · · ≤ a_n ≤ b_n ≤ · · · ≤ b_2 ≤ b_1.
A simple induction yields that n ≤ k implies a_n ≤ a_k and b_k ≤ b_n. From this we see that for any n, m ∈ N,
a_n ≤ b_m.
(Indeed, let k = max{n, m}; then a_n ≤ a_k ≤ b_k ≤ b_m.) Thus, by completeness there exists a real number x such that for all n, a_n ≤ x ≤ b_n; that is, x ∈ ⋂_{n∈N} I_n.

If we look more closely we can prove that if a = sup{a_n : n ∈ N} and b = inf{b_m : m ∈ N} [see SUPREMA, INFIMA, AND THE ARCHIMEDEAN PROPERTY], then
⋂_{n∈N} I_n = [a, b].
Each b_m is an upper bound for all of the a_n, so a ≤ b_m. Since this is true for all m, we have a ≤ b. Since a is an upper bound of the a_n and b is a lower bound of the b_m, we have a_n ≤ a ≤ b ≤ b_n, for all n ∈ N; that is, I_n ⊃ [a, b], for all n ∈ N. In other words,
⋂_{n∈N} I_n ⊃ [a, b].
Finish this by showing each element of the left side is an element of the right.
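The principle can be watched numerically. The sketch below is my own illustration, not from the text: it takes I_n = [−1/n, 1/n], whose intersection is {0}. At every finite stage the intersection of the first N intervals is [max a_n, min b_n], which is non-empty and shrinks toward [sup a_n, inf b_n] = [0, 0].

```python
from fractions import Fraction

def stage(N):
    """Endpoints of the intersection of the first N intervals
    I_n = [-1/n, 1/n]: the a_n increase, the b_n decrease."""
    a = [Fraction(-1, n) for n in range(1, N + 1)]
    b = [Fraction(1, n) for n in range(1, N + 1)]
    return max(a), min(b)   # finite-stage stand-ins for sup a_n, inf b_n
```

Exact Fraction arithmetic is used so the nesting a_n ≤ 0 ≤ b_n holds without rounding error.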
The uncountability of the reals.

We saw in the section on existence of roots that there are irrational numbers, √2 in particular. We are now going to see that there are many more irrational numbers than there are rationals. One can prove that it is possible to "count" the rational numbers; that is, to list all the rational numbers r_1, r_2, r_3, r_4, . . . , letting one rational number r_n correspond to each natural number n. The theorem below will prove that this cannot be done for the real numbers. More explanation about the countability of Q and of the concepts will follow the proof.

An interval [a, b] of R is called non-degenerate if a < b.

Lemma. If x ∈ R and [a, b] is a non-degenerate closed interval of R, then there exists a non-degenerate closed interval I ⊂ [a, b] with x ∉ I.

Proof. Let a′ and b′ be any two points with a < a′ < b′ < b, for example a′ = a + (b − a)/3 and b′ = a + 2(b − a)/3. Then the intervals [a, a′] and [b′, b] are disjoint, so either x ∉ [a, a′] or x ∉ [b′, b].

Of course, it is quite possible that x is in neither of the two intervals constructed in the above proof.
Theorem. Each interval [a, b] ⊂ R with a < b is uncountable; hence R itself is uncountable.

Proof. To say that [a, b] is uncountable is to say that it cannot be written in the form [a, b] = {x_1, x_2, . . . } = {x_n : n ∈ N}. Suppose, to the contrary, that we could write [a, b] in that form. Then, by the lemma, there exists a non-degenerate closed interval I_1 ⊂ [a, b] with x_1 ∉ I_1. Again, there is a non-degenerate closed interval I_2 ⊂ I_1 with x_2 ∉ I_2. Continuing recursively, assuming I_k defined with x_k ∉ I_k, we find a non-degenerate closed interval I_{k+1} contained in I_k with x_{k+1} ∉ I_{k+1}. Thus, we have a nested sequence of closed intervals
[a, b] ⊃ I_1 ⊃ I_2 ⊃ · · · ⊃ I_k ⊃ I_{k+1} ⊃ . . . ,
with x_k ∉ I_k, for all k. The intersection ⋂_{n∈N} I_n of all these I_n contains none of the points of [a, b], yet by Cantor's Principle of Nested Intervals, ⋂_{n∈N} I_n ≠ ∅. This is a contradiction.
Let us go through that contradiction slowly. Cantor's principle says ⋂_{n∈N} I_n ≠ ∅. But if c is a point of ⋂_n I_n, then c belongs to [a, b], so c is one of the x_k. But x_k ∉ I_k and I_k ⊃ ⋂_n I_n, so c = x_k ∉ ⋂_n I_n.
Finally, R is uncountable, because R contains the uncountable interval [0, 1], and any subset of a countable set is countable.

The reader may have seen elsewhere a proof involving decimal expansions. Although it looks quite different, it is essentially the same type of proof. The reason is that if x has decimal expansion 0.d_1 d_2 d_3 . . . , then the digits d_1, d_2, d_3, . . . determine a decreasing sequence I_1, I_2, I_3, . . . of intervals to which x belongs, and the intervals can be chosen to ensure that x_n ∉ I_n. You might try working this out.

One famous way of counting the positive rationals is
1/1, 2/1, 1/2, 3/1, 2/2, 1/3, 4/1, 3/2, 2/3, 1/4, 5/1, 4/2, 3/3, 2/4, 1/5, . . . ,
listing first those for which the numerator and denominator sum to 2, then those for which they sum to 3, then to 4, then to 5, etc. (If we want to list a number only once we would have to omit duplicates — those which are not in lowest terms, such as 2/2, 3/3, 6/2, etc.) To list all the rationals, start with 0, then alternate positive and negative, thus:
0, 1/1, −1/1, 2/1, −2/1, 1/2, −1/2, 3/1, −3/1, 2/2, . . .
So, the set of rational numbers is countable.

A set is finite if it is empty or it can be written in the form {x_1, x_2, . . . , x_n}, where n is a fixed natural number. Finite sets are also considered countable. It is evident, then, that each subset of a countable set is countable.
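The diagonal listing above is easy to mechanize. The following sketch (the function name and the tuple representation are my own choices) enumerates positive rationals in exactly the order of the text, with an option to omit the duplicates that are not in lowest terms.

```python
import math

def positive_rationals(count, skip_duplicates=False):
    """Enumerate positive rationals (num, den) diagonally, as in the text:
    first numerator + denominator == 2, then 3, then 4, and so on."""
    out, s = [], 2
    while len(out) < count:
        for num in range(s - 1, 0, -1):        # s-1/1, s-2/2, ..., 1/(s-1)
            den = s - num
            if skip_duplicates and math.gcd(num, den) != 1:
                continue                       # omit e.g. 2/2, 3/3, 6/2
            out.append((num, den))
            if len(out) == count:
                break
        s += 1
    return out
```

Every positive rational m/n eventually appears, at the diagonal s = m + n, which is what makes this a counting of Q⁺.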
It turns out that a set (such as Q) which is countable and infinite can always be written {x_1, x_2, x_3, . . . } = {x_i : i ∈ N}, with all of the x_i distinct. Such sets are called denumerable (or countably infinite).
For more about these things, including rigorous proofs, see the section COUNTABILITY.
Cantor's Principle in R^n.

To say that I is a closed interval of R^n means I = [a_1, b_1] × [a_2, b_2] × · · · × [a_n, b_n], where each of [a_1, b_1], . . . , [a_n, b_n] is a closed interval of R. If a = (a_1, . . . , a_n) and b = (b_1, . . . , b_n), we write [a, b] for this I. With these definitions we obtain the same result as before.

Theorem. (Cantor's Principle of Nested Intervals in R^n.) Let (I_k) be a sequence of non-empty closed intervals in R^n with I_k ⊃ I_{k+1} for all k ∈ N. Then ⋂_k I_k ≠ ∅.

The proof is an exercise. It consists of reducing the problem to the corresponding theorem in R.
Suprema, infima, and the Archimedean Property

For a subset S of R, a real number c is the largest element of S (also called greatest element or maximum of S) if
(1) c ≥ x, for all x ∈ S, and
(2) c ∈ S.
We then write c = max S. Similarly, c is called the smallest element of S (also called least element or minimum of S), and we write c = min S, if
(1) c ≤ x, for all x ∈ S, and
(2) c ∈ S.

It is immediate that a set can only have one maximum and only one minimum. Indeed, if c_1 and c_2 are different elements of S then either c_1 < c_2 or c_2 < c_1, so they can't both be the largest, nor can they both be the smallest.

A finite set F is one which is either ∅ or can be written as F = {x_1, . . . , x_n}, where n is a natural number. It is a nice exercise in induction to prove that every non-empty finite set of real numbers has a largest and a smallest element. We also know, from the Well-Ordering Property, that each non-empty set of natural numbers has a least element. The existence of greatest or least elements in other situations is guaranteed by completeness.

For a subset S of R, a real number u is said to be an upper bound for S if u ≥ x, for all x ∈ S. Similarly, ℓ is called a lower bound for S if ℓ ≤ x, for all x ∈ S.

Theorem. (a) If a non-empty set S ⊂ R has an upper bound, then it has a least one. (b) If S has a lower bound, then it has a greatest one.

Proof. (a) First suppose S has an upper bound and let U be the set of all the upper bounds of S. Our job is to prove that U has a least element. Now S is non-empty, by hypothesis, and U is non-empty, since an upper bound is assumed to exist. By definition of upper bound, if x ∈ S and u ∈ U, we have x ≤ u. Thus, we are in the setting of the Completeness Axiom: S plays the role of A and U plays the role of B. Therefore, there exists a real number c such that x ≤ c ≤ u, for all x ∈ S and u ∈ U. This is exactly what we want.
That x ≤ c for all x ∈ S says that c is an upper bound for S, and that c ≤ u for all u ∈ U says that c is the least one.

(b) Let us briefly outline the proof for lower bounds. If S ≠ ∅ and has a lower bound, then the set L of all its lower bounds is non-empty, and by definition ℓ ≤ x for all ℓ ∈ L and all x ∈ S. By the completeness axiom, there is a c with ℓ ≤ c ≤ x for all ℓ ∈ L and all x ∈ S. This c is then the greatest lower bound of S.

The least upper bound of S, if it exists, is also called the supremum of S (sup S), and the greatest lower bound of a set S, if it exists, is also called the infimum of S (inf S). A set which has an upper bound u is called bounded above by u, and a set which has a lower bound ℓ is called bounded below by ℓ. The theorem we just stated is often quoted as:
Theorem. (a) Every non-empty set bounded above has a supremum, and (b) every non-empty set bounded below has an infimum.

In many books, one of these is taken as an axiom and is called completeness. We have shown these follow from our Completeness Axiom. It is also easy to show that each implies our axiom.

Going back to the meanings of the terms, we see that c = sup S iff
(1) c ≥ x, for all x ∈ S, and
(2) if u ≥ x for all x ∈ S, then u ≥ c.
It is often useful, mainly in checking individual examples, to use condition (2) in its contrapositive form:
(2′) if u < c then there exists x ∈ S with u < x.
(In other words, if u is less than c, then it is no longer an upper bound of S.) You should compare (1) and (2) to the corresponding conditions for maximum, to see that if c = max S, then c = sup S.

In the case of lower bounds we see that c = inf S iff
(1) c ≤ x, for all x ∈ S, and
(2) if ℓ ≤ x for all x ∈ S, then ℓ ≤ c,
or in contrapositive form
(2′) if ℓ > c then there exists x ∈ S with ℓ > x.
(In other words, if ℓ is greater than c, then it is no longer a lower bound of S.) Again, a minimum of a set is always its infimum.

Examples. (a) For the set A = {−1/2, 3, 2, 1}, 3 is the maximum of A since it belongs to A and is ≥ each of the members of A. It is therefore also the supremum of A. Similarly, the minimum (hence also the infimum) of A is −1/2.

(b) The infimum of the set B = (3, ∞) = {x : x > 3} is 3. This requires proof. Certainly,
(1) 3 < x, for all x ∈ (3, ∞).
Now, suppose ℓ is a lower bound for B; that is, suppose ℓ ≤ x for all x > 3. Then ℓ ≤ 3. This was "our first Analysis result" (in the section on consequences of the ordered field properties). Thus,
(2) if ℓ ≤ x for all x ∈ B, then ℓ ≤ 3.
Hence 3 is the greatest lower bound of B (infimum).

(c) The set N of natural numbers has 1 as a minimum, hence infimum. It is not, however, bounded above, so has no supremum.

The Archimedean Property. (1) If a, b ∈ R and a > 0, then there exists an n ∈ N such that na > b.
(2) For each ε > 0 and b ∈ R, there exists n ∈ N such that b/n < ε.
Proof. (1) Let a > 0 and let b be any element of R. Suppose the conclusion were false. Then we would have na ≤ b, for all n ∈ N. This means that the set
S = {na : n ∈ N}
is bounded above (by b). Thus, S has a supremum (say, c = sup S). Since c is an upper bound for S,
na ≤ c, for all n ∈ N.
But if n ∈ N, so is n + 1; hence
(n + 1)a ≤ c, for all n ∈ N,
and therefore
na ≤ c − a, for all n ∈ N.
But this shows that c − a is also an upper bound for S, even though it is smaller than c, since a > 0. This is a contradiction, establishing (1).

(2) This is really the same statement: Let ε > 0. Then by part (1), there is an n ∈ N such that nε > b. But then b/n < ε, as required.

Another way to look at the contradiction at the end of the proof of (1) is to conclude from na ≤ c − a, for all n ∈ N, that c ≤ c − a, by property (2) of supremum, and hence c < c.

Practice Problems. (1) Find the supremum, infimum, maximum and minimum of {1/n : n ∈ N}, if they exist. (2) Do the same for (1, 2]. (3) Determine the set S = ⋂_{n∈N} (1 − 1/n, 1 + 1/n), and find its supremum and infimum (if possible).
Density of the rationals. If a and b are real numbers with a < b, then there is a rational number r with a < r < b.

The intuitive idea here is to use the Archimedean property to get to the left of a and then move in small steps of size 1/n < b − a to the right till we (or the rational number) land in (a, b).

Proof. Fix a, b ∈ R with a < b. By the Archimedean property, there exists k ∈ N with k > −a, so −k < a. Now, by the second form of the Archimedean property, there exists n ∈ N with
1/n < b − a.
Again, by the Archimedean property, there exists a natural number m such that m(1/n) > a + k. By the well-ordering property of N, there is a smallest such natural number. If m is this one, we have
m/n > a + k,
but
(m − 1)/n ≤ a + k.
Thus,
a < m/n − k,
and
m/n − k = (m − 1)/n + 1/n − k ≤ 1/n + a < (b − a) + a = b,
since 1/n < b − a. Summarizing, we have
a < m/n − k < b;
that is, a < r < b, where r = m/n − k, a rational number.
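The proof is an algorithm: it produces k, n, and m from a and b. A sketch of that recipe follows; the function name is mine, and floating-point rounding could in principle spoil the strict inequalities for endpoints extremely close together (exact arithmetic with fractions would avoid this), so treat it as an illustration under those assumptions.

```python
import math

def rational_between(a, b):
    """Find a rational r = m/n - k with a < r < b, following the proof:
    k in N with -k < a, 1/n < b - a, and m the least natural with m/n > a + k."""
    assert a < b
    k = max(1, math.floor(-a) + 1)      # k in N, k > -a
    n = math.floor(1 / (b - a)) + 1     # 1/n < b - a
    m = math.floor((a + k) * n) + 1     # least m with m/n > a + k
    return m, n, k                      # the rational is m/n - k
```

For example, rational_between(-0.7, -0.6) returns m, n, k with m/n − k = 4/11 − 1 = −7/11, which indeed lies in (−0.7, −0.6).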
Density of the irrationals. If a and b are real numbers with a < b, then there is an irrational number r with a < r < b. This should be proved as an exercise. Hint: if you add an irrational number to a rational one, the result is irrational.
Solution of some practice problems. Here are some examples which were suggested as practice problems.

Example. Find the supremum, infimum, maximum and minimum of {1/n : n ∈ N}, if they exist.

Soln. Let us call the set W. Since, for all n ∈ N,
0 < 1/n ≤ 1,   (∗)
we see that 1 ≥ x, for all x ∈ W; and since 1 ∈ W, 1 is the maximum of W. Since, when it exists, the maximum of a set is always its supremum, this says also that 1 = sup W. Again from (∗), we see that
(1) 0 ≤ x, for all x ∈ W (0 is a lower bound of W).
To show it is the greatest one, suppose ℓ is a lower bound for W with ℓ > 0. Then, by the Archimedean property, there exists n ∈ N with 1/n < ℓ; that is,
(2) there exists x ∈ W with x < ℓ,
as required.

Example. Determine ⋂_{n∈N} (1 − 1/n, 1 + 1/n) and find its supremum and infimum.
Soln. Call this set S. A point x belongs to S iff x ∈ (1 − 1/n, 1 + 1/n), for all n ∈ N. Now, certainly 1 belongs to each of these intervals, so 1 ∈ S. If x ≠ 1, then either x > 1 or x < 1. If x > 1, then x − 1 > 0, so by the Archimedean property, there exists a natural number n with 1/n < x − 1, and hence x > 1 + 1/n. Thus, for this n, x ∉ (1 − 1/n, 1 + 1/n), so x ∉ S. Similarly, if x < 1, then 1 − x > 0 and we can find n with 1/n < 1 − x, so that x < 1 − 1/n, and again x ∉ S. Thus, 1 is the only element of S; S = {1}. So 1 = inf S = sup S = max S = min S.

Example. Find the maximum, minimum, supremum and infimum of the set (1, 2], provided these exist.

Soln. Let us put A = (1, 2]. Then, by definition, x ∈ A iff 1 < x ≤ 2. Thus, 2 ≥ x for all x ∈ A, and 2 ∈ A. These are exactly the two conditions that 2 be the maximum of A. So, 2 = max A. Since, when it exists, the maximum of a set is always its supremum, this says also that 2 = sup A.

Let us go over this in this special case. We need two things: (1) that 2 is an upper bound of A (i.e. 2 ≥ x, for all x ∈ A) and (2) that it is the least one (if u ≥ x, for all x ∈ A, then u ≥ 2). The first of these was already part of 2 being the maximum of A. As for the second, since 2 ∈ A, if u ≥ each element of A, it is certainly ≥ 2.

Now let us investigate the question of minimum and infimum. It looks like 1 "would like to be" the minimum of A, but 1 ∉ A, so it cannot be the minimum. We do, however, have that 1 ≤ x, for all x ∈ A; that is, 1 is a lower bound of A. Is it the greatest one? Well, suppose ℓ > 1. There are two possibilities: ℓ > 2 and
1 < ℓ ≤ 2. In the first case, ℓ is not a lower bound of A, since 2 ∈ A. In the second, if we put a = (1 + ℓ)/2, we have 1 < a < ℓ ≤ 2. In particular, 1 < a ≤ 2, so a ∈ A, and also a < ℓ, so ℓ is not a lower bound for A. We have thus proved that the infimum of A is 1: 1 = inf A. Now, if A had a minimum, it would also have to be the infimum, so 1 would have to be the minimum, which it is not, since 1 ∉ A. Thus, (1, 2] has no minimum.
Supremum and Infimum as operations. We can view sup and inf as operations acting on the elements of a set S, yielding a new real number. This point of view can be quite convenient, especially in theoretical calculations. The definition of supremum, for example, together with the theorem on its existence, could be stated:

Let S be a non-empty set of real numbers. If x ≤ u, for all x ∈ S, then
(0) sup S exists,
(1) x ≤ sup S, for all x ∈ S, and
(2) sup S ≤ u.
Thus, (for all x ∈ S, x ≤ u) implies sup S ≤ u. "The supremum of a set of numbers, each ≤ u, is also ≤ u." (Of course, there is nothing new here, but a change of emphasis.)

Similarly, for a non-empty set S of real numbers, from ℓ ≤ x, for all x ∈ S, we deduce
(0) inf S exists,
(1) inf S ≤ x, for all x ∈ S, and
(2) ℓ ≤ inf S.
"The infimum of a set of numbers, each ≥ ℓ, is also ≥ ℓ."

Let us look at these principles in action.

Example. (1) If S is a set of real numbers and c ∈ R, cS denotes {cx : x ∈ S}. Now, if S is a non-empty set of real numbers, bounded above, and c > 0, then
sup(cS) = c sup S.

Proof. Let S be non-empty and bounded above, so that sup S exists. Let y ∈ cS. Then y = cx, for some choice of x ∈ S. Thus, y/c = x ∈ S. Since the supremum of S is an upper bound for S, y/c ≤ sup S. Since c > 0, we may multiply by it and get y ≤ c sup S. But y was an arbitrary element of cS, so for all y ∈ cS, y ≤ c sup S.
Hence, also,
sup(cS) ≤ c sup S.   (∗)
Now, suppose x ∈ S. Then cx ∈ cS. Hence, cx ≤ sup(cS). Therefore, since c is positive,
x ≤ (1/c) sup(cS).
The x ∈ S here was arbitrary, so this statement is true for all x ∈ S. Thus again we may "take the supremum of the left side over all x ∈ S," obtaining
sup S ≤ (1/c) sup(cS).
Thus, c sup S ≤ sup(cS). This, with the earlier statement (∗), yields sup(cS) = c sup S.
(2) Let A ⊂ B, where A is non-empty and B is bounded above. Then sup A ≤ sup B. Proof. Let x ∈ A. Then x ∈ B. Since sup B is an upper bound of B, x ≤ sup B. Thus, for all x ∈ A, x ≤ sup B. Hence, sup A ≤ sup B. How would the above change if we talked about inf instead of sup?
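For finite sets the supremum is just the maximum, so both identities above can be checked directly. A small sketch (the sets, values, and helper name are my own illustration):

```python
def sup(S):
    """For a finite non-empty set of reals, sup S = max S."""
    return max(S)

S = {0.5, 2.0, 3.25}
c = 4.0
cS = {c * x for x in S}   # cS = {cx : x in S}
A = {0.5, 2.0}            # A is a subset of S
```

Here sup(cS) = 13.0 = c · sup(S), and since A ⊂ S, sup A = 2.0 ≤ 3.25 = sup S.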
Exponents

Here we will explain how to define a^x, where 0 < a ∈ R and x ∈ R. What we want is that a^1 = a and the laws of exponents hold, namely:
(1) a^(x+y) = a^x a^y
(2) (a^x)^y = a^(xy)
(3) (ab)^x = a^x b^x,
for as many values of x, y, a, b as possible.

Natural exponent. Start by allowing a to be any real number. For n ∈ N, a^n is defined recursively by a^1 = a and a^(n+1) = a^n a. Thus, as we expect, a^n becomes a · a · · · a (n times).
Using induction, we can prove that
a^(n+m) = a^n a^m,  (a^n)^m = a^(nm),  and  (ab)^n = a^n b^n,
for all real a, b and natural numbers n, m; that is, the laws of exponents for a, b real and x, y natural.

Integer exponent. Now restrict to the case a ≠ 0. For x an integer, x = n − m where n, m ∈ N. We can prove that if n − m = n′ − m′, where n, m, n′, m′ are natural numbers, then
a^n / a^m = a^(n′) / a^(m′),
so it makes sense to define
a^(n−m) = a^n / a^m.
Notice in particular that a^0 = a^1 / a^1 = 1. With this definition, it is an easy exercise to show the laws of exponents hold for a, b non-zero and x, y integers.

Rational exponents. If x is any positive real number and n is a natural number, there exists a unique positive real number y such that y^n = x, and we let x^(1/n) = y. See the section EXISTENCE OF ROOTS. Here, if a > 0 and r is a rational number of the form m/n where m ∈ Z and n ∈ N, we put
a^r = (a^m)^(1/n).
To justify this, we need to show that if r is also p/q, where p ∈ Z and q ∈ N, then (a^m)^(1/n) = (a^p)^(1/q). By the uniqueness of roots, it is enough to show that when (a^m)^(1/n) is raised to the qth power, the result is a^p. As a first step, notice that (a^m)^(1/n) is also (a^(1/n))^m, since
((a^(1/n))^m)^n = (a^(1/n))^(mn) = (a^(1/n))^(nm) = ((a^(1/n))^n)^m = a^m.
But m/n = p/q means mq = pn. Thus,
((a^m)^(1/n))^q = ((a^(1/n))^m)^q = (a^(1/n))^(mq) = (a^(1/n))^(np) = ((a^(1/n))^n)^p = a^p,
as required.

Again, straightforward calculations allow us to prove the laws of exponents for a, b > 0 and x, y rational.
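The well-definedness just proved can be checked numerically: representing the same rational r by different pairs (m, n) gives the same value of a^r. A sketch follows; the function name is mine, and the 1/n-th power is taken as a floating-point root, so the check is only approximate.

```python
def rational_power(a, m, n):
    """a^(m/n) for a > 0, m an integer, n a natural number,
    computed as (a^m)^(1/n), as in the text."""
    return (a ** m) ** (1.0 / n)
```

For instance, 2^(3/2) and 2^(6/4) agree, and (a^m)^(1/n) agrees with (a^(1/n))^m, up to rounding.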
Arbitrary real exponents. We now come to the point of all the above discussion. We would like to extend the definition of a^x so it applies to all real x. For this, we look at three cases: a > 1, a = 1, and a < 1. Since we want (1·b)^x = 1^x b^x, we have no choice but to take 1^x = 1; and since we want 1 = (a a^(−1))^x = a^x (a^(−1))^x, we will have to have a^x = (a^(−1))^(−x). So we work with the case a > 1.

Let a > 1, x ∈ R. Then a^x is defined to be the unique number satisfying
a^r ≤ a^x ≤ a^s, for all rational numbers r, s with r < x < s.   (∗)
To establish that such a number exists, consider the two sets
C = {a^r : r ∈ Q, r < x} and D = {a^s : s ∈ Q, x < s}.
We show:
(a) c ∈ C and d ∈ D imply c < d.
(b) For each ε > 0 there exist c ∈ C and d ∈ D with d − c < ε.
It follows that there is exactly one number between the elements of C and D, namely sup C = inf D (this is an exercise), and the definition sets a^x equal to this value.

(a) Since a > 1, for natural numbers m, n, a^m > 1, hence a^(m/n) = (a^m)^(1/n) > 1. That is, for all rational numbers r > 0, a^r > 1. From this we deduce that for rationals,
(∗∗) r < s implies a^r < a^s.
Indeed, s − r > 0, so a^r < a^r a^(s−r) = a^(r+s−r) = a^s. Thus, if r < x and x < s, a^r < a^s, which establishes (a).

(b) By induction one can show that for every natural number n, b > 1 implies b^n − 1 ≥ n(b − 1); hence, taking b = a^(1/n),
a − 1 ≥ n(a^(1/n) − 1);
hence, if 0 < s − r < 1/n, then
a^s − a^r = a^r (a^(s−r) − 1) ≤ a^r (a^(1/n) − 1) ≤ a^r (a − 1)/n.
Let M be any rational > x. Use the Archimedean property to find n so large that
a^M (a − 1)/n < ε.
Then choose r, s rational with
x − 1/(2n) < r < x < s < x + 1/(2n).
Then r < M and s − r < 1/n, so
a^s − a^r ≤ a^r (a − 1)/n ≤ a^M (a − 1)/n < ε.
This is what we wanted: a^r ∈ C, a^s ∈ D and a^s − a^r < ε.
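The definition suggests a way to compute a^x: squeeze x by rationals r and evaluate a^r, for which only integer powers and roots are needed. A sketch in that spirit, using dyadic rationals so that the only root required is a square root (the function name, the number of bits, and this particular approximation scheme are my own choices, not from the text):

```python
import math

def power(a, x, bits=60):
    """Approximate a**x for a > 1, x >= 0, using only integer powers and
    square roots: a^x is squeezed by a^r for dyadic rationals r -> x."""
    n = int(x)                # integer part: a^n by ordinary multiplication
    result = float(a) ** n
    f = x - n                 # fractional part, 0 <= f < 1
    r = math.sqrt(a)          # r = a^(1/2), then a^(1/4), a^(1/8), ...
    for _ in range(bits):
        f *= 2
        if f >= 1:            # next binary digit of the fractional part is 1
            result *= r
            f -= 1
        r = math.sqrt(r)
    return result
```

Each accepted binary digit multiplies in one more a^(1/2^k), so the accumulated exponent is a rational r < x with x − r < 2^(−bits), matching the squeeze in the proof.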
Note. (i) If x is rational, the new definition and the old definition of a^x give the same value, because of (∗∗).
(ii) If x < y and a > 1, we see that a^x < a^y. Indeed, by density we may choose rationals u, v with x < u < v < y, so a^x ≤ a^u < a^v ≤ a^y.

As we stated earlier, if a = 1, we let a^x = 1 for all x; if 0 < a < 1, we let a^x = (a^(−1))^(−x).
Theorem. With the above definitions, the expression a^x satisfies the laws of exponents: (1) a^(x+y) = a^x a^y, (2) (a^x)^y = a^(xy), (3) (ab)^x = a^x b^x.

Proof. (1) Start with a > 1 and x, y ∈ R. Let u and v be rational with u < x + y < v. Since x + y − u > 0, we may choose a rational r with x − (x + y − u) < r < x; then s = u − r is rational with s < y, and
a^u = a^(r+s) = a^r a^s ≤ a^x a^y.
Similarly, a^x a^y ≤ a^v. Thus, a^u ≤ a^x a^y ≤ a^v for all rationals u, v with u < x + y < v, so by the uniqueness in the definition, a^(x+y) = a^x a^y. The case a = 1 is trivial, and the case 0 < a < 1 follows from the case a > 1, by applying it to a^(−1):
a^(x+y) = (a^(−1))^(−(x+y)) = (a^(−1))^((−x)+(−y)) = (a^(−1))^(−x) (a^(−1))^(−y) = a^x a^y.

(2) This is similar to the proof of (1), but uses multiplication instead of addition. Case a > 1, x, y > 0. Let u, v be rationals with u < xy < v. Since u < xy and y > 0, u/y < x. Thus, there exists a rational r with u/y < r < x, and then s = u/r is a rational with s < y. Thus,
a^u = a^(rs) = (a^r)^s ≤ (a^x)^s ≤ (a^x)^y.
Similarly, (a^x)^y ≤ a^v. Hence, a^u ≤ (a^x)^y ≤ a^v, for all rationals u, v with u < xy < v, so by the uniqueness in the definition, a^(xy) = (a^x)^y.
The cases with a > 1 and one or both of x and y negative follow from the positive case by simple algebraic manipulation. For example, if a > 1, x > 0, y < 0, we have
a^(xy) = (a^(−xy))^(−1) = (a^(x(−y)))^(−1) = ((a^x)^(−y))^(−1) = (a^x)^y.
We leave the other cases to the reader. As in (1), the cases with a = 1 are trivial and the cases with a < 1 follow from the cases a > 1 by considering a^(−1).

(3) is left as an exercise.
Topology in R and R^n

In R, the distance between x and y, defined by d(x, y) = |x − y|, satisfies
(1) d(x, y) ≥ 0
(2) d(x, y) = 0 if and only if x = y
(3) d(x, y) = d(y, x)
(4) d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality).
These conditions are also satisfied by distances in R^n. For points (vectors) in R^n, the distance from x to y is d(x, y) = ‖x − y‖, where ‖x‖ = (Σ_{i=1}^n x_i^2)^(1/2). ‖x‖ is called the norm of x, and d is called the Euclidean metric or Euclidean distance in R^n.

We are going to talk about the topology of the real numbers. It seems to actually be just as easy to talk about the same concepts in R^n, and this even aids in understanding, so we do so, except when something special about the real numbers comes up. As usual, if we are working in R, the complement of a set A in R is A^c = R \ A, and if we are working in R^n, A^c means R^n \ A.

Let X = R or R^n. For x ∈ X, ε > 0,
B(x, ε) = {y : d(y, x) < ε}
is called the ε-neighbourhood or (open) ε-ball about x. ε is called the radius of the ball. For each x, we will let B_x = {B(x, ε) : ε > 0}, the family of all these basic neighbourhoods.

Now if x ∈ X and A ⊂ X, exactly one of three things can happen:
(x is an interior point of A)
A and A intersect each U ∈ Bx
(x is a boundary point of A)
A is disjoint from some U ∈ Bx
(x is an exterior point of A)
The sets of such points are respectively called the interior of A, the boundary of A, and the exterior of A, and denoted int A, bd A, ext A. You can check right away that a point x is an exterior point of A if and only if it is an interior point of A^c.

A set is called open if all its points are interior points, and is called closed if all points outside it are exterior points. Thus,
(1) A is open iff for each x ∈ A, there exists an ε > 0 with B(x, ε) ⊂ A;
(2) A is closed iff for each x ∉ A, there exists an ε > 0 with B(x, ε) ∩ A = ∅.
Since B(x, ε) ∩ A = ∅ ⇔ B(x, ε) ⊂ A^c, (2) says A is closed iff for all x ∈ A^c, there exists ε > 0 with B(x, ε) ⊂ A^c. That is:

Lemma. A set is closed if and only if its complement is open.

Since a set clearly contains its interior [[proof?]], we can say A is open iff it is equal to its interior, and it is closed iff its complement is equal to its exterior.

Examples. Determine which of the following sets are open, closed, neither or both.
(a) (1, 5). (b) [1, 5]. (c) (1, 5]. (d) [2, ∞).
(e) R. (f) {1, 2, 3}. (g) {1/n : n ∈ N}.
(h) {r ∈ Q : r > √2}.
(i) [2, 4] × [3, 6], a closed interval of R^2.

Soln. (a) Let A be the open interval (1, 5). If x ∈ A, then 1 < x < 5, so x − 1 > 0 and 5 − x > 0. Put ε = min{x − 1, 5 − x}. Then ε > 0. I claim B(x, ε) ⊂ A. Indeed, y ∈ B(x, ε) implies |y − x| = d(y, x) < ε. Thus,
1 = x − (x − 1) ≤ x − ε < y < x + ε ≤ x + (5 − x) = 5,
so y ∈ (1, 5) = A. We have shown that for each x ∈ A, there exists ε > 0 with B(x, ε) ⊂ A. So all of the points of A are interior points, or in other words, A is open.

On the other hand, A is not closed, because 1 is not in A and every neighbourhood of 1 contains points of A: if ε > 0, then the neighbourhood B(1, ε) = (1 − ε, 1 + ε), and if a is any point between 1 and min{5, 1 + ε}, such as min{3, 1 + ε/2}, then 1 < a < 1 + ε and a ∈ A, so a ∈ B(1, ε) ∩ A.

(b) Let B = [1, 5]. (Don't confuse this B with a ball B(1, ε).) Then B is not open, since, for example, 1 ∈ B but is not an interior point. Every neighbourhood of 1 contains points outside of B; indeed, if ε > 0, B(1, ε) = (1 − ε, 1 + ε) contains the point 1 − ε/2, which does not belong to B.

However, B is closed. To prove this, we take an element x ∉ B and find a neighbourhood of x which is disjoint from B. For such an x, either x < 1 or x > 5. In case x < 1, we let ε = (1 − x)/2, which is > 0. Then B(x, ε) = (x − (1 − x)/2, x + (1 − x)/2). All of its elements are < 1, so B(x, ε) ∩ B = ∅. In case x > 5, we put ε = (x − 5)/2. Then t ∈ B(x, ε) implies t > x − (x − 5)/2 > x − (x − 5) = 5, so t ∉ B. In other words, B(x, ε) ∩ B = ∅. Thus, in both possible cases, x has a neighbourhood which doesn't intersect B. (Actually, we could have taken ε = 1 − x in the first case and x − 5 in the second. The smaller choice of ε was taken just to make us feel "safer": B(x, ε) is well away from B.)

(c) If C = (1, 5], then C is not closed, because, as for part (a), each neighbourhood of 1 intersects C. Use the same a.
On the other hand, C is not open, because each neighbourhood of 5 contains points of C^c, namely points > 5. (Indeed, if ε > 0, then 5 + ε/2 ∈ B(5, ε), but does not belong to C.)
(d) [2, +∞) means {x ∈ R : 2 ≤ x < ∞}. Call this set A. Then 2 ∈ A, but each neighbourhood of 2 intersects A^c = (−∞, 2), so 2 ∉ int(A). Thus, A is not open. On the other hand, any point of A^c, that is, any point x < 2, is in the exterior. Indeed, let ε = 2 − x. Then t ∈ B(x, ε) implies t < x + ε = 2, so B(x, ε) is disjoint from A. Thus, A is closed.

(e) If x ∈ R, then each neighbourhood of x is ⊂ R, so x ∈ int R; thus, R is open. R is also closed, for there are no points outside of R, hence all points outside of R are exterior! The condition is vacuously satisfied.

(f) Let A = {1, 2, 3}. Here we have only 3 points. If x ∉ A, put ε = min{d(x, 1), d(x, 2), d(x, 3)}. This minimum exists since the minimum of a finite set always exists, and it is > 0 since each of d(x, 1), d(x, 2), d(x, 3) is > 0. Now, the ball B(x, ε) does not intersect A, because y ∈ B(x, ε) implies d(x, y) < ε ≤ d(x, a), for each a ∈ A. This shows that each point outside of A is an exterior point, so A is closed. On the other hand, if a ∈ A, then a is not an interior point of A. Indeed, if ε > 0, then B(a, ε) = (a − ε, a + ε) is an infinite set and A is finite, so B(a, ε) contains (infinitely many) points that are not in A. Thus, B(a, ε) ⊄ A.

(g) Let M = {1/n : n ∈ N}. Then M is neither open nor closed. M not closed: 0 ∉ M. To see that 0 has no neighbourhood disjoint from M involves the Archimedean property. If ε > 0, there exists n ∈ N with 1/n < ε. But then 1/n ∈ B(0, ε) ∩ M. M not open: 1 ∈ M, but each neighbourhood of 1 contains many points of M^c [[Why?]].

(h) Let S = {r ∈ Q : r > √2}. Then the density of the irrationals will show S is not open, and the density of the rationals will allow us to conclude S is not closed. In more detail: suppose x ∈ S. Then S does not contain a neighbourhood of x. Indeed, if ε > 0, then B(x, ε) is an interval, so contains an irrational, and S consists entirely of rationals. Thus, S is not open.
On the other hand, if x is any irrational > √2, then x ∉ S, and if ε > 0, B(x, ε) contains the interval (x, x + ε). Any rational in this set belongs to S. Thus such an x has no neighbourhood disjoint from S, so S is not closed.

(i) Let I = [2, 4] × [3, 6]. Then I is closed. To prove this, let c = (c_1, c_2) ∉ [2, 4] × [3, 6]. Then either c_1 ∉ [2, 4] or c_2 ∉ [3, 6] (or both). In the case c_1 ∉ [2, 4], either c_1 < 2 or c_1 > 4. In the first case, let ε = 2 − c_1. We have to show that B(c, ε) ∩ I = ∅. Let x = (x_1, x_2) ∈ B(c, ε). Then
|x_1 − c_1| ≤ ((x_1 − c_1)^2 + (x_2 − c_2)^2)^(1/2) = d(x, c) < ε.
Thus, x_1 < c_1 + ε = 2. Therefore x_1 ∉ [2, 4], so x ∉ [2, 4] × [3, 6]. The case c_1 > 4 is proved similarly, using ε = c_1 − 4. And, of course, the case c_2 ∉ [3, 6] splits into 2 subcases, also proved in the same way.

We now prove I is not open. There are lots of points that are not interior points. To show the way to more general situations, let us look at the point a = (2, 5). Since 2 ∈ [2, 4] and 5 ∈ [3, 6], (2, 5) ∈ I. To show that no neighbourhood of a is entirely contained in I, let ε > 0 be arbitrary. Let x = (2 − ε/2, 5). Then x ∉ I, since x_1 = 2 − ε/2 ∉ [2, 4]. But x ∈ B(a, ε), since
d(x, a) = ‖x − a‖ = ((2 − ε/2 − 2)^2 + (5 − 5)^2)^(1/2) = ε/2 < ε.
This shows B(a, ε) is not contained in I.
Since ε was arbitrary, no ball around a is contained in I. Thus, a is not an interior point of I and therefore, I is not open.

Pay close attention to the following two theorems. You will notice that they do not depend on the special structure of R, but only on the properties of distance. Thus, they are valid also in Rn (and even in more general metric spaces).

Theorem. Every open ball is an open set.

Proof. Let U = B(a, r), where r > 0. Let x ∈ U. Then d(x, a) < r. Put ε = r − d(x, a). Then B(x, ε) ⊂ U. Indeed, if y ∈ B(x, ε) then d(y, x) < ε, so

d(y, a) ≤ d(y, x) + d(x, a) < ε + d(x, a) = r − d(x, a) + d(x, a) = r,

that is, d(y, a) < r. In other words, y ∈ U, as required.
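The proof's choice ε = r − d(x, a) can be sanity-checked numerically. The sketch below (the sampling scheme and names are mine, not the text's) picks points y ∈ B(x, ε) at random and confirms that the triangle inequality lands them in B(a, r):

```python
import math
import random

# Numeric check of eps = r - d(x, a): every sampled y in B(x, eps) lies in B(a, r).
random.seed(0)

def dist(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

a, r = (0.0, 0.0), 1.0
for _ in range(1000):
    x = (random.uniform(-1, 1), random.uniform(-1, 1))
    if dist(x, a) >= r:
        continue                     # only test points inside B(a, r)
    eps = r - dist(x, a)
    t = random.uniform(0.0, 0.999) * eps
    theta = random.uniform(0.0, 2.0 * math.pi)
    y = (x[0] + t * math.cos(theta), x[1] + t * math.sin(theta))
    # d(y, a) <= d(y, x) + d(x, a) < eps + d(x, a) = r
    assert dist(y, a) < r
```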
Theorem. (1) The union of any family of open sets is open. (2) The intersection of a finite family of open sets is open.

Proof. (1) Let {Gi : i ∈ I} be a family of open sets and put U = ⋃i∈I Gi. If x ∈ U, then (by definition of union) there exists an i ∈ I with x ∈ Gi. Since Gi is open, there exists an ε > 0 with B(x, ε) ⊂ Gi. But Gi ⊂ U, so B(x, ε) ⊂ U. Thus, for each x ∈ U there is a neighbourhood of x contained in U, so U is open.

(2) It will be enough to prove that the intersection of two open sets is open, since the general case will follow by induction. So let G1 and G2 be open and let x ∈ G1 ∩ G2. Then, x ∈ G1 and x ∈ G2. Since each of these is open, there exist ε1 > 0 and ε2 > 0 such that B(x, ε1) ⊂ G1 and B(x, ε2) ⊂ G2. Put ε = min{ε1, ε2}. Then ε > 0 and B(x, ε) is contained in G1 and also contained in G2, so B(x, ε) ⊂ G1 ∩ G2. Thus, each point of G1 ∩ G2 has a neighbourhood contained in G1 ∩ G2, so G1 ∩ G2 is open.

The two properties in the above theorem form the basis for the concept of topology in more advanced studies. As a corollary to the theorem we obtain:

Theorem. The intersection of any family of closed sets is closed; the union of any finite family of closed sets is closed.

Proof. Recall that a set is closed if and only if its complement is open. So let {Fi : i ∈ I} be a family of closed sets and let C = ⋂i Fi. Then, for each i ∈ I, Fic is open. Therefore,

⋃i∈I Fic is open.

But, by De Morgan's laws,

Cc = (⋂i∈I Fi)c = ⋃i∈I Fic,

so Cc is open. Therefore, its complement, C is closed. The proof of the second statement is similar and is left to the reader.
Interior and boundary

Interior and boundary points were defined in the section Topology in R and Rn. It is worth noticing that the interior points are the points of the set that are not in the boundary.

Lemma. int A = A \ bd A.

Proof. If x ∈ int A then there exists a neighbourhood U = B(x, ε) of x with U ⊂ A. Since x ∈ U, x ∈ A. But U ⊂ A also implies U ∩ Ac = ∅, so x ∉ bd A. This shows x ∈ int A implies x ∈ A \ bd A. Conversely, if x ∈ A \ bd A, then x ∈ A and x ∉ bd A. The second statement means that there is a neighbourhood U of x which does not intersect both A and Ac. But x ∈ U so U intersects A, hence U doesn't intersect Ac; that is, U ⊂ A, so that x ∈ int A.

Practice. All points of A that are not interior points are in the boundary.

Proposition. (1) A is open iff it is disjoint from its boundary. (2) A and Ac have the same boundary. (3) A is closed iff it contains its boundary.

Proof. (1) Once again, int A = A \ bd A, so

A is open ⇔ A = int A ⇔ A = A \ bd A ⇔ A ∩ bd A = ∅.

(2) and (3) are left to the reader.
Example. Find the boundary, and interior points for the following. (a) (1, 5). (b) [1, 5]. (c) [1, 5). (d) [2, ∞). (e) R. (f) {1, 2, 3}. (g) {1/n : n ∈ N}. (h) {r ∈ Q : r > √2}.
Soln. (a) Let A be the open interval (1, 5). What are its boundary points and what are its interior points? As we have shown elsewhere A is open: each point of A is an interior point and since A can have no interior points outside, these are all of them; A = int(A). But, if ε > 0 then the neighbourhood B(1, ε) = (1 − ε, 1 + ε) contains both points of A and of Ac. Indeed, if a is any point between 1 and min{5, 1 + ε}, such as min{3, 1 + ε/2}, then 1 < a < 1 + ε and a ∈ A, so a ∈ B(1, ε) ∩ A
and if b is, say, 1 − ε/2, then b ∈ Ac and 1 − ε < b < 1, so b ∈ B(1, ε) ∩ Ac. Thus, 1 is a boundary point of A. In the same way, each neighbourhood of 5 contains points of A and of Ac. Thus, 5 is also a boundary point of A. If x > 5, taking ε = x − 5 we find that B(x, ε) ∩ A = ∅, so x is an exterior point (hence not a boundary point of A). Similarly, if x < 1, x is not a boundary point of A. We see that bd A = bd(1, 5) = {1, 5}. Notice that A contains none of its boundary points.

(b) Let B = [1, 5]. Here again each point of (1, 5) is an interior point of B, with exactly the same proof as in (a) and, just as for (1, 5), bd B = {1, 5}. This time, however, not all the points of B are interior points: the points 1, 5 are in B, but are boundary points rather than interior points; int B = (1, 5).

(c) If C = [1, 5), then one more time we see that bd C = {1, 5} and int C = (1, 5), but now C contains one of its boundary points but not the other.

(d) Let A = [2, +∞) = {x ∈ R : 2 ≤ x < ∞}. If x > 2 and ε = x − 2, then B(x, ε) ⊂ [2, ∞), so (2, ∞) ⊂ int[2, ∞). Each neighbourhood of 2 intersects both A and Ac = (−∞, 2), so 2 ∈ bd A. Since there are no more points of A, int A = (2, ∞). Similarly, any point < 2 is in the exterior, so bd A = {2}.

(e) If x ∈ R, then each neighbourhood of x is ⊂ R, so x ∈ int R; there are no more points, so int R = R and bd R = ∅.

(f) Let A = {1, 2, 3}. Here we have only 3 points. If x ∉ A, put ε = min{d(x, 1), d(x, 2), d(x, 3)}. This minimum exists since the minimum of a finite set always exists, and it is > 0 since each of d(x, 1), d(x, 2), d(x, 3) is > 0. Now, the ball B(x, ε) does not intersect A, because y ∈ B(x, ε) implies d(x, y) < ε ≤ d(x, a), for each a ∈ A. This shows that each point outside of A is an exterior point, so not a boundary point of A. On the other hand, if a ∈ A, then a is a boundary point of A. Indeed, if ε > 0, then a ∈ B(a, ε) ∩ A, so B(a, ε) ∩ A ≠ ∅.
Also, B(a, ε) = (a − ε, a + ε) is an infinite set and A is finite, so B(a, ε) contains (infinitely many) points that are not in A; that is, B(a, ε) ∩ Ac ≠ ∅. This shows that each point of A is a boundary point of A.

(g) Let M = {1/n : n ∈ N}. We find bd M = M ∪ {0}. That is, each of the points of M is a boundary point, 0 is another one, and there are no others. To see that 0 is a boundary point, we need the Archimedean property: if ε > 0, there exists n ∈ N with 1/n < ε. But then 1/n ∈ B(0, ε) ∩ M. On the other hand, 0 ∈ B(0, ε) ∩ Mc.
That there are no other boundary points involves 3 cases: x < 0, 0 < x < 1, and x > 1. In the first case, if we choose ε = −x, B(x, ε) ∩ M = ∅, since all elements of M are > 0. The case x > 1 is similar. Just use ε = x − 1. It is the case 0 < x < 1 that is more interesting. Let x ∉ M, but 0 < x < 1. Thus, there is no natural number with x = 1/n, so 1/x > 1 and 1/x is not a natural number. Then there is a greatest natural number n = ⌊1/x⌋ less than 1/x, and n + 1 = ⌈1/x⌉. Thus 1/(n + 1) < x < 1/n, so if ε = min{x − 1/(n + 1), 1/n − x}, then ε > 0 and B(x, ε) ∩ M = ∅; such an x is not a boundary point of M.

(h) Let S = {r ∈ Q : r > √2}. Then density of the rationals and of the irrationals will allow us to conclude bd S = [√2, ∞) and int S = ∅.

In the following list of properties we will see the interior of A is the largest open set contained in A.

Properties of interior.
(0) A is open iff A = int A;
(1) int A ⊂ A;
(2) A ⊂ B implies int A ⊂ int B;
(3) if G is open and G ⊂ A, then G ⊂ int A;
(4) int A is open; that is, int(int A) = int A;
(5) int(A ∩ B) = (int A) ∩ (int B).

Proof. (0) and (1) have been proved before; we are just collecting them here together.

(2) If x ∈ int A, then there exists an open ball U = B(x, ε) with U ⊂ A. But A ⊂ B, so U ⊂ B and hence x ∈ int B.

(3) If G is open and G ⊂ A, then G = int G ⊂ int A by (0) and (2).

(4) Let x ∈ int A. We have to show that there exists a neighbourhood of x contained in int A. By definition, we do know that there is an open ball U = B(x, ε) ⊂ A. But an open ball is an open set, so U ⊂ int A, by (3). Thus int A is open. The formula int(int A) = int A follows by (0).

(5) Since A ∩ B ⊂ A, (2) yields int(A ∩ B) ⊂ int A and similarly, int(A ∩ B) ⊂ int B, so int(A ∩ B) ⊂ (int A) ∩ (int B). On the other hand, int A and int B are open, so (int A) ∩ (int B) is open
and (int A) ∩ (int B) ⊂ A ∩ B. Thus, (int A) ∩ (int B) ⊂ int(A ∩ B), by (3).

We said that int A is the largest open set contained in A. Well, (4) says it is open, (1) says it is contained in A, and (3) says it contains any other open set contained in A.
Metric spaces and subspaces A metric space is a set S together with a function d : S × S −→ R (called a metric) such that for all x, y, z ∈ S, (1) d(x, y) ≥ 0 (2) d(x, y) = 0 if and only if x = y (3) d(x, y) = d(y, x) (4) d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality). Examples. (a) R, with d(x, y) = |x − y|. Pn 2 1/2 (b) Rn , with d(x, y) = kx − yk, where kxk = . kxk is called the i=1 xi norm of x and d is called the Euclidean metric in Rn . The first three conditions are very easy to prove from the properties of the square root function. Pn To prove the triangle inequality, one introduces the dot product x · y = i=1 xi yi , proves the Cauchy-Schwarz inequality: |x · y| ≤ kxkkyk, deduces the triangle inequality of the norm kx + yk ≤ kxk + kyk, and as in the real number system, turns it into the result about distance as follows kx − yk = kx − z + z − yk ≤ kx − zk + kz − yk. (c) The main examples we work with in this course are subspaces of Rn (or R). If X ⊂ Rn , we can define a distance function on X simply by restriction dX (x, y) = d(x, y) = kx − yk, for x, y ∈ X. This distance is still a metric, because the three properties were already satisfied for elements of Rn , so they are certainly satisfied for elements of X. The set X with the metric dX is called a (metric) subspace of Rn . In a metric space, the open ball centred at a radius r is B(a, r) = {x : d(x, a) < r}, and the closed ball centred at a radius r is B(a, r) = {x : d(x, a) ≤ r}, One also says “ball about a” instead of ball centred at a”. In a metric space, a set A is called bounded if it is contained in some ball. (It doesn’t matter whether one uses open or closed balls. Why?) Theorem. (a) In R, A is bounded in the metric space sense iff it is bounded above and bounded below, and iff there exists M with |x| < M , for all x ∈ A. (b) In Rn , A is bounded in the metric space sense iff it is contained in some ball centred at 0. (In part (b) we could use any other point. 
It is just that 0 is one we know, and often find very convenient.) 13/1/2003 1054 mam 53
Proof. (a) If A is bounded in the metric space sense, then there exist a point a and a radius r such that A ⊂ B(a, r). Thus, x ∈ A implies a − r < x < a + r. This shows that A is bounded below by a − r and above by a + r. Now suppose A is bounded below by b and above by c. Then A ⊂ [b, c], and [b, c] is a closed ball B̄(a, r), where a = (b + c)/2 and r = (c − b)/2. We leave it to the reader to show these are equivalent to the statement about M. It also follows from part (b).

(b) If A is contained in a ball centred at 0, it is certainly contained in a ball. Now, suppose A is contained in the ball B(a, r). Then, for all x ∈ A, d(x, a) < r, so d(x, 0) ≤ d(x, a) + d(a, 0) < r + d(a, 0). In other words, A is contained in the open ball centred at 0 of radius M = r + d(a, 0). Notice also that the condition that A is contained in the open ball centred at 0 of radius M means exactly that ‖x‖ < M, for all x ∈ A.

Balls, open sets, closed sets, in subspaces. Let X ⊂ R. Then, BX(a, ε) denotes the open ball centred at a of radius ε in the subspace X. Similar notation is used in the case of subspaces of Rn and for closed balls. We calculate immediately that

BX(a, ε) = {x ∈ X : dX(x, a) < ε} = {x ∈ X : d(x, a) < ε} = {x : d(x, a) < ε} ∩ X = B(a, ε) ∩ X.

Of course, a similar result holds for closed balls: B̄X(a, ε) = B̄(a, ε) ∩ X.

Proposition. Let X ⊂ R. For sets U, C ⊂ X,
(1) U is open in X iff there exists an open set G in R such that U = G ∩ X;
(2) C is closed in X iff there exists a closed set F in R such that C = F ∩ X.

Warning. If X = [a, b], a closed interval, and G = R, we get G ∩ X = [a, b], which is open in X but not in R.

Proof. ( ⇐= ) Let G be open in R. We have to prove that G ∩ X is open in X. Let a ∈ G ∩ X. Then a ∈ G. Since G is open in R, there exists ε > 0 such that B(a, ε) ⊂ G. Therefore BX(a, ε) = B(a, ε) ∩ X ⊂ G ∩ X. Therefore, every point of G ∩ X is an interior point and hence G ∩ X is open in X.

( =⇒ ) Let U be open in X. We have to find G in R such that U = G ∩ X.
For each a ∈ U, there is an open ball in X about a contained in U. That is, there exists εa > 0 such that BX(a, εa) ⊂ U. Then

⋃a∈U BX(a, εa) = U (why?).

But BX(a, εa) = B(a, εa) ∩ X, for all a ∈ U. Put

G = ⋃a∈U B(a, εa).

Then G is open (in the whole space) and

G ∩ X = ⋃a∈U (B(a, εa) ∩ X) = ⋃a∈U BX(a, εa) = U.
Thus, U = G ∩ X, where G is open in the whole space.

(2) is left as an exercise. (Hint: What is the relationship between open and closed sets?)

There are similar results about interior and about closure. WARNING: THE CORRESPONDING STATEMENT ABOUT BOUNDARY IS FALSE. SO IS THE CORRESPONDING STATEMENT ABOUT COMPACT SETS. In the case of compact sets, it turns out that a subset of X is compact in X iff it is compact in the whole space. (See the section: Compactness in subspaces.)
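The four metric axioms for the Euclidean metric of example (b) can be spot-checked on random vectors. A sketch under my own naming (an illustration only; the triangle inequality of course needs the Cauchy-Schwarz proof, not sampling):

```python
import math
import random

random.seed(1)

def d(x, y):
    # Euclidean metric on R^n
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def rand_vec(n=3):
    return tuple(random.uniform(-10, 10) for _ in range(n))

for _ in range(1000):
    x, y, z = rand_vec(), rand_vec(), rand_vec()
    assert d(x, y) >= 0                        # (1) non-negativity
    assert d(x, x) == 0                        # (2) d(x, x) = 0
    assert d(x, y) == d(y, x)                  # (3) symmetry
    assert d(x, y) <= d(x, z) + d(z, y) + 1e-9 # (4) triangle inequality
```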
Closure and accumulation: the Bolzano-Weierstrass Theorem (set form)

Closure. There is an operator that does for closed sets what interior does for open ones. For A ⊂ R or Rn (or any other metric space) we define the closure of A by cl A = A ∪ bd A. Then, the elements of cl A are called closure points of A.

Here is a characterization of closure which is so important that many take it as the definition.

Theorem. x is a closure point of A iff every neighbourhood of x intersects A.

You should prove this as an exercise, from the definition. We can also deduce this fact from the following formula.

Lemma. (cl A)c = int(Ac).

Proof. This is just a calculation. Since cl(A) = A ∪ bd A,

(cl A)c = (A ∪ bd A)c = Ac ∩ (bd A)c = Ac \ bd(A) = Ac \ bd(Ac)   (since bd(A) = bd(Ac))
= int(Ac).

To get the above theorem, then, we see that c ∈ cl(A) if and only if c ∉ int(Ac); that is, iff there does not exist a neighbourhood of c contained in Ac — in other words, iff every neighbourhood of c intersects A.

Properties of closure.
(0) A is closed iff A = cl A;
(1) cl A ⊃ A;
(2) A ⊂ B implies cl A ⊂ cl B;
(3) if F is closed and F ⊃ A, then F ⊃ cl A;
(4) cl A is closed; that is, cl(cl A) = cl A;
(5) cl(A ∪ B) = (cl A) ∪ (cl B).

The proofs are left as exercises. They follow almost immediately from the corresponding results for interior. Alternatively, one may use similar proofs.

Accumulation points. For a point c and a set A, c is an accumulation point of A if for each ε > 0, B(c, ε) ∩ A \ {c} ≠ ∅. Thus, c is an accumulation point of A iff each neighbourhood of c intersects A in a point other than c. The set of all accumulation points of A will be denoted here by acc A. (In some books this is denoted A′ and is called the derived set of A. We won't use this terminology.)
A neighbourhood of c with c removed is called a deleted neighbourhood of c. In particular, the set B′(c, ε) = B(c, ε) \ {c} is called the deleted neighbourhood of c of radius ε. (Here ε > 0 as usual.) Using this language we could say that c is an accumulation point of A if each deleted neighbourhood of c intersects A.

Theorem. (a) acc A \ A = bd A \ A = cl A \ A. (b) cl A = A ∪ acc A (= A ∪ bd A).
Proof. (a) Let c ∈ (acc A) \ A. Then c ∉ A and every neighbourhood of c intersects A \ {c}. Let U be a neighbourhood of c. Since U intersects A \ {c} it intersects A and it also intersects Ac, since c ∈ U ∩ Ac. This shows that c ∈ bd A. Since it is also in Ac, c ∈ bd A \ A. Conversely, let c ∈ bd A \ A. Then c ∈ bd A and c ∉ A. By definition of boundary, every neighbourhood U of c intersects A (and Ac). And since c ∉ A, U ∩ A = U ∩ A \ {c} ≠ ∅. So that c ∈ acc A. The second equality is trivial: cl A \ A = (A ∪ bd A) \ A = bd A \ A.

(b) is of the same level of difficulty: cl A = A ∪ bd A = A ∪ (bd A \ A) = A ∪ (acc A), by (a).

Many books use the formula cl A = A ∪ acc A as the definition of closure.

Examples. (1) If A is (2, 3], then acc A = [2, 3].

(2) No finite set has an accumulation point. Indeed, if F is finite and ε = min{d(c, a) : a ∈ F \ {c}}, then ε > 0 and B(c, ε) ∩ F \ {c} = ∅.

(3) If A = {1/n : n ∈ N}, then acc A = {0}.

(4) acc N = ∅.

The following is one of the reasons for the name "accumulation" point. A point x is called an isolated point of A if it belongs to A, but is not an accumulation point of A. Thus, in the above examples (2), (3), and (4) all the points of the set are isolated.

Theorem. A point c is an accumulation point of A iff every neighbourhood of c contains infinitely many points of A.

Proof. ( ⇐= ) This direction of the proof is immediate. If a neighbourhood contains infinitely many points, then it contains one other than c!

( =⇒ ) Let c ∈ acc A. Then, by definition,

∀ε > 0, B(c, ε) ∩ A \ {c} ≠ ∅.   (∗)

First, let ε1 be any number > 0. Taking ε = ε1 in (∗) we have B(c, ε1) ∩ A \ {c} ≠ ∅. Choose x1 ∈ B(c, ε1) ∩ A \ {c}. Since x1 ≠ c, d(x1, c) > 0. Let ε2 be any positive number ≤ d(x1, c). Take this for ε in the definition and get a point x2 ∈ B(c, ε2) ∩ A \ {c}, so that 0 < d(x2, c) < d(x1, c) < ε1. Continuing recursively, assuming xn defined in B(c, εn), let 0 < εn+1 ≤ d(xn, c), then use the definition to choose xn+1 ∈ B(c, εn+1) ∩ A \ {c}. Thus d(xn+1, c) < d(xn, c) < ε1. This defines a sequence (xn) of elements of A \ {c} in B(c, ε1). Moreover, at each stage d(xn+1, c) < d(xn, c), so if m > n, then d(xm, c) < d(xn, c), so the points xn are distinct. (You see that, don't you? If xn = xm then d(xn, c) = d(xm, c), which is impossible when m > n.) Thus B(c, ε1) contains the infinitely many distinct points xn of A, and since ε1 was arbitrary, every neighbourhood of c contains infinitely many points of A.
Bolzano-Weierstrass Theorem (set form). Every bounded infinite set in R or Rn has an accumulation point.

We will be giving several proofs of this, because of the various techniques that they teach. Many of our results are true in general metric spaces. This is not one of them. It depends on the order completeness of the reals.

Proof. (Case of R.) Let A be a bounded infinite set in R. Since A is bounded, there exists a closed bounded interval I with A ⊂ I. Let d be its length (if I = [a, b], the length of I is b − a). We may bisect I, writing it as the union of two closed intervals J and K,

I = J ∪ K,

with the length of J and the length of K both d/2. Now

A = A ∩ I = A ∩ (J ∪ K) = (A ∩ J) ∪ (A ∩ K).

Now at least one of A ∩ J and A ∩ K must be infinite, otherwise A would be finite. So let I1 be one of J or K containing infinitely many points of A. Continue this recursively — if In has been chosen of length d/2^n, choose In+1 a closed interval of length d/2^(n+1) contained in In and containing infinitely many points of A. By Cantor's Principle of nested intervals, the intersection

⋂n∈N In

contains some point c. (In fact it will be a singleton {c}, but that is not needed here.)

We claim c is an accumulation point of A. To see this, let ε > 0. Then there exists n ∈ N with d/2^n < ε and we have In ⊂ B(c, ε). Indeed, c ∈ In and if x ∈ In, then d(x, c) ≤ length(In) = d/2^n < ε. Finally, since In contains infinitely many points of A, the neighbourhood B(c, ε) contains infinitely many points of A, so c is an accumulation point of A.

The proof of the Bolzano-Weierstrass Theorem in Rn is essentially the same, using Cantor's Principle in Rn. At each stage, we divide intervals into 2^n intervals of diameter half that of the previous, and continue as before.
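The bisection in the proof can be imitated on the concrete bounded infinite set A = {1/n : n ∈ N} ⊂ [0, 1]. In this sketch (the encoding is mine, not the text's), the test "the interval [a, b] contains infinitely many points of A" is exact for this particular A: it holds iff a ≤ 0 < b, since the points 1/n accumulate only at 0.

```python
def contains_infinitely_many(a, b):
    # exact for A = {1/n : n in N}: infinitely many 1/n lie in [a, b] iff a <= 0 < b
    return a <= 0 < b

a, b = 0.0, 1.0
for _ in range(60):
    m = (a + b) / 2
    if contains_infinitely_many(a, m):   # keep a half that still holds
        b = m                            # infinitely many points of A
    else:
        a = m

# the nested intervals close down on the accumulation point c = 0
assert a <= 0 <= b and (b - a) < 1e-15
```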
Compactness and the Heine-Borel Theorem

The general statements and definitions we make here are valid in R, or Rn, or any metric space. The Heine-Borel Theorem however is only true in R or Rn.

If U is a family of sets and A is another set, U covers A means the union of the sets of U contains A:

⋃U = ⋃U∈U U ⊃ A.
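Covering is just a union-membership condition, which a few lines of Python make concrete (a hypothetical sketch of mine previewing Example 2 below; the helper names are not from the text):

```python
# The family {(0, n) : n in N} covers points of A = [1, oo), but any finite
# subfamily has union (0, M) and misses the point M + 1 of A.
def covered(x, family):
    return any(lo < x < hi for lo, hi in family)

full = [(0, n) for n in range(1, 100001)]   # finite stand-in for {Un : n in N}
assert all(covered(x, full) for x in [1, 2.5, 10, 99999.5])

finite = [(0, 3), (0, 7), (0, 5)]           # a finite subfamily
M = max(hi for lo, hi in finite)
assert not covered(M + 1, finite)           # M + 1 in A escapes the finite union
```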
Definition. K is compact means that whenever U is a family of open sets covering K, there is a finite U′ ⊂ U which also covers K.

Example 1. Every finite set is compact.

Proof. Let K = {x1, . . . , xn}. To prove K is compact we must begin with an arbitrary family of open sets which covers K and extract from it a finite set which still covers K. So let U be such a family of open sets:

⋃U∈U U ⊃ K.

Then for each i = 1, . . . , n,

xi ∈ ⋃U∈U U,

hence we may choose a Ui ∈ U with xi ∈ Ui. Put U′ = {U1, . . . , Un}. Then

⋃U′ = U1 ∪ · · · ∪ Un ⊃ K,

and U′ is finite, so U′ is a finite subfamily of U which covers K. We have shown that every family of open sets which covers K has a finite subfamily which also covers K. That is, by definition, K is compact.

Example 2. Let A = [1, ∞) = {x ∈ R : x ≥ 1}. Then A is not compact. To prove this we must show that there exists U, a family of open sets which covers A, such that there does not exist a finite subfamily U′ of U which also covers A. For each n ∈ N, put Un = (0, n), an open interval, hence an open set. Put U = {Un : n ∈ N}. Then

⋃U = ⋃U∈U U = ⋃n∈N Un.

This contains A. Indeed, a ∈ A means a ∈ R with a ≥ 1. By the Archimedean Property, there exists N ∈ N with N > a, hence 0 < a < N. In other words a ∈ UN, so

a ∈ ⋃n∈N Un.

Now, suppose U′ is a finite subfamily of U. Then, we can write

U′ = {Un1, . . . , Unk},
where k ∈ N. Let M be the maximum of the numbers n1, . . . , nk. Since the Un are (increasingly) nested,

⋃U′ = (0, M) = UM.

This does not contain A since A contains many points > M. Thus, U is a family of open sets covering A, which has no finite subfamily which covers A. Therefore, A is not compact.

Example 3. (2, 5] is not compact. To prove this notice that this interval has no minimum. It is the fact that the 2 is missing that will make it not compact. We put Un = (2 + 1/n, ∞), for each n ∈ N. Then, Un = (2, ∞) \ (2, 2 + 1/n]. The family U = {Un : n ∈ N} consists of open sets and it covers (2, 5]. Indeed, ⋂n (2, 2 + 1/n] = ∅, so

⋃n∈N Un = ⋃n∈N ((2, ∞) \ (2, 2 + 1/n]) = (2, ∞) \ ⋂n∈N (2, 2 + 1/n] = (2, ∞) ⊃ (2, 5].

Now, suppose U′ is a finite subfamily of U, say U′ = {Un1, . . . , Unk}. If M is the largest of the ni, i = 1, . . . , k, then the union of U′ is UM = (2 + 1/M, ∞) and there is a point in

(2, 5] \ UM = (2, 2 + 1/M],
so that U′ does not cover (2, 5]. Thus, (2, 5] is not compact.

The examples here are indicative of a general theorem.

Theorem. (1) Every compact set is bounded. (2) Every compact set is closed.

Proof. (Written for the case of Rm.) (1) Let K be compact. For each n ∈ N, let Un = B(0, n). Then, Un is open for each n ∈ N. Moreover, each point x is some distance d(0, x) away from 0. Thus, there exists an n with n > d(0, x), so x ∈ B(0, n). Since this is true for all x ∈ Rm,

⋃n∈N Un = Rm ⊃ K.

Therefore, by compactness, there exists a finite subfamily U′ of U which also covers K. Say, U′ = {Un1, . . . , Unk}. If M is the maximum of the ni, we have

⋃U′ = UM ⊃ K.

In other words, x ∈ K implies ‖x‖ < M, so K is bounded.
(2) Let K be compact and assume that K is not closed. We will find an open cover of K which has no finite subcover. This will be a contradiction.

We have slipped into the popular language here. A family U of open sets which covers K is referred to as an open cover of K, even though normally "open" refers to a set. A subfamily of a cover of A which still covers A is called a subcover. Sometimes one says 'subcover of A', but it is 'sub' of U and cover of A.
Well, since K is not closed, there exists a ∈ cl K which is not in K. For each n ∈ N, let Un = B̄(a, 1/n)c. Then each Un is open since the closed ball B̄(a, 1/n) = {x : d(x, a) ≤ 1/n} is a closed set. Now

⋂n∈N B̄(a, 1/n) = {a},

(why?) so

⋃n∈N Un = ⋃n∈N B̄(a, 1/n)c = (⋂n∈N B̄(a, 1/n))c = {a}c.

But, a ∉ K, so this contains K. We thus have our open cover. If {Un1, . . . , Unk} were a finite subcover, taking M = max{n1, . . . , nk} would give

UM = Un1 ∪ · · · ∪ Unk ⊃ K.

Since UM = B̄(a, 1/M)c, we have B̄(a, 1/M) ∩ K = ∅. But this is a contradiction — a ∈ cl K implies every neighbourhood of a intersects K. Hence, every compact set is closed.

Theorem. Every closed subset of a compact set is compact.

Proof. Let K be compact. Let F be a closed set with F ⊂ K. Let U be a family of open sets covering F.
The plan is to construct an open cover Ũ of K from this, extract a finite Ũ′ ⊂ Ũ which covers K, and then get from it the finite U′ ⊂ U which covers F.

Please do not make the mistake many students make of starting with a cover of K.

We put Ũ = U ∪ {Fc}. Since F is closed, the elements of Ũ are still open. Since U covers F,

⋃Ũ = ⋃U∈U U ∪ Fc ⊃ F ∪ Fc ⊃ K,

so Ũ is an open cover of K. Thus, there is a finite Ũ′ which covers K and hence also covers F. Now, Fc may or may not belong to Ũ′, but it contains no points of F, so is not really needed. We remove it if it is there: put U′ = Ũ′ \ {Fc}. Then, U′ still covers F, for if x ∈ F, then there exists U ∈ Ũ′ with x ∈ U; but this U cannot be Fc, since x ∈ F, so it must belong to U′.

As we said before, the general results above are actually true in any metric space. The following, however, is not. The fact that we are dealing with Rn is essential.
Heine-Borel Theorem. In Rn, any closed bounded set is compact.

Proof. (We write this for the case of R. The modifications for Rn are easy. To facilitate this we are using the term "diameter" rather than "length" of an interval.)

Suppose C is closed and bounded. Since C is bounded, it is contained in a closed interval I = [a, b]. Since each closed subset of a compact set is compact, we need only prove that I is compact. For this, let U be an open cover, such that no finite subfamily covers I. Put I1 = I and let d = diam(I). Assume In has been chosen a closed interval ⊂ I such that In is not covered by a finite subfamily of U and

diam(In) = d/2^(n−1).

Then, In is the union of two intervals of diameter (1/2) diam(In) = d/2^n. If both of these can be covered by finite subfamilies of U, then so could In, so at least one of them, say In+1, cannot be covered by a finite subfamily of U.

Thus, we have obtained, by recursion, a nested sequence (In) of closed intervals, and by Cantor's Principle, ⋂n In ≠ ∅. Let c be a point of this intersection. Now, U is an open cover of I, so there exists U ∈ U with c ∈ U. For such a U, there exists ε > 0 with B(c, ε) ⊂ U. Choose such an ε and then an n so that diam(In) = d/2^(n−1) < ε. Then,

In ⊂ B(c, ε) ⊂ U.

Thus, {U} is a finite subfamily of U which covers In, contrary to the construction: NO finite subfamily of U covers In.
Optional.

Theorem. Let F be a non-empty family of compact sets such that for each finite subfamily F′ of F, ⋂F′ ≠ ∅. Then ⋂F ≠ ∅.

(Recall that ⋂F and ⋂F∈F F both mean the same thing.)

Proof. Let F be a non-empty family of compact sets such that each finite subfamily has non-empty intersection. Let U = {Fc : F ∈ F}. Now, F compact implies F is closed and hence Fc is open. Thus, U is a family of open sets. Choose any K ∈ F. If we assume ⋂F∈F F = ∅, we have

⋃U = ⋃F∈F Fc = (⋂F∈F F)c = ∅c ⊃ K.

Thus, by compactness there exists a finite U′ ⊂ U which also covers K. In other words, there exists a finite F′ ⊂ F such that

⋃F∈F′ Fc ⊃ K.

That is,

⋂F∈F′ F ⊂ Kc,

or in other words,

⋂F∈F′ F ∩ K = ∅.

This shows that the intersection of the finite subfamily F′ ∪ {K} is empty. This is a contradiction.

Practice. To see if you understand, show that Cantor's Principle of Nested Intervals can be regarded as a special case of the above theorem.
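The optional theorem can be watched in action on the compact intervals Fn = [0, 1/n] (a small sketch of mine with exact rational endpoints, not part of the text):

```python
from fractions import Fraction
from itertools import combinations

# Every finite subfamily of {[0, 1/n]} has non-empty intersection,
# and the whole family meets in {0}.
family = [(Fraction(0), Fraction(1, n)) for n in range(1, 30)]

for sub in combinations(family, 3):
    lo = max(a for a, b in sub)
    hi = min(b for a, b in sub)
    assert lo <= hi                           # the finite intersection [lo, hi] is non-empty

assert all(a <= 0 <= b for a, b in family)    # 0 lies in every Fn
```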
Compactness in subspaces

You will recall that, in a subspace X of R,
(1) a set U is open iff there exists a set G, open in R, such that U = G ∩ X;
(2) a set C is closed iff there exists a set F, closed in R, such that C = F ∩ X.

In the first case U need not be open in R. For example, if X is not open in R we can take U = X and G = R. (However, U will be open in R if X is open in R. Indeed, U = G ∩ X is then the intersection of 2 open sets in R.) Similarly, in case (2), C need not be closed in R, though it is so if X is closed in R. But compactness behaves differently.

Proposition. Let X be a subspace of R. If K ⊂ X, then K is compact in X iff K is compact in R.

Proof. Suppose K is compact in X. To prove K is compact in R, let U be a cover of K by open sets in R. Since

⋃U∈U U ⊃ K,

intersecting with X gives

⋃U∈U (U ∩ X) ⊃ K ∩ X = K.

Thus, {U ∩ X : U ∈ U} is a cover of K by open sets of X. But K is compact in X, so there exists a finite subcover {U1 ∩ X, . . . , Un ∩ X}. Thus,

(U1 ∩ X) ∪ · · · ∪ (Un ∩ X) ⊃ K

and therefore, U1 ∪ · · · ∪ Un ⊃ K, so that {U1, . . . , Un} is the required subcover, showing that K is compact in R.

Now, for the converse, suppose K is compact in R. To show K is compact in the subspace X, let U be a cover of K by open sets of X. For each U ∈ U, there is an open GU of R with U = GU ∩ X. Then

⋃U∈U GU ⊃ ⋃U∈U U ⊃ K,

so that {GU : U ∈ U} is a cover of K by open sets of R. Thus, there is a finite subcover. Say,

GU1 ∪ · · · ∪ GUn ⊃ K.

Intersect these all with X and get

(GU1 ∩ X) ∪ · · · ∪ (GUn ∩ X) ⊃ K ∩ X = K.
That is, U1 ∪ · · · ∪ Un ⊃ K, so {U1 , . . . , Un } is a subfamily of U covering K, the required subcover. Hence, K is compact in X. Everything we said above is also true for subspaces of Rn , or indeed any other metric space: If X is a subspace of a metric space Y , then a subset K of X is compact in X iff it is compact in Y . The conclusion to all of this is that, if a theorem refers to a compact subset, it doesn’t matter what space we think of the set being in. This frequently comes up if we are working with a function defined only on a subset of R, or if we need to restrict a function to a subset of its domain.
Convergence of sequences — Examples

A sequence (xn) of real numbers is said to converge to the number a provided for all ε > 0, there exists N ∈ N such that for all n > N, |xn − a| < ε. In symbols:

∀ε > 0, ∃N ∈ N, ∀n > N, |xn − a| < ε.

We then call a the limit of the sequence (xn) and write limn xn = a or xn → a.
Examples. Below we give the general term of each of a number of sequences. The object is to investigate whether the limits of the sequences exist or not, with proof.
1. 1/n

2. 1/√n

3. 1 + 1/2^n

4. (−1)^n

5. n²/(1 + n²)

6. (n² + 2n)/(n³ − 5)

7. (−1)^n (1/2 − 1/n)

8. n²

Remember that |a + b| ≤ |a| + |b| and ||a| − |b|| ≤ |a − b|, versions of the triangle inequality.

Example (1). limn 1/n = 0. In other words, 1/n converges to 0 or briefly, 1/n → 0. Let xn = 1/n.

Analysis. The definition of convergence involves showing that the distance from the general term xn to 0 can be made small when n is large. So we look at |1/n − 0|. We have to make this small when n is large. More precisely, we must show that

∀ε > 0, ∃N ∈ N, ∀n > N, |1/n − 0| < ε.
Now,
1 − 0 < ε ⇔ 1 < ε n n 1 ⇔n> . ε 1 1 The Archimedean Property lets us find N ≥ , and any n > N will satisfy n > . ε ε So let us put this into a formal proof. Proof. Let ε > 0 be given (fixed, but arbitrary). Choose by the Archimedean 1 Property N ∈ N with N ≥ . Then, for n > N ε 1 1 < ≤ ε, n N therefore,
|1/n − 0| = 1/n < ε.

Thus, for all ε > 0, there exists N ∈ N such that for all n > N, |1/n − 0| < ε.
That is, limn 1/n = 0.
Using the second form of the Archimedean Property makes the proof even easier. There exists N ∈ N with 1/N ≤ ε, so any n > N will satisfy 1/n < ε. So let’s write the proof that way.

Proof. Let ε > 0 be given (fixed, but arbitrary). Choose by the Archimedean Property N ∈ N with 1/N ≤ ε. Then, for n > N,

1/n < 1/N ≤ ε,

therefore,
|1/n − 0| = 1/n < ε.

Thus, for all ε > 0, there exists N ∈ N such that for all n > N, |1/n − 0| < ε.
That is, limn 1/n = 0.
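The Archimedean recipe in these proofs can be checked numerically. Here is a minimal sketch (the helper names are ours, not from the text): given ε, take N = ⌈1/ε⌉, so that 1/N ≤ ε, and then every n > N satisfies |1/n − 0| < ε.

```python
import math

def witness_N(eps):
    # Archimedean Property: choose N in N with N >= 1/eps, so 1/N <= eps.
    return max(1, math.ceil(1 / eps))

def check_within(eps, trials=1000):
    # Verify |1/n - 0| < eps for a stretch of indices n > N.
    N = witness_N(eps)
    return all(abs(1 / n - 0) < eps for n in range(N + 1, N + 1 + trials))

print(witness_N(0.5))     # 2
print(check_within(0.01))
print(check_within(0.001))
```

Of course a finite check proves nothing; the point is only to see the quantifier pattern "given ε, produce N" in action.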
Example (2). (1/√n) converges to 0.
Analysis. We guess that limn 1/√n = 0. The relevant distance is

|1/√n − 0| = 1/√n.
For ε > 0 we need to find N such that n > N implies 1/√n < ε. Now

1/√n < ε ⇔ 1/ε < √n ⇔ n > 1/ε².

So we see how the proof should go.

Proof. Let ε > 0 be given. Choose N ∈ N with 1/N ≤ ε² (possible by the Archimedean Property). Then for n > N,

1/n < 1/N ≤ ε², hence 1/√n < ε.

Thus, for all n > N, we have |1/√n − 0| < ε. But ε was arbitrary, so for all ε > 0, there exists N ∈ N such that for all n > N, |1/√n − 0| < ε. That is, limn 1/√n = 0.

Example. Let xn = 1 + 1/2^n. Then limn xn = 1.
Analysis. We are to show that no matter what ε > 0 we are given, we can find an N so large that |xn − 1| < ε, for n > N . Now
|xn − 1| = |1 + 1/2^n − 1| = 1/2^n.
Since 2^n ≥ n, for all n ∈ N, 1/2^n < ε provided 1/n < ε.

Proof. Let ε > 0 be given. Choose N ∈ N with 1/N < ε, by the Archimedean Property. Then for all n ∈ N, n > N implies

|xn − 1| = |1 + 1/2^n − 1| = 1/2^n ≤ 1/n < ε.

Thus, for all ε > 0 there exists N ∈ N such that n > N implies |xn − 1| < ε; that is, xn −→ 1.
Example. Let an = (−1)^n. Then limn an does not exist; that is, the sequence (an) does not converge. Notice that

a1 = −1, a2 = 1, a3 = −1, a4 = 1, . . . ;

that is,

an = −1, if n is odd; an = 1, if n is even.
The successive values here never get closer together than 2. Imagine an −→ c. When n is even, |an − c| < ε gives |1 − c| < ε, hence 1 − ε < c < 1 + ε. When n is odd, |an − c| < ε gives |(−1) − c| < ε, hence −1 − ε < c < −1 + ε. We take ε = 1. Then the even case yields 0 < c < 2 and the odd one gives −2 < c < 0. Putting these things together properly will yield a contradiction.

Claim. (an) does not converge.

Proof. Suppose (an) converged. Then there would exist c with an −→ c. Take ε = 1. Then there exists N such that for all n > N, |an − c| < 1. For such N, we take an even n > N and get 1 − ε < c < 1 + ε, that is, 0 < c < 2. But we may also take an odd n > N, and get −1 − ε < c < −1 + ε, that is, −2 < c < 0. Thus the existence of such an N implies

c > 0 and c < 0,

a contradiction. Therefore, for ε = 1, no N exists with |an − c| < ε, for n > N. Therefore

∀ε > 0, ∃N ∈ N, ∀n > N, |an − c| < ε

is false: (an) does not converge to c. Here c was any proposed limit, so (an) does not converge.
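The contradiction with ε = 1 can be probed numerically. A sketch (the helper name is ours): beyond any N, the next two terms of ((−1)^n) include both 1 and −1, which are distance 2 apart, so no candidate c can be within distance 1 of both.

```python
def escapes_ball(c, N, eps=1.0):
    # Among a_{N+1}, a_{N+2}, one equals 1 and one equals -1; since they
    # are distance 2 apart, at least one lies outside B(c, eps) for eps = 1.
    return any(abs((-1) ** n - c) >= eps for n in (N + 1, N + 2))

# No candidate limit c survives: some term beyond N = 10 escapes B(c, 1).
print(all(escapes_ball(c, 10) for c in [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]))  # True
```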
Proof. (Second method.) Here the idea is that if (an) converges to some c, then the terms must be getting close together. Suppose (an) converges to c. Take ε = 1 in the definition to find N so that n > N implies |an − c| < 1. Consider any n > N. Then n + 1 is also > N. Thus,

|an − an+1| ≤ |an − c| + |c − an+1| < 1 + 1 = 2,

but also

|an − an+1| = |(−1)^n − (−1)^(n+1)| = |1 + 1| = 2.

These two statements together yield 2 < 2, a contradiction. Thus, (an) does not converge to c. The c here was arbitrary, so (an) does not converge at all.

We will see that the method just used is quite general. It involves the idea of a Cauchy sequence. See the section EXISTENCE: CAUCHY SEQUENCES.

Example. We guess limn n²/(1 + n²) = 1. One way to see this is to use the calculation:

n²/(1 + n²) = 1/(1/n² + 1).

The 1/n² is very small when n is large, so should be negligible.
Analysis.

|n²/(1 + n²) − 1| = |(n² − (1 + n²))/(1 + n²)| = |−1/(1 + n²)| = 1/(1 + n²).

Now 1 + n² > n, so 1/(1 + n²) < ε provided 1/n < ε. And Archimedes can take care of that.

Proof. Let xn = n²/(1 + n²). Let ε > 0 be given. By the Archimedean property, there exists N with 1/N ≤ ε. For such an N, let n > N. Then

1/(1 + n²) < 1/n < 1/N ≤ ε.

Thus,

|xn − 1| = |n²/(1 + n²) − 1| = |(n² − (1 + n²))/(1 + n²)| = |−1/(1 + n²)| = 1/(1 + n²) < ε.
Therefore, for all n > N , |xn − 1| < ε. Since ε > 0 was arbitrary, for all ε > 0 there exists N such that for all n > N , |xn − 1| < ε. Therefore, limn xn = 1.
Example. limn (n² + 2n)/(n³ − 5) = 0.
Analysis.

|(n² + 2n)/(n³ − 5) − 0| = |(n² + 2n)/(n³ − 5)|.
If n ≥ 2, we may remove the absolute value signs, because then n³ − 5 ≥ 2³ − 5 > 0. How much bigger must n be to make (n² + 2n)/(n³ − 5) < ε? Do not attempt to solve for n. If n ≥ 2 we do have

n² + 2n ≤ n² + n² = 2n²  and also  n³ − 5 ≥ n³/2, whenever n³/2 ≥ 5,

which will hold if n³ ≥ 10, which will hold if n ≥ 3. Now, for n ≥ 3, we have

(n² + 2n)/(n³ − 5) ≤ 2n²/(n³/2) = 4/n

and this is < ε provided n > 4/ε. We are ready to organize this into a proof.

Proof. Let us call this sequence (xn). Let ε > 0 be given. Choose N1 ∈ N with N1 ≥ 4/ε. Let N = max{3, N1}. Then for n > N, we have

|xn − 0| = |(n² + 2n)/(n³ − 5)|
         = (n² + 2n)/(n³ − 5)   (since n ≥ 2)
         ≤ 2n²/(n³/2)           (because n ≥ 3 =⇒ n³ − 5 ≥ n³/2)
         = 4/n < ε.

Thus, for all ε > 0, there exists N ∈ N such that for all n > N, |xn − 0| < ε. Thus, limn xn = 0.
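The bound driving the proof, (n² + 2n)/(n³ − 5) ≤ 4/n for n ≥ 3, can be spot-checked numerically (a sketch of ours, not part of the text):

```python
def xn(n):
    # The sequence of the example: (n^2 + 2n) / (n^3 - 5).
    return (n * n + 2 * n) / (n ** 3 - 5)

# Proof's estimate: for n >= 3, n^2 + 2n <= 2n^2 and n^3 - 5 >= n^3/2,
# hence |x_n - 0| <= 4/n, which forces x_n -> 0.
print(all(abs(xn(n)) <= 4 / n for n in range(3, 10_000)))  # True
print(abs(xn(1000)))  # about 0.001, consistent with the 4/n bound
```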
Example (7). Let an = (−1)^n (1/2 − 1/n). When n is large, we see that an gets close to 1/2, when n is even, and close to −1/2, when n is odd. We guess that this sequence does not converge.
Analysis. Suppose (an) converges and let c be its limit. For all n ∈ N,

|an+1 − an| ≤ |an+1 − c| + |c − an|.

This can be made small for n large. While also,

|an+1 − an| = |(−1)^(n+1) (1/2 − 1/(n+1)) − (−1)^n (1/2 − 1/n)|
            = |(−1)^(n+1)| |(1/2 − 1/(n+1)) + (1/2 − 1/n)|
            = |1 − (1/(n+1) + 1/n)|.

If n ≥ 2, we have 1/(n+1) + 1/n ≤ 1/3 + 1/2 < 1, so for these n we may remove the absolute value signs:

|an+1 − an| = 1 − (1/(n+1) + 1/n) > 1 − 2/n.

If n ≥ 6, we would get

|an+1 − an| > 1 − 2/6 = 2/3.

Now we see what to do for a proof. We will use ε = 1/3.
Proof. Suppose (an) converges to c. Then there exists N ∈ N with |an − c| < 1/3, for all n > N. Choose any n ∈ N greater than the maximum of N and 6. Then n + 1 is also > max{N, 6}. We have

|an+1 − an| ≤ |an+1 − c| + |c − an| < 1/3 + 1/3 = 2/3.   (∗)

Also,

|an+1 − an| = |(−1)^(n+1) (1/2 − 1/(n+1)) − (−1)^n (1/2 − 1/n)| = |1 − (1/(n+1) + 1/n)|.

Since n ≥ 6, we may remove the absolute value signs and get

|an+1 − an| = 1 − (1/(n+1) + 1/n) > 1 − 2/n ≥ 1 − 2/6 = 2/3.

Combining this with (∗) we have 2/3 < 2/3, a contradiction. Thus, (an) does not converge.
Example (8). The sequence (n²) does not converge. The reason this doesn’t converge is that it becomes too large (unbounded). Let us go directly to the proof.

Proof. Suppose limn n² = c ∈ R. Then, taking ε = 1 in the definition, there exists N such that for all n > N, |n² − c| < 1. By the triangle inequality, we have

n² ≤ |n² − c| + |c| < 1 + |c|, for all n > N.

But according to the Archimedean Property, there exists n with n > max{N, 1 + |c|}. For such an n we have n < n² < 1 + |c| < n, which is impossible. This contradiction shows that the limit c did not actually exist.
Limit theorems for sequences of reals

Limits of sequences of real numbers are unique. If a sequence (xn) converges to a number a, then that is the only number it converges to. This justifies the notation a = limn xn.

Theorem (uniqueness). Let (xn) be a sequence converging to a and to b. Then a = b.

Proof. Let ε > 0. Since xn −→ a, we may choose Na such that |xn − a| < ε/2, for n > Na. Since xn −→ b, we may choose Nb such that |xn − b| < ε/2, for n > Nb. Put N = max{Na, Nb}, and let n > N. Then, by the triangle inequality,

|a − b| ≤ |a − xn| + |xn − b| < ε/2 + ε/2 = ε.

Thus, for all ε > 0, |a − b| < ε. Hence, a − b = 0, by what we call “the first analysis result”, so a = b.

A set A ⊂ R is called bounded if there exists M ∈ R with |a| ≤ M, for all a ∈ A. You can prove that this is the same as “order bounded”, that is, bounded above and bounded below. A sequence (an) in R is called bounded if its range {an : n ∈ N} is bounded; that is, if there exists M ∈ R with |an| ≤ M, for all n ∈ N.

Theorem. Every convergent sequence is bounded.

Proof. Let (xn) be a convergent sequence and let a be its limit. From the definition of convergence, if we take ε = 1, we obtain an N ∈ N with

|xn − a| < 1, for all n > N.

Then, by the triangle inequality,

|xn| ≤ |xn − a| + |a| < 1 + |a|, for n > N.

Put M = max{|x1|, . . . , |xN|, 1 + |a|}. Then,

|xn| ≤ M, for all n ∈ N.

Therefore, (xn) is a bounded sequence.

Comparison Theorem. Let (xn) and (cn) be sequences of real numbers, a ∈ R. If limn cn = 0 and there exist k ≥ 0 and m ∈ N such that |xn − a| ≤ kcn, for all n > m, then limn xn = a.

The proof is a good exercise.
Theorem. Let (xn) and (yn) be sequences in R, a, b ∈ R.
(1) If xn −→ a and yn −→ b, then xn + yn −→ a + b.
(2) If xn −→ a and c ∈ R, then cxn −→ ca.
(3) If xn −→ a and yn −→ b, then xn yn −→ ab.
(4) Let xn −→ a and yn −→ b. If yn ≠ 0, for all n ∈ N, and b ≠ 0, then xn/yn −→ a/b.

Thus, limn(xn + yn) = limn xn + limn yn, provided the right side exists. You should write out similar formulations for (2), (3) and (4). Notice that you have the additional requirement, in the case of quotients, that none of the denominators be 0.

Proof of (1). Let xn −→ a, yn −→ b. Let ε > 0 be given. By definition, since xn −→ a, there exists N1 ∈ N such that

|xn − a| < ε/2, for n > N1.

Also, since yn −→ b, there exists N2 ∈ N such that

|yn − b| < ε/2, for n > N2.

Let N = max{N1, N2}. Then n > N implies both of these hold. Therefore, for n > N,

|xn + yn − (a + b)| = |xn − a + yn − b|
                    ≤ |xn − a| + |yn − b|   (by the triangle inequality)
                    < ε/2 + ε/2 = ε.

We have shown that for all ε > 0, there exists N ∈ N such that for all n > N, |(xn + yn) − (a + b)| < ε; that is, (xn + yn) converges to a + b.

Proof of (2). Suppose limn xn = a, c ∈ R. Let ε > 0 be given. Put ε0 = ε/(|c| + 1). Then there exists N ∈ N such that n > N implies |xn − a| < ε0. Then, n > N implies

|cxn − ca| ≤ |c||xn − a| ≤ |c|ε0 = |c| · ε/(|c| + 1) < ε.

Since ε > 0 was arbitrary, this shows for all ε > 0 there exists N such that for all n > N, |cxn − ca| < ε. In other words, cxn −→ ca.

In the above proof, we used |c| + 1 instead of |c| since c could have been 0 and we cannot divide by 0. An alternative would have been to treat the case c = 0 separately. After all, if c = 0, |cxn − ca| = |0 − 0| = 0, for all n ∈ N.
Analysis of (3). Let xn −→ a and yn −→ b. Then

|xn yn − ab| = |xn yn − xn b + xn b − ab|
             ≤ |xn yn − xn b| + |xn b − ab|
             ≤ |xn||yn − b| + |xn − a||b|.

Since |xn − a| gets small as n gets large and since |b| stays fixed, the second term here will become small. As for the first term, we see that |yn − b| gets small as n gets large. Without some control of |xn|, the product |xn||yn − b| could get large. But since every convergent sequence is bounded, this won’t be a problem: there is an M such that |xn| < M for all n ∈ N, and M|yn − b| can be made small.

Proof of (3). Since (xn) converges, there exists M with |xn| < M for all n ∈ N. Let ε > 0 be given. Since xn −→ a, there exists N1 ∈ N such that

|xn − a| < ε/(2(|b| + 1)), for n > N1.

Also, since yn −→ b, there exists N2 ∈ N such that

|yn − b| < ε/(2M), for n > N2.

Let N = max{N1, N2}. Then n > N implies both of these hold, and

|xn yn − ab| = |xn yn − xn b + xn b − ab|
             ≤ |xn yn − xn b| + |xn b − ab|
             ≤ |xn||yn − b| + |xn − a||b|
             < M · ε/(2M) + |b| · ε/(2(|b| + 1))
             ≤ ε/2 + ε/2 = ε.

Thus, for all n > N, |xn yn − ab| < ε. Since ε > 0 was arbitrary, this shows xn yn −→ ab.

Proof of (4). Let xn −→ a and yn −→ b, with b ≠ 0 and yn ≠ 0 for all n. It will be enough to prove that

1/yn −→ 1/b,   (∗)

because then, by the limit of products theorem (3),

xn/yn = xn · (1/yn) −→ a · (1/b) = a/b.

Accordingly, let ε > 0 be given. For all n,

|1/yn − 1/b| = |b − yn|/(|yn||b|).

Now, since yn −→ b, there exists N1 such that

|yn − b| < |b|/2, for n > N1,
so, by the triangle inequality, for n > N1,

|b| ≤ |b − yn| + |yn| < |b|/2 + |yn|,

so

|yn| ≥ |b|/2, for n > N1.

Also, there exists N2 such that

|yn − b| < ε|b|²/2, for n > N2.

Let N = max{N1, N2}. Then, for n > N,

|1/yn − 1/b| = |b − yn|/(|yn||b|) < (ε|b|²/2)/((|b|/2)|b|) = ε.

Thus, for all ε > 0, there exists N such that |1/yn − 1/b| < ε for all n > N; that is, 1/yn −→ 1/b, proving (∗).
To understand the next two theorems, it is better to think of convergence in terms of neighbourhoods. Remember that |x − a| < ε iff a − ε < x < a + ε, that is iff x ∈ B(a, ε) = (a − ε, a + ε). Thus, xn −→ a iff for all ε > 0, there exists N ∈ N such that for all n > N , xn ∈ (a − ε, a + ε). The key to the squeeze theorem below is that if x and z belong to an interval U , such as (a − ε, a + ε), and if x ≤ y ≤ z, then y also belongs to U . Squeeze Theorem. Let (xn ), (yn ) and (zn ) be sequences of real numbers with xn ≤ yn ≤ zn , for all n. If (xn ) and (zn ) both converge to a, then (yn ) converges to a. Proof. Let ε > 0 be given. Since xn −→ a, we may choose N1 with xn ∈ (a − ε, a + ε),
for n > N1 .
Since zn −→ a, we may choose N2 with zn ∈ (a − ε, a + ε),
for n > N2 .
Put N = max{N1 , N2 }. Then for n > N , both xn and zn belong to (a − ε, a + ε). But, by hypothesis, xn ≤ yn ≤ zn , for all n ∈ N. Hence, yn ∈ (a − ε, a + ε),
for n > N .
Explicitly, if n > N, then a − ε < xn ≤ yn ≤ zn < a + ε. Since ε > 0 is arbitrary, we have for all ε > 0 there exists N with |yn − a| < ε, for all n > N, as required.
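As a numeric illustration of the squeeze idea (the sequence sin(n)/n is our own example, not from the text): since −1 ≤ sin(n) ≤ 1, we have −1/n ≤ sin(n)/n ≤ 1/n, and both bounds converge to 0, so sin(n)/n −→ 0.

```python
import math

def within_squeeze(n):
    # Check the sandwich -1/n <= sin(n)/n <= 1/n for a given index n.
    y = math.sin(n) / n
    return -1 / n <= y <= 1 / n

print(all(within_squeeze(n) for n in range(1, 10_000)))  # True
print(abs(math.sin(10_000) / 10_000) < 1e-3)             # True
```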
Theorem. (Preservation of inequalities.) If (xn) and (yn) are sequences of reals with xn −→ x, yn −→ y and xn ≤ yn, for all n ∈ N, then x ≤ y.

Proof. Assume the hypothesis and suppose x > y. Put ε = (x − y)/2. Then, x − ε = y + ε = (x + y)/2. Now, since xn −→ x, we may choose N1 such that xn ∈ (x − ε, x + ε) for n > N1, and since yn −→ y, we may choose N2 such that yn ∈ (y − ε, y + ε) for n > N2. Let n be any natural number > max{N1, N2}. Then,

yn < y + ε = x − ε < xn.

This contradicts the fact that xn ≤ yn for all n ∈ N.

WARNING: This theorem does not say that strict inequality is preserved. If xn < yn, for all n, and the limits exist, it need not be true that limn xn < limn yn. We still only have limn xn ≤ limn yn. For an example, take xn = 1 − 1/n and yn = 1 + 1/n. Then xn < yn, but in the limit both become 1.

Basic Examples.
(1) If p > 0, limn 1/n^p = 0.
(2) If |a| < 1, then limn a^n = 0.
(3) limn n^(1/n) = 1.
(4) If a > 0, limn a^(1/n) = 1.

For example (1) we don’t require that p be an integer. But we need the fact that for all x > 0 and all p ∈ R, x^p is defined and the usual rules of exponents hold. See the section Exponents.

Proof of example (1). Let ε > 0. By the Archimedean property, there exists N ∈ N with N ≥ 1/ε^(1/p). Then for n > N,

1/n^p < (ε^(1/p))^p = ε.

Thus,
|1/n^p − 0| < ε, for n > N,

and 1/n^p −→ 0.
Proof of example (2). Let |a| < 1. If a = 0 the claim is clear, so assume a ≠ 0. Then 1/|a| > 1, so 1/|a| = 1 + b, where b > 0, and for n ∈ N,

|a|^n = 1/(1 + b)^n.
By the Bernoulli inequality, (1 + b)^n ≥ 1 + nb. (This is proved by induction or from the Binomial Theorem.) Hence,

|a^n| = |a|^n ≤ 1/(1 + nb) < 1/(nb).

Let ε > 0. Then there exists N such that 1/N < εb. Then for n > N,

|a|^n < 1/(nb) < εb/b = ε.

Thus, for all ε > 0, there exists N such that for all n > N, |a^n − 0| < ε. Hence a^n −→ 0.

Proof of example (3). Let xn = n^(1/n) − 1. Then, xn ≥ 0, for all n. It will be enough to prove xn −→ 0. By the Binomial Theorem, for n > 1,

n = (1 + xn)^n = 1 + n xn + (n(n−1)/2) xn² + · · · + xn^n > (n(n−1)/2) xn²;

hence,

xn² < n/(n(n−1)/2) = 2/(n − 1), for n > 1.

Now let ε > 0 be given. Choose any N ≥ 2/ε² + 1. Then, for all n > N, |xn| = xn < ε. Hence, xn −→ 0.
Proof of (4). First assume a ≥ 1. Then 1 ≤ a^(1/n) ≤ n^(1/n), for n ≥ a. So

|a^(1/n) − 1| ≤ n^(1/n) − 1, for n ≥ a.

By example (3), (n^(1/n) − 1) converges to 0, so a^(1/n) −→ 1, by comparison. (Alternatively we could have used the squeeze theorem.) In case 0 < a < 1, we apply the above to the reciprocal: 1/a > 1, so (1/a)^(1/n) −→ 1 and hence a^(1/n) = 1/(1/a)^(1/n) −→ 1 by the limit of quotients theorem.
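Basic examples (2) and (3) are easy to watch numerically (a sketch of ours; finite evidence only, the proofs above do the real work):

```python
# Example (3): n**(1/n) -> 1; the values decrease toward 1 as n grows.
ns = [10, 100, 1000, 10**6]
roots = [n ** (1 / n) for n in ns]
print(roots)
print(roots == sorted(roots, reverse=True))  # True: shrinking toward 1

# Example (2): a**n -> 0 for |a| < 1, e.g. a = 0.9.
print(abs(0.9 ** 200) < 1e-9)  # True
```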
Theorem. Let (an) be a sequence of non-negative real numbers such that limn an+1/an exists and is < 1. Then an −→ 0.

Proof. Let r = limn an+1/an and assume r < 1. Choose c with r < c < 1 and put ε = c − r. Then, there exists N ∈ N such that, for all n ≥ N, |an+1/an − r| < ε. Thus, for n ≥ N,

an+1/an < r + ε = c.
In other words, an+1 < c an for n ≥ N. Thus,

aN+1 ≤ aN c,
aN+2 ≤ aN+1 c ≤ aN c²,
. . .

By a simple induction, aN+k ≤ aN c^k, for all k ∈ N, or, what is the same,

an ≤ aN c^(n−N), for n > N.

Put K = aN c^(−N) and get

an ≤ K c^n, for n > N.

Since an ≥ 0 and c^n −→ 0, it follows that limn an = 0.
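As an illustration of the ratio theorem (the sequence n²/2^n is our own example): its ratios a(n+1)/a(n) = ((n+1)/n)²/2 tend to 1/2 < 1, so the theorem forces a(n) −→ 0.

```python
def a(n):
    # a_n = n^2 / 2^n; ratio a_{n+1}/a_n = ((n+1)/n)^2 / 2 -> 1/2 < 1.
    return n * n / 2 ** n

ratios = [a(n + 1) / a(n) for n in (10, 100, 1000)]
print(ratios)               # drifting down toward 0.5
print(abs(a(100)) < 1e-20)  # True: a_100 = 10^4 / 2^100 is already tiny
```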
It would be a good idea to write out the proof of uniqueness of limits in terms of the distance function d, to see that the result is also valid in a general metric space. Another way to look at uniqueness of limits is via

Theorem. (The Hausdorff Property.) In any metric space, if a ≠ b, there exist neighbourhoods Ua of a and Ub of b with Ua ∩ Ub = ∅.

Proof. Since a ≠ b, d(a, b) > 0. Take δ any positive number ≤ d(a, b)/2, Ua = B(a, δ) and Ub = B(b, δ). Then Ua and Ub are neighbourhoods of a and b, respectively, and Ua ∩ Ub = ∅. Indeed, if z ∈ Ua ∩ Ub, then

d(a, b) ≤ d(a, z) + d(z, b) < δ + δ ≤ d(a, b),

an impossibility.

To prove uniqueness of limits, suppose (xn) converged to both a and b. By the Hausdorff property, there exist neighbourhoods Ua of a and Ub of b with Ua ∩ Ub = ∅. Since xn −→ a, we may choose Na such that xn ∈ Ua, for n > Na. Since xn −→ b, we may choose Nb such that xn ∈ Ub, for n > Nb. Now, take a particular n > max{Na, Nb}. Then, xn ∈ Ua ∩ Ub, which is impossible.
Existence: Monotone sequences

In the definition and examples, we needed to know (or guess) the limit of a sequence in order to prove it converged. Here and in a subsequent section, we will find conditions that guarantee the existence of a limit, without knowing it in advance. The first is the idea of a monotone sequence.

A sequence (xn) of real numbers is called increasing if xn ≤ xn+1, for all n ∈ N; it is called strictly increasing if xn < xn+1, for all n ∈ N; it is called decreasing if xn ≥ xn+1, for all n ∈ N; and strictly decreasing if xn > xn+1, for all n ∈ N.

A sequence is called monotone if it is either increasing or decreasing, and I guess you can figure out what “strictly monotone” means.

Alternate terminology: Some people use ‘increasing’ for ‘strictly increasing’ and say ‘non-decreasing’ for what is here called increasing. (Ross’s book, used here in 1995–1996, uses this terminology.) Be careful: a sequence which is not decreasing need not be non-decreasing! (E.g. our friend ((−1)^n).) The book by Lay, “Analysis with an introduction to proof”, uses the same terminology as we do. The book by Gaskill and Narayanaswami, “Foundations of Analysis”, uses monotone increasing for increasing and monotone decreasing for decreasing. Sometimes isotone and antitone are used. Be sure which terminology a book you are looking at is using.
Monotone Convergence Theorem. Every bounded monotone sequence of real numbers converges.

Proof. We prove the increasing case and leave the decreasing case as an exercise. Suppose (xn) is an increasing sequence which is bounded. In particular, {xn : n ∈ N} is bounded above, so has a supremum. Let a = sup{xn : n ∈ N}. We claim (xn) converges to a. Indeed, fix ε > 0. By definition of supremum, for all n ∈ N,

xn ≤ a < a + ε,

and since a − ε < a, there exists N ∈ N with a − ε < xN. But (xn) is increasing, so xn ≥ xN, for all n ≥ N. Thus, for all n ≥ N,

a − ε < xn < a + ε.

That is, for all ε > 0, there exists N ∈ N with xn ∈ B(a, ε) for all n > N; in other words, xn −→ a.

Recall that B(a, ε) is {x : |x − a| < ε}, the ε-ball or ε-neighbourhood of a. You will notice that what one really proves is
Corollary. (1) Each increasing sequence bounded above converges to its supremum. (2) Each decreasing sequence bounded below converges to its infimum.

Of course, an increasing sequence is bounded iff it is bounded above, and a decreasing sequence is bounded iff it is bounded below. (Why?)

Example. Let (xn) be the sequence defined recursively by

x1 = 1,   xn+1 = √(1 + xn), for n ∈ N.

We will see that this is a bounded increasing sequence. If we try a few values, it appears that xn ≤ 2 for all n. So we try to prove that by induction. First, x1 = 1 < 2. Now suppose xn < 2. Then

xn+1 = √(1 + xn) < √(1 + 2) = √3 < 2,

so by induction xn < 2 for all n ∈ N. To prove that (xn) is increasing, we also use induction. We have to show xn ≤ xn+1, for all n ∈ N. This is true for n = 1, since x1 = 1 < √(1 + 1) = √(1 + x1) = x2. Now supposing it true for n, we have

xn+1 = √(1 + xn) ≤ √(1 + xn+1) = xn+2.
Thus the result holds for all n ∈ N. Since (xn) is bounded and increasing, we can apply the Monotone Convergence Theorem to yield that (xn) converges to some a ∈ R. To decide what the limit a is, we need that limn xn+1 is also a. (Proof?) Now, xn+1 = √(1 + xn), so (xn+1)² = 1 + xn, for all n. Hence, using the limit theorems for sums and products, we have

a² = limn (xn+1)² = 1 + limn xn = 1 + a.

Thus a² = 1 + a, and hence a = (1 ± √5)/2. Also, xn ≥ 0 for all n, so in the limit a ≥ 0. (Inequalities are preserved under limits.) This excludes the possibility that a = (1 − √5)/2; therefore a = (1 + √5)/2.

The above proof could have been shortened using the following result.

Lemma. If (yn) is a sequence of positive real numbers with yn −→ b, then √yn −→ √b.

The proof is an exercise, based on

√y − √b = (y − b)/(√y + √b).
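Iterating the recursion numerically shows the convergence to (1 + √5)/2 (a sketch of ours; the iteration count 60 is arbitrary, comfortably enough for machine precision):

```python
import math

# x_1 = 1, x_{n+1} = sqrt(1 + x_n): bounded, increasing, and the text
# shows the limit is a = (1 + sqrt(5))/2.
x = 1.0
for _ in range(60):
    x = math.sqrt(1 + x)

golden = (1 + math.sqrt(5)) / 2
print(x)                         # ≈ 1.618033988749895
print(abs(x - golden) < 1e-12)   # True
```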
Example. Finding √2. Notice that x² = 2 ⇐⇒ 2x² = x² + 2 ⇐⇒ x = (x² + 2)/2x. Let us use this formula as motivation to try to find √2. Let x1 = 2, and for each natural number n, let

xn+1 = (xn² + 2)/(2xn).   (∗)

We will show that (xn) is a bounded monotone sequence. For all a, b, a² + b² ≥ 2ab, since a² − 2ab + b² = (a − b)² ≥ 0. Thus, for all x > 0,

(x² + (√2)²)/(2x) ≥ √2.   (∗∗)

Now x2 = (2² + 2)/(2 · 2) ≤ 8/4 = 2 = x1. Assuming xn+1 ≤ xn, we have

xn+2 = ((xn+1)² + 2)/(2xn+1) ≤ xn+1 ⇐⇒ (xn+1)² + 2 ≤ 2(xn+1)² ⇐⇒ (xn+1)² ≥ 2,

which is true because of (∗∗). Thus, by induction, xn+1 ≤ xn, for all n. We have shown that (xn) is a decreasing sequence, which by (∗∗) is bounded below by √2. Thus xn converges to some x. Then taking a limit on both sides of (∗) yields

x = (x² + 2)/2x.

As we have noted, the only positive solution to this equation is x = √2, so that xn → √2.

The Monotone Convergence Theorem can be extended to the unbounded case using the concept of infinite limits. For a sequence (xn) of real numbers, limn xn = +∞ means for each M ∈ R, there exists N ∈ N such that, for all n > N, xn > M. Some people refer to this as (xn) converges to +∞ (in the extended real number system). Others say (xn) diverges to +∞, to emphasize that (xn) does not converge in the real number system. Similarly, limn xn = −∞ means for each M ∈ R, there exists N ∈ N such that, for all n > N, xn < M. Here one says (xn) converges to −∞ (in the extended real number system), or (xn) diverges to −∞.

Theorem. (a) If (xn) is unbounded and increasing, then limn xn = +∞. (b) If (xn) is unbounded and decreasing, then limn xn = −∞.

Proof. (a) Let (xn) be increasing and unbounded. By definition, limn xn = +∞ means for each M ∈ R, there exists N ∈ N such that, for all n > N, xn > M. So let M ∈ R be given. Since (xn) is increasing, it is bounded below by x1. But (xn) is unbounded, so it can’t be bounded above. Thus, there must be an N ∈ N with xN > M. Now, as in the case of finite limits, for n ≥ N, we have xn ≥ xN, since the sequence is increasing. Thus, for all n ≥ N, xn > M, as required.
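The √2 recursion (∗) above is easy to run; a few steps already show the decreasing, bounded-below behaviour the proof establishes (our own sketch; six iterations is an arbitrary but ample choice):

```python
# x_1 = 2, x_{n+1} = (x_n^2 + 2) / (2 x_n): decreasing, bounded below
# by sqrt(2), so it converges, and the limit satisfies x^2 = 2.
x = 2.0
seq = [x]
for _ in range(6):
    x = (x * x + 2) / (2 * x)
    seq.append(x)

print(seq)  # 2, 1.5, 1.41666..., 1.4142156..., ...
print(all(a >= b for a, b in zip(seq, seq[1:])))  # decreasing: True
print(abs(x * x - 2) < 1e-12)                     # True
```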
88
EXISTENCE: MONOTONE SEQUENCES
The proof of (b) is similar.
In the extended real number system R, +∞ is considered the supremum of any set which is not bounded above, and −∞ is the infimum of any set which is not bounded below. Thus, in general, Corollary. (1) each increasing sequence of real numbers converges to its supremum, possibly +∞. (2) each decreasing sequence of real numbers converges to its infimum, possibly −∞.
13/1/2003 1054 mam
Limit inferior and limit superior. If (xn ) is a bounded sequence there are two important monotone sequences of real numbers associated with it. If we put, for each n ∈ N, an = inf xk
and
k≥n
bn = sup xk k≥n
then (an ) is an increasing sequence and (bn ) is a decreasing sequence. Indeed, for each n ∈ N, {xk : k ≥ n} ⊃ {xk : k ≥ n + 1}, So an = inf{xk : k ≥ n} ≤ inf{xk : k ≥ n + 1} = an+1 and bn = sup{xk : k ≥ n} ≥ sup{xk : k ≥ n + 1} = bn+1 . (The infima and suprema here exist in R because the sequences are bounded both below and above.) Now, if (xn ) is bounded, we see that (an ) is also bounded, and since it is increasing it converges to some a. In fact, it converges to a = supn an . This a is called the limit inferior of (xn ) written lim inf n xn . Thus, lim inf xn = lim inf xk = sup inf xk n
n k≥n
n k≥n
Similarly, (bn ) converges to b = inf n bn , which is called called the limit superior of (xn ) written b = lim supn xn : lim sup xn = lim sup xk = inf sup xk . n k≥n
n
n k≥n
Other notation for these are: limn xn for lim inf n xn and limn xn for lim supn xn . 1 . Then for each n, 2n
Example. Let xn = 1 + (−1)n + xn = 2 +
1 , 2n
xn =
if n is even, and
1 2n
if n is odd.
Thus, if bn = supk≥n xk , then 1 2 + n, if n is even 2 bn = 2 + 1 , if n is odd. 2n+1 hence, lim sup xn = lim bn = 2. n
n
On the other hand, we find that inf k≥n xn = 0, for all n, so lim inf n xn = 0. 13/1/2003 1054 mam 89
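The tail suprema and infima of this example can be probed numerically (a sketch of ours; a finite stretch of the tail suffices here because for this sequence the sup is attained at the first even index, and the inf is approached quickly):

```python
def x(k):
    # The example's sequence: x_k = 1 + (-1)^k + 1/2^k.
    return 1 + (-1) ** k + 1 / 2 ** k

def a(n, tail=500):
    # a_n = inf_{k >= n} x_k, probed over a finite stretch of the tail.
    return min(x(k) for k in range(n, n + tail))

def b(n, tail=500):
    # b_n = sup_{k >= n} x_k, attained at the first even k >= n.
    return max(x(k) for k in range(n, n + tail))

print(b(10), b(500))  # ≈ 2: the decreasing b_n head toward lim sup = 2
print(a(10), a(500))  # ≈ 0: the increasing a_n head toward lim inf = 0
```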
Theorem. For a bounded sequence (xn) of real numbers:
(1) lim infn xn = a if and only if for each ε > 0,
(i) there exists n such that xk > a − ε, for all k ≥ n;
(ii) for all n there exists k ≥ n with xk < a + ε.
(2) lim supn xn = b if and only if for each ε > 0,
(i) there exists n such that xk < b + ε, for all k ≥ n;
(ii) for all n there exists k ≥ n with xk > b − ε.

Proof. We do the case of lim sup and leave the lim inf case as an exercise. Let b = lim supn xn. Then b = infn bn, where bn = sup{xk : k ≥ n}.

(i) Let ε > 0. Since b = infn bn, we may choose n with bn < b + ε. That is,

sup{xk : k ≥ n} < b + ε.
But if the supremum of a set is < some number, each member of the set is also < that number, so xk < b + ε, for all k ≥ n. Thus, for each ε > 0, there exists n such that xk < b + ε, for all k ≥ n.

(ii) Again let ε > 0. Then b − ε < b = infn bn, so if we fix an arbitrary n, b − ε < bn. But bn = sup{xk : k ≥ n}, and a number less than a least upper bound is no longer an upper bound, so there exists k ≥ n with b − ε < xk. Thus, we have shown that for each ε > 0 and each n ∈ N, there exists k ≥ n with xk > b − ε, completing the proof that the limit superior satisfies the two properties.

Conversely, suppose (i) and (ii) hold. Let ε > 0. Then, by (i), we may choose n such that xk < b + ε, for all k ≥ n. Taking supremum over all the k ≥ n, we obtain

sup{xk : k ≥ n} ≤ b + ε.

(Be careful: this is ≤, not <!) Thus, there exists n such that

sup{xk : k ≥ n} ≤ b + ε.
Hence, taking infimum (or limit),

lim supn xn = infn sup{xk : k ≥ n} ≤ b + ε.

Since this inequality is true for arbitrary ε > 0,

lim supn xn ≤ b.

Again, let ε > 0 and fix n ∈ N. Then, by (ii), there exists k ≥ n with xk > b − ε, so that

sup{xk : k ≥ n} > b − ε.

But then, since n was arbitrary, we may take infimum and get

lim supn xn = infn sup{xk : k ≥ n} ≥ b − ε.

Finally, since ε was arbitrary, we get

lim supn xn ≥ b.

Thus, lim supn xn ≥ b and lim supn xn ≤ b, so we have equality.

Note. Remember that the variables k and n in the definitions of limit inferior and limit superior and in the above theorem are dummy variables. Thus, lim supn xn is also limN sup{xn : n ≥ N} = infN sup{xn : n ≥ N}. And, if ε > 0:
(i) there exists N such that xn < b + ε, for all n ≥ N;
(ii) for all N, there exists n ≥ N with xn > b − ε.

Here, (i) can be summarized by saying xn < b + ε, for all except a finite number of indices n, and (ii) by saying that xn > b − ε, for an infinite number of n. “All but finitely many terms are < b + ε and infinitely many terms are > b − ε.”
Cluster points and subsequences: The Bolzano-Weierstrass theorem (sequence form)

A point c is called a cluster point of (xn) if for each ε > 0, for all n ∈ N, there exists m > n such that xm ∈ B(c, ε). Thus, the set {n ∈ N : xn ∈ B(c, ε)} is infinite. In other words, c is a cluster point of (xn) if each neighbourhood of c contains xn for infinitely many n. We also say (xn) clusters at c.

Lemma. If (xn) converges to c, then (xn) clusters at c.

Examples. (a) Let xn = (−1)^n. Then 1 and −1 are cluster points of (xn). Are there any others?
(b) Let xn = 1 + (−1)^n + 1/n. Then 2 and 0 are the only two cluster points of (xn).
(c) Let xn = n + (−1)^n n. Then (xn) has only one cluster point, yet does not converge.

Theorem. Let (xn) be a bounded sequence of real numbers. Then lim infn xn and lim supn xn are each cluster points of (xn).

Proof. (liminf case) This follows from the earlier result: lim infn xn = a if and only if for each ε > 0,
(i) there exists n such that xk > a − ε, for all k ≥ n;
(ii) for all n there exists k ≥ n with xk < a + ε.

To see this, let a = lim infn xn. Let ε > 0 and n ∈ N, and choose by (i) an N ∈ N such that for all k ≥ N, xk > a − ε. Now, by (ii), we can find m ≥ max{n + 1, N} with xm < a + ε. Thus, m > n and satisfies xm > a − ε and xm < a + ε, that is,

a − ε < xm < a + ε.

Thus, for all ε > 0 and all n ∈ N, there exists m > n with xm ∈ B(a, ε); that is, a is a cluster point of (xn). The proof for lim supn xn is similar.
For a bounded sequence (xn ) lim inf n xn is the smallest cluster point of (xn ) and lim supn xn is the largest. If a bounded sequence (xn ) has lim inf n xn = lim supn xn , then the sequence converges to this common value.
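Example (b) above can be probed by counting how many terms land in a small neighbourhood of a candidate point (our own sketch; a finite count only suggests the "infinitely many" in the definition):

```python
def x(n):
    # Example (b): x_n = 1 + (-1)^n + 1/n, cluster points 2 and 0.
    return 1 + (-1) ** n + 1 / n

def hits(c, eps=0.1, upto=10_000):
    # Number of indices n <= upto with x_n in B(c, eps).
    return sum(1 for n in range(1, upto + 1) if abs(x(n) - c) < eps)

print(hits(2))  # 4995: terms pile up near 2 (even n)
print(hits(0))  # 4995: terms pile up near 0 (odd n)
print(hits(1))  # 1: only x_1 = 1 comes near 1, so 1 is not a cluster point
```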
Subsequences. The reader will have noticed that in examples where there is more than one cluster point, the sequence seems to ‘converge’ to that point over a subset of the indices. If (xn) is a sequence and (nk) is a strictly increasing sequence of natural numbers, n1 < n2 < · · · < nk < nk+1 < · · ·, then the sequence (yk) defined by yk = xnk, for all k ∈ N, is called a subsequence of (xn).

Theorem. If (xn) is a sequence which clusters at a, then there is a subsequence (xnk) which converges to a.

Proof. From the definition, if (xn) clusters at a, for each n ∈ N and for each ε > 0, there exists m (which depends on both n and ε), with m > n and xm ∈ B(a, ε). We define a subsequence recursively as follows. First take n = 1 and ε = 1 and obtain an n1 > 1 with xn1 ∈ B(a, 1). Then take n = n1 and ε = 1/2 and obtain an n2 > n1 with xn2 ∈ B(a, 1/2). In general, if nk has been chosen, we choose nk+1 > nk with

xnk+1 ∈ B(a, 1/(k + 1)).

We then have xnk ∈ B(a, 1/k) for all k ∈ N. The resulting subsequence (xnk) converges to a. Indeed, let ε > 0. Then there exists K with 1/K < ε. For k > K we have 1/k < 1/K < ε, and xnk ∈ B(a, 1/k) ⊂ B(a, ε), as required.
The converse of the above result clearly holds too. Theorem. If (xn ) has a subsequence converging to a, then (xn ) clusters at a. Proof. By induction, one establishes that if (nk ) is a strictly increasing sequence of natural numbers, then nk ≥ k, for all k. Now suppose (xnk ) converges to a. Fix ε > 0, and N ∈ N. Then there exists K such that for k > K, xnk ∈ B(a, ε). Choose k to be any natural number > max{N, K}. Then, m = nk ≥ k > N and xm ∈ B(a, ε). Thus, ∀ε > 0, ∀N ∈ N, ∃m > N with xm ∈ B(a, ε).
Thus we have seen that the notions of cluster point and subsequential limit coincide. For a bounded sequence, lim supn xn is the largest of its cluster points and lim inf n xn is the smallest. As a corollary we have:
Bolzano-Weierstrass Theorem. (Sequence form.)
(1) Every bounded sequence of real numbers has a cluster point.
(2) Equivalently, every bounded sequence of real numbers has a convergent subsequence.

Another proof of the sequence form can be based on the following, which is of interest for its own sake.

Theorem. Every sequence of real numbers has a monotone subsequence.

Proof. Let (xn) be a sequence of real numbers. Call n a dominant index if xn ≥ xm, for all m ≥ n. There are 2 cases: either the set D of dominant indices is infinite or it is finite.

If D is infinite, choose a sequence (nk) in D with nk < nk+1, for all k ∈ N. Then xnk ≥ x_{n_{k+1}}, for all k, so the subsequence (xnk) is decreasing.

If D is finite, then for every n larger than all elements of D there exists m > n with xm > xn. In this case, we can let n1 be any index larger than every element of D, and for each k choose nk+1 > nk with x_{n_{k+1}} > x_{n_k}, obtaining a strictly increasing subsequence (xnk).

To prove the Bolzano-Weierstrass theorem from this result, let (xn) be a bounded sequence in R. Then (xn) has a monotone subsequence, which is again bounded. But every bounded monotone sequence converges, so (xn) has a convergent subsequence.
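The dominant-index construction can be sketched on finite data (a loose illustration of mine, not the text's: on a finite list "dominant" means "at least as large as every later term", and the two branches of the proof become two extraction strategies).

```python
import math

# Finite-data sketch of the proof: dominant indices give a decreasing
# subsequence; if there are too few, greedily jumping to a strictly larger
# term gives an increasing one.
def monotone_subsequence(x):
    n = len(x)
    dominant = [i for i in range(n) if all(x[i] >= x[m] for m in range(i + 1, n))]
    if len(dominant) >= 2:
        return [x[i] for i in dominant], "decreasing"
    idx = [0]
    for j in range(1, n):
        if x[j] > x[idx[-1]]:
            idx.append(j)  # next term strictly larger: increasing branch
    return [x[i] for i in idx], "increasing"

sub, kind = monotone_subsequence([math.sin(k) for k in range(100)])
sub2, kind2 = monotone_subsequence([1.0, 2.0, 3.0])
```

For a bounded input, the extracted monotone subsequence is bounded, and that is where Monotone Convergence takes over in the proof.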
Existence: Cauchy sequences

Definition. A sequence (xn) is called a Cauchy sequence if for all ε > 0 there exists N ∈ N such that for all n, m > N, |xn − xm| < ε.

We see immediately that

Lemma. Every convergent sequence is Cauchy.

Proof. Suppose (xn) converges to a. Let ε > 0. By definition, there exists N such that for all n > N, |xn − a| < ε/2. But then for n, m > N, |xn − xm| ≤ |xn − a| + |a − xm| < ε/2 + ε/2 = ε, as required.

The remarkable thing is that we will be able to prove that every Cauchy sequence converges, which will give us a second way of getting the existence of a limit without knowing it in advance.

Theorem. (Cauchy Criterion) Every Cauchy sequence of real numbers converges.

We prove this by showing
(1) If a Cauchy sequence clusters at a, it converges to a.
(2) Every Cauchy sequence is bounded.
and we put this together with:
(3) Every bounded sequence of real numbers has a cluster point (Bolzano-Weierstrass theorem)
to obtain the result.

Theorem. If a Cauchy sequence clusters at a, it converges to a.

Proof. Let (xn) be a Cauchy sequence with cluster point a. We will show that (xn) also converges to a. Let ε > 0. Since (xn) is Cauchy, there exists N such that for all n, m > N, |xn − xm| < ε/2. Fix such an N and let n > N. Since (xn) clusters at a, we may choose m > n such that |xm − a| < ε/2. Thus,

|xn − a| ≤ |xn − xm| + |xm − a| < ε/2 + ε/2 = ε.

We have shown, then, that for all n > N, |xn − a| < ε. Since ε > 0 was arbitrary, xn −→ a.

Theorem. Every Cauchy sequence is bounded.

Proof. Let (xn) be a Cauchy sequence. Apply the definition with ε = 1 to obtain N such that for n, m > N, |xn − xm| < 1.
In particular, taking m = N + 1,

|xn| ≤ |xn − x_{N+1}| + |x_{N+1}| < 1 + |x_{N+1}|, for all n > N.
So we put M = max{|x1 |, . . . , |xN |, 1 + |xN +1 |}. Then, |xn | ≤ M, for all n ∈ N. Therefore, (xn ) is a bounded sequence. Proof of the Cauchy criterion. Let (xn ) be a Cauchy sequence in R. Then (xn ) is bounded, by the above theorem, so has a cluster point by the Bolzano-Weierstrass theorem. Thus, (xn ) converges to that cluster point. You will have noticed that the completeness of the reals was the essential ingredient in this proof. (This was what caused cluster points, such as lim sup and lim inf to exist.) The fact that the Cauchy criterion holds for sequences in R is called metric completeness of R.
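A numerical illustration (mine, not the text's) of the Cauchy Criterion in action: the partial sums of ∑ 1/k² can be seen to bunch together without knowing the limit in advance; the value π²/6 is used below only as an after-the-fact check.

```python
import math

# Partial sums s_n of sum 1/k^2: a Cauchy sequence, hence convergent.
terms = [1.0 / k ** 2 for k in range(1, 200001)]
s = []
running = 0.0
for t in terms:
    running += t
    s.append(running)          # s[n-1] = s_n

# Beyond N = 1000, any two partial sums differ by less than the small tail.
N = 1000
window = [s[n] for n in range(N, N + 100)]
spread = max(window) - min(window)

limit_gap = abs(s[-1] - math.pi ** 2 / 6)   # known value, used only to check
```

The point of the criterion is visible here: `spread` is tiny even though nothing about the limit was used to compute it.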
The number e, an application of Monotone Convergence

Here we use the Monotone Convergence Theorem for the real numbers to prove that

lim_n (1 + 1/n)^n

exists. You know this from Calculus as the number e. Once we show it exists, we will make this the definition of e.

Let an = (1 + 1/n)^n. Of course, an ≥ 0. We will show that (an) is bounded above and is increasing, so that it will converge by the Monotone Convergence Theorem. First, use the Binomial Theorem to obtain

an = ∑_{k=0}^n (n choose k) (1/n)^k.

Boundedness. For each k,

(∗)  (n choose k)(1/n)^k = (1/k!) · 1 · (1 − 1/n)(1 − 2/n) · · · (1 − (k−1)/n).

Since each of the factors (1 − i/n) lies strictly between 0 and 1, this is ≤ 1/k!, hence

an = ∑_{k=0}^n (n choose k)(1/n)^k ≤ ∑_{k=0}^n 1/k!.

For k ≥ 1, k! ≥ 2^{k−1}, so

an ≤ ∑_{k=0}^n 1/k! ≤ 1 + ∑_{k=1}^n (1/2)^{k−1} ≤ 1 + 2 = 3.

Here we use the formula ∑_{k=1}^n r^{k−1} = (1 − r^n)/(1 − r), the sum of a finite geometric series, with r = 1/2. Since (an) is bounded below by 0, this shows (an) is bounded.

Increasing. To show (an) is actually increasing, look at the expressions for an and an+1:

an = ∑_{k=0}^n (n choose k)(1/n)^k,
an+1 = ∑_{k=0}^n (n+1 choose k)(1/(n+1))^k + (1/(n+1))^{n+1}.
So an ≤ an+1, provided for each k = 0, . . . , n,

(n choose k)(1/n)^k ≤ (n+1 choose k)(1/(n+1))^k.

But the left side here is (1/k!) · 1 · (1 − 1/n)(1 − 2/n) · · · (1 − (k−1)/n), and the right side is obtained from it by replacing n by n + 1, which makes it larger, since (1 − i/n) < (1 − i/(n+1)).

This proves (an) is a bounded monotone sequence of real numbers, so it converges. As we said, the limit is called e.

Connection with the series ∑_k 1/k!. We noticed above that

an = (1 + 1/n)^n ≤ ∑_{k=0}^n 1/k!.

And, if we let bn stand for the right side of this inequality, bn ≤ 3. Clearly, the sequence (bn) is increasing and bounded, so it too has a limit, denoted ∑_{k=0}^∞ 1/k!. Let us temporarily call this limit E. Since an ≤ bn, in the limit we have e ≤ E.

But look at (∗) again:

an = ∑_{k=0}^n (n choose k)(1/n)^k = ∑_{k=0}^n (1/k!) · 1 · (1 − 1/n)(1 − 2/n) · · · (1 − (k−1)/n).

Fix m ∈ N. Then, for n ≥ m,

an ≥ ∑_{k=0}^m (1/k!) · 1 · (1 − 1/n)(1 − 2/n) · · · (1 − (k−1)/n).

But this is a finite sum, and each of the i/n converges to 0 as n runs, so taking limits of both sides gives

e ≥ ∑_{k=0}^m 1/k! = bm.

Again, inequalities are preserved in the limit, so e ≥ lim_n bn = E. Since we had e ≤ E before, this gives e = E. That is,

lim_n (1 + 1/n)^n = ∑_{k=0}^∞ 1/k!.
(See SERIES OF NUMBERS.)
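The two routes to e above can be checked numerically (an illustration of mine, not part of the text): (an) is increasing and bounded by 3, and both (an) and (bn) approach the same value.

```python
import math

# a_n = (1 + 1/n)^n increases and stays <= 3; b_n = sum_{k<=n} 1/k!
# converges to the same limit e.
def a(n):
    return (1 + 1 / n) ** n

def b(n):
    return sum(1 / math.factorial(k) for k in range(n + 1))

a_vals = [a(n) for n in range(1, 2001)]
increasing = all(u <= v for u, v in zip(a_vals, a_vals[1:]))
bounded = max(a_vals) <= 3

# a_n approaches e only like e/(2n), so a large n is needed;
# b_n reaches machine precision already at n = 20.
gap = abs(a(10 ** 7) - b(20))
```

The slow O(1/n) convergence of (an) versus the factorial-fast convergence of (bn) is a practical reason the series is the preferred way to compute e.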
Series of numbers

Associated with a sequence (an) = (a1, a2, . . . ) of real (or complex) numbers is the series ∑_n an = ∑_{n=1}^∞ an. For each n ∈ N, the sum sn = ∑_{k=1}^n ak is called the nth partial sum of the series and an is called its nth term.

The series ∑_{n=1}^∞ an is said to converge if the sequence (sn) of partial sums converges, and to be Cauchy if (sn) is Cauchy. If s is the limit of (sn), we call s the sum of the series and write ∑_{n=1}^∞ an = s. If (sn) diverges, the series is said to diverge.

Example. (Geometric series) The series ∑_{n=1}^∞ x^{n−1} converges iff |x| < 1, in which case ∑_{n=1}^∞ x^{n−1} = 1/(1 − x).

Proof. If x ≠ 1, the nth partial sum is

sn = ∑_{k=1}^n x^{k−1} = (1 − x^n)/(1 − x).

Since x^n −→ 0 if and only if |x| < 1, this converges to 1/(1 − x) if |x| < 1 and diverges if |x| > 1. If x = 1, we have sn = n, so the series diverges (to +∞), and if x = −1, the sequence (sn) is (1, 0, 1, 0, . . . ), which also diverges.

For convenience of notation one also works with series of the form ∑_{n=p}^∞ an, whose terms form a sequence (ap, ap+1, . . . ) indexed on {p, p + 1, . . . } and whose partial sums are of the form sn = ∑_{k=p}^n ak. For example, the geometric series above is often written as ∑_{n=0}^∞ x^n. We have (if x ≠ 1)

∑_{k=0}^n x^k = (1 − x^{n+1})/(1 − x),

and if |x| < 1,

∑_{n=0}^∞ x^n = 1/(1 − x).
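A quick computational check of the geometric-series formulas (illustrative, not part of the text): the closed form matches the directly summed partial sums, the partial sums approach 1/(1 − x) for |x| < 1, and the x = −1 partial sums oscillate.

```python
# Partial sums of the geometric series sum_{k=0}^{n} x^k.
def geom_partial(x, n):
    return sum(x ** k for k in range(n + 1))

x = 0.5
formula = (1 - x ** 51) / (1 - x)       # (1 - x^{n+1})/(1 - x) with n = 50
direct = geom_partial(x, 50)
limit_gap = abs(geom_partial(x, 200) - 1 / (1 - x))

# For x = -1 the partial sums are 1, 0, 1, 0, ...: divergence by oscillation.
osc_even = geom_partial(-1, 10)
osc_odd = geom_partial(-1, 11)
```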
The monotone convergence theorem for convergence of sequences of reals becomes, in terms of series:
Theorem. A series of non-negative terms converges iff its partial sums are bounded.

Proof. Let ∑_{n=1}^∞ an be a series with an ≥ 0, for all n, and with partial sums sn = ∑_{k=1}^n ak. Then, for all n, sn ≤ sn + an+1 = sn+1. This shows that (sn) is an increasing sequence of real numbers, so if it is bounded above, it is convergent.

Note. We can say more: for a series of non-negative terms,

∑_{k=1}^n ak ≤ ∑_{n=1}^∞ an.

Indeed, using the notation of the previous proof, we know that limn sn = supn sn. In case (sn) is not bounded above, we know that sn −→ +∞ in R and we write ∑_{n=1}^∞ an = +∞. However, we still say the series diverges (to infinity).

Example. (The harmonic series) The series ∑_{n=1}^∞ 1/n diverges.

Proof. Suppose this series were to converge with sum s, and let sn be the nth partial sum. Then s − sn −→ 0. But, for all n,

s − sn ≥ s_{2n} − sn = ∑_{k=n+1}^{2n} 1/k ≥ ∑_{k=n+1}^{2n} 1/(2n) = n/(2n) = 1/2,

which does not converge to 0, a contradiction.
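The doubling estimate in the proof is easy to watch numerically (an illustration of mine): the block from index n + 1 to 2n always contributes at least 1/2, no matter how far out one goes.

```python
import math

# H(n) = n-th partial sum of the harmonic series.
def H(n):
    return sum(1.0 / k for k in range(1, n + 1))

# Each doubling block contributes at least 1/2 (it actually tends to log 2).
gaps = [H(2 * n) - H(n) for n in (10, 100, 1000)]
```

So the partial sums fail the Cauchy condition at ε = 1/2, which is exactly the contradiction used above.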
The Cauchy condition can be restated in terms of series as:

Theorem. ∑_n an is Cauchy (hence converges) iff for each ε > 0, there exists N such that for n ≥ m ≥ N,

|∑_{k=m}^n ak| < ε.

Proof. Let sn = ∑_{k=1}^n ak. Then, by definition, (sn) is Cauchy iff for each ε > 0, there exists N such that for n, m > N, |sn − sm| < ε. We notice that if n ≥ m, then

|sn − sm| = |∑_{k=1}^n ak − ∑_{k=1}^m ak| = |∑_{k=m+1}^n ak|.

Suppose (sn) is Cauchy. Let ε > 0. Choose N1 so that for m, n > N1, |sn − sm| < ε. Let N = N1 + 2 and n ≥ m ≥ N. Then n, m − 1 > N1, so

|sn − s_{m−1}| = |∑_{k=m}^n ak| < ε.

Thus, for each ε > 0, there exists N such that n ≥ m ≥ N implies |∑_{k=m}^n ak| < ε.

Conversely, suppose ε > 0 and N is chosen so that n ≥ m ≥ N implies |∑_{k=m}^n ak| < ε. Let n, m > N. In case n > m, we have n ≥ m + 1 ≥ N and |sn − sm| = |∑_{k=m+1}^n ak| < ε. The case n < m is proved similarly, by interchanging the roles of n and m; and in case n = m, |sn − sm| = 0 < ε. Thus, in all cases n, m > N implies |sn − sm| < ε, so the sequence (sn) is Cauchy.

Theorem. (nth-term test) If ∑_n an converges, then an −→ 0. Equivalently, if (an) does not converge to 0, then the series ∑_n an diverges.

Proof. If the series converges, it is Cauchy, so if ε > 0 is given, we can find an N such that for n ≥ m ≥ N, |∑_{k=m}^n ak| < ε. Taking n = m ≥ N gives |an| < ε. This proves an −→ 0.

We emphasize that the above result does not say that an −→ 0 implies the series converges. Indeed, the harmonic series ∑_n 1/n diverges, yet 1/n −→ 0. However, there is a case where this does hold.
Theorem. (Alternating series test) If the sequence (an) decreases to 0, then the series ∑_{n=1}^∞ (−1)^{n+1} an converges. Moreover, if

Rn = ∑_{k=1}^∞ (−1)^{k+1} ak − ∑_{k=1}^n (−1)^{k+1} ak = ∑_{k=n+1}^∞ (−1)^{k+1} ak,

then |Rn| ≤ an+1 and Rn has the same sign as the (n+1)st term, (−1)^{n+2} a_{n+1}. (Corresponding statements hold for ∑_{n=1}^∞ (−1)^n an.)

Thus, if the terms of a series have alternating signs and have absolute values which decrease with limit 0, then the series converges and the nth "remainder", that is, the error in using the nth partial sum to approximate the sum, is bounded by the size of the first term omitted (the (n+1)st term) and is of the same sign.

Another way you could state this result for both cases (without explicitly writing the (−1)^n or (−1)^{n+1}) is:

Suppose cn −→ 0, |cn| ≥ |cn+1| for all n, and cn cn+1 ≤ 0 for all n. Then ∑_n cn converges and

Rn = ∑_{n=1}^∞ cn − ∑_{k=1}^n ck = ∑_{k=n+1}^∞ ck

satisfies |Rn| ≤ |cn+1| and cn+1 Rn ≥ 0, for all n. The condition cn cn+1 ≤ 0 says that the terms alternate in sign; the condition cn+1 Rn ≥ 0 says that the remainder has the same sign as cn+1, the first term omitted.
Proof. Let m and n be natural numbers with m > n. Then

∑_{k=n}^m (−1)^{k+1} ak = (−1)^{n+1} [(an − an+1) + (an+2 − an+3) + (an+4 − an+5) + · · ·].

Since the sequence (an) decreases, the terms (an − an+1), (an+2 − an+3), . . . are all ≥ 0, and hence the sum (an − an+1) + (an+2 − an+3) + (an+4 − an+5) + · · · is also ≥ 0. (Whether this sum ends with a lone am or with the pair a_{m−1} − am depends on whether m − n is even or odd, but it is still ≥ 0.) Moreover,

|∑_{k=n}^m (−1)^{k+1} ak| = (an − an+1) + (an+2 − an+3) + (an+4 − an+5) + · · ·
= an − [(an+1 − an+2) + (an+3 − an+4) + · · ·] ≤ an,

since again the terms (an+1 − an+2), (an+3 − an+4), . . . are ≥ 0. Now, let ε > 0 and use the fact that an −→ 0 to find N such that n ≥ N implies an < ε. Then for m > n ≥ N, we have

|∑_{k=n}^m (−1)^{k+1} ak| < ε,
so the series is Cauchy hence converges.

We have still to check the sign of the remainder. We saw above that ∑_{k=n}^m (−1)^{k+1} ak is (−1)^{n+1} times a non-negative quantity, hence has the same sign as (−1)^{n+1} an, the nth term. When we let m tend to infinity we obtain

R_{n−1} = ∑_{k=n}^∞ (−1)^{k+1} ak,

which still has the sign of the nth term, as required. (Replace n by n + 1 to obtain the result in the form stated.)

Theorem. (Comparison test) For series ∑_n an and ∑_n bn of non-negative terms, if N0 ∈ N and an ≤ bn for all n ≥ N0, then
(a) if ∑_n bn converges then so does ∑_n an, and
(b) if ∑_n an diverges, then so does ∑_n bn.

Proof. (a) and (b) are contrapositives of each other, so we prove only the first. Changing a finite number of terms does not affect convergence (although it does affect the sum), so we may assume an ≤ bn for all n. Let ∑_n bn converge and let B = ∑_{n=1}^∞ bn. The partial sums Bn = ∑_{k=1}^n bk form an increasing sequence, bounded above by B. But then,
0 ≤ ∑_{k=1}^n ak ≤ ∑_{k=1}^n bk ≤ B.

This shows that the partial sums of ∑_n an are also bounded above by B and, moreover,

∑_{n=1}^∞ an ≤ B = ∑_{n=1}^∞ bn.
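The remainder bounds of the alternating series test above can be verified numerically (my illustration, not the text's) on the alternating harmonic series ∑ (−1)^{n+1}/n, whose sum is log 2; that known value is used only for checking.

```python
import math

# Alternating harmonic series: check |R_n| <= a_{n+1} = 1/(n+1) and that
# R_n has the sign of the first omitted term (-1)^{n+2}/(n+1).
total = math.log(2)

def partial(n):
    return sum((-1) ** (k + 1) / k for k in range(1, n + 1))

ok_bound = all(abs(total - partial(n)) <= 1 / (n + 1) for n in range(1, 200))
ok_sign = all((total - partial(n)) * ((-1) ** (n + 2)) >= 0 for n in range(1, 200))
```

The bound is what makes alternating series practical: the error of a partial sum is read off from the next term, with no further analysis.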
A series ∑_n an is said to converge absolutely if ∑_n |an| converges.

Theorem. If a series of real (or complex) numbers converges absolutely, then it converges.

Proof. Let ∑_n an converge absolutely. Then, by definition, ∑_n |an| converges. Thus, by the Cauchy criterion, if ε > 0 is fixed, we can choose N such that n ≥ m ≥ N implies ∑_{k=m}^n |ak| < ε. Hence,

|∑_{k=m}^n ak| ≤ ∑_{k=m}^n |ak| < ε.

Thus, the series ∑_n an is also Cauchy, so converges.

The reason we emphasize that the terms of the series are real or complex numbers is that the metric completeness of these spaces is responsible for the result. Remember that "metric completeness" refers to the fact that Cauchy sequences converge. (That absolute convergence implies convergence can actually be used to characterize completeness.)

Corollary. (Absolute comparison test) If |an| ≤ |bn| for all n, then ∑_n an converges absolutely if ∑_n bn does.
Notice that in this form, there is no test for divergence.

Theorem. (Ratio comparison test) Let ∑_n an and ∑_n bn be series of non-negative terms.
(a) If limn an/bn = L < ∞ and ∑_n bn converges, then ∑_n an converges also.
(b) If limn an/bn = L > 0 and ∑_n an converges, then so does ∑_n bn.

In part (b), L is allowed to be +∞. In case 0 < L < ∞, the result says ∑_n an and ∑_n bn converge or diverge together; that is, both converge or both diverge.

Proof. (a) Suppose an/bn −→ L < ∞ and fix K with L < K < ∞. Then there exists N such that for n ≥ N, an/bn < K. Thus, an ≤ Kbn, for all n ≥ N. Thus, if ∑_n bn converges, so does ∑_n Kbn, and hence so does ∑_n an, by the usual comparison test.

(b) Suppose limn an/bn = L > 0, and let 0 < c < L. Then we may choose N so that for n ≥ N, an/bn > c. Thus, cbn < an, for all n ≥ N, and so convergence of ∑_n an implies that of ∑_n cbn, and hence of ∑_n bn, since c is not 0.

You will notice that the limit could have been replaced by lim sup in (a) and by lim inf in (b). As the proof shows, what is really involved is that the ratios an/bn be bounded above in (a) and bounded below by some c > 0 in (b).
The doubling idea used to show that the harmonic series diverges can be refined to give a surprisingly useful test. For a series whose terms are non-negative and decrease, a rather "thin" subsequence determines convergence.

Theorem. (Cauchy's condensation test) Let an ≥ an+1 ≥ 0, for all n. Then

∑_{n=1}^∞ an converges ⇐⇒ ∑_{k=0}^∞ 2^k a_{2^k} converges.

Proof. If 2^m > n, we have

∑_{i=1}^n ai ≤ a1 + (a2 + a3) + (a4 + a5 + a6 + a7) + · · · + (a_{2^m} + · · · + a_{2^{m+1}−1})
≤ a1 + (a2 + a2) + (a4 + a4 + a4 + a4) + · · · + (a_{2^m} + · · · + a_{2^m})
= 1·a1 + 2a2 + 4a4 + · · · + 2^m a_{2^m}
≤ ∑_{k=0}^∞ 2^k a_{2^k},

so if ∑_{k=0}^∞ 2^k a_{2^k} converges, so does ∑_{n=1}^∞ an, since its partial sums are bounded above. Similarly,

∑_{i=1}^{2^m} ai ≥ a1 + a2 + (a3 + a4) + (a5 + a6 + a7 + a8) + (a9 + · · · + a16) + · · · + (a_{2^{m−1}+1} + · · · + a_{2^m})
≥ a1 + a2 + (a4 + a4) + (a8 + a8 + a8 + a8) + (a16 + · · · + a16) + · · · + (a_{2^m} + · · · + a_{2^m})
≥ (1/2)a1 + a2 + 2a4 + 4a8 + · · · + 2^{m−1} a_{2^m}
= (1/2)(a1 + 2a2 + 4a4 + · · · + 2^m a_{2^m})
= (1/2) ∑_{k=0}^m 2^k a_{2^k},

so that if ∑_{n=1}^∞ an converges, so does ∑_{k=0}^∞ 2^k a_{2^k}.
As an application:

Theorem. (p-series) For a real number p, ∑_n 1/n^p converges iff p > 1.

Proof. By the Cauchy condensation test, ∑_n 1/n^p converges iff

∑_{k=0}^∞ 2^k · 1/(2^k)^p = ∑_{k=0}^∞ (1/2^{p−1})^k

does. But this is a geometric series; it converges iff 1/2^{p−1} < 1, that is, iff p > 1.
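The p-series dichotomy shows up clearly in computation (illustrative, not part of the text): for p = 2 the partial sums settle down (toward π²/6, used here only as a check), while for p = 1 each decade adds roughly log 10.

```python
import math

def partial_p(p, n):
    return sum(1.0 / k ** p for k in range(1, n + 1))

# Growth between n = 10^4 and n = 10^5:
p2_gap = partial_p(2, 10 ** 5) - partial_p(2, 10 ** 4)  # tiny: convergent
p1_gap = partial_p(1, 10 ** 5) - partial_p(1, 10 ** 4)  # about log 10: divergent
p2_value_gap = abs(partial_p(2, 10 ** 5) - math.pi ** 2 / 6)
```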
Theorem. (Root test) The series ∑_n an
(a) converges absolutely if lim supn |an|^{1/n} < 1;
(b) diverges if lim supn |an|^{1/n} > 1.
If lim supn |an|^{1/n} = 1, the series could converge or diverge.

Proof. (a) Let α = lim supn |an|^{1/n} < 1. Choose r with α < r < 1. Then there exists N such that for n > N, |an|^{1/n} < r. Thus, |an| ≤ r^n for n > N. Hence ∑_n an converges absolutely by comparison with the geometric series ∑_n r^n, 0 ≤ r < 1.

(b) If lim supn |an|^{1/n} > 1, then for infinitely many n, |an| > 1, hence (an) cannot tend to 0. Hence, the series ∑_n an diverges.

Notice that limn (1/n)^{1/n} = 1 and ∑_n 1/n diverges, while limn (1/n²)^{1/n} is also 1 and ∑_n 1/n² converges.

Theorem. (Ratio test) The series ∑_n an
(a) converges absolutely if lim supn |an+1|/|an| < 1, and
(b) diverges if lim inf n |an+1|/|an| > 1 (more generally, if there exists N such that for n ≥ N, |an+1|/|an| ≥ 1).

Proof. (a) We may assume an ≥ 0, for all n. Let α = lim supn an+1/an < 1 and let α < r < 1. Then there exists N such that for n ≥ N, an+1/an < r. Thus, for all n ≥ N, an+1 < an r, so that a_{N+1} < a_N r, a_{N+2} < a_{N+1} r < a_N r², and by induction a_{N+k} < a_N r^k. Thus, for all n ≥ N, an ≤ a_N r^{n−N} = (a_N r^{−N}) r^n. Writing K for the constant a_N r^{−N}, we have an ≤ K r^n, for n ≥ N. Since 0 < r < 1, the series ∑_n an converges, by comparison with a convergent geometric series.

(b) If lim inf n an+1/an > 1, then there exists an N such that for all n ≥ N, an+1/an > 1. Now if we have even an+1/an ≥ 1 for all n ≥ N, we see that an ≤ an+1 for all n ≥ N, so (an) cannot converge to 0, hence the series ∑_n an cannot converge.

Once again, ∑_n 1/n diverges while ∑_n 1/n² converges, and in both cases lim an+1/an = 1.
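A small computation illustrating the ratio test (mine, not the text's): for an = n/2^n the ratios tend to 1/2 < 1, so the series converges; its sum happens to be 2, used below only as a check.

```python
# Ratio test on a_n = n / 2^n: ratios a_{n+1}/a_n = (n+1)/(2n) -> 1/2.
def a(n):
    return n / 2.0 ** n

ratios = [a(n + 1) / a(n) for n in range(1, 60)]
s = sum(a(n) for n in range(1, 200))   # partial sum; tail is negligible
```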
Notice that both the ratio test and the root test deduce divergence only from the lack of convergence to 0 of the terms. The relationship between the ratio and the root tests is brought out by the following result. It shows that whenever the ratio test shows convergence, the root test will also; and if the lim inf version of the ratio test shows divergence, so will the root test. There are series, however, for which the root test indicates convergence but the ratio test does not apply.

Theorem. If an > 0 for all n ∈ N, then

lim inf n an+1/an ≤ lim inf n an^{1/n} ≤ lim supn an^{1/n} ≤ lim supn an+1/an.
Proof. Recall that, for any sequence (xn),

lim supn xn = limn sup_{k≥n} xk and lim inf n xn = limn inf_{k≥n} xk,

and that

(∗) if β > lim supn xn, then there exists N such that for n ≥ N, xn < β

("all but finitely many terms are < β"). Now, for each n, inf_{k≥n} xk ≤ sup_{k≥n} xk, so in the limit lim inf n xn ≤ lim supn xn. This is a general result, which applies here to give

lim inf n an^{1/n} ≤ lim supn an^{1/n}.

The interesting part is the comparison with the ratios. Let α = lim supn an+1/an and let β > α. Then, by (∗), there exists N such that for n ≥ N, an+1/an < β. Thus, for all n ≥ N, an+1 < an β, so that a_{N+1} < a_N β, a_{N+2} < a_{N+1} β < a_N β², and by induction a_{N+k} < a_N β^k. Thus, for all n ≥ N, an ≤ a_N β^{n−N} = (a_N β^{−N}) β^n. Writing K for the constant a_N β^{−N}, we have an ≤ K β^n, for n ≥ N. Thus,

an^{1/n} ≤ K^{1/n} β, for n ≥ N.

Taking the lim sup of both sides (recalling that the limit superior does not change if we change a finite number of terms) we have

lim supn an^{1/n} ≤ lim supn K^{1/n} β.

But if a limit exists, it is also the limit superior, so lim supn K^{1/n} = limn K^{1/n} = 1, and hence lim supn an^{1/n} ≤ β. But β was an arbitrary number > α. Hence,

lim supn an^{1/n} ≤ α = lim supn an+1/an.

The inequality involving limit inferior is proved the same way.
Limits of functions For a real-valued function defined on a set X ⊂ R (or Rn ) and a point c, we say f (x) converges to L as x tends to c, and write f (x) −→ L as x −→ c iff for each ε > 0, there exists δ > 0 such that for x ∈ X, 0 < |x − c| < δ
=⇒
|f (x) − L| < ε.
In other words, for all ε > 0 there exists δ > 0 such that f [B(c, δ) \ {c}] ⊂ B(L, ε). One also writes limx−→c f (x) = L. (This notation has a slight flaw, as we shall see shortly.) Example. Let f (x) = x2 + 2x + 6, for x ∈ R. Then, limx−→3 f (x) = 21. Proof. Let ε > 0. Then, |f (x) − 21| < ε
⇔
|x2 + 2x + 6 − 21| < ε
⇔
|x2 + 2x − 15| < ε
⇔
|x − 3||x + 5| < ε.
Now, if |x − 3| < 1, we will have |x + 5| ≤ |x − 3| + |3 + 5| ≤ 1 + 8 = 9, so that |f (x) − 21| ≤ |x − 3|9. Thus, |f (x) − 21| will be < ε provided |x − 3|9 < ε, and |x − 3| < 1. Thus, we take δ = min{1, ε/9}. Then, |x − 3| < δ implies |f (x) − 21| = |x − 3||x + 5| ≤ |x − 3|9 < as required to prove limx−→3 f (x) = 21.
13/1/2003 1054 mam 111
ε · 9 = ε, 9
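The choice δ = min{1, ε/9} can be spot-checked numerically (an illustration of mine, not a substitute for the proof):

```python
import random

# Sample points in the deleted delta-neighbourhood of 3 and confirm
# |f(x) - 21| < eps for delta = min(1, eps/9).
def f(x):
    return x * x + 2 * x + 6

random.seed(0)
ok = True
for eps in (1.0, 0.1, 0.01):
    delta = min(1.0, eps / 9.0)
    for _ in range(1000):
        x = 3 + random.uniform(-delta, delta)
        if 0 < abs(x - 3) < delta and not abs(f(x) - 21) < eps:
            ok = False
```

Random sampling cannot prove the implication, of course; it only fails to find a counterexample, which is what one expects after the proof above.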
As you can see, the techniques we have developed for limits of sequences seem to apply for limits of functions. Actually, there is a close connection, as we will see, known as the sequential criterion for convergence.

The set B′(c, δ) = B(c, δ) \ {c} is often called the deleted neighbourhood of c of radius δ, or the deleted open ball about c of radius δ. The definition of convergence of the function f to L as x −→ c becomes: for each neighbourhood V of L there exists a deleted neighbourhood U of c with f(U) ⊂ V.

Notice that, by definition, a point a is an accumulation point of a set A iff for each δ > 0, B(a, δ) ∩ (A \ {a}) ≠ ∅, and this can be rewritten as B′(a, δ) ∩ A ≠ ∅. (A point of A which is not an accumulation point is called an isolated point of A.) So points of A come in 2 kinds: accumulation and isolated. But be careful: accumulation points need not be points of A; A ∪ acc(A) = cl A.
Uniqueness. If c is an isolated point of the domain of the function f, the limit of f at c has little meaning, as we will describe shortly; but if c is an accumulation point of the domain, the limit is unique. To see this, we recall a concept mentioned briefly in the discussion of convergence of sequences:

Theorem. (The Hausdorff Property.) In any metric space, if x ≠ y, there exist neighbourhoods Ux of x and Uy of y with Ux ∩ Uy = ∅.

The proof of this result is a triviality, but the result is enormously important, so let's go through it again.

Proof. Since x ≠ y, d(x, y) > 0. Take δ to be any positive number ≤ d(x, y)/2, Ux = B(x, δ) and Uy = B(y, δ). Then Ux and Uy are neighbourhoods of x and y, respectively, and Ux ∩ Uy = ∅. Indeed, if z ∈ Ux ∩ Uy, then d(x, y) ≤ d(x, z) + d(z, y) < δ + δ ≤ d(x, y), an impossibility.

Theorem. (Uniqueness of limits) Suppose c is an accumulation point of X and f : X −→ R. If f(x) −→ L1 as x −→ c and f(x) −→ L2 as x −→ c, then L1 = L2.

Proof. Suppose L1 ≠ L2. By the Hausdorff property, there exist neighbourhoods V1 of L1 and V2 of L2 such that V1 ∩ V2 = ∅. By definition, there exist deleted neighbourhoods U1 of c and U2 of c with f(U1) ⊂ V1 and f(U2) ⊂ V2. Now U1 ∩ U2 is still a deleted neighbourhood of c. (Take out your δ's and check.) Thus, U1 ∩ U2 ∩ X ≠ ∅. But if x ∈ U1 ∩ U2 ∩ X, then f(x) ∈ V1 ∩ V2, which is impossible.
The theorem we have just proved is the justification of the notation limx−→c f(x) = L. However, limits of functions can be non-unique in a certain situation.
Proposition. If c is not an accumulation point of X, the domain of the real-valued function f, then f(x) −→ y as x −→ c, for every y ∈ R.

Proof. That c is not an accumulation point of X means there exists a deleted neighbourhood U of c which does not intersect X. Thus, if V is a neighbourhood of y, f(U) = f(X ∩ U) = f(∅) = ∅ ⊂ V.

This is so disturbing to some people that they refuse to talk about limits at an isolated point of X. They just define this behaviour away. Lay's book is one of these. If we use a formula for something, it should have a unique definition. In case c is an accumulation point of the domain of f, the limit is unique, so this is fine. If c is not an accumulation point, it is not. Nevertheless, it is somewhat common to use this notation even in the latter case.

One more thing: you will notice that it is not necessary for c to be a member of the domain of the function to define a limit at c.

Theorem. (The sequential criterion for convergence) Let c ∈ R, X ⊂ R and f : X −→ R. Then

f(x) −→ L as x −→ c ⇔ f(xn) −→ L for each sequence (xn) in X \ {c} converging to c.
Proof. (=⇒) Suppose f(x) −→ L as x −→ c. Let (xn) be a sequence in X \ {c} with xn −→ c. We are to show f(xn) −→ L. Let ε > 0. Then there exists δ > 0 such that f(x) ∈ B(L, ε) whenever x ∈ X and x ∈ B(c, δ) \ {c}. But xn −→ c and xn ≠ c for all n, so there exists N such that n > N implies xn ∈ B(c, δ) \ {c}. For this N, and for n > N, we therefore have f(xn) ∈ B(L, ε), as required.

(⇐=) We prove this by establishing the contrapositive. Assume f(x) does not converge to L as x −→ c. That is, there exists ε > 0 such that for all δ > 0, there exists x ∈ B(c, δ) ∩ X, with x ≠ c and f(x) ∉ B(L, ε). Fix such an ε > 0. (Today it is delta's turn to take on a lot of identities.) For each n ∈ N, take δ = 1/n in the above statement and choose xn ∈ B(c, 1/n) ∩ X, with xn ≠ c, but f(xn) ∉ B(L, ε). Well, this means that (xn) is a sequence in X \ {c} for which xn −→ c, yet there is no n for which f(xn) belongs to B(L, ε); the sequence (f(xn)) certainly doesn't converge to L. For those who prefer distances, maybe written with absolute values: we constructed (xn) with xn ∈ X and 0 < |xn − c| < 1/n, but |f(xn) − L| ≥ ε, for all n ∈ N.
Example. Let f(x) = sin(1/x), for x ≠ 0, so the domain of f is X = R \ {0}. Then f(x) does not converge to anything as x −→ 0.

Proof. Let xn = 1/(nπ/2), for each n. Then f(x1) = 1, f(x2) = 0, f(x3) = −1, f(x4) = 0, . . . . We see that the sequence (f(xn)) does not converge. But xn ≠ 0 and xn −→ 0, so the sequential criterion isn't satisfied. Thus, f(x) does not converge as x −→ 0.
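The test sequence in the proof can be computed directly (an illustration of mine): along xn = 2/(nπ) the values of f cycle through 1, 0, −1, 0 while xn −→ 0.

```python
import math

# f(x) = sin(1/x) along x_n = 2/(n*pi), so f(x_n) = sin(n*pi/2).
def f(x):
    return math.sin(1.0 / x)

xs = [2.0 / (n * math.pi) for n in range(1, 9)]
vals = [round(f(x)) for x in xs]   # rounding absorbs float noise near 0
```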
Let f, g be functions on a set X to R. Then the functions f + g, f g, f /g are "defined pointwise" as follows.
(1) f + g is defined on X by (f + g)(x) = f(x) + g(x).
(2) f g is defined on X by (f g)(x) = f(x)g(x).
(3) f /g is defined on {x ∈ X : g(x) ≠ 0} by (f /g)(x) = f(x)/g(x).

Theorem. Let f : X −→ R, g : X −→ R, c ∈ R. If f(x) −→ L and g(x) −→ M as x −→ c, then
(1) (f + g)(x) −→ L + M as x −→ c (sum law),
(2) (f g)(x) −→ LM as x −→ c (product law),
(3) (f /g)(x) −→ L/M as x −→ c (quotient law), provided M ≠ 0.

We will prove (1) using the definition of limit, and (2) using the sequential criterion (SC). (3) is left as an exercise. I think you will find it easiest using the SC.

Proof. (1) Let ε > 0. Since f(x) −→ L as x −→ c, there exists δ1 > 0 such that x ∈ X and 0 < |x − c| < δ1 imply |f(x) − L| < ε/2, and similarly there exists δ2 > 0 such that x ∈ X and 0 < |x − c| < δ2 imply |g(x) − M| < ε/2. Put δ = min{δ1, δ2}; then x ∈ X and 0 < |x − c| < δ imply

|(f + g)(x) − (L + M)| ≤ |f(x) − L| + |g(x) − M| < ε/2 + ε/2 = ε.

Since ε > 0 was arbitrary,

∀ε > 0, ∃δ > 0, ∀x ∈ X, 0 < |x − c| < δ =⇒ |(f + g)(x) − (L + M)| < ε.

That is, (f + g)(x) −→ L + M as x −→ c.

(2) Let (xn) be an arbitrary sequence in X \ {c} converging to c. Then f(xn) −→ L and g(xn) −→ M, by the SC, and (f g)(xn) = f(xn)g(xn) by definition. Therefore, by the product law for sequences, (f g)(xn) −→ LM.
Since (xn) was arbitrary, f g satisfies the SC for convergence: (f g)(x) −→ LM as x −→ c.

Remark. The sequential criterion has a stronger form, without reference to a particular limit: if for each (xn) converging to c, (f(xn)) converges, then f(x) converges as x −→ c. (The point here is that at first sight, the limit of f(xn) could be different for different sequences (xn), yet it is part of the conclusion that there is only one such limit, and then f(x) converges to it as x −→ c.)
Left and right limits. For a function f defined on a subset X of the reals, and c ∈ R, "f(x) tends to L as x −→ c from the left" means the restriction of f to X ∩ (−∞, c] converges to L. Then L is called the left limit or left-hand limit of f at c, denoted

f(c−) = lim_{x−→c−} f(x) or lim_{x−→c, x<c} f(x).

Be very careful using this notation. There is no point c−; f(c−) makes sense, but c− does not. (Also, if c is not an accumulation point of X ∩ (−∞, c], one shouldn't use the limit notation, since the limit is not unique.)

Similarly, "f(x) converges to L as x −→ c from the right" means the restriction of f to X ∩ [c, ∞) does. The corresponding notation is

f(c+) = lim_{x−→c+} f(x) or lim_{x−→c, x>c} f(x).

A real-valued function f is called increasing on X if x1 < x2 in X implies f(x1) ≤ f(x2), and decreasing on X if x1 < x2 in X implies f(x1) ≥ f(x2). It is called strictly increasing on X if x1 < x2 in X implies f(x1) < f(x2), and strictly decreasing on X if x1 < x2 in X implies f(x1) > f(x2). A function is called monotone if it is either increasing or decreasing, and strictly monotone if it is either strictly increasing or strictly decreasing.

You should prove as an exercise:

Theorem. If f is a monotone function defined on an interval I of the reals, then for each c ∈ I, the limits from the right and from the left exist. In fact, if f is increasing, f(c−) = sup_{x<c} f(x) if c is not the left endpoint of I, and f(c+) = inf_{x>c} f(x) if c is not the right endpoint of I. If f is decreasing, we have the same result with supremum and infimum interchanged.
Continuity of functions

Let X be a subset of R and f : X −→ R. Then f is called continuous at a ∈ X iff for every ε > 0, there exists δ > 0 such that d(f(x), f(a)) < ε whenever x ∈ X and d(x, a) < δ. In terms of neighbourhoods, f : X −→ R is continuous at a ∈ X if for each neighbourhood V of f(a), there exists a neighbourhood U of a with f(U) ⊂ V. Here we remember that a δ-neighbourhood in X is of the form BX(a, δ) = X ∩ BR(a, δ).

If A ⊂ X, f is called continuous on A if it is continuous at each point of A; f is called simply continuous if it is continuous at each point of its domain. We say f is discontinuous at the point a if a ∈ X and f is not continuous at a. If a is not a point of the domain of f, f is neither continuous nor discontinuous at a; it is not "at a" at all.
If a is not an accumulation point of the domain X of f (that is, if a is an isolated point of X), then we see that f is automatically continuous at a. Indeed, if a is isolated, then there exists δ > 0 such that BX(a, δ) = BR(a, δ) ∩ X = {a}; thus for x ∈ BX(a, δ), |f(x) − f(a)| = |f(a) − f(a)| = 0 < ε, no matter what the given ε > 0 was. Looked at another way, if V is a neighbourhood of f(a), then f(BX(a, δ)) = {f(a)} ⊂ V.

Proposition. Let f : X −→ R and a ∈ R. Then f is continuous at a iff limx−→a f(x) = f(a).

This is an immediate consequence of the definitions. We emphasize that the condition means 3 things: (1) the limit limx−→a f(x) exists, (2) f(a) exists (that is, a ∈ X), and (3) the two sides are equal.

Theorem. (Sequential criterion for continuity) Let f : X −→ R and let a ∈ X. Then f is continuous at a iff for each sequence (xn) in X \ {a} converging to a, f(xn) −→ f(a); or alternatively, iff for each sequence (xn) in X converging to a, f(xn) −→ f(a).

Proof. Since f is continuous at a iff limx−→a f(x) = f(a), the first version is an immediate consequence of the sequential criterion for limits; namely, limx−→a f(x) = L iff for each (xn) in X \ {a} converging to a, f(xn) −→ L. The proof of the second form is essentially the same; the difference is just that we don't have to pay special attention to a. Here is the detail:

Suppose f is continuous at a. Let xn −→ a, with xn ∈ X for all n ∈ N. Let V be a neighbourhood of f(a). Then there is a neighbourhood U in
X of a such that f(U) ⊂ V. But since xn −→ a, there exists N ∈ N such that for n > N, xn ∈ U and hence f(xn) ∈ V. Thus, for every neighbourhood V of f(a), there exists N with f(xn) ∈ V for n > N. That is, f(xn) −→ f(a).
For the converse, we assume f is not continuous at a. Then there exists a neighbourhood V of f(a) such that no neighbourhood U of a satisfies f(U) ⊂ V. For each n ∈ N, then, there exists xn ∈ BX(a, 1/n) such that f(xn) ∉ V. Since 1/n −→ 0, we have xn −→ a. Thus, (xn) is a sequence in X converging to a, yet f(xn) does not converge to f(a).
Example. Let f : R −→ R be the indicator function of the rationals 1Q:
f(x) = 1Q(x) = 1 if x ∈ Q, and 0 if x ∉ Q.
Then f is discontinuous at each point of R. The easiest way to prove this is by the sequential criterion. If a ∈ Q, we use the fact that Qc is dense in R to obtain a sequence (xn) of irrationals converging to a; but then f(xn) = 0 for all n, so f(xn) −→ 0, while f(a) = 1, so (f(xn)) does not converge to f(a). So f is not continuous at a. On the other hand, if a ∉ Q, then (since Q is dense in R) there is a sequence (xn) in Q with xn −→ a, but f(xn) = 1, which doesn't converge to 0 = f(a), and again, by the sequential criterion, f is not continuous at a.
Dirichlet's function. We now give an example of a function on [0, 1] which is continuous exactly on the irrationals of this interval. We put, for x ∈ [0, 1],
f(x) = 1/n if x = m/n, rational in lowest terms, and f(x) = 0 if x is irrational.
(Lowest terms are necessary to make the function "well-defined", that is, to have a unique value for each x.)
Suppose that a is rational of the form m/n, in lowest terms. Let (xk) be a sequence of irrationals in [0,1] with xk −→ a. Then f(xk) = 0 for all k, yet f(a) = 1/n ≠ 0. Hence limk f(xk) ≠ f(a), so f is not continuous at a.
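As a rough numerical companion (not part of the text; the helper name `thomae` and the sample points are my own choices), one can watch the values 1/n of this function shrink as rationals approximate an irrational, here sqrt(2)/2 = 0.70710678..., more and more closely: a good rational approximation forces a large denominator.

```python
from fractions import Fraction

def thomae(x):
    # Value of the function above at a rational x: 1/n where x = m/n
    # in lowest terms.  (Fraction objects are always stored in lowest terms.)
    return Fraction(1, x.denominator)

# Successively better rational approximations of sqrt(2)/2; the required
# denominators grow, so the function values shrink toward 0.
approximations = [Fraction(7, 10), Fraction(707, 1000), Fraction(70711, 100000)]
values = [thomae(x) for x in approximations]
print(values)  # [Fraction(1, 10), Fraction(1, 1000), Fraction(1, 100000)]
```

This is the mechanism behind continuity at an irrational point: near such a point, only rationals with large denominators occur, and at those the function is small.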
Now suppose a is irrational. To show f is continuous at a we go back to the definition. Fix ε > 0. By the Archimedean property, there exists k such that 1/k < ε. Now, let F be the set of rationals in [0,1] with denominators < k. This set is finite, so there exists a δ > 0 such that B(a, δ) ∩ F = ∅. Thus, for each rational m/n in B(a, δ) ∩ [0, 1], we have n ≥ k and hence
|f(m/n) − f(a)| = |1/n − 0| = 1/n ≤ 1/k < ε.
And for each irrational x in B(a, δ) ∩ [0, 1], f(x) = 0, so again |f(x) − f(a)| < ε. Thus, for all x ∈ [0, 1], |x − a| < δ implies |f(x) − f(a)| < ε, so that f is continuous at a.
The sum, product and quotient of continuous functions are continuous:
Theorem. Let f, g be functions on X to R. If f, g are continuous at a, then
(a) the functions f + g and f g are continuous at a;
(b) f/g is continuous at a, provided g(a) ≠ 0.
We recall that f/g is defined on the set of those x ∈ X for which g(x) ≠ 0.
Proof. This follows from the corresponding theorem about limits. For example,
limx−→a (f + g)(x) = limx−→a f(x) + limx−→a g(x) = f(a) + g(a) = (f + g)(a).
Examples. (1) Each constant function defined by f(x) = c for all x ∈ R is continuous on R. Indeed, for each ε > 0, we can take any δ > 0, and get |f(x) − f(a)| = 0 < ε for all x, a, in particular when |x − a| < δ.
(2) The identity function i(x) = x is continuous. For a given ε > 0, the choice δ = ε satisfies the definition:
|x − a| < δ =⇒ |i(x) − i(a)| = |x − a| < ε.
(3) Let p be a polynomial function. Then there exist n ∈ N and constants a0, a1, . . . , an such that p(x) = a0 + a1x + a2x² + · · · + anxⁿ. Thus, using (1) and (2) and induction, with the fact that the sum and product of continuous functions is continuous, we see that p is continuous on R.
(4) A rational function is by definition the quotient of two polynomial functions, and is thus continuous on its domain.
The composition of two continuous functions is continuous.
Theorem. Let f : X −→ R and g : Y −→ R with f(X) ⊂ Y. If f is continuous at a ∈ X and g is continuous at f(a), then g ◦ f is continuous at a.
Proof. Let W be a neighbourhood of g ◦ f(a). Since g is continuous at f(a), there is a neighbourhood V of f(a) such that g(V) ⊂ W. But f is continuous at a, so there exists a neighbourhood U of a such that f(U) ⊂ V. Hence, g ◦ f(U) = g(f(U)) ⊂ g(V) ⊂ W, as required.
Theorem. Let f : X −→ R. Then f is continuous iff for each G open in R, f⁻¹(G) is open in X.
Proof. Suppose f is continuous and let G be open in R. Let a ∈ f⁻¹(G). Then f(a) ∈ G, so there is an open ball V centered at f(a) contained in G. But then, since f is continuous at a, there is a neighbourhood U of a such that f(U) ⊂ V. Thus, f(U) ⊂ G, and hence U ⊂ f⁻¹(G). Thus each point of f⁻¹(G) is an interior point, so f⁻¹(G) is open.
1. Discuss how the composition of continuous functions theorem could lead to a formal substitution: if f(x) −→ b, as x −→ a, then
limx−→a g(f(x)) = limy−→b g(y).
Is continuity necessary? What other conditions would do?
Continuity and compactness

Theorem. The continuous image of a compact set is compact. That is, suppose f : K −→ R is continuous. If K is compact then f(K) is compact.
Proof. Let U be an open cover of f(K). For each U ∈ U, f⁻¹(U) is open in K. Moreover, the family {f⁻¹(U) : U ∈ U} covers K. Indeed,
⋃ {U : U ∈ U} ⊃ f(K), so ⋃ {f⁻¹(U) : U ∈ U} = f⁻¹(⋃ {U : U ∈ U}) ⊃ K.
Since K is compact, there is a finite subfamily {f⁻¹(U1), . . . , f⁻¹(Un)} which also covers K:
f⁻¹(U1) ∪ · · · ∪ f⁻¹(Un) ⊃ K.
That is, f⁻¹(U1 ∪ · · · ∪ Un) ⊃ K, and so U1 ∪ · · · ∪ Un ⊃ f(K).
Corollary. (Extreme Value Theorem) Each continuous real-valued function on a non-empty compact set assumes a maximum and a minimum value.
Proof. Let K be compact and non-empty and let f be continuous on K to R. Then f(K) is compact and non-empty, hence it has a minimum and a maximum. Note that if y1 = min f(K) and y2 = max f(K), then y1 ∈ f(K), so there exists x1 ∈ K with f(x1) = y1, and similarly there exists x2 ∈ K with f(x2) = y2. Finally, if x ∈ K we obtain f(x) ∈ f(K), so f(x1) ≤ f(x) ≤ f(x2).
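As a loose numerical illustration of the Extreme Value Theorem (the sample function is my own choice, and a grid search only approximates the extreme points the theorem guarantees):

```python
import math

def f(x):
    # A sample continuous function on the compact set [0, 3].
    return x * math.exp(-x)

# A fine grid locates approximate extreme points; the theorem guarantees
# the exact max and min are attained somewhere on [0, 3].
xs = [3.0 * i / 10000 for i in range(10001)]
ys = [f(x) for x in xs]
i_max = ys.index(max(ys))
print(max(ys), xs[i_max])              # about 1/e = 0.3679, attained near x = 1
print(min(ys), xs[ys.index(min(ys))])  # 0.0, attained at x = 0
```

On a non-compact set such as (0, 3] with f(x) = 1/x no maximum would exist, which is why compactness matters here.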
The Intermediate Value Theorem

Here is the result on which much of the application of Calculus is based. Our hard work will pay off handsomely.
Theorem. Let I be an interval, f : I −→ R be continuous and let a, b ∈ I, with a < b. If y is a point strictly between f(a) and f(b), then there exists c ∈ (a, b) such that f(c) = y.
Proof. We may assume I = [a, b], since the restriction of a continuous function is continuous. We may assume f(a) < y < f(b). (For, if f(a) > y > f(b), we may replace f by −f and y by −y.) Let
A = f⁻¹((−∞, y)) and C = f⁻¹((−∞, y]).
Then A is open in [a, b] and C is closed in [a, b], since (−∞, y) is open and (−∞, y] is closed and f is continuous. Now
A = {x : f(x) < y} and C = {x : f(x) ≤ y}.
Since f(a) < y, A is not empty. Since A ⊂ [a, b], A is bounded above. Thus we may set c = sup A. Then c ∈ cl A; since A ⊂ C and C is closed, c ∈ C. That is, f(c) ≤ y.
Now, suppose f(c) ≠ y. Then f(c) < y, so c ∈ A, which is open in [a, b]. Thus, there exists δ > 0 such that [a, b] ∩ B(c, δ) ⊂ A. We may assume δ ≤ b − c, so that [c, c + δ) ⊂ A. If we let d be any point of (c, c + δ) we have c < d ∈ A, contradicting the fact that c is an upper bound of A and establishing that f(c) = y.
Remark. Of course, if in the hypothesis of the above theorem we just have y between f(a) and f(b), instead of strictly between, then there still is c ∈ [a, b] with f(c) = y. Indeed, if y is between them but not strictly between them, then either f(a) = y or f(b) = y.
Example. Every 5th degree polynomial has at least one real root. We use the limit behaviour of such polynomials at the infinities. If P(x) = a5x⁵ + a4x⁴ + a3x³ + a2x² + a1x + a0, there are 2 cases, a5 > 0 and a5 < 0. Without loss of generality we may assume a5 > 0, since P(x) = 0 if and only if −P(x) = 0. Then,
P(x) = a5x⁵ (1 + a4/(a5x) + a3/(a5x²) + a2/(a5x³) + a1/(a5x⁴) + a0/(a5x⁵)),
which converges to +∞ as x → ∞ and converges to −∞ as x → −∞. In particular, there exists s with P(s) > 0 and t with P(t) < 0. But then, since P is continuous, by the Intermediate Value Theorem, there exists c between s and t with P(c) = 0, as required. Of course, a slight modification will show that every odd degree polynomial has a real root.
Here is a first application of the Intermediate Value Theorem.
Corollary. Let f : [a, b] −→ [a, b] be continuous. Then f has a fixed point; that is, there exists x ∈ [a, b] with f(x) = x.
Proof. Let g(x) = f(x) − x for all x ∈ [a, b]. We have to show there exists c with g(c) = 0. But
f(a) ∈ [a, b] =⇒ g(a) = f(a) − a ≥ 0 and f(b) ∈ [a, b] =⇒ g(b) = f(b) − b ≤ 0.
Thus, by the Intermediate Value Theorem, there exists c ∈ [a, b] with g(c) = 0, that is, f(c) = c. (See the above remark.)
Every continuous image of a compact interval is a compact interval.
Corollary. If I is a compact interval of R and f is a continuous real-valued function whose domain contains I, then f(I) is a compact interval.
Reminder. A compact interval is the same as a closed interval [a, b]. This language is a way of emphasizing its properties. As you know, one includes sets of the form [a, ∞), (a, ∞), (−∞, b), (−∞, b], (−∞, ∞) as intervals. Thus, [a, ∞) is a closed set which is an interval, but is not compact. You have to look at the context to understand what someone means when he or she talks about "closed interval". But a compact interval must be closed and bounded, so there is no ambiguity: it must be of the form [a, b], where a ≤ b are real numbers.
Proof. If I is a compact interval, then f(I) is compact, and as we have observed has a minimum m1 at some point x1 and a maximum m2 at some point x2. Thus f(I) ⊂ [m1, m2]. If y ∈ [m1, m2], then the Intermediate Value Theorem yields a point x between x1 and x2 with f(x) = y. Since I is an interval, x ∈ I, so [m1, m2] ⊂ f(I). Thus, f(I) = [m1, m2], a compact interval.
Actually, if we drop compactness, we still find that a continuous image of an interval is an interval.
Corollary. If I is an interval of R and f is a continuous real-valued function whose domain contains I, then f(I) is an interval.
Because of the following result, this is almost a restatement of the Intermediate Value Theorem.
Lemma.
A set J in R is an interval if and only if whenever y1 and y2 belong to J and y1 < y < y2, then y also belongs to J.
Proof. The proof of the lemma is an exercise, which you have probably already done.
A function which satisfies the conclusion of the Intermediate Value Theorem is said to have the Intermediate Value Property or Darboux Property. A function can have this property without being continuous.
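The fixed-point corollary above can be sketched numerically by bisecting on g(x) = f(x) − x, exactly as in its proof. A minimal illustration, with f(x) = cos x as a sample continuous self-map of [0, 1] (my own choice of example, not from the text):

```python
import math

def f(x):
    # A sample continuous self-map of [0, 1]: cos maps [0, 1] into [cos 1, 1].
    return math.cos(x)

# g(x) = f(x) - x has g(0) >= 0 and g(1) <= 0, so bisect as in the proof.
a, b = 0.0, 1.0
for _ in range(60):
    c = (a + b) / 2
    if f(c) - c > 0:
        a = c
    else:
        b = c
print(c)  # about 0.739085, a fixed point of cos in [0, 1]
```

The theorem only asserts existence; bisection happens to locate one such point because it tracks the sign change of g.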
Example: A discontinuous function with the Intermediate Value Property. The function defined on R by f(x) = sin(1/x) for x ≠ 0 and f(0) = 0 is not continuous at 0, but attains all values between any two of its values. Indeed, if [a, b] with a < b contains 0, f takes on all values between −1 and 1, and if it does not contain 0 then f takes all values between f(a) and f(b) by the IVT, because f is continuous there.
Notice that the graph of the function in the above example wiggles a lot. If it doesn't wiggle too much, a function with the IVP must be continuous. The simplest case is that of a monotone function.
Theorem. If f is increasing or decreasing on an interval I and if f(I) is an interval, then f is continuous on I.
Proof. Suppose I is an interval, f(I) = J, an interval, and f : I −→ J is increasing. Let x ∈ I, y = f(x) ∈ J, and ε > 0. We have to show that there is a neighbourhood U of x such that f(U) ⊂ B(y, ε). We will do here the case where y is an interior point of J and leave the modification necessary for the case of an endpoint to the reader.
Since y is an interior point of J, there exists ε0 < ε such that B(y, ε0) ⊂ J. In particular, y − ε0 and y + ε0 both belong to J. Thus, there exist a, b ∈ I with f(a) = y − ε0 and f(b) = y + ε0. Since f is increasing, a < x < b. If a < c < b, we then have y − ε0 ≤ f(c) ≤ y + ε0 < y + ε. Thus f maps the interval (a, b) into B(y, ε). Since (a, b) is an open interval containing x, this suffices: (a, b) is the required neighbourhood of x.
We here begin using the more general notion of neighbourhood, to save the tedious "choosing δ > 0" etc. If U is a set which contains some δ-neighbourhood of x, then U is still referred to as a neighbourhood. Of course, if f maps this kind of neighbourhood into a set V, then it must map some δ-neighbourhood of x into V.
Corollary. If I is an interval and f is continuous and strictly increasing on I, then f⁻¹ is continuous and strictly increasing on f(I).
Proof. If I is an interval and f is continuous, then f(I) is also an interval. If f is also strictly increasing, then so is f⁻¹. Since f⁻¹ is defined on an interval and has an interval as its range, it must be continuous, by the previous theorem.
Example. Existence of roots again. Let f(x) = xⁿ, defined for x ∈ (0, ∞). Then f is continuous and strictly increasing. Thus, f has an inverse, which therefore must also be strictly increasing and continuous. By the IVT, the range of f is an interval. Since xⁿ −→ ∞ as x −→ ∞, the interval must be (0, ∞). Thus the inverse of f is defined on (0, ∞). This is the function y ↦ y^(1/n).
It is interesting that the only way a continuous function on an interval can have an inverse is if it is strictly monotone:
Theorem. If f is continuous and injective on an interval I, then f is either strictly increasing or strictly decreasing on I.
Proof. Suppose f is continuous and injective on the interval I.
Claim. If a1 < a2 < a3, then f is monotone on {a1, a2, a3}; that is, either f(a1) < f(a2) < f(a3) or f(a1) > f(a2) > f(a3).
Indeed, suppose a1 < a2 < a3 and f(a1) < f(a2); then we must also have f(a2) < f(a3). Otherwise, f(a2) > f(a3), and if we choose any y strictly between max{f(a1), f(a3)} and f(a2), the Intermediate Value Theorem yields points c1 ∈ (a1, a2) and c2 ∈ (a2, a3) with f(c1) = f(c2) = y, contradicting injectivity. The case f(a1) > f(a2) is similar, establishing the claim.
Now suppose x1 < x2 with f(x1) < f(x2), and let x < x′ be any two points of I. We have to show that f(x) < f(x′) as well. Let A be the set {x1, x2, x, x′}. It is possible that A consists of 2 or 3 points (for example we could have x1 = x). The first of these cases is trivial. The second case we have handled by our claim. The final case is that A has 4 distinct points. If so, order them in increasing order a1 < a2 < a3 < a4. Since f is monotone on sets of 3 elements, and since {a1, a2, a3} and {a2, a3, a4} have the pair {a2, a3} in common, f is monotone on the whole of A. Since f(x1) < f(x2), f is increasing on A and we have f(x) < f(x′) as required.
1. Prove the Intermediate Value Theorem using a bisection argument. Hint: If y is between f(a) and f(b) and c is the point (a + b)/2, then y is either between f(a) and f(c) or between f(c) and f(b). This allows one to construct a decreasing sequence of closed intervals. The one point x in the intersection of these satisfies f(x) = y.
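The bisection argument in the exercise can be sketched in code, applied to the earlier example of a 5th degree polynomial (the particular coefficients below are my own sample, not from the text):

```python
def P(x):
    # A sample 5th degree polynomial with a5 > 0.
    return x**5 - 3*x**3 + x - 4

# Locate a sign change, then bisect, mirroring the hint: at each step keep
# the half-interval across which P still changes sign.
s, t = -10.0, 10.0  # P(s) < 0 < P(t)
for _ in range(100):
    c = (s + t) / 2
    if P(c) < 0:
        s = c
    else:
        t = c
print(c, P(c))  # c approximates a real root, so P(c) is nearly 0
```

The nested intervals [s, t] halve in length at each step, so their intersection is the single point x of the hint, with P(x) = 0.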
Uniform Continuity

Let X and Y be metric spaces, with distance functions dX and dY. (We often write simply d, in both cases, if there is no danger of confusion.) Let f be a function on X to Y. Then f is continuous on X iff it is continuous at each point a of X, that is:
∀a ∈ X, ∀ε > 0, ∃δ > 0, such that ∀x ∈ X, d(x, a) < δ =⇒ d(f(x), f(a)) < ε.
For example, let X = R and f(x) = x², for x ∈ X. To show that f is continuous, one takes a ∈ X, lets ε > 0 and calculates for x ∈ X:
d(f(x), f(a)) = |x² − a²| = |x − a||x + a|.
If we assume |x − a| < 1 we will have |x + a| < 1 + 2|a|, so
|f(x) − f(a)| ≤ |x − a|(1 + 2|a|),
which is < ε, provided |x − a| < ε/(1 + 2|a|), so we can take δa = min{1, ε/(1 + 2|a|)}, and get |x − a| < δa implies |f(x) − f(a)| < ε. Notice that the δ = δa depends on a. For larger a we need a smaller δa.
Uniform continuity is different. The distances involved don't depend on where in the domain of the function we are. The order of quantifiers is changed. A function f : X → Y is said to be uniformly continuous on X if
∀ε > 0, ∃δ > 0, such that ∀x, y ∈ X, d(x, y) < δ =⇒ d(f(x), f(y)) < ε.
Example. Let f(x) = 3x, for x ∈ R. Fix ε > 0. Then for all x, y,
|f(x) − f(y)| < ε iff 3|x − y| < ε.
So if we take δ = ε/3, we have for all x, y ∈ R,
|x − y| < δ =⇒ |f(x) − f(y)| < ε.
Thus, f is uniformly continuous on R.
At this point one should check that the function defined on R by f(x) = x² is not uniformly continuous on R.
If one says f : X −→ Y is uniformly continuous, one means f is uniformly continuous on its domain, X.
Note on generality. Although we have defined the notion of uniform continuity in general metric spaces, we have written many of the results in the special case of the real numbers, for the feeling of concreteness it gives. We leave it to the reader to write out the metric space version. In doing so, be sure to take into account when completeness of the reals as a metric space is being used.
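The suggested check that f(x) = x² is not uniformly continuous can be sketched numerically (a rough illustration; the witness points x = n and y = n + 1/n are my own choice):

```python
def f(x):
    return x * x

# Witness pairs x = n, y = n + 1/n: the gap |x - y| = 1/n shrinks to 0,
# but |f(x) - f(y)| = 2 + 1/n^2 stays above 2, so no single delta can work
# for, say, epsilon = 1.
gaps, diffs = [], []
for n in [1, 10, 100, 1000]:
    x, y = float(n), n + 1.0 / n
    gaps.append(abs(x - y))
    diffs.append(abs(f(x) - f(y)))
print(gaps)   # shrink toward 0
print(diffs)  # all stay larger than 2
```

This is exactly the failure of the uniform quantifier order: for ε = 1 every candidate δ is defeated by a pair far enough out on the real line.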
Theorem. Every continuous function on a compact set is uniformly continuous. That is, if f : K −→ R is continuous and K is compact, then f is uniformly continuous on K.
Proof. Let ε > 0. Since f is continuous on K, for each a ∈ K there exists δa > 0 such that
x ∈ B(a, δa) ∩ K =⇒ |f(x) − f(a)| < ε/2.     (∗)
Now, {B(a, δa/2) : a ∈ K} is a family of open sets covering K, so there exists a finite subfamily which also covers K; that is, there exist a1, . . . , an ∈ K such that
B(a1, δa1/2) ∪ · · · ∪ B(an, δan/2) ⊃ K.     (∗∗)
Let δ = (1/2) min{δa1, . . . , δan}. Suppose |x − y| < δ. By (∗∗) there exists i such that
|x − ai| < δai/2.
Then we also have
|y − ai| ≤ |y − x| + |x − ai| < δ + δai/2 ≤ δai/2 + δai/2 = δai.
Thus, by (∗), we have
|f(x) − f(y)| ≤ |f(x) − f(ai)| + |f(ai) − f(y)| < ε/2 + ε/2 = ε.
We have shown that for x, y ∈ K, |x − y| < δ implies |f(x) − f(y)| < ε. Notice that the dependence on i disappeared.
Thus, f is uniformly continuous on K.
Theorem. Let f be uniformly continuous on X. If (xn) is a Cauchy sequence in X, then (f(xn)) is also a Cauchy sequence.
Informally, "uniformly continuous functions map Cauchy sequences to Cauchy sequences".
Proof. Let (xn) be a Cauchy sequence in X. Fix ε > 0. Then, by uniform continuity, there exists a δ > 0 such that for x, y ∈ X, |x − y| < δ implies |f(x) − f(y)| < ε. From the definition of Cauchy sequence we obtain N such that for n, m > N, |xn − xm| < δ. Thus, for n, m > N, |f(xn) − f(xm)| < ε. Since ε was arbitrary, this shows the sequence (f(xn)) is Cauchy.
Recall that if f is a function with domain A, A ⊂ B, and f̄ is defined on B with f̄(x) = f(x) for x ∈ A, we say that f̄ is an extension of f (to B) and f is the restriction of f̄ to A. To say that f has an extension to a continuous function on B means there exists an extension f̄ of f to B which is continuous. (Similarly for "uniformly continuous".)
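The Cauchy-sequence theorem above can be illustrated numerically, and contrasted with what happens without uniform continuity (a rough sketch; the sample functions are my own choices):

```python
# x_n = 1/n is a Cauchy sequence in (0, 1).
xs = [1.0 / n for n in range(1, 11)]

# g(x) = 1/x is continuous on (0, 1) but NOT uniformly continuous there:
# it maps (x_n) to g(x_n) = n, whose successive terms never get close.
images = [1.0 / x for x in xs]

# f(x) = 3x IS uniformly continuous, and the image sequence stays Cauchy.
fimages = [3.0 * x for x in xs]
print(images)   # roughly 1, 2, ..., 10: gaps stay near 1
print(fimages)  # gaps shrink toward 0
```

So mere continuity is not enough for the theorem: the hypothesis of uniform continuity is what prevents a Cauchy sequence from being torn apart.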
Theorem. Let A ⊂ B ⊂ R. If f : A −→ R and f̄ : B −→ R is an extension of f which is uniformly continuous (on B), then f itself is uniformly continuous (on A).
Proof. Let ε > 0. Since f̄ is uniformly continuous on B, we may choose δ > 0 such that for all x, y ∈ B, |x − y| < δ implies |f̄(x) − f̄(y)| < ε. Now, let x, y ∈ A with |x − y| < δ. Then x, y ∈ B, so |f̄(x) − f̄(y)| < ε. But f̄ extends f, so f̄(x) = f(x) and f̄(y) = f(y). Hence, |f(x) − f(y)| < ε. Thus, for all x, y ∈ A with |x − y| < δ, |f(x) − f(y)| < ε. Since ε was arbitrary, this shows f is uniformly continuous on its domain, A.
As an application:
Theorem. Let f be defined on the open interval (a, b) with values in R. If f can be extended to a continuous function on [a, b], then f is uniformly continuous.
Proof. If f̄ is a continuous function on [a, b] which extends f, then f̄ is a continuous function on the compact set [a, b], so is uniformly continuous. Thus, its restriction f is also uniformly continuous, by the previous result.
We are now going to prove a generalization of the converse of this theorem.
Theorem. (Extending a u.c. function) Let f be uniformly continuous on the set A. Then f can be extended to a uniformly continuous function on cl A.
The conclusion says there exists a function f̄ : cl(A) −→ R such that f̄ is uniformly continuous on cl A and f̄ extends f.
Proof. The proof uses completeness. Let f be uniformly continuous on A. Fix x ∈ cl(A). If there is a continuous function on cl(A) which extends f, then its formula must be
f̄(x) = lima−→x f(a),     (1)
so we check that the limit on the right-hand side exists. Let (an) be any sequence in A converging to x. Then (an) is Cauchy. Since f is uniformly continuous on A, (f(an)) is also Cauchy. Since R is complete as a metric space, (f(an)) converges. Thus, for every sequence (an) in A converging to x, (f(an)) converges; hence, by the sequential criterion, lima−→x f(a) exists.
Define f̄ on cl(A) by (1). First, f̄ extends f, because if x ∈ A, f̄(x) = lima−→x f(a) = f(x), since f is continuous on A.
We now prove f̄ is uniformly continuous. Let ε > 0. Use the definition of the uniform continuity of f to choose δ > 0 such that
a, a′ ∈ A, |a − a′| < δ =⇒ |f(a) − f(a′)| < ε.     (2)
Let x, y ∈ cl A with |x − y| < δ. Choose sequences (an) and (a′n) in A such that an −→ x and a′n −→ y. Then |an − a′n| −→ |x − y| < δ. Thus, there exists N so large that for all n ≥ N, |an − a′n| < δ, so that by (2),
|f(an) − f(a′n)| < ε for all n ≥ N.
But f(an) −→ f̄(x) and f(a′n) −→ f̄(y), so in the limit |f̄(x) − f̄(y)| ≤ ε. Thus, x, y ∈ cl A, |x − y| < δ implies |f̄(x) − f̄(y)| ≤ ε, and hence f̄ is uniformly continuous on cl A.
Corollary. If f is uniformly continuous on an interval (a, b) of R, a < b real, then f can be extended to a uniformly continuous function on [a, b].
The proof of this is immediate from the previous theorem, since cl(a, b) = [a, b].
1. A function f : R −→ R is said to be periodic if there exists a number k (called the period of f) such that f(x + k) = f(x) for all x ∈ R. Prove that a continuous periodic function is bounded and uniformly continuous on all of R.
2. Let f be defined on an interval (a, b), with one or both of a, b infinite. If limx→a f(x) and limx→b f(x) exist in R, then f is uniformly continuous.
Differentiation

Let f be a real-valued function defined on an interval I of R containing the point c. We say f is differentiable at c if the limit
limx−→c (f(x) − f(c))/(x − c)
exists. If so, this limit is called the derivative of f at c. The function f′, with domain the set of points where f is differentiable, defined by
f′(c) = limx−→c (f(x) − f(c))/(x − c),
is called the derivative of f.
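The definition can be watched numerically: the difference quotients (f(x) − f(c))/(x − c) settle down as x approaches c. A small sketch, using f(x) = x³ at c = 2 as a sample (my own choice; here f′(2) = 12):

```python
def f(x):
    # Sample function; its derivative at 2 is 3 * 2^2 = 12.
    return x ** 3

c = 2.0
# Difference quotients (f(c + h) - f(c)) / h for h shrinking toward 0.
quotients = [(f(c + h) - f(c)) / h for h in [0.1, 0.01, 0.001, 0.0001]]
print(quotients)  # approaches 12; algebraically each equals 12 + 6h + h^2
```

The quotients converge to the derivative; no single quotient *is* the derivative, which is why the definition is a limit.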
Theorem. For a function f : I −→ R and c ∈ I, f is differentiable at c iff there exist a number m and a function ε with limx−→c ε(x) = ε(c) = 0 such that for all x ∈ I,
f(x) = f(c) + m(x − c) + ε(x)(x − c).
In this case, f′(c) = m.
The graph of the function x ↦ f(c) + m(x − c) is tangent to the graph of f at (c, f(c)), and we will sometimes call this result the tangent characterisation of differentiability. The slope of the tangent line is m = f′(c), and one also calls this the slope of f at c.
Proof. If such a number m and a function ε exist, we have
limx−→c (f(x) − f(c))/(x − c) = limx−→c (m + ε(x)) = m + limx−→c ε(x) = m,
so f is differentiable at c with derivative m. Conversely, suppose f is differentiable with f′(c) = m. Then
limx−→c ((f(x) − f(c))/(x − c) − m) = 0.
Put ε(c) = 0 and, for all other x,
ε(x) = (f(x) − f(c))/(x − c) − m.
Then limx−→c ε(x) = 0 = ε(c), and (f(x) − f(c))/(x − c) = m + ε(x), so f(x) = f(c) + m(x − c) + ε(x)(x − c), as required.
Theorem. If f is differentiable at c, then f is continuous at c.
Proof. If f is differentiable at c then it can be written as
f(x) = f(c) + m(x − c) + ε(x)(x − c),
where limx−→c ε(x) = ε(c) = 0. This last condition says ε is a continuous function, so the right-hand side is a sum of products of continuous functions, so is continuous. Hence f is continuous.
The most important rule for calculation of derivatives is the chain rule, so we give it immediately.
The chain rule. Let I be an interval and g : I −→ R be differentiable at x0. Let g(I) ⊂ J, f : J −→ R, and let f be differentiable at u0 = g(x0). Then f ◦ g is differentiable at x0, with
(f ◦ g)′(x0) = f′(u0)g′(x0) = f′(g(x0))g′(x0).
Proof. Since f is differentiable at u0, there exists a function ε, which is continuous and 0 at u0 = g(x0), with
f(u) = f(u0) + f′(u0)(u − u0) + ε(u)(u − u0),     (∗)
for all u ∈ J. Replacing u by g(x) and u0 by g(x0) in (∗) yields
f(g(x)) = f(g(x0)) + f′(u0)(g(x) − g(x0)) + ε(g(x))(g(x) − g(x0)).
If x ≠ x0, we may rearrange and divide by x − x0, obtaining:
(f(g(x)) − f(g(x0)))/(x − x0) = f′(u0) (g(x) − g(x0))/(x − x0) + ε(g(x)) (g(x) − g(x0))/(x − x0).
Now, as x −→ x0, (g(x) − g(x0))/(x − x0) converges to g′(x0) and ε(g(x)) converges to ε(g(x0)) = ε(u0) = 0, so
limx−→x0 (f(g(x)) − f(g(x0)))/(x − x0) = f′(u0)g′(x0) + 0·g′(x0) = f′(g(x0))g′(x0).
That is, (f ◦ g)′(x0) = f′(g(x0))g′(x0), as required.
The following version of the above proof, using only the tangent characterization of differentiability, is a little messier, but is more directly generalizable to higher dimensions, where division is not defined.
2nd proof. Since g is differentiable at x0, there exists a function ε1, continuous and 0 at x0, with
g(x) = g(x0) + g′(x0)(x − x0) + ε1(x)(x − x0),     (1)
for all x ∈ I. Since f is differentiable at u0, there exists a function ε2, which is continuous and 0 at u0 = g(x0), with
f(u) = f(u0) + f′(u0)(u − u0) + ε2(u)(u − u0),     (2)
for all u ∈ J. Replacing u by g(x) in (2) gives
f(g(x)) = f(u0) + f′(u0)(g(x) − u0) + ε2(g(x))(g(x) − u0).
But u0 = g(x0), so by (1) we may replace g(x) − u0 by g′(x0)(x − x0) + ε1(x)(x − x0), yielding
f(g(x)) = f(g(x0)) + f′(u0)[g′(x0)(x − x0) + ε1(x)(x − x0)] + ε2(g(x))[g′(x0)(x − x0) + ε1(x)(x − x0)]
= f(g(x0)) + f′(u0)g′(x0)(x − x0) + [f′(u0)ε1(x) + ε2(g(x))(g′(x0) + ε1(x))](x − x0).
Since [f′(u0)ε1(x) + ε2(g(x))(g′(x0) + ε1(x))] converges to 0 as x −→ x0, this shows f ◦ g is differentiable at x0 with derivative f′(u0)g′(x0) = f′(g(x0))g′(x0).
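As a rough numerical check of the chain rule (the particular functions f(u) = sin u and g(x) = x² are my own samples), one can compare a difference quotient of f ◦ g with the predicted product f′(g(x0))g′(x0):

```python
import math

def g(x):
    return x * x          # g'(x) = 2x

def f(u):
    return math.sin(u)    # f'(u) = cos(u)

x0, h = 1.0, 1e-6
# Difference quotient of f∘g at x0 versus the chain rule's prediction.
numeric = (f(g(x0 + h)) - f(g(x0))) / h
predicted = math.cos(g(x0)) * 2 * x0   # f'(g(x0)) g'(x0) = 2 cos(1)
print(numeric, predicted)  # both close to 2 cos(1)
```

The two numbers agree to several digits; the small discrepancy is the ε-term of the tangent characterisation evaluated at a nonzero h.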
Theorem. If f is constant on the interval I, then f is differentiable on I with f′(x) = 0 for all x ∈ I.
To prove this, just do the calculation from the definition, or note that m = 0 and the function ε = 0 satisfy the tangent characterization of differentiability.
Theorem. Let f : I −→ R, g : I −→ R be differentiable at c ∈ I and let k ∈ R. Then:
(a) kf is differentiable at c with (kf)′(c) = kf′(c);
(b) f + g is differentiable at c and (f + g)′(c) = f′(c) + g′(c) (sum rule);
(c) f g is differentiable at c and (f g)′(c) = f′(c)g(c) + f(c)g′(c) (product rule);
(d) if g(c) ≠ 0, then f/g is differentiable at c and
(f/g)′(c) = (g(c)f′(c) − f(c)g′(c))/g(c)²     (quotient rule).
Proof. (a), (b) are left as exercises. For (c), we merely calculate, for each x ∈ I,
((f g)(x) − (f g)(c))/(x − c) = (f(x)g(x) − f(c)g(c))/(x − c)
= (f(x)g(x) − f(c)g(x) + f(c)g(x) − f(c)g(c))/(x − c)
= ((f(x) − f(c))/(x − c)) g(x) + f(c) ((g(x) − g(c))/(x − c)).
Now, since f is differentiable at c, (f(x) − f(c))/(x − c) converges to f′(c) as x −→ c. Similarly, (g(x) − g(c))/(x − c) converges to g′(c). Moreover, since g is differentiable at c, it is continuous there, so g(x) −→ g(c) as x −→ c. This shows f g is differentiable at c with derivative f′(c)g(c) + f(c)g′(c).
(d) The proof of the quotient rule is similar. But before doing the calculation we recall that f/g has as domain the set of points x for which g(x) ≠ 0. Moreover, since g(c) ≠ 0 and g is continuous at c, there is a neighbourhood of c on which g ≠ 0. Now, for x in this neighbourhood, we calculate
f(x)/g(x) − f(c)/g(c) = (g(c)f(x) − g(x)f(c))/(g(x)g(c)) = (g(c)(f(x) − f(c)) − (g(x) − g(c))f(c))/(g(x)g(c)),
which when divided by (x − c) gives
(f(x)/g(x) − f(c)/g(c))/(x − c) = (g(c) (f(x) − f(c))/(x − c) − f(c) (g(x) − g(c))/(x − c)) / (g(x)g(c)).
Now, let x −→ c, and use the fact that f is differentiable at c, g is differentiable at c and g is continuous at c to get the required result.
For a function f : I −→ R, a point c is called a critical point of f if one of the following three conditions is satisfied: (1) c is an endpoint of I, (2) f′(c) does not exist, or (3) f′(c) = 0.
Theorem. If f : I −→ R is differentiable at c ∈ int(I) and f has a maximum or minimum at c, then f′(c) = 0.
Proof. We assume f has a maximum at c; the case of a minimum is similar. For elements x ∈ I with x < c, we have f(x) ≤ f(c), and x − c < 0, so
(f(x) − f(c))/(x − c) ≥ 0.
Thus,
f′(c) = limx−→c, x<c (f(x) − f(c))/(x − c) ≥ 0.
Similarly, for x ∈ I with x > c,
(f(x) − f(c))/(x − c) ≤ 0,
and again
f′(c) = limx−→c, x>c (f(x) − f(c))/(x − c) ≤ 0.
Thus f′(c) ≥ 0 and f′(c) ≤ 0, so f′(c) = 0.
In the above proof, we restricted the "difference quotient" q(x) = (f(x) − f(c))/(x − c) to I ∩ (−∞, c), and then took a limit, the so-called limit from the left. Since the limit of q is f′(c), so is the limit from the left.
Some people prefer to use the sequential criterion, letting (xn) be an arbitrary sequence of points in I with xn < c and xn −→ c, but the proof is essentially the same.
The hypothesis that c is an interior point of I was needed. Where?
The limit from the left used above is called the left derivative of f at c, denoted f′−(c). Similarly, the limit from the right is called the right derivative, f′+(c).
Theorem. Let I be an interval, f : I −→ R, and suppose f has a local maximum or local minimum at a point c ∈ I. Then c is a critical point of f.
Proof. We may assume there is a local maximum at c. There are 3 cases in the definition of critical point; if c is not an endpoint and not a point where f is non-differentiable, then c must be an interior point of I and f′(c) must exist. Thus there is a ball B(c, ε) = (c − ε, c + ε) ⊂ I, and there is a neighbourhood U of c such that the restriction of f to U has a maximum at c. By shrinking the radius, then, we find an open interval (a, b) such that c ∈ (a, b), f is differentiable at c and the restriction of f to (a, b) has a maximum at c; thus, we may assume without loss of generality that I = (a, b) and f has a maximum at c. Then f′(c) = 0 by the previous result.
The Darboux Property of Derivatives

The following is often called The Intermediate Value Theorem for derivatives. It will lead to results about continuity of derivatives and differentiation of inverse functions (Inverse Function Theorem).
The Darboux Property of derivatives. Let f be differentiable on the interval I. If a, b ∈ I and m is any number strictly between f′(a) and f′(b), then there exists c between a and b with f′(c) = m.
Proof. Assume without loss of generality that a < b and f′(a) < m < f′(b). As before, let g(x) = f(x) − mx. Then g′(a) = f′(a) − m < 0 and g′(b) = f′(b) − m > 0. Since g is continuous on [a, b], it must have a minimum at some point c of [a, b]. Now, c cannot be a, because then, for all x ∈ [a, b], g(x) ≥ g(a), so that
(g(x) − g(a))/(x − a) ≥ 0,
and in the limit g′(a) ≥ 0. Similarly, c cannot be b, and thus c is in the interior. But then g′(c) = 0; that is, f′(c) = m.
Corollary. Let f be differentiable on an interval I, with a monotone derivative f′. Then f′ is continuous on I.
Proof. We recall that whenever a monotone function defined on an interval has the Darboux property (that is, satisfies the conclusion of the Intermediate Value Theorem), it is continuous. We have just proved that f′ is such a function.
Inverse Function Theorem. Let f be differentiable on the interval I with f′(x) ≠ 0, for all x ∈ I. Then,
(1) f is injective (strictly increasing or strictly decreasing),
(2) f⁻¹ is differentiable on J = f(I), and
(3) at each point y ∈ J,

(f⁻¹)′(y) = 1/f′(f⁻¹(y)).

Note that if x is the point of I with y = f(x), the final formula reads

(f⁻¹)′(y) = 1/f′(x).
In calculus courses one often writes

dx/dy = 1/(dy/dx),

but one has to remember that on the left side the x represents the function f⁻¹, and dx/dy stands for the derivative of that function at the point y, whereas on the right side y represents the function f and dy/dx stands for the derivative of f at x; yet we are still assuming that the y at which we evaluate the left side is related to the x at which we evaluate the right side by y = f(x). The notation is a mess, though very convenient in calculations.

Proof. By the Darboux property of derivatives, if f′ ≠ 0 on I, then there cannot exist a, b ∈ I with f′(a) > 0 and f′(b) < 0. Thus f′ > 0 on I or f′ < 0 on I, so f is strictly increasing on I or strictly decreasing. In particular, f is injective.

Say f is strictly increasing on I. Then f⁻¹ is also strictly increasing. Since f is continuous, it too has the Darboux property (intermediate value property), which makes J = f(I) an interval, and f⁻¹ is continuous there.

Fix y ∈ J, y = f(x) where x ∈ I. We will use the sequential criterion to show f⁻¹ is differentiable at y. Let (yₙ) be a sequence in J \ {y} with yₙ −→ y and put xₙ = f⁻¹(yₙ). Since f⁻¹ is continuous, xₙ −→ x. Also xₙ ≠ x for all n, since f⁻¹ is injective. Thus, we calculate

(f⁻¹(yₙ) − f⁻¹(y))/(yₙ − y) = (xₙ − x)/(f(xₙ) − f(x)) = 1/[(f(xₙ) − f(x))/(xₙ − x)].

Since f is differentiable at x, this converges to 1/f′(x). Since the sequence (yₙ) was arbitrary,

lim_{t−→y} (f⁻¹(t) − f⁻¹(y))/(t − y) = 1/f′(x).

That is, (f⁻¹)′(y) exists and is 1/f′(x).
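The formula (f⁻¹)′(y) = 1/f′(f⁻¹(y)) can be checked numerically. Below is a Python sketch (our own example, not from the text) with f(x) = x³ + x, which is strictly increasing since f′(x) = 3x² + 1 > 0; the inverse is evaluated by bisection:

```python
# Numerical sketch of the inverse-derivative formula
# (f^{-1})'(y) = 1/f'(f^{-1}(y)) for f(x) = x^3 + x.

def f(x):
    return x**3 + x

def f_prime(x):
    return 3 * x**2 + 1

def f_inverse(y, lo=-10.0, hi=10.0, tol=1e-12):
    # f is strictly increasing, so bisection on f(x) = y converges.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x = 1.5
y = f(x)
h = 1e-6
diff_quot = (f_inverse(y + h) - f_inverse(y - h)) / (2 * h)
print(diff_quot, 1 / f_prime(x))  # the two values agree closely
```

The symmetric difference quotient of f⁻¹ at y matches 1/f′(x) to several digits, as the theorem predicts.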
Mean Value Theorems

Rolle's Theorem. Let f : [a, b] −→ R be continuous on [a, b] and differentiable on (a, b) with f(a) = f(b). Then there exists c ∈ (a, b) with f′(c) = 0.

Proof. Since f is continuous on the compact set [a, b], it has a maximum and a minimum at some point of [a, b]. If one of these is in the interior, that is, at some c ∈ (a, b), then f′(c) = 0. If, on the other hand, both the maximum and the minimum are at the endpoints, then they are equal by hypothesis. Hence, in this case f is a constant, and thus has derivative 0 at all points of (a, b). Thus, any c in (a, b) produces the desired conclusion.

The following theorem is purportedly due to Lagrange. We will also study a generalization due to Cauchy.

Mean Value Theorem (MVT). Let f : [a, b] −→ R be continuous on [a, b] and differentiable on (a, b). Then there exists c ∈ (a, b) with

f′(c) = (f(b) − f(a))/(b − a).

Equivalently, f(b) − f(a) = f′(c)(b − a).

Proof. This is a generalization of Rolle's Theorem. The method of proof is to create a new function h which satisfies the hypotheses of Rolle's Theorem and for which h′(c) = 0 gives the desired equality. Let
g(x) = f(a) + [(f(b) − f(a))/(b − a)](x − a),

for all x ∈ [a, b]. Then

g(a) = f(a), g(b) = f(b), and g′(x) = (f(b) − f(a))/(b − a).

Thus, we let h(x) = f(x) − g(x), for all x ∈ [a, b], so that h(a) = h(b) = 0, h is continuous on [a, b] and h′ = f′ − g′. By Rolle's Theorem, there is a point c ∈ (a, b) where h′(c) = 0, that is, where f′(c) = (f(b) − f(a))/(b − a), as required.

The Mean Value Theorem can be rewritten in the useful form:

Theorem. If I is an interval of the reals and f is continuous on I and differentiable on its interior, then for each distinct x₁, x₂ ∈ I,

f(x₂) = f(x₁) + f′(c)(x₂ − x₁),

for some c strictly between x₁ and x₂.

This is because the quotient (f(x₁) − f(x₂))/(x₁ − x₂) does not change when you interchange x₁ and x₂. Of course, if f is differentiable everywhere, the result is true also for x₁ = x₂, except that one then has c just between x₁ and x₂, possibly equal.
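For a concrete check of the theorem, the following Python sketch (an illustration of ours, not from the text) locates a mean-value point c for f(x) = x³ on [0, 2]: the mean slope is (f(2) − f(0))/2 = 4 and f′(c) = 3c², so c = 2/√3.

```python
# Numerical sketch of the Mean Value Theorem for f(x) = x^3 on [0, 2].

def f(x):
    return x**3

def f_prime(x):
    return 3 * x**2

a, b = 0.0, 2.0
slope = (f(b) - f(a)) / (b - a)   # mean slope = 4

# f' is increasing on [0, 2], so f'(c) - slope changes sign exactly once;
# bisect to find the root.
lo, hi = a, b
for _ in range(60):
    midpt = (lo + hi) / 2
    if f_prime(midpt) < slope:
        lo = midpt
    else:
        hi = midpt
c = (lo + hi) / 2
print(c, f_prime(c), slope)  # c is about 1.1547 = 2/sqrt(3)
```

Bisection works here only because f′ happens to be monotone; the theorem itself asserts nothing about how to find c.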
It is an immediate consequence that only constant functions have derivative zero.

Theorem. If f is continuous on an interval I, differentiable on its interior, with f′ = 0 there, then f is constant on I.

Proof. If x₁, x₂ belong to I, with x₁ < x₂, then there exists c ∈ (x₁, x₂) with f(x₂) − f(x₁) = f′(c)(x₂ − x₁) = 0. Thus, f(x₁) = f(x₂). This shows all the values of f are the same. That is, f is a constant function.
From this it follows that if two functions have the same derivative, they differ by a constant:

Corollary. Let F, G be continuous on an interval I, differentiable on the interior with F′ = G′. Then there exists a constant C such that G = F + C on I.

If f is defined on the interior of an interval I and F is defined and continuous on I, with derivative f in the interior, then F is called a primitive, antiderivative or indefinite integral of f, and one writes F(x) = ∫ f(x) dx. Thus, if G is another antiderivative of f, then F = G + C, where C is a constant function.

Theorem. Let f be continuous on the interval I, differentiable on its interior.
(a) If f′ > 0 on int I, then f is strictly increasing on I.
(b) If f′ < 0 on int I, then f is strictly decreasing on I.
(c) f′ ≥ 0 on int I iff f is increasing on I.
(d) f′ ≤ 0 on int I iff f is decreasing on I.

Proof. (a) If x₁ < x₂ in I, then the Mean Value Theorem holds for the interval [x₁, x₂], so we may choose c between x₁ and x₂ with f(x₂) = f(x₁) + f′(c)(x₂ − x₁). Since f′(c) > 0, this says f(x₁) < f(x₂), as required. The proof of (b) is similar, as is one direction of (c) and (d).
We emphasise that in (a) and (b), the implication cannot be reversed. For example, if f is defined on R by f(x) = x³, then f′(x) = 3x², which is 0 at x = 0, yet f is everywhere strictly increasing.

The reader can prove:

Practice Problems.
(1) Let f be continuous on the interval I and differentiable on the interior with f′ > 0 except at finitely many points in each bounded interval. Then f is strictly increasing on I. (Note: At the finitely-many points, we don't even need to assume the differentiability.)
Actually, you can prove:
(2) We have the same conclusion if f′ ≥ 0 on I and there is no interval (a, b) with a < b on which f′ = 0.

In your study of uniform continuity, you will recall that one often tries to get an inequality of the form |f(x) − f(y)| ≤ M|x − y|, for then |x − y| < ε/M implies |f(x) − f(y)| < ε, so that f is uniformly continuous. A function with this property is called Lipschitz (of order 1).

Theorem. Let f be continuous on an interval I, differentiable in the interior. If f′ is bounded on the interior of I, then f is Lipschitz on I; hence f is uniformly continuous.

Proof. Let |f′(x)| ≤ M, for all x in the interior of I. Let x₁, x₂ ∈ I. If these are distinct, there exists c between x₁ and x₂ with

(f(x₁) − f(x₂))/(x₁ − x₂) = f′(c),

by the Mean Value Theorem. Thus,

|(f(x₁) − f(x₂))/(x₁ − x₂)| = |f′(c)| ≤ M,

so |f(x₁) − f(x₂)| ≤ M|x₁ − x₂|. In case x₁ = x₂, this argument fails, but the inequality is still true, since both sides are then 0.

Here is the generalized mean value theorem mentioned earlier.

Cauchy's Mean Value Theorem. Let f, g be real-valued functions continuous on [a, b], differentiable on (a, b). Then there exists a point c ∈ (a, b) such that

(f(b) − f(a))g′(c) = (g(b) − g(a))f′(c).
Proof. Let h(x) = (f(b) − f(a))g(x) − (g(b) − g(a))f(x), for all x ∈ [a, b]. Then

h(a) = (f(b) − f(a))g(a) − (g(b) − g(a))f(a) = f(b)g(a) − g(b)f(a), and
h(b) = (f(b) − f(a))g(b) − (g(b) − g(a))f(b) = −f(a)g(b) + g(a)f(b),

which is the same thing. So by Rolle's theorem (or the MVT) there exists a point c in (a, b) with h′(c) = 0, that is, where

(f(b) − f(a))g′(c) − (g(b) − g(a))f′(c) = 0.

This is what is required.
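Here is a small numerical sketch (our own example) of Cauchy's Mean Value Theorem with f(x) = x² and g(x) = x³ on [1, 2]: the equation (f(2) − f(1))g′(c) = (g(2) − g(1))f′(c) becomes 9c² = 14c, with root c = 14/9 in (1, 2).

```python
# Numerical sketch of Cauchy's MVT for f(x) = x^2, g(x) = x^3 on [1, 2].

def f(x): return x**2
def g(x): return x**3
def fp(x): return 2 * x
def gp(x): return 3 * x**2

a, b = 1.0, 2.0

def h_prime(c):
    # derivative of h(x) = (f(b)-f(a)) g(x) - (g(b)-g(a)) f(x)
    return (f(b) - f(a)) * gp(c) - (g(b) - g(a)) * fp(c)

# h'(1) = -5 < 0 < 8 = h'(2), so bisection locates the mean-value point.
lo, hi = a, b
for _ in range(60):
    midpt = (lo + hi) / 2
    if h_prime(midpt) < 0:
        lo = midpt
    else:
        hi = midpt
c = (lo + hi) / 2
print(c)  # 14/9, about 1.5556
```

The computed c satisfies h′(c) = 0, i.e. exactly the displayed conclusion of the theorem.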
One of the major consequences of Cauchy's MVT is L'Hôpital's rule, in another section.
L'Hôpital's Rule

This actually refers to several related results about calculating the limit of a quotient f/g of two functions f, g by calculating the limit of the quotient f′/g′ of their derivatives.

L'Hôpital's rule (simplest case). Let f, g be defined on an interval I of R, differentiable with g′ not 0 in a deleted neighbourhood of c in I, continuous at c with f(c) = g(c) = 0. If

lim_{x−→c} f′(x)/g′(x) = L ∈ R,

then

lim_{x−→c} f(x)/g(x) = L.
Remember that each neighbourhood of c in I is of the form B(c, ε) ∩ I, which is another interval ⊂ I. The corresponding deleted neighbourhood is ((c − ε, c) ∩ I) ∪ ((c, c + ε) ∩ I), which can be taken of the form (a, c) ∪ (c, b), unless c happens to be an endpoint of I, in which case the deleted neighbourhood becomes either (a, c) or (c, b).

Proof. First, let U = (a, c) ∪ (c, b) be the deleted neighbourhood of c in I such that f′ and g′ exist at all points of U and g′ is not zero in U. We use the sequential criterion for convergence: we must show that for each sequence (xₙ) in U with xₙ −→ c, f(xₙ)/g(xₙ) −→ L. Since g′(x) ≠ 0 for x ∈ U, we see that g is strictly monotone in (a, c) and in (c, b), and since g(c) = 0, this implies g ≠ 0 in U.

Fix (xₙ) a sequence in U with xₙ −→ c. For each xₙ, either xₙ < c or c < xₙ. Since f and g are continuous on [xₙ, c] (or [c, xₙ]) and differentiable on the interior, Cauchy's Mean Value Theorem allows us to choose cₙ between xₙ and c with

(f(xₙ) − f(c))g′(cₙ) = (g(xₙ) − g(c))f′(cₙ).

In other words, since f(c) = g(c) = 0,

f(xₙ)/g(xₙ) = (f(xₙ) − f(c))/(g(xₙ) − g(c)) = f′(cₙ)/g′(cₙ).

Now, since cₙ is between xₙ and c, |cₙ − c| ≤ |xₙ − c|, so cₙ −→ c, and by hypothesis f′(cₙ)/g′(cₙ) −→ L, so f(xₙ)/g(xₙ) −→ L. Since (xₙ) was an arbitrary sequence in U converging to c,

lim_{x−→c} f(x)/g(x) = L.
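A numeric sanity check (ours, not the text's) of the 0/0 rule: with f(x) = 1 − cos x and g(x) = x², we have f′(x)/g′(x) = sin x/(2x) −→ 1/2 as x −→ 0, and the quotient f/g approaches the same value.

```python
# Numeric check of L'Hopital's rule at c = 0 for f(x) = 1 - cos x, g(x) = x^2.
import math

def ratio(x):
    return (1 - math.cos(x)) / x**2

def deriv_ratio(x):
    return math.sin(x) / (2 * x)

for x in (0.1, 0.01, 0.001):
    print(x, ratio(x), deriv_ratio(x))  # both columns approach 0.5
```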
Of course, in Calculus we are used to using the above theorem without assuming continuity at c, but just that f and g both have limit 0 at c.
L'Hôpital's rule (0/0 form at a real). Let I be an interval of R, c ∈ I, and let f, g be defined on I, except possibly at c, differentiable with g′ not 0 in a deleted neighbourhood of c in I, and lim_{x−→c} f(x) = lim_{x−→c} g(x) = 0. If

lim_{x−→c} f′(x)/g′(x) = L,

then

lim_{x−→c} f(x)/g(x) = L.
Proof. We simply define extensions f̄ and ḡ on I by

f̄(x) = f(x) for x ≠ c, f̄(c) = 0;  ḡ(x) = g(x) for x ≠ c, ḡ(c) = 0.

Then f̄ and ḡ are continuous with value 0 at c, and f̄′ = f′ and ḡ′ = g′ ≠ 0 in a deleted neighbourhood of c. Thus,

lim_{x−→c} f̄′(x)/ḡ′(x) = lim_{x−→c} f′(x)/g′(x) = L,

and hence

lim_{x−→c} f(x)/g(x) = lim_{x−→c} f̄(x)/ḡ(x) = L.
Recall the practice problem:

Lemma. If h is defined (at least) in [a, ∞), a > 0, and H(t) = h(1/t) for 0 < t < 1/a, then

lim_{x−→∞} h(x) = lim_{t−→0+} H(t),

whenever one side exists in R.

L'Hôpital's rule (0/0 form at ∞ or −∞). Let f, g be defined (at least) on an interval [a, ∞) of R, differentiable there with g′ not 0, and lim_{x−→∞} f(x) = lim_{x−→∞} g(x) = 0. If

lim_{x−→∞} f′(x)/g′(x) = L ∈ R,

then

lim_{x−→∞} f(x)/g(x) = L.
A similar statement holds for limits at −∞.

Proof. We may assume a > 0. Put F(t) = f(1/t), G(t) = g(1/t), for each t with 1/t ∈ [a, ∞). Then F and G are differentiable in (0, 1/a) and

lim_{t−→0+} F′(t)/G′(t) = lim_{t−→0+} [f′(1/t)(−1/t²)]/[g′(1/t)(−1/t²)] = lim_{t−→0+} f′(1/t)/g′(1/t) = lim_{x−→∞} f′(x)/g′(x) = L.

Moreover, F and G converge to 0 as t −→ 0+, so by the previous result,

lim_{x−→∞} f(x)/g(x) = lim_{t−→0+} F(t)/G(t) = L.
Here is a version when the denominator converges to ∞; note that the numerator is not assumed to converge to ∞, as it usually is in Calculus books.

L'Hôpital's rule (A/∞ form). Let f, g be defined and differentiable (at least) in a (possibly infinite) interval (a, b) ⊂ R, except at some c, with g′ never 0 there. If lim_{x−→c} g(x) = ∞ (or −∞) and

lim_{x−→c} f′(x)/g′(x) = L ∈ R,

then

lim_{x−→c} f(x)/g(x) = L.
Proof. We may assume b = c, since the case c = a is similar and the general case follows by considering right and left limits. We do the case where L is finite, but write the proof in such a way that it is easily changed to give the infinite cases.

Let u < L < v. We need only show that there is an interval U = (r, c) with

u < f(x)/g(x) < v, for all x ∈ U.

Choose u′, v′ ∈ R with u < u′ < L < v′ < v. Since lim_{x−→c} f′(x)/g′(x) = L, there exists an interval U₁ = (r₁, c) with

u′ < f′(t)/g′(t) < v′, (∗)

for all t ∈ U₁. Fix x, y ∈ U₁ (distinct) and apply Cauchy's MVT to get a t between x and y with

(f(x) − f(y))/(g(x) − g(y)) = f′(t)/g′(t).

There is no problem about dividing by zero here, since g′ is not 0 anywhere on (a, c), and hence g(x) is never g(y). Since t is between x and y, it also belongs to U₁; hence by (∗),

u′ < (f(x) − f(y))/(g(x) − g(y)) < v′. (∗∗)

We need not make reference to t any longer. This statement (∗∗) holds for all x and y in U₁. Hold y fixed throughout the remainder of the proof.
Since g(x) −→ ∞, we may choose a smaller neighbourhood U₂ = (r₂, c) with g(x) > g(y) and g(x) > 0, for x ∈ U₂. Divide the numerator and denominator of the quotient in (∗∗) by g(x) to obtain

u′ < [f(x)/g(x) − f(y)/g(x)] / [1 − g(y)/g(x)] < v′.

The denominator is > 0 here, so we may multiply through by it, preserving the inequality. After adding f(y)/g(x), this gives

u′(1 − g(y)/g(x)) + f(y)/g(x) < f(x)/g(x) < v′(1 − g(y)/g(x)) + f(y)/g(x).

Now, as x −→ c, since g(x) −→ ∞, the left side here converges to u′ and the right side converges to v′. Thus, since u < u′ and v > v′, there is a smaller deleted neighbourhood U₃ = (r₃, c) such that the left side is > u and the right side is < v, for x ∈ U₃. Hence

u < f(x)/g(x) < v.

Thus, U₃ is the required U.

In the case the limit L is +∞, a proof is obtained by deleting the side of the inequalities above containing v and v′, since the neighbourhoods of +∞ are of the form (u, +∞). Similarly, the proof for the case L = −∞ is obtained by deleting the side involving u and u′.
Taylor's Theorem

The following theorem, named after Brook Taylor (1685–1731), is an extension of the Mean Value Theorem to include higher derivatives.

Let f be a function defined on an interval I. If f is differentiable in a neighbourhood of x and the derivative f′ is differentiable at x, then the second derivative of f at x is f″(x) = (f′)′(x). Similarly, f‴(x) is the derivative of f″ at x, provided f″ exists in a neighbourhood of x and f″ is differentiable at x. Other notations for these are f⁽²⁾(x) and f⁽³⁾(x). This notation can be extended recursively: we put f⁽⁰⁾(x) = f(x), and supposing f⁽ⁿ⁾ exists in a neighbourhood of x and is differentiable at x, we put f⁽ⁿ⁺¹⁾(x) = (f⁽ⁿ⁾)′(x).

Taylor's Theorem (Taylor's formula with remainder). Let I be an interval of the reals, and let f : I −→ R and its first n derivatives be continuous on I and differentiable on the interior of I. Let x₀ ∈ I. Then for all x ∈ I,

f(x) = Σₖ₌₀ⁿ [f⁽ᵏ⁾(x₀)/k!](x − x₀)ᵏ + Rₙ(x),

where Rₙ(x) = [f⁽ⁿ⁺¹⁾(c)/(n + 1)!](x − x₀)ⁿ⁺¹, for some c between x and x₀.
We emphasise that f⁽ⁿ⁺¹⁾ is assumed to exist on the interior of I, but need not be continuous, and it is not assumed to exist at the endpoints of I. Remember that f⁽⁰⁾ = f and 0! = 1, (x − x₀)⁰ = 1, even when x = x₀.

Proof. For each x ∈ I, put

Pₙ(x) = Σₖ₌₀ⁿ [f⁽ᵏ⁾(x₀)/k!](x − x₀)ᵏ.

This is called the Taylor polynomial of order n about x₀; it is defined for all x in I, since f is n-times differentiable. Define the remainder Rₙ(x) by subtraction: Rₙ(x) = f(x) − Pₙ(x). From the way it is defined, the remainder automatically satisfies f(x) = Pₙ(x) + Rₙ(x). We are to show it can be written in the stated form.

Keep x, x₀ ∈ I fixed, with x ≠ x₀, and define a function

g(t) = Σₖ₌₀ⁿ [f⁽ᵏ⁾(t)/k!](x − t)ᵏ + Rₙ(x)(x − t)ⁿ⁺¹/(x − x₀)ⁿ⁺¹.

Then, since (x − x) = 0 and (x − x₀)ⁿ⁺¹/(x − x₀)ⁿ⁺¹ = 1,

g(x) = f(x) and g(x₀) = Pₙ(x) + Rₙ(x) = f(x).
Thus, by Rolle's Theorem (or the Mean Value Theorem), there exists a c between x and x₀ with g′(c) = 0. Using the rules of differentiation, we calculate:

g′(t) = f′(t) + Σₖ₌₁ⁿ { [f⁽ᵏ⁾(t)/k!]′(x − t)ᵏ − [f⁽ᵏ⁾(t)/k!]k(x − t)ᵏ⁻¹ } − Rₙ(x)(n + 1)(x − t)ⁿ/(x − x₀)ⁿ⁺¹
     = f′(t) + Σₖ₌₁ⁿ { [f⁽ᵏ⁺¹⁾(t)/k!](x − t)ᵏ − [f⁽ᵏ⁾(t)/(k − 1)!](x − t)ᵏ⁻¹ } − Rₙ(x)(n + 1)(x − t)ⁿ/(x − x₀)ⁿ⁺¹
     = f′(t) + [f⁽ⁿ⁺¹⁾(t)/n!](x − t)ⁿ − [f′(t)/0!](x − t)⁰ − Rₙ(x)(n + 1)(x − t)ⁿ/(x − x₀)ⁿ⁺¹
     = [f⁽ⁿ⁺¹⁾(t)/n!](x − t)ⁿ − Rₙ(x)(n + 1)(x − t)ⁿ/(x − x₀)ⁿ⁺¹,

the sum telescoping so that only its k = n term and the k = 1 term (which cancels the leading f′(t)) survive.

Now, at t = c the left side is 0, so when we solve for Rₙ(x), the factors (x − c)ⁿ cancel and

Rₙ(x) = [f⁽ⁿ⁺¹⁾(c)/((n + 1)·n!)](x − x₀)ⁿ⁺¹,

which is equal to the required form, since (n + 1)·n! = (n + 1)!.
The form given to the remainder here is generally attributed to Lagrange. Taylor didn't actually prove this theorem, but gave the infinite series expansion, known as Taylor's series, without discussing questions of convergence.

Example. If f(x) = eˣ, for x ∈ R, then f′(x) = eˣ, for all x ∈ R. Thus, for all n, f⁽ⁿ⁾(0) = e⁰ = 1 and the Taylor polynomial of order n about 0 is

Pₙ(x) = Σₖ₌₀ⁿ xᵏ/k!.

Taylor's Theorem says

f(x) = Pₙ(x) + [f⁽ⁿ⁺¹⁾(c)/(n + 1)!]xⁿ⁺¹ = Pₙ(x) + eᶜ xⁿ⁺¹/(n + 1)!,

for some c between 0 and x. If, for example, x > 0 and n = 4, we obtain (since eᵗ increases with t) 1 < eᶜ < eˣ and

P₄(x) + x⁵/5! < eˣ < P₄(x) + eˣ x⁵/5!.
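The bounds in this example are easy to verify numerically. The following Python sketch (ours, not part of the text) evaluates P₄ and the two bounds at x = 1:

```python
# Checking P4(x) + x^5/5! < e^x < P4(x) + e^x * x^5/5! at x = 1.
import math

def taylor_poly(x, n):
    # P_n(x) = sum_{k=0}^{n} x^k / k!
    return sum(x**k / math.factorial(k) for k in range(n + 1))

x = 1.0
p4 = taylor_poly(x, 4)
lower = p4 + x**5 / math.factorial(5)
upper = p4 + math.exp(x) * x**5 / math.factorial(5)
print(lower, math.exp(x), upper)  # lower < e < upper
```

At x = 1, P₄(1) = 65/24 ≈ 2.7083, and e ≈ 2.7183 indeed sits between the two bounds.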
The Riemann Integral

Here we introduce the Riemann integral using the formulation due to Darboux. You will have heard of Riemann sums from calculus. The Darboux sums are extreme forms of these, which behave a little more nicely from some points of view.

In everything, if we don't say otherwise, f is a real-valued function defined (at least) on a bounded interval [a, b] to R. A partition of [a, b] is a finite set of points P = {x₀, …, xₙ} with a = x₀ ≤ x₁ ≤ x₂ ≤ ⋯ ≤ xₙ = b. Usually, we will assume that x₀ < x₁ < x₂ < ⋯ < xₙ, but for technical reasons, the official definition has weak inequalities.

The upper and lower Darboux sums for f with respect to the partition P are

U(f, P) = Σᵢ₌₁ⁿ sup f([xᵢ₋₁, xᵢ])(xᵢ − xᵢ₋₁) and L(f, P) = Σᵢ₌₁ⁿ inf f([xᵢ₋₁, xᵢ])(xᵢ − xᵢ₋₁).
Notice that f([xᵢ₋₁, xᵢ]) = {f(t) : t ∈ [xᵢ₋₁, xᵢ]} is a set of points, and the upper sum is defined in terms of the sup of this set. We will use the abbreviation Mᵢ = Mᵢ(f) = sup f([xᵢ₋₁, xᵢ]), but we must be aware that this actually depends on the partition P as well. Similarly, we will denote mᵢ = mᵢ(f) = inf f([xᵢ₋₁, xᵢ]).

We will assume that f is bounded above by M and below by m. We then see that m ≤ mᵢ(f) ≤ Mᵢ(f) ≤ M, from which it follows that

m(b − a) ≤ L(f, P) ≤ U(f, P) ≤ M(b − a).

You should check this out. Then the upper and lower Riemann integrals are:

∫̄ₐᵇ f = ∫̄ₐᵇ f(x) dx = inf{U(f, P) : P is a partition of [a, b]} and
∫̲ₐᵇ f = ∫̲ₐᵇ f(x) dx = sup{L(f, P) : P is a partition of [a, b]}.
When these two are equal, f is called Riemann integrable on [a, b] and the Riemann integral (or Darboux integral) ∫ₐᵇ f = ∫ₐᵇ f(x) dx is their common value.

Note the resemblance of the definition of

∫̄ₐᵇ f = inf_P Σᵢ₌₁ⁿ sup f([xᵢ₋₁, xᵢ])(xᵢ − xᵢ₋₁)

to the lim sup of a sequence, and that of ∫̲ₐᵇ f to the lim inf of a sequence. It is intuitively obvious that ∫̲ₐᵇ f ≤ ∫̄ₐᵇ f, but that does require proof, which we will give shortly. But using this, let us give an example.
Example. Let f(x) = x², defined for x ∈ [0, 1]. Consider the partition P = {x₀, x₁, …, xₙ} where xᵢ = i/n, for i = 0, …, n. Then, for each i,

Mᵢ(f) = sup f([xᵢ₋₁, xᵢ]) = xᵢ² = (i/n)²,

so

U(f, P) = Σᵢ₌₁ⁿ Mᵢ(f)(xᵢ − xᵢ₋₁) = Σᵢ₌₁ⁿ (i²/n²)(1/n) = n(n + 1)(2n + 1)/(6n³),

and hence, since this is one of the sums in the definition of ∫̄₀¹ f,

∫̄₀¹ f ≤ n(n + 1)(2n + 1)/(6n³)

(the infimum of a set is a lower bound for it). Since the left side does not depend on n, we may take a limit and get

∫̄₀¹ f ≤ 2/6 = 1/3.

Similarly, we find

L(f, P) = Σᵢ₌₁ⁿ mᵢ(f)(xᵢ − xᵢ₋₁) = Σᵢ₌₁ⁿ ((i − 1)²/n²)(1/n) = (n − 1)n(2(n − 1) + 1)/(6n³),

so

∫̲₀¹ f ≥ (n − 1)n(2(n − 1) + 1)/(6n³)

and in the limit

∫̲₀¹ f ≥ 1/3.

As we mentioned above, we will easily be able to prove that ∫̲₀¹ f ≤ ∫̄₀¹ f, so

1/3 ≤ ∫̲₀¹ f ≤ ∫̄₀¹ f ≤ 1/3.

Thus f is Riemann integrable with ∫₀¹ f = 1/3.
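The computation in this example can be mirrored numerically. This Python sketch (ours, not part of the text) evaluates U(f, P) and L(f, P) for the uniform partition and watches them squeeze 1/3:

```python
# Upper and lower Darboux sums for f(x) = x^2 on [0, 1] with x_i = i/n:
# on [x_{i-1}, x_i], sup f = (i/n)^2 and inf f = ((i-1)/n)^2.
def darboux_sums(n):
    upper = sum((i / n)**2 * (1 / n) for i in range(1, n + 1))
    lower = sum(((i - 1) / n)**2 * (1 / n) for i in range(1, n + 1))
    return lower, upper

for n in (10, 100, 1000):
    lo, up = darboux_sums(n)
    print(n, lo, up)  # both sides squeeze toward 1/3
```

The upper sum equals n(n+1)(2n+1)/(6n³) exactly, matching the closed form derived above.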
Example. Let g be the indicator function of the rationals on [0, 1]:

g(x) = 1, if x is rational; g(x) = 0, if x is irrational.

We will see that g is not Riemann integrable.
Indeed, let P be a partition {x₀, …, xₙ} of [0, 1] (with xᵢ₋₁ < xᵢ, for each i, WLOG). Then for each i, [xᵢ₋₁, xᵢ] contains a rational and an irrational, so Mᵢ(g) = 1 and mᵢ(g) = 0. Thus,

U(g, P) = Σᵢ₌₁ⁿ 1·(xᵢ − xᵢ₋₁) = xₙ − x₀ = 1 and L(g, P) = Σᵢ₌₁ⁿ 0·(xᵢ − xᵢ₋₁) = 0.

Since P was an arbitrary partition of [0, 1], we have ∫̄₀¹ g = 1 and ∫̲₀¹ g = 0, so g is not integrable.
All right, since we see that some functions are integrable and others are not, we will work towards characterizing integrable functions. First, here is more of the analogue of lim sup and lim inf in this setting. If P and Q are partitions of the same interval, one says that Q is finer than P, or Q is a refinement of P, if Q ⊃ P.

Theorem. Let f : [a, b] −→ R, and let P and Q be partitions of [a, b], with Q finer than P. Then

L(f, P) ≤ L(f, Q) ≤ U(f, Q) ≤ U(f, P).

Proof. We may assume that P = {x₀, …, xₙ} and that Q = P ∪ {x∗}, where x∗ ∉ P. The general case follows by a simple induction. We work with the lower sums; the upper sums behave similarly.

Now, since Q is a partition of [a, b], we have a ≤ x∗ ≤ b, and since x∗ ∉ P, we know there exists k with xₖ₋₁ < x∗ < xₖ. Then

L(f, P) = Σᵢ₌₁ⁿ mᵢ(f)(xᵢ − xᵢ₋₁) = Σ_{i≠k} mᵢ(f)(xᵢ − xᵢ₋₁) + inf f([xₖ₋₁, xₖ])(xₖ − xₖ₋₁),

and the lower sum for Q is the same, except that the last term is replaced by two new ones:

L(f, Q) = Σ_{i≠k} mᵢ(f)(xᵢ − xᵢ₋₁) + inf f([xₖ₋₁, x∗])(x∗ − xₖ₋₁) + inf f([x∗, xₖ])(xₖ − x∗).

But if t₁ = inf f([xₖ₋₁, x∗]) and t₂ = inf f([x∗, xₖ]), we use the fact that A ⊃ B implies inf A ≤ inf B to obtain

mₖ(xₖ − xₖ₋₁) = mₖ(x∗ − xₖ₋₁) + mₖ(xₖ − x∗) ≤ t₁(x∗ − xₖ₋₁) + t₂(xₖ − x∗),

which is all that is needed to obtain L(f, P) ≤ L(f, Q).
Corollary. If P and Q are partitions of [a, b] and f : [a, b] −→ R, then L(f, P) ≤ U(f, Q), and hence ∫̲ₐᵇ f ≤ ∫̄ₐᵇ f.

Proof. If P and Q are any partitions of [a, b], then P ∪ Q is a refinement of each of them, hence

L(f, P) ≤ L(f, P ∪ Q) ≤ U(f, P ∪ Q) ≤ U(f, Q).

This proves the first statement: L(f, P) ≤ U(f, Q), for all partitions P and Q of [a, b]. Since L(f, P) is a lower bound for the set {U(f, Q) : Q a partition of [a, b]}, when we take the infimum we have

L(f, P) ≤ ∫̄ₐᵇ f.

Then the upper integral becomes an upper bound for all the lower sums, so when we take the supremum over the possible P, we have

∫̲ₐᵇ f ≤ ∫̄ₐᵇ f,

as required.
Existence of the Riemann integral

We continue to use the notation from the definition of the Darboux form of the Riemann integral; f is still a bounded function on [a, b] to R.

Theorem (basic integrability criterion). Let f be a bounded function on [a, b] to R. Then f is Riemann integrable iff for each ε > 0, there exists a partition P of [a, b] with U(f, P) − L(f, P) < ε.

Proof. Suppose first that f is integrable and let ε > 0. Then, since ∫̄ₐᵇ f is the infimum of the upper sums, there exists a partition P₁ of [a, b] with

∫̄ₐᵇ f + ε/2 > U(f, P₁),

and since ∫̄ₐᵇ f = ∫̲ₐᵇ f is also a supremum of lower sums, there exists a partition P₂ with

∫̲ₐᵇ f − ε/2 < L(f, P₂).

Put P = P₁ ∪ P₂. Since "upper sums come down and lower sums go up", that is, since refinement decreases upper sums and increases lower sums, we have

∫̄ₐᵇ f + ε/2 > U(f, P) and ∫̲ₐᵇ f − ε/2 < L(f, P),

and subtraction gives U(f, P) − L(f, P) < ε.

Now, for the converse, let ε > 0 and choose a partition P of [a, b] such that U(f, P) − L(f, P) < ε. Since

∫̄ₐᵇ f ≤ U(f, P) and ∫̲ₐᵇ f ≥ L(f, P),

and since ∫̲ₐᵇ f ≤ ∫̄ₐᵇ f, subtracting gives

0 ≤ ∫̄ₐᵇ f − ∫̲ₐᵇ f < ε.

Since ε is arbitrary, we have ∫̄ₐᵇ f − ∫̲ₐᵇ f = 0, so f is integrable, by definition.
Now we will apply this to give integrability of two of our favourite kinds of functions.
Theorem. Let f be a real function on [a, b].
(1) If f is monotone, then f is integrable on [a, b].
(2) If f is continuous, then f is integrable on [a, b].

Proof. We show in each case that the basic integrability criterion is satisfied. For any partition P of [a, b],

U(f, P) − L(f, P) = Σᵢ₌₁ⁿ (Mᵢ − mᵢ)(xᵢ − xᵢ₋₁),

and we will use the hypotheses to make these small. In the first case this will be done by making the Δxᵢ = xᵢ − xᵢ₋₁ small, and in the second by making the Mᵢ − mᵢ small.

(1) We assume without loss of generality that f is increasing; the decreasing case is similar. Let ε > 0. Choose δ > 0 such that (f(b) − f(a))δ < ε, and then a partition P = {x₀, …, xₙ} such that xᵢ − xᵢ₋₁ < δ for all i = 1, …, n. Now,

U(f, P) − L(f, P) = Σᵢ₌₁ⁿ (Mᵢ − mᵢ)(xᵢ − xᵢ₋₁),

which we are to show is less than ε. Since f is increasing, for each i, Mᵢ(f) = f(xᵢ) and mᵢ(f) = f(xᵢ₋₁). Thus,

U(f, P) − L(f, P) = Σᵢ₌₁ⁿ (f(xᵢ) − f(xᵢ₋₁))(xᵢ − xᵢ₋₁) ≤ Σᵢ₌₁ⁿ (f(xᵢ) − f(xᵢ₋₁))δ = (f(b) − f(a))δ < ε.

Thus, the basic integrability criterion is satisfied and f is integrable.

(2) Suppose f is continuous on [a, b]. Then f is uniformly continuous. Let ε > 0. Then there exists δ > 0 such that |x − y| < δ implies |f(x) − f(y)| < ε/(b − a). Choose any partition P with xᵢ − xᵢ₋₁ < δ. Since f attains a maximum and a minimum on [xᵢ₋₁, xᵢ], there are points sᵢ and tᵢ in [xᵢ₋₁, xᵢ] with f(sᵢ) = mᵢ and f(tᵢ) = Mᵢ. Since |sᵢ − tᵢ| < δ, we have Mᵢ − mᵢ < ε/(b − a) and

U(f, P) − L(f, P) = Σᵢ₌₁ⁿ (Mᵢ − mᵢ)(xᵢ − xᵢ₋₁) < [ε/(b − a)] Σᵢ₌₁ⁿ (xᵢ − xᵢ₋₁) = ε.

Again this shows the basic integrability criterion is satisfied, so f is integrable.
Continuous and monotone functions are by no means the only ones that are integrable. Changing the value of a function at a finite number of points, for example, destroys both properties, but leaves the function integrable with the same integral. We will see this a little later. Right now, you should prove the special case:
Theorem. If S is a finite subset of [a, b], f : [a, b] −→ R, and f(x) = 0 for x ∉ S, then f is integrable with ∫ₐᵇ f = 0.

Example. The Dirichlet function, defined on [0, 1] by

f(x) = 1/n, if x = m/n is rational in lowest terms; f(x) = 0, otherwise,

is integrable with integral 0. The point is that we have shown elsewhere in the course that this function is discontinuous at all rationals (and continuous at all irrationals) of [0, 1], and it is certainly not monotone. Nevertheless, it is integrable.

Proof. Since f ≥ 0, ∫̲₀¹ f ≥ 0, so it is the upper integral that is of interest.
Now, let ε > 0. Since there are only finitely many n with 1/n > ε, there are only finitely many rationals in [0, 1] of the form m/n with f(m/n) = 1/n > ε. But there are no irrationals with f(x) > ε, since f(x) is 0 for x irrational. Thus, the set F := {x : f(x) > ε} is finite. Say it has N elements.

Choose any partition P = {x₀, x₁, …, xₙ} of [0, 1] of mesh ‖P‖ = max{xᵢ − xᵢ₋₁ : i = 1, …, n} < ε/N. We may assume xᵢ₋₁ < xᵢ, for all i. Put J = {i : F ∩ [xᵢ₋₁, xᵢ] ≠ ∅}. Then J has at most 2N elements, since each point of F belongs to at most two of the intervals [xᵢ₋₁, xᵢ], and

U(f, P) = Σᵢ₌₁ⁿ Mᵢ(f)Δxᵢ = Σ_{i∈J} Mᵢ(f)Δxᵢ + Σ_{i∉J} Mᵢ(f)Δxᵢ.

By definition, if i ∉ J, then f(x) ≤ ε on [xᵢ₋₁, xᵢ], so

Σ_{i∉J} Mᵢ(f)Δxᵢ ≤ Σ_{i∉J} εΔxᵢ ≤ Σᵢ₌₁ⁿ εΔxᵢ = ε.

For i ∈ J, we still have Mᵢ(f) ≤ 1, and Δxᵢ < ε/N, so

Σ_{i∈J} Mᵢ(f)Δxᵢ ≤ Σ_{i∈J} ε/N ≤ 2ε.

Altogether, U(f, P) ≤ 3ε. Thus,

∫̄₀¹ f ≤ 3ε, for all ε > 0,

and hence ∫̄₀¹ f ≤ 0. Since ∫̲₀¹ f ≥ 0, we have ∫₀¹ f = 0.
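The behaviour of these upper sums can be observed numerically. In the Python sketch below (our construction, not the text's), sup f on each interval of the uniform partition is 1/q for the smallest denominator q having some fraction p/q in the interval; the endpoint i/n itself has reduced denominator at most n, so searching q = 1, …, n always succeeds. Exact rational arithmetic avoids rounding trouble at interval endpoints.

```python
# Upper Darboux sums for the Dirichlet function over uniform partitions of [0,1].
import math
from fractions import Fraction

def sup_on(lo, hi, max_den):
    for q in range(1, max_den + 1):
        if math.ceil(lo * q) <= hi * q:  # some integer p with lo <= p/q <= hi
            return Fraction(1, q)
    return Fraction(0)

def upper_sum(n):
    total = Fraction(0)
    for i in range(1, n + 1):
        total += sup_on(Fraction(i - 1, n), Fraction(i, n), n) * Fraction(1, n)
    return float(total)

for n in (10, 50, 200):
    print(n, upper_sum(n))  # the upper sums shrink toward 0
```

For n = 10 the sum works out to 137/300, and it keeps decreasing as the mesh shrinks, in line with the proof.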
Equivalence of the Riemann and Darboux integrals

Here we show that the original definition of the integral given by Riemann gives the same result as the Darboux formulation. We assume f is a real-valued function defined (at least) on a bounded interval [a, b] to R.

A partition of [a, b] is a finite set of points P = {x₀, …, xₙ} with a = x₀ ≤ x₁ ≤ x₂ ≤ ⋯ ≤ xₙ = b. The mesh (or norm) of the partition P is ‖P‖ = maxᵢ |xᵢ − xᵢ₋₁| = maxᵢ Δxᵢ. A choice of a point tᵢ in the interval [xᵢ₋₁, xᵢ] is nowadays referred to as a tag, and P = {x₀, …, xₙ} together with a list t = {t₁, …, tₙ} of tags tᵢ ∈ [xᵢ₋₁, xᵢ] is called a tagged partition. (Let's say "t tags P".) Corresponding to each such tagged partition is a Riemann sum

R(f, P, t) = Σᵢ₌₁ⁿ f(tᵢ)(xᵢ − xᵢ₋₁).

The function f is called Riemann integrable over [a, b] if a real number I exists such that

lim_{‖P‖−→0} R(f, P, t) = I,

in the sense that for all ε > 0, there exists a δ > 0 such that |R(f, P, t) − I| < ε whenever P is a partition of [a, b] with ‖P‖ < δ and t tags P. Of course, I is then referred to as the Riemann integral of f over [a, b].

The upper and lower Darboux sums for f with respect to the partition P are

U(f, P) = Σᵢ₌₁ⁿ Mᵢ(f)(xᵢ − xᵢ₋₁) and L(f, P) = Σᵢ₌₁ⁿ mᵢ(f)(xᵢ − xᵢ₋₁),

where Mᵢ(f) = sup f([xᵢ₋₁, xᵢ]) and mᵢ(f) = inf f([xᵢ₋₁, xᵢ]). Since, for each tag tᵢ ∈ [xᵢ₋₁, xᵢ], mᵢ(f) ≤ f(tᵢ) ≤ Mᵢ(f), we see immediately that each Riemann sum R(f, P, t) satisfies

L(f, P) ≤ R(f, P, t) ≤ U(f, P).

Actually:

Lemma. For each partition P of [a, b],

U(f, P) = sup{R(f, P, t) : t tags P} and L(f, P) = inf{R(f, P, t) : t tags P}.
Proof. We have already noticed that when t tags P, R(f, P, t) ≤ U(f, P), so U(f, P) is an upper bound for {R(f, P, t) : t tags P}. Now, let ε > 0. For each i = 1, …, n, choose tᵢ ∈ [xᵢ₋₁, xᵢ] so that

f(tᵢ) > Mᵢ(f) − ε/(b − a).

With the tag list t = {t₁, …, tₙ} we then have

R(f, P, t) = Σᵢ₌₁ⁿ f(tᵢ)Δxᵢ > Σᵢ₌₁ⁿ Mᵢ(f)Δxᵢ − [ε/(b − a)] Σᵢ₌₁ⁿ Δxᵢ = U(f, P) − ε,

which shows U(f, P) is the least upper bound. Hence sup{R(f, P, t) : t tags P} = U(f, P). Similarly, inf{R(f, P, t) : t tags P} = L(f, P).
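The lemma can be watched numerically: for a fixed partition, Riemann sums over different tag choices fill the interval [L(f, P), U(f, P)]. In this Python sketch (ours), using f(x) = x² on [0, 1] with the uniform partition, left- and right-endpoint tags realize L and U exactly because f is increasing:

```python
# Riemann sums over different tag choices for f(x) = x^2 on [0, 1].
def riemann_sum(f, points, tags):
    return sum(f(t) * (points[i + 1] - points[i]) for i, t in enumerate(tags))

f = lambda x: x**2
n = 100
points = [i / n for i in range(n + 1)]
left = points[:-1]                                   # tags at left endpoints
right = points[1:]                                   # tags at right endpoints
mid = [(points[i] + points[i + 1]) / 2 for i in range(n)]

print(riemann_sum(f, points, left),
      riemann_sum(f, points, mid),
      riemann_sum(f, points, right))  # L <= midpoint sum <= U
```

Any other tag list would land between the first and last values printed, illustrating L(f, P) ≤ R(f, P, t) ≤ U(f, P).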
As a result of the lemma we obtain

∫̄ₐᵇ f = inf_P U(f, P) = inf_P sup{R(f, P, t) : t tags P},

a kind of lim sup, and

∫̲ₐᵇ f = sup_P L(f, P) = sup_P inf{R(f, P, t) : t tags P},

a kind of lim inf. Then f is integrable in Darboux's sense exactly when the lim sup and lim inf are equal.
The real difference between the Riemann and Darboux formulations of the integral is the different limit mechanisms. Riemann uses a limit in terms of the mesh (= norm) ‖P‖ of the partition. We recall that if P and Q are partitions of [a, b], with Q finer than P, then

L(f, P) ≤ L(f, Q) ≤ U(f, Q) ≤ U(f, P). (∗)

Now, Darboux noticed more. Let ‖f‖ = sup_{x∈[a,b]} |f(x)|, the supremum norm of f.

Lemma. If Q refines P, and the number of points in Q \ P is N, then

U(f, P) − U(f, Q) ≤ 2N‖f‖‖P‖ and L(f, Q) − L(f, P) ≤ 2N‖f‖‖P‖.
Proof. As in the proof of the result (∗), it is enough to prove this for one additional point x∗, the case of N additional points following by induction. Working as before with lower sums, let Q = P ∪ {x∗}, x∗ ∉ P. Now, there exists k with xₖ₋₁ < x∗ < xₖ. Then

L(f, P) = Σ_{i≠k} mᵢ(f)(xᵢ − xᵢ₋₁) + inf f([xₖ₋₁, xₖ])(xₖ − xₖ₋₁),

and the lower sum for Q is the same, except that the last term is replaced by two new ones:

L(f, Q) = Σ_{i≠k} mᵢ(f)(xᵢ − xᵢ₋₁) + inf f([xₖ₋₁, x∗])(x∗ − xₖ₋₁) + inf f([x∗, xₖ])(xₖ − x∗).

Thus, the difference is

L(f, Q) − L(f, P) = inf f([xₖ₋₁, x∗])(x∗ − xₖ₋₁) + inf f([x∗, xₖ])(xₖ − x∗) − mₖ(f)(xₖ − xₖ₋₁).

But all the values of f are between −‖f‖ and ‖f‖, so

L(f, Q) − L(f, P) ≤ ‖f‖(x∗ − xₖ₋₁) + ‖f‖(xₖ − x∗) + ‖f‖(xₖ − xₖ₋₁) = 2‖f‖(xₖ − xₖ₋₁) ≤ 2‖f‖‖P‖,

as required.
Darboux's Theorem. If $f$ is a bounded function on $[a,b]$, then for each $\varepsilon > 0$ there exists a $\delta > 0$ such that for each partition $P$ of norm $< \delta$,
$$U(f,P) < \overline{\int_a^b} f + \varepsilon \quad\text{and}\quad L(f,P) > \underline{\int_a^b} f - \varepsilon.$$
Proof. Fix $\varepsilon > 0$. Using the definition of the upper integral, let $P_1$ be a partition with $U(f,P_1) < \overline{\int_a^b} f + \frac{\varepsilon}{2}$. Suppose there are $N$ points in $P_1 \setminus \{a,b\}$. Choose $\delta > 0$ with $4N\|f\|\delta < \varepsilon$. (Here $\|f\| = \sup_{x \in [a,b]} |f(x)|$, as before.) Then let $P$ be any partition of $[a,b]$ with $\|P\| < \delta$, and let $Q = P \cup P_1$. Then $Q \supseteq P_1$, so
$$U(f,Q) \le U(f,P_1) < \overline{\int_a^b} f + \frac{\varepsilon}{2}.$$
But also $Q \supseteq P$, and $Q \setminus P$ has at most $N$ points, so by the lemma
$$U(f,P) - U(f,Q) \le 2N\|f\|\,\|P\| \le 2N\|f\|\delta < \frac{\varepsilon}{2},$$
and hence
$$U(f,P) \le U(f,Q) + \frac{\varepsilon}{2} < \overline{\int_a^b} f + \varepsilon.$$
Thus, there exists $\delta > 0$ such that $U(f,P) < \overline{\int_a^b} f + \varepsilon$ whenever $\|P\| < \delta$. In the same way, for that same $\delta$, $L(f,P) > \underline{\int_a^b} f - \varepsilon$ whenever $\|P\| < \delta$.
Theorem. For a bounded function $f$ on $[a,b]$, $f$ is integrable according to Riemann's definition iff it is integrable according to Darboux's definition, and then the two integrals have the same value.

Proof. Suppose the Riemann integral exists and equals $I$. Let $\varepsilon > 0$. Then there exists $\delta > 0$ such that for each partition $P$ with $\|P\| < \delta$ and each tag list $t$ for $P$,
$$I - \varepsilon < R(f,P,t) < I + \varepsilon.$$
Fix such a $\delta$, and then any such $P$. Taking the supremum over all the tags gives $U(f,P) \le I + \varepsilon$, and taking the infimum over all the tags gives $I - \varepsilon \le L(f,P)$. Thus,
$$I - \varepsilon \le L(f,P) \le \underline{\int_a^b} f \le \overline{\int_a^b} f \le U(f,P) \le I + \varepsilon,$$
so
$$I - \varepsilon \le \underline{\int_a^b} f \le \overline{\int_a^b} f \le I + \varepsilon,$$
for arbitrary $\varepsilon > 0$; hence the upper and lower integrals are both equal to $I$, so the Darboux form of the integral exists and is the same as the Riemann one.
Now suppose $f$ is integrable in Darboux's sense: $\underline{\int_a^b} f = \overline{\int_a^b} f = \int_a^b f$, and let $\varepsilon > 0$. Then, by Darboux's theorem, there exists $\delta > 0$ such that for all partitions $P$ with $\|P\| < \delta$,
$$\int_a^b f - \varepsilon < L(f,P) \quad\text{and}\quad U(f,P) < \int_a^b f + \varepsilon.$$
If $t$ tags $P$, the Riemann sum $R(f,P,t)$ is squeezed between $L(f,P)$ and $U(f,P)$, so
$$\int_a^b f - \varepsilon < R(f,P,t) < \int_a^b f + \varepsilon.$$
This shows that for all tagged partitions $(P,t)$ with $\|P\| < \delta$,
$$\left| R(f,P,t) - \int_a^b f \right| < \varepsilon,$$
and since $\varepsilon > 0$ is arbitrary, the integral in Riemann's sense exists and equals the Darboux value $\int_a^b f$.
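As a small numerical illustration of the equivalence (our own sketch, not part of the text): for $f(x) = x^2$ on $[0,1]$, Riemann sums with arbitrarily chosen tags approach the common value $1/3$ as the mesh shrinks; on the uniform partition of mesh $1/n$, $U(f,P) - L(f,P) = 1/n$, so every tagged sum is within $1/n$ of the integral.

```python
import random

def riemann_sum(f, xs, tags):
    return sum(f(t) * (b - a) for a, b, t in zip(xs, xs[1:], tags))

f = lambda x: x * x          # Darboux/Riemann integral over [0, 1] is 1/3

for n in [10, 100, 1000]:
    xs = [i / n for i in range(n + 1)]                       # mesh 1/n
    tags = [random.uniform(a, b) for a, b in zip(xs, xs[1:])]
    err = abs(riemann_sum(f, xs, tags) - 1 / 3)
    # for this increasing f, U(f,P) - L(f,P) telescopes to 1/n,
    # so any tagged sum is within 1/n of the integral
    assert err <= 1 / n + 1e-12
    print(n, err)
```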
Properties of the Riemann Integral

Theorem.
(1) (linearity) The set of Riemann integrable functions on $[a,b]$ is a vector space on which the integral is a linear functional. Thus,
(a) If $f$ is integrable on $[a,b]$ and $k \in \mathbb{R}$, then $kf$ is integrable and $\int_a^b kf = k\int_a^b f$.
(b) If $f$ and $g$ are integrable on $[a,b]$, then so is $f+g$ and $\int_a^b (f+g) = \int_a^b f + \int_a^b g$.
(2) (monotonicity) If $f$ and $g$ are integrable on $[a,b]$ and $f \le g$, then $\int_a^b f \le \int_a^b g$.
(3) If $f$ is integrable on $[a,b]$, then so is $|f|$ and $\left|\int_a^b f\right| \le \int_a^b |f|$.
(4) (additivity on intervals) If $a < c < b$ and $f$ is integrable on $[a,c]$ and on $[c,b]$, then $f$ is integrable on $[a,b]$ with $\int_a^b f = \int_a^c f + \int_c^b f$.

Proof. (1)(a) Let $f$ be just a bounded function on $[a,b]$, first. For this result, we recall that for any bounded set $A$ of real numbers,
$$\sup kA = \begin{cases} k\sup A, & \text{if } k \ge 0\\ k\inf A, & \text{if } k \le 0,\end{cases}$$
and there is a similar result for the infimum. We handle the case $k \ge 0$. For a fixed partition $P = \{x_0, \dots, x_n\}$ of $[a,b]$, we have
$$M_i(kf) = \sup\{kf(t) : t \in [x_{i-1},x_i]\} = k\sup\{f(t) : t \in [x_{i-1},x_i]\} = kM_i(f).$$
Multiplying by $\Delta x_i$ and summing gives
$$\sum_{i=1}^n M_i(kf)\,\Delta x_i = k\sum_{i=1}^n M_i(f)\,\Delta x_i; \quad\text{that is,}\quad U(kf,P) = kU(f,P).$$
Taking the infimum over all possible partitions $P$, we have $\inf_P U(kf,P) = k\inf_P U(f,P)$, which is to say
$$\overline{\int_a^b} kf = k\,\overline{\int_a^b} f. \tag{U}$$
Similarly, for each partition $P$ we have
$$m_i(kf) = \inf\{kf(t) : t \in [x_{i-1},x_i]\} = km_i(f);$$
hence, $L(kf,P) = kL(f,P)$, and taking the supremum over all such $P$ gives
$$\underline{\int_a^b} kf = k\,\underline{\int_a^b} f. \tag{L}$$
Now, under the given hypothesis, $f$ is integrable, so the right-hand sides of (U) and (L) are both equal to $k\int_a^b f$, so the left-hand sides are also; that is, $kf$ is integrable, with
$$\int_a^b kf = k\int_a^b f.$$
The case $k \le 0$ is similar, and makes a good exercise. Note that in handling it you will need the fact that multiplication by a negative number changes a supremum to an infimum and vice versa.

(b) In fact, if $f, g$ are bounded,
$$\overline{\int_a^b} (f+g) \le \overline{\int_a^b} f + \overline{\int_a^b} g \tag{$*$U}$$
and
$$\underline{\int_a^b} (f+g) \ge \underline{\int_a^b} f + \underline{\int_a^b} g. \tag{$*$L}$$
To prove ($*$U), let $P$ be an arbitrary partition of $[a,b]$. Then for each $i$,
$$M_i(f+g) = \sup\{f(t)+g(t) : t \in [x_{i-1},x_i]\} \le \sup\{f(t) : t \in [x_{i-1},x_i]\} + \sup\{g(t) : t \in [x_{i-1},x_i]\} = M_i(f) + M_i(g).$$
Multiplying by $\Delta x_i$ and summing gives $U(f+g,P) \le U(f,P) + U(g,P)$. So
$$\overline{\int_a^b} (f+g) \le U(f,P) + U(g,P).$$
Now, if $\varepsilon > 0$, we can find a partition $P_1$ of $[a,b]$ with
$$U(f,P_1) < \overline{\int_a^b} f + \varepsilon/2$$
and another one, $P_2$, with
$$U(g,P_2) < \overline{\int_a^b} g + \varepsilon/2.$$
Let $P = P_1 \cup P_2$, a refinement of both, to get
$$U(f,P) \le U(f,P_1) \le \overline{\int_a^b} f + \varepsilon/2,$$
$$U(g,P) \le U(g,P_2) \le \overline{\int_a^b} g + \varepsilon/2;$$
hence,
$$U(f,P) + U(g,P) \le \overline{\int_a^b} f + \overline{\int_a^b} g + \varepsilon.$$
Thus,
$$\overline{\int_a^b} (f+g) \le \overline{\int_a^b} f + \overline{\int_a^b} g + \varepsilon.$$
Since $\varepsilon$ was arbitrary, ($*$U) follows. The proof of ($*$L) is similar and left as an exercise.

Up till now, we did not assume integrability. But if $f$ and $g$ are integrable, the right sides of ($*$U) and ($*$L) are both $\int_a^b f + \int_a^b g$, so the left sides are so also. That is,
$$\int_a^b (f+g) = \int_a^b f + \int_a^b g,$$
as required.

(2) To prove monotonicity of the integral, we again actually prove it for upper and lower integrals separately. Let $f \le g$, and let $P$ be an arbitrary partition of $[a,b]$. Then, for each $i$ and each $t \in [x_{i-1},x_i]$, $f(t) \le g(t)$, so taking suprema over these $t$ gives $M_i(f) \le M_i(g)$. Multiplying by $\Delta x_i$ and summing gives $U(f,P) \le U(g,P)$. Since $P$ was arbitrary, we may take the infimum over all such $P$ and get
$$\overline{\int_a^b} f \le \overline{\int_a^b} g.$$
The proof for the lower integral is similar. If $f$ and $g$ are assumed integrable, either statement says
$$\int_a^b f \le \int_a^b g.$$
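The sum manipulations in (1) and (2) can be mirrored numerically. This sketch is ours (helper names are not from the text; suprema are approximated by sampling): it checks $U(kf,P) = kU(f,P)$ for $k \ge 0$, the subadditivity $U(f+g,P) \le U(f,P) + U(g,P)$, and monotonicity of upper sums.

```python
import math

def darboux_sums(f, xs, samples=200):
    # approximate U(f,P), L(f,P) by dense sampling of each subinterval
    U = L = 0.0
    for a, b in zip(xs, xs[1:]):
        vals = [f(a + (b - a) * k / samples) for k in range(samples + 1)]
        U += max(vals) * (b - a)
        L += min(vals) * (b - a)
    return U, L

xs = [i / 8 for i in range(9)]                  # partition of [0, 1]
f, g, k = math.sin, math.cos, 3.0

Uf, Lf = darboux_sums(f, xs)
Ukf, Lkf = darboux_sums(lambda x: k * f(x), xs)
assert abs(Ukf - k * Uf) < 1e-9                 # U(kf,P) = kU(f,P) for k >= 0
assert abs(Lkf - k * Lf) < 1e-9

Ug, _ = darboux_sums(g, xs)
Ufg, _ = darboux_sums(lambda x: f(x) + g(x), xs)
assert Ufg <= Uf + Ug + 1e-9                    # U(f+g,P) <= U(f,P) + U(g,P)

# monotonicity: sin(x) <= 1 + sin(x) pointwise, so upper sums compare
Uh, _ = darboux_sums(lambda x: 1 + f(x), xs)
assert Uf <= Uh + 1e-9
print("Darboux-sum identities verified on this partition")
```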
(3) Let $f$ be integrable. If we can show $|f|$ integrable, we can use monotonicity: $-|f| \le f \le |f|$, so noting that $-\int_a^b |f| = \int_a^b (-|f|)$, we obtain
$$-\int_a^b |f| \le \int_a^b f \le \int_a^b |f|,$$
or in other words,
$$\left|\int_a^b f\right| \le \int_a^b |f|.$$
To prove that $|f|$ is integrable, we use the Basic Criterion. Let $\varepsilon > 0$ and suppose $P$ is a partition of $[a,b]$ with $U(f,P) - L(f,P) < \varepsilon$. For each $i$ we have, for $s, t \in [x_{i-1},x_i]$,
$$\big|\,|f(t)| - |f(s)|\,\big| \le |f(t) - f(s)|.$$
Then
$$M_i(|f|) - m_i(|f|) = \sup_{s,t} \big|\,|f(t)| - |f(s)|\,\big| \le \sup_{s,t} |f(t) - f(s)| = M_i(f) - m_i(f).$$
So multiplying by $\Delta x_i$ and summing gives
$$U(|f|,P) - L(|f|,P) \le U(f,P) - L(f,P) < \varepsilon;$$
hence $|f|$ satisfies the Basic Criterion for integrability, as required.
(4) The additivity on intervals is proved in a manner similar to that for the additivity on functions, (1)(b), but simpler. Let $a < c < b$ and suppose $f$ is integrable on each of $[a,c]$ and $[c,b]$. Let $\varepsilon > 0$. Choose a partition $P_1 = \{x_0, \dots, x_n\}$ of $[a,c]$ with
$$U(f,P_1) < \int_a^c f + \varepsilon/2$$
and a partition $P_2 = \{t_0, \dots, t_m\}$ of $[c,b]$ with
$$U(f,P_2) < \int_c^b f + \varepsilon/2.$$
Put $P = P_1 \cup P_2 = \{x_0, \dots, x_n, t_1, \dots, t_m\}$. (Note $x_n = c = t_0$.) Then
$$\overline{\int_a^b} f \le U(f,P) = \sum_{i=1}^n \sup f([x_{i-1},x_i])\,\Delta x_i + \sum_{j=1}^m \sup f([t_{j-1},t_j])\,\Delta t_j = U(f,P_1) + U(f,P_2) \le \int_a^c f + \int_c^b f + \varepsilon.$$
Thus $\overline{\int_a^b} f \le \int_a^c f + \int_c^b f + \varepsilon$ for all $\varepsilon > 0$, so
$$\overline{\int_a^b} f \le \int_a^c f + \int_c^b f,$$
and similarly
$$\underline{\int_a^b} f \ge \int_a^c f + \int_c^b f.$$
The integrability of $f$ on $[a,c]$ and on $[c,b]$ implies the right sides here are both $\int_a^c f + \int_c^b f$, so the left sides are equal, and hence $f$ is integrable over $[a,b]$ with
$$\int_a^b f = \int_a^c f + \int_c^b f,$$
as required.
We could avoid the epsilons in the above argument as follows. Let $a < c < b$ and suppose $f$ is integrable on each of $[a,c]$ and $[c,b]$. Let $P_1 = \{x_0, \dots, x_n\}$ be a partition of $[a,c]$ and $P_2 = \{t_0, \dots, t_m\}$ a partition of $[c,b]$. Put $P = P_1 \cup P_2 = \{x_0, \dots, x_n, t_1, \dots, t_m\}$. (Note $x_n = c = t_0$.) Then
$$\overline{\int_a^b} f \le U(f,P) = \sum_{i=1}^n \sup f([x_{i-1},x_i])\,\Delta x_i + \sum_{j=1}^m \sup f([t_{j-1},t_j])\,\Delta t_j = U(f,P_1) + U(f,P_2).$$
Thus,
$$\overline{\int_a^b} f \le U(f,P_1) + U(f,P_2).$$
Taking the infimum over all partitions $P_1$ of $[a,c]$ and over all partitions $P_2$ of $[c,b]$ yields
$$\overline{\int_a^b} f \le \overline{\int_a^c} f + \overline{\int_c^b} f.$$
Similarly,
$$\underline{\int_a^b} f \ge \underline{\int_a^c} f + \underline{\int_c^b} f.$$
The integrability of $f$ on $[a,c]$ and on $[c,b]$ implies the right sides here are both $\int_a^c f + \int_c^b f$, so the left sides are equal, and hence $f$ is integrable over $[a,b]$ with
$$\int_a^b f = \int_a^c f + \int_c^b f,$$
as required.
1. Prove monotonicity of the integral by using the linearity of the integral, together with the fact, done in another exercise, that the (lower) integral of a non-negative function is always $\ge 0$.
2. Prove linearity of the integral using Riemann's definition. This proof is a straightforward adaptation of earlier proofs (such as $\lim_n (x_n + y_n) = \lim_n x_n + \lim_n y_n$).
Integral of composites, products, etc.

Theorem. Let $f : [a,b] \to [c,d]$ and $g : [c,d] \to \mathbb{R}$.
(a) If $f$ is integrable and $g$ is continuous, then $g \circ f$ is integrable on $[a,b]$.
(b) If $f$ and $g$ are both integrable, the composite need not be integrable.

Proof. (a) Since $g$ is continuous, it is bounded: $\|g\| = \sup_{x \in [c,d]} |g(x)| < \infty$. Let $\varepsilon > 0$. Choose $\varepsilon_0 > 0$ so that $\varepsilon_0(b-a) + 2\|g\|\varepsilon_0 < \varepsilon$. Since $g$ is continuous on $[c,d]$, $g$ is uniformly continuous, so we may choose $\delta > 0$ such that $\delta < \varepsilon_0$ and
$$|g(s) - g(t)| \le \varepsilon_0 \quad\text{when } |s-t| < \delta.$$
Since $f$ is integrable, there exists a partition $P$ of $[a,b]$ with $U(f,P) - L(f,P) < \delta^2$. Fix such a $P = \{x_0, \dots, x_n\}$. Divide the index set into two parts,
$$A = \{i : M_i(f) - m_i(f) < \delta\}, \qquad B = \{i : M_i(f) - m_i(f) \ge \delta\}.$$
Then, for $i \in A$, $x, y \in [x_{i-1},x_i]$ implies $|f(x) - f(y)| \le M_i(f) - m_i(f) < \delta$, so
$$|g \circ f(x) - g \circ f(y)| = |g(f(x)) - g(f(y))| \le \varepsilon_0.$$
Hence, $(M_i(g \circ f) - m_i(g \circ f))\Delta x_i \le \varepsilon_0 \Delta x_i$ for $i \in A$, and
$$\sum_{i \in A} (M_i(g \circ f) - m_i(g \circ f))\Delta x_i \le \sum_{i \in A} \varepsilon_0 \Delta x_i \le \varepsilon_0(b-a).$$
For $i \in B$, $(M_i(f) - m_i(f))/\delta \ge 1$, so
$$\sum_{i \in B} \Delta x_i \le \frac{1}{\delta}\sum_{i \in B} (M_i(f) - m_i(f))\Delta x_i \le \frac{1}{\delta}(U(f,P) - L(f,P)) < \delta < \varepsilon_0,$$
and since $M_i(g \circ f) - m_i(g \circ f) \le 2\|g\|$,
$$\sum_{i \in B} (M_i(g \circ f) - m_i(g \circ f))\Delta x_i \le 2\|g\| \sum_{i \in B} \Delta x_i < 2\|g\|\varepsilon_0.$$
Thus, altogether,
$$U(g \circ f, P) - L(g \circ f, P) \le \varepsilon_0(b-a) + 2\|g\|\varepsilon_0 < \varepsilon.$$
This shows $g \circ f$ is integrable, by the Basic Integrability Criterion.
Thus, altogether, U (g ◦ f, P ) − L(g ◦ f, P ) ≤ ε0 (b − a) + 2kgkε0 < ε This shows g ◦ f is integrable by the Basic Integrability Criterion. (b) The composite need not be integrable if f and g are both integrable but not continuous. This is proved by letting f be the Dirichlet function on [0, 1] and finding an integrable g on [0,1] such that the composite is the indicator function of the rationals on [0,1]. A suitable g is given simply by g(u) = 1, if u 6= 0, and g(0) = 0. One can also prove that g ◦ f need not be integrable if g is integrable and f is continuous. This requires a more sophisticated argument. 13/1/2003 1054 mam 167
Examples.
(0) If $f$ is integrable on $[a,b]$, then the function $\sin(f)$, which really means $\sin \circ f$, is also integrable on $[a,b]$, since the sine function is continuous everywhere.
(1) If $f$ is integrable, so is $f^2$. Indeed, the function $g : u \mapsto u^2$ is continuous and $g \circ f = f^2$.
(2) If $f, g$ are integrable on $[a,b]$, then so is $fg$. Indeed, $f+g$ is integrable, so $f^2$, $g^2$, and $(f+g)^2$ are integrable, and hence
$$fg = \tfrac12\big((f+g)^2 - f^2 - g^2\big)$$
is so also.
(3) If $f$ is integrable on $[a,b]$, so is $|f|$. This has been proved elsewhere, but it is also a consequence of the present result, since the absolute value map is continuous.
(4) If $f, g$ are integrable on $[a,b]$, so are $f \vee g := \max\{f,g\}$ and $f \wedge g := \min\{f,g\}$. Indeed, for real numbers $u, v$,
$$u + v + |u - v| = \begin{cases} 2u, & \text{if } u \ge v\\ 2v, & \text{if } u < v\end{cases} \;=\; 2\max\{u,v\}.$$
Thus,
$$f \vee g = \tfrac12(f + g + |f - g|),$$
which is integrable since $f$, $g$, $|f-g|$ are. Similarly,
$$f \wedge g = \tfrac12(f + g - |f - g|)$$
is integrable.

1. If $f$ is integrable on $[a,b]$ and there exists a number $m > 0$ such that $f(x) > m$ for all $x \in [a,b]$, then $1/f$ is integrable. (The same holds if $m < 0$ and $f(x) < m$ for all $x \in [a,b]$.)
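The pointwise identities behind Examples (2) and (4) are easy to sanity-check numerically (our sketch):

```python
def check(u, v):
    # polarization identity used for products in Example (2)
    assert abs(u * v - 0.5 * ((u + v) ** 2 - u ** 2 - v ** 2)) < 1e-12
    # max/min via absolute value, as in Example (4)
    assert max(u, v) == 0.5 * (u + v + abs(u - v))
    assert min(u, v) == 0.5 * (u + v - abs(u - v))

for u, v in [(1.5, -2.0), (0.0, 3.25), (-1.0, -1.0), (7.5, 7.5)]:
    check(u, v)
print("identities hold at the sample points")
```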
Fundamental Theorem of Calculus

If $f$ is a Riemann integrable function on an interval $[a,b]$, we may define a new function $F$ on $[a,b]$ by integrating over subintervals. The result is a continuous function.

Theorem. Let $f$ be Riemann integrable on $[a,b]$. If $F$ is defined on $[a,b]$ by
$$F(x) = \int_a^x f(t)\,dt,$$
then $F$ is Lipschitz, hence uniformly continuous, on $[a,b]$.

Proof. Recall that $f$ integrable includes that $f$ is bounded; say $|f(x)| \le K$ for all $x \in [a,b]$. Thus, if $x, y \in [a,b]$ with $x < y$,
$$F(y) - F(x) = \int_a^y f - \int_a^x f = \int_x^y f,$$
so
$$|F(y) - F(x)| \le \int_x^y |f| \le \int_x^y K = K(y - x) = K|x - y|.$$
If $y < x$, interchange the roles of $x$ and $y$ here; in either case, we get $|F(x) - F(y)| \le K|x - y|$ for all $x, y \in [a,b]$. This is the statement that $F$ is Lipschitz. From this, uniform continuity follows, as we know.

The smallest possible $K$ in the above argument is the supremum norm of $f$. What we proved is that $|F(x) - F(y)| \le \|f\|\,|x - y|$. (Recall: $\|f\| = \sup_{x \in [a,b]} |f(x)|$.)

The Fundamental Theorem of Calculus. Let $f$ be Riemann integrable on $[a,b]$.
(1) (differentiating an integral) If $F$ is defined on $[a,b]$ by $F(x) = \int_a^x f(t)\,dt$ and $f$ is continuous at $c \in [a,b]$, then $F'(c) = f(c)$.
(2) (integrating a derivative) If there exists a function $F$ on $[a,b]$ with $F' = f$, then
$$\int_a^b f = F(b) - F(a).$$

Notice that in both cases we are assuming integrability of $f$. The second part says that if $F'$ is integrable, then $\int_a^b F'(t)\,dt = F(b) - F(a)$. Since this also applies for those $x$ between $a$ and $b$, we get $\int_a^x F' = F(x) - F(a)$.
Proof. (1) Suppose $f$ is continuous at $c$. Given $\varepsilon > 0$, choose $\delta > 0$ such that $|x - c| < \delta$ implies $|f(x) - f(c)| < \varepsilon$. Then, since $f(c)$ is constant,
$$F(x) - F(c) - f(c)(x - c) = \int_c^x f - \int_c^x f(c) = \int_c^x \big(f(t) - f(c)\big)\,dt.$$
But for $|x - c| < \delta$, the absolute value of the right side is at most $\left|\int_c^x |f - f(c)|\right| \le \varepsilon|x - c|$. Thus,
$$\left|\frac{F(x) - F(c)}{x - c} - f(c)\right| \le \varepsilon, \quad\text{for } |x - c| < \delta.$$
This says
$$F'(c) = \lim_{x \to c} \frac{F(x) - F(c)}{x - c} = f(c),$$
as claimed.

(2) Now, instead, assume that there exists $F$ such that $F' = f$. Recall that $f$ is supposed integrable. Let $P = \{x_0, \dots, x_n\}$ be any partition of $[a,b]$. Use the Mean Value Theorem to find points $t_i$ in $(x_{i-1},x_i)$ with
$$f(t_i)(x_i - x_{i-1}) = F'(t_i)(x_i - x_{i-1}) = F(x_i) - F(x_{i-1}).$$
Summing this over $i$ and using the telescoping property gives
$$\sum_{i=1}^n f(t_i)(x_i - x_{i-1}) = F(b) - F(a).$$
Now, $m_i(f) \le f(t_i) \le M_i(f)$, so this says $L(f,P) \le F(b) - F(a) \le U(f,P)$, and thus
$$\underline{\int_a^b} f \le F(b) - F(a) \le \overline{\int_a^b} f.$$
Since we know that $f$ is integrable, both ends here are $\int_a^b f$, so $F(b) - F(a) = \int_a^b f$.
Well, you already know how to use this theorem to evaluate integrals. Let us not bore each other now.

Corollary (Integration by parts). Suppose that $f$ and $g$ are differentiable on $[a,b]$, with $f'$ and $g'$ integrable. Then,
$$\int_a^b fg' = f(b)g(b) - f(a)g(a) - \int_a^b f'g.$$

Proof. Since $f$ and $g$ are differentiable, they are continuous, hence integrable; so, by the previous section, the products $f'g$ and $fg'$ are integrable, and hence so is $(fg)' = f'g + fg'$. By the product formula for differentiation, we have $(fg)'(x) = f'(x)g(x) + f(x)g'(x)$ for all $x$ in $[a,b]$. Thus, by the Fundamental Theorem of Calculus,
$$\int_a^b f'g + \int_a^b fg' = \int_a^b (f'g + fg') = \int_a^b (fg)' = f(b)g(b) - f(a)g(a).$$
Example. Let $G(x) = \int_0^{x^3} \sin(\cos(t))\,dt$. Find $G'(x)$ if possible.

Solution. Since $\sin$ and $\cos$ are continuous on $\mathbb{R}$, their composite is also, so for all $u \in \mathbb{R}$ the integral $\int_0^u \sin(\cos(t))\,dt$ exists, and the function $F : u \mapsto \int_0^u \sin(\cos(t))\,dt$ is differentiable by the Fundamental Theorem of Calculus, with $F'(u) = \sin(\cos(u))$ for all $u$. The function in question is $G = F \circ g$, where $g(x) = x^3$. Thus, by the chain rule,
$$G'(x) = F'(g(x))\,g'(x) = \sin(\cos(x^3))\,3x^2.$$

A similar question with both limits of integration "variable" is handled by subtraction. For example,
$$\int_{x^2}^{x^3} \sin(\cos(t))\,dt = \int_0^{x^3} \sin(\cos(t))\,dt - \int_0^{x^2} \sin(\cos(t))\,dt,$$
and each term can be treated separately.
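The chain-rule answer can be verified numerically. In this sketch (ours, not the text's), the integral is approximated by a midpoint rule and $G'$ by a central difference; both are approximations, so we only check agreement to modest accuracy.

```python
import math

def integral(f, a, b, steps=20000):
    # midpoint-rule approximation to the Riemann integral
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def G(x):
    return integral(lambda t: math.sin(math.cos(t)), 0.0, x ** 3)

x = 1.2
h = 1e-4
numeric = (G(x + h) - G(x - h)) / (2 * h)          # central difference
exact = math.sin(math.cos(x ** 3)) * 3 * x ** 2    # chain-rule formula
print(abs(numeric - exact))                        # small
assert abs(numeric - exact) < 1e-3
```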
1. Let $f$ be continuous on $[a,b]$ and, for each $x \in [a,b]$, put $F(x) = \int_x^b f$. Prove that $F$ is differentiable on $[a,b]$ with derivative $-f$. This shows that if one uses the convention $\int_b^x f = -\int_x^b f$ when $x < b$, we still get $\frac{d}{dx}\int_b^x f = f(x)$ when $f$ is continuous at $x$.
Pointwise and uniform convergence

Let $(f_n)$ be a sequence of functions defined on a set $X$. Then $(f_n)$ is said to converge pointwise to $f$ on $X$ if for all $x \in X$, $f_n(x)$ converges to $f(x)$. Notation: $f_n \to f$ pointwise on $X$, or simply $\lim_n f_n = f$.

Questions. Suppose $(f_n)$ converges pointwise to $f$. If $f_n$ is continuous at $x$ for each $n$, is $f$ continuous at $x$? If each $f_n$ is differentiable at $x$, is $f$ differentiable? If each $f_n$ is integrable, is $f$ integrable?

Continuity at $x$ means (for example) $\lim_{t \to x} f(t) = f(x)$, so the first question is: does
$$\lim_{t \to x} \lim_n f_n(t) = \lim_n \lim_{t \to x} f_n(t)?$$
The answer is, in general, no. For integrals, what we really want to know is whether
$$\lim_n \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx \quad\left(= \int_a^b \lim_n f_n\right).$$
A limit of continuous functions not continuous. For each $n \in \mathbb{N}$, define $f_n : [0,1] \to \mathbb{R}$ by $f_n(x) = x^n$. Then for each $x \in [0,1]$,
$$\lim_n f_n(x) = \lim_n x^n = \begin{cases} 0, & \text{if } 0 \le x < 1\\ 1, & \text{if } x = 1.\end{cases}$$
Thus $f_n \to f$ pointwise, where $f(x) = 0$ for $x \in [0,1)$ and $f(1) = 1$. This is certainly not a continuous function.

A limit of integrable functions with the "wrong integral". For each $n$, let $f_n$ be the function defined on $[0,2]$ which is $0$ at $0$, $n$ at $\frac1n$, $0$ again at $\frac2n$, linear in between, and $0$ on the rest of the interval:
$$f_n(x) = \begin{cases} n^2 x, & \text{if } 0 \le x \le \frac1n\\[2pt] -n^2\left(x - \frac2n\right), & \text{if } \frac1n < x \le \frac2n\\[2pt] 0, & \text{otherwise.}\end{cases}$$
Then $\int_0^2 f_n = 1$ for all $n$, but $\lim_n f_n(x) = 0$ for all $x \in [0,2]$, so
$$\lim_n \int_0^2 f_n(x)\,dx \ne \int_0^2 \lim_n f_n(x)\,dx.$$
For another example, let
$$f_n(x) = n^2 x (1 - x^2)^n, \quad x \in [0,1].$$
Then $f_n(x) \to 0$ for all $x$, yet
$$\int_0^1 f_n(x)\,dx = \frac{n^2}{2(n+1)} \to +\infty,$$
which is certainly not $\int_0^1 0 = 0$.
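The moving-spike example is easy to watch numerically. This sketch is ours; the integral is approximated by a midpoint rule, so the areas are only approximate.

```python
def f_n(n, x):
    # spike of height n at 1/n, supported on [0, 2/n]
    if 0 <= x <= 1 / n:
        return n * n * x
    if 1 / n < x <= 2 / n:
        return 2 * n - n * n * x
    return 0.0

def integral(f, a, b, steps=100000):
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

for n in [5, 50, 500]:
    area = integral(lambda x: f_n(n, x), 0.0, 2.0)
    print(n, round(area, 3))                 # the area stays ~1

# yet at any fixed x the values are eventually 0 (once 2/n < x):
x = 0.01
print([f_n(n, x) for n in [5, 50, 500]])     # [0.25, 25.0, 0.0]
```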
The problems here will be rectified by using a stronger kind of convergence of sequences of functions.

Let $(f_n)$ be a sequence of functions defined on a set $X$. Then $(f_n)$ is said to converge uniformly to $f$ on $X$ if for all $\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that for all $n \ge N$ and all $x \in X$, $|f_n(x) - f(x)| < \varepsilon$. Notation: $f_n \to f$ uniformly (on $X$).

Example. For each $n \in \mathbb{N}$, define $f_n : [0,1] \to \mathbb{R}$ by $f_n(x) = 1 - \frac{x}{n}$. Then $(f_n)$ converges uniformly on $[0,1]$.

Proof. Our first job is to identify the limit. For each fixed $x \in [0,1]$, the sequence $(\frac{x}{n})$ converges to $0$, so we put $f(x) = 1$ for all $x \in [0,1]$. For each $x \in [0,1]$ we have
$$|f_n(x) - f(x)| = \left|1 - \frac{x}{n} - 1\right| = \frac{x}{n} \le \frac{1}{n}.$$
Now if $\varepsilon > 0$, then by the Archimedean property there exists $N \in \mathbb{N}$ such that $\frac1N < \varepsilon$; thus, for all $n \ge N$ and all $x \in [0,1]$,
$$|f_n(x) - f(x)| \le \frac{1}{n} < \varepsilon.$$
Thus, for all $\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that for all $n \ge N$ and all $x \in [0,1]$, $|f_n(x) - f(x)| < \varepsilon$; that is, $(f_n)$ converges to $f$ uniformly on $[0,1]$.

Example. For each $n \in \mathbb{N}$, define $f_n : [0,1] \to \mathbb{R}$ by $f_n(x) = x^n$. Then (as we have seen) $f_n$ converges pointwise on $[0,1]$ to $f$ given by
$$f(x) = \begin{cases} 0, & \text{if } 0 \le x < 1\\ 1, & \text{if } x = 1.\end{cases}$$
We claim that this convergence is not uniform. Indeed, choose $\varepsilon = \frac12$. Let $N \in \mathbb{N}$ and choose $n = N$. We want to show there exists $x \in [0,1]$ with $|f_n(x) - f(x)| \ge \varepsilon$. Now, $f_N$ is continuous at $1$ with $f_N(1) = 1$, so there exists a neighbourhood $U = (1-\delta, 1]$ such that $f_N(x) > 1 - 1/3 = 2/3$ for $x \in U$.
But $f(x) = 0$ for $x < 1$. So choose any $x \in (1-\delta, 1)$, obtaining
$$|f_N(x) - f(x)| > \tfrac12 = \varepsilon.$$
We have shown that there exists an $\varepsilon > 0$ such that for all $N \in \mathbb{N}$, there exist $n \ge N$ and $x \in [0,1]$ with $|f_n(x) - f(x)| \ge \varepsilon$. This is the negation of the definition of uniform convergence on $[0,1]$.

Note that in the positive example of uniform convergence, we majorized the distance to the limit by a number $\frac1n$ which went to $0$ independently of $x$. This is the key to proving uniform convergence to a known function:

Theorem. Let $(f_n)$ be a sequence of functions on $X$, $f$ another one. Then $f_n \to f$ uniformly iff there exists a sequence $(K_n)$ of extended real numbers converging to $0$, such that for all $n \in \mathbb{N}$, $|f_n(x) - f(x)| \le K_n$ for all $x \in X$. In this setting, $\|f_n - f\|$ (supremum norm) always serves as a suitable $K_n$.

Proof. Assume $f_n \to f$ uniformly on $X$. Then, for all $\varepsilon > 0$, there exists $N$ such that for all $n \ge N$, $|f_n(x) - f(x)| \le \varepsilon$ for all $x \in X$. But then, by the definition of supremum, for all $n \ge N$,
$$\|f_n - f\| = \sup_{x \in X} |f_n(x) - f(x)| \le \varepsilon.$$
Thus, if we put $K_n = \|f_n - f\|$, then for all $\varepsilon > 0$ there exists $N$ such that for $n \ge N$, $|K_n - 0| \le \varepsilon$; that is, $K_n \to 0$.

Conversely, suppose for each $n$, $K_n$ is an extended real number such that $|f_n(x) - f(x)| \le K_n$ for all $x \in X$, and $K_n \to 0$. Then, for all $\varepsilon > 0$, there exists $N$ such that for $n \ge N$,
$$|f_n(x) - f(x)| \le K_n < \varepsilon, \quad\text{for all } x \in X.$$
Thus $f_n \to f$ uniformly on $X$, by definition.
Example. If $f_n(x) = x^n$ for $x \in [0,1)$, then $f_n \to f$ pointwise, where $f(x) = 0$ for $x \in [0,1)$. Then $|f_n(x) - f(x)| = |x^n|$ for all $x \in [0,1)$, so $\|f_n - f\| = \sup_{x \in [0,1)} |x^n| = 1$. Since this does not converge to $0$, $(f_n)$ does not converge uniformly.
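A grid approximation of the supremum norm makes the contrast between the two examples visible (our sketch; the true sup over $[0,1)$ is only approximated by the grid):

```python
grid = [i / 1000 for i in range(1000)]        # sample points of [0, 1)

def sup_norm(h):
    return max(abs(h(x)) for x in grid)

# Example 1: f_n(x) = 1 - x/n, limit f = 1, so ||f_n - f|| = 1/n -> 0
norms_uniform = [sup_norm(lambda x, n=n: (1 - x / n) - 1) for n in (1, 10, 100)]
print(norms_uniform)        # roughly [1, 0.1, 0.01]

# x^n on [0, 1): pointwise limit 0, but the sup norm does not go to 0
norms_not = [sup_norm(lambda x, n=n: x ** n) for n in (1, 10, 100)]
print(norms_not)            # each value close to 1
```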
Cauchy criterion for uniform convergence. Let $(f_n)$ be a sequence of functions on $X$ to $\mathbb{R}$. Then there exists a function $f$ such that $f_n \to f$ uniformly iff for all $\varepsilon > 0$ there exists $N$ such that for $n, m \ge N$,
$$|f_n(x) - f_m(x)| \le \varepsilon, \quad\text{for all } x \in X.$$
Proof. The proof of one direction is almost the same as the one for sequences of real numbers. Suppose $(f_n)$ converges uniformly to $f$. Let $\varepsilon > 0$. Then we can choose $N$ such that for all $n \ge N$ and all $x \in X$, $|f_n(x) - f(x)| < \varepsilon/2$. Let $n, m \ge N$. Then, for each $x \in X$,
$$|f_n(x) - f_m(x)| \le |f_n(x) - f(x)| + |f(x) - f_m(x)| < \varepsilon/2 + \varepsilon/2 = \varepsilon.$$
Thus, the condition is satisfied.

Conversely, suppose the condition is satisfied; namely, for all $\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that for $n, m \ge N$, $|f_n(x) - f_m(x)| \le \varepsilon$ for all $x \in X$. Now, fix a particular $x \in X$. Then, for all $\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that for $n, m \ge N$, $|f_n(x) - f_m(x)| \le \varepsilon$. This means that the sequence $(f_n(x))_n$ of real numbers is Cauchy. Since the set of real numbers is complete in the usual metric, this sequence converges. Since this holds for an arbitrary $x \in X$, we may define a function $f$ on $X$ by
$$f(x) = \lim_n f_n(x), \quad\text{for all } x \in X.$$
So far we have $f_n \to f$ pointwise. Now, going back to the Cauchy condition, fix $\varepsilon > 0$ and choose $N$ such that for all $n, m \ge N$ and all $x \in X$, $|f_n(x) - f_m(x)| \le \varepsilon$. Let $x \in X$ and $n \ge N$. Then for all $m \ge N$, $|f_n(x) - f_m(x)| \le \varepsilon$. Now let $m \to \infty$. Since $f_m(x) \to f(x)$, this yields $|f_n(x) - f(x)| \le \varepsilon$. Since this was true for arbitrary $n \ge N$ and arbitrary $x \in X$, we can say: for all $n \ge N$ and all $x \in X$, $|f_n(x) - f(x)| \le \varepsilon$. Thus, for all $\varepsilon > 0$ there exists $N$ such that for $n \ge N$, $|f_n(x) - f(x)| \le \varepsilon$ for all $x \in X$; that is, $f_n \to f$ uniformly.
Distance interpretation. We saw above that, using the idea of the supremum norm
$$\|g\| = \sup_{x \in X} |g(x)|,$$
a sequence of functions $(f_n)$ converges uniformly to $f$ iff $\|f_n - f\| \to 0$; in other words, for all $\varepsilon > 0$ there exists $N \in \mathbb{N}$ with $\|f_n - f\| \le \varepsilon$ for all $n \ge N$. So we could define a distance (the uniform distance) on the set of functions on $X$ by $d(f,g) = \|f - g\|$. This satisfies the three metric properties:
(1) $d(f,g) \ge 0$, and $d(f,g) = 0$ iff $f = g$.
(2) $d(f,g) = d(g,f)$.
(3) $d(f,g) \le d(f,h) + d(h,g)$.
And then $f_n \to f$ uniformly iff $(f_n)$ converges to $f$ in the uniform distance. Similarly, we see that the Cauchy criterion for uniform convergence is just the Cauchy criterion for this distance. The only thing stopping $d$ from being a metric is the fact that $d(f,g)$ could be $+\infty$; this happens whenever $f - g$ is not a bounded function. We could call $d$ an extended metric.

Uniform convergence of series of functions. As with sequences and series of numbers, if $(f_n)_{n=1}^\infty$ is a sequence of functions, the corresponding series $\sum_{n=1}^\infty f_n$ refers to the sequence $(s_n)$ of partial sums
$$s_n = \sum_{k=1}^n f_k.$$
(And there is a similar notation for sequences and series indexed on, say, $\{0, 1, 2, \dots\}$.) The series is said to converge (pointwise) if $(s_n)$ converges pointwise. We say that the series $\sum_n f_n$ converges absolutely if the series $\sum_n |f_n|$ converges. (Just as for series of numbers, absolute convergence implies convergence but not conversely.) We say that the series $\sum_n f_n$ converges uniformly if $(s_n)$ does; the limit is called the sum of the series.

The Cauchy criterion for uniform convergence for series becomes:

Theorem. For a series $\sum_n f_n$ of functions on a set $X$, $\sum_n f_n$ converges uniformly on $X$ iff for all $\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that for $n \ge m \ge N$,
$$\left|\sum_{k=m}^n f_k(x)\right| \le \varepsilon, \quad\text{for all } x \in X.$$
178
POINTWISE AND UNIFORM CONVERGENCE
P The Weierstrass M-test. A series n fn of functions fn on X is uniformly and absolutely convergent provided there exists a sequence (Mn ) of real numbers with P |fn (x)| ≤ Mn , for all x ∈ X, and all n such that n Mn converges . The proof should now be an easy exercise. Notice also that if any Mn works in the Weierstrass M-test, then kfn k must also work, since |fn (x)| ≤ Mn for all x ∈ X implies |fn (x)| ≤ kfn k ≤ Mn for all x ∈ X. However, other choices of Mn are often easier to work with. Example. (A series converging uniformly for which the Weierstrass M-test does not apply.) Let fn (x) =
(−1)n+1 x2 for x ∈ [0, 5]. n
Here the Weierstrass M -test does not apply, since if we put (−1)n+1 x2 52 = , Mn = kfn k = sup n n x∈[0,5] then
P∞
n=1
Mn diverges.
However, since, for each x ∈ [0, 5], the sequence ( series
P∞
n=1
x2 ) is decreasing to 0, the n
(−1)n+1 x2 converges by the alternating series test to some f (x), n n 2 X x2 52 k+1 x (−1) ≤ . f (x) − ≤ k n+1 n+1 k=1
Thus, kf −
n 2 X 52 k+1 x −→ 0. (−1) ≤ k=1 fk k = sup f (x) − k n+1 x∈[0,5]
Pn
k=1
Hence, the series
P∞
13/1/2003 1054 mam
n=1
fn converges uniformly on [0, 5].
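The uniform error bound $5^2/(n+1)$ in this example can be tested directly. The sketch below is ours; it uses the known sum of the alternating harmonic series, $\sum_k (-1)^{k+1}/k = \ln 2$, so the pointwise limit is $f(x) = x^2 \ln 2$.

```python
import math

def partial_sum(n, x):
    return sum((-1) ** (k + 1) * x * x / k for k in range(1, n + 1))

f = lambda x: x * x * math.log(2)       # pointwise sum of the series

grid = [5 * i / 500 for i in range(501)]         # sample points of [0, 5]

for n in [10, 100, 1000]:
    err = max(abs(f(x) - partial_sum(n, x)) for x in grid)
    bound = 25 / (n + 1)                 # the 5^2/(n+1) estimate above
    assert err <= bound
    print(n, err, "<=", bound)
```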
Uniform convergence: Continuity, integral, derivative

We know that under pointwise convergence, continuity can be lost in the limit. The same is true of integrability and of the value of the integral of a limit function, and since integration is connected to differentiation by the Fundamental Theorem of Calculus, there are problems there as well. But, as promised earlier, the difficulties are to a large extent rectified by uniform convergence.

Continuity. The uniform limit of continuous functions is continuous.

Theorem. Let $(f_n)$ be a sequence of functions on $X$, $a \in X$, and $f_n$ continuous at $a$ for all $n \in \mathbb{N}$. If $f_n \to f$ uniformly on $X$, then $f$ is also continuous at $a$.

Proof. Let $\varepsilon > 0$. By uniform convergence there exists $N$ such that for all $n \ge N$ and for all $x \in X$,
$$|f_n(x) - f(x)| \le \frac{\varepsilon}{3}.$$
Fix such an $N$. Since $f_N$ is continuous at $a$, we can choose a neighbourhood $U$ of $a$ such that
$$|f_N(x) - f_N(a)| < \frac{\varepsilon}{3}, \quad\text{for all } x \in U.$$
Thus, for all $x \in U$,
$$|f(x) - f(a)| \le |f(x) - f_N(x)| + |f_N(x) - f_N(a)| + |f_N(a) - f(a)| < \varepsilon,$$
which shows that $f$ is continuous at $a$.

Applying the above statement to each $a \in X$, we have:

Theorem. Let $(f_n)$ be a sequence of continuous functions on $X$ converging uniformly to $f$. Then $f$ is also continuous.

A slight modification of the result about the uniform limit of continuous functions gives a corresponding result for a sequence of functions, each of whose limits exist.

Theorem. Let $(f_n)$ be a sequence of functions on $X$ and let $a$ be an accumulation point of $X$. Suppose $(f_n)$ converges uniformly to $f$ on $X$ and, for each $n \in \mathbb{N}$, $\lim_{x \to a} f_n(x) = y_n$. Then $\lim_n y_n$ exists and
$$\lim_{x \to a} f(x) = \lim_n y_n.$$
In other words,
$$\lim_{x \to a} \lim_n f_n(x) = \lim_n \lim_{x \to a} f_n(x).$$

Proof. Let $\varepsilon > 0$. By the Cauchy condition for uniform convergence there exists $N$ such that for all $n, m \ge N$ and for all $x \in X$, $|f_n(x) - f_m(x)| \le \varepsilon$.
Fix such an $N$. Fix $n, m \ge N$ for a moment and take the limit as $x \to a$. This yields $|y_n - y_m| \le \varepsilon$. Thus, for all $n, m \ge N$, $|y_n - y_m| \le \varepsilon$, so the sequence $(y_n)$ is Cauchy, hence converges to some number $y$.

Now, we start again. Let $\varepsilon > 0$. Since $(f_n)$ converges uniformly on $X$ and $y_n \to y$, there exists $N$ so large that $n \ge N$ implies
$$|f_n(x) - f(x)| \le \varepsilon/3 \quad\text{for all } x \in X, \qquad\text{and}\qquad |y_n - y| \le \varepsilon/3.$$
Choose $n = N$. Since $\lim_{x \to a} f_N(x) = y_N$, there is a neighbourhood $U$ of $a$ such that $|f_N(x) - y_N| < \varepsilon/3$ for $x \in U \setminus \{a\}$. Thus, for $x \in U \setminus \{a\}$,
$$|f(x) - y| \le |f(x) - f_N(x)| + |f_N(x) - y_N| + |y_N - y| < \varepsilon,$$
which shows that
$$\lim_{x \to a} f(x) = y = \lim_n y_n,$$
as promised.
Example. For each $n \in \mathbb{N}$ let $f_n(x) = \tan^{-1}(nx)$ for all $x \in \mathbb{R}$. Here $\tan^{-1}$ refers to arctan, the inverse of the tangent function restricted to its principal domain $(-\pi/2, \pi/2)$. Now, $\tan^{-1}(0) = 0$; as $y \to \infty$, $\tan^{-1}(y) \to \pi/2$; and as $y \to -\infty$, $\tan^{-1}(y) \to -\pi/2$. Thus,
$$\lim_n f_n(x) = \lim_n \tan^{-1}(nx) = \begin{cases} \pi/2, & \text{if } x > 0\\ 0, & \text{if } x = 0\\ -\pi/2, & \text{if } x < 0\end{cases} \;=\; \frac{\pi}{2}\,\mathrm{sgn}(x),$$
where sgn is the function known as signum, which gives the "sign of $x$" ($+1$ if $x > 0$, $-1$ if $x < 0$, and $0$ otherwise). Thus, $f_n \to \frac{\pi}{2}\,\mathrm{sgn}$ pointwise. But the convergence cannot be uniform, since the $f_n$ are all continuous but the limit function is discontinuous at $0$.

Note. Can you see why $a$ is assumed to be an accumulation point of $X$ in the theorem on limits? If $a$ is not an accumulation point of $X$, then any function on $X$ converges to anything as $x \to a$, so the result would be meaningless. Officially, the notation $\lim$ should not even be used in that case, since the limits would not be unique.

Integration. The uniform limit of integrable functions is also integrable, and the integral of the limit is the limit of the integrals. More precisely:
Theorem (Interchanging limit and integral). If for each $n \in \mathbb{N}$, $f_n$ is Riemann integrable on $[a,b]$ and the sequence $(f_n)$ converges uniformly to $f$ on $[a,b]$, then $f$ is Riemann integrable on $[a,b]$ and
$$\lim_n \int_a^b f_n = \int_a^b \lim_n f_n = \int_a^b f.$$
(Use of this result is also referred to as "taking a limit under the integral sign".)

Proof. We could use the Basic Integrability Criterion, but here is a quicker version of the proof. For each $n$, let $K_n = \|f_n - f\|$. Since $f_n \to f$ uniformly, $K_n \to 0$. Now, for all $n$, $f \le f_n + |f - f_n| \le f_n + K_n$, so by monotonicity of the upper integral,
$$\overline{\int_a^b} f \le \overline{\int_a^b} (f_n + K_n) = \int_a^b f_n + K_n(b-a).$$
Similarly,
$$\underline{\int_a^b} f \ge \int_a^b f_n - K_n(b-a),$$
so together these yield
$$0 \le \overline{\int_a^b} f - \underline{\int_a^b} f \le 2K_n(b-a).$$
Since the right side converges to $0$, we see that the lower and upper integrals of $f$ are identical, so $f$ is integrable. Finally,
$$\left| \int_a^b f_n - \int_a^b f \right| = \left| \int_a^b (f_n - f) \right| \le \int_a^b |f_n - f| \le \int_a^b K_n = K_n(b-a) \to 0.$$
This shows
$$\lim_n \int_a^b f_n = \int_a^b f,$$
as required.
Corollary (Integration term by term). For each $n \in \mathbb{N}$, let $f_n$ be Riemann integrable on $[a,b]$ and let the series $\sum_{n=1}^\infty f_n$ be uniformly convergent with sum $f$. Then $f$ is also Riemann integrable and
$$\int_a^b f = \sum_{n=1}^\infty \int_a^b f_n.$$
This follows because the uniform convergence of $\sum_{n=1}^\infty f_n$ to $f$ really means that the sequence of partial sums $s_n = \sum_{k=1}^n f_k$ converges uniformly to $f$. Thus,
$$\int_a^b f = \int_a^b \lim_n s_n = \lim_n \int_a^b s_n = \lim_n \int_a^b \sum_{k=1}^n f_k = \lim_n \sum_{k=1}^n \int_a^b f_k = \sum_{n=1}^\infty \int_a^b f_n,$$
as required.
Differentiation. The uniform limit of derivatives is a derivative, and we may interchange the limit operation with that of differentiation, provided one more obviously necessary condition is satisfied.

Theorem. (Interchanging limit and derivative.) Let (f_n) be a sequence of differentiable functions on [a, b]. If
(1) the sequence (f_n′) converges uniformly on [a, b] and
(2) there is one point c ∈ [a, b] such that (f_n(c)) converges,
then (f_n) converges uniformly to some f, f is differentiable, and lim_n f_n′ = f′.

Note. If we use D to stand for the differentiation operator, we can write this as

    Df = D(lim_n f_n) = lim_n Df_n.

If we were to assume each f_n had a continuous derivative, we could deduce this from the corresponding result for integrals, using the Fundamental Theorem of Calculus. The version given here doesn't even assume that the derivatives are integrable. Notice also that the uniform convergence of the sequence (f_n) is part of the conclusion, not the hypothesis. It is the sequence of derivatives that is assumed to converge uniformly.

Proof. Let n, m ∈ N and t, x ∈ [a, b]. The Mean Value Theorem, applied to f_n − f_m, yields a point s such that

    f_n(t) − f_m(t) − (f_n(x) − f_m(x)) = (f_n′(s) − f_m′(s))(t − x).

Since |f_n′(s) − f_m′(s)| ≤ ‖f_n′ − f_m′‖, this yields

    |f_n(t) − f_m(t) − (f_n(x) − f_m(x))| ≤ ‖f_n′ − f_m′‖ |t − x|.    (*)

Taking x = c, we obtain for all t ∈ [a, b],

    |f_n(t) − f_m(t)| ≤ |f_n(c) − f_m(c)| + ‖f_n′ − f_m′‖ |t − c|.

Since (f_n(c)) converges, (f_n′) converges uniformly, and |t − c| ≤ |b − a|, for each ε > 0 there exists N such that |f_n(t) − f_m(t)| < ε, for n, m ≥ N. Thus, (f_n) is uniformly Cauchy, so converges uniformly.

Now, (*) can be written

    | (f_n(t) − f_n(x))/(t − x) − (f_m(t) − f_m(x))/(t − x) | ≤ ‖f_n′ − f_m′‖.

Let f be the limit of (f_n) and g the limit of (f_n′). Fix x ∈ [a, b] and fix n for a moment. Then, letting m → ∞ yields

    | (f_n(t) − f_n(x))/(t − x) − (f(t) − f(x))/(t − x) | ≤ ‖f_n′ − g‖.

Since f_n′ → g uniformly, this shows that the sequence of functions ϕ_n defined by

    ϕ_n(t) = (f_n(t) − f_n(x))/(t − x)

converges uniformly, and we can interchange the order of limits, obtaining

    lim_{t→x} (f(t) − f(x))/(t − x) = lim_n lim_{t→x} (f_n(t) − f_n(x))/(t − x) = lim_n f_n′(x) = g(x).

Thus, f′ = g and, by hypothesis, f_n′ → f′ uniformly, as promised.
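The remark that it is the *derivatives* whose uniform convergence matters can be illustrated numerically (this example is mine, not from the notes): f_n(x) = √(x² + 1/n) converges uniformly to |x| on [−1, 1], yet the limit fails to be differentiable at 0, because (f_n′) does not converge uniformly near 0.

```python
import math

# f_n(x) = sqrt(x^2 + 1/n) converges uniformly to |x| on [-1, 1]
# (the gap is at most sqrt(1/n)), but the limit is not differentiable at 0:
# hypothesis (1) of the theorem fails for (f_n').
def f(n, x):
    return math.sqrt(x * x + 1.0 / n)

def fprime(n, x):
    return x / math.sqrt(x * x + 1.0 / n)

grid = [i / 100.0 for i in range(-100, 101)]
for n in (10, 1000):
    gap = max(abs(f(n, x) - abs(x)) for x in grid)
    assert gap <= math.sqrt(1.0 / n)        # uniform convergence of (f_n)

# f_n'(0) = 0 for every n, while the difference quotients of |x| at 0
# are +1 and -1: the derivatives cannot converge uniformly near 0.
assert all(fprime(n, 0.0) == 0.0 for n in (10, 1000))
```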
As in the case of integration, the result just proved yields a corresponding result about differentiating series.

Corollary. (Differentiation term-by-term.) Let Σ_{n=1}^∞ f_n be a series of differentiable real functions defined on [a, b] such that the series Σ_{n=1}^∞ f_n′ converges uniformly and Σ_{n=1}^∞ f_n(c) converges for some point c. Then Σ_{n=1}^∞ f_n converges uniformly and, if f denotes the sum,

    f′ = Σ_{n=1}^∞ f_n′.

The proof is a straightforward exercise.
Power Series

A series of the form

    Σ_{n=0}^∞ a_n (x − c)^n

is called a power series about c. (The same name is given also to series where the summation starts at n = 1 (or larger).) This can be considered a series of functions, Σ_n f_n, where f_n(x) = a_n(x − c)^n. On what domains does this define a series which converges pointwise? On what domains does it converge uniformly? Can we integrate this series, or differentiate it, by integrating or differentiating the nice polynomials a_n(x − c)^n? The answer is surprisingly often YES.

For the moment we consider power series of real numbers (and "real variables"). We will make some remarks about the complex case afterward.

Definition. The radius of convergence of the power series Σ_{n=0}^∞ a_n(x − c)^n is

    R = sup{ r : Σ_n a_n r^n converges }   (possibly +∞).

The value +∞ is given when the set { r : Σ_n a_n r^n converges } is not bounded above. One should note that R ≥ 0, since the series Σ_{n=0}^∞ a_n r^n converges to a_0 when r = 0.

Lemma. For the power series Σ_n a_n(x − c)^n, put α = lim sup_n |a_n|^{1/n}. Then the radius of convergence is

    R = 1/α

(interpreted as +∞, if α = 0, and as 0, if α = +∞).

Proof. This is an application of the root test, which says that a series Σ_n x_n converges absolutely if lim sup_n |x_n|^{1/n} < 1 and diverges if lim sup_n |x_n|^{1/n} > 1. Let α be as given in the statement. If α is finite and r ≥ 0, then

    lim sup_n |a_n r^n|^{1/n} = lim sup_n (|a_n|^{1/n} r) = r (lim sup_n |a_n|^{1/n}) = αr.

Thus the series Σ_n a_n r^n converges absolutely for r < 1/α and diverges for r > 1/α. This shows that the radius of convergence is 1/α. In case α = 0, rα = 0 < 1, so the series converges for all r and hence the radius of convergence is +∞. Finally, in case α = +∞, lim sup_n |a_n r^n|^{1/n} = +∞ > 1, unless r = 0, in which case lim sup_n |a_n r^n|^{1/n} = lim sup_n 0 = 0. Thus, the radius of convergence is 0.

Notice also that this proof actually showed that, in the definition of radius of convergence, one could use |a_n| in place of a_n, obtaining absolute convergence.
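As a rough numerical companion to the lemma (my own sketch, with the caveat that a single large n only *estimates* the lim sup, and only when |a_n|^{1/n} actually converges), one can evaluate |a_n|^{1/n} at a large index and read off R = 1/α. Working with log|a_n| avoids overflow:

```python
import math

# Estimate alpha = limsup |a_n|^(1/n) from log|a_n| at one large index n:
# |a_n|^(1/n) = exp(log|a_n| / n).  This is a heuristic: it is only valid
# when |a_n|^(1/n) converges, which it does in the examples below.
def alpha_estimate(log_abs_a, n=2000):
    return math.exp(log_abs_a(n) / n)

# a_n = 2^n: log|a_n| = n log 2, so alpha = 2 and R = 1/alpha = 1/2.
assert abs(alpha_estimate(lambda n: n * math.log(2)) - 2.0) < 1e-9

# a_n = 1/n!: log|a_n| = -lgamma(n+1), alpha = 0, so R = +infinity.
assert alpha_estimate(lambda n: -math.lgamma(n + 1)) < 0.01

# a_n = n^2: polynomial growth gives alpha = 1, so R = 1.
assert abs(alpha_estimate(lambda n: 2 * math.log(n)) - 1.0) < 0.01
```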
Theorem. Let R be the radius of convergence of the power series Σ_n a_n(x − c)^n.

(1) Then the series converges pointwise on (c − R, c + R).
(2) If 0 ≤ r < R, then the series Σ_n a_n(x − c)^n converges uniformly on [c − r, c + r].
(3) Put f(x) = Σ_{n=0}^∞ a_n(x − c)^n, for x ∈ (c − R, c + R). Then:
(a) If a, b ∈ (c − R, c + R) with a ≤ b, then f is integrable on [a, b] with

    ∫_a^b f(x) dx = Σ_{n=0}^∞ ∫_a^b a_n(x − c)^n dx = Σ_{n=0}^∞ a_n [ (x − c)^{n+1}/(n + 1) ]_{x=a}^{x=b}.

(b) f is differentiable with

    f′(x) = Σ_{n=0}^∞ n a_n (x − c)^{n−1}, for x ∈ (c − R, c + R).
The interval of convergence of the series is the largest interval on which the series converges. It consists of (c − R, c + R) together with those endpoints at which it converges.

Proof. (The statements are vacuously satisfied if R = 0.)

((1) and (2)) Convergence pointwise (absolutely) on (c − R, c + R) means convergence for each fixed x ∈ (c − R, c + R). But if x belongs to this interval, equivalently if |x − c| < R, then there exists r with |x − c| < r < R; hence, if we can show the series converges uniformly on [c − r, c + r], it has to converge at x. Thus it is enough to prove (2). So, suppose 0 ≤ r < R. Then Σ_n |a_n| r^n converges. But then, for all n and all x ∈ [c − r, c + r], |a_n(x − c)^n| ≤ |a_n| r^n, so by the Weierstrass M-test, the series converges uniformly for x ∈ [c − r, c + r].

(3) Let f_n(x) = a_n(x − c)^n, defined for x ∈ (c − R, c + R). Let a, b ∈ (c − R, c + R) and choose 0 ≤ r < R so that [a, b] ⊂ [c − r, c + r]. Then Σ_n f_n converges to f uniformly on [c − r, c + r], hence on [a, b], so by the theorem on integrating uniformly convergent series,

    ∫_a^b f = Σ_{n=0}^∞ ∫_a^b f_n = Σ_{n=0}^∞ a_n [ (x − c)^{n+1}/(n + 1) ]_{x=a}^{x=b}.

This proves part (a). To prove part (b), we apply the result on differentiating the sum of a series. However, for that result, we need uniformity of the convergence of the series

    Σ_{n=0}^∞ f_n′

of derivatives in a neighbourhood of the point where we intend to differentiate. Now,

    Σ_{n=0}^∞ f_n′(x) = Σ_{n=0}^∞ a_n n (x − c)^{n−1}.    (D)

The radius of convergence of this "differentiated" series is determined by

    lim sup_n |n a_n|^{1/n} = lim sup_n |n|^{1/n} |a_n|^{1/n} = lim sup_n |a_n|^{1/n},

since n^{1/n} → 1. Thus, the series (D) has radius of convergence R. Since also

    Σ_{n=0}^∞ f_n(c) converges to a_0,

we can differentiate term by term and get

    f′(x) = Σ_{n=0}^∞ f_n′(x),

for all x ∈ [c − r, c + r]. If we are given a particular point in (c − R, c + R), we simply choose r so that x belongs to [c − r, c + r] to complete the proof.

Note. We see from the above arguments that the radii of convergence of the three series

    Σ_n a_n (x − c)^{n+1}/(n + 1),   Σ_n a_n (x − c)^n   and   Σ_n a_n n (x − c)^{n−1}

are the same.

Corollary. Suppose f(x) = Σ_{n=0}^∞ a_n(x − c)^n, for x ∈ (c − R, c + R). Then for each k ∈ N, f is k-times differentiable in this interval, with derivative

    f^{(k)}(x) = Σ_{n=k}^∞ (n!/(n − k)!) a_n (x − c)^{n−k},

and in particular, f^{(k)}(c) = k! a_k, so

    a_k = f^{(k)}(c)/k!.

Thus, we obtain the representation of f in terms of its so-called Taylor's series:

    f(x) = Σ_{n=0}^∞ a_n(x − c)^n = Σ_{n=0}^∞ (f^{(n)}(c)/n!) (x − c)^n, for x ∈ (c − R, c + R).
The proof is a simple exercise.

Example. Warning. The above theorem does not say that an infinitely differentiable function is always the sum of its Taylor's series. It only applies to functions which are already known to be expandable as the sum of a power series. The function defined on R by

    f(x) = e^{−1/x²}, if x ≠ 0;   f(x) = 0, if x = 0,

has f^{(n)}(0) = 0, for all n ∈ N, so its Taylor's series about 0 is the series

    Σ_{n=0}^∞ (0/n!) x^n,

the "0 series". Its sum is 0 for all x, which is certainly different from f. The remainder term in Taylor's Theorem is actually the entire function f.

We know that, if R is the radius of convergence of the series Σ_{n=0}^∞ a_n(x − c)^n, then it converges pointwise on (c − R, c + R) and uniformly on any closed interval contained in this, but not necessarily on the whole interval. (The geometric series Σ_n x^n shows this.) It diverges at each point outside the closed interval [c − R, c + R]. What happens to the uniformity if there is convergence at an endpoint?

Abel's Theorem. If the power series Σ_{n=0}^∞ a_n(x − c)^n converges at c + R, then it converges uniformly on [c, c + R]. Similarly, if it converges at c − R, then it converges uniformly on [c − R, c]. Consequently, Σ_{n=0}^∞ a_n(x − c)^n converges uniformly on each closed interval contained in its interval of convergence.

Proof. Without loss of generality, we may (and do) assume c = 0 and R = 1. (Why is this?) Let ε > 0. Since Σ_{n=0}^∞ a_n converges, it is Cauchy, so we may find N such that

    | Σ_{i=N}^m a_i | < ε, for all m ≥ N,

or what is the same,

    | Σ_{i=0}^k a_{N+i} | < ε, for all k ≥ 0.

Put A_k = Σ_{i=0}^k a_{N+i}, so that |A_k| < ε for all k. Then, for x ∈ [0, 1],

    Σ_{i=0}^k a_{N+i} x^{N+i} = a_N x^N + a_{N+1} x^{N+1} + ··· + a_{N+k} x^{N+k}
      = A_0 x^N + (A_1 − A_0) x^{N+1} + (A_2 − A_1) x^{N+2} + ··· + (A_k − A_{k−1}) x^{N+k}
      = x^N [A_0 + (A_1 − A_0) x + (A_2 − A_1) x² + ··· + (A_k − A_{k−1}) x^k]
      = x^N [A_0(1 − x) + A_1(x − x²) + ··· + A_{k−1}(x^{k−1} − x^k) + A_k x^k].

Now the absolute value of this is at most

    x^N [|A_0|(1 − x) + |A_1|(x − x²) + ··· + |A_{k−1}|(x^{k−1} − x^k) + |A_k| x^k]
      ≤ x^N [ε(1 − x) + ε(x − x²) + ··· + ε(x^{k−1} − x^k) + ε x^k] = x^N ε,

since the sum telescopes. Thus,

    | Σ_{i=N}^m a_i x^i | ≤ ε, for all m ≥ N,

which shows that the series satisfies the Cauchy condition for uniform convergence on [0, 1].

The proof for the case of convergence at c − R is similar. Alternatively, it can be deduced from the previous case by replacing x − c by −(x − c); that is, by considering the series Σ_n (−1)^n a_n(x − c)^n.
Corollary. If Σ_{n=0}^∞ a_n converges, then lim_{x→1−} Σ_{n=0}^∞ a_n x^n = Σ_{n=0}^∞ a_n.

Example. Let f(x) = 1/(1 + x), for x ≠ −1. Since 1/(1 − x) = Σ_{n=0}^∞ x^n, for |x| < 1, we also have f(x) = Σ_{n=0}^∞ (−1)^n x^n, for |x| < 1. The interval of convergence of this series is just (−1, 1). The Taylor's series for f about 0 is just this, but f is not the sum of its Taylor's series, except on that interval. Now, for each x ∈ [0, 1), we may integrate term by term, yielding

    ∫_0^x 1/(1 + t) dt = Σ_{n=0}^∞ (−1)^n x^{n+1}/(n + 1) = Σ_{n=1}^∞ (−1)^{n−1} x^n/n.

This latter series also converges at x = 1, because it is an alternating series of decreasing terms:

    1 − 1/2 + 1/3 − 1/4 + ···.

Thus, by Abel's Theorem,

    lim_{x→1−} Σ_{n=1}^∞ (−1)^{n−1} x^n/n = Σ_{n=1}^∞ (−1)^{n−1}/n.

Assuming the properties of natural logarithms, we are now able to sum this series, because

    lim_{x→1−} ∫_0^x 1/(1 + t) dt = ∫_0^1 1/(1 + t) dt = ln(1 + 1) − ln(1 + 0) = ln 2.

The complex case. The definition of power series remains unchanged if the coefficients a_n and the center c are replaced by complex numbers. Traditionally, the variable z is used instead of x. Again, using the same proofs, we find that the series Σ_{n=0}^∞ a_n(z − c)^n converges absolutely in the ball {z : |z − c| < R}, uniformly on any smaller ball, and diverges for |z − c| > R, where again R = 1/α, α = lim sup_n |a_n|^{1/n}. The statement about integrating the series term by term doesn't make sense with the definition of integral we are using — we have no concept of integration with respect to a complex variable. The complex derivative of a function f : Z → C, where Z ⊂ C, is defined, as in the real case, by

    f′(z₀) = lim_{z→z₀} (f(z) − f(z₀))/(z − z₀).

If f(z) is given by a power series, the theorem on differentiation term-by-term is still true, but our proof doesn't apply, because it depends on the Mean Value Theorem, which is not valid in the complex case.

The product of two power series. Given series Σ_{n=0}^∞ a_n and Σ_{n=0}^∞ b_n, the (Cauchy) product of these two series is Σ_{n=0}^∞ c_n, where c_n = Σ_{k=0}^n a_k b_{n−k}.

Power series give the motivation for this definition. If one formally multiplies the two power series Σ_n a_n z^n and Σ_n b_n z^n term by term, collecting terms containing the same power of z (as if they were polynomials), one obtains

    (Σ_n a_n z^n)(Σ_n b_n z^n) = (a_0 + a_1 z + a_2 z² + a_3 z³ + ···)(b_0 + b_1 z + b_2 z² + ···)
      = a_0 b_0 + (a_0 b_1 + a_1 b_0) z + (a_0 b_2 + a_1 b_1 + a_2 b_0) z² + ···
      = c_0 + c_1 z + c_2 z² + ···.

Taking z = 1 gives the above definition.
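The ln 2 computation above is easy to watch numerically (my own sketch; the helper name `alt_harmonic` is an invention for this illustration). The alternating-series error bound guarantees the partial sums are within 1/(m + 1) of ln 2:

```python
import math

# Partial sums of the alternating harmonic series 1 - 1/2 + 1/3 - ...,
# which the text sums to ln 2 via Abel's Theorem.
def alt_harmonic(m):
    return sum((-1) ** (n - 1) / n for n in range(1, m + 1))

# For an alternating series of decreasing terms, the error is at most
# the first omitted term, 1/(m + 1).
for m in (10, 1000):
    assert abs(alt_harmonic(m) - math.log(2)) <= 1.0 / (m + 1)

# Abel's limit: f(x) = sum (-1)^(n-1) x^n / n tends to ln 2 as x -> 1-.
def f(x, terms=100000):
    return sum((-1) ** (n - 1) * x ** n / n for n in range(1, terms + 1))

assert abs(f(0.999) - math.log(1.999)) < 1e-6
```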
Theorem. Suppose
(a) Σ_{n=0}^∞ a_n converges absolutely with sum A and
(b) Σ_{n=0}^∞ b_n converges with sum B.
Then, the series Σ_{n=0}^∞ c_n, with c_n = Σ_{k=0}^n a_k b_{n−k}, converges with sum AB.

Proof. Put A_n = Σ_{k≤n} a_k, B_n = Σ_{k≤n} b_k, and C_n = Σ_{k≤n} c_k. Then,

    C_n = a_0 b_0 + (a_0 b_1 + a_1 b_0) + (a_0 b_2 + a_1 b_1 + a_2 b_0) + ··· + (a_0 b_n + a_1 b_{n−1} + ··· + a_n b_0)
        = a_0 B_n + a_1 B_{n−1} + a_2 B_{n−2} + ··· + a_n B_0
        = (a_0 + a_1 + ··· + a_n) B + a_0(B_n − B) + a_1(B_{n−1} − B) + a_2(B_{n−2} − B) + ··· + a_n(B_0 − B)
        = A_n B + W_n,

where W_n = a_0(B_n − B) + a_1(B_{n−1} − B) + a_2(B_{n−2} − B) + ··· + a_n(B_0 − B). Since A_n B → AB, our job is to show that W_n → 0. Let α = Σ_{n=0}^∞ |a_n|, which is finite by (a). Choose N so large that |B_n − B| ≤ ε, for n ≥ N. Then,

    |W_n| ≤ |a_0|ε + |a_1|ε + |a_2|ε + ··· + |a_{n−N}|ε + |a_{n−N+1}(B_{N−1} − B) + a_{n−N+2}(B_{N−2} − B) + ··· + a_n(B_0 − B)|
          ≤ αε + |a_{n−N+1}(B_{N−1} − B) + a_{n−N+2}(B_{N−2} − B) + ··· + a_n(B_0 − B)|.

Here, there are N terms, each of which tends to 0, since a_k → 0. Thus, this is less than αε + ε for all n large enough. This shows that W_n → 0, and we are done.

The same conclusion holds also with the absolute convergence in (a) replaced by convergence, provided the product series is known to converge.

Theorem. If Σ_{n=0}^∞ a_n converges with sum A, Σ_{n=0}^∞ b_n converges with sum B and Σ_{n=0}^∞ c_n converges with sum C, where c_n = Σ_{k=0}^n a_k b_{n−k}, then AB = C.

Proof. Define f(x) = Σ_{n=0}^∞ a_n x^n, g(x) = Σ_{n=0}^∞ b_n x^n and h(x) = Σ_{n=0}^∞ c_n x^n, for x ∈ [0, 1]. For x < 1, the series converge absolutely and hence may be multiplied using the Cauchy product, so that f(x)g(x) = h(x) (0 ≤ x < 1). Because of Abel's Theorem, these functions are continuous at 1:

    f(x) → A,   g(x) → B,   h(x) → C,   as x → 1−,

so that AB = C, as required.

Exercise.
1. If lim_n |a_{n+1}/a_n| = α, the radius of convergence of Σ_n a_n(x − c)^n is 1/α (again interpreted as +∞, if α = 0; 0, if α = +∞).
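The Cauchy product and Mertens' conclusion AB = C can be checked directly on two geometric series (an illustration of mine; the helper `cauchy_product` is not from the notes):

```python
# Cauchy product c_n = sum_{k=0}^{n} a_k b_{n-k} of two series, as defined above.
def cauchy_product(a, b, terms):
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(terms)]

# a_n = (1/2)^n sums (absolutely) to A = 2; b_n = (1/3)^n sums to B = 3/2.
N = 60
a = [0.5 ** n for n in range(N)]
b = [(1.0 / 3.0) ** n for n in range(N)]
c = cauchy_product(a, b, N)

assert abs(c[1] - (1 * (1 / 3) + (1 / 2) * 1)) < 1e-12   # c_1 = a_0 b_1 + a_1 b_0
# The product series converges with sum AB = 2 * (3/2) = 3.
assert abs(sum(c) - 3.0) < 1e-9
```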
The exponential and trigonometric functions

Here we define the exponential function, establish its main properties, and use it to obtain other elementary functions. We begin in outline form and give details afterward.

(1) We define the exponential function for every complex number z by

    exp(z) = Σ_{n=0}^∞ z^n/n!.

The series converges absolutely for every z and uniformly on each bounded subset of C.
(2) exp(a + b) = exp(a) · exp(b), for all a, b ∈ C.
(3) exp(0) = 1, exp(1) = e, exp(−z) = exp(z)^{−1}.
(4) exp(z) ≠ 0, for all z.
(5) exp′(z) = exp(z) (complex differentiation).
(6) The restriction of exp to R is strictly increasing, continuous and positive. It agrees with the map x ↦ e^x, defined earlier, so one also writes e^z for exp(z).
(7) lim_{x→−∞} exp(x) = 0, lim_{x→+∞} exp(x) = +∞ and hence exp maps R onto (0, +∞).
(8) The map t ↦ exp(it) maps R into [actually onto, as we see in step (11)] the unit circle, and we define the cosine and sine functions as the real and imaginary parts of this map. Thus,

    cos t = Re(exp(it)),   sin t = Im(exp(it)),

or what is the same,

    e^{it} = cos t + i sin t   ("the Euler identity").

(9) sin and cos are differentiable, with sin′ = cos, cos′ = − sin.
(10) The functions cos and sin have power series representations

    cos t = 1 − t²/2! + t⁴/4! − ··· = Σ_{k=0}^∞ (−1)^k t^{2k}/(2k)!
    sin t = t − t³/3! + t⁵/5! − ··· = Σ_{k=0}^∞ (−1)^k t^{2k+1}/(2k + 1)!

(11) There is a smallest positive number π such that exp(iπ/2) = i. Then, the interval [0, 2π) is mapped by t ↦ exp(it) onto the unit circle, and exp(z) = 1 if and only if z = (2πi)k, for some integer k.
(12) exp maps C onto C \ {0}. Thus, for every complex number w other than 0, there exists z such that e^z = w.
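Properties (2) and (8) can be sanity-checked with partial sums of the defining series, using Python's complex arithmetic (a numerical sketch of mine; `exp_series` is an illustrative helper, and 60 terms more than suffices for the moderate |z| used here):

```python
import cmath, math

# Partial sum of exp(z) = sum_{n>=0} z^n / n!, built term by term.
def exp_series(z, terms=60):
    s, term = 0 + 0j, 1 + 0j
    for n in range(terms):
        s += term
        term *= z / (n + 1)   # z^(n+1)/(n+1)! from z^n/n!
    return s

# (2): exp(a + b) = exp(a) exp(b)
a, b = 0.3 + 0.7j, -1.1 + 0.2j
assert abs(exp_series(a + b) - exp_series(a) * exp_series(b)) < 1e-12

# (8): |exp(it)| = 1 for real t, and exp(it) = cos t + i sin t
t = 2.0
w = exp_series(1j * t)
assert abs(abs(w) - 1.0) < 1e-12
assert abs(w - complex(math.cos(t), math.sin(t))) < 1e-12
```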
Details (proofs).

(1) Since

    | z^{n+1}/(n + 1)! | / | z^n/n! | = |z|/(n + 1) → 0,

the defining series

    Σ_{n=0}^∞ z^n/n!

converges, by the ratio test, for all z ∈ C. Thus, the definition is valid, the radius of convergence is R = +∞, and hence on each disk {z : |z| ≤ r}, the series converges uniformly and absolutely.

(2) exp(a + b) = exp(a) · exp(b), for all a, b ∈ C. To see this, we use the Cauchy product (convolution) of the series. Since Σ_n a^n/n! converges absolutely and Σ_n b^n/n! converges,

    (Σ_{k=0}^∞ a^k/k!)(Σ_{m=0}^∞ b^m/m!) = Σ_{n=0}^∞ Σ_{k=0}^n a^k b^{n−k}/(k!(n − k)!)
      = Σ_{n=0}^∞ (1/n!) Σ_{k=0}^n (n!/(k!(n − k)!)) a^k b^{n−k}
      = Σ_{n=0}^∞ (1/n!)(a + b)^n,

by the binomial theorem.

(3) As usual for power series, 0⁰ is defined to be 1 and, since all the other terms of the series for exp(0) vanish, exp(0) = 1. Long ago we proved that exp(1) = lim_n (1 + 1/n)^n = e. From (2), exp(z) exp(−z) = exp(z − z) = exp(0) = 1, so that exp(−z) = exp(z)^{−1}.

(4) That exp(z) ≠ 0, for all z, is an immediate consequence.

(5) Using complex differentiation, exp′(z) means

    lim_{w→z} (exp(w) − exp(z))/(w − z) = lim_{h→0} (exp(z + h) − exp(z))/h.

By (2) this is the same as

    exp(z) lim_{h→0} (exp(h) − exp(0))/h.

But, using the series, we see that

    (exp(h) − exp(0))/h = (Σ_{n=0}^∞ h^n/n! − 1)/h = Σ_{n=1}^∞ h^{n−1}/n! = 1 + h Σ_{n=2}^∞ h^{n−2}/n!,

which converges to 1 as h → 0. Thus, exp′(z) = exp(z), as claimed.
(6) It follows from (5) that the restriction of exp to R also has exp′(x) = exp(x). So we see that this function is differentiable, hence continuous. Since exp(0) = 1 and since exp(x) ≠ 0, for all x, we know by the Intermediate Value Theorem that exp(x) > 0, for all real x. But then, exp′(x) > 0, so exp is strictly increasing on R. Applying (2) by induction, one finds that for n a natural number, exp(n) = e^n, and by similar arguments (see the section [EXPONENTS]) exp(r) = e^r, for rational r. By continuity, we find that, for real x,

    exp(x) = sup{e^r : r < x, r ∈ Q} = inf{e^s : s > x, s ∈ Q}.

This is the way we defined e^x. Thus, exp(x) = e^x, for all real x, and one also writes e^z for exp(z).

(7) Since, for positive real x, exp(x) = Σ_n x^n/n! > x, lim_{x→+∞} exp(x) = +∞, and lim_{x→−∞} exp(x) = lim_{x→∞} exp(−x) = lim_{x→∞} 1/exp(x) = 0. By the Intermediate Value Theorem, this entails that exp maps R onto (0, +∞).

(8) Recall that for a complex number z = x + iy, the complex conjugate is z̄ = x − iy and |z|² = z z̄. Now, let t be real. From the series,

    exp(it) = Σ_n (it)^n/n!,

we see that exp(−it) is the complex conjugate of exp(it). Indeed, those terms with even n are real and don't change when we replace i by −i. Those terms with odd n are imaginary, and are negated when we replace i by −i. Consequently,

    |exp(it)|² = exp(it) exp(−it) = 1.

This shows that for all t ∈ R, |exp(it)| = 1; that is, the map t ↦ exp(it) maps R into the unit circle. We define the cosine and sine functions by

    cos t = Re(exp(it)),   sin t = Im(exp(it));

that is,

    e^{it} = cos t + i sin t   ("the Euler identity").

(9) Differentiating exp(it) with respect to t gives

    cos′ t + i sin′ t = i exp(it) = i(cos t + i sin t) = − sin t + i cos t,

so that sin and cos are differentiable with cos′ = − sin, sin′ = cos.

Note: we should be careful here. We are comparing differentiation with respect to a complex variable with differentiation with respect to a real variable. We know for each z,

    lim_{w→z} (exp(w) − exp(z))/(w − z) = exp z,

so

    lim_{w→it} (exp(w) − exp(it))/(w − it) = exp(it).

Hence, if we restrict w to only run through values of the form w = is, for s ∈ R, then

    lim_{s→t} (exp(is) − exp(it))/(is − it) = exp(it).

This gives

    lim_{s→t} [ (cos s − cos t)/(s − t) + i (sin s − sin t)/(s − t) ] = i exp(it),
justifying the calculation at the beginning of this paragraph.

(10) As we noted above, the terms of the series for exp(it) with even powers of it are real and those with odd powers of it are imaginary. That is,

    cos t + i sin t = exp(it) = Σ_{k=0}^∞ (−1)^k t^{2k}/(2k)! + i Σ_{k=0}^∞ (−1)^k t^{2k+1}/(2k + 1)!.

In other words,

    cos t = Σ_{k=0}^∞ (−1)^k t^{2k}/(2k)!
    sin t = Σ_{k=0}^∞ (−1)^k t^{2k+1}/(2k + 1)!

(11) Now we are going to define π. We know that cos 0 + i sin 0 = exp(i0) = 1, so cos 0 = 1 and, from the series representation of cos,

    cos 2 = 1 − 2²/2! + 2⁴/4! − 2⁶/6! + ··· < 1 − 2²/2! + 2⁴/4! = −1/3.

Thus, by the Intermediate Value Theorem, there exists a t > 0 with cos t = 0. Because of the continuity of cos, we can take t₀ to be the smallest such t. Define π to be 2t₀; thus, π is the smallest positive number such that cos π/2 = 0. Now, for t ∈ [0, π/2), cos t > 0. But sin′ t = cos t, so sin is strictly increasing on [0, π/2) and hence, since sin 0 = 0, sin > 0 on (0, π/2]. Since cos²(π/2) + sin²(π/2) = 1, sin²(π/2) = 1, and hence sin(π/2) = 1. It follows that

    e^{iπ/2} = cos(π/2) + i sin(π/2) = i
    e^{iπ} = e^{iπ/2} e^{iπ/2} = i² = −1
    e^{i3π/2} = −i
    e^{i2π} = 1.    (*)

Now, let z = u + iv be on the unit circle. If u > 0, v ≥ 0, there exists t with cos t = u. But then v² = sin² t, so that v = sin t, since both are non-negative. If u ≤ 0, v > 0, then (u + iv)(−i) = v − iu = e^{it}, for some t, and hence z = e^{i(t+π/2)}. Finally, if v < 0, then −z = e^{it}, for some t, by the previous two cases, so z = e^{i(t+π)}.
We now prove that exp(z) = 1 if and only if z = (2πi)k, for some integer k. Certainly, from (*), e^{i2πk} = 1 for each k ∈ Z. To prove the converse, first observe that on (0, π), cos′ = − sin < 0, so cos is strictly decreasing there, and on (π, 2π), cos′ > 0, so cos is strictly increasing there. Hence, 0 is the only y ∈ [0, 2π) with e^{iy} = 1. If z = x + iy, then exp(z) = e^x e^{iy}. If e^z = 1, then |e^z| = e^x = 1, so x = 0 and e^{iy} = 1. Let k = ⌊y/2π⌋, the greatest integer less than or equal to y/2π. Then, exp(i(y − 2kπ)) = 1, so y − 2kπ = 0, and thus y = 2kπ. Thus, z = (2πi)k, as required.

(12) Let w be a complex number other than 0. Then, |w| = e^x, for some real x, and w/|w| is on the unit circle, so is of the form e^{iy}, for some real y. Hence w = e^{x+iy}, as required to establish all the properties listed.

The connection with the angles of geometry. An angle θ, in radian measure, is identified with the length of an arc: namely, the arc traced out on the unit circle as one rotates the point (1, 0) through the angle θ. If γ(t) = e^{it}, t ∈ [0, 2π], the curve γ has range the unit circle and the length of this arc is

    ∫_0^{2π} |γ′(t)| dt = ∫_0^{2π} 1 dt = 2π,

and more generally, if t moves from 0 to θ, γ traces out an arc of length ∫_0^θ 1 dt = θ. So the approach we have taken is consistent with the angle interpretation, and we find the cosine and sine of an angle θ, as defined in terms of right triangles, is also consistent with the definitions of cos θ and sin θ here.
Differentiation of vector-valued and complex-valued functions

The definition of derivative given for real-valued functions applies without change to those with values in C or Rⁿ. Thus, let f be a function defined on an interval I of R containing the point c, with values in C or in Rⁿ. We say f is differentiable at c if the limit

    lim_{x→c} (f(x) − f(c))/(x − c)

exists. If so, this limit is called the derivative of f at c. The function f′, with domain the set of points where f is differentiable, defined by

    f′(c) = lim_{x→c} (f(x) − f(c))/(x − c),

is called the derivative of f.

In the vector case, if f = (f₁, …, fₙ), that is, f_i(x) is the ith component of f(x), for each i, we see that f is differentiable at c if and only if f_i is differentiable at c, for all i, and we have

    f′(c) = (f₁′(c), …, fₙ′(c)).

As a vector space, C is identified with R² through the correspondence x + iy ←→ (x, y), and we see that for a function f : I → C, f = f₁ + if₂, f is differentiable at c iff each of f₁ and f₂ is differentiable at c, and

    f′(c) = f₁′(c) + if₂′(c).

The Tangent Characterization holds true again in this setting, with the identical proof. The only change is that some numbers become vectors:

Theorem. For a function f : I → Rⁿ and x ∈ I, f is differentiable at x iff there exist an m ∈ Rⁿ and a function ε : I → Rⁿ with lim_{t→x} ε(t) = ε(x) = 0 such that, for all t ∈ I,

    f(t) = f(x) + m(t − x) + ε(t)(t − x).

In this case, f′(x) = m.

The product m(t − x) here means multiplication of the vector m by the scalar t − x. As t moves along the real line, f(x) + f′(x)(t − x) traces out a straight line tangent to the image of f at f(x). Since ε(t) → 0 as t gets close to x, the error in using the point on the tangent line to approximate the point f(t) becomes small, even compared to t − x. [This is actually the meaning assigned to the word "tangent".] The formula in the tangent characterization can also be written

    f(x + h) = f(x) + mh + ε₀(h)h, where ε₀(h) → 0 as h → 0.

Of course this is meaningful only for those h for which x + h ∈ I. The tangent characterization of derivative again immediately gives continuity.
Theorem. If f is differentiable at c, then f is continuous at c.

To obtain a chain rule for this notion of derivative, we have to be careful about the ranges of the two functions.

The chain rule. Let I be an interval and g : I → R be differentiable at x₀. Let g(I) ⊂ J, f : J → Rⁿ, and let f be differentiable at u₀ = g(x₀). Then, f ∘ g is differentiable at x₀, with

    (f ∘ g)′(x₀) = f′(u₀) g′(x₀) = f′(g(x₀)) g′(x₀).

This is proved just as before. It will also be deducible from the version given elsewhere for functions of a vector variable. Notice that here, if g had its values in Rⁿ, f would have to be defined on a set of elements of Rⁿ and the present definition of derivative would not apply.

The simple algebraic results now become:

Theorem. If f is constant on the interval I, then f is differentiable on I with f′(x) = 0 for all x ∈ I.

Theorem. Let f : I → Rⁿ, g : I → Rⁿ be differentiable at c ∈ I and let k ∈ R. Then:
(a) kf is differentiable at c with (kf)′(c) = kf′(c);
(b) f + g is differentiable at c and (f + g)′(c) = f′(c) + g′(c) (sum rule);
(c) f · g is differentiable at c and (f · g)′(c) = f′(c) · g(c) + f(c) · g′(c) (product rule for dot product).

Of course, there is also a version of the product rule for one of the functions f and g scalar-valued and the other vector-valued, and a corresponding quotient rule.

The Mean Value Theorem does not hold for vector (or complex) valued functions defined on a real interval.

Example. For each t ∈ [0, 2π], let f(t) = e^{it}. Then, f(2π) − f(0) = 1 − 1 = 0, but f′(t) = ie^{it} has absolute value 1, for all t.

But there is a consequence of the Mean Value Theorem that does hold; it is the one that we used to show that a function with a bounded derivative is Lipschitz.

Theorem. Let f : [a, b] → Rⁿ be continuous on [a, b] and differentiable on (a, b). Then, there exists x ∈ (a, b) such that

    |f(b) − f(a)| ≤ |f′(x)| |b − a|.

Proof. Put z = f(b) − f(a) and define, for t ∈ [a, b], ϕ(t) = z · f(t). Then, ϕ is a real-valued function, continuous on [a, b] and differentiable in (a, b). Therefore there exists x ∈ (a, b) with ϕ(b) − ϕ(a) = ϕ′(x)(b − a). But z is constant, so by the product rule, ϕ′(x) = 0 · f(x) + z · f′(x) = z · f′(x). As for the other side of the equation,

    ϕ(b) − ϕ(a) = z · f(b) − z · f(a) = z · (f(b) − f(a)) = |z|².

Thus,

    |z|² = z · f′(x)(b − a) ≤ |z| |f′(x)| (b − a),

by the Cauchy-Schwarz inequality, and the result follows by cancelling the |z|.

We will have a further generalization of this result when we turn to functions of a vector variable.
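The circle example showing the failure of the Mean Value Theorem for complex-valued functions is easy to verify numerically (a sketch of mine, using Python's `cmath` for the complex arithmetic):

```python
import cmath, math

# The example from the text: f(t) = e^{it} on [0, 2*pi].
f = lambda t: cmath.exp(1j * t)
fp = lambda t: 1j * cmath.exp(1j * t)   # f'(t) = i e^{it}

# f(2*pi) - f(0) = 0, yet |f'(t)| = 1 for all t: no x can satisfy the
# MVT *equality* f(b) - f(a) = f'(x)(b - a), since the right side never
# vanishes.
assert abs(f(2 * math.pi) - f(0)) < 1e-12
assert all(abs(abs(fp(t)) - 1.0) < 1e-12 for t in (0.0, 1.0, 2.0, 5.0))

# The inequality form of the theorem does survive:
# |f(b) - f(a)| <= sup|f'| * (b - a) = 1 * (b - a).
a, b = 0.0, 2.0
assert abs(f(b) - f(a)) <= (b - a)
```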
Integration of vector-valued functions

Let f₁, …, fₙ be functions on [a, b] ⊂ R to R, and let f = (f₁, …, fₙ) be the corresponding function on [a, b] to Rⁿ. The Riemann definition of integral still makes sense here, and one could show that under that approach we would have, whenever f₁, …, fₙ are integrable, that f is integrable and

    ∫_a^b f = ( ∫_a^b f₁, …, ∫_a^b fₙ ).

In any case, we will just take this to be the definition. When we need (or want) to show the variable of integration, this becomes

    ∫_a^b f(t) dt = ( ∫_a^b f₁(t) dt, …, ∫_a^b fₙ(t) dt ).

It is clear this integral is still linear and additive on intervals, by just applying the real case to each coordinate. The same is true for the Fundamental Theorem of Calculus. Let us state the "integrating a derivative" form.

Theorem. If F maps [a, b] into Rⁿ, F′ = f on [a, b], and f is Riemann integrable, then

    ∫_a^b f(t) dt = F(b) − F(a).

The result about the integral of the absolute value of an integrable function f is also true, but the proof is a little trickier, since f is vector-valued and |f| is real-valued.

Theorem. Let f : [a, b] → Rⁿ be Riemann integrable. Then |f| is Riemann integrable and

    | ∫_a^b f | ≤ ∫_a^b |f|.

Proof. The hypothesis is that each of the components f₁, …, fₙ is integrable, so each of the squares f₁², …, fₙ² is integrable and so is their sum. The square root function is continuous, so |f| = (Σ_i f_i²)^{1/2} is also integrable.

Put w_i = ∫_a^b f_i, so that w = (w₁, …, wₙ) is ∫_a^b f. Then,

    |w|² = Σ_i w_i² = Σ_i w_i ∫_a^b f_i = ∫_a^b Σ_i w_i f_i.

By the Cauchy-Schwarz inequality, for all t ∈ [a, b],

    Σ_i w_i f_i(t) ≤ |w| |f(t)|.

Thus,

    |w|² ≤ ∫_a^b |w| |f(t)| dt = |w| ∫_a^b |f|,

and the result follows by cancelling the |w|.
Rectifiable curves — arc length.

A continuous mapping γ of an interval [a, b] into Rⁿ is called a (parametrized) curve, because its range C = {γ(t) : t ∈ [a, b]} can be considered a geometric curve, traced out by the point γ(t) as t moves through [a, b]. The "distance traveled" by the point γ(t) is thought of as the length of the curve. But one should keep in mind that the same set C corresponds to many different maps, and these may have different lengths. If γ is one-to-one, it is called an arc. If γ(a) = γ(b), γ is called a closed curve.

For a firm definition of the length of a curve, associate to each partition P = {x₀, x₁, …, x_k} of [a, b] the number

    ℓ(γ, P) = Σ_{i=1}^k |γ(x_i) − γ(x_{i−1})|.

This is the sum of the distances between the points γ(x_i) and γ(x_{i−1}), so is the length of a polygonal path with vertices γ(x₀), γ(x₁), …, γ(x_k). We define the length of γ to be

    ℓ(γ) = sup{ℓ(γ, P) : P a partition of [a, b]}.

It is easy to see that the approximations ℓ(γ, P) increase as the partition P gets finer. If ℓ(γ) is finite, we call the curve γ rectifiable. A smooth curve γ is one which is continuously differentiable, that is, for which γ′ is continuous on the parameter interval.

Theorem. If γ : [a, b] → Rⁿ is a smooth curve, then γ is rectifiable and

    ℓ(γ) = ∫_a^b |γ′(t)| dt.

Proof. Let P = {x₀, x₁, …, x_k} be a partition of [a, b]. For each i we have, by the Fundamental Theorem of Calculus,

    |γ(x_i) − γ(x_{i−1})| = | ∫_{x_{i−1}}^{x_i} γ′(t) dt | ≤ ∫_{x_{i−1}}^{x_i} |γ′(t)| dt.

Summing over i gives

    ℓ(γ, P) ≤ ∫_a^b |γ′(t)| dt,

and taking the supremum over all such partitions yields

    ℓ(γ) ≤ ∫_a^b |γ′(t)| dt.

For the reverse inequality, let ε > 0, and use the fact that γ′ is uniformly continuous to choose δ > 0 such that |γ′(s) − γ′(t)| < ε whenever |s − t| < δ. Then,
INTEGRATION OF VECTOR-VALUED FUNCTIONS
203
choose any partition P = {x_0, . . . , x_k} of [a, b] with mesh ‖P‖ < δ. For each t ∈ [x_{i−1}, x_i], we have |γ′(t)| ≤ |γ′(x_i)| + ε, and integrating gives

∫_{x_{i−1}}^{x_i} |γ′(t)| dt ≤ |γ′(x_i)|Δx_i + εΔx_i = |∫_{x_{i−1}}^{x_i} γ′(x_i) dt| + εΔx_i
  = |∫_{x_{i−1}}^{x_i} (γ′(t) + (γ′(x_i) − γ′(t))) dt| + εΔx_i
  ≤ |∫_{x_{i−1}}^{x_i} γ′(t) dt| + ∫_{x_{i−1}}^{x_i} |γ′(x_i) − γ′(t)| dt + εΔx_i
  ≤ |γ(x_i) − γ(x_{i−1})| + 2εΔx_i.
Summing over i yields

∫_a^b |γ′(t)| dt ≤ ℓ(γ, P) + 2ε(b − a) ≤ ℓ(γ) + 2ε(b − a).

Since ε is arbitrary,

∫_a^b |γ′(t)| dt ≤ ℓ(γ),

which is all that was left to prove.
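As a numerical illustration (a quick Python sketch, not part of the text; the curve, a half-circle of length π, is an arbitrary choice), one can compute the polygonal sums ℓ(γ, P) for finer and finer uniform partitions and watch them increase toward ∫_a^b |γ′(t)| dt:

```python
import math

def gamma(t):
    # the upper half of the unit circle; a smooth curve of length pi
    return (math.cos(t), math.sin(t))

def polygonal_length(curve, a, b, k):
    # l(curve, P) for the uniform partition of [a, b] into k subintervals
    xs = [a + (b - a) * i / k for i in range(k + 1)]
    pts = [curve(x) for x in xs]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

for k in (4, 16, 64, 256):
    print(k, polygonal_length(gamma, 0.0, math.pi, k))
# the sums increase with k and approach pi = 3.14159...
```

As the theorem predicts, the sums increase with the fineness of the partition and are bounded above by the integral of |γ′| (here, π). (math.dist requires Python 3.8 or later.)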
Banach's Contraction Mapping Theorem

The result of the title, also known as Banach's Fixed Point Theorem, is valid in general complete metric spaces. It provides conditions under which a mapping has a fixed point; its proof gives an algorithm for finding that point. If X is a metric space and f : X → X, f is called a contraction mapping if there is a number c < 1 with d(f(x), f(y)) ≤ c d(x, y), for all x, y ∈ X.

Banach's Fixed Point Theorem. Let X be a complete metric space and let f : X → X be a contraction mapping. Then, there exists exactly one point x ∈ X such that f(x) = x.

Proof. Define a sequence of points (x_n) as follows. Let x_0 be any point of X and for n = 0, 1, 2, . . . , let x_{n+1} = f(x_n). Choose c < 1 so that d(f(x), f(y)) ≤ c d(x, y), for all x, y ∈ X. For n ≥ 1, we have d(x_n, x_{n+1}) = d(f(x_{n−1}), f(x_n)) ≤ c d(x_{n−1}, x_n). By induction, we obtain d(x_n, x_{n+1}) ≤ cⁿ d(x_0, x_1). If n < m, this yields

d(x_n, x_m) ≤ Σ_{i=n}^{m−1} d(x_i, x_{i+1}) ≤ (cⁿ + c^{n+1} + · · · + c^{m−1}) d(x_0, x_1) ≤ (cⁿ/(1 − c)) d(x_0, x_1).
Since cⁿ → 0, (x_n) is a Cauchy sequence. Since X is complete, (x_n) converges to some x ∈ X. Since f is a contraction mapping, it is (uniformly) continuous. Hence,

f(x) = lim_n f(x_n) = lim_n x_{n+1} = x,

as required. The uniqueness is immediate: if x and y are distinct fixed points, then d(x, y) = d(f(x), f(y)) ≤ c d(x, y) < d(x, y), a contradiction.
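The proof is constructive: iterating f from any starting point converges to the fixed point at the geometric rate cⁿ. A short Python sketch (the map cos is a convenient, arbitrarily chosen example, since it is a contraction on the complete space [0, 1]):

```python
import math

def fixed_point(f, x0, tol=1e-12, max_iter=1000):
    # Banach iteration: x_{n+1} = f(x_n), stopping when successive iterates
    # are within tol (recall d(x_n, x_{n+1}) <= c^n d(x_0, x_1) -> 0)
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge")

# cos maps [0, 1] into itself and |cos'(x)| = |sin x| <= sin 1 < 1 there,
# so it is a contraction mapping on [0, 1]
x = fixed_point(math.cos, 0.5)
print(x, math.cos(x))  # the two printed values agree: x is the fixed point
```

The starting point 0.5 is immaterial; by the theorem any x_0 in [0, 1] produces the same fixed point (approximately 0.7390851).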
Differentiation of vector functions of a vector variable

We now study the general case of mappings of a subset of Rⁿ into Rᵐ. These are often called transformations or vector-valued functions of several variables. We recall that for real functions defined on an interval of R, f was differentiable at a point c if and only if there was a straight line approximating f closely at c, and a similar result held for vector-valued functions of a real variable. This is the clue for the definition in the vector-to-vector case.

A mapping T of a vector space E_1 into another vector space E is called a linear transformation if, for all x, y ∈ E_1, and all scalars t,

T(x + y) = Tx + Ty
T(tx) = tTx.
Note that for linear maps, one often writes Tx instead of T(x).

Definition. Let G be an open set of Rⁿ and let f be a mapping into Rᵐ, defined (at least) on G. We say that f is differentiable at x_0 ∈ G, provided there exists a linear transformation T : Rⁿ → Rᵐ and a function ε : G → Rᵐ such that ε(x) → ε(x_0) = 0 and for all x ∈ G,

f(x) = f(x_0) + T(x − x_0) + ε(x)|x − x_0|.

If this is satisfied, T is denoted f′(x_0) or Df(x_0), and called the derivative of f at x_0. Of course the function ε is given by

ε(x) = (f(x) − f(x_0) − T(x − x_0)) / |x − x_0|,

for x ≠ x_0 (and 0, if x = x_0). It should be noticed that we had to use |x − x_0| instead of x − x_0, because we cannot divide by a vector. Another way we could write the defining condition is

f(x) = f(x_0) + f′(x_0)(x − x_0) + R(x),

where the remainder R(x) satisfies

lim_{x→x_0} |R(x)| / |x − x_0| = 0.

Such a term R(x) is said to be o(|x − x_0|) as x → x_0. At the risk of causing boredom, we also note that this could be written

f(x_0 + h) = f(x_0) + f′(x_0)h + R_1(h), where lim_{h→0} |R_1(h)| / |h| = 0.
As we shall confirm shortly, linear transformations are continuous, so it follows, as in the real setting, that differentiability implies continuity.

Theorem. If G is an open subset of Rⁿ and f : G → Rᵐ is differentiable at x_0, then f is continuous at x_0.

Connection with the usual definition for functions of a real variable. If T : R → R is a linear mapping, then its value at h ∈ R is Th = T(1)h. That is, if m = T(1), then Th = mh, ordinary multiplication. Conversely, every function of that form is linear. So, when we write f(x) = f(x_0) + f′(x_0)(x − x_0) + R(x), it doesn't matter whether we think of f′(x_0) as a number which is multiplied by x − x_0, or as a linear mapping, evaluated at x − x_0.

Similarly, if v is a vector of Rᵐ, then the mapping h ∈ R ↦ vh (multiplication of v by the scalar h) is a linear mapping. Conversely, if T : R → Rᵐ is a linear transformation and v is the vector T(1), then the value of T at h is again h(T(1)) = vh. Once more, then, it doesn't matter whether we think of f′(x_0)(x − x_0) as a vector multiplied by the scalar x − x_0 or as a linear transformation acting on x − x_0.

The case of real-valued functions of a vector variable. From elementary linear algebra, we learn:

Theorem. If T : Rⁿ → R, then T is a linear transformation if and only if there exists a = (a_1, . . . , a_n) ∈ Rⁿ such that

Tx = a_1 x_1 + a_2 x_2 + · · · + a_n x_n = a · x, for all x ∈ Rⁿ.

The linearity of such a map follows immediately from the properties of the dot product. For the converse, let e_1, e_2, . . . , e_n be the standard basis vectors of Rⁿ,

e_1 = (1, 0, 0, . . . , 0)
e_2 = (0, 1, 0, . . . , 0)
. . .
e_n = (0, 0, 0, . . . , 1),

so e_{ij} = 1 if i = j, and 0 otherwise. Then, each x ∈ Rⁿ is of the form

x = (x_1, . . . , x_n) = Σ_{i=1}^{n} x_i e_i,

and we see that

Tx = Σ_i x_i T e_i = (a_1, . . . , a_n) · (x_1, . . . , x_n) = a · x, where a_i = T(e_i), for all i.
Thus, if a function f on an open set G of Rⁿ with values in R is differentiable at x_0, then there is a vector a ∈ Rⁿ such that f′(x_0)h = a · h, for all h ∈ Rⁿ. The vector a here will turn out to be what is called the gradient of f at x_0. Often we think of the graph of the mapping f : G ⊂ Rⁿ → R, namely S = {(x, f(x)) : x ∈ G}, as an n-dimensional surface in Rⁿ⁺¹. The graph of the map x ↦ f(x_0) + f′(x_0)(x − x_0) then becomes a (hyper)plane tangent to S at (x_0, f(x_0)).
The space L(Rⁿ, Rᵐ). The identification of the derivative as a real number or as a vector with the derivative as a linear transformation becomes clearer when we notice that the set L(Rⁿ, Rᵐ) is itself a vector space under the usual operations

(S + T)(x) = S(x) + T(x)
(cT)(x) = c(T(x)).

The correspondences mentioned above are actually vector space isomorphisms. Thus, as vector spaces,
(1) R can be identified with L(R, R) under the isomorphism m ↦ T_m, where T_m h = mh, for all real h;
(2) Rᵐ can be identified with L(R, Rᵐ) under the correspondence v ↦ T_v, where T_v(h) = vh, multiplication by a scalar;
(3) Rⁿ can be identified with L(Rⁿ, R) under the correspondence a ↦ T_a, where T_a(x) = a · x, for all x ∈ Rⁿ.

Of course, we are not only interested in the algebraic properties of derivatives, but also in related distances. To handle that, we introduce the norm of a linear transformation T,

‖T‖ = sup_{|x| ≤ 1} |Tx|.
Notice that if x ≠ 0, then |x/|x|| = 1, so |T(x/|x|)| = |Tx|/|x|, and hence

|Tx| ≤ ‖T‖|x|.

This evidently holds also when x = 0. On the other hand, if K ≥ 0 satisfies |Tx| ≤ K|x| for all x ∈ Rⁿ, we have, by the definition of supremum as a least upper bound, that ‖T‖ ≤ K. Thus,

Lemma. For T ∈ L(Rⁿ, Rᵐ), ‖T‖ is the least K ≥ 0 such that |Tx| ≤ K|x|, for all x ∈ Rⁿ.

Now, let e_1, . . . , e_n be the standard basis vectors of Rⁿ, and x = (x_1, . . . , x_n) ∈ Rⁿ. Then x = Σ_{j=1}^{n} x_j e_j, so

|Tx| = |Σ_{j=1}^{n} x_j T e_j| ≤ Σ_{j=1}^{n} |x_j||T e_j| ≤ |x| (Σ_{j=1}^{n} |T e_j|²)^{1/2},

by the Cauchy-Schwarz inequality. This shows that

‖T‖ ≤ (Σ_{j=1}^{n} |T e_j|²)^{1/2} < +∞.

It follows that T is Lipschitz, hence uniformly continuous, because |T(x − y)| ≤ ‖T‖|x − y|, for all x, y ∈ Rⁿ.
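Numerically (a sketch assuming numpy is available, with an arbitrarily chosen matrix), the operator norm ‖T‖ is the largest singular value of the matrix of T, a standard fact from linear algebra, while the bound just derived, (Σ_j |Te_j|²)^{1/2}, is the Frobenius norm:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))  # the matrix of some T in L(R^4, R^3)

op_norm = np.linalg.norm(A, 2)  # sup over |x| <= 1 of |Tx|: largest singular value
frobenius = np.linalg.norm(A)   # (sum_j |T e_j|^2)^(1/2), the bound in the text
print(op_norm, frobenius)       # op_norm <= frobenius, as proved above
```

Running this with other matrices always shows op_norm ≤ frobenius, in agreement with the inequality established above.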
Theorem. The map T ↦ ‖T‖ turns L(Rⁿ, Rᵐ) into a normed space; that is, for all S, T ∈ L(Rⁿ, Rᵐ) and all scalars c,
(1) 0 ≤ ‖T‖ < ∞;
(2) ‖S + T‖ ≤ ‖S‖ + ‖T‖; and
(3) ‖cT‖ = |c|‖T‖.

One can check that in the three special cases above, the correspondences also preserve the norm, as follows.

Theorem.
(1) If m ∈ R and T_m h = mh, for all real h, then ‖T_m‖ = |m|.
(2) If v ∈ Rᵐ and T_v(h) = vh, multiplication of v by the scalar h, then ‖T_v‖ = |v|.
(3) If a ∈ Rⁿ and T_a(x) = a · x, for all x ∈ Rⁿ, then ‖T_a‖ = |a|.

Rules of differentiation. We first consider the case where f = T is already a linear mapping. Then, T(x) = T(x_0) + T(x − x_0) + 0, so T′(x_0) = T. Thus,

Theorem. Every linear transformation T : Rⁿ → Rᵐ is differentiable at each point x of Rⁿ, with T′(x) = T.

We emphasize that we are not saying T′ = T. In general, if f : G → Rᵐ, then f′ = Df is not a transformation of Rⁿ to Rᵐ, but rather a mapping that associates to each x ∈ G a linear transformation f′(x) ∈ L(Rⁿ, Rᵐ). In the case f = T is linear, the value of f′(x) is T, for all x.
Derivatives of constant maps are 0.

Theorem. Let f : G ⊂ Rⁿ → Rᵐ be defined by f(x) = v, for all x ∈ G. Then, f′(x) = 0, for all x ∈ G.

This is immediate from the definition and is left as an exercise.

The chain rule. Let G be an open subset of Rⁿ and g : G → Rᵐ be differentiable at x_0. Let f map an open set containing g(G) into Rᵏ, and let f be differentiable at u_0 = g(x_0). Then, f ∘ g is differentiable at x_0, with

(f ∘ g)′(x_0) = f′(u_0)g′(x_0) = f′(g(x_0))g′(x_0).

Here, if T = g′(x_0) and S = f′(g(x_0)), we are talking about the composite mapping S ∘ T = ST.
Proof. Since g is differentiable at x_0, there exists a function ε_1, continuous and 0 at x_0, with

g(x) = g(x_0) + g′(x_0)(x − x_0) + ε_1(x)|x − x_0|,    (1)

for all x ∈ G. Since f is differentiable at u_0, there exists a function ε_2, which is continuous and 0 at u_0 = g(x_0), with

f(u) = f(u_0) + f′(u_0)(u − u_0) + ε_2(u)|u − u_0|,    (2)

for all u in an open set containing g(G). Replacing u by g(x) in (2) gives

f(g(x)) = f(u_0) + f′(u_0)(g(x) − u_0) + ε_2(g(x))|g(x) − u_0|.

But u_0 = g(x_0), so by (1) we may replace g(x) − u_0 by g′(x_0)(x − x_0) + ε_1(x)|x − x_0|, yielding

f(g(x)) = f(g(x_0)) + f′(u_0)[g′(x_0)(x − x_0) + ε_1(x)|x − x_0|]
              + ε_2(g(x)) |g′(x_0)(x − x_0) + ε_1(x)|x − x_0||    (∗)
         = f(g(x_0)) + f′(u_0)g′(x_0)(x − x_0) + R(x),

where

R(x) = f′(u_0)ε_1(x)|x − x_0| + ε_2(g(x)) |g′(x_0)(x − x_0) + ε_1(x)|x − x_0||.

Thus, we need only show that the remainder R(x) satisfies |R(x)|/|x − x_0| → 0, as x → x_0. Well,

|g′(x_0)(x − x_0) + ε_1(x)|x − x_0|| ≤ ‖g′(x_0)‖|x − x_0| + |ε_1(x)||x − x_0|,

so

|R(x)|/|x − x_0| ≤ ‖f′(u_0)‖|ε_1(x)| + |ε_2(g(x))|(‖g′(x_0)‖ + |ε_1(x)|) → 0,

as required. Here we used the fact that g is continuous at x_0, and ε_2 is continuous and 0 at g(x_0) = u_0.

Remark. The fact that for linear transformations we write Tx instead of T(x) and ST instead of S ∘ T hides some subtleties in the computation above. For example, if S denotes f′(u_0), T denotes g′(x_0), and h denotes x − x_0, going from the first to the second line of (∗), we calculate

S(T(h) + ε_1(x)|h|) = S(T(h)) + S(ε_1(x)|h|) = (S ∘ T)(h) + S(ε_1(x))|h|,

where we have "taken out" the scalar |h|.

Theorem. Let f and g be functions defined and differentiable on an open set containing x_0 ∈ Rⁿ, with values in Rᵐ, and let c ∈ R. Then f + g and cf are differentiable at x_0, and (f + g)′(x_0) = f′(x_0) + g′(x_0) and (cf)′(x_0) = cf′(x_0).

Theorem. Let f = (f_1, . . . , f_m) map an open set G of Rⁿ to Rᵐ. Then f is differentiable at x iff each f_i is differentiable at x. In this case f′(x) = (f_1′(x), . . . , f_m′(x)).

Proof. Note that, for variety, we have fixed x in the domain of f.
Assume f_i is differentiable at x for each i = 1, . . . , m and let T_i = f_i′(x). Then, there exist ε_1, . . . , ε_m such that ε_i(h) → 0 as h → 0 and

f_i(x + h) − f_i(x) = T_i h + ε_i(h)|h|.

Let e_1, . . . , e_m be the standard basis vectors of Rᵐ. Then,

f(x + h) − f(x) = Σ_{i=1}^{m} (f_i(x + h) − f_i(x))e_i = Σ_{i=1}^{m} (T_i h)e_i + Σ_{i=1}^{m} ε_i(h)|h|e_i.

Thus, f(x + h) = f(x) + (T_1 h, . . . , T_m h) + ε(h)|h|, where ε(h) = (ε_1(h), . . . , ε_m(h)) → 0 as h → 0. Thus, f is differentiable at x and f′(x) is the linear transformation T whose components are T_1, . . . , T_m, as required.

The converse can be simply established by direct calculation, but it is fun to see that it follows by the chain rule from other results. Indeed, if f is differentiable at x, then f_i = π_i ∘ f, where π_i is the projection of Rᵐ onto its ith coordinate. Thus,

f_i′(x) = π_i′(f(x)) ∘ f′(x).

But π_i is linear, so π_i′(f(x)) = π_i, and we get f_i′(x) = π_i ∘ f′(x), the ith coordinate of f′(x).

Directional derivatives and partial derivatives. A lot of information about derivatives of a vector function f of a vector variable can be obtained from functions of a real variable, by looking at the behaviour of f along a straight line.

Definition. Let u be a non-zero element of Rⁿ, f a function defined in a neighbourhood of x ∈ Rⁿ with values in Rᵐ. The u-directional derivative of f at x, or derivative of f at x in the direction u, is

D_u f(x) = lim_{t→0} (f(x + tu) − f(x)) / t,
provided this limit exists.

Notice that this is the derivative at 0 of the function of a real variable f ∘ ℓ, where ℓ(t) = x + tu, for all t ∈ R, which parametrizes a straight line through the point x in the direction u. In the special case u = e_j, the jth basis vector, D_{e_j} f(x) is called the jth partial derivative at x and denoted D_j f(x) or ∂f(x)/∂x_j. Notice that, in this case,

D_j f(x) = lim_{t→0} (f(x_1, . . . , x_{j−1}, x_j + t, x_{j+1}, . . . , x_n) − f(x_1, . . . , x_n)) / t,

the derivative at x_j of the function s ↦ g(s) obtained from f by fixing all the x_k, for k ≠ j, and replacing x_j by the variable s.

Theorem. Let u be a non-zero element of Rⁿ, f a function defined in a neighbourhood of x ∈ Rⁿ with values in Rᵐ. Then the u-directional derivative of f at x is f′(x)u, the value of the linear transformation f′(x) at u.

Proof. Let ℓ(t) = x + tu, for all t ∈ R. Then, by the chain rule for differentiation of transformations,

D_u f(x)t = (f ∘ ℓ)′(0)t = (f′(ℓ(0)) ∘ ℓ′(0))(t) = f′(ℓ(0))(ℓ′(0)(t)).
But ℓ(0) = x, and ℓ′(0)t = ut, so we have D_u f(x)t = f′(x)(ut) = (f′(x)u)t. Thus, D_u f(x) = f′(x)u.

Notes. (1) The reason we put the t in the computation in the above proof is that we were using the chain rule for transformations (i.e. vector functions of a vector variable), and the directional derivative was defined as a vector, not as a linear transformation. Multiplying by t amounts to evaluating the corresponding linear transformation at t.

(2) It follows that if s is a scalar, D_{su} f(x) = sD_u f(x). Many people require that u be a unit vector and leave directional derivatives undefined otherwise; others define the directional derivative in terms of u/|u|. Of course, this means that formulas must be adjusted by the factor 1/|u|.

The gradient. In the case of a real-valued function f of a vector variable, we saw that there exists a vector a ∈ Rⁿ such that f′(x)h = a · h, for all h ∈ Rⁿ. We will now identify that a. If a = (a_1, . . . , a_n), then the jth coordinate of a is simply a_j = a · e_j, where e_j is the jth standard basis vector. Thus, in the present case,

a_j = f′(x)e_j = D_j f(x) = ∂f(x)/∂x_j,

and a = (D_1 f(x), . . . , D_n f(x)). The vector (D_1 f(x), . . . , D_n f(x)), defined whenever each of the partials D_1 f(x), . . . , D_n f(x) exists, is called the gradient of f at x and denoted ∇f(x) or grad f(x). Thus, if a real-valued f is differentiable at x, then

f′(x)h = grad f(x) · h, for all h ∈ Rⁿ.

Often, in this setting, one uses the notation dx = (dx_1, . . . , dx_n) instead of h, and this becomes

f′(x)dx = grad f(x) · dx = (∂f(x)/∂x_1)dx_1 + · · · + (∂f(x)/∂x_n)dx_n.

Example. The existence of the gradient does not imply differentiability — not even continuity. Let

f(x_1, x_2) = x_1 x_2 / (x_1² + x_2²), if (x_1, x_2) ≠ 0;  f(x_1, x_2) = 0, if (x_1, x_2) = 0.

Then, along the line x_2 = 0, f(x) is constantly 0, so D_1 f(0) = 0, and similarly D_2 f(0) = 0, so ∇f(0) = (0, 0); but as x → 0 along the line x_1 = x_2, f(x) → 1/2, so f is not continuous at 0.
You can check this using ε, δ arguments, or you can look at the same phenomenon by using a composition with the function ℓ : t ↦ t(1, 1), which describes the line x_1 = x_2:

lim_{t→0} f(t, t) = lim_{t→0} t²/(t² + t²) = 1/2.

In this example, we see that the derivative in the direction of (1, 1) does not exist. One might be tempted to believe that if all directional derivatives existed and were equal, then f would be differentiable, but this fails also.
Example. (The existence of all directional derivatives does not imply continuity.) Let f : R² → R be given by

f(x_1, x_2) = x_1²x_2 / (x_1⁴ + x_2²), if (x_1, x_2) ≠ 0;  f(x_1, x_2) = 0, if (x_1, x_2) = 0.

Fix u = (u_1, u_2) ≠ 0. If u_2 = 0, we get

D_u f(0, 0) = lim_{t→0} (t²u_1² · 0) / (t(t⁴u_1⁴ + 0)) = 0,

while if u_2 ≠ 0,

D_u f(0, 0) = lim_{t→0} t³u_1²u_2 / (t(t⁴u_1⁴ + t²u_2²)) = u_1²/u_2.

Thus, all directional derivatives exist at (0, 0). This implies that the restriction of f to each straight line through the origin is continuous at 0. Nevertheless, f is not continuous at 0, for if we follow the curve γ : t ↦ (t, t²), we have

lim_{t→0} f(γ(t)) = lim_{t→0} t⁴/(t⁴ + t⁴) = 1/2 ≠ 0.
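This counterexample is easy to probe numerically (a quick Python sketch, not part of the text): the difference quotients in the direction (1, 1) settle down to u_1²/u_2 = 1, while the values along the parabola (t, t²) stay at 1/2.

```python
def f(x1, x2):
    # the example above: all directional derivatives exist at the origin,
    # yet f is not continuous there
    if (x1, x2) == (0.0, 0.0):
        return 0.0
    return x1**2 * x2 / (x1**4 + x2**2)

for t in (1e-2, 1e-4, 1e-6):
    print((f(t, t) - f(0.0, 0.0)) / t)  # tends to u1^2/u2 = 1 for u = (1, 1)

for t in (1e-2, 1e-4, 1e-6):
    print(f(t, t**2))  # stays at 1/2 along the parabola, not f(0, 0) = 0
```

No finite sample of points proves the limits, of course, but the two trends match the computation in the example exactly.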
The matrix of the derivative. The Jacobian. We recall from Linear Algebra that every vector x ∈ Rⁿ has a column matrix representation

[x] = [ x_1
         ⋮
        x_n ],

where the x_j are the coordinates of x with respect to the standard basis: x = (x_1, . . . , x_n) = Σ_{j=1}^{n} x_j e_j. The mapping x ↦ [x] is a vector space isomorphism, and often one identifies x with [x]. Moreover, every linear transformation T : Rⁿ → Rᵐ has a matrix

[T] = [ a_11 . . . a_1n
         ⋮           ⋮
        a_m1 . . . a_mn ]

which satisfies [Tx] = [T][x] (matrix multiplication). The columns of [T] are the column representations of the image vectors Te_1, . . . , Te_n.

For the reader who has not seen that development, or would like a review, here is a brief version. Let e_1, . . . , e_n denote the standard basis vectors in Rⁿ, and let e_1, . . . , e_m be the standard basis vectors in Rᵐ. Then, x = Σ_{j=1}^{n} x_j e_j, and

Tx = Σ_{j=1}^{n} x_j T e_j.

For each j, Te_j = Σ_{i=1}^{m} a_ij e_i, for some numbers a_ij, i = 1, . . . , m. Thus,

Tx = Σ_j x_j Σ_{i=1}^{m} a_ij e_i = Σ_{i=1}^{m} (Σ_{j=1}^{n} a_ij x_j) e_i.

Thus, the coordinates of Tx are Σ_{j=1}^{n} a_ij x_j. In terms of matrix multiplication this says

[Tx] = [ a_11 . . . a_1n     [ x_1
          ⋮           ⋮        ⋮
         a_m1 . . . a_mn ]    x_n ].

Now, in case T is f′(x), the derivative at x of a transformation taking a neighbourhood of x ∈ Rⁿ to Rᵐ, for each j, f′(x)e_j is the e_j-directional derivative; that is, the partial derivative D_j f(x). Its coordinates with respect to e_1, . . . , e_m are just obtained by differentiating the coordinates of f at x. Thus, the ith coordinate of f′(x)e_j is D_j f_i(x), and the matrix [f′(x)] which represents the linear transformation f′(x) is

[f′(x)] = [ D_1 f_1(x) . . . D_n f_1(x)
              ⋮                  ⋮
            D_1 f_m(x) . . . D_n f_m(x) ].

This matrix is often called the Jacobian matrix of f at x. When m = n, the matrix is invertible if and only if its determinant is non-zero. This determinant det[f′(x)] is called the Jacobian of f at x, sometimes denoted ∂(f_1, . . . , f_m)/∂(x_1, . . . , x_n).
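Since each column of the Jacobian matrix is a partial derivative, it can be approximated column by column from the difference quotients (a numpy sketch; the sample map, point, and step size are arbitrary illustrative choices):

```python
import numpy as np

def f(x):
    # a sample map f : R^2 -> R^2, (r, th) |-> (r cos th, r sin th)
    r, th = x
    return np.array([r * np.cos(th), r * np.sin(th)])

def jacobian(f, x, h=1e-6):
    # column j approximates D_j f(x) by a forward difference quotient
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        cols.append((f(x + e) - f(x)) / h)
    return np.column_stack(cols)

x = np.array([2.0, 0.3])
J = jacobian(f, x)
print(J)                 # close to [[cos th, -r sin th], [sin th, r cos th]]
print(np.linalg.det(J))  # the Jacobian determinant, approximately r = 2
```

For this map the exact Jacobian determinant is r, so the printed determinant is near 2 at the chosen point.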
Continuous differentiability. For a function f : G ⊂ Rⁿ → Rᵐ, G open, f is continuously differentiable at a if f′ exists in a neighbourhood of a and f′ is continuous at a as a map into L(Rⁿ, Rᵐ), with the distance given by the operator norm: d(T, S) = ‖T − S‖.

Recall that if e_1, . . . , e_n are the standard basis vectors in Rⁿ, we showed that ‖T‖ ≤ (Σ_j |Te_j|²)^{1/2}. If [T] = (a_ij), the right-hand side here is (Σ_j Σ_i a_ij²)^{1/2}. Thus, if T and S are linear transformations with matrices A = (a_ij) and B = (b_ij), then

‖T − S‖ ≤ (Σ_{ij} (a_ij − b_ij)²)^{1/2}.    (∗∗∗)
Even though existence of all the partials doesn't imply differentiability in general, it does so if the partials are continuous.

Theorem. For a function f defined in a neighbourhood of a ∈ Rⁿ with values in Rᵐ, f is continuously differentiable at a if and only if for all i, j the partial derivative D_j f_i is continuous at a.

Proof. ( =⇒ ) Let f be continuously differentiable at a. Then for all x in a neighbourhood of a, each partial derivative exists, with D_j f_i(x) = (f′(x)e_j) · e_i, where e_i denotes the ith basis vector in Rᵐ. Thus,

|D_j f_i(x) − D_j f_i(a)| = |f′(x)e_j · e_i − f′(a)e_j · e_i| = |(f′(x) − f′(a))e_j · e_i| ≤ ‖f′(x) − f′(a)‖ |e_j| |e_i|.

As x → a, f′(x) → f′(a), by continuity of f′, so D_j f_i is continuous at a.

( ⇐= ) Conversely, assume the partial derivatives are continuous at a. Once we show differentiability, the continuity of the derivative will follow from

‖f′(x) − f′(a)‖ ≤ (Σ_{ij} (D_j f_i(x) − D_j f_i(a))²)^{1/2},

which is the application of (∗∗∗) to this situation. Without loss of generality, assume m = 1. Let ε > 0. Choose δ > 0 so small that for all j,

|D_j f(x) − D_j f(a)| < ε/n, for |x − a| < δ.

Take h ∈ Rⁿ with |h| < δ. Put x_0 = a and x_k = a + Σ_{j=1}^{k} h_j e_j (so x_n = a + Σ_{j=1}^{n} h_j e_j = a + h). Then,

f(a + h) − f(a) = Σ_{j=1}^{n} (f(x_j) − f(x_{j−1})).

By the Mean Value Theorem (for real functions of a real variable),

f(x_j) − f(x_{j−1}) = D_j f(c_j)h_j,
for some c_j on the line from x_{j−1} to x_j. Thus,

f(a + h) − f(a) − Σ_{j=1}^{n} D_j f(c_j)h_j = 0,

so

|f(a + h) − f(a) − Σ_{j=1}^{n} D_j f(a)h_j| = |Σ_{j=1}^{n} D_j f(c_j)h_j − Σ_{j=1}^{n} D_j f(a)h_j|
  ≤ Σ_j |D_j f(c_j) − D_j f(a)||h_j|
  ≤ ε|h|.

Thus, by definition, f is differentiable at a with f′(a)h = Σ_j D_j f(a)h_j = ∇f(a) · h.
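The conclusion can be checked numerically: for a function with continuous partials, the remainder |f(a + h) − f(a) − ∇f(a) · h|/|h| shrinks with |h| (a Python sketch; the smooth sample function, base point, and direction are arbitrary choices):

```python
import math

def f(x, y):
    # smooth, so its partials are continuous and the theorem applies
    return math.exp(x) * math.sin(y)

def grad_f(x, y):
    return (math.exp(x) * math.sin(y), math.exp(x) * math.cos(y))

a = (0.5, 1.0)
g = grad_f(*a)

def remainder_ratio(s):
    # |R(h)| / |h| for h = s * (1, -2)
    h = (s, -2.0 * s)
    lin = g[0] * h[0] + g[1] * h[1]
    return abs(f(a[0] + h[0], a[1] + h[1]) - f(*a) - lin) / math.hypot(*h)

for s in (1e-1, 1e-2, 1e-3, 1e-4):
    print(remainder_ratio(s))  # decreases roughly in proportion to |h|
```

The ratio shrinks roughly linearly in |h|, which is the quadratic error one expects from Taylor's theorem for a twice-differentiable function.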
The Inverse Function Theorem

For real functions of a real variable, we learned that if f′(x) ≠ 0 for all x in an open interval I, then f is strictly monotone, hence injective, so the inverse function f⁻¹ is defined everywhere on f(I), and this inverse function itself is also differentiable. If we knew only that f′(a) were non-zero at one point a, and if we assumed that f′ were continuous, then we could still find a neighbourhood U of a such that f′(u) ≠ 0 for all u ∈ U, so the result would still hold for f restricted to U. This is the version of the result we will develop for the higher-dimensional case. The general idea of the theorem is that a derivative gives a local approximation to a function at a point; if the derivative at the point is invertible, then the function is also invertible (near that point). The significance of the condition f′(a) ≠ 0 is the invertibility of f′(a). That is the condition we will have to impose in the general case.

Inverse Function Theorem. Let f be a continuously differentiable mapping of an open subset G of Rⁿ to Rⁿ. If a ∈ G with f′(a) invertible and b = f(a), then
(1) there exist open sets U and V such that a ∈ U, b ∈ V, and f maps U one-to-one onto V;
(2) if g is the inverse of the restriction of f to U, then g is continuously differentiable on V and g′(y) = [f′(g(y))]⁻¹.
Note. Unlike the case of functions on R to R, there is no hope for invertibility on all of G, even if f′(x) is invertible for all x ∈ G. You can check this by looking at the mapping f : R² → R² defined by f(x, y) = (eˣ cos y, eˣ sin y). This is actually the exponential function, viewed as a map on R² to R² instead of C to C. It is continuously differentiable at every point and its derivative (in the sense of transformations) is invertible, but f is far from one-to-one. Its inverse, "the complex logarithm", has infinitely many "branches".

Theorem. Let Ω be the set of all invertible linear transformations of Rⁿ to itself.
(1) If T ∈ Ω and S ∈ L(Rⁿ, Rⁿ) with ‖S − T‖ < 1/‖T⁻¹‖, then S is also invertible, so Ω is open.
(2) The map T ↦ T⁻¹ on Ω onto itself is continuous.

Proof. (1) Let T be invertible and α = 1/‖T⁻¹‖. Then

|x| = |T⁻¹Tx| ≤ ‖T⁻¹‖|Tx| = ‖T⁻¹‖|(T − S)x + Sx| ≤ ‖T⁻¹‖(‖T − S‖|x| + |Sx|).
Thus,

(α − ‖T − S‖)|x| ≤ |Sx|.    (∗)

Now, for all S with ‖S − T‖ < α, we deduce from (∗) that Sx = 0 implies x = 0, so S is invertible. Thus, Ω is open.

(2) Replacing x by S⁻¹y in (∗) gives

|S⁻¹y| ≤ |y| / (α − ‖T − S‖),

so that ‖S⁻¹‖ ≤ 1/(α − ‖T − S‖) < 2/α = 2‖T⁻¹‖, provided ‖T − S‖ < α/2. Also,

‖S⁻¹ − T⁻¹‖ = ‖S⁻¹TT⁻¹ − S⁻¹ST⁻¹‖ ≤ ‖S⁻¹‖‖T − S‖‖T⁻¹‖ ≤ 2‖T⁻¹‖²‖T − S‖,

when ‖T − S‖ < α/2. It follows that the inversion map is continuous at T.
Mean Value Inequality. Let U be an open subset of Rⁿ and f : U → Rᵐ, with ‖f′(x)‖ ≤ K, for all x ∈ U. If U contains the line segment from a to b, then

|f(b) − f(a)| ≤ K|b − a|.

If U is convex, that is, contains the line segment joining each pair of its points, then the result entails that f is Lipschitz.

Proof. We have proved elsewhere the corresponding result for vector functions of a real variable. We will reduce the present situation to that case. Let γ(t) = (1 − t)a + tb, so that as t traverses the interval [0, 1], γ(t) traverses the line segment from a to b. Let g = f ∘ γ. According to the chain rule, g′(t) = f′(γ(t))γ′(t) = f′(γ(t))(b − a). Then,

|g′(t)| ≤ ‖f′(γ(t))‖|b − a| ≤ K|b − a|,

so according to the real-variable Mean Value Inequality, |g(1) − g(0)| ≤ K|b − a||1 − 0|; that is, |f(b) − f(a)| ≤ K|b − a|, as required.

We are now ready for the proof of the inverse function theorem.
Proof of the Inverse Function Theorem. Let f′(a) be invertible and denote it by T. Since f′ is continuous at a, there exists an open ball U centred at a with

‖f′(x) − T‖ ≤ 1/(2‖T⁻¹‖), for all x ∈ U.

Then, we see that f′(x) is also invertible, though we won't make use of that yet. For a fixed y ∈ Rⁿ, define for all x ∈ U,

ϕ(x) = ϕ_y(x) = x + T⁻¹(y − f(x)).

Notice that y = f(x) if and only if ϕ(x) = x; that is, if and only if x is a fixed point of ϕ. Differentiating gives ϕ′(x) = I + T⁻¹(−f′(x)) = T⁻¹(T − f′(x)); hence,

‖ϕ′(x)‖ ≤ ‖T⁻¹‖‖T − f′(x)‖ ≤ 1/2.

Since U is convex, we can use the Mean Value Inequality, obtaining

|ϕ(x_1) − ϕ(x_2)| ≤ ½|x_1 − x_2|,    (C)

for all x_1, x_2 ∈ U, showing that ϕ is a contraction mapping. It follows that ϕ can have at most one fixed point. Thus, there is at most one point x with y = f(x). This shows that f is one-to-one on U.

Let V = f(U). To show that V is also open, let y_0 ∈ V. Let x_0 be such that f(x_0) = y_0. Choose r > 0 so small that the closed ball B(x_0, r) is contained in U. We will show that V contains the ball centred at y_0 of radius r/(2‖T⁻¹‖). So, let |y − y_0| ≤ r/(2‖T⁻¹‖). Using the contraction mapping ϕ = ϕ_y defined above, we compute

|ϕ(x_0) − x_0| = |T⁻¹(y − f(x_0))| ≤ ‖T⁻¹‖|y − f(x_0)| ≤ r/2,

so that for x ∈ B(x_0, r),

|ϕ(x) − x_0| ≤ |ϕ(x) − ϕ(x_0)| + |ϕ(x_0) − x_0| ≤ ½|x − x_0| + r/2 ≤ r.

The closed ball B(x_0, r) is a complete metric space, since it is closed and Rⁿ is complete. Thus, by the Contraction Mapping Theorem (Banach's fixed point theorem), ϕ has a fixed point x. Thus, f(x) = y, so that y ∈ V. This completes the proof that V is open.

Now, let g be the inverse of the restriction of f to U. Our job is to show that g is continuously differentiable in V, with the expected formula for its derivative. The inequality (C) doesn't actually depend on the particular y. Indeed, ϕ(x_1) − ϕ(x_2) = x_1 − x_2 − T⁻¹(f(x_1) − f(x_2)), so (C) becomes

|x_1 − x_2 − T⁻¹(f(x_1) − f(x_2))| ≤ ½|x_1 − x_2|,
and hence

½|x_1 − x_2| ≤ |T⁻¹(f(x_1) − f(x_2))|.

For y_1, y_2 ∈ V, we may replace x_1 and x_2 by g(y_1), g(y_2), obtaining

|g(y_1) − g(y_2)| ≤ 2|T⁻¹(y_1 − y_2)| ≤ 2‖T⁻¹‖|y_1 − y_2|.

Now, to show g is differentiable at any point y_0 ∈ V, let y ∈ V also, and put x_0 = g(y_0), x = g(y), S = [f′(x_0)]⁻¹. Then, there is ε(x) → ε(x_0) = 0 with

g(y) − g(y_0) − S(y − y_0) = x − x_0 − S(y − y_0)
  = −S(y − y_0 − f′(x_0)(x − x_0))
  = −S(f(x) − f(x_0) − f′(x_0)(x − x_0))
  = −S(ε(x)|x − x_0|)
  = −S(ε(g(y)))|g(y) − g(y_0)|.

As y → y_0, −S(ε(g(y))) → 0 and |g(y) − g(y_0)| ≤ 2‖T⁻¹‖|y − y_0|, so this shows g is differentiable at y_0, with derivative g′(y_0) = S = [f′(g(y_0))]⁻¹. Finally, g is continuous, f′ is continuous by hypothesis, and the inversion map T ↦ T⁻¹ is continuous on the set Ω of all invertible linear operators on Rⁿ, so g′ — which is the composite of these three — is continuous on V, so g is continuously differentiable, and we are done.

Corollary. If f is a continuously differentiable mapping of an open set G of Rⁿ into Rⁿ, with f′(x) invertible for all x, then f is an open mapping; that is, f(W) is open, for each open subset W of G.

The proof is an exercise.
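The proof is again constructive: to solve f(x) = y near a, one iterates the contraction ϕ(x) = x + T⁻¹(y − f(x)) with T = f′(a). A numpy sketch using the map from the Note above, where f′(0, 0) is the identity (the target point and iteration count are arbitrary choices):

```python
import numpy as np

def f(v):
    # the map from the Note: (x, y) |-> (e^x cos y, e^x sin y)
    x, y = v
    return np.array([np.exp(x) * np.cos(y), np.exp(x) * np.sin(y)])

a = np.array([0.0, 0.0])
T_inv = np.eye(2)  # f'(a) is the identity at a = (0, 0), so T^{-1} = I

def local_inverse(y_target, x0=a, steps=60):
    # iterate the contraction from the proof: phi(x) = x + T^{-1}(y - f(x))
    x = x0
    for _ in range(steps):
        x = x + T_inv @ (y_target - f(x))
    return x

y = np.array([1.1, 0.2])   # a point near b = f(a) = (1, 0)
x = local_inverse(y)
print(x, f(x))             # f(x) reproduces y to high accuracy
```

Here the exact preimage is known, x = (½ log(1.1² + 0.2²), arctan(0.2/1.1)), so the iteration can be checked against it; far from a, where ϕ stops being a contraction, the scheme may fail, exactly as the local character of the theorem predicts.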
The Implicit Function Theorem

This is still to be included.
Countability

Recall that a function f : A → B is called one-to-one or an injection if f(a) = f(a′) implies a = a′. It is called onto or a surjection if, for each b ∈ B there exists a ∈ A with f(a) = b, and is called a bijection if it is both an injection and a surjection (one-to-one and onto).

Definition. Two sets A and B are called equinumerous if there exists a bijection f : A → B. Notation: A ↔ B. We also say A is equinumerous with B, or that A and B are in one-to-one correspondence.

Example. The set N = {1, 2, . . . } is equinumerous with {m ∈ N : m ≥ 2} = {2, 3, 4, . . . }.

The idea of this example is basic and IMPORTANT; it is used over and over again! Let us put A = {m ∈ N : m ≥ 2}. Since we have no theorems to use yet, to show N ↔ A, we must find a bijection from N onto A. We simply let f(n) = n + 1, for n ∈ N. Certainly f : N → A, because if n ∈ N then n + 1 ≥ 1 + 1 = 2. To show that f is injective (i.e. one-to-one), suppose f(n) = f(n′). Then n + 1 = n′ + 1, so n = n′. To prove f is surjective, that is, f maps onto A, let m ∈ A. Then m ∈ N and m > 1, so m − 1 ∈ N (one of our first theorems about the natural numbers) and f(m − 1) = m − 1 + 1 = m.
Example. The set of natural numbers is equinumerous with the set of even natural numbers. To see this, recall that a natural number m is called even if it is divisible by 2. That, in turn, means that there exists n ∈ N such that m = 2n. Thus, the set of even numbers is exactly {2n : n ∈ N} = 2N. A bijection on N to 2N is given by f(n) = 2n, for n ∈ N. Certainly, for all n ∈ N, f(n) is even, and we have just checked that f is onto. To see that f is one-to-one, suppose f(n_1) = f(n_2). Then 2n_1 = 2n_2, so n_1 = n_2, as required.
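The two checks (one-to-one and onto) can be mirrored on a finite window of N (a toy Python illustration; no finite computation proves the statement about all of N, of course):

```python
def f(n):
    # the bijection n |-> 2n of N onto the even natural numbers
    return 2 * n

window = range(1, 101)  # the finite window {1, ..., 100} of N
evens = [m for m in range(1, 201) if m % 2 == 0]

image = [f(n) for n in window]
print(image == evens)                # onto the evens up to 200: True
print(len(set(image)) == len(image)) # one-to-one (no repeated values): True
```

Both checks print True, mirroring the surjectivity and injectivity arguments in the example.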
Equinumerousness is an equivalence relation on the class of all sets. That is:
Lemma. For sets A, B, C:
(a) A ↔ A.
(b) A ↔ B implies B ↔ A.
(c) A ↔ B and B ↔ C imply A ↔ C.

Proof. (a) For any set A, the identity mapping defined by f(x) = x, for all x ∈ A, is a bijection. Indeed, if f(x) = f(x′) then x = x′, just by definition (so f is an injection), and if x ∈ A then x = f(x) (so f is a surjection).

(b) Suppose f : A → B is a bijection. Then f⁻¹ : B → A is also a bijection. Let us review the proof of this. From the definitions, f(x) = y iff x = f⁻¹(y). [For a given y in the range of f, the fact that f is one-to-one gives us exactly one x such that f(x) = y, and this x is defined to be f⁻¹(y).]
Now if y, y′ ∈ B and f⁻¹(y) = f⁻¹(y′), then f(f⁻¹(y)) = f(f⁻¹(y′)), so y = y′. Thus, f⁻¹ is injective. To see that it is surjective, let x ∈ A. Then f(x) is some point y of B, and thus x = f⁻¹(y).

(c) is left as an exercise.

Lemma. The only set equinumerous with ∅ is itself.

Proof. Suppose A ↔ ∅. Then there is a bijection f : A → ∅. If A ≠ ∅, choose a ∈ A; then f(a) ∈ ∅, which is impossible.

Lemma. (Pigeonhole Principle) For natural numbers m and n, if m < n, then there is no injection f : {1, . . . , n} → {1, . . . , m}.

Proof. We prove this by induction on m. Thus the statement on which we do the induction is

P(m): For all n ∈ N, if m < n then there is no injection f : {1, . . . , n} → {1, . . . , m}.

This is true for m = 1, because if n ∈ N and f : {1, . . . , n} → {1} is one-to-one, then f(n) = f(1), which implies n = 1. Assume it is true for m. Let n ∈ N and n > m + 1. Suppose there were an injection f : {1, . . . , n} → {1, . . . , m + 1}. Since there is no injection of {1, . . . , n} into {1, . . . , m}, there must be a p ∈ {1, . . . , n} with f(p) = m + 1. Fix such a p and put g(k) = f(k), for k < p, and g(k) = f(k + 1), for k = p, . . . , n − 1. Then g injects {1, . . . , n − 1} into {1, . . . , m}, which is a contradiction. Thus, no such f exists: the statement is true for m + 1. By the PMI, P(m) holds for all m ∈ N. That is, for all m, n ∈ N, if m < n, then there is no injection f : {1, . . . , n} → {1, . . . , m}.
Corollary. There is no injection of {1, ..., n} onto a proper subset of itself.

Proof. Suppose h were an injection on {1, ..., n} to itself which is not surjective. Since h cannot map {1, ..., n} into {1, ..., n − 1}, there is some i with h(i) = n, and some other k ∈ {1, ..., n} \ range h. Define

f(j) = h(j), if j ≠ i;  f(j) = k, if j = i.

Then f injects {1, ..., n} into {1, ..., n − 1}, which is impossible.
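The Pigeonhole Principle can also be confirmed by brute force for small m < n: among all mⁿ functions from {1, ..., n} to {1, ..., m}, none is injective. Here is a Python sketch (the function name has_injection is ours, for illustration).

```python
# Brute-force check of the Pigeonhole Principle for small sets:
# no function {1, ..., n} -> {1, ..., m} with m < n is injective.
from itertools import product

def has_injection(n, m):
    """True iff some function {1, ..., n} -> {1, ..., m} is injective."""
    codomain = range(1, m + 1)
    for values in product(codomain, repeat=n):   # all m**n functions
        if len(set(values)) == n:                # all n values distinct
            return True
    return False

for m in range(1, 5):
    for n in range(m + 1, 6):
        assert not has_injection(n, m)   # m < n: no injection exists
assert has_injection(3, 3)               # m >= n: the identity works
```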
Definition. The empty set is said to have 0 elements. For n ∈ N, A is said to have n elements if A ↔ {1, ..., n}. We prove now that this n is unique. It is called the cardinality of A (or the number of elements in A), denoted card(A) or #(A).

Theorem. If n, m ∈ N and A ↔ {1, ..., n} and A ↔ {1, ..., m}, then m = n. In other words, if A has m elements and A has n elements, then m = n.

Proof. Let n, m ∈ N, and suppose A ↔ {1, ..., n} and A ↔ {1, ..., m}. Then, from the equivalence relation properties of ↔, {1, ..., n} ↔ {1, ..., m}. If m ≠ n, one must be smaller; say m < n. But, by the Pigeonhole Principle, there can be no injection of {1, ..., n} into {1, ..., m}, so no bijection. This is a contradiction, yielding m = n.

Definition. A set is called
finite, if it has n elements for some n ∈ W = N ∪ {0};
infinite, if it is not finite;
denumerable, if it is equinumerous with N;
countable, if it is finite or denumerable.
Theorem. (Also referred to as the Pigeonhole Principle) If m, n ∈ N, A has n elements, B has m elements, and m < n, then there is no injection of A into B.

Proof. Exercise.

We have shown that the set {2, 3, ...} is denumerable, as is the set of even numbers. The reader should check that:
(•) the set of odd numbers is denumerable, and
(•) so is the set Z of integers {..., −2, −1, 0, 1, 2, ...}.
Theorem. (1) The set N is infinite; hence, all denumerable sets are infinite.
(2) Equivalent for a set A are:
(a) A is infinite;
(b) A contains a denumerable subset;
(c) A is equinumerous with a proper subset of itself.

Proof. (1) If N were not infinite, then since it is not ∅, there would be a bijection f : N → {1, ..., n}, for some n ∈ N. But then, if we let g be the restriction of f to {1, ..., n + 1}: g(x) = f(x), for x ∈ {1, ..., n + 1}, then g is an injection of {1, ..., n + 1} into {1, ..., n}, which violates the Pigeonhole Principle.

For the second statement, suppose A is denumerable, but finite. Then N ↔ A. If A were empty, we would have N = ∅, and if n ∈ N with A ↔ {1, ..., n}, we would have N ↔ {1, ..., n}, which cannot happen since N is infinite.

(2) ((a) =⇒ (b)) Assume (a) holds; that is, A is infinite. Then A is not empty, so we may choose a point x1 ∈ A. Suppose x1, ..., xk have been chosen as distinct elements of A (that is, with xi ≠ xj if i ≠ j). Then {x1, ..., xk} ⊂ A, and A ≠ {x1, ..., xk}, otherwise A would be finite. Thus

A \ {x1, ..., xk} ≠ ∅,

and we may choose another xk+1 ∈ A \ {x1, ..., xk}. Thus, x1, ..., xk+1 are distinct elements of A. By recursion (definition by induction), we thus have proved the existence of a sequence of elements xi ∈ A, i ∈ N, such that xi ≠ xj if i ≠ j. (Indeed, if i < j, all the elements x1, ..., xj are distinct.) This says that the map g : i ↦ xi is one-to-one on N into A. Let B be the range of g, that is, B = {xi : i ∈ N}. Then g is onto B, so g is a bijection of N onto B. That is, B is a denumerable subset of A.

((b) =⇒ (c)) Suppose B is a denumerable subset of A, say i ↦ xi is an injection on N onto B. We now define a map f on A by

f(x) = x, if x ∈ A \ B;  f(x) = xi+1, if x = xi for some i ∈ N.
This f is 1-1 on A onto A \ {x1}, a proper subset of A. Indeed, first notice that if x ∈ A, then either x ∈ A \ B, in which case f(x) = x cannot be x1, or x is some xi ∈ B, and f(x) = xi+1. But xi+1 cannot be x1, otherwise i + 1 = 1, since i ↦ xi is one-to-one; this is impossible, since each natural number is ≥ 1. This proves the range of f is contained in A \ {x1} (which is all we really need to say that f maps to a proper subset of A; but we go on anyway, since the technique of proof is of use elsewhere). If a ∈ A \ {x1}, either a ∈ A \ B, so f(a) = a ∈ A \ {x1}, or a = xi+1 for some i, and then a = f(xi). This shows the range of f is exactly A \ {x1}.

To show f is injective, we let f(x) = f(x′) and show x = x′. There are three cases: both x, x′ ∈ A \ B; both are in B; or one is in A \ B and the other is in B. In the first case, f(x) = x and f(x′) = x′, and f(x) = f(x′), so x = x′. In the second case, x = xi and x′ = xj, for some i, j ∈ N. Thus f(x) = xi+1 and f(x′) = xj+1, so xi+1 = xj+1. But the various xk are distinct, so i + 1 = j + 1, and therefore i = j. The third case cannot occur, since then (say) f(x) ∈ A \ B and f(x′) ∈ B.

((c) =⇒ (a)) Let A be equinumerous with a proper subset B of itself. (Thus B ⊂ A, A \ B ≠ ∅, and A ↔ B.) Suppose A were finite. Then A ≠ ∅, since A \ B ≠ ∅, so A ↔ {1, ..., n} for some n ∈ N. Fix such an n. Let f : A → B be a bijection and g : {1, ..., n} → A a bijection. Then g⁻¹ ◦ f ◦ g is still one-to-one (why?). Moreover, it maps {1, ..., n} onto a proper subset of itself, which is impossible by the corollary to the Pigeonhole Principle. Hence, A is infinite.

In detail, since B is a proper subset of A, there exists an a ∈ A \ B. Then there is an i with g(i) = a. If there existed j with g⁻¹ ◦ f ◦ g(j) = i, we would have f(g(j)) = g(i) = a, which contradicts the fact that f maps into B. Thus the range of g⁻¹ ◦ f ◦ g is not all of {1, ..., n}.
Example. If a < b, then [a, b] is infinite. One proof of this would use (b). For each n ∈ N, let xn = a + (b − a)/n. Then the set {xn : n ∈ N} is denumerable and contained in [a, b].
Another proof could use (c): Let c = (a + b)/2. Then [a, c] is a proper subset of [a, b], yet the function on [a, c] defined by f(x) = 2(x − a) + a maps [a, c] bijectively onto [a, b].
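The bijection of [a, c] onto [a, b] is easy to check numerically; here is a Python sketch (the particular values of a and b are ours, chosen for illustration).

```python
# Illustration: with c = (a + b)/2, the map f(x) = 2(x - a) + a
# carries [a, c] bijectively onto [a, b].
a, b = 1.0, 5.0
c = (a + b) / 2

def f(x):
    return 2 * (x - a) + a

def f_inv(y):
    return (y - a) / 2 + a       # the inverse map, so f is a bijection

assert f(a) == a and f(c) == b   # endpoints go to endpoints
for x in [1.0, 1.25, 2.0, 2.5, 3.0]:   # sample points of [a, c]
    assert a <= f(x) <= b
    assert abs(f_inv(f(x)) - x) < 1e-12   # f_inv undoes f
```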
For convenience of reference, we now list the facts about countability most commonly used in Analysis in the form of four theorems, and give the proofs afterward.

A Theorem. (1) Every subset of a finite set is finite.
(2) Every subset of a countable set is countable.
(3) The image under any map of a finite set is finite.
(4) The image under any map of a countable set is countable.

B Theorem. For a non-empty set A, equivalent are:
(a) A is countable.
(b) There is a surjection f : N → A.
(c) There is an injection g : A → N.

C Theorem. The following are countable:
N × N, hence the product of two countable sets;
the union of a countable number of countable sets;
the integers;
the rationals;
the algebraic numbers.

D Theorem. Each interval [a, b] in R with a < b is uncountable; hence so is R itself.
Proof of A(1). Let A be finite and B ⊂ A. If B is infinite, then it contains a denumerable subset D. But then D ⊂ A also, so A is also infinite.

Proof of A(2). (Every subset of a countable set is countable.) Let A be countable and B ⊂ A. If A is finite, then so is B, by (1). So we may assume A is denumerable. Also, if B is finite we are done, so we also assume B is infinite. Let A = {a1, a2, ...}, where the an are distinct. Now B is infinite, so the set M := {n ∈ N : an ∈ B} is not empty; in fact it is infinite (why?). By the Well-ordering Property of N, M has a least element. So let n1 be the first n in M. Thus

(∗) i < n1 implies i ∉ M.

Continuing recursively, suppose n1, ..., nk have been chosen in M; we choose nk+1 to be the first element n of M \ {n1, ..., nk}. This defines a map k ↦ nk of N to M. At each stage,

(∗∗) if n ≤ nk and n ∈ M, then n ∈ {n1, ..., nk}.

Thus nk+1 > nk, and we have that k ↦ nk is one-to-one on N onto M; hence k ↦ a_{n_k} is one-to-one on N onto B.

Here are some details of the last paragraph. The statement (∗∗) is proved by induction on k. It is true for k = 1, since n1 is the first element of M (see (∗)). So suppose the statement is true for k. Then the elements of M \ {n1, ..., nk} are all > nk. Since nk+1 is one of these, nk+1 > nk, and since it is the smallest one, if n ∈ M with n ≤ nk+1, it has to be nk+1, or else n ≤ nk. In the second case, n ∈ {n1, ..., nk}, by the inductive hypothesis. Thus, in both cases, if n ≤ nk+1 and n ∈ M, then n ∈ {n1, ..., nk+1}, which is the statement for k + 1. Thus, by the PMI, (∗∗) is true for all k.

Now, the map i ↦ ni is one-to-one, for if i ≠ k, say i < k, then ni ∈ {n1, ..., nk−1} and nk is not in this set. Finally, to prove the map i ↦ ni is onto M, let m ∈ M. Since nk < nk+1 for all k, we have nk ≥ k, for all k ∈ N. (This is a simple induction exercise.) Thus m ∈ M and m ≤ nm, so m ∈ {n1, ..., nm}, by (∗∗).
As we said, the results A, B, C, and D above are listed together for ease of reference, but this is not necessarily the best order in which to prove them. We will come back to the rest of Theorem A later, and for now turn to Theorem B.

Proof of Th. B. Let A ≠ ∅.

(a) =⇒ (b). Let A be countable. Then either A is denumerable, or A is finite. If A is denumerable, then there is a bijection f : N → A, and this is certainly a surjection. So we are left with the case A is finite. Then there is an n ∈ N and a bijection i ↦ ai on {1, ..., n} onto A. Thus A = {a1, ..., an}. We simply define f : N → A by

f(j) = aj, if j ≤ n;  f(j) = a1, if j > n.

Clearly this map is onto, since each of the elements of A was already one of the ai for i ≤ n.

(b) =⇒ (c). Assume f : N → A is surjective. For each a ∈ A, there is an element n ∈ N with f(n) = a. Choose one such n (say the first one) and call it na. Then the map a ↦ na is on A into N. This map is injective, because if na = na′, then f(na) = f(na′), that is, a = a′.

(c) =⇒ (a). Now let g : A → N be injective. Then g[A] ⊂ N and g : A → g[A] is one-to-one and onto. Thus A is equinumerous with g[A], and g[A] is countable, since it is a subset of a countable set.

Proof of A(4). Suppose A is countable, and let g be any mapping. We have to show that g[A] is countable. By Theorem B(b), there is a surjection f of N onto A, and then the composite map g ◦ f maps N onto g[A], so g[A] is also countable.

It is left as an exercise to state and prove an analogue of Theorem B for finite sets and use it to prove A(3), that the image of any finite set is finite.
Proof that N × N is countable.

• One way to prove this is to show that the map

f : (m, n) ↦ m + (m + n − 2)(m + n − 1)/2

is 1-1 on N × N onto N. This is "diagonal enumeration". Onto is proved by induction: f(1, 1) = 1; supposing f(m, n) = k, either n = 1, in which case f(1, m + 1) = k + 1, or otherwise f(m + 1, n − 1) = k + 1. To prove 1-1, suppose f(m, n) = f(m′, n′). If m + n = m′ + n′, then m = m′ and then n = n′. If m + n > m′ + n′ = N, then

f(m, n) ≥ m + (N + 1 − 2)(N + 1 − 1)/2 = m + (N − 1) + (N − 2)(N − 1)/2 > f(m′, n′).
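The diagonal enumeration is easy to test by machine: on a K × K grid of pairs it should produce all distinct values and cover an initial segment of N. A Python sketch (illustration, not proof):

```python
# Illustration: f(m, n) = m + (m + n - 2)(m + n - 1)/2, the diagonal
# enumeration of N x N.
def f(m, n):
    return m + (m + n - 2) * (m + n - 1) // 2

K = 50
values = {f(m, n) for m in range(1, K + 1) for n in range(1, K + 1)}

assert len(values) == K * K                # injective on the K x K grid
# Diagonals with m + n <= K + 1 lie entirely inside the grid, so every
# k up to K(K+1)/2 is attained:
assert set(range(1, K * (K + 1) // 2 + 1)) <= values
assert (f(1, 1), f(1, 2), f(2, 1)) == (1, 2, 3)
```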
• By B(c), it is enough to show that there is an injection g : N × N into N. One such is given by

g : (m, n) ↦ 2^(m−1)(2n − 1), for (m, n) ∈ N × N.

If g(m, n) = g(m′, n′), then

2^(m−1)(2n − 1) = 2^(m′−1)(2n′ − 1).

From very basic number theory, this implies

2^(m−1) = 2^(m′−1) and 2n − 1 = 2n′ − 1

(otherwise you could prove that an even number was equal to an odd number), and then m = m′ and n = n′, that is, (m, n) = (m′, n′). The mapping g is actually also onto. For a given k ∈ N, k = g(m, n), where m is the greatest natural number for which 2^(m−1) divides k (leaving the odd number 2n − 1 as a quotient). It is interesting to draw a picture numbering the pairs (m, n) this way. Try it.
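The suggestion to compute g and its inverse can be carried out in a few lines of Python: stripping the largest power of 2 from k recovers (m, n), confirming both injectivity and surjectivity on any initial segment.

```python
# Illustration: g(m, n) = 2^(m-1) * (2n - 1) and its inverse.
def g(m, n):
    return 2 ** (m - 1) * (2 * n - 1)

def g_inv(k):
    m = 1
    while k % 2 == 0:        # divide out the largest power of 2
        k //= 2
        m += 1
    return m, (k + 1) // 2   # the remaining odd part is 2n - 1

for k in range(1, 500):
    m, n = g_inv(k)
    assert g(m, n) == k      # every k in N is g of exactly one pair
assert g_inv(g(7, 13)) == (7, 13)
```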
Exercise. Deduce from the countability of N × N, that the cartesian product of two countable sets is countable, that is, if A and B are countable, so is A × B.
Proof that a countable union of countable sets is countable. If I is countable (≠ ∅) and, for each i ∈ I, Ai is countable (and ≠ ∅), then there is a map g on N onto I and, for each i ∈ I, a map hi on N onto Ai. Then the map f defined by f(m, n) = h_{g(m)}(n) maps the countable set N × N onto ⋃_{i∈I} Ai; hence the latter image is also countable.

Proof that the set of rationals is countable. The set Z of integers is the union of the non-negative and the negative integers, so is countable. The set of rationals is the image of the countable set Z × N under the map (m, n) ↦ m/n, so is countable by A(4).
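The proof for the rationals is again effectively an algorithm: run over pairs (m, n) in Z × N and output m/n. A Python sketch over a finite window (the bound and function name are ours, for illustration):

```python
# Illustration: the map (m, n) -> m/n from Z x N onto Q, restricted to
# a finite window.  Repetitions (e.g. 1/2 = 2/4) are harmless: Theorem
# B(b) only asks for a surjection.
from fractions import Fraction

def rationals(bound):
    """All fractions m/n with |m| <= bound and 1 <= n <= bound."""
    return {Fraction(m, n)
            for m in range(-bound, bound + 1)
            for n in range(1, bound + 1)}

qs = rationals(10)
assert Fraction(3, 7) in qs and Fraction(-5, 2) in qs and Fraction(0) in qs
assert Fraction(1, 2) == Fraction(2, 4)   # many pairs give the same rational
```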