Understanding Analysis (with Instructor's Solutions Manual)

Author: Stephen Abbott


Stephen Abbott
Mathematics Department
Middlebury College
Middlebury, VT 05753
USA

Editorial Board

S. Axler
Mathematics Department
San Francisco State University
San Francisco, CA 94132
USA

F.W. Gehring
Mathematics Department
East Hall
University of Michigan
Ann Arbor, MI 48109
USA

K.A. Ribet
Mathematics Department
University of California at Berkeley
Berkeley, CA 94720-3840
USA

Mathematics Subject Classification (2000): 26-01

Library of Congress Cataloging-in-Publication Data
Abbott, Stephen, 1964-
Understanding analysis / Stephen Abbott.
p. cm. -- (Undergraduate texts in mathematics)
Includes bibliographical references and index.
1. Mathematical analysis. I. Title. II. Series.
QA300 .A18 2001
515--dc21 00-08308

ISBN 978-1-4419-2866-5
e-ISBN 978-0-387-21506-8
Printed on acid-free paper.

© 2010 Springer Science+Business Media, Inc.

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, Inc., 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed in the United States of America.

9876

springeronline.com

Preface

My primary goal in writing Understanding Analysis was to create an elementary one-semester book that exposes students to the rich rewards inherent in taking a mathematically rigorous approach to the study of functions of a real variable. The aim of a course in real analysis should be to challenge and improve mathematical intuition rather than to verify it. There is a tendency, however, to center an introductory course too closely around the familiar theorems of the standard calculus sequence. Producing a rigorous argument that polynomials are continuous is good evidence for a well-chosen definition of continuity, but it is not the reason the subject was created and certainly not the reason it should be required study. By shifting the focus to topics where an untrained intuition is severely disadvantaged (e.g., rearrangements of infinite series, nowhere-differentiable continuous functions, Fourier series), my intent is to restore an intellectual liveliness to this course by offering the beginning student access to some truly significant achievements of the subject.

The Main Objectives

In recent years, the standard undergraduate curriculum in mathematics has been subjected to steady pressure from several different sources. As computers and technology become more ubiquitous, so do the areas where mathematical thinking can be a valuable asset. Rather than preparing themselves for graduate study in pure mathematics, the present majority of mathematics majors look forward to careers in banking, medicine, law, and numerous other fields where analytical skills are desirable. Another strong influence on college mathematics is the ongoing calculus reform effort, now well over ten years old. At the core of this movement is the justifiable goal of presenting calculus in a more intuitive way, emphasizing geometric arguments over symbolic ones. Despite these various trends (or perhaps because of them), nearly every undergraduate mathematics program continues to require at least one semester of real analysis. The result is that instructors today are faced with the task of teaching a difficult, abstract course to a more diverse audience less familiar with the nature of axiomatic arguments.

The crux of the matter is that any prevailing sentiment in favor of marketing mathematics to larger groups must at some point be reconciled with the fact that theoretical analysis is extremely challenging and even intimidating for some. One unfortunate resolution of this dilemma has been to make the course easier by making it less interesting. The omitted material is inevitably what gives analysis its true flavor. A better solution is to find a way to make the more advanced topics accessible and worth the effort.

I see three essential goals that a semester of real analysis should try to meet:

1. Students, especially those emerging from a reform approach to calculus, need to be convinced of the need for a more rigorous study of functions. The necessity of precise definitions and an axiomatic approach must be carefully motivated.

2. Having seen mainly graphical, numerical, or intuitive arguments, students need to learn what constitutes a rigorous mathematical proof and how to write one.

3. There needs to be significant reward for the difficult work of firming up the logical structure of limits. Specifically, real analysis should not be just an elaborate reworking of standard introductory calculus. Students should be exposed to the tantalizing complexities of the real line, to the subtleties of different flavors of convergence, and to the intellectual delights hidden in the paradoxes of the infinite.

The philosophy of Understanding Analysis is to focus attention on questions that give analysis its inherent fascination. Does the Cantor set contain any irrational numbers? Can the set of points where a function is discontinuous be arbitrary? Are derivatives continuous? Are derivatives integrable? Is an infinitely differentiable function necessarily the limit of its Taylor series? In giving these topics center stage, the hard work of a rigorous study is justified by the fact that they are inaccessible without it.

The Structure of the Book

This book is an introductory text. Although some fairly sophisticated topics are brought in early to advertise and motivate the upcoming material, the main body of each chapter consists of a lean and focused treatment of the core topics that make up the center of most courses in analysis. Fundamental results about completeness, compactness, sequential and functional limits, continuity, uniform convergence, differentiation, and integration are all incorporated. What is specific here is where the emphasis is placed. In the chapter on integration, for instance, the exposition revolves around deciphering the relationship between continuity and the Riemann integral. Enough properties of the integral are obtained to justify a proof of the Fundamental Theorem of Calculus, but the theme of the chapter is the pursuit of a characterization of integrable functions in terms of continuity. Whether or not Lebesgue’s measure-zero criterion is treated, framing the material in this way is still valuable because it is the questions that are important. Mathematics is not a static discipline. Students should be aware of the historical reasons for the creation of the mathematics they are learning and by extension realize that there is no last word on the subject. In the case of integration, this point is made explicitly by including some relatively recent developments on the generalized Riemann integral in the additional topics of the last chapter.

The structure of the chapters has the following distinctive features.

Discussion Sections: Each chapter begins with the discussion of some motivating examples and open questions. The tone in these discussions is intentionally informal, and full use is made of familiar functions and results from calculus. The idea is to freely explore the terrain, providing context for the upcoming definitions and theorems. A recurring theme is the resolution of the paradoxes that arise when operations that work well in finite settings are naively extended to infinite settings (e.g., differentiating an infinite series term-by-term, reversing the order of a double summation). After these exploratory introductions, the tone of the writing changes, and the treatment becomes rigorously tight but still not overly formal. With the questions in place, the need for the ensuing development of the material is well-motivated and the payoff is in sight.

Project Sections: The penultimate section of each chapter (the final section is a short epilogue) is written with the exercises incorporated into the exposition. Proofs are outlined but not completed, and additional exercises are included to elucidate the material being discussed. The point of this is to provide some flexibility. The sections are written as self-guided tutorials, but they can also be the subject of lectures. I have used them in place of a final examination, and they work especially well as collaborative assignments that can culminate in a class presentation. The body of each chapter contains the necessary tools, so there is some satisfaction in letting the students use their newly acquired skills to ferret out for themselves answers to questions that have been driving the exposition.

Building a Course

Teaching a satisfying class inevitably involves a race against time. Although this book is designed for a 12–14 week semester, there are still a few choices to make as to what to cover.

• The introductions can be discussed, assigned as reading, omitted, or substituted with something preferable. There are no theorems proved here that show up later in the text. I do develop some important examples in these introductions (the Cantor set, Dirichlet’s nowhere-continuous function) that probably need to find their way into discussions at some point.

• Chapter 3, Basic Topology of R, is much longer than it needs to be. All that is required by the ensuing chapters are fundamental results about open and closed sets and a thorough understanding of sequential compactness. The characterization of compactness using open covers as well as the section on perfect and connected sets are included for their own intrinsic interest. They are not, however, crucial to any future proofs. The one exception to this is a presentation of the Intermediate Value Theorem (IVT) as a special case of the preservation of connected sets by continuous functions. To keep connectedness truly optional, I have included two direct proofs of IVT, one using least upper bounds and the other using nested intervals. A similar comment can be made about perfect sets. Although proofs of the Baire Category Theorem are nicely motivated by the argument that perfect sets are uncountable, it is certainly possible to do one without the other.

• All the project sections (1.5, 2.8, 3.5, 4.6, 5.4, 6.6, 7.6, 8.1–8.4) are optional in the sense that no results in later chapters depend on material in these sections. The four topics covered in Chapter 8 are also written in this project-style format, where the exercises make up a significant part of the development. The only one of these sections that might require a lecture is the unit on Fourier series, which is a bit longer than the others.

The Audience

The only prerequisite for this course is a robust understanding of the results from single-variable calculus. The theorems of linear algebra are not needed, but the exposure to abstract arguments and proof writing that usually comes with this course would be a valuable asset. Complex numbers are never used in this book.

The proofs in Understanding Analysis are written with the introductory student firmly in mind. Brevity and other stylistic concerns are postponed in favor of including a significant level of detail. Most proofs come with a fair amount of discussion about the context of the argument. What should the proof entail? Which definitions are relevant? What is the overall strategy? Is one particular proof similar to something already done? Whenever there is a choice, efficiency is traded for an opportunity to reinforce some previously learned technique. Especially familiar or predictable arguments are usually sketched as exercises so that students can participate directly in the development of the core material.

The search for recurring ideas exists at the proof-writing level and also on the larger expository level. I have tried to give the course a narrative tone by picking up on the unifying themes of approximation and the transition from the finite to the infinite. To paraphrase a passage from the end of the book, real numbers are approximated by rational ones; values of continuous functions are approximated by values nearby; curves are approximated by straight lines; areas are approximated by sums of rectangles; continuous functions are approximated by polynomials. In each case, the approximating objects are tangible and well-understood, and the issue is when and how well these qualities survive the limiting process. By focusing on this recurring pattern, each successive topic builds on the intuition of the previous one. The questions seem more natural, and a method to the madness emerges from what might otherwise appear as a long list of theorems and proofs.

This book always emphasizes core ideas over generality, and it makes no effort to be a complete, deductive catalog of results. It is designed to capture the intellectual imagination. Those who become interested are then exceptionally well prepared for a second course starting from complex-valued functions on more general spaces, while those content with a single semester come away with a strong sense of the essence and purpose of real analysis. Turning once more to the concluding passages of Chapter 8, “By viewing the different infinities of mathematics through pathways crafted out of finite objects, Weierstrass and the other founders of analysis created a paradigm for how to extend the scope of mathematical exploration deep into territory previously unattainable.”

This exploration has constituted the major thrill of my intellectual life. I am extremely pleased to offer this guide to what I feel are some of the most impressive highlights of the journey. Have a wonderful trip!

Acknowledgments

The genesis of this book came from an extended series of conversations with Benjamin Lotto of Vassar College. The structure of the early chapters and the book’s overall thesis are in large part the result of several years of sharing classroom notes, ideas, and experiences with Ben. I am pleased with how the manuscript has turned out, and I have no doubt that it is an immeasurably better book because of Ben’s early contributions.

A large part of the writing was done while I was enjoying a visiting position at the University of Virginia. Special thanks go to Nat Martin and Larry Thomas for being so generous with their time and wisdom, and especially to Loren Pitt, the scope of whose advice extends well beyond the covers of this book. I would also like to thank Julie Riddleberger for her help with many of the figures. Marian Robbins of Bellarmine College, Steve Kennedy of Carleton College, Paul Humke of Saint Olaf College, and Tom Kriete of the University of Virginia each taught from a preliminary draft of this text. I appreciate the many suggested improvements that this group provided, and I want to especially acknowledge Paul Humke for his contributions to the chapter on integration.

My department and the administration of Middlebury College have also been very supportive of this endeavor. David Guertin came to my technological rescue on numerous occasions, Priscilla Bremser read early chapter drafts, and Rick Chartrand’s insightful opinions greatly improved some of the later sections. The list of students who have suffered through the long evolution of this book is now too long to present, but I would like to mention Brooke Sargent, whose meticulous class notes were the basis of the first draft, and Jesse Johnson, who has worked tirelessly to improve the presentation of the many exercises in the book. The production team at Springer has been absolutely first-rate. My sincere thanks go to all of them, with a special nod to Sheldon Axler for encouragement and advice surely exceeding anything in his usual job description.

In a recent rereading of the completed text, I was struck by how frequently I resort to historical context to motivate an idea. This was not a conscious goal I set for myself. Instead, I feel it is a reflection of a very encouraging trend in mathematical pedagogy to humanize our subject with its history. From my own experience, a good deal of the credit for this movement in analysis should go to two books: A Radical Approach to Real Analysis, by David Bressoud, and Analysis by Its History, by E. Hairer and G. Wanner. Bressoud’s book was particularly influential to the presentation of Fourier series in the last chapter. Either of these would make an excellent supplementary resource for this course. While I do my best to cite their historical origins when it seems illuminating or especially important, the present form of many of the theorems presented here belongs to the common folklore of the subject, and I have not attempted careful attribution. One exception is the material on the previously mentioned generalized Riemann integral, due independently to Jaroslav Kurzweil and Ralph Henstock. Section 8.1 closely follows the treatment laid out in Robert Bartle’s article “Return to the Riemann Integral.” In this paper, the author makes an italicized plea for teachers of mathematics to supplant Lebesgue’s ubiquitous integral with the generalized Riemann integral. I hope that Professor Bartle will see its inclusion here as an inspired response to that request.

On a personal note, I welcome comments of any nature, and I will happily share any enlightening remarks (and any corrections) via a link on my webpage. The publication of this book comes nearly four years after the idea was first hatched. The long road to this point has required the steady support of many people but most notably that of my incredible wife, Katy. Amid the flurry of difficult decisions and hard work that go into a project of this size, the opportunity to dedicate this book to her comes as a pure and easy pleasure.

Middlebury, Vermont
August 2000
Stephen Abbott

Contents

Preface

1 The Real Numbers
  1.1 Discussion: The Irrationality of $\sqrt{2}$
  1.2 Some Preliminaries
  1.3 The Axiom of Completeness
  1.4 Consequences of Completeness
  1.5 Cantor’s Theorem
  1.6 Epilogue

2 Sequences and Series
  2.1 Discussion: Rearrangements of Infinite Series
  2.2 The Limit of a Sequence
  2.3 The Algebraic and Order Limit Theorems
  2.4 The Monotone Convergence Theorem and a First Look at Infinite Series
  2.5 Subsequences and the Bolzano–Weierstrass Theorem
  2.6 The Cauchy Criterion
  2.7 Properties of Infinite Series
  2.8 Double Summations and Products of Infinite Series
  2.9 Epilogue

3 Basic Topology of R
  3.1 Discussion: The Cantor Set
  3.2 Open and Closed Sets
  3.3 Compact Sets
  3.4 Perfect Sets and Connected Sets
  3.5 Baire’s Theorem
  3.6 Epilogue

4 Functional Limits and Continuity
  4.1 Discussion: Examples of Dirichlet and Thomae
  4.2 Functional Limits
  4.3 Combinations of Continuous Functions
  4.4 Continuous Functions on Compact Sets
  4.5 The Intermediate Value Theorem
  4.6 Sets of Discontinuity
  4.7 Epilogue

5 The Derivative
  5.1 Discussion: Are Derivatives Continuous?
  5.2 Derivatives and the Intermediate Value Property
  5.3 The Mean Value Theorem
  5.4 A Continuous Nowhere-Differentiable Function
  5.5 Epilogue

6 Sequences and Series of Functions
  6.1 Discussion: Branching Processes
  6.2 Uniform Convergence of a Sequence of Functions
  6.3 Uniform Convergence and Differentiation
  6.4 Series of Functions
  6.5 Power Series
  6.6 Taylor Series
  6.7 Epilogue

7 The Riemann Integral
  7.1 Discussion: How Should Integration be Defined?
  7.2 The Definition of the Riemann Integral
  7.3 Integrating Functions with Discontinuities
  7.4 Properties of the Integral
  7.5 The Fundamental Theorem of Calculus
  7.6 Lebesgue’s Criterion for Riemann Integrability
  7.7 Epilogue

8 Additional Topics
  8.1 The Generalized Riemann Integral
  8.2 Metric Spaces and the Baire Category Theorem
  8.3 Fourier Series
  8.4 A Construction of R From Q

Bibliography

Index

Chapter 1

The Real Numbers

1.1 Discussion: The Irrationality of $\sqrt{2}$

Toward the end of his distinguished career, the renowned British mathematician G.H. Hardy eloquently laid out a justification for a life of studying mathematics in A Mathematician’s Apology, an essay first published in 1940. At the center of Hardy’s defense is the thesis that mathematics is an aesthetic discipline. For Hardy, the applied mathematics of engineers and economists held little charm. “Real mathematics,” as he referred to it, “must be justified as art if it can be justified at all.”

To help make his point, Hardy includes two theorems from classical Greek mathematics, which, in his opinion, possess an elusive kind of beauty that, although difficult to define, is easy to recognize. The first of these results is Euclid’s proof that there are an infinite number of prime numbers. The second result is the discovery, attributed to the school of Pythagoras from around 500 B.C., that $\sqrt{2}$ is irrational. It is this second theorem that demands our attention. (A course in number theory would focus on the first.) The argument uses only arithmetic, but its depth and importance cannot be overstated. As Hardy says, “[It] is a ‘simple’ theorem, simple both in idea and execution, but there is no doubt at all about [it being] of the highest class. [It] is as fresh and significant as when it was discovered – two thousand years have not written a wrinkle on [it].”

Theorem 1.1.1. There is no rational number whose square is 2.

Proof. A rational number is any number that can be expressed in the form p/q, where p and q are integers. Thus, what the theorem asserts is that no matter how p and q are chosen, it is never the case that $(p/q)^2 = 2$. The line of attack is indirect, using a type of argument referred to as a proof by contradiction. The idea is to assume that there is a rational number whose square is 2 and then proceed along logical lines until we reach a conclusion that is unacceptable. At this point, we will be forced to retrace our steps and reject the erroneous assumption that some rational number squared is equal to 2. In short, we will prove that the theorem is true by demonstrating that it cannot be false.

And so assume, for contradiction, that there exist integers p and q satisfying

$$\left(\frac{p}{q}\right)^2 = 2. \tag{1}$$

We may also assume that p and q have no common factor, because, if they had one, we could simply cancel it out and rewrite the fraction in lowest terms. Now, equation (1) implies

$$p^2 = 2q^2. \tag{2}$$

From this, we can see that the integer $p^2$ is an even number (it is divisible by 2), and hence p must be even as well because the square of an odd number is odd. This allows us to write p = 2r, where r is also an integer. If we substitute 2r for p in equation (2), then a little algebra yields the relationship

$$2r^2 = q^2.$$

But now the absurdity is at hand. This last equation implies that $q^2$ is even, and hence q must also be even. Thus, we have shown that p and q are both even (i.e., divisible by 2) when they were originally assumed to have no common factor. From this logical impasse, we can only conclude that equation (1) cannot hold for any integers p and q, and thus the theorem is proved.

A component of Hardy’s definition of beauty in a mathematical theorem is that the result have lasting and serious implications for a network of other mathematical ideas. In this case, the ideas under assault were the Greeks’ understanding of the relationship between geometric length and arithmetic number. Prior to the preceding discovery, it was an assumed and commonly used fact that, given two line segments AB and CD, it would always be possible to find a third line segment whose length divides evenly into the first two. In modern terminology, this is equivalent to asserting that the length of CD is a rational multiple of the length of AB. Looking at the diagonal of a unit square (Fig. 1.1), it now followed (using the Pythagorean Theorem) that this was not always the case. Because the Pythagoreans implicitly interpreted number to mean rational number, they were forced to accept that number was a strictly weaker notion than length.

Rather than abandoning arithmetic in favor of geometry (as the Greeks seem to have done), our resolution to this limitation is to strengthen the concept of number by moving from the rational numbers to a larger number system. From a modern point of view, this should seem like a familiar and somewhat natural phenomenon.
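As an informal companion to Theorem 1.1.1, the following Python sketch searches a finite range of denominators for a fraction whose square is exactly 2, and also checks the parity fact that the proof leans on (the square of an odd number is odd). The function names and the search bound are illustrative assumptions, not part of the text, and a finite search is of course no substitute for the proof.

```python
from math import isqrt


def rational_square_root_of_two(max_q: int):
    """Search for integers p, q with 1 <= q <= max_q and p*p == 2*q*q.

    Returns the first pair (p, q) found, or None if there is none.
    """
    for q in range(1, max_q + 1):
        target = 2 * q * q
        p = isqrt(target)  # largest integer p with p*p <= target
        if p * p == target:
            return (p, q)
    return None


def odd_squares_are_odd(limit: int) -> bool:
    """Check that n*n is odd for every odd n below `limit`."""
    return all((n * n) % 2 == 1 for n in range(1, limit, 2))


print(rational_square_root_of_two(10_000))  # no pair exists, so this is None
print(odd_squares_are_odd(10_000))
```

Exact integer arithmetic (`isqrt`) is used instead of floating-point square roots so that the equality test $p^2 = 2q^2$ is never affected by rounding.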
We begin with the natural numbers

$$\mathbf{N} = \{1, 2, 3, 4, 5, \dots\}.$$

The influential German mathematician Leopold Kronecker (1823–1891) once asserted that “The natural numbers are the work of God. All of the rest is the work of mankind.” Debating the validity of this claim is an interesting conversation for another time. For the moment, it at least provides us with a place to start.

[Figure 1.1: $\sqrt{2}$ exists as a geometric length.]

If we restrict our attention to the natural numbers $\mathbf{N}$, then we can perform addition perfectly well, but we must extend our system to the integers

$$\mathbf{Z} = \{\dots, -3, -2, -1, 0, 1, 2, 3, \dots\}$$

if we want to have an additive identity (zero) and the additive inverses necessary to define subtraction. The next issue is multiplication and division. The number 1 acts as the multiplicative identity, but in order to define division we need to have multiplicative inverses. Thus, we extend our system again to the rational numbers

$$\mathbf{Q} = \left\{ \text{all fractions } \frac{p}{q} \text{ where } p \text{ and } q \text{ are integers with } q \neq 0 \right\}.$$

Taken together, the properties of $\mathbf{Q}$ discussed in the previous paragraph essentially make up the definition of what is called a field. More formally stated, a field is any set where addition and multiplication are well-defined operations that are commutative, associative, and obey the familiar distributive property $a(b + c) = ab + ac$. There must be an additive identity, and every element must have an additive inverse. Finally, there must be a multiplicative identity, and multiplicative inverses must exist for all nonzero elements of the field. Neither $\mathbf{Z}$ nor $\mathbf{N}$ is a field. The finite set {0, 1, 2, 3, 4} is a field when addition and multiplication are computed modulo 5. This is not immediately obvious but makes an interesting exercise (Exercise 1.3.1).

The set $\mathbf{Q}$ also has a natural order defined on it. Given any two rational numbers r and s, exactly one of the following is true:

$$r < s, \qquad r = s, \qquad \text{or} \qquad r > s.$$

This ordering is transitive in the sense that if $r < s$ and $s < t$, then $r < t$, so we are conveniently led to a mental picture of the rational numbers as being laid out from left to right along a number line. Unlike $\mathbf{Z}$, there are no intervals of empty space. Given any two rational numbers $r < s$, the rational number $(r+s)/2$ sits halfway in between, implying that the rational numbers are densely nestled together.

[Figure 1.2: Approximating $\sqrt{2}$ with rational numbers.]

With the field properties of $\mathbf{Q}$ allowing us to safely carry out the algebraic operations of addition, subtraction, multiplication, and division, let’s remind ourselves just what it is that $\mathbf{Q}$ is lacking. By Theorem 1.1.1, it is apparent that we cannot always take square roots. The problem, however, is actually more fundamental than this. Using only rational numbers, it is possible to approximate $\sqrt{2}$ quite well (Fig. 1.2). For instance, $1.414^2 = 1.999396$. By adding more decimal places to our approximation, we can get even closer to a value for $\sqrt{2}$, but, even so, we are now well aware that there is a “hole” in the rational number line where $\sqrt{2}$ ought to be. Of course, there are quite a few other holes, at $\sqrt{3}$ and $\sqrt{5}$, for example. Returning to the dilemma of the ancient Greek mathematicians, if we want every length along the number line to correspond to an actual number, then another extension to our number system is in order. Thus, to the chain $\mathbf{N} \subseteq \mathbf{Z} \subseteq \mathbf{Q}$ we append the real numbers $\mathbf{R}$.

The question of how to actually construct $\mathbf{R}$ from $\mathbf{Q}$ is rather complicated business. It is discussed in Section 1.3, and then again in more detail in Section 8.4. For the moment, it is not too inaccurate to say that $\mathbf{R}$ is obtained by filling in the gaps in $\mathbf{Q}$. Wherever there is a hole, a new irrational number is defined and placed into the ordering that already exists on $\mathbf{Q}$. The real numbers are then the union of these irrational numbers together with the more familiar rational ones. What properties does the set of irrational numbers have? How do the sets of rational and irrational numbers fit together? Is there a kind of symmetry between the rationals and the irrationals, or is there some sense in which we can argue that one type of real number is more common than the other?

The one method we have seen so far for generating examples of irrational numbers is through square roots. Not too surprisingly, other roots such as $\sqrt[3]{2}$ or $\sqrt[5]{3}$ are most often irrational. Can all irrational numbers be expressed as algebraic combinations of nth roots and rational numbers, or are there still other irrational numbers beyond those of this form?
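The two observations above, that the midpoint of two rationals is again rational and that rationals can approximate $\sqrt{2}$ as closely as we like without ever reaching it, can be explored with exact rational arithmetic. The following is a minimal sketch using Python's `fractions` module; the function name `bisect_sqrt2` and the choice of repeated bisection as the approximation scheme are assumptions made for illustration, not something taken from the text.

```python
from fractions import Fraction


def bisect_sqrt2(steps: int) -> tuple[Fraction, Fraction]:
    """Trap the 'hole' at sqrt(2) between two rationals by repeated bisection.

    Returns (lo, hi) with lo*lo < 2 < hi*hi and hi - lo == 1 / 2**steps.
    """
    lo, hi = Fraction(1), Fraction(2)  # 1^2 < 2 < 2^2
    for _ in range(steps):
        mid = (lo + hi) / 2  # the midpoint of two rationals is rational
        if mid * mid < 2:
            lo = mid
        else:
            hi = mid  # mid*mid == 2 cannot happen, by Theorem 1.1.1
    return lo, hi


lo, hi = bisect_sqrt2(20)
print(float(lo), float(hi))

# The decimal approximation quoted in the text, checked exactly.
print(Fraction(1414, 1000) ** 2 == Fraction(1999396, 1000000))
```

Each pass halves the width of the interval, so the rational endpoints close in on the gap from both sides while neither endpoint ever squares to exactly 2.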

1.2

Some Preliminaries

The vocabulary necessary for the ensuing development comes from set theory and the theory of functions. This should be familiar territory, but a brief review of the terminology is probably a good idea, if only to establish some agreed-upon notation.

Sets

Intuitively speaking, a set is any collection of objects. These objects are referred to as the elements of the set. For our purposes, the sets in question will most often be sets of real numbers, although we will also encounter sets of functions and, on a few rare occasions, sets whose elements are other sets. Given a set A, we write x ∈ A if x (whatever it may be) is an element of A. If x is not an element of A, then we write x ∉ A. Given two sets A and B, the union is written A ∪ B and is defined by asserting that x ∈ A ∪ B provided that x ∈ A or x ∈ B (or potentially both). The intersection A ∩ B is the set defined by the rule x ∈ A ∩ B provided x ∈ A and x ∈ B.

Example 1.2.1. (i) There are many acceptable ways to assert the contents of a set. In the previous section, the set of natural numbers was defined by listing the elements: N = {1, 2, 3, . . . }.
(ii) Sets can also be described in words. For instance, we can define the set E to be the collection of even natural numbers.
(iii) Sometimes it is more efficient to provide a kind of rule or algorithm for determining the elements of a set. As an example, let S = {r ∈ Q : r² < 2}. Read aloud, the definition of S says, “Let S be the set of all rational numbers whose squares are less than 2.” It follows that 1 ∈ S, 4/3 ∈ S, but 3/2 ∉ S because 9/4 ≥ 2.

Using the previously defined sets to illustrate the operations of intersection and union, we observe that

N ∪ E = N,   N ∩ E = E,   N ∩ S = {1},   and   E ∩ S = ∅.

The set ∅ is called the empty set and is understood to be the set that contains no elements. An equivalent statement would be to say that E and S are disjoint. A word about the equality of two sets is in order (since we have just used the notion). The inclusion relationship A ⊆ B or B ⊇ A is used to indicate that every element of A is also an element of B. In this case, we say A is a subset of B, or B contains A. To assert that A = B means that A ⊆ B and B ⊆ A. Put another way, A and B have exactly the same elements. Quite frequently in the upcoming chapters, we will want to apply the union and intersection operations to inﬁnite collections of sets.
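The four identities of Example 1.2.1 can be checked on finite samples of the sets involved. This is a sketch under loud assumptions: the infinite sets N, E, and S are replaced by finite windows (the constant `LIMIT` and the small grid of fractions are arbitrary choices made here), so the output illustrates the identities rather than proving them.

```python
from fractions import Fraction

LIMIT = 20  # assumption: a finite window standing in for the infinite sets

# Natural numbers 1..LIMIT and the even ones among them.
N = {Fraction(n) for n in range(1, LIMIT + 1)}
E = {x for x in N if x.numerator % 2 == 0}

# A finite sample of S = {r in Q : r^2 < 2}: fractions p/q on a small grid.
S = {Fraction(p, q) for q in range(1, 6) for p in range(-10, 11)
     if Fraction(p, q) ** 2 < 2}

print(N | E == N)               # union of N and E
print(N & E == E)               # intersection of N and E
print(N & S == {Fraction(1)})   # only 1 is both natural and in S
print(E & S == set())           # E and S are disjoint
```

Python's `|` and `&` on sets correspond to ∪ and ∩, and comparing two sets with `==` tests exactly the "same elements" notion of equality described above.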

Example 1.2.2. Let

A_1 = N = {1, 2, 3, . . . },
A_2 = {2, 3, 4, . . . },
A_3 = {3, 4, 5, . . . },

and, in general, for each n ∈ N, define the set A_n = {n, n + 1, n + 2, . . . }. The result is a nested chain of sets

A_1 ⊇ A_2 ⊇ A_3 ⊇ A_4 ⊇ · · · ,

where each successive set is a subset of all the previous ones. Notationally,

⋃_{n=1}^∞ A_n,   ⋃_{n∈N} A_n,   or   A_1 ∪ A_2 ∪ A_3 ∪ · · ·

are all equivalent ways to indicate the set whose elements consist of any element that appears in at least one particular A_n. Because of the nested property of this particular collection of sets, it is not too hard to see that

⋃_{n=1}^∞ A_n = A_1.

The notion of intersection has the same kind of natural extension to infinite collections of sets. For this example, we have

⋂_{n=1}^∞ A_n = ∅.

Let’s be sure we understand why this is the case. Suppose we had some natural number m that we thought might actually satisfy m ∈ ⋂_{n=1}^∞ A_n. What this would mean is that m ∈ A_n for every A_n in our collection of sets. Because m is not an element of A_{m+1}, no such m exists and the intersection is empty.

As mentioned, most of the sets we encounter will be sets of real numbers. Given A ⊆ R, the complement of A, written A^c, refers to the set of all elements of R not in A. Thus, for A ⊆ R,

A^c = {x ∈ R : x ∉ A}.

A few times in our work to come, we will refer to De Morgan’s Laws, which state that

(A ∩ B)^c = A^c ∪ B^c   and   (A ∪ B)^c = A^c ∩ B^c.

Proofs of these statements are discussed in Exercise 1.2.3.
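The behaviour of the nested sets A_n and the two De Morgan identities can be explored on a finite model. The sketch below assumes a finite ambient set `UNIVERSE` in place of N (or R), and the helper names `tail` and `complement` are hypothetical; a finite check of this kind is an illustration only and does not replace the proofs requested in Exercise 1.2.3.

```python
UNIVERSE = set(range(1, 31))  # assumption: a finite stand-in for the ambient set

def tail(n: int) -> set[int]:
    """Finite window of A_n = {n, n + 1, n + 2, ...}."""
    return {k for k in UNIVERSE if k >= n}

def complement(a: set[int]) -> set[int]:
    """Complement relative to the finite universe."""
    return UNIVERSE - a

tails = [tail(n) for n in range(1, 32)]  # A_1, ..., A_31 (A_31 is empty here)

union_all = set().union(*tails)
intersection_all = set.intersection(*tails)
print(union_all == tail(1))        # the union collapses to A_1
print(intersection_all == set())   # each m is excluded by A_{m+1}

A = tail(5)
B = {k for k in UNIVERSE if k % 3 == 0}
print(complement(A & B) == complement(A) | complement(B))
print(complement(A | B) == complement(A) & complement(B))
```

The empty intersection arises for the reason given in the text: every candidate m in the window is missing from the window of A_{m+1}.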

Admittedly, there is something imprecise about the deﬁnition of set presented at the beginning of this discussion. The deﬁning sentence begins with the phrase “Intuitively speaking,” which might seem an odd way to embark on a course of study that purportedly intends to supply a rigorous foundation for the theory of functions of a real variable. In some sense, however, this is unavoidable. Each repair of one level of the foundation reveals something below it in need of attention. The theory of sets has been subjected to intense scrutiny over the past century precisely because so much of modern mathematics rests on this foundation. But such a study is really only advisable once it is understood why our naive impression about the behavior of sets is insuﬃcient. For the direction in which we are heading, this will not happen, although an indication of some potential pitfalls is given in Section 1.6.

Functions

Definition 1.2.3. Given two sets A and B, a function from A to B is a rule or mapping that takes each element x ∈ A and associates with it a single element of B. In this case, we write f : A → B. Given an element x ∈ A, the expression f(x) is used to represent the element of B associated with x by f. The set A is called the domain of f. The range of f is not necessarily equal to B but refers to the subset of B given by {y ∈ B : y = f(x) for some x ∈ A}.

This definition of function is more or less the one proposed by Peter Lejeune Dirichlet (1805–1859) in the 1830s. Dirichlet was a German mathematician who was one of the leaders in the development of the rigorous approach to functions that we are about to undertake. His main motivation was to unravel the issues surrounding the convergence of Fourier series. Dirichlet’s contributions figure prominently in Section 8.3, where an introduction to Fourier series is presented, but we will also encounter his name in several earlier chapters along the way. What is important at the moment is that we see how Dirichlet’s definition of function liberates the term from its interpretation as a type of “formula.” In the years leading up to Dirichlet’s time, the term “function” was generally understood to refer to algebraic entities such as f(x) = x² + 1 or g(x) = √(x⁴ + 4). Definition 1.2.3 allows for a much broader range of possibilities.

Example 1.2.4. In 1829, Dirichlet proposed the unruly function

g(x) = 1 if x ∈ Q,
g(x) = 0 if x ∉ Q.

The domain of g is all of R, and the range is the set {0, 1}. There is no single formula for g in the usual sense, and it is quite difficult to graph this function (see Section 4.1 for a rough attempt), but it certainly qualifies as a function according to the criterion in Definition 1.2.3. As we study the theoretical nature of continuous, differentiable, or integrable functions, examples such as this one will provide us with an invaluable testing ground for the many conjectures we encounter.

Example 1.2.5 (Triangle Inequality). The absolute value function is so important that it merits the special notation |x| in place of the usual f(x) or g(x). It is defined for every real number via the piecewise definition

|x| = x if x ≥ 0,
|x| = −x if x < 0.

With respect to multiplication and addition, the absolute value function satisfies
(i) |ab| = |a||b| and
(ii) |a + b| ≤ |a| + |b|
for all choices of a and b. Verifying these properties (Exercise 1.2.4) is just a matter of examining the different cases that arise when a, b, and a + b are positive and negative. Property (ii) is called the triangle inequality. This innocuous looking inequality turns out to be fantastically important and will be frequently employed in the following way. Given three real numbers a, b, and c, we certainly have

|a − b| = |(a − c) + (c − b)|.

By the triangle inequality,

|(a − c) + (c − b)| ≤ |a − c| + |c − b|,

so we get

(1)   |a − b| ≤ |a − c| + |c − b|.

Now, the expression |a − b| is equal to |b − a| and is best understood as the distance between the points a and b on the number line. With this interpretation, equation (1) makes the plausible statement that “the distance from a to b is less than or equal to the distance from a to c plus the distance from c to b.” Pretending for a moment that these are points in the plane (instead of on the real line), it should be evident why this is referred to as the “triangle inequality.”
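The distance form of the triangle inequality in (1) can be spot-checked on random inputs. The sketch below is not part of the text's argument: the helper names are hypothetical, and exact rationals are assumed in place of real numbers so that no floating-point rounding can blur the comparison. Passing such a check is evidence, not a proof.

```python
import random
from fractions import Fraction

def distance(x, y):
    """Distance between two points on the number line."""
    return abs(x - y)

def triangle_holds(a, b, c) -> bool:
    """Check |a - b| <= |a - c| + |c - b| for one triple."""
    return distance(a, b) <= distance(a, c) + distance(c, b)

def random_rational() -> Fraction:
    return Fraction(random.randint(-100, 100), random.randint(1, 20))

random.seed(0)  # fixed seed so the sample is reproducible
triples = [(random_rational(), random_rational(), random_rational())
           for _ in range(1000)]
print(all(triangle_holds(a, b, c) for a, b, c in triples))

# Equality occurs when c lies between a and b.
print(distance(0, 5) == distance(0, 2) + distance(2, 5))
```

The second check reflects the geometric picture: when c sits between a and b, the detour through c adds no extra length.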

Logic and Proofs

Writing rigorous mathematical proofs is a skill best learned by doing, and there is plenty of on-the-job training just ahead. As Hardy indicates, there is an artistic quality to mathematics of this type, which may or may not come easily, but that is not to say that anything especially mysterious is happening. A proof is an essay of sorts. It is a set of carefully crafted directions, which, when followed, should leave the reader absolutely convinced of the truth of the proposition in question. To achieve this, the steps in a proof must follow logically from previous steps or be justified by some other agreed-upon set of facts. In addition to being valid, these steps must also fit coherently together to form a cogent argument. Mathematics has a specialized vocabulary, to be sure, but that does not exempt a good proof from being written in grammatically correct English.

The one proof we have seen at this point (to Theorem 1.1.1) uses an indirect strategy called proof by contradiction. This powerful technique will be employed a number of times in our upcoming work. Nevertheless, most proofs are direct. (It also bears mentioning that using an indirect proof when a direct proof is available is generally considered bad manners.) A direct proof begins from some valid statement, most often taken from the theorem’s hypothesis, and then proceeds through rigorously logical deductions to a demonstration of the theorem’s conclusion. As we saw in Theorem 1.1.1, an indirect proof always begins by negating what it is we would like to prove. This is not always as easy to do as it may sound. The argument then proceeds until (hopefully) a logical contradiction with some other accepted fact is uncovered. Many times, this accepted fact is part of the hypothesis of the theorem. When the contradiction is with the theorem’s hypothesis, we technically have what is called a contrapositive proof.

The next proposition illustrates a number of the issues just discussed and introduces a few more.

Theorem 1.2.6. Two real numbers a and b are equal if and only if for every real number ε > 0 it follows that |a − b| < ε.

Proof. There are two key phrases in the statement of this proposition that warrant special attention. One is “for every,” which will be addressed in a moment. The other is “if and only if.” To say “if and only if” in mathematics is an economical way of stating that the proposition is true in two directions. In the forward direction, we must prove the statement:
(⇒) If a = b, then for every real number ε > 0 it follows that |a − b| < ε.
We must also prove the converse statement:
(⇐) If for every real number ε > 0 it follows that |a − b| < ε, then we must have a = b.

For the proof of the first statement, there is really not much to say. If a = b, then |a − b| = 0, and so certainly |a − b| < ε no matter what ε > 0 is chosen.

For the second statement, we give a proof by contradiction. The conclusion of the proposition in this direction states that a = b, so we assume that a ≠ b. Heading off in search of a contradiction brings us to a consideration of the phrase “for every ε > 0.” Some equivalent ways to state the hypothesis would be to say that “for all possible choices of ε > 0” or “no matter how ε > 0 is selected, it is always the case that |a − b| < ε.” But assuming a ≠ b (as we are doing at the moment), the choice of

ε_0 = |a − b| > 0

poses a serious problem. We are assuming that |a − b| < ε is true for every ε > 0, so this must certainly be true of the particular ε_0 just defined. However, the statements

|a − b| < ε_0   and   |a − b| = ε_0

cannot both be true. This contradiction means that our initial assumption that a ≠ b is unacceptable. Therefore, a = b, and the indirect proof is complete.

One of the most fundamental skills required for reading and writing analysis proofs is the ability to confidently manipulate the quantifying phrases “for all” and “there exists.” Significantly more attention will be given to this issue in many upcoming discussions.

Induction

One final trick of the trade, which will arise with some frequency, is the use of induction arguments. Induction is used in conjunction with the natural numbers N (or sometimes with the set N ∪ {0}). The fundamental principle behind induction is that if S is some subset of N with the property that
(i) S contains 1 and
(ii) whenever S contains a natural number n, it also contains n + 1,
then it must be that S = N. As the next example illustrates, this principle can be used to define sequences of objects as well as to prove facts about them.

Example 1.2.7. Let x_1 = 1, and for each n ∈ N define

x_{n+1} = (1/2)x_n + 1.

Using this rule, we can compute x_2 = (1/2)(1) + 1 = 3/2, x_3 = 7/4, and it is immediately apparent how this leads to a definition of x_n for all n ∈ N. The sequence just defined appears at the outset to be increasing. For the terms computed, we have x_1 ≤ x_2 ≤ x_3. Let’s use induction to prove that this trend continues; that is, let’s show

(2)   x_n ≤ x_{n+1}

for all values of n ∈ N. For n = 1, x_1 = 1 and x_2 = 3/2, so that x_1 ≤ x_2 is clear. Now, we want to show that if we have x_n ≤ x_{n+1}, then it follows that x_{n+1} ≤ x_{n+2}. Think of S as the set of natural numbers for which the claim in equation (2) is true. We have shown that 1 ∈ S. We are now interested in showing that if n ∈ S, then n + 1 ∈ S as well. Starting from the induction hypothesis x_n ≤ x_{n+1}, we can multiply across the inequality by 1/2 and add 1 to get

(1/2)x_n + 1 ≤ (1/2)x_{n+1} + 1,

which is precisely the desired conclusion x_{n+1} ≤ x_{n+2}. By induction, the claim is proved for all n ∈ N.
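The recursive definition in Example 1.2.7 translates directly into a short computation. The sketch below uses a hypothetical helper `sequence` and exact rational arithmetic so the terms match the hand calculation; checking finitely many terms only illustrates the trend, whereas the induction argument is what establishes it for every n.

```python
from fractions import Fraction

def sequence(count: int) -> list[Fraction]:
    """First `count` terms of x_1 = 1, x_{n+1} = (1/2) x_n + 1."""
    terms = [Fraction(1)]
    while len(terms) < count:
        terms.append(terms[-1] / 2 + 1)  # apply the recursive rule once
    return terms

xs = sequence(10)
print([str(x) for x in xs[:4]])                  # exact values of x_1 .. x_4
print(all(a <= b for a, b in zip(xs, xs[1:])))   # x_n <= x_{n+1} on this sample
```

The loop mirrors the structure of induction: the list starts from the base term, and each new term is produced only from the one before it.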

Any discussion about why induction is a valid argumentative technique immediately opens up a box of questions about how we understand the natural numbers. Earlier, in Section 1.1, we avoided this issue by referencing Kronecker’s famous comment that the natural numbers are somehow divinely given. Although we will not improve on this explanation here, it should be pointed out that a more atheistic and mathematically satisfying approach to N is possible from the point of view of axiomatic set theory. This brings us back to a recurring theme of this chapter. Pedagogically speaking, the foundations of mathematics are best learned and appreciated in a kind of reverse order. A rigorous study of the natural numbers and the theory of sets is certainly recommended, but only after we have an understanding of the subtleties of the real number system. It is this latter topic that is the business of real analysis.

Exercises

Exercise 1.2.1. (a) Prove that √3 is irrational. Does a similar argument work to show √6 is irrational?
(b) Where does the proof of Theorem 1.1.1 break down if we try to use it to prove √4 is irrational?

Exercise 1.2.2. Decide which of the following represent true statements about the nature of sets. For any that are false, provide a specific example where the statement in question does not hold.
(a) If A_1 ⊇ A_2 ⊇ A_3 ⊇ A_4 · · · are all sets containing an infinite number of elements, then the intersection ⋂_{n=1}^∞ A_n is infinite as well.
(b) If A_1 ⊇ A_2 ⊇ A_3 ⊇ A_4 · · · are all finite, nonempty sets of real numbers, then the intersection ⋂_{n=1}^∞ A_n is finite and nonempty.
(c) A ∩ (B ∪ C) = (A ∩ B) ∪ C.
(d) A ∩ (B ∩ C) = (A ∩ B) ∩ C.
(e) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).

Exercise 1.2.3 (De Morgan’s Laws). Let A and B be subsets of R.
(a) If x ∈ (A ∩ B)^c, explain why x ∈ A^c ∪ B^c. This shows that (A ∩ B)^c ⊆ A^c ∪ B^c.
(b) Prove the reverse inclusion (A ∩ B)^c ⊇ A^c ∪ B^c, and conclude that (A ∩ B)^c = A^c ∪ B^c.
(c) Show (A ∪ B)^c = A^c ∩ B^c by demonstrating inclusion both ways.

Exercise 1.2.4. Verify the triangle inequality in the special cases where
(a) a and b have the same sign;
(b) a ≥ 0, b < 0, and a + b ≥ 0.

Exercise 1.2.5. Use the triangle inequality to establish the inequalities
(a) |a − b| ≤ |a| + |b|;
(b) ||a| − |b|| ≤ |a − b|.

Exercise 1.2.6. Given a function f and a subset A of its domain, let f(A) represent the range of f over the set A; that is, f(A) = {f(x) : x ∈ A}.
(a) Let f(x) = x². If A = [0, 2] (the closed interval {x ∈ R : 0 ≤ x ≤ 2}) and B = [1, 4], find f(A) and f(B). Does f(A ∩ B) = f(A) ∩ f(B) in this case? Does f(A ∪ B) = f(A) ∪ f(B)?
(b) Find two sets A and B for which f(A ∩ B) ≠ f(A) ∩ f(B).
(c) Show that, for an arbitrary function g : R → R, it is always true that g(A ∩ B) ⊆ g(A) ∩ g(B) for all sets A, B ⊆ R.
(d) Form and prove a conjecture about the relationship between g(A ∪ B) and g(A) ∪ g(B) for an arbitrary function g.

Exercise 1.2.7. Given a function f : D → R and a subset B ⊆ R, let f^{-1}(B) be the set of all points from the domain D that get mapped into B; that is, f^{-1}(B) = {x ∈ D : f(x) ∈ B}. This set is called the preimage of B.
(a) Let f(x) = x². If A is the closed interval [0, 4] and B is the closed interval [−1, 1], find f^{-1}(A) and f^{-1}(B). Does f^{-1}(A ∩ B) = f^{-1}(A) ∩ f^{-1}(B) in this case? Does f^{-1}(A ∪ B) = f^{-1}(A) ∪ f^{-1}(B)?
(b) The good behavior of preimages demonstrated in (a) is completely general. Show that for an arbitrary function g : R → R, it is always true that g^{-1}(A ∩ B) = g^{-1}(A) ∩ g^{-1}(B) and g^{-1}(A ∪ B) = g^{-1}(A) ∪ g^{-1}(B) for all sets A, B ⊆ R.

Exercise 1.2.8. Form the logical negation of each claim. One way to do this is to simply add “It is not the case that...” in front of each assertion, but for each statement, try to embed the word “not” as deeply into the resulting sentence as possible (or avoid using it altogether).
(a) For all real numbers satisfying a < b, there exists an n ∈ N such that a + 1/n < b.
(b) Between every two distinct real numbers, there is a rational number.
(c) For all natural numbers n ∈ N, √n is either a natural number or an irrational number.
(d) Given any real number x ∈ R, there exists n ∈ N satisfying n > x.

Exercise 1.2.9. Show that the sequence (x_1, x_2, x_3, . . . ) defined in Example 1.2.7 is bounded above by 2; that is, prove that x_n ≤ 2 for every n ∈ N.

Exercise 1.2.10. Let y_1 = 1, and for each n ∈ N define y_{n+1} = (3y_n + 4)/4.
(a) Use induction to prove that the sequence satisfies y_n < 4 for all n ∈ N.
(b) Use another induction argument to show the sequence (y_1, y_2, y_3, . . . ) is increasing.

Exercise 1.2.11. If a set A contains n elements, prove that the number of different subsets of A is equal to 2^n. (Keep in mind that the empty set ∅ is considered to be a subset of every set.)

Exercise 1.2.12. For this exercise, assume Exercise 1.2.3 has been successfully completed.
(a) Show how induction can be used to conclude that

(A_1 ∪ A_2 ∪ · · · ∪ A_n)^c = A_1^c ∩ A_2^c ∩ · · · ∩ A_n^c

for any finite n ∈ N.
(b) Explain why induction cannot be used to conclude

(⋃_{n=1}^∞ A_n)^c = ⋂_{n=1}^∞ A_n^c.

It might be useful to consider part (a) of Exercise 1.2.2.
(c) Is the statement in part (b) valid? If so, write a proof that does not use induction.

1.3

The Axiom of Completeness

What exactly is a real number? In Section 1.1, we got as far as saying that the set R of real numbers is an extension of the rational numbers Q in which there are no holes or gaps. We want every length along the number line (such as √2) to correspond to a real number and vice versa.

We are going to improve on this definition, but as we do so, it is important to keep in mind our earlier acknowledgment that whatever precise statements we formulate will necessarily rest on other unproven assumptions or undefined terms. At some point, we must draw a line and confess that this is what we have decided to accept as a reasonable place to start. Naturally, there is some debate about where this line should be drawn. One way to view the mathematics of the 19th and 20th centuries is as a stalwart attempt to move this line further and further back toward some unshakable foundation.

The majority of the material covered in this book is attributable to the mathematicians working in the early and middle parts of the 1800s. Augustin Louis Cauchy (1789–1857), Bernhard Bolzano (1781–1848), Niels Henrik Abel (1802–1829), Peter Lejeune Dirichlet, Karl Weierstrass (1815–1897), and Bernhard Riemann (1826–1866) all figure prominently in the discovery of the theorems that follow. But here is the interesting point. Nearly all of this work was done using intuitive assumptions about the nature of R quite similar to our own informal understanding at this point. Eventually, enough scrutiny was directed at the detailed structure of R so that, in the 1870s, a handful of ways to rigorously construct R from Q were proposed.

Following this historical model, our own rigorous construction of R from Q is postponed until Section 8.4. By this point, the need for such a construction will be more justified and easier to appreciate. In the meantime, we have many proofs to write, so it is important to lay down, as explicitly as possible, the assumptions that we intend to make about the real numbers.

An Initial Definition for R

First, R is a set containing Q. The operations of addition and multiplication on Q extend to all of R in such a way that every element of R has an additive inverse and every nonzero element of R has a multiplicative inverse. Echoing the discussion in Section 1.1, we assume R is a field, meaning that addition and multiplication of real numbers are commutative and associative, and the distributive property holds. This allows us to perform all of the standard algebraic manipulations that are second nature to us. We also assume that the familiar properties of the ordering on Q extend to all of R. Thus, for example, such deductions as “If a < b and c > 0, then ac < bc” will be carried out freely without much comment. To summarize the situation in the official terminology of the subject, we assume that R is an ordered field, which contains Q as a subfield. (A rigorous definition of “ordered field” is presented in Section 8.4.)

This brings us to the final, and most distinctive, assumption about the real number system. We must find some way to clearly articulate what we mean by insisting that R does not contain the gaps that permeate Q. Because this is the defining difference between the rational numbers and the real numbers, we will be excessively precise about how we phrase this assumption, hereafter referred to as the Axiom of Completeness.

Axiom of Completeness. Every nonempty set of real numbers that is bounded above has a least upper bound.

Now, what exactly does this mean?

Least Upper Bounds and Greatest Lower Bounds

Let’s first state the relevant definitions, and then look at some examples.

Definition 1.3.1. A set A ⊆ R is bounded above if there exists a number b ∈ R such that a ≤ b for all a ∈ A. The number b is called an upper bound for A. Similarly, the set A is bounded below if there exists a lower bound l ∈ R satisfying l ≤ a for every a ∈ A.

Definition 1.3.2. A real number s is the least upper bound for a set A ⊆ R if it meets the following two criteria:
(i) s is an upper bound for A;
(ii) if b is any upper bound for A, then s ≤ b.

The least upper bound is also frequently called the supremum of the set A. Although the notation s = lub A is still common, we will always write s = sup A for the least upper bound. The greatest lower bound or infimum for A is defined in a similar way (Exercise 1.3.2) and is denoted by inf A (Fig. 1.3).

Although a set can have a host of upper bounds, it can have only one least upper bound. If s_1 and s_2 are both least upper bounds for a set A, then by property (ii) in Definition 1.3.2 we can assert s_1 ≤ s_2 and s_2 ≤ s_1. The conclusion is that s_1 = s_2 and least upper bounds are unique.

Figure 1.3: Definition of sup A and inf A. (Number line showing the set A, with the lower bounds to the left of inf A and the upper bounds to the right of sup A.)

Example 1.3.3. Let

A = {1/n : n ∈ N} = {1, 1/2, 1/3, . . . }.

The set A is bounded above and below. Successful candidates for an upper bound include 3, 2, and 3/2. For the least upper bound, we claim sup A = 1. To argue this rigorously using Definition 1.3.2, we need to verify that properties (i) and (ii) hold. For (i), we just observe that 1 ≥ 1/n for all choices of n ∈ N. To verify (ii), we begin by assuming we are in possession of some other upper bound b. Because 1 ∈ A and b is an upper bound for A, we must have 1 ≤ b. This is precisely what property (ii) asks us to show. Although we do not quite have the tools we need for a rigorous proof (see Theorem 1.4.2), it should be somewhat apparent that inf A = 0.

An important lesson to take from Example 1.3.3 is that sup A and inf A may or may not be elements of the set A. This issue is tied to understanding the crucial difference between the maximum and the supremum (or the minimum and the infimum) of a given set.

Definition 1.3.4. A real number a_0 is a maximum of the set A if a_0 is an element of A and a_0 ≥ a for all a ∈ A. Similarly, a number a_1 is a minimum of A if a_1 ∈ A and a_1 ≤ a for every a ∈ A.

Example 1.3.5. To belabor the point, consider the open interval

(0, 2) = {x ∈ R : 0 < x < 2},

and the closed interval

[0, 2] = {x ∈ R : 0 ≤ x ≤ 2}.

Both sets are bounded above (and below), and both have the same least upper bound, namely 2. It is not the case, however, that both sets have a maximum. A maximum is a specific type of upper bound that is required to be an element of the set in question, and the open interval (0, 2) does not possess such an element. Thus, the supremum can exist and not be a maximum, but when a maximum exists then it is also the supremum.
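The distinction between a maximum and a supremum (and between a minimum and an infimum) can be seen on the set A = {1/n : n ∈ N} from Example 1.3.3. The sketch below assumes finite truncations of A, built by a hypothetical helper `truncated_A`. Every finite truncation does have a minimum, so the code only suggests the behaviour of the full infinite set.

```python
from fractions import Fraction

def truncated_A(size: int) -> set[Fraction]:
    """The first `size` elements of A = {1/n : n in N}."""
    return {Fraction(1, n) for n in range(1, size + 1)}

for size in (10, 100, 1000):
    A = truncated_A(size)
    # Largest element, smallest element, and whether 0 belongs to the sample.
    print(size, max(A), min(A), 0 in A)
```

The largest element is 1 in every truncation, consistent with sup A = 1 being attained as a maximum. The smallest element of the truncation keeps shrinking as more terms are included, while 0 itself never appears, consistent with inf A = 0 not being a minimum of A.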

Let’s turn our attention back to the Axiom of Completeness. Although we can see now that not every nonempty bounded set contains a maximum, the Axiom of Completeness asserts that every such set does have a least upper bound. We are not going to prove this. An axiom in mathematics is an accepted assumption, to be used without proof. Preferably, an axiom should be an elementary statement about the system in question that is so fundamental that it seems to need no justification. Perhaps the Axiom of Completeness fits this description, and perhaps it does not. Before deciding, let’s remind ourselves why it is not a valid statement about Q.

Example 1.3.6. Consider again the set

S = {r ∈ Q : r² < 2},

and pretend for the moment that our world consists only of rational numbers. The set S is certainly bounded above. Taking b = 2 works, as does b = 3/2. But notice what happens as we go in search of the least upper bound. (It may be useful here to know that the decimal expansion for √2 begins 1.4142 . . . .) We might try b = 142/100, which is indeed an upper bound, but then we discover that b = 1415/1000 is an upper bound that is smaller still. Is there a smallest one?

In the rational numbers, there is not. In the real numbers, there is. Back in R, the Axiom of Completeness states that we may set α = sup S and be confident that such a number exists. In the next section, we will prove that α² = 2. But according to Theorem 1.1.1, this implies α is not a rational number. If we are restricting our attention to only rational numbers, then α is not an allowable option for sup S, and the search for a least upper bound goes on indefinitely. Whatever rational upper bound is discovered, it is always possible to find one smaller.

The tools needed to carry out the computations described in Example 1.3.6 depend on some results about how Q and N fit inside of R. These are discussed in the next section.

We now give an equivalent and useful way of characterizing least upper bounds. Recall that Definition 1.3.2 of the supremum has two parts. Part (i) says that sup A must be an upper bound, and part (ii) states that it must be the smallest one. The following lemma offers an alternative way to restate part (ii).

Lemma 1.3.7. Assume s ∈ R is an upper bound for a set A ⊆ R. Then, s = sup A if and only if, for every choice of ε > 0, there exists an element a ∈ A satisfying s − ε < a.

Proof. Here is a short rephrasing of the lemma: Given that s is an upper bound, s is the least upper bound if and only if any number smaller than s is not an upper bound. Putting it this way almost qualifies as a proof, but we will expand on what exactly is being said in each direction.

(⇒) For the forward direction, we assume s = sup A and consider s − ε, where ε > 0 has been arbitrarily chosen. Because s − ε < s, part (ii) of Definition 1.3.2 implies that s − ε is not an upper bound for A. If this is the case, then there must be some element a ∈ A for which s − ε < a (because otherwise s − ε would be an upper bound). This proves the lemma in one direction.

(⇐) Conversely, assume s is an upper bound with the property that no matter how ε > 0 is chosen, s − ε is no longer an upper bound for A. Notice that what this implies is that if b is any number less than s, then b is not an upper bound. (Just let ε = s − b.) To prove that s = sup A, we must verify part (ii) of Definition 1.3.2. (Read it again.) Because we have just argued that any number smaller than s cannot be an upper bound, it follows that if b is some other upper bound for A, then b ≥ s.

It is certainly the case that all of our conclusions to this point about least upper bounds have analogous versions for greatest lower bounds. The Axiom of Completeness does not explicitly assert that a nonempty set bounded below has an infimum, but this is because we do not need to assume this fact as part of the axiom. Using the Axiom of Completeness, there are several ways to prove that greatest lower bounds exist for bounded sets. One such proof is explored in Exercise 1.3.3.
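Returning briefly to Example 1.3.6, the claim that every rational upper bound for S = {r ∈ Q : r² < 2} can be replaced by a smaller one can be made concrete. The sketch below is one possible construction and is not taken from the text: given a positive rational b with b² > 2, it averages b with 2/b (a step familiar from the Babylonian square-root method). The result is again rational, is smaller than b, and still has square greater than 2, so it is still an upper bound for S. The helper name `smaller_upper_bound` is hypothetical.

```python
from fractions import Fraction

def smaller_upper_bound(b: Fraction) -> Fraction:
    """Given rational b > 0 with b*b > 2, return a smaller rational upper bound for S."""
    assert b > 0 and b * b > 2
    # The average of b and 2/b lies strictly between them, and its square exceeds 2.
    return (b + 2 / b) / 2

b = Fraction(3, 2)
for _ in range(4):
    print(b, float(b * b))   # current bound and its square
    b = smaller_upper_bound(b)
```

Each pass produces a strictly smaller rational upper bound, so within Q the search for a least one never terminates, which is the point of the example.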

Exercises

Exercise 1.3.1. Let Z_5 = {0, 1, 2, 3, 4} and define addition and multiplication modulo 5. In other words, compute the integer remainder when a + b and ab are divided by 5, and use this as the value for the sum and product, respectively.
(a) Show that, given any element z ∈ Z_5, there exists an element y such that z + y = 0. The element y is called the additive inverse of z.
(b) Show that, given any z ≠ 0 in Z_5, there exists an element x such that zx = 1. The element x is called the multiplicative inverse of z.
(c) The existence of additive and multiplicative inverses is part of the definition of a field. Investigate the set Z_4 = {0, 1, 2, 3} (where addition and multiplication are defined modulo 4) for the existence of additive and multiplicative inverses. Make a conjecture about the values of n for which additive inverses exist in Z_n, and then form another conjecture about the existence of multiplicative inverses.

Exercise 1.3.2. (a) Write a formal definition in the style of Definition 1.3.2 for the infimum or greatest lower bound of a set.
(b) Now, state and prove a version of Lemma 1.3.7 for greatest lower bounds.

Exercise 1.3.3. (a) Let A be bounded below, and define B = {b ∈ R : b is a lower bound for A}. Show that sup B = inf A.
(b) Use (a) to explain why there is no need to assert that greatest lower bounds exist as part of the Axiom of Completeness.
(c) Propose another way to use the Axiom of Completeness to prove that sets bounded below have greatest lower bounds.

Exercise 1.3.4. Assume that A and B are nonempty, bounded above, and satisfy B ⊆ A. Show sup B ≤ sup A.

Exercise 1.3.5. Let A ⊆ R be bounded above, and let c ∈ R. Define the sets c + A and cA by c + A = {c + a : a ∈ A} and cA = {ca : a ∈ A}.
(a) Show that sup(c + A) = c + sup A.
(b) If c ≥ 0, show that sup(cA) = c sup A.
(c) Postulate a similar type of statement for sup(cA) for the case c < 0.

Exercise 1.3.6. Compute, without proofs, the suprema and infima of the following sets:
(a) {n ∈ N : n² < 10}.
(b) {n/(m + n) : m, n ∈ N}.
(c) {n/(2n + 1) : n ∈ N}.
(d) {n/m : m, n ∈ N with m + n ≤ 10}.

Exercise 1.3.7. Prove that if a is an upper bound for A, and if a is also an element of A, then it must be that a = sup A.

Exercise 1.3.8. If sup A < sup B, then show that there exists an element b ∈ B that is an upper bound for A.

Exercise 1.3.9. Without worrying about formal proofs for the moment, decide if the following statements about suprema and infima are true or false. For any that are false, supply an example where the claim in question does not appear to hold.
(a) A finite, nonempty set always contains its supremum.
(b) If a < L for every element a in the set A, then sup A < L.
(c) If A and B are sets with the property that a < b for every a ∈ A and every b ∈ B, then it follows that sup A < inf B.
(d) If sup A = s and sup B = t, then sup(A + B) = s + t. The set A + B is defined as A + B = {a + b : a ∈ A and b ∈ B}.
(e) If sup A ≤ sup B, then there exists an element b ∈ B that is an upper bound for A.

1.4 Consequences of Completeness

The first application of the Axiom of Completeness is a result that may look like a more natural way to mathematically express the sentiment that the real line contains no gaps.

Theorem 1.4.1 (Nested Interval Property). For each n ∈ N, assume we are given a closed interval $I_n = [a_n, b_n] = \{x \in \mathbf{R} : a_n \le x \le b_n\}$. Assume also that each $I_n$ contains $I_{n+1}$. Then, the resulting nested sequence of closed intervals
$$I_1 \supseteq I_2 \supseteq I_3 \supseteq I_4 \supseteq \cdots$$
has a nonempty intersection; that is, $\bigcap_{n=1}^{\infty} I_n \neq \emptyset$.


Proof. In order to show that $\bigcap_{n=1}^{\infty} I_n$ is not empty, we are going to use the Axiom of Completeness (AoC) to produce a single real number x satisfying $x \in I_n$ for every n ∈ N. Now, AoC is a statement about bounded sets, and the one we want to consider is the set $A = \{a_n : n \in \mathbf{N}\}$ of left-hand endpoints of the intervals.

[Figure: the endpoints on the real line in the order $a_1, a_2, a_3, \ldots, a_n, \ldots, b_n, \ldots, b_3, b_2, b_1$, with the left-hand endpoints forming the set $A = \{a_n : n \in \mathbf{N}\}$.]

Because the intervals are nested, we see that every $b_n$ serves as an upper bound for A. Thus, we are justified in setting x = sup A. Now, consider a particular $I_n = [a_n, b_n]$. Because x is an upper bound for A, we have $a_n \le x$. The fact that each $b_n$ is an upper bound for A and that x is the least upper bound implies $x \le b_n$. Altogether then, we have $a_n \le x \le b_n$, which means $x \in I_n$ for every choice of n ∈ N. Hence, $x \in \bigcap_{n=1}^{\infty} I_n$, and the intersection is not empty.
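The mechanics of this proof can be watched on a finite truncation. The sketch below builds nested closed intervals with rational endpoints by repeatedly bisecting [1, 2] (an arbitrary illustrative choice, as is the name `nested_intervals`) and checks that the largest left-hand endpoint lies in every interval. For finitely many intervals the supremum of A is simply a maximum. The theorem itself concerns infinitely many intervals, where the supremum need not be one of the endpoints and the Axiom of Completeness is what guarantees it exists.

```python
from fractions import Fraction


def nested_intervals(steps):
    """Return closed intervals [a_n, b_n], each containing the next, built by bisecting [1, 2]."""
    a, b = Fraction(1), Fraction(2)
    intervals = [(a, b)]
    for _ in range(steps):
        mid = (a + b) / 2
        # Keep the half whose endpoints still straddle the point where t*t crosses 2.
        if mid * mid <= 2:
            a = mid
        else:
            b = mid
        intervals.append((a, b))
    return intervals


intervals = nested_intervals(20)
x = max(a for a, _ in intervals)  # sup of the (finite) set of left-hand endpoints
print(all(a <= x <= b for a, b in intervals))
```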

The Density of Q in R

The set Q is an extension of N, and R in turn is an extension of Q. The next few results indicate how N and Q sit inside of R.

Theorem 1.4.2 (Archimedean Property). (i) Given any number x ∈ R, there exists an n ∈ N satisfying n > x. (ii) Given any real number y > 0, there exists an n ∈ N satisfying 1/n < y.

Proof. Part (i) of the theorem states that N is not bounded above. There has never been any doubt about the truth of this, and it could be reasonably argued that we should not have to prove it at all. This is a legitimate point of view, especially in light of the fact that we have decided to assume other familiar properties of N, Z, and Q as given. The counterargument is that we will prove it because we can. A set can possess the Archimedean property without being complete (Q is a fine example), but a demonstration of this fact requires a good deal of scrutiny into the axiomatic construction of the ordered field in question. In the case of R, the Axiom of Completeness furnishes us with a very short argument. A large number of deep results ultimately depend on this relationship between R and N, so having a proof for it adds a little extra certainty to these upcoming arguments.

And so to the proof. Assume, for contradiction, that N is bounded above. By the Axiom of Completeness (AoC), N should then have a least upper bound, and we can set α = sup N. If we consider α − 1, then we no longer have an upper bound (see Lemma 1.3.7), and therefore there exists an n ∈ N satisfying α − 1 < n. But this is equivalent to α < n + 1. Because n + 1 ∈ N, we have a contradiction to the fact that α is supposed to be an upper bound for N. (Notice that the contradiction here depends only on AoC and the fact that N is closed under addition.)

Part (ii) follows from (i) by letting x = 1/y.

This familiar property of N is the key to an extremely important fact about how Q fits inside of R.

Theorem 1.4.3 (Density of Q in R). For every two real numbers a and b with a < b, there exists a rational number r satisfying a < r < b.

Proof. To simplify matters, let’s assume 0 ≤ a < b. The case where a < 0 follows quickly from this one (Exercise 1.4.1). A rational number is a quotient of integers, so we must produce m, n ∈ N so that
$$a < \frac{m}{n} < b. \tag{1}$$

The first step is to choose the denominator n large enough so that consecutive increments of size 1/n are too close together to “step over” the interval (a, b).

[Figure: the points 0, 1/n, 2/n, 3/n, ..., (m − 1)/n, m/n marked on the real line, together with the points a and b.]

Using Theorem 1.4.2, we may pick n ∈ N large enough so that
$$\frac{1}{n} < b - a. \tag{2}$$
Multiplying inequality (1) by n gives na < m < nb. With n already chosen, the idea now is to choose m to be the smallest natural number greater than na. In other words, pick m ∈ N so that
$$m - 1 \overset{(3)}{\le} na \overset{(4)}{<} m.$$
Now, inequality (4) immediately yields a < m/n, which is half of the battle. Keeping in mind that inequality (2) is equivalent to a < b − 1/n, we can use (3) to write
$$m \le na + 1 < n\left(b - \frac{1}{n}\right) + 1 = nb.$$
Because m < nb implies m/n < b, we have a < m/n < b, as desired.
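The two choices made in this proof, first n and then m, are explicit enough to compute. The following sketch follows them step by step for the case 0 ≤ a < b. It assumes a and b are supplied as exact rationals (a computer cannot hold an arbitrary real number), and the name `rational_between` is an illustrative choice.

```python
from fractions import Fraction
from math import floor


def rational_between(a, b):
    """Return m/n with a < m/n < b, following the proof of Theorem 1.4.3 for 0 <= a < b."""
    a, b = Fraction(a), Fraction(b)
    if not (0 <= a < b):
        raise ValueError("expected 0 <= a < b")
    n = floor(1 / (b - a)) + 1  # n > 1/(b - a), so 1/n < b - a   (inequality (2))
    m = floor(n * a) + 1        # m - 1 <= n*a < m                (inequalities (3) and (4))
    return Fraction(m, n)


r = rational_between(Fraction(1, 3), Fraction(1, 2))
print(r, Fraction(1, 3) < r < Fraction(1, 2))
```

Here the Archimedean Property is replaced by an explicit formula for n, which is available because b − a is itself rational in this setting.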


Theorem 1.4.3 is paraphrased by saying that Q is dense in R. Without working too hard, we can use this result to show that the irrational numbers are dense in R as well. Corollary 1.4.4. Given any two real numbers a < b, there exists an irrational number t satisfying a < t < b. Proof. Exercise 1.4.3.

The Existence of Square Roots

It is time to tend to some unfinished business left over from Example 1.3.6 and this chapter’s opening discussion.

Theorem 1.4.5. There exists a real number α ∈ R satisfying $\alpha^2 = 2$.

Proof. After reviewing Example 1.3.6, consider the set $T = \{t \in \mathbf{R} : t^2 < 2\}$ and set α = sup T. We are going to prove $\alpha^2 = 2$ by ruling out the possibilities $\alpha^2 < 2$ and $\alpha^2 > 2$. Keep in mind that there are two parts to the definition of sup T, and they will both be important. (This always happens when a supremum is used in an argument.) The strategy is to demonstrate that $\alpha^2 < 2$ violates the fact that α is an upper bound for T, and $\alpha^2 > 2$ violates the fact that it is the least upper bound.

Let’s first see what happens if we assume $\alpha^2 < 2$. In search of an element of T that is larger than α, write
$$\left(\alpha + \frac{1}{n}\right)^2 = \alpha^2 + \frac{2\alpha}{n} + \frac{1}{n^2} < \alpha^2 + \frac{2\alpha}{n} + \frac{1}{n} = \alpha^2 + \frac{2\alpha + 1}{n}.$$

But now assuming $\alpha^2 < 2$ gives us a little space in which to fit the $(2\alpha + 1)/n$ term and keep the total less than 2. Specifically, choose $n_0 \in \mathbf{N}$ large enough so that
$$\frac{1}{n_0} < \frac{2 - \alpha^2}{2\alpha + 1}.$$
This implies $(2\alpha + 1)/n_0 < 2 - \alpha^2$, and consequently that
$$\left(\alpha + \frac{1}{n_0}\right)^2 < \alpha^2 + (2 - \alpha^2) = 2.$$
Thus, $\alpha + 1/n_0 \in T$, contradicting the fact that α is an upper bound for T. We conclude that $\alpha^2 < 2$ cannot happen.
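The choice of the denominator in this step can be checked numerically. The sketch below is only an illustration: it takes a rational t ≥ 0 with t² < 2 standing in for α (the α of the proof is a supremum, not a rational number), picks the denominator by the same inequality, and confirms that the enlarged number still squares to less than 2. The function name is hypothetical.

```python
from fractions import Fraction
from math import floor


def room_below_two(t):
    """For rational t >= 0 with t*t < 2, return n0 in N with 1/n0 < (2 - t*t)/(2*t + 1)."""
    t = Fraction(t)
    if t < 0 or t * t >= 2:
        raise ValueError("expected t >= 0 with t*t < 2")
    bound = (2 - t * t) / (2 * t + 1)
    return floor(1 / bound) + 1  # strictly larger than 1/bound, so 1/n0 < bound


t = Fraction(7, 5)
n0 = room_below_two(t)
print(n0, (t + Fraction(1, n0)) ** 2 < 2)
```

The same estimate applied to α = sup T is what produces an element of T larger than α.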

Now, what about the case $\alpha^2 > 2$? This time, write
$$\left(\alpha - \frac{1}{n}\right)^2 = \alpha^2 - \frac{2\alpha}{n} + \frac{1}{n^2} > \alpha^2 - \frac{2\alpha}{n}.$$
The remainder of the argument is requested in Exercise 1.4.6.

A small modification of this proof can be used to show that $\sqrt{x}$ exists for any x ≥ 0. A formula for expanding $(\alpha + 1/n)^m$ called the binomial formula can be used to show that $\sqrt[m]{x}$ exists for arbitrary values of m ∈ N.

Countable and Uncountable Sets

The applications of the Axiom of Completeness to this point have basically served to restore our confidence in properties we already felt we knew about the real number system. One final consequence of completeness that we are about to present is of a very different nature and, on its own, represents an astounding intellectual discovery. The traditional way that mathematics gets done is by one mathematician modifying and expanding on the work of those who came before. This model does not seem to apply to Georg Cantor (1845–1918), at least with regard to his work on the theory of infinite sets.

At the moment, we have an image of R as consisting of rational and irrational numbers, continuously packed together along the real line. We have seen that both Q and I (the set of irrationals) are dense in R, meaning that in every interval (a, b) there exist rational and irrational numbers alike. Mentally, there is a temptation to think of Q and I as being intricately mixed together in equal proportions, but this turns out not to be the case. In a way that Cantor made precise, the irrational numbers far outnumber the rational numbers in making up the real line.

Cardinality

The term cardinality is used in mathematics to refer to the size of a set. The cardinalities of finite sets can be compared simply by attaching a natural number to each set. The set of Snow White’s dwarfs is smaller than the set of United States Supreme Court Justices because 7 is less than 9. But how might we draw this same conclusion without referring to any numbers? Cantor’s idea was to attempt to put the sets into a 1–1 correspondence with each other. There are fewer dwarfs than Justices because, if the dwarfs were all simultaneously appointed to the bench, there would still be two empty chairs to fill. On the other hand, the cardinality of the Supreme Court is the same as the cardinality of the set of fielders on a baseball team. This is because, when the judges take the field, it is possible to arrange them so that there is exactly one judge at every position.


The advantage of this method of comparing the sizes of sets is that it works equally well on sets that are infinite.

Definition 1.4.6. A function f : A → B is one-to-one (1–1) if $a_1 \neq a_2$ in A implies that $f(a_1) \neq f(a_2)$ in B. The function f is onto if, given any b ∈ B, it is possible to find an element a ∈ A for which f(a) = b.

A function f : A → B that is both 1–1 and onto provides us with exactly what we mean by a 1–1 correspondence between two sets. The property of being 1–1 means that no two elements of A correspond to the same element of B (no two judges are playing the same position), and the property of being onto ensures that every element of B corresponds to something in A (there is a judge at every position).

Definition 1.4.7. Two sets A and B have the same cardinality if there exists f : A → B that is 1–1 and onto. In this case, we write A ∼ B.

Example 1.4.8. (i) If we let E = {2, 4, 6, . . . } be the set of even natural numbers, then we can show N ∼ E. To see why, let f : N → E be given by f(n) = 2n.

N:   1   2   3   4   ···   n    ···
E:   2   4   6   8   ···   2n   ···

It is certainly true that E is a proper subset of N, and for this reason it may seem logical to say that E is a “smaller” set than N. This is one way to look at it, but it represents a point of view that is heavily biased from an overexposure to finite sets. The definition of cardinality is quite specific, and from this point of view E and N are equivalent.

(ii) To make this point again, note that although N is contained in Z as a proper subset, we can show N ∼ Z. This time let
$$f(n) = \begin{cases} (n - 1)/2 & \text{if } n \text{ is odd} \\ -n/2 & \text{if } n \text{ is even.} \end{cases}$$
The important details to verify are that f does not map any two natural numbers to the same element of Z (f is 1–1) and that every element of Z gets “hit” by something in N (f is onto).

N:   1    2    3    4    5    6    7   ···
Z:   0   −1    1   −2    2   −3    3   ···
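The correspondence in Example 1.4.8 (ii) can be written down and spot-checked directly. In the sketch below, `f` is the map from the example, while `f_inverse` is a helper introduced here for illustration; exhibiting a two-sided inverse is one standard way to see that a map is both 1–1 and onto. A finite check like this one is evidence, not a proof.

```python
def f(n):
    """The map N -> Z of Example 1.4.8 (ii)."""
    return (n - 1) // 2 if n % 2 == 1 else -(n // 2)


def f_inverse(z):
    """Hypothetical helper: the natural number n with f(n) == z."""
    return 2 * z + 1 if z >= 0 else -2 * z


print([f(n) for n in range(1, 8)])
print(all(f(f_inverse(z)) == z for z in range(-100, 101)))
```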

Figure 1.4: (−1, 1) ∼ R using $f(x) = x/(x^2 - 1)$.

Example 1.4.9. A little calculus (which we will not supply) shows that the function $f(x) = x/(x^2 - 1)$ takes the interval (−1, 1) onto R in a 1–1 fashion (Fig. 1.4). Thus (−1, 1) ∼ R. In fact, (a, b) ∼ R for any interval (a, b).

Countable Sets

Definition 1.4.10. A set A is countable if N ∼ A. An infinite set that is not countable is called an uncountable set.

From Example 1.4.8, we see that both E and Z are countable sets. Putting a set into a 1–1 correspondence with N, in effect, means putting all of the elements into an infinitely long list or sequence. Looking at Example 1.4.8, we can see that this was quite easy to do for E and required only a modest bit of shuffling for the set Z. A natural question arises as to whether all infinite sets are countable. Given some infinite set such as Q or R, it might seem as though, with enough cleverness, we should be able to fit all the elements of our set into a single list (i.e., into a correspondence with N). After all, this list is infinitely long so there should be plenty of room. But alas, as Hardy remarks, “[The mathematician’s] subject is the most curious of all; there is none in which truth plays such odd pranks.”

Theorem 1.4.11. (i) The set Q is countable. (ii) The set R is uncountable.

Proof. (i) For each n ∈ N, let $A_n$ be the set given by
$$A_n = \left\{ \pm\frac{p}{q} : \text{where } p, q \in \mathbf{N} \text{ are in lowest terms with } p + q = n \right\}.$$


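The first few of these sets look like
$$A_1 = \left\{\tfrac{0}{1}\right\}, \quad A_2 = \left\{\tfrac{1}{1}, \tfrac{-1}{1}\right\}, \quad A_3 = \left\{\tfrac{1}{2}, \tfrac{-1}{2}, \tfrac{2}{1}, \tfrac{-2}{1}\right\},$$
$$A_4 = \left\{\tfrac{1}{3}, \tfrac{-1}{3}, \tfrac{3}{1}, \tfrac{-3}{1}\right\}, \quad \text{and} \quad A_5 = \left\{\tfrac{1}{4}, \tfrac{-1}{4}, \tfrac{2}{3}, \tfrac{-2}{3}, \tfrac{3}{2}, \tfrac{-3}{2}, \tfrac{4}{1}, \tfrac{-4}{1}\right\}.$$
The crucial observation is that each $A_n$ is finite and every rational number appears in exactly one of these sets. Our 1–1 correspondence with N is then achieved by consecutively listing the elements in each $A_n$.

N:    1     2     3     4     5     6     7     8     9    10    11    12   ···
Q:   0/1   1/1  −1/1   1/2  −1/2   2/1  −2/1   1/3  −1/3   3/1  −3/1   1/4  ···

(Position 1 lists $A_1$, positions 2–3 list $A_2$, positions 4–7 list $A_3$, positions 8–11 list $A_4$, and $A_5$ begins at position 12.)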

Admittedly, writing an explicit formula for this correspondence would be an awkward task, and attempting to do so is not the best use of time. What matters is that we see why every rational number appears in the correspondence exactly once. Given, say, 22/7, we have that $22/7 \in A_{29}$. Because the set of elements in $A_1, \ldots, A_{28}$ is finite, we can be confident that 22/7 eventually gets included in the sequence. The fact that this line of reasoning applies to any rational number p/q is our proof that the correspondence is onto. To verify that it is 1–1, we observe that the sets $A_n$ were constructed to be disjoint so that no rational number appears twice. This completes the proof of (i).

(ii) The second statement of Theorem 1.4.11 is the truly unexpected part, and its proof is done by contradiction. Assume that there does exist a 1–1, onto function f : N → R. Again, what this suggests is that it is possible to enumerate the elements of R. If we let $x_1 = f(1)$, $x_2 = f(2)$, and so on, then our assumption that f is onto means that we can write
$$\mathbf{R} = \{x_1, x_2, x_3, x_4, \ldots\} \tag{1}$$
and be confident that every real number appears somewhere on the list. We will now use the Nested Interval Property (Theorem 1.4.1) to produce a real number that is not there.

Let $I_1$ be a closed interval that does not contain $x_1$. Next, let $I_2$ be a closed interval, contained in $I_1$, which does not contain $x_2$. The existence of such an $I_2$ is easy to verify. Certainly $I_1$ contains two smaller disjoint closed intervals, and $x_2$ can only be in one of these. In general, given an interval $I_n$, construct $I_{n+1}$ to satisfy
(i) $I_{n+1} \subseteq I_n$ and
(ii) $x_{n+1} \notin I_{n+1}$.

[Figure: the closed interval $I_{n+1}$ sitting inside $I_n$, with the point $x_{n+1}$ lying outside $I_{n+1}$ and the point $x_n$ lying outside $I_n$.]

We now consider the intersection $\bigcap_{n=1}^{\infty} I_n$. If $x_{n_0}$ is some real number from the list in (1), then we have $x_{n_0} \notin I_{n_0}$, and it follows that
$$x_{n_0} \notin \bigcap_{n=1}^{\infty} I_n.$$
Now, we are assuming that the list in (1) contains every real number, and this leads to the conclusion that
$$\bigcap_{n=1}^{\infty} I_n = \emptyset.$$
However, the Nested Interval Property (NIP) asserts that $\bigcap_{n=1}^{\infty} I_n \neq \emptyset$. By NIP, there is at least one $x \in \bigcap_{n=1}^{\infty} I_n$ that, consequently, cannot be on the list in (1). This contradiction means that such an enumeration of R is impossible, and we conclude that R is an uncountable set.

What exactly should we make of this discovery? It is an important exercise to show that any subset of a countable set must be either countable or finite. This should not be too surprising. If a set can be arranged into a single list, then deleting some elements from this list results in another (shorter, and potentially terminating) list. This means that countable sets are the smallest type of infinite set. Anything smaller is either still countable or finite.

The force of Theorem 1.4.11 is that the cardinality of R is, informally speaking, a larger type of infinity. The real numbers so outnumber the natural numbers that there is no way to map N onto R. No matter how we attempt this, there are always real numbers to spare. The set Q, on the other hand, is countable. As far as infinite sets are concerned, this is as small as it gets. What does this imply about the set I of irrational numbers? By imitating the demonstration that N ∼ Z, we can prove that the union of two countable sets must be countable. Because R = Q ∪ I, it follows that I cannot be countable because otherwise R would be. The inescapable conclusion is that, despite the fact that we have encountered so few of them, the irrational numbers form a far greater subset of R than Q.

The properties of countable sets described in this discussion are useful for a few exercises in upcoming chapters. For easier reference, we state them as some final propositions and outline their proofs in the exercises that follow.

Theorem 1.4.12. If A ⊆ B and B is countable, then A is either countable, finite, or empty.

Theorem 1.4.13. (i) If $A_1, A_2, \ldots, A_m$ are each countable sets, then the union $A_1 \cup A_2 \cup \cdots \cup A_m$ is countable.
(ii) If $A_n$ is a countable set for each n ∈ N, then $\bigcup_{n=1}^{\infty} A_n$ is countable.
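The listing of Q used in the proof of Theorem 1.4.11 (i) is concrete enough to generate. The sketch below is one possible rendering: it lists each finite block in order of increasing numerator, a convention that reproduces the sets displayed in the proof but is otherwise an assumption, and the names `block` and `rationals` are illustrative.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd


def block(n):
    """List the elements of A_n: the fractions p/q and -p/q in lowest terms with p + q = n."""
    if n == 1:
        return [Fraction(0)]  # the proof places 0/1 in A_1
    out = []
    for p in range(1, n):
        q = n - p
        if gcd(p, q) == 1:  # skip fractions that are not in lowest terms
            out += [Fraction(p, q), Fraction(-p, q)]
    return out


def rationals():
    """Yield the rational numbers by listing A_1, A_2, A_3, ... consecutively."""
    for n in count(1):
        yield from block(n)


print([str(r) for r in islice(rationals(), 12)])
print(Fraction(22, 7) in block(29))
```

Because every block is finite, any particular rational is reached after finitely many steps, which is the content of the "onto" half of the argument.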


Exercises

Exercise 1.4.1. Without doing too much work, show how to prove Theorem 1.4.3 in the case where a < 0 by converting this case into the one already proven.

Exercise 1.4.2. Recall that I stands for the set of irrational numbers.
(a) Show that if a, b ∈ Q, then ab and a + b are elements of Q as well.
(b) Show that if a ∈ Q and t ∈ I, then a + t ∈ I and at ∈ I as long as a ≠ 0.
(c) Part (a) can be summarized by saying that Q is closed under addition and multiplication. Is I closed under addition and multiplication? Given two irrational numbers s and t, what can we say about s + t and st?

Exercise 1.4.3. Using Exercise 1.4.2, supply a proof for Corollary 1.4.4 by applying Theorem 1.4.3 to the real numbers $a - \sqrt{2}$ and $b - \sqrt{2}$.

Exercise 1.4.4. Use the Archimedean Property of R to rigorously prove that inf{1/n : n ∈ N} = 0.

Exercise 1.4.5. Prove that $\bigcap_{n=1}^{\infty} (0, 1/n) = \emptyset$. Notice that this demonstrates that the intervals in the Nested Interval Property must be closed for the conclusion of the theorem to hold.

Exercise 1.4.6. (a) Finish the proof of Theorem 1.4.5 by showing that the assumption $\alpha^2 > 2$ leads to a contradiction of the fact that α = sup T.
(b) Modify this argument to prove the existence of $\sqrt{b}$ for any real number b ≥ 0.

Exercise 1.4.7. Finish the following proof for Theorem 1.4.12. Assume B is a countable set. Thus, there exists f : N → B, which is 1–1 and onto. Let A ⊆ B be an infinite subset of B. We must show that A is countable. Let $n_1 = \min\{n \in \mathbf{N} : f(n) \in A\}$. As a start to a definition of g : N → A, set $g(1) = f(n_1)$. Show how to inductively continue this process to produce a 1–1 function g from N onto A.

Exercise 1.4.8. Use the following outline to supply proofs for the statements in Theorem 1.4.13.
(a) First, prove statement (i) for two countable sets, $A_1$ and $A_2$. Example 1.4.8 (ii) may be a useful reference. Some technicalities can be avoided by first replacing $A_2$ with the set $B_2 = A_2 \setminus A_1 = \{x \in A_2 : x \notin A_1\}$. The point of this is that the union $A_1 \cup B_2$ is equal to $A_1 \cup A_2$ and the sets $A_1$ and $B_2$ are disjoint. (What happens if $B_2$ is finite?) Now, explain how the more general statement in (i) follows.
(b) Explain why induction cannot be used to prove part (ii) of Theorem 1.4.13 from part (i).

(c) Show how arranging N into the two-dimensional array
$$\begin{array}{cccccc} 1 & 3 & 6 & 10 & 15 & \cdots \\ 2 & 5 & 9 & 14 & \cdots & \\ 4 & 8 & 13 & \cdots & & \\ 7 & 12 & \cdots & & & \\ 11 & \cdots & & & & \\ \vdots & & & & & \end{array}$$
leads to a proof of Theorem 1.4.13 (ii).

Exercise 1.4.9. (a) Given sets A and B, explain why A ∼ B is equivalent to asserting B ∼ A.
(b) For three sets A, B, and C, show that A ∼ B and B ∼ C implies A ∼ C. These two properties are what is meant by saying that ∼ is an equivalence relation.

Exercise 1.4.10. Show that the set of all finite subsets of N is a countable set. (It turns out that the set of all subsets of N is not a countable set. This is the topic of Section 1.5.)

Exercise 1.4.11. Consider the open interval (0, 1), and let S be the set of points in the open unit square; that is, S = {(x, y) : 0 < x, y < 1}.
(a) Find a 1–1 function that maps (0, 1) into, but not necessarily onto, S. (This is easy.)
(b) Use the fact that every real number has a decimal expansion to produce a 1–1 function that maps S into (0, 1). Discuss whether the formulated function is onto. (Keep in mind that any terminating decimal expansion such as .235 represents the same real number as .234999 . . . .)
The Schröder–Bernstein Theorem discussed in Exercise 1.4.13 to follow can now be applied to conclude that (0, 1) ∼ S.

Exercise 1.4.12. A real number x ∈ R is called algebraic if there exist integers $a_0, a_1, a_2, \ldots, a_n \in \mathbf{Z}$, not all zero, such that
$$a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 = 0.$$
Said another way, a real number is algebraic if it is the root of a polynomial with integer coefficients. Real numbers that are not algebraic are called transcendental numbers. Reread the last paragraph of Section 1.1. The final question posed here is closely related to the question of whether or not transcendental numbers exist.
(a) Show that $\sqrt{2}$, $\sqrt[3]{2}$, and $\sqrt{3} + \sqrt{2}$ are algebraic.
(b) Fix n ∈ N, and let $A_n$ be the algebraic numbers obtained as roots of polynomials with integer coefficients that have degree n. Using the fact that every polynomial has a finite number of roots, show that $A_n$ is countable. (For each m ∈ N, consider polynomials $a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$ that satisfy $|a_n| + |a_{n-1}| + \cdots + |a_1| + |a_0| \le m$.)


(c) Now, argue that the set of all algebraic numbers is countable. What may we conclude about the set of transcendental numbers?

Exercise 1.4.13 (Schröder–Bernstein Theorem). Assume there exists a 1–1 function f : X → Y and another 1–1 function g : Y → X. Follow the steps to show that there exists a 1–1, onto function h : X → Y and hence X ∼ Y.
(a) The range of f is defined by f(X) = {y ∈ Y : y = f(x) for some x ∈ X}. Let y ∈ f(X). (Because f is not necessarily onto, the range f(X) may not be all of Y.) Explain why there exists a unique x ∈ X such that f(x) = y. Now define $f^{-1}(y) = x$, and show that $f^{-1}$ is a 1–1 function from f(X) onto X. In a similar way, we can also define the 1–1 function $g^{-1} : g(Y) \to Y$.
(b) Let x ∈ X be arbitrary. Let the chain $C_x$ be the set consisting of all elements of the form
$$\ldots, f^{-1}(g^{-1}(x)),\ g^{-1}(x),\ x,\ f(x),\ g(f(x)),\ f(g(f(x))),\ \ldots. \tag{1}$$
Explain why the number of elements to the left of x in the above chain may be zero, finite, or infinite.
(c) Show that any two chains are either identical or completely disjoint.
(d) Note that the terms of the chain in (1) alternate between elements of X and elements of Y. Given a chain $C_x$, we want to focus on $C_x \cap Y$, which is just the part of the chain that sits in Y. Define the set A to be the union of all chains $C_x$ satisfying $C_x \cap Y \subseteq f(X)$. Let B consist of the union of the remaining chains not in A. Show that any chain contained in B must be of the form y, g(y), f(g(y)), g(f(g(y))), . . . , where y is an element of Y that is not in f(X).
(e) Let $X_1 = A \cap X$, $X_2 = B \cap X$, $Y_1 = A \cap Y$, and $Y_2 = B \cap Y$. Show that f maps $X_1$ onto $Y_1$ and that g maps $Y_2$ onto $X_2$. Use this information to prove X ∼ Y.
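The two-dimensional array in Exercise 1.4.8 (c) follows a pattern that can be captured in a short formula: the entry in row r and column c lies on the diagonal d = r + c − 1, and the diagonals are filled one after another from the bottom left to the top right. The sketch below encodes that reading of the array; the formula and the names `position` and `cell` are inferred from the displayed entries and are not given in the text.

```python
def position(r, c):
    """Return the natural number sitting in row r, column c of the array (1-indexed)."""
    d = r + c - 1                   # index of the diagonal through (r, c)
    return (d - 1) * d // 2 + c     # entries on earlier diagonals, plus the offset along this one


def cell(k):
    """Return the (row, column) that holds the natural number k."""
    d = 1
    while d * (d + 1) // 2 < k:     # find the diagonal that contains k
        d += 1
    c = k - (d - 1) * d // 2
    return d - c + 1, c


print([position(1, c) for c in range(1, 6)])
print([position(r, 1) for r in range(1, 6)])
```

That `position` and `cell` undo each other on every value tried is consistent with the array giving a 1–1 correspondence between N and pairs of natural numbers, which is the ingredient needed for Theorem 1.4.13 (ii).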

1.5 Cantor’s Theorem

Cantor’s work into the theory of inﬁnite sets extends far beyond the conclusions of Theorem 1.4.11. Although initially resisted, his creative and relentless assault in this area eventually produced a revolution in set theory and a paradigm shift in the way mathematicians came to understand the inﬁnite.

Cantor’s Diagonalization Method

The proof presented for Theorem 1.4.11 (ii) is different from any of the arguments that Cantor gave for this result. It was chosen because of how directly it reveals the connection between the concepts of uncountability and completeness, and because the technique of using nested intervals will be used several more times in our work ahead.


Cantor initially published his discovery that R is uncountable in 1874, but in 1891 he offered another proof of this same fact that is startling in its simplicity. It relies on decimal representations for real numbers, which we will accept and use without any formal definitions.

Theorem 1.5.1. The open interval (0, 1) = {x ∈ R : 0 < x < 1} is uncountable.

Exercise 1.5.1. Show that (0, 1) is uncountable if and only if R is uncountable. This shows that Theorem 1.5.1 is equivalent to Theorem 1.4.11.

Proof. As with Theorem 1.4.11, we proceed by contradiction and assume that there does exist a function f : N → (0, 1) that is 1–1 and onto. For each m ∈ N, f(m) is a real number between 0 and 1, and we represent it using the decimal notation
$$f(m) = .a_{m1} a_{m2} a_{m3} a_{m4} a_{m5} \ldots.$$
What is meant here is that for each m, n ∈ N, $a_{mn}$ is the digit from the set {0, 1, 2, . . . , 9} that represents the nth digit in the decimal expansion of f(m). The 1–1 correspondence between N and (0, 1) can be summarized in the doubly indexed array
$$\begin{array}{ccccccccccc}
\mathbf{N} & & (0,1) & & & & & & & & \\
1 & \longleftrightarrow & f(1) & = & .a_{11} & a_{12} & a_{13} & a_{14} & a_{15} & a_{16} & \cdots \\
2 & \longleftrightarrow & f(2) & = & .a_{21} & a_{22} & a_{23} & a_{24} & a_{25} & a_{26} & \cdots \\
3 & \longleftrightarrow & f(3) & = & .a_{31} & a_{32} & a_{33} & a_{34} & a_{35} & a_{36} & \cdots \\
4 & \longleftrightarrow & f(4) & = & .a_{41} & a_{42} & a_{43} & a_{44} & a_{45} & a_{46} & \cdots \\
5 & \longleftrightarrow & f(5) & = & .a_{51} & a_{52} & a_{53} & a_{54} & a_{55} & a_{56} & \cdots \\
6 & \longleftrightarrow & f(6) & = & .a_{61} & a_{62} & a_{63} & a_{64} & a_{65} & a_{66} & \cdots \\
\vdots & & \vdots & & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{array}$$

The key assumption about this correspondence is that every real number in (0, 1) is assumed to appear somewhere on the list.

Now for the pearl of the argument. Define a real number x ∈ (0, 1) with the decimal expansion $x = .b_1 b_2 b_3 b_4 \ldots$ using the rule
$$b_n = \begin{cases} 2 & \text{if } a_{nn} \neq 2 \\ 3 & \text{if } a_{nn} = 2. \end{cases}$$
Let’s be clear about this. To compute the digit $b_1$, we look at the digit $a_{11}$ in the upper left-hand corner of the array. If $a_{11} = 2$, then we choose $b_1 = 3$; otherwise, we set $b_1 = 2$.

Exercise 1.5.2. (a) Explain why the real number $x = .b_1 b_2 b_3 b_4 \ldots$ cannot be f(1).
(b) Now, explain why x ≠ f(2), and in general why x ≠ f(n) for any n ∈ N.


(c) Point out the contradiction that arises from these observations and conclude that (0, 1) is uncountable.
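The digit rule in the proof of Theorem 1.5.1 can be applied mechanically to any finite portion of such an array. The sketch below does this for a made-up list of six digit strings, which are purely illustrative data, and confirms that the constructed string differs from the nth row in the nth digit. A finite computation cannot replace the argument requested in Exercise 1.5.2; it only shows the rule in action.

```python
def diagonal(rows):
    """Apply the rule b_n = 3 if a_nn == 2, else 2, to the first len(rows) rows of digits."""
    return "".join("3" if row[i] == "2" else "2" for i, row in enumerate(rows))


# Hypothetical first six digits of f(1), ..., f(6); any digit strings would do.
rows = ["500000", "333333", "142857", "250000", "222222", "718281"]
b = diagonal(rows)
print("." + b)
print(all(b[i] != rows[i][i] for i in range(len(rows))))
```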

Exercise 1.5.3. Supply rebuttals to the following complaints about the proof of Theorem 1.5.1.
(a) Every rational number has a decimal expansion so we could apply this same argument to show that the set of rational numbers between 0 and 1 is uncountable. However, because we know that any subset of Q must be countable, the proof of Theorem 1.5.1 must be flawed.
(b) A few numbers have two different decimal representations. Specifically, any decimal expansion that terminates can also be written with repeating 9’s. For instance, 1/2 can be written as .5 or as .4999 . . . . Doesn’t this cause some problems?

Exercise 1.5.4. Let S be the set consisting of all sequences of 0’s and 1’s. Observe that S is not a particular sequence, but rather a large set whose elements are sequences; namely,
$$S = \{(a_1, a_2, a_3, \ldots) : a_n = 0 \text{ or } 1\}.$$
As an example, the sequence (1, 0, 1, 0, 1, 0, 1, 0, . . . ) is an element of S, as is the sequence (1, 1, 1, 1, 1, 1, . . . ). Give a rigorous argument showing that S is uncountable.

Having distinguished between the countable infinity of N and the uncountable infinity of R, a new question that occupied Cantor was whether or not there existed an infinity “above” that of R. This is logically treacherous territory. The same care we gave to defining the relationship “has the same cardinality as” needs to be given to defining relationships such as “has cardinality greater than” or “has cardinality less than or equal to.” Nevertheless, without getting too weighed down with formal definitions, one gets a very clear sense from the next result that there is a hierarchy of infinite sets that continues well beyond the continuum of R.

Power Sets and Cantor’s Theorem

Given a set A, the power set P(A) refers to the collection of all subsets of A. It is important to understand that P(A) is itself considered a set whose elements are the different possible subsets of A.

Exercise 1.5.5. (a) Let A = {a, b, c}. List the eight elements of P(A). (Do not forget that ∅ is considered to be a subset of every set.)
(b) If A is finite with n elements, show that P(A) has $2^n$ elements. (Constructing a particular subset of A can be interpreted as making a series of decisions about whether or not to include each element of A.)
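For a small finite set, the power set can be generated outright, which makes the definition concrete and gives a quick check on a count such as the one in Exercise 1.5.5. The sketch below builds the subsets by size; `power_set` is an illustrative name, and `frozenset` is used only because Python requires hashable elements when sets are collected together.

```python
from itertools import combinations


def power_set(A):
    """Return all subsets of the finite collection A, as a list of frozensets."""
    items = list(A)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]


subsets = power_set({"a", "b", "c"})
print(len(subsets))
print(sorted(sorted(s) for s in subsets))
```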


Exercise 1.5.6. (a) Using the particular set A = {a, b, c}, exhibit two different 1–1 mappings from A into P(A).
(b) Letting B = {1, 2, 3, 4}, produce an example of a 1–1 map g : B → P(B).
(c) Explain why, in parts (a) and (b), it is impossible to construct mappings that are onto.

Cantor’s Theorem states that the phenomenon in Exercise 1.5.6 holds for infinite sets as well as finite sets. Whereas mapping A into P(A) is quite effortless, finding an onto map is impossible.

Theorem 1.5.2 (Cantor’s Theorem). Given any set A, there does not exist a function f : A → P(A) that is onto.

Proof. This proof, like the others of its kind, is indirect. Thus, assume, for contradiction, that f : A → P(A) is onto. Unlike the usual situation in which we have sets of numbers for the domain and range, f is a correspondence between a set and its power set. For each element a ∈ A, f(a) is a particular subset of A. The assumption that f is onto means that every subset of A appears as f(a) for some a ∈ A. To arrive at a contradiction, we will produce a subset B ⊆ A that is not equal to f(a) for any a ∈ A.

Construct B using the following rule. For each element a ∈ A, consider the subset f(a). This subset of A may contain the element a or it may not. This depends on the function f. If f(a) does not contain a, then we include a in our set B. More precisely, let
$$B = \{a \in A : a \notin f(a)\}.$$

Exercise 1.5.7. Return to the particular functions constructed in Exercise 1.5.6 and construct the subset B that results using the preceding rule. In each case, note that B is not in the range of the function used.

We now focus on the general argument. Because we have assumed that our function f : A → P(A) is onto, it must be that B = f(a′) for some a′ ∈ A. The contradiction arises when we consider whether or not a′ is an element of B.

Exercise 1.5.8. (a) First, show that the case a′ ∈ B leads to a contradiction.
(b) Now, finish the argument by showing that the case a′ ∉ B is equally unacceptable.
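The rule defining B can be carried out explicitly whenever A is finite and f is given as a table. The sketch below uses a made-up map on A = {a, b, c}, chosen purely for illustration, and checks that the resulting set B is missed by f, in the spirit of Exercise 1.5.7. The function name `cantor_set` is hypothetical.

```python
def cantor_set(A, f):
    """Return B = {a in A : a not in f(a)}, where f maps each element of A to a subset of A."""
    return frozenset(a for a in A if a not in f[a])


A = {"a", "b", "c"}
# A hypothetical map f : A -> P(A), stored as a dictionary.
f = {
    "a": frozenset({"a", "b"}),
    "b": frozenset(),
    "c": frozenset({"a", "c"}),
}
B = cantor_set(A, f)
print(sorted(B), B in f.values())
```

Changing the table changes B, but in every case the construction is arranged so that B disagrees with f(a) about the element a itself.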

Exercise 1.5.9. As a ﬁnal exercise, answer each of the following by establishing a 1–1 correspondence with a set of known cardinality. (a) Is the set of all functions from {0, 1} to N countable or uncountable? (b) Is the set of all functions from N to {0, 1} countable or uncountable? (c) Given a set B, a subset A of P (B) is called an antichain if no element of A is a subset of any other element of A. Does P (N) contain an uncountable antichain?

1.6 Epilogue

The relationship of having the same cardinality is an equivalence relation (see Exercise 1.4.9), meaning, roughly, that all of the sets in the universe can be organized into disjoint groups according to their size. Two sets appear in the same group, or equivalence class, if and only if they have the same cardinality. Thus, N, Z, and Q are grouped together in one class with all of the other countable sets, whereas R is in another class that includes the interval (0, 1) among other uncountable sets. One implication of Cantor’s Theorem is that P (R)—the set of all subsets of R—is in a diﬀerent class from R, and there is no reason to stop here. The set of subsets of P (R)—namely P (P (R))—is in yet another class, and this process continues indeﬁnitely. Having divided the universe of sets into disjoint groups, it would be convenient to attach a “number” to each collection which could be used the way natural numbers are used to refer to the sizes of ﬁnite sets. Given a set X, there exists something called the cardinal number of X, denoted card X, which behaves very much in this fashion. For instance, two sets X and Y satisfy card X = card Y if and only if X ∼ Y . (Rigorously deﬁning card X requires some signiﬁcant set theory. One way this is done is to deﬁne card X to be a very particular set that can always be uniquely found in the same equivalence class as X.) Looking back at Cantor’s Theorem, we get the strong sense that there is an order on the sizes of inﬁnite sets that should be reﬂected in our new cardinal number system. Speciﬁcally, if it is possible to map a set X into Y in a 1–1 fashion, then we want card X ≤ card Y . Writing the strict inequality card X < card Y should indicate that it is possible to map X into Y but that it is impossible to show X ∼ Y . Restated in this notation, Cantor’s Theorem states that for every set A, card A < card P (A). There are some signiﬁcant details to work out. 
A kind of metaphysical problem arises when we realize that an implication of Cantor’s Theorem is that there can be no “largest” set. A declaration such as, “Let U be the set of all possible things,” is paradoxical because we immediately get that card U < card P(U) and thus the set U does not contain everything it was advertised to hold. Issues such as this one are ultimately resolved by imposing some restrictions on what can qualify as a set. As set theory was formalized, the axioms had to be crafted so that objects such as U are simply not allowed. A more down-to-earth problem in need of attention is demonstrating that our definition of “≤” between cardinal numbers really is an ordering. This involves showing that cardinal numbers possess a property analogous to real numbers, which states that if card X ≤ card Y and card Y ≤ card X, then card X = card Y. In the end, this boils down to proving that if there exists f : X → Y that is 1–1, and if there exists g : Y → X that is 1–1, then it is possible to find a function h : X → Y that is both 1–1 and onto. A proof of this fact eluded Cantor but was eventually supplied independently by Ernst Schröder (in 1896) and Felix Bernstein (in 1898). An argument for the Schröder–Bernstein Theorem is outlined in Exercise 1.4.13.


There was another deep problem stemming from the budding theory of cardinal numbers that occupied Cantor and which was not resolved during his lifetime. Because of the importance of countable sets, the symbol ℵ0 (“aleph naught”) is frequently used for card N. The subscript “0” is appropriate when we remember that countable sets are the smallest type of infinite set. In terms of cardinal numbers, if card X < ℵ0, then X is finite. Thus, ℵ0 is the smallest infinite cardinal number. The cardinality of R is also significant enough to deserve the special designation c = card R = card(0, 1). The content of Theorems 1.4.11 and 1.5.1 is that ℵ0 < c. The question that plagued Cantor was whether there were any cardinal numbers strictly in between these two. Put another way, does there exist a set A ⊆ R with card N < card A < card R? Cantor was of the opinion that no such set existed. In the ordering of cardinal numbers, he conjectured, c was the immediate successor of ℵ0. Cantor’s “continuum hypothesis,” as it came to be called, was one of the most famous mathematical challenges of the past century. Its unexpected resolution came in two parts. In 1940, the German logician and mathematician Kurt Gödel demonstrated that, using only the agreed-upon set of axioms of set theory, there was no way to disprove the continuum hypothesis. In 1963, Paul Cohen successfully showed that, under the same rules, it was also impossible to prove this conjecture. Taken together, what these two discoveries imply is that the continuum hypothesis is undecidable. It can be accepted or rejected as a statement about the nature of infinite sets, and in neither case will any logical contradictions arise. The mention of Kurt Gödel brings to mind a final comment about the significance of Cantor’s work. Gödel is best known for his “Incompleteness Theorems,” which pertain to the strength of axiomatic systems in general.
What Gödel showed was that any consistent axiomatic system created to study arithmetic was necessarily destined to be “incomplete” in the sense that there would always be true statements that the system of axioms would be too weak to prove. At the heart of Gödel’s very complicated proof is a type of manipulation closely related to what is happening in the proofs of Theorems 1.5.1 and 1.5.2. Variations of Cantor’s proof methods can also be found in the limitative results of computer science. The “halting problem” asks, loosely, whether some general algorithm exists that can look at every program and decide if that program eventually terminates. The proof that no such algorithm exists uses a diagonalization-type construction at the core of the argument. The main point to make is that not only are the implications of Cantor’s theorems profound but the argumentative techniques are as well. As a more immediate example of this phenomenon, the diagonalization method is used again in Chapter 6, in a constructive way, as a crucial step in the proof of the Arzelà–Ascoli Theorem.

Chapter 2

Sequences and Series

2.1

Discussion: Rearrangements of Inﬁnite Series

Consider the infinite series
\[
\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \frac{1}{7} - \frac{1}{8} + \cdots .
\]

If we naively begin adding from the left-hand side, we get a sequence of what are called partial sums. In other words, let $s_n$ equal the sum of the first $n$ terms of the series, so that $s_1 = 1$, $s_2 = 1/2$, $s_3 = 5/6$, $s_4 = 7/12$, and so on. One immediate observation is that the successive sums oscillate in a progressively narrower space. The odd sums decrease ($s_1 > s_3 > s_5 > \cdots$) while the even sums increase ($s_2 < s_4 < s_6 < \cdots$).

[Figure: the partial sums plotted on the number line between 0 and 1, with $s_2 < s_4 < s_6 < \cdots < S < \cdots < s_5 < s_3 < s_1$ and $S \approx .69$.]

It seems reasonable, and we will soon prove, that the sequence $(s_n)$ eventually homes in on a value, call it $S$, where the odd and even partial sums “meet”. At this moment, we cannot compute $S$ precisely, but we know it falls somewhere between 7/12 and 5/6. Summing a few hundred terms reveals that $S \approx .69$. Whatever its value, there is now an overwhelming temptation to write
\[
S = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \frac{1}{7} - \frac{1}{8} + \cdots \tag{1}
\]

meaning, perhaps, that if we could indeed add up all infinitely many of these numbers, then the sum would equal $S$. A more familiar example of an equation of this type might be
\[
2 = 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \frac{1}{32} + \frac{1}{64} + \cdots ,
\]
the only difference being that in the second equation we have a more recognizable value for the sum.

But now for the crux of the matter. The symbols $+$, $-$, and $=$ in the preceding equations are deceptively familiar notions being used in a very unfamiliar way. The crucial question is whether or not properties of addition and equality that are well understood for finite sums remain valid when applied to infinite objects such as equation (1). The answer, as we are about to witness, is somewhat ambiguous.

Treating equation (1) in a standard algebraic way, let’s multiply through by 1/2 and add it back to equation (1):
\begin{align*}
\tfrac{1}{2}S &= \tfrac{1}{2} - \tfrac{1}{4} + \tfrac{1}{6} - \tfrac{1}{8} + \tfrac{1}{10} - \tfrac{1}{12} + \cdots \\
+\; S &= 1 - \tfrac{1}{2} + \tfrac{1}{3} - \tfrac{1}{4} + \tfrac{1}{5} - \tfrac{1}{6} + \tfrac{1}{7} - \tfrac{1}{8} + \tfrac{1}{9} - \tfrac{1}{10} + \tfrac{1}{11} - \tfrac{1}{12} + \tfrac{1}{13} - \cdots \\
\tfrac{3}{2}S &= 1 + \tfrac{1}{3} - \tfrac{1}{2} + \tfrac{1}{5} + \tfrac{1}{7} - \tfrac{1}{4} + \tfrac{1}{9} + \tfrac{1}{11} - \tfrac{1}{6} + \tfrac{1}{13} + \cdots \tag{2}
\end{align*}
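A quick numerical experiment can make equations (1) and (2) more concrete. The following is only a sketch in Python; the function names are illustrative and not from the text, and a finite partial sum can suggest, but never prove, what an infinite series does. It adds the first 300 terms of series (1) and of the reordered series in (2).

```python
from itertools import count, islice

def alternating_harmonic():
    """Yield the terms of series (1): 1, -1/2, 1/3, -1/4, ..."""
    for n in count(1):
        yield (-1) ** (n + 1) / n

def two_positives_one_negative():
    """Yield the terms of series (2): 1, 1/3, -1/2, 1/5, 1/7, -1/4, ..."""
    for k in count(1):
        yield 1 / (4 * k - 3)   # next unused odd denominator
        yield 1 / (4 * k - 1)   # the odd denominator after that
        yield -1 / (2 * k)      # next unused even denominator

def partial_sum(terms, n):
    """Add the first n terms produced by the generator `terms`."""
    return sum(islice(terms, n))

s_original = partial_sum(alternating_harmonic(), 300)
s_rearranged = partial_sum(two_positives_one_negative(), 300)
print(f"series (1), 300 terms: {s_original:.4f}")
print(f"series (2), 300 terms: {s_rearranged:.4f}")
print(f"ratio of the two:      {s_rearranged / s_original:.4f}")
```

Both generators produce exactly the same numbers, only in a different order, so any difference between the two printed partial sums is due to ordering alone.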

Now, look carefully at the result in equation (2). The sum in equation (2) consists precisely of the same terms as those in the original equation (1), only in a different order. Specifically, the series in (2) is a rearrangement of (1) where we list the first two positive terms ($1 + \frac{1}{3}$) followed by the first negative term ($-\frac{1}{2}$), followed by the next two positive terms ($\frac{1}{5} + \frac{1}{7}$) and then the next negative term ($-\frac{1}{4}$). Continuing this, it is apparent that every term in (2) appears in (1) and vice versa. The rub comes when we realize that equation (2) asserts that the sum of these rearranged, but otherwise unaltered, numbers is equal to 3/2 its original value. Indeed, adding a few hundred terms of equation (2) produces partial sums in the neighborhood of 1.03. Addition, in this infinite setting, is not commutative!

Let’s look at a similar rearrangement of the series
\[
\sum_{n=0}^{\infty} (-1/2)^n .
\]
This series is geometric with first term 1 and common ratio $r = -1/2$. Using the formula $1/(1 - r)$ for the sum of a geometric series (Example 2.7.5), we get
\[
1 - \frac{1}{2} + \frac{1}{4} - \frac{1}{8} + \frac{1}{16} - \frac{1}{32} + \frac{1}{64} - \frac{1}{128} + \frac{1}{256} - \cdots = \frac{1}{1 - (-\frac{1}{2})} = \frac{2}{3} .
\]
This time, some computational experimentation with the “two positives, one negative” rearrangement
\[
1 + \frac{1}{4} - \frac{1}{2} + \frac{1}{16} + \frac{1}{64} - \frac{1}{8} + \frac{1}{256} + \frac{1}{1024} - \frac{1}{32} + \cdots
\]


yields partial sums quite close to 2/3. The sum of the first 30 terms, for instance, equals .666667. Infinite addition is commutative in some instances but not in others.

Far from being a charming theoretical oddity of infinite series, this phenomenon can be the source of great consternation in many applied situations. How, for instance, should a double summation over two index variables be defined? Let’s say we are given a grid of real numbers $\{a_{ij} : i, j \in \mathbf{N}\}$, where $a_{ij} = 1/2^{j-i}$ if $j > i$, $a_{ij} = -1$ if $j = i$, and $a_{ij} = 0$ if $j < i$.
\[
\begin{pmatrix}
-1 & \frac{1}{2} & \frac{1}{4} & \frac{1}{8} & \frac{1}{16} & \cdots \\
0 & -1 & \frac{1}{2} & \frac{1}{4} & \frac{1}{8} & \cdots \\
0 & 0 & -1 & \frac{1}{2} & \frac{1}{4} & \cdots \\
0 & 0 & 0 & -1 & \frac{1}{2} & \cdots \\
0 & 0 & 0 & 0 & -1 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}
\]
We would like to attach a mathematical meaning to the summation
\[
\sum_{i,j=1}^{\infty} a_{ij}
\]
whereby we intend to include every term in the preceding array in the total. One natural idea is to temporarily fix $i$ and sum across each row. A moment’s reflection (and a fact about geometric series) shows that each row sums to 0. Summing the sums of the rows, we get
\[
\sum_{i,j=1}^{\infty} a_{ij} = \sum_{i=1}^{\infty} \left( \sum_{j=1}^{\infty} a_{ij} \right) = \sum_{i=1}^{\infty} (0) = 0 .
\]

We could just as easily have decided to fix $j$ and sum down each column first. In this case, we have
\[
\sum_{i,j=1}^{\infty} a_{ij} = \sum_{j=1}^{\infty} \left( \sum_{i=1}^{\infty} a_{ij} \right) = \sum_{j=1}^{\infty} \left( \frac{-1}{2^{j-1}} \right) = -2 .
\]
Changing the order of the summation changes the value of the sum.

One common way that double sums arise (although not this particular one) is from the multiplication of two series. There is a natural desire to write
\[
\left( \sum a_i \right) \left( \sum b_j \right) = \sum_{i,j} a_i b_j ,
\]
except that the expression on the right-hand side makes no sense at the moment.

It is the pathologies that give rise to the need for rigor. A satisfying resolution to the questions raised will require that we be absolutely precise about what we mean as we manipulate these infinite objects. It may seem that progress is slow at first, but that is because we do not want to fall into the trap of letting the biases of our intuition corrupt our arguments. Rigorous proofs are meant to be a check on intuition, and in the end we will see that they actually improve our mental picture of the mathematical infinite.

As a final example, consider something as intuitively fundamental as the associative property of addition applied to the series $\sum_{n=1}^{\infty} (-1)^n$. Grouping the terms one way gives
\[
(-1 + 1) + (-1 + 1) + (-1 + 1) + (-1 + 1) + \cdots = 0 + 0 + 0 + 0 + \cdots = 0 ,
\]
whereas grouping in another yields
\[
-1 + (1 - 1) + (1 - 1) + (1 - 1) + (1 - 1) + \cdots = -1 + 0 + 0 + 0 + 0 + \cdots = -1 .
\]
Manipulations that are legitimate in finite settings do not always extend to infinite settings. Deciding when they do and why they do not is one of the central themes of analysis.
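The two iterated sums of the grid $\{a_{ij}\}$ from this discussion can also be explored numerically. The sketch below is in Python with exact rational arithmetic; the names `rows_first` and `columns_first` and the truncation depths are assumptions made for illustration. One caveat matters: a finite square block of the grid has the same total in either order, so the discrepancy appears only when the inner sum is carried much further than the outer one, imitating "finish the inner infinite sum first."

```python
from fractions import Fraction

def a(i, j):
    """Entry a_ij of the grid defined in the text (indices start at 1)."""
    if j > i:
        return Fraction(1, 2 ** (j - i))
    if j == i:
        return Fraction(-1)
    return Fraction(0)

def rows_first(outer, inner):
    """Add rows i = 1..outer, each row truncated after column `inner`."""
    return sum(sum(a(i, j) for j in range(1, inner + 1))
               for i in range(1, outer + 1))

def columns_first(outer, inner):
    """Add columns j = 1..outer, each column truncated after row `inner`."""
    return sum(sum(a(i, j) for i in range(1, inner + 1))
               for j in range(1, outer + 1))

# The inner sum is taken to depth 200, the outer sum only to depth 20.
print(float(rows_first(outer=20, inner=200)))     # very close to 0
print(float(columns_first(outer=20, inner=200)))  # very close to -2
```

Using `Fraction` avoids any suspicion that the gap between the two answers is a floating-point artifact.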

2.2

The Limit of a Sequence

An understanding of infinite series depends heavily on a clear understanding of the theory of sequences. In fact, most of the concepts in analysis can be reduced to statements about the behavior of sequences. Thus, we will spend a significant amount of time investigating sequences before taking on infinite series.

Definition 2.2.1. A sequence is a function whose domain is $\mathbf{N}$.

This formal definition leads immediately to the familiar depiction of a sequence as an ordered list of real numbers. Given a function $f : \mathbf{N} \to \mathbf{R}$, $f(n)$ is just the $n$th term on the list. The notation for sequences reinforces this familiar understanding.

Example 2.2.2. Each of the following is a common way to describe a sequence.
(i) $(1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \cdots)$,
(ii) $\left( \frac{1+n}{n} \right)_{n=1}^{\infty} = \left( \frac{2}{1}, \frac{3}{2}, \frac{4}{3}, \cdots \right)$,
(iii) $(a_n)$, where $a_n = 2^n$ for each $n \in \mathbf{N}$,
(iv) $(x_n)$, where $x_1 = 2$ and $x_{n+1} = \frac{x_n + 1}{2}$.

On occasion, it will be more convenient to index a sequence beginning with $n = 0$ or $n = n_0$ for some natural number $n_0$ different from 1. These minor variations should cause no confusion. What is essential is that a sequence be an infinite list of real numbers. What happens at the beginning of such a list is of little importance in most cases. The business of analysis is concerned with the behavior of the infinite “tail” of a given sequence. We now present what is arguably the most important definition in the book.

Definition 2.2.3 (Convergence of a Sequence). A sequence $(a_n)$ converges to a real number $a$ if, for every positive number $\epsilon$, there exists an $N \in \mathbf{N}$ such that whenever $n \geq N$ it follows that $|a_n - a| < \epsilon$.

To indicate that $(a_n)$ converges to $a$, we write either $\lim a_n = a$ or $(a_n) \to a$. In an effort to decipher this complicated definition, it helps first to consider the ending phrase “$|a_n - a| < \epsilon$,” and think about the points that satisfy an inequality of this type.

Definition 2.2.4. Given a real number $a \in \mathbf{R}$ and a positive number $\epsilon > 0$, the set
\[
V_\epsilon(a) = \{ x \in \mathbf{R} : |x - a| < \epsilon \}
\]
is called the $\epsilon$-neighborhood of $a$.

Notice that $V_\epsilon(a)$ consists of all of those points whose distance from $a$ is less than $\epsilon$. Said another way, $V_\epsilon(a)$ is an interval, centered at $a$, with radius $\epsilon$.

[Figure: the interval $V_\epsilon(a)$ on the number line, centered at $a$ with endpoints $a - \epsilon$ and $a + \epsilon$.]

Recasting the definition of convergence in terms of $\epsilon$-neighborhoods gives a more geometric impression of what is being described.

Definition 2.2.3B (Convergence of a Sequence: Topological Version). A sequence $(a_n)$ converges to $a$ if, given any $\epsilon$-neighborhood $V_\epsilon(a)$ of $a$, there exists a point in the sequence after which all of the terms are in $V_\epsilon(a)$. In other words, every $\epsilon$-neighborhood contains all but a finite number of the terms of $(a_n)$.

[Figure: the terms $a_1, a_2, a_3, \ldots$ on the number line; from the term $a_N$ onward, every term lies inside $V_\epsilon(a) = (a - \epsilon, a + \epsilon)$.]

Definition 2.2.3 and Definition 2.2.3B say precisely the same thing; the natural number $N$ in the original version of the definition is the point where the sequence $(a_n)$ enters $V_\epsilon(a)$, never to leave. It should be apparent that the value of $N$ depends on the choice of $\epsilon$. The smaller the $\epsilon$-neighborhood, the larger $N$ may have to be.


Example 2.2.5. Consider the sequence $(a_n)$, where $a_n = 1/\sqrt{n}$. Our intuitive understanding of limits points confidently to the conclusion that
\[
\lim \frac{1}{\sqrt{n}} = 0 .
\]
Before trying to prove this not too impressive fact, let’s first explore the relationship between $\epsilon$ and $N$ in the definition of convergence. For the moment, take $\epsilon$ to be 1/10. This defines a sort of “target zone” for the terms in the sequence. By claiming that the limit of $(a_n)$ is 0, we are saying that the terms in this sequence eventually get arbitrarily close to 0. How close? What do we mean by “eventually”? We have set $\epsilon = 1/10$ as our standard for closeness, which leads to the $\epsilon$-neighborhood $(-1/10, 1/10)$ centered around the limit 0. How far out into the sequence must we look before the terms fall into this interval? The 100th term $a_{100} = 1/10$ puts us right on the boundary, and a little thought reveals that
\[
\text{if } n > 100, \text{ then } a_n \in \left( -\frac{1}{10}, \frac{1}{10} \right) .
\]
Thus, for $\epsilon = 1/10$ we choose $N = 101$ (or anything larger) as our response.

Now, our choice of $\epsilon = 1/10$ was rather whimsical, and we can do this again, letting $\epsilon = 1/50$. In this case, our target neighborhood shrinks to $(-1/50, 1/50)$, and it is apparent that we must travel farther out into the sequence before $a_n$ falls into this interval. How far? Essentially, we require that
\[
\frac{1}{\sqrt{n}} < \frac{1}{50} \quad \text{which occurs as long as} \quad n > 50^2 = 2500 .
\]
Thus, $N = 2501$ is a suitable response to the challenge of $\epsilon = 1/50$.

It may seem as though this duel could continue forever, with different $\epsilon$ challenges being handed to us one after another, each one requiring a suitable value of $N$ in response. In a sense, this is correct, except that the game is effectively over the instant we recognize a rule for how to choose $N$ given an arbitrary $\epsilon > 0$. For this problem, the desired algorithm is implicit in the algebra carried out to compute the previous response of $N = 2501$. Whatever $\epsilon$ happens to be, we want
\[
\frac{1}{\sqrt{n}} < \epsilon \quad \text{which is equivalent to insisting that} \quad n > \frac{1}{\epsilon^2} .
\]
With this observation, we are ready to write the formal argument. We claim that
\[
\lim \left( \frac{1}{\sqrt{n}} \right) = 0 .
\]
Proof. Let $\epsilon > 0$ be an arbitrary positive number. Choose a natural number $N$ satisfying
\[
N > \frac{1}{\epsilon^2} .
\]
We now verify that this choice of $N$ has the desired property. Let $n \geq N$. Then,
\[
n > \frac{1}{\epsilon^2} \quad \text{implies} \quad \frac{1}{\sqrt{n}} < \epsilon \quad \text{and hence} \quad |a_n - 0| < \epsilon .
\]
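The rule for choosing $N$ in this proof can be written out as a small program. The following Python sketch is illustrative only: the names `respond` and `holds_on_window` are not from the text, $\epsilon$ is passed as an exact `Fraction` so that $1/\epsilon^2$ is not distorted by rounding, and the check inspects only finitely many terms, so it is a sanity check rather than a substitute for the proof.

```python
import math
from fractions import Fraction

def a(n):
    """n-th term of the sequence a_n = 1/sqrt(n)."""
    return 1 / math.sqrt(n)

def respond(eps):
    """Smallest natural number N with N > 1/eps**2 (eps is a Fraction)."""
    return math.floor(1 / eps ** 2) + 1

def holds_on_window(eps, N, width=10_000):
    """Spot-check |a_n - 0| < eps for N <= n < N + width (finitely many n)."""
    return all(abs(a(n) - 0) < eps for n in range(N, N + width))

for eps in (Fraction(1, 10), Fraction(1, 50), Fraction(1, 1000)):
    N = respond(eps)
    print(f"eps = {eps}: N = {N}, check passed: {holds_on_window(eps, N)}")
```

For the two challenges worked out in the example, `respond` returns the same responses as the text, namely 101 and 2501.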

Quantifiers

The definition of convergence given earlier is the result of hundreds of years of refining the intuitive notion of limit into a mathematically rigorous statement. The logic involved is complicated and is intimately tied to the use of the quantifiers “for all” and “there exists.” Learning to write a grammatically correct convergence proof goes hand in hand with a deep understanding of why the quantifiers appear in the order that they do.

The definition begins with the phrase, “For all $\epsilon$, there exists $N \in \mathbf{N}$ such that ...” Looking back at our first example, we see that our formal proof begins with, “Let $\epsilon > 0$ be an arbitrary positive number.” This is followed by a construction of $N$ and then a demonstration that this choice of $N$ has the desired property. This, in fact, is a basic outline for how every convergence proof should be presented.

Template for a proof that $(x_n) \to x$:
- “Let $\epsilon > 0$ be arbitrary.”
- Demonstrate a choice for $N \in \mathbf{N}$. This step usually requires the most work, almost all of which is done prior to actually writing the formal proof.
- Now, show that $N$ actually works.
- “Assume $n \geq N$.”
- With $N$ well chosen, it should be possible to derive the inequality $|x_n - x| < \epsilon$.

Example 2.2.6. Show
\[
\lim \left( \frac{n+1}{n} \right) = 1 .
\]
As mentioned, before attempting a formal proof, we first need to do some preliminary scratch work. In the first example, we experimented by assigning specific values to $\epsilon$ (and it is not a bad idea to do this again), but let us skip straight to the algebraic punch line. The last line of our proof should be that for suitably large values of $n$,
\[
\left| \frac{n+1}{n} - 1 \right| < \epsilon .
\]
Because
\[
\left| \frac{n+1}{n} - 1 \right| = \frac{1}{n} ,
\]
this is equivalent to the inequality $1/n < \epsilon$ or $n > 1/\epsilon$. Thus, choosing $N$ to be an integer greater than $1/\epsilon$ will suffice.

With the work of the proof done, all that remains is the formal writeup.

Proof. Let $\epsilon > 0$ be arbitrary. Choose $N \in \mathbf{N}$ with $N > 1/\epsilon$. To verify that this choice of $N$ is appropriate, let $n \in \mathbf{N}$ satisfy $n \geq N$. Then, $n \geq N$ implies $n > 1/\epsilon$, which is the same as saying $1/n < \epsilon$. Finally, this means
\[
\left| \frac{n+1}{n} - 1 \right| < \epsilon ,
\]
as desired.

Divergence

Significant insight into the role of the quantifiers in the definition of convergence can be gained by studying an example of a sequence that does not have a limit.

Example 2.2.7. Consider the sequence
\[
\left( 1, -\frac{1}{2}, \frac{1}{3}, -\frac{1}{4}, \frac{1}{5}, -\frac{1}{5}, \frac{1}{5}, -\frac{1}{5}, \frac{1}{5}, -\frac{1}{5}, \frac{1}{5}, -\frac{1}{5}, \frac{1}{5}, -\frac{1}{5}, \cdots \right) .
\]
How can we argue that this sequence does not converge to zero? Looking at the first few terms, it seems the initial evidence actually supports such a conclusion. Given a challenge of $\epsilon = 1/2$, a little reflection reveals that after $N = 3$ all the terms fall into the neighborhood $(-1/2, 1/2)$. We could also handle $\epsilon = 1/4$. (What is the smallest possible $N$ in this case?) But the definition of convergence says “For all $\epsilon > 0$...,” and it should be apparent that there is no response to a choice of $\epsilon = 1/10$, for instance.

This leads us to an important observation about the logical negation of the definition of convergence of a sequence. To prove that a particular number $x$ is not the limit of a sequence $(x_n)$, we must produce a single value of $\epsilon$ for which no $N \in \mathbf{N}$ works. More generally speaking, the negation of a statement that begins “For all P, there exists Q...” is the statement, “For at least one P, no Q is possible...” For instance, how could we disprove the spurious claim that “At every college in the United States, there is a student who is at least seven feet tall”?

We have argued that the preceding sequence does not converge to 0. Let’s argue against the claim that it converges to 1/5. Choosing $\epsilon = 1/10$ produces the neighborhood $(1/10, 3/10)$. Although the sequence continually revisits this neighborhood, there is no point at which it enters and never leaves as the definition requires. Thus, no $N$ exists for $\epsilon = 1/10$, so the sequence does not converge to 1/5. Of course, this sequence does not converge to any other real number, and it would be more satisfying to simply say that this sequence does not converge.
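The negated definition can be probed experimentally for the sequence of Example 2.2.7. The Python sketch below is hypothetical: the closed form used for the terms and the helper `escape_after` are assumptions introduced here. For a proposed limit, a fixed $\epsilon$, and a candidate $N$, it searches for an index $n \geq N$ whose term lies outside the $\epsilon$-neighborhood. Finding such an index for every candidate $N$ tried is evidence, though not a proof, that no $N$ works.

```python
from fractions import Fraction

def a(n):
    """n-th term (n >= 1) of Example 2.2.7: 1, -1/2, 1/3, -1/4, 1/5, -1/5, ..."""
    sign = 1 if n % 2 == 1 else -1
    return sign * Fraction(1, min(n, 5))

def escape_after(limit, eps, N, search=1000):
    """Return an index n >= N with |a_n - limit| >= eps, or None if none
    is found among the next `search` indices."""
    for n in range(N, N + search):
        if abs(a(n) - limit) >= eps:
            return n
    return None

eps = Fraction(1, 10)
for N in (1, 10, 100, 1000):
    print(N, escape_after(0, eps, N), escape_after(Fraction(1, 5), eps, N))

# By contrast, the challenge eps = 1/2 with proposed limit 0 is met by N = 3.
print(escape_after(0, Fraction(1, 2), 3))
```

The final call returning `None` only reports that no escaping term was found in the searched window, which is consistent with the text's remark that $N = 3$ handles $\epsilon = 1/2$.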


Deﬁnition 2.2.8. A sequence that does not converge is said to diverge. Although it is not too diﬃcult, we will postpone arguing for divergence in general until we develop a more economical divergence criterion later in Section 2.5.

Exercises

Exercise 2.2.1. Verify, using the definition of convergence of a sequence, that the following sequences converge to the proposed limit.
(a) $\lim \frac{1}{6n^2+1} = 0$.
(b) $\lim \frac{3n+1}{2n+5} = \frac{3}{2}$.
(c) $\lim \frac{2}{\sqrt{n+3}} = 0$.

Exercise 2.2.2. What happens if we reverse the order of the quantifiers in Definition 2.2.3?
Definition: A sequence $(x_n)$ verconges to $x$ if there exists an $\epsilon > 0$ such that for all $N \in \mathbf{N}$ it is true that $n \geq N$ implies $|x_n - x| < \epsilon$.
Give an example of a vercongent sequence. Can you give an example of a vercongent sequence that is divergent? What exactly is being described in this strange definition?

Exercise 2.2.3. Describe what we would have to demonstrate in order to disprove each of the following statements.
(a) At every college in the United States, there is a student who is at least seven feet tall.
(b) For all colleges in the United States, there exists a professor who gives every student a grade of either A or B.
(c) There exists a college in the United States where every student is at least six feet tall.

Exercise 2.2.4. Argue that the sequence
1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, (5 zeros), 1, . . .
does not converge to zero. For what values of $\epsilon > 0$ does there exist a response $N$? For which values of $\epsilon > 0$ is there no suitable response?

Exercise 2.2.5. Let $[[x]]$ be the greatest integer less than or equal to $x$. For example, $[[\pi]] = 3$ and $[[3]] = 3$. Find $\lim a_n$ and supply proofs for each conclusion if
(a) $a_n = [[1/n]]$,
(b) $a_n = [[(10 + n)/2n]]$.
Reflecting on these examples, comment on the statement following Definition 2.2.3 that “the smaller the $\epsilon$-neighborhood, the larger $N$ may have to be.”

Exercise 2.2.6. Suppose that for a particular $\epsilon > 0$ we have found a suitable value of $N$ that “works” for a given sequence in the sense of Definition 2.2.3.
(a) Then, any larger/smaller (pick one) $N$ will also work for the same $\epsilon > 0$.
(b) Then, this same $N$ will also work for any larger/smaller value of $\epsilon$.


Exercise 2.2.7. Informally speaking, the sequence $\sqrt{n}$ “converges to infinity.”
(a) Imitate the logical structure of Definition 2.2.3 to create a rigorous definition for the mathematical statement $\lim x_n = \infty$. Use this definition to prove $\lim \sqrt{n} = \infty$.
(b) What does your definition in (a) say about the particular sequence (1, 0, 2, 0, 3, 0, 4, 0, 5, 0, . . . )?

Exercise 2.2.8. Here are two useful definitions:
(i) A sequence $(a_n)$ is eventually in a set $A \subseteq \mathbf{R}$ if there exists an $N \in \mathbf{N}$ such that $a_n \in A$ for all $n \geq N$.
(ii) A sequence $(a_n)$ is frequently in a set $A \subseteq \mathbf{R}$ if, for every $N \in \mathbf{N}$, there exists an $n \geq N$ such that $a_n \in A$.
(a) Is the sequence $(-1)^n$ eventually or frequently in the set {1}?
(b) Which definition is stronger? Does frequently imply eventually or does eventually imply frequently?
(c) Give an alternate rephrasing of Definition 2.2.3B using either frequently or eventually. Which is the term we want?
(d) Suppose an infinite number of terms of a sequence $(x_n)$ are equal to 2. Is $(x_n)$ necessarily eventually in the interval (1.9, 2.1)? Is it frequently in (1.9, 2.1)?

2.3

The Algebraic and Order Limit Theorems

The real purpose of creating a rigorous definition for convergence of a sequence is not to have a tool to verify computational statements such as $\lim 2n/(n+2) = 2$. Historically, a definition of the limit like Definition 2.2.3 came 150 years after the founders of calculus began working with intuitive notions of convergence. The point of having such a logically tight description of convergence is so that we can confidently state and prove statements about convergent sequences in general. We are ultimately trying to resolve arguments about what is and is not true regarding the behavior of limits with respect to the mathematical manipulations we intend to inflict on them.

As a first example, let us prove that convergent sequences are bounded. The term “bounded” has a rather familiar connotation but, like everything else, we need to be explicit about what it means in this context.

Definition 2.3.1. A sequence $(x_n)$ is bounded if there exists a number $M > 0$ such that $|x_n| \leq M$ for all $n \in \mathbf{N}$.

Geometrically, this means that we can find an interval $[-M, M]$ that contains every term in the sequence $(x_n)$.

Theorem 2.3.2. Every convergent sequence is bounded.


Proof. Assume $(x_n)$ converges to a limit $l$. This means that given a particular value of $\epsilon$, say $\epsilon = 1$, we know there must exist an $N \in \mathbf{N}$ such that if $n \geq N$, then $x_n$ is in the interval $(l - 1, l + 1)$. Not knowing whether $l$ is positive or negative, we can certainly conclude that
\[
|x_n| < |l| + 1
\]
for all $n \geq N$.

[Figure: the terms $x_n$ with $n \geq N$ lie inside the interval $(l - 1, l + 1)$; the finitely many earlier terms $x_1, \ldots, x_5$ may lie outside it, and $M$ marks a bound for all of them.]

We still need to worry (slightly) about the terms in the sequence that come before the $N$th term. Because there are only a finite number of these, we let
\[
M = \max\{ |x_1|, |x_2|, |x_3|, \ldots, |x_{N-1}|, |l| + 1 \} .
\]
It follows that $|x_n| \leq M$ for all $n \in \mathbf{N}$, as desired.

This chapter began with a demonstration of how applying familiar algebraic properties (commutativity of addition) to infinite objects (series) can lead to paradoxical results. These examples are meant to instill in us a sense of caution and justify the extreme care we are taking in drawing our conclusions. The following theorems illustrate that sequences behave extremely well with respect to the operations of addition, multiplication, division, and order.

Theorem 2.3.3 (Algebraic Limit Theorem). Let $\lim a_n = a$, and $\lim b_n = b$. Then,
(i) $\lim(c a_n) = ca$, for all $c \in \mathbf{R}$;
(ii) $\lim(a_n + b_n) = a + b$;
(iii) $\lim(a_n b_n) = ab$;
(iv) $\lim(a_n / b_n) = a/b$, provided $b \neq 0$.

Proof. (i) Consider the case where $c \neq 0$. We want to show that the sequence $(c a_n)$ converges to $ca$, so the structure of the proof follows the template we described in Section 2.2. First, we let $\epsilon$ be some arbitrary positive number. Our goal is to find some point in the sequence $(c a_n)$ after which we have
\[
|c a_n - ca| < \epsilon .
\]
Now,
\[
|c a_n - ca| = |c| |a_n - a| .
\]


We are given that $(a_n) \to a$, so we know we can make $|a_n - a|$ as small as we like. In particular, we can choose an $N$ such that
\[
|a_n - a| < \frac{\epsilon}{|c|}
\]
whenever $n \geq N$. To see that this $N$ indeed works, observe that, for all $n \geq N$,
\[
|c a_n - ca| = |c| |a_n - a| < |c| \frac{\epsilon}{|c|} = \epsilon .
\]
The case $c = 0$ reduces to showing that the constant sequence (0, 0, 0, . . . ) converges to 0. This is addressed in Exercise 2.3.1.

Before continuing with parts (ii), (iii), and (iv), we should point out that the proof of (i), while somewhat short, is extremely typical for a convergence proof. Before embarking on a formal argument, it is a good idea to take an inventory of what we want to make less than $\epsilon$, and what we are given can be made small for suitable choices of $n$. For the previous proof, we wanted to make $|c a_n - ca| < \epsilon$, and we were given $|a_n - a| <$ anything we like (for large values of $n$). Notice that in (i), and all of the ensuing arguments, the strategy each time is to bound the quantity we want to be less than $\epsilon$, which in each case is
\[
|(\text{terms of sequence}) - (\text{proposed limit})| ,
\]
with some algebraic combination of quantities over which we have control.

(ii) To prove this statement, we need to argue that the quantity
\[
|(a_n + b_n) - (a + b)|
\]
can be made less than an arbitrary $\epsilon$ using the assumptions that $|a_n - a|$ and $|b_n - b|$ can be made as small as we like for large $n$. The first step is to use the triangle inequality (Example 1.2.5) to say
\[
|(a_n + b_n) - (a + b)| = |(a_n - a) + (b_n - b)| \leq |a_n - a| + |b_n - b| .
\]
Again, we let $\epsilon > 0$ be arbitrary. The technique this time is to divide the $\epsilon$ between the two expressions on the right-hand side in the preceding inequality. Using the hypothesis that $(a_n) \to a$, we know there exists an $N_1$ such that
\[
|a_n - a| < \frac{\epsilon}{2} \quad \text{whenever } n \geq N_1 .
\]
Likewise, the assumption that $(b_n) \to b$ means that we can choose an $N_2$ so that
\[
|b_n - b| < \frac{\epsilon}{2} \quad \text{whenever } n \geq N_2 .
\]
The question now arises as to which of $N_1$ or $N_2$ we should take to be our choice of $N$.
By choosing $N = \max\{N_1, N_2\}$, we ensure that if $n \geq N$, then $n \geq N_1$ and $n \geq N_2$. This allows us to conclude that
\[
|(a_n + b_n) - (a + b)| \leq |a_n - a| + |b_n - b| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon
\]
for all $n \geq N$, as desired.

(iii) To show that $(a_n b_n) \to ab$, we begin by observing that
\begin{align*}
|a_n b_n - ab| &= |a_n b_n - a b_n + a b_n - ab| \\
&\leq |a_n b_n - a b_n| + |a b_n - ab| \\
&= |b_n| |a_n - a| + |a| |b_n - b| .
\end{align*}
In the initial step, we subtracted and then added $a b_n$, which created an opportunity to use the triangle inequality. Essentially, we have broken up the distance from $a_n b_n$ to $ab$ with a midway point and are using the sum of the two distances to overestimate the original distance. This clever trick will become a familiar technique in arguments to come.

Letting $\epsilon > 0$ be arbitrary, we again proceed with the strategy of making each piece in the preceding inequality less than $\epsilon/2$. For the piece on the right-hand side ($|a| |b_n - b|$), if $a \neq 0$ we can choose $N_1$ so that
\[
n \geq N_1 \quad \text{implies} \quad |b_n - b| < \frac{1}{|a|} \frac{\epsilon}{2} .
\]
(The case when $a = 0$ is handled in Exercise 2.3.7.) Getting the term on the left-hand side ($|b_n| |a_n - a|$) to be less than $\epsilon/2$ is complicated by the fact that we have a variable quantity $|b_n|$ to contend with as opposed to the constant $|a|$ we encountered in the right-hand term. The idea is to replace $|b_n|$ with a worst-case estimate. Using the fact that convergent sequences are bounded (Theorem 2.3.2), we know there exists a bound $M > 0$ satisfying $|b_n| \leq M$ for all $n \in \mathbf{N}$. Now, we can choose $N_2$ so that
\[
|a_n - a| < \frac{1}{M} \frac{\epsilon}{2} \quad \text{whenever} \quad n \geq N_2 .
\]
To finish the argument, pick $N = \max\{N_1, N_2\}$, and observe that if $n \geq N$, then
\begin{align*}
|a_n b_n - ab| &\leq |a_n b_n - a b_n| + |a b_n - ab| \\
&= |b_n| |a_n - a| + |a| |b_n - b| \\
&\leq M |a_n - a| + |a| |b_n - b| \\
&< M \left( \frac{\epsilon}{M 2} \right) + |a| \left( \frac{\epsilon}{|a| 2} \right) = \epsilon .
\end{align*}

(iv) This final statement will follow from (iii) if we can prove that
\[
(b_n) \to b \quad \text{implies} \quad \left( \frac{1}{b_n} \right) \to \frac{1}{b}
\]
whenever $b \neq 0$. We begin by observing that
\[
\left| \frac{1}{b_n} - \frac{1}{b} \right| = \frac{|b - b_n|}{|b| |b_n|} .
\]


Because $(b_n) \to b$, we can make the preceding numerator as small as we like by choosing $n$ large. The problem comes in that we need a worst-case estimate on the size of $1/(|b| |b_n|)$. Because the $b_n$ terms are in the denominator, we are no longer interested in an upper bound on $|b_n|$ but rather in an inequality of the form $|b_n| \geq \delta > 0$. This will then lead to a bound on the size of $1/(|b| |b_n|)$.

The trick is to look far enough out into the sequence $(b_n)$ so that the terms are closer to $b$ than they are to 0. Consider the particular value $\epsilon_0 = |b|/2$. Because $(b_n) \to b$, there exists an $N_1$ such that $|b_n - b| < |b|/2$ for all $n \geq N_1$. This implies $|b_n| > |b|/2$.

Next, choose $N_2$ so that $n \geq N_2$ implies
\[
|b_n - b| < \frac{\epsilon |b|^2}{2} .
\]
Finally, if we let $N = \max\{N_1, N_2\}$, then $n \geq N$ implies
\[
\left| \frac{1}{b_n} - \frac{1}{b} \right| = |b - b_n| \frac{1}{|b| |b_n|} < \frac{\epsilon |b|^2}{2} \cdot \frac{1}{|b| \frac{|b|}{2}} = \epsilon .
\]
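The device of giving each piece a share of $\epsilon$ and then taking $N = \max\{N_1, N_2\}$, used in part (ii) of the proof above, can be illustrated with the two sequences from Examples 2.2.5 and 2.2.6. The Python sketch below is an illustration under stated assumptions, not part of the proof: the response functions `N_a` and `N_b` are copied from those examples, `N_sum` is a hypothetical name, and the final check covers only finitely many indices.

```python
import math
from fractions import Fraction

def a(n):
    """a_n = (n + 1)/n, which converges to 1 (Example 2.2.6)."""
    return (n + 1) / n

def b(n):
    """b_n = 1/sqrt(n), which converges to 0 (Example 2.2.5)."""
    return 1 / math.sqrt(n)

def N_a(eps):
    """A response for (a_n): the smallest natural number N with N > 1/eps."""
    return math.floor(1 / eps) + 1

def N_b(eps):
    """A response for (b_n): the smallest natural number N with N > 1/eps**2."""
    return math.floor(1 / eps ** 2) + 1

def N_sum(eps):
    """A response for (a_n + b_n): give each summand eps/2, then take the max."""
    return max(N_a(eps / 2), N_b(eps / 2))

eps = Fraction(1, 10)
N = N_sum(eps)
ok = all(abs((a(n) + b(n)) - (1 + 0)) < eps for n in range(N, N + 1000))
print(f"eps = {eps}: N1 = {N_a(eps / 2)}, N2 = {N_b(eps / 2)}, N = {N}, check: {ok}")
```

The larger of the two responses is the one that counts, because it is the first index past which both individual estimates hold at once.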

Limits and Order Although there are a few dangers to avoid (see Exercise 2.3.8), the Algebraic Limit Theorem veriﬁes that the relationship between algebraic combinations of sequences and the limiting process is as trouble-free as we could hope for. Limits can be computed from the individual component sequences provided that each component limit exists. The limiting process is also well-behaved with respect to the order operation. Theorem 2.3.4 (Order Limit Theorem). Assume lim an = a and lim bn = b. (i) If an ≥ 0 for all n ∈ N, then a ≥ 0. (ii) If an ≤ bn for all n ∈ N, then a ≤ b. (iii) If there exists c ∈ R for which c ≤ bn for all n ∈ N, then c ≤ b. Similarly, if an ≤ c for all n ∈ N, then a ≤ c. Proof. (i) We will prove this by contradiction; thus, let’s assume a < 0. The idea is to produce a term in the sequence (an ) that is also less than zero. To do this, we consider the particular value 0 = |a|. The deﬁnition of convergence guarantees that we can ﬁnd an N such that |an − a| < |a| for all n ≥ N . In particular, this would mean that |aN − a| < |a|, which implies aN < 0. This contradicts our hypothesis that aN ≥ 0. We therefore conclude that a ≥ 0. aN

[Figure: a number line showing the terms a1, a2, . . . , with aN lying inside the interval (a − ε0, a + ε0), whose right endpoint is a + ε0 = 0.]


(ii) The Algebraic Limit Theorem ensures that the sequence (bn − an ) converges to b − a. Because bn − an ≥ 0, we can apply part (i) to get that b − a ≥ 0. (iii) Take an = c (or bn = c) for all n ∈ N, and apply (ii). A word about the idea of “tails” is in order. Loosely speaking, limits and their properties do not depend at all on what happens at the beginning of the sequence but are strictly determined by what happens when n gets large. Changing the value of the ﬁrst ten—or ten thousand—terms in a particular sequence has no eﬀect on the limit. Theorem 2.3.4, part (i), for instance, assumes that an ≥ 0 for all n ∈ N. However, the hypothesis could be weakened by assuming only that there exists some point N1 where an ≥ 0 for all n ≥ N1 . The theorem remains true, and in fact the same proof is valid with the provision that when N is chosen it be at least as large as N1 . In the language of analysis, when a property (such as non-negativity) is not necessarily true about some ﬁnite number of initial terms but is true for all terms in the sequence after some point N , we say that the sequence eventually has this property. (See Exercise 2.2.8.) Theorem 2.3.4, part (i), could be restated, “Convergent sequences that are eventually nonnegative converge to nonnegative limits.” Parts (ii) and (iii) have similar modiﬁcations, as will many other upcoming results.

Exercises

Exercise 2.3.1. Show that the constant sequence (a, a, a, a, . . . ) converges to a.

Exercise 2.3.2. Let xn ≥ 0 for all n ∈ N. (a) If (xn) → 0, show that (√xn) → 0. (b) If (xn) → x, show that (√xn) → √x.

Exercise 2.3.3 (Squeeze Theorem). Show that if xn ≤ yn ≤ zn for all n ∈ N, and if lim xn = lim zn = l, then lim yn = l as well.

Exercise 2.3.4. Show that limits, if they exist, must be unique. In other words, assume lim an = l1 and lim an = l2, and prove that l1 = l2.

Exercise 2.3.5. Let (xn) and (yn) be given, and define (zn) to be the "shuffled" sequence (x1, y1, x2, y2, x3, y3, . . . , xn, yn, . . . ). Prove that (zn) is convergent if and only if (xn) and (yn) are both convergent with lim xn = lim yn.

Exercise 2.3.6. (a) Show that if (bn) → b, then the sequence of absolute values |bn| converges to |b|. (b) Is the converse of part (a) true? If we know that |bn| → |b|, can we deduce that (bn) → b?

Exercise 2.3.7. (a) Let (an) be a bounded (not necessarily convergent) sequence, and assume lim bn = 0. Show that lim(an bn) = 0. Why are we not allowed to use the Algebraic Limit Theorem to prove this?


(b) Can we conclude anything about the convergence of (an bn) if we assume that (bn) converges to some nonzero limit b? (c) Use (a) to prove Theorem 2.3.3, part (iii), for the case when a = 0.

Exercise 2.3.8. Give an example of each of the following, or state that such a request is impossible by referencing the proper theorem(s): (a) sequences (xn) and (yn), which both diverge, but whose sum (xn + yn) converges; (b) sequences (xn) and (yn), where (xn) converges, (yn) diverges, and (xn + yn) converges; (c) a convergent sequence (bn) with bn ≠ 0 for all n such that (1/bn) diverges; (d) an unbounded sequence (an) and a convergent sequence (bn) with (an − bn) bounded; (e) two sequences (an) and (bn), where (an bn) and (an) converge but (bn) does not.

Exercise 2.3.9. Does Theorem 2.3.4 remain true if all of the inequalities are assumed to be strict? If we assume, for instance, that a convergent sequence (xn) satisfies xn > 0 for all n ∈ N, what may we conclude about the limit?

Exercise 2.3.10. If (an) → 0 and |bn − b| ≤ an, then show that (bn) → b.

Exercise 2.3.11 (Cesaro Means). Show that if (xn) is a convergent sequence, then the sequence given by the averages

\[ y_n = \frac{x_1 + x_2 + \cdots + x_n}{n} \]

also converges to the same limit. Give an example to show that it is possible for the sequence (yn) of averages to converge even if (xn) does not.

Exercise 2.3.12. Consider the doubly indexed array $a_{m,n} = m/(m+n)$. (a) Intuitively speaking, what should $\lim_{m,n\to\infty} a_{m,n}$ represent? Compute the "iterated" limits

\[ \lim_{n\to\infty} \lim_{m\to\infty} a_{m,n} \quad \text{and} \quad \lim_{m\to\infty} \lim_{n\to\infty} a_{m,n}. \]

(b) Formulate a rigorous definition in the style of Definition 2.2.3 for the statement $\lim_{m,n\to\infty} a_{m,n} = l$.

2.4

The Monotone Convergence Theorem and a First Look at Inﬁnite Series

We showed in Theorem 2.3.2 that convergent sequences are bounded. The converse statement is certainly not true. It is not too diﬃcult to produce an example of a bounded sequence that does not converge. On the other hand, if a bounded sequence is monotone, then in fact it does converge.


Deﬁnition 2.4.1. A sequence (an ) is increasing if an ≤ an+1 for all n ∈ N and decreasing if an ≥ an+1 for all n ∈ N. A sequence is monotone if it is either increasing or decreasing. Theorem 2.4.2 (Monotone Convergence Theorem). If a sequence is monotone and bounded, then it converges. Proof. Let (an ) be monotone and bounded. To prove (an ) converges using the deﬁnition of convergence, we are going to need a candidate for the limit. Let’s assume the sequence is increasing (the decreasing case is handled similarly), and consider the set of points {an : n ∈ N}. By assumption, this set is bounded, so we can let s = sup{an : n ∈ N}. It seems reasonable to claim that lim(an ) = s.

[Figure: a number line showing an increasing sequence a1 ≤ a2 ≤ · · · ≤ an ≤ an+1 ≤ · · · whose terms accumulate just below s = sup{an : n ∈ N}.]

To prove this, let ε > 0. Because s is the least upper bound of {an : n ∈ N}, s − ε is not an upper bound, so there exists a point in the sequence aN such that s − ε < aN. Now, the fact that (an) is increasing implies that if n ≥ N, then aN ≤ an. Hence, s − ε < aN ≤ an ≤ s < s + ε, which implies |an − s| < ε, as desired.

The Monotone Convergence Theorem is extremely useful for the study of infinite series, largely because it asserts the convergence of a sequence without explicit mention of the actual limit. This is a good moment to do some preliminary investigations, so it is time to formalize the relationship between sequences and series.

Definition 2.4.3. Let (bn) be a sequence. An infinite series is a formal expression of the form

\[ \sum_{n=1}^{\infty} b_n = b_1 + b_2 + b_3 + b_4 + b_5 + \cdots . \]

We define the corresponding sequence of partial sums (sm) by

\[ s_m = b_1 + b_2 + b_3 + \cdots + b_m , \]

and say that the series $\sum_{n=1}^{\infty} b_n$ converges to B if the sequence (sm) converges to B. In this case, we write $\sum_{n=1}^{\infty} b_n = B$.
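The passage from terms to partial sums in Definition 2.4.3 is easy to mirror in code. The following is a minimal sketch, assuming a finite list of terms; the helper name `partial_sums` and the choice b_n = 1/2^n are illustrative assumptions rather than anything fixed by the text.

```python
from fractions import Fraction
from itertools import accumulate

def partial_sums(terms):
    """Return [s_1, ..., s_m], where s_j = b_1 + ... + b_j for a finite list of terms."""
    return list(accumulate(terms))

# Illustrative choice of terms (an assumption): b_n = 1/2^n for n = 1, ..., 10.
b = [Fraction(1, 2 ** n) for n in range(1, 11)]
s = partial_sums(b)
```

With these positive terms the list `s` is increasing and stays below 1, which is the kind of sequence the Monotone Convergence Theorem applies to.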


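Example 2.4.4. Consider

\[ \sum_{n=1}^{\infty} \frac{1}{n^2}. \]

Because the terms in the sum are all positive, the sequence of partial sums given by

\[ s_m = 1 + \frac{1}{4} + \frac{1}{9} + \cdots + \frac{1}{m^2} \]

is increasing. The question is whether or not we can find some upper bound on (sm). To this end, observe

\begin{align*}
s_m &= 1 + \frac{1}{2 \cdot 2} + \frac{1}{3 \cdot 3} + \frac{1}{4 \cdot 4} + \cdots + \frac{1}{m^2} \\
&< 1 + \frac{1}{2 \cdot 1} + \frac{1}{3 \cdot 2} + \frac{1}{4 \cdot 3} + \cdots + \frac{1}{m(m-1)} \\
&= 1 + \left(1 - \frac{1}{2}\right) + \left(\frac{1}{2} - \frac{1}{3}\right) + \left(\frac{1}{3} - \frac{1}{4}\right) + \cdots + \left(\frac{1}{m-1} - \frac{1}{m}\right) \\
&= 1 + 1 - \frac{1}{m} \\
&< 2.
\end{align*}

Thus, 2 is an upper bound for the sequence of partial sums, so by the Monotone Convergence Theorem, $\sum_{n=1}^{\infty} 1/n^2$ converges to some (presently unknown) limit less than 2.

Example 2.4.5 (Harmonic Series). This time, consider the so-called harmonic series

\[ \sum_{n=1}^{\infty} \frac{1}{n}. \]

Again, we have an increasing sequence of partial sums,

\[ s_m = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{m}, \]

that upon naive inspection appears as though it may be bounded. However, 2 is no longer an upper bound because

\[ s_4 = 1 + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right) > 1 + \frac{1}{2} + \left(\frac{1}{4} + \frac{1}{4}\right) = 2. \]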
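Both computations can be reproduced exactly with rational arithmetic. The sketch below is an illustration only; the function names are hypothetical, and the range of m tested is an arbitrary choice. It checks the bound s_m < 2 − 1/m from Example 2.4.4 for a range of m ≥ 2, and evaluates s_4 for the harmonic series.

```python
from fractions import Fraction

def partial_sum(term, m):
    """Exact value of term(1) + term(2) + ... + term(m)."""
    return sum(term(n) for n in range(1, m + 1))

def inverse_square(n):
    return Fraction(1, n * n)

def harmonic_term(n):
    return Fraction(1, n)

# Example 2.4.4: the telescoping estimate gives s_m < 2 - 1/m for m >= 2.
squares_bounded = all(
    partial_sum(inverse_square, m) < 2 - Fraction(1, m) for m in range(2, 200)
)

# Example 2.4.5: the fourth harmonic partial sum already exceeds 2.
s4 = partial_sum(harmonic_term, 4)
```

Here `s4` evaluates to 25/12, slightly more than 2, in agreement with the grouping argument above.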


A similar calculation shows that $s_8 > 2\tfrac{1}{2}$, and we can see that in general

\begin{align*}
s_{2^k} &= 1 + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \cdots + \frac{1}{8}\right) + \cdots + \left(\frac{1}{2^{k-1}+1} + \cdots + \frac{1}{2^k}\right) \\
&> 1 + \frac{1}{2} + \left(\frac{1}{4} + \frac{1}{4}\right) + \left(\frac{1}{8} + \cdots + \frac{1}{8}\right) + \cdots + \left(\frac{1}{2^k} + \cdots + \frac{1}{2^k}\right) \\
&= 1 + \frac{1}{2} + 2\left(\frac{1}{4}\right) + 4\left(\frac{1}{8}\right) + \cdots + 2^{k-1}\left(\frac{1}{2^k}\right) \\
&= 1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \cdots + \frac{1}{2} \\
&= 1 + k\left(\frac{1}{2}\right),
\end{align*}

which is unbounded. Thus, despite the incredibly slow pace, the sequence of partial sums of $\sum_{n=1}^{\infty} 1/n$ eventually surpasses every number on the positive real line. Because convergent sequences are bounded, the harmonic series diverges.

The previous example is a special case of a general argument that can be used to determine the convergence or divergence of a large class of infinite series.

Theorem 2.4.6 (Cauchy Condensation Test). Suppose (bn) is decreasing and satisfies bn ≥ 0 for all n ∈ N. Then, the series $\sum_{n=1}^{\infty} b_n$ converges if and only if the series

\[ \sum_{n=0}^{\infty} 2^n b_{2^n} = b_1 + 2b_2 + 4b_4 + 8b_8 + 16b_{16} + \cdots \]

converges.

Proof. First, assume that $\sum_{n=0}^{\infty} 2^n b_{2^n}$ converges. Theorem 2.3.2 guarantees that the partial sums

\[ t_k = b_1 + 2b_2 + 4b_4 + \cdots + 2^k b_{2^k} \]

are bounded; that is, there exists an M > 0 such that $t_k \le M$ for all k ∈ N. We want to prove that $\sum_{n=1}^{\infty} b_n$ converges. Because bn ≥ 0, we know that the partial sums are increasing, so we only need to show that

\[ s_m = b_1 + b_2 + b_3 + \cdots + b_m \]

is bounded. Fix m and let k be large enough to ensure $m \le 2^{k+1} - 1$. Then, $s_m \le s_{2^{k+1}-1}$ and

\begin{align*}
s_{2^{k+1}-1} &= b_1 + (b_2 + b_3) + (b_4 + b_5 + b_6 + b_7) + \cdots + (b_{2^k} + \cdots + b_{2^{k+1}-1}) \\
&\le b_1 + (b_2 + b_2) + (b_4 + b_4 + b_4 + b_4) + \cdots + (b_{2^k} + \cdots + b_{2^k}) \\
&= b_1 + 2b_2 + 4b_4 + \cdots + 2^k b_{2^k} = t_k.
\end{align*}


Thus, $s_m \le t_k \le M$, and the sequence (sm) is bounded. By the Monotone Convergence Theorem, we can conclude that $\sum_{n=1}^{\infty} b_n$ converges.

The proof that $\sum_{n=0}^{\infty} 2^n b_{2^n}$ diverges implies $\sum_{n=1}^{\infty} b_n$ diverges is similar to Example 2.4.5. The details are requested in Exercise 2.4.1.

Corollary 2.4.7. The series $\sum_{n=1}^{\infty} 1/n^p$ converges if and only if p > 1.

A rigorous argument for this corollary requires a few basic facts about geometric series. The proof is requested in Exercise 2.7.7 at the end of Section 2.7 where geometric series are discussed.
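The grouping inequality at the center of the proof, s_{2^{k+1}−1} ≤ t_k, can be observed on a concrete sequence. This is a sketch under assumptions: the function names `s`, `t`, and `b` are hypothetical, and b_n = 1/n² is just one convenient decreasing, nonnegative sequence.

```python
from fractions import Fraction

def s(b, m):
    """Partial sum s_m = b(1) + b(2) + ... + b(m)."""
    return sum(b(n) for n in range(1, m + 1))

def t(b, k):
    """Condensed partial sum t_k = b(1) + 2 b(2) + 4 b(4) + ... + 2^k b(2^k)."""
    return sum(2 ** j * b(2 ** j) for j in range(k + 1))

def b(n):
    # Illustrative decreasing, nonnegative sequence (an assumption): b_n = 1/n^2.
    return Fraction(1, n * n)

# Compare s_{2^(k+1) - 1} with t_k for the first few k.
bound_holds = all(s(b, 2 ** (k + 1) - 1) <= t(b, k) for k in range(8))
```

For this choice the condensed sums are t_k = 1 + 1/2 + · · · + 1/2^k, which never exceed 2, so the same bound is inherited by every s_m.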

Exercises

Exercise 2.4.1. Complete the proof of Theorem 2.4.6 by showing that if the series $\sum_{n=0}^{\infty} 2^n b_{2^n}$ diverges, then so does $\sum_{n=1}^{\infty} b_n$. Example 2.4.5 may be a useful reference.

Exercise 2.4.2. (a) Prove that the sequence defined by $x_1 = 3$ and

\[ x_{n+1} = \frac{1}{4 - x_n} \]

converges. (b) Now that we know lim xn exists, explain why lim xn+1 must also exist and equal the same value. (c) Take the limit of each side of the recursive equation in part (a) of this exercise to explicitly compute lim xn.

Exercise 2.4.3. Following the model of Exercise 2.4.2, show that the sequence defined by $y_1 = 1$ and $y_{n+1} = 4 - 1/y_n$ converges and find the limit.

Exercise 2.4.4. Show that

\[ \sqrt{2}, \quad \sqrt{2\sqrt{2}}, \quad \sqrt{2\sqrt{2\sqrt{2}}}, \quad \ldots \]

converges and find the limit.

Exercise 2.4.5 (Calculating Square Roots). Let $x_1 = 2$, and define

\[ x_{n+1} = \frac{1}{2}\left(x_n + \frac{2}{x_n}\right). \]

(a) Show that $x_n^2$ is always greater than 2, and then use this to prove that $x_n - x_{n+1} \ge 0$. Conclude that $\lim x_n = \sqrt{2}$. (b) Modify the sequence (xn) so that it converges to $\sqrt{c}$.

Exercise 2.4.6 (Limit Superior). Let (an) be a bounded sequence. (a) Prove that the sequence defined by yn = sup{ak : k ≥ n} converges.


(b) The limit superior of (an ), or lim sup an , is deﬁned by lim sup an = lim yn , where yn is the sequence from part (a) of this exercise. Provide a reasonable deﬁnition for lim inf an and brieﬂy explain why it always exists for any bounded sequence. (c) Prove that lim inf an ≤ lim sup an for every bounded sequence, and give an example of a sequence for which the inequality is strict. (d) Show that lim inf an = lim sup an if and only if lim an exists. In this case, all three share the same value.

2.5

Subsequences and the Bolzano–Weierstrass Theorem

In Example 2.4.5, we showed that the sequence of partial sums (sm) of the harmonic series does not converge by focusing our attention on a particular subsequence $(s_{2^k})$ of the original sequence. For the moment, we will put the topic of infinite series aside and more fully develop the important concept of subsequences.

Definition 2.5.1. Let (an) be a sequence of real numbers, and let $n_1 < n_2 < n_3 < n_4 < n_5 < \cdots$ be an increasing sequence of natural numbers. Then the sequence

\[ a_{n_1}, a_{n_2}, a_{n_3}, a_{n_4}, a_{n_5}, \ldots \]

is called a subsequence of (an) and is denoted by $(a_{n_j})$, where j ∈ N indexes the subsequence.

Notice that the order of the terms in a subsequence is the same as in the original sequence, and repetitions are not allowed. Thus if

\[ (a_n) = \left(1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \frac{1}{5}, \frac{1}{6}, \cdots\right), \]

then

\[ \left(\frac{1}{2}, \frac{1}{4}, \frac{1}{6}, \frac{1}{8}, \cdots\right) \quad \text{and} \quad \left(\frac{1}{10}, \frac{1}{100}, \frac{1}{1000}, \frac{1}{10000}, \cdots\right) \]

are examples of legitimate subsequences, whereas

\[ \left(\frac{1}{10}, \frac{1}{5}, \frac{1}{100}, \frac{1}{50}, \frac{1}{1000}, \frac{1}{500}, \cdots\right) \quad \text{and} \quad \left(1, 1, \frac{1}{3}, \frac{1}{5}, \frac{1}{7}, \frac{1}{9}, \cdots\right) \]

are not.

Theorem 2.5.2. Subsequences of a convergent sequence converge to the same limit as the original sequence.


Proof. Exercise 2.5.1.

This not too surprising result has several somewhat surprising applications. It is the key ingredient for understanding when infinite sums are associative (Exercise 2.5.2). We can also use it in the following clever way to compute values of some familiar limits.

Example 2.5.3. Let 0 < b < 1. Because

\[ b > b^2 > b^3 > b^4 > \cdots > 0, \]

the sequence $(b^n)$ is decreasing and bounded below. The Monotone Convergence Theorem allows us to conclude that $(b^n)$ converges to some l satisfying b > l ≥ 0. To compute l, notice that $(b^{2n})$ is a subsequence, so $(b^{2n}) \to l$ by Theorem 2.5.2. But $(b^{2n}) = (b^n)(b^n)$, so by the Algebraic Limit Theorem, $(b^{2n}) \to l \cdot l = l^2$. Because limits are unique, $l^2 = l$, so l = 0 or l = 1; since l < b < 1, we conclude l = 0.

Without much trouble (Exercise 2.5.5), we can generalize this example to conclude $(b^n) \to 0$ whenever −1 < b < 1.

Example 2.5.4 (Divergence Criterion). Theorem 2.5.2 is also useful for providing economical proofs for divergence. In Example 2.2.7, we were quite sure that

\[ \left(1, -\frac{1}{2}, \frac{1}{3}, -\frac{1}{4}, \frac{1}{5}, -\frac{1}{5}, \frac{1}{5}, -\frac{1}{5}, \frac{1}{5}, -\frac{1}{5}, \frac{1}{5}, -\frac{1}{5}, \frac{1}{5}, -\frac{1}{5}, \cdots\right) \]

did not converge to any proposed limit. Notice that

\[ \left(\frac{1}{5}, \frac{1}{5}, \frac{1}{5}, \frac{1}{5}, \frac{1}{5}, \cdots\right) \]

is a subsequence that converges to 1/5. Also,

\[ \left(-\frac{1}{5}, -\frac{1}{5}, -\frac{1}{5}, -\frac{1}{5}, -\frac{1}{5}, \cdots\right) \]

is a different subsequence of the original sequence that converges to −1/5. Because we have two subsequences converging to two different limits, we can rigorously conclude that the original sequence diverges.
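The divergence argument in Example 2.5.4 amounts to picking out terms along two strictly increasing index sequences. The sketch below illustrates this; the function names `a` and `subsequence` are hypothetical, and the formula for the n-th term is one way (an assumption) of encoding the sequence displayed in the example.

```python
from fractions import Fraction

def a(n):
    """n-th term (n >= 1) of the sequence in Example 2.5.4: 1, -1/2, 1/3, -1/4, 1/5, -1/5, 1/5, ..."""
    sign = 1 if n % 2 == 1 else -1
    return sign * Fraction(1, min(n, 5))

def subsequence(term, indices):
    """Return term(n_1), term(n_2), ... after checking that n_1 < n_2 < ... ."""
    assert all(i < j for i, j in zip(indices, indices[1:])), "indices must strictly increase"
    return [term(n) for n in indices]

# Odd indices from 5 onward and even indices from 6 onward.
odd_tail = subsequence(a, list(range(5, 40, 2)))
even_tail = subsequence(a, list(range(6, 40, 2)))
```

The first list is constantly 1/5 and the second constantly −1/5, so the two subsequences have different limits. The assertion inside `subsequence` enforces the requirement from Definition 2.5.1 that the indices increase strictly.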

The Bolzano–Weierstrass Theorem In the previous example, it was rather easy to spot a convergent subsequence (or two) hiding in the original sequence. For bounded sequences, it turns out that it is always possible to ﬁnd at least one such convergent subsequence. Theorem 2.5.5 (Bolzano–Weierstrass Theorem). Every bounded sequence contains a convergent subsequence.


Proof. Let (an) be a bounded sequence so that there exists M > 0 satisfying |an| ≤ M for all n ∈ N. Bisect the closed interval [−M, M] into the two closed intervals [−M, 0] and [0, M]. (The midpoint is included in both halves.) Now, it must be that at least one of these closed intervals contains an infinite number of the points in the sequence (an). Select a half for which this is the case and label that interval as I1. Then, let $a_{n_1}$ be some point in the sequence (an) satisfying $a_{n_1} \in I_1$.

[Figure: the interval [−M, M] bisected at 0, with the chosen half I1 containing $a_{n_1}$ and the nested half I2 ⊆ I1 containing $a_{n_2}$.]

Next, we bisect I1 into closed intervals of equal length, and let I2 be a half that again contains an infinite number of points of the original sequence. Because there are an infinite number of points from (an) to choose from, we can select an $a_{n_2}$ from the original sequence with $n_2 > n_1$ and $a_{n_2} \in I_2$. In general, we construct the closed interval $I_k$ by taking a half of $I_{k-1}$ containing an infinite number of points of (an) and then select $n_k > n_{k-1} > \cdots > n_2 > n_1$ so that $a_{n_k} \in I_k$.

We want to argue that $(a_{n_k})$ is a convergent subsequence, but we need a candidate for the limit. The sets $I_1 \supseteq I_2 \supseteq I_3 \supseteq \cdots$ form a nested sequence of closed intervals, and by the Nested Interval Property there exists at least one point x ∈ R contained in every $I_k$. This provides us with the candidate we were looking for. It just remains to show that $(a_{n_k}) \to x$. Let ε > 0. By construction, the length of $I_k$ is $M(1/2)^{k-1}$, which converges to zero. (This follows from Example 2.5.3 and the Algebraic Limit Theorem.) Choose N so that k ≥ N implies that the length of $I_k$ is less than ε. Because x and $a_{n_k}$ are both in $I_k$, it follows that $|a_{n_k} - x| < \epsilon$.
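The bisection construction can be imitated on a finite sample of a sequence, with one important caveat: a program cannot test whether a half contains infinitely many terms. The sketch below therefore substitutes a heuristic, keeping the half that contains at least as many of the sampled terms. That substitution, the function name `bisection_subsequence`, and the test sequence a_n = (−1)^n (1 + 1/n) are all assumptions made for illustration; this is not the proof's procedure, only a finite stand-in for it.

```python
def bisection_subsequence(a, M, steps):
    """Finite, heuristic imitation of the Bolzano-Weierstrass bisection.

    a     : list of sampled terms with |a[i]| <= M (index i stands for n = i + 1)
    steps : number of bisections to perform
    Returns the chosen indices n_1 < n_2 < ... and the final interval (lo, hi).
    """
    lo, hi = -M, M
    candidates = list(range(len(a)))      # indices of terms lying in the current interval
    chosen, last = [], -1
    for _ in range(steps):
        mid = (lo + hi) / 2
        left = [i for i in candidates if a[i] <= mid]
        right = [i for i in candidates if a[i] >= mid]
        # Stand-in for "contains infinitely many terms": keep the fuller half.
        if len(left) >= len(right):
            candidates, hi = left, mid
        else:
            candidates, lo = right, mid
        later = [i for i in candidates if i > last]
        if not later:                     # the finite sample has run out
            break
        last = later[0]                   # next index, strictly larger than the previous one
        chosen.append(last)
    return chosen, (lo, hi)

# Illustrative bounded sequence (an assumption): a_n = (-1)^n (1 + 1/n), n = 1, ..., 2000.
M = 2.0
a = [(-1) ** n * (1 + 1 / n) for n in range(1, 2001)]
chosen, (lo, hi) = bisection_subsequence(a, M, steps=8)
```

In this run the selected terms are all negative and drift toward −1, which is one of the two subsequential limits of the sample sequence. The k-th selected term lies in an interval of length M(1/2)^{k−1}, matching the estimate used at the end of the proof.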

Exercises

Exercise 2.5.1. Prove Theorem 2.5.2.

Exercise 2.5.2. (a) Prove that if an infinite series converges, then the associative property holds. Assume $a_1 + a_2 + a_3 + a_4 + a_5 + \cdots$ converges to a limit L (i.e., the sequence of partial sums (sn) → L). Show that any regrouping of the terms

\[ (a_1 + a_2 + \cdots + a_{n_1}) + (a_{n_1+1} + \cdots + a_{n_2}) + (a_{n_2+1} + \cdots + a_{n_3}) + \cdots \]

leads to a series that also converges to L. (b) Compare this result to the example discussed at the end of Section 2.1 where infinite addition was shown not to be associative. Why doesn't our proof in (a) apply to this example?

Exercise 2.5.3. Give an example of each of the following, or argue that such a request is impossible. (a) A sequence that does not contain 0 or 1 as a term but contains subsequences converging to each of these values. (b) A monotone sequence that diverges but has a convergent subsequence. (c) A sequence that contains subsequences converging to every point in the infinite set {1, 1/2, 1/3, 1/4, 1/5, . . . }. (d) An unbounded sequence with a convergent subsequence. (e) A sequence that has a subsequence that is bounded but contains no subsequence that converges.

Exercise 2.5.4. Assume (an) is a bounded sequence with the property that every convergent subsequence of (an) converges to the same limit a ∈ R. Show that (an) must converge to a.

Exercise 2.5.5. Extend the result proved in Example 2.5.3 to the case |b| < 1. Show $\lim(b^n) = 0$ whenever −1 < b < 1.

Exercise 2.5.6. Let (an) be a bounded sequence, and define the set S = {x ∈ R : x < an for infinitely many terms an}. Show that there exists a subsequence $(a_{n_k})$ converging to s = sup S. (This is a direct proof of the Bolzano–Weierstrass Theorem using the Axiom of Completeness.)

2.6

The Cauchy Criterion

The following definition bears a striking resemblance to the definition of convergence for a sequence.

Definition 2.6.1. A sequence (an) is called a Cauchy sequence if, for every ε > 0, there exists an N ∈ N such that whenever m, n ≥ N it follows that |an − am| < ε.

To make the comparison easier, let's restate the definition of convergence.

Definition 2.2.3 (Convergence of a Sequence). A sequence (an) converges to a real number a if, for every ε > 0, there exists an N ∈ N such that whenever n ≥ N it follows that |an − a| < ε.

As we have discussed, the definition of convergence asserts that, given an arbitrary positive ε, it is possible to find a point in the sequence after which the terms of the sequence are all closer to the limit a than the given ε. On the other hand, a sequence is a Cauchy sequence if, for every ε, there is a point in the sequence after which the terms are all closer to each other than the given ε. To spoil the surprise, we will argue in this section that in fact these two definitions are equivalent: Convergent sequences are Cauchy sequences, and Cauchy sequences converge. The significance of the definition of a Cauchy sequence is that there is no mention of a limit. This is somewhat like the situation with the Monotone Convergence Theorem in that we will have another way of proving that sequences converge without having any explicit knowledge of what the limit might be.

Theorem 2.6.2. Every convergent sequence is a Cauchy sequence.

Proof. Assume (xn) converges to x. To prove that (xn) is Cauchy, we must find a point in the sequence after which we have |xn − xm| < ε. This can be done using an application of the triangle inequality. The details are requested in Exercise 2.6.2.

The converse is a bit more difficult to prove, mainly because, in order to prove that a sequence converges, we must have a proposed limit for the sequence to approach. We have been in this situation before in the proofs of the Monotone Convergence Theorem and the Bolzano–Weierstrass Theorem. Our strategy here will be to use the Bolzano–Weierstrass Theorem. This is the reason for the next lemma. (Compare this with Theorem 2.3.2.)

Lemma 2.6.3. Cauchy sequences are bounded.

Proof. Given ε = 1, there exists an N such that |xm − xn| < 1 for all m, n ≥ N. Thus, we must have |xn| < |xN| + 1 for all n ≥ N. It follows that

\[ M = \max\{|x_1|, |x_2|, |x_3|, \ldots, |x_{N-1}|, |x_N| + 1\} \]

is a bound for the sequence (xn).

Theorem 2.6.4 (Cauchy Criterion). A sequence converges if and only if it is a Cauchy sequence.

Proof. (⇒) This direction is Theorem 2.6.2. (⇐) For this direction, we start with a Cauchy sequence (xn).
Lemma 2.6.3 guarantees that (xn) is bounded, so we may use the Bolzano–Weierstrass Theorem to produce a convergent subsequence $(x_{n_k})$. Set $x = \lim x_{n_k}$. The idea is to show that the original sequence (xn) converges to this same limit. Once again, we will use a triangle inequality argument. We know the terms in the subsequence are getting close to the limit x, and the assumption that (xn) is Cauchy implies the terms in the "tail" of the sequence are close to each other. Thus, we want to make each of these distances less than half of the prescribed ε.

Let ε > 0. Because (xn) is Cauchy, there exists N such that

\[ |x_n - x_m| < \frac{\epsilon}{2} \]

whenever m, n ≥ N. Now, we also know that $(x_{n_k}) \to x$, so choose a term in this subsequence, call it $x_{n_K}$, with $n_K \ge N$ and

\[ |x_{n_K} - x| < \frac{\epsilon}{2}. \]

To see that $n_K$ has the desired property (for the original sequence (xn)), observe that if $n \ge n_K$, then

\begin{align*}
|x_n - x| &= |x_n - x_{n_K} + x_{n_K} - x| \\
&\le |x_n - x_{n_K}| + |x_{n_K} - x| \\
&< \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.
\end{align*}
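The Cauchy condition compares terms with each other rather than with a limit, and on a finite sample that comparison reduces to the spread of a tail. The sketch below is a rough illustration only: `tail_spread` is a hypothetical helper, the sequence x_n = (−1)^n/n is an arbitrary choice, and a finite sample can suggest but never establish that a sequence is Cauchy.

```python
def tail_spread(x, N):
    """Largest |x_n - x_m| among the sampled terms with (0-based) index >= N."""
    tail = x[N:]
    return max(tail) - min(tail)

# Illustrative sequence (an assumption): x_n = (-1)^n / n for n = 1, ..., 1000.
x = [(-1) ** n / n for n in range(1, 1001)]
spreads = [tail_spread(x, N) for N in (9, 99, 499)]
```

The three spreads shrink as N grows, and each is at most 2/(N + 1). The maximum of |x_n − x_m| over a tail equals the largest term minus the smallest, which is why the helper needs only `max` and `min`.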

The Cauchy Criterion is named after the French mathematician Augustin Louis Cauchy. Cauchy is a major figure in the history of many branches of mathematics (number theory and the theory of finite groups, to name a few), but he is most widely recognized for his enormous contributions in analysis, especially complex analysis. He is deservedly credited with inventing the ε-based definition of limits we use today, although it is probably better to view him as a pioneer of analysis in the sense that his work did not attain the level of refinement that modern mathematicians have come to expect. The Cauchy Criterion, for instance, was devised and used by Cauchy to study infinite series, but he never actually proved it in both directions. The fact that there were gaps in Cauchy's work should not diminish his brilliance in any way. The issues of the day were both difficult and subtle, and Cauchy was far and away the most influential in laying the groundwork for modern standards of rigor. Karl Weierstrass played a major role in sharpening Cauchy's arguments. We will hear a good deal more from Weierstrass, most notably in Chapter 6 when we take up uniform convergence. Bernhard Bolzano was working in Prague and was writing and thinking about many of these same issues surrounding limits and continuity. For whatever reason, his historical reputation does not seem to do justice to the impressive level of his insights.

Completeness Revisited

In the first chapter, we established the Axiom of Completeness (AoC) to be the assertion that "sets bounded above have least upper bounds." We then used this axiom as the crucial step in the proof of the Nested Interval Property (NIP). In this chapter, AoC was the central step in the Monotone Convergence Theorem (MCT), and NIP was central to proving the Bolzano–Weierstrass Theorem (BW). Finally, we needed BW in our proof of the Cauchy Criterion (CC) for convergent sequences. The list of implications then looks like

\[ \text{AoC} \Rightarrow \begin{cases} \text{NIP} \Rightarrow \text{BW} \Rightarrow \text{CC}, \\ \text{MCT}. \end{cases} \]

But this list is not the whole story. Recall that in our original discussions about completeness, the fundamental problem was that the rational numbers contained "gaps." The reason for moving from the rational numbers to the real numbers to do analysis is so that when we encounter a sequence of numbers that looks as if it is converging to some number, say √2, then we can be assured that there is indeed a number there that we can call the limit. The assertion that "sets bounded above have least upper bounds" is simply one way to mathematically articulate our insistence that there be no "holes" in our number field, but it is not the only way. Instead, we could have taken NIP to be an axiom and used it to prove that least upper bounds exist, or we could have assumed MCT and used it to prove NIP and the rest of the results. In fact, any of these results could have served as our starting point. The situation is reminiscent of the adage, "Which came first, the chicken or the egg?" All of the preceding statements are equivalent in the sense that once we have assumed any one of them to be true, it is possible to derive the rest. (These implications are explored in Exercise 2.6.6.) However, because we have an example of an ordered field that is not complete (namely, the set of rational numbers), we know it is impossible to prove any of them using only the field and order properties. Just how we decide which should be the axiom and which then become theorems depends on preference and context and in the end does not matter all that much. What is important is that we understand all of these results (AoC, NIP, MCT, BW, and CC) as belonging to the same family, each asserting the completeness of R in its own particular language.

Exercises

Exercise 2.6.1. Give an example of each of the following, or argue that such a request is impossible. (a) A Cauchy sequence that is not monotone. (b) A monotone sequence that is not Cauchy. (c) A Cauchy sequence with a divergent subsequence. (d) An unbounded sequence containing a subsequence that is Cauchy.

Exercise 2.6.2. Supply a proof for Theorem 2.6.2.

Exercise 2.6.3. (a) Explain how the following pseudo-Cauchy property differs from the proper definition of a Cauchy sequence: A sequence (sn) is pseudo-Cauchy if, for all ε > 0, there exists an N such that if n ≥ N, then |sn+1 − sn| < ε. (b) If possible, give an example of a divergent sequence (sn) that is pseudo-Cauchy.


Exercise 2.6.4. Assume (an) and (bn) are Cauchy sequences. Use a triangle inequality argument to prove cn = |an − bn| is Cauchy.

Exercise 2.6.5. If (xn) and (yn) are Cauchy sequences, then one easy way to prove that (xn + yn) is Cauchy is to use the Cauchy Criterion. By Theorem 2.6.4, (xn) and (yn) must be convergent, and the Algebraic Limit Theorem then implies (xn + yn) is convergent and hence Cauchy. (a) Give a direct argument that (xn + yn) is a Cauchy sequence that does not use the Cauchy Criterion or the Algebraic Limit Theorem. (b) Do the same for the product (xn yn).

Exercise 2.6.6. (a) Assume the Nested Interval Property (Theorem 1.4.1) is true and use a technique similar to the one employed in the proof of the Bolzano–Weierstrass Theorem to give a proof for the Axiom of Completeness. (The reverse implication was given in Chapter 1. This shows that AoC is equivalent to NIP.) (b) Use the Monotone Convergence Theorem to give a proof of the Nested Interval Property. (This establishes the equivalence of AoC, NIP, and MCT.) (c) This time, start with the Bolzano–Weierstrass Theorem and use it to construct a proof of the Nested Interval Property. (Thus, BW is equivalent to NIP, and hence to AoC and MCT as well.) (d) Finally, use the Cauchy Criterion to prove the Bolzano–Weierstrass Theorem. This is the final link in the equivalence of the five characterizations of completeness discussed at the end of Section 2.6.

2.7

Properties of Inﬁnite Series

Given an infinite series $\sum_{k=1}^{\infty} a_k$, it is important to keep a clear distinction between

(i) the sequence of terms: (a1, a2, a3, . . . ) and
(ii) the sequence of partial sums: (s1, s2, s3, . . . ), where sn = a1 + a2 + · · · + an.

The convergence of the series $\sum_{k=1}^{\infty} a_k$ is defined in terms of the sequence (sn). Specifically, the statement

\[ \sum_{k=1}^{\infty} a_k = A \quad \text{means that} \quad \lim s_n = A. \]

It is for this reason that we can immediately translate many of our results from the study of sequences into statements about the behavior of infinite series.

Theorem 2.7.1 (Algebraic Limit Theorem for Series). If $\sum_{k=1}^{\infty} a_k = A$ and $\sum_{k=1}^{\infty} b_k = B$, then
(i) $\sum_{k=1}^{\infty} c a_k = cA$ for all c ∈ R and
(ii) $\sum_{k=1}^{\infty} (a_k + b_k) = A + B$.

Proof. (i) In order to show that $\sum_{k=1}^{\infty} c a_k = cA$, we must argue that the sequence of partial sums

\[ t_m = ca_1 + ca_2 + ca_3 + \cdots + ca_m \]

converges to cA. But we are given that $\sum_{k=1}^{\infty} a_k$ converges to A, meaning that the partial sums

\[ s_m = a_1 + a_2 + a_3 + \cdots + a_m \]

converge to A. Because $t_m = c s_m$, applying the Algebraic Limit Theorem for sequences (Theorem 2.3.3) yields $(t_m) \to cA$, as desired. (ii) Exercise 2.7.8.

One way to summarize Theorem 2.7.1 (i) is to say that infinite addition still satisfies the distributive property. Part (ii) verifies that series can be added in the usual way. Missing from this theorem is any statement about the product of two infinite series. At the heart of this question is the issue of commutativity, which requires a more delicate analysis and so is postponed until Section 2.8.

Theorem 2.7.2 (Cauchy Criterion for Series). The series $\sum_{k=1}^{\infty} a_k$ converges if and only if, given ε > 0, there exists an N ∈ N such that whenever n > m ≥ N it follows that

\[ |a_{m+1} + a_{m+2} + \cdots + a_n| < \epsilon. \]

Proof. Observe that

\[ |s_n - s_m| = |a_{m+1} + a_{m+2} + \cdots + a_n| \]

and apply the Cauchy Criterion for sequences.

The Cauchy Criterion leads to economical proofs of several basic facts about series.

Theorem 2.7.3. If the series $\sum_{k=1}^{\infty} a_k$ converges, then (ak) → 0.

Proof. Consider the special case n = m + 1 in the Cauchy Criterion for Convergent Series.

Every statement of this result should be accompanied with a reminder to look at the harmonic series (Example 2.4.5) to erase any misconception that the converse statement is true. Knowing (ak) tends to 0 does not imply that the series converges.

Theorem 2.7.4 (Comparison Test). Assume (ak) and (bk) are sequences satisfying 0 ≤ ak ≤ bk for all k ∈ N.
(i) If $\sum_{k=1}^{\infty} b_k$ converges, then $\sum_{k=1}^{\infty} a_k$ converges.
(ii) If $\sum_{k=1}^{\infty} a_k$ diverges, then $\sum_{k=1}^{\infty} b_k$ diverges.

Proof. Both statements follow immediately from the Cauchy Criterion for Series and the observation that

\[ |a_{m+1} + a_{m+2} + \cdots + a_n| \le |b_{m+1} + b_{m+2} + \cdots + b_n|. \]

Alternate proofs using the Monotone Convergence Theorem are requested in the exercises.

This is a good point to remind ourselves again that statements about convergence of sequences and series are immune to changes in some finite number of initial terms. In the Comparison Test, the requirement that ak ≤ bk does not really need to hold for all k ∈ N but just needs to be eventually true. A weaker, but sufficient, hypothesis would be to assume that there exists some point M ∈ N such that the inequality ak ≤ bk is true for all k ≥ M.

The Comparison Test is used to deduce the convergence or divergence of one series based on the behavior of another. Thus, for this test to be of any great use, we need a catalog of series we can use as measuring sticks. In Section 2.4, we proved the Cauchy Condensation Test, which led to the general statement that the series $\sum_{n=1}^{\infty} 1/n^p$ converges if and only if p > 1. The next example summarizes the situation for another important class of series.

Example 2.7.5 (Geometric Series). A series is called geometric if it is of the form

\[ \sum_{k=0}^{\infty} a r^k = a + ar + ar^2 + ar^3 + \cdots . \]

If r = 1 and a ≠ 0, the series evidently diverges. For r ≠ 1, the algebraic identity

\[ (1 - r)(1 + r + r^2 + r^3 + \cdots + r^{m-1}) = 1 - r^m \]

enables us to rewrite the partial sum

\[ s_m = a + ar + ar^2 + ar^3 + \cdots + ar^{m-1} = \frac{a(1 - r^m)}{1 - r}. \]

Now the Algebraic Limit Theorem (for sequences) and Example 2.5.3 justify the conclusion

\[ \sum_{k=0}^{\infty} a r^k = \frac{a}{1 - r} \]

if and only if |r| < 1.

Although the Comparison Test requires that the terms of the series be positive, it is often used in conjunction with the next theorem to handle series that contain some negative terms.
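The closed form for the geometric partial sums in Example 2.7.5 is easy to confirm in exact arithmetic. This is a small sketch with hypothetical function names and arbitrarily chosen values a = 3 and r = −1/2 (assumptions made for illustration only).

```python
from fractions import Fraction

def geometric_partial_sum(a, r, m):
    """s_m = a + a r + a r^2 + ... + a r^(m-1), summed term by term."""
    return sum(a * r ** k for k in range(m))

def closed_form(a, r, m):
    """a (1 - r^m) / (1 - r), valid only for r != 1."""
    return a * (1 - r ** m) / (1 - r)

# Illustrative values (an assumption): a = 3, r = -1/2, so a / (1 - r) = 2.
a, r = Fraction(3), Fraction(-1, 2)
limit = a / (1 - r)
gaps = [abs(limit - geometric_partial_sum(a, r, m)) for m in range(1, 21)]
```

For these values the gap between s_m and a/(1 − r) is |a r^m/(1 − r)| = 2(1/2)^m, so it halves at every step, consistent with (r^m) → 0 for |r| < 1.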

Theorem 2.7.6 (Absolute Convergence Test). If the series $\sum_{n=1}^{\infty} |a_n|$ converges, then $\sum_{n=1}^{\infty} a_n$ converges as well.

Proof. This proof makes use of both the necessity (the “only if” direction) and the sufficiency (the “if” direction) of the Cauchy Criterion for Series. Because $\sum_{n=1}^{\infty} |a_n|$ converges, we know that, given an $\epsilon > 0$, there exists an $N \in \mathbf{N}$ such that
$$|a_{m+1}| + |a_{m+2}| + \cdots + |a_n| < \epsilon$$
for all $n > m \ge N$. By the triangle inequality,
$$|a_{m+1} + a_{m+2} + \cdots + a_n| \le |a_{m+1}| + |a_{m+2}| + \cdots + |a_n|,$$
so the sufficiency of the Cauchy Criterion guarantees that $\sum_{n=1}^{\infty} a_n$ also converges.

The converse of this theorem is false. In the opening discussion of this chapter, we considered the alternating harmonic series
$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots.$$

Taking absolute values of the terms gives us the harmonic series $\sum_{n=1}^{\infty} 1/n$, which we have seen diverges. However, it is not too difficult to prove that with the alternating negative signs the series indeed converges. This is a special case of the Alternating Series Test, whose proof is outlined in Exercise 2.7.1.

Theorem 2.7.7 (Alternating Series Test). Let $(a_n)$ be a sequence satisfying
(i) $a_1 \ge a_2 \ge a_3 \ge \cdots \ge a_n \ge a_{n+1} \ge \cdots$ and
(ii) $(a_n) \to 0$.
Then, the alternating series $\sum_{n=1}^{\infty} (-1)^{n+1} a_n$ converges.

Proof. Exercise 2.7.1.
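Before attempting the proof, it can help to watch the partial sums of an alternating series behave as the theorem predicts. The sketch below is a hypothetical illustration (the helper name is an assumption) using $a_n = 1/n$: the odd-indexed partial sums decrease, the even-indexed ones increase, and the gap between neighbors is exactly the next term.

```python
def alternating_partial_sums(a, n_terms):
    """Return [s_1, ..., s_n] where s_n = a(1) - a(2) + a(3) - ... +/- a(n)."""
    sums, total = [], 0.0
    for n in range(1, n_terms + 1):
        total += (-1) ** (n + 1) * a(n)
        sums.append(total)
    return sums


if __name__ == "__main__":
    s = alternating_partial_sums(lambda n: 1.0 / n, 12)
    print("s_1, s_3, s_5, ... :", s[0::2])  # decreasing
    print("s_2, s_4, s_6, ... :", s[1::2])  # increasing
```

The two printed lists squeeze toward a common value, which is the picture behind the nested-interval and monotone-subsequence proofs requested in Exercise 2.7.1.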

Definition 2.7.8. If $\sum_{n=1}^{\infty} |a_n|$ converges, then we say that the original series $\sum_{n=1}^{\infty} a_n$ converges absolutely. If, on the other hand, the series $\sum_{n=1}^{\infty} a_n$ converges but the series of absolute values $\sum_{n=1}^{\infty} |a_n|$ does not converge, then we say that the original series $\sum_{n=1}^{\infty} a_n$ converges conditionally.

In terms of this newly defined jargon, we have shown that
$$\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}$$
converges conditionally, whereas
$$\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n^2}, \qquad \sum_{n=1}^{\infty} \frac{1}{2^n} \qquad \text{and} \qquad \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{2^n}$$
converge absolutely. In particular, any convergent series with (all but finitely many) positive terms must converge absolutely.

The Alternating Series Test is the most accessible test for conditional convergence, but several others are explored in the exercises. In particular, Abel’s Test, outlined in Exercise 2.7.14, will prove useful in our investigations of power series in Chapter 6.

Rearrangements

Informally speaking, a rearrangement of a series is obtained by permuting the terms in the sum into some other order. It is important that all of the original terms eventually appear in the new ordering and that no term gets repeated. In an earlier discussion, we formed a rearrangement of the alternating harmonic series by taking two positive terms for each negative term:
$$1 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} - \frac{1}{4} + \cdots.$$
There are clearly an infinite number of rearrangements of any sum; however, it is helpful to see why neither
$$1 + \frac{1}{2} - \frac{1}{3} + \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots$$
nor
$$1 + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} + \frac{1}{7} - \frac{1}{8} + \frac{1}{9} + \frac{1}{11} - \frac{1}{12} + \cdots$$
is considered a rearrangement of the original alternating harmonic series.

Definition 2.7.9. Let $\sum_{k=1}^{\infty} a_k$ be a series. A series $\sum_{k=1}^{\infty} b_k$ is called a rearrangement of $\sum_{k=1}^{\infty} a_k$ if there exists a one-to-one, onto function $f : \mathbf{N} \to \mathbf{N}$ such that $b_{f(k)} = a_k$ for all $k \in \mathbf{N}$.

We now have all the tools and notation in place to resolve an issue raised at the beginning of the chapter. In Section 2.1, we constructed a particular rearrangement of the alternating harmonic series that converges to a limit different from that of the original series. This happens because the convergence is conditional.

Theorem 2.7.10. If $\sum_{k=1}^{\infty} a_k$ converges absolutely, then any rearrangement of this series converges to the same limit.

Proof. Assume $\sum_{k=1}^{\infty} a_k$ converges absolutely to $A$, and let $\sum_{k=1}^{\infty} b_k$ be a rearrangement of $\sum_{k=1}^{\infty} a_k$. Let’s use
$$s_n = \sum_{k=1}^{n} a_k = a_1 + a_2 + \cdots + a_n$$
for the partial sums of the original series and use
$$t_m = \sum_{k=1}^{m} b_k = b_1 + b_2 + \cdots + b_m$$
for the partial sums of the rearranged series. Thus we want to show that $(t_m) \to A$.

Let $\epsilon > 0$. By hypothesis, $(s_n) \to A$, so choose $N_1$ such that
$$|s_n - A| < \frac{\epsilon}{2}$$
for all $n \ge N_1$. Because the convergence is absolute, we can choose $N_2$ so that
$$\sum_{k=m+1}^{n} |a_k| < \frac{\epsilon}{2}$$
for all $n > m \ge N_2$. Now, take $N = \max\{N_1, N_2\}$. We know that the finite set of terms $\{a_1, a_2, a_3, \ldots, a_N\}$ must all appear in the rearranged series, and we want to move far enough out in the series $\sum_{n=1}^{\infty} b_n$ so that we have included all of these terms. Thus, choose
$$M = \max\{f(k) : 1 \le k \le N\}.$$
It should now be evident that if $m \ge M$, then $(t_m - s_N)$ consists of a finite set of terms, the absolute values of which appear in the tail $\sum_{k=N+1}^{\infty} |a_k|$. Our choice of $N_2$ earlier then guarantees $|t_m - s_N| < \epsilon/2$, and so
$$|t_m - A| = |t_m - s_N + s_N - A| \le |t_m - s_N| + |s_N - A| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$$
whenever $m \ge M$.
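The contrast between absolute and conditional convergence in Theorem 2.7.10 can be seen numerically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the rearrangement used is the "two positive terms for each negative term" pattern described earlier, applied to any series whose odd-indexed terms are positive and even-indexed terms are negative.

```python
def original_sum(term, n_terms):
    """Partial sum term(1) + term(2) + ... + term(n_terms) in the given order."""
    return sum(term(n) for n in range(1, n_terms + 1))


def rearranged_sum(term, n_blocks):
    """Partial sum of the rearrangement term(1), term(3), term(2), term(5), term(7), term(4), ...

    Each block uses two odd-indexed terms followed by one even-indexed term,
    so every index appears exactly once as n_blocks grows.
    """
    total = 0.0
    for k in range(1, n_blocks + 1):
        total += term(4 * k - 3) + term(4 * k - 1) + term(2 * k)
    return total


if __name__ == "__main__":
    conditional = lambda n: (-1) ** (n + 1) / n       # alternating harmonic series
    absolute = lambda n: (-1) ** (n + 1) / n ** 2     # converges absolutely
    print("1/n   original:", original_sum(conditional, 60000),
          " rearranged:", rearranged_sum(conditional, 20000))
    print("1/n^2 original:", original_sum(absolute, 60000),
          " rearranged:", rearranged_sum(absolute, 20000))
```

For the absolutely convergent series the two printed values agree to several decimal places, while for the alternating harmonic series they settle on visibly different numbers.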

Exercises

Exercise 2.7.1. Proving the Alternating Series Test (Theorem 2.7.7) amounts to showing that the sequence of partial sums
$$s_n = a_1 - a_2 + a_3 - \cdots \pm a_n$$
converges. (The opening example in Section 2.1 includes a typical illustration of $(s_n)$.) Different characterizations of completeness lead to different proofs.
(a) Prove the Alternating Series Test by showing that $(s_n)$ is a Cauchy sequence.
(b) Supply another proof for this result using the Nested Interval Property (Theorem 1.4.1).
(c) Consider the subsequences $(s_{2n})$ and $(s_{2n+1})$, and show how the Monotone Convergence Theorem leads to a third proof for the Alternating Series Test.

Exercise 2.7.2. (a) Provide the details for the proof of the Comparison Test (Theorem 2.7.4) using the Cauchy Criterion for series.
(b) Give another proof for the Comparison Test, this time using the Monotone Convergence Theorem.


Exercise 2.7.3. Let $\sum a_n$ be given. For each $n \in \mathbf{N}$, let $p_n = a_n$ if $a_n$ is positive and assign $p_n = 0$ if $a_n$ is negative. In a similar manner, let $q_n = a_n$ if $a_n$ is negative and $q_n = 0$ if $a_n$ is positive.
(a) Argue that if $\sum a_n$ diverges, then at least one of $\sum p_n$ or $\sum q_n$ diverges.
(b) Show that if $\sum a_n$ converges conditionally, then both $\sum p_n$ and $\sum q_n$ diverge.

Exercise 2.7.4. Give an example to show that it is possible for both $\sum x_n$ and $\sum y_n$ to diverge but for $\sum x_n y_n$ to converge.

Exercise 2.7.5. (a) Show that if $\sum a_n$ converges absolutely, then $\sum a_n^2$ also converges absolutely. Does this proposition hold without absolute convergence?
(b) If $\sum a_n$ converges and $a_n \ge 0$, can we conclude anything about $\sum \sqrt{a_n}$?

Exercise 2.7.6. (a) Show that if $\sum x_n$ converges absolutely, and the sequence $(y_n)$ is bounded, then the sum $\sum x_n y_n$ converges.
(b) Find a counterexample that demonstrates that part (a) does not always hold if the convergence of $\sum x_n$ is conditional.

Exercise 2.7.7. Now that we have proved the basic facts about geometric series, supply a proof for Corollary 2.4.7.

Exercise 2.7.8. Prove Theorem 2.7.1 part (ii).

Exercise 2.7.9 (Ratio Test). Given a series $\sum_{n=1}^{\infty} a_n$ with $a_n \ne 0$, the Ratio Test states that if $(a_n)$ satisfies
$$\lim \left| \frac{a_{n+1}}{a_n} \right| = r < 1,$$
then the series converges absolutely.
(a) Let $r'$ satisfy $r < r' < 1$. (Why must such an $r'$ exist?) Explain why there exists an $N$ such that $n \ge N$ implies $|a_{n+1}| \le |a_n| r'$.
(b) Why does $|a_N| \sum (r')^n$ necessarily converge?
(c) Now, show that $\sum |a_n|$ converges.

Exercise 2.7.10. (a) Show that if $a_n > 0$ and $\lim (n a_n) = l$ with $l \ne 0$, then the series $\sum a_n$ diverges.
(b) Assume $a_n > 0$ and $\lim (n^2 a_n)$ exists. Show that $\sum a_n$ converges.

Exercise 2.7.11. Find examples of two series $\sum a_n$ and $\sum b_n$ both of which diverge but for which $\sum \min\{a_n, b_n\}$ converges. To make it more challenging, produce examples where $(a_n)$ and $(b_n)$ are positive and decreasing.

Exercise 2.7.12 (Summation by Parts). Let $(x_n)$ and $(y_n)$ be sequences, and let $s_n = x_1 + x_2 + \cdots + x_n$. Use the observation that $x_j = s_j - s_{j-1}$ to verify the formula
$$\sum_{j=m+1}^{n} x_j y_j = s_n y_{n+1} - s_m y_{m+1} + \sum_{j=m+1}^{n} s_j (y_j - y_{j+1}).$$
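A numerical spot check is no substitute for the verification the exercise asks for, but it is a quick way to confirm that the indices in the summation-by-parts formula have been read correctly. The sketch below uses a hypothetical helper and made-up sample data; lists are padded so that index j in the code matches subscript j in the formula.

```python
def summation_by_parts_sides(x, y, m, n):
    """Return (lhs, rhs) of the summation-by-parts formula for 0 <= m < n.

    x[0] and y[0] are unused padding; y must contain an entry y[n + 1].
    """
    s = [0.0] * (n + 1)            # s[j] = x[1] + ... + x[j], with s[0] = 0
    for j in range(1, n + 1):
        s[j] = s[j - 1] + x[j]
    lhs = sum(x[j] * y[j] for j in range(m + 1, n + 1))
    rhs = s[n] * y[n + 1] - s[m] * y[m + 1] + sum(
        s[j] * (y[j] - y[j + 1]) for j in range(m + 1, n + 1)
    )
    return lhs, rhs


if __name__ == "__main__":
    x = [0, 1, -2, 3, -4, 5, -6]          # sample data (assumed for the demo)
    y = [0, 6, 5, 4, 3, 2, 1, 0.5]
    print(summation_by_parts_sides(x, y, 2, 6))
```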


Exercise 2.7.13 (Dirichlet’s Test). Dirichlet’s Test for convergence states that if the partial sums of $\sum_{n=1}^{\infty} x_n$ are bounded (but not necessarily convergent), and if $(y_n)$ is a sequence satisfying $y_1 \ge y_2 \ge y_3 \ge \cdots \ge 0$ with $\lim y_n = 0$, then the series $\sum_{n=1}^{\infty} x_n y_n$ converges.
(a) Let $M > 0$ be an upper bound for the partial sums of $\sum_{n=1}^{\infty} x_n$. Use Exercise 2.7.12 to show that
$$\left| \sum_{j=m+1}^{n} x_j y_j \right| \le 2M |y_{m+1}|.$$
(b) Prove Dirichlet’s Test just stated.
(c) Show how the Alternating Series Test (Theorem 2.7.7) can be derived as a special case of Dirichlet’s Test.

Exercise 2.7.14 (Abel’s Test). Abel’s Test for convergence states that if the series $\sum_{n=1}^{\infty} x_n$ converges, and if $(y_n)$ is a sequence satisfying
$$y_1 \ge y_2 \ge y_3 \ge \cdots \ge 0,$$
then the series $\sum_{n=1}^{\infty} x_n y_n$ converges.
(a) Carefully point out how the hypothesis of Abel’s Test differs from that of Dirichlet’s Test in Exercise 2.7.13.
(b) Assume that $\sum_{n=1}^{\infty} a_n$ has partial sums that are bounded by a constant $A > 0$, and assume $b_1 \ge b_2 \ge b_3 \ge \cdots \ge 0$. Use Exercise 2.7.12 to show that
$$\left| \sum_{j=1}^{n} a_j b_j \right| \le 2A b_1.$$
(c) Prove Abel’s Test via the following strategy. For a fixed $m \in \mathbf{N}$, apply part (b) to $\sum_{j=m+1}^{n} x_j y_j$ by setting $a_n = x_{m+n}$ and $b_n = y_{m+n}$. (Argue that an upper bound on the partial sums of $\sum_{n=1}^{\infty} a_n$ can be made arbitrarily small by taking $m$ to be large.)

2.8

Double Summations and Products of Inﬁnite Series

Given a doubly indexed array of real numbers $\{a_{ij} : i, j \in \mathbf{N}\}$, we discovered in Section 2.1 that there is a dangerous ambiguity in how we might define $\sum_{i,j=1}^{\infty} a_{ij}$. Performing the sum over first one of the variables and then the other is referred to as an iterated summation. In our specific example, summing the rows first and then taking the sum of these totals produced a different result than first computing the sum of each column and adding these sums together. In short,
$$\sum_{j=1}^{\infty} \sum_{i=1}^{\infty} a_{ij} \ne \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{ij}.$$

There are still other ways to reasonably define $\sum_{i,j=1}^{\infty} a_{ij}$. One natural idea is to calculate a kind of partial sum by adding together finite numbers of terms in larger and larger “rectangles” in the array; that is, for $m, n \in \mathbf{N}$, set
$$s_{mn} = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}. \tag{1}$$
The order of the sum here is irrelevant because the sum is finite. Of particular interest to our discussion are the sums $s_{nn}$ (sums over “squares”), which form a legitimate sequence indexed by $n$ and thus can be subjected to our arsenal of theorems and definitions. If the sequence $(s_{nn})$ converges, for instance, we might wish to define
$$\sum_{i,j=1}^{\infty} a_{ij} = \lim_{n \to \infty} s_{nn}.$$

Exercise 2.8.1. Using the particular array $(a_{ij})$ from Section 2.1, compute $\lim_{n \to \infty} s_{nn}$. How does this value compare to the two iterated values for the sum already computed?

There is a deep similarity between the issue of how to define a double summation and the topic of rearrangements discussed at the end of Section 2.7. Both relate to the commutativity of addition in an infinite setting. For rearrangements, the resolution came with the added hypothesis of absolute convergence, and it is not surprising that the same remedy applies for double summations. Under the assumption of absolute convergence, each of the methods discussed for computing the value of a double sum yields the same result.

Exercise 2.8.2. Show that if the iterated series
$$\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} |a_{ij}|$$
converges (meaning that for each fixed $i \in \mathbf{N}$ the series $\sum_{j=1}^{\infty} |a_{ij}|$ converges to some real number $b_i$, and the series $\sum_{i=1}^{\infty} b_i$ converges as well), then the iterated series
$$\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{ij}$$
converges.

Theorem 2.8.1. Let $\{a_{ij} : i, j \in \mathbf{N}\}$ be a doubly indexed array of real numbers. If
$$\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} |a_{ij}|$$
converges, then both $\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{ij}$ and $\sum_{j=1}^{\infty} \sum_{i=1}^{\infty} a_{ij}$ converge to the same value. Moreover,
$$\lim_{n \to \infty} s_{nn} = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{ij} = \sum_{j=1}^{\infty} \sum_{i=1}^{\infty} a_{ij},$$
where $s_{nn} = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}$.

Proof. In the same way that we defined the “rectangular partial sums” $s_{mn}$ above in equation (1), define
$$t_{mn} = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|.$$

Exercise 2.8.3. (a) Prove that the set $\{t_{mn} : m, n \in \mathbf{N}\}$ is bounded above, and use this fact to conclude that the sequence $(t_{nn})$ converges.
(b) Now, use the fact that $(t_{nn})$ is a Cauchy sequence to argue that $(s_{nn})$ is a Cauchy sequence and hence converges.

We can now set
$$S = \lim_{n \to \infty} s_{nn}.$$
In order to prove the theorem, we must show that the two iterated sums converge to this same limit. We will first show that
$$S = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{ij}.$$

Because $\{t_{mn} : m, n \in \mathbf{N}\}$ is bounded above, we can let
$$B = \sup\{t_{mn} : m, n \in \mathbf{N}\}.$$
Let $\epsilon > 0$ be arbitrary. Because $B$ is the least upper bound for this set, we know there exists a particular $t_{m_0 n_0}$ that satisfies
$$B - \frac{\epsilon}{2} < t_{m_0 n_0} \le B.$$

Exercise 2.8.4. (a) Argue that there exists an $N_1 \in \mathbf{N}$ such that $m, n \ge N_1$ implies
$$B - \frac{\epsilon}{2} < t_{mn} \le B.$$
(b) Now, show that there exists an $N$ such that
$$|s_{mn} - S| < \epsilon$$
for all $m, n \ge N$.

For the moment, consider $m \in \mathbf{N}$ to be fixed and write $s_{mn}$ as
$$s_{mn} = \sum_{j=1}^{n} a_{1j} + \sum_{j=1}^{n} a_{2j} + \cdots + \sum_{j=1}^{n} a_{mj}.$$

Our hypothesis guarantees that for each fixed row $i$, the series $\sum_{j=1}^{\infty} a_{ij}$ converges absolutely to some real number $r_i$.

Exercise 2.8.5. (a) Use the Algebraic Limit Theorem (Theorem 2.3.3) and the Order Limit Theorem (Theorem 2.3.4) to show that for all $m \ge N$
$$|(r_1 + r_2 + \cdots + r_m) - S| \le \epsilon.$$
Conclude that the iterated sum $\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{ij}$ converges to $S$.

Exercise 2.8.6. Finish the proof by showing that the other iterated sum, $\sum_{j=1}^{\infty} \sum_{i=1}^{\infty} a_{ij}$, converges to $S$ as well. Notice that the same argument can be used once it is established that, for each fixed column $j$, the sum $\sum_{i=1}^{\infty} a_{ij}$ converges to some real number $c_j$.

One final common way of computing a double summation is to sum along diagonals where $i + j$ equals a constant. Given a doubly indexed array $\{a_{ij} : i, j \in \mathbf{N}\}$, let
$$d_2 = a_{11}, \qquad d_3 = a_{12} + a_{21}, \qquad d_4 = a_{13} + a_{22} + a_{31},$$
and in general set
$$d_k = a_{1,k-1} + a_{2,k-2} + \cdots + a_{k-1,1}.$$
Then, $\sum_{k=2}^{\infty} d_k$ represents another reasonable way of summing over every $a_{ij}$ in the array.

Exercise 2.8.7. (a) Assuming the hypothesis (and hence the conclusion) of Theorem 2.8.1, show that $\sum_{k=2}^{\infty} d_k$ converges absolutely.
(b) Imitate the strategy in the proof of Theorem 2.8.1 to show that $\sum_{k=2}^{\infty} d_k$ converges to $S = \lim_{n \to \infty} s_{nn}$.
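To see the different summation schemes side by side, one can evaluate rectangular and diagonal partial sums for a specific absolutely summable array. The sketch below is only an illustration: the array $a_{ij} = (-1)^j / 2^{ij}$ is a hypothetical example chosen for this demonstration (it is not the array from Section 2.1), and the function names are assumptions.

```python
def entry(i, j):
    """Hypothetical entry a_ij = (-1)**j / 2**(i*j); the absolute values are summable."""
    return (-1) ** j * 0.5 ** (i * j)


def rectangle_sum(m, n):
    """s_mn: the sum of a_ij over 1 <= i <= m and 1 <= j <= n."""
    return sum(entry(i, j) for i in range(1, m + 1) for j in range(1, n + 1))


def diagonal_sum(n):
    """d_2 + d_3 + ... + d_n, where d_k adds the entries a_ij with i + j = k."""
    return sum(entry(i, k - i) for k in range(2, n + 1) for i in range(1, k))


if __name__ == "__main__":
    print("square     s_30,30:", rectangle_sum(30, 30))
    print("rectangle  s_25,35:", rectangle_sum(25, 35))
    print("diagonals up to 30:", diagonal_sum(30))
```

All three printed values agree to many decimal places, which is consistent with Theorem 2.8.1 and Exercise 2.8.7; of course, agreement on one example proves nothing in general.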

Products of Series

Conspicuously missing from the Algebraic Limit Theorem for Series (Theorem 2.7.1) is any statement about the product of two convergent series. One way to formally carry out the algebra on such a product is to write
$$\left( \sum_{i=1}^{\infty} a_i \right) \left( \sum_{j=1}^{\infty} b_j \right) = (a_1 + a_2 + a_3 + \cdots)(b_1 + b_2 + b_3 + \cdots)$$
$$= a_1 b_1 + (a_1 b_2 + a_2 b_1) + (a_3 b_1 + a_2 b_2 + a_1 b_3) + \cdots = \sum_{k=2}^{\infty} d_k,$$
where
$$d_k = a_1 b_{k-1} + a_2 b_{k-2} + \cdots + a_{k-1} b_1.$$


This particular form of the product, examined earlier in Exercise 2.8.7, is called the Cauchy product of two series. Although there is something algebraically natural about writing the product in this form, it may very well be that computing the value of the sum is more easily done via one or the other iterated summation. The question remains, then, as to how the value of the Cauchy product, if it exists, is related to these other values of the double sum. If the two series being multiplied converge absolutely, it is not too difficult to prove that the sum may be computed in whatever way is most convenient.

Exercise 2.8.8. Assume that $\sum_{i=1}^{\infty} a_i$ converges absolutely to $A$, and $\sum_{j=1}^{\infty} b_j$ converges absolutely to $B$.
(a) Show that the set
$$\left\{ \sum_{i=1}^{m} \sum_{j=1}^{n} |a_i b_j| : m, n \in \mathbf{N} \right\}$$
is bounded. Use this to show that the iterated sum $\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} |a_i b_j|$ converges so that we may apply Theorem 2.8.1.
(b) Let $s_{nn} = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i b_j$, and use the Algebraic Limit Theorem to show that $\lim_{n \to \infty} s_{nn} = AB$. Conclude that
$$\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_i b_j = \sum_{j=1}^{\infty} \sum_{i=1}^{\infty} a_i b_j = \sum_{k=2}^{\infty} d_k = AB,$$
where, as before, $d_k = a_1 b_{k-1} + a_2 b_{k-2} + \cdots + a_{k-1} b_1$.
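The conclusion of Exercise 2.8.8 can be illustrated with two geometric series, whose sums are known from Example 2.7.5. This is a hedged sketch rather than a proof: the function name and the choice $a_i = (1/2)^i$, $b_j = (-1/3)^j$ are assumptions made for the demonstration. Both series converge absolutely, with $A = 1$ and $B = -1/4$.

```python
def cauchy_product_partial_sum(a, b, n):
    """Return d_2 + ... + d_n with d_k = a(1)b(k-1) + a(2)b(k-2) + ... + a(k-1)b(1)."""
    total = 0.0
    for k in range(2, n + 1):
        total += sum(a(i) * b(k - i) for i in range(1, k))
    return total


if __name__ == "__main__":
    a = lambda i: 0.5 ** i           # sums to A = 1
    b = lambda j: (-1.0 / 3.0) ** j  # sums to B = -1/4
    for n in (5, 10, 20, 80):
        print(n, cauchy_product_partial_sum(a, b, n))
    print("A * B =", 1.0 * (-0.25))
```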

2.9

Epilogue

Theorems 2.7.10 and 2.8.1 make it clear that absolute convergence is an extremely desirable quality to have when manipulating series. On the other hand, the situation for conditionally convergent series is delightfully pathological. In the case of rearrangements, not only are they no longer guaranteed to converge to the same limit, but in fact if $\sum_{n=1}^{\infty} a_n$ converges conditionally, then for any $r \in \mathbf{R}$ there exists a rearrangement of $\sum_{n=1}^{\infty} a_n$ that converges to $r$. To see why, let’s look again at the alternating harmonic series
$$\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}.$$
The negative terms taken alone form the series $\sum_{n=1}^{\infty} (-1)/2n$. The partial sums of this series are precisely $-1/2$ times the partial sums of the harmonic series, and so march off (at half speed) to negative infinity. A similar argument shows that the sum of positive terms $\sum_{n=1}^{\infty} 1/(2n - 1)$ also diverges to infinity. It is not too difficult to argue that this situation is always the case for conditionally convergent series (Exercise 2.7.3).

Now, let $r$ be some proposed limit, which, for the sake of this argument, we take to be positive. The idea is to take as many positive terms as necessary to form the first partial sum greater than $r$. We then add negative terms until the partial sum falls below $r$, at which point we switch back to positive terms. The fact that there is no bound on the sums of either the positive terms or the negative terms allows this process to continue indefinitely. The fact that the terms themselves tend to zero is enough to guarantee that the partial sums, when constructed in this manner, indeed converge to $r$ as they oscillate around this target value.

Perhaps the best way to summarize the situation is to say that the hypothesis of absolute convergence essentially allows us to treat infinite sums as though they were finite sums. This assessment extends to double sums as well, although there are a few subtleties to address. In the case of products, we showed in Exercise 2.8.8 that the Cauchy product of two absolutely convergent infinite series converges to the product of the two factors, but in fact the same conclusion follows if we only have absolute convergence in one of the two original series. In the notation of Exercise 2.8.8, if $\sum a_n$ converges absolutely to $A$, and if $\sum b_n$ converges (perhaps conditionally) to $B$, then the Cauchy product $\sum d_k = AB$. On the other hand, if both $\sum a_n$ and $\sum b_n$ converge conditionally, then it is possible for the Cauchy product to diverge. Squaring $\sum (-1)^n / \sqrt{n}$ provides an example of this phenomenon. Of course, it is also possible to find $\sum a_n = A$ conditionally and $\sum b_n = B$ conditionally whose Cauchy product $\sum d_k$ converges. If this is the case, then the convergence is to the right value, namely $\sum d_k = AB$.

A proof of this last fact will be offered in Chapter 6, where we undertake the study of power series. Here is the connection. A power series has the form $a_0 + a_1 x + a_2 x^2 + \cdots$. If we multiply two power series together as though they were polynomials, then when we collect common powers of $x$ the result is
$$(a_0 + a_1 x + a_2 x^2 + \cdots)(b_0 + b_1 x + b_2 x^2 + \cdots) = a_0 b_0 + (a_0 b_1 + a_1 b_0)x + (a_0 b_2 + a_1 b_1 + a_2 b_0)x^2 + \cdots$$
$$= d_0 + d_1 x + d_2 x^2 + \cdots,$$
which is the Cauchy product of $\sum a_n x^n$ and $\sum b_n x^n$. (The index starts with $n = 0$ rather than $n = 1$.) Upcoming results about the good behavior of power series will lead to a proof that convergent Cauchy products sum to the proper value. In the other direction, Exercise 2.8.8 will be useful in establishing a theorem about the product of two power series.
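The rearrangement procedure described earlier in this epilogue (add positive terms until the partial sum exceeds the target $r$, then negative terms until it falls below, and repeat) is easy to simulate for the alternating harmonic series. The following sketch is an illustration only; the function name and the sample targets are assumptions.

```python
def rearrange_to_target(r, n_terms):
    """Greedy rearrangement of the alternating harmonic series aimed at r.

    Returns the partial sums of the rearranged series. Positive terms
    1, 1/3, 1/5, ... and negative terms -1/2, -1/4, ... are each used
    in their original relative order, and no term is used twice.
    """
    next_pos = 1   # denominator of the next unused positive term
    next_neg = 2   # denominator of the next unused negative term
    total = 0.0
    partial_sums = []
    for _ in range(n_terms):
        if total <= r:
            total += 1.0 / next_pos
            next_pos += 2
        else:
            total -= 1.0 / next_neg
            next_neg += 2
        partial_sums.append(total)
    return partial_sums


if __name__ == "__main__":
    for target in (1.5, -0.5):
        sums = rearrange_to_target(target, 100000)
        print("target", target, "-> partial sum after 100000 terms:", sums[-1])
```

Because each crossing of the target overshoots by at most the size of the term just added, and those terms shrink to zero, the printed partial sums sit close to the chosen targets.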

Chapter 3

Basic Topology of R

3.1

Discussion: The Cantor Set

What follows is a fascinating mathematical construction, due to Georg Cantor, which is extremely useful for extending the horizons of our intuition about the nature of subsets of the real line. Cantor’s name has already appeared in the first chapter in our discussion of uncountable sets. Indeed, Cantor’s proof that $\mathbf{R}$ is uncountable occupies another spot on the short list of the most significant contributions toward understanding the mathematical infinite. In the words of the mathematician David Hilbert, “No one shall expel us from the paradise that Cantor has created for us.”

Let $C_0$ be the closed interval $[0, 1]$, and define $C_1$ to be the set that results when the open middle third is removed; that is,
$$C_1 = C_0 \setminus \left( \frac{1}{3}, \frac{2}{3} \right) = \left[ 0, \frac{1}{3} \right] \cup \left[ \frac{2}{3}, 1 \right].$$
Now, construct $C_2$ in a similar way by removing the open middle third of each of the two components of $C_1$:
$$C_2 = \left( \left[ 0, \frac{1}{9} \right] \cup \left[ \frac{2}{9}, \frac{1}{3} \right] \right) \cup \left( \left[ \frac{2}{3}, \frac{7}{9} \right] \cup \left[ \frac{8}{9}, 1 \right] \right).$$
If we continue this process inductively, then for each $n = 0, 1, 2, \ldots$ we get a set $C_n$ consisting of $2^n$ closed intervals each having length $1/3^n$. Finally, we define the Cantor set $C$ (Fig. 3.1) to be the intersection
$$C = \bigcap_{n=0}^{\infty} C_n.$$
It may be useful to understand $C$ as the remainder of the interval $[0, 1]$ after the iterative process of removing open middle thirds is taken to infinity:
$$C = [0, 1] \setminus \left[ \left( \frac{1}{3}, \frac{2}{3} \right) \cup \left( \frac{1}{9}, \frac{2}{9} \right) \cup \left( \frac{7}{9}, \frac{8}{9} \right) \cup \cdots \right].$$
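The inductive construction of the sets $C_n$ translates directly into a short program. The sketch below is a hypothetical helper (the function name is an assumption) that uses exact rational arithmetic so that endpoints such as 1/9 and 7/9 are represented without rounding.

```python
from fractions import Fraction


def cantor_stage(n):
    """Return the closed intervals making up C_n as a sorted list of (left, right) pairs."""
    intervals = [(Fraction(0), Fraction(1))]  # C_0 = [0, 1]
    for _ in range(n):
        next_intervals = []
        for left, right in intervals:
            third = (right - left) / 3
            next_intervals.append((left, left + third))    # keep the left third
            next_intervals.append((right - third, right))  # keep the right third
        intervals = next_intervals
    return intervals


if __name__ == "__main__":
    for left, right in cantor_stage(2):
        print(f"[{left}, {right}]")
```

For n = 2 this prints the four intervals of $C_2$ listed above, and in general `cantor_stage(n)` contains $2^n$ intervals of length $1/3^n$.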

Figure 3.1: Defining the Cantor set; $C = \bigcap_{n=0}^{\infty} C_n$.

There is some initial doubt whether anything remains at all, but notice that because we are always removing open middle thirds, then for every $n \in \mathbf{N}$, $0 \in C_n$ and hence $0 \in C$. The same argument shows $1 \in C$. In fact, if $y$ is the endpoint of some closed interval of some particular set $C_n$, then it is also an endpoint of one of the intervals of $C_{n+1}$. Because, at each stage, endpoints are never removed, it follows that $y \in C_n$ for all $n$. Thus, $C$ at least contains the endpoints of all of the intervals that make up each of the sets $C_n$.

Is there anything else? Is $C$ countable? Does $C$ contain any intervals? Any irrational numbers? These are difficult questions at the moment. All of the endpoints mentioned earlier are rational numbers (they have the form $m/3^n$), which means that if it is true that $C$ consists of only these endpoints, then $C$ would be a subset of $\mathbf{Q}$ and hence countable. We shall see about this. There is some strong evidence that not much is left in $C$ if we consider the total length of the intervals removed. To form $C_1$, an open interval of length 1/3 was taken out. In the second step, we removed two intervals of length 1/9, and to construct $C_n$ we removed $2^{n-1}$ middle thirds of length $1/3^n$. There is some logic, then, to defining the “length” of $C$ to be 1 minus the total
$$\frac{1}{3} + 2\left(\frac{1}{9}\right) + 4\left(\frac{1}{27}\right) + \cdots + 2^{n-1}\left(\frac{1}{3^n}\right) + \cdots = \frac{\frac{1}{3}}{1 - \frac{2}{3}} = 1.$$
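The geometric series for the total removed length can be tabulated exactly. In the sketch below (the function name is an assumption made for illustration), the length removed in the first $n$ stages works out to $1 - (2/3)^n$, which approaches 1.

```python
from fractions import Fraction


def removed_length(n):
    """Total length removed from [0, 1] in the first n stages: sum of 2**(k-1) / 3**k."""
    return sum(Fraction(2 ** (k - 1), 3 ** k) for k in range(1, n + 1))


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 50):
        print(n, removed_length(n), float(removed_length(n)))
```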

The Cantor set has zero length. To this point, the information we have collected suggests a mental picture of $C$ as a relatively small, thin set. For these reasons, the set $C$ is often referred to as Cantor “dust.” But there are some strong counterarguments that imply a very different picture. First, $C$ is actually uncountable, with cardinality equal to the cardinality of $\mathbf{R}$. One slightly intuitive but convincing way to see this is to create a 1–1 correspondence between $C$ and sequences of the form $(a_n)_{n=1}^{\infty}$, where $a_n = 0$ or 1. For each $c \in C$, set $a_1 = 0$ if $c$ falls in the left-hand component of $C_1$ and set $a_1 = 1$ if $c$ falls in the right-hand component. Having established where in $C_1$ the point $c$ is located, there are now two possible components of $C_2$ that might contain $c$. This time, we set $a_2 = 0$ or 1 depending on whether $c$ falls in the left or right half of these two components of $C_2$. Continuing in this

Figure 3.2: Magnifying sets by a factor of 3.

way, we come to see that every element c ∈ C yields a sequence (a1 , a2 , a3 , . . . ) of zeros and ones that acts as a set of directions for how to locate c within C. Likewise, every such sequence corresponds to a point in the Cantor set. Because the set of sequences of zeros and ones is uncountable (Exercise 1.5.4), we must conclude that C is uncountable as well. What does this imply? In the ﬁrst place, because the endpoints of the approximating sets Cn form a countable set, we are forced to accept the fact that not only are there other points in C but there are uncountably many of them. From the point of view of cardinality, C is quite large—as large as R, in fact. This should be contrasted with the fact that from the point of view of length, C measures the same size as a single point. We conclude this discussion with a demonstration that from the point of view of dimension, C strangely falls somewhere in between. There is a sensible agreement that a point has dimension zero, a line segment has dimension one, a square has dimension two, and a cube has dimension three. Without attempting a formal deﬁnition of dimension (of which there are several), we can nevertheless get a sense of how one might be deﬁned by observing how the dimension aﬀects the result of magnifying each particular set by a factor of 3 (Fig. 3.2). (The reason for the choice of 3 will become clear when we turn our attention back to the Cantor set). A single point undergoes no change at all, whereas a line segment triples in length. For the square, magnifying each length by a factor of 3 results in a larger square that contains 9 copies of the original square. Finally, the magniﬁed cube yields a cube that contains 27 copies of the original cube within its volume. Notice that, in each case, to compute the “size” of the new set, the dimension appears as the exponent of the magniﬁcation factor. Now, apply this transformation to the Cantor set. The set C0 = [0, 1] becomes the interval [0, 3]. 
Deleting the middle third leaves $[0, 1] \cup [2, 3]$, which is where we started in the original construction except that we now stand to produce an additional copy of $C$ in the interval $[2, 3]$. Magnifying the Cantor set by a factor of 3 yields two copies of the original set. Thus, if $x$ is the dimension of $C$, then $x$ should satisfy $2 = 3^x$, or $x = \ln 2 / \ln 3 \approx 0.631$ (Fig. 3.3).
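The relationship "number of copies = (magnification factor) raised to the dimension" can be solved for the dimension with a logarithm. The following is a minimal sketch of that computation; the function name is an assumption, and this informal scaling argument is not a formal definition of dimension.

```python
import math


def similarity_dimension(copies, factor):
    """Solve copies = factor**x for x."""
    return math.log(copies) / math.log(factor)


if __name__ == "__main__":
    for name, copies in (("point", 1), ("segment", 3), ("square", 9),
                         ("cube", 27), ("Cantor set", 2)):
        print(name, similarity_dimension(copies, 3))
```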

            dim     ×3     new copies
point        0      →      1 = 3^0
segment      1      →      3 = 3^1
square       2      →      9 = 3^2
cube         3      →      27 = 3^3
C            x      →      2 = 3^x

Figure 3.3: Dimension of C; 2 = 3^x ⇒ x = ln 2/ ln 3.

The notion of a noninteger or fractional dimension is the impetus behind the term “fractal,” coined in 1975 by Benoit Mandelbrot to describe a class of sets whose intricate structures have much in common with the Cantor set. Cantor’s construction, however, is over a hundred years old and for us represents an invaluable testing ground for the upcoming theorems and conjectures about the often elusive nature of subsets of the real line.

3.2

Open and Closed Sets

Given $a \in \mathbf{R}$ and $\epsilon > 0$, recall that the $\epsilon$-neighborhood of $a$ is the set
$$V_\epsilon(a) = \{x \in \mathbf{R} : |x - a| < \epsilon\}.$$
In other words, $V_\epsilon(a)$ is the open interval $(a - \epsilon, a + \epsilon)$, centered at $a$ with radius $\epsilon$.

Definition 3.2.1. A set $O \subseteq \mathbf{R}$ is open if for all points $a \in O$ there exists an $\epsilon$-neighborhood $V_\epsilon(a) \subseteq O$.

Example 3.2.2. (i) Perhaps the simplest example of an open set is $\mathbf{R}$ itself. Given an arbitrary element $a \in \mathbf{R}$, we are free to pick any $\epsilon$-neighborhood we like and it will always be true that $V_\epsilon(a) \subseteq \mathbf{R}$. It is also the case that the logical structure of Definition 3.2.1 requires us to classify the empty set $\emptyset$ as an open subset of the real line.

(ii) For a more useful collection of examples, consider the open interval
$$(c, d) = \{x \in \mathbf{R} : c < x < d\}.$$
To see that $(c, d)$ is open in the sense just defined, let $x \in (c, d)$ be arbitrary. If we take $\epsilon = \min\{x - c, d - x\}$, then it follows that $V_\epsilon(x) \subseteq (c, d)$. It is important to see where this argument breaks down if the interval includes either one of its endpoints.

The union of open intervals is another example of an open set. This observation leads to the next result.
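The choice $\epsilon = \min\{x - c, d - x\}$ in Example 3.2.2 (ii) can be tried out on concrete numbers. The sketch below uses hypothetical helper names and exact fractions to avoid rounding at the boundary; it also shows where the recipe fails, since at an endpoint the formula would give $\epsilon = 0$, which is not a valid radius.

```python
from fractions import Fraction


def neighborhood_radius(x, c, d):
    """Return epsilon = min(x - c, d - x) for a point x of the open interval (c, d)."""
    if not (c < x < d):
        raise ValueError("x must lie strictly between c and d")
    return min(x - c, d - x)


def neighborhood_inside(x, eps, c, d):
    """Check that V_eps(x) = (x - eps, x + eps) is a subset of (c, d)."""
    return c <= x - eps and x + eps <= d


if __name__ == "__main__":
    c, d = Fraction(3, 10), Fraction(1)
    x = Fraction(7, 10)
    eps = neighborhood_radius(x, c, d)
    print("epsilon =", eps, "contained:", neighborhood_inside(x, eps, c, d))
```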


Theorem 3.2.3. (i) The union of an arbitrary collection of open sets is open.
(ii) The intersection of a finite collection of open sets is open.

Proof. To prove (i), we let $\{O_\lambda : \lambda \in \Lambda\}$ be a collection of open sets and let $O = \bigcup_{\lambda \in \Lambda} O_\lambda$. Let $a$ be an arbitrary element of $O$. In order to show that $O$ is open, Definition 3.2.1 insists that we produce an $\epsilon$-neighborhood of $a$ completely contained in $O$. But $a \in O$ implies that $a$ is an element of at least one particular $O_\lambda$. Because we are assuming $O_\lambda$ is open, we can use Definition 3.2.1 to assert that there exists $V_\epsilon(a) \subseteq O_\lambda$. The fact that $O_\lambda \subseteq O$ allows us to conclude that $V_\epsilon(a) \subseteq O$. This completes the proof of (i).

For (ii), let $\{O_1, O_2, \ldots, O_N\}$ be a finite collection of open sets. Now, if $a \in \bigcap_{k=1}^{N} O_k$, then $a$ is an element of each of the open sets. By the definition of an open set, we know that, for each $1 \le k \le N$, there exists $V_{\epsilon_k}(a) \subseteq O_k$. We are in search of a single $\epsilon$-neighborhood of $a$ that is contained in every $O_k$, so the trick is to take the smallest one. Letting $\epsilon = \min\{\epsilon_1, \epsilon_2, \ldots, \epsilon_N\}$, it follows that $V_\epsilon(a) \subseteq V_{\epsilon_k}(a)$ for all $k$, and hence $V_\epsilon(a) \subseteq \bigcap_{k=1}^{N} O_k$, as desired.

Closed Sets

Definition 3.2.4. A point $x$ is a limit point of a set $A$ if every $\epsilon$-neighborhood $V_\epsilon(x)$ of $x$ intersects the set $A$ in some point other than $x$.

Limit points are also often referred to as “cluster points” or “accumulation points,” but the phrase “$x$ is a limit point of $A$” has the advantage of explicitly reminding us that $x$ is quite literally the limit of a sequence in $A$.

Theorem 3.2.5. A point $x$ is a limit point of a set $A$ if and only if $x = \lim a_n$ for some sequence $(a_n)$ contained in $A$ satisfying $a_n \ne x$ for all $n \in \mathbf{N}$.

Proof. ($\Rightarrow$) Assume $x$ is a limit point of $A$. In order to produce a sequence $(a_n)$ converging to $x$, we are going to consider the particular $\epsilon$-neighborhoods obtained using $\epsilon = 1/n$. By Definition 3.2.4, every neighborhood of $x$ intersects $A$ in some point other than $x$. This means that, for each $n \in \mathbf{N}$, we are justified in picking a point
$$a_n \in V_{1/n}(x) \cap A$$
with the stipulation that $a_n \ne x$. It should not be too difficult to see why $(a_n) \to x$. Given an arbitrary $\epsilon > 0$, choose $N$ such that $1/N < \epsilon$. It follows that $|a_n - x| < \epsilon$ for all $n \ge N$.

The reverse implication is requested as Exercise 3.2.4.

The restriction that $a_n \ne x$ in Theorem 3.2.5 deserves a comment. Given a point $a \in A$, it is always the case that $a$ is the limit of a sequence in $A$ if we are allowed to consider the constant sequence $(a, a, a, \ldots)$. There will be occasions where we will want to avoid this somewhat uninteresting situation, so it is important to have a vocabulary that can distinguish limit points of a set from isolated points.


Definition 3.2.6. A point $a \in A$ is an isolated point of $A$ if it is not a limit point of $A$.

As a word of caution, we need to be a little careful about how we understand the relationship between these concepts. Whereas an isolated point is always an element of the relevant set $A$, it is quite possible for a limit point of $A$ not to belong to $A$. As an example, consider the endpoint of an open interval. This situation is the subject of the next important definition.

Definition 3.2.7. A set $F \subseteq \mathbf{R}$ is closed if it contains its limit points.

The adjective “closed” appears in several other mathematical contexts and is usually employed to mean that an operation on the elements of a given set does not take us out of the set. In linear algebra, for example, a vector space is a set that is “closed” under addition and scalar multiplication. In analysis, the operation we are concerned with is the limiting operation. Topologically speaking, a closed set is one where convergent sequences within the set have limits that are also in the set.

Theorem 3.2.8. A set $F \subseteq \mathbf{R}$ is closed if and only if every Cauchy sequence contained in $F$ has a limit that is also an element of $F$.

Proof. Exercise 3.2.6.

Example 3.2.9. (i) Consider
$$A = \left\{ \frac{1}{n} : n \in \mathbf{N} \right\}.$$
Let’s show that each point of $A$ is isolated. Given $1/n \in A$, choose $\epsilon = 1/n - 1/(n + 1)$. Then,
$$V_\epsilon(1/n) \cap A = \left\{ \frac{1}{n} \right\}.$$
It follows from Definition 3.2.4 that $1/n$ is not a limit point and so is isolated. Although all of the points of $A$ are isolated, the set does have one limit point, namely 0. This is because every neighborhood centered at zero, no matter how small, is going to contain points of $A$. Because $0 \notin A$, $A$ is not closed. The set $F = A \cup \{0\}$ is an example of a closed set and is called the closure of $A$. (The closure of a set is discussed in a moment.)

(ii) Let’s prove that a closed interval
$$[c, d] = \{x \in \mathbf{R} : c \le x \le d\}$$
is a closed set using Definition 3.2.7. If $x$ is a limit point of $[c, d]$, then by Theorem 3.2.5 there exists $(x_n) \subseteq [c, d]$ with $(x_n) \to x$. We need to prove that $x \in [c, d]$. The key to this argument is contained in the Order Limit Theorem (Theorem 2.3.4), which summarizes the relationship between inequalities and the limiting

process. Because c ≤ x_n ≤ d, it follows from Theorem 2.3.4 (iii) that c ≤ x ≤ d as well. Thus, [c, d] is closed.

(iii) Consider the set Q ⊆ R of rational numbers. An extremely important property of Q is that its set of limit points is actually all of R. To see why this is so, recall Theorem 1.4.3 from Chapter 1, which is referred to as the density property of Q in R. Let y ∈ R be arbitrary, and consider any neighborhood V_ε(y) = (y − ε, y + ε). Theorem 1.4.3 allows us to conclude that there exists a rational number r ≠ y that falls in this neighborhood. Thus, y is a limit point of Q.

The density property of Q can now be reformulated in the following way.

Theorem 3.2.10 (Density of Q in R). Given any y ∈ R, there exists a sequence of rational numbers that converges to y.

Proof. Combine the preceding discussion with Theorem 3.2.5.

The same argument can also be used to show that every real number is the limit of a sequence of irrational numbers. Although this is interesting, part of the allure of the rational numbers is that, in addition to being dense in R, they are countable. As we will see, this tangible aspect of Q makes it an extremely useful set, both for proving theorems and for producing interesting counterexamples.
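
For readers who like to experiment, Theorem 3.2.10 can be illustrated numerically. The sketch below is only one possible choice of sequence (decimal truncations), not the construction used in the proof, and the function name `rational_approximations` is our own. A floating-point value stands in for the real number y, which is an assumption of the illustration.

```python
from fractions import Fraction
import math

def rational_approximations(y, terms):
    """Return the rationals r_k = floor(10^k * y) / 10^k for k = 1, ..., terms.

    Each r_k satisfies 0 <= y - r_k < 10^(-k), so (r_k) converges to y.
    The float y is converted to an exact Fraction first, so no rounding
    occurs inside the computation (the float itself is only a stand-in
    for a real number).
    """
    exact = Fraction(y)
    return [Fraction(math.floor(exact * 10**k), 10**k) for k in range(1, terms + 1)]

y = math.sqrt(2)
for k, r in enumerate(rational_approximations(y, 6), start=1):
    # Print the index, the rational r_k, and the size of the error y - r_k.
    print(k, r, float(Fraction(y) - r))
```

Any other scheme that places a rational within 1/k of y for each k would serve equally well.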

Closure

Definition 3.2.11. Given a set A ⊆ R, let L be the set of all limit points of A. The closure of A is defined to be $\overline{A} = A \cup L$.

In Example 3.2.9 (i), we saw that if A = {1/n : n ∈ N}, then the closure of A is $\overline{A} = A \cup \{0\}$. Example 3.2.9 (iii) verifies that $\overline{\mathbf{Q}} = \mathbf{R}$. If A is an open interval (a, b), then $\overline{A} = [a, b]$. If A is a closed interval, then $\overline{A} = A$. It is not for lack of imagination that in each of these examples $\overline{A}$ is always a closed set.

Theorem 3.2.12. For any A ⊆ R, the closure $\overline{A}$ is a closed set and is the smallest closed set containing A.

Proof. If L is the set of limit points of A, then it is immediately clear that $\overline{A}$ contains the limit points of A. There is still something more to prove, however, because taking the union of L with A could potentially produce some new limit points of $\overline{A}$. In Exercise 3.2.8, we outline the argument that this does not happen. Now, any closed set containing A must contain L as well. This shows that $\overline{A} = A \cup L$ is the smallest closed set containing A.
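
The claims about A = {1/n : n ∈ N} from Example 3.2.9 (i) can be checked on a finite truncation of the set. This is a sketch under the assumption that inspecting 1/1, ..., 1/N for a modest N is representative; the helper `neighborhood_points` and the cutoff `N` are names introduced here for illustration only.

```python
from fractions import Fraction

N = 200  # truncation level (an assumption of this illustration)
A = [Fraction(1, n) for n in range(1, N + 1)]

def neighborhood_points(center, eps, points):
    """Return the members of `points` lying in the open interval (center - eps, center + eps)."""
    return [p for p in points if abs(p - center) < eps]

# Each 1/n is isolated: with eps = 1/n - 1/(n+1), the neighborhood meets A only at 1/n.
for n in range(1, 50):
    eps = Fraction(1, n) - Fraction(1, n + 1)
    assert neighborhood_points(Fraction(1, n), eps, A) == [Fraction(1, n)]

# 0 is a limit point: every neighborhood of 0 that we try contains points of A
# (all of which differ from 0).
for k in (1, 10, 100):
    assert neighborhood_points(Fraction(0), Fraction(1, k), A)
```

The first loop mirrors the choice of ε in the example; the second reflects why 0 belongs to the closure even though 0 ∉ A.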

Complements

The mathematical notions of open and closed are not antonyms the way they are in standard English. If a set is not open, that does not imply it must be closed. Many sets such as the half-open interval (c, d] = {x ∈ R : c < x ≤ d} are neither

open nor closed. The sets R and ∅ are both simultaneously open and closed although, thankfully, these are the only ones with this unsightly property. There is, however, an important relationship between open and closed sets. Recall that the complement of a set A ⊆ R is defined to be the set A^c = {x ∈ R : x ∉ A}.

Theorem 3.2.13. A set O is open if and only if O^c is closed. Likewise, a set F is closed if and only if F^c is open.

Proof. Given an open set O ⊆ R, let’s first prove that O^c is a closed set. To prove O^c is closed, we need to show that it contains all of its limit points. If x is a limit point of O^c, then every neighborhood of x contains some point of O^c. But that is enough to conclude that x cannot be in the open set O because x ∈ O would imply that there exists a neighborhood V_ε(x) ⊆ O. Thus, x ∈ O^c, as desired.

For the converse statement, we assume O^c is closed and argue that O is open. Thus, given an arbitrary point x ∈ O, we must produce an ε-neighborhood V_ε(x) ⊆ O. Because O^c is closed, we can be sure that x is not a limit point of O^c. Looking at the definition of limit point, we see that this implies that there must be some neighborhood V_ε(x) of x that does not intersect the set O^c. But this means V_ε(x) ⊆ O, which is precisely what we needed to show.

The second statement in Theorem 3.2.13 follows quickly from the first using the observation that (E^c)^c = E for any set E ⊆ R.

The last theorem of this section should be compared to Theorem 3.2.3.

Theorem 3.2.14. (i) The union of a finite collection of closed sets is closed. (ii) The intersection of an arbitrary collection of closed sets is closed.

Proof. De Morgan’s Laws state that for any collection of sets {E_λ : λ ∈ Λ} it is true that

$$\Bigl(\bigcup_{\lambda \in \Lambda} E_\lambda\Bigr)^c = \bigcap_{\lambda \in \Lambda} E_\lambda^c \quad \text{and} \quad \Bigl(\bigcap_{\lambda \in \Lambda} E_\lambda\Bigr)^c = \bigcup_{\lambda \in \Lambda} E_\lambda^c.$$

The result follows directly from these statements and Theorem 3.2.3. The details are requested in Exercise 3.2.10.
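
De Morgan’s Laws are easy to sanity-check on small finite sets before attempting the general proof. In the sketch below, the finite `universe` is an assumed stand-in for R, and the three-member `family` is an arbitrary example; neither comes from the text.

```python
# A finite universe standing in for R (an assumption made for illustration).
universe = set(range(10))
family = [{0, 1, 2}, {2, 3, 4}, {2, 4, 6, 8}]

def complement(s):
    """Complement of s relative to the finite universe."""
    return universe - s

union = set().union(*family)
intersection = set(universe).intersection(*family)

# (union of the E)^c equals the intersection of the E^c
assert complement(union) == set(universe).intersection(*(complement(E) for E in family))
# (intersection of the E)^c equals the union of the E^c
assert complement(intersection) == set().union(*(complement(E) for E in family))
```

A check like this proves nothing about infinite collections, of course, but it makes the two identities concrete.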

Exercises

Exercise 3.2.1. (a) Where in the proof of Theorem 3.2.3 part (ii) does the assumption that the collection of open sets be finite get used? (b) Give an example of an infinite collection of nested open sets O_1 ⊇ O_2 ⊇ O_3 ⊇ O_4 ⊇ · · · whose intersection $\bigcap_{n=1}^{\infty} O_n$ is closed and nonempty.

Exercise 3.2.2. Let

$$B = \left\{ \frac{(-1)^n n}{n+1} : n = 1, 2, 3, \ldots \right\}.$$

(a) Find the limit points of B. (b) Is B a closed set? (c) Is B an open set? (d) Does B contain any isolated points? (e) Find $\overline{B}$.

Exercise 3.2.3. Decide whether the following sets are open, closed, or neither. If a set is not open, find a point in the set for which there is no ε-neighborhood contained in the set. If a set is not closed, find a limit point that is not contained in the set. (a) Q. (b) N. (c) {x ∈ R : x > 0}. (d) (0, 1] = {x ∈ R : 0 < x ≤ 1}. (e) {1 + 1/4 + 1/9 + · · · + 1/n^2 : n ∈ N}.

Exercise 3.2.4. Prove the converse of Theorem 3.2.5 by showing that if x = lim a_n for some sequence (a_n) contained in A satisfying a_n ≠ x, then x is a limit point of A.

Exercise 3.2.5. Let a ∈ A. Prove that a is an isolated point of A if and only if there exists an ε-neighborhood V_ε(a) such that V_ε(a) ∩ A = {a}.

Exercise 3.2.6. Prove Theorem 3.2.8.

Exercise 3.2.7. Let x ∈ O, where O is an open set. If (x_n) is a sequence converging to x, prove that all but a finite number of the terms of (x_n) must be contained in O.

Exercise 3.2.8. Given A ⊆ R, let L be the set of all limit points of A. (a) Show that the set L is closed. (b) Argue that if x is a limit point of A ∪ L, then x is a limit point of A. Use this observation to furnish a proof for Theorem 3.2.12.

Exercise 3.2.9. (a) If y is a limit point of A ∪ B, show that y is either a limit point of A or a limit point of B (or both). (b) Prove that $\overline{A \cup B} = \overline{A} \cup \overline{B}$. (c) Does the result about closures in (b) extend to infinite unions of sets?

Exercise 3.2.10 (De Morgan’s Laws). A proof for De Morgan’s Laws in the case of two sets is outlined in Exercise 1.2.3. The general argument is similar. (a) Given a collection of sets {E_λ : λ ∈ Λ}, show that

$$\Bigl(\bigcup_{\lambda \in \Lambda} E_\lambda\Bigr)^c = \bigcap_{\lambda \in \Lambda} E_\lambda^c \quad \text{and} \quad \Bigl(\bigcap_{\lambda \in \Lambda} E_\lambda\Bigr)^c = \bigcup_{\lambda \in \Lambda} E_\lambda^c.$$

(b) Now, provide the details for the proof of Theorem 3.2.14.

Exercise 3.2.11. Let A be bounded above so that s = sup A exists. Show that $s \in \overline{A}$.

Exercise 3.2.12. Decide whether the following statements are true or false. Provide counterexamples for those that are false, and supply proofs for those that are true. (a) For any set A ⊆ R, $\overline{A}^c$ is open. (b) If a set A has an isolated point, it cannot be an open set. (c) A set A is closed if and only if $\overline{A} = A$. (d) If A is a bounded set, then s = sup A is a limit point of A. (e) Every finite set is closed. (f) An open set that contains every rational number must necessarily be all of R.

Exercise 3.2.13. Prove that the only sets that are both open and closed are R and the empty set ∅.

Exercise 3.2.14. A set A is called an F_σ set if it can be written as the countable union of closed sets. A set B is called a G_δ set if it can be written as the countable intersection of open sets. (a) Show that a closed interval [a, b] is a G_δ set. (b) Show that the half-open interval (a, b] is both a G_δ and an F_σ set. (c) Show that Q is an F_σ set, and the set of irrationals I forms a G_δ set. (We will see in Section 3.5 that Q is not a G_δ set, nor is I an F_σ set.)

3.3 Compact Sets

Definition 3.3.1. A set K ⊆ R is compact if every sequence in K has a subsequence that converges to a limit that is also in K.

Example 3.3.2. The most basic example of a compact set is a closed interval. To see this, notice that if (a_n) is contained in an interval [c, d], then the Bolzano–Weierstrass Theorem guarantees that we can find a convergent subsequence $(a_{n_k})$. Because a closed interval is a closed set (Example 3.2.9 (ii)), we know that the limit of this subsequence is also in [c, d].

What are the properties of closed intervals that we used in the preceding argument? The Bolzano–Weierstrass Theorem requires boundedness, and we used the fact that closed sets contain their limit points. As we are about to see, these two properties completely characterize compact sets in R. The term “bounded” has thus far only been used to describe sequences (Definition 2.3.1), but an analogous statement can easily be made about sets.

Definition 3.3.3. A set A ⊆ R is bounded if there exists M > 0 such that |a| ≤ M for all a ∈ A.

Theorem 3.3.4 (Heine–Borel Theorem). A set K ⊆ R is compact if and only if it is closed and bounded.

Proof. Let K be compact. We will first prove that K must be bounded, so assume, for contradiction, that K is not a bounded set. The idea is to produce a sequence in K that marches off to infinity in such a way that it cannot have a convergent subsequence as the definition of compact requires. To do this, notice that because K is not bounded there must exist an element x_1 ∈ K satisfying |x_1| > 1. Likewise, there must exist x_2 ∈ K with |x_2| > 2, and in general, given any n ∈ N, we can produce x_n ∈ K such that |x_n| > n.

Now, because K is assumed to be compact, (x_n) should have a convergent subsequence $(x_{n_k})$. But the elements of the subsequence must necessarily satisfy $|x_{n_k}| > n_k$, and consequently $(x_{n_k})$ is unbounded. Because convergent sequences are bounded (Theorem 2.3.2), we have a contradiction. Thus, K must at least be a bounded set.

Next, we will show that K is also closed. To see that K contains its limit points, we let x = lim x_n, where (x_n) is contained in K, and argue that x must be in K as well. By Definition 3.3.1, the sequence (x_n) has a convergent subsequence $(x_{n_k})$, and by Theorem 2.5.2, we know $(x_{n_k})$ converges to the same limit x. Finally, Definition 3.3.1 requires that x ∈ K. This proves that K is closed.

The proof of the converse statement is requested in Exercise 3.3.2.

There may be a temptation to consider closed intervals as being a kind of standard archetype for compact sets, but this is misleading. The structure of compact sets can be much more intricate and interesting. For instance, one implication of the Heine–Borel Theorem (Theorem 3.3.4) is that the Cantor set is compact (Exercise 3.3.3). It is more useful to think of compact sets as generalizations of closed intervals. Whenever a fact involving closed intervals is true, it is often the case that the same result holds when we replace “closed interval” with “compact set.” As an example, let’s experiment with the Nested Interval Property proved in the first chapter.

Theorem 3.3.5. If K_1 ⊇ K_2 ⊇ K_3 ⊇ K_4 ⊇ · · · is a nested sequence of nonempty compact sets, then the intersection $\bigcap_{n=1}^{\infty} K_n$ is not empty.

Proof. In order to take advantage of the compactness of each K_n, we are going to produce a sequence that is eventually in each of these sets. Thus, for each n ∈ N, pick a point x_n ∈ K_n. Because the compact sets are nested, it follows that the sequence (x_n) is contained in K_1. By Definition 3.3.1, (x_n) has a convergent subsequence $(x_{n_k})$ whose limit $x = \lim x_{n_k}$ is an element of K_1.

In fact, x is an element of every K_n for essentially the same reason. Given a particular n_0 ∈ N, the terms in the sequence (x_n) are contained in $K_{n_0}$ as long as n ≥ n_0. Ignoring the finite number of terms for which n_k < n_0, the same subsequence $(x_{n_k})$ is then also contained in $K_{n_0}$. The conclusion is that $x = \lim x_{n_k}$ is an element of $K_{n_0}$. Because n_0 was arbitrary, $x \in \bigcap_{n=1}^{\infty} K_n$ and the proof is complete.
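
To get a feel for why compactness matters in Theorem 3.3.5, it may help to compare the nested compact sets K_n = [0, 1/n] with the nested open intervals O_n = (0, 1/n), which are not compact. The following sketch is our own illustration; the function names and the search bound `max_n` are assumptions, and a finite search can only suggest, not prove, what happens for all n.

```python
from fractions import Fraction

def in_K(x, n):
    """Membership in the compact set K_n = [0, 1/n]."""
    return 0 <= x <= Fraction(1, n)

def in_O(x, n):
    """Membership in the open interval O_n = (0, 1/n), which is not compact."""
    return 0 < x < Fraction(1, n)

def first_failure(x, member, max_n=10_000):
    """Smallest n <= max_n with x outside the n-th set, or None if x stays inside."""
    for n in range(1, max_n + 1):
        if not member(x, n):
            return n
    return None

print(first_failure(Fraction(0), in_K))       # None: 0 lies in every K_n checked
print(first_failure(Fraction(1, 250), in_O))  # 250: 1/250 is not in (0, 1/250)
print(first_failure(Fraction(0), in_O))       # 1: 0 is never in (0, 1/n)
```

The point 0 = lim 1/n survives in every K_n, whereas every candidate for the intersection of the O_n is eventually excluded.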

Open Covers

Defining compactness for sets in R is reminiscent of the situation we encountered with completeness in that there are a number of equivalent ways to describe this phenomenon. We demonstrated the equivalence of two such characterizations in Theorem 3.3.4. What this theorem implies is that we could have decided to define compact sets to be sets that are closed and bounded, and then proved that sequences contained in compact sets have convergent subsequences with limits in the set. There are some larger issues involved in deciding what the definition should be, but what is important at this moment is that we be versatile enough to use whatever description of compactness is most appropriate for a given situation.

To add to the delight, there is a third useful characterization of compactness, equivalent to the two others, which is described in terms of open covers and finite subcovers.

Definition 3.3.6. Let A ⊆ R. An open cover for A is a (possibly infinite) collection of open sets {O_λ : λ ∈ Λ} whose union contains the set A; that is, $A \subseteq \bigcup_{\lambda \in \Lambda} O_\lambda$. Given an open cover for A, a finite subcover is a finite subcollection of open sets from the original open cover whose union still manages to completely contain A.

Example 3.3.7. Consider the open interval (0, 1). For each point x ∈ (0, 1), let O_x be the open interval (x/2, 1). Taken together, the infinite collection {O_x : x ∈ (0, 1)} forms an open cover for the open interval (0, 1). Notice, however, that it is impossible to find a finite subcover. Given any proposed finite subcollection $\{O_{x_1}, O_{x_2}, \ldots, O_{x_n}\}$, set x′ = min{x_1, x_2, . . . , x_n} and observe that any real number y satisfying 0 < y ≤ x′/2 is not contained in the union $\bigcup_{i=1}^{n} O_{x_i}$.

[Figure: the interval (0, 1) with the points x_2/2, x_2, x_1/2, x_1 marked and the sets O_{x_1} = (x_1/2, 1) and O_{x_2} = (x_2/2, 1) indicated.]
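
The failure of every finite subcollection in Example 3.3.7 can be made concrete. The sketch below takes a proposed finite list of points x_1, ..., x_n and returns a point of (0, 1) that the corresponding sets O_x = (x/2, 1) miss; the function names are hypothetical and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def uncovered_point(xs):
    """Given x_1, ..., x_n in (0, 1), return y in (0, 1) missed by every O_x = (x/2, 1).

    Following the example, take x' = min(xs); any y with 0 < y <= x'/2 works,
    and we simply return y = x'/2.
    """
    x_min = min(xs)
    return x_min / 2

def covered(y, xs):
    """True if y lies in at least one of the sets O_x = (x/2, 1) for x in xs."""
    return any(x / 2 < y < 1 for x in xs)

xs = [Fraction(1, 2), Fraction(1, 10), Fraction(1, 1000)]
y = uncovered_point(xs)
assert 0 < y < 1
assert not covered(y, xs)
```

However small the points in the list are chosen, the returned y lies in (0, 1) and escapes the proposed subcover.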

Now, consider a similar cover for the closed interval [0, 1]. For x ∈ (0, 1), the sets O_x = (x/2, 1) do a fine job covering (0, 1), but in order to have an open cover of the closed interval [0, 1], we must also cover the endpoints. To remedy this, we could fix ε > 0, and let O_0 = (−ε, ε) and O_1 = (1 − ε, 1 + ε). Then, the collection {O_0, O_1, O_x : x ∈ (0, 1)}

is an open cover for [0, 1]. But this time, notice there is a finite subcover. Because of the addition of the set O_0, we can choose x′ so that x′/2 < ε. It follows that {O_0, O_{x′}, O_1} is a finite subcover for the closed interval [0, 1].

Theorem 3.3.8. Let K be a subset of R. All of the following statements are equivalent in the sense that any one of them implies the two others: (i) K is compact. (ii) K is closed and bounded. (iii) Any open cover for K has a finite subcover.

Proof. The equivalence of (i) and (ii) is the content of Theorem 3.3.4. What remains is to show that (iii) is equivalent to (i) and (ii). Let’s first assume (iii), and prove that it implies (ii) (and thus (i) as well).

To show that K is bounded, we construct an open cover for K by defining O_x to be an open interval of radius 1 around each point x ∈ K. In the language of neighborhoods, O_x = V_1(x). The open cover {O_x : x ∈ K} then must have a finite subcover $\{O_{x_1}, O_{x_2}, \ldots, O_{x_n}\}$. Because K is contained in a finite union of bounded sets, K must itself be bounded.

The proof that K is closed is more delicate, and we argue it by contradiction. Let (y_n) be a Cauchy sequence contained in K with lim y_n = y. To show that K is closed, we must demonstrate that y ∈ K, so assume for contradiction that this is not the case. If y ∉ K, then every x ∈ K is some positive distance away from y. We now construct an open cover by taking O_x to be an interval of radius |x − y|/2 around each point x in K. Because we are assuming (iii), the resulting open cover {O_x : x ∈ K} must have a finite subcover $\{O_{x_1}, O_{x_2}, \ldots, O_{x_n}\}$. The contradiction arises when we realize that, in the spirit of Example 3.3.7, this finite subcover cannot contain all of the elements of the sequence (y_n). To make this explicit, set

$$\varepsilon_0 = \min\left\{ \frac{|x_i - y|}{2} : 1 \le i \le n \right\}.$$

Because (y_n) → y, we can certainly find a term y_N satisfying |y_N − y| < ε_0. But such a y_N must necessarily be excluded from each $O_{x_i}$, meaning that

$$y_N \notin \bigcup_{i=1}^{n} O_{x_i}.$$

Thus our supposed subcover does not actually cover all of K. This contradiction implies that y ∈ K, and hence K is closed and bounded.

The proof that (ii) implies (iii) is outlined in Exercise 3.3.8.

Exercises

Exercise 3.3.1. Show that if K is compact, then sup K and inf K both exist and are elements of K.

Exercise 3.3.2. Prove the converse of Theorem 3.3.4 by showing that if a set K ⊆ R is closed and bounded, then it is compact.

Exercise 3.3.3. Show that the Cantor set defined in Section 3.1 is a compact set.

Exercise 3.3.4. Show that if K is compact and F is closed, then K ∩ F is compact.

Exercise 3.3.5. Decide which of the following sets are compact. For those that are not compact, show how Definition 3.3.1 breaks down. In other words, give an example of a sequence contained in the given set that does not possess a subsequence converging to a limit in the set. (a) Q. (b) Q ∩ [0, 1]. (c) R. (d) R ∩ [0, 1]. (e) {1, 1/2, 1/3, 1/4, 1/5, . . . }. (f) {1, 1/2, 2/3, 3/4, 4/5, . . . }.

Exercise 3.3.6. As some more evidence of the surprising nature of the Cantor set, follow these steps to show that the sum C + C = {x + y : x, y ∈ C} is equal to the closed interval [0, 2]. (Keep in mind that C has zero length and contains no intervals.) The observation that {x + y : x, y ∈ C} ⊆ [0, 2] passes for obvious, so we only need to prove the reverse inclusion [0, 2] ⊆ {x + y : x, y ∈ C}. Thus, given s ∈ [0, 2], we must find two elements x, y ∈ C satisfying x + y = s. (a) Show that there exist x_1, y_1 ∈ C_1 for which x_1 + y_1 = s. Show in general that, for an arbitrary n ∈ N, we can always find x_n, y_n ∈ C_n for which x_n + y_n = s. (b) Keeping in mind that the sequences (x_n) and (y_n) do not necessarily converge, show how they can nevertheless be used to produce the desired x and y in C satisfying x + y = s.

Exercise 3.3.7. Decide whether the following propositions are true or false. If the claim is valid, supply a short proof, and if the claim is false, provide a counterexample. (a) An arbitrary intersection of compact sets is compact. (b) Let A ⊆ R be arbitrary, and let K ⊆ R be compact. Then, the intersection A ∩ K is compact. (c) If F_1 ⊇ F_2 ⊇ F_3 ⊇ F_4 ⊇ · · · is a nested sequence of nonempty closed sets, then the intersection $\bigcap_{n=1}^{\infty} F_n \neq \emptyset$. (d) A finite set is always compact. (e) A countable set is always compact.

Exercise 3.3.8. Follow these steps to prove the final implication in Theorem 3.3.8.

Assume K satisfies (i) and (ii), and let {O_λ : λ ∈ Λ} be an open cover for K. For contradiction, let’s assume that no finite subcover exists. Let I_0 be a closed interval containing K, and bisect I_0 into two closed intervals A_1 and B_1. (a) Why must either A_1 ∩ K or B_1 ∩ K (or both) have no finite subcover consisting of sets from {O_λ : λ ∈ Λ}? (b) Show that there exists a nested sequence of closed intervals I_0 ⊇ I_1 ⊇ I_2 ⊇ · · · with the property that, for each n, I_n ∩ K cannot be finitely covered and lim |I_n| = 0. (c) Show that there exists an x ∈ K such that x ∈ I_n for all n. (d) Because x ∈ K, there must exist an open set $O_{\lambda_0}$ from the original collection that contains x as an element. Argue that there must be an n_0 large enough to guarantee that $I_{n_0} \subseteq O_{\lambda_0}$. Explain why this furnishes us with the desired contradiction.

Exercise 3.3.9. Consider each of the sets listed in Exercise 3.3.5. For each one that is not compact, find an open cover for which there is no finite subcover.

Exercise 3.3.10. Let’s call a set clompact if it has the property that every closed cover (i.e., a cover consisting of closed sets) admits a finite subcover. Describe all of the clompact subsets of R.

3.4 Perfect Sets and Connected Sets

One of the underlying goals of topology is to strip away all of the extraneous information that comes with our intuitive picture of the real numbers and isolate just those properties that are responsible for the phenomenon we are studying. For example, we were quick to observe that any closed interval is a compact set. The content of Theorem 3.3.4, however, is that the compactness of a closed interval has nothing to do with the fact that the set is an interval but is a consequence of the set being bounded and closed. In Chapter 1, we argued that the set of real numbers between 0 and 1 is an uncountable set. This turns out to be the case for any nonempty closed set that does not contain isolated points.

Perfect Sets

Definition 3.4.1. A set P ⊆ R is perfect if it is closed and contains no isolated points.

Closed intervals (other than the singleton sets [a, a]) serve as the most obvious class of examples of perfect sets, but in fact it is not too difficult to prove that the Cantor set from Section 3.1 is another example.

Theorem 3.4.2. The Cantor set is perfect.

Proof. The Cantor set is defined as the intersection $C = \bigcap_{n=0}^{\infty} C_n$, where each C_n is a finite union of closed intervals. By Theorem 3.2.14, each C_n is closed, and by the same theorem, C is closed as well. It remains to show that no point in C is isolated.

Let x ∈ C be arbitrary. To convince ourselves that x is not isolated, we must construct a sequence (x_n) of points in C, different from x, that converges to x. From our earlier discussion, we know that C at least contains the endpoints of the intervals that make up each C_n. In Exercise 3.4.3, we sketch the argument that these are all that is needed to construct (x_n).

One argument for the uncountability of the Cantor set was presented in Section 3.1. Another, perhaps more satisfying, argument for the same conclusion can be obtained from the next theorem.

Theorem 3.4.3. A nonempty perfect set is uncountable.

Proof. If P is perfect and nonempty, then it must be infinite because otherwise it would consist only of isolated points. Let’s assume, for contradiction, that P is countable. Thus, we can write P = {x_1, x_2, x_3, . . . }, where every element of P appears on this list. The idea is to construct a sequence of nested compact sets K_n, all contained in P, with the property that x_1 ∉ K_2, x_2 ∉ K_3, x_3 ∉ K_4, . . . . Some care must be taken to ensure that each K_n is nonempty, for then we can use Theorem 3.3.5 to produce an

$$x \in \bigcap_{n=1}^{\infty} K_n \subseteq P$$

that cannot be on the list {x_1, x_2, x_3, . . . }.

Let I_1 be a closed interval that contains x_1 in its interior (i.e., x_1 is not an endpoint of I_1). Now, x_1 is not isolated, so there exists some other point y_2 ∈ P that is also in the interior of I_1. Construct a closed interval I_2, centered on y_2, so that I_2 ⊆ I_1 but x_1 ∉ I_2. More explicitly, if I_1 = [a, b], let ε = min{y_2 − a, b − y_2, |x_1 − y_2|}. Then, the interval I_2 = [y_2 − ε/2, y_2 + ε/2] has the desired properties.

[Figure: the interval I_1 containing the point x_1, with the smaller closed interval I_2 centered on y_2 and excluding x_1.]
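
The explicit choice of ε in this step is simple enough to compute directly. The following sketch, with a function name of our own choosing, builds I_2 from I_1 = [a, b], x_1, and y_2 using exact rationals; it assumes, as in the proof, that y_2 lies in the interior of I_1 and that y_2 ≠ x_1.

```python
from fractions import Fraction

def shrink_interval(a, b, x1, y2):
    """Return I_2 = [y2 - eps/2, y2 + eps/2] with eps = min{y2 - a, b - y2, |x1 - y2|}.

    Assumes a < y2 < b and y2 != x1, so that eps > 0. The resulting interval
    is centered on y2, sits inside [a, b], and excludes x1.
    """
    eps = min(y2 - a, b - y2, abs(x1 - y2))
    return (y2 - eps / 2, y2 + eps / 2)

a, b = Fraction(0), Fraction(1)
x1, y2 = Fraction(1, 2), Fraction(2, 3)
left, right = shrink_interval(a, b, x1, y2)

assert a <= left and right <= b        # I_2 is contained in I_1
assert not (left <= x1 <= right)       # x_1 is not in I_2
assert left < y2 < right               # y_2 is interior to I_2
```

Halving ε is what guarantees that x_1 is strictly outside I_2 rather than on its boundary.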

This process can be continued. Because y_2 ∈ P is not isolated, there must exist another point y_3 ∈ P in the interior of I_2, and we may insist that y_3 ≠ x_2. Now, construct I_3 centered on y_3 and small enough so that x_2 ∉ I_3 and I_3 ⊆ I_2. Observe that I_3 ∩ P ≠ ∅ because this intersection contains at least y_3. If we carry out this construction inductively, the result is a sequence of closed intervals I_n satisfying

(i) I_{n+1} ⊆ I_n, (ii) x_n ∉ I_{n+1}, and (iii) I_n ∩ P ≠ ∅.

To finish the proof, we let K_n = I_n ∩ P. For each n ∈ N, we have that K_n is closed because it is the intersection of closed sets, and bounded because it is contained in the bounded set I_n. Hence, K_n is compact. By construction, K_n is not empty and K_{n+1} ⊆ K_n. Thus, we can employ Theorem 3.3.5 to conclude that the intersection

$$\bigcap_{n=1}^{\infty} K_n \neq \emptyset.$$

But each K_n is a subset of P, every element of P is some x_n, and the fact that x_n ∉ I_{n+1} means that x_n ∉ K_{n+1}. This leads to the conclusion that $\bigcap_{n=1}^{\infty} K_n = \emptyset$, which is the sought-after contradiction.
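
Returning briefly to Theorem 3.4.2, the idea that endpoints of the intervals of C_n approximate any point of the Cantor set can be explored computationally. The sketch below is only an illustration of the strategy outlined in Exercise 3.4.3, not a proof; the function names are ours, and the test point x = 1/4 is used on the assumption (standard, but not verified by the code beyond the stages computed) that it belongs to C.

```python
from fractions import Fraction

def cantor_stage(n):
    """Return the closed intervals (left, right) whose union is C_n."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        next_intervals = []
        for a, b in intervals:
            third = (b - a) / 3
            next_intervals.append((a, a + third))   # keep the left third
            next_intervals.append((b - third, b))   # keep the right third
        intervals = next_intervals
    return intervals

def nearby_endpoint(x, n):
    """Return an endpoint of C_n, different from x, within 3**(-n) of x.

    Requires x to lie in C_n. The interval of C_n containing x has length
    3**(-n), so either of its endpoints is close enough; we pick one that
    is not x itself.
    """
    for a, b in cantor_stage(n):
        if a <= x <= b:
            return b if x == a else a
    raise ValueError("x is not in C_n")

x = Fraction(1, 4)
for n in range(1, 6):
    e = nearby_endpoint(x, n)
    print(n, e, abs(x - e) <= Fraction(1, 3**n))
```

Since the endpoints of each C_n belong to C, the points produced this way form a sequence in C, different from x, converging to x.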

Connected Sets

Although the two open intervals (1, 2) and (2, 5) have the limit point x = 2 in common, there is still some space between them in the sense that no limit point of one of these intervals is actually contained in the other. Said another way, the closure of (1, 2) (see Definition 3.2.11) is disjoint from (2, 5), and the closure of (2, 5) does not intersect (1, 2). Notice that this same observation cannot be made about (1, 2] and (2, 5), even though these latter sets are disjoint.

Definition 3.4.4. Two nonempty sets A, B ⊆ R are separated if $\overline{A} \cap B$ and $A \cap \overline{B}$ are both empty. A set E ⊆ R is disconnected if it can be written as E = A ∪ B, where A and B are nonempty separated sets. A set that is not disconnected is called a connected set.

Example 3.4.5. (i) If we let A = (1, 2) and B = (2, 5), then it is not difficult to verify that E = (1, 2) ∪ (2, 5) is disconnected. Notice that the sets C = (1, 2] and D = (2, 5) are not separated because $C \cap \overline{D} = \{2\}$ is not empty. This should be comforting. The union C ∪ D is equal to the interval (1, 5), which had better not qualify as a disconnected set. We will prove in a moment that every interval is a connected subset of R and vice versa.

(ii) Let’s show that the set of rational numbers is disconnected. If we let

A = Q ∩ (−∞, √2) and B = Q ∩ (√2, ∞),

then we certainly have Q = A ∪ B. The fact that A ⊆ (−∞, √2) implies (by the Order Limit Theorem) that any limit point of A will necessarily fall in (−∞, √2]. Because this is disjoint from B, we get $\overline{A} \cap B = \emptyset$. We can similarly show that $A \cap \overline{B} = \emptyset$, which implies that A and B are separated.

The definition of connected is stated as the negation of disconnected, but a little care with the logical negation of the quantifiers in Definition 3.4.4 results in a positive characterization of connectedness. Essentially, a set E is connected

if, no matter how it is partitioned into two nonempty disjoint sets, it is always possible to show that at least one of the sets contains a limit point of the other.

Theorem 3.4.6. A set E ⊆ R is connected if and only if, for all nonempty disjoint sets A and B satisfying E = A ∪ B, there always exists a convergent sequence (x_n) → x with (x_n) contained in one of A or B, and x an element of the other.

Proof. Exercise 3.4.6.

The concept of connectedness is more relevant when working with subsets of the plane and other higher-dimensional spaces. This is because, in R, the connected sets coincide precisely with the collection of intervals (with the understanding that unbounded intervals such as (−∞, 3) and [0, ∞) are included).

Theorem 3.4.7. A set E ⊆ R is connected if and only if whenever a < c < b with a, b ∈ E, it follows that c ∈ E as well.

Proof. Assume E is connected, and let a, b ∈ E and a < c < b. Set

A = (−∞, c) ∩ E and B = (c, ∞) ∩ E.

Because a ∈ A and b ∈ B, neither set is empty and, just as in Example 3.4.5 (ii), neither set contains a limit point of the other. If E = A ∪ B, then we would have that E is disconnected, which it is not. It must then be that A ∪ B is missing some element of E, and c is the only possibility. Thus, c ∈ E.

Conversely, assume that E is an interval in the sense that whenever a, b ∈ E satisfy a < c < b for some c, then c ∈ E. Our intent is to use the characterization of connected sets in Theorem 3.4.6, so let E = A ∪ B, where A and B are nonempty and disjoint. We need to show that one of these sets contains a limit point of the other. Pick a_0 ∈ A and b_0 ∈ B, and, for the sake of the argument, assume a_0 < b_0. Because E is itself an interval, the interval I_0 = [a_0, b_0] is contained in E. Now, bisect I_0 into two equal halves. The midpoint of I_0 must either be in A or B, and so choose I_1 = [a_1, b_1] to be the half that allows us to have a_1 ∈ A and b_1 ∈ B. Continuing this process yields a sequence of nested intervals I_n = [a_n, b_n], where a_n ∈ A, b_n ∈ B, and the length (b_n − a_n) → 0. The remainder of this argument should feel familiar. By the Nested Interval Property, there exists an

$$x \in \bigcap_{n=0}^{\infty} I_n,$$

and it is straightforward to show that the sequences of endpoints each satisfy lim a_n = x and lim b_n = x. But now x ∈ E must belong to either A or B, thus making it a limit point of the other. This completes the argument.
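
The bisection step in the second half of this proof translates directly into a short procedure. In the sketch below, the partition A = [0, 1/3), B = [1/3, 1] of E = [0, 1] is an example chosen for illustration, and the names `bisect_partition` and `in_A` are hypothetical.

```python
from fractions import Fraction

def bisect_partition(in_A, a0, b0, steps):
    """Bisect [a0, b0] repeatedly, keeping a_n in A and b_n in B.

    `in_A(x)` decides membership in A; every other point of the interval
    is taken to lie in B. Requires in_A(a0) to be true and in_A(b0) false.
    """
    a, b = a0, b0
    for _ in range(steps):
        mid = (a + b) / 2
        if in_A(mid):
            a = mid   # the midpoint is in A: keep the right half [mid, b]
        else:
            b = mid   # the midpoint is in B: keep the left half [a, mid]
    return a, b

def in_A(x):
    """Example partition of E = [0, 1]: A = [0, 1/3), B = [1/3, 1]."""
    return x < Fraction(1, 3)

a, b = bisect_partition(in_A, Fraction(0), Fraction(1), 30)
assert in_A(a) and not in_A(b)       # a_n stays in A, b_n stays in B
assert b - a == Fraction(1, 2**30)   # the lengths b_n - a_n shrink to 0
assert a < Fraction(1, 3) <= b       # the common limit here is x = 1/3
```

In this example the common limit x = 1/3 belongs to B and is the limit of the points a_n ∈ A, exactly the situation described in Theorem 3.4.6.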

Exercises

Exercise 3.4.1. If P is a perfect set and K is compact, is the intersection P ∩ K always compact? Always perfect?

Exercise 3.4.2. Does there exist a perfect set consisting of only rational numbers?

Exercise 3.4.3. Review the portion of the proof given for Theorem 3.4.2 and follow these steps to complete the argument. (a) Because x ∈ C_1, argue that there exists an x_1 ∈ C ∩ C_1 with x_1 ≠ x satisfying |x − x_1| ≤ 1/3. (b) Finish the proof by showing that for each n ∈ N, there exists x_n ∈ C ∩ C_n, different from x, satisfying |x − x_n| ≤ 1/3^n.

Exercise 3.4.4. Repeat the Cantor construction from Section 3.1 starting with the closed interval [0, 1]. This time, however, remove the open middle fourth from each component. (a) Is the resulting set compact? Perfect? (b) Using the algorithms from Section 3.1, compute the length and dimension of this Cantor-like set.

Exercise 3.4.5. Let A and B be subsets of R. Show that if there exist disjoint open sets U and V with A ⊆ U and B ⊆ V, then A and B are separated.

Exercise 3.4.6. Prove Theorem 3.4.6.

Exercise 3.4.7. (a) Find an example of a disconnected set whose closure is connected. (b) If A is connected, is $\overline{A}$ necessarily connected? If A is perfect, is $\overline{A}$ necessarily perfect?

Exercise 3.4.8. A set E is totally disconnected if, given any two points x, y ∈ E, there exist separated sets A and B with x ∈ A, y ∈ B, and E = A ∪ B. (a) Show that Q is totally disconnected. (b) Is the set of irrational numbers totally disconnected?

Exercise 3.4.9. Follow these steps to show that the Cantor set $C = \bigcap_{n=0}^{\infty} C_n$ described in Section 3.1 is totally disconnected in the sense described in Exercise 3.4.8. (a) Given x, y ∈ C, with x < y, set ε = y − x. For each n = 0, 1, 2, . . . , the set C_n consists of a finite number of closed intervals. Explain why there must exist an N large enough so that it is impossible for x and y both to belong to the same closed interval of C_N. (b) Argue that there exists a point z ∉ C such that x < z < y. Explain how this proves that there can be no interval of the form (a, b) with a < b contained in C. (c) Show that C is totally disconnected.

Exercise 3.4.10. Let {r_1, r_2, r_3, . . . } be an enumeration of the rational numbers, and for each n ∈ N set ε_n = 1/2^n. Define $O = \bigcup_{n=1}^{\infty} V_{\varepsilon_n}(r_n)$, and let F = O^c. (a) Argue that F is a closed, nonempty set consisting only of irrational numbers.

(b) Does F contain any nonempty open intervals? Is F totally disconnected? (See Exercise 3.4.8 for the deﬁnition.) (c) Is it possible to know whether F is perfect? If not, can we modify this construction to produce a nonempty perfect set of irrational numbers?

3.5 Baire’s Theorem

The nature of the real line can be deceptively elusive. The closer we look, the more intricate and enigmatic R becomes, and the more we are reminded to proceed carefully (i.e., axiomatically) with all of our conclusions about properties of subsets of R. The structure of open sets is fairly straightforward. Every open set is either a finite or countable union of open intervals. Standing in opposition to this tidy description of all open sets is the Cantor set. The Cantor set is a closed, uncountable set that contains no intervals of any kind. Thus, no such characterization of closed sets should be anticipated.

Recall that the arbitrary union of open sets is always an open set. Likewise, the arbitrary intersection of closed sets is closed. By taking unions of closed sets or intersections of open sets, however, it is possible to obtain a fairly broad selection of subsets of R. In Exercise 3.2.14, we introduced the following two classes of sets.

Definition 3.5.1. A set A ⊆ R is called an F_σ set if it can be written as the countable union of closed sets. A set B ⊆ R is called a G_δ set if it can be written as the countable intersection of open sets.

Exercise 3.5.1. Argue that a set A is a G_δ set if and only if its complement is an F_σ set.

Exercise 3.5.2. Replace each blank (____) with the word finite or countable, depending on which is more appropriate. (a) The ____ union of F_σ sets is an F_σ set. (b) The ____ intersection of F_σ sets is an F_σ set. (c) The ____ union of G_δ sets is a G_δ set. (d) The ____ intersection of G_δ sets is a G_δ set.

Exercise 3.5.3. (This exercise has already appeared as Exercise 3.2.14.) (a) Show that a closed interval [a, b] is a G_δ set. (b) Show that the half-open interval (a, b] is both a G_δ and an F_σ set. (c) Show that Q is an F_σ set, and the set of irrationals I forms a G_δ set.

It is not readily obvious that the class F_σ does not include every subset of R, but we are now ready to argue that I is not an F_σ set (and consequently Q is not a G_δ set). This will follow from a theorem due to René Louis Baire (1874–1932).

Recall that a set G ⊆ R is dense in R if, given any two real numbers a < b, it is possible to find a point x ∈ G with a < x < b.

Theorem 3.5.2. If $\{G_1, G_2, G_3, \ldots\}$ is a countable collection of dense, open sets, then the intersection $\bigcap_{n=1}^{\infty} G_n$ is not empty.

Proof. Before embarking on the proof, notice that we have seen a conclusion like this before. Theorem 3.3.5 asserts that a nested sequence of compact sets has a nonempty intersection. In this theorem, we are dealing with dense, open sets, but as it turns out, we are going to use Theorem 3.3.5 (and actually, just the Nested Interval Property) as the crucial step in the argument.

Exercise 3.5.4. (a) Starting with $n = 1$, inductively construct a nested sequence of closed intervals $I_1 \supseteq I_2 \supseteq I_3 \supseteq \cdots$ satisfying $I_n \subseteq G_n$. Give special attention to the issue of the endpoints of each $I_n$. (b) Now, use Theorem 3.3.5 or the Nested Interval Property to finish the proof.

Exercise 3.5.5. Show that it is impossible to write
$$\mathbf{R} = \bigcup_{n=1}^{\infty} F_n,$$
where for each $n \in \mathbf{N}$, $F_n$ is a closed set containing no nonempty open intervals.

Exercise 3.5.6. Show how the previous exercise implies that the set $\mathbf{I}$ of irrationals cannot be an $F_\sigma$ set, and $\mathbf{Q}$ cannot be a $G_\delta$ set.

Exercise 3.5.7. Using Exercise 3.5.6 and versions of the statements in Exercise 3.5.2, construct a set that is neither in $F_\sigma$ nor in $G_\delta$.

Nowhere-Dense Sets

We have encountered several equivalent ways to assert that a particular set $G$ is dense in $\mathbf{R}$. In Section 3.2, we observed that $G$ is dense in $\mathbf{R}$ if and only if every point of $\mathbf{R}$ is a limit point of $G$. Because the closure of any set is obtained by taking the union of the set and its limit points, we have that $G$ is dense in $\mathbf{R}$ if and only if $\overline{G} = \mathbf{R}$. The set $\mathbf{Q}$ is dense in $\mathbf{R}$; the set $\mathbf{Z}$ is clearly not. In fact, in the jargon of analysis, $\mathbf{Z}$ is “nowhere-dense” in $\mathbf{R}$.

Definition 3.5.3. A set $E$ is nowhere-dense if its closure $\overline{E}$ contains no nonempty open intervals.

Exercise 3.5.8. Show that a set $E$ is nowhere-dense in $\mathbf{R}$ if and only if the complement of $\overline{E}$ is dense in $\mathbf{R}$.

Exercise 3.5.9. Decide whether the following sets are dense in $\mathbf{R}$, nowhere-dense in $\mathbf{R}$, or somewhere in between. (a) $A = \mathbf{Q} \cap [0, 5]$. (b) $B = \{1/n : n \in \mathbf{N}\}$. (c) The set of irrationals. (d) The Cantor set.

We can now restate Theorem 3.5.2 in a slightly more general form.

Theorem 3.5.4 (Baire’s Theorem). The set of real numbers $\mathbf{R}$ cannot be written as the countable union of nowhere-dense sets.

Proof. For contradiction, assume that $E_1, E_2, E_3, \ldots$ are each nowhere-dense and satisfy $\mathbf{R} = \bigcup_{n=1}^{\infty} E_n$.

Exercise 3.5.10. Finish the proof by finding a contradiction to the results in this section.

3.6

Epilogue

Baire’s Theorem is yet another statement about the size of R. We have already encountered several ways to describe the sizes of inﬁnite sets. In terms of cardinality, countable sets are relatively small whereas uncountable sets are large. We also brieﬂy discussed the concept of “length,” or “measure,” in Section 3.1. Baire’s Theorem oﬀers a third perspective. From this point of view, nowhere-dense sets are considered to be “thin” sets. Any set that is the countable union—i.e., a not very large union—of these small sets is called a “meager” set or a set of “ﬁrst category.” A set that is not of ﬁrst category is of “second category.” Intuitively, sets of the second category are the “fat” subsets. The Baire Category Theorem, as it is often called, states that R is of second category. There is a signiﬁcance to the Baire Category Theorem that is diﬃcult to appreciate at the moment because we are only seeing a special case of this result. The real numbers are an example of a complete metric space. Metric spaces are discussed in some detail in Section 8.2, but here is the basic idea. Given a set of mathematical objects such as real numbers, points in the plane or continuous functions deﬁned on [0,1], a “metric” is a rule that assigns a “distance” between two elements in the set. In R, we have been using |x−y| as the distance between the real numbers x and y. The point is that if we can create a satisfactory notion of “distance” on these other spaces (we will need the triangle inequality to hold, for instance), then the concepts of convergence, Cauchy sequences, and open sets, for example, can be naturally transferred over. A complete metric space is any set with a suitably deﬁned metric in which Cauchy sequences have limits. We have spent a good deal of time discussing the fact that R is a complete metric space whereas Q is not.

The Baire Category Theorem in its more general form states that any complete metric space must be too large to be the countable union of nowhere-dense subsets. One particularly interesting example of a complete metric space is the set of continuous functions deﬁned on the interval [0, 1]. (The distance between two functions f and g in this space is deﬁned to be sup |f (x) − g(x)|, where x ∈ [0, 1].) Now, in this space we will see that the collection of continuous functions that are diﬀerentiable at even one point can be written as the countable union of nowhere-dense sets. Thus, a fascinating consequence of Baire’s Theorem in this setting is that most continuous functions do not have derivatives at any point. Chapter 5 concludes with a construction of one such function. This odd situation mirrors the roles of Q and I as subsets of R. Just as the familiar rational numbers constitute a minute proportion of the real line, the diﬀerentiable functions of calculus are exceedingly atypical of continuous functions in general.

Chapter 4

Functional Limits and Continuity

4.1

Discussion: Examples of Dirichlet and Thomae

Although it is common practice in calculus courses to discuss continuity before differentiation, historically mathematicians’ attention to the concept of continuity came long after the derivative was in wide use. Pierre de Fermat (1601–1665) was using tangent lines to solve optimization problems as early as 1629. On the other hand, it was not until around 1820 that Cauchy, Bolzano, Weierstrass, and others began to characterize continuity in terms more rigorous than prevailing intuitive notions such as “unbroken curves” or “functions which have no jumps or gaps.” The basic reason for this two-hundred year waiting period lies in the fact that, for most of this time, the very notion of function did not really permit discontinuities. Functions were entities such as polynomials, sines, and cosines, always smooth and continuous over their relevant domains. The gradual liberation of the term function to its modern understanding (a rule associating a unique output to a given input) was simultaneous with 19th century investigations into the behavior of infinite series. Extensions of the power of calculus were intimately tied to the ability to represent a function $f(x)$ as a limit of polynomials (called a power series) or as a limit of sums of sines and cosines (called a trigonometric or Fourier series). A typical question for Cauchy and his contemporaries was whether the continuity of the limiting polynomials or trigonometric functions necessarily implied that the limit $f$ would also be continuous.

Sequences and series of functions are the topics of Chapter 6. What is relevant at this moment is that we realize why the issue of finding a rigorous definition for continuity finally made its way to the fore. Any significant progress on the question of whether the limit of continuous functions is continuous (for Cauchy and for us) necessarily depends on a definition of continuity that does not rely on imprecise notions such as “no holes” or “gaps.” With a mathematically unambiguous definition for the limit of a sequence in hand, we are well on our way toward a rigorous understanding of continuity.

Figure 4.1: Dirichlet’s Function, g(x).

Given a function $f$ with domain $A \subseteq \mathbf{R}$, we want to define continuity at a point $c \in A$ to mean that if $x \in A$ is chosen near $c$, then $f(x)$ will be near $f(c)$. Symbolically, we will say $f$ is continuous at $c$ if
$$\lim_{x \to c} f(x) = f(c).$$

The problem is that, at present, we only have a definition for the limit of a sequence, and it is not entirely clear what is meant by $\lim_{x \to c} f(x)$. The subtleties that arise as we try to fashion such a definition are well-illustrated via a family of examples, all based on an idea of the prominent German mathematician, Peter Lejeune Dirichlet. Dirichlet’s idea was to define a function $g$ in a piecewise manner based on whether the input variable $x$ is rational or irrational. Specifically, let
$$g(x) = \begin{cases} 1 & \text{if } x \in \mathbf{Q} \\ 0 & \text{if } x \notin \mathbf{Q}. \end{cases}$$
The intricate way that $\mathbf{Q}$ and $\mathbf{I}$ fit inside of $\mathbf{R}$ makes an accurate graph of $g$ technically impossible to draw, but Figure 4.1 illustrates the basic idea.

Does it make sense to attach a value to the expression $\lim_{x \to 1/2} g(x)$? One idea is to consider a sequence $(x_n) \to 1/2$. Using our notion of the limit of a sequence, we might try to define $\lim_{x \to 1/2} g(x)$ as simply the limit of the sequence $g(x_n)$. But notice that this limit depends on how the sequence $(x_n)$ is chosen. If each $x_n$ is rational, then
$$\lim_{n \to \infty} g(x_n) = 1.$$
On the other hand, if $x_n$ is irrational for each $n$, then
$$\lim_{n \to \infty} g(x_n) = 0.$$

Figure 4.2: Modified Dirichlet Function, h(x).

This unacceptable situation demands that we work harder on our definition of functional limits. Generally speaking, we want the value of $\lim_{x \to c} g(x)$ to be independent of how we approach $c$. In this particular case, the definition of a functional limit that we agree on should lead to the conclusion that
$$\lim_{x \to 1/2} g(x) \quad \text{does not exist.}$$

Postponing the search for formal definitions for the moment, we should nonetheless realize that Dirichlet’s function is not continuous at $c = 1/2$. In fact, the real significance of this function is that there is nothing unique about the point $c = 1/2$. Because both $\mathbf{Q}$ and $\mathbf{I}$ (the set of irrationals) are dense in the real line, it follows that for any $z \in \mathbf{R}$ we can find sequences $(x_n) \subseteq \mathbf{Q}$ and $(y_n) \subseteq \mathbf{I}$ such that
$$\lim x_n = \lim y_n = z.$$
(See Example 3.2.9 (iii).) Because
$$\lim g(x_n) \neq \lim g(y_n),$$
the same line of reasoning reveals that $g(x)$ is not continuous at $z$. In the jargon of analysis, Dirichlet’s function is a nowhere-continuous function on $\mathbf{R}$.

What happens if we adjust the definition of $g(x)$ in the following way? Define a new function $h$ (Fig. 4.2) on $\mathbf{R}$ by setting
$$h(x) = \begin{cases} x & \text{if } x \in \mathbf{Q} \\ 0 & \text{if } x \notin \mathbf{Q}. \end{cases}$$
If we take $c$ different from zero, then just as before we can construct sequences $(x_n) \to c$ of rationals and $(y_n) \to c$ of irrationals so that
$$\lim h(x_n) = c \quad \text{and} \quad \lim h(y_n) = 0.$$
Thus, $h$ is not continuous at every point $c \neq 0$.

Figure 4.3: Thomae’s Function, t(x).

If $c = 0$, however, then these two limits are both equal to $h(0) = 0$. In fact, it appears as though no matter how we construct a sequence $(z_n)$ converging to zero, it will always be the case that $\lim h(z_n) = 0$. This observation goes to the heart of what we want functional limits to entail. To assert that
$$\lim_{x \to c} h(x) = L$$
should imply that
$$h(z_n) \to L \quad \text{for all sequences} \quad (z_n) \to c.$$

For reasons not yet apparent, it is beneficial to fashion the definition for functional limits in terms of neighborhoods constructed around $c$ and $L$. We will quickly see, however, that this topological formulation is equivalent to the sequential characterization we have arrived at here.

To this point, we have been discussing continuity of a function at a particular point in its domain. This is a significant departure from thinking of continuous functions as curves that can be drawn without lifting the pen from the paper, and it leads to some fascinating questions. In 1875, K.J. Thomae discovered the function
$$t(x) = \begin{cases} 1 & \text{if } x = 0 \\ 1/n & \text{if } x = m/n \in \mathbf{Q} \setminus \{0\} \text{ is in lowest terms with } n > 0 \\ 0 & \text{if } x \notin \mathbf{Q}. \end{cases}$$
If $c \in \mathbf{Q}$, then $t(c) > 0$. Because the set of irrationals is dense in $\mathbf{R}$, we can find a sequence $(y_n)$ in $\mathbf{I}$ converging to $c$. The result is that
$$\lim t(y_n) = 0 \neq t(c),$$
and Thomae’s function (Fig. 4.3) fails to be continuous at any rational point.

The twist comes when we try this argument on some irrational point in the domain such as $c = \sqrt{2}$. All irrational values get mapped to zero by $t$, so the natural thing would be to consider a sequence $(x_n)$ of rational numbers that converges to $\sqrt{2}$. Now, $\sqrt{2} \approx 1.414213\ldots$, so a good start on a particular sequence of rational approximations for $\sqrt{2}$ might be
$$\left(1, \frac{14}{10}, \frac{141}{100}, \frac{1414}{1000}, \frac{14142}{10000}, \frac{141421}{100000}, \ldots\right).$$
But notice that the denominators of these fractions are getting larger. In this case, the sequence $t(x_n)$ begins
$$\left(1, \frac{1}{5}, \frac{1}{100}, \frac{1}{500}, \frac{1}{5000}, \frac{1}{100000}, \ldots\right)$$
and is fast approaching $0 = t(\sqrt{2})$. We will see that this always happens. The closer a rational number is chosen to a fixed irrational number, the larger its denominator must necessarily be. As a consequence, Thomae’s function has the bizarre property of being continuous at every irrational point on $\mathbf{R}$ and discontinuous at every rational point.

Is there an example of a function with the opposite property? In other words, does there exist a function defined on all of $\mathbf{R}$ that is continuous on $\mathbf{Q}$ but fails to be continuous on $\mathbf{I}$? Can the set of discontinuities of a particular function be arbitrary? If we are given some set $A \subseteq \mathbf{R}$, is it always possible to find a function that is continuous only on the set $A^c$? In each of the examples in this section, the functions were defined to have erratic oscillations around points in the domain. What conclusions can we draw if we restrict our attention to functions that are somewhat less volatile? One such class is the set of so-called monotone functions, which are either increasing or decreasing on a given domain. What might we be able to say about the set of discontinuities of a monotone function on $\mathbf{R}$?
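The values of Thomae's function along the decimal truncations of $\sqrt{2}$, computed by hand earlier in this section, can be reproduced with exact rational arithmetic. The following is a minimal sketch, not part of the text's argument: the helper name `thomae` and the digit string are illustrative assumptions, and the helper only handles rational inputs, since a computer cannot test an arbitrary real number for irrationality.

```python
from fractions import Fraction


def thomae(x: Fraction) -> Fraction:
    """Thomae's function restricted to rational inputs (hypothetical helper).

    Fraction always stores m/n in lowest terms with n > 0, so t(m/n) = 1/n.
    The special case t(0) = 1 is covered because Fraction(0) has denominator 1.
    """
    return Fraction(1, x.denominator)


# Decimal truncations of sqrt(2): 1, 14/10, 141/100, ...
digits = "1414213"
approximations = [Fraction(int(digits[: k + 1]), 10 ** k) for k in range(6)]

# Values of t along the sequence; the denominators grow as x_n nears sqrt(2).
values = [thomae(x) for x in approximations]

for x, t in zip(approximations, values):
    print(f"x = {x}, t(x) = {t}")
```

Reducing each fraction to lowest terms is what makes the denominators visible: 14/10 becomes 7/5, so its value under $t$ is 1/5 rather than 1/10.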

4.2

Functional Limits

Consider a function $f : A \to \mathbf{R}$. Recall that a limit point $c$ of $A$ is a point with the property that every $\epsilon$-neighborhood $V_\epsilon(c)$ intersects $A$ in some point other than $c$. Equivalently, $c$ is a limit point of $A$ if and only if $c = \lim x_n$ for some sequence $(x_n) \subseteq A$ with $x_n \neq c$. It is important to remember that limit points of $A$ do not necessarily belong to the set $A$ unless $A$ is closed. If $c$ is a limit point of the domain of $f$, then, intuitively, the statement
$$\lim_{x \to c} f(x) = L$$
is intended to convey that values of $f(x)$ get arbitrarily close to $L$ as $x$ is chosen closer and closer to $c$. The issue of what happens when $x = c$ is irrelevant from the point of view of functional limits. In fact, $c$ need not even be in the domain of $f$.

The structure of the definition of functional limits follows the “challenge–response” pattern established in the definition for the limit of a sequence. Recall that given a sequence $(a_n)$, the assertion $\lim a_n = L$ implies that for every $\epsilon$-neighborhood $V_\epsilon(L)$ centered at $L$, there is a point in the sequence, call it $a_N$, after which all of the terms $a_n$ fall in $V_\epsilon(L)$. Each $\epsilon$-neighborhood represents a particular challenge, and each $N$ is the respective response. For functional limit statements such as $\lim_{x \to c} f(x) = L$, the challenges are still made in the form of an arbitrary $\epsilon$-neighborhood around $L$, but the response this time is a $\delta$-neighborhood centered at $c$.

Definition 4.2.1. Let $f : A \to \mathbf{R}$, and let $c$ be a limit point of the domain $A$. We say that $\lim_{x \to c} f(x) = L$ provided that, for all $\epsilon > 0$, there exists a $\delta > 0$ such that whenever $0 < |x - c| < \delta$ (and $x \in A$) it follows that $|f(x) - L| < \epsilon$.

This is often referred to as the “$\epsilon$–$\delta$ version” of the definition for functional limits. Recall that the statement $|f(x) - L| < \epsilon$ is equivalent to $f(x) \in V_\epsilon(L)$. Likewise, the statement $|x - c| < \delta$ is satisfied if and only if $x \in V_\delta(c)$. The additional restriction $0 < |x - c|$ is just an economical way of saying $x \neq c$. Recasting Definition 4.2.1 in terms of neighborhoods, just as we did for the definition of convergence of a sequence in Section 2.2, amounts to little more than a change of notation, but it does help emphasize the geometrical nature of what is happening (Fig. 4.4).

Figure 4.4: Definition of Functional Limit.

Definition 4.2.1B (Topological Version). Let $c$ be a limit point of the domain of $f : A \to \mathbf{R}$. We say $\lim_{x \to c} f(x) = L$ provided that, for every $\epsilon$-neighborhood $V_\epsilon(L)$ of $L$, there exists a $\delta$-neighborhood $V_\delta(c)$ around $c$ with the property that for all $x \in V_\delta(c)$ different from $c$ (with $x \in A$) it follows that $f(x) \in V_\epsilon(L)$.
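It can help to see the definition written out with explicit quantifiers, together with its negation, since the negated form is what one works with when arguing that a limit is not $L$. The display below is only a symbolic restatement of Definition 4.2.1. Here $\lim_{x \to c} f(x) \neq L$ is shorthand for "it is not the case that $\lim_{x \to c} f(x) = L$," which includes the possibility that the limit does not exist at all.

```latex
\lim_{x \to c} f(x) = L
\iff
\forall \epsilon > 0 \;\; \exists \delta > 0 \;\; \forall x \in A :\;
0 < |x - c| < \delta \implies |f(x) - L| < \epsilon .

\lim_{x \to c} f(x) \neq L
\iff
\exists \epsilon_0 > 0 \;\; \forall \delta > 0 \;\; \exists x \in A :\;
0 < |x - c| < \delta \ \text{ and } \ |f(x) - L| \geq \epsilon_0 .
```

Negating swaps each "for all" with "there exists" and turns the implication into a conjunction: one fixed challenge $\epsilon_0$ defeats every proposed response $\delta$.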

The parenthetical reminder “($x \in A$)” present in both versions of the definition is included to ensure that $x$ is an allowable input for the function in question. When no confusion is likely, we may omit this reminder, leaving it to the reader to understand that the appearance of $f(x)$ carries with it the implicit assumption that $x$ is in the domain of $f$. On a similar note, there is no reason to discuss functional limits at isolated points of the domain. Thus, functional limits will only be considered as $x$ tends toward a limit point of the function’s domain.

Example 4.2.2. (i) To familiarize ourselves with Definition 4.2.1, let’s prove that if $f(x) = 3x + 1$, then
$$\lim_{x \to 2} f(x) = 7.$$
Let $\epsilon > 0$. Definition 4.2.1 requires that we produce a $\delta > 0$ so that $0 < |x - 2| < \delta$ leads to the conclusion $|f(x) - 7| < \epsilon$. Notice that
$$|f(x) - 7| = |(3x + 1) - 7| = |3x - 6| = 3|x - 2|.$$
Thus, if we choose $\delta = \epsilon/3$, then $0 < |x - 2| < \delta$ implies $|f(x) - 7| < 3(\epsilon/3) = \epsilon$.

(ii) Let’s show
$$\lim_{x \to 2} g(x) = 4,$$
where $g(x) = x^2$. Given an arbitrary $\epsilon > 0$, our goal this time is to make $|g(x) - 4| < \epsilon$ by restricting $|x - 2|$ to be smaller than some carefully chosen $\delta$. As in the previous problem, a little algebra reveals
$$|g(x) - 4| = |x^2 - 4| = |x + 2||x - 2|.$$
We can make $|x - 2|$ as small as we like, but we need an upper bound on $|x + 2|$ in order to know how small to choose $\delta$. The presence of the variable $x$ causes some initial confusion, but keep in mind that we are discussing the limit as $x$ approaches 2. If we agree that our $\delta$-neighborhood around $c = 2$ must have radius no bigger than $\delta = 1$, then we get the upper bound $|x + 2| \leq |3 + 2| = 5$ for all $x \in V_\delta(c)$.

Now, choose $\delta = \min\{1, \epsilon/5\}$. If $0 < |x - 2| < \delta$, then it follows that
$$|x^2 - 4| = |x + 2||x - 2| < (5)\frac{\epsilon}{5} = \epsilon,$$
and the limit is proved.
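The choice $\delta = \min\{1, \epsilon/5\}$ from Example 4.2.2 (ii) can be spot-checked numerically. The sketch below samples points with $0 < |x - 2| < \delta$ and tests whether $|x^2 - 4| < \epsilon$. It is a sanity check under floating-point arithmetic, not a proof, and the function names, sample count, and seed are illustrative assumptions.

```python
import random


def g(x: float) -> float:
    """The function g(x) = x^2 from Example 4.2.2 (ii)."""
    return x * x


def delta_for(eps: float) -> float:
    """The response chosen in the example: delta = min{1, eps/5}."""
    return min(1.0, eps / 5.0)


def response_holds(eps: float, samples: int = 10_000, seed: int = 0) -> bool:
    """Sample x with 0 < |x - 2| < delta and test |g(x) - 4| < eps.

    Returns False as soon as one sampled point violates the bound.
    """
    rng = random.Random(seed)
    delta = delta_for(eps)
    for _ in range(samples):
        x = 2.0 + rng.uniform(-delta, delta)
        if x == 2.0 or abs(x - 2.0) >= delta:
            continue  # keep only points with 0 < |x - 2| < delta
        if abs(g(x) - 4.0) >= eps:
            return False
    return True


for eps in (0.5, 0.01, 1e-6):
    print(eps, delta_for(eps), response_holds(eps))
```

A passing check for a few values of $\epsilon$ only illustrates the estimate; the algebraic bound $|x + 2||x - 2| < 5 \cdot (\epsilon/5)$ is what covers every $\epsilon > 0$ at once.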

Sequential Criterion for Functional Limits

We worked very hard in Chapter 2 to derive an impressive list of properties enjoyed by sequential limits. In particular, the Algebraic Limit Theorem (Theorem 2.3.3) and the Order Limit Theorem (Theorem 2.3.4) proved invaluable in a large number of the arguments that followed. Not surprisingly, we are going to need analogous statements for functional limits. Although it is not difficult to generate independent proofs for these statements, all of them will follow quite naturally from their sequential analogs once we derive the sequential criterion for functional limits motivated in the opening discussion of this chapter.

Theorem 4.2.3 (Sequential Criterion for Functional Limits). Given a function $f : A \to \mathbf{R}$ and a limit point $c$ of $A$, the following two statements are equivalent:

(i) $\lim_{x \to c} f(x) = L$.

(ii) For all sequences $(x_n) \subseteq A$ satisfying $x_n \neq c$ and $(x_n) \to c$, it follows that $f(x_n) \to L$.

Proof. ($\Rightarrow$) Let’s first assume that $\lim_{x \to c} f(x) = L$. To prove (ii), we consider an arbitrary sequence $(x_n)$, which converges to $c$ and satisfies $x_n \neq c$. Our goal is to show that the image sequence $f(x_n)$ converges to $L$. This is most easily seen using the topological formulation of the definition.

Let $\epsilon > 0$. Because we are assuming (i), Definition 4.2.1B implies that there exists $V_\delta(c)$ with the property that all $x \in V_\delta(c)$ different from $c$ satisfy $f(x) \in V_\epsilon(L)$. All we need to do then is argue that our particular sequence $(x_n)$ is eventually in $V_\delta(c)$. But we are assuming that $(x_n) \to c$. This implies that there exists a point $x_N$ after which $x_n \in V_\delta(c)$. It follows that $n \geq N$ implies $f(x_n) \in V_\epsilon(L)$, as desired.

($\Leftarrow$) For this implication, we intend to argue by contradiction. Thus, we assume that statement (ii) is true, and (carefully) negate statement (i). To say that
$$\lim_{x \to c} f(x) \neq L$$
means that there exists at least one particular $\epsilon_0 > 0$ for which no $\delta$ is a suitable response. In other words, no matter what $\delta > 0$ we try, there will always be at least one point
$$x \in V_\delta(c) \quad \text{with} \quad x \neq c \quad \text{for which} \quad f(x) \notin V_{\epsilon_0}(L).$$
For each $n \in \mathbf{N}$, let $\delta_n = 1/n$. From the preceding discussion, it follows that for each $n \in \mathbf{N}$ we may pick an $x_n \in V_{\delta_n}(c)$ with $x_n \neq c$ and $f(x_n) \notin V_{\epsilon_0}(L)$. But now notice that the result of this is a sequence $(x_n) \to c$ with $x_n \neq c$, where the image sequence $f(x_n)$ certainly does not converge to $L$. Because this contradicts (ii), which we are assuming is true for this argument, we may conclude that (i) must also hold.

Theorem 4.2.3 has several useful corollaries. In addition to the previously advertised benefit of granting us some short proofs of statements about how functional limits interact with algebraic combinations of functions, we also get an economical way of establishing that certain limits do not exist.

Corollary 4.2.4 (Algebraic Limit Theorem for Functional Limits). Let $f$ and $g$ be functions defined on a domain $A \subseteq \mathbf{R}$, and assume $\lim_{x \to c} f(x) = L$ and $\lim_{x \to c} g(x) = M$ for some limit point $c$ of $A$. Then,

(i) $\lim_{x \to c} k f(x) = kL$ for all $k \in \mathbf{R}$,

(ii) $\lim_{x \to c} [f(x) + g(x)] = L + M$,

(iii) $\lim_{x \to c} [f(x) g(x)] = LM$, and

(iv) $\lim_{x \to c} f(x)/g(x) = L/M$, provided $M \neq 0$.

Proof. These follow from Theorem 4.2.3 and the Algebraic Limit Theorem for sequences. The details are requested in Exercise 4.2.5.

Corollary 4.2.5 (Divergence Criterion for Functional Limits). Let $f$ be a function defined on $A$, and let $c$ be a limit point of $A$. If there exist two sequences $(x_n)$ and $(y_n)$ in $A$ with $x_n \neq c$ and $y_n \neq c$ and
$$\lim x_n = \lim y_n = c \quad \text{but} \quad \lim f(x_n) \neq \lim f(y_n),$$
then we can conclude that the functional limit $\lim_{x \to c} f(x)$ does not exist.

Example 4.2.6. Assuming the familiar properties of the sine function, let’s show that $\lim_{x \to 0} \sin(1/x)$ does not exist (Fig. 4.5). If $x_n = 1/(2n\pi)$ and $y_n = 1/(2n\pi + \pi/2)$, then $\lim(x_n) = \lim(y_n) = 0$. However, $\sin(1/x_n) = 0$ for all $n \in \mathbf{N}$ while $\sin(1/y_n) = 1$. Thus,
$$\lim \sin(1/x_n) \neq \lim \sin(1/y_n),$$
so by Corollary 4.2.5, $\lim_{x \to 0} \sin(1/x)$ does not exist.

Figure 4.5: The function sin(1/x) near zero.
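The two sequences of Example 4.2.6 are easy to evaluate numerically. The sketch below is only an illustration in floating-point arithmetic: the helper names `x_n` and `y_n` mirror the sequences in the example, and because $\pi$ is rounded, the computed sines land near 0 and near 1 rather than exactly on them.

```python
import math


def x_n(n: int) -> float:
    """x_n = 1/(2 n pi), so that sin(1/x_n) = sin(2 n pi) = 0."""
    return 1.0 / (2 * n * math.pi)


def y_n(n: int) -> float:
    """y_n = 1/(2 n pi + pi/2), so that sin(1/y_n) = 1."""
    return 1.0 / (2 * n * math.pi + math.pi / 2)


# Both sequences shrink toward 0, yet the images under sin(1/x) disagree.
along_x = [math.sin(1.0 / x_n(n)) for n in range(1, 6)]
along_y = [math.sin(1.0 / y_n(n)) for n in range(1, 6)]

for n in range(1, 6):
    print(n, x_n(n), along_x[n - 1], y_n(n), along_y[n - 1])
```

Two sequences approaching the same point with different image limits is exactly the hypothesis of Corollary 4.2.5.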

Exercises

Exercise 4.2.1. Use Definition 4.2.1 to supply a proof for the following limit statements. (a) $\lim_{x \to 2} (2x + 4) = 8$. (b) $\lim_{x \to 0} x^3 = 0$. (c) $\lim_{x \to 2} x^3 = 8$. (d) $\lim_{x \to \pi} [[x]] = 3$, where $[[x]]$ denotes the greatest integer less than or equal to $x$.

Exercise 4.2.2. Assume a particular $\delta > 0$ has been constructed as a suitable response to a particular $\epsilon$ challenge. Then, any larger/smaller (pick one) $\delta$ will also suffice.

Exercise 4.2.3. Use Corollary 4.2.5 to show that each of the following limits does not exist. (a) $\lim_{x \to 0} |x|/x$. (b) $\lim_{x \to 1} g(x)$, where $g$ is Dirichlet’s function from Section 4.1.

Exercise 4.2.4. Review the definition of Thomae’s function $t(x)$ from Section 4.1. (a) Construct three different sequences $(x_n)$, $(y_n)$, and $(z_n)$, each of which converges to 1 without using the number 1 as a term in the sequence. (b) Now, compute $\lim t(x_n)$, $\lim t(y_n)$, and $\lim t(z_n)$. (c) Make an educated conjecture for $\lim_{x \to 1} t(x)$, and use Definition 4.2.1B to verify the claim. (Given $\epsilon > 0$, consider the set of points $\{x \in \mathbf{R} : t(x) \geq \epsilon\}$. Argue that all the points in this set are isolated.)

Exercise 4.2.5. (a) Supply the details for how Corollary 4.2.4 part (ii) follows from the sequential criterion for functional limits in Theorem 4.2.3 and the Algebraic Limit Theorem for sequences proved in Chapter 2. (b) Now, write another proof of Corollary 4.2.4 part (ii) directly from Definition 4.2.1 without using the sequential criterion in Theorem 4.2.3. (c) Repeat (a) and (b) for Corollary 4.2.4 part (iii).

Exercise 4.2.6. Let $g : A \to \mathbf{R}$ and assume that $f$ is a bounded function on $A \subseteq \mathbf{R}$ (i.e., there exists $M > 0$ satisfying $|f(x)| \leq M$ for all $x \in A$). Show that if $\lim_{x \to c} g(x) = 0$, then $\lim_{x \to c} g(x) f(x) = 0$ as well.

Exercise 4.2.7. (a) The statement $\lim_{x \to 0} 1/x^2 = \infty$ certainly makes intuitive sense. Construct a rigorous definition in the “challenge–response” style of Definition 4.2.1 for a limit statement of the form $\lim_{x \to c} f(x) = \infty$ and use it to prove the previous statement. (b) Now, construct a definition for the statement $\lim_{x \to \infty} f(x) = L$. Show $\lim_{x \to \infty} 1/x = 0$. (c) What would a rigorous definition for $\lim_{x \to \infty} f(x) = \infty$ look like? Give an example of such a limit.

Exercise 4.2.8. Assume $f(x) \geq g(x)$ for all $x$ in some set $A$ on which $f$ and $g$ are defined. Show that for any limit point $c$ of $A$ we must have
$$\lim_{x \to c} f(x) \geq \lim_{x \to c} g(x),$$
assuming both limits exist.

Exercise 4.2.9 (Squeeze Theorem). Let $f$, $g$, and $h$ satisfy $f(x) \leq g(x) \leq h(x)$ for all $x$ in some common domain $A$. If $\lim_{x \to c} f(x) = L$ and $\lim_{x \to c} h(x) = L$ at some limit point $c$ of $A$, show $\lim_{x \to c} g(x) = L$ as well.

4.3

Combinations of Continuous Functions

Definition 4.3.1. A function $f : A \to \mathbf{R}$ is continuous at a point $c \in A$ if, for all $\epsilon > 0$, there exists a $\delta > 0$ such that whenever $|x - c| < \delta$ (and $x \in A$) it follows that $|f(x) - f(c)| < \epsilon$. If $f$ is continuous at every point in the domain $A$, then we say that $f$ is continuous on $A$.

The definition of continuity looks much like the definition for functional limits, with a few subtle differences. The most important is that we require the point $c$ to be in the domain of $f$. The value $f(c)$ then becomes the value of $\lim_{x \to c} f(x)$. With this observation in mind, it is tempting to shorten Definition 4.3.1 to say that $f$ is continuous at $c \in A$ if
$$\lim_{x \to c} f(x) = f(c).$$
This is fine except for the very minor fact that the definition of functional limits requires the point $c$ to be a limit point of $A$, and this is not technically assumed in Definition 4.3.1. Now one consequence of Definition 4.3.1 is that any function is continuous at isolated points of its domain (Exercise 4.3.4), but this is hardly profound. As the name suggests, isolated points are too far from other points of the domain to contribute to any interesting phenomena.

With our attention firmly focused on limit points of the domain, we summarize several equivalent ways to characterize continuity.

Theorem 4.3.2 (Characterizations of Continuity). Let $f : A \to \mathbf{R}$, and let $c \in A$ be a limit point of $A$. The function $f$ is continuous at $c$ if and only if any one of the following conditions is met:

(i) For all $\epsilon > 0$, there exists a $\delta > 0$ such that $|x - c| < \delta$ (and $x \in A$) implies $|f(x) - f(c)| < \epsilon$;

(ii) $\lim_{x \to c} f(x) = f(c)$;

(iii) For all $V_\epsilon(f(c))$, there exists a $V_\delta(c)$ with the property that $x \in V_\delta(c)$ (and $x \in A$) implies $f(x) \in V_\epsilon(f(c))$;

(iv) If $(x_n) \to c$ (with $x_n \in A$), then $f(x_n) \to f(c)$.

Proof. Statement (i) is just Definition 4.3.1. Statement (ii) is seen to be equivalent to (i) by applying Definition 4.2.1 and observing that the case $x = c$ (which is excluded in the definition of functional limits) leads to the requirement $f(c) \in V_\epsilon(f(c))$, which is trivially true. Statement (iii) is the standard rewording of (i) using topological neighborhoods in place of the absolute value notation. Finally, statement (iv) follows using an argument nearly identical to that of Theorem 4.2.3 with some slight modifications for when $x_n = c$.

The length of this list is somewhat deceiving. Statements (i), (ii), and (iii) are closely related and essentially remind us that functional limits have an $\epsilon$–$\delta$ formulation as well as a topological description. Statement (iv), however, is qualitatively different from the first three. As a general rule, the sequential characterization of continuity is usually the most useful for demonstrating that a function is not continuous at some point.

Corollary 4.3.3 (Criterion for Discontinuity). Let $f : A \to \mathbf{R}$, and let $c \in A$ be a limit point of $A$. If there exists a sequence $(x_n) \subseteq A$ where $(x_n) \to c$ but such that $f(x_n)$ does not converge to $f(c)$, we may conclude that $f$ is not continuous at $c$.

The sequential characterization of continuity is also important for the other reasons that it was important for functional limits. In particular, it allows us to bring our catalog of results about the behavior of sequences to bear on the study of continuous functions. The next theorem should be compared to Corollary 4.2.4 as well as to Theorem 2.3.3.

Theorem 4.3.4 (Algebraic Continuity Theorem). Assume $f : A \to \mathbf{R}$ and $g : A \to \mathbf{R}$ are continuous at a point $c \in A$. Then,

(i) $k f(x)$ is continuous at $c$ for all $k \in \mathbf{R}$;

(ii) $f(x) + g(x)$ is continuous at $c$;

(iii) $f(x) g(x)$ is continuous at $c$; and

(iv) $f(x)/g(x)$ is continuous at $c$, provided the quotient is defined.

Proof. All of these statements can be quickly derived from Corollary 4.2.4 and Theorem 4.3.2.

These results provide us with the tools we need to firm up our arguments in the opening section of this chapter about the behavior of Dirichlet’s function and Thomae’s function. The details are requested in Exercise 4.3.6. Later, we provide some more examples of arguments for and against continuity of some familiar functions.

Example 4.3.5. All polynomials are continuous on $\mathbf{R}$. In fact, rational functions (i.e., quotients of polynomials) are continuous wherever they are defined.

Figure 4.6: The function x sin(1/x) near zero.

To see why this is so, we begin with the elementary claim that $g(x) = x$ and $f(x) = k$, where $k \in \mathbf{R}$, are continuous on $\mathbf{R}$ (Exercise 4.3.3). Because an arbitrary polynomial
$$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$$
consists of sums and products of $g(x)$ with different constant functions, we may conclude from Theorem 4.3.4 that $p(x)$ is continuous. Likewise, Theorem 4.3.4 implies that quotients of polynomials are continuous as long as the denominator is not zero.

Example 4.3.6. In Example 4.2.6, we saw that the oscillations of $\sin(1/x)$ are so rapid near the origin that $\lim_{x \to 0} \sin(1/x)$ does not exist. Now, consider the function
$$g(x) = \begin{cases} x \sin(1/x) & \text{if } x \neq 0 \\ 0 & \text{if } x = 0. \end{cases}$$
To investigate the continuity of $g$ at $c = 0$ (Fig. 4.6), we can estimate
$$|g(x) - g(0)| = |x \sin(1/x) - 0| \leq |x|.$$
Given $\epsilon > 0$, set $\delta = \epsilon$, so that whenever $|x - 0| = |x| < \delta$ it follows that $|g(x) - g(0)| < \epsilon$. Thus, $g$ is continuous at the origin.

Example 4.3.7. The greatest integer function $h(x) = [[x]]$ is defined for all $x \in \mathbf{R}$ by letting $[[x]]$ equal the largest integer $n \in \mathbf{Z}$ satisfying $n \leq x$. This familiar step function certainly has discontinuous “jumps” at each integer value of its domain, but it is a good exercise to try to articulate this observation in the language of analysis.

Given $m \in \mathbf{Z}$, define the sequence $(x_n)$ by $x_n = m - 1/n$. It follows that $(x_n) \to m$, but
$$h(x_n) \to (m - 1),$$
which does not equal $m = h(m)$. By Corollary 4.3.3, we see that $h$ fails to be continuous at each $m \in \mathbf{Z}$.

Now let’s see why $h$ is continuous at a point $c \notin \mathbf{Z}$. Given $\epsilon > 0$, we must find a $\delta$-neighborhood $V_\delta(c)$ such that $x \in V_\delta(c)$ implies $h(x) \in V_\epsilon(h(c))$. We know that $c \in \mathbf{R}$ falls between consecutive integers $n < c < n + 1$ for some $n \in \mathbf{Z}$. If we take $\delta = \min\{c - n, (n + 1) - c\}$, then it follows from the definition of $h$ that $h(x) = h(c)$ for all $x \in V_\delta(c)$. Thus, we certainly have
$$h(x) \in V_\epsilon(h(c))$$
whenever $x \in V_\delta(c)$.

This latter proof is quite different from the typical situation in that the value of $\delta$ does not actually depend on the choice of $\epsilon$. Usually, smaller $\epsilon$’s require smaller $\delta$’s in response, but here the same value of $\delta$ works no matter how small $\epsilon$ is chosen.

Example 4.3.8. Consider $f(x) = \sqrt{x}$ defined on $A = \{x \in \mathbf{R} : x \geq 0\}$. Exercise 2.3.2 outlines a sequential proof that $f$ is continuous on $A$. Here, we give an $\epsilon$–$\delta$ proof of the same fact.

Let $\epsilon > 0$. We need to argue that $|f(x) - f(c)|$ can be made less than $\epsilon$ for all values of $x$ in some $\delta$-neighborhood around $c$. If $c = 0$, this reduces to the statement $\sqrt{x} < \epsilon$, which happens as long as $x < \epsilon^2$. Thus, if we choose $\delta = \epsilon^2$, we see that $|x - 0| < \delta$ implies $|f(x) - 0| < \epsilon$.

For a point $c \in A$ different from zero, we need to estimate $|\sqrt{x} - \sqrt{c}|$. This time, write
$$|\sqrt{x} - \sqrt{c}| = |\sqrt{x} - \sqrt{c}| \left( \frac{\sqrt{x} + \sqrt{c}}{\sqrt{x} + \sqrt{c}} \right) = \frac{|x - c|}{\sqrt{x} + \sqrt{c}} \leq \frac{|x - c|}{\sqrt{c}}.$$
In order to make this quantity less than $\epsilon$, it suffices to pick $\delta = \epsilon \sqrt{c}$. Then, $|x - c| < \delta$ implies
$$|\sqrt{x} - \sqrt{c}| < \frac{\epsilon \sqrt{c}}{\sqrt{c}} = \epsilon,$$
as desired.

Although we have now shown that both polynomials and the square root function are continuous, the Algebraic Continuity Theorem does not provide the justification needed to conclude that a function such as $h(x) = \sqrt{3x^2 + 5}$ is continuous. For this, we must prove that compositions of continuous functions are continuous.

Theorem 4.3.9 (Composition of Continuous Functions).
Given f : A → R and g : B → R, assume that the range f (A) = {f (x) : x ∈ A} is contained in the domain B so that the composition g ◦ f (x) = g(f (x)) is well-deﬁned on A. If f is continuous at c ∈ A, and if g is continuous at f (c) ∈ B, then g ◦ f is continuous at c. Proof. Exercise 4.3.2.
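The choice of δ in Example 4.3.8 can be spot-checked numerically. The following is only a sketch, not a proof: the helper names `delta_for_sqrt` and `check_sqrt_delta` are hypothetical, and sampling finitely many points can support but never establish the estimate.

```python
import math

def delta_for_sqrt(c, eps):
    """Response from Example 4.3.8: eps**2 at c = 0, eps*sqrt(c) for c > 0."""
    return eps ** 2 if c == 0 else eps * math.sqrt(c)

def check_sqrt_delta(c, eps, samples=1000):
    """Sample x >= 0 with |x - c| < delta and test |sqrt(x) - sqrt(c)| < eps."""
    delta = delta_for_sqrt(c, eps)
    for i in range(1, samples):
        # offsets lie strictly inside (-delta, delta)
        x = c + delta * (2 * i / samples - 1)
        if x < 0:
            continue  # outside the domain A = [0, infinity)
        if not abs(math.sqrt(x) - math.sqrt(c)) < eps:
            return False
    return True

print(check_sqrt_delta(0.0, 0.1), check_sqrt_delta(4.0, 0.01))
```

Note that the response δ = ε√c shrinks as c approaches 0, which is why the case c = 0 is handled separately.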

4.3. Combinations of Continuous Functions

Exercises

Exercise 4.3.1. Let g(x) = ∛x. (a) Prove that g is continuous at c = 0. (b) Prove that g is continuous at a point c ≠ 0. (The identity a³ − b³ = (a − b)(a² + ab + b²) will be helpful.)

Exercise 4.3.2. (a) Supply a proof for Theorem 4.3.9 using the ε–δ characterization of continuity. (b) Give another proof of this theorem using the sequential characterization of continuity (from Theorem 4.3.2 (iv)).

Exercise 4.3.3. Using the ε–δ characterization of continuity (and thus using no previous results about sequences), show that the linear function f(x) = ax + b is continuous at every point of R.

Exercise 4.3.4. (a) Show using Definition 4.3.1 that any function f with domain Z will necessarily be continuous at every point in its domain. (b) Show in general that if c is an isolated point of A ⊆ R, then f : A → R is continuous at c.

Exercise 4.3.5. In Theorem 4.3.4, statement (iv) says that f(x)/g(x) is continuous at c if both f and g are, provided that the quotient is defined. Show that if g is continuous at c and g(c) ≠ 0, then there exists an open interval containing c on which f(x)/g(x) is always defined.

Exercise 4.3.6. (a) Referring to the proper theorems, give a formal argument that Dirichlet’s function from Section 4.1 is nowhere-continuous on R. (b) Review the definition of Thomae’s function in Section 4.1 and demonstrate that it fails to be continuous at every rational point. (c) Use the characterization of continuity in Theorem 4.3.2 (iii) to show that Thomae’s function is continuous at every irrational point in R. (Given ε > 0, consider the set of points {x ∈ R : t(x) ≥ ε}. Argue that all the points in this set are isolated.)

Exercise 4.3.7. Assume h : R → R is continuous on R and let K = {x : h(x) = 0}. Show that K is a closed set.

Exercise 4.3.8. (a) Show that if a function is continuous on all of R and equal to 0 at every rational point, then it must be identically 0 on all of R.
(b) If f and g are deﬁned on all of R and f (r) = g(r) at every rational point, must f and g be the same function? Exercise 4.3.9 (Contraction Mapping Theorem). Let f be a function deﬁned on all of R, and assume there is a constant c such that 0 < c < 1 and |f (x) − f (y)| ≤ c|x − y| for all x, y ∈ R. (a) Show that f is continuous on R.

(b) Pick some point y_1 ∈ R and construct the sequence (y_1, f(y_1), f(f(y_1)), . . . ). In general, if y_{n+1} = f(y_n), show that the resulting sequence (y_n) is a Cauchy sequence. Hence we may let y = lim y_n. (c) Prove that y is a fixed point of f (i.e., f(y) = y) and that it is unique in this regard. (d) Finally, prove that if x is any arbitrary point in R then the sequence (x, f(x), f(f(x)), . . . ) converges to y defined in (b).

Exercise 4.3.10. Let f be a function defined on all of R that satisfies the additive condition f(x + y) = f(x) + f(y) for all x, y ∈ R. (a) Show that f(0) = 0 and that f(−x) = −f(x) for all x ∈ R. (b) Show that if f is continuous at x = 0, then f is continuous at every point in R. (c) Let k = f(1). Show that f(n) = kn for all n ∈ N, and then prove that f(z) = kz for all z ∈ Z. Now, prove that f(r) = kr for any rational number r. (d) Use (b) and (c) to conclude that f(x) = kx for all x ∈ R. Thus, any additive function that is continuous at x = 0 must necessarily be a linear function through the origin.

Exercise 4.3.11. For each of the following choices of A, construct a function f : R → R that has discontinuities at every point x in A and is continuous on A^c. (a) A = Z. (b) A = {x : 0 < x < 1}. (c) A = {x : 0 ≤ x ≤ 1}. (d) A = {1/n : n ∈ N}.

Exercise 4.3.12. Let C be the Cantor set constructed in Section 3.1. Define g : [0, 1] → R by

g(x) = 1 if x ∈ C,
g(x) = 0 if x ∉ C.

(a) Show that g fails to be continuous at any point c ∈ C. (b) Prove that g is continuous at every point c ∉ C.
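The iteration described in Exercise 4.3.9 can be watched numerically before it is proved. The sketch below makes an assumption that is not in the text: it uses f(x) = cos(x)/2 as a sample contraction (its slopes are bounded by 1/2 in absolute value), and `iterate` is a hypothetical helper name. The output illustrates the behavior the exercise asks about and is no substitute for the argument.

```python
import math

def iterate(f, y1, steps):
    """Return the list (y1, f(y1), f(f(y1)), ...) with steps + 1 terms."""
    seq = [y1]
    for _ in range(steps):
        seq.append(f(seq[-1]))
    return seq

def f(x):
    # Sample contraction (an assumption for illustration): |f(x) - f(y)| <= |x - y| / 2
    return 0.5 * math.cos(x)

seq_a = iterate(f, 10.0, 60)   # two different starting points
seq_b = iterate(f, -3.0, 60)
print(seq_a[-1], seq_b[-1])
```

Both runs settle on the same value, which is consistent with parts (c) and (d) of the exercise.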

4.4 Continuous Functions on Compact Sets

Deﬁnition 4.4.1. Given a function f : A → R and a subset B ⊆ A, let f (B) represent the range of f over the set B; that is, f (B) = {f (x) : x ∈ B}. We say f is bounded if f (A) is bounded in the sense of Deﬁnition 2.3.1. For a given subset B ⊆ A, we say f is bounded on B if f (B) is bounded. The adjectives open, closed, bounded, compact, perfect, and connected are all used to describe subsets of the real line. An interesting question is to sort out which, if any, of these properties are preserved when a particular set A ⊆ R

is mapped to f(A) via a continuous function. For instance, if A is open and f is continuous, is f(A) necessarily open? The answer to this question is no. If f(x) = x² and A is the open interval (−1, 1), then f(A) is the interval [0, 1), which is not open.

The corresponding conjecture for closed sets also turns out to be false, although constructing a counterexample requires a little more thought. Consider the function

g(x) = 1/(1 + x²)

and the closed set A = [0, ∞) = {x : x ≥ 0}. Because g(A) = (0, 1] is not closed, we must conclude that continuous functions do not, in general, map closed sets to closed sets. Notice, however, that our particular counterexample required using an unbounded closed set A. This is not incidental. Sets that are closed and bounded (that is, compact sets) always get mapped to closed and bounded subsets by continuous functions.

Theorem 4.4.2 (Preservation of Compact Sets). Let f : A → R be continuous on A. If K ⊆ A is compact, then f(K) is compact as well.

Proof. Let (y_n) be an arbitrary sequence contained in the range set f(K). To prove this result, we must find a subsequence (y_{n_k}) that converges to a limit also in f(K). The strategy is to take advantage of the assumption that the domain set K is compact by translating the sequence (y_n), which is in the range of f, back to a sequence in the domain K. To assert that (y_n) ⊆ f(K) means that, for each n ∈ N, we can find (at least one) x_n ∈ K with f(x_n) = y_n. This yields a sequence (x_n) ⊆ K. Because K is compact, there exists a convergent subsequence (x_{n_k}) whose limit x = lim x_{n_k} is also in K. Finally, we make use of the fact that f is assumed to be continuous on A and so is continuous at x in particular. Given that (x_{n_k}) → x, we conclude that (y_{n_k}) → f(x). Because x ∈ K, we have that f(x) ∈ f(K), and hence f(K) is compact.
An extremely important corollary is obtained by combining this result with the observation that compact sets are bounded and contain their supremums and infimums (Exercise 3.3.1).

Theorem 4.4.3 (Extreme Value Theorem). If f : K → R is continuous on a compact set K ⊆ R, then f attains a maximum and minimum value. In other words, there exist x_0, x_1 ∈ K such that f(x_0) ≤ f(x) ≤ f(x_1) for all x ∈ K.

Proof. Exercise 4.4.3.

Uniform Continuity

Although we have proved that polynomials are always continuous on R, there is an important lesson to be learned by constructing direct proofs that the functions f(x) = 3x + 1 and g(x) = x² (previously studied in Example 4.2.2) are everywhere continuous.

Example 4.4.4. (i) To show directly that f(x) = 3x + 1 is continuous at an arbitrary point c ∈ R, we must argue that |f(x) − f(c)| can be made arbitrarily small for values of x near c. Now,

|f(x) − f(c)| = |(3x + 1) − (3c + 1)| = 3|x − c|,

so, given ε > 0, we choose δ = ε/3. Then, |x − c| < δ implies

|f(x) − f(c)| = 3|x − c| < 3(ε/3) = ε.

Of particular importance for this discussion is the fact that the choice of δ is the same regardless of which point c ∈ R we are considering.

(ii) Let’s contrast this with what happens when we prove g(x) = x² is continuous on R. Given c ∈ R, we have

|g(x) − g(c)| = |x² − c²| = |x − c||x + c|.

As discussed in Example 4.2.2, we need an upper bound on |x + c|, which is obtained by insisting that our choice of δ not exceed 1. This guarantees that all values of x under consideration will necessarily fall in the interval (c − 1, c + 1). It follows that

|x + c| ≤ |x| + |c| ≤ (|c| + 1) + |c| = 2|c| + 1.

Now, let ε > 0. If we choose δ = min{1, ε/(2|c| + 1)}, then |x − c| < δ implies

|g(x) − g(c)| = |x − c||x + c| < (ε/(2|c| + 1)) (2|c| + 1) = ε.

Now, there is nothing deficient about this argument, but it is important to notice that, in the second proof, the algorithm for choosing the response δ depends on the value of c. The statement

δ = ε/(2|c| + 1)

means that larger values of |c| are going to require smaller values of δ, a fact that should be evident from a consideration of the graph of g(x) = x² (Fig. 4.7). Given, say, ε = 1, a response of δ = 1/3 is sufficient for c = 1 because 2/3 < x < 4/3 certainly implies 0 < x² < 2. However, if c = 10, then the steepness of the graph of g(x) means that a much smaller δ is required (δ = 1/21 by our rule) to force 99 < x² < 101. The next definition is meant to distinguish between these two examples.

Definition 4.4.5. A function f : A → R is uniformly continuous on A if for every ε > 0 there exists a δ > 0 such that, for all x, y ∈ A, |x − y| < δ implies |f(x) − f(y)| < ε.

Figure 4.7: g(x) = x²; a larger c requires a smaller δ. (Labels: points c_1, c_2, c_3 with neighborhoods V_δ1(c_1), V_δ2(c_2), V_δ3(c_3) on the horizontal axis; the corresponding ε-neighborhoods V_ε(f(c_1)), V_ε(f(c_2)), V_ε(f(c_3)) on the vertical axis.)
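The contrast between the two responses in Example 4.4.4 can be made concrete with a short computation. This is a sketch under stated assumptions: the function names `delta_linear`, `delta_square`, and `works` are hypothetical, and `works` only samples finitely many points of the δ-neighborhood, so a `True` result is evidence rather than proof.

```python
def delta_linear(eps):
    """Response for f(x) = 3x + 1: independent of the point c."""
    return eps / 3

def delta_square(c, eps):
    """Response for g(x) = x**2 from Example 4.4.4 (ii): depends on c."""
    return min(1.0, eps / (2 * abs(c) + 1))

def works(func, c, eps, delta, samples=2000):
    """Test |func(x) - func(c)| < eps at sample points with |x - c| < delta."""
    for i in range(1, samples):
        x = c + delta * (2 * i / samples - 1)
        if not abs(func(x) - func(c)) < eps:
            return False
    return True

def square(x):
    return x * x

for c in (1, 10, 100):
    print(c, delta_linear(1.0), delta_square(c, 1.0))

# The response that suffices at c = 1 is too generous at c = 10.
print(works(square, 10, 1.0, delta_square(1, 1.0)))
print(works(square, 10, 1.0, delta_square(10, 1.0)))
```

The linear response never changes with c, while the quadratic response shrinks as |c| grows, which is the distinction that the definition of uniform continuity is designed to capture.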

Recall that to say that “f is continuous on A” means that f is continuous at each individual point c ∈ A. In other words, given ε > 0 and c ∈ A, we can find a δ > 0 perhaps depending on c such that if |x − c| < δ then |f(x) − f(c)| < ε. Uniform continuity is a strictly stronger property. The key distinction between asserting that f is “uniformly continuous on A” versus simply “continuous on A” is that, given an ε > 0, a single δ > 0 can be chosen that works simultaneously for all points c in A. To say that a function is not uniformly continuous on a set A, then, does not necessarily mean it is not continuous at some point. Rather, it means that there is some ε_0 > 0 for which no single δ > 0 is a suitable response for all c ∈ A.

Theorem 4.4.6 (Sequential Criterion for Nonuniform Continuity). A function f : A → R fails to be uniformly continuous on A if there exist a particular ε_0 > 0 and two sequences (x_n) and (y_n) in A satisfying

|x_n − y_n| → 0 but |f(x_n) − f(y_n)| ≥ ε_0.

Proof. Take the logical negation of Definition 4.4.5, and consider the particular values δ_n = 1/n. The details are requested in Exercise 4.4.5.

Example 4.4.7. The function h(x) = sin(1/x) (Fig. 4.5) is continuous at every point in the open interval (0, 1) but is not uniformly continuous on this interval. The problem arises near zero, where the increasingly rapid oscillations take domain values that are quite close together to range values a distance 2

apart. To illustrate Theorem 4.4.6, take ε_0 = 2 and set

x_n = 1/(π/2 + 2nπ) and y_n = 1/(3π/2 + 2nπ).

Because each of these sequences tends to zero, we have |x_n − y_n| → 0, and a short calculation reveals |h(x_n) − h(y_n)| = 2 for all n ∈ N.

Whereas continuity is defined at a single point, uniform continuity is always discussed in reference to a particular domain. In Example 4.4.4, we were not able to prove that g(x) = x² is uniformly continuous on R because larger values of x require smaller and smaller values of δ. (As another illustration of Theorem 4.4.6, take x_n = n and y_n = n + 1/n.) It is true, however, that g(x) is uniformly continuous on the bounded set [−10, 10]. Returning to the argument set forth in Example 4.4.4 (ii), notice that if we restrict our attention to the domain [−10, 10] then |x + y| ≤ 20 for all x and y. Given ε > 0, we can now choose δ = ε/20, and verify that if x, y ∈ [−10, 10] satisfy |x − y| < δ, then

|g(x) − g(y)| = |x² − y²| = |x − y||x + y| < (ε/20) · 20 = ε.

In fact, it is not difficult to see how to modify this argument to show that g(x) is uniformly continuous on any bounded set A in R.

Now, Example 4.4.7 is included to keep us from jumping to the erroneous conclusion that functions that are continuous on bounded domains are necessarily uniformly continuous. A general result does follow, however, if we assume that the domain is compact.

Theorem 4.4.8. A function that is continuous on a compact set K is uniformly continuous on K.

Proof. Assume f : K → R is continuous at every point of a compact set K ⊆ R. To prove that f is uniformly continuous on K we argue by contradiction. By the criterion in Theorem 4.4.6, if f is not uniformly continuous on K, then there exist two sequences (x_n) and (y_n) in K such that

lim |x_n − y_n| = 0 while |f(x_n) − f(y_n)| ≥ ε_0

for some particular ε_0 > 0. Because K is compact, the sequence (x_n) has a convergent subsequence (x_{n_k}) with x = lim x_{n_k} also in K. We could use the compactness of K again to produce a convergent subsequence of (y_n), but notice what happens when we consider the particular subsequence (y_{n_k}) consisting of those terms in (y_n) that correspond to the terms in the convergent subsequence (x_{n_k}). By the Algebraic Limit Theorem,

lim(y_{n_k}) = lim((y_{n_k} − x_{n_k}) + x_{n_k}) = 0 + x.

The conclusion is that both (x_{n_k}) and (y_{n_k}) converge to x ∈ K. Because f is assumed to be continuous at x, we have lim f(x_{n_k}) = f(x) and lim f(y_{n_k}) = f(x), which implies lim(f(x_{n_k}) − f(y_{n_k})) = 0.

A contradiction arises when we recall that (x_n) and (y_n) were chosen to satisfy |f(x_n) − f(y_n)| ≥ ε_0 for all n ∈ N. We conclude, then, that f is indeed uniformly continuous on K.
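The two pairs of sequences used to illustrate Theorem 4.4.6 (for h(x) = sin(1/x) in Example 4.4.7, and for g(x) = x² with x_n = n and y_n = n + 1/n) can be tabulated directly. The following sketch uses hypothetical helper names `oscillation_pair` and `square_pair`; floating-point values only approximate the exact quantities in the text.

```python
import math

def h(x):
    return math.sin(1.0 / x)

def oscillation_pair(n):
    """Return (|x_n - y_n|, |h(x_n) - h(y_n)|) for the sequences of Example 4.4.7."""
    x_n = 1.0 / (math.pi / 2 + 2 * n * math.pi)
    y_n = 1.0 / (3 * math.pi / 2 + 2 * n * math.pi)
    return abs(x_n - y_n), abs(h(x_n) - h(y_n))

def square_pair(n):
    """Return (|x_n - y_n|, |g(x_n) - g(y_n)|) for g(x) = x**2, x_n = n, y_n = n + 1/n."""
    x_n = float(n)
    y_n = n + 1.0 / n
    return abs(x_n - y_n), abs(x_n ** 2 - y_n ** 2)

for n in (1, 10, 100, 1000):
    print(n, oscillation_pair(n), square_pair(n))
```

In both tables the first coordinate shrinks toward zero while the second stays at or above 2, which is the pattern the criterion asks for with ε_0 = 2.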

Exercises

Exercise 4.4.1. (a) Show that f(x) = x³ is continuous on all of R. (b) Argue, using Theorem 4.4.6, that f is not uniformly continuous on R. (c) Show that f is uniformly continuous on any bounded subset of R.

Exercise 4.4.2. Show that f(x) = 1/x² is uniformly continuous on the set [1, ∞) but not on the set (0, 1].

Exercise 4.4.3. Furnish the details (including an argument for Exercise 3.3.1 if it is not already done) for the proof of the Extreme Value Theorem (Theorem 4.4.3).

Exercise 4.4.4. Show that if f is continuous on [a, b] with f(x) > 0 for all a ≤ x ≤ b, then 1/f is bounded on [a, b].

Exercise 4.4.5. Using the advice that follows Theorem 4.4.6, provide a complete proof for this criterion for nonuniform continuity.

Exercise 4.4.6. Give an example of each of the following, or state that such a request is impossible. For any that are impossible, supply a short explanation (perhaps referencing the appropriate theorem(s)) for why this is the case. (a) a continuous function f : (0, 1) → R and a Cauchy sequence (x_n) such that f(x_n) is not a Cauchy sequence; (b) a continuous function f : [0, 1] → R and a Cauchy sequence (x_n) such that f(x_n) is not a Cauchy sequence; (c) a continuous function f : [0, ∞) → R and a Cauchy sequence (x_n) such that f(x_n) is not a Cauchy sequence; (d) a continuous bounded function f on (0, 1) that attains a maximum value on this open interval but not a minimum value.

Exercise 4.4.7. Assume that g is defined on an open interval (a, c) and it is known to be uniformly continuous on (a, b] and [b, c), where a < b < c. Prove that g is uniformly continuous on (a, c).

Exercise 4.4.8. (a) Assume that f : [0, ∞) → R is continuous at every point in its domain. Show that if there exists b > 0 such that f is uniformly continuous on the set [b, ∞), then f is uniformly continuous on [0, ∞). (b) Prove that f(x) = √x is uniformly continuous on [0, ∞).

Exercise 4.4.9. A function f : A → R is called Lipschitz if there exists a bound M > 0 such that

|(f(x) − f(y))/(x − y)| ≤ M

for all x ≠ y in A. Geometrically speaking, a function f is Lipschitz if there is a uniform bound on the magnitude of the slopes of lines drawn through any two points on the graph of f. (a) Show that if f : A → R is Lipschitz, then it is uniformly continuous on A. (b) Is the converse statement true? Are all uniformly continuous functions necessarily Lipschitz?

Exercise 4.4.10. Do uniformly continuous functions preserve boundedness? If f is uniformly continuous on a bounded set A, is f(A) necessarily bounded?

Exercise 4.4.11 (Topological Characterization of Continuity). Let g be defined on all of R. If A is a subset of R, define the set g⁻¹(A) by g⁻¹(A) = {x ∈ R : g(x) ∈ A}. Show that g is continuous if and only if g⁻¹(O) is open whenever O ⊆ R is an open set.

Exercise 4.4.12. Construct an alternate proof of Theorem 4.4.8 using the open cover characterization of compactness from Theorem 3.3.8 (iii).

Exercise 4.4.13 (Continuous Extension Theorem). (a) Show that a uniformly continuous function preserves Cauchy sequences; that is, if f : A → R is uniformly continuous and (x_n) ⊆ A is a Cauchy sequence, then show f(x_n) is a Cauchy sequence. (b) Let g be a continuous function on the open interval (a, b). Prove that g is uniformly continuous on (a, b) if and only if it is possible to define values g(a) and g(b) at the endpoints so that the extended function g is continuous on [a, b]. (In the forward direction, first produce candidates for g(a) and g(b), and then show the extended g is continuous.)

4.5 The Intermediate Value Theorem

The Intermediate Value Theorem (IVT) is the name given to the very intuitive observation that a continuous function f on a closed interval [a, b] attains every value that falls between the range values f (a) and f (b) (Fig. 4.8). Here is this observation in the language of analysis. Theorem 4.5.1 (Intermediate Value Theorem). If f : [a, b] → R is continuous, and if L is a real number satisfying f (a) < L < f (b) or f (a) > L > f (b), then there exists a point c ∈ (a, b) where f (c) = L.

Figure 4.8: Intermediate Value Theorem. (Labels: a, c, b on the horizontal axis; f(a), L, f(b) on the vertical axis.)

This theorem was freely used by mathematicians of the 18th century (including Euler and Gauss) without any consideration of its validity. In fact, the ﬁrst analytical proof was not oﬀered until 1817 by Bolzano in a paper that also contains the ﬁrst appearance of a somewhat modern deﬁnition of continuity. This emphasizes the signiﬁcance of this result. As discussed in Section 4.1, Bolzano and his contemporaries had arrived at a point in the evolution of mathematics where it was becoming increasingly important to ﬁrm up the foundations of the subject. Doing so, however, was not simply a matter of going back and supplying the missing proofs. The real battle lay in ﬁrst obtaining a thorough and mutually agreed-upon understanding of the relevant concepts. The importance of the Intermediate Value Theorem for us is similar in that our understanding of continuity and the nature of the real line is now mature enough for a proof to be possible. Indeed, there are several satisfying arguments for this simple result, each one isolating, in a slightly diﬀerent way, the interplay between continuity and completeness.

Preservation of Connected Sets

The most potentially useful way to understand the Intermediate Value Theorem (IVT) is as a special case of the fact that continuous functions map connected sets to connected sets. In Theorem 4.4.2, we saw that if f is a continuous function on a compact set K, then the range set f(K) is also compact. The analogous observation holds for connected sets.

Theorem 4.5.2 (Preservation of Connectedness). Let f : A → R be continuous. If E ⊆ A is connected, then f(E) is connected as well.

Proof. Intending to use the characterization of connected sets in Theorem 3.4.6, let f(E) = A ∪ B where A and B are disjoint and nonempty. Our goal is to produce a sequence contained in one of these sets that converges to a limit in the other. Let

C = {x ∈ E : f(x) ∈ A} and D = {x ∈ E : f(x) ∈ B}.

The sets C and D are called the preimages of A and B, respectively. Using the properties of A and B, it is straightforward to check that C and D are nonempty and disjoint and satisfy E = C ∪ D. Now, we are assuming E is a connected set, so by Theorem 3.4.6, there exists a sequence (x_n) contained in one of C or D with x = lim x_n contained in the other. Finally, because f is continuous at x, we get f(x) = lim f(x_n). Thus, it follows that f(x_n) is a convergent sequence contained in either A or B while the limit f(x) is an element of the other. With another nod to Theorem 3.4.6, the proof is complete.

In R, a set is connected if and only if it is a (possibly unbounded) interval. This fact, together with Theorem 4.5.2, leads to a short proof of the Intermediate Value Theorem (Exercise 4.5.1). We should point out that the proof of Theorem 4.5.2 does not make use of the equivalence between connected sets and intervals in R but relies only on the general definitions. The previous comment that this is the most useful way to approach IVT stems from the fact that, although it is not discussed here, the definitions of continuity and connectedness can be easily adapted to higher-dimensional settings. Theorem 4.5.2, then, remains a valid conclusion in higher dimensions, whereas the Intermediate Value Theorem is essentially a one-dimensional result.

Completeness

A typical way the Intermediate Value Theorem is applied is to prove the existence of roots. Given f(x) = x² − 2, for instance, we see that f(1) = −1 and f(2) = 2. Therefore, there exists a point c ∈ (1, 2) where f(c) = 0. In this case, we can easily compute c = √2, meaning that we really did not need IVT to show that f has a root. We spent a good deal of time in Chapter 1 proving that √2 exists, which was only possible once we insisted on the Axiom of Completeness as part of our assumptions about the real numbers. The fact that the Intermediate Value Theorem has just asserted that √2 exists suggests that another way to understand this result is in terms of the relationship between the continuity of f and the completeness of R.

The Axiom of Completeness (AoC) from the first chapter states that “Sets which are bounded above have least upper bounds.” Later, we saw that the Nested Interval Property (NIP) is an equivalent way to assert that the real numbers have no “gaps.” Either of these characterizations of completeness can be used as the cornerstone for an alternate proof of Theorem 4.5.1.

Proof. I. (First approach using AoC.) To simplify matters a bit, let’s consider the special case where f is a continuous function satisfying f(a) < 0 < f(b) and

show that f (c) = 0 for some c ∈ (a, b). First let K = {x ∈ [a, b] : f (x) ≤ 0}.

[Figure: the graph of f on [a, b] with f(a) < 0 < f(b); the set K is marked on the horizontal axis together with the point c = sup K.]

Notice that K is bounded above by b, and a ∈ K so K is not empty. Thus we may appeal to the Axiom of Completeness to assert that c = sup K exists. There are three cases to consider: f(c) > 0, f(c) < 0, and f(c) = 0. The fact that c is the least upper bound of K can be used to rule out the first two cases, resulting in the desired conclusion that f(c) = 0. The details are requested in Exercise 4.5.5.

II. (Second approach using NIP.) Again, consider the special case where L = 0 and f(a) < 0 < f(b). Let I_0 = [a, b], and consider the midpoint z = (a + b)/2. If f(z) ≥ 0, then set a_1 = a and b_1 = z. If f(z) < 0, then set a_1 = z and b_1 = b. In either case, the interval I_1 = [a_1, b_1] has the property that f is negative at the left endpoint and nonnegative at the right.

[Figure: the nested intervals I_0, I_1, I_2 obtained by bisecting [a, b] at the midpoint z, drawn for the case f(z) > 0.]

This procedure can be inductively repeated, setting the stage for an application of the Nested Interval Property. The remainder of the argument is left as Exercise 4.5.6.
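The halving procedure in the second approach is the familiar bisection method, and it can be sketched in a few lines. The function name `bisect_root` and the stopping tolerance `tol` are assumptions made for illustration; the proof itself continues the construction indefinitely and appeals to the Nested Interval Property rather than stopping at a tolerance.

```python
import math

def bisect_root(f, a, b, tol=1e-12):
    """Shrink [a, b] by halving, keeping f(a) < 0 and f(b) >= 0 at every step."""
    if not (f(a) < 0 < f(b)):
        raise ValueError("need f(a) < 0 < f(b)")
    while b - a > tol:
        z = (a + b) / 2          # midpoint of the current interval
        if f(z) >= 0:
            b = z                # keep the left half [a, z]
        else:
            a = z                # keep the right half [z, b]
    return a, b

def f(x):
    return x * x - 2

left, right = bisect_root(f, 1.0, 2.0)
print(left, right, math.sqrt(2))
```

Applied to f(x) = x² − 2 on [1, 2], the returned endpoints bracket a value close to √2, matching the root discussed earlier in this section.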

The Intermediate Value Property

Does the Intermediate Value Theorem have a converse?

Definition 4.5.3. A function f has the intermediate value property on an interval [a, b] if for all x < y in [a, b] and all L between f(x) and f(y), it is always possible to find a point c ∈ (x, y) where f(c) = L.

Another way to summarize the Intermediate Value Theorem is to say that every continuous function on [a, b] has the intermediate value property. There is an understandable temptation to suspect that any function that has the intermediate value property must necessarily be continuous, but that is not the case. We have seen that

g(x) = sin(1/x) if x ≠ 0,
g(x) = 0 if x = 0

is not continuous at zero (Example 4.2.6), but it does have the intermediate value property on [0, 1]. The intermediate value property does imply continuity if we insist that our function is monotone (Exercise 4.5.4).

Exercises

Exercise 4.5.1. Show how the Intermediate Value Theorem follows as a corollary to Theorem 4.5.2.

Exercise 4.5.2. Decide on the validity of the following conjectures. (a) Continuous functions take bounded open intervals to bounded open intervals. (b) Continuous functions take bounded open intervals to open sets. (c) Continuous functions take bounded closed intervals to bounded closed intervals.

Exercise 4.5.3. Is there a continuous function on all of R with range f(R) equal to Q?

Exercise 4.5.4. A function f is increasing on A if f(x) ≤ f(y) for all x < y in A. Show that the Intermediate Value Theorem does have a converse if we assume f is increasing on [a, b].

Exercise 4.5.5. Finish the proof of the Intermediate Value Theorem using the Axiom of Completeness started previously.

Exercise 4.5.6. Finish the proof of the Intermediate Value Theorem using the Nested Interval Property started previously.

Exercise 4.5.7. Let f be a continuous function on the closed interval [0, 1] with range also contained in [0, 1]. Prove that f must have a fixed point; that is, show f(x) = x for at least one value of x ∈ [0, 1].

Exercise 4.5.8. Imagine a clock where the hour hand and the minute hand are indistinguishable from each other. Assuming the hands move continuously around the face of the clock, and assuming their positions can be measured with perfect accuracy, is it always possible to determine the time?

4.6 Sets of Discontinuity

Given a function f : R → R, define D_f ⊆ R to be the set of points where the function f fails to be continuous. In Section 4.1, we saw that Dirichlet’s function g(x) had D_g = R. The modification h(x) of Dirichlet’s function had D_h = R\{0}, zero being the only point of continuity. Finally, for Thomae’s function t(x), we saw that D_t = Q.

Exercise 4.6.1. Using modifications of these functions, construct a function f : R → R so that (a) D_f = Z. (b) D_f = {x : 0 < x ≤ 1}.

We concluded the introduction with a question about whether D_f could take the form of any arbitrary subset of the real line. As it turns out, this is not the case. The set of discontinuities of a real-valued function on R has a specific topological structure that is not possessed by every subset of R. Specifically, D_f, no matter how f is chosen, can always be written as the countable union of closed sets. In the case where f is monotone, these closed sets can be taken to be single points.

Monotone Functions

Classifying D_f for an arbitrary f is somewhat involved, so it is interesting that describing D_f is fairly straightforward for the class of monotone functions.

Definition 4.6.1. A function f : A → R is increasing on A if f(x) ≤ f(y) whenever x < y and decreasing if f(x) ≥ f(y) whenever x < y in A. A monotone function is one that is either increasing or decreasing.

Continuity of f at a point c means that lim_{x→c} f(x) = f(c). One particular way for a discontinuity to occur is if the limit from the right at c is different from the limit from the left at c. As always with new terminology, we need to be precise about what we mean by “from the left” and “from the right.”

Definition 4.6.2 (Right-hand limit). Given a limit point c of a set A and a function f : A → R, we write

lim_{x→c+} f(x) = L

if for all ε > 0 there exists a δ > 0 such that |f(x) − L| < ε whenever 0 < x − c < δ. Equivalently, in terms of sequences, lim_{x→c+} f(x) = L if lim f(x_n) = L for all sequences (x_n) satisfying x_n > c and lim(x_n) = c.

Exercise 4.6.2. State a similar definition for the left-hand limit

lim_{x→c−} f(x) = L.

Theorem 4.6.3. Given f : A → R and a limit point c of A, lim_{x→c} f(x) = L if and only if

lim_{x→c+} f(x) = L and lim_{x→c−} f(x) = L.

Exercise 4.6.3. Supply a proof for this proposition.

Generally speaking, discontinuities can be divided into three categories:
(i) If lim_{x→c} f(x) exists but has a value different from f(c), the discontinuity at c is called removable.
(ii) If lim_{x→c+} f(x) ≠ lim_{x→c−} f(x), then f has a jump discontinuity at c.
(iii) If lim_{x→c} f(x) does not exist for some other reason, then the discontinuity at c is called an essential discontinuity.

We are now equipped to characterize the set D_f for an arbitrary monotone function f.

Exercise 4.6.4. Let f : R → R be increasing. Prove that lim_{x→c+} f(x) and lim_{x→c−} f(x) must each exist at every point c ∈ R. Argue that the only type of discontinuity a monotone function can have is a jump discontinuity.

Exercise 4.6.5. Construct a bijection between the set of jump discontinuities of a monotone function f and a subset of Q. Conclude that D_f for a monotone function f must either be finite or countable, but not uncountable.

D_f for an Arbitrary Function

Recall that the intersection of an infinite collection of closed sets is closed, but for unions we must restrict ourselves to finite collections of closed sets in order to ensure the union is closed. For open sets the situation is reversed. The arbitrary union of open sets is open, but only finite intersections of open sets are necessarily open.

Definition 4.6.4. A set that can be written as the countable union of closed sets is in the class F_σ. (This definition also appeared in Section 3.5.)

To this point, we have constructed functions where the set of discontinuity has been R (Dirichlet’s function), R\{0} (modified Dirichlet function), Q (Thomae’s function), Z, and (0, 1] (Exercise 4.6.1).

Exercise 4.6.6. Show that in each case we get an F_σ set as the set where each function is discontinuous.

The upcoming argument depends on a concept called α-continuity.

Definition 4.6.5. Let f be defined on R, and let α > 0. The function f is α-continuous at x ∈ R if there exists a δ > 0 such that for all y, z ∈ (x − δ, x + δ) it follows that |f(y) − f(z)| < α.

The most important thing to note about this definition is that there is no “for all” in front of the α > 0. As we will investigate, adding this quantifier would make this definition equivalent to our definition of continuity. In a sense, α-continuity is a measure of the variation of the function in the neighborhood of a particular point. A function is α-continuous at a point c if there is some interval centered at c in which the variation of the function never exceeds the value α > 0.

Given a function f on R, define D_α to be the set of points where the function f fails to be α-continuous. In other words, D_α = {x ∈ R : f is not α-continuous at x}.

Exercise 4.6.7. Prove that, for a fixed α > 0, the set D_α is closed.

The stage is set. It is time to characterize the set of discontinuity for an arbitrary function f on R.

Theorem 4.6.6. Let f : R → R be an arbitrary function. Then, D_f is an F_σ set.

Proof. Recall that D_f = {x ∈ R : f is not continuous at x}.

Exercise 4.6.8. If α_1 < α_2, show that D_{α_2} ⊆ D_{α_1}.

Exercise 4.6.9. Let α > 0 be given. Show that if f is continuous at x, then it is α-continuous at x as well. Explain how it follows that D_α ⊆ D_f.

Exercise 4.6.10. Show that if f is not continuous at x, then f is not α-continuous for some α > 0. Now explain why this guarantees that

D_f = ⋃_{n=1}^{∞} D_{1/n}.

Because each D_{1/n} is closed, the proof is complete.

4.7 Epilogue

Theorem 4.6.6 is only interesting if we can demonstrate that not every subset of R is an F_σ set. This takes some effort and was included as an exercise in Section 3.5 on the Baire Category Theorem. Baire’s Theorem states that if R is written as the countable union of closed sets, then at least one of these sets must contain a nonempty open interval. Now Q is the countable union of singleton points, and we can view each point as a closed set that obviously contains no

128

Chapter 4. Functional Limits and Continuity

intervals. If the set of irrationals I were a countable union of closed sets, it would have to be that none of these closed sets contained any open intervals or else they would then contain some rational numbers. But this leads to a contradiction to Baire’s Theorem. Thus, I is not the countable union of closed sets, and consequently it is not an Fσ set. We may therefore conclude that there is no function f that is continuous at every rational point and discontinuous at every irrational point. This should be compared with Thomae’s function discussed earlier. The converse question is interesting as well. Given an arbitrary Fσ set, W.H. Young showed in 1903 that it is always possible to construct a function that has discontinuities precisely on this set. His construction involves the same Dirichlet-type deﬁnitions we have seen but is understandably more intricate. By contrast, a function demonstrating the converse for the monotone case is not too diﬃcult to describe. Let D = {x1 , x2 , x3 , x4 , . . . } be an arbitrary countable set of real numbers. In order to construct a monotone function that has discontinuities precisely on D, intuitively attach a “weight” of 1/2n to each point xn ∈ D. Now, deﬁne f (x) =

n:xn

1 n 2 <x

where for each x ∈ R the sum is intended to be taken over all of those weights corresponding to points to the left of x. (If there are no points in D to the left of x, then set f (x) = 0.) Any worries about the order of the sum can be alleviated by observing that the convergence is absolute. It is not too hard to show that the resulting function f is monotone and has jump discontinuities of size 1/2n at each point xn in D, as desired (Exercise 6.4.8).
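The weighted-sum construction above can be tried out on a finite piece of $D$. The following is only a sketch under stated assumptions: the three-point list `D` and the name `jump_function` are hypothetical choices made for illustration, and a finite list stands in for the countable set in the text.

```python
from fractions import Fraction

def jump_function(points, x):
    """Sum the weights 1/2^n over those x_n (n = 1, 2, ...) lying strictly to the left of x."""
    return sum(Fraction(1, 2 ** n)
               for n, xn in enumerate(points, start=1)
               if xn < x)

# Hypothetical finite sample of D: x_1 = 0, x_2 = 1, x_3 = 1/2.
D = [Fraction(0), Fraction(1), Fraction(1, 2)]

# Evaluate on either side of each x_n to see the jump of size 1/2^n.
for x in (Fraction(-1), Fraction(0), Fraction(1, 4), Fraction(3, 4), Fraction(2)):
    print(x, jump_function(D, x))
```

Exact rational arithmetic is used so that the jump sizes can be read off without rounding. The values never decrease as `x` moves to the right, which matches the claim that the function is monotone.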

Chapter 5

The Derivative

5.1

Discussion: Are Derivatives Continuous?

The geometric motivation for the derivative is most likely familiar territory. Given a function $g(x)$, the derivative $g'(x)$ is understood to be the slope of the graph of $g$ at each point $x$ in the domain. A graphical picture (Fig. 5.1) reveals the impetus behind the mathematical definition
$$g'(c) = \lim_{x \to c} \frac{g(x) - g(c)}{x - c}.$$

[Figure 5.1: Definition of $g'(c)$. The secant line through $(c, g(c))$ and $(x, g(x))$ has slope $m = (g(x) - g(c))/(x - c)$; the tangent line at $c$ has slope $m = g'(c)$.]

The difference quotient $(g(x) - g(c))/(x - c)$ represents the slope of the line through the two points $(x, g(x))$ and $(c, g(c))$. By taking the limit as $x$ approaches $c$, we arrive at a well-defined mathematical meaning for the slope of the tangent line at $x = c$. The myriad applications of the derivative function are the topic of much of the calculus sequence, as well as several other upper-level courses in mathematics. None of these applied questions are pursued here in any length, but it should be pointed out that the rigorous underpinnings for differentiation worked

out in this chapter are an essential foundation for any applied study. Eventually, as the derivative is subjected to more and more complex manipulations, it becomes crucial to know precisely how differentiation is defined and how it interacts with other mathematical operations.

Although physical applications are not explicitly discussed, we will encounter several questions of a more abstract quality as we develop the theory. Most of these are concerned with the relationship between differentiation and continuity. Are continuous functions always differentiable? If not, how nondifferentiable can a continuous function be? Are differentiable functions continuous? Given that a function $f$ has a derivative at every point in its domain, what can we say about the function $f'$? Is $f'$ continuous? How accurately can we describe the set of all possible derivatives, or are there no restrictions? Put another way, if we are given an arbitrary function $g$, is it always possible to find a differentiable function $f$ such that $f' = g$, or are there some properties that $g$ must possess for this to occur? In our study of continuity, we saw that restricting our attention to monotone functions had a significant impact on the answers to questions about sets of discontinuity. What effect, if any, does this same restriction have on our questions about potential sets of nondifferentiable points? Some of these issues are harder to resolve than others, and some remain unanswered in any satisfactory way.

A particularly useful class of examples for this discussion are functions of the form
$$g_n(x) = \begin{cases} x^n \sin(1/x) & \text{if } x \neq 0 \\ 0 & \text{if } x = 0. \end{cases}$$
When $n = 0$, we have seen (Example 4.2.6) that the oscillations of $\sin(1/x)$ prevent $g_0(x)$ from being continuous at $x = 0$. When $n = 1$, these oscillations are squeezed between $|x|$ and $-|x|$, the result being that $g_1$ is continuous at $x = 0$ (Example 4.3.6). Is $g_1'(0)$ defined? Using the preceding definition, we get
$$g_1'(0) = \lim_{x \to 0} \frac{g_1(x)}{x} = \lim_{x \to 0} \sin(1/x),$$
which, as we now know, does not exist. Thus, $g_1$ is not differentiable at $x = 0$. On the other hand, the same calculation shows that $g_2$ is differentiable at zero. In fact, we have
$$g_2'(0) = \lim_{x \to 0} x \sin(1/x) = 0.$$
At points different from zero, we can use the familiar rules of differentiation (soon to be justified) to conclude that $g_2$ is differentiable everywhere in $\mathbf{R}$ with
$$g_2'(x) = \begin{cases} -\cos(1/x) + 2x \sin(1/x) & \text{if } x \neq 0 \\ 0 & \text{if } x = 0. \end{cases}$$
But now consider
$$\lim_{x \to 0} g_2'(x).$$

Because the $\cos(1/x)$ term is not preceded by a factor of $x$, we must conclude that this limit does not exist and that, consequently, the derivative function is not continuous. To summarize, the function $g_2(x)$ is continuous and differentiable everywhere on $\mathbf{R}$ (Fig. 5.2), the derivative function $g_2'$ is thus defined everywhere on $\mathbf{R}$, but $g_2'$ has a discontinuity at zero. The conclusion is that derivatives need not, in general, be continuous!

[Figure 5.2: The function $g_2(x) = x^2 \sin(1/x)$ near zero.]

The discontinuity in $g_2'$ is essential, meaning $\lim_{x \to 0} g_2'(x)$ does not exist as a one-sided limit. But, what about a function with a simple jump discontinuity? For example, does there exist a function $h$ such that
$$h'(x) = \begin{cases} -1 & \text{if } x \leq 0 \\ 1 & \text{if } x > 0? \end{cases}$$
A first impression may bring to mind the absolute value function, which has slopes of $-1$ at points to the left of zero and slopes of $1$ to the right. However, the absolute value function is not differentiable at zero. We are seeking a function that is differentiable everywhere, including the point zero, where we are insisting that the slope of the graph be $-1$. The degree of difficulty of this request should start to become apparent. Without sacrificing differentiability at any point, we are demanding that the slopes jump from $-1$ to $1$ and not attain any value in between. Although we have seen that continuity is not a required property of derivatives, the intermediate value property will prove a more stubborn quality to ignore.
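The failure of the derivative of $g_2(x) = x^2 \sin(1/x)$ to have a limit at zero can be seen numerically. The sketch below is illustrative only: the function name `g2_prime` and the two sample sequences are choices made here, not part of the text. It evaluates the derivative formula along two sequences tending to zero, one where $\cos(1/x) = 1$ and one where $\cos(1/x) = -1$.

```python
import math

def g2_prime(x):
    """Derivative of g2(x) = x^2 sin(1/x), with g2'(0) = 0, as given in the text."""
    if x == 0:
        return 0.0
    return -math.cos(1 / x) + 2 * x * math.sin(1 / x)

# Along x_k = 1/(2*pi*k) we have cos(1/x_k) = 1, so g2'(x_k) stays near -1.
# Along y_k = 1/((2k+1)*pi) we have cos(1/y_k) = -1, so g2'(y_k) stays near +1.
for k in (10, 100, 1000):
    xk = 1 / (2 * math.pi * k)
    yk = 1 / ((2 * k + 1) * math.pi)
    print(k, g2_prime(xk), g2_prime(yk))
```

Both sequences converge to zero, yet the derivative values settle near two different numbers, so no single limit can exist.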

5.2

Derivatives and the Intermediate Value Property

Although the deﬁnition would technically make sense for more complicated domains, all of the interesting results about the relationship between a function and its derivative require that the domain of the given function be an interval.


Thinking geometrically of the derivative as a rate of change, it should not be too surprising that we would want to confine the independent variable to move about a connected domain. The theory of functional limits from Section 4.2 is all that is needed to supply a rigorous definition for the derivative.

Definition 5.2.1. Let $g : A \to \mathbf{R}$ be a function defined on an interval $A$. Given $c \in A$, the derivative of $g$ at $c$ is defined by
$$g'(c) = \lim_{x \to c} \frac{g(x) - g(c)}{x - c},$$
provided this limit exists. If $g'$ exists for all points $c \in A$, we say that $g$ is differentiable on $A$.

Example 5.2.2. (a) Consider $f(x) = x^n$, where $n \in \mathbf{N}$, and let $c$ be any arbitrary point in $\mathbf{R}$. Using the algebraic identity
$$x^n - c^n = (x - c)(x^{n-1} + c x^{n-2} + c^2 x^{n-3} + \cdots + c^{n-1}),$$
we can calculate the familiar formula
$$f'(c) = \lim_{x \to c} \frac{x^n - c^n}{x - c} = \lim_{x \to c} (x^{n-1} + c x^{n-2} + c^2 x^{n-3} + \cdots + c^{n-1}) = c^{n-1} + c^{n-1} + \cdots + c^{n-1} = n c^{n-1}.$$

(b) If $g(x) = |x|$, then attempting to compute the derivative at $c = 0$ produces the limit
$$g'(0) = \lim_{x \to 0} \frac{|x|}{x},$$
which is $+1$ or $-1$ depending on whether $x$ approaches zero from the right or left. Consequently, this limit does not exist, and we conclude that $g$ is not differentiable at zero.

Example 5.2.2 (b) is a reminder that continuity of $g$ does not imply that $g$ is necessarily differentiable. On the other hand, if $g$ is differentiable at a point, then it is true that $g$ must be continuous at this point.

Theorem 5.2.3. If $g : A \to \mathbf{R}$ is differentiable at a point $c \in A$, then $g$ is continuous at $c$ as well.

Proof. We are assuming that
$$g'(c) = \lim_{x \to c} \frac{g(x) - g(c)}{x - c}$$
exists, and we want to prove that $\lim_{x \to c} g(x) = g(c)$. But notice that the Algebraic Limit Theorem for functional limits allows us to write
$$\lim_{x \to c} (g(x) - g(c)) = \lim_{x \to c} \left( \frac{g(x) - g(c)}{x - c} \right)(x - c) = g'(c) \cdot 0 = 0.$$
It follows that $\lim_{x \to c} g(x) = g(c)$.
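The two computations in Example 5.2.2 can be checked numerically. This is a minimal sketch under stated assumptions: the helper name `difference_quotient`, the choice $n = 3$, and the base point $c = 2$ are all illustrative and do not come from the text.

```python
def difference_quotient(g, c, x):
    """Slope of the secant line through (c, g(c)) and (x, g(x)); requires x != c."""
    return (g(x) - g(c)) / (x - c)

def cube(x):
    return x ** 3

# For f(x) = x^3 at c = 2 the quotients approach n*c^(n-1) = 12.
# For g(x) = |x| at c = 0 the quotient is +1 from the right and -1 from the left.
for h in (1e-1, 1e-3, 1e-5):
    print(h,
          difference_quotient(cube, 2.0, 2.0 + h),
          difference_quotient(abs, 0.0, h),
          difference_quotient(abs, 0.0, -h))
```

The first column of quotients settles toward a single value, while the last two columns stay fixed at different values, which is the numerical counterpart of the two-sided limit failing to exist for $|x|$ at zero.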


Combinations of Differentiable Functions

The Algebraic Limit Theorem (Theorem 2.3.3) led easily to the conclusion that algebraic combinations of continuous functions are continuous. With only slightly more work, we arrive at a similar conclusion for sums, products, and quotients of differentiable functions.

Theorem 5.2.4. Let $f$ and $g$ be functions defined on an interval $A$, and assume both are differentiable at some point $c \in A$. Then,
(i) $(f + g)'(c) = f'(c) + g'(c)$,
(ii) $(kf)'(c) = k f'(c)$, for all $k \in \mathbf{R}$,
(iii) $(fg)'(c) = f'(c)g(c) + f(c)g'(c)$, and
(iv) $(f/g)'(c) = \dfrac{g(c)f'(c) - f(c)g'(c)}{[g(c)]^2}$, provided that $g(c) \neq 0$.

Proof. Statements (i) and (ii) are left as exercises. To prove (iii), we rewrite the difference quotient as
$$\frac{(fg)(x) - (fg)(c)}{x - c} = \frac{f(x)g(x) - f(x)g(c) + f(x)g(c) - f(c)g(c)}{x - c} = f(x)\left[\frac{g(x) - g(c)}{x - c}\right] + g(c)\left[\frac{f(x) - f(c)}{x - c}\right].$$
Because $f$ is differentiable at $c$, it is continuous there and thus $\lim_{x \to c} f(x) = f(c)$. This fact, together with the functional-limit version of the Algebraic Limit Theorem (Theorem 4.2.4), justifies the conclusion
$$\lim_{x \to c} \frac{(fg)(x) - (fg)(c)}{x - c} = f(c)g'(c) + f'(c)g(c).$$
A similar proof of (iv) is possible, or we can use an argument based on the next result. Each of these options is discussed in Exercise 5.2.2.

The composition of two differentiable functions also fortunately results in another differentiable function. This fact is referred to as the chain rule. To discover the proper formula for the derivative of the composition $g \circ f$, we can write
$$(g \circ f)'(c) = \lim_{x \to c} \frac{g(f(x)) - g(f(c))}{x - c} = \lim_{x \to c} \frac{g(f(x)) - g(f(c))}{f(x) - f(c)} \cdot \frac{f(x) - f(c)}{x - c} = g'(f(c)) \cdot f'(c).$$
With a little polish, this string of equations could qualify as a proof except for the pesky fact that the $f(x) - f(c)$ expression causes problems in the denominator if $f(x) = f(c)$ for $x$ values in arbitrarily small neighborhoods of $c$. (The function $g_2(x)$ discussed in Section 5.1 exhibits this behavior near $c = 0$.) The upcoming proof of the chain rule manages to finesse this problem but in content is essentially the argument just given.


Theorem 5.2.5 (Chain Rule). Let $f : A \to \mathbf{R}$ and $g : B \to \mathbf{R}$ satisfy $f(A) \subseteq B$ so that the composition $g \circ f$ is well-defined. If $f$ is differentiable at $c \in A$ and if $g$ is differentiable at $f(c) \in B$, then $g \circ f$ is differentiable at $c$ with $(g \circ f)'(c) = g'(f(c)) \cdot f'(c)$.

Proof. Because $g$ is differentiable at $f(c)$, we know that
$$g'(f(c)) = \lim_{y \to f(c)} \frac{g(y) - g(f(c))}{y - f(c)}.$$
Another way to assert this same fact is to let $d(y)$ be the difference
$$d(y) = \frac{g(y) - g(f(c))}{y - f(c)} - g'(f(c)), \qquad (1)$$
and observe that $\lim_{y \to f(c)} d(y) = 0$. At the moment, $d(y)$ is not defined when $y = f(c)$, but it should seem natural to declare that $d(f(c)) = 0$, so that $d$ is continuous at $f(c)$. Now, we come to the finesse. Equation (1) can be rewritten as
$$g(y) - g(f(c)) = [g'(f(c)) + d(y)](y - f(c)). \qquad (2)$$
Observe that this equation holds for all $y \in B$ including $y = f(c)$. Thus, we are free to substitute $y = f(t)$ for any arbitrary $t \in A$. If $t \neq c$, we can divide equation (2) by $(t - c)$ to get
$$\frac{g(f(t)) - g(f(c))}{t - c} = [g'(f(c)) + d(f(t))]\,\frac{f(t) - f(c)}{t - c}$$
for all $t \neq c$. Because $f$ is continuous at $c$ and $d$ is continuous at $f(c)$, we have $d(f(t)) \to d(f(c)) = 0$ as $t \to c$. Finally, taking the limit as $t \to c$ and applying the Algebraic Limit Theorem yields the desired formula.
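As a sanity check on the chain rule formula, one can compare a numerical derivative of a composition with the product $g'(f(c)) \cdot f'(c)$. The sketch below is an illustration only: the particular functions $f(x) = x^2$ and $g(y) = \sin y$, the point $c = 1$, and the helper `central_difference` are assumptions chosen here, and a symmetric difference quotient is used purely as a numerical convenience.

```python
import math

def central_difference(fn, c, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for the derivative at c."""
    return (fn(c + h) - fn(c - h)) / (2 * h)

def f(x):
    return x ** 2          # f'(x) = 2x

def g(y):
    return math.sin(y)     # g'(y) = cos(y)

c = 1.0
numerical = central_difference(lambda x: g(f(x)), c)   # approximates (g o f)'(c)
chain_rule = math.cos(f(c)) * (2 * c)                  # g'(f(c)) * f'(c)
print(numerical, chain_rule)
```

The two printed numbers agree to many decimal places, which is consistent with the theorem but of course proves nothing on its own.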

Darboux's Theorem

One conclusion from this chapter's introduction is that although continuity is necessary for the derivative to exist, it is not the case that the derivative function itself will always be continuous. Our specific example was $g_2(x) = x^2 \sin(1/x)$, where we set $g_2(0) = 0$. By tinkering with the exponent of the leading $x^2$ factor, it is possible to construct examples of differentiable functions with derivatives that are unbounded, or twice-differentiable functions that have discontinuous second derivatives (Exercise 5.2.5). The underlying principle in all of these examples is that by controlling the size of the oscillations of the original function, we can make the corresponding oscillations of the slopes volatile enough to prevent the existence of the relevant limits.

It is significant that for this class of examples, the discontinuities that arise are never simple jump discontinuities. (A precise definition of "jump discontinuity" is presented in Section 4.6.) We are now ready to confirm our earlier suspicions that, although derivatives do not in general have to be continuous, they do possess the intermediate value property. (See Definition 4.5.3.) This surprising observation is a fairly straightforward corollary to the more obvious observation that differentiable functions attain maximums and minimums at points where the derivative is equal to zero (Fig. 5.3).

[Figure 5.3: The Interior Extremum Theorem. At an interior maximum $c \in (a, b)$, $f'(c) = 0$.]

Theorem 5.2.6 (Interior Extremum Theorem). Let $f$ be differentiable on an open interval $(a, b)$. If $f$ attains a maximum value at some point $c \in (a, b)$ (i.e., $f(c) \geq f(x)$ for all $x \in (a, b)$), then $f'(c) = 0$. The same is true if $f(c)$ is a minimum value.

Proof. Because $c$ is in the open interval $(a, b)$, we can construct two sequences $(x_n)$ and $(y_n)$, which converge to $c$ and satisfy $x_n < c < y_n$ for all $n \in \mathbf{N}$. The fact that $f(c)$ is a maximum implies that $f(y_n) - f(c) \leq 0$ for all $n$, and thus
$$f'(c) = \lim_{n \to \infty} \frac{f(y_n) - f(c)}{y_n - c} \leq 0$$
by the Order Limit Theorem (Theorem 2.3.4). In a similar way,
$$\frac{f(x_n) - f(c)}{x_n - c} \geq 0$$
for each $x_n$ because both numerator and denominator are negative (or the numerator is zero). This implies that
$$f'(c) = \lim_{n \to \infty} \frac{f(x_n) - f(c)}{x_n - c} \geq 0,$$
and therefore $f'(c) = 0$, as desired.

The Interior Extremum Theorem is the fundamental fact behind the use of the derivative as a tool for solving applied optimization problems. This idea, discovered and exploited by Pierre de Fermat, is as old as the derivative itself. In a sense, finding maximums and minimums is arguably why Fermat invented his method of finding slopes of tangent lines. It was 200 years later that the French mathematician Gaston Darboux (1842–1917) pointed out that Fermat's method of finding maximums and minimums carries with it the implication that


if a derivative function attains two distinct values $f'(a)$ and $f'(b)$, then it must also attain every value in between. The noticeably long delay between these discoveries is indicative of the difference between the kinds of mathematical questions that were relevant during these two eras. Whereas Fermat was creating a tool for solving a computational problem, by the middle of the 19th century mathematics had become more introspective. Mathematicians were directing their energies toward understanding their subject for its own sake.

Theorem 5.2.7 (Darboux's Theorem). If $f$ is differentiable on an interval $[a, b]$, and if $\alpha$ satisfies $f'(a) < \alpha < f'(b)$ (or $f'(a) > \alpha > f'(b)$), then there exists a point $c \in (a, b)$ where $f'(c) = \alpha$.

Proof. We first simplify matters by defining a new function $g(x) = f(x) - \alpha x$ on $[a, b]$. Notice that $g$ is differentiable on $[a, b]$ with $g'(x) = f'(x) - \alpha$. In terms of $g$, our hypothesis states that $g'(a) < 0 < g'(b)$, and we hope to show that $g'(c) = 0$ for some $c \in (a, b)$. The remainder of the argument is outlined in Exercise 5.2.6.

Exercises

Exercise 5.2.1. Supply proofs for parts (i) and (ii) of Theorem 5.2.4.

Exercise 5.2.2. (a) Use Definition 5.2.1 to produce the proper formula for the derivative of $f(x) = 1/x$.
(b) Combine the result in part (a) with the chain rule (Theorem 5.2.5) to supply a proof for part (iv) of Theorem 5.2.4.
(c) Supply a direct proof of Theorem 5.2.4 (iv) by algebraically manipulating the difference quotient for $(f/g)$ in a style similar to the proof of Theorem 5.2.4 (iii).

Exercise 5.2.3. By imitating the Dirichlet constructions in Section 4.1, construct a function on $\mathbf{R}$ that is differentiable at a single point.

Exercise 5.2.4. Let
$$f_a(x) = \begin{cases} x^a & \text{if } x \geq 0 \\ 0 & \text{if } x < 0. \end{cases}$$
(a) For which values of $a$ is $f$ continuous at zero?
(b) For which values of $a$ is $f$ differentiable at zero? In this case, is the derivative function continuous?
(c) For which values of $a$ is $f$ twice-differentiable?

Exercise 5.2.5. Let
$$g_a(x) = \begin{cases} x^a \sin(1/x) & \text{if } x \neq 0 \\ 0 & \text{if } x = 0. \end{cases}$$
Find a particular (potentially noninteger) value for $a$ so that
(a) $g_a$ is differentiable on $\mathbf{R}$ but such that $g_a'$ is unbounded on $[0, 1]$.
(b) $g_a$ is differentiable on $\mathbf{R}$ with $g_a'$ continuous but not differentiable at zero.
(c) $g_a$ is differentiable on $\mathbf{R}$ and $g_a'$ is differentiable on $\mathbf{R}$, but such that $g_a''$ is not continuous at zero.

Exercise 5.2.6. (a) Assume that $g$ is differentiable on $[a, b]$ and satisfies $g'(a) < 0 < g'(b)$. Show that there exists a point $x \in (a, b)$ where $g(a) > g(x)$, and a point $y \in (a, b)$ where $g(y) < g(b)$.
(b) Now complete the proof of Darboux's Theorem started earlier.

Exercise 5.2.7. Review the definition of uniform continuity (Definition 4.4.5) and also the content of Theorem 4.4.8, which states that continuous functions on compact sets are uniformly continuous.
(a) Propose a definition for what it should mean to say that $f : A \to \mathbf{R}$ is uniformly differentiable on $A$.
(b) Give an example of a uniformly differentiable function on $[0, 1]$.
(c) Is there a theorem analogous to Theorem 4.4.8 for differentiation? Are functions that are differentiable on a closed interval $[a, b]$ necessarily uniformly differentiable? The class of examples discussed in Section 5.1 may be useful.

Exercise 5.2.8. Decide whether each conjecture is true or false. Provide an argument for those that are true and a counterexample for each one that is false.
(a) If a derivative function is not constant, then the derivative must take on some irrational values.
(b) If $f'$ exists on an open interval, and there is some point $c$ where $f'(c) > 0$, then there exists a $\delta$-neighborhood $V_\delta(c)$ around $c$ in which $f'(x) > 0$ for all $x \in V_\delta(c)$.
(c) If $f$ is differentiable on an interval containing zero and if $\lim_{x \to 0} f'(x) = L$, then it must be that $L = f'(0)$.
(d) Repeat conjecture (c) but drop the assumption that $f'(0)$ necessarily exists. If $f'(x)$ exists for all $x \neq 0$ and if $\lim_{x \to 0} f'(x) = L$, then $f'(0)$ exists and equals $L$.

5.3

The Mean Value Theorem

[Figure 5.4: The Mean Value Theorem. At some $c \in (a, b)$, the tangent slope $f'(c)$ equals the slope $(f(b) - f(a))/(b - a)$ of the line through $(a, f(a))$ and $(b, f(b))$.]

The Mean Value Theorem (Fig. 5.4) makes the geometrically plausible assertion that a differentiable function $f$ on an interval $[a, b]$ will, at some point, attain a slope equal to the slope of the line through the endpoints $(a, f(a))$ and $(b, f(b))$. More tersely put,
$$f'(c) = \frac{f(b) - f(a)}{b - a}$$
for at least one point $c \in (a, b)$.

On the surface, there does not appear to be anything especially remarkable about this observation. Its validity appears undeniable, much like the Intermediate Value Theorem for continuous functions, and its proof is rather short. The ease of the proof, however, is misleading, as it is built on top of some hard-fought accomplishments from the study of limits and continuity. In this regard, the Mean Value Theorem is a kind of reward for a job well done. As we will see, it is a prize of exceptional value. Although the result itself is geometrically obvious, the Mean Value Theorem is the cornerstone for almost every major theorem pertaining to differentiation. We will use it to prove L'Hospital's rules

regarding limits of quotients of differentiable functions. A rigorous analysis of how infinite series of functions behave when differentiated requires the Mean Value Theorem (Theorem 6.4.3), and it is the crucial step in the proof of the Fundamental Theorem of Calculus (Theorem 7.5.1). It is also the fundamental concept underlying Lagrange's Remainder Theorem (Theorem 6.6.1), which approximates the error between a Taylor polynomial and the function that generates it.

The Mean Value Theorem can be stated in various degrees of generality, each one important enough to be given its own special designation. Recall that the Extreme Value Theorem (Theorem 4.4.3) states that continuous functions on compact sets always attain maximum and minimum values. Combining this observation with the Interior Extremum Theorem for differentiable functions (Theorem 5.2.6) yields a special case of the Mean Value Theorem first noted by the mathematician Michel Rolle (1652–1719) (Fig. 5.5).

Theorem 5.3.1 (Rolle's Theorem). Let $f : [a, b] \to \mathbf{R}$ be continuous on $[a, b]$ and differentiable on $(a, b)$. If $f(a) = f(b)$, then there exists a point $c \in (a, b)$ where $f'(c) = 0$.

Proof. Because $f$ is continuous on a compact set, $f$ attains a maximum and a minimum. If both the maximum and minimum occur at the endpoints, then $f$ is necessarily a constant function and $f'(x) = 0$ on all of $(a, b)$. In this case, we can choose $c$ to be any point we like. On the other hand, if either the maximum or minimum occurs at some point $c$ in the interior $(a, b)$, then it follows from the Interior Extremum Theorem (Theorem 5.2.6) that $f'(c) = 0$.

Theorem 5.3.2 (Mean Value Theorem). If $f : [a, b] \to \mathbf{R}$ is continuous on $[a, b]$ and differentiable on $(a, b)$, then there exists a point $c \in (a, b)$ where
$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$

[Figure 5.5: Rolle's Theorem. If $f(a) = f(b)$, then $f'(c) = 0$ at some point $c \in (a, b)$.]

Proof. Notice that the Mean Value Theorem reduces to Rolle's Theorem in the case where $f(a) = f(b)$. The strategy of the proof is to reduce the more general statement to this special case. The equation of the line through $(a, f(a))$ and $(b, f(b))$ is
$$y = \left( \frac{f(b) - f(a)}{b - a} \right)(x - a) + f(a).$$

[Figure: the vertical distance $d(x)$ between the graph of $f$ and the line through $(a, f(a))$ and $(b, f(b))$.]

We want to consider the difference between this line and the function $f(x)$. To this end, let
$$d(x) = f(x) - \left[ \left( \frac{f(b) - f(a)}{b - a} \right)(x - a) + f(a) \right],$$
and observe that $d$ is continuous on $[a, b]$, differentiable on $(a, b)$, and satisfies $d(a) = 0 = d(b)$. Thus, by Rolle's Theorem, there exists a point $c \in (a, b)$ where $d'(c) = 0$. Because
$$d'(x) = f'(x) - \frac{f(b) - f(a)}{b - a},$$
we get
$$0 = f'(c) - \frac{f(b) - f(a)}{b - a},$$
which completes the proof.
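The proof above is not constructive, but for a concrete function one can locate a point $c$ numerically by looking for a zero of $d'(x) = f'(x) - (f(b) - f(a))/(b - a)$. The sketch below is illustrative only. It assumes, beyond what the theorem requires, that $f'$ is continuous and that $d'$ has opposite signs at the two endpoints, so that bisection applies. The name `mean_value_point` and the example $f(x) = x^3$ on $[0, 2]$ are choices made here.

```python
def mean_value_point(f, f_prime, a, b, tol=1e-12):
    """Bisection search for c in (a, b) with f'(c) equal to the slope of the chord.

    Assumes f' is continuous and that d'(x) = f'(x) - slope changes sign on [a, b];
    the Mean Value Theorem itself needs neither assumption.
    """
    slope = (f(b) - f(a)) / (b - a)

    def d_prime(x):
        return f_prime(x) - slope

    lo, hi = a, b
    if d_prime(lo) * d_prime(hi) > 0:
        raise ValueError("bisection needs d' to change sign on [a, b]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if d_prime(lo) * d_prime(mid) <= 0:
            hi = mid      # a sign change lies in [lo, mid]
        else:
            lo = mid      # otherwise it lies in [mid, hi]
    return (lo + hi) / 2

# Example: f(x) = x^3 on [0, 2]. The chord has slope 4, so 3c^2 = 4.
c = mean_value_point(lambda x: x ** 3, lambda x: 3 * x ** 2, 0.0, 2.0)
print(c)
```

For this example the exact answer is $c = \sqrt{4/3}$, which gives a direct check on the output.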


The point has been made that the Mean Value Theorem manages to find its way into nearly every proof of any statement related to the geometrical nature of the derivative. As a simple example, if $f$ is a constant function $f(x) = k$ on some interval $A$, then a straightforward calculation of $f'$ using Definition 5.2.1 shows that $f'(x) = 0$ for all $x \in A$. But how do we prove the converse statement? If we know that a differentiable function $g$ satisfies $g'(x) = 0$ everywhere on $A$, our intuition suggests that we should be able to prove $g(x)$ is constant. It is the Mean Value Theorem that provides us with a way to articulate rigorously what seems geometrically valid.

Corollary 5.3.3. If $g : A \to \mathbf{R}$ is differentiable on an interval $A$ and satisfies $g'(x) = 0$ for all $x \in A$, then $g(x) = k$ for some constant $k \in \mathbf{R}$.

Proof. Take $x, y \in A$ and assume $x < y$. Applying the Mean Value Theorem to $g$ on the interval $[x, y]$, we see that
$$g'(c) = \frac{g(y) - g(x)}{y - x}$$
for some $c \in A$. Now, $g'(c) = 0$, so we conclude that $g(y) = g(x)$. Set $k$ equal to this common value. Because $x$ and $y$ are arbitrary, it follows that $g(x) = k$ for all $x \in A$.

Corollary 5.3.4. If $f$ and $g$ are differentiable functions on an interval $A$ and satisfy $f'(x) = g'(x)$ for all $x \in A$, then $f(x) = g(x) + k$ for some constant $k \in \mathbf{R}$.

Proof. Let $h(x) = f(x) - g(x)$ and apply Corollary 5.3.3 to the differentiable function $h$.

The Mean Value Theorem has a more general form due to Cauchy. It is this generalized version of the theorem that is needed to analyze L'Hospital's rules and Lagrange's Remainder Theorem.

Theorem 5.3.5 (Generalized Mean Value Theorem). If $f$ and $g$ are continuous on the closed interval $[a, b]$ and differentiable on the open interval $(a, b)$, then there exists a point $c \in (a, b)$ where
$$[f(b) - f(a)]\,g'(c) = [g(b) - g(a)]\,f'(c).$$
If $g'$ is never zero on $(a, b)$, then the conclusion can be stated as
$$\frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)}.$$

Proof. This result follows by applying the Mean Value Theorem to the function $h(x) = [f(b) - f(a)]g(x) - [g(b) - g(a)]f(x)$. The details are requested in Exercise 5.3.4.


L'Hospital's Rules

The Algebraic Limit Theorem asserts that when taking a limit of a quotient of functions we can write
$$\lim_{x \to c} \frac{f(x)}{g(x)} = \frac{\lim_{x \to c} f(x)}{\lim_{x \to c} g(x)},$$
provided that each individual limit exists and $\lim_{x \to c} g(x)$ is not zero. If the denominator does converge to zero and the numerator does not, then it is not difficult to argue that the quotient $f(x)/g(x)$ grows in absolute value without bound as $x$ approaches $c$ (Exercise 5.3.9). L'Hospital's rules are named for the Marquis de L'Hospital (1661–1704), who learned the results from his tutor, Johann Bernoulli (1667–1748), and published them in 1696 in what is regarded as the first calculus text. Stated in different levels of generality, they are a favorite tool for handling the indeterminate cases when either numerator and denominator both tend to zero or both tend simultaneously to infinity.

Theorem 5.3.6 (L'Hospital's Rule: 0/0 case). Assume $f$ and $g$ are continuous functions defined on an interval containing $a$, and assume that $f$ and $g$ are differentiable on this interval, with the possible exception of the point $a$. If $f(a) = 0$ and $g(a) = 0$, then
$$\lim_{x \to a} \frac{f'(x)}{g'(x)} = L \quad \text{implies} \quad \lim_{x \to a} \frac{f(x)}{g(x)} = L.$$

Proof. This argument follows from a straightforward application of the Generalized Mean Value Theorem. It is requested as Exercise 5.3.11.

L'Hospital's rule remains true if we replace the assumption that $f(a) = g(a) = 0$ with the hypothesis that $\lim_{x \to a} g(x) = \infty$. To this point we have not been explicit about what it means to say that a limit equals $\infty$. The logical structure of such a definition is precisely the same as it is for finite functional limits. The difference is that rather than trying to force the function to take on values in some small $\epsilon$-neighborhood around a proposed limit, we must show that $g(x)$ eventually exceeds any proposed upper bound. The arbitrarily small $\epsilon > 0$ is replaced by an arbitrarily large $M > 0$.

Definition 5.3.7. Given $g : A \to \mathbf{R}$ and a limit point $c$ of $A$, we say that $\lim_{x \to c} g(x) = \infty$ if, for every $M > 0$, there exists a $\delta > 0$ such that whenever $0 < |x - c| < \delta$ it follows that $g(x) \geq M$. We can define $\lim_{x \to c} g(x) = -\infty$ in a similar way.

The following version of L'Hospital's rule is referred to as the $\infty/\infty$ case, although the hypothesis only requires that the function in the denominator tend to infinity. If the numerator is bounded, then it is a straightforward exercise to prove that the resulting quotient tends to zero (Exercise 5.3.10). The argument for the general case is relatively involved when compared to the 0/0 case. To simplify the notation of the proof, we state the result using a one-sided limit.


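Theorem 5.3.8 (L'Hospital's Rule: $\infty/\infty$ case). Assume $f$ and $g$ are differentiable on $(a, b)$, and that $\lim_{x \to a} g(x) = \infty$ (or $-\infty$). Then
$$\lim_{x \to a} \frac{f'(x)}{g'(x)} = L \quad \text{implies} \quad \lim_{x \to a} \frac{f(x)}{g(x)} = L.$$

Proof. Let $\epsilon > 0$. Because $\lim_{x \to a} \frac{f'(x)}{g'(x)} = L$, there exists a $\delta_1 > 0$ such that
$$\left| \frac{f'(x)}{g'(x)} - L \right| < \frac{\epsilon}{2}$$
for all $a < x < a + \delta_1$. For convenience of notation, let $t = a + \delta_1$ and mentally note that $t$ is fixed for the remainder of the argument. Our functions are not defined at $a$, but for any $a < x < t$ we can apply the Generalized Mean Value Theorem on the interval $[x, t]$ to get
$$\frac{f(x) - f(t)}{g(x) - g(t)} = \frac{f'(c)}{g'(c)}$$
for some $c \in (x, t)$. Our choice of $t$ then implies
$$L - \frac{\epsilon}{2} < \frac{f(x) - f(t)}{g(x) - g(t)} < L + \frac{\epsilon}{2} \qquad (1)$$
for all $x$ in $(a, t)$.

In an effort to isolate the fraction $\frac{f(x)}{g(x)}$, the strategy is to multiply equation (1) by $(g(x) - g(t))/g(x)$. We need to be sure, however, that this quantity is positive, which amounts to insisting that $1 \geq g(t)/g(x)$. Because $t$ is fixed and $\lim_{x \to a} g(x) = \infty$, we can choose $\delta_2 > 0$ so that $g(x) \geq g(t)$ for all $a < x < a + \delta_2$. Carrying out the desired multiplication results in
$$\left( L - \frac{\epsilon}{2} \right)\left( 1 - \frac{g(t)}{g(x)} \right) < \frac{f(x) - f(t)}{g(x)} < \left( L + \frac{\epsilon}{2} \right)\left( 1 - \frac{g(t)}{g(x)} \right),$$
which after some algebraic manipulations yields
$$L - \frac{\epsilon}{2} + \frac{-L g(t) + \frac{\epsilon}{2} g(t) + f(t)}{g(x)} < \frac{f(x)}{g(x)} < L + \frac{\epsilon}{2} + \frac{-L g(t) - \frac{\epsilon}{2} g(t) + f(t)}{g(x)}.$$
Again, let's remind ourselves that $t$ is fixed and that $\lim_{x \to a} g(x) = \infty$. Thus, we can choose a $\delta_3$ such that $a < x < a + \delta_3$ implies that $g(x)$ is large enough to ensure that both
$$\frac{-L g(t) + \frac{\epsilon}{2} g(t) + f(t)}{g(x)} \quad \text{and} \quad \frac{-L g(t) - \frac{\epsilon}{2} g(t) + f(t)}{g(x)}$$
are less than $\epsilon/2$ in absolute value. Putting this all together and choosing $\delta = \min\{\delta_1, \delta_2, \delta_3\}$ guarantees that
$$\left| \frac{f(x)}{g(x)} - L \right| < \epsilon$$
for all $a < x < a + \delta$.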
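A small numerical experiment can make the $\infty/\infty$ case concrete. The example below is an illustration chosen here, not one from the text: take $f(x) = \ln x$ and $g(x) = 1/x$ on $(0, 1)$, so that $g(x) \to \infty$ as $x \to 0$ from the right, and $f'(x)/g'(x) = -x \to 0$. The theorem then predicts that $f(x)/g(x) = x \ln x$ also tends to $0$. The function names are hypothetical.

```python
import math

def quotient(x):
    """f(x)/g(x) with f(x) = ln(x) and g(x) = 1/x; this equals x*ln(x)."""
    return math.log(x) / (1 / x)

def derivative_quotient(x):
    """f'(x)/g'(x) = (1/x) / (-1/x^2), which simplifies to -x."""
    return (1 / x) / (-1 / x ** 2)

# Both columns shrink toward 0 as x approaches 0 from the right.
for x in (1e-2, 1e-4, 1e-8):
    print(x, derivative_quotient(x), quotient(x))
```

The quotient of the functions tends to zero more slowly than the quotient of the derivatives, a reminder that the theorem asserts equality of the limits only, not of the rates of convergence.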


Exercises

Exercise 5.3.1. Recall from Exercise 4.4.9 that a function $f : A \to \mathbf{R}$ is "Lipschitz on $A$" if there exists an $M > 0$ such that
$$\left| \frac{f(x) - f(y)}{x - y} \right| \leq M$$
for all $x \neq y$ in $A$. Show that if $f$ is differentiable on a closed interval $[a, b]$ and if $f'$ is continuous on $[a, b]$, then $f$ is Lipschitz on $[a, b]$.

Exercise 5.3.2. Recall from Exercise 4.3.9 that a function $f$ is contractive on a set $A$ if there exists a constant $0 < s < 1$ such that $|f(x) - f(y)| \leq s|x - y|$ for all $x, y \in A$. Show that if $f$ is differentiable and $f'$ is continuous and satisfies $|f'(x)| < 1$ on a closed interval, then $f$ is contractive on this set.

Exercise 5.3.3. Let $h$ be a differentiable function defined on the interval $[0, 3]$, and assume that $h(0) = 1$, $h(1) = 2$, and $h(3) = 2$.
(a) Argue that there exists a point $d \in [0, 3]$ where $h(d) = d$.
(b) Argue that at some point $c$ we have $h'(c) = 1/3$.
(c) Argue that $h'(x) = 1/4$ at some point in the domain.

Exercise 5.3.4. (a) Supply the details for the proof of Cauchy's Generalized Mean Value Theorem (Theorem 5.3.5).
(b) Give a graphical interpretation of the Generalized Mean Value Theorem analogous to the one given for the Mean Value Theorem at the beginning of Section 5.3. (Consider $f$ and $g$ as parametric equations for a curve.)

Exercise 5.3.5. A fixed point of a function $f$ is a value $x$ where $f(x) = x$. Show that if $f$ is differentiable on an interval with $f'(x) \neq 1$, then $f$ can have at most one fixed point.

Exercise 5.3.6. Let $g : [0, 1] \to \mathbf{R}$ be twice-differentiable (i.e., both $g$ and $g'$ are differentiable functions) with $g''(x) > 0$ for all $x \in [0, 1]$. If $g(0) > 0$ and $g(1) = 1$, show that $g(d) = d$ for some point $d \in (0, 1)$ if and only if $g'(1) > 1$. (This geometrically plausible fact is used in the introductory discussion to Chapter 6.)

Exercise 5.3.7. (a) Recall that a function $f : (a, b) \to \mathbf{R}$ is increasing on $(a, b)$ if $f(x) \leq f(y)$ whenever $x < y$ in $(a, b)$. Assume $f$ is differentiable on $(a, b)$. Show that $f$ is increasing on $(a, b)$ if and only if $f'(x) \geq 0$ for all $x \in (a, b)$.
(b) Show that the function
$$g(x) = \begin{cases} x/2 + x^2 \sin(1/x) & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}$$
is differentiable on $\mathbf{R}$ and satisfies $g'(0) > 0$. Now, prove that $g$ is not increasing over any open interval containing 0.

Exercise 5.3.8. Assume $g : (a, b) \to \mathbf{R}$ is differentiable at some point $c \in (a, b)$. If $g'(c) \neq 0$, show that there exists a $\delta$-neighborhood $V_\delta(c) \subseteq (a, b)$ for which $g(x) \neq g(c)$ for all $x \in V_\delta(c)$ with $x \neq c$. Compare this result with Exercise 5.3.7.

Exercise 5.3.9. Assume that $\lim_{x \to c} f(x) = L$, where $L \neq 0$, and assume $\lim_{x \to c} g(x) = 0$. Show that $\lim_{x \to c} |f(x)/g(x)| = \infty$.

Exercise 5.3.10. Let $f$ be a bounded function and assume $\lim_{x \to c} g(x) = \infty$. Show that $\lim_{x \to c} f(x)/g(x) = 0$.

Exercise 5.3.11. Use the Generalized Mean Value Theorem to furnish a proof of the 0/0 case of L'Hospital's rule (Theorem 5.3.6).

Exercise 5.3.12. Assume $f$ and $g$ are as described in Theorem 5.3.6, but now add the assumption that $f$ and $g$ are differentiable at $a$ and $f'$ and $g'$ are continuous at $a$. Find a short proof for the 0/0 case of L'Hospital's rule under this stronger hypothesis.

Exercise 5.3.13. Review the hypothesis of Theorem 5.3.6. What happens if we do not assume that $f(a) = g(a) = 0$, but assume only that $\lim_{x \to a} f(x) = 0$ and $\lim_{x \to a} g(x) = 0$? Assuming we have a proof for Theorem 5.3.6 as it is written, explain how to construct a valid proof under this slightly weaker hypothesis.

5.4

A Continuous Nowhere-Diﬀerentiable Function

Exploring the relationship between continuity and differentiability has led to both fruitful results and pathological counterexamples. The bulk of discussion to this point has focused on the continuity of derivatives, but historically a significant amount of debate revolved around the question of whether continuous functions were necessarily differentiable. Early in the chapter, we saw that continuity was a requirement for differentiability, but, as the absolute value function demonstrates, the converse of this proposition is not true. A function can be continuous but not differentiable at some point.

But just how nondifferentiable can a continuous function be? Given a finite set of points, it is not difficult to imagine how to construct a graph with corners at each of these points, so that the corresponding function fails to be differentiable on this finite set. The trick gets more difficult, however, when the set becomes infinite. For instance, is it possible to construct a function that is continuous on all of $\mathbf{R}$ but fails to be differentiable at every rational point?

Not only is this possible, but the situation is even more delightful. In 1872, Karl Weierstrass presented an example of a continuous function that was not differentiable at any point. (It seems that Bernhard Bolzano had his own example of such a beast as early as 1830, but it was not published until much later.) Weierstrass actually discovered a class of nowhere-differentiable functions of the form
\[ f(x) = \sum_{n=0}^{\infty} a^n \cos(b^n x), \]
where the values of $a$ and $b$ are carefully chosen. Such functions are specific examples of Fourier series discussed in Section 8.3. The details of Weierstrass' argument are simplified if we replace the cosine function with a piecewise linear function that has oscillations qualitatively like $\cos(x)$. Define $h(x) = |x|$ on the interval $[-1, 1]$ and extend the definition of $h$ to all of $\mathbf{R}$ by requiring that $h(x + 2) = h(x)$. The result is a periodic "sawtooth" function (Fig. 5.6).

Figure 5.6: The function $h(x)$.

Exercise 5.4.1. Sketch a graph of $(1/2)h(2x)$ on $[-2, 3]$. Give a qualitative description of the functions
\[ h_n(x) = \frac{1}{2^n} h(2^n x) \]
as $n$ gets larger.

Now, define
\[ g(x) = \sum_{n=0}^{\infty} h_n(x) = \sum_{n=0}^{\infty} \frac{1}{2^n} h(2^n x). \]

The claim is that g(x) is continuous on all of R but fails to be diﬀerentiable at any point.
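The definitions of $h$, $h_n$, and the partial sums of $g$ translate directly into a few lines of code. What follows is a minimal Python sketch, not part of the original text; the function names `h`, `h_n`, and `g_partial` are illustrative choices. Because $0 \le h \le 1$, the terms omitted from a partial sum that stops at index $m$ add up to at most $\sum_{n > m} 2^{-n} = 2^{-m}$, so a few dozen terms already approximate $g(x)$ to floating-point accuracy.

```python
def h(x):
    """Sawtooth: h(x) = |x| on [-1, 1], extended to all reals with period 2."""
    r = (x + 1) % 2 - 1      # shift x by a multiple of 2 into [-1, 1)
    return abs(r)


def h_n(n, x):
    """The scaled sawtooth h_n(x) = (1/2^n) h(2^n x)."""
    return h(2**n * x) / 2**n


def g_partial(m, x):
    """Partial sum g_m(x) = h_0(x) + h_1(x) + ... + h_m(x)."""
    return sum(h_n(n, x) for n in range(m + 1))


if __name__ == "__main__":
    # Compare a short and a long partial sum at a few sample points.
    for x in (0.0, 0.3, 1.0, 2.5):
        print(x, g_partial(10, x), g_partial(40, x))
```

Plotting `g_partial(m, x)` for increasing `m` over a fine grid gives a picture of how the corners of the individual $h_n$ accumulate in the limit.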

Infinite Series of Functions and Continuity

The definition of $g(x)$ is a significant departure from the way we usually define functions. For each $x \in \mathbf{R}$, $g(x)$ is defined to be the value of an infinite series.

Exercise 5.4.2. Fix $x \in \mathbf{R}$. Argue that the series
\[ \sum_{n=0}^{\infty} \frac{1}{2^n} h(2^n x) \]
converges absolutely and thus $g(x)$ is properly defined.

Figure 5.7: A sketch of $g(x) = \sum_{n=0}^{\infty} (1/2^n) h(2^n x)$.

Exercise 5.4.3. Taking the continuity of $h(x)$ as given, reference the proper theorems from Chapter 4 that imply that the finite sum
\[ g_m(x) = \sum_{n=0}^{m} \frac{1}{2^n} h(2^n x) \]
is continuous on $\mathbf{R}$.

This brings us to an archetypical question in analysis: When do conclusions that are valid in finite settings extend to infinite ones? A finite sum of continuous functions is certainly continuous, but does this necessarily hold for an infinite sum of continuous functions? In general, we will see that this is not always the case. For this particular sum, however, the continuity of the limit function $g(x)$ can be proved. Deciphering when results about finite sums of functions extend to infinite sums is one of the fundamental themes of Chapter 6. Although a self-contained argument for the continuity of $g$ is not beyond our means at this point, we will nevertheless postpone the proof (Exercise 6.4.4), leaving it as an enticement for the upcoming study of uniform convergence (or as an exercise for those who have already covered it).

Nondifferentiability

With the proper tools in place, the proof that $g$ is continuous is quite straightforward. The more difficult task is to show that $g$ is not differentiable at any point in $\mathbf{R}$. Let's first look at the point $x = 0$. Our function $g$ does not appear to be differentiable here (Fig. 5.7), and a rigorous proof is not too difficult. Consider the sequence $x_m = 1/2^m$, where $m = 0, 1, 2, \ldots$.

Exercise 5.4.4. Show that
\[ \frac{g(x_m) - g(0)}{x_m - 0} = m + 1, \]
and use this to prove that $g'(0)$ does not exist.

Any temptation to say something like $g'(0) = \infty$ should be resisted. Setting $x_m = -(1/2^m)$ in the previous argument produces difference quotients heading toward $-\infty$. The geometric manifestation of this is the "cusp" that appears at $x = 0$ in the graph of $g$.

Exercise 5.4.5. (a) Modify the previous argument to show that $g'(1)$ does not exist. Show that $g'(1/2)$ does not exist. (b) Show that $g'(x)$ does not exist for any rational number of the form $x = p/2^k$ where $p \in \mathbf{Z}$ and $k \in \mathbf{N} \cup \{0\}$.

The points described in Exercise 5.4.5 (b) are called "dyadic" points. If $x = p/2^k$ is a dyadic rational number, then the function $h_n$ has a corner at $x$ as long as $n \ge k$. Thus, it should not be too surprising that $g$ fails to be differentiable at points of this form. The argument is more intricate at points between the dyadic points. Assume $x$ is not a dyadic number. For a fixed value of $m \in \mathbf{N} \cup \{0\}$, $x$ falls between two adjacent dyadic points,
\[ \frac{p}{2^m} < x < \frac{p+1}{2^m}. \]
Set $x_m = p/2^m$ and $y_m = (p + 1)/2^m$. Repeating this for each $m$ yields two sequences $(x_m)$ and $(y_m)$ satisfying
\[ \lim x_m = \lim y_m = x \quad \text{and} \quad x_m < x < y_m. \]
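The difference quotients at $x = 0$ can be computed exactly rather than estimated. The sketch below is an illustration added here, assuming Python's standard `fractions` module; the names `g_dyadic` and `quotient_at_zero` are hypothetical. It relies on the observation that at a dyadic point $x = p/2^k$ the number $2^n x$ is an even integer whenever $n > k$, so every term of the series beyond index $k$ vanishes and $g(x)$ is a finite sum of rationals.

```python
from fractions import Fraction


def h(x):
    """Sawtooth: h(x) = |x| on [-1, 1], extended to all reals with period 2."""
    return abs((x + 1) % 2 - 1)


def g_dyadic(p, k):
    """Exact value of g at the dyadic point x = p / 2^k.

    For n > k the number 2^n x is an even integer, so h(2^n x) = 0 and the
    infinite series reduces to the finite sum over n = 0, 1, ..., k.
    """
    x = Fraction(p, 2**k)
    return sum(h(2**n * x) / 2**n for n in range(k + 1))


def quotient_at_zero(m, sign=1):
    """Difference quotient (g(x_m) - g(0)) / (x_m - 0) with x_m = sign / 2^m."""
    x_m = Fraction(sign, 2**m)
    return (g_dyadic(sign, m) - g_dyadic(0, 0)) / x_m


if __name__ == "__main__":
    for m in range(6):
        print(m, quotient_at_zero(m), quotient_at_zero(m, sign=-1))
```

The quotients along $x_m = 1/2^m$ and along $x_m = -1/2^m$ move in opposite directions without bound, which is the numerical face of the cusp at the origin. A computation like this is evidence, not a proof; the exercises still ask for the argument.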

Exercise 5.4.6. (a) Without working too hard, explain why the partial sum $g_m = h_0 + h_1 + \cdots + h_m$ is differentiable at $x$. Now, prove that, for every value of $m$, we have
\[ |g'_{m+1}(x) - g'_m(x)| = 1. \]
(b) Prove the two inequalities
\[ \frac{g(y_m) - g(x)}{y_m - x} < g'_m(x) < \frac{g(x_m) - g(x)}{x_m - x}. \]
(c) Use parts (a) and (b) to show that $g'(x)$ does not exist.

Weierstrass' original 1872 paper contained a demonstration that the infinite sum
\[ f(x) = \sum_{n=0}^{\infty} a^n \cos(b^n x) \]
defined a continuous nowhere-differentiable function provided $0 < a < 1$ and $b$ was an odd integer satisfying $ab > 1 + 3\pi/2$. The condition on $a$ is easy to understand. If $0 < a < 1$, then $\sum_{n=0}^{\infty} a^n$ is a convergent geometric series, and the forthcoming Weierstrass M-Test (Theorem 6.4.5) can be used to conclude that $f$ is continuous. The restriction on $b$ is more mysterious. In 1916, G.H. Hardy extended Weierstrass' result to include any value of $b$ for which $ab \ge 1$. Without looking at the details of either of these arguments, we nevertheless get a sense that the lack of a derivative is intricately tied to the relationship between the compression factor (the parameter $a$) and the rate at which the frequency of the oscillations increases (the parameter $b$).

Exercise 5.4.7. Review the argument for the nondifferentiability of $g(x)$ at nondyadic points. Does the argument still work if we replace $g(x)$ with the summation $\sum_{n=0}^{\infty} (1/2^n) h(3^n x)$? Does the argument work for the function $\sum_{n=0}^{\infty} (1/3^n) h(2^n x)$?
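The two parameter conditions quoted above for the cosine series are easy to test for a concrete pair $(a, b)$. The following Python sketch is an added illustration; the helper names are hypothetical, and `hardy_condition` encodes only the inequality as it is stated in this section. Since $1 + 3\pi/2 \approx 5.71$, the pair $a = 1/2$, $b = 13$ meets Weierstrass' original requirement, while $a = 1/2$, $b = 3$ meets only the weaker one.

```python
import math


def weierstrass_condition(a, b):
    """Weierstrass' 1872 hypotheses: 0 < a < 1, b an odd integer, ab > 1 + 3*pi/2."""
    is_odd_integer = (b == int(b)) and (int(b) % 2 == 1)
    return 0 < a < 1 and is_odd_integer and a * b > 1 + 1.5 * math.pi


def hardy_condition(a, b):
    """The weaker requirement attributed to Hardy in the text: 0 < a < 1 and ab >= 1."""
    return 0 < a < 1 and a * b >= 1


def weierstrass_partial(a, b, N, x):
    """Partial sum of a^n cos(b^n x) for n = 0, 1, ..., N.

    For large n the argument b^n x is huge and the cosine is evaluated with
    little accuracy, but the factor a^n keeps those terms small.
    """
    return sum(a**n * math.cos(b**n * x) for n in range(N + 1))
```

Every partial sum is bounded in absolute value by the geometric series $\sum a^n = 1/(1 - a)$, which is the estimate behind the continuity claim.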

5.5

Epilogue

Far from being an anomaly to be relegated to the margins of our understanding of continuous functions, Weierstrass' example and those like it should actually serve as a guide to our intuition. The image of continuity as a smooth curve in our mind's eye severely misrepresents the situation and is the result of a bias stemming from an overexposure to the much smaller class of differentiable functions. The lesson here is that continuity is a strictly weaker notion than differentiability. In Section 3.6, we alluded to a corollary of the Baire Category Theorem, which asserts that Weierstrass' construction is actually typical of continuous functions. We will see that most continuous functions are nowhere-differentiable, so that it is really the differentiable functions that are the exceptions rather than the rule. The details of how to phrase this observation more rigorously are spelled out in Section 8.2.

To say that the nowhere-differentiable function $g$ constructed in the previous section has "corners" at every point of its domain slightly misses the mark. Weierstrass' original class of nowhere-differentiable functions was constructed from infinite sums of smooth trigonometric functions. It is the densely nested oscillating structure that makes the definition of a tangent line impossible.

So what happens when we restrict our attention to monotone functions? How nondifferentiable can an increasing function be? Given a finite set of points, it is not difficult to piece together a monotone function which has actual corners (and thus is not differentiable) at each point in the given set. A natural question is whether there exists a continuous, monotone function that is nowhere-differentiable. Weierstrass suspected that such a function existed but only managed to produce an example of a continuous, increasing function which failed to be differentiable on a countable dense set (Exercise 7.5.11).
In 1903, the French mathematician Henri Lebesgue (1875–1941) demonstrated that Weierstrass' intuition had failed on this account. Lebesgue proved that a continuous, monotone function would have to be differentiable at "almost" every point in its domain. To be specific, Lebesgue showed that, for every $\epsilon > 0$, the set of points where such a function fails to be differentiable can be covered by a countable union of intervals whose lengths sum to a quantity less than $\epsilon$. This notion of "zero length," or "measure zero" as it is called, was encountered in our discussion of the Cantor set and is explored more fully in Section 7.6, where Lebesgue's substantial contribution to the theory of integration is discussed.

With the relationship between the continuity of $f$ and the existence of $f'$ somewhat in hand, we once more return to the question of characterizing the set of all derivatives. Not every function is a derivative. Darboux's Theorem forces us to conclude that there are some functions (those with jump discontinuities in particular) that cannot appear as the derivative of some other function. Another way to phrase Darboux's Theorem is to say that all derivatives must satisfy the intermediate value property. Continuous functions do possess the intermediate value property, and it is natural to ask whether every continuous function is necessarily a derivative. For this smaller class of functions, the answer is yes. The Fundamental Theorem of Calculus, treated in Chapter 7, states that, given a continuous function $f$, the function $F(x) = \int_a^x f$ satisfies $F' = f$. This does the trick. The collection of derivatives at least contains the continuous functions. The search for a concise characterization of all possible derivatives, however, remains largely unsuccessful.

As a final remark, we will see that by cleverly choosing $f$, this technique of defining $F$ via $F(x) = \int_a^x f$ can be used to produce examples of continuous functions which fail to be differentiable on interesting sets, provided we can show that $\int_a^x f$ is defined. The question of just how to define integration became a central theme in analysis in the latter half of the 19th century and has continued on to the present. Much of this story is discussed in detail in Chapter 7 and Section 8.1.

Chapter 6

Sequences and Series of Functions

6.1

Discussion: Branching Processes

The fact that polynomial functions are so ubiquitous in both pure and applied analysis can be attributed to any number of reasons. They are continuous, infinitely differentiable, and defined on all of $\mathbf{R}$. They are easy to evaluate and easy to manipulate, both from the points of view of algebra (adding, multiplying, factoring) and calculus (integrating, differentiating). It should be no surprise, then, that even in the earliest stages of the development of calculus, mathematicians experimented with the idea of extending the notion of polynomials to functions that are essentially polynomials of infinite degree. Such objects are called power series, and are formally denoted by
\[ \sum_{n=0}^{\infty} a_n x^n = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots. \]

The basic dilemma from the point of view of analysis is deciphering when the desirable qualities of the approximating functions (the polynomials in this case) are passed on to the limit (the power series). To put the discussion in a more concrete context, let's look at a particular problem from the theory of probability.

In 1873, Francis Galton asked the London Mathematical Society to consider the problem of the survival of surnames (which at that time were passed to succeeding generations exclusively by adult male children). "Assume," Galton said, "that the law of population is such that, in each generation, $p_0$ percent of the adult males have no male children who reach adult life; $p_1$ have one such male child; $p_2$ percent have two; and so on... Find [the probability that] the surname will become extinct after $r$ generations." We should add (or make explicit) the assumption that the lives of each offspring, and the descendants thereof, proceed independently of the fortunes of the rest of the family.

Galton asks for the probability of extinction after $r$ generations, which we will call $d_r$. If we begin with one parent, then $d_1 = p_0$. If $p_0 = 0$, then $d_r$ will clearly equal 0 for all generations $r$. To keep the problem interesting, we will insist that from here on $p_0 > 0$. Now, $d_2$, whatever it equals, will certainly satisfy $d_1 \le d_2$ because if the population is extinct after one generation it will remain so after two. By this reasoning, we have a monotone sequence
\[ d_1 \le d_2 \le d_3 \le d_4 \le \cdots, \]
which, because we are dealing with probabilities, is bounded above by 1. By the Monotone Convergence Theorem, the sequence converges, and we can let
\[ d = \lim_{r \to \infty} d_r \]
be the probability that the surname eventually goes extinct at any time in the future. Knowing it exists, our task is to find $d$.

The truly clever step in the solution is to define the function
\[ G(x) = p_0 + p_1 x + p_2 x^2 + p_3 x^3 + \cdots. \]
In the case of producing male offspring, it seems safe to assume that this sum terminates after five or six terms, because nature would have it that $p_n = 0$ for all values of $n$ beyond this point. However, if we were studying neutrons in a nuclear reactor, or heterozygotes carrying a mutant gene (as is often the case with the theory of branching processes), then the notion of an infinite sum becomes a more attractive model. The point is this: We will proceed with reckless abandon and treat the function $G(x)$ as though it were a familiar polynomial of finite degree. At the end of the computations, however, we will have to again become well-trained analysts and be prepared to justify the manipulations we have made under the hypothesis that $G(x)$ represents an infinite sum for each value of $x$.

The critical observation is that $G(d_r) = d_{r+1}$. The way to understand this is to view the expression
\[ G(d_r) = p_0 + p_1 d_r + p_2 d_r^2 + p_3 d_r^3 + \cdots \]
as a sum of the probabilities for different distinct ways extinction could occur in $r + 1$ generations based on what happens after the first generational step. Specifically, $p_0$ is the probability that the initial parent has no offspring and so still has none after $r + 1$ generations. The term $p_1 d_r$ is the probability that the initial parent has one male child times the probability that this child's own lineage dies out after $r$ generations. Thus, the probability $p_1 d_r$ is another contribution toward the probability of extinction in $r + 1$ steps. The third term represents the probability that the initial parent has two children and that the surnames of each of these two children die out within $r$ generations. Continuing in this way, we see that every possible scenario for extinction in $r + 1$ steps is accounted for exactly once within the sum $G(d_r)$. By the definition of $d_{r+1}$, we get $G(d_r) = d_{r+1}$.

Now for some analysis. If we take the limit as $r \to \infty$ on each side of the equation $G(d_r) = d_{r+1}$, then on the right-hand side we get $\lim d_{r+1} = d$. Assuming $G$ is continuous, we have
\[ d = \lim_{r \to \infty} d_{r+1} = \lim_{r \to \infty} G(d_r) = G(d). \]
The conclusion that $d = G(d)$ means that the point $d$ is a fixed point of $G$. It can be located graphically by finding where the graph of $G$ intersects the line $y = x$.

[Figure: the graph of $G$ on $[0, 1]$, starting at height $p_0$, drawn against the line $y = x$ in two cases. Graph (i): $G'(1) \le 1$. Graph (ii): $G'(1) > 1$, with a fixed point $d$ marked in $(0, 1)$.]

It is always the case that
\[ G(1) = p_0 + p_1 + p_2 + p_3 + \cdots = 1 \]
because the probabilities $(p_k)$ form a complete distribution. But $d = 1$ is not necessarily the only candidate for a solution to $G(d) = d$. Graph (ii) illustrates a scenario in which $G$ has another fixed point in the interval $(0, 1)$ in addition to $x = 1$. Treating $G$ as though it were a polynomial, we differentiate term-by-term to get
\[ G'(x) = p_1 + 2p_2 x + 3p_3 x^2 + 4p_4 x^3 + \cdots \]
and
\[ G''(x) = 2p_2 + 6p_3 x + 12p_4 x^2 + \cdots. \]
On the interval $[0, 1]$, every term in $G'$ and $G''$ is nonnegative, which means $G$ is an increasing, convex function from $G(0) = p_0 > 0$ up to $G(1) = 1$. This suggests that the two preceding graphs form a rather complete picture of the possibilities for the behavior of $G$ with regard to fixed points. Of particular interest is graph (ii), where the graph of $y = x$ intersects $G$ twice in $[0, 1]$. Using the Mean Value Theorem, we can prove (Exercise 5.3.6) that $G(d) = d$ for some other point $d \in (0, 1)$ if and only if $G'(1) > 1$. Now,
\[ G'(1) = p_1 + 2p_2 + 3p_3 + 4p_4 + \cdots \]
has a very interesting interpretation within the language of probability. The sum is a weighted average, where in each term we have multiplied the number of male children by the probability of actually producing this particular number. The result is a value for the expected number of male offspring from a given parent. Said another way, $G'(1)$ is the average number of male children produced by the parents in this particular family tree. It is not difficult to argue that $(d_r)$ will converge to the smallest solution to $G(d) = d$ on $[0, 1]$ (Exercise 6.5.12), and so we arrive at the following conclusion. If each parent produces, on average, more than one male child, then there is a positive probability that the surname will survive. The equation $G(d) = d$ will have a unique solution in $(0, 1)$, and $1 - d$ represents the probability that the surname does not become extinct. On the other hand, if the expected number of male offspring per parent is one or less than one, then extinction occurs with probability one.

The implications of these results on nuclear reactions and the spread of cancer are fascinating topics for another time. What is of concern to us here is whether our manipulations of $G(x)$ are justified. The assumption that $\sum_{n=0}^{\infty} p_n = 1$ guarantees that $G$ is at least defined at $x = 1$. The point $x = 0$ poses no problem, but is $G$ necessarily well-defined for $0 < x < 1$? If so, how might we prove that $G$ is continuous on this set? Differentiable? Twice-differentiable? If $G$ is differentiable, can we compute the derivative by naively differentiating each term of the series? Our initial attack on these questions will require us to focus attention on the interval $[0, 1)$. Some interesting subtleties arise when we try to extend our results to include the endpoint $x = 1$.
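The recursion $d_1 = p_0$, $d_{r+1} = G(d_r)$ from the discussion above can be iterated directly when only finitely many $p_k$ are nonzero. The following Python sketch is an added illustration under that finiteness assumption; the function names and the two sample distributions are made up for the example and do not come from the text.

```python
def G(p, x):
    """Generating function G(x) = p[0] + p[1] x + p[2] x^2 + ... for a finite list p."""
    return sum(p_k * x**k for k, p_k in enumerate(p))


def mean_offspring(p):
    """G'(1) = p[1] + 2 p[2] + 3 p[3] + ..., the expected number of male children."""
    return sum(k * p_k for k, p_k in enumerate(p))


def extinction_probabilities(p, generations):
    """Return [d_1, ..., d_generations] using d_1 = p[0] and d_{r+1} = G(d_r)."""
    d = [p[0]]
    for _ in range(generations - 1):
        d.append(G(p, d[-1]))
    return d


if __name__ == "__main__":
    supercritical = [0.25, 0.25, 0.5]   # hypothetical distribution with G'(1) = 1.25
    subcritical = [0.5, 0.25, 0.25]     # hypothetical distribution with G'(1) = 0.75
    for p in (supercritical, subcritical):
        print(mean_offspring(p), extinction_probabilities(p, 200)[-1])
```

For the first distribution, $G(x) = x$ reduces to $2x^2 - 3x + 1 = 0$, with roots $1/2$ and $1$, and the iterates settle at the smaller root. For the second, the roots of $G(x) = x$ are $1$ and $2$, so the only fixed point in $[0, 1]$ is $1$ and the iterates climb toward certain extinction.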

6.2

Uniform Convergence of a Sequence of Functions

Just as in Chapter 2, we will initially concern ourselves with the behavior and properties of converging sequences of functions. Because convergence of infinite sums is defined in terms of the associated sequence of partial sums, the results from our study of sequences will be immediately applicable to the questions we have raised about power series and about infinite series of functions in general.

Pointwise Convergence

Definition 6.2.1. For each $n \in \mathbf{N}$, let $f_n$ be a function defined on a set $A \subseteq \mathbf{R}$. The sequence $(f_n)$ of functions converges pointwise on $A$ to a function $f : A \to \mathbf{R}$ if, for all $x \in A$, the sequence of real numbers $f_n(x)$ converges to $f(x)$.

In this case, we write $f_n \to f$, $\lim f_n = f$, or $\lim_{n \to \infty} f_n(x) = f(x)$. This last expression is helpful if there is any confusion as to whether $x$ or $n$ is the limiting variable.

Example 6.2.2. (i) Consider
\[ f_n(x) = (x^2 + nx)/n \]
on all of $\mathbf{R}$. Graphs of $f_1$, $f_5$, $f_{10}$, and $f_{20}$ (Fig. 6.1) give an indication of what is happening as $n$ gets larger. Algebraically, we can compute
\[ \lim_{n \to \infty} f_n(x) = \lim_{n \to \infty} \frac{x^2 + nx}{n} = \lim_{n \to \infty} \left( \frac{x^2}{n} + x \right) = x. \]
Thus, $(f_n)$ converges pointwise to $f(x) = x$ on $\mathbf{R}$.

Figure 6.1: $f_1$, $f_5$, $f_{10}$, and $f_{20}$ where $f_n = (x^2 + nx)/n$.

(ii) Let $g_n(x) = x^n$ on the set $[0, 1]$, and consider what happens as $n$ tends to infinity (Fig. 6.2). If $0 \le x < 1$, then we have seen that $x^n \to 0$. On the other hand, if $x = 1$, then $x^n \to 1$. It follows that $g_n \to g$ pointwise on $[0, 1]$, where
\[ g(x) = \begin{cases} 0 & \text{for } 0 \le x < 1 \\ 1 & \text{for } x = 1. \end{cases} \]

Figure 6.2: $g(x) = \lim_{n \to \infty} x^n$ is not continuous on $[0, 1]$.

Figure 6.3: $h_n \to |x|$ on $[-1, 1]$; limit is not differentiable.

(iii) Consider $h_n(x) = x^{1 + \frac{1}{2n-1}}$ on the set $[-1, 1]$ (Fig. 6.3). For a fixed $x \in [-1, 1]$ we have
\[ \lim_{n \to \infty} h_n(x) = x \lim_{n \to \infty} x^{\frac{1}{2n-1}} = |x|. \]

Examples 6.2.2 (ii) and (iii) are our ﬁrst indication that there is some diﬃcult work ahead of us. The central theme of this chapter is analyzing which properties the limit function inherits from the approximating sequence. In Example 6.2.2 (iii) we have a sequence of diﬀerentiable functions converging pointwise to a limit that is not diﬀerentiable at the origin. In Example 6.2.2 (ii), we see an even more fundamental problem of a sequence of continuous functions converging to a limit that is not continuous.

Continuity of the Limit Function

With Example 6.2.2 (ii) firmly in mind, we begin this discussion with a doomed attempt to prove that the pointwise limit of continuous functions is continuous. Upon discovering the problem in the argument, we will be in a better position to understand the need for a stronger notion of convergence for sequences of functions.

Assume $(f_n)$ is a sequence of continuous functions on a set $A \subseteq \mathbf{R}$, and assume $(f_n)$ converges pointwise to a limit $f$. To argue that $f$ is continuous, fix a point $c \in A$, and let $\epsilon > 0$. We need to find a $\delta > 0$ such that
\[ |x - c| < \delta \quad \text{implies} \quad |f(x) - f(c)| < \epsilon. \]
By the triangle inequality,
\[ |f(x) - f(c)| = |f(x) - f_n(x) + f_n(x) - f_n(c) + f_n(c) - f(c)| \le |f(x) - f_n(x)| + |f_n(x) - f_n(c)| + |f_n(c) - f(c)|. \]

(We should really call this the "quadrilateral inequality" because we are using three joined "sides" as an overestimate for the length of the fourth.) Our first, optimistic impression is that each term in the sum on the right-hand side can be made small: the first and third by the fact that $f_n \to f$, and the middle term by the continuity of $f_n$. In order to use the continuity of $f_n$, we must first establish which particular $f_n$ we are talking about. Because $c \in A$ is fixed, choose $N \in \mathbf{N}$ so that
\[ |f_N(c) - f(c)| < \frac{\epsilon}{3}. \]
Now that $N$ is chosen, the continuity of $f_N$ implies that there exists a $\delta > 0$ such that
\[ |f_N(x) - f_N(c)| < \frac{\epsilon}{3} \]
for all $x$ satisfying $|x - c| < \delta$. But here is the problem. We also need
\[ |f_N(x) - f(x)| < \frac{\epsilon}{3} \quad \text{for all} \quad |x - c| < \delta. \]
The values of $x$ depend on $\delta$, which depends on the choice of $N$. Thus, we cannot go back and simply choose a different $N$. More to the point, the variable $x$ is not fixed the way $c$ is in this discussion but represents any point in the interval $(c - \delta, c + \delta)$. Pointwise convergence implies that we can make $|f_n(x) - f(x)| < \epsilon/3$ for large enough values of $n$, but the value of $n$ depends on the point $x$. It is possible that different values for $x$ will result in the need for different (larger) choices for $n$. This phenomenon is apparent in Example 6.2.2 (ii). To achieve the inequality
\[ |g_n(1/2) - g(1/2)| < \frac{1}{3}, \]
we need $n \ge 2$, whereas
\[ |g_n(9/10) - g(9/10)| < \frac{1}{3} \]
is true only after $n \ge 11$.
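The dependence of the required index on the point $x$ can be tabulated for $g_n(x) = x^n$ from Example 6.2.2 (ii). The sketch below is an added illustration in Python; the helper name `first_index_within` is hypothetical. For $0 \le x < 1$ the powers $x^n$ decrease in $n$, so the first index at which $x^n$ drops below the tolerance works for every later index as well.

```python
def first_index_within(x, eps):
    """Smallest n >= 1 with |x^n - 0| < eps, for a fixed 0 <= x < 1 and eps > 0."""
    n = 1
    while x**n >= eps:
        n += 1
    return n


if __name__ == "__main__":
    for x in (0.5, 0.9, 0.99, 0.999):
        print(x, first_index_within(x, 1 / 3))
```

As $x$ approaches 1 the required index grows without bound, so no single $N$ serves every $x \in [0, 1)$ at once.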

Uniform Convergence

To resolve this dilemma, we define a new, stronger notion of convergence of functions.

Definition 6.2.3. Let $f_n$ be a sequence of functions defined on a set $A \subseteq \mathbf{R}$. Then, $(f_n)$ converges uniformly on $A$ to a limit function $f$ defined on $A$ if, for every $\epsilon > 0$, there exists an $N \in \mathbf{N}$ such that $|f_n(x) - f(x)| < \epsilon$ whenever $n \ge N$ and $x \in A$.

To emphasize the difference between uniform convergence and pointwise convergence, we restate Definition 6.2.1, being more explicit about the relationship between $\epsilon$, $N$, and $x$. In particular, notice where the domain point $x$ is referenced in each definition and consequently how the choice of $N$ then does or does not depend on this value.

Definition 6.2.1B. Let $f_n$ be a sequence of functions defined on a set $A \subseteq \mathbf{R}$. Then, $(f_n)$ converges pointwise on $A$ to a limit $f$ defined on $A$ if, for every $\epsilon > 0$ and $x \in A$, there exists an $N \in \mathbf{N}$ (perhaps dependent on $x$) such that $|f_n(x) - f(x)| < \epsilon$ whenever $n \ge N$.

The use of the adverb uniformly here should be reminiscent of its use in the phrase "uniformly continuous" from Chapter 4. In both cases, the term "uniformly" is employed to express the fact that the response ($\delta$ or $N$) to a prescribed $\epsilon$ can be chosen to work simultaneously for all values of $x$ in the relevant domain.

Example 6.2.4. (i) Let
\[ g_n(x) = \frac{1}{n(1 + x^2)}. \]
For any fixed $x \in \mathbf{R}$, we can see that $\lim g_n(x) = 0$ so that $g(x) = 0$ is the pointwise limit of the sequence $(g_n)$ on $\mathbf{R}$. Is this convergence uniform? The observation that $1/(1 + x^2) \le 1$ for all $x \in \mathbf{R}$ implies that
\[ |g_n(x) - g(x)| = \left| \frac{1}{n(1 + x^2)} - 0 \right| \le \frac{1}{n}. \]
Thus, given $\epsilon > 0$, we can choose $N > 1/\epsilon$ (which does not depend on $x$), and it follows that $n \ge N$ implies $|g_n(x) - g(x)| < \epsilon$ for all $x \in \mathbf{R}$. By Definition 6.2.3, $g_n \to 0$ uniformly on $\mathbf{R}$.

(ii) Look back at Example 6.2.2 (i), where we saw that $f_n(x) = (x^2 + nx)/n$ converges pointwise on $\mathbf{R}$ to $f(x) = x$. On $\mathbf{R}$, the convergence is not uniform. To see this write
\[ |f_n(x) - f(x)| = \left| \frac{x^2 + nx}{n} - x \right| = \frac{x^2}{n}, \]
and notice that in order to force $|f_n(x) - f(x)| < \epsilon$, we are going to have to choose
\[ N > \frac{x^2}{\epsilon}. \]
Although this is possible to do for each $x \in \mathbf{R}$, there is no way to choose a single value of $N$ that will work for all values of $x$ at the same time. On the other hand, we can show that $f_n \to f$ uniformly on the set $[-b, b]$. By restricting our attention to a bounded interval, we may now assert that
\[ \frac{x^2}{n} \le \frac{b^2}{n}. \]

Figure 6.4: $f_n \to f$ uniformly on $A$.

Figure 6.5: $g_n \to g$ pointwise, but not uniformly.

Given $\epsilon > 0$, then, we can choose
\[ N > \frac{b^2}{\epsilon} \]
independently of $x \in [-b, b]$.

Graphically speaking, the uniform convergence of $f_n$ to a limit $f$ on a set $A$ can be visualized by constructing a band of radius $\pm\epsilon$ around the limit function $f$. If $f_n \to f$ uniformly, then there exists a point in the sequence after which each $f_n$ is completely contained in this $\epsilon$-strip (Fig. 6.4). This image should be compared with the graphs in Figures 6.1–6.2 from Example 6.2.2 and the one in Figure 6.5.
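The contrast between a bounded interval and all of $\mathbf{R}$ in Example 6.2.4 (ii) shows up clearly if the worst-case error over $[-b, b]$ is sampled numerically. This is a rough Python sketch added for illustration; the grid-based maximum only approximates a supremum in general, although here the largest error $b^2/n$ occurs at the endpoints, which the grid includes. The names `max_error_on_grid` and `uniform_N` are hypothetical.

```python
def f_n(n, x):
    """f_n(x) = (x^2 + n x) / n from Example 6.2.2 (i)."""
    return (x * x + n * x) / n


def max_error_on_grid(n, b, points=2001):
    """Largest |f_n(x) - x| over an evenly spaced grid on [-b, b], endpoints included."""
    grid = [-b + 2 * b * i / (points - 1) for i in range(points)]
    return max(abs(f_n(n, x) - x) for x in grid)


def uniform_N(eps, b):
    """One integer N with N > b^2 / eps, the choice made in Example 6.2.4 (ii)."""
    return int(b * b / eps) + 1


if __name__ == "__main__":
    for b in (2.0, 20.0, 200.0):
        print(b, max_error_on_grid(10, b), uniform_N(0.03, b))
```

For fixed $n$ the sampled error grows like $b^2$, which is why the same $N$ cannot work on every interval, while on any one interval $[-b, b]$ the value returned by `uniform_N` works for all $x$ simultaneously.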

Cauchy Criterion

Recall that the Cauchy Criterion for convergent sequences of real numbers was an equivalent characterization of convergence which, unlike the definition, did not make explicit mention of the limit. The usefulness of the Cauchy Criterion suggests the need for an analogous characterization of uniformly convergent sequences of functions. As with all statements about uniformity, pay attention to where the quantifying phrase "for all $x \in A$" appears in the statement.

Theorem 6.2.5 (Cauchy Criterion for Uniform Convergence). A sequence of functions $(f_n)$ defined on a set $A \subseteq \mathbf{R}$ converges uniformly on $A$ if and only if for every $\epsilon > 0$ there exists an $N \in \mathbf{N}$ such that $|f_n(x) - f_m(x)| < \epsilon$ for all $m, n \ge N$ and all $x \in A$.

Proof. Exercise 6.2.6.

Continuity Revisited

The stronger assumption of uniform convergence is precisely what is required to remove the flaws from our attempted proof that the limit of continuous functions is continuous.

Theorem 6.2.6. Let $(f_n)$ be a sequence of functions defined on $A \subseteq \mathbf{R}$ that converges uniformly on $A$ to a function $f$. If each $f_n$ is continuous at $c \in A$, then $f$ is continuous at $c$.

Proof. Fix $c \in A$ and let $\epsilon > 0$. Choose $N$ so that
\[ |f_N(x) - f(x)| < \frac{\epsilon}{3} \]
for all $x \in A$. Because $f_N$ is continuous at $c$, there exists a $\delta > 0$ for which
\[ |f_N(x) - f_N(c)| < \frac{\epsilon}{3} \]
is true whenever $|x - c| < \delta$. But this implies
\[ |f(x) - f(c)| = |f(x) - f_N(x) + f_N(x) - f_N(c) + f_N(c) - f(c)| \le |f(x) - f_N(x)| + |f_N(x) - f_N(c)| + |f_N(c) - f(c)| < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon. \]
Thus, $f$ is continuous at $c \in A$.

Exercises

Exercise 6.2.1. Let
\[ f_n(x) = \frac{nx}{1 + nx^2}. \]
(a) Find the pointwise limit of $(f_n)$ for all $x \in (0, \infty)$. (b) Is the convergence uniform on $(0, \infty)$? (c) Is the convergence uniform on $(0, 1)$? (d) Is the convergence uniform on $(1, \infty)$?

Exercise 6.2.2. Let
\[ g_n(x) = \frac{nx + \sin(nx)}{2n}. \]
Find the pointwise limit of $(g_n)$ on $\mathbf{R}$. Is the convergence uniform on $[-10, 10]$? Is the convergence uniform on all of $\mathbf{R}$?

Exercise 6.2.3. Consider the sequence of functions
\[ h_n(x) = \frac{x}{1 + x^n} \]
over the domain $[0, \infty)$. (a) Find the pointwise limit of $(h_n)$ on $[0, \infty)$. (b) Explain how we know that the convergence cannot be uniform on $[0, \infty)$. (c) Choose a smaller set over which the convergence is uniform and supply an argument to show that this is indeed the case.

Exercise 6.2.4. For each $n \in \mathbf{N}$, find the points on $\mathbf{R}$ where the function $f_n(x) = x/(1 + nx^2)$ attains its maximum and minimum values. Use this to prove $(f_n)$ converges uniformly on $\mathbf{R}$. What is the limit function?

Exercise 6.2.5. For each $n \in \mathbf{N}$, define $f_n$ on $\mathbf{R}$ by
\[ f_n(x) = \begin{cases} 1 & \text{if } |x| \ge 1/n \\ n|x| & \text{if } |x| < 1/n. \end{cases} \]
(a) Find the pointwise limit of $(f_n)$ on $\mathbf{R}$ and decide whether or not the convergence is uniform. (b) Construct an example of a pointwise limit of continuous functions that converges everywhere on the compact set $[-5, 5]$ to a limit function that is unbounded on this set.

Exercise 6.2.6. Using the Cauchy Criterion for convergent sequences of real numbers (Theorem 2.6.4), supply a proof for Theorem 6.2.5. (First, define a candidate for $f(x)$, and then argue that $f_n \to f$ uniformly.)

Exercise 6.2.7. Assume that $(f_n)$ converges uniformly to $f$ on $A$ and that each $f_n$ is uniformly continuous on $A$. Prove that $f$ is uniformly continuous on $A$.

Exercise 6.2.8. Decide which of the following conjectures are true and which are false. Supply a proof for those that are valid and a counterexample for each one that is not. (a) If $f_n \to f$ pointwise on a compact set $K$, then $f_n \to f$ uniformly on $K$. (b) If $f_n \to f$ uniformly on $A$ and $g$ is a bounded function on $A$, then $f_n g \to fg$ uniformly on $A$. (c) If $f_n \to f$ uniformly on $A$, and if each $f_n$ is bounded on $A$, then $f$ must also be bounded. (d) If $f_n \to f$ uniformly on a set $A$, and if $f_n \to f$ uniformly on a set $B$, then $f_n \to f$ uniformly on $A \cup B$. (e) If $f_n \to f$ uniformly on an interval, and if each $f_n$ is increasing, then $f$ is also increasing. (f) Repeat conjecture (e) assuming only pointwise convergence.

Exercise 6.2.9. Assume $(f_n)$ converges uniformly to $f$ on a compact set $K$, and let $g$ be a continuous function on $K$ satisfying $g(x) \ne 0$. Show $(f_n/g)$ converges uniformly on $K$ to $f/g$.

Exercise 6.2.10. Let $f$ be uniformly continuous on all of $\mathbf{R}$, and define a sequence of functions by $f_n(x) = f(x + \frac{1}{n})$. Show that $f_n \to f$ uniformly. Give an example to show that this proposition fails if $f$ is only assumed to be continuous and not uniformly continuous on $\mathbf{R}$.

Exercise 6.2.11. Assume $(f_n)$ and $(g_n)$ are uniformly convergent sequences of functions. (a) Show that $(f_n + g_n)$ is a uniformly convergent sequence of functions. (b) Give an example to show that the product $(f_n g_n)$ may not converge uniformly. (c) Prove that if there exists an $M > 0$ such that $|f_n| \leq M$ and $|g_n| \leq M$ for all $n \in \mathbf{N}$, then $(f_n g_n)$ does converge uniformly.

Exercise 6.2.12. Theorem 6.2.6 has a partial converse. Assume $f_n \to f$ pointwise on a compact set $K$ and assume that for each $x \in K$ the sequence $f_n(x)$ is increasing. Follow these steps to show that if $f_n$ and $f$ are continuous on $K$, then the convergence is uniform. (a) Set $g_n = f - f_n$ and translate the preceding hypothesis into statements about the sequence $(g_n)$. (b) Let $\epsilon > 0$ be arbitrary, and define $K_n = \{x \in K : g_n(x) \geq \epsilon\}$. Argue that $K_1 \supseteq K_2 \supseteq K_3 \supseteq \cdots$ is a nested sequence of compact sets, and use this observation to finish the argument.

Exercise 6.2.13 (Cantor Function). Review the construction of the Cantor set $C \subseteq [0, 1]$ from Section 3.1. This exercise makes use of results and notation from this discussion. (a) Define $f_0(x) = x$ for all $x \in [0, 1]$. Now, let
\[
f_1(x) = \begin{cases} (3/2)x & \text{for } 0 \leq x \leq 1/3 \\ 1/2 & \text{for } 1/3 < x < 2/3 \\ (3/2)x - 1/2 & \text{for } 2/3 \leq x \leq 1. \end{cases}
\]
Sketch $f_0$ and $f_1$ over $[0, 1]$ and observe that $f_1$ is continuous, increasing, and constant on the middle third $(1/3, 2/3) = [0, 1] \setminus C_1$. (b) Construct $f_2$ by imitating this process of flattening out the middle third of each nonconstant segment of $f_1$. Specifically, let
\[
f_2(x) = \begin{cases} (1/2)f_1(3x) & \text{for } 0 \leq x \leq 1/3 \\ f_1(x) & \text{for } 1/3 < x < 2/3 \\ (1/2)f_1(3x - 2) + 1/2 & \text{for } 2/3 \leq x \leq 1. \end{cases}
\]
If we continue this process, show that the resulting sequence $(f_n)$ converges uniformly on $[0, 1]$. (c) Let $f = \lim f_n$. Prove that $f$ is a continuous, increasing function on $[0, 1]$ with $f(0) = 0$ and $f(1) = 1$ that satisfies $f'(x) = 0$ for all $x$ in the open set $[0, 1] \setminus C$. Recall that the “length” of the Cantor set $C$ is 0. Somehow, $f$ manages to increase from 0 to 1 while remaining constant on a set of “length 1.”
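The flattening process in Exercise 6.2.13 can be tried out on a computer before attempting the proof. The following is only a sketch under stated assumptions: the helper names `f` and `sup_gap` are ours, the rule that produces $f_2$ from $f_1$ is assumed to be repeated verbatim at every later stage, and exact rational arithmetic is used so that the endpoints $1/3$ and $2/3$ are handled without rounding.

```python
from fractions import Fraction

def f(n, x):
    """Evaluate the n-th approximation f_n at a rational x in [0, 1]."""
    if n == 0:
        return x                                      # f_0(x) = x
    if x <= Fraction(1, 3):
        return f(n - 1, 3 * x) / 2                    # left third: half-height copy
    if x < Fraction(2, 3):
        return Fraction(1, 2)                         # flattened middle third
    return f(n - 1, 3 * x - 2) / 2 + Fraction(1, 2)   # right third: shifted copy

def sup_gap(n):
    """Largest |f_{n+1}(x) - f_n(x)| over the grid k / 3^(n+1).

    Both functions are piecewise linear with corners on this grid,
    so the maximum over the grid is the supremum over [0, 1].
    """
    steps = 3 ** (n + 1)
    return max(abs(f(n + 1, Fraction(k, steps)) - f(n, Fraction(k, steps)))
               for k in range(steps + 1))

gaps = [sup_gap(n) for n in range(5)]
print(gaps)
```

The printed gaps shrink geometrically, which suggests comparing $(f_n)$ with a convergent geometric series via the Cauchy Criterion (Theorem 6.2.5). The computation is evidence, not a proof.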

6.2. Uniform Convergence of a Sequence of Functions


Exercise 6.2.14. Recall that the Bolzano–Weierstrass Theorem (Theorem 2.5.5) states that every bounded sequence of real numbers has a convergent subsequence. An analogous statement for bounded sequences of functions is not true in general, but under stronger hypotheses several different conclusions are possible. One avenue is to assume the common domain for all of the functions in the sequence is countable. (Another is explored in the next two exercises.) Let $A = \{x_1, x_2, x_3, \ldots\}$ be a countable set. For each $n \in \mathbf{N}$, let $f_n$ be defined on $A$ and assume there exists an $M > 0$ such that $|f_n(x)| \leq M$ for all $n \in \mathbf{N}$ and $x \in A$. Follow these steps to show that there exists a subsequence of $(f_n)$ that converges pointwise on $A$. (a) Why does the sequence of real numbers $f_n(x_1)$ necessarily contain a convergent subsequence $(f_{n_k}(x_1))$? To indicate that the subsequence of functions $(f_{n_k})$ is generated by considering the values of the functions at $x_1$, we will use the notation $f_{n_k} = f_{1,k}$. (b) Now, explain why the sequence $f_{1,k}(x_2)$ contains a convergent subsequence. (c) Carefully construct a nested family of subsequences $(f_{m,k})$, and use Cantor’s diagonalization technique (from Theorem 1.5.1) to produce a single subsequence of $(f_n)$ that converges at every point of $A$.

Exercise 6.2.15. A sequence of functions $(f_n)$ defined on a set $E \subseteq \mathbf{R}$ is called equicontinuous if for every $\epsilon > 0$ there exists a $\delta > 0$ such that $|f_n(x) - f_n(y)| < \epsilon$ for all $n \in \mathbf{N}$ and $|x - y| < \delta$ in $E$. (a) What is the difference between saying that a sequence of functions $(f_n)$ is equicontinuous and just asserting that each $f_n$ in the sequence is individually uniformly continuous? (b) Give a qualitative explanation for why the sequence $g_n(x) = x^n$ is not equicontinuous on $[0, 1]$. Is each $g_n$ uniformly continuous on $[0, 1]$?

Exercise 6.2.16 (Arzela–Ascoli Theorem). For each $n \in \mathbf{N}$, let $f_n$ be a function defined on $[0, 1]$. If $(f_n)$ is bounded on $[0, 1]$ (that is, there exists an $M > 0$ such that $|f_n(x)| \leq M$ for all $n \in \mathbf{N}$ and $x \in [0, 1]$) and if the collection of functions $(f_n)$ is equicontinuous (Exercise 6.2.15), follow these steps to show that $(f_n)$ contains a uniformly convergent subsequence. (a) Use Exercise 6.2.14 to produce a subsequence $(f_{n_k})$ that converges at every rational point in $[0, 1]$. To simplify the notation, set $g_k = f_{n_k}$. It remains to show that $(g_k)$ converges uniformly on all of $[0, 1]$. (b) Let $\epsilon > 0$. By equicontinuity, there exists a $\delta > 0$ such that
\[
|g_k(x) - g_k(y)| < \frac{\epsilon}{3}
\]
for all $|x - y| < \delta$ and $k \in \mathbf{N}$. Using this $\delta$, let $r_1, r_2, \ldots, r_m$ be a finite collection of rational points with the property that the union of the neighborhoods $V_\delta(r_i)$ contains $[0, 1]$. Explain why there must exist an $N \in \mathbf{N}$ such that
\[
|g_s(r_i) - g_t(r_i)| < \frac{\epsilon}{3}
\]
for all $s, t \geq N$ and $r_i$ in the finite subset of $[0, 1]$ just described. Why does having the set $\{r_1, r_2, \ldots, r_m\}$ be finite matter? (c) Finish the argument by showing that, for an arbitrary $x \in [0, 1]$, $|g_s(x) - g_t(x)| < \epsilon$ for all $s, t \geq N$.

6.3 Uniform Convergence and Differentiation

Example 6.2.2 (iii) imposes some significant restrictions on what we might hope to be true regarding differentiation and uniform convergence. If $h_n \to h$ uniformly and each $h_n$ is differentiable, we should not anticipate that $h_n' \to h'$ because in this example $h'(x)$ does not even exist at $x = 0$. The key assumption necessary to be able to prove any facts about the derivative of the limit function is that the sequence of derivatives be uniformly convergent. This may sound as though we are assuming what it is we would like to prove, and there is some validity to this complaint. The more hypotheses a proposition has, the more difficult it is to apply. The content of the next theorem is that if we are given a pointwise convergent sequence of differentiable functions, and if we know that the sequence of derivatives converges uniformly to something, then the limit of the derivatives is indeed the derivative of the limit.

Theorem 6.3.1. Let $f_n \to f$ pointwise on the closed interval $[a, b]$, and assume that each $f_n$ is differentiable. If $(f_n')$ converges uniformly on $[a, b]$ to a function $g$, then the function $f$ is differentiable and $f' = g$.

Proof. Let $\epsilon > 0$ and fix $c \in [a, b]$. We want to argue that $f'(c)$ exists and equals $g(c)$. Because $f'$ is defined by the limit
\[
f'(c) = \lim_{x \to c} \frac{f(x) - f(c)}{x - c},
\]
our task is to produce a $\delta > 0$ so that
\[
\left| \frac{f(x) - f(c)}{x - c} - g(c) \right| < \epsilon
\]
whenever $0 < |x - c| < \delta$. Using the triangle inequality, we can estimate
\[
\left| \frac{f(x) - f(c)}{x - c} - g(c) \right| \leq \left| \frac{f(x) - f(c)}{x - c} - \frac{f_n(x) - f_n(c)}{x - c} \right| + \left| \frac{f_n(x) - f_n(c)}{x - c} - f_n'(c) \right| + |f_n'(c) - g(c)|.
\]
Our intent is to force each of the three terms on the right-hand side to be less than $\epsilon/3$. This will not be too difficult in the case of the third term (because $f_n' \to g$ uniformly) nor in the case of the second (because $f_n$ is differentiable). Handling the first term requires the most delicate touch, and we tend to this task first.

Apply the Mean Value Theorem to the function $f_m - f_n$ on the interval $[c, x]$. (If $x < c$, the argument is the same.) By MVT, there exists an $\alpha \in (c, x)$ such that
\[
f_m'(\alpha) - f_n'(\alpha) = \frac{(f_m(x) - f_n(x)) - (f_m(c) - f_n(c))}{x - c}.
\]
Now, by the Cauchy Criterion for Uniform Convergence (Theorem 6.2.5), there exists an $N_1 \in \mathbf{N}$ such that $m, n \geq N_1$ implies
\[
|f_m'(\alpha) - f_n'(\alpha)| < \frac{\epsilon}{3}.
\]
We should point out that $\alpha$ depends on the choice of $m$ and $n$, so it is crucial to have uniform convergence of $(f_n')$ at this point in the argument. Putting these last two statements together leads to the conclusion that
\[
\left| \frac{f_m(x) - f_m(c)}{x - c} - \frac{f_n(x) - f_n(c)}{x - c} \right| < \frac{\epsilon}{3}
\]
for all $m, n \geq N_1$ and all $x \in [a, b]$ with $x \neq c$. Because $f_m \to f$, we can take the limit as $m \to \infty$ and use the Order Limit Theorem (Theorem 2.3.4) to assert that
\[
\left| \frac{f(x) - f(c)}{x - c} - \frac{f_n(x) - f_n(c)}{x - c} \right| \leq \frac{\epsilon}{3}
\]
for all $n \geq N_1$. To complete the proof, choose $N_2$ large enough so that
\[
|f_m'(c) - g(c)| < \frac{\epsilon}{3}
\]
for all $m \geq N_2$, and then let $N = \max\{N_1, N_2\}$. Having settled on a choice of $N$, we use the fact that $f_N$ is differentiable to produce a $\delta > 0$ for which
\[
\left| \frac{f_N(x) - f_N(c)}{x - c} - f_N'(c) \right| < \frac{\epsilon}{3}
\]
whenever $0 < |x - c| < \delta$. Finally, we observe that for these values of $x$,
\[
\left| \frac{f(x) - f(c)}{x - c} - g(c) \right| \leq \left| \frac{f(x) - f(c)}{x - c} - \frac{f_N(x) - f_N(c)}{x - c} \right| + \left| \frac{f_N(x) - f_N(c)}{x - c} - f_N'(c) \right| + |f_N'(c) - g(c)| < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon.
\]


The hypothesis in Theorem 6.3.1 is unnecessarily strong. We actually do not need to assume that $f_n(x) \to f(x)$ at each point in the domain because the assumption that the sequence of derivatives $(f_n')$ converges uniformly is nearly strong enough to prove that $(f_n)$ converges, uniformly in fact. Two functions with the same derivative may differ by a constant, so we must assume that there is at least one point $x_0$ where $f_n(x_0) \to f(x_0)$.

Theorem 6.3.2. Let $(f_n)$ be a sequence of differentiable functions defined on the closed interval $[a, b]$, and assume $(f_n')$ converges uniformly on $[a, b]$. If there exists a point $x_0 \in [a, b]$ where $f_n(x_0)$ is convergent, then $(f_n)$ converges uniformly on $[a, b]$.

Proof. Exercise 6.3.5.

Combining the last two results produces a stronger version of Theorem 6.3.1.

Theorem 6.3.3. Let $(f_n)$ be a sequence of differentiable functions defined on the closed interval $[a, b]$, and assume $(f_n')$ converges uniformly to a function $g$ on $[a, b]$. If there exists a point $x_0 \in [a, b]$ for which $f_n(x_0)$ is convergent, then $(f_n)$ converges uniformly. Moreover, the limit function $f = \lim f_n$ is differentiable and satisfies $f' = g$.
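To see why uniform convergence of the derivatives is the hypothesis doing the work, it can help to look at a sequence where it fails. The sketch below uses $h_n(x) = \sqrt{x^2 + 1/n}$ on $[-1, 1]$, a stand-in chosen here for illustration and not necessarily the sequence from Example 6.2.2; its uniform limit is $|x|$, which is not differentiable at 0. The function names are ours.

```python
import math

def h(n, x):
    """h_n(x) = sqrt(x^2 + 1/n), which converges uniformly to |x|."""
    return math.sqrt(x * x + 1.0 / n)

def h_prime(n, x):
    """The derivative h_n'(x) = x / sqrt(x^2 + 1/n)."""
    return x / math.sqrt(x * x + 1.0 / n)

grid = [k / 1000.0 for k in range(-1000, 1001)]  # sample points in [-1, 1]

def sup_error(n):
    """Sampled sup of |h_n(x) - |x||; the largest gap occurs at x = 0."""
    return max(abs(h(n, x) - abs(x)) for x in grid)

def derivative_gap(n):
    """|h_n'(x_n) - 1| at the moving point x_n = 1/sqrt(n) > 0.

    The pointwise limit of h_n' at any x > 0 is 1, so a gap that does
    not shrink with n rules out uniform convergence of (h_n').
    """
    x_n = 1.0 / math.sqrt(n)
    return abs(h_prime(n, x_n) - 1.0)

for n in (10, 1000, 10 ** 6):
    print(n, sup_error(n), derivative_gap(n))
```

The first column of output tends to 0 while the second stays fixed, so the functions converge uniformly but their derivatives do not, and Theorem 6.3.1 makes no promise about the limit.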

Exercises

Exercise 6.3.1. (a) Let
\[
h_n(x) = \frac{\sin(nx)}{n}.
\]
Show that $h_n \to 0$ uniformly on $\mathbf{R}$. At what points does the sequence of derivatives $h_n'$ converge? (b) Modify this example to show that it is possible for a sequence $(f_n)$ to converge uniformly but for $(f_n')$ to be unbounded.

Exercise 6.3.2. Consider the sequence of functions defined by
\[
g_n(x) = \frac{x^n}{n}.
\]
(a) Show $(g_n)$ converges uniformly on $[0, 1]$ and find $g = \lim g_n$. Show that $g$ is differentiable and compute $g'(x)$ for all $x \in [0, 1]$. (b) Now, show that $(g_n')$ converges on $[0, 1]$. Is the convergence uniform? Set $h = \lim g_n'$ and compare $h$ and $g'$. Are they the same?

Exercise 6.3.3. Consider the sequence of functions
\[
f_n(x) = \frac{x}{1 + nx^2}.
\]
Exercise 6.2.4 contains some advice for how to show that $(f_n)$ converges uniformly on $\mathbf{R}$. Review or complete this exercise. Now, let $f = \lim f_n$. Compute $f_n'(x)$ and find all the values of $x$ for which $f'(x) = \lim f_n'(x)$.


Exercise 6.3.4. Let
\[
g_n(x) = \frac{nx + x^2}{2n},
\]
and set $g(x) = \lim g_n(x)$. Show that $g$ is differentiable in two ways: (a) Compute $g(x)$ by algebraically taking the limit as $n \to \infty$ and then find $g'(x)$. (b) Compute $g_n'(x)$ for each $n \in \mathbf{N}$ and show that the sequence of derivatives $(g_n')$ converges uniformly on every interval $[-M, M]$. Use Theorem 6.3.3 to conclude $g'(x) = \lim g_n'(x)$. (c) Repeat parts (a) and (b) for the sequence $f_n(x) = (nx^2 + 1)/(2n + x)$.

Exercise 6.3.5. Use the following advice to supply a proof for Theorem 6.3.2. To get started, observe that the triangle inequality implies that, for any $x \in [a, b]$,
\[
|f_n(x) - f_m(x)| \leq |(f_n(x) - f_m(x)) - (f_n(x_0) - f_m(x_0))| + |f_n(x_0) - f_m(x_0)|.
\]
Now, apply the Mean Value Theorem to $f_n - f_m$.

6.4 Series of Functions

Definition 6.4.1. For each $n \in \mathbf{N}$, let $f_n$ and $f$ be functions defined on a set $A \subseteq \mathbf{R}$. The infinite series
\[
\sum_{n=1}^{\infty} f_n(x) = f_1(x) + f_2(x) + f_3(x) + \cdots
\]
converges pointwise on $A$ to $f(x)$ if the sequence $s_k(x)$ of partial sums defined by
\[
s_k(x) = f_1(x) + f_2(x) + \cdots + f_k(x)
\]
converges pointwise to $f(x)$. The series converges uniformly on $A$ to $f$ if the sequence $s_k(x)$ converges uniformly on $A$ to $f(x)$. In either case, we write $f = \sum_{n=1}^{\infty} f_n$ or $f(x) = \sum_{n=1}^{\infty} f_n(x)$, always being explicit about the type of convergence involved.

If we have a series $\sum_{n=1}^{\infty} f_n$ where the functions $f_n$ are continuous, then the Algebraic Continuity Theorem (Theorem 4.3.4) guarantees that the partial sums, because they are finite sums, will be continuous as well. A corresponding observation is true if we are dealing with differentiable functions. As a consequence, we can immediately translate the results for sequences in the previous sections into statements about the behavior of infinite series of functions.

Theorem 6.4.2. Let $f_n$ be continuous functions defined on a set $A \subseteq \mathbf{R}$, and assume $\sum_{n=1}^{\infty} f_n$ converges uniformly on $A$ to a function $f$. Then, $f$ is continuous on $A$.

Proof. Apply Theorem 6.2.6 to the partial sums $s_k = f_1 + f_2 + \cdots + f_k$.
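The difference between the two modes of convergence in Definition 6.4.1 shows up clearly in the geometric series. As a rough sketch (the choice $f_n(x) = x^n$ with sum $x/(1 - x)$ is our own example, and the sample grids are arbitrary), we can compare the worst sampled error of the partial sums on $[0, 1/2]$ with the error at points creeping toward 1.

```python
def s(k, x):
    """Partial sum s_k(x) = x + x^2 + ... + x^k."""
    return sum(x ** n for n in range(1, k + 1))

def f(x):
    """The pointwise limit x / (1 - x), valid for |x| < 1."""
    return x / (1.0 - x)

def sup_error(k, points):
    """Largest sampled value of |f(x) - s_k(x)| over the given points."""
    return max(abs(f(x) - s(k, x)) for x in points)

half = [j / 200.0 for j in range(0, 101)]             # grid in [0, 1/2]
near_one = [1.0 - 10.0 ** (-j) for j in range(1, 7)]  # points approaching 1

for k in (5, 10, 20):
    print(k, sup_error(k, half), sup_error(k, near_one))
```

On $[0, 1/2]$ the error is at most $(1/2)^k$ and shrinks for every sample point at once. Near 1 a fixed $k$ gives an arbitrarily poor approximation, so the convergence on $[0, 1)$ is pointwise but not uniform.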


Theorem 6.4.3. Let $f_n$ be differentiable functions defined on an interval $[a, b]$, and assume $\sum_{n=1}^{\infty} f_n'(x)$ converges uniformly to a limit $g(x)$ on $[a, b]$. If there exists a point $x_0 \in [a, b]$ where $\sum_{n=1}^{\infty} f_n(x_0)$ converges, then the series $\sum_{n=1}^{\infty} f_n(x)$ converges uniformly to a differentiable function $f(x)$ satisfying $f'(x) = g(x)$ on $[a, b]$. In other words,
\[
f(x) = \sum_{n=1}^{\infty} f_n(x) \quad \text{and} \quad f'(x) = \sum_{n=1}^{\infty} f_n'(x).
\]

Proof. Apply Theorem 6.3.3 to the partial sums $s_k = f_1 + f_2 + \cdots + f_k$. Observe that Theorem 5.2.4 implies that $s_k' = f_1' + f_2' + \cdots + f_k'$.

In the vocabulary of infinite series, the Cauchy Criterion takes the following form.

Theorem 6.4.4 (Cauchy Criterion for Uniform Convergence of Series). A series $\sum_{n=1}^{\infty} f_n$ converges uniformly on $A \subseteq \mathbf{R}$ if and only if for every $\epsilon > 0$ there exists an $N \in \mathbf{N}$ such that for all $n > m \geq N$,
\[
|f_{m+1}(x) + f_{m+2}(x) + f_{m+3}(x) + \cdots + f_n(x)| < \epsilon \quad \text{for all } x \in A.
\]

The benefits of uniform convergence over pointwise convergence suggest the need for some ways of determining when a series converges uniformly. The following corollary to the Cauchy Criterion is the most common such tool. In particular, it will be quite useful in our upcoming investigations of power series.

Corollary 6.4.5 (Weierstrass M-Test). For each $n \in \mathbf{N}$, let $f_n$ be a function defined on a set $A \subseteq \mathbf{R}$, and let $M_n > 0$ be a real number satisfying
\[
|f_n(x)| \leq M_n
\]
for all $x \in A$. If $\sum_{n=1}^{\infty} M_n$ converges, then $\sum_{n=1}^{\infty} f_n$ converges uniformly on $A$.

Proof. Exercise 6.4.2.
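The M-Test works by trading a bound that depends on $x$ for one that does not. A small numerical sketch, assuming the illustrative series $f_k(x) = \sin(kx)/k^2$ with $M_k = 1/k^2$ (our choice, not one taken from the text), compares a block of the series of functions with the matching block of $\sum M_k$.

```python
import math

def block(m, n, x):
    """f_{m+1}(x) + ... + f_n(x) for f_k(x) = sin(kx) / k^2."""
    return sum(math.sin(k * x) / k ** 2 for k in range(m + 1, n + 1))

def m_block(m, n):
    """M_{m+1} + ... + M_n for M_k = 1 / k^2, which does not involve x."""
    return sum(1.0 / k ** 2 for k in range(m + 1, n + 1))

grid = [2 * math.pi * j / 500 for j in range(501)]  # sample points in [0, 2*pi]
m, n = 50, 400

worst = max(abs(block(m, n, x)) for x in grid)
print(worst, m_block(m, n))
```

Because $\sum M_k$ converges, its blocks can be made smaller than any $\epsilon$ by choosing $m$ large, and the same $m$ then works for every $x$ simultaneously. This is exactly the condition required by Theorem 6.4.4.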

Exercises

Exercise 6.4.1. Prove that if $\sum_{n=1}^{\infty} g_n$ converges uniformly, then $(g_n)$ converges uniformly to zero.

Exercise 6.4.2. Supply the details for the proof of the Weierstrass M-Test (Corollary 6.4.5).

Exercise 6.4.3. (a) Show that $g(x) = \sum_{n=1}^{\infty} \cos(2^n x)/2^n$ is continuous on all of $\mathbf{R}$. (b) Prove that $h(x) = \sum_{n=1}^{\infty} x^n/n^2$ is continuous on $[-1, 1]$.


Exercise 6.4.4. In Section 5.4, we postponed the argument that the nowhere-differentiable function
\[
g(x) = \sum_{n=0}^{\infty} \frac{1}{2^n} h(2^n x)
\]
is continuous on $\mathbf{R}$. Use the Weierstrass M-Test to supply the missing proof.

Exercise 6.4.5. Let
\[
f(x) = \sum_{k=1}^{\infty} \frac{\sin(kx)}{k^3}.
\]
(a) Show that $f(x)$ is differentiable and that the derivative $f'(x)$ is continuous. (b) Can we determine if $f$ is twice-differentiable?

Exercise 6.4.6. Observe that the series
\[
f(x) = \sum_{n=1}^{\infty} \frac{x^n}{n} = x + \frac{x^2}{2} + \frac{x^3}{3} + \frac{x^4}{4} + \cdots
\]
converges for every $x$ in the half-open interval $[0, 1)$ but does not converge when $x = 1$. For a fixed $x_0 \in (0, 1)$, explain how we can still use the Weierstrass M-Test to prove that $f$ is continuous at $x_0$.

Exercise 6.4.7. Let
\[
h(x) = \sum_{n=1}^{\infty} \frac{1}{x^2 + n^2}.
\]
(a) Show that $h$ is a continuous function defined on all of $\mathbf{R}$. (b) Is $h$ differentiable? If so, is the derivative function $h'$ continuous?

Exercise 6.4.8. Let $\{r_1, r_2, r_3, \ldots\}$ be an enumeration of the set of rational numbers. For each $r_n \in \mathbf{Q}$, define
\[
u_n(x) = \begin{cases} 1/2^n & \text{for } x > r_n \\ 0 & \text{for } x \leq r_n. \end{cases}
\]
Now, let $h(x) = \sum_{n=1}^{\infty} u_n(x)$. Prove that $h$ is a monotone function defined on all of $\mathbf{R}$ that is continuous at every irrational point.

6.5 Power Series

It is time to put some mathematical teeth into our understanding of functions expressed in the form of a power series; that is, functions of the form
\[
f(x) = \sum_{n=0}^{\infty} a_n x^n = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots.
\]

The ﬁrst order of business is to determine the points x ∈ R for which the resulting series on the right-hand side converges. This set certainly contains x = 0, and, as the next result demonstrates, it takes a very predictable form.


Theorem 6.5.1. If a power series $\sum_{n=0}^{\infty} a_n x^n$ converges at some point $x_0 \in \mathbf{R}$, then it converges absolutely for any $x$ satisfying $|x| < |x_0|$.

Proof. If $\sum_{n=0}^{\infty} a_n x_0^n$ converges, then the sequence of terms $(a_n x_0^n)$ is bounded. (In fact, it converges to 0.) Let $M > 0$ satisfy $|a_n x_0^n| \leq M$ for all $n \in \mathbf{N}$. If $x \in \mathbf{R}$ satisfies $|x| < |x_0|$, then
\[
|a_n x^n| = |a_n x_0^n| \left| \frac{x}{x_0} \right|^n \leq M \left| \frac{x}{x_0} \right|^n.
\]
But notice that
\[
\sum_{n=0}^{\infty} M \left| \frac{x}{x_0} \right|^n
\]
is a geometric series with ratio $|x/x_0| < 1$ and so converges. By the Comparison Test, $\sum_{n=0}^{\infty} a_n x^n$ converges absolutely.

The main implication of Theorem 6.5.1 is that the set of points for which a given power series converges must necessarily be $\{0\}$, $\mathbf{R}$, or a bounded interval centered around $x = 0$. Because of the strict inequality in Theorem 6.5.1, there is some ambiguity about the endpoints of the interval, and it is possible that the set of convergent points may be of the form $(-R, R)$, $[-R, R)$, $(-R, R]$, or $[-R, R]$. The value of $R$ is referred to as the radius of convergence of a power series, and it is customary to assign $R$ the value 0 or $\infty$ to represent the set $\{0\}$ or $\mathbf{R}$, respectively. Some of the standard devices for computing the radius of convergence for a power series are explored in the exercises. Of more interest to us here is the investigation of the properties of functions defined in this way. Are they continuous? Are they differentiable? If so, can we differentiate the series term-by-term? What happens at the endpoints?
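The comparison at the heart of the proof of Theorem 6.5.1 can be watched in action. The sketch below assumes a particular example of our own choosing: coefficients $a_n = (-1)^{n+1}/n$ (with $a_0 = 0$), which give a series that converges at $x_0 = 1$, so that $M = 1$ bounds the terms $|a_n x_0^n|$.

```python
def a(n):
    """Illustrative coefficients: a_0 = 0 and a_n = (-1)^(n+1) / n."""
    return 0.0 if n == 0 else (-1.0) ** (n + 1) / n

x0, M = 1.0, 1.0     # the series converges at x0, and |a_n x0^n| <= M
x = 0.9              # any point with |x| < |x0|
r = abs(x / x0)      # ratio of the dominating geometric series

terms = [abs(a(n) * x ** n) for n in range(500)]
bounds = [M * r ** n for n in range(500)]

print(sum(terms), M / (1.0 - r))
```

Every term $|a_n x^n|$ sits below the corresponding geometric term $M r^n$, so the partial sums of $\sum |a_n x^n|$ are bounded by $M/(1 - r)$. Only the first 500 terms are examined here, so this is a consistency check and not a proof.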

Establishing Uniform Convergence

The positive answers to the preceding questions, and the usefulness of power series in general, are largely due to the fact that they converge uniformly on compact sets contained in their domain of convergent points. As we are about to see, a complete proof of this fact requires a fairly delicate argument attributed to the Norwegian mathematician Niels Abel. A significant amount of progress, however, can be made with the Weierstrass M-Test (Corollary 6.4.5).

Theorem 6.5.2. If a power series $\sum_{n=0}^{\infty} a_n x^n$ converges absolutely at a point $x_0$, then it converges uniformly on the closed interval $[-c, c]$, where $c = |x_0|$.

Proof. This proof requires a straightforward application of the Weierstrass M-Test. The details are requested in Exercise 6.5.5.

For many applications, Theorem 6.5.2 is good enough. For instance, combining our results about uniform convergence and power series to this point, we can now argue that a power series that converges on an open interval $(-R, R)$ is necessarily continuous on this interval (Exercise 6.5.4). But what happens if we know that a series converges at an endpoint of its interval of convergence? Does the good behavior of the series on $(-R, R)$ necessarily extend to the endpoint $x = R$? If the convergence of the series at $x = R$ is absolute convergence, then we can again rely on Theorem 6.5.2 to conclude that the series converges uniformly on the set $[-R, R]$. The remaining interesting open question is what happens if a series converges conditionally at a point $x = R$. We may still use Theorem 6.5.1 to conclude that we have pointwise convergence on the interval $(-R, R]$, but more work is needed to establish uniform convergence on compact sets containing $x = R$.

Abel’s Theorem

We should remark that if the power series $g(x) = \sum_{n=0}^{\infty} a_n x^n$ converges conditionally at $x = R$, then it is possible for it to diverge when $x = -R$. The series
\[
\sum_{n=1}^{\infty} \frac{(-1)^n x^n}{n}
\]
with $R = 1$ is an example. To keep our attention fixed on the convergent endpoint, we will prove uniform convergence on the set $[0, R]$. The argument we need is very similar to the proof of Abel’s Test, which is included in the exercises of Section 2.7. The first step in the proof of Abel’s Test is an estimate, sometimes called Abel’s Lemma, which we will presently need again.

Lemma 6.5.3 (Abel’s Lemma). Let $b_n$ satisfy $b_1 \geq b_2 \geq b_3 \geq \cdots \geq 0$, and let $\sum_{n=1}^{\infty} a_n$ be a series for which the partial sums are bounded. In other words, assume there exists $A > 0$ such that
\[
|a_1 + a_2 + \cdots + a_n| \leq A
\]
for all $n \in \mathbf{N}$. Then, for all $n \in \mathbf{N}$,
\[
|a_1 b_1 + a_2 b_2 + a_3 b_3 + \cdots + a_n b_n| \leq 2Ab_1.
\]

Proof. This inequality follows from the so-called summation-by-parts formula. Exercises 2.7.12 and 2.7.14(b) from Section 2.7 contain the relevant definitions and advice required to complete the argument.

It is worth observing that if $A$ were an upper bound on the partial sums of $\sum |a_n|$ (note the absolute value bars), then the proof of Lemma 6.5.3 would be a simple exercise in the triangle inequality. (Also, we would not need the factor of 2, and, actually, it is not needed in general except that our particular method of proof requires it.) The point of the matter is that because we are only assuming conditional convergence, the triangle inequality is not going to be of any use in proving Abel’s Theorem, but we are now in possession of an inequality that we can use in its place.
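Abel’s Lemma is easy to test on concrete data. The following sketch is illustrative only: the helper `abel_check` is a name introduced here, $A$ is taken to be the largest partial sum actually observed among the supplied terms, and the random data are hypothetical.

```python
import random
from itertools import accumulate

def abel_check(a, b):
    """Return (max_n |a_1 b_1 + ... + a_n b_n|, 2 * A * b_1).

    A is the largest |a_1 + ... + a_n| over the supplied terms, and
    b is assumed to be nonnegative and decreasing.
    """
    A = max(abs(s) for s in accumulate(a))
    weighted = max(abs(s) for s in accumulate(x * y for x, y in zip(a, b)))
    return weighted, 2 * A * b[0]

# a_n = (-1)^(n+1): the series diverges, but its partial sums are 1, 0, 1, 0, ...
a = [(-1.0) ** (n + 1) for n in range(1, 1001)]
b = [1.0 / n for n in range(1, 1001)]
lhs, rhs = abel_check(a, b)
print(lhs, rhs)

# A randomized spot check with a fixed seed for repeatability.
rng = random.Random(0)
a_rand = [rng.uniform(-1, 1) for _ in range(200)]
b_rand = sorted((rng.uniform(0, 1) for _ in range(200)), reverse=True)
lhs_rand, rhs_rand = abel_check(a_rand, b_rand)
print(lhs_rand, rhs_rand)
```

In the first example the triangle inequality alone would give only the bound $b_1 + b_2 + \cdots + b_n$, which grows without limit as $n$ increases, whereas the lemma’s bound $2Ab_1$ does not depend on $n$ at all.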


Theorem 6.5.4 (Abel’s Theorem). Let $g(x) = \sum_{n=0}^{\infty} a_n x^n$ be a power series that converges at the point $x = R > 0$. Then the series converges uniformly on the interval $[0, R]$. A similar result holds if the series converges at $x = -R$.

Proof. To set the stage for an application of Lemma 6.5.3, we first write
\[
g(x) = \sum_{n=0}^{\infty} a_n x^n = \sum_{n=0}^{\infty} (a_n R^n) \left( \frac{x}{R} \right)^n.
\]
Let $\epsilon > 0$. By the Cauchy Criterion for Uniform Convergence of Series (Theorem 6.4.4), we will be done if we can produce an $N$ such that $n > m \geq N$ implies
\[
\left| (a_{m+1} R^{m+1}) \left( \frac{x}{R} \right)^{m+1} + (a_{m+2} R^{m+2}) \left( \frac{x}{R} \right)^{m+2} + \cdots + (a_n R^n) \left( \frac{x}{R} \right)^n \right| < \epsilon. \tag{1}
\]
Because we are assuming that $\sum_{n=0}^{\infty} a_n R^n$ converges, the Cauchy Criterion for convergent series of real numbers guarantees that there exists an $N$ such that
\[
|a_{m+1} R^{m+1} + a_{m+2} R^{m+2} + \cdots + a_n R^n| < \frac{\epsilon}{2}
\]
whenever $n > m \geq N$. But now, for any fixed $m \geq N$, we can apply Lemma 6.5.3 to the sequences obtained by omitting the first $m$ terms. Using $\epsilon/2$ as a bound on the partial sums of $\sum_{j=1}^{\infty} a_{m+j} R^{m+j}$ and observing that $(x/R)^{m+j}$ is monotone decreasing, an application of Lemma 6.5.3 to equation (1) yields
\[
\left| (a_{m+1} R^{m+1}) \left( \frac{x}{R} \right)^{m+1} + (a_{m+2} R^{m+2}) \left( \frac{x}{R} \right)^{m+2} + \cdots + (a_n R^n) \left( \frac{x}{R} \right)^n \right| \leq 2 \left( \frac{\epsilon}{2} \right) \left( \frac{x}{R} \right)^{m+1} \leq \epsilon.
\]
The fact that the inequality is not strict (as the Cauchy Criterion technically requires) is a distraction but not a real deficiency. We leave it as a point for discussion.
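The estimate in the proof of Abel’s Theorem says that a block of the series at any $x \in [0, R]$ is controlled by the blocks at the endpoint alone. Here is a numerical sketch under assumptions of our own: the series $\sum (-1)^{k+1} x^k / k$ with $R = 1$, the indices $m = 50$ and $n = 300$, and a uniform sample grid.

```python
def block(m, n, x):
    """a_{m+1} x^{m+1} + ... + a_n x^n with a_k = (-1)^(k+1) / k."""
    return sum((-1.0) ** (k + 1) * x ** k / k for k in range(m + 1, n + 1))

def endpoint_bound(m, n):
    """Largest |a_{m+1} + ... + a_j| for m < j <= n (the series at x = R = 1)."""
    best, running = 0.0, 0.0
    for k in range(m + 1, n + 1):
        running += (-1.0) ** (k + 1) / k
        best = max(best, abs(running))
    return best

grid = [j / 200.0 for j in range(201)]  # sample points in [0, 1]
m, n = 50, 300

worst = max(abs(block(m, n, x)) for x in grid)
A = endpoint_bound(m, n)
print(worst, 2 * A)
```

Here `A` plays the role of $\epsilon/2$ in the proof, and the sampled supremum over $[0, 1]$ stays below $2A$. The bound never refers to $x$, which is what uniform convergence requires.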

The Success of Power Series

An economical way to summarize the conclusions of Theorem 6.5.2 and Abel’s Theorem is with the following statement.

Theorem 6.5.5. If a power series converges pointwise on the set $A \subseteq \mathbf{R}$, then it converges uniformly on any compact set $K \subseteq A$.

Proof. Exercise 6.5.6.

This fact leads to the desirable conclusion that a power series is continuous at every point at which it converges. To make an argument for differentiability, we would like to appeal to Theorem 6.4.3; however, this result has a slightly more involved set of hypotheses. In order to conclude that a power series $\sum_{n=0}^{\infty} a_n x^n$ is differentiable, and that term-by-term differentiation is allowed, we need to know beforehand that the differentiated series $\sum_{n=1}^{\infty} n a_n x^{n-1}$ converges uniformly.

Theorem 6.5.6. If $\sum_{n=0}^{\infty} a_n x^n$ converges for all $x \in (-R, R)$, then the differentiated series $\sum_{n=1}^{\infty} n a_n x^{n-1}$ converges at each $x \in (-R, R)$ as well. Consequently, the convergence is uniform on compact sets contained in $(-R, R)$.

Proof. Exercise 6.5.7.

We should point out that it is possible for a series to converge at an endpoint $x = R$ but for the differentiated series to diverge at this point. The series $\sum_{n=1}^{\infty} (-x)^n/n$ has this property when $x = 1$. On the other hand, if the differentiated series does converge at the point $x = R$, then Abel’s Theorem applies and the convergence of the differentiated series is uniform on compact sets that contain $R$. With all the pieces in place, we summarize the impressive conclusions of this section.

Theorem 6.5.7. Assume
\[
g(x) = \sum_{n=0}^{\infty} a_n x^n
\]
converges on an interval $A \subseteq \mathbf{R}$. The function $g$ is continuous on $A$ and differentiable on any open interval $(-R, R) \subseteq A$. The derivative is given by
\[
g'(x) = \sum_{n=1}^{\infty} n a_n x^{n-1}.
\]
Moreover, $g$ is infinitely differentiable on $(-R, R)$, and the successive derivatives can be obtained via term-by-term differentiation of the appropriate series.

Proof. The details for why $g$ is continuous are requested in the exercises. Theorem 6.5.6 justifies the application of Theorem 6.4.3, which verifies the formula for $g'$. A differentiated power series is a power series in its own right, and Theorem 6.5.6 implies that, although the series may no longer converge at a particular endpoint, the radius of convergence does not change. By induction, then, power series are differentiable an infinite number of times.
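Term-by-term differentiation can be sanity-checked on a series whose sum is known in closed form. The sketch below assumes the geometric series $g(x) = \sum x^n = 1/(1 - x)$ on $(-1, 1)$, so that Theorem 6.5.7 predicts $g'(x) = \sum n x^{n-1} = 1/(1 - x)^2$. The function names and the cutoff of 200 terms are our own choices.

```python
def g_partial(x, N):
    """Partial sum 1 + x + ... + x^N of the geometric series."""
    return sum(x ** n for n in range(N + 1))

def dg_partial(x, N):
    """Partial sum of the differentiated series: sum of n * x^(n-1), n = 1..N."""
    return sum(n * x ** (n - 1) for n in range(1, N + 1))

x = 0.5
approx = dg_partial(x, 200)
exact = 1.0 / (1.0 - x) ** 2   # derivative of 1 / (1 - x)
print(approx, exact)
```

The two numbers agree to within rounding error at $x = 0.5$. At $x = 1$ the original series already diverges, so no comparison is available there.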


Exercises

Exercise 6.5.1. Consider the function $g$ defined by the power series
\[
g(x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \frac{x^5}{5} - \cdots.
\]
(a) Is $g$ defined on $(-1, 1)$? Is it continuous on this set? Is $g$ defined on $(-1, 1]$? Is it continuous on this set? What happens on $[-1, 1]$? Can the power series for $g(x)$ possibly converge for any other points $|x| > 1$? Explain. (b) For what values of $x$ is $g'(x)$ defined? Find a formula for $g'$.

Exercise 6.5.2. Find suitable coefficients $(a_n)$ so that the resulting power series $\sum a_n x^n$ (a) converges absolutely for all $x \in [-1, 1]$ and diverges off of this set; (b) converges conditionally at $x = -1$ and diverges at $x = 1$; (c) converges conditionally at both $x = -1$ and $x = 1$. (d) Is it possible to find an example of a power series that converges conditionally at $x = -1$ and converges absolutely at $x = 1$?

Exercise 6.5.3. Explain why a power series can converge conditionally for at most two points.

Exercise 6.5.4. (a) By referencing the proper theorems, produce a detailed argument that a power series that converges on the interval $(-R, R)$ necessarily represents a continuous function at each point $x \in (-R, R)$. (b) If the series converges at an endpoint $x = R$, point out how we know continuity extends to the set $(-R, R]$.

Exercise 6.5.5. Use the Weierstrass M-Test to prove Theorem 6.5.2.

Exercise 6.5.6. Show how Theorem 6.5.1, Theorem 6.5.2, and Abel’s Theorem together imply that if a power series converges pointwise on a compact set, then the convergence is actually uniform on this set.

Exercise 6.5.7. (a) The Ratio Test (from Exercise 2.7.9) states that if $(b_n)$ is a sequence of nonzero terms satisfying $\lim |b_{n+1}/b_n| = r < 1$, then the series $\sum b_n$ converges. Use this to argue that if $s$ satisfies $0 < s < 1$, then $n s^{n-1}$ is bounded for all $n \geq 1$. (b) Given an arbitrary $x \in (-R, R)$, pick $t$ to satisfy $|x| < t < R$. Use the observation
\[
|n a_n x^{n-1}| = \frac{1}{t} \left( n \left| \frac{x}{t} \right|^{n-1} \right) |a_n t^n|
\]
to construct a proof for Theorem 6.5.6.

Exercise 6.5.8. Let $\sum a_n x^n$ be a power series with $a_n \neq 0$, and assume
\[
L = \lim_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right|
\]
exists. (a) Show that if $L \neq 0$, then the series converges for all $x$ in $(-1/L, 1/L)$. (The advice in Exercise 2.7.9 may be helpful.) (b) Show that if $L = 0$, then the series converges for all $x \in \mathbf{R}$. (c) Show that (a) and (b) continue to hold if $L$ is replaced by the limit
\[
L' = \lim_{n \to \infty} s_n \quad \text{where} \quad s_n = \sup \left\{ \left| \frac{a_{k+1}}{a_k} \right| : k \geq n \right\}.
\]
The value $L'$ is called the “limit superior” or “lim sup” of the sequence $|a_{n+1}/a_n|$. It exists if and only if the sequence is bounded (Exercise 2.4.6). (d) Show that if $|a_{n+1}/a_n|$ is unbounded, then the original series $\sum a_n x^n$ converges only when $x = 0$.

Exercise 6.5.9. Use Theorem 6.5.7 to argue that power series are unique. If we have
\[
\sum_{n=0}^{\infty} a_n x^n = \sum_{n=0}^{\infty} b_n x^n
\]
for all $x$ in an interval $(-R, R)$, prove that $a_n = b_n$ for all $n = 0, 1, 2, \ldots$. (Start by showing that $a_0 = b_0$.)

Exercise 6.5.10. Review the definitions and results from Section 2.8 concerning products of series and Cauchy products in particular. At the end of Section 2.9, we mentioned the following result: If both $\sum a_n$ and $\sum b_n$ converge conditionally to $A$ and $B$ respectively, then it is possible for the Cauchy product, $\sum d_n$ where $d_n = a_0 b_n + a_1 b_{n-1} + \cdots + a_n b_0$, to diverge. However, if $\sum d_n$ does converge, then it must converge to $AB$. To prove this, set
\[
f(x) = \sum a_n x^n, \quad g(x) = \sum b_n x^n, \quad \text{and} \quad h(x) = \sum d_n x^n.
\]
Use Abel’s Theorem and the result in Exercise 2.8.8 to establish this result.

Exercise 6.5.11. A series $\sum_{n=0}^{\infty} a_n$ is said to be Abel-summable to $L$ if the power series
\[
f(x) = \sum_{n=0}^{\infty} a_n x^n
\]
converges for all $x \in [0, 1)$ and $L = \lim_{x \to 1^-} f(x)$. (a) Show that any series that converges to a limit $L$ is also Abel-summable to $L$. (b) Show that $\sum_{n=0}^{\infty} (-1)^n$ is Abel-summable and find the sum.

Exercise 6.5.12. Consider the function $G$ from the opening discussion on branching processes, and recall that the increasing monotone sequence of probabilities $(d_r)$ has a limit $d = \lim d_r$ that satisfies $G(d) = d$. Assume we are in the situation where there are two fixed points: $G(1) = 1$ and some other value $0 < d_0 < 1$ satisfying $G(d_0) = d_0$. Formulate an argument for why the sequence $(d_r)$ necessarily converges to the value $d = d_0$ and not to $d = 1$.

6.6 Taylor Series

Our study of power series has led to some enthusiastic conclusions about the nature of functions of the form
\[
f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + \cdots.
\]
Despite their infinite character, power series can be manipulated more or less as though they are polynomials. On its interval of convergence, a power series is continuous and infinitely differentiable, and successive derivatives can be computed by performing the desired operation on each individual term in the series, just as it is done for polynomials. As we will see in the next chapter, the situation regarding integrals of power series is just as pleasant.

As students of introductory calculus know well, the processes of integration and differentiation, as well as basic algebraic manipulations, are rather straightforward when applied only to polynomials. It is the introduction of functions such as $\sin(x)$, $\ln(x)$, and $\arctan(x)$ that necessitates much of the symbolic acrobatics taught in first and second semester calculus courses. This phenomenon leads us to ask whether functions such as $\arctan(x)$ and the rest have representations as power series.

In the examples in this section, we will assume all of the familiar properties of the trigonometric, inverse trigonometric, exponential, and logarithmic functions. Rigorously defining these functions is an interesting exercise in analysis. In fact, one of the most common methods for providing definitions is through power series. The point of this discussion, however, is to come at this question from the other direction. Assuming we are in possession of an infinitely differentiable function such as $\arctan(x)$, can we find suitable coefficients $a_n$ so that
\[
\arctan(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + \cdots
\]
for at least some nonzero values of $x$?

Manipulating Series

We already have one example of a power series expansion of a familiar function. In the material on infinite series in Example 2.7.5 from Chapter 2, we proved that
\[
\frac{1}{1 - t} = 1 + t + t^2 + t^3 + t^4 + \cdots \tag{1}
\]
for all $t \in (-1, 1)$. Substituting $-t^2$ for $t$ yields
\[
\frac{1}{1 + t^2} = 1 - t^2 + t^4 - t^6 + t^8 - \cdots.
\]
But now we can use the fact that
\[
\arctan(x) = \int_0^x \frac{1}{1 + t^2} \, dt.
\]


Although integration has not been rigorously studied at this point, we will see in Chapter 7 that if $f_n \to f$ uniformly on an interval $[a, b]$, then $\int_a^b f_n \to \int_a^b f$. This observation, together with the Fundamental Theorem of Calculus, leads to the formula
\[
\arctan(x) = x - \frac{1}{3} x^3 + \frac{1}{5} x^5 - \frac{1}{7} x^7 + \cdots.
\]

Exercise 6.6.1. Upcoming results in Chapter 7 will justify this equation for all $x \in (-1, 1)$, but notice that this series actually converges when $x = 1$. Assuming that $\arctan(x)$ is continuous, explain why the value of the series at $x = 1$ must necessarily be $\arctan(1)$. What interesting identity do we get in this case?

Exercise 6.6.2. Starting from the identity in equation (1) of this section, find a power series representation for $\ln(1 + x)$. For which values of $x$ is this expression valid?
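The arctangent expansion derived in this subsection can be compared against a library implementation for a few points inside $(-1, 1)$. This is only a numerical sanity check, with a function name and term count chosen here for illustration, and it relies on Python’s `math.atan` as the reference value.

```python
import math

def arctan_partial(x, terms):
    """x - x^3/3 + x^5/5 - ... using the given number of terms."""
    return sum((-1) ** k * x ** (2 * k + 1) / (2 * k + 1) for k in range(terms))

for x in (0.1, 0.5, 0.9):
    print(x, arctan_partial(x, 50), math.atan(x))
```

The agreement is excellent near 0 and degrades as $x$ approaches 1, where many more terms are needed for the same accuracy. The endpoint $x = 1$ itself is the subject of Exercise 6.6.1.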

Taylor’s Formula for the Coefficients

Manipulating old series to produce new ones was a well-honed craft in the 17th and 18th centuries when the usefulness of infinite series was first being realized. But there also emerged a formula for producing the coefficients from “scratch”: a recipe for generating a power series representation using only the function in question and its derivatives. The technique is named after the mathematician Brook Taylor (1685–1731) who published it in 1715, although it was certainly known previous to this date. Given an infinitely differentiable function $f$ defined on some interval centered at zero, the idea is to assume that $f$ has a power series expansion and deduce what the coefficients must be; that is, write
\[
f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5 + \cdots. \tag{2}
\]
Setting $x = 0$ in this expression gives $f(0) = a_0$.

Exercise 6.6.3. (a) Take derivatives of each side of equation (2), and deduce that $f'(0) = a_1$. In general, prove that if $f$ has a power series expansion, then the coefficients must be given by the formula
\[
a_n = \frac{f^{(n)}(0)}{n!}.
\]
Supply references to the theorem(s) that justify the manipulations carried out on the series in equation (2).

Exercise 6.6.4. Use Taylor’s formula for $a_n$ from the preceding exercise to produce/verify the so-called Taylor series for $\sin(x)$ given by
\[
\sin(x) \sim x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots.
\]
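Taylor’s formula is mechanical enough to hand to a computer once the derivatives at zero are known. The sketch below assumes the familiar fact that the derivatives of $\sin$ at 0 repeat the pattern $0, 1, 0, -1$, and the names `DERIVS_AT_ZERO`, `a`, and `S` are introduced here for illustration.

```python
import math

# Values of sin, cos, -sin, -cos at 0: the derivatives of sin at zero, in order.
DERIVS_AT_ZERO = [0.0, 1.0, 0.0, -1.0]

def a(n):
    """Taylor coefficient a_n = f^(n)(0) / n! for f = sin."""
    return DERIVS_AT_ZERO[n % 4] / math.factorial(n)

def S(N, x):
    """Partial sum a_0 + a_1 x + ... + a_N x^N."""
    return sum(a(n) * x ** n for n in range(N + 1))

print([a(n) for n in range(8)])
print(S(15, 1.0), math.sin(1.0))
```

The nonzero coefficients reproduce the signs and factorial denominators in the displayed series. Whether the partial sums actually converge to $\sin(x)$ is a separate question that the formula alone does not settle, even though the numerical agreement at $x = 1$ is encouraging.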


Lagrange’s Remainder Theorem

We need to be very clear about what we have proved to this point. To derive Taylor’s formula, we assumed that $f$ actually had a power series representation. The conclusion is that if $f$ is infinitely differentiable on an interval centered at zero, and if $f$ can be expressed in the form

$$f(x) = \sum_{n=0}^{\infty} a_n x^n,$$

then it must be that

$$a_n = \frac{f^{(n)}(0)}{n!}.$$

But what about the converse question? Assume $f$ is infinitely differentiable in a neighborhood of zero. If we let

$$a_n = \frac{f^{(n)}(0)}{n!},$$

does the resulting series

$$\sum_{n=0}^{\infty} a_n x^n$$

converge to $f(x)$ on some nontrivial set of points? Does it converge at all? If it does converge, we know that the limit function is a well-behaved, infinitely differentiable function whose derivatives at zero are exactly the same as the derivatives of $f$. Is it possible for this limit to be different from $f$? In other words, might the Taylor series of a function $f$ converge to the wrong thing?

Let

$$S_N(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_N x^N.$$

The polynomial $S_N(x)$ is a partial sum of the Taylor series expansion for the function $f(x)$. Thus, we are interested in whether or not

$$\lim_{N \to \infty} S_N(x) = f(x)$$

for some values of $x$ besides zero. A powerful tool for analyzing this question was provided by Joseph Louis Lagrange (1736–1813). The idea is to consider the difference $E_N(x) = f(x) - S_N(x)$, which represents the error between $f$ and the partial sum $S_N$.

Theorem 6.6.1 (Lagrange’s Remainder Theorem). Let $f$ be infinitely differentiable on $(-R, R)$, define $a_n = f^{(n)}(0)/n!$, and let

$$S_N(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_N x^N.$$

Given $x \neq 0$, there exists a point $c$ satisfying $|c| < |x|$ where the error function $E_N(x) = f(x) - S_N(x)$ satisfies

$$E_N(x) = \frac{f^{(N+1)}(c)}{(N+1)!}\, x^{N+1}.$$

Before embarking on a proof, let’s examine the significance of this result. Proving $S_N(x) \to f(x)$ is equivalent to showing $E_N(x) \to 0$. There are three components to the expression for $E_N(x)$. In the denominator, we have $(N + 1)!$, which helps to make $E_N$ small as $N$ tends to infinity. In the numerator, we have $x^{N+1}$, which potentially grows depending on the size of $x$. Thus, we should expect that a Taylor series is less likely to converge the farther $x$ is chosen from the origin. Finally, we have $f^{(N+1)}(c)$, which is a bit of a mystery. For functions with straightforward derivatives, this term can often be handled using a suitable upper bound.

Example 6.6.2. Consider the Taylor series for $\sin(x)$ generated earlier. How well does

$$S_5(x) = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5$$

approximate $\sin(x)$ on the interval $[-2, 2]$? Because the sixth derivative of $\sin(x)$ is $-\sin(x)$, Lagrange’s Remainder Theorem asserts that the difference between these two functions is

$$E_5(x) = \sin(x) - S_5(x) = \frac{-\sin(c)}{6!}\, x^6$$

for some $c$ in the interval $(-|x|, |x|)$. Without knowing the value of $c$, we can be quite certain that $|\sin(c)| \le 1$. Because $x \in [-2, 2]$, we have that

$$|E_5(x)| \le \frac{2^6}{6!} \approx 0.089.$$
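The estimate in Example 6.6.2 can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names `taylor_sin` and `max_error_on_grid` are hypothetical, and sampling the interval on a finite grid only suggests, rather than proves, that the error stays below the Lagrange bound.

```python
import math

def taylor_sin(x, N):
    """Partial sum S_N(x) of the Taylor series for sin(x) about 0."""
    total = 0.0
    for n in range(1, N + 1, 2):          # only odd powers appear
        sign = (-1) ** ((n - 1) // 2)     # +, -, +, ... for n = 1, 3, 5, ...
        total += sign * x**n / math.factorial(n)
    return total

def max_error_on_grid(N, R, samples=2001):
    """Largest sampled value of |sin(x) - S_N(x)| for x in [-R, R]."""
    xs = [-R + 2 * R * i / (samples - 1) for i in range(samples)]
    return max(abs(math.sin(x) - taylor_sin(x, N)) for x in xs)

lagrange_bound = 2**6 / math.factorial(6)   # the bound 2^6/6! from the example
observed = max_error_on_grid(N=5, R=2.0)

print(f"Lagrange bound:             {lagrange_bound:.4f}")
print(f"Observed max error on grid: {observed:.4f}")
```

The observed error is noticeably smaller than the bound, which is expected: the bound replaces the unknown derivative value at $c$ by its worst case and $|x|$ by its largest value.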

Exercise 6.6.5. Prove that $S_N(x)$ converges uniformly to $\sin(x)$ on $[-2, 2]$. Generalize this proof to show that the convergence is uniform on any interval of the form $[-R, R]$.

Exercise 6.6.6. (a) Generate the Taylor coefficients for the exponential function $f(x) = e^x$, and then prove that the corresponding Taylor series converges uniformly to $e^x$ on any interval of the form $[-R, R]$. (b) Verify the formula $f'(x) = e^x$. (c) Use a substitution to generate the series for $e^{-x}$, and then calculate $e^x \cdot e^{-x}$ by multiplying together the two series and collecting common powers of $x$.

Proof of Lagrange’s Remainder Theorem: Review the Generalized Mean Value Theorem (Theorem 5.3.5) from Chapter 5.

Exercise 6.6.7. Explain why the error function $E_N(x) = f(x) - S_N(x)$ satisfies $E_N^{(n)}(0) = 0$ for all $n = 0, 1, 2, \ldots, N$.


To simplify notation, let’s assume $x > 0$ and apply the Generalized Mean Value Theorem to the functions $E_N(x)$ and $x^{N+1}$ on the interval $[0, x]$. Thus, there exists a point $x_1 \in (0, x)$ such that

$$\frac{E_N(x)}{x^{N+1}} = \frac{E_N'(x_1)}{(N+1)\,x_1^{N}}.$$

Exercise 6.6.8. Finish the proof of Lagrange’s Remainder Theorem.

A Counterexample

Lagrange’s Remainder Theorem is extremely useful for determining how well the partial sums of the Taylor series approximate the original function, but it leaves unresolved the central question of whether or not the Taylor series necessarily converges to the function that generated it. The appearance of the nth derivative $f^{(n)}(c)$ in the error formula makes any general statement impossible. There are, in fact, several other ways to represent the error between the partial sum $S_N(x)$ and the function $f(x)$, but none lend themselves to a proof that $S_N \to f$. This is because no such proof exists! Let

$$g(x) = \begin{cases} e^{-1/x^2} & \text{for } x \neq 0, \\ 0 & \text{for } x = 0. \end{cases}$$

In the exercises that follow we will need the familiar formula $\frac{d}{dx} e^x = e^x$ and the property that $e^{-x} = 1/e^x$. (Note that we could use the series generated in Exercise 6.6.6 as the definition of the exponential function $e^x$. Parts (b) and (c) of this exercise verify that this series possesses these properties.) Although we have proved all of the standard rules of differentiation, none of these rules can be used to directly compute the derivatives of $g$ at $x = 0$.

Exercise 6.6.9. Use the ∞/∞ version of L’Hospital’s rule (Theorem 5.3.8) to prove that $g'(0) = 0$.

Exercise 6.6.10. Compute $g'(x)$ for $x \neq 0$. Compute $g''(x)$ and $g'''(x)$ for $x \neq 0$. Use these observations and invent whatever notation is needed to give a general description for the nth derivative $g^{(n)}(x)$ at points different from zero.

Now,

$$g''(0) = \lim_{x \to 0} \frac{g'(x) - g'(0)}{x - 0} = \lim_{x \to 0} \frac{g'(x)}{x}.$$

Exercise 6.6.11. Compute $g''(0)$. From this example, produce a general argument for how to compute $g^{(n)}(0)$.

Exercise 6.6.12. Discuss the consequences of this example. Is $g$ infinitely differentiable? What does its Taylor series look like? At what points does this series converge? To what? What are the implications of this example on the conjecture that every infinitely differentiable function can be represented by its Taylor series expansion?

6.7 Epilogue

The fact that power series behave so impeccably well under the operations of calculus makes the search for Taylor series expansions a worthwhile enterprise. As it turns out, the traditional list of functions from calculus ($\sin(x)$, $\ln(x)$, $\arccos(x)$, $\sqrt{1 + x}$) all have Taylor series representations that converge on some nontrivial interval to the function from which they were derived. This fact played a major role in the expanding achievements of calculus in the 17th and 18th centuries and understandably led to speculation that every function could be represented in such a fashion. (The term “function” at this time implicitly referred to functions that were infinitely differentiable.) This point of view effectively ended with Cauchy’s discovery in 1821 of the counterexample presented at the end of the previous section.

So under what conditions does the Taylor series necessarily converge to the generating function? Lagrange’s Remainder Theorem states that the difference between the Taylor polynomial $S_N(x)$ and the function $f(x)$ is given by

$$E_N(x) = \frac{f^{(N+1)}(c)}{(N+1)!}\, x^{N+1}.$$

The Ratio Test shows that the $(N + 1)!$ term in the denominator grows more rapidly than the $x^{N+1}$ term in the numerator. Thus, if we knew for instance that $|f^{(N+1)}(c)| \le M$ for all $c \in (-R, R)$ and $N \in \mathbf{N}$, we could be sure that $E_N(x) \to 0$ and hence that $S_N(x) \to f(x)$. This is the case for $\sin(x)$, $\cos(x)$, and $e^x$, whose derivatives do not grow at all as $N \to \infty$. It is also possible to formulate weaker conditions on the rate of growth of $f^{(N+1)}$ that guarantee convergence.

It is not altogether clear whether Cauchy’s counterexample should come as a surprise. The fact that every previous search for a Taylor series ended in success certainly gives the impression that a power series representation is an intrinsic property of infinitely differentiable functions. But notice what we are saying here. A Taylor series for a function $f$ is constructed from the values of $f$ and its derivatives at the origin. If the Taylor series converges to $f$ on some interval $(-R, R)$, then the behavior of $f$ near zero completely determines its behavior at every point in $(-R, R)$. One implication of this would be that if two functions with Taylor series agree on some small neighborhood $(-\epsilon, \epsilon)$, then these two functions would have to be the same everywhere. When it is put this way, we probably should not expect a Taylor series to always converge back to the function from which it was derived. As we have seen, this is not the case for real-valued functions.

What is fascinating, however, is that results of this nature do hold for functions of a complex variable. The definition of the derivative looks symbolically the same when the real numbers are replaced by complex numbers, but the implications are profoundly different. In this setting, a function that is differentiable at every point in some open disc must necessarily be infinitely differentiable on this set. This supplies the ingredients to construct the Taylor series that in every instance converges uniformly on compact sets to the function that generated it.
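To see the competition between $R^{N+1}$ and $(N+1)!$ described in this epilogue, here is a small numerical sketch. The function name `remainder_bound` and the values $M = 1$, $R = 10$ are illustrative assumptions, chosen only to show that the bound $M R^{N+1}/(N+1)!$ can grow for a while before the factorial takes over.

```python
import math

def remainder_bound(M, R, N):
    """Upper bound M * R**(N+1) / (N+1)! on |E_N(x)| for |x| <= R."""
    return M * R ** (N + 1) / math.factorial(N + 1)

M, R = 1.0, 10.0                  # illustrative values; M = 1 works for sin and cos
degrees = list(range(0, 61, 10))  # N = 0, 10, 20, ..., 60
bounds = [remainder_bound(M, R, N) for N in degrees]

for N, b in zip(degrees, bounds):
    print(f"N = {N:2d}: bound = {b:.3e}")
```

For a large radius the bound is useless for small $N$, yet it still tends to zero, which is why the convergence holds on every interval $[-R, R]$ for functions with uniformly bounded derivatives.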

Chapter 7

The Riemann Integral

7.1 Discussion: How Should Integration be Defined?

The Fundamental Theorem of Calculus is a statement about the inverse relationship between differentiation and integration. It comes in two parts, depending on whether we are differentiating an integral or integrating a derivative. Under suitable hypotheses on the functions $f$ and $F$, the Fundamental Theorem of Calculus states that

(i) $\int_a^b F'(x)\,dx = F(b) - F(a)$ and

(ii) if $G(x) = \int_a^x f(t)\,dt$, then $G'(x) = f(x)$.

Before we can undertake any type of rigorous investigation of these statements, we need to settle on a definition for $\int_a^b f$. Historically, the concept of integration was defined as the inverse process of differentiation. In other words, the integral of a function $f$ was understood to be a function $F$ that satisfied $F' = f$. Newton, Leibniz, Fermat, and the other founders of calculus then went on to explore the relationship between antiderivatives and the problem of computing areas. This approach is ultimately unsatisfying from the point of view of analysis because it results in a very limited number of functions that can be integrated. Recall that every derivative satisfies the intermediate value property (Darboux’s Theorem, Theorem 5.2.7). This means that any function with a jump discontinuity cannot be a derivative. If we want to define integration via antidifferentiation, then we must accept the consequence that a function as simple as

$$h(x) = \begin{cases} 1 & \text{for } 0 \le x < 1 \\ 2 & \text{for } 1 \le x \le 2 \end{cases}$$

is not integrable on the interval $[0, 2]$.

Figure 7.1: A Riemann Sum.

A very interesting shift in emphasis occurred around 1850 in the work of Cauchy, and soon after in the work of Bernhard Riemann. The idea was to completely divorce integration from the derivative and instead use the notion of “area under the curve” as a starting point for building a rigorous definition of the integral. The reasons for this were complicated. As we have mentioned earlier (Section 1.2), the concept of function was undergoing a transformation. The traditional understanding of a function as a holistic formula such as $f(x) = x^2$ was being replaced with a more liberal interpretation, which included such bizarre constructions as Dirichlet’s function discussed in Section 4.1. Serving as a catalyst to this evolution was the budding theory of Fourier series (discussed in Section 8.3), which required, among other things, the need to be able to integrate these more unruly objects.

The Riemann integral, as it is called today, is the one usually discussed in introductory calculus. Starting with a function $f$ on $[a, b]$, we partition the domain into small subintervals. On each subinterval $[x_{k-1}, x_k]$, we pick some point $c_k \in [x_{k-1}, x_k]$ and use the y-value $f(c_k)$ as an approximation for $f$ on $[x_{k-1}, x_k]$. Graphically speaking, the result is a row of thin rectangles constructed to approximate the area between $f$ and the x-axis. The area of each rectangle is $f(c_k)(x_k - x_{k-1})$, and so the total area of all of the rectangles is given by the Riemann sum (Fig. 7.1)

$$\sum_{k=1}^{n} f(c_k)(x_k - x_{k-1}).$$

Note that “area” here comes with the understanding that areas below the x-axis are assigned a negative value.

What should be evident from the graph is that the accuracy of the Riemann-sum approximation seems to improve as the rectangles get thinner. In some sense, we take the limit of these approximating Riemann sums as the width of the individual subintervals of the partitions tends to zero. This limit, if it exists, is Riemann’s definition of $\int_a^b f$.
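The Riemann sum just described translates directly into a few lines of code. The sketch below is only an illustration under stated assumptions: the function names `riemann_sum` and `uniform_midpoint_sum` are hypothetical, the partition is taken to be uniform, and the points $c_k$ are chosen as midpoints, which is one of many admissible choices.

```python
def riemann_sum(f, partition, tags):
    """Sum of f(c_k) * (x_k - x_{k-1}) for tags c_k in [x_{k-1}, x_k]."""
    if len(tags) != len(partition) - 1:
        raise ValueError("need exactly one tag per subinterval")
    total = 0.0
    for k in range(1, len(partition)):
        left, right = partition[k - 1], partition[k]
        c = tags[k - 1]
        if not (left <= c <= right):
            raise ValueError("each tag must lie in its subinterval")
        total += f(c) * (right - left)   # signed area of one thin rectangle
    return total

def uniform_midpoint_sum(f, a, b, n):
    """Riemann sum over n equal subintervals of [a, b], tagged at midpoints."""
    partition = [a + (b - a) * k / n for k in range(n + 1)]
    tags = [(partition[k - 1] + partition[k]) / 2 for k in range(1, n + 1)]
    return riemann_sum(f, partition, tags)

# The sums for f(x) = x^2 on [0, 1] settle down as the rectangles get thinner.
for n in (4, 16, 64, 256):
    print(n, uniform_midpoint_sum(lambda x: x * x, 0.0, 1.0, n))
```

Watching the printed values stabilize is suggestive, but it does not settle the question raised next: for which functions is such a limit guaranteed to exist, independent of how the partitions and the points $c_k$ are chosen?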

This brings us to a handful of questions. Creating a rigorous meaning for the limit just referred to is not too diﬃcult. What will be of most interest to us—and was also to Riemann—is deciding what types of functions can be integrated using this procedure. Speciﬁcally, what conditions on f guarantee that this limit exists?

The theory of the Riemann integral turns on the observation that smaller subintervals produce better approximations to the function $f$. On each subinterval $[x_{k-1}, x_k]$, the function $f$ is approximated by its value at some point $c_k \in [x_{k-1}, x_k]$. The quality of the approximation is directly related to the difference $|f(x) - f(c_k)|$ as $x$ ranges over the subinterval. Because the subintervals can be chosen to have arbitrarily small width, this means that we want $f(x)$ to be close to $f(c_k)$ whenever $x$ is close to $c_k$. But this sounds like a discussion of continuity! We will soon see that the continuity of $f$ is intimately related to the existence of the Riemann integral $\int_a^b f$.

Is continuity sufficient to prove that the Riemann sums converge to a well-defined limit? Is it necessary, or can the Riemann integral handle a discontinuous function such as $h(x)$ mentioned earlier? Relying on the intuitive notion of area, it would seem that $\int_0^2 h = 3$, but does the Riemann integral reach this conclusion? If so, how discontinuous can a function be before it fails to be integrable? Can the Riemann integral make sense out of something as pathological as Dirichlet’s function on the interval $[0, 1]$? A function such as

$$g(x) = \begin{cases} x^2 \sin\left(\frac{1}{x}\right) & \text{for } x \neq 0 \\ 0 & \text{for } x = 0 \end{cases}$$

raises another interesting question. Here is an example of a differentiable function, studied in Section 5.1, where the derivative $g'(x)$ is not continuous. As we explore the class of integrable functions, some attempt must be made to reunite the integral with the derivative. Having defined integration independently of differentiation, we would like to come back and investigate the conditions under which equations (i) and (ii) from the Fundamental Theorem of Calculus stated earlier hold. If we are making a wish list for the types of functions that we want to be integrable, then in light of equation (i) it seems desirable to expect this set to at least contain the set of derivatives. The fact that derivatives are not always continuous is further motivation not to content ourselves with an integral that cannot handle some discontinuities.

Figure 7.2: Upper and Lower Sums.

7.2 The Definition of the Riemann Integral

Although it has the beneﬁt of some modern polish, the development of the integral presented in this chapter is closely related to the procedure just discussed. In place of Riemann sums, we will construct upper sums and lower sums (Fig. 7.2), and in place of a limit we will use a supremum and an inﬁmum. Throughout this section, it is assumed that we are working with a bounded function f on a closed interval [a, b], meaning that there exists an M > 0 such that |f (x)| ≤ M for all x ∈ [a, b].

Partitions, Upper Sums, and Lower Sums

Definition 7.2.1. A partition $P$ of $[a, b]$ is a finite, ordered set

$$P = \{a = x_0 < x_1 < x_2 < \cdots < x_n = b\}.$$

For each subinterval $[x_{k-1}, x_k]$ of $P$, let

$$m_k = \inf\{f(x) : x \in [x_{k-1}, x_k]\} \quad \text{and} \quad M_k = \sup\{f(x) : x \in [x_{k-1}, x_k]\}.$$

The lower sum of $f$ with respect to $P$ is given by

$$L(f, P) = \sum_{k=1}^{n} m_k (x_k - x_{k-1}).$$

Likewise, we define the upper sum of $f$ with respect to $P$ by

$$U(f, P) = \sum_{k=1}^{n} M_k (x_k - x_{k-1}).$$
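As a concrete illustration of Definition 7.2.1, the following sketch computes a lower and an upper sum exactly. It relies on an assumption that is not part of the definition: the function is increasing, so that $m_k = f(x_{k-1})$ and $M_k = f(x_k)$ and no search for an infimum or supremum is needed. The function name and the sample partition are hypothetical choices.

```python
from fractions import Fraction

def lower_upper_sums_increasing(f, partition):
    """Return (L(f, P), U(f, P)) for a function f that is increasing on [a, b].

    For an increasing f, the infimum on [x_{k-1}, x_k] is f(x_{k-1}) and the
    supremum is f(x_k), so both sums can be computed exactly.
    """
    lower = upper = Fraction(0)
    for left, right in zip(partition, partition[1:]):
        width = right - left
        lower += f(left) * width    # m_k * (x_k - x_{k-1})
        upper += f(right) * width   # M_k * (x_k - x_{k-1})
    return lower, upper

# f(x) = x^2 on [0, 1] with the partition {0, 1/4, 1/2, 1}
P = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1)]
L, U = lower_upper_sums_increasing(lambda x: x * x, P)
print(L, U)   # prints 9/64 37/64
```

Exact rational arithmetic is used so that the two sums can be compared without rounding error; for a function that is not monotone, the values $m_k$ and $M_k$ would have to be found by other means.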


For a particular partition $P$, it is clear that $U(f, P) \ge L(f, P)$. The fact that this same inequality holds if the upper and lower sums are computed with respect to different partitions is the content of the next two lemmas.

Definition 7.2.2. A partition $Q$ is a refinement of a partition $P$ if $Q$ contains all of the points of $P$. In this case, we write $P \subseteq Q$.

Lemma 7.2.3. If $P \subseteq Q$, then $L(f, P) \le L(f, Q)$, and $U(f, P) \ge U(f, Q)$.

Proof. Consider what happens when we refine $P$ by adding a single point $z$ to some subinterval $[x_{k-1}, x_k]$ of $P$. Focusing on the lower sum for a moment, we have

$$m_k(x_k - x_{k-1}) = m_k(x_k - z) + m_k(z - x_{k-1}) \le m_k'(x_k - z) + m_k''(z - x_{k-1}),$$

where

$$m_k' = \inf\{f(x) : x \in [z, x_k]\} \quad \text{and} \quad m_k'' = \inf\{f(x) : x \in [x_{k-1}, z]\}$$

are each necessarily as large or larger than $m_k$. By induction, we have $L(f, P) \le L(f, Q)$, and an analogous argument holds for the upper sums.

Lemma 7.2.4. If $P_1$ and $P_2$ are any two partitions of $[a, b]$, then $L(f, P_1) \le U(f, P_2)$.

Proof. Let $Q = P_1 \cup P_2$ be the so-called common refinement of $P_1$ and $P_2$. Because $P_1 \subseteq Q$ and $P_2 \subseteq Q$, it follows that

$$L(f, P_1) \le L(f, Q) \le U(f, Q) \le U(f, P_2).$$
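The chain of inequalities in Lemma 7.2.4 can be observed on a small example. This is only a sanity check under assumptions, not a proof: the function $f(x) = x^3$ is chosen because it is increasing on $[0, 1]$ (so the infimum and supremum on each subinterval sit at the endpoints), and the names `sums_increasing`, `common_refinement`, and the two partitions are hypothetical.

```python
from fractions import Fraction as F

def sums_increasing(f, partition):
    """Return (L(f, P), U(f, P)) for increasing f: m_k = f(x_{k-1}), M_k = f(x_k)."""
    pairs = list(zip(partition, partition[1:]))
    lower = sum(f(left) * (right - left) for left, right in pairs)
    upper = sum(f(right) * (right - left) for left, right in pairs)
    return lower, upper

def common_refinement(P1, P2):
    """The partition Q = P1 ∪ P2, listed in increasing order."""
    return sorted(set(P1) | set(P2))

def cube(x):
    return x ** 3          # increasing on [0, 1]

P1 = [F(0), F(1, 3), F(1)]
P2 = [F(0), F(1, 2), F(3, 4), F(1)]
Q = common_refinement(P1, P2)

L1, U1 = sums_increasing(cube, P1)
L2, U2 = sums_increasing(cube, P2)
LQ, UQ = sums_increasing(cube, Q)

print("L(f,P1) <= L(f,Q) <= U(f,Q) <= U(f,P2):", L1 <= LQ <= UQ <= U2)
print("L(f,P2) <= L(f,Q) <= U(f,Q) <= U(f,P1):", L2 <= LQ <= UQ <= U1)
```

Both comparisons pass through the common refinement $Q$, mirroring the structure of the proof: refining can only raise lower sums and lower upper sums.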


Integrability

Intuitively, it helps to visualize a particular upper sum as an overestimate for the value of the integral and a lower sum as an underestimate. As the partitions get more refined, the upper sums get potentially smaller while the lower sums get potentially larger. A function is integrable if the upper and lower sums “meet” at some common value in the middle. Rather than taking a limit of these sums, we will instead make use of the Axiom of Completeness and consider the infimum of the upper sums and the supremum of the lower sums.

Definition 7.2.5. Let $\mathcal{P}$ be the collection of all possible partitions of the interval $[a, b]$. The upper integral of $f$ is defined to be

$$U(f) = \inf\{U(f, P) : P \in \mathcal{P}\}.$$

In a similar way, define the lower integral of $f$ by

$$L(f) = \sup\{L(f, P) : P \in \mathcal{P}\}.$$

The following fact is not surprising.

Lemma 7.2.6. For any bounded function $f$ on $[a, b]$, it is always the case that $U(f) \ge L(f)$.

Proof. Exercise 7.2.1.

Definition 7.2.7 (Riemann Integrability). A bounded function $f$ defined on the interval $[a, b]$ is Riemann-integrable if $U(f) = L(f)$. In this case, we define $\int_a^b f$ or $\int_a^b f(x)\,dx$ to be this common value; namely,

$$\int_a^b f = U(f) = L(f).$$

The modiﬁer “Riemann” in front of “integrable” accurately suggests that there are other ways to deﬁne the integral. In fact, our work in this chapter will expose the need for a diﬀerent approach, one of which is discussed in Section 8.1. In this chapter, the Riemann integral is the only method under consideration, so it will usually be convenient to drop the modiﬁer “Riemann” and simply refer to a function as being “integrable.”

Criteria for Integrability

To summarize the situation thus far, it is always the case for a bounded function $f$ on $[a, b]$ that

$$\sup\{L(f, P) : P \in \mathcal{P}\} = L(f) \le U(f) = \inf\{U(f, P) : P \in \mathcal{P}\}.$$

The function $f$ is integrable if the inequality is an equality. The major thrust of our investigation of the integral is to describe, as best we can, the class of integrable functions. The preceding inequality reveals that integrability is really equivalent to the existence of partitions whose upper and lower sums are arbitrarily close together.

Theorem 7.2.8. A bounded function $f$ is integrable on $[a, b]$ if and only if, for every $\epsilon > 0$, there exists a partition $P_\epsilon$ of $[a, b]$ such that

$$U(f, P_\epsilon) - L(f, P_\epsilon) < \epsilon.$$

Proof. Let $\epsilon > 0$. If such a partition $P_\epsilon$ exists, then

$$U(f) - L(f) \le U(f, P_\epsilon) - L(f, P_\epsilon) < \epsilon.$$

Because $\epsilon$ is arbitrary, it must be that $U(f) = L(f)$, so $f$ is integrable. (To be absolutely precise here, we could throw in a reference to Theorem 1.2.6.)

The proof of the converse statement is a familiar triangle inequality argument with parentheses in place of absolute value bars because, in each case, we know which quantity is larger. Because $U(f)$ is the greatest lower bound of the upper sums, we know that, given some $\epsilon > 0$, there must exist a partition $P_1$ such that

$$U(f, P_1) < U(f) + \frac{\epsilon}{2}.$$

Likewise, because $L(f)$ is the least upper bound of the lower sums, there exists a partition $P_2$ satisfying

$$L(f, P_2) > L(f) - \frac{\epsilon}{2}.$$

Now, let $P_\epsilon = P_1 \cup P_2$ be the common refinement. Keeping in mind that the integrability of $f$ means $U(f) = L(f)$, we can write

$$U(f, P_\epsilon) - L(f, P_\epsilon) \le U(f, P_1) - L(f, P_2) = (U(f, P_1) - U(f)) + (L(f) - L(f, P_2)) < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$

In the discussion at the beginning of this chapter, it became clear that integrability is closely tied to the concept of continuity. To make this observation more precise, let $P = \{a = x_0 < x_1 < x_2 < \cdots < x_n = b\}$ be an arbitrary partition of $[a, b]$, and define $\Delta x_k = x_k - x_{k-1}$. Then,

$$U(f, P) - L(f, P) = \sum_{k=1}^{n} (M_k - m_k)\Delta x_k,$$

where Mk and mk are the supremum and inﬁmum of the function on the interval [xk−1 , xk ] respectively. Our ability to control the size of U (f, P )−L(f, P ) hinges on the diﬀerences Mk − mk , which we can interpret as the variation in the range of the function over the interval [xk−1 , xk ]. Restricting the variation of f over arbitrarily small intervals in [a, b] is precisely what it means to say that f is uniformly continuous on this set.


Theorem 7.2.9. If $f$ is continuous on $[a, b]$, then it is integrable.

Proof. The first crucial observation is that because $f$ is continuous on a compact set, it is uniformly continuous. This means that, given $\epsilon > 0$, there exists a $\delta > 0$ so that $|x - y| < \delta$ guarantees

$$|f(x) - f(y)| < \frac{\epsilon}{b - a}.$$

Now, let $P$ be a partition of $[a, b]$ where $\Delta x_k = x_k - x_{k-1}$ is less than $\delta$ for every subinterval of $P$.

Given a particular subinterval $[x_{k-1}, x_k]$ of $P$, we know from the Extreme Value Theorem (Theorem 4.4.3) that the supremum $M_k = f(z_k)$ for some $z_k \in [x_{k-1}, x_k]$. In addition, the infimum $m_k$ is attained at some point $y_k$ also in the interval $[x_{k-1}, x_k]$. But this means $|z_k - y_k| < \delta$, so

$$M_k - m_k = f(z_k) - f(y_k) < \frac{\epsilon}{b - a}.$$

Finally,

$$U(f, P) - L(f, P) = \sum_{k=1}^{n} (M_k - m_k)\Delta x_k < \frac{\epsilon}{b - a} \sum_{k=1}^{n} \Delta x_k = \epsilon,$$

and $f$ is integrable by the criterion given in Theorem 7.2.8.

Exercises

Exercise 7.2.1. Let $f$ be a bounded function on $[a, b]$, and let $P$ be an arbitrary partition of $[a, b]$. First, explain why $U(f) \ge L(f, P)$. Now, prove Lemma 7.2.6.

Exercise 7.2.2. Consider $f(x) = 2x + 1$ over the interval $[1, 3]$. Let $P$ be the partition consisting of the points $\{1, 3/2, 2, 3\}$. (a) Compute $L(f, P)$, $U(f, P)$, and $U(f, P) - L(f, P)$. (b) What happens to the value of $U(f, P) - L(f, P)$ when we add the point $5/2$ to the partition? (c) Find a partition $P'$ of $[1, 3]$ for which $U(f, P') - L(f, P') < 2$.


Exercise 7.2.3. Show directly (without appealing to Theorem 7.2) that the constant function $f(x) = k$ is integrable over any closed interval $[a, b]$. What is $\int_a^b f$?

Exercise 7.2.4. (a) Prove that a bounded function $f$ is integrable on $[a, b]$ if and only if there exists a sequence of partitions $(P_n)_{n=1}^{\infty}$ satisfying

$$\lim_{n \to \infty} \left[U(f, P_n) - L(f, P_n)\right] = 0.$$

(b) For each $n$, let $P_n$ be the partition of $[0, 1]$ into $n$ equal subintervals. Find formulas for $U(f, P_n)$ and $L(f, P_n)$ if $f(x) = x$. The formula $1 + 2 + 3 + \cdots + n = n(n + 1)/2$ will be useful. (c) Use the sequential criterion for integrability from (a) to show directly that $f(x) = x$ is integrable on $[0, 1]$.

Exercise 7.2.5. Assume that, for each $n$, $f_n$ is an integrable function on $[a, b]$. If $(f_n) \to f$ uniformly on $[a, b]$, prove that $f$ is also integrable on this set. (We will see that this conclusion does not necessarily follow if the convergence is pointwise.)

Exercise 7.2.6. Let $f : [a, b] \to \mathbf{R}$ be increasing on the set $[a, b]$ (i.e., $f(x) \le f(y)$ whenever $x < y$). Show that $f$ is integrable on $[a, b]$.

7.3 Integrating Functions with Discontinuities

The fact that continuous functions are integrable is not so much a fortunate discovery as it is evidence for a well-designed integral. Riemann’s integral is a modification of Cauchy’s definition of the integral, which was crafted specifically to work on continuous functions. The interesting issue is discovering just how dependent the Riemann integral is on the continuity of the integrand.

Example 7.3.1. Consider the function

$$f(x) = \begin{cases} 1 & \text{for } x \neq 1 \\ 0 & \text{for } x = 1 \end{cases}$$

on the interval $[0, 2]$. If $P$ is any partition of $[0, 2]$, a quick calculation reveals that $U(f, P) = 2$. The lower sum $L(f, P)$ will be less than 2 because any subinterval of $P$ that contains $x = 1$ will contribute zero to the value of the lower sum. The way to show that $f$ is integrable is to construct a partition that minimizes the effect of the discontinuity by embedding $x = 1$ into a very small subinterval. Let $\epsilon > 0$, and consider the partition $P_\epsilon = \{0, 1 - \epsilon/3, 1 + \epsilon/3, 2\}$. The middle subinterval has width $2\epsilon/3$, so

$$L(f, P_\epsilon) = 1\left(1 - \frac{\epsilon}{3}\right) + 0\left(\frac{2\epsilon}{3}\right) + 1\left(1 - \frac{\epsilon}{3}\right) = 2 - \frac{2}{3}\epsilon.$$

Because $U(f, P_\epsilon) = 2$, we have

$$U(f, P_\epsilon) - L(f, P_\epsilon) = \frac{2}{3}\epsilon < \epsilon.$$

We can now use Theorem 7.2.8 to conclude that $f$ is integrable.

Although the function in Example 7.3.1 is extremely simple, the method used to show it is integrable is really the same one used to prove that any bounded function with a single discontinuity is integrable. The notation in the following proof is more cumbersome, but the essence of the argument is that the misbehavior of the function at its discontinuity is isolated inside a particularly small subinterval of the partition.

Theorem 7.3.2. If $f : [a, b] \to \mathbf{R}$ is bounded, and $f$ is integrable on $[c, b]$ for all $c \in (a, b)$, then $f$ is integrable on $[a, b]$. An analogous result holds at the other endpoint.

Proof. Let $M$ be a bound for $f$ so that $|f(x)| \le M$ for all $x \in [a, b]$. If $P = \{a = x_0 < x_1 < x_2 < \cdots < x_n = b\}$ is a partition of $[a, b]$, then

$$\begin{aligned}
U(f, P) - L(f, P) &= \sum_{k=1}^{n} (M_k - m_k)\Delta x_k \\
&= (M_1 - m_1)(x_1 - a) + \sum_{k=2}^{n} (M_k - m_k)\Delta x_k \\
&= (M_1 - m_1)(x_1 - a) + \left(U(f, P_{[x_1, b]}) - L(f, P_{[x_1, b]})\right),
\end{aligned}$$

where $P_{[x_1, b]} = \{x_1 < x_2 < \cdots < x_n = b\}$ is the partition of $[x_1, b]$ obtained by deleting $a$ from $P$.

Given $\epsilon > 0$, the first step is to choose $x_1$ close enough to $a$ so that

$$(M_1 - m_1)(x_1 - a) < \frac{\epsilon}{2}.$$

This is not too difficult. Because $M_1 - m_1 \le 2M$, we can pick $x_1$ so that

$$x_1 - a < \frac{\epsilon}{4M}.$$

Now, by hypothesis, $f$ is integrable on $[x_1, b]$, so there exists a partition $P_1$ of $[x_1, b]$ for which

$$U(f, P_1) - L(f, P_1) < \frac{\epsilon}{2}.$$

Finally, we let $P_2 = \{a\} \cup P_1$ be a partition of $[a, b]$, from which it follows that

$$U(f, P_2) - L(f, P_2) \le (2M)(x_1 - a) + \left(U(f, P_1) - L(f, P_1)\right) < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$
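The strategy of isolating a discontinuity inside a small subinterval, used in Example 7.3.1 and again in the proof above, can be checked with exact arithmetic. The sketch below is specific to the function of Example 7.3.1 on $[0, 2]$; the function names are hypothetical, and the values of $m_k$ and $M_k$ are hard-coded from the reasoning in the example rather than computed from a general definition.

```python
from fractions import Fraction

def lower_upper_sums_example(partition):
    """(L(f, P), U(f, P)) for f(x) = 1 if x != 1 and f(1) = 0, on [0, 2]."""
    lower = upper = Fraction(0)
    for left, right in zip(partition, partition[1:]):
        width = right - left
        M_k = 1                                # every subinterval contains points x != 1
        m_k = 0 if left <= 1 <= right else 1   # the infimum is 0 only if x = 1 is included
        lower += m_k * width
        upper += M_k * width
    return lower, upper

def isolating_partition(eps):
    """The partition {0, 1 - eps/3, 1 + eps/3, 2} from Example 7.3.1."""
    return [Fraction(0), 1 - eps / 3, 1 + eps / 3, Fraction(2)]

for eps in (Fraction(1), Fraction(1, 10), Fraction(1, 1000)):
    L, U = lower_upper_sums_example(isolating_partition(eps))
    print(f"eps = {eps}: U - L = {U - L}")
```

In each case the gap between the upper and lower sums equals two thirds of the chosen tolerance, so it can be made as small as we like by shrinking the subinterval around $x = 1$.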


Theorem 7.3.2 only allows for a discontinuity at the endpoint of an interval, but that is easily remedied. In the next section, we will prove that integrability on the intervals $[a, b]$ and $[b, d]$ is equivalent to integrability on $[a, d]$. This property, together with an induction argument, leads to the conclusion that any function with a finite number of discontinuities is still integrable. What if the number of discontinuities is infinite?

Example 7.3.3. Recall Dirichlet’s function

$$g(x) = \begin{cases} 1 & \text{for } x \text{ rational} \\ 0 & \text{for } x \text{ irrational} \end{cases}$$

from Section 4.1. If $P$ is some partition of $[0, 1]$, then the density of the rationals in $\mathbf{R}$ implies that every subinterval of $P$ will contain a point where $g(x) = 1$. It follows that $U(g, P) = 1$. On the other hand, $L(g, P) = 0$ because the irrationals are also dense in $\mathbf{R}$. Because this is the case for every partition $P$, we see that the upper integral $U(g) = 1$ and the lower integral $L(g) = 0$. The two are not equal, so we conclude that Dirichlet’s function is not integrable.

How discontinuous can a function be before it fails to be integrable? Before jumping to the hasty (and incorrect) conclusion that the Riemann integral fails for functions with more than a finite number of discontinuities, we should realize that Dirichlet’s function is discontinuous at every point in $[0, 1]$. It would be useful to investigate a function where the discontinuities are infinite in number but do not necessarily make up all of $[0, 1]$. Thomae’s function, also defined in Section 4.1, is one such example. The discontinuous points of this function are precisely the rational numbers in $[0, 1]$. In Section 7.6, we will see that Thomae’s function is Riemann-integrable, raising the bar for allowable discontinuous points to include potentially infinite sets. The conclusion of this story is contained in the doctoral dissertation of Henri Lebesgue, who presented his work in 1901. Lebesgue’s elegant criterion for Riemann integrability is explored in great detail in Section 7.6.
For the moment, though, we will take a short detour from questions of integrability and construct a proof of the celebrated Fundamental Theorem of Calculus.

Exercises

Exercise 7.3.1. Consider the function

$$h(x) = \begin{cases} 1 & \text{for } 0 \le x < 1 \\ 2 & \text{for } x = 1 \end{cases}$$

over the interval $[0, 1]$. (a) Show that $L(h, P) = 1$ for every partition $P$ of $[0, 1]$. (b) Construct a partition $P$ for which $U(h, P) < 1 + 1/10$. (c) Given $\epsilon > 0$, construct a partition $P_\epsilon$ for which $U(h, P_\epsilon) < 1 + \epsilon$.


Exercise 7.3.2. In Example 7.3.3, we learned that Dirichlet’s function $g(x)$ is not Riemann-integrable. Construct a sequence $g_n(x)$ of integrable functions with $g_n \to g$ pointwise on $[0, 1]$. This demonstrates that the pointwise limit of integrable functions need not be integrable. Compare this example to the result requested in Exercise 7.2.5.

Exercise 7.3.3. Here is an alternate explanation for why a function $f$ on $[a, b]$ with a finite number of discontinuities is integrable. Supply the missing details. Embed each discontinuity in a sufficiently small open interval and let $O$ be the union of these intervals. Explain why $f$ is uniformly continuous on $[a, b] \setminus O$, and use this to finish the argument.

Exercise 7.3.4. Assume $f : [a, b] \to \mathbf{R}$ is integrable. (a) Show that if one value of $f(x)$ is changed at some point $x \in [a, b]$, then $f$ is still integrable and integrates to the same value as before. (b) Show that the observation in (a) holds if a finite number of values of $f$ are changed. (c) Find an example to show that by altering a countable number of values, $f$ may fail to be integrable.

Exercise 7.3.5. Let

$$f(x) = \begin{cases} 1 & \text{if } x = 1/n \text{ for some } n \in \mathbf{N} \\ 0 & \text{otherwise.} \end{cases}$$

Show that $f$ is integrable on $[0, 1]$ and compute $\int_0^1 f$.

Exercise 7.3.6. A set $A \subseteq [a, b]$ has content zero if for every $\epsilon > 0$ there exists a finite collection of open intervals $\{O_1, O_2, \ldots, O_N\}$ that contain $A$ in their union and whose lengths sum to $\epsilon$ or less. Using $|O_n|$ to refer to the length of each interval, we have

$$A \subseteq \bigcup_{n=1}^{N} O_n \quad \text{and} \quad \sum_{n=1}^{N} |O_n| \le \epsilon.$$

(a) Let $f$ be bounded on $[a, b]$. Show that if the set of discontinuous points of $f$ has content zero, then $f$ is integrable. (b) Show that any finite set has content zero. (c) Content zero sets do not have to be finite. They do not have to be countable. Show that the Cantor set $C$ defined in Section 3.1 has content zero. (d) Prove that

$$h(x) = \begin{cases} 1 & \text{if } x \in C \\ 0 & \text{if } x \notin C \end{cases}$$

is integrable, and find the value of the integral.

7.4 Properties of the Integral

Before embarking on the proof of the Fundamental Theorem of Calculus, we need to verify what are probably some very familiar properties of the integral. The discussion in the previous section has already made use of the following fact.

Theorem 7.4.1. Assume $f : [a, b] \to \mathbf{R}$ is bounded, and let $c \in (a, b)$. Then, $f$ is integrable on $[a, b]$ if and only if $f$ is integrable on $[a, c]$ and $[c, b]$. In this case, we have

$$\int_a^b f = \int_a^c f + \int_c^b f.$$

Proof. If $f$ is integrable on $[a, b]$, then for $\epsilon > 0$ there exists a partition $P$ such that $U(f, P) - L(f, P) < \epsilon$. Because refining a partition can only potentially bring the upper and lower sums closer together, we can simply add $c$ to $P$ if it is not already there. Then, let $P_1 = P \cap [a, c]$ be a partition of $[a, c]$, and $P_2 = P \cap [c, b]$ be a partition of $[c, b]$. It follows that

$$U(f, P_1) - L(f, P_1) < \epsilon \quad \text{and} \quad U(f, P_2) - L(f, P_2) < \epsilon,$$

implying that $f$ is integrable on $[a, c]$ and $[c, b]$.

Conversely, if we are given that $f$ is integrable on the two smaller intervals $[a, c]$ and $[c, b]$, then given an $\epsilon > 0$ we can produce partitions $P_1$ and $P_2$ of $[a, c]$ and $[c, b]$, respectively, such that

$$U(f, P_1) - L(f, P_1) < \frac{\epsilon}{2} \quad \text{and} \quad U(f, P_2) - L(f, P_2) < \frac{\epsilon}{2}.$$

Letting $P = P_1 \cup P_2$ produces a partition of $[a, b]$ for which

$$U(f, P) - L(f, P) < \epsilon.$$

Thus, $f$ is integrable on $[a, b]$.

Continuing to let $P = P_1 \cup P_2$ as earlier, we have

$$\begin{aligned}
\int_a^b f \le U(f, P) &< L(f, P) + \epsilon \\
&= L(f, P_1) + L(f, P_2) + \epsilon \\
&\le \int_a^c f + \int_c^b f + \epsilon,
\end{aligned}$$

which implies $\int_a^b f \le \int_a^c f + \int_c^b f$. To get the other inequality, observe that

$$\begin{aligned}
\int_a^c f + \int_c^b f &\le U(f, P_1) + U(f, P_2) \\
&< L(f, P_1) + L(f, P_2) + \epsilon \\
&= L(f, P) + \epsilon \\
&\le \int_a^b f + \epsilon.
\end{aligned}$$

Because $\epsilon > 0$ is arbitrary, we must have $\int_a^c f + \int_c^b f \le \int_a^b f$, so

$$\int_a^c f + \int_c^b f = \int_a^b f,$$

as desired.

The proof of Theorem 7.4.1 demonstrates some of the standard techniques involved for proving facts about the Riemann integral. Admittedly, manipulating partitions does not lend itself to a great deal of elegance. The next result catalogs the remainder of the basic properties of the integral that we will need in our upcoming arguments.

Theorem 7.4.2. Assume $f$ and $g$ are integrable functions on the interval $[a, b]$.
(i) The function $f + g$ is integrable on $[a, b]$ with $\int_a^b (f + g) = \int_a^b f + \int_a^b g$.
(ii) For $k \in \mathbb{R}$, the function $kf$ is integrable with $\int_a^b kf = k \int_a^b f$.
(iii) If $m \le f \le M$, then $m(b - a) \le \int_a^b f \le M(b - a)$.
(iv) If $f \le g$, then $\int_a^b f \le \int_a^b g$.
(v) The function $|f|$ is integrable and $\left| \int_a^b f \right| \le \int_a^b |f|$.

Proof. Properties (i) and (ii) are reminiscent of the Algebraic Limit Theorem and its many descendants (Theorems 2.3.3, 2.7.1, 4.2.4, and 5.2.4). In fact, there is a way to use the Algebraic Limit Theorem for this argument as well. An immediate corollary to Theorem 7.2.8 is that a function $f$ is integrable on $[a, b]$ if and only if there exists a sequence of partitions $(P_n)$ satisfying

\[ \lim_{n \to \infty} \left[ U(f, P_n) - L(f, P_n) \right] = 0, \tag{1} \]

and in this case $\int_a^b f = \lim U(f, P_n) = \lim L(f, P_n)$. (A proof for this was requested as Exercise 7.2.4.)

To prove (ii) for the case $k \ge 0$, first verify that for any partition $P$ we have

\[ U(kf, P) = k\,U(f, P) \quad \text{and} \quad L(kf, P) = k\,L(f, P). \]

Exercise 1.3.5 is used here. Because $f$ is integrable, there exist partitions $(P_n)$ satisfying (1). Turning our attention to the function $(kf)$, we see that

\[ \lim_{n \to \infty} \left[ U(kf, P_n) - L(kf, P_n) \right] = \lim_{n \to \infty} k \left[ U(f, P_n) - L(f, P_n) \right] = 0, \]

and the formula in (ii) follows. The case where $k < 0$ is similar except that we have $U(kf, P_n) = k\,L(f, P_n)$ and $L(kf, P_n) = k\,U(f, P_n)$.
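The two scaling identities for upper and lower sums can be checked on a concrete case. The following Python sketch is only an illustration under stated assumptions: the helper name `upper_lower_monotone`, the function $f(x) = x^2$ on $[0, 1]$, the uniform partition into eight pieces, and the constants $k = \pm 3$ are all choices made here, not part of the text. It relies on the fact that a function that is monotone on a subinterval attains its supremum and infimum at the endpoints, so the sums are computed exactly with rational arithmetic.

```python
from fractions import Fraction


def upper_lower_monotone(f, partition):
    """Return (U, L) for a function that is monotone on each subinterval.

    For such a function the supremum and infimum on [x_{k-1}, x_k] are
    attained at the endpoints, so both sums are exact.
    """
    upper = Fraction(0)
    lower = Fraction(0)
    for left, right in zip(partition, partition[1:]):
        values = (f(left), f(right))
        width = right - left
        upper += max(values) * width
        lower += min(values) * width
    return upper, lower


def f(x):
    return x * x  # increasing on [0, 1]


n = 8
P = [Fraction(k, n) for k in range(n + 1)]  # uniform partition of [0, 1]
U_f, L_f = upper_lower_monotone(f, P)

k_pos, k_neg = Fraction(3), Fraction(-3)
U_pos, L_pos = upper_lower_monotone(lambda x: k_pos * f(x), P)
U_neg, L_neg = upper_lower_monotone(lambda x: k_neg * f(x), P)

print(U_pos == k_pos * U_f, L_pos == k_pos * L_f)  # k >= 0: sums scale directly
print(U_neg == k_neg * L_f, L_neg == k_neg * U_f)  # k < 0: upper and lower swap
```

For $k < 0$ the roles of the upper and lower sums are exchanged, which is why the difference $U(kf, P_n) - L(kf, P_n)$ still tends to zero.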

A proof for (i) can be constructed using similar methods and is requested in Exercise 7.4.5.

To prove (iii), observe that

\[ U(f, P) \ge \int_a^b f \ge L(f, P) \]

for any partition $P$. Statement (iii) follows if we take $P$ to be the trivial partition consisting of only the endpoints $a$ and $b$.

For (iv), let $h = g - f \ge 0$ and use (i) and (iii).

Because $-|f| \le f \le |f|$, statement (v) will follow from (iv) provided that we can show that $|f|$ is actually integrable. The proof of this fact is outlined in Exercise 7.4.1.

To this point, the quantity $\int_a^b f$ is only defined in the case where $a < b$.

Definition 7.4.3. If $f$ is integrable on the interval $[a, b]$, define

\[ \int_b^a f = -\int_a^b f. \]

Also, define

\[ \int_c^c f = 0. \]

Definition 7.4.3 is a natural convention to simplify the algebra of integrals. If $f$ is an integrable function on some interval $I$, then it is straightforward to verify that the equation

\[ \int_a^b f = \int_a^c f + \int_c^b f \]

from Theorem 7.4.1 remains valid for any three points $a$, $b$, and $c$ chosen in any order from $I$.
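To see the sign convention of Definition 7.4.3 at work, here is a small Python sketch that checks the additivity formula for three points taken in every possible order. Everything in it is an assumption made for illustration: the helper names `midpoint_sum` and `oriented_integral`, the integrand $f(x) = 3x + 1$, and the points $0$, $1$, and $5/2$. The integrand is linear so that a midpoint Riemann sum reproduces the integral exactly, which lets the comparison be made with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import permutations


def midpoint_sum(f, a, b, n=50):
    """Midpoint Riemann sum of f over [a, b], assuming a < b."""
    h = (b - a) / n
    return sum(f(a + (k + Fraction(1, 2)) * h) for k in range(n)) * h


def oriented_integral(f, a, b, n=50):
    """Integral from a to b, following the conventions of Definition 7.4.3."""
    if a == b:
        return Fraction(0)
    if a > b:
        return -midpoint_sum(f, b, a, n)
    return midpoint_sum(f, a, b, n)


def f(x):
    return 3 * x + 1  # linear, so the midpoint sum equals the integral exactly


points = (Fraction(0), Fraction(1), Fraction(5, 2))
checks = [
    oriented_integral(f, a, b)
    == oriented_integral(f, a, c) + oriented_integral(f, c, b)
    for a, b, c in permutations(points)
]
print(all(checks))
```

The check passes for all six orderings, including those where $c$ lies outside the interval between $a$ and $b$.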

Uniform Convergence and Integration

If $(f_n)$ is a sequence of integrable functions on $[a, b]$, and if $f_n \to f$, then we are inevitably going to want to know whether

\[ \int_a^b f_n \to \int_a^b f. \tag{2} \]

This is an archetypical instance of one of the major themes of analysis: When does a mathematical manipulation such as integration respect the limiting process? If the convergence is pointwise, then any number of things can go wrong. It is possible for each $f_n$ to be integrable but for the limit $f$ not to be integrable (Exercise 7.3.2). Even if the limit function $f$ is integrable, equation (2) may fail to hold. As an example of this, let

\[ f_n(x) = \begin{cases} n & \text{if } 0 < x < 1/n \\ 0 & \text{if } x = 0 \text{ or } x \ge 1/n. \end{cases} \]

Each $f_n$ has two discontinuities on $[0, 1]$ and so is integrable with $\int_0^1 f_n = 1$. For each $x \in [0, 1]$, we have $\lim f_n(x) = 0$ so that $f_n \to 0$ pointwise on $[0, 1]$. But now observe that the limit function $f = 0$ certainly integrates to 0, and

\[ 0 \ne \lim_{n \to \infty} \int_0^1 f_n. \]
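The behavior of this example can be tabulated directly. In the Python sketch below, the function names `f_n` and `integral_f_n` and the sample point $x = 1/10$ are choices made for illustration. The integral is computed as the area of the single rectangle under the graph, which is height $n$ times base $1/n$.

```python
from fractions import Fraction


def f_n(n, x):
    """The spike function of the example: n on (0, 1/n), and 0 elsewhere on [0, 1]."""
    return n if 0 < x < Fraction(1, n) else 0


def integral_f_n(n):
    """Exact integral of f_n over [0, 1]: height n times base length 1/n."""
    return n * Fraction(1, n)


x = Fraction(1, 10)
values_at_x = [f_n(n, x) for n in range(1, 16)]
integrals = [integral_f_n(n) for n in range(1, 16)]

print(values_at_x)  # eventually 0 once 1/n <= x
print(integrals)    # constant sequence of ones
```

At any fixed $x$ the values are eventually zero, while the integrals never move away from 1.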

As a final remark on what can go wrong in (2), we should point out that it is possible to modify this example to produce a situation where $\lim \int_0^1 f_n$ does not even exist. One way to resolve all of these problems is to add the assumption of uniform convergence.

Theorem 7.4.4. Assume that $f_n \to f$ uniformly on $[a, b]$ and that each $f_n$ is integrable. Then, $f$ is integrable and

\[ \lim_{n \to \infty} \int_a^b f_n = \int_a^b f. \]

Proof. The proof that f is integrable was requested as Exercise 7.2.5. The remainder of this argument is asked for in Exercise 7.4.3.
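Before attempting the proof, it may help to watch the theorem in action. The Python sketch below is a numerical illustration only, under assumptions chosen here: the sequence $f_n(x) = x + \sin(nx)/n$, which converges uniformly to $f(x) = x$ on $[0, 1]$ because $|f_n - f| \le 1/n$, and a hypothetical helper `midpoint_integral` that approximates integrals by midpoint Riemann sums. It compares the gap between the integrals with the quantity $\sup|f_n - f| \cdot (b - a)$, one natural bound suggested by parts (iii) and (v) of Theorem 7.4.2.

```python
import math


def midpoint_integral(f, a, b, steps=2000):
    """Approximate the integral of f over [a, b] by a midpoint Riemann sum."""
    h = (b - a) / steps
    return sum(f(a + (k + 0.5) * h) for k in range(steps)) * h


def f(x):
    return x


def f_n(n):
    """Return the n-th function of the sequence; |f_n(x) - f(x)| <= 1/n for all x."""
    return lambda x: x + math.sin(n * x) / n


a, b = 0.0, 1.0
limit_integral = midpoint_integral(f, a, b)

gaps = []
for n in (1, 10, 100):
    gap = abs(midpoint_integral(f_n(n), a, b) - limit_integral)
    bound = (1.0 / n) * (b - a)  # sup|f_n - f| times the length of the interval
    gaps.append((n, gap, bound))

for n, gap, bound in gaps:
    print(n, gap, bound)
```

In each row the gap between the integrals stays below the bound, and the bound shrinks to zero as $n$ grows.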

Exercises

Exercise 7.4.1. (a) Let $f$ be a bounded function on a set $A$, and set

\[ M = \sup\{f(x) : x \in A\}, \quad m = \inf\{f(x) : x \in A\}, \]

\[ M' = \sup\{|f(x)| : x \in A\}, \quad \text{and} \quad m' = \inf\{|f(x)| : x \in A\}. \]

Show that $M - m \ge M' - m'$.
(b) Show that if $f$ is integrable on the interval $[a, b]$, then $|f|$ is also integrable on this interval.
(c) Provide the details for the argument that in this case we have $\left| \int_a^b f \right| \le \int_a^b |f|$.

Exercise 7.4.2. Review Definition 7.4.3. Show that if $c \le a \le b$ and $f$ is integrable on the interval $[c, b]$, then it is still the case that $\int_a^b f = \int_a^c f + \int_c^b f$.

Exercise 7.4.3. Prove Theorem 7.4.4 including an argument for Exercise 7.2.5 if it is not already done.

Exercise 7.4.4. Decide which of the following conjectures is true and supply a short proof. For those that are not true, give a counterexample.
(a) If $|f|$ is integrable on $[a, b]$ then $f$ is also integrable on this set.
(b) Assume $g$ is integrable and $g \ge 0$ on $[a, b]$. If $g(x) > 0$ for an infinite number of points $x \in [a, b]$, then $\int g > 0$.
(c) If $g$ is continuous on $[a, b]$ and $g \ge 0$ with $g(x_0) > 0$ for at least one point $x_0 \in [a, b]$, then $\int_a^b g > 0$.
(d) If $\int_a^b f > 0$, there is an interval $[c, d] \subseteq [a, b]$ and a $\delta > 0$ such that $f(x) \ge \delta$ for all $x \in [c, d]$.

Exercise 7.4.5. Let $f$ and $g$ be integrable functions on $[a, b]$.
(a) Show that if $P$ is any partition of $[a, b]$, then

\[ U(f + g, P) \le U(f, P) + U(g, P). \]

Provide a specific example where the inequality is strict. What does the corresponding inequality for lower sums look like?
(b) Review the proof of Theorem 7.4.2 (ii), and provide an argument for part (i) of this theorem.

Exercise 7.4.6. Review the discussion immediately preceding Theorem 7.4.4.
(a) Produce an example of a sequence $f_n \to 0$ pointwise on $[0, 1]$ where $\lim_{n \to \infty} \int_0^1 f_n$ does not exist.
(b) Produce another example (if necessary) where $f_n \to 0$ and the sequence $\int_0^1 f_n$ is unbounded.
(c) Is it possible to construct each $f_n$ to be continuous in the examples of parts (a) and (b)?
(d) Does it seem possible to construct the sequence $(f_n)$ to be uniformly bounded? (Uniformly bounded means that there exists a single $M > 0$ satisfying $|f_n| \le M$ for all $n \in \mathbb{N}$.)

Exercise 7.4.7. Assume that $g_n$ and $g$ are bounded integrable functions with $g_n \to g$ on $[0, 1]$. The convergence is not uniform; however, the convergence is uniform on any set of the form $[\delta, 1]$ where $0 < \delta < 1$. Show that $\lim_{n \to \infty} \int_0^1 g_n = \int_0^1 g$.

7.5

The Fundamental Theorem of Calculus

The derivative and the integral have been independently deﬁned, each in its own rigorous mathematical terms. The deﬁnition of the derivative is motivated by the problem of ﬁnding tangent lines and is given in terms of functional limits of diﬀerence quotients. The deﬁnition of the integral grows out of the desire to describe areas under nonconstant functions and is given in terms of supremums and inﬁmums of ﬁnite sums. The Fundamental Theorem of Calculus reveals the remarkable inverse relationship between the two processes.

The result is stated in two parts. The first is a computational statement that describes how an antiderivative can be used to evaluate an integral over a particular interval. The second statement is more theoretical in nature, expressing the fact that every continuous function is the derivative of its indefinite integral.

Theorem 7.5.1 (Fundamental Theorem of Calculus). (i) If $f : [a, b] \to \mathbb{R}$ is integrable, and $F : [a, b] \to \mathbb{R}$ satisfies $F'(x) = f(x)$ for all $x \in [a, b]$, then

\[ \int_a^b f = F(b) - F(a). \]

(ii) Let $g : [a, b] \to \mathbb{R}$ be integrable, and define

\[ G(x) = \int_a^x g \]

for all $x \in [a, b]$. Then, $G$ is continuous on $[a, b]$. If $g$ is continuous at some point $c \in [a, b]$, then $G$ is differentiable at $c$ and $G'(c) = g(c)$.

Proof. (i) Let $P$ be a partition of $[a, b]$ and apply the Mean Value Theorem to $F$ on a typical subinterval $[x_{k-1}, x_k]$ of $P$. This yields a point $t_k \in (x_{k-1}, x_k)$ where

\[ F(x_k) - F(x_{k-1}) = F'(t_k)(x_k - x_{k-1}) = f(t_k)(x_k - x_{k-1}). \]

Now, consider the upper and lower sums $U(f, P)$ and $L(f, P)$. Because $m_k \le f(t_k) \le M_k$ (where $m_k$ is the infimum on $[x_{k-1}, x_k]$ and $M_k$ is the supremum), it follows that

\[ L(f, P) \le \sum_{k=1}^{n} \left[ F(x_k) - F(x_{k-1}) \right] \le U(f, P). \]

But notice that the sum in the middle telescopes so that

\[ \sum_{k=1}^{n} \left[ F(x_k) - F(x_{k-1}) \right] = F(b) - F(a), \]

which is independent of the partition $P$. Thus we have

\[ L(f) \le F(b) - F(a) \le U(f). \]

Because $L(f) = U(f) = \int_a^b f$, we conclude that $\int_a^b f = F(b) - F(a)$.

(ii) To prove the second statement, take $x, y \in [a, b]$ and observe that

\[ |G(x) - G(y)| = \left| \int_a^x g - \int_a^y g \right| = \left| \int_y^x g \right| \le \int_y^x |g| \le M|x - y|, \]

where $M > 0$ is a bound on $|g|$. This shows that $G$ is Lipschitz and so is uniformly continuous on $[a, b]$ (Exercise 4.4.9).

Now, let’s assume that $g$ is continuous at $c \in [a, b]$. In order to show that $G'(c) = g(c)$, we rewrite the limit for $G'(c)$ as

\[ \lim_{x \to c} \frac{G(x) - G(c)}{x - c} = \lim_{x \to c} \frac{1}{x - c} \left( \int_a^x g(t)\,dt - \int_a^c g(t)\,dt \right) = \lim_{x \to c} \frac{1}{x - c} \int_c^x g(t)\,dt. \]

We would like to show that this limit equals $g(c)$. Thus, given an $\epsilon > 0$, we must produce a $\delta > 0$ such that if $|x - c| < \delta$ then

\[ \left| \frac{1}{x - c} \int_c^x g(t)\,dt - g(c) \right| < \epsilon. \tag{1} \]

The assumption of continuity of $g$ gives us control over the difference $|g(t) - g(c)|$. In particular, we know that there exists a $\delta > 0$ such that

\[ |t - c| < \delta \quad \text{implies} \quad |g(t) - g(c)| < \epsilon. \]

To take advantage of this, we cleverly write the constant $g(c)$ as

\[ g(c) = \frac{1}{x - c} \int_c^x g(c)\,dt \]

and combine the two terms in equation (1) into a single integral. Keeping in mind that $|x - c| \ge |t - c|$, we have that for all $|x - c| < \delta$,

\[ \left| \frac{1}{x - c} \int_c^x g(t)\,dt - g(c) \right| = \left| \frac{1}{x - c} \int_c^x \left[ g(t) - g(c) \right] dt \right| \le \frac{1}{(x - c)} \int_c^x |g(t) - g(c)|\,dt < \frac{1}{(x - c)} \int_c^x \epsilon\,dt = \epsilon. \]
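The last estimate says that the average value of $g$ over the interval from $c$ to $x$ approaches $g(c)$ as $x \to c$. A minimal numerical sketch of this, assuming $g = \cos$, $c = 1$, and midpoint Riemann sums in place of the integral (the helper names `midpoint_integral` and `average_value` are hypothetical), might look as follows.

```python
import math


def midpoint_integral(g, a, b, steps=1000):
    """Approximate the integral of g over [a, b] by a midpoint Riemann sum."""
    h = (b - a) / steps
    return sum(g(a + (k + 0.5) * h) for k in range(steps)) * h


def average_value(g, c, x):
    """(1 / (x - c)) times the integral of g from c to x.

    This is the difference quotient (G(x) - G(c)) / (x - c) from the proof.
    """
    return midpoint_integral(g, c, x) / (x - c)


g = math.cos
c = 1.0
errors = [abs(average_value(g, c, c + h) - g(c)) for h in (0.1, 0.01, 0.001)]
print(errors)  # the gap shrinks as x = c + h approaches c
```

Only continuity of $g$ at the single point $c$ is used in the proof; the smooth choice of $g$ here simply keeps the sketch short.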

Exercises

Exercise 7.5.1. We have seen that not every derivative is continuous, but explain how we at least know that every continuous function is a derivative.

Exercise 7.5.2. (a) Let $f(x) = |x|$ and define $F(x) = \int_{-1}^{x} f$. Find a formula for $F(x)$ for all $x$. Where is $F$ continuous? Where is $F$ differentiable? Where does $F'(x) = f(x)$?
(b) Repeat part (a) for the function

\[ f(x) = \begin{cases} 1 & \text{if } x < 0 \\ 2 & \text{if } x \ge 0. \end{cases} \]

Exercise 7.5.3. The hypothesis in Theorem 7.5.1 (i) that $F'(x) = f(x)$ for all $x \in [a, b]$ is slightly stronger than it needs to be. Carefully read the proof and state exactly what needs to be assumed with regard to the relationship between $f$ and $F$ for the proof to be valid.

Exercise 7.5.4 (Natural Logarithm). Let

\[ H(x) = \int_1^x \frac{1}{t}\,dt, \]

where we consider only $x > 0$.
(a) What is $H(1)$? Find $H'(x)$.
(b) Show that $H$ is strictly increasing; that is, show that if $0 < x < y$, then $H(x) < H(y)$.
(c) Show that $H(cx) = H(c) + H(x)$. (Think of $c$ as a constant and differentiate $g(x) = H(cx)$.)

Exercise 7.5.5. The Fundamental Theorem of Calculus can be used to supply a shorter argument for Theorem 6.3.1 under the additional assumption that the sequence of derivatives is continuous. Assume $f_n \to f$ pointwise and $f_n' \to g$ uniformly on $[a, b]$. Assuming each $f_n'$ is continuous, we can apply Theorem 7.5.1 (i) to get

\[ \int_a^x f_n' = f_n(x) - f_n(a) \]

for all $x \in [a, b]$. Show that $g(x) = f'(x)$.

Exercise 7.5.6. Use part (ii) of Theorem 7.5.1 to construct another proof of part (i) of Theorem 7.5.1 using the following strategy. Given $f$ and $F$ as in part (i), set $G(x) = \int_a^x f$. What is the relationship between $F$ and $G$?

Exercise 7.5.7 (Average Value). If $g$ is continuous on $[a, b]$, show that there exists a point $c \in (a, b)$ where

\[ g(c) = \frac{1}{b - a} \int_a^b g. \]

Exercise 7.5.8. Given a function $f$ on $[a, b]$, define the total variation of $f$ to be

\[ Vf = \sup \left\{ \sum_{k=1}^{n} |f(x_k) - f(x_{k-1})| \right\}, \]

where the supremum is taken over all partitions $P$ of $[a, b]$.
(a) If $f$ is continuously differentiable ($f'$ exists as a continuous function), use the Fundamental Theorem of Calculus to show $Vf \le \int_a^b |f'|$.
(b) Use the Mean Value Theorem to establish the reverse inequality and conclude that $Vf = \int_a^b |f'|$.

Exercise 7.5.9. Let

\[ h(x) = \begin{cases} 1 & \text{if } x < 1 \text{ or } x > 1 \\ 0 & \text{if } x = 1, \end{cases} \]

and define $H(x) = \int_0^x h$. Show that even though $h$ is not continuous at $x = 1$, $H(x)$ is still differentiable at $x = 1$.

Exercise 7.5.10. Assume $f$ is integrable on $[a, b]$ and has a “jump discontinuity” at $c \in (a, b)$. This means that both one-sided limits exist as $x$ approaches $c$ from the left and from the right, but that

\[ \lim_{x \to c^-} f(x) \ne \lim_{x \to c^+} f(x). \]

(This phenomenon is discussed in more detail in Section 4.6.) Show that $F(x) = \int_a^x f$ is not differentiable at $x = c$.

Exercise 7.5.11. The Epilogue to Chapter 5 mentions the existence of a continuous monotone function that fails to be differentiable on a dense subset of $\mathbb{R}$. Combine the results of Exercise 7.5.10 and Exercise 6.4.8 to show how to construct such a function.

7.6

Lebesgue’s Criterion for Riemann Integrability

We now return to our investigation of the relationship between continuity and the Riemann integral. We have proved that continuous functions are integrable and that the integral also exists for functions with only a ﬁnite number of discontinuities. At the opposite end of the spectrum, we saw that Dirichlet’s function, which is discontinuous at every point on [0, 1], fails to be Riemann-integrable. The next examples show that the set of discontinuities of an integrable function can be inﬁnite and even uncountable.

Riemann-integrable Functions with Infinite Discontinuities

Recall from Section 4.1 that Thomae’s function

\[ t(x) = \begin{cases} 1 & \text{if } x = 0 \\ 1/n & \text{if } x = m/n \in \mathbb{Q} \setminus \{0\} \text{ is in lowest terms with } n > 0 \\ 0 & \text{if } x \notin \mathbb{Q} \end{cases} \]

is continuous on the set of irrationals and has discontinuities at every rational point. Let’s prove that Thomae’s function is integrable on $[0, 1]$ with $\int_0^1 t = 0$. Let $\epsilon > 0$. The strategy, as usual, is to construct a partition $P$ of $[0, 1]$ for which $U(t, P) - L(t, P) < \epsilon$.
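To get a feel for the upper sums of Thomae’s function, the following Python sketch computes $U(t, P)$ exactly on uniform partitions of $[0, 1]$. It is an illustration under assumptions made here, not a step of the argument: the helper names `sup_thomae` and `upper_sum_thomae` are hypothetical, and the sketch uses the observation that on a closed interval with rational endpoints the supremum of $t$ equals $1/q$, where $q$ is the smallest denominator of any rational number in the interval (a fraction with the smallest possible denominator is automatically in lowest terms).

```python
from fractions import Fraction
from math import ceil, floor


def sup_thomae(left, right):
    """Supremum of Thomae's function on [left, right], rational endpoints, left < right."""
    q = 1
    # Search for the smallest q such that some integer p satisfies left <= p/q <= right.
    while ceil(left * q) > floor(right * q):
        q += 1
    return Fraction(1, q)


def upper_sum_thomae(num_subintervals):
    """Upper sum of Thomae's function on the uniform partition of [0, 1]."""
    N = num_subintervals
    total = Fraction(0)
    for k in range(N):
        left, right = Fraction(k, N), Fraction(k + 1, N)
        total += sup_thomae(left, right) * (right - left)
    return total


upper_sums = {N: upper_sum_thomae(N) for N in (4, 16, 64, 256)}
for N, value in upper_sums.items():
    print(N, float(value))
```

With four subintervals the upper sum is $3/4$, and because each partition in the list refines the previous one, the values cannot increase. Uniform partitions are a crude choice here; the partition constructed in the exercise that follows is tailored to the given $\epsilon$.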

Exercise 7.6.1. (a) First, argue that $L(t, P) = 0$ for any partition $P$ of $[0, 1]$.
(b) Consider the set of points $D_{\epsilon/2} = \{x : t(x) \ge \epsilon/2\}$. How big is $D_{\epsilon/2}$?
(c) To complete the argument, explain how to construct a partition $P$ of $[0, 1]$ so that $U(t, P) < \epsilon$.

We first met the Cantor set $C$ in Section 3.1. We have since learned that $C$ is a compact, uncountable subset of the interval $[0, 1]$. The request of Exercise 4.3.12 is to prove that the function

\[ g(x) = \begin{cases} 1 & \text{if } x \in C \\ 0 & \text{if } x \notin C \end{cases} \]

is continuous at every point of the complement of $C$ and has discontinuities at each point of $C$. Thus, $g$ is not continuous on an uncountably infinite set.

Exercise 7.6.2. Using the fact that $C = \bigcap_{n=0}^{\infty} C_n$, where each $C_n$ consists of a finite union of closed intervals, argue that $g$ is Riemann-integrable on $[0, 1]$.

Sets of Measure Zero

Thomae’s function fails to be continuous at each rational number in $[0, 1]$. Although this set is infinite, we have seen that any subset of $\mathbb{Q}$ is countable. Countably infinite sets are the smallest type of infinite set. The Cantor set is uncountable, but it is also small in a sense that we are now ready to make precise. In the introduction to Chapter 3, we presented an argument that the Cantor set has zero “length.” The term “length” is awkward here because it really should only be applied to intervals or unions of intervals, which the Cantor set is not. There is a generalization of the concept of length to more general sets called the measure of a set. Of interest to our discussion are subsets that have measure zero.

Definition 7.6.1. A set $A \subseteq \mathbb{R}$ has measure zero if, for all $\epsilon > 0$, there exists a countable collection of open intervals $O_n$ with the property that $A$ is contained in the union of all of the intervals $O_n$ and the sum of the lengths of all of the intervals is less than or equal to $\epsilon$. More precisely, if $|O_n|$ refers to the length of the interval $O_n$, then we have

\[ A \subseteq \bigcup_{n=1}^{\infty} O_n \quad \text{and} \quad \sum_{n=1}^{\infty} |O_n| \le \epsilon. \]

Example 7.6.2. Consider a finite set $A = \{a_1, a_2, \ldots, a_N\}$. To show that $A$ has measure zero, let $\epsilon > 0$ be arbitrary. For each $1 \le n \le N$, construct the interval

\[ G_n = \left( a_n - \frac{\epsilon}{2N},\; a_n + \frac{\epsilon}{2N} \right). \]

Clearly, $A$ is contained in the union of these intervals, and

\[ \sum_{n=1}^{N} |G_n| = \sum_{n=1}^{N} \frac{\epsilon}{N} = \epsilon. \]
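The construction in Example 7.6.2 is easy to carry out explicitly. The Python sketch below builds the intervals $G_n$ for a sample finite set and confirms that they cover the set with total length exactly $\epsilon$. The function name `cover_finite_set`, the four sample points, and the value $\epsilon = 1/100$ are assumptions chosen for illustration.

```python
from fractions import Fraction


def cover_finite_set(points, eps):
    """Open intervals (a - eps/(2N), a + eps/(2N)), one around each of the N points."""
    N = len(points)
    half_width = eps / (2 * N)
    return [(a - half_width, a + half_width) for a in points]


A = [Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(7, 8)]
eps = Fraction(1, 100)

intervals = cover_finite_set(A, eps)
total_length = sum(right - left for left, right in intervals)
covered = all(left < a < right for a, (left, right) in zip(A, intervals))

print(total_length == eps, covered)
```

Each interval has length $\epsilon / N$, so the $N$ lengths add up to $\epsilon$ no matter how the points are arranged.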

Exercise 7.6.3. Show that any countable set has measure zero. Exercise 7.6.4. Prove that the Cantor set (which is uncountable) has measure zero. Exercise 7.6.5. Show that if two sets A and B each have measure zero, then A ∪ B has measure zero as well. In addition, discuss the proof of the stronger statement that the countable union of sets of measure zero also has measure zero. (This second statement is true, but a completely rigorous proof requires a result about double summations discussed in Section 2.8.)

α-Continuity

Definition 7.6.3. Let $f$ be defined on $[a, b]$, and let $\alpha > 0$. The function $f$ is α-continuous at $x \in [a, b]$ if there exists $\delta > 0$ such that for all $y, z \in (x - \delta, x + \delta)$ it follows that $|f(y) - f(z)| < \alpha$.

Let $f$ be a bounded function on $[a, b]$. For each $\alpha > 0$, define $D_\alpha$ to be the set of points in $[a, b]$ where the function $f$ fails to be α-continuous; that is,

\[ D_\alpha = \{x \in [a, b] : f \text{ is not } \alpha\text{-continuous at } x\}. \tag{1} \]

The concept of α-continuity was previously introduced in Section 4.6. Several of the ensuing exercises appeared as exercises in that section as well.

Exercise 7.6.6. If $\alpha_1 < \alpha_2$, show that $D_{\alpha_2} \subseteq D_{\alpha_1}$.

Now, let

\[ D = \{x \in [a, b] : f \text{ is not continuous at } x\}. \tag{2} \]

Exercise 7.6.7. (a) Let $\alpha > 0$ be given. Show that if $f$ is continuous at $x \in [a, b]$, then it is α-continuous at $x$ as well. Explain how it follows that $D_\alpha \subseteq D$.
(b) Show that if $f$ is not continuous at $x$, then $f$ is not α-continuous for some $\alpha > 0$. Now, explain why this guarantees that

\[ D = \bigcup_{n=1}^{\infty} D_{1/n}. \]

Exercise 7.6.8. Prove that for a fixed $\alpha > 0$, the set $D_\alpha$ is closed.

Exercise 7.6.9. By imitating the proof of Theorem 4.4.8, show that if, for some fixed $\alpha > 0$, $f$ is α-continuous at every point on some compact set $K$, then $f$ is uniformly α-continuous on $K$. By uniformly α-continuous, we mean that there exists a $\delta > 0$ such that whenever $x$ and $y$ are points in $K$ satisfying $|x - y| < \delta$, it follows that $|f(x) - f(y)| < \alpha$.

Compactness Revisited

Compactness of subsets of the real line can be described in three equivalent ways. The following theorem appears toward the end of Section 3.3.

Theorem 7.6.4. Let $K \subseteq \mathbb{R}$. The following three statements are all equivalent, in the sense that if any one is true, then so are the two others.
(i) Every sequence contained in $K$ has a convergent subsequence that converges to a limit in $K$.
(ii) $K$ is closed and bounded.
(iii) Given a collection of open intervals $\{G_\alpha : \alpha \in \Lambda\}$ that covers $K$; that is, $K \subseteq \bigcup_{\alpha \in \Lambda} G_\alpha$, there exists a finite subcollection $\{G_{\alpha_1}, G_{\alpha_2}, G_{\alpha_3}, \ldots, G_{\alpha_N}\}$ of the original set that also covers $K$.

The equivalence of (i) and (ii) has been used throughout the core material in the text. Characterization (iii) has been less central but is essential to the upcoming argument. So that the material in this section is self-contained, we quickly outline a proof that (i) and (ii) imply (iii). (This also appears as Exercise 3.3.8.)

Proof. Assume $K$ satisfies (i) and (ii), and let $\{G_\alpha : \alpha \in \Lambda\}$ be an open cover of $K$. For contradiction, let’s assume that no finite subcover exists. Let $I_0$ be a closed interval containing $K$, and then bisect $I_0$ into two closed intervals $A_1$ and $B_1$. It must be that either $A_1 \cap K$ or $B_1 \cap K$ (or both) has no finite subcover consisting of sets from $\{G_\alpha : \alpha \in \Lambda\}$. Let $I_1$ be a half of $I_0$ containing a part of $K$ that cannot be finitely covered. Repeating this construction results in a nested sequence of closed intervals $I_0 \supseteq I_1 \supseteq I_2 \supseteq \cdots$ with the property that, for any $n$, $I_n \cap K$ cannot be finitely covered and $\lim_n |I_n| = 0$.

Exercise 7.6.10. (a) Show that there exists an $x \in K$ such that $x \in I_n$ for all $n$.
(b) Because $x \in K$, there must exist an open set $G_{\alpha_0}$ from the original collection that contains $x$ as an element. Explain why this furnishes us with the desired contradiction.

Lebesgue’s Theorem

We are now prepared to completely categorize the collection of Riemann-integrable functions in terms of continuity.

Theorem 7.6.5 (Lebesgue’s Theorem). Let $f$ be a bounded function defined on the interval $[a, b]$. Then, $f$ is Riemann-integrable if and only if the set of points where $f$ is not continuous has measure zero.

Proof. Let $M > 0$ satisfy $|f(x)| \le M$ for all $x \in [a, b]$, and let $D$ and $D_\alpha$ be defined as in the preceding equations (1) and (2). Let’s first assume that $D$ has measure zero and prove that our function is integrable.

($\Leftarrow$) Set

\[ \alpha = \frac{\epsilon}{2(b - a)}. \]

Exercise 7.6.11. Show that there exists a finite collection of disjoint open intervals $\{G_1, G_2, \ldots, G_N\}$ whose union contains $D_\alpha$ and that satisfies

\[ \sum_{n=1}^{N} |G_n| < \frac{\epsilon}{4M}. \]

Exercise 7.6.12. Let $K$ be what remains of the interval $[a, b]$ after the open intervals $G_n$ are all removed; that is, $K = [a, b] \setminus \bigcup_{n=1}^{N} G_n$. Argue that $f$ is uniformly α-continuous on $K$.

Exercise 7.6.13. Finish the proof in this direction by explaining how to construct a partition $P$ of $[a, b]$ such that $U(f, P) - L(f, P) \le \epsilon$. It will be helpful to break the sum

\[ U(f, P) - L(f, P) = \sum_{k=1}^{n} (M_k - m_k)\,\Delta x_k \]

into two parts, one over those subintervals that contain points of $D_\alpha$ and the other over subintervals that do not.

($\Rightarrow$) For the other direction, assume $f$ is Riemann-integrable. We must argue that the set $D$ of discontinuities of $f$ has measure zero. Fix $\alpha > 0$, and let $\epsilon > 0$ be arbitrary. Because $f$ is Riemann-integrable, there exists a partition $P$ of $[a, b]$ such that $U(f, P) - L(f, P) < \alpha\epsilon$.

Exercise 7.6.14. (a) Use the subintervals of the partition $P$ to prove that $D_\alpha$ has measure zero. Point out that it is possible to choose a cover for $D_\alpha$ that consists of a finite number of open intervals. (Sets for which this is possible are sometimes called content zero. See Exercise 7.3.6.)
(b) Show how this implies that $D$ has measure zero.

A Nonintegrable Derivative

To this point, our one example of a nonintegrable function is Dirichlet’s nowhere-continuous function. We close this section with another example that has special significance. The content of the Fundamental Theorem of Calculus is that integration and differentiation are inverse processes of each other. This led us to ask (in the final paragraph of the discussion in Section 7.1) whether we could integrate every derivative. For the Riemann integral, the answer is a resounding no. What follows is the construction of a differentiable function whose derivative cannot be integrated with the Riemann integral.

We will once again be interested in the Cantor set

\[ C = \bigcap_{n=0}^{\infty} C_n, \]

defined in Section 3.1. As an initial step, let’s create a function $f(x)$ that is differentiable on $[0, 1]$ and whose derivative $f'(x)$ has discontinuities at every point of $C$. The key ingredient for this construction is the function

\[ g(x) = \begin{cases} x^2 \sin(1/x) & \text{if } x > 0 \\ 0 & \text{if } x \le 0. \end{cases} \]

Figure 7.3: A preliminary sketch of $f_1(x)$.

Exercise 7.6.15. (a) Find $g'(0)$.
(b) Use the standard rules of differentiation to compute $g'(x)$ for $x \ne 0$.
(c) Explain why, for every $\delta > 0$, $g'(x)$ attains every value between 1 and $-1$ as $x$ ranges over the set $(-\delta, \delta)$. Conclude that $g'$ is not continuous at $x = 0$.

Now, we want to transport the behavior of $g$ around zero to each of the endpoints of the closed intervals that make up the sets $C_n$ used in the definition of the Cantor set. The formulas are awkward but the basic idea is straightforward. Start by setting

\[ f_0(x) = 0 \quad \text{on } C_0 = [0, 1]. \]

To define $f_1$ on $[0, 1]$, first assign

\[ f_1(x) = 0 \quad \text{for all} \quad x \in C_1 = \left[ 0, \tfrac{1}{3} \right] \cup \left[ \tfrac{2}{3}, 1 \right]. \]

Figure 7.4: A graph of $f_2(x)$.

In the remaining open middle third, put translated “copies” of $g$ oscillating toward the two endpoints (Fig. 7.3). In terms of a formula, we have

\[ f_1(x) = \begin{cases} 0 & \text{if } x \in [0, 1/3] \\ g(x - 1/3) & \text{if } x \text{ is just to the right of } 1/3 \\ g(-x + 2/3) & \text{if } x \text{ is just to the left of } 2/3 \\ 0 & \text{if } x \in [2/3, 1]. \end{cases} \]

Finally, we splice the two oscillating pieces of $f_1$ together in such a way that makes $f_1$ differentiable. This is no great feat, and we will skip the details so as to keep our attention focused on the two endpoints 1/3 and 2/3. These are the points where $f_1'(x)$ fails to be continuous.

To define $f_2(x)$, we start with $f_1(x)$ and do the same trick as before, this time in the two open intervals $(1/9, 2/9)$ and $(7/9, 8/9)$. The result (Fig. 7.4) is a differentiable function that is zero on $C_2$ and has a derivative that is not continuous on the set

\[ \{1/9,\ 2/9,\ 1/3,\ 2/3,\ 7/9,\ 8/9\}. \]

Continuing in this fashion yields a sequence of functions $f_0, f_1, f_2, \ldots$ defined on $[0, 1]$.

Exercise 7.6.16. (a) If $c \in C$, what is $\lim_{n \to \infty} f_n(c)$?
(b) Why does $\lim_{n \to \infty} f_n(x)$ exist for $x \notin C$?

Now, set

\[ f(x) = \lim_{n \to \infty} f_n(x). \]

Exercise 7.6.17. (a) Explain why $f'(x)$ exists for all $x \notin C$.
(b) If $c \in C$, argue that $|f(x)| \le (x - c)^2$ for all $x \in [0, 1]$. Show how this implies $f'(c) = 0$.
(c) Give a careful argument for why $f'(x)$ fails to be continuous on $C$. Remember that $C$ contains many points besides the endpoints of the intervals that make up $C_1, C_2, C_3, \ldots$.

Let’s take inventory of the situation. Our goal is to create a nonintegrable derivative. Our function $f(x)$ is differentiable, and $f'$ fails to be continuous on $C$. We are not quite done.

Exercise 7.6.18. Why is $f'(x)$ Riemann-integrable on $[0, 1]$?

The reason the Cantor set has measure zero is that, at each stage, $2^{n-1}$ open intervals of length $1/3^n$ are removed from $C_{n-1}$. The resulting sum

\[ \sum_{n=1}^{\infty} 2^{n-1} \left( \frac{1}{3^n} \right) \]

converges to one, which means that the approximating sets $C_1, C_2, C_3, \ldots$ have total lengths tending to zero. Instead of removing open intervals of length $1/3^n$ at each stage, let’s see what happens when we remove intervals of length $1/3^{n+1}$.

Exercise 7.6.19. Show that, under these circumstances, the sum of the lengths of the intervals making up each $C_n$ no longer tends to zero as $n \to \infty$. What is this limit?

If we again take the intersection $\bigcap_{n=0}^{\infty} C_n$, the result is a Cantor-type set with the same topological properties: it is closed, compact and perfect. But a consequence of the previous exercise is that it no longer has measure zero. This is just what we need to define our desired function. By repeating the preceding construction of $f(x)$ on this new Cantor-type set of positive measure, we get a differentiable function whose derivative has too many points of discontinuity. By Lebesgue’s Theorem, this derivative cannot be integrated using the Riemann integral.
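The length computation for the standard Cantor set mentioned in this section can be checked with exact arithmetic. The Python sketch below sums the lengths removed through the first $m$ stages, namely $2^{n-1}$ intervals of length $1/3^n$ at stage $n$, and compares what is left of $[0, 1]$ with $(2/3)^m$. The function name `removed_length` and the cutoff of ten stages are assumptions made for illustration; the modified construction of Exercise 7.6.19 is deliberately left alone.

```python
from fractions import Fraction


def removed_length(stages):
    """Total length removed from [0, 1] in the first `stages` steps of the
    standard Cantor construction: 2^(n-1) intervals of length 1/3^n at step n."""
    return sum(Fraction(2 ** (n - 1), 3 ** n) for n in range(1, stages + 1))


# Total length of C_m is what remains of [0, 1] after m stages.
remaining = [1 - removed_length(m) for m in range(0, 11)]

for m, length in enumerate(remaining):
    print(m, length, float(length))
```

The remaining length after $m$ stages equals $(2/3)^m$ exactly, which tends to zero, in agreement with the statement that the removed lengths sum to one.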

7.7

Epilogue

Riemann’s definition of the integral was a modification of Cauchy’s integral, which was originally designed for the purpose of integrating continuous functions. In this goal, the Riemann integral was a complete success. For continuous functions at least, the process of integration now stood on its own rigorous footing, defined independently of differentiation. As analysis progressed, however, the dependence of integrability on continuity became problematic. The last example of Section 7.6 highlights one type of weakness: not every derivative can be integrated. Another limitation of the Riemann integral arises in association with limits of sequences of functions. To get a sense of this, let’s once again consider Dirichlet’s function $g(x)$ introduced in Section 4.1. Recall that $g(x) = 1$ whenever $x$ is rational, and $g(x) = 0$ at every irrational point. Focusing on the interval $[0, 1]$ for a moment, let

\[ \{r_1, r_2, r_3, r_4, \ldots\} \]

be an enumeration of the countable number of rational points in this interval. Now, let $g_1(x) = 1$ if $x = r_1$ and define $g_1(x) = 0$ otherwise. Next, define $g_2(x) = 1$ if $x$ is either $r_1$ or $r_2$, and let $g_2(x) = 0$ at all other points. In general, for each $n \in \mathbb{N}$, define

\[ g_n(x) = \begin{cases} 1 & \text{if } x \in \{r_1, r_2, \ldots, r_n\} \\ 0 & \text{otherwise.} \end{cases} \]

Notice that each $g_n$ has only a finite number of discontinuities and so is Riemann-integrable with $\int_0^1 g_n = 0$. But we also have $g_n \to g$ pointwise on the interval $[0, 1]$. The problem arises when we remember that Dirichlet’s nowhere-continuous function is not Riemann-integrable. Thus, the equation

\[ \lim_{n \to \infty} \int_0^1 g_n = \int_0^1 g \tag{1} \]

fails to hold, not because the values on each side of the equal sign are different but because the value on the right-hand side does not exist. The content of Theorem 7.4.4 is that this equation does hold whenever we have $g_n \to g$ uniformly. This is a reasonable way to resolve the situation, but it is a bit unsatisfying because the deficiency in this case is not entirely with the type of convergence but lies in the strength of the Riemann integral. If we could make sense of the right-hand side via some other definition of integration, then maybe equation (1) would actually be true.

Such a definition was introduced by Henri Lebesgue in 1901. Generally speaking, Lebesgue’s integral is constructed using a generalization of length called the measure of a set. In the previous section, we studied sets of measure zero. In particular, we showed that the rational numbers in $[0, 1]$ (because they are countable) have measure zero. The irrational numbers in $[0, 1]$ have measure one. This should not be too surprising because we now have that the measures of these two disjoint sets add up to the length of the interval $[0, 1]$. Rather than chopping up the $x$-axis to approximate the area under the curve, Lebesgue suggested partitioning the $y$-axis. In the case of Dirichlet’s function $g$, there are only two range values, zero and one. The integral, according to Lebesgue, could be defined via

\[ \int_0^1 g = 1 \cdot [\text{measure of set where } g = 1] + 0 \cdot [\text{measure of set where } g = 0] = 1 \cdot 0 + 0 \cdot 1 = 0. \]

With this interpretation of $\int_0^1 g$, equation (1) is now valid!

The Lebesgue integral is presently the standard integral in advanced mathematics. The theory is taught to all graduate students, as well as to many advanced undergraduates, and it is the integral used in most research papers where integration is required. The Lebesgue integral generalizes the Riemann integral in the sense that any function that is Riemann-integrable is Lebesgue-integrable and integrates to the same value. The real strength of the Lebesgue integral is that the class of integrable functions is much larger. Most importantly, this class includes the limits of different types of Cauchy sequences of integrable functions. This leads to a group of extremely important convergence theorems related to equation (1) with hypotheses much weaker than the uniform convergence assumed in Theorem 7.4.4.

Despite its prevalence, the Lebesgue integral does have a few drawbacks. There are functions whose improper Riemann integrals exist but that are not Lebesgue-integrable. Another disappointment arises from the relationship between integration and differentiation. Even with the Lebesgue integral, it is still not possible to prove

\[ \int_a^b f' = f(b) - f(a) \]

without some additional assumptions on $f$. Around 1960, a new integral was proposed that can integrate a larger class of functions than either the Riemann integral or the Lebesgue integral and suffers from neither of the preceding weaknesses. Remarkably, this integral is actually a return to Riemann’s original technique for defining integration, with some small modifications in how we describe the “fineness” of the partitions. An introduction to the generalized Riemann integral is the topic of Section 8.1.

Chapter 8

Additional Topics

The foundation in analysis provided by the first seven chapters is sufficient background for the exploration of some advanced and historically important topics. The writing in this chapter is similar to that in the concluding project sections of each individual chapter. Exercises are included within the exposition and are designed to make each section a narrative investigation into a significant achievement in the field of analysis.

8.1 The Generalized Riemann Integral

Chapter 7 concluded with Henri Lebesgue's elegant result that a bounded function is Riemann-integrable if and only if its points of discontinuity form a set of measure zero. To eliminate the dependence of integrability on continuity, Lebesgue proposed a new method of integration that has become the standard integral in mathematics. In the Epilogue to Chapter 7, we briefly outlined some of the strengths and weaknesses of the Lebesgue integral, concluding with a look back to the Fundamental Theorem of Calculus (Theorem 7.5.1). (Lebesgue's measure-zero criterion is not a prerequisite for understanding the material in this section, but the discussion in Section 7.7 provides some useful context for what follows.) If $F$ is a differentiable function on $[a, b]$, then in a perfect world we might hope to prove that

\[ \int_a^b F' = F(b) - F(a). \tag{1} \]

Notice that although this is the conclusion of part (i) of Theorem 7.5.1, there we needed the additional requirement that $F'$ be Riemann-integrable. To drive this point home, Section 7.6 concluded with an example of a function that has a derivative that the Riemann integral cannot handle. The Lebesgue integral alluded to earlier is a significant improvement. It can integrate our example from Section 7.6, but ultimately it too suffers from the same setback. Not every derivative is Lebesgue-integrable. What follows is a short introduction to the generalized Riemann integral, discovered independently around 1960 by Jaroslav Kurzweil and Ralph Henstock. As mentioned in Section 7.7, this lesser-known modification of the Riemann integral can actually integrate a larger class of functions than Lebesgue's ubiquitous integral and yields a surprisingly simple proof of equation (1) above with no additional hypotheses.

The Riemann Integral as a Limit

Let

\[ P = \{a = x_0 < x_1 < x_2 < \cdots < x_n = b\} \]

be a partition of $[a, b]$. A tagged partition is one where in addition to $P$ we have chosen points $c_k$ in each of the subintervals $[x_{k-1}, x_k]$. This sets the stage for the concept of a Riemann sum. Given a function $f : [a, b] \to \mathbf{R}$ and a tagged partition $(P, \{c_k\}_{k=1}^n)$, the Riemann sum generated by this partition is given by

\[ R(f, P) = \sum_{k=1}^n f(c_k)(x_k - x_{k-1}). \]

Looking back at the definition of the upper sum

\[ U(f, P) = \sum_{k=1}^n M_k (x_k - x_{k-1}) \quad \text{where} \quad M_k = \sup\{f(x) : x \in [x_{k-1}, x_k]\}, \]

and the lower sum

\[ L(f, P) = \sum_{k=1}^n m_k (x_k - x_{k-1}) \quad \text{where} \quad m_k = \inf\{f(x) : x \in [x_{k-1}, x_k]\}, \]

it should be clear that

\[ L(f, P) \le R(f, P) \le U(f, P) \]

for any bounded function $f$. In Definition 7.2.7, we characterized integrability by insisting that the infimum of the upper sums equal the supremum of the lower sums. Any Riemann sum is going to fall between a particular upper and lower sum. If the upper and lower sums are converging on some common value, then the Riemann sums are also eventually close to this value. The next theorem shows that it is possible to characterize Riemann integrability in a way equivalent to Definition 7.2.7 using an $\varepsilon$–$\delta$-type definition applied to Riemann sums.

Definition 8.1.1. Let $\delta > 0$. A partition $P$ is $\delta$-fine if every subinterval $[x_{k-1}, x_k]$ satisfies $x_k - x_{k-1} < \delta$. In other words, every subinterval has width less than $\delta$.
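The sandwich $L(f, P) \le R(f, P) \le U(f, P)$ and the notion of a $\delta$-fine partition can be tried out on a small example. The following is a minimal sketch, not part of the original text: the function $f(x) = x^2$, the ten equal subintervals, and the helper names `riemann_sum` and `is_delta_fine` are all illustrative assumptions. Because this $f$ is increasing on $[0, 1]$, the upper and lower sums are themselves Riemann sums with right-endpoint and left-endpoint tags.

```python
from fractions import Fraction

def riemann_sum(f, points, tags):
    """R(f, P): sum of f(c_k) * (x_k - x_{k-1}) over the tagged partition."""
    return sum(f(c) * (hi - lo) for lo, hi, c in zip(points, points[1:], tags))

def is_delta_fine(points, delta):
    """Definition 8.1.1: every subinterval has width strictly less than delta."""
    return all(hi - lo < delta for lo, hi in zip(points, points[1:]))

# Illustrative choice (an assumption, not from the text): f(x) = x^2 on [0, 1].
f = lambda x: x * x
n = 10
points = [Fraction(k, n) for k in range(n + 1)]

# f is increasing here, so sup/inf on each subinterval sit at the right/left endpoint.
upper = riemann_sum(f, points, points[1:])     # U(f, P)
lower = riemann_sum(f, points, points[:-1])    # L(f, P)
mid = riemann_sum(f, points, [(lo + hi) / 2 for lo, hi in zip(points, points[1:])])

print(lower, mid, upper)
print(is_delta_fine(points, Fraction(1, 5)), is_delta_fine(points, Fraction(1, 10)))
```

Exact rational arithmetic is used so that the strict inequality in Definition 8.1.1 is tested without rounding: the partition above has widths exactly $1/10$, so it is $\delta$-fine for $\delta = 1/5$ but not for $\delta = 1/10$.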


Theorem 8.1.2. A bounded function $f : [a, b] \to \mathbf{R}$ is Riemann-integrable with

\[ \int_a^b f = A \]

if and only if, for every $\varepsilon > 0$, there exists a $\delta > 0$ such that, for any tagged partition $(P, \{c_k\})$ that is $\delta$-fine, it follows that $|R(f, P) - A| < \varepsilon$.

Before attempting the proof, we should point out that, in some treatments, the criterion in Theorem 8.1.2 is actually taken as the definition of Riemann integrability. In fact, this is how Riemann originally defined the concept. The spirit of this theorem is close to what is taught in most introductory calculus courses. To approximate the area under the curve, Riemann sums are constructed. The hope is that as the partitions become finer, the corresponding approximations get closer to the value of the integral. The content of Theorem 8.1.2 is that if the function is integrable, then these approximations do indeed converge to the value of the integral, regardless of how the tags are chosen. Conversely, if the approximating Riemann sums for finer and finer partitions collect around some value $A$, then the function is integrable and integrates to $A$.

Proof. ($\Rightarrow$) For the forward direction, we begin with the assumption that $f$ is integrable on $[a, b]$. Given an $\varepsilon > 0$, we must produce a $\delta > 0$ such that if $(P, \{c_k\})$ is any tagged partition that is $\delta$-fine, then $|R(f, P) - \int_a^b f| < \varepsilon$. Because $f$ is integrable, we know there exists a partition $P_\varepsilon$ such that

\[ U(f, P_\varepsilon) - L(f, P_\varepsilon) < \frac{\varepsilon}{3}. \]

Let $M > 0$ be a bound on $|f|$, and let $n$ be the number of subintervals of $P_\varepsilon$ (so that $P_\varepsilon$ really consists of $n + 1$ points in $[a, b]$). We will argue that choosing $\delta = \varepsilon/(9nM)$ has the desired property. Here is the idea. Let $(P, \{c_k\})$ be an arbitrary tagged partition of $[a, b]$ that is $\delta$-fine, and let $P' = P \cup P_\varepsilon$. The key is to establish the string of inequalities

\[ L(f, P') - \frac{\varepsilon}{3} < L(f, P) \le U(f, P) < U(f, P') + \frac{\varepsilon}{3}. \]

Exercise 8.1.1. (a) Explain why both the Riemann sum $R(f, P)$ and $\int_a^b f$ fall between $L(f, P)$ and $U(f, P)$. (b) Explain why $U(f, P') - L(f, P') < \varepsilon/3$.

By the previous exercise, if we can show $U(f, P) < U(f, P') + \varepsilon/3$ (and similarly $L(f, P') - \varepsilon/3 < L(f, P)$), then it will follow that

\[ \left| R(f, P) - \int_a^b f \right| < \varepsilon \]


and the proof will be done. Thus, we turn our attention toward estimating the distance between $U(f, P)$ and $U(f, P')$.

Exercise 8.1.2. Explain why $U(f, P) - U(f, P') \ge 0$.

A typical term in either $U(f, P)$ or $U(f, P')$ has the form $M_k(x_k - x_{k-1})$, where $M_k$ is the supremum of $f$ over $[x_{k-1}, x_k]$. A good number of these terms appear in both upper sums and so cancel out.

Exercise 8.1.3. (a) In terms of $n$, what is the largest number of terms of the form $M_k(x_k - x_{k-1})$ that could appear in one of $U(f, P)$ or $U(f, P')$ but not the other? (b) Finish the proof in this direction by arguing that $U(f, P) - U(f, P') < \varepsilon/3$.

($\Leftarrow$) For this direction, we assume that the $\varepsilon$–$\delta$ criterion in Theorem 8.1.2 holds and argue that $f$ is integrable. Integrability, as we have defined it, depends on our ability to choose partitions for which the upper sums are close to the lower sums. We have remarked that given any partition $P$, it is always the case that

\[ L(f, P) \le R(f, P) \le U(f, P) \]

no matter which tags are chosen to compute $R(f, P)$.

Exercise 8.1.4. (a) Show that if $f$ is continuous, then it is possible to pick tags $\{c_k\}_{k=1}^n$ so that $R(f, P) = U(f, P)$. Similarly, there are tags for which $R(f, P) = L(f, P)$ as well. (b) If $f$ is not continuous, it may not be possible to find tags for which $R(f, P) = U(f, P)$. Show, however, that given an arbitrary $\varepsilon > 0$, it is possible to pick tags for $P$ so that $U(f, P) - R(f, P) < \varepsilon$. The analogous statement holds for lower sums.

Exercise 8.1.5. Use the results of the previous exercise to finish the proof of Theorem 8.1.2. It may be easier to first argue that $f$ is integrable using the criterion in Theorem 7.2.8 and then argue that $\int_a^b f = A$.

Gauges and $\delta(x)$-fine Partitions

The key to the generalized Riemann integral is to allow the $\delta$ in Theorem 8.1.2 to be a function of $x$.

Definition 8.1.3. A function $\delta : [a, b] \to \mathbf{R}$ is called a gauge on $[a, b]$ if $\delta(x) > 0$ for all $x \in [a, b]$.


Definition 8.1.4. Given a particular gauge $\delta(x)$, a tagged partition $(P, \{c_k\}_{k=1}^n)$ is $\delta(x)$-fine if every subinterval $[x_{k-1}, x_k]$ satisfies $x_k - x_{k-1} < \delta(c_k)$. In other words, each subinterval $[x_{k-1}, x_k]$ has width less than $\delta(c_k)$.

It is important to see that if $\delta(x)$ is a constant function, then Definition 8.1.4 says precisely the same thing as Definition 8.1.1. In the case where $\delta(x)$ is not a constant, Definition 8.1.4 describes a way of measuring the fineness of partitions that is quite different.

Exercise 8.1.6. Consider the interval $[0, 1]$. (a) If $\delta(x) = 1/9$, find a $\delta(x)$-fine tagged partition of $[0, 1]$. Does the choice of tags matter in this case? (b) Let

\[ \delta(x) = \begin{cases} 1/4 & \text{if } x = 0 \\ x/3 & \text{if } 0 < x \le 1. \end{cases} \]

Construct a $\delta(x)$-fine tagged partition of $[0, 1]$.

The tinkering required in Exercise 8.1.6 (b) may cast doubt on whether an arbitrary gauge always admits a $\delta(x)$-fine partition. However, it is not too difficult to show that this is indeed the case.

Theorem 8.1.5. Given a gauge $\delta(x)$ on an interval $[a, b]$, there exists a tagged partition $(P, \{c_k\}_{k=1}^n)$ that is $\delta(x)$-fine.

Proof. Let $I_0 = [a, b]$. It may be possible to find a tag such that the trivial partition $P = \{a = x_0 < x_1 = b\}$ works. Specifically, if $b - a < \delta(x)$ for some $x \in [a, b]$, then we can set $c_1$ equal to such an $x$ and notice that $(P, \{c_1\})$ is $\delta(x)$-fine. If no such $x$ exists, then bisect $[a, b]$ into two equal halves.

Exercise 8.1.7. Apply the previous algorithm to each half and then explain why this procedure must eventually terminate after some finite number of steps.
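The bisection idea in the proof of Theorem 8.1.5 can be run mechanically on the gauge from Exercise 8.1.6 (b). What follows is a rough sketch under stated assumptions rather than the argument itself: the proof allows any $x$ in a subinterval to serve as the tag, whereas this code only tries the two endpoints and the midpoint, and the names `gauge`, `fine_tagged_partition`, and the `max_depth` safeguard are our own inventions.

```python
from fractions import Fraction

def gauge(x):
    """Gauge from Exercise 8.1.6 (b): delta(0) = 1/4, delta(x) = x/3 for 0 < x <= 1."""
    return Fraction(1, 4) if x == 0 else Fraction(x) / 3

def fine_tagged_partition(delta, a, b, max_depth=50):
    """Bisect [a, b] until each piece admits a tag c with (width) < delta(c).

    Simplifying assumption: only the endpoints and the midpoint are tried as tags.
    Returns a list of (left, right, tag) triples in left-to-right order.
    """
    if max_depth < 0:
        raise RuntimeError("no tag found among the candidates tried")
    for c in (a, (a + b) / 2, b):
        if b - a < delta(c):
            return [(a, b, c)]
    m = (a + b) / 2
    return (fine_tagged_partition(delta, a, m, max_depth - 1)
            + fine_tagged_partition(delta, m, b, max_depth - 1))

pieces = fine_tagged_partition(gauge, Fraction(0), Fraction(1))
for left, right, tag in pieces:
    print(f"[{left}, {right}]  tag {tag}  width {right - left} < delta(tag) = {gauge(tag)}")
```

One feature worth noticing in the output is that the leftmost subinterval is tagged at $0$. For this particular gauge that is forced: a tag $c > 0$ in $[0, x_1]$ would need $x_1 < c/3$, which is impossible when $c \le x_1$.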

Generalized Riemann Integrability

Keeping in mind that Theorem 8.1.2 offers an equivalent way to define Riemann integrability, we now propose a new method for defining the value of the integral.

Definition 8.1.6 (Generalized Riemann Integrability). A function $f$ on $[a, b]$ has generalized Riemann integral $A$ if, for every $\varepsilon > 0$, there exists a gauge $\delta(x)$ on $[a, b]$ such that for each tagged partition $(P, \{c_k\}_{k=1}^n)$ that is $\delta(x)$-fine, it is true that

\[ |R(f, P) - A| < \varepsilon. \]

In this case, we write $A = \int_a^b f$.

Theorem 8.1.7. If a function has a generalized Riemann integral, then the value of the integral is unique.


Proof. Assume that a function $f$ has generalized Riemann integral $A_1$ and that it also has generalized Riemann integral $A_2$. We must prove $A_1 = A_2$. Let $\varepsilon > 0$. Definition 8.1.6 assures us that there exists a gauge $\delta_1(x)$ such that

\[ |R(f, P) - A_1| < \frac{\varepsilon}{2} \]

for all tagged partitions that are $\delta_1(x)$-fine. Likewise, there exists another gauge $\delta_2(x)$ such that

\[ |R(f, P) - A_2| < \frac{\varepsilon}{2} \]

for all $\delta_2(x)$-fine tagged partitions.

Exercise 8.1.8. Finish the argument.

The implications of Definition 8.1.6 on the resulting class of integrable functions are far reaching. This is somewhat surprising given that the criteria for integrability in Definition 8.1.6 and Theorem 8.1.2 differ in such a small way. One observation that should be immediately evident is the following.

Exercise 8.1.9. Explain why every function that is Riemann-integrable with $\int_a^b f = A$ must also have generalized Riemann integral $A$.

The converse statement is not true, and that is the important point. One example that we have of a non-Riemann-integrable function is Dirichlet's function

\[ g(x) = \begin{cases} 1 & \text{if } x \in \mathbf{Q} \\ 0 & \text{if } x \notin \mathbf{Q}, \end{cases} \]

which has discontinuities at every point of $\mathbf{R}$.

Theorem 8.1.8. Dirichlet's function $g(x)$ is generalized Riemann-integrable on $[0, 1]$ with $\int_0^1 g = 0$.

Proof. Let $\varepsilon > 0$. By Definition 8.1.6, we must construct a gauge $\delta(x)$ on $[0, 1]$ such that whenever $(P, \{c_k\}_{k=1}^n)$ is a $\delta(x)$-fine tagged partition, it follows that

\[ 0 \le \sum_{k=1}^n g(c_k)(x_k - x_{k-1}) < \varepsilon. \]

The gauge represents a restriction on the size of $\Delta x_k = x_k - x_{k-1}$ in the sense that $\Delta x_k < \delta(c_k)$. The Riemann sum consists of products of the form $g(c_k)\Delta x_k$. Thus, for irrational tags, there is nothing to worry about because $g(c_k) = 0$ in this case. Our task is to make sure that any time a tag $c_k$ is rational, it comes from a suitably thin subinterval. Let $\{r_1, r_2, r_3, \ldots\}$ be an enumeration of the countable set of rational numbers contained in $[0, 1]$. For each $r_k$, set $\delta(r_k) = \varepsilon/2^{k+1}$. For $x$ irrational, set $\delta(x) = 1$.


Exercise 8.1.10. Show that if $(P, \{c_k\}_{k=1}^n)$ is a $\delta(x)$-fine tagged partition, then $R(g, P) < \varepsilon$. Keep in mind that each rational number $r_k$ can show up as a tag in at most two subintervals of $P$.
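The arithmetic behind the choice $\delta(r_k) = \varepsilon/2^{k+1}$ can be checked directly. The sketch below is only a hint toward the exercise, not a proof: it assumes the worst case in which each of $r_1, \ldots, r_N$ tags two subintervals of nearly maximal width, and the function name `rational_tag_bound` is ours.

```python
from fractions import Fraction

def rational_tag_bound(eps, num_rationals):
    """Upper bound on the part of R(g, P) contributed by tags r_1, ..., r_N.

    Each r_k tags at most two subintervals, each of width < eps / 2**(k + 1),
    and g(r_k) = 1, so the contribution of r_k is below 2 * eps / 2**(k + 1).
    """
    return sum(2 * eps / 2 ** (k + 1) for k in range(1, num_rationals + 1))

eps = Fraction(1, 100)
bounds = [rational_tag_bound(eps, N) for N in (1, 5, 20)]
for N, bound in zip((1, 5, 20), bounds):
    print(N, bound, bound < eps)
```

The partial sums equal $\varepsilon(1 - 2^{-N})$, so they stay strictly below $\varepsilon$ however many rational tags a partition happens to use.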

Dirichlet's function fails to be Riemann-integrable because, given any (untagged) partition, it is possible to make $R(g, P) = 1$ or $R(g, P) = 0$ by choosing the tags to be either all rational or all irrational. For the generalized Riemann integral, choosing all rational tags results in a tagged partition that is not $\delta(x)$-fine (when $\delta(x)$ is small on rational points) and so does not have to be considered. In general, allowing for nonconstant gauges allows us to be more discriminating about which tagged partitions qualify as $\delta(x)$-fine. The result, as we have just seen, is that it may be easier to achieve the inequality

\[ |R(f, P) - A| < \varepsilon \]

for the often smaller and more carefully selected set of tagged partitions that remain.

The Fundamental Theorem of Calculus

We conclude this brief introduction to the generalized Riemann integral with a proof of the Fundamental Theorem of Calculus. As was alluded to earlier, the most notable distinction between the following theorem and part (i) of Theorem 7.5.1 is that here we do not need to assume that the derivative function is integrable. Using the generalized Riemann integral, every derivative is integrable, and the integral can be evaluated using the antiderivative in the familiar way. It is also interesting to note that in Theorem 7.5.1 the Mean Value Theorem played the crucial role in the argument, but it is not needed here.

Theorem 8.1.9. Assume $F : [a, b] \to \mathbf{R}$ is differentiable at each point in $[a, b]$ and set $f(x) = F'(x)$. Then, $f$ has the generalized Riemann integral

\[ \int_a^b f = F(b) - F(a). \]

Proof. Let $P = \{a = x_0 < x_1 < x_2 < \cdots < x_n = b\}$ be a partition of $[a, b]$. Both this proof and the proof of Theorem 7.5.1 make use of the following fact.

Exercise 8.1.11. Show that

\[ F(b) - F(a) = \sum_{k=1}^n [F(x_k) - F(x_{k-1})]. \]


If $\{c_k\}_{k=1}^n$ is a set of tags for $P$, then we can estimate the difference between the Riemann sum $R(f, P)$ and $F(b) - F(a)$ by

\[ |F(b) - F(a) - R(f, P)| = \left| \sum_{k=1}^n [F(x_k) - F(x_{k-1}) - f(c_k)(x_k - x_{k-1})] \right| \le \sum_{k=1}^n |F(x_k) - F(x_{k-1}) - f(c_k)(x_k - x_{k-1})|. \]

Let $\varepsilon > 0$. To prove the theorem, we must construct a gauge $\delta(c)$ such that

\[ |F(b) - F(a) - R(f, P)| < \varepsilon \tag{2} \]

for all $(P, \{c_k\})$ that are $\delta(c)$-fine. (Using the variable $c$ in the gauge function is more convenient than $x$ in this case.)

Exercise 8.1.12. For each $c \in [a, b]$, explain why there exists a $\delta(c) > 0$ (a $\delta > 0$ depending on $c$) such that

\[ \left| \frac{F(x) - F(c)}{x - c} - f(c) \right| < \varepsilon \]

for all $0 < |x - c| < \delta(c)$.

This $\delta(c)$ is the desired gauge on $[a, b]$. Let $(P, \{c_k\}_{k=1}^n)$ be a $\delta(c)$-fine partition of $[a, b]$. It just remains to show that equation (2) is satisfied for this tagged partition.

Exercise 8.1.13. (a) For a particular $c_k \in [x_{k-1}, x_k]$ of $P$, show that

\[ |F(x_k) - F(c_k) - f(c_k)(x_k - c_k)| < \varepsilon (x_k - c_k) \]

and

\[ |F(c_k) - F(x_{k-1}) - f(c_k)(c_k - x_{k-1})| < \varepsilon (c_k - x_{k-1}). \]

(b) Now, argue that

\[ |F(x_k) - F(x_{k-1}) - f(c_k)(x_k - x_{k-1})| < \varepsilon (x_k - x_{k-1}), \]

and use this fact to complete the proof of the theorem.

If we consider the function

\[ F(x) = \begin{cases} x^{3/2} \sin(1/x) & \text{if } x \ne 0 \\ 0 & \text{if } x = 0, \end{cases} \]

then it is not too difficult to show that $F$ is differentiable everywhere, including $x = 0$, with

\[ F'(x) = \begin{cases} (3/2)\sqrt{x}\, \sin(1/x) - (1/\sqrt{x}) \cos(1/x) & \text{if } x \ne 0 \\ 0 & \text{if } x = 0. \end{cases} \]
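A short numerical probe of this example may be helpful. It is only a floating-point sketch, and the sample points are our own choice: at $x_n = 1/(2\pi n)$ the sine term vanishes and the cosine term equals $1$, so the formula for $F'$ gives $-\sqrt{2\pi n}$, while the difference quotient at the origin is $\sqrt{h}\sin(1/h)$.

```python
import math

def F(x):
    """F(x) = x^(3/2) sin(1/x) for x > 0, with F(0) = 0."""
    return x ** 1.5 * math.sin(1 / x) if x != 0 else 0.0

def F_prime(x):
    """The derivative formula from the text, with F'(0) = 0."""
    if x == 0:
        return 0.0
    return 1.5 * math.sqrt(x) * math.sin(1 / x) - math.cos(1 / x) / math.sqrt(x)

# Sample points x_n = 1/(2*pi*n) marching toward the origin (illustrative choice).
samples = [(n, F_prime(1 / (2 * math.pi * n))) for n in (1, 100, 10_000)]
for n, value in samples:
    print(n, value, -math.sqrt(2 * math.pi * n))

# Difference quotients at 0: |F(h)/h| = sqrt(h) |sin(1/h)| <= sqrt(h).
steps = (1e-2, 1e-4, 1e-6)
quotients = [abs(F(h) / h) for h in steps]
print(quotients)
```

The sampled derivative values grow in magnitude without bound as $n$ increases, even though the difference quotients at $0$ shrink with $\sqrt{h}$.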


What is notable here is that the derivative is unbounded near the origin. The theory of the ordinary Riemann integral begins with the assumption that we only consider bounded functions on closed intervals, but there is no such restriction for the generalized Riemann integral. Theorem 8.1.9 proves that $F'$ has a generalized integral. Now, improper Riemann integrals have been created to extend Riemann integration to some unbounded functions, but it is another interesting fact about the generalized Riemann integral that any function having an improper integral must already be integrable in the sense described in Definition 8.1.6.

As a parting gesture, let's show how Theorem 8.1.9 yields a short verification of the change-of-variables technique from calculus.

Theorem 8.1.10 (Substitution Formula). Let $g : [a, b] \to \mathbf{R}$ be differentiable at each point of $[a, b]$, and assume $F$ is differentiable on the set $g([a, b])$. If $f(x) = F'(x)$ for all $x \in g([a, b])$, then

\[ \int_a^b (f \circ g) \cdot g' = \int_{g(a)}^{g(b)} f. \]

Proof. The hypothesis of the theorem guarantees that the function $(F \circ g)(x)$ is differentiable for all $x \in [a, b]$.

Exercise 8.1.14. (a) Why are we sure that $(F \circ g)'(x)$ has a generalized Riemann integral? (b) Use the chain rule (Theorem 5.2.5) and Theorem 8.1.9 to prove that

\[ \int_a^b (f \circ g) \cdot g' = F(g(b)) - F(g(a)). \]

(c) Finish the proof by showing that

\[ \int_{g(a)}^{g(b)} f = F(g(b)) - F(g(a)). \]

The impressive properties of the generalized Riemann integral do not end here. The central source for the material in this section is Robert Bartle's excellent article "Return to the Riemann Integral," which appeared in the American Mathematical Monthly, October, 1996. This article goes on to outline convergence theorems in the spirit of Theorem 7.4.4 for the generalized Riemann integral as well as its relationship to the theory of the Lebesgue integral. A more detailed development can be found in the recently published book Integral: An Easy Approach after Kurzweil and Henstock by Rudolf Výborný and Lee Peng Yee or in a forthcoming book by Robert Bartle to be published by the American Mathematical Society.

8.2 Metric Spaces and the Baire Category Theorem

A natural question to ask is whether the theorems we have proved about sequences, series, and functions in $\mathbf{R}$ have analogs in the plane $\mathbf{R}^2$ or in even higher dimensions. Looking back over the proofs, one crucial observation is that most of the arguments depend on just a few basic properties of the absolute value function. Interpreting the statement "$|x - y|$" to mean the "distance from $x$ to $y$ in $\mathbf{R}$," our aim is to experiment with other ways of measuring "distance" on other sets such as $\mathbf{R}^2$ and $C[0, 1]$, the space of continuous functions on $[0, 1]$.

Definition 8.2.1. Given a set $X$, a function $d : X \times X \to \mathbf{R}$ is a metric on $X$ if for all $x, y \in X$:
(i) $d(x, y) \ge 0$ with $d(x, y) = 0$ if and only if $x = y$,
(ii) $d(x, y) = d(y, x)$, and
(iii) for all $z \in X$, $d(x, y) \le d(x, z) + d(z, y)$.
A metric space is a set $X$ together with a metric $d$.

Property (iii) in the previous definition is the "triangle inequality." The next two exercises illustrate the point that the same set $X$ can be home to several different metrics. When referring to a metric space, we must specify the set and the particular distance function $d$.

Exercise 8.2.1. Decide which of the following are metrics on $X = \mathbf{R}^2$. For each, we let $x = (x_1, x_2)$ and $y = (y_1, y_2)$ be points in the plane.
(a) $d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}$.
(b) $d(x, y) = 1$ if $x \ne y$; and $d(x, x) = 0$.
(c) $d(x, y) = \max\{|x_1 - y_1|, |x_2 - y_2|\}$.
(d) $d(x, y) = |x_1 x_2 + y_1 y_2|$.

Exercise 8.2.2. Let $C[0, 1]$ be the collection of continuous functions on the closed interval $[0, 1]$. Decide which of the following are metrics on $C[0, 1]$.
(a) $d(f, g) = \sup\{|f(x) - g(x)| : x \in [0, 1]\}$.
(b) $d(f, g) = |f(1) - g(1)|$.
(c) $d(f, g) = \int_0^1 |f - g|$.

The following distance function is called the discrete metric and can be defined on any set $X$. For any $x, y \in X$, let

\[ \rho(x, y) = \begin{cases} 1 & \text{if } x \ne y \\ 0 & \text{if } x = y. \end{cases} \]

Exercise 8.2.3. Verify that the discrete metric is actually a metric.
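Before attempting proofs, it can be instructive to hunt for counterexamples by machine. The sketch below tests the three properties of Definition 8.2.1 on a small sample of points in the plane for the candidates in Exercise 8.2.1. It comes with an important caveat: a finite search can refute a proposed metric but can never verify one, so a result of `None` is merely the absence of a counterexample in the sample. The function names, the tolerance, and the choice of integer sample points are all our own assumptions.

```python
import itertools
import math
import random

def d_euclid(x, y):      # Exercise 8.2.1 (a)
    return math.hypot(x[0] - y[0], x[1] - y[1])

def d_discrete(x, y):    # Exercise 8.2.1 (b)
    return 0 if x == y else 1

def d_max(x, y):         # Exercise 8.2.1 (c)
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))

def d_product(x, y):     # Exercise 8.2.1 (d)
    return abs(x[0] * x[1] + y[0] * y[1])

def find_violation(d, points, tol=1e-12):
    """Return a description of the first violated property on `points`, else None."""
    for x, y in itertools.product(points, repeat=2):
        if d(x, y) < 0:
            return ("nonnegativity", x, y)
        if (d(x, y) == 0) != (x == y):
            return ("d(x, y) = 0 iff x = y", x, y)
        if abs(d(x, y) - d(y, x)) > tol:
            return ("symmetry", x, y)
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y) + tol:
            return ("triangle inequality", x, y, z)
    return None

random.seed(0)
# Integer sample points (an arbitrary choice) keep the equality tests exact.
points = [(random.randint(-3, 3), random.randint(-3, 3)) for _ in range(12)]
points.append((1, 1))

for name, d in [("(a)", d_euclid), ("(b)", d_discrete), ("(c)", d_max), ("(d)", d_product)]:
    print(name, find_violation(d, points))
```

Any counterexample the search reports still needs to be checked by hand against Definition 8.2.1, and the candidates for which nothing is found still require proof.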


Basic Definitions

Definition 8.2.2. Let $(X, d)$ be a metric space. A sequence $(x_n) \subseteq X$ converges to an element $x \in X$ if for all $\varepsilon > 0$ there exists an $N \in \mathbf{N}$ such that $d(x_n, x) < \varepsilon$ whenever $n \ge N$.

Definition 8.2.3. A sequence $(x_n)$ in a metric space $(X, d)$ is a Cauchy sequence if for all $\varepsilon > 0$ there exists an $N \in \mathbf{N}$ such that $d(x_m, x_n) < \varepsilon$ whenever $m, n \ge N$.

Exercise 8.2.4. Show that a convergent sequence is Cauchy.

The Cauchy Criterion, as it is called in $\mathbf{R}$, was an "if and only if" statement. In the general metric space setting, however, the converse statement does not always hold. Recall that, in $\mathbf{R}$, the assertion that "Cauchy sequences converge" was shown to be equivalent to the Axiom of Completeness. In order to transport the Axiom of Completeness into a metric space, we would need to have an ordering on our space so that we could discuss such things as upper bounds. It is an interesting observation that not every set can be ordered in a satisfying way (the points in $\mathbf{R}^2$ for example). Even without an ordering, we are still going to want completeness. For metric spaces, the convergence of Cauchy sequences is taken to be the definition of completeness.

Definition 8.2.4. A metric space $(X, d)$ is complete if every Cauchy sequence in $X$ converges to an element of $X$.

Exercise 8.2.5. (a) Consider $\mathbf{R}^2$ with the metric defined in Exercise 8.2.1 (b). What do Cauchy sequences look like in this space? Is $\mathbf{R}^2$ complete with respect to this metric?
(b) Show that $C[0, 1]$ is complete with respect to the metric in Exercise 8.2.2 (a).
(c) Define $C^1[0, 1]$ to be the collection of differentiable functions on $[0, 1]$ whose derivatives are also continuous. Is $C^1[0, 1]$ complete with respect to the metric defined in Exercise 8.2.2 (a)?
(d) What does a convergent sequence in $\mathbf{R}$ look like when we consider the discrete metric $\rho(x, y)$ examined in Exercise 8.2.3?

The metric on $C[0, 1]$ in Exercise 8.2.2 (a) is important enough to have earned the nickname "sup norm" and is denoted by

\[ d(f, g) = \|f - g\|_\infty = \sup\{|f(x) - g(x)| : x \in [0, 1]\}. \]

In all upcoming discussions, it is assumed that the space $C[0, 1]$ is endowed with this metric unless otherwise specified.

Definition 8.2.5. Let $(X, d)$ be a metric space. A function $f : X \to \mathbf{R}$ is continuous at a point $x \in X$ if for all $\varepsilon > 0$ there exists a $\delta > 0$ such that $|f(x) - f(y)| < \varepsilon$ whenever $d(x, y) < \delta$.
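To get a feel for the sup norm, one can approximate $\|f - g\|_\infty$ by sampling. The following is a crude sketch under an obvious assumption: a maximum over finitely many grid points only bounds the true supremum from below, and the helper name `sup_distance` and the two sample functions are ours. For $f(x) = x$ and $g(x) = x^2$, the difference $x - x^2$ peaks at $x = 1/2$, which lies on the grid used here.

```python
def sup_distance(f, g, samples=1001):
    """Approximate ||f - g||_inf on [0, 1] by the maximum over an evenly spaced grid.

    This is a lower bound for the true supremum; it is accurate only when the grid
    is fine relative to how quickly f - g varies.
    """
    grid = (k / (samples - 1) for k in range(samples))
    return max(abs(f(x) - g(x)) for x in grid)

f = lambda x: x
g = lambda x: x * x

distance = sup_distance(f, g)
print(distance)
```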


Exercise 8.2.6. Which of these functions on $C[0, 1]$ are continuous?
(a) $g(f) = \int_0^1 f k$, where $k$ is some fixed function in $C[0, 1]$.
(b) $g(f) = f(1/2)$.
(c) $g(f) = f(1/2)$, but this time with respect to the metric in Exercise 8.2.2 (c).

Topology on Metric Spaces

Definition 8.2.6. Given $\varepsilon > 0$ and an element $x$ in the metric space $(X, d)$, the $\varepsilon$-neighborhood of $x$ is the set $V_\varepsilon(x) = \{y \in X : d(x, y) < \varepsilon\}$.

Exercise 8.2.7. (a) Describe the $\varepsilon$-neighborhoods in $\mathbf{R}^2$ for each of the different metrics described in Exercise 8.2.1. How about for the discrete metric? (b) What do $\varepsilon$-neighborhoods in $\mathbf{R}$ look like using the discrete metric $\rho(x, y)$?

With the definition of an $\varepsilon$-neighborhood, we can now define open sets, limit points, and closed sets exactly as we did before. A set $O \subseteq X$ is open if for every $x \in O$ we can find a neighborhood $V_\varepsilon(x) \subseteq O$. A point $x$ is a limit point of a set $A$ if every $V_\varepsilon(x)$ intersects $A$ in some point other than $x$. A set $C$ is closed if it contains its limit points.

Exercise 8.2.8. (a) Let $(X, d)$ be a metric space, and pick $x \in X$. Verify that an $\varepsilon$-neighborhood $V_\varepsilon(x)$ is an open set. Is the set $C_\varepsilon(x) = \{y \in X : d(x, y) \le \varepsilon\}$ a closed set?
(b) Show that the set $Y = \{f \in C[0, 1] : \|f\|_\infty \le 1\}$ is closed in $C[0, 1]$.
(c) Is the set $T = \{f \in C[0, 1] : f(0) = 0\}$ open, closed, or neither in $C[0, 1]$?

We define compactness in metric spaces just as we did for $\mathbf{R}$.

Definition 8.2.7. A subset $K$ of a metric space $(X, d)$ is compact if every sequence in $K$ has a convergent subsequence that converges to a limit in $K$.

An extremely useful characterization of compactness in $\mathbf{R}$ is the proposition that a set is compact if and only if it is closed and bounded. For abstract metric spaces, this proposition only holds in the forward direction.

Exercise 8.2.9. (a) Supply a definition for bounded subsets of a metric space $(X, d)$.
(b) Show that if $K$ is a compact subset of the metric space $(X, d)$, then $K$ is closed and bounded.
(c) Show that $Y \subseteq C[0, 1]$ from Exercise 8.2.8 (b) is closed and bounded but not compact.

A good hint for part (c) of the previous exercise can be found in Exercise 6.2.15 from Chapter 6. This exercise defines the concept of an equicontinuous family of functions, which is a key ingredient in the Arzela–Ascoli Theorem (Exercise 6.2.16). The Arzela–Ascoli Theorem states that any bounded,


equicontinuous collection of functions in $C[0, 1]$ must have a uniformly convergent subsequence. One way to summarize this famous result, which we did not have the language for in Chapter 6, is as a statement describing a particular class of compact subsets in $C[0, 1]$. Looking at the definition of compactness, and remembering that the uniform limit of continuous functions is continuous, the Arzela–Ascoli Theorem states that any closed, bounded, equicontinuous collection of functions is a compact subset of $C[0, 1]$.

Definition 8.2.8. Given a subset $E$ of a metric space $(X, d)$, the closure $\overline{E}$ is the union of $E$ together with its limit points. The interior of $E$ is denoted by $E^\circ$ and is defined as

\[ E^\circ = \{x \in E : \text{there exists } V_\varepsilon(x) \subseteq E\}. \]

Closure and interior are dual concepts. Results about these concepts come in pairs and exhibit an elegant and useful symmetry.

Exercise 8.2.10. (a) Show that $E$ is closed if and only if $\overline{E} = E$. Show that $E$ is open if and only if $E^\circ = E$.
(b) Show that $(\overline{E})^c = (E^c)^\circ$, and similarly that $(E^\circ)^c = \overline{E^c}$.

A good hint for the previous exercise is to review the proofs from Chapter 3, where closure at least is discussed. Thinking of all of these concepts as they relate to $\mathbf{R}$ or $\mathbf{R}^2$ with the usual metric is not a bad idea. However, it is important to remember also that rigorous proofs must be constructed purely from the relevant definitions.

Exercise 8.2.11. To keep things from sounding too familiar, find an example of a metric space (from somewhere in the preceding discussion) where

\[ \overline{V_\varepsilon(x)} \ne \{y \in X : d(x, y) \le \varepsilon\} \]

for some $\varepsilon$-neighborhood in a metric space $(X, d)$.

We are on our way toward the Baire Category Theorem. The next definitions provide the final bit of vocabulary needed to state the result.

Definition 8.2.9. A set $A \subseteq X$ is dense in the metric space $(X, d)$ if $\overline{A} = X$. A subset $E$ of a metric space $(X, d)$ is nowhere-dense in $X$ if $(\overline{E})^\circ$ is empty.

Exercise 8.2.12. If $E$ is a subset of a metric space $(X, d)$, show that $E$ is nowhere-dense in $X$ if and only if $(\overline{E})^c$ is dense in $X$.

The Baire Category Theorem

In Section 3.5, we proved Baire's Theorem, which states that it is impossible to write the real numbers $\mathbf{R}$ as the countable union of nowhere-dense sets. Previous to this, we knew that $\mathbf{R}$ was too big to be written as the countable union of single points ($\mathbf{R}$ is uncountable), but Baire's Theorem improves on this by asserting


that the only way to make $\mathbf{R}$ from a countable union of arbitrary sets is for the closure of at least one of these sets to contain an interval. The keystone to the proof of Baire's Theorem is the completeness of $\mathbf{R}$. The idea now is to replace $\mathbf{R}$ with an arbitrary complete metric space and prove the theorem in this more general setting. This leads to a statement that can be used to discuss the size and structure of other spaces such as $\mathbf{R}^2$ and $C[0, 1]$. At the end of Chapter 3, we mentioned one particularly fascinating implication of this result for $C[0, 1]$, which is that most continuous functions are nowhere-differentiable, despite the substantial difficulty required to produce an example of one. It would be a good idea at this point to reread Sections 3.6 and 5.5. We are now equipped to carry out the details promised in these discussions.

Theorem 8.2.10. Let $(X, d)$ be a complete metric space, and let $\{O_n\}$ be a countable collection of dense, open subsets of $X$. Then, $\bigcap_{n=1}^\infty O_n$ is not empty.

Proof. When we proved this theorem on $\mathbf{R}$, completeness manifested itself in the form of the Nested Interval Property. We could derive something akin to NIP in the metric space setting, but instead let's take an approach that uses the convergence of Cauchy sequences (because this is how we have defined completeness). Pick $x_1 \in O_1$. Because $O_1$ is open, there exists an $\varepsilon_1 > 0$ such that $V_{\varepsilon_1}(x_1) \subseteq O_1$.

Exercise 8.2.13. (a) Give the details for why we know there exists a point $x_2 \in V_{\varepsilon_1}(x_1) \cap O_2$ and an $\varepsilon_2 > 0$ satisfying $\varepsilon_2 < \varepsilon_1/2$ with $\overline{V_{\varepsilon_2}(x_2)}$ contained in $O_2$ and

\[ \overline{V_{\varepsilon_2}(x_2)} \subseteq V_{\varepsilon_1}(x_1). \]

(b) Proceed along this line and use the completeness of $(X, d)$ to produce a single point $x \in O_n$ for every $n \in \mathbf{N}$.

Theorem 8.2.11 (Baire Category Theorem). A complete metric space is not the union of a countable collection of nowhere-dense sets.

Proof. Let $(X, d)$ be a complete metric space.

Exercise 8.2.14. If $E$ is nowhere-dense in $X$, then what can we say about $(\overline{E})^c$? Now, complete the proof of the theorem.

This result is called the Baire Category Theorem because it creates two categories of size for subsets in a metric space. A set of “ﬁrst category” is one that can be written as a countable union of nowhere-dense sets. These are the small, intuitively thin subsets of a metric space. We now see that if our metric space is complete, then it is necessarily of “second category,” meaning it cannot be written as a countable union of nowhere-dense sets. Given a subset A of a complete metric space X, showing that A is of ﬁrst category is a mathematically


precise way of proving that $A$ constitutes a very minor portion of the set $X$. The term "meager" is often used to mean a set of first category. With the stage set, we now outline the argument that continuous functions that are differentiable at even one point of $[0, 1]$ form a meager subset of the metric space $C[0, 1]$.

Theorem 8.2.12. The set

\[ D = \{f \in C[0, 1] : f'(x) \text{ exists for some } x \in [0, 1]\} \]

is a set of first category in $C[0, 1]$.

Proof. For each pair of natural numbers $m, n$, define

\[ A_{m,n} = \left\{ f \in C[0, 1] : \text{there exists } x \in [0, 1] \text{ where } \left| \frac{f(x) - f(t)}{x - t} \right| \le n \text{ whenever } 0 < |x - t| < \frac{1}{m} \right\}. \]

This definition takes some time to digest. Think of $1/m$ as defining a $\delta$-neighborhood around the point $x$, and view $n$ as an upper bound on the magnitude of the slopes of lines through the two points $(x, f(x))$ and $(t, f(t))$. The set $A_{m,n}$ contains any function in $C[0, 1]$ for which it is possible to find at least one point $x$ where the slopes through $(x, f(x))$ and nearby points on the graph (within $1/m$ of $x$, to be precise) are bounded by $n$.

Exercise 8.2.15. Show that if $f \in C[0, 1]$ is differentiable at a point $x \in [0, 1]$, then $f \in A_{m,n}$ for some pair $m, n \in \mathbf{N}$.

The collection of subsets $\{A_{m,n} : m, n \in \mathbf{N}\}$ is countable, and we have just seen that the union of these sets contains our set $D$. Because it is not difficult to see that a subset of a set of first category is first category, the final hurdle in the argument is to prove that each $A_{m,n}$ is nowhere-dense in $C[0, 1]$. Fix $m$ and $n$. The first order of business is to prove that $A_{m,n}$ is a closed set. To this end, let $(f_k)$ be a sequence in $A_{m,n}$ and assume $f_k \to f$ in $C[0, 1]$. We need to show $f \in A_{m,n}$. Because $f_k \in A_{m,n}$, then for each $k \in \mathbf{N}$ there exists a point $x_k \in [0, 1]$ where

\[ \left| \frac{f_k(x_k) - f_k(t)}{x_k - t} \right| \le n \]

for all $0 < |x_k - t| < 1/m$.

Exercise 8.2.16. (a) The sequence $(x_k)$ does not necessarily converge, but explain why there exists a subsequence $(x_{k_l})$ that is convergent. Let $x = \lim(x_{k_l})$.
(b) Prove that $f_{k_l}(x_{k_l}) \to f(x)$.
(c) Let $t \in [0, 1]$ satisfy $0 < |x - t| < 1/m$. Show that

\[ \left| \frac{f(x) - f(t)}{x - t} \right| \le n \]

and conclude that $A_{m,n}$ is closed.


Because $A_{m,n}$ is closed, $\overline{A_{m,n}} = A_{m,n}$. In order to prove that $A_{m,n}$ is nowhere-dense, we just have to show that it contains no ε-neighborhoods, so pick an arbitrary f ∈ $A_{m,n}$, let ε > 0, and consider the ε-neighborhood $V_\epsilon(f)$ in C[0, 1]. To show that this set is not contained in $A_{m,n}$, we must produce a function g ∈ C[0, 1] that satisfies $\|f - g\|_\infty < \epsilon$ and has the property that there is no point x ∈ [0, 1] where

\[ \left| \frac{g(x) - g(t)}{x - t} \right| \le n \quad \text{for all } 0 < |x - t| < 1/m. \]

Exercise 8.2.17. A function is called piecewise linear if its graph consists of a finite number of line segments.
(a) Show that there exists a piecewise linear function p ∈ C[0, 1] satisfying $\|f - p\|_\infty < \epsilon/2$.
(b) Show that if h is any function in C[0, 1] that is bounded by 1, then the function

\[ g(x) = p(x) + \frac{\epsilon}{2} h(x) \]

satisfies g ∈ $V_\epsilon(f)$.
(c) Construct a piecewise linear function h(x) in C[0, 1] that is bounded by 1 and leads to the conclusion g ∉ $A_{m,n}$, where g is defined as in (b). Explain how this completes the argument for Theorem 8.2.12.

8.3

Fourier Series

In his famous treatise, Theorie Analytique de la Chaleur (The Analytical Theory of Heat), 1822, Joseph Fourier (1768–1830) boldly asserts, “Thus there is no function f(x), or part of a function, which cannot be expressed by a trigonometric series.” [1] It is difficult to exaggerate the mathematical richness of this idea. It has been convincingly argued by mathematical historians that the ensuing investigation into the validity of Fourier’s conjecture was the fundamental catalyst for the pursuit of rigor that characterizes 19th century mathematics. Power series had been in wide use in the 150 years leading up to Fourier’s work, largely because they behaved so well under the operations of calculus. A function expressed as a power series is continuous, differentiable an infinite number of times, and can be integrated and differentiated as though it were a polynomial. In the presence of such agreeable behavior, there was no compelling reason for mathematicians to formulate a more precise understanding of “limit” or “convergence” because there were no arguments to resolve. Fourier’s successful application of trigonometric series to the study of heat flow changed all of this. To understand what the fuss was really about, we need to look more closely at what Fourier was asserting, focusing individually on the terms “function,” “express,” and “trigonometric series.”

[1] Quotes in this section are taken from the article by W.A. Coppel, “J.B. Fourier—On the Occasion of his Two Hundredth Birthday,” American Mathematical Monthly, 76, 1969.

Trigonometric Series

The basic principle behind any series representation is to express a given function f(x) as a sum of simpler functions. For power series, the component functions are $\{1, x, x^2, x^3, \ldots\}$, so that the series takes the form

\[ f(x) = \sum_{n=0}^{\infty} a_n x^n = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots. \]

A trigonometric series is a very different type of infinite series where the functions {1, cos(x), sin(x), cos(2x), sin(2x), cos(3x), sin(3x), . . . } serve as the components. Thus, a trigonometric series has the form

\[ \begin{aligned} f(x) &= a_0 + a_1 \cos(x) + b_1 \sin(x) + a_2 \cos(2x) + b_2 \sin(2x) + a_3 \cos(3x) + \cdots \\ &= a_0 + \sum_{n=1}^{\infty} a_n \cos(nx) + b_n \sin(nx). \end{aligned} \]

The idea of representing a function in this way was not completely new when Fourier first publicly proposed it in 1807. About 50 years earlier, Jean Le Rond d’Alembert (1717–1783) published the partial differential equation

\[ \frac{\partial^2 u}{\partial x^2} = \frac{\partial^2 u}{\partial t^2} \tag{1} \]

as a means of describing the motion of a vibrating string. In this model, the function u(x, t) represents the displacement of the string at time t ≥ 0 and at some point x, which we will take to be in the interval [0, π]. Because the string is understood to be attached at each end of this interval, we have

\[ u(0, t) = 0 \quad \text{and} \quad u(\pi, t) = 0 \tag{2} \]

for all values of t ≥ 0. Now, at t = 0, the string is displaced some initial amount, and at the moment it is released we assume

\[ \frac{\partial u}{\partial t}(x, 0) = 0, \tag{3} \]

meaning that, although the string immediately starts to move, it is given no initial velocity at any point. Finding a function u(x, t) that satisfies equations (1), (2), and (3) is not too difficult.

Exercise 8.3.1. (a) Verify that

\[ u(x, t) = b_n \sin(nx) \cos(nt) \]


satisfies equations (1), (2), and (3) for any choice of n ∈ N and $b_n$ ∈ R. What goes wrong if n ∉ N?
(b) Explain why any finite sum of functions of the form given in part (a) would also satisfy (1), (2), and (3). (Incidentally, it is possible to hear the different solutions in (a) for values of n up to 4 or 5 by isolating the harmonics on a well-made stringed instrument.)

Now, we come to the truly interesting issue. We have just seen that any function of the form

\[ u(x, t) = \sum_{n=1}^{N} b_n \sin(nx) \cos(nt) \tag{4} \]

solves d’Alembert’s wave equation, as it is called, but the particular solution we want depends on how the string is originally “plucked.” At time t = 0, we will assume that the string is given some initial displacement f(x) = u(x, 0). Setting t = 0 in our family of solutions in (4), the hope is that the initial displacement function f(x) can be expressed as

\[ f(x) = \sum_{n=1}^{N} b_n \sin(nx). \tag{5} \]

What this means is that if there exist suitable coeﬃcients b1 , b2 , . . . , bN so that f (x) can be written as a sum of sine functions as in (5), then the vibrating-string problem is completely solved by the function u(x, t) given in (4). The obvious question to ask, then, is just what types of functions can be constructed as linear combinations of the functions {sin(x), sin(2x), sin(3x), . . . }. How general can f (x) be? Daniel Bernoulli (1700–1782) is usually credited with proposing the idea that by taking an inﬁnite sum in equation (5), it may be possible to represent any initial position f (x) over the interval [0, π]. Fourier was studying the propagation of heat when trigonometric series resurfaced in his work in a very similar way. For Fourier, f (x) represented an initial temperature applied to the boundary of some heat-conducting material. The diﬀerential equations describing heat ﬂow are slightly diﬀerent from d’Alembert’s wave equation, but they still involve the second derivatives that make expressing f (x) as a sum of trigonometric functions the crucial step in ﬁnding a solution.
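As a numerical sanity check (not a substitute for Exercise 8.3.1), the following Python sketch evaluates a finite sum of the form (4) and compares second differences in x and in t at one sample point. The coefficient list `coeffs`, the sample point, the step sizes, and all helper names are illustrative assumptions rather than anything taken from the text.

```python
import math

def u(x, t, coeffs):
    """Finite sum of b_n sin(nx) cos(nt); coeffs[n-1] holds b_n."""
    return sum(b * math.sin(n * x) * math.cos(n * t)
               for n, b in enumerate(coeffs, start=1))

def second_diff(g, s, h=1e-4):
    """Central-difference estimate of the second derivative g''(s)."""
    return (g(s + h) - 2.0 * g(s) + g(s - h)) / (h * h)

coeffs = [1.0, -0.5, 0.25]   # hypothetical b_1, b_2, b_3
x0, t0 = 0.7, 1.3            # arbitrary sample point

u_xx = second_diff(lambda x: u(x, t0, coeffs), x0)
u_tt = second_diff(lambda t: u(x0, t, coeffs), t0)
# Central difference in t around t = 0 estimates the initial velocity.
u_t0 = (u(x0, 1e-6, coeffs) - u(x0, -1e-6, coeffs)) / 2e-6

print("equation (1) residual:", abs(u_xx - u_tt))
print("equation (2) endpoints:", u(0.0, t0, coeffs), u(math.pi, t0, coeffs))
print("equation (3) initial velocity:", u_t0)
```

All three printed quantities should be close to zero, up to finite-difference and rounding error. A check at one point is only evidence, of course; the exercise asks for the verification by differentiation.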

Periodic Functions

In the early stages of his work, Fourier focused his attention on even functions (i.e., functions satisfying f(x) = f(−x)) and sought out ways to represent them as series of the form $\sum a_n \cos(nx)$. Eventually, he arrived at the more general formulation of the problem, which is to find suitable coefficients $(a_n)$ and $(b_n)$ to express a function f(x) as

\[ f(x) = a_0 + \sum_{n=1}^{\infty} a_n \cos(nx) + b_n \sin(nx). \tag{6} \]

Figure 8.1: $f(x) = x^2$ over (−π, π], extended to be 2π-periodic.

As we begin to explore how arbitrary f(x) can be, it is important to notice that every component of the series in equation (6) is periodic with period 2π. Turning our attention to the term “function,” it now follows that any function we hope to represent by a trigonometric series will necessarily be periodic as well. We will give primary attention to the interval (−π, π]. What this means is that, given a function such as $f(x) = x^2$, we will restrict our attention to f over the domain (−π, π] and then extend f periodically to all of R via the rule f(x) = f(x + 2kπ) for all k ∈ Z (Fig. 8.1).

This convention of focusing on just the part of f(x) over the interval (−π, π] hardly seems controversial, but it did generate some confusion in Fourier’s time. In Sections 1.2 and 4.1, we alluded to the fact that in the early 1800s the term “function” was used to mean something more like “formula.” It was generally believed that a function’s behavior over the interval (−π, π] determined its behavior everywhere else, a point of view that follows naturally from an overly zealous faith in Taylor series. The modern definition of function given in Definition 1.2.3 is attributed to Dirichlet from the 1830s, although the idea had been suggested earlier by others. In Theorie Analytique de la Chaleur, Fourier clarifies his own use of the term by stating that a “function f(x) represents a succession of values or ordinates, each of which is arbitrary...We do not suppose these ordinates to be subject to a common law; they succeed each other in any manner whatever, and each of them is given as if it were a single quantity.” In the end, we will need to make a few assumptions about the nature of our functions, but the requirements we will need are quite mild, especially when compared with restrictions such as “infinitely differentiable,” which are necessary, but not sufficient, for the existence of a Taylor series representation.

Types of Convergence

This brings us to a discussion of the word “expressed.” The assumptions we must ultimately place on our function depend on the kind of convergence we aim to demonstrate. How are we to understand the equal sign in equation (6)? Our usual course of action with infinite series is first to define the partial sum

\[ S_N(x) = a_0 + \sum_{n=1}^{N} a_n \cos(nx) + b_n \sin(nx). \tag{7} \]

To “express f(x) as a trigonometric series” then means finding coefficients $(a_n)_{n=0}^{\infty}$ and $(b_n)_{n=1}^{\infty}$ so that

\[ f(x) = \lim_{N \to \infty} S_N(x). \tag{8} \]

The question remains as to what kind of limit this is. Fourier probably imagined something akin to a pointwise limit because the concept of uniform convergence had not yet been formulated. In addition to pointwise convergence and uniform convergence, there are still other ways to interpret the limit in equation (8). Although it won’t be discussed here, it turns out that proving

\[ \int_{-\pi}^{\pi} |S_N(x) - f(x)|^2 \, dx \to 0 \]

is a natural way to understand equation (8) for a particular class of functions. This is referred to as $L^2$ convergence. An alternate type of convergence that we will discuss, called Cesaro mean convergence, relies on demonstrating that the averages of the partial sums converge, in our case uniformly, to f(x).

Fourier Coefficients

In the discussion that follows, we are going to need a few calculus facts.

Exercise 8.3.2. Using trigonometric identities when necessary, verify the following integrals.
(a) For all n ∈ N,

\[ \int_{-\pi}^{\pi} \cos(nx)\,dx = 0 \quad \text{and} \quad \int_{-\pi}^{\pi} \sin(nx)\,dx = 0. \]

(b) For all n ∈ N,

\[ \int_{-\pi}^{\pi} \cos^2(nx)\,dx = \pi \quad \text{and} \quad \int_{-\pi}^{\pi} \sin^2(nx)\,dx = \pi. \]

(c) For all m, n ∈ N,

\[ \int_{-\pi}^{\pi} \cos(mx)\sin(nx)\,dx = 0. \]

For m ≠ n,

\[ \int_{-\pi}^{\pi} \cos(mx)\cos(nx)\,dx = 0 \quad \text{and} \quad \int_{-\pi}^{\pi} \sin(mx)\sin(nx)\,dx = 0. \]

The consequences of these results are much more interesting than their proofs. The intuition from inner-product spaces is useful. Interpreting the integral as a kind of dot product, this exercise can be summarized by saying that the functions {1, cos(x), sin(x), cos(2x), sin(2x), cos(3x), . . . } are all orthogonal to each other. The content of what follows is that they in fact form a basis for a large class of functions.

The first order of business is to deduce some reasonable candidates for the coefficients $(a_n)$ and $(b_n)$ in equation (6). Given a function f(x), the trick is to assume we are in possession of a representation described in (6) and then manipulate this equation in a way that leads to formulas for $(a_n)$ and $(b_n)$. This is exactly how we proceeded with Taylor series expansions in Section 6.6. Taylor’s formula for the coefficients was produced by repeatedly differentiating each side of the desired representation equation. Here, we integrate. To compute $a_0$, integrate each side of equation (6) from −π to π, brazenly take the integral inside the infinite sum, and use Exercise 8.3.2 to get

\[ \begin{aligned} \int_{-\pi}^{\pi} f(x)\,dx &= \int_{-\pi}^{\pi} \left[ a_0 + \sum_{n=1}^{\infty} a_n \cos(nx) + b_n \sin(nx) \right] dx \\ &= \int_{-\pi}^{\pi} a_0 \,dx + \sum_{n=1}^{\infty} \int_{-\pi}^{\pi} \left[ a_n \cos(nx) + b_n \sin(nx) \right] dx \\ &= a_0 (2\pi) + \sum_{n=1}^{\infty} \left( a_n \cdot 0 + b_n \cdot 0 \right) = a_0 (2\pi). \end{aligned} \]

Thus,

\[ a_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\,dx. \tag{9} \]

The switching of the sum and the integral sign in the second step of the previous calculation should rightly raise some eyebrows, but keep in mind that we are really working backward from a hypothetical representation for f(x) to get a proposal for what $a_0$ should be. The point is not to justify the derivation of the formula but rather to show that using this value for $a_0$ ultimately gives us the representation we want. That hard work lies ahead.

Now, consider a fixed m ≥ 1. To compute $a_m$, we first multiply each side of equation (6) by cos(mx) and again integrate over the interval [−π, π].

Exercise 8.3.3. Derive the formulas

\[ a_m = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos(mx)\,dx \quad \text{and} \quad b_m = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin(mx)\,dx \tag{10} \]

for all m ≥ 1.
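Formulas (9) and (10) translate directly into a small numerical routine. The sketch below is one possible way to approximate the coefficients, assuming a simple midpoint rule for the integrals; the function names, the step count, and the choice of $f(x) = x^2$ as a test function are illustrative assumptions, not part of the text.

```python
import math

def integrate(g, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

def fourier_coefficients(f, N):
    """Approximate a_0..a_N and b_1..b_N from formulas (9) and (10).

    Returns two lists indexed by n; b[0] is a placeholder and unused.
    """
    a = [integrate(f, -math.pi, math.pi) / (2.0 * math.pi)]
    b = [0.0]
    for m in range(1, N + 1):
        a.append(integrate(lambda x: f(x) * math.cos(m * x),
                           -math.pi, math.pi) / math.pi)
        b.append(integrate(lambda x: f(x) * math.sin(m * x),
                           -math.pi, math.pi) / math.pi)
    return a, b

a, b = fourier_coefficients(lambda x: x * x, 3)
print("a:", a)
print("b:", b)
```

For this even test function the sine coefficients should come out numerically zero, and the cosine coefficients can be compared against a hand computation using integration by parts.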


Let’s take a short break and empirically test our recipes for $(a_m)$ and $(b_m)$ on a few simple functions.

Example 8.3.1. Let

\[ f(x) = \begin{cases} 1 & \text{if } 0 < x < \pi \\ 0 & \text{if } x = 0 \text{ or } x = \pi \\ -1 & \text{if } -\pi < x < 0. \end{cases} \]

The fact that f is an odd function (i.e., f(−x) = −f(x)) means we can avoid doing any integrals for the moment and just appeal to a symmetry argument to conclude

\[ a_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\,dx = 0 \quad \text{and} \quad a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos(nx)\,dx = 0 \]

for all n ≥ 1. We can also simplify the integral for $b_n$ by writing

\[ \begin{aligned} b_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin(nx)\,dx &= \frac{2}{\pi} \int_{0}^{\pi} \sin(nx)\,dx \\ &= \frac{2}{\pi} \left( \frac{-1}{n} \cos(nx) \Big|_{0}^{\pi} \right) \\ &= \begin{cases} 4/(n\pi) & \text{if } n \text{ is odd} \\ 0 & \text{if } n \text{ is even.} \end{cases} \end{aligned} \]

Proceeding on blind faith, we plug these results into equation (6) to get the representation

\[ f(x) = \frac{4}{\pi} \sum_{n=0}^{\infty} \frac{1}{2n+1} \sin((2n+1)x). \]

A graph of a few of the partial sums of this series (Fig. 8.2) should generate some optimism about the legitimacy of what is happening.

Exercise 8.3.4. (a) Referring to the previous example, explain why we can be sure that the convergence of the partial sums to f(x) is not uniform on any interval containing 0.
(b) Repeat the computations of Example 8.3.1 for the function g(x) = |x| and examine graphs for some partial sums. This time, make use of the fact that g is even (g(x) = g(−x)) to simplify the calculations. By just looking at the coefficients, how do we know this series converges uniformly to something?
(c) Use graphs to collect some empirical evidence regarding the question of term-by-term differentiation in our two examples to this point. Is it possible to conclude convergence or divergence of either differentiated series by looking at the resulting coefficients? In Chapter 6, we have a theorem about the legitimacy of term-by-term differentiation. Can it be applied to either of these examples?

Figure 8.2: f, $S_4$, and $S_{20}$ on [−π, π].
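The partial sums plotted in Figure 8.2 can also be tabulated. The following Python sketch evaluates $S_N$ for Example 8.3.1 using the coefficients computed there ($b_n = 4/(n\pi)$ for odd n, zero otherwise); the function names and the sample points are assumptions made for illustration.

```python
import math

def square_wave(x):
    """The function f of Example 8.3.1 on (-pi, pi]."""
    if 0 < x < math.pi:
        return 1.0
    if -math.pi < x < 0:
        return -1.0
    return 0.0

def partial_sum(x, N):
    """S_N(x) for Example 8.3.1: only odd n contribute, with b_n = 4/(n*pi)."""
    return sum(4.0 / (n * math.pi) * math.sin(n * x)
               for n in range(1, N + 1, 2))

for N in (4, 20, 200):
    # Compare a point in the middle of (0, pi) with a point close to the jump at 0.
    print(N, partial_sum(math.pi / 2, N), partial_sum(0.01, N))
```

Comparing the two columns for growing N gives some empirical material for Exercise 8.3.4(a), although a table of values is evidence rather than proof.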

The Riemann–Lebesgue Lemma

In the examples we have seen to this point, the sequences of Fourier coefficients $(a_n)$ and $(b_n)$ all tend to 0 as n → ∞. This is always the case. Understanding why this happens is crucial to our upcoming convergence proof. We start with a simple observation. The reason

\[ \int_{-\pi}^{\pi} \sin(x)\,dx = 0 \]

is that the positive and negative portions of the sine curve cancel each other out. The same is true of

\[ \int_{-\pi}^{\pi} \sin(nx)\,dx = 0. \]

Now, when n is large, the period of the oscillations of sin(nx) becomes very short: 2π/n, to be precise. If h(x) is a continuous function, then the values of h do not vary too much as sin(nx) ranges over each short period. The result is that the successive positive and negative oscillations of the product h(x) sin(nx) (Fig. 8.3) are nearly the same size so that the cancellation leads to a small value for

\[ \int_{-\pi}^{\pi} h(x) \sin(nx)\,dx. \]

Theorem 8.3.2 (Riemann–Lebesgue Lemma). Assume h(x) is continuous on (−π, π]. Then,

\[ \int_{-\pi}^{\pi} h(x) \sin(nx)\,dx \to 0 \quad \text{and} \quad \int_{-\pi}^{\pi} h(x) \cos(nx)\,dx \to 0 \]

as n → ∞.


Figure 8.3: h(x) and h(x) sin(nx) for large n.

Proof. Remember that, like all of our functions from here on, we are mentally extending h to be 2π-periodic. Thus, while our attention is generally focused on the interval (−π, π], the assumption of continuity is intended to mean that the periodically extended h is continuous on all of R. Note that in addition to continuity on (−π, π], this amounts to insisting that $\lim_{x \to -\pi^+} h(x) = h(\pi)$.

Exercise 8.3.5. Explain why h is uniformly continuous on R.

Given ε > 0, choose δ > 0 such that |x − y| < δ implies |h(x) − h(y)| < ε/2. The period of sin(nx) is 2π/n, so choose N large enough so that 2π/n < δ whenever n ≥ N. Now, consider a particular interval [a, b] of length 2π/n over which sin(nx) moves through one complete oscillation.

Exercise 8.3.6. Show that $\left| \int_a^b h(x) \sin(nx)\,dx \right| < \epsilon/n$, and use this fact to complete the proof.

Applications of Fourier series are not restricted to continuous functions (Example 8.3.1). Even though our particular proof makes use of continuity, the Riemann–Lebesgue lemma holds under much weaker hypotheses. It is true, however, that any proof of this fact ultimately takes advantage of the cancellation of positive and negative components. Recall from Chapter 2 that this type of cancellation is the mechanism that distinguishes conditional convergence from absolute convergence. In the end, what we discover is that, unlike power series, Fourier series can converge conditionally. This makes them less robust, perhaps, but more versatile and capable of more interesting behavior.
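The decay described by the Riemann–Lebesgue lemma is easy to observe numerically. The sketch below uses a hypothetical test function h(x) = |x|(1 + sin x), chosen only because its periodic extension is continuous and it is neither even nor odd; the quadrature routine, step count, and function names are likewise assumptions for illustration.

```python
import math

def integrate(g, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

def h_fn(x):
    """Hypothetical continuous test function with h(-pi) = h(pi)."""
    return abs(x) * (1.0 + math.sin(x))

def sine_integral(n):
    """Approximate the integral of h(x) sin(nx) over [-pi, pi]."""
    return integrate(lambda x: h_fn(x) * math.sin(n * x), -math.pi, math.pi)

def cosine_integral(n):
    """Approximate the integral of h(x) cos(nx) over [-pi, pi]."""
    return integrate(lambda x: h_fn(x) * math.cos(n * x), -math.pi, math.pi)

for n in (1, 2, 10, 50):
    print(n, sine_integral(n), cosine_integral(n))
```

The printed values should shrink in magnitude as n grows. The midpoint rule itself becomes less reliable for very large n unless the step count is increased, so this is a qualitative check only.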


A Pointwise Convergence Proof

Let’s return once more to Fourier’s claim that every “function” can be “expressed” as a trigonometric series. Our recipe for the Fourier coefficients in equations (9) and (10) implicitly requires that our function be integrable. This is the major motivation for Riemann’s modification of Cauchy’s definition of the integral. Because integrability is a prerequisite for producing a Fourier series, we would like the class of integrable functions to be as large as possible. The natural question to ask now is whether Riemann integrability is enough or whether we need to make some additional assumptions about f in order to guarantee that the Fourier series converges back to f. The answer depends on the type of convergence we hope to establish.

\[ f(x) = a_0 + \sum_{n=1}^{\infty} a_n \cos(nx) + b_n \sin(nx) \]

[Diagram: hypotheses on f (bounded integrable, continuous, differentiable, f′ continuous) set against types of convergence (pointwise convergence, Cesaro mean convergence, uniform convergence, $L^2$ convergence).]

There is no tidy way to summarize the situation. For pointwise convergence, integrability is not enough. At present, “integrable” for us means Riemann-integrable, which we have only rigorously defined for bounded functions. In 1966, Lennart Carleson proved (via an extremely complicated argument) that the Fourier series for such a function converges pointwise at every point in the domain excluding possibly a set of measure zero. This term surfaced in our discussion of the Cantor set (Section 3.1) and is defined rigorously in Section 7.6. Sets of measure zero are small in one sense, but they can be uncountable, and there are examples of continuous functions with Fourier series that diverge at uncountably many points. Lebesgue’s modification of Riemann’s integral in 1901 proved to be a much more natural setting for Fourier analysis. Carleson’s proof is really about Lebesgue-integrable functions which are allowed to be unbounded but for which $\int_{-\pi}^{\pi} |f|^2$ is finite. One of the cleanest theorems in this area states that, for this class of square Lebesgue-integrable functions, the Fourier series always converges to the function from which it was derived if we interpret convergence in the $L^2$ sense described earlier. As a final warning about how fragile the situation is, there is an example due to A. Kolmogorov (1903–1987) of a Lebesgue-integrable function where the Fourier series fails to converge at any point. Although all of these results require significantly more background to pursue in any rigorous way, we are in a position to prove some important theorems that


require a few extra assumptions about the function in question. We will content ourselves with two interesting results in this area.

Theorem 8.3.3. Let f(x) be continuous on (−π, π], and let $S_N(x)$ be the Nth partial sum of the Fourier series described in equation (7), where the coefficients $(a_n)$ and $(b_n)$ are given by equations (9) and (10). It follows that

\[ \lim_{N \to \infty} S_N(x) = f(x) \]

pointwise at any x ∈ (−π, π] where f′(x) exists.

Proof. Cataloging a few preliminary facts makes for a smoother argument.

Fact 1: (a) cos(α − θ) = cos(α) cos(θ) + sin(α) sin(θ).
(b) sin(α + θ) = sin(α) cos(θ) + cos(α) sin(θ).

Fact 2:

\[ \frac{1}{2} + \cos(\theta) + \cos(2\theta) + \cos(3\theta) + \cdots + \cos(N\theta) = \frac{\sin((N + 1/2)\theta)}{2 \sin(\theta/2)} \quad \text{for any } \theta \neq 2n\pi. \]

Facts 1(a) and 1(b) are familiar trigonometric identities. Fact 2 is not as familiar. Its proof (which we omit) is most easily derived by taking the real part of a geometric sum of complex exponentials. The function in Fact 2 is called the Dirichlet kernel in honor of the mathematician responsible for the first rigorous convergence proof of this kind. Integrating both sides of this identity leads to our next important fact.

Fact 3: Setting

\[ D_N(\theta) = \begin{cases} \dfrac{\sin((N + 1/2)\theta)}{2 \sin(\theta/2)}, & \text{if } \theta \neq 2n\pi \\ 1/2 + N, & \text{if } \theta = 2n\pi \end{cases} \]

from Fact 2, we see that

\[ \int_{-\pi}^{\pi} D_N(\theta)\,d\theta = \pi. \]
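Facts 2 and 3 can be spot-checked numerically. The sketch below compares the cosine sum with the closed form of the Dirichlet kernel at one angle and approximates the integral in Fact 3 with a midpoint rule; the function names, the tolerance used to detect multiples of 2π, and the sample values are illustrative assumptions.

```python
import math

def dirichlet_sum(N, theta):
    """Left side of Fact 2: 1/2 + cos(theta) + ... + cos(N*theta)."""
    return 0.5 + sum(math.cos(n * theta) for n in range(1, N + 1))

def dirichlet_closed(N, theta):
    """D_N(theta) as defined in Fact 3."""
    s = math.sin(theta / 2.0)
    if abs(s) < 1e-12:          # theta is (numerically) a multiple of 2*pi
        return 0.5 + N
    return math.sin((N + 0.5) * theta) / (2.0 * s)

def integrate(g, a, b, steps=2000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

N, theta = 6, 0.9
print(dirichlet_sum(N, theta), dirichlet_closed(N, theta))
print(integrate(lambda t: dirichlet_sum(N, t), -math.pi, math.pi), math.pi)
```

Each printed pair should agree to many decimal places. Agreement at sample values is a consistency check only; the identities themselves follow from the geometric-sum argument mentioned above.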

Although we will not restate it, the last fact we will use is the Riemann–Lebesgue lemma.

Fix a point x ∈ (−π, π]. The first step is to simplify the expression for $S_N(x)$. Now x is a fixed constant at the moment, so we will write the integrals in equations (9) and (10) using t as the variable of integration. Keeping an eye on Facts 1(a) and 2, we get that

\[ \begin{aligned} S_N(x) &= a_0 + \sum_{n=1}^{N} a_n \cos(nx) + b_n \sin(nx) \\ &= \left[ \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t)\,dt \right] + \sum_{n=1}^{N} \left[ \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \cos(nt)\,dt \right] \cos(nx) + \sum_{n=1}^{N} \left[ \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \sin(nt)\,dt \right] \sin(nx) \\ &= \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \left[ \frac{1}{2} + \sum_{n=1}^{N} \cos(nt) \cos(nx) + \sin(nt) \sin(nx) \right] dt \\ &= \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \left[ \frac{1}{2} + \sum_{n=1}^{N} \cos(nt - nx) \right] dt \\ &= \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) D_N(t - x)\,dt. \end{aligned} \]

As one final simplification, let u = t − x. Then,

\[ S_N(x) = \frac{1}{\pi} \int_{-\pi - x}^{\pi - x} f(u + x) D_N(u)\,du = \frac{1}{\pi} \int_{-\pi}^{\pi} f(u + x) D_N(u)\,du. \]

The last equality is a result of our agreement to extend f to be 2π-periodic. Because $D_N$ is also periodic (it is the sum of cosine functions), it does not matter over what interval we compute the integral as long as we cover exactly one full period.

To prove $S_N(x) \to f(x)$, we must show that $|S_N(x) - f(x)|$ gets arbitrarily small when N gets large. Having expressed $S_N(x)$ as an integral involving $D_N(u)$, we are motivated to do a similar thing for f(x). By Fact 3,

\[ f(x) = f(x) \, \frac{1}{\pi} \int_{-\pi}^{\pi} D_N(u)\,du = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) D_N(u)\,du, \]

and it follows that

\[ S_N(x) - f(x) = \frac{1}{\pi} \int_{-\pi}^{\pi} (f(u + x) - f(x)) D_N(u)\,du. \tag{11} \]

Our goal is to show this quantity tends to zero as N → ∞. A sketch of $D_N(u)$ (Fig. 8.4) for a few values of N reveals why this might happen. For large N, the Dirichlet kernel $D_N(u)$ has a tall, thin spike around u = 0, but this is precisely where f(u + x) − f(x) is small (because f is continuous). Away from zero, $D_N(u)$ exhibits the fast oscillations that hearken back to the Riemann–Lebesgue lemma (Theorem 8.3.2). Let’s see how to use this theorem to finish the argument.

Figure 8.4: D6 (u) and D16 (u).

Using Fact 1(b), we can rewrite the Dirichlet kernel as

\[ D_N(u) = \frac{\sin((N + 1/2)u)}{2 \sin(u/2)} = \frac{1}{2} \left[ \frac{\sin(Nu) \cos(u/2)}{\sin(u/2)} + \cos(Nu) \right]. \]

Then, equation (11) becomes

\[ \begin{aligned} S_N(x) - f(x) &= \frac{1}{2\pi} \int_{-\pi}^{\pi} (f(u + x) - f(x)) \left[ \frac{\sin(Nu) \cos(u/2)}{\sin(u/2)} + \cos(Nu) \right] du \\ &= \frac{1}{2\pi} \int_{-\pi}^{\pi} (f(u + x) - f(x)) \frac{\sin(Nu) \cos(u/2)}{\sin(u/2)} + (f(u + x) - f(x)) \cos(Nu)\,du \\ &= \frac{1}{2\pi} \int_{-\pi}^{\pi} p_x(u) \sin(Nu)\,du + \frac{1}{2\pi} \int_{-\pi}^{\pi} q_x(u) \cos(Nu)\,du, \end{aligned} \]

where in the last step we have set

\[ p_x(u) = \frac{(f(u + x) - f(x)) \cos(u/2)}{\sin(u/2)} \quad \text{and} \quad q_x(u) = f(u + x) - f(x). \]

Exercise 8.3.7. (a) First, argue why the integral involving $q_x(u)$ tends to zero as N → ∞.
(b) The first integral is a little more subtle because the function $p_x(u)$ has the sin(u/2) term in the denominator. Use the fact that f is differentiable at x (and a familiar limit from calculus) to prove that the first integral goes to zero as well.

This completes the argument that $S_N(x) \to f(x)$ at any point x where f is differentiable. If the derivative exists everywhere, then we obviously get $S_N \to f$ pointwise. If we add the assumption that f′ is continuous, then it is not too difficult to show that the convergence is uniform. In fact, there is a very strong relationship between the speed of convergence of the Fourier series and the smoothness of f. The more derivatives f possesses, the faster the partial sums $S_N$ converge to f.

Cesaro Mean Convergence

Rather than pursue the proofs in this interesting direction, we will finish this very brief introduction to Fourier series with a look at a different type of convergence called Cesaro mean convergence.

Exercise 8.3.8. Prove that if a sequence of real numbers $(x_n)$ converges, then the arithmetic means

\[ y_n = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} \]

also converge to the same limit. Give an example to show that it is possible for the sequence of means $(y_n)$ to converge even if the original sequence $(x_n)$ does not.

The discussion preceding Theorem 8.3.3 is intended to create a kind of reverence for the difficulties inherent in deciphering the behavior of Fourier series, especially in the case where the function in question is not differentiable. It is from this humble frame of mind that the following elegant result due to L. Fejér in 1904 can best be appreciated.

Theorem 8.3.4 (Fejér’s Theorem). Let $S_N(x)$ be the Nth partial sum of the Fourier series for a function f on (−π, π]. Define

\[ \sigma_N(x) = \frac{1}{N + 1} \sum_{n=0}^{N} S_n(x). \]

If f is continuous on (−π, π], then $\sigma_N(x) \to f(x)$ uniformly.

Proof. This argument is patterned after the proof of Theorem 8.3.3 but is actually much simpler. In addition to the trigonometric formulas listed in Facts 1 and 2, we are going to need a version of Fact 2 for the sine function, which looks like

\[ \sin(\theta) + \sin(2\theta) + \sin(3\theta) + \cdots + \sin(N\theta) = \frac{\sin\left(\frac{N\theta}{2}\right) \sin\left((N + 1)\frac{\theta}{2}\right)}{\sin\left(\frac{\theta}{2}\right)}. \]

Exercise 8.3.9. Use the previous identity to show that

\[ \frac{1/2 + D_1(\theta) + D_2(\theta) + \cdots + D_N(\theta)}{N + 1} = \frac{1}{2(N + 1)} \left[ \frac{\sin\left((N + 1)\frac{\theta}{2}\right)}{\sin\left(\frac{\theta}{2}\right)} \right]^2. \]

The expression in Exercise 8.3.9 is called the Fejér kernel and will be denoted by $F_N(\theta)$. Analogous to the Dirichlet kernel $D_N(\theta)$ from the proof of Theorem 8.3.3, $F_N$ is used to greatly simplify the formula for $\sigma_N(x)$.

Exercise 8.3.10. (a) Show that

\[ \sigma_N(x) = \frac{1}{\pi} \int_{-\pi}^{\pi} f(u + x) F_N(u)\,du. \]

(b) Graph the function $F_N(u)$ for several values of N. Where is $F_N$ large, and where is it close to zero? Compare this function to the Dirichlet kernel $D_N(u)$. Now, prove that $F_N \to 0$ uniformly on any set of the form {u : |u| ≥ δ}, where δ > 0 is fixed (and u is restricted to the interval (−π, π]).
(c) Prove that $\int_{-\pi}^{\pi} F_N(u)\,du = \pi$.
(d) To finish the proof of Fejér’s theorem, first choose a δ > 0 so that

\[ |u| < \delta \quad \text{implies} \quad |f(x + u) - f(x)| < \epsilon. \]

Set up a single integral that represents the difference $\sigma_N(x) - f(x)$ and divide this integral into sets where |u| ≤ δ and |u| ≥ δ. Explain why it is possible to make each of these integrals sufficiently small, independently of the choice of x.
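For the graphing part of Exercise 8.3.10(b), it can help to have the Fejér kernel available in both of its forms. The sketch below computes the average of Dirichlet kernels directly and via the closed form from Exercise 8.3.9, under the convention $D_0 = 1/2$; the function names, the tolerance near multiples of 2π, and the sample values are assumptions for illustration.

```python
import math

def dirichlet(n, theta):
    """D_n(theta) written as the cosine sum; D_0 is the constant 1/2."""
    return 0.5 + sum(math.cos(k * theta) for k in range(1, n + 1))

def fejer_average(N, theta):
    """Average of D_0, D_1, ..., D_N at theta."""
    return sum(dirichlet(n, theta) for n in range(N + 1)) / (N + 1)

def fejer_closed(N, theta):
    """Closed form from Exercise 8.3.9, with the limiting value at theta = 2n*pi."""
    s = math.sin(theta / 2.0)
    if abs(s) < 1e-12:
        return (N + 1) / 2.0
    return (math.sin((N + 1) * theta / 2.0) / s) ** 2 / (2.0 * (N + 1))

for N in (4, 16, 64):
    # Value at the peak u = 0 and at a fixed distance from it.
    print(N, fejer_closed(N, 0.0), fejer_closed(N, 1.0))

print(fejer_average(8, 1.1), fejer_closed(8, 1.1))
```

One design point worth noticing in the closed form is that it is a square, so the values are never negative, in contrast with the oscillating sign of the Dirichlet kernel.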

Weierstrass Approximation Theorem

The hard work of proving Fejér’s theorem has many rewards, one of which is access to a relatively short argument for a profoundly important theorem discovered by Weierstrass in 1885.

Theorem 8.3.5 (Weierstrass Approximation Theorem). If f is a continuous function on a closed interval [a, b], then there exists a sequence of polynomials that converges uniformly to f(x) on [a, b].

Proof. We have actually seen a special case of this result once before in Section 6.6 on Taylor series. Even if this material has not been covered, the formula

\[ \sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!} - \cdots \tag{12} \]

is probably familiar from a course in calculus. The content of Section 6.6 is that, by using Lagrange’s Remainder Theorem, we can prove that the Taylor series in (12) converges uniformly to sin(x) on any bounded subset of R. Uniform convergence of a series means the partial sums converge uniformly, and the partial sums in this case are polynomials. Notice that this is precisely what Theorem 8.3.5 asks us to prove, only we must do it for an arbitrary, continuous function in place of sin(x).

Using Taylor series will not work in general. The major problem is that to construct a Taylor series we need the function to be infinitely differentiable, and even in this case we might get a series that either does not converge or converges to the wrong thing. We do plan to use Taylor series, however. The important point about them for this conversation is that the Taylor series for sin(x) and cos(x) do converge uniformly to the proper limit on any bounded set.

Exercise 8.3.11. (a) Make use of the previous comments and use Fejér’s Theorem to complete the proof of Theorem 8.3.5 under the added assumption that the interval [a, b] is [0, π].
(b) Show how the case for an arbitrary interval [a, b] follows from this one.

It is interesting to juxtapose this result of Weierstrass with his demonstration of a continuous nowhere-differentiable function. Although there exist continuous functions that oscillate so wildly that they fail to have a derivative at any point, these unruly functions are always uniformly within ε of an infinitely differentiable polynomial.
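The special case used in the proof, uniform convergence of the Taylor polynomials of sin(x) on a bounded set, can be observed numerically. The sketch below measures the largest sampled gap between sin(x) and the partial sums of series (12) on [−π, π]; the function names, the interval, and the sample count are illustrative assumptions, and a maximum over sample points only estimates the true supremum.

```python
import math

def sin_taylor(x, terms):
    """Partial sum of series (12) with the given number of nonzero terms."""
    total, term = 0.0, x
    for k in range(terms):
        total += term
        # Next term: multiply by -x^2 / ((2k+2)(2k+3)).
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

def sup_error(terms, a=-math.pi, b=math.pi, samples=2001):
    """Largest sampled value of |sin(x) - Taylor polynomial| on [a, b]."""
    xs = (a + (b - a) * i / (samples - 1) for i in range(samples))
    return max(abs(math.sin(x) - sin_taylor(x, terms)) for x in xs)

for terms in (2, 5, 10):
    print(terms, sup_error(terms))
```

The sampled error should drop quickly as more terms are included, which is the behavior that Lagrange's Remainder Theorem guarantees on bounded sets.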

Approximation as a Unifying Theme Viewing the last section of this chapter as a kind of appendix (included to clear up some loose ends from Chapter 1 regarding the deﬁnition of the real numbers), Weierstrass’ Approximation Theorem makes for a ﬁtting close to our introductory survey of some of the gems of analysis. The idea of approximation permeates the entire subject. Every real number can be approximated with rational ones. The value of an inﬁnite sum is approximated with partial sums, and the value of a continuous function can be approximated with its values nearby. A function is diﬀerentiable when a straight line is a good approximation to the curve, and it is integrable when ﬁnite sums of rectangles are a good approximation to the area under the curve. Now, we learn that every continuous function can be approximated arbitrarily well with a polynomial. In every case, the approximating objects are tangible and well-understood, and the issue is how well these properties survive the limiting process. By viewing the diﬀerent inﬁnities of mathematics through pathways crafted out of ﬁnite objects, Weierstrass and the other founders of analysis created a paradigm for how to extend the scope of mathematical exploration deep into territory previously unattainable. Although our journey ends here, the road is long and continues to be written.

8.4

A Construction of R From Q

This entire section is devoted to constructing a proof for the following theorem: Theorem 8.4.1. There exists an ordered ﬁeld in which every nonempty set that is bounded above has a least upper bound. In addition, this ﬁeld contains Q as a subﬁeld.


There are a few terms to deﬁne before this statement can be properly understood and proved, but it can essentially be paraphrased as “the real numbers exist.” In Section 1.1, we encountered a major failing of the rational number system as a place to do analysis. Without the square root of 2 (and uncountably many other irrational numbers) we cannot conﬁdently move from a Cauchy sequence to its limit because in Q there is no guarantee that such a number exists. (A review of Sections 1.1 and 1.3 is highly recommended at this point.) The resolution we proposed in Chapter 1 came in the form of the Axiom of Completeness, which we restate. Axiom of Completeness. Every nonempty set of real numbers that is bounded above has a least upper bound. Now let’s be clear about how we actually proceeded in Chapter 1. This is the property that distinguishes Q from R, but by referring to this property as an axiom we were making the point that it was not something to be proved. The real numbers were deﬁned simply as an extension of the rational numbers in which bounded sets have least upper bounds, but no attempt was made to demonstrate that such an extension is actually possible. Now, the time has ﬁnally come. By explicitly building the real numbers from the rational ones, we will be able to demonstrate that the Axiom of Completeness does not need to be an axiom at all; it is a theorem! There is something ironic about having the ﬁnal section of this book be a construction of the number system that has been the underlying subject of every preceding page, but there is something perfectly apt about it as well. Through eight chapters stretching from Cantor’s Theorem to the Baire Category Theorem, we have come to see how profoundly the addition of completeness changes the landscape. We all grow up believing in the existence of real numbers, but it is only through a study of classical analysis that we become aware of their elusive and enigmatic nature. 
It is because completeness matters so much, and because it is responsible for such perplexing phenomena, that we should now feel obliged, compelled really, to go back to the beginning and verify that such a thing really exists. As we mentioned in Chapter 1, proceeding in this order puts us in good historical company. The pioneering work of Cauchy, Bolzano, Abel, Dirichlet, Weierstrass, and Riemann preceded, and in a very real sense led to, the host of rigorous definitions for R that were proposed in 1872. Georg Cantor is a familiar name responsible for one of these definitions, but alternate constructions of the real number system also came from Charles Méray (1835–1911), Heinrich Heine (1821–1881), and Richard Dedekind (1831–1916). The formulation that follows is the one due to Dedekind. In a sense it is the most abstract of the approaches, but it is the most appropriate for us because the verification of completeness is done in terms of least upper bounds.

8.4. A Construction of R From Q


Dedekind Cuts

We begin this discussion by assuming that the rational numbers and all of the familiar properties of addition, multiplication, and order are available to us. At the moment, there is no such thing as a real number.

Definition 8.4.2. A subset A of the rational numbers is called a cut if it possesses the following three properties:
(c1) A ≠ ∅ and A ≠ Q.
(c2) If r ∈ A, then A also contains every rational q < r.
(c3) A does not have a maximum; that is, if r ∈ A, then there exists s ∈ A with r < s.

Exercise 8.4.1. (a) Fix r ∈ Q. Show that the set C_r = {t ∈ Q : t < r} is a cut.
The temptation to think of all cuts as being of this form should be avoided. Which of the following subsets of Q are cuts?
(b) S = {t ∈ Q : t ≤ 2}
(c) T = {t ∈ Q : t² < 2 or t < 0}
(d) U = {t ∈ Q : t² ≤ 2 or t < 0}

Exercise 8.4.2. Let A be a cut. Show that if r ∈ A and s ∉ A, then r < s.

To dispel any suspense, let’s get right to the point.

Definition 8.4.3. Define the real numbers R to be the set of all cuts in Q.

This may feel awkward at first: real numbers should be numbers, not sets of rational numbers. The counterargument here is that when working on the foundations of mathematics, sets are about the most basic building blocks we have. We have defined a set R whose elements are subsets of Q. We now must set about the task of imposing some algebraic structure on R that behaves in a way familiar to us. What exactly does this entail? If we are serious about constructing a proof for Theorem 8.4.1, we need to be more specific about what we mean by an “ordered field.”
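As an informal aside, the properties in Definition 8.4.2 can be spot-checked on a computer. The Python sketch below is only an illustration and not part of the construction: the function names are our own, and a finite sample can never prove (c2) or (c3). It models the set T from Exercise 8.4.1 (c) as a membership test on exact rationals and, for a given r ∈ T, produces an explicit s ∈ T with r < s.

```python
from fractions import Fraction

def in_T(t):
    """Membership test for T = {t in Q : t^2 < 2 or t < 0}."""
    return t < 0 or t * t < 2

def larger_element(r):
    """Given r in T, return some s in T with r < s, as (c3) requires."""
    if not in_T(r):
        raise ValueError("r must belong to T")
    if r < 1:
        return Fraction(1)  # 1 is in T and exceeds r
    # For r >= 1 with r^2 < 2, this s satisfies
    #   s - r   = (2 - r^2) / (r + 2)      > 0  and
    #   s^2 - 2 = 2 (r^2 - 2) / (r + 2)^2  < 0.
    return (2 * r + 2) / (r + 2)

def looks_downward_closed(member, samples):
    """Spot-check (c2): whenever r is a member, so is every smaller sample."""
    return all(member(q)
               for r in samples if member(r)
               for q in samples if q < r)

samples = [Fraction(n, 10) for n in range(-30, 31)]
r = Fraction(7, 5)
s = larger_element(r)
print(r, "<", s, "and s in T:", in_T(s))
print("(c2) holds on the sample:", looks_downward_closed(in_T, samples))
```

The formula for s is one convenient choice among many; any rule that lands strictly between r and the "gap" at the top of T would serve equally well.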

Field and Order Properties

Given a set F and two elements x, y ∈ F, an operation on F is a function that takes the ordered pair (x, y) to a third element z ∈ F. Writing x + y or xy to represent different operations reminds us of the two operations that we are trying to emulate.

Definition 8.4.4. A set F is a field if there exist two operations, addition (x + y) and multiplication (xy), that satisfy the following list of conditions:
(f1) (commutativity) x + y = y + x and xy = yx for all x, y ∈ F.


(f2) (associativity) (x + y) + z = x + (y + z) and (xy)z = x(yz) for all x, y, z ∈ F.
(f3) (identities exist) There exist two special elements 0 and 1 with 0 ≠ 1 such that x + 0 = x and x1 = x for all x ∈ F.
(f4) (inverses exist) Given x ∈ F, there exists an element −x ∈ F such that x + (−x) = 0. If x ≠ 0, there exists an element x⁻¹ such that xx⁻¹ = 1.
(f5) (distributive property) x(y + z) = xy + xz for all x, y, z ∈ F.

Exercise 8.4.3. Using the usual definitions of addition and multiplication, determine which of these properties are possessed by N, Z, and Q, respectively.

Although we will not pursue this here in any depth, all of the familiar algebraic manipulations in Q (e.g., x + y = x + z implies y = z) can be derived from this short list of properties.

Definition 8.4.5. An ordering on a set F is a relation, represented by ≤, with the following three properties:
(o1) For arbitrary x, y ∈ F, at least one of the statements x ≤ y or y ≤ x is true.
(o2) If x ≤ y and y ≤ x, then x = y.
(o3) If x ≤ y and y ≤ z, then x ≤ z.
We will sometimes write y ≥ x in place of x ≤ y. The strict inequality x < y is used to mean x ≤ y but x ≠ y.
A field F is called an ordered field if F is endowed with an ordering ≤ that satisfies
(o4) If y ≤ z, then x + y ≤ x + z.
(o5) If x ≥ 0 and y ≥ 0, then xy ≥ 0.

Let’s take stock of where we are. To prove Theorem 8.4.1, we are accepting as given that the rational numbers are an ordered field. We have defined the real numbers R to be the collection of cuts in Q, and the challenge now is to invent addition, multiplication, and an ordering so that each possesses the properties outlined in the preceding two definitions. The easiest of these is the ordering. Let A and B be two arbitrary elements of R. Define

    A ≤ B   to mean   A ⊆ B.

Exercise 8.4.4. Show that this defines an ordering on R by verifying properties (o1), (o2), and (o3) from Definition 8.4.5.
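To get a feel for what the conditions in Definition 8.4.4 demand, it can help to test them exhaustively on a small finite system. The sketch below is our own illustration and rests on an assumption not made in the text: the integers modulo n, with the usual modular addition and multiplication, stand in for F. The helper name `field_axiom_failures` is hypothetical. It reports which of (f1) through (f5) fail.

```python
from itertools import product

def field_axiom_failures(n):
    """Return the labels of axioms (f1)-(f5) that fail for the integers mod n."""
    F = range(n)
    add = lambda x, y: (x + y) % n
    mul = lambda x, y: (x * y) % n
    one = 1 % n  # equals 0 when n == 1, so (f3) fails there
    failures = []
    if not all(add(x, y) == add(y, x) and mul(x, y) == mul(y, x)
               for x, y in product(F, repeat=2)):
        failures.append("f1")
    if not all(add(add(x, y), z) == add(x, add(y, z))
               and mul(mul(x, y), z) == mul(x, mul(y, z))
               for x, y, z in product(F, repeat=3)):
        failures.append("f2")
    # (f3): 0 and 1 must be distinct and must act as identities.
    if not (one != 0 and all(add(x, 0) == x and mul(x, one) == x for x in F)):
        failures.append("f3")
    # (f4): every x needs a negative; every nonzero x needs a reciprocal.
    has_negatives = all(any(add(x, y) == 0 for y in F) for x in F)
    has_reciprocals = all(any(mul(x, y) == one for y in F) for x in F if x != 0)
    if not (has_negatives and has_reciprocals):
        failures.append("f4")
    if not all(mul(x, add(y, z)) == add(mul(x, y), mul(x, z))
               for x, y, z in product(F, repeat=3)):
        failures.append("f5")
    return failures

print(field_axiom_failures(5))  # arithmetic mod 5
print(field_axiom_failures(6))  # arithmetic mod 6
```

Arithmetic mod 5 passes every check, while arithmetic mod 6 fails only (f4), because 2 has no reciprocal there. The same failure of (f4) is what one should expect to find for Z in Exercise 8.4.3, although an infinite set cannot be checked by brute force.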


Algebra in R

Given A and B in R, define
    A + B = {a + b : a ∈ A and b ∈ B}.

Before checking properties (f1)–(f4) for addition, we must first verify that our definition really defines an operation. Is A + B actually a cut? To get the flavor of how these arguments look, let’s verify property (c2) of Definition 8.4.2 for the set A + B. Let a + b ∈ A + B be arbitrary and let s ∈ Q satisfy s < a + b. Then, s − b < a, which implies that s − b ∈ A because A is a cut. But then s = (s − b) + b ∈ A + B, and (c2) is proved.

Exercise 8.4.5. (a) Show that (c1) and (c3) also hold for A + B. Conclude that A + B is a cut. (b) Check that addition in R is commutative (f1) and associative (f2). (c) Show that the cut O = {p ∈ Q : p < 0} successfully plays the role of the additive identity (f3). (Showing A + O = A amounts to proving that these two sets are the same. The standard way to prove such a thing is to show two inclusions: A + O ⊆ A and A ⊆ A + O.)

What about additive inverses? Given A ∈ R, we must produce a cut −A with the property that A + (−A) = O. This is a bit more difficult than it sounds. Conceptually, the cut −A consists of all rational numbers less than − sup A. The problem is how to define this set without using suprema, which are strictly off limits at the moment. (We are building the field in which they exist!) Given A ∈ R, define
    −A = {r ∈ Q : there exists t ∉ A with t < −r}.

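[Figure: a number line showing the cut A, the cut −A, the point 0, a rational r ∈ −A, and a rational t ∉ A with t < −r.]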

Exercise 8.4.6. (a) Prove that −A defines a cut. (b) What goes wrong if we set −A = {r ∈ Q : −r ∉ A}? (c) If a ∈ A and r ∈ −A, show a + r ∈ O. This shows A + (−A) ⊆ O. Now, finish the proof of property (f4) for addition in Definition 8.4.4. (d) Show that property (o4) holds.
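To see the definition of −A at work, it may help to compute it for a rational cut. The short calculation below is our own sketch rather than part of the text's argument; it takes A = C_r from Exercise 8.4.1 and uses only the order properties of Q.

```latex
\begin{align*}
-C_r &= \{\, q \in \mathbf{Q} : \text{there exists } t \notin C_r \text{ with } t < -q \,\} \\
     &= \{\, q \in \mathbf{Q} : \text{there exists } t \in \mathbf{Q} \text{ with } r \le t < -q \,\} \\
     &= \{\, q \in \mathbf{Q} : r < -q \,\} \\
     &= \{\, q \in \mathbf{Q} : q < -r \,\} = C_{-r}.
\end{align*}
```

The third equality holds in both directions: if r ≤ t < −q then certainly r < −q, and conversely t = r is an admissible witness whenever r < −q. So on rational cuts the definition agrees with the intuition that −A sits to the left of − sup A.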


Although the ideas are similar, the technical difficulties increase when we try to create a definition for multiplication in R. This is largely due to the fact that the product of two negative numbers is positive. The standard method of attack is first to define multiplication on the positive cuts. Given A ≥ O and B ≥ O in R, define the product
    AB = {ab : a ∈ A, b ∈ B with a, b ≥ 0} ∪ {q ∈ Q : q < 0}.

Exercise 8.4.7. (a) Show that AB is a cut. (b) Prove property (o5) from Definition 8.4.5. (c) Propose a good candidate for the multiplicative identity (1) on R and show that this works for all cuts A ≥ O. (d) Show that AO = O for all cuts A ≥ O.

Products involving at least one negative factor can be defined in terms of the product of two positive cuts by observing that −A ≥ O whenever A ≤ O. (Given A ≤ O, property (o4) implies A + (−A) ≤ O + (−A), which yields O ≤ −A.) For any A and B in R, define
    AB = as given above    if A ≥ O and B ≥ O,
    AB = −[A(−B)]          if A ≥ O and B < O,
    AB = −[(−A)B]          if A < O and B ≥ O,
    AB = (−A)(−B)          if A < O and B < O.

Verifying that multiplication defined in this way satisfies all the required field properties is important but uneventful. The proofs generally fall into cases for when terms are positive or negative and follow a pattern similar to those for addition. We will leave them as an unofficial exercise and move on to the punch line.

Least Upper Bounds

Having proved that R is an ordered field, we now set our sights on showing that this field is complete. We defined completeness in Chapter 1 in terms of least upper bounds. Here is a summary of the relevant definitions from that discussion.

Definition 8.4.6. A set 𝒜 ⊆ R is bounded above if there exists a B ∈ R such that A ≤ B for all A ∈ 𝒜. The number B is called an upper bound for 𝒜. A real number S ∈ R is the least upper bound for a set 𝒜 ⊆ R if it meets the following two criteria:
(i) S is an upper bound for 𝒜 and
(ii) if B is any upper bound for 𝒜, then S ≤ B.

Exercise 8.4.8. Let 𝒜 ⊆ R be nonempty and bounded above, and let S be the union of all A ∈ 𝒜. (a) First, prove that S ∈ R by showing that it is a cut. (b) Now, show that S is the least upper bound for 𝒜.


This finishes the proof that R is complete. Notice that we could have proved that least upper bounds exist immediately after defining the ordering on R, but saving it for last gives it the privileged place in the argument it deserves.

There is, however, still one loose end to sew up. The statement of Theorem 8.4.1 mentions that our complete ordered field contains Q as a subfield. This is a slight abuse of language. What it should say is that R contains a subfield that looks and acts exactly like Q.

Exercise 8.4.9. Consider the collection of so-called “rational” cuts of the form
    C_r = {t ∈ Q : t < r}
where r ∈ Q. (See Exercise 8.4.1.) (a) Show that C_r + C_s = C_{r+s} for all r, s ∈ Q. Verify C_r C_s = C_{rs} for the case when r, s ≥ 0. (b) Show that C_r ≤ C_s if and only if r ≤ s in Q.
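The less obvious inclusion in Exercise 8.4.9 (a), namely C_{r+s} ⊆ C_r + C_s, asks us to split a rational t < r + s into a piece below r and a piece below s. One possible splitting is sketched below in Python; the function name `split_witness` is our own, the inputs are assumed to be exact `Fraction` values, and the code is an illustration rather than a substitute for the proof.

```python
from fractions import Fraction

def split_witness(t, r, s):
    """Given rationals with t < r + s, return (a, b) with a < r, b < s, a + b = t."""
    if not t < r + s:
        raise ValueError("need t < r + s")
    gap = (r + s - t) / 2  # positive, so both summands move strictly to the left
    return r - gap, s - gap

r, s = Fraction(1, 2), Fraction(2, 3)
t = Fraction(11, 10)       # t < r + s = 7/6
a, b = split_witness(t, r, s)
print(a, b, a + b == t, a < r, b < s)
```

Sharing the slack r + s − t equally between the two summands is only one choice; any split of the slack into two positive rational parts would work. The reverse inclusion needs no construction, since a < r and b < s immediately give a + b < r + s.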

Cantor’s Approach

As a way of giving Georg Cantor the last word, let’s briefly look at his very different approach to constructing R out of Q. One of the many equivalent ways to characterize completeness is with the assertion that “Cauchy sequences converge.” Given a Cauchy sequence of rational numbers, we are now well aware that this sequence may converge to a value not in Q. Just as before, the goal is to create something, which we will call a real number, that can serve as the limit of this sequence. Cantor’s idea was essentially to define a real number to be the entire Cauchy sequence. The first problem one encounters with this approach is the realization that two different Cauchy sequences can converge to the same real number. For this reason, the elements in R are more appropriately defined as equivalence classes of Cauchy sequences where two sequences (x_n) and (y_n) are in the same equivalence class if and only if (x_n − y_n) → 0.

As with Dedekind’s approach, it can be momentarily disorienting to supplant our relatively simple notion of a real number as a decimal expansion with something as unruly as an equivalence class of Cauchy sequences. But what exactly do we mean by a decimal expansion? And how are we to understand the number 1/2 as both .5000... and .4999...? We leave it as an exercise.
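The equivalence relation on Cauchy sequences can be made concrete with two different rational sequences that both home in on the square root of 2. The Python sketch below is our own illustration; the two sequences (bisection endpoints and decimal truncations) and the function names are choices made here, not constructions from the text. Every term is an exact rational, and the printed differences shrink toward 0, which is what membership in the same equivalence class requires.

```python
from fractions import Fraction
from math import isqrt

def bisection_terms(n_terms):
    """Left endpoints of nested intervals [lo, hi] with lo^2 < 2 < hi^2."""
    lo, hi, terms = Fraction(1), Fraction(2), []
    for _ in range(n_terms):
        terms.append(lo)
        mid = (lo + hi) / 2
        if mid * mid < 2:
            lo = mid
        else:
            hi = mid
    return terms

def decimal_terms(n_terms):
    """y_k is the largest k-place decimal whose square is at most 2."""
    return [Fraction(isqrt(2 * 10 ** (2 * k)), 10 ** k)
            for k in range(1, n_terms + 1)]

xs, ys = bisection_terms(30), decimal_terms(30)
for k in (1, 5, 10, 20, 30):
    print(k, float(abs(xs[k - 1] - ys[k - 1])))
```

Neither sequence has a limit in Q, yet the k-th terms differ by at most 2^(−(k−1)), so in Cantor's construction they represent the same real number.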




Author’s note

What began as a desire to sketch out a simple “answer key” for the problems in Understanding Analysis has inevitably evolved into something a bit more ambitious. As I was generating solutions for the over 350 exercises in the text, I found myself adding regular commentary on common pitfalls and strategies that frequently arise. My sense is that this manual should be a useful supplement to instructors teaching a course or to individuals engaged in an independent study. As with the textbook itself, I tried to write with the introductory student firmly in mind.

In my teaching of analysis, I have come to understand the strong correlation between how students learn analysis and how they write it. A final goal I have for these notes is to illustrate by example how the form and grammar of a written argument are intimately connected to the clarity of a proof and, ultimately, to its validity.

I would like to thank former students Carrick Detweiller, Katherine Ott, Yared Gurmu, and Yuqiu Jiang for their considerable help with a preliminary draft. I would also like to thank the readers of Understanding Analysis for the many comments I have received about the text. Especially appreciated are the constructive suggestions as well as the pointers to errors of fact, and I welcome more of the same.

Middlebury, Vermont
May 2004

Stephen Abbott


Contents

Author’s note

1 The Real Numbers
  1.1 Discussion: The Irrationality of √2
  1.2 Some Preliminaries
  1.3 The Axiom of Completeness
  1.4 Consequences of Completeness
  1.5 Cantor’s Theorem

2 Sequences and Series
  2.1 Discussion: Rearrangements of Infinite Series
  2.2 The Limit of a Sequence
  2.3 The Algebraic and Order Limit Theorems
  2.4 The Monotone Convergence Theorem and a First Look at Infinite Series
  2.5 Subsequences and the Bolzano–Weierstrass Theorem
  2.6 The Cauchy Criterion
  2.7 Properties of Infinite Series
  2.8 Double Summations and Products of Infinite Series

3 Basic Topology of R
  3.1 Discussion: The Cantor Set
  3.2 Open and Closed Sets
  3.3 Compact Sets
  3.4 Perfect Sets and Connected Sets
  3.5 Baire’s Theorem

4 Functional Limits and Continuity
  4.1 Discussion: Examples of Dirichlet and Thomae
  4.2 Functional Limits
  4.3 Combinations of Continuous Functions
  4.4 Continuous Functions on Compact Sets
  4.5 The Intermediate Value Theorem
  4.6 Sets of Discontinuity

5 The Derivative
  5.1 Discussion: Are Derivatives Continuous?
  5.2 Derivatives and the Intermediate Value Property
  5.3 The Mean Value Theorem
  5.4 A Continuous Nowhere-Differentiable Function

6 Sequences and Series of Functions
  6.1 Discussion: Branching Processes
  6.2 Uniform Convergence of a Sequence of Functions
  6.3 Uniform Convergence and Differentiation
  6.4 Series of Functions
  6.5 Power Series
  6.6 Taylor Series

7 The Riemann Integral
  7.1 Discussion: How Should Integration be Defined?
  7.2 The Definition of the Riemann Integral
  7.3 Integrating Functions with Discontinuities
  7.4 Properties of the Integral
  7.5 The Fundamental Theorem of Calculus
  7.6 Lebesgue’s Criterion for Riemann Integrability

8 Additional Topics
  8.1 The Generalized Riemann Integral
  8.2 Metric Spaces and the Baire Category Theorem
  8.3 Fourier Series
  8.4 A Construction of R From Q

Chapter 1

The Real Numbers

1.1 Discussion: The Irrationality of √2

1.2 Some Preliminaries

Exercise 1.2.1. (a) Assume, for contradiction, that there exist integers p and q satisfying

(1)    (p/q)² = 3.

Let us also assume that p and q have no common factor. Now, equation (1) implies

(2)    p² = 3q².

From this, we can see that p² is a multiple of 3 and hence p must also be a multiple of 3. This allows us to write p = 3r, where r is an integer. After substituting 3r for p in equation (2), we get (3r)² = 3q², which can be simplified to 3r² = q². This implies q² is a multiple of 3 and hence q is also a multiple of 3. Thus we have shown p and q have a common factor, namely 3, when they were originally assumed to have no common factor.

A similar argument will work for √6 as well because we get p² = 6q², which implies p is a multiple of 2 and 3. After making the necessary substitutions, we can conclude q is a multiple of 6, and therefore √6 must be irrational.

(b) In this case, the fact that p² is a multiple of 4 does not imply p is also a multiple of 4 (consider p = 2). Thus, our proof breaks down at this point.

Exercise 1.2.2. (a) False, as seen in Example 1.2.2.
(b) True. This will follow from upcoming results about compactness in Chapter 3.
(c) False. Consider sets A = {1, 2, 3}, B = {3, 6, 7} and C = {5}. Note that A ∩ (B ∪ C) = {3} is not equal to (A ∩ B) ∪ C = {3, 5}.

(d) True.
(e) True.
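The counterexample in Exercise 1.2.2 (c) is small enough to check mechanically. The lines below are a minimal Python sketch, offered only as a sanity check on the arithmetic of the three sets named in the solution; Python's `&` and `|` operators on sets compute intersection and union.

```python
A, B, C = {1, 2, 3}, {3, 6, 7}, {5}

left = A & (B | C)    # A ∩ (B ∪ C)
right = (A & B) | C   # (A ∩ B) ∪ C

print(left, right, left == right)
```

The two sides differ because 5 belongs to C but not to A, so it survives on the right and is discarded on the left.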

Exercise 1.2.3. (a) If x ∈ (A ∩ B)^c then x ∉ (A ∩ B). But this implies x ∉ A or x ∉ B. From this we know x ∈ A^c or x ∈ B^c. Thus, x ∈ A^c ∪ B^c by the definition of union.
(b) To show A^c ∪ B^c ⊆ (A ∩ B)^c, let x ∈ A^c ∪ B^c and show x ∈ (A ∩ B)^c. So, if x ∈ A^c ∪ B^c then x ∈ A^c or x ∈ B^c. From this, we know that x ∉ A or x ∉ B, which implies x ∉ (A ∩ B). This means x ∈ (A ∩ B)^c, which is precisely what we wanted to show.
(c) In order to prove (A ∪ B)^c = A^c ∩ B^c we have to show

(1)    (A ∪ B)^c ⊆ A^c ∩ B^c and
(2)    A^c ∩ B^c ⊆ (A ∪ B)^c.

To demonstrate part (1) take x ∈ (A ∪ B)^c and show that x ∈ (A^c ∩ B^c). So, if x ∈ (A ∪ B)^c then x ∉ (A ∪ B). From this, we know that x ∉ A and x ∉ B, which implies x ∈ A^c and x ∈ B^c. This means x ∈ (A^c ∩ B^c).
Similarly, part (2) can be shown by taking x ∈ (A^c ∩ B^c) and showing that x ∈ (A ∪ B)^c. So, if x ∈ (A^c ∩ B^c) then x ∈ A^c and x ∈ B^c. From this, we know that x ∉ A and x ∉ B, which implies x ∉ (A ∪ B). This means x ∈ (A ∪ B)^c. Since we have shown inclusion both ways, we conclude that (A ∪ B)^c = A^c ∩ B^c.

Exercise 1.2.4. (a) When a and b have the same sign, consider the following two cases:
(i) If a ≥ 0 and b ≥ 0 then we have a + b ≥ 0, which implies |a + b| = a + b. Furthermore, because |a| = a and |b| = b, we have |a| + |b| = a + b. This implies |a + b| = |a| + |b|, which satisfies the triangle inequality.
(ii) If a ≤ 0 and b ≤ 0 then we have a + b ≤ 0, which implies |a + b| = −a − b. Furthermore, since we know |a| = −a and |b| = −b we have |a| + |b| = −a − b. This implies |a + b| = |a| + |b|, which satisfies the triangle inequality.
(b) If a ≥ 0, b < 0, and a + b ≥ 0 then we have |a + b| = a + b = a − (−b) = |a| − |b| < |a| + |b|. This implies |a + b| ≤ |a| + |b| as desired.

Exercise 1.2.5. (a) Observe that |a − b| = |a + (−b)| ≤ |a| + |−b| = |a| + |b|, which implies |a − b| ≤ |a| + |b|.
(b) First note that |a| = |a − b + b| ≤ |a − b| + |b|. Taking |b| to the left side of the inequality we get |a| − |b| ≤ |a − b|. Reversing the roles of a and b in the previous argument gives |b| − |a| ≤ |b − a|, and because |a − b| = |b − a| the result follows.

Exercise 1.2.6. (a) f(A) = [0, 4] and f(B) = [1, 16]. In this case, f(A ∩ B) = f(A) ∩ f(B) = [1, 4] and f(A ∪ B) = f(A) ∪ f(B) = [0, 16].
(b) Take A = [0, 2] and B = [−2, 0] and note that f(A ∩ B) = {0} but f(A) ∩ f(B) = [0, 4].


(c) We have to show y ∈ g(A ∩ B) implies y ∈ g(A) ∩ g(B). If y ∈ g(A ∩ B) then there exists an x ∈ A ∩ B with g(x) = y. But this means x ∈ A and x ∈ B and hence g(x) ∈ g(A) and g(x) ∈ g(B). Therefore, g(x) = y ∈ g(A) ∩ g(B). (d) Our claim is g(A ∪ B) = g(A) ∪ g(B). In order to prove it, we have to show, (1)

g(A ∪ B) ⊆ g(A) ∪ g(B) and,

(2)

g(A) ∪ g(B) ⊆ g(A ∪ B).

To demonstrate part (1), we let y ∈ g(A ∪ B) and show y ∈ g(A) ∪ g(B). If y ∈ g(A ∪ B) then there exists x ∈ A ∪ B with g(x) = y. But this means x ∈ A or x ∈ B, and hence g(x) ∈ g(A) or g(x) ∈ g(B). Therefore, g(x) = y ∈ g(A) ∪ g(B). To demonstrate the reverse inclusion, we let y ∈ g(A) ∪ g(B) and show y ∈ g(A ∪ B). If y ∈ g(A) ∪ g(B) then y ∈ g(A) or y ∈ g(B). This means we have an x ∈ A or x ∈ B such that g(x) = y. This implies, x ∈ A ∪ B, and hence g(x) ∈ g(A ∪ B). Since we have shown parts (1) and (2), we can conclude g(A ∪ B) = g(A) ∪ g(B). Exercise 1.2.7. (a) f −1 (A) = [−2, 2] and f −1 (B) = [−1, 1]. In this case, f −1 (A ∩ B) = f −1 (A) ∩ f −1 (B) = [−1, 1] and f −1 (A ∪ B) = f −1 (A) ∪ f −1 (B) = [−2, 2]. (b) In order to prove g −1 (A ∩ B) = g −1 (A) ∩ g −1 (B), we have to show, (1)

g −1 (A ∩ B) ⊆ g −1 (A) ∩ g −1 (B) and,

(2)

g −1 (A) ∩ g −1 (B) ⊆ g −1 (A ∩ B).

To demonstrate part (1), we let x ∈ g⁻¹(A ∩ B) and show x ∈ g⁻¹(A) ∩ g⁻¹(B). If x ∈ g⁻¹(A ∩ B), then g(x) ∈ A ∩ B. But this means g(x) ∈ A and g(x) ∈ B, and hence x ∈ g⁻¹(A) and x ∈ g⁻¹(B). This implies x ∈ g⁻¹(A) ∩ g⁻¹(B). To demonstrate the reverse inclusion, we let x ∈ g⁻¹(A) ∩ g⁻¹(B) and show x ∈ g⁻¹(A ∩ B). If x ∈ g⁻¹(A) ∩ g⁻¹(B), then x ∈ g⁻¹(A) and x ∈ g⁻¹(B). This implies g(x) ∈ A and g(x) ∈ B, and hence g(x) ∈ A ∩ B. This means x ∈ g⁻¹(A ∩ B). Similarly, in order to prove g⁻¹(A ∪ B) = g⁻¹(A) ∪ g⁻¹(B), we have to show

(1) g⁻¹(A ∪ B) ⊆ g⁻¹(A) ∪ g⁻¹(B), and

(2) g⁻¹(A) ∪ g⁻¹(B) ⊆ g⁻¹(A ∪ B).

To demonstrate part (1), we let x ∈ g⁻¹(A ∪ B) and show x ∈ g⁻¹(A) ∪ g⁻¹(B). If x ∈ g⁻¹(A ∪ B), then g(x) ∈ A ∪ B. But this means g(x) ∈ A or g(x) ∈ B, which implies x ∈ g⁻¹(A) or x ∈ g⁻¹(B). From this we know x ∈ g⁻¹(A) ∪ g⁻¹(B).


To demonstrate the reverse inclusion, we let x ∈ g⁻¹(A) ∪ g⁻¹(B) and show x ∈ g⁻¹(A ∪ B). If x ∈ g⁻¹(A) ∪ g⁻¹(B), then x ∈ g⁻¹(A) or x ∈ g⁻¹(B). This implies g(x) ∈ A or g(x) ∈ B, and hence g(x) ∈ A ∪ B. This means x ∈ g⁻¹(A ∪ B).

Exercise 1.2.8. (a) There exist two real numbers a and b satisfying a < b such that for all n ∈ N we have a + 1/n ≥ b. (b) There exist two distinct rational numbers with the property that every number in between them is irrational. (c) There exists a natural number n where √n is rational but not a natural number. (d) There exists a real number x such that n ≤ x for all n ∈ N.

Exercise 1.2.9. (a) We will use induction to prove x_n ≤ 2 for every n ∈ N. For n = 1, we can easily see x_1 = 1 ≤ 2. Now we want to show that if x_n ≤ 2, then it follows that x_{n+1} ≤ 2. Starting from the induction hypothesis x_n ≤ 2, we multiply across the inequality by 1/2 and add 1 to get

(1/2)x_n + 1 ≤ (1/2)(2) + 1 = 2,

which is precisely the desired conclusion x_{n+1} ≤ 2. By induction, the claim is proved for all n ∈ N.

Exercise 1.2.10. (a) For n = 1, we can easily see y_1 = 1 < 4, and this proves the base case. Now we want to show that if y_n < 4, then it follows that y_{n+1} < 4. Starting from the induction hypothesis y_n < 4, we can multiply across the inequality by 3/4 and add 1 to get

(3/4)y_n + 1 < (3/4)(4) + 1 = 4,

which is the desired conclusion y_{n+1} < 4. By induction, the claim is proved for all n ∈ N. (b) For n = 1, we can easily see y_1 = 1 < 7/4 = y_2, proving the base case. Now we want to show that if y_n ≤ y_{n+1}, then it follows that y_{n+1} ≤ y_{n+2}. Starting from the induction hypothesis y_n ≤ y_{n+1}, we can multiply across the inequality by 3/4 and add 1 to get

(3/4)y_n + 1 ≤ (3/4)y_{n+1} + 1,

which is the desired conclusion y_{n+1} ≤ y_{n+2}. By induction, the claim is proved for all n ∈ N.
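The induction claims in Exercises 1.2.9 and 1.2.10 can be sanity-checked numerically. The sketch below assumes the recurrences x_1 = 1, x_{n+1} = x_n/2 + 1 and y_1 = 1, y_{n+1} = (3/4)y_n + 1; these are inferred from the "multiply by ... and add 1" steps in the solutions rather than quoted from the exercise statements, and the helper name `iterate` is purely illustrative. Exact rational arithmetic is used so that no rounding enters the comparisons. A finite check like this is evidence, not a proof.

```python
from fractions import Fraction

def iterate(ratio, start=Fraction(1), steps=50):
    """Return [t_1, ..., t_steps] with t_1 = start and t_{n+1} = ratio * t_n + 1."""
    terms = [start]
    for _ in range(steps - 1):
        terms.append(ratio * terms[-1] + 1)
    return terms

# Assumed recurrences (inferred from the solutions, not quoted from the exercises).
xs = iterate(Fraction(1, 2))   # x_{n+1} = x_n / 2 + 1
ys = iterate(Fraction(3, 4))   # y_{n+1} = (3/4) y_n + 1

bounded_x = all(x <= 2 for x in xs)                     # claim of 1.2.9
bounded_y = all(y < 4 for y in ys)                      # claim of 1.2.10 (a)
increasing_y = all(a <= b for a, b in zip(ys, ys[1:]))  # claim of 1.2.10 (b)
print(bounded_x, bounded_y, increasing_y)
```

The second term of `ys` is 7/4, which matches the value of y_2 used in the base case of part (b).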


Exercise 1.2.11. We will use induction, this time starting with n = 0, to prove the claim. When n = 0 then A = ∅. For this case, the set A has just the empty set as its only subset. Since 2^0 = 1, the claim is true in this case. Now we have to show that if sets of size n have 2^n different subsets, then it follows that sets of size n + 1 have 2^{n+1} different subsets. Given a set A of size n + 1, first remove an arbitrary element a ∈ A. The set A\{a} has n elements, and we can use the induction hypothesis to say that there are exactly 2^n subsets of A\{a}. Said another way, there are precisely 2^n subsets of A that do not contain the particular element a. By adding the element a to each of these we will produce 2^n new subsets of A. Since every subset of A either contains a or does not contain a, we can be sure that we have listed them all. Thus, the total number of subsets of A is given by 2^n (for the subsets without a) plus 2^n (for the subsets that do contain a), and 2^n + 2^n = 2^{n+1}. By induction, the claim is proved for every n ≥ 0.

Exercise 1.2.12. (a) From Exercise 1.2.3 we know (A_1 ∪ A_2)^c = A_1^c ∩ A_2^c, which proves the base case. Now we want to show that if we have (A_1 ∪ A_2 ∪ ··· ∪ A_n)^c = A_1^c ∩ A_2^c ∩ ··· ∩ A_n^c, then it follows that (A_1 ∪ A_2 ∪ ··· ∪ A_{n+1})^c = A_1^c ∩ A_2^c ∩ ··· ∩ A_{n+1}^c. Since the union of sets obeys the associative law, (A_1 ∪ A_2 ∪ ··· ∪ A_{n+1})^c = ((A_1 ∪ A_2 ∪ ··· ∪ A_n) ∪ A_{n+1})^c, which by the base case is equal to (A_1 ∪ A_2 ∪ ··· ∪ A_n)^c ∩ A_{n+1}^c. Now from our induction hypothesis we know that (A_1 ∪ A_2 ∪ ··· ∪ A_n)^c = A_1^c ∩ A_2^c ∩ ··· ∩ A_n^c, which implies that (A_1 ∪ A_2 ∪ ··· ∪ A_n)^c ∩ A_{n+1}^c = A_1^c ∩ A_2^c ∩ ··· ∩ A_n^c ∩ A_{n+1}^c. By induction, the claim is proved for all n ∈ N.

(b) The point here is to distinguish between asserting that a statement is true for all values of n ∈ N and asserting that it is true in the infinite case. Induction cannot be used when we have an infinite number of sets. It is used to prove facts that hold true for each value of n ∈ N. For instance, in Exercise 1.2.2, we could use induction to show that ⋂_{k=1}^n A_k is infinite for all choices of n ∈ N, but notice that this conclusion is not true for ⋂_{k=1}^∞ A_k.

(c) In order to prove (⋃_{n=1}^∞ A_n)^c = ⋂_{n=1}^∞ A_n^c we have to show

(1) (⋃_{n=1}^∞ A_n)^c ⊆ ⋂_{n=1}^∞ A_n^c, and

(2) ⋂_{n=1}^∞ A_n^c ⊆ (⋃_{n=1}^∞ A_n)^c.

To demonstrate part (1), we let x ∈ (⋃_{n=1}^∞ A_n)^c and show x ∈ ⋂_{n=1}^∞ A_n^c. If x ∈ (⋃_{n=1}^∞ A_n)^c, then x ∉ A_n for all n ∈ N. This implies x is in the complement of each A_n, and by the definition of intersection x ∈ ⋂_{n=1}^∞ A_n^c. To demonstrate the reverse inclusion, we let x ∈ ⋂_{n=1}^∞ A_n^c and show x ∈ (⋃_{n=1}^∞ A_n)^c. If x ∈ ⋂_{n=1}^∞ A_n^c, then x ∈ A_n^c for all n ∈ N, which means x ∉ A_n for all n ∈ N. This implies x ∉ ⋃_{n=1}^∞ A_n, and we can now conclude x ∈ (⋃_{n=1}^∞ A_n)^c.
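For a finite family of sets, the De Morgan identity of Exercise 1.2.12 (a) can be checked directly. The following is a minimal sketch, assuming all sets live inside a finite universe so that complements are computable; the function name `de_morgan_holds` and the sample sets are hypothetical and chosen only for illustration.

```python
def de_morgan_holds(universe, family):
    """Check (A_1 ∪ ... ∪ A_n)^c == A_1^c ∩ ... ∩ A_n^c, with complements taken in `universe`."""
    union_of_family = set().union(*family)
    left = universe - union_of_family      # complement of the union
    right = set(universe)
    for a in family:
        right &= (universe - a)            # intersect with each complement in turn
    return left == right

universe = set(range(10))
family = [{1, 2}, {2, 3, 5}, {7}]
print(de_morgan_holds(universe, family))
```

As part (b) stresses, a check of this kind only ever covers finitely many sets; the infinite case in part (c) needs the element-chasing argument given above.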

1.3

The Axiom of Completeness

Exercise 1.3.1. (a) For any z ∈ Z_5 the additive inverse is y = 5 − z (reduced modulo 5, so that the inverse of 0 is 0). (b) For z = 1 the multiplicative inverse is x = 1, for z = 2 it is x = 3, for z = 3 it is x = 2, and for z = 4 it is x = 4. (c) For any z ∈ Z_4 the additive inverse is y = 4 − z. However, the multiplicative inverse of 2 does not exist. In general, additive inverses exist in Z_n for all values of n. Multiplicative inverses exist for every nonzero element of Z_n only when n is prime.

Exercise 1.3.2. (a) A real number i is the greatest lower bound, or the infimum, for a set A ⊆ R if it meets the following two criteria: (i) i is a lower bound for A; i.e., i ≤ a for all a ∈ A, and (ii) if l is any lower bound for A, then l ≤ i. (b) Lemma: Assume i ∈ R is a lower bound for a set A ⊆ R. Then, i = inf A if and only if, for every choice of ε > 0, there exists an element a ∈ A satisfying i + ε > a. (i) To prove this in the forward direction, assume i = inf A and consider i + ε, where ε > 0 has been arbitrarily chosen. Because i + ε > i, statement (ii) implies i + ε is not a lower bound for A. Since this is the case, there must be some element a ∈ A for which i + ε > a, because otherwise i + ε would be a lower bound. (ii) For the backward direction, assume i is a lower bound with the property that no matter how ε > 0 is chosen, i + ε is no longer a lower bound for A. This implies that if l is any number greater than i, then l is not a lower bound for A. Because any number greater than i cannot be a lower bound, it follows that if l is some other lower bound for A, then l ≤ i. This completes the proof of the lemma.

Exercise 1.3.3. (a) Because A is bounded below, B is not empty. Also, for all a ∈ A and b ∈ B, we have b ≤ a. The first thing this tells us is that B is bounded above and thus α = sup B exists by the Axiom of Completeness. It remains to show that α = inf A. The second thing we see is that every element


of A is an upper bound for B. By part (ii) of the definition of supremum, α ≤ a for all a ∈ A and we conclude that α is a lower bound for A. Is it the greatest lower bound? Sure it is. If l is an arbitrary lower bound for A then l ∈ B, and part (i) of the definition of supremum implies l ≤ α. This completes the proof. (b) We do not need to assume that greatest lower bounds exist as part of the Axiom of Completeness because we now have a proof that they exist. By demonstrating that the infimum of a set A is always equal to the supremum of a different set, we can use the existence of least upper bounds to assert the existence of greatest lower bounds. (c) Given a set A, define −A = {−a : a ∈ A}. Now if A is bounded below it follows that −A is bounded above, and it is not too hard to prove inf A = −sup(−A) using an argument much like those in Exercise 1.3.5.

Exercise 1.3.4. Observe that all elements of B are contained in A and hence sup A ≥ b for all b ∈ B. By Definition 1.3.2 part (ii), sup B is less than or equal to any upper bound of B. Because sup A is an upper bound for B, it follows that sup B ≤ sup A.

Exercise 1.3.5. (a) Note that c + sup A is an upper bound for c + A. Now we have to show that if d is any upper bound for c + A, then c + sup A ≤ d. We know c + a ≤ d for all a ∈ A, and thus a ≤ d − c for all a ∈ A. This means d − c is an upper bound for A, and by part (ii) of Definition 1.3.2, sup A ≤ d − c. But this implies c + sup A ≤ d, which is precisely what we wanted to show. (b) In the case c = 0, cA = {0} and without too much difficulty we can argue that sup(cA) = 0 = c sup A. So let's focus on the case where c > 0. Observe that c sup A is an upper bound for cA. Now we have to show that if d is any upper bound for cA, then c sup A ≤ d. We know ca ≤ d for all a ∈ A, and thus a ≤ d/c for all a ∈ A. This means d/c is an upper bound for A, and by Definition 1.3.2 sup A ≤ d/c. But this implies c sup A ≤ c(d/c) = d, which is precisely what we wanted to show. (c) Assuming the set A is bounded below, we claim sup(cA) = c inf A for the case c < 0. In order to prove our claim we first show c inf A is an upper bound for cA. Since inf A ≤ a for all a ∈ A, we multiply both sides of the inequality by c (which reverses it because c < 0) to get c inf A ≥ ca for all a ∈ A. This shows that c inf A is an upper bound for cA. Now we have to show that if d is any upper bound for cA, then c inf A ≤ d. We know ca ≤ d for all a ∈ A, and thus d/c ≤ a for all a ∈ A. This means d/c is a lower bound for A, and from Exercise 1.3.2, d/c ≤ inf A. But this implies c inf A ≤ c(d/c) = d, which is precisely what we wanted to show.

Exercise 1.3.6. (a) The supremum is 3 and the infimum is 1. (b) The supremum is 1 and the infimum is 0. (c) The supremum is 1/2 and the infimum is 1/3. (d) The supremum is 9 and the infimum is 1/9.

Exercise 1.3.7. Since a is an upper bound for A, we just need to verify the second part of the definition of supremum and show that if d is any upper bound


then a ≤ d. By the definition of upper bound, a ≤ d because a is an element of A. Hence, by Definition 1.3.2, a is the supremum of A.

Exercise 1.3.8. Set ε = sup B − sup A > 0. By Lemma 1.3.7, there exists an element b ∈ B satisfying sup B − ε < b, which implies sup A < b. Because sup A is an upper bound for A, b is as well.

Exercise 1.3.9. (a) True. (b) False. If we consider A = (a, b), every element in A is less than b but sup A = b. (c) False. Consider the open intervals A = (c, d) and B = (d, f). Then a < b for every a ∈ A and b ∈ B, but sup A = d = inf B. (d) True. (e) False. If we take A = [0, 2] and B = (0, 2), we see that sup A = sup B but there is no element b ∈ B that is an upper bound for A.
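The inverse tables quoted in Exercise 1.3.1 can be reproduced with a few lines of modular arithmetic. This is an illustrative sketch only; the function names are hypothetical, and the three-argument `pow` with exponent −1 (modular inverse) requires Python 3.8 or later.

```python
def additive_inverses(n):
    """Map each z in Z_n to its additive inverse (n - z) mod n."""
    return {z: (n - z) % n for z in range(n)}

def multiplicative_inverses(n):
    """Map each nonzero z in Z_n to its inverse mod n, or None if none exists."""
    table = {}
    for z in range(1, n):
        try:
            table[z] = pow(z, -1, n)   # modular inverse; Python 3.8+
        except ValueError:
            table[z] = None            # z is not invertible mod n
    return table

print(multiplicative_inverses(5))
print(multiplicative_inverses(4))
```

For n = 5 every nonzero element receives an inverse, while for n = 4 the entry for 2 is `None`, in line with part (c).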

1.4

Consequences of Completeness

Exercise 1.4.1. We have to show there exists a rational number between a and b when a < 0. If b > 0, then by Theorem 1.4.3 we know there exists a rational number r satisfying 0 < r < b, and so a < r < b as well. If b ≤ 0, then we can use Theorem 1.4.3 to say that there exists r ∈ Q satisfying −b < r < −a, and it follows that a < −r < b. The proof that −r is rational is part of the next exercise.

Exercise 1.4.2. (a) We have to show that if a, b ∈ Q, then ab and a + b are elements of Q. By definition, Q = {p/q : p, q ∈ Z, q ≠ 0}. So take a = p/q and b = c/d where p, q, c, d ∈ Z and q, d ≠ 0. Then ab = pc/(qd), where pc, qd ∈ Z because Z is closed under multiplication, and qd ≠ 0. This implies ab ∈ Q. To see that a + b is rational, write

p/q + c/d = (pd + qc)/(qd),

and observe that both pd + qc and qd are integers with qd ≠ 0. (b) Assume, for contradiction, that a + t ∈ Q. Then t = (t + a) − a is the difference of two rational numbers, and by part (a) t must be rational as well. This contradiction implies a + t ∈ I. Likewise, if we assume at ∈ Q (with a ≠ 0), then t = (at)(1/a) would again be rational by the result in (a). This implies at ∈ I. (c) The set of irrationals is not closed under addition or multiplication. Given two irrationals s and t, s + t can be either irrational or rational. For instance, if s = √2 and t = −√2, then s + t = 0, which is an element of Q. However, if s = √2 and t = 2√2, then s + t = √2 + 2√2 = 3√2, which is an element of I. Similarly, st can be either irrational or rational. If s = √2 and t = −√2, then st = −2, which is a rational number. However, if s = √2 and t = √3, then st = √2 √3 = √6, which is an irrational number.


Exercise 1.4.3. We have to show the existence of an irrational number between any two real numbers a and b with a < b. By applying Theorem 1.4.3 to the real numbers a − √2 and b − √2 we can find a rational number r satisfying a − √2 < r < b − √2. This implies a < r + √2 < b, and r + √2 is irrational by Exercise 1.4.2 (b).

Exercise 1.4.4. The goal is to show that 0 is the greatest lower bound of the set {1/n : n ∈ N}. Certainly 0 is a lower bound. If c > 0, then the Archimedean Property of R states there exists n ∈ N such that 1/n < c. This means that any c > 0 is not a lower bound. Thus, if c is a lower bound it must satisfy c ≤ 0 as desired.

Exercise 1.4.5. Let x ∈ R be arbitrary. To prove ⋂_{n=1}^∞ (0, 1/n) = ∅ it is enough to show that x ∉ (0, 1/n) for some n ∈ N. If x ≤ 0 then we can take n = 1 and observe x ∉ (0, 1). If x > 0 then by Theorem 1.4.2 we know there exists an n_0 ∈ N such that 1/n_0 < x, so x ∉ (0, 1/n_0). In either case x ∉ ⋂_{n=1}^∞ (0, 1/n), and our proof is complete.

Exercise 1.4.6. (a) Now we need to pick n_0 large enough so that

1/n_0 < (α² − 2)/(2α)   or   2α/n_0 < α² − 2.

With this choice of n_0, we have

(α − 1/n_0)² > α² − 2α/n_0 > α² − (α² − 2) = 2.

This means (α − 1/n_0) is an upper bound for T. But (α − 1/n_0) < α and α = sup T is supposed to be the least upper bound. This contradiction means that the case α² > 2 can be ruled out. Because we have already ruled out α² < 2, we are left with α² = 2, which implies α = √2 exists in R.

(b) Define the set T = {t ∈ R : t² < b}, and let α = sup T, which we know exists because T is non-empty (it contains 0) and bounded above. As before, we'll show α² = b by ruling out the possibilities α² < b and α² > b. First assume α² < b and observe

(α + 1/n)² = α² + 2α/n + 1/n²
           ≤ α² + 2α/n + 1/n
           = α² + (2α + 1)/n.

From Theorem 1.4.2 (ii), choose n_0 large enough so that

1/n_0 < (b − α²)/(2α + 1).


This implies (2α + 1)/n_0 < b − α², and consequently that

(α + 1/n_0)² < α² + (b − α²) = b.

Thus, α + 1/n_0 ∈ T, contradicting the fact that α is an upper bound for T. We conclude that α² < b cannot happen. Now consider the case α² > b. This time, we have

(α − 1/n)² = α² − 2α/n + 1/n² > α² − 2α/n.

This time pick n_0 large enough so that

1/n_0 < (α² − b)/(2α)   or   2α/n_0 < α² − b.

With this choice of n_0, we have

(α − 1/n_0)² > α² − 2α/n_0 > α² − (α² − b) = b.

This means (α − 1/n_0) is an upper bound for T. But then (α − 1/n_0) < α = sup T leads to a contradiction because all upper bounds of T should be greater than or equal to the supremum α. Thus, α² > b is not a possibility and we are left with α² = b as desired.

Exercise 1.4.7. Next let n_2 = min{n ∈ N : f(n) ∈ A\{f(n_1)}} and set g(2) = f(n_2). In general, assume we have defined g(k) for k < m, and let g(m) = f(n_m) where n_m = min{n ∈ N : f(n) ∈ A\{f(n_1), ..., f(n_{m−1})}}. To show that g : N → A is 1–1, observe that m ≠ m′ implies n_m ≠ n_{m′}, and it follows that f(n_m) = g(m) ≠ g(m′) = f(n_{m′}) because f is assumed to be 1–1. To show that g is onto, let a ∈ A be arbitrary. Because f is onto, a = f(n′) for some n′ ∈ N. This means n′ ∈ {n : f(n) ∈ A}, and as we inductively remove the minimal element, n′ must eventually be the minimum, by the n′-th step at the latest.

Exercise 1.4.8. (a) Because A_1 is countable, there exists a 1–1 and onto function f : N → A_1. If B_2 = ∅, then A_1 ∪ A_2 = A_1, which we already know to be countable. If B_2 = {b_1, b_2, ..., b_m} has m elements, then define h : N → A_1 ∪ B_2 via

h(n) = b_n        if n ≤ m,
h(n) = f(n − m)   if n > m.

The fact that h is 1–1 and onto follows immediately from the same properties of f.


If B_2 is infinite, then by Theorem 1.4.12 it is countable, and so there exists a 1–1 onto function g : N → B_2. In this case we define h : N → A_1 ∪ B_2 by

h(n) = f((n + 1)/2)   if n is odd,
h(n) = g(n/2)         if n is even.

Again, the proof that h is 1–1 and onto is derived directly from the fact that f and g are both bijections. Graphically, the correspondence takes the form

N:           1     2     3     4     5     6     ···
             ↕     ↕     ↕     ↕     ↕     ↕
A_1 ∪ B_2:   a_1   b_1   a_2   b_2   a_3   b_3   ···

To prove the more general statement in Theorem 1.4.13, we may use induction. We have just seen that the result holds for two countable sets. Now let's assume that the union of m countable sets is countable, and show that the union of m + 1 countable sets is countable. Given m + 1 countable sets A_1, A_2, ..., A_{m+1}, we can write A_1 ∪ A_2 ∪ ··· ∪ A_{m+1} = (A_1 ∪ A_2 ∪ ··· ∪ A_m) ∪ A_{m+1}. Then C_m = A_1 ∪ ··· ∪ A_m is countable by the induction hypothesis, and C_m ∪ A_{m+1} is just the union of two countable sets, which we know to be countable. This completes the proof.

(b) Induction cannot be used when we have an infinite number of sets. It can only be used to prove facts that hold true for each value of n ∈ N. See the discussion in Exercise 1.2.12 for more on this.

(c) Let's first consider the case where the sets {A_n} are disjoint. In order to achieve a 1–1 correspondence between the set N and ⋃_{n=1}^∞ A_n, we first label the elements in each countable set A_n as A_n = {a_n1, a_n2, a_n3, ...}. Now arrange the elements of ⋃_{n=1}^∞ A_n in an array similar to the one for N given in the exercise:

A_1 = a_11  a_12  a_13  a_14  a_15  ···
A_2 = a_21  a_22  a_23  a_24  ···
A_3 = a_31  a_32  a_33  ···
A_4 = a_41  a_42  ···
A_5 = a_51  ···
 ⋮

This establishes a 1–1 and onto mapping g : N → ⋃_{n=1}^∞ A_n, where g(n) corresponds to the element a_jk, with (j, k) the row and column location of n in the array for N given in the exercise.
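The array-based correspondence in part (c) can be made concrete by generating the index pairs (j, k) in the order in which a diagonal sweep visits them. The sketch below assumes that the array for N fills each anti-diagonal from its bottom-left entry to its top-right entry; the exact ordering used in the exercise may differ, but any diagonal ordering yields a bijection between N and the index pairs. The names `diagonal_positions` and `g` are illustrative.

```python
from itertools import islice

def diagonal_positions():
    """Yield (row, column) pairs of N x N along successive anti-diagonals."""
    d = 1
    while True:
        for row in range(d, 0, -1):        # assumed order: bottom-left to top-right
            yield (row, d - row + 1)
        d += 1

def g(n):
    """Return the index pair (j, k) of the n-th element in the enumeration (n >= 1)."""
    return next(islice(diagonal_positions(), n - 1, None))

first_six = [g(n) for n in range(1, 7)]
print(first_six)
```

Every pair (j, k) lies on the anti-diagonal j + k − 1 and is reached after finitely many steps, which is the onto half of the argument; no pair is visited twice, which is the 1–1 half.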


If the sets {A_n} are not disjoint then our mapping may not be 1–1. In this case we could again replace A_n with B_n = A_n \ (A_1 ∪ ··· ∪ A_{n−1}). Another approach is to use the previous argument to establish a 1–1 correspondence between ⋃_{n=1}^∞ A_n and an infinite subset of N, and then appeal to Theorem 1.4.12.

Exercise 1.4.9. (a) Since A ∼ B we know there is a 1–1, onto function from A onto B. This means we can define another function g : B → A that is also 1–1 and onto. More specifically, if f : A → B is 1–1 and onto then f⁻¹ : B → A exists and is also 1–1 and onto. (b) We will show there exists a 1–1, onto function h : A → C. Because A ∼ B, there exists g : A → B that is 1–1 and onto. Likewise, B ∼ C implies that there exists f : B → C that is also 1–1 and onto. So let's define h : A → C by the composition h = f ◦ g. In order to show f ◦ g is 1–1, take a_1, a_2 ∈ A where a_1 ≠ a_2 and show f(g(a_1)) ≠ f(g(a_2)). Well, a_1 ≠ a_2 implies that g(a_1) ≠ g(a_2) because g is 1–1. And g(a_1) ≠ g(a_2) implies that f(g(a_1)) ≠ f(g(a_2)) because f is 1–1. This shows f ◦ g is 1–1. In order to show f ◦ g is onto, we take c ∈ C and show that there exists an a ∈ A with f(g(a)) = c. If c ∈ C then there exists b ∈ B such that f(b) = c because f is onto. But for this same b ∈ B we have an a ∈ A such that g(a) = b since g is onto. This implies f(b) = f(g(a)) = c, and therefore f ◦ g is onto.

Exercise 1.4.10. For each k ∈ N, let A_k be the set of subsets of N whose maximal element is k. For example, A_1 is the set containing just the subset {1}. In A_2 we would have {2} and {1, 2}. For A_3 there would be four elements: {3}, {1, 3}, {2, 3}, and {1, 2, 3}. There are two key observations to make. The first is that every A_k contains a finite number of elements. The second is that every non-empty finite subset of N must appear in exactly one of the sets A_k. Setting A_0 = {∅} to account for the empty subset, this allows us to assert that the set of all finite subsets of N is equal to ⋃_{k=0}^∞ A_k. Now we may proceed as in the proof of Theorem 1.4.11 (i) and argue that the countable union of finite sets is countable.

Exercise 1.4.11. (a) The function f(x) = (x, 1/3) is 1–1 from (0, 1) to S. (b) Given (x, y) ∈ S, let's write x and y in their decimal expansions

x = .x_1 x_2 x_3 ...   and   y = .y_1 y_2 y_3 ...

where we make the convention that we always use the terminating form (or repeated 0s) over the repeating 9s form when the situation arises. Now define f : S → (0, 1) by f(x, y) = .x_1 y_1 x_2 y_2 x_3 y_3 ... In order to show f is 1–1, assume we have two distinct points (x, y) ≠ (w, z) from S. Then it must be that either x ≠ w or y ≠ z, and this implies that in at least one decimal place we have x_i ≠ w_i or y_i ≠ z_i. Because neither interleaved expansion ends in repeating 9s, distinct digit strings represent distinct real numbers, and this is enough to conclude f(x, y) ≠ f(w, z).
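The interleaving map of part (b) is easy to experiment with on truncated expansions. The sketch below works on finite digit strings of equal length (the digits after the decimal point), which is an assumption made purely for illustration since the actual map acts on infinite expansions; the function names are hypothetical.

```python
def interleave(x_digits, y_digits):
    """Interleave two equal-length digit strings as x1 y1 x2 y2 ..."""
    if len(x_digits) != len(y_digits):
        raise ValueError("digit strings must have equal length")
    return "".join(a + b for a, b in zip(x_digits, y_digits))

def deinterleave(t_digits):
    """Recover (x_digits, y_digits) from an interleaved string of even length."""
    return t_digits[0::2], t_digits[1::2]

t = interleave("123", "456")
print(t, deinterleave(t))
```

Because `deinterleave` undoes `interleave`, two different pairs of digit strings can never produce the same interleaved string, which mirrors the 1–1 argument above at the level of digits.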


The function f is not onto. For instance the point t = .555959595 ... is not in the range of f because the ordered pair (x, y) with x = .555 ... and y = .5999 ... would not be allowed due to our convention of using terminating decimals instead of repeated 9s.

Exercise 1.4.12. (a) √2 is a root of the polynomial x² − 2, ∛2 is a root of the polynomial x³ − 2, and √3 + √2 is a root of x⁴ − 10x² + 1. Since all of these numbers are roots of polynomials with integer coefficients, they are all algebraic. (b) Fix n, m ∈ N. The set of polynomials of the form a_n x^n + a_{n−1} x^{n−1} + ··· + a_1 x + a_0 satisfying |a_n| + |a_{n−1}| + ··· + |a_0| ≤ m is finite because there are only a finite number of choices for each of the coefficients (given that they must be integers). If we let A_nm be the set of all the roots of polynomials of this form, then because each one of these polynomials has at most n roots, the set A_nm is finite. Thus A_n, the set of algebraic numbers obtained as roots of any polynomial (with integer coefficients) of degree n, can be written as a countable union of finite sets

A_n = ⋃_{m=1}^∞ A_nm.

It follows that A_n is countable. (c) If A is the set of all algebraic numbers, then A = ⋃_{n=1}^∞ A_n. Because each A_n is countable, we may use Theorem 1.4.13 to conclude that A is countable as well. If T is the set of transcendental numbers, then A ∪ T = R. Now if T were countable, then R = A ∪ T would also be countable. But this is a contradiction because we know R is uncountable, and hence the collection of transcendental numbers must also be uncountable.

Exercise 1.4.13. (a) Given y ∈ f(X), the fact that f is 1–1 implies that the x ∈ X satisfying f(x) = y must be unique. This allows us to define the function f⁻¹ from f(X) back to X because now there is no ambiguity about the value of f⁻¹(y). By focusing only on the range f(X) (and not all of Y) we may say that f is a 1–1 and onto function from X to f(X). Its inverse, f⁻¹, from f(X) back to X is then easily seen to be 1–1 and onto as well. (b) If x ∉ g(Y) then g⁻¹(x) is not defined, and the number of elements to the left of x in C_x is 0. Similarly, if g⁻¹(x) ∉ f(X), then f⁻¹(g⁻¹(x)) is not defined and the chain terminates with just one element to the left of x. In general, we get a finite number of elements to the left of x if some iterate falls outside of either g(Y) or f(X). (c) Given x, x′ ∈ X, assume C_x ∩ C_{x′} ≠ ∅. Without loss of generality, let y ∈ Y satisfy y ∈ C_x ∩ C_{x′}. Then either

y = f(g(··· g(f(x))))   or   y = g⁻¹(f⁻¹(··· f⁻¹(g⁻¹(x)))),


and a similar statement is true with x′ in place of x. Equating the two expressions for y (one in terms of x and one in terms of x′) and applying the appropriate combination of f, g, f⁻¹ and g⁻¹, we can show that x ∈ C_{x′} and x′ ∈ C_x. This is sufficient to conclude C_x = C_{x′}. (d) Let C be a chain in B. Then C ∩ Y is not a subset of f(X), so there exists y ∈ Y with y ∈ C but y ∉ f(X). Note that C and C_y have a point in common, so they must be equal. (e) Note that Y_1 ⊆ f(X). To show f : X_1 → Y_1 is onto we pick a y_1 ∈ Y_1 and show there exists x_1 ∈ X_1 with f(x_1) = y_1. Well, if y_1 ∈ Y_1 ⊆ f(X), we know there exists x_1 ∈ X such that f(x_1) = y_1. But we must be sure that x_1 ∈ X_1. However, C_{x_1} contains y_1, which is an element of some chain in A. Since chains that intersect must be identical, C_{x_1} ⊆ A, and x_1 ∈ X_1. Now we have to show g : Y_2 → X_2 is onto. To do this, we pick x_2 ∈ X_2 and show that there exists y_2 ∈ Y_2 with g(y_2) = x_2. Since x_2 ∈ X_2 ⊆ g(Y) we know there exists y_2 ∈ Y such that g(y_2) = x_2. Now we need to show that y_2 ∈ B, because if y_2 ∈ Y and y_2 ∈ B then y_2 ∈ B ∩ Y = Y_2. We know that C_{x_2} contains y_2, and that x_2 is an element of some chain in B. Since chains that intersect are identical, C_{x_2} ⊆ B, y_2 ∈ B, and hence y_2 ∈ Y_2. Finally, to prove X ∼ Y define h : X → Y by

h(x) = f(x)     if x ∈ X_1,
h(x) = g⁻¹(x)   if x ∈ X_2.

Because X = X_1 ∪ X_2 and f and g⁻¹ are 1–1 and have disjoint ranges on these respective spaces, we get that h is 1–1. Because Y = Y_1 ∪ Y_2 and f and g⁻¹ are respectively onto, it follows that h is onto as well.

1.5

Cantor’s Theorem

Exercise 1.5.1. The function f(x) = (x − 1/2)/(x − x²) is a 1–1, onto mapping from (0, 1) to R. This shows (0, 1) ∼ R, and the result follows using the ideas in Exercise 1.4.9.

Exercise 1.5.2. (a) The real number x = .b_1 b_2 b_3 b_4 ... cannot be equal to f(1) = .a_11 a_12 a_13 a_14 ... because they differ in the first decimal place; i.e., b_1 ≠ a_11. (b) The real number x cannot be equal to f(2) because b_2 ≠ a_22 and they differ in the second decimal place. In general, x ≠ f(n) because b_n ≠ a_nn. (c) Since f is onto, every real number x ∈ (0, 1) should be in the indexed array. However, the specific x we have constructed is not equal to f(n) for any n ∈ N, and hence is not contained in the range of f. This contradicts the assumption that f is onto. We conclude that (0, 1) must be uncountable.

Exercise 1.5.3. (a) If we imitate the proof to try to show that Q is uncountable, we can construct a real number x in the same way. This x will again fail to be in the range of our function f, but there is no reason to expect x to be rational. The decimal expansions for rational numbers either terminate or repeat, and this need not be true of the constructed x.


(b) By using the digits 2 and 3 in our definition of b_n we eliminate the possibility that the point x = .b_1 b_2 b_3 ... has some other possible decimal representation (and thus it cannot exist somewhere in the range of f in a different form).

Exercise 1.5.4. Our proof will have the same structure as that of Cantor's. So let us assume for contradiction that there exists a function f : N → S that is 1–1 and onto. The 1–1 correspondence between N and S can be represented by the following indexed array:

N
1  ←→  f(1) =  .a_11  a_12  a_13  a_14  a_15  a_16  ···
2  ←→  f(2) =  .a_21  a_22  a_23  a_24  a_25  a_26  ···
3  ←→  f(3) =  .a_31  a_32  a_33  a_34  a_35  a_36  ···
4  ←→  f(4) =  .a_41  a_42  a_43  a_44  a_45  a_46  ···
5  ←→  f(5) =  .a_51  a_52  a_53  a_54  a_55  a_56  ···
6  ←→  f(6) =  .a_61  a_62  a_63  a_64  a_65  a_66  ···
⋮               ⋮

where a_mn = 1 or 0 for m, n ∈ N. Now let us define a sequence (x_n) = (x_1, x_2, x_3, ...) ∈ S via

x_n = 0   if a_nn = 1,
x_n = 1   if a_nn = 0.

From this definition we can see that f(1) is not the sequence (x_n) because a_11 is not the same as x_1. Similarly, f(2) ≠ (x_n) since a_22 ≠ x_2. In general, f(n) ≠ (x_n) since a_nn ≠ x_n for all n ∈ N. Because f is onto, all sequences in S should be in the range of f. However, the specific sequence (x_n) we defined above is not equal to f(n) for any n ∈ N. This contradiction implies that the set S is uncountable.

Exercise 1.5.5. (a) P(A) = {∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}}. (b) An induction proof for this fact is given in Exercise 1.2.11. A more combinatorial proof can be obtained by listing the n elements of A. To construct a subset of A, we consider each element and associate either a 'Y' if we decide to include it in our subset or an 'N' if we decide not to include it. Thus, to each subset of A there is an associated sequence of length n of Ys and Ns. This correspondence is 1–1 and onto, and the proof is done by observing there are 2^n such sequences.

Exercise 1.5.6. (a) Given the set A = {a, b, c}, A can be mapped in a 1–1 fashion into P(A) in many ways. For example, we could write (i) a → {a}, b → {a, c}, c → {a, b, c}.


As another example we might say (ii) a → {b, c}, b → ∅, c → {a, c}. (b) An example of a 1–1 mapping from B to P(B) is: 1 → {1}, 2 → {2, 3, 4}, 3 → {1, 2, 4}, 4 → {2, 3}. (c) Because 2^n > n for every n, the power set P(A) simply has too many elements to be mapped into A in a 1–1 fashion.

Exercise 1.5.7. For the example in (a) (i), the set B = {b}. For example (ii) we get B = {a, b}. In part (b) we find B = {3, 4}. In every case, the set B fails to be in the range of the function that we defined.

Exercise 1.5.8. (a) If a′ ∈ B, then by the definition of B we conclude that a′ ∉ f(a′). But f(a′) = B, which means that a′ ∉ B, a contradiction. Thus we must reject the possibility that a′ ∈ B. (b) But now let's assume that a′ ∉ B. Then by the definition of B, a′ ∈ f(a′). Because f(a′) = B, this implies that a′ ∈ B, which is another contradiction. Therefore a′ ∉ B is equally unacceptable. Because it is impossible for a′ to be in neither B nor B^c, the initial assumption that B = f(a′) for some a′ ∈ A must have been false. In other words, such an element a′ does not exist, and the function f : A → P(A) is not onto.

Exercise 1.5.9. (a) The set A of functions from {0, 1} to N is countable. To see this, first observe that A can be put into a 1–1 correspondence with the set of ordered pairs {(m, n) : m, n ∈ N}. To be precise, if f ∈ A, then f is a function from {0, 1} to N, and we can match it up with the ordered pair (m, n) where m = f(0) and n = f(1). To show that {(m, n) : m, n ∈ N} is countable, we can use an argument similar to the proof of Theorem 1.4.11 (i), where we showed that Q is countable. Another approach would be to write

{(m, n) : m, n ∈ N} = ⋃_{n=1}^∞ {(m, n) : m ∈ N}

and use Theorem 1.4.13. (b) This set is uncountable. A function from N to {0, 1} is in fact just a sequence consisting of 0's and 1's, so the set of such functions is precisely the set S from Exercise 1.5.4. (c) The set P(N) does contain an uncountable antichain. To construct such an antichain, let's first let E = {2, 4, ..., 2n, ...} be the even natural numbers


and O = {1, 3, ..., 2n − 1, ...} be the odd natural numbers, enumerated in the standard way. Now consider the set S from Exercise 1.5.4, which we know to be uncountable. For each s = (s_1, s_2, s_3, ...) ∈ S we construct the subset A_s ⊆ N using the rule that 2n ∈ A_s if and only if s_n = 1 and 2n − 1 ∈ A_s if and only if s_n = 0. If s ≠ t, then s_n ≠ t_n for some n, so one of 2n and 2n − 1 belongs to A_s but not to A_t while the other belongs to A_t but not to A_s; hence neither set contains the other. Together with the fact that E and O are disjoint with N = E ∪ O, this shows that the collection {A_s : s ∈ S} is an uncountable antichain.
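The antichain construction of Exercise 1.5.9 (c) can be tested on finite truncations. The sketch below replaces the infinite 0/1 sequences of S by tuples of a fixed length (an assumption made only so that the family is finite and can be enumerated) and checks that no resulting set contains another. The function names are illustrative.

```python
from itertools import product

def subset_for(s):
    """Build A_s for a finite 0/1 tuple s: 2n is included if s_n == 1, otherwise 2n - 1."""
    return frozenset(2 * n if bit == 1 else 2 * n - 1
                     for n, bit in enumerate(s, start=1))

def is_antichain(sets):
    """Return True if no set in the collection is a subset of a different one."""
    sets = list(sets)
    return not any(a <= b
                   for i, a in enumerate(sets)
                   for j, b in enumerate(sets) if i != j)

family = [subset_for(s) for s in product((0, 1), repeat=4)]
print(len(family), is_antichain(family))
```

Each tuple contributes exactly one of 2n and 2n − 1 for every position n, so two different tuples always disagree in both directions at some position, which is exactly why the check succeeds.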


Chapter 2

Sequences and Series

2.1

Discussion: Rearrangements of Infinite Series

2.2

The Limit of a Sequence

Exercise 2.2.1. a) Let ² > 0 be arbitrary. We must show that there exists an N ∈ N such that n ≥ N implies | 6n21+1 − 0| < ². Well, ¯ ¯ ¯ ¯ 1 1 ¯ ¯= − 0 ¯ 6n2 + 1 ¯ 6n2 + 1 , p as this will always be positive. So pick N to satisfy N > 1/6². It then follows that for n ≥ N implies 6n21+1 < ². b) Let ² > 0 be arbitrary. Now we must produce an N so that n ≥ N implies 3n+1 | 2n+5 − 32 | < ². This time notice, ¯ ¯ ¯ 3n + 1 3 ¯ 3 3n + 1 6n + 15 − 6n − 2 13 ¯ ¯ − = . ¯ 2n + 5 2 ¯ = 2 − 2n + 5 = 4n + 10 4n + 10 13 Now pick N such that N > 13−10² 4² , then for n ≥ N it follows that 4n+10 < ². c) Let ² > 0 be arbitrary. We must produce an N so that n ≥ N implies 2 | √n+3 − 0| < ². Well, ¯ ¯ ¯ ¯ 2 ¯= √ 2 . ¯√ − 0 ¯ ¯ n+3 n+3

Pick $N$ to satisfy $N > 4/\epsilon^2 - 3$. It then follows that when $n \ge N$ we get $\frac{2}{\sqrt{n+3}} < \epsilon$, as desired.

Exercise 2.2.2. Consider the sequence $(-\tfrac{1}{2}, \tfrac{1}{2}, -\tfrac{1}{2}, \tfrac{1}{2}, \ldots)$. This sequence verconges to $x = 0$. To see this, note that we only have to produce a single $\epsilon > 0$ where the prescribed condition follows, and in this case we can take $\epsilon = 1$. This $\epsilon$ works because for all $N \in \mathbf{N}$, it is true that $n \ge N$ implies $|x_n - 0| = \tfrac{1}{2} < 1$. This is also an example of a vercongent sequence that is divergent. Notice that the “limit” $x = 0$ is not unique. We could also show this same sequence verconges to $x = 1$ by choosing $\epsilon = 2$.

In general, a vercongent sequence is a bounded sequence. By a bounded sequence, we mean that there exists an $M \ge 0$ satisfying $|x_n| \le M$ for all $n \in \mathbf{N}$. In this case we can always take $x = 0$ and $\epsilon = M + 1$. Then $|x_n - x| = |x_n| < \epsilon$, and the sequence $(x_n)$ verconges to 0.

Exercise 2.2.3. (a) There exists at least one college in the United States where all students are less than seven feet tall.
(b) There exists a college in the United States where all professors gave at least one student a grade of C or less.
(c) At every college in the United States, there is a student less than six feet tall.

Exercise 2.2.4. For any $\epsilon$ that is greater than 1, there exists a response $N$. In this case, $N$ can be any natural number. For any $\epsilon$ that is less than or equal to 1, there exists no suitable response. This is because, although the 1s in the sequence occur less and less frequently as we go out the sequence, there is still no point in the sequence where the sequence enters the neighborhood $(-\epsilon, \epsilon)$ and never leaves.

Exercise 2.2.5. (a) The limit of $(a_n)$ is zero. To show this let $\epsilon > 0$ be arbitrary. We must show that there exists an $N \in \mathbf{N}$ such that $n \ge N$ implies $|[[1/n]] - 0| < \epsilon$. Well, pick $N > 1$. If $n \ge N$ we then have
\[ \left| \left[\left[ \frac{1}{n} \right]\right] - 0 \right| = |0 - 0| < \epsilon, \]
because $[[1/n]] = 0$ for all $n > 1$.
(b) Again the limit of $(a_n)$ is zero. Let $\epsilon > 0$ be arbitrary. By picking $N > 10$ we have that for $n \ge N$,
\[ \left| \left[\left[ \frac{10 + n}{2n} \right]\right] - 0 \right| = |0 - 0| < \epsilon, \]
because $[[(10 + n)/2n]] = 0$ for all $n > 10$.
In these exercises, the choice of $N$ does not depend on $\epsilon$ in the usual way. In part (b) for instance, setting $N = 11$ is a suitable response for every choice of $\epsilon > 0$. Thus, this is a rare example where a smaller $\epsilon > 0$ does not require a larger $N$ in response.

Exercise 2.2.6. (a) Any larger $N$ will also work for the same $\epsilon > 0$.
(b) This same $N$ will also work for any larger value of $\epsilon$.
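The claim in Exercise 2.2.5(b) that $N = 11$ works for every $\epsilon$ can be checked numerically. The following is only an illustrative sketch: the function name `a` is a label chosen here, and the check covers a finite range of $n$, so it supports the argument rather than proving it.

```python
def a(n: int) -> int:
    """Term [[(10 + n) / (2n)]] from Exercise 2.2.5(b).

    Integer floor division gives the greatest integer <= (10 + n) / (2n)
    without any floating-point rounding.
    """
    return (10 + n) // (2 * n)


# The first few terms are nonzero; from n = 11 on, every term is 0.
print([a(n) for n in range(1, 15)])

# Finite sanity check: n >= 11 gives 0, so |a_n - 0| = 0 < epsilon for any epsilon > 0.
assert all(a(n) == 0 for n in range(11, 10_000))
```

Because the terms are exactly zero from $n = 11$ onward, the same $N$ answers every $\epsilon$-challenge, which is the point made in the remark above.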


Exercise 2.2.7. (a) A sequence $(a_n)$ “converges to infinity” if, for every positive number $a$, there exists an $N \in \mathbf{N}$ such that whenever $n \ge N$ it follows that $a_n > a$. Let $a > 0$ be arbitrary. We must show that there exists an $N \in \mathbf{N}$ such that $n \ge N$ implies that $\sqrt{n} > a$. Well, pick $N > a^2$. Then $\sqrt{n} > a$ for all $n \ge N$.
(b) According to the above definition, this sequence does not converge to infinity.

Exercise 2.2.8. (a) The sequence $((-1)^n)$ is frequently in the set $\{1\}$.
(b) Definition (i) is stronger. “Frequently” does not imply “eventually”, but “eventually” implies “frequently”.
(c) A sequence $(a_n)$ converges to a real number $a$ if, given any $\epsilon$-neighborhood $V_\epsilon(a)$ of $a$, $(a_n)$ is eventually in $V_\epsilon(a)$.
(d) Suppose an infinite number of terms of a sequence $(x_n)$ are equal to 2. Then $(x_n)$ is frequently in the interval $(1.9, 2.1)$. However, $(x_n)$ is not necessarily eventually in the interval $(1.9, 2.1)$. Consider the sequence $(2, 0, 2, 0, 2, \ldots)$, for instance.

2.3 The Algebraic and Order Limit Theorems

Exercise 2.3.1. Let $\epsilon > 0$ be arbitrary. We need to show that there exists an $N$ such that when $n \ge N$, $|a_n - a| < \epsilon$. Well, for all $n$, $|a_n - a| = |a - a| = 0 < \epsilon$. So we can choose $N$ to be anything we like.

Exercise 2.3.2. (a) Let $\epsilon > 0$ be arbitrary. We must find an $N$ such that $n \ge N$ implies $|\sqrt{x_n} - 0| < \epsilon$. Because $(x_n) \to 0$, there exists $N \in \mathbf{N}$ such that $n \ge N$ implies $|x_n - 0| = x_n < \epsilon^2$. Using this $N$, we have $(\sqrt{x_n})^2 < \epsilon^2$, which gives $|\sqrt{x_n} - 0| < \epsilon$ for all $n \ge N$, as desired.
(b) Part (a) handles the case $x = 0$, so we may assume $x > 0$. Let $\epsilon > 0$. This time we must find an $N$ such that $|\sqrt{x_n} - \sqrt{x}| < \epsilon$ for all $n \ge N$. Well,
\begin{align*}
|\sqrt{x_n} - \sqrt{x}| &= |\sqrt{x_n} - \sqrt{x}| \left( \frac{\sqrt{x_n} + \sqrt{x}}{\sqrt{x_n} + \sqrt{x}} \right) \\
&= \frac{|x_n - x|}{\sqrt{x_n} + \sqrt{x}} \\
&\le \frac{|x_n - x|}{\sqrt{x}}.
\end{align*}
Now because $(x_n) \to x$ and $x > 0$, we can choose $N$ such that $|x_n - x| < \epsilon\sqrt{x}$ whenever $n \ge N$. And this implies that for all $n \ge N$,
\[ |\sqrt{x_n} - \sqrt{x}| < \frac{\epsilon\sqrt{x}}{\sqrt{x}} = \epsilon, \]
as desired.

Exercise 2.3.3. Let $\epsilon > 0$ be arbitrary. We must show that there exists an $N$ such that $n \ge N$ implies $|y_n - l| < \epsilon$. In terms of $\epsilon$-neighborhoods (which are a bit easier to use in this case), we must equivalently show $y_n \in (l - \epsilon, l + \epsilon)$ for all $n \ge N$. Because $(x_n) \to l$, we can pick an $N_1$ such that $x_n \in (l - \epsilon, l + \epsilon)$ for all $n \ge N_1$. Similarly, because $(z_n) \to l$ we can pick an $N_2$ such that $z_n \in (l - \epsilon, l + \epsilon)$ whenever $n \ge N_2$. Now, because $x_n \le y_n \le z_n$, if we let $N = \max\{N_1, N_2\}$, then it follows that $y_n \in (l - \epsilon, l + \epsilon)$ for all $n \ge N$. This completes the proof.

Exercise 2.3.4. We can prove this directly from the definition of convergence, or by using the Algebraic Limit Theorem.
(i) Proof using the definition of convergence: Let $\epsilon > 0$ be arbitrary. Let's show $|l_1 - l_2| < \epsilon$. We know that $\lim a_n = l_1$, so there exists $N_1 \in \mathbf{N}$ such that $n \ge N_1$ implies $|a_n - l_1| < \epsilon/2$. Similarly, since $\lim a_n = l_2$, there exists $N_2 \in \mathbf{N}$ such that $n \ge N_2$ implies $|a_n - l_2| < \epsilon/2$. Setting $N = \max\{N_1, N_2\}$ gives us that for $n \ge N$,
\begin{align*}
|l_1 - l_2| &= |l_1 - a_n + a_n - l_2| \\
&\le |l_1 - a_n| + |a_n - l_2| \\
&= |a_n - l_1| + |a_n - l_2| \\
&< \epsilon/2 + \epsilon/2 = \epsilon.
\end{align*}
Thus $|l_1 - l_2| < \epsilon$ for every $\epsilon > 0$. By Theorem 1.2.6, $l_1 = l_2$.
(ii) Proof using the Algebraic Limit Theorem: First observe that
\[ \lim(a_n - a_n) = \lim(a_n) - \lim(a_n) = l_1 - l_2. \]
But we also have $\lim(a_n - a_n) = \lim 0 = 0$, and therefore $l_1 - l_2 = 0$, which implies $l_1 = l_2$.

Exercise 2.3.5. ($\Rightarrow$) Let $\epsilon > 0$ be arbitrary. Let's call the limit that $(z_n)$ converges to $L$. Then we need to show that there exists an $N$ such that when $n \ge N$, it follows that $|y_n - L| < \epsilon$. Because $(z_n) \to L$, we can pick $N$ so that $|z_n - L| < \epsilon$ for all $n \ge N$. Because $y_n = z_{2n}$ and $2n \ge n$, it certainly follows that $|y_n - L| < \epsilon$ whenever $n \ge N$. A similar argument holds for the $(x_n)$ sequence.
($\Leftarrow$) Let $\epsilon > 0$ be arbitrary. Again, let $L$ be the common limit of $(x_n)$ and $(y_n)$. We need to show that there exists an $N$ such that when $n \ge N$ it follows that $|z_n - L| < \epsilon$. Choose $N_1$ so that $|x_n - L| < \epsilon$ for all $n \ge N_1$, and choose $N_2$ such that $|y_n - L| < \epsilon$ for all $n \ge N_2$. Finally, let $N = \max\{2N_1, 2N_2\}$, and it follows from the construction of the sequence $(z_n)$ that $|z_n - L| < \epsilon$ whenever $n \ge N$.


Exercise 2.3.6. (a) By the triangle inequality,
\[ |b_n| = |b_n - b + b| \le |b_n - b| + |b|. \]
Thus $|b_n| - |b| \le |b_n - b|$, and in fact $||b_n| - |b|| \le |b_n - b|$. Let $\epsilon > 0$ be arbitrary. Since $(b_n) \to b$, there exists $N \in \mathbf{N}$ such that $|b_n - b| < \epsilon$ whenever $n \ge N$. Therefore, $||b_n| - |b|| \le |b_n - b| < \epsilon$ for all $n \ge N$ as well, which proves $|b_n| \to |b|$.
(b) The converse of (a) is false. Consider $b_n = (-1)^n$. We can see that $|b_n| \to 1$, but $(b_n)$ is divergent.

Exercise 2.3.7. (a) Because $(a_n)$ is bounded, there exists a $K > 0$ satisfying $|a_n| \le K$ for all $n$. Let $\epsilon > 0$ be arbitrary. We need to find an $N$ such that when $n \ge N$ it follows that $|a_n b_n - 0| < \epsilon$. Well,
\[ |a_n b_n - 0| = |a_n||b_n| \le K|b_n|. \]
Because $(b_n) \to 0$, we can pick an $N$ such that $|b_n| < \epsilon/K$ for all $n \ge N$. Finally, we can conclude that for this choice of $N$,
\[ |a_n b_n - 0| \le K|b_n| < K \frac{\epsilon}{K} = \epsilon \]
for all $n \ge N$. Therefore, $(a_n b_n) \to 0$. We may not use the Algebraic Limit Theorem in this case because the hypothesis of that theorem requires that both $(a_n)$ and $(b_n)$ be convergent. (And this may not be so for $(a_n)$.)
(b) No, for instance if $(a_n) = (1, -1, 1, -1, \ldots)$, then $(a_n b_n)$ will not converge.
(c) All convergent sequences are bounded. Therefore, if $(a_n) \to a$ and $(b_n) \to 0$, then by part (a), $(a_n b_n) \to 0$.

Exercise 2.3.8. (a) Consider $(x_n) = (-1, 1, -1, 1, \ldots)$ and $(y_n) = (1, -1, 1, -1, \ldots)$. Both sequences diverge but $(x_n + y_n)$ converges.
(b) Such a request is impossible because by the Algebraic Limit Theorem, if $(x_n + y_n)$ converges to $l$ and $(x_n)$ converges to $x$, then
\[ \lim(y_n) = \lim(y_n + x_n - x_n) = \lim(x_n + y_n) - \lim(x_n) = l - x. \]
So $(y_n)$ must also converge.
(c) Consider the sequence $(b_n) = (1, \tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{4}, \ldots)$. To prevent the sequence $(1/b_n)$ from “converging to infinity” we could also add alternating negative signs.


(d) Such a request is impossible. By Theorem 2.3.2, $(b_n)$ is bounded. If $(a_n - b_n)$ were bounded, then we could show that $(a_n) = (a_n - b_n) + (b_n)$ would also have to be bounded, which is not the case. Thus, $(a_n - b_n)$ is unbounded.
(e) Take $(a_n) = 1/n$ and $(b_n) = (-1)^n$. Such a request would be impossible if we were given that $\lim a_n \ne 0$.

Exercise 2.3.9. No, the Order Limit Theorem (Theorem 2.3.4) does not remain valid if the inequalities are assumed to be strict (in the conclusion of the theorem). For example, the sequence $(1/n)$ converges to zero although every term is strictly positive.

Exercise 2.3.10. Let $\epsilon > 0$ be arbitrary. We want to produce an $N$ such that for every $n \ge N$, $|b_n - b| < \epsilon$. Because $(a_n) \to 0$, there exists an $N \in \mathbf{N}$ such that for every $n \ge N$, $a_n = |a_n - 0| < \epsilon$. Using this same $N$, we have $|b_n - b| \le a_n < \epsilon$ whenever $n \ge N$. Therefore $(b_n) \to b$.

Exercise 2.3.11. Let $\epsilon > 0$ be arbitrary. Then we need to find an $N$ such that $n \ge N$ implies $|y_n - L| < \epsilon$. Because $(x_n) \to L$, the sequence is bounded, so there exists $M > 0$ such that $|x_n - L| < M$ for all $n$. Also, there exists an $N_1$ such that $n \ge N_1$ implies $|x_n - L| < \epsilon/2$. Now for $n \ge N_1$ we can write
\begin{align*}
|y_n - L| &= \left| \frac{x_1 + x_2 + \cdots + x_{N_1} + \cdots + x_n}{n} - \frac{nL}{n} \right| \\
&= \left| \frac{(x_1 - L) + \cdots + (x_{N_1 - 1} - L)}{n} + \frac{(x_{N_1} - L) + \cdots + (x_n - L)}{n} \right| \\
&\le \left| \frac{(x_1 - L) + \cdots + (x_{N_1 - 1} - L)}{n} \right| + \left| \frac{(x_{N_1} - L) + \cdots + (x_n - L)}{n} \right| \\
&\le \frac{(N_1 - 1)M}{n} + \frac{\epsilon(n - N_1 + 1)}{2n}.
\end{align*}
(The second group contains $n - N_1 + 1$ terms, each smaller than $\epsilon/2$ in absolute value.) Because $N_1$ and $M$ are fixed constants at this point, we may choose $N_2$ so that $\frac{(N_1 - 1)M}{n} < \epsilon/2$ for all $n \ge N_2$. Finally, let $N = \max\{N_1, N_2\}$ be the desired $N$. To see that this works, keep in mind that $\frac{n - N_1 + 1}{n} \le 1$ and observe
\[ |y_n - L| \le \frac{(N_1 - 1)M}{n} + \frac{\epsilon(n - N_1 + 1)}{2n} < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon \]
for all $n \ge N$. This completes the proof.
The sequence $(x_n) = (1, -1, 1, -1, \ldots)$ does not converge, but the averages satisfy $(y_n) \to 0$.
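The closing example of Exercise 2.3.11 can be explored numerically. The sketch below computes the running averages $y_n = (x_1 + \cdots + x_n)/n$ for the alternating sequence; the helper name `cesaro_means` is chosen here for illustration, and a finite computation only suggests the limit, it does not prove it.

```python
from itertools import accumulate


def cesaro_means(xs):
    """Return the running averages y_n = (x_1 + ... + x_n) / n."""
    return [s / n for n, s in enumerate(accumulate(xs), start=1)]


# x_n = 1, -1, 1, -1, ... (divergent), for n = 1, ..., 1000
xs = [(-1) ** (n + 1) for n in range(1, 1001)]
ys = cesaro_means(xs)

# y_n equals 1/n for odd n and 0 for even n, so the averages shrink toward 0.
print(ys[:6])
print(max(abs(y) for y in ys[100:]))
```

For odd $n$ the partial sum is 1 and for even $n$ it is 0, so $|y_n| \le 1/n$, which is consistent with $(y_n) \to 0$ even though $(x_n)$ itself diverges.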

Exercise 2.3.12. (a) Intuitively speaking, $\lim_{m,n\to\infty} a_{m,n}$ should be a number that is arbitrarily close to the values of $a_{m,n}$ when $m$ and $n$ are both large. However, with two index variables, this raises the question of whether we insist that the variables be large simultaneously or whether we allow them to “go to infinity” one at a time in an iterated fashion. The “iterated” limit $\lim_{n\to\infty} \lim_{m\to\infty} a_{m,n}$ is the limit of the sequence of the limits of the columns in the doubly indexed array $a_{m,n}$. To compute this, first fix $n \in \mathbf{N}$ and let
\[ b_n = \lim_{m\to\infty} a_{m,n} = \lim_{m\to\infty} \frac{1}{1 + n/m} = \frac{1}{1 + 0} = 1. \]
Thus
\[ \lim_{n\to\infty} \lim_{m\to\infty} a_{m,n} = \lim_{n\to\infty} b_n = \lim_{n\to\infty} 1 = 1. \]
In the other order, we first fix $m$ and compute the limit along each row of the $a_{m,n}$ array to get
\[ \lim_{m\to\infty} \lim_{n\to\infty} a_{m,n} = \lim_{m\to\infty} \left( \lim_{n\to\infty} \frac{m/n}{(m/n) + 1} \right) = \lim_{m\to\infty} \frac{0}{0 + 1} = 0. \]
From this example we can see that it is possible for
\[ \lim_{m\to\infty} \lim_{n\to\infty} a_{m,n} \ne \lim_{n\to\infty} \lim_{m\to\infty} a_{m,n}, \]
and so defining doubly indexed limits in this fashion would be problematic to say the least.
(b) A doubly indexed array $(a_{m,n})$ satisfies
\[ \lim_{m,n\to\infty} a_{m,n} = l \]
if for every positive number $\epsilon$, there exists an $N \in \mathbf{N}$ such that whenever $n, m \ge N$ it follows that $|a_{m,n} - l| < \epsilon$.
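A quick numerical look at the two iterated limits in Exercise 2.3.12(a) may help. This is a sketch under the assumption, read off from the two computations above, that $a_{m,n} = m/(m+n)$; the stand-in value `big` plays the role of the index sent to infinity and is an arbitrary choice.

```python
def a(m: int, n: int) -> float:
    """Assumed array entry a_{m,n} = m / (m + n) = 1 / (1 + n/m)."""
    return m / (m + n)


big = 10**9  # stand-in for "the inner index tends to infinity"

# Fix n, let m grow: each column limit is close to 1.
column_limits = [a(big, n) for n in (1, 10, 100)]

# Fix m, let n grow: each row limit is close to 0.
row_limits = [a(m, big) for m in (1, 10, 100)]

# Along the diagonal m = n every entry equals 1/2.
diagonal = [a(k, k) for k in (1, 10, 100)]

print(column_limits, row_limits, diagonal)
```

The diagonal values sit at $1/2$, strictly between the two iterated limits, which illustrates why the simultaneous definition in part (b) is a genuinely different requirement from either iterated limit.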

2.4 The Monotone Convergence Theorem and a First Look at Infinite Series

Exercise 2.4.1. We will show that if $\sum_{n=0}^{\infty} 2^n b_{2^n}$ diverges then $\sum_{n=1}^{\infty} b_n$ diverges by again exploiting a relationship between the partial sums
\[ s_m = b_1 + b_2 + \cdots + b_m \quad \text{and} \quad t_k = b_1 + 2b_2 + \cdots + 2^k b_{2^k}. \]
Because $\sum_{n=0}^{\infty} 2^n b_{2^n}$ diverges, its monotone sequence of partial sums $(t_k)$ must be unbounded. To show that $(s_m)$ is unbounded it is enough to show that for all $k \in \mathbf{N}$, there is a term $s_m$ satisfying $s_m \ge t_k/2$. This argument is similar to the one for the forward direction, only to get the inequality to go the other way we group the terms in $s_m$ so that the last (and hence smallest) term in each group is of the form $b_{2^k}$. Given an arbitrary $k$, we focus our attention on $s_{2^k}$ and observe that
\begin{align*}
s_{2^k} &= b_1 + b_2 + (b_3 + b_4) + (b_5 + b_6 + b_7 + b_8) + \cdots + (b_{2^{k-1}+1} + \cdots + b_{2^k}) \\
&\ge b_1 + b_2 + (b_4 + b_4) + (b_8 + b_8 + b_8 + b_8) + \cdots + (b_{2^k} + \cdots + b_{2^k}) \\
&= b_1 + b_2 + 2b_4 + 4b_8 + \cdots + 2^{k-1} b_{2^k} \\
&= \tfrac{1}{2} \left( 2b_1 + 2b_2 + 4b_4 + 8b_8 + \cdots + 2^k b_{2^k} \right) \\
&= b_1/2 + t_k/2.
\end{align*}
Because $(t_k)$ is unbounded, the sequence $(s_m)$ must also be unbounded and cannot converge. Therefore, $\sum_{n=1}^{\infty} b_n$ diverges.

Exercise 2.4.2. (a) We will show that this sequence is decreasing and bounded. First, let's use induction to show that this sequence is decreasing. Observe that $x_1 = 3 > 1 = x_2$. Now, we need to prove that if $x_n > x_{n+1}$, then $x_{n+1} > x_{n+2}$. Well, $x_n > x_{n+1}$ implies that $-x_n < -x_{n+1}$. Adding 4 to both sides of the inequality gives $4 - x_n < 4 - x_{n+1}$. It follows that
\[ \frac{1}{4 - x_n} > \frac{1}{4 - x_{n+1}}, \]
which is precisely what we need to conclude $x_{n+1} > x_{n+2}$. Thus by induction, $(x_n)$ is decreasing.
The argument above shows that $(x_n)$ is bounded above by 3, so now we'll show that $(x_n)$ is bounded below. Clearly $x_1 > 0$. Now assume $x_n > 0$. Because $(x_n)$ is decreasing, we know that $x_n \le x_1 = 3$, which implies that $x_{n+1} = \frac{1}{4 - x_n}$ is positive. By induction, $(x_n)$ is bounded below by 0 for all $n \in \mathbf{N}$. Therefore this sequence converges by the Monotone Convergence Theorem.
(b) Since the sequence $(x_{n+1})$ is just the sequence $(x_n)$ shifted by 1 (and without the first term), the two sequences have the same limit.
(c) From (b), we can let $x = \lim(x_n) = \lim(x_{n+1})$. Now the Algebraic Limit Theorem tells us that
\[ x = \lim x_{n+1} = \lim \frac{1}{4 - x_n} = \frac{1}{4 - x}, \]
and it follows that $x$ must satisfy the equation $x^2 - 4x + 1 = 0$. Solving the equation gives
\[ x = \frac{4 \pm \sqrt{16 - 4}}{2} = 2 \pm \sqrt{3}, \]
and since $x_1 = 3$ and $(x_n)$ is decreasing, we conclude that $x = 2 - \sqrt{3}$.
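The recursion in Exercise 2.4.2 is easy to iterate by machine. The sketch below uses the data quoted in the solution ($x_1 = 3$, $x_{n+1} = 1/(4 - x_n)$); the function name `iterate` and the step count are arbitrary choices made here, and floating-point output only illustrates the limit found in part (c).

```python
import math


def iterate(x1: float, steps: int):
    """Return [x_1, ..., x_{steps+1}] for x_{n+1} = 1 / (4 - x_n)."""
    xs = [x1]
    for _ in range(steps):
        xs.append(1.0 / (4.0 - xs[-1]))
    return xs


xs = iterate(3.0, 30)
limit = 2 - math.sqrt(3)

print(xs[:5])            # early terms decrease: 3, 1, 1/3, 3/11, ...
print(xs[-1], limit)     # the tail agrees with 2 - sqrt(3) to rounding error
```

The other root $2 + \sqrt{3}$ is larger than $x_1 = 3$, so a decreasing sequence starting at 3 cannot approach it, which matches the choice made in the solution.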

Exercise 2.4.3. First, $y_1 = 1 < 3 = y_2$. To use induction to prove that $(y_n)$ is increasing we assume $y_n < y_{n+1}$ and show that $y_{n+1} < y_{n+2}$. Starting with the inequality $y_n < y_{n+1}$, we take reciprocals to get $1/y_n > 1/y_{n+1}$. Then multiplying by $-1$ and adding 4 to each side gives $4 - 1/y_n < 4 - 1/y_{n+1}$, which is precisely the desired statement $y_{n+1} < y_{n+2}$.
Now we know that $(y_n)$ is increasing and bounded below by $y_1 = 1$. Because the terms in $(y_n)$ are all positive, it follows that $y_n < 4$ for all $n \in \mathbf{N}$ and our increasing sequence is bounded above. Thus, by the Monotone Convergence Theorem we may set $y = \lim y_n = \lim y_{n+1}$. Taking limits across the recursive equation for $y_{n+1}$ gives
\[ y = \lim y_{n+1} = \lim(4 - 1/y_n) = 4 - 1/y, \]
which implies that $y$ satisfies $y^2 - 4y + 1 = 0$. A little algebra yields $y = \sqrt{3} + 2$. (The other root, $2 - \sqrt{3}$, is excluded because $y \ge y_1 = 1$.)

Exercise 2.4.4. We will show that this sequence is increasing and bounded. First rewrite the sequence in a recursive way: $x_1 = \sqrt{2}$, $x_{n+1} = \sqrt{2x_n}$. Let's prove that the sequence is increasing by induction. For the base case we observe that
\[ x_1 = \sqrt{2} < \sqrt{2\sqrt{2}} = x_2, \]
so we just need to prove that $x_n < x_{n+1}$ implies $x_{n+1} < x_{n+2}$. But if $x_n < x_{n+1}$ then $\sqrt{x_n} < \sqrt{x_{n+1}}$, and multiplying by $\sqrt{2}$ gives $\sqrt{2x_n} < \sqrt{2x_{n+1}}$. Thus we have $x_{n+1} < x_{n+2}$ and the sequence is increasing.
To show the sequence is bounded above by 2 we first observe that $x_1 < 2$. Now if $x_n < 2$, then $x_{n+1} = \sqrt{2x_n} < \sqrt{2 \cdot 2} = 2$ as well, and $(x_n)$ is bounded. Therefore this sequence converges by the Monotone Convergence Theorem and we can assert that both $(x_n)$ and $(x_{n+1})$ converge to some real number $l$. Taking limits across the recursive equation $x_{n+1} = \sqrt{2x_n}$ yields $l = \sqrt{2l}$, so $l^2 = 2l$, and because $l \ge x_1 > 0$ this implies $l = 2$.
We should note that the last steps in this problem involved taking the limit inside a square root sign, and this is not a manipulation that is justified by the Algebraic Limit Theorem. Instead we should reference Exercise 2.3.2 to support this part of the argument.

Exercise 2.4.5. (a) We first observe that a simple induction argument shows that $x_n$ is positive for all $n$. We can also write
\[ x_{n+1}^2 - 2 = \left( \frac{1}{2} \left( x_n + \frac{2}{x_n} \right) \right)^2 - 2 = \frac{x_n^2}{4} + \frac{1}{x_n^2} - 1 = \left( \frac{x_n}{2} - \frac{1}{x_n} \right)^2 \ge 0, \]
as any number squared is nonnegative. This shows that $x_n^2 \ge 2$ for all choices of $n$. (It's worth mentioning that this part of the argument, and the next, is not by induction.)
Now let's argue that $(x_n)$ is decreasing. If we write
\[ x_n - x_{n+1} = x_n - \frac{1}{2} \left( x_n + \frac{2}{x_n} \right) = \frac{1}{2} x_n - \frac{1}{x_n} = \frac{x_n^2 - 2}{2x_n}, \]
then we can see that $x_n - x_{n+1}$ is nonnegative because $x_n^2 \ge 2$.
Because we have shown that $(x_n)$ is decreasing and bounded below, we may set $x = \lim x_n = \lim x_{n+1}$. Taking limits across the recursive equation we find
\[ x = \lim_{n\to\infty} x_{n+1} = \lim_{n\to\infty} \frac{1}{2} \left( x_n + \frac{2}{x_n} \right) = \frac{1}{2} \left( x + \frac{2}{x} \right) = \frac{x}{2} + \frac{1}{x}, \]
which implies $x^2 = 2$, and because $x > 0$ we get $x = \sqrt{2}$.
(b) The sequence
\[ x_{n+1} = \frac{1}{2} \left( x_n + \frac{c}{x_n} \right) \]
converges to $\sqrt{c}$ using a similar argument.
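The recursion of Exercise 2.4.5 can be run directly to watch both facts proved above: the terms decrease and their squares stay at least $c$. This is a minimal sketch; the function name `babylonian`, the starting values, and the step counts are choices made here for illustration rather than anything fixed by the exercise.

```python
import math


def babylonian(c: float, x1: float, steps: int):
    """Return [x_1, ..., x_{steps+1}] for x_{n+1} = (x_n + c / x_n) / 2.

    Assumes c > 0 and x1 > 0, matching the positivity used in the proof.
    """
    xs = [x1]
    for _ in range(steps):
        x = xs[-1]
        xs.append(0.5 * (x + c / x))
    return xs


xs = babylonian(2.0, 2.0, 6)      # part (a): c = 2, starting from x_1 = 2
print(xs)
print(xs[-1] - math.sqrt(2))      # tiny after only a handful of steps

ys = babylonian(9.0, 9.0, 10)     # part (b) with the sample value c = 9
print(ys[-1])
```

The early terms are 2, 1.5, 1.41666..., and each square stays above 2, in line with the inequality $x_{n+1}^2 - 2 \ge 0$ derived in part (a).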

Exercise 2.4.6. (a) For each $n \in \mathbf{N}$, set $A_n = \{a_k : k \ge n\}$ so that $y_n = \sup A_n$. Because $A_{n+1} \subseteq A_n$ it follows (by Exercise 1.3.4) that $y_{n+1} \le y_n$ and so $(y_n)$ is decreasing. If $L$ is a lower bound for $(a_n)$, then for all $n \in \mathbf{N}$ it must be that $y_n \ge a_n \ge L$. Thus $(y_n)$ is both decreasing and bounded, and it follows from the Monotone Convergence Theorem that $(y_n)$ converges.
(b) Define the limit inferior of $(a_n)$ as
\[ \liminf a_n = \lim z_n, \quad \text{where } z_n = \inf\{a_k : k \ge n\}. \]
The sequence $(z_n)$ is increasing (because we are taking the greatest lower bound of a smaller set each time) and bounded above (because $(a_n)$ is bounded). Thus $(z_n)$ converges by MCT.
(c) For each $n \in \mathbf{N}$ we have $y_n \ge z_n$, so by the Order Limit Theorem (Theorem 2.3.4) $\lim y_n \ge \lim z_n$. This shows $\limsup a_n \ge \liminf a_n$ for every bounded sequence. The sequence $(a_n) = (1, 0, 1, 0, 1, 0, \ldots)$ has $\limsup a_n = 1$ and $\liminf a_n = 0$. Notice that this sequence is not convergent.
(d) First let's prove that if $\lim y_n = \lim z_n = l$, then $\lim a_n = l$ as well. Let $\epsilon > 0$. There exists an $N \in \mathbf{N}$ such that $y_n \in V_\epsilon(l)$ and $z_n \in V_\epsilon(l)$ for all $n \ge N$. Because $z_n \le a_n \le y_n$, it must also be the case that $a_n \in V_\epsilon(l)$ for all $n \ge N$. Therefore $\lim a_n$ exists and is equal to $l$.
Next, let's show that if $\lim a_n = l$, then $\lim y_n = l$. (The proof that $\lim z_n = l$ is similar.) Let $\epsilon > 0$ be arbitrary. Because $\lim a_n = l$, there exists an $N \in \mathbf{N}$ such that $n \ge N$ implies $a_n \in V_\epsilon(l)$. This means that $l - \epsilon$ and $l + \epsilon$ are lower and upper bounds for the set $\{a_n, a_{n+1}, a_{n+2}, \ldots\}$. It follows that $l - \epsilon \le y_n \le l + \epsilon$ for all $n \ge N$. Keeping in mind that we already know $y = \lim y_n$ exists, we can use the Order Limit Theorem to assert that $l - \epsilon \le y \le l + \epsilon$, and because $\epsilon$ is arbitrary we must have $y = l$. (Theorem 1.2.6 could be referenced in this last step.)
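To see the monotone behavior of $(y_n)$ and $(z_n)$ from Exercise 2.4.6 on a concrete case, the sketch below uses the sample sequence $a_n = (-1)^n (1 + 1/n)$, which is an illustration chosen here and not part of the exercise. A computer can only inspect a finite window of terms; for this particular sequence the tail supremum and infimum are attained at the first even and first odd index of the tail, so the window values are exact, but that is a special feature of the example.

```python
from fractions import Fraction


def a(n: int) -> Fraction:
    """Sample bounded sequence a_n = (-1)^n * (1 + 1/n) (an assumption for illustration)."""
    return (-1) ** n * (1 + Fraction(1, n))


K = 200                                   # size of the finite window
terms = [a(k) for k in range(1, K + 1)]

# y[i] and z[i] stand for y_n and z_n with n = i + 1, computed over terms n..K.
y = [max(terms[i:]) for i in range(K - 1)]
z = [min(terms[i:]) for i in range(K - 1)]

print(y[0], y[-1])   # starts at 3/2 and decreases toward 1
print(z[0], z[-1])   # starts at -2 and increases toward -1
```

Here $(y_n)$ decreases toward 1 and $(z_n)$ increases toward $-1$, so $\limsup a_n = 1 > -1 = \liminf a_n$, and the sequence does not converge, in agreement with part (d).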

2.5 Subsequences and the Bolzano–Weierstrass Theorem

Exercise 2.5.1. Assume $(a_n) \to L$ and let $(a_{n_j})$ be a subsequence of $(a_n)$. We must show $(a_{n_j}) \to L$ as well. Let $\epsilon > 0$ be arbitrary. We need to produce an $N \in \mathbf{N}$ such that $j \ge N$ implies $|a_{n_j} - L| < \epsilon$. Because $(a_n) \to L$ we know there exists an $N$ such that $|a_n - L| < \epsilon$ for $n \ge N$. But $n_j \ge j$, so this same $N$ works for the subsequence as well. To be precise, $j \ge N$ implies $n_j \ge N$, and so $|a_{n_j} - L| < \epsilon$ as desired.

Exercise 2.5.2. Letting $s_n = a_1 + a_2 + \cdots + a_n$, we are given that $\lim s_n = L$. For the regrouped series, let's write
\begin{align*}
b_1 &= a_1 + a_2 + \cdots + a_{n_1}, \\
b_2 &= a_{n_1 + 1} + a_{n_1 + 2} + \cdots + a_{n_2}, \\
&\;\;\vdots \\
b_m &= a_{n_{m-1} + 1} + \cdots + a_{n_m},
\end{align*}
and the claim is that the series $\sum_{m=1}^{\infty} b_m$ converges to $L$ as well. To prove this, just observe that if $(t_m)$ is the sequence of partial sums for the regrouped series, then
\begin{align*}
t_m &= b_1 + b_2 + \cdots + b_m \\
&= (a_1 + \cdots + a_{n_1}) + \cdots + (a_{n_{m-1} + 1} + \cdots + a_{n_m}) \\
&= s_{n_m},
\end{align*}
which means that $(t_m)$ is a subsequence of $(s_n)$ and therefore converges to $L$ by Theorem 2.5.2.

Exercise 2.5.3. (a) $(1/2, 1/2, 1/3, 2/3, 1/4, 3/4, 1/5, 4/5, \ldots, 1/n, (n-1)/n, \ldots)$
(b) Impossible. This convergent subsequence would then be bounded; however, this would imply that the original sequence was also bounded. Because the original sequence is monotone, we know it cannot be bounded because we are told it diverges.
(c) The sequence
\[ \left( 1, 1, \tfrac{1}{2}, 1, \tfrac{1}{2}, \tfrac{1}{3}, 1, \tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{4}, 1, \tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{4}, \tfrac{1}{5}, 1, \ldots \right) \]
has this property. Notice that there is also a subsequence converging to 0. We shall see that this is unavoidable.
(d) $(1, 1, 2, 1, 3, 1, 4, 1, 5, 1, \ldots)$
(e) Impossible. Theorem 2.5.5 guarantees us that all bounded sequences have convergent subsequences.

Exercise 2.5.4. Let's assume, for contradiction, that $(a_n)$ does not converge to $a$. Paying close attention to the quantifiers in the definition of convergence for a sequence, what this means is that there exists an $\epsilon_0 > 0$ such that for every $N \in \mathbf{N}$ we can find an $n \ge N$ for which $|a - a_n| \ge \epsilon_0$.
Using this, we can build a subsequence of $(a_n)$ that never enters the $\epsilon$-neighborhood $V_{\epsilon_0}(a)$. To see how, first pick $n_1$ so that $|a - a_{n_1}| \ge \epsilon_0$. Next choose $n_2 > n_1$ so that $|a - a_{n_2}| \ge \epsilon_0$. Because our negated definition says that “...for every $N \in \mathbf{N}$, we can find an $n \ge N$...” we can be sure that having chosen $n_j$, we may pick $n_{j+1} > n_j$ so that $|a - a_{n_{j+1}}| \ge \epsilon_0$.
Because $(a_n)$ is bounded, the resulting subsequence $(a_{n_j})$ must be bounded as well. Now apply the Bolzano–Weierstrass Theorem to $(a_{n_j})$ to say that there exists a convergent subsequence (of $(a_{n_j})$ and hence also of $(a_n)$) which we will write as $(a_{n_{j_k}})$. By hypothesis, this convergent subsequence must converge to $a$, but therein lies the contradiction. Because $(a_{n_{j_k}})$ is a subsequence of $(a_{n_j})$, it never enters the neighborhood $V_{\epsilon_0}(a)$ and it cannot converge to $a$. This completes the proof.

Exercise 2.5.5. From Example 2.5.3 we know that this is true for $0 < b < 1$, and the case $b = 0$ is immediate. For $-1 < b < 0$, let $\epsilon > 0$ be arbitrary and set $a = |b|$. Because we know $(a^n) \to 0$ (by Example 2.5.3), we may choose $N$ so that $n \ge N$ implies $|a^n - 0| < \epsilon$. But this $N$ will also work for the sequence $(b^n)$ because $|b^n - 0| = |b^n| = |a^n| < \epsilon$ whenever $n \ge N$.

Exercise 2.5.6. Because $(a_n)$ is bounded, the set $S$ is not empty and bounded above. By AoC, we know there exists an $s \in \mathbf{R}$ satisfying $s = \sup S$.
For a fixed $k \in \mathbf{N}$ consider $s - 1/k$. Because $s$ is the least upper bound, $s - 1/k$ is not an upper bound and there exists a point $s' \in S$ satisfying $s - 1/k < s'$. A quick look at the definition of $S$ then shows that, in fact, $s - 1/k \in S$ and consequently there exist an infinite number of terms $a_n$ satisfying $s - 1/k < a_n$.
Because $s$ is an upper bound for $S$ we can be sure that $s + 1/k \notin S$, from which we can conclude that there are only a finite number of terms $a_n$ satisfying $s + 1/k < a_n$.
Taken together, these observations show that for all $k \in \mathbf{N}$, there are an infinite number of terms $a_n$ satisfying
\[ s - \frac{1}{k} < a_n \le s + \frac{1}{k}. \]
To inductively build our convergent subsequence $(a_{n_k})$, first pick $a_{n_1}$ to satisfy $s - 1 < a_{n_1} \le s + 1$. Now given that we have constructed $a_{n_k}$, choose $n_{k+1} > n_k$ so that
\[ s - \frac{1}{k+1} < a_{n_{k+1}} \le s + \frac{1}{k+1}. \]
(Here we are using the fact that this inequality is satisfied by an infinite number of terms $a_n$ and so there is certainly one where $n > n_k$.)
To show $(a_{n_k}) \to s$, we let $\epsilon > 0$ be arbitrary and choose $K > 1/\epsilon$. If $k \ge K$ then $1/k < \epsilon$, which implies $s - \epsilon < a_{n_k} < s + \epsilon$, and the proof is complete.

2.6 The Cauchy Criterion

Exercise 2.6.1. (a) $(1, -1/2, 1/3, -1/4, 1/5, -1/6, \ldots)$
(b) $(1, 2, 3, 4, 5, 6, \ldots)$
(c) Impossible. If a sequence is Cauchy then Theorem 2.6.4 tells us that it converges, and Theorem 2.5.2 says that subsequences of convergent sequences converge.
(d) $(1, 1, 1/2, 2, 1/3, 3, 1/4, 4, 1/5, 5, \ldots)$

Exercise 2.6.2. Assume $(x_n)$ converges to $x$, and let $\epsilon > 0$ be arbitrary. Because $(x_n) \to x$, there exists $N \in \mathbf{N}$ such that $n, m \ge N$ implies $|x_n - x| < \epsilon/2$ and $|x_m - x| < \epsilon/2$. By the triangle inequality,
\begin{align*}
|x_n - x_m| &= |x_n - x + x - x_m| \\
&\le |x_n - x| + |x_m - x| \\
&< \frac{\epsilon}{2} + \frac{\epsilon}{2} \\
&= \epsilon.
\end{align*}
Therefore, $|x_n - x_m| < \epsilon$ whenever $n, m \ge N$, and $(x_n)$ is a Cauchy sequence.

Exercise 2.6.3. (a) The difference is that this definition only requires that the difference between consecutive elements become arbitrarily small, whereas the real Cauchy property requires that any two elements beyond a certain point in the sequence differ by an arbitrarily small amount.
(b) $(1), (1 + 1/2), (1 + 1/2 + 1/3), (1 + 1/2 + 1/3 + 1/4), \ldots$ This is the sequence of partial sums for the harmonic series $\sum 1/n$, which we have seen diverges even though $s_{n+1} - s_n = 1/(n+1)$ goes to zero.

Exercise 2.6.4. Let $\epsilon > 0$ be arbitrary. We know that there exists an $N_1 \in \mathbf{N}$ such that $n, m \ge N_1$ implies $|a_n - a_m| < \epsilon/2$. Also, we know there exists an $N_2 \in \mathbf{N}$ such that $n, m \ge N_2$ implies $|b_n - b_m| < \epsilon/2$. Set $N = \max\{N_1, N_2\}$. By the triangle inequality and its variation in Exercise 1.2.5 (b),
\begin{align*}
|c_n - c_m| &= ||a_n - b_n| - |a_m - b_m|| \\
&\le |(a_n - b_n) - (a_m - b_m)| \\
&= |(a_n - a_m) + (b_m - b_n)| \\
&\le |a_n - a_m| + |b_m - b_n| \\
&< \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon,
\end{align*}
whenever $n, m \ge N$. Therefore $(c_n)$ is a Cauchy sequence.

Exercise 2.6.5. (a) Let $\epsilon > 0$ be arbitrary. We need to find an $N$ so that $n, m \ge N$ implies $|(x_n + y_n) - (x_m + y_m)| < \epsilon$. Because $(x_n)$ and $(y_n)$ are Cauchy we can pick $N$ so that when $n, m \ge N$ it follows that $|x_n - x_m| < \epsilon/2$ and $|y_n - y_m| < \epsilon/2$. Now write
\[ |(x_n + y_n) - (x_m + y_m)| \le |x_n - x_m| + |y_n - y_m| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon. \]
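The harmonic partial sums of Exercise 2.6.3(b) show concretely why shrinking consecutive gaps are not enough for the Cauchy property. The sketch below uses exact rational arithmetic; the function name `harmonic` and the tested ranges are choices made here for illustration.

```python
from fractions import Fraction


def harmonic(n: int) -> Fraction:
    """Exact partial sum s_n = 1 + 1/2 + ... + 1/n."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


n = 50
gap = harmonic(n + 1) - harmonic(n)     # consecutive gap: exactly 1/(n+1)
block = harmonic(2 * n) - harmonic(n)   # the n terms 1/(n+1), ..., 1/(2n)

print(gap, float(block))
```

Each of the $n$ terms in `block` is at least $1/(2n)$, so $s_{2n} - s_n \ge 1/2$ for every $n$. No single $N$ can therefore force $|s_n - s_m| < 1/2$ for all $n, m \ge N$, even though the consecutive gaps tend to zero.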


(b) Let $\epsilon > 0$ be arbitrary. We must produce an $N$ such that $n, m \ge N$ implies $|x_n y_n - x_m y_m| < \epsilon$. Note that
\begin{align*}
|x_n y_n - x_m y_m| &= |x_n y_n - x_n y_m + x_n y_m - x_m y_m| \\
&\le |x_n y_n - x_n y_m| + |x_n y_m - x_m y_m| \\
&= |x_n||y_n - y_m| + |y_m||x_n - x_m|.
\end{align*}
Because $(x_n)$ and $(y_n)$ are Cauchy, we know by Lemma 2.6.3 that they are bounded. So let $K \ge |x_n|$ and $L \ge |y_m|$ for all $m, n$, with $K, L > 0$. We also know that we can pick $N_1$ such that $m, n \ge N_1$ implies $|x_n - x_m| < \frac{\epsilon}{2L}$. Similarly, pick $N_2$ so that $m, n \ge N_2$ implies $|y_n - y_m| < \frac{\epsilon}{2K}$. Now let $N = \max\{N_1, N_2\}$. Then for $m, n \ge N$ it follows that
\[ |x_n y_n - x_m y_m| \le |x_n||y_n - y_m| + |y_m||x_n - x_m| < K \frac{\epsilon}{2K} + L \frac{\epsilon}{2L} = \epsilon. \]
Exercise 2.6.6. (a) Let $A$ be a non-empty set that is bounded above, and let $b_1$ be an upper bound for $A$. Next choose a real number $a_1$ that is not an upper bound for $A$. Necessarily, $a_1 < b_1$ and we can let $I_1$ be the closed interval $I_1 = [a_1, b_1]$. Our goal in this proof is to show that the set $A$ has a least upper bound, and our major tool for getting there is the Nested Interval Property.
With an eye toward using NIP, bisect the interval $I_1$ and let $c_1$ be the midpoint. If $c_1$ is an upper bound for $A$, then set $b_2 = c_1$ and $a_2 = a_1$. If $c_1$ is not an upper bound for $A$, then set $b_2 = b_1$ and $a_2 = c_1$. Letting $I_2 = [a_2, b_2]$, we see that in either case described the left endpoint $a_2$ is not an upper bound for $A$ while the right endpoint $b_2$ is an upper bound for $A$.
We now continue this process inductively. Given that we have constructed $I_n = [a_n, b_n]$, we let $c_n$ be the midpoint. If $c_n$ is an upper bound for $A$, we let $b_{n+1} = c_n$ and $a_{n+1} = a_n$. If $c_n$ is not an upper bound then it becomes the left endpoint; i.e., $a_{n+1} = c_n$ and $b_{n+1} = b_n$. The resulting collection $I_n = [a_n, b_n]$ of nested intervals has the property that, for every $n \in \mathbf{N}$, the point $a_n$ fails to be an upper bound while $b_n$ is an upper bound.
By the Nested Interval Property, we know there exists a real number
\[ s \in \bigcap_{n=1}^{\infty} I_n. \]
Setting $M = b_1 - a_1$, we can also see that the length of $I_n$ is $M/2^{n-1}$, which tends to zero. From this fact, we can easily prove that
\[ s = \lim a_n \quad \text{and} \quad s = \lim b_n. \]
We now claim that $s = \sup A$. To show that $s$ is an upper bound for $A$, we let $a \in A$ be arbitrary. Because each $b_n$ is an upper bound, we observe that $a \le b_n$ for all $n$. By the Order Limit Theorem, $a \le s$ as well, and we conclude that $s$ is an upper bound for $A$.


To show that $s$ is the least upper bound, we let $l$ be some arbitrary upper bound and observe that $a_n < l$ for all $n$. Again using the Order Limit Theorem, we may conclude $s \le l$, and this completes the argument.
(b) Let $I_n = [a_n, b_n]$ be a nested collection of closed intervals. To prove NIP, we must produce an $x \in \mathbf{R}$ satisfying $a_m \le x \le b_m$ for all $m \in \mathbf{N}$. Because the intervals are nested, the sequence $(a_n)$ is increasing and bounded above (by $b_1$ for instance). By MCT, we know there exists a real number $x$ satisfying $x = \lim a_n$. Now fix $m \in \mathbf{N}$. A short contradiction argument shows $a_m \le x$. The nested property of the intervals also gives us that $a_n \le b_m$ for all $n \in \mathbf{N}$, and the Order Limit Theorem then implies $x \le b_m$, as desired.
(c) Just as in (b), we start with a nested collection of closed intervals $I_n = [a_n, b_n]$ and argue that there is a real number $x$ common to all of them. Focusing on the sequence $(a_n)$ of left-hand endpoints, we may not assert (because MCT is off limits) that it converges, but it is certainly bounded. By the Bolzano–Weierstrass Theorem, there exists a convergent subsequence $(a_{n_k})$, and we can set $x = \lim a_{n_k}$.
Now fix $m \in \mathbf{N}$. Because $a_{n_k} \le b_m$ for all $k \in \mathbf{N}$, the Order Limit Theorem implies, just as before, that $x \le b_m$. Also, choosing a particular term $n_K \ge m$ we can argue that $a_m \le a_{n_K} \le x$ must be true. Thus $x \in I_m$ for all $m$, and $\bigcap_{m=1}^{\infty} I_m$ is not empty.
(d) Let $(a_n)$ be a bounded sequence so that there exists $M > 0$ satisfying $|a_n| \le M$ for all $n$. Our goal is to use the Cauchy Criterion to produce a convergent subsequence.
First construct the sequence of closed intervals and the subsequence with $a_{n_k} \in I_k$ according to the method described in the proof of the Bolzano–Weierstrass Theorem in the text. Rather than using NIP to produce a candidate for the limit of this subsequence, we can argue that $(a_{n_k})$ is convergent by appealing to the Cauchy Criterion.
Let $\epsilon > 0$. By construction, the length of $I_k$ is $M(1/2)^{k-1}$, which converges to zero. Choose $N$ so that $k \ge N$ implies that the length of $I_k$ is less than $\epsilon$. So for any $s, t \ge N$, because the intervals are nested, $a_{n_s}$ and $a_{n_t}$ are both in $I_N$, and it follows that $|a_{n_s} - a_{n_t}| < \epsilon$. Having shown $(a_{n_k})$ is a Cauchy sequence, we know it converges.
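The bisection construction in Exercise 2.6.6(a) translates almost line for line into a loop. The sketch below is only an illustration: the sample set $A = \{x : x^2 < 2\}$, the predicate `is_upper_bound`, and the starting interval are assumptions chosen here, and floating-point arithmetic stands in for the exact nested intervals of the proof.

```python
import math


def sup_by_bisection(is_upper_bound, a: float, b: float, steps: int = 60):
    """Shrink [a, b] by bisection, keeping the invariants of the proof:
    a is never an upper bound of A, and b always is one."""
    for _ in range(steps):
        c = (a + b) / 2
        if is_upper_bound(c):
            b = c          # midpoint is an upper bound: it becomes the right endpoint
        else:
            a = c          # otherwise it becomes the left endpoint
    return a, b


def is_upper_bound(c: float) -> bool:
    """Upper-bound test for the sample set A = {x in R : x * x < 2}."""
    return c >= 0 and c * c >= 2


a, b = sup_by_bisection(is_upper_bound, 0.0, 2.0)
print(a, b, math.sqrt(2))
```

Both endpoints close in on $\sqrt{2} = \sup A$, mirroring the conclusion $s = \lim a_n = \lim b_n$. In the proof itself the existence of $s$ comes from the Nested Interval Property rather than from any computation.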

2.7 Properties of Infinite Series

Exercise 2.7.1. (a) Here we show that the sequence of partial sums $(s_n)$ converges by showing that it is a Cauchy sequence. Let $\epsilon > 0$ be arbitrary. We need to find an $N$ such that $n > m \ge N$ implies $|s_n - s_m| < \epsilon$. First recall,
\[ |s_n - s_m| = |a_{m+1} - a_{m+2} + a_{m+3} - \cdots \pm a_n|. \]
Because $(a_n)$ is decreasing and the terms are positive, an induction argument shows that for all $n > m$ we have
\[ |a_{m+1} - a_{m+2} + a_{m+3} - \cdots \pm a_n| \le |a_{m+1}|. \]


So, by virtue of the fact that $(a_n) \to 0$, we can choose $N$ so that $m \ge N$ implies $|a_m| < \epsilon$. But this implies
\[ |s_n - s_m| = |a_{m+1} - a_{m+2} + \cdots \pm a_n| \le |a_{m+1}| < \epsilon \]
whenever $n > m \ge N$, as desired.
(b) Let $I_1$ be the closed interval $[0, s_1]$. Then let $I_2$ be the closed interval $[s_2, s_1]$, which must be contained in $I_1$ as $(a_n)$ is decreasing. Continuing in this fashion, we can construct a nested sequence of closed intervals
\[ I_1 \supseteq I_2 \supseteq I_3 \supseteq \cdots. \]
By the Nested Interval Property there exists at least one point $S$ satisfying $S \in I_n$ for every $n \in \mathbf{N}$. We now have a candidate for the limit, and it remains to show that $(s_n) \to S$.
Let $\epsilon > 0$ be arbitrary. We need to demonstrate that there exists an $N$ such that $|s_n - S| < \epsilon$ whenever $n \ge N$. By construction, the length of $I_n$ is $|s_n - s_{n-1}| = a_n$. Because $(a_n) \to 0$ we can choose $N$ such that $a_n < \epsilon$ whenever $n \ge N$. Thus,
\[ |s_n - S| \le a_n < \epsilon \]
because both $s_n, S \in I_n$.
(c) The subsequence $(s_{2n})$ is increasing and bounded above (by $a_1$ for instance). The Monotone Convergence Theorem allows us to assert that there exists an $S \in \mathbf{R}$ satisfying $S = \lim(s_{2n})$. One way to prove that the other subsequence $(s_{2n+1})$ converges to the same value is to use the Algebraic Limit Theorem and the fact that $(a_n) \to 0$ to write
\[ \lim(s_{2n+1}) = \lim(s_{2n} + a_{2n+1}) = S + \lim(a_{2n+1}) = S + 0 = S. \]
The fact that both $(s_{2n})$ and $(s_{2n+1})$ converge to $S$ implies that $(s_n) \to S$ as well. (See Exercise 2.3.5.)

Exercise 2.7.2. (a) (i) Assume $\sum_{k=1}^{\infty} b_k$ converges. Thus, given $\epsilon > 0$, there exists an $N \in \mathbf{N}$ such that whenever $n > m \ge N$ it follows that $|b_{m+1} + b_{m+2} + \cdots + b_n| < \epsilon$. Since $0 \le a_k \le b_k$ for all $k \in \mathbf{N}$, we have
\[ |a_{m+1} + a_{m+2} + \cdots + a_n| \le |b_{m+1} + b_{m+2} + \cdots + b_n| < \epsilon \]
whenever $n > m \ge N$, and $\sum_{k=1}^{\infty} a_k$ converges as well.
(ii) Rather than trying to work with a negated version of the Cauchy Criterion, we can argue by contradiction. This is actually an example of a contrapositive proof. Rather than proving “If P, then Q,” we can argue that “Not Q implies not P.” In the context of this particular problem, “Not Q implies not P” is just the statement “$\sum_{k=1}^{\infty} b_k$ converges implies that $\sum_{k=1}^{\infty} a_k$ converges.” But this is exactly what we showed in (i).
(b) (i) Let $s_n = a_1 + \cdots + a_n$ be the partial sums for $\sum_{k=1}^{\infty} a_k$, and let $t_n = b_1 + \cdots + b_n$ be the partial sums for $\sum_{k=1}^{\infty} b_k$. Because $0 \le a_k \le b_k$ for all $k \in \mathbf{N}$, both $(s_n)$ and $(t_n)$ are increasing, and in addition we have $s_n \le t_n$ for all $n \in \mathbf{N}$. Because $\sum_{k=1}^{\infty} b_k$ converges, $(t_n)$ is bounded and thus $(s_n)$ is also bounded. By MCT, $\sum_{k=1}^{\infty} a_k$ converges.
(ii) As mentioned previously, this is just the contrapositive version of the statement in (i).

Exercise 2.7.3. (a) The key observation is that $a_n = p_n + q_n$. If both $\sum p_n$ and $\sum q_n$ converge, then by the Algebraic Limit Theorem, $\sum a_n$ would also converge, and this is not the case.
(b) In addition to $a_n = p_n + q_n$, we also have $|a_n| = p_n - q_n$. Because we are given that $\sum |a_n|$ diverges, it must be (for reasons similar to those in (a)) that at least one of $\sum p_n$ or $\sum q_n$ diverges. So let's assume (without loss of generality) that $\sum p_n$ diverges. If $\sum q_n$ were to converge, then we could write $\sum p_n = \sum (a_n - q_n)$. Keeping in mind that we are assuming $\sum a_n$ converges, the Algebraic Limit Theorem would imply that $\sum p_n$ should also converge. This contradiction implies that $\sum q_n$ must, in fact, diverge.

Exercise 2.7.4. One example would be
\[ (x_n) = (1, 0, 1, 0, 1, 0, \ldots) \quad \text{and} \quad (y_n) = (0, 1, 0, 1, 0, 1, \ldots). \]
Another would be to set $x_n = y_n = 1/n$ for all $n \in \mathbf{N}$.

Exercise 2.7.5. (a) By definition of absolute convergence, $\sum |a_n|$ must converge. Theorem 2.7.3 tells us that there must be an $N$ such that $n \ge N$ implies $|a_n| < 1$. Now, $a_n^2 \le |a_n|$ for $n \ge N$. Thus, by Theorem 2.7.4, $\sum_{n=N}^\infty a_n^2$ converges and therefore so does $\sum_{n=1}^\infty a_n^2$, as there are only a finite number of terms before $N$. Because $a_n^2 \ge 0$, the convergence is absolute.

This result does not hold without absolute convergence. Consider $\sum (-1)^{n+1}/\sqrt{n}$, which converges conditionally; however, $\sum 1/n$ diverges.

(b) This is not a true statement. Consider $\sum 1/n^2$, which converges; however, $\sum 1/n$ does not converge.

Exercise 2.7.6. (a) Because $(y_n)$ is bounded, there exists $M \ge 0$ such that $|y_n| \le M$. Now we are given that $\sum |x_n|$ converges, and the Algebraic Limit Theorem tells us that $\sum M|x_n|$ also converges. Because $|x_n y_n| \le M|x_n|$, we may use the Comparison Test to assert that $\sum |x_n y_n|$ converges. Finally, the Absolute Convergence Test implies $\sum x_n y_n$ converges.

(b) Set $x_n = (-1)^n/n$, and let $y_n = (-1)^n$, which is certainly bounded. Then $\sum x_n$ converges conditionally, but $\sum x_n y_n = \sum 1/n$ diverges.

Exercise 2.7.7. By the Cauchy Condensation Test (Theorem 2.4.6), $\sum 1/n^p$ converges if and only if $\sum 2^n (1/2^n)^p$ converges. But notice that

$$\sum 2^n \left(\frac{1}{2^n}\right)^p = \sum \left(\frac{1}{2^n}\right)^{p-1} = \sum \left(\frac{1}{2^{p-1}}\right)^n.$$

By the Geometric Series Test (Example 2.7.5), this series converges if and only if $\left|\frac{1}{2^{p-1}}\right| < 1$. Solving for $p$ we find that $p$ must satisfy $p > 1$.
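The algebra in Exercise 2.7.7 can be spot-checked with exact rational arithmetic. The following is only an illustrative sketch: the function names are our own, and it assumes integer exponents $p \ge 1$ so that every quantity stays rational.

```python
from fractions import Fraction


def condensed_term(n: int, p: int) -> Fraction:
    """n-th term 2^n * (1/2^n)^p of the condensed series (integer p assumed)."""
    return Fraction(2) ** n * Fraction(1, 2 ** n) ** p


def geometric_ratio(p: int) -> Fraction:
    """Common ratio 1/2^(p-1) of the equivalent geometric series."""
    return Fraction(1, 2 ** (p - 1))


def geometric_term(n: int, p: int) -> Fraction:
    """n-th term (1/2^(p-1))^n of the geometric series."""
    return geometric_ratio(p) ** n


def p_series_partial_sum(num_terms: int, p: int) -> Fraction:
    """Partial sum 1/1^p + 1/2^p + ... + 1/num_terms^p."""
    return sum(Fraction(1, k ** p) for k in range(1, num_terms + 1))


if __name__ == "__main__":
    for p in (1, 2, 3):
        same = all(condensed_term(n, p) == geometric_term(n, p) for n in range(12))
        print(p, same, geometric_ratio(p) < 1)
    # For p = 2 the partial sums stay below 2, consistent with convergence.
    print(float(p_series_partial_sum(200, 2)))
```

The ratio is strictly less than 1 exactly when $p > 1$, matching the conclusion of the exercise; the numerical partial sums are only evidence, not a proof.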

Exercise 2.7.8. In order to show that $\sum_{k=1}^\infty (a_k + b_k) = A + B$, we must argue that the sequence of partial sums

$$r_m = (a_1 + b_1) + (a_2 + b_2) + (a_3 + b_3) + \cdots + (a_m + b_m)$$

converges to $A + B$. We are given that $\sum_{k=1}^\infty a_k = A$ and $\sum_{k=1}^\infty b_k = B$, meaning that the partial sums $s_m = a_1 + a_2 + a_3 + \cdots + a_m$ converge to $A$ and $t_m = b_1 + b_2 + b_3 + \cdots + b_m$ converge to $B$. Because $s_m + t_m = r_m$, applying the Algebraic Limit Theorem for sequences (Theorem 2.3.3) yields $(r_m) \to A + B$, as desired.

Exercise 2.7.9. (a) Such an $r'$ with $r < r' < 1$ must exist because between any two distinct real numbers there is another (for instance, $r' = (r + 1)/2$). First, pick an $\epsilon$-neighborhood around $r$ of size $\epsilon_0 = |r - r'|$. Because $\lim \left|\frac{a_{n+1}}{a_n}\right| = r$, there exists an $N$ such that $n \ge N$ implies $\left|\frac{a_{n+1}}{a_n}\right| \in V_{\epsilon_0}(r)$. It follows that $\left|\frac{a_{n+1}}{a_n}\right| \le r'$ for all $n \ge N$, and this implies the statement in (a).

(b) Having chosen $N$, $|a_N|$ is now a fixed number. Also, $\sum (r')^n$ is a geometric series with $|r'| < 1$, so it converges. Therefore, by the Algebraic Limit Theorem, $|a_N| \sum (r')^n$ converges.

(c) From (a) we know that there exists an $N$ such that $|a_{N+1}| \le |a_N| r'$. Extending this we find $|a_{N+2}| \le |a_{N+1}| r' \le |a_N| (r')^2$, and using induction we can say that

$$|a_k| \le |a_N| (r')^{k-N} \quad \text{for all } k \ge N.$$

Thus, $\sum_{k=N}^\infty |a_k|$ converges by the Comparison Test and part (b). Because

$$\sum_{k=1}^\infty |a_k| = \sum_{k=1}^{N-1} |a_k| + \sum_{k=N}^\infty |a_k|$$

and $\sum_{k=1}^{N-1} |a_k|$ is just a finite sum, the series $\sum_{k=1}^\infty |a_k|$ converges.
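To make the bound in part (c) concrete, here is a small sketch for one hypothetical sequence, $a_n = n/2^n$ (our choice, not the text's), whose ratios tend to $r = 1/2$. With the assumed values $r' = 3/4$ and $N = 2$, the code checks the ratio condition from (a) and the geometric bound from (c) over a finite range.

```python
from fractions import Fraction

# Illustrative assumptions: the sequence, R_PRIME, and N are our own choices.
R_PRIME = Fraction(3, 4)  # any r' with 1/2 < r' < 1 would do
N = 2                     # (n + 1) / (2n) <= 3/4 holds exactly when n >= 2


def a(n: int) -> Fraction:
    """Example sequence a_n = n / 2^n."""
    return Fraction(n, 2 ** n)


def ratios_bounded(k_max: int) -> bool:
    """Check |a_{n+1} / a_n| <= r' for N <= n < k_max (part (a))."""
    return all(abs(a(n + 1) / a(n)) <= R_PRIME for n in range(N, k_max))


def geometric_bound_holds(k_max: int) -> bool:
    """Check |a_k| <= |a_N| (r')^(k - N) for N <= k <= k_max (part (c))."""
    return all(abs(a(k)) <= abs(a(N)) * R_PRIME ** (k - N)
               for k in range(N, k_max + 1))


if __name__ == "__main__":
    print(ratios_bounded(60), geometric_bound_holds(60))
```

A finite check like this cannot replace the induction in (c); it only shows what the inequality says for a specific sequence.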

Exercise 2.7.10. (a) The idea here is that eventually the terms $a_n$ "look like" a non-zero constant times $1/n$, and we know that any series of this form diverges. To make this precise, let $\epsilon_0 = l/2 > 0$. Because $(n a_n) \to l$, there exists $N \in \mathbf{N}$ such that $n a_n \in V_{\epsilon_0}(l)$ for all $n \ge N$. A little algebra shows that this implies we must have $n a_n > l/2$, or

$$a_n > (l/2)(1/n) \quad \text{for all } n \ge N.$$

Because this inequality is true for all but some finite number of terms, we may still appeal to the Comparison Test to assert that $\sum a_n$ diverges.

(b) Assume that $\lim(n^2 a_n) = L \ge 0$. The definition of convergence (with $\epsilon_0 = 1$) tells us that there exists an $N$ such that $n^2 a_n < L + 1$ for all $n \ge N$.

This means that eventually $a_n < (L + 1)/n^2$. We know that the series $\sum 1/n^2$ converges, and by the Algebraic Limit Theorem for series (Theorem 2.7.1), $\sum (L + 1)/n^2$ converges as well. Thus, by the Comparison Test, $\sum a_n$ must converge.

Exercise 2.7.11. A preliminary example would be to let

$$(a_n) = (1, 0, 1, 0, 1, \ldots) \quad \text{and} \quad (b_n) = (0, 1, 0, 1, 0, \ldots).$$

To handle the more challenging version, we shall construct two positive decreasing sequences $(a_n)$ and $(b_n)$ with $\min\{a_n, b_n\} = 1/n^2$ where $\sum a_n$ and $\sum b_n$ each diverge. First set $a_1 = b_1 = 1$. For $2 \le n \le 5$, let $a_n = 1/4$ and let $b_n = 1/n^2$. By holding $a_n = 1/4$ constant over 4 terms, we have added 1 to the partial sums of $\sum a_n$. For $6 \le n \le 6 + 24$, let $a_n = 1/n^2$ and hold $b_n = 1/25$ constant. This will add one to the partial sums of $\sum b_n$. Now we switch again and hold $a_n = 1/30^2$ for the next $30^2$ terms while letting $b_n = 1/n^2$. Continuing this process will ensure that the partial sums of $\sum a_n$ and $\sum b_n$ are unbounded while $\sum \min\{a_n, b_n\} = \sum 1/n^2$ converges.

Exercise 2.7.12. First write

$$\sum_{j=m+1}^{n} x_j y_j = \sum_{j=m+1}^{n} (s_j - s_{j-1}) y_j = \sum_{j=m+1}^{n} s_j y_j - \sum_{j=m+1}^{n} s_{j-1} y_j.$$

Then, focusing on the second sum in the above expression, we have

$$\sum_{j=m+1}^{n} s_{j-1} y_j = \sum_{j=m}^{n-1} s_j y_{j+1} = s_m y_{m+1} - s_n y_{n+1} + \sum_{j=m+1}^{n} s_j y_{j+1}.$$

Substituting this back into our first equation gives the result.

Exercise 2.7.13. (a) Let $M > 0$ be a bound for the partial sums $s_n$ of $\sum x_n$, so that $|s_n| \le M$ for all $n$. Making use of Exercise 2.7.12 and overestimating the partial sums of $\sum x_n$ with $M$, we find

$$\left| \sum_{j=m+1}^{n} x_j y_j \right| = \left| s_n y_{n+1} - s_m y_{m+1} + \sum_{j=m+1}^{n} s_j (y_j - y_{j+1}) \right| \le M y_{n+1} + M y_{m+1} + \sum_{j=m+1}^{n} M (y_j - y_{j+1})$$

$$= M y_{n+1} + M y_{m+1} + M (y_{m+1} - y_{n+1}) = 2M y_{m+1}.$$

(b) In order to show that the series converges we will use the Cauchy Criterion for Series. Let $\epsilon > 0$ be arbitrary. We must show that there exists an $N$ such that whenever $n > m \ge N$ it follows that $|x_{m+1} y_{m+1} + x_{m+2} y_{m+2} + \cdots + x_n y_n| < \epsilon$. By part (a),

$$|x_{m+1} y_{m+1} + x_{m+2} y_{m+2} + \cdots + x_n y_n| = \left| \sum_{j=m+1}^{n} x_j y_j \right| \le 2M y_{m+1}.$$

Because $(y_n) \to 0$, we can pick $N$ such that $m \ge N$ implies $y_m < \epsilon/(2M)$. Using this $N$, we find that

$$|x_{m+1} y_{m+1} + x_{m+2} y_{m+2} + \cdots + x_n y_n| \le 2M y_{m+1} < 2M \frac{\epsilon}{2M} = \epsilon$$

whenever $n > m \ge N$, as desired.

(c) The Alternating Series Test is the special case where $x_n = (-1)^{n+1}$. The partial sums of $\sum x_n$ in this case look like $(1, 0, 1, 0, 1, \ldots)$, which is a bounded sequence.

Exercise 2.7.14. (a) Abel's Test differs from Dirichlet's Test in that we assume more about $\sum x_n$ but less about $(y_n)$. Specifically, we now assume that $\sum x_n$ converges; however, $(y_n)$ may converge to a limit greater than zero.

(b) Let $A > 0$ be a bound for the partial sums $s_n$ of $\sum a_n$, so that $|s_n| \le A$ for all $n$. By making use of Exercise 2.7.12 with $m = 0$ (so that $s_0 = 0$) and replacing the partial sums of $\sum a_n$ with $A$, we find

$$\left| \sum_{j=1}^{n} a_j b_j \right| = \left| s_n b_{n+1} - s_0 b_1 + \sum_{j=1}^{n} s_j (b_j - b_{j+1}) \right| \le A b_{n+1} + A b_1 + \left| \sum_{j=1}^{n} A (b_j - b_{j+1}) \right|$$

$$= A b_{n+1} + A b_1 + A (b_1 - b_{n+1}) = 2A b_1.$$

(c) In order to show that the series converges we will use the Cauchy Criterion. Let $\epsilon > 0$ be arbitrary. We must show that there exists an $N$ such that whenever $n > m \ge N$ it follows that

$$|x_{m+1} y_{m+1} + x_{m+2} y_{m+2} + \cdots + x_n y_n| = \left| \sum_{j=m+1}^{n} x_j y_j \right| < \epsilon.$$

Thinking of $m$ as fixed for the moment, let $a_n = x_{m+n}$ and $b_n = y_{m+n}$, and apply part (b) to get

$$\left| \sum_{j=m+1}^{n} x_j y_j \right| = \left| \sum_{j=1}^{n-m} a_j b_j \right| \le 2 A_1 b_1$$

where $A_1$ is a bound on the partial sums $\left| \sum_{j=m+1}^{k} x_j \right|$, $m < k \le n$, of the tail of $\sum x_n$. But this is the crucial point. Because $\sum x_n$ converges, the Cauchy Criterion tells us that its "tail" can be made arbitrarily small. That is, we can pick $N$ such that $n > m \ge N$ implies

$$\left| \sum_{j=m+1}^{n} x_j \right| < \frac{\epsilon}{2 y_1}.$$

Looking again at what the constant $A_1$ represents, it now follows that if $m \ge N$ then we may take

$$A_1 = \max_{m < k \le n} \left| \sum_{j=m+1}^{k} x_j \right| < \frac{\epsilon}{2 y_1}.$$

Putting this all together and noting that $b_1 = y_{m+1} \le y_1$, we find

$$|x_{m+1} y_{m+1} + x_{m+2} y_{m+2} + \cdots + x_n y_n| \le 2 A_1 b_1 < 2 y_1 \frac{\epsilon}{2 y_1} = \epsilon$$

whenever $n > m \ge N$, as desired.
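The summation-by-parts identity of Exercise 2.7.12, which drives both Dirichlet's Test and Abel's Test above, is easy to verify on concrete data. The sketch below uses exact fractions; the helper name and the sample sequences $x_j = (-1)^{j+1}$, $y_j = 1/j$ are illustrative assumptions of ours.

```python
from fractions import Fraction


def summation_by_parts_sides(x, y, m, n):
    """Return (lhs, rhs) of the identity
        sum_{j=m+1}^{n} x_j y_j
          = s_n y_{n+1} - s_m y_{m+1} + sum_{j=m+1}^{n} s_j (y_j - y_{j+1}),
    where x[j-1] = x_j, y[j-1] = y_j and s_k = x_1 + ... + x_k (s_0 = 0).
    Requires len(x) >= n and len(y) >= n + 1.
    """
    s = [Fraction(0)]
    for xj in x:
        s.append(s[-1] + xj)  # s[k] is the k-th partial sum

    def X(j):
        return x[j - 1]

    def Y(j):
        return y[j - 1]

    lhs = sum(X(j) * Y(j) for j in range(m + 1, n + 1))
    rhs = (s[n] * Y(n + 1) - s[m] * Y(m + 1)
           + sum(s[j] * (Y(j) - Y(j + 1)) for j in range(m + 1, n + 1)))
    return lhs, rhs


# Sample data: alternating signs against a decreasing sequence.
xs = [Fraction((-1) ** (j + 1)) for j in range(1, 21)]
ys = [Fraction(1, j) for j in range(1, 22)]

if __name__ == "__main__":
    lhs, rhs = summation_by_parts_sides(xs, ys, 3, 10)
    print(lhs == rhs, abs(lhs) <= 2 * ys[3])  # Dirichlet bound with M = 1
```

Here the partial sums of $(x_j)$ are bounded by $M = 1$, so the last check is the estimate $2M y_{m+1}$ from Exercise 2.7.13(a) with $m = 3$.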

2.8

Double Summations and Products of Infinite Series

Exercise 2.8.1. Examining the sum over squares we get $s_{11} = -1$, $s_{22} = -3/2$, $s_{33} = -7/4$, and in general

$$s_{nn} = -2 + \frac{1}{2^{n-1}}.$$

Now taking the limit we find $(s_{nn}) \to -2$. This value corresponds to the value previously computed by fixing $j$ and summing down each column.

Exercise 2.8.2. In order to show that the iterated series

$$\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{ij}$$

converges we must first show that for each fixed $i \in \mathbf{N}$ the series $\sum_{j=1}^\infty a_{ij}$ converges to some real number $r_i$. Then we need to show that the series $\sum_{i=1}^\infty r_i$ converges.

Fix $i \in \mathbf{N}$. By our hypothesis, $\sum_{j=1}^\infty |a_{ij}|$ converges. Thus, the Absolute Convergence Test tells us $\sum_{j=1}^\infty a_{ij}$ converges to some real number $r_i$. By looking at the partial sums, we can use the Order Limit Theorem to assert that $|r_i| \le b_i$, where $b_i = \sum_{j=1}^\infty |a_{ij}|$. Because $\sum_{i=1}^\infty b_i$ converges, $\sum_{i=1}^\infty |r_i|$ converges by the Comparison Test, and then $\sum r_i$ must converge as well.
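As a numerical companion to Exercise 2.8.1, the square partial sums can be computed directly. The array used below is an assumption on our part: we take the example from the text of Section 2.8 to be $a_{ij} = 1/2^{j-i}$ for $j > i$, $a_{ij} = -1$ for $j = i$, and $a_{ij} = 0$ for $j < i$, which reproduces the three values quoted in the solution.

```python
from fractions import Fraction


def a(i: int, j: int) -> Fraction:
    """Assumed Section 2.8 array: 1/2^(j-i) above the diagonal, -1 on it, 0 below."""
    if j > i:
        return Fraction(1, 2 ** (j - i))
    if j == i:
        return Fraction(-1)
    return Fraction(0)


def s(n: int) -> Fraction:
    """Square partial sum s_nn = sum of a_ij over 1 <= i, j <= n."""
    return sum(a(i, j) for i in range(1, n + 1) for j in range(1, n + 1))


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, s(n), s(n) == -2 + Fraction(1, 2 ** (n - 1)))
```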

Exercise 2.8.3. (a) As we have been doing, let $b_i = \sum_{j=1}^\infty |a_{ij}|$ for all $i \in \mathbf{N}$. Our hypothesis tells us that there exists $L \ge 0$ satisfying $\sum_{i=1}^\infty b_i = L$. Because we are adding all non-negative terms, it follows that

$$t_{mn} = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}| \le \sum_{i=1}^{m} \sum_{j=1}^{\infty} |a_{ij}| = \sum_{i=1}^{m} b_i \le L.$$

Thus, $t_{mn}$ is bounded. We can now conclude that $(t_{nn})$ converges by the Monotone Convergence Theorem, as it is both increasing and bounded.

(b) Let $\epsilon > 0$ be arbitrary. We need to find an $N$ such that $n > m \ge N$ implies $|s_{nn} - s_{mm}| < \epsilon$. Now the expression $s_{nn} - s_{mm}$ is really a sum over a finite collection of $a_{ij}$ terms. If each $a_{ij}$ included in the sum is replaced with $|a_{ij}|$, the sum only gets larger (this is just the triangle inequality), and the result is that

$$|s_{nn} - s_{mm}| = \left| \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} - \sum_{i=1}^{m} \sum_{j=1}^{m} a_{ij} \right| \le |t_{nn} - t_{mm}|.$$

We know that $(t_{nn})$ converges and is therefore Cauchy, so pick $N$ so that $n > m \ge N$ implies $|t_{nn} - t_{mm}| < \epsilon$. It follows that $(s_{nn})$ is Cauchy and must converge.

Exercise 2.8.4. (a) The fact that $t_{mn}$ is a sum of non-negative terms implies that if $m_1 \ge m$ and $n_1 \ge n$ then $t_{m_1 n_1} \ge t_{mn}$. So let $N_1 = \max\{m_0, n_0\}$. Then it follows that

$$B - \frac{\epsilon}{2} < t_{m_0, n_0} \le t_{mn} \le B$$

for all $m, n \ge N_1$.

(b) Without loss of generality, let $n > m \ge N$. Then,

$$|s_{mn} - S| = |s_{mn} - s_{mm} + s_{mm} - S| \le |s_{mn} - s_{mm}| + |s_{mm} - S|$$

$$= \left| \sum_{i=1}^{m} \sum_{j=m+1}^{n} a_{ij} \right| + |s_{mm} - S| \le |t_{mn} - t_{mm}| + |s_{mm} - S|.$$

We have already chosen $N_1$ such that

$$|t_{mn} - t_{mm}| < \frac{\epsilon}{2} \quad \text{whenever } n > m \ge N_1.$$

Because $(s_{nn}) \to S$, we can pick $N_2$ so that

$$|s_{mm} - S| < \frac{\epsilon}{2} \quad \text{whenever } m \ge N_2.$$

Setting $N = \max\{N_1, N_2\}$, we can conclude that $|s_{mn} - S| < \epsilon/2 + \epsilon/2 = \epsilon$ for all $n > m \ge N$.

Exercise 2.8.5. Thinking of $m$ as fixed and $n$ as the limiting variable, the Algebraic Limit Theorem can be applied to the finite number of components of

$$s_{mn} = \sum_{j=1}^{n} a_{1j} + \sum_{j=1}^{n} a_{2j} + \cdots + \sum_{j=1}^{n} a_{mj}$$

to conclude that

$$\lim_{n \to \infty} s_{mn} = r_1 + r_2 + \cdots + r_m.$$

If, in addition, we insist that $m \ge N$ (where $N$ is the one constructed in the previous exercise), then we have that $-\epsilon < s_{mn} - S < \epsilon$ is eventually true once $n$ is larger than $N$. Applying the Order Limit Theorem we find

$$-\epsilon \le (r_1 + r_2 + \cdots + r_m) - S \le \epsilon$$

for all $m \ge N$. This last statement is extremely close to what we need to conclude that $\sum_{i=1}^\infty \sum_{j=1}^\infty a_{ij}$ converges to $S$. Given an arbitrary $\epsilon > 0$, we have produced an $N$ such that

$$|(r_1 + r_2 + \cdots + r_m) - S| \le \epsilon \quad \text{for all } m \ge N.$$

The only distraction is that our definition of convergence requires a strict inequality, and we have a "less than or equal to $\epsilon$" result. This, however, is not a problem. Because $\epsilon$ is arbitrary, we could just as easily have chosen to let $\epsilon' < \epsilon$ at the beginning and constructed our argument using $\epsilon'$ throughout the proof. On a more general note, while we strive at the introductory level to adhere to the exact wording of our definitions, there comes a point in epsilon-style arguments where it becomes more convenient to simply make quantities less than something that we know can be made arbitrarily small.

Exercise 2.8.6. As the exercise explains, the same argument can be used to prove $\sum_{j=1}^\infty \sum_{i=1}^\infty a_{ij}$ converges to $S$ once we show that for each $j \in \mathbf{N}$ the sum $\sum_{i=1}^\infty a_{ij}$ converges to some real number $c_j$.

To show $\sum_{i=1}^\infty a_{ij}$ converges for each $j \in \mathbf{N}$, it suffices to prove that the absolute series $\sum_{i=1}^\infty |a_{ij}|$ converges. Recall that $b_i = \sum_{j=1}^\infty |a_{ij}|$, so it is certainly the case that $b_i \ge |a_{ij}|$ for all $i, j \in \mathbf{N}$. But our hypothesis says that $\sum_{i=1}^\infty b_i$ converges, and so by the Comparison Test, $\sum_{i=1}^\infty |a_{ij}|$ converges for all values of $j$.

Exercise 2.8.7. (a) In order to prove absolute convergence, let

$$u_n = |d_2| + |d_3| + |d_4| + \cdots + |d_n| = \sum_{k=2}^{n} |d_k|.$$

We must now show that $(u_n)$ converges. Well,

$$u_n = \sum_{k=2}^{n} |d_k| \le \sum_{i=1}^{n} \sum_{j=1}^{n} |a_{ij}| = t_{nn}.$$

Because $u_n \le t_{nn}$ for all $n$ and $(t_{nn})$ converges, the increasing sequence $(u_n)$ is bounded above by $\lim t_{nn}$, and so $(u_n)$ converges by the Monotone Convergence Theorem.

(b) Let $\epsilon > 0$ be arbitrary. We need to find $N$ such that $n \ge N$ implies $\left| \sum_{k=2}^{n} d_k - S \right| < \epsilon$. By hypothesis, $(s_{nn}) \to S$, so choose $N_1$ so that

$$|s_{nn} - S| < \frac{\epsilon}{2} \quad \text{for all } n \ge N_1.$$

We are also given that $(t_{nn})$ converges (this is the absolute convergence hypothesis), and so there exists $N_2$ such that

$$|t_{nn} - t_{mm}| < \frac{\epsilon}{2} \quad \text{for all } n > m \ge N_2.$$

In essence, this says that once we get far enough out into the array $(a_{ij})$ in any direction, the absolute values of the terms do not add up to anything significant. To take advantage of this we set $N = \max\{N_1, 2N_2\}$. Then, for $n \ge N$,

$$\left| \sum_{k=2}^{n} d_k - S \right| = \left| \sum_{k=2}^{n} d_k - s_{nn} + s_{nn} - S \right| \le \left| \sum_{k=2}^{n} d_k - s_{nn} \right| + |s_{nn} - S| < \left| \sum_{k=2}^{n} d_k - s_{nn} \right| + \frac{\epsilon}{2}.$$

Because $n \ge 2N_2$, the partial sum $\sum_{k=2}^{n} d_k$ along diagonals contains every term in the "square" sum $s_{N_2 N_2}$. It follows that

$$\left| s_{nn} - \sum_{k=2}^{n} d_k \right| \le (t_{nn} - t_{N_2 N_2}) < \frac{\epsilon}{2}.$$

Putting it all together, we have

$$\left| \sum_{k=2}^{n} d_k - S \right| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon \quad \text{for all } n \ge N,$$

and we conclude that $\sum_{k=2}^\infty d_k = S$.

Exercise 2.8.8. (a) It is possible, as suggested, to prove that $\sum_{i=1}^\infty \sum_{j=1}^\infty |a_i b_j|$ converges by first proving that it is bounded and then taking advantage of the fact that it is monotone. However, a method similar to proving that it

is bounded can be used to directly prove that it converges. We will use this method. Let

$$\sum_{i=1}^{\infty} |a_i| = L \quad \text{and} \quad \sum_{j=1}^{\infty} |b_j| = M.$$

For each fixed $i \in \mathbf{N}$, the Algebraic Limit Theorem allows us to write $\sum_{j=1}^\infty |a_i b_j| = |a_i| \sum_{j=1}^\infty |b_j|$. Continuing this process, we see

$$\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} |a_i b_j| = \sum_{i=1}^{\infty} |a_i| \sum_{j=1}^{\infty} |b_j| = \sum_{i=1}^{\infty} |a_i| M = M \sum_{i=1}^{\infty} |a_i| = ML,$$

and therefore $\sum_{i=1}^\infty \sum_{j=1}^\infty |a_i b_j|$ converges to $ML$.

(b) For the square partial sums we can write

$$\lim_{n \to \infty} s_{nn} = \lim_{n \to \infty} \sum_{i=1}^{n} \sum_{j=1}^{n} a_i b_j = \lim_{n \to \infty} \left( \sum_{i=1}^{n} a_i \right) \left( \sum_{j=1}^{n} b_j \right).$$

Applying the Algebraic Limit Theorem to the limits of these partial sums we find that $\lim_{n \to \infty} s_{nn} = AB$. From part (a) we know that $\sum_{i=1}^\infty \sum_{j=1}^\infty |a_i b_j|$ converges, so we can use Theorem 2.8.1 and Exercise 2.8.7 to conclude that

$$\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_i b_j = \sum_{j=1}^{\infty} \sum_{i=1}^{\infty} a_i b_j = \sum_{k=2}^{\infty} d_k = \lim_{n \to \infty} s_{nn} = AB.$$
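The conclusion of Exercise 2.8.8 can be illustrated with two geometric series. In this sketch the particular series $a_i = 1/2^i$ (so $A = 1$) and $b_j = 1/3^j$ (so $B = 1/2$) are our own hypothetical choices, and $d_k$ denotes the diagonal sum of $a_i b_j$ over $i + j = k$, as in Exercise 2.8.7.

```python
from fractions import Fraction


def a(i: int) -> Fraction:
    """Example series term a_i = 1/2^i; the full sum is A = 1."""
    return Fraction(1, 2 ** i)


def b(j: int) -> Fraction:
    """Example series term b_j = 1/3^j; the full sum is B = 1/2."""
    return Fraction(1, 3 ** j)


def d(k: int) -> Fraction:
    """Diagonal sum d_k = sum of a_i * b_j over i + j = k with i, j >= 1."""
    return sum(a(i) * b(k - i) for i in range(1, k))


def diagonal_partial_sum(n: int) -> Fraction:
    """d_2 + d_3 + ... + d_n."""
    return sum(d(k) for k in range(2, n + 1))


def square_partial_sum(n: int) -> Fraction:
    """s_nn = (a_1 + ... + a_n)(b_1 + ... + b_n)."""
    return (sum(a(i) for i in range(1, n + 1))
            * sum(b(j) for j in range(1, n + 1)))


if __name__ == "__main__":
    AB = Fraction(1, 2)
    for n in (5, 20, 60):
        print(n, float(AB - diagonal_partial_sum(n)), float(AB - square_partial_sum(n)))
```

Both kinds of partial sums approach $AB = 1/2$ from below here because every term is positive; with terms of mixed sign the absolute convergence hypothesis is what makes the rearrangement into diagonals legitimate.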

Chapter 3

Basic Topology of R

3.1

Discussion: The Cantor Set

3.2

Open and Closed Sets

Exercise 3.2.1. (a) We cannot always take minimums of infinite sets. Therefore the step where we let $\epsilon = \min\{\epsilon_1, \epsilon_2, \ldots, \epsilon_N\} > 0$ requires that we are working with a finite collection of open sets. You can, however, take the infimum of an infinite set, but the infimum of the set could be 0.

(b) Let $O_n = \left(-\frac{1}{n}, \frac{1}{n}\right)$. Then $\bigcap_{n=1}^\infty O_n = \{0\}$, which is not open.

Exercise 3.2.2. (a) $\{-1, 1\}$

(b) $B$ is not a closed set because it does not contain its limit points.

(c) $B$ is not an open set. Given any point of $B$, it is impossible to find an $\epsilon$-neighborhood contained in $B$.

(d) All points in $B$ are isolated points.

(e) $\overline{B} = B \cup \{-1, 1\}$

Exercise 3.2.3. (a) Neither. Given any point in $\mathbf{Q}$, there is no $\epsilon$-neighborhood contained in $\mathbf{Q}$. The set of limit points not contained in $\mathbf{Q}$ is $\mathbf{I}$.

(b) Closed. Given any point in $\mathbf{N}$, there is no $\epsilon$-neighborhood of that point contained in the set.

(c) Open. The limit point 0 is not contained in the set $\{x \in \mathbf{R} : x > 0\}$.

(d) Neither. There is no $\epsilon$-neighborhood of 1 contained in $(0, 1]$. The limit point 0 is not contained in the set.

(e) Neither. There is no $\epsilon$-neighborhood of any point in the set contained in $\{1 + 1/4 + 1/9 + \cdots + 1/n^2 : n \in \mathbf{N}\}$. Without the square on the $n$ in this set, we would have no limit point. However, since $\sum_{n=1}^\infty 1/n^2$ converges, the limit of the partial sums is a limit point for this set. This limit point is not an element of the set.

Exercise 3.2.4. Let $x = \lim a_n$ for some sequence $(a_n)$ contained in $A$, and assume that $a_n \ne x$ for all $n$ in $\mathbf{N}$. We want to show that $x$ is a limit point of $A$.

The sequence $(a_n)$ converges to $x$, so by Definition 2.2.3B, every $\epsilon$-neighborhood $V_\epsilon(x)$ contains all but a finite number of the terms of $(a_n)$. Since $(a_n)$ is contained in $A$ and $a_n \ne x$, this means that $V_\epsilon(x) \cap A$ is non-empty and contains elements other than $x$. Hence, $x$ is a limit point of $A$.

Exercise 3.2.5. ($\Rightarrow$) Assume $a$ is an isolated point of $A$. By Definition 3.2.5, $a$ is not a limit point. Therefore there exists an $\epsilon$-neighborhood $V_\epsilon(a)$ such that $V_\epsilon(a) \cap A = \emptyset$ or $V_\epsilon(a) \cap A = \{a\}$. Since $a$ is an element of $A$ the former cannot be true. Therefore, $V_\epsilon(a) \cap A = \{a\}$.

($\Leftarrow$) Assume that there exists an $\epsilon$-neighborhood $V_\epsilon(a)$ such that $V_\epsilon(a) \cap A = \{a\}$. It follows from Definition 3.2.4 that $a$ is not a limit point of $A$, and hence it is isolated.

Exercise 3.2.6. ($\Rightarrow$) Assume that the set $F \subseteq \mathbf{R}$ is closed. Then $F$ contains its limit points. We will show that every Cauchy sequence $(a_n)$ contained in $F$ has its limit in $F$ by showing that the limit of $(a_n)$ is either a limit point or possibly an isolated point of $F$. Because $(a_n)$ is Cauchy, we know $x = \lim a_n$ exists. If $a_n \ne x$ for all $n$, then it follows from Theorem 3.2.5 that $x$ is a limit point of $F$, and hence $x \in F$. Now consider a Cauchy sequence $(a_n)$ where $a_n = x$ for some $n$. Because $(a_n) \subseteq F$ it follows that $x \in F$ as well. (Note that if $a_n$ is eventually equal to $x$, then it may not be true that $x$ is a limit point of $F$.)

($\Leftarrow$) Assume that every Cauchy sequence contained in $F$ has a limit that is also an element of $F$. To show that $F$ is closed we want to show that it contains its limit points. Let $x$ be a limit point of $F$. By Theorem 3.2.5, $x = \lim a_n$ for some sequence $(a_n)$ contained in $F$. Because $(a_n)$ converges, it must be a Cauchy sequence. So $x$ is contained in $F$, and therefore $F$ is closed.

Exercise 3.2.7. Let $x \in O$, where $O$ is an open set. Let $x = \lim x_n$. It follows from Definition 3.2.1 that there exists an $\epsilon$-neighborhood $V_\epsilon(x)$ of $x$ such that $V_\epsilon(x) \subseteq O$. Because $(x_n)$ is a convergent sequence, by Definition 2.2.3B every $\epsilon$-neighborhood $V_\epsilon(x)$ of $x$ contains all but a finite number of the terms of $(x_n)$. Therefore all but a finite number of terms of $(x_n)$ are contained in $O$.

Exercise 3.2.8. (a) Let $L$ be the set of limit points of $A$, and suppose that $x$ is a limit point of $L$. We want to show that $x$ is an element of $L$; in other words, that $x$ is a limit point of $A$. Let $V_\epsilon(x)$ be arbitrary. By the definition of a limit point, $V_\epsilon(x)$ intersects $L$ at a point $l \in L$, where $l \ne x$. Now choose $\epsilon' > 0$ small enough so that $V_{\epsilon'}(l) \subseteq V_\epsilon(x)$ and $x \notin V_{\epsilon'}(l)$. Since $l \in L$, $l$ is a limit point of $A$ and so $V_{\epsilon'}(l)$ intersects $A$. This implies $V_\epsilon(x)$ intersects $A$ at a point different from $x$, and therefore $x$ is a limit point of $A$ and thus an element of $L$.

(b) Assume $x$ is a limit point of $A \cup L$ and consider the $\epsilon$-neighborhood $V_\epsilon(x)$ for an arbitrary $\epsilon > 0$. We know $V_\epsilon(x)$ must intersect $A \cup L$ at a point other than $x$, and we would like to argue that it in fact intersects $A$. If $V_\epsilon(x)$ intersects $A$ at a point different from $x$ we are done, so let's assume that there exists an $l \in L$ with $l \in V_\epsilon(x)$ and $l \ne x$. Using the same argument employed in (a), we take $\epsilon' > 0$ small enough so that $V_{\epsilon'}(l) \subseteq V_\epsilon(x)$ and $x \notin V_{\epsilon'}(l)$. Because $l$ is a limit point of $A$, there exists an $a \in A$ with $a \in V_{\epsilon'}(l) \subseteq V_\epsilon(x)$, and thus $V_\epsilon(x)$ intersects $A$ at some point other than $x$, as desired.

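Exercise 3.2.9. (a) Let $y$ be a limit point of $A \cup B$. By Theorem 3.2.5, there exists a sequence $(c_n)$ contained in $A \cup B$ satisfying $y = \lim c_n$ with $y \ne c_n$ for all $n \in \mathbf{N}$. Because $(c_n)$ is contained in $A \cup B$ it must be that either $A$ or $B$ (or both) contains an infinite number of terms of $(c_n)$. This subsequence contained entirely in one set or the other will also converge to $y$, and we are done with another nod to Theorem 3.2.5.

(b) Clearly $A \subseteq A \cup B$, and any limit point of $A$ will by definition be a limit point of $A \cup B$. Thus $\overline{A} \subseteq \overline{A \cup B}$. Similarly, $\overline{B} \subseteq \overline{A \cup B}$. It follows that $\overline{A} \cup \overline{B} \subseteq \overline{A \cup B}$. We also have that $A \cup B \subseteq \overline{A} \cup \overline{B}$, and so $\overline{A \cup B} \subseteq \overline{\overline{A} \cup \overline{B}}$. But by Theorem 3.2.14, $\overline{A} \cup \overline{B}$ is closed, so $\overline{\overline{A} \cup \overline{B}} = \overline{A} \cup \overline{B}$. Hence, $\overline{A \cup B} \subseteq \overline{A} \cup \overline{B}$, and so $\overline{A \cup B} = \overline{A} \cup \overline{B}$.

(c) No. Take $A_n = \{1/n\}$. Then $\bigcup_{n=1}^\infty \overline{A_n} = \{1/n : n \in \mathbf{N}\}$. But $\overline{\bigcup_{n=1}^\infty A_n} = \{1/n : n \in \mathbf{N}\} \cup \{0\}$.

Exercise 3.2.10. (a) Let $x \in \left(\bigcup_{\lambda \in \Lambda} E_\lambda\right)^c$. Then $x$ is not an element of $E_\lambda$ for all $\lambda$. Hence $x \in E_\lambda^c$ for all $\lambda$. So $x \in \bigcap_{\lambda \in \Lambda} E_\lambda^c$. We have just shown that $\left(\bigcup_{\lambda \in \Lambda} E_\lambda\right)^c \subseteq \bigcap_{\lambda \in \Lambda} E_\lambda^c$. Now we will show that $\bigcap_{\lambda \in \Lambda} E_\lambda^c \subseteq \left(\bigcup_{\lambda \in \Lambda} E_\lambda\right)^c$. Let $x \in \bigcap_{\lambda \in \Lambda} E_\lambda^c$. Then for all $\lambda$, $x \notin E_\lambda$. So $x \notin \bigcup_{\lambda \in \Lambda} E_\lambda$, and hence $x \in \left(\bigcup_{\lambda \in \Lambda} E_\lambda\right)^c$. Therefore

$$\left( \bigcup_{\lambda \in \Lambda} E_\lambda \right)^c = \bigcap_{\lambda \in \Lambda} E_\lambda^c.$$

Secondly, we want to show that

$$\left( \bigcap_{\lambda \in \Lambda} E_\lambda \right)^c = \bigcup_{\lambda \in \Lambda} E_\lambda^c.$$

Let $x \in \left(\bigcap_{\lambda \in \Lambda} E_\lambda\right)^c$. Then there exists a $\lambda' \in \Lambda$ for which $x$ is not an element of $E_{\lambda'}$. Therefore $x \in E_{\lambda'}^c$. So $x \in \bigcup_{\lambda \in \Lambda} E_\lambda^c$, and we have $\left(\bigcap_{\lambda \in \Lambda} E_\lambda\right)^c \subseteq \bigcup_{\lambda \in \Lambda} E_\lambda^c$. Now assume $x \in \bigcup_{\lambda \in \Lambda} E_\lambda^c$. Then there exists a $\lambda' \in \Lambda$ such that $x \in E_{\lambda'}^c$. So it is also true that $x \notin E_{\lambda'}$. Therefore $x \notin \bigcap_{\lambda \in \Lambda} E_\lambda$, so $x \in \left(\bigcap_{\lambda \in \Lambda} E_\lambda\right)^c$. Thus $\bigcup_{\lambda \in \Lambda} E_\lambda^c \subseteq \left(\bigcap_{\lambda \in \Lambda} E_\lambda\right)^c$, and we have reached our desired conclusion.

(b) (i) Suppose that $\{E_\lambda\}$ is a finite collection of closed sets. Then their complements $E_\lambda^c$ are a finite collection of open sets. We know by Theorem 3.2.3 that the intersection of a finite collection of open sets is open. In symbols,

$$\left( \bigcup_{\lambda \in \Lambda} E_\lambda \right)^c = \bigcap_{\lambda \in \Lambda} E_\lambda^c$$

is an open set. Therefore the union of a finite collection of closed sets, $\bigcup_{\lambda \in \Lambda} E_\lambda$, is closed.

(ii) Now suppose that $\{E_\lambda\}$ is an arbitrary collection of closed sets. Then $\bigcup_{\lambda \in \Lambda} E_\lambda^c$ is open by Theorem 3.2.3. By De Morgan's Laws,

$$\left( \bigcap_{\lambda \in \Lambda} E_\lambda \right)^c = \bigcup_{\lambda \in \Lambda} E_\lambda^c.$$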

It then follows from Theorem 3.2.13 that the intersection of an arbitrary collection of closed sets is closed.

Exercise 3.2.11. Let $A$ be bounded above and let $s = \sup A$. If $s \in A$ then certainly $s \in \overline{A}$, so assume $s \notin A$. Then for every $\epsilon > 0$, there exists an $a \in A$ such that $s - \epsilon < a < s$. Hence $a$ falls in the $\epsilon$-neighborhood $V_\epsilon(s)$ of $s$. So $V_\epsilon(s)$ intersects $A$ at a point other than $s$, and hence $s$ is a limit point of $A$. Therefore $s \in \overline{A}$.

Exercise 3.2.12. (a) True. By Theorem 3.2.12, $\overline{A}$ is closed. It then follows from Theorem 3.2.13 that $(\overline{A})^c$ is open.

(b) True. If $a \in A$ is an isolated point, then there exists an $\epsilon_0 > 0$ satisfying $V_{\epsilon_0}(a) \cap A = \{a\}$. It follows that for any $0 < \epsilon \le \epsilon_0$ we would again have $V_\epsilon(a) \cap A = \{a\}$. However, for $A$ to be open, it would have to be that $V_\epsilon(a) \subseteq A$ for some $0 < \epsilon \le \epsilon_0$, and this is impossible.

(c) True. Throughout the proof, let $L$ be the set of limit points for $A$. ($\Rightarrow$) Suppose that $A$ is closed. Then $A$ includes its limit points, so $\overline{A} = A \cup L = A$. ($\Leftarrow$) Let $\overline{A} = A$. Then $A = A \cup L$, hence $A$ contains its limit points and therefore it is closed.

(d) True. See Exercise 3.2.11.

(e) True. If $A = \{a_1, a_2, \ldots, a_n\}$ is a finite set, then $A$ has no limit points. (To prove this, let $x \in \mathbf{R}$ be arbitrary and let $\epsilon_0 = \min\{|x - a_k| : a_k \ne x\}$. Then $V_{\epsilon_0}(x)$ cannot intersect $A$ at a point other than $x$, and therefore $x$ is not a limit point.) By default, $A$ contains its empty set of limit points and thus is closed.

(f) False. $(-\infty, \sqrt{2}) \cup (\sqrt{2}, \infty)$ is a counterexample. For a more interesting example, see Exercise 3.4.10.

Exercise 3.2.13. For contradiction, assume that there exists a nonempty set $A \ne \mathbf{R}$ that is both open and closed. Because $A \ne \mathbf{R}$, $B = A^c$ is also non-empty, and $B$ is open and closed as well. Pick a point $a_1 \in A$ and $b_1 \in B$. We can assume, without loss of generality, that $a_1 < b_1$. Bisect the interval $[a_1, b_1]$ at $c = (a_1 + b_1)/2$. Now $c \in A$ or $c \in B$. If $c \in A$, let $a_2 = c$ and let $b_2 = b_1$. If $c \in B$, let $b_2 = c$ and let $a_2 = a_1$. Continuing this process yields a sequence of nested intervals $I_n = [a_n, b_n]$, where $a_n \in A$ and $b_n \in B$. By the Nested Interval Property, there exists an $x \in \bigcap_{n=1}^\infty I_n$. Because the lengths $(b_n - a_n) \to 0$, we can show $\lim a_n = x$, which implies that $x \in A$ because $A$ is closed. However, it is also true that $\lim b_n = x$ and thus $x \in B$ because $B$ is closed. Thus we have shown $x \in A$ and $x \in A^c$. This contradiction implies that no such $A$ exists, and we conclude that $\mathbf{R}$ and $\emptyset$ are the only two sets that are both open and closed. (This argument is closely related to the discussion of connected sets in the next section.)

Exercise 3.2.14. (a) $[a, b] = \bigcap_{n=1}^\infty (a - 1/n, b + 1/n)$.

(b) $(a, b] = \bigcap_{n=1}^\infty (a, b + 1/n)$; $(a, b] = \bigcup_{n=1}^\infty [a + 1/n, b]$.

(c) Because $\mathbf{Q}$ is countable, we can write $\mathbf{Q} = \{r_1, r_2, r_3, \ldots\}$. Note that each singleton set $\{r_n\}$ is closed and the complement $\{r_n\}^c$ is open. Then $\mathbf{Q} = \bigcup_{n=1}^\infty \{r_n\}$ shows that $\mathbf{Q}$ is an $F_\sigma$ set, and $\mathbf{I} = \mathbf{Q}^c = \bigcap_{n=1}^\infty \{r_n\}^c$ shows that $\mathbf{I}$ is a $G_\delta$ set.

[Figure 3.1: $x + y = s$ must intersect $C_1 \times C_1$.]

3.3

Compact Sets

Exercise 3.3.1. Let $K$ be compact. Then by Theorem 3.3.4, $K$ is closed and bounded. By the Axiom of Completeness, $\sup K$ exists, and by Exercise 3.2.11 we know $\sup K \in \overline{K}$. Because $K$ is closed, $\overline{K} = K$ and hence $\sup K \in K$. A similar argument shows $\inf K \in K$.

Exercise 3.3.2. Let $K \subseteq \mathbf{R}$ be closed and bounded. Since $K$ is bounded, the Bolzano-Weierstrass Theorem guarantees that for any sequence $(a_n)$ contained in $K$, we can find a convergent subsequence $(a_{n_k})$. Because the set is closed, the limit of this subsequence is also in $K$. Hence $K$ is compact.

Exercise 3.3.3. We will show that the Cantor set is closed and bounded. Recall that $C = \bigcap_{n=0}^\infty C_n$. Each $C_n$ is closed because it is a finite union of closed intervals. Now since $C$ is an intersection of closed sets, $C$ itself is closed by Theorem 3.2.14. By construction, the Cantor set is bounded above by 1 and below by 0. Hence, $C$ is a compact set.

Exercise 3.3.4. Let $K$ be compact and let $F$ be closed. Then $K \cap F$ is closed by Theorem 3.2.14. Because $K$ is bounded, $K \cap F$ must be bounded as well. Thus $K \cap F$ is closed and bounded, and hence compact.

Exercise 3.3.5. (a) Not compact. Let $(a_n)$ be a sequence of rational numbers converging to $\sqrt{2}$.

(b) Not compact. Let $(a_n)$ be a sequence of rational numbers converging to an irrational number in the interval $(0, 1)$.

(c) Not compact. Let $a_n = n$.

(d) Compact.

(e) Not compact. Let $a_n = 1/n$. The sequence $(a_n)$ converges to 0 (and thus so does every subsequence), which is not an element of the set.

(f) Compact.

Exercise 3.3.6. (a) Fix $s \in [0, 2]$. We want to find $x_1, y_1 \in C_1$ such that $x_1 + y_1 = s$. We know that $C_1 = [0, 1/3] \cup [2/3, 1]$. Then we have that:

$$[0, 1/3] + [0, 1/3] = [0, 2/3]$$
$$[0, 1/3] + [2/3, 1] = [2/3, 4/3]$$
$$[2/3, 1] + [2/3, 1] = [4/3, 2].$$

Hence $C_1 + C_1 = [0, 2/3] \cup [2/3, 4/3] \cup [4/3, 2] = [0, 2]$, so for any $s \in [0, 2]$, we can find $x_1, y_1 \in C_1$ such that $x_1 + y_1 = s$.

A convenient way to visualize this result in the $(x, y)$-plane is to shade in the four squares corresponding to the components of $C_1 \times C_1$ (see Figure 3.1) and observe that, for each $s \in [0, 2]$, the line $x + y = s$ must intersect at least one of the squares. For each $n$ we can draw a similar picture (with increasing numbers of smaller squares), and our job is to argue that the line $x + y = s$ continues to intersect at least one of the smaller squares.

To argue by induction, suppose that we can find $x_n, y_n \in C_n$ such that $x_n + y_n = s$. To show that this must hold for $n + 1$, let's focus attention on a square from the $n$th stage where $x_n + y_n = s$ holds (i.e., where $x + y = s$ intersects an $n$th stage square). Moving to the $(n + 1)$st stage means removing the open middle third of this shaded region. But this results in a situation precisely like the one in Figure 3.1, implying that the line $x + y = s$ must intersect an $(n + 1)$st stage square. This shows that there exist $x_{n+1}, y_{n+1} \in C_{n+1}$ where $x_{n+1} + y_{n+1} = s$.

(b) We have $(x_n)$ and $(y_n)$ with $x_n, y_n \in C_n$ and $x_n + y_n = s$ for all $n$. The sequence $(x_n)$ need not converge, but $(x_n)$ is bounded, so by the Bolzano-Weierstrass Theorem there exists a convergent subsequence $(x_{n_k})$. Set $x = \lim x_{n_k}$. Now look at the corresponding subsequence $y_{n_k} = s - x_{n_k}$. Using the Algebraic Limit Theorem, we see that this subsequence converges to $y = \lim(s - x_{n_k}) = s - x$. This shows $x + y = s$. We now need to argue that $x, y \in C$. One temptation is to say that because $C$ is closed, $x = \lim(x_{n_k})$ must be in $C$. However, we don't know (and it probably isn't true) that $(x_{n_k})$ is in $C$. We can say that $(x_{n_k})$ is in $C_1$, and because $C_1$ is closed we may conclude $x \in C_1$. In fact, given any fixed $n_0$, we can argue that $x \in C_{n_0}$ because $(x_{n_k})$ is (with the exception of some finite number of terms) contained in the closed set $C_{n_0}$. This implies $x \in \bigcap_{n=1}^\infty C_n = C$ as desired, and a similar argument works for $y$.

Exercise 3.3.7. (a) True. By Theorem 3.2.14, an arbitrary intersection of closed sets is closed. Boundedness is also preserved by intersections; therefore, the arbitrary intersection of compact sets will be compact.

(b) False. Let $K$ be a closed interval and let $A$ be a non-empty open set such that $A \subseteq K$ (for example, $K = [0, 1]$ and $A = (0, 1)$). Then $A \cap K = A$ is not closed, and hence it is not compact.

(c) False. Let $F_n = [n, \infty)$. Then $F_n$ is closed for all $n$, but the intersection of these sets is empty.

(d) True. A finite set is clearly bounded, and by a previous exercise we know that a finite set is closed.

(e) False. The rational numbers are countable but they are not compact.
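Returning to Exercise 3.3.6(a), the interval arithmetic behind $C_1 + C_1 = [0, 2]$, and the analogous statement for later stages $C_n$, can be checked mechanically. The sketch below represents each stage as a list of closed intervals with exact rational endpoints; the helper names are our own, and checking finitely many stages only illustrates the induction rather than replacing it.

```python
from fractions import Fraction as F

C0 = [(F(0), F(1))]


def next_stage(intervals):
    """Remove the open middle third of each closed interval."""
    out = []
    for lo, hi in intervals:
        third = (hi - lo) / 3
        out.append((lo, lo + third))
        out.append((hi - third, hi))
    return out


def interval_sum(I, J):
    """Sum of closed intervals: [a, b] + [c, d] = [a + c, b + d]."""
    return (I[0] + J[0], I[1] + J[1])


def merge(intervals):
    """Merge closed intervals that overlap or touch into a sorted disjoint list."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def sumset(stage):
    """All pairwise interval sums of one stage, merged."""
    return merge(interval_sum(I, J) for I in stage for J in stage)


C1 = next_stage(C0)

if __name__ == "__main__":
    stage = C0
    for n in range(1, 6):
        stage = next_stage(stage)
        print(n, sumset(stage))
```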

Exercise 3.3.8. (a) If $A_1 \cap K$ and $B_1 \cap K$ both had finite subcovers consisting of sets from $\{O_\lambda : \lambda \in \Lambda\}$, then together these would form a finite subcover for $K$. But we assumed that such a finite subcover did not exist for $K$. Hence either $A_1 \cap K$ or $B_1 \cap K$ (or both) has no finite subcover.

(b) Let $I_1$ be a half of $I_0$ whose intersection with $K$ does not have a finite subcover, so that $I_1 \cap K$ cannot be finitely covered and $I_1 \subseteq I_0$. Then bisect $I_1$ into two closed intervals, $A_2$ and $B_2$, and again let $I_2 = A_2$ if $A_2 \cap K$ does not have a finite subcover. Otherwise, let $I_2 = B_2$. Continuing this process of bisecting the interval $I_n$, we get the desired sequence $(I_n)$ with $\lim |I_n| = 0$.

(c) Because $K$ is compact, $K \cap I_n$ is also compact for each $n \in \mathbf{N}$. By Theorem 3.3.5, $\bigcap_{n=1}^\infty (I_n \cap K)$ is non-empty, and there exists an $x \in K \cap I_n$ for all $n$.

(d) Let $x \in K$ be the point from (c) and let $O_{\lambda_0}$ be an open set from the cover that contains $x$. Because $O_{\lambda_0}$ is open, there exists $\epsilon_0 > 0$ such that $V_{\epsilon_0}(x) \subseteq O_{\lambda_0}$. Now choose $n_0$ such that $|I_{n_0}| < \epsilon_0$. Then $I_{n_0}$ is contained in the single open set $O_{\lambda_0}$ and thus $I_{n_0} \cap K$ has a finite subcover. This contradiction implies that $K$ must have originally had a finite subcover.

Exercise 3.3.9. (a) Let $O_\lambda = (\lambda - 1, \lambda + 1)$ where $\lambda \in \mathbf{N}$.

(b) Let $\alpha$ be a fixed irrational number in the interval $(0, 1)$. For each $n \in \mathbf{N}$ set $O_n = (-1, \alpha - 1/n) \cup (\alpha + 1/n, 2)$. The union over $n$ of all these sets gives $(-1, \alpha) \cup (\alpha, 2)$, which contains $\mathbf{Q} \cap [0, 1]$. This cover has no finite subcover.

(c) Let $O_\lambda = (\lambda - 1, \lambda + 1)$ where $\lambda \in \mathbf{N}$.

(e) Let $O_n = (1/n, 2)$ for each $n \in \mathbf{N}$. The union gives $(0, 2)$ and there is no finite subcover.

Exercise 3.3.10. If $A$ is a finite set then it is clearly clompact. Conversely, assume $A$ is clompact. Because a singleton set is a closed set, the collection of singleton sets consisting of the elements of $A$ is a closed cover. This cover must have a finite subcover, and it follows that $A$ is a finite set. To summarize, a set is "clompact" if and only if it is finite.

3.4

Perfect Sets and Connected Sets

Exercise 3.4.1. Let $P$ be a perfect set and let $K$ be compact. Consider the set $P \cap K$. This set is closed by Theorem 3.2.14. Since $K$ is bounded, $P \cap K$ will be bounded as well, and thus the intersection of the two sets is compact. However, $P \cap K$ is not necessarily perfect. For example, let $K$ be a singleton set contained in $P$. Then $P \cap K$ is a singleton set and is not perfect.

Exercise 3.4.2. No. A non-empty perfect set must be uncountable and subsets of $\mathbf{Q}$ are all countable sets.

Exercise 3.4.3. (a) We are given an arbitrary $x \in C$. Because $x \in C \subseteq C_1$, $x$ must fall in one of the two intervals that make up $C_1$. The key idea to remember is that $C$ contains at least the endpoints of these two intervals. Thus, if $0 \le x < 1/3$, let $x_1 = 1/3$. If $x = 1/3$, then take $x_1 = 0$. We can do a similar thing if $x$ falls in the other interval. That is, if $2/3 \le x < 1$, then let $x_1 = 1$, and if $x = 1$ then set $x_1 = 2/3$. In all of these cases we have $x_1 \in C$ with $x_1 \ne x$ and $|x - x_1| \le 1/3$.

(b) For each $n \in \mathbf{N}$, the length of each interval that makes up $C_n$ is $1/3^n$. It is also true that the endpoints of these intervals are always elements of $C$. For every $n$, let $x_n$ be an endpoint of the interval of $C_n$ that contains $x$. If $x$ happens to be an endpoint of a $C_n$ interval, then let $x_n$ be the opposite endpoint of this interval. Thus we have $x_n \in C$ with $x_n \ne x$ such that $|x - x_n| \le 1/3^n$. Because $1/3^n \to 0$, it follows that $x_n \to x$. This means that $x \in C$ is not an isolated point. Having already seen that $C$ is closed, we conclude that $C$ is perfect.

Exercise 3.4.4. (a) This set is compact and perfect, and the arguments proceed exactly as they do for the original Cantor set. (See Figure 3.2.)

[Figure 3.2: The "open middle-fourth" Cantor set.]

(b) The length of this set is equal to 1 minus the lengths of the missing pieces:

$$\text{Length} = 1 - \left( \frac{1}{4} + 2\left(\frac{3}{32}\right) + 4\left(\frac{9}{256}\right) + \cdots \right) = 1 - \left( \frac{1}{4} + \frac{3}{16} + \frac{9}{64} + \cdots \right) = 1 - \left( \frac{1/4}{1 - 3/4} \right) = 1 - 1 = 0.$$

To find the dimension of this set, magnify the set by $\frac{8}{3}$. Then $C_0 = [0, 8/3]$ and $C_1 = [0, 1] \cup [5/3, 8/3]$. Thus we obtain two copies of the set. If $x$ is the dimension of the set, then $x$ should satisfy $2 = \left(\frac{8}{3}\right)^x$, or $x = \frac{\ln 2}{\ln(8/3)} \approx .707$.

Exercise 3.4.5. Let $U$ and $V$ be disjoint, open sets with $A \subseteq U$ and $B \subseteq V$. We claim that $\overline{U} \cap V = \emptyset$ and $U \cap \overline{V} = \emptyset$. To see why this is true, note that because $U$ and $V$ are disjoint we have $U \subseteq V^c$. Now $V^c$ is closed (because $V$ is open) and thus $\overline{U}$ must also satisfy $\overline{U} \subseteq V^c$ by Theorem 3.2.12. This proves $\overline{U} \cap V = \emptyset$, and the other statement has a similar proof.


Since A ⊆ U, limit points of A will also be limit points of U and we get Ā ⊆ Ū. Hence Ā ∩ V = ∅ and therefore Ā ∩ B = ∅. Similarly, B̄ ⊆ V̄, so A ∩ B̄ = ∅. Therefore, A and B are separated.

Exercise 3.4.6. (⇒) Let E be a connected set. Assume E = A ∪ B where A, B are disjoint, non-empty sets. Since E is connected, A and B are not separated. So either Ā ∩ B or A ∩ B̄ is not empty. Without loss of generality, assume x ∈ Ā ∩ B. Then x ∈ B and x ∈ Ā, but x ∉ A because A and B were assumed to be disjoint. Therefore x is a limit point of A. Then by Theorem 3.2.5 there exists a convergent sequence (xn) contained in A that converges to x.
(⇐) We will prove this direction by proving the contrapositive. Assume E ⊆ R is disconnected. We want to find two non-empty, disjoint sets A, B satisfying E = A ∪ B such that there never exists a convergent sequence (xn) → x with (xn) contained in one of A or B, and x an element of the other. Because E is disconnected, there exist separated sets A and B satisfying E = A ∪ B. Now suppose (xn) is contained in A and (xn) → x. Then either x ∈ A or x is a limit point of A, and in either case x ∈ Ā. Because Ā ∩ B = ∅, we know x ∉ B. If we assume (xn) is a convergent sequence in B, a similar argument shows that its limit cannot be in A. This completes the proof.

Exercise 3.4.7. (a) Consider A = Q ∩ (0, 5). Then A is disconnected, for we can write A = (Q ∩ (0, √2)) ∪ (Q ∩ (√2, 5)). But Ā = [0, 5], which is connected.
(b) If A is connected, Ā is connected as well. This follows directly from Theorem 3.4.7. If A is perfect then A is closed and Ā = A. Hence, Ā is perfect as well.

Exercise 3.4.8. (a) Given any x, y ∈ Q with x < y, choose z ∈ I such that x < z < y. We know that such a z exists because the irrational numbers are dense. Then let Q = A ∪ B, where A = Q ∩ (−∞, z) and B = Q ∩ (z, ∞). The sets A and B are separated (see Example 3.4.5(ii)), and x ∈ A and y ∈ B.
(b) The set of irrational numbers is totally disconnected because the rational numbers are also dense in R. Thus we can follow the same argument as in part (a) by letting x, y ∈ I and choosing z ∈ Q.

Exercise 3.4.9. (a) The length of each interval in Cn is 1/3^n. If we choose N so that 1/3^N < ε, where ε = y − x, then x and y cannot belong to the same interval of CN.
(b) Let x and y be in separate intervals of CN, where N is chosen as in (a). Then there exists an open interval between x and y that contains no points of C. Choose z in this interval. Then x < z < y and z ∉ C. If (a, b) were an open interval satisfying (a, b) ⊆ C, then we could find x, y ∈ C with a < x < y < b, and it would follow that [x, y] ⊆ C. However, we have now shown that for all such x and y there exists a point z ∈ (x, y) with z ∉ C. Thus, C contains no intervals (open or closed).
(c) Informally speaking, totally disconnected sets in R are sets that do not contain any intervals. This is the content of part (b). To say it again, we know that given any x, y ∈ C with x < y, there exists a z ∉ C satisfying x < z < y. Take A = C ∩ [0, z) and B = C ∩ (z, 1]. Then A, B are separated with x ∈ A and y ∈ B, and C = A ∪ B.
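The construction in Exercise 3.4.9 can be tried out with exact rational arithmetic. The following is only a sketch: the helper names cantor_intervals and point_outside are hypothetical, and x and y are assumed to be points of the Cantor set with x < y. It finds the first N with 1/3^N < y − x, builds the closed intervals of C_N, and returns the midpoint z of the removed gap just to the right of the interval containing x.

```python
from fractions import Fraction

def cantor_intervals(n):
    """Return the 2**n closed intervals (a, b) whose union is C_n, in increasing order."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        nxt = []
        for a, b in intervals:
            third = (b - a) / 3
            nxt.append((a, a + third))   # keep the left third
            nxt.append((b - third, b))   # keep the right third
        intervals = nxt
    return intervals

def point_outside(x, y):
    """Assuming x < y both lie in the Cantor set, return (N, z) with x < z < y and z not in C_N."""
    eps = y - x
    n = 0
    while Fraction(1, 3 ** n) >= eps:    # first N with 1/3^N < eps
        n += 1
    intervals = cantor_intervals(n)
    # x lies in exactly one interval of C_N, and y must lie in a later one.
    idx = next(i for i, (a, b) in enumerate(intervals) if a <= x <= b)
    right_end = intervals[idx][1]
    next_left = intervals[idx + 1][0]
    z = (right_end + next_left) / 2      # midpoint of the removed open gap
    return n, z

n, z = point_outside(Fraction(0), Fraction(1, 3))
print(n, z)
```

With x = 0 and y = 1/3 the search stops at N = 2, and z = 1/6 lies in the gap (1/9, 2/9) removed when forming C2.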


Exercise 3.4.10. (a) Since O = ⋃_{n=1}^∞ V_{εn}(rn) is a union of open sets, O is open. Therefore F is closed. Every rational number is contained in O, so F must contain only irrationals. Now we must argue that F is non-empty. To informally see this, look at the “length” of O. Since O is the union of open sets of length 1/2^(n−1), the length of O must be no greater than ∑_{n=1}^∞ 1/2^(n−1) = 2. Therefore the entire real line cannot be covered by O, and hence F is non-empty. A way to avoid applying the concept of “length” to sets that are not finite unions of intervals would be to assume, for contradiction, that F = ∅. Then O = R and, in particular, the compact set [0, 3] is covered by {V_{εn}(rn) : n ∈ N}. Now let {V_{ε_{n1}}(r_{n1}), V_{ε_{n2}}(r_{n2}), . . . , V_{ε_{nm}}(r_{nm})} be a finite subcover for [0, 3]. The lengths of this finite collection of open intervals must sum to a total less than 2, and therefore they cannot cover the set [0, 3], which has length 3.
(b) No, the set F does not contain any non-empty open intervals. Every non-trivial interval contains a rational number and this rational is not an element of F. Hence F contains no such intervals. This proves that F is totally disconnected. Given arbitrary a, b ∈ F with a < b, we can find a rational number c with a < c < b. Then writing F = A ∪ B where A = F ∩ (−∞, c) and B = F ∩ (c, ∞) finishes the argument.
(c) We cannot conclude that F is perfect, because it is possible for F to contain isolated points. There does exist a non-empty perfect set of irrational numbers, however. To modify the construction, we again write Q = {r1, r2, r3, . . .}, but this time we define εn inductively. Set ε1 = √2/2 and, as a convention, let Vε(x) = ∅ whenever ε = 0. For n ≥ 2, let εn = min{√2/2^n, dn/2} where

$$d_n = \inf\Big\{ |x - r_n| : x \in \bigcup_{k=1}^{n-1} V_{\epsilon_k}(r_k) \Big\}.$$

Geometrically, dn is the distance from rn to the set O_{n−1} = ⋃_{k=1}^{n−1} V_{εk}(rk). The idea is to inductively build the open set O as a disjoint union of positively spaced neighborhoods of the form V_{εn}(rn). If dn = 0, then because εn is always irrational whenever it is non-zero, we may conclude rn ∈ O_{n−1}. If dn > 0, then the definition of εn ensures that

$$\overline{V_{\epsilon_n}(r_n)} \cap \overline{V_{\epsilon_m}(r_m)} = \emptyset \quad \text{for all } 1 \le m < n. \tag{1}$$

Now O = ⋃_{n=1}^∞ V_{εn}(rn) is open and contains Q, so F = O^c is again a closed set inside the irrationals. It remains to show that it contains no isolated points. Let x ∈ F be arbitrary and assume, for contradiction, that x is isolated. Thus there exists ε0 > 0 such that (x − ε0, x) and (x, x + ε0) are both contained in O. Because of the way we constructed O it now follows that there must exist n0 and m0 such that

$$(x - \epsilon_0, x) \subseteq V_{\epsilon_{n_0}}(r_{n_0}) \quad \text{and} \quad (x, x + \epsilon_0) \subseteq V_{\epsilon_{m_0}}(r_{m_0}).$$

But this contradicts statement (1) above because the point x is a limit point of each of these two neighborhoods. This contradiction proves x is not isolated and the proof is complete.
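The length count in part (a) of Exercise 3.4.10 can be checked with exact fractions. This is a small sketch, assuming (as the lengths 1/2^(n−1) in part (a) suggest) that the n-th neighborhood has radius 1/2^n; the function name total_length is hypothetical. It shows that any finite collection of these neighborhoods has total length strictly below 2, so it cannot cover [0, 3].

```python
from fractions import Fraction

def total_length(indices):
    """Sum of the interval lengths 1/2**(n-1) over a finite set of indices n >= 1."""
    return sum(Fraction(1, 2 ** (n - 1)) for n in set(indices))

# Even the first 50 neighborhoods together fall short of length 2,
# while the interval [0, 3] has length 3.
partial = total_length(range(1, 51))
print(partial < 2, 2 - partial)
```

The gap 2 − partial equals 1/2^49 here, which is the tail of the geometric series that no finite subfamily can recover.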

3.5 Baire’s Theorem

Exercise 3.5.1. (⇒) Let A be a Gδ set. We want to show that this implies that A^c is an Fσ set. By the definition of a Gδ set, A can be written as the countable intersection of open sets. In symbols, A = ⋂_{n=1}^∞ On where On is open for each n ∈ N. Then by De Morgan’s Law, A^c = ⋃_{n=1}^∞ On^c. Because On is open, On^c is closed. Hence, A^c is the countable union of closed sets, and therefore it is an Fσ set.
(⇐) Now let B be an Fσ set. Then we know that B = ⋃_{n=1}^∞ Fn, where Fn is closed for each n ∈ N. It then follows from De Morgan’s Law that B^c = ⋂_{n=1}^∞ Fn^c. Therefore, B^c is the countable intersection of open sets, which makes it a Gδ set.

Exercise 3.5.2. (a) countable. (b) finite. (c) finite. (d) countable.

Exercise 3.5.3. See Exercise 3.2.14.

Exercise 3.5.4. (a) Pick a point x1 ∈ G1. Since G1 is open, there exists an ε1 > 0 such that Vε1(x1) ⊆ G1. Now take 0 < ε1′ < ε1, and let I1 = [x1 − ε1′, x1 + ε1′] be the closure of Vε1′(x1). The significant point to make here is that I1 is a closed interval but we still have the containment I1 ⊆ Vε1(x1) ⊆ G1. Because G2 is dense, there exists an x2 ∈ G2 ∩ Vε1′(x1). Now G2 ∩ Vε1′(x1) is open, so there exists an ε2 > 0 such that Vε2(x2) ⊆ G2 ∩ Vε1′(x1). If we again choose a smaller 0 < ε2′ < ε2, then as before the closed interval I2 = [x2 − ε2′, x2 + ε2′] satisfies I2 ⊆ G2 as well as I2 ⊆ I1. We may continue this process to create a nested sequence of closed intervals I1 ⊇ I2 ⊇ I3 ⊇ · · · satisfying In ⊆ Gn for all n ∈ N.
(b) By the Nested Interval Property, there exists an x ∈ ⋂_{n=1}^∞ In. Because In ⊆ Gn it follows that x ∈ Gn for all n. Hence ⋂_{n=1}^∞ Gn is not empty.

Exercise 3.5.5. Let F be a closed set containing no non-empty open intervals. Then F^c is open and we claim that it must also be dense in R. To see why, assume x, y ∈ R satisfy x < y. By hypothesis, the open interval (x, y) is not contained in F, which means there exists a point z ∈ F^c satisfying x < z < y. This proves F^c is dense.
Turning to the statement in the exercise, assume for contradiction that R = ⋃_{n=1}^∞ Fn where each Fn is a closed set containing no non-empty open intervals. Taking complements we get ∅ = ⋂_{n=1}^∞ Fn^c, and we have just seen that each Fn^c is a dense open set in R. But this is a contradiction, because the intersection of dense open sets is not empty.

Exercise 3.5.6. Assume, for contradiction, that I is an Fσ set. Then we can write I = ⋃_{n=1}^∞ Fn, where each Fn is a closed set. Because each Fn is a subset of I, we can also assert that Fn fails to contain any open intervals. Now Q is the countable union of singleton sets, and each singleton set certainly qualifies as a closed set containing no open intervals. But this implies that we can write R as the countable union of closed sets, none of which contain any open intervals. In the previous exercise we showed that this is impossible, hence I cannot be an Fσ set. If Q were a Gδ set, then by Exercise 3.5.1 its complement I would be an Fσ set, which we have just shown to be impossible.

Exercise 3.5.7. The set (I ∩ (−∞, 0]) ∪ (Q ∩ [0, ∞)) is neither an Fσ set nor a Gδ set.

Exercise 3.5.8. (⇒) Assume that E is nowhere dense in R. Then Ē contains no nonempty open intervals. Given any x, y ∈ R with x < y, we know (x, y) is not a subset of Ē. So there exists a z ∈ (Ē)^c satisfying x < z < y. We also have that (Ē)^c is open because Ē is closed. This proves (Ē)^c is dense.
(⇐) Assume that (Ē)^c is dense. Then for any x, y ∈ R with x < y, we can find a z ∈ (Ē)^c satisfying x < z < y. Therefore Ē cannot contain any nonempty open intervals. It then follows from the definition that E is nowhere-dense.

Exercise 3.5.9. (a) Somewhere in between. (b) Nowhere dense. (c) Dense. (d) Nowhere dense.

Exercise 3.5.10. Assume, for contradiction, that R = ⋃_{n=1}^∞ En, where each En is nowhere dense. Then certainly R = ⋃_{n=1}^∞ Ēn. By De Morgan’s Law this implies that ∅ = ⋂_{n=1}^∞ (Ēn)^c. Because En is nowhere dense, (Ēn)^c is dense. We also know that (Ēn)^c is open. Then we have reached a contradiction, since by Theorem 3.5.2 the countable intersection of dense, open sets is not empty.

Chapter 4

Functional Limits and Continuity

4.1 Discussion: Examples of Dirichlet and Thomae

4.2 Functional Limits

Exercise 4.2.1. (a) Let ε > 0. Notice that |f(x) − 8| = |(2x + 4) − 8| = |2x − 4| = 2|x − 2|. Choose δ = ε/2. Then 0 < |x − 2| < δ = ε/2 implies that

$$|f(x) - 8| = 2|x - 2| < 2\left(\frac{\epsilon}{2}\right) = \epsilon.$$

(b) Let ε > 0. Choose δ = ε^(1/3). Then 0 < |x| < δ = ε^(1/3) implies that

$$|f(x) - 0| = |x^3| < \left(\epsilon^{1/3}\right)^3 = \epsilon.$$

(c) Given an arbitrary ε > 0, our goal is to make |x³ − 8| < ε by restricting |x − 2| to be smaller than some carefully chosen δ. Note that |x³ − 8| = |(x² + 2x + 4)(x − 2)| = |x² + 2x + 4||x − 2|. By insisting that δ ≤ 1, we can restrict x to fall in the interval (1, 3). This implies |x² + 2x + 4| ≤ 9 + 6 + 4 = 19. Now choose δ = min{1, ε/19}. If 0 < |x − 2| < δ, then it follows that

$$|x^3 - 8| = |x^2 + 2x + 4|\,|x - 2| < 19\left(\frac{\epsilon}{19}\right) = \epsilon$$

as desired.


(d) For arbitrary ε > 0, choose δ = 1/10. Then 0 < |x − π| < δ = 1/10 implies 3 < x < 4 and hence [[x]] = 3. Thus, |[[x]] − 3| = |3 − 3| = 0 < ε as desired. Although in most cases smaller values of ε require smaller values of δ in response, this is a non-standard situation where δ can be chosen independently of the value of ε.

Exercise 4.2.2. Then any smaller δ will also suffice.

Exercise 4.2.3. (a) If xn = −1/n and yn = 1/n for n ∈ N, then lim(xn) = lim(yn) = 0. However,

$$\frac{|x_n|}{x_n} = \frac{|-1/n|}{-1/n} = -1 \quad \text{and} \quad \frac{|y_n|}{y_n} = \frac{|1/n|}{1/n} = 1.$$

Thus,

$$\lim \frac{|x_n|}{x_n} \neq \lim \frac{|y_n|}{y_n},$$

and so by Corollary 4.2.5, lim_{x→0} |x|/x does not exist.
(b) Let xn = (n + 1)/n and yn = √((n + 1)/n) for n ∈ N. Then lim(xn) = lim(yn) = 1. We also have xn ∈ Q and yn ∈ I for all n ∈ N, so that

$$\lim g(x_n) = \lim 1 = 1 \quad \text{while} \quad \lim g(y_n) = \lim 0 = 0.$$

By Corollary 4.2.5, lim_{x→1} g(x) does not exist.

Exercise 4.2.4. (a) Let xn = (n + 1)/n, yn = √((n + 1)/n) and zn = (2n + 1)/2n. Note that lim(xn) = lim(yn) = lim(zn) = 1.
(b) For (xn) we get t(xn) = 1/n, which converges to 0. For (yn) we get t(yn) = 0, which converges to 0. For (zn) we get t(zn) = 1/2n, which converges to 0.
(c) The point to make is that the closer a rational number is to 1, the larger its denominator has to be, and thus the smaller the value of t(x). Because t(x) = 0 for all irrational numbers, the conjecture is that lim_{x→1} t(x) = 0. In order to prove our claim, we have to show that given ε > 0, there exists a δ neighborhood around 1 such that x ∈ Vδ(1) with x ≠ 1 implies t(x) ∈ Vε(0). If we set T = {x ∈ R : t(x) ≥ ε}, then notice that x ∈ T if and only if x is a rational number of the form x = m/n (in lowest terms) where n ≤ 1/ε. If we focus on some finite interval such as [0, 2], then the restriction on the size of n implies that the set T ∩ [0, 2] is finite. With finite sets, we are allowed to take minimums and so let δ = min{|y − 1| : y ∈ T ∩ [0, 2], y ≠ 1} > 0. (If ε > 1 then T is empty and any δ > 0 will do.) To see that this choice of δ “works”, we note that if x ∈ Vδ(1) and x ≠ 1 then x ∉ T and thus t(x) ∈ Vε(0).
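The choice of δ in Exercise 4.2.4 (c) can be computed explicitly for a given ε. The sketch below is illustrative only: it represents rationals with Python's Fraction type (irrational inputs, where t is 0, are not modeled), and the names thomae and delta_for are hypothetical. It lists the finitely many points of T in [0, 2] and returns the distance from 1 to the nearest one other than 1 itself.

```python
from fractions import Fraction
from math import floor

def thomae(x):
    """Thomae's function on a rational x: 1/n where x = m/n in lowest terms (so t = 1 at integers)."""
    return Fraction(1, Fraction(x).denominator)

def delta_for(eps, lo=0, hi=2, center=1):
    """Distance from center to the nearest other point of T = {x : t(x) >= eps} in [lo, hi]."""
    max_den = floor(1 / Fraction(eps))       # t(m/n) >= eps forces n <= 1/eps
    points = {Fraction(m, n)
              for n in range(1, max_den + 1)
              for m in range(lo * n, hi * n + 1)}
    distances = [abs(p - center) for p in points if p != center]
    return min(distances) if distances else None   # None: T is empty, any delta works

print(delta_for(Fraction(1, 3)))
```

For ε = 1/3 the only admissible denominators are 1, 2 and 3, and the nearest points of T to 1 are 2/3 and 4/3, so the sketch returns δ = 1/3.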


Exercise 4.2.5. (a) Showing lim_{x→c} [f(x) + g(x)] = L + M is equivalent to showing f(xn) + g(xn) → L + M whenever xn → c. Since we are given f(xn) → L and g(xn) → M, we can use Theorem 2.3.3 part (ii) to conclude f(xn) + g(xn) → L + M.
(b) Let ε > 0 be arbitrary. We need to show that there exists δ such that 0 < |x − c| < δ implies |(f(x) + g(x)) − (L + M)| < ε. Note that |(f(x) + g(x)) − (L + M)| = |(f(x) − L) + (g(x) − M)| ≤ |f(x) − L| + |g(x) − M|. Since lim_{x→c} f(x) = L, there exists δ1 such that 0 < |x − c| < δ1 implies |f(x) − L| < ε/2. In addition, because lim_{x→c} g(x) = M, there exists δ2 such that 0 < |x − c| < δ2 implies |g(x) − M| < ε/2. Now if we pick δ = min{δ1, δ2} then 0 < |x − c| < δ implies that

$$|(f(x) + g(x)) - (L + M)| \le |f(x) - L| + |g(x) - M| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon,$$

as desired.
(c) Showing lim_{x→c} [f(x)g(x)] = LM is equivalent to showing f(xn)g(xn) → LM whenever xn → c. Since we are given f(xn) → L and g(xn) → M, we can use Theorem 2.3.3 part (iii) to conclude f(xn)g(xn) → LM. Now let’s write another proof of the corollary based on Definition 4.2.1. Note that

$$
\begin{aligned}
|f(x)g(x) - LM| &= |f(x)g(x) - f(x)M + f(x)M - LM| \\
&\le |f(x)(g(x) - M)| + |M(f(x) - L)| \\
&= |f(x)|\,|g(x) - M| + |M|\,|f(x) - L|.
\end{aligned}
$$

Since lim_{x→c} f(x) = L, there exists δ1 such that 0 < |x − c| < δ1 implies |f(x) − L| < ε/(2|M|). Next we need a lemma that says f(x) is bounded. Although this may not be the case over the whole domain A, it is certainly true in some neighborhood around x = c. Given ε0 = 1, for instance, we know there exists δ2 > 0 such that 0 < |x − c| < δ2 implies |f(x) − L| < 1, and in this case we then have |f(x)| < |L| + 1. We now use the fact that lim_{x→c} g(x) = M to assert that there exists δ3 > 0 such that 0 < |x − c| < δ3 implies |g(x) − M| < ε/(2(|L| + 1)). Finally, if we pick δ = min{δ1, δ2, δ3}, then

$$|f(x)g(x) - LM| \le |f(x)|\,|g(x) - M| + |M|\,|f(x) - L| < (|L| + 1)\left(\frac{\epsilon}{2(|L| + 1)}\right) + |M|\left(\frac{\epsilon}{2|M|}\right) = \epsilon$$

whenever 0 < |x − c| < δ. Finally, we point out that this proof assumes M ≠ 0. The case M = 0 is a little easier in fact and can be handled as a corollary of the next exercise.


Exercise 4.2.6. We are given that there exists an M > 0 such that |f(x)| ≤ M for all x ∈ A. Let ε > 0 be arbitrary. Because we know lim_{x→c} g(x) = 0, there exists δ > 0 such that 0 < |x − c| < δ implies |g(x) − 0| = |g(x)| < ε/M. It follows that

$$|g(x)f(x) - 0| = |g(x)|\,|f(x)| < \left(\frac{\epsilon}{M}\right) M = \epsilon$$

whenever 0 < |x − c| < δ. Therefore, lim_{x→c} g(x)f(x) = 0.

Exercise 4.2.7. (a) We say lim_{x→c} f(x) = ∞ if for every M > 0 (however large), there exists δ > 0 such that whenever 0 < |x − c| < δ it follows that f(x) > M. Let M > 0 be arbitrary. To prove lim_{x→0} 1/x² = ∞, we can choose δ = √(1/M). Then 0 < |x| < δ = √(1/M) implies x² < 1/M, from which it follows that 1/x² > M, as desired.
(b) We say lim_{x→∞} f(x) = L if for every ε > 0, there exists K > 0 such that whenever x > K it follows that |f(x) − L| < ε. Let ε > 0. To prove lim_{x→∞} 1/x = 0, choose K = 1/ε. If x > K = 1/ε, then 1/x < ε as desired.
(c) We say lim_{x→∞} f(x) = ∞ if for every M > 0 there exists K > 0 such that whenever x > K it follows that f(x) > M. An example of a function with such a limit would be f(x) = √x. Given an arbitrary M > 0, choose K = M². If x > K = M², then it follows that √x > M as desired.

Exercise 4.2.8. Let lim_{x→c} f(x) = L and let lim_{x→c} g(x) = M. We are asked to show L ≥ M. This result is in the same spirit as the Order Limit Theorem (Theorem 2.3.4), and using the Sequential Criterion for Functional Limits we can in fact derive this result from OLT. Let (xn) be a sequence in A satisfying (xn) → c with xn ≠ c for all n. We are given that f(xn) ≥ g(xn), and thus the Order Limit Theorem tells us lim f(xn) ≥ lim g(xn). (This requires that we know the limits exist, a hypothesis not included in early editions of this problem.) By the Sequential Criterion for Functional Limits, L = lim f(xn) and M = lim g(xn), and thus L ≥ M as desired.

Exercise 4.2.9. This is another situation where we could use the analogous statement for sequences (Exercise 2.3.3) to prove the functional limit version. (We could also apply the previous exercise to each inequality.) Instead, we shall give a proof in terms of the ε–δ definition of functional limits. Let ε > 0. Because lim_{x→c} f(x) = L, there exists δ1 > 0 such that 0 < |x − c| < δ1 implies L − ε < f(x) < L + ε. Likewise, there exists δ2 > 0 such that 0 < |x − c| < δ2 implies L − ε < h(x) < L + ε. Choosing δ = min{δ1, δ2}, we see that L − ε < f(x) ≤ g(x) ≤ h(x) < L + ε whenever 0 < |x − c| < δ, which implies |g(x) − L| < ε as desired.
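The δ = min{δ1, δ2} step of Exercise 4.2.9 can be illustrated on a concrete triple of functions. The functions below are an assumption chosen for illustration and do not come from the exercise: f(x) = −x², h(x) = x² and g(x) = x² sin(1/x), all with limit L = 0 at c = 0. The names delta_for and check are hypothetical, and the check samples finitely many points, so it is a sanity test rather than a proof.

```python
import math

def f(x):
    return -x * x                        # lower bound, tends to 0 at c = 0

def h(x):
    return x * x                         # upper bound, tends to 0 at c = 0

def g(x):
    return x * x * math.sin(1.0 / x)     # squeezed function, defined for x != 0

def delta_for(eps):
    delta1 = math.sqrt(eps)              # 0 < |x| < delta1 gives |f(x) - 0| < eps
    delta2 = math.sqrt(eps)              # 0 < |x| < delta2 gives |h(x) - 0| < eps
    return min(delta1, delta2)

def check(eps, samples=1000):
    """Sample points with 0 < |x| < delta and confirm f <= g <= h and |g(x) - 0| < eps."""
    delta = delta_for(eps)
    xs = [delta * k / (samples + 1) for k in range(1, samples + 1)]
    xs += [-x for x in xs]
    return all(f(x) <= g(x) <= h(x) and abs(g(x)) < eps for x in xs)

print(check(0.01), check(1e-6))
```

Here δ1 and δ2 happen to coincide; in general they differ, and taking the minimum guarantees both outer bounds hold at once.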

4.3 Combinations of Continuous Functions

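Exercise 4.3.1. (a) Let ε > 0. Note that |g(x) − g(c)| = |∛x − 0| = |∛x| where c = 0. Now if we set δ = ε³, then |x − 0| < δ = ε³ implies |∛x| < ε. This shows g(x) is continuous at c = 0.
(b) For c ≠ 0 write

$$
\begin{aligned}
|g(x) - g(c)| = |\sqrt[3]{x} - \sqrt[3]{c}| &= |\sqrt[3]{x} - \sqrt[3]{c}| \left( \frac{\sqrt[3]{x^2} + \sqrt[3]{xc} + \sqrt[3]{c^2}}{\sqrt[3]{x^2} + \sqrt[3]{xc} + \sqrt[3]{c^2}} \right) \\
&= \frac{|x - c|}{\sqrt[3]{x^2} + \sqrt[3]{xc} + \sqrt[3]{c^2}} \le \frac{|x - c|}{\sqrt[3]{c^2}}.
\end{aligned}
$$

The last inequality uses ∛(x²) + ∛(xc) ≥ 0, which holds whenever x and c have the same sign, so we insist that δ ≤ |c|. Therefore, if we pick δ = min{|c|, ε∛(c²)}, then |x − c| < δ implies

$$|g(x) - g(c)| = |\sqrt[3]{x} - \sqrt[3]{c}| \le \frac{|x - c|}{\sqrt[3]{c^2}} < \frac{\epsilon \sqrt[3]{c^2}}{\sqrt[3]{c^2}} = \epsilon.$$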

Exercise 4.3.2. (a) Let ε > 0. Because g is continuous at f(c) ∈ B, there exists an α > 0 such that |g(y) − g(f(c))| < ε whenever y ∈ B satisfies |y − f(c)| < α. Now, because f is continuous at c ∈ A, for this value of α we can find a δ > 0 such that |x − c| < δ implies that |f(x) − f(c)| < α. Combining the two statements, we see that for ε > 0, there exists δ > 0 such that |x − c| < δ implies |g(f(x)) − g(f(c))| < ε. Therefore, g ◦ f is continuous at c.
(b) Let’s now prove Theorem 4.3.9 using the sequential characterization of continuity in Theorem 4.3.2 (iv). Assume (xn) → c (with c ∈ A). Our goal is to show g(f(xn)) → g(f(c)). Because f is continuous at c, we know f(xn) → f(c). Then, because g is continuous at f(c), we know that g(f(xn)) → g(f(c)). This completes the proof.

Exercise 4.3.3. Let ε > 0. We need to argue that |f(x) − f(c)| can be made less than ε for all values of x in some δ neighborhood around an arbitrary c. For the case where a ≠ 0, write |f(x) − f(c)| = |(ax + b) − (ac + b)| = |ax − ac| = |a||x − c|. So if we pick δ = ε/|a|, then |x − c| < δ = ε/|a| implies

$$|f(x) - f(c)| = |a|\,|x - c| < |a| \frac{\epsilon}{|a|} = \epsilon.$$

Therefore, f(x) is continuous. If a = 0, then |f(x) − f(c)| = 0, and we may choose δ = 1 regardless of how ε is chosen.

Exercise 4.3.4. (a) Let ε > 0 and fix n ∈ Z. If we set δ = 1, then the point x = n will be the only element of the domain that lies in the Vδ(n) neighborhood. It follows trivially that f(x) ∈ Vε(f(n)) for the point x = n, and we may conclude that f is continuous at n by Theorem 4.3.2 (iii).
(b) Let ε > 0. If c is an isolated point of A, then there exists a neighborhood Vδ(c) that intersects the set A only at c. Because x ∈ Vδ(c) ∩ A implies that x = c, we see f(x) = f(c) ∈ Vε(f(c)). Thus f(x) is continuous at the isolated point c using the criterion in Theorem 4.3.2 (iii).

Exercise 4.3.5. Set ε0 = |g(c)|, which we are assuming to be greater than zero. Because g is continuous, we know there exists an open neighborhood Vδ(c) with the property that g(x) ∈ Vε0(g(c)) provided x ∈ Vδ(c). But notice that Vε0(g(c)) does not contain zero and so we can be sure that g(x) ≠ 0 whenever x ∈ Vδ(c). This guarantees f(x)/g(x) is defined on Vδ(c) as long as x ∈ A. (To properly answer this question as written we need the additional assumption that c be an interior point to the common domain A.)

Exercise 4.3.6. (a) We are asked to show Dirichlet’s function g(x) is nowhere-continuous on R. First consider an arbitrary r ∈ Q. Because I is dense in R there exists a sequence (xn) ⊆ I with (xn) → r. Then, g(xn) = 0 for all n ∈ N while g(r) = 1. Since lim g(xn) = 0 ≠ g(r) we can use Corollary 4.3.3 to conclude g(x) is not continuous at r ∈ Q. Now let’s consider an arbitrary i ∈ I. Because Q is dense in R we can find a sequence (yn) ⊆ Q with (yn) → i. This time g(yn) = 1 for all n ∈ N while g(i) = 0. Because lim g(yn) = 1 ≠ g(i) we can conclude that g is not continuous at i. Combining the two results, we can conclude that Dirichlet’s function is indeed nowhere continuous on R.
(b) Consider an arbitrary rational number r ∈ Q and observe that t(r) ≠ 0. Because I is dense, there exists a sequence (xn) ⊆ I with (xn) → r. Then, t(xn) = 0 for all n ∈ N while t(r) ≠ 0. Thus, lim t(xn) ≠ t(r) and t(x) is not continuous at r.
(c) Consider an arbitrary c ∈ I. Given ε > 0, set T = {x ∈ R : t(x) ≥ ε}.
If x ∈ T, then x is a rational number of the form x = m/n with m, n ∈ Z where n satisfies |n| ≤ 1/ε. By focusing our attention on the interval [c − 1, c + 1] around the point c, we see that the restriction on the size of n implies that the set T ∩ [c − 1, c + 1] is finite. In a finite set, all points are isolated, and c ∉ T because c is irrational, so we can pick a neighborhood Vδ(c) around c such that x ∈ Vδ(c) implies x ∉ T. But if x ∉ T then t(x) < ε, that is, t(x) ∈ Vε(t(c)). By Theorem 4.3.2 (iii), we conclude t(x) is continuous at c.

Exercise 4.3.7. We will prove the set K is closed by showing that it contains all its limit points. Let c be a limit point of K. By Theorem 3.2.5 there is a sequence (xn) ⊆ K with (xn) → c. Because h is continuous on R, lim h(xn) = h(c). But notice xn ∈ K implies h(xn) = 0, and thus lim h(xn) = 0. We conclude h(c) = 0, which implies c ∈ K, as desired.

Exercise 4.3.8. (a) Consider an arbitrary c ∈ I. Because Q is dense in R we can find a sequence (rn) ⊆ Q such that (rn) → c. Using the continuity of f, we see lim f(rn) = f(c). But we are given that rn ∈ Q implies f(rn) = 0, and so f(c) = lim f(rn) = 0.


(b) First define a new function h(x) = f(x) − g(x). By Theorem 4.2.4, h(x) is continuous. Because f(r) = g(r) at every r ∈ Q, we have h(r) = 0 on Q and part (a) implies h(x) = 0 on all of R. This shows f and g are the same function. (The hypothesis that f and g are continuous was not included in early editions.)

Exercise 4.3.9. Geometrically speaking, the condition on f described in this problem says that if f is applied to any two points x and y, then the image values f(x) and f(y) are closer together (in a uniform way) than x and y. This is the reason for the term “contraction.”
(a) Let ε > 0 and fix y ∈ R. To show f is continuous at y, choose δ = ε/c, and observe that |x − y| < δ = ε/c implies

$$|f(x) - f(y)| \le c|x - y| < c\left(\frac{\epsilon}{c}\right) = \epsilon.$$

Because y is arbitrary, f(x) must be continuous on R.
(b) Observe that for any fixed m ∈ N, |ym+1 − ym+2| = |f(ym) − f(ym+1)| ≤ c|ym − ym+1|. This idea can be extended inductively to conclude that

$$|y_{m+1} - y_{m+2}| \le c|y_m - y_{m+1}| \le c^2|y_{m-1} - y_m| \le \cdots \le c^m|y_1 - y_2|.$$

The fact that 0 < c < 1 means ∑_{n=1}^∞ c^n converges, and this will enable us to conclude that (yn) is a Cauchy sequence. To see how, first note that for m < n we have

$$
\begin{aligned}
|y_m - y_n| &\le |y_m - y_{m+1}| + |y_{m+1} - y_{m+2}| + \cdots + |y_{n-1} - y_n| \\
&\le c^{m-1}|y_1 - y_2| + c^m|y_1 - y_2| + \cdots + c^{n-2}|y_1 - y_2| \\
&= c^{m-1}|y_1 - y_2|\left(1 + c + \cdots + c^{n-m-1}\right) \\
&< c^{m-1}|y_1 - y_2|\left(\frac{1}{1 - c}\right).
\end{aligned}
$$

Let ε > 0, and choose N ∈ N large enough so that c^(N−1) < ε(1 − c)/|y1 − y2|. Then the previous calculation shows that n > m ≥ N implies |ym − yn| < ε. We conclude that (yn) is Cauchy.
(c) Set y = lim yn. Because f is continuous, f(y) = lim f(yn). But f(yn) = yn+1, and so f(y) = lim yn+1. Because lim yn+1 = lim yn = y, it follows that f(y) = y and y is a “fixed point.”
(d) The argument in (b) and (c) applies to any sequence of iterates. Thus, given an arbitrary x, we may assert that (x, f(x), f(f(x)), . . .) converges to a limit x0 and that x0 is a fixed point of f. But y is also a fixed point and so |f(x0) − f(y)| = |x0 − y|.


However, |f(x0) − f(y)| ≤ c|x0 − y| must also be true, and because 0 < c < 1 we conclude that x0 = y. In summary, if f is a contraction on R, then f has a unique fixed point, and every sequence of iterates converges to this unique point.

Exercise 4.3.10. (a) Note that f(0) = f(0 + 0) = f(0) + f(0), which implies f(0) = 0. For any x ∈ R, f(0) = f(x − x) = f(x) + f(−x) = 0. This implies f(−x) = −f(x).
(b) Fix c ∈ R and let (xn) → c. To prove that f is continuous at c it is enough to show lim f(xn) = f(c). Now (c − xn) → 0. Because we are given that f is continuous at 0, it follows that lim f(c − xn) = f(0) = 0. Combining the additive condition on f with the Algebraic Limit Theorem then gives 0 = lim f(c − xn) = lim(f(c) − f(xn)) = f(c) − lim f(xn), and we get f(c) = lim f(xn) as desired.
(c) For any n ∈ N, f(n) = f(1 + 1 + · · · + 1) = f(1) + f(1) + · · · + f(1) = nf(1) = nk. For z ∈ Z, the case z < 0 is all that remains to do. In (a) we saw f(−x) = −f(x). Observing that z = −|z| and |z| ∈ N, we can write

$$f(z) = f(-|z|) = -f(|z|) = -|z|k = zk. \tag{1}$$

Before taking on an arbitrary rational number, let’s consider 1/n where n ∈ N. In this case,

$$k = f(1) = f\left(\frac{1}{n} + \frac{1}{n} + \cdots + \frac{1}{n}\right) = n f\left(\frac{1}{n}\right),$$

which gives f(1/n) = k/n. For m, n ∈ N we then get

$$f(m/n) = f\left(\frac{1}{n} + \frac{1}{n} + \cdots + \frac{1}{n}\right) = m f\left(\frac{1}{n}\right) = k(m/n).$$

Finally, for any r ∈ Q satisfying r < 0, an argument similar to equation (1) above gives the result.


(d) Fix x ∈ R. Because Q is dense in R, there exists a sequence (rn) ⊆ Q with (rn) → x. By our work in part (c) we know that f(rn) = krn for all n. Then, because f is continuous at x, we have f(x) = lim f(rn) = lim krn = kx. This completes the proof.

Exercise 4.3.11. (a) The greatest integer function, h(x) = [[x]] from Example 4.3.7, is a suitable example.
(b) Let

$$k(x) = \begin{cases} x(1 - x) & \text{if } 0 < x < 1 \text{ with } x \in \mathbb{Q} \\ 0 & \text{if } 0 < x < 1 \text{ with } x \notin \mathbb{Q} \\ 0 & \text{if } x \ge 1 \text{ or } x \le 0. \end{cases}$$

Because x(1 − x) tends to zero as x approaches 0 and 1, it is possible to show that k is continuous at these points.
(c) This time let

$$l(x) = \begin{cases} 1 & \text{if } 0 \le x \le 1 \text{ with } x \in \mathbb{Q} \\ 0 & \text{if } 0 \le x \le 1 \text{ with } x \notin \mathbb{Q} \\ 0 & \text{if } x > 1 \text{ or } x < 0, \end{cases}$$

which fails to be continuous at 0 and 1, as requested.
(d) The function

$$g(x) = \begin{cases} 1/n & \text{if } x = 1/n \text{ for some } n \in \mathbb{N} \\ 0 & \text{otherwise} \end{cases}$$

is not continuous on A, but observe that it is continuous at 0. (Setting g(x) = 1 when x ∈ A would not work, for instance.)

Exercise 4.3.12. (a) Fix c ∈ C so that g(c) = 1. The standard way to proceed is to find a sequence (xn) in the complement of C with (xn) → c. Then lim g(xn) ≠ g(c) would show g is not continuous at c. Finding this sequence amounts to arguing that the Cantor set does not contain any intervals, and this is the content of Exercise 3.4.9. A more concise approach might be the following. Let ε0 = 1/2. Then for every δ > 0, the neighborhood Vδ(c) is not a subset of C (because C contains no intervals). Thus there exists a point x ∈ Vδ(c) with x ∉ C, and consequently g(x) = 0 ∉ Vε0(g(c)). By the criterion in Theorem 4.3.2 (iii), g is not continuous at c.
(b) Now fix c ∉ C, and let ε > 0 be arbitrary. Because C is closed, C^c is open. This means that there exists a δ > 0 with Vδ(c) ⊆ C^c. Now, if we consider any x ∈ Vδ(c), then x ∈ C^c implies g(x) = 0. Looking again at the criterion for continuity in Theorem 4.3.2 (iii), we see that x ∈ Vδ(c) implies g(x) ∈ Vε(g(c)), and thus g(x) is continuous at every c ∉ C.

4.4 Continuous Functions on Compact Sets

Exercise 4.4.1. (a) Fix c ∈ R and write |f(x) − f(c)| = |x³ − c³| = |x − c||x² + xc + c²|. Insisting that δ ≤ 1 means that x will fall in the interval (c − 1, c + 1), so that |x| < |c| + 1 and thus |x² + xc + c²| < (|c| + 1)² + (|c| + 1)² + c² < 3(|c| + 1)². Now pick δ = min{1, ε/(3(|c| + 1)²)}. Then |x − c| < δ implies

$$|f(x) - f(c)| < \left(\frac{\epsilon}{3(|c| + 1)^2}\right) 3(|c| + 1)^2 = \epsilon.$$

(b) The dependence of δ on the point c is evident in the previous formula, with larger choices of |c| resulting in smaller values of δ. This means that the sequences (xn) and (yn) we seek are necessarily going to tend to infinity. Set xn = n and yn = n + 1/n. Then |xn − yn| = 1/n tends to zero as required, while

$$|f(x_n) - f(y_n)| = \left| n^3 - \left(n + \frac{1}{n}\right)^3 \right| = 3n + \frac{3}{n} + \frac{1}{n^3} \ge 3$$

stays at least ε0 = 3 units apart for all n ∈ N. This proves f is not uniformly continuous on R.
(c) Let A be bounded by M > 0. If x, c ∈ A then |x² + xc + c²| ≤ 3M². Given ε > 0 we can now choose δ = ε/(3M²), which is independent of c. If |x − c| < δ, it follows that

$$|f(x) - f(c)| < \left(\frac{\epsilon}{3M^2}\right) 3M^2 = \epsilon,$$

and f is uniformly continuous on A.

Exercise 4.4.2. For f(x) = 1/x² we see

$$|f(x) - f(y)| = \left| \frac{1}{x^2} - \frac{1}{y^2} \right| = \left| \frac{y^2 - x^2}{x^2 y^2} \right| = |y - x| \left( \frac{y + x}{x^2 y^2} \right).$$

If we restrict our attention to x, y ≥ 1, then we can estimate

$$\frac{y + x}{x^2 y^2} = \frac{1}{x^2 y} + \frac{1}{x y^2} \le 1 + 1 = 2.$$

Given ε > 0, we may then choose δ = ε/2 (independent of x and y), and it follows that |f(x) − f(y)| < (ε/2)2 = ε whenever |x − y| < δ. This shows f is uniformly continuous on [1, ∞). If x and y are allowed to be arbitrarily close to zero, then the expression (x + y)/(x²y²) is unbounded and we get into trouble. To see this more explicitly, set xn = 1/√n and yn = 1/√(n + 1). Then |xn − yn| → 0 while |f(xn) − f(yn)| = |n − (n + 1)| = 1. By the criterion in Theorem 4.4.6, we conclude that f is not uniformly continuous on (0, 1].
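The two pairs of sequences used above to defeat uniform continuity can be evaluated directly. The sketch below uses hypothetical function names; the cubic case uses exact fractions, while the 1/x² case uses floating point and is therefore only approximate. Each function returns the pair (|xn − yn|, |f(xn) − f(yn)|).

```python
from fractions import Fraction
import math

def cubic_gap(n):
    """For f(x) = x**3 with x_n = n and y_n = n + 1/n (exact arithmetic)."""
    x = Fraction(n)
    y = Fraction(n) + Fraction(1, n)
    return abs(x - y), abs(x ** 3 - y ** 3)

def inverse_square_gap(n):
    """For f(x) = 1/x**2 with x_n = 1/sqrt(n) and y_n = 1/sqrt(n + 1) (floating point)."""
    x = 1 / math.sqrt(n)
    y = 1 / math.sqrt(n + 1)
    return abs(x - y), abs(1 / x ** 2 - 1 / y ** 2)

for n in (10, 100, 1000):
    print(n, cubic_gap(n), inverse_square_gap(n))
```

In both cases the first entry shrinks toward zero as n grows while the second stays bounded away from zero (at least 3 for the cubic, about 1 for 1/x²), which is the behavior Theorem 4.4.6 asks for.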


Exercise 4.4.3. Because compactness is preserved by continuous functions, the set f(K) is compact. By Exercise 3.3.1, y_1 = sup f(K) exists and y_1 ∈ f(K). Because y_1 ∈ f(K), there must exist (at least one) point x_1 ∈ K satisfying f(x_1) = y_1, and it follows immediately from the definition of the supremum that f(x) ≤ f(x_1) for all x ∈ K. A similar argument using the infimum yields x_0.

Exercise 4.4.4. Because [a, b] is a compact set, it follows from the Extreme Value Theorem that f attains a minimum. That is, there exists a point x_0 ∈ [a, b] where f(x_0) ≤ f(x) for all x ∈ [a, b]. Then, because f(x_0) > 0, we may write
\[ 0 < \frac{1}{f(x)} \le \frac{1}{f(x_0)} \quad \text{for all } x \in [a, b], \]
and we see that 1/f is bounded.

Exercise 4.4.5. Negating the definition of uniform continuity gives the following: A function f : A → R fails to be uniformly continuous on A if there exists ε_0 > 0 such that for all δ > 0 we can find two points x and y satisfying |x − y| < δ but with |f(x) − f(y)| ≥ ε_0.
The fact that no δ “works” means that if we were to try δ = 1, we would be able to find points x_1 and y_1 where |x_1 − y_1| < 1 but |f(x_1) − f(y_1)| ≥ ε_0. In a similar way, if we try δ = 1/n where n ∈ N, it follows that there exist points x_n and y_n with |x_n − y_n| < 1/n but where |f(x_n) − f(y_n)| ≥ ε_0. The sequences (x_n) and (y_n) are precisely the ones described in Theorem 4.4.6.

Exercise 4.4.6. (a) Let f(x) = 1/x and set x_n = 1/n. Then f(x_n) = n, which is not a Cauchy sequence.
(b) This is impossible. A Cauchy sequence (x_n) in [0, 1] must have a limit in [0, 1] because this is a closed set. If x = lim x_n, then by continuity f(x) = lim f(x_n). Because f(x_n) converges, it is a Cauchy sequence as well.
(c) This is also impossible for the same reasons as in (b). Note that we did not use the compactness of [0, 1] but only the fact that it was closed.
(d) The function f(x) = x(1 − x) has this property.

Exercise 4.4.7. Let ε > 0 be arbitrary. Because f is uniformly continuous on (a, b], there exists δ_1 > 0 such that |f(x) − f(y)| < ε/2 whenever x, y ∈ (a, b] satisfy |x − y| < δ_1. Likewise, there exists δ_2 > 0 such that |f(x) − f(y)| < ε/2 whenever x, y ∈ [b, c) satisfy |x − y| < δ_2.
Now set δ = min{δ_1, δ_2} and assume we have x and y satisfying |x − y| < δ. If both x and y fall in (a, b], or if they both fall in [b, c), then we get |f(x) − f(y)| < ε/2 < ε. In the case where x < b < y, we have |x − b| < δ ≤ δ_1 and |b − y| < δ ≤ δ_2, so we may write
\[ |f(x) - f(y)| \le |f(x) - f(b)| + |f(b) - f(y)| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon. \]
Because δ_1 and δ_2 are both independent of x and y, δ is as well and we conclude that f is uniformly continuous on (a, c).


Exercise 4.4.8. (a) We are given that f is uniformly continuous on [b, ∞). The set [0, b] is compact, and so by Theorem 4.4.8, f is also uniformly continuous on [0, b]. By an argument precisely like the one in Exercise 4.4.7, we can show f is uniformly continuous on [0, ∞).
(b) Let’s first focus our attention on the domain [1, ∞). If x, y ≥ 1, it follows that
\[ |\sqrt{x} - \sqrt{y}| = \left| \frac{x - y}{\sqrt{x} + \sqrt{y}} \right| \le |x - y| \cdot \frac{1}{2}. \]
So, given ε > 0 we can choose δ = 2ε, and it follows that f(x) = √x is uniformly continuous on [1, ∞). By the observation in part (a), we get that f is uniformly continuous on [0, ∞).

Exercise 4.4.9. (a) First write the Lipschitz condition in the form |f(x) − f(y)| ≤ M|x − y| for all x, y ∈ A. Given ε > 0, we choose δ = ε/M. Then |x − y| < δ implies
\[ |f(x) - f(y)| \le M|x - y| < M \cdot \frac{\epsilon}{M} = \epsilon. \]

This proves f is uniformly continuous.
(b) No, not all uniformly continuous functions are Lipschitz. Consider f(x) = √x on [0, 1]. A continuous function on a compact set is uniformly continuous. However, if we set y = 0 and consider x > 0, then we get
\[ \left| \frac{f(x) - f(y)}{x - y} \right| = \left| \frac{\sqrt{x}}{x} \right| = \frac{1}{\sqrt{x}}, \]
which is not bounded for x values arbitrarily close to zero.

Exercise 4.4.10. Yes, uniformly continuous functions map bounded sets to bounded sets. Given ε_0 = 1, there exists δ_0 > 0 such that |f(x) − f(y)| < 1 as long as |x − y| < δ_0. Now the fact that A is bounded means that we can find a finite collection of points {x_1, x_2, ..., x_n} where the neighborhoods of radius δ_0/2, {V_{δ_0/2}(x_1), V_{δ_0/2}(x_2), ..., V_{δ_0/2}(x_n)}, cover A. For each 1 ≤ i ≤ n, the image set f(V_{δ_0/2}(x_i) ∩ A) is bounded because any two points x, y ∈ V_{δ_0/2}(x_i) ∩ A satisfy |x − y| < δ_0 and hence |f(x) − f(y)| < 1. Because f(A) is covered by the finite collection of bounded sets {f(V_{δ_0/2}(x_i) ∩ A) : 1 ≤ i ≤ n}, it follows that f(A) is bounded as well.

Exercise 4.4.11. (⇒) Assume g is continuous on R and let O ⊆ R be open. We want to prove g⁻¹(O) is open. To do this, we fix c ∈ g⁻¹(O) and show that there is a δ-neighborhood of c satisfying V_δ(c) ⊆ g⁻¹(O).
Because c ∈ g⁻¹(O), we know g(c) ∈ O. Now O is open, so there exists an ε > 0 such that V_ε(g(c)) ⊆ O. Given this particular ε, the continuity of g at c allows us to assert that there exists a neighborhood V_δ(c) with the property that x ∈ V_δ(c) implies g(x) ∈ V_ε(g(c)) ⊆ O. But this implies V_δ(c) ⊆ g⁻¹(O), which proves that g⁻¹(O) is open.
(⇐) Conversely, we assume g⁻¹(O) is open whenever O is open, and show that g is continuous at an arbitrary point c ∈ R. Let ε > 0, and set O = V_ε(g(c)). Certainly O is open, so our hypothesis gives us that g⁻¹(O) is open. Because c ∈ g⁻¹(O), there exists a δ > 0 with V_δ(c) ⊆ g⁻¹(O). But this means that whenever x ∈ V_δ(c) we get g(x) ∈ O = V_ε(g(c)), and we conclude that g is continuous at c by the criterion in Theorem 4.3.2 (iii).

Exercise 4.4.12. Assume f is continuous on a compact set K. We must show f is uniformly continuous. Let ε > 0. Then for each x ∈ K, the continuity of f tells us that there exists a δ_x > 0 (depending on x) with the property that, for y ∈ K,
\[ |y - x| < \delta_x \quad \text{implies} \quad |f(y) - f(x)| < \epsilon/2. \]

Now consider the open cover of K consisting of the neighborhoods {V_{δ_x/2}(x) : x ∈ K}. Because K is compact, there exists a finite subcover corresponding to a finite set of points {x_1, x_2, ..., x_n} in K. That is,
\[ K \subseteq V_{\frac{1}{2}\delta_{x_1}}(x_1) \cup V_{\frac{1}{2}\delta_{x_2}}(x_2) \cup \cdots \cup V_{\frac{1}{2}\delta_{x_n}}(x_n). \]
Because we have a finite cover, we may now let
\[ \delta = \min\left\{ \tfrac{1}{2}\delta_{x_1}, \tfrac{1}{2}\delta_{x_2}, \ldots, \tfrac{1}{2}\delta_{x_n} \right\}, \]
and be confident that δ > 0.
Now assume x, y ∈ K satisfy |x − y| < δ. Because we have a cover for K, there must exist x_i for some 1 ≤ i ≤ n where |x_i − x| < δ_{x_i}/2 < δ_{x_i}. It follows that |f(x) − f(x_i)| < ε/2. Also,
\[ |y - x_i| \le |y - x| + |x - x_i| < \delta + \tfrac{1}{2}\delta_{x_i} \le \delta_{x_i}, \]
and so we get |f(y) − f(x_i)| < ε/2 as well. Finally,
\[ |f(x) - f(y)| \le |f(x) - f(x_i)| + |f(x_i) - f(y)| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon. \]
Because δ is chosen independently of x, this shows f is uniformly continuous on K.

Exercise 4.4.13. (a) We want to show that f(x_n) is a Cauchy sequence, so let ε > 0 be arbitrary. Because f is uniformly continuous, there exists δ > 0 such that |f(x) − f(y)| < ε whenever |x − y| < δ. Given this δ, we use the fact that (x_n) is a Cauchy sequence to say that there exists an N ∈ N such that |x_n − x_m| < δ whenever m, n ≥ N. Combining the last two statements we see that |f(x_n) − f(x_m)| < ε whenever m, n ≥ N, which shows that f(x_n) is Cauchy.
(b) (⇒) Let’s first assume g is uniformly continuous on (a, b). Now fix a sequence (x_n) in (a, b) with (x_n) → a. Because (x_n) is Cauchy, it follows from (a) that g(x_n) is Cauchy and hence converges, so let’s define the value of g(a) by asserting that g(a) = lim g(x_n).
Proving that g is continuous at a amounts to showing that if we now take an arbitrary sequence (y_n) that converges to a, then it follows that g(a) = lim g(y_n) as well. This is equivalent to showing that lim[g(y_n) − g(x_n)] = 0.
Given ε > 0, there exists a δ > 0 such that |g(y) − g(x)| < ε whenever |x − y| < δ. Because (x_n) and (y_n) each converge to a, we see that (y_n − x_n) → 0. Thus, there exists an N ∈ N such that |y_n − x_n| < δ for all n ≥ N. But this implies |g(y_n) − g(x_n)| < ε for all n ≥ N, and we conclude lim[g(y_n) − g(x_n)] = 0. Because this implies g(a) = lim g(y_n), we see that g is continuous at a. A similar argument can be used for the point b.
(⇐) Given that g can be continuously extended to the domain [a, b], we immediately get that g is uniformly continuous on [a, b] because [a, b] is a compact set. Thus g is certainly uniformly continuous on the smaller set (a, b).

4.5 The Intermediate Value Theorem

Exercise 4.5.1. The set [a, b] is connected, and so by Theorem 4.5.2, the image set f([a, b]) is also connected. Because f(a) and f(b) are both elements of f([a, b]), we see that L ∈ f([a, b]) as well by Theorem 3.4.6. But this implies that there exists a point c ∈ (a, b) satisfying L = f(c), as desired.

Exercise 4.5.2. (a) False. The function f(x) = 1/x takes the bounded interval (0, 1) to the unbounded interval (1, ∞).
(b) False. The function f(x) = x(1 − x) takes the open interval (0, 1) to the set (0, 1/4], which is clearly not open.
(c) True. By the Preservation of Compactness result, a continuous function maps a bounded closed set (i.e., a compact set) to another compact set. Then, by the Preservation of Connectedness result we may conclude that this compact set is, in fact, an interval.

Exercise 4.5.3. No, because Q is not connected. If such a function were to contain 1 and 2 in its range, then by the Intermediate Value Theorem, its range would also have to contain √2 (and many other irrational points).


Exercise 4.5.4. Assume f : [a, b] → R is increasing and satisfies the intermediate value property stated in Definition 4.5.3. Let’s fix c ∈ (a, b) (the case where c is an endpoint is similar), and let ε > 0. Our task is to produce a δ > 0 such that |x − c| < δ implies |f(x) − f(c)| < ε.
We know f(a) ≤ f(c). If f(c) − ε/2 < f(a), then set x_1 = a. If f(a) ≤ f(c) − ε/2, then the intermediate value property for f implies that there exists x_1 < c where f(x_1) = f(c) − ε/2. Because f is increasing, we see that in either case x ∈ (x_1, c] implies
\[ f(c) - \frac{\epsilon}{2} \le f(x_1) \le f(x) \le f(c). \]
We can follow a similar process on the other side to get that there exists a point x_2 > c with the property that
\[ f(c) \le f(x) \le f(x_2) \le f(c) + \frac{\epsilon}{2} \]
whenever x ∈ [c, x_2). Finally, we set δ = min{c − x_1, x_2 − c}, and it follows that
\[ f(c) - \frac{\epsilon}{2} \le f(x) \le f(c) + \frac{\epsilon}{2} \quad \text{provided } |x - c| < \delta. \]
In particular |f(x) − f(c)| ≤ ε/2 < ε, which completes the proof.

Exercise 4.5.5. Assume, for contradiction, that f(c) > 0. If we set ε_0 = f(c), then the continuity of f implies that there exists a δ_0 > 0 with the property that x ∈ V_{δ_0}(c) implies f(x) ∈ V_{ε_0}(f(c)). But this implies that f(x) > 0 and thus x ∉ K for all x ∈ V_{δ_0}(c). What this means is that if c is an upper bound on K, then c − δ_0 is a smaller upper bound, violating the definition of the supremum. We conclude that f(c) > 0 is not allowed.
Now assume that f(c) < 0. This time, the continuity of f allows us to produce a neighborhood V_{δ_1}(c) where x ∈ V_{δ_1}(c) implies f(x) < 0. But this implies that a point such as c + δ_1/2 is an element of K, violating the fact that c is an upper bound for K. It follows that f(c) < 0 is also impossible, and we conclude that f(c) = 0 as desired.
This proves the Intermediate Value Theorem for the special case where L = 0. To prove the more general version, we consider the auxiliary function h(x) = f(x) − L, which is certainly continuous. From the special case just considered we know h(c) = 0 for some point c ∈ (a, b), from which it follows that f(c) = L.

Exercise 4.5.6. By repeating the construction started in the text, we get a nested sequence of intervals I_n = [a_n, b_n] where f(a_n) < 0 and f(b_n) ≥ 0 for all n ∈ N. By the Nested Interval Property, there exists a point c ∈ ⋂_{n=1}^∞ I_n. The fact that the lengths of the intervals are tending to zero means that the two sequences (a_n) and (b_n) each converge to c. Because f is continuous at c, we get f(c) = lim f(a_n) where f(a_n) < 0 for all n. Then the Order Limit Theorem implies f(c) ≤ 0. Because we also have f(c) = lim f(b_n) with f(b_n) ≥ 0, it must be that f(c) ≥ 0. We conclude that f(c) = 0.
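The interval-halving construction of Exercise 4.5.6 translates directly into a numerical procedure. The sketch below is an illustration only and is not taken from the text; the name `bisect_root`, the tolerance parameter `tol`, and the sample function x² − 2 are assumptions chosen for the example. It maintains the invariant f(a_n) < 0 ≤ f(b_n) used in the proof.

```python
def bisect_root(f, a, b, tol=1e-12):
    """Halve [a, b] repeatedly while keeping f(a) < 0 <= f(b).

    Mirrors the nested-interval argument: the endpoints squeeze a point c
    with f(c) = 0 whenever f is continuous on [a, b].
    """
    if not (f(a) < 0 <= f(b)):
        raise ValueError("need f(a) < 0 <= f(b)")
    while b - a > tol:
        mid = (a + b) / 2
        if f(mid) >= 0:
            b = mid  # keep f(b) >= 0
        else:
            a = mid  # keep f(a) < 0
    return (a + b) / 2


# Hypothetical sample function: x^2 - 2 changes sign on [0, 2].
root = bisect_root(lambda x: x * x - 2, 0.0, 2.0)
print(root)
```

Floating-point arithmetic only approximates the limit point c, so the returned value is an approximation whose accuracy is controlled by `tol`.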


Exercise 4.5.7. The trick here is to apply the Intermediate Value Theorem to the function g(x) = f(x) − x. Because the range of f is contained in the interval [0, 1] we see that
\[ g(0) = f(0) \ge 0 \quad \text{and} \quad g(1) = f(1) - 1 \le 0. \]
It follows from IVT that we must have g(c) = 0 for some point c ∈ [0, 1], and this is equivalent to f(c) = c.

Exercise 4.5.8. No. Let (m_1, h_1) represent the position of the minute and hour hands respectively on “clock 1”, where the variables take values in the interval [0, 12] (with zero identified with 12). Let (m_2, h_2) be the same for “clock 2.” Assume that clock 1 is set at 12:00 and clock 2 is set at H:00, where H ∈ {1, 2, ..., 11}.
If we set x = m_2 = h_1, then we may consider m_1 = m_1(x) to be a continuous function of x with m_1(0) = 0 and m_1(1) = 12. Likewise, h_2 = h_2(x) is also a continuous function of x with h_2(0) = H and h_2(1) = H + (1/12). Now what happens to the function d(x) = h_2(x) − m_1(x) as x ranges over the domain [0, 1]? Well, d(0) = H > 0 and d(1) = H + (1/12) − 12 < 0, and so by IVT there must exist a point c ∈ (0, 1) where d(c) = 0. For this value of c, the two times corresponding to
\[ m_1 = m_1(c),\ h_1 = c \quad \text{and} \quad m_2 = c,\ h_2 = h_2(c) \]
are indistinguishable if the hands on the two clocks are identical. This happens 11 times (once for each value of H) in the course of a twelve hour span of time.
(Note: Refinements in this solution have admittedly made the use of IVT a bit artificial in this problem. We could explicitly write h_2(x) = H + (x/12) and m_1(x) = 12x, and then solve to get c = (12H)/143. As an example, let’s set H = 1. Then for clock 1 we have (144/143, 12/143), which is approximately 12:05:02, and for clock 2 we have (12/143, 144/143), which is approximately 1:00:25.)
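The explicit formulas in the note to Exercise 4.5.8 can be verified with exact fractions. This is a hedged sketch rather than part of the original solution; the function names `indistinguishable_times` and `as_clock` are hypothetical helpers introduced for the example.

```python
from fractions import Fraction


def indistinguishable_times(H):
    """Return c = 12H/143 and the (minute, hour) hand positions on both clocks."""
    c = Fraction(12 * H, 143)   # solves d(c) = H + c/12 - 12c = 0
    m1, h1 = 12 * c, c          # clock 1, set at 12:00: m_1(c) = 12c, h_1 = c
    m2, h2 = c, H + c / 12      # clock 2, set at H:00: m_2 = c, h_2(c) = H + c/12
    return c, (m1, h1), (m2, h2)


def as_clock(hour_pos):
    """Convert an hour-hand position in [0, 12) to (hours, minutes, seconds)."""
    total_seconds = round(hour_pos * 3600)
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return (12 if hours == 0 else hours, minutes, seconds)


c, (m1, h1), (m2, h2) = indistinguishable_times(1)
# The hand positions are swapped between the two clocks: m1 == h2 and h1 == m2.
print(c, as_clock(h1), as_clock(h2))
```

For H = 1 the two readings come out as roughly 12:05:02 and 1:00:25, matching the values quoted in the note.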

4.6 Sets of Discontinuity

Exercise 4.6.1. This problem is contained in Exercise 4.3.11.

Exercise 4.6.2. We say that lim_{x→c−} f(x) = L if for all ε > 0 there exists a δ > 0 such that |f(x) − L| < ε whenever 0 < c − x < δ.

Exercise 4.6.3. (⇒) Let’s assume lim_{x→c} f(x) = L. Then given ε > 0, there exists a δ > 0 such that |f(x) − L| < ε whenever 0 < |x − c| < δ. This δ then satisfies the required condition to prove the existence of the left and right limits.
(⇐) In the other direction, if we are given ε > 0 then we know that there exists a δ_1 > 0 such that |f(x) − L| < ε whenever 0 < x − c < δ_1. We also know there exists a δ_2 > 0 such that |f(x) − L| < ε whenever 0 < c − x < δ_2. If we set δ = min{δ_1, δ_2}, then it follows that |f(x) − L| < ε for all 0 < |x − c| < δ. We conclude lim_{x→c} f(x) = L.


Exercise 4.6.4. This argument is very similar in spirit to the proof of the Monotone Convergence Theorem. Given c ∈ R, let’s prove that lim_{x→c−} f(x) exists for an increasing function f.
Our first task is to produce a candidate for the value of the limit. To this end, set A = {f(x) : x < c}. Because f is increasing, A is bounded above by f(c). By AoC, we can set L = sup A. The claim is that lim_{x→c−} f(x) = L.
Let ε > 0. By the least upper bound property of the supremum, we know that there exists an x_0 < c satisfying L − ε < f(x_0) ≤ L. If we set δ = c − x_0, then the fact that f is increasing implies that L − ε < f(x_0) ≤ f(x) ≤ L whenever 0 < c − x < δ. This proves the claim.
For the right-hand limit we can fashion a similar argument to show that
\[ \lim_{x \to c^+} f(x) = L', \]
where L′ = inf{f(x) : x > c}.
A final consequence of this argument is that the value of the function at c must satisfy L ≤ f(c) ≤ L′. If L = L′ then f is continuous at c, and if L < L′ then we have a jump discontinuity. There are no other possibilities.

Exercise 4.6.5. Let c be a point of discontinuity for an increasing function f. If we set
\[ \lim_{x \to c^-} f(x) = L_c \quad \text{and} \quad \lim_{x \to c^+} f(x) = L'_c, \]
then we know L_c < L′_c. Because Q is dense in R, there exists a rational number r_c satisfying L_c < r_c < L′_c. It is also true that c_1 < c_2 implies r_{c_1} < r_{c_2}, which implies that the mapping φ(c) = r_c defined on the set of discontinuities of f must be 1–1. Because the range of φ is a subset of Q, it follows that the set of discontinuities of f is either countable or finite.

Exercise 4.6.6. For Dirichlet’s function the set of discontinuities is R, and we see R is closed.
For the modified Dirichlet function, we set A_n = (−∞, −1/n] ∪ [1/n, ∞), which is closed for each n ∈ N. Then R\{0} = ⋃_{n=1}^∞ A_n is an F_σ set.
For Thomae’s function we observe that Q is the countable union of singleton sets, and a singleton set is closed.
For the interval (0, 1] write (0, 1] = ⋃_{n=1}^∞ [1/n, 1].


Exercise 4.6.7. Before getting started on this proof, let’s observe that the statement c ∈ D_α is equivalent to saying that for all δ > 0 there exist points y, z ∈ V_δ(c) satisfying |f(y) − f(z)| ≥ α.
To prove D_α is closed we let c be a limit point of D_α and argue that c ∈ D_α. So let δ > 0 be arbitrary. Because c is a limit point, there must exist x_0 ∈ D_α satisfying x_0 ∈ V_{δ/2}(c). But this means that there exist points y, z ∈ V_{δ/2}(x_0) where |f(y) − f(z)| ≥ α. Because V_{δ/2}(x_0) ⊆ V_δ(c), the points y, z provide us with exactly what we need to conclude that c ∈ D_α.
(An alternate proof showing that the complement of D_α is open is also a productive way to attack this problem.)

Exercise 4.6.8. Assume α_1 < α_2 and let c ∈ D_{α_2}. Given δ > 0, the statement c ∈ D_{α_2} implies that there exist y, z ∈ V_δ(c) satisfying |f(y) − f(z)| ≥ α_2 > α_1. Thus c ∈ D_{α_1} as well.

Exercise 4.6.9. Assume f is continuous at x. Then given our fixed α > 0, we know there exists a δ > 0 such that
\[ |f(y) - f(x)| < \frac{\alpha}{2} \quad \text{provided } y \in V_\delta(x). \]
Thus, if y, z ∈ V_δ(x) we then get
\[ |f(y) - f(z)| \le |f(y) - f(x)| + |f(x) - f(z)| < \frac{\alpha}{2} + \frac{\alpha}{2} = \alpha, \]
and we conclude that f is α-continuous at x. The contrapositive of this conclusion is that if f is not α-continuous at x, then it certainly cannot be continuous at x. This is precisely what it means to say D_α ⊆ D_f.

Exercise 4.6.10. Assume f is not continuous at x. Negating the ε–δ definition of continuity we get that there exists an ε_0 > 0 with the property that for all δ > 0 there exists a point y ∈ V_δ(x) where |f(y) − f(x)| ≥ ε_0. Noting simply that both x, y ∈ V_δ(x), we conclude that f is not α-continuous for α = ε_0 (or anything smaller).
To prove D_f = ⋃_{n=1}^∞ D_{1/n} we argue for inclusion each way. If x ∈ D_f, then we have just shown that x ∈ D_{ε_0} for some ε_0 > 0. Choosing n_0 ∈ N large enough so that 1/n_0 ≤ ε_0, it follows from Exercise 4.6.8 that x ∈ D_{1/n_0}. This proves D_f ⊆ ⋃_{n=1}^∞ D_{1/n}. For the reverse inclusion we observe that Exercise 4.6.9 implies D_{1/n} ⊆ D_f for all n ∈ N, and the result follows.

Chapter 5

The Derivative

5.1 Discussion: Are Derivatives Continuous?

5.2 Derivatives and the Intermediate Value Property

Exercise 5.2.1. (i) First we rewrite the difference quotient as
\[ \frac{(f + g)(x) - (f + g)(c)}{x - c} = \frac{f(x) + g(x) - f(c) - g(c)}{x - c} = \frac{f(x) - f(c)}{x - c} + \frac{g(x) - g(c)}{x - c}. \]
The fact that f and g are differentiable at c together with the functional-limit version of the Algebraic Limit Theorem (Theorem 4.2.4) justifies the conclusion (f + g)′(c) = f′(c) + g′(c).
(ii) This time we rewrite the difference quotient as
\[ \frac{(kf)(x) - (kf)(c)}{x - c} = \frac{kf(x) - kf(c)}{x - c} = k \left( \frac{f(x) - f(c)}{x - c} \right). \]
Because f is differentiable at c, it follows from the functional-limit version of the Algebraic Limit Theorem that (kf)′(c) = kf′(c).

Exercise 5.2.2. (a) For c ≠ 0, the derivative of f at c is given by the formula
\[ f'(c) = \lim_{x \to c} \frac{1/x - 1/c}{x - c} = \lim_{x \to c} \frac{(c - x)/xc}{x - c} = \lim_{x \to c} \frac{-1}{xc} = \frac{-1}{c^2}. \]


(b) To avoid confusion with the notation in Theorem 5.2.4, let’s set h(x) = 1/x. By the Chain Rule,
\[ \left( \frac{1}{g(x)} \right)' = (h \circ g)'(x) = \frac{-g'(x)}{[g(x)]^2}. \]
Then using the product rule (Theorem 5.2.4 (iii)), we have
\[ \begin{aligned} \left( \frac{f}{g} \right)'(x) &= [f(x)(h \circ g)(x)]' = f'(x)(h \circ g)(x) + f(x)(h \circ g)'(x) \\ &= \frac{f'(x)}{g(x)} - \frac{f(x) g'(x)}{[g(x)]^2} \\ &= \frac{g(x) f'(x) - f(x) g'(x)}{[g(x)]^2}, \end{aligned} \]
provided that g(x) ≠ 0.
(c) Rewrite the difference quotient as
\[ \begin{aligned} \frac{(f/g)(x) - (f/g)(c)}{x - c} &= \frac{1}{x - c} \left( \frac{f(x)}{g(x)} - \frac{f(c)}{g(c)} \right) \\ &= \frac{1}{x - c} \left( \frac{f(x) g(c) - f(c) g(x)}{g(x) g(c)} \right) \\ &= \frac{1}{x - c} \left( \frac{f(x) g(c) - f(c) g(c) + f(c) g(c) - f(c) g(x)}{g(x) g(c)} \right) \\ &= \frac{1}{g(x) g(c)} \left( g(c) \frac{f(x) - f(c)}{x - c} - f(c) \frac{g(x) - g(c)}{x - c} \right). \end{aligned} \]
Applying the Algebraic Limit Theorem for functional limits, together with the continuity of g at c, gives
\[ \left( \frac{f}{g} \right)'(c) = \frac{1}{[g(c)]^2} \left( g(c) f'(c) - f(c) g'(c) \right), \]
which gives the result.

Exercise 5.2.3. Consider
\[ h(x) = \begin{cases} x^2 & \text{if } x \in \mathbf{Q} \\ 0 & \text{if } x \notin \mathbf{Q}. \end{cases} \]
For points different from zero this function is not continuous and thus not differentiable either. At zero, we have
\[ h'(0) = \lim_{x \to 0} \frac{h(x)}{x}. \]
Given ε > 0, choose δ = ε. Because |h(x)/x| ≤ |x|, we see that |h(x)/x| < ε whenever 0 < |x| < δ and it follows that h is differentiable at zero with h′(0) = 0.


Exercise 5.2.4. (a) From the left side of zero we have lim_{x→0−} f_a(x) = 0, so we require that lim_{x→0+} x^a = 0 as well. This occurs if and only if a > 0.
(b) From (a) we know f_a(0) = 0. For f_a′(0) we again begin by considering the limit from the left and see that
\[ \lim_{x \to 0^-} \frac{f_a(x) - f_a(0)}{x - 0} = \lim_{x \to 0^-} \frac{0}{x} = 0. \]
Thus, we require that
\[ \lim_{x \to 0^+} \frac{x^a}{x} = \lim_{x \to 0^+} x^{a-1} = 0 \]
as well. This occurs if and only if a > 1. The derivative formula (x^a)′ = a x^{a−1} (which we have not justified for a ∉ N) shows that f_a′(x) is continuous in this case.
(c) Because we continue to get zero on the left, for the second derivative to exist we must have
\[ \lim_{x \to 0^+} \frac{(x^a)' - 0}{x - 0} = \lim_{x \to 0^+} \frac{a x^{a-1}}{x} = \lim_{x \to 0^+} a x^{a-2} = 0. \]
This occurs whenever a > 2.

Exercise 5.2.5. (a) With regards to the existence of g_a′(x) at x = 0 we see that
\[ g_a'(0) = \lim_{x \to 0} \frac{x^a \sin(1/x)}{x} = \lim_{x \to 0} x^{a-1} \sin(1/x) = 0, \]
as long as a > 1. For x ≠ 0, g_a′(x) always exists and using the standard rules of differentiation we get
\[ g_a'(x) = -x^{a-2} \cos(1/x) + a x^{a-1} \sin(1/x). \]
Setting 1 < a < 2 makes x^{a−2} cos(1/x) unbounded near zero and yields the desired function.
(b) For g_a′(x) to be continuous we need
\[ \lim_{x \to 0} g_a'(x) = g_a'(0) = 0 \]
and, looking at the above expression for g_a′(x), we see that this happens as long as a > 2. For the second derivative g_a″(0) we consider the limit
\[ g_a''(0) = \lim_{x \to 0} \frac{g_a'(x)}{x} = \lim_{x \to 0} \frac{1}{x} \left( -x^{a-2} \cos(1/x) + a x^{a-1} \sin(1/x) \right) = \lim_{x \to 0} \left( -x^{a-3} \cos(1/x) + a x^{a-2} \sin(1/x) \right), \]
which exists if and only if a > 3. Thus setting 2 < a ≤ 3 gives the desired function.


(c) From (b) we see that choosing a > 3 makes g_a′ differentiable at zero. Away from zero we get
\[ g_a''(x) = -x^{a-4} \sin(1/x) - (2a - 2) x^{a-3} \cos(1/x) + a(a - 1) x^{a-2} \sin(1/x), \]
which fails to be continuous at zero when a ≤ 4. Setting 3 < a ≤ 4 gives the desired function.

Exercise 5.2.6. (a) First let’s prove that there exists x ∈ (a, b) where g(x) < g(a). Let (x_n) be a sequence in (a, b) satisfying (x_n) → a. Then we have
\[ g'(a) = \lim_{n \to \infty} \frac{g(x_n) - g(a)}{x_n - a} < 0. \]
The denominator is always positive. If the numerator were nonnegative for every n, then the Order Limit Theorem would imply g′(a) ≥ 0. Because we know this is not the case, we may conclude that the numerator is negative for some n, and thus g(x) < g(a) for some x near a.
The proof that there exists y ∈ (a, b) where g(y) < g(b) is similar.
(b) We must show that g′(c) = 0 for some c ∈ (a, b). Because g is differentiable on the compact set [a, b] it must also be continuous here, and so by the Extreme Value Theorem (Theorem 4.4.3), g attains a minimum at a point c ∈ [a, b]. From our work in (a) we know that the minimum of g is neither g(a) nor g(b), and therefore c ∈ (a, b). Finally, the Interior Extremum Theorem (Theorem 5.2.6) allows us to conclude g′(c) = 0.
To prove the general result stated in the theorem we just observe that g′(c) = 0 is equivalent to the conclusion f′(c) = α.

Exercise 5.2.7. (a) A function f : A → R is uniformly differentiable on A with derivative f′(t) if for every ε > 0 there exists a δ > 0 such that, for x, t ∈ A, 0 < |x − t| < δ implies
\[ \left| \frac{f(x) - f(t)}{x - t} - f'(t) \right| < \epsilon. \]
(b) Consider f(x) = x², which has derivative f′(x) = 2x, and observe
\[ \left| \frac{x^2 - t^2}{x - t} - 2t \right| = \left| \frac{(x - t)(x + t)}{x - t} - 2t \right| = |x - t|. \]
Given ε > 0, we can choose δ = ε. Then 0 < |x − t| < δ = ε implies
\[ \left| \frac{x^2 - t^2}{x - t} - 2t \right| = |x - t| < \epsilon \]
as desired.
(c) Not necessarily. Consider g_2(x) in Section 5.1. It is differentiable on [0, 1], but not uniformly differentiable on [0, 1]. Given a fixed ε > 0, the value of the response δ gets progressively smaller as we try to compute g_2′(t) at points closer and closer to zero. To see this explicitly, set t_n = 1/(2nπ) and x_n = 0. Then observe that |x_n − t_n| → 0 while
\[ \left| \frac{g_2(x_n) - g_2(t_n)}{x_n - t_n} - g_2'(t_n) \right| = |t_n \sin(1/t_n) + \cos(1/t_n) - 2 t_n \sin(1/t_n)| = |\cos(1/t_n) - t_n \sin(1/t_n)| = 1 \]
for all n ∈ N. In the spirit of the criterion for non-uniform continuity described in Theorem 4.4.6, we see that g_2(x) is not uniformly differentiable.

Exercise 5.2.8. (a) True. Although the derivative function need not be continuous, it does satisfy the intermediate value property. Thus, if the derivative of a function takes on two distinct values then it attains every value, rational and irrational, in between these two.
(b) False. Consider
\[ f(x) = \begin{cases} x/2 + x^2 \sin(1/x) & \text{if } x \ne 0 \\ 0 & \text{if } x = 0. \end{cases} \]
At zero we can show that f′(0) = 1/2. Away from zero we get f′(x) = 1/2 − cos(1/x) + 2x sin(1/x), which takes on negative values in every δ-neighborhood of zero.
(c) True. Assume, for contradiction, that L ≠ f′(0) and choose ε_0 > 0 so that ε_0 < |f′(0) − L|. From the hypothesis that lim_{x→0} f′(x) = L we know there exists a δ > 0 such that 0 < |x| < δ implies that |f′(x) − L| < ε_0. Now our choice of ε_0 guarantees that there exists a point α between f′(0) and L but outside V_{ε_0}(L). However, by Darboux’s Theorem, there exists a point x ≠ 0 in V_δ(0) such that f′(x) = α. This implies that α ∈ V_{ε_0}(L), which is a contradiction. Therefore L = f′(0).
(d) True. More to come...
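The sequences in Exercise 5.2.7 (c) can be examined numerically. The sketch below is an illustration under stated assumptions and is not part of the original solution: it takes g_2(x) = x² sin(1/x) with g_2(0) = 0, which is the formula consistent with the difference quotient computed in that exercise, and the helper name `defect` is hypothetical.

```python
import math


def g2(x):
    """Assumed form of g_2: x^2 sin(1/x) away from zero, and 0 at zero."""
    return x * x * math.sin(1 / x) if x != 0 else 0.0


def g2_prime(x):
    """Derivative of g2 at a point x != 0, from the product and chain rules."""
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)


def defect(n):
    """Return (|x_n - t_n|, |difference quotient - g2'(t_n)|) for x_n = 0, t_n = 1/(2n*pi)."""
    t = 1 / (2 * n * math.pi)
    x = 0.0
    quotient = (g2(x) - g2(t)) / (x - t)
    return abs(x - t), abs(quotient - g2_prime(t))


for n in (1, 10, 100, 1000):
    distance, error = defect(n)
    # distance shrinks like 1/n while error stays near 1 (up to rounding)
    print(n, distance, error)
```

No single δ can therefore serve ε = 1/2 at every point of [0, 1], which is what the failure of uniform differentiability means.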

5.3 The Mean Value Theorem

Exercise 5.3.1. Because f′ is continuous on the compact set [a, b], we know that it is bounded. Thus, there exists M > 0 such that |f′(x)| ≤ M for all x ∈ [a, b]. Now, given x < y in the interval [a, b], the Mean Value Theorem says that there exists a point c ∈ (x, y) for which
\[ \frac{f(x) - f(y)}{x - y} = f'(c). \]
Because |f′(c)| ≤ M (regardless of the value of c), it follows that
\[ \left| \frac{f(x) - f(y)}{x - y} \right| \le M. \]


Exercise 5.3.2. Because f′ is continuous on a compact set (let’s call it I), the Extreme Value Theorem can be used to conclude that f′ attains a maximum and a minimum value on I. Thinking in terms of absolute value, this means that there exists a point x_0 ∈ I where |f′(x)| ≤ |f′(x_0)| for all x ∈ I. Setting s = |f′(x_0)|, we see from our hypothesis that 0 ≤ s < 1.
Now, given x < y in I, the Mean Value Theorem tells us that there exists a point c ∈ I where
\[ \left| \frac{f(x) - f(y)}{x - y} \right| = |f'(c)| \le |f'(x_0)| = s. \]
It follows that |f(x) − f(y)| ≤ s|x − y|, and f is contractive on I.

Exercise 5.3.3. (a) Set g(x) = x − h(x). Because g(1) = −1 and g(3) = 1, by the Intermediate Value Theorem (Theorem 4.5.1), there must exist a d ∈ [0, 3] where g(d) = 0. In terms of h, we note that this implies h(d) = d, as desired.
(b) Applying the Mean Value Theorem to h on the interval [0, 3] implies that there exists a point c ∈ (0, 3) where
\[ h'(c) = \frac{h(3) - h(0)}{3 - 0} = \frac{2 - 1}{3} = \frac{1}{3}. \]
(c) Applying Rolle’s Theorem to h on the interval [1, 3], we see that there must exist a point a ∈ (1, 3) where h′(a) = 0. In (b), we found a point where h′(c) = 1/3. Because 1/4 falls between 0 and 1/3, we can appeal to Darboux’s Theorem to assert that h′(x) = 1/4 at some point between c and a.

Exercise 5.3.4. (a) Let h(x) = [f(b) − f(a)]g(x) − [g(b) − g(a)]f(x). From the many “algebraic limit” theorems we know that h is continuous on [a, b] and differentiable on (a, b). We also have h(a) = g(a)f(b) − f(a)g(b) = h(b). Thus by Rolle’s Theorem, there exists a c ∈ (a, b) where h′(c) = 0. Because h′(x) = [f(b) − f(a)]g′(x) − [g(b) − g(a)]f′(x), we see that
\[ [f(b) - f(a)] g'(c) - [g(b) - g(a)] f'(c) = 0, \]
and the result follows.
(b) Set x = g(t) and y = f(t) and consider the parametric curve in the x–y plane drawn as t ranges over the interval [a, b]. The quantity (f(b) − f(a))/(g(b) − g(a)) corresponds to the slope of the segment joining the endpoints of this curve, while f′(c)/g′(c) gives the slope of the line tangent to the curve at the point (g(c), f(c)). In this context, the Generalized Mean Value Theorem says that if g′ is never zero, then at some point along the parametric curve, the tangent line must be parallel to the segment joining the two endpoints.


Exercise 5.3.5. Assume, for contradiction, that f has two distinct fixed points x_1 and x_2. Noting that f(x_1) = x_1 and f(x_2) = x_2, the Mean Value Theorem implies that there exists c where
\[ f'(c) = \frac{f(x_1) - f(x_2)}{x_1 - x_2} = \frac{x_1 - x_2}{x_1 - x_2} = 1. \]
Because this is impossible, we conclude that f can have at most one fixed point.

Exercise 5.3.6. (⇒) First let’s show that if g(d) = d for some d ∈ (0, 1), then g′(1) > 1. Applying the Mean Value Theorem to g on [d, 1], we can see that
\[ g'(c) = \frac{g(1) - g(d)}{1 - d} = \frac{1 - d}{1 - d} = 1 \]
for some c ∈ (d, 1). Now we apply the Mean Value Theorem to g′ on [c, 1] to assert that
\[ g''(a) = \frac{g'(1) - g'(c)}{1 - c} \]
for some a ∈ (c, 1). Because g″(a) > 0, the numerator in the previous expression must be strictly positive and it follows that g′(1) > g′(c) = 1.
(⇐) Now let’s show that if g′(1) > 1, then g(d) = d for some d ∈ (0, 1). As we often do in arguments about fixed points, define the auxiliary function f(x) = g(x) − x and observe that f(0) = g(0) > 0. If we could find a point x ∈ (0, 1) where f(x) < 0, then we could use the Intermediate Value Theorem to conclude f(d) = 0 for some d.
At x = 1 we have f(1) = g(1) − 1 = 0 and f′(1) = g′(1) − 1 > 0. If (x_n) ⊆ (0, 1) satisfies (x_n) → 1, then
\[ f'(1) = \lim_{n \to \infty} \frac{f(1) - f(x_n)}{1 - x_n} = \lim_{n \to \infty} \frac{-f(x_n)}{1 - x_n} > 0. \]
If f(x_n) ≥ 0 for all n ∈ N, then the Order Limit Theorem would imply f′(1) ≤ 0. Because this is not the case, it follows that f(x_n) < 0 must be true for some values of n. Because f(0) > 0, the Intermediate Value Theorem tells us that there must exist a point d where f(d) = 0. Finally, this implies g(d) = d.

Exercise 5.3.7. (a) (⇒) If f is increasing on (a, b), then
\[ \frac{f(x) - f(c)}{x - c} \ge 0 \]
for every x, c ∈ (a, b) with x ≠ c. It follows from the Order Limit Theorem (or an analogous version for functional limits) that
\[ f'(c) = \lim_{x \to c} \frac{f(x) - f(c)}{x - c} \ge 0 \]
for all c ∈ (a, b).


(⇐) For the other direction we use the Mean Value Theorem. Here we are assuming f′(x) ≥ 0 on (a, b) and we are asked to prove that f is increasing. Given x < y, it follows from MVT that
\[ f'(c) = \frac{f(y) - f(x)}{y - x} \]
for some point c ∈ (x, y). Because f′(c) ≥ 0 and y − x > 0, we conclude that f(x) ≤ f(y) and f is increasing.
(b) First observe that
\[ g'(0) = \lim_{x \to 0} \frac{x/2 + x^2 \sin(1/x)}{x} = \lim_{x \to 0} \left( \frac{1}{2} + x \sin(1/x) \right) = \frac{1}{2}. \]
Thus the derivative is strictly positive at zero. Away from zero, however, we get
\[ g'(x) = 1/2 - \cos(1/x) + 2x \sin(1/x). \]
If we set x_n = 1/(2nπ), then g′(x_n) = −1/2 < 0 for all n ∈ N. Because (x_n) → 0, there is no neighborhood around zero in which g′(x) ≥ 0, and so by part (a), the function is not increasing in any neighborhood of zero.
The moral here is that knowing that the derivative is positive at a point does not imply that the function is increasing near this point.

Exercise 5.3.8. Let’s consider the case where L = g′(c) > 0. Set ε_0 = L. Because
\[ L = \lim_{x \to c} \frac{g(x) - g(c)}{x - c}, \]
there exists a neighborhood V_δ(c) with the property that
\[ \frac{g(x) - g(c)}{x - c} \in V_{\epsilon_0}(L) \]
whenever x ∈ V_δ(c) with x ≠ c. But notice that V_{ε_0}(L) contains only positive numbers. This means that if x > c then g(x) > g(c), and if x < c then g(x) < g(c).
Exercise 5.3.7 reminds us that a positive derivative at a single point does not imply that the function is increasing in a neighborhood of this point. What this exercise shows is that we can say something weaker. If g′(c) > 0 then it does follow that x > c implies g(x) > g(c) and x < c implies g(x) < g(c) for x in V_δ(c). Very roughly speaking, we might say that “g is increasing at the point c.”

Exercise 5.3.9. Let M > 0 be arbitrary. We need to produce a δ such that 0 < |x − c| < δ implies that |f(x)/g(x)| ≥ M. Choose δ_1 so that 0 < |x − c| < δ_1 implies |f(x) − L| < |L|/2. This guarantees that f(x) is not too close to zero, and in particular we have |f(x)| ≥ |L|/2. Because lim_{x→c} g(x) = 0, we can choose δ_2 such that |g(x)| < |L|/(2M) provided 0 < |x − c| < δ_2.
Let δ = min{δ_1, δ_2}. Then we have
\[ \left| \frac{f(x)}{g(x)} \right| \ge \frac{|L|/2}{|L|/(2M)} = M \]
whenever 0 < |x − c| < δ, and the result is proved.
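The counterexample in Exercise 5.3.7 (b) can be probed numerically. This is a rough sketch for illustration, not part of the original solution; the sample point 1/(2π) and the step `step = 1e-4` are arbitrary choices made here, and floating-point values only approximate the exact quantities.

```python
import math


def g(x):
    """The function from Exercise 5.3.7 (b), extended by g(0) = 0."""
    return x / 2 + x * x * math.sin(1 / x) if x != 0 else 0.0


def g_prime(x):
    """g'(0) = 1/2 from the limit definition; the usual rules apply elsewhere."""
    if x == 0:
        return 0.5
    return 0.5 - math.cos(1 / x) + 2 * x * math.sin(1 / x)


for n in (1, 10, 100):
    x_n = 1 / (2 * n * math.pi)
    # cos(1/x_n) = 1 and sin(1/x_n) = 0, so g'(x_n) is -1/2 up to rounding
    print(n, x_n, g_prime(x_n))

# g actually decreases across a small interval centred at x_1 = 1/(2*pi).
x_1 = 1 / (2 * math.pi)
step = 1e-4
print(g(x_1 - step) > g(x_1 + step))
```

Points where the derivative equals −1/2 accumulate at zero even though g′(0) = 1/2, so g is not increasing on any neighborhood of zero.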

Exercise 5.3.10. The fact that $f$ is bounded means that there exists $M > 0$ satisfying $|f(x)| \le M$ for all $x$ in the domain. Let $\epsilon > 0$. Because $\lim_{x \to c} g(x) = \infty$, there exists a $\delta > 0$ such that $0 < |x - c| < \delta$ implies $|g(x)| > M/\epsilon$. It then follows that
\[ \left| \frac{f(x)}{g(x)} \right| < \frac{M}{M/\epsilon} = \epsilon, \]
provided $0 < |x - c| < \delta$, and the proof is complete.

Exercise 5.3.11. Let $\epsilon > 0$. Because $L = \lim_{x \to a} f'(x)/g'(x)$, we know that there exists a $\delta > 0$ such that
\[ \left| \frac{f'(t)}{g'(t)} - L \right| < \epsilon \]
provided $0 < |t - a| < \delta$. This $\delta$ is going to suffice to prove $L = \lim_{x \to a} f(x)/g(x)$ as well. To see why, pick $x \in V_\delta(a)$ with $a < x$ (the case $x < a$ is similar) and apply GMVT to $f$ and $g$ on the interval $[a, x]$. Because $f(a) = g(a) = 0$, we get a point $c \in (a, x)$ where
\[ \frac{f'(c)}{g'(c)} = \frac{f(x) - f(a)}{g(x) - g(a)} = \frac{f(x)}{g(x)}. \]
Because $c$ must satisfy $0 < |c - a| < \delta$, it follows that
\[ \left| \frac{f(x)}{g(x)} - L \right| = \left| \frac{f'(c)}{g'(c)} - L \right| < \epsilon \]
whenever $0 < |x - a| < \delta$. This completes the proof.

Exercise 5.3.12. For all $x \ne a$ we can write
\[ \frac{f(x)}{g(x)} = \frac{f(x) - f(a)}{g(x) - g(a)} = \frac{(f(x) - f(a))/(x - a)}{(g(x) - g(a))/(x - a)}. \]
Because $f$ and $g$ are differentiable at $a$, we may use the Algebraic Limit Theorem to conclude
\[ \lim_{x \to a} \frac{f(x)}{g(x)} = \frac{f'(a)}{g'(a)}. \]
Finally, the continuity of $f'$ and $g'$ at $a$ implies
\[ L = \lim_{x \to a} \frac{f'(x)}{g'(x)} = \frac{f'(a)}{g'(a)}, \]
and the result follows. (Note that this argument also assumes $g'(a) \ne 0$.)

Exercise 5.3.13. Because $f$ and $g$ are continuous on the interval containing $a$, we can conclude that $f(a) = \lim_{x \to a} f(x) = 0$ and $g(a) = \lim_{x \to a} g(x) = 0$. Now we have the same hypothesis as Theorem 5.3.6, and the rest of the proof will be the same.
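As a numerical illustration of the 0/0 case treated in Exercises 5.3.11 through 5.3.13, the sketch below compares $f(x)/g(x)$ with $f'(x)/g'(x)$ near $a = 0$. The pair $f(x) = 1 - \cos x$, $g(x) = x^2$ is a hypothetical example chosen here, not one taken from the text; it satisfies $f(0) = g(0) = 0$ and $g'(x) \ne 0$ for $x \ne 0$.

```python
import math

# Hypothetical example pair (not from the text): both vanish at a = 0.
def f(x):
    return 1 - math.cos(x)

def g(x):
    return x * x

def f_prime(x):
    return math.sin(x)

def g_prime(x):
    return 2 * x

# Both ratios should approach the same limit, here 1/2, as x -> 0.
for x in (1e-1, 1e-2, 1e-3):
    print(x, f(x) / g(x), f_prime(x) / g_prime(x))
```

For much smaller $x$ the subtraction in $1 - \cos x$ loses floating-point precision, so the table stops at $10^{-3}$.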

5.4

A Continuous Nowhere-Differentiable Function

Exercise 5.4.1. The graph of $h_1(x)$ is similar to the sawtooth function $h(x)$ except that the maximum height is now $1/2$ and the length of the period is 1. For each $n$, the maximum height of $h_n(x)$ is $1/2^n$ and the period is $1/2^{n-1}$. Note that the slopes of the segments that make up $h_n(x)$ continue to be $\pm 1$ for all values of $n$.

Exercise 5.4.2. The key observation is that $0 \le h(x) \le 1$ so that for every $n$ we have
\[ 0 \le \frac{1}{2^n} h(2^n x) \le \frac{1}{2^n}. \]
Because the geometric series $\sum_{n=0}^{\infty} 1/2^n$ converges, the Comparison Test implies that our series for $g(x)$ converges for every choice of $x$. Because all the terms are nonnegative, the convergence is absolute.

Exercise 5.4.3. For each $n$, the linear function $l(x) = 2^n x$ is certainly continuous. Then the Composition of Continuous Functions Theorem (Theorem 4.3.9) implies $h(2^n x)$ is continuous. The Algebraic Continuity Theorem (Theorem 4.3.4) part (i) implies $\frac{1}{2^n} h(2^n x)$ is continuous. Finally, part (ii) of the same theorem says
\[ g_m(x) = h(x) + \frac{1}{2} h(2x) + \cdots + \frac{1}{2^m} h(2^m x) \]
is continuous as long as the sum is finite.

Exercise 5.4.4. For $g'(0)$ to exist, the sequential criterion for limits requires that
\[ g'(0) = \lim_{m \to \infty} \frac{g(x_m) - g(0)}{x_m - 0} \]
exist for any sequence $(x_m) \to 0$ with $x_m \ne 0$. Fix $m \in \mathbf{N}$ and consider $x_m = 1/2^m$. Then
\[ g(x_m) = \sum_{n=0}^{\infty} \frac{1}{2^n} h(2^{n-m}). \]

If $n > m$ then $h(2^{n-m}) = 0$ because the sawtooth function is zero at any multiple of 2. If $n \le m$ then we are on the part of the graph where $h(x) = x$ and we get
\[ \frac{1}{2^n} h(2^{n-m}) = \frac{1}{2^n} 2^{n-m} = \frac{1}{2^m}. \]
It follows that $g(x_m)$ can be represented with the finite sum
\[ g(x_m) = \sum_{n=0}^{m} \frac{1}{2^m}. \]
Turning our attention to the difference quotient, we get
\[ \frac{g(x_m) - g(0)}{x_m - 0} = \frac{\sum_{n=0}^{m} 1/2^m}{1/2^m} = \sum_{n=0}^{m} 1 = m + 1. \]
Because this quantity increases without bound, it is impossible for $\lim_{m \to \infty} g(x_m)/x_m$ to exist. It follows that $g$ is not differentiable at zero.

Exercise 5.4.5. (a) To show that $g'(1)$ does not exist we continue to let $x_m = 1/2^m$ and consider
\[ g(1 + x_m) = \sum_{n=0}^{\infty} \frac{1}{2^n} h(2^n (1 + 1/2^m)) = \sum_{n=0}^{\infty} \frac{1}{2^n} h(2^n + 2^{n-m}). \]

If $n > m$ then, as before,
\[ \frac{1}{2^n} h(2^n + 2^{n-m}) = 0. \]
If $1 \le n \le m$, then
\[ \frac{1}{2^n} h(2^n + 2^{n-m}) = \frac{1}{2^n} h(2^{n-m}) = \frac{1}{2^n} 2^{n-m} = \frac{1}{2^m}. \]
If $n = 0$, then
\[ \frac{1}{2^n} h(2^n + 2^{n-m}) = h(1 + 1/2^m) = h(1) - 1/2^m = g(1) - 1/2^m. \]
If we write down the difference quotient for the interval $[1, 1 + x_m]$ we get
\[ \frac{g(1 + x_m) - g(1)}{x_m} = \frac{\sum_{n=0}^{m} \frac{1}{2^n} h(2^n + 2^{n-m}) - g(1)}{1/2^m} = \frac{\left[ \sum_{n=1}^{m} 1/2^m \right] + (g(1) - 1/2^m) - g(1)}{1/2^m} = m - 1. \]
Because this is (again) unbounded as $m \to \infty$, it must be that $g'(1)$ does not exist.

(b) Now let $x = p/2^k$ and consider
\[ g(x + x_m) = \sum_{n=0}^{\infty} \frac{1}{2^n} h(2^n (x + 1/2^m)) = \sum_{n=0}^{\infty} \frac{1}{2^n} h(p 2^{n-k} + 2^{n-m}). \]

Because we are ultimately interested in what happens as $m \to \infty$, let's compute $g(x + x_m)$ assuming $m > k$. If $n > m$ then because we are at a multiple of 2 on the graph of $h(x)$ it follows that
\[ \frac{1}{2^n} h(p 2^{n-k} + 2^{n-m}) = 0. \]
If $k < n \le m$, then the periodicity of $h$ allows us to write
\[ \frac{1}{2^n} h(p 2^{n-k} + 2^{n-m}) = \frac{1}{2^n} h(2^{n-m}) = \frac{1}{2^n} 2^{n-m} = \frac{1}{2^m}. \]
Finally, if $0 \le n \le k$, then $2^n x$ and $2^n (x + x_m)$ fall on the same linear segment of $h(x)$ and we get
\[ \frac{1}{2^n} h(p 2^{n-k} + 2^{n-m}) = \frac{1}{2^n} \left[ h(p 2^{n-k}) \pm 2^{n-m} \right] = \frac{1}{2^n} h(2^n x) \pm 1/2^m, \]
where the choice of $+$ or $-$ depends on the values of $p$ and $n$. Observing that $g(x) = \sum_{n=0}^{k} (1/2^n) h(2^n x)$, it follows that
\[ \frac{g(x + x_m) - g(x)}{x_m} = \frac{\sum_{n=0}^{m} \frac{1}{2^n} h(p 2^{n-k} + 2^{n-m}) - g(x)}{1/2^m} = \frac{\left[ \sum_{n=k+1}^{m} 1/2^m \right] + \left[ \sum_{n=0}^{k} \left( (1/2^n) h(2^n x) \pm 1/2^m \right) \right] - g(x)}{1/2^m} \]
\[ = (m - k) + \sum_{n=0}^{k} \pm 1 \ge m - 2k - 1. \]
Because this is unbounded as $m \to \infty$, it must be that $g'(x)$ does not exist for any dyadic rational point on the graph. The fact that we get $\infty$ from the right for all of these limits is reflected in the graph of $g$ by the downward cusps that appear at every dyadic rational point.

Exercise 5.4.6. (a) Because each $h_i$ is differentiable at all nondyadic points, Theorem 5.2.4 implies that the finite sum $g_m$ is differentiable at nondyadic points as well. This same theorem also allows us to say
\[ |g'_{m+1}(x) - g'_m(x)| = |h'_{m+1}(x)| \]

and $h'_{m+1}(x) = \pm 1$ because $h_{m+1}$ is a piecewise linear function consisting of segments of slope $\pm 1$.

(b) The partial sum $g_m$ is a piecewise linear function and $g'_m(x)$ is the slope of the piece containing the nondyadic point $x \in [x_m, y_m]$. The first important observation is that because $h_n(x_m) = h_n(y_m) = 0$ for all $n > m$, it follows that $g_m(x_m) = g(x_m)$ and $g_m(y_m) = g(y_m)$. Focusing on the graphs over the interval $[x_m, y_m]$, what we see is that $g_m$ is the line segment connecting the points $(x_m, g(x_m))$ and $(y_m, g(y_m))$ and thus
\[ g'_m(x) = \frac{g(y_m) - g(x_m)}{y_m - x_m}. \]

The other important observation is that because $h_n(x) \ge 0$ for all $n$, we get that $g(x) > g_m(x)$. Put another way, the segment of $g_m$ over the interval $[x_m, y_m]$ lies under the graph of $g$ (and is equal to $g$ at the endpoints). It follows that
\[ \frac{g(y_m) - g(x)}{y_m - x} < \frac{g(y_m) - g(x_m)}{y_m - x_m} < \frac{g(x) - g(x_m)}{x - x_m}, \]
and the result follows.

(c) If $g'(x)$ did exist, then the sequential criterion for functional limits would imply that
\[ g'(x) = \lim_{m \to \infty} \frac{g(x_m) - g(x)}{x_m - x} = \lim_{m \to \infty} \frac{g(y_m) - g(x)}{y_m - x}. \]
Then we could use a squeeze theorem argument to conclude that
\[ g'(x) = \lim_{m \to \infty} g'_m(x). \]
The problem is that $\lim_{m \to \infty} g'_m(x)$ does not exist. From our work in (a) we see that $(g'_m(x))$ is not a Cauchy sequence and so it cannot converge. We conclude that $g'(x)$ does not exist.

Exercise 5.4.7. If we set $g(x) = \sum_{n=0}^{\infty} (1/2^n) h(3^n x)$, then we have
\[ |g'_{m+1}(x) - g'_m(x)| = |h'_{m+1}(x)| = (3/2)^{m+1}. \]
Because this does not tend to zero, the sequence $(g'_m(x))$ again fails to be a Cauchy sequence and we can conclude that $g'(x)$ does not exist.

To set up a parallel with Hardy's result, set $a = 1/2$ and $b = 3$ and notice that $ab = 3/2 \ge 1$. What happens when $ab < 1$? Letting $a = 1/3$ and $b = 2$ corresponds to the function $g(x) = \sum_{n=0}^{\infty} (1/3^n) h(2^n x)$. In this case we have
\[ |g'_{m+1}(x) - g'_m(x)| = |h'_{m+1}(x)| = (2/3)^{m+1}, \]
which does tend to zero as $m \to \infty$. Thus, our argument no longer works and, in fact, it turns out that $g(x)$ is differentiable at every nondyadic point in its domain (see Theorem 6.4.3).
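The difference quotients computed in Exercises 5.4.4 and 5.4.5 can be reproduced with exact rational arithmetic. The sketch below assumes the sawtooth $h(x) = |x|$ on $[-1, 1]$ extended with period 2, and $g(x) = \sum_{n \ge 0} (1/2^n) h(2^n x)$; at a dyadic point $p/2^k$ every term with $n > k$ vanishes, so the sum is finite. The function names are illustrative.

```python
from fractions import Fraction

def h(x):
    # Sawtooth: h(x) = |x| on [-1, 1], extended with period 2.
    r = x % 2                      # r lies in [0, 2)
    return r if r <= 1 else 2 - r

def g_dyadic(x):
    """Exact value of g at a dyadic rational x = p / 2**k."""
    x = Fraction(x)
    d = x.denominator
    assert d & (d - 1) == 0, "x must be a dyadic rational"
    k = d.bit_length() - 1         # d == 2**k
    # Terms with n > k vanish because 2**n * x is then an even integer.
    return sum(Fraction(1, 2**n) * h(2**n * x) for n in range(k + 1))

def right_quotient(c, m):
    # Difference quotient of g over [c, c + 1/2**m].
    x_m = Fraction(1, 2**m)
    return (g_dyadic(c + x_m) - g_dyadic(c)) / x_m

for m in range(1, 6):
    print(m, right_quotient(0, m), right_quotient(1, m))
```

Under these assumptions the two printed columns are $m + 1$ and $m - 1$, matching the closed forms derived above; both grow without bound.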

Chapter 6

Sequences and Series of Functions

6.1

Discussion: Branching Processes

6.2

Uniform Convergence of a Sequence of Functions

Exercise 6.2.1. (a) By dividing the numerator and denominator by $n$, we can compute
\[ \lim_{n \to \infty} f_n(x) = \lim_{n \to \infty} \frac{nx}{1 + nx^2} = \lim_{n \to \infty} \frac{x}{1/n + x^2} = \frac{1}{x}. \]
Therefore, the pointwise limit of $f_n(x)$ is $f(x) = 1/x$.

(b) The convergence of $(f_n(x))$ is not uniform on $(0, \infty)$. To see this write
\[ |f_n(x) - f(x)| = \left| \frac{nx}{1 + nx^2} - \frac{1}{x} \right| = \frac{1}{x + nx^3}. \]
In order to make $|f_n(x) - f(x)| < \epsilon$ we must choose
\[ N \ge \frac{1 - \epsilon x}{\epsilon x^3}. \]
For a fixed $\epsilon > 0$, the expression $(1 - \epsilon x)/(\epsilon x^3)$ grows without bound as $x$ tends to zero, and thus there is no way to pick a value of $N$ that will work for every value of $x$ in $(0, \infty)$.

(c) The convergence is not uniform on $(0, 1)$ either. As seen in (b), the problem arises when $x$ tends to zero and this is equally relevant over the domain $(0, 1)$.

(d) The convergence is uniform on the interval $(1, \infty)$. If $x > 1$ then it follows that
\[ |f_n(x) - f(x)| = \frac{1}{x + nx^3} \le \frac{1}{1 + n}. \]

Given $\epsilon > 0$, choose $N$ large enough so that $1/(1 + n) < \epsilon$ whenever $n \ge N$. It follows that $|f_n(x) - f(x)| < \epsilon$ for all $n \ge N$ and thus $(f_n) \to f$ uniformly on $(1, \infty)$.

Exercise 6.2.2. To compute the pointwise limit write
\[ \lim_{n \to \infty} g_n(x) = \lim_{n \to \infty} \left( \frac{x}{2} + \frac{1}{2n} \sin(nx) \right) = \frac{x}{2}. \]
Setting $g(x) = x/2$, we see that
\[ |g_n(x) - g(x)| = \left| \frac{1}{2n} \sin(nx) \right| \le \frac{1}{2n}. \]

Given $\epsilon > 0$, choose $N > 1/(2\epsilon)$ and observe that this is independent of $x$. Then $n \ge N$ implies
\[ |g_n(x) - g(x)| \le \frac{1}{2n} < \epsilon \quad \text{for all } x \in \mathbf{R}. \]
It follows that $g_n \to g$ uniformly on $\mathbf{R}$, and thus on any subset of $\mathbf{R}$ as well.

Exercise 6.2.3. (a) The pointwise limit of $(h_n)$ on $[0, \infty)$ is
\[ h(x) = \lim_{n \to \infty} \frac{x}{1 + x^n} = \begin{cases} x & \text{if } 0 \le x < 1 \\ 1/2 & \text{if } x = 1 \\ 0 & \text{if } x > 1. \end{cases} \]
(b) Theorem 6.2.6 tells us that if the convergence were uniform then $h(x)$ would be continuous. However, $h(x)$ is not continuous at $x = 1$ and so the convergence cannot be uniform on any domain containing this point. In fact, the convergence is not uniform over any domain that has $x = 1$ as a limit point.

(c) Consider the set $[2, \infty)$. If $x \ge 2$ then
\[ |h_n(x) - h(x)| = \left| \frac{x}{1 + x^n} - 0 \right| < \frac{x}{x^n} \le \frac{1}{2^{n-1}}. \]
Given $\epsilon > 0$, pick $N$ so that $n \ge N$ implies $1/2^{n-1} < \epsilon$. Then $|h_n(x) - h(x)| < \epsilon$ for all $n \ge N$, and we conclude that $h_n \to h$ uniformly on $[2, \infty)$.

Exercise 6.2.4. Taking the derivative we find
\[ f_n'(x) = \frac{1 - nx^2}{(nx^2 + 1)^2}, \]
which yields critical points $\pm 1/\sqrt{n}$. Using the standard techniques from calculus we can determine that the maximum of $f_n$ occurs at $1/\sqrt{n}$ and the minimum at $-1/\sqrt{n}$. Because $f_n(1/\sqrt{n}) = |f_n(-1/\sqrt{n})| = 1/(2\sqrt{n})$, we see that
\[ |f_n(x)| \le \frac{1}{2\sqrt{n}} \quad \text{for all } x \in \mathbf{R}. \]
To show $(f_n) \to 0$ uniformly on $\mathbf{R}$, we let $\epsilon > 0$ and choose $N$ large enough so that $n \ge N$ implies $1/(2\sqrt{n}) < \epsilon$. It follows that $|f_n(x) - 0| < \epsilon$ whenever $n \ge N$, as desired.
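The uniform bound in Exercise 6.2.4 can be sanity-checked on a grid. The sketch below assumes $f_n(x) = x/(1 + nx^2)$, which is the function consistent with the derivative and critical points above; the grid range and step count are arbitrary choices.

```python
import math

def f(n, x):
    # Assumed form of f_n, consistent with the derivative in Exercise 6.2.4.
    return x / (1 + n * x * x)

def sup_on_grid(n, lo=-10.0, hi=10.0, steps=20001):
    # Largest |f_n| over an evenly spaced grid on [lo, hi].
    return max(abs(f(n, lo + (hi - lo) * i / (steps - 1))) for i in range(steps))

for n in (1, 4, 100):
    bound = 1 / (2 * math.sqrt(n))
    print(n, sup_on_grid(n), bound, f(n, 1 / math.sqrt(n)))
```

The grid maximum never exceeds $1/(2\sqrt{n})$, and the value at $x = 1/\sqrt{n}$ attains it, so the bound shrinks to zero independently of $x$.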

Exercise 6.2.5. (a) Taking the limit for each fixed value of $x$ we find that $f_n(x)$ converges pointwise to
\[ f(x) = \begin{cases} 1 & \text{if } x \ne 0 \\ 0 & \text{if } x = 0. \end{cases} \]
Each of the functions $f_n$ is continuous, but the limit function $f$ is not. Therefore, Theorem 6.2.6 tells us that the convergence cannot be uniform.

(b) We can imitate the construction in (a) but use an unbounded function like $1/x^2$ in place of the constant function 1. Specifically, let
\[ f_n(x) = \begin{cases} 1/x^2 & \text{if } |x| \ge 1/n \\ n^3 |x| & \text{if } |x| < 1/n. \end{cases} \]
Then each $f_n$ is continuous and the pointwise limit is $f(x) = \lim f_n(x) = 1/x^2$, except at zero where we get a limit of $f(0) = 0$.

Exercise 6.2.6. (⇒) This is the easier of the two directions. Let $\epsilon > 0$ be arbitrary. Given that $(f_n)$ converges uniformly on $A$, our job is to produce an $N$ such that $|f_n(x) - f_m(x)| < \epsilon$ for all $m, n \ge N$ and $x \in A$. Because we are given that $(f_n)$ converges uniformly, we may let $f(x) = \lim_{n \to \infty} f_n(x)$. By the definition of uniform convergence, there exists an $N$ with the property that
\[ |f_n(x) - f(x)| < \frac{\epsilon}{2} \quad \text{for all } n \ge N \text{ and } x \in A. \]
Now given $m, n \ge N$, it follows that
\[ |f_n(x) - f_m(x)| = |f_n(x) - f(x) + f(x) - f_m(x)| \le |f_n(x) - f(x)| + |f(x) - f_m(x)| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon \]
for all $x \in A$. This completes the proof in the forward direction.

(⇐) In this direction we assume that, given $\epsilon > 0$, there exists an $N$ such that $|f_n(x) - f_m(x)| < \epsilon$ for all $m, n \ge N$ and $x \in A$. Our goal is to prove that $(f_n)$ converges uniformly. To produce a candidate for the limit, notice that for each $x \in A$ our hypothesis tells us that the sequence $(f_n(x))$ is a Cauchy sequence. Because Cauchy sequences converge, it makes sense to define the limit function
\[ f(x) = \lim_{n \to \infty} f_n(x). \]
It is important to realize that, because we are applying the Cauchy Criterion to sequences generated at each point $x \in A$, all we have proved thus far is that $f_n(x) \to f(x)$ pointwise on $A$. Let $\epsilon > 0$. Using our hypothesis again (in its full strength this time), we know that there exists an $N$ such that
\[ -\epsilon < f_n(x) - f_m(x) < \epsilon \quad \text{for all } m, n \ge N \text{ and } x \in A. \]

The Algebraic Limit Theorem says that
\[ \lim_{m \to \infty} (f_n(x) - f_m(x)) = f_n(x) - f(x), \]
and the Order Limit Theorem then implies
\[ -\epsilon \le f_n(x) - f(x) \le \epsilon \quad \text{for all } n \ge N \text{ and } x \in A. \]
This is sufficient to conclude that $f_n \to f$ uniformly on $A$.

Exercise 6.2.7. This argument really amounts to adapting the proof of Theorem 6.2.6 to this stronger set of assumptions. Let $\epsilon > 0$ be arbitrary. We need to show that there exists a $\delta > 0$ such that $|x - y| < \delta$ implies $|f(x) - f(y)| < \epsilon$ for all $x, y \in A$. First choose $N$ so that
\[ |f_N(x) - f(x)| < \frac{\epsilon}{3} \quad \text{for all } x \in A. \]
Because $f_N$ is uniformly continuous on $A$, there exists a $\delta > 0$ such that
\[ |f_N(x) - f_N(y)| < \epsilon/3 \quad \text{whenever } |x - y| < \delta. \]
But this implies
\[ |f(x) - f(y)| = |f(x) - f_N(x) + f_N(x) - f_N(y) + f_N(y) - f(y)| \le |f(x) - f_N(x)| + |f_N(x) - f_N(y)| + |f_N(y) - f(y)| < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon. \]

We conclude that $f$ is uniformly continuous on $A$.

Exercise 6.2.8. (a) False. Consider Example 6.2.2 (ii).

(b) True. Let $\epsilon > 0$ be arbitrary and assume $|g(x)| \le M$. We need to show that there exists an $N$ such that $n \ge N$ implies $|f_n g - f g| < \epsilon$. Because $f_n \to f$ uniformly, we know there exists an $N$ such that $|f_n - f| < \epsilon/M$ for all $n \ge N$. It follows that
\[ |f_n g - f g| = |g| |f_n - f| \le M |f_n - f| < M(\epsilon/M) = \epsilon \]
for all $n \ge N$, as desired.

(c) True. Pick $N$ such that
\[ |f_N(x) - f(x)| \le 1 \quad \text{for all } x \in A. \]
If $|f_N(x)| \le M$ for all $x \in A$, then it follows that $|f(x)| \le M + 1$ on $A$, and hence $f$ is bounded.

(d) True. Let $\epsilon > 0$ be arbitrary. Because $f_n \to f$ uniformly on $A$ we can pick $N_1$ such that $n \ge N_1$ implies $|f_n - f| < \epsilon$ for all $x \in A$. Similarly, because $f_n \to f$ uniformly on $B$ we can pick $N_2$ so that $n \ge N_2$ implies $|f_n - f| < \epsilon$ for all $x \in B$. Now let $N = \max\{N_1, N_2\}$. Then $n \ge N$ implies $|f_n - f| < \epsilon$ for all $x \in A \cup B$.

(e) True. Let $x < y$ be arbitrary points in the domain. We are given that $f_n(x) \le f_n(y)$ for all $n$. Because $f_n(x) \to f(x)$ and $f_n(y) \to f(y)$, we may use the Order Limit Theorem to conclude $f(x) \le f(y)$. This proves $f$ is increasing.

(f) True. The proof in (e) does not require uniform convergence.

Exercise 6.2.9. Let $\epsilon > 0$ be arbitrary. We need to show that there exists an $N \in \mathbf{N}$ such that when $n \ge N$ it follows that $|f_n/g - f/g| < \epsilon$. First write
\[ \left| \frac{f_n}{g} - \frac{f}{g} \right| = \left| \frac{1}{g} \right| |f_n - f|. \]
Because $g$ is continuous and never zero, $1/g$ is also continuous on $K$. The fact that $K$ is compact implies $1/g$ is bounded, so let $M > 0$ satisfy $|1/g| \le M$. Because $(f_n) \to f$ uniformly, we can pick $N$ such that
\[ |f_n - f| < \frac{\epsilon}{M} \quad \text{whenever } n \ge N. \]
It follows that
\[ \left| \frac{f_n}{g} - \frac{f}{g} \right| < M \frac{\epsilon}{M} = \epsilon \]
for all $n \ge N$, as desired.

Exercise 6.2.10. Let $\epsilon > 0$ be arbitrary. We need to show that there exists an $N$ such that $n \ge N$ implies $|f_n(x) - f(x)| < \epsilon$ for all $x \in \mathbf{R}$. Because $f$ is uniformly continuous on all of $\mathbf{R}$, we can pick $\delta$ so that
\[ |f(x) - f(y)| < \epsilon \quad \text{whenever } |x - y| < \delta. \]

Now choose $N > 1/\delta$. If $n \ge N$ then $|(x + 1/n) - x| < \delta$ and it follows that
\[ |f_n(x) - f(x)| = |f(x + 1/n) - f(x)| < \epsilon, \]
as desired.

This proposition fails if $f$ is not uniformly continuous. Consider $f(x) = x^2$, which is continuous but not uniformly continuous on all of $\mathbf{R}$. In this case we see
\[ |f_n(x) - f(x)| = |(x + 1/n)^2 - x^2| = |2x/n + 1/n^2|. \]
Although for each $x \in \mathbf{R}$ this expression tends to zero as $n \to \infty$, we see that larger values of $x$ require larger values of $n$ and the convergence is not uniform.

Exercise 6.2.11. (a) Without the limit functions mentioned, it is a little smoother to argue in terms of the Cauchy Criterion. Let $\epsilon > 0$. Choose $N_1$ so that
\[ |f_n - f_m| < \epsilon/2 \quad \text{for all } n, m \ge N_1, \]
and choose $N_2$ so that
\[ |g_n - g_m| < \epsilon/2 \quad \text{for all } m, n \ge N_2. \]
Letting $N = \max\{N_1, N_2\}$ we see that
\[ |(f_n + g_n) - (f_m + g_m)| \le |f_n - f_m| + |g_n - g_m| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon \]
for all $m, n \ge N$. It follows from the Cauchy Criterion for Uniform Convergence that $(f_n + g_n)$ converges uniformly.

(b) Looking ahead to (c), we see that problems can arise when at least one of the limit functions is unbounded. For example, let $f_n(x) = x + 1/n$ and $g_n(x) = 1/n$. Then $f_n(x) \to x$ uniformly on $\mathbf{R}$ and $g_n(x) \to 0$ uniformly on $\mathbf{R}$. However $f_n(x) g_n(x) = x/n + 1/n^2$. Although $f_n g_n \to 0$ pointwise on $\mathbf{R}$, the convergence is not uniform.

(c) Here $M$ denotes the bound from the hypothesis, so that $|f_n| \le M$ and $|g_n| \le M$ for all $n$. The first step is to write
\[ |f_n g_n - f_m g_m| = |f_n g_n - f_n g_m + f_n g_m - f_m g_m| \le |f_n| |g_n - g_m| + |g_m| |f_n - f_m|. \]
Given $\epsilon > 0$, choose $N_1$ so that
\[ |f_n - f_m| < \frac{\epsilon}{2M} \quad \text{for all } n, m \ge N_1. \]
Also, choose $N_2$ so that
\[ |g_n - g_m| < \frac{\epsilon}{2M} \quad \text{for all } m, n \ge N_2. \]
Letting $N = \max\{N_1, N_2\}$ we see that
\[ |f_n g_n - f_m g_m| \le |f_n| |g_n - g_m| + |g_m| |f_n - f_m| < M \frac{\epsilon}{2M} + M \frac{\epsilon}{2M} = \epsilon \]

whenever $m, n \ge N$, as desired.

Exercise 6.2.12. (a) Setting $g_n = f - f_n$ we see that (i) $g_n$ is continuous for each $n \in \mathbf{N}$, (ii) $(g_n(x))$ is decreasing for all $x \in K$, and (iii) $g_n(x) \to 0$ for all $x \in K$.

(b) We first prove that each $K_n$ is closed. To see this, assume $(x_m)$ is a convergent sequence in $K_n$. If $x = \lim x_m$, then $x \in K$ (because $K$ is closed) and the fact that $g_n$ is continuous on all of $K$ allows us to write $g_n(x) = \lim_{m \to \infty} g_n(x_m)$. Because $g_n(x_m) \ge \epsilon$ for all $m$, it follows that the limit $g_n(x)$ also satisfies $g_n(x) \ge \epsilon$. But this implies $x \in K_n$ and we see that $K_n$ contains its limit points and thus is closed. Each $K_n$ is also bounded because it is a subset of the bounded set $K$, and it follows that $K_n$ is compact.

The nested property $K_n \supseteq K_{n+1}$ is a direct consequence of our assumption that $g_n(x) \ge g_{n+1}(x)$ for all $x \in K$, and so we are prepared to use Theorem 3.3.5. Assume, for contradiction, that $K_n$ is nonempty for every $n \in \mathbf{N}$. Then Theorem 3.3.5 implies there exists a point $x$ satisfying $x \in K_n$ for every $n$. But this means $g_n(x) \ge \epsilon$ for every $n$, contradicting our assumption that $g_n(x) \to 0$. We conclude that there must exist an $N$ for which $K_n = \emptyset$ for all $n \ge N$, and this is equivalent to asserting that
\[ |g_n(x)| < \epsilon \quad \text{for all } n \ge N \text{ and } x \in K. \]
We conclude that $g_n \to 0$ uniformly, and thus $f_n \to f$ uniformly as well.

Exercise 6.2.13. (a) A sketch of $f_1$ is given in Figure 6.1.

[Figure 6.1: Sketch of $f_1$ for the Cantor function. Vertical axis marked at 1/2 and 1; horizontal axis marked at 1/3, 2/3, and 1.]

(b) Looking at $f_1$ for the moment, notice that for every $n \in \mathbf{N}$ we have $|f_1(x) - f_n(x)| = 0$ if $x \in [1/3, 2/3]$. Off this middle set, the fact that every $f_n$ is increasing means we still have the estimate
\[ |f_1(x) - f_n(x)| \le \frac{1}{2}. \]
In general, given $m < n$ we see
\[ |f_m(x) - f_n(x)| \le \frac{1}{2^m}. \]
By the Cauchy Criterion for Uniform Convergence (Theorem 6.2.5), we conclude that $(f_n)$ converges uniformly.

(c) The limit function $f$ is continuous by Theorem 6.2.6. Exercise 6.2.8 (e) gives an argument that $f$ is increasing, and $f(0) = 0$ and $f(1) = 1$ follow

quickly from the fact that 0 and 1 are fixed by every $f_n$. Finally, if $x$ is a point in $[0, 1] \setminus C$, then $x$ must fall in the complement of some $C_m$. Notice that $f(x) = f_m(x)$ and the recursive way that each $f_n$ is constructed means that in fact $f(y) = f_n(y)$ for all $n \ge m$ and $y \in [0, 1] \setminus C_m$. It follows that $f$ is constant on each interval making up $[0, 1] \setminus C_m$.

Exercise 6.2.14. (a) The sequence of real numbers $f_n(x_1)$ is bounded by $M$. The Bolzano-Weierstrass Theorem implies that there is a convergent subsequence.

(b) Focusing on the sequence $f_{1,k}(x_2)$, we again use the Bolzano-Weierstrass Theorem to conclude that there is a convergent subsequence which we write as $f_{2,k}(x_2)$.

(c) Keep in mind that if $m' > m$ then $(f_{m',k})$ is a subsequence of $(f_{m,k})$. The key idea is to let
\[ f_{n_k} = f_{k,k} = (f_{1,1}, f_{2,2}, f_{3,3}, \ldots). \]
The nested quality shows that $(f_{k,k})$ is a subsequence of $(f_{1,k})$ and thus $f_{k,k}(x_1)$ converges. But what about $f_{k,k}(x_m)$ for an arbitrary $x_m \in A$? Well, after the first $m$ terms, we see that $(f_{k,k})$ becomes a proper subsequence of $(f_{m,k})$ (i.e., $(f_{k,k})$ is eventually in $(f_{m,k})$), and it follows that $f_{k,k}(x_m)$ converges. This shows $(f_{k,k})$ converges pointwise on $A$.

Exercise 6.2.15. (a) If each $f_n$ is uniformly continuous, the choice of $\delta$ can be made independently of $x$ but $\delta$ will certainly depend on the function $f_n$. It is possible for different functions $f_n$ to require smaller $\delta$ responses, and it may be that there is no single $\delta$ that will work simultaneously for all functions in the collection $(f_n)$ as the definition of equicontinuity requires.

(b) For each $n$, the function $g_n$ is continuous on the compact set $[0, 1]$ and thus it is uniformly continuous. However, the sequence $(g_n)$ is not equicontinuous over the set $[0, 1]$. The trouble occurs near 1. To make the discussion concrete, let's take $\epsilon = 1/2$ and set $y = 1$. The definition of equicontinuity requires us to produce a $\delta > 0$ with the property that
\[ |x^n - 1| < \frac{1}{2} \quad \text{for all } n \in \mathbf{N} \text{ and } |x - 1| < \delta. \]
But notice that $\delta$ cannot be chosen independently of $n$ because no matter how close to 1 we take our value of $x$ (with $x < 1$), it will always be possible to find a large value of $n$ that makes $|x^n - 1| \ge 1/2$.

Exercise 6.2.16. (a) Because the set of rational numbers in $[0, 1]$ is countable, Exercise 6.2.14 gives us exactly what we need to produce the sequence $(g_s)$.

(b) Consider a fixed $r_i$ from our finite set $\{r_1, r_2, \ldots, r_m\}$. Because $(g_s)$ converges pointwise at every rational, the sequence $(g_s(r_i))$ is a Cauchy sequence. Thus we can choose $N_i$ such that
\[ |g_s(r_i) - g_t(r_i)| < \frac{\epsilon}{3} \quad \text{for all } s, t \ge N_i. \]

Letting $N = \max\{N_1, N_2, \ldots, N_m\}$ produces the desired $N$. Note that if the set $\{r_1, r_2, \ldots, r_m\}$ were infinite then $N$ would be the maximum of an infinite set, which is problematic to say the least.

(c) Given $x \in [0, 1]$, we know there exists a rational $r_i$ from our designated set satisfying $|r_i - x| < \delta$. It follows that
\[ |g_s(x) - g_s(r_i)| < \frac{\epsilon}{3} \quad \text{for all } s \in \mathbf{N}. \]
Using this fact (twice) and the result in (b) we see that $s, t \ge N$ implies
\[ |g_s(x) - g_t(x)| = |g_s(x) - g_s(r_i) + g_s(r_i) - g_t(r_i) + g_t(r_i) - g_t(x)| \le |g_s(x) - g_s(r_i)| + |g_s(r_i) - g_t(r_i)| + |g_t(r_i) - g_t(x)| < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon. \]
It follows that $(g_s)$ converges uniformly using the Cauchy criterion in Theorem 6.2.5.
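The failure of equicontinuity described in Exercise 6.2.15 (b) can be made concrete. Assuming $g_n(x) = x^n$ on $[0, 1]$ as in that solution, the sketch below takes any proposed $\delta$ and produces a point $x$ within $\delta$ of 1 together with an index $n$ for which $|x^n - 1| \ge 1/2$. The name `witness` and the particular choice $x = 1 - \delta/2$ are illustrative.

```python
import math

def witness(delta):
    """Return (x, n) with |x - 1| < delta but |x**n - 1| >= 1/2.

    Assumes 0 < delta < 2 so that x = 1 - delta/2 lies in (0, 1).
    """
    x = 1 - delta / 2
    # Smallest n with x**n <= 1/2, plus one extra step as a safety margin
    # against floating-point rounding.
    n = math.ceil(math.log(0.5) / math.log(x)) + 1
    return x, n

for delta in (0.5, 0.1, 1e-3):
    x, n = witness(delta)
    print(delta, x, n, abs(x**n - 1))
```

The required $n$ grows as $\delta$ shrinks, which is exactly why no single $\delta$ serves every $g_n$ at once.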

6.3

Uniform Convergence and Differentiation

Exercise 6.3.1. (a) Write
\[ |h_n(x) - 0| = \left| \frac{\sin(nx)}{n} \right| \le \frac{1}{n}. \]
Given $\epsilon > 0$, choose $N > 1/\epsilon$, which is independent of $x$. Then $n \ge N$ implies $|h_n - 0| < \epsilon$ and we conclude $h_n \to 0$ uniformly. By contrast, the sequence of derivatives $h_n'(x) = \cos(nx)$ diverges for all values of $x$ except $x = 2k\pi$.

(b) The sequence $f_n(x) = \sin(nx)/\sqrt{n}$ has this property. The lesson here is that uniform convergence of a sequence of functions does not, by itself, imply anything particularly useful about the behavior of the sequence of derivatives.

Exercise 6.3.2. (a) First we deduce that $g = \lim g_n = 0$, and the convergence is uniform on $[0, 1]$. To prove this, we must find an $N$ such that $n \ge N$ implies $|x^n/n - 0| < \epsilon$. But notice that
\[ \left| \frac{x^n}{n} - 0 \right| \le \frac{1}{n} \quad \text{for all } x \in [0, 1]. \]
Given $\epsilon > 0$, pick $N > 1/\epsilon$. Then $n \ge N$ implies $|x^n/n| < \epsilon$ for all $x \in [0, 1]$, as desired. Because $g(x) = 0$ for all $x \in [0, 1]$ it is differentiable and, furthermore, $g'(x) = 0$.

(b) Writing
\[ g_n'(x) = \frac{n x^{n-1}}{n} = x^{n-1}, \]
we see that the sequence $(g_n')$ converges pointwise on $[0, 1]$ to
\[ h(x) = \lim_{n \to \infty} g_n'(x) = \begin{cases} 0 & \text{if } 0 \le x < 1 \\ 1 & \text{if } x = 1. \end{cases} \]
The convergence is not uniform over $[0, 1]$, and in fact it is not uniform over any set that contains 1 as a limit point. Comparing $h = \lim g_n'$ to $g'$ is illuminating. Note in particular that $h(1) \ne g'(1)$, so that it is possible for the sequence of derivatives to converge to the "wrong" value when the convergence of $(g_n')$ is not uniform. On the other hand, the convergence of $(g_n')$ is uniform on sets of the form $[0, c]$ where $c < 1$, and this is reflected by the fact that $h(x) = g'(x)$ on $[0, 1)$.

Exercise 6.3.3. We have seen that $f_n \to 0$ uniformly, meaning that the limit $f = \lim f_n$ satisfies $f'(x) = 0$ for all values of $x$. Taking the derivative we get
\[ f_n'(x) = \frac{1 - nx^2}{1 + 2nx^2 + n^2 x^4}. \]
If $x \ne 0$ then we can show $\lim f_n'(x) = 0 = f'(x)$. However, for $x = 0$ we get $f_n'(0) = 1$ for all $n$ and thus $f'(0) \ne \lim f_n'(0)$.

Exercise 6.3.4. (a) We have
\[ g(x) = \lim_{n \to \infty} g_n(x) = \lim_{n \to \infty} \left( \frac{x}{2} + \frac{x^2}{2n} \right) = \frac{x}{2}, \]
so $g'(x) = 1/2$.

(b) This time we compute $g_n'(x)$ first to get
\[ g_n'(x) = \frac{1}{2} + \frac{x}{n}, \]
and note that the pointwise limit of this sequence is $1/2$. For $x \in [-M, M]$ we can write
\[ |g_n'(x) - 1/2| = \left| \frac{x}{n} \right| \le \frac{M}{n}. \]
Given $\epsilon > 0$, choose $N > M/\epsilon$, independent of $x$. Then $n \ge N$ implies $|g_n'(x) - 1/2| < \epsilon$, and we conclude that $g_n' \to 1/2$ uniformly on $[-M, M]$. It follows from Theorem 6.3.3 that $g'(x) = 1/2$.

(c) Taking the pointwise limit of $f_n(x)$ gives
\[ f(x) = \lim_{n \to \infty} f_n(x) = \lim_{n \to \infty} \frac{x^2 + 1/n}{2 + x/n} = \frac{x^2}{2}. \]

Thus, $f'(x) = x$. Computing the derivative sequence first (by the quotient rule applied to $f_n(x) = (nx^2 + 1)/(2n + x)$) we get
\[ f_n'(x) = \frac{4n^2 x + nx^2 - 1}{4n^2 + 4nx + x^2}, \]
so that
\[ \lim_{n \to \infty} f_n'(x) = \lim_{n \to \infty} \frac{4x + x^2/n - 1/n^2}{4 + 4x/n + x^2/n^2} = x. \]
Arguing for uniform convergence on intervals of the form $[-M, M]$ is less elegant for this example but no harder really. For values of $x$ satisfying $|x| \le M$ we have
\[ |f_n'(x) - x| = \left| \frac{-3nx^2 - x^3 - 1}{4n^2 + 4nx + x^2} \right| \le \frac{3nM^2 + M^3 + 1}{4n^2 - 4nM}, \]
as long as $n > M$. Because this estimate does not depend on $x$ and tends to zero as $n \to \infty$, it follows that $f_n'(x) \to x$ uniformly on $[-M, M]$.

Exercise 6.3.5. Let $x \in [a, b]$ and assume, without loss of generality, that $x > x_0$. Applying the Mean Value Theorem to the function $f_n - f_m$ on the interval $[x_0, x]$, we get that there exists a point $\alpha \in (x_0, x)$ such that
\[ (f_n(x) - f_m(x)) - (f_n(x_0) - f_m(x_0)) = (f_n'(\alpha) - f_m'(\alpha))(x - x_0). \]
Let $\epsilon > 0$. Because $(f_n')$ converges uniformly, the Cauchy Criterion asserts that there exists an $N_1$ such that
\[ |f_n'(c) - f_m'(c)| < \frac{\epsilon}{2(b - a)} \quad \text{for all } n, m \ge N_1 \text{ and } c \in [a, b]. \]
Our hypothesis states that $(f_n(x_0))$ converges, so there exists an $N_2$ such that
\[ |f_n(x_0) - f_m(x_0)| < \frac{\epsilon}{2} \quad \text{for all } n, m \ge N_2. \]
Finally, let $N = \max\{N_1, N_2\}$. Then if $n, m \ge N$ it follows that
\[ |f_n(x) - f_m(x)| \le |(f_n(x) - f_m(x)) - (f_n(x_0) - f_m(x_0))| + |f_n(x_0) - f_m(x_0)| \]
\[ = |(f_n'(\alpha) - f_m'(\alpha))(x - x_0)| + |f_n(x_0) - f_m(x_0)| < \frac{\epsilon}{2(b - a)} (b - a) + \frac{\epsilon}{2} = \epsilon, \]
where we have used $|x - x_0| \le b - a$. Because our choice of $N$ is independent of the point $x$, the Cauchy Criterion implies that the sequence $(f_n)$ converges uniformly on $[a, b]$.
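The mismatch at the origin noted in Exercise 6.3.3 is easy to see numerically. The sketch below again assumes $f_n(x) = x/(1 + nx^2)$, whose derivative is the expression given in that solution; the sample values of $n$ and $x$ are arbitrary.

```python
def f(n, x):
    # Assumed form of f_n from Exercises 6.2.4 and 6.3.3.
    return x / (1 + n * x * x)

def f_prime(n, x):
    # Equals (1 - n x^2) / (1 + 2 n x^2 + n^2 x^4).
    return (1 - n * x * x) / (1 + n * x * x) ** 2

for n in (1, 100, 10**6):
    # f_n(0.5) and f_n'(0.5) shrink toward 0, but f_n'(0) stays at 1.
    print(n, f(n, 0.5), f_prime(n, 0.5), f_prime(n, 0.0))
```

So $(f_n)$ converges uniformly to a function with derivative 0 everywhere, while $f_n'(0) = 1$ for every $n$: the derivatives converge, but not uniformly near zero, and Theorem 6.3.3 does not apply there.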

6.4

Series of Functions

Exercise 6.4.1. Let $\epsilon > 0$. By Theorem 6.4.4, there exists an $N$ such that
\[ |g_{m+1}(x) + \cdots + g_n(x)| < \epsilon \quad \text{whenever } n > m \ge N. \]
Because this holds for all $m \ge N$, we can set $m = n - 1$ to get that
\[ |g_n(x)| < \epsilon \quad \text{whenever } n > N. \]

This proves $g_n \to 0$ uniformly.

Exercise 6.4.2. The key idea is to use the Cauchy criterion for convergence of a series of real numbers given in Theorem 2.7.2. Let $\epsilon > 0$ be arbitrary. Because $\sum_{n=1}^{\infty} M_n$ converges, there exists an $N$ such that $n > m \ge N$ implies $|M_{m+1} + M_{m+2} + \cdots + M_n| < \epsilon$. Because
\[ |f_{m+1}(x) + f_{m+2}(x) + \cdots + f_n(x)| \le M_{m+1} + M_{m+2} + \cdots + M_n, \]
we can appeal to the Cauchy criterion for uniform convergence of series (Theorem 6.4.4) to conclude that $\sum_{n=1}^{\infty} f_n$ converges uniformly.

Exercise 6.4.3. (a) Because
\[ \left| \frac{\cos(2^n x)}{2^n} \right| \le \frac{1}{2^n}, \]
and we know $\sum_{n=1}^{\infty} 1/2^n$ converges, it follows by the Weierstrass M-Test that $\sum_{n=1}^{\infty} \cos(2^n x)/2^n$ converges uniformly. Because each of the summands is continuous, the limit is as well according to Theorem 6.4.2.

(b) Again, we have an infinite sum of continuous functions and we would like to conclude that the limit is continuous. This will follow if we can argue the convergence is uniform. On $[-1, 1]$ we can make the estimate $|x^n/n^2| \le 1/n^2$. Because $\sum_{n=1}^{\infty} 1/n^2$ converges, we may invoke the Weierstrass M-Test to assert that $\sum_{n=1}^{\infty} x^n/n^2$ converges uniformly on $[-1, 1]$. This completes the proof.

Exercise 6.4.4. The "sawtooth" function $h(x)$ satisfies $|h(x)| \le 1$ for all $x \in \mathbf{R}$. Thus
\[ \left| \frac{1}{2^n} h(2^n x) \right| \le \frac{1}{2^n}. \]
Because $\sum_{n=0}^{\infty} 1/2^n$ converges, the Weierstrass M-Test implies that our series for $g(x)$ converges uniformly. The fact that $h(x)$ is continuous allows us to invoke Theorem 6.4.2 to conclude $g$ is continuous.

Exercise 6.4.5. (a) The series for $f$ certainly converges uniformly, but Theorem 6.4.3 requires us to look at the differentiated series
\[ \sum_{k=1}^{\infty} \frac{\cos(kx)}{k^2}. \qquad (1) \]
For this series we can make the estimate
\[ \left| \frac{\cos(kx)}{k^2} \right| \le \frac{1}{k^2}. \]

Because $\sum_{k=1}^{\infty} 1/k^2$ converges, the M-Test asserts that the series above in (1) converges uniformly. Now Theorem 6.4.3 asserts that $f$ is differentiable and
\[ f'(x) = \sum_{k=1}^{\infty} \frac{\cos(kx)}{k^2}. \]
Finally, we note that the uniform convergence also implies (via Theorem 6.4.2) that $f'(x)$ is continuous because each of the summands is.

(b) To use Theorem 6.4.3 to determine whether $f'(x)$ is differentiable requires that we differentiate the series for $f'$ term-by-term and consider
\[ \sum_{k=1}^{\infty} \frac{-\sin(kx)}{k}. \]
Unfortunately, the Weierstrass M-Test cannot be used here because $\sum_{k=1}^{\infty} 1/k$ diverges. The differentiability of $f'$ turns out to be a very deep question that has been studied in depth by Riemann and Hardy, among others.

Exercise 6.4.6. First fix $x_0 \in [0, 1)$. Now choose $c$ to satisfy $x_0 < c < 1$ and apply the M-Test on $[0, c]$. Over this interval we get the estimate $|x^n/n| \le c^n/n$. Because $\sum_{n=1}^{\infty} c^n/n$ converges, the M-Test implies the convergence is uniform on $[0, c]$ and thus $f$ is continuous at $x_0 \in [0, c]$.

Exercise 6.4.7. (a) First observe that the summands are continuous functions and satisfy
\[ \left| \frac{1}{x^2 + n^2} \right| \le \frac{1}{n^2} \quad \text{for all } x \in \mathbf{R}. \]
Because $\sum_{n=1}^{\infty} 1/n^2$ converges, the M-Test implies the convergence is uniform and hence $h(x)$ is continuous on $\mathbf{R}$.

(b) To determine if $h$ is differentiable we consider the differentiated series
\[ \sum_{n=1}^{\infty} \frac{-2x}{(x^2 + n^2)^2}. \]
Restricting our attention to an interval $[-M, M]$, we get the estimate
\[ \left| \frac{-2x}{(x^2 + n^2)^2} \right| \le \frac{2M}{n^4} \le \frac{2M}{n^2}, \]
and, as before, we note $\sum_{n=1}^{\infty} 2M/n^2$ converges. This proves that the differentiated series converges uniformly to $h'(x)$ and that $h'$ is continuous on $[-M, M]$. Because the interval $[-M, M]$ is arbitrary in this argument, we conclude that $h'$ exists and is continuous on all of $\mathbf{R}$.

Exercise 6.4.8. Using the M-Test and the fact that $|u_n(x)| \le 1/2^n$, we can show that $\sum_{n=1}^{\infty} u_n(x)$ converges uniformly to $h(x)$. An argument like the one in Exercise 6.2.8 (e) shows that $h$ is increasing. Finally, notice that Theorem 6.2.6 is stated and proved in terms of an individual point $c$ in the domain. In our case, if we take $c$ to be irrational, then we see that every $u_n$ is continuous at $c$ and thus $h$ is as well.
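The M-Test estimate in Exercise 6.4.3 (a) gives an explicit tail bound: since $|\cos(2^n x)/2^n| \le 1/2^n$, the partial sum $S_N$ differs from any later partial sum by at most $\sum_{n > N} 2^{-n} = 2^{-N}$, uniformly in $x$. The sketch below checks this on a grid, using the 40th partial sum as a stand-in for the limit; the grid, the reference index, and the function names are arbitrary choices.

```python
import math

def partial_sum(x, N):
    # S_N(x) = sum_{n=1}^{N} cos(2^n x) / 2^n
    return sum(math.cos(2**n * x) / 2**n for n in range(1, N + 1))

def sup_gap(N, reference=40, points=2001):
    # Largest |S_reference - S_N| over a grid on [-pi, pi].
    xs = [-math.pi + 2 * math.pi * i / (points - 1) for i in range(points)]
    return max(abs(partial_sum(x, reference) - partial_sum(x, N)) for x in xs)

for N in (1, 5, 10):
    print(N, sup_gap(N), 2.0 ** (-N))
```

The observed gap stays below $2^{-N}$ at every grid point, and the bound does not depend on $x$, which is what uniform convergence requires.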

6.5

Power Series

Exercise 6.5.1. (a) The series for $g$ converges for all $x \in (-1, 1]$ and is continuous over this interval. The series does not converge when $x = -1$ and Theorem 6.5.1 implies that it then does not converge for any values of $x$ satisfying $|x| > 1$.

(b) Theorem 6.5.6 implies $g$ is differentiable on $(-1, 1)$ with
\[ g'(x) = 1 - x + x^2 - x^3 + \cdots. \]
Notice that our formula for $g'(x)$ no longer converges when $x = 1$ (although the function $g$ is technically differentiable at this point).

Exercise 6.5.2.
(a) $\sum_{n=1}^{\infty} x^n/n^2$
(b) $\sum_{n=1}^{\infty} x^n/n$
(c) $\sum_{n=1}^{\infty} (-1)^n x^{2n}/n$
(d) No. If the series converges absolutely at $x = 1$, then $\sum |a_n|$ converges, and it follows that the series converges absolutely at $x = -1$ as well.

Exercise 6.5.3. The set of convergent points for a power series must be $\mathbf{R}$, $\{0\}$, or an interval. In the case of an interval, we have seen that the convergence is absolute in the interior of this interval. Thus, the two endpoints are the only candidates for conditional convergence.

Exercise 6.5.4. (a) Let $x \in (-R, R)$ be arbitrary. We want to prove that the power series is continuous at $x$. To do this, choose $c$ satisfying $|x| < c < R$, and consider the compact set $[-c, c]$ contained in $(-R, R)$. Absolute convergence of the series at $x = c$ implies that we get uniform convergence over the interval $[-c, c]$. Because the summands in a power series are continuous, we may conclude that the series represents a continuous function on $[-c, c]$, and hence is continuous at $x$.

(b) The content of Abel's Theorem is that convergence at an endpoint $x = R$ implies uniform convergence over the interval $[0, R]$. Once we have established uniform convergence, continuity follows (once again) from Theorem 6.4.2 and the observation that the summands are all continuous.

Exercise 6.5.5. Set $M_n = |a_n x_0^n|$ and note that absolute convergence at $x_0$ implies
\[ \sum_{n=0}^{\infty} |a_n x_0^n| = \sum_{n=0}^{\infty} M_n \]
converges. If $x \in [-c, c]$, where $c = |x_0|$, then we get the estimate
\[ |a_n x^n| \le |a_n x_0^n| = M_n, \]
and the Weierstrass M-Test implies that $\sum_{n=0}^{\infty} a_n x^n$ converges uniformly on $[-c, c]$.

Exercise 6.5.6. Assume $\sum a_n x^n$ converges pointwise on the compact set $K$. Because $K$ is compact, there exist points $x_0, x_1 \in K$ satisfying
\[ x_0 \le x \le x_1 \quad \text{for all } x \in K. \]
At this point we need to consider a few different cases. If $x_0 > 0$ then $K \subseteq [0, x_1]$ and Theorem 6.5.1 implies we get pointwise convergence on $[0, x_1]$. Then Abel's Theorem implies that the convergence is uniform on $[0, x_1]$ and hence over the set $K \subseteq [0, x_1]$. If $x_1 < 0$ then $K \subseteq [x_0, 0]$ and we can make a similar argument. If $x_0 \le 0 \le x_1$ then Theorem 6.5.1 and Abel's Theorem imply that the series converges uniformly over each of the intervals $[x_0, 0]$ and $[0, x_1]$. It is a straightforward exercise to show that this implies uniform convergence over $[x_0, x_1]$ and hence over the set $K \subseteq [x_0, x_1]$.

Exercise 6.5.7. (a) Applying the Ratio Test to the sequence $a_n = n s^{n-1}$ we find
\[ \lim_{n\to\infty} \left| \frac{a_{n+1}}{a_n} \right| = \lim_{n\to\infty} \left| \frac{n s^n + s^n}{n s^{n-1}} \right| = \lim_{n\to\infty} \left| s + \frac{1}{n} s \right| = s. \]
Because $0 < s < 1$, the series $\sum a_n$ converges by the Ratio Test. Therefore, the sequence $(n s^{n-1})$ converges to zero and thus is bounded.
(b) Let $x \in (-R, R)$ be arbitrary and pick $t$ to satisfy $|x| < t < R$. We will show that $\sum |n a_n x^{n-1}|$ converges, implying $\sum n a_n x^{n-1}$ converges. First write
\[ \sum_{n=1}^{\infty} |n a_n x^{n-1}| = \sum_{n=1}^{\infty} \frac{1}{t} \left( n \left| \frac{x}{t} \right|^{n-1} \right) |a_n t^n|. \]
Because $|x/t| < 1$, by part (a) we can pick a bound $L$ satisfying
\[ n \left| \frac{x}{t} \right|^{n-1} \le L \quad \text{for all } n \in \mathbf{N}. \]
Now we have
\[ \sum_{n=1}^{\infty} |n a_n x^{n-1}| = \sum_{n=1}^{\infty} \frac{1}{t} \left( n \left| \frac{x}{t} \right|^{n-1} \right) |a_n t^n| \le \frac{L}{t} \sum_{n=1}^{\infty} |a_n t^n|, \]
where the last sum converges because $t \in (-R, R)$. Therefore, $\sum_{n=1}^{\infty} n a_n x^{n-1}$ converges absolutely and thus converges.

Exercise 6.5.8. (a) For a fixed $x$, apply the Ratio Test to the series $\sum a_n x^n$ to get
\[ \lim_{n\to\infty} \left| \frac{a_{n+1} x^{n+1}}{a_n x^n} \right| = \lim_{n\to\infty} \left| \frac{a_{n+1}}{a_n} \right| |x| = L|x|. \]
If $|x| < 1/L$ then $L|x| < 1$ and the series converges.
(b) If $L = 0$ then $L|x| = 0 < 1$ for every value of $x$ and the Ratio Test implies that the series converges on all of $\mathbf{R}$.

(c) This will follow using the same proofs if we can prove the following modified version of the Ratio Test: Given a sequence $(b_n)$, let
\[ L' = \lim_{n\to\infty} s_n \quad \text{where} \quad s_n = \sup \left\{ \left| \frac{b_{k+1}}{b_k} \right| : k \ge n \right\}. \]
If $L' < 1$ then $\sum b_n$ converges. The proof is very similar to the proof of the Ratio Test in Exercise 2.7.9. First choose $R$ to satisfy $L' < R < 1$. The sequence $(s_n)$ is decreasing, and because it converges to $L'$ we know there exists an $N$ such that
\[ \left| \frac{b_{k+1}}{b_k} \right| \le R \quad \text{for all } k \ge N. \]
An induction proof like the one before shows
\[ |b_k| \le |b_N| R^{k-N} \quad \text{for all } k \ge N, \]
and then we may compare the series $\sum |b_k|$ to the convergent geometric series $\sum |b_N| R^{k-N}$ to conclude that $\sum b_k$ converges.
(d) The statement in this exercise is false. A condition such as $|a_{n+1}/a_n| \ge 1$ for all values of $n$ after some point in the sequence would be sufficient to prove the series diverges.

Exercise 6.5.9. Set $g(x) = \sum_{n=0}^{\infty} a_n x^n$ and $h(x) = \sum_{n=0}^{\infty} b_n x^n$. Because $g(x) = h(x)$ on $(-R, R)$ we see $a_0 = g(0) = h(0) = b_0$. But we also know that $g$ and $h$ are infinitely differentiable. Taking the derivative and setting $x = 0$ yields the formulas $a_1 = g'(0)$ and $b_1 = h'(0)$. Again, because $g = h$ we see that $a_1 = b_1$. Taking successive derivatives and setting $x = 0$ leads to the conclusion that $a_n = b_n$ for all $n$. (The upcoming work on Taylor's formula in the next section is very relevant to this discussion.)

Exercise 6.5.10. We are assuming $\sum a_n$, $\sum b_n$ and $\sum d_n$ each converge which, according to Abel's Theorem, tells us that the respective series for $f$, $g$, and $h$ converge uniformly on $[0, 1]$. Among other things, this implies that $f$, $g$ and $h$ are all continuous functions over the closed interval $[0, 1]$. Fix $x \in [0, 1)$. Because each series converges at $x = 1$, Theorem 6.5.1 implies that $\sum a_n x^n$, $\sum b_n x^n$ and $\sum d_n x^n$ each converge absolutely. This fact means that we can invoke the result in Exercise 2.8.8 to assert that
\[ h(x) = \sum d_n x^n = f(x) g(x). \]
Because this is true for all $x \in [0, 1)$, and because $f$, $g$ and $h$ are continuous on the closed interval $[0, 1]$, it follows that $h(1) = f(1) g(1)$ or
\[ \sum d_n = \left( \sum a_n \right) \left( \sum b_n \right), \]
as desired.

Exercise 6.5.11. (a) Assume $\sum a_n$ converges to $L$. If we set $f(x) = \sum a_n x^n$, then Abel's Theorem implies that the series for $f$ converges uniformly on the interval $[0, 1]$. Because the summands are continuous polynomials, this proves that $f$ is continuous on $[0, 1]$. In particular, this implies $\lim_{x \to 1^-} f(x) = f(1)$. But notice that $f(1) = L$ and thus we have shown that $\sum a_n$ is Abel-summable to $L$.
(b) Using some familiar facts about geometric series, observe that
\[ \sum_{n=0}^{\infty} (-1)^n x^n = 1 - x + x^2 - x^3 + x^4 - \cdots = \frac{1}{1 - (-x)} = \frac{1}{1 + x}, \]
provided $|x| < 1$. Then $\lim_{x \to 1^-} 1/(1 + x) = 1/2$ shows that our series is Abel-summable to $1/2$.

Exercise 6.5.12. Begin with the observation that $0 < d_0$. Because $G$ is strictly increasing we see $G(0) < G(d_0)$. But notice $G(0) = d_1$ and $G(d_0) = d_0$ and so we have $d_1 < d_0$. This argument can be repeated. Given $d_r < d_0$ we have $d_{r+1} = G(d_r) < G(d_0) = d_0$, and it follows that $d_r < d_0$ for all values of $r$. We conclude that $(d_r)$ converges to $d_0$, the smaller of the two fixed points.
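The Abel-summability computation in Exercise 6.5.11 (b) can be checked numerically. The following is only an illustrative sketch: the function name `abel_partial_sum` and the cutoff of 200,000 terms are choices made here, not part of the exercise. It compares a long partial sum of $\sum (-1)^n x^n$ with the closed form $1/(1+x)$ for values of $x$ approaching 1 from the left.

```python
def abel_partial_sum(x, terms=200_000):
    """Partial sum of sum_{n>=0} (-1)^n x^n, intended for 0 <= x < 1."""
    total = 0.0
    power = 1.0  # holds (-x)^n for the current n
    for _ in range(terms):
        total += power
        power *= -x
    return total


if __name__ == "__main__":
    # As x increases toward 1, both columns should approach 1/2.
    for x in (0.9, 0.99, 0.999):
        print(x, abel_partial_sum(x), 1.0 / (1.0 + x))
```

The series itself diverges at $x = 1$ (its partial sums alternate between 1 and 0), so the value $1/2$ only appears through the limit as $x \to 1^-$.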

6.6 Taylor Series

Exercise 6.6.1. Because the series converges when $x = 1$, Abel's Theorem implies that we get uniform convergence over the interval $[0, 1]$ and thus the series represents a continuous function over the interval $[0, 1]$. Assuming that $\arctan(x)$ is continuous over $[0, 1]$, it follows that if these two continuous functions agree for all values of $x \in [0, 1)$, then they must also agree when $x = 1$. Substituting $x = 1$ into this formula gives "Leibniz's formula,"
\[ \frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots. \]

Exercise 6.6.2. From equation (1) we get
\[ \frac{1}{1+x} = 1 - x + x^2 - x^3 + x^4 - x^5 + \cdots. \]
Then integrating gives
\[ \ln(1 + x) = \int_0^x \frac{1}{1+t} \, dt = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \frac{x^5}{5} - \cdots. \]
This series converges for all $x$ in the interval $(-1, 1]$.
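Both endpoint formulas can be tested against partial sums. The sketch below is an illustration only; the helper names `leibniz_partial` and `log2_partial` are introduced here. It relies on the standard alternating series estimate, which says the error of a partial sum is at most the size of the first omitted term.

```python
import math


def leibniz_partial(n_terms):
    """Partial sum 1 - 1/3 + 1/5 - ... using n_terms terms."""
    return sum((-1) ** k / (2 * k + 1) for k in range(n_terms))


def log2_partial(n_terms):
    """Partial sum 1 - 1/2 + 1/3 - ... (the ln(1+x) series at x = 1)."""
    return sum((-1) ** (k + 1) / k for k in range(1, n_terms + 1))


if __name__ == "__main__":
    n = 1000
    # First omitted terms: 1/(2n+1) for Leibniz, 1/(n+1) for ln 2.
    print(abs(leibniz_partial(n) - math.pi / 4), 1 / (2 * n + 1))
    print(abs(log2_partial(n) - math.log(2)), 1 / (n + 1))
```

Convergence at $x = 1$ is slow in both cases, which is consistent with $x = 1$ being an endpoint of the interval of convergence.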

Exercise 6.6.3. The key idea is to take the derivative of each side of equation (2) using a term-by-term approach for the series on the right (this is justified by Theorem 6.5.7). Setting $x = 0$ after $n$ derivatives gives the formula for $a_n$.

Exercise 6.6.4. We set $a_0 = \sin(0) = 0$, $a_1 = \cos(0)/1! = 1$, $a_2 = -\sin(0)/2! = 0$, $a_3 = -\cos(0)/3! = -1/3!$, and so on. Then substitute these values for $a_n$ and $f$ into the expression in equation (2). It remains to show that the series expression actually equals $\sin(x)$ for any values of $x$ other than $x = 0$.

Exercise 6.6.5. To do this we will show that $E_N(x) \to 0$ uniformly on $[-2, 2]$. By Lagrange's Remainder Theorem we have
\[ |E_N(x)| = \left| \frac{f^{(N+1)}(c)}{(N+1)!} x^{N+1} \right| \le \frac{1}{(N+1)!} \left| x^{N+1} \right| \le \frac{1}{(N+1)!} 2^{N+1} \]
for $x \in [-2, 2]$. From past experience we know that factorials grow faster than exponentials or, put another way, that $\lim_{N \to \infty} 2^{N+1}/(N+1)! = 0$. Thus, given $\epsilon > 0$ we can choose an $M$ so that $N \ge M$ implies $2^{N+1}/(N+1)! < \epsilon$. It follows that
\[ |E_N(x)| \le \frac{2^{N+1}}{(N+1)!} < \epsilon \quad \text{for all } N \ge M \text{ and all } x \in [-2, 2], \]
and hence $E_N(x) \to 0$ uniformly on $[-2, 2]$. Replacing the constant 2 with an arbitrary constant $R$ has no effect on the validity of the argument.

Exercise 6.6.6. (a) Because $f^{(n)}(x) = e^x$ for every $n$, we get $a_n = e^0/n! = 1/n!$ which yields
\[ e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots. \]
To show that the series converges uniformly to $e^x$ on any interval of the form $[-R, R]$, use Lagrange's Remainder formula to write
\[ |E_N(x)| = \left| \frac{e^c}{(N+1)!} x^{N+1} \right| \le \frac{e^R}{(N+1)!} R^{N+1} \]
for all $x \in [-R, R]$. Now, just as in the previous exercise, this error bound tends to zero as $N \to \infty$. Because this bound is independent of $x$, it follows that $E_N(x) \to 0$ uniformly on $[-R, R]$ and we get that $S_N(x) \to e^x$ uniformly on $[-R, R]$ as well.
(b) To verify the formula $(e^x)' = e^x$ we differentiate the series representation
\[ e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots \]
term-by-term to get
\[ (e^x)' = 0 + 1 + 2\frac{x}{2!} + 3\frac{x^2}{3!} + 4\frac{x^3}{4!} + \cdots = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = e^x. \]

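(c) Starting from the formula $e^x = \sum_{n=0}^{\infty} x^n/n!$ we get the formula
\[ e^{-x} = \sum_{n=0}^{\infty} \frac{(-x)^n}{n!} = 1 - x + \frac{x^2}{2!} - \frac{x^3}{3!} + \frac{x^4}{4!} - \cdots. \]
Reviewing the material on Cauchy products from the end of Section 2.8 we now write
\[ (e^x)(e^{-x}) = \left( 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots \right) \left( 1 - x + \frac{x^2}{2!} - \frac{x^3}{3!} + \cdots \right) \]
\[ = 1 + (-1 + 1)x + \left( \frac{1}{2!} - 1 + \frac{1}{2!} \right) x^2 + \left( \frac{-1}{3!} + \frac{1}{2!} - \frac{1}{2!} + \frac{1}{3!} \right) x^3 + \cdots = 1. \]
The key to the above calculation is to use the binomial formula to show that the coefficient for $x^n$ is
\[ \sum_{k=0}^{n} \frac{1}{k!} \cdot \frac{(-1)^{n-k}}{(n-k)!} = \frac{1}{n!} (-1 + 1)^n = 0 \quad \text{for all } n \ge 1. \]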

The point of this exercise is to illustrate that if we take the power series representation for $e^x$ to be the definition of the exponential function, then familiar statements such as $(e^x)' = e^x$ and $e^{-x} = 1/e^x$ follow naturally from the definition.

Exercise 6.6.7. Applying the definition of the error function at zero we find
\[ E_N^{(n)}(0) = f^{(n)}(0) - S_N^{(n)}(0) = f^{(n)}(0) - n! \, a_n = f^{(n)}(0) - n! \, \frac{f^{(n)}(0)}{n!} = 0 \]
for all $n = 0, 1, 2, \ldots, N$.

Exercise 6.6.8. By applying the Generalized Mean Value Theorem to the functions $E_N(x)$ and $x^{N+1}$ on the interval $[0, x]$ (and recalling from Exercise 6.6.7 that $E_N(0) = 0$) we know that there exists a point $x_1 \in (0, x)$ such that
\[ E_N(x) = \frac{E_N'(x_1)}{(N+1) x_1^N} \, x^{N+1}. \]
Now apply the Generalized Mean Value Theorem to the functions $E_N'(x)$ and $x^N$ on the interval $[0, x_1]$, to get that there exists a point $x_2 \in (0, x_1)$ where
\[ \frac{E_N'(x_1)}{x_1^N} = \frac{E_N''(x_2)}{N x_2^{N-1}}. \]
Substituting this observation into our earlier result gives
\[ E_N(x) = \frac{E_N'(x_1)}{(N+1) x_1^N} \, x^{N+1} = \frac{E_N''(x_2)}{(N+1)(N) x_2^{N-1}} \, x^{N+1}. \]

Continuing in this manner we find
\[ E_N(x) = \frac{E_N'(x_1)}{(N+1) x_1^N} \, x^{N+1} = \frac{E_N''(x_2)}{(N+1)(N) x_2^{N-1}} \, x^{N+1} = \cdots = \frac{E_N^{(N+1)}(x_{N+1})}{(N+1)! \, x_{N+1}^{N-N}} \, x^{N+1}, \]
where $x_{N+1} \in (0, x_N) \subseteq \cdots \subseteq (0, x)$. Now set $c = x_{N+1}$ and, noting that $c^{N-N} = 1$, write
\[ E_N(x) = \frac{E_N^{(N+1)}(c)}{(N+1)!} \, x^{N+1}. \]
Finally, because $S_N^{(N+1)} = 0$ we have that $E_N^{(N+1)}(x) = f^{(N+1)}(x)$ and it follows that
\[ E_N(x) = \frac{E_N^{(N+1)}(c)}{(N+1)!} \, x^{N+1} = \frac{f^{(N+1)}(c)}{(N+1)!} \, x^{N+1}. \]
This proves Lagrange's Remainder Theorem.

Exercise 6.6.9. By the definition of the derivative, we have
\[ g'(0) = \lim_{x \to 0} \frac{g(x) - g(0)}{x - 0} = \lim_{x \to 0} \frac{e^{-1/x^2}}{x} = \lim_{x \to 0} \frac{1/x}{e^{1/x^2}}, \]
where both numerator and denominator tend to $\infty$ in absolute value as $x$ approaches zero. Applying the $\infty/\infty$ version of L'Hospital's rule we can write
\[ g'(0) = \lim_{x \to 0} \frac{-1/x^2}{e^{1/x^2} (-2/x^3)} = \lim_{x \to 0} \frac{x}{2 e^{1/x^2}} = 0. \]

Exercise 6.6.10. Computing the derivatives for $x \ne 0$ we find
\[ g'(x) = \frac{2 e^{-1/x^2}}{x^3}, \]
\[ g''(x) = -\frac{6 e^{-1/x^2}}{x^4} + \frac{4 e^{-1/x^2}}{x^6}, \]
\[ g'''(x) = \frac{24 e^{-1/x^2}}{x^5} - \frac{36 e^{-1/x^2}}{x^7} + \frac{8 e^{-1/x^2}}{x^9}, \]
and in general we can write
\[ g^{(n)}(x) = \sum_{k=1}^{n} \frac{f(n, k) \, e^{-1/x^2}}{x^{2k+n}}, \]
where $f(n, k)$ describes the coefficients.

Exercise 6.6.11. To compute $g''(0)$ from the definition we substitute the formula for $g'(x)$ away from zero to get
\[ g''(0) = \lim_{x \to 0} \frac{g'(x)}{x} = \lim_{x \to 0} \frac{2 e^{-1/x^2}}{x^4} = \lim_{x \to 0} \frac{2/x^4}{e^{1/x^2}}. \]

Applying L'Hospital's rule we can write
\[ g''(0) = \lim_{x \to 0} \frac{-8/x^5}{-2 e^{1/x^2}/x^3} = \lim_{x \to 0} \frac{4/x^2}{e^{1/x^2}}. \]
One more application of L'Hospital's rule lets us conclude
\[ g''(0) = \lim_{x \to 0} \frac{-8/x^3}{-2 e^{1/x^2}/x^3} = \lim_{x \to 0} \frac{4}{e^{1/x^2}} = 0. \]
In general, whenever we have a quotient of the form $x^{-m}/e^{1/x^2}$, what we discover is that by repeated applications of L'Hospital's rule we can show
\[ \lim_{x \to 0} \frac{1/x^m}{e^{1/x^2}} = 0. \]
An induction argument now proves that $g^{(n)}(0) = 0$ for all $n$. To see this explicitly, observe that if $g^{(n)}(0) = 0$ then our formula from the previous exercise yields
\[ g^{(n+1)}(0) = \lim_{x \to 0} \frac{g^{(n)}(x) - g^{(n)}(0)}{x - 0} = \lim_{x \to 0} \frac{g^{(n)}(x)}{x} = \lim_{x \to 0} \sum_{k=1}^{n} \frac{f(n, k)/x^{2k+n+1}}{e^{1/x^2}}. \]
Because this limit is zero for each term in the sum, we see that $g^{(n+1)}(0) = 0$, and it follows that $g^{(n)}(0) = 0$ for all values of $n$.

Exercise 6.6.12. We have discussed the fact that $g$ is an infinitely differentiable function as long as $e^x$ has this property. This means that $g$ has a Taylor series. Because $g^{(n)}(0) = 0$ for all $n$, every coefficient in the series expansion is 0. Thus the Taylor series exists and converges at every value of $x$ to zero. But notice that $g(x) \ne 0$ whenever $x \ne 0$. The Taylor series for $g(x)$ exists and converges, but it does not converge to $g(x)$ apart from the center point $x = 0$. Thus, not every infinitely differentiable function can be represented by its Taylor series.
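The behavior of $g(x) = e^{-1/x^2}$ near zero can be seen numerically. The sketch below is illustrative only; the helper name `ratio` and the sample points are choices made here. It evaluates the quotient $x^{-m}/e^{1/x^2} = g(x)/x^m$ from the limit argument, and also confirms that $g$ is not zero away from the origin even though every Taylor coefficient at 0 vanishes.

```python
import math


def g(x):
    """g(x) = exp(-1/x^2) for x != 0, with g(0) = 0."""
    return 0.0 if x == 0 else math.exp(-1.0 / (x * x))


def ratio(x, m):
    """The quotient x^(-m) / e^(1/x^2), written as g(x) / x^m."""
    return g(x) / x ** m


if __name__ == "__main__":
    # Even for m = 10 the quotient collapses rapidly as x shrinks.
    for x in (0.5, 0.2, 0.1, 0.05):
        print(x, ratio(x, 10))
    # The Taylor series at 0 is identically zero, but g itself is not.
    print(g(0.5))
```

Floating-point output of this kind only suggests the limit; the proof still rests on the L'Hospital argument above.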

Chapter 7

The Riemann Integral

7.1 Discussion: How Should Integration be Defined?

7.2 The Definition of the Riemann Integral

Exercise 7.2.1. Momentarily fix the partition $P$. Then Lemma 7.2.4 implies
\[ L(f, P) \le U(f, P') \quad \text{for all partitions } P'. \]
Because $L(f, P)$ is a lower bound for the set of upper sums, it must be less than or equal to the greatest lower bound for this set; i.e., $L(f, P) \le U(f)$. But $P$ is arbitrary in this discussion, meaning that $U(f)$ is an upper bound on the set of lower sums. From the definition of the supremum we get $L(f) \le U(f)$ as desired.

Exercise 7.2.2. (a) $L(f, P) = 17/2$, $U(f, P) = 23/2$, and $U(f, P) - L(f, P) = 3$.
(b) In this case $U(f, P) - L(f, P) = 2$.
(c) Adding any new point to $\{1, 3/2, 2, 5/2, 3\}$ will do it.

Exercise 7.2.3. For any partition $P = \{x_0, x_1, \ldots, x_n\}$ of $[a, b]$ we have
\[ L(f, P) = \sum_{j=1}^{n} k (x_j - x_{j-1}) = k(b - a), \]
as well as
\[ U(f, P) = \sum_{j=1}^{n} k (x_j - x_{j-1}) = k(b - a). \]
Thus $L(f) = k(b - a)$ and $U(f) = k(b - a)$. Because the upper and lower integrals are equal, the function $f(x) = k$ is integrable with $\int_a^b f = k(b - a)$.
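The numbers in Exercise 7.2.2 can be reproduced with exact rational arithmetic. The sketch below makes an assumption that is not stated in the solution text: it takes the integrand to be $f(x) = 2x + 1$ on $[1, 3]$ with $P = \{1, 3/2, 2, 3\}$, a choice that is consistent with the reported sums. The helper name `lower_upper_increasing` is also introduced here.

```python
from fractions import Fraction as Fr


def lower_upper_increasing(f, partition):
    """Return (L(f,P), U(f,P)) for an increasing f.

    For an increasing function the infimum on each subinterval sits at
    the left endpoint and the supremum at the right endpoint.
    """
    pairs = list(zip(partition, partition[1:]))
    lower = sum(f(a) * (b - a) for a, b in pairs)
    upper = sum(f(b) * (b - a) for a, b in pairs)
    return lower, upper


def f(x):
    # Assumed integrand (hypothetical; not given in the solution text).
    return 2 * x + 1


P = [Fr(1), Fr(3, 2), Fr(2), Fr(3)]
P_refined = [Fr(1), Fr(3, 2), Fr(2), Fr(5, 2), Fr(3)]

if __name__ == "__main__":
    low, up = lower_upper_increasing(f, P)
    print(low, up, up - low)
    low_r, up_r = lower_upper_increasing(f, P_refined)
    print(up_r - low_r)
```

Under this assumption the refinement that adds the point $5/2$ shrinks the gap between the upper and lower sums, which is the behavior parts (b) and (c) describe.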

Exercise 7.2.4. (a) ($\Rightarrow$) Assume there exists a sequence of partitions $(P_n)$ satisfying
\[ \lim_{n \to \infty} [U(f, P_n) - L(f, P_n)] = 0. \]
Given $\epsilon > 0$, choose $P_N$ from this sequence so that $U(f, P_N) - L(f, P_N) < \epsilon$. Then Theorem 7.2.8 implies $f$ is integrable.
($\Leftarrow$) Conversely, if $f$ is integrable then given $\epsilon_n = 1/n$, Theorem 7.2.8 implies that there exists a partition $P_n$ satisfying $U(f, P_n) - L(f, P_n) < 1/n$. It follows that
\[ \lim_{n \to \infty} [U(f, P_n) - L(f, P_n)] = 0, \]
as desired.
(b) For the partition $P_n$ we have $x_k = k/n$, $m_k = (k - 1)/n$ and $M_k = k/n$. Then
\[ U(f, P_n) = \sum_{k=1}^{n} \frac{k}{n} (1/n) = \frac{1}{n^2} \sum_{k=1}^{n} k = \frac{1}{n^2} \left( \frac{n(n+1)}{2} \right), \]
and
\[ L(f, P_n) = \sum_{k=1}^{n} \frac{(k-1)}{n} (1/n) = \frac{1}{n^2} \sum_{k=1}^{n} (k - 1) = \frac{1}{n^2} \left( \frac{(n-1)n}{2} \right). \]
(c) Now we may compute
\[ \lim_{n \to \infty} [U(f, P_n) - L(f, P_n)] = \lim_{n \to \infty} \left[ \frac{n(n+1)}{2n^2} - \frac{(n-1)n}{2n^2} \right] = \lim_{n \to \infty} \left[ \frac{1}{n} \right] = 0. \]

The result in (a) now implies that $f(x) = x$ is integrable.

Exercise 7.2.5. We shall use the criterion in Theorem 7.2.8. The shape of the proof is determined by the triangle inequality estimate
\[ U(f, P) - L(f, P) = U(f, P) - U(f_N, P) + U(f_N, P) - L(f_N, P) + L(f_N, P) - L(f, P) \]
\[ \le |U(f, P) - U(f_N, P)| + (U(f_N, P) - L(f_N, P)) + |L(f_N, P) - L(f, P)|. \]
Let $\epsilon > 0$ be arbitrary. Because $f_n \to f$ uniformly, we can choose $N$ so that
\[ |f_N(x) - f(x)| \le \frac{\epsilon}{3(b - a)} \quad \text{for all } x \in [a, b]. \]
Now the function $f_N$ is integrable and so there exists a partition $P$ for which
\[ U(f_N, P) - L(f_N, P) < \frac{\epsilon}{3}. \]
Let's consider a particular subinterval $[x_{k-1}, x_k]$ from this partition. If
\[ M_k = \sup\{f(x) : x \in [x_{k-1}, x_k]\} \quad \text{and} \quad N_k = \sup\{f_N(x) : x \in [x_{k-1}, x_k]\}, \]
then our choice of $f_N$ guarantees that
\[ |M_k - N_k| \le \frac{\epsilon}{3(b - a)}. \]
From this estimate we can argue that
\[ |U(f, P) - U(f_N, P)| = \left| \sum_{k=1}^{n} (M_k - N_k) \Delta x_k \right| \le \sum_{k=1}^{n} \frac{\epsilon}{3(b - a)} \Delta x_k = \frac{\epsilon}{3}. \]
Similarly we can show
\[ |L(f_N, P) - L(f, P)| \le \frac{\epsilon}{3}. \]
Putting this all together, we see that using our choices of $f_N$ and $P$ in the preliminary estimate gives
\[ U(f, P) - L(f, P) < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon. \]
By the criterion in Theorem 7.2.8 we conclude that the uniform limit of integrable functions is integrable.

Exercise 7.2.6. As in the previous exercise, we shall use the criterion in Theorem 7.2.8. Let $P$ be a partition where all the subintervals have equal length $\Delta x = x_k - x_{k-1}$. Because the function is increasing, on each subinterval $[x_{k-1}, x_k]$ we have
\[ M_k = f(x_k) \quad \text{and} \quad m_k = f(x_{k-1}). \]
Thus, because the sum telescopes,
\[ U(f, P) - L(f, P) = \sum_{k=1}^{n} (M_k - m_k) \Delta x = \Delta x \sum_{k=1}^{n} (f(x_k) - f(x_{k-1})) = \Delta x \, (f(b) - f(a)). \]
If $f(b) = f(a)$ then $f$ is constant and is integrable by Exercise 7.2.3. Otherwise, given $\epsilon > 0$, choose a partition $P_\epsilon$ to have equal subintervals with common length satisfying $\Delta x < \epsilon/(f(b) - f(a))$. The previous calculation then shows
\[ U(f, P_\epsilon) - L(f, P_\epsilon) = \Delta x \, (f(b) - f(a)) < \frac{\epsilon}{f(b) - f(a)} (f(b) - f(a)) = \epsilon. \]
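For an increasing function on a partition with equal subintervals, the sum $\sum (f(x_k) - f(x_{k-1}))$ telescopes to $f(b) - f(a)$, so the gap between upper and lower sums is $\Delta x \, (f(b) - f(a))$. The sketch below checks this identity numerically. It is an illustration only: the helper name `gap_for_increasing` and the choice of $e^x$ on $[0, 1]$ as a test function are assumptions made here.

```python
import math


def gap_for_increasing(f, a, b, n):
    """U(f,P) - L(f,P) for an increasing f on n equal subintervals of [a, b]."""
    dx = (b - a) / n
    points = [a + k * dx for k in range(n + 1)]
    # M_k = f(right endpoint) and m_k = f(left endpoint) because f increases.
    return sum((f(right) - f(left)) * dx
               for left, right in zip(points, points[1:]))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        dx = 1.0 / n
        print(n, gap_for_increasing(math.exp, 0.0, 1.0, n), dx * (math.e - 1))
```

The gap shrinks in proportion to $\Delta x$, which is why a sufficiently fine equal-width partition meets any prescribed $\epsilon$.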

7.3 Integrating Functions with Discontinuities

Exercise 7.3.1. (a) Let $P$ be an arbitrary partition of $[0, 1]$. On any subinterval $[x_{k-1}, x_k]$, it must be that $m_k = \inf\{h(x) : x \in [x_{k-1}, x_k]\} = 1$, and it follows that
\[ L(h, P) = \sum_{k=1}^{n} m_k \Delta x_k = \sum_{k=1}^{n} \Delta x_k = 1. \]
(b) Consider the partition $P = \{0, .95, 1\}$. Then $U(h, P) = (1)(.95) + (2)(.05) = 1.05$.
(c) Consider the partition $P_\epsilon = \{0, 1 - \epsilon/2, 1\}$. Then
\[ U(h, P_\epsilon) = (1) \left( 1 - \frac{\epsilon}{2} \right) + (2) \left( \frac{\epsilon}{2} \right) = 1 + \frac{\epsilon}{2}. \]
The implication is that for this partition we have $U(h, P_\epsilon) - L(h, P_\epsilon) < \epsilon$, proving that $h$ is integrable.

Exercise 7.3.2. First write $\mathbf{Q} \cap [0, 1] = \{r_1, r_2, r_3, \ldots\}$, which is allowed because $\mathbf{Q} \cap [0, 1]$ is a countable set. For each $n \in \mathbf{N}$ define
\[ g_n(x) = \begin{cases} 1 & \text{if } x \in \{r_1, r_2, \ldots, r_n\} \\ 0 & \text{otherwise.} \end{cases} \]
Because $g_n$ has only a finite number of discontinuities we know it is integrable, and $g_n \to g$ pointwise is easy to verify. This example is discussed explicitly in the Epilogue to Chapter 7.

Exercise 7.3.3. Assume $f$ is not continuous on the finite set $\{z_1, z_2, \ldots, z_N\}$. We shall build a partition $P$ in two steps: first handling the "bad" or discontinuous points, and then handling the remainder of the interval $[a, b]$. Assume $f$ is bounded by $M$ and let $\epsilon > 0$. Around each $z_i$ construct disjoint subintervals small enough so that the sum of the lengths of all $N$ of these comes to less than $\epsilon/(4M)$. Focusing on just these subintervals we see that
\[ \sum_{\text{bad pts}} (M_k - m_k) \Delta x_k \le 2M \sum_{\text{bad pts}} \Delta x_k < 2M \left( \frac{\epsilon}{4M} \right) = \frac{\epsilon}{2}. \]
If $O$ is the union of the open subintervals that surround each $z_i$, then $[a, b] \setminus O$ is a compact set. Because $f$ is continuous on this set, it is uniformly continuous and so there exists a $\delta > 0$ with the property that
\[ |f(x) - f(y)| < \frac{\epsilon}{2(b - a)} \quad \text{whenever } |x - y| < \delta. \]

Focusing on the intervals that make up $[a, b] \setminus O$ (the "good points"), we partition these so that all the resulting subintervals have length less than $\delta$. This puts us into a situation like the one in the proof from Section 7.2 that continuous functions are integrable. In particular we get that
\[ \sum_{\text{good pts}} (M_k - m_k) \Delta x_k < \sum_{\text{good pts}} \frac{\epsilon}{2(b - a)} \Delta x_k < \frac{\epsilon}{2(b - a)} (b - a) = \frac{\epsilon}{2}. \]
Putting these two parts together we see
\[ U(f, P) - L(f, P) = \sum_{\text{bad pts}} (M_k - m_k) \Delta x_k + \sum_{\text{good pts}} (M_k - m_k) \Delta x_k < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon, \]
and by the criterion in Theorem 7.2.8, $f$ is integrable.

Exercise 7.3.4. (a) Assume $f$ is integrable so that $U(f) = L(f) = \int_a^b f$. Now let $f_0$ be the modified function where we have changed the value of $f$ at $x_0$. Set $D = |f(x_0) - f_0(x_0)|$. We want to prove that $U(f_0) = U(f)$ and $L(f_0) = L(f)$.
Let $\epsilon > 0$ be arbitrary. To argue that $U(f_0) \le U(f)$, it is sufficient to find a partition for which $U(f_0, P) < U(f) + \epsilon$. Because $U(f) = \inf\{U(f, P) : P \in \mathcal{P}\}$, we know there exists a partition $P$ where $U(f, P) < U(f) + \epsilon/2$. The first step is to let $P'$ be a refinement of $P$ with the property that the interval(s) containing $x_0$ have width $\Delta x$ less than $\epsilon/(4D)$. Because $P \subseteq P'$ we know $U(f, P') \le U(f, P)$. Now observe that because $f$ and $f_0$ agree everywhere except at $x_0$ it follows that
\[ |U(f, P') - U(f_0, P')| \le D(2 \Delta x) < \frac{\epsilon}{2}. \]
(The extra 2 is needed in case the point $x_0$ is an endpoint of an interval in $P'$ and is thus contained in two subintervals.) Finally, we see that
\[ U(f_0, P') < U(f, P') + \frac{\epsilon}{2} \le \left( U(f) + \frac{\epsilon}{2} \right) + \frac{\epsilon}{2} = U(f) + \epsilon, \]
so $U(f_0) \le U(f)$. Reversing the roles of $f$ and $f_0$ gives the opposite inequality, and we conclude that $U(f_0) = U(f)$. The proof that $L(f_0) = L(f)$ is similar.
(b) This follows using an induction argument.
(c) Dirichlet's function differs from the zero function in only a countable number of points but is not integrable.

Exercise 7.3.5. Every interval contains points where $f(x) = 0$, and thus it follows that $L(f, P) = 0$ for every partition $P$. This implies that $L(f) = 0$. It remains to show that $U(f) = 0$.
Let $\epsilon > 0$ be arbitrary and consider the finite set $\{1, 1/2, 1/3, \ldots, 1/N\}$ consisting of points of the form $1/n$ that satisfy $1/n \ge \epsilon/2$. Because this set is finite, we may construct a set of disjoint intervals around each of these points with the property that the sum of the lengths of these intervals comes to less than $\epsilon/2$. Letting $P$ be the partition that results from taking the union of these intervals together with the interval $[0, \epsilon/2]$, it follows that
\[ U(f, P) < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon. \]
Because $\epsilon$ is arbitrary, $U(f) = 0$ and $f$ integrates to zero.

Exercise 7.3.6. (a) This proof is nearly identical to the argument in Exercise 7.3.3. In particular, we shall build a partition $P$ in two steps: first handling the "bad" or discontinuous points, and then handling the "good" or continuous parts of the interval $[a, b]$. Assume $f$ is bounded by $M$ and let $\epsilon > 0$. Because our set of discontinuities has content zero, we may let $\{O_1, \ldots, O_N\}$ be a collection of open intervals that covers the set of discontinuous points and satisfies
\[ \sum_{n=1}^{N} |O_n| \le \frac{\epsilon}{4M}. \]
Focusing on just these subintervals we see that $|O_n| = \Delta x_n$ and
\[ \sum_{\text{bad pts}} (M_n - m_n) \Delta x_n \le 2M \sum_{\text{bad pts}} \Delta x_n \le 2M \left( \frac{\epsilon}{4M} \right) = \frac{\epsilon}{2}. \]
If $O = \bigcup_{n=1}^{N} O_n$ then $[a, b] \setminus O$ is a compact set. Because $f$ is continuous on this set, it is uniformly continuous and so there exists a $\delta > 0$ with the property that
\[ |f(x) - f(y)| < \frac{\epsilon}{2(b - a)} \quad \text{whenever } |x - y| < \delta. \]
Focusing on the intervals that make up $[a, b] \setminus O$ (the "good points"), we partition these so that all the resulting subintervals have length less than $\delta$. This puts us into a situation like the one in the proof from Section 7.2 that continuous functions are integrable. In particular we get that
\[ \sum_{\text{good pts}} (M_k - m_k) \Delta x_k < \sum_{\text{good pts}} \frac{\epsilon}{2(b - a)} \Delta x_k < \frac{\epsilon}{2(b - a)} (b - a) = \frac{\epsilon}{2}. \]
Putting these two parts together we see
\[ U(f, P) - L(f, P) = \sum_{\text{bad pts}} (M_k - m_k) \Delta x_k + \sum_{\text{good pts}} (M_k - m_k) \Delta x_k < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon, \]
and by the criterion in Theorem 7.2.8, $f$ is integrable.
(b) Given a finite set $\{z_1, z_2, \ldots, z_N\}$ and $\epsilon > 0$, let $O_n = V_{\epsilon'}(z_n)$ where $\epsilon' = \epsilon/(2N)$. Then $|O_n| = \epsilon/N$ and the sum of these lengths is equal to $\epsilon$, as desired.

(c) Recall that we defined the Cantor set $C$ as the intersection
\[ C = \bigcap_{n=0}^{\infty} C_n, \]
where $C_n$ consists of $2^n$ closed intervals of length $1/3^n$. Given $\epsilon > 0$, choose $m$ so that $2^m (1/3^m) < \epsilon/2$. Now it would be nice if we could just use the intervals that make up $C_m$ as our covering set. However, the definition of content zero requires that we use open intervals. To fix this, we can embed each of the $2^m$ closed intervals that make up $C_m$ in a slightly larger open interval whose length is equal to $1/3^m + (\epsilon/2) 2^{-m}$. This collection of open intervals will then contain $C$ (because $C_m$ does) and the lengths will sum to
\[ 2^m \left[ 1/3^m + \frac{\epsilon}{2} 2^{-m} \right] < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon, \]
as desired.
(d) The fact that $h$ is integrable follows immediately from (a), (c), and the result in Exercise 4.3.12. Because $C$ contains no intervals, we see that $L(h, P) = 0$ for every partition $P$ and so it must be that $\int_0^1 h = 0$.
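The covering argument for the Cantor set in part (c) can be carried out with exact fractions. The sketch below is illustrative; the function name `cantor_cover_length` is introduced here and `eps` is expected to be a `Fraction`. It finds the smallest $m$ with $2^m/3^m < \epsilon/2$ and returns the total length of the $2^m$ enlarged open intervals.

```python
from fractions import Fraction as Fr


def cantor_cover_length(eps):
    """Return (m, total) where m is minimal with (2/3)^m < eps/2 and
    total is the summed length of 2^m open intervals, each of length
    1/3^m + (eps/2) * 2^(-m)."""
    m = 0
    while Fr(2, 3) ** m >= eps / 2:
        m += 1
    interval_length = Fr(1, 3 ** m) + (eps / 2) / 2 ** m
    return m, 2 ** m * interval_length


if __name__ == "__main__":
    for eps in (Fr(1, 10), Fr(1, 100), Fr(1, 1000)):
        m, total = cantor_cover_length(eps)
        print(eps, m, float(total), total < eps)
```

Because $(2/3)^m \to 0$, the loop always terminates, which mirrors the step in the proof where $m$ is chosen.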

7.4 Properties of the Integral

Exercise 7.4.1. (a) Let $\epsilon > 0$ be arbitrary and choose $x_1$ and $x_2$ so that $M' - \epsilon/2 < |f(x_1)|$ and $m' + \epsilon/2 > |f(x_2)|$. Then using Exercise 1.2.5 (b) we can write
\[ (M' - m') - \epsilon \le |f(x_1)| - |f(x_2)| \le |f(x_1) - f(x_2)| \le M - m. \]
Because $\epsilon$ is arbitrary, it follows that $M' - m' \le M - m$.
(b) Let $\epsilon > 0$. Because $f$ is integrable, there exists a partition $P$ satisfying $U(f, P) - L(f, P) < \epsilon$. But now from part (a) it follows that
\[ U(|f|, P) - L(|f|, P) \le U(f, P) - L(f, P) < \epsilon, \]
and the result follows.
(c) Because $-|f| \le f \le |f|$ and all of these functions are integrable, we know from Theorem 7.4.2 (iv) and (ii) that
\[ -\int_a^b |f| \le \int_a^b f \le \int_a^b |f|. \]

Exercise 7.4.2. From Theorem 7.4.1 we get
\[ \int_c^b f = \int_c^a f + \int_a^b f. \]

Then Definition 7.4.3 allows us to write $\int_c^a f = -\int_a^c f$ which, when substituted into the first statement, gives us
\[ \int_c^b f = -\int_a^c f + \int_a^b f, \quad \text{that is,} \quad \int_a^c f + \int_c^b f = \int_a^b f. \]

Exercise 7.4.3. The properties of the integral in Theorem 7.4.2 allow us to write
\[ \left| \int_a^b f_n - \int_a^b f \right| = \left| \int_a^b (f_n - f) \right| \le \int_a^b |f_n - f|. \]
Let $\epsilon > 0$ be arbitrary. Because $f_n \to f$ uniformly, there exists an $N$ such that
\[ |f_n(x) - f(x)| < \epsilon/(b - a) \quad \text{for all } n \ge N \text{ and } x \in [a, b]. \]
Thus, for $n \ge N$ we see that
\[ \left| \int_a^b f_n - \int_a^b f \right| \le \int_a^b |f_n - f| \le \int_a^b \frac{\epsilon}{b - a} = \epsilon, \]
and the result follows.

Exercise 7.4.4. (a) False. Dirichlet's function is a counterexample.
(b) False. The functions in Exercise 7.3.5 and Exercise 7.3.6 are counterexamples.
(c) True. Because $g$ is continuous at $x_0$ with $g(x_0) > 0$, there exists a $\delta$-neighborhood $V_\delta(x_0)$ with the property that $g(x) \ge g(x_0)/2$ for all $x \in V_\delta(x_0)$. Now let $P$ be a partition that contains the interval $V_\delta(x_0)$. When we compute the lower sum $L(g, P)$ with respect to this partition, the contribution from the subinterval $V_\delta(x_0)$ is at least $[g(x_0)/2] 2\delta > 0$. The assumption that $g(x) \ge 0$ on the rest of $[a, b]$ guarantees that there are no negative terms in the sum $L(g, P)$, and it follows that
\[ \int_a^b g = L(g) \ge L(g, P) > 0. \]
(d) True. We again argue using lower sums. Because the value of the integral is strictly positive, there must exist a partition $P$ such that $L(f, P) > 0$. But this implies that there is at least one subinterval $[c, d]$ in the partition $P$ where the product $m(d - c)$ is strictly positive. Because $m = \inf\{f(x) : x \in [c, d]\}$, setting $\delta = m$ gives the result.

Exercise 7.4.5. (a) Consider a particular subinterval $[x_{k-1}, x_k]$ of $P$ and let
\[ M_k = \sup\{f(x) : x \in [x_{k-1}, x_k]\}, \]
\[ M_k' = \sup\{g(x) : x \in [x_{k-1}, x_k]\}, \quad \text{and} \]
\[ M_k'' = \sup\{f(x) + g(x) : x \in [x_{k-1}, x_k]\}. \]

Because $M_k + M_k'$ is an upper bound for the set $\{f(x) + g(x) : x \in [x_{k-1}, x_k]\}$ it follows that $M_k'' \le M_k + M_k'$. This inequality leads directly to the conclusion that $U(f + g, P) \le U(f, P) + U(g, P)$.
The two sides are usually not equal because the functions $f$ and $g$ could easily take on their larger values in different places of each subinterval. For example, consider $f(x) = x$ and $g(x) = 1 - x$ on the interval $[0, 1]$ with the partition $P = \{0, 1\}$. Then
\[ M_k = 1, \quad M_k' = 1 \quad \text{and} \quad M_k'' = 1, \]
so we have $M_k'' < M_k + M_k'$.
The inequality for lower sums takes the form $L(f + g, P) \ge L(f, P) + L(g, P)$.
(b) Because $f$ and $g$ are integrable, there exist sequences of partitions $(P_n)$ and $(Q_n)$ such that
\[ \lim_{n \to \infty} [U(f, P_n) - L(f, P_n)] = 0 \quad \text{and} \quad \lim_{n \to \infty} [U(g, Q_n) - L(g, Q_n)] = 0. \tag{1} \]
For each $n$, let $R_n$ be the common refinement $R_n = P_n \cup Q_n$. Then part (a) of this exercise and Lemma 7.2.3 imply
\[ U(f + g, R_n) - L(f + g, R_n) \le [U(f, R_n) + U(g, R_n)] - [L(f, R_n) + L(g, R_n)] \]
\[ \le [U(f, P_n) + U(g, Q_n)] - [L(f, P_n) + L(g, Q_n)] \]
\[ = [U(f, P_n) - L(f, P_n)] + [U(g, Q_n) - L(g, Q_n)]. \]
From (1) it now follows that
\[ \lim_{n \to \infty} [U(f + g, R_n) - L(f + g, R_n)] = 0, \]
and the result follows.

Exercise 7.4.6. (a) Set
\[ f_n(x) = \begin{cases} (-1)^n n & \text{if } 0 < x < 1/n \\ 0 & \text{if } x = 0 \text{ or } x \ge 1/n. \end{cases} \]
Then $\int_0^1 f_n = (-1)^n$, and the limit of these integrals does not exist.
(b) Set
\[ f_n(x) = \begin{cases} n^2 & \text{if } 0 < x < 1/n \\ 0 & \text{if } x = 0 \text{ or } x \ge 1/n. \end{cases} \]
Then $\int_0^1 f_n = n$ which is unbounded as $n \to \infty$.
(c) Sure. Rather than putting in step functions over the intervals $[0, 1/n]$ we could put taller and taller triangular "tents" that would be continuous and still create the same effect.
(d) This is a delicate question that requires a deeper study of the integral to work out in any satisfactory way. In all of the examples in this exercise, the sequence of badly behaving functions has been unbounded. This turns out to be a requirement. With a stronger integral it is possible to prove that if $f_n$ are integrable functions that are uniformly bounded, and if $f_n \to f$ pointwise on $[a, b]$, then $\int_a^b f_n \to \int_a^b f$.

Exercise 7.4.7. This exercise requires the stronger hypothesis (not included in earlier editions) that $g_n$ and $g$ are uniformly bounded; i.e., that there exists $M > 0$ satisfying
\[ |g(x)| \le M \quad \text{and} \quad |g_n(x)| \le M \quad \text{for all } n \text{ and } x \in [0, 1]. \]
As a first step we use the properties of the integral proved in this section to write
\[ \left| \int_0^1 g_n - \int_0^1 g \right| \le \int_0^1 |g_n - g| = \int_0^\delta |g_n - g| + \int_\delta^1 |g_n - g|. \]
Let $\epsilon > 0$. Let's first pick $\delta < \epsilon/(4M)$. Having chosen $\delta$, we know $g_n \to g$ uniformly on $[\delta, 1]$, so there exists an $N$ such that $|g_n - g| < \epsilon/2$ on $[\delta, 1]$ for all $n \ge N$. It follows that if $n \ge N$ then
\[ \left| \int_0^1 g_n - \int_0^1 g \right| \le \int_0^\delta |g_n - g| + \int_\delta^1 |g_n - g| \le \int_0^\delta 2M + \int_\delta^1 \epsilon/2 \le (2M)\delta + \epsilon/2 < \epsilon, \]
and the result follows.

7.5 The Fundamental Theorem of Calculus

Exercise 7.5.1. Assume $g$ is continuous on $[a, b]$ and set $G(x) = \int_a^x g(t) \, dt$. By part (ii) of the Fundamental Theorem, $g$ is the derivative of $G$.

Exercise 7.5.2. (a) For $f(x) = |x|$ we get
\[ F(x) = \begin{cases} -x^2/2 + 1/2 & \text{if } x < 0 \\ x^2/2 + 1/2 & \text{if } x \ge 0. \end{cases} \]
In this case, $F$ is continuous and differentiable with $F'(x) = f(x)$ for all $x \in \mathbf{R}$. This follows from FTC but it is interesting to check this directly from the formula for $F$, especially at $x = 0$ where we get $F'(0) = 0$ from both sides.
(b) This time we get
\[ F(x) = \begin{cases} x + 1 & \text{if } x < 0 \\ 2x + 1 & \text{if } x \ge 0. \end{cases} \]
A sketch of $F$ is valuable and illustrates in particular that $F$ is continuous on all of $\mathbf{R}$ but fails to be differentiable at $x = 0$ due to the "corner" on the graph. If $x \ne 0$, then we certainly get $F'(x) = f(x)$ as predicted by FTC.

Exercise 7.5.3. The Mean Value Theorem does not require $F(x)$ to be differentiable at the endpoints so we could get by with assuming that $F$ is continuous on $[a, b]$ and differentiable on $(a, b)$ with $F'(x) = f(x)$ for all $x \in (a, b)$. By appealing to Theorem 7.4.1 we could in fact weaken the hypothesis even more to allow $F'(x) = f(x)$ to fail at an arbitrary finite number of points.
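The closed forms in Exercise 7.5.2 can be compared with a direct numerical integral. The sketch below assumes $F(x) = \int_{-1}^{x} f(t) \, dt$, which is not restated in the solution but is consistent with both formulas (each gives $F(-1) = 0$). The integrand in part (b) is assumed to be 1 for $x < 0$ and 2 for $x \ge 0$, read off from the slopes of $F$. All function names are introduced here.

```python
def integral(f, a, b, n=20_000):
    """Midpoint Riemann sum approximating the integral of f from a to b."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx


def f_abs(t):
    return abs(t)


def f_step(t):
    # Assumed integrand for part (b): slope 1 left of 0, slope 2 right of 0.
    return 1.0 if t < 0 else 2.0


def F_abs(x):
    """Closed form from part (a)."""
    return -x * x / 2 + 0.5 if x < 0 else x * x / 2 + 0.5


def F_step(x):
    """Closed form from part (b)."""
    return x + 1 if x < 0 else 2 * x + 1


if __name__ == "__main__":
    for x in (-0.5, 0.0, 0.75):
        print(x, integral(f_abs, -1.0, x), F_abs(x))
        print(x, integral(f_step, -1.0, x), F_step(x))
    # One-sided difference quotients of F_step at 0 expose the corner.
    h = 1e-6
    print((F_step(0) - F_step(-h)) / h, (F_step(h) - F_step(0)) / h)
```

The two one-sided quotients in the last line come out near 1 and 2, matching the left-hand and right-hand values of the integrand at the jump.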

Exercise 7.5.4. (a) $H(1) = 0$. Using FTC we see that $H'(x) = 1/x$ for all $x > 0$.
(b) Given $x < y$, apply the Mean Value Theorem to $H$ on the interval $[x, y]$ to get $H(y) - H(x) = H'(t)(y - x)$ for some $t \in (x, y)$. Because $H'(t) = 1/t > 0$, it follows that $H(y) > H(x)$.
(c) Using the Chain Rule, we see that
\[ g'(x) = H'(cx) \cdot c = \frac{1}{cx} \cdot c = \frac{1}{x}. \]
Thus $g$ and $H$ have the same derivative and so by Corollary 5.3.4 to the Mean Value Theorem we know that $g(x) = H(x) + k$, or $H(cx) = H(x) + k$, for some constant $k$. To determine $k$, set $x = 1$ to get $H(c) = H(1) + k = k$, and the result follows.

Exercise 7.5.5. Because $f_n' \to g$ uniformly on any interval of the form $[a, x]$, it follows from Theorem 7.4.4 that
\[ \lim_{n \to \infty} \int_a^x f_n' = \int_a^x g. \]
Taking the limit as $n \to \infty$ on each side of the equation $\int_a^x f_n' = f_n(x) - f_n(a)$ leads to the equation
\[ f(x) = f(a) + \int_a^x g. \]
But $g$ is the uniform limit of continuous functions and so $g$ must also be continuous. Part (ii) of the Fundamental Theorem of Calculus then implies that $f'(x) = g(x)$, as desired.

Exercise 7.5.6. This exercise requires that $f$ be continuous. If we set $G(x) = \int_a^x f$, then given this extra assumption about $f$, it follows from part (ii) of FTC that $G'(x) = f(x)$. Because $F'(x) = f(x)$ as well, $F$ and $G$ have the same derivative and a corollary to the Mean Value Theorem implies
\[ G(x) = F(x) + k, \tag{1} \]
for some constant $k$. To compute $k$, set $x = a$ in equation (1) to get $0 = F(a) + k$ or $k = -F(a)$. Substituting this back into (1) and setting $x = b$ we find
\[ \int_a^b f = G(b) = F(b) - F(a). \]

Exercise 7.5.7. The idea is to apply the Mean Value Theorem to the function $G(x) = \int_a^x g$ on the interval $[a, b]$. Note that $g$ is continuous and so $G$ is differentiable and thus MVT can be employed. In this case we get that there exists a point $c \in (a, b)$ where
$$G'(c) = \frac{G(b) - G(a)}{b - a} = \frac{1}{b - a} \int_a^b g.$$
Because $G'(c) = g(c)$, this gives the desired result.

Exercise 7.5.8. (a) Let $P$ be a partition of $[a, b]$ and consider a particular subinterval $[x_{k-1}, x_k]$ of $P$. Because $f'$ is continuous, we may use FTC to write
$$f(x_k) - f(x_{k-1}) = \int_{x_{k-1}}^{x_k} f'.$$
Computing the variation with respect to this particular partition, we get
$$\sum_{k=1}^n |f(x_k) - f(x_{k-1})| = \sum_{k=1}^n \left| \int_{x_{k-1}}^{x_k} f' \right| \le \sum_{k=1}^n \int_{x_{k-1}}^{x_k} |f'| = \int_a^b |f'|.$$
What we discover is that $\int_a^b |f'|$ is an upper bound on the set of variations, and it follows that $Vf \le \int_a^b |f'|$ because $Vf$ is the least upper bound of this set. (b) Given a partition $P$, this time we apply MVT to an arbitrary subinterval $[x_{k-1}, x_k]$ to get
$$f(x_k) - f(x_{k-1}) = f'(c_k)\Delta x_k \quad \text{for some } c_k \in (x_{k-1}, x_k).$$
Because lower sums are computed by taking the infimum over each subinterval, this allows us to write
$$\sum_{k=1}^n |f(x_k) - f(x_{k-1})| = \sum_{k=1}^n |f'(c_k)|\Delta x_k \ge L(|f'|, P).$$
It follows that $Vf$ is an upper bound for the set of lower sums for $|f'|$ and we immediately get $Vf \ge \int_a^b |f'|$. Parts (a) and (b) then imply $Vf = \int_a^b |f'|$.

Exercise 7.5.9. In this case, $H(x) = x$. Any doubts about whether this formula holds when $x = 1$ should be alleviated by the fact that we know $H$ is continuous on all of $\mathbf{R}$. It is evident, then, that $H$ is differentiable everywhere. The point to make is that the statement in FTC part (ii) (if $g$ is continuous then $G$ is differentiable) does not have a converse unless we are more specific about the type of discontinuity in $g$.
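The identity $Vf = \int_a^b |f'|$ from Exercise 7.5.8 can be illustrated numerically. The sketch below is an assumption-laden example rather than part of the solution: the choice $f = \sin$ on $[0, 2\pi]$, the uniform partition, and the helper names `variation` and `midpoint_integral` are all made up for illustration.

```python
import math

def variation(f, a, b, n):
    """Sum of |f(x_k) - f(x_{k-1})| over the uniform partition of [a, b] into n pieces."""
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    return sum(abs(f(xs[k]) - f(xs[k - 1])) for k in range(1, n + 1))

def midpoint_integral(g, a, b, n):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

a, b = 0.0, 2.0 * math.pi
# Variation of sin with respect to one partition (n divisible by 4, so the
# partition contains the turning points pi/2 and 3*pi/2 up to rounding).
v = variation(math.sin, a, b, 1000)
# Integral of |f'| = |cos| over the same interval.
integral = midpoint_integral(lambda x: abs(math.cos(x)), a, b, 100000)
```

For this particular $f$ both quantities come out close to 4, consistent with the two inequalities proved in parts (a) and (b).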


Exercise 7.5.10. Let $L_1 = \lim_{x \to c^-} f(x)$. If we insist that $f(c) = L_1$, then the argument in the text for FTC part (ii) can be used to show
$$\lim_{x \to c^-} \frac{F(x) - F(c)}{x - c} = f(c) = L_1.$$
On the other hand, if we let $L_2 = \lim_{x \to c^+} f(x)$, and set $f(c) = L_2$, then the same argument also shows that
$$\lim_{x \to c^+} \frac{F(x) - F(c)}{x - c} = f(c) = L_2.$$
Because $L_1 \ne L_2$ the result is that the graph of $F$ has a "corner" at $x = c$ and is not differentiable.

Exercise 7.5.11. Let $h(x) = \sum_{n=1}^{\infty} u_n(x)$ be the function defined in Exercise 6.4.8. Note that $0 < h(x) < 1$ and $h$ is increasing. By Exercise 7.2.6, $h$ is integrable over any interval and thus we can set
$$H(x) = \int_0^x h(t)\, dt.$$
Part (ii) of FTC implies that $H$ is continuous (and differentiable at every irrational point). Also, if $x < y$ then
$$H(y) - H(x) = \int_x^y h(t)\, dt \ge 0,$$
and it follows that $H$ is increasing. Now fix a rational number $r_N$ from the enumeration in Exercise 6.4.8. The fact that $h$ is increasing implies that both $\lim_{x \to r_N^-} h(x)$ and $\lim_{x \to r_N^+} h(x)$ exist, and we can show that they must differ at $r_N$. Then Exercise 7.5.10 implies that $H$ is not differentiable at $r_N$, and hence at any rational point in $\mathbf{R}$.

7.6

Lebesgue’s Criterion for Riemann Integrability

Exercise 7.6.1. (a) Because $t(x) = 0$ for every irrational and the irrationals are dense in $\mathbf{R}$, it follows that $L(t, P) = 0$ for every partition $P$. (b) If $x \in D_{\epsilon/2}$ then $x$ must be a rational number of the form $x = m/n$ with $n \le 2/\epsilon$. The number of such points in the interval $[0, 1]$ is finite. (c) Let $\{x_1, x_2, \ldots, x_N\}$ be the finite set of points in $D_{\epsilon/2} \cap [0, 1]$. Now build a partition $P$ by constructing small, disjoint intervals around each $x_k$ with length less than $\epsilon/(2N)$. Because $|t(x)| \le 1$, the contribution of all of the intervals containing "bad points" to the upper sum will be at most $N \cdot (\epsilon/(2N)) \cdot 1 = \epsilon/2$. On all of the other intervals we have $|t(x)| \le \epsilon/2$ and so, taken altogether, these contribute at most $\epsilon/2$ to the value of the upper sum. It follows that $U(t, P) \le \epsilon/2 + \epsilon/2 = \epsilon$. Thus, $t$ is integrable and $\int_0^1 t = 0$.
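The finiteness claim in part (b) of Exercise 7.6.1 can be made concrete by listing the "bad points" for a given $\epsilon$. This is a hedged sketch: it assumes that $D_{\epsilon/2}$ means the set where $t(x) \ge \epsilon/2$ and that $t(m/n) = 1/n$ in lowest terms, and the names `thomae` and `bad_points` are hypothetical.

```python
from fractions import Fraction

def thomae(x):
    """Thomae's function at a rational x (assumed form): t(m/n) = 1/n in lowest terms."""
    return Fraction(1, x.denominator)

def bad_points(eps):
    """Rationals m/n in [0, 1] with denominator n <= 2/eps, i.e. with t(x) >= eps/2."""
    bound = Fraction(2) / eps
    max_den = bound.numerator // bound.denominator  # floor(2/eps)
    points = {Fraction(m, n) for n in range(1, max_den + 1) for m in range(n + 1)}
    return sorted(points)

eps = Fraction(1, 2)
points = bad_points(eps)           # denominators up to 4
N = len(points)
# Contribution bound from part (c): N intervals of length eps/(2N), height at most 1.
bad_contribution = N * (eps / (2 * N)) * 1
```

With `eps = 1/2` the list has seven points, and the bound on their contribution to the upper sum is exactly $\epsilon/2$, matching the estimate in part (c).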


Exercise 7.6.2. Because $C$ contains no intervals, $g(x)$ will equal zero at least once in every subinterval of every partition $P$. It follows that $L(g, P) = 0$. Therefore, given $\epsilon > 0$, our task is to find a partition where $U(g, P) < \epsilon$. From this we will be able to conclude that $g$ is integrable and $\int_0^1 g = 0$. The set $C_m$ consists of $2^m$ intervals of length $1/3^m$, so choose $m$ large enough so that $2^m/3^m < \epsilon/2$. It would be nice to simply use the intervals that make up $C_m$ to construct our partition, but we need to worry a bit that the endpoints of these intervals are in $C$. To fix this, we can embed each of the $2^m$ closed intervals that make up $C_m$ in a slightly larger interval whose length is equal to $1/3^m + (\epsilon/2)2^{-m}$. This collection of intervals then contains $C$ in its interior and the lengths sum to
$$2^m \left[ \frac{1}{3^m} + \frac{\epsilon}{2} 2^{-m} \right] < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$
Now two things follow if we let $P$ be the partition obtained from these slightly enlarged intervals from $C_m$. First, because $|g(x)| \le 1$, the contribution of all of the intervals containing points of $C$ to $U(g, P)$ is bounded by $\epsilon$. Second, on all of the other intervals, our function is zero and there are no contributions to $U(g, P)$. It follows that $U(g, P) \le \epsilon$, as desired.

Exercise 7.6.3. Let $A = \{a_1, a_2, a_3, \ldots\}$ be a countable set. Given $\epsilon > 0$, let $O_n = V_{\epsilon_n}(a_n)$ where $\epsilon_n = \epsilon/2^{n+1}$. Clearly the collection $\{O_n : n \in \mathbf{N}\}$ covers $A$ and we have
$$\sum_{n=1}^{\infty} |O_n| = \sum_{n=1}^{\infty} \frac{\epsilon}{2^n} = \epsilon.$$

Exercise 7.6.4. In Exercise 7.3.6 we proved that $C$ has content zero, which immediately implies $C$ has measure zero.

Exercise 7.6.5. Given $\epsilon > 0$, let $\{O_n : n \in \mathbf{N}\}$ be a collection of open intervals that cover $A$ with the property that $\sum_{n=1}^{\infty} |O_n| \le \epsilon/2$. Likewise, let $\{P_n : n \in \mathbf{N}\}$ be a collection of open intervals that cover $B$ satisfying $\sum_{n=1}^{\infty} |P_n| \le \epsilon/2$. Then the collection $\{O_n, P_n : n \in \mathbf{N}\}$ is still countable (the union of countable sets is countable), it forms a cover for the union $A \cup B$, and
$$\sum_{n=1}^{\infty} \left( |O_n| + |P_n| \right) = \sum_{n=1}^{\infty} |O_n| + \sum_{n=1}^{\infty} |P_n| \le \epsilon$$
as desired.

Now assume we are given a countable collection $\{A_1, A_2, A_3, \ldots\}$ of sets of measure zero. Let $\epsilon > 0$. For each $A_k$, let $\{O_{k,n} : n \in \mathbf{N}\}$ be a countable collection of open intervals that covers $A_k$ and satisfies $\sum_{n=1}^{\infty} |O_{k,n}| \le \epsilon/2^k$. It follows that $\{O_{k,n} : n, k \in \mathbf{N}\}$ is a countable collection of open intervals (Theorem 1.4.13 (ii)) whose union certainly covers $\bigcup_{k=1}^{\infty} A_k$. Finally, taking the sum of the lengths of all of the intervals in $\{O_{k,n} : k, n \in \mathbf{N}\}$ involves reordering this set, but the content of Theorem 2.8.1 is that we are justified in simply computing the iterated sum
$$\sum_{k=1}^{\infty} \sum_{n=1}^{\infty} |O_{k,n}| \le \sum_{k=1}^{\infty} \frac{\epsilon}{2^k} = \epsilon.$$
This shows $\bigcup_{k=1}^{\infty} A_k$ has measure zero.
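The covering construction of Exercise 7.6.3 can be sketched for a finite initial segment of a countable set. The code is illustrative only: the sample points $1/k$, the value of `eps`, and the helper names `cover` and `total_length` are assumptions made for the example.

```python
from fractions import Fraction

def cover(points, eps):
    """Intervals O_n = (a_n - eps_n, a_n + eps_n) with eps_n = eps / 2**(n+1), n = 1, 2, ..."""
    intervals = []
    for n, a in enumerate(points, start=1):
        radius = eps / 2 ** (n + 1)
        intervals.append((a - radius, a + radius))
    return intervals

def total_length(intervals):
    """Sum of the lengths |O_n| of the given intervals."""
    return sum((hi - lo for lo, hi in intervals), Fraction(0))

eps = Fraction(1, 100)
points = [Fraction(1, k) for k in range(1, 11)]   # first ten terms of a sample sequence
intervals = cover(points, eps)
length = total_length(intervals)
```

Each interval has length $\epsilon/2^n$, so the first ten lengths add up to $\epsilon(1 - 2^{-10})$, which stays below $\epsilon$ however many terms are included.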

Exercise 7.6.6. See Exercise 4.6.8.

Exercise 7.6.7. See Exercises 4.6.9 and 4.6.10.

Exercise 7.6.8. See Exercise 4.6.7.

Exercise 7.6.9. Assume, for contradiction, that $f$ is not uniformly $\alpha$-continuous on $K$. This means that given $\delta_n = 1/n$, there must exist points $x_n, y_n \in K$ such that
$$|x_n - y_n| < \frac{1}{n} \quad \text{and} \quad |f(x_n) - f(y_n)| \ge \alpha.$$
Because $(x_n) \subseteq K$ and $K$ is compact, there exists a convergent subsequence $(x_{n_k})$. Set $x = \lim x_{n_k}$ and note that $x \in K$. If we consider the corresponding subsequence $(y_{n_k})$ we see that
$$\lim y_{n_k} = \lim [x_{n_k} + (y_{n_k} - x_{n_k})] = x + \lim (y_{n_k} - x_{n_k}) = x.$$
Because $(x_{n_k})$ and $(y_{n_k})$ both converge to $x$, it follows that given any $\delta > 0$ we can find points $x_{n_K}, y_{n_K} \in (x - \delta, x + \delta)$ satisfying $|f(x_{n_K}) - f(y_{n_K})| \ge \alpha$. But this contradicts the assumption that $f$ is $\alpha$-continuous at $x$, and we conclude that $f$ must be uniformly $\alpha$-continuous on $K$.

Exercise 7.6.10. See Exercise 3.3.8 (c) and (d).

Exercise 7.6.11. Because $D$ has measure zero, we know there exists a countable collection of open intervals $\{G_1, G_2, \ldots\}$ whose union contains $D$ and that satisfies
$$\sum_{n=1}^{\infty} |G_n| < \frac{\epsilon}{4M}. \tag{1}$$
But $D_\alpha \subseteq D$ is closed and hence compact. This means we can find a finite collection $\{G_1, \ldots, G_N\}$ that covers $D_\alpha$, and the inequality above in (1) is certainly true for this smaller set.

Exercise 7.6.12. If $x \in K$ then $x \notin D_\alpha$ and it follows that $f$ is $\alpha$-continuous at $x$. Because we are removing open intervals from $[a, b]$, we see that $K$ is a closed set (it is a finite union of closed intervals). Being a closed subset of $[a, b]$, $K$ is compact. By Exercise 7.6.9, $f$ is uniformly $\alpha$-continuous on $K$.

Exercise 7.6.13. As a first step in constructing $P_\epsilon$ we include the intervals from the open cover $\{G_1, G_2, \ldots, G_N\}$. Because $\sum_{n=1}^{N} |G_n| < \epsilon/(4M)$ the contribution of these subintervals to $U(f, P_\epsilon) - L(f, P_\epsilon)$ can be estimated by
$$\sum (M_k - m_k)\Delta x_k < (2M) \sum \Delta x_k \le (2M) \left( \frac{\epsilon}{4M} \right) = \frac{\epsilon}{2}.$$
Now consider the set $K = [a, b] \setminus \bigcup_{n=1}^{N} G_n$. The function $f$ is uniformly $\alpha$-continuous on $K$ and so there exists $\delta > 0$ such that $|f(x) - f(y)| < \alpha$ whenever $|x - y| < \delta$. To finish constructing the partition $P_\epsilon$ we take each interval in $K$ and subdivide it until all of the subintervals have length less than $\delta$. The implication here is that on each of these subintervals we get $M_k - m_k \le \alpha$. Thus, the contribution of all of the subintervals that make up $K$ is less than
$$\sum (M_k - m_k)\Delta x_k \le \alpha \sum \Delta x_k < \left( \frac{\epsilon}{2(b - a)} \right)(b - a) = \frac{\epsilon}{2}.$$
Altogether then we get
$$U(f, P_\epsilon) - L(f, P_\epsilon) < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon,$$

and it follows that $f$ is Riemann-integrable.

Exercise 7.6.14. (a) To produce a cover for $D_\alpha$, let $\{G_1, G_2, \ldots, G_N\}$ be the collection of closed intervals from the partition $P_\epsilon$ that contain points of $D_\alpha$. On each subinterval $G_k$, it follows that $M_k - m_k \ge \alpha$. This enables us to write
$$\alpha\epsilon > \sum_{k'} (M_{k'} - m_{k'})\Delta x_{k'} \ge \sum_{k=1}^{N} (M_k - m_k)|G_k| \ge \alpha \sum_{k=1}^{N} |G_k|.$$
What we immediately see is that $\sum_{k=1}^{N} |G_k| < \epsilon$. Now our definition of measure zero requires that our cover for $D_\alpha$ consist of open intervals. To remedy this, we can take each $G_k$ to be open and cover the finite number of endpoints we have lost with intervals chosen small enough to keep the sum less than $\epsilon$. (b) From (a) we see that $D_\alpha$ has measure zero. Using Exercise 7.6.7, we can argue that the set $D$ is a countable union of $D_\alpha$ sets. With a nod to Exercise 7.6.5, we conclude that $D$ has measure zero. An issue discussed in Exercise 7.6.5 is that the proof in the countable case requires a result about absolute convergence of double summations. To show that $D = \bigcup_{n=1}^{\infty} D_{1/n}$ has measure zero we can avoid this complication because the cover for each $D_{1/n}$ consists of a finite collection of open intervals. Thus, we have a double summation but one of the sums is finite and we can use the Algebraic Limit Theorem to justify the manipulations we need.

Exercise 7.6.15. (a) From the definition of the derivative we get $g'(0) = \lim_{x \to 0} g(x)/x$. For $x < 0$ we get $g(x)/x = 0$ and for $x > 0$ we get $g(x)/x = x \sin(1/x)$. In both cases we see $\lim_{x \to 0} g(x)/x = 0$, so $g'(0) = 0$. (b) The chain rule and product rule can be applied when $x > 0$ and this gives us the formula
$$g'(x) = \begin{cases} -\cos(1/x) + 2x \sin(1/x) & \text{if } x > 0 \\ 0 & \text{if } x \le 0. \end{cases}$$


(c) The $\cos(1/x)$ term in the formula for $g'(x)$ oscillates between $+1$ and $-1$ as $x \to 0$. Because the other term in this formula converges to zero, the net effect is that $g'(x)$ attains every value between $+1$ and $-1$ as $x \to 0$ from the right.

Exercise 7.6.16. (a) If $c \in C$ then $f_n(c) = 0$ for all $n \in \mathbf{N}$. It follows that $\lim_{n \to \infty} f_n(c) = 0$. (b) If $x \notin C$ then choose $N$ to be the smallest natural number for which $x \notin C_N$. Then, by its construction, $f_n(x) = f_N(x)$ for all $n \ge N$ and $\lim_{n \to \infty} f_n(x) = f_N(x)$.

Exercise 7.6.17. (a) If $x \notin C$ then, as in the previous exercise, there is a smallest natural number $N$ such that $x \notin C_N$. This means that $x$ is part of an open interval $O \subseteq C_N^c$ where
$$f(y) = f_N(y) \quad \text{for all } y \in O.$$
Because $f_N$ is differentiable everywhere and $O$ is open, we can be sure that $f$ is differentiable at $x$. (b) Fix $c \in C$ and let $x \in [0, 1]$ be arbitrary. If $x \in C$ then $f(x) = 0$ so $|f(x)| \le (x - c)^2$ is trivially true. If $x \notin C$, then either $f(x) = (x - c')^2 \sin(1/(x - c'))$ for some $c' \in C$ or, because of the "splicing together" process, we at least have $|f(x)| \le (x - c')^2$ where $c'$ is an endpoint of an interval that makes up some $C_n$. The point to emphasize is that there are no elements of $C$ between $x$ and $c'$, which means $|x - c'| \le |x - c|$ and consequently
$$|f(x)| \le (x - c')^2 \le (x - c)^2,$$
as desired. Turning our attention toward computing $f'(c)$, we now have
$$\left| \frac{f(x) - f(c)}{x - c} \right| = \left| \frac{f(x)}{x - c} \right| \le \frac{|x - c|^2}{|x - c|} = |x - c|,$$
from which it follows that
$$f'(c) = \lim_{x \to c} \frac{f(x) - f(c)}{x - c} = 0.$$
(c) Let $C_E$ consist of the countable set of points that appear as endpoints of the intervals that make up $C_1, C_2, C_3, \ldots$. The content of Exercise 7.6.15 is not only that $f'(x)$ fails to be continuous at each $c_E \in C_E$ but that $f'(x)$ attains every value between $1$ and $-1$ in every neighborhood of $c_E$. Given an arbitrary $c \in C$, an argument like the one in Exercise 3.4.3 shows that there exists a sequence $(c_n) \subseteq C_E$ with $c_n \to c$. Let $\delta > 0$ be arbitrary. Choose $N$ so that $|c_N - c| < \delta/2$ so that $V_{\delta/2}(c_N) \subseteq V_\delta(c)$. Because $c_N \in C_E$, we know that $f'$ attains every value between $1$ and $-1$ in the neighborhood $V_{\delta/2}(c_N)$ and therefore the same is true inside the neighborhood $V_\delta(c)$. Because $\delta$ was arbitrary, we conclude that $f'$ is not continuous at $c$.

Exercise 7.6.18. The set of discontinuities of $f'$ is precisely the Cantor set $C$. Because $C$ has measure zero (see Exercise 7.6.4), Lebesgue's Theorem (Theorem 7.6.5) implies $f'$ is Riemann-integrable.

Exercise 7.6.19. We start with the interval $[0, 1]$. To form $C_1$ we remove 1 interval of length $1/9$. To form $C_2$ we then remove two intervals of length $1/27$. In general, to form $C_n$ we remove $2^{n-1}$ intervals of length $1/3^{n+1}$. If we take the sum of the lengths of all of the intervals to be removed we get
$$\frac{1}{9} + 2\left( \frac{1}{27} \right) + 4\left( \frac{1}{81} \right) + \cdots = \frac{1/9}{1 - 2/3} = \frac{1}{3}.$$
This implies that the lengths $|C_1|, |C_2|, |C_3|, \ldots$ satisfy
$$\lim_{n \to \infty} |C_n| = 1 - \frac{1}{3} = \frac{2}{3}.$$
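The geometric series in Exercise 7.6.19 can be verified with exact partial sums. This is a small sketch in which the function name `removed_length` is hypothetical; it uses only the counts and lengths stated in the solution ($2^{n-1}$ intervals of length $1/3^{n+1}$ at stage $n$).

```python
from fractions import Fraction

def removed_length(stages):
    """Total length removed in stages 1..stages of the construction."""
    return sum(Fraction(2 ** (n - 1), 3 ** (n + 1)) for n in range(1, stages + 1))

# Length remaining after a given number of stages, i.e. |C_n|.
remaining = [1 - removed_length(n) for n in range(1, 6)]
```

The partial sums equal $\frac{1}{3}\left(1 - (2/3)^n\right)$, so the removed length increases to $1/3$ and the remaining length decreases to $2/3$.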

Chapter 8

Additional Topics

8.1

The Generalized Riemann Integral

Exercise 8.1.1. (a) For any tagged partition $(P, \{c_k\})$, it is certainly true that $m_k \le f(c_k) \le M_k$, and this is enough to conclude $L(f, P) \le R(f, P) \le U(f, P)$. The fact that $L(f, P) \le \int_a^b f \le U(f, P)$ follows from Definition 7.2.7. (b) Because $P'$ is a refinement of $P_\epsilon$, we can use Lemma 7.2.3 to argue
$$U(f, P') - L(f, P') \le U(f, P_\epsilon) - L(f, P_\epsilon) < \frac{\epsilon}{3}.$$

Exercise 8.1.2. This again follows from Lemma 7.2.3 and the fact that $P'$ is a refinement of $P$.

Exercise 8.1.3. (a) To form $P'$ from $P$ we added the points of $P_\epsilon$. This means adding $(n - 1)$ potentially new points to the interior of $[a, b]$. Now each new point adds two terms to $U(f, P')$ that do not appear in $U(f, P)$ and also creates one term in $U(f, P)$ that is no longer in $U(f, P')$. Thus, there can be at most $3(n - 1)$ terms of the form $M_k \Delta x_k$ that appear in one of $U(f, P')$ or $U(f, P)$ but not the other. (b) Because $P$ is assumed to be $\delta$-fine and $P \subseteq P'$, any term from either $U(f, P')$ or $U(f, P)$ can be estimated by
$$|M_k (x_k - x_{k-1})| \le M\delta = \frac{\epsilon}{9n}.$$
Using our conclusion from (a), we then get
$$U(f, P) - U(f, P') \le 3(n - 1) \frac{\epsilon}{9n} < \frac{\epsilon}{3}.$$


Exercise 8.1.4. (a) For each subinterval $[x_{k-1}, x_k]$ from a partition $P$, we use $M_k = \sup\{f(x) : x \in [x_{k-1}, x_k]\}$ to compute the upper sum. By the Extreme Value Theorem, there exist points $c_k \in [x_{k-1}, x_k]$ where $f(c_k) = M_k$. Using the set $\{c_k\}$ as our tags, it follows that $U(f, P) = R(f, P)$. (b) Assume our partition $P$ has $n$ subintervals. Using Lemma 1.3.7, we can pick points $c_k \in [x_{k-1}, x_k]$ so that
$$M_k - f(c_k) < \frac{\epsilon}{n \Delta x_k} \quad \text{for each } k \in \{1, \ldots, n\}.$$
Then
$$U(f, P) - R(f, P) = \sum_{k=1}^{n} (M_k - f(c_k))\Delta x_k < \sum_{k=1}^{n} \frac{\epsilon}{n \Delta x_k} \Delta x_k = \epsilon.$$

Exercise 8.1.5. We shall prove $f$ is integrable using the criterion in Theorem 7.2.8. Let $\epsilon > 0$. From our hypothesis we know that there exists a $\delta > 0$ such that
$$|R(f, P) - A| < \frac{\epsilon}{4} \tag{1}$$
for all $\delta$-fine partitions $P$ regardless of the choice of tags. So let $P_\epsilon$ be $\delta$-fine and use the previous exercise to pick tags $\{c_k\}$ so that
$$U(f, P_\epsilon) - R(f, P_\epsilon, \{c_k\}) < \frac{\epsilon}{4}.$$
Now we can also pick tags $\{d_k\}$ so that
$$R(f, P_\epsilon, \{d_k\}) - L(f, P_\epsilon) < \frac{\epsilon}{4},$$
and from (1) above it must be that
$$|R(f, P_\epsilon, \{c_k\}) - R(f, P_\epsilon, \{d_k\})| < \frac{\epsilon}{2}.$$
A triangle inequality argument then implies
$$U(f, P_\epsilon) - L(f, P_\epsilon) < \frac{\epsilon}{4} + \frac{\epsilon}{2} + \frac{\epsilon}{4} = \epsilon,$$
and we conclude that $f$ is integrable. A second implication from this string of inequalities is that $L(f, P_\epsilon) \le A \le U(f, P_\epsilon)$, from which we may conclude that $\int_a^b f = A$.

Exercise 8.1.6. (a) Take $P = \{0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 1\}$. The choice of tags does not matter because $\Delta x_k = 1/10 < 1/9 = \delta(c_k)$ for every choice of $c_k$. (b) One such partition could be
$$P = \left\{ 0, \frac{1}{5}, \frac{1}{4}, \frac{1}{3}, \frac{2}{5}, \frac{1}{2}, \frac{3}{5}, \frac{3}{4}, 1 \right\}.$$
For the tags we let $c_1 = 0$ on the first subinterval $[0, 1/5]$. For every other subinterval we take $c_k$ to be the right-hand endpoint: $c_2 = 1/4$, $c_3 = 1/3$ and so on.

Exercise 8.1.7. Assume, for contradiction, that this process does not terminate after a finite number of steps. Then we obtain a sequence of nested intervals $(I_n)$ satisfying $|I_n| \to 0$ and
$$\delta(x) \le |I_n| \quad \text{for all } x \in I_n.$$
From the Nested Interval Property we know that there exists a point $x_0 \in \bigcap_{n=1}^{\infty} I_n$. But then $\delta(x_0) \le |I_n|$ for all $n \in \mathbf{N}$, and it follows that $\delta(x_0) = 0$. Because this is not allowed in the definition of a gauge, we conclude that the algorithm does terminate after a finite number of steps and we obtain a $\delta(x)$-fine tagged partition.

Exercise 8.1.8. Let $\delta(x) = \min\{\delta_1(x), \delta_2(x)\}$. It is clear that $\delta(x) > 0$ so this function qualifies as a gauge. From Theorem 8.1.5, there exists a tagged partition $(P, \{c_k\})$ that is $\delta(x)$-fine and, consequently, it is also $\delta_1(x)$-fine and $\delta_2(x)$-fine. It follows that
$$|A_1 - A_2| \le |A_1 - R(f, P)| + |R(f, P) - A_2| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon,$$
and we conclude that $A_1 = A_2$.

Exercise 8.1.9. Looking at Theorem 8.1.2, we just observe that the constant $\delta$ can also serve as the gauge function $\delta = \delta(x)$ required in Definition 8.1.6.

Exercise 8.1.10. Let $(P, \{c_k\})$ be $\delta(x)$-fine. If $c_k \notin \mathbf{Q}$ then $g(c_k)\Delta x_k = 0$. If $c_k = r_{k'}$ for some $k'$, then
$$g(c_k)\Delta x_k = \Delta x_k < \delta(r_{k'}) = \frac{\epsilon}{2^{k'+1}}.$$
Because it is possible for $r_{k'}$ to be a tag in at most two subintervals, it follows that
$$\sum_{k=1}^{n} g(c_k)\Delta x_k < 2 \sum_{k'=1}^{\infty} \delta(r_{k'}) = 2\left( \frac{\epsilon}{2} \right) = \epsilon.$$
Thus $R(g, P) < \epsilon$ and it follows that $\int_0^1 g = 0$.

Exercise 8.1.11. This is due to the fact that we have a "telescoping" sum. Writing out the terms in the finite sum $\sum_{k=1}^{n} [F(x_k) - F(x_{k-1})]$, we can check that all of the summands cancel out except $F(x_n) = F(b)$ and $-F(x_0) = -F(a)$.

Exercise 8.1.12. We are assuming $F$ is differentiable with $F'(c) = f(c)$. This means that
$$f(c) = \lim_{x \to c} \frac{F(x) - F(c)}{x - c}.$$
The $\epsilon$–$\delta$ criterion for functional limits then asserts the existence of the $\delta(c) > 0$ described in the exercise.
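Looking back at Exercise 8.1.6 (a), the condition that a tagged partition is $\delta(x)$-fine is easy to test mechanically. The following is a minimal sketch under stated assumptions: the checker `is_fine` and the helper `constant_gauge` are hypothetical names, and "fine" is taken to mean that each tag lies in its subinterval and $\Delta x_k < \delta(c_k)$, as used in the solution.

```python
from fractions import Fraction

def is_fine(points, tags, gauge):
    """True if each tag c_k lies in [x_{k-1}, x_k] and x_k - x_{k-1} < gauge(c_k)."""
    if len(tags) != len(points) - 1:
        return False
    for k, c in enumerate(tags, start=1):
        left, right = points[k - 1], points[k]
        if not (left <= c <= right and right - left < gauge(c)):
            return False
    return True

def constant_gauge(x):
    """The constant gauge delta(x) = 1/9 from part (a)."""
    return Fraction(1, 9)

tenths = [Fraction(k, 10) for k in range(11)]   # the partition from part (a)
fifths = [Fraction(k, 5) for k in range(6)]     # a coarser partition, for contrast

fine_right_tags = is_fine(tenths, tenths[1:], constant_gauge)
fine_left_tags = is_fine(tenths, tenths[:-1], constant_gauge)
coarse_is_fine = is_fine(fifths, fifths[1:], constant_gauge)
```

The partition into tenths passes for either choice of endpoint tags, while the partition into fifths fails because $1/5 > 1/9$.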


Exercise 8.1.13. Let's first apply the result in Exercise 8.1.12 with $x = x_k$ and $c = c_k$. Because our tagged partition is assumed to be $\delta(c)$-fine we know that $(x_k - c_k) \le (x_k - x_{k-1}) < \delta(c_k)$ and so
$$\left| \frac{F(x_k) - F(c_k)}{x_k - c_k} - f(c_k) \right| < \epsilon.$$
Multiplying by the positive number $(x_k - c_k)$ gives the first requested inequality. To obtain the second one we again apply Exercise 8.1.12, this time with $x = x_{k-1}$ and $c = c_k$. An equivalent way to write these two inequalities is
$$-\epsilon(x_k - c_k) < F(x_k) - F(c_k) - f(c_k)(x_k - c_k) < \epsilon(x_k - c_k)$$
$$-\epsilon(c_k - x_{k-1}) < F(c_k) - F(x_{k-1}) - f(c_k)(c_k - x_{k-1}) < \epsilon(c_k - x_{k-1}),$$
and adding along the respective columns yields
$$-\epsilon \Delta x_k < F(x_k) - F(x_{k-1}) - f(c_k)\Delta x_k < \epsilon \Delta x_k.$$
Now this is equivalent to $|F(x_k) - F(x_{k-1}) - f(c_k)\Delta x_k| < \epsilon \Delta x_k$ and taking a sum over $k$ gives us
$$\sum_{k=1}^{n} |F(x_k) - F(x_{k-1}) - f(c_k)\Delta x_k| < \epsilon(b - a).$$
Looking back at the beginning of the proof in the text, we see that we have now derived the inequality requested in (2), albeit with $\epsilon(b - a)$ in place of $\epsilon$. This completes the proof.

Exercise 8.1.14. (a) One implication of Theorem 8.1.9 is that every derivative has a generalized Riemann integral. (b) A second implication of Theorem 8.1.9 is
$$\int_a^b (F \circ g)' = F(g(b)) - F(g(a)).$$
By the Chain Rule,
$$(F \circ g)'(x) = F'(g(x)) \cdot g'(x) = f(g(x)) \cdot g'(x) = ((f \circ g) \cdot g')(x),$$
which implies
$$\int_a^b (f \circ g) \cdot g' = F(g(b)) - F(g(a)).$$
(c) Because $f = F'$ on the interval $g([a, b])$, Theorem 8.1.9 implies that $f$ has generalized Riemann integral
$$\int_{g(a)}^{g(b)} f = F(g(b)) - F(g(a)).$$
Combining this with the last equation in (b) gives
$$\int_a^b (f \circ g) \cdot g' = \int_{g(a)}^{g(b)} f,$$
as desired.
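The change-of-variable formula of Exercise 8.1.14 can be sanity-checked numerically on one example. Everything specific here is an assumption chosen for illustration: $f = \cos$ (so $F = \sin$), $g(x) = x^2$ on $[0, 1]$, and a midpoint-rule helper named `midpoint`.

```python
import math

def midpoint(fn, a, b, n=20000):
    """Midpoint-rule approximation of the integral of fn over [a, b]."""
    h = (b - a) / n
    return h * sum(fn(a + (k + 0.5) * h) for k in range(n))

f = math.cos                      # f = F' with F = sin
g = lambda x: x * x               # the inner function g
g_prime = lambda x: 2.0 * x       # its derivative g'
a, b = 0.0, 1.0

lhs = midpoint(lambda x: f(g(x)) * g_prime(x), a, b)   # integral of (f o g) g' over [a, b]
rhs = midpoint(f, g(a), g(b))                          # integral of f over [g(a), g(b)]
exact = math.sin(g(b)) - math.sin(g(a))                # F(g(b)) - F(g(a))
```

Both approximations agree with $F(g(b)) - F(g(a)) = \sin 1$ to within the accuracy of the quadrature rule.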

8.2

Metric Spaces and the Baire Category Theorem

Exercise 8.2.1. (a) This is a metric. In fact this is the standard Euclidean distance function on $\mathbf{R}^2$. Conditions (i) and (ii) are straightforward. The most common way to prove (iii) is to introduce the scalar product from vector calculus. Squaring both sides of (iii) gives an equivalent inequality that can be derived using the so-called Schwarz inequality. An alternative proof can be derived by first considering the special case where the point $z$ falls on the line
$$l(t) = (x_1, x_2) + t(y_1 - x_1, y_2 - x_2), \quad t \in \mathbf{R}$$
through the points $x$ and $y$. In this case $z = l(t_0)$ for some $t_0 \in \mathbf{R}$ and it follows that $d(x, z) = |t_0| d(x, y)$ and $d(z, y) = |1 - t_0| d(x, y)$. Then the triangle inequality in $\mathbf{R}$ implies
$$d(x, y) = (t_0 + 1 - t_0)\, d(x, y) \le (|t_0| + |1 - t_0|)\, d(x, y) = d(x, z) + d(z, y).$$
To prove the general case, we let $z \in \mathbf{R}^2$ be arbitrary, and pick $z_t$ to be the point on the line $l(t)$ such that the line through $z$ and $z_t$ is perpendicular to $l(t)$. Because $x$ and $y$ are both on the line $l(t)$, we can use the Pythagorean Theorem to show that $d(x, z_t) \le d(x, z)$ and $d(y, z_t) \le d(y, z)$. Applying the previous result about collinear points we get
$$d(x, y) \le d(x, z_t) + d(z_t, y) \le d(x, z) + d(z, y).$$
(b) This is a metric. Again, conditions (i) and (ii) can be easily verified. For (iii), we must consider 5 distinct cases. First, suppose that $x$, $y$ and $z$ are all distinct. Then $d(x, y) = 1 < 2 = d(x, z) + d(z, y)$. If $x = y \ne z$, then $d(x, y) = 0 < 2 = d(x, z) + d(z, y)$. If $x \ne y = z$, then $d(x, y) = 1 \le 1 = d(x, z) + d(z, y)$, which is identical to the case where $y \ne x = z$. Finally, if $x = y = z$, then $d(x, y) = 0 \le 0 = d(x, z) + d(z, y)$. Thus the triangle inequality holds for all possible scenarios.

(c) This is a metric. It is clear that $d(x, y) \ge 0$. Also, if $\max\{|x_1 - y_1|, |x_2 - y_2|\} = 0$, then $|x_1 - y_1| = 0$ and $|x_2 - y_2| = 0$. But this is true if and only if $x_1 = y_1$ and $x_2 = y_2$. This proves (i). Because $|x_i - y_i| = |y_i - x_i|$, condition (ii) holds. For (iii), consider the case where $\max\{|x_1 - y_1|, |x_2 - y_2|\} = |x_1 - y_1|$. The triangle inequality from $\mathbf{R}$ implies $|x_1 - y_1| \le |x_1 - z_1| + |z_1 - y_1|$. Because $|x_1 - z_1| \le d(x, z)$ and $|z_1 - y_1| \le d(z, y)$, it follows that
$$|x_1 - y_1| \le d(x, z) + d(z, y),$$
and a similar argument works in the other case.

(d) This is not a metric, for it fails conditions (i) and (iii). This example fails (i) because we can have $d(x, y) = 0$ where $x \ne y$. For instance, let $x = (1, -1)$ and let $y = (1, 1)$. Then $d(x, y) = |1(-1) + 1(1)| = 0$, but $x_2 \ne y_2$, so $x \ne y$. Part (iii) also does not hold in general. Consider $x = (1, -1)$, $y = (4, -1)$, and $z = (1, 1)$. Then $d(x, y) = 5 > 3 = d(x, z) + d(z, y)$, which violates the triangle inequality.

Exercise 8.2.2. (a) This is a metric. Clearly $d(f, g) \ge 0$ and $\sup\{|f(x) - g(x)|\} = 0$ if and only if $f(x) = g(x)$ for all $x \in [0, 1]$. We also have that $|f(x) - g(x)| = |g(x) - f(x)|$, so condition (ii) holds. For (iii), we want to show that
$$\sup\{|f(x) - g(x)|\} \le \sup\{|f(x) - h(x)|\} + \sup\{|h(x) - g(x)|\}.$$
Because $|f - g|$ is a continuous function on the compact set $[0, 1]$, the Extreme Value Theorem asserts that there exists an $x_0 \in [0, 1]$ where $|f(x_0) - g(x_0)|$ is maximum. It follows that
$$|f(x_0) - g(x_0)| = |f(x_0) - h(x_0) + h(x_0) - g(x_0)| \le |f(x_0) - h(x_0)| + |h(x_0) - g(x_0)| \le \sup\{|f(x) - h(x)|\} + \sup\{|h(x) - g(x)|\}.$$
Hence (iii) is true and we have a metric. (b) This is not a metric. It fails condition (i), because it is possible for $f(1) - g(1) = 0$ when $f \ne g$. For instance, let $f(x) = 2x - 1$ and let $g(x) = 3x - 2$. Then $f(1) - g(1) = 0$ but $f(x) \ne g(x)$.
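The two failed candidates above can be checked directly on the sample points used in the solutions. This sketch rests on assumptions that should be read as such: the formula $d(x, y) = |x_1 x_2 + y_1 y_2|$ for Exercise 8.2.1 (d) is inferred from the computation $|1(-1) + 1(1)|$ in the solution, the formula $d(f, g) = |f(1) - g(1)|$ for Exercise 8.2.2 (b) is inferred from its counterexample, and the function names are hypothetical.

```python
def d_product(x, y):
    """Candidate from Exercise 8.2.1 (d), assumed to be |x1*x2 + y1*y2|."""
    return abs(x[0] * x[1] + y[0] * y[1])

def d_at_one(f, g):
    """Candidate from Exercise 8.2.2 (b), assumed to be |f(1) - g(1)|."""
    return abs(f(1) - g(1))

# Exercise 8.2.1 (d): distinct points at distance zero, and a failed triangle inequality.
x, y, z = (1, -1), (4, -1), (1, 1)
zero_distance = d_product((1, -1), (1, 1))
direct = d_product(x, y)
detour = d_product(x, z) + d_product(z, y)

# Exercise 8.2.2 (b): two different functions at distance zero.
f = lambda t: 2 * t - 1
g = lambda t: 3 * t - 2
function_distance = d_at_one(f, g)
```

Under the assumed formula, the direct distance $d(x, y)$ exceeds the detour $d(x, z) + d(z, y)$, and both candidates assign distance zero to a pair of distinct elements.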


(c) This is a metric. We can immediately verify that $\int_0^1 |f - g| \ge 0$. That $\int_0^1 |f - g| = 0$ implies $f = g$ is a consequence of the fact that $|f - g|$ is non-negative and continuous. The details of this argument are contained in Exercise 7.4.4 (c). This shows condition (i) holds. Clearly $|f - g| = |g - f|$, so (ii) is true as well. For (iii), we know that
$$|f - g| = |f - h + h - g| \le |f - h| + |h - g|.$$
It then follows from Theorem 7.4.2 (i) and (iv) that
$$\int_0^1 |f - g| \le \int_0^1 |f - h| + \int_0^1 |h - g|.$$
Thus (iii) holds, and we have a metric.

Exercise 8.2.3. See Exercise 8.2.1 (b). In this exercise, we did not use the fact that $X = \mathbf{R}^2$. Hence our argument holds for any set $X$.

Exercise 8.2.4. Let $(X, d)$ be a metric space and let $(x_n) \subseteq X$ converge to $x \in X$. Given $\epsilon > 0$, there exists an $N$ such that $d(x_n, x) < \epsilon/2$ whenever $n \ge N$. Now if $n, m \ge N$ we can use the triangle inequality to write
$$d(x_m, x_n) \le d(x_m, x) + d(x, x_n) < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$
Thus $(x_n)$ is a Cauchy sequence.

Exercise 8.2.5. (a) By considering values of $\epsilon$ less than one, we can show that Cauchy sequences in this metric space are eventually constant sequences. Because such a sequence converges (to this constant value), $\mathbf{R}^2$ is complete with respect to this metric. (b) Assume that $(f_n)$ is a Cauchy sequence in the metric of Exercise 8.2.2 (a). Then given $\epsilon > 0$, there exists an $N$ such that $d(f_m, f_n) < \epsilon$ for all $m, n \ge N$. This implies that
$$|f_m(x) - f_n(x)| < \epsilon \quad \text{for all } m, n \ge N \text{ and } x \in [0, 1].$$
Thus $(f_n)$ converges uniformly according to the Cauchy Criterion for Uniform Convergence (Theorem 6.2.5). What is really happening here is that convergence with respect to this metric is equivalent to uniform convergence on $[0, 1]$. If $f = \lim_{n \to \infty} f_n$ uniformly, then $f$ is continuous by Theorem 6.2.6. Hence $f$ is an element of $C[0, 1]$ and the metric is complete. (c) Let's start with a Cauchy sequence $(f_n)$ in $C^1[0, 1]$. In (b) we saw that there exists $f \in C[0, 1]$ such that $f_n \to f$ uniformly, but it does not have to be the case that $f \in C^1[0, 1]$. A counterexample appears in Example 6.2.2 (iii). (d) Convergent sequences in the discrete metric are eventually constant sequences.

Figure 8.1: Sketch of $h_\delta(x)$.

Exercise 8.2.6. (a) Let $\epsilon > 0$. We need to find a $\delta > 0$ such that
$$|g(f) - g(h)| < \epsilon \quad \text{whenever } d(f, h) < \delta.$$
Because $k \in C[0, 1]$, there exists a constant $K > 0$ satisfying $|k(x)| \le K$ for all $x \in [0, 1]$. Now the properties of the Riemann integral allow us to write
$$|g(f) - g(h)| = \left| \int_0^1 fk - \int_0^1 hk \right| = \left| \int_0^1 (f - h)k \right| \le \int_0^1 |f - h||k| \le K \int_0^1 |f - h|.$$
Now pick $\delta = \epsilon/K$. Then $d(f, h) < \delta$ implies
$$|g(f) - g(h)| \le K \int_0^1 |f - h| < K \int_0^1 \frac{\epsilon}{K} = \epsilon,$$
and $g$ is continuous on $C[0, 1]$. (b) Let $\epsilon > 0$. We need to find a $\delta > 0$ such that $d(f, h) < \delta$ implies $|g(f) - g(h)| < \epsilon$. In this case it works to take $\delta = \epsilon$. To see why, note that if $d(f, h) < \epsilon$ then
$$|g(f) - g(h)| = |f(1/2) - h(1/2)| \le \sup\{|f(x) - h(x)| : x \in [0, 1]\} = d(f, h) < \epsilon,$$
as desired. (c) Let $f$ be the zero function and for small $\delta > 0$ let $h_\delta$ be the function pictured in Figure 8.1. Note that $h_\delta(x) = 0$ on $[0, 1/2 - \delta]$ and $[1/2 + \delta, 1]$. On $(1/2 - \delta, 1/2 + \delta)$ define $h_\delta$ to be the piecewise linear "tent" satisfying $h_\delta(1/2) = 1$. Then $d(f, h_\delta) = \int_0^1 |h_\delta| = \delta$. Now observe that for all $\delta > 0$ we have $|f(1/2) - h_\delta(1/2)| = 1$. Given $\epsilon_0 = 1/2$, for instance, the functions $h_\delta$ can be chosen arbitrarily close to $f$ and still satisfy $|g(f) - g(h_\delta)| \ge \epsilon_0$. Thus $g$ is not continuous at $f$, and a similar argument shows it is not continuous at any other point.

Exercise 8.2.7. (a) The $\epsilon$-neighborhoods of the metric in (a) are discs with center $x$ and radius $\epsilon$. For the metric in (b), $V_\epsilon(x) = \mathbf{R}^2$ if $\epsilon > 1$. If $\epsilon \le 1$, then $V_\epsilon(x)$ is a singleton point. The metric in part (c) has $\epsilon$-neighborhoods that form a square with sides of length $2\epsilon$ and $x$ in the center. In the discrete metric, $V_\epsilon(x) = X$ if $\epsilon > 1$. If $\epsilon \le 1$, then $V_\epsilon(x)$ is a singleton point. (b) Using the discrete metric in $\mathbf{R}$, $V_\epsilon(x)$ is the entire real line if $\epsilon > 1$. If $\epsilon \le 1$ then $V_\epsilon(x)$ is a singleton point.

Exercise 8.2.8. (a) Let $a \in V_\epsilon(x)$. We want to show that there exists an $\epsilon' > 0$ such that $V_{\epsilon'}(a) \subseteq V_\epsilon(x)$. According to Definition 8.2.6, $d(x, a) < \epsilon$. Let $\epsilon' = \epsilon - d(x, a)$. If $b \in V_{\epsilon'}(a)$ then $d(a, b) < \epsilon'$ and the triangle inequality implies
$$d(x, b) \le d(x, a) + d(a, b) < d(x, a) + \epsilon' = \epsilon.$$
This implies that $b \in V_\epsilon(x)$, so $V_{\epsilon'}(a) \subseteq V_\epsilon(x)$. Hence $V_\epsilon(x)$ is open. The set $C_\epsilon(x)$ is closed. Assume that $y$ is a limit point of $C_\epsilon(x)$. If $\delta > 0$, then $V_\delta(y)$ intersects $C_\epsilon(x)$ at a point $z \ne y$. So $d(x, z) \le \epsilon$ and $d(z, y) < \delta$. By the triangle inequality,
$$d(x, y) \le d(x, z) + d(z, y) \le \epsilon + \delta.$$
Because $\delta > 0$ is arbitrary, it must be that $d(x, y) \le \epsilon$. Therefore $y \in C_\epsilon(x)$ and thus $C_\epsilon(x)$ is closed. (b) Let $h$ be a limit point of $Y$, and let $\epsilon > 0$ be arbitrary. Then $V_\epsilon(h)$ intersects $Y$ at a point $g \ne h$. So $d(g, h) < \epsilon$, and $g \in Y$ meaning $|g(x)| \le 1$ for all $x \in [0, 1]$. It follows that
$$|h(x)| = |h(x) - g(x) + g(x)| \le |h(x) - g(x)| + |g(x)| < \epsilon + 1.$$
Because $\epsilon > 0$ is arbitrary, $|h(x)| \le 1$ for all $x \in [0, 1]$ and thus $h \in Y$. This proves $Y$ is closed. (c) This set is closed. Suppose that $g$ is a limit point of $T$. Then, given $\epsilon > 0$, we know $V_\epsilon(g)$ intersects $T$ at a point $h \ne g$. So $|g(0) - h(0)| \le d(g, h) < \epsilon$.


But $h \in T$ so $h(0) = 0$ and this implies $|g(0)| < \epsilon$. Because $\epsilon$ is arbitrary, we conclude that $g(0) = 0$, and hence $g \in T$. Thus $T$ is closed.

Exercise 8.2.9. (a) A subset $K$ of a metric space $(X, d)$ is bounded if there exists $M > 0$ and $x \in X$ such that $d(x, k) \le M$ for all $k \in K$. (b) First let's prove that $K$ is bounded. Assume, for contradiction, that $K$ is not bounded. Our goal is to produce a sequence in $K$ that does not have a convergent subsequence. Because $K$ is not bounded, there must exist elements $x_1, x_2 \in K$ satisfying $d(x_1, x_2) \ge 1$. Having picked $x_2$, there exists an element $x_3 \in K$ such that $d(x_2, x_3) \ge 2$. In general, given $x_n \in K$, we can pick $x_{n+1} \in K$ satisfying $d(x_n, x_{n+1}) \ge 2n$. An extended triangle inequality argument shows that it must be that $d(x_n, x_m) \ge 1$ for all $m \ne n$. Now, because $K$ is assumed to be compact, $(x_n)$ has a convergent subsequence $(x_{n_k})$. But the elements of the subsequence $(x_{n_k})$ must necessarily satisfy $d(x_{n_k}, x_{n_{k'}}) \ge 1$, and consequently $(x_{n_k})$ is not Cauchy and cannot converge. This contradiction proves $K$ is bounded. To show that $K$ is closed let $x$ be an arbitrary limit point of $K$. For each $\delta_n = 1/n$, the neighborhood $V_{\delta_n}(x)$ intersects $K$ so we can choose $x_n \in K \cap V_{\delta_n}(x)$. It follows that $(x_n) \to x$ and $(x_n) \subseteq K$. By compactness, there is a subsequence $(x_{n_k}) \to y$ with $y \in K$. But every subsequence of a convergent sequence converges to the same limit, so $y = x$ which implies $x \in K$. Thus, $K$ is closed. (c) We have already shown that $Y$ is closed, and the fact that $d(f, 0) \le 1$ for all $f \in Y$ shows that $Y$ is bounded. To see that $Y$ is not compact, consider the sequence $f_n(x) = x^n$ and check that $f_n \in Y$. Now the pointwise limit $f(x) = \lim f_n(x)$ is not continuous and every subsequence of $(f_n)$ will necessarily converge pointwise to $f \notin C[0, 1]$. Because convergence in the metric space $C[0, 1]$ means uniform convergence and uniform limits of continuous functions are continuous, we see there is no way to find a convergent subsequence of $(f_n)$.
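The failure of compactness in Exercise 8.2.9 (c) can also be seen from distances between terms of the sequence $f_n(x) = x^n$. As a supplementary observation (not taken from the solution), substituting $t = x^n$ gives $\sup_{x \in [0,1]} |x^n - x^{2n}| = \max_{t \in [0,1]} (t - t^2) = 1/4$ for every $n$. The sketch below approximates this supremum on a grid; the helper name `sup_distance` and the grid size are assumptions.

```python
def sup_distance(n, m, grid=10001):
    """Grid approximation (a lower bound) of sup over [0, 1] of |x**n - x**m|."""
    step = 1.0 / (grid - 1)
    return max(abs((k * step) ** n - (k * step) ** m) for k in range(grid))

# Distances d(f_n, f_2n) in the sup metric for a few values of n.
distances = {n: sup_distance(n, 2 * n) for n in (1, 5, 20)}
```

The distances stay near $1/4$ instead of shrinking, so $(f_n)$ is not a Cauchy sequence in the sup metric, in line with the conclusion that no subsequence converges in $C[0, 1]$.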
Exercise 8.2.10. (a) (⇒) Assume that E is closed. Then E contains its limit points, so E ∪ L = E, where L is the set of limit points of E. Therefore E = E. (⇐) Now assume that E = E. Then E = E ∪ L, so E contains its limit points and hence it is closed. (⇒) Assume that E is open. Then for each x ∈ E, there exists an ² > 0 such that V² (x) ⊆ E. Hence E = E ◦ . (⇐) For the other direction, if E ◦ = E, then for each x ∈ E, V² (x) ⊆ E. Hence E is open. c

(b) Let $x \in (\overline{E})^c$. Then $x \notin \overline{E}$. Hence there exists $\epsilon > 0$ such that $V_\epsilon(x)$ does not intersect $E$, so $V_\epsilon(x) \subseteq E^c$. By Definition 8.2.8, $x \in (E^c)^\circ$. This shows $(\overline{E})^c \subseteq (E^c)^\circ$. To prove the other inclusion let $x \in (E^c)^\circ$. Then there exists $V_\epsilon(x) \subseteq E^c$. So $x \notin \overline{E}$, and hence $x \in (\overline{E})^c$. Thus $(E^c)^\circ \subseteq (\overline{E})^c$ and $(\overline{E})^c = (E^c)^\circ$.

To prove the second statement let $x \in (E^\circ)^c$. Then $x \notin E^\circ$, so every $V_\epsilon(x)$ fails to be contained in $E$. Thus every $V_\epsilon(x)$ intersects $E^c$, and therefore $x \in \overline{E^c}$. This shows $(E^\circ)^c \subseteq \overline{E^c}$. Now assume $x \in \overline{E^c}$. Then every $V_\epsilon(x)$ intersects $E^c$, and so $V_\epsilon(x)$ is not contained in $E$. Thus $x \notin E^\circ$, implying $x \in (E^\circ)^c$. This proves $\overline{E^c} \subseteq (E^\circ)^c$ and hence we have $(E^\circ)^c = \overline{E^c}$.

Exercise 8.2.11. Set $\epsilon = 1$ and consider the discrete metric on an arbitrary space $X$ consisting of at least two points. If we fix $x \in X$, then $V_\epsilon(x)$ is just the singleton set $\{x\}$. Because this set has no limit points we also get $\overline{V_\epsilon(x)} = \{x\}$. On the other hand, the set $\{y \in X : d(x, y) \le 1\}$ is the entire space $X$.

Note that an important consequence of Exercise 8.2.8 (a) is that we always have the inclusion $\overline{V_\epsilon(x)} \subseteq \{y \in X : d(x, y) \le \epsilon\}$. This fact is used implicitly in the proof of Theorem 8.2.10.

Exercise 8.2.12. ($\Rightarrow$) Let $E$ be nowhere-dense in $X$. Then $(\overline{E})^\circ$ is empty. This means that given $x \in \overline{E}$, every $V_\epsilon(x)$ intersects $(\overline{E})^c$. So $x$ is a limit point of $(\overline{E})^c$. It follows that $\overline{(\overline{E})^c} = X$, and hence $(\overline{E})^c$ is dense.

($\Leftarrow$) Now assume that $(\overline{E})^c$ is dense. Then $\overline{(\overline{E})^c} = X$. So every point $x \in X$ is either an element of $(\overline{E})^c$ or a limit point of $(\overline{E})^c$. This implies that for all $\epsilon > 0$, $V_\epsilon(x)$ is not contained in $\overline{E}$, which means that $(\overline{E})^\circ$ is empty. Hence $E$ is nowhere dense.

Exercise 8.2.13. (a) Pick $x_1 \in O_1$. Because $O_1$ is open, there exists an $\epsilon_1 > 0$ such that $V_{\epsilon_1}(x_1) \subseteq O_1$. Since $O_2$ is dense, there exists an $x_2 \in V_{\epsilon_1}(x_1) \cap O_2$. We also have that $V_{\epsilon_1}(x_1) \cap O_2$ is open, so there exists an $\epsilon_2 > 0$ such that $V_{\epsilon_2}(x_2) \subseteq V_{\epsilon_1}(x_1) \cap O_2$, and let's also insist that $\epsilon_2$ satisfy $\epsilon_2 < \epsilon_1/2$. Now certainly $V_{\epsilon_2}(x_2) \subseteq O_2$, but we want $\overline{V_{\epsilon_2}(x_2)} \subseteq V_{\epsilon_1}(x_1)$ to be true. By shrinking $\epsilon_2$ we can ensure that the closure of $V_{\epsilon_2}(x_2)$ is contained in $V_{\epsilon_1}(x_1)$. (The result in Exercise 8.2.8 (a) and the discussion in the solution of Exercise 8.2.11 contain the justification for this last claim.)

(b) In general, following (a) we can produce a sequence $(x_n)$ with
$$\overline{V_{\epsilon_{n+1}}(x_{n+1})} \subseteq V_{\epsilon_n}(x_n) \subseteq O_n \quad \text{where } \epsilon_{n+1} < \epsilon_1/2^n.$$
This last condition on $(\epsilon_n)$ ensures that $(x_n)$ is a Cauchy sequence, and so $x = \lim_{n \to \infty} x_n$ exists because our space is complete. For each $m \in \mathbf{N}$, our sequence $(x_n)$ is eventually contained in the set $\overline{V_{\epsilon_{m+1}}(x_{m+1})} \subseteq O_m$. It follows that $x \in O_m$ and the intersection $\bigcap_{m=1}^\infty O_m$ is not empty.

Exercise 8.2.14. If $E$ is nowhere-dense in $X$, then $(\overline{E})^c$ is dense. Although we have not explicitly proved it to this point, we can also show that the complement of a closed set (such as $\overline{E}$) is open. Now suppose that $\{E_n\}$ is a collection of nowhere dense sets and assume, for contradiction, that $X = \bigcup_{n=1}^\infty E_n$. Then certainly it is true that $X = \bigcup_{n=1}^\infty \overline{E_n}$. By De Morgan's Law, this implies that $\bigcap_{n=1}^\infty (\overline{E_n})^c$ is empty. But since each $(\overline{E_n})^c$ is dense and open, this intersection is not empty by Theorem 8.2.10, so we have reached a contradiction.


Chapter 8. Additional Topics

Exercise 8.2.15. Assume $f$ is differentiable at $x$ so that
$$f'(x) = \lim_{t \to x} \frac{f(x) - f(t)}{x - t}.$$
Choose $n > |f'(x)|$. Applying the definition of functional limits with $\epsilon_0 = n - |f'(x)|$, it follows that there exists a $\delta > 0$ such that
$$\left| \frac{f(x) - f(t)}{x - t} - f'(x) \right| < \epsilon_0 \quad \text{whenever } 0 < |x - t| < \delta.$$
Now choose $m$ large enough so that $1/m < \delta$. Then we can show that $0 < |x - t| < 1/m$ implies
$$\left| \frac{f(x) - f(t)}{x - t} \right| \le n,$$
and we conclude that $f \in A_{m,n}$.

Exercise 8.2.16. (a) The sequence $(x_k)$ is contained in $[0, 1]$, and so the Bolzano–Weierstrass Theorem can be applied to argue that there is a convergent subsequence.

(b) Let $\epsilon > 0$. Because $f_{k_l} \to f$ uniformly, we can pick $L_1$ so that $l \ge L_1$ implies $|f_{k_l}(y) - f(y)| < \epsilon/2$ for all $y \in [0, 1]$. Now the limit function $f$ is continuous at $x$, and so there exists a $\delta > 0$ such that
$$|f(x_{k_l}) - f(x)| < \frac{\epsilon}{2} \quad \text{whenever } |x_{k_l} - x| < \delta.$$
Because $x_{k_l} \to x$, we can pick $L_2$ so that $|x_{k_l} - x| < \delta$ for all $l \ge L_2$. Finally, set $L = \max\{L_1, L_2\}$. Then $l \ge L$ implies
$$|f_{k_l}(x_{k_l}) - f(x)| \le |f_{k_l}(x_{k_l}) - f(x_{k_l})| + |f(x_{k_l}) - f(x)| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$

(c) If $t$ satisfies $0 < |x - t| < 1/m$, then because $x_{k_l} \to x$ there exists an $L$ such that $0 < |x_{k_l} - t| < 1/m$ for all $l \ge L$. In this case we have
$$\left| \frac{f_{k_l}(x_{k_l}) - f_{k_l}(t)}{x_{k_l} - t} \right| \le n.$$
Now taking the limit as $l \to \infty$ and using (b) together with the Algebraic Limit Theorem and the Order Limit Theorem gives
$$\left| \frac{f(x) - f(t)}{x - t} \right| \le n.$$
The conclusion is that $f \in A_{m,n}$, meaning that $A_{m,n}$ contains its limit points and thus is closed.


Exercise 8.2.17. (a) Because $f$ is continuous on $[0, 1]$, it is uniformly continuous. Thus, given $\epsilon > 0$, there exists $\delta > 0$ such that

(1) $|f(x) - f(y)| < \epsilon/4$ whenever $|x - y| < \delta$.

Now let $\{0 = x_0 < x_1 < \cdots < x_n = 1\}$ be a partition of $[0, 1]$ where every subinterval satisfies $x_k - x_{k-1} < \delta$. Our function $p$ is going to satisfy $p(x_k) = f(x_k)$ for all $k = 0, 1, \ldots, n$. On each subinterval $[x_{k-1}, x_k]$ we define $p(x)$ to be the line segment connecting the endpoints $(x_{k-1}, f(x_{k-1}))$ and $(x_k, f(x_k))$. It's straightforward to check that $p$ is piecewise linear and continuous. Also, given a point $x \in [x_{k-1}, x_k]$, statement (1) above implies
$$|f(x) - p(x)| \le |f(x) - f(x_k)| + |f(x_k) - p(x)| < \frac{\epsilon}{4} + \frac{\epsilon}{4} = \frac{\epsilon}{2}.$$
It follows that $\|f - p\|_\infty < \epsilon/2$.

(b) Assume $|h(x)| \le 1$ for all $x \in [0, 1]$. Then
$$|f(x) - g(x)| = \left| f(x) - p(x) - \frac{\epsilon}{2} h(x) \right| \le |f(x) - p(x)| + \frac{\epsilon}{2} |h(x)| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$
It follows that $d(f, g) < \epsilon$ and thus $g \in V_\epsilon(f)$.

(c) Because $p$ is piecewise linear, we can let $M$ be the maximum of the absolute values of the slopes of the segments that make up $p$. Now consider the sawtooth function $h(x)$ from Section 5.4 and sketched in Figure 5.6. For any choice of $N \in \mathbf{N}$, the function
$$g_N(x) = p(x) + \frac{\epsilon}{2} h(Nx)$$
is continuous, piecewise linear and, by part (b), falls in the $\epsilon$-neighborhood $V_\epsilon(f)$. Now if we choose $N > 2(n + M)/\epsilon$, we can argue that every line segment that makes up $g_N$ has slope greater than $n$ in absolute value. The result of this is that $g_N \notin A_{m,n}$ and consequently $V_\epsilon(f)$ is not contained in $A_{m,n}$. Because $\epsilon$ and $f$ were arbitrary, it follows that $A_{m,n}$ has no interior points and thus it is nowhere dense. We conclude that $D$ is a subset of the countable union of the nowhere dense sets $\{A_{m,n}\}$ and thus $D$ is a set of first category in $C[0, 1]$.
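The construction in parts (a) through (c) can be traced on a concrete example. The sketch below is illustrative only: the sample function `f`, the values of `eps`, `n`, and `parts`, and the formula used for the sawtooth (assumed here to be $|x|$ on $[-1, 1]$ extended with period 2) are all assumptions, not taken from the text. It builds the interpolant $p$, picks $N > 2(n + M)/\epsilon$, and checks that $g_N$ stays within $\epsilon$ of $f$ while every segment has slope exceeding $n$ in absolute value.

```python
import math

def f(x):
    # Hypothetical sample function in C[0, 1]; |f(x) - f(y)| <= 2*pi*|x - y|.
    return math.sin(2 * math.pi * x)

def sawtooth(x):
    # Assumed form of the sawtooth h: |x| on [-1, 1], extended with period 2.
    return abs(x - 2 * round(x / 2))

# 1/64 < eps / (8*pi), so statement (1) holds on each subinterval.
eps, n, parts = 0.5, 5, 64
nodes = [k / parts for k in range(parts + 1)]
slopes = [(f(nodes[k + 1]) - f(nodes[k])) * parts for k in range(parts)]
M = max(abs(s) for s in slopes)

def p(x):
    # Piecewise linear interpolant of f at the partition points.
    k = min(int(x * parts), parts - 1)
    return f(nodes[k]) + slopes[k] * (x - nodes[k])

N = int(2 * (n + M) / eps) + 1   # smallest integer N > 2(n + M)/eps

def g(x):
    return p(x) + (eps / 2) * sawtooth(N * x)

grid = [i / 5000 for i in range(5001)]
sup_dist = max(abs(f(x) - g(x)) for x in grid)

# Each segment of g has slope (slope of p) +/- eps*N/2, so its absolute
# value is at least eps*N/2 - M.
slope_bound = eps * N / 2 - M
print(N, sup_dist < eps, slope_bound > n)
```

The slope check is done analytically rather than by finite differences, since a grid step can straddle a corner of the sawtooth.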

8.3 Fourier Series

Exercise 8.3.1. (a) Taking partial derivatives yields
$$\frac{\partial^2 u}{\partial x^2} = -b_n \sin(nx) \cos(nt) \cdot n^2 = \frac{\partial^2 u}{\partial t^2}.$$
Also $u(0, t) = b_n \sin(0) \cos(nt) = 0$ and $u(\pi, t) = b_n \sin(\pi n) \cos(nt) = 0$. Note that this second statement requires that $n$ be an integer. Finally,
$$\frac{\partial u}{\partial t} = -b_n \sin(nx) \sin(nt) \cdot n,$$
and setting $t = 0$ gives $\frac{\partial u}{\partial t}(x, 0) = 0$.

(b) The derivative is a linear transformation, meaning that the derivative of a sum of functions is the sum of the derivatives of each one. This property makes (1) and (3) true for a sum of solutions, and (2) is easy to check as well.

Exercise 8.3.2. (a)
$$\int_{-\pi}^{\pi} \cos(nx)\,dx = \left. \frac{1}{n} \sin(nx) \right|_{-\pi}^{\pi} = 0.$$

(b) Using a trigonometric identity for $\cos^2 \theta$ we get
$$\int_{-\pi}^{\pi} \cos^2(nx)\,dx = \left. \left( \frac{x}{2} + \frac{1}{4n} \sin(2nx) \right) \right|_{-\pi}^{\pi} = \left( \frac{\pi}{2} + 0 \right) - \left( \frac{-\pi}{2} + 0 \right) = \pi.$$

(c) Using a trigonometric identity for $\cos\theta \sin\beta$ we get
$$\int_{-\pi}^{\pi} \cos(mx) \sin(nx)\,dx = \left. \left( -\frac{\cos((n - m)x)}{2(n - m)} - \frac{\cos((n + m)x)}{2(n + m)} \right) \right|_{-\pi}^{\pi} = 0,$$
where the zero occurs because the cosine function is even and gives the same value at $x = \pi$ and $x = -\pi$. The other integrals in (a), (b) and (c) can be done in a similar fashion.

Exercise 8.3.3. Start with equation (6) in the text and multiply each side of this equation by $\cos(mx)$ to get
$$f(x) \cos(mx) = a_0 \cos(mx) + \sum_{n=1}^{\infty} a_n \cos(nx) \cos(mx) + b_n \sin(nx) \cos(mx).$$
Now take the integral of each side of this equation from $-\pi$ to $\pi$ and, as before, distribute the integral through the infinite sum. Using Exercise 8.3.2, we see that for $a_0$ and for every value of $n \in \mathbf{N}$ we get an integral that equals zero except the one where $n = m$. When $n = m$ we get
$$\int_{-\pi}^{\pi} a_m \cos^2(mx)\,dx = a_m \pi,$$
and it follows that
$$\int_{-\pi}^{\pi} f(x) \cos(mx)\,dx = a_m \pi.$$
The formula for $a_m$ is immediate. To get the formula for $b_m$ we multiply across equation (6) by $\sin(mx)$ and follow the same procedure.
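As a sanity check on the orthogonality integrals and the resulting coefficient formulas, one can integrate numerically. This is a minimal sketch: the midpoint-rule helper `integrate`, the names `coeff_a` and `coeff_b`, and the test function `f` with its chosen coefficients are all hypothetical and introduced only for illustration.

```python
import math

def integrate(g, lo=-math.pi, hi=math.pi, steps=2000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(g(lo + (i + 0.5) * width) for i in range(steps))

def f(x):
    # Hypothetical trigonometric polynomial with a_0 = 1, a_2 = 3, b_1 = -2.
    return 1.0 + 3.0 * math.cos(2 * x) - 2.0 * math.sin(x)

def coeff_a(m):
    # a_m = (1/pi) * integral of f(x) cos(mx), valid for m >= 1.
    return integrate(lambda x: f(x) * math.cos(m * x)) / math.pi

def coeff_b(m):
    # b_m = (1/pi) * integral of f(x) sin(mx), valid for m >= 1.
    return integrate(lambda x: f(x) * math.sin(m * x)) / math.pi

cos_sq = integrate(lambda x: math.cos(3 * x) ** 2)               # expect pi
mixed = integrate(lambda x: math.cos(2 * x) * math.sin(5 * x))   # expect 0
print(cos_sq, mixed, coeff_a(2), coeff_b(1), coeff_a(1))
```

The recovered values of `coeff_a(2)` and `coeff_b(1)` should match the coefficients built into `f`, up to rounding error.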


Exercise 8.3.4. (a) The approximating functions are trigonometric functions, which are continuous. The limit function $f(x)$ is not continuous. Because the uniform limit of continuous functions is continuous, we know the convergence in this case cannot be uniform.

(b) The function $g(x) = |x|$ is even, and so a symmetry argument shows that $b_n = 0$ for all $n \ge 1$. For $a_0$ we write
$$a_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |x|\,dx = \frac{1}{\pi} \int_0^{\pi} x\,dx = \frac{1}{\pi} \left( \frac{\pi^2}{2} \right) = \frac{\pi}{2}.$$
For $a_n$ with $n \ge 1$ we use integration by parts to compute
$$a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} |x| \cos(nx)\,dx = \frac{2}{\pi} \int_0^{\pi} x \cos(nx)\,dx = \frac{2}{\pi} \left. \left( \frac{x}{n} \sin(nx) + \frac{1}{n^2} \cos(nx) \right) \right|_0^{\pi} = \frac{2}{n^2 \pi} (\cos(n\pi) - 1),$$
so that $a_n = -4/(n^2 \pi)$ if $n$ is odd and $a_n = 0$ if $n$ is even. Plugging these results into equation (6) in the text we get
$$|x| = \frac{\pi}{2} - \frac{4}{\pi} \sum_{m=0}^{\infty} \frac{1}{(2m + 1)^2} \cos((2m + 1)x).$$
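The series just obtained can be compared against $|x|$ directly. The following sketch evaluates a partial sum on a grid; the function name `partial_sum`, the number of terms, and the grid are assumptions chosen for illustration.

```python
import math

def partial_sum(x, terms):
    """pi/2 - (4/pi) * sum over m < terms of cos((2m+1)x) / (2m+1)^2."""
    total = math.pi / 2
    for m in range(terms):
        k = 2 * m + 1
        total -= (4 / math.pi) * math.cos(k * x) / k ** 2
    return total

# Compare the partial sum with |x| on a grid covering [-pi, pi].
grid = [-math.pi + 2 * math.pi * i / 400 for i in range(401)]
max_error = max(abs(partial_sum(x, 200) - abs(x)) for x in grid)
print(max_error)
```

Because the omitted tail is bounded by $(4/\pi) \sum_{m \ge 200} 1/(2m+1)^2$, the printed error should be small at every grid point, including the corner at $x = 0$.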

Before constructing any graphs, we can observe that the coefficients in this case go to zero like $1/n^2$. More specifically, we have $|a_n \cos(nx)| \le (4/\pi)(1/n^2)$, and because $\sum 1/n^2$ converges we can use the Weierstrass M-Test to conclude that our series converges uniformly to some continuous function. In fact, $S_n$ does converge to $g$ in this case, as suggested by the sketch of $S_3$ and $g$ in Figure 8.2.

Figure 8.2: $g(x) = |x|$ and $S_3$ on $[-\pi, \pi]$.

(c) Taking the term-by-term derivative of the series for $g(x) = |x|$ in (b) gives the Fourier series for the square-wave $f(x)$ derived in Example 8.3.1. This makes intuitive sense because away from zero we have $g'(x) = f(x)$. Using $S_N(x)$ to denote the partial sums of the Fourier series for $g(x)$ in (b), we have $g(x) = \lim S_N(x)$. Then graphical evidence suggests that, for all $x \ne n\pi$,

(1) $g'(x) = f(x) = \lim S_N'(x)$.

In order to use Theorem 6.4.3 to prove something rigorous, we would need to know that $S_N'$ (the Fourier series for $f(x)$) converges uniformly. The Weierstrass M-Test is of no use because the Fourier coefficients for $f(x)$ go to zero like $1/n$ and $\sum 1/n$ diverges. We remarked in (a) that the convergence to $f(x)$ is not uniform on intervals containing $x = 0$. This is reassuring since $g'$ does not exist here. On compact sets that do not contain points of the form $x = n\pi$, it turns out that the convergence of the series for $f(x)$ is uniform, meaning statement (1) above could be proved using Theorem 6.4.3.

Differentiating the series for $f(x)$ term-by-term gives a series of the form
$$\frac{4}{\pi} \sum_{m=0}^{\infty} \cos((2m + 1)x) = \frac{4}{\pi} (\cos(x) + \cos(3x) + \cos(5x) + \cdots).$$
When $x = \pm\pi/2$ the series converges to zero, but otherwise the series diverges because the terms do not tend to zero. Thus, even though $f'$ exists (away from zero), we cannot obtain a valid representation for $f'$ by differentiating the Fourier series for $f$ in a term-by-term fashion. This predicament should be contrasted with the situation for power series, where term-by-term differentiation always yields a valid series.

Exercise 8.3.5. Recall that any function continuous on a compact set is uniformly continuous. Thus $h$ is uniformly continuous over, say, $[\pi, 3\pi]$. This means that given $\epsilon > 0$, there exists a $\delta > 0$ that "works" for all pairs $x, y$ in this set. Now the fact that $h$ is periodic implies that this $\delta$ suffices on all of $\mathbf{R}$.

Exercise 8.3.6. Let $c$ be the midpoint of $[a, b]$, and let's assume $[a, b]$ is chosen so that $\sin(na) = \sin(nb) = \sin(nc) = 0$ with $\sin(nx) \ge 0$ on $[a, c]$ and $\sin(nx) \le 0$ on $[c, b]$.

Then
$$\int_a^b h(x) \sin(nx)\,dx = \int_a^c h(x) \sin(nx)\,dx + \int_c^b h(x) \sin(nx)\,dx,$$
and the trick is to argue that because $h$ does not change very much over this interval, the two integrals on the right mostly cancel out. To make this explicit, note that $\sin(n(x + \pi/n)) = -\sin(nx)$, so
$$\int_c^b h(x) \sin(nx)\,dx = \int_a^c h(x + \pi/n) \sin(n(x + \pi/n))\,dx = -\int_a^c h(x + \pi/n) \sin(nx)\,dx.$$
Then we can write
$$\left| \int_a^b h(x) \sin(nx)\,dx \right| = \left| \int_a^c (h(x) - h(x + \pi/n)) \sin(nx)\,dx \right| \le \int_a^c |h(x) - h(x + \pi/n)| \sin(nx)\,dx < \frac{\epsilon}{2} \int_a^c \sin(nx)\,dx = \frac{\epsilon}{2} \left( \frac{2}{n} \right) = \frac{\epsilon}{n}.$$
Over the interval $[-\pi, \pi]$ there are exactly $n$ intervals of length $2\pi/n$ like the interval $[a, b]$. Thus it follows that
$$\left| \int_{-\pi}^{\pi} h(x) \sin(nx)\,dx \right| \le n \cdot \left| \int_{a_n}^{b_n} h(x) \sin(nx)\,dx \right| < n \left( \frac{\epsilon}{n} \right) = \epsilon$$
for all $n \ge N$. This completes the proof.

Exercise 8.3.7. (a) Because $f$ is continuous, the function $q_x(u) = f(u + x) - f(x)$ is continuous. It follows from the Riemann–Lebesgue Lemma (Theorem 8.3.2) that
$$\int_{-\pi}^{\pi} q_x(u) \cos(Nu)\,du \to 0 \quad \text{as } N \to \infty.$$

(b) The idea here is to show that the discontinuity of $p_x(u)$ at zero is removable; that is, that $p_x(u)$ can be defined at $u = 0$ in such a way that makes $p_x$ continuous. To see how to do this write
$$p_x(u) = \frac{(f(u + x) - f(x)) \cos(u/2)}{\sin(u/2)} = 2 \cdot \frac{f(u + x) - f(x)}{u} \cdot \frac{(u/2)}{\sin(u/2)} \cdot \cos(u/2).$$
The fact that $f$ is differentiable at $x$ and the well-known limit $\lim_{t \to 0} \sin(t)/t = 1$ imply
$$\lim_{u \to 0} p_x(u) = 2f'(x).$$
Thus, defining $p_x(0) = 2f'(x)$ makes $p_x$ continuous on $(-\pi, \pi]$, and it now follows from the Riemann–Lebesgue Lemma that
$$\int_{-\pi}^{\pi} p_x(u) \sin(Nu)\,du \to 0 \quad \text{as } N \to \infty.$$

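Exercise 8.3.8. This exercise appeared as Exercise 2.3.11.

Exercise 8.3.9. For $k = 1, 2, \ldots, N$ write
$$D_k(\theta) = \frac{1}{2} \left[ \cos(k\theta) + \frac{\sin(k\theta) \cos(\theta/2)}{\sin(\theta/2)} \right]$$
as in the proof of Theorem 8.3.3. Then,
$$\frac{1}{N + 1} \left[ \frac{1}{2} + \sum_{k=1}^{N} D_k(\theta) \right] = \frac{1}{2(N + 1)} \left[ 1 + \sum_{k=1}^{N} \cos(k\theta) + \frac{\cos(\theta/2)}{\sin(\theta/2)} \sum_{k=1}^{N} \sin(k\theta) \right]$$
$$= \frac{1}{2(N + 1)} \left[ \frac{1}{2} + D_N(\theta) + \frac{\cos(\theta/2)}{\sin(\theta/2)} \cdot \frac{\sin(N\theta/2) \sin((N + 1)\theta/2)}{\sin(\theta/2)} \right] = \frac{1}{2(N + 1) \sin^2(\theta/2)} [B],$$
where
$$B = \frac{\sin^2(\theta/2)}{2} + \frac{\sin(\theta/2) \sin(N\theta + \frac{\theta}{2})}{2} + \cos(\theta/2) \sin(N\theta/2) \sin((N + 1)\theta/2).$$
To finish the proof we must show that $B = \sin^2((N + 1)\theta/2)$. Using the identity $\sin(t) \cos(t) = (1/2) \sin(2t)$, we can write
$$\sin^2((N + 1)\theta/2) = [\sin(N\theta/2) \cos(\theta/2) + \cos(N\theta/2) \sin(\theta/2)]^2 = \sin^2(N\theta/2) \cos^2(\theta/2) + \frac{\sin(N\theta) \sin(\theta)}{2} + \cos^2(N\theta/2) \sin^2(\theta/2).$$
Now we use Fact 1(b) from the text together with the identities $\sin(t) \cos(t) = (1/2) \sin(2t)$ and $1 + \cos(t) = 2\cos^2(t/2)$ to write
$$B = \frac{\sin^2(\theta/2)}{2} + \frac{\sin(\theta/2)}{2} [\cos(N\theta) \sin(\theta/2) + \sin(N\theta) \cos(\theta/2)] + \cos(\theta/2) \sin(N\theta/2) [\cos(N\theta/2) \sin(\theta/2) + \sin(N\theta/2) \cos(\theta/2)]$$
$$= \frac{\sin^2(\theta/2)}{2} [1 + \cos(N\theta)] + \frac{\sin(N\theta) \sin(\theta)}{4} + \frac{\sin(N\theta) \sin(\theta)}{4} + \sin^2(N\theta/2) \cos^2(\theta/2)$$
$$= \sin^2(\theta/2) \cos^2(N\theta/2) + \frac{\sin(N\theta) \sin(\theta)}{2} + \sin^2(N\theta/2) \cos^2(\theta/2).$$
This completes the derivation.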
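The identity derived above lends itself to a quick numerical spot check: the average of $D_0, \ldots, D_N$ (with $D_0 = 1/2$) should agree with the closed form $\frac{1}{2(N+1)} \left[ \sin((N+1)\theta/2) / \sin(\theta/2) \right]^2$. The sketch below assumes $D_k(\theta) = \frac{1}{2} + \sum_{j=1}^{k} \cos(j\theta)$; the function names and sample points are hypothetical.

```python
import math

def dirichlet(k, theta):
    """D_k(theta) = 1/2 + cos(theta) + ... + cos(k*theta); D_0 = 1/2."""
    return 0.5 + sum(math.cos(j * theta) for j in range(1, k + 1))

def fejer_average(N, theta):
    """Average of the Dirichlet kernels D_0, ..., D_N."""
    return sum(dirichlet(k, theta) for k in range(N + 1)) / (N + 1)

def fejer_closed_form(N, theta):
    """Closed form; requires sin(theta/2) != 0."""
    ratio = math.sin((N + 1) * theta / 2) / math.sin(theta / 2)
    return ratio ** 2 / (2 * (N + 1))

# Sample points in (0, pi), away from the zeros of sin(theta/2).
thetas = [0.1 + 0.3 * i for i in range(10)]
max_gap = max(abs(fejer_average(16, t) - fejer_closed_form(16, t))
              for t in thetas)
print(max_gap)
```

The closed form also makes the nonnegativity of the averaged kernel visible at a glance, since it is a square divided by a positive constant.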

Figure 8.3: $F_{16}$ on $[-\pi, \pi]$.

Exercise 8.3.10. (a) Setting $D_0 = 1/2$, we get that
$$S_n(x) = \frac{1}{\pi} \int_{-\pi}^{\pi} f(u + x) D_n(u)\,du \quad \text{for all } n \ge 0$$
as in the proof of Theorem 8.3.3. Then,
$$\sigma_N(x) = \frac{1}{N + 1} \sum_{n=0}^{N} S_n(x) = \frac{1}{N + 1} \sum_{n=0}^{N} \left( \frac{1}{\pi} \int_{-\pi}^{\pi} f(u + x) D_n(u)\,du \right) = \frac{1}{\pi} \int_{-\pi}^{\pi} f(u + x) \left( \frac{1}{N + 1} \sum_{n=0}^{N} D_n(u) \right) du = \frac{1}{\pi} \int_{-\pi}^{\pi} f(u + x) F_N(u)\,du.$$

(b) Looking at Figure 8.3, we see that $F_N$, like $D_N$, has a spike at the origin. However, unlike $D_N$, $F_N \ge 0$, and as $N$ gets larger we can observe that away from zero the magnitude of the oscillations actually dies off to zero. To make this observation explicit, we can refer to the formula
$$F_N(u) = \frac{1}{2(N + 1)} \left[ \frac{\sin((N + 1)\frac{u}{2})}{\sin(\frac{u}{2})} \right]^2.$$
If $\delta \le |u| \le \pi$, then $|\sin(u/2)| \ge \sin(\delta/2)$ and we see
$$|F_N(u)| \le \frac{1}{2(N + 1)} \left( \frac{1}{\sin(\delta/2)} \right)^2.$$
Because this estimate tends to zero as $N \to \infty$ and is independent of $u$, we see that $F_N \to 0$ uniformly on the set $\delta \le |u| \le \pi$.

(c) This follows from the fact that $\int_{-\pi}^{\pi} D_k(u)\,du = \pi$ for $k = 0, 1, \ldots, N$.

(d) From (c) we are able to write
$$f(x) = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) F_N(u)\,du,$$
so that
$$\sigma_N(x) - f(x) = \frac{1}{\pi} \int_{-\pi}^{\pi} (f(u + x) - f(x)) F_N(u)\,du = \frac{1}{\pi} \int_{-\delta}^{\delta} (f(u + x) - f(x)) F_N(u)\,du + \frac{1}{\pi} \int_{|u| \ge \delta} (f(u + x) - f(x)) F_N(u)\,du.$$
Given $\epsilon > 0$, use the uniform continuity of $f$ to choose $\delta > 0$ so that
$$|f(x + u) - f(x)| < \epsilon \quad \text{whenever } |u| < \delta.$$
It follows that
$$\left| \frac{1}{\pi} \int_{-\delta}^{\delta} (f(u + x) - f(x)) F_N(u)\,du \right| \le \frac{\epsilon}{\pi} \int_{-\delta}^{\delta} F_N(u)\,du < \frac{\epsilon}{\pi} \int_{-\pi}^{\pi} F_N(u)\,du = \epsilon.$$
Having chosen $\delta$, now pick $N_0$ large enough so that $N \ge N_0$ implies $|F_N(u)| \le \epsilon$ for all $|u| \ge \delta$. Letting $M$ be an upper bound on the size of $|f|$ we see
$$|f(u + x) - f(x)| F_N(u) \le 2M\epsilon$$
as long as $|u| \ge \delta$, and it follows that
$$\left| \frac{1}{\pi} \int_{|u| \ge \delta} (f(u + x) - f(x)) F_N(u)\,du \right| \le \frac{1}{\pi} \int_{-\pi}^{\pi} (2M\epsilon)\,du = 4M\epsilon.$$
Combining the estimates on each of these two integrals, we get that
$$|\sigma_N(x) - f(x)| \le \epsilon + 4M\epsilon \quad \text{for all } x \in (-\pi, \pi] \text{ and } N \ge N_0.$$
Because $\epsilon$ is arbitrary, we conclude that $\sigma_N \to f$ uniformly, and the proof is complete.

Exercise 8.3.11. (a) Fix $\epsilon_N = 1/N$. If we can find a polynomial $p_N(x)$ such that
$$|p_N(x) - f(x)| < \epsilon_N \quad \text{for all } x \in [0, \pi],$$
it will follow that $p_N \to f$ uniformly, as desired. From Fejér's theorem, we know there exists $N$ such that
$$|\sigma_N(x) - f(x)| < \frac{\epsilon_N}{2} \quad \text{for all } x \in [0, \pi].$$
But $\sigma_N(x)$ is a linear combination of the partial sums $S_n(x)$, and each $S_n(x)$ is a linear combination of functions of the form $\cos(kx)$ and $\sin(kx)$. From the previous discussion about Taylor series, we know it is possible to find polynomials that are arbitrarily and uniformly close to the trigonometric functions that constitute each $S_n$. Because the sums in question are all finite, a repeated application of the triangle inequality implies that we can find a polynomial $p_N$ satisfying
$$|p_N(x) - \sigma_N(x)| < \frac{\epsilon_N}{2} \quad \text{for all } x \in [0, \pi].$$
Finally, one last triangle inequality argument shows
$$|p_N(x) - f(x)| \le |p_N(x) - \sigma_N(x)| + |\sigma_N(x) - f(x)| < \frac{\epsilon_N}{2} + \frac{\epsilon_N}{2} = \epsilon_N.$$
This proves the result on the interval $[0, \pi]$.

(b) To prove the general case we just use the change of variables $t = \pi(x - a)/(b - a)$ and observe that polynomials are preserved under this transformation.

8.4 A Construction of R From Q

Exercise 8.4.1. (a) We have to show $C_r$ possesses the three properties of a cut. Property (c1) can be verified by noticing that $C_r$ contains all rational $t < r$ and hence is not the empty set. Also, $C_r \ne \mathbf{Q}$ since all rational numbers greater than $r$ are not contained in $C_r$.

To prove property (c2), fix $t \in C_r$ and assume $q < t$. Because $t \in C_r$ we have $q < t < r$ and thus $q$ is an element of $C_r$, as desired.

Finally, let's show property (c3) holds for $C_r$. Note that for any $t \in C_r$ we can produce $q \in C_r$ with $t < q < r$ by letting $q = (t + r)/2$. This shows $C_r$ does not have a maximum.

(b) The set $S$ is not a cut because it has a maximum.

(c) The set $T$ is a cut.

(d) The set $U$ is also a cut. It may seem as though $\sqrt{2}$ is a maximum, but our definition of a cut deals exclusively with rational numbers. At the moment there is no such thing as $\sqrt{2}$. In fact, this cut (which is equal to the cut in (c)) is to become $\sqrt{2}$ when we are finished.

Exercise 8.4.2. Because $A$ is a cut, all rational $q < r$ are also in $A$. Hence, a rational number $s \notin A$ must be greater than $r \in A$, because if $s \le r$ then $s$ would be an element of $A$.

Exercise 8.4.3. The operations of addition and multiplication are commutative and associative on all of these sets, and the distributive property holds. The set of natural numbers is not a field because there is no additive identity and no additive inverses. Although $\mathbf{N}$ has a multiplicative identity, it also fails to have multiplicative inverses. The set of integers is an improvement in that $\mathbf{Z}$ has an additive identity and additive inverses. However, multiplicative inverses do not exist for elements of $\mathbf{Z}$ (except for the numbers $-1$ and $1$). The set of rational numbers $\mathbf{Q}$ possesses all the properties of a field.

Exercise 8.4.4. In order to prove property (o1), we have to show that, for every pair of real numbers $A$ and $B$, at least one of the statements $A \subseteq B$ or $B \subseteq A$ is true. This means either $A$ is a subset of $B$ or $B$ is a subset of $A$. If $A$ is a subset of $B$ then we are done, so let's assume that $A$ is not a subset of $B$. Our goal is to show that $B \subseteq A$. Because $A$ is not a subset of $B$ there must exist an element $a \in A$ where $a \notin B$. Now let $b \in B$ be arbitrary. Because $a \notin B$, we know from Exercise 8.4.2 that $b < a$. Then property (c2) implies $b \in A$, which shows $B \subseteq A$.

Property (o2) is verified by noting that $A \subseteq B$ and $B \subseteq A$ is true if and only if $A = B$. In fact, showing containment in each direction is the standard way to prove two sets are equal.

Property (o3) is also straightforward because $A \subseteq B$ and $B \subseteq C$ certainly implies $A \subseteq C$.

Exercise 8.4.5. (a) The set $A + B$ is not the empty set because $A$ is not empty and $B$ is not empty. To argue $A + B \ne \mathbf{Q}$, pick $l_1 \notin A$ and $l_2 \notin B$. Given arbitrary elements $a \in A$ and $b \in B$, we again use Exercise 8.4.2 to say that $a < l_1$ and $b < l_2$. This implies $l_1 + l_2$ is an upper bound on $A + B$, meaning $A + B$ cannot be all of $\mathbf{Q}$.

To show that $A + B$ does not have a maximum, fix $c \in A + B$ and write $c = a + b$ where $a \in A$ and $b \in B$. By property (c3) we know that there exists $s \in A$ with $a < s$. Also, there exists $r \in B$ with $b < r$. We can now conclude $s + r \in A + B$ with $c < s + r$.

(b) To show that addition is commutative we can write
$$A + B = \{a + b : a \in A, b \in B\} = \{b + a : a \in A, b \in B\} = B + A.$$
The proof that addition is associative is similar in that it follows directly from the fact that addition of rational numbers is associative. In particular, we can show that $x \in (A + B) + C$ if and only if $x = a + b + c$ where $a \in A$, $b \in B$ and $c \in C$. Then $(a + b) + c = a + (b + c)$ and the rest is clear sailing.

(c) (Note that some early editions of the text erroneously suggest showing $A + O = O$ instead of $A + O = A$.) Let's follow the advice to prove inclusions in each direction, starting with $A + O \subseteq A$. Given $a + b \in A + O$ where $a \in A$ and $b \in O$, we know $b < 0$. Thus, $a + b < a$, and by property (c2), $a + b \in A$.

To prove the reverse inclusion, fix $a \in A$. By property (c3) there must exist $s \in A$ satisfying $a < s$, from which it follows that $a - s \in O$. Then
$$a = s + (a - s) \in A + O,$$
which proves $A \subseteq A + O$. These two inclusions together show $A = A + O$.


Exercise 8.4.6. (a) Let's verify property (c1). Because $A \ne \mathbf{Q}$, there exists $t \notin A$. Since $t < t + 1$ we can conclude $-(t + 1) \in -A$ by the definition of $-A$, and thus $-A \ne \emptyset$. To show $-A \ne \mathbf{Q}$ we start by noting $A$ is not empty and picking $a \in A$. If $r \in -A$ we know there exists $t \notin A$ with $t < -r$. Then $t \notin A$ implies $a < t$, and it follows that $r < -a$. This proves that $-A$ is bounded above by $-a$ and thus $-A \ne \mathbf{Q}$.

To prove property (c2), we let $r \in -A$ and consider $s \in \mathbf{Q}$ satisfying $s < r$. Because $r \in -A$ there exists $t \notin A$ with $t < -r$. Since $s < r$ implies $-r < -s$ we have $t < -s$, which means $s \in -A$.

Finally, let's prove property (c3). If we let $r \in -A$, then there exists $t \notin A$ with $t < -r$. By the density property of the rational numbers we can choose $s \in \mathbf{Q}$ such that $t < s < -r$. This implies $-s \in -A$ and, because $r < -s$, we see that $-A$ does not possess a maximum.

(b) If we set $-A = \{r \in \mathbf{Q} : -r \notin A\}$ then $-A$ will not necessarily be a cut. In particular, property (c3) may fail to hold. For instance, if we let $A = \{r : r < 0\}$ then $-A = \{r : r \le 0\}$ has a maximum value.

(c) Let $a \in A$ and $r \in -A$. Because $r \in -A$ we know there exists $t \notin A$ with $t < -r$. By Exercise 8.4.2 we have $a < t$, which implies $a < -r$. Thus, $a + r < 0$ and so $a + r \in O$. This shows $A + (-A) \subseteq O$.

Now, let's prove the reverse inclusion by fixing $o \in O$ and finding $a \in A$ and $b \in -A$ so that $a + b = o$. Set $\epsilon = |o|/2 = -o/2$. Now choose a rational number $t \notin A$ with the property that $t - \epsilon \in A$. (Here we are relying on properties (c1) and (c2) of a cut. In particular, we could show that if no such $t$ existed then either $A = \mathbf{Q}$ or $A = \emptyset$.) Now the fact that $t \notin A$ implies $-(t + \epsilon) \in -A$. Then
$$o = -2\epsilon = -(t + \epsilon) + (t - \epsilon) \in -A + A,$$
and we conclude $O \subseteq -A + A$. This proves (f4).

(d) (Early versions of the text ask to prove property (o3), which has already been done. Later versions ask for proofs of (o4) earlier and (o5) later in the section.)

Exercise 8.4.7. (a) We must show $AB$ has the properties of a cut. Let's first verify property (c1). The set $AB \ne \emptyset$ because all rational $q < 0$ are in $AB$. Furthermore, because $A$ and $B$ are bounded above, so are products of the form $ab$ where both $a, b \ge 0$ with $a \in A$ and $b \in B$. This implies $AB \ne \mathbf{Q}$.

To prove property (c2), we let $t \in AB$ be arbitrary and let $s \in \mathbf{Q}$ satisfy $s < t$. If $s < 0$ then $s \in AB$ by the way we have defined the product. For the case $0 \le s < t$ it must be that $t = ab$ where $a \in A$ and $b \in B$ satisfy $a > 0$ and $b > 0$. Because $s < ab$ we have $s/b < a$, which implies $s/b \in A$. Then
$$s = \left( \frac{s}{b} \right)(b) \in AB,$$
and (c2) is proved.

To verify property (c3), consider $t \in AB$. If $t < 0$ then $t < t/2$ and $t/2 \in AB$ because $t/2 < 0$ as well. If $t \ge 0$ then $t = ab$ for some $a \in A$ and $b \in B$. Applying property (c3) to $A$ and $B$ we get $s \in A$ and $r \in B$ with $a < s$ and $b < r$. We conclude $sr \in AB$ with $ab < sr$.

(b) Let $A$, $B$ and $C$ be cuts and assume $A \le B$, meaning $A \subseteq B$. To show $A + C \le B + C$ we let $x \in A + C$ be arbitrary. Then $x = a + c$ where $a \in A$ and $c \in C$. Because $A \subseteq B$ we have $a \in B$ as well, and it follows that $x \in B + C$. This proves (o4).

Property (o5) follows immediately from our definition of the product of two positive cuts.

(c) The cut $I = \{p \in \mathbf{Q} : p < 1\}$ is the multiplicative identity. Exercise 8.4.1 contains the argument that $I = C_1$ is actually a cut. We now show $AI = A$ for all $A \ge O$ by demonstrating inclusion both ways. Fix $q \in AI$. Because $I \ge O$, either $q < 0$ or $q = ab$ where $a, b \ge 0$ with $a \in A$ and $b < 1$. If $q < 0$ then $q \in A$ because $A \ge O$. In the other case we have $q = ab < a$ and property (c2) implies $ab \in A$. Thus, $AI \subseteq A$.

In the other direction we consider $a \in A$. If $a < 0$ then $a \in AI$ by our definition of the product of positive cuts. If $a \ge 0$, then property (c3) says that we can pick a rational $p \in A$ with $a < p$. This implies $a/p < 1$ and hence $a/p \in I$. But then,
$$a = \left( \frac{a}{p} \right)(p) \in AI,$$
which shows $A \subseteq AI$, and we conclude that $A = AI$.

(d) To show $AO \subseteq O$ we let $b \in AO$ be arbitrary. Because there are no positive elements of $O$, we see from our definition of the product $AO$ that we must have $b < 0$. This implies $b \in O$ and we conclude $AO \subseteq O$. The reverse inclusion is true because $a \in O$ means $a < 0$, which implies $a \in AO$.

Exercise 8.4.8. (a) In order to prove $S \in \mathbf{R}$ we have to show $S$ possesses the three properties of a cut. Consider property (c1). Since $\mathcal{A} \ne \emptyset$, the set $S$, which is the union of all $A \in \mathcal{A}$, cannot be the empty set. In addition, because $\mathcal{A}$ is bounded above by some cut $B$, we have that $S \le B$. Since $B \ne \mathbf{Q}$ we conclude $S \ne \mathbf{Q}$ as well.

To prove property (c2), we let $a \in S$ and consider $r \in \mathbf{Q}$ satisfying $r < a$. Because $a \in S$, it follows that $a \in A$ for some $A \in \mathcal{A}$. Since $A$ is a cut, $r \in A$, which implies $r \in S$ as well.

Finally, to verify property (c3), let's fix an arbitrary $a \in S$ and show that there exists an element $q$ in $S$ with $a < q$. As before, if $a \in S$ then $a \in A$ for some $A \in \mathcal{A}$. Since $A$ is a cut we can find $q \in A$, and hence in $S$, with $a < q$.

(b) By definition, $S$ is the union of all $A \in \mathcal{A}$, which implies $A \subseteq S$ or $A \le S$ for each $A \in \mathcal{A}$. This shows $S$ is an upper bound for $\mathcal{A}$. Now, let $B$ be an arbitrary upper bound for $\mathcal{A}$. To show $S \le B$, consider an arbitrary $s \in S$. As we have seen several times now, it must be that $s \in A$ for some $A$ in $\mathcal{A}$, and this implies $s \in B$ because $A \subseteq B$. Therefore, $S \subseteq B$ or $S \le B$, and our proof is complete.

Exercise 8.4.9. (a) We first show $C_r + C_s = C_{r+s}$ by showing inclusion both ways. For the forward inclusion, let $t + p \in C_r + C_s$ where $t \in C_r$ and $p \in C_s$. Then $t < r$ and $p < s$, and we see $t + p < r + s$. This implies $t + p \in C_{r+s}$ and thus $C_r + C_s \subseteq C_{r+s}$.


For the reverse inclusion we start with $p \in C_{r+s}$. Then $p < r + s$ implies $r + s - p > 0$. Letting $\epsilon = r + s - p$, a little algebra yields $p = (r - \epsilon/2) + (s - \epsilon/2)$. Observe that $r - \epsilon/2 \in C_r$ and $s - \epsilon/2 \in C_s$, and this implies $p \in C_r + C_s$, as desired. We conclude $C_{r+s} \subseteq C_r + C_s$ and therefore the sets are equal.

To verify $C_r C_s = C_{rs}$ for positive $r$ and $s$ we fix $q \in C_r C_s$. If $q < 0$ then $q < rs$, which implies $q \in C_{rs}$. If $q \ge 0$ then $q = ap$ for some $a \in C_r$ and $p \in C_s$ where both $a, p \ge 0$. Because everything is positive, we get $ap < rs$, which implies $q = ap \in C_{rs}$. This shows $C_r C_s \subseteq C_{rs}$.

For the other inclusion we consider $p \in C_{rs}$. If $p < 0$ then the way we have defined the product ensures $p \in C_r C_s$. If $p \ge 0$ then observe that $p < rs$ implies $p/s < r$, from which we conclude that $p/s \in C_r$. Then
$$p = \left( \frac{p}{s} \right)(s) \in C_r C_s,$$
and it follows that $C_{rs} \subseteq C_r C_s$. Thus $C_r C_s = C_{rs}$.

(b) ($\Rightarrow$) For each $n \in \mathbf{N}$ the rational number $r - (1/n) \in C_r$. Because $C_r \subseteq C_s$, we see $r - (1/n) \in C_s$. This means
$$r - \frac{1}{n} < s \quad \text{for all } n \in \mathbf{N},$$
and a short contradiction argument shows $r \le s$.

($\Leftarrow$) Conversely, assume $r \le s$. If $a \in C_r$ then $a < r \le s$, which implies $a \in C_s$. Therefore, $C_r \subseteq C_s$ or, equivalently, $C_r \le C_s$.
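The decomposition used for the inclusion $C_{r+s} \subseteq C_r + C_s$ in part (a) can be tried out with exact rational arithmetic. This is only a small sketch: the helper names `in_cut` and `split` and the sample values are hypothetical, and a cut $C_r$ is represented simply by its membership test $q < r$.

```python
from fractions import Fraction

def in_cut(q, r):
    """Membership test for the rational cut C_r = {q in Q : q < r}."""
    return q < r

def split(p, r, s):
    """Given p in C_{r+s}, return (t, u) with t in C_r, u in C_s, t + u = p."""
    eps = r + s - p              # positive because p < r + s
    return r - eps / 2, s - eps / 2

r, s = Fraction(1, 3), Fraction(5, 7)
p = Fraction(1)                  # 1 < 1/3 + 5/7 = 22/21, so p is in C_{r+s}
t, u = split(p, r, s)
print(t, u, t + u)
```

Using `Fraction` keeps every comparison exact, which matters here because the argument works entirely inside $\mathbf{Q}$.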