International Student Edition
ADVANCED ENGINEERING MATHEMATICS
PETER V. O’NEIL University of Alabama at Birmingham
Australia Canada Mexico Singapore Spain United Kingdom United States
Advanced Engineering Mathematics, International Student Edition by Peter V. O’Neil
Associate Vice-President and Editorial Director: Evelyn Veitch
Publisher: Chris Carson
Developmental Editor: Kamilah Reid-Burrell/Hilda Gowans
Production Services: RPK Editorial Services
Creative Director: Angela Cluer
Copy Editor: Shelly Gerger-Knechtl/ Harlan James
Interior Design: Terri Wright
Proofreader: Erin Wagner/Harlan James
Cover Design: Andrew Adams
Indexer: RPK Editorial Services
Compositor: Integra
Permissions Coordinator: Vicki Gould
Production Manager: Renate McCloy
Printer: Quebecor World
COPYRIGHT © 2007 by Nelson, a division of Thomson Canada Limited.
ALL RIGHTS RESERVED. No part of this work covered by the copyright herein may be reproduced, transcribed, or used in any form or by any means—graphic, electronic, or mechanical, including photocopying, recording, taping, Web distribution, or information storage and retrieval systems—without the written permission of the publisher.
North America Nelson 1120 Birchmount Road Toronto, Ontario M1K 5G4 Canada
Printed and bound in the United States. 1 2 3 4 07 06
For more information contact Nelson, 1120 Birchmount Road, Toronto, Ontario, Canada, M1K 5G4. Or you can visit our Internet site at http://www.nelson.com

Library of Congress Control Number: 2006900028
ISBN: 0-495-08237-6

If you purchased this book within the United States or Canada you should be aware that it has been wrongfully imported without the approval of the Publisher or the Author.
For permission to use material from this text or product, submit a request online at www.thomsonrights.com.

Every effort has been made to trace ownership of all copyright material and to secure permission from copyright holders. In the event of any question arising as to the use of any material, we will be pleased to make the necessary corrections in future printings.
Asia: Thomson Learning, 5 Shenton Way #01-01, UIC Building, Singapore 068808
Australia/New Zealand: Thomson Learning, 102 Dodds Street, Southbank, Victoria, Australia 3006
Europe/Middle East/Africa: Thomson Learning, High Holborn House, 50/51 Bedford Row, London WC1R 4LR, United Kingdom
Latin America: Thomson Learning, Seneca, 53, Colonia Polanco, 11560 Mexico D.F., Mexico
Spain: Paraninfo, Calle/Magallanes, 25, 28015 Madrid, Spain
Contents
PART 1  Ordinary Differential Equations  1

Chapter 1  First-Order Differential Equations  3
1.1 Preliminary Concepts 3
   1.1.1 General and Particular Solutions 3
   1.1.2 Implicitly Defined Solutions 4
   1.1.3 Integral Curves 5
   1.1.4 The Initial Value Problem 6
   1.1.5 Direction Fields 7
1.2 Separable Equations 11
   1.2.1 Some Applications of Separable Differential Equations 14
1.3 Linear Differential Equations 22
1.4 Exact Differential Equations 26
1.5 Integrating Factors 33
   1.5.1 Separable Equations and Integrating Factors 37
   1.5.2 Linear Equations and Integrating Factors 37
1.6 Homogeneous, Bernoulli, and Riccati Equations 38
   1.6.1 Homogeneous Differential Equations 38
   1.6.2 The Bernoulli Equation 42
   1.6.3 The Riccati Equation 43
1.7 Applications to Mechanics, Electrical Circuits, and Orthogonal Trajectories 46
   1.7.1 Mechanics 46
   1.7.2 Electrical Circuits 51
   1.7.3 Orthogonal Trajectories 53
1.8 Existence and Uniqueness for Solutions of Initial Value Problems 58

Chapter 2  Second-Order Differential Equations  61
2.1 Preliminary Concepts 61
2.2 Theory of Solutions of y″ + p(x)y′ + q(x)y = f(x) 62
   2.2.1 The Homogeneous Equation y″ + p(x)y′ + q(x)y = 0 64
   2.2.2 The Nonhomogeneous Equation y″ + p(x)y′ + q(x)y = f(x) 68
2.3 Reduction of Order 69
2.4 The Constant Coefficient Homogeneous Linear Equation 73
   2.4.1 Case 1: A² − 4B > 0 73
   2.4.2 Case 2: A² − 4B = 0 74
   2.4.3 Case 3: A² − 4B < 0 74
   2.4.4 An Alternative General Solution in the Complex Root Case 75
2.5 Euler’s Equation 78
2.6 The Nonhomogeneous Equation y″ + p(x)y′ + q(x)y = f(x) 82
   2.6.1 The Method of Variation of Parameters 82
   2.6.2 The Method of Undetermined Coefficients 85
   2.6.3 The Principle of Superposition 91
   2.6.4 Higher-Order Differential Equations 91
2.7 Application of Second-Order Differential Equations to a Mechanical System 93
   2.7.1 Unforced Motion 95
   2.7.2 Forced Motion 98
   2.7.3 Resonance 100
   2.7.4 Beats 102
   2.7.5 Analogy with an Electrical Circuit 103

Chapter 3  The Laplace Transform  107
3.1 Definition and Basic Properties 107
3.2 Solution of Initial Value Problems Using the Laplace Transform 116
3.3 Shifting Theorems and the Heaviside Function 120
   3.3.1 The First Shifting Theorem 120
   3.3.2 The Heaviside Function and Pulses 122
   3.3.3 The Second Shifting Theorem 125
   3.3.4 Analysis of Electrical Circuits 129
3.4 Convolution 134
3.5 Unit Impulses and the Dirac Delta Function 139
3.6 Laplace Transform Solution of Systems 144
3.7 Differential Equations with Polynomial Coefficients 150

Chapter 4  Series Solutions  155
4.1 Power Series Solutions of Initial Value Problems 156
4.2 Power Series Solutions Using Recurrence Relations 161
4.3 Singular Points and the Method of Frobenius 166
4.4 Second Solutions and Logarithm Factors 173

Chapter 5  Numerical Approximation of Solutions  181
5.1 Euler’s Method 182
   5.1.1 A Problem in Radioactive Waste Disposal 187
5.2 One-Step Methods 190
   5.2.1 The Second-Order Taylor Method 190
   5.2.2 The Modified Euler Method 193
   5.2.3 Runge-Kutta Methods 195
5.3 Multistep Methods 197
   5.3.1 Case 1: r = 0 198
   5.3.2 Case 2: r = 1 198
   5.3.3 Case 3: r = 3 199
   5.3.4 Case 4: r = 4 199

PART 2  Vectors and Linear Algebra  201

Chapter 6  Vectors and Vector Spaces  203
6.1 The Algebra and Geometry of Vectors 203
6.2 The Dot Product 211
6.3 The Cross Product 217
6.4 The Vector Space R^n 223
6.5 Linear Independence, Spanning Sets, and Dimension in R^n 228

Chapter 7  Matrices and Systems of Linear Equations  237
7.1 Matrices 238
   7.1.1 Matrix Algebra 239
   7.1.2 Matrix Notation for Systems of Linear Equations 242
   7.1.3 Some Special Matrices 243
   7.1.4 Another Rationale for the Definition of Matrix Multiplication 246
   7.1.5 Random Walks in Crystals 247
7.2 Elementary Row Operations and Elementary Matrices 251
7.3 The Row Echelon Form of a Matrix 258
7.4 The Row and Column Spaces of a Matrix and Rank of a Matrix 266
7.5 Solution of Homogeneous Systems of Linear Equations 272
7.6 The Solution Space of AX = O 280
7.7 Nonhomogeneous Systems of Linear Equations 283
   7.7.1 The Structure of Solutions of AX = B 284
   7.7.2 Existence and Uniqueness of Solutions of AX = B 285
7.8 Matrix Inverses 293
   7.8.1 A Method for Finding A⁻¹ 295

Chapter 8  Determinants  299
8.1 Permutations 299
8.2 Definition of the Determinant 301
8.3 Properties of Determinants 303
8.4 Evaluation of Determinants by Elementary Row and Column Operations 307
8.5 Cofactor Expansions 311
8.6 Determinants of Triangular Matrices 314
8.7 A Determinant Formula for a Matrix Inverse 315
8.8 Cramer’s Rule 318
8.9 The Matrix Tree Theorem 320

Chapter 9  Eigenvalues, Diagonalization, and Special Matrices  323
9.1 Eigenvalues and Eigenvectors 324
   9.1.1 Gerschgorin’s Theorem 328
9.2 Diagonalization of Matrices 330
9.3 Orthogonal and Symmetric Matrices 339
9.4 Quadratic Forms 347
9.5 Unitary, Hermitian, and Skew Hermitian Matrices 352

PART 3  Systems of Differential Equations and Qualitative Methods  359

Chapter 10  Systems of Linear Differential Equations  361
10.1 Theory of Systems of Linear First-Order Differential Equations 361
   10.1.1 Theory of the Homogeneous System X′ = AX 365
   10.1.2 General Solution of the Nonhomogeneous System X′ = AX + G 372
10.2 Solution of X′ = AX when A is Constant 374
   10.2.1 Solution of X′ = AX when A has Complex Eigenvalues 377
   10.2.2 Solution of X′ = AX when A does not have n Linearly Independent Eigenvectors 379
   10.2.3 Solution of X′ = AX by Diagonalizing A 384
   10.2.4 Exponential Matrix Solutions of X′ = AX 386
10.3 Solution of X′ = AX + G 394
   10.3.1 Variation of Parameters 394
   10.3.2 Solution of X′ = AX + G by Diagonalizing A 398

Chapter 11  Qualitative Methods and Systems of Nonlinear Differential Equations  403
11.1 Nonlinear Systems and Existence of Solutions 403
11.2 The Phase Plane, Phase Portraits and Direction Fields 406
11.3 Phase Portraits of Linear Systems 413
11.4 Critical Points and Stability 424
11.5 Almost Linear Systems 431
11.6 Lyapunov’s Stability Criteria 451
11.7 Limit Cycles and Periodic Solutions 461

PART 4  Vector Analysis  473

Chapter 12  Vector Differential Calculus  475
12.1 Vector Functions of One Variable 475
12.2 Velocity, Acceleration, Curvature and Torsion 481
   12.2.1 Tangential and Normal Components of Acceleration 488
   12.2.2 Curvature as a Function of t 491
   12.2.3 The Frenet Formulas 492
12.3 Vector Fields and Streamlines 493
12.4 The Gradient Field and Directional Derivatives 499
   12.4.1 Level Surfaces, Tangent Planes and Normal Lines 503
12.5 Divergence and Curl 510
   12.5.1 A Physical Interpretation of Divergence 512
   12.5.2 A Physical Interpretation of Curl 513

Chapter 13  Vector Integral Calculus  517
13.1 Line Integrals 517
   13.1.1 Line Integral with Respect to Arc Length 525
13.2 Green’s Theorem 528
   13.2.1 An Extension of Green’s Theorem 532
13.3 Independence of Path and Potential Theory in the Plane 536
   13.3.1 A More Critical Look at Theorem 13.5 539
13.4 Surfaces in 3-Space and Surface Integrals 545
   13.4.1 Normal Vector to a Surface 548
   13.4.2 The Tangent Plane to a Surface 551
   13.4.3 Smooth and Piecewise Smooth Surfaces 552
   13.4.4 Surface Integrals 553
13.5 Applications of Surface Integrals 557
   13.5.1 Surface Area 557
   13.5.2 Mass and Center of Mass of a Shell 557
   13.5.3 Flux of a Vector Field Across a Surface 560
13.6 Preparation for the Integral Theorems of Gauss and Stokes 562
13.7 The Divergence Theorem of Gauss 564
   13.7.1 Archimedes’s Principle 567
   13.7.2 The Heat Equation 568
   13.7.3 The Divergence Theorem as a Conservation of Mass Principle 570
13.8 The Integral Theorem of Stokes 572
   13.8.1 An Interpretation of Curl 576
   13.8.2 Potential Theory in 3-Space 576

PART 5  Fourier Analysis, Orthogonal Expansions, and Wavelets  581

Chapter 14  Fourier Series  583
14.1 Why Fourier Series? 583
14.2 The Fourier Series of a Function 586
   14.2.1 Even and Odd Functions 589
14.3 Convergence of Fourier Series 593
   14.3.1 Convergence at the End Points 599
   14.3.2 A Second Convergence Theorem 601
   14.3.3 Partial Sums of Fourier Series 604
   14.3.4 The Gibbs Phenomenon 606
14.4 Fourier Cosine and Sine Series 609
   14.4.1 The Fourier Cosine Series of a Function 610
   14.4.2 The Fourier Sine Series of a Function 612
14.5 Integration and Differentiation of Fourier Series 614
14.6 The Phase Angle Form of a Fourier Series 623
14.7 Complex Fourier Series and the Frequency Spectrum 630
   14.7.1 Review of Complex Numbers 630
   14.7.2 Complex Fourier Series 631

Chapter 15  The Fourier Integral and Fourier Transforms  637
15.1 The Fourier Integral 637
15.2 Fourier Cosine and Sine Integrals 640
15.3 The Complex Fourier Integral and the Fourier Transform 642
15.4 Additional Properties and Applications of the Fourier Transform 652
   15.4.1 The Fourier Transform of a Derivative 652
   15.4.2 Frequency Differentiation 655
   15.4.3 The Fourier Transform of an Integral 656
   15.4.4 Convolution 657
   15.4.5 Filtering and the Dirac Delta Function 660
   15.4.6 The Windowed Fourier Transform 661
   15.4.7 The Shannon Sampling Theorem 665
   15.4.8 Lowpass and Bandpass Filters 667
15.5 The Fourier Cosine and Sine Transforms 670
15.6 The Finite Fourier Cosine and Sine Transforms 673
15.7 The Discrete Fourier Transform 675
   15.7.1 Linearity and Periodicity 678
   15.7.2 The Inverse N-Point DFT 678
   15.7.3 DFT Approximation of Fourier Coefficients 679
15.8 Sampled Fourier Series 681
   15.8.1 Approximation of a Fourier Transform by an N-Point DFT 685
   15.8.2 Filtering 689
15.9 The Fast Fourier Transform 694
   15.9.1 Use of the FFT in Analyzing Power Spectral Densities of Signals 695
   15.9.2 Filtering Noise From a Signal 696
   15.9.3 Analysis of the Tides in Morro Bay 697

Chapter 16  Special Functions, Orthogonal Expansions, and Wavelets  701
16.1 Legendre Polynomials 701
   16.1.1 A Generating Function for the Legendre Polynomials 704
   16.1.2 A Recurrence Relation for the Legendre Polynomials 706
   16.1.3 Orthogonality of the Legendre Polynomials 708
   16.1.4 Fourier–Legendre Series 709
   16.1.5 Computation of Fourier–Legendre Coefficients 711
   16.1.6 Zeros of the Legendre Polynomials 713
   16.1.7 Derivative and Integral Formulas for P_n(x) 715
16.2 Bessel Functions 719
   16.2.1 The Gamma Function 719
   16.2.2 Bessel Functions of the First Kind and Solutions of Bessel’s Equation 721
   16.2.3 Bessel Functions of the Second Kind 722
   16.2.4 Modified Bessel Functions 725
   16.2.5 Some Applications of Bessel Functions 727
   16.2.6 A Generating Function for J_n(x) 732
   16.2.7 An Integral Formula for J_n(x) 733
   16.2.8 A Recurrence Relation for J_ν(x) 735
   16.2.9 Zeros of J_ν(x) 737
   16.2.10 Fourier–Bessel Expansions 739
   16.2.11 Fourier–Bessel Coefficients 741
16.3 Sturm–Liouville Theory and Eigenfunction Expansions 745
   16.3.1 The Sturm–Liouville Problem 745
   16.3.2 The Sturm–Liouville Theorem 752
   16.3.3 Eigenfunction Expansions 755
   16.3.4 Approximation in the Mean and Bessel’s Inequality 759
   16.3.5 Convergence in the Mean and Parseval’s Theorem 762
   16.3.6 Completeness of the Eigenfunctions 763
16.4 Wavelets 765
   16.4.1 The Idea Behind Wavelets 765
   16.4.2 The Haar Wavelets 767
   16.4.3 A Wavelet Expansion 774
   16.4.4 Multiresolution Analysis with Haar Wavelets 774
   16.4.5 General Construction of Wavelets and Multiresolution Analysis 775
   16.4.6 Shannon Wavelets 776

PART 6  Partial Differential Equations  779

Chapter 17  The Wave Equation  781
17.1 The Wave Equation and Initial and Boundary Conditions 781
17.2 Fourier Series Solutions of the Wave Equation 786
   17.2.1 Vibrating String with Zero Initial Velocity 786
   17.2.2 Vibrating String with Given Initial Velocity and Zero Initial Displacement 791
   17.2.3 Vibrating String with Initial Displacement and Velocity 793
   17.2.4 Verification of Solutions 794
   17.2.5 Transformation of Boundary Value Problems Involving the Wave Equation 796
   17.2.6 Effects of Initial Conditions and Constants on the Motion 798
   17.2.7 Numerical Solution of the Wave Equation 801
17.3 Wave Motion Along Infinite and Semi-Infinite Strings 808
   17.3.1 Wave Motion Along an Infinite String 808
   17.3.2 Wave Motion Along a Semi-Infinite String 813
   17.3.3 Fourier Transform Solution of Problems on Unbounded Domains 815
17.4 Characteristics and d’Alembert’s Solution 822
   17.4.1 A Nonhomogeneous Wave Equation 825
   17.4.2 Forward and Backward Waves 828
17.5 Normal Modes of Vibration of a Circular Elastic Membrane 831
17.6 Vibrations of a Circular Elastic Membrane, Revisited 834
17.7 Vibrations of a Rectangular Membrane 837

Chapter 18  The Heat Equation  841
18.1 The Heat Equation and Initial and Boundary Conditions 841
18.2 Fourier Series Solutions of the Heat Equation 844
   18.2.1 Ends of the Bar Kept at Temperature Zero 844
   18.2.2 Temperature in a Bar with Insulated Ends 847
   18.2.3 Temperature Distribution in a Bar with Radiating End 848
   18.2.4 Transformations of Boundary Value Problems Involving the Heat Equation 851
   18.2.5 A Nonhomogeneous Heat Equation 854
   18.2.6 Effects of Boundary Conditions and Constants on Heat Conduction 857
   18.2.7 Numerical Approximation of Solutions 859
18.3 Heat Conduction in Infinite Media 865
   18.3.1 Heat Conduction in an Infinite Bar 865
   18.3.2 Heat Conduction in a Semi-Infinite Bar 868
   18.3.3 Integral Transform Methods for the Heat Equation in an Infinite Medium 869
18.4 Heat Conduction in an Infinite Cylinder 873
18.5 Heat Conduction in a Rectangular Plate 877

Chapter 19  The Potential Equation  879
19.1 Harmonic Functions and the Dirichlet Problem 879
19.2 Dirichlet Problem for a Rectangle 881
19.3 Dirichlet Problem for a Disk 883
19.4 Poisson’s Integral Formula for the Disk 886
19.5 Dirichlet Problems in Unbounded Regions 888
   19.5.1 Dirichlet Problem for the Upper Half Plane 889
   19.5.2 Dirichlet Problem for the Right Quarter Plane 891
   19.5.3 An Electrostatic Potential Problem 893
19.6 A Dirichlet Problem for a Cube 896
19.7 The Steady-State Heat Equation for a Solid Sphere 898
19.8 The Neumann Problem 902
   19.8.1 A Neumann Problem for a Rectangle 904
   19.8.2 A Neumann Problem for a Disk 906
   19.8.3 A Neumann Problem for the Upper Half Plane 908

PART 7  Complex Analysis  911

Chapter 20  Geometry and Arithmetic of Complex Numbers  913
20.1 Complex Numbers 913
   20.1.1 The Complex Plane 914
   20.1.2 Magnitude and Conjugate 915
   20.1.3 Complex Division 916
   20.1.4 Inequalities 917
   20.1.5 Argument and Polar Form of a Complex Number 918
   20.1.6 Ordering 920
20.2 Loci and Sets of Points in the Complex Plane 921
   20.2.1 Distance 922
   20.2.2 Circles and Disks 922
   20.2.3 The Equation |z − a| = |z − b| 923
   20.2.4 Other Loci 925
   20.2.5 Interior Points, Boundary Points, and Open and Closed Sets 925
   20.2.6 Limit Points 929
   20.2.7 Complex Sequences 931
   20.2.8 Subsequences 934
   20.2.9 Compactness and the Bolzano-Weierstrass Theorem 935

Chapter 21  Complex Functions  939
21.1 Limits, Continuity, and Derivatives 939
   21.1.1 Limits 939
   21.1.2 Continuity 941
   21.1.3 The Derivative of a Complex Function 943
   21.1.4 The Cauchy–Riemann Equations 945
21.2 Power Series 950
   21.2.1 Series of Complex Numbers 951
   21.2.2 Power Series 952
21.3 The Exponential and Trigonometric Functions 957
21.4 The Complex Logarithm 966
21.5 Powers 969
   21.5.1 Integer Powers 969
   21.5.2 z^(1/n) for Positive Integer n 969
   21.5.3 Rational Powers 971
   21.5.4 Powers z^w 972

Chapter 22  Complex Integration  975
22.1 Curves in the Plane 975
22.2 The Integral of a Complex Function 980
   22.2.1 The Complex Integral in Terms of Real Integrals 983
   22.2.2 Properties of Complex Integrals 985
   22.2.3 Integrals of Series of Functions 988
22.3 Cauchy’s Theorem 990
   22.3.1 Proof of Cauchy’s Theorem for a Special Case 993
22.4 Consequences of Cauchy’s Theorem 994
   22.4.1 Independence of Path 994
   22.4.2 The Deformation Theorem 995
   22.4.3 Cauchy’s Integral Formula 997
   22.4.4 Cauchy’s Integral Formula for Higher Derivatives 1000
   22.4.5 Bounds on Derivatives and Liouville’s Theorem 1001
   22.4.6 An Extended Deformation Theorem 1002

Chapter 23  Series Representations of Functions  1007
23.1 Power Series Representations 1007
   23.1.1 Isolated Zeros and the Identity Theorem 1012
   23.1.2 The Maximum Modulus Theorem 1016
23.2 The Laurent Expansion 1019

Chapter 24  Singularities and the Residue Theorem  1023
24.1 Singularities 1023
24.2 The Residue Theorem 1030
24.3 Some Applications of the Residue Theorem 1037
   24.3.1 The Argument Principle 1037
   24.3.2 An Inversion for the Laplace Transform 1039
   24.3.3 Evaluation of Real Integrals 1040

Chapter 25  Conformal Mappings  1055
25.1 Functions as Mappings 1055
25.2 Conformal Mappings 1062
   25.2.1 Linear Fractional Transformations 1064
25.3 Construction of Conformal Mappings Between Domains 1072
   25.3.1 Schwarz–Christoffel Transformation 1077
25.4 Harmonic Functions and the Dirichlet Problem 1080
   25.4.1 Solution of Dirichlet Problems by Conformal Mapping 1083
25.5 Complex Function Models of Plane Fluid Flow 1087

PART 8  Probability and Statistics  1097

Chapter 26  Counting and Probability  1099
26.1 The Multiplication Principle 1099
26.2 Permutations 1102
26.3 Choosing r Objects from n Objects 1104
   26.3.1 r Objects from n Objects, with Order 1104
   26.3.2 r Objects from n Objects, without Order 1106
   26.3.3 Tree Diagrams 1107
26.4 Events and Sample Spaces 1112
26.5 The Probability of an Event 1116
26.6 Complementary Events 1121
26.7 Conditional Probability 1122
26.8 Independent Events 1126
   26.8.1 The Product Rule 1128
26.9 Tree Diagrams in Computing Probabilities 1130
26.10 Bayes’ Theorem 1134
26.11 Expected Value 1139

Chapter 27  Statistics  1143
27.1 Measures of Center and Variation 1143
   27.1.1 Measures of Center 1143
   27.1.2 Measures of Variation 1146
27.2 Random Variables and Probability Distributions 1150
27.3 The Binomial and Poisson Distributions 1154
   27.3.1 The Binomial Distribution 1154
   27.3.2 The Poisson Distribution 1157
27.4 A Coin Tossing Experiment, Normally Distributed Data, and the Bell Curve 1159
   27.4.1 The Standard Bell Curve 1174
   27.4.2 The 68, 95, 99.7 Rule 1176
27.5 Sampling Distributions and the Central Limit Theorem 1178
27.6 Confidence Intervals and Estimating Population Proportion 1185
27.7 Estimating Population Mean and the Student t Distribution 1190
27.8 Correlation and Regression 1194

Answers and Solutions to Selected Problems A1
Index I1
Preface
This Sixth Edition of Advanced Engineering Mathematics maintains the primary goal of previous editions: to engage much of the post-calculus mathematics needed and used by scientists, engineers, and applied mathematicians, in a setting that is helpful to both students and faculty. The format used throughout begins with the correct development of concepts such as Fourier series and integrals, conformal mappings, and special functions. These ideas are then brought to bear on applications and models of important phenomena, such as wave and heat propagation and filtering of signals.

This edition differs from the previous one primarily in the inclusion of statistics and numerical methods. The statistics part treats random variables, normally distributed data, bell curves, the binomial, Poisson, and Student t-distributions, the central limit theorem, confidence intervals, correlation, and regression. This is preceded by prerequisite topics from probability and techniques of enumeration. The numerical methods are applied to initial value problems in ordinary differential equations, including a proposal for radioactive waste disposal, and to boundary value problems involving the heat and wave equations.

Finally, in order to include these topics without lengthening the book, some items from the fifth edition have been moved to a website, located at http://engineering.thomsonlearning.com. I hope that this provides convenient accessibility. Material selected for this move includes some biographies and historical notes, predator/prey and competing species models, the theory underlying the efficiency of the FFT, and some selected examples and problems. The chart on the following page offers a complete organizational overview.
Acknowledgments

This book is the result of a team effort involving much more than an author. Among those to whom I owe a debt of appreciation are Chris Carson, Joanne Woods, Hilda Gowans and Kamilah Reid-Burrell of Thomson Engineering, and Rose Kernan and the professionals at RPK Editorial Services, Inc. I also want to thank Dr. Thomas O’Neil of the California Polytechnic State University for material he contributed, and Rich Jones, who had the vision for the first edition of this book many years ago. Finally, I want to acknowledge the reviewers, whose suggestions for improvements and clarifications are much appreciated:

Preliminary Review

Panagiotis Dimitrakopoulos, University of Maryland
Mohamed M. Hafez, University of California, Davis
Jennifer Hopwood, University of Western Australia
Nun Kwan Yip, Purdue University
[Organizational Overview chart: a flow diagram relating the book's topic areas: Ordinary Differential Equations; Laplace Transforms; Series Solutions; Systems of Ordinary Differential Equations; Special Functions; Vectors, Matrices, Determinants; Systems of Algebraic Equations; Statistical Analysis; Eigenfunction Expansions, Completeness; Haar Wavelets; Vector Analysis; Qualitative Methods, Stability, Analysis of Critical Points; Probability; Partial Differential Equations; Statistics; Mathematical Models; Fourier Analysis; Fourier Series, Integrals; Fourier Transforms; Discrete Fourier Transform; Complex Analysis.]
Draft Review
Sabri Abou-Ward, University of Toronto
Craig Hildebrand, California State University – Fresno
Seiichi Nomura, University of Texas, Arlington
David L. Russell, Virginia Polytechnic Institute and State University
Y.Q. Sheng, McMaster University
Peter V. O’Neil
University of Alabama at Birmingham
PART 1  Ordinary Differential Equations

CHAPTER 1  First-Order Differential Equations
CHAPTER 2  Second-Order Differential Equations
CHAPTER 3  The Laplace Transform
CHAPTER 4  Series Solutions
CHAPTER 5  Numerical Approximation of Solutions
A differential equation is an equation that contains one or more derivatives. For example,

y′(x) + y(x) = 4 sin(3x)

and

d⁴w/dt⁴ − w(t)² = e^(−t)

are differential equations. These are ordinary differential equations because they involve only total derivatives, rather than partial derivatives. Differential equations are interesting and important because they express relationships involving rates of change. Such relationships form the basis for developing ideas and studying phenomena in the sciences, engineering, economics, and increasingly in other areas, such as the business world and the stock market. We will see examples of applications as we learn more about differential equations.
The order of a differential equation is the order of its highest derivative. The first example given above is of second order, while the second is of fourth order. The equation xy′ − y² = e^x is of first order.

A solution of a differential equation is any function that satisfies it. A solution may be defined on the entire real line, or on only part of it, often an interval. For example, y = sin(2x) is a solution of

y″ + 4y = 0

because, by direct differentiation,

y″ + 4y = −4 sin(2x) + 4 sin(2x) = 0.

This solution is defined for all x (that is, on the whole real line). By contrast, y = x ln(x) − x is a solution of

y′ = y/x + 1,

but this solution is defined only for x > 0. Indeed, the coefficient 1/x of y in this equation means that x = 0 is disallowed from the start.

We now begin a systematic development of ordinary differential equations, starting with the first-order case.
CHAPTER 1  First-Order Differential Equations

PRELIMINARY CONCEPTS · SEPARABLE EQUATIONS · HOMOGENEOUS, BERNOULLI, AND RICCATI EQUATIONS · APPLICATIONS TO MECHANICS, ELECTRICAL CIRCUITS, AND ORTHOGONAL TRAJECTORIES · EXISTENCE AND UNIQUENESS FOR SOLUTIONS OF INITIAL VALUE PROBLEMS
1.1 Preliminary Concepts

Before developing techniques for solving various kinds of differential equations, we will develop some terminology and geometric insight.
1.1.1 General and Particular Solutions

A first-order differential equation is any equation involving a first derivative, but no higher derivative. In its most general form, it has the appearance

F(x, y, y′) = 0    (1.1)

in which y(x) is the function of interest and x is the independent variable. Examples are

y′ − y² − e^y = 0,
y′ − 2 = 0,

and

y′ − cos(x) = 0.

Note that y′ must be present for an equation to qualify as a first-order differential equation, but x and/or y need not occur explicitly.

A solution of equation (1.1) on an interval I is a function φ that satisfies the equation for all x in I. That is,

F(x, φ(x), φ′(x)) = 0  for all x in I.

For example,

φ(x) = 2 + ke^(−x)
is a solution of y′ + y = 2 for all real x, and for any number k. Here I can be chosen as the entire real line. And

φ(x) = x ln(x) + cx

is a solution of

y′ = y/x + 1

for all x > 0, and for any number c.

In both of these examples, the solution contained an arbitrary constant. This is a symbol independent of x and y that can be assigned any numerical value. Such a solution is called the general solution of the differential equation. Thus φ(x) = 2 + ke^(−x) is the general solution of y′ + y = 2.

Each choice of the constant in the general solution yields a particular solution. For example,

f(x) = 2 + e^(−x),  g(x) = 2 − e^(−x),  and  h(x) = 2 − √53 e^(−x)

are all particular solutions of y′ + y = 2, obtained by choosing, respectively, k = 1, −1, and −√53 in the general solution.
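Claims of this kind are easy to spot-check numerically. The following sketch (Python, not part of the text; the book itself relies on packages such as MAPLE, MATHEMATICA, or MATLAB) approximates y′ by a central difference and confirms that each of these particular solutions satisfies y′ + y = 2:

```python
import math

def phi(x, k):
    # Candidate general solution of y' + y = 2.
    return 2 + k * math.exp(-x)

def residual(x, k, h=1e-5):
    # Approximate phi'(x) + phi(x) - 2; it should be (nearly) zero
    # if phi really solves the equation.
    dphi = (phi(x + h, k) - phi(x - h, k)) / (2 * h)
    return dphi + phi(x, k) - 2

# The residual vanishes for every tested k and x, matching the claim
# that the whole one-parameter family solves the equation.
for k in (1.0, -1.0, -math.sqrt(53)):
    for x in (-2.0, 0.0, 3.5):
        assert abs(residual(x, k)) < 1e-6
```

Because k cancels out of y′ + y identically, any other value of k would pass the same check.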
1.1.2 Implicitly Defined Solutions

Sometimes we can write a solution explicitly, giving y as a function of x. For example,

y = ke^(−x)

is the general solution of y′ = −y, as can be verified by substitution. This general solution is explicit, with y isolated on one side of an equation and a function of x on the other.

By contrast, consider

y′ = −(2xy³ + 2)/(3x²y² + 8e^(4y)).

We claim that the general solution is the function y(x) implicitly defined by the equation

x²y³ + 2x + 2e^(4y) = k    (1.2)

in which k can be any number. To verify this, implicitly differentiate equation (1.2) with respect to x, remembering that y is a function of x. We obtain

2xy³ + 3x²y²y′ + 2 + 8e^(4y)y′ = 0,

and solving for y′ yields the differential equation. In this example we are unable to solve equation (1.2) explicitly for y as a function of x, isolating y on one side. Equation (1.2), implicitly defining the general solution, was obtained by a technique we will develop shortly, but this technique cannot guarantee an explicit solution.
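The claim can also be checked numerically. The left side of equation (1.2) has y-derivative 3x²y² + 8e^(4y) > 0, so it is strictly increasing in y; for a fixed k we can therefore recover y(x) by bisection and compare the slope of the recovered curve with the right side of the differential equation. A hypothetical Python sketch:

```python
import math

def F(x, y):
    # Left side of the implicit relation x^2 y^3 + 2x + 2 e^{4y} = k.
    return x**2 * y**3 + 2 * x + 2 * math.exp(4 * y)

def y_on_curve(x, k):
    # Solve F(x, y) = k for y by bisection; valid because F is
    # strictly increasing in y (dF/dy = 3x^2 y^2 + 8 e^{4y} > 0).
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if F(x, mid) < k:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x0, y0 = 1.0, 0.5
k = F(x0, y0)          # choose k so that (x0, y0) lies on the curve
h = 1e-5
slope = (y_on_curve(x0 + h, k) - y_on_curve(x0 - h, k)) / (2 * h)
rhs = -(2 * x0 * y0**3 + 2) / (3 * x0**2 * y0**2 + 8 * math.exp(4 * y0))
assert abs(slope - rhs) < 1e-5   # numerical slope matches y' from the ODE
```

The slope of the implicitly defined curve at (1, 0.5) agrees with the differential equation, as implicit differentiation predicts.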
1.1.3 Integral Curves

A graph of a solution of a first-order differential equation is called an integral curve of the equation. If we know the general solution, we obtain an infinite family of integral curves, one for each choice of the arbitrary constant.
EXAMPLE 1.1

We have seen that the general solution of y′ + y = 2 is

y = 2 + ke^(−x)

for all x. The integral curves of y′ + y = 2 are graphs of y = 2 + ke^(−x) for different choices of k. Some of these are shown in Figure 1.1.

[FIGURE 1.1  Integral curves of y′ + y = 2 for k = 0, 3, −3, 6, and −6.]
EXAMPLE 1.2

It is routine to verify that the general solution of

y′ + y/x = e^x

is

y = (1/x)(xe^x − e^x + c)

for x ≠ 0. Graphs of some of these integral curves, obtained by making choices for c, are shown in Figure 1.2.

[FIGURE 1.2  Integral curves of y′ + (1/x)y = e^x for c = 0, 5, 20, −6, and −10.]
We will see shortly how these general solutions are obtained. For the moment, we simply want to illustrate integral curves. Although in simple cases integral curves can be sketched by hand, generally we need computer assistance. Computer packages such as MAPLE, MATHEMATICA and MATLAB are widely available. Here is an example in which the need for computing assistance is clear.
EXAMPLE 1.3

The differential equation y′ + xy = 2 has general solution

y(x) = e^(−x²/2) ∫₀^x 2e^(ξ²/2) dξ + ke^(−x²/2).

Figure 1.3 shows computer-generated integral curves corresponding to k = 0, 4, 13, −7, −15 and −11.
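The integral in this formula has no elementary antiderivative, but the solution can still be evaluated and checked numerically. In this sketch (hypothetical; the text's curves are generated with a computer algebra package), the integral is approximated by the midpoint rule and y′ by a central difference:

```python
import math

def midpoint_integral(f, a, b, n=2000):
    # Composite midpoint rule for the integral of f over [a, b].
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

def y(x, k):
    # y(x) = e^{-x^2/2} ( integral_0^x 2 e^{s^2/2} ds + k )
    integral = midpoint_integral(lambda s: 2 * math.exp(s * s / 2), 0.0, x)
    return math.exp(-x * x / 2) * (integral + k)

def residual(x, k, h=1e-4):
    # Approximate y' + x y - 2, which should be zero for a solution.
    dy = (y(x + h, k) - y(x - h, k)) / (2 * h)
    return dy + x * y(x, k) - 2

for k in (0.0, 4.0, -7.0):
    for x in (0.5, 1.5):
        assert abs(residual(x, k)) < 1e-4
```

Differentiating the formula by hand gives y′ = −xy + 2 directly; the numerical check simply confirms the bookkeeping.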
1.1.4 The Initial Value Problem

The general solution of a first-order differential equation F(x, y, y′) = 0 contains an arbitrary constant, hence there is an infinite family of integral curves, one for each choice of the constant. If we specify that a solution is to pass through a particular point (x₀, y₀), then we must find that particular integral curve (or curves) passing through this point. This is called an initial value problem. Thus, a first-order initial value problem has the form

F(x, y, y′) = 0;  y(x₀) = y₀

in which x₀ and y₀ are given numbers. The condition y(x₀) = y₀ is called an initial condition.
[FIGURE 1.3  Integral curves of y′ + xy = 2 for k = 0, 4, 13, −7, −15, and −11.]
EXAMPLE 1.4

Consider the initial value problem

y′ + y = 2;  y(1) = −5.

From Example 1.1, the general solution of y′ + y = 2 is

y = 2 + ke^(−x).

Graphs of this equation are the integral curves. We want the one passing through (1, −5). Solve for k so that

y(1) = 2 + ke^(−1) = −5,

obtaining

k = −7e.

The solution of this initial value problem is

y = 2 − 7e·e^(−x) = 2 − 7e^(−(x−1)).

As a check, y(1) = 2 − 7 = −5.

The effect of the initial condition in this example was to pick out one special integral curve as the solution sought. This suggests that an initial value problem may be expected to have a unique solution. We will see later that this is the case, under mild conditions on the coefficients in the differential equation.
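One can also watch the initial condition pick out its integral curve numerically: starting at (1, −5) and repeatedly stepping along the slopes prescribed by y′ = 2 − y reproduces the same particular solution. A hypothetical Python sketch using the forward Euler method (developed in Chapter 5):

```python
import math

def euler(f, x0, y0, x_end, n):
    # Forward Euler: from (x0, y0), take n small steps along the slopes f(x, y).
    h = (x_end - x0) / n
    x, y = x0, y0
    for _ in range(n):
        y += h * f(x, y)
        x += h
    return y

f = lambda x, y: 2 - y                        # y' + y = 2, solved for y'
exact = lambda x: 2 - 7 * math.exp(-(x - 1))  # the particular solution

# Starting from the initial condition y(1) = -5, the numerical curve
# tracks the one integral curve selected by that condition.
approx = euler(f, 1.0, -5.0, 3.0, 20000)
assert abs(approx - exact(3.0)) < 1e-3
```

Starting from any other initial point would track a different member of the family y = 2 + ke^(−x), which is exactly what "the initial condition selects the integral curve" means.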
1.1.5 Direction Fields
Imagine a curve, as in Figure 1.4. If we choose some points on the curve and, at each point, draw a segment of the tangent to the curve there, then these segments give a rough outline of the shape of the curve. This simple observation is the key to a powerful device for envisioning integral curves of a differential equation.
CHAPTER 1 First-Order Differential Equations

FIGURE 1.4 Short tangent segments suggest the shape of the curve.
The general first-order differential equation has the form

F(x, y, y′) = 0.

Suppose we can solve for y′ and write the differential equation as

y′ = f(x, y).

Here f is a known function. Suppose f(x, y) is defined for all points (x, y) in some region R of the plane. The slope of the integral curve through a given point (x₀, y₀) of R is y′(x₀), which equals f(x₀, y₀). If we compute f(x, y) at selected points in R, and draw a small line segment having slope f(x, y) at each (x, y), we obtain a collection of segments which trace out the shapes of the integral curves. This enables us to obtain important insight into the behavior of the solutions (such as where solutions are increasing or decreasing, limits they might have at various points, or behavior as x increases).

A drawing of the plane, with short line segments of slope f(x, y) drawn at selected points (x, y), is called a direction field of the differential equation y′ = f(x, y). The name derives from the fact that at each point the line segment gives the direction of the integral curve through that point. The line segments are called lineal elements.
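The slope data behind a direction field can be tabulated directly from f(x, y). The following Python sketch is ours, not the book's (the helper name `lineal_elements` is invented for illustration); a plotting package would then draw a short segment of the stated slope at each grid point.

```python
def lineal_elements(f, xs, ys):
    """Return (x, y, slope) triples: the slope of the integral curve
    through the grid point (x, y) is f(x, y)."""
    return [(x, y, f(x, y)) for x in xs for y in ys]

# Direction-field data for y' = y^2 (Example 1.5) on a small grid
grid = lineal_elements(lambda x, y: y * y, [-2, -1, 0, 1, 2], [-2, -1, 0, 1, 2])
# Each triple (x, y, m) would be drawn as a short segment of slope m at (x, y).
```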
EXAMPLE 1.5
Consider the equation

y′ = y².

Here f(x, y) = y², so the slope of the integral curve through (x, y) is y². Select some points and, through each, draw a short line segment having slope y². A computer-generated direction field is shown in Figure 1.5(a). The lineal elements form a profile of some integral curves and give us some insight into the behavior of solutions, at least in this part of the plane. Figure 1.5(b) reproduces this direction field, with graphs of the integral curves through (0, 1), (0, 2), (0, 3), (0, −1), (0, −2), and (0, −3). By a method we will develop, the general solution of y′ = y² is

y = −1/(x + k),

so the integral curves form a family of hyperbolas, as suggested by the curves sketched in Figure 1.5(b).
FIGURE 1.5(a) A direction field for y′ = y².
FIGURE 1.5(b) Direction field for y′ = y² and integral curves through (0, 1), (0, 2), (0, 3), (0, −1), (0, −2), and (0, −3).
EXAMPLE 1.6
Figure 1.6 shows a direction field for y′ = sin(xy), together with the integral curves through (0, 1), (0, 2), (0, 3), (0, −1), (0, −2), and (0, −3). In this case, we cannot write a simple expression for the general solution, and the direction field provides information about the behavior of solutions that is not otherwise readily apparent.

FIGURE 1.6 Direction field for y′ = sin(xy) and integral curves through (0, 1), (0, 2), (0, 3), (0, −1), (0, −2), and (0, −3).
With this as background, we will begin a program of identifying special classes of first-order differential equations for which there are techniques for writing the general solution. This will occupy the next five sections.
SECTION 1.1
PROBLEMS
In each of Problems 1 through 6, determine whether the given function is a solution of the differential equation.

1. 2yy′ = 1; φ(x) = √(x − 1) for x > 1
2. y′ + y = 0; φ(x) = Ce^(−x)
3. y′ = −(2y + eˣ)/(2x); φ(x) = (C − eˣ)/(2x) for x > 0
4. y′ = 2xy/(2 − x²) for x ≠ ±√2; φ(x) = C/(x² − 2)
5. xy′ = x − y; φ(x) = (x² − 3)/(2x) for x ≠ 0
6. y′ + y = 1; φ(x) = 1 + Ce^(−x)

In each of Problems 7 through 11, verify by implicit differentiation that the given equation implicitly defines a solution of the differential equation.

7. y² + xy − 2x² − 3x − 2y = C; (y − 4x − 3) + (x + 2y − 2)y′ = 0
8. xy³ − y = C; y³ + (3xy² − 1)y′ = 0
9. y² − 4x² + e^(xy) = C; 8x − ye^(xy) − (2y + xe^(xy))y′ = 0
10. 8 ln(x − 2y + 4) − 2x + 6y = C; y′ = (x − 2y)/(3x − 6y + 4)
11. tan⁻¹(y/x) + x² = C; (2x³ + 2xy² − y)/(x² + y²) + (x/(x² + y²))y′ = 0

In each of Problems 12 through 16, solve the initial value problem and graph the solution. Hint: Each of these differential equations can be solved by direct integration. Use the initial condition to solve for the constant of integration.

12. y′ = 2x; y(2) = 1
13. y′ = e^(−x); y(0) = 2
14. y′ = 2x + 2; y(−1) = 1
15. y′ = 4 cos(x)sin(x); y(π/2) = 0
16. y′ = 8x + cos(2x); y(0) = −3

In each of Problems 17 through 20, draw some lineal elements of the differential equation for −4 ≤ x ≤ 4, −4 ≤ y ≤ 4. Use the resulting direction field to sketch a graph of the solution of the initial value problem. (These problems can be done by hand.)

17. y′ = x + y; y(2) = 2
18. y′ = x − xy; y(0) = −1
19. y′ = xy; y(0) = 2
20. y′ = x − y + 1; y(0) = 1

In each of Problems 21 through 26, generate a direction field and some integral curves for the differential equation. Also draw the integral curve representing the solution of the initial value problem. These problems should be done by a software package.

21. y′ = sin(y); y(1) = π/2
22. y′ = x cos(2x) − y; y(1) = 0
23. y′ = y sin(x) − 3x²; y(0) = 1
24. y′ = eˣ − y; y(−2) = 1
25. y′ − y cos(x) = 1 − x²; y(2) = 2
26. y′ = 2y + 3; y(0) = 1

27. Show that, for the differential equation y′ + p(x)y = q(x), the lineal elements on any vertical line x = x₀, with p(x₀) ≠ 0, all pass through the single point (ξ, η), where

ξ = x₀ + 1/p(x₀)  and  η = q(x₀)/p(x₀).
1.2 Separable Equations

DEFINITION 1.1 Separable Differential Equation
A differential equation is called separable if it can be written

y′ = A(x)B(y).

In this event, we can separate the variables and write, in differential form,

(1/B(y)) dy = A(x) dx,

wherever B(y) ≠ 0. We attempt to integrate this equation, writing

∫ (1/B(y)) dy = ∫ A(x) dx.

This yields an equation in x, y, and a constant of integration. This equation implicitly defines the general solution y(x). It may or may not be possible to solve explicitly for y(x).
EXAMPLE 1.7
y′ = y²e^(−x) is separable. Write

dy = y²e^(−x) dx

as

(1/y²) dy = e^(−x) dx

for y ≠ 0. Integrate this equation to obtain

−1/y = −e^(−x) + k,

an equation that implicitly defines the general solution. In this example we can explicitly solve for y, obtaining the general solution

y = 1/(e^(−x) − k).

Now recall that we required that y ≠ 0 in order to separate the variables by dividing by y². In fact, the zero function y(x) = 0 is a solution of y′ = y²e^(−x), although it cannot be obtained from the general solution by any choice of k. For this reason, y(x) = 0 is called a singular solution of this equation. Figure 1.7 shows graphs of particular solutions obtained by choosing k as 0, 3, −3, 6, and −6.
FIGURE 1.7 Integral curves of y′ = y²e^(−x) for k = 0, 3, −3, 6, and −6.
Whenever we use separation of variables, we must be alert to solutions potentially lost through conditions imposed by the algebra used to make the separation.
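The general solution found in Example 1.7 can be spot-checked numerically. This Python sketch is a supplement, not from the text: it approximates y′ by a central difference and verifies that y = 1/(e^(−x) − k) satisfies y′ = y²e^(−x) for a couple of sample values of the constant.

```python
import math

def y(x, k):
    # General solution of y' = y^2 e^{-x} obtained in Example 1.7 by separating variables
    return 1.0 / (math.exp(-x) - k)

h = 1e-6
for k in (-3.0, -6.0):              # sample constants keeping the denominator nonzero
    for x in (0.0, 0.5, 1.0):
        dydx = (y(x + h, k) - y(x - h, k)) / (2 * h)   # central-difference derivative
        assert abs(dydx - y(x, k) ** 2 * math.exp(-x)) < 1e-5
```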
EXAMPLE 1.8
x²y′ = 1 + y is separable, and we can write

(1/(1 + y)) dy = (1/x²) dx.

The algebra of separation has required that x ≠ 0 and y ≠ −1, even though we can put x = 0 and y = −1 into the differential equation to obtain the correct equation 0 = 0. Now integrate the separated equation to obtain

ln|1 + y| = −1/x + k.

This implicitly defines the general solution. In this case, we can solve for y(x) explicitly. Begin by taking the exponential of both sides to obtain

|1 + y| = e^k e^(−1/x) = Ae^(−1/x),

in which we have written A = e^k. Since k could be any number, A can be any positive number. Then

1 + y = ±Ae^(−1/x) = Be^(−1/x),

in which B = ±A can be any nonzero number. The general solution is

y = −1 + Be^(−1/x),

in which B is any nonzero number.

Now revisit the assumption that x ≠ 0 and y ≠ −1. In the general solution, we actually obtain y = −1 if we allow B = 0. Further, the constant function y(x) = −1 does satisfy x²y′ = 1 + y. Thus, by allowing B to be any number, including 0, the general solution y(x) = −1 + Be^(−1/x) contains all the solutions we have found. In this example, y = −1 is a solution, but not a singular solution, since it occurs as a special case of the general solution. Figure 1.8 shows graphs of solutions corresponding to B = −8, −5, 0, 4, and 7.
FIGURE 1.8 Integral curves of x²y′ = 1 + y for B = 0, 4, 7, −5, and −8.
We often solve an initial value problem by finding the general solution of the differential equation, then solving for the appropriate choice of the constant.
EXAMPLE 1.9
Solve the initial value problem

y′ = y²e^(−x);  y(1) = 4.

We know from Example 1.7 that the general solution of y′ = y²e^(−x) is

y(x) = 1/(e^(−x) − k).

Now we need to choose k so that

y(1) = 1/(e^(−1) − k) = 4,

from which we get

k = e^(−1) − 1/4.

The solution of the initial value problem is

y(x) = 1/(e^(−x) + 1/4 − e^(−1)).
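As a supplement (not from the text), the solution of Example 1.9 can be verified numerically: check the initial condition directly, and check the differential equation with a central-difference derivative at a sample point.

```python
import math

def y(x):
    # Solution of y' = y^2 e^{-x}, y(1) = 4, from Example 1.9
    return 1.0 / (math.exp(-x) + 0.25 - math.exp(-1))

assert abs(y(1.0) - 4.0) < 1e-9                 # initial condition y(1) = 4

h = 1e-6
dydx = (y(1.5 + h) - y(1.5 - h)) / (2 * h)      # numerical derivative at x = 1.5
assert abs(dydx - y(1.5) ** 2 * math.exp(-1.5)) < 1e-4   # ODE is satisfied
```

Note that the denominator vanishes near x ≈ 2.14, so this particular solution blows up there; the check is made safely inside the interval of existence.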
EXAMPLE 1.10
The general solution of

y′ = y(x − 1)²/(y + 3)

is implicitly defined by

y + 3 ln|y| = (1/3)(x − 1)³ + k.    (1.3)

To obtain the solution satisfying y(3) = −1, put x = 3 and y = −1 into equation (1.3) to obtain

−1 = (1/3)2³ + k,

hence

k = −11/3.

The solution of this initial value problem is implicitly defined by

y + 3 ln|y| = (1/3)(x − 1)³ − 11/3.
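Even though Example 1.10's solution is only implicit, it can be checked numerically. This sketch is ours, not the book's (the helper names `F` and `solve_y` are invented): bisection recovers y(x) from the implicit relation on the branch through (3, −1), and a finite difference confirms the slope predicted by the differential equation.

```python
import math

def F(x, y):
    # Implicit solution from Example 1.10 with k = -11/3, written as F(x, y) = 0:
    # y + 3 ln|y| - (1/3)(x - 1)^3 + 11/3 = 0
    return y + 3 * math.log(abs(y)) - (x - 1) ** 3 / 3 + 11 / 3

def solve_y(x, lo=-2.0, hi=-0.5):
    # Bisection for the root y(x) of F(x, y) = 0 on the branch containing y(3) = -1
    for _ in range(80):
        mid = (lo + hi) / 2
        if F(x, lo) * F(x, mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

assert abs(F(3.0, -1.0)) < 1e-12     # the initial point satisfies the relation

# Implicit differentiation of (1.3) gives y' = y(x - 1)^2 / (y + 3); check it at x = 3.1
h = 1e-5
yv = solve_y(3.1)
dydx = (solve_y(3.1 + h) - solve_y(3.1 - h)) / (2 * h)
assert abs(dydx - yv * (3.1 - 1) ** 2 / (yv + 3)) < 1e-3
```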
1.2.1 Some Applications of Separable Differential Equations
Separable equations arise in many contexts, of which we will discuss three.
EXAMPLE 1.11
(The Mathematical Policewoman) A murder victim is discovered, and a lieutenant from the forensic science laboratory is summoned to estimate the time of death. The body is located in a room that is kept at a constant 68 degrees Fahrenheit. For some time after the death, the body will radiate heat into the cooler room, causing the body's temperature to decrease. Assuming (for want of better information) that the victim's temperature was a "normal" 98.6 degrees at the time of death, the lieutenant will try to estimate this time by observing the body's current temperature and calculating how long it would have had to lose heat to reach this point.

According to Newton's law of cooling, the body will radiate heat energy into the room at a rate proportional to the difference in temperature between the body and the room. If T(t) is the body temperature at time t, then for some constant of proportionality k,

T′(t) = k(T(t) − 68).

The lieutenant recognizes this as a separable differential equation and writes

(1/(T − 68)) dT = k dt.

Upon integrating, she gets

ln|T − 68| = kt + C.

Taking exponentials, she gets

|T − 68| = e^(kt+C) = Ae^(kt),

in which A = e^C. Then

T − 68 = ±Ae^(kt) = Be^(kt),

so

T(t) = 68 + Be^(kt).

Now the constants k and B must be determined, and this requires information. The lieutenant arrived at 9:40 p.m. and immediately measured the body temperature, obtaining 94.4 degrees. Letting 9:40 be time zero for convenience, this means that

T(0) = 94.4 = 68 + B,

and so B = 26.4. Thus far,

T(t) = 68 + 26.4e^(kt).

To determine k, the lieutenant makes another measurement. At 11:00 she finds that the body temperature is 89.2 degrees. Since 11:00 is 80 minutes past 9:40, this means that

T(80) = 89.2 = 68 + 26.4e^(80k).

Then

e^(80k) = 21.2/26.4,

so

80k = ln(21.2/26.4)

and

k = (1/80) ln(21.2/26.4).

The lieutenant now has the temperature function:

T(t) = 68 + 26.4e^(ln(21.2/26.4)t/80).

In order to find the last time when the body was 98.6 degrees (presumably the time of death), solve for the time in

T(t) = 98.6 = 68 + 26.4e^(ln(21.2/26.4)t/80).

To do this, the lieutenant writes

30.6/26.4 = e^(ln(21.2/26.4)t/80)

and takes the logarithm of both sides to obtain

(t/80) ln(21.2/26.4) = ln(30.6/26.4).

Therefore the time of death, according to this mathematical model, was

t = 80 ln(30.6/26.4)/ln(21.2/26.4),

which is approximately −53.8 minutes. Death occurred approximately 53.8 minutes before (because of the negative sign) the first measurement at 9:40, which was chosen as time zero. This puts the murder at about 8:46 p.m.
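The lieutenant's arithmetic can be reproduced in a few lines. This Python sketch is a supplement to the example, not part of the text; it fits the cooling model to the two measurements and then checks all three temperature readings.

```python
import math

# Newton's law of cooling fitted to the measurements of Example 1.11:
# T(t) = 68 + 26.4 e^{kt}, with t in minutes after 9:40 p.m.
k = math.log(21.2 / 26.4) / 80                            # from T(80) = 89.2
t_death = 80 * math.log(30.6 / 26.4) / math.log(21.2 / 26.4)

assert abs(68 + 26.4 * math.exp(k * 0) - 94.4) < 1e-9     # 9:40 p.m. reading
assert abs(68 + 26.4 * math.exp(k * 80) - 89.2) < 1e-9    # 11:00 p.m. reading
assert abs(68 + 26.4 * math.exp(k * t_death) - 98.6) < 1e-9   # body temperature at death
# t_death is about -53.8: death roughly 54 minutes before 9:40, i.e. about 8:46 p.m.
```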
EXAMPLE 1.12
(Radioactive Decay and Carbon Dating) In radioactive decay, mass is converted to energy by radiation. It has been observed that the rate of change of the mass of a radioactive substance is proportional to the mass itself. This means that, if m(t) is the mass at time t, then for some constant of proportionality k that depends on the substance,

dm/dt = km.

This is a separable differential equation. Write it as

(1/m) dm = k dt

and integrate to obtain

ln|m| = kt + c.

Since mass is positive, |m| = m and

ln(m) = kt + c.

Then

m(t) = e^(kt+c) = Ae^(kt),

in which A can be any positive number.
Determination of A and k for a given element requires two measurements. Suppose at some time, designated as time zero, there are M grams present. This is called the initial mass. Then

m(0) = A = M,

so

m(t) = Me^(kt).

If at some later time T we find that there are M_T grams, then

m(T) = M_T = Me^(kT).

Then

ln(M_T/M) = kT,

hence

k = (1/T) ln(M_T/M).

This gives us k and determines the mass at any time:

m(t) = Me^(ln(M_T/M)t/T).

We obtain a more convenient formula for the mass if we choose the time of the second measurement more carefully. Suppose we make the second measurement at that time T = H at which exactly half of the mass has radiated away. At this time, half of the mass remains, so M_T = M/2 and M_T/M = 1/2. Now the expression for the mass becomes

m(t) = Me^(ln(1/2)t/H),

or

m(t) = Me^(−ln(2)t/H).    (1.4)

This number H is called the half-life of the element. Although we took it to be the time needed for half of the original amount M to decay, in fact, between any times t₁ and t₁ + H, exactly half of the mass of the element present at t₁ will radiate away. To see this, write

m(t₁ + H) = Me^(−ln(2)(t₁+H)/H) = Me^(−ln(2)t₁/H) e^(−ln(2)H/H) = e^(−ln(2)) m(t₁) = (1/2)m(t₁).

Equation (1.4) is the basis for an important technique used to estimate the ages of certain ancient artifacts. The earth's upper atmosphere is constantly bombarded by high-energy cosmic rays, producing large numbers of neutrons, which collide with nitrogen in the air, changing some of it into radioactive carbon-14, or ¹⁴C. This element has a half-life of about 5,730 years. Over the relatively recent period of the history of this planet in which life has evolved, the fraction of ¹⁴C in the atmosphere, compared to regular carbon, has been essentially constant. This means that living matter (plant or animal) has ingested ¹⁴C at about the same rate over a long historical period, and objects living, say, two million years ago would have had the same ratio of carbon-14 to carbon in their bodies as objects alive today. When an organism dies, it ceases its intake of ¹⁴C, which then begins to decay. By measuring the ratio of ¹⁴C to carbon in an artifact, we can estimate the amount of the decay, and hence the time it took, giving an
estimate of the time the organism was alive. This process of estimating the age of an artifact is called carbon dating. Of course, in reality the ratio of ¹⁴C in the atmosphere has only been approximately constant, and in addition a sample may have been contaminated by exposure to other living organisms, or even to the air, so carbon dating is a sensitive process that can lead to controversial results. Nevertheless, when applied rigorously and combined with other tests and information, it has proved a valuable tool in historical and archeological studies.

To apply equation (1.4) to carbon dating, use H = 5730 and compute

ln(2)/H = ln(2)/5730 ≈ 0.000120968,

in which ≈ means "approximately equal" (not all decimal places are listed). Equation (1.4) becomes

m(t) = Me^(−0.000120968t).

Now suppose we have an artifact, say a piece of fossilized wood, and measurements show that the ratio of ¹⁴C to carbon in the sample is 37 percent of the current ratio. If we say that the wood died at time 0, then we want to compute the time T it would take for one gram of the radioactive carbon to decay this amount. Thus, solve for T in

0.37 = e^(−0.000120968T).

We find that

T = −ln(0.37)/0.000120968 ≈ 8,219

years. This is a little less than one and one-half half-lives, a reasonable estimate if nearly 2/3 of the ¹⁴C has decayed.
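The carbon-dating computation above reduces to two lines of arithmetic. The following Python sketch (a supplement, not from the text) reproduces the decay constant and the estimated age of the sample.

```python
import math

H = 5730                        # half-life of carbon-14 in years
decay = math.log(2) / H         # decay constant, approximately 0.000120968 per year
T = -math.log(0.37) / decay     # age of a sample retaining 37% of its carbon-14

assert abs(decay - 0.000120968) < 1e-9
# T is about 8,219 years, a little under one and one-half half-lives.
```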
EXAMPLE 1.13
(Torricelli's Law) Suppose we want to estimate how long it will take for a container to empty by discharging fluid through a drain hole. This is a simple enough problem for, say, a soda can, but not quite so easy for a large oil storage tank or chemical facility.

We need two principles from physics. The first is that the rate of discharge of a fluid flowing through an opening at the bottom of a container is given by

dV/dt = −kAv,

in which V(t) is the volume of fluid in the container at time t, v(t) is the discharge velocity of fluid through the opening, A is the cross-sectional area of the opening (assumed constant), and k is a constant determined by the viscosity of the fluid, the shape of the opening, and the fact that the cross-sectional area of fluid pouring out of the opening is slightly less than that of the opening itself. In practice, k must be determined for the particular fluid, container, and opening, and is a number between 0 and 1.

We also need Torricelli's law, which states that v(t) is equal to the velocity of a free-falling particle released from a height equal to the depth of the fluid at time t. (Free-falling means that the particle is influenced by gravity only.) Now the work done by gravity in moving the particle from its initial point by a distance h(t) is mgh(t), and this must equal the change in the kinetic energy, (1/2)mv². Therefore,

v(t) = √(2gh(t)).
FIGURE 1.9 Hemispherical tank of radius 18 feet; h is the depth of the water and r the radius of its surface.
Putting these two equations together yields

dV/dt = −kA√(2gh(t)).    (1.5)

We will apply equation (1.5) to a specific case to illustrate its use. Suppose we have a hemispherical tank of water, as in Figure 1.9. The tank has radius 18 feet, and water drains through a circular hole of radius 3 inches at the bottom. How long will it take the tank to empty?

Equation (1.5) contains two unknown functions, V(t) and h(t), so one must be eliminated. Let r(t) be the radius of the surface of the fluid at time t and consider an interval of time from t₀ to t₁ = t₀ + Δt. The volume ΔV of water draining from the tank in this time equals the volume of a disk of thickness Δh (the change in depth) and radius r(t*), for some t* between t₀ and t₁. Therefore

ΔV = πr(t*)²Δh,

so

ΔV/Δt = πr(t*)² Δh/Δt.

In the limit as Δt → 0,

dV/dt = πr² dh/dt.

Putting this into equation (1.5) yields

πr² dh/dt = −kA√(2gh).

Now V has been eliminated, but at the cost of introducing r(t). However, from Figure 1.9,

r² = 18² − (18 − h)² = 36h − h²,

so

π(36h − h²) dh/dt = −kA√(2gh).

This is a separable differential equation, which we write as

π((36h − h²)/h^(1/2)) dh = −kA√(2g) dt.

Take g to be 32 feet per second per second. The radius of the circular opening is 3 inches, or 1/4 foot, so its area is A = π/16 square feet. For water, and an opening of this shape and size, experiment gives k = 0.8. The last equation becomes

π(36h^(1/2) − h^(3/2)) dh = −0.8(π/16)√64 dt,
or

(36h^(1/2) − h^(3/2)) dh = −0.4 dt.

A routine integration yields

24h^(3/2) − (2/5)h^(5/2) = −(2/5)t + c,

or

60h^(3/2) − h^(5/2) = −t + k.

Now h(0) = 18, so

60(18)^(3/2) − (18)^(5/2) = k.

Thus k = 2268√2, and h(t) is implicitly determined by the equation

60h^(3/2) − h^(5/2) = 2268√2 − t.

The tank is empty when h = 0, and this occurs when t = 2268√2 seconds, or about 53 minutes, 28 seconds.

The last three examples contain an important message. Differential equations can be used to solve a variety of problems, but a problem usually does not present itself as a differential equation. Normally we have some event or process, and we must use whatever information we have about it to derive a differential equation and initial conditions. This process is called mathematical modeling. The model consists of the differential equation and other relevant information, such as initial conditions. We look for a function satisfying the differential equation and the other information, in the hope of being able to predict future behavior, or perhaps better understand the process being considered.
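The draining time in Example 1.13 follows directly from the implicit formula for h(t). This Python sketch (ours, not the book's; the helper name `lhs` is invented) evaluates the constant k = 2268√2 and the emptying time.

```python
import math

# Hemispherical tank of Example 1.13: the depth h(t) satisfies
# 60 h^{3/2} - h^{5/2} = 2268*sqrt(2) - t   (t in seconds), with h(0) = 18.
def lhs(h):
    return 60 * h ** 1.5 - h ** 2.5

k = lhs(18.0)            # constant fixed by the initial depth h(0) = 18
t_empty = k              # h = 0 exactly when t = k = 2268*sqrt(2)

assert abs(k - 2268 * math.sqrt(2)) < 1e-6
# t_empty is about 3207 seconds, i.e. a little over 53 minutes.
```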
SECTION 1.2
PROBLEMS
In each of Problems 1 through 10, determine if the differential equation is separable. If it is, find the general solution (perhaps implicitly defined). If it is not separable, do not attempt a solution at this time.
1. 3y′ = 4x/y²
2. y′ + xy = 0
3. cos(y)y′ = sin(x + y)
4. e^(x+y)y′ = 3x
5. xy′ + y = y²
6. y′ = ((x + 1)² − 2y)/(2y)
7. x sin(y)y′ = cos(y)
8. (x/(x + 1))y′ = (2y² + 1)/y
9. y′ + y = eˣ − sin(y)
10. cos(x + y) + sin(x − y)y′ = cos(2x)

In each of Problems 11 through 15, solve the initial value problem.

11. xy²y′ = y + 1; y(3e²) = 2
12. y′ = 3x²(y + 2); y(2) = 8
13. ln(yˣ)y′ = 3x²y; y(2) = e³
14. 2yy′ = e^(x−y²); y(4) = −2
15. yy′ = 2x sec(3y); y(2/3) = π/3

16. An object having a temperature of 90 degrees Fahrenheit is placed into an environment kept at 60 degrees. Ten minutes later the object has cooled to 88 degrees. What will be the temperature of the object after it has been in this environment for 20 minutes? How long will it take for the object to cool to 65 degrees?

17. A thermometer is carried outside a house whose ambient temperature is 70 degrees Fahrenheit. After five minutes the thermometer reads 60 degrees, and fifteen minutes after this, 50.4 degrees. What is the outside temperature (which is assumed to be constant)?

18. Assume that the population of bacteria in a petri dish changes at a rate proportional to the population at that time. This means that, if P(t) is the population at time t, then

dP/dt = kP

for some constant k. A particular culture has a population density of 100,000 bacteria per square inch. A culture that covered an area of 1 square inch at 10:00 a.m. on Tuesday was found to have grown to cover 3 square inches by noon the following Thursday. How many bacteria will be present at 3:00 p.m. the following Sunday? How many will be present on Monday at 4:00 p.m.? When will the world be overrun by these bacteria, assuming that they can live anywhere on the earth's surface? (Here you need to look up the land area of the earth.)

19. Assume that a sphere of ice melts at a rate proportional to its surface area, retaining a spherical shape. Interpret melting as a reduction of volume with respect to time. Determine an expression for the volume of the ice at any time t.

20. A radioactive element has a half-life of ln(2) weeks. If e³ tons are present at a given time, how much will be left 3 weeks later?

21. The half-life of uranium-238 is approximately 4.5 × 10⁹ years. How much of a 10-kilogram block of U-238 will be present 1 billion years from now?

22. Given that 12 grams of a radioactive element decays to 9.1 grams in 4 minutes, what is the half-life of this element?

23. Evaluate

∫₀^∞ e^(−t² − 9/t²) dt.

Hint: Let

I(x) = ∫₀^∞ e^(−t² − x²/t²) dt.

Calculate I′(x) by differentiating under the integral sign, then let u = x/t. Show that I′(x) = −2I(x) and solve for I(x). Evaluate the constant by using the standard result that ∫₀^∞ e^(−t²) dt = √π/2. Finally, evaluate I(3).

24. Derive the fact used in Example 1.13 that v(t) = √(2gh(t)). Hint: Consider a free-falling particle having height h(t) at time t. The work done by gravity in moving the particle from its starting point to a given point is mgh(t), and this must equal the change in the kinetic energy, which is (1/2)mv².

25. Calculate the time required to empty the hemispherical tank of Example 1.13 if the tank is positioned with its flat side down.

26. (Draining a Hot Tub) Consider a cylindrical hot tub with a 5-foot radius and height of 4 feet, placed on one of its circular ends. Water is draining from the tub through a circular hole 5/8 inch in diameter located in the base of the tub.
(a) Assume a value k = 0.6 to determine the rate at which the depth of the water is changing. Here it is useful to write

dh/dt = (dh/dV)(dV/dt) = (dV/dt)/(dV/dh).

(b) Calculate the time T required to drain the hot tub if it is initially full. Hint: One way to do this is to write

T = ∫_H^0 (dt/dh) dh.

(c) Determine how much longer it takes to drain the lower half than the upper half of the tub. Hint: Use the integral suggested in (b), with different limits for the two halves.

27. (Draining a Cone) A tank shaped like a right circular cone, with its vertex down, is 9 feet high and has a diameter of 8 feet. It is initially full of water.
(a) Determine the time required to drain the tank through a circular hole of diameter 2 inches at the vertex. Take k = 0.6.
(b) Determine the time it takes to drain the tank if it is inverted and the drain hole is of the same size and shape as in (a), but now located in the new base.

28. (Drain Hole at Unknown Depth) Determine the rate of change of the depth of water in the tank of Problem 27 (vertex at the bottom) if the drain hole is located in the side of the cone 2 feet above the bottom of the tank. What is the rate of change in the depth of the water when the drain hole is located in the bottom of the tank? Is it possible to determine the location of the drain hole if we are told the rate of change of the depth and the depth of the water in the tank? Can this be done without knowing the size of the drain opening?

29. Suppose the conical tank of Problem 27, vertex at the bottom, is initially empty and water is added at the constant rate of π/10 cubic feet per second. Does the tank ever overflow?

30. (Draining a Sphere) Determine the time it takes to completely drain a spherical tank of radius 18 feet if it is initially full of water and the water drains through a circular hole of radius 3 inches located in the bottom of the tank. Use k = 0.8.
1.3 Linear Differential Equations

DEFINITION 1.2 Linear Differential Equation
A first-order differential equation is linear if it has the form

y′(x) + p(x)y = q(x).

Assume that p and q are continuous on an interval I (possibly the whole real line). Because of the special form of the linear equation, we can obtain the general solution on I by a clever observation. Multiply the differential equation by e^(∫p(x)dx) to get

e^(∫p(x)dx)y′(x) + p(x)e^(∫p(x)dx)y = q(x)e^(∫p(x)dx).

The left side of this equation is the derivative of the product y(x)e^(∫p(x)dx), enabling us to write

d/dx [y(x)e^(∫p(x)dx)] = q(x)e^(∫p(x)dx).

Now integrate to obtain

y(x)e^(∫p(x)dx) = ∫ q(x)e^(∫p(x)dx) dx + C.

Finally, solve for y(x):

y(x) = e^(−∫p(x)dx) ∫ q(x)e^(∫p(x)dx) dx + Ce^(−∫p(x)dx).    (1.6)

The function e^(∫p(x)dx) is called an integrating factor for the differential equation, because multiplication of the differential equation by this factor results in an equation that can be integrated to obtain the general solution. We do not recommend memorizing equation (1.6). Instead, recognize the form of the linear equation and understand the technique of solving it by multiplying by e^(∫p(x)dx).

EXAMPLE 1.14
The equation y′ + y = sin(x) is linear. Here p(x) = 1 and q(x) = sin(x), both continuous for all x. An integrating factor is

e^(∫dx) = eˣ.

Multiply the differential equation by eˣ to get

y′eˣ + yeˣ = eˣ sin(x),

or

(yeˣ)′ = eˣ sin(x).

Integrate to get

yeˣ = ∫ eˣ sin(x) dx = (1/2)eˣ(sin(x) − cos(x)) + C.

The general solution is

y(x) = (1/2)(sin(x) − cos(x)) + Ce^(−x).
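The general solution of Example 1.14 is easy to check numerically. This Python sketch is a supplement, not part of the text: it picks an arbitrary value of the constant and verifies y′ + y = sin(x) at several points with a central-difference derivative.

```python
import math

C = 2.0    # arbitrary value of the constant in the general solution

def y(x):
    # General solution of y' + y = sin(x) from Example 1.14
    return 0.5 * (math.sin(x) - math.cos(x)) + C * math.exp(-x)

h = 1e-6
for x in (0.0, 1.0, 2.5):
    dydx = (y(x + h) - y(x - h)) / (2 * h)     # numerical derivative
    assert abs(dydx + y(x) - math.sin(x)) < 1e-6
```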
EXAMPLE 1.15
Solve the initial value problem

y′ = 3x² − y/x;  y(1) = 5.

First recognize that the differential equation can be written in linear form:

y′ + (1/x)y = 3x².

An integrating factor is e^(∫(1/x)dx) = e^(ln(x)) = x, for x > 0. Multiply the differential equation by x to get

xy′ + y = 3x³,

or

(xy)′ = 3x³.

Integrate to get

xy = (3/4)x⁴ + C.

Then

y(x) = (3/4)x³ + C/x

for x > 0. For the initial condition, we need

y(1) = 5 = 3/4 + C,

so C = 17/4 and the solution of the initial value problem is

y(x) = (3/4)x³ + 17/(4x)

for x > 0.

Depending on p and q, it may not be possible to evaluate all of the integrals in the general solution (1.6) in closed form (as a finite algebraic combination of elementary functions). This occurs with

y′ + xy = 2,

whose general solution is

y(x) = 2e^(−x²/2) ∫ e^(x²/2) dx + Ce^(−x²/2).

We cannot write ∫e^(x²/2) dx in elementary terms. However, we could still use a software package to generate a direction field and integral curves, as is done in Figure 1.10. This provides some idea of the behavior of solutions, at least within the range of the diagram.
FIGURE 1.10 Integral curves of y′ + xy = 2 passing through (0, 2), (0, 4), (0, −2), and (0, −5).
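Even though ∫e^(x²/2) dx has no elementary form, the general solution of y′ + xy = 2 can still be evaluated numerically by quadrature. This sketch is ours, not from the text (the helper name `simpson` is invented): it computes the non-elementary integral with Simpson's rule and then confirms, by a finite-difference derivative, that the resulting function satisfies the differential equation.

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson approximation of the integral of f over [a, b]; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return s * h / 3

C = 4.0    # arbitrary constant in the general solution

def y(x):
    # y' + xy = 2:  y(x) = 2 e^{-x^2/2} \int_0^x e^{t^2/2} dt + C e^{-x^2/2},
    # with the non-elementary integral evaluated numerically
    g = math.exp(-x * x / 2)
    return 2 * g * simpson(lambda t: math.exp(t * t / 2), 0.0, x) + C * g

# Check the differential equation at a few points
h = 1e-5
for x in (0.5, 1.0, 1.5):
    dydx = (y(x + h) - y(x - h)) / (2 * h)
    assert abs(dydx + x * y(x) - 2.0) < 1e-4
```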
Linear differential equations arise in many contexts. Example 1.11, on estimating the time of death, involved a separable differential equation that is also linear and could have been solved using an integrating factor.
EXAMPLE 1.16
(A Mixing Problem) Sometimes we want to know how much of a given substance is present in a container in which various substances are being added, mixed, and removed. Such problems are called mixing problems, and they are frequently encountered in the chemical industry and in manufacturing processes.

As an example, suppose a tank contains 200 gallons of brine (salt mixed with water), in which 100 pounds of salt are dissolved. A mixture consisting of 1/8 pound of salt per gallon is flowing into the tank at a rate of 3 gallons per minute, and the mixture is continuously stirred. Meanwhile, brine is allowed to empty out of the tank at the same rate of 3 gallons per minute (Figure 1.11). How much salt is in the tank at any time?
FIGURE 1.11 Brine with 1/8 pound of salt per gallon flows in at 3 gallons per minute, and the mixture flows out at 3 gallons per minute.
Before constructing a mathematical model, notice that the initial ratio of salt to brine in the tank is 100 pounds per 200 gallons, or 1/2 pound per gallon. Since the mixture pumped in has a constant ratio of 1/8 pound per gallon, we expect the brine mixture to dilute toward the incoming ratio, with a "terminal" amount of salt in the tank of 1/8 pound per gallon times 200 gallons. This leads to the expectation that in the long term (as t → ∞) the amount of salt in the tank should approach 25 pounds.
Now let Q(t) be the amount of salt in the tank at time t. The rate of change of Q(t) with time must equal the rate at which salt is pumped in, minus the rate at which it is pumped out. Thus

dQ/dt = (rate in) − (rate out)
= (1/8 pound/gallon)(3 gallons/minute) − (Q(t)/200 pounds/gallon)(3 gallons/minute)
= 3/8 − (3/200)Q(t).

This is the linear equation

Q′(t) + (3/200)Q = 3/8.

An integrating factor is e^(∫(3/200)dt) = e^(3t/200). Multiply the differential equation by this factor to obtain

Q′e^(3t/200) + (3/200)e^(3t/200)Q = (3/8)e^(3t/200),

or

(Qe^(3t/200))′ = (3/8)e^(3t/200).

Then

Qe^(3t/200) = (3/8)(200/3)e^(3t/200) + C,

so

Q(t) = 25 + Ce^(−3t/200).

Now

Q(0) = 100 = 25 + C,

so C = 75 and

Q(t) = 25 + 75e^(−3t/200).

As we expected, as t increases, the amount of salt approaches the limiting value of 25 pounds. From the derivation of the differential equation for Q(t), it is apparent that this limiting value depends on the rate at which salt is poured into the tank, but not on the initial amount of salt in the tank. The term 25 in the solution is called the steady-state part of the solution because it is independent of time, and the term 75e^(−3t/200) is the transient part. As t increases, the transient part exerts less influence on the amount of salt in the tank, and in the limit the solution approaches its steady-state part.
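The steady-state and transient behavior of Example 1.16 can be confirmed in a few lines. This Python sketch is a supplement, not part of the text; it checks the initial condition, the differential equation, and the limiting amount of salt.

```python
import math

def Q(t):
    # Salt in the tank (pounds) after t minutes, from Example 1.16
    return 25 + 75 * math.exp(-3 * t / 200)

assert abs(Q(0) - 100) < 1e-12            # 100 pounds of salt initially

# The ODE Q' = 3/8 - 3Q/200 is satisfied (central-difference check at t = 50):
h = 1e-6
dQ = (Q(50 + h) - Q(50 - h)) / (2 * h)
assert abs(dQ - (3 / 8 - 3 * Q(50) / 200)) < 1e-6

# The transient dies out, leaving the steady-state value of 25 pounds:
assert abs(Q(10000) - 25) < 1e-6
```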
SECTION 1.3
PROBLEMS
In each of Problems 1 through 8, find the general solution. Not all integrals can be done in closed form.
1. y′ − (3/x)y = 2x²
2. y′ − y = sinh(x)
3. y′ + 2y = x
4. sin(2x)y′ + 2y sin²(x) = 2 sin(x)
5. y′ − 2y = −8x²
6. (x² − x − 2)y′ + 3xy = x² − 4x + 4
7. y′ + y = (x − 1)/x²
8. y′ + sec(x)y = cos(x)
In each of Problems 9 through 14, solve the initial value problem.
9. y′ + (1/(x − 2))y = 3x; y(3) = 4
10. y′ + 3y = 5e^{2x} − 6; y(0) = 2
11. y′ + (2/(x + 1))y = 3; y(0) = 5
12. (x² − 2x)y′ + (x² − 5x + 4)y = (x⁴ − 2x³)e^{−x}; y(3) = 18e^{−3}
13. y′ − y = 2e^{4x}; y(0) = −3
14. y′ + (5/(9x))y = 3x³ + x; y(−1) = 4
15. Find all functions with the property that the y-intercept of the tangent to the graph at (x, y) is 2x².
16. A 500-gallon tank initially contains 50 gallons of brine solution in which 28 pounds of salt have been dissolved. Beginning at time zero, brine containing 2 pounds of salt per gallon is added at the rate of 3 gallons per minute, and the mixture is poured out of the tank at the rate of 2 gallons per minute. How much salt is in the tank when it contains 100 gallons of brine? Hint: The amount of brine in the tank at time t is 50 + t.
17. Two tanks are cascaded as in Figure 1.12. Tank 1 initially contains 20 pounds of salt dissolved in 100 gallons of brine, while tank 2 contains 150 gallons of brine in which 90 pounds of salt are dissolved. At time zero a brine solution containing ½ pound of salt per gallon is added to tank 1 at the rate of 5 gallons per minute. Tank 1 has an output that discharges brine into tank 2 at the rate of 5 gallons per minute, and tank 2 also has an output of 5 gallons per minute. Determine the amount of salt in each tank at any time t. Also determine when the concentration of salt in tank 2 is a minimum and how much salt is in the tank at that time. Hint: Solve for the amount of salt in tank 1 at time t first and then use this solution to determine the amount in tank 2.
FIGURE 1.12 Mixing between tanks in Problem 17. (In the figure, brine enters tank 1 at 5 gal/min carrying ½ lb/gal of salt; tank 1 discharges into tank 2 at 5 gal/min, and tank 2 discharges at 5 gal/min.)

1.4
Exact Differential Equations

We continue the theme of identifying certain kinds of first-order differential equations for which there is a method leading to a solution. We can write any first-order equation y′ = f(x, y) in the form

M(x, y) + N(x, y)y′ = 0.

For example, put M(x, y) = −f(x, y) and N(x, y) = 1. An interesting thing happens if there is a function φ such that
∂φ/∂x = M(x, y) and ∂φ/∂y = N(x, y).    (1.7)
In this event, the differential equation becomes

∂φ/∂x + (∂φ/∂y)(dy/dx) = 0,

which, by the chain rule, is the same as

(d/dx) φ(x, y(x)) = 0.

But this means that φ(x, y(x)) = C, with C constant. If we now read this argument from the last line back to the first, the conclusion is that the equation φ(x, y) = C implicitly defines a function y(x) that is the general solution of the differential equation. Thus, finding a function φ that satisfies equation (1.7) is equivalent to solving the differential equation. Before taking this further, consider an example.
EXAMPLE 1.17
The differential equation

y′ = −(2xy³ + 2)/(3x²y² + 8e^{4y})

is neither separable nor linear. Write it in the form

M + Ny′ = (2xy³ + 2) + (3x²y² + 8e^{4y})y′ = 0    (1.8)

with M(x, y) = 2xy³ + 2 and N(x, y) = 3x²y² + 8e^{4y}. Equation (1.8) can in turn be written

M dx + N dy = (2xy³ + 2) dx + (3x²y² + 8e^{4y}) dy = 0.    (1.9)

Now let

φ(x, y) = x²y³ + 2x + 2e^{4y}.

Soon we will see where this came from, but for now, observe that

∂φ/∂x = 2xy³ + 2 = M and ∂φ/∂y = 3x²y² + 8e^{4y} = N.

With this choice of φ(x, y), equation (1.9) becomes

(∂φ/∂x) dx + (∂φ/∂y) dy = 0,

or

dφ(x, y) = 0.

The general solution of this equation is φ(x, y) = C or, in this example,

x²y³ + 2x + 2e^{4y} = C.

This implicitly defines the general solution of the differential equation (1.8). To verify this, differentiate the last equation implicitly with respect to x:

2xy³ + 3x²y²y′ + 2 + 8e^{4y}y′ = 0

or

(2xy³ + 2) + (3x²y² + 8e^{4y})y′ = 0.

This is equivalent to the original differential equation

y′ = −(2xy³ + 2)/(3x²y² + 8e^{4y}).
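The potential-function claim in Example 1.17 can be spot-checked numerically with centered differences; a minimal sketch in plain Python (the sample points are chosen arbitrarily):

```python
import math

def M(x, y): return 2 * x * y**3 + 2
def N(x, y): return 3 * x**2 * y**2 + 8 * math.exp(4 * y)
def phi(x, y): return x**2 * y**3 + 2 * x + 2 * math.exp(4 * y)

h = 1e-6
for (x, y) in [(0.5, 0.3), (-1.2, 0.8), (2.0, -0.4)]:
    # centered-difference approximations to the partial derivatives of phi
    dphi_dx = (phi(x + h, y) - phi(x - h, y)) / (2 * h)
    dphi_dy = (phi(x, y + h) - phi(x, y - h)) / (2 * h)
    assert abs(dphi_dx - M(x, y)) < 1e-5
    assert abs(dphi_dy - N(x, y)) < 1e-5
print("phi is a potential function for M + N y' = 0")
```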
With this as background, we will make the following definitions.
DEFINITION 1.3
Potential Function
A function φ is a potential function for the differential equation M(x, y) + N(x, y)y′ = 0 on a region R of the plane if, for each (x, y) in R,

∂φ/∂x = M(x, y) and ∂φ/∂y = N(x, y).

DEFINITION 1.4

Exact Differential Equation

When a potential function φ exists on a region R for the differential equation M + Ny′ = 0, then this equation is said to be exact on R.
The differential equation of Example 1.17 is exact (over the entire plane), because we exhibited a potential function for it, defined for all (x, y). Once a potential function is found, we can write an equation implicitly defining the general solution. Sometimes we can explicitly solve for the general solution, and sometimes we cannot. Now go back to Example 1.17. We want to explore how the potential function that materialized there was found. Recall we required that

∂φ/∂x = 2xy³ + 2 = M and ∂φ/∂y = 3x²y² + 8e^{4y} = N.
Pick either of these equations to begin and integrate it. Say we begin with the first. Then integrate with respect to x:

φ(x, y) = ∫ (∂φ/∂x) dx = ∫ (2xy³ + 2) dx = x²y³ + 2x + g(y).

In this integration with respect to x we held y fixed, hence we must allow that y appears in the “constant” of integration. If we calculate ∂φ/∂x, we get 2xy³ + 2 for any function g(y). Now we know φ to within this function g. Use the fact that we know ∂φ/∂y to write

∂φ/∂y = 3x²y² + 8e^{4y} = ∂/∂y [x²y³ + 2x + g(y)] = 3x²y² + g′(y).

This equation holds if g′(y) = 8e^{4y}, hence we may choose g(y) = 2e^{4y}. This gives the potential function

φ(x, y) = x²y³ + 2x + 2e^{4y}.

If we had chosen to integrate ∂φ/∂y first, we would have gotten

φ(x, y) = ∫ (3x²y² + 8e^{4y}) dy = x²y³ + 2e^{4y} + h(x).

Here h can be any function of one variable, because no matter how h(x) is chosen,

∂/∂y [x²y³ + 2e^{4y} + h(x)] = 3x²y² + 8e^{4y},

as required. Now we have two expressions for ∂φ/∂x:

∂φ/∂x = 2xy³ + 2 = ∂/∂x [x²y³ + 2e^{4y} + h(x)] = 2xy³ + h′(x).

This equation forces us to choose h so that h′(x) = 2, and we may therefore set h(x) = 2x. This gives

φ(x, y) = x²y³ + 2e^{4y} + 2x,

as we got before. Not every first-order differential equation is exact. For example, consider

y′ + y = 0.
If there were a potential function φ, then we would have

∂φ/∂x = y and ∂φ/∂y = 1.

Integrate ∂φ/∂x = y with respect to x to get φ(x, y) = xy + g(y). Substitute this into ∂φ/∂y = 1 to get

∂/∂y [xy + g(y)] = x + g′(y) = 1.

But this can hold only if g′(y) = 1 − x, an impossibility if g is to be independent of x. Therefore, y′ + y = 0 has no potential function. This differential equation is not exact (even though it is easily solved either as a separable or as a linear equation). This example suggests the need for a convenient test for exactness. This is provided by the following theorem, in which a “rectangle in the plane” refers to the set of points on or inside any rectangle having sides parallel to the axes.
THEOREM 1.1
Test for Exactness
Suppose M(x, y), N(x, y), ∂M/∂y, and ∂N/∂x are continuous for all (x, y) within a rectangle R in the plane. Then M(x, y) + N(x, y)y′ = 0 is exact on R if and only if, for each (x, y) in R,

∂M/∂y = ∂N/∂x.

Proof
If M + Ny′ = 0 is exact, then there is a potential function φ and

∂φ/∂x = M(x, y) and ∂φ/∂y = N(x, y).

Then, for (x, y) in R,

∂M/∂y = ∂/∂y (∂φ/∂x) = ∂²φ/∂y∂x = ∂²φ/∂x∂y = ∂/∂x (∂φ/∂y) = ∂N/∂x.

Conversely, suppose ∂M/∂y and ∂N/∂x are continuous on R. Choose any (x₀, y₀) in R and define, for (x, y) in R,

φ(x, y) = ∫_{x₀}^{x} M(ξ, y₀) dξ + ∫_{y₀}^{y} N(x, η) dη.    (1.10)

Immediately we have, from the fundamental theorem of calculus,

∂φ/∂y = N(x, y),
since the first integral in equation (1.10) is independent of y. Next, compute

∂φ/∂x = ∂/∂x [∫_{x₀}^{x} M(ξ, y₀) dξ + ∫_{y₀}^{y} N(x, η) dη]
= M(x, y₀) + ∫_{y₀}^{y} (∂N/∂x)(x, η) dη
= M(x, y₀) + ∫_{y₀}^{y} (∂M/∂y)(x, η) dη
= M(x, y₀) + M(x, y) − M(x, y₀) = M(x, y),

and the proof is complete. For example, consider again y′ + y = 0. Here M(x, y) = y and N(x, y) = 1, so
∂N/∂x = 0 and ∂M/∂y = 1

throughout the entire plane. Thus y′ + y = 0 cannot be exact on any rectangle in the plane. We saw this previously by showing that this differential equation can have no potential function.
EXAMPLE 1.18
Consider

(x² + 3xy) + (4xy + 2x)y′ = 0.

Here M(x, y) = x² + 3xy and N(x, y) = 4xy + 2x. Now

∂N/∂x = 4y + 2 and ∂M/∂y = 3x,

and 3x = 4y + 2 is satisfied by all (x, y) on a straight line. However, ∂N/∂x = ∂M/∂y cannot hold for all (x, y) in an entire rectangle in the plane. Hence this differential equation is not exact on any rectangle.
EXAMPLE 1.19
Consider

(e^x sin(y) − 2x) + (e^x cos(y) + 1)y′ = 0.

With M(x, y) = e^x sin(y) − 2x and N(x, y) = e^x cos(y) + 1, we have

∂M/∂y = e^x cos(y) = ∂N/∂x

for all (x, y). Therefore this differential equation is exact. To find a potential function φ, set

∂φ/∂x = e^x sin(y) − 2x and ∂φ/∂y = e^x cos(y) + 1.
Choose one of these equations and integrate it. Integrate the second equation with respect to y:

φ(x, y) = ∫ (e^x cos(y) + 1) dy = e^x sin(y) + y + h(x).

Then we must have

∂φ/∂x = e^x sin(y) − 2x = ∂/∂x [e^x sin(y) + y + h(x)] = e^x sin(y) + h′(x).

Then h′(x) = −2x and we may choose h(x) = −x². A potential function is

φ(x, y) = e^x sin(y) + y − x².

The general solution of the differential equation is defined implicitly by

e^x sin(y) + y − x² = C.

Note of Caution: If φ is a potential function for M + Ny′ = 0, φ itself is not the solution. The general solution is defined implicitly by the equation φ(x, y) = C.
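Example 1.19 can be checked the same way: both the exactness test ∂M/∂y = ∂N/∂x and the potential function are verifiable by centered differences (sample points chosen arbitrarily):

```python
import math

def M(x, y): return math.exp(x) * math.sin(y) - 2 * x
def N(x, y): return math.exp(x) * math.cos(y) + 1
def phi(x, y): return math.exp(x) * math.sin(y) + y - x**2

h = 1e-6
for (x, y) in [(0.0, 0.5), (1.0, -1.0), (-0.7, 2.0)]:
    # exactness test: dM/dy must equal dN/dx
    dM_dy = (M(x, y + h) - M(x, y - h)) / (2 * h)
    dN_dx = (N(x + h, y) - N(x - h, y)) / (2 * h)
    assert abs(dM_dy - dN_dx) < 1e-6
    # phi is a potential function: its partials reproduce M and N
    assert abs((phi(x + h, y) - phi(x - h, y)) / (2 * h) - M(x, y)) < 1e-5
    assert abs((phi(x, y + h) - phi(x, y - h)) / (2 * h) - N(x, y)) < 1e-5
print("Example 1.19 verified")
```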
SECTION 1.4 PROBLEMS
In each of Problems 1 through 8, determine where (if anywhere) in the plane the differential equation is exact. If it is exact, find a potential function and the general solution, perhaps implicitly defined. If the equation is not exact, do not attempt a solution at this time.
1. (2y² + ye^{xy}) + (4xy + xe^{xy} + 2y)y′ = 0
2. (4xy + 2x) + (2x² + 3y²)y′ = 0
3. (4xy + 2x²y) + (2x² + 3y²)y′ = 0
4. 2 cos(x + y) − 2x sin(x + y) − 2x sin(x + y)y′ = 0
5. (1/x + y) + (3y² + x)y′ = 0
6. e^x sin(y²) + xe^x sin(y²) + (2xye^x sin(y²) + e^y)y′ = 0
7. sinh(x)sinh(y) + cosh(x)cosh(y)y′ = 0
8. 4y⁴ + 3 cos(x) + (16y³x − 3 cos(y))y′ = 0
In each of Problems 9 through 14, determine if the differential equation is exact in some rectangle containing in its interior the point where the initial condition is given. If so, solve the initial value problem. This solution may be implicitly defined. If the differential equation is not exact, do not attempt a solution.
9. (3y⁴ − 1) + 12xy³y′ = 0; y(1) = 2
10. (2y − y² sec²(xy²)) + (2x − 2xy sec²(xy²))y′ = 0; y(1) = 2
11. x cos(2y − x) − sin(2y − x) − 2x cos(2y − x)y′ = 0; y(π/12) = π/8
12. 1 + e^{y/x} − (y/x)e^{y/x} + e^{y/x}y′ = 0; y(1) = −5
13. y sinh(y − x) − cosh(y − x) + y sinh(y − x)y′ = 0; y(4) = 4
14. e^y + (xe^y − 1)y′ = 0; y(5) = 0
In Problems 15 and 16, choose a constant α so that the differential equation is exact, then produce a potential function and obtain the general solution.
15. (2xy³ − 3y) + (−3x + αx²y² − 2y)y′ = 0
16. 3x² + xy^α − x²y^{α−1}y′ = 0
17. Let φ be a potential function for M + Ny′ = 0 in some region R of the plane. Show that for any constant c, φ + c is also a potential function. How does the general solution of M + Ny′ = 0 obtained by using φ differ from that obtained using φ + c?
1.5 Integrating Factors

“Most” differential equations are not exact on any rectangle. But sometimes we can multiply the differential equation by a nonzero function μ(x, y) to obtain an exact equation. Here is an example that suggests why this might be useful.
EXAMPLE 1.20
The equation

(y² − 6xy) + (3xy − 6x²)y′ = 0    (1.11)

is not exact on any rectangle. Multiply it by μ(x, y) = y to get

(y³ − 6xy²) + (3xy² − 6x²y)y′ = 0.    (1.12)

Wherever y ≠ 0, equations (1.11) and (1.12) have the same solutions. The reason for this is that equation (1.12) is just

y[(y² − 6xy) + (3xy − 6x²)y′] = 0,

and if y ≠ 0, then necessarily (y² − 6xy) + (3xy − 6x²)y′ = 0. Now notice that equation (1.12) is exact (over the entire plane), having potential function

φ(x, y) = xy³ − 3x²y².

Thus the general solution of equation (1.12) is defined implicitly by

xy³ − 3x²y² = C,

and, wherever y ≠ 0, this defines the general solution of equation (1.11) as well. To review what has just occurred, we began with a nonexact differential equation. We multiplied it by a function μ chosen so that the new equation was exact. We solved this exact equation, then found that this solution also worked for the original, nonexact equation. The function μ therefore enabled us to solve a nonexact equation by solving an exact one. This idea is worth pursuing, and we begin by giving a name to μ.
DEFINITION 1.5
Let M(x, y) and N(x, y) be defined on a region R of the plane. Then μ(x, y) is an integrating factor for M + Ny′ = 0 if μ(x, y) ≠ 0 for all (x, y) in R, and μM + μNy′ = 0 is exact on R.
How do we find an integrating factor for M + Ny′ = 0? For μ to be an integrating factor, μM + μNy′ = 0 must be exact (in some region of the plane), hence

∂(μN)/∂x = ∂(μM)/∂y    (1.13)
in this region. This is a starting point. Depending on M and N, we may be able to determine μ from this equation. Sometimes equation (1.13) becomes simple enough to solve if we try μ as a function of just x or just y.
EXAMPLE 1.21
The differential equation x − xy − y′ = 0 is not exact. Here M = x − xy and N = −1, and equation (1.13) is

∂(−μ)/∂x = ∂[(x − xy)μ]/∂y.

Write this as

−∂μ/∂x = (x − xy)(∂μ/∂y) − xμ.

Now observe that this equation is simplified if we try to find μ as just a function of x, because in this event ∂μ/∂y = 0 and we are left with just

∂μ/∂x = xμ.

This is separable. Write

(1/μ) dμ = x dx

and integrate to obtain

ln(μ) = ½x².

Here we let the constant of integration be zero because we need only one integrating factor. From the last equation, choose

μ(x) = e^{x²/2},

a nonzero function. Multiply the original differential equation by e^{x²/2} to obtain

(x − xy)e^{x²/2} − e^{x²/2}y′ = 0.

This equation is exact over the entire plane, and we find the potential function φ(x, y) = (1 − y)e^{x²/2}. The general solution of this exact equation is implicitly defined by

(1 − y)e^{x²/2} = C.

In this case, we can explicitly solve for y to get

y(x) = 1 − Ce^{−x²/2},

and this is also the general solution of the original equation x − xy − y′ = 0. If we cannot find an integrating factor that is a function of just x or just y, then we must try something else. There is no template to follow, and often we must start with equation (1.13) and be observant.
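The solution found in Example 1.21 via the integrating factor e^{x²/2} can be spot-checked numerically, assuming only the formulas derived above (the constant C is chosen arbitrarily):

```python
import math

def y(x, C=2.5):
    # general solution found via the integrating factor e^{x^2/2}
    return 1.0 - C * math.exp(-x**2 / 2)

h = 1e-6
for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    dy = (y(x + h) - y(x - h)) / (2 * h)
    # original equation: x - x*y - y' = 0
    assert abs(x - x * y(x) - dy) < 1e-5
print("integrating-factor solution verified")
```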
EXAMPLE 1.22
Consider (2y² − 9xy) + (3xy − 6x²)y′ = 0. This is not exact. With M = 2y² − 9xy and N = 3xy − 6x², begin looking for an integrating factor by writing equation (1.13):

∂/∂x [(3xy − 6x²)μ] = ∂/∂y [(2y² − 9xy)μ].

This is

(3xy − 6x²)∂μ/∂x + (3y − 12x)μ = (2y² − 9xy)∂μ/∂y + (4y − 9x)μ.    (1.14)

If we attempt μ = μ(x), then ∂μ/∂y = 0 and we obtain

(3xy − 6x²)∂μ/∂x + (3y − 12x)μ = (4y − 9x)μ,

which cannot be solved for μ as just a function of x. Similarly, if we try μ = μ(y), so ∂μ/∂x = 0, we obtain an equation we cannot solve. We must try something else. Notice that equation (1.14) involves only integer powers of x and y. This suggests that we try μ(x, y) = x^a y^b. Substitute this into equation (1.14) and attempt to choose a and b. The substitution gives us

3ax^a y^{b+1} − 6ax^{a+1}y^b + 3x^a y^{b+1} − 12x^{a+1}y^b = 2bx^a y^{b+1} − 9bx^{a+1}y^b + 4x^a y^{b+1} − 9x^{a+1}y^b.

Assume that x ≠ 0 and y ≠ 0. Then we can divide by x^a y^b to get

3ay − 6ax + 3y − 12x = 2by − 9bx + 4y − 9x.

Rearrange terms to write

(1 + 2b − 3a)y = (−3 + 9b − 6a)x.

Since x and y are independent, this equation can hold for all x and y only if

1 + 2b − 3a = 0 and −3 + 9b − 6a = 0.

Solve these equations to obtain a = b = 1. An integrating factor is μ(x, y) = xy. Multiply the differential equation by xy to get

(2xy³ − 9x²y²) + (3x²y² − 6x³y)y′ = 0.

This is exact with potential function φ(x, y) = x²y³ − 3x³y². For x ≠ 0 and y ≠ 0, the solution of the original differential equation is given implicitly by

x²y³ − 3x³y² = C.

The manipulations used to find an integrating factor may fail to find some solutions, as we saw with singular solutions of separable equations. Here are two examples in which this occurs.
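The exponent search in Example 1.22 can also be automated: scan small integer pairs (a, b) and test exactness of μM + μNy′ = 0 numerically at a few sample points (the points and the search range here are arbitrary choices, not from the text):

```python
def M(x, y): return 2 * y**2 - 9 * x * y
def N(x, y): return 3 * x * y - 6 * x**2

def exact_with(a, b, pts, h=1e-5):
    # does mu = x^a y^b make (mu*M) + (mu*N) y' = 0 exact at the sample points?
    mu = lambda x, y: x**a * y**b
    for (x, y) in pts:
        d_muM_dy = (mu(x, y + h) * M(x, y + h) - mu(x, y - h) * M(x, y - h)) / (2 * h)
        d_muN_dx = (mu(x + h, y) * N(x + h, y) - mu(x - h, y) * N(x - h, y)) / (2 * h)
        if abs(d_muM_dy - d_muN_dx) > 1e-4:
            return False
    return True

pts = [(0.7, 1.3), (1.5, 0.4), (2.0, 2.0)]
found = [(a, b) for a in range(0, 4) for b in range(0, 4) if exact_with(a, b, pts)]
assert (1, 1) in found  # the integrating factor xy from Example 1.22
print("integrating factors x^a y^b found:", found)
```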
EXAMPLE 1.23
Consider

(2xy)/(y − 1) − y′ = 0.    (1.15)
We can solve this as a separable equation, but here we want to make a point about integrating factors. Equation (1.15) is not exact, but μ(x, y) = (y − 1)/y is an integrating factor for y ≠ 0, a condition not required by the differential equation itself. Multiplying the differential equation by μ(x, y) yields the exact equation

2x − ((y − 1)/y)y′ = 0,

with potential function φ(x, y) = x² − y + ln(y) and general solution defined by

x² − y + ln(y) = C, for y ≠ 0.

This is also the general solution of equation (1.15), but the method used has required that y ≠ 0. However, we see immediately that y = 0 is also a solution of equation (1.15). This singular solution is not contained in the expression for the general solution for any choice of C.
EXAMPLE 1.24
The equation

y − 3 − xy′ = 0    (1.16)
is not exact, but μ(x, y) = 1/(x(y − 3)) is an integrating factor for x ≠ 0 and y ≠ 3, conditions not required by the differential equation itself. Multiplying equation (1.16) by μ(x, y) yields the exact equation

1/x − (1/(y − 3))y′ = 0,

with general solution defined by

ln(x) + C = ln(y − 3).

This is also the general solution of equation (1.16) in any region of the plane not containing the lines x = 0 or y = 3. This general solution can be solved for y explicitly in terms of x. First, any real number is the natural logarithm of some positive number, so write the arbitrary constant as C = ln(k), in which k can be any positive number. The equation for the general solution becomes

ln(x) + ln(k) = ln(y − 3),

or

ln(kx) = ln(y − 3).

But then y − 3 = ±kx. Replacing ±k with K, which can now be any nonzero real number, we obtain y = 3 + Kx as the general solution of equation (1.16). Now observe that y = 3 is a solution of equation (1.16). This solution was “lost”, or at least not found, in using the integrating factor as a method of solution. However, y = 3 is not a singular solution because we can include it in the expression y = 3 + Kx by allowing K = 0. Thus the general solution of equation (1.16) is y = 3 + Kx, with K any real number.
1.5.1 Separable Equations and Integrating Factors
We will point out a connection between separable equations and integrating factors. The separable equation y′ = A(x)B(y) is in general not exact. To see this, write it as

A(x)B(y) − y′ = 0,

so in the present context we have M(x, y) = A(x)B(y) and N(x, y) = −1. Now

∂N/∂x = ∂(−1)/∂x = 0 and ∂M/∂y = ∂/∂y [A(x)B(y)] = A(x)B′(y),

and in general A(x)B′(y) ≠ 0. However, μ(y) = 1/B(y) is an integrating factor for the separable equation. If we multiply the differential equation by 1/B(y), we get

A(x) − (1/B(y))y′ = 0,

an exact equation because

∂/∂x [−1/B(y)] = 0 = ∂/∂y [A(x)].

The act of separating the variables is the same as multiplying by the integrating factor 1/B(y).
1.5.2 Linear Equations and Integrating Factors
Consider the linear equation y′ + p(x)y = q(x). We can write this as

(p(x)y − q(x)) + y′ = 0,

so in the present context, M(x, y) = p(x)y − q(x) and N(x, y) = 1. Now

∂N/∂x = ∂(1)/∂x = 0 and ∂M/∂y = ∂/∂y [p(x)y − q(x)] = p(x),

so the linear equation is not exact unless p(x) is identically zero. However, μ(x) = e^{∫p(x)dx} is an integrating factor. Upon multiplying the linear equation by μ, we get

(p(x)y − q(x))e^{∫p(x)dx} + e^{∫p(x)dx}y′ = 0,

and this is exact because

∂/∂x [e^{∫p(x)dx}] = p(x)e^{∫p(x)dx} = ∂/∂y [(p(x)y − q(x))e^{∫p(x)dx}].
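The general pattern can be illustrated concretely. Here p(x) = 2 and q(x) = x are chosen arbitrarily (they are not from the text); the integrating factor is then e^{∫2 dx} = e^{2x}, the equation y′ + 2y = x becomes (e^{2x}y)′ = xe^{2x}, and integrating gives y = x/2 − 1/4 + Ce^{−2x}:

```python
import math

C = 3.0  # arbitrary constant of integration

def y(x):
    # solution of y' + 2y = x obtained via the integrating factor e^{2x}
    return x / 2 - 0.25 + C * math.exp(-2 * x)

h = 1e-6
for x in [-1.0, 0.0, 0.5, 2.0]:
    dy = (y(x + h) - y(x - h)) / (2 * h)
    assert abs(dy + 2 * y(x) - x) < 1e-5
print("linear equation solved via integrating factor e^{2x}")
```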
SECTION 1.5 PROBLEMS
1. Determine a test involving M and N to tell when M + Ny′ = 0 has an integrating factor that is a function of y only.
2. Determine a test to determine when M + Ny′ = 0 has an integrating factor of the form μ(x, y) = x^a y^b for some constants a and b.
3. Consider y − xy′ = 0.
(a) Show that this equation is not exact on any rectangle.
(b) Find an integrating factor μ(x) that is a function of x alone.
(c) Find an integrating factor μ(y) that is a function of y alone.
(d) Show that there is also an integrating factor μ(x, y) = x^a y^b for some constants a and b. Find all such integrating factors.
In each of Problems 4 through 12, (a) show that the differential equation is not exact, (b) find an integrating factor, (c) find the general solution (perhaps implicitly defined), and (d) determine any singular solutions the differential equation might have.
4. xy′ − 3y = 2x³
5. 1 + (3x − e^{−2y})y′ = 0
6. (6x²y + 12xy + y²) + (6x² + 2y)y′ = 0
7. (4xy + 6y²) + (2x² + 6xy)y′ = 0
8. y² + y − xy′ = 0
9. (2xy² + 2xy) + (x²y + x²)y′ = 0
10. (2y² − 9xy) + (3xy − 6x²)y′ = 0 (Hint: try μ(x, y) = x^a y^b)
11. y′ + y = y⁴ (Hint: try μ(x, y) = e^{ax}y^b)
12. x²y′ + xy = −y^{−3/2} (Hint: try μ(x, y) = x^a y^b)
In each of Problems 13 through 20, find an integrating factor, use it to find the general solution of the differential equation, and then obtain the solution of the initial value problem.
13. 1 + xy′ = 0; y(e⁴) = 0
14. 3y + 4xy′ = 0; y(1) = 6
15. (2y³ − 2) + 3xy²y′ = 0; y(3) = 1
16. y(1 + x) + 2xy′ = 0; y(4) = 6
17. 2xy + 3y′ = 0; y(0) = 4 (Hint: try μ = y^a e^{bx})
18. 2y(1 + x²) + xy′ = 0; y(2) = 3 (Hint: try μ = x^a e^{bx²})
19. sin(x − y) + cos(x − y) − cos(x − y)y′ = 0; y(0) = 7π/6
20. (3x²y + y³) + 2xy²y′ = 0; y(2) = 1
21. Show that any nonzero constant multiple of an integrating factor for M + Ny′ = 0 is also an integrating factor.
22. Let μ(x, y) be an integrating factor for M + Ny′ = 0 and suppose that the general solution is defined by φ(x, y) = C. Show that μ(x, y)G(φ(x, y)) is also an integrating factor, for any differentiable function G of one variable.

1.6
Homogeneous, Bernoulli, and Riccati Equations

In this section we will consider three additional kinds of first-order differential equations for which techniques for finding solutions are available.
1.6.1 Homogeneous Differential Equations
DEFINITION 1.6
Homogeneous Equation
A first-order differential equation is homogeneous if it has the form

y′ = f(y/x).
In a homogeneous equation, y′ is isolated on one side, and the other side is some expression in which y and x must always appear in the combination y/x. For example,

y′ = (y/x) sin(y/x)

is homogeneous, while y′ = x²y is not. Sometimes algebraic manipulation will put a first-order equation into the form of the homogeneous equation. For example,

y′ = y/(x + y)    (1.17)

is not homogeneous. However, if x ≠ 0, we can write this as

y′ = (y/x)/(1 + y/x),    (1.18)

a homogeneous equation. Any technique we develop for homogeneous equations can therefore be used on equation (1.18). However, this solution assumes that x ≠ 0, which is not required in equation (1.17). Thus, as we have seen before, when we perform manipulations on a differential equation, we must be careful that solutions have not been overlooked. A solution of equation (1.18) will also satisfy (1.17), but equation (1.17) may have other solutions as well. Now to the point. A homogeneous equation is always transformed into a separable one by the transformation y = ux. To see this, compute

y′ = u′x + u,

and write u = y/x. Then y′ = f(y/x) becomes

u′x + u = f(u).

We can write this as

(1/(f(u) − u)) (du/dx) = 1/x

or, in differential form,

(1/(f(u) − u)) du = (1/x) dx,

and the variables (now x and u) have been separated. Upon integrating this equation, we obtain the general solution of the transformed equation. Substituting u = y/x then gives the general solution of the original homogeneous equation.
EXAMPLE 1.25
Consider

xy′ = (y²/x) + y.

Write this as

y′ = (y/x)² + (y/x).

Let y = ux. Then

u′x + u = u² + u,

or

u′x = u².

Write this as

(1/u²) du = (1/x) dx

and integrate to obtain

−1/u = ln(x) + C.

Then

u(x) = −1/(ln(x) + C),

the general solution of the transformed equation. The general solution of the original equation is

y = −x/(ln(x) + C).
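The result of Example 1.25 can be spot-checked numerically against the original equation xy′ = y²/x + y (the constant C and the sample points are arbitrary choices):

```python
import math

C = 0.5  # arbitrary constant

def y(x):
    # general solution found via the substitution y = ux
    return -x / (math.log(x) + C)

h = 1e-6
for x in [2.0, 3.0, 10.0]:  # avoid x where ln(x) + C = 0
    dy = (y(x + h) - y(x - h)) / (2 * h)
    assert abs(x * dy - (y(x)**2 / x + y(x))) < 1e-5
print("homogeneous substitution solution verified")
```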
EXAMPLE 1.26 A Pursuit Problem
A pursuit problem is one of determining a trajectory so that one object intercepts another. Examples involving pursuit problems are missiles fired at airplanes and a rendezvous of a shuttle with a space station. These are complex problems that require numerical approximation techniques. We will consider a simple pursuit problem that can be solved explicitly. Suppose a person jumps into a canal of constant width w and swims toward a fixed point directly opposite the point of entry into the canal. The person's speed is v and the water current's speed is s. Assume that, as the swimmer makes his way across, he always orients to point toward the target. We want to determine the swimmer's trajectory. Figure 1.13 shows a coordinate system drawn so that the swimmer's destination is the origin and the point of entry into the water is (w, 0). At time t the swimmer is at the point (x(t), y(t)). The horizontal and vertical components of his velocity are, respectively,

x′(t) = −v cos(α) and y′(t) = s − v sin(α),

with α the angle between the positive x axis and (x(t), y(t)) at time t. From these equations,

dy/dx = y′(t)/x′(t) = (s − v sin(α))/(−v cos(α)) = tan(α) − (s/v) sec(α).

FIGURE 1.13 The swimmer's path.

From Figure 1.13,

tan(α) = y/x and sec(α) = √(x² + y²)/x.

Therefore

dy/dx = y/x − (s/v)(1/x)√(x² + y²).

Write this as the homogeneous equation

dy/dx = y/x − (s/v)√(1 + (y/x)²),

and put y = ux to obtain

(1/√(1 + u²)) du = −(s/v)(1/x) dx.

Integrate to get

ln(u + √(1 + u²)) = −(s/v) ln(x) + C.

Take the exponential of both sides of this equation:

u + √(1 + u²) = e^C e^{−s ln(x)/v}.

We can write this as

u + √(1 + u²) = Kx^{−s/v}.

This equation can be solved for u. First write

√(1 + u²) = Kx^{−s/v} − u

and square both sides to get

1 + u² = K²x^{−2s/v} − 2Kux^{−s/v} + u².

Now u² cancels and we can solve for u:

u(x) = (1/2)Kx^{−s/v} − (1/(2K))x^{s/v}.

Finally, put u = y/x to get

y(x) = (1/2)Kx^{1−s/v} − (1/(2K))x^{1+s/v}.

To determine K, notice that y(w) = 0, since we put the origin at the point of destination. Thus

(1/2)Kw^{1−s/v} − (1/(2K))w^{1+s/v} = 0,

and we obtain

K = w^{s/v}.

Therefore,

y(x) = (w/2)[(x/w)^{1−s/v} − (x/w)^{1+s/v}].
FIGURE 1.14 Graphs of y = (w/2)[(x/w)^{1−s/v} − (x/w)^{1+s/v}] for s/v equal to 1/5, 1/3, 1/2, and 3/4, and w chosen as 1.
As might be expected, the path the swimmer takes depends on the width of the canal, the speed of the swimmer, and the speed of the current. Figure 1.14 shows trajectories corresponding to s/v equal to 1/5, 1/3, 1/2, and 3/4, with w = 1.
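The swimmer's trajectory formula can be checked numerically: it should pass through the entry point (w, 0), approach the origin, and satisfy the homogeneous equation derived above (the particular values of w, s, and v below are arbitrary):

```python
import math

w, s, v = 1.0, 1.0, 3.0  # canal width, current speed, swimmer speed (s/v = 1/3)

def y(x):
    r = s / v
    return (w / 2) * ((x / w)**(1 - r) - (x / w)**(1 + r))

# endpoints: swimmer enters at (w, 0) and arrives near the origin
assert abs(y(w)) < 1e-12
assert abs(y(1e-9)) < 1e-5

# trajectory satisfies dy/dx = y/x - (s/v) * sqrt(1 + (y/x)^2)
h = 1e-6
for x in [0.2, 0.5, 0.9]:
    dy = (y(x + h) - y(x - h)) / (2 * h)
    u = y(x) / x
    assert abs(dy - (u - (s / v) * math.sqrt(1 + u**2))) < 1e-5
print("swimmer trajectory verified")
```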
1.6.2 The Bernoulli Equation
DEFINITION 1.7
A Bernoulli equation is a first-order equation

y′ + P(x)y = R(x)y^α,

in which α is a real number.
A Bernoulli equation is separable if α = 0 and linear if α = 1. About 1696, Leibniz showed that a Bernoulli equation with α ≠ 1 transforms to a linear equation under the change of variables

v = y^{1−α}.

This is routine to verify. Here is an example.
EXAMPLE 1.27
Consider the equation

y′ + (1/x)y = 3x²y³,

which is Bernoulli with P(x) = 1/x, R(x) = 3x², and α = 3. Make the change of variables

v = y^{−2}.

Then y = v^{−1/2} and

y′(x) = −(1/2)v^{−3/2}v′(x),

so the differential equation becomes

−(1/2)v^{−3/2}v′(x) + (1/x)v^{−1/2} = 3x²v^{−3/2},

or, upon multiplying by −2v^{3/2},

v′ − (2/x)v = −6x²,

a linear equation. An integrating factor is e^{−∫(2/x)dx} = x^{−2}. Multiply the last equation by this factor to get

x^{−2}v′ − 2x^{−3}v = −6,

which is

(x^{−2}v)′ = −6.

Integrate to get

x^{−2}v = −6x + C,

so

v = −6x³ + Cx².

The general solution of the Bernoulli equation is

y(x) = 1/√(v(x)) = 1/√(Cx² − 6x³).

1.6.3
The Riccati Equation
DEFINITION 1.8
A differential equation of the form

y′ = P(x)y² + Q(x)y + R(x)

is called a Riccati equation.
A Riccati equation is linear exactly when P(x) is identically zero. If we can somehow obtain one solution S(x) of a Riccati equation, then the change of variables

y = S(x) + 1/z
transforms the Riccati equation to a linear equation. The strategy is to find the general solution of this linear equation and from it produce the general solution of the original Riccati equation.
EXAMPLE 1.28
Consider the Riccati equation

y′ = (1/x)y² + (1/x)y − (2/x).

By inspection, y = S(x) = 1 is one solution. Define a new variable z by putting

y = 1 + 1/z.

Then

y′ = −(1/z²)z′.

Substitute these into the Riccati equation to get

−(1/z²)z′ = (1/x)(1 + 1/z)² + (1/x)(1 + 1/z) − (2/x),

or

z′ + (3/x)z = −(1/x).

This is linear. An integrating factor is e^{∫(3/x)dx} = x³. Multiply by x³ to get

x³z′ + 3x²z = (x³z)′ = −x².

Integrate to get

x³z = −(1/3)x³ + C,

so

z(x) = −(1/3) + C/x³.

The general solution of the Riccati equation is

y(x) = 1 + 1/z(x) = 1 + 1/(−1/3 + C/x³).

This solution can also be written

y(x) = (K + 2x³)/(K − x³),

in which K = 3C is an arbitrary constant.
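The compact form of the Riccati solution in Example 1.28 can be verified directly against y′ = (y² + y − 2)/x (the constant K and the sample points are arbitrary):

```python
K = 5.0  # arbitrary constant (K = 3C in the text)

def y(x):
    return (K + 2 * x**3) / (K - x**3)

h = 1e-6
for x in [0.5, 1.0, 1.5]:  # stay away from the pole at x^3 = K
    dy = (y(x + h) - y(x - h)) / (2 * h)
    # the Riccati equation: y' = (1/x)y^2 + (1/x)y - 2/x
    assert abs(dy - (y(x)**2 + y(x) - 2) / x) < 1e-4
print("Riccati solution verified")
```

Note that y(0) = 1 for every K, consistent with the particular solution S(x) = 1 used in the substitution.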
SECTION 1.6 PROBLEMS
In each of Problems 1 through 14, find the general solution. These problems include all types considered in this section.
1. y′ = (1/x²)y² − (1/x)y + 1
2. y′ + (2/x)y = (1/x³)y^{−4/3}
3. y′ + xy = xy²
4. y′ = (x/y) + (y/x)
5. y′ = y/(x + y)
6. y′ = (1/(2x))y² − (1/x)y − (4/x)
7. (x − 2y)y′ = 2x − y
8. xy′ = x cos(y/x) + y
9. y′ + (1/x)y = (1/x⁴)y^{−3/4}
10. x²y′ = x² + y²
11. y′ = −(1/x)y² + (2/x)y
12. x³y′ = x²y − y³
13. y′ = −e^{−x}y² + y + e^x
14. y′ + (3/x)y = (2/x)y²
15. Consider the differential equation

y′ = F((ax + by + c)/(dx + ey + r)),

in which a, b, c, d, e, and r are constants and F is a differentiable function of one variable.
(a) Show that this equation is homogeneous if and only if c = r = 0.
(b) If c and/or r is not zero, this equation is called nearly homogeneous. Assuming that ae − bd ≠ 0, show that it is possible to choose constants h and k so that the transformation X = x + h, Y = y + k converts this nearly homogeneous equation into a homogeneous one. Hint: Put x = X − h, y = Y − k into the differential equation and obtain a differential equation in X and Y. Use the conclusion of (a) to choose h and k so that this equation is homogeneous.
In each of Problems 16 through 19, use the idea of Problem 15 to find the general solution.
16. y′ = (y − 3)/(x + y − 1)
17. y′ = (3x − y − 9)/(x + y + 1)
18. y′ = (x + 2y + 7)/(−2x + y − 9)
19. y′ = (2x − 5y − 9)/(−4x + y + 9)
20. Continuing from Problem 15, consider the case that ae − bd = 0. Now let u = (ax + by)/a, assuming that a ≠ 0. Show that this transforms the differential equation of Problem 15 into the separable equation

du/dx = 1 + (b/a)F((au + c)/(du + r)).

In each of Problems 21 through 24, use the method of Problem 20 to find the general solution.
21. y′ = (x − y + 2)/(x − y + 3)
22. y′ = (3x + y − 1)/(6x + 2y − 3)
23. y′ = (x − 2y)/(3x − 6y + 4)
24. y′ = (x − y + 6)/(3x − 3y + 4)
25. (The Pursuing Dog) A man stands at the junction of two perpendicular roads and his dog is watching him from one of the roads at a distance A feet away. At a given instant the man starts to walk with constant speed v along the other road, and at the same time the dog begins to run toward the man with speed 2v. Determine the path the dog will take, assuming that it always moves so that it is facing the man. Also determine when the dog will eventually catch the man. (This is American Mathematical Monthly problem 3942, 1941). 26. (Pursuing Bugs) One bug is located at each corner of a square table of side length a. At a given time they begin moving at constant speed v, each pursuing its neighbor to the right. (a) Determine the curve of pursuit of each bug. Hint: Use polar coordinates with the origin at the
center of the table and the polar axis containing one of the corners. When a bug is at (f(θ), θ), its target is at (f(θ), θ + π/2). Use the chain rule to write

dy/dx = (dy/dθ)/(dx/dθ),

where y = f(θ) sin(θ) and x = f(θ) cos(θ).
(b) Determine the distance traveled by each bug.
(c) Does any bug actually catch its quarry?
27. (The Spinning Bug) A bug steps onto the edge of a disk of radius a that is spinning at a constant angular speed ω. The bug moves toward the center of the disk at constant speed v.
(a) Derive a differential equation for the path of the bug, using polar coordinates.
(b) How many revolutions will the disk make before the bug reaches the center? (The solution will be in terms of the angular speed and radius of the disk.)
(c) Referring to (b), what is the total distance the bug will travel, taking into account the motion of the disk?
1.7 Applications to Mechanics, Electrical Circuits, and Orthogonal Trajectories

1.7.1 Mechanics
Before applying first-order differential equations to problems in mechanics, we will review some background. Newton's second law of motion states that the rate of change of momentum (mass times velocity) of a body is proportional to the resultant force acting on the body. This is a vector equation, but we will for now consider only motion along a straight line. In this case Newton's law is

F = k (d/dt)(mv).

We will take k = 1, consistent with certain units of measurement, such as the English, MKS, or cgs systems. The mass of a moving object need not be constant. For example, an airplane consumes fuel as it moves. If m is constant, then Newton's law is

F = m dv/dt = ma,

in which a is the acceleration of the object along the line of motion. If m is not constant, then

F = m dv/dt + v dm/dt.

Newton's law of gravitational attraction states that if two objects have masses m₁ and m₂, and they (or their centers of mass) are at distance r from each other, then each attracts the other with a gravitational force of magnitude

F = G m₁m₂/r².

This force is directed along the line between the centers of mass. G is the universal gravitational constant. If one of the objects is the earth, then

F = G mM/(R + x)²,

where M is the mass of the earth, R is its radius (about 3,960 miles), m is the mass of the second object, and x is its distance from the surface of the earth. This assumes that the earth is
spherical and that its center of mass is at the center of this sphere, a good enough approximation for some purposes. If x is small compared to R, then R + x is approximately R and the force on the object is approximately GM m R2 which is often written as mg. Here g = GM/R2 is approximately 32 feet per second per second or 98 meters per second per second. We are now ready to analyze some problems in mechanics. Terminal Velocity Consider an object that is falling under the influence of gravity, in a medium such as water, air or oil. This medium retards the downward motion of the object. Think, for example, of a brick dropped in a swimming pool or a ball bearing dropped in a tank of oil. We, want to analyze the object’s motion. Let vt be the velocity at time t. The force of gravity pulls the object down and has magnitude mg. The medium retards the motion. The magnitude of this retarding force is not obvious, but experiment has shown that its magnitude is proportional to the square of the velocity. If we choose downward as the positive direction and upward as negative, then Newton’s law tells us that, for some constant , dv dt If we assume that the object begins its motion from rest (dropped, not thrown) and if we start the clock at this instant, then v0 = 0. We now have an initial value problem for the velocity: v = g − v2 v0 = 0 m This differential equation is separable. In differential form, F = mg − v2 = m
\frac{1}{g - (\alpha/m)v^2}\,dv = dt

Integrate to get

\sqrt{\frac{m}{g\alpha}}\,\tanh^{-1}\!\left(\sqrt{\frac{\alpha}{mg}}\,v\right) = t + C

Solve for the velocity, obtaining

v(t) = \sqrt{\frac{mg}{\alpha}}\,\tanh\!\left(\sqrt{\frac{g\alpha}{m}}\,(t + C)\right)

Now use the initial condition to solve for the integration constant:

v(0) = \sqrt{\frac{mg}{\alpha}}\,\tanh\!\left(\sqrt{\frac{g\alpha}{m}}\,C\right) = 0

Since \tanh(\theta) = 0 only if \theta = 0, this requires that C = 0, and the solution for the velocity is

v(t) = \sqrt{\frac{mg}{\alpha}}\,\tanh\!\left(\sqrt{\frac{g\alpha}{m}}\,t\right)

Even in this generality, we can draw an important conclusion about the motion. As t increases, \tanh(\sqrt{g\alpha/m}\,t) approaches 1. This means that

\lim_{t\to\infty} v(t) = \sqrt{\frac{mg}{\alpha}}
CHAPTER 1  First-Order Differential Equations
This means that an object falling under the influence of gravity, through a retarding medium (with force proportional to the square of the velocity), will not increase in velocity indefinitely. Instead, the object's velocity approaches the limiting value \sqrt{mg/\alpha}. If the medium is deep enough, the object will settle into a descent of approximately constant velocity. This number \sqrt{mg/\alpha} is called the terminal velocity of the object. Skydivers experience this phenomenon.
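This conclusion is easy to check numerically. The sketch below uses illustrative values m = 6 slugs, α = 0.5, and g = 32 ft/s² (none of which come from the text) and integrates the velocity equation directly, comparing the result with the tanh solution and the terminal velocity:

```python
import math

# Illustrative values (assumptions for demonstration only)
m, alpha, g = 6.0, 0.5, 32.0

def v_exact(t):
    # Closed-form solution: v(t) = sqrt(mg/alpha) * tanh(sqrt(g*alpha/m) * t)
    return math.sqrt(m * g / alpha) * math.tanh(math.sqrt(g * alpha / m) * t)

# Euler integration of dv/dt = g - (alpha/m) v^2 from v(0) = 0
v, t, dt = 0.0, 0.0, 1e-4
while t < 5.0:
    v += (g - (alpha / m) * v * v) * dt
    t += dt

v_terminal = math.sqrt(m * g / alpha)   # limiting velocity sqrt(mg/alpha)
print(v, v_exact(5.0), v_terminal)
```

After a few seconds of simulated fall, all three numbers agree closely, which is exactly the approach to terminal velocity described above.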
Motion of a Chain on a Pulley
A 16-foot-long chain weighing \rho pounds per foot hangs over a small pulley, which is 20 feet above the floor. Initially, the chain is held at rest with 7 feet on one side and 9 feet on the other, as in Figure 1.15. How long after the chain is released, and with what velocity, will it leave the pulley?
When 8 feet of chain hang on each side of the pulley, the chain is in equilibrium. Call this position x = 0 and let x(t) be the distance the chain has fallen below this point at time t. The net force acting on the chain is 2\rho x, and the mass of the chain is 16\rho/32, or \rho/2, slugs. The ends of the chain have the same speed as its center of mass, so the acceleration of the chain at its center of mass is the same as it is at its ends. The equation of motion is

\frac{\rho}{2}\,\frac{dv}{dt} = 2\rho x
FIGURE 1.15  Chain on a pulley.
from which \rho cancels to yield

\frac{dv}{dt} = 4x

A chain rule differentiation enables us to write this equation in terms of v as a function of x. Write

\frac{dv}{dt} = \frac{dv}{dx}\,\frac{dx}{dt} = v\,\frac{dv}{dx}

Then

v\,\frac{dv}{dx} = 4x

This is a separable equation, which we solve to get

v^2 = 4x^2 + K

Now v = 0 when x = 1, so K = -4 and

v^2 = 4x^2 - 4

The chain leaves the pulley when x = 8. When this occurs, v^2 = 4(63) = 252, so v = \sqrt{252} = 6\sqrt{7} feet per second (about 15.87 feet per second).
To calculate the time t_f required for the chain to leave the pulley, compute

t_f = \int_0^{t_f} dt = \int_1^8 \frac{dt}{dx}\,dx = \int_1^8 \frac{1}{v}\,dx

Since v(x) = 2\sqrt{x^2 - 1},

t_f = \int_1^8 \frac{1}{2\sqrt{x^2 - 1}}\,dx = \frac{1}{2}\ln\!\left(x + \sqrt{x^2 - 1}\right)\Big|_1^8 = \frac{1}{2}\ln\!\left(8 + \sqrt{63}\right)
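Both numbers can be checked by integrating the equation of motion dv/dt = 4x directly (a sketch using semi-implicit Euler, starting from x = 1, v = 0):

```python
# Integrate dx/dt = v, dv/dt = 4x from x(0) = 1, v(0) = 0
# until the chain leaves the pulley at x = 8
x, v, t, dt = 1.0, 0.0, 0.0, 1e-5
while x < 8.0:
    x += v * dt          # position update
    v += 4.0 * x * dt    # velocity update, dv/dt = 4x
    t += dt

print(v)   # exit velocity, close to 6*sqrt(7) ft/s
print(t)   # exit time, close to (1/2) ln(8 + sqrt(63)) s
```

The computed exit velocity and exit time agree with the closed-form values derived above.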
This is about 1.38 seconds.
In this example the mass was constant, so dm/dt = 0 in Newton's law of motion. Next is an example in which the mass varies with time.

Chain Piling on the Floor
Suppose a 40-foot-long chain weighing \rho pounds per foot is supported in a pile several feet above the floor, and begins to unwind when released from rest with 10 feet already played out. Determine the velocity with which the chain leaves the support.
The amount of chain that is actually in motion changes with time. Let x(t) denote the length of that part of the chain that has left the support by time t and is currently in motion. The equation of motion is

m\,\frac{dv}{dt} + v\,\frac{dm}{dt} = F \qquad (1.19)

where F is the total external force acting on the chain. Now F = \rho x = mg, so m = \rho x/g = \rho x/32. Then
\frac{dm}{dt} = \frac{\rho}{32}\,\frac{dx}{dt} = \frac{\rho}{32}v

Further,

\frac{dv}{dt} = v\,\frac{dv}{dx}

as in the preceding example. Put this information into equation (1.19) to get

\frac{\rho x}{32}\,v\,\frac{dv}{dx} + \frac{\rho}{32}v^2 = \rho x

If we multiply this equation by 32/(\rho x v), we get

\frac{dv}{dx} + \frac{1}{x}v = \frac{32}{v} \qquad (1.20)

which we recognize as a Bernoulli equation with \alpha = -1. Make the transformation w = v^{1-\alpha} = v^2. Then v = w^{1/2} and

\frac{dv}{dx} = \frac{1}{2}w^{-1/2}\,\frac{dw}{dx}

Substitute these into equation (1.20) to get

\frac{1}{2}w^{-1/2}\,\frac{dw}{dx} + \frac{1}{x}w^{1/2} = 32w^{-1/2}

Upon multiplying this equation by 2w^{1/2}, we get

\frac{dw}{dx} + \frac{2}{x}w = 64

a linear equation for w(x). Solve this to get

w(x) = v(x)^2 = \frac{64}{3}x + \frac{C}{x^2}
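As a check on this reduction, one can integrate the linear w equation numerically from the known initial condition (the chain starts from rest with 10 feet played out, so v = 0 and hence w = 0 at x = 10). A sketch:

```python
# RK4 integration of dw/dx = 64 - (2/x) w from w(10) = 0 to x = 40
def f(x, w):
    return 64.0 - 2.0 * w / x

n = 30000
h = 30.0 / n
x, w = 10.0, 0.0
for _ in range(n):
    k1 = f(x, w)
    k2 = f(x + h / 2, w + h * k1 / 2)
    k3 = f(x + h / 2, w + h * k2 / 2)
    k4 = f(x + h, w + h * k3)
    w += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    x += h

print(w)          # w = v^2 at x = 40; about 840
print(w ** 0.5)   # v at x = 40; about 29 ft/s
```

The numerical value of w at x = 40 matches the closed-form solution evaluated there.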
Since v = 0 when x = 10, 0 = (64/3)(10) + C/100, so C = -64{,}000/3. Therefore,

v(x)^2 = \frac{64}{3}\left(x - \frac{1000}{x^2}\right)

The chain leaves the support when x = 40. At this time,
v^2 = \frac{64}{3}\left(40 - \frac{1000}{1600}\right) = 840

so the velocity is v = \sqrt{840} = 2\sqrt{210}, or about 29 feet per second.
In these models involving chains, air resistance was neglected as having no significant impact on the outcome. This was quite different from the analysis of terminal velocity, in which air resistance is a key factor. Without it, skydivers dive only once!

Motion of a Block Sliding on an Inclined Plane
A block weighing 96 pounds is released from rest at the top of an inclined plane of slope length 50 feet, making an angle of \pi/6 radians with the horizontal. Assume a coefficient of friction of \mu = \sqrt{3}/4. Assume also that air resistance acts to retard the block's descent down the ramp, with a force of magnitude equal to one half the block's velocity. We want to determine the velocity v(t) of the block at any time t.
Figure 1.16 shows the forces acting on the block. Gravity acts downward along the incline with magnitude mg\sin(\theta), which is 96\sin(\pi/6), or 48 pounds. Here mg = 96 is the weight of the block. The drag due to friction acts in the reverse direction and is, in pounds,

-\mu N = -\mu\,mg\cos(\theta) = -\frac{\sqrt{3}}{4}\,96\cos\!\left(\frac{\pi}{6}\right) = -36

The drag force due to air resistance is -v/2, the negative sign indicating that this is a retarding force. The total external force on the block is

F = 48 - 36 - \tfrac{1}{2}v = 12 - \tfrac{1}{2}v

Since the block weighs 96 pounds, it has a mass of 96/32, or 3, slugs. From Newton's second law,

3\,\frac{dv}{dt} = 12 - \frac{1}{2}v

This is a linear equation, which we write as

v' + \frac{1}{6}v = 4
FIGURE 1.16  Forces acting on a block on an inclined plane.
An integrating factor is e^{\int (1/6)\,dt} = e^{t/6}. Multiply the differential equation by this factor to obtain

v'e^{t/6} + \tfrac{1}{6}e^{t/6}v = \left(ve^{t/6}\right)' = 4e^{t/6}
and integrate to get

ve^{t/6} = 24e^{t/6} + C

The velocity is

v(t) = 24 + Ce^{-t/6}

Since the block starts from rest at time zero, v(0) = 0 = 24 + C, so C = -24 and

v(t) = 24\left(1 - e^{-t/6}\right)

Let x(t) be the position of the block at any time, measured from the top of the plane. Since v(t) = x'(t), we get

x(t) = \int v(t)\,dt = 24t + 144e^{-t/6} + K

If we let the top of the block be the origin along the inclined plane, then x(0) = 0 = 144 + K, so K = -144. The position function is

x(t) = 24t + 144\left(e^{-t/6} - 1\right)

We can now determine the block's position and velocity at any time. Suppose, for example, we want to know when the block reaches the bottom of the ramp. This happens when the block has gone 50 feet. If this occurs at time T, then

x(T) = 50 = 24T + 144\left(e^{-T/6} - 1\right)

This transcendental equation cannot be solved algebraically for T, but a computer approximation yields T \approx 5.8 seconds. Notice that

\lim_{t\to\infty} v(t) = 24
which means that the block sliding down the ramp has a terminal velocity. If the ramp is long enough, the block will eventually settle into a slide of approximately constant velocity.
The mathematical model we have constructed for the sliding block can be used to analyze the motion of the block under a variety of conditions. For example, we can solve the equations leaving the angle \theta arbitrary, and determine the influence of the slope angle of the ramp on position and velocity. Or we could leave the coefficient of friction \mu unspecified and study the influence of friction on the motion.
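The approximation T ≈ 5.8 seconds can be reproduced with a simple bisection on the transcendental equation x(T) = 50 (a sketch):

```python
import math

def x_pos(t):
    # Position along the ramp: x(t) = 24t + 144(e^{-t/6} - 1)
    return 24.0 * t + 144.0 * (math.exp(-t / 6.0) - 1.0)

# Bisection for x(T) = 50 on [0, 10]; x_pos(0) < 50 < x_pos(10)
lo, hi = 0.0, 10.0
for _ in range(60):
    mid = (lo + hi) / 2
    if x_pos(mid) < 50.0:
        lo = mid
    else:
        hi = mid

T = (lo + hi) / 2
print(T)   # about 5.8 seconds
```

Bisection is a safe choice here because x(t) is strictly increasing for t > 0, so the root is unique.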
1.7.2
Electrical Circuits
Electrical engineers often use differential equations to model circuits. The mathematical model is used to analyze the behavior of circuits under various conditions, and aids in the design of circuits having specific characteristics. We will look at simple circuits having only resistors, inductors and capacitors. A capacitor is a storage device consisting of two plates of conducting material isolated from one another by an insulating material, or dielectric. Electrons can be transferred from one plate to another via external circuitry by applying an electromotive force to the circuit. The charge on a capacitor is essentially a count of the difference between the numbers of electrons on the two plates. This charge is proportional to the applied electromotive force, and the constant of proportionality
is the capacitance. Capacitance is usually a very small number, given in micro- (10^{-6}) or pico- (10^{-12}) farads. To simplify examples and problems, some of the capacitors in this book are assigned numerical values that would actually make them occupy large buildings.
An inductor is made by winding a conductor such as wire around a core of magnetic material. When a current is passed through the wire, a magnetic field is created in the core and around the inductor. The voltage drop across an inductor is proportional to the change in the current flow, and this constant of proportionality is the inductance of the inductor, measured in henrys.
Current is measured in amperes, with one amp equivalent to a rate of electron flow of one coulomb per second. Charge q(t) and current i(t) are related by

i(t) = q'(t)

The voltage drop across a resistor having resistance R is iR. The drop across a capacitor having capacitance C is q/C. And the voltage drop across an inductor having inductance L is Li'(t).
We construct equations for a circuit by using Kirchhoff's current and voltage laws. Kirchhoff's current law states that the algebraic sum of the currents at any junction of a circuit is zero. This means that the total current entering the junction must balance the current leaving (conservation of charge). Kirchhoff's voltage law states that the algebraic sum of the potential rises and drops around any closed loop in a circuit is zero.
As an example of modeling a circuit mathematically, consider the circuit of Figure 1.17. Starting at point A, move clockwise around the circuit, first crossing the battery, where there is an increase in potential of E volts. Next there is a decrease in potential of iR volts across the resistor. Finally, there is a decrease of Li'(t) across the inductor, after which we return to point A. By Kirchhoff's voltage law,

E - iR - Li' = 0

which is the linear equation

i' + \frac{R}{L}i = \frac{E}{L}
Solve this to obtain

i(t) = \frac{E}{R} + Ke^{-Rt/L}

To determine the constant K, we need to be given the current at some time. Even without this, we can tell from this equation that as t \to \infty, the current approaches the limiting value E/R. This is the steady-state value of the current in the circuit.
Another way to derive the differential equation of this circuit is to designate one of the components as a source, then set the voltage drop across that component equal to the sum of the voltage drops across the other components. To see this approach, consider the circuit of Figure 1.18. Suppose the switch is initially open so that no current flows, and that the charge
FIGURE 1.17  RL circuit.

FIGURE 1.18  RC circuit.
on the capacitor is zero. At time zero, close the switch. We want the charge on the capacitor. Notice that we have to close the switch before there is a loop. Using the battery as a source, write

iR + \frac{1}{C}q = E

or

Rq' + \frac{1}{C}q = E

This leads to the linear equation

q' + \frac{1}{RC}q = \frac{E}{R}

with solution

q(t) = EC\left(1 - e^{-t/RC}\right)

satisfying q(0) = 0. This equation provides a good deal of information about the circuit. Since the voltage on the capacitor at time t is q(t)/C, or E(1 - e^{-t/RC}), we can see that the voltage approaches E as t \to \infty. Since E is the battery potential, the difference between battery and capacitor voltages becomes negligible as time increases, indicating a very small voltage drop across the resistor. The current in this circuit can be computed as

i(t) = q'(t) = \frac{E}{R}e^{-t/RC}

after the circuit is switched on. Thus i(t) \to 0 as t \to \infty.
Often we encounter discontinuous currents and potential functions in dealing with circuits. These can be treated using Laplace transform techniques, which we will discuss in Chapter 3.
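A short numerical illustration of this charging behavior (a sketch; the values E = 12 V, R = 1000 Ω, and C = 10⁻³ F are arbitrary illustrative choices, not from the text):

```python
import math

# Illustrative circuit values (assumptions for demonstration only)
E, R, C = 12.0, 1000.0, 1e-3

def q(t):
    # Charge on the capacitor: q(t) = EC(1 - e^{-t/RC})
    return E * C * (1.0 - math.exp(-t / (R * C)))

def i(t):
    # Current: i(t) = q'(t) = (E/R) e^{-t/RC}
    return (E / R) * math.exp(-t / (R * C))

tau = R * C                  # time constant; 1 second with these values
print(i(0.0))                # initial current is E/R
print(q(5 * tau) / C)        # capacitor voltage after 5 time constants, close to E
print(i(5 * tau))            # current has decayed to a small fraction of E/R
```

The printed values show the two limits discussed above: the capacitor voltage rises toward E while the current decays toward zero.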
1.7.3
Orthogonal Trajectories
Two curves intersecting at a point P are said to be orthogonal if their tangents are perpendicular (orthogonal) at P. Two families of curves, or trajectories, are orthogonal if each curve of the first family is orthogonal to each curve of the second family, wherever an intersection occurs. Orthogonal families occur in many contexts. Parallels and meridians on a globe are orthogonal, as are equipotential and electric lines of force. A problem that occupied Newton and other early developers of the calculus was the determination of the family of orthogonal trajectories of a given family of curves.
Suppose we are given a family ℱ of curves in the plane. We want to construct a second family 𝒢 of curves so that every curve in 𝒢 is orthogonal to every curve in ℱ wherever an intersection occurs. As a simple example, suppose ℱ consists of all circles about the origin. Then 𝒢 consists of all straight lines through the origin (Figure 1.19). It is clear that each straight line is orthogonal to each circle wherever the two intersect.
FIGURE 1.19  Orthogonal families: circles and lines.
In general, suppose we are given a family ℱ of curves. These must be described in some way, say by an equation

F(x, y, k) = 0

giving a different curve for each choice of the constant k. Think of these curves as integral curves of a differential equation

y' = f(x, y)

which we determine from the equation F(x, y, k) = 0 by differentiation. At a point (x_0, y_0), the slope of the curve C in ℱ through this point is f(x_0, y_0). Assuming that this is nonzero, any curve through (x_0, y_0) and orthogonal to C at this point must have slope -1/f(x_0, y_0). (Here we use the fact that two lines are orthogonal if and only if their slopes are negative reciprocals.) The family 𝒢 of orthogonal trajectories of ℱ therefore consists of the integral curves of the differential equation

y' = -\frac{1}{f(x, y)}

Solve this differential equation for the curves in 𝒢.
EXAMPLE 1.29
Consider the family ℱ of curves that are graphs of

F(x, y, k) = y - kx^2 = 0

This is a family of parabolas. We want the family 𝒢 of orthogonal trajectories. First obtain the differential equation of ℱ. Differentiate y - kx^2 = 0 to get

y' - 2kx = 0

To eliminate k, use the equation y - kx^2 = 0 to write

k = \frac{y}{x^2}

Then

y' - 2\,\frac{y}{x^2}\,x = 0

or

y' = \frac{2y}{x} = f(x, y)
This is the differential equation of the family ℱ. Curves in ℱ are integral curves of this differential equation, which is of the form y' = f(x, y), with f(x, y) = 2y/x. The family 𝒢 of orthogonal trajectories therefore has differential equation

y' = -\frac{1}{f(x, y)} = -\frac{x}{2y}

This equation is separable, since

2y\,dy = -x\,dx

Integrate to get

y^2 = -\frac{1}{2}x^2 + C

This is a family of ellipses

\frac{1}{2}x^2 + y^2 = C
Some of the parabolas from ℱ and ellipses from 𝒢 are shown in Figure 1.20. Each parabola in ℱ is orthogonal to each ellipse in 𝒢 wherever these curves intersect.

FIGURE 1.20  Orthogonal families: parabolas and ellipses.
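The orthogonality is easy to confirm numerically: at any intersection of a parabola y = kx² with an ellipse x²/2 + y² = C, the slope 2y/x of the parabola and the slope −x/(2y) of the ellipse multiply to −1. A sketch:

```python
import math

def intersection(k, C):
    # Positive-x intersection of y = k x^2 with x^2/2 + y^2 = C:
    # substitute u = x^2 and solve k^2 u^2 + u/2 - C = 0 for u > 0
    u = (-0.5 + math.sqrt(0.25 + 4 * k * k * C)) / (2 * k * k)
    x = math.sqrt(u)
    return x, k * x * x

for k, C in [(1.0, 1.0), (2.0, 3.0), (0.5, 2.0)]:
    x, y = intersection(k, C)
    slope_parabola = 2 * y / x        # y' = 2y/x on the parabola
    slope_ellipse = -x / (2 * y)      # y' = -x/(2y) on the ellipse
    print(slope_parabola * slope_ellipse)   # close to -1 at every intersection
```

The parameter pairs (k, C) here are arbitrary samples; any curve from each family would do.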
SECTION 1.7
PROBLEMS
Mechanical Systems

1. Suppose that the pulley described in this section is only 9 feet above the floor. Assuming the same initial conditions as in the discussion, determine the velocity with which the chain leaves the pulley. Hint: The mass of the part of the chain that is in motion is \rho(16 - x)/32.

2. Determine the time it takes for the chain in Problem 1 to leave the pulley.

3. Suppose the support is only 10 feet above the floor in the discussion of the chain piling on the floor. Calculate the velocity of the moving part of the chain as it leaves the support. (Note the hint to Problem 1.)

4. (Chain and Weight on a Pulley) An 8-pound weight is attached to one end of a 40-foot chain that weighs \rho pounds per foot. The chain is supported by a small frictionless pulley located more than 40 feet above the floor. Initially, the chain is held at rest with 23 feet hanging on one side of the pulley with the remainder of the chain, along with the weight, on the other side. How long after the chain is released, and with what velocity, will it leave the pulley?

5. (Chain on a Table) A 24-foot chain weighing \rho pounds per foot is stretched out on a very tall, frictionless table with 6 feet hanging off the edge. If the chain is released from rest, determine the time it takes for the end of the chain to fall off the table, and also the velocity of the chain at this instant.

6. (Variable Mass Chain on a Low Table) Suppose the chain in Problem 5 is placed on a table that is only
4 feet high, so that the chain accumulates on the floor as it slides off the table. Two feet of chain are already piled up on the floor at the time that the rest of the chain is released. Determine the velocity of the moving end of the chain at the instant it leaves the table top. Hint: The mass of that part of the chain that is moving changes with time. Newton's law applies to the center of mass of the moving system.

7. Determine the time it takes for the chain to leave the support in the discussion of the chain piling on the floor.

8. Use the conservation of energy principle (potential energy plus kinetic energy of a conservative system is a constant of the motion) to obtain the velocity of the chain in the discussion involving the chain on the pulley.

9. Use the conservation of energy principle to give an alternate derivation of the conclusion of the discussion of the chain piling on the floor.

10. (Paraboloid of Revolution) Determine the shape assumed by the surface of a liquid being spun in a circular bowl at constant angular velocity \omega. Hint: Consider a particle of liquid located at (x, y) on the surface of the liquid, as in Figure 1.21. The forces acting on the particle are the horizontal force having magnitude m\omega^2 x and a vertical force of magnitude mg. Since the particle is in radial equilibrium, the resultant vector is normal to the curve.
FIGURE 1.21  Particle on the surface of a spinning liquid.
Properties of spinning liquids have found application in astronomy. A Canadian astronomer has constructed a telescope by spinning a bowl of mercury, creating a reflective surface free of the defects obtained by the usual grinding of a solid lens. He claims that the idea was probably known to Newton, but that he is the first to carry it out in practice. Roger Angel, a University of Arizona astronomer, has developed this idea into a technique for producing telescope mirrors called spin casting. As reported in Time (April 27, 1992), “ a complex ceramic mold is assembled inside the furnace and filled with glittering chunks of Pyrex-type glass. Once the furnace lid is sealed,
the temperature will slowly ratchet up over a period of several days, at times rising no more than 2 degrees Centigrade in an hour. At 750 degrees C (1382 degrees Fahrenheit), when the glass is a smooth, shiny lake, the furnace starts to whirl like a merry-go-round, an innovation that automatically spins the glass into the parabolic shape traditionally achieved by grinding." The result is a parabolic surface requiring little or no grinding before a reflective coat is applied. Professor Angel believes that the method will allow the construction of much larger mirrors than are possible by conventional techniques. Supporting this claim is his recent production of one of the world's largest telescope mirrors, a 6.5 meter (about 21 feet) mirror to be placed in an observatory atop Mount Hopkins in Arizona.

11. A 10-pound ballast bag is dropped from a hot air balloon which is at an altitude of 342 feet and ascending at a rate of 4 feet per second. Assuming that air resistance is not a factor, determine the maximum height attained by the bag, how long it remains aloft, and the speed with which it strikes the ground.

12. A 48-pound box is given an initial push of 16 feet per second down an inclined plane that has a gradient of 7/24. If there is a coefficient of friction of 1/3 between the box and the plane, and an air resistance equal to 3/2 the velocity of the box, determine how far the box will travel before coming to rest.

13. A skydiver and her equipment together weigh 192 pounds. Before the parachute is opened, there is an air drag equal to six times her velocity. Four seconds after stepping from the plane, the skydiver opens the parachute, producing a drag equal to three times the square of the velocity. Determine the velocity and how far the skydiver has fallen at time t. What is the terminal velocity?

14. Archimedes' principle of buoyancy states that an object submerged in a fluid is buoyed up by a force equal to the weight of the fluid that is displaced by the object.
A rectangular box, 1 by 2 by 3 feet, and weighing 384 pounds, is dropped into a 100-foot-deep freshwater lake. The box begins to sink with a drag due to the water having magnitude equal to 1/2 the velocity. Calculate the terminal velocity of the box. Will the box have achieved a velocity of 10 feet per second by the time it reaches bottom? Assume that the density of the water is 62.5 pounds per cubic foot.

15. Suppose the box in Problem 14 cracks open upon hitting the bottom of the lake, and 32 pounds of its contents fall out. Approximate the velocity with which the box surfaces.

16. The acceleration due to gravity inside the earth is proportional to the distance from the center of the
earth. An object is dropped from the surface of the earth into a hole extending through the earth's center. Calculate the speed the object achieves by the time it reaches the center.

17. A particle starts from rest at the highest point of a vertical circle and slides under only the influence of gravity along a chord to another point on the circle. Show that the time taken is independent of the choice of the terminal point. What is this common time?
Circuits

18. Determine each of the currents in the circuit of Figure 1.22.

FIGURE 1.22  (Circuit with a 10 V source and 10 Ω, 15 Ω, and 30 Ω resistors; currents i_1 and i_2.)

19. In the circuit of Figure 1.23, the capacitor is initially discharged. How long after the switch is closed will the capacitor voltage be 76 volts? Determine the current in the resistor at that time. (Here k denotes 1000 ohms and \mu F denotes 10^{-6} farads.)

FIGURE 1.23  (Circuit with an 80 V source, a 250 k\Omega resistor, and a 2 \mu F capacitor.)

20. Suppose, in Problem 19, the capacitor had a potential of 50 volts when the switch was closed. How long would it take for the capacitor voltage to reach 76 volts?

21. For the circuit of Figure 1.24, find all currents immediately after the switch is closed, assuming that all of these currents and the charges on the capacitors are zero just prior to closing the switch.

FIGURE 1.24  (Circuit with a 6 V source; 10 Ω, 30 Ω, and 15 Ω resistors; a 5 H inductor; 1/10 F and 4/10 F capacitors; currents i_1, i_2, and i_3.)

22. In a constant electromotive force RL circuit, we find that the current is given by

i(t) = \frac{E}{R}\left(1 - e^{-Rt/L}\right) + i(0)e^{-Rt/L}

Let i(0) = 0.
(a) Show that the current increases with time.
(b) Find a time t_0 at which the current is 63% of E/R. This time is called the inductive time constant of the circuit.
(c) Does the inductive time constant depend on i(0)? If so, in what way?

23. Recall that the charge q(t) in an RC circuit satisfies the linear differential equation

q' + \frac{1}{RC}q = \frac{1}{R}E(t)

(a) Solve for the charge in the case that E(t) = E, constant. Evaluate the constant of integration by using the condition q(0) = q_0.
(b) Determine \lim_{t\to\infty} q(t) and show that this limit is independent of q_0.
(c) Graph q(t). Determine when the charge has its maximum and minimum values.
(d) Determine at what time q(t) is within 1% of its steady-state value (the limiting value requested in (b)).

Orthogonal Trajectories

In each of Problems 24 through 29, find the family of orthogonal trajectories of the given family of curves. If software is available, graph some curves in the given family and some curves in the family of orthogonal trajectories.

24. x + 2y = K
25. 2x^2 - 3y = K
26. x^2 + 2y^2 = K
27. y = Kx^2 + 1
28. x^2 - Ky^2 = 1
29. y = e^{kx}
1.8
Existence and Uniqueness for Solutions of Initial Value Problems

We have solved several initial value problems

y' = f(x, y); \quad y(x_0) = y_0

and have always found that there is just one solution. That is, the solution existed, and it was unique. Can either existence or uniqueness fail to occur? The answer is yes, as the following examples show.
EXAMPLE 1.30
Consider the initial value problem

y' = 2y^{1/2}; \quad y(0) = -1

The differential equation is separable and has general solution

y(x) = (x + C)^2

To satisfy the initial condition, we must choose C so that

y(0) = C^2 = -1

and this is impossible if C is to be a real number. This initial value problem has no real-valued solution.
EXAMPLE 1.31
Consider the problem

y' = 2y^{1/2}; \quad y(2) = 0

One solution is the trivial function \varphi(x) = 0 for all x. But there is another solution. Define

\psi(x) = 0 \text{ for } x \le 2, \qquad \psi(x) = (x - 2)^2 \text{ for } x \ge 2

Graphs of both solutions are shown in Figure 1.25. Uniqueness fails in this example.
Because of examples such as these, we look for conditions that ensure that an initial value problem has a unique solution. The following theorem provides a convenient set of conditions.
FIGURE 1.25  Graphs of solutions of y' = 2\sqrt{y}; y(2) = 0.
THEOREM 1.2
Existence and Uniqueness
Let f and \partial f/\partial y be continuous for all (x, y) in a closed rectangle R centered at (x_0, y_0). Then there exists a positive number h such that the initial value problem

y' = f(x, y); \quad y(x_0) = y_0

has a unique solution defined over the interval (x_0 - h, x_0 + h).

As with the test for exactness (Theorem 1.1), by a closed rectangle we mean all points on or inside a rectangle in the plane having sides parallel to the axes.
Geometrically, existence of a solution of the initial value problem means that there is an integral curve of the differential equation passing through (x_0, y_0). Uniqueness means that there is only one such curve.
This is an example of a local theorem, in the following sense. The theorem guarantees existence of a unique solution that is defined on some interval of width 2h, but it says nothing about how large h is. Depending on f and x_0, h may be small, giving us existence and uniqueness "near" x_0. This is dramatically demonstrated by the initial value problem
y0 = n
in which n is any positive integer. Here fx y = y2 and f/ y = 2y, both continuous over the entire plane, hence on any closed rectangle about 0 n. The theorem tells us that there is a unique solution of this initial value problem in some interval −h h. In this case we can solve the initial value problem explicitly, obtaining yx = −
1 x − n1
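A numerical experiment makes the shrinking interval of existence visible (a sketch; the cap and step size are arbitrary choices): integrating y′ = y² forward from y(0) = n, the computed solution exceeds any fixed bound just before x = 1/n.

```python
def blowup_x(n, cap=1e6, h=1e-5):
    # Euler-integrate y' = y^2 from y(0) = n until y exceeds cap;
    # return the x value where that happens
    x, y = 0.0, float(n)
    while y < cap:
        y += h * y * y
        x += h
    return x

for n in (1, 2, 4):
    print(n, blowup_x(n))   # escape point is close to 1/n
```

Doubling n halves the x value at which the solution escapes, in line with the interval (−1/n, 1/n) found above.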
This solution is valid for -1/n < x < 1/n, so we can take h = 1/n in this example. This means that the size of n in the initial value controls the size of the interval for the solution. The larger n is, the smaller this interval must be. This fact is certainly not apparent from the initial value problem itself!
In the special case that the differential equation is linear, we can improve considerably on the existence/uniqueness theorem.

THEOREM 1.3
Let p and q be continuous on an open interval I and let x_0 be in I. Let y_0 be any number. Then the initial value problem

y' + p(x)y = q(x); \quad y(x_0) = y_0

has a unique solution defined for all x in I.
In particular, if p and q are continuous for all x, then there is a unique solution defined over the entire real line.

Proof   Equation (1.6) of Section 1.3 gives the general solution of the linear equation. Using this, we can write the solution of the initial value problem:

y(x) = e^{-\int_{x_0}^{x} p(\xi)\,d\xi}\left(\int_{x_0}^{x} q(\xi)\,e^{\int_{x_0}^{\xi} p(\zeta)\,d\zeta}\,d\xi + y_0\right)

Because p and q are continuous on I, this solution is defined for all x in I.
Therefore, in the case that the differential equation is linear, the initial value problem has a unique solution in the largest open interval containing x_0 in which both p and q are continuous.
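The proof's formula can be exercised numerically (a sketch using the trapezoid rule for both integrals; the test case p(x) = 1, q(x) = 1, y(0) = 0 has the known solution y = 1 − e^{−x}):

```python
import math

def solve_linear_ivp(p, q, x0, y0, x, n=2000):
    # Evaluate y(x) = e^{-P(x)} ( integral_{x0}^{x} q(s) e^{P(s)} ds + y0 ),
    # where P(s) = integral_{x0}^{s} p(u) du, using the trapezoid rule
    h = (x - x0) / n
    P = 0.0                    # running integral of p
    I = 0.0                    # running integral of q * e^P
    s, p_prev = x0, p(x0)
    g_prev = q(x0)             # q(s) e^{P(s)} at s = x0, where P = 0
    for _ in range(n):
        s_next = s + h
        p_next = p(s_next)
        P += h * (p_prev + p_next) / 2
        g_next = q(s_next) * math.exp(P)
        I += h * (g_prev + g_next) / 2
        s, p_prev, g_prev = s_next, p_next, g_next
    return math.exp(-P) * (I + y0)

y2 = solve_linear_ivp(lambda s: 1.0, lambda s: 1.0, 0.0, 0.0, 2.0)
print(y2, 1 - math.exp(-2.0))   # the two values agree closely
```

Note that the formula automatically satisfies y(x₀) = y₀, since both integrals vanish at x = x₀.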
SECTION 1.8   PROBLEMS
In each of Problems 1 through 5, show that the conditions of Theorem 1.2 are satisfied by the initial value problem. Assume familiar facts from the calculus about continuity of real functions of one and two variables.

1. y' = 2y^2 + 3xe^y\sin(xy); \quad y(2) = 4
2. y' = 4xy + \cosh(x); \quad y(1) = -1
3. y' = xy^3 - \sin(y); \quad y(2) = 2
4. y' = x^5 - y^5 + 2xe^y; \quad y(3) = \pi
5. y' = x^2ye^{-2x} + y^2; \quad y(3) = 8

6. Consider the initial value problem y' = 2\sqrt{y}; y(x_0) = y_0.
(a) Find two solutions, assuming that y_0 > 0.
(b) Explain why part (a) does not violate Theorem 1.2.

Theorem 1.2 can be proved using Picard iterates, which we will discuss briefly. Suppose f and \partial f/\partial y are continuous in a closed rectangle R having (x_0, y_0) in its interior and sides parallel to the axes. Consider the initial value problem y' = f(x, y); y(x_0) = y_0. For each positive integer n, define

y_n(x) = y_0 + \int_{x_0}^{x} f(t, y_{n-1}(t))\,dt

This is a recursive definition, giving y_1(x) in terms of y_0, then y_2(x) in terms of y_1(x), and so on. The functions y_n(x) for n = 1, 2, \ldots are called Picard iterates for the initial value problem. Under the assumptions made on f, the sequence y_n(x) converges for all x in some interval about x_0, and the limit of this sequence is the solution of the initial value problem on this interval.

In each of Problems 7 through 10, (a) use Theorem 1.2 to show that the problem has a solution in some interval about x_0, (b) find this solution, (c) compute Picard iterates y_1(x) through y_6(x), and from these guess y_n(x) in general, and (d) find the Taylor series of the solution from (b) about x_0. You should find that the iterates computed in (c) are partial sums of the series of (d). Conclude that in these examples the Picard iterates converge to the solution.

7. y' = 2 - y; \quad y(0) = 1
8. y' = 4 + y; \quad y(0) = 3
9. y' = 2x^2; \quad y(1) = 3
10. y' = \cos(x); \quad y(\pi) = 1
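The Picard recursion is easy to carry out numerically on a grid. The sketch below applies it to y′ = y, y(0) = 1 (an illustration, not one of the problems above); for this equation the iterates are exactly the partial sums of the series for eˣ:

```python
import math

def picard(f, x0, y0, x_end, n_iter=10, n_grid=200):
    # Picard iteration y_n(x) = y0 + integral_{x0}^{x} f(t, y_{n-1}(t)) dt,
    # approximated on a uniform grid with the trapezoid rule
    h = (x_end - x0) / n_grid
    xs = [x0 + k * h for k in range(n_grid + 1)]
    ys = [y0] * (n_grid + 1)          # y_0(x) = y0, the constant function
    for _ in range(n_iter):
        new = [y0]
        acc = 0.0
        for k in range(n_grid):
            acc += h * (f(xs[k], ys[k]) + f(xs[k + 1], ys[k + 1])) / 2
            new.append(y0 + acc)
        ys = new
    return xs, ys

xs, ys = picard(lambda x, y: y, 0.0, 1.0, 1.0)
print(ys[-1], math.e)   # the iterates converge toward e at x = 1
```

After ten iterations the grid value at x = 1 is already very close to e, mirroring the convergence claimed for the iterates.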
CHAPTER 2

REDUCTION OF ORDER · CONSTANT COEFFICIENT HOMOGENEOUS LINEAR EQUATION · EULER'S EQUATION · THE NONHOMOGENEOUS EQUATION y'' + p(x)y' + q(x)y = f(x) · APPLICATIONS OF SECOND-ORDER DIFFERENTIAL EQUATIONS

Second-Order Differential Equations
2.1
Preliminary Concepts

A second-order differential equation is an equation that contains a second derivative, but no higher derivative. Most generally, it has the form

F(x, y, y', y'') = 0

although only a term involving y'' need appear explicitly. For example,

y'' = x^3, \qquad xy'' - \cos(y) = e^x, \qquad \text{and} \qquad y'' - 4xy' + y = 2

are second-order differential equations.
A solution of F(x, y, y', y'') = 0 on an interval I (perhaps the whole real line) is a function \varphi that satisfies the differential equation at each point of I:

F(x, \varphi(x), \varphi'(x), \varphi''(x)) = 0 \quad \text{for } x \text{ in } I.

For example, \varphi(x) = 6\cos(4x) - 17\sin(4x) is a solution of y'' + 16y = 0 for all real x. And \psi(x) = x^3\cos(\ln(x)) is a solution of x^2y'' - 5xy' + 10y = 0 for x > 0. These can be checked by substitution into the differential equation.
The linear second-order differential equation has the form

R(x)y'' + P(x)y' + Q(x)y = S(x)
in which R, P, Q, and S are continuous in some interval. On any interval where R(x) \ne 0, we can divide this equation by R(x) and obtain the special linear equation

y'' + p(x)y' + q(x)y = f(x) \qquad (2.1)

For the remainder of this chapter, we will concentrate on this equation. We want to know:
1. What can we expect in the way of existence and uniqueness of solutions of equation (2.1)?
2. How can we produce all solutions of equation (2.1), at least in some cases that occur frequently and have important applications?
We begin with the underlying theory that will guide us in developing techniques for explicitly producing solutions of equation (2.1).
2.2
Theory of Solutions of y'' + p(x)y' + q(x)y = f(x)

To get some feeling for what we are dealing with, and what we should be looking for, consider the simple linear second-order equation

y'' - 12x = 0

We can write this as y'' = 12x and integrate to obtain

y' = \int y''(x)\,dx = \int 12x\,dx = 6x^2 + C

Integrate again:

y(x) = \int y'(x)\,dx = \int (6x^2 + C)\,dx = 2x^3 + Cx + K

This solution is defined for all x, and contains two arbitrary constants. If we recall that the general solution of a first-order equation contained one arbitrary constant, it seems natural that the solution of a second-order equation, involving two integrations, should contain two arbitrary constants. For any choices of C and K, we can graph the integral curves y = 2x^3 + Cx + K as curves in the plane. Figure 2.1 shows some of these curves for different choices of these constants.
Unlike the first-order case, there are many integral curves through each point in the plane. For example, suppose we want a solution satisfying the initial condition

y(0) = 3

Then we need y(0) = K = 3, but we are still free to choose C as any number. All solutions

y(x) = 2x^3 + Cx + 3

pass through (0, 3). Some of these curves are shown in Figure 2.2.
[FIGURE 2.1: Graphs of y = 2x^3 + Cx + K for various values of C and K.]

[FIGURE 2.2: Graphs of y = 2x^3 + Cx + 3 for various values of C.]
We single out exactly one of these curves if we specify its slope at (0, 3). Suppose, for example, we also specify the initial condition y'(0) = −1. Since y'(x) = 6x^2 + C, this requires that C = −1. There is exactly one solution satisfying both initial conditions (passing through a given point with a given slope), namely

    y(x) = 2x^3 − x + 3

A graph of this solution is given in Figure 2.3. To sum up, at least in this example, the general solution of the differential equation involved two arbitrary constants. An initial condition y(0) = 3, specifying that the solution curve must pass through (0, 3), determined one of these constants. However, that left infinitely many
[FIGURE 2.3: Graph of y = 2x^3 − x + 3.]
solution curves passing through (0, 3). The other initial condition, y'(0) = −1, picked out the solution curve through (0, 3) having slope −1, and gave a unique solution of this problem.

This suggests that we define the initial value problem for equation (2.1) to be the differential equation, defined on some interval, together with two initial conditions, one specifying a point lying on the solution curve and the other its slope at that point. This problem has the form

    y'' + p(x)y' + q(x)y = f(x);  y(x0) = A,  y'(x0) = B

in which A and B are given real numbers. The main theorem on existence and uniqueness of solutions for this problem is the second-order analogue of Theorem 1.3 in Chapter 1.

THEOREM 2.1

Let p, q, and f be continuous on an open interval I. Let x0 be in I and let A and B be any real numbers. Then the initial value problem

    y'' + p(x)y' + q(x)y = f(x);  y(x0) = A,  y'(x0) = B

has a unique solution defined for all x in I.

This gives us an idea of the kind of information needed to specify a unique solution of equation (2.1). Now we need a framework in which to proceed in finding solutions. We will provide this in two steps, beginning with the case in which f(x) is identically zero.
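Theorem 2.1 also underlies numerical solvers: writing the second-order equation as a first-order system in (y, y'), a point and a slope determine exactly one trajectory. A sketch (assuming SciPy and NumPy are available; the example equation is our own choice, y'' + y = 0 with y(0) = 0, y'(0) = 1, whose unique solution is y = sin(x)):

```python
# Numerical illustration of Theorem 2.1 (a sketch; assumes SciPy/NumPy).
# Rewrite y'' = f(x) - p(x)y' - q(x)y as a first-order system in (y, y').
import numpy as np
from scipy.integrate import solve_ivp

def system(x, state):
    y, yp = state                      # state = (y, y')
    # Here p = 0, q = 1, f = 0, i.e. y'' + y = 0.
    return [yp, -y]                    # returns [y', y'']

sol = solve_ivp(system, (0.0, np.pi / 2), [0.0, 1.0], rtol=1e-9, atol=1e-12)
print(sol.y[0, -1])   # approximately sin(pi/2) = 1
```

The two entries of the initial state vector are precisely the two initial conditions y(x0) = A and y'(x0) = B that the theorem requires.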
2.2.1  The Homogeneous Equation y'' + p(x)y' + q(x)y = 0

When f(x) is identically zero in equation (2.1), the resulting equation

    y'' + p(x)y' + q(x)y = 0    (2.2)

is called homogeneous. This term was used in a different context with first-order equations, and its use here is unrelated to that. Here it simply means that the right side of equation (2.1) is zero. A linear combination of solutions y1(x) and y2(x) of equation (2.2) is a sum of constant multiples of these functions: c1 y1(x) + c2 y2(x), with c1 and c2 real numbers. It is an important property of the homogeneous linear equation that linear combinations of solutions are again solutions.
THEOREM 2.2

Let y1 and y2 be solutions of y'' + p(x)y' + q(x)y = 0 on an interval I. Then any linear combination of these solutions is also a solution.

Proof  Let c1 and c2 be real numbers. Substituting y(x) = c1 y1(x) + c2 y2(x) into the differential equation, we obtain

    (c1 y1 + c2 y2)'' + p(x)(c1 y1 + c2 y2)' + q(x)(c1 y1 + c2 y2)
      = c1 y1'' + c2 y2'' + c1 p(x)y1' + c2 p(x)y2' + c1 q(x)y1 + c2 q(x)y2
      = c1 [y1'' + p(x)y1' + q(x)y1] + c2 [y2'' + p(x)y2' + q(x)y2]
      = 0 + 0 = 0

because of the assumption that y1 and y2 are both solutions.

Of course, as a special case (c2 = 0), this theorem tells us also that, for the homogeneous equation, a constant multiple of a solution is a solution. Even this special case of the theorem fails for a nonhomogeneous equation. For example, y1(x) = 4e^(2x)/5 is a solution of y'' + 2y' − 3y = 4e^(2x), but 5y1(x) = 4e^(2x) is not.

The point of taking linear combinations c1 y1 + c2 y2 is to obtain more solutions from just two solutions of equation (2.2). However, if y2 is already a constant multiple of y1, say y2 = k y1, then

    c1 y1 + c2 y2 = c1 y1 + c2 k y1 = (c1 + k c2) y1

is just another constant multiple of y1. In this event y2 is superfluous, providing us nothing we did not know from y1 alone. This leads us to distinguish the case in which one solution is a constant multiple of another from the case in which the two solutions are not multiples of each other.
DEFINITION 2.1  Linear Dependence, Independence

Two functions f and g are linearly dependent on an open interval I if, for some constant c, either f(x) = c g(x) for all x in I, or g(x) = c f(x) for all x in I. If f and g are not linearly dependent on I, then they are said to be linearly independent on the interval.

EXAMPLE 2.1

y1(x) = cos(x) and y2(x) = sin(x) are solutions of y'' + y = 0 over the real line. Neither of these functions is a constant multiple of the other. Indeed, if cos(x) = k sin(x) for all x, then in particular

    cos(π/4) = √2/2 = k sin(π/4) = k√2/2

so k must be 1. But then cos(x) = sin(x) for all x, a clear absurdity (for example, let x = 0). These solutions are linearly independent. Now we know from Theorem 2.2 that

    a cos(x) + b sin(x)
is a solution for any numbers a and b. Because cos(x) and sin(x) are linearly independent, this linear combination provides an infinity of new solutions, instead of just constant multiples of one we already know.

There is a simple test to tell whether two solutions of equation (2.2) are linearly independent on an interval. Define the Wronskian of solutions y1 and y2 to be

    W(x) = y1(x)y2'(x) − y1'(x)y2(x)

This is the 2 × 2 determinant

    W(x) = | y1(x)    y2(x)  |
           | y1'(x)   y2'(x) |

THEOREM 2.3  Wronskian Test

Let y1 and y2 be solutions of y'' + p(x)y' + q(x)y = 0 on an open interval I. Then:
1. Either W(x) = 0 for all x in I, or W(x) ≠ 0 for all x in I.
2. y1 and y2 are linearly independent on I if and only if W(x) ≠ 0 on I.

Conclusion (1) means that the Wronskian of two solutions cannot be nonzero at some points of I and zero at others. Either the Wronskian vanishes over the entire interval, or it is nonzero at every point of the interval. Conclusion (2) states that nonvanishing of the Wronskian is equivalent to linear independence of the solutions. Putting both conclusions together, it is enough to test W(x) at just one point of I to determine linear dependence or independence of these solutions. This gives us great latitude to choose a point at which the Wronskian is easy to evaluate.
EXAMPLE 2.2

In Example 2.1 we considered the solutions y1(x) = cos(x) and y2(x) = sin(x) of y'' + y = 0, for all x. In this case linear independence was obvious. The Wronskian of these solutions is

    W(x) = | cos(x)    sin(x) |  = cos^2(x) + sin^2(x) = 1 ≠ 0
           | −sin(x)   cos(x) |
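The Wronskian test is easy to automate. A sketch (assuming SymPy, whose `wronskian` helper builds exactly the determinant above; the dependent pair is our own example):

```python
# The Wronskian test by computer algebra (a sketch; assumes SymPy).
import sympy as sp

x = sp.symbols("x")

# W(x) for cos(x), sin(x): identically 1, never zero -- independent.
w_trig = sp.simplify(sp.wronskian([sp.cos(x), sp.sin(x)], x))
print(w_trig)  # 1

# W(x) for sin(x), 3*sin(x): identically 0 -- linearly dependent.
w_dep = sp.simplify(sp.wronskian([sp.sin(x), 3 * sp.sin(x)], x))
print(w_dep)  # 0
```

Both outcomes match Theorem 2.3: for solutions of a linear homogeneous equation the Wronskian is either never zero or identically zero.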
EXAMPLE 2.3

It is not always obvious whether two solutions are linearly independent or dependent on an interval. Consider the equation y'' + xy = 0. This equation appears simple but is not easy to solve. By a power series method we will develop later, we can write two solutions

    y1(x) = 1 − (1/6)x^3 + (1/180)x^6 − (1/12960)x^9 + · · ·

and

    y2(x) = x − (1/12)x^4 + (1/504)x^7 − (1/45360)x^10 + · · ·
with both series converging for all x. Here I is the entire real line. The Wronskian of these solutions at any nonzero x would be difficult to evaluate, but at x = 0 we easily obtain

    W(0) = y1(0)y2'(0) − y1'(0)y2(0) = (1)(1) − (0)(0) = 1

Nonvanishing of the Wronskian at this one point is enough to conclude linear independence of these solutions.

We are now ready to use the machinery we have built up to determine what is needed to find all solutions of y'' + p(x)y' + q(x)y = 0.

THEOREM 2.4
Let y1 and y2 be linearly independent solutions of y'' + p(x)y' + q(x)y = 0 on an open interval I. Then every solution of this differential equation on I is a linear combination of y1 and y2.

This fundamental theorem provides a strategy for finding all solutions of y'' + p(x)y' + q(x)y = 0 on I: find two linearly independent solutions. Depending on p and q, this may be difficult, but at least we have a specific goal. If necessary, use the Wronskian to test for independence. The general linear combination c1 y1 + c2 y2, with c1 and c2 arbitrary constants, then contains all possible solutions. We will prove the theorem following the introduction of some standard terminology.

DEFINITION 2.2

Let y1 and y2 be solutions of y'' + p(x)y' + q(x)y = 0 on an open interval I.
1. y1 and y2 form a fundamental set of solutions on I if y1 and y2 are linearly independent on I.
2. When y1 and y2 form a fundamental set of solutions, we call c1 y1 + c2 y2, with c1 and c2 arbitrary constants, the general solution of the differential equation on I.
In these terms, we find the general solution by finding a fundamental set of solutions. Here is a proof of Theorem 2.4.

Proof  Let φ be any solution of y'' + p(x)y' + q(x)y = 0 on I. We want to show that there must be numbers c1 and c2 such that

    φ(x) = c1 y1(x) + c2 y2(x)

Choose any x0 in I. Let φ(x0) = A and φ'(x0) = B. By Theorem 2.1, φ is the unique solution on I of the initial value problem

    y'' + p(x)y' + q(x)y = 0;  y(x0) = A,  y'(x0) = B

Now consider the system of two algebraic equations in the two unknowns c1 and c2:

    y1(x0)c1 + y2(x0)c2 = A
    y1'(x0)c1 + y2'(x0)c2 = B
It is routine to solve these algebraic equations. Since y1 and y2 are linearly independent, W(x0) ≠ 0, and we find that

    c1 = [A y2'(x0) − B y2(x0)] / W(x0),   c2 = [B y1(x0) − A y1'(x0)] / W(x0)

With this choice of c1 and c2, the function c1 y1 + c2 y2 is a solution of the initial value problem. By uniqueness of the solution of this problem, φ(x) = c1 y1(x) + c2 y2(x) on I, and the proof is complete.

The proof reinforces the importance of having a fundamental set of solutions, since the nonvanishing of the Wronskian plays a vital role in showing that an arbitrary solution must be a linear combination of the fundamental solutions.
2.2.2  The Nonhomogeneous Equation y'' + p(x)y' + q(x)y = f(x)

The ideas just developed for the homogeneous equation (2.2) also provide the key to solving the nonhomogeneous equation

    y'' + p(x)y' + q(x)y = f(x)    (2.3)
THEOREM 2.5

Let y1 and y2 be a fundamental set of solutions of y'' + p(x)y' + q(x)y = 0 on an open interval I. Let yp be any solution of equation (2.3) on I. Then, for any solution φ of equation (2.3), there exist numbers c1 and c2 such that

    φ = c1 y1 + c2 y2 + yp

This conclusion leads us to call c1 y1 + c2 y2 + yp the general solution of equation (2.3), and suggests the following strategy. To solve y'' + p(x)y' + q(x)y = f(x):
1. find the general solution c1 y1 + c2 y2 of the associated homogeneous equation y'' + p(x)y' + q(x)y = 0,
2. find any one solution yp of y'' + p(x)y' + q(x)y = f(x), and
3. write the general solution c1 y1 + c2 y2 + yp.
This expression contains all possible solutions of equation (2.3) on the interval. Again, depending on p, q, and f, the first two steps may be formidable. Nevertheless, the theorem tells us what to look for and provides a clear way to proceed. Here is a proof of the theorem.

Proof  Since φ and yp are both solutions of equation (2.3),

    (φ − yp)'' + p(φ − yp)' + q(φ − yp)
      = [φ'' + pφ' + qφ] − [yp'' + p yp' + q yp]
      = f − f = 0

Therefore φ − yp is a solution of y'' + py' + qy = 0. Since y1 and y2 form a fundamental set of solutions for this homogeneous equation, there are constants c1 and c2 such that

    φ − yp = c1 y1 + c2 y2

and this is what we wanted to show.
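The decomposition in Theorem 2.5 can be seen concretely on a small example of our own choosing, y'' + y = x (assuming SymPy is available):

```python
# Illustrating Theorem 2.5 on y'' + y = x (our example; assumes SymPy).
import sympy as sp

x = sp.symbols("x")
y = sp.Function("y")
eq = sp.Eq(y(x).diff(x, 2) + y(x), x)

# yp = x is one particular solution of the nonhomogeneous equation ...
yp = x
assert sp.simplify(sp.diff(yp, x, 2) + yp - x) == 0

# ... and the general solution is yh + yp, with yh = C1*cos(x) + C2*sin(x).
general = sp.dsolve(eq, y(x))
print(general.rhs)

# Subtracting yp from any solution leaves a solution of y'' + y = 0.
yh = general.rhs - yp
assert sp.simplify(sp.diff(yh, x, 2) + yh) == 0
```

This is exactly the strategy of the theorem: one homogeneous fundamental set, one particular solution, and their sum captures everything.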
The remainder of this chapter is devoted to techniques for carrying out the strategies just developed. For the general solution of the homogeneous equation (2.2) we must produce a fundamental set of solutions. And for the nonhomogeneous equation (2.3) we need to find one particular solution, together with a fundamental set of solutions of the associated homogeneous equation (2.2).
SECTION 2.2  PROBLEMS

In each of Problems 1 through 6, (a) verify that y1 and y2 are solutions of the differential equation, (b) show that their Wronskian is not zero, (c) write the general solution of the differential equation, and (d) find the solution of the initial value problem.

1. y'' − 4y = 0; y(0) = 1, y'(0) = 0; y1(x) = cosh(2x), y2(x) = sinh(2x)

2. y'' + 9y = 0; y(π/3) = 0, y'(π/3) = 1; y1(x) = cos(3x), y2(x) = sin(3x)

3. y'' + 11y' + 24y = 0; y(0) = 1, y'(0) = 4; y1(x) = e^(−3x), y2(x) = e^(−8x)

4. y'' + 2y' + 8y = 0; y(0) = √2, y'(0) = 0; y1(x) = e^(−x) cos(√7 x), y2(x) = e^(−x) sin(√7 x)

5. y'' − (7/x)y' + (16/x^2)y = 0; y(1) = 2, y'(1) = 4; y1(x) = x^4, y2(x) = x^4 ln(x)

6. y'' + (1/x)y' + (1 − 1/(4x^2))y = 0; y(π/2) = −5, y'(π/2) = 8; y1(x) = (1/√x) cos(x), y2(x) = (1/√x) sin(x)

7. Let y1(x) = x^2 and y2(x) = x^3. Show that W(x) = x^4 for all real x. Then W(0) = 0, but W(x) is not identically zero. Why does this not contradict Theorem 2.3(1), with the interval I chosen as the entire real line?

8. Show that y1(x) = x and y2(x) = x^2 are linearly independent solutions of x^2 y'' − 2xy' + 2y = 0 on (−1, 1), but that W(0) = 0. Why does this not contradict Theorem 2.3(1) on this interval?

9. Give an example to show that the product of two solutions of y'' + p(x)y' + q(x)y = 0 need not be a solution.

10. Show that y1(x) = 3e^(2x) − 1 and y2(x) = e^(−x) + 2 are solutions of yy'' + 2y' − (y')^2 = 0, but that neither 2y1 nor y1 + y2 is a solution. Why does this not contradict Theorem 2.2?

11. Suppose y1 and y2 are solutions of y'' + p(x)y' + q(x)y = 0 on (a, b), and that p and q are continuous on this interval. Suppose y1 and y2 both have a relative extremum at x0 in (a, b). Prove that y1 and y2 are linearly dependent on (a, b).

12. Let φ be a solution of y'' + p(x)y' + q(x)y = 0 on an open interval I, and suppose φ(x0) = 0 for some x0 in I. Suppose φ(x) is not identically zero. Prove that φ'(x0) ≠ 0.

13. Let y1 and y2 be distinct solutions of y'' + p(x)y' + q(x)y = 0 on an open interval I. Let x0 be in I and suppose y1(x0) = y2(x0) = 0. Prove that y1 and y2 are linearly dependent on I. Thus linearly independent solutions cannot share a common zero.

2.3
Reduction of Order

Given y'' + p(x)y' + q(x)y = 0, we want two independent solutions. Reduction of order is a technique for finding a second solution if we can somehow produce a first solution. Suppose we know a solution y1 that is not identically zero. We will look for a second solution of the form y2(x) = u(x)y1(x). Compute

    y2' = u'y1 + u y1'
    y2'' = u''y1 + 2u'y1' + u y1''

In order for y2 to be a solution, we need

    u''y1 + 2u'y1' + u y1'' + p(u'y1 + u y1') + q u y1 = 0
Rearrange terms to write this equation as

    u''y1 + u'(2y1' + p y1) + u(y1'' + p y1' + q y1) = 0

The coefficient of u is zero because y1 is a solution. Thus we need to choose u so that

    u''y1 + u'(2y1' + p y1) = 0

On any interval in which y1(x) ≠ 0, we can write

    u'' + [(2y1' + p y1)/y1] u' = 0

To help focus on the problem of determining u, denote

    g(x) = [2y1'(x) + p(x)y1(x)] / y1(x)

a known function, because y1(x) and p(x) are known. Then

    u'' + g(x)u' = 0

Let v = u' to get

    v' + g(x)v = 0

This is a linear first-order differential equation for v, with general solution

    v(x) = C e^(−∫g(x) dx)

Since we need only one second solution y2, we will take C = 1, so v(x) = e^(−∫g(x) dx). Finally, since v = u',

    u(x) = ∫ e^(−∫g(x) dx) dx

If we can perform these integrations and obtain u(x), then y2 = u y1 is a second solution of y'' + py' + qy = 0. Further,

    W(x) = y1 y2' − y1' y2 = y1 (u y1' + u' y1) − y1' u y1 = u' y1^2 = v y1^2

Since v(x) is an exponential function, v(x) ≠ 0. And the preceding derivation was carried out on an interval in which y1(x) ≠ 0. Thus W(x) ≠ 0, and y1 and y2 form a fundamental set of solutions on this interval. The general solution of y'' + py' + qy = 0 is c1 y1 + c2 y2.

We do not recommend memorizing formulas for g, v, and then u. Given one solution y1, substitute y2 = u y1 into the differential equation and, after the cancellations that occur because y1 is one solution, solve the resulting equation for u(x).
EXAMPLE 2.4

Suppose we are given that y1(x) = e^(−2x) is one solution of y'' + 4y' + 4y = 0. To find a second solution, let y2(x) = u(x)e^(−2x). Then

    y2' = u'e^(−2x) − 2u e^(−2x)

and

    y2'' = u''e^(−2x) − 4u'e^(−2x) + 4u e^(−2x)

Substitute these into the differential equation to get

    u''e^(−2x) − 4u'e^(−2x) + 4u e^(−2x) + 4[u'e^(−2x) − 2u e^(−2x)] + 4u e^(−2x) = 0

Some cancellations occur because e^(−2x) is one solution, leaving

    u''e^(−2x) = 0,  or  u'' = 0

Two integrations yield u(x) = cx + d. Since we need only one second solution y2, we need only one u, so we choose c = 1 and d = 0. This gives u(x) = x and

    y2(x) = x e^(−2x)

Now

    W(x) = | e^(−2x)      x e^(−2x)            |  = e^(−4x) ≠ 0
           | −2e^(−2x)    e^(−2x) − 2x e^(−2x) |

for all x. Therefore y1 and y2 form a fundamental set of solutions for all x, and the general solution of y'' + 4y' + 4y = 0 is

    y(x) = c1 e^(−2x) + c2 x e^(−2x)
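The cancellation in Example 2.4 can be confirmed symbolically: substituting y2 = u(x) e^(−2x) into the left side of the equation should leave only the u'' term. A sketch (assumes SymPy):

```python
# Reduction of order for y'' + 4y' + 4y = 0, as in Example 2.4
# (a sketch; assumes SymPy).
import sympy as sp

x = sp.symbols("x")
u = sp.Function("u")

y2 = u(x) * sp.exp(-2 * x)
residual = sp.expand(y2.diff(x, 2) + 4 * y2.diff(x) + 4 * y2)
# Every term involving u and u' cancels because e^(-2x) is a solution;
# only u'' e^(-2x) survives.
assert sp.simplify(residual - u(x).diff(x, 2) * sp.exp(-2 * x)) == 0
print(residual)
```

So the equation for u collapses to u'' = 0, exactly as in the worked example, and u = x yields the second solution x e^(−2x).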
EXAMPLE 2.5

Suppose we want the general solution of y'' − (3/x)y' + (4/x^2)y = 0 for x > 0, and somehow we find one solution y1(x) = x^2. Put y2(x) = x^2 u(x) and compute

    y2' = 2xu + x^2 u'

and

    y2'' = 2u + 4xu' + x^2 u''

Substitute into the differential equation to get

    2u + 4xu' + x^2 u'' − (3/x)(2xu + x^2 u') + (4/x^2)x^2 u = 0

Then

    x^2 u'' + x u' = 0

Since the interval of interest is x > 0, we can write this as

    x u'' + u' = 0

With v = u', this is

    x v' + v = (xv)' = 0

so xv = c. We will choose c = 1. Then

    v = u' = 1/x

so u = ln(x) + d, and we choose d = 0 because we need only one suitable u. Then y2(x) = x^2 ln(x) is a second solution. Further, for x > 0,

    W(x) = | x^2    x^2 ln(x)     |  = x^3 ≠ 0
           | 2x     2x ln(x) + x  |

Then x^2 and x^2 ln(x) form a fundamental set of solutions for x > 0. The general solution for x > 0 is

    y(x) = c1 x^2 + c2 x^2 ln(x)
SECTION 2.3  PROBLEMS

In each of Problems 1 through 10, verify that the given function is a solution of the differential equation, find a second solution by reduction of order, and finally write the general solution.

1. y'' + 4y = 0; y1(x) = cos(2x)

2. y'' − 9y = 0; y1(x) = e^(3x)

3. y'' − 10y' + 25y = 0; y1(x) = e^(5x)

4. x^2 y'' − 7xy' + 16y = 0; y1(x) = x^4, for x > 0

5. x^2 y'' − 3xy' + 4y = 0; y1(x) = x^2, for x > 0

6. (2x^2 + 1)y'' − 4xy' + 4y = 0; y1(x) = x, for x > 0

7. y'' − (1/x)y' − (8/x^2)y = 0; y1(x) = x^4, for x > 0

8. y'' − [2x/(1 + x^2)]y' + [2/(1 + x^2)]y = 0; y1(x) = x

9. y'' + (1/x)y' + (1 − 1/(4x^2))y = 0; y1(x) = (1/√x) cos(x), for x > 0

10. (2x^2 + 3x + 1)y'' + 2xy' − 2y = 0; y1(x) = x, on any interval not containing −1 or −1/2

11. Verify that, for any nonzero constant a, y1(x) = e^(−ax) is a solution of y'' + 2ay' + a^2 y = 0. Write the general solution.

12. A second-order equation F(x, y', y'') = 0 in which y is not explicitly present can sometimes be solved by putting u = y'. This results in a first-order equation G(x, u, u') = 0. If this can be solved for u(x), then y1(x) = ∫u(x) dx is a solution of the given second-order equation. Use this method to find one solution, then find a second solution, and finally the general solution of each of the following.
(a) xy'' = 2 + y'
(b) xy'' + 2y' = x
(c) 1 − y'' = 4y'
(d) y'' + (y')^2 = 0
(e) y'' = 1 + (y')^2

13. A second-order equation in which x does not explicitly appear can sometimes be solved by putting u = y' and thinking of y as the independent variable and u as a function of y. Write

    y'' = d(y')/dx = du/dx = (du/dy)(dy/dx) = u du/dy

to convert F(y, y', y'') = 0 into the first-order equation F(y, u, u du/dy) = 0. Solve this equation for u(y) and then set u = y' to solve for y as a function of x. Use this method to find a solution (perhaps implicitly defined) of each of the following.
(a) yy'' + 3(y')^2 = 0
(b) yy'' + (y + 1)(y')^2 = 0
(c) yy'' = y^2 y' + (y')^2
(d) y'' = 1 + (y')^2
(e) y'' + (y')^2 = 0

14. Consider y'' + Ay' + By = 0, in which A and B are constants and A^2 − 4B = 0. Show that y1(x) = e^(−Ax/2) is one solution, and use reduction of order to find the second solution y2(x) = x e^(−Ax/2).

15. Consider y'' + (A/x)y' + (B/x^2)y = 0 for x > 0, with A and B constants such that (A − 1)^2 − 4B = 0. Verify that y1(x) = x^((1−A)/2) is one solution, and use reduction of order to derive the second solution y2(x) = x^((1−A)/2) ln(x).
2.4  The Constant Coefficient Homogeneous Linear Equation

The linear homogeneous equation

    y'' + Ay' + By = 0    (2.4)

in which A and B are numbers, occurs frequently in important applications. There is a standard approach to solving this equation. The form of equation (2.4) requires that constant multiples of derivatives of y(x) must sum to zero. Since the derivative of an exponential function e^(λx) is a constant multiple of e^(λx), we will look for solutions y(x) = e^(λx). To see how to choose λ, substitute e^(λx) into equation (2.4) to get

    λ^2 e^(λx) + Aλe^(λx) + Be^(λx) = 0

This can be true only if

    λ^2 + Aλ + B = 0

This is called the characteristic equation of equation (2.4). Its roots are

    λ = [−A ± √(A^2 − 4B)] / 2

leading to three cases.
2.4.1  Case 1: A^2 − 4B > 0

In this case the characteristic equation has two real, distinct roots,

    a = [−A − √(A^2 − 4B)] / 2  and  b = [−A + √(A^2 − 4B)] / 2

yielding solutions y1(x) = e^(ax) and y2(x) = e^(bx) for equation (2.4). These form a fundamental set of solutions on the real line, since

    W(x) = e^(ax) · b e^(bx) − a e^(ax) · e^(bx) = (b − a)e^((a+b)x)

and this is nonzero because a ≠ b. The general solution in this case is

    y(x) = c1 e^(ax) + c2 e^(bx)

EXAMPLE 2.6

The characteristic equation of y'' − y' − 6y = 0 is λ^2 − λ − 6 = 0, with roots a = −2 and b = 3. The general solution is

    y = c1 e^(−2x) + c2 e^(3x)
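Finding the characteristic roots is ordinary polynomial algebra, so it mechanizes directly. A sketch for Example 2.6 (assumes SymPy; the symbol name `lam` is our own choice):

```python
# Solving the characteristic equation of Example 2.6 (assumes SymPy).
import sympy as sp

lam = sp.symbols("lam")
# y'' - y' - 6y = 0 has characteristic equation lam^2 - lam - 6 = 0.
roots = sp.solve(lam**2 - lam - 6, lam)
print(sorted(roots))  # [-2, 3]

# Distinct real roots, so the general solution is c1*e^(-2x) + c2*e^(3x).
```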
2.4.2  Case 2: A^2 − 4B = 0

Now the characteristic equation has the repeated root λ = −A/2, so y1(x) = e^(−Ax/2) is one solution. This method does not provide a second solution, but we have reduction of order for just such a circumstance. Try y2(x) = u(x)e^(−Ax/2) and substitute into the differential equation to get

    u''e^(−Ax/2) − Au'e^(−Ax/2) + (A^2/4)u e^(−Ax/2) + A[u'e^(−Ax/2) − (A/2)u e^(−Ax/2)] + Bu e^(−Ax/2) = 0

Divide by e^(−Ax/2) and rearrange terms to get

    u'' + (B − A^2/4)u = 0

Because in the current case A^2 − 4B = 0, this differential equation reduces to just u'' = 0, and we can choose u(x) = x. A second solution in this case is y2(x) = x e^(−Ax/2). Since y1 and y2 are linearly independent, they form a fundamental set, and the general solution is

    y(x) = c1 e^(−Ax/2) + c2 x e^(−Ax/2) = e^(−Ax/2)(c1 + c2 x)

EXAMPLE 2.7

The characteristic equation of y'' − 6y' + 9y = 0 is λ^2 − 6λ + 9 = 0, with repeated root λ = 3. The general solution is

    y(x) = e^(3x)(c1 + c2 x)
2.4.3  Case 3: A^2 − 4B < 0

Now the characteristic equation has complex roots

    λ = [−A ± i√(4B − A^2)] / 2

For convenience, write

    p = −A/2,  q = (1/2)√(4B − A^2)

so the roots of the characteristic equation are p ± iq. This yields two solutions

    y1(x) = e^((p+iq)x)  and  y2(x) = e^((p−iq)x)

These are linearly independent because their Wronskian is

    W(x) = e^((p+iq)x)(p − iq)e^((p−iq)x) − (p + iq)e^((p+iq)x)e^((p−iq)x)
         = (p − iq)e^(2px) − (p + iq)e^(2px) = −2iq e^(2px)

and this is nonzero in the current case, in which q ≠ 0. Therefore the general solution is

    y(x) = c1 e^((p+iq)x) + c2 e^((p−iq)x)    (2.5)
EXAMPLE 2.8

The characteristic equation of y'' + 2y' + 6y = 0 is λ^2 + 2λ + 6 = 0, with roots −1 ± i√5. The general solution is

    y(x) = c1 e^((−1+i√5)x) + c2 e^((−1−i√5)x)
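The three cases are decided entirely by the discriminant A^2 − 4B, so the classification can be packaged as a small helper. A sketch (our own function, not from the text; it returns the form of the general solution as a string):

```python
import math

def classify(A: float, B: float) -> str:
    """Classify y'' + Ay' + By = 0 by the discriminant A^2 - 4B and
    report the form of its general solution (a sketch, our own helper)."""
    disc = A * A - 4 * B
    if disc > 0:                       # Case 1: distinct real roots
        a = (-A - math.sqrt(disc)) / 2
        b = (-A + math.sqrt(disc)) / 2
        return f"c1*exp({a}*x) + c2*exp({b}*x)"
    if disc == 0:                      # Case 2: repeated root -A/2
        return f"exp({-A / 2}*x)*(c1 + c2*x)"
    p, q = -A / 2, math.sqrt(-disc) / 2   # Case 3: complex roots p +/- iq
    return f"exp({p}*x)*(c1*cos({q}*x) + c2*sin({q}*x))"

print(classify(-1, -6))   # Example 2.6: distinct real roots -2 and 3
print(classify(-6, 9))    # Example 2.7: repeated root 3
print(classify(2, 6))     # Example 2.8: complex roots -1 +/- i*sqrt(5)
```

The complex-root branch returns the real form of the solution, which the next subsection derives from Euler's formula.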
2.4.4  An Alternative General Solution in the Complex Root Case

When the characteristic equation has complex roots, we can write a general solution in terms of complex exponential functions. This is sometimes inconvenient, for example, in graphing the solutions. But recall that any two linearly independent solutions form a fundamental set. We will therefore show how to use the general solution (2.5) to find a fundamental set of real-valued solutions.

Begin by recalling the Maclaurin expansions of e^x, cos(x), and sin(x):

    e^x = Σ_{n=0}^∞ x^n/n! = 1 + x + x^2/2! + x^3/3! + x^4/4! + x^5/5! + · · ·

    cos(x) = Σ_{n=0}^∞ (−1)^n x^(2n)/(2n)! = 1 − x^2/2! + x^4/4! − x^6/6! + · · ·

and

    sin(x) = Σ_{n=0}^∞ (−1)^n x^(2n+1)/(2n+1)! = x − x^3/3! + x^5/5! − x^7/7! + · · ·
with each series convergent for all real x. The eighteenth-century Swiss mathematician Leonhard Euler experimented with replacing x with ix in the exponential series and noticed an interesting relationship between the series for e^x, cos(x), and sin(x). First,

    e^(ix) = Σ_{n=0}^∞ (ix)^n/n!
           = 1 + ix + (ix)^2/2! + (ix)^3/3! + (ix)^4/4! + (ix)^5/5! + (ix)^6/6! + · · ·

Now, integer powers of i repeat the values i, −1, −i, 1 with a period of four:

    i^2 = −1,  i^3 = −i,  i^4 = 1,  i^5 = i^4 i = i,  i^6 = i^4 i^2 = −1,  i^7 = i^4 i^3 = −i

and so on, continuing in cyclic fashion. Using this fact in the Maclaurin series for e^(ix), we obtain

    e^(ix) = 1 + ix − x^2/2! − ix^3/3! + x^4/4! + ix^5/5! − x^6/6! − · · ·
           = [1 − x^2/2! + x^4/4! − x^6/6! + · · ·] + i[x − x^3/3! + x^5/5! − · · ·]
           = cos(x) + i sin(x)    (2.6)
This is Euler's formula. In a different form, it was discovered a few years earlier by Newton's contemporary Roger Cotes (1682–1716). Cotes is not of the stature of Euler, but Newton's high opinion of him is reflected in Newton's remark, "If Cotes had lived, we would have known something." Since cos(−x) = cos(x) and sin(−x) = −sin(x), replacing x by −x in Euler's formula yields

    e^(−ix) = cos(x) − i sin(x)

Now return to the problem of solving y'' + Ay' + By = 0 when the characteristic equation has complex roots p ± iq. Since p and q are real numbers, we have

    e^((p+iq)x) = e^(px) e^(iqx) = e^(px)[cos(qx) + i sin(qx)] = e^(px) cos(qx) + i e^(px) sin(qx)
and

    e^((p−iq)x) = e^(px) e^(−iqx) = e^(px)[cos(qx) − i sin(qx)] = e^(px) cos(qx) − i e^(px) sin(qx)

The general solution (2.5) can therefore be written

    y(x) = c1 e^(px) cos(qx) + i c1 e^(px) sin(qx) + c2 e^(px) cos(qx) − i c2 e^(px) sin(qx)
         = (c1 + c2)e^(px) cos(qx) + (c1 − c2)i e^(px) sin(qx)

We obtain solutions for any numerical choices of c1 and c2. In particular, if we choose c1 = c2 = 1/2, we obtain the solution

    y3(x) = e^(px) cos(qx)

And if we put c1 = 1/(2i) and c2 = −1/(2i), we obtain still another solution

    y4(x) = e^(px) sin(qx)

Further, these last two solutions are linearly independent on the real line, since

    W(x) = e^(px) cos(qx)[p e^(px) sin(qx) + q e^(px) cos(qx)] − [p e^(px) cos(qx) − q e^(px) sin(qx)]e^(px) sin(qx)
         = e^(2px)[p sin(qx) cos(qx) + q cos^2(qx) − p sin(qx) cos(qx) + q sin^2(qx)]
         = q e^(2px) ≠ 0  for all real x

We can therefore, if we prefer, form a fundamental set of solutions using y3 and y4, writing the general solution of y'' + Ay' + By = 0 in this case as

    y(x) = e^(px)[c1 cos(qx) + c2 sin(qx)]

This is simply another way of writing the general solution of equation (2.4) in the complex root case.
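Both Euler's formula and the collapse of the complex combination to a real solution are easy to spot-check numerically with the standard library alone. A sketch, using the roots −1 ± i√5 of Example 2.8 and a test point of our own choosing:

```python
# Numerical check of Euler's formula and the real solution form
# (a sketch; uses only the Python standard library).
import cmath
import math

x = 0.7   # an arbitrary test point (our choice)
# Euler's formula: e^(ix) = cos(x) + i sin(x)
assert abs(cmath.exp(1j * x) - complex(math.cos(x), math.sin(x))) < 1e-12

# With c1 = c2 = 1/2, the complex combination collapses to e^(px) cos(qx):
p, q = -1.0, math.sqrt(5.0)        # roots -1 +/- i*sqrt(5) of Example 2.8
z = 0.5 * cmath.exp((p + 1j * q) * x) + 0.5 * cmath.exp((p - 1j * q) * x)
assert abs(z - math.exp(p * x) * math.cos(q * x)) < 1e-12
print("Euler's formula and the real solution form agree at x =", x)
```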
EXAMPLE 2.9

Revisiting the equation y'' + 2y' + 6y = 0 of Example 2.8, we can also write the general solution as

    y(x) = e^(−x)[c1 cos(√5 x) + c2 sin(√5 x)]

We now have the general solution of the constant coefficient linear homogeneous equation y'' + Ay' + By = 0 in all cases. As usual, we can solve an initial value problem by first finding the general solution of the differential equation, then solving for the constants to satisfy the initial conditions.
EXAMPLE 2.10

Solve the initial value problem

    y'' − 4y' + 53y = 0;  y(π) = −3,  y'(π) = 2

First solve the differential equation. The characteristic equation is

    λ^2 − 4λ + 53 = 0

with complex roots 2 ± 7i. The general solution is

    y(x) = c1 e^(2x) cos(7x) + c2 e^(2x) sin(7x)

Now

    y(π) = c1 e^(2π) cos(7π) + c2 e^(2π) sin(7π) = −c1 e^(2π) = −3

so c1 = 3e^(−2π). Thus far,

    y(x) = 3e^(−2π) e^(2x) cos(7x) + c2 e^(2x) sin(7x)

Compute

    y'(x) = 3e^(−2π)[2e^(2x) cos(7x) − 7e^(2x) sin(7x)] + 2c2 e^(2x) sin(7x) + 7c2 e^(2x) cos(7x)

Then

    y'(π) = 3e^(−2π)(2e^(2π))(−1) + 7c2 e^(2π)(−1) = 2

so c2 = −(8/7)e^(−2π). The solution of the initial value problem is

    y(x) = 3e^(−2π) e^(2x) cos(7x) − (8/7)e^(−2π) e^(2x) sin(7x)
         = e^(2(x−π))[3 cos(7x) − (8/7) sin(7x)]
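The closed form just obtained can be verified symbolically against both the equation and the initial conditions. A sketch (assumes SymPy):

```python
# Verifying the solution of Example 2.10 (a sketch; assumes SymPy).
import sympy as sp

x = sp.symbols("x")
y = sp.exp(2 * (x - sp.pi)) * (3 * sp.cos(7 * x) - sp.Rational(8, 7) * sp.sin(7 * x))

# It satisfies y'' - 4y' + 53y = 0 ...
assert sp.simplify(y.diff(x, 2) - 4 * y.diff(x) + 53 * y) == 0

# ... and both initial conditions at x = pi.
assert sp.simplify(y.subs(x, sp.pi) + 3) == 0           # y(pi) = -3
assert sp.simplify(y.diff(x).subs(x, sp.pi) - 2) == 0   # y'(pi) = 2
print("Example 2.10 checks out")
```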
SECTION 2.4  PROBLEMS

In each of Problems 1 through 12, find the general solution.

1. y'' − y' − 6y = 0
2. y'' − 2y' + 10y = 0
3. y'' + 6y' + 9y = 0
4. y'' − 3y' = 0
5. y'' + 10y' + 26y = 0
6. y'' + 6y' − 40y = 0
7. y'' + 3y' + 18y = 0
8. y'' + 16y' + 64y = 0
9. y'' − 14y' + 49y = 0
10. y'' − 6y' + 7y = 0
11. y'' + 4y' + 9y = 0
12. y'' + 5y' = 0

In each of Problems 13 through 21, solve the initial value problem.

13. y'' + 3y' = 0; y(0) = 3, y'(0) = 6
14. y'' + 2y' − 3y = 0; y(0) = 6, y'(0) = −2
15. y'' − 2y' + y = 0; y(1) = y'(1) = 0
16. y'' − 4y' + 4y = 0; y(0) = 3, y'(0) = 5
17. y'' + y' − 12y = 0; y(2) = 2, y'(2) = 1
18. y'' − 2y' − 5y = 0; y(0) = 0, y'(0) = 3
19. y'' − 2y' + y = 0; y(1) = 12, y'(1) = −5
20. y'' − 5y' + 12y = 0; y(2) = 0, y'(2) = −4
21. y'' − y' + 4y = 0; y(−2) = 1, y'(−2) = 3

22. This problem illustrates how small changes in the coefficients of a differential equation may cause dramatic changes in the solutions.
(a) Find the general solution φ(x) of y'' − 2ay' + a^2 y = 0, with a a nonzero constant.
(b) Find the general solution φ_ε(x) of y'' − 2ay' + (a^2 − ε^2)y = 0, in which ε is a positive constant.
(c) Show that, as ε → 0, the differential equation in (b) approaches in a limit sense the differential equation in (a), but the solution φ_ε(x) for (b) does not in general approach the solution φ(x) for (a).

23. (a) Find the solution φ(x) of the initial value problem

    y'' − 2ay' + a^2 y = 0;  y(0) = c, y'(0) = d

with a, c, and d constants and a ≠ 0.
(b) Find the solution φ_ε(x) of the initial value problem

    y'' − 2ay' + (a^2 − ε^2)y = 0;  y(0) = c, y'(0) = d

Here ε is any positive number.
(c) Is it true that lim_{ε→0} φ_ε(x) = φ(x)? How does this answer differ, if at all, from the conclusion in Problem 22(c)?

24. Suppose φ is a solution of

    y'' + Ay' + By = 0;  y(x0) = a, y'(x0) = b

Here A, B, a, and b are constants. Suppose A and B are positive. Prove that lim_{x→∞} φ(x) = 0.

2.5
Euler's Equation

In this section we will define another class of second-order differential equations for which there is an elementary technique for finding the general solution. The second-order homogeneous equation

    y'' + (A/x)y' + (B/x^2)y = 0    (2.7)

with A and B constant, is called Euler's equation. It is defined on the half-lines x > 0 and x < 0. We will assume for this section that x > 0.

We will solve Euler's equation by transforming it to a constant coefficient linear equation, which we can solve easily. Recall that any positive number x can be written as e^t for some t (namely, for t = ln(x)). Make the change of variables

    x = e^t,  or, equivalently,  t = ln(x)

and let

    Y(t) = y(e^t)

That is, in the function y(x), replace x by e^t, obtaining a new function of t. For example, if y(x) = x^3, then Y(t) = (e^t)^3 = e^(3t). Now compute chain-rule derivatives. First,

    y'(x) = (dY/dt)(dt/dx) = (1/x)Y'(t)

so

    Y'(t) = x y'(x)
2.5 Euler’s Equation Next, d d y x = y x = dx dx
79
1 Y t x
1 d 1 Y t Y t + 2 x x dx 1 1 dY dt = − 2 Y t + x x dt dx 1 1 1 = − 2 Y t + Y t x x x 1 = 2 Y t − Y t x =−
Therefore, x2 y x = Y t − Y t If we write Euler’s equation as x2 y x + Axy x + Byx = 0 then these substitutions yield Y t − Y t + AY t + BYt = 0 or Y + A − 1Y + BY = 0
(2.8)
This is a constant coefficient homogeneous linear differential equation for Yt. Solve this equation, then let t = lnx in the solution Yt to obtain yx satisfying the Euler equation. We need not repeat this derivation each time we want to solve an Euler equation, since the coefficients A − 1 and B for the transformed equation (2.8) are easily read from the Euler equation (2.7). In carrying out this strategy, it is useful to recall that, for x > 0, xr = er lnx
EXAMPLE 2.11

Find the general solution of x^2 y'' + 2xy' − 6y = 0.

Upon letting x = e^t, this differential equation transforms to

    Y'' + Y' − 6Y = 0

The coefficient of Y' is A − 1, with A = 2 in Euler's equation. The general solution of this linear homogeneous differential equation is

    Y(t) = c1 e^(−3t) + c2 e^(2t)

for all real t. Putting t = ln(x) with x > 0, we obtain

    y(x) = c1 e^(−3 ln(x)) + c2 e^(2 ln(x)) = c1 x^(−3) + c2 x^2

and this is the general solution of the Euler equation.
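The result of Example 2.11 can be checked by substituting the claimed general solution back into the Euler equation. A sketch (assumes SymPy; c1 and c2 are symbolic constants of our own naming):

```python
# Checking the general solution of Example 2.11 (a sketch; assumes SymPy).
import sympy as sp

x = sp.symbols("x", positive=True)   # Euler's equation on x > 0
c1, c2 = sp.symbols("c1 c2")
y = c1 * x**(-3) + c2 * x**2

# Substitute into x^2 y'' + 2x y' - 6y; the residual should vanish
# identically, for every choice of c1 and c2.
residual = sp.simplify(x**2 * y.diff(x, 2) + 2 * x * y.diff(x) - 6 * y)
print(residual)  # 0
```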
EXAMPLE 2.12

Consider the Euler equation x^2 y'' − 5xy' + 9y = 0. The transformed equation is

    Y'' − 6Y' + 9Y = 0

with general solution

    Y(t) = c1 e^(3t) + c2 t e^(3t)

Let t = ln(x) to obtain

    y(x) = c1 x^3 + c2 x^3 ln(x)

for x > 0. This is the general solution of the Euler equation.

EXAMPLE 2.13

Solve

    x^2 y'' + 3xy' + 10y = 0

This transforms to

    Y'' + 2Y' + 10Y = 0

with general solution

    Y(t) = c1 e^(−t) cos(3t) + c2 e^(−t) sin(3t)

Then

    y(x) = c1 x^(−1) cos(3 ln(x)) + c2 x^(−1) sin(3 ln(x))
         = (1/x)[c1 cos(3 ln(x)) + c2 sin(3 ln(x))]

for x > 0.

As usual, we can solve an initial value problem by finding the general solution of the differential equation, then solving for the constants to satisfy the initial conditions.
EXAMPLE 2.14

Solve the initial value problem

    x^2 y'' − 5xy' + 10y = 0;  y(1) = 4,  y'(1) = −6

We will first find the general solution of the Euler equation, then determine the constants to satisfy the initial conditions. With t = ln(x), we obtain

    Y'' − 6Y' + 10Y = 0

having general solution

    Y(t) = c1 e^(3t) cos(t) + c2 e^(3t) sin(t)

The general solution of the Euler equation is

    y(x) = c1 x^3 cos(ln(x)) + c2 x^3 sin(ln(x))

For the first initial condition, we need

    y(1) = 4 = c1

Thus far,

    y(x) = 4x^3 cos(ln(x)) + c2 x^3 sin(ln(x))

Then

    y'(x) = 12x^2 cos(ln(x)) − 4x^2 sin(ln(x)) + 3c2 x^2 sin(ln(x)) + c2 x^2 cos(ln(x))

so

    y'(1) = 12 + c2 = −6

Then c2 = −18, and the solution of the initial value problem is

    y(x) = 4x^3 cos(ln(x)) − 18x^3 sin(ln(x))
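As with Example 2.10, the answer to Example 2.14 can be verified symbolically against both the Euler equation and the initial conditions. A sketch (assumes SymPy):

```python
# Verifying the solution of Example 2.14 (a sketch; assumes SymPy).
import sympy as sp

x = sp.symbols("x", positive=True)
y = 4 * x**3 * sp.cos(sp.log(x)) - 18 * x**3 * sp.sin(sp.log(x))

# Euler equation x^2 y'' - 5x y' + 10y = 0:
assert sp.simplify(x**2 * y.diff(x, 2) - 5 * x * y.diff(x) + 10 * y) == 0

# Initial conditions at x = 1 (where ln(1) = 0):
assert y.subs(x, 1) == 4                            # y(1) = 4
assert sp.simplify(y.diff(x).subs(x, 1) + 6) == 0   # y'(1) = -6
print("Example 2.14 checks out")
```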
Observe the structure of the solutions of different kinds of differential equations. Solutions of the constant coefficient linear equation y'' + Ay' + By = 0 must have the forms e^(ax), x e^(ax), e^(px) cos(qx), or e^(px) sin(qx), depending on the coefficients. And solutions of an Euler equation x^2 y'' + Axy' + By = 0 must have the forms x^r, x^r ln(x), x^p cos(q ln(x)), or x^p sin(q ln(x)). For example, x^3 could never be a solution of the constant coefficient equation, and e^(−6x) could never be the solution of an Euler equation.
SECTION 2.5
PROBLEMS
In each of Problems 1 through 12, find the general solution.
1. x^2 y'' + 2xy' − 6y = 0
2. x^2 y'' + 3xy' + y = 0
3. x^2 y'' + xy' + 4y = 0
4. x^2 y'' + xy' − 4y = 0
5. x^2 y'' + xy' − 16y = 0
6. x^2 y'' + 3xy' + 10y = 0
7. x^2 y'' + 6xy' + 6y = 0
8. x^2 y'' − 5xy' + 58y = 0
9. x^2 y'' + 25xy' + 144y = 0
10. x^2 y'' − 11xy' + 35y = 0
11. x^2 y'' − 2xy' + 12y = 0
12. x^2 y'' + 4y = 0

In each of Problems 13 through 21, solve the initial value problem.
13. x^2 y'' + 5xy' + 20y = 0;  y(−1) = 3,  y'(−1) = 2  (Here the solution of Euler's equation for x < 0 is needed.)
14. x^2 y'' + 5xy' − 21y = 0;  y(2) = 1,  y'(2) = 0
15. x^2 y'' − xy' = 0;  y(2) = 5,  y'(2) = 8
16. x^2 y'' − 3xy' + 4y = 0;  y(1) = 4,  y'(1) = 5
17. x^2 y'' + 7xy' + 13y = 0;  y(−1) = 1,  y'(−1) = 3
18. x^2 y'' + xy' − y = 0;  y(2) = 1,  y'(2) = −3
19. x^2 y'' + 25xy' + 144y = 0;  y(1) = −4,  y'(1) = 0
20. x^2 y'' − 9xy' + 24y = 0;  y(1) = 1,  y'(1) = 10
21. x^2 y'' + xy' − 4y = 0;  y(1) = 7,  y'(1) = −3

22. Here is another approach to solving an Euler equation. For x > 0, substitute y = x^r and obtain values of r that make this a solution. Show how this leads in all cases to the same general solution as obtained by the transformation method.
2.6  The Nonhomogeneous Equation y'' + p(x)y' + q(x)y = f(x)

In view of Theorem 2.5, if we are able to find the general solution yh of the linear homogeneous equation y'' + p(x)y' + q(x)y = 0, then the general solution of the linear nonhomogeneous equation

y'' + p(x)y' + q(x)y = f(x)
(2.9)
is y = yh + yp , in which yp is any solution of equation (2.9). This section is devoted to two methods for finding such a particular solution yp .
2.6.1
The Method of Variation of Parameters
Suppose we can find a fundamental set of solutions y1 and y2 for the homogeneous equation. The general solution of this homogeneous equation has the form yh(x) = c1 y1(x) + c2 y2(x). The method of variation of parameters consists of attempting a particular solution of the nonhomogeneous equation by replacing the constants c1 and c2 with functions of x. Thus, attempt to find u(x) and v(x) so that

yp(x) = u(x)y1(x) + v(x)y2(x)

is a solution of equation (2.9). How should we choose u and v? First compute

yp' = uy1' + vy2' + u'y1 + v'y2

In order to simplify this expression, the first condition we will impose on u and v is that

u'y1 + v'y2 = 0
(2.10)
Now yp' = uy1' + vy2'. Next compute

yp'' = u'y1' + v'y2' + uy1'' + vy2''

Substitute these expressions for yp' and yp'' into equation (2.9):

u'y1' + v'y2' + uy1'' + vy2'' + p(x)(uy1' + vy2') + q(x)(uy1 + vy2) = f(x)

Rearrange terms in this equation to get

u[y1'' + p(x)y1' + q(x)y1] + v[y2'' + p(x)y2' + q(x)y2] + u'y1' + v'y2' = f(x)

The two terms in square brackets vanish because y1 and y2 are solutions of the homogeneous equation. This leaves

u'y1' + v'y2' = f(x)
(2.11)
Now solve equations (2.10) and (2.11) for u' and v' to get

u'(x) = −y2(x)f(x)/W(x)   and   v'(x) = y1(x)f(x)/W(x)
(2.12)
in which W is the Wronskian of y1 and y2 . If we can integrate these equations to determine u and v, then we have yp .
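Equations (2.10) through (2.12) translate directly into a short routine. The following sketch (using sympy; the helper name is ours, not the text's) builds yp from any fundamental pair y1, y2:

```python
import sympy as sp

x = sp.symbols('x')

def variation_of_parameters(y1, y2, f):
    """Particular solution u*y1 + v*y2 built from equations (2.10)-(2.12)."""
    W = sp.simplify(y1*sp.diff(y2, x) - y2*sp.diff(y1, x))  # Wronskian of y1, y2
    u = sp.integrate(-y2*f/W, x)   # u'(x) = -y2(x) f(x) / W(x)
    v = sp.integrate(y1*f/W, x)    # v'(x) =  y1(x) f(x) / W(x)
    return sp.simplify(u*y1 + v*y2)

# Quick check on y'' + y = x, with y1 = cos(x), y2 = sin(x) (W = 1):
yp = variation_of_parameters(sp.cos(x), sp.sin(x), x)
print(yp)  # x
```

The check recovers the obvious particular solution yp = x of y'' + y = x, which is reassuring before applying the routine to harder forcing functions.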
EXAMPLE 2.15
We will find the general solution of y'' + 4y = sec(x) for −π/4 < x < π/4. The characteristic equation of y'' + 4y = 0 is λ^2 + 4 = 0, with roots ±2i. We may therefore choose y1(x) = cos(2x) and y2(x) = sin(2x). The Wronskian of these solutions of the homogeneous equation is

W(x) = cos(2x)·(2 cos(2x)) − sin(2x)·(−2 sin(2x)) = 2

With f(x) = sec(x), equations (2.12) give us

u'(x) = −(1/2) sin(2x) sec(x) = −(1/2)(2 sin(x) cos(x))(1/cos(x)) = −sin(x)

and

v'(x) = (1/2) cos(2x) sec(x) = (1/2)(2 cos^2(x) − 1)(1/cos(x)) = cos(x) − (1/2) sec(x)

Then

u(x) = ∫ −sin(x) dx = cos(x)

and

v(x) = ∫ cos(x) dx − (1/2) ∫ sec(x) dx = sin(x) − (1/2) ln|sec(x) + tan(x)|

Here we have let the constants of integration be zero because we need only one u and one v. Now we have the particular solution

yp(x) = u(x)y1(x) + v(x)y2(x) = cos(x) cos(2x) + [sin(x) − (1/2) ln|sec(x) + tan(x)|] sin(2x)

The general solution of y'' + 4y = sec(x) is

y(x) = yh(x) + yp(x) = c1 cos(2x) + c2 sin(2x) + cos(x) cos(2x) + [sin(x) − (1/2) ln|sec(x) + tan(x)|] sin(2x)
EXAMPLE 2.16
Suppose we want the general solution of

y'' − (4/x)y' + (4/x^2)y = x^2 + 1

for x > 0. The associated homogeneous equation is

y'' − (4/x)y' + (4/x^2)y = 0

which we recognize as an Euler equation, with fundamental solutions y1(x) = x and y2(x) = x^4 for x > 0. The Wronskian of these solutions is

W(x) = x(4x^3) − x^4(1) = 3x^4

and this is nonzero for x > 0. From equations (2.12),

u'(x) = −x^4(x^2 + 1)/(3x^4) = −(1/3)(x^2 + 1)

and

v'(x) = x(x^2 + 1)/(3x^4) = (1/3)(1/x + 1/x^3)

Integrate to get

u(x) = −(1/9)x^3 − (1/3)x   and   v(x) = (1/3) ln(x) − 1/(6x^2)

A particular solution is

yp(x) = [−(1/9)x^3 − (1/3)x]x + [(1/3) ln(x) − 1/(6x^2)]x^4

The general solution is

y(x) = yh(x) + yp(x) = c1 x + c2 x^4 − (1/9)x^4 − (1/3)x^2 + (1/3)x^4 ln(x) − (1/6)x^2
     = c1 x + c2 x^4 − (1/9)x^4 − (1/2)x^2 + (1/3)x^4 ln(x)

for x > 0.
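The particular solution just found can be checked by direct substitution. A quick sympy sketch (not part of the text):

```python
import sympy as sp

x = sp.symbols('x', positive=True)
yp = (-x**3/9 - x/3)*x + (sp.ln(x)/3 - 1/(6*x**2))*x**4

# Residual of y'' - (4/x)y' + (4/x^2)y - (x^2 + 1); it should vanish identically
residual = sp.expand(sp.diff(yp, x, 2) - 4*sp.diff(yp, x)/x + 4*yp/x**2 - (x**2 + 1))
print(residual)  # 0
```

A zero residual confirms that yp really does solve the nonhomogeneous equation on x > 0.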
2.6.2  The Method of Undetermined Coefficients
Here is a second method for finding a particular solution yp, but it applies only if p(x) and q(x) are constant. Thus consider

y'' + Ay' + By = f(x)

Sometimes we can guess the general form of a solution yp from the form of f(x). For example, suppose f(x) is a polynomial. Since derivatives of polynomials are polynomials, we might try a polynomial for yp(x). Substitute a polynomial with unknown coefficients into the differential equation, and then choose the coefficients to match y'' + Ay' + By with f(x). Or suppose f(x) is an exponential function, say f(x) = e^(−2x). Since derivatives of e^(−2x) are just constant multiples of e^(−2x), we would attempt a solution of the form yp = Ce^(−2x), substitute into the differential equation, and solve for C to match the left and right sides of the differential equation. Here are some examples of this method.
EXAMPLE 2.17
Solve y'' − 4y = 8x^2 − 2x. Since f(x) = 8x^2 − 2x is a polynomial of degree 2, we will attempt a solution

yp(x) = ax^2 + bx + c

We do not need to try a higher-degree polynomial, since the degree of y'' − 4y must be 2. If, for example, we included an x^3 term in yp, then yp'' − 4yp would have an x^3 term, and we know that it does not. Compute

yp' = 2ax + b   and   yp'' = 2a

and substitute into the differential equation to get

2a − 4(ax^2 + bx + c) = 8x^2 − 2x

Collect coefficients of like powers of x to write

(−4a − 8)x^2 + (−4b + 2)x + (2a − 4c) = 0

For yp to be a solution for all x, the polynomial on the left must be zero for all x. But a second-degree polynomial can have only two roots, unless it is the zero polynomial. Thus all the coefficients must vanish, and we have the equations

−4a − 8 = 0,   −4b + 2 = 0,   2a − 4c = 0

Solve these to obtain a = −2, b = 1/2, and c = −1. Thus a solution is

yp(x) = −2x^2 + (1/2)x − 1

as can be verified by substitution into the differential equation.
If we want the general solution of the differential equation, we need the general solution yh of y'' − 4y = 0. This is

yh(x) = c1 e^(2x) + c2 e^(−2x)

The general solution of y'' − 4y = 8x^2 − 2x is

y(x) = c1 e^(2x) + c2 e^(−2x) − 2x^2 + (1/2)x − 1

The method we have just illustrated is called the method of undetermined coefficients, because the idea is to guess a general form for yp and then solve for the coefficients to make a solution. Here are two more examples, after which we will point out a circumstance in which we must supplement the method.
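Matching coefficients, as in Example 2.17, is just a small linear solve, which can be automated. A sympy sketch (not part of the text):

```python
import sympy as sp

x, a, b, c = sp.symbols('x a b c')
yp = a*x**2 + b*x + c

# Substitute yp into y'' - 4y and subtract the forcing term 8x^2 - 2x
residual = sp.expand(sp.diff(yp, x, 2) - 4*yp - (8*x**2 - 2*x))

# Each power of x in the residual must vanish, giving a linear system
eqs = [residual.coeff(x, n) for n in (0, 1, 2)]
sol = sp.solve(eqs, (a, b, c))
print(sol)  # a = -2, b = 1/2, c = -1
```

The solver reproduces the coefficients found by hand above.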
EXAMPLE 2.18
Solve y'' + 2y' − 3y = 4e^(2x). Because f(x) is a constant times an exponential, and the derivative of such a function is always a constant times the same function, we attempt yp = ae^(2x). Then yp' = 2ae^(2x) and yp'' = 4ae^(2x). Substitute into the differential equation to get

4ae^(2x) + 4ae^(2x) − 3ae^(2x) = 4e^(2x)

Then 5ae^(2x) = 4e^(2x), so choose a = 4/5 to get the solution

yp(x) = (4/5)e^(2x)

Again, if we wish we can write the general solution

y(x) = c1 e^(−3x) + c2 e^x + (4/5)e^(2x)
EXAMPLE 2.19
Solve y'' − 5y' + 6y = −3 sin(2x). Here f(x) = −3 sin(2x). Now we must be careful, because derivatives of sin(2x) can be multiples of sin(2x) or cos(2x), depending on how many times we differentiate. This leads us to include both possibilities in a proposed solution:

yp(x) = c cos(2x) + d sin(2x)

Compute

yp' = −2c sin(2x) + 2d cos(2x)   and   yp'' = −4c cos(2x) − 4d sin(2x)

Substitute into the differential equation to get

−4c cos(2x) − 4d sin(2x) − 5[−2c sin(2x) + 2d cos(2x)] + 6[c cos(2x) + d sin(2x)] = −3 sin(2x)

Collecting the sine terms on one side and the cosine terms on the other:

(2d + 10c + 3) sin(2x) = (10d − 2c) cos(2x)

For yp to be a solution for all real x, this equation must hold for all x. But sin(2x) and cos(2x) are linearly independent (they are solutions of y'' + 4y = 0, and their Wronskian is nonzero). Therefore neither can be a constant multiple of the other. The only way the last equation can hold for all x is for the coefficient to be zero on both sides:

2d + 10c = −3   and   10d − 2c = 0

Then d = −3/52 and c = −15/52, and we have found a solution:

yp(x) = −(15/52) cos(2x) − (3/52) sin(2x)

The general solution of this differential equation is

y(x) = c1 e^(3x) + c2 e^(2x) − (15/52) cos(2x) − (3/52) sin(2x)
As effective as this method is, there is a difficulty that is intrinsic to the method. It can be successfully met, but one must be aware of it and know how to proceed. Consider the following example.
EXAMPLE 2.20
Solve y'' + 2y' − 3y = 8e^x. The coefficients on the left side are constant, and f(x) = 8e^x seems simple enough, so we proceed with yp(x) = ce^x. Substitute into the differential equation to get

ce^x + 2ce^x − 3ce^x = 8e^x,   or   0 = 8e^x

Something is wrong. What happened? The problem in this example is that e^x is a solution of y'' + 2y' − 3y = 0, so if we substitute ce^x into y'' + 2y' − 3y = 8e^x, the left side will equal zero, not 8e^x. This difficulty will occur whenever the proposed yp contains a term that is a solution of the homogeneous equation y'' + Ay' + By = 0, because then this term (which may be all of the proposed yp) will vanish when substituted into y'' + Ay' + By. There is a way out of this difficulty. If a term of the proposed yp is a solution of y'' + Ay' + By = 0, multiply the proposed solution by x and try the modified function as yp. If this also contains a term, or by itself, satisfies y'' + Ay' + By = 0, then multiply by x again to
try x^2 times the original proposed solution. This is as far as we will have to go in the case of second-order differential equations. Now continue Example 2.20 with this strategy.
EXAMPLE 2.21
Consider again y'' + 2y' − 3y = 8e^x. We saw that yp = ce^x does not work, because e^x, and hence also ce^x, satisfies y'' + 2y' − 3y = 0. Try yp = cxe^x. Compute

yp' = ce^x + cxe^x   and   yp'' = 2ce^x + cxe^x

and substitute into the differential equation to get

2ce^x + cxe^x + 2(ce^x + cxe^x) − 3cxe^x = 8e^x

Some terms cancel and we are left with 4ce^x = 8e^x. Choose c = 2 to obtain the particular solution yp(x) = 2xe^x.
EXAMPLE 2.22
Solve y'' − 6y' + 9y = 5e^(3x). Our first impulse is to try yp = ce^(3x). But this is a solution of y'' − 6y' + 9y = 0. If we try yp = cxe^(3x), we also obtain an equation that cannot be solved for c. The reason is that the characteristic equation of y'' − 6y' + 9y = 0 is (λ − 3)^2 = 0, with repeated root 3. This means that e^(3x) and xe^(3x) are both solutions of the homogeneous equation y'' − 6y' + 9y = 0. Thus try yp(x) = cx^2 e^(3x). Compute

yp' = 2cxe^(3x) + 3cx^2 e^(3x)   and   yp'' = 2ce^(3x) + 12cxe^(3x) + 9cx^2 e^(3x)

Substitute into the differential equation to get

2ce^(3x) + 12cxe^(3x) + 9cx^2 e^(3x) − 6[2cxe^(3x) + 3cx^2 e^(3x)] + 9cx^2 e^(3x) = 5e^(3x)

After cancellations we have 2ce^(3x) = 5e^(3x), so c = 5/2. We have found a particular solution yp(x) = 5x^2 e^(3x)/2.

The last two examples suggest that in applying undetermined coefficients to y'' + Ay' + By = f(x), we should first obtain the general solution of y'' + Ay' + By = 0. We need this anyway for a general solution of the nonhomogeneous equation, but it also tells us whether to multiply our first choice for yp by x or x^2 before proceeding. Here is a summary of the method of undetermined coefficients.
1. From f(x), make a first conjecture for the form of yp.
2. Solve y'' + Ay' + By = 0. If a solution of this equation appears in any term of the conjectured form for yp, modify this form by multiplying it by x. If this modified function still occurs in a solution of y'' + Ay' + By = 0, multiply by x again (so the original yp is multiplied by x^2 in this case).
3. Substitute the final proposed yp into y'' + Ay' + By = f(x) and solve for its coefficients.
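The multiply-by-x rule of step 2 can be seen directly in Example 2.22. This sympy sketch (not part of the text) shows why the first two guesses fail and why the x^2 modification succeeds:

```python
import sympy as sp

x, c = sp.symbols('x c')
L = lambda y: sp.diff(y, x, 2) - 6*sp.diff(y, x) + 9*y  # the operator y'' - 6y' + 9y

# e^(3x) and x e^(3x) are both annihilated, so guesses c e^(3x) and c x e^(3x) fail
print(sp.expand(L(sp.exp(3*x))))     # 0
print(sp.expand(L(x*sp.exp(3*x))))   # 0

# The x^2 modification survives: L[c x^2 e^(3x)] = 2 c e^(3x)
lhs = sp.expand(L(c*x**2*sp.exp(3*x)))
print(sp.solve(sp.Eq(lhs, 5*sp.exp(3*x)), c))  # [5/2]
```

The solver recovers c = 5/2, the coefficient found by hand.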
Here is a list of functions to try in the initial stage (1) of formulating yp. In this list P(x) indicates a given polynomial of degree n, and Q(x) and R(x) polynomials with undetermined coefficients, of degree n.

f(x)                                          Initial Guess for yp
P(x)                                          Q(x)
c e^(ax)                                      d e^(ax)
c cos(bx) or c sin(bx)                        c cos(bx) + d sin(bx)
P(x) e^(ax)                                   Q(x) e^(ax)
P(x) cos(bx) or P(x) sin(bx)                  Q(x) cos(bx) + R(x) sin(bx)
P(x) e^(ax) cos(bx) or P(x) e^(ax) sin(bx)    Q(x) e^(ax) cos(bx) + R(x) e^(ax) sin(bx)
EXAMPLE 2.23
Solve y'' + 9y = −4x sin(3x). With f(x) = −4x sin(3x), the preceding list suggests that we attempt a particular solution of the form

yp(x) = (ax + b) cos(3x) + (cx + d) sin(3x)

Now solve y'' + 9y = 0 to obtain the fundamental set of solutions cos(3x) and sin(3x). The proposed yp includes terms b cos(3x) and d sin(3x), which are also solutions of y'' + 9y = 0. Therefore, modify the proposed yp by multiplying it by x, trying instead

yp(x) = (ax^2 + bx) cos(3x) + (cx^2 + dx) sin(3x)

Compute

yp' = (2ax + b) cos(3x) − (3ax^2 + 3bx) sin(3x) + (2cx + d) sin(3x) + (3cx^2 + 3dx) cos(3x)

and

yp'' = 2a cos(3x) − 2(6ax + 3b) sin(3x) − (9ax^2 + 9bx) cos(3x) + 2c sin(3x) + 2(6cx + 3d) cos(3x) − (9cx^2 + 9dx) sin(3x)

Substitute these into the differential equation to obtain

2a cos(3x) − 2(6ax + 3b) sin(3x) − (9ax^2 + 9bx) cos(3x) + 2c sin(3x) + 2(6cx + 3d) cos(3x) − (9cx^2 + 9dx) sin(3x) + (9ax^2 + 9bx) cos(3x) + (9cx^2 + 9dx) sin(3x) = −4x sin(3x)

Now collect coefficients of "like" terms (sin(3x), x sin(3x), x^2 sin(3x), and so on). We get

(2a + 6d) cos(3x) + (−6b + 2c) sin(3x) + 12cx cos(3x) + (−12a + 4)x sin(3x) = 0
with all other terms canceling. For this linear combination of cos(3x), sin(3x), x cos(3x), and x sin(3x) to be zero for all x, each coefficient must be zero. Therefore

2a + 6d = 0,   −6b + 2c = 0,   12c = 0,   −12a + 4 = 0

Then a = 1/3, c = 0, b = 0, and d = −1/9. We have found the particular solution

yp(x) = (1/3)x^2 cos(3x) − (1/9)x sin(3x)

The general solution is

y(x) = c1 cos(3x) + c2 sin(3x) + (1/3)x^2 cos(3x) − (1/9)x sin(3x)

Sometimes a differential equation has nonconstant coefficients but transforms to a constant coefficient equation. We may then be able to use the method of undetermined coefficients on the transformed equation and then use the results to obtain solutions of the original equation.
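As a check on Example 2.23, substituting the particular solution back in should leave no residual. A sympy sketch (not part of the text):

```python
import sympy as sp

x = sp.symbols('x')
yp = x**2*sp.cos(3*x)/3 - x*sp.sin(3*x)/9

# Residual of y'' + 9y - (-4x sin(3x)); it should vanish identically
residual = sp.expand(sp.diff(yp, x, 2) + 9*yp + 4*x*sp.sin(3*x))
print(residual)  # 0
```

The zero residual confirms the coefficients a = 1/3 and d = −1/9 found above.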
EXAMPLE 2.24
Solve x^2 y'' − 5xy' + 8y = 2 ln(x). The method of undetermined coefficients does not apply directly, since the differential equation has nonconstant coefficients. However, from our experience with the Euler equation, apply the transformation t = ln(x) and let Y(t) = y(e^t). Using results from Section 2.5, the differential equation transforms to

Y''(t) − 6Y'(t) + 8Y(t) = 2t

which has constant coefficients on the left side. The homogeneous equation Y'' − 6Y' + 8Y = 0 has general solution

Yh(t) = c1 e^(2t) + c2 e^(4t)

and, by the method of undetermined coefficients, we find one solution of Y'' − 6Y' + 8Y = 2t to be

Yp(t) = (1/4)t + 3/16

The general solution for Y is

Y(t) = c1 e^(2t) + c2 e^(4t) + (1/4)t + 3/16

Since t = ln(x), the original differential equation for y has general solution

y(x) = c1 e^(2 ln(x)) + c2 e^(4 ln(x)) + (1/4) ln(x) + 3/16 = c1 x^2 + c2 x^4 + (1/4) ln(x) + 3/16
2.6.3  The Principle of Superposition
Consider the equation

y'' + p(x)y' + q(x)y = f1(x) + f2(x) + · · · + fN(x)
(2.13)
Suppose ypj is a solution of y'' + p(x)y' + q(x)y = fj(x). We claim that yp1 + yp2 + · · · + ypN is a solution of equation (2.13). This is easy to check by direct substitution into the differential equation:

(yp1 + · · · + ypN)'' + p(x)(yp1 + · · · + ypN)' + q(x)(yp1 + · · · + ypN)
  = [yp1'' + p(x)yp1' + q(x)yp1] + · · · + [ypN'' + p(x)ypN' + q(x)ypN]
  = f1(x) + f2(x) + · · · + fN(x)

This means that we can solve each equation y'' + p(x)y' + q(x)y = fj(x) individually, and the sum of these solutions is a solution of equation (2.13). This is called the principle of superposition, and it sometimes enables us to solve a problem by breaking it into a sum of "smaller" problems that are easier to handle individually.
EXAMPLE 2.25
Solve y'' + 4y = x + 2e^(−2x). Consider two problems:

Problem 1: y'' + 4y = x, and
Problem 2: y'' + 4y = 2e^(−2x).

Using undetermined coefficients, we find that a solution of Problem 1 is yp1(x) = x/4, and that a solution of Problem 2 is yp2(x) = e^(−2x)/4. Therefore

yp(x) = (1/4)x + (1/4)e^(−2x)

is a solution of y'' + 4y = x + 2e^(−2x). The general solution of this differential equation is

y(x) = c1 cos(2x) + c2 sin(2x) + (1/4)x + (1/4)e^(−2x)
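The superposition check for Example 2.25 is mechanical. A sympy sketch (not part of the text):

```python
import sympy as sp

x = sp.symbols('x')
Lop = lambda y: sp.diff(y, x, 2) + 4*y   # the operator y'' + 4y

yp1 = x/4             # solves y'' + 4y = x
yp2 = sp.exp(-2*x)/4  # solves y'' + 4y = 2e^(-2x)

print(sp.expand(Lop(yp1) - x))                            # 0
print(sp.expand(Lop(yp2) - 2*sp.exp(-2*x)))               # 0
print(sp.expand(Lop(yp1 + yp2) - (x + 2*sp.exp(-2*x))))   # 0
```

Each piece solves its own "smaller" problem, and the sum solves the original equation, exactly as the principle of superposition asserts.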
2.6.4
Higher-Order Differential Equations
The methods we now have for solving y'' + p(x)y' + q(x)y = f(x) under certain conditions can also be applied to higher-order differential equations, at least in theory. However, there are practical difficulties to this approach. Consider the following example.
EXAMPLE 2.26
Solve

d^6y/dx^6 − 4 d^4y/dx^4 + 2 dy/dx + 15y = 0

If we take a cue from the second-order case, we attempt solutions y = e^(λx). Upon substituting this into the differential equation, we obtain an equation for λ:

λ^6 − 4λ^4 + 2λ + 15 = 0

In the second-order case, the characteristic polynomial is always of degree 2 and easily solved. Here we encounter a sixth-degree polynomial whose roots are not obvious. They are, approximately,

−1.685798616 ± 0.2107428331i,   −0.04747911354 ± 1.279046854i,   and   1.733277730 ± 0.4099384482i
When the order of the differential equation is n > 2, having to find the roots of an nth degree polynomial is enough of a barrier to make this approach impractical except in special cases. A better approach is to convert this sixth-order equation to a system of first-order equations as follows. Define new variables

z1 = y,   z2 = y',   z3 = y'',   z4 = d^3y/dx^3,   z5 = d^4y/dx^4,   z6 = d^5y/dx^5

Now we have a system of six first-order differential equations:

z1' = z2
z2' = z3
z3' = z4
z4' = z5
z5' = z6
z6' = 4z5 − 2z2 − 15z1

The last equation in this system is exactly the original differential equation, stated in terms of the new quantities zj. The point of reformulating the problem in this way is that powerful matrix techniques can be invoked to find solutions. We therefore put off discussion of differential equations of order higher than 2 until we have developed the matrix machinery needed to exploit this approach.
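The conversion to a first-order system can be carried out numerically. This numpy sketch (not part of the text) builds the companion matrix of the system above and confirms that its eigenvalues are exactly the characteristic roots:

```python
import numpy as np

# Characteristic polynomial  lam^6 - 4 lam^4 + 2 lam + 15
char_roots = np.roots([1, 0, -4, 0, 0, 2, 15])

# Companion form z' = A z with z = (y, y', y'', y''', y^(4), y^(5))
A = np.zeros((6, 6))
A[:5, 1:] = np.eye(5)            # z1' = z2, ..., z5' = z6
A[5] = [-15, -2, 0, 0, 4, 0]     # z6' = 4 z5 - 2 z2 - 15 z1

eig = np.linalg.eigvals(A)
# The eigenvalues of A match the six characteristic roots
print(np.allclose(np.sort_complex(eig), np.sort_complex(char_roots)))  # True
```

This is the sense in which "matrix machinery" replaces root-finding for a high-degree characteristic polynomial: the eigenvalue problem for A encodes the same information.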
SECTION 2.6
PROBLEMS
In each of Problems 1 through 6, find the general solution using the method of variation of parameters.
1. y'' + y = tan(x)
2. y'' − 4y' + 3y = 2 cos(x) + 3
3. y'' + 9y = 12 sec(3x)
4. y'' − 2y' − 3y = 2 sin^2(x)
5. y'' − 3y' + 2y = cos(e^(−x))
6. y'' − 5y' + 6y = 8 sin^2(4x)

In each of Problems 7 through 16, find the general solution using the method of undetermined coefficients.
7. y'' − y' − 2y = 2x^2 + 5
8. y'' − y' − 6y = 8e^(2x)
9. y'' − 2y' + 10y = 20x^2 + 2x − 8
10. y'' − 4y' + 5y = 21e^(2x)
11. y'' − 6y' + 8y = 3e^x
12. y'' + 6y' + 9y = 9 cos(3x)
13. y'' − 3y' + 2y = 10 sin(x)
14. y'' − 4y = 8x^2 + 2e^(3x)
15. y'' − 4y' + 13y = 3e^(2x) − 5e^(3x)
16. y'' − 2y' + y = 3x + 25 sin(3x)

In each of Problems 17 through 26, find the general solution of the differential equation, using any method.
17. y'' − y' − 2y = e^(2x)
18. x^2 y'' + 5xy' − 12y = ln(x)
19. y'' + y' − 6y = x
20. y'' − y' − 12y = 2 sinh^2(x)
21. x^2 y'' − 5xy' + 8y = 3x^4
22. x^2 y'' + 3xy' + y = x
23. x^2 y'' + xy' + 4y = sin(2 ln(x))
24. x^2 y'' + 2xy' − 6y = x^2 − 2
25. y'' − 4y' + 4y = e^(3x) − 1
26. y'' − y' − 2y = x

In each of Problems 27 through 42, solve the initial value problem.
27. y'' − 4y = −7e^(2x) + x;  y(0) = 1,  y'(0) = 3
28. y'' + 4y = 8 + 34 cos(x);  y(0) = 3,  y'(0) = 2
29. y'' + 8y' + 12y = e^(−x) + 7;  y(0) = 1,  y'(0) = 0
30. y'' − 3y = 2e^(2x) sin(x);  y(0) = 1,  y'(0) = 2
31. y'' − 2y' − 8y = 10e^(−x) + 8e^(2x);  y(0) = 1,  y'(0) = 4
32. y'' − 6y' + 9y = 4e^(3x);  y(0) = 1,  y'(0) = 2
33. y'' − 5y' + 6y = cos(2x);  y(0) = 0,  y'(0) = 4
34. y'' − y' + y = 1;  y(1) = 4,  y'(1) = −2
35. y'' − 8y' + 2y = e^(−x);  y(−1) = 5,  y'(−1) = 2
36. y'' + 6y' + 9y = −cos(x);  y(0) = 1,  y'(0) = −6
37. y'' − y = 5 sin^2(x);  y(0) = 2,  y'(0) = −4
38. y'' + y = tan(x);  y(0) = 4,  y'(0) = 3
39. x^2 y'' − 6y = 8x^2;  y(1) = 1,  y'(1) = 0
40. x^2 y'' + 7xy' + 9y = 27 ln(x);  y(1) = 1,  y'(1) = −4
41. x^2 y'' − 2xy' + 2y = 10 sin(ln(x));  y(1) = 3,  y'(1) = 0
42. x^2 y'' − 4xy' + 6y = x^4 e^x;  y(2) = 2,  y'(2) = 7
2.7  Application of Second-Order Differential Equations to a Mechanical System

Envision a spring of natural (unstretched) length L and spring constant k. This constant quantifies the "stiffness" of the spring. The spring is suspended vertically. An object of mass m is attached at the lower end, stretching the spring d units past its rest length. The object comes to rest in its equilibrium position. It is then displaced vertically a distance y0 units (up or down) and released, possibly with an initial velocity (Figure 2.4). We want to construct a mathematical model allowing us to analyze the motion of the object. Let y(t) be the displacement of the object from the equilibrium position at time t. As a convenience, take this equilibrium position to be y = 0. Choose down as the positive direction. Both of these choices are arbitrary.
FIGURE 2.4  Mass/spring system: (a) unstretched; (b) static equilibrium; (c) system in motion.
Now consider the forces acting on the object. Gravity pulls it downward with a force of magnitude mg. By Hooke's law, the force the spring exerts on the object has magnitude ky. At the equilibrium position, the force of the spring is −kd, negative because it acts upward. If the object is pulled downward a distance y from this position, an additional force −ky is exerted on it. Thus, the total force on the object due to the spring is −kd − ky. The total force due to gravity and the spring is mg − kd − ky. Since at the equilibrium point (y = 0) this force is zero, mg = kd. The net force acting on the object due to gravity and the spring is therefore just −ky.

Finally, there are forces tending to retard or damp out the motion. These include air resistance or viscosity of the medium if the object is suspended in some fluid such as oil. A standard assumption, arising from experiment, is that the retarding forces have magnitude proportional to the velocity y'. Thus, for some constant c called the damping constant, the retarding forces have magnitude cy'. The total force acting on the object due to gravity, damping, and the spring itself is therefore −ky − cy'. Finally, there may be a driving force of magnitude f(t) on the object. Now the total external force acting on the object has magnitude

F = −ky − cy' + f(t)

Assuming that the mass is constant, Newton's second law of motion enables us to write

my'' = −ky − cy' + f(t)

or

y'' + (c/m)y' + (k/m)y = (1/m)f(t)
(2.14)
This is the spring equation. We will analyze the motion described by solutions of this equation, under various conditions.
2.7.1
Unforced Motion
Suppose first that f(t) = 0, so there is no driving force. Now the spring equation is

y'' + (c/m)y' + (k/m)y = 0

with characteristic equation

λ^2 + (c/m)λ + k/m = 0

This has roots

λ = −c/(2m) ± (1/(2m))√(c^2 − 4km)
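These roots, and the damping regime they imply, are easy to compute. A small sketch (the function name is ours, not the text's):

```python
import cmath

def spring_roots(m, c, k):
    """Roots of m lam^2 + c lam + k = 0 for the unforced spring equation."""
    s = cmath.sqrt(c*c - 4*k*m)
    return (-c + s)/(2*m), (-c - s)/(2*m)

print(spring_roots(1, 6, 5))   # real and distinct roots: overdamped
print(spring_roots(1, 2, 1))   # repeated real root: critical damping
print(spring_roots(1, 2, 2))   # complex conjugate pair: underdamped
```

The three sample calls correspond to the three cases analyzed next (and to Examples 2.27 through 2.29).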
As we might expect, the general solution, hence the motion of the object, will depend on its mass, the amount of damping, and the stiffness of the spring. Consider the following cases.

Case 1: c^2 − 4km > 0. In this event, the characteristic equation has two real, distinct roots:

λ1 = −c/(2m) + (1/(2m))√(c^2 − 4km)   and   λ2 = −c/(2m) − (1/(2m))√(c^2 − 4km)

The general solution of equation (2.14) in this case is

y(t) = c1 e^(λ1 t) + c2 e^(λ2 t)

Clearly λ2 < 0. Since m and k are positive, c^2 − 4km < c^2, so √(c^2 − 4km) < c and λ1 is negative also. Therefore

lim(t→∞) y(t) = 0

regardless of initial conditions. In the case c^2 − 4km > 0, the motion of the object decays to zero as time increases. This case is called overdamping, and it occurs when the square of the damping constant exceeds four times the product of the mass and spring constant.
EXAMPLE 2.27 Overdamping
Suppose c = 6, k = 5, and m = 1. Now the general solution is

y(t) = c1 e^(−t) + c2 e^(−5t)

Suppose, to be specific, the object was initially (at t = 0) drawn upward 4 units from the equilibrium position and released downward with a speed of 2 units per second. Then y(0) = −4 and y'(0) = 2, and we obtain

y(t) = (1/2)e^(−t)(−9 + e^(−4t))

A graph of this solution is shown in Figure 2.5. What does the solution tell us about the motion? Since −9 + e^(−4t) < 0 for t > 0, y(t) < 0 and the object always remains above the equilibrium point. Its velocity y'(t) = (1/2)e^(−t)(9 − 5e^(−4t)) decreases to zero as t increases, and y(t) → 0 as t increases, so the object moves downward
FIGURE 2.5  An example of overdamped motion, no driving force.
FIGURE 2.6  An example of critically damped motion, no driving force.
toward equilibrium with ever-decreasing velocity, approaching closer to but never reaching the equilibrium point, and never coming to rest.

Case 2: c^2 − 4km = 0. Now the general solution of the spring equation (2.14) is

y(t) = (c1 + c2 t)e^(−ct/2m)

This case is called critical damping. While y(t) → 0 as t → ∞, as in the overdamping case, we will see an important difference between critical damping and overdamping.
EXAMPLE 2.28
Let c = 2 and k = m = 1. Now y(t) = (c1 + c2 t)e^(−t). Suppose the object is initially pulled up four units above the equilibrium position and then pushed downward with a speed of 5 units per second. Then y(0) = −4 and y'(0) = 5, so

y(t) = (−4 + t)e^(−t)

Observe that y(4) = 0, so, unlike what we saw with overdamping, the object actually reaches the equilibrium position, four seconds after it was released, and then passes through it. In fact, y(t) reaches its maximum when t = 5 seconds, and this maximum value is y(5) = e^(−5), about 0.007 unit below the equilibrium point. The velocity y'(t) = (5 − t)e^(−t) is negative for t > 5, so the object's velocity decreases after this 5-second point. Since y(t) → 0 as t → ∞, the object moves with decreasing velocity back toward the equilibrium point as time increases. Figure 2.6 shows a graph of the displacement function in this case.

In general, when critical damping occurs, the object either passes through the equilibrium point exactly once, as just seen, or never reaches it at all, depending on the initial conditions.

Case 3: c^2 − 4km < 0. Now the spring constant and mass together are sufficiently large that c^2 < 4km, and the damping is less dominant. This case is called underdamping. The general solution now is

y(t) = e^(−ct/2m)[c1 cos(ωt) + c2 sin(ωt)]

in which

ω = (1/(2m))√(4km − c^2)
Because c and m are positive, y(t) → 0 as t → ∞. However, now the motion is oscillatory because of the sine and cosine terms in the solution. The motion is not, however, periodic, because of the exponential factor, which causes the amplitude of the oscillations to decay to zero as time increases.
EXAMPLE 2.29
Suppose c = k = 2 and m = 1. Now the general solution is

y(t) = e^(−t)[c1 cos(t) + c2 sin(t)]

Suppose the object is driven downward from a point three units above equilibrium, with an initial speed of two units per second. Then y(0) = −3 and y'(0) = 2, and the solution is

y(t) = −e^(−t)[3 cos(t) + sin(t)]

The behavior of this solution is more easily visualized if we write it in phase angle form. We want to choose C and δ so that

3 cos(t) + sin(t) = C cos(t + δ)

For this, we need

3 cos(t) + sin(t) = C cos(t) cos(δ) − C sin(t) sin(δ)

so

C cos(δ) = 3   and   C sin(δ) = −1

Then

C sin(δ)/(C cos(δ)) = tan(δ) = −1/3

so

δ = tan^(−1)(−1/3) = −tan^(−1)(1/3)

To solve for C, write

C^2 cos^2(δ) + C^2 sin^2(δ) = C^2 = 3^2 + 1^2 = 10

so C = √10. Now we can write the solution as

y(t) = −√10 e^(−t) cos(t − tan^(−1)(1/3))

The graph is therefore a cosine curve with decaying amplitude, squashed between the graphs of y = √10 e^(−t) and y = −√10 e^(−t). The solution is shown in Figure 2.7, with these two exponential functions shown as reference curves. Because of the oscillatory cosine term, the object passes back and forth through the equilibrium point. In fact, it passes through equilibrium exactly when y(t) = 0, or

t = tan^(−1)(1/3) + (2n + 1)π/2

for n = 0, 1, 2, 3, .... In theory, the object oscillates through the equilibrium infinitely often in this underdamping case, although the amplitudes of the oscillations decrease to zero as time increases.
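The phase-angle computation generalizes: C = √(a^2 + b^2), and the angle comes from a two-argument arctangent. A quick numeric check (not part of the text) of the identity used in Example 2.29:

```python
import math

# Phase-angle form: 3 cos(t) + sin(t) = C cos(t + phi),
# where C cos(phi) = 3 and C sin(phi) = -1
C = math.hypot(3, 1)      # sqrt(10)
phi = math.atan2(-1, 3)   # -arctan(1/3)

for t in (0.0, 0.7, 2.5):
    lhs = 3*math.cos(t) + math.sin(t)
    rhs = C*math.cos(t + phi)
    print(abs(lhs - rhs) < 1e-12)  # True
```

Using atan2 rather than atan avoids quadrant errors when the cosine coefficient is negative.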
FIGURE 2.7  An example of underdamped motion, no driving force, with the reference curves y = √10 e^(−t) and y = −√10 e^(−t).
2.7.2
Forced Motion
Now suppose an external driving force of magnitude f(t) acts on the object. Of course, different forces will cause different kinds of motion. As an illustration, we will analyze the motion under the influence of a periodic driving force f(t) = A cos(ωt), with A and ω positive constants. Now the spring equation is

y'' + (c/m)y' + (k/m)y = (A/m) cos(ωt)
(2.15)
We know how to solve this nonhomogeneous linear equation. Begin by finding a particular solution, using the method of undetermined coefficients. Attempt a solution

yp(t) = a cos(ωt) + b sin(ωt)

Substitution of this into equation (2.15) and rearrangement of terms yields

(−aω^2 + bωc/m + ak/m − A/m) cos(ωt) = (bω^2 + aωc/m − bk/m) sin(ωt)

Since sin(ωt) and cos(ωt) are not constant multiples of each other, the only way this can be true for all t ≥ 0 is for the coefficient on each side of the equation to be zero. Therefore

−aω^2 + bωc/m + ak/m − A/m = 0   and   bω^2 + aωc/m − bk/m = 0

Solve these for a and b, keeping in mind that A, c, k, and m are given. We get

a = A(k − mω^2)/[(k − mω^2)^2 + ω^2 c^2]   and   b = Aωc/[(k − mω^2)^2 + ω^2 c^2]

Let ω0 = √(k/m). Then a particular solution of equation (2.15), for this forcing function, is given by

yp(t) = [mA(ω0^2 − ω^2) cos(ωt) + Aωc sin(ωt)] / [m^2(ω0^2 − ω^2)^2 + ω^2 c^2]    (2.16)

assuming that c ≠ 0 or ω ≠ ω0.
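Formula (2.16) can be sanity-checked by substitution for sample parameter values (the numbers below are illustrative, not from the text). A sympy sketch:

```python
import sympy as sp

t = sp.symbols('t')
m, c, k, A, w = 1, 2, 5, 3, 2          # sample values, not from the text
w0sq = sp.Rational(k, m)               # w0^2 = k/m
denom = m**2*(w0sq - w**2)**2 + w**2*c**2

# Particular solution from equation (2.16)
yp = (m*A*(w0sq - w**2)*sp.cos(w*t) + A*w*c*sp.sin(w*t))/denom

# Residual of y'' + (c/m)y' + (k/m)y - (A/m)cos(wt); should vanish identically
residual = sp.expand(sp.diff(yp, t, 2) + sp.Rational(c, m)*sp.diff(yp, t)
                     + w0sq*yp - sp.Rational(A, m)*sp.cos(w*t))
print(residual)  # 0
```

The zero residual confirms the algebra leading to (2.16) for this nonresonant parameter choice.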
We will now examine some specific cases to get some insight into the motion with this forcing function.

Overdamped Forced Motion  Suppose c = 6, k = 5, and m = 1, as we had previously in the overdamping case. Suppose also that A = 6√5 and ω = √5. If the object is released from rest from the equilibrium position, then the displacement function satisfies the initial value problem

y'' + 6y' + 5y = 6√5 cos(√5 t),   y(0) = y'(0) = 0

This problem has the unique solution

y(t) = (√5/4)(−e^(−t) + e^(−5t)) + sin(√5 t)

a graph of which is shown in Figure 2.8. As time increases, the exponential terms decrease to zero, exerting less influence on the motion, while the sine term oscillates. Thus, as t increases, the solution tends to behave more like sin(√5 t), and the object moves up and down through the equilibrium point, with approximate period 2π/√5. Contrast this with the overdamped motion with no forcing function, in which the object began above the equilibrium point and moved with decreasing velocity down toward it, but never reached it.

Critically Damped Forced Motion  Let c = 2 and m = k = 1. Suppose ω = 1 and A = 2. Assume that the object is released from rest from the equilibrium position. Now the initial value problem for the position function is

y'' + 2y' + y = 2 cos(t),   y(0) = y'(0) = 0

with solution

y(t) = −te^(−t) + sin(t)

A graph of this solution is shown in Figure 2.9. The exponential term exerts a significant influence at first, but decreases to zero as time increases. The term −te^(−t) decreases to zero as t increases, but not as quickly as the corresponding term (√5/4)(−e^(−t) + e^(−5t)) in the overdamping case. Nevertheless, after a while the motion settles into nearly (but not exactly, because −te^(−t) is never actually zero for positive t) a sinusoidal motion back and forth through the equilibrium point. This is an example of critically damped forced motion.
FIGURE 2.8  An example of overdamped motion driven by 6√5 cos(√5 t).
FIGURE 2.9  An example of critically damped motion driven by 2 cos(t).
FIGURE 2.10  An example of underdamped motion driven by 2√2 cos(√2 t).
Underdamped Forced Motion  Suppose now that c = k = 2, m = 1, ω = √2, and A = 2√2. Now c² − 4km < 0, and we have underdamped motion, but this time with a forcing function. If the object is released from rest from the equilibrium position, then the initial value problem for the displacement function is
y'' + 2y' + 2y = 2√2 cos(√2 t);  y(0) = y'(0) = 0,
with solution
y(t) = −√2 e^(−t) sin(t) + sin(√2 t).
Unlike the other two cases, the exponential term in this solution has a sin(t) factor. Figure 2.10 shows a graph of this function. As time increases, the term −√2 e^(−t) sin(t) becomes less influential and the motion settles nearly into an oscillation back and forth through the equilibrium point, with period nearly 2π/√2.
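Again the solution can be spot-checked numerically; here a central-difference sketch in Python (the step size h and the sample grid are arbitrary choices) estimates y' and y'' and confirms that the residual of the forced equation is small:

```python
import math

SQRT2 = math.sqrt(2)

def y(t):
    # Claimed solution: y = -sqrt(2) e^(-t) sin t + sin(sqrt(2) t)
    return -SQRT2 * math.exp(-t) * math.sin(t) + math.sin(SQRT2 * t)

def residual(t, h=1e-5):
    # Central-difference estimates of y' and y''
    d1 = (y(t + h) - y(t - h)) / (2 * h)
    d2 = (y(t + h) - 2 * y(t) + y(t - h)) / h ** 2
    return d2 + 2 * d1 + 2 * y(t) - 2 * SQRT2 * math.cos(SQRT2 * t)

max_res = max(abs(residual(0.05 + 0.1 * k)) for k in range(100))
```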
2.7.3
Resonance
In the absence of damping, an interesting phenomenon called resonance can occur. Suppose c = 0 but that there is still a periodic driving force f(t) = A cos(ωt). Now the spring equation is
y'' + (k/m) y = (A/m) cos(ωt).
From equation (2.16) with c = 0, this equation has general solution
y(t) = c₁ cos(ω₀ t) + c₂ sin(ω₀ t) + (A/(m(ω₀² − ω²))) cos(ωt),   (2.17)
in which ω₀ = √(k/m). This number is called the natural frequency of the spring system, and is a function of the stiffness of the spring and mass of the object, while ω is the input frequency and is contained in the driving force. This general solution assumes that the natural and input frequencies are different. Of course, the closer we choose the natural and input frequencies, the larger the amplitude of the cos(ωt) term in the solution.
Consider the case that the natural and input frequencies are the same. Now the differential equation is
y'' + (k/m) y = (A/m) cos(ω₀ t)   (2.18)
and the function given by equation (2.17) is not a solution. To solve equation (2.18), first write the general solution y_h of y'' + (k/m) y = 0:
y_h(t) = c₁ cos(ω₀ t) + c₂ sin(ω₀ t).
2.7 Application of Second-Order Differential Equations to a Mechanical System
101
For a particular solution of equation (2.18), we will proceed by the method of undetermined coefficients. Since the forcing function contains a term found in y_h, we will attempt a particular solution of the form
y_p(t) = a t cos(ω₀ t) + b t sin(ω₀ t).
Substitute this into equation (2.18) to obtain
−2aω₀ sin(ω₀ t) + 2bω₀ cos(ω₀ t) = (A/m) cos(ω₀ t).
Thus choose
a = 0  and  2bω₀ = A/m,
leading to the particular solution
y_p(t) = (A/(2mω₀)) t sin(ω₀ t).
The general solution of equation (2.18) is therefore
y(t) = c₁ cos(ω₀ t) + c₂ sin(ω₀ t) + (A/(2mω₀)) t sin(ω₀ t).
This solution differs from that in the case ω ≠ ω₀ in the factor of t in y_p(t). Because of this factor, solutions increase in amplitude as t increases. This phenomenon is called resonance. As a specific example, let c₁ = c₂ = ω₀ = 1 and A/(2m) = 1 to write the solution as
y(t) = cos(t) + sin(t) + t sin(t).
A graph of this function is shown in Figure 2.11, clearly revealing the increasing magnitude of the oscillations with time.
While there is always some damping in the real world, if the damping constant is close to zero compared to other factors, such as the mass, and if the natural and input frequencies are (nearly) equal, then oscillations can build up to a sufficiently large amplitude to cause resonance-like behavior and damage a system. This can occur with soldiers marching in step across a bridge. If the cadence of the march (input frequency) is near enough to the natural frequency of the material of the bridge, vibrations can build up to dangerous levels. This occurred near Manchester, England, in 1831 when a column of soldiers marching across the Broughton Bridge caused it to collapse. More recently, the Tacoma Narrows Bridge in Washington experienced increasing oscillations driven by energy from the wind, causing it to whip about in sensational
FIGURE 2.11 Resonance.
fashion before its collapse into the river. Videos of the wild thrashing about of the bridge are available in some libraries and engineering and science departments.
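Returning to the specific resonance solution y(t) = cos(t) + sin(t) + t sin(t), the buildup of amplitude is easy to confirm numerically; in this Python sketch the window length (one period) and sampling density are arbitrary choices:

```python
import math

def y(t):
    # Resonant solution from the text, with c1 = c2 = omega0 = 1 and A/(2m) = 1
    return math.cos(t) + math.sin(t) + t * math.sin(t)

def window_peak(k, samples=2000):
    # Largest |y| over the k-th window of length one period (2 pi)
    start = 2 * math.pi * k
    return max(abs(y(start + 2 * math.pi * j / samples)) for j in range(samples))

peaks = [window_peak(k) for k in range(6)]
growing = all(peaks[j] < peaks[j + 1] for j in range(5))
```

The peak displacement increases from one period to the next, which is exactly the resonance behavior visible in Figure 2.11.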
2.7.4
Beats
In the absence of damping, an oscillatory driving force can also cause a phenomenon called beats. Suppose ω ≠ ω₀ and consider
y'' + ω₀² y = (A/m) cos(ωt).
The Tacoma Narrows Bridge was completed in 1940 and stood as a new standard of combined artistry and functionality. The bridge soon became known for its tendency to sway in high winds, but no one suspected what was about to occur. On November 7, 1940, energy provided by unusually strong winds, coupled with a resonating effect in the bridge's material and design, caused the oscillations in the bridge to be reinforced and build to dangerous levels. Soon, the twisting caused one side of the sidewalk to rise 28 feet above that of the other side. Concrete dropped out of the roadway, and a section of the suspension span completely rotated and fell away. Shortly thereafter, the entire center span collapsed into Puget Sound. This sensational construction failure motivated new mathematical treatments of vibration and wave phenomena in the design of bridges and other large structures. The forces that brought down this bridge are a more complicated version of the resonance phenomenon discussed in Section 2.7.3.
FIGURE 2.12 Beats.
Assuming that the object is released from rest at the equilibrium position, then y(0) = y'(0) = 0 and from equation (2.17) we have the solution
y(t) = (A/(m(ω₀² − ω²))) (cos(ωt) − cos(ω₀ t)).
The behavior of this solution reveals itself more clearly if we write it as
y(t) = (2A/(m(ω₀² − ω²))) sin((ω₀ + ω)t/2) sin((ω₀ − ω)t/2).
This formulation reveals a periodic variation of amplitude in the solution, depending on the relative sizes of ω₀ + ω and ω₀ − ω. It is this periodic variation of amplitude that is called a beat. As a specific example, suppose ω₀ + ω = 5 and ω₀ − ω = 1/2, and the constants are chosen so that 2A/(m(ω₀² − ω²)) = 1. In this case, the displacement function is
y(t) = sin(5t/2) sin(t/4).
The beats are apparent in the graph of this solution in Figure 2.12.
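The product form is just the sum-to-product identity cos A − cos B = 2 sin((A + B)/2) sin((B − A)/2); a quick Python check (the particular frequencies are the ones used in the example, and the sample grid is arbitrary) confirms the identity and the resulting beat solution:

```python
import math

omega0, omega = 2.75, 2.25   # chosen so omega0 + omega = 5 and omega0 - omega = 1/2

def difference_form(t):
    return math.cos(omega * t) - math.cos(omega0 * t)

def product_form(t):
    # Sum-to-product rewriting used in the text
    return 2 * math.sin(0.5 * (omega0 + omega) * t) * math.sin(0.5 * (omega0 - omega) * t)

max_gap = max(abs(difference_form(0.01 * k) - product_form(0.01 * k)) for k in range(5000))

def y(t):
    # Displacement with the constants scaled so 2A/(m(omega0^2 - omega^2)) = 1
    return math.sin(2.5 * t) * math.sin(0.25 * t)

scale_gap = max(abs(y(0.01 * k) - 0.5 * difference_form(0.01 * k)) for k in range(5000))
```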
2.7.5
Analogy with an Electrical Circuit
If a circuit contains a resistance R, inductance L, and capacitance C, and the electromotive force is E(t), then the impressed voltage is obtained as a sum of the voltage drops in the circuit:
E(t) = L i'(t) + R i(t) + (1/C) q(t).
Here i(t) is the current at time t, and q(t) is the charge. Since i = q', we can write the second-order linear differential equation
q'' + (R/L) q' + (1/(LC)) q = (1/L) E.
If R, L, and C are constant, this is a linear equation of the type we have solved for various choices of E(t). It is interesting to observe that this equation is of exactly the same form as the equation for the displacement of an object attached to a spring, which is
y'' + (c/m) y' + (k/m) y = (1/m) f(t).
This means that solutions of one equation readily translate into solutions of the other and suggests the following equivalences between electrical and mechanical quantities:
displacement function y(t) ⇐⇒ charge q(t)
velocity y'(t) ⇐⇒ current i(t)
driving force f(t) ⇐⇒ electromotive force E(t)
mass m ⇐⇒ inductance L
damping constant c ⇐⇒ resistance R
spring modulus k ⇐⇒ reciprocal 1/C of the capacitance
EXAMPLE 2.30
Consider the circuit of Figure 2.13, driven by a potential of E(t) = 17 sin(2t) volts. At time zero the current is zero and the charge on the capacitor is 1/2000 coulomb. The charge q(t) on the capacitor for t > 0 is obtained by solving the initial value problem
10q'' + 120q' + 1000q = 17 sin(2t);  q(0) = 1/2000, q'(0) = 0.
The solution is
q(t) = (1/1500) e^(−6t) (7 cos(8t) − sin(8t)) + (1/240) (−cos(2t) + 4 sin(2t)).
FIGURE 2.13 The RLC circuit: R = 120 Ω, L = 10 H, C = 10⁻³ F, E(t) = 17 sin(2t) volts.
FIGURE 2.14 Transient part of the current for the circuit of Figure 2.13.
FIGURE 2.15 Steady-state part of the current for the circuit of Figure 2.13.
FIGURE 2.16 Current function for the circuit of Figure 2.13.
The current can be calculated as
i(t) = q'(t) = −(1/30) e^(−6t) (cos(8t) + sin(8t)) + (1/120) (4 cos(2t) + sin(2t)).
The current is a sum of a transient part
−(1/30) e^(−6t) (cos(8t) + sin(8t)),
named for the fact that it decays to zero as t increases, and a steady-state part
(1/120) (4 cos(2t) + sin(2t)).
The transient and steady-state parts are shown in Figures 2.14 and 2.15, and their sum, the current, is shown in Figure 2.16.
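The charge and current formulas for Example 2.30 can be machine-checked; in this Python sketch the finite-difference step and sample grid are arbitrary choices:

```python
import math

def q(t):
    # Charge: q(t) = e^(-6t)(7 cos 8t - sin 8t)/1500 + (-cos 2t + 4 sin 2t)/240
    return (math.exp(-6 * t) * (7 * math.cos(8 * t) - math.sin(8 * t)) / 1500
            + (-math.cos(2 * t) + 4 * math.sin(2 * t)) / 240)

def i(t):
    # Current: i(t) = -e^(-6t)(cos 8t + sin 8t)/30 + (4 cos 2t + sin 2t)/120
    return (-math.exp(-6 * t) * (math.cos(8 * t) + math.sin(8 * t)) / 30
            + (4 * math.cos(2 * t) + math.sin(2 * t)) / 120)

def ode_residual(t, h=1e-5):
    # Residual of 10 q'' + 120 q' + 1000 q = 17 sin(2t), via central differences
    d1 = (q(t + h) - q(t - h)) / (2 * h)
    d2 = (q(t + h) - 2 * q(t) + q(t - h)) / h ** 2
    return 10 * d2 + 120 * d1 + 1000 * q(t) - 17 * math.sin(2 * t)

max_ode = max(abs(ode_residual(0.05 * k)) for k in range(1, 100))
max_cur = max(abs(i(0.05 * k) - (q(0.05 * k + 1e-6) - q(0.05 * k - 1e-6)) / 2e-6)
              for k in range(100))
```

Both residuals are negligible, q(0) = 1/2000, and i(0) = 0, matching the stated initial data.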
SECTION 2.7
PROBLEMS
1. The object of this problem is to gauge the relative effects of initial position and velocity on the motion in the unforced, overdamped case. Solve the initial value problems
y'' + 4y' + 2y = 0;  y(0) = 5, y'(0) = 0
and
y'' + 4y' + 2y = 0;  y(0) = 0, y'(0) = 5.
Graph the solutions on the same set of axes. What conclusions can be drawn from these solutions about the influence of initial position and velocity?
2. Repeat the experiment of Problem 1, except now use the critically damped unforced equation y'' + 4y' + 4y = 0.
3. Repeat the experiment of Problem 1 for the underdamped unforced case y'' + 2y' + 5y = 0.
Problems 4 through 9 explore the effects of changing the initial position or initial velocity on the motion of the bob. In each, use the same set of axes to graph the solution of the initial value problem for the given values of A and observe the effect that these changes cause in the solution.
4. y'' + 4y' + 2y = 0; y(0) = A, y'(0) = 0; A has values 1, 3, 6, 10, −4, and −7.
5. y'' + 4y' + 2y = 0; y(0) = 0, y'(0) = A; A has values 1, 3, 6, 10, −4, and −7.
6. y'' + 4y' + 4y = 0; y(0) = A, y'(0) = 0; A has values 1, 3, 6, 10, −4, and −7.
7. y'' + 4y' + 4y = 0; y(0) = 0, y'(0) = A; A has values 1, 3, 6, 10, −4, and −7.
8. y'' + 2y' + 5y = 0; y(0) = A, y'(0) = 0; A has values 1, 3, 6, 10, −4, and −7.
9. y'' + 2y' + 5y = 0; y(0) = 0, y'(0) = A; A has values 1, 3, 6, 10, −4, and −7.
10. An object having mass 1 gram is attached to the lower end of a spring having spring modulus 29 dynes per centimeter. The bob is, in turn, adhered to a dashpot that imposes a damping force of 10v dynes, where v(t) is the velocity at time t in centimeters per second. Determine the motion of the bob if it is pulled down 3 centimeters from equilibrium and then struck upward with a blow sufficient to impart a velocity of 1 centimeter per second. Graph the solution. Solve the problem when the initial velocity is, in turn, 2, 4, 7, and 12 centimeters per second. Graph these solutions on the same set of axes to visualize the influence of the initial velocity on the motion.
11. An object having mass 1 kilogram is suspended from a spring having a spring constant of 24 newtons per meter. Attached to the object is a shock absorber, which induces a drag of 11v newtons (velocity is in meters per second). The system is set in motion by lowering the bob 25/3 centimeters and then striking it hard enough to impart an upward velocity of 5 meters per second. Solve for and graph the displacement function. Obtain the solution for the cases that the bob is lowered, in turn, 12, 20, 30, and 45 centimeters, and graph the displacement functions for the five cases on the same set of axes to see the effect of the distance lowered.
12. When an 8-pound weight is suspended from a spring, it stretches the spring 2 inches. Determine the equation of motion when an object with a mass of 7 kilograms is suspended from this spring, and the system is set in motion by striking the object an upward blow, imparting a velocity of 4 meters per second.
13. How many times can the bob pass through the equilibrium point in the case of overdamped motion? What condition can be placed on the initial displacement y(0) to guarantee that the bob never passes through equilibrium?
14. How many times can the bob pass through the equilibrium point in the case of critical damping? What condition can be placed on y(0) to ensure that the bob never passes through this position? How does the initial velocity influence whether the bob passes through the equilibrium position?
15. In underdamped motion, what effect does the damping constant c have on the frequency of the oscillations of motion?
16. Suppose y(0) = 0 and y'(0) ≠ 0. Determine the maximum displacement of the bob in the critically damped case, and show that the time at which this maximum occurs is independent of the initial velocity.
17. Suppose the acceleration of the bob on the spring at distance d from the equilibrium position is a. Prove that the period of the motion is 2π√(d/a) in the case of undamped motion.
18. A mass m₁ is attached to a spring and allowed to vibrate with undamped motion having period p. At some later time a second mass m₂ is instantaneously fused with m₁. Prove that the new object, having mass m₁ + m₂, exhibits simple harmonic motion with period p√(1 + m₂/m₁).
19. Let y(t) be the solution of y'' + ω₀²y = (A/m) cos(ωt), with y(0) = y'(0) = 0. Assuming that ω ≠ ω₀, find lim_{ω→ω₀} y(t). How does this limit compare with the solution of y'' + ω₀²y = (A/m) cos(ω₀t), with y(0) = y'(0) = 0?
20. A 16-pound weight is suspended from a spring, stretching it 8/11 feet. Then the weight is submerged in a fluid that imposes a drag of 2v pounds. The entire system is subjected to an external force 4 cos(ωt). Determine the value of ω that maximizes the amplitude of the steady-state oscillation. What is this maximum amplitude?
21. Consider overdamped forced motion governed by y'' + 6y' + 2y = 4 cos(3t). (a) Find the solution satisfying y(0) = 6, y'(0) = 0. (b) Find the solution satisfying y(0) = 0, y'(0) = 6. (c) Graph these solutions on the same set of axes to compare the effect of initial displacement with that of initial velocity.
22. Carry out the program of Problem 21 for the critically damped forced system governed by y'' + 4y' + 4y = 4 cos(3t).
23. Carry out the program of Problem 21 for the underdamped forced system governed by y'' + y' + 3y = 4 cos(3t).
In each of Problems 24 through 27, use the information to find the current in the RLC circuit of Figure 2.17. Assume zero initial current and capacitor charge.
FIGURE 2.17 RLC circuit.
24. R = 200 Ω, L = 0.1 H, C = 0.006 F, E(t) = t e^(−t) volts
25. R = 400 Ω, L = 0.12 H, C = 0.04 F, E(t) = 120 sin(20t) volts
26. R = 150 Ω, L = 0.2 H, C = 0.05 F, E(t) = 1 − e^(−t) volts
27. R = 450 Ω, L = 0.95 H, C = 0.007 F, E(t) = e^(−t) sin²(3t) volts
CHAPTER 3
The Laplace Transform
BASIC PROPERTIES · INITIAL VALUE PROBLEMS USING THE LAPLACE TRANSFORM · SHIFTING THEOREMS AND THE HEAVISIDE FUNCTION · CONVOLUTION · UNIT IMPULSES AND THE DIRAC DELTA FUNCTION
3.1
Definition and Basic Properties In mathematics, a transform is usually a device that converts one type of problem into another type, presumably easier to solve. The strategy is to solve the transformed problem, then transform back the other way to obtain the solution of the original problem. In the case of the Laplace transform, initial value problems are often converted to algebra problems, a process we can diagram as follows: initial value problem ⇓ algebra problem ⇓ solution of the algebra problem ⇓ solution of the initial value problem.
DEFINITION 3.1
Laplace Transform
The Laplace transform ℒ[f] of f is a function defined by
ℒ[f](s) = ∫₀^∞ e^(−st) f(t) dt
for all s such that this integral converges.
The Laplace transform converts a function f to a new function called ℒ[f]. Often we use t as the independent variable for f and s as the independent variable of ℒ[f]. Thus, f(t) is the function f evaluated at t, and ℒ[f](s) is the function ℒ[f] evaluated at s. It is often convenient to use a lowercase letter for a function put into the Laplace transform and the corresponding uppercase letter for the function that comes out. In this notation,
F = ℒ[f],  G = ℒ[g],  H = ℒ[h],
and so on.
EXAMPLE 3.1
Let f(t) = e^(at), with a any real number. Then
ℒ[f](s) = F(s) = ∫₀^∞ e^(−st) e^(at) dt = ∫₀^∞ e^((a−s)t) dt
= lim_{k→∞} ∫₀^k e^((a−s)t) dt = lim_{k→∞} [ (1/(a − s)) e^((a−s)t) ]₀^k
= lim_{k→∞} (1/(a − s)) ( e^((a−s)k) − 1 )
= −1/(a − s) = 1/(s − a),
provided that a − s < 0, or s > a. The Laplace transform of f(t) = e^(at) is F(s) = 1/(s − a), defined for s > a.
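The formula 1/(s − a) can be corroborated by approximating the transform integral numerically; in this Python sketch the cutoff, the sample count, and the particular values a = 1.5 and s = 3 are arbitrary choices:

```python
import math

def laplace_numeric(f, s, upper=60.0, n=200000):
    # Trapezoidal approximation of the transform integral on [0, upper];
    # for s inside the region of convergence the tail beyond `upper` is negligible
    h = upper / n
    total = 0.5 * (f(0.0) + math.exp(-s * upper) * f(upper))
    for k in range(1, n):
        t = k * h
        total += math.exp(-s * t) * f(t)
    return h * total

a = 1.5
approx = laplace_numeric(lambda t: math.exp(a * t), s=3.0)
exact = 1 / (3.0 - a)   # the formula 1/(s - a) from the example
```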
EXAMPLE 3.2
Let g(t) = sin(t). Then
ℒ[g](s) = G(s) = ∫₀^∞ e^(−st) sin(t) dt = lim_{k→∞} ∫₀^k e^(−st) sin(t) dt
= lim_{k→∞} ( −(e^(−ks) cos(k) + s e^(−ks) sin(k) − 1)/(s² + 1) ) = 1/(s² + 1).
G(s) is defined for all s > 0.
A Laplace transform is rarely computed by referring directly to the definition and integrating. Instead, we use tables of Laplace transforms of commonly used functions (such as Table 3.1) or computer software. We will also develop methods that are used to find the Laplace transform of a shifted or translated function, step functions, pulses, and various other functions that arise frequently in applications.
The Laplace transform is linear, which means that constants factor through the transform, and the transform of a sum of functions is the sum of the transforms of these functions.
TA B L E 3 . 1   Table of Laplace Transforms of Functions

f(t) | F(s) = ℒ[f(t)](s)
1. 1 | 1/s
2. t | 1/s²
3. tⁿ (n = 1, 2, 3, ...) | n!/sⁿ⁺¹
4. 1/√t | √(π/s)
5. e^(at) | 1/(s − a)
6. t e^(at) | 1/(s − a)²
7. tⁿ e^(at) | n!/(s − a)ⁿ⁺¹
8. (e^(at) − e^(bt))/(a − b) | 1/((s − a)(s − b))
9. (a e^(at) − b e^(bt))/(a − b) | s/((s − a)(s − b))
10. ((c − b)e^(at) + (a − c)e^(bt) + (b − a)e^(ct))/((a − b)(b − c)(c − a)) | 1/((s − a)(s − b)(s − c))
11. sin(at) | a/(s² + a²)
12. cos(at) | s/(s² + a²)
13. 1 − cos(at) | a²/(s(s² + a²))
14. at − sin(at) | a³/(s²(s² + a²))
15. sin(at) − at cos(at) | 2a³/(s² + a²)²
16. sin(at) + at cos(at) | 2as²/(s² + a²)²
17. t sin(at) | 2as/(s² + a²)²
18. t cos(at) | (s² − a²)/(s² + a²)²
19. (cos(at) − cos(bt))/((b − a)(b + a)) | s/((s² + a²)(s² + b²))
20. e^(at) sin(bt) | b/((s − a)² + b²)
21. e^(at) cos(bt) | (s − a)/((s − a)² + b²)
22. sinh(at) | a/(s² − a²)
23. cosh(at) | s/(s² − a²)
24. sin(at)cosh(at) − cos(at)sinh(at) | 4a³/(s⁴ + 4a⁴)
25. sin(at)sinh(at) | 2a²s/(s⁴ + 4a⁴)
26. sinh(at) − sin(at) | 2a³/(s⁴ − a⁴)
27. cosh(at) − cos(at) | 2a²s/(s⁴ − a⁴)
28. (1/√(πt)) e^(at)(1 + 2at) | s/(s − a)^(3/2)
29. J₀(at) | 1/√(s² + a²)
30. Jₙ(at) | (1/aⁿ)(√(s² + a²) − s)ⁿ/√(s² + a²)
31. J₀(2√(at)) | (1/s) e^(−a/s)
32. (1/t) sin(at) | tan⁻¹(a/s)
33. (2/t)(1 − cos(at)) | ln((s² + a²)/s²)
34. (2/t)(1 − cosh(at)) | ln((s² − a²)/s²)
35. 1/√(πt) − a e^(a²t) erfc(a√t) | 1/(√s + a)
36. 1/√(πt) + a e^(a²t) erf(a√t) | √s/(s − a²)
37. e^(a²t) erf(a√t) | a/(√s (s − a²))
38. e^(a²t) erfc(a√t) | 1/(√s (√s + a))
39. erfc(a/(2√t)) | (1/s) e^(−a√s)
40. (1/√(πt)) e^(−a²/(4t)) | (1/√s) e^(−a√s)
41. 1/√(t + a) | √(π/s) e^(as) erfc(√(as))
42. (1/(πt)) sin(2a√t) | erf(a/√s)
43. f(t/a) | a F(as)
44. e^(bt/a) f(t/a) | a F(as − b)
45. ⌊t⌋ (greatest integer ≤ t) | e^(−s)/(s(1 − e^(−s)))
46. δ(t − a) | e^(−as)
47. Lₙ(t) (Laguerre polynomial) | (s − 1)ⁿ/sⁿ⁺¹
48. (n!/(2n)!) (1/√(πt)) H₂ₙ(√t) (Hermite polynomial) | (1 − s)ⁿ/s^(n+1/2)
49. (−n!/((2n + 1)!√π)) H₂ₙ₊₁(√t) (Hermite polynomial) | (1 − s)ⁿ/s^(n+3/2)
50. triangular wave (amplitude 1, period 2a) | (1/(as²)) (1 − e^(−as))/(1 + e^(−as)) = (1/(as²)) tanh(as/2)
51. square wave (amplitude 1, period 2a) | (1/s) tanh(as/2)
52. sawtooth wave (amplitude 1, period a) | 1/(as²) − e^(−as)/(s(1 − e^(−as)))

Operational Formulas

f(t) | F(s)
a f(t) + b g(t) | a F(s) + b G(s)
f′(t) | s F(s) − f(0+)
f⁽ⁿ⁾(t) | sⁿ F(s) − sⁿ⁻¹ f(0) − ··· − f⁽ⁿ⁻¹⁾(0)
∫₀ᵗ f(τ) dτ | (1/s) F(s)
t f(t) | −F′(s)
tⁿ f(t) | (−1)ⁿ F⁽ⁿ⁾(s)
(1/t) f(t) | ∫ₛ^∞ F(σ) dσ
e^(at) f(t) | F(s − a)
f(t − a) H(t − a) | e^(−as) F(s)
f(t + ω) = f(t) (f periodic with period ω) | (1/(1 − e^(−ωs))) ∫₀^ω e^(−st) f(t) dt
THEOREM 3.1   Linearity of the Laplace Transform

Suppose ℒ[f](s) and ℒ[g](s) are defined for s > a, and let α and β be real numbers. Then
ℒ[αf + βg](s) = αF(s) + βG(s)  for s > a.
Proof  By assumption, ∫₀^∞ e^(−st) f(t) dt and ∫₀^∞ e^(−st) g(t) dt converge for s > a. Then
ℒ[αf + βg](s) = ∫₀^∞ e^(−st) (αf(t) + βg(t)) dt
= α ∫₀^∞ e^(−st) f(t) dt + β ∫₀^∞ e^(−st) g(t) dt = αF(s) + βG(s)
for s > a.
This conclusion extends to any finite sum:
ℒ[α₁f₁ + ··· + αₙfₙ](s) = α₁F₁(s) + ··· + αₙFₙ(s)
for all s such that each Fⱼ(s) is defined.
Not every function has a Laplace transform, because ∫₀^∞ e^(−st) f(t) dt may not converge for any real values of s. We will consider conditions that can be placed on f to ensure that f has a Laplace transform. An obvious necessary condition is that ∫₀^k e^(−st) f(t) dt must be defined for every k > 0, because ℒ[f](s) = lim_{k→∞} ∫₀^k e^(−st) f(t) dt. For this to occur, it is enough that f be piecewise continuous on [0, k] for every positive number k. We will define this concept in general terms because it occurs in other contexts as well.
DEFINITION 3.2
Piecewise Continuity
f is piecewise continuous on [a, b] if there are points
a < t₁ < t₂ < ··· < tₙ < b
such that f is continuous on each open interval (a, t₁), (tⱼ₋₁, tⱼ), and (tₙ, b), and all of the following one-sided limits are finite:
lim_{t→a+} f(t),  lim_{t→tⱼ−} f(t),  lim_{t→tⱼ+} f(t),  and  lim_{t→b−} f(t).
This means that f is continuous on [a, b] except perhaps at finitely many points, at each of which f has finite one-sided limits from within the interval. The only discontinuities a piecewise continuous function f can experience on [a, b] are finitely many jump discontinuities (gaps of finite width in the graph). Figure 3.1 shows typical jump discontinuities in a graph. For example, let
f(t) = t² for 0 ≤ t < 2,  f(2) = 2,  f(t) = 1 for 2 < t ≤ 3,  and  f(t) = −1 for 3 < t ≤ 4.
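This example function is easy to code directly; the probe offset eps below is an arbitrary small number used to sample the one-sided limits at the jumps:

```python
def f(t):
    # The piecewise continuous example function from the text
    if 0 <= t < 2:
        return t ** 2
    if t == 2:
        return 2
    if 2 < t <= 3:
        return 1
    if 3 < t <= 4:
        return -1
    raise ValueError("f is only defined on [0, 4]")

eps = 1e-9   # probe offset for sampling the one-sided limits
left_at_2, right_at_2 = f(2 - eps), f(2 + eps)
left_at_3, right_at_3 = f(3 - eps), f(3 + eps)
```

The one-sided limits at t = 2 are 4 and 1, and at t = 3 they are 1 and −1: finite jumps, so f is piecewise continuous even though f(2) = 2 agrees with neither limit.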
FIGURE 3.1 A function having jump discontinuities at t₁ and t₂.
FIGURE 3.2 Graph of the piecewise continuous function f(t) defined above.
Then f is continuous on [0, 4] except at 2 and 3, where f has jump discontinuities. A graph of this function is shown in Figure 3.2.
If f is piecewise continuous on [0, k], then so is e^(−st) f(t), and ∫₀^k e^(−st) f(t) dt exists. Existence of ∫₀^k e^(−st) f(t) dt for every positive k does not ensure existence of lim_{k→∞} ∫₀^k e^(−st) f(t) dt. For example, f(t) = e^(t²) is continuous on every interval [0, k], but ∫₀^∞ e^(−st) e^(t²) dt diverges for every real value of s. Thus, for convergence of ∫₀^∞ e^(−st) f(t) dt, we need another condition on f. The form of this integral suggests one condition that is sufficient. If, for some numbers M and b, we have |f(t)| ≤ M e^(bt), then
|e^(−st) f(t)| ≤ M e^((b−s)t)  for s ≥ b.
But
∫₀^∞ M e^((b−s)t) dt
converges (to M/(s − b)) if b − s < 0, or s > b. Then, by comparison, ∫₀^∞ |e^(−st) f(t)| dt converges if s > b, hence ∫₀^∞ e^(−st) f(t) dt converges if s > b. This line of reasoning suggests a set of conditions which are sufficient for a function to have a Laplace transform.
THEOREM 3.2
Existence of ℒ[f]
Suppose f is piecewise continuous on [0, k] for every positive k. Suppose also that there are numbers M and b such that |f(t)| ≤ M e^(bt) for t ≥ 0. Then ∫₀^∞ e^(−st) f(t) dt converges for s > b, hence ℒ[f](s) is defined for s > b.
Many functions satisfy these conditions, including polynomials, sin(at), cos(at), e^(at), and others. The conditions of the theorem are sufficient, but not necessary, for a function to have a Laplace transform. Consider, for example, f(t) = t^(−1/2) for t > 0. This function is not piecewise
continuous on any [0, k] because lim_{t→0+} t^(−1/2) = ∞. Nevertheless, ∫₀^k e^(−st) t^(−1/2) dt exists for every positive k and s > 0. Further,
ℒ[f](s) = ∫₀^∞ e^(−st) t^(−1/2) dt = 2 ∫₀^∞ e^(−sx²) dx   (let x = t^(1/2))
= (2/√s) ∫₀^∞ e^(−z²) dz   (let z = x√s)
= √(π/s),
in which we have used the fact (found in some standard integral tables) that ∫₀^∞ e^(−z²) dz = √π/2.
Now revisit the flow chart at the start of this chapter. Taking the Laplace transform of a function is the first step in solving certain kinds of problems. The bottom of the flow chart suggests that at some point we must be able to go back the other way. After we find some function G(s), we will need to produce a function g whose Laplace transform is G. This is the process of taking an inverse Laplace transform.
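The substituted integral 2∫₀^∞ e^(−sx²) dx is easy to approximate numerically; in this Python sketch the cutoff, grid size, and test value s = 2 are arbitrary choices:

```python
import math

def transform_of_inv_sqrt(s, upper=40.0, n=400000):
    # Trapezoidal approximation of 2 * integral_0^inf exp(-s x^2) dx,
    # the substituted (singularity-free) form of the transform of t^(-1/2)
    h = upper / n
    total = 0.5 * (1.0 + math.exp(-s * upper * upper))
    for k in range(1, n):
        x = k * h
        total += math.exp(-s * x * x)
    return 2 * h * total

s = 2.0
approx = transform_of_inv_sqrt(s)
exact = math.sqrt(math.pi / s)
```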
DEFINITION 3.3
Given a function G, a function g such that ℒ[g] = G is called an inverse Laplace transform of G. In this event, we write
g = ℒ⁻¹[G].
For example,
ℒ⁻¹[1/(s − a)](t) = e^(at)
and
ℒ⁻¹[1/(s² + 1)](t) = sin(t).
This inverse process is ambiguous because, given G, there will be many functions whose Laplace transform is G. For example, we know that the Laplace transform of e^(−t) is 1/(s + 1) for s > −1. However, if we change f(t) at just one point, letting
h(t) = e^(−t) for t ≠ 3, and h(3) = 0,
then ∫₀^∞ e^(−st) f(t) dt = ∫₀^∞ e^(−st) h(t) dt, and h has the same Laplace transform as f. In such a case, which one do we call the inverse Laplace transform of 1/(s + 1)? One answer is provided by Lerch's Theorem, which states that two continuous functions having the same Laplace transform must be equal.
THEOREM 3.3   Lerch
Let f and g be continuous on [0, ∞) and suppose that ℒ[f] = ℒ[g]. Then f = g.
In view of this, we will partially resolve the ambiguity in taking the inverse Laplace transform by agreeing that, given F(s), we seek a continuous f whose Laplace transform is F. If there is no continuous inverse transform function, then we simply have to make some agreement as to which of several possible candidates we will call ℒ⁻¹[F]. In applications, context will often make this choice obvious. Because of the linearity of the Laplace transform, its inverse is also linear.
THEOREM 3.4
If ℒ⁻¹[F] = f and ℒ⁻¹[G] = g, and α and β are real numbers, then
ℒ⁻¹[αF + βG] = αf + βg.
If Table 3.1 is used to find ℒ[f], look up f in the left column and read ℒ[f] from the right column. For ℒ⁻¹[F], look up F in the right column and match it with f in the left.
SECTION 3.1
PROBLEMS
In each of Problems 1 through 10, use the linearity of the Laplace transform, and Table 3.1, to find the Laplace transform of the function.
1. 2 sinh(t) − 4
2. cos(t) − sin(t)
3. 4t sin(2t)
4. t² − 3t + 5
5. t − cos(5t)
6. 2t²e^(−3t) − 4t + 1
7. (t + 4)²
8. 3e^(−t) + sin(6t)
9. t³ − 3t + cos(4t)
10. −3 cos(2t) + 5 sin(4t)
In each of Problems 11 through 18, use the linearity of the inverse Laplace transform and Table 3.1 to find the (continuous) inverse Laplace transform of the function.
11. −2/(s + 16)
12. 4s/(s² − 14)
13. (2s − 5)/(s² + 16)
14. (3s + 17)/(s² − 7)
15. 1/(s − 7) + 3/s²
16. 5/(s + 7)²
17. 6/(s − 4) − 1/(s − 4)²
18. 2/s⁴ − 1/s² + 3/s + 4/s⁶
Suppose that f(t) is defined for all t ≥ 0. Then f is periodic with period T if f(t + T) = f(t) for all t ≥ 0. For example, sin(t) has period 2π. In Problems 19–22, assume that f has period T.
19. Show that
ℒ[f](s) = Σ_{n=0}^∞ ∫_{nT}^{(n+1)T} e^(−st) f(t) dt.
20. Show that
∫_{nT}^{(n+1)T} e^(−st) f(t) dt = e^(−nsT) ∫₀^T e^(−st) f(t) dt.
21. From Problems 19 and 20, show that
ℒ[f](s) = Σ_{n=0}^∞ e^(−nsT) ∫₀^T e^(−st) f(t) dt.
22. Use the geometric series Σ_{n=0}^∞ rⁿ = 1/(1 − r) for |r| < 1, together with the result of Problem 21, to show that
ℒ[f](s) = (1/(1 − e^(−sT))) ∫₀^T e^(−st) f(t) dt.
In each of Problems 23 through 30, a periodic function is given, sometimes by a graph. Find ℒ[f], using the result of Problem 22.
23. f has period 6, and f(t) = 5 for 0 < t ≤ 3, f(t) = 0 for 3 < t ≤ 6.
24. f(t) = E|sin(ωt)|, with E and ω positive constants. (Here f has period π/ω.)
25. f has the graph of Figure 3.3.
26. f has the graph of Figure 3.4.
27. f has the graph of Figure 3.5.
28. f has the graph of Figure 3.6.
29. f has the graph of Figure 3.7.
30. f has the graph of Figure 3.8.
FIGURES 3.3–3.8 Graphs of the periodic functions for Problems 25 through 30.
3.2
Solution of Initial Value Problems Using the Laplace Transform
The Laplace transform is a powerful tool for solving some kinds of initial value problems. The technique depends on the following fact about the Laplace transform of a derivative.
THEOREM 3.5
Laplace Transform of a Derivative
Let f be continuous on [0, ∞) and suppose f' is piecewise continuous on [0, k] for every positive k. Suppose also that lim_{k→∞} e^(−sk) f(k) = 0 if s > 0. Then
ℒ[f'](s) = sF(s) − f(0).   (3.1)
That is, the Laplace transform of the derivative of f is s times the Laplace transform of f at s, minus f at zero.
Proof
Begin with an integration by parts, with u = e^(−st) and dv = f'(t) dt. For k > 0,
∫₀^k e^(−st) f'(t) dt = [e^(−st) f(t)]₀^k − ∫₀^k (−s) e^(−st) f(t) dt
= e^(−sk) f(k) − f(0) + s ∫₀^k e^(−st) f(t) dt.
Take the limit as k → ∞ and use the assumption that e^(−sk) f(k) → 0 to obtain
ℒ[f'](s) = lim_{k→∞} ( e^(−sk) f(k) − f(0) + s ∫₀^k e^(−st) f(t) dt )
= −f(0) + s ∫₀^∞ e^(−st) f(t) dt = −f(0) + sF(s).
If f has a jump discontinuity at 0 (as occurs, for example, if f is an electromotive force that is switched on at time zero), then this conclusion can be amended to read
ℒ[f'](s) = sF(s) − f(0+),  where  f(0+) = lim_{t→0+} f(t)
is the right limit of f(t) at 0.
For problems involving differential equations of order 2 or higher, we need a higher derivative version of the theorem. Let f⁽ʲ⁾ denote the jth derivative of f. As a notational convenience, we let f⁽⁰⁾ = f.
Laplace Transform of a Higher Derivative
Suppose f, f', ..., f⁽ⁿ⁻¹⁾ are continuous on [0, ∞), and f⁽ⁿ⁾ is piecewise continuous on [0, k] for every positive k. Suppose also that lim_{k→∞} e^(−sk) f⁽ʲ⁾(k) = 0 for s > 0 and for j = 1, 2, ..., n − 1. Then
ℒ[f⁽ⁿ⁾](s) = sⁿF(s) − sⁿ⁻¹f(0) − sⁿ⁻²f'(0) − ··· − s f⁽ⁿ⁻²⁾(0) − f⁽ⁿ⁻¹⁾(0).   (3.2)
The second derivative case (n = 2) occurs sufficiently often that we will record it separately. Under the conditions of the theorem,
ℒ[f''](s) = s²F(s) − sf(0) − f'(0).   (3.3)
We are now ready to use the Laplace transform to solve certain initial value problems.
EXAMPLE 3.3
Solve
y' − 4y = 1;  y(0) = 1.
We know how to solve this problem, but we will use the Laplace transform to illustrate the technique. Write ℒ[y](s) = Y(s). Take the Laplace transform of the differential equation, using the linearity of ℒ and equation (3.1), with y(t) in place of f(t):
ℒ[y' − 4y](s) = ℒ[y'](s) − 4ℒ[y](s) = sY(s) − y(0) − 4Y(s) = ℒ[1](s) = 1/s.
Here we used the fact (from Table 3.1) that ℒ[1](s) = 1/s for s > 0. Since y(0) = 1, we now have
(s − 4)Y(s) = y(0) + 1/s = 1 + 1/s.
At this point we have an algebra problem to solve for Y(s), obtaining
Y(s) = 1/(s − 4) + 1/(s(s − 4))
(note the flow chart at the beginning of this chapter). The solution of the initial value problem is
y = ℒ⁻¹[Y] = ℒ⁻¹[1/(s − 4)] + ℒ⁻¹[1/(s(s − 4))].
From entry 5 of Table 3.1, with a = 4,
ℒ⁻¹[1/(s − 4)](t) = e^(4t).
And from entry 8, with a = 0 and b = 4,
ℒ⁻¹[1/(s(s − 4))](t) = (1/(0 − 4))(e^(0t) − e^(4t)) = (1/4)(e^(4t) − 1).
The solution of the initial value problem is
y(t) = e^(4t) + (1/4)(e^(4t) − 1) = (5/4)e^(4t) − 1/4.
One feature of this Laplace transform technique is that the initial value given in the problem is naturally incorporated into the solution process through equation (3.1). We need not find the general solution first, then solve for the constant to satisfy the initial condition.
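Here is a short Python confirmation that y(t) = (5/4)e^(4t) − 1/4 satisfies y' − 4y = 1 with y(0) = 1 (the derivative is computed by hand and the sample points are arbitrary):

```python
import math

def y(t):
    # Solution produced by the transform method: y = (5/4) e^(4t) - 1/4
    return 1.25 * math.exp(4 * t) - 0.25

def residual(t):
    yprime = 5 * math.exp(4 * t)        # derivative of y, computed by hand
    return yprime - 4 * y(t) - 1        # should vanish: y' - 4y = 1

max_residual = max(abs(residual(0.1 * k)) for k in range(20))
```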
EXAMPLE 3.4
Solve
y'' + 4y' + 3y = e^t;  y(0) = 0, y'(0) = 2.
Apply ℒ to the differential equation to get
ℒ[y''] + 4ℒ[y'] + 3ℒ[y] = ℒ[e^t].
Now
ℒ[y''] = s²Y − sy(0) − y'(0) = s²Y − 2  and  ℒ[y'] = sY − y(0) = sY.
Therefore,
s²Y − 2 + 4sY + 3Y = 1/(s − 1).
Solve for Y to obtain
Y(s) = (2s − 1)/((s − 1)(s² + 4s + 3)).
The solution is the inverse Laplace transform of this function. Some software will produce this inverse. If we want to use Table 3.1, we must use a partial fractions decomposition to write Y(s) as a sum of simpler functions. Write
Y(s) = (2s − 1)/((s − 1)(s² + 4s + 3)) = (2s − 1)/((s − 1)(s + 1)(s + 3)) = A/(s − 1) + B/(s + 1) + C/(s + 3).
This equation can hold only if, for all s,
A(s + 1)(s + 3) + B(s − 1)(s + 3) + C(s − 1)(s + 1) = 2s − 1.
Now choose values of s to simplify the task of determining A, B, and C. Let s = 1 to get 8A = 1, so A = 1/8. Let s = −1 to get −4B = −3, so B = 3/4. Choose s = −3 to get 8C = −7, so C = −7/8. Then
Y(s) = (1/8)(1/(s − 1)) + (3/4)(1/(s + 1)) − (7/8)(1/(s + 3)).
Now read from Table 3.1 that
y(t) = (1/8)e^t + (3/4)e^(−t) − (7/8)e^(−3t).
Again, the Laplace transform has converted an initial value problem to an algebra problem, incorporating the initial conditions into the algebraic manipulations. Once we obtain Y(s), the problem becomes one of inverting the transformed function to obtain y(t).
Equation (3.1) has an interesting consequence that will be useful later. Under the conditions of the theorem, we know that
ℒ[f'] = sℒ[f] − f(0).
Suppose f(t) is defined by an integral, say
f(t) = ∫₀ᵗ g(τ) dτ.
Now f(0) = 0 and, assuming continuity of g, f'(t) = g(t). Then
ℒ[f'] = ℒ[g] = s ℒ[∫₀ᵗ g(τ) dτ].
This means that
ℒ[∫₀ᵗ g(τ) dτ](s) = (1/s) ℒ[g](s),   (3.4)
enabling us to take the Laplace transform of a function defined by an integral. We will use this equation later in dealing with circuits having discontinuous electromotive forces. Thus far we have illustrated a Laplace transform technique for solving initial value problems with constant coefficients. However, we could have solved the problems in these examples by other means. In the next three sections we will develop the machinery needed to apply the Laplace transform to problems that defy previous methods.
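Both the partial-fraction coefficients and the final solution of Example 3.4 can be machine-checked; in this Python sketch the sample values of s and t, and the difference step h, are arbitrary choices:

```python
import math

A, B, C = 1 / 8, 3 / 4, -7 / 8

def pf_gap(s):
    # Partial fraction identity: (2s-1)/((s-1)(s+1)(s+3)) vs A/(s-1)+B/(s+1)+C/(s+3)
    lhs = (2 * s - 1) / ((s - 1) * (s + 1) * (s + 3))
    rhs = A / (s - 1) + B / (s + 1) + C / (s + 3)
    return abs(lhs - rhs)

def y(t):
    # Solution of y'' + 4y' + 3y = e^t with y(0) = 0, y'(0) = 2
    return A * math.exp(t) + B * math.exp(-t) + C * math.exp(-3 * t)

def ode_residual(t, h=1e-5):
    d1 = (y(t + h) - y(t - h)) / (2 * h)
    d2 = (y(t + h) - 2 * y(t) + y(t - h)) / h ** 2
    return d2 + 4 * d1 + 3 * y(t) - math.exp(t)

pf_ok = max(pf_gap(s) for s in (0.5, 2.0, 5.0, -0.5)) < 1e-12
max_res = max(abs(ode_residual(0.1 * k)) for k in range(1, 30))
yprime0 = (y(1e-6) - y(-1e-6)) / 2e-6
```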
SECTION 3.2   PROBLEMS
In each of Problems 1 through 10, use the Laplace transform to solve the initial value problem.
1. y' + 4y = 1; y(0) = −3
2. y' − 9y = t; y(0) = 5
3. y' + 4y = cos(t); y(0) = 0
4. y' + 2y = e^(−t); y(0) = 1
5. y' − 2y = 1 − t; y(0) = 4
6. y'' + y = 1; y(0) = 6, y'(0) = 0
7. y'' − 4y' + 4y = cos(t); y(0) = 1, y'(0) = −1
8. y'' + 9y = t²; y(0) = y'(0) = 0
9. y'' + 16y = 1 + t; y(0) = −2, y'(0) = 1
10. y'' − 5y' + 6y = e^(−t); y(0) = 0, y'(0) = 2
11. Suppose f satisfies the hypotheses of Theorem 3.5, except for a jump discontinuity at 0. Show that ℒ[f'](s) = sF(s) − f(0+), where f(0+) = lim_{t→0+} f(t).
12. Suppose f satisfies the hypotheses of Theorem 3.5, except for a jump discontinuity at a positive number c. Prove that
ℒ[f'](s) = sF(s) − f(0) − e^(−cs)(f(c+) − f(c−)),
where f(c−) = lim_{t→c−} f(t).
13. Suppose g is piecewise continuous on [0, k] for every k > 0, and that there are numbers M, b, and a such that |g(t)| ≤ M e^(bt) for t ≥ a. Let ℒ[g] = G. Show that
ℒ[∫ₐᵗ g(w) dw](s) = (1/s)G(s) − (1/s)∫₀ᵃ g(w) dw.
3.3  Shifting Theorems and the Heaviside Function

One point of developing the Laplace transform is to broaden the class of problems we are able to solve. The methods of Chapters 1 and 2 are primarily aimed at problems involving continuous functions. But many mathematical models deal with discontinuous processes (for example, switches thrown on and off in a circuit). For these, the Laplace transform is often effective, but we must learn more about representing discontinuous functions and applying both the transform and its inverse to them.
3.3.1  The First Shifting Theorem

We will show that the Laplace transform of $e^{at}f(t)$ is nothing more than the Laplace transform of $f(t)$, shifted $a$ units to the right. The shift is achieved by replacing $s$ by $s - a$ in $F(s)$ to obtain $F(s-a)$.
THEOREM 3.7  First Shifting Theorem, or Shifting in the s Variable

Let $\mathcal{L}[f](s) = F(s)$ for $s > b \ge 0$. Let $a$ be any number. Then
$$\mathcal{L}[e^{at}f(t)](s) = F(s-a) \quad\text{for } s > a + b.$$

Proof  Compute
$$\mathcal{L}[e^{at}f(t)](s) = \int_0^\infty e^{at}e^{-st}f(t)\,dt = \int_0^\infty e^{-(s-a)t}f(t)\,dt = F(s-a)$$
for $s - a > b$, or $s > a + b$.
EXAMPLE 3.5

We know from Table 3.1 that $\mathcal{L}[\cos(bt)] = s/(s^2+b^2)$. For the Laplace transform of $e^{at}\cos(bt)$, replace $s$ with $s-a$ to get
$$\mathcal{L}[e^{at}\cos(bt)](s) = \frac{s-a}{(s-a)^2+b^2}.$$

EXAMPLE 3.6

Since $\mathcal{L}[t^3] = 6/s^4$, then
$$\mathcal{L}[t^3 e^{7t}](s) = \frac{6}{(s-7)^4}.$$
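Both of these shifted transforms are easy to confirm with a computer algebra system. The following sketch assumes the sympy library (not part of the text) and reproduces Examples 3.5 and 3.6.

```python
import sympy as sp

t, s, a, b = sp.symbols('t s a b', positive=True)

# Example 3.6: L[t^3] = 6/s^4, so L[t^3 e^{7t}] = 6/(s - 7)^4
F_shifted = sp.laplace_transform(t**3 * sp.exp(7*t), t, s, noconds=True)

# Example 3.5: L[e^{at} cos(bt)] = (s - a)/((s - a)^2 + b^2)
F_cos = sp.laplace_transform(sp.exp(a*t)*sp.cos(b*t), t, s, noconds=True)
```

Each result matches the formula obtained by replacing $s$ with $s - a$ in the unshifted transform.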
The first shifting theorem suggests a corresponding formula for the inverse Laplace transform: if $\mathcal{L}[f] = F$, then
$$\mathcal{L}^{-1}[F(s-a)] = e^{at}f(t).$$
Sometimes it is convenient to write this result as
$$\mathcal{L}^{-1}[F(s-a)] = e^{at}\,\mathcal{L}^{-1}[F(s)]. \qquad (3.5)$$
EXAMPLE 3.7

Suppose we want to compute
$$\mathcal{L}^{-1}\!\left[\frac{4}{s^2+4s+20}\right].$$
We will manipulate the quotient into a form to which we can apply the shifting theorem. Complete the square in the denominator to write
$$\frac{4}{s^2+4s+20} = \frac{4}{(s+2)^2+16}.$$
Think of the quotient on the right as a function of $s+2$:
$$F(s+2) = \frac{4}{(s+2)^2+16}.$$
This means we should choose
$$F(s) = \frac{4}{s^2+16}.$$
Now the shifting theorem tells us that
$$\mathcal{L}[e^{-2t}\sin(4t)] = F(s-(-2)) = F(s+2) = \frac{4}{(s+2)^2+16}$$
and therefore
$$\mathcal{L}^{-1}\!\left[\frac{4}{(s+2)^2+16}\right] = e^{-2t}\sin(4t).$$
EXAMPLE 3.8

Compute
$$\mathcal{L}^{-1}\!\left[\frac{3s-1}{s^2-6s+2}\right].$$
Again, begin with some manipulation into the form of a function of $s-a$ for some $a$:
$$\frac{3s-1}{s^2-6s+2} = \frac{3s-1}{(s-3)^2-7} = \frac{3(s-3)}{(s-3)^2-7} + \frac{8}{(s-3)^2-7} = G(s-3) + K(s-3)$$
if we choose
$$G(s) = \frac{3s}{s^2-7} \quad\text{and}\quad K(s) = \frac{8}{s^2-7}.$$
Now apply equation (3.5) (in the second line) to write
$$\begin{aligned}
\mathcal{L}^{-1}\!\left[\frac{3s-1}{s^2-6s+2}\right] &= \mathcal{L}^{-1}[G(s-3)] + \mathcal{L}^{-1}[K(s-3)] \\
&= e^{3t}\,\mathcal{L}^{-1}[G(s)] + e^{3t}\,\mathcal{L}^{-1}[K(s)] \\
&= 3e^{3t}\,\mathcal{L}^{-1}\!\left[\frac{s}{s^2-7}\right] + 8e^{3t}\,\mathcal{L}^{-1}\!\left[\frac{1}{s^2-7}\right] \\
&= 3e^{3t}\cosh(\sqrt{7}\,t) + \frac{8}{\sqrt{7}}\,e^{3t}\sinh(\sqrt{7}\,t).
\end{aligned}$$
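The inversion in Example 3.8 can be double-checked numerically. The sketch below (assuming sympy, not part of the text) inverts the transform directly and compares it with the hyperbolic-function form of the answer at a sample point.

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)

Y = (3*s - 1)/(s**2 - 6*s + 2)
y = sp.inverse_laplace_transform(Y, s, t)

# Example 3.8's answer, written with hyperbolic functions
expected = 3*sp.exp(3*t)*sp.cosh(sp.sqrt(7)*t) \
         + (8/sp.sqrt(7))*sp.exp(3*t)*sp.sinh(sp.sqrt(7)*t)

# Numerical comparison at t = 1; the two expressions agree
err = float(sp.Abs((y - expected).subs(t, 1)))
```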
3.3.2  The Heaviside Function and Pulses

We will now lay the foundations for solving certain initial value problems having discontinuous forcing functions. To do this, we will use the Heaviside function. Recall that $f$ has a jump discontinuity at $a$ if $\lim_{t \to a-} f(t)$ and $\lim_{t \to a+} f(t)$ both exist and are finite, but unequal. Figure 3.9 shows a typical jump discontinuity. The magnitude of the jump discontinuity is the "width of the gap" in the graph at $a$:
$$\left|\lim_{t \to a+} f(t) - \lim_{t \to a-} f(t)\right|.$$
Functions with jump discontinuities can be treated very efficiently using the unit step function, or Heaviside function.
FIGURE 3.9  A function with a jump discontinuity at $a$; the magnitude of the jump is the width of the gap in the graph.
DEFINITION 3.4  Heaviside Function

The Heaviside function $H$ is defined by
$$H(t) = \begin{cases} 0 & \text{if } t < 0 \\ 1 & \text{if } t \ge 0. \end{cases}$$

Oliver Heaviside (1850–1925) was an English electrical engineer who did much to introduce Laplace transform methods into engineering practice. A graph of $H$ is shown in Figure 3.10. It has a jump discontinuity of magnitude $1$ at $0$. The Heaviside function may be thought of as a flat switching function, "on" when $t \ge 0$, where $H(t) = 1$, and "off" when $t < 0$, where $H(t) = 0$. We will use it to achieve a variety of effects, including switching functions on and off at different times, shifting functions along the axis, and combining functions with pulses.
To begin this program, if $a$ is any number, then $H(t-a)$ is the Heaviside function shifted $a$ units to the right, as shown in Figure 3.11, since
$$H(t-a) = \begin{cases} 0 & \text{if } t < a \\ 1 & \text{if } t \ge a. \end{cases}$$
$H(t-a)$ models a flat signal of magnitude $1$, turned off until time $t = a$ and then switched on. We can use $H(t-a)$ to achieve the effect of turning a given function $g$ off until time $t = a$, at which time it is switched on. In particular,
$$H(t-a)g(t) = \begin{cases} 0 & \text{if } t < a \\ g(t) & \text{if } t \ge a \end{cases}$$
FIGURE 3.10  The Heaviside function $H(t)$.
FIGURE 3.11  A shifted Heaviside function $H(t-a)$.
FIGURE 3.12  Comparison of $y = \cos(t)$ and $y = H(t-\pi)\cos(t)$.
is zero until time $t = a$, at which time it switches on $g(t)$. To see this in a specific case, let $g(t) = \cos(t)$ for all $t$. Then
$$H(t-\pi)g(t) = H(t-\pi)\cos(t) = \begin{cases} 0 & \text{if } t < \pi \\ \cos(t) & \text{if } t \ge \pi. \end{cases}$$
Graphs of $\cos(t)$ and $H(t-\pi)\cos(t)$ are shown in Figure 3.12 for comparison. We can also use the Heaviside function to describe a pulse.
DEFINITION 3.5  Pulse

A pulse is a function of the form
$$k\,[H(t-a) - H(t-b)],$$
in which $a < b$ and $k$ is a nonzero real number.

The pulse $H(t-a) - H(t-b)$ is graphed in Figure 3.13. It has value $0$ if $t < a$ (where $H(t-a) = H(t-b) = 0$), value $1$ if $a \le t < b$ (where $H(t-a) = 1$ and $H(t-b) = 0$), and value $0$ if $t \ge b$ (where $H(t-a) = H(t-b) = 1$). Multiplying a function $g$ by this pulse has the effect of leaving $g(t)$ switched off until time $a$. The function is then turned on until time $b$, when it is switched off again. For example, let $g(t) = e^t$. Then
$$[H(t-1) - H(t-2)]e^t = \begin{cases} 0 & \text{if } t < 1 \\ e^t & \text{if } 1 \le t < 2 \\ 0 & \text{if } t \ge 2. \end{cases}$$
Figure 3.14 shows a graph of this function.
Next consider shifted functions of the form $H(t-a)g(t-a)$. If $t < a$, then $H(t-a)g(t-a) = 0$ because $H(t-a) = 0$. If $t \ge a$, then $H(t-a) = 1$ and $H(t-a)g(t-a) = g(t-a)$, which is $g(t)$ shifted $a$ units to the right. Thus the graph of $H(t-a)g(t-a)$ is zero along the horizontal axis until $t = a$, and for $t \ge a$ is the graph of $g(t)$ for $t \ge 0$, shifted $a$ units to the right to begin at $a$ instead of $0$.
FIGURE 3.13  Pulse function $H(t-a) - H(t-b)$.
FIGURE 3.14  Graph of $f(t) = [H(t-1) - H(t-2)]e^t$.
EXAMPLE 3.9

Consider $g(t) = t^2$ and $a = 2$. Figure 3.15 compares the graph of $g$ with the graph of $H(t-2)g(t-2)$. The graph of $g$ is a familiar parabola. The graph of $H(t-2)g(t-2)$ is zero until time $2$, then has the shape of the graph of $t^2$ for $t \ge 0$, but shifted $2$ units to the right to start at $t = 2$.
It is important to understand the difference between $g(t)$, $H(t-a)g(t)$, and $H(t-a)g(t-a)$. Figure 3.16 shows graphs of these three functions for $g(t) = t^2$ and $a = 3$.
3.3.3  The Second Shifting Theorem

Sometimes $H(t-a)f(t-a)$ is referred to as a shifted function, although it is more than that, because its graph is also zero for $t < a$. The second shifting theorem deals with the Laplace transform of such a function.
FIGURE 3.15  Comparison of $y = t^2$ and $y = (t-2)^2 H(t-2)$.
FIGURE 3.16  Comparison of $y = t^2$, $y = t^2 H(t-3)$, and $y = (t-3)^2 H(t-3)$.
THEOREM 3.8  Second Shifting Theorem, or Shifting in the t Variable

Let $\mathcal{L}[f](s) = F(s)$ for $s > b$. Then
$$\mathcal{L}[H(t-a)f(t-a)](s) = e^{-as}F(s)$$
for $s > b$. That is, we obtain the Laplace transform of $H(t-a)f(t-a)$ by multiplying the Laplace transform of $f(t)$ by $e^{-as}$.

Proof  Proceeding from the definition,
$$\mathcal{L}[H(t-a)f(t-a)](s) = \int_0^\infty e^{-st}H(t-a)f(t-a)\,dt = \int_a^\infty e^{-st}f(t-a)\,dt,$$
because $H(t-a) = 0$ for $t < a$ and $H(t-a) = 1$ for $t \ge a$. Now let $w = t - a$ in the last integral to obtain
$$\mathcal{L}[H(t-a)f(t-a)](s) = \int_0^\infty e^{-s(a+w)}f(w)\,dw = e^{-as}\int_0^\infty e^{-sw}f(w)\,dw = e^{-as}F(s).$$
EXAMPLE 3.10

Suppose we want the Laplace transform of $H(t-a)$. Write this as $H(t-a)f(t-a)$ with $f(t) = 1$ for all $t$. Since $F(s) = 1/s$ (from Table 3.1 or by direct computation from the definition), then
$$\mathcal{L}[H(t-a)](s) = e^{-as}\,\mathcal{L}[1](s) = \frac{e^{-as}}{s}.$$
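The transform $\mathcal{L}[H(t-a)] = e^{-as}/s$ is easy to confirm symbolically; the sketch below assumes the sympy library (not part of the text).

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)
a = sp.Symbol('a', positive=True)

# Laplace transform of the shifted Heaviside function H(t - a)
F = sp.laplace_transform(sp.Heaviside(t - a), t, s, noconds=True)
```

The result is exactly $e^{-as}/s$, as the second shifting theorem predicts.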
EXAMPLE 3.11

Compute $\mathcal{L}[g]$, where $g(t) = 0$ for $0 \le t < 2$ and $g(t) = t^2 + 1$ for $t \ge 2$.
Since $g(t)$ is zero until time $t = 2$, and is then $t^2 + 1$, we may write $g(t) = H(t-2)(t^2+1)$. To apply the second shifting theorem, we must write $g(t)$ as a function, or perhaps a sum of functions, of the form $f(t-2)H(t-2)$. This necessitates writing $t^2 + 1$ as a sum of functions of $t - 2$. One way to do this is to expand $t^2 + 1$ in a Taylor series about $2$. In this simple case we can achieve the same result by algebraic manipulation:
$$t^2 + 1 = [(t-2) + 2]^2 + 1 = (t-2)^2 + 4(t-2) + 5.$$
Then
$$g(t) = (t^2+1)H(t-2) = (t-2)^2 H(t-2) + 4(t-2)H(t-2) + 5H(t-2).$$
Now we can apply the second shifting theorem:
$$\mathcal{L}[g] = \mathcal{L}[(t-2)^2 H(t-2)] + 4\,\mathcal{L}[(t-2)H(t-2)] + 5\,\mathcal{L}[H(t-2)] = e^{-2s}\mathcal{L}[t^2] + 4e^{-2s}\mathcal{L}[t] + 5e^{-2s}\mathcal{L}[1] = e^{-2s}\!\left(\frac{2}{s^3} + \frac{4}{s^2} + \frac{5}{s}\right).$$
As usual, any formula for the Laplace transform of a class of functions can also be read as a formula for an inverse Laplace transform. The inverse version of the second shifting theorem is
$$\mathcal{L}^{-1}[e^{-as}F(s)](t) = H(t-a)f(t-a). \qquad (3.6)$$
This enables us to compute the inverse Laplace transform of a known transformed function multiplied by an exponential $e^{-as}$.
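The transform found in Example 3.11 can also be computed straight from the defining integral, since $g$ vanishes on $[0, 2)$. The sketch below (assuming sympy, not part of the text) does exactly that.

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)

# g(t) = H(t-2)(t^2 + 1): the defining integral runs from t = 2 on
G = sp.integrate(sp.exp(-s*t)*(t**2 + 1), (t, 2, sp.oo))

# Example 3.11's answer via the second shifting theorem
claimed = sp.exp(-2*s)*(2/s**3 + 4/s**2 + 5/s)
```

The two computations agree, which is a useful sanity check on the algebraic re-centering of $t^2 + 1$ about $t = 2$.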
EXAMPLE 3.12

Compute
$$\mathcal{L}^{-1}\!\left[\frac{se^{-3s}}{s^2+4}\right].$$
The presence of the exponential factor suggests the use of equation (3.6). Concentrate on finding
$$\mathcal{L}^{-1}\!\left[\frac{s}{s^2+4}\right].$$
This inverse can be read directly from Table 3.1, and is $f(t) = \cos(2t)$. Therefore
$$\mathcal{L}^{-1}\!\left[\frac{se^{-3s}}{s^2+4}\right](t) = H(t-3)\cos(2(t-3)).$$
We are now prepared to solve certain initial value problems involving discontinuous forcing functions.
EXAMPLE 3.13

Solve the initial value problem
$$y'' + 4y = f(t); \quad y(0) = y'(0) = 0,$$
in which
$$f(t) = \begin{cases} 0 & \text{for } t < 3 \\ t & \text{for } t \ge 3. \end{cases}$$
Because of the discontinuity in $f$, methods developed in Chapter 2 do not apply. First recognize that $f(t) = H(t-3)t$.
Apply the Laplace transform to the differential equation to get
$$\mathcal{L}[y'' + 4y] = s^2 Y(s) - sy(0) - y'(0) + 4Y(s) = (s^2+4)Y(s) = \mathcal{L}[H(t-3)t],$$
in which we have inserted the initial conditions $y(0) = y'(0) = 0$. In order to use the second shifting theorem to compute $\mathcal{L}[H(t-3)t]$, write
$$\mathcal{L}[H(t-3)t] = \mathcal{L}[H(t-3)(t-3+3)] = \mathcal{L}[H(t-3)(t-3)] + 3\,\mathcal{L}[H(t-3)] = \frac{1}{s^2}e^{-3s} + \frac{3}{s}e^{-3s}.$$
We now have
$$(s^2+4)Y = \frac{1}{s^2}e^{-3s} + \frac{3}{s}e^{-3s}.$$
The transform of the solution is
$$Y(s) = \frac{3s+1}{s^2(s^2+4)}\,e^{-3s}.$$
The solution is within reach. We must take the inverse Laplace transform of $Y(s)$. To do this, first use a partial fractions decomposition to write
$$\frac{3s+1}{s^2(s^2+4)}\,e^{-3s} = \frac{3}{4}\,\frac{1}{s}\,e^{-3s} - \frac{3}{4}\,\frac{s}{s^2+4}\,e^{-3s} + \frac{1}{4}\,\frac{1}{s^2}\,e^{-3s} - \frac{1}{4}\,\frac{1}{s^2+4}\,e^{-3s}.$$
Each term is an exponential times a function whose Laplace transform we know, and we can apply equation (3.6) to write
$$y(t) = \frac{3}{4}H(t-3) - \frac{3}{4}H(t-3)\cos(2(t-3)) + \frac{1}{4}H(t-3)(t-3) - \frac{1}{4}\cdot\frac{1}{2}H(t-3)\sin(2(t-3)).$$
Because of the $H(t-3)$ factor in each term, this solution is zero until time $t = 3$, and we may write
$$y(t) = \begin{cases} 0 & \text{for } t < 3 \\ \dfrac{3}{4} - \dfrac{3}{4}\cos(2(t-3)) + \dfrac{1}{4}(t-3) - \dfrac{1}{8}\sin(2(t-3)) & \text{for } t \ge 3, \end{cases}$$
or, upon combining terms,
$$y(t) = \begin{cases} 0 & \text{for } t < 3 \\ \dfrac{1}{8}\left[2t - 6\cos(2(t-3)) - \sin(2(t-3))\right] & \text{for } t \ge 3. \end{cases}$$
A graph of this solution is shown in Figure 3.17. In this example, it is interesting to observe that the solution is differentiable everywhere, even though the function $f$ occurring in the differential equation has a jump discontinuity at $3$. This behavior is typical of initial value problems having a discontinuous forcing function. If the differential equation has order $n$ and $\varphi$ is a solution, then $\varphi$ and its first $n-1$ derivatives will be continuous, while the $n$th derivative will have a jump discontinuity wherever $f$ does, and these jump discontinuities will agree in magnitude with the corresponding jump discontinuities of $f$.
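The closed form found in Example 3.13 is easy to verify directly: the $t \ge 3$ branch must satisfy $y'' + 4y = t$ and must match the zero state at the switching time. A sketch (assuming sympy, not part of the text):

```python
import sympy as sp

t = sp.Symbol('t')

# Branch of the solution valid for t >= 3
y = sp.Rational(1, 8)*(2*t - 6*sp.cos(2*(t - 3)) - sp.sin(2*(t - 3)))

residual = sp.simplify(sp.diff(y, t, 2) + 4*y - t)   # y'' + 4y - t: should vanish
y_at_3 = y.subs(t, 3)                                # continuity at t = 3
yp_at_3 = sp.diff(y, t).subs(t, 3)                   # differentiability at t = 3
```

Both the value and the derivative vanish at $t = 3$, so the two branches join smoothly, as the discussion above asserts.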
FIGURE 3.17  Solution of $y'' + 4y = f(t)$, $y(0) = y'(0) = 0$, with $f(t) = 0$ for $0 \le t < 3$ and $f(t) = t$ for $t \ge 3$.
Often we need to write a function having several jump discontinuities in terms of Heaviside functions in order to use the shifting theorems. Here is an example.

EXAMPLE 3.14

Let
$$f(t) = \begin{cases} 0 & \text{if } t < 2 \\ t-1 & \text{if } 2 \le t < 3 \\ -4 & \text{if } t \ge 3. \end{cases}$$
A graph of $f$ is shown in Figure 3.18. There are jump discontinuities of magnitude $1$ at $t = 2$ and magnitude $6$ at $t = 3$. Think of $f(t)$ as consisting of two nonzero parts: the part that is $t-1$ on $[2, 3)$ and the part that is $-4$ on $[3, \infty)$. We want to turn on $t-1$ at time $2$ and turn it off at time $3$, then turn on $-4$ at time $3$ and leave it on. The first effect is achieved by multiplying the pulse $H(t-2) - H(t-3)$ by $t-1$. The second is achieved by multiplying $H(t-3)$ by $-4$. Therefore
$$f(t) = [H(t-2) - H(t-3)](t-1) - 4H(t-3).$$
As a check: this gives $f(t) = 0$ if $t < 2$, because all of the shifted Heaviside functions are zero for $t < 2$. For $2 \le t < 3$, $H(t-2) = 1$ but $H(t-3) = 0$, so $f(t) = t-1$. And for $t \ge 3$, $H(t-2) = H(t-3) = 1$, so $f(t) = -4$.
3.3.4  Analysis of Electrical Circuits

The Heaviside function is important in many kinds of problems, including the analysis of electrical circuits, where we anticipate turning switches on and off. Here are two examples.

EXAMPLE 3.15

Suppose the capacitor in the circuit of Figure 3.19 initially has zero charge and that there is no initial current. At time $t = 2$ seconds, the switch is thrown from position B to A, held there for 1 second, then switched back to B. We want the output voltage $E_{\text{out}}$ on the capacitor.
FIGURE 3.18  Graph of $f(t) = 0$ if $t < 2$, $f(t) = t-1$ if $2 \le t < 3$, $f(t) = -4$ if $t \ge 3$.
FIGURE 3.19  An RC circuit with $R = 250{,}000\ \Omega$, $C = 10^{-6}$ F, a 10-volt source, and a switch between positions A and B; the output is taken across the capacitor.
From the circuit diagram, the forcing function is zero until $t = 2$, then has value 10 volts until $t = 3$, and then is zero again. Thus $E$ is the pulse function
$$E(t) = 10\,[H(t-2) - H(t-3)].$$
By Kirchhoff's voltage law,
$$Ri(t) + \frac{1}{C}q(t) = E(t),$$
or
$$250{,}000\,q'(t) + 10^6 q(t) = E(t).$$
We want to solve for $q$ subject to the initial condition $q(0) = 0$. Apply the Laplace transform to the differential equation, incorporating the initial condition, to write
$$250{,}000\,[sQ(s) - q(0)] + 10^6 Q(s) = 250{,}000\,sQ(s) + 10^6 Q(s) = \mathcal{L}[E](s).$$
Now
$$\mathcal{L}[E](s) = 10\,\mathcal{L}[H(t-2)](s) - 10\,\mathcal{L}[H(t-3)](s) = \frac{10}{s}e^{-2s} - \frac{10}{s}e^{-3s}.$$
We now have the following equation for $Q$:
$$2.5\times 10^5\,sQ(s) + 10^6 Q(s) = \frac{10}{s}e^{-2s} - \frac{10}{s}e^{-3s},$$
or
$$Q(s) = 4\times 10^{-5}\,\frac{1}{s(s+4)}\,e^{-2s} - 4\times 10^{-5}\,\frac{1}{s(s+4)}\,e^{-3s}.$$
Use a partial fractions decomposition to write
$$Q(s) = 10^{-5}\left[\frac{1}{s}e^{-2s} - \frac{1}{s+4}e^{-2s}\right] - 10^{-5}\left[\frac{1}{s}e^{-3s} - \frac{1}{s+4}e^{-3s}\right].$$
By the second shifting theorem,
$$\mathcal{L}^{-1}\!\left[\frac{1}{s}e^{-2s}\right](t) = H(t-2)$$
and
$$\mathcal{L}^{-1}\!\left[\frac{1}{s+4}e^{-2s}\right] = H(t-2)f(t-2),$$
where $f(t) = \mathcal{L}^{-1}[1/(s+4)] = e^{-4t}$. Thus
$$\mathcal{L}^{-1}\!\left[\frac{1}{s+4}e^{-2s}\right] = H(t-2)e^{-4(t-2)}.$$
The other two terms in $Q(s)$ are treated similarly, and we obtain
$$q(t) = 10^{-5}\left[H(t-2) - H(t-2)e^{-4(t-2)}\right] - 10^{-5}\left[H(t-3) - H(t-3)e^{-4(t-3)}\right] = 10^{-5}H(t-2)\left[1 - e^{-4(t-2)}\right] - 10^{-5}H(t-3)\left[1 - e^{-4(t-3)}\right].$$
Finally, since the output voltage is $E_{\text{out}}(t) = 10^6 q(t)$,
$$E_{\text{out}}(t) = 10\,H(t-2)\left[1 - e^{-4(t-2)}\right] - 10\,H(t-3)\left[1 - e^{-4(t-3)}\right].$$
The input and output voltages are graphed in Figures 3.20 and 3.21.

FIGURE 3.20  Input voltage for the circuit of Figure 3.19.
FIGURE 3.21  Output voltage for the circuit of Figure 3.19.
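The charge found in Example 3.15 can be verified branch by branch. On the pulse interval $2 \le t < 3$, where $E(t) = 10$, the branch $q(t) = 10^{-5}[1 - e^{-4(t-2)}]$ must satisfy the circuit equation and be zero when the pulse starts. A sketch (assuming sympy, not part of the text):

```python
import sympy as sp

t = sp.Symbol('t')

# Charge on the pulse interval 2 <= t < 3, where E(t) = 10
q = sp.Rational(1, 100000)*(1 - sp.exp(-4*(t - 2)))

# Circuit equation: 250,000 q' + 10^6 q = 10
residual = sp.simplify(250000*sp.diff(q, t) + 10**6*q - 10)
q_at_2 = q.subs(t, 2)   # charge is continuous: zero when the pulse starts
```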
EXAMPLE 3.16

The circuit of Figure 3.22 has the roles of resistor and capacitor interchanged from the circuit of the preceding example. We want to know the output voltage $i(t)R$ at any time. The differential equation of the preceding example applies to this circuit, but now we are interested in the current. Since $i = q'$, then
$$2.5\times 10^5\,i(t) + 10^6 q(t) = E(t); \quad i(0) = q(0) = 0.$$

FIGURE 3.22  The circuit of Figure 3.19 with resistor and capacitor interchanged ($C = 10^{-6}$ F, $R = 2.5\times 10^5\ \Omega$, 10-volt source); the output is taken across the resistor.
FIGURE 3.23  Input voltage for the circuit of Figure 3.22.
FIGURE 3.24  Output voltage for the circuit of Figure 3.22.
The strategy of eliminating $q$ by differentiating and using $i = q'$ does not apply here, because $E(t)$ is not differentiable. To eliminate $q(t)$ in the present case, write
$$q(t) = \int_0^t i(\tau)\,d\tau + q(0) = \int_0^t i(\tau)\,d\tau.$$
We now have the following problem to solve for the current:
$$2.5\times 10^5\,i(t) + 10^6 \int_0^t i(\tau)\,d\tau = E(t); \quad i(0) = 0.$$
This is not a differential equation. Nevertheless, we have the means to solve it. Take the Laplace transform of the equation, using equation (3.4), to obtain
$$2.5\times 10^5\,I(s) + 10^6\,\frac{1}{s}I(s) = \mathcal{L}[E](s) = \frac{10}{s}e^{-2s} - \frac{10}{s}e^{-3s}.$$
Here $I = \mathcal{L}[i]$. Solve for $I(s)$ to get
$$I(s) = 4\times 10^{-5}\,\frac{1}{s+4}e^{-2s} - 4\times 10^{-5}\,\frac{1}{s+4}e^{-3s}.$$
Take the inverse Laplace transform to obtain
$$i(t) = 4\times 10^{-5}\,H(t-2)e^{-4(t-2)} - 4\times 10^{-5}\,H(t-3)e^{-4(t-3)}.$$
The input and output voltages are graphed in Figures 3.23 and 3.24.
SECTION 3.3  PROBLEMS

In each of Problems 1 through 15, find the Laplace transform of the function.

1. $t^3 - 3t + 2e^{-2t}$
2. $e^{-3t}(t - 2)$
3. $f(t) = \begin{cases} 1 & \text{for } 0 \le t < 7 \\ \cos(t) & \text{for } t \ge 7 \end{cases}$
4. $e^{4t}[t - \cos(t)]$
5. $f(t) = \begin{cases} t & \text{for } 0 \le t < 3 \\ 1 - 3t & \text{for } t \ge 3 \end{cases}$
6. $f(t) = \begin{cases} 2t - \sin(t) & \text{for } 0 \le t < \pi \\ 0 & \text{for } t \ge \pi \end{cases}$
7. $e^{-t}[1 - t^2 + \sin(t)]$
8. $f(t) = \begin{cases} t^2 & \text{for } 0 \le t < 2 \\ 1 - t - 3t^2 & \text{for } t \ge 2 \end{cases}$
9. $f(t) = \begin{cases} \cos(t) & \text{for } 0 \le t < 2\pi \\ 2 - \sin(t) & \text{for } t \ge 2\pi \end{cases}$
10. $f(t) = \begin{cases} -4 & \text{for } 0 \le t < 1 \\ 0 & \text{for } 1 \le t < 3 \\ e^{-t} & \text{for } t \ge 3 \end{cases}$
11. $te^{-2t}\cos(3t)$
12. $e^{t}[1 - \cosh(t)]$
13. $f(t) = \begin{cases} t - 2 & \text{for } 0 \le t < 16 \\ -1 & \text{for } t \ge 16 \end{cases}$
14. $f(t) = \begin{cases} 1 - \cos(2t) & \text{for } 0 \le t < 3\pi \\ 0 & \text{for } t \ge 3\pi \end{cases}$
15. $e^{-5t}(t^4 + 2t^2 + t)$

In each of Problems 16 through 25, find the inverse Laplace transform of the function.

16. $\dfrac{1}{s^2+4s+12}$
17. $\dfrac{1}{s^2-4s+5}$
18. $\dfrac{1}{s^3}\,e^{-5s}$
19. $\dfrac{se^{-2s}}{s^2+9}$
20. $\dfrac{3}{s+2}\,e^{-4s}$
21. $\dfrac{1}{s^2+6s+7}$
22. $\dfrac{s-4}{s^2-8s+10}$
23. $\dfrac{s+2}{s^2+6s+1}$
24. $\dfrac{1}{(s-5)^3}\,e^{-s}$
25. $\dfrac{1}{s(s^2+16)}\,e^{-21s}$

26. Determine $\mathcal{L}\!\left[e^{-2t}\displaystyle\int_0^t e^{2w}\cos(3w)\,dw\right]$. Hint: use the first shifting theorem.

In each of Problems 27 through 32, solve the initial value problem by using the Laplace transform.

27. $y'' + 4y = f(t)$; $y(0) = 1$, $y'(0) = 0$, with $f(t) = \begin{cases} 0 & \text{for } 0 \le t < 4 \\ 3 & \text{for } t \ge 4 \end{cases}$
28. $y'' - 2y' - 3y = f(t)$; $y(0) = 1$, $y'(0) = 0$, with $f(t) = \begin{cases} 0 & \text{for } 0 \le t < 4 \\ 12 & \text{for } t \ge 4 \end{cases}$
29. $y^{(3)} - 8y = g(t)$; $y(0) = y'(0) = y''(0) = 0$, with $g(t) = \begin{cases} 0 & \text{for } 0 \le t < 6 \\ 2 & \text{for } t \ge 6 \end{cases}$
30. $y'' + 5y' + 6y = f(t)$; $y(0) = y'(0) = 0$, with $f(t) = \begin{cases} -2 & \text{for } 0 \le t < 3 \\ 0 & \text{for } t \ge 3 \end{cases}$
31. $y^{(3)} - y'' + 4y' - 4y = f(t)$; $y(0) = y'(0) = 0$, $y''(0) = 1$, with $f(t) = \begin{cases} 1 & \text{for } 0 \le t < 5 \\ 2 & \text{for } t \ge 5 \end{cases}$
32. $y'' - 4y' + 4y = f(t)$; $y(0) = -2$, $y'(0) = 1$, with $f(t) = \begin{cases} t & \text{for } 0 \le t < 3 \\ t + 2 & \text{for } t \ge 3 \end{cases}$

33. Calculate and graph the output voltage in the circuit of Figure 3.19, assuming that at time zero the capacitor is charged to a potential of 5 volts and the switch is opened at 0 and closed 5 seconds later.

34. Calculate and graph the output voltage in the RL circuit of Figure 3.25 if the current is initially zero and
$$E(t) = \begin{cases} 0 & \text{for } 0 \le t < 5 \\ 2 & \text{for } t \ge 5. \end{cases}$$

FIGURE 3.25  An RL circuit with source $E(t)$.

35. Solve for the current in the RL circuit of Problem 34 if the current is initially zero and
$$E(t) = \begin{cases} k & \text{for } 0 \le t < 5 \\ 0 & \text{for } t \ge 5. \end{cases}$$

36. Solve for the current in the RL circuit of Problem 34 if the initial current is zero and
$$E(t) = \begin{cases} 0 & \text{for } 0 \le t < 4 \\ Ae^{-t} & \text{for } t \ge 4. \end{cases}$$

37. Write the function graphed in Figure 3.26 in terms of the Heaviside function and find its Laplace transform.

FIGURE 3.26  Graph for Problem 37 (height $K$ between $a$ and $b$).

38. Write the function graphed in Figure 3.27 in terms of the Heaviside function and find its Laplace transform.

FIGURE 3.27  Graph for Problem 38 (values $M$ and $N$ on the intervals determined by $a$, $b$, $c$).

39. Write the function graphed in Figure 3.28 in terms of the Heaviside function and find its Laplace transform.

FIGURE 3.28  Graph for Problem 39 (height $h$ on the interval determined by $a$, $b$, $c$).

40. Solve for the current in the RL circuit of Figure 3.29 if the initial current is zero, $E(t)$ has period 4, and
$$E(t) = \begin{cases} 10 & \text{for } 0 \le t < 2 \\ 0 & \text{for } 2 \le t < 4. \end{cases}$$

FIGURE 3.29  An RL circuit driven by a periodic voltage $E(t)$.

Hint: See Problem 22 of Section 3.1 for the Laplace transform of a periodic function. You should find that $I(s) = F(s)/(1 + e^{-2s})$ for some $F(s)$. Use a geometric series to write
$$\frac{1}{1+e^{-2s}} = \sum_{n=0}^{\infty} (-1)^n e^{-2ns}$$
to write $I(s)$ as an infinite series, then take the inverse transform term by term by using a shifting theorem. Graph the current for $0 \le t < 8$.
3.4  Convolution

In general the Laplace transform of the product of two functions is not the product of their transforms. There is, however, a special kind of product, denoted $f * g$, called the convolution of $f$ with $g$. Convolution has the feature that the transform of $f * g$ is the product of the transforms of $f$ and $g$. This fact is called the convolution theorem.

DEFINITION 3.6  Convolution

If $f$ and $g$ are defined on $[0, \infty)$, then the convolution $f * g$ of $f$ with $g$ is the function defined by
$$(f * g)(t) = \int_0^t f(t-\tau)g(\tau)\,d\tau$$
for $t \ge 0$.
THEOREM 3.9  Convolution Theorem

If $f * g$ is defined, then
$$\mathcal{L}[f * g] = \mathcal{L}[f]\,\mathcal{L}[g].$$

Proof  Let $F = \mathcal{L}[f]$ and $G = \mathcal{L}[g]$. Then
$$F(s)G(s) = F(s)\int_0^\infty e^{-st}g(t)\,dt = \int_0^\infty F(s)e^{-s\tau}g(\tau)\,d\tau,$$
in which we changed the variable of integration to $\tau$ and brought $F(s)$ within the integral. Now recall that
$$e^{-s\tau}F(s) = \mathcal{L}[H(t-\tau)f(t-\tau)](s).$$
Substitute this into the integral for $F(s)G(s)$ to get
$$F(s)G(s) = \int_0^\infty \mathcal{L}[H(t-\tau)f(t-\tau)](s)\,g(\tau)\,d\tau. \qquad (3.7)$$
But, from the definition of the Laplace transform,
$$\mathcal{L}[H(t-\tau)f(t-\tau)](s) = \int_0^\infty e^{-st}H(t-\tau)f(t-\tau)\,dt.$$
Substitute this into equation (3.7) to get
$$F(s)G(s) = \int_0^\infty\!\!\int_0^\infty e^{-st}H(t-\tau)f(t-\tau)\,dt\;g(\tau)\,d\tau = \int_0^\infty\!\!\int_0^\infty e^{-st}g(\tau)H(t-\tau)f(t-\tau)\,dt\,d\tau.$$
Now recall that $H(t-\tau) = 0$ if $0 \le t < \tau$, while $H(t-\tau) = 1$ if $t \ge \tau$. Therefore,
$$F(s)G(s) = \int_0^\infty\!\!\int_\tau^\infty e^{-st}g(\tau)f(t-\tau)\,dt\,d\tau.$$
Figure 3.30 shows the $t\tau$ plane. The last integration is over the shaded region, consisting of points $(t, \tau)$ satisfying $0 \le \tau \le t < \infty$. Reverse the order of integration to write
$$F(s)G(s) = \int_0^\infty\!\!\int_0^t e^{-st}g(\tau)f(t-\tau)\,d\tau\,dt = \int_0^\infty e^{-st}\int_0^t g(\tau)f(t-\tau)\,d\tau\,dt = \int_0^\infty e^{-st}(f * g)(t)\,dt = \mathcal{L}[f * g](s).$$

FIGURE 3.30  The region $0 \le \tau \le t$ in the $t\tau$ plane.

Therefore $F(s)G(s) = \mathcal{L}[f * g](s)$, as we wanted to show.

The inverse version of the convolution theorem is useful when we want to find the inverse transform of a function that is a product, and we know the inverse transform of each factor.

THEOREM 3.10

Let $\mathcal{L}^{-1}[F] = f$ and $\mathcal{L}^{-1}[G] = g$. Then
$$\mathcal{L}^{-1}[FG] = f * g.$$
EXAMPLE 3.17

Compute
$$\mathcal{L}^{-1}\!\left[\frac{1}{s(s-4)^2}\right].$$
We can do this several ways (a table, a program, a partial fractions decomposition). But we can also write
$$\frac{1}{s(s-4)^2} = \frac{1}{s}\cdot\frac{1}{(s-4)^2} = F(s)G(s).$$
Now
$$\mathcal{L}^{-1}\!\left[\frac{1}{s}\right] = 1 = f(t) \quad\text{and}\quad \mathcal{L}^{-1}\!\left[\frac{1}{(s-4)^2}\right] = te^{4t} = g(t).$$
Therefore,
$$\mathcal{L}^{-1}\!\left[\frac{1}{s(s-4)^2}\right] = (f * g)(t) = 1 * te^{4t} = \int_0^t \tau e^{4\tau}\,d\tau = \frac{1}{4}te^{4t} - \frac{1}{16}e^{4t} + \frac{1}{16}.$$
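The convolution theorem can be checked on Example 3.17 by transforming the convolution back. A sketch (assuming sympy, not part of the text):

```python
import sympy as sp

t, s, tau = sp.symbols('t s tau', positive=True)

# The convolution (1 * t e^{4t})(t) computed from the definition
conv = sp.integrate(tau*sp.exp(4*tau), (tau, 0, t))

# Its transform should be the product 1/s * 1/(s-4)^2
C = sp.laplace_transform(conv, t, s, noconds=True)
```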
The convolution operation is commutative.
THEOREM 3.11

If $f * g$ is defined, so is $g * f$, and
$$f * g = g * f.$$

Proof  Let $z = t - \tau$ in the integral defining the convolution to get
$$(f * g)(t) = \int_0^t f(t-\tau)g(\tau)\,d\tau = \int_t^0 f(z)g(t-z)(-1)\,dz = \int_0^t f(z)g(t-z)\,dz = (g * f)(t).$$

Commutativity can have practical importance, since the integral defining $g * f$ may be easier to evaluate than the integral defining $f * g$ in specific cases. Convolution can sometimes enable us to write solutions of problems that are stated in very general terms.
EXAMPLE 3.18

We will solve the problem
$$y'' - 2y' - 8y = f(t); \quad y(0) = 1,\; y'(0) = 0.$$
Apply the Laplace transform, inserting the initial values, to obtain
$$\mathcal{L}[y'' - 2y' - 8y](s) = s^2 Y(s) - s - 2[sY(s) - 1] - 8Y(s) = \mathcal{L}[f](s) = F(s).$$
Then
$$(s^2 - 2s - 8)Y(s) - s + 2 = F(s),$$
so
$$Y(s) = \frac{1}{s^2-2s-8}\,F(s) + \frac{s-2}{s^2-2s-8}.$$
Use a partial fractions decomposition to write
$$Y(s) = \frac{1}{6}\,\frac{1}{s-4}\,F(s) - \frac{1}{6}\,\frac{1}{s+2}\,F(s) + \frac{1}{3}\,\frac{1}{s-4} + \frac{2}{3}\,\frac{1}{s+2}.$$
Then
$$y(t) = \frac{1}{6}\,e^{4t} * f(t) - \frac{1}{6}\,e^{-2t} * f(t) + \frac{1}{3}e^{4t} + \frac{2}{3}e^{-2t}.$$
This is the solution, for any function $f$ having a convolution with $e^{4t}$ and $e^{-2t}$.
Convolution is also used to solve certain kinds of integral equations, in which the function to be determined occurs in an integral. We saw an example of this in solving for the current in Example 3.16.
EXAMPLE 3.19

Determine $f$ such that
$$f(t) = 2t^2 + \int_0^t f(t-\tau)e^{-\tau}\,d\tau.$$
Recognize the integral on the right as the convolution of $f$ with $e^{-t}$. Thus the equation has the form
$$f(t) = 2t^2 + (f * e^{-t})(t).$$
Taking the Laplace transform of this equation yields
$$F(s) = \frac{4}{s^3} + F(s)\,\frac{1}{s+1}.$$
Then
$$F(s) = \frac{4}{s^3} + \frac{4}{s^4},$$
and from this we easily invert to obtain
$$f(t) = 2t^2 + \frac{2}{3}t^3.$$
SECTION 3.4  PROBLEMS

In each of Problems 1 through 8, use the convolution theorem to compute the inverse Laplace transform of the function (even if another method would work). Wherever they occur, $a$ and $b$ are positive constants.

1. $\dfrac{1}{(s^2+4)(s^2-4)}$
2. $\dfrac{1}{s^2+16}\,e^{-2s}$
3. $\dfrac{s}{(s^2+a^2)(s^2+b^2)}$
4. $\dfrac{s^2}{(s-3)(s^2+5)}$
5. $\dfrac{1}{s(s^2+a^2)^2}$
6. $\dfrac{1}{s^4(s-5)}$
7. $\dfrac{1}{s(s+2)}\,e^{-4s}$
8. $\dfrac{2}{s^3(s^2+5)}$

In each of Problems 9 through 16, use the convolution theorem to write a formula for the solution of the initial value problem in terms of $f(t)$.

9. $y'' - 5y' + 6y = f(t)$; $y(0) = y'(0) = 0$
10. $y'' + 10y' + 24y = f(t)$; $y(0) = 1$, $y'(0) = 0$
11. $y'' - 8y' + 12y = f(t)$; $y(0) = -3$, $y'(0) = 2$
12. $y'' - 4y' - 5y = f(t)$; $y(0) = 2$, $y'(0) = 1$
13. $y'' + 9y = f(t)$; $y(0) = -1$, $y'(0) = 1$
14. $y'' - k^2 y = f(t)$; $y(0) = 2$, $y'(0) = -4$
15. $y^{(3)} - y'' - 4y' + 4y = f(t)$; $y(0) = y'(0) = 1$, $y''(0) = 0$
16. $y^{(4)} - 11y'' + 18y = f(t)$; $y(0) = y'(0) = y''(0) = y^{(3)}(0) = 0$

In each of Problems 17 through 23, solve the integral equation.

17. $f(t) = -1 + \displaystyle\int_0^t f(t-\tau)e^{-3\tau}\,d\tau$
18. $f(t) = -t + \displaystyle\int_0^t f(t-\tau)\sin(\tau)\,d\tau$
19. $f(t) = e^{-t} + \displaystyle\int_0^t f(t-\tau)\,d\tau$
20. $f(t) = -1 + t - 2\displaystyle\int_0^t f(t-\tau)\sin(\tau)\,d\tau$
21. $f(t) = 3 + \displaystyle\int_0^t f(\tau)\cos(2(t-\tau))\,d\tau$
22. $f(t) = \cos(t) + e^{-2t}\displaystyle\int_0^t f(\tau)e^{2\tau}\,d\tau$
23. $f(t) = e^{-3t}\left[e^{t} - 3\displaystyle\int_0^t f(\tau)e^{3\tau}\,d\tau\right]$

24. Use the convolution theorem to derive the formula $\mathcal{L}\!\left[\displaystyle\int_0^t f(w)\,dw\right](s) = \dfrac{1}{s}F(s)$. What assumptions are needed about $f(t)$?

25. Show by example that in general $f * 1 \ne f$, where $1$ denotes the function that is identically $1$ for all $t$. Hint: consider $f(t) = \cos(t)$.

26. Use the convolution theorem to determine the Laplace transform of $e^{-2t}\displaystyle\int_0^t e^{2w}\cos(3w)\,dw$.

27. Use the convolution theorem to show that
$$\mathcal{L}^{-1}\!\left[\frac{1}{s^2}F(s)\right](t) = \int_0^t\!\!\int_0^w f(\alpha)\,d\alpha\,dw.$$
3.5  Unit Impulses and the Dirac Delta Function

Sometimes we encounter the concept of an impulse, which may be intuitively understood as a force of large magnitude applied over an instant of time. We can model an impulse as follows. For any positive number $\epsilon$, consider the pulse $\delta_\epsilon$ defined by
$$\delta_\epsilon(t) = \frac{1}{\epsilon}\,[H(t) - H(t-\epsilon)].$$
As shown in Figure 3.31, this is a pulse of magnitude $1/\epsilon$ and duration $\epsilon$. By letting $\epsilon$ approach zero, we obtain pulses of increasing magnitude over shorter time intervals. Dirac's delta function is thought of as a pulse of "infinite magnitude" over an "infinitely short" duration, and is defined to be
$$\delta(t) = \lim_{\epsilon\to 0+}\delta_\epsilon(t).$$
This is not really a function in the conventional sense, but a more general object called a distribution. Nevertheless, for historical reasons it continues to be referred to as the delta function. It is also named for the Nobel laureate physicist P. A. M. Dirac. The shifted delta function $\delta(t-a)$ is zero except for $t = a$, where it has its infinite spike.
We can define the Laplace transform of the delta function as follows. Begin with
$$\delta_\epsilon(t-a) = \frac{1}{\epsilon}\,[H(t-a) - H(t-a-\epsilon)].$$

FIGURE 3.31  Graph of $\delta_\epsilon(t-a)$.
Then
$$\mathcal{L}[\delta_\epsilon(t-a)] = \frac{1}{\epsilon}\left[\frac{1}{s}e^{-as} - \frac{1}{s}e^{-(a+\epsilon)s}\right] = e^{-as}\,\frac{1 - e^{-\epsilon s}}{\epsilon s}.$$
This suggests that we define
$$\mathcal{L}[\delta(t-a)] = \lim_{\epsilon\to 0+} e^{-as}\,\frac{1 - e^{-\epsilon s}}{\epsilon s} = e^{-as}.$$
In particular, upon choosing $a = 0$, we have
$$\mathcal{L}[\delta(t)] = 1.$$
Thus we think of the delta function as having constant Laplace transform equal to $1$.
The following result is called the filtering property of the delta function. If at time $a$ a signal (function) is hit with an impulse, by multiplying it by $\delta(t-a)$, and the resulting signal is summed over all positive time by integrating from zero to infinity, then we obtain exactly the signal value $f(a)$.
Filtering Property
Let a > 0 and let f be integrable on 0 and continuous at a. Then
ftt − a dt = fa
0
Proof
First calculate
0
ft t − a dt =
0
1
Ht − a − Ht − a − ft dt
1 a+ = ft dt a
By the mean value theorem for integrals, there is some t between a and a + such that
a+
a
ft dt = ft
Then
0
ft t − a dt = ft
As → 0+, a + → a, so t → a and, by continuity, ft → fa. Then lim
→0+ 0
ft t − a dt = =
as we wanted to show.
0
ft lim t − a dt →0+
0
ftt − a dt = lim ft = fa →0+
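The filtering property can be seen in action with a computer algebra system, which integrates against the delta distribution symbolically. A sketch (assuming sympy, not part of the text), picking out $\cos(\pi/3)$:

```python
import sympy as sp

t = sp.Symbol('t', positive=True)

# Filtering: the integral of f(t) delta(t - a) over [0, oo) picks out f(a)
val = sp.integrate(sp.cos(t)*sp.DiracDelta(t - sp.pi/3), (t, 0, sp.oo))
```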
If we apply the filtering property to $f(t) = e^{-st}$, we get
$$\int_0^\infty e^{-st}\delta(t-a)\,dt = e^{-as},$$
consistent with the definition of the Laplace transform of the delta function. Further, if we change notation in the filtering property and write it as
$$\int_0^\infty f(\tau)\delta(t-\tau)\,d\tau = f(t),$$
then we can recognize the convolution of $f$ with $\delta$ and read the last equation as
$$f * \delta = f.$$
The delta function therefore acts as an identity for the "product" defined by the convolution of two functions. Here is an example of an initial value problem involving the delta function.
EXAMPLE 3.20

Solve
$$y'' + 2y' + 2y = \delta(t-3); \quad y(0) = y'(0) = 0.$$
Apply the Laplace transform to the differential equation to get
$$s^2 Y(s) + 2sY(s) + 2Y(s) = e^{-3s},$$
hence
$$Y(s) = \frac{e^{-3s}}{s^2+2s+2}.$$
To find the inverse transform of the function on the right, first write
$$Y(s) = \frac{1}{(s+1)^2+1}\,e^{-3s}.$$
Now use both shifting theorems. Because $\mathcal{L}^{-1}[1/(s^2+1)] = \sin(t)$, a shift in the $s$-variable gives us
$$\mathcal{L}^{-1}\!\left[\frac{1}{(s+1)^2+1}\right] = e^{-t}\sin(t).$$
Now shift in the $t$-variable to obtain
$$y(t) = H(t-3)e^{-(t-3)}\sin(t-3).$$
A graph of this solution is shown in Figure 3.32. The solution is differentiable for $t > 0$, except that $y'(t)$ has a jump discontinuity of magnitude $1$ at $t = 3$. The magnitude of the jump is the coefficient of $\delta(t-3)$ in the differential equation.
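The solution of Example 3.20 can be checked branch by branch: for $t > 3$ the impulse has passed, so $y$ must satisfy the homogeneous equation, be continuous at $t = 3$, and carry a unit jump in $y'$ there. A sketch (assuming sympy, not part of the text):

```python
import sympy as sp

t = sp.Symbol('t')

# Branch of the solution for t > 3
y = sp.exp(-(t - 3))*sp.sin(t - 3)

residual = sp.simplify(sp.diff(y, t, 2) + 2*sp.diff(y, t) + 2*y)  # homogeneous for t > 3
y_at_3 = y.subs(t, 3)            # solution is continuous at the impulse
jump = sp.diff(y, t).subs(t, 3)  # y' jumps from 0 to 1, the impulse strength
```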
FIGURE 3.32  Graph of $y(t) = 0$ for $0 \le t < 3$ and $y(t) = e^{-(t-3)}\sin(t-3)$ for $t \ge 3$.
The delta function may be used to study the behavior of a circuit that has been subjected to transients. These are generated during switching, and the high input voltages associated with them can create excessive current in the components, damaging the circuit. Transients can also be harmful because they contain a broad spectrum of frequencies. Introducing a transient into a circuit can therefore have the effect of forcing the circuit with a range of frequencies. If one of these is near the natural frequency of the system, resonance may occur, resulting in oscillations large enough to damage the system. For this reason, before a circuit is built, engineers sometimes use a delta function to model a transient and study its effect on the circuit.
EXAMPLE 3.21

Suppose, in the circuit of Figure 3.33, the current and charge on the capacitor are zero at time zero. We want to determine the output voltage response to a transient modeled by $\delta(t)$.
The output voltage is $q(t)/C$, so we will determine $q(t)$. By Kirchhoff's voltage law,
$$Li' + Ri + \frac{1}{C}q = i' + 10i + 100q = \delta(t).$$
Since $i = q'$,
$$q'' + 10q' + 100q = \delta(t).$$
We assume initial conditions $q(0) = q'(0) = 0$. Apply the Laplace transform to the differential equation and use the initial conditions to obtain
$$s^2 Q(s) + 10sQ(s) + 100Q(s) = 1.$$

FIGURE 3.33  An LRC circuit with $L = 1$ H, $R = 10\ \Omega$, $C = 0.01$ F, and input $E_{\text{in}}(t) = \delta(t)$.
Then
$$Q(s) = \frac{1}{s^2+10s+100}.$$
In order to invert this by using a shifting theorem, complete the square to write
$$Q(s) = \frac{1}{(s+5)^2+75}.$$
Since
$$\mathcal{L}^{-1}\!\left[\frac{1}{s^2+75}\right] = \frac{1}{5\sqrt{3}}\sin(5\sqrt{3}\,t),$$
then
$$q(t) = \mathcal{L}^{-1}\!\left[\frac{1}{(s+5)^2+75}\right] = \frac{1}{5\sqrt{3}}\,e^{-5t}\sin(5\sqrt{3}\,t).$$
The output voltage is
$$\frac{1}{C}q(t) = 100q(t) = \frac{20}{\sqrt{3}}\,e^{-5t}\sin(5\sqrt{3}\,t).$$
A graph of this output is shown in Figure 3.34. The circuit output displays damped oscillations at its natural frequency, even though it was not explicitly forced by oscillations of this frequency. If we wish, we can obtain the current by $i(t) = q'(t)$.

FIGURE 3.34  Output of the circuit of Figure 3.33.
SECTION 3.5  PROBLEMS

In each of Problems 1 through 5, solve the initial value problem and graph the solution.

1. $y'' + 5y' + 6y = 3\delta(t-2) - 4\delta(t-5)$; $y(0) = y'(0) = 0$
2. $y'' - 4y' + 13y = 4\delta(t-3)$; $y(0) = y'(0) = 0$
3. $y^{(3)} + 4y'' + 5y' + 2y = 6\delta(t)$; $y(0) = y'(0) = y''(0) = 0$
4. $y'' + 16y = 12\delta(t - 5\pi/8)$; $y(0) = 3$, $y'(0) = 0$
5. $y'' + 5y' + 6y = B\delta(t)$; $y(0) = 3$, $y'(0) = 0$. Call the solution $\varphi$. What are $\varphi(0)$ and $\varphi'(0)$? Using this information, what physical phenomenon does the Dirac delta function model?

6. Suppose $f$ is not continuous at $a$, but $\lim_{t\to a+} f(t) = f(a+)$ is finite. Prove that $\displaystyle\int_0^\infty f(t)\delta(t-a)\,dt = f(a+)$.

7. Evaluate $\displaystyle\int_0^\infty \frac{\sin(t)}{t}\,\delta(t-\pi/6)\,dt$.

8. Evaluate $\displaystyle\int_0^2 t^2\delta(t-3)\,dt$.

9. Evaluate $\displaystyle\int_0^\infty f(t)\delta(t-2)\,dt$, where
$$f(t) = \begin{cases} t & \text{for } 0 \le t < 2 \\ t^2 & \text{for } t > 2 \\ 5 & \text{for } t = 2. \end{cases}$$

10. It is sometimes convenient to consider $\delta(t)$ as the derivative of the Heaviside function $H(t)$. Use the definitions of the derivative, the Heaviside function, and the delta function (as a limit of $\delta_\epsilon$) to give a heuristic justification for this.

11. Use the idea that $H'(t) = \delta(t)$ from Problem 10 to determine the output voltage of the circuit of Example 3.16 by differentiating the relevant equation to obtain an equation in $i$, rather than writing the charge as an integral.

12. If $H'(t) = \delta(t)$, then $\mathcal{L}[H'(t)](s) = 1$. Show that not all of the operational rules for the Laplace transform are compatible with this expression. Hint: check to see whether $\mathcal{L}[H'(t)](s) = s\,\mathcal{L}[H(t)](s) - H(0+)$.

13. Evaluate $\delta(t-a) * f(t)$.

14. An object of mass $m$ is attached to the lower end of a spring of modulus $k$. Assume that there is no damping. Derive and solve an equation of motion for the position of the object at time $t > 0$, assuming that, at time zero, the object is pushed down from the equilibrium position with an initial velocity $v_0$. With what momentum does the object leave the equilibrium position?

15. Suppose an object of mass $m$ is attached to the lower end of a spring having modulus $k$. Assume that there is no damping. Solve the equation of motion for the position of the object for any time $t \ge 0$ if, at time zero, the weight is struck a downward blow of magnitude $mv_0$ (an impulse). How does the position of the object in Problem 14 compare with that of the object in this problem for any positive time?

16. A 2-pound weight is attached to the lower end of a spring, stretching it $\frac{8}{3}$ inches. The weight is allowed to come to rest in the equilibrium position. At some later time, which is called time zero, the weight is struck a downward blow of magnitude $\frac{1}{4}$ pound (an impulse). Assume that there is no damping in the system. Determine the velocity with which the weight leaves the equilibrium position, as well as the frequency and magnitude of the resulting oscillations.
Laplace Transform Solution of Systems The Laplace transform can be of use in solving systems of equations involving derivatives and integrals.
EXAMPLE 3.22
Consider the system of differential equations and initial conditions for the functions x and y:

x″ − 2x′ + 3y′ + 2y = 4
2y′ − x′ + 3y = 0
x(0) = x′(0) = y(0) = 0

Begin by applying the Laplace transform to the differential equations, incorporating the initial conditions. We get

s²X − 2sX + 3sY + 2Y = 4/s
2sY − sX + 3Y = 0

Solve these equations for X(s) and Y(s) to get

X(s) = (4s + 6)/(s²(s + 2)(s − 1))  and  Y(s) = 2/(s(s + 2)(s − 1))

A partial fractions decomposition yields

X(s) = −(7/2)(1/s) − 3(1/s²) + (1/6)(1/(s + 2)) + (10/3)(1/(s − 1))

and

Y(s) = −1/s + (1/3)(1/(s + 2)) + (2/3)(1/(s − 1))

Upon applying the inverse Laplace transform, we obtain the solution

x(t) = −7/2 − 3t + (1/6)e^{−2t} + (10/3)e^{t}

and

y(t) = −1 + (1/3)e^{−2t} + (2/3)e^{t}

The analysis of mechanical and electrical systems having several components can lead to systems of differential equations that can be solved using the Laplace transform.
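The solution can be checked symbolically. The sketch below (Python with sympy; an illustrative check, not part of the text) verifies both equations of the system and the zero initial data:

```python
import sympy as sp

t = sp.symbols('t')
# Solution of Example 3.22 as obtained above
x = -sp.Rational(7, 2) - 3*t + sp.exp(-2*t)/6 + sp.Rational(10, 3)*sp.exp(t)
y = -1 + sp.exp(-2*t)/3 + sp.Rational(2, 3)*sp.exp(t)

# Residuals of x'' - 2x' + 3y' + 2y = 4 and 2y' - x' + 3y = 0
e1 = sp.simplify(sp.diff(x, t, 2) - 2*sp.diff(x, t) + 3*sp.diff(y, t) + 2*y - 4)
e2 = sp.simplify(2*sp.diff(y, t) - sp.diff(x, t) + 3*y)
print(e1, e2, x.subs(t, 0), sp.diff(x, t).subs(t, 0), y.subs(t, 0))  # 0 0 0 0 0
```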
EXAMPLE 3.23
Consider the spring/mass system of Figure 3.35. Let x₁ = x₂ = 0 at the equilibrium position, where the weights are at rest. Choose the direction to the right as positive, and suppose the weights are at positions x₁(t) and x₂(t) at time t. By two applications of Hooke's law, the restoring force on m₁ is

−k₁x₁ + k₂(x₂ − x₁)

and that on m₂ is

−k₂(x₂ − x₁) − k₃x₂

By Newton's second law of motion,

m₁x₁″ = −(k₁ + k₂)x₁ + k₂x₂ + f₁(t)

FIGURE 3.35 (masses m₁ and m₂ coupled by springs k₁, k₂, k₃)
CHAPTER 3
The Laplace Transform
and

m₂x₂″ = k₂x₁ − (k₂ + k₃)x₂ + f₂(t)

These equations assume that damping is negligible, but allow for forcing functions acting on each mass. As a specific example, suppose m₁ = m₂ = 1 and k₁ = k₃ = 4, while k₂ = 5/2. Suppose f₂(t) = 0, so no external driving force acts on the second mass, while a force of magnitude f₁(t) = 2[1 − H(t − 3)] acts on the first. This hits the first mass with a force of constant magnitude 2 for the first 3 seconds, then turns off. Now the system of equations for the displacement functions is

x₁″ = −(13/2)x₁ + (5/2)x₂ + 2[1 − H(t − 3)]
x₂″ = (5/2)x₁ − (13/2)x₂

If the masses are initially at rest at the equilibrium position, then

x₁(0) = x₂(0) = x₁′(0) = x₂′(0) = 0

Apply the Laplace transform to each equation of the system to get

s²X₁ = −(13/2)X₁ + (5/2)X₂ + 2(1 − e^{−3s})/s
s²X₂ = (5/2)X₁ − (13/2)X₂

Solve these to obtain

X₁(s) = 2(s² + 13/2)(1 − e^{−3s})/(s(s² + 9)(s² + 4))

and

X₂(s) = 5(1 − e^{−3s})/(s(s² + 9)(s² + 4))

In preparation for applying the inverse Laplace transform, use a partial fractions decomposition to write

X₁(s) = (13/36)(1/s) − (1/4)s/(s² + 4) − (1/9)s/(s² + 9) − (13/36)(e^{−3s}/s) + (1/4)(s/(s² + 4))e^{−3s} + (1/9)(s/(s² + 9))e^{−3s}

and

X₂(s) = (5/36)(1/s) − (1/4)s/(s² + 4) + (1/9)s/(s² + 9) − (5/36)(e^{−3s}/s) + (1/4)(s/(s² + 4))e^{−3s} − (1/9)(s/(s² + 9))e^{−3s}
Now it is routine to apply the inverse Laplace transform to obtain the solution

x₁(t) = 13/36 − (1/4)cos(2t) − (1/9)cos(3t) + [−13/36 + (1/4)cos(2(t − 3)) + (1/9)cos(3(t − 3))]H(t − 3)

x₂(t) = 5/36 − (1/4)cos(2t) + (1/9)cos(3t) + [−5/36 + (1/4)cos(2(t − 3)) − (1/9)cos(3(t − 3))]H(t − 3)
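These displacement formulas can be spot-checked. The sketch below (Python with sympy, for illustration) verifies that on 0 < t < 3, where H(t − 3) = 0, the pre-jump parts satisfy both equations of motion:

```python
import sympy as sp

t = sp.symbols('t')
# Pre-jump parts of the solution, valid for 0 < t < 3 where H(t-3) = 0
x1 = sp.Rational(13, 36) - sp.cos(2*t)/4 - sp.cos(3*t)/9
x2 = sp.Rational(5, 36) - sp.cos(2*t)/4 + sp.cos(3*t)/9

# x1'' = -(13/2)x1 + (5/2)x2 + 2  and  x2'' = (5/2)x1 - (13/2)x2
r1 = sp.simplify(sp.diff(x1, t, 2) + sp.Rational(13, 2)*x1 - sp.Rational(5, 2)*x2 - 2)
r2 = sp.simplify(sp.diff(x2, t, 2) - sp.Rational(5, 2)*x1 + sp.Rational(13, 2)*x2)
print(r1, r2)  # both residuals reduce to 0
```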
EXAMPLE 3.24
In the circuit of Figure 3.36, suppose the switch is closed at time zero. We want to know the current in each loop. Assume that both loop currents and the charges on the capacitors are initially zero. Apply Kirchhoff's laws to each loop to get

40i₁ + 120(q₁ − q₂) = 10
60i₂ + 120q₂ = 120(q₁ − q₂)

Since i = q′, we can write q(t) = ∫₀ᵗ i(τ) dτ + q(0). Put into the two circuit equations, we get

40i₁ + 120∫₀ᵗ (i₁ − i₂) dτ + 120[q₁(0) − q₂(0)] = 10
60i₂ + 120∫₀ᵗ i₂ dτ + 120q₂(0) = 120∫₀ᵗ (i₁ − i₂) dτ + 120[q₁(0) − q₂(0)]

Put q₁(0) = q₂(0) = 0 in this system to get

40i₁ + 120∫₀ᵗ (i₁ − i₂) dτ = 10
60i₂ + 120∫₀ᵗ i₂ dτ = 120∫₀ᵗ (i₁ − i₂) dτ

FIGURE 3.36 (10 V source; loop resistances 40 and 60; two 1/120 F capacitors)
Apply the Laplace transform to each equation to get

40I₁ + (120/s)(I₁ − I₂) = 10/s
60I₂ + (120/s)I₂ = (120/s)I₁ − (120/s)I₂

After some rearrangement, we have

(s + 3)I₁ − 3I₂ = 1/4
2I₁ − (s + 4)I₂ = 0

Solve these to get

I₁(s) = (s + 4)/(4(s + 1)(s + 6)) = (3/20)(1/(s + 1)) + (1/10)(1/(s + 6))

and

I₂(s) = 1/(2(s + 1)(s + 6)) = (1/10)(1/(s + 1)) − (1/10)(1/(s + 6))

Now use the inverse Laplace transform to find the solution:
i₁(t) = (3/20)e^{−t} + (1/10)e^{−6t},  i₂(t) = (1/10)e^{−t} − (1/10)e^{−6t}
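As a sanity check, the transformed algebraic system can be verified with sympy (an illustrative sketch, not part of the text):

```python
import sympy as sp

s = sp.symbols('s')
I1 = (s + 4)/(4*(s + 1)*(s + 6))
I2 = 1/(2*(s + 1)*(s + 6))

# Rearranged transformed equations: (s+3)I1 - 3I2 = 1/4 and 2I1 - (s+4)I2 = 0
e1 = sp.simplify((s + 3)*I1 - 3*I2 - sp.Rational(1, 4))
e2 = sp.simplify(2*I1 - (s + 4)*I2)
print(e1, e2)  # 0 0
```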
SECTION 3.6 PROBLEMS
In each of Problems 1 through 10, use the Laplace transform to solve the initial value problem for the system.

1. x − 2y = 1, x + y − x = 0; x(0) = y(0) = 0
2. 2x − 3y + y = 0, x + y = t; x(0) = y(0) = 0
3. x + 2y − y = 1, 2x + y = 0; x(0) = y(0) = 0
4. x + y − x = cos(2t), x + 2y = 0; x(0) = y(0) = 0
5. 3x − y = 2t, x + y − y = 0; x(0) = y(0) = 0
6. x + 4y − y = 0, x + 2y = e^{−t}; x(0) = y(0) = 0
7. x + 2x − y = 0, x + y + x = t²; x(0) = y(0) = 0
8. x + 4x − y = 0, x + y = t; x(0) = y(0) = 0
9. x + y + x − y = 0, x + 2y + x = 1; x(0) = y(0) = 0
10. x + 2y − x = 0, 4x + 3y + y = −6; x(0) = y(0) = 0

11. Use the Laplace transform to solve the system
    y₁ − 2y₂ + 3y₁ = 0
    y₁ − 4y₂ + 3y₃ = t
    y₁ − 2y₂ + 3y₃ = −1
    y₁(0) = y₂(0) = y₃(0) = 0

12. Solve for the currents in the circuit of Figure 3.37, assuming that the currents and charges are initially zero and that E(t) = 2[H(t − 4) − H(t − 5)].

FIGURE 3.37 (circuit with source E(t), loop currents i₁ and i₂, a 5 H inductor, and components labeled 1, 2, 3, 4)
13. Solve for the currents in the circuit of Figure 3.37 if the currents and charges are initially zero and E(t) = [1 − H(t − 4)] sin(2(t − 4)).

14. Solve for the displacement functions of the masses in the system of Figure 3.38. Neglect damping and assume zero initial displacements and velocities, and external forces f₁(t) = 2 and f₂(t) = 0.

15. Solve for the displacement functions in the system of Figure 3.38 if f₁(t) = 1 − H(t − 2) and f₂(t) = 0. Assume zero initial displacements and velocities.

FIGURE 3.38 (masses m₁ = 1 and m₂ = 1 coupled by springs k₁ = 6, k₂ = 2, k₃ = 3)

16. Consider the system of Figure 3.39. Let M be subjected to a periodic driving force f(t) = A sin(ωt). The masses are initially at rest in the equilibrium position. (a) Derive and solve the initial value problem for the displacement functions. (b) Show that, if m and k₂ are chosen so that ω = √(k₂/m), then the mass m cancels the forced vibrations of M. In this case we call m a vibration absorber.

FIGURE 3.39 (mass M with spring k₁ and damping c₁, displacement y₁; absorber mass m attached through spring k₂, displacement y₂)

17. Two objects of masses m₁ and m₂ are attached to opposite ends of a spring having spring constant k (Figure 3.40). The entire apparatus is placed on a highly varnished table. Show that, if stretched and released from rest, the masses oscillate with respect to each other with period

2π√(m₁m₂/(k(m₁ + m₂)))

FIGURE 3.40 (masses m₁ and m₂ joined by a spring k)

18. Solve for the currents in the circuit of Figure 3.41 if E(t) = 5H(t − 2) and the initial currents are zero.

19. Solve for the currents in the circuit of Figure 3.41 if E(t) = 5δ(t − 1).

FIGURE 3.41 (source E(t), loop currents i₁ and i₂, inductors 20 H and 30 H, and two resistors of 10 each)

20. Two tanks are connected by a series of pipes as shown in Figure 3.42. Tank 1 initially contains 60 gallons of brine in which 11 pounds of salt are dissolved. Tank 2 initially contains 7 pounds of salt dissolved in 18 gallons of brine. Beginning at time zero, a mixture containing 1/6 pound of salt for each gallon of water is pumped into tank 1 at the rate of 2 gallons per minute, while salt water solutions are interchanged between the two tanks and also flow out of tank 2 at the rates shown in the diagram. Four minutes after time zero, salt is poured into tank 2 at the rate of 11 pounds per minute for a period of 2 minutes. Determine the amount of salt in each tank for any time t ≥ 0.

FIGURE 3.42 (tanks 1 and 2; inflow of 2 gal/min at 1/6 lb/gal into tank 1; exchange rates 5 gal/min and 3 gal/min between tanks; outflow 2 gal/min from tank 2)
21. Two tanks are connected by a series of pipes as shown in Figure 3.43. Tank 1 initially contains 200 gallons of brine in which 10 pounds of salt are dissolved. Tank 2 initially contains 5 pounds of salt dissolved in 100 gallons of water. Beginning at time zero, pure water is pumped into tank 1 at the rate of 3 gallons per minute, while brine solutions are interchanged between the tanks at the rates shown in the diagram. Three minutes after time zero, 5 pounds of salt are dumped into tank 2. Determine the amount of salt in each tank for any time t ≥ 0.

FIGURE 3.43 (tanks 1 and 2; inflow 3 gal/min into tank 1; exchange and outflow rates of 3, 2, 4, and 1 gal/min as labeled; 5 lb of salt added to tank 2)

3.7
Differential Equations with Polynomial Coefficients

The Laplace transform can sometimes be used to solve linear differential equations having polynomials as coefficients. For this we need the fact that the Laplace transform of tf(t) is the negative of the derivative of the Laplace transform of f(t).

THEOREM 3.13

Let ℒ[f](s) = F(s) for s > b, and suppose that F is differentiable. Then

ℒ[tf(t)](s) = −F′(s) for s > b.

Proof  Differentiate under the integral sign to calculate

F′(s) = d/ds ∫₀^∞ e^{−st}f(t) dt = ∫₀^∞ ∂/∂s[e^{−st}f(t)] dt = ∫₀^∞ −te^{−st}f(t) dt = −∫₀^∞ e^{−st}[tf(t)] dt = −ℒ[tf(t)](s)

and this is equivalent to the conclusion of the theorem.

By applying this result n times, we reach the following.

COROLLARY 3.1

Let ℒ[f](s) = F(s) for s > b and let n be a positive integer. Suppose F is n times differentiable. Then, for s > b,

ℒ[tⁿf(t)](s) = (−1)ⁿ dⁿF/dsⁿ
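The corollary is easy to check on a concrete transform with sympy. In the sketch below, f(t) = e^{2t} and n = 3 are arbitrary illustrative choices, not from the text:

```python
import sympy as sp

t = sp.symbols('t', positive=True)
s = sp.symbols('s', positive=True)

f = sp.exp(2*t)
F = sp.laplace_transform(f, t, s, noconds=True)            # F(s) = 1/(s - 2)

lhs = sp.laplace_transform(t**3 * f, t, s, noconds=True)   # L[t^3 f(t)](s)
rhs = (-1)**3 * sp.diff(F, s, 3)                           # (-1)^3 F'''(s)
print(sp.simplify(lhs - rhs))  # 0
```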
EXAMPLE 3.25
Consider the problem ty + 4t − 2y − 4y = 0
y0 = 1
If we write this differential equation in the form y + pty + qty = 0, then we must choose pt = 4t − 2/t, and this is not defined at t = 0, where the initial condition is given. This problem is not of the type for which we proved an existence/uniqueness theorem in Chapter 2. Further, we have only one initial condition. Nevertheless, we will look for functions satisfying the problem as stated. Apply the Laplace transform to the differential equation to get ty + 4 ty − 2 y − 4 y = 0 Calculate the first three terms as follows. First, ty = −
d d 2 y = −
s Y − sy0 − y 0 ds ds
= −2sY − s2 Y + 1 because y0 = 1 and y 0, though unknown, is constant and has zero derivative. Next, d
y ds d = − sY − y0 = −Y − sY ds
ty = −
Finally, y = sY − y0 = sY − 1 The transform of the differential equation is therefore −2sY − s2 Y + 1 − 4Y − 4sY − 2sY + 2 − 4Y = 0 Then Y +
3 4s + 8 Y= ss + 4 ss + 4
This is a linear first-order differential equation, and we will find an integrating factor. First compute
4s + 8 ds = ln s2 s + 42 ss + 4
Then eln s
2 s+42
= s2 s + 42
is an integrating factor. Multiply the differential equation by this factor to obtain

s²(s + 4)²Y′ + (4s + 8)s(s + 4)Y = 3s(s + 4)

or

[s²(s + 4)²Y]′ = 3s(s + 4)

Integrate to get

s²(s + 4)²Y = s³ + 6s² + C

Then

Y(s) = s/(s + 4)² + 6/(s + 4)² + C/(s²(s + 4)²)

Upon applying the inverse Laplace transform, we obtain

y(t) = e^{−4t} + 2te^{−4t} + (C/32)(−1 + 2t + e^{−4t} + 2te^{−4t})
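A quick symbolic check (a sympy sketch, not from the text) confirms that this family solves ty″ + (4t − 2)y′ − 4y = 0 with y(0) = 1 for every value of C:

```python
import sympy as sp

t, C = sp.symbols('t C')
y = sp.exp(-4*t) + 2*t*sp.exp(-4*t) \
    + (C/32)*(-1 + 2*t + sp.exp(-4*t) + 2*t*sp.exp(-4*t))

# Residual of t y'' + (4t - 2) y' - 4y = 0, and the initial value y(0)
residual = t*sp.diff(y, t, 2) + (4*t - 2)*sp.diff(y, t) - 4*y
print(sp.simplify(residual), y.subs(t, 0))  # 0 and 1
```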
This function satisfies the differential equation and the condition y(0) = 1 for any real number C. This problem does not have a unique solution. When we applied the Laplace transform to a constant-coefficient differential equation y″ + Ay′ + By = f(t), we obtained an algebraic expression for Y. In this example, with polynomials occurring as coefficients, we obtained a differential equation for Y, because the process of computing the transform of t^k y(t) involves differentiating Y(s). In the next example, we will need the following fact.
THEOREM 3.14
Let f be piecewise continuous on [0, k] for every positive number k, and suppose there are numbers M and b such that |f(t)| ≤ Me^{bt} for t ≥ 0. Let ℒ[f] = F. Then

lim_{s→∞} F(s) = 0

Proof  Write

|F(s)| ≤ ∫₀^∞ e^{−st}|f(t)| dt ≤ ∫₀^∞ e^{−st}Me^{bt} dt = [M/(b − s) · e^{−(s−b)t}]₀^∞ = M/(s − b) → 0

as s → ∞.

This result will enable us to solve the following initial value problem.
EXAMPLE 3.26
Suppose we want to solve

y″ + 2ty′ − 4y = 1,  y(0) = y′(0) = 0

Unlike the preceding example, this problem satisfies the hypotheses of the existence/uniqueness theorem in Chapter 2. Apply the Laplace transform to the differential equation to get

s²Y(s) − sy(0) − y′(0) + 2ℒ[ty′](s) − 4Y(s) = 1/s

Now y(0) = y′(0) = 0 and

ℒ[ty′](s) = −d/ds ℒ[y′](s) = −d/ds [sY(s) − y(0)] = −Y(s) − sY′(s)

We therefore have

s²Y(s) − 2Y(s) − 2sY′(s) − 4Y(s) = 1/s

or

Y′ + (3/s − s/2)Y = −1/(2s²)

This is a linear first-order differential equation for Y. To find an integrating factor, first compute

∫ (3/s − s/2) ds = 3 ln(s) − s²/4

The exponential of this function, or

s³e^{−s²/4}

is an integrating factor. Multiply the differential equation by this function to obtain

[s³e^{−s²/4}Y]′ = −(1/2)se^{−s²/4}

Then

s³e^{−s²/4}Y = e^{−s²/4} + C

so

Y(s) = 1/s³ + (C/s³)e^{s²/4}
We do not have any further initial conditions with which to determine C. However, in order to have lim_{s→∞} Y(s) = 0, we must choose C = 0. Then Y(s) = 1/s³, so

y(t) = t²/2
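It is worth confirming directly (a brief sympy sketch, for illustration) that y(t) = t²/2 satisfies the original problem:

```python
import sympy as sp

t = sp.symbols('t')
y = t**2 / 2
# Residual of y'' + 2t y' - 4y = 1, plus both initial values
residual = sp.diff(y, t, 2) + 2*t*sp.diff(y, t) - 4*y - 1
print(sp.simplify(residual), y.subs(t, 0), sp.diff(y, t).subs(t, 0))  # 0 0 0
```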
SECTION 3.7
PROBLEMS
Use the Laplace transform to solve each of Problems 1 through 10.

1. t²y″ − 2y = 2
2. y″ + 4ty′ − 4y = 0; y(0) = 0, y′(0) = −7
3. y″ − 16ty′ + 32y = 14; y(0) = y′(0) = 0
4. y″ + 8ty′ − 8y = 0; y(0) = 0, y′(0) = −4
5. ty″ + (t − 1)y′ + y = 0; y(0) = 0
6. y″ + 2ty′ − 4y = 6; y(0) = 0, y′(0) = 0
7. y″ + 8ty′ = 0; y(0) = 4, y′(0) = 0
8. y″ − 4ty′ + 4y = 0; y(0) = 0, y′(0) = 10
9. y″ − 8ty′ + 16y = 3; y(0) = 0, y′(0) = 0
10. (1 − t)y″ + ty′ − y = 0; y(0) = 3, y′(0) = −1
CHAPTER 4

Series Solutions

POWER SERIES SOLUTIONS OF INITIAL VALUE PROBLEMS · POWER SERIES SOLUTIONS USING RECURRENCE RELATIONS · SINGULAR POINTS AND THE METHOD OF FROBENIUS · SECOND SOLUTIONS AND LOGARITHM
Sometimes we can find an explicit, closed form solution of a differential equation or initial value problem. This occurs with

y′ + 2y = 1,  y(0) = 3

which has the unique solution

y(x) = (1/2)(1 + 5e^{−2x})

This solution is explicit, giving y(x) as a function of x, and is in closed form because it is a finite algebraic combination of elementary functions (which are functions such as polynomials, trigonometric functions, and exponential functions). Sometimes standard methods do not yield a solution in closed form. For example, the problem

y′ + eˣy = x²,  y(0) = 4

has the unique solution

y(x) = e^{−eˣ} ∫₀ˣ ξ²e^{e^ξ} dξ + 4e^{1−eˣ}

This solution is explicit, but it is not in closed form because of the integral. It is difficult to analyze this solution, or even to evaluate it at specific points. Sometimes a series solution is a good strategy for solving an initial value problem. Such a solution is explicit, giving y(x) as an infinite series involving constants times powers of x. It may also reveal important information about the behavior of the solution—for example, whether it passes through the origin, whether it is an even or odd function, or whether the function is increasing or decreasing on a given interval. It may also be possible to make good approximations to function values from a series representation. We will begin with power series solutions for differential equations admitting such solutions. Following this, we will develop another kind of series for problems whose solutions do not have power series expansions about a particular point. This chapter assumes familiarity with basic facts about power series.
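The closed-form claim for the first problem is easy to verify symbolically (a sympy sketch, for illustration):

```python
import sympy as sp

x = sp.symbols('x')
y = (1 + 5*sp.exp(-2*x))/2

# y' + 2y should be identically 1, and y(0) should be 3
print(sp.simplify(sp.diff(y, x) + 2*y))  # 1
print(y.subs(x, 0))                      # 3
```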
4.1
Power Series Solutions of Initial Value Problems

Consider the linear first-order initial value problem

y′ + p(x)y = q(x),  y(x₀) = y₀

If p and q are continuous on an open interval I about x₀, we are guaranteed by Theorem 1.3 that this problem has a unique solution defined for all x in I. With a stronger condition on these coefficients, we can infer that the solution will have a stronger property, which we now define.

DEFINITION 4.1  Analytic Function

A function f is analytic at x₀ if f(x) has a power series representation in some open interval about x₀:

f(x) = Σ_{n=0}^∞ aₙ(x − x₀)ⁿ

in some interval (x₀ − h, x₀ + h).

For example, sin(x) is analytic at 0, having the power series representation

sin(x) = Σ_{n=0}^∞ [(−1)ⁿ/(2n + 1)!] x^{2n+1}
This series converges for all real x. Analyticity requires at least that f be infinitely differentiable at x₀, although this by itself is not sufficient for f to be analytic at x₀. We claim that, when the coefficients of an initial value problem are analytic, then the solution is as well.

THEOREM 4.1

Let p and q be analytic at x₀. Then the initial value problem

y′ + p(x)y = q(x),  y(x₀) = y₀

has a solution that is analytic at x₀.

This means that an initial value problem whose coefficients are analytic at x₀ has an analytic solution at x₀. This justifies attempting to expand the solution in a power series about x₀, where the initial condition is specified. This expansion has the form

y(x) = Σ_{n=0}^∞ aₙ(x − x₀)ⁿ  (4.1)

in which

aₙ = (1/n!) y⁽ⁿ⁾(x₀)
One strategy to solve the initial value problem of the theorem is to use the differential equation and the initial condition to calculate these derivatives, hence obtain coefficients in the expansion (4.1) of the solution.
EXAMPLE 4.1
Consider again the problem

y′ + eˣy = x²,  y(0) = 4

The theorem guarantees an analytic solution at 0:

y(x) = Σ_{n=0}^∞ (1/n!) y⁽ⁿ⁾(0)xⁿ = y(0) + y′(0)x + (1/2!)y″(0)x² + (1/3!)y⁽³⁾(0)x³ + ···

We will know this series if we can determine the terms y(0), y′(0), y″(0), ···. The initial condition gives us y(0) = 4. Put x = 0 into the differential equation to get

y′(0) + y(0) = 0, or y′(0) + 4 = 0

Then y′(0) = −4. Next determine y″(0). Differentiate the differential equation to get

y″ + eˣy′ + eˣy = 2x  (4.2)

and put x = 0 to get

y″(0) + y′(0) + y(0) = 0

Then

y″(0) = −y′(0) − y(0) = −(−4) − 4 = 0

Next we will find y⁽³⁾(0). Differentiate equation (4.2) to get

y⁽³⁾ + eˣy″ + 2eˣy′ + eˣy = 2  (4.3)

Then

y⁽³⁾(0) + y″(0) + 2y′(0) + y(0) = 2, or y⁽³⁾(0) + 0 + 2(−4) + 4 = 2

Then y⁽³⁾(0) = 6. Next differentiate equation (4.3):

y⁽⁴⁾ + eˣy⁽³⁾ + 3eˣy″ + 3eˣy′ + eˣy = 0
Evaluate this at 0 to get

y⁽⁴⁾(0) + 6 + 3(0) + 3(−4) + 4 = 0

so y⁽⁴⁾(0) = 2. At this point we have the first five terms of the Maclaurin expansion of the solution:

y(x) = y(0) + y′(0)x + (1/2)y″(0)x² + (1/6)y⁽³⁾(0)x³ + (1/24)y⁽⁴⁾(0)x⁴ + ··· = 4 − 4x + x³ + (1/12)x⁴ + ···

By differentiating more times, we can write as many terms of this series as we want.
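The derivative bootstrapping in this example can be automated. The sketch below (Python with sympy; illustrative, not from the text) reproduces the Maclaurin coefficients just computed by repeatedly taking the total derivative of the right side of y′ = x² − eˣy:

```python
import sympy as sp

x = sp.symbols('x')
Y = sp.symbols('Y')   # placeholder for y(x) inside derivative expressions

# Example 4.1: y' = x**2 - exp(x)*y, y(0) = 4
g = x**2 - sp.exp(x)*Y

coeffs = [sp.Integer(4)]   # y(0)
d = g                      # expression for the current derivative y^(k)
for k in range(1, 5):
    coeffs.append(d.subs({x: 0, Y: 4}))
    d = sp.diff(d, x) + sp.diff(d, Y)*g   # total derivative along solutions

series = sum(c/sp.factorial(n)*x**n for n, c in enumerate(coeffs))
print(coeffs)              # [4, -4, 0, 6, 2]
print(sp.expand(series))   # x**4/12 + x**3 - 4*x + 4
```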
EXAMPLE 4.2
Consider the initial value problem

y′ + sin(x)y = 1 − x,  y(π) = −3

Since the initial condition is given at x = π, we will seek terms in the Taylor expansion of the solution about π. This series has the form

y(x) = y(π) + y′(π)(x − π) + (1/2!)y″(π)(x − π)² + (1/3!)y⁽³⁾(π)(x − π)³ + (1/4!)y⁽⁴⁾(π)(x − π)⁴ + ···

We know the first term, y(π) = −3. From the differential equation,

y′(π) = 1 − π − sin(π)(−3) = 1 − π

Now differentiate the differential equation:

y″(x) + cos(x)y + sin(x)y′ = −1  (4.4)

Substitute x = π to get

y″(π) + 3 = −1, so y″(π) = −4

Next differentiate equation (4.4):

y⁽³⁾(x) − sin(x)y + 2cos(x)y′ + sin(x)y″ = 0

Substitute x = π to get

y⁽³⁾(π) − 2(1 − π) = 0, so y⁽³⁾(π) = 2(1 − π)
Up to this point we have four terms of the expansion of the solution about π:

y(x) = −3 + (1 − π)(x − π) − (4/2!)(x − π)² + [2(1 − π)/3!](x − π)³ + ··· = −3 + (1 − π)(x − π) − 2(x − π)² + (1/3)(1 − π)(x − π)³ + ···

Again, with more work we can compute more terms. This method for generating a series solution of a first order linear initial value problem extends readily to second order problems, justified by the following theorem.

THEOREM 4.2
1 = −3 + 1 − x − − 2x − 2 + 1 − x − 3 + · · · 3 Again, with more work we can compute more terms. This method for generating a series solution of a first order linear initial value problem extends readily to second order problems, justified by the following theorem. THEOREM 4.2
Let p, q and f be analytic at x0 . Then the initial value problem y + pxy + qxy = fx
yx0 = A y x0 = B
has a unique solution that is also analytic at x0 .
EXAMPLE 4.3
Solve y − xy + ex y = 4
y0 = 1 y 0 = 4
Methods from preceding chapters do not apply to this problem. Since −x, ex , and 4 are analytic at 0, the problem has a series solution expanded about 0. The solution has the form yx = y0 + y 0x +
1 1 y 0x2 + y3 0x3 + · · · 2! 3!
We already know the first two coefficients from the initial conditions. From the differential equation, y 0 = 4 − y0 = 3 Now differentiate the differential equation to get y3 − y − xy + ex y + ex y = 0 Then y3 0 = y 0 − y0 − y 0 = −1 Thus far we have four terms of the series solution about 0: 3 1 yx = 1 + 4x + x2 − x3 + · · · 2 6 Although we have illustrated the series method for initial value problems, we can also use it to find general solutions.
EXAMPLE 4.4
We will find the general solution of

y″ + cos(x)y′ + 4y = 2x − 1

The idea is to think of this as an initial value problem,

y″ + cos(x)y′ + 4y = 2x − 1,  y(0) = a, y′(0) = b

with a and b arbitrary (these will be the two arbitrary constants in the general solution). Now proceed as we have been doing. We will determine terms of a solution expanded about 0. The first two coefficients are a and b. For the coefficient of x², we find, from the differential equation,

y″(0) = −y′(0) − 4y(0) − 1 = −b − 4a − 1

Next, differentiate the differential equation:

y⁽³⁾ − sin(x)y′ + cos(x)y″ + 4y′ = 2

so

y⁽³⁾(0) = −y″(0) − 4y′(0) + 2 = b + 4a + 1 − 4b + 2 = 4a − 3b + 3

Continuing in this way, we obtain (with details omitted)

y(x) = a + bx + [(−1 − 4a − b)/2]x² + [(3 + 4a − 3b)/6]x³ + [(1 + 12a + 8b)/24]x⁴ + [(−16 − 40a + b)/120]x⁵ + ···
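The omitted details can be generated mechanically with the same bootstrapping idea, keeping a and b symbolic. A sympy sketch (illustrative; Y0 and Y1 are placeholder symbols for y and y′):

```python
import sympy as sp

x, a, b = sp.symbols('x a b')
Y0, Y1 = sp.symbols('Y0 Y1')   # placeholders for y and y'

# Example 4.4: y'' = 2x - 1 - cos(x) y' - 4y with y(0) = a, y'(0) = b
h = 2*x - 1 - sp.cos(x)*Y1 - 4*Y0

coeffs = [a, b]
d = h                          # expression for the current derivative y^(k)
for k in range(2, 6):
    coeffs.append(sp.expand(d.subs({x: 0, Y0: a, Y1: b})))
    d = sp.diff(d, x) + sp.diff(d, Y0)*Y1 + sp.diff(d, Y1)*h

# y''(0) = -1 - 4a - b, ..., y^(4)(0) = 1 + 12a + 8b, y^(5)(0) = -16 - 40a + b
print(coeffs)
```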
In the next section, we will revisit power series solutions, but from a different perspective.
SECTION 4.1
PROBLEMS

In each of Problems 1 through 10, find the first five nonzero terms of the power series solution of the initial value problem, about the point where the initial conditions are given.

1. y″ + y′ − xy = 0; y(0) = −2, y′(0) = 0
2. y″ + 2xy′ + (x − 1)y = 0; y(0) = 1, y′(0) = 2
3. y″ − xy = 2x; y(1) = 3, y′(1) = 0
4. y″ + xy = −1 + x; y(2) = 1, y′(2) = −4
5. y″ − (1/x²)y′ + (1/x)y = 0; y(1) = 7, y′(1) = 3
6. y″ + x²y = eˣ; y(0) = −2, y′(0) = 7
7. y″ − eˣy′ + 2y = 1; y(0) = −3, y′(0) = 1
8. y″ + y′ − x⁴y = sin(2x); y(0) = 0, y′(0) = −2
9. y″ + (1/(x + 2))y′ − xy = 0; y(0) = y′(0) = 1
10. y″ − (1/x)y′ + y = 1; y(4) = 0, y′(4) = 2

In each of Problems 11 through 20, find the first five nonzero terms of the Maclaurin expansion of the general solution.

11. y″ + sin(x)y = −x
12. y″ − x²y = 1
13. y″ + xy = 1 − x + x²
14. y″ − y = ln(x + 1)
15. y″ + xy = 0
16. y″ − 2y′ + xy = 0
17. y″ − x³y = 1
18. y″ + (1 − x)y′ + 2xy = 0
19. y″ + y′ − x²y = 0
20. y″ − 8xy = 1 + 2x⁹

21. Find the first five terms of the Maclaurin series solution of Airy's equation y″ + xy = 0, satisfying y(0) = a, y′(0) = b.

In each of Problems 22 through 25, the initial value problem can be solved in closed form using methods from Chapters 1 and 2. Find this solution and expand it in a Maclaurin series. Then find the Maclaurin series solution using methods of Section 4.1. The two series should agree.

22. y″ + y = 1; y(0) = 0, y′(0) = 0
23. y′ + y = 2; y(0) = −1
24. y″ + 3y′ + 2y = x; y(0) = 0, y′(0) = 1
25. y″ − 4y′ + 5y = 1; y(0) = −1, y′(0) = 4

4.2
Power Series Solutions Using Recurrence Relations We have just seen one way to utilize the differential equation and initial conditions to generate terms of a series solution, expanded about the point where the initial conditions are specified. Another way to generate coefficients is to develop a recurrence relation, which allows us to produce coefficients once certain preceding ones are known. We will consider three examples of this method.
EXAMPLE 4.5
Consider y″ + x²y = 0. Suppose we want a solution expanded about 0. Instead of computing successive derivatives at 0, as we did before, now begin by substituting y(x) = Σ_{n=0}^∞ aₙxⁿ into the differential equation. To do this, we need

y′ = Σ_{n=1}^∞ naₙx^{n−1}  and  y″ = Σ_{n=2}^∞ n(n − 1)aₙx^{n−2}

Notice that the series for y′ begins at n = 1, and that for y″ at n = 2. Put these series into the differential equation to get

y″ + x²y = Σ_{n=2}^∞ n(n − 1)aₙx^{n−2} + Σ_{n=0}^∞ aₙx^{n+2} = 0  (4.5)

Shift indices in both summations so that the power of x occurring in each series is the same. One way to do this is to write

Σ_{n=2}^∞ n(n − 1)aₙx^{n−2} = Σ_{n=0}^∞ (n + 2)(n + 1)aₙ₊₂xⁿ

and

Σ_{n=0}^∞ aₙx^{n+2} = Σ_{n=2}^∞ aₙ₋₂xⁿ

Using these series, we can write equation (4.5) as

Σ_{n=0}^∞ (n + 2)(n + 1)aₙ₊₂xⁿ + Σ_{n=2}^∞ aₙ₋₂xⁿ = 0
We can combine the terms for n ≥ 2 under one summation and factor out the common xⁿ (this was the reason for rewriting the series). When we do this, we must list the n = 0 and n = 1 terms of the first summation separately, or else we lose terms. We get

2(1)a₂x⁰ + 3(2)a₃x + Σ_{n=2}^∞ [(n + 2)(n + 1)aₙ₊₂ + aₙ₋₂]xⁿ = 0

The only way for this series to be zero for all x in some open interval about 0 is for the coefficient of each power of x to be zero. Therefore, a₂ = a₃ = 0 and, for n = 2, 3, …,

(n + 2)(n + 1)aₙ₊₂ + aₙ₋₂ = 0

This implies that

aₙ₊₂ = −aₙ₋₂/[(n + 2)(n + 1)]  for n = 2, 3, …  (4.6)

This is a recurrence relation for this differential equation. In this example, it gives aₙ₊₂ in terms of aₙ₋₂ for n = 2, 3, …. Thus, we know a₄ in terms of a₀, a₅ in terms of a₁, a₆ in terms of a₂, and so on. The form of the recurrence relation will vary with the differential equation, but it always gives coefficients in terms of one or more previously indexed ones. Using equation (4.6), we proceed:

a₄ = −a₀/(4·3) = −(1/12)a₀ (by putting n = 2);
a₅ = −a₁/(5·4) = −(1/20)a₁ (by putting n = 3);
a₆ = −a₂/(6·5) = 0 because a₂ = 0;
a₇ = −a₃/(7·6) = 0 because a₃ = 0;
a₈ = −a₄/(8·7) = (1/672)a₀;
a₉ = −a₅/(9·8) = (1/1440)a₁;

and so on. The first few terms of the series solution expanded about 0 are

y(x) = a₀ + a₁x + 0x² + 0x³ − (1/12)a₀x⁴ − (1/20)a₁x⁵ + 0x⁶ + 0x⁷ + (1/672)a₀x⁸ + (1/1440)a₁x⁹ + ··· = a₀[1 − (1/12)x⁴ + (1/672)x⁸ + ···] + a₁[x − (1/20)x⁵ + (1/1440)x⁹ + ···]

This is actually the general solution, since a₀ and a₁ are arbitrary constants. Note that a₀ = y(0) and a₁ = y′(0), so a solution is completely specified by giving y(0) and y′(0).
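The recurrence is exactly the kind of rule a program runs well. A short sketch (Python with exact rational arithmetic; illustrative, and the helper name is ours, not the book's):

```python
from fractions import Fraction

def series_coeffs(a0, a1, N):
    """Coefficients a_0..a_N for y'' + x^2 y = 0 from the recurrence
    a_{n+2} = -a_{n-2} / ((n+2)(n+1)), with a_2 = a_3 = 0."""
    a = [Fraction(a0), Fraction(a1)] + [Fraction(0)] * (N - 1)
    for n in range(2, N - 1):
        a[n + 2] = -a[n - 2] / ((n + 2) * (n + 1))
    return a

a = series_coeffs(1, 1, 9)     # take a0 = a1 = 1 for illustration
print(a[4], a[5], a[8], a[9])  # -1/12 -1/20 1/672 1/1440
```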
EXAMPLE 4.6
Consider the nonhomogeneous differential equation

y″ + x²y′ + 4y = 1 − x²

Attempt a solution y(x) = Σ_{n=0}^∞ aₙxⁿ. Substitute this series into the differential equation to get

Σ_{n=2}^∞ n(n − 1)aₙx^{n−2} + x² Σ_{n=1}^∞ naₙx^{n−1} + 4 Σ_{n=0}^∞ aₙxⁿ = 1 − x²

Then

Σ_{n=2}^∞ n(n − 1)aₙx^{n−2} + Σ_{n=1}^∞ naₙx^{n+1} + Σ_{n=0}^∞ 4aₙxⁿ = 1 − x²  (4.7)

Shift indices in the first and second summations so that the power of x occurring in each is xⁿ:

Σ_{n=2}^∞ n(n − 1)aₙx^{n−2} = Σ_{n=0}^∞ (n + 2)(n + 1)aₙ₊₂xⁿ

and

Σ_{n=1}^∞ naₙx^{n+1} = Σ_{n=2}^∞ (n − 1)aₙ₋₁xⁿ

Equation (4.7) becomes

Σ_{n=0}^∞ (n + 2)(n + 1)aₙ₊₂xⁿ + Σ_{n=2}^∞ (n − 1)aₙ₋₁xⁿ + Σ_{n=0}^∞ 4aₙxⁿ = 1 − x²

We can combine summations from n = 2 on, writing the n = 0 and n = 1 terms from the first and third summations separately. Then

2a₂x⁰ + 6a₃x + 4a₀x⁰ + 4a₁x + Σ_{n=2}^∞ [(n + 2)(n + 1)aₙ₊₂ + (n − 1)aₙ₋₁ + 4aₙ]xⁿ = 1 − x²

For this to hold for all x in some interval about 0, the coefficient of xⁿ on the left must match the coefficient of xⁿ on the right. By matching these coefficients, we get:

2a₂ + 4a₀ = 1 (from x⁰),
6a₃ + 4a₁ = 0 (from x),
4(3)a₄ + a₁ + 4a₂ = −1 (from x²),

and, for n ≥ 3,

(n + 2)(n + 1)aₙ₊₂ + (n − 1)aₙ₋₁ + 4aₙ = 0
From these equations we get, in turn,

a₂ = 1/2 − 2a₀

a₃ = −(2/3)a₁

a₄ = (1/12)(−1 − a₁ − 4a₂) = (1/12)[−1 − a₁ − 4(1/2 − 2a₀)] = −1/4 + (2/3)a₀ − (1/12)a₁

and, for n = 3, 4, …,

aₙ₊₂ = −[4aₙ + (n − 1)aₙ₋₁]/[(n + 2)(n + 1)]

This is the recurrence relation for this differential equation, and it enables us to determine aₙ₊₂ if we know the two previous coefficients aₙ and aₙ₋₁. With n = 3 we get

a₅ = −(4a₃ + 2a₂)/20 = −(1/20)[−(8/3)a₁ + 1 − 4a₀] = −1/20 + (1/5)a₀ + (2/15)a₁

With n = 4 the recurrence relation gives us

a₆ = −(4a₄ + 3a₃)/30 = −(1/30)[−1 + (8/3)a₀ − (1/3)a₁ − 2a₁] = 1/30 − (4/45)a₀ + (7/90)a₁

Thus far we have six terms of the solution:

y(x) = a₀ + a₁x + (1/2 − 2a₀)x² − (2/3)a₁x³ + [−1/4 + (2/3)a₀ − (1/12)a₁]x⁴ + [−1/20 + (1/5)a₀ + (2/15)a₁]x⁵ + [1/30 − (4/45)a₀ + (7/90)a₁]x⁶ + ···
Using the recurrence relation, we can produce as many terms of this series as we wish. A recurrence relation is particularly suited to computer generation of coefficients. Because this recurrence relation specifies each aₙ₊₂ (for n ≥ 3) in terms of the two preceding coefficients aₙ and aₙ₋₁, it is called a two-term recurrence relation. It gives every coefficient in terms of a₀ and a₁, which are arbitrary constants. Indeed, y(0) = a₀ and y′(0) = a₁, so assigning values to these constants uniquely determines the solution. Sometimes we must represent one or more coefficients as power series to apply the current method. This does not alter the basic idea of collecting coefficients of like powers of x and solving for the coefficients.
EXAMPLE 4.7
Solve

y″ + xy′ − y = e^{3x}

Each coefficient is analytic at 0, so we will look for a power series solution expanded about 0. Substitute y = Σ_{n=0}^∞ aₙxⁿ and also e^{3x} = Σ_{n=0}^∞ (3ⁿ/n!)xⁿ into the differential equation to get

Σ_{n=2}^∞ n(n − 1)aₙx^{n−2} + Σ_{n=1}^∞ naₙxⁿ − Σ_{n=0}^∞ aₙxⁿ = Σ_{n=0}^∞ (3ⁿ/n!)xⁿ

Shift indices in the first summation to write this equation as

Σ_{n=0}^∞ (n + 2)(n + 1)aₙ₊₂xⁿ + Σ_{n=1}^∞ naₙxⁿ − Σ_{n=0}^∞ aₙxⁿ = Σ_{n=0}^∞ (3ⁿ/n!)xⁿ

We can collect terms from n = 1 on under one summation, obtaining

Σ_{n=1}^∞ [(n + 2)(n + 1)aₙ₊₂ + (n − 1)aₙ]xⁿ + 2a₂ − a₀ = 1 + Σ_{n=1}^∞ (3ⁿ/n!)xⁿ

Equate coefficients of like powers of x on both sides of the equation to obtain

2a₂ − a₀ = 1

and, for n = 1, 2, …,

(n + 2)(n + 1)aₙ₊₂ + (n − 1)aₙ = 3ⁿ/n!

This gives

a₂ = (1 + a₀)/2

and, for n = 1, 2, …, we have the one-term recurrence relation (in terms of one preceding coefficient)

aₙ₊₂ = [3ⁿ/n! + (1 − n)aₙ]/[(n + 2)(n + 1)]

Using this relationship we can generate as many coefficients as we want in the solution series, in terms of the arbitrary constants a₀ and a₁. The first few terms are

y(x) = a₀ + a₁x + [(1 + a₀)/2]x² + (1/2)x³ + (1/3 − a₀/24)x⁴ + (7/40)x⁵ + [(19 + a₀)/240]x⁶ + ···
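As with the previous examples, the one-term recurrence is easy to iterate symbolically; a sympy sketch (illustrative only):

```python
import sympy as sp

a0, a1 = sp.symbols('a0 a1')
a = [a0, a1, (1 + a0)/2]   # a2 = (1 + a0)/2 from 2a2 - a0 = 1

# One-term recurrence a_{n+2} = (3^n/n! + (1 - n) a_n) / ((n+2)(n+1)), n >= 1
for n in range(1, 5):
    src = sp.Integer(3)**n / sp.factorial(n)
    a.append(sp.expand((src + (1 - n)*a[n]) / ((n + 2)*(n + 1))))

# a3 = 1/2, a4 = 1/3 - a0/24, a5 = 7/40, a6 = (19 + a0)/240
print(a[3], a[4], a[5], a[6])
```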
SECTION 4.2
PROBLEMS
In each of Problems 1 through 12, find the recurrence relation and use it to generate the first five terms of the Maclaurin series of the general solution.

1. y″ − xy = 1 − x
2. y″ − x³y = 4
3. y″ + (1 − x²)y = x
4. y″ + 2y′ + xy = 0
5. y″ − xy′ + y = 3
6. y″ + xy′ + xy = 0
7. y″ − x²y′ + 2y = x
8. y″ + x²y′ + 2y = 0
9. y″ + (1 − x)y′ + 2y = 1 − x²
10. y″ + y′ − (1 − x + x²)y = −5
11. y″ + xy = cos(x)
12. y″ + xy = 1 − eˣ
4.3
Singular Points and the Method of Frobenius

In this section we will consider the second-order linear differential equation

P(x)y″ + Q(x)y′ + R(x)y = F(x)  (4.8)

If we can divide this equation by P(x) and obtain an equation of the form

y″ + p(x)y′ + q(x)y = f(x)  (4.9)

with analytic coefficients in some open interval about x₀, then we can proceed to a power series solution of equation (4.9) by methods already developed, and thereby solve equation (4.8). In this case we call x₀ an ordinary point of the differential equation. If, however, x₀ is not an ordinary point, then this strategy fails and we must develop some new machinery.

DEFINITION 4.2  Ordinary and Singular Points

x₀ is an ordinary point of equation (4.8) if P(x₀) ≠ 0 and Q(x)/P(x), R(x)/P(x), and F(x)/P(x) are analytic at x₀. x₀ is a singular point of equation (4.8) if x₀ is not an ordinary point.

Thus, x₀ is a singular point if P(x₀) = 0, or if any one of Q(x)/P(x), R(x)/P(x), or F(x)/P(x) fails to be analytic at x₀.

EXAMPLE 4.8

The differential equation

x³(x − 2)²y″ + 5(x + 2)(x − 2)y′ + 3x²y = 0

has singular points at 0 and 2, because P(x) = x³(x − 2)² and P(0) = P(2) = 0. Every other real number is an ordinary point of this equation.

In an interval about a singular point, solutions can exhibit behavior that is quite different from what we have seen in an interval about an ordinary point. In particular, the general solution of equation (4.8) may contain a logarithm term, which will tend toward ∞ in magnitude as x approaches x₀.
In order to seek some understanding of the behavior of solutions near a singular point, we will concentrate on the homogeneous equation

P(x)y″ + Q(x)y′ + R(x)y = 0  (4.10)

Once this case is understood, it does not add substantial further difficulty to consider the nonhomogeneous equation (4.8). Experience and research have shown that some singular points are "worse" than others, in the sense that the subtleties they bring to attempts at solution are deepened. We therefore distinguish two kinds of singular points.

DEFINITION 4.3  Regular and Irregular Singular Points

x₀ is a regular singular point of equation (4.10) if x₀ is a singular point, and the functions

(x − x₀)Q(x)/P(x)  and  (x − x₀)²R(x)/P(x)

are analytic at x₀. A singular point that is not regular is said to be an irregular singular point.

EXAMPLE 4.9

We have already noted that

x³(x − 2)²y″ + 5(x + 2)(x − 2)y′ + 3x²y = 0

has singular points at 0 and 2. We will classify these singular points. In this example, P(x) = x³(x − 2)², Q(x) = 5(x + 2)(x − 2), and R(x) = 3x². First consider x₀ = 0. Now

(x − x₀)Q(x)/P(x) = 5x(x + 2)(x − 2)/[x³(x − 2)²] = (5/x²)·(x + 2)/(x − 2)

is not defined at 0, hence is not analytic there. This is enough to conclude that 0 is an irregular singular point of this differential equation. Next let x₀ = 2 and consider

(x − 2)Q(x)/P(x) = 5(x + 2)/x³  and  (x − 2)²R(x)/P(x) = 3/x

Both of these functions are analytic at 2. Therefore, 2 is a regular singular point of the differential equation.

Suppose now that equation (4.10) has a regular singular point at x₀. Then there may be no solution as a power series about x₀. In this case we attempt to choose numbers cₙ and a number r so that

y(x) = Σ_{n=0}^∞ cₙ(x − x₀)^{n+r}  (4.11)
is a solution. This series is called a Frobenius series, and the strategy of attempting a solution of this form is called the method of Frobenius. A Frobenius series need not be a power series, since $r$ may be negative or may be a noninteger. A Frobenius series "begins" with $c_0(x - x_0)^r$, which is constant only if $r = 0$. Thus, in computing the derivative of the Frobenius series (4.11), we get
$$y'(x) = \sum_{n=0}^{\infty} (n+r)c_n (x - x_0)^{n+r-1},$$
and this summation begins at $n = 0$ again because the derivative of the $n = 0$ term need not be zero. Similarly,
$$y''(x) = \sum_{n=0}^{\infty} (n+r)(n+r-1)c_n (x - x_0)^{n+r-2}.$$
We will now illustrate the method of Frobenius.
EXAMPLE 4.10
We want to solve
$$x^2 y'' + x\left(\frac{1}{2} + 2x\right)y' + \left(x - \frac{1}{2}\right)y = 0.$$
It is routine to show that 0 is a regular singular point. Substitute a Frobenius series $y(x) = \sum_{n=0}^{\infty} c_n x^{n+r}$ into the differential equation to get
$$\sum_{n=0}^{\infty}(n+r)(n+r-1)c_n x^{n+r} + \frac{1}{2}\sum_{n=0}^{\infty}(n+r)c_n x^{n+r} + 2\sum_{n=0}^{\infty}(n+r)c_n x^{n+r+1} + \sum_{n=0}^{\infty} c_n x^{n+r+1} - \frac{1}{2}\sum_{n=0}^{\infty} c_n x^{n+r} = 0.$$
Shift indices in the third and fourth summations to write this equation as
$$\left[r(r-1)c_0 + \frac{1}{2}rc_0 - \frac{1}{2}c_0\right]x^r + \sum_{n=1}^{\infty}\left[(n+r)(n+r-1)c_n + \frac{1}{2}(n+r)c_n + 2(n+r-1)c_{n-1} + c_{n-1} - \frac{1}{2}c_n\right]x^{n+r} = 0.$$
This equation will hold if the coefficient of each $x^{n+r}$ is zero. This gives us the equations
$$r(r-1)c_0 + \frac{1}{2}rc_0 - \frac{1}{2}c_0 = 0 \tag{4.12}$$
and
$$(n+r)(n+r-1)c_n + \frac{1}{2}(n+r)c_n + 2(n+r-1)c_{n-1} + c_{n-1} - \frac{1}{2}c_n = 0 \tag{4.13}$$
for $n = 1, 2, \ldots$. Assuming that $c_0 \neq 0$, an essential requirement in the method, equation (4.12) implies that
$$r(r-1) + \frac{1}{2}r - \frac{1}{2} = 0. \tag{4.14}$$
This is the indicial equation for this differential equation, and it determines the values of $r$ we can use. Solve it to obtain $r_1 = 1$ and $r_2 = -\frac{1}{2}$. Equation (4.13) enables us to solve for $c_n$ in terms of $c_{n-1}$ to get the recurrence relation
$$c_n = -\frac{1 + 2(n+r-1)}{(n+r)(n+r-1) + \frac{1}{2}(n+r) - \frac{1}{2}}\,c_{n-1}$$
for $n = 1, 2, \ldots$. First put $r = r_1 = 1$ into the recurrence relation to obtain
$$c_n = -\frac{1 + 2n}{n\left(n + \frac{3}{2}\right)}\,c_{n-1} \quad\text{for } n = 1, 2, \ldots.$$
Some of these coefficients are
$$c_1 = -\frac{3}{5/2}c_0 = -\frac{6}{5}c_0,$$
$$c_2 = -\frac{5}{7}c_1 = -\frac{5}{7}\left(-\frac{6}{5}c_0\right) = \frac{6}{7}c_0,$$
$$c_3 = -\frac{14}{27}c_2 = -\frac{4}{9}c_0,$$
and so on. One Frobenius solution is
$$y_1(x) = c_0\left(x - \frac{6}{5}x^2 + \frac{6}{7}x^3 - \frac{4}{9}x^4 + \cdots\right).$$
Because $r_1$ is a nonnegative integer, this first Frobenius solution is actually a power series about 0.

For a second Frobenius solution, substitute $r = r_2 = -\frac{1}{2}$ into the recurrence relation. To avoid confusion we will replace $c_n$ with $c_n^*$ in this relation. We get
$$c_n^* = -\frac{1 + 2\left(n - \frac{3}{2}\right)}{\left(n - \frac{1}{2}\right)\left(n - \frac{3}{2}\right) + \frac{1}{2}\left(n - \frac{1}{2}\right) - \frac{1}{2}}\,c_{n-1}^*$$
for $n = 1, 2, \ldots$. This simplifies to
$$c_n^* = -\frac{2n - 2}{n\left(n - \frac{3}{2}\right)}\,c_{n-1}^*.$$
It happens in this example that $c_1^* = 0$, so $c_n^* = 0$ for $n = 1, 2, \ldots$, and the second Frobenius solution is
$$y_2(x) = \sum_{n=0}^{\infty} c_n^* x^{n - 1/2} = c_0^* x^{-1/2} \quad\text{for } x > 0.$$
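Recurrences such as (4.13) are easy to mechanize. The following sketch (exact rational arithmetic via Python's `fractions`; the function name `frobenius_coeffs` is ours, not the text's) iterates the recurrence of this example with $c_0 = 1$ and reproduces $c_1 = -\frac{6}{5}$, $c_2 = \frac{6}{7}$, $c_3 = -\frac{4}{9}$, and confirms that the $r = -\frac12$ coefficients all vanish:

```python
from fractions import Fraction

def frobenius_coeffs(r, n_terms):
    # Recurrence from equation (4.13), solved for c_n with c_0 = 1:
    #   c_n = -[1 + 2(n+r-1)] / [(n+r)(n+r-1) + (n+r)/2 - 1/2] * c_{n-1}
    c = [Fraction(1)]
    for n in range(1, n_terms):
        num = 1 + 2 * (n + r - 1)
        den = (n + r) * (n + r - 1) + (n + r) / 2 - Fraction(1, 2)
        c.append(-num / den * c[-1])
    return c

print(frobenius_coeffs(Fraction(1), 4))        # [1, -6/5, 6/7, -4/9]
print(frobenius_coeffs(Fraction(-1, 2), 4))    # [1, 0, 0, 0]
```

Because every intermediate value is a `Fraction`, the computed coefficients match the hand calculation exactly rather than to floating-point accuracy.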
The method of Frobenius is justified by the following theorem.
THEOREM 4.3
Method of Frobenius
Suppose $x_0$ is a regular singular point of $P(x)y'' + Q(x)y' + R(x)y = 0$. Then there exists at least one Frobenius solution
$$y(x) = \sum_{n=0}^{\infty} c_n (x - x_0)^{n+r}$$
with $c_0 \neq 0$. Further, if the Taylor expansions of $(x - x_0)Q(x)/P(x)$ and $(x - x_0)^2 R(x)/P(x)$ about $x_0$ converge in an open interval $(x_0 - h, x_0 + h)$, then this Frobenius series also converges in this interval, except perhaps at $x_0$ itself.

It is significant that the theorem only guarantees the existence of one Frobenius solution. Although we obtained two such solutions in the preceding example, the next example shows that there may be only one.
EXAMPLE 4.11
Suppose we want to solve
$$x^2 y'' + 5xy' + (x + 4)y = 0.$$
Zero is a regular singular point, so attempt a Frobenius solution $y(x) = \sum_{n=0}^{\infty} c_n x^{n+r}$. Substitute into the differential equation to get
$$\sum_{n=0}^{\infty}(n+r)(n+r-1)c_n x^{n+r} + \sum_{n=0}^{\infty} 5(n+r)c_n x^{n+r} + \sum_{n=0}^{\infty} c_n x^{n+r+1} + \sum_{n=0}^{\infty} 4c_n x^{n+r} = 0.$$
Shift indices in the third summation to write this equation as
$$\sum_{n=0}^{\infty}(n+r)(n+r-1)c_n x^{n+r} + \sum_{n=0}^{\infty} 5(n+r)c_n x^{n+r} + \sum_{n=1}^{\infty} c_{n-1} x^{n+r} + \sum_{n=0}^{\infty} 4c_n x^{n+r} = 0.$$
Now combine terms to write
$$\left[r(r-1) + 5r + 4\right]c_0 x^r + \sum_{n=1}^{\infty}\left[(n+r)(n+r-1)c_n + 5(n+r)c_n + c_{n-1} + 4c_n\right]x^{n+r} = 0.$$
Setting the coefficient of $x^r$ equal to zero (since $c_0 \neq 0$ as part of the method), we get the indicial equation
$$r(r-1) + 5r + 4 = 0,$$
with the repeated root $r = -2$. The coefficient of $x^{n+r}$ in the series, with $r = -2$ inserted, gives us the recurrence relation
$$(n-2)(n-3)c_n + 5(n-2)c_n + c_{n-1} + 4c_n = 0,$$
or
$$c_n = -\frac{1}{(n-2)(n-3) + 5(n-2) + 4}\,c_{n-1}$$
for $n = 1, 2, \ldots$. This simplifies to
$$c_n = -\frac{1}{n^2}\,c_{n-1} \quad\text{for } n = 1, 2, 3, \ldots.$$
Some of the coefficients are
$$c_1 = -c_0,$$
$$c_2 = -\frac{1}{4}c_1 = \frac{1}{4}c_0 = \frac{1}{2^2}c_0,$$
$$c_3 = -\frac{1}{9}c_2 = -\frac{1}{4\cdot 9}c_0 = -\frac{1}{(2\cdot 3)^2}c_0,$$
$$c_4 = -\frac{1}{16}c_3 = \frac{1}{4\cdot 9\cdot 16}c_0 = \frac{1}{(2\cdot 3\cdot 4)^2}c_0,$$
and so on. In general,
$$c_n = (-1)^n \frac{1}{(n!)^2}\,c_0 \quad\text{for } n = 1, 2, 3, \ldots.$$
The Frobenius solution we have found is
$$y(x) = c_0\left(x^{-2} - x^{-1} + \frac{1}{4} - \frac{1}{36}x + \frac{1}{576}x^2 + \cdots\right) = c_0\sum_{n=0}^{\infty}(-1)^n\frac{1}{(n!)^2}\,x^{n-2}.$$
In this example, $xQ(x)/P(x) = x(5x)/x^2 = 5$ and $x^2 R(x)/P(x) = x^2(x+4)/x^2 = x + 4$. These polynomials are their own Maclaurin series about 0, and these series, being finite, converge for all $x$. By Theorem 4.3, the Frobenius series solution converges for all $x$, except $x = 0$.

In this example the method of Frobenius produces only one solution. In the last example the recurrence relation produced a simple formula for $c_n$ in terms of $c_0$. Depending on the coefficients in the differential equation, a formula for $c_n$ in terms of $c_0$ may be quite complicated, or it may not even be possible to write a formula in terms of elementary algebraic expressions. We will give another example, having some importance for later work, in which the Frobenius method may produce only one solution.
EXAMPLE 4.12 Bessel Functions of the First Kind
The differential equation
$$x^2 y'' + xy' + (x^2 - \nu^2)y = 0$$
is called Bessel's equation of order $\nu$, for $\nu \geq 0$. Although it is a second-order differential equation, this description of it as being of order $\nu$ refers to the parameter $\nu$ appearing in it, and is traditional. Solutions of Bessel's equation are called Bessel functions, and we will encounter them in Chapter 16 when we treat special functions, and again in Chapter 18 when we analyze heat conduction in an infinite cylinder. Zero is a regular singular point of Bessel's equation, so attempt a solution
$$y(x) = \sum_{n=0}^{\infty} c_n x^{n+r}.$$
Upon substituting this series into Bessel's equation, we obtain
$$\left[r(r-1) + r - \nu^2\right]c_0 x^r + \left[r(r+1) + r + 1 - \nu^2\right]c_1 x^{r+1} + \sum_{n=2}^{\infty}\left\{\left[(n+r)(n+r-1) + (n+r) - \nu^2\right]c_n + c_{n-2}\right\}x^{n+r} = 0. \tag{4.15}$$
Set the coefficient of each power of $x$ equal to zero. Assuming that $c_0 \neq 0$, we obtain the indicial equation
$$r^2 - \nu^2 = 0,$$
with roots $\pm\nu$. Let $r = \nu$ in the coefficient of $x^{r+1}$ in equation (4.15) to get
$$(2\nu + 1)c_1 = 0.$$
Since $2\nu + 1 \neq 0$, we conclude that $c_1 = 0$. From the coefficient of $x^{n+r}$ in equation (4.15), we get
$$\left[(n+r)(n+r-1) + (n+r) - \nu^2\right]c_n + c_{n-2} = 0$$
for $n = 2, 3, \ldots$. Set $r = \nu$ in this equation and solve for $c_n$ to get
$$c_n = -\frac{1}{n(n + 2\nu)}\,c_{n-2}$$
for $n = 2, 3, \ldots$. Since $c_1 = 0$, this equation yields
$$c_3 = c_5 = \cdots = c_{\text{odd}} = 0.$$
For the even-indexed coefficients, write
$$c_{2n} = -\frac{1}{2n(2n + 2\nu)}c_{2n-2} = -\frac{1}{2^2 n(n+\nu)}c_{2n-2} = \frac{1}{2^4 n(n-1)(n+\nu)(n+\nu-1)}c_{2n-4} = \cdots = \frac{(-1)^n}{2^{2n} n!\,(1+\nu)(2+\nu)\cdots(n+\nu)}c_0.$$
One Frobenius solution of Bessel's equation of order $\nu$ is therefore
$$y_1(x) = c_0\sum_{n=0}^{\infty}\frac{(-1)^n}{2^{2n} n!\,(1+\nu)(2+\nu)\cdots(n+\nu)}\,x^{2n+\nu}. \tag{4.16}$$
These functions are called Bessel functions of the first kind of order $\nu$. The roots of the indicial equation for Bessel's equation are $\pm\nu$. Depending on $\nu$, we may or may not obtain two linearly independent solutions by using $\nu$ and $-\nu$ in the series solution (4.16). We will discuss this in more detail when we treat Bessel functions in Chapter 16, where we will see that, when $\nu$ is a positive integer, the functions obtained by using $\nu$ and then $-\nu$ in the recurrence relation are linearly dependent.
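The series (4.16) converges rapidly and its partial sums are easy to evaluate. Here is a sketch (the helper name `bessel_series` is ours) that sums the first terms with $c_0 = 1$; for $\nu = 0$ this is exactly $J_0$, while for other $\nu$ it differs from $J_\nu$ by the constant factor $2^\nu\,\Gamma(1+\nu)$. We check it against two familiar values: $J_0(0) = 1$ and $J_0(x) \approx 0$ near the first zero $x \approx 2.404826$:

```python
from math import factorial

def bessel_series(nu, x, n_terms=25):
    # Partial sum of (4.16) with c_0 = 1:
    #   sum (-1)^n x^(2n+nu) / (2^(2n) n! (1+nu)(2+nu)...(n+nu))
    total = 0.0
    for n in range(n_terms):
        prod = 1.0
        for k in range(1, n + 1):  # (1+nu)(2+nu)...(n+nu); empty product = 1
            prod *= k + nu
        total += (-1)**n * x**(2*n + nu) / (2**(2*n) * factorial(n) * prod)
    return total

print(bessel_series(0, 0.0))       # 1.0
print(bessel_series(0, 2.404826))  # very close to 0 (first zero of J_0)
```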
SECTION 4.3   PROBLEMS

In each of Problems 1 through 6, find all of the singular points and classify each singular point as regular or irregular.

1. $x^2(x-3)^2 y'' + 4x(x^2 - x - 6)y' + (x^2 - x - 2)y = 0$
2. $(x^3 - 2x^2 - 7x - 4)y'' - 2(x^2 + 1)y' + (5x^2 - 2x)y = 0$
3. $x^2(x-2)y'' + (5x - 7)y' + 2(3 + 5x^2)y = 0$
4. $(9 - x^2)y'' + (2 + x^2)y = 0$
5. $(x-2)^{-1}y'' + x^{-5/2}y = 0$
6. $x^2\sin^2(x - \pi)y'' + \tan(x - \pi)\tan(x)\,y' + 7(x - 2)\cos(x)\,y = 0$

In each of Problems 7 through 15, (a) show that zero is a regular singular point of the differential equation, (b) find and solve the indicial equation, (c) determine the recurrence relation, and (d) use the results of (b) and (c) to find the first five nonzero terms of two linearly independent Frobenius solutions.

7. $4x^2 y'' + 2xy' - xy = 0$
8. $16x^2 y'' - 4x^2 y' + 3y = 0$
9. $9x^2 y'' + 2(2x + 1)y = 0$
10. $12x^2 y'' + 5xy' + (1 - 2x^2)y = 0$
11. $2xy'' + (2x + 1)y' + 2y = 0$
12. $2x^2 y'' - xy' + (1 - x^2)y = 0$
13. $2x^2 y'' + x(2x + 1)y' - (2x^2 + 1)y = 0$
14. $3x^2 y'' + 4xy' - (3x + 2)y = 0$
15. $9x^2 y'' + 9xy' + (9x^2 - 4)y = 0$

4.4  Second Solutions and Logarithm Factors
THEOREM 4.4
A Second Solution in the Method of Frobenius
Suppose 0 is a regular singular point of $P(x)y'' + Q(x)y' + R(x)y = 0$. Let $r_1$ and $r_2$ be the roots of the indicial equation. If these are real, suppose $r_1 \geq r_2$. Then:

1. If $r_1 - r_2$ is not an integer, there are two linearly independent Frobenius solutions
$$y_1(x) = \sum_{n=0}^{\infty} c_n x^{n+r_1} \quad\text{and}\quad y_2(x) = \sum_{n=0}^{\infty} c_n^* x^{n+r_2}$$
with $c_0 \neq 0$ and $c_0^* \neq 0$. These solutions are valid in some interval $(0, h)$ or $(-h, 0)$.
2. If $r_1 - r_2 = 0$, there is a Frobenius solution $y_1(x) = \sum_{n=0}^{\infty} c_n x^{n+r_1}$ with $c_0 \neq 0$, as well as a second solution
$$y_2(x) = y_1(x)\ln(x) + \sum_{n=1}^{\infty} c_n^* x^{n+r_1}.$$
Further, $y_1$ and $y_2$ form a fundamental set of solutions on some interval $(0, h)$.

3. If $r_1 - r_2$ is a positive integer, then there is a Frobenius series solution
$$y_1(x) = \sum_{n=0}^{\infty} c_n x^{n+r_1}.$$
In this case there is a second solution of the form
$$y_2(x) = ky_1(x)\ln(x) + \sum_{n=0}^{\infty} c_n^* x^{n+r_2}.$$
If $k = 0$ this is a second Frobenius series solution; if not, the solution contains a logarithm term. In either event, $y_1$ and $y_2$ form a fundamental set on some interval $(0, h)$.

We may now summarize the method of Frobenius as follows, for the equation $P(x)y'' + Q(x)y' + R(x)y = 0$. Suppose 0 is a regular singular point. Substitute $y(x) = \sum_{n=0}^{\infty} c_n x^{n+r}$ into the differential equation. From the indicial equation, determine the values of $r$. If these are distinct and do not differ by an integer, we are guaranteed two linearly independent Frobenius solutions.

If the indicial equation has repeated roots, then there is just one Frobenius solution $y_1$. But there is a second solution
$$y_2(x) = y_1(x)\ln(x) + \sum_{n=1}^{\infty} c_n^* x^{n+r_1}.$$
The series on the right starts its summation at $n = 1$, not $n = 0$. Substitute $y_2(x)$ into the differential equation and obtain a recurrence relation for the coefficients $c_n^*$. Because this solution has a logarithm term, $y_1$ and $y_2$ are linearly independent.

If $r_1 - r_2$ is a positive integer, there may or may not be a second Frobenius solution. In this case there is a second solution of the form
$$y_2(x) = ky_1(x)\ln(x) + \sum_{n=0}^{\infty} c_n^* x^{n+r_2}.$$
Substitute $y_2$ into the differential equation and obtain an equation for $k$ and a recurrence relation for the coefficients $c_n^*$. If $k = 0$, we obtain a second Frobenius solution; if not, then this second solution has a logarithm term. In either case $y_1$ and $y_2$ are linearly independent.

In the preceding section we saw in Example 4.10 a differential equation in which $r_1 - r_2$ was not an integer. There we found two linearly independent Frobenius solutions. We will now illustrate cases (2) and (3) of the theorem.
EXAMPLE 4.13 Conclusion (2), Equal Roots
Consider again $x^2 y'' + 5xy' + (x + 4)y = 0$. In Example 4.11 we found one Frobenius solution
$$y_1(x) = c_0\sum_{n=0}^{\infty}(-1)^n\frac{1}{(n!)^2}\,x^{n-2}.$$
The indicial equation is $(r + 2)^2 = 0$, with the repeated root $r = -2$. Conclusion (2) of the theorem suggests that we attempt a second solution of the form
$$y_2(x) = y_1(x)\ln(x) + \sum_{n=1}^{\infty} c_n^* x^{n-2}.$$
Note that the series on the right begins at $n = 1$, not $n = 0$. Substitute this series into the differential equation to get, after some rearrangement of terms,
$$4y_1 + 2xy_1' + \sum_{n=1}^{\infty}(n-2)(n-3)c_n^* x^{n-2} + \sum_{n=1}^{\infty} 5(n-2)c_n^* x^{n-2} + \sum_{n=1}^{\infty} c_n^* x^{n-1} + \sum_{n=1}^{\infty} 4c_n^* x^{n-2} + \ln(x)\left[x^2 y_1'' + 5xy_1' + (x+4)y_1\right] = 0.$$
The bracketed coefficient of $\ln(x)$ is zero because $y_1$ is a solution of the differential equation. In the last equation, choose $c_0 = 1$ (we need only one second solution), shift indices to write $\sum_{n=1}^{\infty} c_n^* x^{n-1} = \sum_{n=2}^{\infty} c_{n-1}^* x^{n-2}$, and substitute the series obtained for $y_1(x)$ to get
$$\left(c_1^* - 2\right)x^{-1} + \sum_{n=2}^{\infty}\left[\frac{2n(-1)^n}{(n!)^2} + n^2 c_n^* + c_{n-1}^*\right]x^{n-2} = 0.$$
Set the coefficient of each power of $x$ equal to zero. From the coefficient of $x^{-1}$ we get
$$c_1^* = 2.$$
From the coefficient of $x^{n-2}$ in the summation we get, after some routine algebra,
$$\frac{2n(-1)^n}{(n!)^2} + n^2 c_n^* + c_{n-1}^* = 0,$$
or
$$c_n^* = -\frac{1}{n^2}c_{n-1}^* - \frac{2(-1)^n}{n\,(n!)^2}$$
for $n = 2, 3, 4, \ldots$. This enables us to calculate as many coefficients as we wish. Some of the terms of the resulting solution are
$$y_2(x) = y_1(x)\ln(x) + \frac{2}{x} - \frac{3}{4} + \frac{11}{108}x - \frac{25}{3456}x^2 + \frac{137}{432{,}000}x^3 + \cdots.$$
Because of the logarithm term, it is obvious that this solution is not a constant multiple of $y_1$, so $y_1$ and $y_2$ form a fundamental set of solutions (on some interval $(0, h)$). The general solution is
$$y(x) = \left(C_1 + C_2\ln(x)\right)\sum_{n=0}^{\infty}\frac{(-1)^n}{(n!)^2}\,x^{n-2} + C_2\left[\frac{2}{x} - \frac{3}{4} + \frac{11}{108}x - \frac{25}{3456}x^2 + \frac{137}{432{,}000}x^3 + \cdots\right].$$
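The recurrence for the $c_n^*$ is straightforward to iterate. This sketch, in exact arithmetic with $c_1^* = 2$, reproduces the coefficients $-\frac{3}{4}$, $\frac{11}{108}$, $-\frac{25}{3456}$, $\frac{137}{432{,}000}$:

```python
from fractions import Fraction
from math import factorial

cstar = {1: Fraction(2)}  # c_1* = 2, from the coefficient of x^{-1}
for n in range(2, 6):
    # c_n* = -(1/n^2) c_{n-1}* - 2(-1)^n / (n (n!)^2)
    cstar[n] = -cstar[n - 1] / n**2 - Fraction(2 * (-1)**n, n * factorial(n)**2)
print([cstar[n] for n in range(2, 6)])
# [Fraction(-3, 4), Fraction(11, 108), Fraction(-25, 3456), Fraction(137, 432000)]
```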
EXAMPLE 4.14 Conclusion (3), with k = 0
The equation $x^2 y'' + x^2 y' - 2y = 0$ has a regular singular point at 0. Substitute $y(x) = \sum_{n=0}^{\infty} c_n x^{n+r}$ and shift indices to obtain
$$\left[r(r-1) - 2\right]c_0 x^r + \sum_{n=1}^{\infty}\left[(n+r)(n+r-1)c_n + (n+r-1)c_{n-1} - 2c_n\right]x^{n+r} = 0.$$
Assume that $c_0 \neq 0$. The indicial equation is $r^2 - r - 2 = 0$, with roots $r_1 = 2$ and $r_2 = -1$. Now $r_1 - r_2 = 3$ and case (3) of the theorem applies. For a first solution, set the coefficient of $x^{n+r}$ equal to zero to get
$$(n+r)(n+r-1)c_n + (n+r-1)c_{n-1} - 2c_n = 0. \tag{4.17}$$
Let $r = 2$ to get
$$(n+2)(n+1)c_n + (n+1)c_{n-1} - 2c_n = 0,$$
or
$$c_n = -\frac{n+1}{n(n+3)}\,c_{n-1} \quad\text{for } n = 1, 2, \ldots.$$
Using this recurrence relation to generate terms of the series, we obtain
$$y_1(x) = c_0 x^2\left(1 - \frac{1}{2}x + \frac{3}{20}x^2 - \frac{1}{30}x^3 + \frac{1}{168}x^4 - \frac{1}{1120}x^5 + \frac{1}{8640}x^6 + \cdots\right).$$
Now try the second root $r = -1$ in the recurrence relation (4.17). We get
$$(n-1)(n-2)c_n^* + (n-2)c_{n-1}^* - 2c_n^* = 0$$
for $n = 1, 2, \ldots$. When $n = 3$, this gives $c_2^* = 0$, which forces $c_n^* = 0$ for $n \geq 2$. But then
$$y_2(x) = c_0^*\frac{1}{x} + c_1^*.$$
Substitute this into the differential equation to get
$$x^2\left(2c_0^* x^{-3}\right) + x^2\left(-c_0^* x^{-2}\right) - 2\left(c_0^*\frac{1}{x} + c_1^*\right) = -c_0^* - 2c_1^* = 0.$$
Then $c_1^* = -\frac{1}{2}c_0^*$ and we obtain the second solution
$$y_2(x) = c_0^*\left(\frac{1}{x} - \frac{1}{2}\right),$$
with $c_0^*$ nonzero but otherwise arbitrary. The functions $y_1$ and $y_2$ form a fundamental set of solutions.
EXAMPLE 4.15 Conclusion (3), k ≠ 0
Consider the differential equation $xy'' - y = 0$, which has a regular singular point at 0. Substitute $y(x) = \sum_{n=0}^{\infty} c_n x^{n+r}$ to obtain
$$\sum_{n=0}^{\infty}(n+r)(n+r-1)c_n x^{n+r-1} - \sum_{n=0}^{\infty} c_n x^{n+r} = 0.$$
Shift indices in the second summation to write this equation as
$$\left(r^2 - r\right)c_0 x^{r-1} + \sum_{n=1}^{\infty}\left[(n+r)(n+r-1)c_n - c_{n-1}\right]x^{n+r-1} = 0.$$
The indicial equation is $r^2 - r = 0$, with roots $r_1 = 1$ and $r_2 = 0$. Here $r_1 - r_2 = 1$, a positive integer, so we are in case (3) of the theorem. The recurrence relation is
$$(n+r)(n+r-1)c_n - c_{n-1} = 0 \quad\text{for } n = 1, 2, \ldots.$$
Let $r = 1$ and solve for $c_n$:
$$c_n = \frac{1}{n(n+1)}\,c_{n-1} \quad\text{for } n = 1, 2, 3, \ldots.$$
Some of the coefficients are
$$c_1 = \frac{1}{1\cdot 2}c_0, \quad c_2 = \frac{1}{2\cdot 3}c_1 = \frac{1}{2\cdot 2\cdot 3}c_0, \quad c_3 = \frac{1}{3\cdot 4}c_2 = \frac{1}{2\cdot 3\cdot 2\cdot 3\cdot 4}c_0.$$
In general, we find that
$$c_n = \frac{1}{n!\,(n+1)!}\,c_0 \quad\text{for } n = 1, 2, \ldots.$$
This gives us a Frobenius series solution
$$y_1(x) = c_0\sum_{n=0}^{\infty}\frac{1}{n!\,(n+1)!}\,x^{n+1} = c_0\left(x + \frac{1}{2}x^2 + \frac{1}{12}x^3 + \frac{1}{144}x^4 + \cdots\right).$$
In this example, if we put $r = 0$ into the recurrence relation, we get
$$n(n-1)c_n - c_{n-1} = 0 \quad\text{for } n = 1, 2, \ldots.$$
But if we put $n = 1$ into this equation, we get $c_0 = 0$, contrary to the assumption that $c_0 \neq 0$. Unlike the preceding example, we cannot find a second Frobenius solution by simply putting $r_2$ into the recurrence relation. Try a second solution
$$y_2(x) = ky_1(x)\ln(x) + \sum_{n=0}^{\infty} c_n^* x^n$$
(here $x^{n+r_2} = x^n$ because $r_2 = 0$). Substitute this into the differential equation to get
$$x\left[ky_1''\ln(x) + 2ky_1'\frac{1}{x} - ky_1\frac{1}{x^2} + \sum_{n=2}^{\infty} n(n-1)c_n^* x^{n-2}\right] - ky_1\ln(x) - \sum_{n=0}^{\infty} c_n^* x^n = 0. \tag{4.18}$$
Now $k\ln(x)\left[xy_1'' - y_1\right] = 0$ because $y_1$ is a solution of the differential equation. For the remaining terms in equation (4.18), insert the series for $y_1(x)$ (with $c_0 = 1$ for convenience) to get
$$2k\sum_{n=0}^{\infty}\frac{1}{n!\,n!}\,x^n - k\sum_{n=0}^{\infty}\frac{1}{n!\,(n+1)!}\,x^n + \sum_{n=2}^{\infty} n(n-1)c_n^* x^{n-1} - \sum_{n=0}^{\infty} c_n^* x^n = 0.$$
Shift indices in the third summation to write this equation as
$$2k\sum_{n=0}^{\infty}\frac{1}{n!\,n!}\,x^n - k\sum_{n=0}^{\infty}\frac{1}{n!\,(n+1)!}\,x^n + \sum_{n=1}^{\infty}(n+1)n\,c_{n+1}^* x^n - \sum_{n=0}^{\infty} c_n^* x^n = 0.$$
Then
$$\left(2k - k - c_0^*\right)x^0 + \sum_{n=1}^{\infty}\left[\frac{2k}{(n!)^2} - \frac{k}{n!\,(n+1)!} + n(n+1)c_{n+1}^* - c_n^*\right]x^n = 0.$$
Then $k - c_0^* = 0$ and, for $n = 1, 2, \ldots$,
$$\frac{2k}{(n!)^2} - \frac{k}{n!\,(n+1)!} + n(n+1)c_{n+1}^* - c_n^* = 0.$$
This gives us $k = c_0^*$ and the recurrence relation
$$c_{n+1}^* = \frac{1}{n(n+1)}\left[c_n^* - \frac{(2n+1)k}{n!\,(n+1)!}\right]$$
for $n = 1, 2, 3, \ldots$. Since $c_0^*$ can be any nonzero real number, we may choose $c_0^* = 1$. Then $k = 1$. For a particular second solution, let $c_1^* = 0$, obtaining
$$y_2(x) = y_1(x)\ln(x) + 1 - \frac{3}{4}x^2 - \frac{7}{36}x^3 - \frac{35}{1728}x^4 - \cdots.$$
To conclude this section, we will produce a second solution for Bessel's equation, in a case where the Frobenius method yields only one solution. This will be of use later when we study Bessel functions.
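In the logarithm case it is just as easy to iterate the recurrence mechanically. With $k = 1$, $c_0^* = 1$, and $c_1^* = 0$, this sketch reproduces the coefficients $-\frac{3}{4}$, $-\frac{7}{36}$, $-\frac{35}{1728}$ of $x^2$, $x^3$, $x^4$ above:

```python
from fractions import Fraction
from math import factorial

k = Fraction(1)
cstar = {0: Fraction(1), 1: Fraction(0)}  # c_0* = 1 (so k = 1), c_1* = 0
for n in range(1, 5):
    # c_{n+1}* = (1/(n(n+1))) [ c_n* - (2n+1) k / (n! (n+1)!) ]
    cstar[n + 1] = (cstar[n] - (2*n + 1) * k / (factorial(n) * factorial(n + 1))) / (n * (n + 1))
print([cstar[j] for j in (2, 3, 4)])
# [Fraction(-3, 4), Fraction(-7, 36), Fraction(-35, 1728)]
```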
EXAMPLE 4.16 Bessel Function of the Second Kind
Consider Bessel’s equation of zero order = 0. From Example 4.12, this is x2 y + xy + x2 y = 0 We know from that example that the indicial equation has only one root, r = 0. From equation (4.16), with c0 = 1, one Frobenius solution is y1 x =
k=0
−1k
1 22k k!2
x2k
Attempt a second, linearly independent solution of the form y2 x = y1 x lnx +
k=1
ck∗ xk
Substitute $y_2(x)$ into the differential equation (and divide by $x$) to get
$$\left[xy_1''\ln(x) + 2y_1' - \frac{1}{x}y_1 + \sum_{k=2}^{\infty} k(k-1)c_k^* x^{k-1}\right] + \left[y_1'\ln(x) + \frac{1}{x}y_1 + \sum_{k=1}^{\infty} kc_k^* x^{k-1}\right] + \left[xy_1\ln(x) + \sum_{k=1}^{\infty} c_k^* x^{k+1}\right] = 0.$$
Terms involving $\ln(x)$ and $y_1(x)/x$ cancel, and we are left with
$$2y_1' + \sum_{k=2}^{\infty} k(k-1)c_k^* x^{k-1} + \sum_{k=1}^{\infty} kc_k^* x^{k-1} + \sum_{k=1}^{\infty} c_k^* x^{k+1} = 0.$$
Since $k(k-1) = k^2 - k$, part of the first summation cancels all terms except the $k = 1$ term in the second summation, and we have
$$2y_1' + \sum_{k=2}^{\infty} k^2 c_k^* x^{k-1} + c_1^* + \sum_{k=1}^{\infty} c_k^* x^{k+1} = 0.$$
Substitute the series for $y_1'$ into this equation to get
$$2\sum_{k=1}^{\infty}\frac{(-1)^k}{2^{2k-1}k!\,(k-1)!}\,x^{2k-1} + \sum_{k=2}^{\infty} k^2 c_k^* x^{k-1} + c_1^* + \sum_{k=1}^{\infty} c_k^* x^{k+1} = 0.$$
Shift indices in the last series to write this equation as
$$\sum_{k=1}^{\infty}\frac{(-1)^k}{2^{2k-2}k!\,(k-1)!}\,x^{2k-1} + c_1^* + 4c_2^* x + \sum_{k=3}^{\infty}\left[k^2 c_k^* + c_{k-2}^*\right]x^{k-1} = 0. \tag{4.19}$$
The only constant term on the left side of this equation is $c_1^*$, which must therefore be zero. The only even powers of $x$ appearing in equation (4.19) occur in the right-most series when $k$ is odd. The coefficients of these powers of $x$ must be zero, hence
$$k^2 c_k^* + c_{k-2}^* = 0 \quad\text{for } k = 3, 5, 7, \ldots.$$
But then all odd-indexed coefficients are multiples of $c_1^*$, which is zero, so
$$c_{2k+1}^* = 0 \quad\text{for } k = 0, 1, 2, \ldots.$$
To determine the even-indexed coefficients, replace $k$ by $2j$ in the second summation of equation (4.19) and $k$ by $j$ in the first summation to get
$$\sum_{j=1}^{\infty}\frac{(-1)^j}{2^{2j-2}j!\,(j-1)!}\,x^{2j-1} + 4c_2^* x + \sum_{j=2}^{\infty}\left[4j^2 c_{2j}^* + c_{2j-2}^*\right]x^{2j-1} = 0.$$
Now combine terms and write this equation as
$$\left(4c_2^* - 1\right)x + \sum_{j=2}^{\infty}\left[\frac{(-1)^j}{2^{2j-2}j!\,(j-1)!} + 4j^2 c_{2j}^* + c_{2j-2}^*\right]x^{2j-1} = 0.$$
Equate the coefficient of each power of $x$ to zero. We get
$$c_2^* = \frac{1}{4}$$
and the recurrence relation
$$c_{2j}^* = \frac{(-1)^{j+1}}{2^{2j}(j!)^2\,j} - \frac{1}{4j^2}\,c_{2j-2}^* \quad\text{for } j = 2, 3, 4, \ldots.$$
If we write some of these coefficients, a pattern emerges:
$$c_4^* = \frac{-1}{2^4(2!)^2}\left(1 + \frac{1}{2}\right), \qquad c_6^* = \frac{1}{2^6(3!)^2}\left(1 + \frac{1}{2} + \frac{1}{3}\right),$$
and, in general,
$$c_{2j}^* = \frac{(-1)^{j+1}}{2^2 4^2\cdots(2j)^2}\,\varphi(j) = \frac{(-1)^{j+1}}{2^{2j}(j!)^2}\,\varphi(j),$$
where
$$\varphi(j) = 1 + \frac{1}{2} + \cdots + \frac{1}{j} \quad\text{for } j = 1, 2, \ldots.$$
We therefore have a second solution of Bessel's equation of order zero:
$$y_2(x) = y_1(x)\ln(x) + \sum_{k=1}^{\infty}\frac{(-1)^{k+1}}{2^{2k}(k!)^2}\,\varphi(k)\,x^{2k}$$
for $x > 0$. This solution is linearly independent from $y_1(x)$ for $x > 0$.

When a differential equation with a regular singular point has only one Frobenius series solution expanded about that point, it is tempting to try reduction of order to find a second solution. This is a workable strategy if we can write $y_1(x)$ in closed form. But if $y_1(x)$ is an infinite series, it may be better to substitute the appropriate form of the second solution from Theorem 4.4 and solve for the coefficients.
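The closed form involving the harmonic numbers $\varphi(j)$ can be confirmed directly against the recurrence (a sketch in exact arithmetic; `harmonic` is our helper name):

```python
from fractions import Fraction
from math import factorial

def harmonic(j):
    # phi(j) = 1 + 1/2 + ... + 1/j
    return sum(Fraction(1, i) for i in range(1, j + 1))

c = {1: Fraction(1, 4)}  # c_2* = 1/4
for j in range(2, 9):
    # Recurrence: c_{2j}* = (-1)^{j+1} / (2^{2j} (j!)^2 j) - c_{2j-2}*/(4 j^2)
    c[j] = Fraction((-1)**(j + 1), 2**(2*j) * factorial(j)**2 * j) - c[j - 1] / (4 * j**2)
    # Closed form: c_{2j}* = (-1)^{j+1} phi(j) / (2^{2j} (j!)^2)
    assert c[j] == Fraction((-1)**(j + 1), 2**(2*j) * factorial(j)**2) * harmonic(j)
print(c[2], c[3])  # -3/128 and 11/13824
```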
SECTION 4.4   PROBLEMS

In each of Problems 1 through 10, (a) find the indicial equation, (b) determine the appropriate form of each of two linearly independent solutions, and (c) find the first five terms of each of two linearly independent solutions. In Problems 11 through 16, find only the form that two linearly independent solutions should take.

1. $xy'' + (1 - x)y' + y = 0$
2. $xy'' - 2xy' + 2y = 0$
3. $x(x-1)y'' + 3y' - 2y = 0$
4. $4x^2 y'' + 4xy' + (4x^2 - 9)y = 0$
5. $4xy'' + 2y' + y = 0$
6. $4x^2 y'' + 4xy' - y = 0$
7. $x^2 y'' - 2xy' - (x^2 - 2)y = 0$
8. $xy'' - y' + 2y = 0$
9. $(x^2 - x)y'' - (2x - 1)y' + 2y = 0$
10. $x^2 y'' + x(x^3 + 1)y' - y = 0$
11. $25x(1-x)^2 y'' - 20(5x - 2)y' + \left(25x - \dfrac{4}{x}\right)y = 0$
12. 63x − 45x + 8y + 2x − 21 3 + x 27 4 − 2 y = 0 x 16
13. 12x4 + 3xy − 25x + 77x − 2y + 1 24 5 − 3x 8 y = 0
14. 3x2x + 3y + 26 − 5xy + 7 2x − x y = 0
15. $x(x+4)y'' - (3x - 2)y' + 2y = 0$
16. $(3x^3 + x^2)y'' - x(10x + 1)y' + (x^2 + 2)y = 0$
CHAPTER 5

Numerical Approximation of Solutions
Often we are unable to produce a solution of an initial value problem in a form suitable for drawing a graph or calculating numerical values. When this happens we may turn to a scheme for approximating numerical values of the solution. Although the idea of a numerical approximation is not new, it is the development and ready accessibility of high-speed computers that have made it the success that it is today. Some problems thought to be intractable thirty years ago are now considered solved from a practical point of view. Using computers and numerical approximation techniques, we now have increasingly accurate models for weather patterns, national and international economies, global warming, ecological systems, fluid flow around airplane wings and ship hulls, and many other phenomena of interest and importance.

A good numerical approximation scheme usually includes the following features.

1. At least for first-order initial value problems, the scheme usually starts at a point $x_0$ where the initial value is prescribed, then builds approximate values of the solution at points specified to the left or right of $x_0$. The accuracy of the method will depend on the distance between successive points at which the approximations are made, their increasing distance from $x_0$, and of course the coefficients in the differential equation. Accuracy can also be influenced by the programming and by the architecture of the computer. For some complex models, such as the Navier-Stokes equations governing fluid flow, computers have been built with architecture dedicated to efficient approximation of solutions of that particular model.

2. A good numerical scheme includes an estimate or bound on the error in the approximation. This is used to understand the accuracy of the approximation, and often to guide the user in choosing certain parameters (such as the number of points at which approximations are made, and the distance between successive points). Often a compromise must be made between increasing accuracy (say, by choosing more points) and keeping the time or cost of the computation within reason. The type of problem under consideration may dictate what might be acceptable bounds on the error. If NASA is placing a satellite in a Jupiter orbit, a one-meter error might be acceptable, while an error of this magnitude would be catastrophic in eye surgery.
3. The method must be implemented on a computer. Only simple examples, devised for illustrative purposes, can be done by hand. Many commercially available software packages include routines for approximating and graphing solutions of differential equations. Among these are MAPLE, MATHEMATICA and MATLAB. We will now develop some specific methods.
Euler’s Method Euler’s method is a scheme for approximating the solution of y = fx y
yx0 = y0
in which x0 , y0 and the function f are given. The method is a good introduction to numerical schemes because it is conceptually simple and geometrically appealing, although it is not the most accurate. Let yx denote the solution (which we know exists, but do not know explicitly). The key to Euler’s method is that if we know yx at some x, then we can compute fx yx, and therefore know the slope y x of the tangent to the graph of the solution at that point. We will exploit this fact to approximate solution values at points x1 = x0 + h, x2 = x0 + 2h, , xn = x0 + nh. First choose h (the step size) and the number n of iterations to be performed. Now form the first approximation. We know yx0 = y0 . Calculate fx0 y0 and draw the line having this slope through x0 y0 . This line is tangent to the integral curve through x0 y0 . Move along this tangent line to the point x1 y1 . Use y1 as an approximation to yx1 . This is illustrated in Figure 5.1. We have some hope that this is a "good" approximation, for h "small", because the tangent line fits the curve closely "near" the point. Next compute fx1 y1 . This is the slope of the tangent to the graph of the solution of the differential equation passing through x1 y1 . Draw the line through x1 y1 having this slope, and move along this line to x2 y2 . This determines y2 , which we take as an approximation to yx2 (see Figure 5.1 again). Continue in this way. Compute fx2 y2 and draw the line with this slope through x2 y2 . Move along this line to x3 y3 and use y3 as an approximation to yx3 . pe slo ) ,y2 x2 f(
y
(x2, y2)
(x1, y1)
slope y) f(x 1, 1
(x3, y3) (x0, y0)
x1 x2
x x3
slo pe 0, y 0)
x0
f (x
5.1
FIGURE 5.1 Approximation points formed according to Euler’s method.
5.1 Euler’s Method
183
In general, once we have reached $(x_k, y_k)$, draw the line through this point having slope $f(x_k, y_k)$ and move along this line to $(x_{k+1}, y_{k+1})$. Take $y_{k+1}$ as an approximation to $y(x_{k+1})$. This is the idea of the method. Obviously it is quite sensitive to how much $f(x, y)$ changes if $x$ and $y$ are varied by a small amount. The method also tends to accumulate error, since we use the approximation $y_k$ to make the approximation $y_{k+1}$. In Figure 5.2, the successively drawn line segments used to determine the approximate values move away from the actual solution curve as $x$ increases, causing the approximations to be less accurate as more of them are made (that is, as $n$ is chosen larger). Following segments of lines is conceptually simple and appealing, but it is not sophisticated enough to be very accurate in general.

We will now derive an analytic expression for the approximate solution value $y_k$ at $x_k$. From Figure 5.1,
$$y_1 = y_0 + f(x_0, y_0)(x_1 - x_0).$$
At the next step,
$$y_2 = y_1 + f(x_1, y_1)(x_2 - x_1).$$
After we have obtained the approximate value $y_k$, the next step (Figure 5.3) gives
$$y_{k+1} = y_k + f(x_k, y_k)(x_{k+1} - x_k).$$
Since each $x_{k+1} - x_k = h$, we can summarize the discussion as follows.

DEFINITION 5.1

Euler's Method

Euler's method is to define $y_{k+1}$ in terms of $y_k$ by
$$y_{k+1} = y_k + f(x_k, y_k)(x_{k+1} - x_k),$$
or
$$y_{k+1} = y_k + hf(x_k, y_k)$$
for $k = 0, 1, 2, \ldots, n-1$. $y_k$ is the Euler approximation to $y(x_k)$.
FIGURE 5.2 Accumulating error in Euler's method.

FIGURE 5.3 The Euler step from $(x_k, y_k)$ along the line of slope $f(x_k, y_k)$ to $(x_{k+1}, y_{k+1})$.
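In code, Definition 5.1 is a one-line update inside a loop. A minimal sketch (the function name `euler` is ours, not the text's):

```python
def euler(f, x0, y0, h, n):
    """Approximate the solution of y' = f(x, y), y(x0) = y0, at the
    points x0, x0 + h, ..., x0 + n*h.  Returns the list of (x_k, y_k)."""
    xs, ys = [x0], [y0]
    for k in range(n):
        ys.append(ys[-1] + h * f(xs[-1], ys[-1]))  # y_{k+1} = y_k + h f(x_k, y_k)
        xs.append(xs[-1] + h)
    return list(zip(xs, ys))

# A quick sanity check on y' = y, y(0) = 1, whose exact solution is e^x:
pts = euler(lambda x, y: y, 0.0, 1.0, 0.01, 100)
print(pts[-1])  # y(1) is approximated by about 2.7048; the exact value is e = 2.71828...
```

For this problem the Euler iterates are exactly $y_k = (1.01)^k$, which makes the first-order error visible: halving $h$ roughly halves the error at $x = 1$.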
EXAMPLE 5.1
Consider
$$y' = x\sqrt{y}, \quad y(2) = 4.$$
This separable differential equation is easily solved:
$$y(x) = \left(1 + \frac{x^2}{4}\right)^2.$$
This enables us to observe how the method works by direct comparison with the exact solution. First we must decide on $h$ and $n$. Since we do not have any error estimates, we have no rationale for making a particular choice. For illustration, choose $h = 0.2$ and $n = 20$. Then $x_0 = 2$ and $x_{20} = x_0 + nh = 2 + 20(0.2) = 6$. Now
$$y_{k+1} = y_k + 0.2\,x_k\sqrt{y_k}$$
for $k = 0, 1, 2, \ldots, 19$. Table 5.1 lists the Euler approximations, and Figure 5.4 shows a graph of this approximate solution (actually, a smooth curve drawn through the approximated points), together with a graph of the actual solution, for comparison. Notice that the approximation becomes less accurate as $x$ moves farther from 2.
TABLE 5.1  Approximate Values of the Solution of $y' = x\sqrt{y}$, $y(2) = 4$; $h = 0.2$, $n = 20$

x     y_app(x)           x     y_app(x)
2.0   4                  4.2   26.62097204
2.2   4.8                4.4   30.95499533
2.4   5.763991701        4.6   35.85107012
2.6   6.916390802        4.8   41.35964033
2.8   8.283940462        5.0   47.53354060
3.0   9.895723242        5.2   54.42799784
3.2   11.78317135        5.4   62.10063249
3.4   13.98007530        5.6   70.61145958
3.6   16.52259114        5.8   80.02288959
3.8   19.44924644        6.0   90.39972950
4.0   22.80094522
FIGURE 5.4 Exact and Euler approximate solutions of $y' = x\sqrt{y}$, $y(2) = 4$, with step size 0.2 and twenty iterations.
5.1 Euler’s Method
TA B L E 5 . 2
185
√ Approximate Values of the Solution of y = x yy2 = 4h = 01n = 40
x
yapp x
x
yapp x
2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4.0
4 44 44840499716 5324524701 5855248129 6435991009 7070222385 7761559519 8513768060 9330762195 1021660479 1117550751 1221183095 1333008472 1453492754 1583116734 1722376132 1871781603 2031858741 2203148088 2386205135
4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 6.0
25.8160330 27.89919080 30.11761755 32.47743693 34.98495199 37.64661553 40.46903007 43.45894792 46.62327117 49.96905171 53.50349126 57.23394138 61.16790347 65.31302881 69.67711854 74.26812370 79.09414521 84.16343390 89.48439050 95.06556568
FIGURE 5.5 Exact and Euler approximate solutions of $y' = x\sqrt{y}$, $y(2) = 4$, first with $h = 0.2$ and twenty iterations, and then with $h = 0.1$ and forty iterations.
The accuracy of this method depends on h. If we choose h = 01 and n = 40 (so the approximation is still for 2 ≤ x ≤ 6), we get the approximate values of Table 5.2. A graph of this approximation is shown in Figure 5.5, showing an improved approximation by choosing h smaller. With today’s computing power, we would have no difficulty using a much smaller h.
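The entries of Table 5.1 are easy to regenerate; nothing beyond the recurrence $y_{k+1} = y_k + 0.2\,x_k\sqrt{y_k}$ is needed (a sketch):

```python
from math import sqrt

x, y = 2.0, 4.0                  # initial condition y(2) = 4
for k in range(20):              # h = 0.2, n = 20 steps from x = 2 to x = 6
    y += 0.2 * x * sqrt(y)       # Euler update for y' = x sqrt(y)
    x += 0.2
print(x, y)  # about 90.3997 at x = 6; the exact solution gives y(6) = (1 + 36/4)^2 = 100
```

The roughly ten percent error at $x = 6$, and its reduction with $h = 0.1$ in Table 5.2, is exactly the first-order behavior discussed below.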
EXAMPLE 5.2
Consider
$$y' = \sin(xy), \quad y(2) = 1.$$
We cannot write a simple solution for this problem. Figure 5.6 shows a direction field for $y' = \sin(xy)$, and Figure 5.7 repeats this direction field, with some integral curves, including the one through $(2, 1)$. This is a graph of the solution (actually an approximation done by the software used for the direction field). For a numerical approximation of the solution, choose $h = 0.2$ and $n = 20$ to obtain an approximate solution for $2 \leq x \leq 6$. The generated values are given in Table 5.3, and a smooth curve is drawn through these points in Figure 5.8.

FIGURE 5.6 A direction field for $y' = \sin(xy)$.

FIGURE 5.7 A direction field and some integral curves for $y' = \sin(xy)$, including the integral curve through $(2, 1)$.
5.1 Euler’s Method
TA B L E 5 . 3
187
Approximate Values of the Solution of y =sinxyy2=1h=02n=20
x
yapp x
x
yapp x
2.0 2.2 2.4 2.6 2.8 3.0 3.2 3.4 2.6 3.8 4.0
1 1.181859485 1.284944186 1.296483096 1.251031045 1.180333996 1.102559149 1.027151468 0.9584362156 0.8976573370 0.8444064257
4.2 4.4 4.6 4.8 5.0 5.2 5.4 5.6 5.8 6.0
0.7976369224 0.7562418383 0.7192812325 0.6860163429 0.6558744703 0.6248056314 0.6032491169 0.5801105326 0.5587461082 0.5389516129
FIGURE 5.8 Euler approximate solution of $y' = \sin(xy)$, $y(2) = 1$; $h = 0.2$, $n = 20$.
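The same loop handles Example 5.2, where no elementary solution is available (a sketch):

```python
from math import sin

x, y = 2.0, 1.0              # initial condition y(2) = 1
for k in range(20):          # h = 0.2, n = 20
    y += 0.2 * sin(x * y)    # Euler update for y' = sin(xy)
    x += 0.2
print(x, round(y, 6))        # about 0.538952 at x = 6, matching Table 5.3
```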
From the examples, it appears that the error in an Euler approximation is proportional to h. It can be shown that this is indeed the case, and for this reason Euler's method is a first order method. If a method has error that is proportional to h^p, it is called an order p method.
5.1.1
A Problem in Radioactive Waste Disposal
Disposal of radioactive waste materials generated by nuclear power plants, medical research, military testing and other sources is a serious international problem. In view of the long half-lives of some of the materials involved, there is no real prospect for disposal, and the problem becomes one of safe storage. For example, Uranium-235 has a half-life of 7.13 × 10⁸ years, and Thorium-232 a half-life of 1.39 × 10¹⁰ years. If a ton of Thorium-232 is stored, more than half of it will still be here to see our sun consume all of its fuel and die. Storage plans have been proposed and studied on both the national level (for example, by the U.S. Atomic Energy Commission) and the international level (by the International Atomic Energy Agency of the United Nations). Countries have developed a variety of policies and plans. Argentina has embarked on a program of storing containers of radioactive materials in granite vaults. Belgium is planning to bury containers in clay deposits. Canada plans to use crystalline rocks in the Canadian shield. The Netherlands is considering salt domes. France and Japan are planning undersea depositories. And the United States has a diversified approach
which has included sites at Yucca Mountain in Nevada and the Hanford Reservation in the state of Washington. One idea which has been considered is to store the material in fifty-five gallon containers and drop them into the ocean at a point about 300 feet deep, shallow enough to prevent rupture of the drums by water pressure. It was found that drums could be manufactured that would endure indefinitely at this depth. But then another point was raised. Would the drums withstand the impact of settling on the ocean floor after being dropped from a ship? Testing showed that the drums could indeed rupture if they impacted the bottom at a speed in excess of 40 feet per second. The question now is: will a drum achieve this velocity in a 300 foot descent through seawater? To answer this question, we must analyze what happens when a drum is dropped into the water and allowed to settle to the bottom. Each 55-gallon drum weighs about 535 pounds after being filled with the material and some insulation. When in the water, the drum is buoyed up by a force equal to the weight of the water displaced. Fifty-five gallons is about 7.35 cubic feet, and the density of seawater is about 64 pounds per cubic foot, so each barrel will be subject to a buoyant force of about 470 pounds. In addition to this buoyant force, the water will impose a drag on the barrel as it sinks, impeding its descent. It is well known that objects sinking in a fluid are subject to a drag force which is proportional to a power of the velocity. Engineers had to determine the constant of proportionality and the exponent for a drum in seawater. After testing, they estimated that the drag force of the water was approximately equal to 0.5v^(√10/3) pounds, in which v is the velocity in feet per second. Let y(t) be the depth of the drum in the water at time t, with downward chosen as the positive direction. Let y = 0 at the (calm) surface of the water. Then v(t) = y'(t).
The forces acting on the drum are the buoyant and drag forces (acting upward) and the force of gravity (acting downward). Since the force of gravity has magnitude mg, with m the mass of the drum, then by Newton's law,

m dv/dt = mg − 470 − 0.5v^(√10/3).

For this problem, mg = 535 pounds. Use g = 32 ft/sec² to determine that m = 16.7 slugs. Assume that the drum is released from rest at the surface of the water. The initial value problem for the velocity of the descending drum is

16.7 dv/dt = 535 − 470 − 0.5v^(√10/3), v(0) = 0,

or

dv/dt = (1/16.7)(65 − 0.5v^(√10/3)), v(0) = 0.
We want the velocity with which the drum hits bottom. One approach might give us a quick answer. It is not difficult to show that a drum sinking in seawater will have a terminal velocity. If the terminal velocity of the drum is less than 40 feet per second, then a drum released from rest will never reach a speed great enough to break it open upon impact with the ocean floor, regardless of the depth! Unfortunately, a quick calculation, letting dv/dt = 0, shows that the terminal velocity is about 100 feet per second, not even close to 40. This estimate is therefore inconclusive in determining whether the drums have a velocity of 40 feet per second upon impact at 300 feet. We could try solving the differential equation for v(t) and integrating to get an equation for the depth at time t. Setting y(t) = 300 would then yield the time required for the drum to reach this depth, and we could put this time back into v(t) to see if the velocity exceeds
40 feet per second at this time. The differential equation is separable. However, solving it leads to the integral

∫ 1/(v^(√10/3) − 130) dv,

which has no elementary evaluation. Another approach would be to express the velocity as a function of the depth. A differential equation for v(y) can be obtained by writing

dv/dt = (dv/dy)(dy/dt) = v dv/dy.

This gives us the initial value problem

dv/dy = (65 − 0.5v^(√10/3)) / (16.7v), v(0) = 0.   (5.1)
This equation is also separable, but we cannot perform the integrations needed to find v(y) explicitly. We have reached a position that is common in modeling a real-world phenomenon. The model (5.1) does not admit a closed form solution. At this point we will opt for a numerical approach to obtain approximate values for the velocity. But life is not this easy! If we attempt Euler's method on the problem with equation (5.1), we cannot even get started, because the initial condition is v(0) = 0 and v occurs in the denominator. There is a way around this difficulty. Reverse perspective and look for depth as a function of velocity, y(v). We will then calculate y(40), the depth at which the velocity reaches 40 feet per second. Since the velocity is an increasing function of the depth, if y(40) > 300 feet, we will know that the barrel could not have achieved a velocity of 40 feet per second when it reached the bottom. If y(40) < 300, then we will know that when the drum hits bottom it was moving at more than 40 feet per second, hence is likely to rupture. Since dy/dv = 1/(dv/dy), the initial value problem for y(v) is

dy/dv = 16.7v / (65 − 0.5v^(√10/3)), y(0) = 0.
Write √10/3 ≈ 1.054 and apply Euler's method with h = 1 and n = 40. We get y(40) ≈ 268.2 feet. With h = 0.5 and n = 80 we get y(40) ≈ 272.3 feet. Further reductions in step size will provide better accuracy. With h = 0.1 and n = 400 we get y(40) ≈ 275.5 feet. Based on these numbers it would appear that the drum will exceed 40 feet per second when it has fallen 300 feet, hence has a good chance of leaking dangerous material. A more detailed analysis, using an error bound that we have not discussed, leads to the conclusion that the drum achieves a velocity of 40 feet per second somewhere between 272 and 279 feet, giving us confidence that it has reached this velocity by the time it lands on the ocean floor. This led to the conclusion that the plan for storing radioactive waste materials in drums on the ocean floor is too dangerous to be feasible.
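A sketch of this computation in Python (the language is an assumption; any would do), with the exponent and constants taken from the model above:

```python
from math import sqrt

def depth_at_40(h):
    """Euler's method for dy/dv = 16.7*v / (65 - 0.5*v**(sqrt(10)/3)),
    y(0) = 0, stepped from v = 0 up to v = 40 with step h."""
    p = sqrt(10) / 3          # exponent, approximately 1.054
    v, y = 0.0, 0.0
    for _ in range(round(40 / h)):
        y += h * 16.7 * v / (65 - 0.5 * v**p)
        v += h
    return y

# The three runs quoted in the text: h = 1, 0.5, 0.1
d1, d05, d01 = depth_at_40(1.0), depth_at_40(0.5), depth_at_40(0.1)
# Each refinement raises the estimate of y(40) somewhat,
# and all three estimates fall well short of 300 feet.
```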
SECTION 5.1
PROBLEMS
In each of Problems 1 through 6, generate approximate numerical solutions using h = 0.2 and twenty iterations, then h = 0.1 and forty iterations, and finally h = 0.05 and eighty iterations. Graph the approximate solutions on the
y' = 1/x, y(1) = 0. Use h = 0.01. Will this approximation be less than or greater than the actual value?
same set of axes. Also obtain error bounds for each case. In each of Problems 1 through 5, obtain the exact solution and graph it with the approximate solutions.
9. In the analysis of the radioactive waste disposal problem, how does the constant of proportionality for the drag on the drum affect the conclusion? Carry out the numerical analysis if the drag is 0.3v^(√10/3), and again for the case that the drag is 0.8v^(√10/3).
1. y' = y sin(x), y(0) = 1
2. y' = x + y, y(1) = −3
3. y' = 3xy, y(0) = 5
4. y' = 2 − x, y(0) = 1
5. y' = y − cos(x), y(1) = −2
6. y' = x − y², y(0) = 4
7. Approximate e as follows. Use Euler's method with h = 0.01 to approximate y(1), where y(x) is the solution of y' = y, y(0) = 1. Sketch a graph of the solution before applying Euler's method and determine whether the approximate value obtained is less than or greater than the actual value.
8. Approximate ln(2) by using Euler's method to approximate y(2), where y(x) is the solution of
10. Try exponents other than √10/3 for the velocity in the disposal problem to gauge the effect of this number on the conclusion. In particular, perform the analysis if the drag equals 0.5v (the exponent 1 is slightly less than √10/3) and again for a drag effect of 0.5v^(4/3) (4/3 is slightly greater than √10/3).
11. Suppose the drums are dropped over a part of the ocean having a depth of 340 feet. Will the drums be likely to rupture on impact with the ocean floor?

5.2
One-Step Methods
Euler's method is a one-step method because the approximation at x_{k+1} depends only on the approximation at x_k, one step back. We will consider some other one-step methods for the initial value problem

y' = f(x, y), y(x_0) = y_0.

As usual, let the step size be h, and denote x_k = x_0 + kh for k = 0, 1, 2, …, n.
5.2.1
The Second-Order Taylor Method
By Taylor's theorem with remainder (under certain conditions on f and h) we can write

y(x_{k+1}) = y(x_k) + hy'(x_k) + (1/2!)h²y''(x_k) + ··· + (1/m!)h^m y^(m)(x_k) + (1/(m+1)!)h^(m+1) y^(m+1)(ξ_k)

for some ξ_k in (x_k, x_{k+1}). If y^(m+1)(x) is bounded, then the last term in this sum can be made as small as we like by choosing h small enough. We therefore form the approximation

y_{k+1} ≈ y(x_k) + hy'(x_k) + (1/2!)h²y''(x_k) + ··· + (1/m!)h^m y^(m)(x_k).

If m = 1, this is Euler's method, since y'(x_k) = f(x_k, y_k). Now let m = 2. Then

y_{k+1} ≈ y(x_k) + hy'(x_k) + (1/2!)h²y''(x_k).   (5.2)
We know that y'(x) = f(x, y(x)). This suggests that in the approximation (5.2) we consider f(x_k, y_k) as an approximation of y'(x_k) if y_k is an approximation of y(x_k). Thus consider

y'(x_k) ≈ f(x_k, y_k).

This leaves the term y''(x_k) in the approximation (5.2) to treat. First differentiate the expression y'(x) = f(x, y(x)) to get

y''(x) = (∂f/∂x)(x, y) + (∂f/∂y)(x, y) y'(x).

This suggests we consider

y''(x_k) ≈ (∂f/∂x)(x_k, y_k) + (∂f/∂y)(x_k, y_k) f(x_k, y_k).

Insert these approximations of y'(x_k) and y''(x_k) into the approximation (5.2) to get

y_{k+1} ≈ y_k + hf(x_k, y_k) + (h²/2)[(∂f/∂x)(x_k, y_k) + (∂f/∂y)(x_k, y_k) f(x_k, y_k)].

This is a one-step method, because y_{k+1} is obtained from information at x_k, one step back from x_{k+1}.
DEFINITION 5.2
Second-Order Taylor Method
The second-order Taylor method consists of approximating y(x_{k+1}) by the expression

y_{k+1} ≈ y_k + hf(x_k, y_k) + (h²/2)[(∂f/∂x)(x_k, y_k) + (∂f/∂y)(x_k, y_k) f(x_k, y_k)].

This expression can be simplified by adopting the notation

f_k = f(x_k, y_k),   ∂f/∂x = f_x,   ∂f/∂y = f_y,

and

(∂f/∂x)(x_k, y_k) = f_{x,k},   (∂f/∂y)(x_k, y_k) = f_{y,k}.

Now the formula is

y_{k+1} ≈ y_k + hf_k + (h²/2)(f_{x,k} + f_k f_{y,k}).
EXAMPLE 5.3
Consider

y' = y² cos(x), y(0) = 1/5.

With f(x, y) = y² cos(x), we have f_x = −y² sin(x) and f_y = 2y cos(x). Form

y_{k+1} ≈ y_k + hy_k² cos(x_k) + h²y_k³ cos²(x_k) − (h²/2)y_k² sin(x_k).

With h = 0.2 and twenty iterations (n = 20) we get the approximate values given in Table 5.4, for points x_k = 0 + 0.2k for k = 0, 1, …, 20.
TA B L E 5 . 4
Approximate Values of the Solution of y' = y² cos(x), y(0) = 1/5

x      yapp(x)           x      yapp(x)
0.0    0.2               2.2    0.2389919589
0.2    0.20832           2.4    0.2315347821
0.4    0.2170013470      2.6    0.2231744449
0.6    0.2256558280      2.8    0.2144516213
0.8    0.2337991830      3.0    0.2058272673
1.0    0.2408797598      3.2    0.1976613648
1.2    0.2463364693      3.4    0.1902141527
1.4    0.2496815188      3.6    0.1836603456
1.6    0.2505900093      3.8    0.1781084317
1.8    0.2489684556      4.0    0.1736197077
2.0    0.2449763987
FIGURE 5.9 Exact and second-order Taylor approximate solutions of y' = y² cos(x), y(0) = 1/5.
This problem can be solved exactly, and we obtain y(x) = 1/(5 − sin(x)). Figure 5.9 shows a graph of this solution, together with a smooth curve drawn through the approximated function values. The student should redo the approximation, using h = 0.1 and n = 40, for comparison. The Euler approximations for this example, with h = 0.2, are

y_{k+1} = y_k + 0.2y_k² cos(x_k).
It is instructive to compute these approximations for n = 20, and compare the accuracy of the Euler method with that of the second-order Taylor method for this problem.
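The second-order Taylor computation of this example can be carried out in a few lines. The Python below is a sketch, not the text's program; it compares the approximate values against the exact solution y = 1/(5 − sin x):

```python
from math import sin, cos

def taylor2(f, fx, fy, x0, y0, h, n):
    """Second-order Taylor method:
    y_{k+1} = y_k + h*f_k + (h**2/2)*(f_{x,k} + f_k*f_{y,k})."""
    x, y = x0, y0
    out = [(x, y)]
    for _ in range(n):
        fk = f(x, y)
        y += h * fk + (h**2 / 2) * (fx(x, y) + fk * fy(x, y))
        x += h
        out.append((x, y))
    return out

# Example 5.3: f = y^2*cos(x), f_x = -y^2*sin(x), f_y = 2*y*cos(x)
vals = taylor2(lambda x, y: y**2 * cos(x),
               lambda x, y: -y**2 * sin(x),
               lambda x, y: 2 * y * cos(x),
               0.0, 0.2, 0.2, 20)
exact = lambda x: 1 / (5 - sin(x))
# The approximation tracks the exact solution to within about 10**-3
# over the whole interval [0, 4].
```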
5.2.2
The Modified Euler Method
Near the end of the nineteenth century the German mathematician Carl Runge noticed a similarity between part of the formula for the second-order Taylor method and another Taylor polynomial approximation. Write the second-order Taylor formula as

y_{k+1} = y_k + h[f_k + (h/2)(f_x(x_k, y_k) + f_k f_y(x_k, y_k))].   (5.3)

Runge observed that the term in square brackets on the right side of this equation resembles the Taylor approximation

f(x_k + αh, y_k + βhf_k) ≈ f_k + αhf_x(x_k, y_k) + βhf_k f_y(x_k, y_k).

In fact, the term in square brackets in equation (5.3) is exactly the right side of the last equation if we choose α = β = 1/2. This suggests the approximation

y_{k+1} ≈ y_k + hf(x_k + h/2, y_k + hf_k/2).
DEFINITION 5.3
Modified Euler Method
The modified Euler method consists of defining the approximation y_{k+1} by

y_{k+1} = y_k + hf(x_k + h/2, y_k + hf_k/2).

The method is in the spirit of Euler's method, except that f(x, y) is evaluated at (x_k + h/2, y_k + hf_k/2) instead of at (x_k, y_k). Notice that x_k + h/2 is midway between x_k and x_{k+1}.
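A minimal Python sketch of the modified Euler iteration (the language and the helper name `modified_euler` are my choices, not the text's):

```python
def modified_euler(f, x0, y0, h, n):
    """Modified Euler: evaluate f at the midpoint (x_k + h/2, y_k + h*f_k/2)."""
    x, y = x0, y0
    for _ in range(n):
        fk = f(x, y)
        y += h * f(x + h / 2, y + h * fk / 2)
        x += h
    return y

# Applied to y' = y/x + 2x^2, y(1) = 4, with h = 0.2 and n = 20
# (the problem of Example 5.4, whose exact value at x = 5 is 140):
y20 = modified_euler(lambda x, y: y / x + 2 * x**2, 1.0, 4.0, 0.2, 20)
```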
EXAMPLE 5.4
Consider

y' − (1/x)y = 2x², y(1) = 4.

Write the differential equation as

y' = y/x + 2x² = f(x, y).

Using the modified Euler method with h = 0.2 and n = 20 iterations, generate the approximate solution values given in Table 5.5.
The exact solution of this initial value problem is y(x) = x³ + 3x. The graph of this solution, together with a smooth curve drawn through the approximated values, is shown in Figure 5.10. The two curves coincide in the scale of the drawing. For example, y(5) = 140, while y_20, the approximated solution value at x_20 = 1 + 20(0.2) = 5, is 139.7. This small a difference does not show up on the graph.

TA B L E 5 . 5
Approximate Values of the Solution of y' = y/x + 2x², y(1) = 4

x      yapp(x)           x      yapp(x)
1.0    4                 3.2    42.23164616
1.2    5.320363636       3.4    49.35124526
1.4    6.927398601       3.6    57.28637379
1.6    8.869292639       3.8    66.08505841
1.8    11.19419064       4.0    75.79532194
2.0    13.95020013       4.2    86.46518560
2.2    17.18541062       4.4    98.14266841
2.4    20.94789459       4.6    110.8757877
2.6    25.25871247       4.8    124.7125592
2.8    30.24691542       5.0    139.7009975
3.0    35.87954731
FIGURE 5.10 Exact and modified Euler approximation of the solution of y' − (1/x)y = 2x², y(1) = 4.
We leave it for the student to do this example using the other approximation schemes for comparison. For this initial value problem, the other methods with h = 0.2 give:

Euler: y_{k+1} = y_k + 0.2(y_k/x_k + 2x_k²)

Modified Euler: y_{k+1} = y_k + 0.2[(y_k + 0.1(y_k/x_k + 2x_k²))/(x_k + 0.1) + 2(x_k + 0.1)²]

Second-order Taylor: y_{k+1} = y_k + 0.2(y_k/x_k + 2x_k²) + 0.02(−y_k/x_k² + 4x_k + y_k/x_k² + 2x_k)
5.2.3
Runge-Kutta Methods
An entire class of one-step methods is generated by replacing the right side in the modified Euler method with the general form

af_k + bf(x_k + αh, y_k + βhf_k).

The idea is to choose the constants a, b, α, and β to obtain an approximation with as favorable an error bound as possible. The fourth order Runge-Kutta method (known as RK4) has proved both computationally efficient and accurate, and is obtained by a clever choice of these constants in approximating slopes at various points. Without derivation, we will state the method.
DEFINITION 5.4
RK4
The RK4 method of approximation is to define y_{k+1} in terms of y_k by

y_{k+1} = y_k + (h/6)(W_{k1} + 2W_{k2} + 2W_{k3} + W_{k4}),

where

W_{k1} = f_k,
W_{k2} = f(x_k + h/2, y_k + hW_{k1}/2),
W_{k3} = f(x_k + h/2, y_k + hW_{k2}/2),
W_{k4} = f(x_k + h, y_k + hW_{k3}).
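The four slope evaluations translate directly into code. A Python sketch of Definition 5.4 (the language is an assumption), checked on y' = y, y(0) = 1, whose exact value at x = 1 is e:

```python
from math import exp

def rk4(f, x0, y0, h, n):
    """Fourth-order Runge-Kutta step, as in Definition 5.4."""
    x, y = x0, y0
    for _ in range(n):
        w1 = f(x, y)
        w2 = f(x + h / 2, y + h * w1 / 2)
        w3 = f(x + h / 2, y + h * w2 / 2)
        w4 = f(x + h, y + h * w3)
        y += (h / 6) * (w1 + 2 * w2 + 2 * w3 + w4)
        x += h
    return y

# y' = y, y(0) = 1, ten steps of size h = 0.1:
approx_e = rk4(lambda x, y: y, 0.0, 1.0, 0.1, 10)
# approx_e agrees with e = 2.71828... to better than 10**-5
```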
EXAMPLE 5.5
Consider

y' = (1/y) cos(x + y), y(0) = 1.

Choose h = 0.1 and n = 40 to obtain approximate solution values at 0.1, 0.2, …, 3.9, 4.0. Compute

W_{k1} = (1/y_k) cos(x_k + y_k),
W_{k2} = (1/(y_k + 0.05W_{k1})) cos(x_k + 0.05 + y_k + 0.05W_{k1}),
W_{k3} = (1/(y_k + 0.05W_{k2})) cos(x_k + 0.05 + y_k + 0.05W_{k2}),
W_{k4} = (1/(y_k + 0.1W_{k3})) cos(x_k + 0.1 + y_k + 0.1W_{k3}),

and

y_{k+1} = y_k + (0.1/6)(W_{k1} + 2W_{k2} + 2W_{k3} + W_{k4}).

For comparison, we also compute approximations using modified Euler:

y_{k+1} = y_k + 0.1(1/(y_k + 0.05(1/y_k) cos(x_k + y_k))) cos(x_k + 0.05 + y_k + (0.05/y_k) cos(x_k + y_k)).
TA B L E 5 . 6
Approximate Values of the Solution of y' = (1/y) cos(x + y), y(0) = 1

x     RK4            Modified Euler     x     RK4            Modified Euler
0.0   1              1                  1.1   0.9505251773   0.9495508719
0.1   1.046496334    1.046149156        1.2   0.8982661137   0.8973115249
0.2   1.079334533    1.078757139        1.3   0.8382537806   0.837306783
0.3   1.100247716    1.099515885        1.4   0.7696832419   0.7688117656
0.4   1.110582298    1.109747010        1.5   0.6911954958   0.6904155222
0.5   1.111396688    1.110493110        1.6   0.6004493273   0.5998524927
0.6   1.103521442    1.102574266        1.7   0.4930456632   0.4928907147
0.7   1.087597013    1.086623840        1.8   0.3589118368   0.3602630404
0.8   1.064096638    1.063110255        1.9   0.156746754    0.1719452522
0.9   1.033337679    1.032347712        2.0   −1.602586399   −1.068545688
1.0   0.9954821879   0.9944964300       2.1   −1.658600340   −1.121333887
2.2   −1.711647122   −1.168193825       3.3   −2.100103902   −1.263086041
2.3   −1.76180864    −1.209281807       3.4   −2.115423092   −1.222220133
2.4   −1.809127759   −1.244631007       3.5   −2.126727488   −1.169751386
2.5   −1.853613925   −1.274170205       3.6   −2.133806037   −1.104515644
2.6   −1.895247358   −1.297734699       3.7   −2.136440412   −1.025210458
2.7   −1.933982047   −1.315071925       3.8   −2.134410465   −0.930599244
2.8   −1.969747992   −1.325843341       3.9   −2.127506124   −0.8181811612
2.9   −2.002452881   −1.329623556       4.0   −2.115514719   −0.6862852770
3.0   −2.031983404   −1.325897430
3.1   −2.058206363   −1.314055719
3.2   −2.080969742   −1.293389806
FIGURE 5.11 Runge-Kutta and modified Euler approximations of the solution of y' = (1/y) cos(x + y), y(0) = 1.
Table 5.6 shows the computed values, and Figure 5.11 shows graphs drawn through the approximated points by both methods. The two graphs are in good agreement as x nears 2, but then they diverge from each other. This divergence can be seen in the table. In general, RK4 is more accurate than modified Euler, particularly as the distance increases from the point where the initial data is specified. It can be shown that the Taylor and modified Euler methods are of order h², while RK4 is of order h⁴. Since usually 0 < h < 1, h⁴ < h² < h, so accuracy improves much faster by choosing smaller h with RK4 than with the other methods. There are higher order Runge-Kutta
methods which are of order h^p for larger p. Such methods offer greater accuracy, but usually at a cost of more computing time. All of the methods of this section are one-step methods, which have the general form y_{k+1} = y_k + hφ(x_k, y_k). In the next section, we will discuss multistep methods.
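These order claims can be checked empirically: if the error behaves like Ch^p, then log₂(error(h)/error(h/2)) estimates p. A Python sketch (the test problem y' = y, y(0) = 1 is my choice, not the text's):

```python
from math import exp, log

def step_modified_euler(f, x, y, h):
    return y + h * f(x + h / 2, y + h * f(x, y) / 2)

def step_rk4(f, x, y, h):
    w1 = f(x, y)
    w2 = f(x + h / 2, y + h * w1 / 2)
    w3 = f(x + h / 2, y + h * w2 / 2)
    w4 = f(x + h, y + h * w3)
    return y + (h / 6) * (w1 + 2 * w2 + 2 * w3 + w4)

def error_at_1(step, h):
    """|approximation - e| for y' = y, y(0) = 1, integrated to x = 1."""
    x, y = 0.0, 1.0
    for _ in range(round(1 / h)):
        y = step(lambda x, y: y, x, y, h)
        x += h
    return abs(y - exp(1))

# Estimated order p = log2(error(h) / error(h/2)):
p_me = log(error_at_1(step_modified_euler, 0.1)
           / error_at_1(step_modified_euler, 0.05), 2)
p_rk4 = log(error_at_1(step_rk4, 0.1) / error_at_1(step_rk4, 0.05), 2)
# p_me comes out near 2, and p_rk4 near 4, as the text asserts.
```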
SECTION 5.2
PROBLEMS
In each of Problems 1 through 6, use the modified Euler, Taylor and RK4 methods to approximate the solution, first using h = 0.2 with twenty iterations, then h = 0.1 with forty iterations, and finally h = 0.05 with eighty iterations. Graph the approximate solutions for each method, and for a given h, on the same set of axes.
1. y' = sin(x + y), y(0) = 2
2. y' = y − x², y(1) = −4. Also solve this problem exactly and include a graph of the exact solution with graphs of the approximate solutions.
3. y' = cos(y) + e^(−x), y(0) = 1
4. y' = y³ − 2xy, y(3) = 2
5. y' = −y + e^(−x), y(0) = 4. Also solve this problem exactly and include a graph of the exact solution with graphs of the approximate solutions.
6. y' = sec(1/y) − xy², y(π/4) = 1
5.3
7. Do Problem 3 of Section 5.1 with RK4 instead of Euler. Which method yields the better result?
8. Do Problem 5 of Section 5.1 with RK4 instead of Euler. Which method yields the better result?
9. Derive the improved Euler method, also known as the Heun method, as follows. Begin with Euler's method and replace f_k with (f_k + f_{k+1})/2. Next replace y_{k+1} in f_{k+1} with y_k + hf_k. The result should be the approximation scheme y_{k+1} = y_k + (h/2)[f_k + f(x_{k+1}, y_k + hf_k)].
In each of Problems 10 through 12, use Euler, modified Euler and improved Euler to approximate the solution. Use h = 0.2 with n = 20, then h = 0.1 with n = 40 and then h = 0.05 with n = 80. Graph the approximate solutions, for each h, on the same set of axes. Whenever the solution can be found in exact form, graph this solution with the approximate solutions for comparison.
10. y' = 1 − y, y(0) = 2
11. y' = −y/x + x, y(1) = 1
12. y' = y − e^x, y(−1) = 4
Multistep Methods
We continue with the problem y' = f(x, y), y(x_0) = y_0. The solution is y(x), and we presumably do not have a "good" expression for this function. We want to obtain approximate values y_k of y(x_k), where x_k = x_0 + kh for k = 0, 1, …, n. The basis for some multistep methods is the informal belief that, if p_k(x) is a polynomial that approximates f(x, y(x)) on [x_k, x_{k+1}], then ∫_{x_k}^{x_{k+1}} p_k(x) dx should approximate ∫_{x_k}^{x_{k+1}} f(x, y(x)) dx. Now write

y(x_{k+1}) − y(x_k) = ∫_{x_k}^{x_{k+1}} y'(x) dx = ∫_{x_k}^{x_{k+1}} f(x, y(x)) dx ≈ ∫_{x_k}^{x_{k+1}} p_k(x) dx.

Therefore

y(x_{k+1}) ≈ y(x_k) + ∫_{x_k}^{x_{k+1}} p_k(x) dx.   (5.4)

So far this is deliberately vague. Nevertheless, this proposed approximation (5.4) contains the germ of an idea which we will now pursue.
First we must decide how to choose the polynomials p_k(x). Suppose we have somehow arrived at satisfactory approximations y_k, y_{k−1}, …, y_{k−r} to the solution at x_k, x_{k−1}, …, x_{k−r}, respectively. Then

f_k = f(x_k, y_k) ≈ f(x_k, y(x_k)),
f_{k−1} = f(x_{k−1}, y_{k−1}) ≈ f(x_{k−1}, y(x_{k−1})),
…,
f_{k−r} = f(x_{k−r}, y_{k−r}) ≈ f(x_{k−r}, y(x_{k−r})).

Keep in mind here that y(x_k) is the exact solution evaluated at x_k (this is unknown to us), and y_k is an approximation of this solution value, obtained by some scheme. Now choose p_k(x) to be the polynomial of degree r passing through the points

(x_k, f_k), (x_{k−1}, f_{k−1}), …, (x_{k−r}, f_{k−r}).

These r + 1 points will uniquely determine the degree r polynomial p_k(x). When this polynomial is inserted into the approximation scheme (5.4), we obtain a multistep approximation method in which the approximation y_{k+1} of y(x_{k+1}) is defined by

y_{k+1} = y_k + ∫_{x_k}^{x_{k+1}} p_k(x) dx.   (5.5)
We obtain different methods for different choices of r. Consider some cases of interest.
5.3.1
Case 1 (r = 0)
Now p_k(x) is a zero degree polynomial, or constant. Specifically, p_k(x) = f_k for x_k ≤ x ≤ x_{k+1}. The approximation scheme defined by equation (5.5) becomes

y_{k+1} = y_k + ∫_{x_k}^{x_{k+1}} f_k dx = y_k + f_k(x_{k+1} − x_k) = y_k + hf_k

for k = 0, 1, 2, …, n − 1. This is Euler's method, a one-step method.
5.3.2
Case 2 (r = 1)
Now p_k(x) is a first degree polynomial, whose graph is the straight line through (x_k, f_k) and (x_{k−1}, f_{k−1}). Therefore

p_k(x) = −(1/h)(x − x_k)f_{k−1} + (1/h)(x − x_{k−1})f_k.

Upon inserting this into the scheme (5.5) we get

y_{k+1} = y_k + ∫_{x_k}^{x_{k+1}} [−(1/h)(x − x_k)f_{k−1} + (1/h)(x − x_{k−1})f_k] dx.

A routine integration which we omit yields

y_{k+1} = y_k + (h/2)(3f_k − f_{k−1})

for k = 1, 2, …, n − 1. This is a two-step method because computation of y_{k+1} requires prior computation of information at the two points x_k and x_{k−1}. For larger r, the idea is the same, but the details are more involved because p_k(x) is of degree r and the integral in equation (5.5) is more involved, though still elementary. Here are the final results for two more cases.
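A Python sketch of the two-step scheme (the modified Euler starter for y_1 is my choice; any one-step method of comparable accuracy would do, since the two-step formula needs two starting values):

```python
from math import exp

def adams_bashforth2(f, x0, y0, h, n):
    """Two-step Adams-Bashforth: y_{k+1} = y_k + (h/2)*(3*f_k - f_{k-1}),
    with one modified Euler step supplying y_1."""
    xs = [x0 + k * h for k in range(n + 1)]
    ys = [y0]
    # starter step for y_1
    ys.append(y0 + h * f(x0 + h / 2, y0 + h * f(x0, y0) / 2))
    for k in range(1, n):
        fk = f(xs[k], ys[k])
        fk1 = f(xs[k - 1], ys[k - 1])
        ys.append(ys[k] + (h / 2) * (3 * fk - fk1))
    return ys

# Check on y' = y, y(0) = 1: the value at x = 1 should be near e.
ys = adams_bashforth2(lambda x, y: y, 0.0, 1.0, 0.01, 100)
```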
5.3.3
Case 3 (r = 2)

y_{k+1} = y_k + (h/12)(23f_k − 16f_{k−1} + 5f_{k−2})

for k = 2, 3, …, n − 1. This is a three-step method, requiring information at three points to compute the approximation y_{k+1}.
5.3.4
Case 4 (r = 3)

y_{k+1} = y_k + (h/24)(55f_k − 59f_{k−1} + 37f_{k−2} − 9f_{k−3})   (5.6)
for k = 3, 4, …, n − 1. This is a four-step method. We might expect multistep methods to improve in accuracy as the number of steps increases, since more information is packed into the computation of the approximation at the next point. This is in general true, and an r-step method using an interpolating polynomial of degree r on each subinterval has error of order O(h^r). The cost of improved accuracy is that the polynomials must be computed on each interval and more data is put into computation of each successive y_{k+1}. The schemes just given for r = 1, 2, and 3 are called Adams-Bashforth multistep methods. One drawback to a multistep method is that some other method must be used to initiate it. For example, equation (5.6) involves f_{k−3}, and so is only valid for k = 3, 4, …, n − 1. Some other scheme must be used to start it by feeding in y_1, y_2 and y_3 (and, of course, y_0 is given as information). Often RK4 is used as an initiator in computing these first values. Another class of multistep methods, called Adams-Moulton methods, is obtained by using different data points to determine the interpolating polynomial p_k(x) to use in equation (5.4). For r = 3, p_k(x) is now chosen as the unique third degree polynomial passing through (x_{k−2}, f_{k−2}), (x_{k−1}, f_{k−1}), (x_k, f_k) and (x_{k+1}, f_{k+1}). This leads to the approximating scheme

y_{k+1} = y_k + (h/24)(9f_{k+1} + 19f_k − 5f_{k−1} + f_{k−2}).   (5.7)

This Adams-Moulton method has error of order O(h⁴). There is a significant difference between the Adams-Bashforth method (5.6) and the Adams-Moulton method (5.7). The former determines y_{k+1} in terms of previously computed quantities, and is said to be explicit. The latter contains y_{k+1} on both sides of equation (5.7), because f_{k+1} = f(x_{k+1}, y_{k+1}), and therefore defines y_{k+1} implicitly. Equation (5.7) therefore only provides an equation containing y_{k+1}, from which y_{k+1} must then be extracted, a perhaps nontrivial task.
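In practice the pair (5.6)-(5.7) is often used as a predictor-corrector: predict y_{k+1} with the explicit Adams-Bashforth formula, then use the predicted value for f_{k+1} in the Adams-Moulton formula. A Python sketch (the RK4 starter and a single correction pass are conventional choices, not prescribed by the text):

```python
from math import exp

def adams_bashforth_moulton(f, x0, y0, h, n):
    """Four-step Adams-Bashforth predictor (5.6) with one
    Adams-Moulton correction (5.7). RK4 supplies y_1, y_2, y_3."""
    xs = [x0 + k * h for k in range(n + 1)]
    ys = [y0]
    for k in range(3):                      # RK4 starting values
        x, y = xs[k], ys[k]
        w1 = f(x, y)
        w2 = f(x + h / 2, y + h * w1 / 2)
        w3 = f(x + h / 2, y + h * w2 / 2)
        w4 = f(x + h, y + h * w3)
        ys.append(y + (h / 6) * (w1 + 2 * w2 + 2 * w3 + w4))
    for k in range(3, n):
        fk, fk1, fk2, fk3 = (f(xs[k - j], ys[k - j]) for j in range(4))
        # predictor: explicit Adams-Bashforth (5.6)
        yp = ys[k] + (h / 24) * (55 * fk - 59 * fk1 + 37 * fk2 - 9 * fk3)
        # corrector: Adams-Moulton (5.7), with yp standing in for y_{k+1}
        ys.append(ys[k] + (h / 24) * (9 * f(xs[k + 1], yp)
                                      + 19 * fk - 5 * fk1 + fk2))
    return ys

# Check on y' = y, y(0) = 1: the value at x = 1 should be very near e.
ys = adams_bashforth_moulton(lambda x, y: y, 0.0, 1.0, 0.01, 100)
```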
SECTION 5.3
PROBLEMS
In each of Problems 1 through 5, use the Taylor, modified Euler, and RK4 methods to approximate solution values. First use h = 0.2 with 20 iterations, then h = 0.1 with 40 iterations, then h = 0.05 with 80 iterations.
1. y' = 4y − x², y(3) = 0
2. y' = x sin(y) − x², y(1) = −3
3. y' = x² + 4y, y(0) = −2
4. y' = 1 − cos(x) − y + x², y(3) = 6
5. y' = 4x³ − xy + cos(y), y(0) = 4
In each of Problems 6, 7 and 8, use the Adams-Bashforth-Moulton scheme, first with h = 0.2 and twenty iterations, then with h = 0.1 and forty iterations.
6. y' = y − x³, y(−2) = −4
7. y' = 2xy − y³, y(0) = 2
8. y' = ln(x) + x²y, y(2) = 1
9. Carry out the details for deriving the two-step scheme stated for the case r = 1.
10. Carry out the details for deriving the three-step scheme stated for the case r = 2.
11. Every one-step and multistep method we have considered is a special case of the general expression

y_{k+1} = Σ_{j=1}^{n} α_j y_{k+1−j} + hφ(x_{k+1−m}, …, x_k, x_{k+1}, y_{k+1−m}, …, y_k, y_{k+1}).

By making appropriate choices of m, the α_j's and φ, show how this formula gives Euler's method, the modified Euler method, the Taylor method, RK4, and the Adams-Bashforth method.
PART 2
Vectors and Linear Algebra

CHAPTER 6 Vectors and Vector Spaces
CHAPTER 7 Matrices and Systems of Linear Equations
CHAPTER 8 Determinants
CHAPTER 9 Eigenvalues, Diagonalization, and Special Matrices
Some quantities are completely determined by their magnitude, or “size.” This is true of temperature and mass, which are numbers referred to some scale or measurement system. Such quantities are called scalars. Length, volume, and distance are other scalars. By contrast, a vector carries with it a sense of both magnitude and direction. The effect of a push against an object will depend not only on the magnitude or strength of the push, but also on the direction in which it is exerted. This part is concerned with the notation and algebra of vectors and objects called matrices. This algebra will be used to solve systems of linear algebraic equations and systems of linear differential equations. It will also give us the machinery needed for the qualitative study of systems of differential equations (Part 3), in which we attempt to determine the behavior and properties of solutions when we cannot write these solutions explicitly or in closed form. In Part 4, vector algebra will be used to develop vector calculus, which extends derivatives and integrals to higher dimensions, with applications to models of physical systems, partial differential equations, and the analysis of complex-valued functions.
CHAPTER 6
ALGEBRA AND GEOMETRY OF VECTORS  THE DOT PRODUCT  THE CROSS PRODUCT  THE VECTOR SPACE R^n  LINEAR INDEPENDENCE  SPANNING SETS AND DIMENSION IN R^n  ABSTRACT VECTOR SPACES
Vectors and Vector Spaces
6.1
The Algebra and Geometry of Vectors
When dealing with vectors, a real number is often called a scalar. The temperature of an object and the grade of a motor oil are scalars. We want to define the concept of a vector in such a way that the package contains information about both direction and magnitude. One way to do this is to define a vector (in 3-dimensional space) as an ordered triple of real numbers.
DEFINITION 6.1
Vector
A vector is an ordered triple (a, b, c), in which a, b, and c are real numbers.
We represent the vector (a, b, c) as an arrow from the origin (0, 0, 0) to the point (a, b, c) in 3-space, as in Figure 6.1. In this way, the direction indicated by the arrow, as viewed from the origin, gives the direction of the vector. The length of the arrow is the magnitude (or norm) of the vector—a longer arrow represents a vector of greater strength. Since the distance from the origin to the point (a, b, c) is √(a² + b² + c²), we will define this number to be the magnitude of the vector (a, b, c).
DEFINITION 6.2
Norm of a Vector
The norm, or magnitude, of a vector (a, b, c) is the number ‖(a, b, c)‖ defined by

‖(a, b, c)‖ = √(a² + b² + c²).
204
CHAPTER 6
Vectors and Vector Spaces

FIGURE 6.1 The vector (a, b, c) is represented by the arrow from (0, 0, 0) to the point (a, b, c).
FIGURE 6.2 ‖(−1, 4, 1)‖ = √18.
FIGURE 6.3 Parallel representations of the same vector.
For example, the norm of (−1, 4, 1) is ‖(−1, 4, 1)‖ = √(1 + 16 + 1) = √18. This is the length of the arrow from the origin to the point (−1, 4, 1) (Figure 6.2). The only vector that is not represented by an arrow from the origin is the zero vector (0, 0, 0), which has zero magnitude and no direction. It is, however, useful to have a zero vector, because various forces in a physical process may cancel each other, resulting in a zero force or vector. The number a is the first component of (a, b, c), b is the second component, and c the third component. Two vectors are equal if and only if their respective components are equal:

(a, b, c) = (u, v, w) if and only if a = u, b = v, c = w.
We will usually denote scalars (real numbers) by letters in regular type face (a, b, c, A, B, …), and vectors by letters in boldface (a, b, c, A, B, …). The zero vector is denoted O. Although there is a difference between a vector (ordered triple) and an arrow (visual representation of a vector), we often speak of vectors and arrows interchangeably. This is useful in giving geometric interpretations to vector operations. However, any two arrows having the same length and same direction are said to represent the same vector. In Figure 6.3, all the arrows represent the same vector. We will now develop algebraic operations with vectors and relate them to the norm.
DEFINITION 6.3
Product of a Scalar and Vector
The product of a real number α with a vector F = (a, b, c) is denoted αF, and is defined by

αF = (αa, αb, αc).
Thus a vector is multiplied by a scalar by multiplying each component by the scalar. For example,

3(2, −5, 1) = (6, −15, 3) and −5(−4, 2, 10) = (20, −10, −50).
The following relationship between norm and the product of a scalar with a vector leads to a simple geometric interpretation of this operation.
THEOREM 6.1
Let F be a vector and α a scalar. Then
1. ‖αF‖ = |α| ‖F‖.
2. ‖F‖ = 0 if and only if F = O.

Proof   If F = (a, b, c), then αF = (αa, αb, αc), so

‖αF‖ = √(α²a² + α²b² + α²c²) = |α| √(a² + b² + c²) = |α| ‖F‖.

This proves conclusion (1). For (2), first recall that O = (0, 0, 0), so

‖O‖ = √(0² + 0² + 0²) = 0.

Conversely, if ‖F‖ = 0, then √(a² + b² + c²) = 0, hence a = b = c = 0 and F = O.

Consider this product of a scalar with a vector from a geometric point of view. By (1) of the theorem, the length of αF is |α| times the length of F. Multiplying by α lengthens the arrow representing F if |α| > 1, and shrinks it to a shorter arrow if 0 < |α| < 1. Of course, if α = 0 then αF is the zero vector, with zero length. But the algebraic sign of α has an effect as well. If α is positive, then αF has the same direction as F, while if α is negative, αF has the opposite direction.
EXAMPLE 6.1
Let F = ⟨2, 4, 1⟩, as shown in Figure 6.4. 3F = ⟨6, 12, 3⟩ has the same direction as F but is represented by an arrow three times as long. By contrast, −3F = ⟨−6, −12, −3⟩, while also three times as long as F, points in the direction opposite that of F through the origin. And (1/2)F = ⟨1, 2, 1/2⟩ has the same direction as F but is half as long.
FIGURE 6.4 Scalar multiples of a vector.
In particular, the scalar product of −1 with F = ⟨a, b, c⟩ is the vector ⟨−a, −b, −c⟩, having the same length as F but the opposite direction. This vector is called “minus F,” or the negative of F, and is denoted −F. Consistent with the interpretation of multiplication of a vector by a scalar, we define two vectors F and G to be parallel if each is a nonzero scalar multiple of the other. Of course, if F = αG and α ≠ 0, then G = (1/α)F. Parallel vectors may differ in length, and may even point in opposite directions, but the straight lines through arrows representing these vectors are parallel lines in 3-space.
CHAPTER 6
Vectors and Vector Spaces
The algebraic sum of two vectors is defined as follows.
DEFINITION 6.4
Vector Sum
The sum of F = ⟨a1, b1, c1⟩ and G = ⟨a2, b2, c2⟩ is the vector
F + G = ⟨a1 + a2, b1 + b2, c1 + c2⟩.
That is, we add vectors by adding respective components. For example,
⟨−4, π, 2⟩ + ⟨16, 1, −5⟩ = ⟨12, π + 1, −3⟩.
If F = ⟨a1, b1, c1⟩ and G = ⟨a2, b2, c2⟩, then the sum of F with −G is ⟨a1 − a2, b1 − b2, c1 − c2⟩. It is natural to denote this vector as F − G and refer to it as “F minus G.” For example, ⟨−4, π, 2⟩ minus ⟨16, 1, −5⟩ is
⟨−4, π, 2⟩ − ⟨16, 1, −5⟩ = ⟨−20, π − 1, 7⟩.
We therefore subtract two vectors by subtracting their respective components. Vector addition and multiplication of a vector by a scalar have the following computational properties.
THEOREM 6.2
Algebra of Vectors
Let F, G, and H be vectors and let α and β be scalars. Then
1. F + G = G + F.
2. (F + G) + H = F + (G + H).
3. F + O = F.
4. α(F + G) = αF + αG.
5. (αβ)F = α(βF).
6. (α + β)F = αF + βF.
Conclusion (1) is the commutative law for vector addition, and (2) is the associative law. Conclusion (3) states that the zero vector behaves with vectors like the number zero does with real numbers, as far as addition is concerned. The theorem is proved by routine calculations, using the properties of real-number arithmetic. For example, to prove (4), write F = ⟨a1, b1, c1⟩ and G = ⟨a2, b2, c2⟩. Then
α(F + G) = α⟨a1 + a2, b1 + b2, c1 + c2⟩
= ⟨α(a1 + a2), α(b1 + b2), α(c1 + c2)⟩
= ⟨αa1 + αa2, αb1 + αb2, αc1 + αc2⟩
= ⟨αa1, αb1, αc1⟩ + ⟨αa2, αb2, αc2⟩
= α⟨a1, b1, c1⟩ + α⟨a2, b2, c2⟩ = αF + αG.
Vector addition has a simple geometric interpretation. If F and G are represented as arrows from the same point P, as in Figure 6.5, then F + G is represented as the arrow from P to the opposite vertex of the parallelogram having F and G as two incident sides. This is called the parallelogram law for vector addition.
FIGURE 6.5 Parallelogram law for vector addition.
FIGURE 6.6 Another way of visualizing the parallelogram law.
FIGURE 6.7 Unit vectors i, j, and k along the axes.
The parallelogram law suggests a strategy for visualizing addition that is sometimes useful. Since two arrows having the same direction and length represent the same vector, we can apply the parallelogram law to form F + G as in Figure 6.6, in which the arrow representing G is drawn from the tip of F rather than from a common initial point with F. We often do this in visualizing computations with vectors.
Any vector can be written as a sum of scalar multiples of “standard” vectors as follows. Define
i = ⟨1, 0, 0⟩, j = ⟨0, 1, 0⟩, k = ⟨0, 0, 1⟩.
These are unit vectors (length 1) aligned along the three coordinate axes in the positive direction (Figure 6.7). In terms of these vectors,
F = ⟨a, b, c⟩ = a⟨1, 0, 0⟩ + b⟨0, 1, 0⟩ + c⟨0, 0, 1⟩ = ai + bj + ck.
This is called the standard representation of F. When a component of F is zero, we usually just omit it in the standard representation. For example, ⟨−3, 0, 1⟩ = −3i + k.
Figure 6.8 shows two points P1(a1, b1, c1) and P2(a2, b2, c2). It will be useful to know the vector represented by the arrow from P1 to P2. Let H be this vector. Denote
G = a1 i + b1 j + c1 k and F = a2 i + b2 j + c2 k.
By the parallelogram law (Figure 6.9), G + H = F. Hence
H = F − G = (a2 − a1)i + (b2 − b1)j + (c2 − c1)k.
For example, the vector represented by the arrow from (−2, 4, 1) to (14, 5, −7) is 16i + j − 8k. The vector from (14, 5, −7) to (−2, 4, 1) is the negative of this, or −16i − j + 8k.
Vector notation and algebra are often useful in solving problems in geometry. This is not our goal here, but the reasoning involved is often useful in solving problems in the sciences and engineering. We will give three examples to demonstrate the efficiency of thinking in terms of vectors.
FIGURE 6.8, FIGURE 6.9 The arrow from (a1, b1, c1) to (a2, b2, c2) is (a2 − a1)i + (b2 − b1)j + (c2 − c1)k.
EXAMPLE 6.2
Suppose we want the equation of the line L through the points (1, −2, 4) and (6, 2, −3). This problem is more subtle in 3-space than in the plane because in three dimensions there is no point-slope formula. Reason as follows. Let (x, y, z) be any point on L. Then (Figure 6.10) the vector represented by the arrow from (1, −2, 4) to (x, y, z) must be parallel to the vector from (1, −2, 4) to (6, 2, −3), because arrows representing these vectors both lie along L. This means that (x − 1)i + (y + 2)j + (z − 4)k is parallel to 5i + 4j − 7k. Then, for some scalar t,
(x − 1)i + (y + 2)j + (z − 4)k = t(5i + 4j − 7k).
But then the respective components of these vectors must be equal:
x − 1 = 5t, y + 2 = 4t, z − 4 = −7t.
Then
x = 1 + 5t, y = −2 + 4t, z = 4 − 7t.   (6.1)
A point is on L if and only if its coordinates are (1 + 5t, −2 + 4t, 4 − 7t) for some real number t (Figure 6.11). Equations (6.1) are parametric equations of the line, with t, which can be assigned any real value, as parameter. When t = 0 we get (1, −2, 4), and when t = 1 we get (6, 2, −3). We can also write the equation of this line in what is called normal form. Eliminating t, this form is
(x − 1)/5 = (y + 2)/4 = (z − 4)/(−7).
We may also envision the line as swept out by the arrow pivoted at the origin and extending to the point (1 + 5t, −2 + 4t, 4 − 7t) as t varies over the real numbers.
Some care must be taken in writing the normal form of a straight line. For example, the line through (2, −1, 6) and (−4, −1, 2) has parametric equations
x = 2 − 6t, y = −1, z = 6 − 4t.
If we eliminate t, we get
(x − 2)/(−6) = (z − 6)/(−4), y = −1.
FIGURE 6.10, FIGURE 6.11
Every point on the line has second coordinate −1, and this is independent of t. This information must not be omitted from the equations of the line. If we omit y = −1 and write just
(x − 2)/(−6) = (z − 6)/(−4),
then we have the equation of a plane, not a line.
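The parametric form of Example 6.2 translates directly into code. This sketch (the function name `line_point` is an illustrative choice) generates the point (1 + 5t, −2 + 4t, 4 − 7t) for any parameter t and confirms that t = 0 and t = 1 recover the two given points.

```python
def line_point(p, q, t):
    # Point on the line through p and q: p + t*(q - p), componentwise.
    return tuple(a + t * (b - a) for a, b in zip(p, q))

p, q = (1, -2, 4), (6, 2, -3)
print(line_point(p, q, 0))    # (1, -2, 4)
print(line_point(p, q, 1))    # (6, 2, -3)
print(line_point(p, q, 0.5))  # midpoint of the segment
```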
EXAMPLE 6.3
Suppose we want a vector F in the x, y-plane making an angle of π/7 with the positive x-axis and having magnitude 19. By “find a vector” we mean determine its components. Let F = ai + bj. From the right triangle in Figure 6.12,
cos(π/7) = a/19 and sin(π/7) = b/19.
Then
F = 19 cos(π/7)i + 19 sin(π/7)j.
FIGURE 6.12
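The components in Example 6.3 are just a polar-to-Cartesian conversion, which is easy to verify numerically. In this sketch (the name `from_polar` is ours), the length and angle are recovered from the computed components.

```python
import math

def from_polar(r, theta):
    # Vector of length r making angle theta with the positive x-axis:
    # r*cos(theta) i + r*sin(theta) j
    return (r * math.cos(theta), r * math.sin(theta))

a, b = from_polar(19, math.pi / 7)
print(math.hypot(a, b))   # approximately 19
print(math.atan2(b, a))   # approximately pi/7 (about 0.4488)
```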
EXAMPLE 6.4
We will prove that the line segments formed by connecting successive midpoints of the sides of a quadrilateral form a parallelogram. Again, our overall objective is not to prove theorems of geometry, but this argument is good practice in the use of vectors. Figure 6.13 illustrates what we want to show. Draw the quadrilateral again, with arrows (vectors) A, B, C, and D as sides. The vectors x, y, u, and v drawn with dashed lines connect
FIGURE 6.13, FIGURE 6.14
the midpoints of successive sides (Figure 6.14). We want to show that x and u are parallel and of the same length, and that y and v are also parallel and of the same length. From the parallelogram law for vector addition and the definitions of x and u,
x = (1/2)A + (1/2)B
and
u = (1/2)C + (1/2)D.
But also by the parallelogram law, A + B is the arrow from P0 to P1, while C + D is the arrow from P1 to P0. These arrows have the same length and opposite directions. This means that
A + B = −(C + D).
But then x = −u, so these vectors are parallel and of the same length (just opposite in direction). A similar argument shows that y and v are also parallel and of the same length, completing the proof.
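The midpoint argument of Example 6.4 can be spot-checked numerically for any particular quadrilateral. In the sketch below (the coordinates and helper names are arbitrary choices of ours), the connector between midpoints of one pair of successive sides comes out as the negative of the connector for the opposite pair, i.e. parallel and of equal length.

```python
def midpoint(p, q):
    # Midpoint of the segment from p to q.
    return tuple((a + b) / 2 for a, b in zip(p, q))

def vec(p, q):
    # Arrow from p to q, componentwise.
    return tuple(b - a for a, b in zip(p, q))

# An arbitrary quadrilateral with vertices listed in order.
P0, P1, P2, P3 = (0, 0), (4, 1), (5, 6), (-1, 3)
m01, m12 = midpoint(P0, P1), midpoint(P1, P2)
m23, m30 = midpoint(P2, P3), midpoint(P3, P0)
x = vec(m01, m12)  # connector along one side of the inner figure
u = vec(m23, m30)  # connector along the opposite side
print(x, u)        # u is the negative of x
```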
SECTION 6.1
PROBLEMS
In each of Problems 1 through 5, compute F + G, F − G, ‖F‖, ‖G‖, 2F, and 3G.
1. F = √2 i − 3j + 5k, G = √2 i + 6j − 5k
2. F = i − 3k, G = 4j
3. F = 2i − 5j, G = i + 5j − k
4. F = √2 i + j − 6k, G = 8i + 2k
5. F = i + j + k, G = 2i − 2j + 2k
In each of Problems 6 through 10, calculate F + G and F − G by representing the vectors as arrows and using the parallelogram law.
6. F = i, G = 6j
7. F = 2i − j, G = i − j
8. F = −3i + j, G = 4j
9. F = i − 2j, G = i − 3j
10. F = −i + 4j, G = −2i − 2j
In each of Problems 11 through 15, determine αF and represent F and αF as arrows.
11. F = i + j, α = −1/2
12. F = 6i − 2j, α = 2
13. F = −3j, α = −4
14. F = 6i − 6j, α = 1/2
15. F = −3i + 2j, α = 3
In each of Problems 16 through 21, find the parametric equations of the straight line containing the given points. Also find the normal form of this line.
16. (1, 0, 4), (2, 1, 1)
17. (3, 0, 0), (−3, 1, 0)
18. (2, 1, 1), (2, 1, −2)
19. (0, 1, 3), (0, 0, 1)
20. (1, 0, −4), (−2, −2, 5)
21. (2, −3, 6), (−1, 6, 4)
In each of Problems 22 through 26, find a vector F in the x, y-plane having the given length and making the given angle (in radians) with the positive x-axis. Represent the vector as an arrow in the plane.
22. √5, π/4
23. 6, π/3
24. 5, 3π/5
25. 15, 7π/4
26. 25, 3π/2
27. Let P1, P2, …, Pn be distinct points in 3-space, with n ≥ 3. Let Fi be the vector represented by the arrow from Pi to Pi+1 for i = 1, 2, …, n − 1, and let Fn be the vector represented by the arrow from Pn to P1. Prove that F1 + F2 + · · · + Fn = O.
28. Let F be any nonzero vector. Determine a scalar t such that ‖tF‖ = 1.
29. Use vectors to prove that the altitudes of any triangle intersect in a single point. (Recall that an altitude is a line from a vertex, perpendicular to the opposite side of the triangle.)
6.2 The Dot Product

Throughout this section, let F = a1 i + b1 j + c1 k and G = a2 i + b2 j + c2 k.
DEFINITION 6.5
Dot Product
The dot product of F and G is the number F · G defined by
F · G = a1a2 + b1b2 + c1c2.
For example,
(√3 i + 4j − πk) · (−2i + 6j + 3k) = −2√3 + 24 − 3π.
Sometimes the dot product is referred to as a scalar product, since the dot product of two vectors is a scalar (real number). This must not be confused with the product of a vector with a scalar. Here are some rules for operating with the dot product. THEOREM 6.3
Properties of the Dot Product
Let F, G, and H be vectors, and let α be a scalar. Then
1. F · G = G · F.
2. (F + G) · H = F · H + G · H.
3. α(F · G) = (αF) · G = F · (αG).
4. F · F = ‖F‖².
5. F · F = 0 if and only if F = O.
Conclusion (1) is the commutativity of the dot product (we can perform the operation in either order), and (2) is a distributive law. Conclusion (3) states that a constant factors through a
dot product. Conclusion (4) is very useful in some kinds of calculations, as we will see shortly. A proof of the theorem involves routine calculations, two of which we will illustrate. Proof
For (3), write
α(F · G) = α(a1a2 + b1b2 + c1c2) = (αa1)a2 + (αb1)b2 + (αc1)c2 = (αF) · G.
Also,
α(F · G) = a1(αa2) + b1(αb2) + c1(αc2) = F · (αG).
For (4), we have
F · F = (a1 i + b1 j + c1 k) · (a1 i + b1 j + c1 k) = a1² + b1² + c1² = ‖F‖².
Using conclusion (4) of the theorem, we can derive a relationship we will use frequently.
Let F and G be vectors, and let α and β be scalars. Then
‖αF + βG‖² = α²‖F‖² + 2αβ F · G + β²‖G‖².
Proof By using Theorem 6.3, we have
‖αF + βG‖² = (αF + βG) · (αF + βG)
= α² F · F + αβ F · G + βα G · F + β² G · G
= α² F · F + 2αβ F · G + β² G · G
= α²‖F‖² + 2αβ F · G + β²‖G‖².
The dot product can be used to determine the angle between vectors. Represent F and G as arrows from a common point, as in Figure 6.15. Let θ be the angle between F and G. The arrow from the tip of F to the tip of G represents G − F, and these three vectors form the sides of a triangle. Now recall the law of cosines, which states, for the triangle of Figure 6.16, that
a² + b² − 2ab cos θ = c².   (6.2)
Apply this to the vector triangle of Figure 6.15, with sides of length a = ‖G‖, b = ‖F‖, and c = ‖G − F‖. Using Lemma 6.1 with α = −1 and β = 1, equation (6.2) becomes
‖G‖² + ‖F‖² − 2‖G‖ ‖F‖ cos θ = ‖G − F‖² = ‖G‖² + ‖F‖² − 2 G · F.
Then
F · G = ‖F‖ ‖G‖ cos θ.
Assuming that neither F nor G is the zero vector,
cos θ = F · G / (‖F‖ ‖G‖).   (6.3)
This provides a simple way of computing the cosine of the angle between two arrows representing vectors. Since vectors can be drawn along straight lines, this also lets us calculate the angle between two intersecting lines.
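Equation (6.3) is a one-liner in code. The sketch below (the name `angle_between` is ours) computes the angle of Example 6.5, assuming neither vector is the zero vector.

```python
import math

def dot(f, g):
    return sum(a * b for a, b in zip(f, g))

def angle_between(f, g):
    # cos(theta) = F . G / (||F|| ||G||), equation (6.3)
    cos_theta = dot(f, g) / math.sqrt(dot(f, f) * dot(g, g))
    return math.acos(cos_theta)

F = (-1, 3, 1)   # -i + 3j + k
G = (0, 2, -4)   # 2j - 4k
theta = angle_between(F, G)
print(theta)     # approximately 1.436 radians
```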
FIGURE 6.15; FIGURE 6.16 Law of cosines: a² + b² − 2ab cos θ = c²; FIGURE 6.17
EXAMPLE 6.5
Let F = −i + 3j + k and G = 2j − 4k. The cosine of the angle between these vectors (Figure 6.17) is cos = = Then
−i + 3j + k · 2j − 4k −i + 3j + k 2j − 4k −10 + 32 + 1−4 2 =√ √ √ 2 2 2 2 2 1 +3 +1 2 +4 220 √ = arccos2/ 220
√ which is that unique number in 0 whose cosine is 2/ 220. is approximately 1436 radians.
EXAMPLE 6.6
Lines L1 and L2 are given, respectively, by the parametric equations
x = 1 + 6t, y = 2 − 4t, z = −1 + 3t
and
x = 4 − 3p, y = 2p, z = −5 + 4p,
in which the parameters t and p take on all real values. We want the angle between these lines at their point of intersection, which is (1, 2, −1) (on L1 for t = 0 and on L2 for p = 1). Of course, two intersecting lines have two angles between them (Figure 6.18). However, the sum of these angles is π, so either angle determines the other. The strategy for solving this problem is to find a vector along each line, then find the angle between these vectors. For a vector F along L1, choose any two points on this line, say (1, 2, −1) and, with t = 1, (7, −2, 2). The vector from the first to the second point is
F = (7 − 1)i + (−2 − 2)j + (2 − (−1))k = 6i − 4j + 3k.
FIGURE 6.18, FIGURE 6.19
Two points on L2 are (1, 2, −1) and, with p = 0, (4, 0, −5). The vector G from the first to the second of these points is
G = (4 − 1)i + (0 − 2)j + (−5 − (−1))k = 3i − 2j − 4k.
These vectors are shown in Figure 6.19. The cosine of the angle between F and G is
cos θ = F · G / (‖F‖ ‖G‖) = [(6)(3) + (−4)(−2) + (3)(−4)] / (√(36 + 16 + 9) √(9 + 4 + 16)) = 14/√1769.
One angle between the lines is θ = arccos(14/√1769), approximately 1.23 radians.
If we had used −G in place of G in this calculation, we would have gotten θ = arccos(−14/√1769), or about 1.91 radians. This is the supplement of the angle found in the example.
EXAMPLE 6.7
The points A1 −2 1, B0 1 6, and C−3 4 −2 form the vertices of a triangle. Suppose we want the angle between the line AB and the line from A to the midpoint of BC. This line is a median of the triangle and is shown in Figure 6.20 Visualize the sides of the triangle as vectors, as in Figure 6.21. If P is the midpoint of BC, then H1 = H2 because both vectors have the same direction and length. From the coordinates of the vertices, calculate F = −i + 3j + 5k and G = −4i + 6j − 3k We want the angle between F and K, so we need K. By the parallelogram law, F + H1 = K and K + H2 = G Since H1 = H2 , these equations imply that K = F + H1 = F + G − K Therefore, 1 5 9 K = F + G = − i + j + k 2 2 2
FIGURE 6.20, FIGURE 6.21
Now the cosine of the angle we want is
cos θ = F · K / (‖F‖ ‖K‖) = 42 / (√35 √110) = 42/√3850.
θ is approximately 0.83 radians.
The arrows representing two nonzero vectors F and G are perpendicular exactly when the cosine of the angle between them is zero, and by equation (6.3) this occurs when F · G = 0. This suggests we use this condition to define orthogonality (perpendicularity) of vectors. If we agree to the convention that the zero vector is orthogonal to every vector, then this dot product condition allows a general definition without requiring that the vectors be nonzero.
DEFINITION 6.6
Orthogonal Vectors
Vectors F and G are orthogonal if and only if F · G = 0.
EXAMPLE 6.8
Let F = −4i + j + 2k, G = 2i + 4k and H = 6i − j − 2k. Then F · G = 0, so F and G are orthogonal. But F · H and G · H are nonzero, so F and H are not orthogonal, and G and H are not orthogonal. Sometimes orthogonality of vectors is a useful device for dealing with lines and planes in three-dimensional space.
EXAMPLE 6.9
Two lines are given parametrically by
L1: x = 2 − 4t, y = 6 + t, z = 3t
and
L2: x = −2 + p, y = 7 + 2p, z = 3 − 4p.
We want to know whether these lines are perpendicular. (It does not matter whether the lines intersect.)
The idea is to form a vector along each line and test these vectors for orthogonality. For a vector along L1, choose two points on this line, say (2, 6, 0) when t = 0 and (−2, 7, 3) when t = 1. Then F = −4i + j + 3k is along L1. Two points on L2 are (−2, 7, 3) when p = 0 and (−1, 9, −1) when p = 1. Then G = i + 2j − 4k is along L2. Since F · G = −14 ≠ 0, these vectors, hence these lines, are not orthogonal.
EXAMPLE 6.10
Suppose we want the equation of a plane Π containing the point (−6, 1, 1) and perpendicular to the vector N = −2i + 4j + k.
FIGURE 6.22
A strategy to find such an equation is suggested by Figure 6.22. A point (x, y, z) is on Π if and only if the vector from (−6, 1, 1) to (x, y, z) lies in Π and therefore is orthogonal to N. This means that
[(x + 6)i + (y − 1)j + (z − 1)k] · N = 0.
Carrying out this dot product, we get the equation
−2(x + 6) + 4(y − 1) + (z − 1) = 0,
or
−2x + 4y + z = 17.
This is the equation of the plane. Of course the given point (−6, 1, 1) satisfies this equation.
We will conclude this section with the important Cauchy-Schwarz inequality, which states that the dot product of two vectors cannot be greater in absolute value than the product of the lengths of the vectors.
THEOREM 6.4
Cauchy-Schwarz Inequality
Let F and G be vectors. Then
|F · G| ≤ ‖F‖ ‖G‖.
Proof If either vector is the zero vector, then both sides of the proposed inequality are zero. Thus suppose neither vector is the zero vector. In this event,
cos θ = F · G / (‖F‖ ‖G‖),
where θ is the angle between F and G. But then
−1 ≤ F · G / (‖F‖ ‖G‖) ≤ 1,
so
−‖F‖ ‖G‖ ≤ F · G ≤ ‖F‖ ‖G‖,
which is equivalent to the Cauchy-Schwarz inequality.
SECTION 6.2
PROBLEMS
In each of Problems 1 through 6, compute the dot product of the vectors and the cosine of the angle between them. Also determine whether they are orthogonal, and verify the Cauchy-Schwarz inequality for these vectors.
1. i, 2i − 3j + k
2. 2i − 6j + k, i − j
3. −4i − 2j + 3k, 6i − 2j − k
4. 8i − 3j + 2k, −8i − 3j + k
5. i − 3k, 2j + 6k
6. i + j + 2k, i − j + 2k
In each of Problems 7 through 12, find the equation of the plane containing the given point and having the given vector as normal vector.
7. (−1, 1, 2), 3i − j + 4k
8. (−1, 0, 0), i − 2j
9. (2, −3, 4), 8i − 6j + 4k
10. (−1, −1, −5), −3i + 2j
11. (0, −1, 4), 7i + 6j − 5k
12. (−2, 1, −1), 4i + 3j + k
In each of Problems 13 through 16, find the cosine of the angle between AB and the line from A to the midpoint of BC.
13. A = (1, −2, 6), B = (3, 0, 1), C = (4, 2, −7)
14. A = (3, −2, −3), B = (−2, 0, 1), C = (1, 1, 7)
15. A = (1, −2, 6), B = (0, 4, −3), C = (−3, −2, 7)
16. A = (0, 5, −1), B = (1, −2, 5), C = (7, 0, −1)
17. Suppose F · X = 0 for every vector X. What can be concluded about F?
18. Suppose F · i = F · j = F · k = 0. What can be concluded about F?
19. Suppose F ≠ O. Prove that the unit vector u for which F · u is a maximum must be parallel to F.
20. Prove that for any vector F, F = (F · i)i + (F · j)j + (F · k)k.
6.3 The Cross Product

The dot product produces a scalar from two vectors. We will now define the cross product, which produces a vector from two vectors. For this section, let F = a1 i + b1 j + c1 k and G = a2 i + b2 j + c2 k.
DEFINITION 6.7
Cross Product
The cross product of F with G is the vector F × G defined by
F × G = (b1c2 − b2c1)i + (a2c1 − a1c2)j + (a1b2 − a2b1)k.
This vector is read “F cross G.” For example,
(i + 2j − 3k) × (−2i + j + 4k) = (8 + 3)i + (6 − 4)j + (1 + 4)k = 11i + 2j + 5k.
A cross product is often computed as a three-by-three “determinant,” with the unit vectors in the first row, the components of F in the second row, and the components of G in the third. Expanded by the first row, this determinant gives F × G. For example,
| i   j   k |
| 1   2  −3 |
| −2  1   4 |
= [(2)(4) − (−3)(1)]i − [(1)(4) − (−3)(−2)]j + [(1)(1) − (2)(−2)]k = 11i + 2j + 5k = F × G.
The interchange of two rows in a determinant results in a change of sign. This means that interchanging F and G in the cross product also results in a change of sign:
F × G = −(G × F).
This is also apparent from the definition. Unlike addition and multiplication of real numbers and the dot product operation, the cross product is not commutative: the order in which it is performed makes a difference. The same is true of many physical processes; for example, the order in which chemicals are combined may make a significant difference. Some of the rules we need to compute with cross products are given in the next theorem.
THEOREM 6.5
Properties of the Cross Product
Let F, G, and H be vectors and let α be a scalar. Then
1. F × G = −(G × F).
2. F × G is orthogonal to both F and G.
3. ‖F × G‖ = ‖F‖ ‖G‖ sin θ, in which θ is the angle between F and G.
4. If F and G are not zero vectors, then F × G = O if and only if F and G are parallel.
5. F × (G + H) = F × G + F × H.
6. α(F × G) = (αF) × G = F × (αG).
Proofs of these statements are for the most part routine calculations. We will prove (2) and (3).
Proof For (2), compute
F · (F × G) = a1(b1c2 − b2c1) + b1(a2c1 − a1c2) + c1(a1b2 − a2b1) = 0.
Therefore F and F × G are orthogonal. A similar argument holds for G.
For (3), compute
‖F × G‖² = (b1c2 − b2c1)² + (a2c1 − a1c2)² + (a1b2 − a2b1)²
= (a1² + b1² + c1²)(a2² + b2² + c2²) − (a1a2 + b1b2 + c1c2)²
= ‖F‖²‖G‖² − (F · G)²
= ‖F‖²‖G‖² − ‖F‖²‖G‖² cos² θ
= ‖F‖²‖G‖²(1 − cos² θ) = ‖F‖²‖G‖² sin² θ.
Because 0 ≤ θ ≤ π, all of the factors whose squares appear in this equation are nonnegative, and upon taking square roots we obtain conclusion (3).
If F and G are nonzero and not parallel, then arrows representing these vectors determine a plane in 3-dimensional space (Figure 6.23). F × G is orthogonal to this plane and oriented as in Figure 6.24. If a person's right hand is placed so that the fingers curl from F to G, then the thumb points up along F × G. This is referred to as the right-hand rule. G × F = −(F × G) points in the opposite direction. As a simple example, i × j = k, and these three vectors define a standard right-handed coordinate system in 3-space.
FIGURE 6.23 Plane determined by F and G.
FIGURE 6.24 Right-hand rule gives the direction of F × G.
The fact that F × G is orthogonal to both F and G is often useful. If they are not parallel, then vectors F and G determine a plane Π (Figure 6.23). This is consistent with the fact that three points not on the same straight line determine a plane. One point forms a base point for drawing the arrows representing F and G, and the other two points are the terminal points of these arrows. If we know a vector orthogonal to both F and G, then this vector is orthogonal to every vector in Π. Such a vector is said to be normal to Π. In Example 6.10 we showed how to find the equation of a plane, given a point in the plane and a normal vector. Now we can find the equation of a plane given three points in it (not all on a line), because we can use the cross product to produce a normal vector.
EXAMPLE 6.11
Suppose we want the equation of the plane Π containing the points (1, 2, 1), (−1, 1, 3), and (−2, −2, −2). Begin by finding a vector normal to Π. We will do this by finding two vectors in Π and taking their cross product. The vectors from (1, 2, 1) to the other two given points lie in Π (Figure 6.25). These vectors are
F = −2i − j + 2k and G = −3i − 4j − 3k.
FIGURE 6.25
Form
N = F × G = det
| i    j    k |
| −2  −1    2 |
| −3  −4   −3 |
= 11i − 12j + 5k.
This vector is normal to Π (orthogonal to every vector lying in Π). Now proceed as in Example 6.10. If (x, y, z) is any point in Π, then (x − 1)i + (y − 2)j + (z − 1)k lies in Π and so is orthogonal to N. Therefore,
[(x − 1)i + (y − 2)j + (z − 1)k] · N = 11(x − 1) − 12(y − 2) + 5(z − 1) = 0.
This gives
11x − 12y + 5z = −8.
This is the equation of the plane, in the sense that a point (x, y, z) is in the plane if and only if its coordinates satisfy this equation.
If we had specified three collinear points in this example, then we would have found that F and G are parallel, hence F × G = O. When we calculated this cross product and got a nonzero vector, we knew that the points were not collinear.
The cross product also has geometric interpretations as an area or volume.
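The whole computation of Example 6.11 can be scripted: form two vectors in the plane, take their cross product for N, then read off the coefficients of the plane equation. The helper names here are ours, and the collinearity check mirrors the remark above.

```python
def sub(p, q):
    # Vector from point p to point q.
    return tuple(b - a for a, b in zip(p, q))

def cross(f, g):
    a1, b1, c1 = f
    a2, b2, c2 = g
    return (b1 * c2 - b2 * c1, a2 * c1 - a1 * c2, a1 * b2 - a2 * b1)

def plane_through(p0, p1, p2):
    # Returns (A, B, C, D) for the plane equation Ax + By + Cz = D.
    n = cross(sub(p0, p1), sub(p0, p2))
    if n == (0, 0, 0):
        raise ValueError("points are collinear")
    d = sum(a * x for a, x in zip(n, p0))
    return (*n, d)

print(plane_through((1, 2, 1), (-1, 1, 3), (-2, -2, -2)))  # (11, -12, 5, -8)
```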
THEOREM 6.6
Let F and G be represented by arrows lying along incident sides of a parallelogram (Figure 6.26). Then the area of this parallelogram is ‖F × G‖.
Proof The area of a parallelogram is the product of the lengths of two incident sides and the sine of the angle between them. Draw vectors F and G along two incident sides. Then
FIGURE 6.26 Area = ‖F × G‖.
these sides have length ‖F‖ and ‖G‖. If θ is the angle between them, then the area of the parallelogram is ‖F‖ ‖G‖ sin θ. But this is exactly ‖F × G‖.
EXAMPLE 6.12
A parallelogram has two sides extending from (0, 1, −2) to (1, 2, 2) and from (0, 1, −2) to (1, 4, 1). We want to find the area of this parallelogram. Form vectors along these sides:
F = i + j + 4k and G = i + 3j + 3k.
Calculate
F × G = det
| i  j  k |
| 1  1  4 |
| 1  3  3 |
= −9i + j + 2k,
and the area of the parallelogram is ‖F × G‖ = √86 square units.
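Theorem 6.6 gives a two-line area routine. This sketch (names ours) reproduces Example 6.12.

```python
import math

def cross(f, g):
    a1, b1, c1 = f
    a2, b2, c2 = g
    return (b1 * c2 - b2 * c1, a2 * c1 - a1 * c2, a1 * b2 - a2 * b1)

def parallelogram_area(f, g):
    # Area = ||F x G|| (Theorem 6.6).
    return math.sqrt(sum(x * x for x in cross(f, g)))

F = (1, 1, 4)   # i + j + 4k
G = (1, 3, 3)   # i + 3j + 3k
print(parallelogram_area(F, G))   # sqrt(86), approximately 9.2736
```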
If a rectangular box is skewed as in Figure 6.27, the resulting solid is called a rectangular parallelopiped. All of its faces are parallelograms. We can find the volume of such a solid by combining dot and cross products, as follows.
FIGURE 6.27 Parallelopiped.
FIGURE 6.28 Volume = |H · (F × G)|.
THEOREM 6.7
Let F, G, and H be vectors along incident sides of a rectangular parallelopiped. Then the volume of the parallelopiped is |H · (F × G)|. This is the absolute value of the real number formed by taking the dot product of H with F × G.
Proof Figure 6.28 shows the parallelopiped. F × G is normal to the plane of F and G and oriented as shown in the diagram, according to the right-hand rule. If H is along the third incident side of the parallelopiped, and ψ is the angle between H and F × G, then ‖H‖ |cos ψ| is the altitude of the parallelopiped. The area of the base parallelogram is ‖F × G‖ by Theorem 6.6. Thus the volume of the parallelopiped is
‖H‖ ‖F × G‖ |cos ψ|.
But this is |H · (F × G)|.
EXAMPLE 6.13
One corner of a rectangular parallelopiped is at (−1, 2, 2), and three incident sides extend from this point to (0, 1, 1), (−4, 6, 8), and (−3, −2, 4). To find the volume of this solid, form the vectors
F = (0 − (−1))i + (1 − 2)j + (1 − 2)k = i − j − k,
G = (−4 − (−1))i + (6 − 2)j + (8 − 2)k = −3i + 4j + 6k,
and
H = (−3 − (−1))i + (−2 − 2)j + (4 − 2)k = −2i − 4j + 2k.
Calculate
F × G = det
| i   j   k |
| 1  −1  −1 |
| −3  4   6 |
= −2i − 3j + k.
Then
H · (F × G) = (−2)(−2) + (−4)(−3) + (2)(1) = 18,
and the volume is 18 cubic units.
The quantity H · (F × G) is called a scalar triple product. We will outline one of its properties in the problems.
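Theorem 6.7 translates into a scalar-triple-product routine. This sketch (names ours) reproduces Example 6.13.

```python
def cross(f, g):
    a1, b1, c1 = f
    a2, b2, c2 = g
    return (b1 * c2 - b2 * c1, a2 * c1 - a1 * c2, a1 * b2 - a2 * b1)

def box_volume(f, g, h):
    # Volume = |H . (F x G)| (Theorem 6.7).
    return abs(sum(a * b for a, b in zip(h, cross(f, g))))

F = (1, -1, -1)    # i - j - k
G = (-3, 4, 6)     # -3i + 4j + 6k
H = (-2, -4, 2)    # -2i - 4j + 2k
print(box_volume(F, G, H))   # 18
```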
SECTION 6.3
PROBLEMS
In each of Problems 1 through 6, compute F × G and, independently, G × F, verifying that one is the negative of the other. Use the dot product to compute the cosine of the angle θ between F and G, and use this to determine sin θ. Then calculate ‖F‖ ‖G‖ sin θ and verify that this gives ‖F × G‖.
1. F = −3i + 6j + k, G = −i − 2j + k
2. F = 6i − k, G = j + 2k
3. F = 2i − 3j + 4k, G = −3i + 2j
4. F = 8i + 6j, G = 14j
5. F = 5i + 3j + 4k, G = 20i + 6k
6. F = 2k, G = 8i − j
In each of Problems 7 through 11, determine whether the points are collinear. If they are not, find an equation of the plane containing all three points.
7. (−1, 1, 6), (2, 0, 1), (3, 0, 0)
8. (4, 1, 1), (−2, −2, 3), (6, 0, 1)
9. (1, 0, −2), (0, 0, 0), (5, 1, 1)
10. (0, 0, 2), (−4, 1, 0), (2, −1, 1)
11. (−4, 2, −6), (1, 1, 3), (−2, 4, 5)
In each of Problems 12 through 16, find the area of the parallelogram having incident sides extending from the first point to each of the other two.
12. (1, −3, 7), (2, 1, 1), (6, −1, 2)
13. (6, 1, 1), (7, −2, 4), (8, −4, 3)
14. (−2, 1, 6), (2, 1, −7), (4, 1, 1)
15. (4, 2, −3), (6, 2, −1), (2, −6, 4)
16. (1, 1, −8), (9, −3, 0), (−2, 5, 2)
In each of Problems 17 through 21, find the volume of the parallelopiped whose incident sides extend from the first point to each of the other three.
17. (1, 1, 1), (−4, 2, 7), (3, 5, 7), (0, 1, 6)
18. (0, 1, −6), (−3, 1, 4), (1, 7, 2), (−3, 0, 4)
19. (1, 6, 1), (−2, 4, 2), (3, 0, 0), (2, 2, −4)
20. (0, 1, 7), (9, 1, 3), (−2, 4, 1), (3, 0, −3)
21. (1, 1, 1), (2, 2, 2), (6, 1, 3), (−2, 4, 6)
In each of Problems 22 through 26, find a vector normal to the given plane. There are infinitely many such vectors.
22. 8x − y + z = 12
23. x − y + 2z = 0
24. x − 3y + 2z = 9
25. 7x + y − 7z = 7
26. 4x + 6y + 4z = −5
27. Prove that F × (G + H) = F × G + F × H.
28. Prove that α(F × G) = (αF) × G = F × (αG).
29. Prove that F × (G × H) = (F · H)G − (F · G)H.
30. Use vector operations to find a formula for the area of the triangle having vertices (ai, bi, ci) for i = 1, 2, 3. What conditions must be placed on these coordinates to ensure that the points are not collinear (all on a line)?
The scalar triple product of F, G, and H is defined to be [F, G, H] = F · (G × H).
31. Let F = a1 i + b1 j + c1 k, G = a2 i + b2 j + c2 k, and H = a3 i + b3 j + c3 k. Prove that [F, G, H] = det
| a1  b1  c1 |
| a2  b2  c2 |
| a3  b3  c3 |
6.4 The Vector Space Rⁿ

The world of everyday experience has three space dimensions. But often we encounter settings in which more dimensions occur. If we want to specify not only the location of a particle but also the time at which it occupies a particular point, we need four coordinates (x, y, z, t). And specifying the location of each particle in a system of particles may require any number of coordinates. The natural setting for such problems is Rⁿ, the space of points having n coordinates.
DEFINITION 6.8
n-vector
If n is a positive integer, an n-vector is an n-tuple ⟨x1, x2, …, xn⟩, with each coordinate xj a real number. The set of all n-vectors is denoted Rⁿ.
R¹ is the real line, consisting of all real numbers. We can think of real numbers as 1-vectors, but there is no advantage in doing this. R² consists of ordered pairs (x, y) of real numbers, and each such ordered pair (or 2-vector) can be identified with a point in the plane. R³ consists of all 3-vectors, or points in 3-space. If n ≥ 4, we can no longer draw a set of mutually perpendicular coordinate axes, one for each coordinate, but we can still work with vectors in Rⁿ according to rules we will now describe.
DEFINITION 6.9
Algebra of R n
1. Two n-vectors are added by adding their respective components:
⟨x1, x2, …, xn⟩ + ⟨y1, y2, …, yn⟩ = ⟨x1 + y1, x2 + y2, …, xn + yn⟩.
2. An n-vector is multiplied by a scalar by multiplying each component by the scalar:
α⟨x1, x2, …, xn⟩ = ⟨αx1, αx2, …, αxn⟩.
The zero vector in Rⁿ is the n-vector O = ⟨0, 0, …, 0⟩ having each coordinate equal to zero. The negative of F = ⟨x1, x2, …, xn⟩ is −F = ⟨−x1, −x2, …, −xn⟩. As we did with n = 3, we denote G + (−F) as G − F. The algebraic rules in Rⁿ mirror those we saw for R³.
THEOREM 6.8
Let F, G, and H be in Rⁿ, and let α and β be real numbers. Then
1. F + G = G + F.
2. (F + G) + H = F + (G + H).
3. F + O = F.
4. (α + β)F = αF + βF.
5. (αβ)F = α(βF).
6. α(F + G) = αF + αG.
7. αO = O.
Because of these properties of the operations of addition of n-vectors and multiplication of an n-vector by a scalar, we call Rⁿ a vector space. In the next section we will clarify the sense in which Rⁿ can be said to have dimension n.
The length (norm, magnitude) of F = ⟨x1, x2, …, xn⟩ is defined by a direct generalization from the plane and 3-space:
‖F‖ = √(x1² + x2² + · · · + xn²).
There is no analogue of the cross product for vectors in Rⁿ when n > 3. However, the dot product readily extends to n-vectors.
DEFINITION 6.10
Dot Product of n-Vectors
The dot product of ⟨x1, x2, …, xn⟩ and ⟨y1, y2, …, yn⟩ is defined by
⟨x1, x2, …, xn⟩ · ⟨y1, y2, …, yn⟩ = x1y1 + x2y2 + · · · + xnyn.
All of the conclusions of Theorem 6.3 remain true for n-vectors, as does Lemma 6.1. We will record these results for completeness. THEOREM 6.9
Let F, G, and H be n-vectors, and let α and β be real numbers. Then
1. F · G = G · F.
2. (F + G) · H = F · H + G · H.
3. α(F · G) = (αF) · G = F · (αG).
4. F · F = ‖F‖².
6.4 The Vector Space Rⁿ
5. F · F = 0 if and only if F = O.
6. ‖αF + βG‖² = α²‖F‖² + 2αβ(F · G) + β²‖G‖².
The Cauchy-Schwarz inequality holds for n-vectors, but the proof given previously for 3-vectors does not generalize to Rⁿ. We will therefore give a proof that is valid for any n. THEOREM 6.10
Cauchy-Schwarz Inequality in R n
Let F and G be in Rⁿ. Then
|F · G| ≤ ‖F‖ ‖G‖.
Proof  The inequality reduces to 0 ≤ 0 if either vector is the zero vector. Thus suppose F ≠ O and G ≠ O. Choose α = ‖G‖ and β = −‖F‖ in Theorem 6.9(6). We get
0 ≤ ‖αF + βG‖² = ‖G‖²‖F‖² − 2‖G‖ ‖F‖ (F · G) + ‖F‖²‖G‖².
Upon dividing this inequality by 2‖F‖ ‖G‖ we obtain
F · G ≤ ‖F‖ ‖G‖.
Now go back to Theorem 6.9(6), but this time choose α = ‖G‖ and β = ‖F‖ to get
0 ≤ ‖G‖²‖F‖² + 2‖G‖ ‖F‖ (F · G) + ‖F‖²‖G‖²,
and upon dividing by 2‖F‖ ‖G‖ we get
−‖F‖ ‖G‖ ≤ F · G.
We have now shown that
−‖F‖ ‖G‖ ≤ F · G ≤ ‖F‖ ‖G‖,
and this is equivalent to the Cauchy-Schwarz inequality.
In view of the Cauchy-Schwarz inequality, we can define the cosine of the angle θ between vectors F and G in Rⁿ by
cos θ = 0 if F or G equals the zero vector,
cos θ = F · G/(‖F‖ ‖G‖) if F ≠ O and G ≠ O.
This is sometimes useful in bringing some geometric intuition to Rⁿ. For example, it is natural to define F and G to be orthogonal if the angle between them is π/2, and by this definition of cos θ, this is equivalent to requiring that F · G = 0, consistent with orthogonality in R² and R³. We can define a standard representation of vectors in Rⁿ by defining unit vectors along the n directions:
e1 = ⟨1, 0, 0, …, 0⟩, e2 = ⟨0, 1, 0, …, 0⟩, …, en = ⟨0, 0, …, 0, 1⟩.
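The angle definition and the Cauchy-Schwarz inequality can be checked numerically. An illustrative Python sketch (the `cos_angle` helper is ours, following the definition above):

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def cos_angle(f, g):
    """cos(theta) as defined above: 0 if either vector is the zero vector."""
    nf, ng = norm(f), norm(g)
    if nf == 0 or ng == 0:
        return 0.0
    return dot(f, g) / (nf * ng)

F = [1, -2, 4, 0, 3]
G = [2, 1, -1, 5, 2]
# Cauchy-Schwarz: |F . G| <= ||F|| ||G||, so cos(theta) lies in [-1, 1]
assert abs(dot(F, G)) <= norm(F) * norm(G)
print(cos_angle(F, G))
```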
Now any n-vector can be written
⟨x1, x2, …, xn⟩ = x1e1 + x2e2 + ⋯ + xnen = ∑_{j=1}^{n} xj ej.
A set of n-vectors that contains the zero vector and is closed under addition and under multiplication by scalars is called a subspace of Rⁿ.
DEFINITION 6.11
Subspace
A set S of n-vectors is a subspace of Rⁿ if:
1. O is in S.
2. The sum of any two vectors in S is in S.
3. The product of any vector in S with any real number is also in S.
Conditions (2) and (3) of the definition can be combined by requiring that αF + βG be in S for any vectors F and G in S, and any real numbers α and β.
EXAMPLE 6.14
Let S consist of all vectors in Rⁿ having norm 1. In R² (the plane) this is the set of points on the unit circle about the origin; in R³ it is the set of points on the unit sphere about the origin. S is not a subspace of Rⁿ for several reasons. First, O is not in S, because O does not have norm 1. Further, a sum of vectors in S need not be in S (a sum of vectors having norm 1 need not have norm 1). And, if |α| ≠ 1 and F has norm 1, then αF does not have norm 1, so αF is not in S. In this example S failed all three criteria for being a subspace. It is enough to fail one to disqualify a set of vectors from being a subspace.
EXAMPLE 6.15
Let K consist of all scalar multiples of ⟨−1, 4, 2, 0⟩ in R⁴. We want to know if K is a subspace of R⁴. First, O is in K, because O = 0⟨−1, 4, 2, 0⟩ = ⟨0, 0, 0, 0⟩. Next, if F and G are in K, then F = α⟨−1, 4, 2, 0⟩ for some α and G = β⟨−1, 4, 2, 0⟩ for some β, so
F + G = (α + β)⟨−1, 4, 2, 0⟩
is a scalar multiple of ⟨−1, 4, 2, 0⟩ and therefore is in K. Finally, if F = α⟨−1, 4, 2, 0⟩ is any vector in K, and β is any scalar, then
βF = βα⟨−1, 4, 2, 0⟩
is a scalar multiple of ⟨−1, 4, 2, 0⟩, and hence is in K. Thus K is a subspace of R⁴.
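The membership test behind this example (is v a scalar multiple of a fixed nonzero vector?) can be sketched numerically. Illustrative Python; the helper name and tolerance are our own choices:

```python
def is_scalar_multiple(v, f, tol=1e-12):
    """Test whether v = alpha * f for some scalar alpha (f a nonzero vector)."""
    # Find the first nonzero component of f; it determines the only
    # candidate alpha.  Then check every component against that alpha.
    for vi, fi in zip(v, f):
        if abs(fi) > tol:
            alpha = vi / fi
            break
    else:
        raise ValueError("f must be a nonzero vector")
    return all(abs(vi - alpha * fi) <= tol for vi, fi in zip(v, f))

f = [-1, 4, 2, 0]
print(is_scalar_multiple([-3, 12, 6, 0], f))  # True: alpha = 3
print(is_scalar_multiple([-3, 12, 6, 1], f))  # False: last component fails
```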
EXAMPLE 6.16
Let S consist of just the zero vector O in Rⁿ. Then S is a subspace of Rⁿ, called the trivial subspace. At the other extreme, Rⁿ is also a subspace of Rⁿ. In R² and R³ there are simple geometric characterizations of all possible subspaces. Begin with R². We claim that the only subspaces are the trivial subspace consisting of just the zero vector, R² itself, and the sets of all vectors lying along a single line through the origin. To demonstrate this, we need the following fact. LEMMA 6.2
Let F and G be nonzero vectors in R² that are not parallel. Then every vector in R² can be written in the form αF + βG for some scalars α and β.
Proof  Represent F and G as arrows from the origin (Figure 6.29). These determine nonparallel lines L1 and L2, respectively, through the origin, because F and G are assumed to be nonparallel. Let V be any 2-vector. If V = O, then V = 0F + 0G. We therefore consider the case that V ≠ O and represent it as an arrow from the origin as well. We want to show that V must be the sum of scalar multiples of F and G. If V is along L1, then V = αF for some real number α, and then V = αF + 0G. Similarly, if V is along L2, then V = βG = 0F + βG. Thus, suppose that V is not a scalar multiple of either F or G. Then the arrow representing V is not along L1 or L2. Now carry out the construction shown in Figure 6.30. Draw lines parallel to L1 and L2 from the tip of V. Arrows from the origin to where these parallels intersect L1 and L2 determine, respectively, vectors A and B. By the parallelogram law,
V = A + B.
But A is along L1, so A = αF for some scalar α. And B is along L2, and so B = βG for some scalar β. Thus V = αF + βG, completing the proof.
We can now completely characterize the subspaces of R².

[Figures 6.29 and 6.30: F and G drawn as arrows from the origin along nonparallel lines L1 and L2; V is decomposed as V = A + B by drawing parallels to L1 and L2 from the tip of V.]

THEOREM 6.11
The Subspaces of R 2
Let S be a subspace of R². Then one of the following three possibilities must hold:
1. S = R², or
2. S consists of just the zero vector, or
3. S consists of all vectors parallel to some straight line through the origin.
Proof  Suppose cases (1) and (2) do not hold. Because S is not the trivial subspace, S must contain at least one nonzero vector F. We will show that every vector in S is a scalar multiple of F. Suppose instead that there is a nonzero vector G in S that is not a scalar multiple of F. If V is any vector in R², then by Lemma 6.2, V = αF + βG for some scalars α and β. But S is a subspace of R², so αF + βG is in S. This would imply that every vector in R² is in S, hence that S = R², a contradiction. Therefore there is no vector in S that is not a scalar multiple of F. We conclude that every vector in S is a scalar multiple of F, and the proof is complete.
By a similar argument involving more cases, we can prove that any subspace of R 3 must be either the trivial subspace, R 3 itself, all vectors parallel to some line through the origin, or all vectors parallel to some plane through the origin.
SECTION 6.4
PROBLEMS
In each of Problems 1 through 6, find the sum of the vectors and express this sum in standard form. Calculate the dot product of the vectors and the angle between them. The latter may be expressed as an inverse cosine of a number.
1. ⟨−1, 6, 2, 4, 0⟩, ⟨6, −1, 4, 1, 1⟩
2. ⟨0, 1, 4, −3⟩, ⟨2, 8, 6, −4⟩
3. ⟨1, −4, 3, 2⟩, ⟨16, 0, 0, 4⟩
4. ⟨6, 1, 1, −1, 2⟩, ⟨4, 3, 5, 1, −2⟩
5. ⟨0, 1, 6, −4, 1, 2, 9, −3⟩, ⟨6, 6, −12, 4, −3, −3, 2, 7⟩
6. ⟨−5, 2, 2, −7, −8⟩, ⟨1, 1, 1, −8, 7⟩
In each of Problems 7 through 13, determine whether the set of vectors is a subspace of Rⁿ for the appropriate n.
7. S consists of all vectors ⟨x, y, z, x, x⟩ in R⁵.
8. S consists of all vectors ⟨x, 2x, 3x, y⟩ in R⁴.
9. S consists of all vectors ⟨x, 0, 0, 1, 0, y⟩ in R⁶.
10. S consists of all vectors ⟨0, x, y⟩ in R³.
11. S consists of all vectors ⟨x, y, x + y, x − y⟩ in R⁴.
12. S consists of all vectors in R⁷ having zero third and fifth components.
13. S consists of all vectors in R⁴ whose first and second components are equal.
14. Let S consist of all vectors in R³ on or parallel to the plane ax + by + cz = k, with a, b, and c real numbers, at least one of which is nonzero, and k ≠ 0. Is S a subspace of R³?
15. Let F and G be in Rⁿ. Prove that
‖F + G‖² + ‖F − G‖² = 2(‖F‖² + ‖G‖²).
Hint: Use the fact that the square of the norm of a vector is the dot product of the vector with itself.
16. Let F and G be orthogonal vectors in Rⁿ. Prove that ‖F + G‖² = ‖F‖² + ‖G‖². This is called Pythagoras's theorem.
17. Suppose F and G are vectors in Rⁿ satisfying the relationship of Pythagoras's theorem (Problem 16). Does it follow that F and G are orthogonal?
6.5 Linear Independence, Spanning Sets, and Dimension in Rⁿ

In solving systems of linear algebraic equations and systems of linear differential equations, as well as for later work in Fourier analysis, we will use terminology and ideas from linear algebra. We will define these terms in Rⁿ, where we have some geometric intuition.
DEFINITION 6.12
Linear Combinations in R n
A linear combination of k vectors F1, …, Fk in Rⁿ is a sum
α1F1 + ⋯ + αkFk
in which each αj is a real number.
For example,
−8⟨−2, 4, 1, 0⟩ + 6⟨1, 1, −1, 7⟩ − √2⟨8, 0, 0, 0⟩
is a linear combination of ⟨−2, 4, 1, 0⟩, ⟨1, 1, −1, 7⟩, and ⟨8, 0, 0, 0⟩ in R⁴. This linear combination is equal to the 4-vector
⟨22 − 8√2, −26, −14, 42⟩.
The set of all linear combinations of any given (finite) number of vectors in Rⁿ is always a subspace of Rⁿ. THEOREM 6.12
Let F1, …, Fk be in Rⁿ, and let V consist of all vectors α1F1 + ⋯ + αkFk, in which each αj can be any real number. Then V is a subspace of Rⁿ.
Proof  First, O is in V (choose α1 = α2 = ⋯ = αk = 0). Next, suppose G and H are in V. Then
G = α1F1 + ⋯ + αkFk and H = β1F1 + ⋯ + βkFk
for some real numbers α1, …, αk, β1, …, βk. Then
G + H = (α1 + β1)F1 + ⋯ + (αk + βk)Fk
is again a linear combination of F1, …, Fk, and so is in V. Finally, let G = α1F1 + ⋯ + αkFk be in V. If c is any real number, then
cG = (cα1)F1 + ⋯ + (cαk)Fk
is also a linear combination of F1, …, Fk, and is therefore in V. Therefore V is a subspace of Rⁿ.
Whenever we form a subspace by taking all linear combinations of given vectors, we say that these vectors span the subspace.
DEFINITION 6.13
Spanning Set
Let F1, …, Fk be vectors in a subspace S of Rⁿ. Then F1, …, Fk form a spanning set for S if every vector in S is a linear combination of F1, …, Fk. In this case we say that S is spanned by F1, …, Fk, or that F1, …, Fk span S.
For example, i, j, and k span R³, because every vector in R³ can be written ai + bj + ck. The vector i + j in R² spans the subspace consisting of all vectors α(i + j), with α any scalar. These vectors all lie along the straight line y = x through the origin in the plane. Different sets of vectors may span the same subspace of Rⁿ. Consider the following example.
EXAMPLE 6.17
Let V be the subspace of R⁴ consisting of all vectors ⟨α, β, 0, 0⟩. Every vector in V can be written
⟨α, β, 0, 0⟩ = α⟨1, 0, 0, 0⟩ + β⟨0, 1, 0, 0⟩,
so ⟨1, 0, 0, 0⟩ and ⟨0, 1, 0, 0⟩ span V. But we can also write any vector in V as
⟨α, β, 0, 0⟩ = (α/2)⟨2, 0, 0, 0⟩ + (β/2)⟨0, 2, 0, 0⟩,
so the vectors ⟨2, 0, 0, 0⟩ and ⟨0, 2, 0, 0⟩ also span V. Different numbers of vectors may also span the same subspace. The vectors
⟨4, 0, 0, 0⟩, ⟨0, 3, 0, 0⟩, ⟨1, 2, 0, 0⟩
also span V. To see this, write an arbitrary vector in V as
⟨α, β, 0, 0⟩ = ((α − 2)/4)⟨4, 0, 0, 0⟩ + ((β − 4)/3)⟨0, 3, 0, 0⟩ + 2⟨1, 2, 0, 0⟩.
The last example suggests that some spanning sets are more efficient than others. If two vectors will span a subspace V, why should we choose a spanning set with three vectors in it? Indeed, in the last example, the last vector in the spanning set ⟨4, 0, 0, 0⟩, ⟨0, 3, 0, 0⟩, ⟨1, 2, 0, 0⟩ is a linear combination of the first two:
⟨1, 2, 0, 0⟩ = (1/4)⟨4, 0, 0, 0⟩ + (2/3)⟨0, 3, 0, 0⟩.
Thus any linear combination of these three vectors can always be written as a linear combination of just ⟨4, 0, 0, 0⟩ and ⟨0, 3, 0, 0⟩. The third vector, being a linear combination of the first two, is "extraneous information." These ideas suggest the following definition.
DEFINITION 6.14
Linear Dependence and Independence
Let F1, …, Fk be vectors in Rⁿ.
1. F1, …, Fk are linearly dependent if and only if one of these vectors is a linear combination of the others.
2. F1, …, Fk are linearly independent if and only if they are not linearly dependent.
Linear dependence of F1, …, Fk means that, whatever information these vectors carry, not all of them are needed, because at least one of them can be written in terms of the others. For example, if
Fk = α1F1 + ⋯ + αk−1Fk−1,
then knowing just F1, …, Fk−1 gives us Fk as well. In this sense, linearly dependent vectors are redundant. We can remove at least one Fj and the remaining k − 1 vectors will span the same subspace of Rⁿ that F1, …, Fk do. Linear independence means that no one of the vectors F1, …, Fk is a linear combination of the others. Whatever these vectors are telling us (for example, specifying a subspace), we need all of them or we lose information. If we omit Fk, we cannot retrieve it from F1, …, Fk−1.
EXAMPLE 6.18
The vectors ⟨1, 1, 0⟩ and ⟨−2, 0, 3⟩ are linearly independent in R³. To prove this, suppose instead that these vectors are linearly dependent. Then one is a linear combination of the other, say
⟨−2, 0, 3⟩ = α⟨1, 1, 0⟩.
But then, from the first components, α = −2, while from the second components, α = 0, an impossibility. These vectors span the subspace V of R³ consisting of all vectors
α⟨−2, 0, 3⟩ + β⟨1, 1, 0⟩.
Both of the vectors ⟨1, 1, 0⟩ and ⟨−2, 0, 3⟩ are needed to describe V. If we omit one, say ⟨1, 1, 0⟩, then the subspace of R³ spanned by the remaining vector, ⟨−2, 0, 3⟩, is different from V. For example, it does not have ⟨1, 1, 0⟩ in it. The following is a useful characterization of linear dependence and independence. THEOREM 6.13
Let F1, …, Fk be vectors in Rⁿ. Then
1. F1, …, Fk are linearly dependent if and only if there are real numbers α1, …, αk, not all zero, such that
α1F1 + α2F2 + ⋯ + αkFk = O.
2. F1, …, Fk are linearly independent if and only if an equation
α1F1 + α2F2 + ⋯ + αkFk = O
can hold only if α1 = α2 = ⋯ = αk = 0.
Proof  To prove (1), suppose first that F1, …, Fk are linearly dependent. Then at least one of these vectors is a linear combination of the others. Say, to be specific, that
F1 = α2F2 + ⋯ + αkFk.
Then
F1 − α2F2 − ⋯ − αkFk = O.
But this is a linear combination of F1, …, Fk adding up to the zero vector and having a nonzero coefficient (the coefficient of F1 is 1). Conversely, suppose
α1F1 + α2F2 + ⋯ + αkFk = O
with at least some αj ≠ 0. We want to show that F1, …, Fk are linearly dependent. By renaming the vectors if necessary, we may suppose for convenience that α1 ≠ 0. But then
F1 = −(α2/α1)F2 − ⋯ − (αk/α1)Fk,
so F1 is a linear combination of F2, …, Fk and hence F1, …, Fk are linearly dependent. This completes the proof of (1). Conclusion (2) follows from (1) and the fact that F1, …, Fk are linearly independent exactly when these vectors are not linearly dependent.
This theorem suggests a strategy for determining whether a given set of vectors is linearly dependent or independent. Given F1, …, Fk, set
α1F1 + α2F2 + ⋯ + αkFk = O
(6.4)
and attempt to solve for the coefficients α1, …, αk. If equation (6.4) forces α1 = ⋯ = αk = 0, then F1, …, Fk are linearly independent. If we can find at least one nonzero αj so that equation (6.4) is true, then F1, …, Fk are linearly dependent.
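This test can be mechanized: the coefficient equations form a homogeneous linear system, and k vectors are independent exactly when that system forces all coefficients to zero, i.e., when the vectors have rank k. A sketch in Python (illustrative; the `rank` helper via Gaussian elimination is ours, not the book's):

```python
def rank(rows, tol=1e-9):
    """Rank of a list of row vectors, computed by Gaussian elimination."""
    m = [list(map(float, row)) for row in rows]
    rk = 0  # index of the next pivot row
    for c in range(len(m[0]) if m else 0):
        # find a pivot in column c at or below row rk
        piv = next((i for i in range(rk, len(m)) if abs(m[i][c]) > tol), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for i in range(len(m)):
            if i != rk and abs(m[i][c]) > tol:
                factor = m[i][c] / m[rk][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

vectors = [[1, 0, 3, 1], [0, 1, -6, -1], [0, 2, 1, 0]]
# k vectors are linearly independent exactly when their rank equals k
print(rank(vectors) == len(vectors))  # True
```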
EXAMPLE 6.19
Consider ⟨1, 0, 3, 1⟩, ⟨0, 1, −6, −1⟩, and ⟨0, 2, 1, 0⟩ in R⁴. We want to know whether these vectors are linearly dependent or independent. Look at a linear combination
c1⟨1, 0, 3, 1⟩ + c2⟨0, 1, −6, −1⟩ + c3⟨0, 2, 1, 0⟩ = ⟨0, 0, 0, 0⟩.
If this is to hold, then each component of the vector ⟨c1, c2 + 2c3, 3c1 − 6c2 + c3, c1 − c2⟩ must be zero:
c1 = 0
c2 + 2c3 = 0
3c1 − 6c2 + c3 = 0
c1 − c2 = 0.
The first equation gives c1 = 0, so the fourth equation tells us that c2 = 0. But then the second equation requires that c3 = 0. Therefore, the only linear combination of these three vectors that equals the zero vector is the trivial linear combination (all coefficients zero), and by (2) of Theorem 6.13, the vectors are linearly independent.
In the plane R², two vectors are linearly dependent if and only if they are parallel. In R³, two vectors are linearly dependent if and only if they are parallel, and three vectors are linearly dependent if and only if they lie in the same plane. Any set of vectors that includes the zero vector must be linearly dependent. Consider, for example, the vectors O, F2, …, Fk. Then
1O + 0F2 + ⋯ + 0Fk = O
is a linear combination of these vectors that adds up to the zero vector, but has a nonzero coefficient (the coefficient of O is 1). By Theorem 6.13(1), these vectors are linearly dependent. There is a special circumstance in which it is particularly easy to tell that a set of vectors is linearly independent. This is given in the following lemma, which we will use later.
LEMMA 6.3
Let F1, …, Fk be vectors in Rⁿ. Suppose each Fj has a nonzero element in some component where each of the other Fi's has a zero component. Then F1, …, Fk are linearly independent. An example will clarify why this is true.
EXAMPLE 6.20
Consider the vectors
F1 = ⟨0, 4, 0, 0, 2⟩, F2 = ⟨0, 0, 6, 0, −5⟩, F3 = ⟨0, 0, 0, −4, 12⟩.
To see why these are linearly independent, suppose
αF1 + βF2 + γF3 = ⟨0, 0, 0, 0, 0⟩.
Then
⟨0, 4α, 6β, −4γ, 2α − 5β + 12γ⟩ = ⟨0, 0, 0, 0, 0⟩.
From the second components, 4α = 0, so α = 0. From the third components, 6β = 0, so β = 0. And from the fourth components, −4γ = 0, so γ = 0. Then the vectors are linearly independent by Theorem 6.13(2). The fact that each of the vectors has a nonzero element where all the others have only zero components makes it particularly easy to conclude that α = β = γ = 0, and that is what is needed to apply Theorem 6.13. There is another important setting in which it is easy to tell that vectors are linearly independent. Nonzero vectors F1, …, Fk in Rⁿ are said to be mutually orthogonal if each is orthogonal to each of the other vectors in the set. That is, Fi · Fj = 0 if i ≠ j. Mutually orthogonal nonzero vectors are necessarily linearly independent.
THEOREM 6.14
Let F1, …, Fk be mutually orthogonal nonzero vectors in Rⁿ. Then F1, …, Fk are linearly independent.
Proof  Suppose
α1F1 + α2F2 + ⋯ + αkFk = O.
For any j = 1, …, k,
(α1F1 + α2F2 + ⋯ + αkFk) · Fj = 0
= α1F1 · Fj + α2F2 · Fj + ⋯ + αjFj · Fj + ⋯ + αkFk · Fj
= αjFj · Fj = αj‖Fj‖²,
because Fi · Fj = 0 if i ≠ j. But Fj is not the zero vector, so ‖Fj‖² ≠ 0, hence αj = 0. Therefore, each coefficient is zero and F1, …, Fk are linearly independent by Theorem 6.13(2).
EXAMPLE 6.21
The vectors ⟨−4, 0, 0⟩, ⟨0, −2, 1⟩, ⟨0, 1, 2⟩ are linearly independent in R³, because each is orthogonal to the other two. A "smallest" spanning set for a subspace of Rⁿ is called a basis for that subspace.
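Pairwise orthogonality is easy to test numerically. An illustrative Python sketch (the helper is ours; the vectors below are ⟨−4, 0, 0⟩, ⟨0, −2, 1⟩, and ⟨0, 1, 2⟩, which are pairwise orthogonal):

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def mutually_orthogonal(vectors, tol=1e-12):
    """Check F_i . F_j = 0 for every pair i != j."""
    return all(abs(dot(vectors[i], vectors[j])) <= tol
               for i in range(len(vectors))
               for j in range(i + 1, len(vectors)))

F = [[-4, 0, 0], [0, -2, 1], [0, 1, 2]]
print(mutually_orthogonal(F))  # True, so by Theorem 6.14 they are independent
```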
DEFINITION 6.15
Basis
Let V be a subspace of Rⁿ. A set of vectors F1, …, Fk in V forms a basis for V if F1, …, Fk are linearly independent and also span V.
Thus, for F1, …, Fk to be a basis for V, every vector in V must be a linear combination of F1, …, Fk, and if any Fj is omitted from the list F1, …, Fk, the remaining vectors do not span V. In particular, if Fj is omitted, then the subspace spanned by F1, …, Fj−1, Fj+1, …, Fk cannot contain Fj, because by linear independence, Fj is not a linear combination of F1, …, Fj−1, Fj+1, …, Fk.
EXAMPLE 6.22
i, j, and k form a basis for R³, and e1, e2, …, en form a basis for Rⁿ.
EXAMPLE 6.23
Let V be the subspace of Rⁿ consisting of all n-vectors with zero first component. Then e2, …, en form a basis for V.
EXAMPLE 6.24
In R², let V consist of all vectors parallel to the line y = 4x. Every vector in V is a multiple of ⟨1, 4⟩. This vector by itself forms a basis for V. In fact, any vector ⟨α, 4α⟩ with α ≠ 0 forms a basis for V.
EXAMPLE 6.25
In R³, let M be the subspace of all vectors on or parallel to the plane x + y + z = 0. A vector ⟨x, y, z⟩ in R³ is in M exactly when z = −x − y, so such a vector can be written
⟨x, y, z⟩ = ⟨x, y, −x − y⟩ = x⟨1, 0, −1⟩ + y⟨0, 1, −1⟩.
The vectors ⟨1, 0, −1⟩ and ⟨0, 1, −1⟩ therefore span M. Since these two vectors are linearly independent, they form a basis for M. We may think of a basis of V as a minimal linearly independent spanning set F1, …, Fk for V. If we omit any of these vectors, the remaining vectors will not be enough to span V.
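The decomposition in Example 6.25 can be verified numerically. A small Python sketch (illustrative only; the sample vector is our own choice):

```python
# Any (x, y, z) with x + y + z = 0 satisfies z = -x - y, and then
# (x, y, z) = x*(1, 0, -1) + y*(0, 1, -1).  Quick check:

def combo(x, y):
    b1, b2 = (1, 0, -1), (0, 1, -1)
    return tuple(x * a + y * b for a, b in zip(b1, b2))

v = (3.0, -5.0, 2.0)           # lies in M since 3 - 5 + 2 = 0
print(combo(v[0], v[1]) == v)  # True
```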
And if we use additional vectors, say the set F1, …, Fk, H, then this set also spans V, but is not linearly independent (because H is a linear combination of F1, …, Fk). There is nothing unique about a basis for a subspace of Rⁿ. Any nontrivial subspace of Rⁿ has infinitely many different bases. However, it is a theorem of linear algebra, which we will not prove, that for a given subspace V of Rⁿ, every basis has the same number of vectors in it. This number is the dimension of the subspace.
DEFINITION 6.16
The dimension of a subspace of R n is the number of vectors in any basis for the subspace.
In particular, Rⁿ (which is a subspace of itself) has dimension n, with a basis consisting of the n vectors e1, …, en. The subspace in Example 6.25 has dimension 2.
SECTION 6.5 PROBLEMS

In each of Problems 1 through 10, determine whether the given vectors are linearly independent or dependent in Rⁿ for appropriate n.
1. 3i + 2j, i − j in R³
2. 2i, 3j, 5i − 12k, i + j + k in R³
3. ⟨8, 0, 2, 0, 0, 0, 0⟩, ⟨0, 0, 0, 0, 1, −1, 0⟩ in R⁷
4. ⟨1, 0, 0, 0⟩, ⟨0, 1, 1, 0⟩, ⟨−4, 6, 6, 0⟩ in R⁴
5. ⟨1, 2, −3, 1⟩, ⟨4, 0, 0, 2⟩, ⟨6, 4, −6, 4⟩ in R⁴
6. ⟨0, 1, 1, 1⟩, ⟨−3, 2, 4, 4⟩, ⟨−2, 2, 34, 2⟩, ⟨1, 1, −6, −2⟩ in R⁴
7. ⟨1, −2⟩, ⟨4, 1⟩, ⟨6, 6⟩ in R²
8. ⟨−1, 1, 0, 0, 0⟩, ⟨0, −1, 1, 0, 0⟩, ⟨0, 1, 1, 1, 0⟩ in R⁵
9. ⟨−2, 0, 0, 1⟩, ⟨1, 1, 0, 0⟩, ⟨0, 0, 0, 0⟩, ⟨0, 0, 2, 1⟩, ⟨−1, 3, 3, 1⟩ in R⁴
10. ⟨3, 0, 0, 4⟩, ⟨2, 0, 0, 8⟩ in R⁴
11. Prove that three vectors in R³ are linearly dependent if and only if their scalar triple product is zero. (See Problem 31 in Section 6.3.)
In each of Problems 12 through 16, use the result of Problem 11 to determine whether the three vectors in R³ are linearly dependent or independent.
12. 3i + 6j − k, 8i + 2j − 4k, i − j + k
13. i + 6j − 2k, −i + 4j − 3k, i + 16j − 7k
14. 4i − 3j + k, 10i − 3j, 2i − 6j + 3k
15. 8i + 6j, 2i − 4j, i + k
16. 12i − 3k, i + 2j − k, −3i + 4j
In each of Problems 17 through 24, determine a basis for the subspace S of Rⁿ and determine the dimension of the subspace.
17. S consists of all vectors ⟨x, y, −y, −x⟩ in R⁴.
18. S consists of all vectors ⟨x, y, 2x, 3y⟩ in R⁴.
19. S consists of all vectors in the plane 2x − y + z = 0.
20. S consists of all vectors ⟨x, y, −y, x − y, z⟩ in R⁵.
21. S consists of all vectors in R⁴ with zero second component.
22. S consists of all vectors ⟨−x, x, y, 2y⟩ in R⁴.
23. S consists of all vectors parallel to the line y = 4x in R².
24. S consists of all vectors parallel to the plane 4x + 2y − z = 0 in R³.
CHAPTER 7

Matrices and Systems of Linear Equations
This chapter is devoted to the notation and algebra of matrices, as well as their use in solving systems of linear algebraic equations. To illustrate the idea of a matrix, consider a system of linear equations:
x1 + 2x2 − x3 + 4x4 = 0
3x1 − 4x2 + 2x3 − 6x4 = 0
x1 − 3x2 − 2x3 + x4 = 0.
All of the information needed to solve this system lies in its coefficients. Whether the first unknown is called x1, or y1, or some other name, is unimportant. It is important, however, that the coefficient of the first unknown in the second equation is 3. If we change this number we may change the solutions of the system. We can therefore work with such a system by storing its coefficients in an array called a matrix:
[ 1  2 −1  4 ]
[ 3 −4  2 −6 ]
[ 1 −3 −2  1 ]
This matrix displays the coefficients in the pattern in which they appear in the system of equations. The coefficients of the ith equation are in row i, and the coefficients of the jth unknown xj are in column j. The number in row i, column j is the coefficient of xj in equation i. But matrices provide more than a visual aid or storage device. The algebra and calculus of matrices will form the basis for methods of solving systems of linear algebraic equations, and later for solving systems of linear differential equations and analyzing solutions of systems of nonlinear differential equations.
7.1 Matrices

DEFINITION 7.1
Matrix
An n by m matrix is an array of objects arranged in n rows and m columns.
We will denote matrices by boldface type, as was done with vectors. When A is an n by m matrix, we often write that A is n × m (read "n by m"). The first integer is the number of rows in the matrix, and the second integer is the number of columns. The objects in the matrix may be numbers, functions, or other quantities. For example,
[ √2  1  2 ]
[ −5  1  0 ]
is a 2 × 3 matrix,
[ e^{2x}  e^{−4x} ]
[ cos(x)  x²     ]
is a 2 × 2 matrix, and
[ 0  ]
[ −4 ]
[ 3  ]
[ x² ]
is a 4 × 1 matrix. A matrix having the same number of rows as columns is called a square matrix. The 2 × 2 matrix shown above is square. The object in row i and column j of a matrix is called the (i, j) element, or (i, j) entry, of the matrix. If a matrix is denoted by an uppercase letter, say A, then its (i, j) element is often denoted aij and we write A = [aij]. For example, if
H = [hij] = [ 0           x      ]
            [ 1 − sin(x)  1 − 2i ]
            [ x²          i      ]
then H is a 3 × 2 matrix, and h11 = 0, h12 = x, h21 = 1 − sin(x), h22 = 1 − 2i, h31 = x², and h32 = i. We will be dealing with matrices whose elements are real or complex numbers, or functions. Sometimes it is also convenient to denote the (i, j) element of A by Aij. In the matrix H, H22 = 1 − 2i and H31 = x².

DEFINITION 7.2

Equality of Matrices

A = [aij] and B = [bij] are equal if and only if they have the same number of rows, the same number of columns, and for each i and j, aij = bij.
If two matrices have different numbers of rows or columns, or if the objects in a particular location in the matrices are different, then the matrices are unequal.
7.1.1 Matrix Algebra
We will develop the operations of addition and multiplication of matrices and multiplication of a matrix by a number.
DEFINITION 7.3
Matrix Addition
If A = [aij] and B = [bij] are n × m matrices, then their sum is the n × m matrix
A + B = [aij + bij].
We therefore add matrices by adding corresponding elements. For example,
[ 1  2 −3 ] + [ −1  6  3 ] = [ 0   8  0 ]
[ 4  0  2 ]   [  8 12 14 ]   [ 12 12 16 ]
If two matrices are of different dimensions (different numbers of rows or columns), then they cannot be added, just as we do not add 4-vectors and 7-vectors.
DEFINITION 7.4
Product of a Matrix and a Scalar
If A = [aij] and α is a scalar, then αA is the matrix defined by
αA = [αaij].
This means that we multiply a matrix by α by multiplying each element of the matrix by α. For example,
  [ 2 0 ]   [ 6  0 ]
3 [ 0 0 ] = [ 0  0 ]
  [ 1 4 ]   [ 3 12 ]
  [ 2 6 ]   [ 6 18 ]
and
x [ 1   x      ] = [ x    x²       ]
  [ −x  cos(x) ]   [ −x²  x cos(x) ]
Some, but not all, pairs of matrices can be multiplied.
DEFINITION 7.5
Multiplication of Matrices
Let A = [aij] be an n × r matrix, and B = [bij] an r × m matrix. Then the matrix product AB is the n × m matrix whose (i, j) element is
ai1b1j + ai2b2j + ⋯ + airbrj.
That is,
(AB)ij = ∑_{k=1}^{r} aik bkj.
240
CHAPTER 7
Matrices and Systems of Linear Equations
If we think of each row of A as an r-vector, and each column of B as an r-vector, then the (i, j) element of AB is the dot product of row i of A with column j of B:
(i, j) element of AB = (row i of A) · (column j of B).
This is why the number of columns of A must equal the number of rows of B for AB to be defined. These rows of A and columns of B must be vectors of the same length in order to take this dot product. Thus not every pair of matrices can be multiplied. Further, even when AB is defined, BA need not be. We will give one rationale for defining matrix multiplication in this way shortly. First we will look at some examples of matrix products and then develop the rules of matrix algebra.
EXAMPLE 7.1
Let
A = [ 1 3 ]  and  B = [ 1 1 3 ]
    [ 2 5 ]           [ 2 1 4 ]
Then A is 2 × 2 and B is 2 × 3, so AB is defined (the number of columns of A equals the number of rows of B). Further, AB is 2 × 3 (number of rows of A, number of columns of B). Now compute
AB = [ ⟨1, 3⟩ · ⟨1, 2⟩  ⟨1, 3⟩ · ⟨1, 1⟩  ⟨1, 3⟩ · ⟨3, 4⟩ ]
     [ ⟨2, 5⟩ · ⟨1, 2⟩  ⟨2, 5⟩ · ⟨1, 1⟩  ⟨2, 5⟩ · ⟨3, 4⟩ ]
   = [ 7  4 15 ]
     [ 12 7 26 ]
In this example, BA is not defined, because the number of columns of B, which is 3, does not equal the number of rows of A, which is 2.
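The row-times-column rule is a direct translation of the definition into code. A Python sketch (illustrative; `mat_mult` is our own helper, not from the text), checked against the product in Example 7.1:

```python
def mat_mult(A, B):
    """Product of an n x r matrix A and an r x m matrix B.
    Entry (i, j) is the dot product of row i of A with column j of B."""
    n, r, m = len(A), len(B), len(B[0])
    assert len(A[0]) == r, "columns of A must equal rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(r)) for j in range(m)]
            for i in range(n)]

A = [[1, 3], [2, 5]]
B = [[1, 1, 3], [2, 1, 4]]
print(mat_mult(A, B))  # [[7, 4, 15], [12, 7, 26]]
```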
EXAMPLE 7.2
Let
A = [ 1 1 2 1 ]  and  B = [ −1  8 ]
    [ 4 1 6 2 ]           [  2  1 ]
                          [  1  1 ]
                          [ 12  6 ]
Since A is 2 × 4 and B is 4 × 2, AB is defined and is a 2 × 2 matrix:
AB = [ ⟨1, 1, 2, 1⟩ · ⟨−1, 2, 1, 12⟩  ⟨1, 1, 2, 1⟩ · ⟨8, 1, 1, 6⟩ ]
     [ ⟨4, 1, 6, 2⟩ · ⟨−1, 2, 1, 12⟩  ⟨4, 1, 6, 2⟩ · ⟨8, 1, 1, 6⟩ ]
   = [ 15 17 ]
     [ 28 51 ]
In this example BA is also defined, and is 4 × 4:
BA = [ −1  8 ] [ 1 1 2 1 ] = [ 31  7 46 15 ]
     [  2  1 ] [ 4 1 6 2 ]   [  6  3 10  4 ]
     [  1  1 ]               [  5  2  8  3 ]
     [ 12  6 ]               [ 36 18 60 24 ]
As the last example shows, even when both AB and BA are defined, these may be matrices of different dimensions. Matrix multiplication is noncommutative, and it is the exception rather than the rule to have AB equal BA. If A is a square matrix, then AA is defined and is also square. Denote AA as A². Similarly, AA² = A³ and, for any positive integer k, Aᵏ = AA⋯A, a product with k factors. Some of the rules for manipulating matrices are like those for real numbers.
THEOREM 7.1
Let A, B, and C be matrices. Then, whenever the indicated operations are defined, we have:
1. A + B = B + A.
2. A(B + C) = AB + AC.
3. (A + B)C = AC + BC.
4. A(BC) = (AB)C.
For (1), both matrices must have the same dimensions, say n × m. For (2), B and C must have the same dimensions, and the number of columns in A must equal the number of rows in B and in C. For (4), A must be n × r, B must be r × k, and C must be k × m. Then A(BC) and (AB)C are n × m. The theorem is proved by direct appeal to the definitions. We will provide the details for (1) and (2).
Proof  To prove (1), let A = [aij] and B = [bij]. Then
A + B = [aij + bij] = [bij + aij] = B + A,
because each aij and bij is a number or function and the addition of these objects is commutative. For (2), let A = [aij], B = [bij], and C = [cij]. Suppose A is n × k and B and C are k × m. Then B + C is k × m, so A(B + C) is defined and is n × m. And AB and AC are both defined and n × m. There remains to show that the (i, j) element of AB + AC is the same as the (i, j) element of A(B + C). Row i of A, and columns j of B and C, are k-vectors, and from properties of the dot product,
(i, j) element of A(B + C) = row i of A · column j of (B + C)
= row i of A · column j of B + row i of A · column j of C
= (i, j) element of AB + (i, j) element of AC
= (i, j) element of AB + AC.
We have already noted that matrix multiplication does not behave in some ways like multiplication of numbers. Here is a summary of three significant differences.
Difference 1  For matrices, even when AB and BA are both defined, possibly AB ≠ BA.
EXAMPLE 7.3
[ 1  0 ] [ −2 6 ] = [ −2 6 ]
[ −2 4 ] [  1 3 ]   [  8 0 ]
but
[ −2 6 ] [ 1  0 ] = [ −14 24 ]
[  1 3 ] [ −2 4 ]   [  −5 12 ]
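Noncommutativity is easy to see numerically. A Python sketch using the matrices of Example 7.3 (the `mat_mult` helper is ours):

```python
def mat_mult(A, B):
    # zip(*B) iterates over the columns of B
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[1, 0], [-2, 4]]
B = [[-2, 6], [1, 3]]
print(mat_mult(A, B))  # [[-2, 6], [8, 0]]
print(mat_mult(B, A))  # [[-14, 24], [-5, 12]]
assert mat_mult(A, B) != mat_mult(B, A)
```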
Difference 2 There is no cancellation in products. If AB = AC, we cannot infer that B = C.
EXAMPLE 7.4
[ 1 1 ] [ 2 16 ] = [  7 18 ]
[ 3 3 ] [ 5  2 ]   [ 21 54 ]
But
[ 1 1 ] [ 4  7 ] = [  7 18 ]
[ 3 3 ] [ 3 11 ]   [ 21 54 ]
Difference 3 The product of two nonzero matrices may be zero.
EXAMPLE 7.5
[ 1 2 ] [  6  4 ] = [ 0 0 ]
[ 0 0 ] [ −3 −2 ]   [ 0 0 ]

7.1.2 Matrix Notation for Systems of Linear Equations
Matrix notation is very efficient for writing systems of linear algebraic equations. Consider, for example, the system
2x1 − x2 + 3x3 + x4 = 1
x1 + 3x2 − 2x4 = 0
−4x1 − x2 + 2x3 − 9x4 = −3.
The matrix of coefficients of this system is the 3 × 4 matrix
A = [  2 −1 3  1 ]
    [  1  3 0 −2 ]
    [ −4 −1 2 −9 ]
Row i contains the coefficients of the ith equation, and column j contains the coefficients of xj. Define
X = [ x1 ]  and  B = [  1 ]
    [ x2 ]           [  0 ]
    [ x3 ]           [ −3 ]
    [ x4 ]
Then
AX = [  2 −1 3  1 ] [ x1 ] = [ 2x1 − x2 + 3x3 + x4   ] = [  1 ]
     [  1  3 0 −2 ] [ x2 ]   [ x1 + 3x2 − 2x4        ]   [  0 ]
     [ −4 −1 2 −9 ] [ x3 ]   [ −4x1 − x2 + 2x3 − 9x4 ]   [ −3 ]
                    [ x4 ]
We can therefore write the system of equations in matrix form as
AX = B.
This is more than just notation. Soon this matrix formulation will enable us to use matrix operations to solve the system. A similar approach can be taken toward systems of linear differential equations. Consider the system
x1′ + tx2′ − x3′ = 2t − 1
t²x1′ − cos(t)x2′ − x3′ = e^t.
Let
A = [ 1   t       −1 ] ,  X = [ x1 ] ,  and  F = [ 2t − 1 ]
    [ t²  −cos(t) −1 ]        [ x2 ]             [ e^t    ]
                              [ x3 ]
Then the system can be written
AX′ = F,
in which X′ is formed by differentiating each matrix element of X. As with systems of linear algebraic equations, this formulation will enable us to bring matrix methods to bear on solving the system of differential equations. In both of these formulations, the definition of matrix product played a key role. Matrix multiplication may seem unmotivated at first, but it is just right for converting a system of linear algebraic or differential equations to a matrix equation.
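Forming AX is just a matrix-vector product. A Python sketch using the coefficient matrix of the algebraic system above (illustrative; the `mat_vec` helper is ours):

```python
def mat_vec(A, x):
    """Matrix-vector product: component i is row i of A dotted with x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[2, -1, 3, 1],
     [1, 3, 0, -2],
     [-4, -1, 2, -9]]
x = [1, 0, 0, 0]
print(mat_vec(A, x))  # [2, 1, -4] : the first column of A
```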
7.1.3
Some Special Matrices
Some matrices occur often enough to warrant special names and notation.
DEFINITION 7.6
Zero Matrix
Onm denotes the n × m zero matrix, having each element equal to zero.
244
CHAPTER 7
Matrices and Systems of Linear Equations
For example,
\[
O_{23} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\]
If A is n × m, then
\[ A + O_{nm} = O_{nm} + A = A \]
The negative of A is the matrix obtained by replacing each element of A with its negative. This matrix is denoted −A. If \(A = [a_{ij}]\), then \(-A = [-a_{ij}]\). If A is n × m, then
\[ A + (-A) = O_{nm} \]
Usually we write A + (−B) as A − B.
DEFINITION 7.7
Identity Matrix
The n × n identity matrix is the matrix In having each (i, j) element equal to zero if i ≠ j, and each (i, i) element equal to 1.
For example,
\[
I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\quad \text{and} \quad
I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]
THEOREM 7.2

If A is n × m, then
\[ A I_m = I_n A = A \]
We leave a proof of this to the student.
EXAMPLE 7.6

Let
\[
A = \begin{pmatrix} 1 & 0 \\ 2 & 1 \\ -1 & 8 \end{pmatrix}
\]
Then
\[
I_3 A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 2 & 1 \\ -1 & 8 \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 2 & 1 \\ -1 & 8 \end{pmatrix} = A
\]
and
\[
A I_2 = \begin{pmatrix} 1 & 0 \\ 2 & 1 \\ -1 & 8 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 2 & 1 \\ -1 & 8 \end{pmatrix} = A
\]
DEFINITION 7.8
Transpose
If \(A = [a_{ij}]\) is an n × m matrix, then the transpose of A is the m × n matrix \(A^t\) whose (i, j) element is \(a_{ji}\).
The transpose of A is formed by making each row k of A into column k of \(A^t\).
EXAMPLE 7.7

Let
\[
A = \begin{pmatrix} -1 & 6 & 3 & 3 \\ 0 & & 12 & -5 \end{pmatrix}
\]
This is a 2 × 4 matrix. The transpose is the 4 × 2 matrix
\[
A^t = \begin{pmatrix} -1 & 0 \\ 6 & \\ 3 & 12 \\ 3 & -5 \end{pmatrix}
\]
THEOREM 7.3

1. \((I_n)^t = I_n\).
2. For any matrix A, \((A^t)^t = A\).
3. If AB is defined, then \((AB)^t = B^t A^t\).

Conclusion (1) should not be surprising, since row i of In is the same as column i, so interchanging rows and columns has no effect. Similarly, (2) is intuitively clear. If we interchange rows and columns of A to form \(A^t\), and then interchange the rows and columns of \(A^t\), we should put everything back where it was, resulting in A again. We will prove conclusion (3).
Proof Let \(A = [a_{ij}]\) be n × k and let \(B = [b_{ij}]\) be k × m. Then AB is defined and is n × m. Since \(B^t\) is m × k and \(A^t\) is k × n, \(B^t A^t\) is defined and is m × n. Thus \((AB)^t\) and \(B^t A^t\) have the same dimensions. Now we must show that the (i, j) element of \((AB)^t\) equals the (i, j) element of \(B^t A^t\). Falling back on the definition of matrix product, we have
\[
(i, j)\text{ element of } B^t A^t
= \sum_{s=1}^{k} (B^t)_{is} (A^t)_{sj}
= \sum_{s=1}^{k} b_{si} a_{js}
= \sum_{s=1}^{k} a_{js} b_{si}
= (j, i)\text{ element of } AB
= (i, j)\text{ element of } (AB)^t
\]
This completes the proof of (3).
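The rule \((AB)^t = B^t A^t\) can be spot-checked numerically. A minimal sketch (NumPy; the entries are random and only the dimensions matter):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(2, 3))   # 2 x 3
B = rng.integers(-5, 5, size=(3, 4))   # 3 x 4, so AB is defined and is 2 x 4

# The transpose of the product equals the product of the
# transposes in reverse order.
print(np.array_equal((A @ B).T, B.T @ A.T))
```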
In some calculations it is convenient to write the dot product of two n-vectors as a matrix product, using the transpose. Write the n-vector \(\langle x_1, x_2, \ldots, x_n \rangle\) as an n × 1 column matrix:
\[
X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
\]
Then
\[
X^t = \begin{pmatrix} x_1 & x_2 & \cdots & x_n \end{pmatrix}
\]
a 1 × n row matrix. Let \(\langle y_1, y_2, \ldots, y_n \rangle\) also be an n-vector, which we write as an n × 1 column matrix Y. Then \(X^t Y\) is the 1 × 1 matrix
\[
X^t Y = \begin{pmatrix} x_1 & x_2 & \cdots & x_n \end{pmatrix}
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
= x_1 y_1 + x_2 y_2 + \cdots + x_n y_n = X \cdot Y
\]
Here we have written the resulting 1 × 1 matrix as just its single element, without the matrix brackets; this is common practice for 1 × 1 matrices. We now have the dot product of two n-vectors, written as n × 1 column matrices, as the matrix product \(X^t Y\). This will prove particularly useful when we treat eigenvalues of matrices.
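A quick check of this identity (a sketch using NumPy; the vectors are arbitrary):

```python
import numpy as np

X = np.array([[1.0], [-2.0], [4.0]])   # n x 1 column matrix
Y = np.array([[3.0], [0.5], [2.0]])

# X^t Y is a 1 x 1 matrix; read off its single element.
matrix_form = (X.T @ Y)[0, 0]
dot_form = float(np.dot(X.ravel(), Y.ravel()))

print(matrix_form == dot_form)
```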
7.1.4
Another Rationale for the Definition of Matrix Multiplication
We have seen that matrix multiplication allows us to write linear systems of algebraic and differential equations in compact matrix form as AX = B or AX′ = F. Matrix products are also tailored to other purposes, such as changes of variables in linear equations. To illustrate, consider a 2 × 2 linear system
\[
\begin{aligned}
a_{11} x_1 + a_{12} x_2 &= c_1 \\
a_{21} x_1 + a_{22} x_2 &= c_2
\end{aligned}
\tag{7.1}
\]
Change variables by putting
\[
\begin{aligned}
x_1 &= h_{11} y_1 + h_{12} y_2 \\
x_2 &= h_{21} y_1 + h_{22} y_2
\end{aligned}
\tag{7.2}
\]
Then
\[
a_{11}(h_{11} y_1 + h_{12} y_2) + a_{12}(h_{21} y_1 + h_{22} y_2) = c_1
\]
and
\[
a_{21}(h_{11} y_1 + h_{12} y_2) + a_{22}(h_{21} y_1 + h_{22} y_2) = c_2
\]
After rearranging terms, the transformed system is
\[
\begin{aligned}
(a_{11} h_{11} + a_{12} h_{21}) y_1 + (a_{11} h_{12} + a_{12} h_{22}) y_2 &= c_1 \\
(a_{21} h_{11} + a_{22} h_{21}) y_1 + (a_{21} h_{12} + a_{22} h_{22}) y_2 &= c_2
\end{aligned}
\tag{7.3}
\]
Now carry out the same transformation using matrices. Write the original system (7.1) as AX = C, where
\[
A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad
X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \quad
C = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}
\]
and the equations of the transformation (7.2) as X = HY, where
\[
H = \begin{pmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{pmatrix}
\quad \text{and} \quad
Y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
\]
Then
\[
AX = A(HY) = (AH)Y = C
\]
Now observe that
\[
AH = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
\begin{pmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{pmatrix}
= \begin{pmatrix} a_{11} h_{11} + a_{12} h_{21} & a_{11} h_{12} + a_{12} h_{22} \\ a_{21} h_{11} + a_{22} h_{21} & a_{21} h_{12} + a_{22} h_{22} \end{pmatrix}
\]
exactly the coefficients we found in the system (7.3) by term-by-term substitution. The definition of matrix product is just what is needed to carry out a linear change of variables. This idea also applies to linear transformations in systems of differential equations.
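The key identity here, A(HY) = (AH)Y, is the associativity of matrix multiplication; it can be spot-checked numerically. A minimal sketch with arbitrary entries:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 2))   # coefficients of the original system
H = rng.standard_normal((2, 2))   # change-of-variables matrix, X = H Y
Y = rng.standard_normal((2, 1))

# Substituting first and multiplying, or multiplying A H first,
# gives the same result (up to floating-point rounding).
print(np.allclose(A @ (H @ Y), (A @ H) @ Y))
```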
7.1.5
Random Walks in Crystals
We will conclude this section with another application of matrix multiplication, this time to the problem of enumerating the paths atoms can take through a crystal lattice. Crystals have sites arranged in a lattice pattern. An atom may jump from a site it occupies to any adjacent, vacant site. If more than one adjacent site is vacant, the atom "selects" its target site at random. The path such an atom makes through the crystal is called a random walk. We can represent the lattice of locations and adjacencies by drawing a point for each location, with a line between two points only if an atom can move directly from one to the other in the crystal. Such a diagram is called a graph. Figure 7.1 shows a typical graph G. In this graph an atom could move from point v1 to v2 or v3, to which it is connected by lines, but not directly to v6, because there is no line between v1 and v6. Two points are called adjacent in G if there is a line between them in the graph. A point is not considered adjacent to itself; there are no lines starting and ending at the same point. A walk of length n in such a graph is a sequence t1, t2, …, tn+1 of points (not necessarily different) with each tj adjacent to tj+1 in the graph. Such a walk represents a possible path an atom might take through various sites in the crystal. Points may repeat in a walk because an atom may return to the same site any number of times. A vi − vj walk is a walk that begins at vi and ends at vj.
FIGURE 7.1 A typical graph.
Physicists and materials engineers who study crystals are interested in the following question: given a crystal with n sites labeled v1, …, vn, how many different walks of length k are there between any two sites (or from a site back to itself)? Matrices enter into the solution of this problem as follows. Define the adjacency matrix A of the graph to be the n × n matrix having each (i, i) element zero and, for i ≠ j, the (i, j) element equal to 1 if there is a line in the graph between vi and vj, and 0 if there is no such line. For example, the graph of Figure 7.1 has adjacency matrix
\[
A = \begin{pmatrix}
0 & 1 & 1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 1 & 0
\end{pmatrix}
\]
The (1, 2) element of A is 1 because there is a line between v1 and v2, while the (1, 5) element is zero because there is no line between v1 and v5. The following remarkable theorem uses the adjacency matrix to solve the walk-enumeration problem.
THEOREM 7.4
Let \(A = [a_{ij}]\) be the adjacency matrix of a graph G having points v1, …, vn. Let k be any positive integer. Then the number of distinct vi − vj walks of length k in G is equal to the (i, j) element of \(A^k\).
We can therefore calculate the number of random walks of length k between any two points (or from any point back to itself) by reading the elements of the kth power of the adjacency matrix.
Proof Proceed by mathematical induction on k. First consider the case k = 1. If i ≠ j, there is a vi − vj walk of length 1 in G exactly when there is a line between vi and vj, and in this case aij = 1. There is no vi − vj walk of length 1 if vi and vj have no line between them, and in this case aij = 0. If i = j, there is no vi − vi walk of length 1, and aii = 0. Thus, in the case k = 1, the (i, j) element of A gives the number of walks of length 1 from vi to vj, and the conclusion of the theorem is true.
Now assume that the conclusion of the theorem is true for walks of length k. We will prove that the conclusion holds for walks of length k + 1. Thus, we are assuming that the (i, j)
FIGURE 7.2 (forming a vi − vj walk of length k + 1: a step of length 1 from vi to an adjacent point vr, followed by a vr − vj walk of length k)

FIGURE 7.3 (a graph with points v1, …, v8)
element of \(A^k\) is the number of distinct vi − vj walks of length k in G, and we want to prove that the (i, j) element of \(A^{k+1}\) is the number of distinct vi − vj walks of length k + 1. Consider how a vi − vj walk of length k + 1 is formed. First there must be a vi − vr walk of length 1 from vi to some point vr adjacent to vi, followed by a vr − vj walk of length k (Figure 7.2). Therefore the number of distinct vi − vj walks of length k + 1 is the sum of the numbers of distinct vr − vj walks of length k, with the sum taken over all points vr adjacent to vi. Now air = 1 if vr is adjacent to vi, and air = 0 otherwise. Further, by the inductive hypothesis, the number of distinct vr − vj walks of length k is the (r, j) element of \(A^k\). Denote \(A^k = B = [b_{ij}]\). Then, for r = 1, …, n, \(a_{ir} b_{rj} = 0\) if vr is not adjacent to vi, and \(a_{ir} b_{rj}\) equals the number of distinct vi − vj walks of length k + 1 whose first step is from vi to vr, if vr is adjacent to vi. Therefore the number of vi − vj walks of length k + 1 in G is
\[
a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{in} b_{nj}
\]
because this sum counts, for each point vr adjacent to vi, the walks of length k from vr to vj. But this sum is exactly the (i, j) element of \(AB = A A^k = A^{k+1}\). This completes the proof by induction.
For example, the adjacency matrix of the graph of Figure 7.3 is
\[
A = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 1 & 1 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 & 1 & 0 \\
1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 1 & 0 & 0 & 1 \\
0 & 1 & 0 & 1 & 0 & 0 & 1 & 0
\end{pmatrix}
\]
Suppose we want the number of v4 − v7 walks of length 3 in G. Calculate
\[
A^3 = \begin{pmatrix}
0 & 5 & 1 & 4 & 2 & 4 & 3 & 2 \\
5 & 2 & 7 & 4 & 5 & 4 & 9 & 8 \\
1 & 7 & 0 & 8 & 3 & 2 & 3 & 2 \\
4 & 4 & 8 & 6 & 8 & 8 & 11 & 10 \\
2 & 5 & 3 & 8 & 4 & 6 & 8 & 4 \\
4 & 4 & 2 & 8 & 6 & 2 & 4 & 4 \\
3 & 9 & 3 & 11 & 8 & 4 & 6 & 7 \\
2 & 8 & 2 & 10 & 4 & 4 & 7 & 4
\end{pmatrix}
\]
We read from the (4, 7) element of \(A^3\) that there are 11 walks of length 3 from v4 to v7. For this relatively simple graph, we can actually list all of these walks:
v4 v7 v4 v7, v4 v3 v4 v7, v4 v8 v4 v7, v4 v5 v4 v7, v4 v6 v4 v7, v4 v7 v8 v7, v4 v7 v5 v7, v4 v7 v2 v7, v4 v3 v2 v7, v4 v8 v2 v7, v4 v6 v5 v7.
Obviously it would not be practical to determine the number of vi − vj walks of length k by explicitly listing them if k or n is large. Software routines for matrix calculations make this theorem a practical solution to the random-walk counting problem.
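Such a computation is immediate with matrix software. A sketch (NumPy; the helper name is my own) for the graph of Figure 7.1, whose adjacency matrix was written out earlier in this section: v1 and v3 have exactly two common neighbors (v2 and v4), so the (1, 3) entry of \(A^2\) should be 2.

```python
import numpy as np

# Adjacency matrix of the graph of Figure 7.1 (points v1, ..., v6).
A = np.array([
    [0, 1, 1, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])

def walk_counts(adj, k):
    """k-th power of the adjacency matrix; its (i, j) entry counts
    the walks of length k from v_{i+1} to v_{j+1} (0-based indexing)."""
    return np.linalg.matrix_power(adj, k)

A2 = walk_counts(A, 2)
print(A2[0, 2])   # two such walks: v1-v2-v3 and v1-v4-v3
```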
SECTION 7.1
PROBLEMS
In each of Problems 1 through 6, carry out the computation with the given matrices A and B. ⎞ ⎛ ⎛ 1 −1 3 −4 0 ⎟ ⎜ ⎜ 6⎠ B = ⎝−2 −1 1. A = ⎝ 2 −4 −1 1 2 8 15 2A − 3B ⎞ ⎛ ⎛ 3 −2 2 ⎜ 2 ⎜ 0 1⎟ ⎟ ⎜ 2. A = ⎜ ⎝ 14 2⎠ B = ⎝14 1 6 8 x 1−x 1 3. A = B = 2 ex x
requested ⎞ 0 ⎟ 6⎠ 4
⎞ 4 1⎟ ⎟ −5A + 3B 16⎠ 25 −6 A2 + 2AB cosx
4. A = 14 B = −12 −3A − 5B 1 −2 1 7 −9 5. A = 8 2 −5 0 0 −5 1 8 21 7 4A + 8B B= 12 −6 −2 −1 9 0 8 −2 3 B = A3 − B2 6. A = −5 1 1 1 In each of Problems 7 through 16, determine which of AB and BA are defined. Carry out the products that are defined.
⎛
−4 7. A = ⎝−2 1 −2 8. A = 3 9. A = −1 ⎛
10.
11.
12.
13.
−3 ⎜ 6 A=⎜ ⎝ 18 1 ⎛ −21 ⎜ 12 A=⎜ ⎝ 1 13 −2 A= 3 ⎛ −4 A=⎝ 0 −3
⎞ ⎞ ⎛ −2 4 6 12 5 6 2 1 4⎠ −2 3⎠ B = ⎝−3 −3 1 0 0 1 6 −9 1 8 −4 6 8 B = 1 −4 −1 ⎛ ⎞ −3 ⎜ 2⎟ ⎜ ⎟ ⎟ 6 2 14 −22 B = ⎜ ⎜ 6⎟ ⎝ 0⎠ −4 ⎞ 1 2⎟ ⎟ B = −16 0 0 28 −22⎠ 0 1 1 26 6 ⎞ 4 8 −3 3 2 1 0 14⎟ ⎟ B = −9 16 5 9 14 0 16 0 −8⎠ 4 8 0 1 −3 7 2 4 B = −5 6 1 0 9 ⎞ −2 0 5 3⎠ B = 1 −3 4 1 1
7.2 Elementary Row Operations and Elementary Matrices ⎞ 3 ⎜ 0⎟ ⎟ 14. A = ⎜ ⎝−1⎠ B = 3 −2 7 4 7 −8 1 −4 B = 15. A = −4 7 1 6 ⎞ ⎛ −3 2 ⎜ 0 −2⎟ ⎟ B = −5 5 ⎜ 16. A = ⎝ 1 8⎠ 3 −3 ⎛
251
24. Let G be the graph of Figure 7.5. Determine the number of v1 − v4 walks of length 4 in G. Determine the number of v2 − v3 walks of length 2.
In each of Problems 17 through 21, determine if AB is defined and if BA is defined. For those products that are defined, give the dimensions of the product matrix.
17. A is 14 × 21, B is 21 × 14.
18. A is 18 × 4, B is 18 × 4.
19. A is 6 × 2, B is 4 × 6.
FIGURE 7.5
25. Let G be the graph of Figure 7.6. Determine the number of v4 − v5 walks of length 2, the number of v2 − v3 walks of length 3, and the number of v1 − v2 and v4 − v5 walks of length 4 in G.
20. A is 1 × 3, B is 3 × 3.
21. A is 7 × 6, B is 7 × 7.
22. Find nonzero 2 × 2 matrices A, B, and C such that BA = CA but B ≠ C.
23. Let G be the graph of Figure 7.4. Determine the number of v1 − v4 walks of length 3, the number of v2 − v3 walks of length 3, and the number of v2 − v4 walks of length 4 in G.
FIGURE 7.6
26. Let A be the adjacency matrix of a graph G.
FIGURE 7.4
7.2
(a) Prove that the (i, i) element of \(A^2\) equals the number of points of G that are neighbors of vi in G. This number is called the degree of vi. (b) Prove that the (i, i) element of \(A^3\) equals twice the number of triangles in G containing vi as a vertex. A triangle in G consists of three points, each a neighbor of the other two.
Elementary Row Operations and Elementary Matrices
When we solve a system of linear algebraic equations by elimination of unknowns, we routinely perform three kinds of operations: interchange of equations, multiplication of an equation by a nonzero constant, and addition of a constant multiple of one equation to another equation. When we write a homogeneous system in matrix form AX = O, row k of A lists the coefficients in equation k of the system. The three operations on equations correspond, respectively, to the interchange of two rows of A, multiplication of a row of A by a nonzero constant, and addition of a scalar multiple of one row of A to another row of A. We will focus on these row operations in anticipation of using them to solve the system.
DEFINITION 7.9
Let A be an n × m matrix. The three elementary row operations that can be performed on A are: 1. Type I operation: interchange two rows of A. 2. Type II operation: Multiply a row of A by a nonzero constant. 3. Type III operation: Add a scalar multiple of one row to another row.
The rows of A are m-vectors. In a Type II operation, we multiply a row by a nonzero constant by multiplying this row vector by that number; that is, we multiply each element of the row by that number. Similarly, in a Type III operation, we add a scalar multiple of one row vector to another row vector.
EXAMPLE 7.8

Let
\[
A = \begin{pmatrix} -2 & 1 & 6 \\ 1 & 1 & 2 \\ 0 & 1 & 3 \\ 2 & -3 & 4 \end{pmatrix}
\]
Type I operation: if we interchange rows 2 and 4 of A, we obtain the new matrix
\[
\begin{pmatrix} -2 & 1 & 6 \\ 2 & -3 & 4 \\ 0 & 1 & 3 \\ 1 & 1 & 2 \end{pmatrix}
\]
Type II operation: multiply row 2 of A by 7 to get
\[
\begin{pmatrix} -2 & 1 & 6 \\ 7 & 7 & 14 \\ 0 & 1 & 3 \\ 2 & -3 & 4 \end{pmatrix}
\]
Type III operation: add 2 times row 1 to row 3 of A, obtaining
\[
\begin{pmatrix} -2 & 1 & 6 \\ 1 & 1 & 2 \\ -4 & 3 & 15 \\ 2 & -3 & 4 \end{pmatrix}
\]
Elementary row operations can be performed on any matrix. When performed on an identity matrix, we obtain special matrices that will be particularly useful. We therefore give matrices formed in this way a name.
DEFINITION 7.10
Elementary Matrix
An elementary matrix is a matrix formed by performing an elementary row operation on In .
For example,
\[
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]
is an elementary matrix, obtained from I3 by interchanging rows 1 and 2. And
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{pmatrix}
\]
is the elementary matrix formed by adding −4 times row 1 of I3 to row 3.
The following theorem is the reason why elementary matrices are interesting. It says that each elementary row operation on A can be performed by multiplying A on the left by an elementary matrix.

THEOREM 7.5
Let A be an n × m matrix. Let B be formed from A by an elementary row operation. Let E be the elementary matrix formed by performing this elementary row operation on In . Then B = EA We leave a proof to the exercises. It is instructive to see the theorem in practice.
EXAMPLE 7.9

Let
\[
A = \begin{pmatrix} 1 & -5 \\ 9 & 4 \\ -3 & 2 \end{pmatrix}
\]
Suppose we form B from A by interchanging rows 2 and 3 of A. We can do this directly. But we can also form an elementary matrix by performing this operation on I3 to form
\[
E = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}
\]
Then
\[
EA = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} 1 & -5 \\ 9 & 4 \\ -3 & 2 \end{pmatrix}
= \begin{pmatrix} 1 & -5 \\ -3 & 2 \\ 9 & 4 \end{pmatrix} = B
\]
EXAMPLE 7.10

Let
\[
A = \begin{pmatrix} 0 & -7 & 3 & 6 \\ 5 & 1 & -11 & 3 \end{pmatrix}
\]
Form C from A by multiplying row 2 by −8. Again, we can do this directly. However, if we form E by performing this operation on I2, then
\[
E = \begin{pmatrix} 1 & 0 \\ 0 & -8 \end{pmatrix}
\]
and
\[
EA = \begin{pmatrix} 1 & 0 \\ 0 & -8 \end{pmatrix}
\begin{pmatrix} 0 & -7 & 3 & 6 \\ 5 & 1 & -11 & 3 \end{pmatrix}
= \begin{pmatrix} 0 & -7 & 3 & 6 \\ -40 & -8 & 88 & -24 \end{pmatrix} = C
\]
EXAMPLE 7.11

Let
\[
A = \begin{pmatrix} -6 & 14 & 2 \\ 4 & 4 & -9 \\ -3 & 2 & 13 \end{pmatrix}
\]
Form D from A by adding 6 times row 1 to row 2. If we perform this operation on I3 to form
\[
E = \begin{pmatrix} 1 & 0 & 0 \\ 6 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]
then
\[
EA = \begin{pmatrix} 1 & 0 & 0 \\ 6 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} -6 & 14 & 2 \\ 4 & 4 & -9 \\ -3 & 2 & 13 \end{pmatrix}
= \begin{pmatrix} -6 & 14 & 2 \\ -32 & 88 & 3 \\ -3 & 2 & 13 \end{pmatrix} = D
\]
Later we will want to perform not just one elementary row operation, but a sequence of such operations. Suppose we perform operation 1 on A to form A1, then operation 2 on A1 to form A2, and so on, until finally we perform operation r on Ar−1 to get Ar. This process can be diagrammed:
\[
A \xrightarrow{1} A_1 \xrightarrow{2} A_2 \xrightarrow{3} \cdots \xrightarrow{r} A_r
\]
Let Ej be the elementary matrix obtained by performing operation j on In. Then
\[
\begin{aligned}
A_1 &= E_1 A \\
A_2 &= E_2 A_1 = E_2 E_1 A \\
A_3 &= E_3 A_2 = E_3 E_2 E_1 A \\
&\;\;\vdots \\
A_r &= E_r A_{r-1} = E_r E_{r-1} \cdots E_3 E_2 E_1 A
\end{aligned}
\]
This produces a matrix
\[
\Omega = E_r E_{r-1} \cdots E_2 E_1
\]
such that \(A_r = \Omega A\). The significance of this equation is that we have produced a matrix Ω such that multiplying A on the left by Ω performs a given sequence of elementary row operations. Ω is formed as a product of elementary matrices, in the correct order, with each elementary matrix performing one of the prescribed elementary row operations in the sequence (E1 performs the first operation, E2 the second, and so on, until Er performs the last). We will record this result as a theorem.

THEOREM 7.6

Let A be an n × m matrix. If B is produced from A by any finite sequence of elementary row operations, then there is an n × n matrix Ω such that
\[
B = \Omega A
\]
The proof of the theorem is contained in the line of reasoning outlined just prior to its statement.
EXAMPLE 7.12

Let
\[
A = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 1 & 2 \\ -1 & 3 & 2 \end{pmatrix}
\]
We will form a new matrix B from A by performing, in order, the following operations:
1: interchange rows 1 and 2 of A to form A1;
2: multiply row 3 of A1 by 2 to form A2;
3: add two times row 1 to row 3 of A2 to get A3 = B.
If we perform this sequence in order, starting with A, we get
\[
A \to A_1 = \begin{pmatrix} 0 & 1 & 2 \\ 2 & 1 & 0 \\ -1 & 3 & 2 \end{pmatrix}
\to A_2 = \begin{pmatrix} 0 & 1 & 2 \\ 2 & 1 & 0 \\ -2 & 6 & 4 \end{pmatrix}
\to A_3 = \begin{pmatrix} 0 & 1 & 2 \\ 2 & 1 & 0 \\ -2 & 8 & 8 \end{pmatrix} = B
\]
To produce Ω such that B = ΩA, perform this sequence of operations in turn, beginning with I3:
\[
I_3 \to \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\to \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 2 \end{pmatrix}
\to \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 2 & 2 \end{pmatrix} = \Omega
\]
Now check that
\[
\Omega A = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 2 & 2 \end{pmatrix}
\begin{pmatrix} 2 & 1 & 0 \\ 0 & 1 & 2 \\ -1 & 3 & 2 \end{pmatrix}
= \begin{pmatrix} 0 & 1 & 2 \\ 2 & 1 & 0 \\ -2 & 8 & 8 \end{pmatrix} = B
\]
It is also easy to check that Ω = E3 E2 E1, where Ej is the elementary matrix obtained by performing operation j on I3.
EXAMPLE 7.13

Let
\[
A = \begin{pmatrix} 6 & -1 & 1 & 4 \\ 9 & 3 & 7 & -7 \\ 0 & 2 & 1 & 5 \end{pmatrix}
\]
We want to perform, in succession and in the given order, the following operations:
1: add −3(row 2) to row 3;
2: add 2(row 1) to row 2;
3: interchange rows 1 and 3;
4: multiply row 2 by −4.
Suppose the end result of these operations is the matrix B. We will produce a 3 × 3 matrix Ω such that B = ΩA. Perform the sequence of operations, starting with I3:
\[
I_3 \to \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{pmatrix}
\to \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & -3 & 1 \end{pmatrix}
\to \begin{pmatrix} 0 & -3 & 1 \\ 2 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}
\to \begin{pmatrix} 0 & -3 & 1 \\ -8 & -4 & 0 \\ 1 & 0 & 0 \end{pmatrix} = \Omega
\]
Then
\[
\Omega A = \begin{pmatrix} 0 & -3 & 1 \\ -8 & -4 & 0 \\ 1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 6 & -1 & 1 & 4 \\ 9 & 3 & 7 & -7 \\ 0 & 2 & 1 & 5 \end{pmatrix}
= \begin{pmatrix} -27 & -7 & -20 & 26 \\ -84 & -4 & -36 & -4 \\ 6 & -1 & 1 & 4 \end{pmatrix} = B
\]
It is straightforward to check that Ω = E4 E3 E2 E1, where Ej is the elementary matrix obtained from I3 by applying operation j. If the operations are performed in succession, starting with A, then B results.
DEFINITION 7.11
Row Equivalence
Two matrices are row equivalent if and only if one can be obtained from the other by a sequence of elementary row operations.
In each of the last two examples, B is row equivalent to A. The relationship of row equivalence has the following properties: THEOREM 7.7
1. Every matrix is row equivalent to itself. (This is the reflexive property.)
2. If A is row equivalent to B, then B is row equivalent to A. (This is the symmetry property.)
3. If A is row equivalent to B, and B to C, then A is row equivalent to C. (This is transitivity.)

It is sometimes of interest to undo the effect of an elementary row operation. This can always be done by the same kind of elementary row operation. Consider each kind of operation in turn. If we interchange rows i and j of A to form B, then interchanging rows i and j of B yields A again. Thus a Type I operation can reverse a Type I operation. If we form C from A by multiplying row i by a nonzero number α, then multiplying row i of C by 1/α brings us back to A. A Type II operation can reverse a Type II operation. Finally, suppose we form D from A by adding α(row i) to row j. Then
\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1m} \\
\vdots & & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{im} \\
\vdots & & & \vdots \\
a_{j1} & a_{j2} & \cdots & a_{jm} \\
\vdots & & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nm}
\end{pmatrix}
\]
and
\[
D = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1m} \\
\vdots & & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{im} \\
\vdots & & & \vdots \\
\alpha a_{i1} + a_{j1} & \alpha a_{i2} + a_{j2} & \cdots & \alpha a_{im} + a_{jm} \\
\vdots & & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nm}
\end{pmatrix}
\]
Now we can get from D back to A by adding −α(row i) to row j of D. Thus a Type III operation can be used to reverse a Type III operation. This ability to reverse the effects of elementary row operations will be useful later, and we will record it as a theorem.
THEOREM 7.8
Let E1 be an elementary matrix that performs an elementary row operation on a matrix A. Then there is an elementary matrix E2 such that E2 E1 A = A. In fact, E2 E1 = In .
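Theorem 7.8 can be checked concretely: for each type of elementary matrix, the reversing operation gives a second elementary matrix whose product with the first is the identity. A sketch (NumPy, 3 × 3 case, rows indexed from 0):

```python
import numpy as np

n = 3
I = np.eye(n)

# Type I: interchanging rows 0 and 2 twice restores I_n.
E1 = I.copy(); E1[[0, 2]] = E1[[2, 0]]
print(np.array_equal(E1 @ E1, I))

# Type II: scaling row 1 by alpha, then by 1/alpha.
alpha = 5.0
E1 = I.copy(); E1[1, 1] = alpha
E2 = I.copy(); E2[1, 1] = 1 / alpha
print(np.allclose(E2 @ E1, I))

# Type III: adding alpha*(row 0) to row 2, then subtracting it again.
E1 = I.copy(); E1[2, 0] = alpha
E2 = I.copy(); E2[2, 0] = -alpha
print(np.array_equal(E2 @ E1, I))
```

In each case E2 E1 = In, exactly as the theorem asserts.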
SECTION 7.2
PROBLEMS
In each of Problems 1 through 8, perform the row operation, or sequence of row operations, directly on A, and then find a matrix such that the final result is A. ⎞ ⎛ −2 1 4 2 √ 1 16 3⎠; multiply row 2 by 3 1. A = ⎝ 0 1 −2 4 8 ⎞ ⎛ 3 −6 ⎜1 1⎟ ⎟; add 6 times row 2 to row 3. ⎜ 2. A = ⎝ 8 −2⎠ 0 5 ⎞ ⎛ −2 14 6 √ 1 −3⎠; add 13 times row 3 to row 1, 3. A = ⎝ 8 2 9 5 then interchange rows 2 and 1, then multiply row 1 by 5. ⎞ ⎛ −4 6 −3 4. A = ⎝ 12 4 −4⎠; interchange rows 2 and 3, then 1 3 0 add negative row 1 to row 2. √ −3 15 ; add 3 times row 2 to row 1, then 5. A = 2 8 multiply row 2 by 15, then interchange rows 1 and 2. ⎞ ⎛ 3 −4 5 9 1 3 −6⎠; add row 1 to row 3, then 6. A = ⎝2 1 13 2 6
7.3
√ add 3 times row 1 to row 2, then multiply row 3 by row 4, then add row 2 to row 3. ⎞ ⎛ −1 0 3 0 2 9⎠; multiply row 3 by 4, then 7. A = ⎝ 1 3 −9 7 −5 7 add 14 times row 1 to row 2, then interchange rows 3 and 2. ⎞ ⎛ 0 −9 14 5 2⎠; interchange rows 2 and 3, then 8. A = ⎝1 9 15 0 add 3 times row 2 to row 3, then interchange rows 1 and 3, then multiply row 3 by 4.
In Problems 9, 10, and 11, A is an n × m matrix.
9. Let B be formed from A by interchanging rows s and t. Let E be formed from In by interchanging rows s and t. Prove that B = EA.
10. Let B be formed from A by multiplying row s by α, and let E be formed from In by multiplying row s by α. Prove that B = EA.
11. Let B be formed from A by adding α times row s to row t. Let E be formed from In by adding α times row s to row t. Prove that B = EA.
The Row Echelon Form of a Matrix
Sometimes a matrix has a special form that makes it convenient to work with in solving certain problems. For solving systems of linear algebraic equations, we want the reduced row echelon form, or reduced form, of a matrix.
Let A be an n × m matrix. A zero row of A is a row having each element equal to zero. If at least one element of a row is nonzero, that row is a nonzero row. The leading entry of a nonzero row is its first nonzero element, reading from left to right. For example, if
\[
A = \begin{pmatrix} 0 & 2 & 7 \\ 1 & -2 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 9 \end{pmatrix}
\]
then row three is a zero row and rows one, two, and four are nonzero rows. The leading entry of row 1 is 2, the leading entry of row 2 is 1, and the leading entry of row 4 is 9. We do not speak of a leading entry of a zero row. We can now define a reduced row echelon matrix.
DEFINITION 7.12
Reduced Row Echelon Matrix
A matrix is in reduced row echelon form if it satisfies the following conditions: 1. The leading entry of any nonzero row is 1. 2. If any row has its leading entry in column j, then all other elements of column j are zero. 3. If row i is a nonzero row and row k is a zero row, then i < k. 4. If the leading entry of row r1 is in column c1 , and the leading entry of row r2 is in column c2 , and if r1 < r2 , then c1 < c2 . A matrix in reduced row echelon form is said to be in reduced form, or to be a reduced matrix.
A reduced matrix has a very special structure. By condition (1), if we move from left to right along a nonzero row, the first nonzero number we see is 1. Condition (2) means that, if we stand at the leading entry 1 of any row, and look straight up or down, we see only zeros in the rest of this column. A reduced matrix need not have any zero rows. But if there is a zero row, it must be below any nonzero row. That is, all the zero rows are at the bottom of the matrix. Condition (4) means that the leading entries move downward to the right as we look at the matrix.
EXAMPLE 7.14

The following four matrices are all reduced:
\[
\begin{pmatrix} 0 & 1 & 3 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad
\begin{pmatrix} 1 & -4 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}, \quad \text{and} \quad
\begin{pmatrix} 1 & 0 & 0 & 3 & 1 \\ 0 & 1 & 0 & -2 & 4 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}
\]
EXAMPLE 7.15
To see one context in which reduced matrices are interesting, consider the last matrix of the preceding example and suppose it is the matrix of coefficients of a system of homogeneous linear equations. This system is AX = O, and the equations are
\[
\begin{aligned}
x_1 + 3x_4 + x_5 &= 0 \\
x_2 - 2x_4 + 4x_5 &= 0 \\
x_3 + x_5 &= 0
\end{aligned}
\]
The fourth row represents the equation 0x1 + 0x2 + 0x3 + 0x4 + 0x5 = 0, which we do not write out (it is satisfied by any numbers x1 through x5, and so provides no information). Because the matrix of coefficients is in reduced form, this system is particularly easy to solve. From the third equation,
\[ x_3 = -x_5 \]
From the second equation,
\[ x_2 = 2x_4 - 4x_5 \]
And from the first equation,
\[ x_1 = -3x_4 - x_5 \]
We can therefore choose x4 = α, any number, and x5 = β, any number, and obtain a solution by choosing the other unknowns as
\[
x_1 = -3\alpha - \beta, \quad x_2 = 2\alpha - 4\beta, \quad x_3 = -\beta
\]
The form of the reduced matrix is chosen just so that, as a matrix of coefficients of a system of linear equations, the solution of these equations can be read by inspection.
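The parametric solution can be verified numerically: for any choice of α and β, the vector (−3α − β, 2α − 4β, −β, α, β) satisfies AX = O. A sketch using NumPy:

```python
import numpy as np

# Reduced coefficient matrix of the homogeneous system above.
A = np.array([[1, 0, 0,  3, 1],
              [0, 1, 0, -2, 4],
              [0, 0, 1,  0, 1],
              [0, 0, 0,  0, 0]])

for alpha, beta in [(1.0, 0.0), (0.0, 1.0), (2.5, -3.0)]:
    X = np.array([-3*alpha - beta, 2*alpha - 4*beta, -beta, alpha, beta])
    print(np.allclose(A @ X, 0))   # every choice of parameters solves AX = O
```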
EXAMPLE 7.16

The matrix
\[
A = \begin{pmatrix} 0 & 1 & 5 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}
\]
is not reduced. The leading entry of row 2 is 1, as it must be, but there is a nonzero element in the column containing this leading entry. However, A is row equivalent to a reduced matrix. If we add −5(row 2) to row 1, we obtain
\[
B = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}
\]
and this is a reduced matrix.
EXAMPLE 7.17

The matrix
\[
C = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\]
is not reduced. The leading entry of the first row is not 1, and the first column, containing this leading entry of row 1, has another nonzero element. In addition, the leading entry of row 3 is to the left of the leading entry of row 2, and this violates condition (4). However, C is row equivalent to a reduced matrix. First form D by multiplying row 1 by 1/2:
\[
D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\]
Now form F from D by adding −(row 1) to row 3:
\[
F = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]
Then F is a reduced matrix that is row equivalent to C, since it was formed by a sequence of elementary row operations, starting with C.
In the last two examples we had matrices that were not in reduced form, but in both cases we could proceed to a reduced matrix by elementary row operations. We claim that this is always possible (although in general more operations may be needed than in these two examples).

THEOREM 7.9
Every matrix is row equivalent to a reduced matrix.
Proof The proof consists of exhibiting a sequence of elementary row operations that will produce a reduced matrix. Let A be any matrix. If A is a zero matrix, we are done. Thus suppose that A has at least one nonzero row. Reading from left to right across the matrix, find the first column having a nonzero element. Suppose this is column c1. Reading from top to bottom in this column, let α be its top nonzero element, and say α is in row r1. Multiply this row by 1/α to obtain a matrix B in which column c1 has its top nonzero element equal to 1, and this is in row r1. If any row below r1 in B has a nonzero element β in column c1, add −β times row r1 to that row. In this way we obtain a matrix C that is row equivalent to A, having 1 in the (r1, c1) position and all other elements of column c1 equal to zero. Now interchange, if necessary, rows 1 and r1 of C to obtain a matrix D having leading entry 1 in row 1 and column c1, and all other elements of this column equal to zero. Further, by choice of c1, any column of D to the left of column c1 has all zero elements (if there is such a column). D is row equivalent to A. If D is reduced, we are done. If not, repeat this procedure, but now look for the first column, say column c2, to the right of column c1 having a nonzero element below row 1. Let β be the top nonzero element of this column lying below row 1, and say this element occurs in row r2. Multiply row r2 by 1/β to obtain a new matrix E having 1 in the (r2, c2) position. If this column has a nonzero element γ above or below row r2, add −γ(row r2) to the row containing γ. In this way we obtain a matrix F that is row equivalent to A and has leading entry 1 in row r2 and all other elements of column c2 equal to zero. Finally, form G from F by interchanging rows r2 and 2, if necessary. If G is reduced, we are done. If not, locate the first column to the right of column c2 having a nonzero element and repeat the procedure used to form the first two rows of G. Since A has only finitely many columns, this process eventually terminates in a reduced matrix R. Since R was obtained from A by elementary row operations, R is row equivalent to A, and the proof is complete.
The process of obtaining a reduced matrix row equivalent to a given matrix A is referred to as reducing A. It is possible to reduce a matrix in many different ways (that is, by different sequences of elementary row operations). We claim that this does not matter and that for a given A any reduction process will result in the same reduced matrix.
THEOREM 7.10
Let A be a matrix. Then there is exactly one reduced matrix AR that is row equivalent to A. We leave a proof of this result to the student. In view of this theorem, we can speak of the reduced form of a given matrix A. We will denote this matrix AR .
EXAMPLE 7.18

Let
\[
A = \begin{pmatrix} -2 & 1 & 3 \\ 0 & 1 & 1 \\ 2 & 0 & 1 \end{pmatrix}
\]
We want to find AR. Column 1 has a nonzero element in row 1. Begin with the operations:
\[
A \to \left(\text{multiply row 1 by } -\tfrac{1}{2}\right) \to
\begin{pmatrix} 1 & -\tfrac12 & -\tfrac32 \\ 0 & 1 & 1 \\ 2 & 0 & 1 \end{pmatrix}
\to \left(\text{add } -2(\text{row 1}) \text{ to row 3}\right) \to
\begin{pmatrix} 1 & -\tfrac12 & -\tfrac32 \\ 0 & 1 & 1 \\ 0 & 1 & 4 \end{pmatrix}
\]
In the last matrix, column 2 has a nonzero element below row 1, the highest being 1 in the (2, 2) position. Since we want a 1 here, we do not have to multiply this row by anything. However, we want zeros above and below this 1, in the (1, 2) and (3, 2) positions. Thus add 1/2 times row 2 to row 1, and −(row 2) to row 3, to obtain
\[
\begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 1 \\ 0 & 0 & 3 \end{pmatrix}
\]
In this matrix, column 3 has a nonzero element below row 2, in the (3, 3) location. Multiply row 3 by 1/3 to obtain
\[
\begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}
\]
Finally, we want zeros above the (3, 3) position in column 3. Add row 3 to row 1 and −(row 3) to row 2 to get
\[
A_R = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]
This is AR because it is a reduced matrix and it is row equivalent to A.
7.3 The Row Echelon Form of a Matrix
To illustrate the last theorem, we will use a different sequence of elementary row operations to reduce A, arriving at the same final result. Proceed:

    add row 3 to row 1:

        [ 0  1  4 ]
        [ 0  1  1 ]
        [ 2  0  1 ]

    add -1(row 2) to row 1:

        [ 0  0  3 ]
        [ 0  1  1 ]
        [ 2  0  1 ]

    (1/3)(row 1):

        [ 0  0  1 ]
        [ 0  1  1 ]
        [ 2  0  1 ]

    add -1(row 1) to rows 2 and 3:

        [ 0  0  1 ]
        [ 0  1  0 ]
        [ 2  0  0 ]

    interchange rows 1 and 3:

        [ 2  0  0 ]
        [ 0  1  0 ]
        [ 0  0  1 ]

    (1/2)(row 1):

        [ 1  0  0 ]
        [ 0  1  0 ]  = AR
        [ 0  0  1 ]
EXAMPLE 7.19
Let

    B = [ 0  0  0  0  0 ]
        [ 0  0  2  0  0 ]
        [ 0  1  0  1  1 ]
        [ 0  4  3  4  0 ]

Reduce B as follows:

    add -4(row 3) to row 4:

        [ 0  0  0  0   0 ]
        [ 0  0  2  0   0 ]
        [ 0  1  0  1   1 ]
        [ 0  0  3  0  -4 ]

    interchange rows 3 and 1:

        [ 0  1  0  1   1 ]
        [ 0  0  2  0   0 ]
        [ 0  0  0  0   0 ]
        [ 0  0  3  0  -4 ]

    (1/2)(row 2):

        [ 0  1  0  1   1 ]
        [ 0  0  1  0   0 ]
        [ 0  0  0  0   0 ]
        [ 0  0  3  0  -4 ]

    add -3(row 2) to row 4:

        [ 0  1  0  1   1 ]
        [ 0  0  1  0   0 ]
        [ 0  0  0  0   0 ]
        [ 0  0  0  0  -4 ]

    -(1/4)(row 4):

        [ 0  1  0  1  1 ]
        [ 0  0  1  0  0 ]
        [ 0  0  0  0  0 ]
        [ 0  0  0  0  1 ]

    add -1(row 4) to row 1:

        [ 0  1  0  1  0 ]
        [ 0  0  1  0  0 ]
        [ 0  0  0  0  0 ]
        [ 0  0  0  0  1 ]

    interchange rows 3 and 4:

        [ 0  1  0  1  0 ]
        [ 0  0  1  0  0 ]
        [ 0  0  0  0  1 ]
        [ 0  0  0  0  0 ]
This is a reduced matrix, hence it is the reduced matrix BR of B. In view of Theorem 7.6 of the preceding section, we immediately have the following.
THEOREM 7.11
Let A be an n × m matrix. Then there is an n × n matrix Ω such that ΩA = AR.

There is a convenient notational device that enables us to find both Ω and AR together. We know what Ω is: if A is n × m, then Ω is the n × n matrix formed by starting with In and carrying out, in order, the same sequence of elementary row operations used to reduce A. A simple way to form Ω while reducing A is to form the n × (n + m) matrix [In  A] by placing In alongside A on its left. The first n columns of this matrix [In  A] are just In, and the last m columns are A. Now reduce A by elementary row operations, performing the same operations on the first n columns (In) as well. When A is reduced, the resulting n × (n + m) matrix will have the form [Ω  AR], and we read Ω as the first n columns.
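This bookkeeping device is easy to automate. The sketch below (Python with exact rational arithmetic; an illustration, not part of the original text) reduces [In | A], searching for pivots only in A's columns while carrying the first n columns along, and is applied here to the matrix of the next example:

```python
from fractions import Fraction as F

def reduce_with_multiplier(A):
    """Reduce [I_n | A]; return (Omega, A_R) with Omega * A = A_R."""
    n, m = len(A), len(A[0])
    # build the augmented matrix [I_n | A]
    M = [[F(int(i == j)) for j in range(n)] + [F(x) for x in A[i]] for i in range(n)]
    pivot_row = 0
    for col in range(n, n + m):                       # pivots only in A's columns
        r = next((i for i in range(pivot_row, n) if M[i][col] != 0), None)
        if r is None:
            continue
        M[pivot_row], M[r] = M[r], M[pivot_row]       # Type I: interchange rows
        p = M[pivot_row][col]
        M[pivot_row] = [x / p for x in M[pivot_row]]  # Type II: leading entry 1
        for i in range(n):
            if i != pivot_row and M[i][col] != 0:     # Type III: clear the column
                c = M[i][col]
                M[i] = [x - c * y for x, y in zip(M[i], M[pivot_row])]
        pivot_row += 1
    return [row[:n] for row in M], [row[n:] for row in M]

def matmul(P, Q):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Q)] for row in P]

Omega, AR = reduce_with_multiplier([[-3, 1, 0], [4, -2, 1]])
```

Multiplying the recovered Omega by the original A reproduces AR, which is exactly the property ΩA = AR claimed above.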
EXAMPLE 7.20
Let

    A = [ -3   1  0 ]
        [  4  -2  1 ]

We want to find a 2 × 2 matrix Ω such that ΩA = AR. Since A is 2 × 3, form the 2 × 5 matrix

    [ I2  A ] = [ 1  0 | -3   1  0 ]
                [ 0  1 |  4  -2  1 ]
7.3 The Row Echelon Form of a Matrix
265
Now reduce the last three columns, performing the same operations on the first two. The column of dashes is just a bookkeeping device to separate A from I2. Proceed:

    -(1/3)(row 1):

        [ -1/3  0 | 1  -1/3  0 ]
        [  0    1 | 4   -2   1 ]

    (1/4)(row 2):

        [ -1/3  0   | 1  -1/3  0   ]
        [  0    1/4 | 1  -1/2  1/4 ]

    (row 2) - (row 1), as the new row 2:

        [ -1/3  0   | 1  -1/3  0   ]
        [  1/3  1/4 | 0  -1/6  1/4 ]

    -6(row 2):

        [ -1/3   0   | 1  -1/3   0   ]
        [ -2    -3/2 | 0   1    -3/2 ]

    (1/3)(row 2) + (row 1), as the new row 1:

        [ -1  -1/2 | 1  0  -1/2 ]
        [ -2  -3/2 | 0  1  -3/2 ]

The last three columns are in reduced form, so they form AR. The first two columns form Ω:

    Ω = [ -1  -1/2 ]
        [ -2  -3/2 ]

As a check on this, form the product

    ΩA = [ -1  -1/2 ] [ -3   1  0 ]  =  [ 1  0  -1/2 ]  = AR
         [ -2  -3/2 ] [  4  -2  1 ]     [ 0  1  -3/2 ]

SECTION 7.3
PROBLEMS
In each of the following, find the reduced form of A and produce a matrix Ω such that ΩA = AR.

1. A = [ 1  -1  3 ]
       [ 0   1  2 ]
       [ 0   0  0 ]

2. A = [ 3  1  1  4 ]
       [ 0  1  0  0 ]

3. A = [ -1  4  1  1 ]
       [  0  0  0  0 ]
       [  0  0  0  0 ]
       [  0  0  0  1 ]

4. A = [ 1  0  1  1  -1 ]
       [ 0  1  0  0   2 ]

5. A = [ 6  1 ]
       [ 0  0 ]
       [ 1  3 ]
       [ 0  1 ]

6. A = [ 2  2 ]
       [ 1  1 ]

7. A = [ -1  4   6 ]
       [  2  3  -5 ]
       [  7  1   1 ]

8. A = [ -3  4  4 ]
       [  0  0  0 ]
9. A = [ -1  2  3  1 ]
       [  1  0  0  0 ]

10. A = [ 8  2  1  -7 ]
        [ 0  1  1   0 ]
        [ 4  0  0   0 ]

11. A = [ 4  1   0 ]
        [ 2  2   3 ]
        [ 0  1  -3 ]

12. A = [  6 ]
        [ -3 ]
        [  1 ]
        [  1 ]

7.4
The Row and Column Spaces of a Matrix and Rank of a Matrix

In this section we will develop three numbers associated with matrices that play a significant role in the solution of systems of linear equations. Suppose A is an n × m matrix with real number elements. Each row of A has m elements and can be thought of as a vector in Rm. There are n such vectors. The set of all linear combinations of these row vectors is a subspace of Rm called the row space of A. This space is spanned by the row vectors. If these row vectors are linearly independent, they form a basis for this row space and this space has dimension n. If they are not linearly independent, then some subset of them forms a basis for the row space, and this space has dimension less than n. If we look down instead of across, we can think of each column of A as a vector in Rn. We often write these vectors as columns simply to keep in mind their origin, although they can be written in standard vector notation. The set of all linear combinations of these columns forms a subspace of Rn. This is the column space of A. If these columns are linearly independent, they form a basis for this column space, which then has dimension m; otherwise, this dimension is less than m.
EXAMPLE 7.21
Let

    B = [ -2   6   1 ]
        [  2   2  -4 ]
        [ 10  -8  12 ]
        [  3   1  -2 ]
        [  5  -5   7 ]
The row space is the subspace of R3 spanned by the row vectors of B. This row space consists of all vectors

    α(-2, 6, 1) + β(2, 2, -4) + γ(10, -8, 12) + δ(3, 1, -2) + ε(5, -5, 7).

The first three row vectors are linearly independent. The last two are linear combinations of the first three. Specifically,

    (3, 1, -2) = (4/101)(-2, 6, 1) + (181/202)(2, 2, -4) + (13/101)(10, -8, 12)

and

    (5, -5, 7) = -(7/101)(-2, 6, 1) - (39/202)(2, 2, -4) + (53/101)(10, -8, 12).
The first three row vectors form a basis for the row space, which therefore has dimension 3. The row space of B is all of R3 .
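The coefficients in these two linear combinations can be verified exactly. As a quick check (a Python aside, not part of the text), with exact rational arithmetic:

```python
from fractions import Fraction as F

# the first three (linearly independent) row vectors of B
R1, R2, R3 = [F(-2), F(6), F(1)], [F(2), F(2), F(-4)], [F(10), F(-8), F(12)]

def combo(a, b, c):
    # the linear combination a*R1 + b*R2 + c*R3
    return [a * x + b * y + c * z for x, y, z in zip(R1, R2, R3)]

row4 = combo(F(4, 101), F(181, 202), F(13, 101))
row5 = combo(F(-7, 101), F(-39, 202), F(53, 101))
```

Both combinations reproduce the fourth and fifth rows of B exactly: row4 equals (3, 1, -2) and row5 equals (5, -5, 7).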
The column space of B is the subspace of R5 consisting of all vectors

      [ -2 ]     [  6 ]     [  1 ]
      [  2 ]     [  2 ]     [ -4 ]
    α [ 10 ] + β [ -8 ] + γ [ 12 ]
      [  3 ]     [  1 ]     [ -2 ]
      [  5 ]     [ -5 ]     [  7 ]

These three column vectors are linearly independent in R5. None is a linear combination of the other two or, equivalently, the only way this linear combination can be the zero vector is for α = β = γ = 0. Therefore the column space of B has dimension 3 and is a subspace of the dimension 5 space R5. In this example the row space of the matrix had the same dimension as the column space, even though the row vectors were in Rm and the column vectors in Rn, with n ≠ m. This is not a coincidence.
THEOREM 7.12
For any matrix A having real numbers as elements, the row and column spaces have the same dimension.

Proof
Suppose A is n × m:

    A = [ a11  a12  ...  a1m ]
        [ a21  a22  ...  a2m ]
        [  .    .         .  ]
        [ an1  an2  ...  anm ]

Denote the row vectors R1, ..., Rn, so

    Ri = (ai1, ai2, ..., aim) in Rm.

Now suppose that the dimension of the row space of A is r. Then exactly r of these row vectors are linearly independent. As a notational convenience, suppose the first r rows R1, ..., Rr are linearly independent. Then each of Rr+1, ..., Rn is a linear combination of these r vectors. Write

    Rr+1 = βr+1,1 R1 + ... + βr+1,r Rr
    Rr+2 = βr+2,1 R1 + ... + βr+2,r Rr
    ...
    Rn = βn,1 R1 + ... + βn,r Rr
Now observe that column j of A can be written

    (a1j, a2j, ..., arj, ar+1,j, ..., anj)^T
        = a1j (1, 0, ..., 0, βr+1,1, ..., βn,1)^T
        + a2j (0, 1, ..., 0, βr+1,2, ..., βn,2)^T
        + ... + arj (0, 0, ..., 1, βr+1,r, ..., βn,r)^T.

This means that each column vector of A is a linear combination of the r n-vectors on the right side of this equation. These r vectors therefore span the column space of A. If these vectors are linearly independent, then the dimension of the column space is r. If not, then remove from this list of vectors any that are linear combinations of the others and thus determine a basis for the column space having fewer than r vectors. In any event,

    dimension of the column space of A ≤ dimension of the row space of A.

By essentially repeating this argument, with row and column vectors interchanged, we obtain

    dimension of the row space of A ≤ dimension of the column space of A,

and these two inequalities together prove the theorem.
It is interesting to ask what effect elementary row operations have on the row space of a matrix. The answer is—none! We will need this fact shortly. THEOREM 7.13
Let A be an n × m matrix, and let B be formed from A by an elementary row operation. Then the row space of A and the row space of B are the same.

Proof   If B is obtained by a Type I operation, we simply interchange two rows. Then A and B still have the same row vectors, just listed in a different order, so these row vectors span the same row space. Suppose B is obtained by a Type II operation, multiplying row i by a nonzero constant c. Linear combinations of the rows of A have the form

    α1 R1 + ... + αi Ri + ... + αn Rn,

while linear combinations of the rows of B are

    α1 R1 + ... + αi (cRi) + ... + αn Rn.

Since αi can be any number, so can cαi, so these linear combinations yield the same vectors when the coefficients are chosen arbitrarily. Thus the row spaces of A and B are again the same. Finally, suppose B is obtained from A by adding c (row i) to row j. The row vectors of B are now

    R1, ..., Rj-1, cRi + Rj, Rj+1, ..., Rn.
But we can write an arbitrary linear combination of these rows of B as

    α1 R1 + ... + αj-1 Rj-1 + αj (cRi + Rj) + αj+1 Rj+1 + ... + αn Rn,

and this is

    α1 R1 + ... + (αi + cαj) Ri + ... + αj Rj + ... + αn Rn,

which is again just a linear combination of the row vectors of A. Thus again the row spaces of A and B are the same, and the theorem is proved.
COROLLARY 7.1
For any matrix A, the row spaces of A and AR are the same. This follows immediately from Theorem 7.13. Each time we perform an elementary row operation on a matrix, we leave the row space unchanged. Since we obtain AR from A by elementary row operations, then A and AR must have the same row spaces. The dimensions of the row and column spaces will be important when we consider solutions of systems of linear equations. There is another number that will play a significant role in this, the rank of a matrix.
DEFINITION 7.13
Rank
The rank of a matrix A is the number of nonzero rows in AR .
We denote the rank of A as rank(A). If B is a reduced matrix, then B = BR, so the rank of B is just the number of nonzero rows of B itself. Further, for any matrix A,

    rank(A) = number of nonzero rows of AR = rank(AR).

We claim that the rank of a matrix is equal to the dimension of its row space (or column space). First we will show this for reduced matrices.
LEMMA 7.1
Let B be a reduced matrix. Then the rank of B equals the dimension of the row space of B.

Proof   Let R1, ..., Rr be the nonzero row vectors of B. The row space consists of all linear combinations

    c1 R1 + ... + cr Rr.

If nonzero row j has its leading entry in column k, then the kth component of Rj is 1. Because B is reduced, all the other elements of column k are zero, hence each other Ri has kth component zero. By Lemma 5.3, R1, ..., Rr are linearly independent. Therefore these vectors form a basis for the row space of B, and the dimension of this space is r. But r = number of nonzero rows of B = number of nonzero rows of BR = rank(B).
EXAMPLE 7.22
Let
    B = [ 0  1  0  0   3  0   6 ]
        [ 0  0  1  0  -2  1   5 ]
        [ 0  0  0  1   2  0  -4 ]
        [ 0  0  0  0   0  0   0 ]
Then B is in reduced form, so B = BR. The rank of B is its number of nonzero rows, which is 3. Further, the nonzero row vectors are

    (0, 1, 0, 0, 3, 0, 6),  (0, 0, 1, 0, -2, 1, 5),  (0, 0, 0, 1, 2, 0, -4),

and these are linearly independent. Indeed, if a linear combination of these vectors yielded the zero vector, we would have

    α(0, 1, 0, 0, 3, 0, 6) + β(0, 0, 1, 0, -2, 1, 5) + γ(0, 0, 0, 1, 2, 0, -4) = (0, 0, 0, 0, 0, 0, 0).

But then

    (0, α, β, γ, 3α - 2β + 2γ, β, 6α + 5β - 4γ) = (0, 0, 0, 0, 0, 0, 0),

and from the second, third, and fourth components we read that α = β = γ = 0. By Theorem 6.13(2), these three row vectors are linearly independent and form a basis for the row space, which therefore has dimension 3. Using this as a stepping stone, we can prove the result for arbitrary matrices.
THEOREM 7.14
For any matrix A, the rank of A equals the dimension of the row space of A. Proof
From the lemma, we know that rankA = rankAR = dimension of the row space of AR = dimension of the row space of A,
since A and AR have the same row space. Of course, we can also assert that rank(A) = dimension of the column space of A. If A is n × m, then so is AR. Now AR cannot have more than n nonzero rows (because it has only n rows). This means that rank(A) ≤ number of rows of A. There is a special circumstance in which the rank of a square matrix actually equals its number of rows.
THEOREM 7.15
Let A be an n × n matrix. Then rank(A) = n if and only if AR = In.

Proof   If AR = In, then the number of nonzero rows in AR is n, since In has no zero rows. Hence in this case rank(A) = n. Conversely, suppose that rank(A) = n. Then AR has n nonzero rows, hence no zero rows. By definition of a reduced matrix, each row of AR has leading entry 1. Since each row, being a nonzero row, has a leading entry, the i, i elements of AR are all equal to 1. But it is also required that, if column j contains a leading entry, then all other elements of that column are zero. Thus AR must have each i, j element equal to zero if i ≠ j, so AR = In.
EXAMPLE 7.23
Let
    A = [ 1  -1   4  2 ]
        [ 0   1   3  2 ]
        [ 3  -2  15  8 ]

We find that

    AR = [ 1  0  7  4 ]
         [ 0  1  3  2 ]
         [ 0  0  0  0 ]

Therefore rank(A) = 2. This is also the dimension of the row space of A and of the column space of A.
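The rank here can be confirmed without carrying out the full reduction: the third row of A is a combination of the first two, and the first row of AR is (row 1) + (row 2). A quick integer check (a Python aside, not part of the text):

```python
# the rows of A from this example
R1, R2, R3 = [1, -1, 4, 2], [0, 1, 3, 2], [3, -2, 15, 8]

# row 3 = 3(row 1) + (row 2), so the rows span a space of dimension 2
combo = [3 * a + b for a, b in zip(R1, R2)]

# the first row of the reduced matrix is (row 1) + (row 2)
ar_row1 = [a + b for a, b in zip(R1, R2)]
```

Here combo equals R3, confirming the dependency, and ar_row1 equals (1, 0, 7, 4), the first row of AR.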
In the next section we will use the reduced form of a matrix to solve homogeneous systems of linear algebraic equations.
SECTION 7.4
PROBLEMS
In each of Problems 1 through 14, (a) find the reduced form of the matrix, and from this the rank, (b) find a basis for the row space of the matrix, and the dimension of this space, and (c) find a basis for the column space and the dimension of this space.
1. [ -4  1  3 ]
   [  2  2  0 ]

2. [ 1   2   4 ]
   [ 0  -1   3 ]
   [ 2   1  11 ]

3. [ -3   1   1 ]
   [  2  -1   2 ]
   [  4  -1  -3 ]

4. [  6  0  1 ]
   [ 12  0  3 ]
   [  1 -1  1 ]
   [  8 -4  7 ]

5. [ 1  0 ]
   [ 1  1 ]

6. [ 3  2 ]
   [ 1  0 ]

7. [ 3  -1  -1 ]
   [ 0   0   0 ]
   [ 2   0   0 ]

8. [ 0  1  1 ]
   [ 0  2  2 ]
   [ 0  0  0 ]

9. [ 0  3 ]
   [ 6  0 ]
   [ 2  2 ]

10. [ 1  0   0 ]
    [ 2  0   0 ]
    [ 1  0  -1 ]
    [ 3  0   0 ]

11. [ -3  2  2 ]
    [  1  0  5 ]
    [  0  0  2 ]

12. [ -4  -2  1  6 ]
    [  0   1  0  2 ]
    [  4  -4  0  0 ]

13. [ -2   5   7 ]
    [  0   1  -3 ]
    [ -4  11  11 ]

14. [ -3   2   1 ]
    [  6  -4  -2 ]
15. Show that for any matrix A, rankA = rankAt .
7.5 Solution of Homogeneous Systems of Linear Equations

We will apply the matrix machinery we have developed to the solution of systems of n linear homogeneous equations in m unknowns:

    a11 x1 + a12 x2 + ... + a1m xm = 0
    a21 x1 + a22 x2 + ... + a2m xm = 0
    ...
    an1 x1 + an2 x2 + ... + anm xm = 0

The term homogeneous applies here because the right side of each equation is zero. As a prelude to a matrix approach to solving this system, consider the simple system

    x1 - 3x2 + 2x3 = 0
    -2x1 + x2 - 3x3 = 0

We can solve this easily by "eliminating unknowns." Add 2(equation 1) to equation 2 to get

    -5x2 + x3 = 0,

hence

    x2 = (1/5)x3.

Now put this into the first equation of the system to get

    x1 - (3/5)x3 + 2x3 = 0,

or

    x1 + (7/5)x3 = 0.

Then

    x1 = -(7/5)x3.

We now have the solution:

    x1 = -(7/5)α,  x2 = (1/5)α,  x3 = α,
in which α can be any number. For this system, two of the unknowns can be written as constant multiples of the third, which can be assigned any value. The system therefore has infinitely many solutions. For this simple system we do not need matrices. However, it is instructive to see how matrices could be used here. First, write this system in matrix form as AX = O, where

    A = [  1  -3   2 ]      and      X = [ x1 ]
        [ -2   1  -3 ]                   [ x2 ]
                                         [ x3 ]

Now reduce A. We find that
    AR = [ 1  0   7/5 ]
         [ 0  1  -1/5 ]

The system AR X = O is just

    x1 + (7/5)x3 = 0
    x2 - (1/5)x3 = 0

This reduced system has the advantage of simplicity—we can solve it on sight, obtaining the same solutions that we got for the original system. This is not a coincidence. AR is formed from A by elementary row operations. Since each row of A contains the coefficients of an equation of the system, these row operations correspond in the system to interchanging equations, multiplying an equation by a nonzero constant, and adding a constant multiple of one equation to another equation of the system. This is why these elementary row operations were selected. But these operations always result in new systems having the same solutions as the original system (a proof of this will be given shortly). The reduced system AR X = O therefore has the same solutions as AX = O. But AR is defined in just such a way that we can just read the solutions, giving some unknowns in terms of others, as we saw in this simple case. We will look at two more examples and then say more about the method in general.
EXAMPLE 7.24
Solve the system

    x1 - 3x2 + x3 - 7x4 + 4x5 = 0
    x1 + 2x2 - 3x3 = 0
    x2 - 4x3 + x5 = 0

This is the system AX = O, with

    A = [ 1  -3   1  -7  4 ]
        [ 1   2  -3   0  0 ]
        [ 0   1  -4   0  1 ]

We find that

    AR = [ 1  0  0  -35/16   13/16 ]
         [ 0  1  0   28/16  -20/16 ]
         [ 0  0  1    7/16   -9/16 ]
The systems AX = O and AR X = O have the same solutions. But the equations of the reduced system AR X = O are

    x1 - (35/16)x4 + (13/16)x5 = 0
    x2 + (28/16)x4 - (20/16)x5 = 0
    x3 + (7/16)x4 - (9/16)x5 = 0

From these we immediately read the solution. We can let x4 = α and x5 = β (any numbers), and then

    x1 = (35/16)α - (13/16)β,  x2 = -(28/16)α + (20/16)β,  x3 = -(7/16)α + (9/16)β.

Not only did we essentially have the solution once we obtained AR, but we also knew the number of arbitrary constants that appear in the solution. In the last example this number was 2. This was the number of columns minus the number of rows having leading entries (or m − rank(A)). It is convenient to write solutions of AX = O as column vectors. In the last example, we could write

    X = [  (35/16)α - (13/16)β ]
        [ -(28/16)α + (20/16)β ]
        [  -(7/16)α + (9/16)β  ]
        [          α           ]
        [          β           ]

This formulation also makes it easy to display other information about solutions. In this example, we can also write

        [  35 ]       [ -13 ]
        [ -28 ]       [  20 ]
    X = [  -7 ] γ  +  [   9 ] δ
        [  16 ]       [   0 ]
        [   0 ]       [  16 ]

in which γ = α/16 can be any number (since α can be any number), and δ = β/16 is also any number. This displays the solution as a linear combination of two linearly independent vectors. We will say more about the significance of this in the next section.
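It is always worth substituting the basis vectors back into the original system. The check below (plain integer arithmetic in Python; an aside, not part of the text) confirms that both column vectors solve AX = O:

```python
# coefficient matrix of this example
A = [[1, -3,  1, -7, 4],
     [1,  2, -3,  0, 0],
     [0,  1, -4,  0, 1]]

def matvec(M, v):
    # compute the matrix-vector product M v
    return [sum(a * x for a, x in zip(row, v)) for row in M]

X1 = [35, -28, -7, 16, 0]   # the vector multiplying gamma
X2 = [-13, 20, 9, 0, 16]    # the vector multiplying delta
```

Both matvec(A, X1) and matvec(A, X2) come out as the zero vector [0, 0, 0].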
EXAMPLE 7.25
Solve the system

    -x2 + 2x3 + 4x4 = 0
    -x3 + 3x4 = 0
    2x1 + x2 + 3x3 + 7x4 = 0
    6x1 + 2x2 + 10x3 + 28x4 = 0

Let

    A = [ 0  -1   2   4 ]
        [ 0   0  -1   3 ]
        [ 2   1   3   7 ]
        [ 6   2  10  28 ]

We find that

    AR = [ 1  0  0   13 ]
         [ 0  1  0  -10 ]
         [ 0  0  1   -3 ]
         [ 0  0  0    0 ]
From the first three rows of AR, read that

    x1 + 13x4 = 0
    x2 - 10x4 = 0
    x3 - 3x4 = 0

Thus the solution is given by

    x1 = -13α,  x2 = 10α,  x3 = 3α,  x4 = α,

in which α can be any number. We can write the solution as

    X = α [ -13 ]
          [  10 ]
          [   3 ]
          [   1 ]

with α any number. In this example every solution is a constant multiple of one 4-vector. Note also that m − rank(A) = 4 − 3 = 1. We will now firm up some of the ideas we have discussed informally, and then look at additional examples. First, everything we have done in this section has been based on the assertion that AX = O and AR X = O have the same solutions. We will prove this.
Let A be an n × m matrix. Then the linear homogeneous systems AX = O and AR X = O have the same solutions.

Proof   We know that there is an n × n matrix Ω such that ΩA = AR. Further, Ω can be written as a product of elementary matrices, Ω = Er · · · E1. Suppose first that X = C is a solution of AX = O. Then AC = O, so

    ΩAC = Ω(AC) = AR C = ΩO = O.

Then C is also a solution of AR X = O. Conversely, suppose K is a solution of AR X = O. Then AR K = O. We want to show that AK = O also. Because AR K = O, we have ΩAK = O, or

    Er · · · E1 A K = O.
By Theorem 7.8, for each Ej there is an elementary matrix Ej* that reverses the effect of Ej. Then, from the last equation, we have

    E1* E2* · · · Er-1* Er* Er Er-1 · · · E2 E1 A K = O.

But Er* Er = In, because Er* reverses the effect of Er. Similarly, Er-1* Er-1 = In, until finally the last equation becomes

    E1* E1 A K = A K = O.

Thus K is a solution of AX = O and the proof is complete.

The method for solving AX = O which we illustrated above is called the Gauss-Jordan method, or complete pivoting. Here is an outline of the method. Keep in mind that, in a system AX = O, row k gives the coefficients of equation k, and column j contains the coefficients of xj as we look down the set of equations.

Gauss-Jordan Method for Solving AX = O
1. Find AR.
2. Look down the columns of AR. If column j contains the leading entry of some row (so all other elements of this column are zero), then xj is said to be dependent. Determine all the dependent unknowns. The remaining unknowns (if any) are said to be independent.
3. Each nonzero row of AR represents an equation in the reduced system, having one dependent unknown (in the column having the leading entry 1) and all other unknowns in this equation (if any) independent. This enables us to write this dependent unknown in terms of the independent ones.
4. After step (3) is carried out for each nonzero row, we have each dependent unknown in terms of the independent ones. The independent unknowns can then be assigned any values, and these determine the dependent unknowns, solving the system. We can write the resulting solution as a linear combination of column solutions, one for each independent unknown. The resulting expression, containing an arbitrary constant for each independent unknown, is called the general solution of the system.
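The steps of this outline can be sketched directly in code. The following Python function (an illustration, not part of the text; exact arithmetic via fractions) computes AR and reports the pivot columns, which identify the dependent unknowns. It is applied to the simple system solved at the start of this section:

```python
from fractions import Fraction as F

def rref(A):
    """Step 1: Gauss-Jordan reduction. Returns (A_R, pivot columns)."""
    M = [[F(x) for x in row] for row in A]
    n, m = len(M), len(M[0])
    pivots, r = [], 0
    for c in range(m):
        row = next((i for i in range(r, n) if M[i][c] != 0), None)
        if row is None:
            continue                        # no leading entry in this column
        M[r], M[row] = M[row], M[r]         # interchange rows
        p = M[r][c]
        M[r] = [x / p for x in M[r]]        # make the leading entry 1
        for i in range(n):
            if i != r and M[i][c] != 0:
                f = M[i][c]                 # clear the rest of column c
                M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == n:
            break
    return M, pivots

# the system x1 - 3x2 + 2x3 = 0, -2x1 + x2 - 3x3 = 0
AR, pivots = rref([[1, -3, 2], [-2, 1, -3]])

# Step 2: pivot columns give the dependent unknowns; the rest are independent
independent = [j for j in range(3) if j not in pivots]
```

The result is AR = [[1, 0, 7/5], [0, 1, -1/5]] with x1, x2 dependent and x3 independent, matching the hand computation above.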
EXAMPLE 7.26
Solve the system

    -x1 + x3 + x4 + 2x5 = 0
    x2 + 3x3 + 4x5 = 0
    x1 + 2x2 + x3 + x4 + x5 = 0
    -3x1 + x2 + 4x5 = 0

The matrix of coefficients is

    A = [ -1  0  1  1  2 ]
        [  0  1  3  0  4 ]
        [  1  2  1  1  1 ]
        [ -3  1  0  0  4 ]
We find that

    AR = [ 1  0  0  0  -9/8 ]
         [ 0  1  0  0   5/8 ]
         [ 0  0  1  0   9/8 ]
         [ 0  0  0  1  -1/4 ]
Because columns 1 through 4 of AR contain the leading entries of the rows, x1, x2, x3 and x4 are dependent, while the remaining unknown, x5, is independent. The equations of the reduced system (which has the same solutions as the original system) are:

    x1 - (9/8)x5 = 0
    x2 + (5/8)x5 = 0
    x3 + (9/8)x5 = 0
    x4 - (1/4)x5 = 0

We wrote these out for illustration, but in fact the solution can be read immediately from AR. We can choose x5 = α, any number, and then

    x1 = (9/8)α,  x2 = -(5/8)α,  x3 = -(9/8)α,  x4 = (1/4)α.

The dependent unknowns are given by AR in terms of the independent unknowns (only one in this case). We can write this solution more neatly as

    X = γ [  9 ]
          [ -5 ]
          [ -9 ]
          [  2 ]
          [  8 ]

in which γ = α/8 can be any number. This is the general solution of AX = O. In this example, m − rank(A) = 5 − 4 = 1.
EXAMPLE 7.27
Consider the system

    3x1 - 11x2 + 5x3 = 0
    4x1 + x2 - 10x3 = 0
    4x1 + 9x2 - 6x3 = 0

The matrix of coefficients is

    A = [ 3  -11    5 ]
        [ 4    1  -10 ]
        [ 4    9   -6 ]

The reduced matrix is

    AR = [ 1  0  0 ]
         [ 0  1  0 ]  = I3
         [ 0  0  1 ]
The reduced system is just

    x1 = 0,  x2 = 0,  x3 = 0.

This system has only the trivial solution, with each xj = 0. Notice that in this example there are no independent unknowns. If there were, we could assign them any values and have infinitely many solutions.
EXAMPLE 7.28
Consider the system

    2x1 - 4x2 + x3 + x4 + 6x5 + 12x6 - 5x7 = 0
    -4x1 + x2 + 6x3 + 3x4 + 10x5 - 9x6 + 8x7 = 0
    7x1 + 2x2 + 4x3 - 8x4 + 6x5 - 5x6 + 15x7 = 0
    2x1 + x2 + 6x3 + 3x4 - 4x5 - 2x6 - 21x7 = 0

The coefficient matrix is

    A = [  2  -4  1   1   6  12   -5 ]
        [ -4   1  6   3  10  -9    8 ]
        [  7   2  4  -8   6  -5   15 ]
        [  2   1  6   3  -4  -2  -21 ]
We find that

    AR = [ 1  0  0  0      -7/3        7/6       -29/6    ]
         [ 0  1  0  0   -233/82    -395/164   -375/164    ]
         [ 0  0  1  0   1379/738   -995/1476   2161/1476  ]
         [ 0  0  0  1  -1895/738   1043/1476  -8773/1476  ]
From this matrix we see that x1, x2, x3, x4 are dependent, and x5, x6, x7 are independent. We read immediately from AR that

    x1 = (7/3)x5 - (7/6)x6 + (29/6)x7
    x2 = (233/82)x5 + (395/164)x6 + (375/164)x7
    x3 = -(1379/738)x5 + (995/1476)x6 - (2161/1476)x7
    x4 = (1895/738)x5 - (1043/1476)x6 + (8773/1476)x7
while x5, x6 and x7 can (independently of each other) be assigned any numerical values. To make the solution look neater, write x5 = 1476α, x6 = 1476β and x7 = 1476γ, where α, β and γ are any numbers. Now the solution can be written

    x1 = 3444α - 1722β + 7134γ
    x2 = 4194α + 3555β + 3375γ
    x3 = -2758α + 995β - 2161γ
    x4 = 3790α - 1043β + 8773γ
    x5 = 1476α
    x6 = 1476β
    x7 = 1476γ

with α, β and γ any numbers. In column notation,

        [  3444 ]       [ -1722 ]       [  7134 ]
        [  4194 ]       [  3555 ]       [  3375 ]
        [ -2758 ]       [   995 ]       [ -2161 ]
    X = [  3790 ] α  +  [ -1043 ] β  +  [  8773 ] γ
        [  1476 ]       [     0 ]       [     0 ]
        [     0 ]       [  1476 ]       [     0 ]
        [     0 ]       [     0 ]       [  1476 ]

This is the general solution, being a linear combination of three linearly independent 7-vectors. In this example, m − rank(A) = 7 − 4 = 3. In each of these examples, after we found the general solution, we noted that the number m − rank(A), the number of columns minus the rank of A, coincided with the number of linearly independent column solutions in the general solution (the number of arbitrary constants in the general solution). We will see shortly that this is always true. In the next section we will put the Gauss-Jordan method into a vector space context. This will result in an understanding of the algebraic structure of the solutions of a system AX = O, as well as practical criteria for determining when such a system has a nonzero solution.
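With numbers this size, a mechanical check is reassuring. The script below (a Python aside, not part of the text) substitutes the three column vectors into the original system and confirms AX = O for each:

```python
# coefficient matrix of this example
A = [[ 2, -4, 1,  1,  6, 12,  -5],
     [-4,  1, 6,  3, 10, -9,   8],
     [ 7,  2, 4, -8,  6, -5,  15],
     [ 2,  1, 6,  3, -4, -2, -21]]

def matvec(M, v):
    # compute the matrix-vector product M v
    return [sum(a * x for a, x in zip(row, v)) for row in M]

# the three basis column vectors of the general solution
basis = [[3444, 4194, -2758, 3790, 1476, 0, 0],
         [-1722, 3555, 995, -1043, 0, 1476, 0],
         [7134, 3375, -2161, 8773, 0, 0, 1476]]

results = [matvec(A, X) for X in basis]
```

Each of the three products comes out as the zero vector [0, 0, 0, 0].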
SECTION 7.5
PROBLEMS
In each of Problems 1 through 12, find the general solution of the system and write it as a column matrix or sum of column matrices. 1.
x1 − 2x3 + x5 = 0 x3 + x4 − x5 = 0 6.
x1 − 2x5 = 0 7.
−3x3 + 2x4 + x5 = 0 3.
x 1 + x2 = 0 4.
−10x1 − x2 + 4x3 − x4 + x5 − x6 = 0 x2 − x3 + 3x4 = 0 2x1 − x2 + x5 = 0
−2x1 + x2 + 2x3 = 0 x 1 − x2 = 0
6x1 − x2 + x3 = 0 x1 − x4 + 2x5 = 0
−3x1 + x2 − x3 + x4 + x5 = 0 x2 + x3 + 4x5 = 0
x1 − x2 + 3x3 − x4 + 4x5 = 0 2x1 − 2x2 + x3 + x4 = 0
x1 + 2x2 − x3 + x4 = 0 x2 − x3 + x4 = 0
2.
5.
x2 − x4 + x6 = 0 8.
8x1 − 2x3 + x6 = 0 2x1 − x2 + 3x4 − x6 = 0
4x1 + x2 − 3x3 + x4 = 0
x2 + x3 − 2x5 − x6 = 0
2x1 − x3 = 0
x4 − 3x5 + 2x6 = 0
x2 − 3x4 + x5 = 0
9.
11.
x1 − 2x2 + x5 − x6 + x7 = 0
2x1 − x2 + x4 = 0
x3 − x4 + x5 − 2x6 + 3x7 = 0
2x1 − 3x2 + 4x5 = 0
x1 − x5 + 2x6 = 0 2x1 − 3x4 + x5 = 0
10.
4x1 − 3x2 + x4 + x5 − 3x6 = 0
12.
2x2 − x6 + x7 − x8 = 0
2x2 + 4x4 − x5 − 6x6 = 0
x3 − 4x4 + x8 = 0
3x1 − 2x2 + 4x5 − x6 = 0
x2 − x3 + x4 = 0
2x1 + x2 − 3x3 + 4x4 = 0
2x1 − 4x5 + x7 + x8 = 0
x2 − x5 + x6 − x7 = 0
7.6 The Solution Space of AX = O

Suppose A is an n × m matrix. We have been writing solutions of AX = O as column m-vectors. Now observe that the set of all solutions has the algebraic structure of a subspace of Rm.

THEOREM 7.17

Let A be an n × m matrix. Then the set of solutions of the system AX = O is a subspace of Rm.

Proof   Let S be the set of all solutions of AX = O. Then S is a set of vectors in Rm. Since AO = O, O is in S. Now suppose X1 and X2 are solutions, and α and β are real numbers. Then

    A(αX1 + βX2) = αAX1 + βAX2 = αO + βO = O,

so αX1 + βX2 is also a solution, hence is in S. Therefore S is a subspace of Rm.

We would like to know a basis for this solution space, because then every solution is a linear combination of the basis vectors. This is similar to finding a fundamental set of solutions for a linear homogeneous differential equation, because then every solution is a linear combination of these fundamental solutions. In examples in the preceding section, we were always able to write the general solution as a linear combination of m − rank(A) linearly independent solution vectors. This suggests that this number is the dimension of the solution space. To see why this is true in general, notice that we obtain a dependent xj corresponding to each row having a leading entry. Since only nonzero rows have leading entries, the number of dependent unknowns is the number of nonzero rows of AR. But then the number of independent unknowns is the total number of unknowns, m, minus the number of dependent unknowns (the number of nonzero rows of AR). Since the number of nonzero rows of AR is the rank of A, the general solution can always be written as a linear combination of m − rank(A) independent solutions. Further, as a practical matter, solving the system AX = O by solving the system AR X = O automatically displays the general solution as a linear combination of this number of basis vectors for the solution space. We will summarize this discussion as a theorem.
THEOREM 7.18
Let A be n × m. Then the solution space of the system AX = O has dimension m − rank(A) or, equivalently, m − (number of nonzero rows in AR).
EXAMPLE 7.29
Consider the system

    -4x1 + x2 + 3x3 - 10x4 + x5 = 0
    2x1 + 8x2 - x3 - x4 + 3x5 = 0
    -6x1 + x2 + x3 - 5x4 - 2x5 = 0

The matrix of coefficients is

    A = [ -4  1   3  -10   1 ]
        [  2  8  -1   -1   3 ]
        [ -6  1   1   -5  -2 ]

and we find that

    AR = [ 1  0  0    33/118   65/118 ]
         [ 0  1  0   -32/59    21/59  ]
         [ 0  0  1  -164/59    56/59  ]

Now A has m = 5 columns, and AR has 3 nonzero rows, so rank(A) = 3 and the solution space of AX = O has dimension

    m - rank(A) = 5 - 3 = 2.

From the reduced system we read the solutions
Now A has m = 5 columns, and AR has 3 nonzero rows, so rankA = 3 and the solution space of AX = O has dimension m − rankA = 5 − 3 = 2 From the reduced system we read the solutions 65 21 33 32 − x2 = − 118 118 59 59 56 164 x3 = − x4 = x5 = 59 59
x1 = −
in which and are any numbers. It is neater to replace with 118! and with 118 (which still can be any numbers) and write the general solution as ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ −65 −33 −33! − 65 ⎜ −42⎟ ⎜ 64⎟ ⎜ 64! − 42 ⎟ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ = ! ⎜ 328⎟ + ⎜−112⎟ 328! − 112 X=⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎝ ⎝ 118⎠ ⎠ ⎝ 0⎠ 118! 118 0 118 This displays the general solution (arbitrary element of the solution space) as a linear combination of two linearly independent vectors which form a basis for the dimension two solution space.
We know that a system AX = O always has at least the zero (trivial) solution. This may be the only solution it has. Rank provides a useful criterion for determining when a system AX = O has a nontrivial solution. THEOREM 7.19
Let A be n × m. Then the system AX = O has a nontrivial solution if and only if m > rank(A).

This means that the system of homogeneous equations has a nontrivial solution exactly when the number of unknowns exceeds the rank of the coefficient matrix (the number of nonzero rows in the reduced matrix).

Proof   We have seen that the dimension of the solution space is m − rank(A). There is a nontrivial solution if and only if this solution space has something in it besides the zero solution, and this occurs exactly when the dimension of this solution space is positive. But m − rank(A) > 0 is equivalent to m > rank(A).
This theorem has important consequences. First, suppose the number of unknowns exceeds the number of equations. Then n < m. But rankA ≤ n is always true, so in this case m − rankA ≥ m − n > 0 and by Theorem 7.19, the system has nontrivial solutions. COROLLARY 7.2
A homogeneous system AX = O with more unknowns than equations always has a nontrivial solution. For another consequence of Theorem 7.19, suppose that A is square, so n = m. Now the dimension of the solution space of AX = O is n − rank(A). If this number is positive, the system has nontrivial solutions. If n − rank(A) is not positive, then it must be zero, because rank(A) ≤ n is always true. But n − rank(A) = 0 corresponds to a solution space with only the zero solution. And it also corresponds, by Theorem 7.15, to A having the identity matrix as its reduced matrix. This means that a square system AX = O, having the same number of unknowns as equations, has only the trivial solution exactly when the reduced form of A is the identity matrix. COROLLARY 7.3
Let A be an n × n matrix of real numbers. Then the system AX = O has only the trivial solution exactly when AR = In .
EXAMPLE 7.30
Consider the system

    -4x1 + x2 - 7x3 = 0
    2x1 + 9x2 - 13x3 = 0
    x1 + x2 + 10x3 = 0

The matrix of coefficients is 3 × 3:

    A = [ -4  1   -7 ]
        [  2  9  -13 ]
        [  1  1   10 ]

We find that

    AR = [ 1  0  0 ]
         [ 0  1  0 ]  = I3
         [ 0  0  1 ]
This means that the system AX = O has only the solution x1 = x2 = x3 = 0. This makes sense in view of the fact that the system AX = O has the same solutions as the reduced system AR X = O, and when AR = I3 this reduced system is just X = O.
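As a spot check on this example, the reduction can be repeated in exact rational arithmetic. The rref helper below is our own illustrative sketch (not code from the text), using only the Python standard library:

```python
from fractions import Fraction

def rref(rows):
    """Gauss-Jordan reduction to reduced row echelon form, in exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot = 0
    for col in range(len(m[0])):
        # find a nonzero pivot in this column at or below the current pivot row
        pr = next((r for r in range(pivot, len(m)) if m[r][col] != 0), None)
        if pr is None:
            continue
        m[pivot], m[pr] = m[pr], m[pivot]          # type I: interchange rows
        p = m[pivot][col]
        m[pivot] = [x / p for x in m[pivot]]       # type II: scale the pivot row
        for r in range(len(m)):
            if r != pivot and m[r][col] != 0:      # type III: clear the column
                f = m[r][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[pivot])]
        pivot += 1
    return m

A = [[-4, 1, -7], [2, 9, -13], [1, 1, 10]]
AR = rref(A)
rank = sum(1 for row in AR if any(x != 0 for x in row))
```

Since AR comes out as I3, the rank is 3 and only the trivial solution exists, matching the example.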
SECTION 7.6   PROBLEMS

1.–12. For n = 1, …, 12, use the solution of Problem n, Section 7.5, to determine the dimension of the solution space of the system of homogeneous equations.

13. Can a system AX = O, in which there are at least as many equations as unknowns, have a nontrivial solution?

14. Prove Corollary 7.2.

15. Prove Corollary 7.3.

7.7
Nonhomogeneous Systems of Linear Equations

We will now consider nonhomogeneous linear systems of equations:

a11 x1 + a12 x2 + · · · + a1m xm = b1
a21 x1 + a22 x2 + · · · + a2m xm = b2
· · ·
an1 x1 + an2 x2 + · · · + anm xm = bn

We can write this system in matrix form as AX = B, in which A = [aij] is the n × m matrix of coefficients,

        | x1 |              | b1 |
    X = | x2 |   and   B =  | b2 |
        | .. |              | .. |
        | xm |              | bn |

This system has n equations in m unknowns. Of course, if each bj = 0 then this is a homogeneous system AX = O. A homogeneous system always has at least one solution, the zero solution. A nonhomogeneous system need not have any solution at all.
284
CHAPTER 7
Matrices and Systems of Linear Equations
EXAMPLE 7.31
Consider the system

2x1 − 3x2 = 6
4x1 − 6x2 = 18

If there were a solution x1 = α, x2 = β, then from the first equation we would have 2α − 3β = 6. But then the second equation would give us 4α − 6β = 18, while 4α − 6β = 2(2α − 3β) = 12, a contradiction. We therefore have an existence question to worry about with the nonhomogeneous system. Before treating this issue, we will ask: what must solutions of AX = B look like?
7.7.1
The Structure of Solutions of AX = B
We can take a cue from linear second order differential equations. There we saw that every solution of y″ + py′ + qy = f(x) is the sum of a solution of the homogeneous equation y″ + py′ + qy = 0 and a particular solution of y″ + py′ + qy = f(x). We will show that the same idea holds for linear algebraic systems of equations as well. THEOREM 7.20
Let Up be any solution of AX = B. Then every solution of AX = B is of the form Up + H, in which H is a solution of AX = O.

Proof   Let W be any solution of AX = B. Since Up is also a solution of this system,

A(W − Up) = AW − AUp = B − B = O

Then W − Up is a solution of AX = O. Letting H = W − Up, we have W = Up + H. Conversely, if W = Up + H, where H is a solution of AX = O, then

AW = A(Up + H) = AUp + AH = B + O = B

so W is a solution of AX = B.

This means that, if Up is any solution of AX = B, and H is the general solution of AX = O, then the expression Up + H contains all possible solutions of AX = B. For this reason we call Up + H the general solution of AX = B, for any particular solution Up of AX = B.
EXAMPLE 7.32
Consider the system

−x1 + x2 + 3x3 = −2
x2 + 2x3 = 4

Here

    A = | −1  1  3 |   and   B = | −2 |
        |  0  1  2 |             |  4 |

We find from methods of the preceding sections that the general solution of AX = O is

      |  1 |
    α | −2 |
      |  1 |

By a method we will describe shortly,

         | 6 |
    Up = | 4 |
         | 0 |

is a particular solution of AX = B. Therefore every solution of AX = B is contained in the expression

      |  1 |   | 6 |
    α | −2 | + | 4 |
      |  1 |   | 0 |

in which α is any number. This is the general solution of the system AX = B.
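To see concretely that this expression solves the system for every α, substitute it back into AX = B. A small sketch in Python (the helper names are ours, not the text's):

```python
from fractions import Fraction

def matvec(A, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(Fraction(a) * xi for a, xi in zip(row, x)) for row in A]

A = [[-1, 1, 3], [0, 1, 2]]
B = [-2, 4]
Up = [6, 4, 0]          # particular solution of AX = B
H = [1, -2, 1]          # spans the solution space of AX = O

# Up + alpha*H solves AX = B for every choice of alpha
checks = []
for alpha in (-3, 0, Fraction(5, 2)):
    X = [u + alpha * h for u, h in zip(Up, H)]
    checks.append(matvec(A, X) == B)
```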
7.7.2
Existence and Uniqueness of Solutions of AX = B
Now we know what to look for in solving AX = B. In this section we will develop criteria to determine when a solution Up exists, as well as a method that automatically produces the general solution in the form X = H + Up, where H is the general solution of AX = O.
DEFINITION 7.14
Consistent System of Equations
A nonhomogeneous system AX = B is said to be consistent if there exists a solution. If there is no solution, the system is inconsistent.
The difference between a system AX = O and AX = B is B. For the homogeneous system, it is enough to specify the coefficient matrix A when working with the system. But for AX = B, we must incorporate B into our computations. For this reason, we introduce the augmented matrix [A | B]. If A is n × m, then [A | B] is the n × (m + 1) matrix formed by adjoining B to A as a new last column. For example, if

    A = | −3  2  6   1 |   and   B = |  5 |
        |  0  3  3  −5 |             |  2 |
        |  2  4  4  −6 |             | −8 |

then

    [A | B] = | −3  2  6   1 |  5 |
              |  0  3  3  −5 |  2 |
              |  2  4  4  −6 | −8 |

The dividing line does not count in the dimension of the matrix, and is simply a visual device to clarify that we are dealing with an augmented matrix giving both A and B for a system AX = B. If we just attached B as a last column without such an indicator, we might be dealing with a homogeneous system having 3 equations in 5 unknowns.
Continuing with these matrices for the moment, reduce A to find AR:

         | 1  0  0   1/3 |
    AR = | 0  1  0   −3  |
         | 0  0  1   4/3 |

Next, reduce [A | B] (ignore the dividing line in the row operations) to get

               | 1  0  0   1/3 | −16/3  |
    [A | B]R = | 0  1  0   −3  |  15/4  |
               | 0  0  1   4/3 | −37/12 |

Notice that

    [A | B]R = [AR | C]

for some column C. If we reduce the augmented matrix [A | B], we obtain in the first m columns the reduced form of A, together with some new last column. The reason for this can be seen by reviewing how we reduce a matrix. Perform elementary row operations, beginning with the left-most column containing a leading entry, and work from left to right through the columns of the matrix. In finding the reduced form of the augmented matrix [A | B], we deal with columns 1, …, m, which constitute A. The row operations used to reduce [A | B] will, of course, operate on the elements of the last column as well, eventually resulting in what we have called C. We will state this result as a theorem.
THEOREM 7.21
Let A be n × m and let B be n × 1. Then, for some n × 1 matrix C,

    [A | B]R = [AR | C]

The reason this result is important is that the original system AX = B and the reduced system AR X = C have the same solutions (as in the homogeneous case, because the elementary row operations do not change the solutions of the system). But because of the special form of AR, it is easy either to solve the system AR X = C by inspection or to see that there is no solution.
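Theorem 7.21 can be watched in action on the matrices above: reducing the augmented matrix reproduces AR in the first m columns and produces the column C. An illustrative sketch in exact arithmetic (the rref helper is our own, not from the text):

```python
from fractions import Fraction

def rref(rows):
    """Gauss-Jordan reduction to reduced row echelon form, in exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot = 0
    for col in range(len(m[0])):
        pr = next((r for r in range(pivot, len(m)) if m[r][col] != 0), None)
        if pr is None:
            continue
        m[pivot], m[pr] = m[pr], m[pivot]
        p = m[pivot][col]
        m[pivot] = [x / p for x in m[pivot]]
        for r in range(len(m)):
            if r != pivot and m[r][col] != 0:
                f = m[r][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[pivot])]
        pivot += 1
    return m

A = [[-3, 2, 6, 1], [0, 3, 3, -5], [2, 4, 4, -6]]
B = [5, 2, -8]

AR = rref(A)
augR = rref([row + [b] for row, b in zip(A, B)])

# the first m columns of [A | B]_R are exactly A_R ...
first_m = [row[:4] for row in augR]
# ... and the last column is what the text calls C
C = [row[4] for row in augR]
```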
EXAMPLE 7.33
Consider the system

    | −3   2   2 |       |  8 |
    |  1   4  −6 | X  =  |  1 |
    |  0  −2   2 |       | −2 |

We will reduce the augmented matrix

    [A | B] = | −3   2   2 |  8 |
              |  1   4  −6 |  1 |
              |  0  −2   2 | −2 |

One way to proceed is

    [A | B] → interchange rows 1 and 2 →
        |  1   4  −6 |  1 |
        | −3   2   2 |  8 |
        |  0  −2   2 | −2 |

    → add 3(row 1) to row 2 →
        |  1   4   −6 |  1 |
        |  0  14  −16 | 11 |
        |  0  −2    2 | −2 |

    → (1/14)(row 2) →
        |  1   4    −6 |     1 |
        |  0   1  −8/7 | 11/14 |
        |  0  −2     2 |    −2 |

    → add −4(row 2) to row 1, 2(row 2) to row 3 →
        |  1   0  −10/7 | −15/7 |
        |  0   1   −8/7 | 11/14 |
        |  0   0   −2/7 |  −3/7 |

    → −(7/2)(row 3) →
        |  1   0  −10/7 | −15/7 |
        |  0   1   −8/7 | 11/14 |
        |  0   0      1 |   3/2 |

    → add (10/7)(row 3) to row 1, (8/7)(row 3) to row 2 →
        |  1   0   0 |   0 |
        |  0   1   0 | 5/2 |
        |  0   0   1 | 3/2 |

As can be seen in this process, we actually arrived at AR in the first three columns, and whatever ends up in the last column is what we call C:

    [A | B]R = [AR | C]

Notice that the reduced augmented matrix is [I3 | C] and represents the reduced system I3 X = C. This is the system

    | 1  0  0 |       |   0 |
    | 0  1  0 | X  =  | 5/2 |
    | 0  0  1 |       | 3/2 |
which we solve by inspection to get x1 = 0, x2 = 5/2, x3 = 3/2. Thus reducing [A | B] immediately yields the solution

         |   0 |
    Up = | 5/2 |
         | 3/2 |

of the original system AX = B. Because AR = I3, Corollary 7.3 tells us that the homogeneous system AX = O has only the trivial solution, and therefore H = O in Theorem 7.20 and Up is the unique solution of AX = B.
EXAMPLE 7.34
The system

2x1 − 3x2 = 6
4x1 − 6x2 = 18

is inconsistent, as we saw in Example 7.31. We will put the fact that this system has no solution into the context of the current discussion. Write the augmented matrix

    [A | B] = | 2  −3 |  6 |
              | 4  −6 | 18 |

Reduce this matrix. We find that

    [A | B]R = | 1  −3/2 | 0 |
               | 0     0 | 1 |

From this we immediately read the reduced system AR X = C:

    | 1  −3/2 |       | 0 |
    | 0     0 | X  =  | 1 |

This system has the same solutions as the original system. But the second equation of the reduced system is

0x1 + 0x2 = 1

which has no solution. Therefore AX = B has no solution either.

In this example, the reduced system has an impossible equation because AR has a zero second row, while the second row of [A | B]R has a nonzero element in the augmented column. Whenever this happens, we obtain an equation having all zero coefficients of the unknowns, but equal to a nonzero number. In such a case the reduced system AR X = C, hence the original system AX = B, can have no solution. The key to recognizing when this will occur is that it happens when the rank of A (the number of nonzero rows of AR) is less than the rank of [A | B].
THEOREM 7.22
The nonhomogeneous system AX = B has a solution if and only if A and [A | B] have the same rank.

Proof   Let A be n × m. Suppose first that rank(A) = rank([A | B]) = r. By Theorems 7.12 and 7.14, the column space of [A | B] has dimension r. Certainly r cannot exceed the number of columns of A, so B, which is column m + 1 of [A | B], must be a linear combination of the first m columns of [A | B], which form A. This means that, for some numbers α1, …, αm,

           | a11 |      | a12 |              | a1m |
    B = α1 | a21 | + α2 | a22 | + · · · + αm | a2m |
           | ... |      | ... |              | ... |
           | an1 |      | an2 |              | anm |

        | α1 a11 + α2 a12 + · · · + αm a1m |       | α1 |
      = | α1 a21 + α2 a22 + · · · + αm a2m |  =  A | α2 |
        | ...                              |       | .. |
        | α1 an1 + α2 an2 + · · · + αm anm |       | αm |

But then the column with entries α1, …, αm is a solution of AX = B.

Conversely, suppose AX = B has a solution with entries α1, …, αm. Then

          | α1 |     | α1 a11 + α2 a12 + · · · + αm a1m |
    B = A | α2 |  =  | ...                              |
          | .. |     | α1 an1 + α2 an2 + · · · + αm anm |
          | αm |

           | a11 |      | a12 |              | a1m |
      = α1 | a21 | + α2 | a22 | + · · · + αm | a2m |
           | ... |      | ... |              | ... |
           | an1 |      | an2 |              | anm |

Then B is a linear combination of the columns of A, thought of as vectors in Rn. But then the column space of A is the same as the column space of [A | B], so

    rank(A) = dimension of the column space of A
            = dimension of the column space of [A | B] = rank([A | B])

and the proof is complete.
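Theorem 7.22 translates directly into a computational consistency test: compare rank(A) with rank([A | B]). A sketch (rank computed by row reduction, as in the text's definition; the function names are ours):

```python
from fractions import Fraction

def rank(rows):
    """Rank = number of pivots found while reducing the matrix."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot = 0
    for col in range(len(m[0])):
        pr = next((r for r in range(pivot, len(m)) if m[r][col] != 0), None)
        if pr is None:
            continue
        m[pivot], m[pr] = m[pr], m[pivot]
        p = m[pivot][col]
        m[pivot] = [x / p for x in m[pivot]]
        for r in range(len(m)):
            if r != pivot and m[r][col] != 0:
                f = m[r][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[pivot])]
        pivot += 1
    return pivot

def is_consistent(A, B):
    """AX = B has a solution iff rank(A) == rank([A | B])  (Theorem 7.22)."""
    return rank(A) == rank([row + [b] for row, b in zip(A, B)])

# the inconsistent system of Example 7.34 ...
bad = is_consistent([[2, -3], [4, -6]], [6, 18])
# ... and a consistent variant of it (second equation is twice the first)
good = is_consistent([[2, -3], [4, -6]], [6, 12])
```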
EXAMPLE 7.35
Solve the system

x1 − x2 + 2x3 = 3
−4x1 + x2 + 7x3 = −5
−2x1 − x2 + 11x3 = 14

The augmented matrix is

    [A | B] = |  1  −1   2 |  3 |
              | −4   1   7 | −5 |
              | −2  −1  11 | 14 |

When we reduce this matrix we obtain

    [A | B]R = [AR | C] = | 1  0  −3 | 0 |
                          | 0  1  −5 | 0 |
                          | 0  0   0 | 1 |

The first three columns of this reduced matrix make up AR. But rank(A) = 2 and rank([A | B]R) = 3, so this system has no solution. The last equation of the reduced system is

0x1 + 0x2 + 0x3 = 1

which can have no solution.
EXAMPLE 7.36
Solve

x1 − x3 + 2x4 + x5 + 6x6 = −3
x2 + x3 + 3x4 + 2x5 + 4x6 = 1
x1 − 4x2 + 3x3 + x4 + 2x6 = 0

The augmented matrix is

    [A | B] = | 1   0  −1  2  1  6 | −3 |
              | 0   1   1  3  2  4 |  1 |
              | 1  −4   3  1  0  2 |  0 |

Reduce this to get

    [A | B]R = | 1  0  0  27/8  15/8  60/8 | −17/8 |
               | 0  1  0  13/8   9/8  20/8 |   1/8 |
               | 0  0  1  11/8   7/8  12/8 |   7/8 |

The first six columns of this matrix form AR, and we read that rank(A) = 3 = rank([A | B]R). From [A | B]R, identify x1, x2, x3 as dependent and x4, x5, x6 as independent. The number of independent unknowns is m − rank(A) = 6 − 3 = 3, and this is the dimension of the solution space of AX = O. From the reduced augmented matrix, the first equation of the reduced system is

x1 + (27/8)x4 + (15/8)x5 + (60/8)x6 = −17/8

so

x1 = −(27/8)x4 − (15/8)x5 − (60/8)x6 − 17/8

We will not write out all of the equations of the reduced system. The point is that we can read directly from [A | B]R that

x2 = −(13/8)x4 − (9/8)x5 − (20/8)x6 + 1/8

and

x3 = −(11/8)x4 − (7/8)x5 − (12/8)x6 + 7/8

while x4, x5, x6 can be assigned any numerical values. We can write this solution as

        | −(27/8)x4 − (15/8)x5 − (60/8)x6 − 17/8 |
        | −(13/8)x4 − (9/8)x5 − (20/8)x6 + 1/8   |
    X = | −(11/8)x4 − (7/8)x5 − (12/8)x6 + 7/8   |
        | x4                                     |
        | x5                                     |
        | x6                                     |

If we let x4 = 8α, x5 = 8β, and x6 = 8γ, with α, β and γ any numbers, then the general solution is

          | −27 |     | −15 |     | −60 |   | −17/8 |
          | −13 |     |  −9 |     | −20 |   |   1/8 |
    X = α | −11 | + β |  −7 | + γ | −12 | + |   7/8 |
          |   8 |     |   0 |     |   0 |   |     0 |
          |   0 |     |   8 |     |   0 |   |     0 |
          |   0 |     |   0 |     |   8 |   |     0 |

This is in the form H + Up, with H the general solution of AX = O and Up a particular solution of AX = B.

Since the general solution is of the form X = H + Up, with H the general solution of AX = O, the only way AX = B can have a unique solution is if H = O; that is, the homogeneous system must have only the trivial solution. But, for a system with the same number of unknowns as equations, this can occur only if AR is the identity matrix.

THEOREM 7.23
Let A be n × n. Then the nonhomogeneous system AX = B has a unique solution if and only if AR = In . This, in turn, occurs exactly when rankA = n.
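Returning to Example 7.36 for a moment, the general solution found there can be spot-checked by substitution: A·Up should give B, and A should annihilate each of the three homogeneous vectors. A brief sketch:

```python
from fractions import Fraction

def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector, exactly."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 0, -1, 2, 1, 6],
     [0, 1, 1, 3, 2, 4],
     [1, -4, 3, 1, 0, 2]]
B = [-3, 1, 0]

Up = [Fraction(-17, 8), Fraction(1, 8), Fraction(7, 8), 0, 0, 0]
H1 = [-27, -13, -11, 8, 0, 0]
H2 = [-15, -9, -7, 0, 8, 0]
H3 = [-60, -20, -12, 0, 0, 8]

# A(Up) = B and A(Hi) = O, so Up + a*H1 + b*H2 + c*H3 solves AX = B for all a, b, c
ok_particular = matvec(A, Up) == B
ok_homogeneous = all(matvec(A, H) == [0, 0, 0] for H in (H1, H2, H3))
```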
EXAMPLE 7.37
Consider the system

    |  2  1  −11 |       | −6 |
    | −5  1    9 | X  =  | 12 |
    |  1  1   14 |       | −5 |

The augmented matrix is

    [A | B] = |  2  1  −11 | −6 |
              | −5  1    9 | 12 |
              |  1  1   14 | −5 |

and we find that

    [A | B]R = | 1  0  0 |  −86/31  |
               | 0  1  0 | −191/155 |
               | 0  0  1 | −11/155  |

The first three columns tell us that AR = I3. The homogeneous system AX = O has only the trivial solution. Then AX = B has a unique solution, which we read from [A | B]R:

        |  −86/31  |
    X = | −191/155 |
        | −11/155  |

Note that rank(A) = 3 and the dimension of the solution space of AX = O is n − rank(A) = 3 − 3 = 0, consistent with this solution space having no elements except the zero vector.
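For a square nonsingular system like this one, reducing [A | B] to [I | X] solves the system in one pass. The sketch below (our own illustrative code, which assumes A is nonsingular) reproduces the solution of Example 7.37:

```python
from fractions import Fraction

def solve_square(A, B):
    """Solve AX = B for square nonsingular A by reducing [A | B] to [I | X]."""
    n = len(A)
    m = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(A, B)]
    for col in range(n):
        pr = next(r for r in range(col, n) if m[r][col] != 0)  # pivot search
        m[col], m[pr] = m[pr], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [row[n] for row in m]

X = solve_square([[2, 1, -11], [-5, 1, 9], [1, 1, 14]], [-6, 12, -5])
```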
SECTION 7.7   PROBLEMS

In each of Problems 1 through 14, find the general solution of the system or show that the system has no solution.

1. 3x1 − 2x2 + x3 = 6
   x1 + 10x2 − x3 = 2
   −3x1 − 2x2 + x3 = 0

2. x2 + x3 − 6x4 + x6 = −9
   2x1 − 3x2 + x4 = 16
   3x2 − 4x4 = 10
   4x1 − 2x2 + 3x3 + 10x4 = 1

3. 2x1 − 3x2 = 1
   −x1 + 3x2 = 0

4. x1 − 4x2 = 3

5. x1 − x2 + x6 = 0
   2x1 − 3x2 + x4 − x6 = 0
   x1 − 3x2 + 4x5 − x6 = 8
   x1 − 3x4 = 8

6. 2x1 − 3x2 + x4 = 1
   3x1 − 2x3 + x5 = 1
   3x2 + x3 − x4 = 0
   x2 − x4 + 6x6 = 3
   2x1 − 3x2 + 10x3 = 0
7. 8x2 − 4x3 + 10x6 = 1
   x3 + x5 − x6 = 2
   x4 − 3x5 + 2x6 = 0

8. 2x1 − 3x3 = 1
   −6x1 + 16x2 − 11x3 = 1
   x1 − x2 + x3 = 1

9. 4x1 + 3x2 = 4
   7x1 − 3x2 + 4x3 = −7
   −2x1 + x2 + 7x3 = 4

10. 2x1 + x2 − x3 + 4x4 = 6
    x2 − 3x4 = −5

11. 4x1 − x2 + 4x3 = 1
    x1 + x2 − 5x3 = 0

12. −4x1 + 5x2 − 6x3 = 2
    2x1 − 6x2 + x3 = −5

13. 2x1 − 4x2 + x3 = 2
    14x3 − 3x5 + x7 = 2
    x1 + x2 + x3 − x4 + x6 = −4
    3x1 − 2x2 = −1

14. −6x1 + 2x2 − x3 + x4 = 0
    x1 + 4x2 − x4 = −5
    x1 + x2 + x3 − 7x4 = 0

15. Let A be an n × m matrix with rank r. Prove that the reduced system AR X = B has a solution if and only if br+1 = · · · = bn = 0.

7.8
Matrix Inverses
DEFINITION 7.15
Matrix Inverse
Let A be an n × n matrix. Then B is an inverse of A if AB = BA = In
In this definition B must also be n × n because both AB and BA must be defined. Further, if B is an inverse of A, then A is also an inverse of B. It is easy to find nonzero square matrices that have no inverse. For example, let

    A = | 1  0 |
        | 2  0 |

If B is an inverse of A, say

    B = | a  b |
        | c  d |

then we must have

    AB = | 1  0 | | a  b |  =  |  a   b |  =  | 1  0 |
         | 2  0 | | c  d |     | 2a  2b |     | 0  1 |

But then

a = 1, b = 0, 2a = 0 and 2b = 1

which are impossible conditions. On the other hand, some matrices do have inverses. For example,

    | 2  1 | |  4/7  −1/7 |  =  |  4/7  −1/7 | | 2  1 |  =  | 1  0 |
    | 1  4 | | −1/7   2/7 |     | −1/7   2/7 | | 1  4 |     | 0  1 |
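Both calculations above can be mechanized: adjoin an identity block to A, reduce, and see whether the A-block reaches the identity. This is an illustrative sketch that anticipates the method of Section 7.8.1 (the helper name is ours):

```python
from fractions import Fraction

def inverse(A):
    """Invert A by reducing [A | I] to [I | A^-1]; return None if A is singular."""
    n = len(A)
    m = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pr = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pr is None:
            return None                      # no pivot available: A is singular
        m[col], m[pr] = m[pr], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [row[n:] for row in m]

no_inv = inverse([[1, 0], [2, 0]])           # the singular example above
inv = inverse([[2, 1], [1, 4]])              # the nonsingular example above
```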
DEFINITION 7.16
Nonsingular and Singular Matrices
A square matrix is said to be nonsingular if it has an inverse. If it has no inverse, the matrix is called singular.
If a matrix has an inverse, then it can have only one. THEOREM 7.24
Uniqueness of Inverses
Let B and C be inverses of A. Then B = C. Proof
Write

B = BIn = B(AC) = (BA)C = In C = C
In view of this we will denote the inverse of A as A−1 . Here are properties of inverse matrices. In proving parts of the theorem, we repeatedly employ the strategy that, if AB = BA = In , then B must be the inverse of A. THEOREM 7.25
1. In is nonsingular and In−1 = In.
2. If A and B are nonsingular n × n matrices, then AB is nonsingular and (AB)−1 = B−1A−1.
3. If A is nonsingular, so is A−1, and (A−1)−1 = A.
4. If A is nonsingular, so is At, and (At)−1 = (A−1)t.
5. If A and B are n × n and either is singular, then AB and BA are both singular.

Proof   For (2), compute

(AB)(B−1A−1) = A(BB−1)A−1 = AA−1 = In

Similarly, (B−1A−1)(AB) = In. Therefore (AB)−1 = B−1A−1.

For (4), use Theorem 7.3(3) to write

At(A−1)t = (A−1A)t = Int = In

Similarly,

(A−1)tAt = (AA−1)t = In

Therefore (At)−1 = (A−1)t. We will be able to give a very short proof of (5) when we have developed determinants.
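Property (2) is easy to sanity-check numerically. The sketch below uses an illustrative Gauss–Jordan inverse and a second matrix B of our own choosing (not from the text):

```python
from fractions import Fraction

def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse(A):
    """Gauss-Jordan inversion of a nonsingular matrix, in exact arithmetic."""
    n = len(A)
    m = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pr = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pr] = m[pr], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [row[n:] for row in m]

A = [[2, 1], [1, 4]]
B = [[1, 3], [2, 1]]
lhs = inverse(matmul(A, B))                  # (AB)^(-1)
rhs = matmul(inverse(B), inverse(A))         # B^(-1) A^(-1)
```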
THEOREM 7.26
An n × n matrix A is nonsingular if and only if AR = In. Alternatively, an n × n matrix is nonsingular if and only if its rank is n.

Proof   The proof consists of understanding the relationship between a matrix having an inverse and its reduced form being the identity matrix. The key lies in noticing that we can form the columns of a matrix product AB by multiplying, in turn, A by each column of B:

column j of AB = A (column j of B)

We will build an inverse for A a column at a time. To have AB = In, we must be able to choose the columns of B so that

A (column j of B) = column j of In        (7.4)

with a 1 in the jth place and zeros elsewhere.

Suppose now that AR = In. Then, by Theorem 7.23, the system (7.4) has a unique solution for each j = 1, …, n. These solutions form the columns of a matrix B such that AB = In, and then B = A−1. (Actually we must also show that BA = In, but we leave this as an exercise.) Conversely, suppose A is nonsingular. Then system (7.4) has a unique solution for j = 1, …, n, because these solutions are the columns of A−1. Then, by Theorem 7.23, AR = In.
7.8.1
A Method for Finding A−1
We know some computational rules for working with matrix inverses, as well as a criterion for a matrix to have an inverse. Now we want an efficient way of computing A−1 from A. Theorem 7.26 suggests a strategy. We know that, in any event, there is an n × n matrix Ω such that ΩA = AR, where Ω is a product of elementary matrices representing the elementary row operations used to reduce A. Previously we found Ω by adjoining In to the left of A to form an n × 2n matrix [In | A]. Reduce A, performing the elementary row operations on all of [In | A], to eventually arrive at [Ω | AR]. This produces Ω such that ΩA = AR. If AR = In, then Ω = A−1. If AR ≠ In, then A has no inverse.
EXAMPLE 7.38
Let

    A = | 5  −1 |
        | 6   8 |

We want to know if A is nonsingular and, if it is, produce its inverse. Form

    [I2 | A] = | 1  0 | 5  −1 |
               | 0  1 | 6   8 |

Reduce A (the last two columns), carrying out the same operations on the first two columns:

    [I2 | A] → (1/5)(row 1) →
        | 1/5  0 | 1  −1/5 |
        |   0  1 | 6     8 |

    → −6(row 1) + (row 2) →
        |  1/5  0 | 1  −1/5 |
        | −6/5  1 | 0  46/5 |

    → (5/46)(row 2) →
        |   1/5     0 | 1  −1/5 |
        | −6/46  5/46 | 0     1 |

    → (1/5)(row 2) + (row 1) →
        |  8/46  1/46 | 1  0 |
        | −6/46  5/46 | 0  1 |

In the last two columns we read AR = I2. This means that A is nonsingular. From the first two columns,

    A−1 = (1/46) |  8  1 |
                 | −6  5 |
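The [In | A] reduction of this example can be scripted directly. The following sketch (our own code, not the text's) reduces the A-block and returns the left block as A−1 exactly when the A-block reaches the identity:

```python
from fractions import Fraction

def inverse_by_reduction(A):
    """Reduce [I | A]; if A reduces to I, the left block is A^-1 (else None)."""
    n = len(A)
    m = [[Fraction(int(i == j)) for j in range(n)] + [Fraction(x) for x in row]
         for i, row in enumerate(A)]
    pivot = 0
    for col in range(n, 2 * n):                 # reduce the A-block only
        pr = next((r for r in range(pivot, n) if m[r][col] != 0), None)
        if pr is None:
            continue
        m[pivot], m[pr] = m[pr], m[pivot]
        p = m[pivot][col]
        m[pivot] = [x / p for x in m[pivot]]
        for r in range(n):
            if r != pivot and m[r][col] != 0:
                f = m[r][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[pivot])]
        pivot += 1
    A_reduced = [row[n:] for row in m]
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    return [row[:n] for row in m] if A_reduced == identity else None

inv = inverse_by_reduction([[5, -1], [6, 8]])
```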
EXAMPLE 7.39
Let

    A = | −3   21 |
        |  4  −28 |

Perform a reduction:

    [I2 | A] = | 1  0 | −3   21 |
               | 0  1 |  4  −28 |

    → −(1/3)(row 1) →
        | −1/3  0 | 1   −7 |
        |    0  1 | 4  −28 |

    → −4(row 1) + (row 2) →
        | −1/3  0 | 1  −7 |
        |  4/3  1 | 0   0 |

We read AR from the last two columns, which form a 2 × 2 reduced matrix. Since this is not I2, A is singular and has no inverse.

Here is how inverses relate to the solution of systems of linear equations in which the number of unknowns equals the number of equations.
THEOREM 7.27
Let A be an n × n matrix.
1. A homogeneous system AX = O has a nontrivial solution if and only if A is singular.
2. A nonhomogeneous system AX = B has a unique solution if and only if A is nonsingular. In this case the solution is X = A−1B.

Proof   For a homogeneous system AX = O, if A were nonsingular then we could multiply the equation on the left by A−1 to get

X = A−1 O = O

Thus in the nonsingular case, a homogeneous system can have only the trivial solution. In the singular case, we know that rank(A) < n, so the solution space has positive dimension n − rank(A) and therefore contains nontrivial solutions.

For a nonhomogeneous system, if A is nonsingular we can multiply the equation AX = B on the left by A−1 to get the unique solution

X = A−1 B

However, if A is singular, then rank(A) < n, so the solution space of AX = O has positive dimension. By Theorem 7.20, the system AX = B then has either no solution or infinitely many, and in neither case a unique one.
EXAMPLE 7.40
Consider the nonhomogeneous system

2x1 − x2 + 3x3 = 4
x1 + 9x2 − 2x3 = −8
4x1 − 8x2 + 11x3 = 15

The matrix of coefficients is

    A = | 2  −1   3 |
        | 1   9  −2 |
        | 4  −8  11 |

and we find that

    A−1 = (1/53) |  83  −13  −25 |
                 | −19   10    7 |
                 | −44   12   19 |

The unique solution of this system is

    X = A−1 B = (1/53) |  83  −13  −25 | |  4 |     |  61/53 |
                       | −19   10    7 | | −8 |  =  | −51/53 |
                       | −44   12   19 | | 15 |     |  13/53 |
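The arithmetic of this example can be checked exactly: multiply the stated A−1 by B and substitute the result back into the original system. A brief sketch:

```python
from fractions import Fraction

def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# A^-1 as computed in Example 7.40 (entries are 1/53 times the integers shown)
A_inv = [[Fraction(v, 53) for v in row]
         for row in [[83, -13, -25], [-19, 10, 7], [-44, 12, 19]]]
B = [4, -8, 15]

X = matvec(A_inv, B)        # unique solution X = A^-1 B

# substituting back into the original system confirms the solution
A = [[2, -1, 3], [1, 9, -2], [4, -8, 11]]
residual = matvec(A, X)
```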
SECTION 7.8   PROBLEMS

In each of Problems 1 through 10, find the inverse of the matrix or show that the matrix is singular.

1. | −1  2 |
   |  2  1 |

2. | 12  3 |
   |  4  1 |

3. | −5  2 |
   |  1  2 |

4. | −1  0 |
   |  4  4 |

5. | 6  2 |
   | 3  3 |

6. | 1   1  −3 |
   | 2  16   1 |
   | 0   0   4 |

7. | −3  4  1 |
   |  1  2  0 |
   |  1  1  3 |

8. | −2  1  −5 |
   |  1  1   4 |
   |  0  3   3 |

9. | −2  1  1 |
   |  0  1  1 |
   | −3  0  6 |

10. | 12  1  14 |
    | −3  2   0 |
    |  0  9  14 |

In each of Problems 11 through 15, find the unique solution of the system, using Theorem 7.27(2).

11. x1 − x2 + 3x3 − x4 = 1
    x2 − 3x3 + 5x4 = 2
    x1 − x3 + x4 = 0
    x1 + 2x3 − x4 = −5

12. 8x1 − x2 − x3 = 4
    x1 + 2x2 − 3x3 = 0
    2x1 − x2 + 4x3 = 5

13. 2x1 − 6x2 + 3x3 = −4
    −x1 + x2 + x3 = 5
    2x1 + 6x2 − 5x3 = 8

14. 12x1 + x2 − 3x3 = 4
    x1 − x2 + 3x3 = −5
    −2x1 + x2 + x3 = 0

15. 4x1 + 6x2 − 3x3 = 0
    2x1 + 3x2 − 4x3 = 0
    x1 − x2 + 3x3 = −7

16. Let A be nonsingular. Prove that, for any positive integer k, Ak is nonsingular, and (Ak)−1 = (A−1)k.

17. Let A, B and C be n × n real matrices. Suppose BA = AC = In. Prove that B = C.
CHAPTER
8
PERMUTATIONS DEFINITION OF THE DETERMINANT PROPERTIES EVALUATION OF DETERMINANTS ELEMENTARY ROW AND COLUMN OPERATIONS COFACTOR EXPANSIONS DETERMINANTS OF TRIANGULAR
Determinants
If A is a square matrix, the determinant of A is a sum of products of elements of A, formed according to a procedure we will now describe. First we need some information about permutations.
8.1
Permutations

If n is a positive integer, a permutation of order n is an arrangement of the integers 1, …, n in any order. For example, suppose p is a permutation that reorders the integers 1, …, 6 as

3 1 4 5 2 6

Then

p(1) = 3, p(2) = 1, p(3) = 4, p(4) = 5, p(5) = 2, p(6) = 6

with p(j) the number the permutation has put in place j. For small n it is possible to list all permutations on 1, …, n. Here is a short list. For n = 2 there are two permutations on the integers 1, 2, one leaving them in place and the second interchanging them:

1 2        2 1
For n = 3 there are six permutations on 1, 2, 3, and they are

1 2 3    1 3 2    2 1 3    2 3 1    3 1 2    3 2 1

For n = 4 there are twenty-four permutations on 1, 2, 3, 4:

1 2 3 4    1 2 4 3    1 3 2 4    1 3 4 2    1 4 2 3    1 4 3 2
2 1 3 4    2 1 4 3    2 3 1 4    2 3 4 1    2 4 1 3    2 4 3 1
3 1 2 4    3 1 4 2    3 2 1 4    3 2 4 1    3 4 1 2    3 4 2 1
4 1 2 3    4 1 3 2    4 2 1 3    4 2 3 1    4 3 1 2    4 3 2 1

An examination of this list of permutations suggests a systematic approach by which they were all listed, and such an approach will work in theory for higher n. However, we can also observe that the number of permutations on 1, …, n increases rapidly with n. There are n! = 1 · 2 · · · · · n permutations on 1, …, n. This fact is not difficult to derive. Imagine a row of n boxes, and start putting the integers from 1 to n into the boxes, one to each box. There are n choices for a number to put into the first box, n − 1 choices for the second, n − 2 for the third, and so on until there is only one left to put in the last box. There is a total of n(n − 1)(n − 2) · · · 1 = n! ways to do this, hence n! permutations on n objects.

A permutation is characterized as even or odd, according to a rule we will now illustrate. Consider the permutation 2 5 1 4 3 on the integers 1, …, 5. For each number k in the list, count the number of integers to its right that are smaller than k. In this way form a list:

    k    number of integers smaller than k to the right of k
    2    1
    5    3
    1    0
    4    1
    3    0

Sum the integers in the right column to get 5, which is odd. We therefore call this permutation odd. As an example of an even permutation, consider 2 1 5 4 3.
Now the list is

    k    number of integers smaller than k to the right of k
    2    1
    1    0
    5    2
    4    1
    3    0

and the integers in the right column sum to 4, an even number. This permutation is even. If p is a permutation, let

    sgn(p) = 0 if p is even
             1 if p is odd
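The counting rule just illustrated is simple to transcribe; the sketch below (our own helper, with permutations written as tuples) returns sgn(p) as defined above:

```python
def sgn(perm):
    """0 if the permutation (a tuple of 1..n) is even, 1 if it is odd."""
    # count, for each entry, the smaller entries to its right, then take mod 2
    inversions = sum(1
                     for i, k in enumerate(perm)
                     for later in perm[i + 1:]
                     if later < k)
    return inversions % 2

odd_example = sgn((2, 5, 1, 4, 3))    # the count above sums to 5, so odd
even_example = sgn((2, 1, 5, 4, 3))   # the count above sums to 4, so even
```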
SECTION 8.1   PROBLEMS

1. The six permutations of 1, 2, 3 are given in the discussion. Which of these permutations are even and which are odd?

2. The 24 permutations of 1, 2, 3, 4 are given in the discussion. Which of these are even and which are odd?

3. Show that half of the permutations on 1, 2, …, n are even, and the other half are odd.
8.2
Definition of the Determinant

Let A = [aij] be an n × n matrix, with numbers or functions as elements.

DEFINITION 8.1

The determinant of A, denoted det(A), is the sum of all products

(−1)^sgn(p) a1p(1) a2p(2) · · · anp(n)

taken over all permutations p on 1, …, n. This sum is denoted

det(A) = Σp (−1)^sgn(p) a1p(1) a2p(2) · · · anp(n)        (8.1)

Each term in the defining sum (8.1) contains exactly one element from each row and from each column, chosen according to the indices j, p(j) determined by the permutation. Each product in the sum is multiplied by 1 if the permutation p is even, and by −1 if p is odd.
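Definition 8.1 can be transcribed almost verbatim into code. The brute-force sketch below sums over all n! permutations and, as the text notes shortly, is only practical for small n (helper names are ours):

```python
from itertools import permutations
from math import prod

def sgn(perm):
    """0 for an even permutation, 1 for an odd one (inversion count mod 2)."""
    return sum(1 for i, k in enumerate(perm)
               for later in perm[i + 1:] if later < k) % 2

def det(A):
    """Determinant straight from Definition 8.1: a signed sum over n! permutations."""
    n = len(A)
    return sum((-1) ** sgn(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

d2 = det([[1, 2], [3, 4]])                    # a11 a22 - a12 a21 = 4 - 6 = -2
d3 = det([[1, 6, 0], [1, 2, -1], [0, 1, 1]])
```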
Since there are n! permutations on 1, …, n, this sum involves n! terms and is therefore quite daunting for, say, n ≥ 4. We will examine the small cases n = 2 and n = 3 and then look for ways of evaluating det(A) for larger n.

In the case n = 2,

    A = | a11  a12 |
        | a21  a22 |

We have seen that there are 2 permutations on 1, 2, namely p: 1 2, which is an even permutation, and q: 2 1, which is odd. Then

det(A) = (−1)^sgn(p) a1p(1) a2p(2) + (−1)^sgn(q) a1q(1) a2q(2)
       = (−1)^0 a11 a22 + (−1)^1 a12 a21 = a11 a22 − a12 a21

This rule for evaluating det(A) holds for any 2 × 2 matrix. In the case n = 3,

        | a11  a12  a13 |
    A = | a21  a22  a23 |
        | a31  a32  a33 |

The permutations of 1, 2, 3 are

p1: 1 2 3,  p2: 1 3 2,  p3: 2 1 3,  p4: 2 3 1,  p5: 3 1 2,  p6: 3 2 1

It is routine to check that p1, p4 and p5 are even, and p2, p3 and p6 are odd. Then

det(A) = (−1)^sgn(p1) a1p1(1) a2p1(2) a3p1(3) + · · · + (−1)^sgn(p6) a1p6(1) a2p6(2) a3p6(3)
       = a11 a22 a33 − a11 a23 a32 − a12 a21 a33 + a12 a23 a31 + a13 a21 a32 − a13 a22 a31

If A is 4 × 4, then evaluation of det(A) by direct recourse to the definition will involve 24 terms, as well as explicitly listing all 24 permutations on 1, 2, 3, 4. This is not practical. We will therefore develop some properties of determinants which will make their evaluation more efficient.
SECTION 8.2   PROBLEMS

In Problems 1 through 4, use the formula for det(A) in the 3 × 3 case to evaluate the determinant of the given matrix.

1. A = | 1  6   0 |
       | 1  2  −1 |
       | 0  1   1 |

2. A = | −1  3  1 |
       |  2  2  0 |
       |  1  1  4 |

3. A = | 6  −3   5 |
       | 2   1   4 |
       | 0   1  −4 |

4. A = | −4  0  1 |
       |  0  1  1 |
       |  0  0  0 |

5. The permutations on 1, 2, 3, 4 were listed in Section 8.1. Use this list to write a formula for det(A) when A is 4 × 4.

8.3
Properties of Determinants

We will develop some of the properties of determinants that are used in evaluating them and in deriving further results. There are effective computer routines for evaluating quite large determinants, and these too are based on the properties we will display. First, it is standard to use vertical lines to denote determinants, so we will often write

det(A) = |A|

This should not be confused with absolute value. If A has numerical elements, then |A| is a number and can be positive, negative or zero. Throughout the rest of this chapter, let A and B be n × n matrices. Our first result says that a matrix having a zero row has a zero determinant.

THEOREM 8.1

If A has a zero row, then |A| = 0.

This is easy to see from the defining sum (8.1). Suppose, for some i, each aij = 0. Each term of the sum (8.1) contains a factor aip(i) from row i, hence each term in the sum is zero.

Next, we claim that multiplying a row of a matrix by a scalar α has the effect of multiplying the determinant of the matrix by α.

THEOREM 8.2
Let B be formed from A by multiplying row k of A by a scalar α. Then

|B| = α|A|

The effect of multiplying row k of A by α is to replace each akj by α akj. Then bij = aij for i ≠ k, and bkj = α akj, so

|A| = Σp (−1)^sgn(p) a1p(1) a2p(2) · · · akp(k) · · · anp(n)

and

|B| = Σp (−1)^sgn(p) b1p(1) b2p(2) · · · bkp(k) · · · bnp(n)
    = Σp (−1)^sgn(p) a1p(1) a2p(2) · · · (α akp(k)) · · · anp(n)
    = α Σp (−1)^sgn(p) a1p(1) a2p(2) · · · akp(k) · · · anp(n) = α|A|
The next result states that the interchange of two rows of a matrix changes the sign of the determinant.

THEOREM 8.3

Let B be formed from A by interchanging two rows. Then

|B| = −|A|

A proof of this involves a close examination of the effect of a row interchange on the terms of the sum (8.1), and we will not go through these details. The result is easy to see in the case of 2 × 2 determinants. Let

    A = | a11  a12 |   and   B = | a21  a22 |
        | a21  a22 |             | a11  a12 |

Then

|A| = a11 a22 − a12 a21   and   |B| = a21 a12 − a22 a11 = −|A|

This result has two important consequences. The first is that the determinant of a matrix with two identical rows must be zero.

COROLLARY 8.1

If two rows of A are the same, then |A| = 0.

The reason for this is that, if we form B from A by interchanging the identical rows, then B = A, so |B| = |A|. But by Theorem 8.3, |B| = −|A|, so |A| = 0.

COROLLARY 8.2

If, for some scalar α, row k of A is α times row i, then |A| = 0.

To see this, consider two cases. First, if α = 0, then row k of A is a zero row, so |A| = 0. If α ≠ 0, then we can multiply row k of A by 1/α to obtain a matrix B having rows i and k the same. Then |B| = 0. But |B| = (1/α)|A| by Theorem 8.2, so |A| = 0. Next, we claim that the determinant of a product is the product of the determinants.
THEOREM 8.4
Let A and B be n × n matrices. Then |AB| = |A| |B|.

Obviously this extends to a product involving any finite number of n × n matrices. The theorem enables us to evaluate the determinant of such a product without carrying out the matrix multiplications of all the factors. We will illustrate the theorem when we have efficient ways of evaluating determinants. The following theorem gives the determinant of a matrix that is written as a sum of matrices in a special way.
THEOREM 8.5
Suppose each element of row k of A is written as a sum αkj + βkj. Form two matrices from A. The first, A1, is identical to A except that the elements of row k are the αkj. The second, A2, is identical to A except that the elements of row k are the βkj. Then

|A| = |A1| + |A2|

If we display the elements of these matrices, the conclusion states that

| a11        · · ·  a1j        · · ·  a1n       |
| · · ·                                         |
| αk1 + βk1  · · ·  αkj + βkj  · · ·  αkn + βkn |
| · · ·                                         |
| an1        · · ·  anj        · · ·  ann       |

    | a11  · · ·  a1j  · · ·  a1n |     | a11  · · ·  a1j  · · ·  a1n |
    | · · ·                       |     | · · ·                       |
 =  | αk1  · · ·  αkj  · · ·  αkn |  +  | βk1  · · ·  βkj  · · ·  βkn |
    | · · ·                       |     | · · ·                       |
    | an1  · · ·  anj  · · ·  ann |     | an1  · · ·  anj  · · ·  ann |

This result can be seen by examining the terms of (8.1) for each of these determinants:

|A| = Σp (−1)^sgn(p) a1p(1) a2p(2) · · · (αkp(k) + βkp(k)) · · · anp(n)
    = Σp (−1)^sgn(p) a1p(1) a2p(2) · · · αkp(k) · · · anp(n)
      + Σp (−1)^sgn(p) a1p(1) a2p(2) · · · βkp(k) · · · anp(n)
    = |A1| + |A2|

As a corollary to this, adding a scalar multiple of one row to another row of a matrix does not change the value of the determinant.
COROLLARY 8.3

Let B be formed from A by adding γ times row i to row k. Then |B| = |A|.

This result follows immediately from the preceding theorem by noting that each element of row k of B is γ aij + akj. Splitting row k as in Theorem 8.5,

          | a11         · · ·  a1n         |
          | · · ·                          |
          | ai1         · · ·  ain         |
    |B| = | · · ·                          |
          | γ ai1 + ak1 · · ·  γ ain + akn |
          | · · ·                          |
          | an1         · · ·  ann         |

          | a11  · · ·  a1n |       | a11  · · ·  a1n |
          | · · ·           |       | · · ·           |
          | ai1  · · ·  ain |       | ai1  · · ·  ain |
      = γ | · · ·           |   +   | · · ·           |
          | ai1  · · ·  ain |       | ak1  · · ·  akn |
          | · · ·           |       | · · ·           |
          | an1  · · ·  ann |       | an1  · · ·  ann |

where the rows displayed are rows 1, i, k and n.
In the last line, the first term is γ times a determinant with rows i and k identical, hence is zero. The second term is just |A|. We now know the effect of elementary row operations on a determinant. In summary:

Type I operation—interchange of two rows. This changes the sign of the determinant.

Type II operation—multiplication of a row by a scalar α. This multiplies the determinant by α.

Type III operation—addition of a scalar multiple of one row to another row. This does not change the determinant.

Recall that the transpose Aᵗ of a matrix A is obtained by writing the rows of A as the columns of Aᵗ. We claim that a matrix and its transpose have the same determinant.
THEOREM 8.6
|A| = |Aᵗ|.

For example, consider the 2 × 2 case:

    A = [ a11  a12 ]        Aᵗ = [ a11  a21 ]
        [ a21  a22 ]             [ a12  a22 ]

Then

    |A| = a11 a22 − a12 a21   and   |Aᵗ| = a11 a22 − a21 a12 = |A|.
A proof of this theorem consists of comparing terms of the determinants. If A = [a_ij], then Aᵗ = [a_ji]. Now, from the defining sum (8.1),

    |A| = Σ_p (−1)^{sgn p} a_{1p1} a_{2p2} ··· a_{npn}

and

    |Aᵗ| = Σ_p (−1)^{sgn p} (Aᵗ)_{1p1} (Aᵗ)_{2p2} ··· (Aᵗ)_{npn}
         = Σ_q (−1)^{sgn q} a_{q1,1} a_{q2,2} ··· a_{qn,n}.

One can show that each term (−1)^{sgn p} a_{1p1} a_{2p2} ··· a_{npn} in the sum for |A| is equal to a corresponding term (−1)^{sgn q} a_{q1,1} a_{q2,2} ··· a_{qn,n} in the sum for |Aᵗ|. The key is to realize that, because q is a permutation of 1, …, n, we can rearrange the factors in the latter product to write them in increasing order of the first (row) index. This induces a permutation on the second (column) index, and we can match this term up with a corresponding term in the sum for |A|. We will not elaborate the details of this argument.

One consequence of this result is that we can perform not only elementary row operations on a matrix, but also the corresponding elementary column operations, and we know the effect of each operation on the determinant. In particular, from the column perspective: if two columns of A are identical, or if one column is a zero column, then |A| = 0. Interchange of two columns of A changes the sign of the determinant. Multiplication of a column by a scalar α multiplies the determinant by α. And addition of a scalar multiple of one column to another column does not change the determinant. These operations on rows and columns of a matrix, and their effect on the determinant of the newly formed matrix, form the basis for strategies to evaluate determinants.
SECTION 8.3
PROBLEMS
1. Let A = [a_ij] be an n × n matrix and let α be any scalar. Let B = [αa_ij]. Thus B is formed by multiplying each element of A by α. Prove that |B| = αⁿ|A|.

2. Let A = [a_ij] be an n × n matrix. Let α be a nonzero number. Form a new matrix B = [α^{i−j} a_ij]. How are |A| and |B| related? Hint: It is useful to examine the 2 × 2 and 3 × 3 cases to get some idea of what B looks like.

3. An n × n matrix A is skew-symmetric if A = −Aᵗ. Prove that the determinant of a skew-symmetric matrix of odd order is zero.

8.4
Evaluation of Determinants by Elementary Row and Column Operations

The use of elementary row and column operations to evaluate a determinant is predicated upon the following observation. If a row or column of an n × n matrix A has all zero elements except possibly for a_ij in row i and column j, then the determinant of A is (−1)^{i+j} a_ij times the determinant of the (n − 1) × (n − 1) matrix obtained by deleting row i and column j from A.
This reduces the problem of evaluating an n × n determinant to one of evaluating a smaller determinant, having one less row and one less column. Here is a statement of this result, with (1) the row version and (2) the column version.

THEOREM 8.7

1. Row Version. Suppose every element of row i of A is zero, except possibly for a_ij. Then |A| equals (−1)^{i+j} a_ij times the determinant of the (n − 1) × (n − 1) matrix obtained by deleting row i and column j from A.

2. Column Version. Suppose every element of column j of A is zero, except possibly for a_ij. Then |A| is given by the same formula: (−1)^{i+j} a_ij times the determinant of the matrix obtained by deleting row i and column j from A.
This result suggests one strategy for evaluating a determinant. Given an n × n matrix A, use row and/or column operations to obtain a new matrix B having at most one nonzero element in some row or column. Then |A| is a scalar multiple of |B|, and |B| is a scalar multiple of the (n − 1) × (n − 1) determinant formed by deleting from B the row and column containing this nonzero element. We can then repeat this strategy on this (n − 1) × (n − 1) matrix, eventually reducing the problem to one of evaluating a "small" determinant. Here is an illustration of this process.
EXAMPLE 8.1
Let
    A = [ −6   0   1   3  2 ]
        [ −1   5   0   1  7 ]
        [  8   3   2   1  7 ]
        [  0   1   5  −3  2 ]
        [  1  15  −3   9  4 ]

We want to evaluate |A|. There are many ways to proceed with the strategy we are illustrating. To begin, we can exploit the fact that a13 = 1 and use elementary row operations to get zeros in the rest of column 3. Of course a23 = 0 to begin with, so we need only worry about the column 3 entries in rows 3, 4, 5. Add −2(row 1) to row 3, −5(row 1) to row 4, and 3(row 1) to row 5 to get

    B = [ −6   0  1    3   2 ]
        [ −1   5  0    1   7 ]
        [ 20   3  0   −5   3 ]
        [ 30   1  0  −18  −8 ]
        [−17  15  0   18  10 ]

Because we have used Type III row operations, |A| = |B|. Further, by Theorem 8.7,

    |B| = (−1)^{1+3} b13 |C| = (1)|C| = |C|

where C is the 4 × 4 matrix obtained by deleting row 1 and column 3 of B:

    C = [ −1   5    1   7 ]
        [ 20   3   −5   3 ]
        [ 30   1  −18  −8 ]
        [−17  15   18  10 ]

This is a 4 × 4 matrix, "smaller" than A. We will now apply the strategy to C. We can, for example, exploit the −1 entry in the (1, 1) position of C, this time using column operations to get zeros in row 1, columns 2, 3, 4 of the new matrix. Specifically, add 5(column 1) to column 2, add column 1 to column 3, and add 7(column 1) to column 4 of C to get

    D = [ −1    0   0     0 ]
        [ 20  103  15   143 ]
        [ 30  151  12   202 ]
        [−17  −70   1  −109 ]

Again, because we used Type III operations (this time on columns) on C, |C| = |D|. But by the theorem, because we are using the element d11 = −1 as the single nonzero element of row 1, we have

    |D| = (−1)^{1+1} d11 |E| = −|E|
in which E is the 3 × 3 matrix obtained by deleting row 1 and column 1 from D:

    E = [ 103  15   143 ]
        [ 151  12   202 ]
        [ −70   1  −109 ]

To evaluate |E|, we can exploit the entry e31 = 1. Add −15(row 3) to row 1 and −12(row 3) to row 2 to get

    F = [ 1153  0  1778 ]
        [  991  0  1510 ]
        [  −70  1  −109 ]

Then |E| = |F|. By the theorem, using the only nonzero element f32 = 1 of column 2 of F, we have

    |F| = (−1)^{3+2} (1)|G| = −|G|

in which G is the 2 × 2 matrix obtained by deleting row 3 and column 2 of F:

    G = [ 1153  1778 ]
        [  991  1510 ]

At the 2 × 2 stage, we evaluate the determinant directly:

    |G| = (1153)(1510) − (1778)(991) = −20,968.

Working back, we now have

    |A| = |B| = |C| = |D| = −|E| = −|F| = |G| = −20,968.

The method is actually quicker to apply than might appear from this example, because we included comments as we proceeded with the calculations.
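The reduction strategy of this example can be mechanized. The following is a sketch of ours (the helper name det_by_row_ops is invented; Fraction keeps the arithmetic exact): it reduces the matrix to upper triangular form using only Type I and Type III row operations, tracking sign changes, and then multiplies the diagonal entries:

```python
from fractions import Fraction

def det_by_row_ops(a):
    """Evaluate a determinant with Type I and Type III row operations.

    A row swap (Type I) flips the sign of the determinant; adding a
    multiple of one row to another (Type III) leaves it unchanged.  Once
    the matrix is upper triangular, the determinant is the product of
    the main diagonal entries.
    """
    m = [[Fraction(x) for x in row] for row in a]   # exact arithmetic
    n = len(m)
    sign = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)      # no pivot: the matrix is singular
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]     # Type I operation
            sign = -sign
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]       # Type III operation
    result = Fraction(sign)
    for i in range(n):
        result *= m[i][i]
    return result

# The matrix of Example 8.1:
A = [[-6, 0, 1, 3, 2],
     [-1, 5, 0, 1, 7],
     [8, 3, 2, 1, 7],
     [0, 1, 5, -3, 2],
     [1, 15, -3, 9, 4]]
assert det_by_row_ops(A) == -20968   # the value found in Example 8.1
```

This is essentially Gaussian elimination; the hand computation in the example chooses its pivots more opportunistically, but the bookkeeping for the sign and the Type III invariance is the same.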
SECTION 8.4
PROBLEMS
In each of Problems 1 through 10, use the strategy of this section to evaluate the determinant of the matrix.

1.  [ −2  4  1 ]
    [  1  6  3 ]
    [  7  0  4 ]

2.  [   2  −3  7 ]
    [  14   1  1 ]
    [ −13  −1  5 ]

3.  [ −4   5  6 ]
    [ −2   3  5 ]
    [  2  −2  6 ]

4.  [  2  −5   8 ]
    [  4   3   8 ]
    [ 13   0  −4 ]

5.  [ 17  −2   5 ]
    [  1  12   0 ]
    [ 14   7  −7 ]

6.  [ −3   3   9  6 ]
    [  1  −2  15  6 ]
    [  7   1   1  5 ]
    [  2   1  −1  3 ]
7.  [ 0   1  1  −4 ]
    [ 6  −3  2   2 ]
    [ 1  −5  1  −2 ]
    [ 4   8  2   2 ]

8.  [  2  7  −1  0 ]
    [  3  1   1  8 ]
    [ −2  0   3  1 ]
    [  4  8  −1  0 ]

9.  [ 10  1  −6  2 ]
    [  0  3   3  9 ]
    [  0  1   1  7 ]
    [ −2  6   8  8 ]

10. [ −7  16   2   4 ]
    [  1   0   0   5 ]
    [  0   3  −4   4 ]
    [  6   1   1  −5 ]

8.5
Cofactor Expansions

Theorem 8.5 suggests the following. If we select any row i of a square matrix A, we can write |A| as a sum of n determinants: the jth of these agrees with |A| except that every element of row i other than a_ij has been replaced by zero,

    |A| = (det with row i = a_i1, 0, …, 0) + (det with row i = 0, a_i2, 0, …, 0) + ··· + (det with row i = 0, …, 0, a_in).     (8.2)

Each of the determinants on the right of equation (8.2) has a row in which every element but possibly one is zero, so Theorem 8.7 applies to each of these determinants. The first determinant on the right is (−1)^{i+1} a_i1 times the determinant of the matrix obtained by deleting row i and column 1 from A. The second determinant on the right is (−1)^{i+2} a_i2 times the determinant of the matrix obtained by deleting row i and column 2 from A. And so on, until the last determinant on the right is (−1)^{i+n} a_in times the determinant of the matrix obtained by deleting row i and column n from A. We can put all of this more succinctly by introducing the following standard terminology.
DEFINITION 8.2
Minor
If A is an n × n matrix, the minor of a_ij is denoted M_ij, and is the determinant of the (n − 1) × (n − 1) matrix obtained by deleting row i and column j of A.

Cofactor

The number (−1)^{i+j} M_ij is called the cofactor of a_ij.
We can now state the following formula for a determinant.
THEOREM 8.8    Cofactor Expansion by a Row
If A is n × n, then for any integer i with 1 ≤ i ≤ n,

    |A| = Σ_{j=1}^{n} (−1)^{i+j} a_ij M_ij.     (8.3)

This is just equation (8.2) in the notation of cofactors. The sum (8.3) is called the cofactor expansion of |A| by row i because it is the sum, across this row, of each matrix element times its cofactor. This yields |A| no matter which row is used. Of course, if some a_ik = 0 then we need not calculate that term in equation (8.3), so it is to our advantage to expand by a row having as many zero elements as possible. The strategy of the preceding section was to create such a row using row and column operations, resulting in what was a cofactor expansion by a row having only one (possibly) nonzero element.
EXAMPLE 8.2
Let

    A = [ −6   3   7 ]
        [ 12  −5  −9 ]
        [  2   4  −6 ]

If we expand by row 1, we get

    |A| = Σ_{j=1}^{3} (−1)^{1+j} a_1j M_1j
        = (−1)^{1+1}(−6)[(−5)(−6) − (−9)(4)] + (−1)^{1+2}(3)[(12)(−6) − (−9)(2)] + (−1)^{1+3}(7)[(12)(4) − (−5)(2)]
        = −6(30 + 36) − 3(−72 + 18) + 7(48 + 10) = 172.

Just for illustration, expand by row 3:

    |A| = Σ_{j=1}^{3} (−1)^{3+j} a_3j M_3j
        = (−1)^{3+1}(2)[(3)(−9) − (7)(−5)] + (−1)^{3+2}(4)[(−6)(−9) − (7)(12)] + (−1)^{3+3}(−6)[(−6)(−5) − (3)(12)]
        = 2(−27 + 35) − 4(54 − 84) − 6(30 − 36) = 172.
Because, for purposes of evaluating determinants, row and column operations can both be used, we can also develop a cofactor expansion of A by column j. In this expansion, we move down a column of a matrix and sum each term of the column times its cofactor.
THEOREM 8.9    Cofactor Expansion by a Column

Let A be an n × n matrix. Then for any j with 1 ≤ j ≤ n,

    |A| = Σ_{i=1}^{n} (−1)^{i+j} a_ij M_ij.     (8.4)
This differs from the expansion (8.3) in that the latter expands across a row, while the sum (8.4) expands down a column. All of these expansions, by any row or column of A, yield |A|.
EXAMPLE 8.3
Consider again
    A = [ −6   3   7 ]
        [ 12  −5  −9 ]
        [  2   4  −6 ]

Expanding by column 1 gives us

    |A| = Σ_{i=1}^{3} (−1)^{i+1} a_i1 M_i1
        = (−1)^{1+1}(−6)[(−5)(−6) − (−9)(4)] + (−1)^{2+1}(12)[(3)(−6) − (7)(4)] + (−1)^{3+1}(2)[(3)(−9) − (7)(−5)]
        = −6(30 + 36) − 12(−18 − 28) + 2(−27 + 35) = 172.

If we expand by column 2 we get

    |A| = Σ_{i=1}^{3} (−1)^{i+2} a_i2 M_i2
        = (−1)^{1+2}(3)[(12)(−6) − (−9)(2)] + (−1)^{2+2}(−5)[(−6)(−6) − (7)(2)] + (−1)^{3+2}(4)[(−6)(−9) − (7)(12)]
        = −3(−72 + 18) − 5(36 − 14) − 4(54 − 84) = 172.
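A cofactor expansion translates directly into a recursive routine. The following sketch (the helper names minor and det_cofactor are ours, invented for illustration) expands along the first row at every level, as in equation (8.3) with i = 1; it reproduces the value 172 of Examples 8.2 and 8.3:

```python
def minor(a, i, j):
    """Matrix obtained by deleting row i and column j (0-based)."""
    return [[a[r][c] for c in range(len(a)) if c != j]
            for r in range(len(a)) if r != i]

def det_cofactor(a):
    """Determinant by cofactor expansion along the first row.

    With 0-based indices, (-1)**j plays the role of (-1)^{1+j} in the
    1-based formula (8.3).
    """
    n = len(a)
    if n == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det_cofactor(minor(a, 0, j))
               for j in range(n))

# The matrix of Examples 8.2 and 8.3:
A = [[-6, 3, 7], [12, -5, -9], [2, 4, -6]]
assert det_cofactor(A) == 172
```

Note that this recursion does n! multiplications in the worst case, which is why the row-reduction strategy of Section 8.4 is preferred for anything but small matrices.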
SECTION 8.5
PROBLEMS
In Problems 1–10, use cofactor expansions, combined with elementary row and column operations when this is useful, to evaluate the determinant of the matrix.

1.  [ −4   2  −8 ]
    [  1   1   0 ]
    [  1  −3   0 ]

2.  [ 1   1  6 ]
    [ 2  −2  1 ]
    [ 3  −1  4 ]

3.  [  7  −4  1 ]
    [  1   1  4 ]
    [ −3  −2  0 ]

4.  [  5  −3  3 ]
    [ −1  −2  6 ]
    [ −2   1  4 ]

5.  [ −5   0   1   6 ]
    [  2  −1   3   7 ]
    [  4   4  −5  −8 ]
    [  1  −1   6   2 ]
6.  [ 4   3  −5   6 ]
    [ 1  −5  15   2 ]
    [ 0  −5   1   7 ]
    [ 8   9   0  15 ]

7.  [ −3   1  14 ]
    [  0   1  16 ]
    [  2  −3   4 ]

8.  [ 14  13  −2  5 ]
    [  7   1   1  7 ]
    [  0   2  12  3 ]
    [  1  −6   5  2 ]

9.  [ −5   4   1   7 ]
    [ −9   3   2  −5 ]
    [ −2   0  −1   1 ]
    [  1  14   0   3 ]

10. [ −8  5   1   7   2 ]
    [  0  1   3   5  −6 ]
    [  2  1   5   3   2 ]
    [  0  4   3   7   2 ]
    [  1  1  −7  −6   5 ]
11. Show that

    | 1   1   1  |
    | α   β   γ  |  = (β − α)(γ − α)(γ − β).
    | α²  β²  γ² |

This is Vandermonde's determinant. This and the next problem are best done with a little thought in using facts about determinants, rather than a brute-force approach.

12. Show that α + β + γ is a factor of the determinant

    | α  β  γ |
    | β  γ  α |
    | γ  α  β |

13. Let A be a square matrix such that A⁻¹ = Aᵗ. Prove that |A| = ±1.

14. Prove that three points (x1, y1), (x2, y2), and (x3, y3) are collinear (on the same straight line) if and only if

    | 1  x1  y1 |
    | 1  x2  y2 |  = 0.
    | 1  x3  y3 |

Hint: This determinant is zero exactly when one row or column is a linear combination of the other two.

8.6
Determinants of Triangular Matrices

The main diagonal of a square matrix A consists of the elements a11, a22, …, ann. We call A upper triangular if all the elements below the main diagonal are zero. That is, a_ij = 0 if i > j. Such a matrix has the appearance

    A = [ a11  a12  a13  ···  a1,n−1    a1n    ]
        [  0   a22  a23  ···  a2,n−1    a2n    ]
        [  0    0   a33  ···  a3,n−1    a3n    ]
        [                ···                   ]
        [  0    0    0   ···  an−1,n−1  an−1,n ]
        [  0    0    0   ···     0      ann    ]
If we expand |A| by cofactors down the first column, we have

    |A| = a11 | a22  a23  ···  a2,n−1    a2n    |
              |  0   a33  ···  a3,n−1    a3n    |
              |           ···                   |
              |  0    0   ···  an−1,n−1  an−1,n |
              |  0    0   ···     0      ann    |

and the determinant on the right is again upper triangular, so expand by its first column to get

    |A| = a11 a22 | a33  a34  ···  a3n |
                  |  0   a44  ···  a4n |
                  |          ···       |
                  |  0    0   ···  ann |

with another upper triangular determinant on the right. Continuing in this way, we obtain

    |A| = a11 a22 ··· ann.

The determinant of an upper triangular matrix is the product of its main diagonal elements. The same conclusion holds for lower triangular matrices (all elements above the main diagonal are zero); now we expand the determinant along the top row, each time obtaining just one minor that is again lower triangular.
EXAMPLE 8.4
The determinant of a triangular matrix can be read off from its main diagonal. For instance,

    | 15   0  0 |
    | −4   7  0 |  = (15)(7)(2) = 210,
    |  1  12  2 |

since the matrix is lower triangular; the entries below the main diagonal do not affect the determinant.
SECTION 8.6    PROBLEMS

In each problem, evaluate the determinant of the triangular matrix.

3.  [  3   0   0   0  0 ]
    [  2  −6   0   0  0 ]
    [ 17  14   2   0  0 ]
    [ 22  −2  15   8  0 ]
    [ 43  12   1  −1  5 ]
A Determinant Formula for a Matrix Inverse

Determinants can be used to tell whether a matrix is singular or nonsingular. In the latter case, there is a way of writing the inverse of a matrix by using determinants.
First, here is a simple test for nonsingularity. We will use the fact that we reduce a matrix by using elementary row operations, whose effects on determinants are known (Type I operations change the sign, Type II operations multiply the determinant by a nonzero constant, and Type III operations do not change the determinant at all). This means that, for any square matrix A, |A| = Ω|A_R| for some nonzero constant Ω, where A_R is the reduced form of A.
Let A be an n × n matrix. Then A is nonsingular if and only if |A| ≠ 0.

Proof   Suppose first that |A| ≠ 0. Since |A| = Ω|A_R| for some nonzero constant Ω, A_R can have no zero row, so A_R = I_n. Then rank(A) = n, so A is nonsingular by Theorems 7.26 and 7.15. Conversely, suppose A is nonsingular. Then A_R = I_n, so |A| = Ω|A_R| = Ω ≠ 0.
Using this result, we can give a short proof of Theorem 7.25(5). Suppose A and B are n × n matrices, and AB is singular. Then |AB| = |A| |B| = 0, so |A| = 0 or |B| = 0, hence either A or B (or possibly both) must be singular. We will now write a formula for the inverse of a square matrix, in terms of cofactors of the matrix.
Let A be an n × n nonsingular matrix. Define an n × n matrix B by putting

    b_ij = (1/|A|)(−1)^{i+j} M_ji.

Then B = A⁻¹. That is, the (i, j) element of A⁻¹ is the cofactor of a_ji (not a_ij), divided by the determinant of A.

Proof
By the way B is defined, the (i, j) element of AB is

    (AB)_ij = Σ_{k=1}^{n} a_ik b_kj = (1/|A|) Σ_{k=1}^{n} (−1)^{j+k} a_ik M_jk.

Now examine the sum on the right. If i = j, we get

    (AB)_ii = (1/|A|) Σ_{k=1}^{n} (−1)^{i+k} a_ik M_ik

and the summation is exactly the cofactor expansion of |A| by row i. Therefore

    (AB)_ii = |A| / |A| = 1.

If i ≠ j, then the summation in the expression for (AB)_ij is the cofactor expansion, by row j, of the determinant of the matrix formed from A by replacing row j by row i. But this matrix then has two identical rows, hence has determinant zero. Then (AB)_ij = 0 if i ≠ j, and we conclude that AB = I_n. A similar argument shows that BA = I_n, hence B = A⁻¹.
This method of computing a matrix inverse is not as efficient in general as the reduction method discussed previously. Nevertheless, it works well for small matrices, and in some discussions it is useful to have a formula for the elements of a matrix inverse.
EXAMPLE 8.5
Let

    A = [ −2  4   1 ]
        [  6  3  −3 ]
        [  2  9  −5 ]

Then

    |A| = 120

so A is nonsingular. Compute the nine elements of the inverse matrix B:

    b11 = (1/120)M11 = (1/120)[(3)(−5) − (−3)(9)] = 12/120 = 1/10
    b12 = −(1/120)M21 = −(1/120)[(4)(−5) − (1)(9)] = 29/120
    b13 = (1/120)M31 = (1/120)[(4)(−3) − (1)(3)] = −15/120 = −1/8
    b21 = −(1/120)M12 = −(1/120)[(6)(−5) − (−3)(2)] = 24/120 = 1/5
    b22 = (1/120)M22 = (1/120)[(−2)(−5) − (1)(2)] = 8/120 = 1/15
    b23 = −(1/120)M32 = −(1/120)[(−2)(−3) − (1)(6)] = 0
    b31 = (1/120)M13 = (1/120)[(6)(9) − (3)(2)] = 48/120 = 2/5
    b32 = −(1/120)M23 = −(1/120)[(−2)(9) − (4)(2)] = 26/120 = 13/60
    b33 = (1/120)M33 = (1/120)[(−2)(3) − (4)(6)] = −30/120 = −1/4

Then

    B = A⁻¹ = [ 1/10  29/120  −1/8 ]
              [ 1/5   1/15     0   ]
              [ 2/5   13/60   −1/4 ]
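Theorem 8.11 is short to implement for small matrices. The sketch below (the helper names minor, det, and inverse_by_cofactors are ours) uses exact Fraction arithmetic; note the transposed indices, so that entry (i, j) of the inverse uses the cofactor of a_ji. It reproduces the matrix B of Example 8.5:

```python
from fractions import Fraction

def minor(a, i, j):
    """Matrix obtained by deleting row i and column j (0-based)."""
    return [[a[r][c] for c in range(len(a)) if c != j]
            for r in range(len(a)) if r != i]

def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j))
               for j in range(len(a)))

def inverse_by_cofactors(a):
    """Theorem 8.11: entry (i, j) of the inverse is the cofactor of
    a_ji (note the transposed indices) divided by |A|."""
    n = len(a)
    d = Fraction(det(a))
    if d == 0:
        raise ValueError("matrix is singular (Theorem 8.10)")
    return [[Fraction((-1) ** (i + j) * det(minor(a, j, i))) / d
             for j in range(n)] for i in range(n)]

# The matrix of Example 8.5:
A = [[-2, 4, 1], [6, 3, -3], [2, 9, -5]]
B = inverse_by_cofactors(A)
assert B[0][0] == Fraction(1, 10) and B[0][1] == Fraction(29, 120)
```

As the text notes, this cofactor formula is not competitive with row reduction for large matrices, but it is convenient whenever an explicit formula for the entries of the inverse is needed.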
SECTION 8.7    PROBLEMS
In each of Problems 1 through 10, use Theorem 8.10 to determine whether the matrix is nonsingular. If it is, use Theorem 8.11 to find its inverse.

1.  [ 2  −1 ]
    [ 1   6 ]

2.  [ 3  0 ]
    [ 1  4 ]

3.  [ −1  1 ]
    [  1  4 ]

4.  [  2   5 ]
    [ −7  −3 ]

5.  [ 6  −1   3 ]
    [ 0   1  −4 ]
    [ 2   2  −3 ]

6.  [ −14   1  −3 ]
    [   2  −1   3 ]
    [   1   1   7 ]
7.  [ 0  −4  3 ]
    [ 2  −1  6 ]
    [ 1  −1  7 ]

8.  [ 11   0  −5 ]
    [  0   1   0 ]
    [  4  −7   9 ]

9.  [  3  1  −2  1 ]
    [  4  6  −3  9 ]
    [ −2  1   7  4 ]
    [ 13  0   1  5 ]

10. [ 7  −3  −4  1 ]
    [ 8   2   0  0 ]
    [ 1   5  −1  7 ]
    [ 3  −2  −5  9 ]

8.8
Cramer's Rule

Cramer's rule is a determinant formula for solving a system of equations AX = B when A is n × n and nonsingular. In this case, the system has the unique solution X = A⁻¹B. We can, therefore, find X by computing A⁻¹ and then A⁻¹B. Here is another way to find X.
THEOREM 8.12
Cramer's Rule

Let A be a nonsingular n × n matrix of numbers. Then the unique solution of AX = B is X = (x1, x2, …, xn)ᵗ, where

    x_k = (1/|A|) |A_k(B)|

and A_k(B) is the matrix obtained from A by replacing column k of A by B.

Here is a heuristic argument to suggest why this works. Let B = (b1, b2, …, bn)ᵗ. Multiply column k of A by x_k. The determinant of the resulting matrix is x_k|A|. For each j ≠ k, add x_j times column j to column k. This Type III operation does not change the value of the determinant, and the entry in row i of column k becomes

    a_i1 x1 + a_i2 x2 + ··· + a_in xn = b_i,

since AX = B. The resulting matrix is therefore A_k(B), and

    x_k|A| = |A_k(B)|.

Solving for x_k yields the conclusion of Cramer's rule.
EXAMPLE 8.6
Solve the system

    x1 − 3x2 − 4x3 = 1
    −x1 + x2 − 3x3 = 14
    x2 − 3x3 = 5

The matrix of coefficients is

    A = [  1  −3  −4 ]
        [ −1   1  −3 ]
        [  0   1  −3 ]

We find that |A| = 13. By Cramer's rule,

    x1 = (1/13) |  1  −3  −4 |
                | 14   1  −3 |  = −117/13 = −9,
                |  5   1  −3 |

    x2 = (1/13) |  1   1  −4 |
                | −1  14  −3 |  = −10/13,
                |  0   5  −3 |

    x3 = (1/13) |  1  −3   1 |
                | −1   1  14 |  = −25/13.
                |  0   1   5 |
Cramer’s rule is not as efficient as the Gauss-Jordan reduction. Gauss-Jordan also applies to homogeneous systems, and to systems with different numbers of equations than unknowns. However, Cramer’s rule does provide a formula for the solution, and this is useful in some contexts.
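As a formula, Cramer's rule is only a few lines of code. The sketch below (the helper names det and cramer are ours) solves the system of Example 8.6 exactly, with the determinants evaluated by cofactor expansion:

```python
from fractions import Fraction

def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j]
               * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))

def cramer(a, b):
    """Theorem 8.12: x_k = |A_k(B)| / |A|, where A_k(B) is the matrix A
    with its column k replaced by B."""
    n = len(a)
    d = Fraction(det(a))
    if d == 0:
        raise ValueError("Cramer's rule needs a nonsingular coefficient matrix")
    solution = []
    for k in range(n):
        ak = [[b[i] if j == k else a[i][j] for j in range(n)]
              for i in range(n)]
        solution.append(Fraction(det(ak)) / d)
    return solution

# The system of Example 8.6:
A = [[1, -3, -4], [-1, 1, -3], [0, 1, -3]]
b = [1, 14, 5]
assert cramer(A, b) == [Fraction(-9), Fraction(-10, 13), Fraction(-25, 13)]
```

As the text observes, this is less efficient than Gauss-Jordan reduction, but it yields each unknown as an explicit ratio of determinants.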
SECTION 8.8
PROBLEMS
In each of Problems 1 through 10, either find the solution by Cramer's rule or show that the rule does not apply.

1.  15x1 − 4x2 = 5
    8x1 + x2 = −4

2.  x1 + 4x2 = 3
    x1 + x2 = 0

3.  8x1 − 4x2 + 3x3 = 0
    x1 + 5x2 − x3 = −5
    −2x1 + 6x2 + x3 = −4

4.  5x1 − 6x2 + x3 = 4
    −x1 + 3x2 − 4x3 = 5
    2x1 + 3x2 + x3 = −8

5.  x1 + x2 − 3x3 = 0
    x2 − 4x3 = 0
    x1 − x2 − x3 = 5

6.  6x1 + 4x2 − x3 + 3x4 − x5 = 7
    x1 − 4x2 + x5 = −5
    x1 − 3x2 + x3 − 4x5 = 0
    −2x1 + x3 − 2x5 = 4
    x3 − x4 − x5 = 8

7.  2x1 − 4x2 + x3 − x4 = 6
    x2 − 3x3 = 10
    x1 − 4x3 = 0
    x2 − x3 + 2x4 = 4

8.  2x1 − 3x2 + x4 = 2
    x2 − x3 + x4 = 2
    x3 − 2x4 = 5
    x1 − 3x2 + 4x3 = 0

9.  14x1 − 3x3 = 5
    2x1 − 4x3 + x4 = 2
    x1 − x2 + x3 − 3x4 = 1
    x3 − 4x4 = −5

10. x2 − 4x4 = 18
    x1 − x2 + 3x3 = −1
    x1 + x2 − 3x3 + x4 = 5
    x2 + 3x4 = 0
8.9
The Matrix Tree Theorem

In 1847, G. R. Kirchhoff published a classic paper in which he derived many of the electrical circuit laws that bear his name. One of these is the matrix tree theorem, which we will discuss now.

Figure 8.1 shows a typical electrical circuit. The underlying geometry of the circuit is shown in Figure 8.2. This diagram of points and connecting lines is called a graph, and was seen in the context of the movement of atoms in crystals in Section 7.1.5. A labeled graph has symbols attached to the points. Some of Kirchhoff's results depend on geometric properties of the circuit's underlying graph. One such property is the arrangement of the closed loops. Another is the number of spanning trees in the labeled graph. A spanning tree is a collection of lines in the graph forming no closed loops, but containing a path between any two points of the graph. Figure 8.3 shows a labeled graph and two spanning trees in this graph. Kirchhoff derived a relationship between determinants and the number of labeled trees in a graph.
FIGURE 8.1

FIGURE 8.2

FIGURE 8.3 A labeled graph and two of its spanning trees.
THEOREM 8.13
Matrix Tree Theorem
Let G be a graph with vertices labeled v1, …, vn. Form an n × n matrix T = [t_ij] as follows. If i = j, then t_ii is the number of lines to v_i in the graph. If i ≠ j, then t_ij = 0 if there is no line between v_i and v_j in G, and t_ij = −1 if there is such a line. Then all cofactors of T are equal, and their common value is the number of spanning trees in G.
EXAMPLE 8.7
For the labeled graph of Figure 8.4, T is the 7 × 7 matrix

    T = [  3  −1   0   0   0  −1  −1 ]
        [ −1   3  −1  −1   0   0   0 ]
        [  0  −1   3  −1   0  −1   0 ]
        [  0  −1  −1   4  −1   0  −1 ]
        [  0   0   0  −1   3  −1  −1 ]
        [ −1   0  −1   0  −1   4  −1 ]
        [ −1   0   0  −1  −1  −1   4 ]

FIGURE 8.4 Graph G.
Evaluate any cofactor of T. For example, covering up row 1 and column 1, we have

    (−1)^{1+1} M11 = |  3  −1  −1   0   0   0 |
                     | −1   3  −1   0  −1   0 |
                     | −1  −1   4  −1   0  −1 |  = 386.
                     |  0   0  −1   3  −1  −1 |
                     |  0  −1   0  −1   4  −1 |
                     |  0   0  −1  −1  −1   4 |

Evaluation of any cofactor of T yields the same result. Even with this small graph, it would clearly be impractical to enumerate the spanning trees by attempting to list them all.
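The matrix tree theorem is also easy to apply by machine. The sketch below (the helper names det and spanning_trees are ours) builds T from an edge list and evaluates the cofactor obtained by deleting row 1 and column 1; it reproduces the count 386 for the graph of Example 8.7 and the count 16 = 4^{4−2} for the complete graph K4 of Problem 6:

```python
def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j]
               * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))

def spanning_trees(n, edges):
    """Matrix tree theorem: t_ii counts the lines at v_i, t_ij = -1 when
    v_i and v_j are joined.  All cofactors of T are equal, so we delete
    row 1 and column 1 and take the resulting determinant."""
    t = [[0] * n for _ in range(n)]
    for i, j in edges:
        t[i][i] += 1
        t[j][j] += 1
        t[i][j] -= 1
        t[j][i] -= 1
    return det([row[1:] for row in t[1:]])

# The graph G of Example 8.7 (vertices 0..6 here standing for v1..v7),
# with edges read off from the -1 entries of T above:
g_edges = [(0, 1), (0, 5), (0, 6), (1, 2), (1, 3), (2, 3), (2, 5),
           (3, 4), (3, 6), (4, 5), (4, 6), (5, 6)]
assert spanning_trees(7, g_edges) == 386

# The complete graph K4 (Problem 6): n^(n-2) = 16 spanning trees.
k4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
assert spanning_trees(4, k4) == 16
```

In graph-theoretic language, T is the Laplacian matrix of the graph, and this cofactor computation is the standard statement of Kirchhoff's theorem.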
SECTION 8.9    PROBLEMS

1. Find the number of spanning trees in the graph of Figure 8.5.

2. Find the number of spanning trees in the graph of Figure 8.6.

3. Find the number of spanning trees in the graph of Figure 8.7.

4. Find the number of spanning trees in the graph of Figure 8.8.

5. Find the number of spanning trees in the graph of Figure 8.9.

6. A complete graph on n points consists of n points with a line between each pair of points. This graph is often denoted Kn. With the points labeled 1, 2, …, n, show that the number of spanning trees in Kn is n^{n−2} for n = 3, 4, ….

FIGURE 8.5    FIGURE 8.6    FIGURE 8.7    FIGURE 8.8    FIGURE 8.9
CHAPTER
9
EIGENVALUES AND EIGENVECTORS  DIAGONALIZATION OF MATRICES  ORTHOGONAL AND SYMMETRIC MATRICES  QUADRATIC FORMS  UNITARY, HERMITIAN, AND SKEW-HERMITIAN MATRICES
Eigenvalues, Diagonalization, and Special Matrices
Suppose A is an n × n matrix of real numbers. If we write an n-vector E as a column

    E = ( ξ1, ξ2, …, ξn )ᵗ

then AE is an n × 1 matrix, which we may also think of as an n-vector. We may therefore consider A as an operator that moves vectors about in Rⁿ. Because A(aE1 + bE2) = aAE1 + bAE2, A is called a linear operator.

Vectors have directions associated with them. Depending on A, the direction of AE will generally be different from that of E. It may happen, however, that for some vector E, AE and E are parallel. In this event there is a number λ such that AE = λE. Then λ is called an eigenvalue of A, with E an associated eigenvector.

The idea of an operator moving a vector to a parallel position is simple and geometrically appealing. It also has powerful ramifications in a variety of contexts. Eigenvalues contain important information about the solutions of systems of differential equations, and in models of physical phenomena they may have physical significance as well (such as the modes of vibration of a mechanical system, or the energy states of an atom).
9.1
Eigenvalues and Eigenvectors

Let A be an n × n matrix of real or complex numbers.
DEFINITION 9.1
Eigenvalues and Eigenvectors
A real or complex number λ is an eigenvalue of A if there is a nonzero n × 1 matrix (vector) E such that

    AE = λE.

Any nonzero vector E satisfying this relationship is called an eigenvector associated with the eigenvalue λ.
Eigenvalues are also known as characteristic values of a matrix, and eigenvectors can be called characteristic vectors. We will typically write eigenvectors as column matrices and think of them as vectors in Rⁿ. If an eigenvector has complex components, we may think of it as a vector in Cⁿ, which consists of n-tuples of complex numbers. Since an eigenvector must be a nonzero vector, at least one component is nonzero. If α is a nonzero scalar and AE = λE, then

    A(αE) = α(AE) = α(λE) = λ(αE).

This means that nonzero scalar multiples of eigenvectors are again eigenvectors.
EXAMPLE 9.1
Since

    [ 1  0 ] [ 0 ]   [ 0 ]       [ 0 ]
    [ 0  0 ] [ 4 ] = [ 0 ] = 0 · [ 4 ]

0 is an eigenvalue of this matrix, with (0, 4)ᵗ an associated eigenvector. Although the zero vector cannot be an eigenvector, the number zero can be an eigenvalue of a matrix. For any scalar α ≠ 0, (0, 4α)ᵗ is also an eigenvector associated with the eigenvalue 0.
EXAMPLE 9.2
Let
    A = [ 1  −1   0 ]
        [ 0   1   1 ]
        [ 0   0  −1 ]

Then 1 is an eigenvalue with associated eigenvector (6, 0, 0)ᵗ, because

    A (6, 0, 0)ᵗ = (6, 0, 0)ᵗ = 1 · (6, 0, 0)ᵗ.

Because any nonzero multiple of an eigenvector is an eigenvector, (α, 0, 0)ᵗ is also an eigenvector associated with eigenvalue 1, for any nonzero number α. Another eigenvalue of A is −1, with associated eigenvector (1, 2, −4)ᵗ, because

    [ 1  −1   0 ] [  1 ]   [ −1 ]        [  1 ]
    [ 0   1   1 ] [  2 ] = [ −2 ] = −1 · [  2 ]
    [ 0   0  −1 ] [ −4 ]   [  4 ]        [ −4 ]

Again, any vector (α, 2α, −4α)ᵗ, with α ≠ 0, is an eigenvector associated with −1.

We would like a way of finding all of the eigenvalues of a matrix A. The machinery to do this is at our disposal, and we reason as follows. For λ to be an eigenvalue of A, there must be an associated eigenvector E, and AE = λE. Then λE − AE = O, or

    λI_n E − AE = O.

The identity matrix was inserted so we could write the last equation as

    (λI_n − A)E = O.

This makes E a nontrivial solution of the n × n system of linear equations

    (λI_n − A)X = O.

But this system can have a nontrivial solution if and only if the coefficient matrix has determinant zero, that is, |λI_n − A| = 0. Thus λ is an eigenvalue of A exactly when |λI_n − A| = 0. This is the equation

    | λ − a11   −a12    ···   −a1n   |
    |  −a21    λ − a22  ···   −a2n   |  = 0.
    |                   ···          |
    |  −an1     −an2    ···  λ − ann |

When the determinant on the left is expanded, it is a polynomial of degree n in λ, called the characteristic polynomial of A. The roots of this polynomial are the eigenvalues of A. Corresponding to any root λ, any nontrivial solution E of (λI_n − A)X = O is an eigenvector associated with λ. We will summarize these conclusions.
THEOREM 9.1
Let A be an n × n matrix of real or complex numbers. Then

1. λ is an eigenvalue of A if and only if |λI_n − A| = 0.
2. If λ is an eigenvalue of A, then any nontrivial solution of (λI_n − A)X = O is an associated eigenvector.
DEFINITION 9.2
Characteristic Polynomial
The polynomial |λI_n − A| is the characteristic polynomial of A, and is denoted p_A(λ).
If A is n × n, then p_A(λ) is an nth degree polynomial with real or complex coefficients determined by the elements of A. This polynomial therefore has n roots, though some may be repeated. An n × n matrix A always has n eigenvalues λ1, …, λn, in which each eigenvalue is listed according to its multiplicity as a root of the characteristic polynomial. For example, if

    p_A(λ) = (λ − 1)(λ − 3)²(λ − i)⁴

we list 7 eigenvalues: 1, 3, 3, i, i, i, i. The eigenvalue 3 has multiplicity 2 and i has multiplicity 4.
EXAMPLE 9.3
Let
    A = [ 1  −1   0 ]
        [ 0   1   1 ]
        [ 0   0  −1 ]

as in Example 9.2. The characteristic polynomial is

    p_A(λ) = | λ − 1    1      0   |
             |   0    λ − 1   −1   |  = (λ − 1)²(λ + 1).
             |   0      0    λ + 1 |

The eigenvalues of A are 1, 1, −1. To find eigenvectors associated with eigenvalue 1, solve

    (1 · I₃ − A)X = [ 0  1   0 ]
                    [ 0  0  −1 ] X = O.
                    [ 0  0   2 ]

This has general solution

    X = ( α, 0, 0 )ᵗ

and these are the eigenvectors associated with eigenvalue 1, with α ≠ 0.

For eigenvectors associated with −1, solve

    (−1 · I₃ − A)X = [ −2   1   0 ]
                     [  0  −2  −1 ] X = O.
                     [  0   0   0 ]

The general solution is

    X = ( α, 2α, −4α )ᵗ

and these are the eigenvectors associated with eigenvalue −1, as long as α ≠ 0.
EXAMPLE 9.4
Let

    A = [ 1  −2 ]
        [ 2   0 ]

The characteristic polynomial is

    p_A(λ) = | λ − 1   2 |
             |  −2     λ |  = λ(λ − 1) + 4 = λ² − λ + 4.

This has roots (1 + √15 i)/2 and (1 − √15 i)/2, and these are the eigenvalues of A. Even though A has real elements, the eigenvalues may be complex. To find eigenvectors associated with (1 + √15 i)/2, solve the system (λI₂ − A)X = O, which for this λ is the system

    [ (−1 + √15 i)/2        2        ] [ x1 ]   [ 0 ]
    [      −2         (1 + √15 i)/2 ] [ x2 ] = [ 0 ]

or

    ((−1 + √15 i)/2) x1 + 2x2 = 0
    −2x1 + ((1 + √15 i)/2) x2 = 0.

We find the general solution of this system to be

    X = α ( 1, (1 − √15 i)/4 )ᵗ

and this is an eigenvector associated with the eigenvalue (1 + √15 i)/2 for any nonzero scalar α.

Corresponding to the eigenvalue (1 − √15 i)/2, solve the system

    [ (−1 − √15 i)/2        2        ] X = O
    [      −2         (1 − √15 i)/2 ]

obtaining the general solution

    X = α ( 1, (1 + √15 i)/4 )ᵗ.

This is an eigenvector corresponding to the eigenvalue (1 − √15 i)/2 for any α ≠ 0.
Finding the eigenvalues of a matrix is equivalent to finding the roots of an nth degree polynomial, and if n ≥ 3 this may be difficult. There are efficient computer routines which are usually based on the idea of putting the matrix through a sequence of transformations, the effect of which on the eigenvalues is known. This strategy was used previously to evaluate determinants. There are also approximation techniques, but these are sensitive to error. A number that is very close to an eigenvalue may not behave like an eigenvalue. We will conclude this section with a theorem due to Gerschgorin. If real eigenvalues are plotted on the real line, and complex eigenvalues as points in the plane, Gerschgorin’s theorem enables us to delineate regions of the plane containing the eigenvalues.
9.1.1 Gerschgorin's Theorem

THEOREM 9.2    Gerschgorin

Let A be an n × n matrix of real or complex numbers. For k = 1, …, n, let

    r_k = Σ_{j=1, j≠k}^{n} |a_kj|.

Let C_k be the circle of radius r_k centered at (α_k, β_k), where a_kk = α_k + iβ_k. Then each eigenvalue of A, when plotted as a point in the complex plane, lies on or within one of the circles C1, …, Cn.

The circles C_k are called Gerschgorin circles. For the radius of C_k, read across row k and add the magnitudes of the row elements, omitting the diagonal element a_kk. The center of C_k is a_kk, plotted as a point in the complex plane. If the Gerschgorin circles are drawn and the disks they bound are shaded, then we have a picture of a region containing all of the eigenvalues of A.
EXAMPLE 9.5
Let
⎛
12i ⎜ 1 A=⎜ ⎝ 4 1 − 3i
1 9 −6 2 + i 1 −1 −9 1
⎞ −4 −1 ⎟ ⎟ 4i ⎠ 4 − 7i
A has characteristic polynomial pA = 4 + 3 − 5i3 + 18 − 4i2 + 290 + 90i + 1374 − 1120i
9.1 Eigenvalues and Eigenvectors
329
It is not clear what the roots of this polynomial are. Form the Gerschgorin circles. Their radii are:

    r1 = 1 + 9 + 4 = 14,
    r2 = 1 + |2 + i| + 1 = 2 + √5,
    r3 = 4 + 1 + 4 = 9,  and
    r4 = |1 − 3i| + 9 + 1 = 10 + √10.

C1 has radius 14 and center (0, 12), C2 has radius 2 + √5 and center (−6, 0), C3 has radius 9 and center (−1, 0), and C4 has radius 10 + √10 and center (4, −7). Figure 9.1 shows the Gerschgorin circles containing the eigenvalues of A.
FIGURE 9.1 Gerschgorin circles.
Gerschgorin’s theorem is not intended as an approximation scheme, since the Gerschgorin circles may have large radii. For some problems, however, just knowing some information about possible locations of eigenvalues can be important. For example, in studies of the stability of fluid flow, it is important to know whether there are eigenvalues in the right half-plane.
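The circle data of Theorem 9.2 are mechanical to compute, and Python's built-in complex numbers make the check easy; a sketch (not part of the text), applied to the matrix of Example 9.5 as read from that example:

```python
import math

# Matrix of Example 9.5, entered with Python's complex literals (1j = i).
A = [
    [12j,     1,       9,      -4],
    [1,      -6,       2 + 1j, -1],
    [4,       1,      -1,       4j],
    [1 - 3j, -9,       1,       4 - 7j],
]

# Gerschgorin circle for row k: center a_kk, radius = sum of |a_kj| for j != k.
circles = [
    (A[k][k], sum(abs(A[k][j]) for j in range(4) if j != k))
    for k in range(4)
]

print(circles[0])   # (12j, 14): radius 1 + 9 + 4 about center 12i
```

The remaining radii agree with the hand computation: `circles[1]` and `circles[3]` come out as the decimal values of 2 + √5 and 10 + √10.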
SECTION 9.1
PROBLEMS
In each of Problems 1 through 16, (a) find the eigenvalues of the matrix, (b) corresponding to each eigenvalue, find an eigenvector, and (c) sketch the Gerschgorin circles and (approximately) locate the eigenvalues as points in the plane. 1. 2. 3.
1 2
3 1
0 4
−5 1
0 2
5. 6.
−2 1
4.
1 2 0 0
−2 4 −6 2 1 0
⎞ 2 0 0 7. ⎝ 1 0 2 ⎠ 0 0 3 ⎞ ⎛ −2 1 0 0 ⎠ 8. ⎝ 1 3 0 0 −1 ⎛
6 −3
CHAPTER 9
Eigenvalues, Diagonalization, and Special Matrices
⎞ −3 1 1 9. ⎝ 0 0 0 ⎠ 0 1 0 ⎞ ⎛ 0 0 −1 1 ⎠ 10. ⎝ 0 0 2 0 0 ⎞ ⎛ −14 1 0 0 2 0 ⎠ 11. ⎝ 1 0 2 ⎞ ⎛ 3 0 0 12. ⎝ 1 −2 −8 ⎠ 0 −5 1 ⎞ ⎛ 1 −2 0 0 0 ⎠ 13. ⎝ 0 −5 0 7 ⎞ ⎛ −2 1 0 0 ⎜ 1 0 0 1 ⎟ ⎟ 14. ⎜ ⎝ 0 0 0 0 ⎠ 0 0 0 0 ⎞ ⎛ −4 1 0 1 ⎜ 0 1 0 0 ⎟ ⎟ 15. ⎜ ⎝ 0 0 2 0 ⎠ 1 0 0 3 ⎛
⎛
5 ⎜ 0 ⎜ 16. ⎝ 0 0
1 1 0 0
0 0 0 0
⎞ 9 9 ⎟ ⎟ 9 ⎠ 0
17. Show that the eigenvalues of

        ( α  β )
        ( β  γ ),

    in which α, β, and γ are real numbers, are real.

18. Show that the eigenvalues of a 3 × 3 symmetric matrix are real if all of the matrix elements are real.

19. Let λ be an eigenvalue of A with eigenvector E. Show that, for any positive integer k, λᵏ is an eigenvalue of Aᵏ, with eigenvector E.

20. Let λ be an eigenvalue of A with eigenvector E, and μ an eigenvalue of A with eigenvector L. Suppose λ ≠ μ. Show that E and L are linearly independent as vectors in Rⁿ.

21. Let A be an n × n matrix. Prove that the constant term of pA(x) is (−1)ⁿ|A|. Use this to show that any singular matrix must have zero as one of its eigenvalues.
9.2 Diagonalization of Matrices

We have referred to the elements aii of a square matrix as its main diagonal elements. All other elements are called off-diagonal elements.
DEFINITION 9.3
Diagonal Matrix
A square matrix having all off-diagonal elements equal to zero is called a diagonal matrix.
We often write a diagonal matrix having main diagonal elements d1, …, dn as

    ( d1           O  )
    (     d2          )
    (        ⋱        )
    ( O            dn )

with O in the upper right and lower left corners to indicate that all off-diagonal elements are zero. Here are some properties of diagonal matrices that make them pleasant to work with.
THEOREM 9.3
Let D be the diagonal matrix with main diagonal elements d1, …, dn, and W the diagonal matrix with main diagonal elements w1, …, wn. Then

1. DW = WD is the diagonal matrix with main diagonal elements d1w1, …, dnwn.
2. |D| = d1 d2 ⋯ dn.
3. D is nonsingular if and only if each main diagonal element is nonzero.
4. If each dj ≠ 0, then D⁻¹ is the diagonal matrix with main diagonal elements 1/d1, …, 1/dn.
5. The eigenvalues of D are its main diagonal elements.
6. An eigenvector associated with dj is the column matrix with 1 in row j and all other elements zero.
We leave a proof of these conclusions to the student. Notice that (2) follows from the fact that a diagonal matrix is upper (and lower) triangular. “Most” square matrices are not diagonal matrices. However, some matrices are related to diagonal matrices in a way that enables us to utilize the nice features of diagonal matrices.
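Because only the main diagonal matters, a diagonal matrix can be stored as a simple list, and properties (1), (2), and (4) reduce to elementwise arithmetic; a small Python sketch (not from the text, with example values of ours):

```python
# Represent a diagonal matrix by its main diagonal.
d = [2.0, -3.0, 5.0]
w = [4.0, 1.0, -2.0]

# Property 1: DW = WD is diagonal with entries d1*w1, ..., dn*wn.
dw = [di * wi for di, wi in zip(d, w)]

# Property 2: |D| is the product of the diagonal elements.
det = 1.0
for di in d:
    det *= di

# Property 4: if every dj != 0, D^{-1} is diagonal with entries 1/dj.
d_inv = [1.0 / di for di in d]

print(dw, det, d_inv)
```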
DEFINITION 9.4
Diagonalizable Matrix
An n × n matrix A is diagonalizable if there exists an n × n matrix P such that P−1 AP is a diagonal matrix. When such P exists, we say that P diagonalizes A.
The following theorem not only tells us when a matrix is diagonalizable, but also how to find a matrix P that diagonalizes it. THEOREM 9.4
Diagonalizability
Let A be an n × n matrix. Then A is diagonalizable if it has n linearly independent eigenvectors. Further, if P is the n × n matrix having these eigenvectors as columns, then P⁻¹AP is the diagonal matrix having the corresponding eigenvalues down its main diagonal.

Here is what this means. Suppose λ1, …, λn are the eigenvalues of A (some possibly repeated), and V1, …, Vn are corresponding eigenvectors. If these eigenvectors are linearly independent, we can form a nonsingular matrix P using Vj as column j. It is the linear independence of the eigenvectors that makes P nonsingular. We claim that P⁻¹AP is the diagonal matrix having the eigenvalues of A down its main diagonal, in the order corresponding to the order the eigenvectors were listed as columns of P.
EXAMPLE 9.6
Let

    A = ( −1  4 )
        (  0  3 ).

A has eigenvalues −1, 3 and corresponding eigenvectors

    ( 1 )        ( 1 )
    ( 0 )  and   ( 1 ),

respectively. Form

    P = ( 1  1 )
        ( 0  1 ).

Because the eigenvectors are linearly independent, this matrix is nonsingular (note that |P| ≠ 0). We find that

    P⁻¹ = ( 1  −1 )
          ( 0   1 ).

Now compute

    P⁻¹AP = ( 1  −1 ) ( −1  4 ) ( 1  1 )  =  ( −1  0 )
            ( 0   1 ) (  0  3 ) ( 0  1 )     (  0  3 ),

which has the eigenvalues down the main diagonal, corresponding to the order in which the eigenvectors were written as columns of P. If we use the other order in writing the eigenvectors as columns, and define

    Q = ( 1  1 )
        ( 1  0 ),

then we get

    Q⁻¹AQ = ( 3   0 )
            ( 0  −1 ).

Any linearly independent eigenvectors can be used in this diagonalization process. For example, if we use

    ( 6 )        ( −4 )
    ( 0 )  and   ( −4 ),

which are simply nonzero scalar multiples of the previously used eigenvectors, then we can define

    S = ( 6  −4 )
        ( 0  −4 ),

and now

    S⁻¹AS = ( −1  0 )
            (  0  3 ).
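For a 2 × 2 matrix the whole computation of Example 9.6 fits in a few lines of Python (a sketch, not part of the text), using the adjugate formula for the inverse:

```python
def inv2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    a, b = M[0]
    c, d = M[1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[-1, 4], [0, 3]]
P = [[1, 1], [0, 1]]      # eigenvector columns (1,0) and (1,1)

result = matmul2(matmul2(inv2(P), A), P)
print(result)             # [[-1.0, 0.0], [0.0, 3.0]]
```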
EXAMPLE 9.7
Here is an example with more complicated arithmetic, but the idea remains the same. Let

    A = ( −1  1   3 )
        (  2  1   4 )
        (  1  0  −2 ).

The eigenvalues are −1, −1/2 + (1/2)√29, and −1/2 − (1/2)√29, with corresponding eigenvectors, respectively,

    (  1 )    (  3 + √29   )    (  3 − √29   )
    ( −3 )    ( 10 + 2√29  )    ( 10 − 2√29  )
    (  1 ),   (     2      ),   (     2      ).

These are linearly independent. Form the matrix

    P = (  1   3 + √29     3 − √29   )
        ( −3   10 + 2√29   10 − 2√29 )
        (  1       2           2     ).

We find that

    P⁻¹ = (√29/812) (  232/√29     −116/√29    232/√29   )
                    (  16 − 2√29   −1 + √29   −19 + 5√29 )
                    ( −16 − 2√29    1 + √29    19 + 5√29 ).

Then

    P⁻¹AP = ( −1        0               0        )
            (  0   (−1 + √29)/2         0        )
            (  0        0         (−1 − √29)/2   ).
In this example, although we found P−1 explicitly, we did not actually need it to diagonalize A. Theorem 9.4 assures us that P−1 AP is a diagonal matrix with the eigenvalues of A down its main diagonal. All we really needed was to know that A had three linearly independent eigenvectors. This is a useful fact to keep in mind, particularly if P and P−1 are cumbersome to compute.
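In the same spirit, the claimed eigenpairs themselves can be verified numerically without ever touching P⁻¹, by checking that each residual AV − λV vanishes; a Python sketch (not from the text):

```python
import math

A = [[-1, 1, 3], [2, 1, 4], [1, 0, -2]]
s = math.sqrt(29)
pairs = [
    (-1.0,           [1.0, -3.0, 1.0]),
    ((-1 + s) / 2,   [3 + s, 10 + 2 * s, 2.0]),
    ((-1 - s) / 2,   [3 - s, 10 - 2 * s, 2.0]),
]

# Largest component of A v - lambda v over all claimed eigenpairs.
max_resid = max(
    abs(sum(A[i][j] * v[j] for j in range(3)) - lam * v[i])
    for lam, v in pairs
    for i in range(3)
)
print("largest residual:", max_resid)
```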
EXAMPLE 9.8
Let

    A = ( −1  −4 )
        (  3  −2 ).

The eigenvalues are (−3 + √47 i)/2 and (−3 − √47 i)/2. Corresponding eigenvectors are, respectively,

    (     8      )        (     8      )
    ( 1 − √47 i  )  and   ( 1 + √47 i  ).

Since these eigenvectors are linearly independent, there is a nonsingular 2 × 2 matrix P that diagonalizes A:

    P⁻¹AP = ( (−3 + √47 i)/2         0         )
            (        0        (−3 − √47 i)/2  ).

Of course, if we need P for some other calculation, as will occur later, we can write it down:

    P = (     8           8      )
        ( 1 − √47 i   1 + √47 i  ).

And, if we wish, we can compute

    P⁻¹ = (√47 i/752) ( −1 − √47 i    8 )
                      (  1 − √47 i   −8 ).
However, even without explicitly writing P−1 , we know what P−1 AP is.
EXAMPLE 9.9
It is not necessary that A have n distinct eigenvalues in order to have n linearly independent eigenvectors. For example, let

    A = (  5   −4   4 )
        ( 12  −11  12 )
        (  4   −4   5 ).

The eigenvalues are 1, 1, −3, with 1 having multiplicity 2. Associated with −3 we find an eigenvector

    ( 1 )
    ( 3 )
    ( 1 ).

To find eigenvectors associated with 1 we must solve the system

    (I3 − A)X = (  −4    4   −4 ) ( x1 )   ( 0 )
                ( −12   12  −12 ) ( x2 ) = ( 0 )
                (  −4    4   −4 ) ( x3 )   ( 0 ).

This system has general solution

    α (  1 )     ( 0 )
      (  0 ) + β ( 1 ).
      ( −1 )     ( 1 )

We can therefore find two linearly independent eigenvectors associated with eigenvalue 1, for example,

    (  1 )        ( 0 )
    (  0 )  and   ( 1 ).
    ( −1 )        ( 1 )

We can now form the nonsingular matrix

    P = ( 1   1   0 )
        ( 3   0   1 )
        ( 1  −1   1 )

that diagonalizes A:

    P⁻¹AP = ( −3  0  0 )
            (  0  1  0 )
            (  0  0  1 ).
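Since P⁻¹AP = D is equivalent to AP = PD, the conclusion of Example 9.9 can be confirmed with two matrix products and no inverse at all; a Python sketch (not part of the text):

```python
def matmul3(X, Y):
    """Product of two 3x3 matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[5, -4, 4], [12, -11, 12], [4, -4, 5]]
P = [[1, 1, 0], [3, 0, 1], [1, -1, 1]]    # eigenvector columns
D = [[-3, 0, 0], [0, 1, 0], [0, 0, 1]]    # eigenvalue 1 repeated

ok = matmul3(A, P) == matmul3(P, D)       # column j: A Vj = lambda_j Vj
print(ok)                                 # True
```

With integer entries the comparison is exact, so no floating-point tolerance is needed here.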
Here is a proof of Theorem 9.4, explaining why a matrix P formed from linearly independent eigenvectors must diagonalize A. The proof makes use of an observation we have made before. When multiplying two n × n matrices A and B, column j of AB = A(column j of B).

Proof   Let the eigenvalues of A be λ1, …, λn and corresponding eigenvectors V1, …, Vn. These form the columns of P. Since these eigenvectors are assumed to be linearly independent, the dimension of the column space of P is n. Therefore rank(P) = n and P is nonsingular by Theorems 7.15 and 7.26. Now compute P⁻¹AP as follows. First,

    column j of AP = A(column j of P) = AVj = λjVj.

Thus the columns of AP are λ1V1, …, λnVn, and AP has the form

    AP = ( λ1V1   λ2V2   ⋯   λnVn ).

Then

    column j of P⁻¹AP = P⁻¹(column j of AP) = P⁻¹(λjVj) = λjP⁻¹Vj.

But Vj is column j of P, so P⁻¹Vj = column j of P⁻¹P, which is the column matrix with 1 in row j and all other elements zero. Combining the last two equations, we conclude that column j of P⁻¹AP is the column matrix with λj in row j and zeros elsewhere. We now know the columns of P⁻¹AP, and putting them together gives us

    P⁻¹AP = ( λ1   0    ⋯   0  )
            (  0   λ2   ⋯   0  )
            (  ⋮             ⋮  )
            (  0   0    ⋯   λn ).
We can strengthen the conclusions of Theorem 9.4. So far, if A has n linearly independent eigenvectors, then we can diagonalize A. We will now show that this is the only time A can be diagonalized. Further, if, for any Q, Q−1 AQ is a diagonal matrix, then Q must have linearly independent eigenvectors of A as its columns.
THEOREM 9.5
Let A be an n × n diagonalizable matrix. Then A has n linearly independent eigenvectors. Further, if Q⁻¹AQ is a diagonal matrix, then the diagonal elements of Q⁻¹AQ are the eigenvalues of A, and the columns of Q are corresponding eigenvectors.

Proof   Suppose that Q⁻¹AQ = D, the diagonal matrix with main diagonal elements d1, …, dn. Denote column j of Q as Vj. Then V1, …, Vn are linearly independent, because Q is nonsingular. We will show that dj is an eigenvalue of A, with corresponding eigenvector Vj. Write AQ = QD and compute both sides of this product separately. First, since the columns of Q are V1, …, Vn,

    QD = ( d1V1   d2V2   ⋯   dnVn ),

a matrix having djVj as column j. Now compute

    AQ = ( AV1   AV2   ⋯   AVn ),

a matrix having AVj as column j. Since AQ = QD, then column j of AQ equals column j of QD, so

    AVj = djVj,

which proves that dj is an eigenvalue of A with associated eigenvector Vj.

As a consequence of this theorem, we see that not every matrix is diagonalizable.
EXAMPLE 9.10
Let

    B = ( 1  −1 )
        ( 0   1 ).

B has eigenvalues 1, 1, and every eigenvector has the form

    α ( 1 )
      ( 0 ).

There are not two linearly independent eigenvectors, so B is not diagonalizable.

We could also proceed here by contradiction. If B were diagonalizable, then for some P,

    P⁻¹BP = ( 1  0 )
            ( 0  1 ).

From Theorem 9.5, the columns of P must be eigenvectors, so P must have the form

    P = ( α  β )
        ( 0  0 ).

But this matrix is singular (it has zero determinant, and its columns are multiples of each other, hence linearly dependent). Thus no matrix can diagonalize B.

The key to diagonalization of an n × n matrix A is therefore the existence of n linearly independent eigenvectors. We saw (Example 9.9) that this does not require that the eigenvalues be distinct. However, if A does have n distinct eigenvalues, we claim that it must have n linearly independent eigenvectors, hence must be diagonalizable.

THEOREM 9.6
Let the n × n matrix A have n distinct eigenvalues. Then corresponding eigenvectors are linearly independent.

Proof   We will show by induction that any k distinct eigenvalues have associated with them k linearly independent eigenvectors. For k = 1, an eigenvector associated with a single eigenvalue is linearly independent, being a nonzero vector. Now suppose that any k − 1 distinct eigenvalues have associated with them k − 1 linearly independent eigenvectors. Suppose we have distinct
eigenvalues λ1, …, λk. Let V1, …, Vk be associated eigenvectors. We want to show that V1, …, Vk are linearly independent. If these eigenvectors were linearly dependent, there would be numbers c1, …, ck, not all zero, such that

    c1V1 + ⋯ + ckVk = O.

By relabeling if necessary, we may suppose for convenience that c1 ≠ 0. Now

    (λ1In − A)(c1V1 + ⋯ + ckVk) = O
        = c1(λ1In − A)V1 + c2(λ1In − A)V2 + ⋯ + ck(λ1In − A)Vk
        = c1(λ1V1 − AV1) + c2(λ1V2 − AV2) + ⋯ + ck(λ1Vk − AVk)
        = c1(λ1V1 − λ1V1) + c2(λ1V2 − λ2V2) + ⋯ + ck(λ1Vk − λkVk)
        = c2(λ1 − λ2)V2 + ⋯ + ck(λ1 − λk)Vk.

But V2, …, Vk are linearly independent by the inductive hypothesis, so each of these coefficients must be zero. Since λ1 − λj ≠ 0 for j = 2, …, k by the assumption that the eigenvalues are distinct, then

    c2 = ⋯ = ck = 0.

But then c1V1 = O. Since V1 is an eigenvector and cannot be O, then c1 = 0 also, a contradiction. Therefore V1, …, Vk are linearly independent.
COROLLARY 9.1
If an n × n matrix A has n distinct eigenvalues, then A is diagonalizable.
EXAMPLE 9.11
Let

    A = ( −2  0  0   5 )
        (  1  3  0   0 )
        (  0  0  4   0 )
        (  2  0  0  −3 ).

The eigenvalues of A are 3, 4, −5/2 + (1/2)√41, and −5/2 − (1/2)√41. Because these are distinct, A is diagonalizable. For some P,

    P⁻¹AP = ( 3  0         0                 0          )
            ( 0  4         0                 0          )
            ( 0  0  −5/2 + (1/2)√41          0          )
            ( 0  0         0          −5/2 − (1/2)√41   ).

We do not need to actually produce P explicitly to conclude this.
SECTION 9.2   PROBLEMS

In each of Problems 1 through 10, produce a matrix that diagonalizes the given matrix, or show that this matrix is not diagonalizable.

1.  ( 0  −1 )
    ( 4   3 )

2.  ( 5  3 )
    ( 1  3 )

3.  (  1  0 )
    ( −4  1 )

4.  ( −5  3 )
    (  0  9 )

5.  ( 5  0   0 )
    ( 1  0   3 )
    ( 0  0  −2 )

6.  ( 0  0  0 )
    ( 1  0  2 )
    ( 0  1  3 )

7.  ( −2  0   1 )
    (  1  1   0 )
    (  0  0  −2 )

8.  ( 2   0  0 )
    ( 0   2  1 )
    ( 0  −1  2 )

9.  ( 1  0   0   0 )
    ( 0  4   1   0 )
    ( 0  0  −3   1 )
    ( 0  0   1  −2 )

10. ( −2   0   0   0 )
    ( −4  −2   0   0 )
    (  0   0  −2   0 )
    (  0   0   0  −2 )
11. Suppose A² is diagonalizable. Prove that A is diagonalizable.

12. Let A have eigenvalues λ1, …, λn, and suppose P diagonalizes A. Prove that, for any positive integer k,

        Aᵏ = P ( λ1ᵏ           ) P⁻¹.
               (      ⋱       )
               (           λnᵏ )

In each of Problems 13 through 16, compute the indicated power of the matrix, using the idea of Problem 12.

13. A = ( −1   0 ),  A¹⁸
        (  1  −5 )

14. A = ( −3  −3 ),  A¹⁶
        ( −2   4 )

15. A = ( 0  −2 ),  A⁴³
        ( 1   0 )

16. A = ( −2   3 ),  A³¹
        (  3  −4 )
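The idea of Problem 12 — Aᵏ = P Dᵏ P⁻¹, where Dᵏ just raises each diagonal entry to the k-th power — is easy to test numerically. A Python sketch (not from the text) for the matrix of Problem 13; the eigendata used here are our own computation, stated as an assumption:

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[-1, 0], [1, -5]]
# Assumed eigendata for A (ours, not given in the text):
# eigenvalues -1 and -5, with eigenvectors (4, 1) and (0, 1).
P = [[4, 0], [1, 1]]
P_inv = [[0.25, 0.0], [-0.25, 1.0]]

k = 18
Dk = [[(-1) ** k, 0], [0, (-5) ** k]]       # diagonal entries to the power k
via_diag = matmul(matmul(P, Dk), P_inv)     # A^k = P D^k P^{-1}

# Cross-check against k repeated multiplications.
power = [[1, 0], [0, 1]]
for _ in range(k):
    power = matmul(power, A)

ok = all(abs(via_diag[i][j] - power[i][j]) < 1e-6
         for i in range(2) for j in range(2))
print(ok)   # True
```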
9.3 Orthogonal and Symmetric Matrices

Recall that the transpose of a matrix is obtained by interchanging the rows with the columns. For example, if

    A = ( −6   1 )
        (  3  −7 ),

then

    Aᵗ = ( −6   3 )
         (  1  −7 ).

Usually Aᵗ is simply another matrix. However, in the special circumstance that the transpose of a matrix is its inverse, we call A an orthogonal matrix.
DEFINITION 9.5
Orthogonal Matrix
A square matrix A is orthogonal if and only if AAt = At A = In .
An orthogonal matrix is therefore nonsingular, and we find its inverse simply by taking its transpose.
EXAMPLE 9.12
Let

    A = ( 0   1/√5    2/√5 )
        ( 1    0       0   )
        ( 0   2/√5   −1/√5 ).

Then

    AAᵗ = ( 0   1/√5    2/√5 ) ( 0      1     0   )
          ( 1    0       0   ) ( 1/√5   0    2/√5 )  =  I3,
          ( 0   2/√5   −1/√5 ) ( 2/√5   0   −1/√5 )

and a similar calculation gives AᵗA = I3. Therefore this matrix is orthogonal, and

    A⁻¹ = Aᵗ = ( 0      1     0   )
               ( 1/√5   0    2/√5 )
               ( 2/√5   0   −1/√5 ).

Because the transpose of the transpose of a matrix is the original matrix, a matrix is orthogonal exactly when its transpose is orthogonal.

THEOREM 9.7
A is an orthogonal matrix if and only if At is an orthogonal matrix. Orthogonal matrices have several interesting properties. We will show first that the determinant of an orthogonal matrix must be 1 or −1. THEOREM 9.8
If A is an orthogonal matrix, then |A| = ±1.

Proof   Since AAᵗ = In, then |AAᵗ| = 1 = |A||Aᵗ| = |A|².

The next property of orthogonal matrices is actually the rationale for the name orthogonal. A set of vectors in Rⁿ is said to be orthogonal if any two distinct vectors in the set are orthogonal (that is, their dot product is zero). The set is orthonormal if, in addition, each vector has length 1. We claim that the rows of an orthogonal matrix form an orthonormal set of vectors, as do the columns.
This can be seen in the matrix of the last example. The row vectors are

    (0, 1/√5, 2/√5),   (1, 0, 0),   (0, 2/√5, −1/√5).

These each have length 1, and each is orthogonal to each of the other two. Similarly, the columns of that matrix are

    ( 0 )    ( 1/√5 )    (  2/√5 )
    ( 1 )    (  0   )    (   0   )
    ( 0 ),   ( 2/√5 ),   ( −1/√5 ).

Each is orthogonal to the other two, and each has length 1. Not only do the row (column) vectors of an orthogonal matrix form an orthonormal set of vectors in Rⁿ, but this property completely characterizes orthogonal matrices.

THEOREM 9.9
Let A be a real n × n matrix. Then

1. A is orthogonal if and only if the row vectors form an orthonormal set of vectors in Rⁿ.
2. A is orthogonal if and only if the column vectors form an orthonormal set of vectors in Rⁿ.

Proof   Recall that the (i, j) element of AB is the dot product of row i of A with column j of B. Further, the columns of Aᵗ are the rows of A. Therefore,

    (i, j) element of AAᵗ = (row i of A) · (column j of Aᵗ) = (row i of A) · (row j of A).

Now suppose that A is an orthogonal matrix. Then AAᵗ = In, so the (i, j) element of AAᵗ is zero if i ≠ j. Therefore the dot product of two distinct rows of A is zero, and the rows form an orthogonal set of vectors. Further, the dot product of row i with itself is the (i, i) element of AAᵗ, and this is 1, so the rows form an orthonormal set of vectors.

Conversely, suppose the rows of A form an orthonormal set of vectors. Then the dot product of row i with row j is zero if i ≠ j, so the (i, j) element of AAᵗ is zero if i ≠ j. Further, the (i, i) element of AAᵗ is the dot product of row i with itself, and this is 1. Therefore AAᵗ = In. Similarly, AᵗA is In, so A is an orthogonal matrix. This proves (1). A proof of (2) is similar.

We now have a great deal of information about orthogonal matrices. We will use this to completely determine all 2 × 2 orthogonal matrices. Let

    Q = ( a  b )
        ( c  d ).

What do we have to say about a, b, c and d to make this an orthogonal matrix? First, the two row vectors must be orthogonal (zero dot product), and must have length 1, so

    ac + bd = 0,                                                (9.1)
    a² + b² = 1,                                                (9.2)
    c² + d² = 1.                                                (9.3)

The two column vectors must also be orthogonal, so in addition,

    ab + cd = 0.                                                (9.4)
Finally, |Q| = ±1, so

    ad − bc = ±1.

This leads to two cases.

Case 1 — ad − bc = 1. Multiply equation (9.1) by d to get

    acd + bd² = 0.

Substitute ad = 1 + bc into this equation to get

    c(1 + bc) + bd² = 0,

or

    c + b(c² + d²) = 0.

But c² + d² = 1 from equation (9.3), so c + b = 0, hence

    c = −b.

Put this into equation (9.4) to get

    ab − bd = 0.

Then b = 0 or a = d, leading to two subcases.

Case 1(a) — b = 0. Then c = −b = 0 also, so

    Q = ( a  0 )
        ( 0  d ).

But each row vector has length 1, so a² = d² = 1. Further, |Q| = ad = 1 in the present case, so a = d = 1 or a = d = −1. In these cases,

    Q = I2   or   Q = −I2.

Case 1(b) — b ≠ 0. Then a = d, so

    Q = (  a  b )
        ( −b  a ).

Since a² + b² = 1, there is some θ in [0, 2π) such that a = cos θ and b = sin θ. Then

    Q = (  cos θ   sin θ )
        ( −sin θ   cos θ ).

This includes the two results of case 1(a) by choosing θ = 0 or θ = π.

Case 2 — ad − bc = −1. By an analysis similar to that just done, we find now that, for some θ,

    Q = ( cos θ    sin θ )
        ( sin θ   −cos θ ).

These two cases give all the 2 × 2 orthogonal matrices. For example, with θ = π/4 we get the orthogonal matrices

    (  1/√2   1/√2 )        ( 1/√2    1/√2 )
    ( −1/√2   1/√2 )  and   ( 1/√2   −1/√2 ),

and with θ = π/6 we get

    (  √3/2   1/2  )        ( √3/2    1/2  )
    ( −1/2    √3/2 )  and   ( 1/2    −√3/2 ).
We can recognize the orthogonal matrices

    (  cos θ   sin θ )
    ( −sin θ   cos θ )

as rotations in the plane. If the positive (x, y) system is rotated counterclockwise θ radians to form a new (x′, y′) system, the coordinates in the two systems are related by

    ( x′ )   (  cos θ   sin θ ) ( x )
    ( y′ ) = ( −sin θ   cos θ ) ( y ).

We will now consider another kind of matrix that is related to the class of orthogonal matrices.
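Both families above can be spot-checked numerically. A Python sketch (not part of the text) builds the rotation matrix for an arbitrary θ and verifies conditions (9.1)–(9.4) together with |Q| = 1:

```python
import math

def rotation(theta):
    """2x2 rotation matrix [[cos t, sin t], [-sin t, cos t]]."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]

Q = rotation(math.pi / 6)
(a, b), (c, d) = Q

assert abs(a * c + b * d) < 1e-12          # (9.1): rows orthogonal
assert abs(a * a + b * b - 1) < 1e-12      # (9.2): row 1 has length 1
assert abs(c * c + d * d - 1) < 1e-12      # (9.3): row 2 has length 1
assert abs(a * b + c * d) < 1e-12          # (9.4): columns orthogonal
assert abs(a * d - b * c - 1) < 1e-12      # determinant 1

print("Q(pi/6) is orthogonal")
```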
DEFINITION 9.6
Symmetric Matrix
A square matrix is symmetric if A = At .
This means that each aij = aji, or that the matrix elements are the same if reflected across the main diagonal. For example,

    ( −7  −2   1 )
    ( −2   2  −9 )
    (  1  −9   6 )

is symmetric. A symmetric matrix need not have real numbers as elements. However, when it does, it has the remarkable property of having only real eigenvalues.

THEOREM 9.10
The eigenvalues of a real, symmetric matrix are real numbers.

Before showing why this is true, we will review some facts about complex numbers. A complex number z = a + ib has magnitude |z| = √(a² + b²). The conjugate of z is defined to be z̄ = a − ib. When z is represented as the point (a, b) in the plane, z̄ is the point (a, −b), which is the reflection of (a, b) across the x-axis. A number is real exactly when it equals its own conjugate. Further,

    zz̄ = a² + b² = |z|²   and   (z̄)‾ = z.

We take the conjugate Ā of a matrix A by taking the conjugate of each of its elements. The conjugate of a product is the product of the conjugates:

    (AB)‾ = Ā B̄.
Further, the operation of taking the conjugate commutes with the operation of taking a transpose:

    (C̄)ᵗ = (Cᵗ)‾.

For example, if

    C = (    i     3   1 − 2i )
        ( −2 + i   0     4    ),

then

    C̄ = (   −i     3   1 + 2i )
        ( −2 − i   0     4    )

and

    (C̄)ᵗ = (   −i    −2 − i )
           (    3       0   )  =  (Cᵗ)‾.
           ( 1 + 2i     4   )
We will now prove that the eigenvalues of a real symmetric matrix must be real. Proof
Let A be an n × n matrix of real numbers. Let λ be an eigenvalue, and let

    E = ( e1 )
        ( e2 )
        (  ⋮ )
        ( en )

be an associated eigenvector. Then AE = λE. Multiply this equation on the left by the 1 × n matrix

    Ēᵗ = ( ē1  ē2  ⋯  ēn )

to get

    ĒᵗAE = Ēᵗ(λE) = λĒᵗE
         = λ( ē1e1 + ē2e2 + ⋯ + ēnen )
         = λ( |e1|² + |e2|² + ⋯ + |en|² ),                      (9.5)

and |e1|² + |e2|² + ⋯ + |en|² is a real number. Here we are using the standard convention that a 1 × 1 matrix is identified with its single element. Now compute

    (ĒᵗAE)‾ = EᵗĀĒ = EᵗAĒ,                                      (9.6)

in which we have used the fact that A has real elements to write Ā = A.

Now EᵗAĒ is a 1 × 1 matrix, and so is the same as its transpose. Recalling that the transpose of a product is the product of the transposes in the reverse order, take the transpose of the last equation (9.6) to get

    (ĒᵗAE)‾ = (EᵗAĒ)ᵗ = ĒᵗAᵗE = ĒᵗAE,                           (9.7)

since Aᵗ = A by symmetry. From equations (9.6) and (9.7) we have

    (ĒᵗAE)‾ = ĒᵗAE.

Therefore the 1 × 1 matrix ĒᵗAE, being equal to its conjugate, is a real number. Now return to equation (9.5). We have just shown that the left side of this equation is real. Therefore the right side must be real. But |e1|² + |e2|² + ⋯ + |en|² is certainly real. Therefore λ is real, and the theorem is proved.

One ramification of this theorem is that a real, symmetric matrix also has real eigenvectors. We claim that, more than this, eigenvectors from distinct eigenvalues are orthogonal.

THEOREM 9.11
Let A be a real symmetric matrix. Then eigenvectors associated with distinct eigenvalues are orthogonal. Proof
Let λ and μ be distinct eigenvalues with, respectively, eigenvectors

    E = ( e1 )          G = ( g1 )
        (  ⋮ )    and       (  ⋮ )
        ( en )              ( gn ).

Identifying, as usual, a real number with the 1 × 1 matrix having this number as its only element, the dot product of these two n-vectors can be written as a matrix product

    e1g1 + ⋯ + engn = EᵗG.

Since AE = λE and AG = μG, we have

    λEᵗG = (λE)ᵗG = (AE)ᵗG = EᵗAᵗG = EᵗAG = Eᵗ(μG) = μEᵗG.

Then

    (λ − μ)EᵗG = 0.

But λ ≠ μ, so EᵗG = 0 and the dot product of these two eigenvectors is zero. These eigenvectors are therefore orthogonal.
EXAMPLE 9.13
Let

    A = (  3  0  −2 )
        (  0  2   0 )
        ( −2  0   0 ),

a 3 × 3 real symmetric matrix. The eigenvalues are 2, −1, 4, with associated eigenvectors

    ( 0 )    ( 1 )    (  2 )
    ( 1 )    ( 0 )    (  0 )
    ( 0 ),   ( 2 ),   ( −1 ).

These form an orthogonal set of vectors. In this example, the eigenvectors, while orthogonal to each other, are not all of length 1. However, a scalar multiple of an eigenvector is an eigenvector, so we can also write the following eigenvectors of A:

    ( 0 )    ( 1/√5 )    (  2/√5 )
    ( 1 )    (  0   )    (   0   )
    ( 0 ),   ( 2/√5 ),   ( −1/√5 ).

These are still mutually orthogonal (multiplying by a positive scalar does not change orientation), but are now orthonormal. They can therefore be used as columns of an orthogonal matrix

    Q = ( 0   1/√5    2/√5 )
        ( 1    0       0   )
        ( 0   2/√5   −1/√5 ).

These column vectors, being orthogonal to each other, are linearly independent by Theorem 6.14. But whenever we form a matrix from linearly independent eigenvectors of A, this matrix diagonalizes A. Further, since Q is an orthogonal matrix, Q⁻¹ = Qᵗ. Therefore, as we can easily verify in this example,

    Q⁻¹AQ = ( 2   0  0 )
            ( 0  −1  0 )
            ( 0   0  4 ).

The idea we have just illustrated forms the basis for the following result.

THEOREM 9.12
Let A be a real, symmetric matrix. Then there is a real, orthogonal matrix that diagonalizes A.
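Since Q⁻¹ = Qᵗ for an orthogonal matrix, verifying a diagonalization like the one in Example 9.13 costs only two matrix products and a transpose; a Python sketch (not part of the text):

```python
import math

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(X):
    return [list(row) for row in zip(*X)]

s = math.sqrt(5)
A = [[3, 0, -2], [0, 2, 0], [-2, 0, 0]]
Q = [[0, 1 / s, 2 / s], [1, 0, 0], [0, 2 / s, -1 / s]]

D = matmul(matmul(transpose(Q), A), Q)   # Q^t A Q, since Q^{-1} = Q^t
# D should be diag(2, -1, 4) up to round-off.
for i in range(3):
    for j in range(3):
        expected = [2, -1, 4][i] if i == j else 0
        assert abs(D[i][j] - expected) < 1e-12

print("QtAQ = diag(2, -1, 4)")
```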
EXAMPLE 9.14
Let

    A = ( 2   1  0 )
        ( 1  −2  4 )
        ( 0   4  2 ).

The eigenvalues are √21, −√21, and 2, with associated eigenvectors, respectively,

    (    1     )    (     1     )    ( −4 )
    ( √21 − 2  )    ( −√21 − 2  )    (  0 )
    (    4     ),   (     4     ),   (  1 ).

These eigenvectors are mutually orthogonal, but not orthonormal. Divide each eigenvector by its length to get the three new eigenvectors

    (1/α) (    1     )     (1/β) (     1     )     (1/γ) ( −4 )
          ( √21 − 2  ),          ( −√21 − 2  ),          (  0 ),
          (    4     )           (     4     )           (  1 )

where

    α = √(42 − 4√21),   β = √(42 + 4√21),   and   γ = √17.

The orthogonal matrix Q having these normalized eigenvectors as columns diagonalizes A.
SECTION 9.3   PROBLEMS

In each of Problems 1 through 12, find the eigenvalues of the matrix and, for each eigenvalue, a corresponding eigenvector. Check that eigenvectors associated with distinct eigenvalues are orthogonal. Find an orthogonal matrix that diagonalizes the matrix.

1.  (  4  −2 )
    ( −2   1 )

2.  ( −3  5 )
    (  5  4 )

3.  ( 6  1 )
    ( 1  4 )

4.  ( −13  1 )
    (   1  4 )

5.  ( 0   1  0 )
    ( 1  −2  0 )
    ( 0   0  3 )

6.  ( 0  1  1 )
    ( 1  2  0 )
    ( 1  0  2 )

7.  ( 5  0  2 )
    ( 0  0  0 )
    ( 2  0  0 )

8.  (  2  −4  0 )
    ( −4   0  0 )
    (  0   0  0 )

9.  ( 0   0   0 )
    ( 0   0  −2 )
    ( 0  −2   0 )

10. ( 1  3  0 )
    ( 3  0  1 )
    ( 0  1  1 )

11. ( 0   0   0  0 )
    ( 0   1  −2  0 )
    ( 0  −2   1  0 )
    ( 0   0   0  0 )

12. ( 5   0   0  0 )
    ( 0   0  −1  0 )
    ( 0  −1   0  0 )
    ( 0   0   0  0 )
9.4 Quadratic Forms

DEFINITION 9.7
A (complex) quadratic form is an expression

    Σ_{j=1}^{n} Σ_{k=1}^{n} ajk z̄j zk,                          (9.8)

in which each ajk and zj are complex numbers.
For n = 2 this quadratic form is

    a11 z̄1z1 + a12 z̄1z2 + a21 z̄2z1 + a22 z̄2z2.

The terms involving z̄jzk with j ≠ k are the mixed product terms. The quadratic form is real if each ajk and zj is real. In this case we usually write zj as xj. Since x̄j = xj when xj is real, the form (9.8) in this case is

    Σ_{j=1}^{n} Σ_{k=1}^{n} ajk xj xk.

For n = 2, this is

    a11 x1² + (a12 + a21) x1 x2 + a22 x2².

The terms involving x1² and x2² are the squared terms in this real quadratic form, and x1x2 is the mixed product term.

It is often convenient to write a quadratic form (9.8) in matrix form. If A = [ajk] and

    Z = ( z1 )
        ( z2 )
        (  ⋮ )
        ( zn ),

then

    Z̄ᵗAZ = ( z̄1  z̄2  ⋯  z̄n ) ( a11  a12  ⋯  a1n ) ( z1 )
                              ( a21  a22  ⋯  a2n ) ( z2 )
                              (  ⋮              ⋮ ) (  ⋮ )
                              ( an1  an2  ⋯  ann ) ( zn )

          = ( z̄1  ⋯  z̄n ) ( a11z1 + ⋯ + a1nzn )
                           (         ⋮         )
                           ( an1z1 + ⋯ + annzn )

          = a11 z̄1z1 + ⋯ + a1n z̄1zn + ⋯ + an1 z̄nz1 + ⋯ + ann z̄nzn

          = Σ_{j=1}^{n} Σ_{k=1}^{n} ajk z̄j zk.

Similarly, any real quadratic form can be written in matrix form as XᵗAX. Given a quadratic form, we may choose different matrices A such that the form is Z̄ᵗAZ.
EXAMPLE 9.15

Let

    A = ( 1  4 )
        ( 3  2 ).

Then

    ( x1  x2 ) ( 1  4 ) ( x1 )  =  ( x1 + 3x2   4x1 + 2x2 ) ( x1 )
               ( 3  2 ) ( x2 )                              ( x2 )

        = x1² + 3x1x2 + 4x1x2 + 2x2² = x1² + 7x1x2 + 2x2².

But we can also write this quadratic form as

    x1² + (7/2)x1x2 + (7/2)x2x1 + 2x2² = ( x1  x2 ) (  1   7/2 ) ( x1 )
                                                    ( 7/2   2  ) ( x2 ).
The advantage of the latter formulation is that the quadratic form is Xt AX with A a symmetric matrix. There is an expression involving a quadratic form that gives the eigenvalues of a matrix in terms of an associated eigenvector. We will have use for this shortly.
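The symmetrization in Example 9.15 — replacing A by (A + Aᵗ)/2, which leaves xᵗAx unchanged — can be checked numerically at sample points; a Python sketch (not from the text):

```python
def quad_form(A, x):
    """Evaluate x^t A x for a square matrix A and a vector x."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

A = [[1, 4], [3, 2]]
A_sym = [[1, 3.5], [3.5, 2]]    # (A + A^t)/2, as in Example 9.15

for x in ([1.0, 2.0], [-3.0, 0.5], [2.0, -1.0]):
    assert abs(quad_form(A, x) - quad_form(A_sym, x)) < 1e-12

print("x^t A x agrees with the symmetrized form")
```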
LEMMA 9.1

Let A be an n × n matrix of real or complex numbers. Let λ be an eigenvalue with eigenvector Z. Then

    λ = (Z̄ᵗAZ)/(Z̄ᵗZ).

Proof   Since AZ = λZ, then Z̄ᵗAZ = λZ̄ᵗZ.

Using a calculation done in equation (9.5), we can write

    λ = (1 / Σ_{j=1}^{n} |zj|²) Σ_{j=1}^{n} Σ_{k=1}^{n} ajk z̄j zk.
Quadratic forms arise in a variety of contexts. In mechanics, the kinetic energy of a particle is a real quadratic form, and in analytic geometry a conic is the locus of points in the plane for which a quadratic form in the coordinates is equal to some constant. For example,

    (1/4)x1² + x2² = 9

is the equation of an ellipse in the x1, x2 plane. In some problems involving quadratic forms, calculations are simplified if we transform from the x1, x2, …, xn coordinate system to a y1, y2, …, yn system in which there are no mixed product terms. That is, we want to choose y1, …, yn so that

    Σ_{j=1}^{n} Σ_{k=1}^{n} ajk xj xk = Σ_{j=1}^{n} λj yj².      (9.9)

The y1, …, yn coordinates are called principal axes for the quadratic form.
This kind of transformation is commonly done in analytic geometry, where a rotation of axes is used to eliminate mixed product terms in the equation of a conic. For example, the change of variables

    x1 = (1/√2)y1 + (1/√2)y2,
    x2 = (1/√2)y1 − (1/√2)y2

transforms the quadratic form x1² − 2x1x2 + x2² to 2y2², with no mixed product term. Using this transformed form, we could analyze the graph of x1² − 2x1x2 + x2² = 4 in the x1, x2 system in terms of the graph of y2² = 2 in the y1, y2 system. In the y1, y2 plane it is clear that the graph consists of two horizontal straight lines y2 = ±√2.

We will now show that a transformation that eliminates the mixed product terms of a real quadratic form always exists.

THEOREM 9.13
Principal Axis Theorem
Let A be a real symmetric matrix with eigenvalues λ1, …, λn. Let Q be an orthogonal matrix that diagonalizes A. Then the change of variables X = QY transforms Σ_{j=1}^n Σ_{k=1}^n ajk xj xk to

    Σ_{j=1}^{n} λj yj².

Proof   The proof is a straightforward calculation:

    Σ_{j=1}^{n} Σ_{k=1}^{n} ajk xj xk = XᵗAX
        = (QY)ᵗA(QY)
        = Yᵗ(QᵗAQ)Y
        = ( y1  ⋯  yn ) ( λ1            ) ( y1 )
                        (      ⋱       ) (  ⋮ )
                        (            λn ) ( yn )
        = λ1y1² + ⋯ + λnyn².

The expression λ1y1² + ⋯ + λnyn² is called the standard form of the quadratic form XᵗAX.
EXAMPLE 9.16
Consider again x1² − 2x1x2 + x2². This is XᵗAX with

    A = (  1  −1 )
        ( −1   1 ).

The eigenvalues of A are 0 and 2, with corresponding eigenvectors

    ( 1 )        (  1 )
    ( 1 )  and   ( −1 ).

Dividing each eigenvector by its length, we obtain the eigenvectors

    ( 1/√2 )        (  1/√2 )
    ( 1/√2 )  and   ( −1/√2 ).

These form the columns of an orthogonal matrix Q that diagonalizes A:

    Q = ( 1/√2    1/√2 )
        ( 1/√2   −1/√2 ).

The transformation defined by X = QY is

    ( x1 )   ( 1/√2    1/√2 ) ( y1 )
    ( x2 ) = ( 1/√2   −1/√2 ) ( y2 ),

which gives exactly the transformation used above to reduce the quadratic form x1² − 2x1x2 + x2² to the standard form 2y2².
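The reduction in Example 9.16 can be verified by evaluating both sides of (9.9) at sample points under X = QY; a Python sketch (not part of the text):

```python
import math

r = 1 / math.sqrt(2)
Q = [[r, r], [r, -r]]     # orthonormal eigenvector columns from Example 9.16

def original_form(x1, x2):
    return x1 * x1 - 2 * x1 * x2 + x2 * x2

def standard_form(y1, y2):
    return 2 * y2 * y2    # eigenvalues 0 and 2: 0*y1^2 + 2*y2^2

samples = [(1.0, 0.0), (0.5, -2.0), (3.0, 1.5)]
max_diff = max(
    abs(original_form(Q[0][0] * y1 + Q[0][1] * y2,
                      Q[1][0] * y1 + Q[1][1] * y2)
        - standard_form(y1, y2))
    for y1, y2 in samples
)
print("largest discrepancy:", max_diff)
```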
EXAMPLE 9.17
Analyze the conic 4x1² − 3x1x2 + 2x2² = 8. First write the quadratic form as XᵗAX = 8, where

    A = (   4    −3/2 )
        ( −3/2     2  ).

The eigenvalues of A are (6 ± √13)/2. By the principal axis theorem there is an orthogonal matrix Q that transforms the equation of the conic to standard form:

    ((6 + √13)/2) y1² + ((6 − √13)/2) y2² = 8.

This is an ellipse in the y1, y2 plane. Figure 9.2 shows a graph of this ellipse.

FIGURE 9.2
SECTION 9.4   PROBLEMS
In each of Problems 1 through 6, find a matrix A such that the quadratic form is XᵗAX.

1. x1² + 2x1x2 + 6x2²
2. 3x1² + 3x2² − 4x1x2 − 3x1x3 + 2x2x3 + x3²
3. x1² − 4x1x2 + x2²
4. 2x1² − x2² + 2x1x2
5. −x1² + x4² − 2x1x4 + 3x2x4 − x1x3 + 4x2x3
6. x1² − x2² − x1x3 + 4x2x3

In Problems 7 through 13, find the standard form of the quadratic form.

7. −5x1² + 4x1x2 + 3x2²
8. 4x1² − 12x1x2 + x2²
9. −3x1² + 4x1x2 + 7x2²
10. 4x1² − 4x1x2 + x2²
11. −6x1x2 + 4x2²
12. 5x1² + 4x1x2 + 2x2²
13. −2x1x2 + 2x3²

In each of Problems 14 through 18, use the principal axis theorem to analyze the conic.

14. x1² − 2x1x2 + 4x2² = 6
15. 3x1² + 5x1x2 − 3x2² = 5
16. −2x1² + 3x2² + x1x2 = 5
17. 4x1² − 4x2² + 6x1x2 = 8
18. 6x1² + 2x1x2 + 5x2² = 14

In each of Problems 19 through 22, write the quadratic form defined by the matrix.

19. ( −2  1 )
    (  1  6 )

20. ( 14  −3  0 )
    ( −3   2  1 )
    (  0   1  7 )

21. (  6   1  −7 )
    (  1   2   0 )
    ( −7   0   1 )

22. (  7   1  −2 )
    (  1   0  −1 )
    ( −2  −1   3 )

23. Give an example of a real, 3 × 3 matrix that cannot be the coefficient matrix of a real quadratic form.
Unitary, Hermitian, and Skew Hermitian Matrices If U is a nonsingular complex matrix, then U−1 exists and is generally also a complex matrix. We claim that the operations of taking the complex conjugate and of taking a matrix inverse can be performed in either order. LEMMA 9.2
$\overline{U}^{\,-1} = \overline{U^{-1}}$.

Proof  We know that the conjugate of a product is the product of the conjugates, so
$$I_n = \overline{I_n} = \overline{UU^{-1}} = \overline{U}\,\overline{U^{-1}}$$
This implies that $\overline{U^{-1}}$ is the inverse of $\overline{U}$. ∎

Now define a matrix to be unitary if the inverse of its conjugate (or the conjugate of its inverse) is equal to its transpose.
DEFINITION 9.8  Unitary Matrix

An $n \times n$ complex matrix $U$ is unitary if and only if $\overline{U}^{\,-1} = U^t$.

This condition is equivalent to saying that $U\overline{U}^{\,t} = I_n$.
EXAMPLE 9.18

Let
$$U = \begin{pmatrix} i/\sqrt{2} & 1/\sqrt{2} \\ -i/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}$$
Then $U$ is unitary because
$$U\overline{U}^{\,t} = \begin{pmatrix} i/\sqrt{2} & 1/\sqrt{2} \\ -i/\sqrt{2} & 1/\sqrt{2} \end{pmatrix} \begin{pmatrix} -i/\sqrt{2} & i/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
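The same multiplication can be checked numerically (a sketch using NumPy, added for illustration); the conjugate transpose of a complex array is `U.conj().T`:

```python
import numpy as np

# The matrix of Example 9.18
s = 1.0 / np.sqrt(2.0)
U = np.array([[1j * s, s],
              [-1j * s, s]])

# U is unitary: U times the conjugate transpose of U is the identity
assert np.allclose(U @ U.conj().T, np.eye(2))
```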
If $U$ is a real matrix, then the unitary condition $U\overline{U}^{\,t} = I_n$ becomes $UU^t = I_n$, which makes $U$ an orthogonal matrix. Unitary matrices are the complex analogues of orthogonal matrices. Since the rows (or columns) of an orthogonal matrix form an orthonormal set of vectors, we will develop the complex analogue of the concept of orthonormality.

Recall that, for two vectors $\langle x_1, \dots, x_n \rangle$ and $\langle y_1, \dots, y_n \rangle$ in $R^n$, we can define the column matrices
$$X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \quad\text{and}\quad Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$$
and obtain the dot product $X \cdot Y$ as $X^t Y$. In particular, this gives the square of the length of $X$ as
$$X^t X = x_1^2 + x_2^2 + \cdots + x_n^2$$
To generalize this to the complex case, suppose we have complex $n$-vectors $\langle z_1, z_2, \dots, z_n \rangle$ and $\langle w_1, w_2, \dots, w_n \rangle$. Form the column matrices
$$Z = \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{pmatrix} \quad\text{and}\quad W = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix}$$
It is tempting to define the dot product of these complex vectors as $Z^t W$. The problem with this is that then we get
$$Z^t Z = z_1^2 + z_2^2 + \cdots + z_n^2$$
and this will in general be complex. We want to interpret the dot product of a vector with itself as the square of its length, and this should be a nonnegative real number. We get around this by defining the dot product of complex $Z$ and $W$ to be
$$Z \cdot W = \overline{Z}^{\,t} W = \overline{z}_1 w_1 + \overline{z}_2 w_2 + \cdots + \overline{z}_n w_n$$
In this way the dot product of $Z$ with itself is
$$\overline{Z}^{\,t} Z = \overline{z}_1 z_1 + \overline{z}_2 z_2 + \cdots + \overline{z}_n z_n = |z_1|^2 + |z_2|^2 + \cdots + |z_n|^2$$
a nonnegative real number. With this as background, we will define the complex analogue of an orthonormal set of vectors.
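This conjugated dot product is what `numpy.vdot` computes (it conjugates its first argument); a small illustration of why the naive product $Z^t Z$ fails while $\overline{Z}^{\,t} Z$ gives a real length:

```python
import numpy as np

z = np.array([1 + 1j, 2 - 1j])

# The naive product z^t z is complex in general
naive = np.sum(z * z)

# The conjugated dot product gives |z1|^2 + |z2|^2, a real number
length_sq = np.vdot(z, z)   # conjugates the first argument

assert naive != length_sq
assert np.isclose(length_sq.real, abs(1 + 1j)**2 + abs(2 - 1j)**2)
assert np.isclose(length_sq.imag, 0.0)
```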
DEFINITION 9.9
Unitary System of Vectors
Complex $n$-vectors $F_1, \dots, F_r$ form a unitary system if $F_j \cdot F_k = 0$ for $j \neq k$, and each $F_j \cdot F_j = 1$.
If each Fj has all real components, then this corresponds exactly to an orthonormal set of vectors in Rn . We can now state the analogue of Theorem 9.9 for unitary matrices. THEOREM 9.14
Let $U$ be an $n \times n$ complex matrix. Then $U$ is unitary if and only if its row vectors form a unitary system. The proof is like that of Theorem 9.9 and is left to the student. It is not difficult to show that $U$ is also unitary if and only if its column vectors form a unitary system.
EXAMPLE 9.19

Consider again
$$U = \begin{pmatrix} i/\sqrt{2} & 1/\sqrt{2} \\ -i/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}$$
The row vectors, written as $2 \times 1$ matrices, are
$$F_1 = \begin{pmatrix} i/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix} \quad\text{and}\quad F_2 = \begin{pmatrix} -i/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}$$
Then
$$F_1 \cdot F_2 = \begin{pmatrix} -i/\sqrt{2} & 1/\sqrt{2} \end{pmatrix} \begin{pmatrix} -i/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix} = -\frac{1}{2} + \frac{1}{2} = 0$$
$$F_1 \cdot F_1 = \begin{pmatrix} -i/\sqrt{2} & 1/\sqrt{2} \end{pmatrix} \begin{pmatrix} i/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix} = \frac{1}{2} + \frac{1}{2} = 1$$
and
$$F_2 \cdot F_2 = \begin{pmatrix} i/\sqrt{2} & 1/\sqrt{2} \end{pmatrix} \begin{pmatrix} -i/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix} = \frac{1}{2} + \frac{1}{2} = 1$$
so the row vectors of $U$ form a unitary system.
We will show that the eigenvalues of a unitary matrix must lie on the unit circle in the complex plane.

THEOREM 9.15

Let $\lambda$ be an eigenvalue of the unitary matrix $U$. Then $|\lambda| = 1$.

Proof  Let $E$ be an eigenvector associated with $\lambda$. Then $UE = \lambda E$, so $\overline{U}\,\overline{E} = \overline{\lambda}\,\overline{E}$. Then
$$\left(\overline{U}\,\overline{E}\right)^t = \overline{\lambda}\,\overline{E}^{\,t}$$
so
$$\overline{E}^{\,t}\,\overline{U}^{\,t} = \overline{\lambda}\,\overline{E}^{\,t}$$
But $U$ is unitary, so $\overline{U}^{\,t} = U^{-1}$, and
$$\overline{E}^{\,t}\,U^{-1} = \overline{\lambda}\,\overline{E}^{\,t}$$
Multiply both sides of this equation on the right by $UE$ to get
$$\overline{E}^{\,t} E = \overline{\lambda}\,\overline{E}^{\,t}\,UE = \overline{\lambda}\lambda\,\overline{E}^{\,t} E$$
Now $\overline{E}^{\,t} E$ is the dot product of the eigenvector with itself, and so is a positive number. Dividing the last equation by $\overline{E}^{\,t} E$ gives $\lambda\overline{\lambda} = 1$. But then $|\lambda|^2 = 1$, so $|\lambda| = 1$. ∎

We have defined a matrix to be unitary if its transpose is the conjugate of its inverse. A matrix is hermitian if its transpose is equal to its conjugate. If the transpose equals the negative of its conjugate, the matrix is called skew-hermitian.
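Theorem 9.15 is easy to see numerically for the unitary matrix of Example 9.18 (an illustrative sketch, not from the text):

```python
import numpy as np

# The unitary matrix of Example 9.18
s = 1.0 / np.sqrt(2.0)
U = np.array([[1j * s, s],
              [-1j * s, s]])

# Every eigenvalue of a unitary matrix has magnitude 1 (Theorem 9.15),
# i.e. the eigenvalues lie on the unit circle in the complex plane
eigvals = np.linalg.eigvals(U)
assert np.allclose(np.abs(eigvals), 1.0)
```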
DEFINITION 9.10

1. Hermitian Matrix  An $n \times n$ complex matrix $H$ is hermitian if and only if $\overline{H} = H^t$.
2. Skew-Hermitian Matrix  An $n \times n$ complex matrix $S$ is skew-hermitian if and only if $\overline{S} = -S^t$.

In the case that $H$ has real elements, hermitian is the same as symmetric, because in this case $\overline{H} = H$.
EXAMPLE 9.20

Let
$$H = \begin{pmatrix} 15 & 8i & 6-2i \\ -8i & 0 & -4+i \\ 6+2i & -4-i & -3 \end{pmatrix}$$
Then
$$\overline{H} = \begin{pmatrix} 15 & -8i & 6+2i \\ 8i & 0 & -4-i \\ 6-2i & -4+i & -3 \end{pmatrix} = H^t$$
so $H$ is hermitian. If
$$S = \begin{pmatrix} 0 & 8i & 2i \\ 8i & 0 & 4i \\ 2i & 4i & 0 \end{pmatrix}$$
then $S$ is skew-hermitian because
$$\overline{S} = \begin{pmatrix} 0 & -8i & -2i \\ -8i & 0 & -4i \\ -2i & -4i & 0 \end{pmatrix} = -S^t$$
The following theorem says something about quadratic forms with hermitian or skew-hermitian matrices.

THEOREM 9.16

Let
$$Z = \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{pmatrix}$$
be a complex $n \times 1$ matrix. Then

1. If $H$ is hermitian, then $\overline{Z}^{\,t} H Z$ is real.
2. If $S$ is skew-hermitian, then $\overline{Z}^{\,t} S Z$ is zero or pure imaginary.

Proof  For (1), suppose $H$ is hermitian. Then $\overline{H} = H^t$, so
$$\overline{\overline{Z}^{\,t} H Z} = Z^t\,\overline{H}\,\overline{Z}$$
But $\overline{Z}^{\,t} H Z$ is a $1 \times 1$ matrix and so equals its own transpose. Continuing from the last equation, we have
$$Z^t\,\overline{H}\,\overline{Z} = \left(Z^t\,\overline{H}\,\overline{Z}\right)^t = \overline{Z}^{\,t}\,\overline{H}^{\,t} Z = \overline{Z}^{\,t} H Z$$
using the fact that $\overline{H}^{\,t} = H$. Therefore
$$\overline{\overline{Z}^{\,t} H Z} = \overline{Z}^{\,t} H Z$$
Since $\overline{Z}^{\,t} H Z$ equals its own conjugate, then $\overline{Z}^{\,t} H Z$ is real.

To prove (2), suppose $S$ is skew-hermitian. Then $\overline{S} = -S^t$. By an argument like that done in the proof of (1), we get
$$\overline{\overline{Z}^{\,t} S Z} = -\overline{Z}^{\,t} S Z$$
Now write $\overline{Z}^{\,t} S Z = \alpha + i\beta$. The last equation becomes
$$\alpha - i\beta = -\alpha - i\beta$$
Then $\alpha = -\alpha$, so $\alpha = 0$ and $\overline{Z}^{\,t} S Z$ is zero or pure imaginary. ∎

Using these results on quadratic forms, we can say something about the eigenvalues of hermitian and skew-hermitian matrices.

THEOREM 9.17
1. The eigenvalues of a hermitian matrix are real.
2. The eigenvalues of a skew-hermitian matrix are zero or pure imaginary.

Proof  For (1), let $\lambda$ be an eigenvalue of the hermitian matrix $H$, with associated eigenvector $E$. By Lemma 9.1,
$$\lambda = \frac{\overline{E}^{\,t} H E}{\overline{E}^{\,t} E}$$
But by (1) of the preceding theorem, the numerator of this quotient is real. The denominator is the square of the length of $E$, and so is also real. Therefore $\lambda$ is real.

For (2), let $\lambda$ be an eigenvalue of the skew-hermitian matrix $S$, with associated eigenvector $E$. Again by Lemma 9.1,
$$\lambda = \frac{\overline{E}^{\,t} S E}{\overline{E}^{\,t} E}$$
By (2) of the preceding theorem, the numerator of this quotient is either zero or pure imaginary. Since the denominator is a positive real number, then $\lambda$ is either zero or pure imaginary. ∎

Figure 9.3 shows a graphical representation of the conclusions of Theorems 9.15 and 9.17. When plotted as points in the complex plane, eigenvalues of a unitary matrix lie on the unit circle about the origin, eigenvalues of a hermitian matrix lie on the horizontal (real) axis, and eigenvalues of a skew-hermitian matrix lie on the vertical (imaginary) axis.

[Figure 9.3  Eigenvalue locations: unitary matrices have eigenvalues of magnitude 1 (on the unit circle), hermitian matrices have real eigenvalues (on the real axis), and skew-hermitian matrices have pure imaginary eigenvalues (on the imaginary axis).]
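These eigenvalue locations are easy to confirm numerically for the matrices of Example 9.20 (an added NumPy sketch):

```python
import numpy as np

H = np.array([[15, 8j, 6 - 2j],
              [-8j, 0, -4 + 1j],
              [6 + 2j, -4 - 1j, -3]])
S = np.array([[0, 8j, 2j],
              [8j, 0, 4j],
              [2j, 4j, 0]])

# Hermitian eigenvalues lie on the real axis ...
h_eigs = np.linalg.eigvals(H)
assert np.allclose(h_eigs.imag, 0.0, atol=1e-9)

# ... while skew-hermitian eigenvalues lie on the imaginary axis
s_eigs = np.linalg.eigvals(S)
assert np.allclose(s_eigs.real, 0.0, atol=1e-9)
```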
SECTION 9.5  PROBLEMS

In each of Problems 1 through 9, determine whether the matrix is unitary, hermitian, skew-hermitian, or none of these. Find the eigenvalues of each matrix and an associated eigenvector for each eigenvalue. Determine which matrices are diagonalizable. If a matrix is diagonalizable, produce a matrix that diagonalizes it.

1. $\begin{pmatrix} 0 & 2i \\ 2i & 4 \end{pmatrix}$
2. $\begin{pmatrix} 3 & 4i \\ 4i & -5 \end{pmatrix}$
3. $\begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 1-i \\ 0 & -1-i & 0 \end{pmatrix}$
4. $\begin{pmatrix} 1/\sqrt{2} & i/\sqrt{2} & 0 \\ -1/\sqrt{2} & i/\sqrt{2} & 0 \\ 0 & 0 & 1 \end{pmatrix}$
5. $\begin{pmatrix} 3 & 2 & 0 \\ 2 & 0 & i \\ 0 & -i & 0 \end{pmatrix}$
6. $\begin{pmatrix} -1 & 0 & 3-i \\ 0 & 1 & 0 \\ 3+i & 0 & 0 \end{pmatrix}$
7. $\begin{pmatrix} i & 1 & 0 \\ -1 & 0 & 2i \\ 0 & 2i & 0 \end{pmatrix}$
8. $\begin{pmatrix} 3i & 0 & 0 \\ -1 & 0 & i \\ 0 & -i & 0 \end{pmatrix}$
9. $\begin{pmatrix} 8 & -1 & i \\ -1 & 0 & 0 \\ -i & 0 & 0 \end{pmatrix}$

10. Let $A$ be unitary, hermitian, or skew-hermitian. Prove that $A\overline{A}^{\,t} = \overline{A}^{\,t}A$.
11. Prove that the main diagonal elements of a skew-hermitian matrix must be zero or pure imaginary.
12. Prove that the main diagonal elements of a hermitian matrix must be real.
13. Prove that the product of two unitary matrices is unitary.
PART 3  Systems of Differential Equations and Qualitative Methods

CHAPTER 10  Systems of Linear Differential Equations
CHAPTER 11  Qualitative Methods and Systems of Nonlinear Differential Equations

We will now use matrices to study systems of differential equations. These arise, for example, in modeling mechanical and electrical systems having more than one component. We will separate our study of systems of differential equations into two chapters. The first, Chapter 10, is devoted to systems of linear differential equations. For these, powerful matrix methods can be brought to bear to write solutions. For systems of nonlinear differential equations, for which we usually cannot write explicit solutions, we must develop a different set of tools designed to determine qualitative properties of solutions. This is done in Chapter 11.
CHAPTER 10  Systems of Linear Differential Equations

Theory of Systems of Linear First-Order Differential Equations · Solution of X′ = AX when A Is Constant · Solution of X′ = AX + G

Before beginning to study linear systems, recall from Section 2.6.4 that a linear differential equation of order $n$ always gives rise to a system of $n$ first-order linear differential equations, in such a way that the solution of the system gives the solution of the original $n$th-order equation. Systems can be treated using matrix techniques, which are now at our disposal. For this reason we did not spend time on differential equations of order higher than 2 in Part 1. We will assume familiarity with vectors in $R^n$, matrix algebra, determinants, and eigenvalues and eigenvectors. These can be reviewed as needed from Part 2. We begin by laying the foundations for the use of matrices to solve linear systems of differential equations.
10.1  Theory of Systems of Linear First-Order Differential Equations

In this chapter we will consider systems of $n$ first-order linear differential equations in $n$ unknown functions:
$$x_1'(t) = a_{11}(t)x_1(t) + a_{12}(t)x_2(t) + \cdots + a_{1n}(t)x_n(t) + g_1(t)$$
$$x_2'(t) = a_{21}(t)x_1(t) + a_{22}(t)x_2(t) + \cdots + a_{2n}(t)x_n(t) + g_2(t)$$
$$\vdots$$
$$x_n'(t) = a_{n1}(t)x_1(t) + a_{n2}(t)x_2(t) + \cdots + a_{nn}(t)x_n(t) + g_n(t)$$
Let
$$A(t) = \begin{pmatrix} a_{11}(t) & a_{12}(t) & \cdots & a_{1n}(t) \\ a_{21}(t) & a_{22}(t) & \cdots & a_{2n}(t) \\ \vdots & \vdots & & \vdots \\ a_{n1}(t) & a_{n2}(t) & \cdots & a_{nn}(t) \end{pmatrix}$$
$$X(t) = \begin{pmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_n(t) \end{pmatrix} \quad\text{and}\quad G(t) = \begin{pmatrix} g_1(t) \\ g_2(t) \\ \vdots \\ g_n(t) \end{pmatrix}$$
Differentiate a matrix by differentiating each element, so
$$X'(t) = \begin{pmatrix} x_1'(t) \\ x_2'(t) \\ \vdots \\ x_n'(t) \end{pmatrix}$$
Matrix differentiation follows the "normal" rules we learn in calculus. For example,
$$\left[X(t)Y(t)\right]' = X'(t)Y(t) + X(t)Y'(t)$$
in which the order of the factors must be maintained. Now the system of differential equations is
$$X'(t) = A(t)X(t) + G(t) \tag{10.1}$$
or
$$X' = AX + G$$
This system is nonhomogeneous if $G(t) \neq O$ for at least some $t$, in which $O$ denotes the $n \times 1$ zero matrix. If $G(t) = O$ for all the relevant values of $t$, then the system is homogeneous, and we write just
$$X' = AX$$
A solution of $X' = AX + G$ is any $n \times 1$ matrix of functions that satisfies this matrix equation.
EXAMPLE 10.1

The $2 \times 2$ system
$$x_1' = 3x_1 + 3x_2 + 8$$
$$x_2' = x_1 + 5x_2 + 4e^{3t}$$
can be written
$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}' = \begin{pmatrix} 3 & 3 \\ 1 & 5 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} 8 \\ 4e^{3t} \end{pmatrix}$$
One solution is
$$X(t) = \begin{pmatrix} 3e^{2t} + e^{6t} - 4e^{3t} - \frac{10}{3} \\ -e^{2t} + e^{6t} + \frac{2}{3} \end{pmatrix}$$
as can be verified by substitution into the system. In terms of individual components, this solution is
$$x_1(t) = 3e^{2t} + e^{6t} - 4e^{3t} - \frac{10}{3}$$
$$x_2(t) = -e^{2t} + e^{6t} + \frac{2}{3}$$
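The "verify by substitution" step can also be done numerically (an illustrative NumPy sketch, not part of the text): differentiate the claimed solution term by term and check that $X' = AX + G$ holds at sample times.

```python
import numpy as np

# Coefficient matrix of the system of Example 10.1
A = np.array([[3.0, 3.0],
              [1.0, 5.0]])

def x(t):
    # The claimed solution X(t)
    return np.array([3*np.exp(2*t) + np.exp(6*t) - 4*np.exp(3*t) - 10/3,
                     -np.exp(2*t) + np.exp(6*t) + 2/3])

def x_prime(t):
    # Derivative of the claimed solution, computed term by term
    return np.array([6*np.exp(2*t) + 6*np.exp(6*t) - 12*np.exp(3*t),
                     -2*np.exp(2*t) + 6*np.exp(6*t)])

# Check X' = A X + G at several values of t
for t in np.linspace(-1.0, 1.0, 9):
    g = np.array([8.0, 4*np.exp(3*t)])
    assert np.allclose(x_prime(t), A @ x(t) + g)
```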
On February 20, 1962, John Glenn became the first American to orbit the Earth. His flight lasted nearly five hours and included three complete circuits of the globe. This and subsequent Mercury orbitings paved the way for the space shuttle program, which now includes shuttles launched from the NASA Kennedy Space Center to carry out experiments under zero gravity, as well as delivery of personnel and equipment to the developing international space station. Ultimate goals of space shuttle missions include studying how humans function in a zero-gravity environment over extended periods of time, scientific observations of phenomena in space and on our own planet, and the commercial development of space. Computation of orbits and forces involved in shuttle missions involves the solution of large systems of differential equations.
Initial conditions for the system (10.1) have the form
$$X(t_0) = \begin{pmatrix} x_1(t_0) \\ x_2(t_0) \\ \vdots \\ x_n(t_0) \end{pmatrix} = X^0$$
in which $X^0$ is a given $n \times 1$ matrix of constants. The initial value problem we will consider for systems is the problem:
$$X' = AX + G; \quad X(t_0) = X^0 \tag{10.2}$$
This is analogous to the initial value problem
$$x' = ax + g; \quad x(t_0) = x_0$$
for single first-order equations. Theorem 1.3 gave criteria for existence and uniqueness of solutions of this initial value problem. The analogous result for the initial value problem (10.2) is given by the following.

THEOREM 10.1  Existence and Uniqueness

Let $I$ be an open interval containing $t_0$. Suppose each $a_{ij}(t)$ and $g_j(t)$ are continuous on $I$. Let $X^0$ be a given $n \times 1$ matrix of real numbers. Then the initial value problem
$$X' = AX + G; \quad X(t_0) = X^0$$
has a unique solution defined for all $t$ in $I$.

EXAMPLE 10.2

Consider the initial value problem
$$x_1' = x_1 + tx_2 + \cos(t)$$
$$x_2' = t^3 x_1 - e^t x_2 + 1 - t$$
$$x_1(0) = 2, \quad x_2(0) = -5$$
This is the system
$$X' = \begin{pmatrix} 1 & t \\ t^3 & -e^t \end{pmatrix} X + \begin{pmatrix} \cos(t) \\ 1-t \end{pmatrix}$$
with
$$X(0) = \begin{pmatrix} 2 \\ -5 \end{pmatrix}$$
This initial value problem has a unique solution defined for all real $t$, because each $a_{ij}(t)$ and $g_j(t)$ are continuous for all real $t$.

We will now determine what we must look for to find all solutions of $X' = AX + G$. This will involve a program that closely parallels that for the single first-order equation $x' = ax + g$, beginning with the homogeneous case.
10.1.1  Theory of the Homogeneous System X′ = AX

We begin with the homogeneous system $X' = AX$. Because solutions of $X' = AX$ are $n \times 1$ matrices of real functions, these solutions have an algebraic structure, and we can form linear combinations (finite sums of scalar multiples of solutions). In the homogeneous case, any linear combination of solutions is again a solution.

THEOREM 10.2

Let $\Phi_1, \dots, \Phi_k$ be solutions of $X' = AX$, all defined on some open interval $I$. Let $c_1, \dots, c_k$ be any real numbers. Then the linear combination $c_1\Phi_1 + \cdots + c_k\Phi_k$ is also a solution of $X' = AX$, defined on $I$.

Proof  Compute
$$\left(c_1\Phi_1 + \cdots + c_k\Phi_k\right)' = c_1\Phi_1' + \cdots + c_k\Phi_k' = c_1 A\Phi_1 + \cdots + c_k A\Phi_k = A\left(c_1\Phi_1 + \cdots + c_k\Phi_k\right) \qquad\blacksquare$$

Because of this, the set of all solutions of $X' = AX$ has the structure of a vector space, called the solution space of this system. It is not necessary to have a background in vector spaces to follow the discussion of solutions of $X' = AX$ that we are about to develop. However, for those who do have this background we will make occasional reference to show how ideas fit into this algebraic framework.

In a linear combination $c_1\Phi_1 + \cdots + c_k\Phi_k$ of solutions, any $\Phi_j$ that is already a linear combination of the other solutions is unnecessary. For example, suppose $\Phi_1 = a_2\Phi_2 + \cdots + a_k\Phi_k$. Then
$$c_1\Phi_1 + \cdots + c_k\Phi_k = c_1\left(a_2\Phi_2 + \cdots + a_k\Phi_k\right) + c_2\Phi_2 + \cdots + c_k\Phi_k = \left(c_1 a_2 + c_2\right)\Phi_2 + \cdots + \left(c_1 a_k + c_k\right)\Phi_k$$
In this case any linear combination of $\Phi_1, \Phi_2, \dots, \Phi_k$ is actually a linear combination of just $\Phi_2, \dots, \Phi_k$, and $\Phi_1$ is not needed. $\Phi_1$ is redundant in the sense that, if we have $\Phi_2, \dots, \Phi_k$, then we have $\Phi_1$ also. We describe this situation by saying that the functions $\Phi_1, \Phi_2, \dots, \Phi_k$ are linearly dependent. If no one of the functions is a linear combination of the others, then these functions are called linearly independent.

DEFINITION 10.1

Linear Dependence  Solutions $\Phi_1, \Phi_2, \dots, \Phi_k$ of $X' = AX$, defined on an interval $I$, are linearly dependent on $I$ if one solution is a linear combination of the others on this interval.

Linear Independence  Solutions $\Phi_1, \Phi_2, \dots, \Phi_k$ of $X' = AX$, defined on an interval $I$, are linearly independent on $I$ if no solution in this list is a linear combination of the others on this interval.

Thus a set of solutions is linearly independent if it is not linearly dependent. Linear dependence of functions is a stronger condition than linear dependence of vectors. For vectors in $R^n$, $V_1$ is a linear combination of $V_2$ and $V_3$ if $V_1 = aV_2 + bV_3$ for some real numbers $a$ and $b$. In this case $V_1, V_2, V_3$ are linearly dependent. But for solutions $\Phi_1, \Phi_2, \Phi_3$ of $X' = AX$, $\Phi_1$ is a linear combination of $\Phi_2$ and $\Phi_3$ if there are numbers $a$ and $b$ such that
$$\Phi_1(t) = a\Phi_2(t) + b\Phi_3(t)$$
for all $t$ in the relevant interval, perhaps the entire real line. It is not enough to have this condition hold for just some values of $t$.
EXAMPLE 10.3

Consider the system
$$X' = \begin{pmatrix} 1 & -4 \\ 1 & 5 \end{pmatrix} X$$
It is routine to check that
$$\Phi_1(t) = \begin{pmatrix} -2e^{3t} \\ e^{3t} \end{pmatrix} \quad\text{and}\quad \Phi_2(t) = \begin{pmatrix} (1-2t)e^{3t} \\ te^{3t} \end{pmatrix}$$
are solutions, defined for all real values of $t$. These solutions are linearly independent on the entire real line, since neither is a constant multiple of the other (for all real $t$). The function
$$\Phi_3(t) = \begin{pmatrix} (11-6t)e^{3t} \\ (-4+3t)e^{3t} \end{pmatrix}$$
is also a solution. However, $\Phi_1, \Phi_2, \Phi_3$ are linearly dependent, because, for all real $t$,
$$\Phi_3(t) = -4\Phi_1(t) + 3\Phi_2(t)$$
This means that $\Phi_3$ is a linear combination of $\Phi_1$ and $\Phi_2$, and the list of solutions $\Phi_1, \Phi_2, \Phi_3$, although longer, carries no more information about the solution of $X' = AX$ than the list of solutions $\Phi_1, \Phi_2$.

If $\Phi$ is a solution of $X' = AX$, then $\Phi$ is an $n \times 1$ column matrix of functions:
$$\Phi(t) = \begin{pmatrix} f_1(t) \\ f_2(t) \\ \vdots \\ f_n(t) \end{pmatrix}$$
For any choice of $t$, say $t = t_0$, this is an $n \times 1$ matrix of real numbers which can be thought of as a vector in $R^n$. This point of view, and some facts about determinants, provides us with a test for linear independence of solutions of $X' = AX$. The following theorem reduces the question of linear independence of $n$ solutions of $X' = AX$ to a question of whether an $n \times n$ determinant of real numbers is nonzero.
THEOREM 10.3  Test for Linear Independence of Solutions

Suppose that
$$\Phi_1(t) = \begin{pmatrix} \varphi_{11}(t) \\ \varphi_{21}(t) \\ \vdots \\ \varphi_{n1}(t) \end{pmatrix}, \quad \Phi_2(t) = \begin{pmatrix} \varphi_{12}(t) \\ \varphi_{22}(t) \\ \vdots \\ \varphi_{n2}(t) \end{pmatrix}, \quad \dots, \quad \Phi_n(t) = \begin{pmatrix} \varphi_{1n}(t) \\ \varphi_{2n}(t) \\ \vdots \\ \varphi_{nn}(t) \end{pmatrix}$$
are solutions of $X' = AX$ on an open interval $I$. Let $t_0$ be any number in $I$. Then

1. $\Phi_1, \Phi_2, \dots, \Phi_n$ are linearly independent on $I$ if and only if $\Phi_1(t_0), \dots, \Phi_n(t_0)$ are linearly independent, when considered as vectors in $R^n$.
2. $\Phi_1, \Phi_2, \dots, \Phi_n$ are linearly independent on $I$ if and only if
$$\begin{vmatrix} \varphi_{11}(t_0) & \varphi_{12}(t_0) & \cdots & \varphi_{1n}(t_0) \\ \varphi_{21}(t_0) & \varphi_{22}(t_0) & \cdots & \varphi_{2n}(t_0) \\ \vdots & \vdots & & \vdots \\ \varphi_{n1}(t_0) & \varphi_{n2}(t_0) & \cdots & \varphi_{nn}(t_0) \end{vmatrix} \neq 0$$

Conclusion (2) is an effective test for linear independence of $n$ solutions of $X' = AX$ on an open interval. Evaluate each solution at some point $t_0$ of the interval. Each $\Phi_j(t_0)$ is an $n \times 1$ (constant) column matrix. Evaluate the determinant of the $n \times n$ matrix having these columns. If this determinant is nonzero, then the solutions are linearly independent; if it is zero, they are linearly dependent.

Another way of looking at (2) of this theorem is that it reduces a question of linear independence of $n$ solutions of $X' = AX$ to a question of linear independence of $n$ vectors in $R^n$. This is because the determinant in (2) is nonzero exactly when its row (or column) vectors are linearly independent.
EXAMPLE 10.4

From the preceding example,
$$\Phi_1(t) = \begin{pmatrix} -2e^{3t} \\ e^{3t} \end{pmatrix} \quad\text{and}\quad \Phi_2(t) = \begin{pmatrix} (1-2t)e^{3t} \\ te^{3t} \end{pmatrix}$$
are solutions of
$$X' = \begin{pmatrix} 1 & -4 \\ 1 & 5 \end{pmatrix} X$$
on the entire real line, which is an open interval. Evaluate these solutions at some convenient point, say $t = 0$:
$$\Phi_1(0) = \begin{pmatrix} -2 \\ 1 \end{pmatrix} \quad\text{and}\quad \Phi_2(0) = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$$
Use these as columns of a $2 \times 2$ matrix and evaluate its determinant:
$$\begin{vmatrix} -2 & 1 \\ 1 & 0 \end{vmatrix} = -1 \neq 0$$
Therefore $\Phi_1$ and $\Phi_2$ are linearly independent solutions.

Proof  A proof of Theorem 10.3 makes use of the uniqueness of solutions of the initial value problem (Theorem 10.1). For (1), let $t_0$ be any point in $I$. Suppose first that $\Phi_1, \dots, \Phi_n$ are linearly dependent on $I$. Then one of the solutions is a linear combination of the others. By reordering if necessary, say $\Phi_1$ is a linear combination of $\Phi_2, \dots, \Phi_n$. Then there are numbers $c_2, \dots, c_n$ so that
$$\Phi_1(t) = c_2\Phi_2(t) + \cdots + c_n\Phi_n(t)$$
368
CHAPTER 10
Systems of Linear Differential Equations
for all t in I. In particular, 1 t0 = c2 2 t0 + · · · + cn n t0 This implies that the vectors 1 t0 n t0 are linearly dependent vectors in Rn . Conversely, suppose that 1 t0 n t0 are linearly dependent in Rn . Then one of these vectors is a linear combination of the others. Again, as a convenience, suppose 1 t0 is a linear combination of 2 t0 n t0 . Then there are numbers c2 cn such that 1 t0 = c2 2 t0 + · · · + cn n t0 Define t = 1 t − c2 2 t − · · · − cn n t for all t in I. Then is a linear combination of solutions of X = AX, hence is a solution. Further ⎛ ⎞ 0 ⎜ 0 ⎟ ⎜ ⎟ t0 = ⎜ ⎟ ⎝ ⎠ 0 Therefore, on I, is a solution of the initial value problem X = AX But the zero function
Xt0 = O ⎛
⎜ ⎜ t = ⎜ ⎝
0 0
⎞ ⎟ ⎟ ⎟ ⎠
0 is also a solution of this initial value problem. Since this initial value problem has a unique solution, then for all t in I, ⎛ ⎞ 0 ⎜ 0 ⎟ ⎜ ⎟ t = t = ⎜ ⎟ ⎝ ⎠ 0 Therefore
⎛ ⎜ ⎜ t = 1 t − c2 2 t − · · · − cn n t = ⎜ ⎝
0 0
⎞ ⎟ ⎟ ⎟ ⎠
0 for all t in I, which means that 1 t = c2 2 t + · · · + cn n t for all t in I. Therefore 1 is a linear combination of 2 n , hence 1 2 n are linearly dependent on I.
Conclusion (2) follows from (1) and the fact that $n$ vectors in $R^n$ are linearly independent if and only if the determinant of the $n \times n$ matrix having these vectors as columns is nonzero. ∎

Thus far we know how to test $n$ solutions of $X' = AX$ for linear independence, if $A$ is $n \times n$. We will now show that $n$ linearly independent solutions are enough to determine all solutions of $X' = AX$ on an open interval $I$. We saw a result like this previously when it was found that two linearly independent solutions of $y'' + p(x)y' + q(x)y = 0$ determine all solutions of this equation.

THEOREM 10.4

Let $A = [a_{ij}(t)]$ be an $n \times n$ matrix of functions that are continuous on an open interval $I$. Then

1. The system $X' = AX$ has $n$ linearly independent solutions defined on $I$.
2. Given any $n$ linearly independent solutions $\Phi_1, \dots, \Phi_n$ defined on $I$, every solution on $I$ is a linear combination of $\Phi_1, \dots, \Phi_n$.

By (2), every solution of $X' = AX$, defined on $I$, must be of the form $c_1\Phi_1 + c_2\Phi_2 + \cdots + c_n\Phi_n$. For this reason, this linear combination, with $\Phi_1, \dots, \Phi_n$ any $n$ linearly independent solutions, is called the general solution of $X' = AX$ on $I$.

Proof  To prove that there are $n$ linearly independent solutions, define the $n \times 1$ constant matrices
$$E_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad E_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad \dots, \quad E_n = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}$$
Pick any $t_0$ in $I$. We know from Theorem 10.1 that the initial value problem
$$X' = AX; \quad X(t_0) = E_j$$
has a unique solution $\Phi_j$ defined on $I$, for $j = 1, 2, \dots, n$. These solutions are linearly independent by Theorem 10.3, because, the way the initial conditions were chosen, the $n \times n$ matrix whose columns are these solutions evaluated at $t_0$ is $I_n$, with determinant 1. This proves (1).

To prove (2), suppose now that $\Phi_1, \dots, \Phi_n$ are any $n$ linearly independent solutions of $X' = AX$, defined on $I$. Let $\Phi$ be any solution. We want to prove that $\Phi$ is a linear combination of $\Phi_1, \dots, \Phi_n$. Pick any $t_0$ in $I$. We will first show that there are numbers $c_1, \dots, c_n$ such that
$$\Phi(t_0) = c_1\Phi_1(t_0) + \cdots + c_n\Phi_n(t_0)$$
Now $\Phi(t_0)$, and each $\Phi_j(t_0)$, is an $n \times 1$ column matrix of constants. Form the $n \times n$ matrix $S$ using $\Phi_1(t_0), \dots, \Phi_n(t_0)$ as its columns, and consider the system of $n$ linear algebraic equations in $n$ unknowns
$$S \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix} = \Phi(t_0) \tag{10.3}$$
The columns of $S$ are linearly independent vectors in $R^n$, because $\Phi_1, \dots, \Phi_n$ are linearly independent. Therefore $S$ is nonsingular, and the system (10.3) has a unique solution. This solution gives constants $c_1, \dots, c_n$ such that
$$\Phi(t_0) = c_1\Phi_1(t_0) + \cdots + c_n\Phi_n(t_0)$$
We now claim that
$$\Phi(t) = c_1\Phi_1(t) + \cdots + c_n\Phi_n(t)$$
for all $t$ in $I$. But observe that $\Phi$ and $c_1\Phi_1 + \cdots + c_n\Phi_n$ are both solutions of the initial value problem
$$X' = AX; \quad X(t_0) = \Phi(t_0)$$
Since this problem has a unique solution, then $\Phi(t) = c_1\Phi_1(t) + \cdots + c_n\Phi_n(t)$ for all $t$ in $I$, and the proof is complete. ∎

In the language of linear algebra, the solution space of $X' = AX$ has dimension $n$, the order of the coefficient matrix $A$. Any $n$ linearly independent solutions form a basis for this vector space.
EXAMPLE 10.5

Previously we saw that
$$\Phi_1(t) = \begin{pmatrix} -2e^{3t} \\ e^{3t} \end{pmatrix} \quad\text{and}\quad \Phi_2(t) = \begin{pmatrix} (1-2t)e^{3t} \\ te^{3t} \end{pmatrix}$$
are linearly independent solutions of
$$X' = \begin{pmatrix} 1 & -4 \\ 1 & 5 \end{pmatrix} X$$
Because $A$ is $2 \times 2$ and we have 2 linearly independent solutions, the general solution of this system is
$$\Phi(t) = c_1 \begin{pmatrix} -2e^{3t} \\ e^{3t} \end{pmatrix} + c_2 \begin{pmatrix} (1-2t)e^{3t} \\ te^{3t} \end{pmatrix}$$
The expression on the right contains every solution of this system. In terms of components,
$$x_1(t) = -2c_1 e^{3t} + c_2(1-2t)e^{3t}$$
$$x_2(t) = c_1 e^{3t} + c_2 te^{3t}$$

We will now make a useful observation. In the last example, form a $2 \times 2$ matrix $\Omega(t)$ having $\Phi_1(t)$ and $\Phi_2(t)$ as columns:
$$\Omega(t) = \begin{pmatrix} -2e^{3t} & (1-2t)e^{3t} \\ e^{3t} & te^{3t} \end{pmatrix}$$
Now observe that, if $C = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}$, then
$$\Omega(t)C = \begin{pmatrix} -2e^{3t} & (1-2t)e^{3t} \\ e^{3t} & te^{3t} \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} -2c_1 e^{3t} + c_2(1-2t)e^{3t} \\ c_1 e^{3t} + c_2 te^{3t} \end{pmatrix} = c_1\Phi_1(t) + c_2\Phi_2(t)$$
The point is that we can write the general solution $c_1\Phi_1 + c_2\Phi_2$ compactly as $\Omega(t)C$, with $\Omega(t)$ a square matrix having the independent solutions as columns, and $C$ a column matrix of arbitrary constants. We call a matrix formed in this way a fundamental matrix for the system $X' = AX$. In terms of this fundamental matrix, the general solution is $X(t) = \Omega(t)C$.

We can see that $\Omega(t)C$ satisfies the matrix differential equation $X' = AX$. Recall that we differentiate a matrix by differentiating each element of the matrix. Then, because $C$ is a constant matrix,
$$\left[\Omega(t)C\right]' = \Omega'(t)C = \begin{pmatrix} -6e^{3t} & (1-6t)e^{3t} \\ 3e^{3t} & (1+3t)e^{3t} \end{pmatrix} C$$
Now compute
$$A\left[\Omega(t)C\right] = \begin{pmatrix} 1 & -4 \\ 1 & 5 \end{pmatrix} \begin{pmatrix} -2e^{3t} & (1-2t)e^{3t} \\ e^{3t} & te^{3t} \end{pmatrix} C = \begin{pmatrix} -2e^{3t} - 4e^{3t} & (1-2t)e^{3t} - 4te^{3t} \\ -2e^{3t} + 5e^{3t} & (1-2t)e^{3t} + 5te^{3t} \end{pmatrix} C = \begin{pmatrix} -6e^{3t} & (1-6t)e^{3t} \\ 3e^{3t} & (1+3t)e^{3t} \end{pmatrix} C$$
Therefore $\left[\Omega(t)C\right]' = A\left[\Omega(t)C\right]$, as occurs if $\Omega(t)C$ is a solution of $X' = AX$.

DEFINITION 10.2

$\Omega$ is a fundamental matrix for the $n \times n$ system $X' = AX$ if the columns of $\Omega$ are linearly independent solutions of this system.

Writing the general solution of $X' = AX$ as $X(t) = \Omega(t)C$ is particularly convenient for solving initial value problems.
EXAMPLE 10.6

Solve the initial value problem
$$X' = \begin{pmatrix} 1 & -4 \\ 1 & 5 \end{pmatrix} X; \quad X(0) = \begin{pmatrix} -2 \\ 3 \end{pmatrix}$$
We know from Example 10.5 that the general solution is $X(t) = \Omega(t)C$, where
$$\Omega(t) = \begin{pmatrix} -2e^{3t} & (1-2t)e^{3t} \\ e^{3t} & te^{3t} \end{pmatrix}$$
We need to choose $C$ so that
$$X(0) = \Omega(0)C = \begin{pmatrix} -2 \\ 3 \end{pmatrix}$$
Putting $t = 0$ into $\Omega$, we must solve the algebraic system
$$\begin{pmatrix} -2 & 1 \\ 1 & 0 \end{pmatrix} C = \begin{pmatrix} -2 \\ 3 \end{pmatrix}$$
The solution is
$$C = \begin{pmatrix} -2 & 1 \\ 1 & 0 \end{pmatrix}^{-1} \begin{pmatrix} -2 \\ 3 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} -2 \\ 3 \end{pmatrix} = \begin{pmatrix} 3 \\ 4 \end{pmatrix}$$
The unique solution of the initial value problem is therefore
$$\Phi(t) = \Omega(t)\begin{pmatrix} 3 \\ 4 \end{pmatrix} = \begin{pmatrix} -2e^{3t} - 8te^{3t} \\ 3e^{3t} + 4te^{3t} \end{pmatrix}$$
In this example $\Omega(0)^{-1}$ could be found by linear algebra methods (Sections 7.8.1 and 8.7) or by using a software package.
10.1.2  General Solution of the Nonhomogeneous System X′ = AX + G

Solutions of the nonhomogeneous system $X' = AX + G$ do not have the algebraic structure of a vector space, because linear combinations of solutions are not solutions. However, we will show that the general solution of this system (an expression containing all possible solutions) is the sum of the general solution of the homogeneous system $X' = AX$ and any particular solution of the nonhomogeneous system. This is completely analogous to Theorem 2.5 for the second-order equation $y'' + p(x)y' + q(x)y = f(x)$.

THEOREM 10.5

Let $\Omega$ be a fundamental matrix for $X' = AX$, and let $\Psi_p$ be any solution of $X' = AX + G$. Then the general solution of $X' = AX + G$ is $X = \Omega C + \Psi_p$, in which $C$ is an $n \times 1$ matrix of arbitrary constants.

Proof  First, $\Omega C + \Psi_p$ is a solution of the nonhomogeneous system, because
$$\left(\Omega C + \Psi_p\right)' = \left(\Omega C\right)' + \Psi_p' = A\Omega C + \left(A\Psi_p + G\right) = A\left(\Omega C + \Psi_p\right) + G$$
Now let $\Psi$ be any solution of $X' = AX + G$. We claim that $\Psi - \Psi_p$ is a solution of $X' = AX$. To see this, calculate
$$\left(\Psi - \Psi_p\right)' = \Psi' - \Psi_p' = \left(A\Psi + G\right) - \left(A\Psi_p + G\right) = A\left(\Psi - \Psi_p\right)$$
Since $\Omega C$ is the general solution of $X' = AX$, there is a constant $n \times 1$ matrix $K$ such that $\Psi - \Psi_p = \Omega K$. Then $\Psi = \Omega K + \Psi_p$, completing the proof. ∎

We now know what to look for in solving a system of $n$ linear first-order differential equations in $n$ unknown functions. For the homogeneous system $X' = AX$, we look for $n$ linearly independent solutions to form a fundamental matrix $\Omega(t)$. For the nonhomogeneous system $X' = AX + G$, we first find the general solution $\Omega C$ of $X' = AX$, and any particular solution $\Psi_p$ of $X' = AX + G$. The general solution of $X' = AX + G$ is then $\Omega C + \Psi_p$. This is an overall strategy. Now we need ways of implementing it and actually producing fundamental matrices and particular solutions for given systems.
SECTION 10.1  PROBLEMS

In each of Problems 1 through 5, (a) verify that the given functions satisfy the system, (b) form a fundamental matrix $\Omega(t)$ for the system, (c) write the general solution in the form $\Omega(t)C$, carry out this product, and verify that the rows of $\Omega(t)C$ are the components of the given solution, and (d) find the unique solution satisfying the initial conditions.

1. $x_1' = 5x_1 + 3x_2$, $x_2' = x_1 + 3x_2$;
   $x_1(t) = -c_1 e^{2t} + 3c_2 e^{6t}$, $x_2(t) = c_1 e^{2t} + c_2 e^{6t}$;
   $x_1(0) = -2$, $x_2(0) = 1$

2. $x_1' = x_1 - x_2$, $x_2' = 4x_1 + 2x_2$;
   $x_1(t) = 2e^{3t/2}\left[c_1\cos\left(\frac{\sqrt{15}\,t}{2}\right) + c_2\sin\left(\frac{\sqrt{15}\,t}{2}\right)\right]$,
   $x_2(t) = c_1 e^{3t/2}\left[-\cos\left(\frac{\sqrt{15}\,t}{2}\right) + \sqrt{15}\sin\left(\frac{\sqrt{15}\,t}{2}\right)\right] - c_2 e^{3t/2}\left[\sin\left(\frac{\sqrt{15}\,t}{2}\right) + \sqrt{15}\cos\left(\frac{\sqrt{15}\,t}{2}\right)\right]$;
   $x_1(0) = -2$, $x_2(0) = 7$

3. $x_1' = 2x_1 + x_2$, $x_2' = -3x_1 + 6x_2$;
   $x_1(t) = c_1 e^{4t}\cos(t) + c_2 e^{4t}\sin(t)$,
   $x_2(t) = c_1 e^{4t}\left[2\cos(t) - \sin(t)\right] + c_2 e^{4t}\left[\cos(t) + 2\sin(t)\right]$;
   $x_1(0) = 0$, $x_2(0) = 4$

4. $x_1' = 3x_1 + 8x_2$, $x_2' = x_1 - x_2$;
   $x_1(t) = 4c_1 e^{(1+2\sqrt{3})t} + 4c_2 e^{(1-2\sqrt{3})t}$,
   $x_2(t) = \left(-1+\sqrt{3}\right)c_1 e^{(1+2\sqrt{3})t} + \left(-1-\sqrt{3}\right)c_2 e^{(1-2\sqrt{3})t}$;
   $x_1(0) = 2$, $x_2(0) = 2$

5. $x_1' = 5x_1 - 4x_2 + 4x_3$, $x_2' = 12x_1 - 11x_2 + 12x_3$, $x_3' = 4x_1 - 4x_2 + 5x_3$;
   $x_1(t) = c_1 e^t + c_3 e^{-3t}$, $x_2(t) = c_2 e^t + 3c_3 e^{-3t}$, $x_3(t) = (c_2 - c_1)e^t + c_3 e^{-3t}$;
   $x_1(0) = 1$, $x_2(0) = -3$, $x_3(0) = 5$
10.2  Solution of X′ = AX when A Is Constant

Consider the system $X' = AX$, with $A$ an $n \times n$ matrix of real numbers. In the case of the single equation $y' = ay$, with $a$ constant, we get exponential solutions $y = ce^{at}$. This suggests we try a similar solution for the system. Try $X = \xi e^{\lambda t}$, with $\xi$ an $n \times 1$ matrix of constants to be determined, and $\lambda$ a number to be determined. Substitute this proposed solution into the differential equation to get
$$\lambda\xi e^{\lambda t} = A\xi e^{\lambda t}$$
This requires that
$$A\xi = \lambda\xi$$
We should therefore choose $\lambda$ as an eigenvalue of $A$, and $\xi$ as an associated eigenvector. We will summarize this discussion.
THEOREM 10.6

Let $A$ be an $n \times n$ matrix of real numbers. Then $\xi e^{\lambda t}$ is a nontrivial solution of $X' = AX$ if and only if $\lambda$ is an eigenvalue of $A$, with $\xi$ an associated eigenvector.

We need $n$ linearly independent solutions to form a fundamental matrix. We will have these if we can find $n$ linearly independent eigenvectors, whether or not some eigenvalues may be repeated.

THEOREM 10.7

Let $A$ be an $n \times n$ matrix of real numbers. Suppose $A$ has eigenvalues $\lambda_1, \dots, \lambda_n$, and suppose there are associated eigenvectors $\xi_1, \dots, \xi_n$ that are linearly independent. Then
$$\xi_1 e^{\lambda_1 t}, \dots, \xi_n e^{\lambda_n t}$$
are linearly independent solutions of $X' = AX$, on the entire real line.

Proof  We know that each $\xi_j e^{\lambda_j t}$ is a nontrivial solution. The question is whether these solutions are linearly independent. Form the $n \times n$ matrix having these solutions, evaluated at $t = 0$, as its columns. This matrix has $n$ linearly independent columns $\xi_1, \dots, \xi_n$, and therefore has a nonzero determinant. By Theorem 10.3(2), $\xi_1 e^{\lambda_1 t}, \dots, \xi_n e^{\lambda_n t}$ are linearly independent on the real line. ∎
EXAMPLE 10.7

Consider the system
$$X' = \begin{pmatrix} 4 & 2 \\ 3 & 3 \end{pmatrix} X$$
$A$ has eigenvalues 1 and 6, with corresponding eigenvectors
$$\begin{pmatrix} 1 \\ -\frac{3}{2} \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$
These eigenvectors are linearly independent (originating from distinct eigenvalues), so we have two linearly independent solutions,
$$\begin{pmatrix} 1 \\ -\frac{3}{2} \end{pmatrix} e^t \quad\text{and}\quad \begin{pmatrix} 1 \\ 1 \end{pmatrix} e^{6t}$$
We can write the general solution as
$$X(t) = c_1 \begin{pmatrix} 1 \\ -\frac{3}{2} \end{pmatrix} e^t + c_2 \begin{pmatrix} 1 \\ 1 \end{pmatrix} e^{6t}$$
Equivalently, we can write the fundamental matrix
$$\Omega(t) = \begin{pmatrix} e^t & e^{6t} \\ -\frac{3}{2}e^t & e^{6t} \end{pmatrix}$$
In terms of $\Omega$, the general solution is $X(t) = \Omega(t)C$. In terms of components,
$$x_1(t) = c_1 e^t + c_2 e^{6t}$$
$$x_2(t) = -\frac{3}{2}c_1 e^t + c_2 e^{6t}$$
Solve the system
⎛
5 X = ⎝ 12 4
−4 −11 −4
⎞ 4 12 ⎠ X 5
The eigenvalues of A are −3 1 1. Even though one eigenvalue is repeated, A has three linearly independent eigenvectors. They are: ⎛ ⎞ 1 ⎝ 3 ⎠ associated with eigenvalue − 3 1 and
⎛
⎞ 1 ⎝ 1 ⎠ 0
⎛ and
⎞ −1 ⎝ 0 ⎠ associated with 1 1
This gives us three linearly independent solutions ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ 1 1 −1 ⎝ 3 ⎠ e−3t ⎝ 1 ⎠ et ⎝ 0 ⎠ et 1 0 1 A fundamental matrix is
⎛
e−3t ⎝ 3e−3t
t = e−3t
et et 0
⎞ −et 0 ⎠ et
The general solution is Xt = tC.
EXAMPLE 10.9 A Mixing Problem
Two tanks are connected by a series of pipes, as shown in Figure 10.1. Tank 1 initially contains 20 liters of water in which 150 grams of chlorine are dissolved. Tank 2 initially contains 50 grams of chlorine dissolved in 10 liters of water.
CHAPTER 10
Systems of Linear Differential Equations Pure water: 3 liters/min
FIGURE 10.1  Pure water enters tank 1 at 3 liters/min; mixture flows from tank 2 to tank 1 at 3 liters/min and from tank 1 to tank 2 at 4 liters/min; mixture is discharged from tank 1 at 2 liters/min and from tank 2 at 1 liter/min.
Beginning at time t = 0, pure water is pumped into tank 1 at a rate of 3 liters per minute, while chlorine/water solutions are interchanged between the tanks and also flow out of both tanks at the rates shown. The problem is to determine the amount of chlorine in each tank at any time t > 0.

At the given rates of input and discharge of solutions, the amount of solution in each tank remains constant. Therefore the concentration of chlorine in each tank should, in the long run, approach that of the input, which is pure water. We will use this observation as a check of the analysis we are about to do.

Let x_j(t) be the number of grams of chlorine in tank j at time t. Reading from Figure 10.1,

rate of change of x_1(t) = x_1′(t) = rate in minus rate out
  = 3 (liter/min) · 0 (gram/liter) + 3 (liter/min) · (x_2/10) (gram/liter)
    − 2 (liter/min) · (x_1/20) (gram/liter) − 4 (liter/min) · (x_1/20) (gram/liter)
  = −(6/20)x_1 + (3/10)x_2.

Similarly, with the dimensions excluded,

x_2′(t) = 4(x_1/20) − 3(x_2/10) − 1 · (x_2/10) = (1/5)x_1 − (2/5)x_2.

The system is X′ = AX, with

A =
| −3/10    3/10 |
|  1/5    −2/5  |

The initial conditions are x_1(0) = 150, x_2(0) = 50, or X(0) = (150, 50).
The eigenvalues of A are −1/10 and −3/5, and corresponding eigenvectors are, respectively,

(3/2, 1)  and  (−1, 1).

These are linearly independent, and we can write the fundamental matrix

Ω(t) =
| (3/2)e^{−t/10}   −e^{−3t/5} |
|      e^{−t/10}    e^{−3t/5} |

The general solution is X(t) = Ω(t)C. To solve the initial value problem, we must find C so that

X(0) = (150, 50) = Ω(0)C =
| 3/2   −1 |
|  1     1 |
C.

Then

C =
| 3/2   −1 |^{−1}  | 150 |     |  2/5   2/5 |  | 150 |     |  80 |
|  1     1 |       |  50 |  =  | −2/5   3/5 |  |  50 |  =  | −30 |

The solution of the initial value problem is

X(t) =
| (3/2)e^{−t/10}   −e^{−3t/5} |  |  80 |     | 120e^{−t/10} + 30e^{−3t/5} |
|      e^{−t/10}    e^{−3t/5} |  | −30 |  =  |  80e^{−t/10} − 30e^{−3t/5} |

Notice that x_1(t) → 0 and x_2(t) → 0 as t → ∞, as we expected.
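The closed-form answer can be cross-checked (an illustration, not from the text) by integrating the mixing-tank equations numerically, here with a hand-rolled classical RK4 stepper, and comparing against 120e^{−t/10} + 30e^{−3t/5} and 80e^{−t/10} − 30e^{−3t/5}.

```python
import math

# Right-hand side of the mixing-tank system:
#   x1' = -(3/10)x1 + (3/10)x2,   x2' = (1/5)x1 - (2/5)x2
def f(x):
    x1, x2 = x
    return (-0.3 * x1 + 0.3 * x2, 0.2 * x1 - 0.4 * x2)

def rk4(x, h, steps):
    """Classical fourth-order Runge-Kutta, autonomous 2-d system."""
    for _ in range(steps):
        k1 = f(x)
        k2 = f((x[0] + h/2 * k1[0], x[1] + h/2 * k1[1]))
        k3 = f((x[0] + h/2 * k2[0], x[1] + h/2 * k2[1]))
        k4 = f((x[0] + h * k3[0], x[1] + h * k3[1]))
        x = (x[0] + h/6 * (k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
             x[1] + h/6 * (k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))
    return x

t = 5.0
num = rk4((150.0, 50.0), 0.01, 500)          # integrate from 0 to 5
exact = (120 * math.exp(-t/10) + 30 * math.exp(-3*t/5),
         80 * math.exp(-t/10) - 30 * math.exp(-3*t/5))
assert abs(num[0] - exact[0]) < 1e-6
assert abs(num[1] - exact[1]) < 1e-6
```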
10.2.1
Solution of X = AX when A has Complex Eigenvalues
Consider a system X′ = AX. If A is a real matrix, the characteristic polynomial of A has real coefficients. It may, however, have some complex roots. Suppose λ = α + iβ is a complex eigenvalue, with eigenvector ξ. Then Aξ = λξ. Taking complex conjugates, and using the fact that A has real elements,

Aξ̄ = λ̄ξ̄.

This means that λ̄ = α − iβ is also an eigenvalue, with eigenvector ξ̄. Consequently ξe^{λt} and ξ̄e^{λ̄t} can be used as two of the n linearly independent solutions needed to form a fundamental matrix. The resulting fundamental matrix will contain some complex entries. There is nothing wrong with this. However, sometimes it is convenient to have a fundamental matrix with only real entries. We will show how to replace these two columns, involving complex numbers, with two other linearly independent solutions involving only real quantities. This can be done for any pair of columns arising from a pair of complex conjugate eigenvalues.
THEOREM 10.8
Let A be an n × n real matrix. Let α + iβ be a complex eigenvalue with corresponding eigenvector U + iV, in which U and V are real n × 1 matrices. Then

e^{αt}(U cos(βt) − V sin(βt))  and  e^{αt}(U sin(βt) + V cos(βt))

are real, linearly independent solutions of X′ = AX.
EXAMPLE 10.10
Solve the system X′ = AX, with

A =
| 2   0    1 |
| 0  −2   −2 |
| 0   2    0 |

The eigenvalues are 2, −1 + √3 i, −1 − √3 i. Corresponding eigenvectors are, respectively,

(1, 0, 0),  (1, −2√3 i, −3 + √3 i),  (1, 2√3 i, −3 − √3 i).

One solution is

(1, 0, 0)e^{2t},

and two other solutions are

(1, −2√3 i, −3 + √3 i)e^{(−1+√3 i)t}  and  (1, 2√3 i, −3 − √3 i)e^{(−1−√3 i)t}.

These three solutions are linearly independent and can be used as columns of a fundamental matrix

Ω_1(t) =
| e^{2t}   e^{(−1+√3 i)t}                e^{(−1−√3 i)t}                |
| 0        −2√3 i e^{(−1+√3 i)t}         2√3 i e^{(−1−√3 i)t}          |
| 0        (−3 + √3 i)e^{(−1+√3 i)t}     (−3 − √3 i)e^{(−1−√3 i)t}     |

However, we can also produce a real fundamental matrix as follows. First write

(1, −2√3 i, −3 + √3 i) = (1, 0, −3) + i(0, −2√3, √3) = U + iV,

with

U = (1, 0, −3)  and  V = (0, −2√3, √3).

Then

(1, −2√3 i, −3 + √3 i)e^{(−1+√3 i)t} = (U + iV)(e^{−t} cos(√3 t) + ie^{−t} sin(√3 t))
  = Ue^{−t} cos(√3 t) − Ve^{−t} sin(√3 t) + i(Ve^{−t} cos(√3 t) + Ue^{−t} sin(√3 t)),   (10.4)

and

(1, 2√3 i, −3 − √3 i)e^{(−1−√3 i)t} = (U − iV)(e^{−t} cos(√3 t) − ie^{−t} sin(√3 t))
  = Ue^{−t} cos(√3 t) − Ve^{−t} sin(√3 t) − i(Ve^{−t} cos(√3 t) + Ue^{−t} sin(√3 t)).   (10.5)

The functions (10.4) and (10.5) are solutions, so any linear combination of these is also a solution. Taking their sum and dividing by 2 yields the solution

Φ_1(t) = Ue^{−t} cos(√3 t) − Ve^{−t} sin(√3 t),

and taking their difference and dividing by 2i yields the solution

Φ_2(t) = Ve^{−t} cos(√3 t) + Ue^{−t} sin(√3 t).

Using these, together with the solution found from the eigenvalue 2, we can form the fundamental matrix

Ω_2(t) =
| e^{2t}   e^{−t} cos(√3 t)                         e^{−t} sin(√3 t)                        |
| 0        2√3 e^{−t} sin(√3 t)                     −2√3 e^{−t} cos(√3 t)                   |
| 0        e^{−t}(−3 cos(√3 t) − √3 sin(√3 t))      e^{−t}(√3 cos(√3 t) − 3 sin(√3 t))      |

Either fundamental matrix can be used to write the general solution, X(t) = Ω_1(t)C or X(t) = Ω_2(t)K. However, the latter involves only real numbers and real-valued functions.

A proof of the theorem follows the reasoning of the example, and is left to the student.
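Theorem 10.8 can be illustrated numerically (a sketch, not from the text): the real solution e^{−t}(U cos(√3 t) − V sin(√3 t)) built from U = (1, 0, −3) and V = (0, −2√3, √3) should satisfy X′ = AX, which we check with a centered finite difference.

```python
import math

# Matrix of Example 10.10 and the real/imaginary parts of its eigenvector
# for the eigenvalue -1 + sqrt(3) i.
s3 = math.sqrt(3.0)
A = [[2, 0, 1], [0, -2, -2], [0, 2, 0]]
U = [1.0, 0.0, -3.0]
V = [0.0, -2 * s3, s3]

def phi(t):
    # Phi(t) = e^{-t} (U cos(sqrt(3) t) - V sin(sqrt(3) t))
    c, s, e = math.cos(s3 * t), math.sin(s3 * t), math.exp(-t)
    return [e * (U[i] * c - V[i] * s) for i in range(3)]

def dphi(t, h=1e-6):
    # centered finite-difference approximation of Phi'(t)
    p, m = phi(t + h), phi(t - h)
    return [(p[i] - m[i]) / (2 * h) for i in range(3)]

for t in (0.0, 0.7, 2.0):
    x = phi(t)
    ax = [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]
    d = dphi(t)
    assert all(abs(ax[i] - d[i]) < 1e-4 for i in range(3))
```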
10.2.2
Solution of X = AX when A does not have n Linearly Independent Eigenvectors
We know how to produce a fundamental matrix for X′ = AX when A has n linearly independent eigenvectors. This certainly occurs if A has n distinct eigenvalues, and may even occur when A has repeated eigenvalues. However, we may encounter a matrix A having repeated eigenvalues for which there are not n linearly independent eigenvectors. In this case we cannot yet write a fundamental matrix. This section develops a procedure for finding a fundamental matrix in this case. We will begin with two examples and then make some general remarks.
EXAMPLE 10.11
We will solve the system X′ = AX, with

A =
|  1   3 |
| −3   7 |

A has one eigenvalue, 4, of multiplicity 2. Eigenvectors all have the form α(1, 1), with α ≠ 0, so A does not have two linearly independent eigenvectors. We can immediately write one solution,

Φ_1(t) = (1, 1)e^{4t}.

We need another solution. Write E_1 = (1, 1) and attempt a second solution

Φ_2(t) = E_1 te^{4t} + E_2 e^{4t},

in which E_2 is a 2 × 1 constant matrix to be determined. For this to be a solution, we need to have Φ_2′(t) = AΦ_2(t):

E_1(e^{4t} + 4te^{4t}) + 4E_2 e^{4t} = AE_1 te^{4t} + AE_2 e^{4t}.

Divide this equation by e^{4t} to get

E_1 + 4E_1 t + 4E_2 = AE_1 t + AE_2.

But AE_1 = 4E_1, so the terms having t as a factor cancel and we are left with

AE_2 − 4E_2 = E_1.

Write this equation as

(A − 4I_2)E_2 = E_1.

If E_2 = (a, b), this is the linear system of two equations in two unknowns

| −3   3 |  | a |     | 1 |
| −3   3 |  | b |  =  | 1 |

This system has general solution E_2 = (s, (1 + 3s)/3), in which s can be any number. Let s = 1 to get E_2 = (1, 4/3) and hence the second solution

Φ_2(t) = E_1 te^{4t} + E_2 e^{4t} = (1, 1)te^{4t} + (1, 4/3)e^{4t} = (1 + t, 4/3 + t)e^{4t}.

If we use Φ_1(0) and Φ_2(0) as columns to form the matrix

| 1    1  |
| 1   4/3 |

then this matrix has determinant 1/3; hence Φ_1 and Φ_2 are linearly independent by Theorem 10.3(2). Therefore Φ_1(t) and Φ_2(t) can be used as columns of a fundamental matrix

Ω(t) =
| e^{4t}   (1 + t)e^{4t}   |
| e^{4t}   (4/3 + t)e^{4t} |

The general solution of X′ = AX is X(t) = Ω(t)C.

The procedure followed in this example is similar in spirit to solving the differential equation y″ − 5y′ + 6y = e^{3x} by undetermined coefficients. We are tempted to try y_p(x) = ae^{3x}, but this will not work because e^{3x} is a solution of y″ − 5y′ + 6y = 0. We therefore try y_p(x) = axe^{3x}, multiplying the first attempt ae^{3x} by x. The analogous step for the system was to try the second solution Φ_2(t) = E_1 te^{4t} + E_2 e^{4t}. We will continue to explore the case of repeated eigenvalues with another example.
EXAMPLE 10.12
Consider the system X′ = AX, in which

A =
| −2   −1   −5 |
| 25   −7    0 |
|  0    1    3 |

A has eigenvalue −2 with multiplicity 3, and corresponding eigenvectors are all nonzero scalar multiples of (−1, −5, 1). This gives us one solution of X′ = AX. Denoting E_1 = (−1, −5, 1), we have one solution

Φ_1(t) = (−1, −5, 1)e^{−2t} = E_1 e^{−2t}.

We need three linearly independent solutions. We will try a second solution of the form

Φ_2(t) = E_1 te^{−2t} + E_2 e^{−2t},

in which E_2 is a 3 × 1 matrix to be determined. Substitute this proposed solution into X′ = AX to get

E_1(e^{−2t} − 2te^{−2t}) + E_2(−2e^{−2t}) = AE_1 te^{−2t} + AE_2 e^{−2t}.

Upon dividing by the common factor e^{−2t}, and recalling that AE_1 = −2E_1, this equation becomes

E_1 − 2tE_1 − 2E_2 = −2tE_1 + AE_2,

or

AE_2 + 2E_2 = E_1.

We can write this equation as

(A + 2I_3)E_2 = E_1,

or

|  0   −1   −5 |          | −1 |
| 25   −5    0 |  E_2  =  | −5 |
|  0    1    5 |          |  1 |

With E_2 = (α, β, γ), this nonhomogeneous system has general solution (−s, 1 − 5s, s), in which s can be any number. For a specific solution, choose s = 1 and let

E_2 = (−1, −4, 1).

This gives us the second solution

Φ_2(t) = E_1 te^{−2t} + E_2 e^{−2t} = (−1, −5, 1)te^{−2t} + (−1, −4, 1)e^{−2t} = (−1 − t, −4 − 5t, 1 + t)e^{−2t}.

We need one more solution. Try for a solution of the form

Φ_3(t) = (1/2)E_1 t²e^{−2t} + E_2 te^{−2t} + E_3 e^{−2t}.

We want to solve for E_3. Substitute this proposed solution into X′ = AX to get

E_1(te^{−2t} − t²e^{−2t}) + E_2(e^{−2t} − 2te^{−2t}) + E_3(−2e^{−2t}) = (1/2)AE_1 t²e^{−2t} + AE_2 te^{−2t} + AE_3 e^{−2t}.

Divide this equation by e^{−2t} and use the facts that AE_1 = −2E_1 and AE_2 = (1, 3, −1) to get

E_1 t − E_1 t² + E_2 − 2E_2 t − 2E_3 = −E_1 t² + (1, 3, −1)t + AE_3.   (10.6)

Now

E_1 t − 2E_2 t = [(−1, −5, 1) − 2(−1, −4, 1)]t = (1, 3, −1)t,

so three terms of equation (10.6) cancel, and it reduces to

E_2 − 2E_3 = AE_3.

Write this equation as

(A + 2I_3)E_3 = E_2,

or

|  0   −1   −5 |          | −1 |
| 25   −5    0 |  E_3  =  | −4 |
|  0    1    5 |          |  1 |

with general solution

((1 − 25s)/25, 1 − 5s, s),

in which s can be any number. Choosing s = 1, we can let

E_3 = (−24/25, −4, 1).

A third solution is

Φ_3(t) = (1/2)(−1, −5, 1)t²e^{−2t} + (−1, −4, 1)te^{−2t} + (−24/25, −4, 1)e^{−2t}
       = (−24/25 − t − t²/2, −4 − 4t − (5/2)t², 1 + t + t²/2)e^{−2t}.

To show that Φ_1, Φ_2 and Φ_3 are linearly independent, Theorem 10.3(2) is convenient. Form the 3 × 3 matrix having these solutions, evaluated at t = 0, as columns:

| −1   −1   −24/25 |
| −5   −4   −4     |
|  1    1    1     |

The determinant of this matrix is −1/25, so this matrix is nonsingular and the solutions are linearly independent. We can use these solutions as columns of a fundamental matrix

Ω(t) =
| −e^{−2t}    (−1 − t)e^{−2t}     (−24/25 − t − t²/2)e^{−2t}  |
| −5e^{−2t}   (−4 − 5t)e^{−2t}    (−4 − 4t − (5/2)t²)e^{−2t}  |
|  e^{−2t}    (1 + t)e^{−2t}      (1 + t + t²/2)e^{−2t}       |

The general solution of X′ = AX is X(t) = Ω(t)C.
These examples suggest a procedure, which we will now outline in general. Begin with a system X′ = AX, with A an n × n matrix of real numbers. We want the general solution, so we need n linearly independent solutions.

Case 1 — A has n linearly independent eigenvectors. Use these eigenvectors to write n linearly independent solutions, and use these as columns of a fundamental matrix. (This case may occur even if A does not have n distinct eigenvalues.)

Case 2 — A does not have n linearly independent eigenvectors. Let the eigenvalues of A be λ_1, …, λ_n. At least one eigenvalue must be repeated, because if A has n distinct eigenvalues, the corresponding eigenvectors must be linearly independent, putting us back in Case 1. Suppose λ_1, …, λ_r are the distinct eigenvalues, while λ_{r+1}, …, λ_n repeat some of these first r eigenvalues. If V_j is an eigenvector corresponding to λ_j for j = 1, …, r, we can immediately write r linearly independent solutions

Φ_1(t) = V_1 e^{λ_1 t}, …, Φ_r(t) = V_r e^{λ_r t}.

Now work with the repeated eigenvalues. Suppose λ is a repeated eigenvalue, say λ = λ_1, with multiplicity k. We already have one solution corresponding to λ, namely Φ_1. To be consistent in notation with the examples just done, denote V_1 = E_1 and λ_1 = λ. Then

Φ_1(t) = V_1 e^{λ_1 t} = E_1 e^{λt}

is one solution corresponding to λ. For a second solution corresponding to λ, let

Φ_2(t) = E_1 te^{λt} + E_2 e^{λt}.

Substitute this proposed solution into X′ = AX and solve for E_2. If k = 2, this yields a second solution corresponding to λ, and we move on to another multiple eigenvalue. If k ≥ 3, we do not yet have all the solutions corresponding to λ, so we attempt

Φ_3(t) = (1/2)E_1 t²e^{λt} + E_2 te^{λt} + E_3 e^{λt}.

Substitute Φ_3(t) into the differential equation and solve for E_3 to get a third solution corresponding to λ. If k ≥ 4, continue with

Φ_4(t) = (1/3!)E_1 t³e^{λt} + (1/2!)E_2 t²e^{λt} + E_3 te^{λt} + E_4 e^{λt};

substitute into the differential equation and solve for E_4, and so on. Eventually, we reach

Φ_k(t) = (1/(k−1)!)E_1 t^{k−1}e^{λt} + (1/(k−2)!)E_2 t^{k−2}e^{λt} + · · · + E_{k−1} te^{λt} + E_k e^{λt};

substitute into the differential equation and solve for E_k. This procedure gives, for an eigenvalue λ of multiplicity k, k linearly independent solutions of X′ = AX. Repeat the procedure for each repeated eigenvalue until n linearly independent solutions have been found.
10.2.3 Solution of X′ = AX by Diagonalizing A

We now take a different tack and attempt to exploit diagonalization. Consider the system

X′ =
| −2   0    0 |
|  0   4    0 |
|  0   0   −6 |
X.
The constant coefficient matrix A is a diagonal matrix, and this system really consists of three independent differential equations, each involving just one of the variables:

x_1′ = −2x_1,  x_2′ = 4x_2,  x_3′ = −6x_3.

Such a system is said to be uncoupled. Each equation is easily solved independently of the others, obtaining

x_1 = c_1 e^{−2t},  x_2 = c_2 e^{4t},  x_3 = c_3 e^{−6t}.

The system is uncoupled because the coefficient matrix A is diagonal. Because of this, we can immediately write the eigenvalues −2, 4, −6 of A, and find the corresponding eigenvectors

(1, 0, 0),  (0, 1, 0),  (0, 0, 1).

Therefore X′ = AX has fundamental matrix

Ω(t) =
| e^{−2t}   0        0       |
| 0         e^{4t}   0       |
| 0         0        e^{−6t} |

and the general solution is X(t) = Ω(t)C.

However we wish to approach this system, the point is that it is easy to solve because A is a diagonal matrix. Now in general A need not be diagonal. However, A may be diagonalizable (Section 9.2). This occurs exactly when A has n linearly independent eigenvectors. In this event, we can form a matrix P, whose columns are eigenvectors of A, such that

P^{−1}AP = D =
| λ_1   0     · · ·   0   |
| 0     λ_2   · · ·   0   |
| ·     ·             ·   |
| 0     0     · · ·   λ_n |

D is the diagonal matrix having the eigenvalues λ_1, …, λ_n down its main diagonal. This holds even if some of the eigenvalues have multiplicity greater than 1, provided that A has n linearly independent eigenvectors.

Now make the change of variables X = PZ in the differential equation X′ = AX. First compute

X′ = (PZ)′ = PZ′ = AX = APZ,

so

Z′ = P^{−1}APZ = DZ.

The uncoupled system Z′ = DZ can be solved by inspection. A fundamental matrix for Z′ = DZ is

Ω_D(t) =
| e^{λ_1 t}   0           · · ·   0         |
| 0           e^{λ_2 t}   · · ·   0         |
| ·           ·                   ·         |
| 0           0           · · ·   e^{λ_n t} |
and the general solution of Z′ = DZ is Z(t) = Ω_D(t)C. Then

X(t) = PZ(t) = PΩ_D(t)C

is the general solution of the original system X′ = AX. That is, Ω(t) = PΩ_D(t) is a fundamental matrix for X′ = AX. In this process we need P, whose columns are eigenvectors of A, but we never actually need to calculate P^{−1}.
EXAMPLE 10.13
Solve

X′ =
| 3   3 |
| 1   5 |
X.

The eigenvalues of A are 2 and 6, with associated eigenvectors, respectively,

(−3, 1)  and  (1, 1).

Because A has distinct eigenvalues, A is diagonalizable. Make the change of variables X = PZ, where

P =
| −3   1 |
|  1   1 |

This transforms X′ = AX into Z′ = DZ, where

D =
| 2   0 |
| 0   6 |

This uncoupled system has fundamental matrix

Ω_D(t) =
| e^{2t}   0      |
| 0        e^{6t} |

Then X′ = AX has fundamental matrix

Ω(t) = PΩ_D(t) =
| −3   1 |  | e^{2t}   0      |     | −3e^{2t}   e^{6t} |
|  1   1 |  | 0        e^{6t} |  =  |  e^{2t}    e^{6t} |

The general solution of X′ = AX is X(t) = Ω(t)C.
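The diagonalization step of Example 10.13 can be verified in exact rational arithmetic (an illustration, not part of the text): with the eigenvector matrix P, the product P^{−1}AP should be the diagonal matrix of eigenvalues.

```python
from fractions import Fraction as F

# Matrix and eigenvector matrix from Example 10.13, in exact rationals.
A = [[F(3), F(3)], [F(1), F(5)]]
P = [[F(-3), F(1)], [F(1), F(1)]]

# 2x2 inverse by the adjugate formula
det = P[0][0]*P[1][1] - P[0][1]*P[1][0]
Pinv = [[ P[1][1]/det, -P[0][1]/det],
        [-P[1][0]/det,  P[0][0]/det]]

def matmul(X, Y):
    return [[sum(X[i][k]*Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

D = matmul(matmul(Pinv, A), P)
assert D == [[F(2), F(0)], [F(0), F(6)]]   # eigenvalues 2 and 6 on the diagonal
```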
10.2.4 Exponential Matrix Solutions of X′ = AX

A first-order differential equation y′ = ay has general solution y(x) = ce^{ax}. At the risk of stretching the analogy too far, we might ask whether there is a solution e^{At}C of the matrix differential equation X′ = AX. We will now show how to define the exponential matrix e^{At} to make sense of this conjecture. For this section, let A be an n × n matrix of real numbers.
The Taylor expansion of the real exponential function,

e^t = 1 + t + (1/2!)t² + (1/3!)t³ + · · ·,

suggests the following.
DEFINITION 10.3
Exponential Matrix
The exponential matrix e^{At} is the n × n matrix defined by

e^{At} = I_n + At + (1/2!)A²t² + (1/3!)A³t³ + · · ·
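This definition can be explored numerically (a minimal sketch, not part of the text): summing enough terms of the series for a diagonal matrix should reproduce the scalar exponentials on the diagonal.

```python
import math

def matmul(X, Y, n):
    return [[sum(X[i][k]*Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(A, t, terms=30):
    """Partial sum of e^{At} = I + At + (At)^2/2! + ..."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]        # running term, starts at I
    for k in range(1, terms):
        term = matmul(term, A, n)            # term <- term * A
        term = [[term[i][j] * t / k for j in range(n)] for i in range(n)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

# For the diagonal matrix diag(-2, 4), e^{At} = diag(e^{-2t}, e^{4t}).
A = [[-2.0, 0.0], [0.0, 4.0]]
E = exp_series(A, 0.5)
assert abs(E[0][0] - math.exp(-1.0)) < 1e-10
assert abs(E[1][1] - math.exp(2.0)) < 1e-10
assert abs(E[0][1]) < 1e-12 and abs(E[1][0]) < 1e-12
```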
It can be shown that this series converges for all real t, in the sense that the infinite series of elements in the (i, j) place converges. Care must be taken in computing with exponential matrices, because matrix multiplication is not commutative. The analogue of the relationship e^{at}e^{bt} = e^{(a+b)t} is given by the following.

THEOREM 10.9
Let B be an n × n real matrix. Suppose AB = BA. Then

e^{(A+B)t} = e^{At}e^{Bt}.

Because A is a constant matrix,

d/dt e^{At} = d/dt [I_n + At + (1/2!)A²t² + (1/3!)A³t³ + (1/4!)A⁴t⁴ + · · ·]
            = A + A²t + (1/2!)A³t² + (1/3!)A⁴t³ + · · ·
            = A[I_n + At + (1/2!)A²t² + (1/3!)A³t³ + · · ·]
            = Ae^{At}.

The derivative of e^{At}, obtained by differentiating each element, is the product Ae^{At} of two n × n matrices, and has the same form as the derivative of the scalar exponential function e^{at}. One consequence of this derivative formula is that, for any n × 1 constant matrix K, e^{At}K is a solution of X′ = AX.

LEMMA 10.1
For any real n × 1 constant matrix K, Φ(t) = e^{At}K is a solution of X′ = AX.

Proof  Compute

Φ′(t) = (e^{At}K)′ = Ae^{At}K = AΦ(t).

Even more, e^{At} is a fundamental matrix for X′ = AX.
THEOREM 10.10
e^{At} is a fundamental matrix for X′ = AX.

Proof  Let E_j be the n × 1 matrix with 1 in the (j, 1) place and all other entries zero. Then e^{At}E_j is the jth column of e^{At}. This column is a solution of X′ = AX by the lemma. Further, the columns of e^{At} are linearly independent by Theorem 10.3(2), because if we put t = 0 we get e^{A·0} = I_n, which has a nonzero determinant. Thus e^{At} is a fundamental matrix for X′ = AX.

In theory, then, we can find the general solution e^{At}C of X′ = AX, if we can compute e^{At}. This, however, can be a daunting task. As an example, for an apparently simple matrix such as

A =
|  1   2 |
| −2   4 |

we find using a software package that

e^{At} = e^{5t/2}
| cos(√7 t/2) − (3/√7) sin(√7 t/2)     (4/√7) sin(√7 t/2)               |
| −(4/√7) sin(√7 t/2)                  cos(√7 t/2) + (3/√7) sin(√7 t/2) |

This is a fundamental matrix for X′ = AX. It would be at least as easy, for this A, to find the eigenvalues of A, which are (5/2) ± (√7/2)i, then find corresponding eigenvectors (4, 3 ± i√7), and use these to obtain a fundamental matrix.

We will now pursue an interesting line of thought. We claim that, even though e^{At} may be tedious or even impractical to compute for a given A, it is often possible to compute the product e^{At}K, for carefully chosen K, as a finite sum, and hence generate solutions of X′ = AX. To do this we need the following.

LEMMA 10.2
Let A be an n × n real matrix and K an n × 1 real matrix. Let λ be any number. Then

1. e^{λI_n t}K = e^{λt}K.
2. e^{At}K = e^{λt}e^{(A−λI_n)t}K.

Proof  For (1), since I_n^m = I_n for any positive integer m, we have

e^{λI_n t}K = [I_n + λI_n t + (1/2!)λ²I_n t² + (1/3!)λ³I_n t³ + · · ·]K
            = [1 + λt + (1/2!)λ²t² + (1/3!)λ³t³ + · · ·]I_n K = e^{λt}K.
For (2), first observe that λI_n and A − λI_n commute, since

λI_n(A − λI_n) = λ(A − λI_n) = (A − λI_n)λI_n.

Then, using Theorem 10.9,

e^{At}K = e^{(A−λI_n)t + λI_n t}K = e^{(A−λI_n)t}e^{λI_n t}K = e^{(A−λI_n)t}e^{λt}K = e^{λt}e^{(A−λI_n)t}K.

Now suppose we want to solve X′ = AX. Let λ_1, …, λ_r be the distinct eigenvalues of A, and let λ_j have multiplicity m_j. Then

m_1 + · · · + m_r = n.

For each λ_j, find as many linearly independent eigenvectors as possible. For λ_j, this can be any number from 1 to m_j inclusive. If this yields n linearly independent eigenvectors, then we can write the general solution as a sum of eigenvectors times exponential functions e^{λ_j t}, and we do not need e^{At}.

Thus suppose some λ_j has multiplicity m_j ≥ 2, but there are fewer than m_j linearly independent eigenvectors. Find an n × 1 constant matrix K_1 that is linearly independent from the eigenvectors found for λ_j and such that

(A − λ_j I_n)K_1 ≠ O,  but  (A − λ_j I_n)²K_1 = O.

Then e^{At}K_1 is a solution of X′ = AX. Further, because of the way K_1 was chosen,

e^{At}K_1 = e^{λ_j t}e^{(A−λ_j I_n)t}K_1 = e^{λ_j t}[K_1 + (A − λ_j I_n)K_1 t],

with all other terms of the series for e^{(A−λ_j I_n)t}K_1 vanishing, because (A − λ_j I_n)²K_1 = O forces (A − λ_j I_n)^m K_1 = O for m ≥ 2. We can therefore compute e^{At}K_1 as a sum of just two terms.

If we now have m_j solutions corresponding to λ_j, then leave this eigenvalue and move on to any others that do not yet have as many linearly independent solutions as their multiplicity. If we do not yet have m_j solutions corresponding to λ_j, then find a constant n × 1 matrix K_2 such that

(A − λ_j I_n)K_2 ≠ O  and  (A − λ_j I_n)²K_2 ≠ O,  but  (A − λ_j I_n)³K_2 = O.

Then e^{At}K_2 is a solution of X′ = AX, and we can compute this solution as a sum of just three terms:

e^{At}K_2 = e^{λ_j t}e^{(A−λ_j I_n)t}K_2 = e^{λ_j t}[K_2 + (A − λ_j I_n)K_2 t + (1/2!)(A − λ_j I_n)²K_2 t²].

The other terms in the infinite series for e^{At}K_2 vanish because

(A − λ_j I_n)³K_2 = (A − λ_j I_n)⁴K_2 = · · · = O.

If this gives us m_j solutions associated with λ_j, move on to another eigenvalue for which we do not yet have as many solutions as the multiplicity of the eigenvalue. If not, produce an n × 1 constant matrix K_3 such that

(A − λ_j I_n)K_3 ≠ O,  (A − λ_j I_n)²K_3 ≠ O  and  (A − λ_j I_n)³K_3 ≠ O,  but  (A − λ_j I_n)⁴K_3 = O.

Then e^{At}K_3 can be computed as a sum of four terms.
Keep repeating this process. For λ_j it must terminate after at most m_j − 1 steps, because we began with at least one eigenvector associated with λ_j and then produced more solutions to obtain a total of m_j linearly independent solutions associated with λ_j. Once these are obtained, we move on to another eigenvalue having fewer solutions than its multiplicity, and repeat this process for that eigenvalue, and so on. Eventually we generate a total of n linearly independent solutions, thus obtaining the general solution of X′ = AX.
EXAMPLE 10.14
Consider X′ = AX, where

A =
| 2   1   0   3 |
| 0   2   1   1 |
| 0   0   2   4 |
| 0   0   0   4 |

The eigenvalues are 4, 2, 2, 2. Associated with 4 we find the eigenvector (9, 6, 8, 4), so one solution of X′ = AX is

Φ_1(t) = (9, 6, 8, 4)e^{4t}.

Associated with 2 we find that every eigenvector has the form (α, 0, 0, 0). A second solution is

Φ_2(t) = (1, 0, 0, 0)e^{2t}.

Now find a 4 × 1 constant matrix K_1 such that (A − 2I_4)K_1 ≠ O, but (A − 2I_4)²K_1 = O. First compute

(A − 2I_4)² =
| 0   1   0   3 |²     | 0   0   1   7 |
| 0   0   1   1 |      | 0   0   0   6 |
| 0   0   0   4 |   =  | 0   0   0   8 |
| 0   0   0   2 |      | 0   0   0   4 |

Solve (A − 2I_4)²K_1 = O to find solutions of the form (α, β, 0, 0). We will choose the solution

K_1 = (0, 1, 0, 0)

to avoid duplicating the eigenvector already found associated with 2. Then

(A − 2I_4)K_1 =
| 0   1   0   3 |  | 0 |     | 1 |
| 0   0   1   1 |  | 1 |     | 0 |
| 0   0   0   4 |  | 0 |  =  | 0 |
| 0   0   0   2 |  | 0 |     | 0 |
≠ O,

as required. Thus form the third solution

Φ_3(t) = e^{At}K_1 = e^{2t}[K_1 + (A − 2I_4)K_1 t] = e^{2t}[(0, 1, 0, 0) + (1, 0, 0, 0)t] = (t, 1, 0, 0)e^{2t}.

The three solutions found up to this point are linearly independent. Now we need a fourth solution. It must come from the eigenvalue 2, because 4 has multiplicity 1 and we already have one solution corresponding to that eigenvalue. Look for K_2 such that

(A − 2I_4)K_2 ≠ O  and  (A − 2I_4)²K_2 ≠ O,  but  (A − 2I_4)³K_2 = O.

First compute

(A − 2I_4)³ =
| 0   0   0   18 |
| 0   0   0   12 |
| 0   0   0   16 |
| 0   0   0    8 |

Solutions of (A − 2I_4)³K_2 = O are of the form (α, β, γ, 0). We will choose

K_2 = (1, 1, 1, 0)

to avoid duplicating previous choices. Of course other choices are possible. It is routine to verify that (A − 2I_4)K_2 ≠ O and (A − 2I_4)²K_2 ≠ O. Thus form the fourth solution

Φ_4(t) = e^{At}K_2 = e^{2t}[K_2 + (A − 2I_4)K_2 t + (1/2!)(A − 2I_4)²K_2 t²]
       = e^{2t}[(1, 1, 1, 0) + (1, 1, 0, 0)t + (1/2)(1, 0, 0, 0)t²]
       = (1 + t + t²/2, 1 + t, 1, 0)e^{2t}.

We now have four linearly independent solutions, hence the general solution. We can also write the fundamental matrix

Ω(t) =
| 9e^{4t}   e^{2t}   te^{2t}   (1 + t + t²/2)e^{2t} |
| 6e^{4t}   0        e^{2t}    (1 + t)e^{2t}        |
| 8e^{4t}   0        0         e^{2t}               |
| 4e^{4t}   0        0         0                    |
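The finite-sum computation of Example 10.14 can be checked directly (a sketch, not from the text): with N = A − 2I, the chain N K_2 → N²K_2 → N³K_2 = O terminates, so e^{At}K_2 = e^{2t}(K_2 + NK_2 t + N²K_2 t²/2) is exact.

```python
import math

A = [[2, 1, 0, 3], [0, 2, 1, 1], [0, 0, 2, 4], [0, 0, 0, 4]]
N = [[A[i][j] - (2 if i == j else 0) for j in range(4)] for i in range(4)]

def mv(M, v):
    """matrix-vector product, 4x4 times 4x1"""
    return [sum(M[i][j] * v[j] for j in range(4)) for i in range(4)]

K2 = [1.0, 1.0, 1.0, 0.0]
NK2 = mv(N, K2)            # (A - 2I)K2
N2K2 = mv(N, NK2)          # (A - 2I)^2 K2
assert mv(N, N2K2) == [0.0, 0.0, 0.0, 0.0]   # (A - 2I)^3 K2 = O

def phi4(t):
    # e^{At}K2 as the finite sum e^{2t}(K2 + NK2 t + N^2K2 t^2/2)
    e = math.exp(2 * t)
    return [e * (K2[i] + NK2[i]*t + 0.5*N2K2[i]*t*t) for i in range(4)]

# Compare with the closed form (1 + t + t^2/2, 1 + t, 1, 0)e^{2t} from the text.
t = 0.8
e = math.exp(2 * t)
expect = [e * (1 + t + t*t/2), e * (1 + t), e, 0.0]
assert all(abs(phi4(t)[i] - expect[i]) < 1e-9 for i in range(4))
```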
SECTION 10.2  PROBLEMS
In each of Problems 1 through 5, find a fundamental matrix for the system and use it to write the general solution. The coefficient matrices of these systems have real, distinct eigenvalues.

1. x_1′ = 3x_1, x_2′ = 5x_1 − 4x_2
2. x_1′ = 4x_1 + 2x_2, x_2′ = 3x_1 + 3x_2
3. x_1′ = x_1 + x_2, x_2′ = x_1 + x_2
4. x_1′ = 2x_1 + x_2 − 2x_3, x_2′ = 3x_1 − 2x_2, x_3′ = 3x_1 − x_2 − 3x_3
5. x_1′ = x_1 + 2x_2 + x_3, x_2′ = 6x_1 − x_2, x_3′ = −x_1 − 2x_2 − x_3

In each of Problems 6 through 11, find a fundamental matrix for the system and use it to solve the initial value problem. The matrices of these systems have real, distinct eigenvalues.
6. x_1′ = 3x_1 − 4x_2, x_2′ = 2x_1 − 3x_2; x_1(0) = 7, x_2(0) = 5
7. x_1′ = x_1 − 2x_2, x_2′ = −6x_1; x_1(0) = 1, x_2(0) = −19
8. x_1′ = 2x_1 − 10x_2, x_2′ = −x_1 − x_2; x_1(0) = −3, x_2(0) = 6
9. x_1′ = 3x_1 − x_2 + x_3, x_2′ = x_1 + x_2 − x_3, x_3′ = x_1 − x_2 + x_3; x_1(0) = 1, x_2(0) = 5, x_3(0) = 1
10. x_1′ = 2x_1 + x_2 − 2x_3, x_2′ = 3x_1 − 2x_2, x_3′ = 3x_1 + x_2 − 3x_3; x_1(0) = 1, x_2(0) = 7, x_3(0) = 3
11. x_1′ = 2x_1 + 3x_2 + 3x_3, x_2′ = −x_2 − 3x_3, x_3′ = 2x_3; x_1(0) = 9, x_2(0) = −1, x_3(0) = −3
12. Show that the change of variables z = ln(t) for t > 0 transforms the system tx_1′ = ax_1 + bx_2, tx_2′ = cx_1 + dx_2 into a linear system X′ = AX, assuming that a, b, c and d are real constants.
13. Use the idea of Problem 12 to solve the system tx_1′ = 6x_1 + 2x_2, tx_2′ = 4x_1 + 4x_2.
14. Solve the system tx_1′ = −x_1 − 3x_2, tx_2′ = x_1 − 5x_2.

In each of Problems 15 through 19, find a real-valued fundamental matrix for the system X′ = AX, with A the given matrix.

15. 2 −4 1 2
16. 0 5 −1 −2
17. 3 −5 1 −1
18. 1 −1 1 0 1 −1 1 0 −1
19. −2 1 0 0 −5 0 0 3 −2

In each of Problems 20 through 23, find a real-valued fundamental matrix for the system X′ = AX, with A the given matrix. Use this to solve the initial value problem, with X(0) the given n × 1 matrix.

20. 2 3 2 −5 1 8
21. 1 3 −2 10 5 −3
22. 5 2 −5 0 1 −2
23. 3 −3 1 7 2 −1 0 4 1 −1 1 3

24. Can a matrix with at least one complex, non-real element have only real eigenvalues? If not, give a proof. If it can, give an example.

In each of Problems 25 through 30, find a fundamental matrix for the system X′ = AX, using the method of Section 10.2.2, with A the given matrix.
25. 26.
3 0
2 3
2 5
0 2
⎛
2 27. ⎝ 0 0 ⎛ 1 28. ⎝ 0 4 ⎛ 1 ⎜ 0 29. ⎜ ⎝ 0 0 ⎛ ⎜ 30. ⎜ ⎝
0 0 0 −1
393
⎞ 6 9 ⎠ 2 ⎞
5 8 −1 5 1 8
0 0 ⎠ 1
5 3 3 0
−2 0 0 0 1 0 0 −2
0 1 0 0
⎞ 6 4 ⎟ ⎟ 4 ⎠ 1 ⎞ 0 0 ⎟ ⎟ 1 ⎠ 0
In each of Problems 31 through 35, find the general solution of the system X′ = AX, with A the given matrix, and use this general solution to solve the initial value problem, for the given n × 1 matrix X(0). Use the method of Section 10.2.2 for these problems.

31. 7 −1 5 1 5 3
32. 2 0 4 3 5 2
33. 0 −4 1 1 0 2 −5 4 12 0 0 −4
34. 2 −5 2 1 3 −3 0 −5 4 0 0 −5
35. 1 −2 0 0 2 1 −1 0 0 0 −2 0 0 5 −3 1 0 0 3 −1 4

In each of Problems 36 through 40, find the general solution of the system by diagonalizing the coefficient matrix.

36. x_1′ = −2x_1 + x_2, x_2′ = −4x_1 + 3x_2
37. x_1′ = 3x_1 + 3x_2, x_2′ = x_1 + 5x_2
38. x_1′ = x_1 + x_2, x_2′ = x_1 + x_2
39. x_1′ = 6x_1 + 5x_2, x_2′ = x_1 + 2x_2
40. x_1′ = 3x_1 − 2x_2, x_2′ = 9x_1 − 3x_2
In each of Problems 41–45, solve the system X′ = AX, with A the matrix of the indicated problem, by finding e^{At}.

41. A as in Problem 25.
42. A as in Problem 26.
43. A as in Problem 27.
44. A as in Problem 28.
45. A as in Problem 29.

In each of Problems 46–50, solve the initial value problem of the referred problem, using the exponential matrix.

46. Problem 31.
47. Problem 32.
48. Problem 33.
49. Problem 34.
50. Problem 35.

10.3
Solution of X′ = AX + G

We now turn to the nonhomogeneous system X′(t) = A(t)X(t) + G(t), assuming that the elements of the n × n matrix A(t) and of the n × 1 matrix G(t) are continuous on some interval I, which may be the entire real line. Recall that the general solution of X′ = AX + G has the form

X(t) = Ω(t)C + Φ_p(t),

where Ω(t) is an n × n fundamental matrix for the homogeneous system X′ = AX, C is an n × 1 matrix of arbitrary constants, and Φ_p is a particular solution of X′ = AX + G. At least when A is a real, constant matrix, we have a strategy for finding Ω. We will concentrate in this section on strategies for finding a particular solution Φ_p.
10.3.1 Variation of Parameters

Recall the variation of parameters method for second-order differential equations. If y_1(x) and y_2(x) form a fundamental set of solutions for

y″(x) + p(x)y′(x) + q(x)y(x) = 0,

then the general solution of this homogeneous equation is

y_h(x) = c_1 y_1(x) + c_2 y_2(x).

To find a particular solution y_p(x) of the nonhomogeneous equation

y″(x) + p(x)y′(x) + q(x)y(x) = f(x),

replace the constants in y_h by functions and attempt to choose u(x) and v(x) so that

y_p(x) = u(x)y_1(x) + v(x)y_2(x)

is a solution.

The variation of parameters method for the matrix equation X′ = AX + G follows the same idea. Suppose we can find a fundamental matrix Ω for the homogeneous system X′ = AX. The general solution of this homogeneous system is then X_h(t) = Ω(t)C, in which C is an n × 1 matrix of arbitrary constants. Look for a particular solution of X′ = AX + G of the form

Φ_p(t) = Ω(t)U(t),

in which U(t) is an n × 1 matrix of functions of t to be determined. Substitute this proposed solution into the differential equation to get

Ω′U + ΩU′ = AΩU + G.

Now Ω is a fundamental matrix for X′ = AX, so Ω′ = AΩ. Therefore Ω′U = AΩU, and the last equation reduces to

ΩU′ = G.

Since Ω is a fundamental matrix, the columns of Ω are linearly independent. This means that Ω(t) is nonsingular, so the last equation can be solved for U′ to get

U′ = Ω^{−1}G.

As in the case of second-order differential equations, we now have the derivative of the function we want. Then

U(t) = ∫ Ω^{−1}(t)G(t) dt,

in which we integrate a matrix by integrating each element of the matrix. Once we find a suitable U(t), we have a particular solution Φ_p(t) = Ω(t)U(t) of X′ = AX + G. The general solution of this nonhomogeneous equation is then

X(t) = Ω(t)C + Ω(t)U(t),

in which C is an n × 1 matrix of constants.
EXAMPLE 10.15
Solve the system
X′ =
|  1   −10 |
| −1    4  |
X +
| e^t    |
| sin(t) |

First we need a fundamental matrix for X′ = AX. The eigenvalues of A are −1 and 6, with associated eigenvectors, respectively, (5, 1) and (−2, 1). Therefore a fundamental matrix for X′ = AX is

Ω(t) =
| 5e^{−t}   −2e^{6t} |
| e^{−t}     e^{6t}  |

We find (details provided at the end of the example) that

Ω^{−1}(t) = (1/7)
| e^t        2e^t     |
| −e^{−6t}   5e^{−6t} |

Compute

U′(t) = Ω^{−1}(t)G(t) = (1/7)
| e^t        2e^t     |  | e^t    |         | e^{2t} + 2e^t sin(t)       |
| −e^{−6t}   5e^{−6t} |  | sin(t) | = (1/7) | −e^{−5t} + 5e^{−6t} sin(t) |

Then

U(t) = ∫ Ω^{−1}(t)G(t) dt = (1/7)
| ∫ e^{2t} dt + 2 ∫ e^t sin(t) dt       |     | (1/14)e^{2t} + (1/7)e^t(sin(t) − cos(t))           |
| −∫ e^{−5t} dt + 5 ∫ e^{−6t} sin(t) dt |  =  | (1/35)e^{−5t} + (5/259)e^{−6t}(−6 sin(t) − cos(t)) |

The general solution of X′ = AX + G is

X(t) = Ω(t)C + Ω(t)U(t) =
| 5e^{−t}   −2e^{6t} |      | (3/10)e^t + (35/37) sin(t) − (25/37) cos(t) |
| e^{−t}     e^{6t}  | C +  | (1/10)e^t + (1/37) sin(t) − (6/37) cos(t)   |

If we want to write the solution in terms of the component functions, let C = (c_1, c_2) to obtain

x_1(t) = 5c_1 e^{−t} − 2c_2 e^{6t} + (3/10)e^t + (35/37) sin(t) − (25/37) cos(t),
x_2(t) = c_1 e^{−t} + c_2 e^{6t} + (1/10)e^t + (1/37) sin(t) − (6/37) cos(t).
Although the coefficient matrix A in this example was constant, this is not a requirement of the variation of parameters method. In the example we needed Ω⁻¹(t). Standard software packages will produce this inverse. We could also proceed as follows, reducing Ω(t) and recording the row operations beginning with the identity matrix, as discussed in Section 7.8.1:

| 1  0 |   | 5e^(−t)  −2e^(6t) |
| 0  1 |   | e^(−t)    e^(6t)  |

add −(1/5)(row 1) to row 2:

| 1     0 |   | 5e^(−t)  −2e^(6t)    |
| −1/5  1 |   | 0        (7/5)e^(6t) |

multiply row 1 by (1/5)e^t:

| (1/5)e^t  0 |   | 1  −(2/5)e^(7t) |
| −1/5      1 |   | 0  (7/5)e^(6t)  |

multiply row 2 by (5/7)e^(−6t):

| (1/5)e^t       0            |   | 1  −(2/5)e^(7t) |
| −(1/7)e^(−6t)  (5/7)e^(−6t) |   | 0  1            |

add (2/5)e^(7t)(row 2) to row 1:

| (1/7)e^t       (2/7)e^t     |   | 1  0 |
| −(1/7)e^(−6t)  (5/7)e^(−6t) |   | 0  1 |

Since the last two columns form I₂, the first two columns are Ω⁻¹(t).
Variation of Parameters and the Laplace Transform

There is a connection between the variation of parameters method and the Laplace transform. Suppose we want a particular solution Ψ_p of X′ = AX + G, in which A is an n × n real matrix. The variation of parameters method is to find a particular solution Ψ_p(t) = Ω(t)U(t), where Ω(t) is a fundamental matrix for X′ = AX. Explicitly,

U(t) = ∫ Ω⁻¹(t)G(t) dt.

We can choose a particular U(t) by carrying out this integration from 0 to t:

U(t) = ∫₀ᵗ Ω⁻¹(s)G(s) ds.

Then

Ψ_p(t) = Ω(t) ∫₀ᵗ Ω⁻¹(s)G(s) ds = ∫₀ᵗ Ω(t)Ω⁻¹(s)G(s) ds.

In this equation Ω can be any fundamental matrix for X′ = AX. In particular, suppose we choose Ω(t) = e^(At). This is sometimes called the transition matrix for X′ = AX, since it is a fundamental matrix such that Ω(0) = Iₙ. Now Ω⁻¹(s) = e^(−As), so

Ω(t)Ω⁻¹(s) = e^(At)e^(−As) = e^(A(t−s)) = Ω(t − s)

and

Ψ_p(t) = ∫₀ᵗ Ω(t − s)G(s) ds.

This equation has the same form as the Laplace transform convolution of Ω and G, except that in the current setting these are matrix functions. Now define the Laplace transform of a matrix to be the matrix obtained by taking the Laplace transform of each of its elements. This extended Laplace transform has many of the same computational properties as the Laplace transform for scalar functions. In particular, we can define the convolution integral

(Ω ∗ G)(t) = ∫₀ᵗ Ω(t − s)G(s) ds.

In terms of this convolution,

Ψ_p(t) = (Ω ∗ G)(t).

This is a general formula for a particular solution of X′ = AX + G when Ω(t) = e^(At).
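The formula Ψ_p(t) = ∫₀ᵗ e^(A(t−s))G(s) ds can also be evaluated numerically. A minimal sketch (scipy; the system is again an invented example): it compares a quadrature value of the convolution against integrating X′ = AX + G from X(0) = 0, which is exactly the particular solution this formula produces:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

# Invented 2x2 system (not from the text): X' = AX + G(t).
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
G = lambda t: np.array([np.exp(t), 0.0])

def particular(t, n=801):
    # Psi_p(t) = integral_0^t e^{A(t-s)} G(s) ds, by the trapezoid rule.
    s = np.linspace(0.0, t, n)
    vals = np.array([expm(A * (t - si)) @ G(si) for si in s])
    return ((vals[:-1] + vals[1:]) / 2 * np.diff(s)[:, None]).sum(axis=0)

# Psi_p solves X' = AX + G with X(0) = 0, so compare with a direct integration.
t_end = 1.5
sol = solve_ivp(lambda t, x: A @ x + G(t), (0.0, t_end), np.zeros(2),
                rtol=1e-10, atol=1e-12)
print(np.abs(particular(t_end) - sol.y[:, -1]).max())
```

The two values agree to roughly the accuracy of the trapezoid rule, which is all this check needs.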
EXAMPLE 10.16

Consider the system

X′ = | 1  −4 | X + | e^(2t) |
     | 1   5 |     | t      |

We find that

e^(At) = | (1 − 2t)e^(3t)   −4te^(3t)      | = Ω(t)
         | te^(3t)          (1 + 2t)e^(3t) |

A particular solution of X′ = AX + G is given by

Ψ_p(t) = ∫₀ᵗ Ω(t − s)G(s) ds

       = ∫₀ᵗ | (1 − 2(t − s))e^(3(t−s))   −4(t − s)e^(3(t−s))      | | e^(2s) | ds
             | (t − s)e^(3(t−s))          (1 + 2(t − s))e^(3(t−s)) | | s      |

       = ∫₀ᵗ | (1 − 2t + 2s)e^(3t)e^(−s) − 4(t − s)e^(3t)se^(−3s) | ds
             | (t − s)e^(3t)e^(−s) + (1 + 2t − 2s)e^(3t)se^(−3s)  |

       = | −3e^(2t) + (89/27)e^(3t) − (22/9)te^(3t) − (4/9)t − 8/27 |
         | e^(2t) + (11/9)te^(3t) − (28/27)e^(3t) − (1/9)t + 1/27   |

The general solution of X′ = AX + G is

X(t) = | (1 − 2t)e^(3t)   −4te^(3t)      | C + | −3e^(2t) + (89/27)e^(3t) − (22/9)te^(3t) − (4/9)t − 8/27 |
       | te^(3t)          (1 + 2t)e^(3t) |     | e^(2t) + (11/9)te^(3t) − (28/27)e^(3t) − (1/9)t + 1/27   |
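The integrations in this example are tedious by hand, so a symbolic check is worthwhile. The following sketch (sympy) reproduces e^(At) for the matrix of Example 10.16, forms the convolution integral, and verifies that the result solves the nonhomogeneous system:

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)

A = sp.Matrix([[1, -4], [1, 5]])           # repeated eigenvalue 3
G = lambda u: sp.Matrix([sp.exp(2*u), u])

Omega = (A * t).exp()                      # matrix exponential e^{At}

# Psi_p(t) = integral_0^t Omega(t - s) G(s) ds, integrated entry by entry.
integrand = Omega.subs(t, t - s) * G(s)
Psi_p = integrand.applyfunc(lambda e: sp.integrate(e, (s, 0, t)))

# Psi_p solves X' = AX + G (and satisfies Psi_p(0) = 0).
residual = sp.simplify(Psi_p.diff(t) - A * Psi_p - G(t))
print(residual.T)
```

This also confirms the closed-form entries above, since sympy's Psi_p simplifies to the same expressions.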
10.3.2 Solution of X′ = AX + G by Diagonalizing A

Consider the case that A is a constant, diagonalizable matrix. Then A has n linearly independent eigenvectors. These form columns of a nonsingular matrix P such that

P⁻¹AP = D = | λ₁  0  ···  0  |
            | 0   λ₂ ···  0  |
            |       ···      |
            | 0   0  ···  λₙ |

with the eigenvalues down the main diagonal in the order corresponding to the eigenvector columns of P. As we did in the homogeneous case X′ = AX, make the change of variables X = PZ. Then the system X′ = AX + G becomes

X′ = PZ′ = APZ + G, or PZ′ = APZ + G.

Multiply this equation on the left by P⁻¹ to get

Z′ = P⁻¹APZ + P⁻¹G = DZ + P⁻¹G.

This is an uncoupled system of the form

z₁′ = λ₁z₁ + f₁(t)
z₂′ = λ₂z₂ + f₂(t)
···
zₙ′ = λₙzₙ + fₙ(t)

where

P⁻¹G(t) = (f₁(t), f₂(t), …, fₙ(t))ᵀ.

Solve these n first-order differential equations independently, form Z(t) = (z₁(t), z₂(t), …, zₙ(t))ᵀ, and then the solution of X′ = AX + G is X(t) = PZ(t). Unlike diagonalization in solving the homogeneous system X′ = AX, in this nonhomogeneous case we must explicitly calculate P⁻¹ in order to determine P⁻¹G(t), the nonhomogeneous term in the transformed system.
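A sketch of this diagonalization procedure (sympy, using the data of Example 10.17 below; the particular-solution formula z = e^(λt) ∫ e^(−λt) f dt is the standard integrating-factor solution of each scalar equation):

```python
import sympy as sp

t = sp.symbols('t')

# Data of Example 10.17: constant, diagonalizable A.
A = sp.Matrix([[3, 3], [1, 5]])
G = sp.Matrix([8, 4*sp.exp(3*t)])

P, D = A.diagonalize()            # columns of P are eigenvectors of A
F = P.inv() * G                   # forcing term of the uncoupled system

# Each scalar equation z' = lambda*z + f has particular solution
# z = e^{lambda t} * integral of e^{-lambda t} f dt (integrating factor).
Z = sp.Matrix([sp.exp(D[i, i]*t) * sp.integrate(sp.exp(-D[i, i]*t) * F[i], t)
               for i in range(2)])
Xp = sp.simplify(P * Z)           # a particular solution of X' = AX + G

residual = sp.simplify(Xp.diff(t) - A*Xp - G)
print(residual.T)                 # zero row vector
```

The identity PDP⁻¹ = A is what makes the check work: Z′ = DZ + F gives X′ = PZ′ = AX + G.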
EXAMPLE 10.17

Consider

X′ = | 3 3 | X + | 8       |
     | 1 5 |     | 4e^(3t) |

The eigenvalues of A are 2 and 6, with eigenvectors, respectively,

| −3 |   and   | 1 |
|  1 |         | 1 |

Let

P = | −3 1 |
    |  1 1 |

Then

P⁻¹AP = | 2 0 |
        | 0 6 |

Compute

P⁻¹ = | −1/4  1/4 |
      |  1/4  3/4 |

The transformation X = PZ transforms the original system into

Z′ = | 2 0 | Z + P⁻¹ | 8       | = | 2 0 | Z + | −1/4  1/4 | | 8       |
     | 0 6 |         | 4e^(3t) |   | 0 6 |     |  1/4  3/4 | | 4e^(3t) |

or

Z′ = | 2 0 | Z + | −2 + e^(3t) |
     | 0 6 |     | 2 + 3e^(3t) |

This is the uncoupled system

z₁′ = 2z₁ − 2 + e^(3t)
z₂′ = 6z₂ + 2 + 3e^(3t)

Solve these linear first-order differential equations independently:

z₁(t) = c₁e^(2t) + e^(3t) + 1
z₂(t) = c₂e^(6t) − e^(3t) − 1/3

Then

Z(t) = | c₁e^(2t) + e^(3t) + 1   |
       | c₂e^(6t) − e^(3t) − 1/3 |

and

X(t) = PZ(t) = | −3 1 | | c₁e^(2t) + e^(3t) + 1   |
               |  1 1 | | c₂e^(6t) − e^(3t) − 1/3 |

     = | −3c₁e^(2t) + c₂e^(6t) − 4e^(3t) − 10/3 |
       | c₁e^(2t) + c₂e^(6t) + 2/3              |

     = | −3e^(2t)  e^(6t) | C + | −4e^(3t) − 10/3 |        (10.7)
       | e^(2t)    e^(6t) |     | 2/3             |
This is the general solution of X′ = AX + G. It is the general solution of X′ = AX, plus a particular solution of X′ = AX + G. Indeed,

| −3e^(2t)  e^(6t) |
| e^(2t)    e^(6t) |

is a fundamental matrix for the homogeneous system X′ = AX.

To illustrate the solution of an initial value problem, suppose we want the solution of

X′ = | 3 3 | X + | 8       |,   X(0) = |  2 |
     | 1 5 |     | 4e^(3t) |           | −7 |

Since we have the general solution (10.7) of this system, all we need to do is determine C so that this initial condition is satisfied. We need

X(0) = | −3 1 | C + | −4 − 10/3 | = | −3 1 | C + | −22/3 | = |  2 |
       |  1 1 |     | 2/3       |   |  1 1 |     | 2/3   |   | −7 |

This is the equation

PC = | 2 + 22/3 | = |  28/3 |
     | −7 − 2/3 |   | −23/3 |

We already have P⁻¹, so

C = P⁻¹ |  28/3 | = | −1/4  1/4 | |  28/3 | = | −17/4  |
        | −23/3 |   |  1/4  3/4 | | −23/3 |   | −41/12 |

The initial value problem has the unique solution

X(t) = | −3e^(2t)  e^(6t) | | −17/4  | + | −4e^(3t) − 10/3 |
       | e^(2t)    e^(6t) | | −41/12 |   | 2/3             |

     = | (51/4)e^(2t) − (41/12)e^(6t) − 4e^(3t) − 10/3 |
       | −(17/4)e^(2t) − (41/12)e^(6t) + 2/3           |
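As a numerical sanity check on the closed-form answer (a scipy sketch, not part of the text), one can integrate the initial value problem directly and compare against the formula at several times:

```python
import numpy as np
from scipy.integrate import solve_ivp

# The initial value problem of Example 10.17.
A = np.array([[3.0, 3.0], [1.0, 5.0]])
G = lambda t: np.array([8.0, 4.0 * np.exp(3.0 * t)])

def exact(t):
    # Closed-form solution obtained above.
    return np.array([
        (51/4)*np.exp(2*t) - (41/12)*np.exp(6*t) - 4*np.exp(3*t) - 10/3,
        -(17/4)*np.exp(2*t) - (41/12)*np.exp(6*t) + 2/3,
    ])

sol = solve_ivp(lambda t, x: A @ x + G(t), (0.0, 1.0), [2.0, -7.0],
                rtol=1e-11, atol=1e-12, dense_output=True)

ts = np.linspace(0.0, 1.0, 5)
print(max(np.max(np.abs(sol.sol(ti) - exact(ti))) for ti in ts))
```

At t = 0 the formula returns (2, −7) exactly, so the comparison also confirms that the initial condition is satisfied.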
SECTION 10.3
PROBLEMS
In each of Problems 1 through 5, use variation of parameters to find the general solution of X = AX + G, with A and G as given. −3et 5 2 1. −2 1 e3t 1 2 −4 2. 3t 1 −2 2e6t 7 −1 3. 6t 1 5 6te ⎞ ⎞ ⎛ 2t ⎛ e cos3t 2 0 0 ⎠ −2 4. ⎝ 0 6 −4 ⎠ ⎝ −2 0 4 −2 ⎞ ⎛ ⎛ ⎞ 0 1 0 0 0 t ⎜ 4 3 0 0 ⎟ ⎜ −2e ⎟ ⎟ ⎜ ⎟ 5. ⎜ ⎝ 0 0 3 0 ⎠⎝ 0 ⎠ −1 2 9 1 et In each of Problems 6 through 9, use variation of parameters to solve the initial value problem X = AX + G X0 = X0 , with A, G and X0 as given (in that order). 0 2 0 2 6. 3 10t 5 2 5 −4 2et −1 7. t 4 −3 3 2e ⎞ ⎞ ⎛ ⎞ ⎛ ⎛ 2t 10e 5 2 −3 1 2 4 ⎠ ⎝ 6e2t ⎠ ⎝ 11 ⎠ 8. ⎝ 0 −2 0 0 1 −e2t
⎛
1 9. ⎝ 3 4
−3 −5 7
⎞ ⎛ ⎞ ⎞ ⎛ te−2t 0 6 −2t ⎠⎝ 2 ⎠ 0 ⎠ ⎝ te −2 3 t2 e−2t
10. Recall that a transition matrix for X′ = AX is a fundamental matrix Ω(t) such that Ω(0) = Iₙ.
(a) Prove that, for a transition matrix Ω(t), Ω⁻¹(t) = Ω(−t) and Ω(t + s) = Ω(t)Ω(s) for real s and t.
(b) Suppose Λ(t) is any fundamental matrix for X′ = AX. Prove that Ω(t) = Λ(t)Λ⁻¹(0) is a transition matrix. That is, Ω(t) is a fundamental matrix and Ω(0) = Iₙ.

In each of Problems 11, 12 and 13, verify that the given matrix is a fundamental matrix for the system and use it to find a transition matrix.

11. x₁′ = 4x₁ + 2x₂, x₂′ = 3x₁ + 3x₂;   |  2e^t  e^(6t) |
                                        | −3e^t  e^(6t) |

12. x₁′ = −10x₂, x₂′ = (5/2)x₁ − 10x₂;   | 2e^(−5t)  (1 + 5t)e^(−5t) |
                                         | e^(−5t)   (5/2)te^(−5t)   |

13. x₁′ = 5x₁ − 4x₂ + 4x₃, x₂′ = 12x₁ − 11x₂ + 12x₃, x₃′ = 4x₁ − 4x₂ + 5x₃;

| e^(−3t)    e^t   0   |
| 3e^(−3t)   0     e^t |
| e^(−3t)   −e^t   e^t |

In each of Problems 14 through 18, find the general solution of the system by diagonalization. The general solutions of the associated homogeneous systems X′ = AX were requested in Problems 36 through 40 of Section 10.2.
14. x₁′ = −2x₁ + x₂, x₂′ = −4x₁ + 3x₂ + 10 cos(t)
15. x₁′ = 3x₁ + 3x₂ + 8, x₂′ = x₁ + 5x₂ + 4e^(3t)
16. x₁′ = x₁ + x₂ + 6e^(3t), x₂′ = x₁ + x₂ + 4
17. x₁′ = 6x₁ + 5x₂ − 4 cos(3t), x₂′ = x₁ + 2x₂ + 8
18. x₁′ = 3x₁ − 2x₂ + 3e^(2t), x₂′ = 9x₁ − 3x₂ + e^(2t)

In each of Problems 19 through 23, solve the initial value problem by diagonalization.

19. x₁′ = x₁ + x₂ + 6e^(2t), x₂′ = x₁ + x₂ + 2e^(2t); x₁(0) = 6, x₂(0) = 0
20. x₁′ = x₁ − 2x₂ + 2t, x₂′ = −x₁ + 2x₂ + 5; x₁(0) = 13, x₂(0) = 12
21. x₁′ = 2x₁ − 5x₂ + 5 sin(t), x₂′ = x₁ − 2x₂; x₁(0) = 10, x₂(0) = 5
22. x₁′ = 5x₁ − 4x₂ + 4x₃ − 3e^(−3t), x₂′ = 12x₁ − 11x₂ + 12x₃ + t, x₃′ = 4x₁ − 4x₂ + 5x₃; x₁(0) = 1, x₂(0) = −1, x₃(0) = 2
23. x₁′ = 3x₁ − x₂ − x₃, x₂′ = x₁ + x₂ − x₃ + t, x₃′ = x₁ − x₂ + x₃ + 2e^t; x₁(0) = 1, x₂(0) = 2, x₃(0) = −2
CHAPTER 11
Qualitative Methods and Systems of Nonlinear Differential Equations
11.1 Nonlinear Systems and Existence of Solutions

The preceding chapter was devoted to matrix methods for solving systems of differential equations. Matrices are suited to linear problems. In algebra, the equations we solve by matrix methods are linear, and in differential equations the systems we solve by matrices are also linear. However, many interesting problems in mathematics, the sciences, engineering, economics, business and other areas involve systems of nonlinear differential equations, or nonlinear systems. We will consider such systems having the special form:

x₁′(t) = F₁(t, x₁, x₂, …, xₙ)
x₂′(t) = F₂(t, x₁, x₂, …, xₙ)
···
xₙ′(t) = Fₙ(t, x₁, x₂, …, xₙ)        (11.1)

This assumes that each equation of the system can be written with a first derivative isolated on one side, and a function of t and the unknown functions x₁(t), …, xₙ(t) on the other. Initial conditions for this system have the form

x₁(t₀) = x₁⁰, x₂(t₀) = x₂⁰, …, xₙ(t₀) = xₙ⁰        (11.2)

in which t₀ is a given number and x₁⁰, …, xₙ⁰ are given numbers. An initial value problem consists of finding a solution of the system (11.1) satisfying the initial conditions (11.2). We will state an existence/uniqueness result for this initial value problem. In the statement, an open rectangular parallelepiped in (n + 1)-dimensional (t, x₁, …, xₙ)-space consists of all points (t, x₁, …, xₙ) in Rⁿ⁺¹ whose coordinates satisfy inequalities

α < t < β, α₁ < x₁ < β₁, …, αₙ < xₙ < βₙ
If n = 1 this is an open rectangle in the (t, x) plane, and for n = 2 these points form an open three-dimensional box in 3-space. "Open" means that only points in the interior of the parallelepiped, and no points on the bounding faces, are included.

THEOREM 11.1  Existence/Uniqueness for Nonlinear Systems

Let F₁, …, Fₙ and their first partial derivatives be continuous at all points of an open rectangular parallelepiped K in Rⁿ⁺¹. Let (t₀, x₁⁰, …, xₙ⁰) be a point of K. Then there exists a positive number h such that the initial value problem consisting of the system (11.1) and the initial conditions (11.2) has a unique solution

x₁ = φ₁(t), x₂ = φ₂(t), …, xₙ = φₙ(t)

defined for t₀ − h < t < t₀ + h.

Many systems we encounter are nonlinear and cannot be solved in terms of elementary functions. This is why we need an existence theorem, and why we will shortly develop qualitative methods to determine properties of solutions without having them explicitly in hand. As we develop ideas and methods for analyzing nonlinear systems, it will be helpful to have some examples to fall back on and against which to measure new ideas. Here are two examples that are important and that come with some physical intuition about how solutions should behave.
EXAMPLE 11.1 The Simple Damped Pendulum

We will derive a system of differential equations describing the motion of a simple pendulum, as shown in Figure 11.1. Although we have some intuition about how a pendulum bob should move, nevertheless the system of differential equations describing this motion is nonlinear and cannot be solved in closed form.

Suppose the pendulum bob has mass m, and is at the end of a rod of length L. The rod is assumed to be so light that its weight does not figure into the motion of the bob. It serves only to constrain the bob to remain at fixed distance L from the point of suspension. The position of the bob at any time is described by its displacement angle θ(t) from the vertical. At some time we call t = 0 the bob is displaced by an angle θ₀ and released from rest.

FIGURE 11.1 Simple, damped pendulum.

To describe the motion of the bob, we must analyze the forces acting on it. Gravity acts downward with a force of magnitude mg. The damping force (air resistance, friction of the bar at its pivot point) is assumed to have magnitude cLθ′(t) for some positive constant c. By Newton's laws of motion, the rate of change of angular momentum about any point, with respect to time,
equals the moment of the resultant force about that point. The angular momentum is mL²θ′(t). From the diagram, the horizontal distance between the bob and the vertical center position at time t is L sin(θ(t)). Then

mL²θ″(t) = −cLθ′(t) − mgL sin(θ(t)).

The negative signs on the right take into account the fact that, if the bob is displaced to the right, these forces tend to make the bob rotate clockwise, which is the negative orientation. It is customary to write this differential equation as

θ″ + γθ′ + ω² sin(θ) = 0        (11.3)

with γ = c/mL and ω² = g/L. Convert this second order equation to a system as follows. Let

x = θ, y = θ′.

Then the pendulum equation (11.3) becomes

x′ = y, y′ + γy + ω² sin(x) = 0,

or

x′ = y
y′ = −ω² sin(x) − γy.

This is a nonlinear system because of the sin(x) term. We cannot write a solution of this system in closed form. However, we will soon have methods to analyze the behavior of solutions and hence the motion itself. In matrix form, the pendulum system is

X′ = | 0   1 | X + | 0          |
     | 0  −γ |     | −ω² sin(x) |

in which X = (x, y)ᵀ.
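Although the pendulum system has no closed-form solution, it integrates easily by numerical methods. A sketch (scipy), using the parameter values ω² = 10 and γ = 0.3 that appear later with Figure 11.5; the initial data are an arbitrary illustration:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Damped pendulum: x' = y, y' = -w2*sin(x) - gamma*y.
w2, gamma = 10.0, 0.3      # omega^2 and gamma as used for Figure 11.5

def pendulum(t, u):
    x, y = u
    return [y, -w2 * np.sin(x) - gamma * y]

# Release from rest at 2 radians (an illustrative initial angle).
sol = solve_ivp(pendulum, (0.0, 30.0), [2.0, 0.0], rtol=1e-9, atol=1e-11)

print(sol.y[:, -1])   # decaying oscillation: both components small by t = 30
```

Damping removes energy, so the computed state drifts toward the equilibrium (0, 0), which is what the phase portrait of Figure 11.5 shows qualitatively.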
EXAMPLE 11.2 Nonlinear Spring

Consider an object of mass m attached to a spring. If the object is displaced and released, its motion is governed by Hooke's law, which states that the force exerted on the mass by the spring is F(r) = −kr, with k a positive constant and r the distance displaced from the equilibrium position (the position at which the object is at rest). Figure 2.4 shows a typical such mass/spring system, with r used here for the displacement instead of the y used in Chapter 2. This is a linear model, since F is a linear function (a constant times r to the first power).

The spring model becomes nonlinear if F(r) is nonlinear. Simple nonlinear models are achieved by adding terms to −kr. What kind of terms should we add? Intuition tells us that the spring should not care whether we displace an object left or right before releasing it. Since displacements in opposite directions carry opposite signs, this means that we want F(−r) = −F(r), so F should be an odd function. This suggests adding multiples of odd powers of r. The simplest such model is

F(r) = −kr + εr³.

If we also allow a damping force which in magnitude is proportional to the velocity, then by Newton's law this spring motion is governed by the second order differential equation

mr″ = −kr + εr³ − cr′.

To convert this to a system, let x = r and y = r′. The system is

x′ = y
y′ = −(k/m)x + (ε/m)x³ − (c/m)y.

In matrix form, this system is

X′ = | 0     1    | X + | 0      |
     | −k/m  −c/m |     | εx³/m  |

SECTION 11.1
PROBLEMS

1. Apply the existence/uniqueness theorem to the system for the simple damped pendulum, with initial conditions x(0) = a, y(0) = b. What are the physical interpretations of the initial conditions? Are there any restrictions on the numbers a and b in applying the theorem to assert the existence of a unique solution in some interval (−h, h)?

2. Apply the existence/uniqueness theorem to the system for the nonlinear spring system, with initial conditions x(0) = a, y(0) = b. What are the physical interpretations of the initial conditions? Are there any restrictions on the numbers a and b in applying the theorem to assert the existence of a unique solution in some interval (−h, h)?

3. Suppose the driving force for the nonlinear spring has additional terms, say F(r) = −kr + εr³ + δr⁵. Does this problem still have a unique solution in some interval (−h, h)?

11.2
The Phase Plane, Phase Portraits and Direction Fields

Throughout this chapter we will consider systems of two first-order differential equations in two unknowns. In this case it is convenient to denote the variables as x and y rather than x₁ and x₂. Thus consider the system

x′(t) = f(x(t), y(t))
y′(t) = g(x(t), y(t))        (11.4)

in which f and g are continuous, with continuous first partial derivatives, in some part of the plane. We often write this system as

X′ = F(x(t), y(t)),

where

X = (x, y)ᵀ and F(x, y) = (f(x, y), g(x, y))ᵀ.

The system (11.4) is a special case of the system (11.1). We assume in (11.4) that neither f nor g has an explicit dependence on t. Rather, f and g depend only on x and y, and t appears only through the dependence of these two variables on t. We refer to such a system as autonomous.
Working in the plane will allow us the considerable advantage of geometric intuition. If x = φ(t), y = ψ(t) is a solution of (11.4), the point (φ(t), ψ(t)) traces out a curve in the plane as t varies. Such a curve is called a trajectory, or orbit, of the system. A copy of the plane containing drawings of trajectories is called a phase portrait for the system (11.4). In this context, the x, y plane is called the phase plane. We may consider trajectories as oriented, with (φ(t), ψ(t)) moving along the trajectory in a certain direction as t increases. If we think of t as time, then (φ(t), ψ(t)) traces out the path of motion of a particle, moving under the influence of the system (11.4), as time increases. In the case of orbits that are closed curves, we take counterclockwise orientation as the positive orientation, unless specific exception is made.

In some phase portraits, short arrows are also drawn. The arrow at any point is along the tangent to the trajectory through that point, and in the direction of motion along this trajectory. This type of drawing combines the phase portrait with a direction field, and gives an overall sense of the flow of the trajectories, as well as graphs of some specific trajectories. One way to construct trajectories is to write

dy/dx = (dy/dt)/(dx/dt) = g(x, y)/f(x, y).

Because the system is autonomous, this is a differential equation in x and y and we can attempt to solve it and graph solutions. If the system is nonautonomous, then g/f may depend explicitly on t, and we cannot use this strategy to generate trajectories.
EXAMPLE 11.3

Consider the autonomous system

x′ = y = f(x, y)
y′ = x²y² = g(x, y)

Then

dy/dx = x²y²/y = x²y,

a separable differential equation we write as

(1/y) dy = x² dx.

Integrate to get

ln|y| = (1/3)x³ + C,

or

y = Ae^(x³/3).

Graphs of these curves for various values of A form trajectories of this system, some of which are shown in Figure 11.2.
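The family y = Ae^(x³/3) can be confirmed symbolically in one line; the sketch below (sympy) checks that it satisfies the trajectory equation dy/dx = x²y:

```python
import sympy as sp

x, A = sp.symbols('x A')

# Trajectory family from Example 11.3.
y = A * sp.exp(x**3 / 3)

# Along a trajectory, dy/dx must equal g/f = x**2 * y.
print(sp.simplify(sp.diff(y, x) - x**2 * y))   # 0
```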
FIGURE 11.2 Some trajectories of the system x′ = y, y′ = x²y².
EXAMPLE 11.4

For the autonomous system

x′ = −2y − x sin(xy)
y′ = 2x + y sin(xy)        (11.5)

we have

dy/dx = −(2x + y sin(xy))/(2y + x sin(xy)).

This is not separable, but we can write

(2x + y sin(xy)) dx + (2y + x sin(xy)) dy = 0,

which is exact. We find the potential function H(x, y) = x² + y² − cos(xy), and the general solution of this differential equation is defined implicitly by

H(x, y) = x² + y² − cos(xy) = C,

in which C is an arbitrary constant. Figure 11.3 shows a phase portrait for this system (11.5), consisting of graphs of these curves for various choices of C.

Usually we will not be so fortunate as to be able to solve dy/dx = g(x, y)/f(x, y) in closed form. In such a case we may still be able to use a software package to generate a phase portrait. Figure 11.4 is a phase portrait of the system

x′ = x cos(y)
y′ = x² − y³ + sin(x − y)
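A quick way to validate the potential function is to differentiate H along solutions of (11.5): dH/dt = Hₓx′ + H_y y′ must vanish identically. A sympy sketch:

```python
import sympy as sp

x, y = sp.symbols('x y')

# System (11.5) and the potential function from Example 11.4.
f = -2*y - x*sp.sin(x*y)
g = 2*x + y*sp.sin(x*y)
H = x**2 + y**2 - sp.cos(x*y)

# dH/dt along a solution is H_x * x' + H_y * y'; it vanishes identically,
# so each trajectory stays on a level curve H(x, y) = C.
dHdt = sp.diff(H, x) * f + sp.diff(H, y) * g
print(sp.simplify(dHdt))   # 0
```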
FIGURE 11.3 Phase portrait for x′ = −2y − x sin(xy), y′ = 2x + y sin(xy).
FIGURE 11.4 Phase portrait for x′ = x cos(y), y′ = x² − y³ + sin(x − y).
generated in this way. Figure 11.5 is a phase portrait for a damped pendulum with ω² = 10 and γ = 0.3, and Figure 11.6 is a phase portrait for a nonlinear spring system with ε = 0.2, k/m = 4 and c/m = 2. We will consider phase portraits for the damped pendulum and nonlinear spring system in more detail when we treat almost linear systems.

If x = φ(t), y = ψ(t) is a solution of (11.4), and c is a constant, we call the pair φ(t + c), ψ(t + c) a translation of φ and ψ. We will use the following fact.
FIGURE 11.5 Phase portrait for a damped pendulum.
FIGURE 11.6 Phase portrait for a nonlinear spring.
LEMMA 11.1

A translation of a solution of the system (11.4) is also a solution of this system.

Proof  Suppose x = φ(t), y = ψ(t) is a solution. This means that

x′(t) = φ′(t) = f(φ(t), ψ(t)) and y′(t) = ψ′(t) = g(φ(t), ψ(t)).

Let

x̃(t) = φ(t + c) and ỹ(t) = ψ(t + c)

for some constant c. By the chain rule,

dx̃/dt = (dφ(t + c)/d(t + c)) (d(t + c)/dt) = φ′(t + c) = f(φ(t + c), ψ(t + c)) = f(x̃(t), ỹ(t))

and, similarly,

dỹ/dt = ψ′(t + c) = g(φ(t + c), ψ(t + c)) = g(x̃(t), ỹ(t)).

Therefore x = x̃(t), y = ỹ(t) is also a solution.

We may think of a translation as a reparametrization of the trajectory, which of course does not alter the fact that it is a trajectory. If we think of the point (φ(t), ψ(t)) as moving along the orbit, a translation simply means rescheduling the point to change the times at which it passes through given points of the orbit. We will need the following facts about trajectories.
THEOREM 11.2

Let f and g be continuous, with continuous first partial derivatives, in the x, y plane. Then:

1. If (a, b) is a point in the plane, there is a trajectory through (a, b).
2. Two trajectories passing through the same point must be translations of each other.

Proof  Conclusion (1) follows immediately from Theorem 11.1, since the initial value problem

x′ = f(x, y), y′ = g(x, y); x(0) = a, y(0) = b

has a solution, and the graph of this solution is a trajectory through (a, b).

For (2), suppose x = φ₁(t), y = ψ₁(t) and x = φ₂(t), y = ψ₂(t) are trajectories of the system (11.4). Suppose both trajectories pass through (a, b). Then for some t₀,

φ₁(t₀) = a and ψ₁(t₀) = b,

and for some t₁,

φ₂(t₁) = a and ψ₂(t₁) = b.

Let c = t₀ − t₁ and define x̃(t) = φ₁(t + c) and ỹ(t) = ψ₁(t + c). Then x = x̃(t), y = ỹ(t) is a trajectory, by Lemma 11.1. Further,

x̃(t₁) = φ₁(t₀) = a and ỹ(t₁) = ψ₁(t₀) = b.

Therefore x = x̃(t), y = ỹ(t) is the unique solution of the initial value problem

x′ = f(x, y), y′ = g(x, y); x(t₁) = a, y(t₁) = b.

But x = φ₂(t), y = ψ₂(t) is also the solution of this problem. Therefore, for all t,

φ₂(t) = x̃(t) = φ₁(t + c) and ψ₂(t) = ỹ(t) = ψ₁(t + c).

This proves that the two trajectories x = φ₁(t), y = ψ₁(t) and x = φ₂(t), y = ψ₂(t) are translations of each other.
If we think of translations of trajectories as the same trajectory (just a change in the parameter), then conclusion (2) states that distinct trajectories cannot cross each other. This would violate uniqueness of the solution of the system that passes through the point of intersection. Conclusion (2) of Theorem 11.2 does not hold for systems that are not autonomous.
EXAMPLE 11.5

Consider the system

x′(t) = (1/t)x = f(t, x, y)
y′(t) = −(1/t)y + x = g(t, x, y)

This is nonautonomous, since f and g have explicit t-dependencies. We can solve this system. The first equation is separable. Write

(1/x) dx = (1/t) dt

to obtain x(t) = ct. Substitute this into the second equation to get

y′ + (1/t)y = ct,

a linear first-order differential equation. This equation can be written

ty′ + y = ct², or (ty)′ = ct².

Integrate to get

ty = (c/3)t³ + d.

Hence

y(t) = (c/3)t² + d(1/t).

Now observe that conclusion (2) of Theorem 11.2 fails for this system. For example, for any number t₀, the trajectory

x(t) = (1/t₀)t, y(t) = (1/(3t₀))t² − (t₀²/3)(1/t)

passes through (1, 0) at time t₀. Because t₀ is arbitrary, this gives many trajectories passing through (1, 0) at different times, and these trajectories are not translations of each other.

We now have some of the vocabulary and tools needed to analyze 2 × 2 nonlinear autonomous systems of differential equations. First, however, we will reexamine linear systems, which we know how to solve explicitly. This will serve two purposes. It will give us some experience with phase portraits, as well as insight into significant features that solutions of a system might have. In addition, we will see shortly that some nonlinear systems can be thought of as perturbations of linear systems (that is, as linear systems with "small" nonlinear terms added). In such a case, knowledge of solutions of the linear system yields important information about solutions of the nonlinear system.
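The failure of conclusion (2) here can be verified symbolically. The sketch below (sympy) checks that each member of the family above solves the nonautonomous system and passes through (1, 0) at time t₀:

```python
import sympy as sp

t, t0 = sp.symbols('t t0', positive=True)

# The trajectory family from Example 11.5.
x = t / t0
y = t**2 / (3*t0) - t0**2 / (3*t)

# Each pair solves x' = x/t and y' = -y/t + x ...
assert sp.simplify(sp.diff(x, t) - x/t) == 0
assert sp.simplify(sp.diff(y, t) + y/t - x) == 0
# ... and passes through (1, 0) at t = t0, for every choice of t0.
assert x.subs(t, t0) == 1 and sp.simplify(y.subs(t, t0)) == 0
print("all identities check")
```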
SECTION 11.2

PROBLEMS
In each of Problems 1 through 6, find the general solution of the system and draw a phase portrait containing at least six trajectories of the system.

1. x′ = 4x + y, y′ = −17x − 4y
2. x′ = 2x, y′ = 8x + 2y
3. x′ = 4x − 7y, y′ = 2x − 5y
4. x′ = 3x − 2y, y′ = 10x − 5y
5. x′ = 5x − 2y, y′ = 4y
6. x′ = −4x − 6y, y′ = 2x − 11y

In each of Problems 7 through 12, use the method of Examples 11.3 and 11.4 to draw some integral curves (at least six) for the system.

7. x′ = 9y, y′ = −4x
8. x′ = 2xy, y′ = y² − x²
9. x′ = y + 2, y′ = x − 1
10. x′ = csc(x), y′ = y
11. x′ = x, y′ = x + y
12. x′ = x², y′ = y

13. How would phase portraits for the following systems compare with each other?
(a) x′ = F(x, y), y′ = G(x, y)
(b) x′ = −F(x, y), y′ = −G(x, y)

11.3
Phase Portraits of Linear Systems

In preparation for studying the nonlinear autonomous system (11.4), we will thoroughly analyze the linear system

X′ = AX        (11.6)

in which A is a 2 × 2 real matrix and X = (x, y)ᵀ. We assume that A is nonsingular, so the equation AX = O has only the trivial solution. For the linear system X′ = AX, we actually have the solutions in hand. We will examine these solutions to prepare for the analysis of nonlinear systems, for which we are unlikely to have explicit solutions.

The origin (0, 0) stands apart from other points in the plane in the following respect. The trajectory through the origin is the solution of

X′ = AX, X(0) = O,

and this is the constant trajectory x(t) = 0, y(t) = 0 for all t. The graph of this trajectory is the single point (0, 0). For this reason, the origin is called an equilibrium point of the system, and the constant solution X = O is called an equilibrium solution. The origin is also called a critical point of X′ = AX. By Theorem 11.1, no other trajectory can pass through this point. As we proceed, observe how the behavior of trajectories of X′ = AX near this critical point is the key to understanding the behavior of trajectories throughout the entire plane. The critical point, then, will be the focal point in drawing a phase portrait of the system and analyzing the behavior of solutions.
We will draw the phase portrait for X′ = AX in all cases that can occur. Because the general solution of (11.6) is completely determined by the eigenvalues of A, we will use these eigenvalues to distinguish cases.

Case 1—Real, distinct eigenvalues λ and μ of the same sign. Let the associated eigenvectors be, respectively, E₁ and E₂. Since λ and μ are distinct, E₁ and E₂ are linearly independent. The general solution is

X(t) = (x(t), y(t))ᵀ = c₁E₁e^(λt) + c₂E₂e^(μt).

Since E₁ and E₂ are vectors in the plane, we can represent them as arrows from the origin, as in Figure 11.7. Draw half-lines L₁ and L₂ from the origin along these eigenvectors, respectively, as shown. These half-lines are parts of trajectories, and so do not pass through the origin, which is itself a trajectory.

FIGURE 11.7 Eigenvectors E₁ and E₂ of X′ = AX, for distinct eigenvalues of A.
Now consider subcases.

Case 1-(a)—The eigenvalues are negative, say λ < μ < 0. Since e^(λt) → 0 and e^(μt) → 0 as t → ∞, then X(t) → (0, 0) and every trajectory approaches the origin as t → ∞. However, this can happen in three ways, depending on an initial point P₀ = (x₀, y₀) we choose for a trajectory to pass through at time t = 0. Here are the three possibilities.

If P₀ is on L₁, then c₂ = 0 and X(t) = c₁E₁e^(λt), which for any t is a scalar multiple of E₁. The trajectory through P₀ is the half-line from the origin along L₁ through P₀, and the arrows toward the origin indicate that points on this trajectory approach the origin along L₁ as t increases. This is the trajectory T₁ of Figure 11.8.

If P₀ is on L₂, then c₁ = 0 and now X(t) = c₂E₂e^(μt). This trajectory is a half-line from the origin along L₂ through P₀. Again, the arrows indicate that points on this trajectory also approach the origin along L₂ as t → ∞. This is the trajectory T₂ of Figure 11.8.

If P₀ is on neither L₁ nor L₂, then the trajectory is a curve through P₀ having the parametric form

X(t) = c₁E₁e^(λt) + c₂E₂e^(μt).

Write this as

X(t) = e^(μt)(c₁E₁e^((λ−μ)t) + c₂E₂).
FIGURE 11.8 Trajectories along E₁, along E₂, or asymptotic to E₂ in the case λ < μ < 0.
Because λ − μ < 0, e^((λ−μ)t) → 0 as t → ∞ and the term c₁E₁e^((λ−μ)t) exerts increasingly less influence on X(t). In this case, X(t) still approaches the origin, but also approaches the line L₂, as t → ∞. A typical such trajectory is shown as the curve T₃ of Figure 11.8.

A phase portrait of X′ = AX in this case therefore has some trajectories approaching the origin along the lines through the eigenvectors of A and all others approaching the origin along curves that approach one of these lines asymptotically. In this case the origin is called a nodal sink of the system X′ = AX. We can think of particles flowing along the trajectories and toward the origin. The following example and phase portrait are typical of nodal sinks.
EXAMPLE 11.6

Consider the system X′ = AX, in which

A = | −6  −2 |
    |  5   1 |

A has eigenvalues and corresponding eigenvectors

−4, | −1 |   and   −1, |  2 |
    |  1 |             | −5 |

In the notation of the discussion, λ = −4 and μ = −1. The general solution is

X(t) = c₁ | −1 | e^(−4t) + c₂ |  2 | e^(−t)
          |  1 |              | −5 |

L₁ is the line through (−1, 1), and L₂ the line through (2, −5). Figure 11.9 shows a phase portrait for this system, with the origin a nodal sink.

Case 1-(b)—The eigenvalues are positive, say 0 < λ < μ. The discussion of Case 1-(a) can be replicated with one change. Now e^(λt) and e^(μt) approach ∞ instead of zero as t increases. The phase portrait is like that of the previous case, except all the arrows are reversed and trajectories flow away from the origin instead of into the origin as time increases. As we might expect, now the origin is called a nodal source. Particles are flowing away from the origin. Here is a typical example of a nodal source.
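Eigenvalue computations like this one are easy to cross-check numerically. A short sketch (numpy, not from the text) for the matrix of Example 11.6:

```python
import numpy as np

A = np.array([[-6.0, -2.0], [5.0, 1.0]])   # matrix of Example 11.6

vals, vecs = np.linalg.eig(A)
print(sorted(vals.real))                   # eigenvalues -4 and -1

# Both eigenvalues are real and negative, so the origin is a nodal sink.
assert np.all(np.isreal(vals)) and np.all(vals.real < 0)
```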
FIGURE 11.9 Phase portrait showing a nodal sink of x′ = −6x − 2y, y′ = 5x + y.
EXAMPLE 11.7

Consider the system

X′ = | 3 3 | X
     | 1 5 |

This has eigenvalues and eigenvectors

2, | −3 |   and   6, | 1 |
   |  1 |            | 1 |

Now μ = 6 and λ = 2, and the general solution is

X(t) = c₁ | −3 | e^(2t) + c₂ | 1 | e^(6t)
          |  1 |             | 1 |

Figure 11.10 shows a phase portrait for this system, exhibiting the behavior expected for a nodal source at the origin.

Case 2—Real, distinct eigenvalues of opposite sign. Suppose the eigenvalues are λ and μ with μ < 0 < λ. The general solution still has the appearance

X(t) = c₁E₁e^(λt) + c₂E₂e^(μt),

and we start to draw a phase portrait by again drawing half-lines L₁ and L₂ from the origin along the eigenvectors. If P₀ is on L₁, then c₂ = 0 and X(t) moves on this half-line away from the origin as t increases, because λ > 0 and e^(λt) → ∞ as t → ∞. But if P₀ is on L₂, then c₁ = 0 and X(t) moves along this half-line toward the origin, because μ < 0 and e^(μt) → 0 as t → ∞.
FIGURE 11.10 Nodal source of x′ = 3x + 3y, y′ = x + 5y.
The arrows along the half-lines along the eigenvectors therefore have opposite directions, toward the origin along L2 and away from the origin along L1 . This is in contrast to Case 1, in which solutions starting out on the half-lines through the eigenvectors either both approached the origin or both moved away from the origin as time increased. If P0 is on neither L1 nor L2 , then the trajectory through P0 does not come arbitrarily close to the origin for any times, but rather approaches the direction determined by the eigenvector E1 as t → (in which case et → 0) or the direction determined by E2 as t → − (in which case et → 0). The phase portrait therefore has typical trajectories as shown in Figure 11.11. The
FIGURE 11.11 Typical phase portrait for a saddle point at the origin.
lines along the eigenvectors determine four trajectories that separate the plane into four regions. A trajectory starting in one of these regions must remain in it because distinct trajectories cannot cross each other, and such a trajectory is asymptotic to both of the lines bounding its region. The origin in this case is called a saddle point.
EXAMPLE 11.8
Consider \(X' = AX\) with
\[ A = \begin{pmatrix} -1 & 3 \\ 2 & -2 \end{pmatrix}. \]
Eigenvalues and eigenvectors of A are \(-4\) with \(\begin{pmatrix} -1 \\ 1 \end{pmatrix}\), and \(1\) with \(\begin{pmatrix} 3 \\ 2 \end{pmatrix}\). The general solution is
\[ X(t) = c_1\begin{pmatrix} -1 \\ 1 \end{pmatrix}e^{-4t} + c_2\begin{pmatrix} 3 \\ 2 \end{pmatrix}e^{t}, \]
and a phase portrait is given in Figure 11.12. In this case of a saddle point at the origin, trajectories do not enter or leave the origin, but asymptotically approach the lines determined by the eigenvectors.

Case 3—Equal eigenvalues. Suppose A has the real eigenvalue \(\lambda\) of multiplicity 2. There are two possibilities.
FIGURE 11.12 Saddle point of \(x' = -x + 3y\), \(y' = 2x - 2y\).
Case 3-(a)—A has two linearly independent eigenvectors \(E_1\) and \(E_2\). Now the general solution of \(X' = AX\) is
\[ X(t) = (c_1E_1 + c_2E_2)e^{\lambda t}. \]
If
\[ E_1 = \begin{pmatrix} a \\ b \end{pmatrix} \quad\text{and}\quad E_2 = \begin{pmatrix} h \\ k \end{pmatrix}, \]
then, in terms of components,
\[ x(t) = (c_1a + c_2h)e^{\lambda t}, \qquad y(t) = (c_1b + c_2k)e^{\lambda t}. \]
Now
\[ \frac{y(t)}{x(t)} = \text{constant}. \]
This means that all trajectories in this case are half-lines from the origin. If \(\lambda > 0\), arrows along these trajectories are away from the origin, as in Figure 11.13. If \(\lambda < 0\), they move toward the origin, reversing the arrows in Figure 11.13.

FIGURE 11.13 Typical proper node with positive eigenvalue of A.
The origin in case 3-(a) is called a proper node.

Case 3-(b)—A does not have two linearly independent eigenvectors. In this case there is an eigenvector E, and the general solution has the form
\[ X(t) = c_1(Et + W)e^{\lambda t} + c_2Ee^{\lambda t} = [c_1W + c_2E + c_1tE]e^{\lambda t}. \]
To visualize the trajectories, begin with arrows from the origin representing the vectors W and E. Now, for selected constants \(c_1\) and \(c_2\), draw the vector \(c_1W + c_2E\), which may have various orientations relative to W and E, depending on the signs and magnitudes of \(c_1\) and \(c_2\). Some possibilities are displayed in Figure 11.14. For given \(c_1\) and \(c_2\), the vector \(c_1W + c_2E + c_1tE\), drawn as an arrow from the origin, sweeps out a straight line L as t varies over all real values. For a given t, X(t) is the vector \(c_1W + c_2E + c_1tE\) from the origin to a point on L, with length adjusted by a factor \(e^{\lambda t}\). If \(\lambda\) is negative, then this length goes to zero as \(t \to \infty\) and the vector X(t) sweeps out a curve as shown in Figure 11.15, approaching the origin tangent to E. If
FIGURE 11.14 Vectors \(c_1W + c_2E\) in the case of an improper node.

FIGURE 11.15 Typical trajectory near an improper node.
\(\lambda > 0\), we have the same curve (now \(e^{\lambda t} \to 0\) as \(t \to -\infty\)), except that the arrow indicating the direction of flow on the trajectory is reversed. The origin in this case is called an improper node of the system \(X' = AX\). The following example has a phase portrait that is typical of improper nodes.
EXAMPLE 11.9
Let
\[ A = \begin{pmatrix} -10 & 6 \\ -6 & 2 \end{pmatrix}. \]
Then A has eigenvalue \(-4\), and every eigenvector is a real constant multiple of \(E = \begin{pmatrix} 1 \\ 1 \end{pmatrix}\). A routine calculation gives
\[ W = \begin{pmatrix} 1 \\ 7/6 \end{pmatrix} \]
and the general solution is
\[ X(t) = c_1\begin{pmatrix} t+1 \\ t + 7/6 \end{pmatrix}e^{-4t} + c_2\begin{pmatrix} 1 \\ 1 \end{pmatrix}e^{-4t}. \]
Figure 11.16 is a phase portrait for this system. We can see that the trajectories approach the origin tangent to E in this case of an improper node at the origin, with negative eigenvalue for A.
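The numbers in Example 11.9 are easy to verify directly (a sketch of our own): the eigenvector E must be annihilated by \(A - \lambda I\), and the generalized vector W must be carried onto E.

```python
# Sketch: verifying the data of Example 11.9.
# A = [[-10, 6], [-6, 2]] has the double eigenvalue lambda = -4 with
# eigenvector E = (1, 1); W = (1, 7/6) satisfies (A - lambda*I) W = E.
A = [[-10.0, 6.0], [-6.0, 2.0]]
lam = -4.0
E = (1.0, 1.0)
W = (1.0, 7.0 / 6.0)

def matvec(M, v):
    # 2x2 matrix times 2-vector
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

B = [[A[0][0] - lam, A[0][1]],
     [A[1][0], A[1][1] - lam]]   # B = A - lambda*I = [[-6, 6], [-6, 6]]
zero = matvec(B, E)              # (0, 0): E is an eigenvector
shift = matvec(B, W)             # equals E: W is a generalized eigenvector
```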
FIGURE 11.16 Phase portrait for the improper node of \(x' = -10x + 6y\), \(y' = -6x + 2y\).
Case 4—Complex eigenvalues with nonzero real part. We know that the complex eigenvalues must be complex conjugates, say \(\lambda = \alpha + i\beta\) and \(\mu = \alpha - i\beta\). The complex eigenvectors are also conjugates. Write these, respectively, as \(U + iV\) and \(U - iV\). Then the general solution of \(X' = AX\) is
\[ X(t) = c_1e^{\alpha t}[U\cos(\beta t) - V\sin(\beta t)] + c_2e^{\alpha t}[U\sin(\beta t) + V\cos(\beta t)]. \]
Suppose first that \(\alpha < 0\). The trigonometric terms in this solution cause X(t) to rotate about the origin as t increases, while the factor \(e^{\alpha t}\) causes X(t) to move closer to the origin (or, equivalently, the length of the vector X(t) to decrease to zero) as \(t \to \infty\). This suggests a trajectory that spirals inward toward the origin as t increases. Since t varies over the entire real line, taking on both negative and positive values, the trajectories when \(\alpha > 0\) have the same spiral appearance, but now the arrows are reversed and X(t) moves outward, away from the origin, as \(t \to \infty\). The origin in this case is called a spiral point. When \(\alpha < 0\) the origin is a spiral sink, because the flow defined by the trajectories is spiraling into the origin. When \(\alpha > 0\) the origin is a spiral source, because now the origin appears to be spewing material outward in a spiral pattern. The phase portrait in the following example is typical of a spiral source.
EXAMPLE 11.10
Let
\[ A = \begin{pmatrix} -1 & -2 \\ 4 & 3 \end{pmatrix}, \]
with eigenvalues \(1 + 2i\) and \(1 - 2i\), and eigenvectors, respectively,
\[ \begin{pmatrix} -1+i \\ 2 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} -1-i \\ 2 \end{pmatrix}. \]
Let
\[ U = \begin{pmatrix} -1 \\ 2 \end{pmatrix} \quad\text{and}\quad V = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \]
so that the eigenvectors are \(U + iV\) and \(U - iV\). The general solution of \(X' = AX\) is
\[ X(t) = c_1e^{t}\left[\begin{pmatrix} -1 \\ 2 \end{pmatrix}\cos(2t) - \begin{pmatrix} 1 \\ 0 \end{pmatrix}\sin(2t)\right] + c_2e^{t}\left[\begin{pmatrix} -1 \\ 2 \end{pmatrix}\sin(2t) + \begin{pmatrix} 1 \\ 0 \end{pmatrix}\cos(2t)\right]. \]
Figure 11.17 gives a phase portrait for this system, showing trajectories spiraling away from the spiral source at the origin because the real part of the eigenvalues is positive.
FIGURE 11.17 Spiral source of the system \(x' = -x - 2y\), \(y' = 4x + 3y\).
Case 5—Pure imaginary eigenvalues. Now trajectories have the form
\[ X(t) = c_1[U\cos(\beta t) - V\sin(\beta t)] + c_2[U\sin(\beta t) + V\cos(\beta t)]. \]
Because of the trigonometric terms, this trajectory moves about the origin. Unlike the preceding case, however, there is no exponential factor to decrease or increase distance from the origin as t increases. This trajectory is a closed curve about the origin, representing a periodic solution
of the system. The origin in this case is called a center of \(X' = AX\). In general, any closed trajectory of \(X' = AX\) represents a periodic solution of this system.
EXAMPLE 11.11
Let
\[ A = \begin{pmatrix} 3 & 18 \\ -1 & -3 \end{pmatrix}. \]
A has eigenvalues \(3i\) and \(-3i\), with respective eigenvectors \(\begin{pmatrix} -3-3i \\ 1 \end{pmatrix}\) and \(\begin{pmatrix} -3+3i \\ 1 \end{pmatrix}\). A phase portrait is given in Figure 11.18, showing closed trajectories about the center (origin). If we wish, we can write the general solution
FIGURE 11.18 Center of the system \(x' = 3x + 18y\), \(y' = -x - 3y\).
\[ X(t) = c_1\left[\begin{pmatrix} -3 \\ 1 \end{pmatrix}\cos(3t) + \begin{pmatrix} 3 \\ 0 \end{pmatrix}\sin(3t)\right] + c_2\left[\begin{pmatrix} -3 \\ 1 \end{pmatrix}\sin(3t) + \begin{pmatrix} -3 \\ 0 \end{pmatrix}\cos(3t)\right]. \]
We now have a complete description of the behavior of trajectories for the 2 × 2 constant-coefficient system \(X' = AX\). The general appearance of the phase portrait is completely determined by the eigenvalues of A, and the critical point \((0, 0)\) is the primary point of interest, with the following correspondences:

Real, distinct eigenvalues of the same sign—\((0, 0)\) is a nodal source (Figure 11.10, p. 417) or sink (Figure 11.9, p. 416).

Real, distinct eigenvalues of opposite sign—\((0, 0)\) is a saddle point (Figure 11.12, p. 418).
Equal eigenvalues, two linearly independent eigenvectors—\((0, 0)\) is a proper node (Figure 11.13, p. 419).

Equal eigenvalues, all eigenvectors a multiple of a single eigenvector—\((0, 0)\) is an improper node (Figure 11.16, p. 421).

Complex eigenvalues with nonzero real part—\((0, 0)\) is a spiral point (Figure 11.17, p. 422).

Pure imaginary eigenvalues—\((0, 0)\) is a center (Figure 11.18, p. 423).

When we speak of a classification of the origin of a linear system, we mean a determination of the origin as a nodal source or sink, saddle point, proper or improper node, spiral point, or center.
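This decision table is mechanical enough to express in a few lines of code. The routine below is our own sketch (not the book's); it works from the trace and determinant of A and assumes A is nonsingular, using a small tolerance for the borderline comparisons.

```python
def classify_origin(a, b, c, d, tol=1e-12):
    """Classify (0, 0) for X' = AX, A = [[a, b], [c, d]], by eigenvalue type.

    Sketch only: assumes det(A) != 0 and exact-enough arithmetic.
    """
    p = -(a + d)          # negative of the trace
    q = a * d - b * c     # determinant
    disc = p * p - 4 * q  # discriminant of lambda^2 + p*lambda + q = 0
    if disc > tol:                        # real, distinct eigenvalues
        if q < 0:
            return "saddle point"         # opposite signs
        return "nodal sink" if p > 0 else "nodal source"
    if abs(disc) <= tol:                  # repeated eigenvalue
        return "proper or improper node"
    # complex conjugate eigenvalues
    return "center" if abs(p) <= tol else "spiral point"

# The worked examples of this section:
sink = classify_origin(-6, -2, 5, 1)     # Figure 11.9
source = classify_origin(3, 3, 1, 5)     # Example 11.7
saddle = classify_origin(-1, 3, 2, -2)   # Example 11.8
node = classify_origin(-10, 6, -6, 2)    # Example 11.9
spiral = classify_origin(-1, -2, 4, 3)   # Example 11.10
center = classify_origin(3, 18, -1, -3)  # Example 11.11
```

Distinguishing a proper from an improper node would additionally require checking whether \(A - \lambda I\) is the zero matrix.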
SECTION 11.3 PROBLEMS

In each of Problems 1 through 10, use the eigenvalues of the matrix of the system to classify the origin of the system. Draw a phase portrait for the system. It is assumed here that software is available to do this, and it is not necessary to solve the system to generate the phase portrait.

1. x′ = 3x − 5y, y′ = 5x − 7y
2. x′ = x + 4y, y′ = 3x
3. x′ = x − 5y, y′ = x − y
4. x′ = 9x − 7y, y′ = 6x − 4y
5. x′ = 7x − 17y, y′ = 2x + y
6. x′ = 2x − 7y, y′ = 5x − 10y
7. x′ = 4x − y, y′ = x + 2y
8. x′ = 3x − 5y, y′ = 8x − 3y
9. x′ = −2x − y, y′ = 3x − 2y
10. x′ = −6x − 7y, y′ = 7x − 20y

11.4
Critical Points and Stability

A complete knowledge of the possible phase portraits of linear 2 × 2 systems is good preparation for the analysis of nonlinear systems. In this section we will introduce the concept of a critical point for a nonlinear system, define stability of critical points, and prepare for the qualitative analysis of nonlinear systems, in which we attempt to draw conclusions about how solutions will behave without having explicit solutions in hand. We will consider the 2 × 2 autonomous system
\[ x'(t) = f(x(t), y(t)), \qquad y'(t) = g(x(t), y(t)), \]
or, more compactly,
\[ x' = f(x, y), \qquad y' = g(x, y). \]
This is the system (11.4) discussed in Section 11.2. We will assume that f and g are continuous with continuous first partial derivatives in some region D of the (x, y)-plane. In specific cases D may be the entire plane. This system can be written in matrix form as
\[ X' = F(X), \]
In August of 1999, the Petronas Towers was officially opened. Designed by the American firm of Cesar Pelli and Associates, in collaboration with Kuala Lumpur City Center architects, the graceful towers have an elegant slenderness (height to width) ratio of 9:4. This was made possible by modern materials and building techniques, featuring high-strength concrete that is twice as effective as steel in sway reduction. The towers are supported by 75-foot-by-75-foot concrete cores and an outer ring of super columns. The 88 floors stand 452 meters above street level, and include 65,000 square meters of stainless steel cladding and 77,000 square meters of vision glass. Computations of stability of structures involve the analysis of critical points of systems of nonlinear differential equations.
where
\[ X(t) = \begin{pmatrix} x(t) \\ y(t) \end{pmatrix} \quad\text{and}\quad F(X) = \begin{pmatrix} f(x, y) \\ g(x, y) \end{pmatrix}. \]
Taking the lead from the linear system \(X' = AX\), we make the following definition.
DEFINITION 11.1
Critical Point
A point \((x_0, y_0)\) in D is a critical point (or equilibrium point) of \(X' = F(X)\) if \(f(x_0, y_0) = g(x_0, y_0) = 0\).
We see immediately one significant difference between the linear and nonlinear cases. The linear system \(X' = AX\), with A nonsingular, has exactly one critical point, the origin. A nonlinear system \(X' = F(X)\) can have any number of critical points. We will, however, only consider systems in which critical points are isolated. This means that, about any critical point, there is a circle that contains no other critical point of the system.
EXAMPLE 11.12
Consider the damped pendulum (Example 11.1), whose motion is governed by the system
\[ x' = y, \qquad y' = -\omega^2\sin(x) - \gamma y. \]
Here
\[ f(x, y) = y \quad\text{and}\quad g(x, y) = -\omega^2\sin(x) - \gamma y. \]
The critical points are solutions of
\[ y = 0 \quad\text{and}\quad -\omega^2\sin(x) - \gamma y = 0. \]
These equations are satisfied by all points \((n\pi, 0)\), in which \(n = 0, \pm 1, \pm 2, \ldots\). These critical points are isolated. About any point \((n\pi, 0)\), we can draw a circle (for example, of radius 1/4) that does not contain any other critical point.

For this problem, the critical points split naturally into two classes. Recall that \(x = \theta\) is the angle of displacement of the pendulum from the vertical downward position, with the bob at the bottom, and \(y = d\theta/dt\). When n is even, then \(x = \theta = 2k\pi\) for k any integer. Each critical point \((2k\pi, 0)\) corresponds to the bob pointing straight down, with zero velocity (because \(y = x' = \theta' = 0\)). When n is odd, then \(x = \theta = (2k+1)\pi\) for k any integer. The critical point \(((2k+1)\pi, 0)\) corresponds to the bob in the vertical upright position, with zero velocity.

Without any mathematical analysis, there is an obvious and striking difference between these two kinds of critical points. At, for example, \((0, 0)\), the bob hangs straight down from the point of suspension. If we displace it slightly from this position and then release it, the bob will go through some oscillations of decreasing amplitude, after which it will return to its downward position and remain there. This critical point, and all critical points \((2n\pi, 0)\), are what we will call stable. Solutions of the pendulum equation for initial values near this critical point remain close to the constant equilibrium solution for all later times. By contrast, consider the critical point \((\pi, 0)\). This has the bob initially balanced vertically upward. If the bob is displaced, no matter how slightly, it will swing downward and oscillate back and forth some number of times, but never return to this vertical position.
Solutions near this constant equilibrium solution (bob vertically up) do not remain near this position, but move away from it. This critical point, and any critical point \(((2k+1)\pi, 0)\), is unstable.
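The critical points found above are easy to spot-check numerically. The sketch below is our own; the values assigned to \(\omega^2\) and \(\gamma\) are placeholder constants (any positive values behave the same way).

```python
import math

# Sketch: right-hand sides of the damped-pendulum system
#   x' = y,  y' = -omega^2 sin(x) - gamma*y,
# with assumed sample values for the constants.
omega2 = 4.0   # placeholder for omega^2
gamma = 0.5    # placeholder damping coefficient

def f(x, y):
    return y

def g(x, y):
    return -omega2 * math.sin(x) - gamma * y

# Both right-hand sides vanish (to machine precision) at every (n*pi, 0).
residuals = [abs(f(n * math.pi, 0.0)) + abs(g(n * math.pi, 0.0))
             for n in range(-3, 4)]
```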
EXAMPLE 11.13
Consider the damped nonlinear spring of Example 11.2. The system of differential equations governing the motion is
\[ x' = y, \qquad y' = -\frac{k}{m}x + \frac{\alpha}{m}x^3 - \frac{c}{m}y. \]
The critical points are \((0, 0)\), \((\sqrt{k/\alpha}, 0)\), and \((-\sqrt{k/\alpha}, 0)\). Recall that x measures the position of the spring, from the equilibrium (rest) position, and y = dx/dt is the velocity of the spring. We will do a mathematical analysis of this system shortly, but for now look at these critical points from the point of view of our experience with how springs behave. If we displace the spring very slightly from the equilibrium solution \((0, 0)\), and then release it, we expect it to undergo some motion back and forth and then come to rest, approaching the equilibrium point. In this sense \((0, 0)\) is a stable critical point. However, if we displace the spring slightly from a position very nearly at distance \(\sqrt{k/\alpha}\) to the right or left of the equilibrium position and then release it, the spring may or may not return to this position, depending on the relative sizes of the damping constant c and the coefficients in the nonlinear spring force function, particularly \(\alpha\). In this sense these equilibrium points may be stable or may not be. In the next section we will develop the tools for a more definitive analysis of these critical points.

Taking a cue from these examples, we will define a concept of stability of critical points. Recall that \(\|V\| = \sqrt{v_1^2 + v_2^2}\) is the length (or norm) of a vector \(V = (v_1, v_2)\) in the plane. If \(W = (w_1, w_2)\) is also a vector in the plane, then \(\|V - W\|\) is the length of the vector from W to V. Then
\[ \|V - W\| = \left[(v_1 - w_1)^2 + (v_2 - w_2)^2\right]^{1/2} \]
is also the distance between the points \((v_1, v_2)\) and \((w_1, w_2)\). Finally, if \(X_0\) is a given vector, then the locus of points (vectors) X such that \(\|X - X_0\| < r\), for any positive r, is the set of points X within the circle of radius r about \(X_0\). These are exactly the points at distance less than r from \(X_0\).
DEFINITION 11.2
Stability of a Critical Point
Let \(X_0 = (x_0, y_0)\) be a critical point of \(X' = F(X)\). Then \(X_0\) is stable if and only if, given any positive number \(\epsilon\), there exists a positive number \(\delta\) such that, if \(X = \Phi(t)\) is a solution of \(X' = F(X)\) and \(\|\Phi(0) - X_0\| < \delta\), then \(\Phi(t)\) exists for all \(t \geq 0\) and \(\|\Phi(t) - X_0\| < \epsilon\) for all \(t \geq 0\).

We say that \(X_0\) is unstable if this point is not stable.
Keep in mind that the constant solution \(X(t) = X_0\) is the unique solution through this critical point. That is, the trajectory through a critical point is just this point itself. A critical point \(X_0\) is stable if solutions that are initially (at t = 0) close (within \(\delta\)) to \(X_0\) remain close
FIGURE 11.19 Stable critical point of \(X' = F(X)\).
(within \(\epsilon\)) for all later times. In terms of trajectories, this means that a trajectory that starts out sufficiently close to \(X_0\) at time zero must remain close to this equilibrium solution at all later times. Figure 11.19 illustrates this idea. This does not imply that solutions that start near \(X_0\) approach this point as a limit as \(t \to \infty\). They may simply remain within a small disk about \(X_0\), without approaching \(X_0\) in a limiting sense. If, however, solutions initially near \(X_0\) also approach \(X_0\) as a limit, then we call \(X_0\) an asymptotically stable critical point.
DEFINITION 11.3
Asymptotically Stable Critical Point
\(X_0\) is an asymptotically stable critical point of \(X' = F(X)\) if and only if \(X_0\) is a stable critical point and there exists a positive number \(\delta\) such that, if a solution \(X = \Phi(t)\) satisfies \(\|\Phi(0) - X_0\| < \delta\), then \(\lim_{t\to\infty}\Phi(t) = X_0\).
This concept is illustrated in Figure 11.20. Stability does not imply asymptotic stability. It is less obvious that asymptotic stability does not imply stability. A solution might start “close
FIGURE 11.20 Asymptotically stable critical point of \(X' = F(X)\).
enough” to the critical point and actually approach the critical point in the limit as \(t \to \infty\), but for some arbitrarily large positive times move arbitrarily far from \(X_0\) (before bending back to approach it in the limit). In the case of the damped pendulum, critical points \((2n\pi, 0)\) are asymptotically stable. If the bob is displaced slightly from the vertical downward position and then released, it will eventually approach this vertical downward position in the limit as \(t \to \infty\).

To get some experience with stability and asymptotic stability, and also to prepare for nonlinear systems that are in some sense “nearly” linear, we will review the critical point \((0, 0)\) for the linear system \(X' = AX\), in the context of stability.

Nodal Source or Sink  This occurs when the eigenvalues of A are real and distinct, but of the same sign—a nodal sink when they are negative, and a nodal source when they are positive. From the phase portrait in Figure 11.9 (p. 416), \((0, 0)\) is stable and asymptotically stable when the eigenvalues are negative (nodal sink), because then all trajectories tend toward the origin as time increases. However, \((0, 0)\) is unstable when the eigenvalues are positive (nodal source), because in this case all trajectories move away from the origin with increasing time (Figure 11.10, p. 417).

Saddle Point  The origin is a saddle point when A has real eigenvalues of opposite sign. A saddle point is unstable. This is apparent in Figure 11.12 (p. 418), in which we can see that trajectories do not remain near the origin as time increases, nor do they approach the origin as a limit.

Proper Node  The origin is a proper node when the eigenvalues of A are equal and A has two linearly independent eigenvectors. Figure 11.13 (p. 419) shows a typical proper node. When the arrows are toward the origin (negative eigenvalues), this node is stable and asymptotically stable. When the trajectories are oriented away from the origin, this node is not stable.
Improper Node  The origin is an improper node when the eigenvalues of A are equal and A does not have two linearly independent eigenvectors. Now the origin is a stable and asymptotically stable critical point if the eigenvalue is negative, and unstable if the eigenvalue is positive. Figure 11.16 shows trajectories near a stable improper node (negative eigenvalue). If the eigenvalue is positive, the trajectories have orientation away from the origin, and then this node is unstable.

Spiral Point  The origin is a spiral point when the eigenvalues are complex conjugates with nonzero real part. When this real part is positive, the origin is a spiral source (trajectories spiral away from the origin, as in Figure 11.17), and in this case the origin is unstable. When this real part is negative, the origin is a stable and asymptotically stable spiral sink (trajectories spiraling into the origin). The phase portrait of such a sink has the same appearance as that of a spiral source, with the arrows on the trajectories reversed.

Center  The origin is a center when the eigenvalues of A are pure imaginary. A center is stable, but not asymptotically stable (Figure 11.18).

There is a succinct graphical way of summarizing the classifications and stability type of the critical point \((0, 0)\) for the linear system \(X' = AX\). Let
\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}. \]
The eigenvalues of A are solutions of
\[ \lambda^2 - (a + d)\lambda + (ad - bc) = 0. \]
Let \(p = -(a + d)\) and \(q = ad - bc\) to write this equation as
\[ \lambda^2 + p\lambda + q = 0. \]
The eigenvalues of A are
\[ \lambda = \frac{-p \pm \sqrt{p^2 - 4q}}{2}. \]
These are real or complex depending on whether \(p^2 - 4q \geq 0\) or \(p^2 - 4q < 0\). In the \((p, q)\) plane of Figure 11.21, the boundary between these two cases is the parabola \(p^2 = 4q\). Now the \((p, q)\) plane gives a summary of conclusions as follows:

FIGURE 11.21 Classification of \((0, 0)\) for \(X' = AX\).
Above this parabola (\(p^2 < 4q\)) the eigenvalues are complex conjugates; off the q-axis they have nonzero real parts (spiral point). On the parabola (\(p^2 = 4q\)) the eigenvalues are real and equal (proper or improper node). On the q-axis, the eigenvalues are pure imaginary (center). Between the p-axis and the parabola, the eigenvalues are real and distinct, with the same sign (nodal source or sink). Below the p-axis, the eigenvalues are real and have opposite sign (saddle point).

It is interesting to observe how sensitive the classification and stability type of a critical point are to changes in the coefficients of the system. Suppose we begin with a linear system \(X' = AX\), and then perturb one or more elements of A by “small” amounts to form a new system. How (if at all) will this change the classification and stability of the critical point? The classification and stability of \((0, 0)\) are completely determined by the eigenvalues, so the issue is really how small changes in the matrix elements affect the eigenvalues. The eigenvalues of A are \((-p \pm \sqrt{p^2 - 4q})/2\), which depend continuously on p and q. Thus small changes in p and q (caused by small changes in a, b, c, and d) result in small changes in the eigenvalues. There are two cases in which arbitrarily small changes in A will change the nature of the critical point.
(1) If the origin is a center (pure imaginary eigenvalues), then \(p = -(a + d) = 0\). Arbitrarily small changes in a and d can change this, resulting in a new matrix whose eigenvalues have positive or negative real parts. For the new, perturbed system, \((0, 0)\) is no longer a center. This means that centers are sensitive to arbitrarily small changes in A.

(2) The other sensitive case is that both eigenvalues are the same, which occurs when \(p^2 - 4q = 0\). Again, arbitrarily small changes in A can result in this quantity becoming positive or negative, changing the classification of the critical point. However, the stability or instability of \((0, 0)\) is determined by the sign of p, and sufficiently small changes in A will leave this sign unchanged. Thus in this case the classification of the kind of critical point the system has is more sensitive to change than its stability or instability.

These considerations should be kept in mind when we state Theorem 11.3 in the next section. With this background on linear systems and the various characteristics of their critical points, we are ready to analyze systems that are in some sense approximated by linear systems.
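Case (1) can be seen concretely in a small computation. The matrix below is our own hypothetical illustration, not one of the text's examples: for it, \(p = -\epsilon\) and \(q = 1\), so \(\epsilon = 0\) gives eigenvalues \(\pm i\) (a center), while any \(\epsilon \neq 0\) gives eigenvalues with real part \(\epsilon/2\) (a spiral point).

```python
import cmath

# Sketch: real parts of the eigenvalues of A = [[eps, 1], [-1, 0]],
# a hypothetical perturbation of a center. Here p = -eps and q = 1,
# so the eigenvalues are (eps +/- sqrt(eps^2 - 4)) / 2.
def real_parts(eps):
    p = -eps
    q = 1.0
    disc = cmath.sqrt(p * p - 4 * q)
    lam1 = (-p + disc) / 2
    lam2 = (-p - disc) / 2
    return lam1.real, lam2.real

center = real_parts(0.0)       # both real parts 0: pure imaginary (center)
perturbed = real_parts(1e-3)   # both real parts eps/2: a spiral point
```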
SECTION 11.4 PROBLEMS

1.–10. For j = 1, …, 10, classify the critical point of the system of Problem j of Section 11.3 as stable and asymptotically stable, stable and not asymptotically stable, or unstable.

11. Consider the system \(X' = AX\), where
\[ A = \begin{pmatrix} 1 & -3 \\ 2 & -1+\epsilon \end{pmatrix}, \]
with \(\epsilon > 0\).
(a) Show that, when \(\epsilon = 0\), the critical point is a center, stable but not asymptotically stable. Generate a phase portrait for this system.
(b) Show that, when \(\epsilon \neq 0\), the critical point is not a center, no matter how small \(\epsilon\) is chosen. Generate a phase portrait for this system with \(\epsilon = \frac{1}{10}\).
This problem illustrates the sensitivity of trajectories of the system to small changes in the coefficients, in the case of pure imaginary eigenvalues.

12. Consider the system \(X' = AX\), where
\[ A = \begin{pmatrix} 2+\epsilon & 5 \\ -5 & -8 \end{pmatrix} \]
and \(\epsilon > 0\).
(a) Show that, when \(\epsilon = 0\), A has equal eigenvalues and does not have two linearly independent eigenvectors. Classify the type of critical point at the origin and its stability characteristics. Generate a phase portrait for this system.
(b) Show that, if \(\epsilon\) is not zero (but can be arbitrarily small in magnitude), then A has real and distinct eigenvalues. Classify the type of critical point at the origin in this case, as well as its stability characteristics. Generate a phase portrait for the case \(\epsilon = \frac{1}{10}\).
This problem illustrates the sensitivity of trajectories to small changes in the coefficients, in the case of equal eigenvalues.

11.5
Almost Linear Systems

Suppose \(X' = F(X)\) is a nonlinear system. We want to define a sense in which this system may be thought of as “almost linear.” Suppose the system has the special form
\[ X' = AX + G(X). \tag{11.7} \]
This is a linear system \(X' = AX\) with another term,
\[ G(X) = \begin{pmatrix} p(x, y) \\ q(x, y) \end{pmatrix}, \]
added. Any nonlinearity of the system (11.7) is in G(X). We refer to the system \(X' = AX\) as the linear part of the system (11.7).
Assume that p(0, 0) = q(0, 0) = 0, so the system (11.7) has a critical point at the origin. The idea we want to pursue is that if the nonlinear term is “small enough,” then the behavior of solutions of the linear system \(X' = AX\) near the origin may give us information about the behavior of solutions of the original, nonlinear system near this critical point. The question is: how small is “small enough”? We will assume in this discussion that A is a nonsingular 2 × 2 matrix of real numbers, and that p and q are continuous at least within some disk about the origin. In the following definition, we refer to partial derivatives of G, by which we mean
\[ G_x = \frac{\partial G}{\partial x} = \begin{pmatrix} p_x \\ q_x \end{pmatrix} \quad\text{and}\quad G_y = \frac{\partial G}{\partial y} = \begin{pmatrix} p_y \\ q_y \end{pmatrix}. \]

DEFINITION 11.4

Almost Linear

The system (11.7) is almost linear in a neighborhood of \((0, 0)\) if G and its first partial derivatives are continuous within some circle about the origin, and
\[ \lim_{X \to O} \frac{\|G(X)\|}{\|X\|} = 0. \tag{11.8} \]
This condition (11.8) means that, as X is chosen closer to the origin, \(\|G(X)\|\) must become small in magnitude faster than \(\|X\|\) does. This gives a precise measure of “how small” the nonlinear term must be near the origin for the system (11.7) to qualify as almost linear. If we write
\[ X = \begin{pmatrix} x \\ y \end{pmatrix}, \quad A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad\text{and}\quad G(X) = G(x, y) = \begin{pmatrix} p(x, y) \\ q(x, y) \end{pmatrix}, \]
then the system (11.7) is
\[ x' = ax + by + p(x, y), \qquad y' = cx + dy + q(x, y). \]
Condition (11.8) now becomes
\[ \lim_{(x,y)\to(0,0)} \frac{p(x, y)}{\sqrt{x^2 + y^2}} = \lim_{(x,y)\to(0,0)} \frac{q(x, y)}{\sqrt{x^2 + y^2}} = 0. \]
These limits, in terms of the components of G(X), are sometimes easier to deal with than the limit of \(\|G(X)\|/\|X\|\) as X approaches the origin, although the two formulations are equivalent.
EXAMPLE 11.14
The system
\[ X' = \begin{pmatrix} 4 & -2 \\ 1 & 6 \end{pmatrix}X + \begin{pmatrix} -4xy \\ -8x^2y \end{pmatrix} \]
is almost linear. To verify this, compute
\[ \lim_{(x,y)\to(0,0)} \frac{-4xy}{\sqrt{x^2 + y^2}} \quad\text{and}\quad \lim_{(x,y)\to(0,0)} \frac{-8x^2y}{\sqrt{x^2 + y^2}}. \]
There are various ways of showing that these limits are zero, but here is a device worth remembering. Express (x, y) in polar coordinates by putting \(x = r\cos(\theta)\) and \(y = r\sin(\theta)\). Then
\[ \frac{-4xy}{\sqrt{x^2 + y^2}} = \frac{-4r^2\cos(\theta)\sin(\theta)}{r} = -4r\cos(\theta)\sin(\theta) \to 0 \]
as \(r \to 0\), which must occur if \((x, y) \to (0, 0)\). Similarly,
\[ \frac{-8x^2y}{\sqrt{x^2 + y^2}} = \frac{-8r^3\cos^2(\theta)\sin(\theta)}{r} = -8r^2\cos^2(\theta)\sin(\theta) \to 0 \]
as \(r \to 0\).
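The same conclusion can be mirrored numerically (a sketch of our own), sampling the ratio along a fixed direction \(\theta\) as r shrinks:

```python
import math

# Sketch: |p(x, y)| / ||(x, y)|| for p(x, y) = -4xy, sampled along a fixed
# direction theta as r -> 0. By the polar computation above this equals
# 4 r |cos(theta) sin(theta)|, which shrinks linearly with r.
theta = 0.7    # any fixed direction works
ratios = []
for k in range(1, 6):
    r = 10.0 ** (-k)
    x, y = r * math.cos(theta), r * math.sin(theta)
    ratios.append(abs(-4 * x * y) / math.hypot(x, y))
# Each step divides r by 10, so each ratio is about one tenth the previous.
```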
Figure 11.22(a) shows a phase portrait of this system. For comparison, a phase portrait of the linear part \(X' = AX\) is given in Figure 11.22(b). Notice a qualitative similarity between the phase portraits near the origin. This is the rationale for the definition of almost linear systems. We will now display a correspondence between the type of critical point, and its stability properties, for the almost linear system \(X' = AX + G(X)\) and its linear part \(X' = AX\). The behavior is not always the same. Nevertheless, in some cases which we will identify, properties of the critical point for the linear system carry over to either the same properties for the almost linear system or, if not the same, at least to important information about the nonlinear system.
FIGURE 11.22(a) Phase portrait for \(x' = 4x - 2y - 4xy\), \(y' = x + 6y - 8x^2y\).
FIGURE 11.22(b) Phase portrait for the linear part of the system of Figure 11.22(a).

THEOREM 11.3
Let \(\lambda\) and \(\mu\) be the eigenvalues of A. Assume that \(X' = AX + G(X)\) is almost linear. Then the following conclusions hold for the system \(X' = AX + G(X)\).

1. If \(\lambda\) and \(\mu\) are unequal and negative, then the origin is an asymptotically stable nodal sink of \(X' = AX + G(X)\). If these eigenvalues are unequal and positive, then the origin is an unstable nodal source of \(X' = AX + G(X)\).

2. If \(\lambda\) and \(\mu\) are of opposite sign, then the origin is an unstable saddle point of \(X' = AX + G(X)\).

3. If \(\lambda\) and \(\mu\) are complex with negative real part, then the origin is an asymptotically stable spiral point of \(X' = AX + G(X)\). If these eigenvalues have positive real part, then the origin is an unstable spiral point.

4. If \(\lambda\) and \(\mu\) are equal and negative, then the linear system has an asymptotically stable proper or improper node, while the almost linear system has an asymptotically stable node or spiral point. If \(\lambda\) and \(\mu\) are equal and positive, then the linear system has an unstable proper or improper node, while the almost linear system has an unstable node or spiral point.

5. If \(\lambda\) and \(\mu\) are pure imaginary (conjugates of each other), then the origin is a center of \(X' = AX\), but may be a center or spiral point of the almost linear system \(X' = AX + G(X)\). Further, in the case of a spiral point of the almost linear system, the critical point may be unstable or asymptotically stable.

The only case in which the linear system fails to provide definitive information of some kind about the almost linear system is that in which the eigenvalues of A are pure imaginary. In this event, the linear system has a stable center, while the almost linear system can have a stable center or a spiral point which may be stable or unstable.
In light of this theorem, when we ask for an analysis of a critical point of an almost linear system, we mean a determination of whether the point is an asymptotically stable nodal sink, an unstable nodal source, an unstable saddle point, an asymptotically stable spiral point or unstable spiral point, or, from (5) of the theorem, either a center or spiral point. A proof of this theorem requires some delicate analysis that we will avoid. The rest of this section is devoted to examples and phase portraits.
EXAMPLE 11.15
The system
\[ X' = \begin{pmatrix} -1 & -1 \\ -1 & -3 \end{pmatrix}X + \begin{pmatrix} x^2y^2 \\ x^3 - y^2 \end{pmatrix} \]
is almost linear and has only one critical point, \((0, 0)\). The eigenvalues of A are \(-2 + \sqrt{2}\) and \(-2 - \sqrt{2}\), which are distinct and negative. The origin is an asymptotically stable nodal sink of the linear system \(X' = AX\), and hence is also a stable and asymptotically stable nodal sink of the almost linear system. Figures 11.23(a) and (b) show phase portraits of the almost linear system and its linear part, respectively.
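A quick spot-check of the eigenvalues claimed for the linear part (our own sketch):

```python
import math

# Sketch: eigenvalues of the linear part A = [[-1, -1], [-1, -3]] of
# Example 11.15, via the characteristic equation lambda^2 + p*lambda + q = 0.
a, b, c, d = -1.0, -1.0, -1.0, -3.0
trace = a + d                          # -4
det = a * d - b * c                    # 2
disc = math.sqrt(trace**2 - 4 * det)   # sqrt(8)
lam1 = (trace + disc) / 2              # -2 + sqrt(2)
lam2 = (trace - disc) / 2              # -2 - sqrt(2)
# Both are negative, so the origin is an asymptotically stable nodal sink.
```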
FIGURE 11.23(a) Phase portrait for \(x' = -x - y + x^2y^2\), \(y' = -x - 3y + x^3 - y^2\).
FIGURE 11.23(b) Phase portrait for the linear part of the system of Figure 11.23(a).
EXAMPLE 11.16
The system
\[ X' = \begin{pmatrix} 3 & -4 \\ 6 & 2 \end{pmatrix}X + \begin{pmatrix} x^2\cos(y) \\ y^3 \end{pmatrix} \]
is almost linear. The only critical point is \((0, 0)\). The eigenvalues of A are \(\frac{5}{2} + \frac{1}{2}i\sqrt{95}\) and \(\frac{5}{2} - \frac{1}{2}i\sqrt{95}\). The linear part has an unstable spiral point at the origin. The origin is therefore an unstable spiral point of the almost linear system. Phase portraits for the given nonlinear system and its linear part are shown in Figures 11.24(a) and (b), respectively.
FIGURE 11.24(a) Phase portrait for \(x' = 3x - 4y + x^2\cos(y)\), \(y' = 6x + 2y + y^3\).
FIGURE 11.24(b) Phase portrait for the linear part of the system of Figure 11.24(a).
EXAMPLE 11.17
The system

X′ = [ −1  2 ; 2  3 ] X + [ x sin y ; 8 sin x ]

is almost linear, and its only critical point is the origin. The eigenvalues of A are 1 + 2√2 and 1 − 2√2, which are real and of opposite sign. The origin is an unstable saddle point of the linear part, hence also of the given system. Phase portraits of both systems are shown in Figure 11.25 (a) (nonlinear system) and (b) (linear part).
EXAMPLE 11.18
The system

X′ = [ 4  11 ; −2  −4 ] X + [ x sin y ; sin y ]

is almost linear, and its only critical point is (0, 0). The eigenvalues of A are √6 i and −√6 i. The origin is a stable, but not asymptotically stable, center for the linear part. The theorem does not allow us to draw a definitive conclusion about the almost linear system, which might have a center or spiral point at the origin. Figure 11.26 (a) and (b) shows phase portraits for the almost linear system and its linear part, respectively.
EXAMPLE 11.19
Consider the system

X′ = [ α  −1 ; 1  −α ] X + [ hx(x² + y²) ; ky(x² + y²) ]
FIGURE 11.25(a) Phase portrait for x′ = −x + 2y + x sin y, y′ = 2x + 3y + 8 sin x.
FIGURE 11.25(b) Phase portrait for the linear part of the system of Figure 11.25(a).
in which α, h, and k are constants. The eigenvalues of the matrix of the linear part are √(α² − 1) and −√(α² − 1). Consider cases. If 0 < α < 1, then these eigenvalues are pure imaginary. The origin is a center of the linear part but may be a center or spiral point of the almost linear system. If α > 1, then the eigenvalues are real and of opposite sign, so the origin is an unstable saddle point of both the linear part and the original almost linear system. If α = ±1, then A is singular and the system is not almost linear. Figure 11.27 (a) shows a phase portrait for this system with h = 0.4, k = 0.7, and α = 1/3. Figure 11.27 (b) has α = 2. The next example demonstrates the sensitivity of case (5) of Theorem 11.3.
FIGURE 11.26(a) Phase portrait for x′ = 4x + 11y + x sin y, y′ = −2x − 4y + sin y.
FIGURE 11.26(b) Phase portrait for the linear part of the system of Figure 11.26(a).
EXAMPLE 11.20
Let ε be a real number and consider the system

x′ = y + εx(x² + y²)
y′ = −x + εy(x² + y²)

We can write this in the form X′ = AX + G as

X′ = [ 0  1 ; −1  0 ] X + [ εx(x² + y²) ; εy(x² + y²) ]
FIGURE 11.27(a) Phase portrait for x′ = (1/3)x − y + 0.4x(x² + y²), y′ = x − (1/3)y + 0.7y(x² + y²).
FIGURE 11.27(b) Phase portrait for x′ = 2x − y + 0.4x(x² + y²), y′ = x − 2y + 0.7y(x² + y²).
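The dependence on α in Example 11.19 can be checked numerically. This sketch is our own illustration (the helper name is not from the text); it computes the eigenvalues of the linear part for the two parameter regimes:

```python
import numpy as np

def linear_part_eigs(alpha):
    # Eigenvalues of the linear part [[alpha, -1], [1, -alpha]];
    # analytically these are +/- sqrt(alpha^2 - 1).
    return np.linalg.eigvals(np.array([[alpha, -1.0], [1.0, -alpha]]))

center_case = linear_part_eigs(1 / 3)   # 0 < alpha < 1: pure imaginary pair
saddle_case = linear_part_eigs(2.0)     # alpha > 1: real, opposite signs
```

For α = 1/3 the pair is ±i√(1 − 1/9); for α = 2 it is ±√3, confirming the case split above.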
The origin is a critical point of this almost linear system. The eigenvalues of A are i and −i, so the linear part of this system has a center at the origin. This is the case in which Theorem 11.3 does not give a definitive conclusion for the nonlinear system. To analyze the nature of this critical point for the nonlinear system, use polar coordinates r and θ. Since r² = x² + y², then

rr′ = xx′ + yy′ = x(y + εx(x² + y²)) + y(−x + εy(x² + y²)) = ε(x² + y²)² = εr⁴

Then

dr/dt = εr³

This is a separable equation for r, which we solve to get

r(t) = 1/√(k − 2εt)
in which k is a constant determined by initial conditions (a point the trajectory is to pass through). Now consider cases. If ε < 0, then

r(t) = 1/√(k + 2|ε|t) → 0

as t → ∞. In this case the trajectory approaches the origin in the limit as t → ∞, and (0, 0) is asymptotically stable.
However, watch what happens if ε > 0. Say r(0) = ρ, so the trajectory starts at a point at positive distance ρ from the origin. Then k = 1/ρ² and

r(t) = 1/√(1/ρ² − 2εt)

In this case, as t increases from 0 and approaches 1/(2ερ²), r(t) → ∞. This means that, at finite times, the trajectory is arbitrarily far away from (0, 0), hence (0, 0) is unstable when ε is positive. A phase portrait for ε = −0.2 is given in Figure 11.28 (a), and for ε = 0.2 in Figure 11.28 (b). Figure 11.28 (c) gives a phase portrait for the linear part of this system.
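The closed-form radial solution makes this dichotomy easy to check numerically. A small sketch of our own (the helper names are not from the text):

```python
import math

def r_of_t(t, rho, eps):
    # r(t) = 1 / sqrt(1/rho^2 - 2*eps*t) solves dr/dt = eps*r^3 with
    # r(0) = rho, valid while the radicand stays positive.
    return 1.0 / math.sqrt(1.0 / rho ** 2 - 2.0 * eps * t)

def blowup_time(rho, eps):
    # For eps > 0, r(t) -> infinity as t -> 1/(2*eps*rho^2) from the left.
    return 1.0 / (2.0 * eps * rho ** 2)
```

With ε = −0.2 the radius decays toward 0; with ε = 0.2 and ρ = 1 the solution leaves every bounded set before t = 2.5.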
FIGURE 11.28(a) Phase portrait for x′ = y + εx(x² + y²), y′ = −x + εy(x² + y²), with ε = −0.2.
FIGURE 11.28(b) Phase portrait for the same system with ε = 0.2.
FIGURE 11.28(c) Phase portrait for the linear part (ε = 0).
Example 11.20 shows how sensitive an almost linear system can be when the eigenvalues of the linear part are pure imaginary. In this example, |ε| can be chosen arbitrarily small. Still, when ε is negative, the origin is asymptotically stable, and when ε is positive, regardless of magnitude, the origin becomes unstable.
Thus far the discussion has been restricted to nonlinear systems in the special form X′ = AX + G, with the origin as the critical point. However, in general a nonlinear system comes in the form X′ = F(X), and there may be critical points other than the origin. We will now show how to translate a critical point (x₀, y₀) to the origin so that X′ = F(X) translates to a system X′ = AX + G. This makes the linear part of the translated system transparent. Further, since Theorem 11.3 is set up to deal with critical points at the origin, we can apply it to X′ = AX + G whenever this system is almost linear.
Thus suppose (x₀, y₀) is a critical point of X′ = F(X), where F = [ f ; g ]. Assume that f and g are continuous with continuous first and second partial derivatives at least within some circle about (x₀, y₀). By Taylor's theorem for functions of two variables, we can write, for (x, y) within some circle about (x₀, y₀),

f(x, y) = f(x₀, y₀) + f_x(x₀, y₀)(x − x₀) + f_y(x₀, y₀)(y − y₀) + α(x, y)

and

g(x, y) = g(x₀, y₀) + g_x(x₀, y₀)(x − x₀) + g_y(x₀, y₀)(y − y₀) + β(x, y)

where
lim_{(x,y)→(x₀,y₀)} α(x, y)/√((x − x₀)² + (y − y₀)²) = lim_{(x,y)→(x₀,y₀)} β(x, y)/√((x − x₀)² + (y − y₀)²) = 0.   (11.9)
Now (x₀, y₀) is assumed to be a critical point of X′ = F(X), so f(x₀, y₀) = g(x₀, y₀) = 0 and these expansions are

f(x, y) = f_x(x₀, y₀)(x − x₀) + f_y(x₀, y₀)(y − y₀) + α(x, y)

and

g(x, y) = g_x(x₀, y₀)(x − x₀) + g_y(x₀, y₀)(y − y₀) + β(x, y)
Let

X̃ = [ x − x₀ ; y − y₀ ]

Then

dX̃/dt = [ d(x − x₀)/dt ; d(y − y₀)/dt ] = [ x′ ; y′ ] = [ f(x, y) ; g(x, y) ] = X′ = F(X)

so

dX̃/dt = [ f_x(x₀, y₀)  f_y(x₀, y₀) ; g_x(x₀, y₀)  g_y(x₀, y₀) ] [ x − x₀ ; y − y₀ ] + [ α(x, y) ; β(x, y) ] = A(x₀, y₀)X̃ + G

Because of the condition (11.9), this system is almost linear. Omitting the tilde notation for simplicity, this puts the translated system into the form X′ = A(x₀, y₀)X + G, with the critical point (x₀, y₀) of X′ = F(X) translated to the origin as the critical point of the almost linear system X′ = A(x₀, y₀)X + G. Now we can apply the preceding discussion and Theorem 11.3 to the translated system at the origin, and hence draw conclusions about the behavior of solutions of X′ = F(X) near (x₀, y₀).
We use the notation A(x₀, y₀) for the matrix of the linear part of the translated system for two reasons. First, it reminds us that this is the translated system (since we dropped the X̃ notation). Second, when we are analyzing several critical points of the same system, this notation reminds us which critical point is under consideration, and clearly distinguishes the linear part associated with one critical point from that associated with another.
In carrying out this strategy, it is important to realize that we do not have to explicitly compute α(x, y) or β(x, y), which in some cases would be quite tedious, or not even practical. The point is that we know that the translated system X′ = A(x₀, y₀)X + G is almost linear if F has continuous first and second partial derivatives, a condition that is usually easy to verify.
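In practice A(x₀, y₀) is just the Jacobian matrix of (f, g) evaluated at the critical point, and it can be approximated numerically when symbolic differentiation is inconvenient. A sketch of our own (central differences with step h are an implementation choice), illustrated on the system f = sin(πx) − x² + y², g = cos((x + y + 1)π/2) of Example 11.21 at its critical point (1, 1):

```python
import math
import numpy as np

def jacobian(f, g, x0, y0, h=1e-6):
    # Central-difference approximation to A(x0, y0), the matrix of the
    # linear part of the system translated so (x0, y0) moves to the origin.
    return np.array([
        [(f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h),
         (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h)],
        [(g(x0 + h, y0) - g(x0 - h, y0)) / (2 * h),
         (g(x0, y0 + h) - g(x0, y0 - h)) / (2 * h)],
    ])

f = lambda x, y: math.sin(math.pi * x) - x ** 2 + y ** 2
g = lambda x, y: math.cos((x + y + 1) * math.pi / 2)
A11 = jacobian(f, g, 1.0, 1.0)
```

Here A11 approximates [ −π − 2  2 ; π/2  π/2 ], the matrix the analytic partial derivatives give.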
EXAMPLE 11.21
Consider the system
X′ = F(X) = [ sin(πx) − x² + y² ; cos((x + y + 1)π/2) ]

Here f(x, y) = sin(πx) − x² + y² and g(x, y) = cos((x + y + 1)π/2). This is an almost linear system because f and g are continuous with continuous first and second partial derivatives throughout the plane. For the critical points, solve

sin(πx) − x² + y² = 0
cos((x + y + 1)π/2) = 0

Certainly x = y = n is a solution for every integer n, so every point (n, n) in the plane is a critical point. There may be other critical points as well, but other solutions of f(x, y) = g(x, y) = 0
are not obvious. We will need the partial derivatives

f_x = π cos(πx) − 2x,  f_y = 2y
g_x = −(π/2) sin((x + y + 1)π/2),  g_y = −(π/2) sin((x + y + 1)π/2)

Now consider a typical critical point (n, n). We can translate this point to the origin and write the translated system as X′ = A(n, n)X + G with

A(n, n) = [ f_x(n, n)  f_y(n, n) ; g_x(n, n)  g_y(n, n) ] = [ (−1)ⁿπ − 2n  2n ; (−1)ⁿ⁺¹π/2  (−1)ⁿ⁺¹π/2 ]

and

G(X) = [ α(x, y) ; β(x, y) ]

We need not actually compute α(x, y) or β(x, y). Because the system is almost linear, the qualitative behavior of trajectories of the nonlinear system near (n, n) is (with exceptions noted in Theorem 11.3) determined by the behavior of trajectories of the linear system X′ = A(n, n)X. We are therefore led to consider the eigenvalues of A(n, n), which are

(1/4)[(−1)ⁿπ − 4n] ± (1/4)√(9π² − 40nπ(−1)ⁿ + 16n²)

We will consider several values of n.
For n = 0, the eigenvalues are π and −π/2, so the origin is an unstable saddle point of the linear system and also of the nonlinear system.
For n = 1, the eigenvalues of A(1, 1) are

(1/4)(−π − 4) ± (1/4)√(9π² + 40π + 16)

which are approximately 2.0101 and −5.5809. Therefore (1, 1) is also an unstable saddle point.
For n = 2, the eigenvalues are

(1/4)(π − 8) ± (1/4)√(9π² − 80π + 64)

which are approximately −1.2146 + 2.4812i and −1.2146 − 2.4812i. These are complex conjugates with negative real part, so (2, 2) is an asymptotically stable spiral point.
For n = 3, the eigenvalues are

(1/4)(−π − 12) ± (1/4)√(9π² + 120π + 144)

which are approximately −9.959 and 2.3882. Thus (3, 3) is an unstable saddle point.
For n = 4, the eigenvalues are

(1/4)(π − 16) ± (1/4)√(9π² − 160π + 256)

approximately −3.2146 + 3.1407i and −3.2146 − 3.1407i. We conclude that (4, 4) is an asymptotically stable spiral point.
For n = 5, the eigenvalues are

(1/4)(−π − 20) ± (1/4)√(9π² + 200π + 400)

approximately 2.5705 and −14.141, so (5, 5) is an unstable saddle point.
For n = 6, the eigenvalues are

(1/4)(π − 24) ± (1/4)√(9π² − 240π + 576)

approximately −5.2146 ± 2.3606i, so (6, 6), like (2, 2) and (4, 4), is an asymptotically stable spiral point.
With n = 7 we get eigenvalues

(1/4)(−π − 28) ± (1/4)√(9π² + 280π + 784)

approximately 2.6802 and −18.251, so (7, 7) is an unstable saddle point.
The pattern suggested by the even cases n = 2, 4, 6 is broken at n = 8, where the eigenvalues

(1/4)(π − 32) ± (1/4)√(9π² − 320π + 1024)

are real, approximately −9.8069 and −4.6223, so (8, 8) is an asymptotically stable nodal sink rather than a spiral point.
Figures 11.29, 11.30 and 11.31 show phase portraits of this system, focusing on trajectories near selected critical points. The student should experiment with phase portraits near some of the other critical points, for example, those with negative coordinates.
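These eigenvalue computations are easy to reproduce numerically. This sketch of our own builds A(n, n) from the partial derivatives computed in this example and classifies each point:

```python
import numpy as np

def A_nn(n):
    # A(n, n) for Example 11.21: rows (f_x, f_y) and (g_x, g_y) at (n, n).
    s = (-1) ** n
    return np.array([[s * np.pi - 2 * n, 2.0 * n],
                     [-s * np.pi / 2, -s * np.pi / 2]])

for n in range(9):
    lam = np.linalg.eigvals(A_nn(n))
    if np.any(abs(lam.imag) > 1e-9):
        kind = "spiral (stable)" if lam[0].real < 0 else "spiral (unstable)"
    elif lam[0].real * lam[1].real < 0:
        kind = "saddle"
    else:
        kind = "node (stable)" if max(lam.real) < 0 else "node (unstable)"
    print(n, np.round(lam, 4), kind)
```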
FIGURE 11.29 Trajectories of the system of Example 11.21 near the origin.
FIGURE 11.30 Trajectories of the system of Example 11.21 near (1, 1).
FIGURE 11.31 Trajectories of the system of Example 11.21 near (2, 2) and (4, 4).
EXAMPLE 11.22 Damped Pendulum
The system for the damped pendulum is

x′ = y
y′ = −ω² sin x − γy

In matrix form, this system is

X′ = F(X) = [ y ; −ω² sin x − γy ]
Here

f(x, y) = y  and  g(x, y) = −ω² sin x − γy

The partial derivatives are

f_x = 0,  f_y = 1,  g_x = −ω² cos x,  g_y = −γ
We saw in Example 11.12 that the critical points are (nπ, 0) with n any integer. When n is even, this corresponds to the pendulum bob hanging straight down, and when n is odd, to the bob initially pointing straight up. We will analyze these critical points.
Consider first the critical point (0, 0). The linear part of the system has matrix

A(0, 0) = [ f_x(0, 0)  f_y(0, 0) ; g_x(0, 0)  g_y(0, 0) ] = [ 0  1 ; −ω²  −γ ]

with eigenvalues −γ/2 + (1/2)√(γ² − 4ω²) and −γ/2 − (1/2)√(γ² − 4ω²). Recall that γ = c/(mL) and ω² = g/L. As we might expect, the relative sizes of the damping force, the mass of the bob, and the length of the pendulum will determine the nature of the motion. The following cases occur.
(1) If γ² − 4ω² > 0, then the eigenvalues are real, unequal, and negative, so the origin is an asymptotically stable nodal sink. This happens when c > 2m√(gL). This gives a measure of how large the damping force must be, compared to the mass of the bob and length of the pendulum, to have trajectories approaching the equilibrium solution (0, 0) without spiralling. In this overdamped case, after release following a small displacement from the vertical downward position, the bob moves toward this position with decreasing velocity, approaching it in the limit as t → ∞ without oscillating back and forth through it. Figure 11.32 shows a phase portrait for the pendulum with γ² = 0.8 and ω² chosen so that γ² − 4ω² > 0.
FIGURE 11.32 Phase portrait for the damped pendulum with γ² = 0.8 (γ² − 4ω² > 0).
FIGURE 11.33 Damped pendulum with γ² = 0.8 and ω² = 0.2 (γ² − 4ω² = 0).
(2) If γ² − 4ω² = 0, then the eigenvalues are equal and negative, corresponding to an asymptotically stable proper or improper node of the linear system. This is the case in which Theorem 11.3 does not give a definitive conclusion, and the origin could be an asymptotically stable node or spiral point of the nonlinear pendulum. This case occurs when c = 2m√(gL), a delicate balance between the damping force, mass, and pendulum length. In the case of an asymptotically stable node, the bob, when released, moves with decreasing velocity toward the vertical equilibrium position, approaching it as t → ∞ but not oscillating through it. Figure 11.33 gives a phase portrait for this case, in which γ² = 0.8 and ω² = 0.2.
(3) If γ² − 4ω² < 0, then the eigenvalues are complex conjugates with negative real part. Hence the origin is an asymptotically stable spiral point of both the linear part and the nonlinear pendulum system. This happens when c < 2m√(gL). Figure 11.34 displays this case, with γ² = 0.6 and ω² = 0.3.
It is routine to check that each critical point (2nπ, 0), in which the first coordinate is an even integer multiple of π, has the same characteristics as the origin. Now consider critical points ((2n + 1)π, 0), with first coordinate an odd integer multiple of π. To be specific, consider (π, 0). Now the linear part of the system (with (π, 0) translated to the origin) is

A(π, 0) = [ f_x(π, 0)  f_y(π, 0) ; g_x(π, 0)  g_y(π, 0) ] = [ 0  1 ; −ω² cos π  −γ ] = [ 0  1 ; ω²  −γ ]

The eigenvalues are −γ/2 + (1/2)√(γ² + 4ω²) and −γ/2 − (1/2)√(γ² + 4ω²). These are real and of opposite sign, so (π, 0) is an unstable saddle point. The other critical points ((2n + 1)π, 0) exhibit the same behavior. This is what we would expect of a pendulum in which the bob is initially in the vertical upward position, since arbitrarily small displacements will result in the bob moving away from this position, and it will never return to it. The analysis is the same for each critical point ((2n + 1)π, 0).
FIGURE 11.34 Damped pendulum with γ² = 0.6 and ω² = 0.3 (γ² − 4ω² < 0).
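The three damping regimes can be confirmed numerically from A(0, 0). A sketch of our own; the sample values of γ² and ω² are illustrative:

```python
import numpy as np

def pendulum_eigs(gamma2, omega2):
    # Eigenvalues of A(0, 0) = [[0, 1], [-omega^2, -gamma]] for the
    # damped pendulum, where gamma = c/(mL) and omega^2 = g/L.
    gamma = np.sqrt(gamma2)
    return np.linalg.eigvals(np.array([[0.0, 1.0], [-omega2, -gamma]]))

overdamped  = pendulum_eigs(0.8, 0.1)   # gamma^2 - 4*omega^2 > 0: nodal sink
critical    = pendulum_eigs(0.8, 0.2)   # gamma^2 - 4*omega^2 = 0: repeated root
underdamped = pendulum_eigs(0.6, 0.3)   # gamma^2 - 4*omega^2 < 0: stable spiral
```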
EXAMPLE 11.23 Nonlinear Damped Spring
The nonlinear damped spring system is

x′ = y
y′ = −(k/m)x + (ε/m)x³ − (c/m)y

This is

X′ = F(X) = [ y ; −(k/m)x + (ε/m)x³ − (c/m)y ]

Here

f(x, y) = y  and  g(x, y) = −(k/m)x + (ε/m)x³ − (c/m)y

There are three critical points: (0, 0), (√(k/ε), 0), and (−√(k/ε), 0). The partial derivatives are

f_x = 0,  f_y = 1,  g_x = −k/m + (3ε/m)x²,  g_y = −c/m
First consider the behavior of trajectories near the origin. The linear part of the system has matrix

A(0, 0) = [ 0  1 ; −k/m  −c/m ]

with eigenvalues (1/2m)(−c + √(c² − 4mk)) and (1/2m)(−c − √(c² − 4mk)). This yields three cases, depending, as we might expect, on the relative magnitudes of the mass, damping constant, and spring constant.
1. If c² − 4mk > 0, then A(0, 0) has real, distinct, negative eigenvalues, so the origin is an asymptotically stable nodal sink. Small disturbances from the equilibrium position result in a motion that dies out with time, with the mass approaching the equilibrium position.
2. If c² − 4mk = 0, then A(0, 0) has equal real, negative eigenvalues, so the origin is an asymptotically stable proper or improper node of the linear system. Hence the origin is an asymptotically stable node or spiral point of the nonlinear system.
3. If c² − 4mk < 0, then A(0, 0) has complex conjugate eigenvalues with negative real part, and the origin is an asymptotically stable spiral point.
Figure 11.35 shows a phase portrait for case (3), with c = 2, k = 5, ε = 1, and m = 3. Next, consider the critical point (√(k/ε), 0). Since g_x(√(k/ε), 0) = −k/m + (3ε/m)(k/ε) = 2k/m, the linear part of the system obtained by translating this point to the origin is

A(√(k/ε), 0) = [ 0  1 ; 2k/m  −c/m ]
FIGURE 11.35 Nonlinear spring system with c = 2, k = 5, ε = 1, and m = 3 (c² − 4mk < 0).
with eigenvalues (1/2m)(−c + √(c² + 8mk)) and (1/2m)(−c − √(c² + 8mk)). The first eigenvalue is positive and the second negative, so (√(k/ε), 0) is an unstable saddle point. A similar analysis holds for the critical point (−√(k/ε), 0).
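The classification of these critical points can be verified numerically. A sketch of our own, using the sample values c = 2, k = 5, ε = 1, m = 3 from the text:

```python
import numpy as np

c, k, eps, m = 2.0, 5.0, 1.0, 3.0

def spring_A(x0):
    # Linear part at the critical point (x0, 0): f = y,
    # g = -(k/m)x + (eps/m)x^3 - (c/m)y, so g_x = -k/m + 3*eps*x0^2/m.
    return np.array([[0.0, 1.0],
                     [-k / m + 3.0 * eps * x0 ** 2 / m, -c / m]])

origin_eigs = np.linalg.eigvals(spring_A(0.0))
saddle_eigs = np.linalg.eigvals(spring_A(np.sqrt(k / eps)))
```

Here origin_eigs is a complex pair with negative real part (stable spiral), while saddle_eigs has one positive and one negative member (saddle).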
SECTION 11.5 PROBLEMS

In each of Problems 1 through 10, (a) show that the system is almost linear, (b) determine the critical points, (c) use Theorem 11.3 to analyze the nature of each critical point, or state why no conclusion can be drawn, and (d) generate a phase portrait for the system.

1. x′ = x − y + x², y′ = x + 2y
2. x′ = x + 3y − x² sin y, y′ = 2x + y − xy²
3. x′ = −2x + 2y, y′ = x + 4y + y²
4. x′ = −2x − 3y − y², y′ = x + 4y
5. x′ = 3x + 12y, y′ = −x − 3y + x³
6. x′ = 2x − 4y + 3xy, y′ = x + y + x²
7. x′ = −3x − 4y + x² − y², y′ = x + y
8. x′ = −3x − 4y, y′ = −x + y − x²y
9. x′ = −2x − y + y², y′ = −4x + y
10. x′ = 2x − y − x³ sin x, y′ = −2x + y + xy²
11. Theorem 11.3 is inconclusive in the case that the critical point of an almost linear system is a center of the associated linear system. Verify that no conclusion is possible in this case in general by considering the following two systems:

x′ = y − x(x² + y²), y′ = −x − y(x² + y²)

and

x′ = y + x(x² + y²), y′ = −x + y(x² + y²)

(a) Show that the origin is a center for the associated linear system of both systems.
(b) Show that each system is almost linear.
(c) Introduce polar coordinates, with x = r cos θ and y = r sin θ, and use the chain rule to evaluate xx′ + yy′ in terms of r and r′, where r′ = dr/dt. Thus convert each system to a system in terms of r(t) and θ(t).
(d) Use the polar coordinate version of the first system to obtain a separable differential equation for r(t). Conclude from this that r′(t) < 0 for all t. Solve for r(t) and show that r(t) → 0 as t → ∞. Thus conclude that for the first system the origin is asymptotically stable.
(e) Follow the procedure of (d), using the second system. However, now find that r′(t) > 0 for all t. Solve for r(t) with the initial condition r(t₀) = r₀. Show that r(t) → ∞ as t → t₀ + 1/(2r₀²) from the left. Conclude that the origin is unstable for the second system.

11.6
Lyapunov's Stability Criteria
There is a subtle criterion for stability due to the Russian engineer and mathematician Alexander M. Lyapunov (1857–1918). Suppose X′ = F(X) is a 2 × 2 autonomous system of first-order differential equations (not necessarily almost linear), and that (0, 0) is an isolated critical point. Lyapunov's insight was this. Suppose there is a function, commonly denoted V, such that closed curves V(x, y) = c enclose the origin. Further, if the constants are chosen smaller, say 0 < k < c, then the curve V(x, y) = k lies within the region enclosed by the curve V(x, y) = c (Figure 11.36 (a)). So far this has nothing to do with the system of differential equations. However, suppose it also happens that, if a trajectory intersects the curve V(x, y) = c at some time, which we can take to be time zero, then it cannot escape from the region bounded by this curve, but must for all later times remain within this region (Figure 11.36 (b)). This would force trajectories starting out near the origin (meaning within V(x, y) = c) to forever lie at least this
FIGURE 11.36(a) Closed curves V(x, y) = c contracting about the origin.
FIGURE 11.36(b) Trajectories entering shrinking regions about the origin.
close to the origin. But this would imply, by choosing c successively smaller, that the origin is a stable critical point! If in addition trajectories starting at a point on V(x, y) = c point into the region bounded by this curve, then we can further conclude that the trajectories are approaching the origin, hence that the origin is asymptotically stable.
This is the intuition behind an approach to determining whether a critical point is stable or asymptotically stable. We will now develop the vocabulary which will allow us to give substance to this approach. First, we will distinguish certain functions that have been found to serve the role of V in this discussion. If r > 0, let N_r consist of all (x, y) within distance r of the origin. Thus, (x, y) is in N_r exactly when

x² + y² < r²

This set is called the r-neighborhood of the origin, or, if we need no explicit reference to r, just a neighborhood of the origin.
DEFINITION 11.5 Positive Definite, Semidefinite
Let V(x, y) be defined for all (x, y) in some neighborhood N_r of the origin. Suppose V is continuous with continuous first partial derivatives. Then
1. V is positive definite on N_r if V(0, 0) = 0 and V(x, y) > 0 for all other points of N_r.
2. V is positive semidefinite on N_r if V(0, 0) = 0 and V(x, y) ≥ 0 for all points of N_r.
3. V is negative definite on N_r if V(0, 0) = 0 and V(x, y) < 0 for all other points of N_r.
4. V is negative semidefinite on N_r if V(0, 0) = 0 and V(x, y) ≤ 0 for all points of N_r.
For example, V(x, y) = x² + 3xy + 9y² is positive definite on N_r for any positive r, and −3x² + 4xy − 5y² is negative definite on any N_r. The function (x − y)⁴ is positive semidefinite, being nonnegative but vanishing on the line y = x. The following lemma is useful in producing examples of positive definite and negative definite functions.
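Lemma 11.2 below gives a clean computational test for quadratic forms; as a preview, this helper of our own applies the discriminant check to the examples just given:

```python
def quadratic_form_type(a, b, c):
    # Classify V(x, y) = a*x^2 + b*x*y + c*y^2 via the sign of a
    # and of the discriminant 4ac - b^2 (see Lemma 11.2).
    disc = 4 * a * c - b * b
    if disc > 0 and a > 0:
        return "positive definite"
    if disc > 0 and a < 0:
        return "negative definite"
    return "not definite (indefinite or semidefinite)"
```

For instance, (a, b, c) = (1, 3, 9) is positive definite and (−3, 4, −5) is negative definite, matching the examples above.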
LEMMA 11.2
Let V(x, y) = ax² + bxy + cy². Then V is positive definite (on any N_r) if and only if a > 0 and 4ac − b² > 0; V is negative definite (on any N_r) if and only if a < 0 and 4ac − b² > 0.

Proof  Certainly V is continuous with continuous partial derivatives in the entire x, y plane, and V(0, 0) = 0. Now recall the second-derivative test for extrema of a function of two variables. First,

V_x(0, 0) = V_y(0, 0) = 0

so the origin is a candidate for a maximum or minimum of V. For a maximum or minimum, we need

V_xx(0, 0)V_yy(0, 0) − V_xy(0, 0)² > 0

But this condition is the same as

(2a)(2c) − b² > 0,  or  4ac − b² > 0

Satisfaction of this inequality requires that a and c have the same sign.
When a > 0, then V_xx(0, 0) > 0 and the origin is a point where V has a minimum. In this event V(x, y) > V(0, 0) = 0 for all (x, y) other than (0, 0), and V is positive definite. When a < 0, then V_xx(0, 0) < 0 and the origin is a point where V has a maximum. Now V(x, y) < 0 for all (x, y) other than (0, 0), and V is negative definite.
If x = x(t), y = y(t) defines a trajectory of X′ = F(X), and V is a differentiable function of two variables, then V(x(t), y(t)) is a differentiable function of t along this trajectory. We will denote the derivative of V(x(t), y(t)) with respect to t as V̇(x, y), or just V̇. By the chain rule,

V̇(x, y) = V_x(x(t), y(t))x′(t) + V_y(x(t), y(t))y′(t)

or, more succinctly,

V̇ = V_x x′ + V_y y′

This is called the derivative of V along the trajectory, or the orbital derivative of V. The following two theorems show how these ideas about positive and negative definite functions relate to Lyapunov's approach to stable and asymptotically stable critical points. The criteria given in the first theorem constitute Lyapunov's direct method for determining the stability or asymptotic stability of a critical point.
THEOREM 11.4
Lyapunov’s Direct Method for Stability
Let (0, 0) be an isolated critical point of the autonomous 2 × 2 system X′ = F(X).
1. If a positive definite function V can be found for some neighborhood N_r of the origin, such that V̇ is negative semidefinite on N_r, then the origin is stable.
2. If a positive definite function V can be found for some neighborhood N_r of the origin, such that V̇ is negative definite on N_r, then the origin is asymptotically stable.
On the other side of the issue, Lyapunov's second theorem gives a test to determine that a critical point is unstable.
THEOREM 11.5
Lyapunov’s Direct Method for Instability
Let (0, 0) be an isolated critical point of the autonomous 2 × 2 system X′ = F(X). Let V be continuous with continuous first partial derivatives in some neighborhood of the origin, and let V(0, 0) = 0.
1. Suppose R > 0, and that in every neighborhood N_r, with 0 < r ≤ R, there is a point at which V(x, y) is positive. Suppose V̇ is positive definite in N_R. Then (0, 0) is unstable.
2. Suppose R > 0, and that in every neighborhood N_r, with 0 < r ≤ R, there is a point at which V(x, y) is negative. Suppose V̇ is negative definite in N_R. Then (0, 0) is unstable.
Any function V playing the role cited in these theorems is called a Lyapunov function. Theorems 11.4 and 11.5 give no suggestion at all as to how a Lyapunov function might be produced, and in attempting to apply them this is the difficult part. Lemma 11.2 is sometimes useful in providing candidates, but, as might be expected, if the differential equation is complicated the task of finding a Lyapunov function might be insurmountable. In spite of this potential difficulty, Lyapunov's theorems are useful because they do not require solving the system, nor do they require that the system be almost linear.
Adding to the mystique of the theorems is the nonobvious connection between V, V̇, and the stability characteristics of the critical point. We will give a plausibility argument intended to clarify this connection.
Consider Figure 11.37, which shows a typical curve V(x, y) = c about the origin. Call this curve Γ. Here is how V̇ enters the picture. At any point P: (a, b) on this curve, the vector N = V_x i + V_y j is normal (perpendicular) to Γ, by which we mean that it is normal to the tangent to Γ at this point. In addition, consider a trajectory x = φ(t), y = ψ(t) passing through P at time t = 0, also shown in Figure 11.37. Thus, φ(0) = a and ψ(0) = b. The vector T = φ′(0)i + ψ′(0)j is tangent to this trajectory (not to Γ) at (a, b). Now,

V̇(a, b) = V_x(a, b)φ′(0) + V_y(a, b)ψ′(0) = N · T

the dot product of the normal to Γ and the tangent to the trajectory at (a, b). Since the dot product of two vectors is equal to the product of their lengths and the cosine of the angle between them, we obtain

V̇(a, b) = ‖N‖ ‖T‖ cos θ

with θ the angle between T and N. Now look at conclusions (1) and (2) of the first Lyapunov theorem. If V̇ is negative semidefinite, then V̇(a, b) ≤ 0, so cos θ ≤ 0. Then π/2 ≤ θ ≤ 3π/2. This means that the trajectory at this point is moving either into the region enclosed by Γ, or perhaps in the same direction as the tangent to Γ. The effect of this is that the trajectory cannot move away from the region enclosed by Γ. The trajectory cannot escape from this region, and so the origin is stable. If V̇ is negative definite, then cos θ < 0, so π/2 < θ < 3π/2, and now the trajectory actually moves into the region enclosed by Γ, and cannot simply trace out a path around the origin. In this case the origin is asymptotically stable.
We leave it for the student to make a similar geometric argument in support of the second Lyapunov theorem.
FIGURE 11.37 Rationale for Lyapunov's direct method.
EXAMPLE 11.24
Consider the nonlinear system
X′ = [ −x³ ; −4x²y ]

The origin is an isolated critical point. We will attempt to construct a Lyapunov function of the form V(x, y) = ax² + bxy + cy² that will tell us whether the origin is stable or unstable. We may not succeed in this, but it is a good first attempt because at least we know conditions on the coefficients of V that make this function positive definite or negative definite. The key lies in V̇, so compute

V̇ = V_x x′ + V_y y′ = (2ax + by)(−x³) + (bx + 2cy)(−4x²y) = −2ax⁴ − bx³y − 4bx³y − 8cx²y²

Now observe that the −2ax⁴ and −8cx²y² terms will be nonpositive if a and c are positive. The x³y terms will vary in sign, but we can make them vanish by choosing b = 0. We can choose a and c as any positive numbers, say a = c = 1. Then V(x, y) = x² + y² is positive definite in any neighborhood of the origin, and

V̇ = −2x⁴ − 8x²y²

is negative semidefinite in any neighborhood of the origin. By Lyapunov's direct method (Theorem 11.4), the origin is stable. A phase portrait for this system is shown in Figure 11.38.
We can draw no conclusion about asymptotic stability of the origin in the last example. If we had been able to find a Lyapunov function V so that V̇ was negative definite (instead of
FIGURE 11.38 Phase portrait for x′ = −x³, y′ = −4x²y.
negative semidefinite), then we could have concluded that the origin was asymptotically stable. But we cannot be sure, from the work done, whether there is no such function, or whether we simply did not find one.
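A quick numerical spot check of Example 11.24's Lyapunov function can be reassuring. This is our own sketch; random sampling is evidence, not a proof:

```python
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.uniform(-1.0, 1.0, size=(2, 5000))

V = x ** 2 + y ** 2                          # candidate Lyapunov function
# Orbital derivative V_x x' + V_y y' along x' = -x^3, y' = -4x^2 y:
Vdot = 2 * x * (-x ** 3) + 2 * y * (-4 * x ** 2 * y)
```

On every sampled point V is nonnegative and Vdot is nonpositive, consistent with V̇ = −2x⁴ − 8x²y² being negative semidefinite.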
EXAMPLE 11.25
It is instructive to consider Lyapunov's theorems as they relate to a simple physical problem, the undamped pendulum (put c = 0). The system is

x′ = y
y′ = −(g/L) sin x

Recall that x = θ, the displacement angle of the bob from the downward vertical rest position. We have already characterized the critical points of this problem (with damping). However, this example makes an important point. In problems that are drawn from a physical setting, the total energy of the system can often serve as a Lyapunov function. This is a useful observation, since the search for a Lyapunov function constitutes the primary issue in attempting to apply Lyapunov's theorems.
Thus compute the total energy V of the pendulum. The kinetic energy is

(1/2)mL²θ′(t)²

which in the variables of the system is

(1/2)mL²y²

The potential energy is the work done in lifting the bob above the lowest position. From Figure 11.1, this is mgL(1 − cos θ), or

mgL(1 − cos x)

The total energy is therefore given by

V(x, y) = mgL(1 − cos x) + (1/2)mL²y²

Clearly V(0, 0) = 0. The rest position of the pendulum (pendulum arm vertical with bob at the low point) has zero energy. Next, compute

V̇(x(t), y(t)) = V_x x′(t) + V_y y′(t) = mgL sin(x)x′(t) + mL²yy′(t)

Along any trajectory of the system, x′ = y and y′ = −(g/L) sin x, so the orbital derivative is

V̇(x(t), y(t)) = mgL y sin x + mL²y(−(g/L) sin x) = 0

This corresponds to the fact that, in a conservative physical setting, the total energy is a constant of the motion. Now V is positive definite. This is expected because the energy should be a minimum in the rest position, where V(0, 0) = 0. Further, V̇ is negative semidefinite. Therefore the origin is stable.
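The energy computation can be checked numerically. In this sketch of our own (the values of m, g, and L are arbitrary samples), the orbital derivative vanishes at every sampled state:

```python
import numpy as np

m_, g_, L_ = 1.0, 9.8, 2.0   # illustrative mass, gravity, length

def V(x, y):
    # Total energy: potential + kinetic, in the system variables
    return m_ * g_ * L_ * (1 - np.cos(x)) + 0.5 * m_ * L_ ** 2 * y ** 2

def Vdot(x, y):
    # V_x x' + V_y y' along x' = y, y' = -(g/L) sin x
    return (m_ * g_ * L_ * np.sin(x) * y
            + m_ * L_ ** 2 * y * (-(g_ / L_) * np.sin(x)))
```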
As expected, Lyapunov’s theorem did not tell us anything new about the pendulum, which we had already analyzed by other means. However, this example does provide some insight into a line of thought that could have motivated Lyapunov. In any conservative physical system, the total energy must be a constant of the motion, and we expect that any position with the system at rest should be stable if the potential energy is a minimum, and unstable if it is not. This suggests looking at the total energy as a candidate for a Lyapunov function V . In particular, for many mechanical systems the kinetic energy is a quadratic form. One then checks the orbital derivative V˙ to see if it is negative definite or semidefinite in a neighborhood of the point of interest.
EXAMPLE 11.26
Consider the system
x′ = x³ − xy², y′ = y³ + 6x²y.
The origin is an isolated critical point. We do not know whether it is stable, asymptotically stable, or unstable, so we will begin by trying to construct a Lyapunov function that fits one of the Lyapunov theorems. Attempt a Lyapunov function
V(x, y) = ax² + bxy + cy².
We know how to choose the coefficients to make this positive definite. The question is what happens with the orbital derivative. Compute
V̇ = Vx x′ + Vy y′ = (2ax + by)(x³ − xy²) + (bx + 2cy)(y³ + 6x²y)
= 2ax⁴ + 2cy⁴ + (12c − 2a)x²y² + 7bx³y.
This looks promising, because the first three terms can be made strictly positive for (x, y) ≠ (0, 0). Thus choose b = 0 and a = c = 1 to get
V̇(x, y) = 2x⁴ + 2y⁴ + 10x²y² > 0
for (x, y) ≠ (0, 0). With this choice of the coefficients, V(x, y) = x² + y² is positive definite on any neighborhood of the origin, and V̇ is also positive definite. By Lyapunov's second theorem (Theorem 11.5), the origin is unstable. Figure 11.39 shows a phase portrait for this system.
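The algebra in Example 11.26 can be spot-checked numerically: the orbital derivative obtained from the chain rule, with V(x, y) = x² + y², should agree with the closed form 2x⁴ + 2y⁴ + 10x²y² at every point. The sample points below are arbitrary.

```python
# Spot-check of Example 11.26: with V(x, y) = x^2 + y^2 (a = c = 1, b = 0),
# the orbital derivative Vdot = Vx x' + Vy y' should reduce to
# 2x^4 + 2y^4 + 10 x^2 y^2, which is positive away from the origin.
def xprime(x, y):
    return x**3 - x*y**2

def yprime(x, y):
    return y**3 + 6*x**2*y

def vdot_chain(x, y):
    return 2*x*xprime(x, y) + 2*y*yprime(x, y)

def vdot_closed(x, y):
    return 2*x**4 + 2*y**4 + 10*x**2*y**2

samples = [(0.3, -0.7), (1.2, 0.5), (-0.4, -0.9)]   # arbitrary test points
print([vdot_chain(x, y) - vdot_closed(x, y) for x, y in samples])
```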
EXAMPLE 11.27
A nonlinear oscillator with linear damping can be modeled by the differential equation
z″ + cz′ + z + αz² + βz³ = 0,
in which c and β are positive and α ≠ 0. To convert this to a system, let z = x and z′ = y to obtain
x′ = y, y′ = −cy − x − αx² − βx³.
FIGURE 11.39 Phase portrait for x′ = x³ − xy², y′ = y³ + 6x²y.
This is the system
x′ = y, y′ = −cy − x − αx² − βx³.
We can construct a Lyapunov function by a clever observation. Let
V(x, y) = (1/2)y² + (1/2)x² + (1/3)αx³ + (1/4)βx⁴.
Then
V̇ = (x + αx² + βx³)y + y(−cy − x − αx² − βx³) = −cy².
Since c > 0, V̇ is certainly negative semidefinite in any neighborhood of the origin. It may not be obvious whether V is positive definite in any neighborhood of the origin. Certainly the term y²/2 in V is nonnegative. The other terms are
(1/2)x² + (1/3)αx³ + (1/4)βx⁴,
which we can write as
x²(1/2 + (1/3)αx + (1/4)βx²).
Since g(x) = 1/2 + (1/3)αx + (1/4)βx² is continuous for all x, and g(0) = 1/2, there is an interval (−h, h) about the origin such that g(x) > 0 for −h < x < h.
Then, in N_h, V(x, y) ≥ 0, and V(x, y) > 0 if (x, y) ≠ (0, 0). Therefore V is positive definite in this neighborhood. We now have V positive definite and V̇ negative semidefinite in N_h, hence the origin is stable. Figure 11.40 shows a phase portrait for the case c = 1, α = 1/4 and β = 1/6. It is instructive to try different values of α and β to get some idea of the effect of the nonlinear terms on the trajectories.
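A rough numerical check of this example: along a computed trajectory, V should be nonincreasing, since V̇ = −cy² ≤ 0. The parameter values below are those quoted for Figure 11.40; the integrator and the starting point are illustrative choices.

```python
import math

# Example 11.27 with c = 1, alpha = 1/4, beta = 1/6.  V should be
# nonincreasing along any trajectory starting near the origin, and the
# state should decay toward (0, 0) because of the damping.
c, alpha, beta = 1.0, 0.25, 1.0/6.0

def f(s):
    x, y = s
    return (y, -c*y - x - alpha*x**2 - beta*x**3)

def rk4_step(s, h):
    k1 = f(s)
    k2 = f((s[0] + h/2*k1[0], s[1] + h/2*k1[1]))
    k3 = f((s[0] + h/2*k2[0], s[1] + h/2*k2[1]))
    k4 = f((s[0] + h*k3[0], s[1] + h*k3[1]))
    return (s[0] + h/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
            s[1] + h/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))

def V(x, y):
    return 0.5*y**2 + 0.5*x**2 + alpha*x**3/3 + beta*x**4/4

s = (0.8, 0.0)               # an arbitrary start inside N_h
vals = [V(*s)]
for _ in range(5000):        # integrate to t = 10
    s = rk4_step(s, 0.002)
    vals.append(V(*s))
print(vals[0], vals[-1])     # V decays toward 0
```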
FIGURE 11.40 Nonlinear oscillator with c = 1, α = 1/4, β = 1/6.
EXAMPLE 11.28
Consider the system
x′ = f(t)y + g(t)x(x² + y²), y′ = −f(t)x + g(t)y(x² + y²).
Assume that the origin is an isolated critical point. If we attempt a Lyapunov function V(x, y) = ax² + bxy + cy², then
V̇ = (2ax + by)(f(t)y + g(t)x(x² + y²)) + (bx + 2cy)(−f(t)x + g(t)y(x² + y²))
= (2a − 2c)f(t)xy + 2ag(t)x²(x² + y²) + 2cg(t)y²(x² + y²) + bf(t)y² + 2bg(t)xy(x² + y²) − bf(t)x².
We can eliminate three terms in the orbital derivative by trying b = 0. The term (2a − 2c)f(t)xy vanishes if we choose a = c. To have V positive definite we need a and c positive, so let a = c = 1. Then
V(x, y) = x² + y²,
which is positive definite in any neighborhood of the origin, and
V̇ = 2(x² + y²)²g(t).
This is negative definite if g(t) < 0 for all t ≥ 0, and in this case the origin is asymptotically stable. If g(t) ≤ 0 for t ≥ 0, then V̇ is negative semidefinite and the origin is stable. If g(t) > 0 for t ≥ 0, then the origin is unstable.
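The collapse of the orbital derivative in Example 11.28 holds for any f and g, so it can be spot-checked with arbitrary test functions; the particular f and g below are assumptions of this sketch, not from the text.

```python
import math

# Spot-check of Example 11.28: with V = x^2 + y^2, the orbital derivative
# should collapse to 2 (x^2 + y^2)^2 g(t) for ANY choice of f and g.
def f(t):
    return math.cos(3*t)

def g(t):
    return -1.0 - math.sin(t)**2   # g(t) < 0: the asymptotically stable case

def vdot(t, x, y):
    xp = f(t)*y + g(t)*x*(x**2 + y**2)
    yp = -f(t)*x + g(t)*y*(x**2 + y**2)
    return 2*x*xp + 2*y*yp

samples = [(0.0, 0.5, -0.2), (1.3, -1.1, 0.4), (2.7, 0.9, 0.9)]
print([vdot(t, x, y) - 2*(x**2 + y**2)**2 * g(t) for t, x, y in samples])
```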
SECTION 11.6
PROBLEMS
In each of Problems 1 through 8, use Lyapunov's theorems to determine whether the origin is stable, asymptotically stable, or unstable.

1. x′ = −2xy², y′ = −x²y
2. x′ = −x cos²(y), y′ = (6 − x)y²
3. x′ = −2x, y′ = −3y³
4. x′ = −x²y², y′ = x²
5. x′ = xy², y′ = y³
6. x′ = x⁵(1 + y²), y′ = x²y + y³
7. x′ = x³(1 + y), y′ = y³(4 + x²)
8. x′ = x³cot²(y), y′ = y³(2 + x⁴)

11.7
Limit Cycles and Periodic Solutions

Nonlinear systems of differential equations can give rise to curves called limit cycles, which have particularly interesting properties. To see how a limit cycle occurs naturally in a physical setting, draw a circle C of radius R on pavement, with R exceeding the length L between the points where the front and rear wheels of a bicycle touch the ground. Now grab the handlebars and push the bicycle so that its front wheel moves around C. What path does the rear wheel follow? If you tie a marker to the rear wheel so that it traces out the rear wheel's path as the front wheel moves along C, you find that this path does not approach a particular point. Instead, as the front wheel continues its path around C, the rear wheel asymptotically approaches a circle K concentric with C and having radius √(R² − L²). If the rear wheel begins outside C, it will spiral inward toward K, while if it begins inside C, it will work its way outward toward K. If the rear wheel begins on K, it will remain on K. This inner circle K has two properties in common with a stable critical point. Trajectories beginning near K move toward it, and if a trajectory begins on K, it remains there. However, K is not a point, but is instead a closed curve. K is a limit cycle of this motion.
DEFINITION 11.6
Limit Cycle
A limit cycle of a 2 × 2 system X′ = F(X) is a closed trajectory K having the property that there are trajectories x = φ(t), y = ψ(t) of the system such that (φ(t), ψ(t)) spirals toward K in the limit as t → ∞.
We have already pointed out the analogy between a limit cycle and a critical point. This analogy can be pushed further by defining a concept of stability and asymptotic stability for limit cycles that is modeled after stability and asymptotic stability for critical points.
DEFINITION 11.7
Let K be a limit cycle of X′ = F(X). Then:
1. K is stable if trajectories starting within a certain distance of K must remain within a fixed distance of K.
2. K is asymptotically stable if every trajectory that starts sufficiently close to K spirals toward K as t → ∞.
3. K is semistable if every trajectory starting on one side of K spirals toward K as t → ∞, while there are trajectories starting on the other side of K that spiral away from K as t → ∞.
4. K is unstable if there are trajectories starting on both sides of K that spiral away from K as t → ∞.
Keep in mind that a closed trajectory of X′ = F(X) represents a periodic solution. We have seen periodic solutions previously with centers, which are critical points about which trajectories form closed curves. A limit cycle is therefore a periodic solution toward which other solutions approach spirally. Often we are interested in whether a system X′ = F(X) has a periodic solution, and we will shortly develop some tests that tell whether a system has such a solution, or sometimes that it does not.
EXAMPLE 11.29 Limit Cycle
Consider the almost linear system
x′ = x + y − x√(x² + y²), y′ = −x + y − y√(x² + y²),    (11.10)
which has the form X′ = AX + G, where A is the 2 × 2 matrix with rows (1, 1) and (−1, 1). (0, 0) is the only critical point of this system. The eigenvalues of A are 1 ± i, so the origin is an unstable spiral point of the system X′ = AX + G and also of the linear system X′ = AX. Figure 11.41 shows a phase portrait of the linear system.
Up to this point, whenever we have seen trajectories spiraling outward, they have grown without bound. We will now see that this does not happen with the current system. This will be transparent if we convert the system to polar coordinates. Since r² = x² + y²,
r r′ = x x′ + y y′ = x(x + y − x√(x² + y²)) + y(−x + y − y√(x² + y²))
= x² + y² − (x² + y²)√(x² + y²) = r² − r³.
FIGURE 11.41 Unstable spiral point for x′ = x + y, y′ = −x + y.
Then
r′ = r − r² = r(1 − r).
This tells us that, if 0 < r < 1, then r′ > 0, so the distance between a trajectory and the origin is increasing: trajectories inside the unit circle are moving outward. But if r > 1, then r′ < 0, so the distance between a trajectory and the origin is decreasing: trajectories outside the unit circle are moving inward.
This does not yet tell us in detail how the trajectories move outward or inward. To determine this, we need to bring the polar angle into consideration. Differentiate x = r cos(θ) and y = r sin(θ) with respect to t:
x′ = r′ cos(θ) − rθ′ sin(θ), y′ = r′ sin(θ) + rθ′ cos(θ).
Now observe that
x′y − xy′ = (r′ cos(θ) − rθ′ sin(θ))(r sin(θ)) − (r cos(θ))(r′ sin(θ) + rθ′ cos(θ))
= −r²θ′(cos²(θ) + sin²(θ)) = −r²θ′.
But from the system X′ = AX + G we have
x′y − xy′ = y(x + y − x√(x² + y²)) − x(−x + y − y√(x² + y²)) = x² + y² = r².
Therefore
−r²θ′ = r²,
from which we conclude that
θ′ = −1.
We now have an uncoupled system of differential equations for r and θ:
r′ = r(1 − r), θ′ = −1.
The equation for r is separable. For the trajectory through (r₀, θ₀), solve these equations subject to the initial conditions
r(0) = r₀, θ(0) = θ₀.
We get
r(t) = 1/(1 − ((r₀ − 1)/r₀)e^(−t)), θ(t) = θ₀ − t.
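The explicit solution r(t) = 1/(1 − ((r₀ − 1)/r₀)e^(−t)) of r′ = r(1 − r) derived in this example can be confirmed numerically; the starting radius used below is an arbitrary choice.

```python
import math

# Check the explicit solution of r' = r(1 - r) used in Example 11.29:
# r(t) = 1 / (1 - ((r0 - 1)/r0) e^{-t}) satisfies r(0) = r0 and r(t) -> 1.
def r(t, r0):
    return 1.0 / (1.0 - (r0 - 1.0)/r0 * math.exp(-t))

def rdot_numeric(t, r0, h=1e-6):
    # centered-difference approximation to r'(t)
    return (r(t + h, r0) - r(t - h, r0)) / (2*h)

r0 = 0.3                       # an arbitrary start inside the unit circle
print(r(0.0, r0))              # the initial value r0
print(rdot_numeric(1.0, r0))   # close to r(1 - r) at t = 1
print(r(30.0, r0))             # essentially 1: the limit cycle
```

Running the same check with r₀ > 1 shows the approach to r = 1 from outside.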
These explicit solutions enable us to conclude the following. If r₀ = 1, then (r₀, θ₀) is on the unit circle. But then r(t) = 1 for all t, hence a trajectory that starts on the unit circle remains there for all time. Further, θ′(t) = −1, so the point (r(t), θ(t)) moves clockwise around this circle as t increases.
If 0 < r₀ < 1, then (r₀, θ₀) is within the disk bounded by the unit circle. Now
r(t) = 1/(1 + ((1 − r₀)/r₀)e^(−t)) < 1
for all t > 0. Therefore a trajectory starting at a point inside the unit disk remains there forever. Further, r(t) → 1 as t → ∞, so this trajectory approaches the unit circle from within.
Finally, if r₀ > 1, then (r₀, θ₀) is outside the unit circle. But now r(t) > 1 for all t, so a trajectory starting outside the unit circle remains outside for all time. However, it is still true that r(t) → 1 as t → ∞, so this trajectory approaches the unit circle from without.
In sum, trajectories tend in the limit to wrap around the unit circle, either from within or from without, depending on where they start. The unit circle is an asymptotically stable limit cycle of X′ = F(X). A phase portrait is shown in Figure 11.42.
Here is an example of a system having infinitely many asymptotically stable limit cycles.

EXAMPLE 11.30
The system
x′ = y + x sin(√(x² + y²)), y′ = −x + y sin(√(x² + y²))
has particularly interesting limit cycles, as can be seen in Figure 11.43. These occur as concentric circles about the origin. Trajectories originating within the innermost circle spiral toward this circle, as do trajectories beginning between this first circle and the second one. Trajectories beginning between the second and third circles spiral toward the third circle, as do trajectories originating between the third and fourth circles. This pattern continues throughout the plane.
We will now develop some facts about closed trajectories (periodic solutions) and limit cycles. For the remainder of this section,
X′ = F(X) with F(X) = (f(x, y), g(x, y))
is a 2 × 2 autonomous system.
FIGURE 11.42 Limit cycle of x′ = x + y − x√(x² + y²), y′ = −x + y − y√(x² + y²).

FIGURE 11.43 Asymptotically stable limit cycles of x′ = y + x sin(√(x² + y²)), y′ = −x + y sin(√(x² + y²)).
The first result states that, under commonly encountered conditions, a closed trajectory of X′ = F(X) must always enclose a critical point.

THEOREM 11.6
Enclosure of Critical Points
Let f and g be continuous with continuous first partial derivatives in a region of the plane containing a closed trajectory K of X′ = F(X). Then K must enclose at least one critical point of X′ = F(X).
This kind of result can sometimes be used to tell that certain regions of the plane cannot contain closed trajectories of a system of differential equations. For example, suppose the origin is the only critical point of X′ = F(X). Then we automatically know that there can be no closed trajectory in, for example, one of the quadrants, because such a closed trajectory could not enclose the origin, contradicting the fact that it must enclose a critical point.
Bendixson's theorem, which follows, gives conditions under which X′ = F(X) has no closed trajectory in a part of the plane. A region of the plane is called simply connected if it contains all the points enclosed by any closed curve in the region. For example, the region bounded by the unit circle is simply connected. But the shaded region shown in Figure 11.44 between the curves C and K is not simply connected, because C encloses points not in the region. A simply connected region can have no "holes" in it, because then a closed curve wrapping around a hole encloses points not in the region.
FIGURE 11.44 Non-simply connected region.
THEOREM 11.7
Bendixson
Let f and g be continuous with continuous first partial derivatives in a simply connected region R of the plane. Suppose ∂f/∂x + ∂g/∂y has the same sign throughout R, either positive or negative. Then X′ = F(X) has no closed trajectory in R.

Proof  Suppose R contains a closed trajectory C representing the periodic solution x = φ(t), y = ψ(t). Suppose (φ(t), ψ(t)) traverses this curve exactly once as t varies from a to b, and let D be the region enclosed by C. Evaluate the line integral
∮_C −g(x, y) dx + f(x, y) dy = ∫_a^b [−g(φ(t), ψ(t))φ′(t) + f(φ(t), ψ(t))ψ′(t)] dt.
So far we have not used the fact that x = φ(t), y = ψ(t) is a solution of X′ = F(X). Using this, we have
φ′(t) = x′ = f(x, y) = f(φ(t), ψ(t))
and
ψ′(t) = y′ = g(x, y) = g(φ(t), ψ(t)).
Then
−g(φ(t), ψ(t))φ′(t) + f(φ(t), ψ(t))ψ′(t) = −g(φ(t), ψ(t))f(φ(t), ψ(t)) + f(φ(t), ψ(t))g(φ(t), ψ(t)) = 0.
Therefore
∮_C −g(x, y) dx + f(x, y) dy = 0.
But by Green's theorem,
∮_C −g(x, y) dx + f(x, y) dy = ∬_D (∂f/∂x + ∂g/∂y) dx dy,
and this integral cannot be zero, because the integrand is continuous and of the same sign throughout D. This contradiction implies that no such closed trajectory C can exist within R.
EXAMPLE 11.31
Consider the system
x′ = 3x + 4y + x³, y′ = 5x − 2y + y³.
Here f and g are continuous, with continuous first partial derivatives, throughout the plane. Further,
∂f/∂x + ∂g/∂y = (3 + 3x²) + (−2 + 3y²) = 1 + 3x² + 3y² > 0
for all (x, y). By Bendixson's theorem, this system has no closed trajectory, hence no periodic solution.
The last two theorems have been negative, in the sense of providing criteria for X′ = F(X) to have no periodic solution in some part of the plane. The next theorem, a major result credited jointly to Henri Poincaré and Ivar Bendixson, gives a condition under which X′ = F(X) has a periodic solution.

THEOREM 11.8
Poincaré-Bendixson
Let f and g be continuous with continuous first partial derivatives in a closed, bounded region R of the plane that contains no critical point of X′ = F(X). Let C be a trajectory of X′ = F(X) that is in R for t ≥ t₀. Then C must be a periodic solution (closed trajectory), or else C spirals toward a closed trajectory as t → ∞. In either case, as long as a trajectory enters R at some time, and R contains no critical point, then X′ = F(X) has a periodic solution, namely this trajectory itself or, if not, a closed trajectory approached spirally by this trajectory.
On the face of it, this result may appear to contradict the conclusion of Theorem 11.6, since any periodic trajectory should enclose a critical point. However, this critical point need not be in the region R of the theorem. To illustrate, consider again the system (11.10). Let R be the region between the concentric circles r = 1/2 and r = 3 shown in Figure 11.45. The only critical point of X′ = F(X) is the origin, which is not in R. If we choose any trajectory
FIGURE 11.45 Region between the circles r = 1/2 and r = 3, with trajectories approaching the limit cycle r = 1.
beginning at a point inside the unit circle, then, as we have seen, this trajectory approaches the unit circle, hence eventually enters R. The Poincaré-Bendixson theorem allows us to assert, just from this, that R contains a periodic solution of the system.
We will conclude this section with Lienard's theorem, which gives conditions sufficient for a system to have a limit cycle.

THEOREM 11.9
Lienard
Let p and q be continuous and have continuous derivatives on the entire real line. Suppose:
1. q(−x) = −q(x) for all x.
2. q(x) > 0 for all x > 0.
3. p(−x) = p(x) for all x.
Suppose also that the equation F(x) = 0 has exactly one positive root, where
F(x) = ∫₀^x p(ξ) dξ.
If this root is denoted α, suppose F(x) < 0 for 0 < x < α, and F(x) is positive and nondecreasing for x > α. Then the system
x′ = y, y′ = −p(x)y − q(x)
has a unique limit cycle enclosing the origin. Further, this limit cycle is asymptotically stable.

Under the conditions of the theorem, the system has exactly one periodic solution, and every other trajectory spirals toward this closed curve as t → ∞. As an illustration, we will use Lienard's theorem to analyze the van der Pol equation.

EXAMPLE 11.32 van der Pol Equation
The second-order differential equation
z″ + ε(z² − 1)z′ + z = 0,
in which ε is a positive constant, is called van der Pol's equation. It was derived by the Dutch engineer Balthazar van der Pol in the 1920s in his studies of vacuum tubes. It was of great interest to know whether this equation has periodic solutions, a question to which the answer is not obvious.
First write van der Pol's equation as a system. Let x = z and y = z′ to get
x′ = y, y′ = −ε(x² − 1)y − x.
This system has exactly one critical point, the origin. Further, this system matches the one in Lienard's theorem if we let p(x) = ε(x² − 1) and q(x) = x. Now q(−x) = −q(x), q(x) > 0 for x > 0, and p(−x) = p(x), as required. Next, let
F(x) = ∫₀^x p(ξ) dξ = ε((1/3)x³ − x) = (ε/3)x(x² − 3).
F has exactly one positive zero, α = √3. For 0 < x < √3, F(x) < 0. Further, for x > √3, F(x) is positive and increasing (hence nondecreasing). By Lienard's theorem, the van der Pol equation has a unique limit cycle (hence a periodic solution) enclosing the origin. This limit cycle is asymptotically stable, so trajectories beginning at points not on this closed trajectory spiral toward it. Figures 11.46 through 11.49 show phase portraits for van der Pol's equation for various choices of ε.
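Lienard's theorem guarantees that the limit cycle exists but says nothing about its size. A direct simulation shows trajectories from inside and outside settling onto the same orbit; the RK4 integrator, the starting points, and the often-quoted amplitude of roughly 2 for ε = 1 are assumptions of this sketch rather than results from the text.

```python
# van der Pol as a system: x' = y, y' = -eps (x^2 - 1) y - x.
eps = 1.0   # the parameter value used for Figure 11.48

def f(s):
    x, y = s
    return (y, -eps*(x**2 - 1.0)*y - x)

def rk4_step(s, h):
    k1 = f(s)
    k2 = f((s[0] + h/2*k1[0], s[1] + h/2*k1[1]))
    k3 = f((s[0] + h/2*k2[0], s[1] + h/2*k2[1]))
    k4 = f((s[0] + h*k3[0], s[1] + h*k3[1]))
    return (s[0] + h/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
            s[1] + h/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))

def late_amplitude(s, steps=40000, h=0.001, tail=10000):
    # max |x| over the end of the run, after transients have died out
    peak = 0.0
    for i in range(steps):
        s = rk4_step(s, h)
        if i >= steps - tail:
            peak = max(peak, abs(s[0]))
    return peak

inner = late_amplitude((0.1, 0.0))   # starts inside the limit cycle
outer = late_amplitude((4.0, 0.0))   # starts outside the limit cycle
print(inner, outer)                  # both settle near the same amplitude
```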
FIGURE 11.46 Phase portrait for van der Pol's equation for ε = 0.2.

FIGURE 11.47 Phase portrait for van der Pol's equation for ε = 0.5.
FIGURE 11.48 Phase portrait for van der Pol's equation for ε = 1.
FIGURE 11.49 Phase portrait for van der Pol's equation for ε = 3.

SECTION 11.7
PROBLEMS

In each of Problems 1 through 4, use Bendixson's theorem to show that the system has no closed trajectory. Generate a phase portrait for the system.

1. x′ = −2x − y + x³, y′ = 10x + 5y + x²y − 2y sin(x)
2. x′ = −x − 3y + e^(2x), y′ = x + 2y + cos(y)
3. x′ = 3x − 7y + sinh(x), y′ = −4y + 5e^(3y)
4. x′ = y, y′ = −x + y(9 − x² − y²), for (x, y) in the elliptical region bounded by the graph of x² + 9y² = 9

5. Recall that with the transformation from rectangular to polar coordinates, we obtain
x(dx/dt) + y(dy/dt) = r(dr/dt)
and
y(dx/dt) − x(dy/dt) = −r²(dθ/dt).
Use these equations to show that the system
x′ = y + (x/√(x² + y²)) f(√(x² + y²)), y′ = −x + (y/√(x² + y²)) f(√(x² + y²))
has closed trajectories associated with zeros of the function f (a given continuous function of one variable). If the closed trajectory is a limit cycle, what is its direction of orientation?

In each of Problems 6 through 9, use a conversion to polar coordinates (see Problem 5) to find all of the closed trajectories of the system, and determine which of these are limit cycles. Also classify the stability of each limit cycle. Generate a phase portrait for each system and attempt to identify the limit cycles.

6. x′ = 4y + x sin(√(x² + y²)), y′ = −x + y sin(√(x² + y²))
7. x′ = y(1 − x² − y²), y′ = −x(1 − x² − y²)
8. x′ = x(1 − x² − y²), y′ = y(1 − x² − y²)
9. x′ = y + x(1 − x² − y²)(4 − x² − y²)(9 − x² − y²), y′ = −x + y(1 − x² − y²)(4 − x² − y²)(9 − x² − y²)

In each of Problems 10 through 13, use the Poincaré-Bendixson theorem to establish the existence of a closed trajectory of the system. In each problem, find an annular region R (a region between two concentric circles) about the origin such that solutions within R remain within R. To do this, check the sign of x x′ + y y′ on circles bounding the annulus. Generate a phase portrait of the system and attempt to identify closed trajectories.

10. x′ = x − y − x√(x² + y²), y′ = x + y − y√(x² + y²)
11. x′ = 4x − 4y − x(x² + 9y²), y′ = 4x + 4y − y(x² + 9y²)
12. x′ = y, y′ = −x + y − y(x² + 2y²)
13. x′ = 4x − 2y − x(4x² + y²), y′ = 2x + 4y − y(4x² + y²)

In each of Problems 14 through 22, determine whether the system has a closed trajectory. Generate a phase portrait for the system and attempt to find closed trajectories.

14. x′ = 3x + 4xy + xy², y′ = −2y² + x⁴y
15. x′ = −y + x + x(x² + y²), y′ = x + y + y(x² + y²)
16. x′ = −y², y′ = 3x + 2x³
17. x′ = y, y′ = x² + e^(sin(x))
18. x′ = y, y′ = −x + y − x²y
19. x′ = x − 5y + y³, y′ = x − y + y³ + 7y⁵
20. x′ = y, y′ = −x + y e^(−y²)
21. x′ = y, y′ = −x³
22. x′ = 9x − 5y + x(x² + 9y²), y′ = 5x + 9y − y(x² + 9y²)

A differential equation in x(t) has a periodic solution if there is a solution x = φ(t) and a positive number T such that φ(t + T) = φ(t) for all t. In each of Problems 23 through 27, prove that the differential equation has a periodic solution by converting it to a system and using theorems from this section. Generate a phase portrait for the system and attempt to identify a closed trajectory, which represents a periodic solution.

23. x″ + (x² − 1)x′ + 2 sin(x) = 0
24. x″ + (5x⁴ + 9x² − 4)x′ + sinh(x) = 0
25. x″ + x³ = 0
26. x″ + 4x = 0
27. x″ + x/(1 + x²) = 0

28. Use Bendixson's theorem to show that the van der Pol equation does not have a closed trajectory whose graph is completely contained in any of the following regions: (a) the infinite strip −1 < x < 1, (b) the half-plane x ≥ 1, or (c) the half-plane x ≤ −1.
PART 4
Vector Analysis

CHAPTER 12 Vector Differential Calculus
CHAPTER 13 Vector Integral Calculus
The next two chapters combine vector algebra and geometry with the processes of calculus to develop vector calculus, or vector analysis. We begin with vector differential calculus, and follow in the next chapter with vector integral calculus. Much of science and engineering deals with the analysis of forces—the force of water on a dam, air turbulence on a wing, tension on bridge supports, wind and weight stresses on buildings, and so on. These forces do not occur in a static state, but vary with position, time and usually a variety of conditions. This leads to the use of vectors that are functions of one or more variables. Our treatment of vectors is in two parts—vector differential calculus (Chapter 12), and vector integral calculus (Chapter 13). Vector differential calculus extends our ability to analyze motion problems from the real line to curves and surfaces in 3-space. Tools such as the directional derivative, divergence and curl of a vector, and gradient play significant roles in many applications. Vector integral calculus generalizes integration to curves and surfaces in 3-space. This will pay many dividends, including the computation of quantities such as mass, center of mass, work, and flux of a vector field, as well as physical interpretations of vector operations. The main results are the integral theorems of Green, Gauss, and Stokes, which have broad applications in such areas as potential theory and the derivation and solution of partial differential equations modeling physical processes.
CHAPTER 12
Vector Differential Calculus

VECTOR FUNCTIONS OF ONE VARIABLE  VELOCITY, ACCELERATION, CURVATURE, AND TORSION  VECTOR FIELDS AND STREAMLINES  THE GRADIENT FIELD AND DIRECTIONAL DERIVATIVES  DIVERGENCE AND
12.1
Vector Functions of One Variable

In vector analysis we deal with functions involving vectors. We will begin with one such class of functions.
DEFINITION 12.1
Vector Function of One Variable
A vector function of one variable is a vector, each component of which is a function of the same single variable.
Such a function typically has the appearance
F(t) = x(t)i + y(t)j + z(t)k,
in which x(t), y(t) and z(t) are the component functions of F. For each t such that the components are defined, F(t) is a vector. For example, if
F(t) = cos(t)i + 2t²j + 3tk,    (12.1)
then F(0) = i, F(π) = −i + 2π²j + 3πk, and F(−3) = cos(−3)i + 18j − 9k.
A vector function is continuous at t₀ if each component function is continuous at t₀. A vector function is continuous if each component function is continuous (for those values of t for which they are all defined). For example,
G(t) = (1/(t − 1))i + ln(t)k
is continuous for all t > 0 with t ≠ 1.
The derivative of a vector function is the vector function formed by differentiating each component. With the function F of (12.1),
F′(t) = −sin(t)i + 4tj + 3k.
A vector function is differentiable if it has a derivative for all t for which it is defined. The vector function G defined above is differentiable for t positive and different from 1, and
G′(t) = −(1/(t − 1)²)i + (1/t)k.
We may think of F(t) as an arrow extending from the origin to (x(t), y(t), z(t)). Since F(t) generally varies with t, we must think of this arrow as having adjustable length and pivoting at the origin to swing about as (x(t), y(t), z(t)) moves. In this way the arrow sweeps out a curve in 3-space as t varies. This curve has parametric equations x = x(t), y = y(t), z = z(t). F(t) is called a position vector for this curve. Figure 12.1 shows a typical such curve.

FIGURE 12.1 Position vector for a curve.
The derivative of the position vector is the tangent vector to this curve. To see why this is true, observe from Figure 12.2 and the parallelogram law that the vector F(t₀ + Δt) − F(t₀)

FIGURE 12.2

is represented by the arrow from (x(t₀), y(t₀), z(t₀)) to (x(t₀ + Δt), y(t₀ + Δt), z(t₀ + Δt)). Since Δt is a nonzero scalar, the vector
(1/Δt)[F(t₀ + Δt) − F(t₀)]
is along the line between these points. In terms of components, this vector is
((x(t₀ + Δt) − x(t₀))/Δt) i + ((y(t₀ + Δt) − y(t₀))/Δt) j + ((z(t₀ + Δt) − z(t₀))/Δt) k.    (12.2)
In the limit as Δt → 0, this vector approaches x′(t₀)i + y′(t₀)j + z′(t₀)k, which is F′(t₀). In this limit, the vector (12.2) moves into a position tangent to the curve at (x(t₀), y(t₀), z(t₀)), as suggested by Figure 12.3. This leads us to interpret F′(t₀) as the tangent vector to the curve at (x(t₀), y(t₀), z(t₀)). This assumes that F′(t₀) is not the zero vector, which has no direction.

FIGURE 12.3 F′(t₀) = lim as Δt → 0 of (F(t₀ + Δt) − F(t₀))/Δt.

We usually represent the tangent vector F′(t₀) as an arrow from the point (x(t₀), y(t₀), z(t₀)) on the curve having F(t) as position vector.
EXAMPLE 12.1
Let H(t) = t²i + sin(t)j − t²k. H(t) is the position vector for the curve given parametrically by x(t) = t², y(t) = sin(t), z(t) = −t², part of whose graph is given in Figure 12.4. The tangent vector is
H′(t) = 2ti + cos(t)j − 2tk.
The tangent vector at the origin is H′(0) = j. The tangent vector at (1, sin(1), −1) is
H′(1) = 2i + cos(1)j − 2k.
From calculus, we know that the length of a curve given parametrically by x = x(t), y = y(t), z = z(t) for a ≤ t ≤ b is
length = ∫_a^b √(x′(t)² + y′(t)² + z′(t)²) dt,
in which it is assumed that x′, y′ and z′ are continuous on [a, b]. Now
‖F′(t)‖ = √(x′(t)² + y′(t)² + z′(t)²)
FIGURE 12.4 Part of the graph of x = t², y = sin(t), z = −t².
is the length of the tangent vector. Thus, in terms of the position vector F(t) = x(t)i + y(t)j + z(t)k,
length = ∫_a^b ‖F′(t)‖ dt.
The length of a curve having a tangent at each point is the integral of the length of the tangent vector over the curve.
EXAMPLE 12.2
Consider the curve given by the parametric equations
x = cos(t), y = sin(t), z = t/3
for −4π ≤ t ≤ 4π. The position vector for this curve is
F(t) = cos(t)i + sin(t)j + (1/3)tk.
The graph of the curve is part of a helix wrapping around the cylinder x² + y² = 1, centered about the z-axis. The tangent vector at any point is
F′(t) = −sin(t)i + cos(t)j + (1/3)k.
Figure 12.5 shows part of the helix and tangent vectors at various points. The length of the tangent vector is
‖F′(t)‖ = √(sin²(t) + cos²(t) + 1/9) = (1/3)√10.
The length of this curve is
length = ∫ from −4π to 4π of ‖F′(t)‖ dt = ∫ from −4π to 4π of (1/3)√10 dt = (8π/3)√10.
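The closed-form length just computed can be confirmed by numerical quadrature of ‖F′(t)‖; Simpson's rule here is a generic check, not a method used in the text.

```python
import math

# Arc length of the helix in Example 12.2: integrate the speed
# |F'(t)| = sqrt(sin^2 t + cos^2 t + 1/9) over [-4*pi, 4*pi].
def speed(t):
    return math.sqrt(math.sin(t)**2 + math.cos(t)**2 + 1.0/9.0)

def simpson(fn, a, b, n=2000):       # composite Simpson's rule, n even
    h = (b - a) / n
    total = fn(a) + fn(b)
    total += 4*sum(fn(a + i*h) for i in range(1, n, 2))
    total += 2*sum(fn(a + i*h) for i in range(2, n, 2))
    return total*h/3

numeric = simpson(speed, -4*math.pi, 4*math.pi)
exact = 8*math.pi*math.sqrt(10)/3
print(numeric, exact)                # the two values agree
```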
FIGURE 12.5 Part of a circular helix and some of its tangent vectors.
Sometimes it is convenient to write the position vector of a curve in such a way that the tangent vector at each point has length 1. Such a tangent is called a unit tangent. We will show how this can be done (at least in theory) if the coordinate functions of the curve have continuous derivatives. Let F(t) = x(t)i + y(t)j + z(t)k for a ≤ t ≤ b, and suppose x′, y′ and z′ are continuous. Define the real-valued function
s(t) = ∫_a^t ‖F′(ξ)‖ dξ.
As suggested by Figure 12.6, s(t) is the length of the part of the curve from its initial point (x(a), y(a), z(a)) to (x(t), y(t), z(t)). As t moves from a to b, s(t) increases from s(a) = 0 to s(b) = L, which is the total length of the curve. By the fundamental theorem of calculus, s(t) is differentiable wherever F′(t) is continuous, and
ds/dt = ‖F′(t)‖ = √(x′(t)² + y′(t)² + z′(t)²).
Because s is strictly increasing as a function of t, we can, at least in theory, solve for t in terms of s, giving the inverse function t(s) (see Figure 12.7). Now define
G(s) = F(t(s)) = x(t(s))i + y(t(s))j + z(t(s))k
for 0 ≤ s ≤ L. Then G is a position function for the same curve as F. As t varies from a to b, F(t) sweeps out the same curve as G(s) does as s varies from 0 to L. However, G has the advantage that the tangent vector G′ always has length 1. To see this, use the chain rule to compute
G′(s) = (d/ds)F(t(s)) = F′(t)(dt/ds) = (1/(ds/dt))F′(t) = F′(t)/‖F′(t)‖,
and this vector has length 1.
FIGURE 12.6 Length function along a curve.

FIGURE 12.7 A length function has an inverse.
EXAMPLE 12.3
Consider again the helix having position function
F(t) = cos(t)i + sin(t)j + (1/3)tk
for −4π ≤ t ≤ 4π. We have already calculated ‖F′(t)‖ = (1/3)√10. Therefore the length function along this curve is
s(t) = ∫ from −4π to t of (1/3)√10 dξ = (1/3)√10 (t + 4π).
Solve for the inverse function to write
t = t(s) = (3/√10)s − 4π.
Substitute this into the position vector to define
G(s) = F(t(s)) = F((3/√10)s − 4π)
= cos((3/√10)s − 4π)i + sin((3/√10)s − 4π)j + (1/3)((3/√10)s − 4π)k
= cos((3/√10)s)i + sin((3/√10)s)j + ((1/√10)s − (4π/3))k.
Now compute
G′(s) = −(3/√10)sin((3/√10)s)i + (3/√10)cos((3/√10)s)j + (1/√10)k.
This is a tangent vector to the helix, and it has length 1.
Assuming that the derivatives exist:
1. (F(t) + G(t))′ = F′(t) + G′(t).
2. (f(t)F(t))′ = f′(t)F(t) + f(t)F′(t) if f is a differentiable real-valued function.
3. (F(t) · G(t))′ = F′(t) · G(t) + F(t) · G′(t).
4. (F(t) × G(t))′ = F′(t) × G(t) + F(t) × G′(t).
5. (F(f(t)))′ = f′(t)F′(f(t)).
Items (2), (3) and (4) are all "product rules". Rule (2) is for the derivative of a product of a scalar function with a vector function; (3) is for the derivative of a dot product; and (4) is for the derivative of a cross product. In each case the rule has the same form as the familiar calculus formula for the derivative of a product of two functions. However, in (4), order is important, since F × G = −G × F. Rule (5) is a vector version of the chain rule.
In the next section we will use vector functions to develop the concepts of velocity and acceleration, which we will apply to the geometry of curves in 3-space.
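Rule (3), the product rule for dot products, is easy to verify numerically with a centered difference; the two vector functions below are arbitrary smooth choices, not taken from the text.

```python
import math

# Check rule (3): d/dt [F(t) . G(t)] = F'(t) . G(t) + F(t) . G'(t).
def F(t):  return (t, math.sin(t), t**2)
def Fp(t): return (1.0, math.cos(t), 2*t)       # F'(t), by hand
def G(t):  return (math.exp(-t), t**3, 1.0)
def Gp(t): return (-math.exp(-t), 3*t**2, 0.0)  # G'(t), by hand

def dot(u, v):
    return sum(a*b for a, b in zip(u, v))

t, h = 0.7, 1e-5
numeric = (dot(F(t + h), G(t + h)) - dot(F(t - h), G(t - h))) / (2*h)
by_rule = dot(Fp(t), G(t)) + dot(F(t), Gp(t))
print(abs(numeric - by_rule))    # agreement to roughly O(h^2)
```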
SECTION 12.1
PROBLEMS

In each of Problems 1 through 8, compute the requested derivative (a) by carrying out the vector operation and differentiating the resulting vector or scalar, and (b) by using one of the differentiation rules (1) through (5) stated at the end of this section.

1. F(t) = i + 3t²j + 2tk, f(t) = 4cos(3t); d/dt [f(t)F(t)]
2. F(t) = ti − 3t²k, G(t) = i + cos(t)k; d/dt [F(t) · G(t)]
3. F(t) = ti + j + 4k, G(t) = i − cos(t)j + tk; d/dt [F(t) × G(t)]
4. F(t) = sinh(t)j − tk, G(t) = ti + t²j − t²k; d/dt [F(t) × G(t)]
5. F(t) = ti − cosh(t)j + e^t k, f(t) = 1 − 2t³; d/dt [f(t)F(t)]
6. F(t) = ti − tj + t²k, G(t) = sin(t)i − 4j + t³k; d/dt [F(t) · G(t)]
7. F(t) = −9i + t²j + t²k, G(t) = e^t i; d/dt [F(t) × G(t)]
8. F(t) = −4cos(t)k, G(t) = −t²i + 4sin(t)k; d/dt [F(t) · G(t)]

In each of Problems 9, 10, and 11, (a) write the position vector and tangent vector for the curve whose parametric equations are given, (b) find a length function s(t) for the curve, (c) write the position vector as a function of s, and (d) verify that the resulting position vector has a derivative of length 1.

9. x = sin(t), y = cos(t), z = 45t; 0 ≤ t ≤ 2π
10. x = y = z = t³; −1 ≤ t ≤ 1
11. x = 2t², y = 3t², z = 4t²; 1 ≤ t ≤ 3

12. Let F(t) = x(t)i + y(t)j + z(t)k. Suppose x, y and z are differentiable functions of t. Think of F(t) as the position function of a particle moving along a curve in 3-space. Suppose F × F′ = O. Prove that the particle always moves in the same direction.

12.2
12.2 Velocity, Acceleration, Curvature and Torsion
Imagine a particle moving along a path having position vector F(t) = x(t)i + y(t)j + z(t)k as t varies from a to b. We want to relate F to the dynamics of the particle. For calculations we will be doing, we assume that x, y and z are twice differentiable. We will also make use of the distance function along the curve,

s(t) = ∫ from a to t of ‖F′(τ)‖ dτ
CHAPTER 12
Vector Differential Calculus
DEFINITION 12.2
Velocity, Speed
1. The velocity v(t) of the particle at time t is defined to be v(t) = F′(t).
2. The speed v(t) of the particle at time t is the magnitude ‖v(t)‖ of the velocity.

Velocity is therefore a vector, having magnitude and direction. If v(t) is not the zero vector, then the velocity is tangent to the curve of motion of the particle. Thus, at any instant the particle may be thought of as moving in the direction of the tangent to the path of motion. The speed at time t is a real-valued function, given by

v(t) = ‖v(t)‖ = ‖F′(t)‖ = ds/dt
This is consistent with the familiar idea of speed as the rate of change of distance (along the path of motion) with respect to time.
DEFINITION 12.3
Acceleration
The acceleration a(t) of the particle is the rate of change of the velocity with respect to time: a(t) = v′(t).

Alternatively, a(t) = F″(t). As with velocity, acceleration is a vector.
EXAMPLE 12.4
Let F(t) = sin(t)i + 2e^(−t)j + t²k. The path of the particle is the curve whose parametric equations are

x = sin(t), y = 2e^(−t), z = t²

Part of the graph of this curve is shown in Figure 12.8. The velocity and acceleration are, respectively,

v(t) = cos(t)i − 2e^(−t)j + 2tk

and

a(t) = −sin(t)i + 2e^(−t)j + 2k
[FIGURE 12.8: Part of the graph of x = sin(t), y = 2e^(−t), z = t².]
The speed of the particle is

v(t) = √(cos²(t) + 4e^(−2t) + 4t²)
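The velocity, acceleration and speed found in Example 12.4 can be confirmed by finite differences; the helper functions below are our own, not from the text:

```python
import math

# Numerical check of Example 12.4: F(t) = sin(t)i + 2e^{-t}j + t^2 k.
def F(t): return (math.sin(t), 2*math.exp(-t), t**2)

def deriv(f, t, h=1e-6):
    # central-difference derivative of a vector-valued function
    return tuple((p - m)/(2*h) for p, m in zip(f(t+h), f(t-h)))

t = 1.3
v = deriv(F, t)                                  # velocity ~ F'(t)
a = deriv(lambda u: deriv(F, u), t, h=1e-4)      # acceleration ~ F''(t)
speed = math.sqrt(sum(c*c for c in v))

# closed forms from the example
v_exact = (math.cos(t), -2*math.exp(-t), 2*t)
a_exact = (-math.sin(t), 2*math.exp(-t), 2.0)
speed_exact = math.sqrt(math.cos(t)**2 + 4*math.exp(-2*t) + 4*t**2)
assert all(abs(x - y) < 1e-4 for x, y in zip(v, v_exact))
assert all(abs(x - y) < 1e-3 for x, y in zip(a, a_exact))
assert abs(speed - speed_exact) < 1e-4
```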
If F′(t) is not the zero vector, then this vector is tangent to the curve at (x(t), y(t), z(t)). We may obtain a unit tangent by dividing this vector by its length:

T(t) = (1/‖F′(t)‖) F′(t) = (1/(ds/dt)) F′(t)

Equivalently,

T(t) = (1/‖v(t)‖) v(t) = (1/v(t)) v(t)

the velocity divided by the speed. If arc length s along the path of motion is used as the parameter in the position function, then we have seen that ‖F′(s)‖ = 1 automatically, so in this case the speed is identically 1 and this unit tangent is just the velocity vector. We will use this unit tangent to define a function that quantifies the "amount of bending" of a curve at a point.
DEFINITION 12.4
Curvature
The curvature κ of a curve is the magnitude of the rate of change of the unit tangent with respect to arc length along the curve:

κ(s) = ‖dT/ds‖

The definition is motivated by the intuition (Figure 12.9) that the more a curve bends at a point, the faster the tangent vector is changing there. If the unit tangent vector has been written using s as parameter, then computing T′(s) is straightforward. More often, however, the unit tangent is parametrized by some other variable, and then the derivative defining the curvature must be computed by the chain rule:

κ(t) = ‖(dT/dt)(dt/ds)‖
[FIGURE 12.9: Increasing curvature corresponds to an increasing rate of change of the tangent vector along the curve.]
This gives the curvature as a function of the parameter used in the position vector. Since

dt/ds = 1/(ds/dt) = 1/‖F′(t)‖

we often write

κ(t) = (1/‖F′(t)‖) ‖T′(t)‖    (12.3)
EXAMPLE 12.5
Consider a straight line having parametric equations

x = a + bt, y = c + dt, z = e + ht

in which a, b, c, d, e and h are constants. The position vector of this line is

F(t) = (a + bt)i + (c + dt)j + (e + ht)k

We will compute the curvature using equation (12.3). First,

F′(t) = bi + dj + hk

so

‖F′(t)‖ = √(b² + d² + h²)

The unit tangent vector is

T(t) = (1/‖F′(t)‖) F′(t) = (1/√(b² + d² + h²))(bi + dj + hk)
12.2 Velocity, Acceleration, Curvature and Torsion
485
This is a constant vector, so T′(t) = O and the curvature is

κ(t) = (1/‖F′(t)‖) ‖T′(t)‖ = 0
This is consistent with our intuition that a straight line should have curvature zero.
EXAMPLE 12.6
Let C be the circle of radius 4 about the origin in the plane y = 3. Using the polar angle θ as parameter, this curve has parametric equations

x = 4 cos(θ), y = 3, z = 4 sin(θ)

for 0 ≤ θ ≤ 2π. The circle has position vector

F(θ) = 4 cos(θ)i + 3j + 4 sin(θ)k

Then

F′(θ) = −4 sin(θ)i + 4 cos(θ)k

so ‖F′(θ)‖ = 4. The unit tangent is

T(θ) = (1/4)(−4 sin(θ)i + 4 cos(θ)k) = −sin(θ)i + cos(θ)k

Then

T′(θ) = −cos(θ)i − sin(θ)k

The curvature is

κ(θ) = (1/‖F′(θ)‖) ‖T′(θ)‖ = 1/4

The curvature of this circle is constant, again consistent with intuition. One can show that a circle of radius r has curvature 1/r. Not only does a circle have constant curvature, but this curvature also decreases as the radius increases, as we should expect.
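A short computation (ours, not the text's) confirms κ = 1/r numerically for the radius-4 circle of Example 12.6, using κ(t) = ‖T′(t)‖/‖F′(t)‖:

```python
import math

# Numerical check that a circle of radius r has curvature 1/r, using
# kappa = |T'| / |F'| from the text. Radius-4 circle of Example 12.6.
r = 4.0
def F(th): return (r*math.cos(th), 3.0, r*math.sin(th))

def deriv(f, t, h=1e-6):
    return tuple((p - m)/(2*h) for p, m in zip(f(t+h), f(t-h)))

def norm(v): return math.sqrt(sum(c*c for c in v))

def unit_tangent(t):
    d = deriv(F, t)
    n = norm(d)
    return tuple(c/n for c in d)

th = 0.9
kappa = norm(deriv(unit_tangent, th, h=1e-4)) / norm(deriv(F, th))
assert abs(kappa - 1.0/r) < 1e-4
```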
EXAMPLE 12.7
Let C have parametric representation

x = cos(t) + t sin(t), y = sin(t) − t cos(t), z = t²

for t > 0. Figure 12.10 shows part of the graph of C. We will compute the curvature. We can write the position vector

F(t) = (cos(t) + t sin(t))i + (sin(t) − t cos(t))j + t²k
[FIGURE 12.10: Part of the graph of x = cos(t) + t sin(t), y = sin(t) − t cos(t), z = t².]
A tangent vector is given by

F′(t) = t cos(t)i + t sin(t)j + 2tk

and

‖F′(t)‖ = √5 t

Next, the unit tangent vector is

T(t) = (1/‖F′(t)‖) F′(t) = (1/√5)(cos(t)i + sin(t)j + 2k)

Compute

T′(t) = (1/√5)(−sin(t)i + cos(t)j)

We can now use equation (12.3) to compute

κ(t) = (1/‖F′(t)‖) ‖T′(t)‖ = (1/(√5 t))(1/√5)√(sin²(t) + cos²(t)) = 1/(5t)

for t > 0. We will now introduce another vector of interest in studying motion along a curve.
DEFINITION 12.5
Unit Normal Vector
Using arc length s as parameter on the curve, the unit normal vector N(s) is defined by

N(s) = (1/κ(s)) T′(s)

provided that κ(s) ≠ 0.
The name given to this vector is motivated by two observations. First, N(s) is a unit vector. Since κ(s) = ‖T′(s)‖, then

‖N(s)‖ = (1/‖T′(s)‖) ‖T′(s)‖ = 1

Second, N(s) is orthogonal to the unit tangent vector. To see this, begin with the fact that T(s) is a unit tangent, hence ‖T(s)‖ = 1. Then

‖T(s)‖² = T(s) · T(s) = 1

Differentiate this equation to get

T′(s) · T(s) + T(s) · T′(s) = 2T(s) · T′(s) = 0

hence T(s) · T′(s) = 0, which means that T(s) is orthogonal to T′(s). But N(s) is a positive scalar multiple of T′(s), and so is in the same direction as T′(s). We conclude that N(s) is orthogonal to T(s). At any point of a curve with differentiable coordinate functions (not all vanishing for the same parameter value), we may now place a tangent vector to the curve, and a normal vector that is perpendicular to the tangent vector (Figure 12.11).
[FIGURE 12.11: Tangent and normal vector to a curve at a point.]
EXAMPLE 12.8
Consider again the curve with position function

F(t) = (cos(t) + t sin(t))i + (sin(t) − t cos(t))j + t²k

for t > 0. In Example 12.7 we computed the unit tangent and the curvature as functions of t. We will write the position vector as a function of arc length, and compute the unit tangent T(s) and the unit normal N(s).
First, using ‖F′(τ)‖ = √5 τ from Example 12.7,

s(t) = ∫ from 0 to t of ‖F′(τ)‖ dτ = ∫ from 0 to t of √5 τ dτ = (√5/2)t²

Solve for t as a function of s:

t = (√2/5^(1/4))√s = α√s

in which α = √2/5^(1/4). In terms of s, the position vector is

G(s) = F(t(s)) = (cos(α√s) + α√s sin(α√s))i + (sin(α√s) − α√s cos(α√s))j + α²s k

The unit tangent is

T(s) = G′(s) = (α²/2)cos(α√s)i + (α²/2)sin(α√s)j + α²k
     = (1/√5)cos(α√s)i + (1/√5)sin(α√s)j + (2/√5)k

This vector does indeed have length 1:

‖T(s)‖ = √(1/5 + 4/5) = 1

Now

T′(s) = (α/(2√5 √s))(−sin(α√s)i + cos(α√s)j)

so the curvature is

κ(s) = ‖T′(s)‖ = α/(2√5 √s) = 1/(√2 · 5^(3/4) √s)

for s > 0. Since s = √5 t²/2, then in terms of t we have κ = 1/(5t), consistent with Example 12.7. Now compute the unit normal vector

N(s) = (1/κ(s)) T′(s) = −sin(α√s)i + cos(α√s)j

This is a unit vector orthogonal to T(s).
12.2.1 Tangential and Normal Components of Acceleration
At any point on the trajectory of a particle, the tangent and normal vectors are orthogonal. We will now show how to write the acceleration at a point as a linear combination of the tangent and normal vectors there:

a = a_T T + a_N N

This is illustrated in Figure 12.12.
[FIGURE 12.12: The decomposition a = a_T T + a_N N of the acceleration at points of a curve.]
THEOREM 12.1
a = (dv/dt)T + v²κN

Thus a_T = dv/dt and a_N = v²κ. The tangential component of the acceleration is the derivative of the speed, while the normal component is the curvature at the point, times the square of the speed there.

Proof First observe that

T(t) = (1/‖F′(t)‖) F′(t) = (1/v)v

Therefore v = vT. Then

a = (d/dt)(vT) = (dv/dt)T + vT′(t)
  = (dv/dt)T + v(dT/ds)(ds/dt)
  = (dv/dt)T + v²T′(s)
  = (dv/dt)T + v²κN
Here is one use of this decomposition of the acceleration. Since T and N are orthogonal unit vectors,

‖a‖² = a · a = (a_T T + a_N N) · (a_T T + a_N N)
     = a_T² T · T + 2a_T a_N T · N + a_N² N · N
     = a_T² + a_N²

From this, whenever two of ‖a‖, a_T and a_N are known, we can compute the third quantity.
EXAMPLE 12.9
Return again to the curve C having position function

F(t) = (cos(t) + t sin(t))i + (sin(t) − t cos(t))j + t²k

for t > 0. We will compute the tangential and normal components of the acceleration. First,

v(t) = F′(t) = t cos(t)i + t sin(t)j + 2tk

so the speed is

v(t) = ‖F′(t)‖ = √5 t

The tangential component of the acceleration is therefore

a_T = dv/dt = √5

a constant for this curve. The acceleration vector is

a = v′ = (cos(t) − t sin(t))i + (sin(t) + t cos(t))j + 2k

and a routine calculation gives ‖a‖ = √(5 + t²). Then

a_N² = ‖a‖² − a_T² = 5 + t² − 5 = t²

Since t > 0, the normal component of acceleration is a_N = t. The acceleration may therefore be written as

a = √5 T + tN

If we know the normal component a_N and the speed v, it is easy to compute the curvature, since a_N = t = κv² = 5t²κ implies that

κ = 1/(5t)

as we computed in Example 12.7 directly from the tangent vector. Now the unit tangent and normal vectors are easy to compute in terms of t. First,

T(t) = (1/v)v = (1/√5)(cos(t)i + sin(t)j + 2k)

This is usually easy to compute, since v = F′(t) is a straightforward calculation. But in addition, we now have the unit normal vector (as a function of t)

N(t) = (1/κ) T′(s) = (1/κ)(dT/dt)(dt/ds) = (1/(κv)) T′(t)
     = (5t/(√5 t))(1/√5)(−sin(t)i + cos(t)j) = −sin(t)i + cos(t)j
This calculation does not require the explicit computation of s(t) and its inverse function.
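The values a_T = √5 and a_N = t from Example 12.9 can be confirmed numerically, using a · T for the tangential component and ‖a‖² = a_T² + a_N² for the normal one (the difference-quotient helpers are ours):

```python
import math

# Check of Example 12.9: for F(t) = (cos t + t sin t, sin t - t cos t, t^2),
# the tangential component of acceleration is sqrt(5) and the normal one is t.
def F(t):
    return (math.cos(t) + t*math.sin(t), math.sin(t) - t*math.cos(t), t**2)

def deriv(f, t, h=1e-5):
    return tuple((p - m)/(2*h) for p, m in zip(f(t+h), f(t-h)))

def norm(v): return math.sqrt(sum(c*c for c in v))
def dot(a, b): return sum(x*y for x, y in zip(a, b))

t = 2.0
v = deriv(F, t)
a = deriv(lambda u: deriv(F, u), t, h=1e-4)
speed = norm(v)
aT = dot(a, v) / speed                 # component of a along the unit tangent
aN = math.sqrt(norm(a)**2 - aT**2)     # since |a|^2 = aT^2 + aN^2
assert abs(aT - math.sqrt(5)) < 1e-3
assert abs(aN - t) < 1e-3
```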
12.2.2 Curvature as a Function of t
Equation (12.3) gives the curvature in terms of the parameter t used for the position function:

κ(t) = (1/‖F′(t)‖) ‖T′(t)‖

This is a handy formula because it does not require introduction of the distance function s(t). We will now derive another expression for the curvature that is sometimes useful in calculating κ directly from the position function.
THEOREM 12.2
Let F be the position function of a curve, and suppose that the components of F are twice differentiable functions. Then

κ = ‖F′ × F″‖ / ‖F′‖³

This states that the curvature is the magnitude of the cross product of the first and second derivatives of F, divided by the cube of the length of F′.

Proof First write

a = a_T T + κ(ds/dt)²N

Take the cross product of this equation with the unit tangent vector:

T × a = a_T T × T + κ(ds/dt)² T × N = κ(ds/dt)² T × N

since the cross product of any vector with itself is the zero vector. Then

‖T × a‖ = κ(ds/dt)² ‖T × N‖ = κ(ds/dt)² ‖T‖ ‖N‖ sin(θ)

where θ is the angle between T and N. But these are orthogonal unit vectors, so θ = π/2 and ‖T‖ = ‖N‖ = 1. Therefore

‖T × a‖ = κ(ds/dt)²

Then

κ = ‖T × a‖ / (ds/dt)²

But T = F′/‖F′‖, a = F″, and ds/dt = ‖F′‖, so

κ = (1/‖F′‖²) ‖(1/‖F′‖)F′ × F″‖ = ‖F′ × F″‖ / ‖F′‖³
EXAMPLE 12.10
Let C have position function

F(t) = t²i − t³j + tk

We want the curvature of C. Compute

F′(t) = 2ti − 3t²j + k, F″(t) = 2i − 6tj

and

F′ × F″ = det
| i     j      k |
| 2t   −3t²   1 |
| 2    −6t    0 |
= 6ti + 2j − 6t²k

The curvature is

κ(t) = ‖6ti + 2j − 6t²k‖ / ‖2ti − 3t²j + k‖³
     = √(36t² + 4 + 36t⁴) / (4t² + 9t⁴ + 1)^(3/2)
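The cross-product formula of Theorem 12.2 is easy to implement directly; here it is applied to the curve of Example 12.10, with F′ and F″ entered in closed form (the helper functions are ours):

```python
import math

# Curvature of Example 12.10 via kappa = |F' x F''| / |F'|^3,
# for F(t) = t^2 i - t^3 j + t k.
def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def norm(v): return math.sqrt(sum(c*c for c in v))

def kappa(t):
    Fp  = (2*t, -3*t**2, 1.0)   # F'(t)
    Fpp = (2.0, -6*t, 0.0)      # F''(t)
    return norm(cross(Fp, Fpp)) / norm(Fp)**3

t = 1.5
expected = math.sqrt(36*t**2 + 4 + 36*t**4) / (4*t**2 + 9*t**4 + 1)**1.5
assert abs(kappa(t) - expected) < 1e-12
```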
12.2.3 The Frenet Formulas
If T and N are the unit tangent and normal vectors, the vector B = T × N is also a unit vector, and is orthogonal to T and N. At any point on the curve where these three vectors are defined and nonzero, the triple T, N, B forms a right-handed rectangular coordinate system (as in Figure 12.13). We can in effect put an x, y, z coordinate system at any point P on C, with the positive x-axis along T, the positive y-axis along N, and the positive z-axis along B. Of course, this system twists and changes orientation in space as P moves along C (Figure 12.14). Since N = (1/κ)T′(s), then

dT/ds = κN

[FIGURE 12.13: The right-handed frame T, N, B at a point of a curve.]
[FIGURE 12.14: The frame T, N, B changes orientation as the point moves along C.]
Further, it can be shown that there is a scalar-valued function τ such that

dN/ds = −κT + τB

and

dB/ds = −τN

These three equations are called the Frenet formulas. The scalar quantity τ(s) is the torsion of C at (x(s), y(s), z(s)). If we look along C at the coordinate system formed at each point by T, N and B, the torsion measures how this system twists about the curve as the point moves along the curve.
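For the curve of Example 12.7, the closed forms for T and N derived earlier give a concrete frame to test; the short check below (ours, not the text's) confirms that T, N and B = T × N are mutually orthogonal unit vectors:

```python
import math

# The frame T, N, B for the curve of Example 12.7 (t > 0), built from the
# closed forms derived in the text, is orthonormal with B = T x N.
def frame(t):
    s5 = math.sqrt(5.0)
    T = (math.cos(t)/s5, math.sin(t)/s5, 2.0/s5)
    N = (-math.sin(t), math.cos(t), 0.0)
    B = (T[1]*N[2] - T[2]*N[1],     # B = T x N
         T[2]*N[0] - T[0]*N[2],
         T[0]*N[1] - T[1]*N[0])
    return T, N, B

def dot(a, b): return sum(x*y for x, y in zip(a, b))

T, N, B = frame(1.2)
for v in (T, N, B):
    assert abs(dot(v, v) - 1.0) < 1e-12      # unit vectors
assert abs(dot(T, N)) < 1e-12                # mutually orthogonal
assert abs(dot(T, B)) < 1e-12
assert abs(dot(N, B)) < 1e-12
```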
SECTION 12.2
PROBLEMS
In each of Problems 1 through 10, a position vector is given. Determine the velocity, speed, acceleration, tangential and normal components of the acceleration, the curvature, the unit tangent, unit normal and binormal vectors.
1. F = 3ti − 2j + t²k
2. F = t sin(t)i + t cos(t)j + k
3. F = 2ti − 2tj + tk
4. F = e^t sin(t)i − j + e^t cos(t)k
5. F = 3e^(−t)i + j − 2k
6. F = cos(t)i + tj + sin(t)k
7. F = 2 sinh(t)j − 2 cosh(t)k
8. F = ln(t)i − j + 2k
9. F = t²i + t²j + t²k
10. F = 3t cos(t)j − 3t sin(t)k
11. Suppose we are given the position vector of a curve and find that the unit tangent vector is a constant vector. Prove that the curve is a straight line.
12. It is easy to verify that the curvature of any straight line is zero. Suppose C is a curve with twice differentiable coordinate functions, and curvature zero. Does it follow that C is a straight line?

12.3 Vector Fields and Streamlines
We now turn to the analysis of vector functions of more than one variable.
DEFINITION 12.6
Vector Field
A vector field in 3-space is a 3-vector whose components are functions of three variables. A vector field in the plane is a 2-vector whose components are functions of two variables.
A vector field in 3-space has the appearance

G(x, y, z) = f(x, y, z)i + g(x, y, z)j + h(x, y, z)k

and, in the plane,

K(x, y) = f(x, y)i + g(x, y)j
The term "vector field" is geometrically motivated. At each point P for which a vector field G is defined, we can represent the vector G(P) as an arrow from P. It is often useful in working with a vector field G to draw arrows G(P) at points through the region where G is defined. This drawing is also referred to as a vector field (think of arrows growing at points). The variations in length and orientation of these arrows give some sense of the flow of the vector field, and its variations in strength, just as a direction field helps us visualize trajectories of a system X′ = F(X) of differential equations. Figures 12.15 and 12.16 show the vector fields G(x, y) = xyi + (x − y)j and H(x, y) = y cos(x)i + (x² − y²)j, respectively, in the plane. Figures 12.17, 12.18 and 12.19 show the vector fields F(x, y, z) = cos(x + y)i − xj + (x − z)k, Q(x, y, z) = −yi + zj + (x + y + z)k and M(x, y, z) = cos(x)i + e^(−x) sin(y)j + (z − y)k, respectively, in 3-space.
[FIGURE 12.15: Representation of the vector field G(x, y) = xyi + (x − y)j.]
[FIGURE 12.16: Representation of the vector field H(x, y) = y cos(x)i + (x² − y²)j.]
[FIGURE 12.17: Representation of the vector field F(x, y, z) = cos(x + y)i − xj + (x − z)k as arrows in planes z = constant.]
[FIGURE 12.18: Representation of the vector field Q(x, y, z) = −yi + zj + (x + y + z)k in planes z = constant.]
[FIGURE 12.19: The vector field M(x, y, z) = cos(x)i + e^(−x) sin(y)j + (z − y)k in planes z = constant.]
A vector field is continuous if each of its component functions is continuous. A partial derivative of a vector field is the vector field obtained by taking the partial derivative of each component function. For example, if

F(x, y, z) = cos(x + y)i − xj + (x − z)k

then

∂F/∂x = F_x = −sin(x + y)i − j + k,
∂F/∂y = F_y = −sin(x + y)i,

and

∂F/∂z = F_z = −k

If G(x, y) = xyi + (x − y)j, then

∂G/∂x = G_x = yi + j and ∂G/∂y = G_y = xi − j
Streamlines of a vector field F are curves with the property that, at each point (x, y, z), the vector F(x, y, z) is tangent to the curve through this point.
DEFINITION 12.7
Streamlines
Let F be a vector field, defined for all (x, y, z) in some region Ω of 3-space. Let 𝒞 be a set of curves with the property that, through each point P of Ω, there passes exactly one curve from 𝒞. The curves in 𝒞 are streamlines of F if, at each (x, y, z) in Ω, the vector F(x, y, z) is tangent to the curve in 𝒞 passing through (x, y, z).
Streamlines are also called flow lines or lines of force, depending on context. If F is the velocity field for a fluid, the streamlines are often called flow lines (paths of particles in the
fluid). If F is a magnetic field the streamlines are called lines of force. If you put iron filings on a piece of cardboard and then hold a magnet underneath, the filings will be aligned by the magnet along the lines of force of the field. Given a vector field F, we would like to find the streamlines. This is the problem of constructing a curve through each point of a region, given the tangent to the curve at each point. Figure 12.20 shows typical streamlines of a vector field, together with some of the tangent vectors. We want to determine the curves from the tangents.

[FIGURE 12.20: Streamlines of a vector field.]
To solve this problem, suppose C is a streamline of F = fi + gj + hk. Let C have parametric equations

x = x(ξ), y = y(ξ), z = z(ξ)

The position vector for this curve is

R(ξ) = x(ξ)i + y(ξ)j + z(ξ)k

Now R′(ξ) = x′(ξ)i + y′(ξ)j + z′(ξ)k is tangent to C at (x, y, z). But for C to be a streamline of F, F(x, y, z) is also tangent to C at this point, hence must be parallel to R′. These vectors must therefore be scalar multiples of each other. For some scalar t (which may depend on ξ),

R′ = tF(x, y, z)

But then

(dx/dξ)i + (dy/dξ)j + (dz/dξ)k = tf(x, y, z)i + tg(x, y, z)j + th(x, y, z)k

This implies that

dx/dξ = tf, dy/dξ = tg, dz/dξ = th    (12.4)

Since f, g and h are given functions, these equations constitute a system of differential equations for the coordinate functions of the streamlines. If f, g and h are nonzero, then t can be eliminated to write the system in differential form as

dx/f = dy/g = dz/h    (12.5)
EXAMPLE 12.11
We will find the streamlines of the vector field F = x²i + 2yj − k. The system (12.4) is

dx/dξ = tx², dy/dξ = 2ty, dz/dξ = −t

If x and y are not zero, this can be written in the form of equations (12.5):

dx/x² = dy/2y = dz/(−1)

These equations can be solved in pairs. To begin, integrate

dx/x² = −dz

to get

−1/x = −z + c

in which c is an arbitrary constant. Next, integrate

dy/2y = −dz

to get

(1/2)ln|y| = −z + k

It is convenient to express two of the variables in terms of the third. If we solve for x and y in terms of z, we get

x = 1/(z − c), y = ae^(−2z)

in which a = e^(2k) is an arbitrary (positive) constant. This gives us parametric equations of the streamlines, with z as parameter:

x = 1/(z − c), y = ae^(−2z), z = z

To find the streamline through a particular point, we must choose c and a appropriately. For example, suppose we want the streamline through (−1, 6, 2). Then z = 2 and we need to choose c and a so that

−1 = 1/(2 − c) and 6 = ae^(−4)

Then c = 3 and a = 6e⁴, so the streamline through (−1, 6, 2) has parametric equations

x = 1/(z − 3), y = 6e^(4−2z), z = z

A graph of this streamline is shown in Figure 12.21.
[FIGURE 12.21: Part of the graph of x = 1/(z − 3), y = 6e^(4−2z), z = z.]
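The streamline of Example 12.11 can be double-checked by integrating the system numerically. With z as the parameter, equations (12.5) give dx/dz = −x² and dy/dz = −2y; a Runge–Kutta march (our own sketch, not from the text) reproduces the closed form:

```python
import math

# The streamline of F = x^2 i + 2y j - k through (-1, 6, 2) found in
# Example 12.11 is x = 1/(z-3), y = 6 e^{4-2z}. Integrate dx/dz = -x^2,
# dy/dz = -2y with RK4 and compare against the closed form.
def rhs(z, x, y):
    return -x*x, -2.0*y

def rk4_step(z, x, y, h):
    k1x, k1y = rhs(z, x, y)
    k2x, k2y = rhs(z + h/2, x + h/2*k1x, y + h/2*k1y)
    k3x, k3y = rhs(z + h/2, x + h/2*k2x, y + h/2*k2y)
    k4x, k4y = rhs(z + h, x + h*k3x, y + h*k3y)
    return (x + h*(k1x + 2*k2x + 2*k3x + k4x)/6,
            y + h*(k1y + 2*k2y + 2*k3y + k4y)/6)

x, y, z = -1.0, 6.0, 2.0
h = -0.001                       # march z from 2 down to 1
for _ in range(1000):
    x, y = rk4_step(z, x, y, h)
    z += h

assert abs(z - 1.0) < 1e-9
assert abs(x - 1.0/(z - 3.0)) < 1e-8                # exact: x = 1/(z-3)
assert abs(y - 6.0*math.exp(4.0 - 2.0*z)) < 1e-6    # exact: y = 6 e^{4-2z}
```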
EXAMPLE 12.12
Suppose we want the streamlines of F(x, y, z) = −yj + zk. Here the i component is zero, so we must begin with equations (12.4), not (12.5). We have

dx/dξ = 0, dy/dξ = −ty, dz/dξ = tz

The first equation implies that x = constant. This simply means that all the streamlines are in planes parallel to the y, z plane. The other two equations yield

dy/(−y) = dz/z

and an integration gives

−ln|y| + c = ln|z|

Then ln|zy| = c, implying that zy = k, in which k is constant. The streamlines are given by the equations

x = c, z = k/y

in which c and k are arbitrary constants and y is the parameter. For example, to find the streamline through (−4, 1, 7), choose c = −4 and k so that z = k/y passes through y = 1, z = 7. We need

7 = k/1 = k

and the streamline has equations

x = −4, z = 7/y

The streamline is a hyperbola in the plane x = −4 (Figure 12.22).
[FIGURE 12.22: Part of the graph of x = −4, z = 7/y.]
SECTION 12.3
PROBLEMS
In each of Problems 1 through 5, compute the two first partial derivatives of the vector field and make a diagram in which each indicated vector is drawn as an arrow from the point at which the vector is evaluated.
1. G(x, y) = 3xi − 4xyj; G(0, 1), G(1, 3), G(1, 4), G(−1, −2), G(−3, 2)
2. G(x, y) = e^x i − 2x²yj; G(0, 0), G(0, 1), G(2, −3), G(−1, −3)
3. G(x, y) = 2xyi + cos(x)j; G(π/2, 0), G(0, 0), G(−1, 1), G(π, −3), G(−π/4, −2)
4. G(x, y) = sin(2xy)i + (x² + y)j; G(−π/2, 0), G(0, 2), G(π/4, 4), G(1, 1), G(−2, 1)
5. G(x, y) = 3x²i + (x − 2y)j; G(1, −1), G(0, 2), G(−3, 2), G(−2, −2), G(2, 5)
In each of Problems 6 through 10, compute the three first partial derivatives of the vector field.
6. F = e^(xy)i − 2x²yj + cosh(z + y)k
7. F = 4z² cos(x)i − x³yzj + x³yk
8. F = 3xy³i + ln(x + y + z)j + cosh(xyz)k
9. F = −z⁴ sin(xy)i + 3xy⁴zj + cosh(z − x)k
10. F = (14x − 2y)i + (x² − y² − z²)j + 5xyk
In each of Problems 11 through 16, find the streamlines of the vector field, then find the particular streamline through the given point.
11. F = i − y²j + zk; (2, 1, 1)
12. F = i − 2j + k; (0, 1, 1)
13. F = (1/x)i + e^x j − k; (2, 0, 4)
14. F = cos(y)i + sin(x)j; (π/2, 0, −4)
15. F = 2e^z j − cos(y)k; (3, π/4, 0)
16. F = 3x²i − yj + z³k; (2, 1, 6)
17. Construct a vector field whose streamlines are straight lines.
18. Construct a vector field in the x, y plane whose streamlines are circles about the origin.
12.4 The Gradient Field and Directional Derivatives
Let φ(x, y, z) be a real-valued function of three variables. In the context of vectors, such a function is called a scalar field. We will define an important vector field manufactured from φ.
DEFINITION 12.8
Gradient
The gradient of a scalar field φ is the vector field ∇φ given by

∇φ = (∂φ/∂x)i + (∂φ/∂y)j + (∂φ/∂z)k

wherever these partial derivatives are defined.
The symbol ∇φ is read "del phi", and ∇ is called the del operator. It operates on a scalar field to produce a vector field. For example, if φ(x, y, z) = x²y cos(yz), then

∇φ = 2xy cos(yz)i + (x² cos(yz) − x²yz sin(yz))j − x²y² sin(yz)k

The gradient field evaluated at a point P is denoted ∇φ(P). For the gradient just computed,

∇φ(1, −1, 3) = −2 cos(3)i + (cos(3) − 3 sin(3))j + sin(3)k

If φ is a function of just x and y, then ∇φ is a vector in the x, y plane. For example, if φ(x, y) = (x − y)cos(y), then

∇φ(x, y) = cos(y)i + (−cos(y) − (x − y)sin(y))j

At (2, π) this gradient is

∇φ(2, π) = −i + j

The gradient has the obvious properties

∇(φ + ψ) = ∇φ + ∇ψ

and, if c is a number, then

∇(cφ) = c∇φ

We will now define the directional derivative, and relate this to the gradient. Suppose φ(x, y, z) is a scalar field. Let u = ai + bj + ck be a unit vector (length 1). Let P₀ = (x₀, y₀, z₀). Represent u as an arrow from P₀, as in Figure 12.23. We want to define a quantity that measures the rate of change of φ(x, y, z) as (x, y, z) varies from P₀, in the direction of u. To do this, notice that, if t > 0, then the point P: (x₀ + at, y₀ + bt, z₀ + ct) is on the line through P₀ in the direction of u. Further, the distance from P₀ to P along this direction is exactly t, because the vector from P₀ to P is

(x₀ + at − x₀)i + (y₀ + bt − y₀)j + (z₀ + ct − z₀)k
[FIGURE 12.23: The point (x₀ + at, y₀ + bt, z₀ + ct), at distance t > 0 from P₀: (x₀, y₀, z₀) along u.]
and this is just tu. The derivative

(d/dt) φ(x₀ + at, y₀ + bt, z₀ + ct)

is the rate of change of φ(x₀ + at, y₀ + bt, z₀ + ct) with respect to this distance t, and

[(d/dt) φ(x₀ + at, y₀ + bt, z₀ + ct)] evaluated at t = 0

is this rate of change evaluated at P₀. This derivative gives the rate of change of φ(x, y, z) at P₀ in the direction of u. We will summarize this discussion in the following definition.

DEFINITION 12.9
Directional Derivative
The directional derivative of a scalar field at P0 in the direction of the unit vector u is denoted Du P0 , and is given by
d x + at y + bt z + ct Du P0 = dt t=0 We usually compute a directional derivative using the following. THEOREM 12.3
If φ is a differentiable function of two or three variables, and u is a constant unit vector, then

D_u φ(P₀) = ∇φ(P₀) · u

Proof Let u = ai + bj + ck. By the chain rule,

(d/dt) φ(x₀ + at, y₀ + bt, z₀ + ct) = a ∂φ/∂x + b ∂φ/∂y + c ∂φ/∂z

Since (x₀ + at, y₀ + bt, z₀ + ct) = (x₀, y₀, z₀) when t = 0, then

D_u φ(P₀) = (∂φ/∂x)(P₀)a + (∂φ/∂y)(P₀)b + (∂φ/∂z)(P₀)c = ∇φ(P₀) · u
EXAMPLE 12.13
Let φ(x, y, z) = x²y − xe^z, P₀ = (2, −1, π) and u = (1/√6)(i − 2j + k). Then the rate of change of φ(x, y, z) at P₀ in the direction of u is

D_u φ(2, −1, π) = ∇φ(2, −1, π) · u
= φ_x(2, −1, π)(1/√6) + φ_y(2, −1, π)(−2/√6) + φ_z(2, −1, π)(1/√6)
= (1/√6)[(2xy − e^z) − 2x² + (−xe^z)] evaluated at (2, −1, π)
= (1/√6)(−4 − e^π − 8 − 2e^π) = −(3/√6)(4 + e^π)
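Example 12.13 can be verified two ways at once (a sketch of ours, not from the text): once from ∇φ(P₀) · u and once from a difference quotient of φ along u; the two must agree, and both must equal −(3/√6)(4 + e^π):

```python
import math

# Check of Example 12.13: phi = x^2 y - x e^z at P0 = (2, -1, pi),
# u = (1/sqrt 6)(i - 2j + k). The directional derivative grad(phi)(P0) . u
# should match a finite difference of phi along u.
def phi(x, y, z): return x*x*y - x*math.exp(z)

P0 = (2.0, -1.0, math.pi)
s6 = math.sqrt(6.0)
u = (1/s6, -2/s6, 1/s6)

grad = (2*P0[0]*P0[1] - math.exp(P0[2]),   # phi_x = 2xy - e^z
        P0[0]**2,                          # phi_y = x^2
        -P0[0]*math.exp(P0[2]))            # phi_z = -x e^z
D_u = sum(g*c for g, c in zip(grad, u))

h = 1e-6                                   # finite difference along u
fwd = phi(*(p + h*c for p, c in zip(P0, u)))
bwd = phi(*(p - h*c for p, c in zip(P0, u)))
assert abs(D_u - (fwd - bwd)/(2*h)) < 1e-4
assert abs(D_u - (-3/s6)*(4 + math.exp(math.pi))) < 1e-10
```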
In working with directional derivatives, care must be taken that the direction is given by a unit vector. If a vector w of length other than 1 is used to specify the direction, then use the unit vector w/‖w‖ in computing the directional derivative. Of course, w and w/‖w‖ have the same direction. A unit vector is used with directional derivatives so that the vector specifies only direction, without contributing a factor of magnitude. Suppose now that φ(x, y, z) is defined at least for all points within some sphere about P₀. Imagine standing at P₀ and looking in various directions. We may see φ(x, y, z) increasing in some, decreasing in others, perhaps remaining constant in some directions. In what direction does φ(x, y, z) have its greatest, or least, rate of increase from P₀? We will now show that the gradient vector ∇φ(P₀) points in the direction of maximum rate of increase at P₀, and −∇φ(P₀) in the direction of minimum rate of increase.

THEOREM 12.4
Let φ and its first partial derivatives be continuous in some sphere about P₀, and suppose that ∇φ(P₀) ≠ O. Then

1. At P₀, φ(x, y, z) has its maximum rate of change in the direction of ∇φ(P₀). This maximum rate of change is ‖∇φ(P₀)‖.
2. At P₀, φ(x, y, z) has its minimum rate of change in the direction of −∇φ(P₀). This minimum rate of change is −‖∇φ(P₀)‖.

Proof Let u be any unit vector. Then

D_u φ(P₀) = ∇φ(P₀) · u = ‖∇φ(P₀)‖ ‖u‖ cos(θ) = ‖∇φ(P₀)‖ cos(θ)

because u has length 1. Here θ is the angle between u and ∇φ(P₀). The direction u in which φ has its greatest rate of increase from P₀ is the direction in which this directional derivative is a maximum. Clearly the maximum occurs when cos(θ) = 1, hence when θ = 0. But this occurs when u is along ∇φ(P₀). Therefore this gradient is the direction of maximum rate of change of φ(x, y, z) at P₀. This maximum rate of change is ‖∇φ(P₀)‖. For (2), observe that the directional derivative is a minimum when cos(θ) = −1, hence when θ = π. This occurs when u is opposite ∇φ(P₀), and this minimum rate of change is −‖∇φ(P₀)‖.
EXAMPLE 12.14
Let φ(x, y, z) = 2xz + e^y z². We will find the maximum and minimum rates of change of φ(x, y, z) from (2, 1, 1). First,

∇φ(x, y, z) = 2zi + e^y z²j + (2x + 2ze^y)k

so

∇φ(P₀) = 2i + ej + (4 + 2e)k

The maximum rate of increase of φ(x, y, z) at (2, 1, 1) is in the direction of this gradient, and this maximum rate of change is

√(4 + e² + (4 + 2e)²)

The minimum rate of increase is in the direction of −2i − ej − (4 + 2e)k, and is −√(4 + e² + (4 + 2e)²).
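Theorem 12.4 predicts that no unit direction can beat the gradient direction at P₀. A quick experiment (ours, not the text's) samples random unit vectors at (2, 1, 1) for the φ of Example 12.14:

```python
import math
import random

# Check of Example 12.14: phi = 2xz + e^y z^2. Among many random unit
# directions at (2, 1, 1), none exceeds the rate of change |grad(phi)|.
def phi(x, y, z): return 2*x*z + math.exp(y)*z*z

P = (2.0, 1.0, 1.0)
grad = (2*P[2], math.exp(P[1])*P[2]**2, 2*P[0] + 2*P[2]*math.exp(P[1]))
gnorm = math.sqrt(sum(g*g for g in grad))
# maximum rate of change claimed in the example:
assert abs(gnorm - math.sqrt(4 + math.e**2 + (4 + 2*math.e)**2)) < 1e-12

random.seed(0)
for _ in range(200):
    v = [random.gauss(0, 1) for _ in range(3)]
    n = math.sqrt(sum(c*c for c in v))
    u = [c/n for c in v]
    D_u = sum(g*c for g, c in zip(grad, u))
    assert D_u <= gnorm + 1e-12      # no direction beats |grad(phi)|
```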
12.4.1 Level Surfaces, Tangent Planes and Normal Lines
Depending on the function φ and the constant k, the locus of points φ(x, y, z) = k may form a surface in 3-space. Any such surface is called a level surface of φ. For example, if φ(x, y, z) = x² + y² + z² and k > 0, then the level surface φ(x, y, z) = k is a sphere of radius √k. If k = 0 this locus is just a single point, the origin. If k < 0 this locus is empty. There are no points whose coordinates satisfy this equation. The level surface φ(x, y, z) = 0 of φ(x, y, z) = z − sin(xy) is shown from three perspectives in Figures 12.24 (a), (b) and (c).
[FIGURE 12.24: Different perspectives of the graph of z = sin(xy).]
Now consider a point P₀: (x₀, y₀, z₀) on a level surface φ(x, y, z) = k. Assume that there are smooth (having continuous tangents) curves on the surface passing through P₀, such as C₁ and C₂ in Figure 12.25. Each such curve has a tangent vector at P₀. These tangent vectors determine a plane Π at P₀, called the tangent plane to the surface at P₀. A vector normal (perpendicular) to Π at P₀, in the sense of being normal to each of these tangent vectors, is called a normal vector, or just normal, to the surface at P₀. We would like to be able to determine the tangent plane and normal to a surface at a point. Recall that we can find the equation of a plane through a given point if we are given a normal vector to the plane. Thus the normal vector is the key to finding the tangent plane.

[FIGURE 12.25: Tangents to curves on the surface φ(x, y, z) = k through P₀ determine the tangent plane Π.]
THEOREM 12.5
Gradient As a Normal Vector
Let φ and its first partial derivatives be continuous. Then ∇φ(P) is normal to the level surface φ(x, y, z) = k at any point P on this surface at which this gradient vector is nonzero.
We will outline an argument suggesting why this is true. Consider a point P₀: (x₀, y₀, z₀) on the level surface φ(x, y, z) = k. Suppose a smooth curve C on this surface passes through P₀, as in Figure 12.26(a). Suppose C has parametric equations

x = x(t), y = y(t), z = z(t)

[FIGURE 12.26: ∇φ(P₀) is normal to the level surface φ(x, y, z) = k at P₀; (a) a curve C on the surface through P₀, (b) the tangent plane to the level surface at P₀.]
Since P₀ is on this curve, for some t₀,

x(t₀) = x₀, y(t₀) = y₀, z(t₀) = z₀

Further, since the curve lies on the level surface, then

φ(x(t), y(t), z(t)) = k

for all t. Then

0 = (d/dt) φ(x(t), y(t), z(t)) = φ_x x′(t) + φ_y y′(t) + φ_z z′(t) = ∇φ · (x′(t)i + y′(t)j + z′(t)k)

Now x′(t)i + y′(t)j + z′(t)k = T(t) is a tangent vector to C. In particular, letting t = t₀, then T(t₀) is a tangent vector to C at P₀, and we have

∇φ(P₀) · T(t₀) = 0

This means that ∇φ(P₀) is normal to the tangent to C at P₀. But this is true for any smooth curve on the surface and passing through P₀. Therefore ∇φ(P₀) is normal to the surface at P₀ (Figure 12.26(b)). Once we have this normal vector, finding the equation of the tangent plane is straightforward. If (x, y, z) is any other point on the tangent plane (Figure 12.27), then the vector (x − x₀)i + (y − y₀)j + (z − z₀)k is in this plane, hence is orthogonal to the normal vector. Then

∇φ(P₀) · ((x − x₀)i + (y − y₀)j + (z − z₀)k) = 0

Then

φ_x(P₀)(x − x₀) + φ_y(P₀)(y − y₀) + φ_z(P₀)(z − z₀) = 0    (12.6)
[FIGURE 12.27: ∇φ(P₀) · ((x − x₀)i + (y − y₀)j + (z − z₀)k) = 0.]
This equation is satisfied by every point on the tangent plane. Conversely, if (x, y, z) satisfies this equation, then (x − x₀)i + (y − y₀)j + (z − z₀)k is normal to the normal vector, hence lies in the tangent plane, implying that (x, y, z) is a point in this plane. We call equation (12.6) the equation of the tangent plane to φ(x, y, z) = k at P₀.
EXAMPLE 12.15
Consider the level surface φ(x, y, z) = z − √(x² + y²) = 0. This surface is the cone shown in Figure 12.28. We will find the normal vector and tangent plane to this surface at (1, 1, √2). First compute the gradient vector:

∇φ = −(x/√(x² + y²))i − (y/√(x² + y²))j + k

provided that x and y are not both zero. Figure 12.29 shows ∇φ at a point on the cone determined by the position vector

R(x, y, z) = xi + yj + √(x² + y²)k

Then

∇φ(1, 1, √2) = −(1/√2)i − (1/√2)j + k
[FIGURE 12.28: The cone z = √(x² + y²).]
[FIGURE 12.29: Cone z = √(x² + y²) and the normal ∇φ(P₀) at P₀.]
This is the normal vector to the cone at (1, 1, √2). The tangent plane at this point has equation

−(1/√2)(x − 1) − (1/√2)(y − 1) + (z − √2) = 0

or

x + y − √2 z = 0

The cone has no tangent plane or normal vector at the origin, where the surface has a "sharp point". This is analogous to a graph in the plane having no tangent vector where it has a sharp point (for example, y = |x| at the origin).
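As a check on Example 12.15 (our computation, not the text's), points of the cone near (1, 1, √2) should satisfy the tangent plane equation x + y − √2 z = 0 up to second-order terms:

```python
import math

# Check of Example 12.15: for the cone z = sqrt(x^2 + y^2), the tangent
# plane at (1, 1, sqrt 2) is x + y - sqrt(2) z = 0, with normal given by
# the gradient of phi = z - sqrt(x^2 + y^2).
P = (1.0, 1.0, math.sqrt(2.0))
r = math.sqrt(P[0]**2 + P[1]**2)
grad = (-P[0]/r, -P[1]/r, 1.0)
assert abs(grad[0] + 1/math.sqrt(2)) < 1e-12
assert abs(grad[1] + 1/math.sqrt(2)) < 1e-12

def plane(x, y, z): return x + y - math.sqrt(2.0)*z

assert abs(plane(*P)) < 1e-12       # P0 itself is on the plane
# nearby surface points deviate from the plane only to second order
for d in (1e-2, 1e-3, 1e-4):
    x, y = 1.0 + d, 1.0 - 2*d
    z = math.sqrt(x*x + y*y)
    assert abs(plane(x, y, z)) < 10*d*d
```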
EXAMPLE 12.16
Consider the surface $z = \sin(xy)$. If we let $\varphi(x, y, z) = \sin(xy) - z$, then this surface is the level surface $\varphi(x, y, z) = 0$. The gradient vector is
$$\nabla\varphi = y\cos(xy)\mathbf{i} + x\cos(xy)\mathbf{j} - \mathbf{k}.$$
This vector field is shown in Figure 12.30, with the gradient vectors drawn as arrows from selected points on the surface. The tangent plane at any point $(x_0, y_0, z_0)$ on this surface has equation
$$y_0\cos(x_0y_0)(x - x_0) + x_0\cos(x_0y_0)(y - y_0) - (z - z_0) = 0.$$
FIGURE 12.30 Gradient field $\nabla\varphi = y\cos(xy)\mathbf{i} + x\cos(xy)\mathbf{j} - \mathbf{k}$ represented as a vector field on the surface $z = \sin(xy)$.
For example, the tangent plane at $(2, 1, \sin(2))$ has equation
$$\cos(2)(x - 2) + 2\cos(2)(y - 1) - (z - \sin(2)) = 0,$$
or
$$\cos(2)\,x + 2\cos(2)\,y - z = 4\cos(2) - \sin(2).$$
A patch of this tangent plane is shown in Figure 12.31. Similarly, the tangent plane at $(-1, -2, \sin(2))$ has equation
$$2\cos(2)\,x + \cos(2)\,y + z = -4\cos(2) + \sin(2).$$
Part of this tangent plane is shown in Figure 12.32.

A straight line through $P_0$ parallel to $\nabla\varphi(P_0)$ is called the normal line to the level surface $\varphi(x, y, z) = k$ at $P_0$, assuming that this gradient vector is not zero. This idea is illustrated in Figure 12.33. To write the equation of the normal line, let $(x, y, z)$ be any point on it. Then the vector $(x - x_0)\mathbf{i} + (y - y_0)\mathbf{j} + (z - z_0)\mathbf{k}$ is along this line, hence is parallel to $\nabla\varphi(P_0)$. This means that, for some scalar $t$,
$$(x - x_0)\mathbf{i} + (y - y_0)\mathbf{j} + (z - z_0)\mathbf{k} = t\,\nabla\varphi(P_0).$$
FIGURE 12.31 Part of the tangent plane to $z = \sin(xy)$ at $(2, 1, \sin(2))$. FIGURE 12.32 Part of the tangent plane to $z = \sin(xy)$ at $(-1, -2, \sin(2))$.
FIGURE 12.33 Normal line to the surface $\varphi(x, y, z) = k$ at $P_0$.
The components on the left must equal the respective components on the right:
$$x - x_0 = \frac{\partial \varphi}{\partial x}(P_0)\,t, \quad y - y_0 = \frac{\partial \varphi}{\partial y}(P_0)\,t, \quad z - z_0 = \frac{\partial \varphi}{\partial z}(P_0)\,t.$$
These are parametric equations of the normal line. As $t$ varies over the real line, these equations give coordinates of points $(x, y, z)$ on the normal line.
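These parametric equations translate directly into a small routine. The sketch below assumes sympy; the helper name `normal_line` is ours, introduced only for illustration.

```python
import sympy as sp

x, y, z, t = sp.symbols('x y z t', real=True)

def normal_line(phi, P0):
    """Parametric equations of the normal line to phi(x, y, z) = k at P0.

    Hypothetical helper: returns [x(t), y(t), z(t)] with
    x(t) = x0 + (d phi/dx)(P0) * t, and similarly for y and z.
    """
    at_P0 = dict(zip((x, y, z), P0))
    return [p0 + sp.diff(phi, v).subs(at_P0) * t
            for v, p0 in zip((x, y, z), P0)]

# The cone phi = z - sqrt(x^2 + y^2) at (1, 1, sqrt(2)):
line = normal_line(z - sp.sqrt(x**2 + y**2), (1, 1, sp.sqrt(2)))
# line = [1 - t/sqrt(2), 1 - t/sqrt(2), sqrt(2) + t]
```

At $t = 0$ the three components reduce to the coordinates of $P_0$, as they should.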
EXAMPLE 12.17
Consider again the cone $\varphi(x, y, z) = z - \sqrt{x^2 + y^2} = 0$. In Example 12.15 we computed the gradient vector at $(1, 1, \sqrt{2})$, obtaining
$$-\frac{1}{\sqrt{2}}\mathbf{i} - \frac{1}{\sqrt{2}}\mathbf{j} + \mathbf{k}.$$
The normal line through this point has parametric equations
$$x - 1 = -\frac{1}{\sqrt{2}}t, \quad y - 1 = -\frac{1}{\sqrt{2}}t, \quad z - \sqrt{2} = t.$$
We can also write
$$x = 1 - \frac{1}{\sqrt{2}}t, \quad y = 1 - \frac{1}{\sqrt{2}}t, \quad z = \sqrt{2} + t.$$

SECTION 12.4 PROBLEMS
In each of Problems 1 through 6, compute the gradient of the function and evaluate this gradient at the given point. Determine at this point the maximum and minimum rate of change of the function.

1. $\varphi(x, y, z) = xyz$; $(1, 1, 1)$
2. $\varphi(x, y, z) = x^2y - \sin(xz)$; $(1, -1, \pi/4)$
3. $\varphi(x, y, z) = 2xy + xe^z$; $(-2, 1, 6)$
4. $\varphi(x, y, z) = \cos(xyz)$; $(-1, 1, \pi/2)$
5. $\varphi(x, y, z) = \cosh(2xy) - \sinh(z)$; $(0, 1, 1)$
6. $\varphi(x, y, z) = x^2 + y^2 + z^2$; $(2, 2, 2)$

In each of Problems 7 through 10, compute the directional derivative of the function in the direction of the given vector.

7. $\varphi(x, y, z) = 8xy^2 - xz$; $\frac{1}{\sqrt{3}}(\mathbf{i} + \mathbf{j} + \mathbf{k})$
8. $\varphi(x, y, z) = \cos(x - y) + e^z$; $\mathbf{i} - \mathbf{j} + 2\mathbf{k}$
9. $\varphi(x, y, z) = x^2yz^3$; $2\mathbf{j} + \mathbf{k}$
10. $\varphi(x, y, z) = yz + xz + xy$; $\mathbf{i} - 4\mathbf{k}$

In each of Problems 11 through 16, find the equations of the tangent plane and normal line to the surface at the point.

11. $x^2 + y^2 + z^2 = 4$; $(1, 1, \sqrt{2})$
12. $z = x^2 + y$; $(-1, 1, 2)$
13. $z^2 = x^2 - y^2$; $(1, 1, 0)$
14. $x^2 - y^2 + z^2 = 0$; $(1, 1, 0)$
15. $2x - \cos(xyz) = 3$; $(1, 1, \pi)$
16. $3x^4 + 3y^4 + 6z^4 = 12$; $(1, 1, 1)$

In each of Problems 17 through 20, find the angle between the two surfaces at the given point of intersection. (Compute this angle as the angle between the normals to the surfaces at this point.)

17. $z = 3x^2 + 2y^2$, $-2x + 7y^2 - z = 0$; $(1, 1, 5)$
18. $x^2 + y^2 + z^2 = 4$, $z^2 + x^2 = 2$; $(1, \sqrt{2}, 1)$
19. $z = \sqrt{x^2 + y^2}$, $x^2 + y^2 = 8$; $(2, 2, \sqrt{8})$
20. $x^2 + y^2 + 2z^2 = 10$, $x + y + z = 5$; $(2, 2, 1)$
21. Suppose $\nabla\varphi = \mathbf{i} + \mathbf{k}$. What can be said about level surfaces of $\varphi$? Prove that the streamlines of $\nabla\varphi$ are orthogonal to the level surfaces of $\varphi$.
12.5
Divergence and Curl

The gradient operator $\nabla$ produces a vector field from a scalar field. We will now discuss two other vector operations. One produces a scalar field from a vector field, and the other a vector field from a vector field.
DEFINITION 12.10
Divergence
The divergence of a vector field $\mathbf{F}(x, y, z) = f(x, y, z)\mathbf{i} + g(x, y, z)\mathbf{j} + h(x, y, z)\mathbf{k}$ is the scalar field
$$\operatorname{div} \mathbf{F} = \frac{\partial f}{\partial x} + \frac{\partial g}{\partial y} + \frac{\partial h}{\partial z}.$$

For example, if $\mathbf{F} = 2xy\mathbf{i} + (xyz^2 - \sin(yz))\mathbf{j} + ze^{x+y}\mathbf{k}$, then
$$\operatorname{div} \mathbf{F} = 2y + xz^2 - z\cos(yz) + e^{x+y}.$$
We read div $\mathbf{F}$ as the divergence of $\mathbf{F}$, or just "div $\mathbf{F}$".
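The divergence in the example above is a routine sum of partial derivatives, which a computer algebra system can confirm. Here is a sketch assuming sympy is available:

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)

# F = 2xy i + (xyz^2 - sin(yz)) j + z e^{x+y} k, the example from the text
F = [2*x*y, x*y*z**2 - sp.sin(y*z), z*sp.exp(x + y)]

# div F = df/dx + dg/dy + dh/dz
div_F = sum(sp.diff(comp, var) for comp, var in zip(F, (x, y, z)))
# 2y + xz^2 - z cos(yz) + e^{x+y}
```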
DEFINITION 12.11
Curl
The curl of a vector field $\mathbf{F}(x, y, z) = f(x, y, z)\mathbf{i} + g(x, y, z)\mathbf{j} + h(x, y, z)\mathbf{k}$ is the vector field
$$\operatorname{curl} \mathbf{F} = \left(\frac{\partial h}{\partial y} - \frac{\partial g}{\partial z}\right)\mathbf{i} + \left(\frac{\partial f}{\partial z} - \frac{\partial h}{\partial x}\right)\mathbf{j} + \left(\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\right)\mathbf{k}.$$
This vector is read "curl of $\mathbf{F}$", or just "curl $\mathbf{F}$". For example, if $\mathbf{F} = y\mathbf{i} + 2xz\mathbf{j} + ze^x\mathbf{k}$, then
$$\operatorname{curl} \mathbf{F} = -2x\mathbf{i} - ze^x\mathbf{j} + (2z - 1)\mathbf{k}.$$

Divergence, curl and gradient can all be thought of in terms of the vector operations of multiplication of a vector by a scalar, dot product and cross product, using the del operator $\nabla$. This is defined by
$$\nabla = \frac{\partial}{\partial x}\mathbf{i} + \frac{\partial}{\partial y}\mathbf{j} + \frac{\partial}{\partial z}\mathbf{k}.$$
The symbol $\nabla$, which is read "del", is treated like a vector in carrying out calculations, and the "product" of $\partial/\partial x$, $\partial/\partial y$ and $\partial/\partial z$ with a function $\varphi(x, y, z)$ is interpreted to mean, respectively, $\partial\varphi/\partial x$, $\partial\varphi/\partial y$ and $\partial\varphi/\partial z$. In this way, the gradient of $\varphi$ is the product of the vector $\nabla$ with the scalar function $\varphi$:
$$\nabla\varphi = \frac{\partial \varphi}{\partial x}\mathbf{i} + \frac{\partial \varphi}{\partial y}\mathbf{j} + \frac{\partial \varphi}{\partial z}\mathbf{k} = \text{gradient of } \varphi.$$
The divergence of a vector is the dot product of del with the vector:
$$\nabla \cdot \mathbf{F} = \left(\frac{\partial}{\partial x}\mathbf{i} + \frac{\partial}{\partial y}\mathbf{j} + \frac{\partial}{\partial z}\mathbf{k}\right) \cdot (f\mathbf{i} + g\mathbf{j} + h\mathbf{k}) = \frac{\partial f}{\partial x} + \frac{\partial g}{\partial y} + \frac{\partial h}{\partial z} = \text{divergence of } \mathbf{F}.$$
And the curl of a vector is the cross product of del with the vector:
$$\nabla \times \mathbf{F} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \partial/\partial x & \partial/\partial y & \partial/\partial z \\ f & g & h \end{vmatrix} = \left(\frac{\partial h}{\partial y} - \frac{\partial g}{\partial z}\right)\mathbf{i} + \left(\frac{\partial f}{\partial z} - \frac{\partial h}{\partial x}\right)\mathbf{j} + \left(\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\right)\mathbf{k} = \operatorname{curl} \mathbf{F}.$$
Informally, del times $\varphi$ = gradient, del dot $\mathbf{F}$ = divergence, and del cross $\mathbf{F}$ = curl. This provides a way of thinking of gradient, divergence and curl in terms of familiar vector operations involving del, and will prove to be an efficient tool in carrying out computations.

There are two fundamental relationships between gradient, divergence and curl. The first states that the curl of a gradient is the zero vector.
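The del-operator formulation above is implemented directly in sympy's vector module, which can serve as a check on hand computations; the following sketch (assuming sympy is installed) verifies the curl example from the text.

```python
import sympy as sp
from sympy.vector import CoordSys3D, gradient, divergence, curl

N = CoordSys3D('N')
x, y, z = N.x, N.y, N.z

# The curl example from the text: F = y i + 2xz j + z e^x k
F = y*N.i + 2*x*z*N.j + z*sp.exp(x)*N.k

curl_F = curl(F)          # del cross F: -2x i - z e^x j + (2z - 1) k
div_F = divergence(F)     # del dot F

# del applied to a scalar field gives the gradient
phi = x*y*z
grad_phi = gradient(phi)
```

Here `gradient`, `divergence` and `curl` correspond exactly to del times, del dot, and del cross.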
THEOREM 12.6
Curl of a Gradient
Let $\varphi$ be continuous with continuous first and second partial derivatives. Then
$$\nabla \times (\nabla\varphi) = \mathbf{O}.$$
This conclusion can also be written $\operatorname{curl}(\nabla\varphi) = \mathbf{O}$. The zero on the right is the zero vector, since the curl of a vector field is a vector.
Proof
By direct computation,
$$\nabla \times (\nabla\varphi) = \nabla \times \left(\frac{\partial \varphi}{\partial x}\mathbf{i} + \frac{\partial \varphi}{\partial y}\mathbf{j} + \frac{\partial \varphi}{\partial z}\mathbf{k}\right) = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \partial/\partial x & \partial/\partial y & \partial/\partial z \\ \partial\varphi/\partial x & \partial\varphi/\partial y & \partial\varphi/\partial z \end{vmatrix}$$
$$= \left(\frac{\partial^2 \varphi}{\partial y\,\partial z} - \frac{\partial^2 \varphi}{\partial z\,\partial y}\right)\mathbf{i} + \left(\frac{\partial^2 \varphi}{\partial z\,\partial x} - \frac{\partial^2 \varphi}{\partial x\,\partial z}\right)\mathbf{j} + \left(\frac{\partial^2 \varphi}{\partial x\,\partial y} - \frac{\partial^2 \varphi}{\partial y\,\partial x}\right)\mathbf{k} = \mathbf{O},$$
because the paired mixed partial derivatives in each set of parentheses are equal.

The second relationship states that the divergence of a curl is the number zero.

THEOREM 12.7
Let $\mathbf{F}$ be a continuous vector field whose components have continuous first and second partial derivatives. Then
$$\nabla \cdot (\nabla \times \mathbf{F}) = 0.$$
We may also write $\operatorname{div}(\operatorname{curl} \mathbf{F}) = 0$.

Proof
As with the preceding theorem, proceed by direct computation:
$$\nabla \cdot (\nabla \times \mathbf{F}) = \frac{\partial}{\partial x}\left(\frac{\partial h}{\partial y} - \frac{\partial g}{\partial z}\right) + \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial z} - \frac{\partial h}{\partial x}\right) + \frac{\partial}{\partial z}\left(\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\right)$$
$$= \frac{\partial^2 h}{\partial x\,\partial y} - \frac{\partial^2 g}{\partial x\,\partial z} + \frac{\partial^2 f}{\partial y\,\partial z} - \frac{\partial^2 h}{\partial y\,\partial x} + \frac{\partial^2 g}{\partial z\,\partial x} - \frac{\partial^2 f}{\partial z\,\partial y} = 0,$$
because equal mixed partials appear in pairs with opposite signs.

Divergence and curl have physical interpretations, two of which we will now develop.
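Before turning to those interpretations, both identities are easy to check symbolically for sample fields. A sketch assuming sympy; the particular fields chosen are ours:

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)

def grad(phi):
    return [sp.diff(phi, v) for v in (x, y, z)]

def div(F):
    return sum(sp.diff(c, v) for c, v in zip(F, (x, y, z)))

def curl(F):
    f, g, h = F
    return [sp.diff(h, y) - sp.diff(g, z),
            sp.diff(f, z) - sp.diff(h, x),
            sp.diff(g, x) - sp.diff(f, y)]

phi = x**2 * sp.sin(y) * sp.exp(z)        # any smooth scalar field
F = [x*y, sp.cos(y*z), z**3 - x]          # any smooth vector field

curl_grad = [sp.simplify(c) for c in curl(grad(phi))]   # the zero vector
div_curl = sp.simplify(div(curl(F)))                    # the number zero
```

This is not a proof, of course, only a spot check; the proofs above rest on equality of mixed partials.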
12.5.1 A Physical Interpretation of Divergence

Suppose $\mathbf{F}(x, y, z, t)$ is the velocity of a fluid at point $(x, y, z)$ and time $t$. Time plays no role in computing divergence, but is included here because normally a velocity vector does depend on time. Imagine a small rectangular box within the fluid, as in Figure 12.34. We would like some measure of the rate per unit volume at which fluid flows out of this box across its faces, at any given time.

First look at the front face II and the back face I in the diagram. The normal vector pointing out of the box from face II is $\mathbf{i}$. The flux of the flow out of the box across face II is the normal component of the velocity (dot product of $\mathbf{F}$ with $\mathbf{i}$), multiplied by the area of this face:
$$\text{flux outward across face II} = \mathbf{F}(x + \Delta x, y, z, t) \cdot \mathbf{i}\,\Delta y\,\Delta z = f(x + \Delta x, y, z, t)\,\Delta y\,\Delta z.$$
FIGURE 12.34 Back face I at $(x, y, z)$; front face II at $(x + \Delta x, y, z)$.
On face I, the unit outer normal is $-\mathbf{i}$, so
$$\text{flux outward across face I} = \mathbf{F}(x, y, z, t) \cdot (-\mathbf{i})\,\Delta y\,\Delta z = -f(x, y, z, t)\,\Delta y\,\Delta z.$$
The total outward flux across faces I and II is therefore
$$\left[f(x + \Delta x, y, z, t) - f(x, y, z, t)\right]\Delta y\,\Delta z.$$
A similar calculation can be done for the other two pairs of sides. The total flux of fluid out of the box across its faces is
$$\left[f(x + \Delta x, y, z, t) - f(x, y, z, t)\right]\Delta y\,\Delta z + \left[g(x, y + \Delta y, z, t) - g(x, y, z, t)\right]\Delta x\,\Delta z + \left[h(x, y, z + \Delta z, t) - h(x, y, z, t)\right]\Delta x\,\Delta y.$$
The flux per unit volume is obtained by dividing this flux by the volume $\Delta x\,\Delta y\,\Delta z$ of the box:
$$\text{flux per unit volume out of the box} = \frac{f(x + \Delta x, y, z, t) - f(x, y, z, t)}{\Delta x} + \frac{g(x, y + \Delta y, z, t) - g(x, y, z, t)}{\Delta y} + \frac{h(x, y, z + \Delta z, t) - h(x, y, z, t)}{\Delta z}.$$
Now take the limit as $\Delta x \to 0$, $\Delta y \to 0$ and $\Delta z \to 0$. The box shrinks to the point $(x, y, z)$ and the flux per unit volume approaches
$$\frac{\partial f}{\partial x} + \frac{\partial g}{\partial y} + \frac{\partial h}{\partial z},$$
which is the divergence of $\mathbf{F}(x, y, z, t)$ at time $t$. We may therefore interpret the divergence of $\mathbf{F}$ as a measure of the outward flow or expansion of the fluid from this point.
12.5.2
A Physical Interpretation of Curl
Suppose an object rotates with uniform angular speed $\omega$ about a line $L$, as in Figure 12.35. The angular velocity vector $\boldsymbol{\omega}$ has magnitude $\omega$ and is directed along $L$ as a right-handed screw would progress if given the same sense of rotation as the object. Put $L$ through the origin as a convenience, and let $\mathbf{R} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$ for any point $(x, y, z)$ on the rotating object. Let $\mathbf{T}(x, y, z)$ be the tangential linear velocity. Then
$$\|\mathbf{T}\| = \omega\,\|\mathbf{R}\|\sin(\theta) = \|\boldsymbol{\omega} \times \mathbf{R}\|,$$
FIGURE 12.35 Angular velocity as the curl of the linear velocity.
where $\theta$ is the angle between $\mathbf{R}$ and $\boldsymbol{\omega}$. Since $\mathbf{T}$ and $\boldsymbol{\omega} \times \mathbf{R}$ have the same direction and magnitude, we conclude that $\mathbf{T} = \boldsymbol{\omega} \times \mathbf{R}$. Now write $\boldsymbol{\omega} = a\mathbf{i} + b\mathbf{j} + c\mathbf{k}$ to obtain
$$\mathbf{T} = \boldsymbol{\omega} \times \mathbf{R} = (bz - cy)\mathbf{i} + (cx - az)\mathbf{j} + (ay - bx)\mathbf{k}.$$
Then
$$\nabla \times \mathbf{T} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \partial/\partial x & \partial/\partial y & \partial/\partial z \\ bz - cy & cx - az & ay - bx \end{vmatrix} = 2a\mathbf{i} + 2b\mathbf{j} + 2c\mathbf{k} = 2\boldsymbol{\omega}.$$
Therefore
$$\boldsymbol{\omega} = \frac{1}{2}\nabla \times \mathbf{T}.$$
The angular velocity of a uniformly rotating body is a constant times the curl of the linear velocity. Because of this interpretation, curl was once written rot (for rotation), particularly in British treatments of mechanics. This is also the motivation for the term irrotational for a vector field whose curl is zero. Other interpretations of divergence and curl follow from vector integral theorems we will see in the next chapter.
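The determinant computation above can be checked symbolically for arbitrary $a$, $b$, $c$. A sketch assuming sympy:

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
a, b, c = sp.symbols('a b c', real=True)   # components of omega

# T = omega x R, with R = x i + y j + z k
T = [b*z - c*y, c*x - a*z, a*y - b*x]

# curl T, component by component
curl_T = [sp.diff(T[2], y) - sp.diff(T[1], z),
          sp.diff(T[0], z) - sp.diff(T[2], x),
          sp.diff(T[1], x) - sp.diff(T[0], y)]
# curl_T = [2a, 2b, 2c] = 2 * omega, so omega = (1/2) curl T
```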
SECTION 12.5
PROBLEMS
In each of Problems 1 through 6, compute $\nabla \cdot \mathbf{F}$ and $\nabla \times \mathbf{F}$ and verify explicitly that $\nabla \cdot (\nabla \times \mathbf{F}) = 0$.

1. $\mathbf{F} = x\mathbf{i} + y\mathbf{j} + 2z\mathbf{k}$
2. $\mathbf{F} = \sinh(xyz)\mathbf{j}$
3. $\mathbf{F} = 2xy\mathbf{i} + xe^y\mathbf{j} + 2z\mathbf{k}$
4. $\mathbf{F} = \sinh(x)\mathbf{i} + \cosh(xyz)\mathbf{j} - (x + y + z)\mathbf{k}$
5. $\mathbf{F} = x^2\mathbf{i} + y^2\mathbf{j} + z^2\mathbf{k}$
6. $\mathbf{F} = \sinh(x - z)\mathbf{i} + 2y\mathbf{j} + (z - y^2)\mathbf{k}$

In each of Problems 7 through 12, compute $\nabla\varphi$ and verify explicitly that $\nabla \times (\nabla\varphi) = \mathbf{O}$.

7. $\varphi(x, y, z) = x - y + 2z^2$
8. $\varphi(x, y, z) = 18xyz + e^x$
9. $\varphi(x, y, z) = -2x^3yz^2$
10. $\varphi(x, y, z) = \sin(xz)$
11. $\varphi(x, y, z) = x\cos(x + y + z)$
12. $\varphi(x, y, z) = e^{x+y+z}$

13. Let $\varphi(x, y, z)$ be a scalar field and $\mathbf{F}$ a vector field. Derive expressions for $\nabla \cdot (\varphi\mathbf{F})$ and $\nabla \times (\varphi\mathbf{F})$ in terms of operations applied to $\varphi(x, y, z)$ and $\mathbf{F}$.
14. Let $\mathbf{F} = f\mathbf{i} + g\mathbf{j} + h\mathbf{k}$ be a vector field. Define
$$\mathbf{F} \cdot \nabla = f\frac{\partial}{\partial x} + g\frac{\partial}{\partial y} + h\frac{\partial}{\partial z}.$$
Let $\mathbf{G}$ be a vector field. Show that
$$\nabla(\mathbf{F} \cdot \mathbf{G}) = (\mathbf{F} \cdot \nabla)\mathbf{G} + (\mathbf{G} \cdot \nabla)\mathbf{F} + \mathbf{F} \times (\nabla \times \mathbf{G}) + \mathbf{G} \times (\nabla \times \mathbf{F}).$$
15. Let $\mathbf{F}$ and $\mathbf{G}$ be vector fields. Prove that
$$\nabla \cdot (\mathbf{F} \times \mathbf{G}) = \mathbf{G} \cdot (\nabla \times \mathbf{F}) - \mathbf{F} \cdot (\nabla \times \mathbf{G}).$$
16. Let $\varphi(x, y, z)$ and $\psi(x, y, z)$ be scalar fields. Prove that
$$\nabla \cdot (\nabla\varphi \times \nabla\psi) = 0.$$
CHAPTER
13
LINE INTEGRALS GREEN’S THEOREM INDEPENDENCE OF PATH AND POTENTIAL THEORY IN THE PLANE SURFACES IN 3-SPACE AND SURFACE INTEGRALS APPLICATIONS OF SURFACE INTEGRALS PREPARA
Vector Integral Calculus
This chapter is devoted to integrals of vector fields over curves and surfaces, and relationships between such integrals. These have important uses in solving partial differential equations and in constructing models used in the sciences and engineering.
13.1
Line Integrals

We begin with the integral of a vector field over a curve. This requires some background on curves. Suppose a curve $C$ in 3-space is given by parametric equations
$$x = x(t), \quad y = y(t), \quad z = z(t) \quad \text{for } a \le t \le b.$$
We call $x(t)$, $y(t)$ and $z(t)$ the coordinate functions of $C$. We will think of $C$ not only as a geometric locus of points $(x(t), y(t), z(t))$, but also as having an orientation or direction, given by the direction this point moves along $C$ as $t$ increases from $a$ to $b$. Denote this orientation by putting arrows on the graph of the curve (Figure 13.1). Trajectories of a system of differential equations are also oriented curves, with the particle moving along the geometric locus in a direction dictated by the flow of the system. We call $(x(a), y(a), z(a))$ the initial point of $C$, and $(x(b), y(b), z(b))$ the terminal point. A curve is closed if the initial and terminal points are the same.
FIGURE 13.1 Orientation along a curve.

FIGURE 13.2

EXAMPLE 13.1
Let $C$ be given by
$$x = 2\cos(t), \quad y = 2\sin(t), \quad z = 4 \quad \text{for } 0 \le t \le 2\pi.$$
A graph of $C$ is shown in Figure 13.2. This graph is the circle of radius 2 about the origin in the plane $z = 4$. The arrow on the curve indicates its orientation (the direction of motion of $(2\cos(t), 2\sin(t), 4)$ around the graph as $t$ varies from 0 to $2\pi$). The initial point is $(2, 0, 4)$, obtained at $t = 0$, and the terminal point is also $(2, 0, 4)$, obtained at $t = 2\pi$. This curve is closed.

Contrast $C$ with the curve $K$ given by
$$x = 2\cos(t), \quad y = 2\sin(t), \quad z = 4 \quad \text{for } 0 \le t \le 3\pi.$$
$K$ has the same graph and counterclockwise orientation as $C$, but is a different curve. $K$ is not closed. The initial point of $K$ is $(2, 0, 4)$, but its terminal point is $(-2, 0, 4)$. A particle moving about $K$ goes around the circle once, and then makes another half-circle. It would presumably take more energy to move a particle about $K$ than about $C$. $K$ is also longer than $C$.

A curve is:
– continuous if its coordinate functions are continuous on the parameter interval,
– differentiable if its coordinate functions are differentiable, and
– smooth if its coordinate functions have continuous derivatives, which are not all zero for the same value of $t$.

Because $x'(t)\mathbf{i} + y'(t)\mathbf{j} + z'(t)\mathbf{k}$ is tangent to the curve if this is not the zero vector, a smooth curve is one that has a continuous tangent vector. The graph of a curve consists of all points $(x(t), y(t), z(t))$ as $t$ varies over the parameter interval. As Example 13.1 shows, there is more to a curve than just its graph. We are dealing with curves both as geometric objects and as oriented objects, having a sense of direction along the graph. We will now define the line integral of a vector field over a smooth curve.

DEFINITION 13.1
Line Integral
Suppose a smooth curve $C$ has coordinate functions $x = x(t)$, $y = y(t)$, $z = z(t)$ for $a \le t \le b$. Let $f(x, y, z)$, $g(x, y, z)$, and $h(x, y, z)$ be continuous at least on the graph of $C$. Then the line integral
$$\int_C f(x, y, z)\,dx + g(x, y, z)\,dy + h(x, y, z)\,dz$$
is defined by
$$\int_C f\,dx + g\,dy + h\,dz = \int_a^b \left[f(x(t), y(t), z(t))\frac{dx}{dt} + g(x(t), y(t), z(t))\frac{dy}{dt} + h(x(t), y(t), z(t))\frac{dz}{dt}\right]dt. \tag{13.1}$$

We can write this line integral more compactly as
$$\int_C f\,dx + g\,dy + h\,dz.$$
To evaluate $\int_C f\,dx + g\,dy + h\,dz$, substitute the coordinate functions $x = x(t)$, $y = y(t)$ and $z = z(t)$ into $f(x, y, z)$, $g(x, y, z)$ and $h(x, y, z)$, obtaining functions of $t$. Further, substitute
$$dx = \frac{dx}{dt}\,dt, \quad dy = \frac{dy}{dt}\,dt \quad \text{and} \quad dz = \frac{dz}{dt}\,dt.$$
This results in the Riemann integral on the right side of equation (13.1) of a function of $t$ over the range of values of this parameter.
EXAMPLE 13.2
Evaluate the line integral $\int_C x\,dx - yz\,dy + e^z\,dz$ if $C$ is given by
$$x = t^3, \quad y = -t, \quad z = t^2 \quad \text{for } 1 \le t \le 2.$$
Here
$$f(x, y, z) = x, \quad g(x, y, z) = -yz \quad \text{and} \quad h(x, y, z) = e^z,$$
and, on $C$,
$$dx = 3t^2\,dt, \quad dy = -dt \quad \text{and} \quad dz = 2t\,dt.$$
Then
$$\int_C x\,dx - yz\,dy + e^z\,dz = \int_1^2 \left[t^3(3t^2) - (-t)(t^2)(-1) + e^{t^2}(2t)\right]dt = \int_1^2 \left(3t^5 - t^3 + 2te^{t^2}\right)dt = \frac{111}{4} + e^4 - e.$$
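This evaluation can be reproduced with a computer algebra system; the sketch below assumes sympy is available.

```python
import sympy as sp

t = sp.symbols('t', real=True)

# C: x = t^3, y = -t, z = t^2 for 1 <= t <= 2
xt, yt, zt = t**3, -t, t**2

# Integrand of f dx + g dy + h dz after substituting the coordinate functions
integrand = (xt * sp.diff(xt, t)
             - yt*zt * sp.diff(yt, t)
             + sp.exp(zt) * sp.diff(zt, t))

value = sp.integrate(integrand, (t, 1, 2))
# 111/4 + e^4 - e, as computed by hand
```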
EXAMPLE 13.3
Evaluate $\int_C xyz\,dx - \cos(yz)\,dy + xz\,dz$ over the straight line segment from $(1, 1, 1)$ to $(-2, 1, 3)$. Here we are left to find the coordinate functions of the curve. Parametric equations of the line through these points are
$$x(t) = 1 - 3t, \quad y(t) = 1, \quad z(t) = 1 + 2t.$$
We must let $t$ vary from 0 to 1 for the initial point to be $(1, 1, 1)$ and the terminal point to be $(-2, 1, 3)$. Now
$$\int_C xyz\,dx - \cos(yz)\,dy + xz\,dz = \int_0^1 \left[(1 - 3t)(1)(1 + 2t)(-3) - \cos(1 + 2t)(0) + (1 - 3t)(1 + 2t)(2)\right]dt$$
$$= \int_0^1 \left(-1 + t + 6t^2\right)dt = \frac{3}{2}.$$
If $C$ is a smooth curve in the $x, y$ plane (zero $z$-component), and $f(x, y)$ and $g(x, y)$ are continuous on $C$, then we can write a line integral
$$\int_C f(x, y)\,dx + g(x, y)\,dy,$$
which we refer to as a line integral in the plane. We evaluate this according to Definition 13.1, except now there is no $z$-component.
EXAMPLE 13.4
Evaluate $\int_C xy\,dx - y\sin(x)\,dy$ if $C$ is given by $x(t) = t^2$ and $y(t) = t$ for $-1 \le t \le 4$. Proceed:
$$\int_C xy\,dx - y\sin(x)\,dy = \int_{-1}^4 \left[t^2(t)(2t) - t\sin(t^2)(1)\right]dt = 410 + \frac{1}{2}\cos(16) - \frac{1}{2}\cos(1).$$
In these examples we have included all the terms to follow equation (13.1) very literally. However, with some experience there are obvious shortcuts one can take. In Example 13.3, for example, $y$ is constant on the curve, so $dy = 0$ and the term $g(x(t), y(t), z(t))\frac{dy}{dt}$ could have been simply omitted.

Thus far we can integrate only over a smooth curve. This requirement can be relaxed as follows. A curve $C$ is piecewise smooth if $x'(t)$, $y'(t)$ and $z'(t)$ are continuous, and not all zero for the same value of $t$, at all but possibly finitely many values of $t$. Since $x'(t)\mathbf{i} + y'(t)\mathbf{j} + z'(t)\mathbf{k}$ is the tangent vector to $C$ if this is not the zero vector, this condition means that a piecewise smooth curve has a continuous tangent at all but finitely many points. Such a curve typically has the appearance of Figure 13.3, with smooth pieces $C_1, \ldots, C_n$ connected at points where the curve may have no tangent. We will refer to a piecewise smooth curve as a path. In Figure 13.3 the terminal point of $C_j$ is the initial point of $C_{j+1}$ for $j = 1, \ldots, n - 1$. The segments $C_1, \ldots, C_n$ are in order as one moves from the initial to the terminal point of $C$. This
FIGURE 13.3 Typical piecewise smooth curve (path). FIGURE 13.4

is indicated by the arrows showing orientation along the smooth pieces of $C$. If $f$, $g$ and $h$ are continuous over each $C_j$, then we define
$$\int_C f\,dx + g\,dy + h\,dz = \int_{C_1} f\,dx + g\,dy + h\,dz + \cdots + \int_{C_n} f\,dx + g\,dy + h\,dz.$$
This allows us to take line integrals over paths, rather than restricting the integral to smooth curves.
EXAMPLE 13.5
Let $C$ be the curve consisting of the quarter circle $x^2 + y^2 = 1$ in the $x, y$ plane from $(1, 0)$ to $(0, 1)$, followed by the horizontal line segment from $(0, 1)$ to $(2, 1)$. Compute $\int_C x^2y\,dx + y^2\,dy$.

$C$ is piecewise smooth and consists of two smooth pieces (Figure 13.4). Parametrize these as follows. The quarter circle part is
$$C_1: \quad x = \cos(t), \quad y = \sin(t), \quad 0 \le t \le \pi/2.$$
The straight segment is
$$C_2: \quad x = p, \quad y = 1, \quad 0 \le p \le 2.$$
We have used different names for the parameters on the curves to help distinguish them. Now evaluate the line integral along each of the two curves making up the path. For the line integral over $C_1$, compute
$$\int_{C_1} x^2y\,dx + y^2\,dy = \int_0^{\pi/2} \left[\cos^2(t)\sin(t)(-\sin(t)) + \sin^2(t)\cos(t)\right]dt = -\frac{\pi}{16} + \frac{1}{3}.$$
Next evaluate the line integral over $C_2$. On $C_2$, $x = p$ and $y = 1$, so $dy = 0$ and
$$\int_{C_2} x^2y\,dx + y^2\,dy = \int_0^2 p^2\,dp = \frac{8}{3}.$$
Then
$$\int_C x^2y\,dx + y^2\,dy = -\frac{\pi}{16} + \frac{1}{3} + \frac{8}{3} = 3 - \frac{\pi}{16}.$$
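The piecewise computation above can be checked symbolically, one smooth piece at a time. A sketch assuming sympy:

```python
import sympy as sp

t, p = sp.symbols('t p', real=True)

# C1: quarter circle x = cos t, y = sin t, 0 <= t <= pi/2
x1, y1 = sp.cos(t), sp.sin(t)
I1 = sp.integrate(x1**2 * y1 * sp.diff(x1, t) + y1**2 * sp.diff(y1, t),
                  (t, 0, sp.pi/2))

# C2: segment x = p, y = 1, 0 <= p <= 2 (so dy = 0)
I2 = sp.integrate(p**2, (p, 0, 2))

# Line integral over the path is the sum over the smooth pieces
total = sp.simplify(I1 + I2)   # 3 - pi/16
```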
It is sometimes useful to think of a line integral in terms of vector operations, particularly in the next section when we deal with potential functions. Consider $\int_C f\,dx + g\,dy + h\,dz$. Form a vector field
$$\mathbf{F}(x, y, z) = f(x, y, z)\mathbf{i} + g(x, y, z)\mathbf{j} + h(x, y, z)\mathbf{k}.$$
If $C$ has coordinate functions $x = x(t)$, $y = y(t)$, $z = z(t)$, we can form the position vector for $C$:
$$\mathbf{R}(t) = x(t)\mathbf{i} + y(t)\mathbf{j} + z(t)\mathbf{k}.$$
At any time $t$, the vector $\mathbf{R}(t)$ can be represented by an arrow from the origin to the point $(x(t), y(t), z(t))$ on $C$. As $t$ varies, this vector pivots about the origin and adjusts its length to sweep out the curve. If $C$ is smooth, then the tangent vector $\mathbf{R}'(t)$ is continuous. Now $d\mathbf{R} = dx\,\mathbf{i} + dy\,\mathbf{j} + dz\,\mathbf{k}$, so
$$\mathbf{F} \cdot d\mathbf{R} = f(x, y, z)\,dx + g(x, y, z)\,dy + h(x, y, z)\,dz$$
and
$$\int_C f(x, y, z)\,dx + g(x, y, z)\,dy + h(x, y, z)\,dz = \int_C \mathbf{F} \cdot d\mathbf{R}.$$
This is just another way of writing a line integral in terms of vector operations.

Line integrals arise in many contexts. For example, consider a force $\mathbf{F}$ causing a particle to move along a smooth curve $C$ having position function $\mathbf{R}(t)$, where $t$ varies from $a$ to $b$. At any point $(x(t), y(t), z(t))$ of $C$, the particle may be thought of as moving in the direction of the tangent to its trajectory, and this tangent is $\mathbf{R}'(t)$. Now $\mathbf{F}(x(t), y(t), z(t)) \cdot \mathbf{R}'(t)$ is the dot product of a force with a direction, and so has the dimensions of work. By integrating this function from $a$ to $b$, we "sum" the work being done by $\mathbf{F}$ over the entire path of motion. This suggests that $\int_C \mathbf{F} \cdot d\mathbf{R}$, or $\int_C f\,dx + g\,dy + h\,dz$, can be interpreted as the work done by $\mathbf{F}$ in moving the particle over the path.
EXAMPLE 13.6
Calculate the work done by $\mathbf{F} = \mathbf{i} - y\mathbf{j} + xyz\mathbf{k}$ in moving a particle from $(0, 0, 0)$ to $(1, -1, 1)$ along the curve $x = t$, $y = -t^2$, $z = t$ for $0 \le t \le 1$. The work is
$$\text{work} = \int_C \mathbf{F} \cdot d\mathbf{R} = \int_C dx - y\,dy + xyz\,dz = \int_0^1 \left[1 + t^2(-2t) - t^4\right]dt = \int_0^1 \left(1 - 2t^3 - t^4\right)dt = \frac{3}{10}.$$
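As a check, the same work integral evaluates cleanly in sympy (assumed available):

```python
import sympy as sp

t = sp.symbols('t', real=True)

# F = i - y j + xyz k along x = t, y = -t^2, z = t, 0 <= t <= 1
xt, yt, zt = t, -t**2, t
f, g, h = sp.Integer(1), -yt, xt*yt*zt

work = sp.integrate(f*sp.diff(xt, t) + g*sp.diff(yt, t) + h*sp.diff(zt, t),
                    (t, 0, 1))
# 3/10, matching the hand computation
```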
The correct units (such as foot-pounds) would have to be provided from context.

Line integrals have some of the usual properties we associate with integrals.
THEOREM 13.1
Let $C$ be a path having position vector $\mathbf{R}$. Let $\mathbf{F}$ and $\mathbf{G}$ be vector fields that are continuous at points of $C$. Then

1. $\displaystyle\int_C (\mathbf{F} + \mathbf{G}) \cdot d\mathbf{R} = \int_C \mathbf{F} \cdot d\mathbf{R} + \int_C \mathbf{G} \cdot d\mathbf{R}$.
2. For any number $\alpha$, $\displaystyle\int_C \alpha\mathbf{F} \cdot d\mathbf{R} = \alpha\int_C \mathbf{F} \cdot d\mathbf{R}$.

This theorem illustrates the efficiency of the vector notation for line integrals. We could also write the conclusion (1) as
$$\int_C (f + f^*)\,dx + (g + g^*)\,dy + (h + h^*)\,dz = \int_C f\,dx + g\,dy + h\,dz + \int_C f^*\,dx + g^*\,dy + h^*\,dz.$$

For Riemann integrals, reversing the limits of integration changes the sign of the integral: $\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx$. The analogous property for line integrals involves reversing orientation on $C$. Given $C$ with an orientation from an initial point $P$ to a terminal point $Q$, let $-C$ denote the curve obtained from $C$ by reversing the orientation to go from $Q$ to $P$ (Figure 13.5). Here is a more careful definition of this orientation reversal.
FIGURE 13.5 Reversing orientation of a curve.
DEFINITION 13.2
Let $C$ be a smooth curve with coordinate functions $x = x(t)$, $y = y(t)$, $z = z(t)$ for $a \le t \le b$. Then $-C$ denotes the curve having coordinate functions
$$\tilde{x}(t) = x(a + b - t), \quad \tilde{y}(t) = y(a + b - t), \quad \tilde{z}(t) = z(a + b - t) \quad \text{for } a \le t \le b.$$

The initial point of $-C$ is
$$(\tilde{x}(a), \tilde{y}(a), \tilde{z}(a)) = (x(b), y(b), z(b)),$$
the terminal point of $C$. And the terminal point of $-C$ is
$$(\tilde{x}(b), \tilde{y}(b), \tilde{z}(b)) = (x(a), y(a), z(a)),$$
the initial point of $C$. By the chain rule, $-C$ is piecewise smooth if $C$ is piecewise smooth. We will now show that the line integral of a vector field over $-C$ is the negative of the line integral of the vector field over $C$.

THEOREM 13.2
Let $C$ be a smooth curve with coordinate functions $x = x(t)$, $y = y(t)$, $z = z(t)$. Let $f$, $g$ and $h$ be continuous on $C$. Then
$$\int_{-C} f\,dx + g\,dy + h\,dz = -\int_C f\,dx + g\,dy + h\,dz.$$

Proof First,
$$\int_C f\,dx + g\,dy + h\,dz = \int_a^b \left[f(x(t), y(t), z(t))\frac{dx}{dt} + g(x(t), y(t), z(t))\frac{dy}{dt} + h(x(t), y(t), z(t))\frac{dz}{dt}\right]dt.$$
Similarly,
$$\int_{-C} f\,dx + g\,dy + h\,dz = \int_a^b \left[f(\tilde{x}(t), \tilde{y}(t), \tilde{z}(t))\frac{d\tilde{x}}{dt} + g(\tilde{x}(t), \tilde{y}(t), \tilde{z}(t))\frac{d\tilde{y}}{dt} + h(\tilde{x}(t), \tilde{y}(t), \tilde{z}(t))\frac{d\tilde{z}}{dt}\right]dt.$$
Change variables in the last integral by putting $s = a + b - t$. When $t = a$, $s = b$, and when $t = b$, $s = a$. Further,
$$\frac{d\tilde{x}}{dt} = \frac{d}{dt}x(a + b - t) = \frac{d}{dt}x(s) = \frac{dx}{ds}\frac{ds}{dt} = -\frac{dx}{ds},$$
and, similarly,
$$\frac{d\tilde{y}}{dt} = -\frac{dy}{ds}, \quad \frac{d\tilde{z}}{dt} = -\frac{dz}{ds}.$$
Finally, $dt = -ds$. Then
$$\int_{-C} f\,dx + g\,dy + h\,dz = \int_b^a \left[-f(x(s), y(s), z(s))\frac{dx}{ds} - g(x(s), y(s), z(s))\frac{dy}{ds} - h(x(s), y(s), z(s))\frac{dz}{ds}\right](-1)\,ds$$
$$= -\int_a^b \left[f(x(s), y(s), z(s))\frac{dx}{ds} + g(x(s), y(s), z(s))\frac{dy}{ds} + h(x(s), y(s), z(s))\frac{dz}{ds}\right]ds = -\int_C f\,dx + g\,dy + h\,dz.$$
In view of this theorem, the easiest way to evaluate $\int_{-C} f\,dx + g\,dy + h\,dz$ is usually to take the negative of $\int_C f\,dx + g\,dy + h\,dz$. We need not actually write the coordinate functions of $-C$, as was done in the proof.
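The sign change under orientation reversal is easy to see concretely. The sketch below (assuming sympy; the curve and field are our own arbitrary choices) parametrizes $-C$ exactly as in Definition 13.2 and confirms the theorem for one example.

```python
import sympy as sp

t = sp.symbols('t', real=True)
a, b = 0, 1

def line_integral(xt, yt, zt):
    """Integral of f dx + g dy + h dz over a curve on [a, b], with the
    arbitrary smooth choices f = x*y, g = z, h = x."""
    f = xt * yt
    g = zt
    h = xt
    return sp.integrate(f*sp.diff(xt, t) + g*sp.diff(yt, t) + h*sp.diff(zt, t),
                        (t, a, b))

# C: x = t, y = t^2, z = t on [0, 1]
I_C = line_integral(t, t**2, t)

# -C uses the coordinate functions x(a + b - t), y(a + b - t), z(a + b - t)
I_negC = line_integral(a + b - t, (a + b - t)**2, a + b - t)
# I_negC equals -I_C, as Theorem 13.2 asserts
```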
EXAMPLE 13.7
A force $\mathbf{F}(x, y, z) = x^2\mathbf{i} - zy\mathbf{j} + x\cos(z)\mathbf{k}$ moves a particle along the path $C$ given by $x = t^2$, $y = t$, $z = t$ for $0 \le t \le 3$. The initial point is $P: (0, 0, 0)$ and the terminal point of $C$ is $Q: (9, 3, 3)$. Suppose we want the work done in moving the particle along this path from $Q$ to $P$. Since we want to go from the terminal to the initial point of $C$, the work done is $\int_{-C} \mathbf{F} \cdot d\mathbf{R}$. However, we do not need to formally define $-C$ in terms of new coordinate functions. We can simply calculate $\int_C \mathbf{F} \cdot d\mathbf{R}$ and take the negative of this. Calculate
$$\int_C \mathbf{F} \cdot d\mathbf{R} = \int_C f\,dx + g\,dy + h\,dz = \int_0^3 \left[t^4(2t) - t(t)(1) + t^2\cos(t)\right]dt = 243 - 9 + 7\sin(3) + 6\cos(3).$$
The work done in moving the particle along the path from $Q$ to $P$ is
$$-243 + 9 - 7\sin(3) - 6\cos(3).$$
13.1.1
Line Integral with Respect to Arc Length
Line integrals with respect to arc length occur in some uses of line integrals. Here is the definition of this kind of line integral.
DEFINITION 13.3
Line Integral With Respect to Arc Length
Let $C$ be a smooth curve with coordinate functions $x = x(t)$, $y = y(t)$, $z = z(t)$ for $a \le t \le b$. Let $\varphi$ be a real-valued function that is continuous on the graph of $C$. Then the integral of $\varphi$ over $C$ with respect to arc length is
$$\int_C \varphi(x, y, z)\,ds = \int_a^b \varphi(x(t), y(t), z(t))\sqrt{x'(t)^2 + y'(t)^2 + z'(t)^2}\,dt.$$

The rationale behind this definition is that the length function along $C$ is
$$s(t) = \int_a^t \sqrt{x'(\xi)^2 + y'(\xi)^2 + z'(\xi)^2}\,d\xi.$$
Then
$$ds = \sqrt{x'(t)^2 + y'(t)^2 + z'(t)^2}\,dt,$$
suggesting the integral in the definition.
EXAMPLE 13.8

Evaluate $\int_C xy\,ds$ over the curve given by
$$x = 4\cos(t), \quad y = 4\sin(t), \quad z = -3 \quad \text{for } 0 \le t \le \pi/2.$$
Compute
$$\int_C xy\,ds = \int_0^{\pi/2} 4\cos(t)\,4\sin(t)\sqrt{16\sin^2(t) + 16\cos^2(t)}\,dt = \int_0^{\pi/2} 64\cos(t)\sin(t)\,dt = 32.$$
Line integrals with respect to arc length occur in calculations of mass, density, and various other quantities for one-dimensional objects. Suppose, for example, we want the mass of a thin wire bent into the shape of a piecewise smooth curve $C$ having coordinate functions
$$x = x(t), \quad y = y(t), \quad z = z(t) \quad \text{for } a \le t \le b.$$
The wire is one-dimensional in the sense that (ideally) it has length but not area or volume. We will derive an expression for the mass of the wire as follows. Let $\delta(x, y, z)$ be the density of the wire at any point. Partition $[a, b]$ into subintervals by inserting points
$$a = t_0 < t_1 < \cdots < t_{n-1} < t_n = b.$$
Choose these points $\Delta t$ units apart, so $t_j - t_{j-1} = \Delta t$. These partition points of $[a, b]$ determine points $P_j: (x(t_j), y(t_j), z(t_j))$ along $C$, as shown in Figure 13.6. Assuming that the density function is continuous, we can choose $\Delta t$ sufficiently small that on the piece of wire between $P_{j-1}$ and $P_j$, the values of the density function are approximated to whatever accuracy we wish by $\delta(P_j)$. The length of the segment of wire between $P_{j-1}$ and $P_j$ is $\Delta s = s(P_j) - s(P_{j-1})$, which is approximated by
$$\Delta s \approx \sqrt{x'(t_j)^2 + y'(t_j)^2 + z'(t_j)^2}\,\Delta t.$$
The mass of the piece of wire between $P_{j-1}$ and $P_j$ is therefore approximately the "nearly" constant value of the density on this piece, times the length of this piece of wire, this product being
$$\delta(x(t_j), y(t_j), z(t_j))\sqrt{x'(t_j)^2 + y'(t_j)^2 + z'(t_j)^2}\,\Delta t.$$

FIGURE 13.6

The mass of the entire length of wire is approximately the sum of the masses of these pieces:
$$\text{mass} \approx \sum_{j=1}^n \delta(x(t_j), y(t_j), z(t_j))\sqrt{x'(t_j)^2 + y'(t_j)^2 + z'(t_j)^2}\,\Delta t,$$
in which $\approx$ means "approximately equal". Recognize this as the Riemann sum for a definite integral to obtain, in the limit as $\Delta t \to 0$,
$$\text{mass} = \int_a^b \delta(x(t), y(t), z(t))\sqrt{x'(t)^2 + y'(t)^2 + z'(t)^2}\,dt = \int_C \delta(x, y, z)\,ds.$$
A similar argument leads to coordinates $(\bar{x}, \bar{y}, \bar{z})$ of the center of mass of the wire:
$$\bar{x} = \frac{1}{m}\int_C x\,\delta(x, y, z)\,ds, \quad \bar{y} = \frac{1}{m}\int_C y\,\delta(x, y, z)\,ds, \quad \bar{z} = \frac{1}{m}\int_C z\,\delta(x, y, z)\,ds,$$
in which $m$ is the mass of the wire.
EXAMPLE 13.9
A wire is bent into the shape of the quarter circle $C$ given by
$$x = 2\cos(t), \quad y = 2\sin(t), \quad z = 3 \quad \text{for } 0 \le t \le \pi/2.$$
The density function is $\delta(x, y, z) = xy^2$ grams/centimeter. We want the mass and center of mass of the wire. The mass is
$$m = \int_C xy^2\,ds = \int_0^{\pi/2} 2\cos(t)(2\sin(t))^2\sqrt{4\sin^2(t) + 4\cos^2(t)}\,dt = \int_0^{\pi/2} 16\cos(t)\sin^2(t)\,dt = \frac{16}{3} \text{ grams}.$$
Now compute the coordinates of the center of mass. First,
$$\bar{x} = \frac{1}{m}\int_C x\,\delta(x, y, z)\,ds = \frac{3}{16}\int_0^{\pi/2} (2\cos(t))^2(2\sin(t))^2\sqrt{4\sin^2(t) + 4\cos^2(t)}\,dt = 6\int_0^{\pi/2} \cos^2(t)\sin^2(t)\,dt = \frac{3\pi}{8}.$$
Next,
$$\bar{y} = \frac{1}{m}\int_C y\,\delta(x, y, z)\,ds = \frac{3}{16}\int_0^{\pi/2} 2\cos(t)(2\sin(t))^3\sqrt{4\sin^2(t) + 4\cos^2(t)}\,dt = 6\int_0^{\pi/2} \cos(t)\sin^3(t)\,dt = \frac{3}{2}.$$
Finally,
$$\bar{z} = \frac{1}{m}\int_C z\,\delta(x, y, z)\,ds = \frac{3}{16}\int_0^{\pi/2} 3 \cdot 2\cos(t)(2\sin(t))^2\sqrt{4\sin^2(t) + 4\cos^2(t)}\,dt = 9\int_0^{\pi/2} \cos(t)\sin^2(t)\,dt = 3.$$
The last result could have been anticipated, since the $z$-component on the curve is constant. The center of mass is $(3\pi/8, 3/2, 3)$.
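All four integrals in this example follow the same pattern, so they can be generated in a short loop. A sympy sketch (sympy assumed available):

```python
import sympy as sp

t = sp.symbols('t', real=True)

# Wire: x = 2 cos t, y = 2 sin t, z = 3 for 0 <= t <= pi/2; density xy^2
xt, yt, zt = 2*sp.cos(t), 2*sp.sin(t), sp.Integer(3)
ds = sp.simplify(sp.sqrt(sp.diff(xt, t)**2 + sp.diff(yt, t)**2 + sp.diff(zt, t)**2))
rho = xt * yt**2

m = sp.integrate(rho * ds, (t, 0, sp.pi/2))              # 16/3
xbar, ybar, zbar = [sp.integrate(c * rho * ds, (t, 0, sp.pi/2)) / m
                    for c in (xt, yt, zt)]
# (xbar, ybar, zbar) = (3*pi/8, 3/2, 3)
```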
SECTION 13.1
PROBLEMS
In each of Problems 1 through 15, evaluate the line integral.

1. $\int_C x\,dx - dy + z\,dz$, with $C$ given by $x(t) = t$, $y(t) = t$, $z(t) = t^3$ for $0 \le t \le 1$
2. $\int_C -4x\,dx + y^2\,dy - yz\,dz$, with $C$ given by $x(t) = -t^2$, $y(t) = 0$, $z(t) = -3t$ for $0 \le t \le 1$
3. $\int_C (x + y)\,ds$, where $C$ is given by $x = y = t$, $z = t^2$ for $0 \le t \le 2$
4. $\int_C x^2z\,ds$, where $C$ is the line segment from $(0, 1, 1)$ to $(1, 2, -1)$
5. $\int_C \mathbf{F} \cdot d\mathbf{R}$, where $\mathbf{F} = \cos(x)\mathbf{i} - y\mathbf{j} + xz\mathbf{k}$ and $\mathbf{R} = t\mathbf{i} - t^2\mathbf{j} + \mathbf{k}$ for $0 \le t \le 3$
6. $\int_C 4xy\,ds$, with $C$ given by $x = y = t$, $z = 2t$ for $1 \le t \le 2$
7. $\int_C \mathbf{F} \cdot d\mathbf{R}$, with $\mathbf{F} = x\mathbf{i} + y\mathbf{j} - z\mathbf{k}$ and $C$ the circle $x^2 + y^2 = 4$, $z = 0$, going around once counterclockwise
8. $\int_C yz\,ds$, with $C$ the parabola $z = y^2$, $x = 1$ for $0 \le y \le 2$
9. $\int_C -xyz\,dz$, with $C$ the curve $x = 1$, $y = \sqrt{z}$ for $4 \le z \le 9$
10. $\int_C xz\,dy$, with $C$ the curve $x = y = t$, $z = -4t^2$ for $1 \le t \le 3$
11. $\int_C 8z^2\,ds$, with $C$ the curve $x = y = 2t^2$, $z = 1$ for $1 \le t \le 2$
12. $\int_C \mathbf{F} \cdot d\mathbf{R}$, with $\mathbf{F} = \mathbf{i} - x\mathbf{j} + \mathbf{k}$ and $\mathbf{R} = \cos(t)\mathbf{i} - \sin(t)\mathbf{j} + t\mathbf{k}$ for $0 \le t \le \pi$
13. $\int_C 8x^2\,dy$, with $C$ given by $x = e^t$, $y = -t^2$, $z = t$ for $1 \le t \le 2$
14. $\int_C x\,dy - y\,dx$, with $C$ the curve $x = y = 2t$, $z = e^{-t}$ for $0 \le t \le 3$
15. $\int_C \sin(x)\,ds$, with $C$ given by $x = t$, $y = 2t$, $z = 3t$ for $1 \le t \le 3$

16. Find the mass and center of mass of a thin, straight wire from the origin to $(3, 3, 3)$ if $\delta(x, y, z) = x + y + z$ grams per centimeter.
17. Find the work done by $\mathbf{F} = x^2\mathbf{i} - 2yz\mathbf{j} + z\mathbf{k}$ in moving an object along the line segment from $(1, 1, 1)$ to $(4, 4, 4)$.
18. Show that any Riemann integral $\int_a^b f(x)\,dx$ is equal to a line integral $\int_C \mathbf{F} \cdot d\mathbf{R}$ for appropriate choices of $\mathbf{F}$ and $C$. In this sense the line integral generalizes the Riemann integral.

13.2
Green’s Theorem Green’s theorem was developed independently by the self-taught British amateur natural philosopher George Green and the Ukrainian mathematician Michel Ostrogradsky. They were studying potential theory (electric potentials, potential functions), and they obtained an important relationship between double integrals and line integrals in the plane.
13.2 Green’s Theorem
529
FIGURE 13.7 Graph of a curve that is not simple. FIGURE 13.8 The Jordan curve theorem.
Let $C$ be a piecewise smooth curve in the plane, having coordinate functions $x = x(t)$, $y = y(t)$ for $a \le t \le b$. We will be interested in this section in $C$ being a closed curve, so the initial and terminal points coincide. $C$ is positively oriented if $(x(t), y(t))$ moves around $C$ counterclockwise as $t$ varies from $a$ to $b$. If $(x(t), y(t))$ moves clockwise, then we say that $C$ is negatively oriented.

For example, let $x(t) = \cos(t)$ and $y(t) = \sin(t)$ for $0 \le t \le 2\pi$. Then $(x(t), y(t))$ moves counterclockwise once around the unit circle as $t$ varies from 0 to $2\pi$, so $C$ is positively oriented. If, however, $K$ has coordinate functions $x(t) = -\cos(t)$ and $y(t) = \sin(t)$ for $0 \le t \le 2\pi$, then $K$ is negatively oriented, because now $(x(t), y(t))$ moves in a clockwise sense. However, $C$ and $K$ have the same graph. A closed curve in the plane is positively oriented if, as you walk around it, the region it encloses is over your left shoulder.

A curve is simple if the same point cannot be on the graph for different values of the parameter. This means that $x(t_1) = x(t_2)$ and $y(t_1) = y(t_2)$ can occur only if $t_1 = t_2$. If we envision the graph of a curve as a train track, this means that the train does not return to the same location at a later time. Figure 13.7 shows the graph of a curve that is not simple. This would prevent a closed curve from being simple, but we make an exception of the initial and terminal points. If these are the only points obtained for different values of the parameter, then a closed curve is also called simple. For example, the equations $x = \cos(t)$ and $y = \sin(t)$ for $0 \le t \le 2\pi$ describe a simple closed curve. However, consider $M$ given by $x = \cos(t)$, $y = \sin(t)$ for $0 \le t \le 4\pi$. This is a closed curve, beginning and ending at $(1, 0)$, but $(x(t), y(t))$ traverses the unit circle twice counterclockwise as $t$ varies from 0 to $4\pi$. $M$ is a closed curve but it is not simple.

It is a subtle theorem of topology, the Jordan curve theorem, that a simple closed curve $C$ in the plane separates the plane into two regions having $C$ as common boundary.
One region contains points arbitrarily far from the origin, and is called the exterior of $C$. The other region is called the interior of $C$. These regions are displayed for a typical closed curve in Figure 13.8. The interior of $C$ has finite area, while the exterior does not.

Finally, when a line integral is taken around a closed curve, we often use the symbol $\oint_C$ in place of $\int_C$. This notation is optional, and is simply a reminder that $C$ is closed. It does not alter in any way the meaning of the integral. We are now ready to state the first fundamental theorem of vector integral calculus. Recall that a path is a piecewise smooth curve (having a continuous tangent at all but finitely many points).
THEOREM 13.3
Green's Theorem

Let C be a simple closed positively oriented path in the plane. Let D consist of all points on C and in its interior. Let f, g, ∂g/∂x and ∂f/∂y be continuous on D. Then

∮_C f(x, y) dx + g(x, y) dy = ∬_D (∂g/∂x − ∂f/∂y) dA
530
CHAPTER 13
Vector Integral Calculus
The significance of Green’s theorem is that it relates an object that deals with a curve, which is one-dimensional, to an object related to a planar region, which is two-dimensional. This will have important implications when we discuss independence of path of line integrals in the next section, and later when we develop partial differential equations and complex analysis. Green’s theorem will also lead shortly to Stokes’s theorem and Gauss’s theorem, which are its generalizations to 3-space. We will prove the theorem under restricted conditions at the end of this section. For now, here are two computational examples.
EXAMPLE 13.10
Sometimes we use Green's theorem as a computational aid to convert one kind of integral into another, possibly simpler, one. As an illustration, suppose we want to compute the work done by the force F(x, y) = (y − x² e^x) i + (cos(2y²) − x) j in moving a particle about the rectangular path C of Figure 13.9. If you try to evaluate ∮_C F · dR as a sum of line integrals over the straight line sides of this rectangle, you will find that the integrations cannot be done in elementary form. However, apply Green's theorem, with D the region bounded by the rectangle. We obtain

work = ∮_C F · dR = ∬_D [∂/∂x (cos(2y²) − x) − ∂/∂y (y − x² e^x)] dA = ∬_D −2 dA = −2(area of D) = −4
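The value −4 can also be confirmed directly. The following sketch (an illustrative addition, not part of the text; it assumes a standard Python environment) evaluates ∮_C F · dR numerically over the four sides of the rectangle with composite midpoint sums:

```python
import math

# Illustrative check of Example 13.10 (added sketch; assumes standard Python).
# Integrate F . dR counterclockwise around the rectangle with vertices
# (0, 1), (1, 1), (1, 3), (0, 3) using composite midpoint sums on each side.

def f(x, y):
    # i-component of F
    return y - x**2 * math.exp(x)

def g(x, y):
    # j-component of F
    return math.cos(2 * y**2) - x

N = 50_000  # subintervals per side

def midpoint(func, a, b):
    # midpoint-rule approximation of the integral of func from a to b
    h = (b - a) / N
    return h * sum(func(a + (k + 0.5) * h) for k in range(N))

work = (midpoint(lambda x: f(x, 1), 0, 1)    # bottom: y = 1, x from 0 to 1
        + midpoint(lambda y: g(1, y), 1, 3)  # right:  x = 1, y from 1 to 3
        + midpoint(lambda x: f(x, 3), 1, 0)  # top:    y = 3, x from 1 to 0
        + midpoint(lambda y: g(0, y), 3, 1)) # left:   x = 0, y from 3 to 1

print(work)  # close to -4, matching -2 * (area of D)
```

Even though each side's integrand has no elementary antiderivative, the numerical sum agrees with the value Green's theorem produces almost for free.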
EXAMPLE 13.11
Another typical use of Green's theorem is in deriving very general results. To illustrate, suppose we want to evaluate

∮_C 2x cos(2y) dx − 2x² sin(2y) dy

for every positively oriented simple closed path C in the plane. This may appear to be a daunting task, since there are infinitely many different such paths. However, observe the form of f(x, y) and g(x, y) in the line integral. In particular,

∂/∂x (−2x² sin(2y)) − ∂/∂y (2x cos(2y)) = −4x sin(2y) + 4x sin(2y) = 0

for all x and y. Therefore, Green's theorem gives us

∮_C 2x cos(2y) dx − 2x² sin(2y) dy = ∬_D 0 dA = 0
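The key observation, that the Green's-theorem integrand vanishes identically, can be verified symbolically. This is an illustrative sketch assuming the SymPy library is available:

```python
import sympy as sp

# Illustrative sketch (assumes SymPy): the Green's-theorem integrand for
# Example 13.11 vanishes identically, so the line integral is 0 on every
# positively oriented simple closed path.
x, y = sp.symbols('x y')
f = 2 * x * sp.cos(2 * y)        # coefficient of dx
g = -2 * x**2 * sp.sin(2 * y)    # coefficient of dy

integrand = sp.diff(g, x) - sp.diff(f, y)
print(sp.simplify(integrand))  # 0
```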
In the next section we will see how the vanishing of this line integral for any closed curve allows an important conclusion about line integrals ∫_K 2x cos(2y) dx − 2x² sin(2y) dy when K is not closed. We will conclude this section with a proof of Green's theorem under special conditions on the region bounded by C. Assume that D can be described in two ways.
FIGURE 13.9 The rectangular path C with vertices (0, 1), (1, 1), (1, 3), (0, 3).

FIGURE 13.10 The region D bounded below by y = h(x) (x from a to b) and above by y = k(x) (x from b to a).
First, D consists of all points (x, y) with a ≤ x ≤ b and, for each x, h(x) ≤ y ≤ k(x). Graphs of the curves y = h(x) and y = k(x) form, respectively, the lower and upper parts of the boundary of D (see Figure 13.10). Second, D consists of all points (x, y) with c ≤ y ≤ d and, for each y, F(y) ≤ x ≤ G(y). In this description, the graphs of x = F(y) and x = G(y) form, respectively, the left and right parts of the boundary of D (see Figure 13.11). Using these descriptions of D and the boundary of D, we can demonstrate Green's theorem by evaluating the integrals involved. First, let C1 be the lower part of C (graph of y = h(x)) and C2 the upper part (graph of y = k(x)). Then

∫_C f(x, y) dx = ∫_{C1} f(x, y) dx + ∫_{C2} f(x, y) dx = ∫_a^b f(x, h(x)) dx + ∫_b^a f(x, k(x)) dx = −∫_a^b [f(x, k(x)) − f(x, h(x))] dx

The upper and lower limits of integration in the second line maintain a counterclockwise orientation on C.

FIGURE 13.11 The region D bounded on the left by x = F(y) and on the right by x = G(y), for c ≤ y ≤ d.
Now compute

∬_D (∂f/∂y) dA = ∫_a^b ∫_{h(x)}^{k(x)} (∂f/∂y) dy dx = ∫_a^b [f(x, y)]_{y=h(x)}^{y=k(x)} dx = ∫_a^b [f(x, k(x)) − f(x, h(x))] dx

Therefore

∫_C f(x, y) dx = −∬_D (∂f/∂y) dA

Using the other description of D, a similar computation shows that

∫_C g(x, y) dy = ∬_D (∂g/∂x) dA

Upon adding the last two equations we obtain the conclusion of Green's theorem.
SECTION 13.2
PROBLEMS
1. A particle moves once counterclockwise about the triangle with vertices (0, 0), (4, 0) and (1, 6), under the influence of the force F = xy i + x j. Calculate the work done by this force.
2. A particle moves once counterclockwise about the circle of radius 6 about the origin, under the influence of the force F = (e^x − y + x cosh(x)) i + (y^{3/2} + x) j. Calculate the work done.
3. A particle moves once counterclockwise about the rectangle with vertices (1, 1), (1, 7), (3, 1) and (3, 7), under the influence of the force F = (−cosh(4x⁴) + xy) i + (e^{−y} + x) j. Calculate the work done.

In each of Problems 4 through 11, use Green's theorem to evaluate ∮_C F · dR. All curves are oriented counterclockwise.

4. F = 2y i − x j, C the circle of radius 4 about (1, 3)
5. F = x² i − 2xy j, C the triangle with vertices (1, 1), (4, 1), (2, 6)
6. F = (x + y) i + (x − y) j, C the ellipse x² + 4y² = 1
7. F = 8xy² j, C the circle of radius 4 about the origin
8. F = (x² − y) i + (cos(2y) − e^{3y} + 4x) j, with C any square with sides of length 5
9. F = e^x cos(y) i − e^x sin(y) j, C any simple closed piecewise smooth curve in the plane
10. F = x²y i − xy² j, C the boundary of the region x² + y² ≤ 4, x ≥ 0, y ≥ 0
11. F = xy i + (xy² − e^{cos(y)}) j, C the triangle with vertices (0, 0), (3, 0), (0, 5)
12. Let C be a positively oriented simple closed path with interior D.
(a) Show that the area of D equals ∮_C −y dx.
(b) Show that the area of D equals ∮_C x dy.
(c) Show that the area of D equals (1/2) ∮_C (−y dx + x dy).
13. Let u(x, y) be continuous with continuous first and second partial derivatives on a simple closed path C and throughout the interior D of C. Show that

∮_C [−(∂u/∂y) dx + (∂u/∂x) dy] = ∬_D (∂²u/∂x² + ∂²u/∂y²) dA
13.2.1 An Extension of Green's Theorem

There is an extension of Green's theorem to include the case that there are finitely many points enclosed by C at which f, g, ∂f/∂y and/or ∂g/∂x are not continuous, or perhaps are not even defined. The idea is to excise the "bad points," as we will now describe.
Suppose C is a simple closed positively oriented path in the plane enclosing a region D. Suppose f, g, ∂f/∂y and ∂g/∂x are continuous on C, and throughout D except at points P1, …, Pn. Green's theorem does not apply to this region. But with a little imagination we can still draw an interesting conclusion. Enclose each Pj with a circle Kj of sufficiently small radius that none of these circles intersects either C or each other (Figure 13.12). Next, cut a channel in D from C to K1, then from K1 to K2, and so on until finally a channel is cut from Kn−1 to Kn. A typical case is shown in Figure 13.13. Form the closed path C* consisting of C (with a small segment cut out where the channel to K1 was made), each of the Kj's (with small cuts removed where the channels entered and exited), and the segments forming the connections between C and the successive Kj's. Figure 13.14 shows C*, which encloses the region D*. By the way C* was formed, the points P1, …, Pn are external to C* (Figure 13.15). Further, f, g, ∂f/∂y and ∂g/∂x are continuous on C* and throughout D*. We can therefore apply Green's theorem to C* and D* to conclude that

∮_{C*} f(x, y) dx + g(x, y) dy = ∬_{D*} (∂g/∂x − ∂f/∂y) dA    (13.2)

Now imagine that the channels that were cut become narrower, merging to form segments between C and successive Kj's. Then C* approaches the curve Ĉ of Figure 13.16, and D* approaches the region D̂ shown in Figure 13.17. D̂ consists of D with the disks bounded by K1, …, Kn cut out. In this limit process, equation (13.2) approaches

∮_C f(x, y) dx + g(x, y) dy + Σ_{j=1}^{n} ∮_{Kj} f(x, y) dx + g(x, y) dy = ∬_{D̂} (∂g/∂x − ∂f/∂y) dA
FIGURE 13.12 Circles K1, …, Kn about the points P1, …, Pn, none intersecting C or each other.

FIGURE 13.13 Channels cut from C to K1, from K1 to K2, …, and from Kn−1 to Kn.

FIGURE 13.14 The closed path C*.

FIGURE 13.15 The points P1, …, Pn are external to C*.
FIGURE 13.16 The limit curve Ĉ.

FIGURE 13.17 The limit region D̂: D with the disks bounded by K1, …, Kn cut out.
On the left side of this equation, line integrals over the internal segments connecting C and the Kj's cancel, because the integration is carried out twice over each segment, once in each direction. Further, the orientation on C is counterclockwise, but the orientation on each Kj is clockwise because of the way the boundaries were traversed (Figure 13.14). If we reverse the orientations on these circles, the line integrals over them change sign and we can write

∮_C f(x, y) dx + g(x, y) dy = Σ_{j=1}^{n} ∮_{Kj} f(x, y) dx + g(x, y) dy + ∬_{D̂} (∂g/∂x − ∂f/∂y) dA

in which all the integrals are in the positive, counterclockwise sense about the curves C and K1, …, Kn. This is the generalization of Green's theorem that we sought.
EXAMPLE 13.12
Suppose we are interested in

∮_C [−y/(x² + y²)] dx + [x/(x² + y²)] dy

with C any simple closed positively oriented path in the plane, but not passing through the origin. With

f(x, y) = −y/(x² + y²) and g(x, y) = x/(x² + y²)

we have

∂g/∂x = (y² − x²)/(x² + y²)² = ∂f/∂y

f, g, ∂f/∂y and ∂g/∂x are continuous at every point of the plane except the origin. This leads us to consider two cases.

Case 1: C does not enclose the origin. Now Green's theorem applies and

∮_C [−y/(x² + y²)] dx + [x/(x² + y²)] dy = ∬_D (∂g/∂x − ∂f/∂y) dA = 0
FIGURE 13.18 A circle K about the origin, with radius small enough that K does not intersect C.
Case 2: C encloses the origin. Draw a circle K centered at the origin, with radius sufficiently small that K does not intersect C (Figure 13.18). By the extension of Green's theorem,

∮_C f(x, y) dx + g(x, y) dy = ∮_K f(x, y) dx + g(x, y) dy + ∬_{D̂} (∂g/∂x − ∂f/∂y) dA = ∮_K f(x, y) dx + g(x, y) dy

where D̂ is the region between K and C. Both of these line integrals are in the counterclockwise sense about the respective curves. The last line integral can be evaluated explicitly because we know K. Parametrize K by

x = r cos(θ), y = r sin(θ) for 0 ≤ θ ≤ 2π

Then

∮_K f(x, y) dx + g(x, y) dy = ∫_0^{2π} [(−r sin(θ)/r²)(−r sin(θ)) + (r cos(θ)/r²)(r cos(θ))] dθ = ∫_0^{2π} dθ = 2π

We conclude that

∮_C f(x, y) dx + g(x, y) dy = 0 if C does not enclose the origin, and 2π if C encloses the origin.

SECTION 13.2 PROBLEMS

In each of Problems 1 through 5, evaluate ∮_C F · dR over any simple closed path in the x, y plane that does not pass through the origin.

1. F = [x/(x² + y²)] i + [y/(x² + y²)] j
2. F = (x² + y²)^{−3/2} (x i + y j)
3. F = [−y/(x² + y²)] i + [x/(x² + y²) + x − 2y] j
4. F = [−y/(x² + y²) + 3x] i + [x/(x² + y²) − y] j
5. F = [x/(x² + y²) + 2x] i + [y/(x² + y²) − 3y] j
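The two cases of Example 13.12 can be checked numerically. This sketch (an illustrative addition, assuming a standard Python environment; the circle centers and radius are arbitrary choices) integrates (−y dx + x dy)/(x² + y²) around a parametrized circle:

```python
import math

# Illustrative numerical check (assumes standard Python) of the two cases:
# integrate (-y dx + x dy)/(x^2 + y^2) around a circle of radius r
# centered at (cx, cy), traversed counterclockwise.
def loop_integral(cx, cy, r, n=50_000):
    total = 0.0
    h = 2 * math.pi / n
    for k in range(n):
        t = (k + 0.5) * h
        x, y = cx + r * math.cos(t), cy + r * math.sin(t)
        dxdt, dydt = -r * math.sin(t), r * math.cos(t)
        total += (-y * dxdt + x * dydt) / (x**2 + y**2) * h
    return total

print(loop_integral(3.0, 0.0, 1.0))  # close to 0: origin not enclosed
print(loop_integral(0.0, 0.0, 1.0))  # close to 2*pi: origin enclosed
```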
13.3 Independence of Path and Potential Theory in the Plane

In physics, a conservative force field is one that is derivable from a potential. We will use the same terminology.
DEFINITION 13.4
Conservative Vector Field
Let D be a set of points in the plane. A vector field F(x, y) is conservative on D if for some real-valued φ(x, y), F = ∇φ for all (x, y) in D. In this event, φ is a potential function for F on D.
If φ is a potential function for F, then so is φ + c for any constant c, because ∇(φ + c) = ∇φ. For this reason we often speak of a potential function for F, rather than the potential function. Recall that, if F(x, y) = f(x, y) i + g(x, y) j and R(t) = x(t) i + y(t) j is a position function for C, then ∫_C F · dR is another way of writing ∫_C f(x, y) dx + g(x, y) dy. We will make frequent use of the notation ∫_C F · dR throughout this section because we want to examine the effect on this integral when F has a potential function, and for this we will use vector notation. First, the line integral of a conservative vector field can be evaluated directly in terms of a potential function. For suppose C is smooth, with coordinate functions x = x(t), y = y(t) for a ≤ t ≤ b. If F = ∇φ, then

F(x, y) = (∂φ/∂x) i + (∂φ/∂y) j

and

∫_C F · dR = ∫_C (∂φ/∂x) dx + (∂φ/∂y) dy = ∫_a^b [(∂φ/∂x)(dx/dt) + (∂φ/∂y)(dy/dt)] dt = ∫_a^b (d/dt) φ(x(t), y(t)) dt = φ(x(b), y(b)) − φ(x(a), y(a))

Denoting P1 = (x(b), y(b)) and P0 = (x(a), y(a)), this result states that

∫_C F · dR = φ(P1) − φ(P0) = φ(terminal point of C) − φ(initial point of C)    (13.3)

The line integral of a conservative vector field over a path is the difference in values of a potential function at the end points of the path. This is familiar from physics. If a particle moves along a path under the influence of a conservative force field, then the work done is equal to the difference in the potential energy at the ends of the path. One ramification of equation (13.3) is that the actual path itself does not influence the outcome, only the end points of the path. If we chose a different path K between the same end points, we would obtain the same result for ∫_K F · dR. This suggests the concept of independence of path of a line integral.
DEFINITION 13.5
Independence of Path
∫_C F · dR is independent of path on a set D of points in the plane if for any points P0 and P1 in D, the line integral has the same value over any path in D having initial point P0 and terminal point P1.
The discussion preceding the definition may now be summarized.

THEOREM 13.4

Let φ and its first partial derivatives be continuous for all (x, y) in a set D of points in the plane. Let F = ∇φ. Then ∫_C F · dR is independent of path in D. Further, if C is a simple closed path in D, then ∮_C F · dR = 0.

The independence of path follows from equation (13.3), which states that, when F = ∇φ, the value of ∫_C F · dR depends only on the values of φ(x, y) at the end points of the path, and not on where the path goes in between. For the last conclusion of the theorem, if C is a closed path in D, then the initial and terminal points coincide, hence the difference between the values of φ at the terminal and initial points is zero.
EXAMPLE 13.13
Let F(x, y) = 2x cos(2y) i − 2x² sin(2y) j. It is routine to check that φ(x, y) = x² cos(2y) is a potential function for F. Since φ is continuous with continuous partial derivatives over the entire plane, we can let D consist of all points in the plane in the definition of independence of path. For example, if C is any path in the plane from (0, 0) to (1, π/8), then

∫_C F · dR = φ(1, π/8) − φ(0, 0) = cos(π/4) = √2/2

Further, if K is any simple closed path in D, then ∮_K F · dR = 0.

It is clearly to our advantage to know whether a vector field is conservative, and, if it is, to be able to produce a potential function. Let F(x, y) = f(x, y) i + g(x, y) j. F is conservative exactly when, for some φ,

F = ∇φ = (∂φ/∂x) i + (∂φ/∂y) j

and this requires that

∂φ/∂x = f(x, y) and ∂φ/∂y = g(x, y)

To attempt to find such a φ, begin with either of these equations and integrate with respect to the variable of the derivative, keeping the other variable fixed. The constant of integration is then actually a function of the other (fixed) variable. Finally, use the second equation to attempt to find this function.
EXAMPLE 13.14
Consider the vector field

F(x, y) = 2x cos(2y) i − (2x² sin(2y) + 4y²) j

We want a real-valued function φ such that

∂φ/∂x = 2x cos(2y) and ∂φ/∂y = −2x² sin(2y) − 4y²

Choose one of these equations. If we pick the first, then integrate with respect to x, holding y fixed:

φ(x, y) = ∫ 2x cos(2y) dx = x² cos(2y) + g(y)

The "constant" of integration is allowed to involve y because we are reversing a partial derivative, and for any function g of y,

∂/∂x [x² cos(2y) + g(y)] = 2x cos(2y)

as we require. We now have φ(x, y) to within some function g(y). From the second equation we need

∂φ/∂y = −2x² sin(2y) − 4y² = ∂/∂y [x² cos(2y) + g(y)]

Then

−2x² sin(2y) − 4y² = −2x² sin(2y) + g′(y)

so

g′(y) = −4y²

Choose g(y) = −4y³/3 to obtain the potential function

φ(x, y) = x² cos(2y) − (4/3)y³

It is easy to check that F = ∇φ.

Is every vector field in the plane conservative? As the following example shows, the answer is no.
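The potential just found can be checked mechanically. A short SymPy sketch (an illustrative addition; it assumes SymPy is installed):

```python
import sympy as sp

# Illustrative check (assumes SymPy) that the potential found in Example 13.14
# really satisfies grad(phi) = F.
x, y = sp.symbols('x y')
phi = x**2 * sp.cos(2 * y) - sp.Rational(4, 3) * y**3

dphi_dx = sp.diff(phi, x)  # should be 2x cos(2y)
dphi_dy = sp.diff(phi, y)  # should be -2x^2 sin(2y) - 4y^2

print(sp.simplify(dphi_dx - 2 * x * sp.cos(2 * y)))                # 0
print(sp.simplify(dphi_dy + 2 * x**2 * sp.sin(2 * y) + 4 * y**2))  # 0
```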
EXAMPLE 13.15
Let F(x, y) = (2xy² + y) i + (2x²y + e^{xy}) j. If this vector field were conservative, there would be a potential φ such that

∂φ/∂x = 2xy² + y and ∂φ/∂y = 2x²y + e^{xy}

Integrate the first equation with respect to x to get

φ(x, y) = ∫ (2xy² + y) dx = x²y² + xy + f(y)

From the second equation,

∂φ/∂y = 2x²y + e^{xy} = ∂/∂y [x²y² + xy + f(y)] = 2x²y + x + f′(y)

But this would imply that

f′(y) = e^{xy} − x

and we cannot find a function of y alone satisfying this equation. Therefore F has no potential.

Because not every vector field is conservative, we need some test to determine whether or not a given vector field is conservative. The following theorem provides such a test.

THEOREM 13.5
Test for a Conservative Field

Let f and g be continuous in a region D bounded by a rectangle having its sides parallel to the axes. Then F(x, y) = f(x, y) i + g(x, y) j is conservative on D if and only if, for all (x, y) in D,

∂f/∂y = ∂g/∂x    (13.4)

Sometimes the conditions of the theorem hold throughout the plane, and in this event the vector field is conservative for all (x, y) when equation (13.4) is satisfied.
EXAMPLE 13.16
Consider again F(x, y) = (2xy² + y) i + (2x²y + e^{xy}) j, from Example 13.15. Compute

∂f/∂y = 4xy + 1 and ∂g/∂x = 4xy + y e^{xy}

and these are unequal on any rectangular region of the plane. This vector field is not conservative. We showed in Example 13.15 that no potential function can exist for this field.
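The test is easy to automate. A SymPy sketch of the computation in this example (an illustrative addition, assuming SymPy is available):

```python
import sympy as sp

# Illustrative sketch (assumes SymPy) of the test in Example 13.16.
x, y = sp.symbols('x y')
f = 2 * x * y**2 + y                 # i-component of F
g = 2 * x**2 * y + sp.exp(x * y)     # j-component of F

df_dy = sp.diff(f, y)  # 4xy + 1
dg_dx = sp.diff(g, x)  # 4xy + y*exp(xy)

# The difference is not identically zero, so F fails condition (13.4)
# and cannot be conservative on any rectangular region.
print(sp.simplify(df_dy - dg_dx))
```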
13.3.1
A More Critical Look at Theorem 13.5
The condition (13.4) derived in Theorem 13.5 can be written

∂g/∂x − ∂f/∂y = 0

But the combination ∂g/∂x − ∂f/∂y also occurs in Green's theorem. This must be more than coincidence. In this section we will explore connections between independence of path, Green's theorem, condition (13.4), and existence of a potential function. The following example is instructive. Let D consist of all points in the plane except the origin. Thus, D is the plane with the origin punched out. Let

F(x, y) = [−y/(x² + y²)] i + [x/(x² + y²)] j = f(x, y) i + g(x, y) j
Then F is defined on D, and f and g are continuous with continuous partial derivatives on D. Further, we saw in Example 13.12 that

∂f/∂y − ∂g/∂x = 0 for (x, y) in D

Now evaluate ∫_C f(x, y) dx + g(x, y) dy over two paths from (1, 0) to (−1, 0). First, let C be the top half of the unit circle, given by

x = cos(θ), y = sin(θ) for 0 ≤ θ ≤ π

Then

∫_C f(x, y) dx + g(x, y) dy = ∫_0^π [(−sin(θ))(−sin(θ)) + cos(θ) cos(θ)] dθ = ∫_0^π dθ = π

Next, let K be the path from (1, 0) to (−1, 0) along the bottom half of the unit circle, given by

x = cos(θ), y = −sin(θ) for 0 ≤ θ ≤ π

Then

∫_K f(x, y) dx + g(x, y) dy = ∫_0^π [sin(θ)(−sin(θ)) + cos(θ)(−cos(θ))] dθ = −∫_0^π dθ = −π
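The values π and −π can be confirmed numerically. An illustrative sketch, assuming a standard Python environment:

```python
import math

# Illustrative numerical check (assumes standard Python): integrate
# f dx + g dy over the upper and lower unit semicircles from (1, 0)
# to (-1, 0), where f = -y/(x^2 + y^2) and g = x/(x^2 + y^2).
def semicircle_integral(sign, n=50_000):
    # sign = +1: upper half (y = sin t); sign = -1: lower half (y = -sin t)
    total, h = 0.0, math.pi / n
    for k in range(n):
        t = (k + 0.5) * h
        x, y = math.cos(t), sign * math.sin(t)
        dxdt, dydt = -math.sin(t), sign * math.cos(t)
        total += (-y * dxdt + x * dydt) / (x**2 + y**2) * h
    return total

print(semicircle_integral(+1))  # close to  pi
print(semicircle_integral(-1))  # close to -pi
```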
This means that ∫_C f(x, y) dx + g(x, y) dy is not independent of path in D. The path chosen between two given points makes a difference. This also means that F is not conservative over D. There is no potential function for F (by Theorem 13.4, if there were a potential function, then the line integral would have to be independent of path). This example suggests that there is something about the conditions specified on the set D in Theorem 13.5 that makes a difference. The rectangular set in the theorem, where condition (13.4) is necessary and sufficient for existence of a potential, must have some property or properties that the set in this example lacks. We will explore this line of thought. Let D be a set of points in the plane. We call D a domain if it satisfies two conditions:

1. If P0 is any point of D, there is a circle about P0 such that every point enclosed by this circle is also in D.
2. Between any two points of D, there is a path lying entirely in D.

For example, the right quarter plane S consisting of points (x, y) with x ≥ 0 and y ≥ 0 enjoys property (2), but not (1). There is no circle that can be drawn about a point (x, 0) with x ≥ 0 that contains only points with nonnegative coordinates (Figure 13.19). Similarly, any circle drawn about a point (0, y) in S must contain points outside of S. The shaded set M of points in Figure 13.20 does not satisfy condition (2). Any path C connecting the indicated points P and Q must at some time go outside of M. Figure 13.21 shows the set A of points between the circles of radius 1 and 3 about the origin. Thus, (x, y) is in A exactly when 1 < x² + y² < 9.
FIGURE 13.19 Right quarter plane x ≥ 0, y ≥ 0.

FIGURE 13.20 A set M not satisfying condition (2): any path C from P to Q must leave M.

FIGURE 13.21 The region between two concentric circles.
This set satisfies conditions (1) and (2), and so is a domain. The boundary circles are drawn as dashed curves to emphasize that points on these curves are not in A. The conditions defining a domain are enough for the first theorem.
THEOREM 13.6
Let F be a vector field that is continuous on a domain D. Then ∫_C F · dR is independent of path on D if and only if F is conservative.

Proof. We know that, if F is conservative, then ∫_C F · dR is independent of path on D. It is the converse that uses the condition that D is a domain. Conversely, suppose ∫_C F · dR is independent of path on D. We will produce a potential function. Choose any point P0 = (x0, y0) in D. If P = (x, y) is any point of D, define

φ(x, y) = ∫_C F · dR

in which C is any path in D from P0 to P. There is such a path because D is a domain. Further, because this line integral is independent of path, φ(x, y) depends only on (x, y) and P0 and not on the curve chosen between them. Thus φ is a function. Because F is continuous on D, φ is also continuous on D. Now let F(x, y) = f(x, y) i + g(x, y) j and select any point (a, b) in D. We will show that

(∂φ/∂x)(a, b) = f(a, b) and (∂φ/∂y)(a, b) = g(a, b)

For the first of these equations, recall that

(∂φ/∂x)(a, b) = lim_{Δx→0} [φ(a + Δx, b) − φ(a, b)]/Δx

Because D is a domain, there is a circle about (a, b) enclosing only points of D. Let r be the radius of such a circle and restrict Δx so that 0 < |Δx| < r. Let C1 be any path in D from P0 to (a, b) and C2 the horizontal line segment from (a, b) to (a + Δx, b), as shown in Figure 13.22. Let C be the path from P0 to (a + Δx, b) consisting of C1 and then C2. Now

φ(a + Δx, b) − φ(a, b) = ∫_C F · dR − ∫_{C1} F · dR = ∫_{C2} F · dR
FIGURE 13.22 The path C1 from (x0, y0) to (a, b), followed by the horizontal segment C2 from (a, b) to (a + Δx, b), Δx > 0.

FIGURE 13.23 The corresponding paths with Δx < 0.
Parametrize C2 by x = a + tΔx, y = b for 0 ≤ t ≤ 1. Then

φ(a + Δx, b) − φ(a, b) = ∫_{C2} F · dR = ∫_{C2} f(x, y) dx + g(x, y) dy = ∫_0^1 f(a + tΔx, b) Δx dt

Then

[φ(a + Δx, b) − φ(a, b)]/Δx = ∫_0^1 f(a + tΔx, b) dt

By the mean value theorem for integrals, there is a number ε between 0 and 1, inclusive, such that

∫_0^1 f(a + tΔx, b) dt = f(a + εΔx, b)

Therefore

[φ(a + Δx, b) − φ(a, b)]/Δx = f(a + εΔx, b)

As Δx → 0, f(a + εΔx, b) → f(a, b) by continuity of f, proving that

lim_{Δx→0+} [φ(a + Δx, b) − φ(a, b)]/Δx = f(a, b)

By a similar argument, using the path of Figure 13.23, we can show that

lim_{Δx→0−} [φ(a + Δx, b) − φ(a, b)]/Δx = f(a, b)

Therefore

(∂φ/∂x)(a, b) = f(a, b)

To prove that (∂φ/∂y)(a, b) = g(a, b), the reasoning is similar except now use the paths of Figures 13.24 and 13.25. This completes the proof of the theorem.
FIGURE 13.24 The path ending with the vertical segment from (a, b) to (a, b + Δy), Δy > 0.

FIGURE 13.25 The corresponding path with Δy < 0.
FIGURE 13.26 The set of points between two concentric circles is not simply connected.
We have seen that the condition (13.4) is necessary and sufficient for F to be conservative within a rectangular region (which is a domain). Although this result is strong enough for many purposes, it is possible to extend it to regions that are not rectangular in shape if another condition is added to the region. A domain D is called simply connected if every simple closed path in D encloses only points of D. A simply connected domain is one that has no “holes” in it, because a simple closed path about the hole will enclose points not in the domain. If D is the plane with the origin removed, then D is not simply connected, because the unit circle about the origin encloses a point not in D. We have seen in Example 13.12 that condition (13.4) may be satisfied in this domain by a vector field having no potential function on D. Similarly, the region between two concentric circles is not simply connected, because a closed curve in this region may wrap around the inner circle, hence enclose points not in the region (Figure 13.26). We will now show that simple connectivity is just what is needed to ensure that condition (13.4) is equivalent to existence of a potential function. The key is that simple connectivity allows the use of Green’s theorem. THEOREM 13.7
Let F(x, y) = f(x, y) i + g(x, y) j be a vector field and D a simply connected domain. Suppose f, g, ∂f/∂y and ∂g/∂x are continuous on D. Then F is conservative on D if and only if

∂f/∂y = ∂g/∂x

for all (x, y) in D.

Proof. Suppose first that F is conservative, with potential φ. Then

f(x, y) = ∂φ/∂x and g(x, y) = ∂φ/∂y

Then

∂f/∂y = ∂²φ/∂y∂x = ∂²φ/∂x∂y = ∂g/∂x

for (x, y) in D. For the converse, suppose that condition (13.4) holds throughout D. We will prove that ∫_C F · dR is independent of path in D. By the previous theorem, this will imply that F is conservative. To this end, let P0 and P1 be any points of D, and let C and K be paths in D from P0 to P1. Suppose first that these paths have only their end points in common (Figure 13.27(a)). We can then form a positively oriented simple closed path J from P0 to P0 by moving
FIGURE 13.27(a) Paths C and K from P0 to P1.

FIGURE 13.27(b) A closed path formed from C and −K, enclosing the region D*.
from P0 to P1 along C, then back to P0 along −K (Figure 13.27(b)). Let J enclose a region D*. Since D is simply connected, every point in D* is in D, over which f, g, ∂f/∂y and ∂g/∂x are continuous. Apply Green's theorem to write

∮_J F · dR = ∬_{D*} (∂g/∂x − ∂f/∂y) dA = 0

But then

∮_J F · dR = ∫_C F · dR + ∫_{−K} F · dR = ∫_C F · dR − ∫_K F · dR = 0

so

∫_C F · dR = ∫_K F · dR

If C and K intersect each other between P0 and P1, as in Figure 13.28, then this conclusion can still be drawn by considering closed paths between successive points of intersection. We will not pursue this technical argument. Once we have independence of path of ∫_C F · dR on D, then F is conservative, and the theorem is proved. In sum:

conservative vector field ⟹ independence of path of ∫_C F · dR on a set D

independence of path on a domain ⟺ conservative vector field

conservative on a simply connected domain ⟺ ∂f/∂y = ∂g/∂x

FIGURE 13.28 Paths C and K from P0 to P1 intersecting at intermediate points; the enclosed regions are labeled D1, D2, D3, D4.
SECTION 13.3 PROBLEMS

In each of Problems 1 through 8, determine whether F is conservative in the given region D. If it is, find a potential function. If D is not defined, it is understood to be the entire plane.

1. F = y³ i + (3xy² − 4) j
2. F = (6y + y e^{xy}) i + (6x + x e^{xy}) j
3. F = 16x i + (2 − y²) j
4. F = 2xy cos(x²) i + sin(x²) j
5. F = [2x/(x² + y²)] i + [2y/(x² + y²)] j, D the plane with the origin removed
6. F = sinh(x + y)(i + j)
7. F = 2 cos(2x) e^y i + (e^y sin(2x) − y) j
8. F = (3x²y − sin(x) + 1) i + (x³ + e^y) j

In each of Problems 9 through 16, evaluate ∫_C F · dR for C any path from the first given point to the second.

9. F = (3x²y² − 4y) i + (2x³y − 4x) j; (−1, 1), (2, 3)
10. F = e^x cos(y) i − e^x sin(y) j; (0, 0), (2, π/4)
11. F = 2xy i + (x² − 1/y) j; (1, 3), (2, 2) (the path cannot cross the x axis)
12. F = i + (6y + sin(y)) j; (0, 0), (1, 3)
13. F = (3x²y² − 6y³) i + (2x³y − 18xy²) j; (0, 0), (1, 1)
14. F = (y/x) i + ln(x) j; (1, 1), (2, 2) (the path must lie in the right half-plane x > 0)
15. F = (−8e^y + e^x) i − 8x e^y j; (−1, −1), (3, 1)
16. F = (4xy + 2/x) i + 2x² j; (1, 2), (3, 3) (the path must lie in the half-plane x > 0)
17. Prove the law of conservation of energy: the sum of the kinetic and potential energies of an object acted on by a conservative force field is a constant. Hint: The kinetic energy is (m/2)‖R′(t)‖², where m is the mass and R(t) the position vector of the particle. The potential energy is −φ(x, y), where φ is a potential function for the force. Show that the derivative of the sum of the kinetic and potential energies is zero along any path of motion.
13.4 Surfaces in 3-Space and Surface Integrals

Analogous to the integral of a function over a curve, we would like to develop an integral of a function over a surface. This will require some background on surfaces. A curve is often given by specifying coordinate functions, each of which is a function of a single variable or parameter. For this reason we think of a curve as a one-dimensional object, although the graph may be in 2-space or 3-space. A surface may be defined by giving coordinate functions which depend on two independent variables, say

x = x(u, v), y = y(u, v), z = z(u, v)

with (u, v) varying over some set in the u, v plane. The locus of such points may form a two-dimensional object in the plane or in R³.
EXAMPLE 13.17
Suppose a surface is given by the coordinate functions

x = au cos(v), y = bu sin(v), z = u

with u and v any real numbers and a and b nonzero constants. In this case it is easy to write z in terms of x and y, a tactic that is sometimes useful in visualizing the surface. Notice that

(x/(au))² + (y/(bu))² = cos²(v) + sin²(v) = 1
so

x²/a² + y²/b² = u² = z²

In the plane y = 0 (the x, z plane), z = ±x/a, which are straight lines of slope ±1/a through the origin. In the plane x = 0 (the y, z plane), z = ±y/b, and these are straight lines of slope ±1/b through the origin. The surface intersects a plane z = c = constant ≠ 0 in an ellipse

x²/a² + y²/b² = c², z = c

This surface is called an elliptical cone because it has elliptical cross sections parallel to the x, y plane.
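The identity x²/a² + y²/b² = z² for every parameter pair can be checked symbolically. An illustrative SymPy sketch (not part of the text):

```python
import sympy as sp

# Illustrative check (assumes SymPy): every point of the parametrized
# elliptical cone satisfies x^2/a^2 + y^2/b^2 = z^2.
u, v, a, b = sp.symbols('u v a b', real=True, nonzero=True)
x = a * u * sp.cos(v)
y = b * u * sp.sin(v)
z = u

print(sp.simplify(x**2 / a**2 + y**2 / b**2 - z**2))  # 0
```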
EXAMPLE 13.18
Consider the surface having coordinate functions

x = u cos(v), y = u sin(v), z = (1/2)u² sin(2v)

in which u and v can be any real numbers. Now

z = (1/2)u² sin(2v) = u² sin(v) cos(v) = [u cos(v)][u sin(v)] = xy

This surface intersects any plane z = c = constant ≠ 0 in the hyperbola xy = c, z = c. However, the surface intersects a plane y = ±x in a parabola z = ±x². For this reason this surface is called a hyperbolic paraboloid. Sometimes x and y are used as parameters, and the surface is defined by giving z as a function of x and y, say z = S(x, y). Now the graph of the surface is the locus of points (x, y, S(x, y)), as (x, y) varies over some set of points in the x, y plane.
EXAMPLE 13.19
Consider z = √(4 − x² − y²) for x² + y² ≤ 4. By squaring both sides of this equation, we can write

x² + y² + z² = 4

This appears to be the equation of a sphere of radius 2 about the origin. However, in the original formulation with z given by the radical, we have z ≥ 0, so in fact we have not the sphere, but the hemisphere (upper half of the sphere) of radius 2 about the origin.
EXAMPLE 13.20
The equation z = √(x² + y²) for x² + y² ≤ 8 determines a cone having circular cross sections parallel to the x, y plane. The "top" of the cone is the circle x² + y² = 8 in the plane z = √8.
EXAMPLE 13.21
The equation z = x² + y² defines a parabolic bowl, extending to infinity in the positive z-direction because there is no restriction on x or y. These surfaces are easy to visualize and sketch by hand. If the defining function is more complicated, then we usually depend on a software package to sketch all or part of the surface. Examples are given in Figures 13.29 through 13.33.
FIGURE 13.29 z = cos(x² − y²).

FIGURE 13.30 z = 6 sin(x − y)/(1 + x² + y²).

FIGURE 13.31 z = 4 cos(x² + y²)/(1 + x² + y²).

FIGURE 13.32 z = x² cos(x² − y²).
FIGURE 13.33 z = cos(xy) log(4 + y).

FIGURE 13.34 Tangents to curves on Σ at P0 determine the tangent plane to the surface there.
Just as we can write a position vector to a curve, we can write a position vector

R(u, v) = x(u, v) i + y(u, v) j + z(u, v) k

for a surface. For any u and v in the parameter domain, R(u, v) can be thought of as an arrow from the origin to the point (x(u, v), y(u, v), z(u, v)) on the surface. A surface is simple if R(u1, v1) = R(u2, v2) can occur only if u1 = u2 and v1 = v2. A simple surface is one that does not fold over and return to the same point for different values of the parameter pairs.
13.4.1 Normal Vector to a Surface
Let Σ be a surface with coordinate functions x = x(u, v), y = y(u, v), z = z(u, v). Assume that these functions are continuous with continuous first partial derivatives. Let P0 = (x(u0, v0), y(u0, v0), z(u0, v0)) be a point on Σ. If we fix v = v0, we can define the curve Σv0 on the surface, having coordinate functions

x = x(u, v0), y = y(u, v0), z = z(u, v0)

(See Figure 13.34.) This is a curve because its coordinate functions are functions of the single variable u. The tangent vector to Σv0 at P0 is

Tv0 = (∂x/∂u)(u0, v0) i + (∂y/∂u)(u0, v0) j + (∂z/∂u)(u0, v0) k

Similarly, if we fix u = u0 and use v as parameter, we obtain the curve Σu0 on the surface (also shown in Figure 13.34). This curve has coordinate functions

x = x(u0, v), y = y(u0, v), z = z(u0, v)
The tangent vector to Σu0 at P0 is

Tu0 = (∂x/∂v)(u0, v0) i + (∂y/∂v)(u0, v0) j + (∂z/∂v)(u0, v0) k
Assuming that neither of these tangent vectors is the zero vector, they both lie in the tangent plane to Σ at P0. Their cross product is therefore normal (orthogonal) to this tangent plane, and is the vector we define to be the normal to Σ at P0:

N(P0) = [(∂x/∂u) i + (∂y/∂u) j + (∂z/∂u) k] × [(∂x/∂v) i + (∂y/∂v) j + (∂z/∂v) k]

       = det | i       j       k      |
             | ∂x/∂u   ∂y/∂u   ∂z/∂u  |
             | ∂x/∂v   ∂y/∂v   ∂z/∂v  |

       = [(∂y/∂u)(∂z/∂v) − (∂z/∂u)(∂y/∂v)] i + [(∂z/∂u)(∂x/∂v) − (∂x/∂u)(∂z/∂v)] j + [(∂x/∂u)(∂y/∂v) − (∂y/∂u)(∂x/∂v)] k    (13.5)
in which all the partial derivatives are evaluated at $(u_0, v_0)$.
An expression that is easier to remember is obtained by introducing Jacobian notation. Define the Jacobian determinant (named for the German mathematician Carl Jacobi) of functions $f$ and $g$ to be the $2 \times 2$ determinant
$$\frac{\partial(f,g)}{\partial(u,v)} = \begin{vmatrix} \dfrac{\partial f}{\partial u} & \dfrac{\partial f}{\partial v} \\ \dfrac{\partial g}{\partial u} & \dfrac{\partial g}{\partial v} \end{vmatrix} = \frac{\partial f}{\partial u}\frac{\partial g}{\partial v} - \frac{\partial f}{\partial v}\frac{\partial g}{\partial u}.$$
In this notation, the normal vector to $\Sigma$ at $P_0$ is
$$\mathbf{N}(P_0) = \frac{\partial(y,z)}{\partial(u,v)}\mathbf{i} + \frac{\partial(z,x)}{\partial(u,v)}\mathbf{j} + \frac{\partial(x,y)}{\partial(u,v)}\mathbf{k},$$
with all the partial derivatives evaluated at $(u_0, v_0)$. This notation helps in remembering the normal vector because of the cyclic pattern in the Jacobian symbols. Write $x, y, z$ in this order. For the first component of $\mathbf{N}(P_0)$, omit the first letter, $x$, to obtain $\partial(y,z)/\partial(u,v)$. For the second component, omit $y$, but maintain the same cyclic direction, moving left to right through $x, y, z$. This means we start with $z$, the next letter after $y$, then back to $x$, obtaining $\partial(z,x)/\partial(u,v)$. For the third component, omit $z$, leaving $x, y$ and the Jacobian $\partial(x,y)/\partial(u,v)$. Of course, any nonzero real multiple of $\mathbf{N}(P_0)$ is also a normal to $\Sigma$ at $P_0$.
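The cross-product recipe behind equation (13.5) is easy to check numerically. The sketch below (with semi-axes a = 2 and b = 3 chosen arbitrarily, not taken from the text) approximates the two tangent vectors of the elliptical cone $x = au\cos v$, $y = bu\sin v$, $z = u$ of Example 13.22 by central differences and compares their cross product with the closed-form components $(-\sqrt{3}b/4, -a/4, ab/2)$ at $(u_0, v_0) = (1/2, \pi/6)$.

```python
# Numerical sketch: normal vector to a parametric surface via equation (13.5).
# a, b are arbitrarily chosen semi-axes for the elliptical cone.
from math import cos, sin, sqrt, pi, isclose

a, b = 2.0, 3.0

def r(u, v):
    """Position vector R(u, v) for the elliptical cone."""
    return (a * u * cos(v), b * u * sin(v), u)

def normal(u, v, h=1e-6):
    # Central-difference tangent vectors dR/du and dR/dv
    ru = [(p - q) / (2 * h) for p, q in zip(r(u + h, v), r(u - h, v))]
    rv = [(p - q) / (2 * h) for p, q in zip(r(u, v + h), r(u, v - h))]
    # Cross product ru x rv is the normal of equation (13.5)
    return (ru[1] * rv[2] - ru[2] * rv[1],
            ru[2] * rv[0] - ru[0] * rv[2],
            ru[0] * rv[1] - ru[1] * rv[0])

N = normal(0.5, pi / 6)
exact = (-sqrt(3) * b / 4, -a / 4, a * b / 2)
print(N)   # close to (-1.299, -0.5, 3.0)
assert all(isclose(n, e, abs_tol=1e-4) for n, e in zip(N, exact))
```

The same two-line cross product works for any parametrization; only `r(u, v)` changes.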
EXAMPLE 13.22
Consider again the elliptical cone
$$x = au\cos(v), \quad y = bu\sin(v), \quad z = u.$$
Suppose we want the normal vector at $P_0 = (a\sqrt{3}/4,\, b/4,\, 1/2)$, obtained when $u = u_0 = 1/2$, $v = v_0 = \pi/6$. Compute the Jacobians:
$$\frac{\partial(y,z)}{\partial(u,v)}\bigg|_{(1/2,\pi/6)} = \Big[ b\sin(v)\cdot 0 - 1\cdot bu\cos(v) \Big]_{(1/2,\pi/6)} = -\frac{\sqrt{3}\,b}{4},$$
$$\frac{\partial(z,x)}{\partial(u,v)}\bigg|_{(1/2,\pi/6)} = \Big[ 1\cdot(-au\sin(v)) - 0\cdot a\cos(v) \Big]_{(1/2,\pi/6)} = -\frac{a}{4},$$
and
$$\frac{\partial(x,y)}{\partial(u,v)}\bigg|_{(1/2,\pi/6)} = \Big[ a\cos(v)\,bu\cos(v) - (-au\sin(v))\,b\sin(v) \Big]_{(1/2,\pi/6)} = \Big[ abu \Big]_{(1/2,\pi/6)} = \frac{ab}{2}.$$
The normal vector at $P_0$ is
$$\mathbf{N}(P_0) = -\frac{\sqrt{3}\,b}{4}\mathbf{i} - \frac{a}{4}\mathbf{j} + \frac{ab}{2}\mathbf{k}.$$

Consider the special case that the surface $\Sigma$ is given explicitly as $z = S(x,y)$. We may think of $u = x$ and $v = y$ as the parameters for $\Sigma$ and write the coordinate functions as
$$x = x, \quad y = y, \quad z = S(x,y).$$
Since $\partial x/\partial x = 1 = \partial y/\partial y$ and $\partial x/\partial y = \partial y/\partial x = 0$, we have
$$\frac{\partial(y,z)}{\partial(u,v)} = \frac{\partial(y,z)}{\partial(x,y)} = \begin{vmatrix} 0 & 1 \\ \dfrac{\partial S}{\partial x} & \dfrac{\partial S}{\partial y} \end{vmatrix} = -\frac{\partial S}{\partial x},$$
$$\frac{\partial(z,x)}{\partial(u,v)} = \frac{\partial(z,x)}{\partial(x,y)} = \begin{vmatrix} \dfrac{\partial S}{\partial x} & \dfrac{\partial S}{\partial y} \\ 1 & 0 \end{vmatrix} = -\frac{\partial S}{\partial y},$$
and
$$\frac{\partial(x,y)}{\partial(x,y)} = \begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} = 1.$$
The normal at a point $P_0 = (x_0, y_0, S(x_0,y_0))$ in this case is
$$\mathbf{N}(P_0) = -\frac{\partial S}{\partial x}(x_0,y_0)\mathbf{i} - \frac{\partial S}{\partial y}(x_0,y_0)\mathbf{j} + \mathbf{k} = -\frac{\partial z}{\partial x}(x_0,y_0)\mathbf{i} - \frac{\partial z}{\partial y}(x_0,y_0)\mathbf{j} + \mathbf{k}. \qquad (13.6)$$
We can also denote this vector as $\mathbf{N}(x_0, y_0)$.
EXAMPLE 13.23
Consider the cone given by $z = S(x,y) = \sqrt{x^2+y^2}$. Then
$$\frac{\partial S}{\partial x} = \frac{x}{\sqrt{x^2+y^2}} \quad \text{and} \quad \frac{\partial S}{\partial y} = \frac{y}{\sqrt{x^2+y^2}},$$
except for $x = y = 0$. Take, for example, the point $(3, 1, \sqrt{10})$. The normal vector at this point is
$$\mathbf{N}(3,1,\sqrt{10}) = -\frac{3}{\sqrt{10}}\mathbf{i} - \frac{1}{\sqrt{10}}\mathbf{j} + \mathbf{k}.$$
This normal vector is shown in Figure 13.35, and it points into the cone. In some contexts we want to know whether a normal vector is an inner normal (such as this one) or an outer normal (pointing out of the region bounded by the surface). If we wanted an outer normal at this point, we could take
$$-\mathbf{N}(3,1,\sqrt{10}) = \frac{3}{\sqrt{10}}\mathbf{i} + \frac{1}{\sqrt{10}}\mathbf{j} - \mathbf{k}.$$
This cone does not have a normal vector at the origin, which is a "sharp point" of the surface. There is no tangent plane at the origin.

FIGURE 13.35 Normal to the cone $z = \sqrt{x^2+y^2}$ at $(3, 1, \sqrt{10})$.
The normal vector (13.6) could also have been derived using the gradient vector. If $\Sigma$ is given by $z = S(x,y)$, then $\Sigma$ is a level surface of the function
$$\varphi(x,y,z) = z - S(x,y).$$
The gradient of this function is a normal vector, so compute
$$\nabla\varphi = \frac{\partial \varphi}{\partial x}\mathbf{i} + \frac{\partial \varphi}{\partial y}\mathbf{j} + \frac{\partial \varphi}{\partial z}\mathbf{k} = -\frac{\partial S}{\partial x}\mathbf{i} - \frac{\partial S}{\partial y}\mathbf{j} + \mathbf{k} = \mathbf{N}(P).$$

13.4.2
The Tangent Plane to a Surface
If a surface $\Sigma$ has a normal vector $\mathbf{N}$ at a point $P_0$, then we can use $\mathbf{N}$ to determine the equation of the tangent plane to $\Sigma$ at $P_0$. Let $(x, y, z)$ be any point on the tangent plane. Then the vector $(x - x_0)\mathbf{i} + (y - y_0)\mathbf{j} + (z - z_0)\mathbf{k}$ is in the tangent plane, hence is orthogonal to $\mathbf{N}$. Then
$$\mathbf{N} \cdot \big[ (x - x_0)\mathbf{i} + (y - y_0)\mathbf{j} + (z - z_0)\mathbf{k} \big] = 0.$$
More explicitly,
$$\frac{\partial(y,z)}{\partial(u,v)}\bigg|_{(u_0,v_0)}(x - x_0) + \frac{\partial(z,x)}{\partial(u,v)}\bigg|_{(u_0,v_0)}(y - y_0) + \frac{\partial(x,y)}{\partial(u,v)}\bigg|_{(u_0,v_0)}(z - z_0) = 0.$$
This is the equation of the tangent plane to $\Sigma$ at $P_0$.
EXAMPLE 13.24
Consider again the elliptical cone given by $x = au\cos(v)$, $y = bu\sin(v)$, $z = u$. We found in Example 13.22 that the normal vector at $P_0 = (a\sqrt{3}/4, b/4, 1/2)$ is $\mathbf{N} = -(\sqrt{3}\,b/4)\mathbf{i} - (a/4)\mathbf{j} + (ab/2)\mathbf{k}$. The tangent plane to $\Sigma$ at this point has equation
$$-\frac{\sqrt{3}\,b}{4}\left( x - \frac{a\sqrt{3}}{4} \right) - \frac{a}{4}\left( y - \frac{b}{4} \right) + \frac{ab}{2}\left( z - \frac{1}{2} \right) = 0.$$
In the special case that $\Sigma$ is given by $z = S(x,y)$, the normal vector at $P_0$ is $\mathbf{N} = -\frac{\partial S}{\partial x}(x_0,y_0)\mathbf{i} - \frac{\partial S}{\partial y}(x_0,y_0)\mathbf{j} + \mathbf{k}$, so the equation of the tangent plane becomes
$$-\frac{\partial S}{\partial x}(x_0,y_0)(x - x_0) - \frac{\partial S}{\partial y}(x_0,y_0)(y - y_0) + (z - z_0) = 0.$$
This equation is usually written
$$z - z_0 = \frac{\partial S}{\partial x}(x_0,y_0)(x - x_0) + \frac{\partial S}{\partial y}(x_0,y_0)(y - y_0).$$
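The tangent plane is a first-order approximation of the surface, so the gap between the two should shrink quadratically as we approach the point of tangency. The sketch below checks this for the cone of Example 13.23 at $(3, 1, \sqrt{10})$ (step sizes 0.1 and 0.01 are arbitrary choices).

```python
# Sketch: tangent-plane formula z - z0 = S_x (x - x0) + S_y (y - y0)
# for the cone S(x, y) = sqrt(x^2 + y^2) at the point (3, 1, sqrt(10)).
from math import hypot

x0, y0 = 3.0, 1.0
z0 = hypot(x0, y0)            # sqrt(10)
sx, sy = x0 / z0, y0 / z0     # partial derivatives of S at (3, 1)

def plane(x, y):
    return z0 + sx * (x - x0) + sy * (y - y0)

# Error of the tangent-plane approximation at two step sizes; shrinking the
# step by a factor of 10 should shrink the error by roughly 100.
errs = [abs(hypot(x0 + h, y0 + h) - plane(x0 + h, y0 + h)) for h in (0.1, 0.01)]
print(errs)
```

The second error is about one hundredth of the first, which is the expected second-order behavior.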
13.4.3 Smooth and Piecewise Smooth Surfaces
Recall that a curve is smooth if it has a continuous tangent. Similarly, a surface is smooth if it has a continuous normal vector. A surface is piecewise smooth if it consists of a finite number of smooth surfaces. For example, a sphere is smooth, and the surface of a cube is piecewise smooth. A cube consists of six square pieces, which are smooth, but the cube does not have a normal vector along any of its edges.
In calculus, it is shown that the area of a smooth surface given by $z = S(x,y)$ is the integral
$$\text{area of } \Sigma = \iint_D \sqrt{1 + \left( \frac{\partial S}{\partial x} \right)^2 + \left( \frac{\partial S}{\partial y} \right)^2}\, dA, \qquad (13.7)$$
where $D$ is the set of points in the $x,y$-plane for which $S$ is defined. This may also be written
$$\text{area of } \Sigma = \iint_D \sqrt{1 + \left( \frac{\partial z}{\partial x} \right)^2 + \left( \frac{\partial z}{\partial y} \right)^2}\, dx\, dy.$$
Equation (13.7) is the integral of the length of the normal vector (13.6):
$$\text{area of } \Sigma = \iint_D \|\mathbf{N}(x,y)\|\, dx\, dy.$$
This is analogous to the formula for the length of a curve as the integral of the length of the tangent vector.
More generally, if $\Sigma$ is given by coordinate functions $x = x(u,v)$, $y = y(u,v)$ and $z = z(u,v)$, with $(u,v)$ varying over some set $D$ in the $u,v$-plane, then
$$\text{area of } \Sigma = \iint_D \|\mathbf{N}(u,v)\|\, du\, dv, \qquad (13.8)$$
the integral of the length of the normal vector, which is given by equation (13.5).
EXAMPLE 13.25
We will illustrate these formulas for surface area in a simple case in which we know the area from elementary geometry. Let $\Sigma$ be the upper hemisphere of radius 3 about the origin. We can write $\Sigma$ as the graph of $z = S(x,y) = \sqrt{9 - x^2 - y^2}$, with $x^2 + y^2 \le 9$. $D$ consists of all points on or inside the circle of radius 3 about the origin in the $x,y$-plane. We can use equation (13.7). Compute
$$\frac{\partial z}{\partial x} = -\frac{x}{\sqrt{9 - x^2 - y^2}} = -\frac{x}{z}$$
and, by symmetry,
$$\frac{\partial z}{\partial y} = -\frac{y}{z}.$$
Then
$$\text{area of } \Sigma = \iint_D \sqrt{1 + \frac{x^2}{z^2} + \frac{y^2}{z^2}}\, dx\, dy = \iint_D \sqrt{\frac{z^2 + x^2 + y^2}{z^2}}\, dx\, dy = \iint_D \frac{3}{\sqrt{9 - x^2 - y^2}}\, dx\, dy.$$
This is an improper double integral which we can evaluate easily by converting it to polar coordinates. Let $x = r\cos\theta$, $y = r\sin\theta$. Since $D$ is the disk of radius 3 about the origin, $0 \le r \le 3$ and $0 \le \theta \le 2\pi$. Then
$$\iint_D \frac{3}{\sqrt{9 - x^2 - y^2}}\, dx\, dy = \int_0^{2\pi}\int_0^3 \frac{3r}{\sqrt{9 - r^2}}\, dr\, d\theta = 6\pi\int_0^3 \frac{r}{\sqrt{9 - r^2}}\, dr = 6\pi\left[ -(9 - r^2)^{1/2} \right]_0^3 = 6\pi(9)^{1/2} = 18\pi.$$
This is the area of a hemisphere of radius 3.
We are now prepared to define the integral of a function over a surface.
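The value $18\pi$ can also be recovered from the parametric area formula (13.8). The sketch below parametrizes the hemisphere by spherical angles (a choice of parametrization not used in the text), builds $\|\mathbf{N}\|$ from finite-difference tangent vectors, and sums with a midpoint rule.

```python
# Midpoint-rule check of area formula (13.8) for the upper hemisphere of
# radius 3, parametrized by spherical angles phi (colatitude) and theta.
from math import sin, cos, pi, sqrt, isclose

def r(phi, theta):
    return (3 * sin(phi) * cos(theta), 3 * sin(phi) * sin(theta), 3 * cos(phi))

def norm_len(phi, theta, h=1e-6):
    rp = [(a - b) / (2 * h) for a, b in zip(r(phi + h, theta), r(phi - h, theta))]
    rt = [(a - b) / (2 * h) for a, b in zip(r(phi, theta + h), r(phi, theta - h))]
    n = (rp[1] * rt[2] - rp[2] * rt[1],
         rp[2] * rt[0] - rp[0] * rt[2],
         rp[0] * rt[1] - rp[1] * rt[0])
    return sqrt(sum(c * c for c in n))   # analytically this is 9 sin(phi)

n = 200
dphi, dtheta = (pi / 2) / n, (2 * pi) / n
area = sum(norm_len((i + 0.5) * dphi, (j + 0.5) * dtheta) * dphi * dtheta
           for i in range(n) for j in range(n))
print(area, 18 * pi)   # both close to 56.55
```

This avoids the improper integral entirely: in these parameters the integrand $9\sin\varphi$ is smooth everywhere.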
13.4.4 Surface Integrals
The notion of the integral of a function over a surface is modeled after the line integral, with respect to arc length, of a function over a curve. Recall that, if a smooth curve $C$ is given by $x = x(t)$, $y = y(t)$, $z = z(t)$ for $a \le t \le b$, then the arc length along $C$ is
$$s(t) = \int_a^t \sqrt{x'(\tau)^2 + y'(\tau)^2 + z'(\tau)^2}\, d\tau.$$
Then
$$ds = \sqrt{x'(t)^2 + y'(t)^2 + z'(t)^2}\, dt,$$
and the line integral of a function $f$ along $C$, with respect to arc length, is
$$\int_C f(x,y,z)\, ds = \int_a^b f(x(t), y(t), z(t))\sqrt{x'(t)^2 + y'(t)^2 + z'(t)^2}\, dt.$$
We want to lift these ideas up one dimension to integrate over a surface instead of a curve. Now we have coordinate functions that are functions of two independent variables, say $u$ and $v$, with $(u,v)$ varying over some given set $D$ in the $u,v$-plane. This means that $\int_a^b$ will be replaced by $\iint_D$. The differential element of arc length, $ds$, which is used in the line integral, will be replaced by the differential element $d\sigma$ of surface area on the surface. By equation (13.8), $d\sigma = \|\mathbf{N}(u,v)\|\, du\, dv$, in which $\mathbf{N}(u,v)$ is the normal vector at the point $(x(u,v), y(u,v), z(u,v))$ on $\Sigma$.

DEFINITION 13.6 Surface Integral
Let $\Sigma$ be a smooth surface having coordinate functions $x = x(u,v)$, $y = y(u,v)$, $z = z(u,v)$ for $(u,v)$ in some set $D$ of the $u,v$-plane. Let $f$ be continuous on $\Sigma$. Then the surface integral of $f$ over $\Sigma$ is denoted $\iint_\Sigma f(x,y,z)\, d\sigma$, and is defined by
$$\iint_\Sigma f(x,y,z)\, d\sigma = \iint_D f(x(u,v), y(u,v), z(u,v))\, \|\mathbf{N}(u,v)\|\, du\, dv.$$
If $\Sigma$ is a piecewise smooth surface having smooth components $\Sigma_1, \ldots, \Sigma_n$, with each component either disjoint from the others, or intersecting another component in a set of zero area (for example, along a curve), then
$$\iint_\Sigma f(x,y,z)\, d\sigma = \iint_{\Sigma_1} f(x,y,z)\, d\sigma + \cdots + \iint_{\Sigma_n} f(x,y,z)\, d\sigma.$$

For example, we would integrate over the surface of a cube by summing the surface integrals over the six faces. Two such faces either do not intersect, or intersect each other along a line segment having zero area. The intersection condition is to prevent the selection of surface components that overlap each other in significant ways. This is analogous to a piecewise smooth curve $C$ formed as the join of smooth curves $C_1, \ldots, C_n$. When we do this, we assume that two of these component curves either do not intersect, or intersect just at an end point, not along an arc of both curves.
If $\Sigma$ is described by $z = S(x,y)$, then
$$\iint_\Sigma f(x,y,z)\, d\sigma = \iint_D f(x, y, S(x,y))\sqrt{1 + \left( \frac{\partial S}{\partial x} \right)^2 + \left( \frac{\partial S}{\partial y} \right)^2}\, dx\, dy.$$
We will look at some examples of evaluation of surface integrals, then consider uses of surface integrals.
FIGURE 13.36 Part of the plane $x + y + z = 4$ above the rectangle $0 \le x \le 2$, $0 \le y \le 1$.
EXAMPLE 13.26
Evaluate $\iint_\Sigma z\, d\sigma$ if $\Sigma$ is the part of the plane $x + y + z = 4$ lying above the rectangle $0 \le x \le 2$, $0 \le y \le 1$. The surface is shown in Figure 13.36. $D$ consists of all $(x,y)$ with $0 \le x \le 2$ and $0 \le y \le 1$. With $z = S(x,y) = 4 - x - y$ we have
$$\iint_\Sigma z\, d\sigma = \iint_D z\sqrt{1 + (-1)^2 + (-1)^2}\, dx\, dy = \sqrt{3}\int_0^2\int_0^1 (4 - x - y)\, dy\, dx.$$
First compute
$$\int_0^1 (4 - x - y)\, dy = \left[ (4 - x)y - \frac{1}{2}y^2 \right]_0^1 = 4 - x - \frac{1}{2} = \frac{7}{2} - x.$$
Then
$$\iint_\Sigma z\, d\sigma = \sqrt{3}\int_0^2 \left( \frac{7}{2} - x \right) dx = 5\sqrt{3}.$$
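A two-line numerical sum reproduces this answer; since the integrand is linear, even a coarse midpoint rule is exact. (The grid size below is an arbitrary choice.)

```python
# Midpoint-rule sketch of Example 13.26: integral of z over the plane
# z = 4 - x - y above [0, 2] x [0, 1]; here d(sigma) = sqrt(3) dx dy.
from math import sqrt, isclose

n = 400
dx, dy = 2 / n, 1 / n
total = sqrt(3) * sum((4 - (i + 0.5) * dx - (j + 0.5) * dy) * dx * dy
                      for i in range(n) for j in range(n))
print(total, 5 * sqrt(3))   # both approximately 8.660
```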
EXAMPLE 13.27
Recall the hyperbolic paraboloid of Example 13.18 given by
$$x = u\cos(v), \quad y = u\sin(v), \quad z = \frac{1}{2}u^2\sin(2v).$$
We will compute the surface integral $\iint_\Sigma xyz\, d\sigma$ over the part of this surface corresponding to $1 \le u \le 2$, $0 \le v \le \pi$. First we need the normal vector. The components of $\mathbf{N}(u,v)$ are the Jacobians:
$$\frac{\partial(y,z)}{\partial(u,v)} = \begin{vmatrix} \sin(v) & u\cos(v) \\ u\sin(2v) & u^2\cos(2v) \end{vmatrix} = u^2\big( \sin(v)\cos(2v) - \cos(v)\sin(2v) \big),$$
$$\frac{\partial(z,x)}{\partial(u,v)} = \begin{vmatrix} u\sin(2v) & u^2\cos(2v) \\ \cos(v) & -u\sin(v) \end{vmatrix} = -u^2\big( \sin(v)\sin(2v) + \cos(v)\cos(2v) \big),$$
and
$$\frac{\partial(x,y)}{\partial(u,v)} = \begin{vmatrix} \cos(v) & -u\sin(v) \\ \sin(v) & u\cos(v) \end{vmatrix} = u.$$
Then
$$\|\mathbf{N}(u,v)\|^2 = u^4\big( \sin(v)\cos(2v) - \cos(v)\sin(2v) \big)^2 + u^4\big( \sin(v)\sin(2v) + \cos(v)\cos(2v) \big)^2 + u^2 = u^2(1 + u^2),$$
so
$$\|\mathbf{N}(u,v)\| = u\sqrt{1 + u^2}.$$
The surface integral is
$$\iint_\Sigma xyz\, d\sigma = \iint_D (u\cos(v))(u\sin(v))\left( \frac{1}{2}u^2\sin(2v) \right) u\sqrt{1 + u^2}\, dA$$
$$= \frac{1}{2}\int_0^\pi \cos(v)\sin(v)\sin(2v)\, dv \int_1^2 u^5\sqrt{1 + u^2}\, du = \frac{\pi}{4}\left( \frac{100}{21}\sqrt{5} - \frac{11}{105}\sqrt{2} \right) \approx 8.25.$$
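The closed-form answer is easy to corroborate with a direct double sum of $f\,\|\mathbf{N}\|$ over the parameter rectangle (grid size chosen arbitrarily):

```python
# Midpoint-rule sketch of the surface integral in Example 13.27.
from math import sin, cos, sqrt, pi, isclose

def integrand(u, v):
    x, y, z = u * cos(v), u * sin(v), 0.5 * u * u * sin(2 * v)
    return x * y * z * u * sqrt(1 + u * u)   # f times ||N(u, v)||

n = 300
du, dv = 1 / n, pi / n
approx = sum(integrand(1 + (i + 0.5) * du, (j + 0.5) * dv) * du * dv
             for i in range(n) for j in range(n))
exact = (pi / 4) * ((100 / 21) * sqrt(5) - (11 / 105) * sqrt(2))
print(approx, exact)   # both approximately 8.25
```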
As we expect of an integral,
$$\iint_\Sigma \big( f(x,y,z) + g(x,y,z) \big)\, d\sigma = \iint_\Sigma f(x,y,z)\, d\sigma + \iint_\Sigma g(x,y,z)\, d\sigma$$
and, for any real number $\alpha$,
$$\iint_\Sigma \alpha f(x,y,z)\, d\sigma = \alpha\iint_\Sigma f(x,y,z)\, d\sigma.$$
The next section is devoted to some applications of surface integrals.
SECTION 13.4 PROBLEMS
In each of Problems 1 through 10, evaluate $\iint_\Sigma f(x,y,z)\, d\sigma$.
1. $f(x,y,z) = x$, $\Sigma$ the part of the plane $x + 4y + z = 10$ in the first octant
2. $f(x,y,z) = y^2$, $\Sigma$ the part of the plane $z = x$ for $0 \le x \le 2$, $0 \le y \le 4$
3. $f(x,y,z) = 1$, $\Sigma$ the part of the paraboloid $z = x^2 + y^2$ lying between the planes $z = 2$ and $z = 7$
4. $f(x,y,z) = x + y$, $\Sigma$ the part of the plane $4x + 8y + 10z = 25$ lying above the triangle in the $x,y$-plane having vertices $(0,0)$, $(1,0)$ and $(1,1)$
5. $f(x,y,z) = z$, $\Sigma$ the part of the cone $z = \sqrt{x^2 + y^2}$ lying in the first octant and between the planes $z = 2$ and $z = 4$
6. $f(x,y,z) = xyz$, $\Sigma$ the part of the plane $z = x + y$ with $(x,y)$ lying in the square with vertices $(0,0)$, $(1,0)$, $(0,1)$ and $(1,1)$
7. $f(x,y,z) = y$, $\Sigma$ the part of the cylinder $z = x^2$ for $0 \le x \le 2$, $0 \le y \le 3$
8. $f(x,y,z) = x^2$, $\Sigma$ the part of the paraboloid $z = 4 - x^2 - y^2$ lying above the $x,y$-plane
9. $f(x,y,z) = z$, $\Sigma$ the part of the plane $z = x - y$ for $0 \le x \le 1$, $0 \le y \le 5$
10. $f(x,y,z) = xyz$, $\Sigma$ the part of the cylinder $z = 1 + y^2$ for $0 \le x \le 1$, $0 \le y \le 1$
13.5 Applications of Surface Integrals

13.5.1 Surface Area
If $\Sigma$ is a piecewise smooth surface, then
$$\iint_\Sigma 1\, d\sigma = \iint_D \|\mathbf{N}(u,v)\|\, du\, dv = \text{area of } \Sigma.$$
(This assumes a bounded surface having finite area.) Clearly we do not need the notion of a surface integral to compute this integral and obtain the area of a surface. However, we mention this result because it is in the same spirit as other familiar mensuration formulas:
$$\int_C ds = \text{length of } C, \qquad \iint_D dA = \text{area of } D,$$
and, if $M$ is a solid region in 3-space enclosing a volume, then
$$\iiint_M dV = \text{volume of } M.$$
13.5.2 Mass and Center of Mass of a Shell
Imagine a thin shell of negligible thickness taking the shape of a smooth surface $\Sigma$. Let $\delta(x,y,z)$ be the density of the material of the shell at $(x,y,z)$. Assume that $\delta$ is continuous. We want to compute the mass of the shell.
Suppose $\Sigma$ has coordinate functions $x = x(u,v)$, $y = y(u,v)$, $z = z(u,v)$ for $(u,v)$ in $D$. Form a grid over $D$ in the $u,v$-plane by drawing lines (dashed lines in Figure 13.37) parallel to the axes, and retain only those rectangles $R_1, \ldots, R_N$ intersecting $D$. Let the vertical lines be $\Delta u$ units apart, and the horizontal lines $\Delta v$ units apart. For $(u,v)$ varying over $R_j$, we obtain a patch or surface element $\Sigma_j$ on the surface (Figure 13.38). Let $(u_j, v_j)$ be a point in $R_j$, and approximate the density of the surface element $\Sigma_j$ by the constant
$$\delta_j = \delta(x(u_j,v_j), y(u_j,v_j), z(u_j,v_j)).$$
Because $\delta$ is continuous and $\Sigma_j$ has finite area, we can choose $\Delta u$ and $\Delta v$ sufficiently small that $\delta_j$ approximates $\delta(x,y,z)$ as closely as we like over $\Sigma_j$. Approximate the mass of $\Sigma_j$ as $\delta_j$ times the area of $\Sigma_j$. Now this area is
$$\text{area of } \Sigma_j = \iint_{R_j} \|\mathbf{N}(u,v)\|\, du\, dv \approx \|\mathbf{N}(u_j,v_j)\|\, \Delta u\, \Delta v,$$
so
$$\text{mass of } \Sigma_j \approx \delta_j \|\mathbf{N}(u_j,v_j)\|\, \Delta u\, \Delta v.$$
The mass of $\Sigma$ is approximately the sum of the masses of the surface elements:
$$\text{mass of } \Sigma \approx \sum_{j=1}^N \delta(x(u_j,v_j), y(u_j,v_j), z(u_j,v_j))\, \|\mathbf{N}(u_j,v_j)\|\, \Delta u\, \Delta v.$$
In 1927, Congress approved construction of a dam to control the Colorado River for the purpose of fostering agriculture in the American southwest and as a source of hydroelectric power, spurring the growth of Las Vegas and southern California. Construction began in 1931. One major problem of construction was the cooling of concrete as it was poured. Engineers estimated that the amount of concrete required would take 100 years to cool. The solution was to pour it in rows and columns of blocks, through which cooled water was pumped in pipes. Hoover Dam was completed in 1935 and is 727 feet high, 1244 feet long, 660 feet thick at its base, and 45 feet thick at the top. It weighs about 5.5 million tons and contains 3,250,000 cubic yards of concrete. On one side of the dam, Lake Mead is over 500 feet deep. Computation of forces and stresses on parts of the dam surface involves a combination of materials science, fluid flow, and vector analysis.
FIGURE 13.37 Grid over $D$ in the $u,v$-plane.

FIGURE 13.38 Surface element $\Sigma_j$ on $\Sigma$.
This is a Riemann sum for $\iint_D \delta(x(u,v), y(u,v), z(u,v))\, \|\mathbf{N}(u,v)\|\, du\, dv$. Hence in the limit as $\Delta u \to 0$ and $\Delta v \to 0$ we obtain
$$\text{mass of } \Sigma = \iint_\Sigma \delta(x,y,z)\, d\sigma.$$
The mass of the shell is the surface integral of the density function. This is analogous to the mass of a wire being the line integral of the density function over the wire.
The center of mass of the shell is $(\bar{x}, \bar{y}, \bar{z})$, where
$$\bar{x} = \frac{1}{m}\iint_\Sigma x\,\delta(x,y,z)\, d\sigma, \quad \bar{y} = \frac{1}{m}\iint_\Sigma y\,\delta(x,y,z)\, d\sigma, \quad \bar{z} = \frac{1}{m}\iint_\Sigma z\,\delta(x,y,z)\, d\sigma,$$
in which $m$ is the mass of the shell.
EXAMPLE 13.28
We will find the mass and center of mass of the cone $z = \sqrt{x^2 + y^2}$ for $x^2 + y^2 \le 4$, with density function $\delta(x,y,z) = x^2 + y^2$.
First calculate the mass $m$. We will need
$$\frac{\partial z}{\partial x} = \frac{x}{z} \quad \text{and} \quad \frac{\partial z}{\partial y} = \frac{y}{z}.$$
Then
$$m = \iint_\Sigma (x^2 + y^2)\, d\sigma = \iint_D (x^2 + y^2)\sqrt{1 + \frac{x^2}{z^2} + \frac{y^2}{z^2}}\, dy\, dx = \int_0^{2\pi}\int_0^2 r^2\sqrt{2}\, r\, dr\, d\theta = 2\pi\sqrt{2}\left[ \frac{1}{4}r^4 \right]_0^2 = 8\sqrt{2}\,\pi.$$
By symmetry of the surface and of the density function, we expect the center of mass to lie on the $z$-axis, so $\bar{x} = \bar{y} = 0$. Finally,
$$\bar{z} = \frac{1}{8\sqrt{2}\,\pi}\iint_\Sigma z(x^2 + y^2)\, d\sigma = \frac{1}{8\sqrt{2}\,\pi}\iint_D \sqrt{x^2 + y^2}\,(x^2 + y^2)\sqrt{2}\, dy\, dx = \frac{1}{8\pi}\int_0^{2\pi}\int_0^2 r\cdot r^2\cdot r\, dr\, d\theta = \frac{1}{8\pi}\,2\pi\left[ \frac{1}{5}r^5 \right]_0^2 = \frac{8}{5}.$$
The center of mass is $(0, 0, 8/5)$.
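Both numbers can be reproduced with a one-dimensional midpoint sum, since on this cone the angular integral only contributes a factor $2\pi$ and $d\sigma = \sqrt{2}\, r\, dr\, d\theta$ (the grid size below is an arbitrary choice):

```python
# Numerical sketch of Example 13.28: mass and z-center of mass of the
# conical shell z = sqrt(x^2 + y^2), x^2 + y^2 <= 4, density x^2 + y^2.
from math import sqrt, pi, isclose

n = 500
dr = 2 / n
mass = moment = 0.0
for i in range(n):
    r = (i + 0.5) * dr
    # density r^2, z = r on the cone, surface element sqrt(2) r dr (2 pi)
    mass += (r * r) * sqrt(2) * r * dr * 2 * pi
    moment += r * (r * r) * sqrt(2) * r * dr * 2 * pi
print(mass, 8 * sqrt(2) * pi)   # both approximately 35.54
print(moment / mass)            # approximately 1.6 = 8/5
```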
13.5.3 Flux of a Vector Field Across a Surface
Suppose a fluid moves in some region of 3-space (for example, through a pipeline), with velocity $\mathbf{V}(x,y,z,t)$. Consider an imaginary surface $\Sigma$ within the fluid, with continuous unit normal vector $\mathbf{N}(u,v,t)$. The flux of $\mathbf{V}$ across $\Sigma$ is the net volume of fluid, per unit time, flowing across $\Sigma$ in the direction of $\mathbf{N}$. We would like to calculate this flux.
In a time interval $\Delta t$ the volume of fluid flowing across a small piece $\Sigma_j$ of $\Sigma$ equals the volume of the cylinder with base $\Sigma_j$ and altitude $V_N\,\Delta t$, where $V_N$ is the component of $\mathbf{V}$ in the direction of $\mathbf{N}$, evaluated at some point of $\Sigma_j$. This volume (Figure 13.39) is $V_N\,\Delta t\, A(\Sigma_j)$, where $A(\Sigma_j)$ is the area of $\Sigma_j$. Because $\|\mathbf{N}\| = 1$, $V_N = \mathbf{V} \cdot \mathbf{N}$. The volume of fluid flowing across $\Sigma_j$ per unit time is
$$\frac{1}{\Delta t}\, V_N\,\Delta t\, A(\Sigma_j) = V_N\, A(\Sigma_j) = \mathbf{V} \cdot \mathbf{N}\, A(\Sigma_j).$$
Sum these quantities over all the pieces of the surface and take the limit as the pieces are chosen smaller, as we did for the mass of the shell. We get
$$\text{flux of } \mathbf{V} \text{ across } \Sigma \text{ in the direction of } \mathbf{N} = \iint_\Sigma \mathbf{V} \cdot \mathbf{N}\, d\sigma.$$
The flux of a vector field across a surface is therefore computed as the surface integral of the normal component of the vector field on the surface.

FIGURE 13.39 Volume $\approx V_N\,\Delta t \times (\text{area of } \Sigma_j)$.
EXAMPLE 13.29
Find the flux of $\mathbf{F} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$ across the part $\Sigma$ of the sphere $x^2 + y^2 + z^2 = 4$ lying between the planes $z = 1$ and $z = 2$. The surface is shown in Figure 13.40, along with the normal vector (computed below) at a point.
We may think of $\Sigma$ as defined by $z = S(x,y)$, where $S$ is determined by the equation of the sphere and $(x,y)$ varies over a set $D$ in the $x,y$-plane. To determine $D$, observe that the plane $z = 2$ hits $\Sigma$ only at its "north pole" $(0,0,2)$. The plane $z = 1$ intersects the sphere in the circle $x^2 + y^2 = 3$, $z = 1$. This circle projects onto the $x,y$-plane to the circle of radius $\sqrt{3}$ about the origin. Thus $D$ consists of points $(x,y)$ satisfying $x^2 + y^2 \le 3$ (shaded in Figure 13.41).
To compute the partial derivatives $\partial z/\partial x$ and $\partial z/\partial y$, we can implicitly differentiate the equation of the sphere to get
$$2x + 2z\frac{\partial z}{\partial x} = 0,$$
so
$$\frac{\partial z}{\partial x} = -\frac{x}{z}$$
and, similarly,
$$\frac{\partial z}{\partial y} = -\frac{y}{z}.$$
A normal vector to the sphere is therefore
$$-\frac{x}{z}\mathbf{i} - \frac{y}{z}\mathbf{j} - \mathbf{k}.$$
Since we need a unit normal in computing flux, we must divide this vector by its length, which is
$$\sqrt{\frac{x^2}{z^2} + \frac{y^2}{z^2} + 1} = \frac{\sqrt{x^2 + y^2 + z^2}}{z} = \frac{2}{z}.$$
A unit normal is therefore
$$\mathbf{N} = \frac{z}{2}\left( -\frac{x}{z}\mathbf{i} - \frac{y}{z}\mathbf{j} - \mathbf{k} \right) = -\frac{1}{2}(x\mathbf{i} + y\mathbf{j} + z\mathbf{k}).$$
This points into the sphere. If we want the flux across $\Sigma$ from the outside of the sphere toward the inside, this is the normal to use. If we want the flux across $\Sigma$ from within the sphere, use $-\mathbf{N}$ instead. Now,
$$\mathbf{F} \cdot (-\mathbf{N}) = \frac{1}{2}(x^2 + y^2 + z^2).$$
Therefore
$$\text{flux} = \iint_\Sigma \frac{1}{2}(x^2 + y^2 + z^2)\, d\sigma = \iint_D \frac{1}{2}(x^2 + y^2 + z^2)\sqrt{1 + \frac{x^2}{z^2} + \frac{y^2}{z^2}}\, dA = \iint_D \frac{1}{2}\,\frac{(x^2 + y^2 + z^2)^{3/2}}{\sqrt{4 - x^2 - y^2}}\, dA = 4\iint_D \frac{1}{\sqrt{4 - x^2 - y^2}}\, dA,$$
because $x^2 + y^2 + z^2 = 4$ on $\Sigma$. Converting to polar coordinates, we have
$$\text{flux} = 4\int_0^{2\pi}\int_0^{\sqrt{3}} \frac{r}{\sqrt{4 - r^2}}\, dr\, d\theta = 8\pi\left[ -(4 - r^2)^{1/2} \right]_0^{\sqrt{3}} = 8\pi(2 - 1) = 8\pi.$$
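The polar integral above is easy to corroborate numerically; the integrand is smooth on $0 \le r \le \sqrt{3}$, so a plain midpoint rule suffices (grid size chosen arbitrarily):

```python
# Midpoint-rule sketch of Example 13.29:
# flux = 4 * integral over the disk x^2 + y^2 <= 3 of dA / sqrt(4 - x^2 - y^2).
from math import sqrt, pi, isclose

n = 2000
dr = sqrt(3) / n
flux = 0.0
for i in range(n):
    r = (i + 0.5) * dr
    flux += 4 * r / sqrt(4 - r * r) * dr * 2 * pi   # theta integral gives 2 pi
print(flux, 8 * pi)   # both approximately 25.13
```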
We will see other applications of surface integrals when we discuss the integral theorems of Gauss and Stokes, and again when we study partial differential equations.
SECTION 13.5 PROBLEMS
1. Find the mass and center of mass of the triangular shell having vertices $(1,0,0)$, $(0,3,0)$ and $(0,0,2)$ if $\delta(x,y,z) = xz + 1$.
2. Find the center of mass of the portion of the homogeneous sphere $x^2 + y^2 + z^2 = 9$ lying above the plane $z = 1$. (Homogeneous means that the density function is constant.)
3. Find the center of mass of the homogeneous cone $z = \sqrt{x^2 + y^2}$ for $x^2 + y^2 \le 9$.
4. Find the center of mass of the part of the paraboloid $z = 16 - x^2 - y^2$ lying in the first octant and between the cylinders $x^2 + y^2 = 1$ and $x^2 + y^2 = 9$, if $\delta(x,y,z) = xy/\sqrt{1 + 4x^2 + 4y^2}$.
5. Find the mass and center of mass of the paraboloid $z = 6 - x^2 - y^2$ if $\delta(x,y,z) = \sqrt{1 + 4x^2 + 4y^2}$.
6. Find the center of mass of the part of the homogeneous sphere $x^2 + y^2 + z^2 = 1$ lying in the first octant.
7. Find the flux of $\mathbf{F} = x\mathbf{i} + y\mathbf{j} - z\mathbf{k}$ across the part of the plane $x + 2y + z = 8$ lying in the first octant.
8. Find the flux of $\mathbf{F} = xz\mathbf{i} - y\mathbf{k}$ across the part of the sphere $x^2 + y^2 + z^2 = 4$ lying above the plane $z = 1$.
13.6 Preparation for the Integral Theorems of Gauss and Stokes
The fundamental results of vector integral calculus are the theorems of Gauss and Stokes. In this section we will begin with Green's theorem and explore how natural generalizations lead to these results.
With appropriate conditions on the curve and the functions, the conclusion of Green's theorem is
$$\oint_C f(x,y)\, dx + g(x,y)\, dy = \iint_D \left( \frac{\partial g}{\partial x} - \frac{\partial f}{\partial y} \right) dA,$$
in which $D$ is the region on and enclosed by the simple closed smooth curve $C$. Define the vector field
$$\mathbf{F}(x,y) = g(x,y)\mathbf{i} - f(x,y)\mathbf{j}.$$
Then
$$\nabla \cdot \mathbf{F} = \frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}.$$
Now parametrize $C$ by arc length, so the coordinate functions are $x = x(s)$, $y = y(s)$ for $0 \le s \le L$. The unit tangent vector to $C$ is $\mathbf{T}(s) = x'(s)\mathbf{i} + y'(s)\mathbf{j}$, and the unit normal vector
is $\mathbf{N}(s) = y'(s)\mathbf{i} - x'(s)\mathbf{j}$. These are shown in Figure 13.42. This normal points away from the interior $D$ of $C$, and so is an outer normal. Now
$$\mathbf{F} \cdot \mathbf{N} = g(x,y)\frac{dy}{ds} + f(x,y)\frac{dx}{ds},$$
so
$$\oint_C f(x,y)\, dx + g(x,y)\, dy = \oint_C \left( f(x,y)\frac{dx}{ds} + g(x,y)\frac{dy}{ds} \right) ds = \oint_C \mathbf{F} \cdot \mathbf{N}\, ds.$$
We may therefore write the conclusion of Green's theorem in vector form as
$$\oint_C \mathbf{F} \cdot \mathbf{N}\, ds = \iint_D \nabla \cdot \mathbf{F}\, dA. \qquad (13.9)$$
This is a conservation of energy equation. Recall from Section 12.4.1 that the divergence of a vector field at a point is a measure of the flow of the field from that point. Equation (13.9) states that the flux of the vector field outward from $D$ across $C$ (because $\mathbf{N}$ is an outer normal) exactly balances the flow of the field from each point in $D$.
The reason for writing Green's theorem in this form is that it suggests a generalization to three dimensions. Replace the closed curve $C$ in the plane with a closed surface $\Sigma$ in 3-space (closed meaning bounding a volume). Replace the line integral over $C$ with a surface integral over $\Sigma$, and allow the vector field $\mathbf{F}$ to be a function of three variables. We conjecture that equation (13.9) generalizes to
$$\iint_\Sigma \mathbf{F} \cdot \mathbf{N}\, d\sigma = \iiint_M \nabla \cdot \mathbf{F}\, dV,$$
in which $\mathbf{N}$ is a unit normal to $\Sigma$ pointing away from the solid region $M$ bounded by $\Sigma$. We will see that, under suitable conditions on $\Sigma$ and $\mathbf{F}$, this is the conclusion of Gauss's divergence theorem.
Now begin again with Green's theorem. We will pursue a different generalization to three dimensions. This time let
$$\mathbf{F}(x,y,z) = f(x,y)\mathbf{i} + g(x,y)\mathbf{j} + 0\mathbf{k}.$$
The reason for adding the third component is to be able to take the curl:
$$\nabla \times \mathbf{F} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \partial/\partial x & \partial/\partial y & \partial/\partial z \\ f & g & 0 \end{vmatrix} = \left( \frac{\partial g}{\partial x} - \frac{\partial f}{\partial y} \right)\mathbf{k}.$$
Then
$$(\nabla \times \mathbf{F}) \cdot \mathbf{k} = \frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}.$$
Further, with unit tangent $\mathbf{T}(s) = x'(s)\mathbf{i} + y'(s)\mathbf{j}$ to $C$, we can write
$$\mathbf{F} \cdot \mathbf{T}\, ds = \big( f(x,y)\mathbf{i} + g(x,y)\mathbf{j} \big) \cdot \left( \frac{dx}{ds}\mathbf{i} + \frac{dy}{ds}\mathbf{j} \right) ds = f(x,y)\, dx + g(x,y)\, dy,$$
so the conclusion of Green's theorem can also be written
$$\oint_C \mathbf{F} \cdot \mathbf{T}\, ds = \iint_D (\nabla \times \mathbf{F}) \cdot \mathbf{k}\, dA. \qquad (13.10)$$
Now think of $D$ as a flat surface in the $x,y$-plane, with unit normal vector $\mathbf{k}$, and bounded by the closed curve $C$. To generalize this, allow $C$ to be a curve in 3-space bounding a surface $\Sigma$ having unit outer normal vector $\mathbf{N}$, as shown in Figure 13.43. Now equation (13.10) suggests that
$$\oint_C \mathbf{F} \cdot \mathbf{T}\, ds = \iint_\Sigma (\nabla \times \mathbf{F}) \cdot \mathbf{N}\, d\sigma.$$
We will see this equation shortly as the conclusion of Stokes's theorem.

FIGURE 13.42 The unit tangent $\mathbf{T}$ and outer normal $\mathbf{N}$ to $C$.

FIGURE 13.43 A surface $\Sigma$ bounded by a curve $C$ in 3-space, with unit normal $\mathbf{N}$.
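The vector form (13.9) of Green's theorem lends itself to a quick numerical test. The sketch below uses the unit disk and an arbitrarily chosen field $\mathbf{F} = (x + y^2)\mathbf{i} + xy\mathbf{j}$ (so $\nabla \cdot \mathbf{F} = 1 + x$), and compares the boundary flux with the area integral of the divergence.

```python
# Numerical sketch of Green's theorem in the flux form (13.9) on the unit
# disk, with the arbitrarily chosen field F = (x + y^2, x*y), div F = 1 + x.
from math import sin, cos, pi, isclose

# Line integral of F . N around the unit circle; here N = (cos t, sin t)
# and ds = dt, using a midpoint rule on the parameter t.
n = 4000
dt = 2 * pi / n
line = 0.0
for k in range(n):
    t = (k + 0.5) * dt
    x, y = cos(t), sin(t)
    line += ((x + y * y) * x + (x * y) * y) * dt

# Double integral of div F = 1 + x over the disk, in polar coordinates.
m = 400
dr, dth = 1 / m, 2 * pi / m
area_int = sum((1 + (i + 0.5) * dr * cos((j + 0.5) * dth)) * (i + 0.5) * dr * dr * dth
               for i in range(m) for j in range(m))
print(line, area_int)   # both approximately pi
```

Both sums converge to $\pi$, as the $x$ part of the divergence integrates to zero by symmetry.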
SECTION 13.6 PROBLEMS
1. Let $C$ be a simple closed path in the $x,y$-plane, with interior $D$. Let $\varphi(x,y)$ and $\psi(x,y)$ be continuous with continuous first and second partial derivatives on $C$ and throughout $D$. Let
$$\nabla^2\psi = \frac{\partial^2\psi}{\partial x^2} + \frac{\partial^2\psi}{\partial y^2}.$$
Prove that
$$\iint_D \varphi\,\nabla^2\psi\, dA = \oint_C \left( -\varphi\frac{\partial\psi}{\partial y}\, dx + \varphi\frac{\partial\psi}{\partial x}\, dy \right) - \iint_D \nabla\varphi \cdot \nabla\psi\, dA.$$
2. Under the conditions of Problem 1, show that
$$\iint_D \big( \varphi\,\nabla^2\psi - \psi\,\nabla^2\varphi \big)\, dA = \oint_C \left( \psi\frac{\partial\varphi}{\partial y} - \varphi\frac{\partial\psi}{\partial y} \right) dx + \left( \varphi\frac{\partial\psi}{\partial x} - \psi\frac{\partial\varphi}{\partial x} \right) dy.$$
3. Let $C$ be a simple closed path in the $x,y$-plane, with interior $D$. Let $\varphi$ be continuous with continuous first and second partial derivatives on $C$ and at all points of $D$. Let $\mathbf{N}(x,y)$ be the unit outer normal to $C$ (outer meaning pointing away from $D$ if drawn as an arrow at $(x,y)$ on $C$). Prove that
$$\oint_C \frac{\partial\varphi}{\partial N}(x,y)\, ds = \iint_D \nabla^2\varphi(x,y)\, dA.$$
(Recall that $\partial\varphi/\partial N\,(x,y)$ is the directional derivative of $\varphi$ in the direction of $\mathbf{N}$.)
13.7 The Divergence Theorem of Gauss
We have seen that, under certain conditions, the conclusion of Green's theorem is
$$\oint_C \mathbf{F} \cdot \mathbf{N}\, ds = \iint_D \nabla \cdot \mathbf{F}\, dA.$$
Now make the following generalizations from the plane to 3-space:
a set $D$ in the plane $\to$ a 3-dimensional solid $M$
a closed curve $C$ bounding $D$ $\to$ a surface $\Sigma$ enclosing $M$
a unit outer normal $\mathbf{N}$ to $C$ $\to$ a unit outer normal $\mathbf{N}$ to $\Sigma$
a vector field $\mathbf{F}$ in the plane $\to$ a vector field $\mathbf{F}$ in 3-space
a line integral $\oint_C \mathbf{F} \cdot \mathbf{N}\, ds$ $\to$ a surface integral $\iint_\Sigma \mathbf{F} \cdot \mathbf{N}\, d\sigma$
a double integral $\iint_D \nabla \cdot \mathbf{F}\, dA$ $\to$ a triple integral $\iiint_M \nabla \cdot \mathbf{F}\, dV$
With these correspondences and some terminology, Green's theorem suggests a theorem named for the great nineteenth-century German mathematician and scientist Carl Friedrich Gauss.
A surface is closed if it encloses a volume. For example, a sphere is closed, as is a cube, while a hemisphere is not. A surface consisting of the top part of the sphere $x^2 + y^2 + z^2 = a^2$, together with the disk $x^2 + y^2 \le a^2$ in the $x,y$-plane, is closed. If filled with water (through some opening that is then sealed off), it will hold the water. A normal vector $\mathbf{N}$ to $\Sigma$ is an outer normal if, when represented as an arrow from a point of the surface, it points away from the region enclosed by the surface (Figure 13.44). If $\mathbf{N}$ is also a unit vector, then it is a unit outer normal.

FIGURE 13.44 Unit outer normal $\mathbf{N}$ to a closed surface enclosing $M$.

THEOREM 13.8 Gauss's Divergence Theorem
Let $\Sigma$ be a piecewise smooth closed surface. Let $M$ be the set of points on and enclosed by $\Sigma$. Let $\Sigma$ have unit outer normal vector $\mathbf{N}$. Let $\mathbf{F}$ be a vector field whose components are continuous with continuous first and second partial derivatives on $\Sigma$ and throughout $M$. Then
$$\iint_\Sigma \mathbf{F} \cdot \mathbf{N}\, d\sigma = \iiint_M \nabla \cdot \mathbf{F}\, dV. \qquad (13.11)$$
$\nabla \cdot \mathbf{F}$ is the divergence of the vector field, hence the name "divergence theorem". In the spirit of Green's theorem, Gauss's theorem relates vector operations over objects of different dimensions. A surface is a two-dimensional object (it has area but no volume), while a solid region in 3-space is three-dimensional.
Gauss's theorem has several kinds of applications. One is to replace one of the integrals in equation (13.11) with the other, in the event that this simplifies an integral evaluation. A second is to suggest interpretations of vector operations. A third is to serve as a tool in deriving physical laws. Finally, we will use the theorem in developing relationships to be used in solving partial differential equations. Before looking at uses of the theorem, here are two purely computational examples to provide some feeling for equation (13.11).
EXAMPLE 13.30
Let $\Sigma$ be the piecewise smooth closed surface consisting of the surface $\Sigma_1$ of the cone $z = \sqrt{x^2 + y^2}$ for $x^2 + y^2 \le 1$, together with the flat cap $\Sigma_2$ consisting of the disk $x^2 + y^2 \le 1$ in the plane $z = 1$. This surface is shown in Figure 13.45. Let $\mathbf{F}(x,y,z) = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$. We will calculate both sides of equation (13.11).
The unit outer normal to $\Sigma_1$ is
$$\mathbf{N}_1 = \frac{1}{\sqrt{2}}\left( \frac{x}{z}\mathbf{i} + \frac{y}{z}\mathbf{j} - \mathbf{k} \right).$$
Then
$$\mathbf{F} \cdot \mathbf{N}_1 = (x\mathbf{i} + y\mathbf{j} + z\mathbf{k}) \cdot \frac{1}{\sqrt{2}}\left( \frac{x}{z}\mathbf{i} + \frac{y}{z}\mathbf{j} - \mathbf{k} \right) = \frac{1}{\sqrt{2}}\left( \frac{x^2}{z} + \frac{y^2}{z} - z \right) = 0,$$
because on $\Sigma_1$, $z^2 = x^2 + y^2$. (One can also see geometrically that $\mathbf{F}$ is orthogonal to $\mathbf{N}_1$.) Then
$$\iint_{\Sigma_1} \mathbf{F} \cdot \mathbf{N}_1\, d\sigma = 0.$$
The unit outer normal to $\Sigma_2$ is $\mathbf{N}_2 = \mathbf{k}$, so
$$\mathbf{F} \cdot \mathbf{N}_2 = z.$$
Since $z = 1$ on $\Sigma_2$, then
$$\iint_{\Sigma_2} \mathbf{F} \cdot \mathbf{N}_2\, d\sigma = \iint_{\Sigma_2} z\, d\sigma = \iint_{\Sigma_2} d\sigma = \text{area of } \Sigma_2 = \pi.$$
Therefore
$$\iint_\Sigma \mathbf{F} \cdot \mathbf{N}\, d\sigma = \iint_{\Sigma_1} \mathbf{F} \cdot \mathbf{N}_1\, d\sigma + \iint_{\Sigma_2} \mathbf{F} \cdot \mathbf{N}_2\, d\sigma = \pi.$$
Now compute the triple integral. The divergence of $\mathbf{F}$ is
$$\nabla \cdot \mathbf{F} = \frac{\partial x}{\partial x} + \frac{\partial y}{\partial y} + \frac{\partial z}{\partial z} = 3,$$
so
$$\iiint_M \nabla \cdot \mathbf{F}\, dV = \iiint_M 3\, dV = 3\,(\text{volume of the cone of height 1, radius 1}) = 3\left( \frac{\pi}{3} \right) = \pi.$$

FIGURE 13.45 The cone $z = \sqrt{x^2 + y^2}$, $x^2 + y^2 \le 1$, capped by the disk in the plane $z = 1$.
EXAMPLE 13.31
Let $\Sigma$ be the piecewise smooth surface of the cube having vertices
$$(0,0,0),\ (1,0,0),\ (0,1,0),\ (0,0,1),\ (1,1,0),\ (0,1,1),\ (1,0,1),\ (1,1,1).$$
Let $\mathbf{F}(x,y,z) = x^2\mathbf{i} + y^2\mathbf{j} + z^2\mathbf{k}$. We would like to compute the flux $\iint_\Sigma \mathbf{F} \cdot \mathbf{N}\, d\sigma$ of this vector field across the faces of the cube. This integral can certainly be calculated directly, but it requires performing the integration over each of the six smooth faces of $\Sigma$. It is easier to use the triple integral from Gauss's theorem. Compute the divergence
$$\nabla \cdot \mathbf{F} = 2x + 2y + 2z,$$
and then
$$\text{flux} = \iint_\Sigma \mathbf{F} \cdot \mathbf{N}\, d\sigma = \iiint_M \nabla \cdot \mathbf{F}\, dV = 2\iiint_M (x + y + z)\, dV$$
$$= \int_0^1\int_0^1\int_0^1 (2x + 2y + 2z)\, dz\, dy\, dx = \int_0^1\int_0^1 (2x + 2y + 1)\, dy\, dx = \int_0^1 (2x + 2)\, dx = 3.$$
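The "harder" direct route over the six faces can be carried out numerically, which makes a nice sanity check on the divergence-theorem value (the grid size is arbitrary):

```python
# Sketch: direct flux of F = (x^2, y^2, z^2) through the six faces of the
# unit cube, for comparison with the divergence-theorem value 3.
def F(x, y, z):
    return (x * x, y * y, z * z)

n = 200
h = 1 / n
flux = 0.0
for i in range(n):
    for j in range(n):
        a, b = (i + 0.5) * h, (j + 0.5) * h
        # opposite face pairs: N = -i at x=0 and N = +i at x=1, etc.
        flux += (-F(0, a, b)[0] + F(1, a, b)[0]) * h * h
        flux += (-F(a, 0, b)[1] + F(a, 1, b)[1]) * h * h
        flux += (-F(a, b, 0)[2] + F(a, b, 1)[2]) * h * h
print(flux)   # approximately 3.0
```

Only the three faces at $x = 1$, $y = 1$, $z = 1$ contribute, each supplying flux 1.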
Now we will move to more substantial uses of the theorem.
13.7.1 Archimedes's Principle
Archimedes's Principle states that the buoyant force a fluid exerts on a solid object immersed in it is equal to the weight of the fluid displaced. A bar of soap floats or sinks in a full bathtub for the same reason a battleship floats or sinks in the ocean. The issue rests with the weight of the fluid displaced by the object. We will derive this principle.
Consider a solid object $M$ bounded by a piecewise smooth surface $\Sigma$. Let $\rho$ be the constant density of the fluid. Draw a coordinate system as in Figure 13.46, with $M$ below the surface of the fluid. Using the fact that pressure is the product of depth and density, the pressure $p(x,y,z)$ at a point on $\Sigma$ is given by $p(x,y,z) = -\rho z$. The negative sign is used because $z$ is negative in the downward direction and we want the pressure to be positive.
Now consider a piece $\Sigma_j$ of the surface, also shown in Figure 13.46. The force of the pressure on this surface element has magnitude approximately $-\rho z$ times the area $\Delta A_j$ of $\Sigma_j$. If $\mathbf{N}$ is the unit outer normal to $\Sigma_j$, then the force caused by the pressure on $\Sigma_j$ is approximately $\rho z\mathbf{N}\,\Delta A_j$. The vertical component of this force is the magnitude of the buoyant force acting upward on $\Sigma_j$. This vertical component is $\rho z\mathbf{N} \cdot \mathbf{k}\,\Delta A_j$.
Sum these vertical components over the entire surface to obtain approximately the net buoyant force on the object, then take the limit as the surface elements are chosen smaller (areas tending to zero). We obtain in this limit that
$$\text{net buoyant force on } \Sigma = \iint_\Sigma \rho z\mathbf{N} \cdot \mathbf{k}\, d\sigma.$$
Write this integral as $\iint_\Sigma \rho z\mathbf{k} \cdot \mathbf{N}\, d\sigma$ and apply Gauss's theorem to convert the surface integral into a triple integral:
$$\text{net buoyant force on } \Sigma = \iiint_M \nabla \cdot (\rho z\mathbf{k})\, dV.$$
But $\nabla \cdot (\rho z\mathbf{k}) = \rho$, so
$$\text{net buoyant force on } \Sigma = \iiint_M \rho\, dV = \rho\,(\text{volume of } M).$$
But this is exactly the weight of the fluid displaced, establishing Archimedes's Principle.

FIGURE 13.46 Pressure force on $\Sigma_j \approx \rho z\mathbf{N}\,\Delta A_j$; vertical component $= \rho z\mathbf{N} \cdot \mathbf{k}\,\Delta A_j$.
13.7.2 The Heat Equation We will derive a partial differential equation that models heat conduction. Suppose some medium (for example, a metal bar, the air in a room or water in a pool) has density x y z, specific heat x y z, and coefficient of thermal conductivity Kx y z. Let ux y z t be the temperature of the medium at time t and point x y z. We want to derive an equation for u. We will employ a device used frequently in deriving mathematical models. Consider an imaginary smooth closed surface within the medium, bounding a solid region M. The amount of heat energy leaving M across in a time interval t is K&u · Nd t This is the flux of the vector (K times the gradient of u) across this surface, multiplied by the length of the time interval. But, the change in temperature at x y z in M in this time interval is approximately u/ t t, so the resulting heat loss in M is
u dV t
t M
13.7 The Divergence Theorem of Gauss
569
Assuming that there are no heat sources or losses within M (for example, chemical reactions or radioactivity), the change in heat energy in M over this time interval must equal the heat exchange across . Then
u K&u · Nd t = dV t
t M Therefore
K&u · Nd =
M
u dV
t
Apply Gauss’s theorem to the surface integral to obtain
u & · K&udV = dV
t M M The role of Gauss’s theorem here is to convert the surface integral to a triple integral, thus obtaining an equation with the same kind of integral on both sides. This allows us to combine terms and write the last equation as u − & · K&u dV = 0
t M Now keep in mind a crucial point— is any smooth closed surface within the medium. Assume that the integrand in the last equation is continuous. If this integrand were nonzero at any point P0 of the medium, then it would be positive or negative at P0 , say positive. By continuity of this integrand, there would be a sphere S, centered at P0 , of small enough radius that the integrand would be strictly positive on and within S. But then we would have u − & · K&u dV > 0
t M in which M is the solid ball bounded by S. By choosing = S, this is a contradiction. We conclude that
  μρ (∂u/∂t) − ∇ · (K∇u) = 0

at all points in the medium, for all times. This gives us the partial differential equation

  μρ (∂u/∂t) = ∇ · (K∇u)
for the temperature function at any point and time. This equation is called the heat equation. We can expand

  ∇ · (K∇u) = ∇ · (K (∂u/∂x) i + K (∂u/∂y) j + K (∂u/∂z) k)
            = ∂/∂x (K ∂u/∂x) + ∂/∂y (K ∂u/∂y) + ∂/∂z (K ∂u/∂z)
            = (∂K/∂x)(∂u/∂x) + (∂K/∂y)(∂u/∂y) + (∂K/∂z)(∂u/∂z) + K (∂²u/∂x² + ∂²u/∂y² + ∂²u/∂z²)
            = ∇K · ∇u + K∇²u,
CHAPTER 13
Vector Integral Calculus
in which

  ∇²u = ∂²u/∂x² + ∂²u/∂y² + ∂²u/∂z²

is called the Laplacian of u. (∇² is read “del squared”.) Now the heat equation can be written

  μρ (∂u/∂t) = ∇K · ∇u + K∇²u.

If K is constant, then its gradient is the zero vector and this equation simplifies to

  ∂u/∂t = (K/μρ) ∇²u.

In the case of one space dimension (for example, if u(x, t) is the temperature distribution in a thin bar lying along a segment of the x axis), this is

  ∂u/∂t = k (∂²u/∂x²)

with k = K/μρ. The steady-state case occurs when u does not change with time. In this case ∂u/∂t = 0, and the last equation becomes

  ∇²u = 0,

a partial differential equation called Laplace’s equation. In Chapters 18 and 19 we will write solutions of the heat equation and Laplace’s equation under various conditions.
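The product-rule expansion of ∇ · (K∇u) used above can be checked symbolically. The following sketch (the helper functions `grad` and `div` are ad hoc names, not from the text) uses SymPy to confirm ∇ · (K∇u) = ∇K · ∇u + K∇²u for arbitrary smooth K and u:

```python
# Symbolic check: div(K grad u) = grad K . grad u + K * laplacian(u),
# for arbitrary smooth functions K(x, y, z) and u(x, y, z).
import sympy as sp

x, y, z = sp.symbols('x y z')
K = sp.Function('K')(x, y, z)
u = sp.Function('u')(x, y, z)

grad = lambda f: [sp.diff(f, v) for v in (x, y, z)]
div = lambda F: sum(sp.diff(F[i], v) for i, v in enumerate((x, y, z)))

lhs = div([K * g for g in grad(u)])                   # div(K grad u)
rhs = sum(a * b for a, b in zip(grad(K), grad(u))) \
      + K * sum(sp.diff(u, v, 2) for v in (x, y, z))  # grad K . grad u + K lap u

assert sp.simplify(lhs - rhs) == 0
```

The identity holds termwise, which is exactly the computation carried out by hand above.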
13.7.3 The Divergence Theorem as a Conservation of Mass Principle

We will derive a model providing a physical interpretation of the divergence of a vector. Let F(x, y, z, t) be the velocity of a fluid moving in a region of 3-space at point (x, y, z) and time t. Let P₀ be a point in this region. Place an imaginary sphere Σ_r of radius r about P₀, as in Figure 13.47. Σ_r bounds a solid ball M_r. Let N be the unit outer normal to Σ_r. We know that ∫∫_{Σ_r} F · N dσ is the flux of F out of M_r across Σ_r.
If r is sufficiently small, then for a given time, ∇ · F(x, y, z, t) is approximated by ∇ · F(P₀, t) at all points (x, y, z) of M_r, to within any desired tolerance. Therefore

  ∫∫∫_{M_r} ∇ · F(x, y, z, t) dV ≈ ∫∫∫_{M_r} ∇ · F(P₀, t) dV
    = ∇ · F(P₀, t) (volume of M_r) = (4/3)πr³ ∇ · F(P₀, t).
FIGURE 13.47 [The sphere Σ_r of radius r about P₀, bounding the solid ball M_r, with unit outer normal N]
Then

  ∇ · F(P₀, t) ≈ (3/(4πr³)) ∫∫∫_{M_r} ∇ · F(x, y, z, t) dV = (3/(4πr³)) ∫∫_{Σ_r} F · N dσ

by Gauss’s theorem. Let r → 0. Then Σ_r contracts to its center P₀ and this approximation becomes an equality:

  ∇ · F(P₀, t) = lim_{r→0} (3/(4πr³)) ∫∫_{Σ_r} F · N dσ.

On the right is the limit, as r → 0, of the flux of F across the sphere of radius r, divided by the volume of this sphere. This is the amount per unit volume of fluid flowing out of M_r across Σ_r. Since the sphere contracts to P₀ in this limit, we interpret the right side, hence also the divergence of F at P₀, as a measure of fluid flow away from P₀. This provides a physical sense of the divergence of a vector field.
In view of this interpretation, the equation

  ∫∫_Σ F · N dσ = ∫∫∫_M ∇ · F dV

states that the flux of F out of M across its bounding surface exactly balances the divergence of fluid away from the points of M. This is a conservation of mass statement, in the absence of fluid produced or destroyed within M, and provides a model for the divergence theorem.
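The flux-per-volume limit above can be illustrated numerically. In the following sketch the field F = xi + y²j + zk and the point P₀ are arbitrary illustrative choices; the flux across a small sphere is approximated by a midpoint-rule quadrature in spherical angles and compared with ∇ · F(P₀):

```python
# For a small sphere about P0, (flux of F across the sphere)/(volume of the
# ball) approximates div F at P0.
import numpy as np

def flux_over_volume(P0, r, n=400):
    th = (np.arange(n) + 0.5) * np.pi / n          # polar angle, midpoint rule
    ph = (np.arange(2 * n) + 0.5) * np.pi / n      # azimuthal angle over [0, 2pi]
    TH, PH = np.meshgrid(th, ph, indexing='ij')
    # unit outer normal on a sphere = radial direction
    N = np.stack([np.sin(TH) * np.cos(PH), np.sin(TH) * np.sin(PH), np.cos(TH)])
    X = P0.reshape(3, 1, 1) + r * N
    F = np.stack([X[0], X[1]**2, X[2]])            # F = (x, y^2, z)
    integrand = (F * N).sum(axis=0) * r**2 * np.sin(TH)
    flux = integrand.sum() * (np.pi / n) * (np.pi / n)
    return flux / (4 / 3 * np.pi * r**3)

P0 = np.array([0.5, 0.5, 0.5])
approx = flux_over_volume(P0, r=0.01)
exact = 2 + 2 * P0[1]                              # div F = 1 + 2y + 1
assert abs(approx - exact) < 1e-3
```

Shrinking r further drives the approximation to the exact divergence, in line with the limit formula.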
SECTION 13.7
PROBLEMS
In each of Problems 1 through 8, evaluate ∫∫_Σ F · N dσ or ∫∫∫_M div F dV, whichever is most convenient.
1. F = xi + yj − zk, Σ the sphere of radius 4 about (1, 1, 1)
2. F = 4xi − 6yj + k, Σ the surface of the solid cylinder x² + y² ≤ 4, 0 ≤ z ≤ 2 (the surface includes the end caps of the cylinder)
3. F = 2yzi − 4xzj + xyk, Σ the sphere of radius 5 about (−1, 3, 1)
4. F = x³i + y³j + z³k, Σ the sphere of radius 1 about the origin
5. F = 4xi − zj + xk, Σ the hemisphere x² + y² + z² = 1, z ≥ 0, including the base consisting of points (x, y, 0) with x² + y² ≤ 1
6. F = (x − y)i + (y − 4xz)j + xzk, Σ the surface of the rectangular box bounded by the coordinate planes x = 0, y = 0 and z = 0 and by the planes x = 4, y = 2 and z = 3
7. F = x²i + y²j + z²k, Σ the cone z = √(x² + y²) for x² + y² ≤ 2, together with the top cap consisting of points (x, y, √2) with x² + y² ≤ 2
8. F = x²i − e^z j + zk, Σ the surface bounding the cylinder x² + y² ≤ 4, 0 ≤ z ≤ 2 (including the top and bottom caps of the cylinder)
9. Let Σ be a smooth closed surface and F a vector field with components that are continuous with continuous first and second partial derivatives on Σ and its interior. Evaluate ∫∫_Σ (∇ × F) · N dσ.
10. Let φ(x, y, z) and ψ(x, y, z) be continuous with continuous first and second partial derivatives on a smooth closed surface Σ and its interior M. Suppose ∇ψ = O in M. Prove that ∫∫∫_M ∇²ψ dV = 0.
11. Show that, under the conditions of Problem 10, if ∇φ = ∇ψ = O, then ∫∫∫_M (φ∇²ψ − ψ∇²φ) dV = 0.
12. Let Σ be a smooth closed surface bounding an interior M. Show that

  volume of M = (1/3) ∫∫_Σ R · N dσ,

where R = xi + yj + zk is a position vector for Σ.
13. Suppose f and g satisfy Laplace’s equation in a region M bounded by a smooth closed surface Σ. Suppose
∂f/∂n = ∂g/∂n on Σ. Prove that for some constant k, f(x, y, z) = g(x, y, z) + k for all (x, y, z) in M.
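As a quick illustration of the volume formula in Problem 12: for a sphere of radius a about the origin, R · N = a at every surface point, so the surface integral reduces to a times the surface area (the radius a = 2 below is an arbitrary choice):

```python
# Verify volume = (1/3) * surface integral of R . N for a sphere of radius a.
import math

a = 2.0
surface_integral = a * 4 * math.pi * a**2   # R.N = a on the sphere, times its area
volume = surface_integral / 3
assert abs(volume - 4 / 3 * math.pi * a**3) < 1e-12
```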
13.8 The Integral Theorem of Stokes

We have seen that the conclusion of Green’s theorem can be written

  ∮_C F · T ds = ∫∫_D (∇ × F) · k dA,

in which T is the unit tangent to C, a simple positively oriented closed curve enclosing a region D. Think of D as a flat surface with unit normal vector k and bounded by C. To generalize to three dimensions, allow C to be a closed curve in 3-space, bounding a smooth surface Σ as in Figure 13.48. Here Σ need not be a closed surface. Let N be a unit normal to Σ. This raises a subtle point. At any point of Σ there are two normal vectors, as shown in Figure 13.49. Which should we choose? In addition to this decision, we must choose a direction on C. In the plane we chose counterclockwise as positive orientation, but this has no meaning in three dimensions.
First we will give a rule for choosing a particular unit normal to Σ at each point. If Σ has coordinate functions x = x(u, v), y = y(u, v), z = z(u, v), then the normal vector

  N = (∂(y, z)/∂(u, v)) i + (∂(z, x)/∂(u, v)) j + (∂(x, y)/∂(u, v)) k,

divided by its length, yields a unit normal to Σ. The negative of this unit normal is also a unit normal at the same point. Choose either this vector or its negative to use as the normal to Σ, and call it n. Whichever is chosen as n, use it at all points of Σ. That is, do not use n at one point and −n at another.
Now use the choice of n to determine an orientation or direction on C to be called the positive orientation. Referring to Figure 13.50, at any point on C, if you stand along n (that is, your head is at the tip of n), then the positive direction on C is the one in which you walk to have Σ over your left shoulder. The arrow shows the orientation on C obtained in this way. Although this is not a rigorous definition, it is sufficient for our purpose, without becoming enmeshed in topological details. With this direction, we say that C has been oriented coherently with n. If we had chosen the normal in the opposite direction, then we would have reached the opposite orientation on C. The choice of the normal determines the orientation on the curve. There is no intrinsic positive or negative orientation of the curve, simply an orientation coherent with the choice of normal. With this understanding, we can state Stokes’s theorem.
FIGURE 13.48 [A closed curve C in 3-space bounding a surface Σ]
FIGURE 13.49 Normals to a surface at a point.
FIGURE 13.50 [Orientation of C coherent with the chosen normal N]
THEOREM 13.9 Stokes

Let Σ be a piecewise smooth surface bounded by a piecewise smooth curve C. Suppose a unit normal n has been chosen on Σ and that C is oriented coherently with n. Let F(x, y, z) be a vector field whose component functions are continuous with continuous first and second partial derivatives on Σ. Then

  ∮_C F · dR = ∫∫_Σ (∇ × F) · n dσ.

We will write this conclusion in terms of coordinates and component functions. Let the component functions of F be, respectively, f, g and h. Then

  ∮_C F · dR = ∮_C f(x, y, z) dx + g(x, y, z) dy + h(x, y, z) dz.
Next,

  ∇ × F = det[ i  j  k ; ∂/∂x  ∂/∂y  ∂/∂z ; f  g  h ]
        = (∂h/∂y − ∂g/∂z) i + (∂f/∂z − ∂h/∂x) j + (∂g/∂x − ∂f/∂y) k.

The normal to the surface is given by equation (13.5) as

  N = (∂(y, z)/∂(u, v)) i + (∂(z, x)/∂(u, v)) j + (∂(x, y)/∂(u, v)) k.

Use this to define the unit normal

  n(u, v) = N(u, v)/‖N(u, v)‖.

Then

  (∇ × F) · n = (∇ × F) · N(u, v)/‖N(u, v)‖
    = (1/‖N(u, v)‖) [ (∂h/∂y − ∂g/∂z) ∂(y, z)/∂(u, v)
        + (∂f/∂z − ∂h/∂x) ∂(z, x)/∂(u, v) + (∂g/∂x − ∂f/∂y) ∂(x, y)/∂(u, v) ].

Then

  ∫∫_Σ (∇ × F) · n dσ
    = ∫∫_D (1/‖N(u, v)‖) [ (∂h/∂y − ∂g/∂z) ∂(y, z)/∂(u, v)
        + (∂f/∂z − ∂h/∂x) ∂(z, x)/∂(u, v) + (∂g/∂x − ∂f/∂y) ∂(x, y)/∂(u, v) ] ‖N(u, v)‖ du dv
    = ∫∫_D [ (∂h/∂y − ∂g/∂z) ∂(y, z)/∂(u, v)
        + (∂f/∂z − ∂h/∂x) ∂(z, x)/∂(u, v) + (∂g/∂x − ∂f/∂y) ∂(x, y)/∂(u, v) ] du dv,

in which the coordinate functions x(u, v), y(u, v) and z(u, v) from Σ are substituted into the integral, and D is the set of points (u, v) over which these coordinate functions are defined. Keep in mind that the function to be integrated in Stokes’s theorem is (∇ × F) · n, in which n = N/‖N‖ is a unit normal. However, in converting the surface integral to a double integral over D, using the definition of surface integral, (∇ × F) · n must be multiplied by ‖N(u, v)‖, with N(u, v) determined by equation (13.5).
EXAMPLE 13.32

Let F(x, y, z) = −yi + xj − xyzk and let Σ consist of the part of the cone z = √(x² + y²) for x² + y² ≤ 9. We will compute both sides of the conclusion of Stokes’s theorem to illustrate the various terms and integrals involved.
The cone is shown in Figure 13.51. Its boundary curve C is the circle around the top of the cone, the curve x² + y² = 9 in the plane z = 3. In this example the surface is described by z = S(x, y). Here x and y are the parameters, varying over the disk D given by x² + y² ≤ 9. We can compute a normal vector

  N = −(∂z/∂x) i − (∂z/∂y) j + k = −(x/z) i − (y/z) j + k.

For (∇ × F) · n in Stokes’s theorem, n is a unit normal, so compute the norm of N:

  ‖N‖ = ‖−(x/z) i − (y/z) j + k‖ = √(x²/z² + y²/z² + 1) = √((x² + y² + x² + y²)/(x² + y²)) = √2,

since z² = x² + y² on the cone. Use the unit normal

  n = N/‖N‖ = (1/(√2 z)) (−xi − yj + zk).

FIGURE 13.51 [The cone Σ: z = √(x² + y²), with boundary curve C: x² + y² = 9, z = 3]
FIGURE 13.52 [The inner normal N at (3, 0, 3) and the coherent orientation of C]

This normal is defined at all points of the cone except the origin, where there is no normal. n is an inner normal, pointing from any point on the cone into the region bounded by the cone. If we stand along n at points of C and imagine walking along C in the direction of the arrow in Figure 13.52, then the surface is over our left shoulder. Therefore this arrow orients
C coherently with n. If we had used −n as normal vector, we would orient C in the other direction. We can parametrize C by

  x = 3 cos(t), y = 3 sin(t), z = 3 for 0 ≤ t ≤ 2π.

The point (3 cos(t), 3 sin(t), 3) traverses C in the positive direction (as determined by n) as t increases from 0 to 2π. This completes the preliminary work and we can evaluate the integrals. For the line integral,

  ∮_C F · dR = ∮_C −y dx + x dy − xyz dz
             = ∫₀^{2π} [−3 sin(t)(−3 sin(t)) + 3 cos(t)(3 cos(t))] dt = ∫₀^{2π} 9 dt = 18π.
For the surface integral, first compute the curl of F:

  ∇ × F = det[ i  j  k ; ∂/∂x  ∂/∂y  ∂/∂z ; −y  x  −xyz ] = −xz i + yz j + 2k.

Then

  (∇ × F) · n = (1/(√2 z)) (x²z − y²z + 2z) = (1/√2) (x² − y² + 2).

Then

  ∫∫_Σ (∇ × F) · n dσ = ∫∫_D (∇ × F) · n ‖N‖ dx dy
    = ∫∫_D (1/√2)(x² − y² + 2) √2 dx dy
    = ∫∫_D (x² − y² + 2) dx dy.

Use polar coordinates on D to write this integral as

  ∫₀^{2π} ∫₀³ (r² cos²(θ) − r² sin²(θ) + 2) r dr dθ
    = ∫₀^{2π} ∫₀³ r³ cos(2θ) dr dθ + ∫₀^{2π} ∫₀³ 2r dr dθ
    = [½ sin(2θ)]₀^{2π} [¼ r⁴]₀³ + 2π [r²]₀³ = 18π.

The following are two applications of Stokes’s theorem.
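Both sides of Stokes's theorem in Example 13.32 can be checked numerically, using the components from the worked computation (f = −y, g = x, h = −xyz). This sketch approximates the line integral with a midpoint rule in t and the surface integral with a midpoint rule in polar coordinates; both should come out near 18π:

```python
import numpy as np

# Line integral: C is x = 3cos t, y = 3sin t, z = 3 (so dz = 0), and
# F.dR = -y dx + x dy = 9 dt.  Midpoint rule in t:
n = 100000
t = (np.arange(n) + 0.5) * 2 * np.pi / n
x, y = 3 * np.cos(t), 3 * np.sin(t)
line_integral = np.sum(-y * (-3 * np.sin(t)) + x * (3 * np.cos(t))) * 2 * np.pi / n

# Surface integral: (curl F . n)||N|| dx dy = (x^2 - y^2 + 2) dx dy over the
# disk x^2 + y^2 <= 9, computed in polar coordinates with a midpoint rule.
nr, nt = 600, 1200
r = (np.arange(nr) + 0.5) * 3 / nr
th = (np.arange(nt) + 0.5) * 2 * np.pi / nt
R, TH = np.meshgrid(r, th, indexing='ij')
G = (R**2 * np.cos(2 * TH) + 2) * R          # x^2 - y^2 = r^2 cos(2 theta)
surface_integral = G.sum() * (3 / nr) * (2 * np.pi / nt)

assert abs(line_integral - 18 * np.pi) < 1e-9
assert abs(surface_integral - 18 * np.pi) < 1e-6
```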
13.8.1 An Interpretation of Curl

We will use Stokes’s theorem to argue for a physical interpretation of the curl operation. Think of F(x, y, z) as the velocity of a fluid and let P₀ be any point in the fluid. Consider a disk Σ_r of radius r about P₀, with unit normal vector n and boundary circle C_r coherently oriented, as in Figure 13.53. For the disk the normal vector is constant. By Stokes’s theorem,

  ∮_{C_r} F · dR = ∫∫_{Σ_r} (∇ × F) · n dσ.

Since R′(t) is a tangent vector to C_r, F · R′ is the tangential component of the velocity about C_r, and ∮_{C_r} F · dR measures the circulation of the fluid about C_r.
FIGURE 13.53 [The disk Σ_r of radius r about P₀, with normal N and coherently oriented boundary circle C_r]
By choosing r sufficiently small, (∇ × F)(x, y, z) is approximated by (∇ × F)(P₀) as closely as we like on Σ_r. Further, since n is constant,

  circulation of F about C_r ≈ ∫∫_{Σ_r} (∇ × F)(P₀) · n dσ
    = (∇ × F)(P₀) · n (area of the disk)
    = πr² (∇ × F)(P₀) · n.

Therefore

  (∇ × F)(P₀) · n ≈ (1/(πr²)) (circulation of F about C_r).

As r → 0, the disk contracts to its center P₀ and we obtain

  (∇ × F)(P₀) · n = lim_{r→0} (1/(πr²)) (circulation of F about C_r).

Since n is normal to the plane of C_r, this equation can be read

  (∇ × F)(P₀) · n = circulation of F per unit area in the plane normal to n.

Thus the curl of F is a measure of rotation of the fluid at a point. This is the reason a fluid is called irrotational if the curl of the velocity vector is zero. For example, any conservative vector field is irrotational, because if F = ∇φ, then ∇ × F = ∇ × ∇φ = O.
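The circulation-per-unit-area interpretation can be illustrated with the field F = −yi + xj, for which ∇ × F = 2k; the radii and grid size below are arbitrary illustrative choices:

```python
# Circulation of F = (-y, x, 0) around a circle of radius r in the plane
# z = 0, divided by the disk area, should equal (curl F).k = 2 for every r.
import numpy as np

def circulation_over_area(r, n=20000):
    t = (np.arange(n) + 0.5) * 2 * np.pi / n
    x, y = r * np.cos(t), r * np.sin(t)
    dx, dy = -r * np.sin(t), r * np.cos(t)           # components of dR/dt
    circ = np.sum(-y * dx + x * dy) * 2 * np.pi / n  # integral of F . dR
    return circ / (np.pi * r**2)

for radius in (1.0, 0.1, 0.01):
    assert abs(circulation_over_area(radius) - 2.0) < 1e-9
```

For this rigid-rotation field the ratio is exactly 2 at every radius, so the limit in the text is reached immediately.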
13.8.2 Potential Theory in 3-Space

As in the plane, a vector field F(x, y, z) in 3-space is conservative if F = ∇φ for some potential function φ. By exactly the same reasoning as applied in the two-dimensional case, we get

  ∫_C F · dR = φ(P₁) − φ(P₀)

if P₀ is the initial point of C and P₁ the terminal point. Therefore the line integral of a conservative vector field in 3-space is independent of path. If a potential function exists, we attempt to find it by integration just as in the plane.
EXAMPLE 13.33

Let

  F(x, y, z) = (yze^{xyz} − 4x) i + (xze^{xyz} + z + cos(y)) j + (xye^{xyz} + y) k.

For this to be the gradient of a scalar field φ, we must have

  ∂φ/∂x = yze^{xyz} − 4x,
  ∂φ/∂y = xze^{xyz} + z + cos(y),

and

  ∂φ/∂z = xye^{xyz} + y.

Begin with one of these equations, say the last, and integrate with respect to z to get

  φ(x, y, z) = e^{xyz} + yz + k(x, y),

where the “constant” of integration may involve the other two variables. Now we need

  ∂φ/∂x = ∂/∂x (e^{xyz} + yz + k(x, y)) = yze^{xyz} + ∂k/∂x = yze^{xyz} − 4x.

This will be satisfied if

  ∂k/∂x = −4x,

so k(x, y) must have the form

  k(x, y) = −2x² + c(y).

Thus far φ(x, y, z) must have the appearance

  φ(x, y, z) = e^{xyz} + yz − 2x² + c(y).

Finally, we must satisfy

  ∂φ/∂y = ∂/∂y (e^{xyz} + yz − 2x² + c(y)) = xze^{xyz} + z + c′(y) = xze^{xyz} + z + cos(y).

Then c′(y) = cos(y) and we may choose c(y) = sin(y).
A potential function is given by

  φ(x, y, z) = e^{xyz} + yz − 2x² + sin(y).

Of course, for any number a, φ(x, y, z) + a is also a potential function for F.
As in the plane, in 3-space there are vector fields that are not conservative. We would like to develop a test to determine when F has a potential function. The discussion follows that in Section 13.3.1 for the plane. As we saw in the plane, the test requires conditions not only on F, but on the set D of points on which we want to find a potential function. We define a set D of points in 3-space to be a domain if:
1. about every point of D there exists a sphere containing only points of D, and
2. between any two points of D there is a path lying entirely in D.
This definition is analogous to the definition made for sets in the plane. For example, the set of points bounded by two disjoint spheres is not a domain because it fails to satisfy (2), while the set of points on or inside the solid unit sphere about the origin fails to be a domain because of condition (1). The set of points (x, y, z) with x ≥ 0, y ≥ 0 and z ≥ 0 is not a domain because it fails condition (1). For example, there is no sphere about the origin containing only points with nonnegative coordinates. The set of points (x, y, z) with x > 0, y > 0 and z > 0 is a domain.
On a domain, existence of a potential function is equivalent to independence of path.

THEOREM 13.10
Let D be a domain in 3-space, and let F be continuous on D. Then ∫_C F · dR is independent of path in D if and only if F is conservative.

We already know that existence of a potential function implies independence of path. For the converse, suppose ∫_C F · dR is independent of path in D. Choose any P₀ in D. If P is any point of D, there is a path C in D from P₀ to P, and we can define

  φ(P) = ∫_C F · dR.
Because this line integral depends only on the end points of any path in D, this defines a function φ(x, y, z) for all (x, y, z) in D. Now the argument used in the proof of Theorem 13.6 can be essentially duplicated to show that F = ∇φ.
With one more condition on the domain D, we can derive a simple test for a vector field to be conservative. A set D of points in 3-space is simply connected if every simple closed path in D is the boundary of a piecewise smooth surface lying in D. This condition enables us to use Stokes’s theorem to derive the condition we want.

THEOREM 13.11
Let D be a simply connected domain in 3-space. Let F and ∇ × F be continuous on D. Then F is conservative if and only if ∇ × F = O in D.

Thus, in simply connected domains, the conservative vector fields are the ones having curl zero, that is, the irrotational vector fields.
In one direction, the proof is simple. If F = ∇φ, then ∇ × F = ∇ × ∇φ = O, without the requirement of simple connectivity. In the other direction, suppose ∇ × F = O. To prove that F is conservative, it is enough to prove that ∫_C F · dR is independent of path in D. Let C and K be two paths from P₀ to P₁ in D.
FIGURE 13.54 [Paths C and K from P₀ to P₁, together forming a closed path]
Form a closed path L in D consisting of C and −K, as in Figure 13.54. Since D is simply connected, there is a piecewise smooth surface Σ in D having boundary L. Then

  ∮_L F · dR = ∫_C F · dR − ∫_K F · dR = ∫∫_Σ (∇ × F) · n dσ = 0,

so

  ∫_C F · dR = ∫_K F · dR
and the line integral is independent of path; hence F is conservative.
If G(x, y) = f(x, y) i + g(x, y) j, then we can think of G as a vector field in 3-space by writing

  G(x, y) = f(x, y) i + g(x, y) j + 0k.

Then

  ∇ × G = det[ i  j  k ; ∂/∂x  ∂/∂y  ∂/∂z ; f(x, y)  g(x, y)  0 ] = (∂g/∂x − ∂f/∂y) k,

so the condition ∇ × G = O in this two-dimensional case is exactly the condition of Theorem 13.5.
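The curl test of Theorem 13.11 and the potential found in Example 13.33 can both be confirmed symbolically. This SymPy sketch checks that ∇ × F = O and that ∇φ = F for φ = e^{xyz} + yz − 2x² + sin(y):

```python
# Check curl F = O and grad phi = F for the field of Example 13.33.
import sympy as sp

x, y, z = sp.symbols('x y z')
f = y * z * sp.exp(x * y * z) - 4 * x
g = x * z * sp.exp(x * y * z) + z + sp.cos(y)
h = x * y * sp.exp(x * y * z) + y

curl = (sp.diff(h, y) - sp.diff(g, z),
        sp.diff(f, z) - sp.diff(h, x),
        sp.diff(g, x) - sp.diff(f, y))
assert all(sp.simplify(c) == 0 for c in curl)

phi = sp.exp(x * y * z) + y * z - 2 * x**2 + sp.sin(y)
assert sp.simplify(sp.diff(phi, x) - f) == 0
assert sp.simplify(sp.diff(phi, y) - g) == 0
assert sp.simplify(sp.diff(phi, z) - h) == 0
```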
SECTION 13.8
PROBLEMS
In each of Problems 1 through 5, use Stokes’s theorem to evaluate ∮_C F · dR or ∫∫_Σ (∇ × F) · N dσ, whichever appears easier.
1. F = yx²i − xy²j + z²k, Σ the hemisphere x² + y² + z² = 4, z ≥ 0
2. F = xyi + yzj + xzk, Σ the paraboloid z = x² + y² for x² + y² ≤ 9
3. F = zi + xj + yk, Σ the cone z = √(x² + y²) for 0 ≤ z ≤ 4
4. F = z²i + x²j + y²k, Σ the part of the paraboloid z = 6 − x² − y² above the x, y plane
5. F = xyi + yzj + xyk, Σ the part of the plane 2x + 4y + z = 8 in the first octant
6. Calculate the circulation of F = (x − y)i + x²yj + xzᵃk counterclockwise about the circle x² + y² = 1. Here a is a positive constant. Hint: Use Stokes’s theorem, with any smooth surface having the circle as boundary.
7. Use Stokes’s theorem to evaluate ∮_C F · T ds, where C is the boundary of the part of the plane x + 4y + z = 12 lying in the first octant, and F = (x − z)i + (y − x)j + (z − y)k.
In each of Problems 8 through 14, let D be all of 3-space (so D is a simply connected domain). Test to see if F is conservative. If it is, find a potential function.
8. F = 2xi − 2yj + 2zk
9. F = i − 2j + k
10. F = yz cos(x)i + (z sin(x) + 1)j + y sin(x)k
11. F = (x² − 2)i + xyzj − yz²k
12. F = e^{xyz}(1 + xyz)i + x²ze^{xyz}j + x²ye^{xyz}k
13. F = (cos(x) + y sin(x))i + x sin(xy)j + k
14. F = (2x² + 3y²z)i + 6xyzj + 3xy²k
In each of Problems 15 through 20, evaluate the line integral of the vector field on any path from the first point to the second by finding a potential function.
15. F = i − 9y²zj − 3y³k; (1, 1, 1), (0, 3, 5)
16. F = (y cos(xz) − xyz sin(xz))i + x cos(xz)j − x²y sin(xz)k; (1, 0, π), (1, 1, 7)
17. F = 6x²e^{yz}i + 2x³ze^{yz}j + 2x³ye^{yz}k; (0, 0, 0), (1, 2, −1)
18. F = −8y²i − (16xy + 4z)j − 4yk; (−2, 1, 1), (1, 3, 2)
19. F = −i + 2z²j + 4yzk; (0, 0, −4), (1, 1, 6)
20. F = (y − 4xz)i + xj + (3z² − 2x²)k; (1, 1, 1), (3, 1, 4)
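As an illustration in the spirit of Problem 6: for a closed curve lying in the plane z = 0, Stokes's theorem can be applied with the flat disk as surface, so only the k component of the curl matters. With F₁ = x − y and F₂ = x²y, the circulation about the unit circle equals the integral of 2xy + 1 over the unit disk, which is π (the quadrature size below is arbitrary):

```python
# Circulation of (x - y, x^2 y, *) about the unit circle in the plane z = 0,
# computed directly as a line integral; dz = 0, so the third component of F
# does not contribute.
import numpy as np

n = 200000
t = (np.arange(n) + 0.5) * 2 * np.pi / n
x, y = np.cos(t), np.sin(t)
dx, dy = -np.sin(t), np.cos(t)
circulation = np.sum((x - y) * dx + x**2 * y * dy) * 2 * np.pi / n
assert abs(circulation - np.pi) < 1e-6
```

This agrees with the surface-integral side: the k component of the curl is ∂g/∂x − ∂f/∂y = 2xy + 1, whose integral over the unit disk is π.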
PART 5

Fourier Analysis, Orthogonal Expansions, and Wavelets

CHAPTER 14 Fourier Series
CHAPTER 15 The Fourier Integral and Fourier Transforms
CHAPTER 16 Special Functions, Orthogonal Expansions and Wavelets
In 1807 the French mathematician Joseph Fourier (1768–1830) submitted a paper to the Academy of Sciences in Paris. In it he presented a mathematical treatment of problems involving heat conduction. Although the paper was rejected for lack of rigor, it contained ideas that were so rich and widely applicable that they would occupy mathematicians in research to the present day. One surprising implication of Fourier’s work was that many functions can be expanded in infinite series or integrals involving sines and cosines. This revolutionary idea sparked a heated debate among leading mathematicians of the day and led to important advances in mathematics (Cantor’s work on cardinals and ordinals, orders of infinity, measure theory, real and complex analysis, differential equations), science and engineering (data compression, signal analysis), and applications undreamed of in Fourier’s day (CAT scans, PET scans, nuclear magnetic resonance). Today the term Fourier analysis refers to many extensions of Fourier’s original insights, including various kinds of Fourier series and integrals, real and complex Fourier transforms, discrete and finite transforms, and, because of their wide applicability, a variety of computer 581
programs for efficiently computing Fourier coefficients and transforms. The ideas behind Fourier series have also found important generalizations in a broad theory of eigenfunction expansions, in which functions are expanded in series of special functions (Bessel functions, orthogonal polynomials, and other functions generated by differential equations). More recently, wavelet expansions have been developed to provide additional tools in areas such as filtering and signal analysis. This part is devoted to some of these ideas and their applications.
CHAPTER 14

Fourier Series

THE FOURIER SERIES OF A FUNCTION · CONVERGENCE OF FOURIER SERIES · FOURIER COSINE AND SINE SERIES · INTEGRATION AND DIFFERENTIATION OF FOURIER SERIES · THE PHASE ANGLE FORM OF A FOURIER SERIES
14.1 Why Fourier Series?

A Fourier series is a representation of a function as a series of constants times sine and/or cosine functions of different frequencies. In order to see why such a series might be interesting, we will look at a problem of the type that led Fourier to consider them.
Consider a thin bar of length π, of constant density and uniform cross section. Let u(x, t) be the temperature at time t in the cross section of the bar at x, for 0 ≤ x ≤ π. In Section 13.7.2 we derived a partial differential equation for u:

  ∂u/∂t = k (∂²u/∂x²) for 0 < x < π, t > 0,  (14.1)

in which k is a constant depending on the material of the bar. Suppose the left and right ends of the bar are kept at zero temperature,

  u(0, t) = u(π, t) = 0 for t > 0,  (14.2)

and that the temperature throughout the bar at time t = 0 is specified:

  u(x, 0) = f(x) = x(π − x).  (14.3)

Intuitively, the heat equation, together with the initial temperature distribution throughout the bar, and the information that the ends are kept at zero degrees for all time, are enough to determine the temperature distribution u(x, t) throughout the bar at any time. By a process that now bears his name, and which we will develop when we study partial differential equations, Fourier found functions satisfying the heat equation (14.1) and the conditions at the ends of the bar, equations (14.2), and having the form

  u_n(x, t) = b_n sin(nx) e^{−kn²t},  (14.4)

in which n can be any positive integer, and b_n can be any real number. We will use these functions to find a function that also satisfies the condition (14.3).
Periodic phenomena have long fascinated mankind; our ancient ancestors were aware of the recurrence of phases of the moon and certain planets, the tides of lakes and oceans, and cycles in the weather. Isaac Newton’s calculus and law of gravitation enabled him to explain the periodicities of the tides, but it was left to Joseph Fourier and his successors to develop Fourier analysis, which has had profound applications in the study of natural phenomena and the analysis of signals and data.
A single choice of positive integer n₀ and constant b_{n₀} will not do. If we let u(x, t) = b_{n₀} sin(n₀x) e^{−kn₀²t}, then we would need

  u(x, 0) = x(π − x) = b_{n₀} sin(n₀x) for 0 ≤ x ≤ π,

an impossibility. A polynomial cannot equal a constant multiple of a sine function over [0, π] (or over any nontrivial interval). The next thing to try is a finite sum of the functions (14.4), say

  u(x, t) = Σ_{n=1}^{N} b_n sin(nx) e^{−kn²t}.  (14.5)

Such a function will still satisfy the heat equation and the conditions (14.2). To satisfy the condition (14.3), we must choose N and the b_n’s so that

  u(x, 0) = x(π − x) = Σ_{n=1}^{N} b_n sin(nx) for 0 ≤ x ≤ π.

But this is also impossible. A finite sum of constant multiples of sine functions cannot equal a polynomial over [0, π].
At this point Fourier had a brilliant insight. Since no finite sum of functions (14.4) can be a solution, attempt an infinite series:

  u(x, t) = Σ_{n=1}^{∞} b_n sin(nx) e^{−kn²t}.  (14.6)

This function will satisfy the heat equation, as well as the conditions u(0, t) = u(π, t) = 0. To satisfy condition (14.3) we must choose the b_n’s so that

  u(x, 0) = x(π − x) = Σ_{n=1}^{∞} b_n sin(nx) for 0 ≤ x ≤ π.  (14.7)

This is quite different from attempting to represent the polynomial x(π − x) by the finite trigonometric sum (14.5). Fourier claimed that equation (14.7) is valid for 0 ≤ x ≤ π if the coefficients are chosen as

  b_n = (2/π) ∫₀^π x(π − x) sin(nx) dx = 4 (1 − (−1)ⁿ)/(πn³).

By inserting these coefficients into the proposed solution (14.6), Fourier thus claimed that the solution of this heat conduction problem, with the given initial temperature, is

  u(x, t) = Σ_{n=1}^{∞} (4/π) ((1 − (−1)ⁿ)/n³) sin(nx) e^{−kn²t}.

The claim that

  Σ_{n=1}^{∞} (4/π) ((1 − (−1)ⁿ)/n³) sin(nx) = x(π − x) for 0 ≤ x ≤ π

was too much for mathematicians of Fourier’s time to accept. The mathematics of this time was not adequate to proving this kind of assertion. This was the lack of rigor that led the Academy to reject publication of Fourier’s paper. But the implications were not lost on Fourier’s colleagues. There is nothing unique about x(π − x) as an initial temperature distribution, and many different functions could be used. What Fourier was actually claiming was that, for a broad class of functions f, coefficients b_n could be chosen so that f(x) = Σ_{n=1}^{∞} b_n sin(nx) on [0, π]. Eventually this and even more general claims for these series proposed by Fourier were proved. We will now begin an analysis of Fourier’s ideas and some of their ramifications.
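Fourier's claim can be examined numerically by comparing partial sums of the sine series with x(π − x). This sketch measures the maximum error for the cutoffs N = 5, 10, 20 (the grid size is an arbitrary choice):

```python
# Partial sums of sum_n 4(1 - (-1)^n)/(pi n^3) sin(nx) versus x(pi - x).
import numpy as np

x = np.linspace(0, np.pi, 2001)
target = x * (np.pi - x)

def partial_sum(N):
    s = np.zeros_like(x)
    for n in range(1, N + 1):
        s += 4 * (1 - (-1)**n) / (np.pi * n**3) * np.sin(n * x)
    return s

errs = [np.max(np.abs(partial_sum(N) - target)) for N in (5, 10, 20)]
assert errs[0] > errs[1] > errs[2]      # error shrinks as N grows
assert errs[2] < 0.01                   # already quite accurate at N = 20
```

The rapid decrease of the error reflects the 1/n³ decay of the coefficients.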
SECTION 14.1 PROBLEMS

1. On the same set of axes, generate a graph of x(π − x) for 0 ≤ x ≤ π and Σ_{n=1}^{5} (4/π)((1 − (−1)ⁿ)/n³) sin(nx). Repeat this for the partial sums Σ_{n=1}^{10} (4/π)((1 − (−1)ⁿ)/n³) sin(nx) and Σ_{n=1}^{20} (4/π)((1 − (−1)ⁿ)/n³) sin(nx). This will give a sense of the correctness of Fourier’s intuition in asserting that x(π − x) can be accurately represented by Σ_{n=1}^{∞} (4/π)((1 − (−1)ⁿ)/n³) sin(nx) on this interval.
2. Prove that a polynomial cannot be a constant multiple of sin(nx) over [0, π], for any positive integer n. Hint: One way is to proceed by induction on the degree of the polynomial.
3. Prove that a polynomial cannot be equal to a nonzero sum of the form Σ_{j=0}^{n} c_j sin(jx) for 0 ≤ x ≤ π, where the c_j’s are real numbers.
14.2 The Fourier Series of a Function

Let f(x) be defined for −L ≤ x ≤ L. For the time being, we assume only that ∫_{−L}^{L} f(x) dx exists. We want to explore the possibility of choosing numbers a₀, a₁, …, b₁, b₂, … such that

  f(x) = (1/2)a₀ + Σ_{n=1}^{∞} [a_n cos(nπx/L) + b_n sin(nπx/L)]  (14.8)

for −L ≤ x ≤ L. We will see that this is sometimes asking too much, but that under certain conditions on f it can be done. However, to get started, we will assume the best of all worlds and suppose for the moment that equation (14.8) holds. What does this tell us about how to choose the coefficients? There is a clever device used to answer this question, which was known to Fourier and others of his time. We will need the following elementary lemma.
LEMMA 14.1

Let n and m be nonnegative integers. Then
1. ∫_{−L}^{L} cos(nπx/L) sin(mπx/L) dx = 0.
2. If n ≠ m, then

  ∫_{−L}^{L} cos(nπx/L) cos(mπx/L) dx = ∫_{−L}^{L} sin(nπx/L) sin(mπx/L) dx = 0.

3. If n ≠ 0, then

  ∫_{−L}^{L} cos²(nπx/L) dx = ∫_{−L}^{L} sin²(nπx/L) dx = L.
The lemma is proved by straightforward integration. Now, to find a₀, integrate the series (14.8) term by term (supposing for now that we can do this):

  ∫_{−L}^{L} f(x) dx = (1/2)a₀ ∫_{−L}^{L} dx + Σ_{n=1}^{∞} [a_n ∫_{−L}^{L} cos(nπx/L) dx + b_n ∫_{−L}^{L} sin(nπx/L) dx].

All of the integrals on the right are zero and this equation reduces to

  ∫_{−L}^{L} f(x) dx = La₀.

Therefore

  a₀ = (1/L) ∫_{−L}^{L} f(x) dx.
Next, we will determine a_k for any positive integer k. Multiply equation (14.8) by cos(kπx/L) and integrate each term of the resulting series to get

  ∫_{−L}^{L} f(x) cos(kπx/L) dx = (1/2)a₀ ∫_{−L}^{L} cos(kπx/L) dx
    + Σ_{n=1}^{∞} [a_n ∫_{−L}^{L} cos(nπx/L) cos(kπx/L) dx + b_n ∫_{−L}^{L} sin(nπx/L) cos(kπx/L) dx].

By the lemma, all of the integrals on the right are zero except for ∫_{−L}^{L} cos(kπx/L) cos(kπx/L) dx, which occurs when n = k, and in this case this integral equals L. The right side of this equation therefore collapses to just one term, and the equation becomes

  ∫_{−L}^{L} f(x) cos(kπx/L) dx = a_k L,

whereupon

  a_k = (1/L) ∫_{−L}^{L} f(x) cos(kπx/L) dx.

To determine b_k, return to equation (14.8). This time multiply the equation by sin(kπx/L) and integrate each term to get

  ∫_{−L}^{L} f(x) sin(kπx/L) dx = (1/2)a₀ ∫_{−L}^{L} sin(kπx/L) dx
    + Σ_{n=1}^{∞} [a_n ∫_{−L}^{L} cos(nπx/L) sin(kπx/L) dx + b_n ∫_{−L}^{L} sin(nπx/L) sin(kπx/L) dx].

Again, by the lemma, all terms on the right are zero except for ∫_{−L}^{L} sin(nπx/L) sin(kπx/L) dx when n = k, and this equation reduces to

  ∫_{−L}^{L} f(x) sin(kπx/L) dx = b_k L.

Therefore

  b_k = (1/L) ∫_{−L}^{L} f(x) sin(kπx/L) dx.
We have now “solved” for the coefficients in the trigonometric series expansion (14.8). Of course, this analysis is flawed by the interchange of series and integrals, which is not always justified. However, the argument does tell us how the constants should be chosen, at least under certain conditions, and suggests the following definition.

DEFINITION 14.1 Fourier Coefficients and Series

Let f be a Riemann integrable function on [−L, L].
1. The numbers

  a_n = (1/L) ∫_{−L}^{L} f(x) cos(nπx/L) dx, for n = 0, 1, 2, …

and

  b_n = (1/L) ∫_{−L}^{L} f(x) sin(nπx/L) dx, for n = 1, 2, 3, …

are the Fourier coefficients of f on [−L, L].
2. The series

  (1/2)a₀ + Σ_{n=1}^{∞} [a_n cos(nπx/L) + b_n sin(nπx/L)]

is the Fourier series of f on [−L, L] when the constants are chosen to be the Fourier coefficients of f on [−L, L].
EXAMPLE 14.1

Let f(x) = x for −π ≤ x ≤ π. We will write the Fourier series of f on [−π, π]. The coefficients are:

  a₀ = (1/π) ∫_{−π}^{π} x dx = 0,

  a_n = (1/π) ∫_{−π}^{π} x cos(nx) dx = (1/π) [ (1/n²) cos(nx) + (x/n) sin(nx) ]_{−π}^{π} = 0,

and

  b_n = (1/π) ∫_{−π}^{π} x sin(nx) dx = (1/π) [ (1/n²) sin(nx) − (x/n) cos(nx) ]_{−π}^{π}
      = −(2/n) cos(nπ) = (2/n) (−1)^{n+1},

since cos(nπ) = (−1)ⁿ if n is an integer. The Fourier series of x on [−π, π] is

  Σ_{n=1}^{∞} (2/n)(−1)^{n+1} sin(nx)
    = 2 [ sin(x) − (1/2) sin(2x) + (1/3) sin(3x) − (1/4) sin(4x) + (1/5) sin(5x) − ⋯ ].
EXAMPLE 14.2
Let
fx =
0 x
for − 3 ≤ x < 0 for 0 ≤ x ≤ 3
14.2 The Fourier Series of a Function
589
Here L = 3 and the Fourier coefficients are: 1 3 1 3 3 fxdx = xdx = 3 −3 3 0 2 3 1 nx dx an = fx cos 3 −3 3 nx 1 3 = x cos dx 3 0 3 nx 3 nx 3 x = 2 2 cos + sin n 3 n 3 0 a0 =
=
3 n2 2
−1n − 1
and nx nx 1 3 1 3 dx = dx fx sin x sin 3 −3 3 3 0 3 nx 3 nx 3 x = 2 2 sin − cos n 3 n 3 0
bn =
=
3 −1n+1 n
The Fourier series of f on −3 3 is nx nx 3 3 3 n n+1 + + −1
−1 − 1 cos sin 4 n=1 n2 2 3 n 3
Even when f(x) is fairly simple, ∫_{−L}^{L} f(x) cos(nπx/L) dx and ∫_{−L}^{L} f(x) sin(nπx/L) dx can involve considerable labor if done by hand. Use of a software package to evaluate definite integrals is highly recommended.
In these examples, we wrote the Fourier series of f, but did not claim that it equalled f(x). For most x it is not obvious what the sum of the Fourier series is. However, in some cases it is obvious that the series does not equal f(x). Consider again f(x) = x on [−π, π] in Example 14.1. At x = π and at x = −π, every term of the Fourier series is zero, even though f(π) = π and f(−π) = −π. Even for very simple functions, then, there may be points where the Fourier series does not converge to f(x). Shortly we will determine the sum of the Fourier series of a function. Until this is done, we do not know the relationship between the Fourier series and the function itself.
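Following the suggestion to use software for these integrals, here is a sketch checking the coefficients of Example 14.2 by midpoint-rule quadrature (only [0, 3] contributes, since f = 0 on [−3, 0)):

```python
# Coefficients of f on [-3, 3], where f = 0 on [-3, 0) and f = x on [0, 3].
import numpy as np

L = 3.0
m = 300000
x = (np.arange(m) + 0.5) * L / m       # quadrature nodes on [0, 3]
dx = L / m

a0 = np.sum(x) * dx / L
assert abs(a0 - 1.5) < 1e-6

for n in range(1, 5):
    a_n = np.sum(x * np.cos(n * np.pi * x / L)) * dx / L
    b_n = np.sum(x * np.sin(n * np.pi * x / L)) * dx / L
    assert abs(a_n - 3 * ((-1)**n - 1) / (n**2 * np.pi**2)) < 1e-6
    assert abs(b_n - 3 * (-1)**(n + 1) / (n * np.pi)) < 1e-6
```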
14.2.1 Even and Odd Functions

Sometimes we can save some work in computing Fourier coefficients by observing special properties of f(x).
DEFINITION 14.2

Even Function: f is an even function on [−L, L] if f(−x) = f(x) for −L ≤ x ≤ L.
Odd Function: f is an odd function on [−L, L] if f(−x) = −f(x) for −L ≤ x ≤ L.
For example, x², x⁴, cos(nπx/L) and e^{−x²} are even functions on any interval [−L, L]. Graphs of y = x² and y = cos(5πx/3) are given in Figure 14.1. The graph of such a function for −L ≤ x ≤ 0 is the reflection across the y-axis of the graph for 0 ≤ x ≤ L (Figure 14.2).
The functions x, x³, x⁵ and sin(nπx/L) are odd functions on any interval [−L, L]. Graphs of y = x, y = x³ and y = sin(5πx/2) are shown in Figure 14.3. The graph of an odd function for −L ≤ x ≤ 0 is the reflection across the vertical axis, and then across the horizontal axis, of the graph for 0 ≤ x ≤ L (Figure 14.4). If f is odd, then f(0) = 0, since f(−0) = f(0) = −f(0).
Of course, “most” functions are neither even nor odd. For example, f(x) = eˣ is not even or odd on any interval [−L, L].
FIGURE 14.1 Graphs of the even functions y = x² and y = cos(5πx/3).
FIGURE 14.2 Graph of a typical even function, symmetric about the y-axis.
FIGURE 14.3 Graphs of the odd functions y = x, y = x³, and y = sin(5πx/2).
FIGURE 14.4 Graph of a typical odd function, symmetric through the origin.
Even and odd functions behave as follows under multiplication: even · even = even, odd · odd = even, and even · odd = odd. For example, x² cos(nπx/L) is an even function (the product of two even functions); x² sin(nπx/L) is odd (the product of an even function with an odd function); and x³ sin(nπx/L) is even (the product of two odd functions).
Now recall from calculus that

∫_{−L}^{L} f(x) dx = 0 if f is odd on [−L, L]

and

∫_{−L}^{L} f(x) dx = 2 ∫₀^{L} f(x) dx if f is even on [−L, L].

These integrals are suggested by Figures 14.2 and 14.4. In Figure 14.4, f is odd on [−L, L], and the "area" bounded by the graph and the horizontal axis for −L ≤ x ≤ 0 is exactly the negative of that bounded by the graph and the horizontal axis for 0 ≤ x ≤ L. This makes ∫_{−L}^{L} f(x) dx = 0. In Figure 14.2, where f is even, the area to the left of the vertical axis, for −L ≤ x ≤ 0, equals that to the right, for 0 ≤ x ≤ L, so ∫_{−L}^{L} f(x) dx = 2 ∫₀^{L} f(x) dx.
One ramification of these ideas for Fourier coefficients is that, if f is an even or odd function, then some of the Fourier coefficients can be seen immediately to be zero, and we need not carry out the integrations explicitly. We saw this in Example 14.1 with f(x) = x, which is an odd function on [−π, π]. There we found that the cosine coefficients were all zero, since x cos(nx) is an odd function.
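These parity shortcuts are easy to verify numerically. The sketch below (a minimal illustration; `midpoint` is our own quadrature helper) checks that x³ sin(x), a product of two odd functions, integrates over (−1, 1) to twice its integral over (0, 1), while x³ cos(x), an odd function, integrates to zero.

```python
import math

def midpoint(f, a, b, n=4000):
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

# odd * odd = even: x^3 sin(x) is even, so its integral over (-1, 1)
# equals twice the integral over (0, 1)
full = midpoint(lambda x: x**3 * math.sin(x), -1.0, 1.0)
half = midpoint(lambda x: x**3 * math.sin(x), 0.0, 1.0)
print(abs(full - 2 * half) < 1e-6)   # True

# odd * even = odd: x^3 cos(x) integrates to zero over (-1, 1)
print(abs(midpoint(lambda x: x**3 * math.cos(x), -1.0, 1.0)) < 1e-9)  # True
```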
EXAMPLE 14.3

We will find the Fourier series of f(x) = x⁴ on [−1, 1]. Since f is an even function, x⁴ sin(nπx) is odd and we know immediately that all the sine coefficients bₙ are zero. For the other coefficients, compute

a₀ = ∫_{−1}^{1} x⁴ dx = 2 ∫₀¹ x⁴ dx = 2/5

and

aₙ = ∫_{−1}^{1} x⁴ cos(nπx) dx = 2 ∫₀¹ x⁴ cos(nπx) dx = 8 ((n²π² − 6)/(n⁴π⁴)) (−1)ⁿ.

The Fourier series of x⁴ on [−1, 1] is

1/5 + Σ_{n=1}^∞ 8 ((n²π² − 6)/(n⁴π⁴)) (−1)ⁿ cos(nπx).
To again make the point about convergence, notice that f(0) = 0 in this example, but the Fourier series at x = 0 is

1/5 + Σ_{n=1}^∞ 8 ((n²π² − 6)/(n⁴π⁴)) (−1)ⁿ.

It is not clear whether or not this series sums to the function value 0.
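The closed-form coefficients in Example 14.3 can be checked against a direct numerical evaluation of the integrals, and the series can be summed at x = 0 to see how it behaves. This is a sketch only; `midpoint` is our own quadrature helper.

```python
import math

def midpoint(f, a, b, n=20000):
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

# Example 14.3: for f(x) = x^4 on (-1, 1),
# a_n = 8 (n^2 pi^2 - 6)(-1)^n / (n^4 pi^4)
for n in range(1, 4):
    numeric = midpoint(lambda x: x**4 * math.cos(n * math.pi * x), -1.0, 1.0)
    closed = 8 * (n**2 * math.pi**2 - 6) * (-1)**n / (n**4 * math.pi**4)
    print(n, round(numeric, 6), round(closed, 6))

# Summing the series at x = 0 suggests it does converge to f(0) = 0
s = 0.2 + sum(8 * (n**2 * math.pi**2 - 6) * (-1)**n / (n**4 * math.pi**4)
              for n in range(1, 5000))
print(round(s, 4))
```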
EXAMPLE 14.4

Let f(x) = x³ for −4 ≤ x ≤ 4. Because f is odd on [−4, 4], its Fourier cosine coefficients are all zero. Its Fourier sine coefficients are

bₙ = (1/4) ∫_{−4}^{4} x³ sin(nπx/4) dx = (1/2) ∫₀⁴ x³ sin(nπx/4) dx = (−1)ⁿ⁺¹ 128 (n²π² − 6)/(n³π³).

The Fourier series of x³ on [−4, 4] is

Σ_{n=1}^∞ (−1)ⁿ⁺¹ 128 ((n²π² − 6)/(n³π³)) sin(nπx/4).
We will make use of this discussion later, so here is a summary of its conclusions.

If f is even on [−L, L], then its Fourier series on this interval is

(1/2)a₀ + Σ_{n=1}^∞ aₙ cos(nπx/L)   (14.9)

in which

aₙ = (2/L) ∫₀^{L} f(x) cos(nπx/L) dx for n = 0, 1, 2, …   (14.10)

If f is odd on [−L, L], then its Fourier series on this interval is

Σ_{n=1}^∞ bₙ sin(nπx/L)   (14.11)

where

bₙ = (2/L) ∫₀^{L} f(x) sin(nπx/L) dx for n = 1, 2, …   (14.12)

SECTION 14.2 PROBLEMS

In each of Problems 1 through 12, find the Fourier series of the function on the interval.

1. f(x) = 4 for −3 ≤ x ≤ 3
2. f(x) = −x for −1 ≤ x ≤ 1
3. f(x) = cosh(πx) for −1 ≤ x ≤ 1
4. f(x) = 1 − x for −2 ≤ x ≤ 2
5. f(x) = { −4 for −π ≤ x ≤ 0; 4 for 0 < x ≤ π }
6. f(x) = sin(2x) for −π ≤ x ≤ π
7. f(x) = x² − x + 3 for −2 ≤ x ≤ 2
8. f(x) = { −x for −5 ≤ x < 0; 1 + x² for 0 ≤ x ≤ 5 }
9. f(x) = { 1 for −π ≤ x < 0; 2 for 0 ≤ x ≤ π }
10. f(x) = cos(x/2) − sin(x) for −π ≤ x ≤ π
11. f(x) = cos(x) for −3 ≤ x ≤ 3
12. f(x) = { 1 − x for −1 ≤ x ≤ 0; 0 for 0 < x ≤ 1 }
13. Suppose f and g are integrable on [−L, L] and that f(x) = g(x) except for x = x₀, a given point in the interval. How are the Fourier series of f and g related? What does this suggest about the relationship between a function and its Fourier series on an interval?
14. Prove that ∫_{−L}^{L} f(x) dx = 0 if f is odd on [−L, L].
15. Prove that ∫_{−L}^{L} f(x) dx = 2 ∫₀^{L} f(x) dx if f is even on [−L, L].

14.3
Convergence of Fourier Series

It is one thing to be able to write the Fourier coefficients of a function f on an interval [−L, L]. This requires only the existence of ∫_{−L}^{L} f(x) cos(nπx/L) dx and ∫_{−L}^{L} f(x) sin(nπx/L) dx. It is another issue entirely to determine whether the resulting Fourier series converges to f(x), or even whether it converges at all! The subtleties of this question were dramatized in 1873 when the German mathematician Paul du Bois-Reymond gave an example of a function which is continuous on [−π, π] but whose Fourier series fails to converge at a point of this interval.
However, the obvious utility of Fourier series in solving partial differential equations led in the nineteenth century to an intensive effort to determine their convergence properties. About 1829, Peter Gustav Lejeune Dirichlet gave conditions on the function f which are sufficient for convergence of the Fourier series of f. Further, Dirichlet's theorem actually gave the sum of the Fourier series at each point, whether or not this sum is f(x). This section is devoted to conditions on a function that enable us to determine the sum of its Fourier series on an interval. These conditions center on the concept of piecewise continuity.
DEFINITION 14.3
Piecewise Continuous Function
Let f(x) be defined on [a, b], except possibly at finitely many points. Then f is piecewise continuous on [a, b] if:
1. f is continuous on [a, b] except perhaps at finitely many points.
2. Both lim_{x→a+} f(x) and lim_{x→b−} f(x) exist and are finite.
3. If x₀ is in (a, b) and f is not continuous at x₀, then lim_{x→x₀+} f(x) and lim_{x→x₀−} f(x) exist and are finite.
Figures 14.5 and 14.6 show graphs of typical piecewise continuous functions. At a point of discontinuity (and we assume there are only finitely many), the function must have finite one-sided limits. This means that the graph experiences at worst a finite gap at a discontinuity. Points where these occur are called jump discontinuities of the function.
As an example of a simple function that is not piecewise continuous, let

f(x) = { 0 for x = 0; 1/x for 0 < x ≤ 1 }
FIGURE 14.5 A piecewise continuous function.
FIGURE 14.6 Graph of a typical piecewise continuous function.
Then f is continuous on (0, 1], and discontinuous at 0. However, lim_{x→0+} f(x) = ∞, so the discontinuity is not a finite jump discontinuity, and f is not piecewise continuous on [0, 1].
EXAMPLE 14.5

Let

f(x) = { 5 for x = −π; x for −π < x < 1; 1 − x² for 1 ≤ x < 2; 4 for 2 ≤ x ≤ π }

A graph of f is shown in Figure 14.7.

FIGURE 14.7 Graph of the function of Example 14.5.

This function is discontinuous at −π, and

lim_{x→−π+} f(x) = −π.

f is also discontinuous at 1, interior to [−π, π], and

lim_{x→1−} f(x) = 1 and lim_{x→1+} f(x) = 0.

Finally, f is discontinuous at 2, and

lim_{x→2−} f(x) = −3 and lim_{x→2+} f(x) = 4.

At each point of discontinuity interior to the interval, the function has finite one-sided limits from both sides. At the point of discontinuity at the end point −π, the function has a finite limit from within the interval. In this example, the other end point is not an issue, as f is continuous (from the left) there. Therefore f is piecewise continuous on [−π, π].
We will use the following notation for left and right limits of a function at a point:

f(x₀+) = lim_{x→x₀+} f(x) and f(x₀−) = lim_{x→x₀−} f(x).

In Example 14.5,

f(1−) = 1, f(1+) = 0, f(2−) = −3, and f(2+) = 4.

At the end points of an interval, we can still use this notation, except at the left end point we consider only the right limit (from inside the interval), and at the right end point we use only the left limit (again, so that the limit is taken from within the interval). Again referring to Example 14.5,

f(−π+) = −π and f(π−) = 4.

DEFINITION 14.4 Piecewise Smooth Function

f is piecewise smooth on [a, b] if f and f′ are piecewise continuous on [a, b].
A piecewise smooth function is therefore one that is continuous except possibly for finitely many jump discontinuities, and has a continuous derivative at all but finitely many points, where the derivative may not exist but must have finite one-sided limits.
EXAMPLE 14.6

Let

f(x) = { 1 for −4 ≤ x < 1; −2x for 1 ≤ x < 2; 9e^{−x} for 2 ≤ x ≤ 3 }

Figure 14.8 shows a graph of f. The function is continuous except for finite jump discontinuities at 1 and 2. Therefore f is piecewise continuous on [−4, 3]. The derivative of f is

f′(x) = { 0 for −4 < x < 1; −2 for 1 < x < 2; −9e^{−x} for 2 < x < 3 }

The derivative is continuous on (−4, 3) except at the points of discontinuity 1 and 2 of f, where f′(x) does not exist. However, at these points f′(x) has finite one-sided limits. Thus f′ is piecewise continuous on [−4, 3], so f is piecewise smooth.

FIGURE 14.8 Graph of the function of Example 14.6.

As suggested by Figure 14.8, a piecewise smooth function is one that has a continuous tangent at all but finitely many points. We will now state our first convergence theorem.
THEOREM 14.1 Convergence of Fourier Series

Let f be piecewise smooth on [−L, L]. Then for −L < x < L, the Fourier series of f on [−L, L] converges to

(1/2)(f(x+) + f(x−)).

This means that, at each point between −L and L, the series converges to the average of the left and right limits of the function there. If f is continuous at x, then these left and right limits both equal f(x), so the Fourier series converges to the function value at x. If f has a jump discontinuity at x, then the Fourier series may not converge to f(x), but will converge to the point midway between the ends of the gap in the graph at x (Figure 14.9).

FIGURE 14.9 Convergence of a Fourier series at a jump discontinuity: at x the series sums to (1/2)(f(x+) + f(x−)).
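Theorem 14.1 is easy to watch in action. For the step function f(x) = 0 on [−2, 1) and f(x) = 1 on [1, 2], the coefficients work out by hand to a₀/2 = 1/4, aₙ = −sin(nπ/2)/(nπ), bₙ = (cos(nπ/2) − (−1)ⁿ)/(nπ); the Python sketch below (an illustration, not part of the text) evaluates the partial sums at the jump x = 1, where the theorem predicts the value 1/2.

```python
import math

# f(x) = 0 for -2 <= x < 1 and 1 for 1 <= x <= 2, with a jump at x = 1.
# Exact Fourier coefficients on (-2, 2), computed by hand:
#   a0/2 = 1/4,  a_n = -sin(n pi/2)/(n pi),  b_n = (cos(n pi/2) - (-1)^n)/(n pi)
def partial_sum(x, N):
    s = 0.25
    for n in range(1, N + 1):
        an = -math.sin(n * math.pi / 2) / (n * math.pi)
        bn = (math.cos(n * math.pi / 2) - (-1) ** n) / (n * math.pi)
        s += an * math.cos(n * math.pi * x / 2) + bn * math.sin(n * math.pi * x / 2)
    return s

# At the jump the sums tend to (f(1-) + f(1+))/2 = 1/2; at a point of
# continuity such as x = 0 they tend to f(0) = 0.
for N in (10, 100, 1000):
    print(N, round(partial_sum(1.0, N), 4), round(partial_sum(0.0, N), 4))
```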
EXAMPLE 14.7

Let

f(x) = { 5 sin(x) for −2π ≤ x < −π/2; 4 for x = −π/2; x² for −π/2 < x < 2; 8 cos(x) for 2 ≤ x < π; 4x for π ≤ x ≤ 2π }

A graph of f is given in Figure 14.10. Since f is piecewise smooth on [−2π, 2π], we can determine the sum of its Fourier series on this interval. In applying the theorem, we do not actually have to compute this Fourier series. We could do this, but it is not necessary in order to determine the sum of the series.

FIGURE 14.10 Graph of the function of Example 14.7.

For −2π < x < −π/2, f is continuous and the Fourier series converges to f(x) = 5 sin(x).
At x = −π/2, f has a jump discontinuity and the Fourier series will converge to the average of the left and right limits of f(x) at −π/2. Compute

f(−π/2−) = lim_{x→−π/2−} f(x) = lim_{x→−π/2−} 5 sin(x) = 5 sin(−π/2) = −5

and

f(−π/2+) = lim_{x→−π/2+} f(x) = lim_{x→−π/2+} x² = π²/4.

Therefore, at x = −π/2, the Fourier series of f converges to

(1/2)(−5 + π²/4).

On (−π/2, 2) the function is continuous, so the Fourier series converges to x² for −π/2 < x < 2. At x = 2 the function has another jump discontinuity. Compute

f(2−) = lim_{x→2−} x² = 4

and

f(2+) = lim_{x→2+} 8 cos(x) = 8 cos(2).

At x = 2 the Fourier series converges to

(1/2)(4 + 8 cos(2)).

On (2, π), f is continuous. At each x with 2 < x < π, the Fourier series converges to f(x) = 8 cos(x). At x = π, f has a jump discontinuity. Compute

f(π−) = lim_{x→π−} 8 cos(x) = 8 cos(π) = −8

and

f(π+) = lim_{x→π+} 4x = 4π.

At x = π the Fourier series of f converges to

(1/2)(4π − 8).

Finally, on (π, 2π), f is continuous and the Fourier series converges to f(x) = 4x. These conclusions can be summarized: the Fourier series converges to

5 sin(x)             for −2π < x < −π/2
(1/2)(−5 + π²/4)     for x = −π/2
x²                   for −π/2 < x < 2
(1/2)(4 + 8 cos(2))  for x = 2
8 cos(x)             for 2 < x < π
(1/2)(4π − 8)        for x = π
4x                   for π < x < 2π

Figure 14.11 shows a graph of this sum of the Fourier series, differing from the function itself on [−2π, 2π] at the jump discontinuities, where the series converges to the average of the left and right limits.

FIGURE 14.11 Graph of the Fourier series of the function of Figure 14.10.
If f is piecewise smooth on [−L, L] and actually continuous on [−L, L], then the Fourier series converges to f(x) for −L < x < L.
EXAMPLE 14.8

Let

f(x) = { x for −2 ≤ x ≤ 1; 2 − x² for 1 < x ≤ 2 }

Then f is continuous on [−2, 2] (Figure 14.12). f is differentiable except at x = 1, where f′(x) has finite left and right limits, so f is piecewise smooth. For −2 < x < 2, the Fourier series of f converges to f(x). In this example the Fourier series is an exact representation of the function on (−2, 2).

FIGURE 14.12 Graph of f(x) = x for −2 ≤ x ≤ 1, 2 − x² for 1 < x ≤ 2.

14.3.1
Convergence at the End Points

Theorem 14.1 does not address convergence of a Fourier series at the end points of the interval. There is a subtlety here that we will now discuss. The problem is that, while the function f of interest may be defined only on [−L, L], its Fourier series

(1/2)a₀ + Σ_{n=1}^∞ [aₙ cos(nπx/L) + bₙ sin(nπx/L)]   (14.13)

is defined for all real x for which the series converges. Further, the Fourier series is periodic, of period 2L: the value of the series is unchanged if x is replaced with x + 2L. How do we reconcile representing a function that is defined only on an interval by a function that is periodic and may be defined over the entire real line?
The reconciliation lies in a periodic extension of f over the real line. Take the graph of f(x) on [−L, L) and replicate it over successive intervals of length 2L. This defines a new function f_p that agrees with f(x) for −L ≤ x < L, and has period 2L. This process is illustrated in Figure 14.13 for the function f(x) = x² for −2 ≤ x < 2.

FIGURE 14.13 Part of the periodic extension, of period 4, of f(x) = x² for −2 ≤ x < 2.

This graph is simply repeated for
2 ≤ x < 6, 6 ≤ x < 10, −6 ≤ x < −2, −10 ≤ x < −6, and so on. The reason for using the half-open interval [−L, L) in this extension is that, if f_p is to have period 2L, then f_p(x + 2L) = f_p(x) for all x. But this requires that f_p(−L) = f_p(−L + 2L) = f_p(L), so once f_p(−L) is defined, f_p(L) must equal this value.
If we make this extension, then the convergence theorem applies to f_p(x) at all x. In particular, at −L, the series converges to

(1/2)(f_p(−L−) + f_p(−L+)),

which is

(1/2)(f(L−) + f(−L+)).

Similarly, at L, the Fourier series converges to

(1/2)(f_p(L−) + f_p(L+)),

which is again

(1/2)(f(L−) + f(−L+)).

The Fourier series converges to the same value at both L and at −L. This can also be seen directly from the series (14.13). If x = L, all the sine terms are sin(nπ), which vanish, and the cosine terms are cos(nπ), so the series at x = L is

(1/2)a₀ + Σ_{n=1}^∞ aₙ cos(nπ).

At x = −L, again all the sine terms vanish, and because cos(−nπ) = cos(nπ), the series at x = −L is also

(1/2)a₀ + Σ_{n=1}^∞ aₙ cos(nπ).
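The periodic extension can be written directly in code: shift x into [−L, L) with a modulus, then apply f. This Python sketch (an illustration; the helper name `periodic_extension` is ours) builds f_p for f(x) = x² on [−2, 2), as in Figure 14.13.

```python
def periodic_extension(f, L):
    """Return the 2L-periodic extension f_p of f, where f is given on [-L, L)."""
    def fp(x):
        return f(((x + L) % (2 * L)) - L)   # shift x into [-L, L), then apply f
    return fp

fp = periodic_extension(lambda x: x * x, 2.0)   # f(x) = x^2 on [-2, 2)
print(fp(3.0), fp(-5.0))                        # 1.0 1.0 -- both reduce to x = -1
print(fp(1.5) == fp(1.5 + 4) == fp(1.5 - 8))    # True: period 4
```

Note that fp(−2.0) and fp(2.0) agree automatically, exactly as the half-open interval convention requires.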
14.3.2 A Second Convergence Theorem
A second convergence theorem can be framed in terms of one-sided derivatives.
DEFINITION 14.5
Right Derivative
Suppose f(x) is defined at least for c < x < c + r for some positive number r, and suppose f(c+) is finite. Then the right derivative of f at c is

f_R′(c) = lim_{h→0+} (f(c + h) − f(c+))/h,

if this limit exists and is finite.
DEFINITION 14.6
Left Derivative
Suppose f(x) is defined at least for c − r < x < c for some positive number r, and suppose f(c−) is finite. Then the left derivative of f at c is

f_L′(c) = lim_{h→0−} (f(c + h) − f(c−))/h,

if this limit exists and is finite.
If f′(c) exists, then f is continuous at c, so f(c−) = f(c+) = f(c), and in this case the left and right derivatives are both equal to f′(c). However, Figure 14.14 shows the significance of the left and right derivatives when f has a jump discontinuity at c. The left derivative is the slope of the graph at x = c if we cover up the part of the graph to the right of c and keep only the left side. The right derivative is the slope at c if we cover up the left part and just keep the right part.
FIGURE 14.14 One-sided derivatives f_L′(x₀) and f_R′(x₀) as slopes from the left and from the right.
FIGURE 14.15 Graph of f(x) = 1 + x for −π < x < 1, x² for 1 ≤ x < π.

EXAMPLE 14.9

Let

f(x) = { 1 + x for −π < x < 1; x² for 1 ≤ x < π }

Then f is continuous except at 1, where there is a jump discontinuity (Figure 14.15). Further, f is differentiable except at this point of discontinuity. Indeed,

f′(x) = { 1 for −π < x < 1; 2x for 1 < x < π }

From the graph and the slopes of the "left and right pieces" at x = 1, we would expect the left derivative at x = 1 to be 1, and the right derivative to be 2. Check this from the definitions. First,

f_L′(1) = lim_{h→0−} (f(1 + h) − f(1−))/h = lim_{h→0−} (1 + (1 + h) − 2)/h = lim_{h→0−} h/h = 1,

and

f_R′(1) = lim_{h→0+} (f(1 + h) − f(1+))/h = lim_{h→0+} ((1 + h)² − 1)/h = lim_{h→0+} (2 + h) = 2.
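The one-sided limits in Example 14.9 can be mimicked with one-sided difference quotients. This sketch is only an illustration (the small parameters `eps` and `h` are our own choices for approximating f(1±) and the limit).

```python
def f(x):
    # Example 14.9: the value at x = 1 comes from the right-hand branch,
    # so f(1) = f(1+) = 1
    return 1 + x if x < 1 else x * x

eps = 1e-9      # used to estimate the one-sided limits f(1+) and f(1-)
h = 1e-5
f_right = (f(1 + h) - f(1 + eps)) / h        # difference quotient from the right
f_left = (f(1 - h) - f(1 - eps)) / (-h)      # difference quotient from the left
print(round(f_left, 3), round(f_right, 3))   # 1.0 2.0
```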
Using the one-sided derivatives, we can state the following convergence theorem.

THEOREM 14.2

Let f be piecewise continuous on [−L, L]. Then,
1. If −L < x < L and f has a left and a right derivative at x, then the Fourier series of f on [−L, L] converges at x to

(1/2)(f(x+) + f(x−)).

2. If f_R′(−L) and f_L′(L) exist, then at both L and −L, the Fourier series of f on [−L, L] converges to

(1/2)(f(−L+) + f(L−)).

As with the first convergence theorem, we need not compute the Fourier series to determine its sum.
EXAMPLE 14.10

Let

f(x) = { e^{−x} for −2 ≤ x < 1; −2x² for 1 ≤ x < 2; 4 for x = 2 }

We want to determine the sum of the Fourier series of f on [−2, 2]. A graph of f is shown in Figure 14.16. f is piecewise continuous, being continuous except for jump discontinuities at 1 and 2. For −2 < x < 1, f is continuous, and the Fourier series converges to f(x) = e^{−x}. For 1 < x < 2, f is also continuous and the Fourier series converges to f(x) = −2x². At the jump discontinuity x = 1, the left and right derivatives exist (−e^{−1} and −4, respectively). We can determine these from the limits in the definitions, but these derivatives are also apparent from looking at the graph of f to the right and left of 1. Therefore the Fourier series converges at x = 1 to

(1/2)(f(1−) + f(1+)),

which is

(1/2)(e^{−1} − 2).

FIGURE 14.16 Graph of f(x) = e^{−x} for −2 ≤ x < 1, −2x² for 1 ≤ x < 2, 4 for x = 2.
This takes care of each point in (−2, 2). Now consider the end points. The left derivative of f at 2 is −8 and the right derivative at −2 is −e². Therefore, at both 2 and at −2, the Fourier series converges to

(1/2)(f(2−) + f(−2+)) = (1/2)(−8 + e²).

Figure 14.17 shows a graph of the sum of the Fourier series on [−2, 2], and can be compared with the graph of f. The two graphs agree except at the end points and at the jump discontinuity. The fact that f(2) = 4 does not affect convergence of the Fourier series of f(x) at x = 2, since the value of f at a single point does not change any Fourier coefficient.

FIGURE 14.17 Graph of the Fourier series of the function of Figure 14.16.
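Example 14.10 can be replayed numerically. The sketch below (an illustration; `midpoint` and `partial_sum` are our own helpers, and the isolated value f(2) = 4 is omitted since it cannot influence any coefficient) computes the coefficients by quadrature and evaluates the partial sums at the jump x = 1, where the theorem predicts (e⁻¹ − 2)/2 ≈ −0.816.

```python
import math

def midpoint(g, a, b, n=2000):
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

def f(x):
    return math.exp(-x) if x < 1.0 else -2.0 * x * x

L = 2.0
def partial_sum(x, N):
    s = midpoint(f, -L, L) / (2 * L)                      # a0/2
    for n in range(1, N + 1):
        an = midpoint(lambda t: f(t) * math.cos(n * math.pi * t / L), -L, L) / L
        bn = midpoint(lambda t: f(t) * math.sin(n * math.pi * t / L), -L, L) / L
        s += an * math.cos(n * math.pi * x / L) + bn * math.sin(n * math.pi * x / L)
    return s

# At the jump x = 1 the series converges to (e^{-1} - 2)/2, about -0.816
print(round(partial_sum(1.0, 60), 2))
```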
A note of caution is warranted in applying the second convergence theorem. The left and right derivatives of a function at a point are relevant only to verify that the hypotheses of the theorem are satisfied at a jump discontinuity of the function. However, these derivatives play no role in the value to which the Fourier series converges at a point. That value involves the left and right limits of the function itself.
14.3.3 Partial Sums of Fourier Series

Fourier's claims for his series were counterintuitive in the sense that functions such as polynomials and exponentials do not seem to be likely candidates to be represented by series of sines and cosines. It is instructive to watch graphs of partial sums of some Fourier series converge to the graph of the function.
EXAMPLE 14.11

Let f(x) = x for −π ≤ x ≤ π. We saw in Example 14.1 that the Fourier series is

Σ_{n=1}^∞ (2/n)(−1)ⁿ⁺¹ sin(nx).

We can apply either convergence theorem to show that this series converges to x for −π < x < π, and to 0 for x = π and for x = −π. Figures 14.18(a), (b), and (c) show, respectively, the fourth, tenth, and twentieth partial sums of this series, suggesting how they approach nearer to f(x) = x on (−π, π) as more terms are included.

FIGURE 14.18(a) Fourth partial sum S₄(x) = Σ_{n=1}^{4} (2/n)(−1)ⁿ⁺¹ sin(nx) of the Fourier series of f(x) = x on −π ≤ x ≤ π.
FIGURE 14.18(b) Tenth partial sum of the Fourier series of f(x) = x on (−π, π).
FIGURE 14.18(c) Twentieth partial sum of the Fourier series of f(x) = x on (−π, π).
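The behavior shown in Figures 14.18(a)-(c) can be tabulated instead of plotted. This Python sketch (an illustration only) evaluates the partial sums at an interior point, where they creep toward f(1) = 1, and at x = π, where every term vanishes.

```python
import math

def S(x, N):
    # N-th partial sum of the series from Example 14.11
    return sum(2 * (-1) ** (n + 1) * math.sin(n * x) / n for n in range(1, N + 1))

for N in (4, 10, 20, 500):
    print(N, round(S(1.0, N), 3))            # approaches f(1) = 1
print(round(abs(S(math.pi, 50)), 10))        # 0.0: every term vanishes at x = pi
```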
EXAMPLE 14.12

Let f(x) = eˣ for −1 ≤ x ≤ 1. The Fourier series of f on [−1, 1] is

(1/2)(e − e⁻¹) + Σ_{n=1}^∞ ((−1)ⁿ(e − e⁻¹)/(1 + n²π²)) (cos(nπx) − nπ sin(nπx)).

This series converges to eˣ for −1 < x < 1, and to (1/2)(e + e⁻¹) for x = 1 and for x = −1. Figures 14.19(a) and (b) show the tenth and thirtieth partial sums of this series, compared with the graph of f.
FIGURE 14.19(a) Tenth partial sum of the Fourier series of f(x) = eˣ on (−1, 1).
FIGURE 14.19(b) Thirtieth partial sum of the Fourier series of f(x) = eˣ on (−1, 1).
EXAMPLE 14.13

Let f(x) = sin(x) for −1 ≤ x ≤ 1. The Fourier series of f on [−1, 1] is

Σ_{n=1}^∞ (2nπ sin(1)(−1)ⁿ⁺¹/(n²π² − 1)) sin(nπx).

This series converges to sin(x) for −1 < x < 1, and to 0 for x = 1 and for x = −1. Figures 14.20(a) and (b) show two partial sums of this series compared with the graph of f.
14.3.4 The Gibbs Phenomenon

In 1887 the Michelson-Morley experiment revolutionized physics and helped pave the way for Einstein's theory of special relativity. In a brilliant experiment using their adaptation of the interferometer, Michelson and Morley showed by careful measurements that the postulated "ether", which physicists at that time believed permeated all of space, had no effect on the velocity of light as seen from different directions.
FIGURE 14.20(a) Fourth partial sum of the Fourier series of f(x) = sin(x) for −1 ≤ x ≤ 1.
FIGURE 14.20(b) Tenth partial sum of the Fourier series of f(x) = sin(x) for −1 ≤ x ≤ 1.
Some years later Michelson was testing a mechanical device he had invented for computing Fourier coefficients and for constructing a function from its Fourier coefficients. In one test he used eighty Fourier coefficients for the function f(x) = x for −π ≤ x ≤ π. The machine responded with a graph having unexpected jumps near the end points π and −π. At first Michelson assumed that there was some problem with his machine. Eventually, however, it was found that this behavior is characteristic of Fourier series at jump discontinuities of the function. This became known as the Gibbs phenomenon, after the Yale mathematician Josiah Willard Gibbs, who was the first to satisfactorily define and explain it. The phenomenon had also been noticed some sixty years before by the English mathematician Wilbraham, who was, however, unable to analyze it.
To illustrate the phenomenon, consider the function defined by

f(x) = { −π/4 for −π ≤ x < 0; 0 for x = 0; π/4 for 0 < x ≤ π }

Figure 14.21 shows a graph of this function, whose Fourier series is

Σ_{n=1}^∞ (1/(2n − 1)) sin((2n − 1)x).

By either convergence theorem this series converges to f(x) for −π < x < π. There is a jump discontinuity at 0, but

(1/2)(f(0+) + f(0−)) = (1/2)(π/4 − π/4) = 0 = f(0).

FIGURE 14.21 Function illustrating the Gibbs phenomenon.
The Nth partial sum of this Fourier series is

S_N(x) = Σ_{n=1}^{N} (1/(2n − 1)) sin((2n − 1)x),

and Figure 14.22 shows graphs of S₅(x), S₁₄(x), and S₂₂(x). Each of these partial sums shows a peak near zero. Intuitively, since the partial sums approach f(x) as N → ∞, we might expect these peaks to flatten out and become smaller as N is chosen larger. But they don't. Instead, the peaks maintain roughly the same height, but move closer to the y-axis as N increases. The partial sums do indeed have the function as a limit, but not in quite the way mathematicians expected.

FIGURE 14.22 Partial sums S₅(x), S₁₄(x), and S₂₂(x) for 0 ≤ x ≤ π, showing the Gibbs phenomenon for the function of Figure 14.21.

As another example, consider

f(x) = { 0 for −2 ≤ x < 0; 2 − x for 0 ≤ x ≤ 2 }

FIGURE 14.23 Fourth, tenth, and twenty-fifth partial sums of the Fourier series of f(x) = 0 for −2 ≤ x < 0, 2 − x for 0 ≤ x ≤ 2.
This function has a jump discontinuity at 0, and its Fourier series is

1/2 + Σ_{n=1}^∞ [ (2/(n²π²))(1 − (−1)ⁿ) cos(nπx/2) + (2/(nπ)) sin(nπx/2) ].

Figure 14.23 shows the fourth, tenth, and twenty-fifth partial sums of this series. Again, the Gibbs phenomenon shows up at the jump discontinuity. Gibbs showed that this behavior occurs in the Fourier series of a function at every point where it has a jump discontinuity.
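The persistence of the peak can be measured rather than merely plotted. For the square-wave series above, the first peak to the right of the jump sits near x = π/(2N); the sketch below (an illustration only) shows its height staying near 0.926, which exceeds the limit value π/4 ≈ 0.785 by roughly 9% of the jump π/2, no matter how large N is.

```python
import math

def S(x, N):
    # N-th partial sum of the square-wave series in the text
    return sum(math.sin((2 * n - 1) * x) / (2 * n - 1) for n in range(1, N + 1))

# Away from the jump the sums settle at pi/4; the first peak, near x = pi/(2N),
# stays close to 0.926 for every N -- the Gibbs overshoot.
for N in (5, 14, 22, 2000):
    print(N, round(S(math.pi / (2 * N), N), 4))
```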
SECTION 14.3 PROBLEMS

In each of Problems 1 through 10, use a convergence theorem to determine the sum of the Fourier series of the function on the interval. Whichever theorem is used, verify that the hypotheses are satisfied, assuming familiar facts from calculus about continuous and differentiable functions. It is not necessary to write the series itself to do this. Next, find the Fourier series of the function and graph f and, for N = 5, 10, 15, 25, graph the Nth partial sum of the series, together with the function on the interval. Point out any places where the Gibbs phenomenon is apparent in these graphs.

1. f(x) = { 2x for −3 ≤ x < −2; 0 for −2 ≤ x < 1; x² for 1 ≤ x ≤ 3 }
2. f(x) = x for −2 ≤ x ≤ 2
3. f(x) = xeˣ for −3 ≤ x ≤ 3
4. f(x) = { 2x − 2 for −π ≤ x ≤ 1; 3 for 1 < x ≤ π }
5. f(x) = { 2 − x for −π ≤ x ≤ 0; x²/2 for 0 < x ≤ π }
6. f(x) = { cos(x) for −2 ≤ x < 0; sin(x) for 0 ≤ x ≤ 2 }
7. f(x) = { −1 for −4 ≤ x < 0; 1 for 0 ≤ x ≤ 4 }
8. f(x) = { 0 for −1 ≤ x < 1/2; 1 for 1/2 ≤ x ≤ 3/4; 2 for 3/4 < x ≤ 1 }
9. f(x) = e^{−x} for −π ≤ x ≤ π
10. f(x) = { −2 for −4 ≤ x ≤ −2; 1 + x² for −2 < x ≤ 2; 0 for 2 < x ≤ 4 }
11. Let f(x) = x²/2 for −π ≤ x ≤ π. Find the Fourier series of f(x) and evaluate it at an appropriately chosen value of x to sum the series Σ_{n=1}^∞ 1/n².
12. Use the Fourier series of Problem 11 to sum the series Σ_{n=1}^∞ (−1)ⁿ/n².

14.4
Fourier Cosine and Sine Series

If f(x) is defined on [−L, L], we may be able to write its Fourier series. The coefficients of this series are completely determined by the function and the interval. We will now show that, if f(x) is defined on the half-interval [0, L], then we have a choice, and can write a series containing just cosines or just sines in attempting to represent f(x) on this half-interval.
14.4.1 The Fourier Cosine Series of a Function

Let f be integrable on [0, L]. We want to expand f(x) in a series of cosine functions. We already have the means to do this. Figure 14.24 shows a graph of a typical f. Fold this graph across the y-axis to obtain a function f_e defined for −L ≤ x ≤ L:

f_e(x) = { f(x) for 0 ≤ x ≤ L; f(−x) for −L ≤ x < 0 }

f_e is an even function, f_e(−x) = f_e(x), and it agrees with f on [0, L]:

f_e(x) = f(x) for 0 ≤ x ≤ L.

We call f_e the even extension of f to [−L, L].
EXAMPLE 14.14

Let f(x) = eˣ for 0 ≤ x ≤ 2. Then

f_e(x) = { eˣ for 0 ≤ x ≤ 2; e^{−x} for −2 ≤ x < 0 }

Here we put f_e(−x) = f(x) = eˣ for 0 < x ≤ 2, meaning that f_e(x) = e^{−x} for −2 ≤ x < 0. A graph of f_e is given in Figure 14.25.

FIGURE 14.24 Even extension of f to [−L, L].
FIGURE 14.25 Even extension of f(x) = eˣ, 0 ≤ x ≤ 2, to [−2, 2].

Because f_e is an even function on [−L, L], its Fourier series on [−L, L] is

(1/2)a₀ + Σ_{n=1}^∞ aₙ cos(nπx/L)   (14.14)

in which

aₙ = (2/L) ∫₀^{L} f_e(x) cos(nπx/L) dx = (2/L) ∫₀^{L} f(x) cos(nπx/L) dx   (14.15)

since f_e(x) = f(x) for 0 ≤ x ≤ L. We call the series (14.14) the Fourier cosine series of f on [0, L]. The coefficients (14.15) are the Fourier cosine coefficients of f on [0, L]. The even extension f_e was introduced only to be able to make use of earlier work to derive a series containing just cosines. When we actually write a Fourier cosine series, we just use (14.15) to calculate the coefficients, without defining f_e.
The other point to having fe in the background, however, is that we can use the Fourier convergence theorems to write a convergence theorem for cosine series.
THEOREM 14.3
Convergence of Fourier Cosine Series
Let f be piecewise continuous on [0, L]. Then,
1. If 0 < x < L, and f has left and right derivatives at x, then the Fourier cosine series for f(x) on [0, L] converges at x to

(1/2)(f(x−) + f(x+)).

2. If f has a right derivative at 0, then the Fourier cosine series for f(x) on [0, L] converges at 0 to f(0+).
3. If f has a left derivative at L, then the Fourier cosine series for f(x) on [0, L] converges at L to f(L−).

Conclusions (2) and (3) follow from Theorem 14.2, applied to f_e. Consider first x = 0. The Fourier series of f_e converges at 0 to

(1/2)(f_e(0−) + f_e(0+)).

But f_e(0+) = f(0+) and f_e(0−) = f(0+), so at 0 the series converges to

(1/2)(f(0+) + f(0+)) = f(0+).

A similar argument proves conclusion (3).
EXAMPLE 14.15

Let f(x) = e^{2x} for 0 ≤ x ≤ 1. We will write the Fourier cosine series of f. Compute

a₀ = 2 ∫₀¹ e^{2x} dx = e² − 1

and

aₙ = 2 ∫₀¹ e^{2x} cos(nπx) dx = 4 (e²(−1)ⁿ − 1)/(4 + n²π²).

The cosine expansion of f is

(1/2)(e² − 1) + Σ_{n=1}^∞ 4 ((e²(−1)ⁿ − 1)/(4 + n²π²)) cos(nπx).

This series converges to e^{2x} for 0 < x < 1, to 1 for x = 0, and to e² for x = 1. Thus this cosine series converges to e^{2x} for 0 ≤ x ≤ 1. Figures 14.26(a) and (b) show a graph of f compared with the fifth and tenth partial sums of this cosine expansion, respectively.
FIGURE 14.26(a) Fifth partial sum of the cosine expansion of e^{2x} on [0, 1].
FIGURE 14.26(b) Tenth partial sum of the cosine expansion of e^{2x} on [0, 1].
14.4.2 The Fourier Sine Series of a Function

By duplicating the strategy just used for writing a cosine series, except now extending f to an odd function f_o over [−L, L], we can write a Fourier sine series for f(x) on [0, L]. In particular, if f(x) is defined on [0, L], let

f_o(x) = { f(x) for 0 ≤ x ≤ L; −f(−x) for −L ≤ x < 0 }

Then f_o is an odd function, and f_o(x) = f(x) for 0 ≤ x ≤ L. This is the odd extension of f to [−L, L]. For example, if f(x) = e^{2x} for 0 ≤ x ≤ 1, let

f_o(x) = { e^{2x} for 0 ≤ x ≤ 1; −e^{−2x} for −1 ≤ x < 0 }

This amounts to folding the graph of f over the vertical axis, then over the horizontal axis (Figure 14.27). Now write the Fourier series for f_o(x) on [−L, L]. By equations (14.11) and (14.12), the Fourier series of f_o is

Σ_{n=1}^∞ bₙ sin(nπx/L)   (14.16)
613
y f L
L
x
FIGURE 14.27 Odd extension of f to −L L.
with coefficients bn =
nx nx 2 L 2 L dx = dx fo x sin fx sin L 0 L L 0 L
(14.17)
We call the series (14.16) the Fourier sine series of f on 0 L. The coefficients given by equation (14.17) are the Fourier sine coefficients of f on 0 L. As with cosine series, we do not need to explicitly make the extension to fo to write the Fourier sine series for f on 0 L. Again, as with the cosine expansion, we can write a convergence theorem for sine series using the convergence theorem for Fourier series. THEOREM 14.4
Convergence of Fourier Sine Series
Let f be piecewise continuous on [0, L]. Then,
1. If 0 < x < L, and f has left and right derivatives at x, then the Fourier sine series for f(x) on [0, L] converges at x to

(1/2)(f(x−) + f(x+)).

2. At 0 and at L, the Fourier sine series for f(x) on [0, L] converges to 0.

Conclusion (2) is immediate because each term of the sine series (14.16) is zero for x = 0 and for x = L.
EXAMPLE 14.16

Let f(x) = e^{2x} for 0 ≤ x ≤ 1. We will write the Fourier sine series of f on [0, 1]. The coefficients are

bₙ = 2 ∫₀¹ e^{2x} sin(nπx) dx = 2nπ (1 − (−1)ⁿe²)/(4 + n²π²).

The sine series is

Σ_{n=1}^∞ 2nπ ((1 − (−1)ⁿe²)/(4 + n²π²)) sin(nπx).

This series converges to e^{2x} for 0 < x < 1, and to zero for x = 0 and for x = 1. Figures 14.28(a) and (b) show graphs of the tenth and fortieth partial sums of this series.
FIGURE 14.28(a) Tenth partial sum of the sine expansion of e^{2x} on [0, 1].
FIGURE 14.28(b) Fortieth partial sum of the sine expansion of e^{2x} on [0, 1].

SECTION 14.4
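The contrast with the cosine series of Example 14.15 is worth seeing numerically: the sine series is forced to 0 at both endpoints, even though f(0) = 1 and f(1) = e². A minimal sketch using the closed-form coefficients:

```python
import math

# Example 14.16: closed-form sine coefficients of f(x) = e^{2x} on [0, 1]
def b(n):
    return 2 * n * math.pi * (1 - (-1) ** n * math.exp(2)) / (4 + n * n * math.pi * math.pi)

def sine_sum(x, N):
    return sum(b(n) * math.sin(n * math.pi * x) for n in range(1, N + 1))

print(sine_sum(0.0, 100), round(sine_sum(1.0, 100), 12))   # 0.0 0.0: endpoints
print(round(sine_sum(0.5, 20000), 3))                      # approaches e = 2.718...
```

Because the odd extension has jumps at 0 and ±1, these coefficients decay only like 1/n, and the Gibbs phenomenon appears near the endpoints in Figures 14.28(a) and (b).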
PROBLEMS
In each of Problems 1 through 10, write the Fourier cosine series and the Fourier sine series of the function on the interval. Determine the sum of each series.

1. f(x) = 4 for 0 ≤ x ≤ 3
2. f(x) = { 1 for 0 ≤ x ≤ 1; −1 for 1 < x ≤ 2 }
3. f(x) = { 0 for 0 ≤ x < π; cos(x) for π ≤ x ≤ 2π }
4. f(x) = 2x for 0 ≤ x ≤ 1
5. f(x) = x² for 0 ≤ x ≤ 2
6. f(x) = e^{−x} for 0 ≤ x ≤ 1
7. f(x) = { x for 0 ≤ x ≤ 2; 2 − x for 2 < x ≤ 3 }
8. f(x) = { 1 for 0 ≤ x < 1; 0 for 1 ≤ x ≤ 3; −1 for 3 < x ≤ 5 }
9. f(x) = { x² for 0 ≤ x < 1; 1 for 1 ≤ x ≤ 4 }
10. f(x) = 1 − x³ for 0 ≤ x ≤ 2
11. Let f(x) be defined on [−L, L]. Prove that f can be written as the sum of an even function and an odd function on this interval.
12. Find all functions defined on [−L, L] that are both even and odd.
13. Find the sum of the series Σ_{n=1}^∞ (−1)ⁿ/(4n² − 1). Hint: Expand sin(x) in a cosine series on [0, π] and choose an appropriate value of x.

14.5
Integration and Differentiation of Fourier Series In this section we will take a closer look at Fourier coefficients, and consider term by term differentiation and integration of Fourier series. Differentiation of Fourier series term-by-term generally leads to absurd results, even for extremely well behaved functions. Consider, for example, fx = x for − ≤ x ≤ . The Fourier series is 2 −1n+1 sinnx n n=1
which converges to x for −π < x < π. Of course, f′(x) = 1 for −π < x < π, so f′ is piecewise smooth. However, if we differentiate the Fourier series term by term, we get
$$\sum_{n=1}^{\infty} 2(-1)^{n+1}\cos(nx),$$
which does not even converge on (−π, π). The term-by-term derivative of this Fourier series is unrelated to the derivative of f(x). Integration of Fourier series has better prospects. THEOREM 14.5
Integration of Fourier Series
Let f be piecewise continuous on [−L, L], with Fourier series
$$\frac{1}{2}a_0 + \sum_{n=1}^{\infty}\left[a_n\cos\left(\frac{n\pi x}{L}\right) + b_n\sin\left(\frac{n\pi x}{L}\right)\right].$$
Then, for any x with −L ≤ x ≤ L,
$$\int_{-L}^{x} f(t)\,dt = \frac{1}{2}a_0(x + L) + \sum_{n=1}^{\infty}\frac{L}{n\pi}\left[a_n\sin\left(\frac{n\pi x}{L}\right) - b_n\left(\cos\left(\frac{n\pi x}{L}\right) - (-1)^n\right)\right].$$
The expression on the right is exactly what we get by integrating the Fourier series term by term, from −L to x. This means that, for any piecewise continuous function, we can integrate f from −L to x by integrating its Fourier series term by term. This holds even if the Fourier series does not converge to f(x) at this particular x (for example, f might have a jump discontinuity at x).

Proof: Define
$$F(x) = \int_{-L}^{x} f(t)\,dt - \frac{1}{2}a_0 x$$
for −L ≤ x ≤ L. Then F is continuous on [−L, L] and F(L) = F(−L) = La₀/2. Further, F′(x) = f(x) − ½a₀ at every point of (−L, L) where f is continuous. Hence F′ is piecewise continuous on [−L, L]. Therefore the Fourier series of F(x) converges to F(x) on [−L, L]:
$$F(x) = \frac{1}{2}A_0 + \sum_{n=1}^{\infty}\left[A_n\cos\left(\frac{n\pi x}{L}\right) + B_n\sin\left(\frac{n\pi x}{L}\right)\right], \qquad (14.18)$$
in which we will use upper case letters for the Fourier coefficients of F, and lower case letters for those of f. Now compute the Aₙ's and Bₙ's for n = 1, 2, … by integrating by parts. First,
$$A_n = \frac{1}{L}\int_{-L}^{L} F(t)\cos\left(\frac{n\pi t}{L}\right)dt$$
$$= \frac{1}{L}\left[F(t)\,\frac{L}{n\pi}\sin\left(\frac{n\pi t}{L}\right)\right]_{-L}^{L} - \frac{1}{n\pi}\int_{-L}^{L}\sin\left(\frac{n\pi t}{L}\right)F'(t)\,dt$$
$$= -\frac{1}{n\pi}\int_{-L}^{L}\left[f(t) - \frac{1}{2}a_0\right]\sin\left(\frac{n\pi t}{L}\right)dt$$
$$= -\frac{1}{n\pi}\int_{-L}^{L} f(t)\sin\left(\frac{n\pi t}{L}\right)dt + \frac{a_0}{2n\pi}\int_{-L}^{L}\sin\left(\frac{n\pi t}{L}\right)dt = -\frac{L}{n\pi}b_n,$$
in which bₙ is the sine coefficient in the Fourier series of f on [−L, L]. Similarly,
$$B_n = \frac{1}{L}\int_{-L}^{L} F(t)\sin\left(\frac{n\pi t}{L}\right)dt$$
$$= \frac{1}{L}\left[F(t)\left(-\frac{L}{n\pi}\right)\cos\left(\frac{n\pi t}{L}\right)\right]_{-L}^{L} + \frac{1}{n\pi}\int_{-L}^{L}\cos\left(\frac{n\pi t}{L}\right)F'(t)\,dt$$
$$= \frac{1}{n\pi}\int_{-L}^{L}\left[f(t) - \frac{1}{2}a_0\right]\cos\left(\frac{n\pi t}{L}\right)dt$$
$$= \frac{1}{n\pi}\int_{-L}^{L} f(t)\cos\left(\frac{n\pi t}{L}\right)dt - \frac{a_0}{2n\pi}\int_{-L}^{L}\cos\left(\frac{n\pi t}{L}\right)dt = \frac{L}{n\pi}a_n.$$
Therefore the Fourier series of F is
$$F(x) = \frac{1}{2}A_0 + \sum_{n=1}^{\infty}\frac{L}{n\pi}\left[-b_n\cos\left(\frac{n\pi x}{L}\right) + a_n\sin\left(\frac{n\pi x}{L}\right)\right]$$
for −L ≤ x ≤ L. Now we must determine A₀. But
$$F(L) = \frac{L}{2}a_0 = \frac{1}{2}A_0 - \sum_{n=1}^{\infty}\frac{L}{n\pi}b_n\cos(n\pi) = \frac{1}{2}A_0 - \sum_{n=1}^{\infty}\frac{L}{n\pi}b_n(-1)^n.$$
This gives us
$$A_0 = La_0 + \frac{2L}{\pi}\sum_{n=1}^{\infty}\frac{(-1)^n}{n}b_n.$$
Upon substituting these expressions for A₀, Aₙ, and Bₙ into the series (14.18), we obtain the conclusion of the theorem.
EXAMPLE 14.17
Let f(x) = x for −π ≤ x ≤ π. This function is continuous on [−π, π], and its Fourier series is
$$\sum_{n=1}^{\infty}\frac{2(-1)^{n+1}}{n}\sin(nx).$$
We have seen that we get nonsense if we differentiate this series term by term. However, we can integrate it term by term to obtain, for any x in [−π, π],
$$\int_{-\pi}^{x} t\,dt = \frac{1}{2}\left(x^2 - \pi^2\right) = \sum_{n=1}^{\infty}\frac{2(-1)^{n+1}}{n}\int_{-\pi}^{x}\sin(nt)\,dt$$
$$= \sum_{n=1}^{\infty}\frac{2(-1)^{n+1}}{n}\left[-\frac{1}{n}\cos(nx) + \frac{1}{n}\cos(n\pi)\right]$$
$$= \sum_{n=1}^{\infty}\frac{2}{n^2}\left[(-1)^n\cos(nx) - 1\right].$$
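This term-by-term integration can be checked numerically; the snippet below is our addition, comparing the closed form with a truncated series:

```python
import math

def lhs(x):
    # closed form of the integral of t dt from -pi to x
    return 0.5 * (x ** 2 - math.pi ** 2)

def rhs(x, N):
    # N-th partial sum of the term-by-term integrated Fourier series
    return sum(2.0 / n ** 2 * ((-1) ** n * math.cos(n * x) - 1) for n in range(1, N + 1))

x = 1.0
print(lhs(x), rhs(x, 100000))
```

Note that the integrated series has coefficients of order 1/n², so it converges absolutely, unlike the differentiated series above.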
With stronger conditions on f , we can derive a result on term by term differentiation of Fourier series. THEOREM 14.6
Differentiation of Fourier Series
Let f be continuous on [−L, L] and suppose f(L) = f(−L). Let f′ be piecewise continuous on [−L, L]. Then f(x) equals its Fourier series for −L ≤ x ≤ L,
$$f(x) = \frac{1}{2}a_0 + \sum_{n=1}^{\infty}\left[a_n\cos\left(\frac{n\pi x}{L}\right) + b_n\sin\left(\frac{n\pi x}{L}\right)\right],$$
and, at each point in (−L, L) where f′(x) exists,
$$f'(x) = \sum_{n=1}^{\infty}\frac{n\pi}{L}\left[-a_n\sin\left(\frac{n\pi x}{L}\right) + b_n\cos\left(\frac{n\pi x}{L}\right)\right].$$
We leave a proof of this to the student. The idea is to write the Fourier series of f′(x), noting that this Fourier series converges to f′(x) wherever f′(x) exists. Use integration by parts, as in the proof of Theorem 14.5, to relate the Fourier coefficients of f′(x) to those of f(x).
EXAMPLE 14.18
Let f(x) = x² for −2 ≤ x ≤ 2. The hypotheses of Theorem 14.6 are satisfied. The Fourier series of f on [−2, 2] is
$$f(x) = \frac{4}{3} + \frac{16}{\pi^2}\sum_{n=1}^{\infty}\frac{(-1)^n}{n^2}\cos\left(\frac{n\pi x}{2}\right),$$
with equality between f(x) and its Fourier series. Because f′(x) = 2x is continuous, and f″(x) = 2 exists throughout the interval, for −2 < x < 2 we have
$$f'(x) = 2x = \frac{8}{\pi}\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\sin\left(\frac{n\pi x}{2}\right).$$
For example, putting x = 1, we get
$$2 = \frac{8}{\pi}\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\sin\left(\frac{n\pi}{2}\right),$$
or
$$\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\sin\left(\frac{n\pi}{2}\right) = \frac{\pi}{4}.$$
Manipulations on Fourier series can sometimes be used to sum series such as this.
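The last identity can be tested numerically (its nonzero terms form the Leibniz series for π/4); this check is our addition:

```python
import math

def partial(N):
    # partial sum of sum_{n=1}^{N} (-1)^(n+1)/n * sin(n*pi/2)
    return sum((-1) ** (n + 1) / n * math.sin(n * math.pi / 2) for n in range(1, N + 1))

print(partial(200001), math.pi / 4)
```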
We now have conditions under which we can differentiate or integrate a Fourier series term by term. We will next consider conditions sufficient for a Fourier series to converge uniformly. First we will derive a set of important inequalities for Fourier coefficients, called Bessel’s inequalities. THEOREM 14.7
Bessel’s Inequalities
Let f be integrable on [0, L]. Then:
1. The coefficients in the Fourier sine expansion of f on [0, L] satisfy
$$\sum_{n=1}^{\infty} b_n^2 \le \frac{2}{L}\int_0^L f(x)^2\,dx.$$
2. The coefficients in the Fourier cosine expansion of f on [0, L] satisfy
$$\frac{1}{2}a_0^2 + \sum_{n=1}^{\infty} a_n^2 \le \frac{2}{L}\int_0^L f(x)^2\,dx.$$
3. If f is integrable on [−L, L], then the Fourier coefficients of f on [−L, L] satisfy
$$\frac{1}{2}a_0^2 + \sum_{n=1}^{\infty}\left(a_n^2 + b_n^2\right) \le \frac{1}{L}\int_{-L}^{L} f(x)^2\,dx.$$
In particular, the sum of the squares of the (sine, cosine, or Fourier series) coefficients of f converges. We will prove (1), which is notationally simpler than the other two inequalities but contains the idea of the argument.

Proof: Since $\int_0^L f(x)\,dx$ exists, we can compute the Fourier sine coefficients and write the sine series
$$\sum_{n=1}^{\infty} b_n\sin\left(\frac{n\pi x}{L}\right), \quad\text{where}\quad b_n = \frac{2}{L}\int_0^L f(x)\sin\left(\frac{n\pi x}{L}\right)dx.$$
The N-th partial sum of this series is
$$S_N(x) = \sum_{n=1}^{N} b_n\sin\left(\frac{n\pi x}{L}\right).$$
Now consider
$$0 \le \int_0^L \left[f(x) - S_N(x)\right]^2 dx$$
$$= \int_0^L f(x)^2\,dx - 2\int_0^L f(x)S_N(x)\,dx + \int_0^L S_N(x)^2\,dx$$
$$= \int_0^L f(x)^2\,dx - 2\sum_{n=1}^{N} b_n\int_0^L f(x)\sin\left(\frac{n\pi x}{L}\right)dx + \sum_{n=1}^{N}\sum_{m=1}^{N} b_n b_m\int_0^L \sin\left(\frac{n\pi x}{L}\right)\sin\left(\frac{m\pi x}{L}\right)dx$$
$$= \int_0^L f(x)^2\,dx - \sum_{n=1}^{N} b_n\left(Lb_n\right) + \frac{L}{2}\sum_{n=1}^{N} b_n^2,$$
in which we have used the fact that
$$\int_0^L \sin\left(\frac{n\pi x}{L}\right)\sin\left(\frac{m\pi x}{L}\right)dx = \begin{cases} 0 & \text{if } n \ne m \\ L/2 & \text{if } n = m. \end{cases}$$
We therefore have
$$0 \le \int_0^L f(x)^2\,dx - L\sum_{n=1}^{N} b_n^2 + \frac{L}{2}\sum_{n=1}^{N} b_n^2,$$
or
$$\sum_{n=1}^{N} b_n^2 \le \frac{2}{L}\int_0^L f(x)^2\,dx.$$
Since the right side is independent of N, we can let N → ∞ to get
$$\sum_{n=1}^{\infty} b_n^2 \le \frac{2}{L}\int_0^L f(x)^2\,dx,$$
proving conclusion (1). Conclusions (2) and (3) have similar proofs.
EXAMPLE 14.19
We will use Bessel's inequality to derive an upper bound for an infinite series. Let f(x) = x² for −π ≤ x ≤ π. The Fourier series of f converges to f(x) for all x in [−π, π]:
$$x^2 = \frac{1}{3}\pi^2 + 4\sum_{n=1}^{\infty}\frac{(-1)^n}{n^2}\cos(nx).$$
Here a₀ = 2π²/3, aₙ = 4(−1)ⁿ/n², and bₙ = 0 (x² is an even function). By Bessel's inequality (3) of Theorem 14.7,
$$\frac{1}{2}\left(\frac{2\pi^2}{3}\right)^2 + \sum_{n=1}^{\infty}\left(\frac{4(-1)^n}{n^2}\right)^2 \le \frac{1}{\pi}\int_{-\pi}^{\pi} x^4\,dx = \frac{2}{5}\pi^4.$$
Then
$$16\sum_{n=1}^{\infty}\frac{1}{n^4} \le \frac{2\pi^4}{5} - \frac{2\pi^4}{9} = \frac{8\pi^4}{45},$$
so
$$\sum_{n=1}^{\infty}\frac{1}{n^4} \le \frac{\pi^4}{90},$$
which is approximately 1.0823232.
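The bound is easy to test numerically (in fact equality holds here, as Parseval's theorem below will show); this check is our addition:

```python
import math

# partial sum of sum 1/n^4 versus the Bessel bound pi^4/90
s = sum(1.0 / n ** 4 for n in range(1, 100001))
bound = math.pi ** 4 / 90
print(s, bound)
```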
Using Bessel's inequality for coefficients in a Fourier expansion on [−L, L], we can prove a result on uniform convergence of Fourier series.
THEOREM 14.8
Uniform and Absolute Convergence of Fourier Series
Let f be continuous on [−L, L] and let f′ be piecewise continuous. Suppose f(−L) = f(L). Then the Fourier series of f on [−L, L] converges absolutely and uniformly to f(x) on [−L, L].

Proof: Denote the Fourier coefficients of f by lower case letters, and those of f′ by upper case. Then
$$A_0 = \frac{1}{L}\int_{-L}^{L} f'(\xi)\,d\xi = \frac{1}{L}\left[f(L) - f(-L)\right] = 0.$$
For positive integer n, we find by integration by parts, as in the proof of Theorem 14.5, that
$$A_n = \frac{n\pi}{L}b_n \quad\text{and}\quad B_n = -\frac{n\pi}{L}a_n.$$
Now
$$0 \le \left(|A_n| - \frac{1}{n}\right)^2 = A_n^2 - \frac{2}{n}|A_n| + \frac{1}{n^2}$$
and, similarly,
$$0 \le B_n^2 - \frac{2}{n}|B_n| + \frac{1}{n^2}.$$
Then
$$\frac{1}{n}\left(|A_n| + |B_n|\right) \le \frac{1}{2}\left(A_n^2 + B_n^2\right) + \frac{1}{n^2}.$$
Therefore
$$|a_n| + |b_n| = \frac{L}{\pi}\cdot\frac{1}{n}\left(|A_n| + |B_n|\right) \le \frac{L}{2\pi}\left(A_n^2 + B_n^2\right) + \frac{L}{\pi}\cdot\frac{1}{n^2}.$$
Now $\sum_{n=1}^{\infty} 1/n^2$ converges, and $\sum_{n=1}^{\infty}\left(A_n^2 + B_n^2\right)$ converges by applying Bessel's inequality to the Fourier coefficients of f′. Therefore, by comparison, $\sum_{n=1}^{\infty}\left(|a_n| + |b_n|\right)$ converges also. But, for −L ≤ x ≤ L,
$$\left|a_n\cos\left(\frac{n\pi x}{L}\right) + b_n\sin\left(\frac{n\pi x}{L}\right)\right| \le |a_n| + |b_n|.$$
By a theorem of Weierstrass, this implies that the Fourier series of f converges uniformly on [−L, L]. Further, the convergence is absolute, since the series of absolute values of the terms converges. Finally, by the Fourier convergence theorem, the Fourier series of f converges to f(x) on [−L, L]. This completes the proof.
EXAMPLE 14.20
Let f(x) = e^{−|x|} for −1 ≤ x ≤ 1. Then
$$f(x) = \begin{cases} e^{x} & \text{for } -1 \le x < 0 \\ e^{-x} & \text{for } 0 \le x \le 1. \end{cases}$$
f is continuous on [−1, 1], and
$$f'(x) = \begin{cases} e^{x} & \text{for } -1 \le x < 0 \\ -e^{-x} & \text{for } 0 < x \le 1. \end{cases}$$
f has no derivative at x = 0, which is a cusp of the graph (Figure 14.29). Thus f′ is piecewise continuous on [−1, 1]. Finally, f(1) = f(−1) = e^{−1}. Therefore the Fourier series of f converges uniformly and absolutely to f(x) on [−1, 1]:
$$f(x) = \left(1 - e^{-1}\right) + 2\sum_{n=1}^{\infty}\frac{1 - (-1)^n e^{-1}}{1 + n^2\pi^2}\cos(n\pi x)$$
for −1 ≤ x ≤ 1. We can integrate this series term by term. For example,
$$\int_{-1}^{x} f(t)\,dt = \int_{-1}^{x}\left(1 - e^{-1}\right)dt + 2\sum_{n=1}^{\infty}\frac{1 - (-1)^n e^{-1}}{1 + n^2\pi^2}\int_{-1}^{x}\cos(n\pi t)\,dt$$
$$= \left(1 - e^{-1}\right)(x + 1) + \frac{2}{\pi}\sum_{n=1}^{\infty}\frac{1 - (-1)^n e^{-1}}{n\left(1 + n^2\pi^2\right)}\sin(n\pi x).$$
This is a correct equation, but the right side is not a Fourier series (it includes the polynomial term x). We may always integrate a Fourier series term by term, and the result may be a convergent series, but not necessarily a Fourier series.
FIGURE 14.29 Graph of f(x) = eˣ for −1 ≤ x < 0 and f(x) = e^{−x} for 0 ≤ x ≤ 1.
We can also differentiate the Fourier series for f(x) term by term at any point in (−1, 1) at which f′(x) exists. Thus we can differentiate term by term for −1 < x < 0 and for 0 < x < 1. For such x,
$$f'(x) = -2\pi\sum_{n=1}^{\infty}\frac{n\left[1 - (-1)^n e^{-1}\right]}{1 + n^2\pi^2}\sin(n\pi x).$$
We will conclude this section with Parseval's theorem. Recall that Bessel's inequality for Fourier coefficients on [−L, L] requires only that we be able to compute these coefficients. If, however, we place continuity conditions on the function, as in Theorem 14.8, then we turn Bessel's inequality into an equality. THEOREM 14.9
Parseval
Let f be continuous on [−L, L] and let f′ be piecewise continuous. Suppose f(−L) = f(L). Then the Fourier coefficients of f on [−L, L] satisfy
$$\frac{1}{2}a_0^2 + \sum_{n=1}^{\infty}\left(a_n^2 + b_n^2\right) = \frac{1}{L}\int_{-L}^{L} f(x)^2\,dx.$$

Proof: The Fourier series of f on [−L, L] converges to f(x) at each point of this interval:
$$f(x) = \frac{1}{2}a_0 + \sum_{n=1}^{\infty}\left[a_n\cos\left(\frac{n\pi x}{L}\right) + b_n\sin\left(\frac{n\pi x}{L}\right)\right].$$
Then
$$f(x)^2 = \frac{1}{2}a_0 f(x) + \sum_{n=1}^{\infty}\left[a_n f(x)\cos\left(\frac{n\pi x}{L}\right) + b_n f(x)\sin\left(\frac{n\pi x}{L}\right)\right].$$
We can integrate this Fourier series term by term, and multiplication of the series by the continuous function f(x) does not change this. Therefore
$$\int_{-L}^{L} f(x)^2\,dx = \frac{1}{2}a_0\int_{-L}^{L} f(x)\,dx + \sum_{n=1}^{\infty}\left[a_n\int_{-L}^{L} f(x)\cos\left(\frac{n\pi x}{L}\right)dx + b_n\int_{-L}^{L} f(x)\sin\left(\frac{n\pi x}{L}\right)dx\right].$$
Recalling the integral formulas for the Fourier coefficients, this equation can be written
$$\int_{-L}^{L} f(x)^2\,dx = \frac{1}{2}a_0\left(La_0\right) + \sum_{n=1}^{\infty}\left[a_n\left(La_n\right) + b_n\left(Lb_n\right)\right],$$
and this is equivalent to the conclusion of the theorem.
EXAMPLE 14.21
Parseval's theorem has various applications in deriving other properties of Fourier series. We will encounter it later when we discuss completeness of sets of eigenfunctions. However, one immediate use is in summing certain infinite series. To illustrate, the Fourier coefficients of cos(x/2) on [−π, π] are
$$a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi}\cos(x/2)\,dx = \frac{4}{\pi}$$
and
$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi}\cos(x/2)\cos(nx)\,dx = -\frac{4}{\pi}\cdot\frac{(-1)^n}{4n^2 - 1},$$
with bₙ = 0 because cos(x/2) is even. By Parseval's theorem,
$$\frac{1}{2}\left(\frac{4}{\pi}\right)^2 + \sum_{n=1}^{\infty}\left(\frac{4}{\pi}\cdot\frac{(-1)^n}{4n^2 - 1}\right)^2 = \frac{1}{\pi}\int_{-\pi}^{\pi}\cos^2(x/2)\,dx = 1.$$
Then
$$\sum_{n=1}^{\infty}\frac{1}{\left(4n^2 - 1\right)^2} = \frac{\pi^2 - 8}{16}.$$
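A quick numerical confirmation of this sum (our addition):

```python
import math

# partial sum of sum 1/(4n^2 - 1)^2 versus the Parseval value (pi^2 - 8)/16
s = sum(1.0 / (4 * n ** 2 - 1) ** 2 for n in range(1, 100001))
target = (math.pi ** 2 - 8) / 16
print(s, target)
```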
SECTION 14.5
PROBLEMS
1. Prove Theorem 14.6. An argument can be formulated along the lines discussed following the statement of the theorem.
2. Let f(x) = |x| for −1 ≤ x ≤ 1. (a) Write the Fourier series for f(x) on [−1, 1]. (b) Show that this series can be differentiated term by term to yield the Fourier expansion of f′(x) on [−1, 1]. (c) Determine f′(x) and write its Fourier series on [−1, 1]. Compare this series with that obtained in (b).
3. Let f(x) = 0 for −π ≤ x ≤ 0 and f(x) = x for 0 < x ≤ π. (a) Write the Fourier series of f(x) on [−π, π] and show that this series converges to f(x) on (−π, π). (b) Show that this series can be integrated term by term. (c) Use the results of (a) and (b) to obtain a trigonometric series expansion for $\int_{-\pi}^{x} f(t)\,dt$ on [−π, π].
4. Let f(x) = x² for −3 ≤ x ≤ 3. (a) Write the Fourier series for f(x) on [−3, 3]. (b) Show that this series can be differentiated term by term and use this fact to obtain the Fourier expansion of 2x on [−3, 3]. (c) Write the Fourier series of 2x on [−3, 3] by computation of the Fourier coefficients and compare the result with that of (b).
5. Let f(x) = x sin(x) for −π ≤ x ≤ π. (a) Write the Fourier series for f(x) on [−π, π]. (b) Show that this series can be differentiated term by term and use this fact to obtain the Fourier expansion of sin(x) + x cos(x) on [−π, π]. (c) Write the Fourier series of sin(x) + x cos(x) on [−π, π] by computation of the Fourier coefficients and compare the result with that of (b).

14.6 The Phase Angle Form of a Fourier Series

A function is periodic with period p if f(x + p) = f(x) for all real x. If a function has a period, it has many periods. For example, cos(x) has periods 2π, 4π, 6π, −2π, −4π, and, in fact, 2nπ for any nonzero integer n. The smallest positive period of a function is called its principal period. The principal period of sin(x) and cos(x) is 2π. If f has period p, then for any x and any integer n, f(x + np) = f(x).
For example,
$$\cos\left(\frac{\pi}{6}\right) = \cos\left(\frac{\pi}{6} + 2\pi\right) = \cos\left(\frac{\pi}{6} + 4\pi\right) = \cos\left(\frac{\pi}{6} + 6\pi\right) = \cdots = \cos\left(\frac{\pi}{6} - 2\pi\right) = \cos\left(\frac{\pi}{6} - 4\pi\right) = \cdots.$$
The graph of a periodic f(x) repeats itself over every interval of length p (Figure 14.30). This means that we need only specify f(x) on an interval of length p, say on [−p/2, p/2), to determine f(x) for all x. This specification of function values can be made on any interval [α, α + p) of length p. Since f(α + p) = f(α), the function must have the same value at the end points of this interval. This is why we specify values on the half-open interval [α, α + p), since f(α + p) is determined once f(α) is defined.
EXAMPLE 14.22
Let g(x) = 2x for −1 ≤ x < 1, and suppose g has period 2. Then the graph of g on [−1, 1) is repeated to cover the entire real line, as in Figure 14.31. Knowing the period, and the function values on [−1, 1), is enough to determine the function for all x. As a specific example, suppose we want to know g(7/2). Because g has period 2, g(x + 2n) = g(x) for any x and any integer n. Then
$$g\left(\frac{7}{2}\right) = g\left(-\frac{1}{2} + 4\right) = g\left(-\frac{1}{2}\right) = 2\left(-\frac{1}{2}\right) = -1.$$
Similarly, g(4.3) = g(0.3 + 2(2)) = g(0.3) = 0.6.
If f has period p and is integrable, then we can calculate its Fourier coefficients on [−p/2, p/2] and write the Fourier series
$$\frac{1}{2}a_0 + \sum_{n=1}^{\infty}\left[a_n\cos\left(\frac{2n\pi x}{p}\right) + b_n\sin\left(\frac{2n\pi x}{p}\right)\right].$$
Here L = p/2, so nπx/L = 2nπx/p in the previous discussion of Fourier series on [−L, L]. The Fourier coefficients are
$$a_n = \frac{2}{p}\int_{-p/2}^{p/2} f(x)\cos\left(\frac{2n\pi x}{p}\right)dx \quad\text{for } n = 0, 1, 2, \ldots$$
and
$$b_n = \frac{2}{p}\int_{-p/2}^{p/2} f(x)\sin\left(\frac{2n\pi x}{p}\right)dx \quad\text{for } n = 1, 2, \ldots$$
Actually, because of the periodicity, we could choose any convenient number α and write
$$a_n = \frac{2}{p}\int_{\alpha}^{\alpha+p} f(x)\cos\left(\frac{2n\pi x}{p}\right)dx \quad\text{for } n = 0, 1, 2, \ldots \qquad (14.19)$$
and
$$b_n = \frac{2}{p}\int_{\alpha}^{\alpha+p} f(x)\sin\left(\frac{2n\pi x}{p}\right)dx \quad\text{for } n = 1, 2, \ldots \qquad (14.20)$$

FIGURE 14.30 Graph of a periodic function of fundamental period p.
FIGURE 14.31 Graph of g in Example 14.22.
Once we compute the coefficients, we can use a convergence theorem to determine where this series represents f(x).
EXAMPLE 14.23
The function f shown in Figure 14.32 has fundamental period 6, and
$$f(x) = \begin{cases} 0 & \text{for } -3 \le x < 0 \\ 1 & \text{for } 0 \le x < 3. \end{cases}$$
This function is called a square wave. Its Fourier series on [−3, 3] is
$$\frac{1}{2} + \sum_{n=1}^{\infty}\frac{1}{n\pi}\left[1 - (-1)^n\right]\sin\left(\frac{n\pi x}{3}\right).$$
This series converges to 0 for −3 < x < 0, to 1 for 0 < x < 3, and to 1/2 at x = 0 and x = ±3. Because of the periodicity, this series also converges to f(x) on (−6, −3) and on (3, 6), on (−9, −6) and on (6, 9), and so on. Sometimes we write
$$\omega_0 = \frac{2\pi}{p}.$$
Now the Fourier series of f on [−p/2, p/2] is
$$\frac{1}{2}a_0 + \sum_{n=1}^{\infty}\left[a_n\cos\left(n\omega_0 x\right) + b_n\sin\left(n\omega_0 x\right)\right] \qquad (14.21)$$

FIGURE 14.32 Square wave: f(x) = 0 for −3 ≤ x < 0, f(x) = 1 for 0 ≤ x < 3, and f has period 6.
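The convergence claims for the square wave can be verified by summing the series at a few points; this minimal sketch is ours:

```python
import math

def S(x, N):
    # N-th partial sum of the square-wave Fourier series on [-3, 3]
    total = 0.5
    for n in range(1, N + 1):
        total += (1 - (-1) ** n) / (n * math.pi) * math.sin(n * math.pi * x / 3)
    return total

# expect values near 0, 1/2, and 1 respectively
for x in (-1.5, 0.0, 1.5):
    print(x, S(x, 5001))
```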
where
$$a_n = \frac{2}{p}\int_{-p/2}^{p/2} f(x)\cos\left(n\omega_0 x\right)dx \quad\text{for } n = 0, 1, 2, \ldots$$
and
$$b_n = \frac{2}{p}\int_{-p/2}^{p/2} f(x)\sin\left(n\omega_0 x\right)dx \quad\text{for } n = 1, 2, \ldots$$
It is sometimes useful to write the Fourier series (14.21) in a different way. We will look for numbers cₙ and δₙ so that
$$a_n\cos\left(n\omega_0 x\right) + b_n\sin\left(n\omega_0 x\right) = c_n\cos\left(n\omega_0 x + \delta_n\right).$$
To solve for these constants, write the last equation as
$$a_n\cos\left(n\omega_0 x\right) + b_n\sin\left(n\omega_0 x\right) = c_n\cos\left(n\omega_0 x\right)\cos\left(\delta_n\right) - c_n\sin\left(n\omega_0 x\right)\sin\left(\delta_n\right).$$
One way to satisfy this equation is to have
$$c_n\cos\left(\delta_n\right) = a_n \quad\text{and}\quad c_n\sin\left(\delta_n\right) = -b_n.$$
Solve these for cₙ and δₙ. First square both equations and add to obtain cₙ² = aₙ² + bₙ², so
$$c_n = \sqrt{a_n^2 + b_n^2}. \qquad (14.22)$$
Next, write
$$\frac{c_n\sin\left(\delta_n\right)}{c_n\cos\left(\delta_n\right)} = \tan\left(\delta_n\right) = -\frac{b_n}{a_n},$$
so
$$\delta_n = \tan^{-1}\left(-\frac{b_n}{a_n}\right),$$
assuming that aₙ ≠ 0. The numbers cₙ and δₙ allow us to write the phase angle form of the Fourier series (14.21).
DEFINITION 14.7
Phase Angle Form
Let f have fundamental period p. Then the phase angle form of the Fourier series (14.21) of f is
$$\frac{1}{2}a_0 + \sum_{n=1}^{\infty} c_n\cos\left(n\omega_0 x + \delta_n\right),$$
in which ω₀ = 2π/p, $c_n = \sqrt{a_n^2 + b_n^2}$, and $\delta_n = \tan^{-1}\left(-b_n/a_n\right)$ for n = 1, 2, ….
The phase angle form of the Fourier series is also called its harmonic form. This expression displays the composition of a periodic function (satisfying certain continuity conditions) as a superposition of cosine waves. The term cos(nω₀x + δₙ) is the nth harmonic of f, cₙ is the nth harmonic amplitude, and δₙ is the nth phase angle of f.
EXAMPLE 14.24
Suppose f has fundamental period p = 3, and f(x) = x² for 0 ≤ x < 3. Since f has fundamental period 3, defining f(x) on any interval of length 3 determines f(x) for all x. For example, f(−1) = f(−1 + 3) = f(2) = 4, f(5) = f(2 + 3) = f(2) = 2² = 4 (or observe that f(5) = f(−1 + 6) = f(−1 + 2·3) = f(−1) = 4), and f(7) = f(1 + 6) = f(1) = 1. A graph of f is shown in Figure 14.33. Care must be taken if we want to write an algebraic expression for f(x) on a different interval. For example, on the symmetric interval [−3/2, 3/2) about the origin,
$$f(x) = \begin{cases} x^2 & \text{for } 0 \le x < \tfrac{3}{2} \\ (x + 3)^2 & \text{for } -\tfrac{3}{2} \le x < 0. \end{cases}$$
To find the Fourier coefficients of f, it is convenient to use equations (14.19) and (14.20) with α = 0, since f is given explicitly on [0, 3). Compute
$$a_0 = \frac{2}{3}\int_0^3 x^2\,dx = 6,$$
$$a_n = \frac{2}{3}\int_0^3 x^2\cos\left(\frac{2n\pi x}{3}\right)dx = \frac{9}{n^2\pi^2},$$
and
$$b_n = \frac{2}{3}\int_0^3 x^2\sin\left(\frac{2n\pi x}{3}\right)dx = -\frac{9}{n\pi}.$$

FIGURE 14.33 Graph of f(x) = x² for 0 ≤ x < 3, with f(x + 3) = f(x) for all x.
The Fourier series of f is
$$3 + \sum_{n=1}^{\infty}\frac{9}{n\pi}\left[\frac{1}{n\pi}\cos\left(\frac{2n\pi x}{3}\right) - \sin\left(\frac{2n\pi x}{3}\right)\right]. \qquad (14.23)$$
We can think of this as the Fourier series of f on the symmetric interval [−3/2, 3/2] about the origin. By the Fourier convergence theorem, this series converges to
$$\begin{cases} \frac{1}{2}\left(\frac{9}{4} + \frac{9}{4}\right) = \frac{9}{4} & \text{for } x = \pm\frac{3}{2} \\[4pt] \frac{9}{2} & \text{for } x = 0 \\[4pt] (x + 3)^2 & \text{for } -\frac{3}{2} < x < 0 \\[4pt] x^2 & \text{for } 0 < x < \frac{3}{2}. \end{cases}$$
For the phase angle, or harmonic, form of this Fourier series, compute
$$c_n = \sqrt{a_n^2 + b_n^2} = \frac{9}{n^2\pi^2}\sqrt{1 + n^2\pi^2} \quad\text{for } n = 1, 2, \ldots$$
and
$$\delta_n = \tan^{-1}\left(\frac{-\left(-9/(n\pi)\right)}{9/(n^2\pi^2)}\right) = \tan^{-1}(n\pi).$$
Since ω₀ = 2π/3, the phase angle form of the series (14.23) is
$$3 + \sum_{n=1}^{\infty}\frac{9}{n^2\pi^2}\sqrt{1 + n^2\pi^2}\cos\left(\frac{2n\pi x}{3} + \tan^{-1}(n\pi)\right).$$
The amplitude spectrum of a periodic function f is a plot of values of nω₀ on the horizontal axis versus cₙ/2 on the vertical axis, for n = 1, 2, …. Thus the amplitude spectrum consists of the points (nω₀, cₙ/2) for n = 1, 2, …. It is also common to include the point (0, a₀/2) on the vertical axis. Figure 14.34 shows the amplitude spectrum for the function of Example 14.24, consisting of the point (0, 3) and, for n = 1, 2, …, the points
$$\left(\frac{2n\pi}{3},\; \frac{9}{2n^2\pi^2}\sqrt{1 + n^2\pi^2}\right).$$
This graph allows us to envision the magnitudes of the harmonics of which the periodic function is composed and clarifies which harmonics dominate in the function. This is useful in signal analysis, in which the function is the signal.

FIGURE 14.34 Amplitude spectrum of the function of Figure 14.33.
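The passage from (aₙ, bₙ) to the harmonic amplitude cₙ and phase angle δₙ is mechanical. A short sketch (ours) using the coefficients of Example 14.24:

```python
import math

def harmonic(n):
    # harmonic amplitude and phase angle from a_n = 9/(n^2 pi^2), b_n = -9/(n pi)
    an = 9 / (n ** 2 * math.pi ** 2)
    bn = -9 / (n * math.pi)
    cn = math.hypot(an, bn)        # c_n = sqrt(a_n^2 + b_n^2)
    delta = math.atan(-bn / an)    # valid here because a_n > 0
    return cn, delta

for n in range(1, 4):
    cn, delta = harmonic(n)
    closed = 9 / (n ** 2 * math.pi ** 2) * math.sqrt(1 + n ** 2 * math.pi ** 2)
    print(n, cn, closed, delta, math.atan(n * math.pi))
```

Note that `atan` alone suffices only when aₙ > 0; in general the signs of aₙ and bₙ must be inspected (or `atan2` used) to place δₙ in the correct quadrant.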
SECTION 14.6
PROBLEMS
1. Let f and g have period p. Show that αf + βg has period p for any constants α and β.
2. Let f have period p and let α and β be positive constants. Show that g(t) = f(αt) has period p/α, and that h(t) = f(t/β) has period βp.
3. Let f(x) be differentiable and have period p. Show that f′(x) has period p.
4. Suppose f has period p. Show that, for any real number α,
$$\int_{\alpha}^{\alpha+p} f(x)\,dx = \int_0^p f(x)\,dx = \int_{-p/2}^{p/2} f(x)\,dx.$$
In each of Problems 5 through 9, find the phase angle form of the Fourier series of the function. Plot some points of the amplitude spectrum of the function.
5. f(x) = x for 0 ≤ x < 2, and f(x + 2) = f(x) for all x.
6. f(x) = 1 for 0 ≤ x < 1, f(x) = 0 for 1 ≤ x < 2, and f(x + 2) = f(x) for all x.
7. f(x) = 3x² for 0 ≤ x < 4, and f(x + 4) = f(x) for all x.
8. f(x) = 1 + x for 0 ≤ x < 3, f(x) = 2 for 3 ≤ x < 4, and f(x + 4) = f(x) for all x.
9. f(x) = cos(πx) for 0 ≤ x < 1, and f(x) = f(x + 1) for all x.
In each of Problems 10 through 14, find the phase angle form of the Fourier series of the function, part of whose graph is given in the indicated diagram. Plot some points of the amplitude spectrum of the function.
10. Figure 14.35
11. Figure 14.36
12. Figure 14.37
13. Figure 14.38
14. Figure 14.39
15. Determine the Fourier series representation of the steady-state current in the circuit of Figure 14.40 if
$$E(t) = 100t\left(\pi^2 - t^2\right) \text{ for } -\pi \le t < \pi, \quad E(t + 2\pi) = E(t) \text{ for all } t.$$
16. Determine the Fourier series representation of the steady-state current in the circuit shown in Figure 14.41 if E(t) = 10|sin(800πt)|. Hint: First show that
$$E(t) = \frac{20}{\pi}\left[1 - 2\sum_{n=1}^{\infty}\frac{\cos(1600n\pi t)}{4n^2 - 1}\right].$$

FIGURE 14.35 Graph for Problem 10.
FIGURE 14.36 Graph for Problem 11.
FIGURE 14.37 Graph for Problem 12.
FIGURE 14.38 Graph for Problem 13.
FIGURE 14.39 Graph for Problem 14.
FIGURE 14.40 Circuit for Problem 15.
FIGURE 14.41 Circuit for Problem 16.
14.7 Complex Fourier Series and the Frequency Spectrum

It is often convenient to work in terms of complex numbers, even when the quantities of interest are real. For example, electrical engineers often use equations involving complex quantities to compute currents, realizing that at the end the current is the real part of a certain complex expression. We will cast Fourier series in this setting. Later, complex Fourier series and their coefficients will provide a natural starting point for the development of discrete Fourier transforms.
14.7.1 Review of Complex Numbers

Given a complex number a + bi, its conjugate is $\overline{a + bi} = a - bi$. If we identify a + bi with the point (a, b) in the plane, then a − bi is (a, −b), the reflection of (a, b) across the horizontal (real) axis (Figure 14.42). The conjugate of a product is the product of the conjugates: $\overline{zw} = \bar{z}\,\bar{w}$ for any complex numbers z and w. The magnitude, or modulus, of a + bi is $|a + bi| = \sqrt{a^2 + b^2}$, the distance from the origin to (a, b). It is useful to observe that
$$(a + bi)\overline{(a + bi)} = a^2 + b^2 = |a + bi|^2.$$
If we denote the complex number as z, this equation is $z\bar{z} = |z|^2$. Introduce polar coordinates x = r cos(θ), y = r sin(θ) to write
$$z = x + iy = r\left[\cos(\theta) + i\sin(\theta)\right] = re^{i\theta}$$
by Euler's formula. Then r = |z| and θ is called an argument of z. It is the angle between the positive x axis and the line from the origin to the point (x, y), or x + iy, in the plane (Figure 14.43). The argument is
FIGURE 14.42 Complex conjugate as a reflection across the horizontal axis.
FIGURE 14.43 Polar form of a complex number.
FIGURE 14.44 Polar form of 2 + 2i.
determined to within integer multiples of 2π. For example, $|2 + 2i| = \sqrt{8}$, and the arguments of 2 + 2i are the angles π/4 + 2nπ, with n any integer (Figure 14.44). Thus we can write
$$2 + 2i = \sqrt{8}\,e^{i\pi/4}.$$
This is the polar form of 2 + 2i. We can actually write $2 + 2i = \sqrt{8}\,e^{i(\pi/4 + 2n\pi)}$, but this doesn't contribute anything new to the polar form of 2 + 2i, since $e^{i(\pi/4 + 2n\pi)} = e^{i\pi/4}e^{2n\pi i}$ and
$$e^{2n\pi i} = \cos(2n\pi) + i\sin(2n\pi) = 1.$$
If we use Euler's formula twice, we can write
$$e^{ix} = \cos(x) + i\sin(x) \quad\text{and}\quad e^{-ix} = \cos(x) - i\sin(x).$$
Solve these equations for sin(x) and cos(x) to write
$$\cos(x) = \frac{1}{2}\left(e^{ix} + e^{-ix}\right) \quad\text{and}\quad \sin(x) = \frac{1}{2i}\left(e^{ix} - e^{-ix}\right). \qquad (14.24)$$
Finally, we will use the fact that, if x is a real number, then $\overline{e^{ix}} = e^{-ix}$. This is true because
$$\overline{e^{ix}} = \overline{\cos(x) + i\sin(x)} = \cos(x) - i\sin(x) = e^{-ix}.$$
14.7.2
Complex Fourier Series
We will use these ideas to formulate the Fourier series of a function in complex terms. Let f be a real-valued, periodic function with fundamental period p. Assume that f is integrable on [−p/2, p/2]. As we did with the phase angle form of a Fourier series, write the Fourier series of f(x) on this interval as
$$\frac{1}{2}a_0 + \sum_{n=1}^{\infty}\left[a_n\cos\left(n\omega_0 x\right) + b_n\sin\left(n\omega_0 x\right)\right],$$
with ω₀ = 2π/p. Use equations (14.24) to write this series as
$$\frac{1}{2}a_0 + \sum_{n=1}^{\infty}\left[a_n\cdot\frac{1}{2}\left(e^{in\omega_0 x} + e^{-in\omega_0 x}\right) + b_n\cdot\frac{1}{2i}\left(e^{in\omega_0 x} - e^{-in\omega_0 x}\right)\right]$$
$$= \frac{1}{2}a_0 + \sum_{n=1}^{\infty}\left[\frac{1}{2}\left(a_n - ib_n\right)e^{in\omega_0 x} + \frac{1}{2}\left(a_n + ib_n\right)e^{-in\omega_0 x}\right]. \qquad (14.25)$$
In the series (14.25), let
$$d_0 = \frac{1}{2}a_0$$
and, for each positive integer n,
$$d_n = \frac{1}{2}\left(a_n - ib_n\right).$$
Then the series (14.25) becomes
$$d_0 + \sum_{n=1}^{\infty}\left(d_n e^{in\omega_0 x} + \overline{d_n}\,e^{-in\omega_0 x}\right) = d_0 + \sum_{n=1}^{\infty} d_n e^{in\omega_0 x} + \sum_{n=1}^{\infty}\overline{d_n}\,e^{-in\omega_0 x}. \qquad (14.26)$$
Now consider the coefficients. First,
$$d_0 = \frac{1}{2}a_0 = \frac{1}{p}\int_{-p/2}^{p/2} f(t)\,dt.$$
And, for n = 1, 2, …,
$$d_n = \frac{1}{2}\left(a_n - ib_n\right) = \frac{1}{2}\cdot\frac{2}{p}\int_{-p/2}^{p/2} f(t)\cos\left(n\omega_0 t\right)dt - \frac{i}{2}\cdot\frac{2}{p}\int_{-p/2}^{p/2} f(t)\sin\left(n\omega_0 t\right)dt$$
$$= \frac{1}{p}\int_{-p/2}^{p/2} f(t)\left[\cos\left(n\omega_0 t\right) - i\sin\left(n\omega_0 t\right)\right]dt = \frac{1}{p}\int_{-p/2}^{p/2} f(t)e^{-in\omega_0 t}\,dt.$$
Then
$$\overline{d_n} = \frac{1}{p}\int_{-p/2}^{p/2}\overline{f(t)e^{-in\omega_0 t}}\,dt = \frac{1}{p}\int_{-p/2}^{p/2} f(t)e^{in\omega_0 t}\,dt = d_{-n}.$$
Put these results into the series (14.26) to get
$$d_0 + \sum_{n=1}^{\infty} d_n e^{in\omega_0 x} + \sum_{n=1}^{\infty} d_{-n} e^{-in\omega_0 x} = d_0 + \sum_{\substack{n=-\infty \\ n\ne 0}}^{\infty} d_n e^{in\omega_0 x} = \sum_{n=-\infty}^{\infty} d_n e^{in\omega_0 x}.$$
We have reached this expression by rearranging terms in the Fourier series of a periodic function f . This suggests the following definition.
DEFINITION 14.8
Complex Fourier Series
Let f have fundamental period p, and let ω₀ = 2π/p. Then the complex Fourier series of f is
$$\sum_{n=-\infty}^{\infty} d_n e^{in\omega_0 x},$$
where
$$d_n = \frac{1}{p}\int_{-p/2}^{p/2} f(t)e^{-in\omega_0 t}\,dt \quad\text{for } n = 0, \pm 1, \pm 2, \ldots$$
The numbers dₙ are the complex Fourier coefficients of f.
In the formula for dₙ, the integration can actually be carried out over any interval of length p, because of the periodicity of f. Thus, for any real number α,
$$d_n = \frac{1}{p}\int_{\alpha}^{\alpha+p} f(t)e^{-in\omega_0 t}\,dt.$$
Since the complex Fourier series is just another way of writing the Fourier series, the convergence theorems (14.1) and (14.2) apply without any adjustments.
THEOREM 14.10
Let f be periodic with fundamental period p, and let f be piecewise smooth on [−p/2, p/2]. Then at each x the complex Fourier series converges to
$$\frac{1}{2}\left[f(x+) + f(x-)\right].$$
The amplitude spectrum of the complex Fourier series of a periodic function is a graph of the points (nω₀, |dₙ|), in which |dₙ| is the magnitude of the complex coefficient dₙ. Sometimes this amplitude spectrum is referred to as a frequency spectrum.
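The coefficient formula lends itself to direct numerical approximation. The sketch below is ours (the midpoint rule and the choice of the square wave from Example 14.23 are our own); it approximates dₙ and compares with the values implied by dₙ = ½(aₙ − ibₙ):

```python
import cmath
import math

def d(n, f, p, M=20000):
    # midpoint-rule approximation of d_n = (1/p) * integral over one period of f(t) e^{-i n w0 t} dt
    w0 = 2 * math.pi / p
    dt = p / M
    total = 0j
    for k in range(M):
        t = -p / 2 + (k + 0.5) * dt
        total += f(t) * cmath.exp(-1j * n * w0 * t)
    return total * dt / p

# square wave of Example 14.23: 0 on [-3, 0), 1 on [0, 3)
f = lambda t: 1.0 if t >= 0 else 0.0
print(d(0, f, 6.0))   # expect a value near 1/2
print(d(1, f, 6.0))   # expect a value near -i/pi, since a_1 = 0 and b_1 = 2/pi
```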
EXAMPLE 14.25
We will compute the complex Fourier series of the full-wave rectification of E sin(ωt), in which E and ω are positive constants. This means that we want the complex Fourier series of E|sin(ωt)|, whose graph is shown in Figure 14.45. This function has fundamental period π/ω (even though E sin(ωt) has period 2π/ω). In this example, ω₀ = 2π/(π/ω) = 2ω. The complex Fourier coefficients are
$$d_n = \frac{\omega}{\pi}\int_0^{\pi/\omega} E\left|\sin(\omega t)\right|e^{-2ni\omega t}\,dt = \frac{E\omega}{\pi}\int_0^{\pi/\omega}\sin(\omega t)\,e^{-2ni\omega t}\,dt.$$
FIGURE 14.45 Graph of E|sin(ωt)|.
When n = 0 we get
$$d_0 = \frac{E\omega}{\pi}\int_0^{\pi/\omega}\sin(\omega t)\,dt = \frac{2E}{\pi}.$$
When n ≠ 0, the integration is simplified by putting the sine term in exponential form:
$$d_n = \frac{E\omega}{\pi}\int_0^{\pi/\omega}\frac{1}{2i}\left(e^{i\omega t} - e^{-i\omega t}\right)e^{-2ni\omega t}\,dt$$
$$= \frac{E\omega}{2\pi i}\int_0^{\pi/\omega} e^{(1-2n)i\omega t}\,dt - \frac{E\omega}{2\pi i}\int_0^{\pi/\omega} e^{-(1+2n)i\omega t}\,dt$$
$$= \frac{E\omega}{2\pi i}\left[\frac{e^{(1-2n)i\omega t}}{(1-2n)i\omega}\right]_0^{\pi/\omega} + \frac{E\omega}{2\pi i}\left[\frac{e^{-(1+2n)i\omega t}}{(1+2n)i\omega}\right]_0^{\pi/\omega}$$
$$= -\frac{E}{2\pi}\left[\frac{e^{(1-2n)\pi i}}{1-2n} - \frac{1}{1-2n} + \frac{e^{-(1+2n)\pi i}}{1+2n} - \frac{1}{1+2n}\right].$$
Now
$$e^{(1-2n)\pi i} = \cos\left((1-2n)\pi\right) + i\sin\left((1-2n)\pi\right) = (-1)^{1-2n} = -1$$
and
$$e^{-(1+2n)\pi i} = \cos\left((1+2n)\pi\right) - i\sin\left((1+2n)\pi\right) = (-1)^{1+2n} = -1.$$
Therefore
$$d_n = -\frac{E}{2\pi}\left[\frac{-1}{1-2n} - \frac{1}{1-2n} + \frac{-1}{1+2n} - \frac{1}{1+2n}\right] = -\frac{2E}{\pi}\cdot\frac{1}{4n^2 - 1}.$$
When n = 0 this yields the correct value for d₀ as well. The complex Fourier series of E|sin(ωt)| is
$$-\frac{2E}{\pi}\sum_{n=-\infty}^{\infty}\frac{1}{4n^2 - 1}e^{2ni\omega t}.$$
The amplitude spectrum is a plot of the points
$$\left(2n\omega,\; \frac{2E}{\pi\left|4n^2 - 1\right|}\right).$$
Part of this plot is shown in Figure 14.46.
FIGURE 14.46 Amplitude spectrum of E|sin(ωt)|.
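These coefficients can be checked numerically; the short verification below is our addition, taking E = 1 and ω = 1 for convenience:

```python
import cmath
import math

E, w = 1.0, 1.0
p = math.pi / w            # fundamental period of E|sin(wt)|

def d(n, M=20000):
    # midpoint-rule approximation of the n-th complex Fourier coefficient
    dt = p / M
    total = 0j
    for k in range(M):
        t = (k + 0.5) * dt
        total += E * abs(math.sin(w * t)) * cmath.exp(-2j * n * w * t)
    return total * dt / p

for n in range(3):
    print(n, d(n), -2 * E / (math.pi * (4 * n ** 2 - 1)))
```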
SECTION 14.7
PROBLEMS
In each of Problems 1 through 7, write the complex Fourier series of f, determine what this series converges to, and plot some points of the frequency spectrum. Keep in mind that, in specifying a function of period p, it is sufficient to define f(x) on any interval of length p.
1. f has period 3 and f(x) = 2x for 0 ≤ x < 3
2. f has period 2 and f(x) = x² for 0 ≤ x < 2
3. f has period 4 and f(x) = 0 for 0 ≤ x < 1; 1 for 1 ≤ x < 4
4. f has period 6 and f(x) = 1 − x for 0 ≤ x < 6
5. f has period 4 and f(x) = −1 for 0 ≤ x < 2; 2 for 2 ≤ x < 4
6. f has period 5 and f(x) = e^{−x} for 0 ≤ x < 5
7. f has period 2 and f(x) = x for 0 ≤ x < 1; 2 − x for 1 ≤ x < 2
8. Let f be the periodic function, part of whose graph is shown in Figure 14.47. Find the complex Fourier series of f and plot some points of its frequency spectrum.
FIGURE 14.47
9. The graphs of Figures 14.48 and 14.49 define two periodic functions f and g, respectively. Calculate the complex Fourier series of each function. Determine a relationship between the frequency spectra of these functions, and also between their phase spectra.
Problem 9 involves the phase spectrum of f, which is a plot of the points (nω₀, δₙ) for n = 0, 1, 2, …. Here δₙ = tan⁻¹(−bₙ/aₙ) is the nth phase angle of f.

FIGURE 14.48
FIGURE 14.49
CHAPTER 15
The Fourier Integral and Fourier Transforms

THE FOURIER INTEGRAL · FOURIER COSINE AND SINE INTEGRALS · COMPLEX FOURIER INTEGRAL AND THE FOURIER TRANSFORM · ADDITIONAL PROPERTIES AND APPLICATIONS OF THE FOURIER TRANSFORM

15.1 The Fourier Integral

If f(x) is defined on an interval [−L, L], we may be able to represent it, at least at "most" points of this interval, by a Fourier series. If f is periodic, then we may be able to represent it by a Fourier series on intervals along the entire real line. Now suppose f(x) is defined for all x, but is not periodic. Then we cannot represent f(x) by a Fourier series over the entire line. However, we may still be able to write a representation in terms of sines and cosines, using an integral instead of a summation. To see how this might be done, suppose f is absolutely integrable, which means that $\int_{-\infty}^{\infty}|f(x)|\,dx$ converges, and that f is piecewise smooth on every interval [−L, L]. Write the Fourier series of f on an arbitrary interval [−L, L], with the integral formulas for the coefficients included:
$$\frac{1}{2L}\int_{-L}^{L} f(\xi)\,d\xi + \sum_{n=1}^{\infty}\left[\left(\frac{1}{L}\int_{-L}^{L} f(\xi)\cos\left(\frac{n\pi\xi}{L}\right)d\xi\right)\cos\left(\frac{n\pi x}{L}\right) + \left(\frac{1}{L}\int_{-L}^{L} f(\xi)\sin\left(\frac{n\pi\xi}{L}\right)d\xi\right)\sin\left(\frac{n\pi x}{L}\right)\right].$$
We want to let L → ∞ to obtain a representation of f(x) over the whole line. To see what limit, if any, this Fourier series approaches, let
$$\omega_n = \frac{n\pi}{L} \quad\text{and}\quad \omega_n - \omega_{n-1} = \frac{\pi}{L} = \Delta\omega.$$
Then the Fourier series on [−L, L] can be written
$$\frac{1}{2L}\int_{-L}^{L} f(\xi)\,d\xi + \frac{1}{\pi}\sum_{n=1}^{\infty}\left[\left(\int_{-L}^{L} f(\xi)\cos\left(\omega_n\xi\right)d\xi\right)\cos\left(\omega_n x\right) + \left(\int_{-L}^{L} f(\xi)\sin\left(\omega_n\xi\right)d\xi\right)\sin\left(\omega_n x\right)\right]\Delta\omega, \qquad (15.1)$$
since 1/L = Δω/π. Now let L → ∞, causing Δω → 0. In the last expression,
$$\frac{1}{2L}\int_{-L}^{L} f(\xi)\,d\xi \to 0,$$
because by assumption $\int_{-\infty}^{\infty}|f(\xi)|\,d\xi$ converges. The other terms in the expression (15.1) resemble a Riemann sum for a definite integral, and we assert that, in the limit as L → ∞ and Δω → 0, this expression approaches the limit
$$\frac{1}{\pi}\int_0^{\infty}\left[\left(\int_{-\infty}^{\infty} f(\xi)\cos(\omega\xi)\,d\xi\right)\cos(\omega x) + \left(\int_{-\infty}^{\infty} f(\xi)\sin(\omega\xi)\,d\xi\right)\sin(\omega x)\right]d\omega.$$
This is the Fourier integral of f on the real line. Under the assumptions made about f, this integral converges to
$$\frac{1}{2}\left[f(x-) + f(x+)\right]$$
at each x. In particular, if f is continuous at x, then this integral converges to f(x). Often this Fourier integral is written
$$\int_0^{\infty}\left[A_\omega\cos(\omega x) + B_\omega\sin(\omega x)\right]d\omega, \qquad (15.2)$$
in which the Fourier integral coefficients of f are
$$A_\omega = \frac{1}{\pi}\int_{-\infty}^{\infty} f(\xi)\cos(\omega\xi)\,d\xi \quad\text{and}\quad B_\omega = \frac{1}{\pi}\int_{-\infty}^{\infty} f(\xi)\sin(\omega\xi)\,d\xi.$$
This Fourier integral representation of f(x) is entirely analogous to a Fourier series on an interval, with $\int_0^\infty \cdots\,d\omega$ replacing $\sum_{n=1}^\infty$, and having integral formulas for the coefficients. These coefficients are functions of ω, which is the integration variable in the Fourier integral (15.2).
EXAMPLE 15.1

Let

    f(x) = 1 for -1 \le x \le 1;   f(x) = 0 for |x| > 1.

FIGURE 15.1  Graph of f(x).

Figure 15.1 is a graph of f. Certainly f is piecewise smooth, and \int_{-\infty}^{\infty}|f(x)|\,dx converges. The Fourier integral coefficients of f are

    A_\omega = \frac{1}{\pi}\int_{-1}^{1}\cos(\omega\xi)\,d\xi = \frac{2\sin\omega}{\pi\omega}   and   B_\omega = \frac{1}{\pi}\int_{-1}^{1}\sin(\omega\xi)\,d\xi = 0.

The Fourier integral of f is

    \frac{2}{\pi}\int_{0}^{\infty}\frac{\sin\omega}{\omega}\cos(\omega x)\,d\omega.

Because f is piecewise smooth, this converges to \frac{1}{2}[f(x+) + f(x-)] for all x. More explicitly,

    \frac{2}{\pi}\int_{0}^{\infty}\frac{\sin\omega}{\omega}\cos(\omega x)\,d\omega =
      1 for -1 < x < 1;   \frac{1}{2} for x = \pm 1;   0 for |x| > 1.

There is another expression for the Fourier integral of a function that we will sometimes find convenient. Write

    \int_{0}^{\infty}\left[A_\omega\cos(\omega x) + B_\omega\sin(\omega x)\right]d\omega
      = \int_{0}^{\infty}\left[\frac{1}{\pi}\left(\int_{-\infty}^{\infty} f(\xi)\cos(\omega\xi)\,d\xi\right)\cos(\omega x)
      + \frac{1}{\pi}\left(\int_{-\infty}^{\infty} f(\xi)\sin(\omega\xi)\,d\xi\right)\sin(\omega x)\right]d\omega
      = \frac{1}{\pi}\int_{0}^{\infty}\int_{-\infty}^{\infty} f(\xi)\left[\cos(\omega\xi)\cos(\omega x) + \sin(\omega\xi)\sin(\omega x)\right]d\xi\,d\omega
      = \frac{1}{\pi}\int_{0}^{\infty}\int_{-\infty}^{\infty} f(\xi)\cos(\omega(\xi - x))\,d\xi\,d\omega.   (15.3)

Of course, this integral has the same convergence properties as the integral expression (15.2), since it is just a rearrangement of that integral.
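The convergence claims in Example 15.1 can be spot-checked numerically. The following minimal sketch (the rewriting via the sine integral Si, the cutoff, and the tolerances are choices made here for illustration) truncates the Fourier integral of the pulse at a large frequency, using the identity \sin\omega\cos(\omega x) = \frac12[\sin((1+x)\omega) + \sin((1-x)\omega)] so the truncated integral reduces to values of Si.

```python
import numpy as np
from scipy.special import sici

def fourier_integral_pulse(x, cutoff=1e6):
    # Truncated Fourier integral (2/pi) * Int_0^cutoff sin(w) cos(w x)/w dw
    # for the pulse f(x) = 1 on [-1, 1], 0 elsewhere (Example 15.1).
    # Rewriting sin(w)cos(wx) = (1/2)[sin((1+x)w) + sin((1-x)w)] gives
    # (1/pi) [Si((1+x)*cutoff) + Si((1-x)*cutoff)]; Si is odd, so negative
    # arguments are handled correctly.
    si1, _ = sici((1.0 + x) * cutoff)
    si2, _ = sici((1.0 - x) * cutoff)
    return (si1 + si2) / np.pi

# Converges to f(x) where f is continuous, and to 1/2 at the jump x = 1.
print(fourier_integral_pulse(0.0), fourier_integral_pulse(1.0), fourier_integral_pulse(2.0))
```

As the cutoff grows, the three printed values approach 1, 1/2, and 0, matching the pointwise limits stated above.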
SECTION 15.1   PROBLEMS
In each of Problems 1 through 10, expand the function in a Fourier integral and determine what this integral converges to.

1. f(x) = x for -\pi \le x \le \pi; f(x) = 0 for |x| > \pi
2. f(x) = k for -10 \le x \le 10; f(x) = 0 for |x| > 10
3. f(x) = -1 for -\pi \le x \le 0; f(x) = 1 for 0 < x \le \pi; f(x) = 0 for |x| > \pi
4. f(x) = \sin x for -4 \le x \le 0; f(x) = \cos x for 0 < x \le 4; f(x) = 0 for |x| > 4
5. f(x) = x^2 for -100 \le x \le 100; f(x) = 0 for |x| > 100
6. f(x) = x for -\pi \le x \le 2; f(x) = 0 for x < -\pi and for x > 2
7. f(x) = \sin x for -3 \le x \le \pi; f(x) = 0 for x < -3 and for x > \pi
8. f(x) = 1/2 for -5 \le x < 1; f(x) = 1 for 1 \le x \le 5; f(x) = 0 for |x| > 5
9. f(x) = e^{-|x|}
10. f(x) = xe^{-4|x|}
11. Show that the Fourier integral of f can be written

    \lim_{\omega\to\infty}\frac{1}{\pi}\int_{-\infty}^{\infty} f(t)\,\frac{\sin(\omega(t-x))}{t-x}\,dt.
15.2 Fourier Cosine and Sine Integrals

If f is piecewise smooth on the half-line [0, \infty), and \int_0^\infty |f(\xi)|\,d\xi converges, then we can write a Fourier cosine or sine integral for f that is completely analogous to the sine and cosine expansions of a function on an interval [0, L]. To write a cosine integral, extend f to an even function f_e defined on the whole line by setting

    f_e(x) = f(x) for x \ge 0;   f_e(x) = f(-x) for x < 0.

This reflects the graph for x \ge 0 back across the vertical axis. Since f_e is an even function, its Fourier integral has only cosine terms. Since f_e(x) = f(x) for x \ge 0, this cosine integral can be defined to be the Fourier cosine integral of f on [0, \infty). The coefficient A_\omega of f_e in its Fourier integral expansion is

    \frac{1}{\pi}\int_{-\infty}^{\infty} f_e(\xi)\cos(\omega\xi)\,d\xi,

and this is

    \frac{2}{\pi}\int_{0}^{\infty} f(\xi)\cos(\omega\xi)\,d\xi.

This suggests the following definition.
DEFINITION 15.1   Fourier Cosine Integral

Let f be defined on [0, \infty) and let \int_0^\infty |f(\xi)|\,d\xi converge. The Fourier cosine integral of f is

    \int_0^\infty A_\omega\cos(\omega x)\,d\omega,

in which

    A_\omega = \frac{2}{\pi}\int_0^\infty f(\xi)\cos(\omega\xi)\,d\xi.

By applying the convergence theorem to the integral expansion of f_e, we find that, if f is piecewise smooth on each interval [0, L], then its cosine integral expansion converges to \frac12[f(x+) + f(x-)] for each x > 0, and to f(0+) for x = 0. In particular, at any positive x at which f is continuous, the cosine integral converges to f(x).

By extending f to an odd function f_o, similar to what we did with series, we obtain a Fourier integral for f_o containing only sine terms. Since f_o(x) = f(x) for x \ge 0, this gives a sine integral for f on [0, \infty).

DEFINITION 15.2   Fourier Sine Integral

Let f be defined on [0, \infty) and let \int_0^\infty |f(\xi)|\,d\xi converge. The Fourier sine integral of f is

    \int_0^\infty B_\omega\sin(\omega x)\,d\omega,

in which

    B_\omega = \frac{2}{\pi}\int_0^\infty f(\xi)\sin(\omega\xi)\,d\xi.

If f is piecewise smooth on every interval [0, L], then this integral converges to \frac12[f(x+) + f(x-)] on (0, \infty). As with Fourier sine series on a bounded interval, this Fourier sine integral converges to 0 at x = 0.
EXAMPLE 15.2   Laplace's Integrals

Let f(x) = e^{-kx} for x \ge 0, with k a positive constant. Then f is continuously differentiable on any interval [0, L], and

    \int_0^\infty e^{-kx}\,dx = \frac{1}{k}.

For the Fourier cosine integral, compute the coefficients

    A_\omega = \frac{2}{\pi}\int_0^\infty e^{-k\xi}\cos(\omega\xi)\,d\xi = \frac{2}{\pi}\,\frac{k}{k^2+\omega^2}.

The Fourier cosine integral representation of f converges to e^{-kx} for x \ge 0:

    e^{-kx} = \frac{2k}{\pi}\int_0^\infty \frac{\cos(\omega x)}{k^2+\omega^2}\,d\omega.

For the sine integral, compute

    B_\omega = \frac{2}{\pi}\int_0^\infty e^{-k\xi}\sin(\omega\xi)\,d\xi = \frac{2}{\pi}\,\frac{\omega}{k^2+\omega^2}.

The sine integral converges to e^{-kx} for x > 0 and to 0 for x = 0:

    e^{-kx} = \frac{2}{\pi}\int_0^\infty \frac{\omega\sin(\omega x)}{k^2+\omega^2}\,d\omega   for x > 0.

These integral representations are called Laplace's integrals, because A_\omega is 2/\pi times the Laplace transform of \sin(kx), while B_\omega is 2/\pi times the Laplace transform of \cos(kx).
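Laplace's integrals can be verified numerically. The sketch below is an illustration only; the sample values of k and x are arbitrary, and it relies on SciPy's `quad` with the `weight='cos'`/`weight='sin'` options, which implement oscillatory (Fourier-type) quadrature on semi-infinite intervals.

```python
import numpy as np
from scipy.integrate import quad

def laplace_cosine_integral(x, k):
    # (2k/pi) * Int_0^inf cos(w x)/(k^2 + w^2) dw, expected to equal e^{-kx}, x >= 0.
    val, _ = quad(lambda w: 1.0 / (k**2 + w**2), 0, np.inf, weight='cos', wvar=x)
    return 2 * k / np.pi * val

def laplace_sine_integral(x, k):
    # (2/pi) * Int_0^inf w sin(w x)/(k^2 + w^2) dw, expected to equal e^{-kx}, x > 0.
    val, _ = quad(lambda w: w / (k**2 + w**2), 0, np.inf, weight='sin', wvar=x)
    return 2 / np.pi * val

print(laplace_cosine_integral(1.5, 2.0), laplace_sine_integral(1.5, 2.0), np.exp(-3.0))
```

Both quadratures should agree with e^{-kx} to several digits, illustrating the stated convergence for x > 0.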
SECTION 15.2   PROBLEMS

In each of Problems 1 through 10, find the Fourier sine integral and Fourier cosine integral representations of the function. Determine to what each integral converges.

1. f(x) = x^2 for 0 \le x \le 10; f(x) = 0 for x > 10
2. f(x) = \sin x for 0 \le x \le 2\pi; f(x) = 0 for x > 2\pi
3. f(x) = 1 for 0 \le x \le 1; f(x) = 2 for 1 < x < 4; f(x) = 0 for x > 4
4. f(x) = \cosh x for 0 \le x \le 5; f(x) = 0 for x > 5
5. f(x) = 2x+1 for 0 \le x \le \pi; f(x) = 2 for \pi < x \le 3\pi; f(x) = 0 for x > 3\pi
6. f(x) = x for 0 \le x \le 1; f(x) = x+1 for 1 < x \le 2; f(x) = 0 for x > 2
7. f(x) = e^{-x}\cos x for x \ge 0
8. f(x) = xe^{-3x} for x \ge 0
9. f(x) = k for 0 \le x \le c; f(x) = 0 for x > c, in which k is constant and c is a positive constant
10. f(x) = e^{-2x}\cos x for x \ge 0
11. Use the Laplace integrals to compute the Fourier cosine integral of f(x) = 1/(1+x^2) and the Fourier sine integral of g(x) = x/(1+x^2).
15.3 The Complex Fourier Integral and the Fourier Transform

It is sometimes convenient to have a complex form of the Fourier integral. This complex setting will prove a natural platform from which to develop the Fourier transform.

Suppose f is piecewise smooth on each interval [-L, L], and that \int_{-\infty}^{\infty}|f(x)|\,dx converges. Then, at any x,

    \frac12\left[f(x+) + f(x-)\right] = \frac{1}{\pi}\int_0^\infty\int_{-\infty}^{\infty} f(\xi)\cos(\omega(\xi-x))\,d\xi\,d\omega
by the expression (15.3). Insert the complex exponential form of the cosine function into this expression to write

    \frac12\left[f(x+) + f(x-)\right]
      = \frac{1}{\pi}\int_0^\infty\int_{-\infty}^{\infty} f(\xi)\,\frac12\left[e^{i\omega(\xi-x)} + e^{-i\omega(\xi-x)}\right]d\xi\,d\omega
      = \frac{1}{2\pi}\int_0^\infty\int_{-\infty}^{\infty} f(\xi)e^{i\omega(\xi-x)}\,d\xi\,d\omega
      + \frac{1}{2\pi}\int_0^\infty\int_{-\infty}^{\infty} f(\xi)e^{-i\omega(\xi-x)}\,d\xi\,d\omega.

In the first integral on the last line, put \omega = -w to get

    \frac12\left[f(x+) + f(x-)\right]
      = \frac{1}{2\pi}\int_{-\infty}^{0}\int_{-\infty}^{\infty} f(\xi)e^{-iw(\xi-x)}\,d\xi\,dw
      + \frac{1}{2\pi}\int_0^\infty\int_{-\infty}^{\infty} f(\xi)e^{-i\omega(\xi-x)}\,d\xi\,d\omega.

Now write the variable of integration in the next-to-last integral as \omega again and combine these two integrals to write

    \frac12\left[f(x+) + f(x-)\right] = \frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(\xi)e^{-i\omega(\xi-x)}\,d\xi\,d\omega.   (15.4)

This is the complex Fourier integral representation of f on the real line. If we let

    C_\omega = \int_{-\infty}^{\infty} f(t)e^{-i\omega t}\,dt,

then this integral is

    \frac12\left[f(x+) + f(x-)\right] = \frac{1}{2\pi}\int_{-\infty}^{\infty} C_\omega\,e^{i\omega x}\,d\omega.

We call C_\omega the complex Fourier integral coefficient of f.
EXAMPLE 15.3

Let f(x) = e^{-a|x|} for all real x, with a a positive constant. We will compute the complex Fourier integral representation of f. First, we have

    f(x) = e^{-ax} for x \ge 0;   f(x) = e^{ax} for x < 0.

Further,

    \int_{-\infty}^{\infty}|f(x)|\,dx = \int_{-\infty}^{0} e^{ax}\,dx + \int_{0}^{\infty} e^{-ax}\,dx = \frac{2}{a}.

Now compute

    C_\omega = \int_{-\infty}^{\infty} e^{-a|t|}e^{-i\omega t}\,dt
      = \int_{-\infty}^{0} e^{at}e^{-i\omega t}\,dt + \int_{0}^{\infty} e^{-at}e^{-i\omega t}\,dt
      = \int_{-\infty}^{0} e^{(a-i\omega)t}\,dt + \int_{0}^{\infty} e^{-(a+i\omega)t}\,dt
      = \left[\frac{1}{a-i\omega}e^{(a-i\omega)t}\right]_{-\infty}^{0} + \left[\frac{-1}{a+i\omega}e^{-(a+i\omega)t}\right]_{0}^{\infty}
      = \frac{1}{a-i\omega} + \frac{1}{a+i\omega} = \frac{2a}{a^2+\omega^2}.
The complex Fourier integral representation of f is

    e^{-a|x|} = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{a}{a^2+\omega^2}\,e^{i\omega x}\,d\omega.

The expression on the right side of equation (15.4) leads naturally into the Fourier transform. To emphasize a certain term, write equation (15.4) as

    \frac12\left[f(x+) + f(x-)\right] = \frac{1}{2\pi}\int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty} f(\xi)e^{-i\omega\xi}\,d\xi\right)e^{i\omega x}\,d\omega.   (15.5)

The term in parentheses is what we will call the Fourier transform of f.
DEFINITION 15.3   Fourier Transform

Suppose \int_{-\infty}^{\infty}|f(x)|\,dx converges. Then the Fourier transform of f is defined to be the function

    \mathcal{F}[f](\omega) = \int_{-\infty}^{\infty} f(t)e^{-i\omega t}\,dt.

Thus the Fourier transform of f is the coefficient C_\omega in the complex Fourier integral representation of f. \mathcal{F} turns a function f into a new function called \mathcal{F}[f]. Because the transform is used in signal analysis, we will often use t (for time) as the variable with f, and \omega as the variable of the transformed function \mathcal{F}[f]. The value of the function \mathcal{F}[f] at \omega is \mathcal{F}[f](\omega), and this number is computed for a given \omega by evaluating the integral \int_{-\infty}^{\infty} f(t)e^{-i\omega t}\,dt. If we want to keep the variable t before our attention, we sometimes write \mathcal{F}[f] as \mathcal{F}[f(t)].

Engineers refer to the variable \omega in the transformed function as the frequency of the signal f. Later we will discuss how the Fourier transform, and a truncated version called the windowed Fourier transform, are used to determine information about the frequency content of a signal.

Because the symbol \mathcal{F}[f(t)](\omega) may be clumsy to use in calculations, we sometimes write the Fourier transform of f as \hat{f}. In this notation,

    \mathcal{F}[f](\omega) = \hat{f}(\omega).
EXAMPLE 15.4

Let a be a positive constant. Then

    \mathcal{F}\left[e^{-a|t|}\right](\omega) = \frac{2a}{a^2+\omega^2}.

This follows immediately from Example 15.3, where we calculated the Fourier integral coefficient C_\omega of e^{-a|t|}. This coefficient is the Fourier transform of f.
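The closed form in Example 15.4 can be compared against a direct numerical integration. This is only a sketch; the sample values of a and \omega below are arbitrary, and it uses the fact that e^{-a|t|} is even, so the transform reduces to a real cosine integral.

```python
import numpy as np
from scipy.integrate import quad

def ft_two_sided_exp(omega, a):
    # Numerical Fourier transform of f(t) = e^{-a|t|}: the imaginary part of
    # Int e^{-a|t|} e^{-i w t} dt vanishes by symmetry, leaving
    # 2 * Int_0^inf e^{-a t} cos(w t) dt.
    val, _ = quad(lambda t: np.exp(-a * t) * np.cos(omega * t), 0, np.inf)
    return 2 * val

a = 1.3
for w in [0.0, 0.7, 3.0]:
    print(w, ft_two_sided_exp(w, a), 2 * a / (a**2 + w**2))
```

The two printed columns agree, matching \mathcal{F}[e^{-a|t|}](\omega) = 2a/(a^2+\omega^2).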
EXAMPLE 15.5

Let a and k be positive numbers, and let

    f(t) = k for -a \le t < a;   f(t) = 0 for t < -a and for t \ge a.

This pulse function can be written in terms of the Heaviside function as f(t) = k[H(t+a) - H(t-a)] and is graphed in Figure 15.2. The Fourier transform of f is

    \hat{f}(\omega) = \int_{-\infty}^{\infty} f(t)e^{-i\omega t}\,dt = \int_{-a}^{a} ke^{-i\omega t}\,dt
      = \left[-\frac{k}{i\omega}e^{-i\omega t}\right]_{-a}^{a}
      = -\frac{k}{i\omega}\left(e^{-ia\omega} - e^{ia\omega}\right) = \frac{2k}{\omega}\sin(a\omega).

FIGURE 15.2  Pulse function f(t) = k[H(t+a) - H(t-a)].

Again, we can also write

    \mathcal{F}[f](\omega) = \frac{2k\sin(a\omega)}{\omega}   or   \mathcal{F}[f(t)](\omega) = \frac{2k\sin(a\omega)}{\omega}.
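The pulse transform of Example 15.5 is easy to confirm numerically; the minimal sketch below (the values of k, a, and the test frequencies are arbitrary choices) integrates the real and imaginary parts separately over the finite support.

```python
import numpy as np
from scipy.integrate import quad

def ft_pulse(omega, k, a):
    # Fourier transform of the pulse f(t) = k on [-a, a): Int_{-a}^{a} k e^{-i w t} dt,
    # computed as real part (cosine) plus i times imaginary part (-sine).
    re, _ = quad(lambda t: k * np.cos(omega * t), -a, a)
    im, _ = quad(lambda t: -k * np.sin(omega * t), -a, a)
    return complex(re, im)

k, a = 6.0, 2.0
for w in [0.5, 1.1, 3.0]:
    print(w, ft_pulse(w, k, a), 2 * k * np.sin(a * w) / w)
```

The imaginary part comes out (numerically) zero because the pulse is even, and the real part matches 2k\sin(a\omega)/\omega.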
In view of equation (15.5), the Fourier integral representation of f is

    \frac{1}{2\pi}\int_{-\infty}^{\infty}\hat{f}(\omega)e^{i\omega t}\,d\omega.

If f is continuous, and f' is piecewise continuous on every interval [-L, L], then the Fourier integral of f represents f:

    f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\hat{f}(\omega)e^{i\omega t}\,d\omega.   (15.6)

We can therefore use equation (15.6) as an inverse Fourier transform, retrieving f from \hat{f}. This is important because, in applications, we use the Fourier transform to change a problem involving f from one form to another, presumably easier one, which is solved for \hat{f}. We must then have some way of getting back to the f(t) that we want, and equation (15.6) is the vehicle that is often used. We write \mathcal{F}^{-1}[\hat{f}] = f if \mathcal{F}[f] = \hat{f}. As we expect of any integral transform, \mathcal{F} is linear:

    \mathcal{F}[\alpha f + \beta g] = \alpha\mathcal{F}[f] + \beta\mathcal{F}[g].
The integral defining the transform, and the integral (15.6) giving its inverse, are said to constitute a transform pair for the Fourier transform. Under certain conditions on f,

    \hat{f}(\omega) = \int_{-\infty}^{\infty} f(t)e^{-i\omega t}\,dt   and   f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\hat{f}(\omega)e^{i\omega t}\,d\omega.
EXAMPLE 15.6

Let

    f(t) = 1-|t| for -1 \le t \le 1;   f(t) = 0 for |t| > 1.

Then f is continuous and absolutely integrable, and f' is piecewise continuous. Compute

    \hat{f}(\omega) = \int_{-\infty}^{\infty} f(t)e^{-i\omega t}\,dt = \int_{-1}^{1}(1-|t|)e^{-i\omega t}\,dt = \frac{2(1-\cos\omega)}{\omega^2}.

This is the Fourier coefficient C_\omega in the complex Fourier expansion of f(t). If we want to go the other way, then by equation (15.6),

    f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\hat{f}(\omega)e^{i\omega t}\,d\omega
      = \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{2(1-\cos\omega)}{\omega^2}\,e^{i\omega t}\,d\omega.

We can verify this by explicitly carrying out this integration. A software package yields

    \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{2(1-\cos\omega)}{\omega^2}\,e^{i\omega t}\,d\omega
      = \frac12\left[t\,\mathrm{signum}(t+1) + \mathrm{signum}(t+1) + t\,\mathrm{signum}(t-1) - \mathrm{signum}(t-1) - 2t\,\mathrm{signum}(t)\right],

in which

    \mathrm{signum}(\alpha) = 1 for \alpha > 0;   0 for \alpha = 0;   -1 for \alpha < 0.

This expression is equal to 1-|t| for -1 \le t \le 1, and to 0 for |t| > 1, verifying the integral for the inverse.

In the context of the Fourier transform, the amplitude spectrum is often taken to be a graph of |\hat{f}(\omega)|. This is in the same spirit as the use of this term in connection with the Fourier integral.
EXAMPLE 15.7

If f(t) = H(t)e^{-at}, then \hat{f}(\omega) = 1/(a+i\omega), so

    |\hat{f}(\omega)| = \frac{1}{\sqrt{a^2+\omega^2}}.

Figure 15.3 shows a graph of |\hat{f}(\omega)|. This graph is the amplitude spectrum of f.

FIGURE 15.3  Graph of |\hat{f}(\omega)| = 1/\sqrt{a^2+\omega^2}, with f(t) = H(t)e^{-at}.

FIGURE 15.4  Graph of |\hat{f}(\omega)| = 2k|\sin(a\omega)/\omega|.
EXAMPLE 15.8

The amplitude spectrum of the function f of Example 15.5 is a graph of

    |\hat{f}(\omega)| = 2k\left|\frac{\sin(a\omega)}{\omega}\right|,

shown in Figure 15.4.

We will now develop some of the important properties and computational rules for the Fourier transform. With each rule there is also an inverse transform version, which we will also state. Throughout, we assume that \int_{-\infty}^{\infty}|f(t)|\,dt converges and, for the inverse version, that f is continuous and f' piecewise continuous on each [-L, L].

THEOREM 15.1   Time Shifting
If t_0 is a real number, then

    \mathcal{F}[f(t-t_0)](\omega) = e^{-i\omega t_0}\hat{f}(\omega).

That is, if we shift time back t_0 units and replace f(t) by f(t-t_0), then the Fourier transform of this shifted function is the Fourier transform of f, multiplied by the exponential factor e^{-i\omega t_0}.

Proof

    \mathcal{F}[f(t-t_0)](\omega) = \int_{-\infty}^{\infty} f(t-t_0)e^{-i\omega t}\,dt
      = e^{-i\omega t_0}\int_{-\infty}^{\infty} f(t-t_0)e^{-i\omega(t-t_0)}\,dt.

Let u = t-t_0 to write

    \mathcal{F}[f(t-t_0)](\omega) = e^{-i\omega t_0}\int_{-\infty}^{\infty} f(u)e^{-i\omega u}\,du = e^{-i\omega t_0}\hat{f}(\omega).
EXAMPLE 15.9

Suppose we want the Fourier transform of the pulse of amplitude 6 that turns on at time 3 and off at time 7. This is the function

    g(t) = 6 for 3 \le t < 7;   g(t) = 0 for t < 3 and for t \ge 7,

shown in Figure 15.5. We can certainly compute \hat{g} by integration. But we can also observe that the midpoint of the pulse (that is, of the nonzero part) occurs when t = 5. Shift the graph 5 units to the left to center the pulse at zero (Figure 15.6). Calling this shifted pulse f, then f(t) = g(t+5). Shifting f five units to the right again just gets us back to g:

    g(t) = f(t-5).

The point of this is that we already know the Fourier transform of f from Example 15.5 (with k = 6 and a = 2):

    \mathcal{F}[f](\omega) = \frac{12\sin(2\omega)}{\omega}.

By the time-shifting theorem,

    \mathcal{F}[g](\omega) = \mathcal{F}[f(t-5)](\omega) = 12e^{-5i\omega}\,\frac{\sin(2\omega)}{\omega}.
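The time-shifted transform in Example 15.9 can be checked by direct integration of the unshifted-looking definition. This sketch is illustrative only; the test frequencies are arbitrary.

```python
import numpy as np
from scipy.integrate import quad

def ft_shifted_pulse(omega):
    # Direct transform of g(t) = 6 on [3, 7): Int_3^7 6 e^{-i w t} dt,
    # computed as real and imaginary parts.
    re, _ = quad(lambda t: 6 * np.cos(omega * t), 3, 7)
    im, _ = quad(lambda t: -6 * np.sin(omega * t), 3, 7)
    return complex(re, im)

for w in [0.4, 1.0, 2.3]:
    predicted = 12 * np.exp(-5j * w) * np.sin(2 * w) / w
    print(w, ft_shifted_pulse(w), predicted)
```

Both columns agree, confirming \mathcal{F}[g](\omega) = 12 e^{-5i\omega}\sin(2\omega)/\omega.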
The inverse version of the time-shifting theorem is

    \mathcal{F}^{-1}\left[e^{-i\omega t_0}F(\omega)\right](t) = f(t-t_0).   (15.7)
EXAMPLE 15.10

Suppose we want

    \mathcal{F}^{-1}\left[\frac{e^{2i\omega}}{5+i\omega}\right].
FIGURE 15.5  g(t) = 6 for 3 \le t < 7; g(t) = 0 for t < 3 and for t \ge 7.

FIGURE 15.6  The function of Figure 15.5 shifted five units to the left.

FIGURE 15.7  Graph of H(t+2)e^{-5(t+2)}.
The presence of the exponential factor suggests the inverse version of the time-shifting theorem. Put t_0 = -2 in equation (15.7) to write

    \mathcal{F}^{-1}\left[\frac{e^{2i\omega}}{5+i\omega}\right] = f(t-(-2)) = f(t+2),

where

    f(t) = \mathcal{F}^{-1}\left[\frac{1}{5+i\omega}\right](t) = H(t)e^{-5t}.

Therefore

    \mathcal{F}^{-1}\left[\frac{e^{2i\omega}}{5+i\omega}\right](t) = f(t+2) = H(t+2)e^{-5(t+2)}.

A graph of this function is shown in Figure 15.7.

The next result is reminiscent of the first shifting theorem for the Laplace transform (Theorem 3.7).

THEOREM 15.2   Frequency Shifting
If \omega_0 is any real number, then

    \mathcal{F}\left[e^{i\omega_0 t}f(t)\right](\omega) = \hat{f}(\omega-\omega_0).

Proof

    \mathcal{F}\left[e^{i\omega_0 t}f(t)\right](\omega) = \int_{-\infty}^{\infty} e^{i\omega_0 t}f(t)e^{-i\omega t}\,dt
      = \int_{-\infty}^{\infty} f(t)e^{-i(\omega-\omega_0)t}\,dt = \hat{f}(\omega-\omega_0).

The inverse version of the frequency-shifting theorem is

    \mathcal{F}^{-1}\left[\hat{f}(\omega-\omega_0)\right](t) = e^{i\omega_0 t}f(t).
THEOREM 15.3   Scaling

If a is a nonzero real number, then

    \mathcal{F}[f(at)](\omega) = \frac{1}{|a|}\,\hat{f}(\omega/a).

This can be proved by a straightforward calculation proceeding from the definition. The inverse transform version of this result is

    \mathcal{F}^{-1}\left[\hat{f}(\omega/a)\right](t) = |a|\,f(at).

This conclusion is called a scaling theorem because we want the transform not of f(t), but of f(at), in which a can be thought of as a scaling factor. The theorem says that we can compute the transform of the scaled function by replacing \omega by \omega/a in the transform of the original function, and dividing by the magnitude of the scaling factor.
EXAMPLE 15.11

We know from Example 15.6 that, if

    f(t) = 1-|t| for -1 \le t \le 1;   f(t) = 0 for |t| > 1,

then

    \hat{f}(\omega) = \frac{2(1-\cos\omega)}{\omega^2}.

Let

    g(t) = f(7t) = 1-|7t| for -\frac17 \le t \le \frac17;   g(t) = 0 for |t| > \frac17.

Then

    \hat{g}(\omega) = \mathcal{F}[f(7t)](\omega) = \frac17\,\hat{f}(\omega/7)
      = \frac17\,\frac{2(1-\cos(\omega/7))}{(\omega/7)^2} = \frac{14(1-\cos(\omega/7))}{\omega^2}.
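The scaling computation in Example 15.11 can be confirmed numerically. The sketch below is only an illustration (the test frequency is arbitrary); it exploits the evenness of the scaled triangle so the transform is a real cosine integral.

```python
import numpy as np
from scipy.integrate import quad

def ft_scaled_triangle(omega, c):
    # Transform of f(c t), where f(t) = 1 - |t| on [-1, 1]; the scaled function
    # is even with support [-1/c, 1/c], so the transform is
    # 2 * Int_0^{1/c} (1 - c t) cos(omega t) dt.
    val, _ = quad(lambda t: (1 - c * t) * np.cos(omega * t), 0, 1.0 / c)
    return 2 * val

f_hat = lambda w: 2 * (1 - np.cos(w)) / w**2
w = 2.5
# Scaling theorem with a = 7: F[f(7t)](w) = (1/7) f_hat(w/7) = 14(1 - cos(w/7))/w^2
print(ft_scaled_triangle(w, 7.0), f_hat(w / 7.0) / 7.0, 14 * (1 - np.cos(w / 7.0)) / w**2)
```

All three printed values coincide, as the scaling theorem predicts.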
THEOREM 15.4   Time Reversal
    \mathcal{F}[f(-t)](\omega) = \hat{f}(-\omega).

This result is called time reversal because we replace t by -t in f(t) to get f(-t). The transform of this new function is obtained by simply replacing \omega by -\omega in the transform of f(t). This conclusion follows immediately from the scaling theorem by putting a = -1. The inverse version of time reversal is

    \mathcal{F}^{-1}\left[\hat{f}(-\omega)\right](t) = f(-t).

THEOREM 15.5   Symmetry
    \mathcal{F}[\hat{f}(t)](\omega) = 2\pi f(-\omega).

To understand this conclusion, begin with f(t) and take its Fourier transform \hat{f}(\omega). Replace \omega by t and take the transform of the function \hat{f}(t). The symmetry property of the Fourier transform states that the transform of \hat{f}(t) is just the original function f(t) with t replaced by -\omega, and then this new function multiplied by 2\pi.
EXAMPLE 15.12

Let

    f(t) = 4-t^2 for -2 \le t \le 2;   f(t) = 0 for |t| > 2.

FIGURE 15.8  f(t) = 4-t^2 for -2 \le t \le 2; f(t) = 0 for |t| > 2.

Figure 15.8 shows a graph of f. The Fourier transform of f is

    \hat{f}(\omega) = \int_{-\infty}^{\infty} f(t)e^{-i\omega t}\,dt = \int_{-2}^{2}(4-t^2)e^{-i\omega t}\,dt
      = 4\,\frac{\sin(2\omega) - 2\omega\cos(2\omega)}{\omega^3}.

In this example, f(-t) = f(t), so exchanging -\omega for \omega should not make any difference in \hat{f}(\omega), and we can see that this is indeed the case.
THEOREM 15.6   Modulation

If \omega_0 is a real number, then

    \mathcal{F}[f(t)\cos(\omega_0 t)](\omega) = \frac12\left[\hat{f}(\omega+\omega_0) + \hat{f}(\omega-\omega_0)\right]

and

    \mathcal{F}[f(t)\sin(\omega_0 t)](\omega) = \frac{i}{2}\left[\hat{f}(\omega+\omega_0) - \hat{f}(\omega-\omega_0)\right].

Proof  Put \cos(\omega_0 t) = \frac12\left(e^{i\omega_0 t} + e^{-i\omega_0 t}\right) and use the linearity of \mathcal{F} and the frequency-shifting theorem to get

    \mathcal{F}[f(t)\cos(\omega_0 t)](\omega) = \mathcal{F}\left[\frac12 e^{i\omega_0 t}f(t) + \frac12 e^{-i\omega_0 t}f(t)\right](\omega)
      = \frac12\mathcal{F}\left[e^{i\omega_0 t}f(t)\right](\omega) + \frac12\mathcal{F}\left[e^{-i\omega_0 t}f(t)\right](\omega)
      = \frac12\hat{f}(\omega-\omega_0) + \frac12\hat{f}(\omega+\omega_0).

The second conclusion is proved similarly, using \sin(\omega_0 t) = \frac{1}{2i}\left(e^{i\omega_0 t} - e^{-i\omega_0 t}\right).
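The modulation rule can be illustrated numerically on a function whose transform is known. The sketch below is not part of the original text; it applies the rule to f(t) = e^{-|t|} (so \hat{f}(\omega) = 2/(1+\omega^2), from Example 15.4 with a = 1), with arbitrary sample values of \omega and \omega_0.

```python
import numpy as np
from scipy.integrate import quad

# Known transform of f(t) = e^{-|t|}
f_hat = lambda w: 2.0 / (1.0 + w**2)

def ft_modulated(omega, w0):
    # f(t) cos(w0 t) is even, so its transform is real:
    # 2 * Int_0^inf e^{-t} cos(w0 t) cos(omega t) dt.
    val, _ = quad(lambda t: np.exp(-t) * np.cos(w0 * t) * np.cos(omega * t),
                  0, np.inf)
    return 2 * val

w, w0 = 1.7, 3.0
print(ft_modulated(w, w0), 0.5 * (f_hat(w + w0) + f_hat(w - w0)))
```

The direct integral matches \frac12[\hat{f}(\omega+\omega_0) + \hat{f}(\omega-\omega_0)], as Theorem 15.6 asserts.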
SECTION 15.3   PROBLEMS

In each of Problems 1 through 8, find the complex Fourier integral of the function and determine what this integral converges to.

1. f(x) = xe^{-|x|}
2. f(x) = 1-|x| for -1 \le x \le 1; f(x) = 0 for |x| > 1
3. f(x) = \sin x for -5 \le x \le 5; f(x) = 0 for |x| > 5
4. f(x) = |x| for -2 \le x \le 2; f(x) = 0 for |x| > 2
5. f(x) = x for -1 \le x \le 1; f(x) = e^{-|x|} for |x| > 1
6. f(x) = 1 for 0 \le x \le k; f(x) = -1 for -k \le x < 0; f(x) = 0 for |x| > k, in which k is a positive constant
7. f(x) = \cos x for 0 \le x \le \pi/2; f(x) = \sin x for -\pi/2 \le x < 0; f(x) = 0 for |x| > \pi/2
8. f(x) = x^2 e^{-3|x|}

In each of Problems 9 through 18, find the Fourier transform of the function and graph the amplitude spectrum. Wherever k appears, it is a positive constant. For some problems, one or more theorems from this section can be used in conjunction with the following transforms, which can be assumed:

    \mathcal{F}\left[e^{-a|t|}\right](\omega) = \frac{2a}{a^2+\omega^2},\quad
    \mathcal{F}\left[e^{-at^2}\right](\omega) = \sqrt{\frac{\pi}{a}}\,e^{-\omega^2/4a},\quad and \quad
    \mathcal{F}\left[\frac{1}{a^2+t^2}\right](\omega) = \frac{\pi}{a}\,e^{-a|\omega|}.

9. f(t) = 1 for 0 \le t \le 1; f(t) = -1 for -1 \le t < 0; f(t) = 0 for |t| > 1
10. f(t) = \sin t for -k \le t \le k; f(t) = 0 for |t| > k
11. f(t) = 5[H(t-3) - H(t-11)]
12. f(t) = 5e^{-3(t-5)^2}
13. f(t) = H(t-k)e^{-t/4}
14. f(t) = H(t-k)t^2
15. f(t) = 1/(1+t^2)
16. f(t) = 3H(t-2)e^{-3t}
17. f(t) = 3e^{-4|t+2|}
18. f(t) = H(t-3)e^{-2t}

In each of Problems 19 through 24, find the inverse Fourier transform of the function.

19. 9e^{-(\omega+4)^2/32}
20. \frac{e^{20-4i\omega}}{3-5-i\omega}
21. \frac{e^{2-6i\omega}}{5-3-i\omega}
22. \frac{10\sin(3\omega)}{\omega(1+i\omega)}
23. \frac{1}{6-\omega^2+5i\omega}   Hint: Factor the denominator and use partial fractions.
24. \frac{10(4+i\omega)}{9-\omega^2+8i\omega}
15.4 Additional Properties and Applications of the Fourier Transform

15.4.1 The Fourier Transform of a Derivative

In using the Fourier transform to solve differential equations, we need an expression relating the transform of f' to that of f. The following theorem provides such a relationship for derivatives of any order, and is called the operational rule for the Fourier transform. A similar issue arises for any integral transform when it is to be used in connection with differential equations (as in Theorems 3.5 and 3.6 for the Laplace transform).
Recall that the kth derivative of f is denoted f^{(k)}. As a convenience we may let k = 0 in this symbol, with the understanding that f^{(0)} = f.

THEOREM 15.7   Differentiation in the Time Variable

Let n be a positive integer. Suppose f^{(n-1)} is continuous, and f^{(n)} is piecewise continuous on each interval [-L, L]. Suppose \int_{-\infty}^{\infty}|f^{(n-1)}(t)|\,dt converges. Suppose

    \lim_{t\to\infty} f^{(k)}(t) = \lim_{t\to-\infty} f^{(k)}(t) = 0

for k = 0, 1, \ldots, n-1. Then

    \mathcal{F}\left[f^{(n)}(t)\right](\omega) = (i\omega)^n\hat{f}(\omega).

Proof  Begin with the first derivative. Integrating by parts, we have

    \mathcal{F}[f'](\omega) = \int_{-\infty}^{\infty} f'(t)e^{-i\omega t}\,dt
      = \left[f(t)e^{-i\omega t}\right]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} f(t)(-i\omega)e^{-i\omega t}\,dt.

Now e^{-i\omega t} = \cos(\omega t) - i\sin(\omega t) has magnitude 1, and by assumption

    \lim_{t\to\infty} f(t) = \lim_{t\to-\infty} f(t) = 0.

Therefore

    \mathcal{F}[f'](\omega) = i\omega\int_{-\infty}^{\infty} f(t)e^{-i\omega t}\,dt = i\omega\hat{f}(\omega).

The conclusion for higher derivatives follows by induction on n and the fact that

    f^{(n)}(t) = \frac{d}{dt}f^{(n-1)}(t).

The assumption that f is continuous in the operational rule can be relaxed to allow for a finite number of jump discontinuities, if we allow for these in the conclusion by adding appropriate terms. We will state this result for the transform of f'.
THEOREM 15.8

Suppose f is continuous on the real line, except for jump discontinuities at t_1, \ldots, t_M. Let f' be piecewise continuous on every [-L, L]. Assume that \int_{-\infty}^{\infty}|f(t)|\,dt converges, and that

    \lim_{t\to\infty} f(t) = \lim_{t\to-\infty} f(t) = 0.

Then

    \mathcal{F}[f'](\omega) = i\omega\hat{f}(\omega) - \sum_{j=1}^{M}\left[f(t_j+) - f(t_j-)\right]e^{-i\omega t_j}.
FIGURE 15.9  The function f has a jump discontinuity at t_j.

Each term f(t_j+) - f(t_j-) is the difference between the one-sided limits of f(t) at the jump discontinuity t_j. This shows up in Figure 15.9 as the size of the jump between the ends of the graph at this point.

Proof  Suppose first that f has a single jump discontinuity, at t_1. In the event of more jump discontinuities, the argument proceeds along the same lines, but includes more of the type of calculation we are about to do. Integrate by parts:

    \mathcal{F}[f'](\omega) = \int_{-\infty}^{\infty} f'(t)e^{-i\omega t}\,dt
      = \int_{-\infty}^{t_1} f'(t)e^{-i\omega t}\,dt + \int_{t_1}^{\infty} f'(t)e^{-i\omega t}\,dt
      = \left[f(t)e^{-i\omega t}\right]_{-\infty}^{t_1-} - \int_{-\infty}^{t_1} f(t)(-i\omega)e^{-i\omega t}\,dt
        + \left[f(t)e^{-i\omega t}\right]_{t_1+}^{\infty} - \int_{t_1}^{\infty} f(t)(-i\omega)e^{-i\omega t}\,dt
      = f(t_1-)e^{-i\omega t_1} - f(t_1+)e^{-i\omega t_1} + i\omega\int_{-\infty}^{\infty} f(t)e^{-i\omega t}\,dt
      = i\omega\hat{f}(\omega) - \left[f(t_1+) - f(t_1-)\right]e^{-i\omega t_1}.

Here is an example of the use of the operational rule in solving a differential equation.
EXAMPLE 15.13

Solve

    y' - 4y = H(t)e^{-4t},

in which H is the Heaviside function. Thus the differential equation is

    y' - 4y = e^{-4t} for t \ge 0;   y' - 4y = 0 for t < 0.

Apply the Fourier transform to the differential equation to get

    \mathcal{F}[y'](\omega) - 4\hat{y}(\omega) = \mathcal{F}\left[H(t)e^{-4t}\right](\omega).

Using Theorem 15.7 and the fact that \mathcal{F}\left[H(t)e^{-4t}\right](\omega) = \frac{1}{4+i\omega}, write this equation as

    i\omega\hat{y}(\omega) - 4\hat{y}(\omega) = \frac{1}{4+i\omega}.

Solve for \hat{y} to obtain

    \hat{y}(\omega) = \frac{1}{(i\omega-4)(4+i\omega)} = \frac{-1}{16+\omega^2}.

The solution is

    y(t) = \mathcal{F}^{-1}\left[\frac{-1}{16+\omega^2}\right](t) = -\frac18\,e^{-4|t|},

which is graphed in Figure 15.10.

FIGURE 15.10  y(t) = -\frac18 e^{-4|t|}.

The inverse transform just obtained can be derived in several ways. We can use a table of Fourier transforms, or a software package that contains this transform. We can also see from Example 15.4 that

    \mathcal{F}\left[e^{-a|t|}\right](\omega) = \frac{2a}{a^2+\omega^2}

and choose a = 4. There is no arbitrary constant in this solution because the Fourier transform has returned the only solution that is continuous and bounded for all real t. Boundedness is assumed when we use the transform because of the required convergence of \int_{-\infty}^{\infty}|y(t)|\,dt.
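The solution found in Example 15.13 can be checked by substituting it back into the differential equation. A minimal sketch (the sample points and the finite-difference step are arbitrary choices) verifies y' - 4y = H(t)e^{-4t} away from the corner at t = 0:

```python
import numpy as np

# Candidate solution and right-hand side of y' - 4y = H(t) e^{-4t}
y = lambda t: -np.exp(-4 * abs(t)) / 8.0
rhs = lambda t: np.exp(-4 * t) if t >= 0 else 0.0

h = 1e-6  # step for a centered finite-difference approximation of y'
for t in [-2.0, -0.5, 0.5, 2.0]:
    y_prime = (y(t + h) - y(t - h)) / (2 * h)
    print(t, y_prime - 4 * y(t), rhs(t))
```

For t > 0 the residual y' - 4y reproduces e^{-4t}, and for t < 0 it is zero, so the single bounded solution returned by the transform does satisfy the equation on both half-lines.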
15.4.2 Frequency Differentiation

The variable \omega used for the Fourier transform is the frequency of f(t), since it occurs in the complex exponential e^{i\omega t}, which is \cos(\omega t) + i\sin(\omega t). In this context, differentiation of \hat{f} with respect to \omega is called frequency differentiation. We will now relate derivatives of \hat{f} and f(t).
THEOREM 15.9   Frequency Differentiation

Let n be a positive integer. Let f be piecewise continuous on [-L, L] for every positive number L, and assume that \int_{-\infty}^{\infty}|t^n f(t)|\,dt converges. Then

    \mathcal{F}\left[t^n f(t)\right](\omega) = i^n\,\frac{d^n}{d\omega^n}\hat{f}(\omega).

In particular, under the conditions of the theorem,

    \mathcal{F}[t f(t)](\omega) = i\,\frac{d}{d\omega}\hat{f}(\omega)   and   \mathcal{F}\left[t^2 f(t)\right](\omega) = -\frac{d^2}{d\omega^2}\hat{f}(\omega).

Proof  We will prove the theorem for n = 1. The argument for larger n is similar. Apply Leibniz's rule for differentiation under the integral to write

    \frac{d}{d\omega}\hat{f}(\omega) = \frac{d}{d\omega}\int_{-\infty}^{\infty} f(t)e^{-i\omega t}\,dt
      = \int_{-\infty}^{\infty} f(t)(-it)e^{-i\omega t}\,dt
      = -i\int_{-\infty}^{\infty} t f(t)e^{-i\omega t}\,dt = -i\,\mathcal{F}[t f(t)](\omega),

so \mathcal{F}[t f(t)](\omega) = i\,\frac{d}{d\omega}\hat{f}(\omega).
EXAMPLE 15.14

Suppose we want to compute \mathcal{F}\left[t^2 e^{-5|t|}\right]. Recall from Example 15.4 that

    \mathcal{F}\left[e^{-5|t|}\right](\omega) = \frac{10}{25+\omega^2}.

By the frequency differentiation theorem,

    \mathcal{F}\left[t^2 e^{-5|t|}\right](\omega) = i^2\,\frac{d^2}{d\omega^2}\left[\frac{10}{25+\omega^2}\right]
      = \frac{20(25-3\omega^2)}{(25+\omega^2)^3}.
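The result of Example 15.14 can be checked against a direct numerical transform. This is a sketch only; the test frequencies are arbitrary, and the code uses the evenness of t^2 e^{-5|t|} to reduce the transform to a cosine integral.

```python
import numpy as np
from scipy.integrate import quad

def ft_t2_exp(omega):
    # Transform of t^2 e^{-5|t|}; the function is even, so this equals
    # 2 * Int_0^inf t^2 e^{-5t} cos(omega t) dt.
    val, _ = quad(lambda t: t**2 * np.exp(-5 * t) * np.cos(omega * t), 0, np.inf)
    return 2 * val

for w in [0.0, 1.0, 4.0]:
    print(w, ft_t2_exp(w), 20 * (25 - 3 * w**2) / (25 + w**2)**3)
```

Both columns agree; in particular at \omega = 0 both give 4/125, the value of 2\int_0^\infty t^2 e^{-5t}\,dt.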
15.4.3 The Fourier Transform of an Integral

The following theorem enables us to take the transform of a function defined by an integral.

THEOREM 15.10

Let f be piecewise continuous on every interval [-L, L]. Suppose \int_{-\infty}^{\infty}|f(t)|\,dt converges. Suppose \hat{f}(0) = 0. Then

    \mathcal{F}\left[\int_{-\infty}^{t} f(\xi)\,d\xi\right](\omega) = \frac{1}{i\omega}\,\hat{f}(\omega).

Proof  Let g(t) = \int_{-\infty}^{t} f(\xi)\,d\xi. Then g'(t) = f(t) for any t at which f is continuous, and g(t) \to 0 as t \to -\infty. Further,

    \lim_{t\to\infty} g(t) = \int_{-\infty}^{\infty} f(\xi)\,d\xi = \hat{f}(0) = 0.

We can therefore apply Theorem 15.7 to g to obtain

    \hat{f}(\omega) = \mathcal{F}[f(t)](\omega) = \mathcal{F}[g'(t)](\omega) = i\omega\,\mathcal{F}[g(t)](\omega)
      = i\omega\,\mathcal{F}\left[\int_{-\infty}^{t} f(\xi)\,d\xi\right](\omega).

This is equivalent to the conclusion to be proved.
15.4.4 Convolution

There are many transforms defined by integrals, and it is common for such a transformation to have a convolution operation. We saw a convolution for the Laplace transform in Chapter 3. We will now discuss convolution for the Fourier transform.

DEFINITION 15.4   Convolution

Let f and g be functions defined on the real line. Then f has a convolution with g if

1. \int_a^b |f(t)|\,dt and \int_a^b |g(t)|\,dt exist for every interval [a, b], and
2. for every real number t,

    \int_{-\infty}^{\infty} f(t-\tau)g(\tau)\,d\tau

converges.

In this event, we define the convolution f * g of f with g to be the function given by

    (f * g)(t) = \int_{-\infty}^{\infty} f(t-\tau)g(\tau)\,d\tau.

In this definition, we wrote (f * g)(t) for emphasis. However, the convolution is a function denoted f * g, so we can write just f * g(t) to indicate f * g evaluated at t.

THEOREM 15.11
Suppose f has a convolution with g. Then

1. (Commutativity of Convolution) g has a convolution with f, and f * g = g * f.
2. (Linearity) If f and g both have convolutions with h, and \alpha and \beta are real numbers, then \alpha f + \beta g also has a convolution with h, and

    (\alpha f + \beta g) * h = \alpha(f * h) + \beta(g * h).

Proof  For (1), let z = t - \tau to write

    (f * g)(t) = \int_{-\infty}^{\infty} f(t-\tau)g(\tau)\,d\tau
      = \int_{\infty}^{-\infty} f(z)g(t-z)(-1)\,dz
      = \int_{-\infty}^{\infty} g(t-z)f(z)\,dz = (g * f)(t).

Conclusion (2) follows from elementary properties of integrals, given that the integrals involved converge.

We are now ready for the main results on convolution.

THEOREM 15.12
Suppose f and g are bounded and continuous on the real line, and that \int_{-\infty}^{\infty}|f(t)|\,dt and \int_{-\infty}^{\infty}|g(t)|\,dt both converge. Then:

1.  \int_{-\infty}^{\infty}(f * g)(t)\,dt = \left(\int_{-\infty}^{\infty} f(t)\,dt\right)\left(\int_{-\infty}^{\infty} g(t)\,dt\right).

2. (Time Convolution)   \mathcal{F}[f * g] = \hat{f}\,\hat{g}.

3. (Frequency Convolution)   \mathcal{F}[fg] = \frac{1}{2\pi}\,\hat{f} * \hat{g}.

The first conclusion is that the integral, over the real line, of the convolution of f with g is equal to the product of the integrals of f and of g over the line. Time convolution states that the Fourier transform of a convolution is the product of the transforms of the functions. This formula can be stated

    \mathcal{F}[f * g](\omega) = \hat{f}(\omega)\,\hat{g}(\omega).

That is, the Fourier transform of the convolution of f with g is equal to the product of the transform of f with the transform of g. This has the important inverse version

    \mathcal{F}^{-1}\left[\hat{f}\,\hat{g}\right](t) = (f * g)(t).

The inverse Fourier transform of the product of two transformed functions is equal to the convolution of these functions. This is sometimes of use in evaluating an inverse Fourier transform. If we want \mathcal{F}^{-1}[h], and are able to factor h into \hat{f}\,\hat{g}, a product of the transforms of two known functions, then the inverse transform of h is the convolution of these known functions.

Frequency convolution can be stated

    \mathcal{F}[f(t)g(t)](\omega) = \frac{1}{2\pi}\left(\hat{f} * \hat{g}\right)(\omega).

The Fourier transform of a product of two functions is equal to \frac{1}{2\pi} times the convolution of the transforms of these functions. The inverse version of frequency convolution is

    \mathcal{F}^{-1}\left[\hat{f} * \hat{g}\right](t) = 2\pi f(t)g(t).

Proof
For (1), write

    \int_{-\infty}^{\infty}(f * g)(t)\,dt = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(t-\tau)g(\tau)\,d\tau\,dt
      = \int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty} f(t-\tau)\,dt\right)g(\tau)\,d\tau,

assuming the validity of this interchange of the order of integration. Now,

    \int_{-\infty}^{\infty} f(t-\tau)\,dt = \int_{-\infty}^{\infty} f(t)\,dt

for any real number \tau. Therefore

    \int_{-\infty}^{\infty}(f * g)(t)\,dt = \int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty} f(t)\,dt\right)g(\tau)\,d\tau
      = \left(\int_{-\infty}^{\infty} f(t)\,dt\right)\int_{-\infty}^{\infty} g(\tau)\,d\tau
      = \left(\int_{-\infty}^{\infty} f(t)\,dt\right)\left(\int_{-\infty}^{\infty} g(t)\,dt\right).

For (2), begin by letting F(t) = e^{-i\omega t}f(t) and G(t) = e^{-i\omega t}g(t) for real t and \omega. Then

    \mathcal{F}[f * g](\omega) = \int_{-\infty}^{\infty}(f * g)(t)e^{-i\omega t}\,dt
      = \int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty} f(t-\tau)g(\tau)\,d\tau\right)e^{-i\omega t}\,dt
      = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-i\omega t}f(t-\tau)g(\tau)\,d\tau\,dt
      = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-i\omega(t-\tau)}f(t-\tau)\,e^{-i\omega\tau}g(\tau)\,d\tau\,dt.

Now recognize that the inner integral in the last line is the convolution of F with G. Then, by (1) of this theorem applied to F and G,

    \mathcal{F}[f * g](\omega) = \int_{-\infty}^{\infty}(F * G)(t)\,dt
      = \left(\int_{-\infty}^{\infty} F(t)\,dt\right)\left(\int_{-\infty}^{\infty} G(t)\,dt\right)
      = \left(\int_{-\infty}^{\infty} f(t)e^{-i\omega t}\,dt\right)\left(\int_{-\infty}^{\infty} g(t)e^{-i\omega t}\,dt\right)
      = \hat{f}(\omega)\,\hat{g}(\omega).

We leave conclusion (3) to the student.
EXAMPLE 15.15

Suppose we want to compute

    \mathcal{F}^{-1}\left[\frac{1}{(4+\omega^2)(9+\omega^2)}\right].

Recognize the problem as one of computing the inverse transform of a product of functions whose individual transforms we know:

    \frac{1}{4+\omega^2} = \frac14\,\frac{4}{4+\omega^2} = \hat{f}(\omega)   with   f(t) = \frac14\,e^{-2|t|},

and

    \frac{1}{9+\omega^2} = \frac16\,\frac{6}{9+\omega^2} = \hat{g}(\omega)   with   g(t) = \frac16\,e^{-3|t|}.

The inverse version of conclusion (2) tells us that

    \mathcal{F}^{-1}\left[\frac{1}{(4+\omega^2)(9+\omega^2)}\right](t) = \mathcal{F}^{-1}\left[\hat{f}\,\hat{g}\right](t) = (f * g)(t)
      = \frac14 e^{-2|t|} * \frac16 e^{-3|t|} = \frac{1}{24}\int_{-\infty}^{\infty} e^{-2|t-\tau|}e^{-3|\tau|}\,d\tau.

We must be careful in evaluating this integral because of the absolute values in the exponents. First, if t > 0, then

    24(f * g)(t) = \int_{-\infty}^{0} e^{-2(t-\tau)}e^{3\tau}\,d\tau
      + \int_{0}^{t} e^{-2(t-\tau)}e^{-3\tau}\,d\tau
      + \int_{t}^{\infty} e^{-2(\tau-t)}e^{-3\tau}\,d\tau
      = \frac65\,e^{-2t} - \frac45\,e^{-3t}.

If t < 0, then

    24(f * g)(t) = \int_{-\infty}^{t} e^{-2(t-\tau)}e^{3\tau}\,d\tau
      + \int_{t}^{0} e^{2(t-\tau)}e^{3\tau}\,d\tau
      + \int_{0}^{\infty} e^{2(t-\tau)}e^{-3\tau}\,d\tau
      = -\frac45\,e^{3t} + \frac65\,e^{2t}.

Finally, compute

    24(f * g)(0) = \int_{-\infty}^{\infty} e^{-2|\tau|}e^{-3|\tau|}\,d\tau = \frac25.

Therefore

    \mathcal{F}^{-1}\left[\frac{1}{(4+\omega^2)(9+\omega^2)}\right](t)
      = \frac{1}{24}\left(\frac65\,e^{-2|t|} - \frac45\,e^{-3|t|}\right)
      = \frac{1}{20}\,e^{-2|t|} - \frac{1}{30}\,e^{-3|t|}.
Ht + a − Ht − a 2a as shown in Figure 15.11, and take the limit as the width of the pulse approaches zero: 1
Ht + a − Ht − a 2a This is not a function in the standard sense, but is an object called a distribution. Distributions are generalizations of the function concept. For this reason many theorems do not apply to t. t = lim
a→0
y(t) 1 2a
a
a
FIGURE 15.11
y=
1
Ht + a − Ht − a. 2a
t
15.4 Additional Properties and Applications of the Fourier Transform
661
However, there are some formal manipulations that yield useful results. First, if we take the Fourier transform of the pulse, we get
a a 1 e−it dt = − e−it Ht + a − Ht − a = i −a −a =
sina 1 ia e − e−ia = 2 i
By interchanging the limit and the operation of taking the transform, we have
1 t = lim Ht + a − Ht − a a→0 2a 1 Ht + a − Ht − a a→0 2a sina = lim = 1 a→0 a = lim
This leads us to consider the Fourier transform of the delta function to be the function that is identically 1. Further, putting t formally through the convolution, we have ∗ f = f = f and f ∗ = f = f suggesting that ∗ f = f ∗ = f The delta function behaves like the identity under convolution. The following filtering property enables us to recover a function value by “summing” its values when hit with a shifted delta function. THEOREM 15.13
Filtering
If f has a Fourier transform and is continuous at t0 , then ftt − t0 dt = ft0 −
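The filtering property can be made concrete by replacing \delta with the narrow pulse used to define it. In the sketch below (the test function f and the pulse half-widths are arbitrary choices made here), averaging f against the pulse \frac{1}{2a}[H(t-t_0+a) - H(t-t_0-a)] approaches f(t_0) as a \to 0.

```python
import numpy as np
from scipy.integrate import quad

# Sample signal and sample point (arbitrary, for illustration)
f = lambda t: np.cos(t) * np.exp(-t**2)
t0 = 0.8

def pulse_average(a):
    # Int f(t) * (1/2a)[H(t - t0 + a) - H(t - t0 - a)] dt
    # = (1/2a) * Int_{t0-a}^{t0+a} f(t) dt
    val, _ = quad(f, t0 - a, t0 + a)
    return val / (2 * a)

for a in [0.5, 1e-2, 1e-4]:
    print(a, pulse_average(a), f(t0))
```

As the half-width a shrinks, the pulse average converges to f(t_0), which is exactly what the formal statement \int f(t)\delta(t-t_0)\,dt = f(t_0) encodes.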
This result can be modified to allow for a jump discontinuity of f at t_0. In this event we get

    \int_{-\infty}^{\infty} f(t)\,\delta(t-t_0)\,dt = \frac12\left[f(t_0+) + f(t_0-)\right].

15.4.6 The Windowed Fourier Transform
Suppose f is a signal. This means that f is a function that is defined over the real line and has finite energy \int_{-\infty}^{\infty}|f(t)|^2\,dt.
In analyzing f(t), we sometimes want to localize its frequency content with respect to the time variable. We have mentioned that \hat{f} carries information about the frequencies of the signal. However, \hat{f} does not particularize information to specific time intervals, since

    \hat{f}(\omega) = \int_{-\infty}^{\infty} f(t)e^{-i\omega t}\,dt

and this integration is over all time. Hence the picture we obtain does not contain information about specific times, but instead enables us only to compute the total amplitude spectrum |\hat{f}(\omega)|. If we think of f(t) as a piece of music being played over time, we would have to wait until the entire piece was done before even computing this amplitude spectrum.

However, we can obtain a picture of the frequency content of f(t) within given time intervals by windowing the function before taking its Fourier transform. To do this, we first need a window function g, which is a function taking on nonzero values only on some closed interval, often [0, T] or [-T, T]. Figures 15.12 and 15.13 show typical graphs of such functions, one on [0, T] and the other on [-T, T]. The interval is called the support of g, and in this case, in which we are dealing with closed bounded intervals, we say that g has compact support. The function g has zero values outside of this support interval. We window a function f with g by forming the product g(t)f(t), which vanishes outside of [-T, T].

FIGURE 15.12  Typical window function with compact support [0, T].

FIGURE 15.13  Typical window function with compact support [-T, T].
EXAMPLE 15.16

Consider the window function

    g(t) = 1 for -4 \le t \le 4;   g(t) = 0 for |t| > 4,

having compact support [-4, 4]. This function is graphed in Figure 15.14(a), with the vertical segments at t = \pm 4 included to emphasize this interval. Let f(t) = t\sin(t), shown in Figure 15.14(b). To window f with g, form the product g(t)f(t), shown in Figure 15.14(c). This windowed function vanishes outside the support of g. For this choice of g, windowing has the effect of turning the signal f(t) on at time -4 and turning it off at t = 4.

The windowed Fourier transform (with respect to the choice of g) is

    \mathcal{F}_{\mathrm{win}}[f](\omega) = \widehat{f_{\mathrm{win}}}(\omega)
      = \int_{-\infty}^{\infty} f(t)g(t)e^{-i\omega t}\,dt = \int_{-T}^{T} f(t)g(t)e^{-i\omega t}\,dt.
FIGURE 15.14(a)  Window function g(t) = 1 for |t| \le 4; g(t) = 0 for |t| > 4.

FIGURE 15.14(b)  f(t) = t\sin(t).

FIGURE 15.14(c)  f windowed with g.
EXAMPLE 15.17

Let f(t) = 6e^{-|t|}. Then

    \hat{f}(\omega) = \int_{-\infty}^{\infty} 6e^{-|t|}e^{-i\omega t}\,dt = \frac{12}{1+\omega^2}.

Use the window function

    g(t) = 1 for -2 \le t \le 2;   g(t) = 0 for |t| > 2.

Figure 15.15 shows a graph of the windowed function g(t)f(t). The windowed Fourier transform of f is

    \widehat{f_{\mathrm{win}}}(\omega) = \int_{-\infty}^{\infty} 6e^{-|t|}g(t)e^{-i\omega t}\,dt
      = \int_{-2}^{2} 6e^{-|t|}e^{-i\omega t}\,dt
      = \frac{12\left(1 - e^{-2}\cos(2\omega) + \omega e^{-2}\sin(2\omega)\right)}{1+\omega^2}.
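The windowed transform just computed can be verified numerically. The sketch below is an illustration only (the test frequencies are arbitrary); it uses the evenness of the windowed integrand, which makes the transform real.

```python
import numpy as np
from scipy.integrate import quad

def windowed_ft(omega):
    # Windowed transform of f(t) = 6 e^{-|t|} with the box window on [-2, 2].
    # f(t)g(t) is even in t, so the transform is real:
    # 2 * Int_0^2 6 e^{-t} cos(omega t) dt.
    val, _ = quad(lambda t: 6 * np.exp(-t) * np.cos(omega * t), 0, 2)
    return 2 * val

for w in [0.0, 1.0, 2.5]:
    closed = 12 * (1 - np.exp(-2)*np.cos(2*w) + w*np.exp(-2)*np.sin(2*w)) / (1 + w**2)
    print(w, windowed_ft(w), closed)
```

At \omega = 0 both expressions reduce to 12(1 - e^{-2}), and they agree at the other frequencies as well, confirming the closed form.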
FIGURE 15.15  f(t) = 6e^{-|t|} windowed with g(t) = 1 for |t| \le 2; g(t) = 0 for |t| > 2.
This gives the frequency content of the signal $f$ in the time interval $-2 \le t \le 2$.

Often we use a shifted window function. Suppose the support of $g$ is $[-T, T]$. If $t_0 > 0$, then the graph of $g(t - t_0)$ is the graph of $g(t)$ shifted $t_0$ units to the right. Now
$$f(t)g(t - t_0) = \begin{cases} f(t)g(t - t_0) & \text{for } t_0 - T \le t \le t_0 + T \\ 0 & \text{for } t < t_0 - T \text{ and for } t > t_0 + T. \end{cases}$$
Figures 15.16(a) through (d) illustrate this process. In this case we take the Fourier transform of the shifted windowed signal to be
$$\hat{f}_{\text{win}}(\omega; t_0) = \mathcal{F}\left[f(t)g(t - t_0)\right](\omega) = \int_{t_0 - T}^{t_0 + T} f(t)g(t - t_0)e^{-i\omega t}\,dt.$$
This gives the frequency content of the signal in the time interval $[t_0 - T, t_0 + T]$. Engineers sometimes refer to this windowing process as time-frequency localization.

If $g$ is the window function, the center of $g$ is defined to be the point
$$t_C = \frac{\int_{-\infty}^{\infty} t\,|g(t)|^2\,dt}{\int_{-\infty}^{\infty} |g(t)|^2\,dt}.$$
FIGURE 15.16(a) A window function $g$ on $[-T, T]$.

FIGURE 15.16(b) Shifted window function $g(t - t_0)$.

FIGURE 15.16(c) Typical signal $f(t)$.

FIGURE 15.16(d) $g(t - t_0)f(t)$.
The number
$$t_R = \left(\frac{\int_{-\infty}^{\infty} (t - t_C)^2\,|g(t)|^2\,dt}{\int_{-\infty}^{\infty} |g(t)|^2\,dt}\right)^{1/2}$$
is the radius of the window function. The width of the window function is $2t_R$, and is referred to as the RMS duration of the window. It is assumed in this terminology that the integrals involved all converge.

When we deal with the Fourier transform of the window function, similar terminology applies:
$$\text{center of } \hat{g} = \omega_C = \frac{\int_{-\infty}^{\infty} \omega\,|\hat{g}(\omega)|^2\,d\omega}{\int_{-\infty}^{\infty} |\hat{g}(\omega)|^2\,d\omega}$$
and
$$\text{radius of } \hat{g} = \omega_R = \left(\frac{\int_{-\infty}^{\infty} (\omega - \omega_C)^2\,|\hat{g}(\omega)|^2\,d\omega}{\int_{-\infty}^{\infty} |\hat{g}(\omega)|^2\,d\omega}\right)^{1/2}.$$
The width of $\hat{g}$ is $2\omega_R$, a number referred to as the RMS bandwidth of the window function.
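The center and RMS radius can be checked numerically for a concrete window. The sketch below is our own illustration (names and the midpoint rule are assumptions, not the book's); for the rectangular window on $[-T, T]$ one expects $t_C = 0$ and $t_R = T/\sqrt{3}$:

```python
import math

def window_center_and_radius(g, a, b, n=50000):
    """Midpoint-rule estimates of the center t_C and RMS radius t_R of a
    window g whose support lies in [a, b]. The common step factor h
    cancels in both ratios, so it is omitted from the sums."""
    h = (b - a) / n
    m0 = m1 = 0.0
    for j in range(n):
        t = a + (j + 0.5) * h
        w = g(t) ** 2
        m0 += w
        m1 += t * w
    tC = m1 / m0
    m2 = sum((a + (j + 0.5) * h - tC) ** 2 * g(a + (j + 0.5) * h) ** 2
             for j in range(n))
    return tC, math.sqrt(m2 / m0)

# Rectangular window on [-2, 2]: expect center 0 and radius 2/sqrt(3).
tC, tR = window_center_and_radius(lambda t: 1.0, -2.0, 2.0)
```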
15.4.7
The Shannon Sampling Theorem
We will derive the Shannon sampling theorem, which states that a band-limited signal can be reconstructed from certain sampled values. A signal $f$ is band-limited if its Fourier transform $\hat{f}$ has compact support (has nonzero values only on a closed interval of finite length). This means that, for some $L$,
$$\hat{f}(\omega) = 0 \quad \text{if } |\omega| > L.$$
Usually we choose $L$ to be the smallest number for which this condition holds. In this event $L$ is the bandwidth of the signal. The total frequency content of such a signal $f$ lies in the band $[-L, L]$.

We will now show that we can reconstruct a band-limited signal from samples taken at appropriately chosen times. Begin with the integral for the inverse Fourier transform, assuming that we can recover $f(t)$ for all real $t$ from its transform:
$$f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \hat{f}(\omega)e^{i\omega t}\,d\omega.$$
Because $f$ is band-limited,
$$f(t) = \frac{1}{2\pi}\int_{-L}^{L} \hat{f}(\omega)e^{i\omega t}\,d\omega. \tag{15.8}$$
Put this aside for the moment and write the complex Fourier series for $\hat{f}(\omega)$ on $[-L, L]$:
$$\hat{f}(\omega) = \sum_{n=-\infty}^{\infty} c_n e^{n\pi i\omega/L}, \tag{15.9}$$
where
$$c_n = \frac{1}{2L}\int_{-L}^{L} \hat{f}(\omega)e^{-n\pi i\omega/L}\,d\omega.$$
Comparing $c_n$ with $f(t)$ in equation (15.8), we conclude that
$$c_n = \frac{\pi}{L}\,f\!\left(\frac{-n\pi}{L}\right).$$
Substitute this into equation (15.9) to get
$$\hat{f}(\omega) = \sum_{n=-\infty}^{\infty} \frac{\pi}{L}\,f\!\left(\frac{-n\pi}{L}\right)e^{n\pi i\omega/L}.$$
Since $n$ takes on all integer values in this summation, we can replace $n$ with $-n$ to write
$$\hat{f}(\omega) = \sum_{n=-\infty}^{\infty} \frac{\pi}{L}\,f\!\left(\frac{n\pi}{L}\right)e^{-n\pi i\omega/L}.$$
Now substitute this series for $\hat{f}(\omega)$ into equation (15.8) to get
$$f(t) = \frac{1}{2\pi}\int_{-L}^{L}\sum_{n=-\infty}^{\infty} \frac{\pi}{L}\,f\!\left(\frac{n\pi}{L}\right)e^{-n\pi i\omega/L}e^{i\omega t}\,d\omega.$$
Interchange the sum and the integral to get
$$f(t) = \frac{1}{2L}\sum_{n=-\infty}^{\infty} f(n\pi/L)\int_{-L}^{L} e^{i\omega(t - n\pi/L)}\,d\omega$$
$$= \frac{1}{2L}\sum_{n=-\infty}^{\infty} f(n\pi/L)\left[\frac{1}{i(t - n\pi/L)}e^{i\omega(t - n\pi/L)}\right]_{-L}^{L}$$
$$= \frac{1}{2L}\sum_{n=-\infty}^{\infty} f(n\pi/L)\,\frac{1}{i(t - n\pi/L)}\left(e^{i(Lt - n\pi)} - e^{-i(Lt - n\pi)}\right)$$
$$= \sum_{n=-\infty}^{\infty} f(n\pi/L)\,\frac{1}{Lt - n\pi}\,\frac{1}{2i}\left(e^{i(Lt - n\pi)} - e^{-i(Lt - n\pi)}\right)$$
$$= \sum_{n=-\infty}^{\infty} f(n\pi/L)\,\frac{\sin(Lt - n\pi)}{Lt - n\pi}. \tag{15.10}$$
This means that $f(t)$ is known for all times $t$ if just the function values $f(n\pi/L)$ are determined for all integer values of $n$. An engineer would sample the signal $f(t)$ at times $0, \pm\pi/L, \pm 2\pi/L, \ldots$. Once the values of $f(t)$ are known for these times, equation (15.10) reconstructs the entire signal. This is actually the way engineers convert digital signals to analog signals, with application to technology such as that involved in making compact disks.
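A truncated version of the sampling series (15.10) can be tested numerically. The sketch below is our own illustration (the test signal and function names are assumptions); it uses $f(t) = (\sin(2t)/(2t))^2$, whose Fourier transform is supported in $[-4, 4]$, so $f$ is band-limited with $L = 4$ and its samples decay fast enough for a modest truncation:

```python
import math

def f(t):
    """Band-limited test signal: (sin(2t)/(2t))^2 has transform
    supported in [-4, 4], so its bandwidth is at most L = 4."""
    if t == 0.0:
        return 1.0
    s = math.sin(2.0 * t) / (2.0 * t)
    return s * s

def shannon_sum(samples, L, t):
    """Truncated form of equation (15.10):
    f(t) ~ sum_n f(n*pi/L) * sin(L*t - n*pi)/(L*t - n*pi)."""
    total = 0.0
    for n, fn in samples.items():
        x = L * t - n * math.pi
        total += fn * (1.0 if x == 0.0 else math.sin(x) / x)
    return total

L, M = 4.0, 500
samples = {n: f(n * math.pi / L) for n in range(-M, M + 1)}
approx = shannon_sum(samples, L, 0.3)   # compare with f(0.3)
```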
Equation (15.10) is known as the Shannon sampling theorem. We will encounter it again when we discuss wavelets. In the case $L = \pi$, the sampling theorem has the simple form
$$f(t) = \sum_{n=-\infty}^{\infty} f(n)\,\frac{\sin(\pi(t - n))}{\pi(t - n)}. \tag{15.11}$$

15.4.8 Lowpass and Bandpass Filters
Consider a signal $f$, not necessarily band-limited. However, we assume that the signal has finite energy, so
$$\int_{-\infty}^{\infty} |f(t)|^2\,dt$$
is finite. Such functions are called square integrable, and we will also encounter them later with wavelet expansions. The spectrum of $f$ is given by its Fourier transform
$$\hat{f}(\omega) = \int_{-\infty}^{\infty} f(t)e^{-i\omega t}\,dt.$$
If $f$ is not band-limited, we can replace $f$ with a band-limited signal $f_0$ with bandwidth not exceeding a positive number $\omega_0$ by applying a lowpass filter which cuts off $\hat{f}$ at frequencies outside the range $[-\omega_0, \omega_0]$. That is, let
$$\hat{f}_0(\omega) = \begin{cases} \hat{f}(\omega) & \text{for } -\omega_0 \le \omega \le \omega_0 \\ 0 & \text{for } |\omega| > \omega_0. \end{cases}$$
This defines the transform of the function $f_0$, from which we recover $f_0$ by the inverse Fourier transform:
$$f_0(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \hat{f}_0(\omega)e^{i\omega t}\,d\omega = \frac{1}{2\pi}\int_{-\omega_0}^{\omega_0} \hat{f}(\omega)e^{i\omega t}\,d\omega.$$
The process of applying the lowpass filter is carried out mathematically by multiplying by an appropriate function (essentially windowing). Define the characteristic function $\chi_I$ of an interval $I$ by
$$\chi_I(t) = \begin{cases} 1 & \text{if } t \text{ is in } I \\ 0 & \text{if } t \text{ is a real number that is not in } I. \end{cases}$$
Now observe that
$$\hat{f}_0(\omega) = \chi_{[-\omega_0, \omega_0]}(\omega)\hat{f}(\omega), \tag{15.12}$$
or, more succinctly,
$$\hat{f}_0 = \chi_{[-\omega_0, \omega_0]}\hat{f}.$$
In this context, $\chi_{[-\omega_0, \omega_0]}$ is called the transfer function. Its graph is shown in Figure 15.17. The inverse Fourier transform of the transfer function is
$$\mathcal{F}^{-1}\left[\chi_{[-\omega_0, \omega_0]}\right](t) = \frac{1}{2\pi}\int_{-\omega_0}^{\omega_0} e^{i\omega t}\,d\omega = \frac{\sin(\omega_0 t)}{\pi t},$$
whose graph is given in Figure 15.18. In the case that $\omega_0 = \pi$, this is the function, evaluated at $t - n$ instead of $t$, which occurs in the Shannon sampling formula (15.11) that reconstructs $f(t)$ from sampled values $f(n)$ on the integers. For this reason $\sin(\omega_0 t)/(\pi t)$ is called the Shannon sampling function.
FIGURE 15.17 Graph of $\chi_{[-\omega_0, \omega_0]}$.

FIGURE 15.18 Graph of $\sin(\omega_0 t)/(\pi t)$ for $\omega_0 = 2.7$.
Now recall Theorem 15.12(2) and (3) of Section 15.4.4. Analog filtering in the time variable $t$ is done by convolution. If $\varphi(t)$ is the filter function, then the effect of filtering a function $f$ by $\varphi$ is a new function $g$ defined by
$$g(t) = (\varphi * f)(t) = \int_{-\infty}^{\infty} \varphi(\tau)f(t - \tau)\,d\tau.$$
Taking the Fourier transform of this equation, we have
$$\hat{g}(\omega) = \hat{\varphi}(\omega)\hat{f}(\omega).$$
We therefore filter in the frequency variable by taking a product of the Fourier transform of the filter function with the transform of the function being filtered. We can now formulate equation (15.12) as
$$f_0(t) = \frac{\sin(\omega_0 t)}{\pi t} * f(t).$$
This gives the lowpass filtering of $f$ as the convolution of the Shannon sampling function with $f$.

In lowpass filtering, we produce from the signal $f$ a new signal $f_0$ that is band-limited. That is, we filter out the frequencies of the signal outside of $[-\omega_0, \omega_0]$. In a similar kind of filtering, called bandpass filtering, we want to filter out the effects of the signal outside of given bandwidths. A band-limited signal $f$ can be decomposed into a sum of signals, each of which carries the information content of $f$ within a certain given frequency band. To see how to do this, let $f$ be a band-limited signal of bandwidth $\Omega$. Consider a finite increasing sequence of frequencies,
$$\omega_0 < \omega_1 < \omega_2 < \cdots < \omega_N = \Omega.$$
For $j = 1, \ldots, N$, define a bandwidth filter function $\varphi_j$ by means of its transfer function:
$$\hat{\varphi}_j = \chi_{[-\omega_j, -\omega_{j-1}]} + \chi_{[\omega_{j-1}, \omega_j]}.$$
This transfer function, which is a sum of characteristic functions of frequency intervals, is graphed in Figure 15.19. The bandwidth filter function $\varphi_j(t)$, which filters the frequency
FIGURE 15.19 $\chi_{[-\omega_j, -\omega_{j-1}]} + \chi_{[\omega_{j-1}, \omega_j]}$.

FIGURE 15.20 $\varphi_j(t) = \dfrac{\sin(\omega_j t) - \sin(\omega_{j-1}t)}{\pi t}$ with $\omega_j = 2.2$ and $\omega_{j-1} = 1.7$.
content of $f(t)$ outside of the frequency range $[\omega_{j-1}, \omega_j]$, is obtained by taking the inverse Fourier transform of $\hat{\varphi}_j$. We get
$$\varphi_j(t) = \frac{\sin(\omega_j t) - \sin(\omega_{j-1}t)}{\pi t},$$
whose graph is shown in Figure 15.20. Now define functions
$$f_0(t) = \frac{\sin(\omega_0 t)}{\pi t} * f(t)$$
and, for $j = 1, 2, \ldots, N$,
$$f_j(t) = (\varphi_j * f)(t).$$
Then, for $j = 1, 2, \ldots, N$, each $f_j(t)$ carries the content of the signal $f(t)$ in the frequency range $\omega_{j-1} \le |\omega| \le \omega_j$, while $f_0(t)$ carries the content in $[0, \omega_0]$, which is the low-frequency range of $f(t)$. Further,
$$f(t) = f_0(t) + f_1(t) + f_2(t) + \cdots + f_N(t), \tag{15.13}$$
giving a decomposition of the signal into components carrying the information of the signal for specific frequency intervals.
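The decomposition (15.13) can be sanity-checked with a toy signal built from finitely many sinusoids, where each band component simply collects the terms whose frequencies fall in its band. This is our own illustrative sketch (the amplitudes, frequencies, and band edges are made up), not the book's construction:

```python
import math

# Each band component keeps the sinusoids whose frequency lies in its band;
# the components must then sum back to the original signal, as in (15.13).
terms = [(1.0, 0.5), (0.7, 2.0), (0.3, 5.5)]   # (amplitude, frequency) pairs
edges = [0.0, 1.0, 3.0, 6.0]                   # 0 < w1 < w2 < w3 = Omega

def signal(t):
    return sum(A * math.cos(w * t) for A, w in terms)

def component(t, lo, hi):
    """Portion of the signal with frequencies in (lo, hi]."""
    return sum(A * math.cos(w * t) for A, w in terms if lo < w <= hi)

t = 0.8
pieces = [component(t, edges[j], edges[j + 1]) for j in range(len(edges) - 1)]
total = sum(pieces)    # equals signal(t), the discrete analogue of (15.13)
```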
SECTION 15.4
PROBLEMS

In each of Problems 1 through 8, determine the Fourier transform of the function.

1. $\dfrac{t}{9 + t^2}$
2. $3te^{-9t^2}$
3. $26H(t)te^{-2t}$
4. $H(t-3)(t-3)e^{-4t}$
5. $\dfrac{d}{dt}\left[H(t)e^{-3t}\right]$
6. $t\left[H(t+1) - H(t-1)\right]$
7. $\dfrac{5e^{3it}}{t^2 - 4t + 13}$
8. $H(t-3)e^{-2t}$

In each of Problems 9, 10 and 11, use convolution to find the inverse Fourier transform of the function.

9. $\dfrac{1}{(1 + i\omega)^2}$
10. $\dfrac{1}{(1 + i\omega)(2 + i\omega)}$
11. $\dfrac{\sin(3\omega)}{\omega(2 + i\omega)}$

In each of Problems 12, 13 and 14, find the inverse Fourier transform of the function.

12. $\dfrac{6e^{4i\omega}\sin(2\omega)}{9 + \omega^2}$
13. $e^{-(3+4\omega)}\cos(2\omega + 8)$
14. $e^{-\omega^2/9}\sin(8\omega)$

15. Prove the following form of Parseval's theorem:
$$\int_{-\infty}^{\infty} |f(t)|^2\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty} |\hat{f}(\omega)|^2\,d\omega.$$
16. The power content of a signal $f(t)$ is defined to be $\int_{-\infty}^{\infty} |f(t)|^2\,dt$, assuming that this integral converges. Determine the power content of $H(t)e^{-2t}$.
17. Determine the power content of $(1/t)\sin(3t)$. Hint: Use the result of Problem 15.
18. Use the Fourier transform to solve
$$y'' + 6y' + 5y = \delta(t - 3).$$

In each of Problems 19 through 24, compute the windowed Fourier transform of the given function $f$, using the windowing function $g$. Also compute the center and RMS bandwidth of the window function.

19. $f(t) = t^2$, $g(t) = \begin{cases} 1 & \text{for } -5 \le t \le 5 \\ 0 & \text{for } |t| > 5 \end{cases}$
20. $f(t) = \cos(at)$, $g(t) = \begin{cases} 1 & \text{for } -4 \le t \le 4 \\ 0 & \text{for } |t| > 4 \end{cases}$
21. $f(t) = e^{-t}$, $g(t) = \begin{cases} 1 & \text{for } 0 \le t \le 4 \\ 0 & \text{for } t < 0 \text{ and for } t > 4 \end{cases}$
22. $f(t) = e^{-t}\sin(t)$, $g(t) = \begin{cases} 1 & \text{for } -1 \le t \le 1 \\ 0 & \text{for } |t| > 1 \end{cases}$
23. $f(t) = (t + 2)^2$, $g(t) = \begin{cases} 1 & \text{for } -2 \le t \le 2 \\ 0 & \text{for } |t| > 2 \end{cases}$
24. $f(t) = H(t - \pi)$, $g(t) = \begin{cases} 1 & \text{for } 3 \le t \le 5 \\ 0 & \text{for } t < 3 \text{ and for } t > 5 \end{cases}$

15.5
The Fourier Cosine and Sine Transforms

We saw in Section 15.3 how the Fourier integral representation of a function suggested its Fourier transform. We will now show how the Fourier cosine and sine integrals of a function suggest cosine and sine transforms. Suppose $f(t)$ is piecewise smooth on each interval $[0, L]$ and $\int_0^\infty |f(t)|\,dt$ converges. Then for each $t$ at which $f$ is continuous,
$$f(t) = \int_0^\infty a_\omega\cos(\omega t)\,d\omega,$$
where
$$a_\omega = \frac{2}{\pi}\int_0^\infty f(t)\cos(\omega t)\,dt.$$
Based on these two equations, we make the following definition.

DEFINITION 15.5 Fourier Cosine Transform

The Fourier cosine transform of $f$ is defined by
$$\mathcal{F}_C[f](\omega) = \int_0^\infty f(t)\cos(\omega t)\,dt. \tag{15.14}$$
Often we will denote $\mathcal{F}_C[f] = \hat{f}_C$.

Notice that
$$\hat{f}_C(\omega) = \frac{\pi}{2}\,a_\omega$$
and that
$$f(t) = \frac{2}{\pi}\int_0^\infty \hat{f}_C(\omega)\cos(\omega t)\,d\omega. \tag{15.15}$$
The integrals in expressions (15.14) and (15.15) form the transform pair for the Fourier cosine transform. The latter enables us, under certain conditions, to recover $f(t)$ from $\hat{f}_C$.
EXAMPLE 15.18
Let $K$ be a positive number, and let
$$f(t) = \begin{cases} 1 & \text{for } 0 \le t \le K \\ 0 & \text{for } t > K. \end{cases}$$
The Fourier cosine transform of $f$ is
$$\hat{f}_C(\omega) = \int_0^\infty f(t)\cos(\omega t)\,dt = \int_0^K \cos(\omega t)\,dt = \frac{\sin(K\omega)}{\omega}.$$
The Fourier sine transform is defined in the same spirit.

DEFINITION 15.6 Fourier Sine Transform

The Fourier sine transform of $f$ is defined by
$$\mathcal{F}_S[f](\omega) = \int_0^\infty f(t)\sin(\omega t)\,dt.$$
We also denote this as $\hat{f}_S$.

If $f$ is continuous at $t > 0$, then the Fourier sine integral representation is
$$f(t) = \int_0^\infty b_\omega\sin(\omega t)\,d\omega,$$
where
$$b_\omega = \frac{2}{\pi}\int_0^\infty f(t)\sin(\omega t)\,dt.$$
Since
$$\hat{f}_S(\omega) = \frac{\pi}{2}\,b_\omega,$$
then
$$f(t) = \frac{2}{\pi}\int_0^\infty \hat{f}_S(\omega)\sin(\omega t)\,d\omega,$$
and this is the means by which we retrieve $f(t)$ from $\hat{f}_S$.
EXAMPLE 15.19
With $f$ the function of Example 15.18,
$$\hat{f}_S(\omega) = \int_0^\infty f(t)\sin(\omega t)\,dt = \int_0^K \sin(\omega t)\,dt = \frac{1}{\omega}\left(1 - \cos(K\omega)\right).$$
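Both transforms of Examples 15.18 and 15.19 can be verified numerically by quadrature on $[0, K]$; this sketch is our own illustration (the function names and the midpoint rule are assumptions):

```python
import math

def cosine_transform(f, omega, upper, n=20000):
    """Midpoint-rule value of int_0^upper f(t) cos(omega*t) dt; f is
    assumed to vanish for t > upper, so this approximates the transform."""
    h = upper / n
    return h * sum(f((j + 0.5) * h) * math.cos(omega * (j + 0.5) * h)
                   for j in range(n))

def sine_transform(f, omega, upper, n=20000):
    h = upper / n
    return h * sum(f((j + 0.5) * h) * math.sin(omega * (j + 0.5) * h)
                   for j in range(n))

K, w = 2.0, 1.5
fc = cosine_transform(lambda t: 1.0, w, K)   # expect sin(K*w)/w
fs = sine_transform(lambda t: 1.0, w, K)     # expect (1 - cos(K*w))/w
```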
Both of these transforms are linear:
$$\mathcal{F}_C[\alpha f + \beta g] = \alpha\mathcal{F}_C[f] + \beta\mathcal{F}_C[g]$$
and
$$\mathcal{F}_S[\alpha f + \beta g] = \alpha\mathcal{F}_S[f] + \beta\mathcal{F}_S[g],$$
whenever all of these transforms are defined. When these transforms are used to solve differential equations, the following operational rules play a key role.

THEOREM 15.14 Operational Rules

Let $f$ and $f'$ be continuous on every interval $[0, L]$, and let $\int_0^\infty |f(t)|\,dt$ converge. Suppose $f(t) \to 0$ and $f'(t) \to 0$ as $t \to \infty$. Suppose $f''$ is piecewise continuous on every interval $[0, L]$. Then,

1. $\mathcal{F}_C[f''](\omega) = -\omega^2\hat{f}_C(\omega) - f'(0)$, and
2. $\mathcal{F}_S[f''](\omega) = -\omega^2\hat{f}_S(\omega) + \omega f(0)$.

The theorem is proved by integrating by parts twice for each rule, and we leave the details to the student. The operational formula dictates which transform is used to solve a given problem. If we seek a function $f(t)$ for $0 \le t < \infty$, and $f(0)$ is specified, then we might consider a Fourier sine transform. If, however, information is given about $f'(0)$, then the cosine transform might be appropriate. When we solve partial differential equations we will encounter examples where this strategy is invoked.
SECTION 15.5
PROBLEMS

In each of Problems 1 through 6, determine the Fourier cosine transform and the Fourier sine transform of the function.

1. $f(t) = e^{-t}$
2. $f(t) = te^{-at}$, with $a$ any positive number
3. $f(t) = \begin{cases} \cos(t) & \text{for } 0 \le t \le K \\ 0 & \text{for } t > K \end{cases}$, with $K$ any positive number
4. $f(t) = \begin{cases} 1 & \text{for } 0 \le t < K \\ -1 & \text{for } K \le t < 2K \\ 0 & \text{for } t \ge 2K \end{cases}$
5. $f(t) = e^{-t}\cos(t)$
6. $f(t) = \begin{cases} \sinh(t) & \text{for } K \le t < 2K \\ 0 & \text{for } 0 \le t < K \text{ and for } t \ge 2K \end{cases}$

7. Show that, under appropriate conditions on $f$ and its derivatives,
$$\mathcal{F}_S[f^{(4)}](\omega) = \omega^4\hat{f}_S(\omega) - \omega^3 f(0) + \omega f''(0).$$
8. Show that, under appropriate conditions on $f$ and its derivatives,
$$\mathcal{F}_C[f^{(4)}](\omega) = \omega^4\hat{f}_C(\omega) + \omega^2 f'(0) - f^{(3)}(0).$$
Hint: Consider conditions that allow application of the operational formula to $f''(t)$.
15.6

The Finite Fourier Cosine and Sine Transforms

The Fourier transform, cosine transform and sine transform are all motivated by the respective integral representations of a function. If we employ essentially the same line of reasoning, but using Fourier cosine and sine series instead of integrals, we obtain what are called finite transforms. Suppose $f$ is piecewise smooth on $[0, \pi]$.

DEFINITION 15.7 Finite Fourier Cosine Transform

The finite Fourier cosine transform of $f$ is defined by
$$\tilde{f}_C(n) = \int_0^\pi f(x)\cos(nx)\,dx \quad \text{for } n = 0, 1, 2, \ldots$$

If $f$ is continuous at $x$ in $(0, \pi)$, then $f(x)$ has the Fourier cosine series representation
$$f(x) = \frac{1}{2}a_0 + \sum_{n=1}^{\infty} a_n\cos(nx),$$
where
$$a_n = \frac{2}{\pi}\int_0^\pi f(x)\cos(nx)\,dx = \frac{2}{\pi}\tilde{f}_C(n).$$
Then
$$f(x) = \frac{1}{\pi}\tilde{f}_C(0) + \frac{2}{\pi}\sum_{n=1}^{\infty}\tilde{f}_C(n)\cos(nx),$$
an inversion-type expression from which we can recover $f(x)$ from the finite Fourier cosine transform of $f$. By the same token, we can define a finite sine transform.
DEFINITION 15.8 Finite Fourier Sine Transform

The finite Fourier sine transform of $f$ is defined by
$$\tilde{f}_S(n) = \int_0^\pi f(x)\sin(nx)\,dx \quad \text{for } n = 1, 2, \ldots$$

For $0 < x < \pi$, if $f$ is continuous at $x$, then the sine series representation is
$$f(x) = \frac{2}{\pi}\sum_{n=1}^{\infty}\tilde{f}_S(n)\sin(nx),$$
an inversion formula for the finite sine transform.
EXAMPLE 15.20
Let $f(x) = x^2$ for $0 \le x \le \pi$. For the finite cosine transform, compute
$$\tilde{f}_C(0) = \int_0^\pi x^2\,dx = \frac{1}{3}\pi^3,$$
and, for $n = 1, 2, \ldots$,
$$\tilde{f}_C(n) = \int_0^\pi x^2\cos(nx)\,dx = \frac{2\pi(-1)^n}{n^2}.$$
For the finite sine transform, compute
$$\tilde{f}_S(n) = \int_0^\pi x^2\sin(nx)\,dx = \frac{(-1)^n(2 - n^2\pi^2) - 2}{n^3}.$$

Here are the fundamental operational rules for these transforms.

THEOREM 15.15 Operational Rules

Let $f$ and $f'$ be continuous on $[0, \pi]$, and let $f''$ be piecewise continuous. Then

1. $\widetilde{f''}_C(n) = -n^2\tilde{f}_C(n) - f'(0) + (-1)^n f'(\pi)$ for $n = 1, 2, \ldots$, and
2. $\widetilde{f''}_S(n) = -n^2\tilde{f}_S(n) + nf(0) - n(-1)^n f(\pi)$ for $n = 1, 2, \ldots$

We will see applications of these finite transforms when we discuss partial differential equations.
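The closed forms in Example 15.20 can be checked numerically; the sketch below is our own illustration (the midpoint rule and function names are assumptions):

```python
import math

def finite_cosine(f, n, m=20000):
    """Midpoint-rule value of int_0^pi f(x) cos(n x) dx."""
    h = math.pi / m
    return h * sum(f((j + 0.5) * h) * math.cos(n * (j + 0.5) * h)
                   for j in range(m))

def finite_sine(f, n, m=20000):
    """Midpoint-rule value of int_0^pi f(x) sin(n x) dx."""
    h = math.pi / m
    return h * sum(f((j + 0.5) * h) * math.sin(n * (j + 0.5) * h)
                   for j in range(m))

f = lambda x: x * x
n = 3
fc = finite_cosine(f, n)   # Example 15.20: 2*pi*(-1)^n / n^2
fs = finite_sine(f, n)     # Example 15.20: ((-1)^n*(2 - n^2*pi^2) - 2) / n^3
```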
SECTION 15.6
PROBLEMS

In each of Problems 1 through 7, find the finite Fourier sine transform of the function.

1. $K$ (any constant)
2. $x$
3. $x^2$
4. $x^5$
5. $\sin(ax)$
6. $\cos(ax)$
7. $e^{-x}$

In each of Problems 8 through 14, find the finite Fourier cosine transform of the function.

8. $f(x) = \begin{cases} 1 & \text{for } 0 \le x < \pi/2 \\ -1 & \text{for } \pi/2 \le x \le \pi \end{cases}$
9. $x$
10. $x^2$
11. $x^3$
12. $\cosh(ax)$
13. $\sin(ax)$
14. $e^{-x}$

15. Suppose $f$ is continuous on $[0, \pi]$ and $f'$ is piecewise continuous. Prove that
$$\widetilde{f'}_S(n) = -n\tilde{f}_C(n) \quad \text{for } n = 1, 2, \ldots$$
16. Let $f$ be continuous and $f'$ piecewise continuous on $[0, \pi]$. Prove that
$$\widetilde{f'}_C(n) = n\tilde{f}_S(n) - f(0) + (-1)^n f(\pi) \quad \text{for } n = 0, 1, 2, \ldots$$
15.7

The Discrete Fourier Transform

If $f$ has period $p$, its complex Fourier series is
$$\sum_{k=-\infty}^{\infty} d_k e^{ik\omega_0 t}.$$
Here $\omega_0 = 2\pi/p$ and the complex Fourier coefficients are given by
$$d_k = \frac{1}{p}\int_{\alpha}^{\alpha + p} f(t)e^{-ik\omega_0 t}\,dt,$$
in which, because of the periodicity of $f$, $\alpha$ can be any number. If we substitute the value of $\omega_0$, the complex Fourier series of $f$ is
$$\sum_{k=-\infty}^{\infty} d_k e^{2\pi ikt/p}.$$
Under certain conditions on $f$, this series converges to $\frac{1}{2}\left(f(t+) + f(t-)\right)$ at any number $t$. We will choose $\alpha = 0$ in the formula for the coefficients, so
$$d_k = \frac{1}{p}\int_0^p f(t)e^{-2\pi ikt/p}\,dt \quad \text{for } k = 0, \pm 1, \pm 2, \ldots$$

To motivate the definition of the discrete Fourier transform, suppose we want to approximate $d_k$. One way is to subdivide $[0, p]$ into $N$ subintervals of equal length $p/N$, and choose a point $t_j$ in each $[jp/N, (j+1)p/N]$ for $j = 0, 1, \ldots, N - 1$. Now approximate $d_k$ by a Riemann sum, choosing the left endpoints $t_j = jp/N$:
$$d_k \approx \frac{1}{p}\sum_{j=0}^{N-1} f(t_j)e^{-2\pi ikt_j/p}\,\frac{p}{N} = \frac{1}{N}\sum_{j=0}^{N-1} f(t_j)e^{-2\pi ijk/N}. \tag{15.16}$$
The $N$-point Fourier transform is a rule that acts on a given sequence of $N$ complex numbers and produces an infinite sequence of complex numbers, one for each integer $k$ (although with periodic repetitions, as we will see later). We will define the transform in such a way that, except for the $1/N$ factor, the approximating sum (15.16) is exactly the $N$-point discrete Fourier transform of the numbers $f(t_0)$, $f(t_1)$, $\ldots$, $f(t_{N-1})$.

DEFINITION 15.9 N-Point Discrete Fourier Transform

Let $N$ be a positive integer. Let $u = \{u_j\}_{j=0}^{N-1}$ be a sequence of $N$ complex numbers. Then the $N$-point discrete Fourier transform of $u$ is the sequence $\mathcal{D}[u]$ defined by
$$\mathcal{D}[u](k) = \sum_{j=0}^{N-1} u_j e^{-2\pi ijk/N} \quad \text{for } k = 0, \pm 1, \pm 2, \ldots$$

To simplify the notation, we will use a convention used with the Laplace transform and denote the $N$-point discrete Fourier transform of a sequence $u$ by $U$ (lower case for the given sequence of $N$ numbers, upper case of the same letter for its $N$-point discrete Fourier transform). In this notation, if $u = \{u_j\}_{j=0}^{N-1}$, then
$$U_k = \sum_{j=0}^{N-1} u_j e^{-2\pi ijk/N}.$$
We will also abbreviate the phrase "discrete Fourier transform" to DFT.
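Definition 15.9 translates directly into code. The sketch below is our own illustration (the function name is an assumption); it also exhibits the periodic repetitions mentioned in the definition, $U_{k+N} = U_k$:

```python
import cmath

def dft_term(u, k):
    """Definition 15.9: U_k = sum_{j=0}^{N-1} u_j e^{-2*pi*i*j*k/N},
    defined for every integer k, not only for k = 0, ..., N-1."""
    N = len(u)
    return sum(u[j] * cmath.exp(-2j * cmath.pi * j * k / N) for j in range(N))

u = [1.0, 2.0, 0.0, -1.0]      # an arbitrary 4-point sequence
U0 = dft_term(u, 0)            # k = 0 simply sums the sequence
U5 = dft_term(u, 5)            # periodicity: U_{k+N} = U_k, so U_5 = U_1
```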
EXAMPLE 15.21

Consider the constant sequence $u = \{c\}_{j=0}^{N-1}$, in which $c$ is a complex number. The $N$-point DFT is given by
$$U_k = \sum_{j=0}^{N-1} ce^{-2\pi ijk/N} = c\sum_{j=0}^{N-1}\left(e^{-2\pi ik/N}\right)^j.$$
Now recall that the sum of a finite geometric series is, for $r \ne 1$,
$$\sum_{j=0}^{N-1} r^j = \frac{1 - r^N}{1 - r}. \tag{15.17}$$
Here $r = e^{-2\pi ik/N}$, and $r \ne 1$ exactly when $k$ is not an integer multiple of $N$. For such $k$, applying equation (15.17) gives
$$U_k = c\,\frac{1 - \left(e^{-2\pi ik/N}\right)^N}{1 - e^{-2\pi ik/N}} = c\,\frac{1 - e^{-2\pi ik}}{1 - e^{-2\pi ik/N}} = 0,$$
because, for any integer $k$,
$$e^{-2\pi ik} = \cos(2\pi k) - i\sin(2\pi k) = 1.$$
If $k$ is an integer multiple of $N$, every term of the sum equals $c$, so $U_k = cN$. Thus, apart from these multiples of $N$, the $N$-point DFT of a constant sequence of $N$ numbers is a sequence of zeros.
EXAMPLE 15.22

Let $a$ be a complex number and $N$ a positive integer. To avoid trivialities, suppose $a$ is not an integer multiple of $\pi$. We will find the $N$-point DFT of the sequence $u = \{\sin(ja)\}_{j=0}^{N-1}$. Denoting this transform by the upper case letter, we have
$$U_k = \sum_{j=0}^{N-1}\sin(ja)e^{-2\pi ijk/N}.$$
Use the fact that
$$\sin(ja) = \frac{1}{2i}\left(e^{ija} - e^{-ija}\right)$$
to write
$$U_k = \frac{1}{2i}\sum_{j=0}^{N-1} e^{ija}e^{-2\pi ijk/N} - \frac{1}{2i}\sum_{j=0}^{N-1} e^{-ija}e^{-2\pi ijk/N} = \frac{1}{2i}\sum_{j=0}^{N-1}\left(e^{ia - 2\pi ik/N}\right)^j - \frac{1}{2i}\sum_{j=0}^{N-1}\left(e^{-ia - 2\pi ik/N}\right)^j.$$
Upon using equation (15.17) on each sum, we have
$$U_k = \frac{1}{2i}\,\frac{1 - \left(e^{ia - 2\pi ik/N}\right)^N}{1 - e^{ia - 2\pi ik/N}} - \frac{1}{2i}\,\frac{1 - \left(e^{-ia - 2\pi ik/N}\right)^N}{1 - e^{-ia - 2\pi ik/N}}$$
$$= \frac{1}{2i}\,\frac{1 - e^{iaN}e^{-2\pi ik}}{1 - e^{ia - 2\pi ik/N}} - \frac{1}{2i}\,\frac{1 - e^{-iaN}e^{-2\pi ik}}{1 - e^{-ia - 2\pi ik/N}}$$
$$= \frac{1}{2i}\,\frac{1 - e^{iaN}}{1 - e^{ia - 2\pi ik/N}} - \frac{1}{2i}\,\frac{1 - e^{-iaN}}{1 - e^{-ia - 2\pi ik/N}}, \tag{15.18}$$
since $e^{-2\pi ik} = 1$.

To make the example more explicit, suppose $N = 5$ and $a = \sqrt{2}$. Then the given sequence $u$ is
$$u_0 = 0,\quad u_1 = \sin\sqrt{2},\quad u_2 = \sin 2\sqrt{2},\quad u_3 = \sin 3\sqrt{2},\quad u_4 = \sin 4\sqrt{2}.$$
The 5-point DFT $U$ has $k$th term
$$U_k = \frac{1}{2i}\,\frac{1 - e^{5i\sqrt{2}}}{1 - e^{i\sqrt{2} - 2\pi ik/5}} - \frac{1}{2i}\,\frac{1 - e^{-5i\sqrt{2}}}{1 - e^{-i\sqrt{2} - 2\pi ik/5}}.$$
For example,
$$U_0 = \frac{1}{2i}\,\frac{1 - e^{5i\sqrt{2}}}{1 - e^{i\sqrt{2}}} - \frac{1}{2i}\,\frac{1 - e^{-5i\sqrt{2}}}{1 - e^{-i\sqrt{2}}} = \frac{\sin 4\sqrt{2} + \sin\sqrt{2} - \sin 5\sqrt{2}}{2 - 2\cos\sqrt{2}},$$
$$U_1 = \frac{1}{2i}\,\frac{1 - e^{5i\sqrt{2}}}{1 - e^{i\sqrt{2} - 2\pi i/5}} - \frac{1}{2i}\,\frac{1 - e^{-5i\sqrt{2}}}{1 - e^{-i\sqrt{2} - 2\pi i/5}},$$
and
$$U_2 = \frac{1}{2i}\,\frac{1 - e^{5i\sqrt{2}}}{1 - e^{i\sqrt{2} - 4\pi i/5}} - \frac{1}{2i}\,\frac{1 - e^{-5i\sqrt{2}}}{1 - e^{-i\sqrt{2} - 4\pi i/5}}.$$
We will develop some properties of this transform.
15.7.1 Linearity and Periodicity

If $u = \{u_j\}_{j=0}^{N-1}$ and $v = \{v_j\}_{j=0}^{N-1}$ are sequences of complex numbers, and $a$ and $b$ are complex numbers, then
$$au + bv = \{au_j + bv_j\}_{j=0}^{N-1}.$$
Linearity of the $N$-point DFT is the property
$$\mathcal{D}[au + bv](k) = aU_k + bV_k.$$
This follows immediately from the definition of the transform, since
$$\mathcal{D}[au + bv](k) = \sum_{j=0}^{N-1}(au_j + bv_j)e^{-2\pi ijk/N} = a\sum_{j=0}^{N-1} u_j e^{-2\pi ijk/N} + b\sum_{j=0}^{N-1} v_j e^{-2\pi ijk/N} = aU_k + bV_k.$$

Next we will show that the $N$-point DFT is periodic of period $N$. This means that, if the given sequence is $u = \{u_j\}_{j=0}^{N-1}$, then for any integer $k$,
$$U_{k+N} = U_k.$$
This can be seen in the DFT calculated in Example 15.22. In equation (15.18), replace $k$ by $k + N$. In this example, this change shows up only in the term $e^{ia - 2\pi ik/N}$ in the denominator. But if $k$ is replaced by $k + N$ in this exponential, no change results, since
$$e^{ia - 2\pi i(k+N)/N} = e^{ia}e^{-2\pi ik/N}e^{-2\pi i} = e^{ia}e^{-2\pi ik/N} = e^{ia - 2\pi ik/N}.$$
The argument in general proceeds as follows:
$$U_{k+N} = \sum_{j=0}^{N-1} u_j e^{-2\pi ij(k+N)/N} = \sum_{j=0}^{N-1} u_j e^{-2\pi ijk/N}e^{-2\pi ij} = \sum_{j=0}^{N-1} u_j e^{-2\pi ijk/N} = U_k,$$
since $e^{-2\pi ij} = 1$.
15.7.2 The Inverse N-Point DFT

Suppose we have an $N$-point DFT
$$U_k = \sum_{j=0}^{N-1} u_j e^{-2\pi ijk/N}$$
of a sequence $\{u_j\}_{j=0}^{N-1}$ of $N$ numbers. We claim that
$$u_j = \frac{1}{N}\sum_{k=0}^{N-1} U_k e^{2\pi ijk/N} \quad \text{for } j = 0, 1, \ldots, N - 1. \tag{15.19}$$
Because this expression retrieves the original $N$-point sequence from its discrete transform, equation (15.19) is called the inverse $N$-point discrete Fourier transform.

To verify equation (15.19), it is convenient to put $W = e^{-2\pi i/N}$. Then
$$W^N = 1 \quad \text{and} \quad W^{-1} = e^{2\pi i/N}.$$
Now write
$$\frac{1}{N}\sum_{k=0}^{N-1} U_k e^{2\pi ijk/N} = \frac{1}{N}\sum_{k=0}^{N-1} U_k W^{-jk} = \frac{1}{N}\sum_{k=0}^{N-1}\left(\sum_{r=0}^{N-1} u_r e^{-2\pi irk/N}\right)W^{-jk} = \frac{1}{N}\sum_{k=0}^{N-1}\left(\sum_{r=0}^{N-1} u_r W^{rk}\right)W^{-jk} = \frac{1}{N}\sum_{r=0}^{N-1} u_r\sum_{k=0}^{N-1} W^{rk}W^{-jk}. \tag{15.20}$$
In the last summation, observe that
$$W^{rk}W^{-jk} = e^{-2\pi irk/N}e^{2\pi ijk/N} = e^{-2\pi i(r - j)k/N} = W^{(r-j)k}.$$
For given $j$, if $r \ne j$ (with $0 \le r, j \le N - 1$, so that $W^{r-j} \ne 1$), then by equation (15.17) for the finite sum of a geometric series,
$$\sum_{k=0}^{N-1} W^{rk}W^{-jk} = \sum_{k=0}^{N-1}\left(W^{r-j}\right)^k = \frac{1 - \left(W^{r-j}\right)^N}{1 - W^{r-j}} = 0,$$
because $\left(W^{r-j}\right)^N = e^{-2\pi i(r-j)} = 1$. But if $r = j$, then
$$\sum_{k=0}^{N-1} W^{jk}W^{-jk} = \sum_{k=0}^{N-1} 1 = N.$$
Therefore, in the last double sum in equation (15.20), we need retain only the term with $r = j$ in the summation with respect to $r$, yielding
$$\frac{1}{N}\sum_{r=0}^{N-1} u_r\sum_{k=0}^{N-1} W^{rk}W^{-jk} = \frac{1}{N}\,u_j\sum_{k=0}^{N-1} W^{jk}W^{-jk} = \frac{1}{N}\,u_j N = u_j,$$
verifying equation (15.19).
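The inversion formula (15.19) is easy to confirm numerically with a round trip through the transform; this sketch is our own illustration (names are assumptions):

```python
import cmath

def dft(u):
    N = len(u)
    return [sum(u[j] * cmath.exp(-2j * cmath.pi * j * k / N) for j in range(N))
            for k in range(N)]

def inverse_dft(U):
    """Equation (15.19): u_j = (1/N) * sum_k U_k e^{+2*pi*i*j*k/N}."""
    N = len(U)
    return [sum(U[k] * cmath.exp(2j * cmath.pi * j * k / N) for k in range(N)) / N
            for j in range(N)]

u = [3.0, -1.0, 2.5, 0.0, 1.0]
recovered = inverse_dft(dft(u))   # matches u up to rounding error
```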
15.7.3 DFT Approximation of Fourier Coefficients

We began this section by defining the $N$-point DFT so that Riemann sums approximating the Fourier coefficients of a periodic function were $1/N$ times the $N$-point DFT of the sequence of function values at partition points of the interval. We will now pursue more closely the idea of approximating Fourier coefficients by a discrete Fourier transform, with the idea of sampling partial sums of Fourier series. This approximation also allows the application of DFT software to the approximation of Fourier coefficients.

We will consider a specific example, $f(t) = \sin(t)$ for $0 \le t < 4$, with the understanding that $f$ is extended over the entire real line with period 4. A graph of part of $f$ is shown in Figure 15.21. With $p = 4$, the Fourier coefficients are
$$d_k = \frac{1}{4}\int_0^4 \sin(\xi)e^{-2\pi ik\xi/4}\,d\xi = \frac{1}{4}\int_0^4 \sin(\xi)e^{-\pi ik\xi/2}\,d\xi = \frac{\cos(4) - 1}{\pi^2k^2 - 4} + i\,\frac{\pi k\sin(4)}{2(\pi^2k^2 - 4)} \tag{15.21}$$
for $k = 0, \pm 1, \pm 2, \ldots$

FIGURE 15.21 $f(t) = \sin(t)$ for $0 \le t < 4$, extended periodically over the real line.

Now let $N$ be a positive integer and subdivide $[0, 4]$ into $N$ subintervals of equal length $4/N$. These subintervals are $[4j/N, 4(j+1)/N]$ for $j = 0, 1, \ldots, N - 1$. Form $N$ numbers by evaluating $f(t)$ at the left end point of each of these subintervals. These points are $4j/N$, so we obtain the $N$-point sequence
$$u = \left\{\sin\left(\frac{4j}{N}\right)\right\}_{j=0}^{N-1}.$$
Form the $N$-point DFT of this sequence:
$$U_k = \sum_{j=0}^{N-1}\sin\left(\frac{4j}{N}\right)e^{-2\pi ijk/N}.$$
Then
$$\frac{1}{N}U_k = \frac{1}{N}\sum_{j=0}^{N-1}\sin\left(\frac{4j}{N}\right)e^{-2\pi ijk/N}$$
is a Riemann sum for the integral defining $d_k$. We ask: to what extent does $(1/N)U_k$ approximate $d_k$?

In this example we have an explicit expression (15.21) for $d_k$. We will explicitly evaluate $(1/N)U_k$, using $a = 4/N$ in the DFT of $\{\sin(ja)\}_{j=0}^{N-1}$ determined in Example 15.22. This gives us
$$\frac{1}{N}U_k = \frac{1}{N}\left(\frac{1}{2i}\,\frac{1 - e^{4i}}{1 - e^{4i/N - 2\pi ik/N}} - \frac{1}{2i}\,\frac{1 - e^{-4i}}{1 - e^{-4i/N - 2\pi ik/N}}\right).$$
Now approximate the exponential terms in the denominators by using the approximation $e^x \approx 1 + x$ for $|x| \ll 1$. Then
$$\frac{1}{N}U_k \approx \frac{1}{N}\left(\frac{1}{2i}\,\frac{1 - e^{4i}}{1 - (1 + 4i/N - 2\pi ik/N)} - \frac{1}{2i}\,\frac{1 - e^{-4i}}{1 - (1 - 4i/N - 2\pi ik/N)}\right)$$
$$= -\frac{1}{4}\left(\frac{1 - e^{4i}}{\pi k - 2} - \frac{1 - e^{-4i}}{\pi k + 2}\right)$$
$$= -\frac{1}{4}\,\frac{1}{\pi^2k^2 - 4}\left[4 - \pi k\left(e^{4i} - e^{-4i}\right) - 2\left(e^{4i} + e^{-4i}\right)\right]$$
$$= -\frac{1}{4}\,\frac{1}{\pi^2k^2 - 4}\left[4 - 2\pi ik\sin(4) - 4\cos(4)\right]$$
$$= \frac{\cos(4) - 1}{\pi^2k^2 - 4} + i\,\frac{\pi k\sin(4)}{2(\pi^2k^2 - 4)} = d_k.$$
The approximation $e^x \approx 1 + x$ has therefore led to an approximate expression for $(1/N)U_k$ that is exactly equal to $d_k$. This approximation cannot be valid for all $k$, however. First, the approximation used for $e^x$ assumes that $|x| \ll 1$, and second, the $N$-point DFT is periodic of period $N$, so $U_{k+N} = U_k$, while there is no such periodicity in the $d_k$'s. In general it would be very difficult to derive an estimate on relative sizes of $k$ and $N$ that would result in $(1/N)U_k$ approximating $d_k$ to within a given tolerance, and which would hold for a reasonable class of functions. However, for many science and engineering applications, the empirical rule $|k| \le N/8$ has proved effective.
SECTION 15.7
PROBLEMS

In each of Problems 1 through 6, compute $\mathcal{D}[u](k)$ for $k = 0, \pm 1, \ldots, \pm 4$ for the given sequence $u$.

1. $\{\cos(j)\}_{j=0}^{5}$
2. $\{e^{ij}\}_{j=0}^{5}$
3. $\{1/(j+1)\}_{j=0}^{5}$
4. $\{1/(j+1)^2\}_{j=0}^{5}$
5. $\{j^2\}_{j=0}^{5}$
6. $\{\cos(j) - \sin(j)\}_{j=0}^{4}$

In each of Problems 7 through 12, a sequence $\{U_k\}_{k=0}^{N-1}$ is given. Determine the $N$-point inverse discrete Fourier transform of this sequence.

7. $U_k = 1 + ik$, $N = 6$
8. $U_k = i^{-k}$, $N = 5$
9. $U_k = e^{-ik}$, $N = 7$
10. $U_k = k^2$, $N = 5$
11. $U_k = \cos(k)$, $N = 5$
12. $U_k = \ln(k+1)$, $N = 6$

In each of Problems 13 through 16, compute the first seven complex Fourier coefficients $d_0$, $d_{\pm 1}$, $d_{\pm 2}$ and $d_{\pm 3}$ of $f$ (see Section 14.7). Then use the DFT to approximate these coefficients, with $N = 128$.

13. $f(t) = \cos(t)$ for $0 \le t \le 2$, $f$ has period 2
14. $f(t) = e^{-t}$ for $0 \le t \le 3$, $f$ has period 3
15. $f(t) = t^2$ for $0 \le t \le 4$, $f$ has period 4
16. $f(t) = te^{2t}$ for $0 \le t \le 1$, $f$ has period 1

15.8
Sampled Fourier Series

In the preceding subsection we discussed approximation of the Fourier coefficients of a periodic function $f$. This was done by approximating terms of an $N$-point discrete Fourier transform formed by sampling $f(t)$ at $N$ points of $[0, p]$. We will now discuss the use of an inverse DFT to approximate sampled partial sums of the Fourier series of a periodic function (that is, partial sums evaluated at chosen points).

Consider the partial sum
$$S_M(t) = \sum_{k=-M}^{M} d_k e^{2\pi ikt/p}.$$
Subdivide $[0, p]$ into $N$ subintervals, and choose sample points $t_j = jp/N$ for $j = 0, 1, \ldots, N - 1$. Form the $N$-point sequence $u = \{f(jp/N)\}_{j=0}^{N-1}$ and approximate
$$d_k \approx \frac{1}{N}U_k, \quad \text{where } U_k = \sum_{j=0}^{N-1} f(jp/N)e^{-2\pi ijk/N}.$$
In order to have $|k| \le N/8$, as mentioned at the end of the preceding subsection, we will require that $M \le N/8$. Thus,
$$S_M(t) \approx \frac{1}{N}\sum_{k=-M}^{M} U_k e^{2\pi ikt/p}.$$
In particular, if we sample this partial sum at the partition points $jp/N$, then
$$S_M(jp/N) \approx \frac{1}{N}\sum_{k=-M}^{M} U_k e^{2\pi ijk/N}.$$
We will show that the sum on the right is actually an $N$-point inverse DFT for a particular $N$-point sequence, which we will now determine. We will exploit the periodicity of the $N$-point DFT, that is, $U_{k+N} = U_k$ for all integers $k$. Write
$$S_M(jp/N) \approx \frac{1}{N}\sum_{k=-M}^{-1} U_k e^{2\pi ijk/N} + \frac{1}{N}\sum_{k=0}^{M} U_k e^{2\pi ijk/N}$$
$$= \frac{1}{N}\sum_{k=1}^{M} U_{-k}e^{-2\pi ijk/N} + \frac{1}{N}\sum_{k=0}^{M} U_k e^{2\pi ijk/N}$$
$$= \frac{1}{N}\sum_{k=1}^{M} U_{-k+N}e^{2\pi ij(-k+N)/N} + \frac{1}{N}\sum_{k=0}^{M} U_k e^{2\pi ijk/N}$$
$$= \frac{1}{N}\sum_{k=N-M}^{N-1} U_k e^{2\pi ijk/N} + \frac{1}{N}\sum_{k=0}^{M} U_k e^{2\pi ijk/N}. \tag{15.22}$$
In these summations, we use the $2M + 1$ numbers
$$U_{N-M}, \ldots, U_{N-1}, U_0, \ldots, U_M.$$
Since $M < N/8$, we must fill in other values to obtain an $N$-point sequence. One way to do this is to fill in the other places with zeros. Thus define
$$V_k = \begin{cases} U_k & \text{for } k = 0, 1, \ldots, M \\ 0 & \text{for } k = M + 1, \ldots, N - M - 1 \\ U_k & \text{for } k = N - M, \ldots, N - 1. \end{cases}$$
Then the $M$th partial sum of the Fourier series of $f$, sampled at $jp/N$, is approximated by
$$S_M(jp/N) \approx \frac{1}{N}\sum_{k=0}^{N-1} V_k e^{2\pi ijk/N}.$$
EXAMPLE 15.23
Let $f(t) = t$ for $0 \le t < 2$, and extend $f$ over the entire real line with period 2. Part of the graph of $f$ is shown in Figure 15.22. The Fourier coefficients of $f$ are
$$d_k = \frac{1}{2}\int_0^2 te^{-2\pi ikt/2}\,dt = \begin{cases} \dfrac{i}{\pi k} & \text{for } k \ne 0 \\ 1 & \text{for } k = 0, \end{cases}$$
and the complex Fourier series is
$$1 + \sum_{k=-\infty,\,k\ne 0}^{\infty} \frac{i}{\pi k}\,e^{\pi ikt}.$$
This converges to $t$ on $0 < t < 2$ and on periodic extensions of this interval. The $M$th partial sum is
$$S_M(t) = 1 + \sum_{k=-M,\,k\ne 0}^{M} \frac{i}{\pi k}\,e^{\pi ikt}.$$
To be specific, choose $N = 2^7 = 128$ and $M = 10$, so $M \le N/8$. Sample the partial sum at points $jp/N = j/64$ for $j = 0, 1, \ldots, 127$. Then
$$u = \left\{f\left(\frac{jp}{N}\right)\right\}_{j=0}^{N-1} = \left\{\frac{j}{64}\right\}_{j=0}^{127}.$$
The 128-point DFT of $u$ has $k$th term
$$U_k = \sum_{j=0}^{127}\frac{j}{64}\,e^{-\pi ijk/64}.$$
Define
$$V_k = \begin{cases} U_k & \text{for } k = 0, 1, \ldots, 10 \\ 0 & \text{for } k = 11, \ldots, 117 \\ U_k & \text{for } k = 118, \ldots, 127. \end{cases}$$
FIGURE 15.22 $f(t) = t$ for $0 \le t < 2$, periodically extended over the real line.
Then
$$S_{10}(jp/N) = S_{10}(j/64) = 1 + \sum_{k=-10,\,k\ne 0}^{10}\frac{i}{\pi k}\,e^{\pi ijk/64} \approx \frac{1}{128}\sum_{k=0}^{127} V_k e^{\pi ijk/64}. \tag{15.23}$$

In understanding this discussion of approximation of sampled partial sums of a Fourier series, it is worthwhile to see the numbers actually play out in an example. We will do the computation of $S_{10}(1/2)$, and then of the approximation (15.23) with $j = 32$. First,
$$S_{10}(1/2) = 1 + \sum_{k=-10,\,k\ne 0}^{10}\frac{i}{\pi k}\,e^{\pi ik/2} = 1 - \frac{2}{\pi}\left(1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9}\right) = 0.46847.$$
Now we must compute the $V_k$'s. For these, we need the numbers
$$U_0 = \sum_{j=0}^{127}\frac{j}{64} = 127, \quad U_1 = \sum_{j=0}^{127}\frac{j}{64}\,e^{-\pi ij/64} = -1.0 + 40.735i,$$
$$U_2 = \sum_{j=0}^{127}\frac{j}{64}\,e^{-\pi ij/32} = -1.0 + 20.355i, \quad U_3 = -1.0 + 13.557i,$$
$$U_4 = -1.0 + 10.153i, \quad U_5 = -1.0 + 8.1078i, \quad U_6 = -1.0 + 6.7415i, \quad U_7 = -1.0 + 5.7631i,$$
$$U_8 = -1.0 + 5.0273i, \quad U_9 = -1.0 + 4.4532i, \quad U_{10} = -1.0 + 3.9922i,$$
$$U_{118} = -1.0 - 3.9922i, \quad U_{119} = -1.0 - 4.4532i, \quad U_{120} = -1.0 - 5.0273i, \quad U_{121} = -1.0 - 5.7631i,$$
$$U_{122} = -1.0 - 6.7415i, \quad U_{123} = -1.0 - 8.1078i, \quad U_{124} = -1.0 - 10.153i, \quad U_{125} = -1.0 - 13.557i,$$
$$U_{126} = -1.0 - 20.355i, \quad U_{127} = -1.0 - 40.735i.$$
Now compute (with $j = 32$, so that $e^{\pi ijk/64} = e^{\pi ik/2}$)
$$\sum_{k=0}^{127} V_k e^{\pi ik/2} = 127 + (-1.0 + 40.735i)e^{\pi i/2} + (-1.0 + 20.355i)e^{\pi i} + (-1.0 + 13.557i)e^{3\pi i/2} + \cdots + (-1.0 - 20.355i)e^{63\pi i} + (-1.0 - 40.735i)e^{127\pi i/2} = 61.048.$$
Then
$$\frac{1}{128}\sum_{k=0}^{127} V_k e^{\pi ijk/64} = 0.47694.$$
This gives the 128-point DFT approximation 0.47694 to the sampled partial sum $S_{10}(1/2)$, which we computed to be 0.46847. The difference is about 0.0085. The actual sum of the complex Fourier series at $t = 1/2$ is $f(1/2) = 0.50000$. In practice, we would obtain greater accuracy by using much larger $N$ (allowing larger $M$), and a software routine to do the computations.
15.8.1
Approximation of a Fourier Transform by an N -Point DFT
We will show how the discrete Fourier transform can be used to approximate the Fourier transform of a function, under certain conditions. Suppose, to begin, that $\hat{f}$ can be approximated to within some acceptable tolerance by an integral over a finite interval:
$$\hat{f}(\omega) = \int_{-\infty}^{\infty} f(\xi)e^{-i\omega\xi}\,d\xi \approx \int_0^{2L} f(\xi)e^{-i\omega\xi}\,d\xi.$$
Here we have written the length of the interval as $2L$ for a reason that will reveal itself shortly. Subdivide $[0, 2L]$ into $N$ subintervals of length $2L/N$ and choose partition points $\xi_j = 2jL/N$ for $j = 0, 1, \ldots, N$. We can then approximate the integral on the right by a Riemann sum, obtaining
$$\hat{f}(\omega) \approx \sum_{j=0}^{N-1} f(2jL/N)e^{-2ij\omega L/N}\,\frac{2L}{N} = \frac{2L}{N}\sum_{j=0}^{N-1} f(2jL/N)e^{-2ij\omega L/N}.$$
The sum on the right is nearly in the form of a DFT. If we put $\omega = \pi k/L$, with $k$ any integer, then we have
$$\hat{f}(\pi k/L) \approx \frac{2L}{N}\sum_{j=0}^{N-1} f(2jL/N)e^{-2\pi ijk/N}. \tag{15.24}$$
This gives $\hat{f}(\pi k/L)$, the Fourier transform of $f$ sampled at points $\pi k/L$, approximated by $2L/N$ times the $N$-point DFT of the sequence
$$\left\{f\left(\frac{2jL}{N}\right)\right\}_{j=0}^{N-1}.$$
As noted previously, the DFT is periodic of period $N$, while $\hat{f}(\pi k/L)$ is not, so we again make the restriction that $|k| \le N/8$.
EXAMPLE 15.24
We will test the approximation (15.24) for a simple case. Let ft =
e−t 0
for t ≥ 0 for t < 0
Then f has Fourier transform fˆ = =
− 0
fe−i d e− e−i d =
1 − i 1 + 2
Choose L = 1, N = 27 = 128 and k = 3 (keep in mind that we want k ≤ N/8). Now k/L = 3 and 127 2 fˆk/L = fˆ3 ≈ e−j/64 e−6ij/128 128 j=0
=
127 e−j/64 e−3ij/64 = 0 124 51 − 0 298 84i 64 j=0
For comparison,

f̂(3) = (1 − 3i)/10 = 0.1 − 0.3i

Suppose we try a larger N, say N = 2⁹ = 512. Now

f̂(3) ≈ (2π/512) Σ_{j=0}^{511} e^{−2πj/512} e^{−6πij/512} = (π/256) Σ_{j=0}^{511} e^{−πj/256} e^{−3πij/256} = 0.10595 − 0.2994i,

a better approximation than that obtained with N = 128.
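The numbers in this example are easy to reproduce. The following Python sketch (the function name dft_transform_approx is ours, not the text's) implements equation (15.24) for this f and compares the N = 128 and N = 512 approximations of f̂(3) with the exact value (1 − 3i)/10:

```python
import cmath
import math

def dft_transform_approx(f, L, N, k):
    # (2L/N) * sum_{j=0}^{N-1} f(2jL/N) e^{-2*pi*i*j*k/N}, equation (15.24)
    total = sum(f(2 * n * L / N) * cmath.exp(-2j * math.pi * n * k / N)
                for n in range(N))
    return (2 * L / N) * total

def f(t):
    return math.exp(-t) if t >= 0 else 0.0

exact = (1 - 3j) / 10                       # f^(3) = (1 - 3i)/(1 + 9)
a128 = dft_transform_approx(f, math.pi, 128, 3)
a512 = dft_transform_approx(f, math.pi, 512, 3)
print(a128)   # close to 0.12451 - 0.29884i
print(a512)   # close to 0.10595 - 0.2994i, nearer the exact 0.1 - 0.3i
```

As the example states, increasing N moves the approximation toward the exact transform value.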
EXAMPLE 15.25
We will continue from the preceding example. There the emphasis was on detailing the idea of approximating a value of f̂. Now we will use the same function, but carry out the approximation at enough points to sketch approximate graphs of Re(f̂), Im(f̂) and |f̂|. Using L = 4π and N = 2⁸ = 256, we obtain the approximation

f̂(k/4) ≈ (π/32) Σ_{j=0}^{255} e^{−πj/32} e^{−πijk/128}
We should have |k| ≤ N/8 = 32, although we will only compute approximate values of f̂(k/4) for k = 1, …, 13. Because in this example we can compute f̂ exactly, these values are included in the table to allow comparison.

                    DFT approx. of f̂         f̂
k = 1   f̂(1/4)    0.99107 − 0.23509i    0.94118 − 0.23529i
k = 2   f̂(1/2)    0.84989 − 0.3996i     0.8 − 0.4i
k = 3   f̂(3/4)    0.68989 − 0.4794i     0.64 − 0.48i
k = 4   f̂(1)      0.54989 − 0.4992i     0.5 − 0.5i
k = 5   f̂(5/4)    0.44013 − 0.4868i     0.39024 − 0.4878i
k = 6   f̂(3/2)    0.35758 − 0.46033i    0.3077 − 0.4615i
k = 7   f̂(7/4)    0.29605 − 0.42936i    0.24615 − 0.43077i
k = 8   f̂(2)      0.24989 − 0.39839i    0.2 − 0.4i
k = 9   f̂(9/4)    0.21484 − 0.36933i    0.16495 − 0.37113i
k = 10  f̂(5/2)    0.18782 − 0.34282i    0.13793 − 0.34483i
k = 11  f̂(11/4)   0.16668 − 0.31896i    0.11679 − 0.32117i
k = 12  f̂(3)      0.14989 − 0.29759i    0.1 − 0.3i
k = 13  f̂(13/4)   0.13638 − 0.27847i    0.086486 − 0.28108i
The real part of f̂ is consistently approximated in this scheme with an error of about 0.05, while the imaginary part is approximated in many cases with an error of about 0.002. Improved accuracy can be achieved by choosing N larger. In Figures 15.23, 15.24 and 15.25, the approximate values of Re(f̂), Im(f̂) and |f̂|, respectively, are compared with the values obtained from the exact expression for f̂. The squares represent approximate values, and the dots are actual values. In Figure 15.24 the approximation is sufficiently close that the points are nearly indistinguishable (within the resolution of the diagram).
FIGURE 15.23 Comparison of the DFT approximation of Re(f̂) with actual values for f(t) = e^{−t} for t ≥ 0, f(t) = 0 for t < 0.
FIGURE 15.24 Comparison of the DFT approximation of Im(f̂) with actual values for f(t) = e^{−t} for t ≥ 0, f(t) = 0 for t < 0.
Thus far the discussion has centered on functions f for which f̂(ω) can be approximated by an integral ∫_0^{2L} f(ξ)e^{−iωξ} dξ. We can extend this idea to the case that f̂(ω) is approximated by an integral ∫_{−L}^{L} f(ξ)e^{−iωξ} dξ, over a symmetric interval of length 2L:

f̂(ω) ≈ ∫_{−L}^{L} f(ξ)e^{−iωξ} dξ

Then,

f̂(kπ/L) ≈ ∫_{−L}^{L} f(ξ)e^{−ikπξ/L} dξ = ∫_{−L}^{0} f(ξ)e^{−ikπξ/L} dξ + ∫_0^{L} f(ξ)e^{−ikπξ/L} dξ
FIGURE 15.25 Comparison of the DFT approximation of |f̂| with actual values for f(t) = e^{−t} for t ≥ 0, f(t) = 0 for t < 0.
Upon letting η = ξ + 2L in the first integral of the last line, we have

f̂(kπ/L) ≈ ∫_L^{2L} f(η − 2L)e^{−ikπ(η−2L)/L} dη + ∫_0^L f(ξ)e^{−ikπξ/L} dξ
         = ∫_L^{2L} f(η − 2L)e^{−ikπη/L} dη + ∫_0^L f(ξ)e^{−ikπξ/L} dξ,

since e^{2πik} = 1 if k is an integer. Write ξ for η as the variable of integration, obtaining

f̂(kπ/L) ≈ ∫_L^{2L} f(ξ − 2L)e^{−ikπξ/L} dξ + ∫_0^L f(ξ)e^{−ikπξ/L} dξ

Now define

g(t) = f(t) for 0 ≤ t < L,
g(t) = (1/2)[f(L) + f(−L)] for t = L,
g(t) = f(t − 2L) for L < t ≤ 2L.    (15.25)

Then

f̂(kπ/L) ≈ ∫_0^{2L} g(ξ)e^{−ikπξ/L} dξ = ∫_0^L g(2t)e^{−2ikπt/L} 2 dt   (let ξ = 2t)
         = 2 ∫_0^L g(2t)e^{−2ikπt/L} dt

Finally, approximate the last integral by a Riemann sum, subdividing [0, L] into N subintervals of length L/N and choosing t_j = jL/N for j = 0, 1, …, N − 1. Then

f̂(kπ/L) ≈ (2L/N) Σ_{j=0}^{N−1} g(2jL/N) e^{−2πijk/N}

As before, we assume in using this approximation that |k| ≤ N/8. This approximates f̂(kπ/L) by a constant multiple of the N-point DFT of the sequence {g(2jL/N)}_{j=0}^{N−1}, in which the points of the sequence are obtained from the function g manufactured from f according to equation (15.25).
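As a hedged illustration of this symmetric-interval scheme (the choices f(t) = e^{−|t|}, whose transform is 2/(1 + ω²), together with L = 8, N = 256 and k = 2, are ours), the following Python sketch builds g according to equation (15.25) and applies the DFT approximation:

```python
import cmath
import math

def g_from_f(f, L):
    # build g on [0, 2L] from f on [-L, L] as in equation (15.25)
    def g(t):
        if t < L:
            return f(t)
        if t == L:
            return 0.5 * (f(L) + f(-L))
        return f(t - 2 * L)
    return g

def symmetric_dft_approx(f, L, N, k):
    # (2L/N) times the k-th N-point DFT output of the samples g(2jL/N)
    g = g_from_f(f, L)
    total = sum(g(2 * n * L / N) * cmath.exp(-2j * math.pi * n * k / N)
                for n in range(N))
    return (2 * L / N) * total

f = lambda t: math.exp(-abs(t))      # known Fourier transform: 2/(1 + w^2)
L, N, k = 8.0, 256, 2                # note k <= N/8, as required
approx = symmetric_dft_approx(f, L, N, k)
w = k * math.pi / L
exact = 2.0 / (1.0 + w * w)
print(approx.real, exact)
```

Because f is even, the approximation comes out (essentially) real, and it agrees with the exact transform to a few parts in a thousand at this N.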
15.8.2 Filtering
A periodic signal f(t), of period 2L, is often filtered for the purpose of cancelling out, or diminishing, certain unwanted effects, or perhaps for emphasizing certain effects one wants to study. Suppose f(t) has complex Fourier series

Σ_{n=−∞}^{∞} d_n e^{nπit/L}   where   d_n = (1/2L) ∫_{−L}^{L} f(t)e^{−nπit/L} dt
Consider the Nth partial sum

S_N(t) = Σ_{j=−N}^{N} d_j e^{ijπt/L}

A filtered partial sum of the Fourier series of f is a sum of the form

Σ_{j=−N}^{N} Z(j/N) d_j e^{ijπt/L}    (15.26)
in which the filter function Z is a continuous, even function on [−1, 1]. In particular applications the object is to choose Z to serve some specific purpose. By way of introduction, we will illustrate filtering for a filter that actually forms a basic approach to the entire issue of convergence of Fourier series. In the nineteenth century, there was an intense effort to understand the subtleties of convergence of Fourier series. An example of du Bois-Reymond showed that it is possible for the Fourier series of a continuous function to diverge at a point. In the course of delving into the convergence question, it was observed that in many cases the sequence of averages of partial sums of a Fourier series is better behaved than the sequence of partial sums itself. This led to a consideration of averages of partial sums:

σ_N(t) = (1/N) Σ_{k=0}^{N−1} S_k(t) = (1/N)[S₀(t) + S₁(t) + ⋯ + S_{N−1}(t)]

The quantity σ_N(t) is called the Nth Cesàro sum of f, after the Italian mathematician who studied their properties. It was found that, if the partial sums of the Fourier series approach a particular limit at t, then σ_N(t) must approach the same limit as N → ∞, but not conversely. It is possible for the Cesàro sums to have a limit for some t, but for the Fourier series to diverge there. It was the 19-year-old prodigy Fejér who proved that, if f is periodic of period 2π and ∫₀^{2π} |f(t)| dt exists, then σ_N(t) → f(t) wherever f is continuous. This is a stronger result than holds for the partial sums of the Fourier series. With this as background, write

σ_N(t) = (1/N) Σ_{k=0}^{N−1} Σ_{j=−k}^{k} d_j e^{ijπt/L}

We leave it as an exercise for the student to show that the terms in this double sum can be rearranged to write

σ_N(t) = Σ_{n=−N}^{N} (1 − |n|/N) d_n e^{inπt/L}

This is of the form of equation (15.26) with the Cesàro filter function

Z(t) = 1 − |t| for −1 ≤ t ≤ 1

The sequence {Z(n/N)}_{n=−N}^{N} = {1 − |n|/N}_{n=−N}^{N} is called the sequence of filter factors for the Cesàro filter.
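The equality between σ_N(t) and the Cesàro-filtered partial sum is easy to verify numerically. In this hedged sketch (the coefficient formula d_n = 1/(1 + n²) is an arbitrary stand-in, chosen only to exercise the identity), averaging S₀, …, S_{N−1} agrees with applying the filter factors 1 − |n|/N directly:

```python
import cmath
import math

def d(n):
    # hypothetical Fourier coefficients, used only to exercise the identity
    return 1.0 / (1 + n * n)

def S(N, t, L):
    # N-th partial sum of sum_n d_n e^{i n pi t / L}
    return sum(d(n) * cmath.exp(1j * n * math.pi * t / L)
               for n in range(-N, N + 1))

def filtered(Z, N, t, L):
    # equation (15.26) with filter function Z
    return sum(Z(n / N) * d(n) * cmath.exp(1j * n * math.pi * t / L)
               for n in range(-N, N + 1))

cesaro = lambda u: 1 - abs(u)
N, t, L = 12, 0.7, 2.0
sigma = sum(S(k, t, L) for k in range(N)) / N    # average of S_0, ..., S_{N-1}
print(abs(filtered(cesaro, N, t, L) - sigma))    # essentially zero
```

The terms with |n| = N carry the factor 1 − N/N = 0, which is why the filtered sum over |n| ≤ N reproduces the average of the first N partial sums exactly.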
FIGURE 15.26 f(t) = −1 for −π ≤ t < 0, f(t) = 1 for 0 ≤ t < π, and f(t + 2π) = f(t) for all real t.
One effect of the Cesàro filter is to damp out the Gibbs phenomenon, which is seen in the convergence of the Fourier series of a function at a point of discontinuity. As an example that displays the Gibbs phenomenon very clearly, consider

f(t) = −1 for −π ≤ t < 0, f(t) = 1 for 0 ≤ t < π,

with periodic extension to the real line. Figure 15.26 shows a graph of this periodic extension. Its complex Fourier coefficients are

d₀ = (1/2π) ∫_{−π}^{0} (−1) dt + (1/2π) ∫_{0}^{π} dt = 0

and

d_n = (1/2π) ∫_{−π}^{0} (−e^{−nit}) dt + (1/2π) ∫_{0}^{π} e^{−nit} dt = i(−1 + (−1)ⁿ)/(nπ)

The Nth partial sum of this series is

S_N(t) = Σ_{n=−N, n≠0}^{N} [i(−1 + (−1)ⁿ)/(nπ)] e^{nit}

If N is odd, then

S_N(t) = (4/π)[sin(t) + (1/3) sin(3t) + (1/5) sin(5t) + ⋯ + (1/N) sin(Nt)]

The Nth Cesàro sum (with L = π) is

σ_N(t) = Σ_{n=−N}^{N} (1 − |n|/N) [i(−1 + (−1)ⁿ)/(nπ)] e^{int}

This can be written

σ_N(t) = −2 Σ_{n=1}^{N} (1 − n/N) [(−1 + (−1)ⁿ)/(nπ)] sin(nt)
Figure 15.27 shows graphs of S₁₀(t) and σ₁₀(t), and Figure 15.28 shows graphs of S₂₀(t) and σ₂₀(t). In the partial sums S_N(t), the Gibbs phenomenon is readily apparent near t = 0, where f has a jump discontinuity. Even though S_N(t) → f(t) for 0 < t < π and for −π < t < 0, the graphs of S_N(t) have relatively high peaks near zero which remain at nearly constant height even as N increases (although these peaks move toward the vertical axis as N increases). However, this phenomenon is not seen in the graphs of σ_N(t); the Cesàro filter accelerates and "smooths out" the convergence of the Fourier series.
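The damping the figures display can be checked numerically. The following Python sketch (an illustration of ours; the choice N = 21 and the 2000-point grid are arbitrary) evaluates the square wave's partial sum and Cesàro sum on (0, π) and compares their peaks:

```python
import math

def S(N, t):
    # partial sum of the square-wave series for odd N:
    # (4/pi)[sin t + sin 3t / 3 + ... + sin Nt / N]
    return (4 / math.pi) * sum(math.sin(n * t) / n for n in range(1, N + 1, 2))

def cesaro(N, t):
    # Cesaro sum: the same terms damped by the filter factors 1 - n/N
    return (4 / math.pi) * sum((1 - n / N) * math.sin(n * t) / n
                               for n in range(1, N, 2))

ts = [k * math.pi / 2000 for k in range(1, 2000)]
peak_S = max(S(21, t) for t in ts)
peak_C = max(cesaro(21, t) for t in ts)
print(peak_S, peak_C)   # the partial sum overshoots 1; the Cesaro sum does not
```

The partial-sum peak near the jump is roughly 18 percent above the function's value of 1 (the Gibbs overshoot), while the Cesàro sum stays strictly below 1.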
FIGURE 15.27 S₁₀(t) and σ₁₀(t) for the function of Figure 15.26.

FIGURE 15.28 S₂₀(t) and σ₂₀(t) for the function of Figure 15.26.
The Cesàro filter also damps the effects of the higher frequency terms in the Fourier series, because the Cesàro filter factor 1 − |n|/N tends to zero as |n| increases toward N. This effect is also seen in the graphs of the Cesàro sums. There are many filters that are used in signal analysis. Two of the more commonly encountered ones are the Hamming and Gauss filters. The Hamming filter is named for Richard Hamming, who was for many years a senior scientist and researcher at Bell Labs. It is given by

Z(t) = 0.54 + 0.46 cos(πt)

The filtered Nth partial sum of the complex Fourier series of f, using the Hamming filter, is

Σ_{n=−N}^{N} (0.54 + 0.46 cos(nπ/N)) d_n e^{nπit/L}
Another filter frequently used to filter out background noise in a signal is the Gauss filter, named for the nineteenth century mathematician and scientist Carl Friedrich Gauss. It is given by Zt = e−a t 2 2
in which a is a positive constant. The Gauss filtered partial sum of the complex Fourier series of f is N
e−a
2 n2 /N 2
dn enit/L
n=−N
Filtering is also applied to Fourier transforms. The filtered Fourier transform of f, using the filter function Z(t), is

∫_{−∞}^{∞} Z(ξ)f(ξ)e^{−iωξ} dξ

If this integral is approximated by an integral over a finite interval,

∫_{−∞}^{∞} Z(ξ)f(ξ)e^{−iωξ} dξ ≈ ∫_{−L}^{L} Z(ξ)f(ξ)e^{−iωξ} dξ,

then it is standard practice to approximate the integral on the right using a DFT. The Cesàro, Hamming and Gauss filters for this integral are, respectively,

Z(t) = 1 − |t|/L (Cesàro),

Z(t) = 0.54 + 0.46 cos(πt/L) (Hamming),
and Z(t) = e^{−a²t²/L²} (Gauss).
SECTION 15.8 PROBLEMS
In each of Problems 1 through 6, a function is given, having period p. Compute the complex Fourier series of the function, and then the 10th partial sum of this series at the indicated point t₀. Then, using N = 128, compute a DFT approximation to this partial sum at the point.

1. f(t) = 1 + t for 0 ≤ t ≤ 2, p = 2, t₀ = 1/2
2. f(t) = t² for 0 ≤ t ≤ 1, p = 1, t₀ = 1/8
3. f(t) = cos(πt) for 0 ≤ t ≤ 2, p = 2, t₀ = 1/8
4. f(t) = e^{−t} for 0 ≤ t ≤ 4, p = 4, t₀ = 1/4
5. f(t) = t³ for 0 ≤ t ≤ 1, p = 1, t₀ = 1/4
6. f(t) = t sin(πt) for 0 ≤ t ≤ 4, p = 4, t₀ = 1/8

In each of Problems 7 through 10, make a DFT approximation to the Fourier transform of f at the given point, using N = 512 and the given value of L.

7. f(t) = e^{−4t} for t ≥ 0, f(t) = 0 for t < 0; L = 3, f̂(4)
8. f(t) = cos(2t) for t ≥ 0, f(t) = 0 for t < 0; L = 6, f̂(2)
9. f(t) = te^{−2t} for t ≥ 0, f(t) = 0 for t < 0; L = 3, f̂(1/2)
10. f(t) = t² cos(t) for t ≥ 0, f(t) = 0 for t < 0; L = 4, f̂(π/4)

In each of Problems 11 through 14, use the DFT to approximate graphs of Re(f̂), Im(f̂) and |f̂| for 0 ≤ ω ≤ 3, using N = 256. For these functions, f̂ can be computed exactly. Graph each of the approximations of Re(f̂), Im(f̂) and |f̂| on the same set of axes with, respectively, the actual function itself.

11. f(t) = t[H(t − 1) − H(t − 2)]
12. f(t) = 2e^{−4|t|}
13. f(t) = H(t) − H(t − 1)
14. f(t) = e^t[H(t) − H(t − 2)]

In each of Problems 15 through 19, graph the function, the fifth partial sum of its Fourier series on the interval, and the fifth Cesàro sum, using the same set of axes. Repeat this process for the tenth and twenty-fifth partial sums. Notice in particular the graphs at points of discontinuity of the function, where the Gibbs phenomenon shows up in the partial sums.

15. f(t) = 1 for 0 ≤ t < 2, f(t) = −1 for −2 ≤ t < 0
16. f(t) = t² for −2 ≤ t < 1, f(t) = 2 + t for 1 ≤ t < 2
17. f(t) = −1 for −1 ≤ t < −1/2, f(t) = 0 for −1/2 ≤ t < 1/2, f(t) = 1 for 1/2 ≤ t < 1
18. f(t) = e^{−t} for −3 ≤ t < 1, f(t) = cos(πt) for 1 ≤ t < 3
19. f(t) = 2 + t for −1 ≤ t < 0, f(t) = 7 for 0 < t < 1

20. Let f(t) = 1 for 0 ≤ t < 2, f(t) = −1 for −2 ≤ t < 0. Plot the fifth partial sum of the Fourier series for f(t) on [−2, 2], together with the fifth Cesàro sum, the fifth Hamming filtered partial sum, and the fifth Gauss filtered partial sum on the same set of axes. Repeat this for the tenth sums, and the twenty-fifth sums.

21. Let f(t) = t for −2 ≤ t < 0, f(t) = 2 + t for 0 ≤ t < 2. Plot the fifth partial sum of the Fourier series for f(t) on [−2, 2], together with the fifth Cesàro sum, the fifth Hamming filtered partial sum, and the fifth Gauss filtered partial sum on the same set of axes. Repeat this for the tenth sums, and the twenty-fifth sums.
15.9 The Fast Fourier Transform

The discrete Fourier transform is a powerful tool for approximating Fourier coefficients, partial sums of Fourier series, and Fourier transforms. However, such a tool is only useful if there are efficient computing techniques for carrying out the large numbers of calculations involved in typical applications. This is where the Fast Fourier Transform, or FFT, comes in. The FFT is not a transform at all, but rather an efficient procedure for computing discrete Fourier transforms. Its impact in engineering and science over the past 35 years has been profound, because it makes the DFT a practical tool in analyzing data. The FFT first appeared formally in 1965 in a five-page paper, "An Algorithm for the Machine Calculation of Complex Fourier Coefficients", by James W. Cooley of IBM and John W. Tukey of Princeton University. The catalyst behind preparation and publication of the paper was Richard Garwin, a physicist who has consulted for federal agencies on questions involving weapons and defense policies. Garwin became aware that Tukey had developed an algorithm for computing Fourier transforms, a tool that Garwin needed for his own work. When Garwin took Tukey's ideas to the computer center at IBM Research in Yorktown Heights for the purpose of having them programmed, James Cooley was assigned to assist him. Because of the importance of an efficient method for computing Fourier transforms, word of Cooley's program quickly spread and it became so much in demand that the Cooley–Tukey paper resulted. After the paper's publication it was found that some of the concepts underlying the method, or similar to it, had already appeared in other contexts. Tukey himself has related that Phillip Rudnick of the Scripps Oceanographic Institute had reported programming a special case of the algorithm, using ideas from a paper by G. C. Danielson and Cornelius Lanczos.
Lanczos, a Hungarian-born physicist/mathematician whose career spanned many areas, had developed the essential ideas around 1938 and in the years following, when he was working on problems in numerical methods and Fourier analysis. Much earlier, Gauss had essentially discovered discrete Fourier analysis in calculating the orbit of Pallas, but of course there were no computers in the Napoleonic era. Today the FFT has become a standard part of certain instrumentation software. For example, FT-NMR, which stands for Fourier Transform-Nuclear Magnetic Resonance, uses the FFT as part of its data analysis system. The reason for this widespread use is the FFT's efficiency, which can be illustrated by a simple example. It can be shown that, if N is a positive integer power of 2, then f̂(kπ/L) as given by equation (15.24) can be computed using no more than 4N log₂ N arithmetic operations. If we simply compute all of the sums and products involved in computing f̂(kπ/L), we must perform N − 1 additions and N + 1 multiplications, each duplicated N times to get the approximations at N points. This is a total of N(N − 1) + N(N + 1) = 2N² operations. Suppose, to be specific, N = 2²⁰ = 1,048,576. Now 2N² ≈ 2.1990 × 10¹². If the computer we are using performs one million operations per second, this calculation will require about 2,199,023 seconds, or about 25.45 days of computer time. Since a given project may require computation of the Fourier transform of many functions, this is intolerable in terms of both time and money. By contrast, if N = 2ⁿ, then

4N log₂ N = 2^{n+2} log₂ 2ⁿ = n·2^{n+2}

With n = 20, this is 83,886,080 operations. At one million operations per second, this will take a little under 84 seconds, a very substantial improvement over 25.45 days.
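The arithmetic behind this comparison is easy to reproduce (the one-million-operations-per-second rate is, as in the text, only an assumed benchmark):

```python
import math

N = 2 ** 20
direct = 2 * N * N                     # N(N-1) + N(N+1) = 2N^2 operations
fast = 4 * N * int(math.log2(N))       # 4N log2 N = n * 2^(n+2) with n = 20

rate = 10 ** 6                         # assumed: one million operations per second
days = direct / rate / 86400
print(direct, round(days, 2))          # 2199023255552 25.45
print(fast, fast / rate)               # 83886080 operations, under 84 seconds
```

The ratio direct/fast is about 26,000, which is why the FFT turned the DFT from a curiosity into a practical tool.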
15.9.1 Use of the FFT in Analyzing Power Spectral Densities of Signals
The FFT is routinely used to display graphs of the power spectral densities of signals. For example, consider the relatively simple signal

f(t) = sin(2π(50)t) + 2 sin(2π(120)t) + sin(2π(175)t) + sin(2π(210)t)

f(t) is written in this way to make the frequencies of the components readily identifiable. By writing sin(100πt) as sin(2π(50)t), we immediately see that this component has frequency 50 Hz. Figure 15.29 shows a plot of the power spectral density versus frequency in Hz.

FIGURE 15.29 FFT display of the power spectral density graph of y = sin(100πt) + 2 sin(240πt) + sin(350πt) + sin(420πt).
Where is the FFT in this? It is in the software that produced the plot. For this example, the graph was drawn using MATLAB and an FFT with N = 2¹⁰ = 1024. Using the same program and choice of N, Figure 15.30 shows the power spectral density graph of

g(t) = cos(2π(25)t) + cos(2π(80)t) + cos(2π(125)t) + cos(2π(240)t) + cos(2π(315)t)

In both graphs the peaks occur at the primary frequencies of the function.

FIGURE 15.30 FFT display of the power spectral density graph of y = cos(50πt) + cos(160πt) + cos(250πt) + cos(480πt) + cos(630πt).
15.9.2 Filtering Noise From a Signal

The FFT is sometimes used to filter noise from a signal. We discussed filtering previously, but the FFT is the tool for actually carrying it out. To illustrate, consider the signal

f(t) = sin(2π(25)t) + sin(2π(80)t) + sin(2π(125)t) + sin(2π(240)t) + sin(2π(315)t)

This is a simple signal. However, the signal shown in Figure 15.31 corresponds more closely to reality, and was obtained from the graph of f(t) by introducing zero-mean random noise. If we did not know the original signal f(t), it would be difficult to identify from Figure 15.31 the main frequency components of f(t) because of the effect of the noise. However, the Fourier transform sorts out the frequencies. The power spectral density of the noisy signal of Figure 15.31 is shown in Figure 15.32, where the five main frequencies can be identified easily. This particular plot does not reliably give the amplitudes, but the frequencies stand out very well. Figure 15.32 was done using the FFT via MATLAB, with N = 2⁹ = 512.
FIGURE 15.31 A portion of the signal y = sin(50πt) + sin(160πt) + sin(250πt) + sin(480πt) + sin(630πt) corrupted with zero-mean random noise.
FIGURE 15.32 FFT calculation of the power spectral density of the signal of Figure 15.31.
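The experiment is easy to repeat without commercial software. The sketch below is our illustration, not the text's MATLAB session: it builds a small recursive radix-2 Cooley–Tukey FFT, assumes a 512 Hz sampling rate, and uses three of the five tones (25, 80 and 125 Hz) so that each falls exactly on a DFT bin. The three largest spectral peaks recover the tones despite the added noise:

```python
import cmath
import math
import random

def fft(x):
    # recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of 2
    N = len(x)
    if N == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / N) * odd[k] for k in range(N // 2)]
    return [even[k] + tw[k] for k in range(N // 2)] + \
           [even[k] - tw[k] for k in range(N // 2)]

random.seed(1)
N, fs = 512, 512.0                     # assumed sampling rate: 512 samples/second
tones = (25.0, 80.0, 125.0)            # three of the five component frequencies
sig = [sum(math.sin(2 * math.pi * f0 * n / fs) for f0 in tones)
       + random.gauss(0.0, 1.0)        # zero-mean random noise
       for n in range(N)]
power = [abs(c) ** 2 for c in fft([complex(s) for s in sig])[:N // 2]]
# bin k corresponds to k * fs / N = k Hz here; the largest peaks mark the tones
peaks = sorted(sorted(range(N // 2), key=power.__getitem__, reverse=True)[:3])
print(peaks)   # [25, 80, 125]
```

Each tone contributes a spectral power of (N/2)² = 65,536 at its bin, far above the noise floor, which is why the frequencies stand out even when the time-domain plot looks like noise.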
15.9.3 Analysis of the Tides in Morro Bay
We will use the DFT and FFT to analyze a set of tidal data, seeking correlations between high and low tides and the relative positions of the sun, earth, and moon. The forces that cause the tides were of great interest to Isaac Newton as he struggled to understand the world around him, and he devoted considerable space in the Principia to this topic. At one point, Newton required new tables of lunar positions from the then Astronomer Royal, Flamsteed, who, because of a busy schedule coupled with a personal feud with Newton, was not forthcoming with the data. Newton responded by exerting both professional and political pressure on Flamsteed, through his connections at court, finally forcing Flamsteed to publish the data at his own expense. Years later Flamsteed came into possession of the remaining copies of this book, and is reported to have given vent to his anger with Newton by burning every copy. It was a triumph of Newton's theory of gravitation, applied to the system consisting of the earth, moon and sun, that enabled Newton to account for two of the primary tides that occur each day. He was also able to explain why the tides have a twice-monthly maximum and minimum and why the extremes are greatest when the moon is farthest from the earth's equatorial plane. The elliptical orbit of the moon about the earth also accounts for the monthly variation in tide heights resulting from the change in the distance between the earth and moon throughout the month. Morro Bay is near San Luis Obispo in California. Extensive data have been collected as the Pacific Ocean rolls in and out of the bay and tides wash up on the shore. Figure 15.33 shows a curve drawn through data points giving hourly tide heights for May 1993. We will analyze this data to determine the primary forces causing these tidal variations. As a curiosity, comparison with Figure 2.12 even suggests the presence of beats in the periodic tide oscillation!
Before carrying out this analysis, we need some background information. The length of a solar day is 24 hours. This is the time it takes the earth to spin once relative to the sun. The lunar day is 50 minutes longer than this. It takes the earth about 24.8 hours to spin once relative to the moon because the moon is traveling in the direction of the earth’s rotation (Figure 15.34). The sun exerts its primary tidal forces at a point on the earth twice each day, and the moon, twice each 24-hour-and-50-minute period. It is fairly clear why the tide should have a local maximum at a particular location when either the sun or moon is nearly above that point. It is not as obvious, however, that the tide will also rise at a point when either of these bodies is on the opposite side of the earth, as is observed. Newton was able to show that, as the earth/moon system travels about its center of mass (which is always interior to the earth), the moon actually exerts an outward force on the opposite side of the earth. The same is true of the earth/sun system. Hence both the sun and moon cause two daily tides.
FIGURE 15.33 Tide profile in Morro Bay from hourly data collected May 1993 (height in 0.1-ft units versus hours).
FIGURE 15.34 The sun, moon, and earth at t = 0, t = 24 hr, and t = 24 hr 50 min.
Tidal forces are proportional to the product of the masses of the bodies involved, and inversely proportional to the cube of the distance between them. This enables us to determine the relative tidal forces of the moon and sun on the earth and its waters. Since the sun has a mass approximately 27 × 10⁶ times that of the moon and is about 390 times as far from the earth as the earth is from the moon, the sun's influence on the earth's tides is only about 0.46 times that of the moon's influence. The semidiurnal (twice daily) tides caused by the sun and moon do not just vary between the same highs and lows each day. Other forces change the amplitudes of these highs and lows. These forces are periodic and are responsible for the beats that seem to be present in Figure 15.33. Authorities on tides claim that there are actually about 390 measurably significant partial tides. Depending on the application of the data, usually only seven to twelve of these are used in computing tables of high and low tides. We will focus for the rest of this discussion on three major contributing forces. First, as the moon orbits the earth, the distance between the two changes from about 222,000 miles at perigee to 253,000 miles at apogee. With the inverse cube law of tidal forces this difference is significant. The time from one perigee to the next is about 27.55 days. Next, since the moon gains on the sun by about 50 minutes each day, if the three bodies are in conjunction at some time then they will be in quadrature about seven days later. The twice daily tides will have large amplitudes when everything is aligned and the smallest variations when the earth/moon/sun angle is 90 degrees. The change from these greatest to smallest tide variations and back again is periodic with a period of 14.76 days, half the time it takes the moon to circle the earth.
The last tidal force we will consider is that resulting from the moon's orbit being tilted about 5 degrees from the plane containing the earth's orbit about the sun. The result of this deviation can be seen by observing the moon's location in the sky over a 1-month period. As the moon traverses the earth in its orbit, it will be above the Northern Hemisphere for a while, helping create high tides in that region. Then it will move in a southerly direction, and while it is in the Southern Hemisphere there is little variation in the tides in the north. It takes 13.66 days for the moon to move from the most northerly point to that farthest south. The principal periods resulting from these forces are the solar semidiurnal period of 12 hours; the lunar semidiurnal period of 12 hours, 25 minutes, 14 seconds; a lunar-solar diurnal period of 23 hours, 56 minutes, 4 seconds; and a lunar diurnal period of 25 hours, 49 minutes, 10 seconds. Now consider the actual data used to generate the graph in Figure 15.33, and look for this information. Apply the FFT to calculate the DFT of this set of 720 data points, take absolute values, and plot the resulting points. This results in the amplitude spectrum of Figure 15.35. The units along the horizontal (frequency) axis are cycles per 720 hours. Begin from the right side of the amplitude spectrum in Figure 15.35 and move left. The first place we see a high point is at about 60, which indicates a term in the data at a frequency of 60/720, or 1/12 cycles per hour. Equivalently, this point denotes the presence of a force that is felt about every 12 hours. This is the solar semidiurnal force. The next high point in the amplitude spectrum occurs almost immediately to the left of the first, at 58. The height of this data point indicates that this is the largest contribution to the
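The peak locations described next can be predicted directly from these periods, since over a 720-hour record a component of period p hours appears at DFT index 720/p (the lunar semidiurnal period is taken here as 12 h 25 min 14 s, half a lunar day):

```python
T = 720.0   # hours of hourly data, so bin k means k cycles per 720 hours

periods = {
    "solar semidiurnal":   12.0,
    "lunar semidiurnal":   12 + 25 / 60 + 14 / 3600,
    "lunar-solar diurnal": 23 + 56 / 60 + 4 / 3600,
    "lunar diurnal":       25 + 49 / 60 + 10 / 3600,
}
bins = {name: T / p for name, p in periods.items()}
for name, b in bins.items():
    print(name, round(b, 1))   # 60.0, 58.0, 30.1, 27.9
```

These predicted indices, 60, 58, 30, and 28, are exactly where the amplitude spectrum of Figure 15.35 shows its peaks.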
FIGURE 15.35 Morro Bay tide spectrum (height in 0.1-ft units versus N).
tides. It occurs every 720/58, or 12.4 hours. This is the lunar semidiurnal tide. There is also some other small amplitude activity near this point, about which we will comment shortly. Continuing to move left in Figure 15.35, there is a large contribution at about 30, indicating a force with a frequency of 30/720, or 1/24, hence a period of about 24 hours. This is the lunar-solar diurnal period. The only other term of influence that stands out occurs at 28, indicating a frequency of 28/720. This translates into a period of 25.7 hours, and indicates the lunar diurnal period. Thus all of the dominant periods are accounted for and no other significant information occurs in the amplitude spectrum, except for the small scattering noted previously in the region around 58. Since the lunar day is not an exact multiple of 1 hour and the data samples were taken hourly, some of the data associated with the moon's tidal forces have leaked onto adjacent points. This also skews the amplitudes, hindering our ability to accurately determine the sun/moon ratio of forces. The same rationale could account for some of the data near 28. No other discernible information shows up in the amplitude spectrum because all of the remaining forces have periods longer than 1 month, and this is longer than the time over which the data was taken. It is interesting to speculate on what Newton would have thought of this graphical verification of his theory. Given his personality, it is possible that he would not have been impressed, having worked it all out to his own satisfaction with his calculus.
SECTION 15.9 PROBLEMS

In each of Problems 1 through 4, use a software package with the FFT to produce a graph of the power spectrum of the function. Use N = 2¹⁰.

1. y(t) = 4 sin(80πt) − sin(20πt)
2. y(t) = 2 cos(40πt) + sin(90πt)
3. y(t) = 3 cos(90πt) − sin(30πt)
4. y(t) = cos(220πt) + cos(70πt)

In each of Problems 5 through 8, corrupt the signal with zero-mean random noise and use the FFT to plot the power density spectrum to identify the frequency components of the original signal.

5. y(t) = cos(30πt) + cos(70πt) + cos(140πt)
6. y(t) = sin(60πt) + 4 sin(130πt) + sin(240.5πt)
7. y(t) = cos(20πt) + sin(140πt) + cos(240πt)
8. y(t) = sin(30πt) + 3 sin(40πt) + sin(130πt) + sin(196πt) + sin(220πt)
CHAPTER 16

LEGENDRE POLYNOMIALS  BESSEL FUNCTIONS  STURM–LIOUVILLE THEORY AND EIGENFUNCTION EXPANSIONS  ORTHOGONAL POLYNOMIALS  WAVELETS
Special Functions, Orthogonal Expansions, and Wavelets
A function is designated as special when it has some distinctive characteristics that make it worthwhile to determine and record its properties and behavior. Perhaps the most familiar examples of special functions are sin(kx) and cos(kx), which are solutions of an important differential equation, y″ + k²y = 0, and arise in many other contexts as well. For us, the primary motivation for studying certain special functions is that they arise in solving ordinary and partial differential equations that model many physical phenomena. Like Fourier series, they constitute necessary items in the toolkit of anyone who wishes to understand and work with such models. We will begin with Legendre polynomials and Bessel functions. These are important in their own right, but also form a model of how to approach special functions and the kinds of properties we should look for. Following these, we will develop parts of Sturm–Liouville theory, which will provide a template for studying certain aspects of special functions in general, for example, eigenfunction expansions, of which Fourier series are a special case. The chapter concludes with a brief introduction to wavelets, in the setting of eigenfunction expansions.
16.1 Legendre Polynomials

There are many different approaches to Legendre polynomials. We will begin with Legendre's differential equation

(1 − x²)y″ − 2xy′ + λy = 0    (16.1)

in which −1 ≤ x ≤ 1 and λ is a real number. This equation has the equivalent form

[(1 − x²)y′]′ + λy = 0,

which we will encounter in Chapter 17 in solving for the steady-state temperature distribution over a solid sphere.
CHAPTER 16
Special Functions, Orthogonal Expansions, and Wavelets
We seek values of λ for which Legendre's equation has nontrivial solutions. Writing Legendre's equation as

y″ − (2x/(1 − x²))y′ + (λ/(1 − x²))y = 0,

we conclude that 0 is an ordinary point. There are therefore power series solutions

y(x) = Σ_{n=0}^{∞} a_n xⁿ

Substitute this series into the differential equation to get

Σ_{n=2}^{∞} n(n − 1)a_n x^{n−2} − Σ_{n=2}^{∞} n(n − 1)a_n xⁿ − Σ_{n=1}^{∞} 2n a_n xⁿ + Σ_{n=0}^{∞} λ a_n xⁿ = 0

Shift indices in the first summation to write the last equation as

Σ_{n=0}^{∞} (n + 2)(n + 1)a_{n+2} xⁿ − Σ_{n=2}^{∞} n(n − 1)a_n xⁿ − Σ_{n=1}^{∞} 2n a_n xⁿ + Σ_{n=0}^{∞} λ a_n xⁿ = 0
Now combine terms for n ≥ 2 under one summation, writing the n = 0 and n = 1 terms separately:

2a₂ + 6a₃x − 2a₁x + λa₀ + λa₁x + Σ_{n=2}^{∞} [(n + 2)(n + 1)a_{n+2} − (n² + n − λ)a_n] xⁿ = 0

The coefficient of each power of x must be zero, hence

2a₂ + λa₀ = 0    (16.2)

6a₃ − 2a₁ + λa₁ = 0    (16.3)

and, for n = 2, 3, …,

(n + 1)(n + 2)a_{n+2} − (n(n + 1) − λ)a_n = 0,

from which we get the recurrence relation

a_{n+2} = [(n(n + 1) − λ)/((n + 1)(n + 2))] a_n for n = 2, 3, …    (16.4)
From equation (16.2) we have

a₂ = −(λ/2) a₀

From equation (16.4),

a₄ = [(6 − λ)/(3·4)] a₂ = −[(6 − λ)λ/(3·4·2)] a₀ = −[λ(6 − λ)/4!] a₀

a₆ = [(20 − λ)/(5·6)] a₄ = −[λ(6 − λ)(20 − λ)/6!] a₀

and so on. Every even-indexed coefficient a_{2n} is a multiple, involving n and λ, of a₀. Here we have used the factorial notation, in which n! is the product of the integers from 1 through n, if n is a positive integer. For example, 6! = 720. By convention, 0! = 1.
From equation (16.3),

a₃ = [(2 − λ)/6] a₁ = [(2 − λ)/3!] a₁

Then, from the recurrence relation (16.4),

a₅ = [(12 − λ)/(4·5)] a₃ = [(2 − λ)(12 − λ)/5!] a₁

a₇ = [(30 − λ)/(6·7)] a₅ = [(2 − λ)(12 − λ)(30 − λ)/7!] a₁
and so on. Every odd-indexed coefficient a_{2n+1} is a multiple, also involving n and λ, of a₁. In this way we can write the solution

y(x) = Σ_{n=0}^{∞} a_n xⁿ = a₀[1 − (λ/2)x² − (λ(6 − λ)/4!)x⁴ − (λ(6 − λ)(20 − λ)/6!)x⁶ − ⋯]
     + a₁[x + ((2 − λ)/3!)x³ + ((2 − λ)(12 − λ)/5!)x⁵ + ((2 − λ)(12 − λ)(30 − λ)/7!)x⁷ + ⋯]

The two series in large brackets are linearly independent, one containing only even powers of x, the other only odd powers. Put

y_e(x) = 1 − (λ/2)x² − (λ(6 − λ)/4!)x⁴ − (λ(6 − λ)(20 − λ)/6!)x⁶ − ⋯

and

y_o(x) = x + ((2 − λ)/3!)x³ + ((2 − λ)(12 − λ)/5!)x⁵ + ((2 − λ)(12 − λ)(30 − λ)/7!)x⁷ + ⋯
The general solution of Legendre's differential equation is

y(x) = a₀ y_e(x) + a₁ y_o(x),

in which a₀ and a₁ are arbitrary constants. Some particular solutions are:

with λ = 0 and a₁ = 0, y(x) = a₀;
with λ = 2 and a₀ = 0, y(x) = a₁x;
with λ = 6 and a₁ = 0, y(x) = a₀(1 − 3x²);
with λ = 12 and a₀ = 0, y(x) = a₁(x − (5/3)x³);
with λ = 20 and a₁ = 0, y(x) = a₀(1 − 10x² + (35/3)x⁴);

and so on.
704
CHAPTER 16
Special Functions, Orthogonal Expansions, and Wavelets y P0 (x) P1 (x)
P3 (x)
1
1
x
P4 (x) P2 (x)
FIGURE 16.1
The first five Legendre
polynomials.
The values of for which solutions are polynomial (finite series) solutions are = nn + 1 for n = 0 1 2 3 . This should not be surprising, since the recurrence relation (16.4) contains nn + 1 − in its numerator. If for some nonnegative integer N we choose = NN + 1, then aN +2 = 0, hence also aN +4 = aN +6 = · · · = 0, and one of ye x or yo x will contain only finitely many nonzero terms, hence is a polynomial. These polynomial solutions of Legendre’s differential equation have many applications, for example, in astronomy, analysis of heat conduction, and approximation of solutions of equations fx = 0. To standardize and tabulate these polynomial solutions, a0 or a1 is chosen for each = nn + 1 so that the polynomial solution has the value 1 at x = 1. The resulting polynomials are called Legendre polynomials, and are usually denoted Pn x. The first six Legendre polynomials are 1 1 3 5x − 3x P0 x = 1 P1 x = x P2 x = 3x2 − 1 P3 x = 2 2 1 1 P4 x = 35x4 − 30x2 + 3 P5 x = 63x5 − 70x3 + 15x 8 8 Graphs of these polynomials are given in Figure 16.1. Pn x is of degree n, and contains only even powers of x if n is even, and only odd powers if n is odd. Although these polynomials are defined for all real x, the relevant interval for Legendre’s differential equation is −1 < x < 1. It will also be useful to keep in mind that, if qx is any polynomial solution of Legendre’s equation with = nn + 1, then qx must be a constant multiple of Pn x.
16.1.1 A Generating Function for the Legendre Polynomials Many properties of Legendre polynomials can be derived by using a generating function, a concept we will now develop. Let Lx t = √
1 1 − 2xt + t2
We claim that, if Lx t is expanded in a power series in powers of t, then the coefficient of tn is exactly the nth Legendre polynomial.
16.1 Legendre Polynomials THEOREM 16.1
705
Generating Function for the Legendre Polynomials
Lx t =
Pn xtn
n=0
We will give an argument suggesting why this is true. Write the Maclaurin series for 1 − w−1/2 : 1 3 1 15 105 4 945 5 w + w +··· = 1 + w + w2 + w3 + √ 2 8 48 384 3840 1−w for −1 < w < 1. Put w = 2xt − t2 to obtain 1 1 3 15 = 1 + 2xt − t2 + 2xt − t2 2 + 2xt − t2 3 √ 2 8 48 1 − 2xt + t2 +
105 945 2xt − t2 4 + 2xt − t2 5 + · · · 384 3840
Now expand each of these powers of 2xt − t2 and collect the coefficient of each power of t in the resulting expression: 1 1 3 3 3 5 15 = 1 + xt − t2 + x2 t2 − xt3 + t4 + x3 t3 − x2 t4 √ 2 2 2 8 2 4 1 − 2xt + t2 15 5 5 6 35 4 4 35 3 5 105 2 6 35 7 xt − t + x t − x t + x t − xt 8 16 8 4 16 16 35 8 63 5 5 315 4 6 315 3 7 315 2 8 315 9 63 10 t + xt − xt + xt − xt + xt − t +··· + 128 8 16 16 32 128 256 5 1 3 3 = 1 + xt + − + x2 t2 + − x + x3 t3 2 2 2 2 3 15 2 35 4 4 15 35 3 63 5 5 − x + x t + x− x + x t +··· + 8 4 8 8 4 8 +
= P0 x + P1 xt + P2 xt2 + P3 xt3 + P4 xt4 + P5 xt5 + · · · The generating function provides an efficient way of deriving many properties of Legendre polynomials. We will begin by using it to show that Pn 1 = 1
and Pn −1 = −1n
for n = 0 1 2 First, setting x = 1 we have 1 1 1 L1 t = √ = = P 1tn = 1 − t n=0 n 1 − 2t + t2 1 − t2
But, for −1 < r < 1, 1 = tn 1 − t n=0
Since 1/1 − t has only one Maclaurin expansion, the coefficients in these two series must be the same, hence each Pn 1 = 1.
706
CHAPTER 16
Special Functions, Orthogonal Expansions, and Wavelets
Similarly, 1 1 1 = = P −1tn L−1 t = √ = 1 + t n=0 n 1 + 2t + t2 1 + t2
But, −1 < t < 1, 1 = −1n tn 1 + t n=0
so Pn −1 = −1n .
16.1.2 A Recurrence Relation for the Legendre Polynomials We will use the generating function to derive a recurrence relation for Legendre polynomials. THEOREM 16.2
Recurrence Relation for Legendre Polynomials
For any positive integer n, n + 1Pn+1 x − 2n + 1xPn x + nPn−1 x = 0 Proof
(16.5)
Begin by differentiating the generating function with respect to t:
Lx t 1 x−t = − 1 − 2xt + t2 −3/2 −2x + 2t =
t 2 1 − 2xt + t2 3/2
Now notice that 1 − 2xt + t2 Substitute Lx t =
n=0 Pn xt
n
Lx t − x − tLx t = 0
t
into the last equation to obtain
1 − 2xt + t2
nPn xtn−1 − x − t
n=1
Pn xtn = 0
n=0
Carry out the indicated multiplications to write
nPn xtn−1 −
n=1
2nxPn xtn +
n=1
nPn xtn+1 −
n=1
xPn xtn +
n=0
Pn xtn+1 = 0
n=0
Rearrange these series to have like powers of t in each summation:
n + 1Pn+1 xtn −
n=0
−
n=0
2nxPn xtn +
n=1
xPn xtn +
n − 1Pn−1 xtn
n=2
Pn−1 xtn = 0
n=1
Combine summations from n = 2 on, writing the terms for n = 0 and n = 1 separately: P1 x + 2P2 xt − 2xP1 xt − xP0 x − xP1 xt + P0 xt +
n + 1Pn+1 x − 2nxPn x + n − 1Pn−1 x − xPn x + Pn−1 x tn = 0
n=2
16.1 Legendre Polynomials
707
For this power series in t to be zero for all t in some interval about 0, the coefficient of tn must be zero for n = 0 1 2 Then P1 x − xP0 x = 0 2P2 x − 2xP1 x − xP1 x + P0 x = 0 and, for n = 2 3 , n + 1Pn+1 x − 2nxPn x + n − 1Pn−1 x − xPn x + Pn−1 x = 0 These give us P1 x = xP0 x P2 x =
1 3xP1 x − P0 x 2
and, for n = 2 3 , n + 1Pn+1 x − 2n + 1xPn x + nPn−1 x = 0 Since this equation is also valid for n = 1, this establishes the recurrence relation for all positive integers. Later we will need to know the coefficient of xn in Pn x. We will use the recurrence relation to derive a formula for this number.
THEOREM 16.3
For n = 1 2 , let An be the coefficient of xn in Pn x. Then An =
1 · 3 · · · · · 2n − 1 n!
For example, A1 = 1 A2 =
1·3 3 = 2! 2
and
A3 =
1·3·5 5 = 3! 2
as can be verified from the explicit expressions derived previously for P1 x, P2 x and P3 x. In the recurrence relation (16.5), the highest power of x that occurs is xn+1 , and this term appears in Pn+1 x and in xPn x. Thus the coefficient of xn+1 in the recurrence relation is
Proof
n + 1An+1 − 2n + 1An This must equal zero (because the other side of this recurrence equation is zero). Therefore An+1 =
2n + 1 A n+1 n
708
CHAPTER 16
Special Functions, Orthogonal Expansions, and Wavelets
and this holds for n = 0 1 2 Now we can work back: An+1 =
2n + 1 2n + 1 2n − 1 + 1 An = A n+1 n + 1 n − 1 + 1 n−1
=
2n + 1 2n − 1 2n + 1 2n − 1 2n − 2 + 1 An−1 = A n+1 n n+1 n n − 2 + 1 n−2
=
2n + 1 2n − 1 2n − 3 2n + 1 2n − 1 2n − 3 3 A = ··· = · · · A0 n+1 n n − 1 n−2 n+1 n n−1 2
But A0 = 1 because P0 x = 1, so An+1 =
1 · 3 · 5 · · · · 2n − 12n + 1 n + 1!
(16.6)
for n = 0 1 2 . The conclusion of the theorem simply states this conclusion in terms of An instead of An+1 .
16.1.3 Orthogonality of the Legendre Polynomials We will prove the following.
THEOREM 16.4
Orthogonality of the Legendre Polynomials on −1 1
If n and m are nonnegative integers, then
1 −1
Pn xPm xdx = 0 if n = m
(16.7)
This integral relationship is called orthogonality of the Legendre polynomials on −1 1. We have seen this kind of behavior before, with the functions 1 cosx cos2x sinx sin2x on the interval − . The integral, from − to , of the product of any two of these (distinct) functions is zero. Because of this property, we were able to find the Fourier coefficients of a function (recall the argument given in Section 14.2). We will pursue a similar idea for Legendre polynomials after establishing equation (16.7). Begin with the fact that Pn x is a solution of Legendre’s equation (16.1) for = nn + 1. In particular, if n and m are distinct nonnegative integers, then
Proof
1 − x2 Pn x + nn + 1Pn x = 0 and
1 − x2 Pm x + mm + 1Pm x = 0 Multiply the first equation by Pm x and the second by Pn x and subtract the resulting equations to get
1 − x2 Pn x Pm x − 1 − x2 Pm x Pn x + nn + 1 − mm + 1Pn xPm x = 0
16.1 Legendre Polynomials
709
Integrate this equation:
1
−1
1 − x2 Pn x Pm xdx −
= mm + 1 − nn + 1
1 −1
1 − x2 Pm x Pn xdx
1
−1
Pn xPm xdx
Since n = m, equation (16.7) will be proved if we can show that the left side of the last equation is zero. But, by integrating the left side by parts, we have
1 −1
1 − x2 Pn x Pm xdx −
1 −1
1 − x2 Pm x Pn xdx
1
1 = 1 − x2 Pn xPm x −1 − 1 − x2 Pn xPm xdx −1
− 1 − x2 Pm xPn x1−1 +
1 −1
1 − x2 Pn xPm xdx = 0
and the orthogonality of the Legendre polynomials on −1 1 is proved.
16.1.4
Fourier–Legendre Series
Suppose fx is defined for −1 ≤ x ≤ 1. We want to explore the possibility of expanding fx in a series of Legendre polynomials: fx =
cn Pn x
(16.8)
n=0
We were in a similar situation in Section 14.2, except there we wanted to expand a function defined on − in a series of sines and cosines. We will follow the same reasoning that led to success then. Pick a nonnegative integer m and multiply the proposed expansion by Pm x, and then integrate the resulting equation, interchanging the series and the integral:
1
−1
fxPm xdx =
cn
n=0
1 −1
Pn xPm xdx
Because of equation (16.7), all terms in the summation on the right are zero except when n = m. The preceding equation reduces to
1
−1
fxPm xdx = cm
1 −1
Pm2 xdx
Then 1 cm =
fxPm xdx 1 P 2 xdx −1 m
−1
(16.9)
Taking the lead from Fourier series, we call the expansion n=0 cn Pn x the Fourier– Legendre series, or expansion, of fx, when the coefficients are chosen according to equation (16.9). We call these cn s the Fourier–Legendre coefficients of f . As with Fourier series, we must address the question of convergence of the Fourier– Legendre series of a function. This is done in the following theorem, which is similar in form to the Fourier convergence theorem. As we will see later, this is not a coincidence.
710
CHAPTER 16
Special Functions, Orthogonal Expansions, and Wavelets
THEOREM 16.5
Let f be piecewise smooth on −1 1. Then, for −1 < x < 1,
cn Pn x =
n=0
if the
cn s
1 fx+ + fx− 2
are the Fourier–Legendre coefficients of f .
This means that, under the conditions on f , the Fourier–Legendre expansion of fx converges to the average of the left and right limits of fx at x, for −1 < x < 1. This is midway between the gap at the ends of the graph at x if fx has a jump discontinuity there (Figure 16.2). We saw this behavior previously with convergence of Fourier (trigonometric) series. If f is continuous at x, then fx+ = fx− = fx and the Fourier–Legendre series converges to fx. As a special case of general Fourier–Legendre expansions, any polynomial qx is a linear combination of Legendre polynomials. In the case of a polynomial, this linear combination can be obtained by just solving for xn in terms of Pn x and writing each power of x in qx in terms of Legendre polynomials. For example, let qx = −4 + 2x + 9x2 We can begin with x = P1 x 2
and then solve for x in P2 x: 3 1 P2 x = x2 − 2 2 so 2 1 2 1 x2 = P2 x + = P2 x + P0 x 3 3 3 3 Then
−4 + 2x + 9x2 = −4P0 x + 2P1 x + 9
2 1 P2 x + P0 x 3 3
= −P0 x + 2P1 x + 6P2 x We can now prove the perhaps surprising result that every Legendre polynomial is orthogonal to every polynomial of lower degree.
y
1 2
( f (x 0) f (x0))
x0 FIGURE 16.2 Convergence of a Fourier–Legendre expansion at a jump discontinuity of the function.
x
16.1 Legendre Polynomials
711
THEOREM 16.6
Let qx be a polynomial of degree m, and let n > m. Then 1 qxPn xdx = 0 −1
Proof
Write qx = c0 P0 x + c1 P1 x + · · · + cm Pm x
Then
1
−1
since for 0 ≤ k ≤ m < n,
qxPn xdx =
m
ck
k=0
1 −1
1
−1
Pk xPn xdx = 0
Pk xPn xdx = 0.
This result will be useful shortly in obtaining information about the zeros of the Legendre polynomials.
16.1.5
Computation of Fourier–Legendre Coefficients
The equation (16.9) for the Fourier–Legendre coefficients of f has inator. We will derive a simple expression for this integral.
1 −1
Pn2 xdx in the denom-
THEOREM 16.7
If n is a nonnegative integer, then
1 −1
Proof
Pn2 xdx =
2 2n + 1
As before, denote the coefficient of xn in Pn x as An . We will also denote 1 pn = Pn2 xdx −1
The highest power term in Pn x is An xn , while the highest power term in Pn−1 x is An−1 xn−1 . This means that all terms involving xn cancel in the polynomial qx = Pn x −
An xP An−1 n−1
and so qx has degree at most n − 1. Write Pn x = qx + Then
An xP x An−1 n−1
An pn = Pn xPn xdx = Pn x qx + xP x dx An−1 n−1 −1 −1 A 1 = n xP xPn−1 xdx An−1 −1 n
1
1
712
CHAPTER 16 because
Special Functions, Orthogonal Expansions, and Wavelets 1 −
qxPn xdx = 0. Now invoke the recurrence relation 16.5 to write xPn x =
n+1 n P x + P x 2n + 1 n+1 2n + 1 n−1
Then xPn xPn−1 x =
n+1 n P xPn−1 x + P 2 x 2n + 1 n+1 2n + 1 n−1
so An 1 xP xPn−1 xdx An−1 −1 n
n+1 1 n 1 2 A = n Pn+1 xPn−1 xdx + Pn−1 xdx An−1 2n + 1 −1 2n + 1 −1
pn =
Since
1 −1
Pn+1 xPn−1 xdx = 0, we are left with pn =
n 1 2 n An A p Pn−1 dx = n An−1 2n + 1 −1 An−1 2n + 1 n−1
Using the previously obtained value for An , we have pn =
n − 1! n 1 · 3 · 5 · · · · 2n − 3 · 2n − 1 2n − 1 pn−1 = p n! 1 · 3 · 5 · · · · · 2n − 3 2n + 1 2n + 1 n−1
Now work forward: 1 p1 = p0 = 3 3 p2 = p1 = 5
1 1 1 1 2 P0 x2 dx = dx = 3 −1 3 −1 3 32 2 5 2 7 2 = p3 = p2 = p4 = p3 = 53 5 7 7 9 9
and so on. By induction, we find that pn =
2 2n + 1
proving the theorem. This means that the nth Fourier–Legendre coefficient of f is 1 fxPn xdx 2n + 1 1 = fxPn xdx cn = −1 1 2 −1 P 2 xdx −1 n
EXAMPLE 16.1
Let fx = cosx/2 for −1 ≤ x ≤ 1. Then f and f are continuous on −1 1, so the Fourier–Legendre expansion of f converges to cosx/2 for −1 < x < 1. The coefficients are cn =
x 2n + 1 1 Pn xdx cos 2 2 −1
16.1 Legendre Polynomials
713
Because cosx/2 is an even function, cosx/2Pn x is an odd function for n odd. This means that cn = 0 if n is odd. We need only compute even-indexed coefficients. Some of these are: x 2 1 1 dx = cos 2 −1 2 1 5 x 1 2 2 − 12 c2 = 3x − 1 dx = 10 cos 2 −1 2 2 3 x 1 9 1 4 + 1680 − 180 2 c4 = 35x4 − 30x2 + 3 dx = 18 cos 2 −1 2 8 5
c0 =
Then, for −1 < x < 1, cos
x 2
2 2 − 12 4 + 1680 − 180 2 + 10 P x + 18 P4 x + · · · 2 3 5 2 − 12 2 9 4 + 1680 − 180 2 2 35x4 − 30x2 + 3 + · · · = +5 3x − 1 + 3 5 4
=
Although in this example, fx is simple enough to compute some Fourier–Legendre coefficients exactly, in a typical application we would use a software package to compute coefficients. The terms we have computed yield the approximation cosx/2 ≈ 0 636 62 − 0 343 553x2 − 1 + 0 0064724 35x4 − 30x2 + 3 + · · · = 099959 − 12248x2 + 022653x4 + · · ·
y 2 4
2 2 4 6 8 10 12 14 16
0
4
x
FIGURE 16.3 Comparison of cosx/2 with a
partial sum of a series expansion in Legendre polynomials.
Figure 16.3 shows a graph of cosx/2 and the first three nonzero terms of its Fourier– Legendre expansion. This series agrees very well with cosx/2 for −1 < x < 1, but the two diverge from each other outside this interval. This emphasizes the fact that the Fourier–Legendre expansion is only for −1 ≤ x ≤ 1.
16.1.6
Zeros of the Legendre Polynomials
P0 x = 1, and has no zeros, while √ P1 x = x has exactly one zero, namely x =0. P2 x = 1 2 3x − 1 has two real zeros, ±1/ 3. P3 x has three real zeros, namely 0 and ± 3/5. After 2
714
CHAPTER 16
Special Functions, Orthogonal Expansions, and Wavelets
n = 3, finding zeros of Legendre polynomials quickly becomes complicated. For example, P4 x has four real zeros, and they are " " √ √ 1 1 ± 525 + 70 30 and ± 525 − 70 30 35 35 These are approximately ±08611 and ±03400. Each Pn x just tested has n real roots, all lying in the interval −1 1. We will show that this is true for all the Legendre polynomials. This includes P0 x, which of course has no roots. The proof of this assertion is based on the orthogonality of the Legendre polynomials.
THEOREM 16.8
Zeros of Pn x
Let n be a positive integer. Then Pn x has n real, distinct roots, all lying in −1 1. We first show that, if Pn x has a real root x0 in −1 1, then this root must be simple (that is, not repeated). For suppose x0 is a repeated root. Then Pn x0 = Pn x0 = 0. Then Pn x is a solution of the initial value problem
Proof
1 − x2 y + nn + 1y = 0
yx0 = y x0 = 0
But this problem has a unique solution, and the trivial function yx = 0 is a solution. This implies that Pn x is the zero function on an interval containing x0 , and this is false. Hence Pn x cannot have a repeated root in −1 1. Now suppose n is a positive integer. Then Pn x and P0 x are orthogonal on −1 1, so
1
−1
Pn xP0 xdx =
1 −1
Pn xdx = 0
Therefore Pn x cannot be strictly positive or strictly negative on −1 1, hence must change sign in this interval. Since Pn x is continuous, there must exist some x1 in −1 1 with Pn x1 = 0. So far, this gives us one real zero in this interval. Let x1 xm be all the zeros of Pn x in −1 1, with −1 < x1 < · · · < xm < 1. Then 1 ≤ m ≤ n. Suppose m < n. Then the polynomial qx = x − x1 x − x2 · · · x − xm has degree less than n, and so is orthogonal to Pn x:
1
−1
qxPn xdx = 0
But qx and Pn x change sign at exactly the same points in −1 1 , namely at x1 xm . Therefore qx and Pn x are either of the same sign on each interval −1 x1 x1 x2 xm 1, or of opposite sign on each of these intervals. This means that qxPn x is either strictly positive or strictly negative on −1 1 except at the finitely many points x1 xm 1 where this product vanishes. But then −1 qxPn xdx must be either positive or negative, a contradiction. We conclude that m = n, hence Pn x has n simple zeros in −1 1. Referring back to the graphs of P0 x through P4 x in Figure 16.1, we can see that each of these Legendre polynomials crosses the x-axis exactly n times between −1 and 1.
16.1 Legendre Polynomials
16.1.7
715
Derivative and Integral Formulas for Pn x
We will derive two additional formulas for Pn x that are sometimes used to further analyze Legendre polynomials. The first gives the nth Legendre polynomial in terms of the nth derivative of x2 − 1n . THEOREM 16.9
Rodrigues’s Formula
For n = 0 1 2 , Pn x =
1 dn x2 − 1n 2n n! dxn
In this statement, it is understood that the zero-order derivative of a function is the function itself. Thus, when n = 0, the proposed formula gives 1 d0 x2 − 10 = x2 − 10 = 1 = P0 x 20 0! dx0 For n = 1 it gives 1 1 d 2 x − 1 = 2x = x = P1 x 21! dx 2 and for n = 2, it gives 3 1 d2 1 1 12x2 − 4 = x2 − = P2 x x2 − 12 = 2 2 2 2! dx 8 2 2 Proof
Let w = x2 − 1n . Then w = nx2 − 1n−1 2x
Then x2 − 1w − 2nxw = 0 If this equation is differentiated k + 1 times, it is a routine exercise to verify that we obtain x2 − 1
dk+2 w dk+1 w − 2n − 2k − 2x dxk+2 dxk+1
− 2n + 2n − 2 + · · · + 2n − 2k − 1 + 2n − 2k
dk w = 0 dxk
Putting k = n, we have x2 − 1
dn+2 w dn+1 w + 2x dxn+2 dxn+1
− 2n + 2n − 2 + · · · + 2n − 2n − 1 + 2n − 2n The quantity in square brackets in this equation is 2n + 2n − 2 + · · · + 2 which is the same as 21 + 2 + · · · + n
dn w = 0 dxn
716
CHAPTER 16
Special Functions, Orthogonal Expansions, and Wavelets
But this quantity is equal to nn + 1. (Recall that x2 − 1
n
j=1 j
= 21 nn + 1. Therefore
dn+2 w dn+1 w dn w + 2x − nn + 1 = 0 dxn+2 dxn+1 dxn
Upon multiplying this equation by −1, we have 1 − x2
dn+2 w dn+1 w dn w − 2x + nn + 1 = 0 dxn+2 dxn+1 dxn
But this means that dn w/dxn is a solution of Legendre’s equation with = nn + 1. Further, repeated differentiation of the polynomial x2 − 1n yields a polynomial. Therefore, the polynomial solution dn w/dxn must be a constant multiple of Pn x: dn w = cPn x dxn
(16.10)
Now, the highest order term in x2 − 1n is x2n , and the nth derivative of x2n is 2n2n − 1 · · · n + 1xn Therefore the coefficient of the highest power of x in dn w/dxn is 2n2n − 1 · · · n + 1. The highest order term in cPn x is cAn , where An is the coefficient of xn in Pn x. We know An , so equation (16.10) gives us 2n2n − 1 · · · n + 1 = cAn = c
1 · 3 · 5 · · · 2n − 1 n!
Then c=
n!n + 1 · · · 2n − 12n 2n! = 1 · 3 · 5 · · · 2n − 1 1 · 3 · 5 · · · 2n − 1
= 2 · 4 · 6 · · · 2n = 2n n! But now equation (16.10) becomes dn x2 − 1n = 2n n!Pn x dxn which is equivalent to Rodrigues’s formula. Next we will derive an integral formula for Pn x.
THEOREM 16.10
For n = 0 1 2 , Pn x =
n √ 1 x + x2 − 1 cos d 0
For example, with n = 0 we get 1 d = 1 = P0 x 0
16.1 Legendre Polynomials With n = 1 we get
717
√ 1 x + x2 − 1 cos d = x = P1 x 0
and with n = 2 we get
2 √ 1 x + x2 − 1 cos d 0 √ 1 2 x + 2x x2 − 1 cos + x2 − 1 cos2 d = 0 3 1 = x2 − = P2 x 2 2
Proof
Let Qn x =
n √ 1 x + x2 − 1 cos d 0
The strategy behind the proof is to show that Qn satisfies the same recurrence relation as the Legendre polynomials. Since Q0 = P0 and Q1 = P1 , this will imply that Qn = Pn for all nonnegative integers n. Proceed n + 1Qn+1 x − 2n + 1xQn x + nQn−1 x n+1 √ n+1 = x + x2 − 1 cos d 0 n √ 2n + 1 − x x + x2 − 1 cos d 0 n−1 √ n x + x2 − 1 cos + d 0 After a straightforward but lengthy computation, we find that n + 1Qn+1 x − 2n + 1xQn x + nQn−1 x n−1 √ n x + x2 − 1 cos 1 − x2 sin2 d = 0 n √ √ 1 x + x2 − 1 cos x2 − 1 cosd + 0 n √ √ Integrate the second integral by parts, with u = x + x2 − 1 cos and dv = x2 − 1 cosd to get n + 1Qn+1 x − 2n + 1xQn x + nQn−1 x n−1 √ n x + x2 − 1 cos 1 − x2 sin2 d = 0
n √ √ 1 x + x2 − 1 cos x2 − 1 sin + 0 n−1 √ √ 1 √ 2 − x − 1 sinn x + x2 − 1 cos x2 − 1− sind 0 = 0 completing the proof.
CHAPTER 16
718
Special Functions, Orthogonal Expansions, and Wavelets
PROBLEMS
SECTION 16.1
1. For n = 0 1 3 4 5, verify by substitution that Pn x is a solution of Legendre’s equation corresponding to = nn + 1.
where r = x2 + y2 + z2 . To do this, introduce the angle shown in Figure 16.4. Let d = x02 + y02 + z20 2 2 and R = x − x0 + y − y0 + z − z0 2 .
2. Use the recurrence relation (Theorem 16.2), and the list of P0 x P5 x given previously, to determine P6 x through P10 x. Graph these functions and observe the location of their zeros in −1 1.
P : (x, y, z) r
3. Use Rodrigues’s formula to obtain P1 x through P5 x. 4. Use Theorem 16.10 to obtain P3 x, P4 x and P5 x. 5. It can be shown that
R
θ (0, 0, 0)
(x 0 , y0, z 0)
d
FIGURE 16.4
2n − 2k! xn−2k −1 n Pn x = 2 k!n − k!n − 2k! k=0
n/2
k
(a) Use the law of cosines to write
Use this formula to generate P0 x through P5 x. The symbol n/2 denotes the largest integer not exceeding n/2. 6. Show that n Pn x =
k
n−k
n! d d
x + 1n n−k x − 1n k k!n − k! dx dx k=0
Hint: Put x2 − 1 = x − 1x + 1 in Rodrigues’s formula. 7. Let n be a nonnegative integer. Use reduction of order (Section 2.2) and the fact that Pn x is one solution of Legendre’s equation with = nn + 1 to obtain a second, linearly independent solution 1 Qn x = Pn x dx
Pn x2 1 − x2 8. Use the result of Problem 7 to show that 1 1+x Q0 x = − ln 2 1−x 1+x x Q1 x = 1 − ln 2 1−x and
(b) From our discussion of the generating function √ for Legendre polynomials, recall that, if 1/ 1 − 2at + t2 is expanded in a series about 0, convergent for t < 1, then the coefficient of tn is Pn a. (c) If r < d, let a = cos and t = r/d to obtain
1 P cosr n n+1 n d n=0
r =
(d) If r > d, show that 1 dn Pn cosr −n r n=0
r =
10. Show that
n=0
1
2n+1
Pn
1 1 =√ . 2 3
11. Let n be a nonnegative integer. Prove that P2n+1 0 = 0
3 1 1+x − x Q2 x = 3x2 − 1 ln 4 1−x 2
and
P2n 0 = −1n
2n! 22n n!2
12. Expand each of the following in a series of Legendre polynomials. (a) 1 + 2x − x2
for −1 < x < 1. 9. The gravitational potential at a point P x y z due to a unit mass at x0 y0 z0 is x y z =
1 x y z = d 1 − 2r/d cos + r/d2
1 x − x0 2 + y − y0 2 + z − z0 2
For some purposes (such as in astronomy) it is convenient to expand x y z in powers of r or 1/r,
(b) 2x + x2 − 5x3 (c) 2 − x2 + 4x4 In each of Problems 13 through 18, find the first five coefficients in the Fourier–Legendre expansion of the function. Graph the function and the sum of the first five terms of this expansion on the same set of axes, for −3 ≤ x ≤ 3.
16.2 Bessel Functions The expansion is only valid on −1 1, but it is instructive to see how the partial sum of the Fourier–Legendre expansion are generally unrelated outside this interval.
15. fx = sin2 x
13. fx = sinx/2
16. fx = cosx − sinx −1 for − 1 ≤ x ≤ 0 17. fx = 1 for 0 < x ≤ 1
14. fx = e−x
18. fx = x + 1 cosx
16.2
719
Bessel Functions We will now develop the second kind of special function we will use to introduce the general topic of special functions. Recall from Chapter 4 that the second-order differential equation x2 y + xy + x2 − 2 y = 0 is called Bessel’s equation of order . Thus the term order is used in two senses here—the differential equation is of second order, but traditionally we say that the equation has order to refer to the parameter occurring in the coefficient of y. In Example 4.12 of Section 4.3, we used the method of Frobenius to find a series solution yx = c0
−1n
n=0
22n n!1 + 2 + · · · n +
x2n+
in which c0 is a nonzero constant and ≥ 0. This solution is valid in some interval 0 R, depending on . It will be useful to write this solution in terms of the gamma function, which we will now develop.
16.2.1
The Gamma Function
For x > 0, the gamma function $ is defined by $x = tx−1 e−t dt 0
This integral converges for all x > 0. The gamma function has a fascinating history and many interesting properties. For us, the most useful is the following. THEOREM 16.11
Factorial Property of the Gamma Function
If x > 0, then $x + 1 = x$x Proof
If 0 < a < b, then we can integrate by parts, with u = tx and dv = e−t dt , to get b b b tx e−t dt = tx −e−t a − xtx−1 −1e−t dt a
a
= −bx e−b + ax e−a + x
b
tx−1 e−t dt
a
Take the limit of this equation as a → 0+ and b → to get tx e−t dt = $x + 1 = x tx−1 e−t dt = x$x 0
0
720
CHAPTER 16
Special Functions, Orthogonal Expansions, and Wavelets
The reason why this is called the factorial property can be seen by letting x = n, a positive integer. By repeated application of the theorem, we get $n + 1 = n$n = n$n − 1 + 1 = nn − 1$n − 1 = nn − 1$n − 2 + 1 = nn − 1n − 2$n − 2 = · · · = nn − 1n − 2 · · · 21$1 = n!$1 But $1 =
0
e−t dt = 1
so $n + 1 = n! for any positive integer n. This is the reason for the term factorial property of the gamma function. It is possible to extend $x to negative (but noninteger) values of x by using the factorial property. For x > 0, write 1 $x = $x + 1 x
(16.11)
If −1 < x < 0, then x + 1 > 0 so $x + 1 is defined and we use the right side of equation (16.11) to define $x Once we have extended $x to −1 < x < 0, we can let −2 < x < −1. Then −1 < x + 1 < 0 so $x + 1 has been defined and we can again use equation 16.11 to define $x. In this way we can walk to the left along the real line, defining $x on −n − 1 −n as soon as it has been defined on the interval −n −n + 1 immediately to the right. For example, 1 1 1 1 = 1 $ − + 1 = −2$ $ − 2 2 2 −2 and
1 2 1 3 1 4 3 = 3 $ − +1 = − $ − = $ $ − 2 2 3 2 3 2 −2
Figure 16.5 (a) shows a graph of y = $x for 0 < x < 5. Graphs for −1 < x < 0, −2 < x < −1 and −3 < x < −2, respectively, are given in Figures 16.5(b), (c) and (d). y
y
20
10
15
20
10
30
5 0
40 1
2
FIGURE 16.5(a)
3
4
5
$x for 0 < x < 5.
x
1.0 0.8 0.6 0.4 0.2 FIGURE 16.5(b)
0
$x for −1 < x < 0.
x
16.2 Bessel Functions y
y
40
5.0
30
10
20
15
10 –2.0
20 –1.8 –1.6 –1.4 –1.2 –1.0
FIGURE 16.5(c)
16.2.2
721
x
3.0
$x for −2 < x < −1.
2.8 2.6 2.4 2.2 2.0
x
FIGURE 16.5(d) $x for −3 < x < −2.
Bessel Functions of the First Kind and Solutions of Bessel’s Equation
Now return to the Frobenius solution yx of Bessel’s equation, given previously. Part of the denominator in this solution is 1 + 2 + · · · n + in which we assume that ≥ 0. Now use the factorial property of the gamma function to write $n + + 1 = n + $n + = n + n + − 1$n + − 1 = · · · = n + n + − 1 · · · n + − n − 1$n + − n − 1 = 1 + 2 + · · · n − 1 + n + $ + 1 Therefore 1 + 2 + · · · n + =
$n + + 1 $ + 1
and we can write the solution as yx = c0
−1n $ + 1 2n+ x 2n n=0 2 n!$n + + 1
It is customary to choose c0 =
1 2 $ + 1
to obtain the solution we will denote as J x: J x =
−1n x2n+ 2n+ n!$n + + 1 n=0 2
J is called a Bessel function of the first kind of order . The series defining J x converges for all x. Because Bessel’s equation is of second order (as a differential equation), we need a second solution, linearly independent from J , to write the general solution. Theorem 4.4 in Section 4.4 tells us how to proceed to a second solution. In Example 4.12 we found that the indicial equation of Bessel’s equation is r 2 − 2 = 0, with roots ±. The key lies in the difference 2 between these roots. Omitting the details of the analysis, here are the conclusions.
722
CHAPTER 16
Special Functions, Orthogonal Expansions, and Wavelets
1. If 2 is not an integer, then J and J− are linearly independent (neither is a constant multiple of the other), and the general solution of Bessel’s equation of order is yx = aJ x + bJ− x with a and b arbitrary constants. 2. If 2 is an odd positive integer, say 2 = 2n + 1, then = n + 21 for some positive integer n. In this case, J and J− are still linearly independent. It can be shown that in this case Jn+1/2 x and Jn−1/2 x can be expressed in closed form as finite sums of terms involving square roots, sines and cosines. For example, by manipulating the series for J x, we find that
2 2 2 sinx J1/2 x = sinx J−1/2 x = cosx J3/2 x = − cosx x x x x and
J−3/2 x =
cosx 2 − sinx − x x
In this case, the general solution of Bessel’s equation of order is yx = aJn+1/2 x + bJn−1/2 x with a and b arbitrary constants. 3. 2 is an integer, but is not of the form n + 21 for any positive integer n. In this case J x and J− x are solutions of Bessel’s equation, but they are linearly dependent. Indeed, one can check from the series that in this case, J− x = −1 J x In this case we must construct a second solution of Bessel’s equation, linearly independent from J x. This leads us to Bessel functions of the second kind.
16.2.3 Bessel Functions of the Second Kind In Section 4.4 we derived a second solution for Bessel’s equation for the case = 0. It was y2 x = J0 x lnx +
−1n+1 n=1
22n n!2
∅nx2n
in which ∅n = 1 +
1 1 +···+ 2 n
Instead of using this solution as it is written, it is customary to use a linear combination of y2 x and J0 x, which will of course also be a solution. This combination is denoted Y0 x, and is defined for x > 0 by Y0 x =
2
y x + ! − ln2J0 x 2
where ! is Euler’s constant, defined by ! = lim ∅n − lnn = 0577215664901533 n→
16.2 Bessel Functions
723
J0 and Y0 are linearly independent because of the lnx term in Y0 x, and the general solution of Bessel’s equation of order zero is therefore yx = aJ0 x + bY0 x with a and b arbitrary constants. Y0 is called a Bessel function of the second kind of order zero. With the choice made for the constants in defining Y0 , this function is also called Neumann’s function of order zero. If is a positive integer, say = n, a derivation similar to that of Y0 x, but with more computational details, yields the second solution x −1k+1 ∅k + ∅k + 1 2k+n 2 + ! + Yn x = J x ln x n 2 22k+n+1 k!k + n! k=1 −
n − k − 1! 2k−n 2 n−1 x k=0 22k−n+1 k!
This agrees with Y0 x if n = 0, with the understanding that in this case the last summation does not appear. The general solution of Bessel’s equation of positive integer order n is therefore yx = aJn x + bYn x Thus far we only have Y x for a nonnegative integer. We did not need this Bessel function of the second kind for the general solution of Bessel’s equation in other cases. However, it is possible to extend this definition of Y x to include all real values of by letting Y x =
1
J x cos − J− x sin
For any nonnegative integer n, one can show that Yn x = lim Y x →n
Y is Neumann’s Bessel function of order . This function is linearly independent from J x for x > 0, and it enables us to write the general solution of Bessel’s equation of order in all cases as yx = aJ x + bY x Graphs of some Bessel functions of both kinds are shown in Figures 16.6 and 16.7. y 1.0
y 0.4
y J0(x)
0.2
y J1 (x)
1
y J2(x)
y Y2 (x)
0.2 1
FIGURE 16.6
3
x
5 y Y0 (x)
x
Bessel functions of the first kind.
FIGURE 16.7
y Y1 (x)
Bessel functions of the second kind.
724
CHAPTER 16
Special Functions, Orthogonal Expansions, and Wavelets
Is interesting to notice that solutions of Bessel’s equation illustrate all of the cases of the Frobenius theorem (Theorem 4.4). Case 1 occurs if 2 is not an integer, case 2 if = 0, case 3 with no logarithm term if = n + 21 for some nonnegative integer n, and case 3 with a logarithm term if is a positive integer. In applications and models of physical systems, Bessel’s equation often occurs in disguised form, requiring a change of variables to write the solution in terms of Bessel functions.
EXAMPLE 16.2
Consider the differential equation
$$9x^2y'' - 27xy' + (9x^2 + 35)y = 0.$$
Let $y = x^2u$ and compute
$$y' = 2xu + x^2u', \qquad y'' = 2u + 4xu' + x^2u''.$$
Substitute these into the differential equation to get
$$18x^2u + 36x^3u' + 9x^4u'' - 54x^2u - 27x^3u' + 9x^4u + 35x^2u = 0.$$
Collect terms to write
$$9x^4u'' + 9x^3u' + (9x^4 - x^2)u = 0.$$
Divide by $9x^2$ to get
$$x^2u'' + xu' + \left(x^2 - \frac{1}{9}\right)u = 0,$$
which is Bessel's equation of order $\nu = \frac{1}{3}$. Since $2\nu$ is not an integer, the general solution for $u$ is
$$u(x) = aJ_{1/3}(x) + bJ_{-1/3}(x).$$
Therefore the original differential equation has general solution
$$y(x) = ax^2J_{1/3}(x) + bx^2J_{-1/3}(x)$$
for $x > 0$.

If $a$, $b$, and $c$ are constants and $\nu$ is any nonnegative number, then it is routine to show that $x^aJ_{\nu}(bx^c)$ and $x^aY_{\nu}(bx^c)$ are solutions of the general differential equation
$$y'' - \frac{2a-1}{x}y' + \left(b^2c^2x^{2c-2} + \frac{a^2 - \nu^2c^2}{x^2}\right)y = 0. \tag{16.12}$$
EXAMPLE 16.3
Consider the differential equation
$$y'' - \frac{2\sqrt{3} - 1}{x}y' + \left(784x^6 - \frac{61}{x^2}\right)y = 0.$$
To fit this into the template of equation (16.12), we must clearly choose $a = \sqrt{3}$, since $2a - 1 = 2\sqrt{3} - 1$. Because of the $x^6$ term, try putting $2c - 2 = 6$, hence $c = 4$. Now we must choose $b$ and $\nu$ so that
$$784 = b^2c^2 = 16b^2,$$
so $b = 7$, and
$$a^2 - \nu^2c^2 = 3 - 16\nu^2 = -61.$$
This equation is satisfied by $\nu = 2$. The general solution of the differential equation is therefore
$$y(x) = c_1x^{\sqrt{3}}J_2(7x^4) + c_2x^{\sqrt{3}}Y_2(7x^4)$$
for $x > 0$. Here $c_1$ and $c_2$ are arbitrary constants.
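The parameter matching just carried out is mechanical, so it can be sketched in a few lines of code. The helper below is hypothetical (not from the text); it assumes the equation has already been written in the shape of equation (16.12), with the coefficient of $y'$ equal to $-(2a-1)/x$.

```python
import math

def match_bessel_template(two_a_minus_1, power, B, A):
    """Given y'' - (two_a_minus_1/x) y' + (B x**power + A/x**2) y = 0,
    match it against equation (16.12),
        y'' - ((2a-1)/x) y' + (b^2 c^2 x^(2c-2) + (a^2 - nu^2 c^2)/x^2) y = 0,
    and return (a, b, c, nu).  Assumes a real solution exists (a^2 >= A)."""
    a = (two_a_minus_1 + 1) / 2      # from 2a - 1 = given coefficient
    c = (power + 2) / 2              # from 2c - 2 = power
    b = math.sqrt(B) / c             # from b^2 c^2 = B
    nu = math.sqrt(a * a - A) / c    # from a^2 - nu^2 c^2 = A
    return a, b, c, nu

# The equation of Example 16.3: y'' - ((2*sqrt(3)-1)/x) y' + (784 x^6 - 61/x^2) y = 0
a, b, c, nu = match_bessel_template(2 * math.sqrt(3) - 1, 6, 784, -61)
print(a, b, c, nu)   # a = sqrt(3), b = 7, c = 4, nu = 2
```

The same helper reproduces the rod problem of Section 16.2.5, where the equation $u'' + (w/EI)xu = 0$ gives $a = \frac{1}{2}$, $c = \frac{3}{2}$, $\nu = \frac{1}{3}$.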
16.2.4
Modified Bessel Functions
Sometimes a model of a physical phenomenon will require a modified Bessel function for its solution. We will show how these are obtained. Begin with the general solution
$$y(x) = c_1J_0(kx) + c_2Y_0(kx)$$
of the zero-order Bessel equation
$$y'' + \frac{1}{x}y' + k^2y = 0.$$
Let $k = i$. Then
$$y(x) = c_1J_0(ix) + c_2Y_0(ix)$$
is the general solution of
$$y'' + \frac{1}{x}y' - y = 0$$
for $x > 0$. This is a modified Bessel equation of order zero, and $J_0(ix)$ is a modified Bessel function of the first kind of order zero. Usually this is denoted $I_0$:
$$I_0(x) = J_0(ix) = 1 + \frac{1}{2^2}x^2 + \frac{1}{2^24^2}x^4 + \frac{1}{2^24^26^2}x^6 + \cdots.$$
Normally $Y_0(ix)$ is not used; instead the second solution is chosen to be
$$K_0(x) = (\ln 2 - \gamma)I_0(x) - I_0(x)\ln x + \frac{1}{4}x^2 + \cdots$$
for $x > 0$. Here $\gamma$ is Euler's constant. $K_0$ is a modified Bessel function of the second kind of order zero. Figure 16.8 shows graphs of $I_0(x)$ and $K_0(x)$.
[FIGURE 16.8 Modified Bessel functions: $y = I_0(x)$ and $y = K_0(x)$.]
The general solution of
$$y'' + \frac{1}{x}y' - y = 0$$
is therefore
$$y(x) = c_1I_0(x) + c_2K_0(x)$$
for $x > 0$. The general solution of
$$y'' + \frac{1}{x}y' - b^2y = 0 \tag{16.13}$$
is
$$y(x) = c_1I_0(bx) + c_2K_0(bx) \tag{16.14}$$
for $x > 0$. By a routine calculation using the series expansion, we find that
$$\int xI_0(\alpha x)\,dx = \frac{x}{\alpha}I_0'(\alpha x) + c$$
for any nonzero constant $\alpha$.

Often we are interested in the behavior of a function as the variable assumes increasingly large values. This is called asymptotic behavior, and we will treat it later in some detail for Bessel functions in general. However, with just a few lines of work we can get some idea of how $I_0(x)$ behaves for large $x$. Begin with
$$y'' + \frac{1}{x}y' - y = 0,$$
of which $cI_0(x)$ is a solution for any constant $c$. Under the change of variables $y = ux^{-1/2}$, this equation transforms to
$$u'' = \left(1 - \frac{1}{4x^2}\right)u,$$
with solution $u(x) = c\sqrt{x}\,I_0(x)$ for $x > 0$ and $c$ any constant. Transform further by putting $u = ve^x$, obtaining
$$v'' + 2v' + \frac{1}{4x^2}v = 0,$$
with solution $v(x) = c\sqrt{x}\,e^{-x}I_0(x)$. Since we are interested in the behavior of solutions for large $x$, attempt a series solution of this differential equation for $v$ of the form
$$v(x) = 1 + c_1\frac{1}{x} + c_2\frac{1}{x^2} + c_3\frac{1}{x^3} + \cdots.$$
Substitute into the differential equation and arrange terms to obtain
$$\left(-2c_1 + \frac{1}{4}\right)\frac{1}{x^2} + \left(2c_1 - 4c_2 + \frac{1}{4}c_1\right)\frac{1}{x^3} + \left(6c_2 - 6c_3 + \frac{1}{4}c_2\right)\frac{1}{x^4} + \left(12c_3 - 8c_4 + \frac{1}{4}c_3\right)\frac{1}{x^5} + \cdots = 0.$$
Each coefficient must vanish, hence
$$-2c_1 + \frac{1}{4} = 0, \qquad 2c_1 - 4c_2 + \frac{1}{4}c_1 = 0, \qquad 6c_2 - 6c_3 + \frac{1}{4}c_2 = 0, \qquad 12c_3 - 8c_4 + \frac{1}{4}c_3 = 0,$$
and so on. Then
$$c_1 = \frac{1}{8}, \qquad c_2 = \frac{9}{16}c_1 = \frac{9}{16}\cdot\frac{1}{8} = \frac{3^2}{2\cdot 8^2}, \qquad c_3 = \frac{25}{24}c_2 = \frac{25}{24}\cdot\frac{3^2}{2\cdot 8^2} = \frac{3^25^2}{3!\,8^3}, \qquad c_4 = \frac{49}{32}c_3 = \frac{49}{32}\cdot\frac{3^25^2}{3!\,8^3} = \frac{3^25^27^2}{4!\,8^4},$$
and the pattern is clear:
$$v(x) = 1 + \frac{1}{8}\frac{1}{x} + \frac{3^2}{2\cdot 8^2}\frac{1}{x^2} + \frac{3^25^2}{3!\,8^3}\frac{1}{x^3} + \frac{3^25^27^2}{4!\,8^4}\frac{1}{x^4} + \cdots.$$
Then, for some constant $c$,
$$I_0(x) = c\,\frac{e^x}{\sqrt{x}}\left(1 + \frac{1}{8}\frac{1}{x} + \frac{3^2}{2\cdot 8^2}\frac{1}{x^2} + \frac{3^25^2}{3!\,8^3}\frac{1}{x^3} + \frac{3^25^27^2}{4!\,8^4}\frac{1}{x^4} + \cdots\right).$$
The series on the right actually diverges, but the sum of the first $N$ terms approximates $I_0(x)$ as closely as we want, for $x$ sufficiently large. This is called an asymptotic expansion of $I_0(x)$. By an analysis we will not carry out, it can be shown that $c = 1/\sqrt{2\pi}$.

These results about modified Bessel functions will be applied shortly to a description of the skin effect in the flow of an alternating current through a wire of circular cross section.
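As a numerical check on this expansion, the asymptotic partial sums can be compared with the power series for $I_0$. The sketch below is illustrative only; the coefficient recurrence $c_k = c_{k-1}(2k-1)^2/(8k)$ reproduces $c_1 = 1/8$, $c_2 = 3^2/(2\cdot 8^2)$, and so on, and the constant $c = 1/\sqrt{2\pi}$ is taken from the statement above.

```python
import math

def I0_series(x, terms=60):
    # Power series I0(x) = sum_{k>=0} (x/2)^(2k) / (k!)^2
    s, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / 2) ** 2 / k ** 2
        s += term
    return s

def I0_asymptotic(x, n_terms=5):
    # e^x / sqrt(2*pi*x) * (1 + 1/(8x) + 3^2/(2*8^2 x^2) + ...)
    # coefficient pattern: c_k = c_{k-1} * (2k-1)^2 / (8k)
    s, c = 1.0, 1.0
    for k in range(1, n_terms):
        c *= (2 * k - 1) ** 2 / (8 * k)
        s += c / x ** k
    return math.exp(x) / math.sqrt(2 * math.pi * x) * s

x = 10.0
exact = I0_series(x)
approx = I0_asymptotic(x)
print(exact, approx)   # both near 2815.7; the first five asymptotic terms
                       # already agree to about five significant figures
```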
16.2.5
Some Applications of Bessel Functions
We will use Bessel functions in the next chapter to solve certain partial differential equations. However, Bessel functions arise in many different contexts. We will discuss two such settings here. The Critical Length of a Vertical Rod Consider a thin elastic rod of uniform density and circular cross section, clamped in a vertical position as in Figure 16.9. If the rod is long enough, and the upper end is given a displacement and held in that position until the rod is at rest, the rod will remain bent or displaced when released. Such a length is referred to as an unstable length. At some shorter lengths, however, the rod will return to the vertical position when released, after some small oscillations. These lengths are referred to as stable lengths for the rod. We would like to determine the critical length LC , the transition point from stable to unstable. Suppose the rod has length L and weight w per unit length. Let a be the radius of its circular cross section and E the Young’s modulus for the material of the rod. (This is the ratio of stress
[FIGURES 16.9 and 16.10 The rod clamped vertically, with the origin $(0, 0)$ at the upper end and $(L, 0)$ at the clamped lower end, and the slightly displaced rod with points $P(x, y)$ and $Q(\xi, \eta)$.]
to the corresponding strain, for an elongation or linear compression.) The moment of inertia of the cross section about a diameter is $I = \pi a^4/4$.

Assume that the rod is in equilibrium and is then displaced slightly from the vertical, as in Figure 16.10. The $x$ axis is vertical along the original position of the rod, with downward as positive and the origin at the upper end of the rod at equilibrium. Let $P(x, y)$ and $Q(\xi, \eta)$ be points on the displaced rod, as shown. The moment about $P$ of the weight of an element $w\,\Delta\xi$ at $Q$ is $w\,\Delta\xi\,(y(\xi) - y(x))$. By integrating this expression we obtain the moment about $P$ of the weight of the rod above $P$. Assume from the theory of elasticity that this moment about $P$ is $EIy''(x)$. Since the part of the rod above $P$ is in equilibrium, then
$$EIy''(x) = \int_0^x w\left(y(\xi) - y(x)\right)d\xi.$$
Differentiate this equation with respect to $x$:
$$EIy'''(x) = w\left(y(x) - y(x)\right) - \int_0^x wy'(x)\,d\xi = -wxy'(x).$$
Then
$$y'''(x) + \frac{w}{EI}xy'(x) = 0.$$
Let $u = y'$ to obtain the second order differential equation
$$u'' + \frac{w}{EI}xu = 0.$$
Compare this equation with equation (16.12). We need
$$2a - 1 = 0, \qquad a^2 - \nu^2c^2 = 0, \qquad 2c - 2 = 1,$$
and
$$b^2c^2 = \frac{w}{EI}.$$
This leads us to choose
$$a = \frac{1}{2}, \qquad c = \frac{3}{2}, \qquad \nu = \frac{1}{3}, \qquad b = \frac{2}{3}\sqrt{\frac{w}{EI}}.$$
The general solution for $u(x)$ is
$$u(x) = y'(x) = c_1\sqrt{x}\,J_{1/3}\!\left(\frac{2}{3}\sqrt{\frac{w}{EI}}\,x^{3/2}\right) + c_2\sqrt{x}\,J_{-1/3}\!\left(\frac{2}{3}\sqrt{\frac{w}{EI}}\,x^{3/2}\right).$$
Since there is no bending moment at the top of the rod, $y''(0) = 0$.
We leave it for the student to show that this condition requires $c_1 = 0$. Then
$$y'(x) = c_2\sqrt{x}\,J_{-1/3}\!\left(\frac{2}{3}\sqrt{\frac{w}{EI}}\,x^{3/2}\right).$$
Since the lower end of the rod is clamped vertically, $y'(L) = 0$, so
$$c_2\sqrt{L}\,J_{-1/3}\!\left(\frac{2}{3}\sqrt{\frac{w}{EI}}\,L^{3/2}\right) = 0.$$
Since $c_2$ must be nonzero to avoid a trivial solution, we need
$$J_{-1/3}\!\left(\frac{2}{3}\sqrt{\frac{w}{EI}}\,L^{3/2}\right) = 0.$$
The critical length $L_C$ is the smallest positive number which can be substituted for $L$ in this equation. From a table of Bessel functions we find that the smallest positive number $\alpha$ such that $J_{-1/3}(\alpha) = 0$ is approximately $1.8663$. Therefore
$$\frac{2}{3}\sqrt{\frac{w}{EI}}\,L_C^{3/2} \approx 1.8663,$$
so
$$L_C \approx 1.9863\left(\frac{EI}{w}\right)^{1/3}.$$

Alternating Current in a Wire  We will analyze alternating current in a wire of circular cross section, culminating in a mathematical description of the skin effect (at high frequencies, "most" of the current flows through a thin layer at the surface of the wire). Begin with general principles named for Ampère and Faraday. Ampère's law states that the line integral of the magnetic force around a closed curve (circuit) is equal to $4\pi$ times the integral of the electric current through the circuit. Faraday's law states that the line integral of the electric force around a closed circuit equals the negative of the time derivative of the magnetic induction through the circuit.

We want to use these laws to determine the current density at radius $r$ in a wire of circular cross section and radius $a$. Let $\rho$ be the specific resistance of the wire, $\mu$ its permeability, and $x(r, t)$ and $H(r, t)$ the current density and magnetic intensity, respectively, at radius $r$ and time $t$. To begin, apply Ampère's law to a circle of radius $r$ having its axis along the axis of the wire. We get
$$2\pi rH = 4\pi\int_0^r x(\xi, t)\,2\pi\xi\,d\xi,$$
or
$$rH = 4\pi\int_0^r x(\xi, t)\,\xi\,d\xi. \tag{16.15}$$
Then
$$\frac{\partial}{\partial r}(rH) = 4\pi xr,$$
so
$$\frac{1}{r}\frac{\partial}{\partial r}(rH) = 4\pi x(r, t). \tag{16.16}$$
[FIGURE 16.11 A rectangular circuit with one side of length $L$ along the axis of the wire and width $r$.]
Now apply Faraday’s law to the rectangular circuit of Figure 16.11, having one side of length L along the axis of the cylinder. We get
r LH td Lx0 t − Lxr t = −
t 0 Differentiate this equation with respect to r to get
H
x =
r
t
(16.17)
We want to use equations 16.16 and 16.17 to eliminate H. First multiply equation (16.17) by r to get r
H
x = r
r
t
Differentiate with respect to r:
x
H
x
rH = 4xr = 4r r = r =
r
r
r
t
t r
t
t in which we substituted from equation (16.16) at the next to last step. Then
x
x r = 4r
r
r
t
(16.18)
The idea is to solve this partial differential equation for xr t, then obtain Hr t from equation (16.15). To do this, assume that the current through the wire is an alternating current given by C cost, with C constant. Thus the period of the current is 2/. It is convenient to write zr t = xr t + iyr t, so xr t = Rezr t, and to think of the current as the real part of the complex exponential Ceit . The differential equation (16.18), with z in place of x, is
z
z
r = 4r (16.19)
r
r
t To solve this equation, we will attempt a solution of the form zr t = freit Substitute this proposed solution into equation (16.19) to get
rf reit = 4rfrieit
r
Divide by eit and carry out the differentiations to get 1 f r + f r − b2 fr = 0 r where b2 =
4 i
Comparing this equation with equation (16.13), we can write the general solution for $f(r)$ in terms of modified Bessel functions:
$$f(r) = c_1I_0(br) + c_2K_0(br),$$
where
$$b = \sqrt{\frac{4\pi\mu\omega}{\rho}}\,\frac{1 + i}{\sqrt{2}}.$$
Because of the logarithm term in $K_0(r)$, which has infinite limit as $r \to 0$ (center of the wire), choose $c_2 = 0$. Thus $f(r)$ has the form $f(r) = c_1I_0(br)$, and
$$z(r, t) = c_1I_0(br)e^{i\omega t}.$$
To determine the constant, use the fact that (the real part of) $Ce^{i\omega t}$ is the total current; hence, using the integration formula from Section 16.2.4,
$$C = 2\pi c_1\int_0^a rI_0(br)\,dr = \frac{2\pi ac_1}{b}I_0'(ba).$$
Then
$$c_1 = \frac{1}{2\pi a}\frac{bC}{I_0'(ba)}$$
and
$$z(r, t) = \frac{bC}{2\pi a}\frac{1}{I_0'(ba)}I_0(br)e^{i\omega t}.$$
Then $x(r, t) = \operatorname{Re}(z(r, t))$, and we leave it for the student to show that
$$H(r, t) = \operatorname{Re}\left(\frac{2C}{aI_0'(ba)}I_0'(br)e^{i\omega t}\right).$$
We can use the solution for $z(r, t)$ to model the skin effect. The entire current flowing through a cylinder of radius $r$ within the wire (and having the same central axis as the wire) is the real part of
$$\frac{b}{2\pi aI_0'(ba)}Ce^{i\omega t}\int_0^r I_0(b\xi)\,2\pi\xi\,d\xi,$$
and some computation shows that this is the real part of
$$\frac{rI_0'(br)}{aI_0'(ba)}Ce^{i\omega t}.$$
Therefore
$$\frac{\text{current in the cylinder of radius } r}{\text{total current in the wire}} = \frac{r}{a}\frac{I_0'(br)}{I_0'(ba)}.$$
When the frequency $\omega$ is large, then the magnitude of $b$ is large, and we can use the asymptotic expansion of $I_0(x)$ given in Section 16.2.4 to write
$$\frac{r}{a}\frac{I_0'(br)}{I_0'(ba)} \approx \frac{r}{a}\,\frac{e^{br}/\sqrt{br}}{e^{ba}/\sqrt{ba}} = \sqrt{\frac{r}{a}}\,e^{-b(a-r)}.$$
For any $r$ with $0 < r < a$, we can make the magnitude of $\sqrt{r/a}\,e^{-b(a-r)}$ as small as we like by taking the frequency sufficiently large, since the real part of $b$ grows with $\omega$. This means that for large frequencies, most of the current is flowing near the outer surface of the wire. This is the skin effect.
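A rough numerical illustration of the skin effect, assuming hypothetical values for the wire radius and for the constant $\sqrt{4\pi\mu\omega/\rho}$. The current-density ratio uses $I_0' = I_1$, evaluated from its power series, which is valid for complex argument:

```python
import math

def I1(z, terms=80):
    # I1(z) = I0'(z) = sum_{k>=0} (z/2)^(2k+1) / (k! (k+1)!), valid for complex z
    s = term = z / 2
    for k in range(1, terms):
        term *= (z / 2) ** 2 / (k * (k + 1))
        s += term
    return s

a = 1.0                              # wire radius (illustrative value)
beta = 20.0                          # stands in for sqrt(4*pi*mu*omega/rho)
b = beta * (1 + 1j) / math.sqrt(2)   # the complex constant in f(r) = c1 I0(br)

def current_fraction(r):
    # |r I0'(br) / (a I0'(ba))|: fraction of the total current inside radius r
    return abs(r * I1(b * r) / (a * I1(b * a)))

for r in (0.5, 0.9, 1.0):
    print(r, current_fraction(r))
```

Increasing `beta` (that is, raising the frequency) concentrates the current ever more sharply near $r = a$, in agreement with the asymptotic estimate above.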
16.2.6  A Generating Function for $J_n(x)$

We now return to a development of general properties of Bessel functions. For the Legendre polynomials, we produced a generating function $L(x, t)$ with the property that
$$L(x, t) = \sum_{n=0}^{\infty}P_n(x)t^n.$$
In the same spirit, we will now produce a generating function for the integer order Bessel functions of the first kind.
THEOREM 16.12
Generating Function for Bessel Functions
$$e^{x(t - 1/t)/2} = \sum_{n=-\infty}^{\infty}J_n(x)t^n. \tag{16.20}$$
To understand why equation (16.20) is true, begin with the familiar Maclaurin expansion of the exponential function to write
$$e^{x(t-1/t)/2} = e^{xt/2}e^{-x/2t} = \left(\sum_{k=0}^{\infty}\frac{1}{k!}\left(\frac{xt}{2}\right)^k\right)\left(\sum_{m=0}^{\infty}\frac{(-1)^m}{m!}\left(\frac{x}{2t}\right)^m\right)$$
$$= \left(1 + \frac{x}{2}t + \frac{1}{2!}\frac{x^2}{2^2}t^2 + \frac{1}{3!}\frac{x^3}{2^3}t^3 + \cdots\right)\left(1 - \frac{x}{2t} + \frac{1}{2!}\frac{x^2}{2^2t^2} - \frac{1}{3!}\frac{x^3}{2^3t^3} + \cdots\right).$$
To illustrate the idea, look for the coefficient of $t^4$ in this product. We obtain $t^4$ when $x^4t^4/2^44!$ on the left is multiplied by $1$ on the right, when $x^5t^5/2^55!$ is multiplied by $-x/2t$ on the right, when $x^6t^6/2^66!$ is multiplied by $x^2/2^22!t^2$ on the right, and so on. In this way we find that the coefficient of $t^4$ in this product is
$$\frac{1}{2^44!}x^4 - \frac{1}{2^61!5!}x^6 + \frac{1}{2^82!6!}x^8 - \frac{1}{2^{10}3!7!}x^{10} + \cdots = \sum_{n=0}^{\infty}\frac{(-1)^n}{2^{2n+4}n!(n+4)!}x^{2n+4}.$$
Now compare this series with
$$J_4(x) = \sum_{n=0}^{\infty}\frac{(-1)^n}{2^{2n+4}n!\,\Gamma(n+4+1)}x^{2n+4} = \sum_{n=0}^{\infty}\frac{(-1)^n}{2^{2n+4}n!(n+4)!}x^{2n+4}.$$
Similar reasoning establishes that the coefficient of $t^n$ in equation (16.20) is $J_n(x)$ for any nonnegative integer $n$. For negative integers, we can use the fact that $J_{-n}(x) = (-1)^nJ_n(x)$.
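The coefficient extraction described above can be checked numerically. The sketch below (illustrative, with truncated series) compares the coefficient of $t^4$ in the product $e^{xt/2}e^{-x/2t}$ with the series for $J_4(x)$:

```python
import math

def jn_series(n, x, terms=30):
    # J_n(x) = sum_k (-1)^k (x/2)^(2k+n) / (k! (k+n)!)
    return sum((-1) ** k * (x / 2) ** (2 * k + n)
               / (math.factorial(k) * math.factorial(k + n)) for k in range(terms))

def coeff_of_tn(n, x, terms=30):
    # coefficient of t^n in e^(xt/2) e^(-x/(2t)); the t^n terms come from pairing
    # (xt/2)^(n+m)/(n+m)! on the left with (-x/(2t))^m / m! on the right
    return sum((x / 2) ** (n + m) / math.factorial(n + m)
               * (-1) ** m * (x / 2) ** m / math.factorial(m) for m in range(terms))

print(coeff_of_tn(4, 2.0), jn_series(4, 2.0))  # both equal J_4(2), about 0.0340
```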
16.2.7  An Integral Formula for $J_n(x)$

Using the generating function, we can derive an integral formula for $J_n(x)$ when $n$ is a nonnegative integer.

THEOREM 16.13
Bessel's Integral
If $n$ is a nonnegative integer, then
$$J_n(x) = \frac{1}{\pi}\int_0^{\pi}\cos(n\theta - x\sin\theta)\,d\theta.$$
Proof
Begin with the fact that
$$e^{xt/2}e^{-x/2t} = \sum_{n=-\infty}^{\infty}J_n(x)t^n.$$
Since $J_{-n}(x) = (-1)^nJ_n(x)$,
$$e^{x(t-1/t)/2} = \sum_{n=-\infty}^{-1}J_n(x)t^n + J_0(x) + \sum_{n=1}^{\infty}J_n(x)t^n$$
$$= \sum_{n=1}^{\infty}(-1)^nJ_n(x)t^{-n} + J_0(x) + \sum_{n=1}^{\infty}J_n(x)t^n$$
$$= J_0(x) + \sum_{n=1}^{\infty}J_n(x)\left(t^n + (-1)^n\frac{1}{t^n}\right)$$
$$= J_0(x) + \sum_{n=1}^{\infty}J_{2n}(x)\left(t^{2n} + \frac{1}{t^{2n}}\right) + \sum_{n=1}^{\infty}J_{2n-1}(x)\left(t^{2n-1} - \frac{1}{t^{2n-1}}\right). \tag{16.21}$$
Now let
$$t = e^{i\theta} = \cos\theta + i\sin\theta.$$
Then
$$t^{2n} + \frac{1}{t^{2n}} = e^{2in\theta} + e^{-2in\theta} = 2\cos(2n\theta)$$
and
$$t^{2n-1} - \frac{1}{t^{2n-1}} = e^{i(2n-1)\theta} - e^{-i(2n-1)\theta} = 2i\sin((2n-1)\theta).$$
Therefore equation (16.21) becomes
$$e^{x(t-1/t)/2} = e^{ix\sin\theta} = \cos(x\sin\theta) + i\sin(x\sin\theta) = J_0(x) + 2\sum_{n=1}^{\infty}J_{2n}(x)\cos(2n\theta) + 2i\sum_{n=1}^{\infty}J_{2n-1}(x)\sin((2n-1)\theta).$$
The real part of the left side of this equation must equal the real part of the right side, and similarly for the imaginary parts:
$$\cos(x\sin\theta) = J_0(x) + 2\sum_{n=1}^{\infty}J_{2n}(x)\cos(2n\theta) \tag{16.22}$$
and
$$\sin(x\sin\theta) = 2\sum_{n=1}^{\infty}J_{2n-1}(x)\sin((2n-1)\theta). \tag{16.23}$$
Now recognize that the series on the right in equations (16.22) and (16.23) are Fourier series. Focusing on equation (16.22) for the moment, its Fourier series is therefore
$$\cos(x\sin\theta) = \frac{1}{2}a_0 + \sum_{k=1}^{\infty}\left(a_k\cos(k\theta) + b_k\sin(k\theta)\right) = J_0(x) + 2\sum_{n=1}^{\infty}J_{2n}(x)\cos(2n\theta).$$
Since we know the coefficients in a Fourier expansion, we conclude that
$$a_k = \frac{1}{\pi}\int_{-\pi}^{\pi}\cos(x\sin\theta)\cos(k\theta)\,d\theta = \begin{cases}0 & \text{if } k \text{ is odd},\\ 2J_k(x) & \text{if } k \text{ is even},\end{cases} \tag{16.24}$$
and
$$b_k = \frac{1}{\pi}\int_{-\pi}^{\pi}\cos(x\sin\theta)\sin(k\theta)\,d\theta = 0 \quad\text{for } k = 1, 2, 3, \ldots. \tag{16.25}$$
Similarly, from equation (16.23),
$$\sin(x\sin\theta) = \frac{1}{2}A_0 + \sum_{k=1}^{\infty}\left(A_k\cos(k\theta) + B_k\sin(k\theta)\right) = 2\sum_{n=1}^{\infty}J_{2n-1}(x)\sin((2n-1)\theta),$$
so these Fourier coefficients are
$$A_k = \frac{1}{\pi}\int_{-\pi}^{\pi}\sin(x\sin\theta)\cos(k\theta)\,d\theta = 0 \quad\text{for } k = 0, 1, 2, \ldots \tag{16.26}$$
and
$$B_k = \frac{1}{\pi}\int_{-\pi}^{\pi}\sin(x\sin\theta)\sin(k\theta)\,d\theta = \begin{cases}0 & \text{if } k \text{ is even},\\ 2J_k(x) & \text{if } k \text{ is odd}.\end{cases} \tag{16.27}$$
Upon adding equations (16.24) and (16.27), we have
$$\frac{1}{\pi}\int_{-\pi}^{\pi}\cos(x\sin\theta)\cos(k\theta)\,d\theta + \frac{1}{\pi}\int_{-\pi}^{\pi}\sin(x\sin\theta)\sin(k\theta)\,d\theta = \frac{1}{\pi}\int_{-\pi}^{\pi}\cos(k\theta - x\sin\theta)\,d\theta = 2J_k(x),$$
whether $k$ is even or odd. Thus
$$J_k(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\cos(k\theta - x\sin\theta)\,d\theta \quad\text{for } k = 0, 1, 2, 3, \ldots.$$
To complete the proof, we have only to observe that $\cos(k\theta - x\sin\theta)$ is an even function of $\theta$, hence $\int_{-\pi}^{\pi} = 2\int_0^{\pi}$, so
$$J_k(x) = \frac{1}{\pi}\int_0^{\pi}\cos(k\theta - x\sin\theta)\,d\theta \quad\text{for } k = 0, 1, 2, 3, \ldots.$$
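Bessel's integral can be verified numerically against the series definition of $J_n$. A sketch (the trapezoid rule is very accurate here because the integrand has vanishing derivative at both endpoints):

```python
import math

def jn_series(n, x, terms=30):
    # partial sum of the power series for J_n(x)
    return sum((-1) ** k * (x / 2) ** (2 * k + n)
               / (math.factorial(k) * math.factorial(k + n)) for k in range(terms))

def bessel_integral(n, x, N=2000):
    # (1/pi) * integral from 0 to pi of cos(n*theta - x*sin(theta)), trapezoid rule
    h = math.pi / N
    total = 0.5 * (1.0 + math.cos(n * math.pi))   # endpoint values at theta = 0, pi
    for i in range(1, N):
        theta = i * h
        total += math.cos(n * theta - x * math.sin(theta))
    return total * h / math.pi

print(bessel_integral(1, 1.5), jn_series(1, 1.5))  # both ~0.557937 = J_1(1.5)
```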
16.2.8  A Recurrence Relation for $J_{\nu}(x)$

We will derive three recurrence-type relationships involving Bessel functions of the first kind. These provide information about the function or its derivative in terms of functions of the same type, but of lower index. We begin with two relationships involving derivatives.

THEOREM 16.14
If $\nu$ is a real number, then
$$\frac{d}{dx}\left(x^{\nu}J_{\nu}(x)\right) = x^{\nu}J_{\nu-1}(x). \tag{16.28}$$
Proof  Begin with the case that $\nu$ is not a negative integer. By direct computation,
$$\frac{d}{dx}\left(x^{\nu}J_{\nu}(x)\right) = \frac{d}{dx}\left(x^{\nu}\sum_{n=0}^{\infty}\frac{(-1)^n}{2^{2n+\nu}n!\,\Gamma(n+\nu+1)}x^{2n+\nu}\right) = \frac{d}{dx}\sum_{n=0}^{\infty}\frac{(-1)^n}{2^{2n+\nu}n!\,\Gamma(n+\nu+1)}x^{2n+2\nu}$$
$$= \sum_{n=0}^{\infty}\frac{(-1)^n(2n+2\nu)}{2^{2n+\nu}n!\,(n+\nu)\Gamma(n+\nu)}x^{2n+2\nu-1} = x^{\nu}\sum_{n=0}^{\infty}\frac{(-1)^n}{2^{2n+\nu-1}n!\,\Gamma(n+\nu)}x^{2n+\nu-1} = x^{\nu}J_{\nu-1}(x).$$
Now extend this result to the case that $\nu$ is a negative integer, say $\nu = -m$ with $m$ a positive integer, by using the fact that
$$J_{-m}(x) = (-1)^mJ_m(x).$$
We leave this detail to the student.

THEOREM 16.15
If $\nu$ is a real number, then
$$\frac{d}{dx}\left(x^{-\nu}J_{\nu}(x)\right) = -x^{-\nu}J_{\nu+1}(x). \tag{16.29}$$
Verification of this relationship is similar to that of equation (16.28). Using these two recurrence formulas involving derivatives, we can derive the following relationship between Bessel functions of the first kind of different orders.

THEOREM 16.16
Let $\nu$ be a real number. Then for $x > 0$,
$$\frac{2\nu}{x}J_{\nu}(x) = J_{\nu+1}(x) + J_{\nu-1}(x). \tag{16.30}$$
Proof  Carry out the differentiations in equations (16.28) and (16.29) to write
$$x^{\nu}J_{\nu}'(x) + \nu x^{\nu-1}J_{\nu}(x) = x^{\nu}J_{\nu-1}(x)$$
and
$$x^{-\nu}J_{\nu}'(x) - \nu x^{-\nu-1}J_{\nu}(x) = -x^{-\nu}J_{\nu+1}(x).$$
Multiply the first equation by $x^{-\nu}$ and the second by $x^{\nu}$ to obtain
$$J_{\nu}'(x) + \frac{\nu}{x}J_{\nu}(x) = J_{\nu-1}(x)$$
and
$$J_{\nu}'(x) - \frac{\nu}{x}J_{\nu}(x) = -J_{\nu+1}(x).$$
Upon subtracting the second of these equations from the first, we obtain the conclusion of the theorem.
EXAMPLE 16.4
Previously we stated that
$$J_{1/2}(x) = \sqrt{\frac{2}{\pi x}}\sin x, \qquad J_{-1/2}(x) = \sqrt{\frac{2}{\pi x}}\cos x,$$
results obtained by direct reference to the infinite series for these Bessel functions. Putting $\nu = \frac{1}{2}$ into equation (16.30), we get
$$\frac{1}{x}J_{1/2}(x) = J_{3/2}(x) + J_{-1/2}(x).$$
Then
$$J_{3/2}(x) = \frac{1}{x}J_{1/2}(x) - J_{-1/2}(x) = \sqrt{\frac{2}{\pi x}}\left(\frac{1}{x}\sin x - \cos x\right).$$
Then, upon putting $\nu = \frac{3}{2}$ into equation (16.30), we get
$$\frac{3}{x}J_{3/2}(x) = J_{5/2}(x) + J_{1/2}(x).$$
Then
$$J_{5/2}(x) = -J_{1/2}(x) + \frac{3}{x}J_{3/2}(x) = -\sqrt{\frac{2}{\pi x}}\sin x + \frac{3}{x}\sqrt{\frac{2}{\pi x}}\left(\frac{1}{x}\sin x - \cos x\right) = \sqrt{\frac{2}{\pi x}}\left(-\sin x + \frac{3}{x^2}\sin x - \frac{3}{x}\cos x\right).$$
This process can be continued indefinitely. The point is that this is a better way to generate the Bessel functions $J_{n+1/2}(x)$ than referring to the infinite series each time.
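This recursive generation is easy to mechanize. A small sketch (not from the text), climbing from the closed forms for $J_{\pm 1/2}$:

```python
import math

def j_half(steps, x):
    """Climb from J_{-1/2} and J_{1/2} to J_{steps + 1/2}(x) using
    J_{nu+1}(x) = (2*nu/x) J_nu(x) - J_{nu-1}(x)   (equation (16.30))."""
    c = math.sqrt(2 / (math.pi * x))
    prev, curr = c * math.cos(x), c * math.sin(x)   # J_{-1/2}(x), J_{1/2}(x)
    nu = 0.5
    for _ in range(steps):
        prev, curr = curr, (2 * nu / x) * curr - prev
        nu += 1.0
    return curr

x = 2.0
print(j_half(1, x))  # J_{3/2}(2) = sqrt(2/(2*pi)) * (sin(2)/2 - cos(2))
```

Upward recurrence can lose accuracy when the order greatly exceeds $x$, so for high orders other evaluation methods are preferred; for the first few half-integer orders it is both fast and accurate.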
16.2.9  Zeros of $J_{\nu}(x)$
We have seen in some of the applications that we sometimes need to know where $J_{\nu}(x) = 0$. Such points are the zeros of $J_{\nu}(x)$. We will show that $J_{\nu}(x)$ has infinitely many simple positive zeros, and also obtain estimates for their locations. As a starting point, recall from equation (16.12) that $y = J_{\nu}(kx)$ is a solution of
$$x^2y'' + xy' + (k^2x^2 - \nu^2)y = 0.$$
Let $k > 1$. Now put $u(x) = \sqrt{kx}\,J_{\nu}(kx)$. Substitute this into Bessel's equation to get
$$u''(x) + \left(k^2 - \frac{\nu^2 - \frac{1}{4}}{x^2}\right)u(x) = 0. \tag{16.31}$$
Our intuition is that, as $x$ increases, the term $\left(\nu^2 - \frac{1}{4}\right)/x^2$ exerts less influence on this equation for $u$, which begins to look more like $u'' + k^2u = 0$, with sine and cosine solutions. This suggests that, for large $x$, $J_{\nu}(kx)$ is approximated by a trigonometric function divided by $\sqrt{kx}$. Since such a function has infinitely many positive zeros, so must $J_{\nu}(kx)$. In order to exploit this intuition, consider the equation
$$v''(x) + v(x) = 0. \tag{16.32}$$
This has solution $v(x) = \sin(x - \alpha)$, with $\alpha$ any positive number. Multiply equation (16.31) by $v$ and equation (16.32) by $u$ and subtract to get
$$u''v - v''u = -\left(k^2 - 1 - \frac{\nu^2 - \frac{1}{4}}{x^2}\right)uv.$$
Write this equation as
$$\left(u'v - v'u\right)' = -\left(k^2 - 1 - \frac{\nu^2 - \frac{1}{4}}{x^2}\right)uv.$$
Now compute
$$\int_{\alpha}^{\alpha+\pi}\left(u'v - v'u\right)'dx = \left[u'v - v'u\right]_{\alpha}^{\alpha+\pi} = u(\alpha + \pi) + u(\alpha),$$
since $v(\alpha) = v(\alpha + \pi) = 0$, $v'(\alpha) = 1$, and $v'(\alpha + \pi) = -1$. Therefore
$$-u(\alpha + \pi) - u(\alpha) = \int_{\alpha}^{\alpha+\pi}\left(k^2 - 1 - \frac{\nu^2 - \frac{1}{4}}{x^2}\right)u(x)v(x)\,dx.$$
Apply the mean value theorem for integrals to the last integral. There is some number $c$ between $\alpha$ and $\alpha + \pi$ such that
$$-u(\alpha + \pi) - u(\alpha) = u(c)\left(k^2 - 1 - \frac{\nu^2 - \frac{1}{4}}{c^2}\right)\int_{\alpha}^{\alpha+\pi}\sin(x - \alpha)\,dx.$$
Now $\sin(x - \alpha) > 0$ for $\alpha < x < \alpha + \pi$. Further, we can choose $\alpha$ large enough (depending on $\nu$ and $k$) that
$$k^2 - 1 - \frac{\nu^2 - \frac{1}{4}}{x^2} > 0$$
for $\alpha \le x \le \alpha + \pi$. Therefore the integral on the right in the last equation is positive. Then $u(\alpha + \pi)$, $u(\alpha)$, and $u(c)$ cannot all be of the same sign. Since $u$ is continuous, $u$ must have a zero somewhere between $\alpha$ and $\alpha + \pi$. Since $u(x) = \sqrt{kx}\,J_{\nu}(kx)$, this proves that $J_{\nu}(kx)$ has at least one zero between $\alpha$ and $\alpha + \pi$. In general, if $\alpha$ is any sufficiently large number and $k > 1$, then $J_{\nu}(x)$ has a zero between $\alpha$ and $\alpha + k\pi$. We can now state a general result on positive zeros of Bessel functions of the first kind.
THEOREM 16.17
Zeros of $J_{\nu}(x)$
Let $k > 1$ and let $\nu$ be a real number. Then, for $\alpha$ sufficiently large, there is a zero of $J_{\nu}(x)$ between $\alpha + nk\pi$ and $\alpha + (n+1)k\pi$ for $n = 0, 1, 2, \ldots$. Further, each zero is simple.
Proof  The argument given prior to the theorem shows that, for any number sufficiently large (depending on $\nu$ and the selected $k > 1$), there is a zero of $J_{\nu}(x)$ in the interval from that number to that number plus $k\pi$. Thus there is a zero between $\alpha$ and $\alpha + k\pi$, then between $\alpha + k\pi$ and $\alpha + 2k\pi$, and so on. Further, each zero is simple. For if a zero $\beta$ had multiplicity greater than $1$, then $J_{\nu}(\beta) = J_{\nu}'(\beta) = 0$. But then $J_{\nu}(x)$ would be a solution of the initial value problem
$$x^2y'' + xy' + (x^2 - \nu^2)y = 0; \qquad y(\beta) = y'(\beta) = 0.$$
Since the solution of this problem is unique, and the zero function is a solution, this would imply that $J_{\nu}(x) = 0$ for $x > 0$, a contradiction. Thus each zero is simple.

The theorem implies that we can order the positive zeros of $J_{\nu}(x)$ in an increasing sequence
$$j_1 < j_2 < j_3 < \cdots$$
with $\lim_{n\to\infty}j_n = \infty$. It can be shown that for $\nu > -1$, $J_{\nu}(x)$ has no complex zeros. We will show that $J_{\nu}$ has no positive zero in common with $J_{\nu+1}$ or $J_{\nu-1}$. However, we claim that both $J_{\nu-1}$ and $J_{\nu+1}$ have at least one zero between any pair of positive zeros of $J_{\nu}$. This is the interlacing lemma stated as Theorem 16.18 below, and it means that the graphs of these three functions weave about each other, as can be seen in Figure 16.12 for $J_7(x)$, $J_8(x)$, and $J_9(x)$. First we need the following.
J8 (x) J9 (x)
J7 (x)
0.2 0.1 0
5
10
15
20
25
30
35
0.1 0.2 FIGURE 16.12
J9 x.
Interlacing of J7 x, J8 x, and
x
LEMMA 16.1
Let $\nu$ be a real number. Then, except possibly at $x = 0$, $J_{\nu}$ has no zero in common with either $J_{\nu-1}$ or $J_{\nu+1}$.
Proof  Recall from the proof of Theorem 16.16 that
$$J_{\nu}'(x) + \frac{\nu}{x}J_{\nu}(x) = J_{\nu-1}(x).$$
If $\alpha \ne 0$ and $J_{\nu}(\alpha) = J_{\nu-1}(\alpha) = 0$, then $J_{\nu}'(\alpha) = 0$ also. But then $\alpha$ would be a zero of multiplicity at least two for $J_{\nu}$, a contradiction. A similar use of the relation
$$J_{\nu}'(x) - \frac{\nu}{x}J_{\nu}(x) = -J_{\nu+1}(x)$$
shows that $J_{\nu}$ also cannot share a nonzero zero with $J_{\nu+1}$.
THEOREM 16.18
Interlacing Lemma
Let $\nu$ be any real number. Let $a$ and $b$ be distinct positive zeros of $J_{\nu}$. Then $J_{\nu-1}$ and $J_{\nu+1}$ each have at least one zero between $a$ and $b$.
Proof  Let $f(x) = x^{\nu}J_{\nu}(x)$. Then $f(a) = f(b) = 0$. Because $f$ is differentiable at all points between $a$ and $b$, by Rolle's theorem there is some $c$ between $a$ and $b$ at which $f'(c) = 0$. But
$$f'(x) = \frac{d}{dx}\left(x^{\nu}J_{\nu}(x)\right) = x^{\nu}J_{\nu-1}(x),$$
so $f'(c) = 0$ implies that $J_{\nu-1}(c) = 0$. Similar reasoning, applied to $g(x) = x^{-\nu}J_{\nu}(x)$ and using the recursion relation
$$\frac{d}{dx}\left(x^{-\nu}J_{\nu}(x)\right) = -x^{-\nu}J_{\nu+1}(x),$$
shows that $J_{\nu+1}$ has a zero between $a$ and $b$.
The following table gives the first five positive zeros of $J_{\nu}(x)$ for $\nu = 0, 1, 2, 3, 4$. The numbers here are rounded at the third decimal place. The interlacing property of successively indexed Bessel functions can be seen by looking down the columns. For example, the second positive zero of $J_2(x)$ falls between the second positive zeros of $J_1(x)$ and $J_3(x)$.

           j1        j2        j3        j4        j5
J0(x)    2.405     5.520     8.654    11.792    14.931
J1(x)    3.832     7.016    10.173    13.323    16.470
J2(x)    5.135     8.417    11.620    14.796    17.960
J3(x)    6.379     9.760    13.017    16.224    19.410
J4(x)    7.586    11.064    14.373    17.616    20.827

16.2.10
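Since each zero is simple, $J_{\nu}$ changes sign there, so the table entries can be reproduced by bisection on the power series. A sketch for the first zero of $J_0(x)$:

```python
import math

def J0(x, terms=40):
    # power series J0(x) = sum_k (-1)^k (x/2)^(2k) / (k!)^2
    s, term = 1.0, 1.0
    for k in range(1, terms):
        term *= -(x / 2) ** 2 / k ** 2
        s += term
    return s

def bisect(f, lo, hi, tol=1e-12):
    # f must change sign on [lo, hi]
    flo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if flo * fmid <= 0:
            hi = mid
        else:
            lo, flo = mid, fmid
    return 0.5 * (lo + hi)

j1 = bisect(J0, 2.0, 3.0)   # first positive zero of J0; table value 2.405
print(j1)
```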
Fourier–Bessel Expansions
Taking a cue from the Legendre polynomials, we might suspect that Bessel functions are orthogonal on some interval. They are not. However, let $\nu$ be any positive number. We know that $J_{\nu}$ has infinitely many positive zeros, which we can arrange in an ascending sequence
$$j_1 < j_2 < j_3 < \cdots.$$
For each such $j_n$, we can consider the function $\sqrt{x}\,J_{\nu}(j_nx)$ for $0 \le x \le 1$ (so $j_nx$ varies from $0$ to $j_n$). We claim that these functions are orthogonal on $[0, 1]$, in the sense that the integral of the product of any two of these functions over $[0, 1]$ is zero.

THEOREM 16.19
Orthogonality
Let $\nu \ge 0$. Then the functions $\sqrt{x}\,J_{\nu}(j_nx)$, for $n = 1, 2, 3, \ldots$, are orthogonal on $[0, 1]$ in the sense that
$$\int_0^1 xJ_{\nu}(j_nx)J_{\nu}(j_mx)\,dx = 0 \quad\text{if } n \ne m.$$
This is in the same spirit as the orthogonality of the Legendre polynomials on $[-1, 1]$, and the orthogonality of the functions $1, \cos x, \cos 2x, \ldots, \sin x, \sin 2x, \ldots$ on $[-\pi, \pi]$.
Proof
Again invoking equation (16.12), $u(x) = J_{\nu}(j_nx)$ satisfies
$$x^2u'' + xu' + (j_n^2x^2 - \nu^2)u = 0,$$
and $v(x) = J_{\nu}(j_mx)$ satisfies
$$x^2v'' + xv' + (j_m^2x^2 - \nu^2)v = 0.$$
Multiply the first equation by $v$ and the second by $u$, and subtract the resulting equations to obtain
$$x^2u''v + xu'v + (j_n^2x^2 - \nu^2)uv - x^2v''u - xv'u - (j_m^2x^2 - \nu^2)uv = 0.$$
This equation can be written
$$x^2\left(u'v - uv'\right)' + x\left(u'v - uv'\right) = (j_m^2 - j_n^2)x^2uv.$$
Divide by $x$:
$$x\left(u'v - uv'\right)' + \left(u'v - uv'\right) = (j_m^2 - j_n^2)xuv.$$
Write this equation as
$$\left(x\left(u'v - uv'\right)\right)' = (j_m^2 - j_n^2)xuv.$$
Then
$$\int_0^1\left(x\left(u'v - uv'\right)\right)'dx = \left[x\left(u'v - uv'\right)\right]_0^1 = j_nJ_{\nu}'(j_n)J_{\nu}(j_m) - j_mJ_{\nu}(j_n)J_{\nu}'(j_m) = 0,$$
since $J_{\nu}(j_n) = J_{\nu}(j_m) = 0$, while this same integral equals
$$(j_m^2 - j_n^2)\int_0^1 xJ_{\nu}(j_nx)J_{\nu}(j_mx)\,dx.$$
Since $j_n \ne j_m$, this proves the orthogonality of these functions on $[0, 1]$.

As usual, whenever we have an orthogonality relationship, we are led to attempt Fourier-type expansions. Let $f$ be defined on $[0, 1]$. How should we choose the coefficients to have an expansion
$$f(x) = \sum_{n=1}^{\infty}a_nJ_{\nu}(j_nx)?$$
Using a now familiar strategy, multiply this equation by $xJ_{\nu}(j_kx)$ and integrate to get
$$\int_0^1 xf(x)J_{\nu}(j_kx)\,dx = \sum_{n=1}^{\infty}a_n\int_0^1 xJ_{\nu}(j_nx)J_{\nu}(j_kx)\,dx = a_k\int_0^1 xJ_{\nu}^2(j_kx)\,dx.$$
The infinite series of integrals has collapsed to a single term because of the orthogonality. Then
$$a_k = \frac{\int_0^1 xf(x)J_{\nu}(j_kx)\,dx}{\int_0^1 xJ_{\nu}^2(j_kx)\,dx}.$$
We call these numbers the Fourier–Bessel coefficients of $f$. When these numbers are used in the series, we call $\sum_{n=1}^{\infty}a_nJ_{\nu}(j_nx)$ the Fourier–Bessel expansion, or Fourier–Bessel series, of $f$ in terms of the functions $\sqrt{x}\,J_{\nu}(j_nx)$.

Sometimes a different point of view is adopted. It is common to say that the functions $J_{\nu}(j_nx)$ are orthogonal on $[0, 1]$ with respect to the weight function $p(x) = x$. This simply means that the integral of the product of any two of these functions, multiplied also by $p(x)$, is zero over the interval $[0, 1]$:
$$\int_0^1 xJ_{\nu}(j_nx)J_{\nu}(j_mx)\,dx = 0 \quad\text{if } n \ne m.$$
This is the same integral we had before for orthogonality, but it places the integral in the context of the weight function $p(x)$, a viewpoint we will see shortly with Sturm–Liouville theory. Putting $p(x) = x$ in this integral has the same effect as putting a factor $\sqrt{x}$ with each $J_{\nu}(j_nx)$.

As with Fourier and Fourier–Legendre expansions, the fact that we can compute the coefficients and write the series does not mean that the series is related to the function in any particular way. The following convergence theorem deals with this issue.

THEOREM 16.20
Convergence of Fourier–Bessel Series
Let $f$ be piecewise smooth on $[0, 1]$. Then, for $0 < x < 1$,
$$\sum_{n=1}^{\infty}a_nJ_{\nu}(j_nx) = \frac{1}{2}\left(f(x+) + f(x-)\right),$$
where $a_n$ is the $n$th Fourier–Bessel coefficient of $f$.
We will give an example of a Fourier–Bessel expansion after we learn more about the coefficients.
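The orthogonality relation of Theorem 16.19 (and the value of the diagonal integral, which is evaluated in Section 16.2.11) can be checked numerically. A sketch using Simpson's rule and approximate zeros of $J_0$:

```python
import math

def jn_series(n, x, terms=40):
    # partial sum of the power series for J_n(x)
    return sum((-1) ** k * (x / 2) ** (2 * k + n)
               / (math.factorial(k) * math.factorial(k + n)) for k in range(terms))

def simpson(f, a, b, N=400):
    h = (b - a) / N
    s = f(a) + f(b)
    for i in range(1, N):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

z1, z2 = 2.40482556, 5.52007811   # first two positive zeros of J0
off_diag = simpson(lambda x: x * jn_series(0, z1 * x) * jn_series(0, z2 * x), 0.0, 1.0)
diag = simpson(lambda x: x * jn_series(0, z1 * x) ** 2, 0.0, 1.0)
print(off_diag, diag)   # off_diag is essentially zero; diag equals J1(z1)^2 / 2
```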
16.2.11
Fourier–Bessel Coefficients
The integral $\int_0^1 xJ_{\nu}^2(j_kx)\,dx$ occurs in the denominator of the expression for the Fourier–Bessel coefficients of any function, so it is useful to have an evaluation of this integral.

THEOREM 16.21
If $\nu \ge 0$, then
$$\int_0^1 xJ_{\nu}^2(j_kx)\,dx = \frac{1}{2}J_{\nu+1}(j_k)^2.$$
Notice the importance here of the fact that $J_{\nu}$ and $J_{\nu+1}$ cannot have a positive zero in common. Knowing that $J_{\nu}(j_k) = 0$ implies that $J_{\nu+1}(j_k) \ne 0$.
Proof
From the preceding discussion,
$$x^2u'' + xu' + (j_k^2x^2 - \nu^2)u = 0,$$
where $u(x) = J_{\nu}(j_kx)$. Multiply this equation by $2u'(x)$ to get
$$2x^2u''u' + 2x(u')^2 + 2(j_k^2x^2 - \nu^2)uu' = 0.$$
We can write this equation as
$$\left(x^2(u')^2 + (j_k^2x^2 - \nu^2)u^2\right)' - 2j_k^2xu^2 = 0.$$
Now integrate, keeping in mind that $u(1) = 0$:
$$0 = \int_0^1\left(x^2(u')^2 + (j_k^2x^2 - \nu^2)u^2\right)'dx - 2j_k^2\int_0^1 xu^2\,dx$$
$$= \left[x^2(u')^2 + (j_k^2x^2 - \nu^2)u^2\right]_0^1 - 2j_k^2\int_0^1 xu^2\,dx$$
$$= u'(1)^2 - 2j_k^2\int_0^1 xu^2\,dx = j_k^2J_{\nu}'(j_k)^2 - 2j_k^2\int_0^1 x\left(J_{\nu}(j_kx)\right)^2dx.$$
Then
$$\int_0^1 xJ_{\nu}^2(j_kx)\,dx = \frac{1}{2}J_{\nu}'(j_k)^2.$$
Now in general
$$J_{\nu}'(x) - \frac{\nu}{x}J_{\nu}(x) = -J_{\nu+1}(x).$$
Then
$$J_{\nu}'(j_k) - \frac{\nu}{j_k}J_{\nu}(j_k) = -J_{\nu+1}(j_k),$$
so
$$J_{\nu}'(j_k) = -J_{\nu+1}(j_k).$$
Therefore
$$\int_0^1 xJ_{\nu}^2(j_kx)\,dx = \frac{1}{2}J_{\nu+1}(j_k)^2.$$
In view of this conclusion, the Fourier–Bessel coefficients of $f$ are
$$a_n = \frac{2}{J_{\nu+1}(j_n)^2}\int_0^1 xf(x)J_{\nu}(j_nx)\,dx.$$
Fourier–Bessel series will occur later when we solve the heat equation for certain kinds of regions. We will then be faced with the task of expanding the initial temperature function in a Fourier–Bessel series. We will also see a Fourier–Bessel expansion when we solve for the normal modes of vibration of a circular membrane. Generally, Fourier–Bessel coefficients are difficult to compute because Bessel functions are difficult to evaluate at particular points, and even their zeros must be approximated. However, with modern computing power we can often make approximations to whatever degree of accuracy is needed.
EXAMPLE 16.5
Let $f(x) = x(1 - x)$ for $0 \le x \le 1$. Since $f$ is continuous with a continuous derivative, its Fourier–Bessel series (here with $\nu = 1$) will converge to $f(x)$ on $(0, 1)$:
$$x(1 - x) = \sum_{n=1}^{\infty}a_nJ_1(j_nx) \quad\text{for } 0 < x < 1,$$
where
$$a_n = \frac{2}{J_2(j_n)^2}\int_0^1 x^2(1 - x)J_1(j_nx)\,dx.$$
We will compute $a_1$ through $a_4$, using eight decimal places in the first four zeros of $J_1(x)$:
$$j_1 = 3.83170597, \quad j_2 = 7.01558667, \quad j_3 = 10.17346814, \quad j_4 = 13.32369194.$$
With the understanding that these integrations are approximations, compute
$$a_1 = \frac{2}{J_2(3.83170597)^2}\int_0^1 x^2(1 - x)J_1(3.83170597x)\,dx = 12.32930609\int_0^1 x^2(1 - x)J_1(3.83170597x)\,dx = 0.45221702,$$
$$a_2 = \frac{2}{J_2(7.01558667)^2}\int_0^1 x^2(1 - x)J_1(7.01558667x)\,dx = 22.20508362\int_0^1 x^2(1 - x)J_1(7.01558667x)\,dx = -0.03151859,$$
$$a_3 = \frac{2}{J_2(10.17346814)^2}\int_0^1 x^2(1 - x)J_1(10.17346814x)\,dx = 32.07568554\int_0^1 x^2(1 - x)J_1(10.17346814x)\,dx = 0.03201789,$$
and
$$a_4 = \frac{2}{J_2(13.32369194)^2}\int_0^1 x^2(1 - x)J_1(13.32369194x)\,dx = 41.94557796\int_0^1 x^2(1 - x)J_1(13.32369194x)\,dx = -0.00768864.$$
Then, for $0 < x < 1$,
$$x(1 - x) \approx 0.45221702\,J_1(3.83170597x) - 0.03151859\,J_1(7.01558667x) + 0.03201789\,J_1(10.17346814x) - 0.00768864\,J_1(13.32369194x).$$
Figure 16.13 shows a graph of $x(1 - x)$ and a graph of this four-term sum of Bessel functions on $[0, 1]$. The graph is drawn on $[-1, \frac{3}{2}]$ to emphasize that, outside of $[0, 1]$, there is no claim that $x(1 - x)$ is approximated by the Fourier–Bessel series, and indeed the graphs diverge from each other outside of $[0, 1]$. Accuracy on $[0, 1]$ can be improved by computing more terms in the series.

[FIGURE 16.13 Approximation of $x(1 - x)$ on $[0, 1]$ by a Fourier–Bessel series.]
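The coefficient $a_1$ of this example can be reproduced with elementary tools. A sketch using Simpson's rule and the partial power series for $J_1$ and $J_2$:

```python
import math

def jn_series(n, x, terms=40):
    # partial sum of the power series for J_n(x)
    return sum((-1) ** k * (x / 2) ** (2 * k + n)
               / (math.factorial(k) * math.factorial(k + n)) for k in range(terms))

def simpson(f, a, b, N=400):
    h = (b - a) / N
    s = f(a) + f(b)
    for i in range(1, N):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

j1 = 3.83170597   # first positive zero of J1, as in the example
a1 = (2 / jn_series(2, j1) ** 2) * simpson(
    lambda x: x ** 2 * (1 - x) * jn_series(1, j1 * x), 0.0, 1.0)
print(a1)  # ~0.45221702, matching the value computed in Example 16.5
```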
SECTION 16.2
PROBLEMS
1. Show that $x^aJ_{\nu}(bx^c)$ is a solution of
$$y'' - \frac{2a-1}{x}y' + \left(b^2c^2x^{2c-2} + \frac{a^2 - \nu^2c^2}{x^2}\right)y = 0.$$
In each of Problems 2 through 9, write the general solution of the differential equation in terms of functions $x^aJ_{\nu}(bx^c)$ and $x^aJ_{-\nu}(bx^c)$.
2. $y'' + \dfrac{1}{3x}y' + \left(1 + \dfrac{7}{144x^2}\right)y = 0$
3. $y'' + \dfrac{1}{x}y' + \left(4x^2 - \dfrac{4}{9x^2}\right)y = 0$
4. $y'' - \dfrac{5}{x}y' + \left(64x^6 + \dfrac{5}{x^2}\right)y = 0$
5. $y'' + \dfrac{3}{x}y' + \left(16x^2 - \dfrac{5}{4x^2}\right)y = 0$
6. $y'' - \dfrac{3}{x}y' + 9x^4y = 0$
7. $y'' - \dfrac{7}{x}y' + \left(36x^4 + \dfrac{175}{16x^2}\right)y = 0$
8. $y'' + \dfrac{1}{x}y' - \dfrac{1}{16x^2}y = 0$
9. $y'' + \dfrac{5}{x}y' + \left(81x^4 + \dfrac{7}{4x^2}\right)y = 0$
10. Use the change of variables $by = \dfrac{1}{u}\dfrac{du}{dx}$ to transform the differential equation
$$\frac{dy}{dx} + by^2 = cx^m$$
into the differential equation
$$\frac{d^2u}{dx^2} - bcx^mu = 0.$$
Use the result of Problem 1 to find the general solution of this differential equation in terms of Bessel functions, and use this solution to solve the original differential equation. Assume that $b$ is a positive constant.
In each of Problems 11 through 16, use the given change of variables to transform the differential equation into one whose general solution can be written in terms of Bessel functions. Use this to write the general solution of the original differential equation.
11. $4x^2y'' + 4xy' + (x - 9)y = 0$; $z = \sqrt{x}$
12. $4x^2y'' + 4xy' + (9x^3 - 36)y = 0$; $z = x^{3/2}$
13. $9x^2y'' + 9xy' + (4x^{2/3} - 16)y = 0$; $z = 2x^{1/3}$
14. $9x^2y'' - 27xy' + (9x^2 + 35)y = 0$; $u = y/x^2$
15. $36x^2y'' - 12xy' + (36x^2 + 7)y = 0$; $u = x^{-2/3}y$
16. $4x^2y'' + 8xy' + (4x^2 - 35)y = 0$; $u = y\sqrt{x}$
17. Show that $y(x) = \sqrt{x}\,J_{1/3}\!\left(\frac{2}{3}kx^{3/2}\right)$ is a solution of $y'' + k^2xy = 0$.
In each of Problems 18 through 22, write the general solution of the differential equation in terms of functions $x^aJ_{\nu}(bx^c)$ and $x^aY_{\nu}(bx^c)$.
18. $y'' - \dfrac{3}{x}y' + \left(4 - \dfrac{5}{x^2}\right)y = 0$
19. $y'' - \dfrac{1}{x}y' + \left(1 - \dfrac{3}{x^2}\right)y = 0$
20. $y'' - \dfrac{5}{x}y' + \left(1 - \dfrac{7}{x^2}\right)y = 0$
21. $y'' - \dfrac{3}{x}y' + \left(\dfrac{1}{x} + \dfrac{3}{4x^2}\right)y = 0$
22. $y'' - \dfrac{1}{x}y' + \left(16x^2 - \dfrac{15}{x^2}\right)y = 0$
23. Show that
$$J_{5/2}(x) = \sqrt{\frac{2}{\pi x}}\left(\left(\frac{3}{x^2} - 1\right)\sin x - \frac{3}{x}\cos x\right).$$
24. Show that
$$J_{-5/2}(x) = \sqrt{\frac{2}{\pi x}}\left(\left(\frac{3}{x^2} - 1\right)\cos x + \frac{3}{x}\sin x\right).$$
25. Let $\alpha$ be a positive zero of $J_0(x)$. Show that $\int_0^1 J_1(\alpha x)\,dx = 1/\alpha$.
26. Let $u(x) = J_0(\alpha x)$ and $v(x) = J_0(\beta x)$.
(a) Show that $xu'' + u' + \alpha^2xu = 0$. Derive a similar differential equation for $v$.
(b) Multiply the differential equation for $u$ by $v$, and the differential equation for $v$ by $u$, and subtract to show that
$$\left(x\left(u'v - v'u\right)\right)' = (\beta^2 - \alpha^2)xuv.$$
(c) Show from part (b) that
$$(\beta^2 - \alpha^2)\int xJ_0(\alpha x)J_0(\beta x)\,dx = x\left(\alpha J_0'(\alpha x)J_0(\beta x) - \beta J_0(\alpha x)J_0'(\beta x)\right).$$
This is one of a set of formulas called Lommel's integrals.
27. Show that $\left(xI_0'(x)\right)' = xI_0(x)$.
28. In each of (a) through (d), find (approximately) the first five terms in the Fourier–Bessel expansion $\sum_{n=1}^{\infty}a_nJ_1(j_nx)$ of $f(x)$, which is defined for $0 \le x \le 1$. Compare the graph of this function with the graph of the sum of the first five terms in the series.
(a) $f(x) = x$  (b) $f(x) = e^{-x}$  (c) $f(x) = xe^{-x}$  (d) $f(x) = x^2e^{-x}$
29. Carry out the program of Problem 28, except now use an expansion $\sum_{n=1}^{\infty}a_nJ_2(j_nx)$.
16.3 Sturm–Liouville Theory and Eigenfunction Expansions

16.3.1 The Sturm–Liouville Problem
We have now seen essentially the same scenario played out three times: differential equation ⟹ solutions that are orthogonal on [a, b] ⟹ expansions of arbitrary functions in series of these solutions ⟹ convergence theorem for the expansion. First we had Fourier (trigonometric) series, then Legendre polynomials and Fourier–Legendre series, and then Bessel functions and Fourier–Bessel expansions. It stretches the imagination to think that the similarities in the convergence theorems for these expansions are mere coincidence. We will now develop a general theory into which these convergence theorems fit naturally. This will also expand our arsenal of tools in preparation for solving partial differential equations. Consider the differential equation

y'' + R(x)y' + (Q(x) + λP(x))y = 0    (16.33)

Given an interval [a, b] on which the coefficients are continuous, we seek values of λ for which this equation has nontrivial solutions. As we will see, in some cases there will be boundary conditions the solutions must satisfy (conditions specified at a and b), and sometimes not.
CHAPTER 16
Special Functions, Orthogonal Expansions, and Wavelets
First put the differential equation into a convenient standard form. Multiply equation (16.33) by

r(x) = e^{∫R(x)dx}

to get

y''e^{∫R(x)dx} + R(x)y'e^{∫R(x)dx} + (Q(x) + λP(x))e^{∫R(x)dx} y = 0.

Since r(x) ≠ 0, this equation has the same solutions as equation (16.33). Now recognize that the last equation can be written

(ry')' + (q + λp)y = 0    (16.34)
Equation (16.34) is called the Sturm–Liouville differential equation, or the Sturm–Liouville form of equation (16.33). We will assume that p, q, r, and r' are continuous on [a, b], or at least on (a, b), and that p(x) > 0 and r(x) > 0 on (a, b).
EXAMPLE 16.6
Legendre's differential equation is

(1 − x²)y'' − 2xy' + λy = 0.

We can immediately write this in Sturm–Liouville form as

((1 − x²)y')' + λy = 0

for −1 ≤ x ≤ 1. Corresponding to the values λ = n(n + 1), with n = 0, 1, 2, ..., the Legendre polynomials are solutions. As we saw in Section 16.1, there are also nonpolynomial solutions corresponding to other choices for λ. However, these nonpolynomial solutions are not bounded on [−1, 1].
EXAMPLE 16.7
Equation (16.12), with a = 0, c = 1 and b = √λ, can be written

(xy')' + (λx − ν²/x)y = 0.

This is the Sturm–Liouville form of Bessel's equation. For λ > 0, this equation has solutions in terms of the Bessel functions of order ν of the first and second kinds, J_ν(√λ x) and Y_ν(√λ x).

We will now distinguish three kinds of Sturm–Liouville problems.

The Regular Sturm–Liouville Problem  We want numbers λ for which there are nontrivial solutions of

(ry')' + (q + λp)y = 0

on an interval [a, b]. These solutions must satisfy regular boundary conditions, which have the form

A1 y(a) + A2 y'(a) = 0,  B1 y(b) + B2 y'(b) = 0.

A1 and A2 are given constants, not both zero, and similarly for B1 and B2.
The Periodic Sturm–Liouville Problem  Now suppose r(a) = r(b). We seek numbers λ and corresponding nontrivial solutions of the Sturm–Liouville equation on [a, b], satisfying the periodic boundary conditions

y(a) = y(b),  y'(a) = y'(b).

The Singular Sturm–Liouville Problem  We look for numbers λ and corresponding nontrivial solutions of the Sturm–Liouville equation on [a, b], subject to one of the following three kinds of boundary conditions:
Type 1. r(a) = 0 and there is no boundary condition at a, while at b the boundary condition is B1 y(b) + B2 y'(b) = 0, where B1 and B2 are not both zero.
Type 2. r(b) = 0 and there is no boundary condition at b, while at a the condition is A1 y(a) + A2 y'(a) = 0, with A1 and A2 not both zero.
Type 3. r(a) = r(b) = 0, and no boundary condition is specified at a or b. In this case we want solutions that are bounded functions on [a, b].
Each of these problems is a boundary value problem, specifying certain conditions at the endpoints of an interval, as contrasted with an initial value problem, which specifies information about the function and its derivative at a point (in the second order case). Boundary value problems usually do not have unique solutions. Indeed, it is exactly this lack of uniqueness that can be exploited to solve many important problems. In each of these problems, a number λ for which the Sturm–Liouville differential equation has a nontrivial solution is called an eigenvalue of the problem. A corresponding nontrivial solution is called an eigenfunction associated with this eigenvalue. The zero function cannot be an eigenfunction. However, any nonzero constant multiple of an eigenfunction associated with a particular eigenvalue is also an eigenfunction for this eigenvalue. In mathematical models of problems in physics and engineering, eigenvalues usually have some physical significance. For example, in studying wave motion the eigenvalues are fundamental frequencies of vibration of the system. We will consider examples of these kinds of problems. The first will be important in analyzing problems involving heat conduction and wave propagation.
EXAMPLE 16.8 A Regular Problem
Consider the regular problem

y'' + λy = 0,  y(0) = y(L) = 0

on an interval [0, L]. We will find the eigenvalues and eigenfunctions by considering cases on λ. Since we will show later that a Sturm–Liouville problem cannot have a complex eigenvalue, there are three cases.
Case 1: λ = 0. Then y(x) = cx + d for some constants c and d. Now y(0) = d = 0, and y(L) = cL = 0 requires that c = 0. This means that y(x) = cx + d must be the trivial solution. In the absence of a nontrivial solution, λ = 0 is not an eigenvalue of this problem.
Case 2: λ is negative, say λ = −k² for k > 0.
Now y'' − k²y = 0 has general solution

y(x) = c1 e^{kx} + c2 e^{−kx}.

Since y(0) = c1 + c2 = 0, then c2 = −c1, so y = 2c1 sinh(kx). But then

y(L) = 2c1 sinh(kL) = 0.

Since kL > 0, sinh(kL) > 0, so c1 = 0. This case also leads to the trivial solution, so this Sturm–Liouville problem has no negative eigenvalue.
Case 3: λ is positive, say λ = k². The general solution of y'' + k²y = 0 is

y(x) = c1 cos(kx) + c2 sin(kx).

Now y(0) = c1 = 0, so y(x) = c2 sin(kx). Finally, we need

y(L) = c2 sin(kL) = 0.

To avoid the trivial solution, we need c2 ≠ 0. Then we must choose k so that sin(kL) = 0, which means that kL must be a positive integer multiple of π, say kL = nπ. Then

λn = n²π²/L²  for n = 1, 2, 3, ....

Each of these numbers is an eigenvalue of this Sturm–Liouville problem. Corresponding to each λn, the eigenfunctions are

yn(x) = c sin(nπx/L),

in which c can be any nonzero real number.
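These conclusions are easy to probe numerically. The short script below is not part of the original text; it is a sketch using only the Python standard library, with L = 2 chosen as a sample value. It verifies that sin(nπx/L) satisfies both boundary conditions and that eigenfunctions for distinct eigenvalues are orthogonal on [0, L], anticipating conclusion (2) of the Sturm–Liouville theorem later in this section.

```python
import math

def dot(f, g, a, b, n=2000):
    # dot product f.g = integral of f*g over [a, b], composite midpoint rule
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h) for i in range(n)) * h

L = 2.0
def y(n):
    # eigenfunction for lambda_n = (n*pi/L)**2
    return lambda x: math.sin(n * math.pi * x / L)

# boundary conditions y(0) = y(L) = 0 hold, up to floating-point roundoff
print(abs(y(3)(0.0)) < 1e-12, abs(y(3)(L)) < 1e-12)      # True True
# distinct eigenfunctions are orthogonal; y_n . y_n = L/2
print(abs(dot(y(1), y(2), 0.0, L)) < 1e-9)               # True
print(abs(dot(y(2), y(2), 0.0, L) - L / 2) < 1e-9)       # True
```

The midpoint rule is accurate enough here because the integrands are smooth and periodic on the interval.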
EXAMPLE 16.9 A Periodic Sturm–Liouville Problem
Consider the problem

y'' + λy = 0,  y(−L) = y(L),  y'(−L) = y'(L)

on an interval [−L, L]. Comparing this differential equation to equation (16.34), we have r(x) = 1, so r(−L) = r(L), as required for a periodic Sturm–Liouville problem. Consider cases on λ.
Case 1: λ = 0. Then y = cx + d. Now y(−L) = −cL + d = y(L) = cL + d implies that c = 0. The constant function y = d satisfies both boundary conditions. Thus λ = 0 is an eigenvalue with nonzero constant eigenfunctions.
Case 2: λ < 0, say λ = −k². Now

y(x) = c1 e^{kx} + c2 e^{−kx}.
Since y(−L) = y(L), then

c1 e^{−kL} + c2 e^{kL} = c1 e^{kL} + c2 e^{−kL}.    (16.35)

And y'(−L) = y'(L) gives us (after dividing out the common factor k)

c1 e^{−kL} − c2 e^{kL} = c1 e^{kL} − c2 e^{−kL}.    (16.36)

Rewrite equation (16.35) as

c1 (e^{−kL} − e^{kL}) = c2 (e^{−kL} − e^{kL}).

This implies that c1 = c2. Then equation (16.36) becomes

c1 (e^{−kL} − e^{kL}) = c1 (e^{kL} − e^{−kL}).

But this implies that c1 = −c1, hence c1 = 0. The solution is therefore trivial, hence this problem has no negative eigenvalue.
Case 3: λ is positive, say λ = k². Now

y(x) = c1 cos(kx) + c2 sin(kx).

Now

y(−L) = c1 cos(kL) − c2 sin(kL) = y(L) = c1 cos(kL) + c2 sin(kL).

But this implies that

2c2 sin(kL) = 0.

Next,

y'(−L) = kc1 sin(kL) + kc2 cos(kL) = y'(L) = −kc1 sin(kL) + kc2 cos(kL).

Then

kc1 sin(kL) = 0.

If sin(kL) ≠ 0, then c1 = c2 = 0, leaving the trivial solution. Thus suppose sin(kL) = 0. This requires that kL = nπ for some positive integer n. Therefore the numbers

λn = n²π²/L²

are eigenvalues for n = 1, 2, ..., with corresponding eigenfunctions

yn(x) = c1 cos(nπx/L) + c2 sin(nπx/L),

with c1 and c2 not both zero. We can combine Cases 1 and 3 by allowing n = 0, so the eigenvalue λ = 0 has corresponding nonzero constant eigenfunctions.
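A quick numerical sanity check (illustrative only, not from the text; L = 1.5 and n = 2 are arbitrary sample values) confirms that both cos(nπx/L) and sin(nπx/L) satisfy the periodic boundary conditions and the differential equation, so a periodic problem really does carry two independent eigenfunctions for each positive eigenvalue.

```python
import math

L, n = 1.5, 2
lam = (n * math.pi / L) ** 2          # eigenvalue n^2 pi^2 / L^2
y1 = lambda x: math.cos(n * math.pi * x / L)
y2 = lambda x: math.sin(n * math.pi * x / L)

def deriv(g, x, h=1e-6):
    # central-difference approximation to g'(x)
    return (g(x + h) - g(x - h)) / (2 * h)

for y in (y1, y2):
    # periodic boundary conditions: y(-L) = y(L), y'(-L) = y'(L)
    assert abs(y(-L) - y(L)) < 1e-9
    assert abs(deriv(y, -L) - deriv(y, L)) < 1e-6
    # y'' + lam*y = 0 at a sample point, via a second difference quotient
    x0, h = 0.3, 1e-4
    ypp = (y(x0 + h) - 2 * y(x0) + y(x0 - h)) / h ** 2
    assert abs(ypp + lam * y(x0)) < 1e-3
print("cos and sin are both eigenfunctions for lambda =", round(lam, 3))
```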
EXAMPLE 16.10 Bessel Functions as Eigenfunctions of a Singular Problem
Consider Bessel's equation of order ν,

(xy')' + (λx − ν²/x)y = 0,
on the interval [0, R]. Here ν is any given nonnegative real number, and R > 0. In the context of the Sturm–Liouville differential equation, r(x) = x, and r(0) = 0, so there is no boundary condition at 0. Let the boundary condition at R be

y(R) = 0.

We know that, if λ > 0, then the general solution of Bessel's equation is

y(x) = c1 J_ν(√λ x) + c2 Y_ν(√λ x).

To have a solution that is bounded as x → 0+, we must choose c2 = 0. This leaves solutions of the form y = c1 J_ν(√λ x). To satisfy the boundary condition at x = R, we must have

y(R) = c1 J_ν(√λ R) = 0.

We need c1 ≠ 0 to avoid the trivial solution, so we must choose λ so that J_ν(√λ R) = 0. If j1, j2, ... are the positive zeros of J_ν(x), then √λ R can be chosen as any jn. This yields an infinite sequence of eigenvalues

λn = jn²/R²,

with corresponding eigenfunctions

c J_ν(jn x/R),

with c constant but nonzero. This is an example of a type 1 singular Sturm–Liouville problem.
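For ν = 0 the eigenvalues can be computed concretely. The sketch below (added for illustration; standard-library Python only) evaluates J₀ from its power series and brackets its first two positive zeros by bisection — the well-known values j1 ≈ 2.4048 and j2 ≈ 5.5201 — then forms the first eigenvalue j1²/R².

```python
import math

def J0(x):
    # power series J0(x) = sum_k (-1)^k (x/2)^(2k) / (k!)^2, fine for moderate x
    s, term = 0.0, 1.0
    for k in range(60):
        s += term
        term *= -(x / 2.0) ** 2 / ((k + 1) ** 2)
    return s

def bisect(f, a, b, tol=1e-12):
    # bisection for a root of f in [a, b], assuming a sign change
    fa = f(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        if fa * f(m) <= 0.0:
            b = m
        else:
            a, fa = m, f(m)
    return 0.5 * (a + b)

j1 = bisect(J0, 2.0, 3.0)      # first positive zero of J0
j2 = bisect(J0, 5.0, 6.0)      # second positive zero
R = 1.0
print(round(j1, 4), round(j2, 4))    # 2.4048 5.5201
print(round((j1 / R) ** 2, 4))       # first eigenvalue lambda_1 = (j1/R)^2
```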
EXAMPLE 16.11 Legendre Polynomials as Eigenfunctions of a Singular Problem
Consider Legendre's differential equation

((1 − x²)y')' + λy = 0.

In the setting of Sturm–Liouville theory, r(x) = 1 − x². On the interval [−1, 1], we have r(−1) = r(1) = 0, so there are no boundary conditions and this is a singular Sturm–Liouville problem of type 3. We want bounded solutions on this interval, so choose λ = n(n + 1), with n = 0, 1, 2, .... These are the eigenvalues of this problem. Corresponding eigenfunctions are nonzero constant multiples of the Legendre polynomials Pn(x). Finally, here is an example with more complicated boundary conditions.
EXAMPLE 16.12
Consider the regular problem

y'' + λy = 0,  y(0) = 0,  3y(1) + y'(1) = 0.

This problem is defined on [0, 1]. To find the eigenvalues and eigenfunctions, consider cases on λ.
Case 1: λ = 0. Now y(x) = cx + d, and y(0) = d = 0. Then y = cx. But from the second boundary condition,

3y(1) + y'(1) = 3c + c = 0

forces c = 0, so this case has only the trivial solution. This means that 0 is not an eigenvalue of this problem.
Case 2: λ < 0. Write λ = −k² with k > 0, so y'' − k²y = 0, with general solution

y(x) = c1 e^{kx} + c2 e^{−kx}.

Now y(0) = 0 = c1 + c2, so c2 = −c1 and y(x) = 2c1 sinh(kx). Next,

3y(1) + y'(1) = 0 = 2c1 (3 sinh(k) + k cosh(k)).

But for k > 0, sinh(k) and k cosh(k) are positive, so this equation forces c1 = 0 and again we obtain only the trivial solution. This problem has no negative eigenvalue.
Case 3: λ > 0, say λ = k². Now y'' + k²y = 0, with general solution

y(x) = c1 cos(kx) + c2 sin(kx).

Then y(0) = c1 = 0, so y(x) = c2 sin(kx). The second boundary condition gives us

0 = 3c2 sin(k) + kc2 cos(k).

We need c2 ≠ 0 to avoid the trivial solution, so look for k so that

3 sin(k) + k cos(k) = 0.

This means that

tan(k) = −k/3.

This equation cannot be solved algebraically. However, Figure 16.14 shows graphs of y = tan(k) and y = −k/3 on the same set of axes. These graphs intersect infinitely often in the half plane
[FIGURE 16.14: graphs of y = tan(k) and y = −k/3, intersecting at k1, k2, k3, ... for k > 0]
k > 0. Let the k-coordinates of these points of intersection be k1, k2, .... The numbers λn = kn² are the eigenvalues of this problem, with corresponding eigenfunctions c sin(kn x) for c ≠ 0.
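Although tan(k) = −k/3 cannot be solved algebraically, the intersection points are easy to compute. The sketch below (not from the text; standard-library Python) exploits the fact that g(k) = 3 sin k + k cos k alternates between +3 and −3 at the points (2n − 1)π/2, so each interval ((2n − 1)π/2, (2n + 1)π/2) contains exactly one root, which bisection then locates.

```python
import math

def g(k):
    # zeros of g are the intersections of tan(k) with -k/3
    return 3.0 * math.sin(k) + k * math.cos(k)

def bisect(f, a, b, tol=1e-12):
    fa = f(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        if fa * f(m) <= 0.0:
            b = m
        else:
            a, fa = m, f(m)
    return 0.5 * (a + b)

# g((2n-1)pi/2) = 3*(-1)^(n+1), so each interval below holds one sign change
ks = [bisect(g, (2 * n - 1) * math.pi / 2, (2 * n + 1) * math.pi / 2)
      for n in range(1, 4)]
print([round(k, 4) for k in ks])        # first three intersection points
print([round(k * k, 3) for k in ks])    # eigenvalues lambda_n = k_n^2
```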
16.3.2 The Sturm–Liouville Theorem

With these examples as background, here is the fundamental theorem of Sturm–Liouville theory.
THEOREM 16.22
1. Each regular and each periodic Sturm–Liouville problem has an infinite number of distinct real eigenvalues. If these are labeled λ1, λ2, ... so that λn < λn+1, then lim_{n→∞} λn = ∞.
2. If λn and λm are distinct eigenvalues of any of the three kinds of Sturm–Liouville problems defined on an interval [a, b], and φn and φm are corresponding eigenfunctions, then

∫_a^b p(x)φn(x)φm(x) dx = 0.
3. All eigenvalues of a Sturm–Liouville problem are real numbers.
4. For a regular Sturm–Liouville problem, any two eigenfunctions corresponding to a single eigenvalue are constant multiples of each other.

Conclusion (1) assures us of the existence of eigenvalues, at least for regular and periodic problems. A singular problem may also have an infinite sequence of eigenvalues, as we saw in Example 16.10 with Bessel functions. Conclusion (1) also asserts that the eigenvalues "spread out", so that, if arranged in increasing order, they increase without bound. For example, the numbers 1 − 1/n could not be eigenvalues of a Sturm–Liouville problem, since these numbers approach 1 as n → ∞.
In (2), denote f · g = ∫_a^b p(x)f(x)g(x) dx. This dot product for functions has many of the properties we have seen for the dot product of vectors. In particular, for functions f, g and h that are integrable on [a, b],

f · g = g · f,
f · (g + h) = f · g + f · h,
(αf) · g = α(f · g) for any real number α, and
f · f ≥ 0.

The last property relies on the assumption made for the Sturm–Liouville equation that p(x) > 0 on (a, b). If f is also continuous on [a, b], then f · f = 0 only if f is the zero function, since in this case ∫_a^b p(x)f(x)² dx = 0 can be true only if f(x) = 0 for a ≤ x ≤ b. This analogy between vectors and functions is useful in visualizing certain processes and concepts, and now is an appropriate time to formalize the terminology.
DEFINITION 16.1
Let p be continuous on [a, b] and p(x) > 0 for a < x < b.
1. If f and g are integrable on [a, b], then the dot product of f with g, with respect to the weight function p, is given by

f · g = ∫_a^b p(x)f(x)g(x) dx.

2. f and g are orthogonal on [a, b], with respect to the weight function p, if f · g = 0.
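A numerical illustration of a dot product with a nonconstant weight (an added sketch, not in the text): take p(x) = x, the weight for Bessel's equation, and check that J₀(j1 x) and J₀(j2 x) are orthogonal on [0, 1], where j1 and j2 are the first two positive zeros of J₀ (quoted here from standard tables).

```python
import math

def J0(x):
    # power series for the Bessel function J0, adequate for 0 <= x <= 10
    s, term = 0.0, 1.0
    for k in range(60):
        s += term
        term *= -(x / 2.0) ** 2 / ((k + 1) ** 2)
    return s

def wdot(f, g, p, a, b, n=6000):
    # weighted dot product f.g = integral of p*f*g over [a, b], midpoint rule
    h = (b - a) / n
    return sum(p(x) * f(x) * g(x)
               for x in (a + (i + 0.5) * h for i in range(n))) * h

j1, j2 = 2.404825557695773, 5.520078110286311   # first two zeros of J0
u = lambda x: J0(j1 * x)
v = lambda x: J0(j2 * x)
weight = lambda x: x                             # p(x) = x for Bessel's equation

print(abs(wdot(u, v, weight, 0.0, 1.0)) < 1e-6)   # True: orthogonal
print(wdot(u, u, weight, 0.0, 1.0) > 0.0)         # True: u.u is positive
```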
The definition of orthogonality is motivated by the fact that two vectors F and G in 3-space are orthogonal exactly when F · G = 0. Conclusion (2) may now be stated: eigenfunctions associated with distinct eigenvalues are orthogonal on [a, b], with weight function p(x). The weight function p is the coefficient of λ in the Sturm–Liouville equation. As we have seen explicitly for Fourier (trigonometric) series, Fourier–Legendre series and Fourier–Bessel series, this orthogonality of eigenfunctions is the key to expansions of functions in series of eigenfunctions of a Sturm–Liouville problem. This will become a significant issue when we solve certain partial differential equations modeling wave and radiation phenomena. Conclusion (3) states that a Sturm–Liouville problem can have no complex eigenvalue. This is consistent with the fact that eigenvalues for certain problems have physical significance, such as measuring modes of vibration of a system. Finally, conclusion (4) applies only to regular Sturm–Liouville problems. For example, the periodic Sturm–Liouville problem of Example 16.9 has eigenfunctions cos(nπx/L) and sin(nπx/L) associated with the single eigenvalue n²π²/L², and these functions are certainly not constant multiples of each other. We will prove parts of the Sturm–Liouville theorem. A proof of (1) requires some delicate analysis that we will not pursue. For (2), we will essentially reproduce arguments made previously for Legendre polynomials and Bessel functions.
Proof  Begin with the fact that

(rφn')' + (q + λn p)φn = 0  and  (rφm')' + (q + λm p)φm = 0.

Multiply the first equation by φm and the second by φn and subtract to get

(rφn')'φm − (rφm')'φn = (λm − λn)pφnφm.

Then

∫_a^b [ (r(x)φn'(x))'φm(x) − (r(x)φm'(x))'φn(x) ] dx = (λm − λn) ∫_a^b p(x)φn(x)φm(x) dx.
Since λn ≠ λm, conclusion (2) will be proved if we can show that the left side of the last equation is zero. Integrate by parts:

∫_a^b (r(x)φn'(x))'φm(x) dx − ∫_a^b (r(x)φm'(x))'φn(x) dx
= [φm(x)r(x)φn'(x)]_a^b − ∫_a^b r(x)φn'(x)φm'(x) dx − [φn(x)r(x)φm'(x)]_a^b + ∫_a^b r(x)φn'(x)φm'(x) dx
= r(b)φm(b)φn'(b) − r(a)φm(a)φn'(a) − r(b)φn(b)φm'(b) + r(a)φn(a)φm'(a)
= r(b)[φm(b)φn'(b) − φn(b)φm'(b)] − r(a)[φm(a)φn'(a) − φn(a)φm'(a)]    (16.37)
To prove that this quantity is zero, use the boundary conditions that are in effect. Suppose first that we have a regular problem, with boundary conditions

A1 y(a) + A2 y'(a) = 0,  B1 y(b) + B2 y'(b) = 0.

Applying the boundary condition at a to φn and φm, we have

A1 φn(a) + A2 φn'(a) = 0  and  A1 φm(a) + A2 φm'(a) = 0.

Since A1 and A2 are assumed to be not both zero in the regular problem, the system of algebraic equations

φn(a)X + φn'(a)Y = 0,
φm(a)X + φm'(a)Y = 0

has a nontrivial solution (namely X = A1, Y = A2). This requires that the determinant of the coefficients vanish:

φn(a)φm'(a) − φm(a)φn'(a) = 0.

Using the boundary condition at b, we obtain

φn(b)φm'(b) − φm(b)φn'(b) = 0.

Therefore the right side of equation (16.37) is zero, proving the orthogonality relationship in the case of a regular Sturm–Liouville problem. The conclusion is proved similarly for the other kinds of Sturm–Liouville problems, by applying the relevant boundary conditions in equation (16.37).
To prove conclusion (3), suppose that a Sturm–Liouville problem has a complex eigenvalue λ = α + iβ. Let φ(x) = u(x) + iv(x) be a corresponding eigenfunction. Now

(rφ')' + (q + λp)φ = 0.

Take the complex conjugate of this equation, noting that the conjugate of φ = u + iv is φ̄ = u − iv and that conjugation commutes with differentiation.
Since r(x), p(x) and q(x) are real-valued, these quantities are their own conjugates, and we get

(rφ̄')' + (q + λ̄p)φ̄ = 0.

This means that λ̄ is also an eigenvalue, with eigenfunction φ̄. Now, if β ≠ 0, then λ and λ̄ are distinct eigenvalues, hence

∫_a^b p(x)φ(x)φ̄(x) dx = 0.

But then

∫_a^b p(x)[ u(x)² + v(x)² ] dx = 0.

But, for a Sturm–Liouville problem, it is assumed that p(x) > 0 for a < x < b. Therefore u(x)² + v(x)² = 0, so u(x) = v(x) = 0 on [a, b] and φ(x) is the trivial solution. This contradicts φ being an eigenfunction. We conclude that β = 0, so λ is real.
Finally, to prove (4), suppose λ is an eigenvalue of a regular Sturm–Liouville problem, and φ and ψ are both eigenfunctions associated with λ. Use the boundary condition at a, and reason as in part of the proof of (2), to show that

φ(a)ψ'(a) − ψ(a)φ'(a) = 0.

But then the Wronskian of φ and ψ vanishes at a, so φ and ψ are linearly dependent and one is a constant multiple of the other.
We now have the machinery needed for general eigenfunction expansions.
16.3.3 Eigenfunction Expansions
In solving partial differential equations, we will often encounter the need to expand a function in a series of solutions of an associated ordinary differential equation—a Sturm–Liouville problem. Fourier series, Fourier–Legendre series, and Fourier–Bessel series are examples of such expansions. The function to be expanded will have some special significance in the problem. It might, for example, be an initial temperature function, or the initial displacement or velocity of a wave. To create a unified setting in which such series expansions can be understood, consider an analogy with vectors in 3-space. Given a vector F, we can always find real numbers a, b and c so that F = ai + bj + ck Although the constants are easy to find, we will pursue a formal process in order to identify a pattern. First, F · i = ai · i + bj · i + ck · i = a because i·i = 1
and
j · i = k · i = 0
Similarly, b = F · j and
c = F · k
The orthogonality of i, j and k provides a convenient mechanism for determining the coefficients in the expansion by means of the dot product. More generally, suppose U, V and W are any three nonzero vectors in 3-space that are mutually orthogonal, so

U · V = U · W = V · W = 0.

These vectors need not be unit vectors, and do not have to be aligned along the axes. However, because of their orthogonality, we can also easily write F in terms of these three vectors. Indeed, if

F = αU + βV + γW,

then

F · U = αU · U + βV · U + γW · U = αU · U,

so

α = (F · U)/(U · U).

Similarly,

β = (F · V)/(V · V)  and  γ = (F · W)/(W · W).    (16.38)

Again, we have a simple dot product formula for the coefficients. The idea of expressing a vector as a sum of constants times mutually orthogonal vectors, with formulas for the coefficients, extends to writing functions in series of eigenfunctions of Sturm–Liouville problems, with a formula similar to equation (16.38) for the coefficients. We have seen three such instances already, which we will briefly review in the context of the Sturm–Liouville theorem.
Fourier Series  The Sturm–Liouville problem is

y'' + λy = 0,  y(−L) = y(L),  y'(−L) = y'(L)

(a periodic problem) with eigenvalues λn = n²π²/L² for n = 0, 1, 2, ... and eigenfunctions

1, cos(πx/L), cos(2πx/L), ..., sin(πx/L), sin(2πx/L), ....

Here p(x) = 1 and the dot product to be used is

f · g = ∫_{−L}^{L} f(x)g(x) dx.
If f is piecewise smooth on [−L, L], then for −L < x < L,

(1/2)[f(x+) + f(x−)] = (1/2)a0 + Σ_{n=1}^∞ [ an cos(nπx/L) + bn sin(nπx/L) ],

where

an = (f · cos(nπx/L)) / (cos(nπx/L) · cos(nπx/L)) = ∫_{−L}^{L} f(x)cos(nπx/L) dx / ∫_{−L}^{L} cos²(nπx/L) dx  for n = 0, 1, 2, ...

and

bn = (f · sin(nπx/L)) / (sin(nπx/L) · sin(nπx/L)) = ∫_{−L}^{L} f(x)sin(nπx/L) dx / ∫_{−L}^{L} sin²(nπx/L) dx  for n = 1, 2, ....
Fourier–Legendre Series
The Sturm–Liouville problem is

((1 − x²)y')' + λy = 0,

with no boundary conditions on [−1, 1] because r(x) = 1 − x² vanishes at these endpoints. However, we seek bounded solutions. Eigenvalues are λ = n(n + 1), with corresponding eigenfunctions the Legendre polynomials P0(x), P1(x), .... Since p(x) = 1, use the dot product

f · g = ∫_{−1}^{1} f(x)g(x) dx.

If f is piecewise smooth on [−1, 1], then for −1 < x < 1,

(1/2)[f(x+) + f(x−)] = Σ_{n=0}^∞ cn Pn(x),

where

cn = (f · Pn)/(Pn · Pn) = ∫_{−1}^{1} f(x)Pn(x) dx / ∫_{−1}^{1} Pn(x)² dx.
Fourier–Bessel Series  Consider the Sturm–Liouville problem

(xy')' + (λx − ν²/x)y = 0

with boundary condition y(1) = 0 on [0, 1]. Eigenvalues are λ = jn² for n = 1, 2, ..., where j1, j2, ... are the positive zeros of J_ν(x), and eigenfunctions are J_ν(jn x). In this Sturm–Liouville problem, p(x) = x and the dot product is

f · g = ∫_0^1 x f(x)g(x) dx.

If f is piecewise smooth on [0, 1], then for 0 < x < 1 we can write the series

(1/2)[f(x+) + f(x−)] = Σ_{n=1}^∞ cn J_ν(jn x),

where

cn = (f(x) · J_ν(jn x)) / (J_ν(jn x) · J_ν(jn x)) = ∫_0^1 x f(x)J_ν(jn x) dx / ∫_0^1 x J_ν(jn x)² dx,

again fitting the template we have seen in the other kinds of expansions. These expansions are all special cases of a general theory of expansions in series of eigenfunctions of Sturm–Liouville problems.

THEOREM 16.23
Convergence of Eigenfunction Expansions
Let λ1, λ2, ... be the eigenvalues of a Sturm–Liouville differential equation (ry')' + (q + λp)y = 0 on [a, b], with one of the sets of boundary conditions specified previously. Let φ1, φ2, ... be corresponding eigenfunctions, and define the dot product

f · g = ∫_a^b p(x)f(x)g(x) dx.
Let f be piecewise smooth on [a, b]. Then, for a < x < b,

(1/2)[f(x+) + f(x−)] = Σ_{n=1}^∞ cn φn(x),

where

cn = (f · φn)/(φn · φn).    (16.39)

We call the numbers cn = (f · φn)/(φn · φn) the Fourier coefficients of f with respect to the eigenfunctions of this Sturm–Liouville problem. With this choice of coefficients, Σ_{n=1}^∞ cn φn(x) is the eigenfunction expansion of f with respect to these eigenfunctions. If the differential equation generating the eigenvalues and eigenfunctions has a special name (such as Legendre's equation, or Bessel's equation), then the eigenfunction expansion is usually called the Fourier–··· series, for example, Fourier–Legendre series and Fourier–Bessel series.
EXAMPLE 16.13
Consider the Sturm–Liouville problem

y'' + λy = 0,  y'(0) = y'(π/2) = 0.

We find in a routine way that the eigenvalues of this problem are λ = 4n² for n = 0, 1, 2, .... Corresponding to λ = 0, we can choose φ0(x) = 1 as an eigenfunction. Corresponding to λ = 4n², φn(x) = cos(2nx) is an eigenfunction. This gives us the set of eigenfunctions

φ0(x) = 1, φ1(x) = cos(2x), φ2(x) = cos(4x), ....

Because the coefficient of λ in the differential equation is p(x) = 1, and the interval is [0, π/2], the dot product for this problem is

f · g = ∫_0^{π/2} f(x)g(x) dx.

We will write the eigenfunction expansion of f(x) = x²(1 − x) for 0 ≤ x ≤ π/2. Since f and f' are continuous, this expansion will converge to x²(1 − x) for 0 < x < π/2. The coefficients in this expansion are

c0 = (f · 1)/(1 · 1) = ∫_0^{π/2} x²(1 − x) dx / ∫_0^{π/2} dx = (π³/24 − π⁴/64)/(π/2) = π²/12 − π³/32

and, for n = 1, 2, ...,

cn = (f · cos(2nx)) / (cos(2nx) · cos(2nx)) = ∫_0^{π/2} x²(1 − x)cos(2nx) dx / ∫_0^{π/2} cos²(2nx) dx

   = −(1/(4πn⁴)) [ 3π²n²(−1)ⁿ − 4πn²(−1)ⁿ − 6(−1)ⁿ + 6 ].
[FIGURE 16.15(a) Fifth partial sum in Example 16.13. FIGURE 16.15(b) Fifteenth partial sum.]
Therefore, for 0 < x < π/2,

x²(1 − x) = π²/12 − π³/32 − (1/(4π)) Σ_{n=1}^∞ (1/n⁴) [ 3π²n²(−1)ⁿ − 4πn²(−1)ⁿ − 6(−1)ⁿ + 6 ] cos(2nx).

Figure 16.15(a) shows the fifth partial sum of this series, compared with f, and Figure 16.15(b) shows the fifteenth partial sum of this expansion. Clearly this eigenfunction expansion is converging quite rapidly to x²(1 − x) on this interval.
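The coefficients and the convergence claim can be verified numerically. The following sketch (added for illustration, standard-library Python only) computes c0 through c5 directly from the dot-product formula and compares the resulting six-term partial sum with f on interior points of [0, π/2].

```python
import math

f = lambda x: x * x * (1.0 - x)
def phi(n):
    # eigenfunctions: phi_0 = 1, phi_n = cos(2nx)
    return (lambda x: 1.0) if n == 0 else (lambda x: math.cos(2 * n * x))

def dot(g, h, a=0.0, b=math.pi / 2, N=4000):
    step = (b - a) / N
    return sum(g(a + (i + 0.5) * step) * h(a + (i + 0.5) * step)
               for i in range(N)) * step

c = [dot(f, phi(n)) / dot(phi(n), phi(n)) for n in range(6)]
partial = lambda x: sum(c[n] * phi(n)(x) for n in range(6))

# c0 agrees with the closed form pi^2/12 - pi^3/32
print(abs(c[0] - (math.pi ** 2 / 12 - math.pi ** 3 / 32)) < 1e-6)   # True
# the six-term partial sum already tracks f closely in the interior
print(max(abs(partial(0.1 * k) - f(0.1 * k)) for k in range(1, 13)) < 0.1)  # True
```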
16.3.4 Approximation in the Mean and Bessel's Inequality
In this and the next two sections we will discuss some additional properties of Fourier coefficients, as well as some subtleties in the convergence of Fourier series. For this discussion, let φ1, φ2, ... be normalized eigenfunctions of a Sturm–Liouville problem on [a, b]. Normalized means that each eigenfunction φn has been multiplied by a positive constant so that φn · φn = 1. This can always be done because a nonzero constant multiple of an eigenfunction is again an eigenfunction. We now have

φn · φm = ∫_a^b p(x)φn(x)φm(x) dx = 1 if n = m, and 0 if n ≠ m.
For these normalized eigenfunctions, the nth Fourier coefficient is

cn = (f · φn)/(φn · φn) = f · φn.    (16.40)

We will now define one measure of how well a linear combination Σ_{n=1}^N kn φn approximates a given function f.
DEFINITION 16.2
Best Mean Approximation
Let N be a positive integer and let f be a function that is integrable on [a, b]. A linear combination

Σ_{n=1}^N kn φn(x)

of φ1, φ2, ..., φN is the best approximation in the mean to f on [a, b] if the coefficients k1, ..., kN minimize the quantity

I_N(f) = ∫_a^b p(x) [ f(x) − Σ_{n=1}^N kn φn(x) ]² dx.

I_N(f) is the dot product of f(x) − Σ_{n=1}^N kn φn(x) with itself (with weight function p). For vectors in R³, the dot product of a vector V = ai + bj + ck with itself is the square of its length:

V · V = a² + b² + c² = (length of V)².

This suggests that we define a length for functions by

g · g = ∫_a^b p(x)g(x)² dx = (length of g)².
Now I_N(f) has the geometric interpretation of being the (square of the) length of f(x) − Σ_{n=1}^N kn φn(x). The smaller this length is, the better the linear combination Σ_{n=1}^N kn φn(x) approximates f(x) on [a, b]. This approximation is an average over the entire interval, as opposed to looking at the approximation at a particular point, hence the term "approximation in the mean". We want to choose the kn's to make I_N(f) the best possible mean approximation to f on [a, b], which means we want to make the length of f(x) − Σ_{n=1}^N kn φn(x) as small as possible. To determine how to choose the kn's, write

0 ≤ I_N(f) = ∫_a^b p(x) [ f(x)² − 2f(x) Σ_{n=1}^N kn φn(x) + ( Σ_{n=1}^N kn φn(x) )² ] dx
= ∫_a^b p(x)f(x)² dx − 2 Σ_{n=1}^N kn ∫_a^b p(x)f(x)φn(x) dx + Σ_{n=1}^N Σ_{m=1}^N kn km ∫_a^b p(x)φn(x)φm(x) dx
= f · f − 2 Σ_{n=1}^N kn (f · φn) + Σ_{n=1}^N Σ_{m=1}^N kn km (φn · φm)
= f · f − 2 Σ_{n=1}^N kn (f · φn) + Σ_{n=1}^N kn² (φn · φn)
= f · f − 2 Σ_{n=1}^N kn (f · φn) + Σ_{n=1}^N kn²,
since φn · φn = 1 for this normalized set of eigenfunctions. Now let cn = f · φn, the nth Fourier coefficient of f for this set of normalized eigenfunctions. Complete the square by writing the last inequality as

0 ≤ f · f − 2 Σ_{n=1}^N kn cn + Σ_{n=1}^N kn²
  = f · f + Σ_{n=1}^N (cn − kn)² − Σ_{n=1}^N cn².    (16.41)
In this formulation, it is obvious that the right side achieves its minimum when each kn = cn . We have proved the following.
THEOREM 16.24
Let f be integrable on [a, b], and N a positive integer. Then, the linear combination Σ_{n=1}^N kn φn that is the best approximation in the mean to f on [a, b] is obtained by putting kn = f · φn for n = 1, 2, ..., N.

Thus, for any given N, the Nth partial sum Σ_{n=1}^N (f · φn)φn of the Fourier series Σ_{n=1}^∞ (f · φn)φn of f is the best approximation in the mean to f by a linear combination of φ1, φ2, ..., φN.
The argument leading to the theorem has another important consequence. Put kn = cn = f · φn in equality (16.41) to obtain

0 ≤ f · f − Σ_{n=1}^N (f · φn)²,

or

Σ_{n=1}^N (f · φn)² ≤ f · f.
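The minimizing property, and the completing-the-square identity (16.41) behind it, can be tested numerically. The sketch below is illustrative rather than part of the text; it assumes the normalized eigenfunctions √(2/π) sin(nx) of y'' + λy = 0, y(0) = y(π) = 0, and a sample function f. Perturbing one Fourier coefficient by δ must increase I_N(f) by exactly δ².

```python
import math

f = lambda x: x * (math.pi - x)        # a sample smooth function on [0, pi]
def phi(n):
    # normalized eigenfunctions of y'' + lam*y = 0, y(0) = y(pi) = 0
    return lambda x: math.sqrt(2.0 / math.pi) * math.sin(n * x)

def integrate(g, a=0.0, b=math.pi, N=4000):
    h = (b - a) / N
    return sum(g(a + (i + 0.5) * h) for i in range(N)) * h

c = [integrate(lambda x, n=n: f(x) * phi(n)(x)) for n in range(1, 4)]

def I(k):
    # mean-square error of the combination sum_n k_n phi_n against f
    err = lambda x: (f(x) - sum(k[j] * phi(j + 1)(x) for j in range(3))) ** 2
    return integrate(err)

best = I(c)
worse = I([c[0] + 0.2, c[1], c[2]])
print(best < worse)                           # True: Fourier coefficients win
print(abs((worse - best) - 0.2 ** 2) < 1e-4)  # True: I_N grows by delta^2
```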
Since N can be any positive integer, the series of squares of the Fourier coefficients of f converges, and the sum of this series cannot exceed the dot product of f with itself. This is Bessel’s inequality, and was proved in Section 14.5 (Theorem 14.7) for Fourier trigonometric series. THEOREM 16.25
Bessel’s Inequality
Let f be integrable on [a, b]. Then the series of squares of the Fourier coefficients of f with respect to the normalized eigenfunctions φ1, φ2, ... converges. Further,

Σ_{n=1}^∞ (f · φn)² ≤ f · f.
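For a merely piecewise continuous f the inequality is typically strict at every finite stage. A numerical sketch (not from the text; it assumes the normalized eigenfunctions √(2/π) sin(nx) on [0, π] and a step function as f):

```python
import math

f = lambda x: 1.0 if x < math.pi / 2 else 0.0   # step function: only piecewise continuous
def phi(n):
    # normalized eigenfunctions sqrt(2/pi)*sin(nx) on [0, pi]
    return lambda x: math.sqrt(2.0 / math.pi) * math.sin(n * x)

def integrate(g, a=0.0, b=math.pi, N=20000):
    h = (b - a) / N
    return sum(g(a + (i + 0.5) * h) for i in range(N)) * h

ff = integrate(lambda x: f(x) ** 2)             # f.f = pi/2
coeffs = [integrate(lambda x, n=n: f(x) * phi(n)(x)) for n in range(1, 40)]
bessel_sum = sum(c * c for c in coeffs)

print(bessel_sum < ff)     # True: Bessel's inequality, strict here
print(round(ff, 6))        # pi/2 = 1.570796
```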
Under some circumstances, the inequality can be replaced by an equality. This leads us to consider the concept of convergence in the mean.
16.3.5 Convergence in the Mean and Parseval's Theorem

Continuing from the preceding subsection, φ1, φ2, ... are assumed to be the normalized eigenfunctions of a Sturm–Liouville problem on [a, b]. If f is continuous on [a, b] with a piecewise continuous derivative, then for a < x < b,

f(x) = Σ_{n=1}^∞ (f · φn)φn(x).
This convergence is called pointwise convergence, because it deals with convergence of the Fourier series individually at each x in [a, b]. Under some conditions, this series may also converge uniformly. In addition to these two kinds of convergence, convergence in the mean is often used in the context of eigenfunction expansions.
DEFINITION 16.3
Convergence in the Mean
Let f be integrable on [a, b]. The Fourier series Σ_{n=1}^∞ (f · φn)φn of f, with respect to the normalized eigenfunctions φ1, φ2, ..., is said to converge to f in the mean on [a, b] if

lim_{N→∞} ∫_a^b p(x) [ f(x) − Σ_{n=1}^N (f · φn)φn(x) ]² dx = 0.
Convergence in the mean of a Fourier series of f, to f, occurs when the length of f(x) − Σ_{n=1}^N (f · φn)φn(x) approaches zero as N approaches infinity. This will certainly happen if the Fourier series converges to f, because then f(x) = Σ_{n=1}^∞ (f · φn)φn(x), and we know that this holds if f is continuous with a piecewise continuous derivative. For the remainder of this section, let C'[a, b] be the set of functions that are continuous on [a, b], with piecewise continuous derivatives on [a, b].
THEOREM 16.26
1. If f(x) = Σ_{n=1}^∞ (f · φn)φn(x) for a < x < b, then Σ_{n=1}^∞ (f · φn)φn also converges in the mean to f on [a, b].
2. If f is in C'[a, b], then Σ_{n=1}^∞ (f · φn)φn converges in the mean to f on [a, b].

The converse of (1) is false. It is possible for the length of f(x) − Σ_{n=1}^N (f · φn)φn(x) to have limit zero as N → ∞, but for the Fourier series not to converge to f(x) on the interval. This is because the integral in the definition of mean convergence is an averaging process and does not focus on the behavior of the Fourier series at any particular point. We will show that convergence in the mean for functions in C'[a, b] is equivalent to being able to turn Bessel's inequality into an equality for all functions in this class.
THEOREM 16.27
Σ_{n=1}^∞ (f · φn)φn converges in the mean to f for every f in C'[a, b] if and only if

Σ_{n=1}^∞ (f · φn)² = f · f

for every f in C'[a, b].

Proof  From the calculation done in proving Theorem 16.24, with kn = f · φn,

0 ≤ I_N(f) = ∫_a^b p(x) [ f(x) − Σ_{n=1}^N (f · φn)φn ]² dx = f · f − Σ_{n=1}^N (f · φn)².

Therefore

lim_{N→∞} ∫_a^b p(x) [ f(x) − Σ_{n=1}^N (f · φn)φn ]² dx = 0

if and only if

f · f − Σ_{n=1}^∞ (f · φn)² = 0.
Replacing the inequality with an equality in Bessel’s inequality yields the Parseval relationship. We can now state a condition under which this holds. COROLLARY 16.1
Parseval’s Theorem
If f is in C'[a, b], then

Σ_{n=1}^∞ (f · φn)² = f · f.
This follows immediately from the last two theorems. We know by Theorem 16.26(2) that, if f is in C'[a, b], then the Fourier series of f converges to f in the mean. Then, by Theorem 16.27, Σ_{n=1}^∞ (f · φn)² = f · f. With more effort, the Parseval equation can be proved under much weaker conditions on f.
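Parseval's relation can be observed numerically for a function that is continuous with f(a) = f(b) = 0. The sketch below is illustrative (not from the text); it assumes the normalized eigenfunctions √(2/π) sin(nx) for the regular problem on [0, π] and f(x) = x(π − x), whose Fourier coefficients decay fast enough that 25 terms essentially exhaust f · f.

```python
import math

f = lambda x: x * (math.pi - x)       # continuous, vanishes at 0 and pi
def phi(n):
    return lambda x: math.sqrt(2.0 / math.pi) * math.sin(n * x)

def integrate(g, a=0.0, b=math.pi, N=4000):
    h = (b - a) / N
    return sum(g(a + (i + 0.5) * h) for i in range(N)) * h

ff = integrate(lambda x: f(x) ** 2)   # f.f = pi^5/30
parseval = sum(integrate(lambda x, n=n: f(x) * phi(n)(x)) ** 2
               for n in range(1, 26))

print(abs(ff - math.pi ** 5 / 30) < 1e-4)   # True
print(abs(ff - parseval) < 1e-4)            # True: the coefficient squares
                                            # account for essentially all of f.f
```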
16.3.6 Completeness of the Eigenfunctions
Completeness is a concept that is perhaps most easily understood in terms of vectors. In 3-space, the vector k cannot be written as a linear combination of i and j, even though i and j are orthogonal. The reason for this is that there is another direction in 3-space that is orthogonal to the plane of i and j, and i and j carry no information about the component a vector may have in this third direction. The vectors i and j are incomplete in R³. By contrast, there is no nonzero vector that is orthogonal to each of i, j and k, so we say that these vectors are complete in R³. Any 3-vector can be written as a linear combination of i, j and k. Now consider the normalized eigenfunctions φ1, φ2, .... Think of each φj as defining a different direction, or axis, in the space of functions under consideration, which we take to be C'[a, b]. We say that these eigenfunctions are complete in C'[a, b] if the only function in C'[a, b] that is orthogonal to every eigenfunction is the zero function. If, however, there is a nontrivial function f in C'[a, b] that is orthogonal to every eigenfunction, then we say that the
764
CHAPTER 16
Special Functions, Orthogonal Expansions, and Wavelets
eigenfunctions are incomplete. In this case there is another axis, or direction, in $C[a,b]$ that is not determined by all of the eigenfunctions. A function having a component in this other direction could not possibly be represented in a series of the incomplete eigenfunctions. We claim that the eigenfunctions are complete in the space of continuous functions with piecewise continuous derivatives on $[a,b]$.

THEOREM 16.28

The normalized eigenfunctions φ₁, φ₂, … are complete in $C[a,b]$.

Proof   Suppose the eigenfunctions are not complete. Then there is some nontrivial function f in $C[a,b]$ that is orthogonal to each φₙ. But because f is orthogonal to each φₙ, each $f\cdot\varphi_n = 0$, so
$$f(x) = \sum_{n=1}^{\infty}(f\cdot\varphi_n)\varphi_n(x) = 0 \quad\text{for } a < x < b.$$
This contradiction proves the theorem.
EXAMPLE 16.14

The normalized eigenfunctions of the Sturm–Liouville problem
$$y'' + \lambda y = 0; \qquad y'(0) = y'(\pi/2) = 0$$
are
$$\sqrt{\frac{2}{\pi}},\quad \frac{2}{\sqrt{\pi}}\cos(2x),\quad \frac{2}{\sqrt{\pi}}\cos(4x),\quad \frac{2}{\sqrt{\pi}}\cos(6x),\ \ldots.$$
The constants were chosen to normalize the eigenfunctions, since
$$\varphi_n\cdot\varphi_n = \frac{4}{\pi}\int_0^{\pi/2}\cos^2(2nx)\,dx = 1.$$
This set E of eigenfunctions is complete in $C[0,\pi/2]$. This means that, except for f(x) ≡ 0, there is no f in $C[0,\pi/2]$ that is orthogonal to each eigenfunction.
Observe the effect if one eigenfunction is removed. For example, the set E₁ of eigenfunctions
$$\sqrt{\frac{2}{\pi}},\quad \frac{2}{\sqrt{\pi}}\cos(4x),\quad \frac{2}{\sqrt{\pi}}\cos(6x),\ \ldots$$
is formed by removing $(2/\sqrt{\pi})\cos(2x)$ from E. Now cos(2x) has no expansion in terms of E₁, even though cos(2x) is continuous with a continuous derivative on $[0,\pi/2]$. Indeed, if
$$\cos(2x) = c_0\sqrt{\frac{2}{\pi}} + \sum_{n=2}^{\infty} c_n\,\frac{2}{\sqrt{\pi}}\cos(2nx),$$
then
$$c_0 = \sqrt{\frac{2}{\pi}}\cdot\cos(2x) = 0$$
16.4 Wavelets
765
and, for n = 2, 3, …,
$$c_n = \cos(2x)\cdot\frac{2}{\sqrt{\pi}}\cos(2nx) = 0,$$
implying that cos(2x) = 0 for 0 < x < π/2. This is an absurdity. The deleted set of eigenfunctions E₁, with one function removed from E, is not complete in $C[0,\pi/2]$.
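The failure of the deleted set E₁ can be seen numerically. The sketch below (helper names are ours, midpoint-rule integration on [0, π/2]) checks that cos(2x) has zero dot product with the normalized constant and with each (2/√π)cos(2nx) for n ≥ 2, yet is not the zero function: its dot product with the removed eigenfunction is √π/2 ≠ 0.

```python
import math

def dot(f, g, a=0.0, b=math.pi / 2, k=20000):
    """Midpoint-rule approximation of the dot product on [0, pi/2] (weight 1)."""
    h = (b - a) / k
    return h * sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h) for i in range(k))

f = lambda x: math.cos(2 * x)               # the function with no E1-expansion
const = lambda x: math.sqrt(2.0 / math.pi)  # normalized constant eigenfunction

def eig(n):
    """Normalized eigenfunction (2/sqrt(pi)) cos(2 n x)."""
    return lambda x: (2.0 / math.sqrt(math.pi)) * math.cos(2 * n * x)
```

Since every coefficient of cos(2x) with respect to E₁ vanishes while cos(2x) itself does not, no E₁-expansion can converge to it in the mean.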
SECTION 16.3   PROBLEMS

In each of Problems 1 through 12, classify the Sturm–Liouville problem as regular, periodic or singular; state the relevant interval; find the eigenvalues; and, corresponding to each eigenvalue, find an eigenfunction. In some cases eigenvalues may be implicitly defined by an equation.

1. y″ + λy = 0; y(0) = 0, y′(L) = 0
2. y″ + λy = 0; y′(0) = 0, y′(L) = 0
3. y″ + λy = 0; y′(0) = y(4) = 0
4. y″ + λy = 0; y(0) = y(π), y′(0) = y′(π)
5. y″ + λy = 0; y(−3) = y(3), y′(−3) = y′(3)
6. y″ + λy = 0; y(0) = 0, y′(π) + 2y(π) = 0
7. y″ + λy = 0; y(0) − 2y′(0) = 0, y′(1) = 0
8. y″ + 2y′ + (1 + λ)y = 0; y(0) = y(1) = 0
9. (e²ˣy′)′ + λe²ˣy = 0; y(0) = y(π) = 0
10. (e⁻⁶ˣy′)′ + (1 + λ)e⁻⁶ˣy = 0; y(0) = y(8) = 0
11. (xy′)′ + λx⁻¹y = 0; y(1) = y(e³) = 0
12. (xy′)′ + (4 + λ)x⁻¹y = 0; y(1) = y(e) = 0

In each of Problems 13 through 18, find the eigenfunction expansion of the given function in the eigenfunctions of the Sturm–Liouville problem. In each case, determine what the eigenfunction expansion converges to on the interval, and graph the function and the sum of the first N terms of the eigenfunction expansion on the same set of axes for the given interval. (In Problem 13, do the graph for L = 1.)

13. f(x) = 1 − x for 0 ≤ x ≤ L; y″ + λy = 0, y(0) = y(L) = 0; N = 40
14. f(x) = x for 0 ≤ x ≤ π; y″ + λy = 0, y(0) = y(π) = 0; N = 30
15. f(x) = −1 for 0 ≤ x ≤ 2 and f(x) = 1 for 2 < x ≤ 4; y″ + λy = 0, y′(0) = y(4) = 0; N = 40
16. f(x) = sin(2x) for 0 ≤ x ≤ π; y″ + λy = 0, y′(0) = y(π) = 0; N = 30
17. f(x) = x² for −3 ≤ x ≤ 3; y″ + λy = 0, y(−3) = y(3), y′(−3) = y′(3); N = 10
18. f(x) = 0 for 0 ≤ x ≤ 1/2 and f(x) = 1 for 1/2 < x ≤ 1; y″ + 2y′ + (1 + λ)y = 0, y(0) = y(1) = 0; N = 30

19. Write Bessel's inequality for the function f(x) = x(4 − x) for the eigenfunctions of the Sturm–Liouville problem of Problem 3.
20. Write Bessel's inequality for the function f(x) = e⁻ˣ for the eigenfunctions of the Sturm–Liouville problem of Problem 6.

16.4
Wavelets

16.4.1
The Idea Behind Wavelets
Recent years have seen an explosion in both the mathematical development of wavelets and their applications, which include signal analysis, data compression, filtering, and electromagnetics. Our purpose here is to introduce enough of the ideas behind wavelets to enable the student to pursue more thorough treatments.
Think of a function defined on the real line as a signal. If the signal contains one fundamental frequency ω₀, then f is a periodic function with period 2π/ω₀, and the Fourier series of f(t) is one tool for analyzing the signal's frequency content. The amplitude spectrum of f consists of
a plot of the points (nω₀, cₙ/2), in which
$$c_n = \sqrt{a_n^2 + b_n^2},$$
with aₙ and bₙ the Fourier coefficients of f. Under certain conditions on f, this enables us to represent the signal as a trigonometric series displaying the natural frequencies:
$$f(t) = \frac{1}{2}a_0 + \sum_{n=1}^{\infty}\big(a_n\cos(n\omega_0 t) + b_n\sin(n\omega_0 t)\big).$$
Often we model the signal by taking a partial sum of the Fourier series:
$$f(t) \approx \frac{1}{2}a_0 + \sum_{n=1}^{N}\big(a_n\cos(n\omega_0 t) + b_n\sin(n\omega_0 t)\big).$$
Although this process has proved useful in many instances, the Fourier trigonometric representation is not always the best device for analyzing signals. First, we may be interested in a signal that is not periodic. More generally, we may have a signal that is defined over the entire real line with no periodicity, and we require only that its energy be finite. This means that $\int_{-\infty}^{\infty} f(t)^2\,dt$ is finite or, if f(t) is complex valued, that $\int_{-\infty}^{\infty} |f(t)|^2\,dt$ is finite. This integral is the energy content of the signal, and functions having finite energy are said to be square integrable. In general, Fourier expansions are not the best tool for the analysis of such functions.
There are other disadvantages to Fourier trigonometric series. For a given f, we may have to choose N very large to model f(t) by a partial sum of a Fourier series. Finally, if we are interested in focusing on the behavior of f(t) in some finite time interval, or near some particular time, we cannot isolate those terms in the Fourier expansion that describe this behavior, but instead have to take the entire Fourier series, or its entire partial sum if we are modeling the signal.
To illustrate, consider the signal shown in Figure 16.16. Explicitly,
$$f(t) = \begin{cases} 1 & \text{for } 0 \le t < 1/4 \\ -1/5 & \text{for } 1/4 \le t < 3/8 \\ 11/5 & \text{for } 3/8 \le t < 1/2 \\ 1 & \text{for } 1/2 \le t < 3/4 \\ -3 & \text{for } 3/4 \le t < 1 \\ -4/5 & \text{for } 1 \le t < 5/4 \\ 14/5 & \text{for } 5/4 \le t < 11/8 \\ 4/5 & \text{for } 11/8 \le t < 3/2 \\ 0 & \text{for } t \ge 3/2 \text{ and for } t < 0. \end{cases}$$
FIGURE 16.16   The signal f(t).
The Fourier series of f on [−3/2, 3/2] is
$$\frac{1}{12} + \sum_{n=1}^{\infty}\Big(a_n\cos\frac{2n\pi x}{3} + b_n\sin\frac{2n\pi x}{3}\Big),$$
where
$$a_n = -\frac{1}{5n\pi}\Big(-6\sin\frac{n\pi}{6} + 12\sin\frac{n\pi}{4} - 6\sin\frac{n\pi}{3} - 20\sin\frac{n\pi}{2} + 11\sin\frac{2n\pi}{3} + 18\sin\frac{5n\pi}{6} - 10\sin\frac{11n\pi}{12}\Big)$$
and
$$b_n = \frac{1}{5n\pi}\Big(-6\cos\frac{n\pi}{6} + 5 + 12\cos\frac{n\pi}{4} - 6\cos\frac{n\pi}{3} - 20\cos\frac{n\pi}{2} + 11\cos\frac{2n\pi}{3} + 18\cos\frac{5n\pi}{6} - 10\cos\frac{11n\pi}{12} - 4\cos(n\pi)\Big).$$

This series converges very slowly to the function. Indeed, Figure 16.17(a) shows the 80th partial sum of this series, and Figure 16.17(b) the 100th partial sum. Even with this number of terms, this partial sum does not model the signal very well. In addition, if we were interested in focusing on just part of the signal, there is no way of distinguishing certain terms of the Fourier series as carrying the most information about this part of the signal. Put another way, Fourier series do not localize information.
These considerations suggest that we seek other sets of complete orthogonal functions in which square integrable functions might be expanded, and which overcome some of the difficulties just cited for Fourier trigonometric series. This is a primary motivation for wavelets. We will begin our discussion of wavelets by developing one important wavelet in detail, then use this construction to suggest some of the ideas behind wavelets in general.
16.4.2
The Haar Wavelets
We will construct an example that is important both historically and for present-day applications. The Haar wavelets were the first to be found (about 1910), and serve as a model of one approach to the development of other wavelets.
FIGURE 16.17(a)   Eightieth partial sum of the Fourier series of the signal.   FIGURE 16.17(b)   One-hundredth partial sum of the Fourier series of the signal.
Let L²(ℝ) denote the set of all real-valued functions that are defined on the entire real line and are square integrable. L²(ℝ) has the structure of a vector space, since linear combinations α₁f₁ + α₂f₂ + ··· + αₙfₙ of square integrable functions are square integrable. The dot product we will use for functions in L²(ℝ) is
$$f\cdot g = \int_{-\infty}^{\infty} f(t)g(t)\,dt.$$
Now consider the characteristic function of an interval I (or of any set of numbers on the real line). This function is denoted χ_I, and has the value 1 for t in I, and zero for t not in I. That is,
$$\chi_I(t) = \begin{cases} 1 & \text{if } t \text{ is in } I \\ 0 & \text{if } t \text{ is not in } I. \end{cases}$$
In particular, we will use the characteristic function of the half-open unit interval:
$$\chi_{[0,1)}(t) = \begin{cases} 1 & \text{for } 0 \le t < 1 \\ 0 & \text{if } t < 0 \text{ or if } t \ge 1. \end{cases}$$
A graph of χ_{[0,1)} is shown in Figure 16.18.

FIGURE 16.18   χ_{[0,1)}(t).

We want to introduce new functions by both scaling and translation, with the objective of producing a complete orthonormal set of functions in L²(ℝ). Recall that the graph of f(t − k)
is the graph of f(t) translated k units to the right if k is positive, and |k| units to the left if k is negative. For example, Figure 16.19(a) shows a graph of
$$f(t) = \begin{cases} t\sin(t) & \text{for } 0 \le t \le 15 \\ 0 & \text{for } t < 0 \text{ and for } t > 15. \end{cases}$$
Figure 16.19(b) is a graph of f(t + 5) (the graph of f(t) shifted five units to the left), and Figure 16.19(c) is a graph of f(t − 5) (the graph of f(t) shifted five units to the right).
In addition, f(kt) is a scaling of the graph of f: f(kt) compresses (if k > 1) or stretches (if 0 < k < 1) the graph of f(t) for a ≤ t ≤ b onto the interval [a/k, b/k]. For example, Figure 16.20(a) shows a graph of
$$f(t) = \begin{cases} t\sin(t) & \text{for } -2 \le t \le 3 \\ 0 & \text{for } t < -2 \text{ and for } t > 3. \end{cases}$$
Figure 16.20(b) shows a graph of f(3t), compressing the graph of Figure 16.20(a) to the right and left, and Figure 16.20(c) shows a graph of f(t/3), stretching out the graph of Figure 16.20(a).
FIGURE 16.19(a)   f(t) = t sin(t) for 0 ≤ t ≤ 15, and f(t) = 0 for t < 0 and for t > 15.   FIGURE 16.19(b)   f(t + 5).   FIGURE 16.19(c)   f(t − 5).
FIGURE 16.20(a)   f(t) = t sin(t) for −2 ≤ t ≤ 3, and f(t) = 0 otherwise.   FIGURE 16.20(b)   f(3t).   FIGURE 16.20(c)   f(t/3).
Let φ(t) = χ_{[0,1)}(t), and define
$$\psi(t) = \varphi(2t) - \varphi(2t-1) = \begin{cases} 1 & \text{for } 0 \le t < 1/2 \\ -1 & \text{for } 1/2 \le t < 1 \\ 0 & \text{for } t < 0 \text{ and for } t \ge 1. \end{cases}$$
A graph of ψ is shown in Figure 16.21. Next, consider translations ψ(t − n), in which n is any integer. This is the function
$$\psi(t-n) = \varphi(2(t-n)) - \varphi(2(t-n)-1) = \varphi(2t-2n) - \varphi(2t-2n-1)$$
$$= \begin{cases} 1 & \text{for } n \le t < n + 1/2 \\ -1 & \text{for } n + 1/2 \le t < n + 1 \\ 0 & \text{for } t < n \text{ and for } t \ge n + 1. \end{cases}$$
FIGURE 16.21   ψ(t) = φ(2t) − φ(2t − 1), with φ = χ_{[0,1)}.   FIGURE 16.22   ψ(t − n) = φ(2t − 2n) − φ(2t − 2n − 1).   FIGURE 16.23   ψ(2t − m) = φ(4t − 2m) − φ(4t − 2m − 1).
A graph of ψ(t − n) is shown in Figure 16.22. Now combine a translation with a scaling. Consider the function
$$\psi(2t-m) = \varphi(2(2t-m)) - \varphi(2(2t-m)-1) = \varphi(4t-2m) - \varphi(4t-2m-1)$$
$$= \begin{cases} 1 & \text{for } \dfrac{m}{2} \le t < \dfrac{m}{2} + \dfrac{1}{4} \\ -1 & \text{for } \dfrac{m}{2} + \dfrac{1}{4} \le t < \dfrac{m+1}{2} \\ 0 & \text{for } t < \dfrac{m}{2} \text{ and for } t \ge \dfrac{m+1}{2}, \end{cases}$$
in which m is any integer. A graph of this function is shown in Figure 16.23. Before proceeding, we will observe that these translated and scaled functions are orthogonal in L²(ℝ).
LEMMA 16.2

1. For distinct integers n and m,
$$\psi(t-n)\cdot\psi(t-m) = 0 \quad\text{and}\quad \psi(2t-n)\cdot\psi(2t-m) = 0.$$
2. For any integers n and m, ψ(t − n) · ψ(2t − m) = 0.

Proof   If n ≠ m, then the intervals [n, n+1), on which ψ(t − n) takes on its nonzero values, and [m, m+1), on which ψ(t − m) assumes its nonzero values, are disjoint. Then ψ(t − n)ψ(t − m) = 0 for all t and
$$\psi(t-n)\cdot\psi(t-m) = \int_{-\infty}^{\infty}\psi(t-n)\psi(t-m)\,dt = 0.$$
Similarly, for n ≠ m, the intervals [n/2, (n+1)/2) and [m/2, (m+1)/2), on which ψ(2t − n) and ψ(2t − m), respectively, take on their nonzero values, are disjoint, so ψ(2t − n)·ψ(2t − m) = 0.
For (2), let n and m be any integers. If the intervals on which ψ(t − n) and ψ(2t − m) have nonzero values are disjoint, then these functions are orthogonal. There are two cases in which these intervals are not disjoint.

Case 1: n = m/2. In this case
$$\psi(t-n)\psi(2t-m) = \begin{cases} 1 & \text{for } n \le t < n + 1/4 \\ -1 & \text{for } n + 1/4 \le t < n + 1/2 \\ 0 & \text{for } t < n \text{ and for } t \ge n + 1/2. \end{cases}$$
Then
$$\psi(t-n)\cdot\psi(2t-m) = \int_n^{n+1/4} dt - \int_{n+1/4}^{n+1/2} dt = 0.$$

Case 2: n + 1/2 = m/2. Now
$$\psi(t-n)\psi(2t-m) = \begin{cases} -1 & \text{for } n + 1/2 \le t < n + 3/4 \\ 1 & \text{for } n + 3/4 \le t < n + 1 \\ 0 & \text{for } t < n + 1/2 \text{ and for } t \ge n + 1, \end{cases}$$
so
$$\psi(t-n)\cdot\psi(2t-m) = -\int_{n+1/2}^{n+3/4} dt + \int_{n+3/4}^{n+1} dt = 0.$$
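Lemma 16.2 can be cross-checked numerically. Since every product involved is constant on each quarter-integer interval [k/4, (k+1)/4), sampling the left endpoint of each such interval and multiplying by the interval width gives the dot products exactly (a sketch, not from the text):

```python
def psi(t):
    """Haar mother wavelet: 1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0

def dot(f, g, lo=-8.0, hi=8.0, step=0.25):
    """L2 dot product; exact for functions constant on each [k/4, (k+1)/4)."""
    n = int(round((hi - lo) / step))
    return step * sum(f(lo + i * step) * g(lo + i * step) for i in range(n))
```

With this in hand, `dot(lambda t: psi(t - n), lambda t: psi(2*t - m))` returns 0 for every pair of integers n, m in range, covering both the disjoint-support cases and the two overlapping cases treated in the proof.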
However, while the functions ψ(t − n) and ψ(2t − m) are orthogonal in L²(ℝ), they do not form a complete set as n and m vary over the integers. We leave it for the student to produce nontrivial (that is, nonzero at least on some interval) square integrable functions that are orthogonal to all of these translated and scaled functions.
The idea now is to extend this set of functions by using scaling factors 2^m for integer m, to obtain functions that take on nonzero constant values on intervals that can be made shorter (positive m) or longer (negative m). Let
$$\sigma_{mn}(t) = \psi(2^m t - n)$$
FIGURE 16.24   σ_{mn}(t) for m = 0, 1, 2, 3.
for each integer m and each integer n. Then
$$\sigma_{mn}(t) = \varphi(2^{m+1}t - 2n) - \varphi(2^{m+1}t - 2n - 1)$$
$$= \begin{cases} 1 & \text{for } \dfrac{n}{2^m} \le t < \dfrac{n}{2^m} + \dfrac{1}{2^{m+1}} \\ -1 & \text{for } \dfrac{n}{2^m} + \dfrac{1}{2^{m+1}} \le t < \dfrac{n}{2^m} + \dfrac{1}{2^m} \\ 0 & \text{for } t < \dfrac{n}{2^m} \text{ and for } t \ge \dfrac{n}{2^m} + \dfrac{1}{2^m}. \end{cases}$$
Figure 16.24 shows graphs of σ_{0n}(t), σ_{1n}(t), σ_{2n}(t), and σ_{3n}(t) on the same set of axes, for comparison. Note that n determines how far out the t axis the graph occurs, while m controls the size of the interval over which the function is nonzero (shorter for m increasing and positive, longer for |m| increasing with m negative). In the drawing n is a positive integer, but n can also be chosen negative, in which case the graphs are to the left of the vertical axis. We claim that these functions form an orthogonal set in L²(ℝ).

THEOREM 16.29
If n, m, n′ and m′ are integers, and (m, n) ≠ (m′, n′), then σ_{mn} · σ_{m′n′} = 0.

A proof of this is left to the student. One last detail before we get to the main point. The σ_{mn}'s are orthogonal, but they are not orthonormal. This is easily fixed. Divide each of these functions by its length, as defined by the dot product in L²(ℝ). Compute
$$(\text{length of } \sigma_{mn})^2 = \sigma_{mn}\cdot\sigma_{mn} = \int_{-\infty}^{\infty}\sigma_{mn}(t)^2\,dt = \int_{n/2^m}^{n/2^m + 1/2^m} dt = \frac{1}{2^m}.$$
This suggests that we define the functions
$$\psi_{mn}(t) = 2^{m/2}\sigma_{mn}(t) = 2^{m/2}\big[\varphi(2^{m+1}t - 2n) - \varphi(2^{m+1}t - 2n - 1)\big]$$
$$= \begin{cases} 2^{m/2} & \text{for } \dfrac{n}{2^m} \le t < \dfrac{n}{2^m} + \dfrac{1}{2^{m+1}} \\ -2^{m/2} & \text{for } \dfrac{n}{2^m} + \dfrac{1}{2^{m+1}} \le t < \dfrac{n}{2^m} + \dfrac{1}{2^m} \\ 0 & \text{for } t < \dfrac{n}{2^m} \text{ and for } t \ge \dfrac{n}{2^m} + \dfrac{1}{2^m}. \end{cases}$$
The functions ψ_{mn} form an orthonormal set in L²(ℝ). These functions are the Haar wavelets. In this construction, φ is called the scaling function, and ψ(t) = φ(2t) − φ(2t − 1) is the mother wavelet. Graphs of these wavelets are similar to the graphs of Figure 16.24, but the segment at
height 1 in Figure 16.24 is now at height 2^{m/2}, and the segment at height −1 in Figure 16.24 is now at height −2^{m/2}.
The Haar wavelets are complete in L²(ℝ). The idea behind this can be envisioned as follows. If f is square integrable, then f(t) can be approximated as accurately as we like by a function g having compact support (g(t) = 0 outside some closed interval), and having constant values on half-open intervals of the form [n/2^m, (n+1)/2^m), with n and m integers. Such intervals are of length 1/2^m, which can be made longer or shorter by choice of the integer m. In turn, g can be approximated as closely as we like by a sum of constants times Haar wavelets, which are defined on such intervals, with the error in the approximation tending to zero as the number of terms in the sum is taken larger.
16.4.3   A Wavelet Expansion

Suppose f is a square integrable function. We can attempt an expansion of f in a series of the Haar wavelets, which form a complete orthonormal set in L²(ℝ). Such an expansion has the appearance
$$f(t) = \sum_{m=-\infty}^{\infty}\sum_{n=-\infty}^{\infty} c_{mn}\,\psi_{mn}(t).$$
The equality in this expression is taken to mean that the series on the right converges in the mean to f(t). This means that
$$\lim_{M\to\infty}\int_{-\infty}^{\infty}\Big(f(t) - \sum_{m=-M}^{M}\sum_{n=-M}^{M} c_{mn}\,\psi_{mn}(t)\Big)^2 dt = 0.$$
The coefficients c_{mn} can be found in the usual way by using the orthonormality of the Haar wavelets:
$$f\cdot\psi_{m_0 n_0} = \sum_{m=-\infty}^{\infty}\sum_{n=-\infty}^{\infty} c_{mn}\,\psi_{mn}\cdot\psi_{m_0 n_0} = c_{m_0 n_0}.$$
We will complete the example begun in Section 16.4.1, in which f is the signal whose graph is shown in Figure 16.16. As we saw in Figures 16.17(a) and (b), we would have to use a very large number of terms to model this signal with a partial sum of its Fourier expansion on [−3/2, 3/2]. However, if we calculate the coefficients in the Haar expansion, we find that
$$f(t) = \psi_{00}(t) + \sqrt{2}\,\psi_{11}(t) - 0.6\,\psi_{21}(t) - 0.4\sqrt{2}\,\psi_{12}(t) + \psi_{25}(t).$$
For some purposes we want Fourier trigonometric expansions, but for this signal the Haar wavelets provide a very efficient expansion.
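The leading coefficients can be reproduced directly from the definition c_{mn} = f·ψ_{mn}. In the sketch below (not from the text), the signal's piecewise constant values and the wavelets involved are all constant on each interval [k/8, (k+1)/8), so sampling left endpoints gives the integrals exactly; on [0, 1) the three wavelets ψ₀₀, ψ₁₁ and ψ₂₁ already reproduce the signal.

```python
def phi(t):
    """Haar scaling function, the characteristic function of [0, 1)."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0

def psi_mn(m, n, t):
    """Normalized Haar wavelet 2**(m/2) * psi(2**m t - n)."""
    s = 2.0**m * t - n
    return 2.0**(m / 2.0) * (phi(2.0 * s) - phi(2.0 * s - 1.0))

# the signal of Figure 16.16 (constant on eighths)
PIECES = [(0.0, 0.25, 1.0), (0.25, 0.375, -0.2), (0.375, 0.5, 2.2),
          (0.5, 0.75, 1.0), (0.75, 1.0, -3.0), (1.0, 1.25, -0.8),
          (1.25, 1.375, 2.8), (1.375, 1.5, 0.8)]

def f(t):
    for a, b, v in PIECES:
        if a <= t < b:
            return v
    return 0.0

def coeff(m, n):
    """c_mn = f . psi_mn; exact, since both factors are constant on eighths."""
    step = 0.125
    return step * sum(f(i * step) * psi_mn(m, n, i * step) for i in range(16))
```

For instance, `coeff(0, 0)` returns 1, `coeff(1, 1)` returns √2, and `coeff(2, 1)` returns −0.6, matching the expansion above, while `coeff(1, 0)` and `coeff(2, 0)` vanish because the signal has zero detail at those positions.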
16.4.4   Multiresolution Analysis with Haar Wavelets

The term multiresolution analysis refers to a sequence of closed subspaces of L²(ℝ) that are related to the scaling used in defining a set of wavelets. We will discuss what this means in the context of the Haar wavelets. Because L²(ℝ) has the structure of a vector space, the following three conditions hold.
1. Linear combinations $\sum_{j=1}^{N} c_j f_j$ of functions in L²(ℝ) are also in L²(ℝ).
2. The zero function, θ(t) = 0 for all t, is in L²(ℝ), and serves as the zero vector of L²(ℝ). For any function f in L²(ℝ), f + θ = f.
3. If f is in L²(ℝ), then −f, defined by (−f)(t) = −f(t), is also in L²(ℝ).

A set S of square integrable functions is said to be a subspace of L²(ℝ) if S has at least one function in it, and, whenever f and g are in S, then f − g is in S. For example, the set of all constant multiples of χ_{[0,1)} forms a subspace of L²(ℝ). A subspace S is closed if convergent sequences of functions in S have their limit functions in S. For example, the subspace of all continuous square integrable functions is not closed, because a limit (in the sense of mean convergence) of continuous functions need not be continuous. If a subspace S is not closed, we can form the "smallest" subspace of L²(ℝ) containing all the functions in S, together with all the limits of convergent sequences of functions in S. This subspace, which may be all of L²(ℝ), is called the closure of S, and is denoted S̄. S̄ is closed, because by its formation it has all the limits of convergent sequences of functions that are in this space.
We will now show how the Haar wavelets generate a sequence of closed subspaces of L²(ℝ), which can be indexed by the integers so that each is contained in the next one in the list. The spaces are generated by different scalings of the scaling function φ, and may be thought of as associated with different degrees of resolution of the signal.
To begin defining these spaces, let S₀ consist of all linear combinations of the translated scaling function. These translated scaling functions have the form φ(t − n) for integer n, and a typical function in S₀ has the form
$$\sum_{j=1}^{N} c_j\,\varphi(t - n_j),$$
where N is a positive integer, the c_j's are real numbers and each n_j is an integer. Now let V₀ be the closure of S₀: V₀ = S̄₀. Next, let S_m be the space of all linear combinations of the functions φ(2^m t − n), where n varies over the integers and m is a fixed integer in defining S_m. Let V_m = S̄_m.
From the scaling property of the scaling function,
$$\varphi(t) = \varphi(2t) + \varphi(2t - 1),$$
we find that f(t) is in V_m exactly when f(2t) is in V_{m+1}, and each V_m is contained within V_{m+1} (written V_m ⊂ V_{m+1}). Thus the closed subspaces V_m, with integer m, form an ascending chain:
$$\cdots \subset V_{-2} \subset V_{-1} \subset V_0 \subset V_1 \subset V_2 \subset \cdots.$$
This chain has two additional properties of importance. First, there is no nontrivial function contained in every V_m. We say that the intersection of all the closed subspaces V_m consists of just the zero function. And, finally, the ascending chain ends in L²(ℝ). This means that every function in L²(ℝ) has a series expansion in terms of the Haar functions, a fact to which we have already alluded. The spaces V_m are said to form a multiresolution analysis of L²(ℝ). This multiresolution analysis is generated by the scaling function φ.
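The containment V₀ ⊂ V₁ rests on the two-scale relation φ(t) = φ(2t) + φ(2t − 1), which splits each unit step into two half-steps. A quick pointwise check of this relation, and of its translated form for a generator of V₀, can be sketched as follows (not from the text):

```python
def phi(t):
    """Haar scaling function: 1 on [0, 1), 0 elsewhere."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0

def in_V1(t, n):
    """phi(t - n) rewritten with the half-scale steps phi(2t - 2n) and
    phi(2t - 2n - 1), showing that each generator of V0 lies in V1."""
    return phi(2.0 * t - 2 * n) + phi(2.0 * t - 2 * n - 1)
```

The dyadic sample points below make every comparison exact in floating point, since all breakpoints are at half-integers.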
16.4.5
General Construction of Wavelets and Multiresolution Analysis
The Haar wavelets have been known for nearly a century, together with the chain of subspaces that form a multiresolution analysis of L²(ℝ). However, it remained unknown for some time whether this construction could be duplicated, resulting in multiresolution analyses starting from different scaling functions. To this end, we will use the hindsight of the Haar construction to make a definition of a scaling function and the associated multiresolution analysis.
DEFINITION 16.4
Scaling Function and Associated Multiresolution Analysis
Let φ be in L²(ℝ). Then φ is a scaling function with multiresolution analysis {V_m} if
$$\cdots \subset V_{-2} \subset V_{-1} \subset V_0 \subset V_1 \subset V_2 \subset \cdots$$
is an ascending chain of closed subspaces of L²(ℝ) satisfying the conditions:
1. The translated functions φ(t − n), for integer n, are orthonormal, and every function in V₀ is a linear combination of functions of this form.
2. There is no nontrivial function that belongs to every V_m. (That is, the V_m's have trivial intersection.)
3. f(t) is in V_m exactly when f(2t) is in V_{m+1}.
4. Every function in L²(ℝ) can be expanded in a series of functions from the V_m's.
V₀ is a subspace of V₁, which contains functions orthogonal to every function in V₀. The subspace of V₁ containing all of these functions is called the orthogonal complement of V₀ in V₁. To draw an analogy from vectors in R³, the constant multiples of k form a subspace of R³ that is the orthogonal complement of the plane defined by i and j. Every vector in this orthogonal complement is orthogonal to each linear combination ai + bj.
Now use the scaling function φ to produce a mother wavelet ψ, having the property that every function in this orthogonal complement of V₀ in V₁ is a linear combination of translates ψ(t − n). If there is such a mother wavelet, then we can form the family of wavelets
$$\psi_{mn}(t) = 2^{m/2}\psi(2^m t - n)$$
for integers m and n.
16.4.6   Shannon Wavelets

The Haar wavelets form a prototype for wavelets and multiresolution analysis, partly because they were the first, and partly because they are relatively easy to work with and visualize. The reason it took many years before other examples of scaling function/wavelet/multiresolution analysis were found is that this involves some fairly heavy analysis. However, there are other relatively simple examples. One consists of the Shannon wavelets.
For these, begin with the Fourier transform of a potential scaling function. Let
$$\hat{\varphi}(\omega) = \chi_{[-\pi,\pi]}(\omega).$$
Taking the inverse Fourier transform, we obtain
$$\varphi(t) = \frac{\sin(\pi t)}{\pi t}.$$
This function occurs in the Shannon reconstruction theorem, which was proved in Section 15.4.7 for functions of bandwidth ≤ L. In the case that L = π, the theorem states that a signal f whose
Fourier transform f̂ vanishes outside the interval [−π, π] (that is, f has bandwidth ≤ π) can be reconstructed by sampling its values on the integers. Specifically,
$$f(t) = \sum_{n=-\infty}^{\infty} f(n)\,\frac{\sin(\pi(t-n))}{\pi(t-n)} = \sum_{n=-\infty}^{\infty} f(n)\,\varphi(t-n).$$
The space V₀ in this context consists of functions in L²(ℝ) of bandwidth not exceeding π. By scaling (let g(t) = f(2t)) we can consider the space V₁ of functions of bandwidth not exceeding 2π, and so on, forming a multiresolution analysis. Thus φ is a scaling function. We now need a mother wavelet ψ that is orthogonal to each φ(t − n), for integer n. By an argument we will not carry out (but whose conclusions can be verified in straightforward fashion), we obtain a suitable ψ from φ in this case by setting
$$\psi(t) = \varphi\Big(t - \frac{1}{2}\Big) - 2\varphi(2t - 1) = \frac{\sin(2\pi t) - \cos(\pi t)}{\pi\big(t - \frac{1}{2}\big)}.$$
The frequency content of this function is obtained from its Fourier transform,
$$\hat{\psi}(\omega) = -e^{-i\omega/2}\,\chi_A(\omega),$$
where A consists of all ω in [−2π, −π), together with all ω in (π, 2π]. That is, on each of these intervals, ψ̂(ω) = −e^{−iω/2}, and for ω outside of these intervals, ψ̂(ω) = 0. Figure 16.25 shows a graph of the mother wavelet ψ, and Figure 16.26 a graph of its amplitude spectrum. This gives the frequency content of ψ. The Shannon wavelets are the functions
$$\psi_{mn}(t) = 2^{m/2}\psi(2^m t - n) = 2^{m/2}\,\frac{\sin(2\pi(2^m t - n)) - \cos(\pi(2^m t - n))}{\pi\big(2^m t - n - \frac{1}{2}\big)}.$$
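The sampling formula behind the Shannon scaling function can be demonstrated numerically. The sketch below (a hypothetical test signal, not from the text) reconstructs g(t) = (sin(πt/2)/(πt/2))², whose Fourier transform vanishes outside [−π, π], from its samples at the integers:

```python
import math

def sinc(x):
    """The Shannon scaling function phi(x) = sin(pi x)/(pi x), with phi(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def g(t):
    """Test signal of bandwidth pi: the square of a bandwidth-pi/2 sinc."""
    return sinc(t / 2.0)**2

def reconstruct(t, K=500):
    """Shannon reconstruction sum over the samples g(n), n = -K..K."""
    return sum(g(n) * sinc(t - n) for n in range(-K, K + 1))
```

Squaring a bandwidth-π/2 sinc doubles its bandwidth to π, so g is still admissible; the terms here decay like 1/n³, and a few hundred samples already reproduce g to roughly four decimal places.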
FIGURE 16.25   Shannon mother wavelet ψ(t) = (sin(2πt) − cos(πt))/(π(t − 1/2)).   FIGURE 16.26   Amplitude spectrum of the Shannon mother wavelet.   FIGURE 16.27(a)   Shannon wavelet ψ₁₀(t).   FIGURE 16.27(b)   Shannon wavelet ψ₂₁(t).

We leave it for the student to explore properties of these wavelets. Graphs of ψ₁₀(t) and ψ₂₁(t) are given in Figures 16.27(a) and (b).
There are many other families of wavelets, including Meyer wavelets, Daubechies wavelets and Strömberg wavelets. These require a good deal more preliminary work for their definitions. Different wavelets are constructed for specific purposes, and they have applications in such areas as signal analysis, data compression and the solution of integral equations. For an application
to the problem of using color patterns in the iris of the eye as a means of identification, see the article Iris Recognition, by John Daugman, appearing in American Scientist, July–August, 2001, pages 326–333.
SECTION 16.4   PROBLEMS

1. Show that ψ_{mn}(t) · ψ_{m′n′}(t) = 0 if (m, n) ≠ (m′, n′).
2. On the same set of axes, graph ψ₁₁(t) and ψ₁₂(t). Explain from the graph why these two functions are orthogonal.
3. On the same set of axes, graph ψ₁₃(t) and ψ₋₂,₁(t). Explain from the graph why these two functions are orthogonal.
4. On the same set of axes, graph ψ₂₁(t) and ψ₁₁(t). Explain from the graph why these two functions are orthogonal.
5. Graph ψ(2t − 3).
6. Graph ψ(2t + 6).
7. Let f(t) = 4ψ₋₃,₋₂(t) + 6ψ₋₁,₁(t). Write the Fourier series of f(t) on [−5, 5]. Graph the fiftieth partial sum of this series on the same set of axes with a graph of f(t).
8. Let f(t) = −3ψ₂,₋₂(t) + 4ψ₂,₀(t) + 7ψ₁,₋₁(t). Write the Fourier series of f(t) on [−4, 4]. Graph the fiftieth partial sum of this series on the same set of axes with a graph of f(t).
9. Let f(t) = 3ψ₋₄,₋₁(t) + 8ψ₋₂,₁(t). Write the Fourier series of f(t) on [−6, 6]. Graph the fiftieth partial sum of this series on the same set of axes with a graph of f(t).
10. Let f(t) = −2ψ₂,₋₂(t) + 4ψ₁,₃(t) + 2ψ₁,₋₂(t). Write the Fourier series of f(t) on [−7, 7]. Graph the fiftieth partial sum of this series on the same set of axes with a graph of f(t).
PART 6

Partial Differential Equations

CHAPTER 17   The Wave Equation
CHAPTER 18   The Heat Equation
CHAPTER 19   The Potential Equation
A differential equation in which partial derivatives occur is called a partial differential equation. Mathematical models of physical phenomena involving more than one independent variable often include partial differential equations. They also arise in such diverse areas as epidemiology (for example, multivariable predator/prey models of AIDS), traffic flow studies and the analysis of economies. We will be primarily concerned in this part with three broadly defined kinds of phenomena: wave motion, radiation or conduction of energy, and potential theory. Models of these phenomena involve partial differential equations called, respectively, the wave equation, the heat equation, and the potential equation, or Laplace’s equation. We will consider each of these in turn, deriving solutions under a variety of boundary and initial conditions describing different settings. The solution of partial differential equations requires a broad array of mathematical tools, including Fourier series, integrals and transforms, special functions and eigenfunction expansions. These were covered in Part 5, and can be referred to as needed.
CHAPTER 17

The Wave Equation

THE WAVE EQUATION AND INITIAL AND BOUNDARY CONDITIONS · FOURIER SERIES SOLUTIONS OF THE WAVE EQUATION · WAVE MOTION ALONG INFINITE AND SEMI-INFINITE STRINGS · CHARACTERISTICS
17.1
The Wave Equation and Initial and Boundary Conditions

Vibrations in a membrane or drum head, or oscillations induced in a guitar or violin string, are governed by a partial differential equation called the wave equation. We will derive this equation in a simple setting.
Consider an elastic string stretched between two pegs, as on a guitar. We want to describe the motion of the string if it is given a small displacement and released to vibrate in a plane. Place the string along the x axis from 0 to L and assume that it vibrates in the x, y plane. We want a function y(x, t) such that, at any time t > 0, the graph of the function y = y(x, t) of x is the shape of the string at that time. Thus y(x, t) allows us to take a snapshot of the string at any time, showing it as a curve in the plane. For this reason y(x, t) is called the position function for the string. Figure 17.1 shows a typical configuration.
To begin with a simple case, neglect damping forces such as air resistance and the weight of the string, and assume that the tension T(x, t) in the string always acts tangentially to the string and that individual particles of the string move only vertically. Also assume that the mass ρ per unit length is constant.
Now consider a typical segment of string between x and x + Δx and apply Newton's second law of motion to write

net force on this segment due to the tension = acceleration of the center of mass of the segment times its mass.

This is a vector equation. For Δx small, the vertical component of this equation (Figure 17.2) gives us approximately
$$T(x + \Delta x, t)\sin(\theta + \Delta\theta) - T(x, t)\sin(\theta) = \rho\,\Delta x\,\frac{\partial^2 y}{\partial t^2}(\bar{x}, t),$$
782
CHAPTER 17
The Wave Equation
FIGURE 17.1   String profile y = y(x, t) at a fixed time t.   FIGURE 17.2   The tension vectors T(x, t) and T(x + Δx, t) on a segment of string.
where x̄ is the center of mass of the segment and T(x, t) = ‖T(x, t)‖ is the magnitude of T. Then
$$\frac{T(x + \Delta x, t)\sin(\theta + \Delta\theta) - T(x, t)\sin(\theta)}{\Delta x} = \rho\,\frac{\partial^2 y}{\partial t^2}(\bar{x}, t).$$
Now v(x, t) = T(x, t)sin(θ) is the vertical component of the tension, so the last equation becomes
$$\frac{v(x + \Delta x, t) - v(x, t)}{\Delta x} = \rho\,\frac{\partial^2 y}{\partial t^2}(\bar{x}, t).$$
In the limit as Δx → 0, we also have x̄ → x, and the last equation becomes
$$\frac{\partial v}{\partial x} = \rho\,\frac{\partial^2 y}{\partial t^2}. \tag{17.1}$$
The horizontal component of the tension is h(x, t) = T(x, t)cos(θ), so
$$v(x, t) = h(x, t)\tan(\theta) = h(x, t)\,\frac{\partial y}{\partial x}.$$
Substitute this into equation (17.1) to get
$$\frac{\partial}{\partial x}\Big(h\,\frac{\partial y}{\partial x}\Big) = \rho\,\frac{\partial^2 y}{\partial t^2}. \tag{17.2}$$
To compute the left side of this equation, use the fact that the net horizontal force on the segment is zero (the particles of string move only vertically), so
$$h(x + \Delta x, t) - h(x, t) = 0.$$
Thus h is independent of x, and equation (17.2) can be written
$$h\,\frac{\partial^2 y}{\partial x^2} = \rho\,\frac{\partial^2 y}{\partial t^2}.$$
Letting c² = h/ρ, this equation is often written
$$\frac{\partial^2 y}{\partial t^2} = c^2\,\frac{\partial^2 y}{\partial x^2}.$$
This is the one-dimensional (1-space dimension) wave equation. If we use subscript notation for partial derivatives, in which
$$y_x = \frac{\partial y}{\partial x} \quad\text{and}\quad y_t = \frac{\partial y}{\partial t},$$
then the wave equation is
$$y_{tt} = c^2 y_{xx}.$$
17.1 The Wave Equation and Initial and Boundary Conditions
783
This spectacular photo, taken by Ensign John Gay from the U.S.S. Constellation, shows a shock wave cloud forming over the tail of a U.S. Navy F/A-18 Hornet as it breaks the sound barrier. Current theory is that sound density waves generated by the plane accumulate in a cone at the plane’s tail, and a drop in air pressure causes moist air to condense into water droplets there. Shock waves are not yet fully understood, and their mathematical modeling uses advanced techniques from the theory of partial differential equations.
In order to model the string's motion, we need more than just the wave equation. We must also incorporate information about constraints on the ends of the string, and about the initial velocity and position of the string, which will obviously influence the motion.
If the ends of the string are fixed, then
$$y(0, t) = y(L, t) = 0 \quad\text{for } t \ge 0.$$
These are the boundary conditions. The initial conditions specify the initial (at time zero) position
$$y(x, 0) = f(x) \quad\text{for } 0 \le x \le L$$
and the initial velocity
$$\frac{\partial y}{\partial t}(x, 0) = g(x) \quad\text{for } 0 < x < L,$$
in which f and g are given functions satisfying certain compatibility conditions. For example, if the string is fixed at its ends, then the initial position function must reflect this by satisfying
$$f(0) = f(L) = 0.$$
If the initial velocity is zero (the string is released from rest), then g(x) = 0.
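Such a boundary value problem can also be solved numerically by an explicit finite-difference (leapfrog) scheme; the sketch below (not from the text) uses Courant number cΔt/Δx = 1, for which the update y_j^{k+1} = y_{j−1}^k + y_{j+1}^k − y_j^{k−1} propagates data exactly along the characteristics x ± ct = constant:

```python
import math

def solve_wave(f, g, L=1.0, c=1.0, nx=100, t_final=0.5):
    """March y_tt = c^2 y_xx on [0, L] with y(0,t) = y(L,t) = 0,
    y(x,0) = f(x), y_t(x,0) = g(x).  Returns the grid and y(x, t_final)."""
    dx = L / nx
    dt = dx / c                          # Courant number 1
    steps = int(round(t_final / dt))
    x = [j * dx for j in range(nx + 1)]
    y_prev = [f(xj) for xj in x]         # profile at t = 0
    y = [0.0] * (nx + 1)                 # first step via a Taylor expansion:
    for j in range(1, nx):               # y(x,dt) ~ (y(x-dx,0)+y(x+dx,0))/2 + dt*g(x)
        y[j] = 0.5 * (y_prev[j - 1] + y_prev[j + 1]) + dt * g(x[j])
    for _ in range(steps - 1):
        y_next = [0.0] * (nx + 1)        # fixed ends stay at zero
        for j in range(1, nx):
            y_next[j] = y[j - 1] + y[j + 1] - y_prev[j]
        y_prev, y = y, y_next
    return x, y
```

For f(x) = sin(πx) and g = 0 the exact solution is y(x, t) = sin(πx)cos(πt), and with Courant number 1 the scheme reproduces it to machine precision at the grid points.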
784
CHAPTER 17
The Wave Equation
The wave equation, together with the boundary and initial conditions, constitutes a boundary value problem for the position function $y(x,t)$ of the string. These constitute enough information to uniquely determine the solution $y(x,t)$. If there is an external force of magnitude $F$ units of force per unit length acting on the string in the vertical direction, then this derivation can be modified to obtain

\[ \frac{\partial^2 y}{\partial t^2} = c^2\frac{\partial^2 y}{\partial x^2} + \frac{1}{\rho}F. \]

Again, the boundary value problem consists of this wave equation and the boundary and initial conditions.
In 2-space dimensions the wave equation is

\[ \frac{\partial^2 z}{\partial t^2} = c^2\left( \frac{\partial^2 z}{\partial x^2} + \frac{\partial^2 z}{\partial y^2} \right). \tag{17.3} \]

This equation governs vertical displacements $z(x,y,t)$ of a membrane covering a specified region of the plane (for example, vibrations of a drum surface). Again, boundary and initial conditions must be given to determine a unique solution. Typically, the frame is fixed on a boundary (the rim of the drum surface), so we would have no displacement of points on the boundary:

\[ z(x,y,t) = 0 \quad\text{for } (x,y) \text{ on the boundary of the region and } t > 0. \]

Further, the initial displacement and initial velocity must be given. These initial conditions have the form

\[ z(x,y,0) = f(x,y), \qquad \frac{\partial z}{\partial t}(x,y,0) = g(x,y), \]

with $f$ and $g$ given.
We will have occasion to use the two-dimensional wave equation (17.3) expressed in polar coordinates, so we will derive this equation. Let

\[ x = r\cos\theta, \qquad y = r\sin\theta. \]

Then

\[ r = \sqrt{x^2 + y^2} \quad\text{and}\quad \theta = \tan^{-1}(y/x). \]

Let $z(x,y,t) = z(r\cos\theta, r\sin\theta, t) = u(r,\theta,t)$. Compute

\[ \frac{\partial z}{\partial x} = \frac{\partial u}{\partial r}\frac{\partial r}{\partial x} + \frac{\partial u}{\partial \theta}\frac{\partial \theta}{\partial x} = \frac{x}{\sqrt{x^2+y^2}}\frac{\partial u}{\partial r} - \frac{y}{x^2+y^2}\frac{\partial u}{\partial \theta} = \frac{x}{r}\frac{\partial u}{\partial r} - \frac{y}{r^2}\frac{\partial u}{\partial \theta}. \]

Then

\[ \frac{\partial^2 z}{\partial x^2} = \frac{\partial}{\partial x}\left( \frac{x}{r}\frac{\partial u}{\partial r} - \frac{y}{r^2}\frac{\partial u}{\partial \theta} \right) = \frac{y^2}{r^3}\frac{\partial u}{\partial r} + \frac{2xy}{r^4}\frac{\partial u}{\partial \theta} + \frac{x^2}{r^2}\frac{\partial^2 u}{\partial r^2} - \frac{2xy}{r^3}\frac{\partial^2 u}{\partial r\,\partial\theta} + \frac{y^2}{r^4}\frac{\partial^2 u}{\partial \theta^2}. \]

By a similar calculation, we get

\[ \frac{\partial z}{\partial y} = \frac{y}{r}\frac{\partial u}{\partial r} + \frac{x}{r^2}\frac{\partial u}{\partial \theta} \]

and

\[ \frac{\partial^2 z}{\partial y^2} = \frac{x^2}{r^3}\frac{\partial u}{\partial r} - \frac{2xy}{r^4}\frac{\partial u}{\partial \theta} + \frac{y^2}{r^2}\frac{\partial^2 u}{\partial r^2} + \frac{2xy}{r^3}\frac{\partial^2 u}{\partial r\,\partial\theta} + \frac{x^2}{r^4}\frac{\partial^2 u}{\partial \theta^2}. \]

Then

\[ \frac{\partial^2 z}{\partial x^2} + \frac{\partial^2 z}{\partial y^2} = \frac{\partial^2 u}{\partial r^2} + \frac{1}{r}\frac{\partial u}{\partial r} + \frac{1}{r^2}\frac{\partial^2 u}{\partial \theta^2}. \]

Therefore, in polar coordinates, the two-dimensional wave equation (17.3) is

\[ \frac{\partial^2 u}{\partial t^2} = c^2\left( \frac{\partial^2 u}{\partial r^2} + \frac{1}{r}\frac{\partial u}{\partial r} + \frac{1}{r^2}\frac{\partial^2 u}{\partial \theta^2} \right), \tag{17.4} \]

in which $u(r,\theta,t)$ is the vertical displacement of the membrane from the $(x,y)$ plane at the point $(r,\theta)$ at time $t$.
For the rest of this chapter we will solve boundary value problems involving wave motion in a variety of settings, making use of several techniques.
SECTION 17.1        PROBLEMS
1. Let $y(x,t) = \sin(n\pi x/L)\cos(n\pi ct/L)$. Show that $y$ satisfies the one-dimensional wave equation for any positive integer $n$.

2. Show that $z(x,y,t) = \sin(nx)\cos(my)\cos\left(\sqrt{n^2+m^2}\,ct\right)$ satisfies the two-dimensional wave equation for any positive integers $n$ and $m$.

3. Let $f$ be any twice-differentiable function of one variable. Show that

\[ y(x,t) = \frac{1}{2}\left[ f(x+ct) + f(x-ct) \right] \]

satisfies the one-dimensional wave equation.

4. Show that $y(x,t) = \sin(x)\cos(ct) + \frac{1}{c}\cos(x)\sin(ct)$ satisfies the one-dimensional wave equation, together with the boundary conditions

\[ y(0,t) = y(2\pi,t) = \frac{1}{c}\sin(ct) \quad\text{for } t > 0 \]

and the initial conditions

\[ y(x,0) = \sin(x), \qquad \frac{\partial y}{\partial t}(x,0) = \cos(x) \quad\text{for } 0 < x < 2\pi. \]

5. Formulate a boundary value problem (partial differential equation, boundary and initial conditions) for vibrations of a rectangular membrane occupying a region $0 \le x \le a$, $0 \le y \le b$ if the initial position is the graph of $z = f(x,y)$ and the initial velocity (at time zero) is $g(x,y)$. The membrane is fastened to a stiff frame along the rectangular boundary of the region.

6. Formulate a boundary value problem for the motion of an elastic string of length $L$, fastened at both ends and released from rest with an initial position given by $f(x)$. The string vibrates in the $(x,y)$ plane. Its motion is opposed by air resistance, which has a force at each point of magnitude proportional to the square of the velocity at that point.
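Verifications like the one requested in Problem 1 can also be sanity-checked numerically, by comparing centered second differences in $t$ and $x$. The sketch below does this in Python; the sample values of `L`, `c`, and `n` are arbitrary choices of ours, not values fixed by the text:

```python
import math

L, c, n = 2.0, 3.0, 4          # arbitrary sample values

def y(x, t):
    # candidate solution from Problem 1
    return math.sin(n * math.pi * x / L) * math.cos(n * math.pi * c * t / L)

def residual(x, t, h=1e-4):
    # y_tt - c^2 y_xx, approximated by centered second differences;
    # should be near zero if y satisfies the wave equation
    ytt = (y(x, t + h) - 2.0 * y(x, t) + y(x, t - h)) / (h * h)
    yxx = (y(x + h, t) - 2.0 * y(x, t) + y(x - h, t)) / (h * h)
    return ytt - c * c * yxx
```

The residual is not exactly zero because of the difference approximation, but it should be small at any interior point.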
17.2  Fourier Series Solutions of the Wave Equation

We will begin with problems involving wave motion on a bounded interval. First we will consider the problem when there is an initial displacement, but no initial velocity (string released from rest). Following this we will allow an initial velocity but no initial displacement (string given an initial blow, but from its horizontal stretched position). Then we will show how to combine these to allow for both an initial velocity and initial displacement.
17.2.1  Vibrating String with Zero Initial Velocity

Consider an elastic string of length $L$, fastened at its ends on the $x$ axis at $x = 0$ and $x = L$. The string is displaced, then released from rest to vibrate in the $(x,y)$ plane. We want to find the displacement function $y(x,t)$, whose graph is a curve in the $(x,y)$ plane showing the shape of the string at time $t$. If we took a snapshot of the string at time $t$, we would see this curve. The boundary value problem for the displacement function is

\[ \frac{\partial^2 y}{\partial t^2} = c^2\frac{\partial^2 y}{\partial x^2} \quad\text{for } 0 < x < L,\ t > 0, \]
\[ y(0,t) = y(L,t) = 0 \quad\text{for } t \ge 0, \]
\[ y(x,0) = f(x) \quad\text{for } 0 \le x \le L, \]
\[ \frac{\partial y}{\partial t}(x,0) = 0 \quad\text{for } 0 \le x \le L. \]

The graph of $f(x)$ is the position of the string before release.
The Fourier method, or separation of variables, consists of attempting a solution of the form $y(x,t) = X(x)T(t)$. Substitute this into the wave equation to obtain

\[ XT'' = c^2 X''T, \]

where $T' = dT/dt$ and $X' = dX/dx$. Then

\[ \frac{X''}{X} = \frac{T''}{c^2 T}. \]

The left side of this equation depends only on $x$, and the right only on $t$. Because $x$ and $t$ are independent, we can choose any $t_0$ we like and fix the right side of this equation at the constant value $T''(t_0)/c^2 T(t_0)$, while varying $x$ on the left side. Therefore $X''/X$ must be constant for all $x$ in $(0,L)$. But then $T''/c^2 T$ must equal the same constant for all $t > 0$. Denote this constant $-\lambda$. (The negative sign is customary and convenient, but we would arrive at the same final solution if we used just $\lambda$.) $\lambda$ is called the separation constant, and we now have

\[ \frac{X''}{X} = \frac{T''}{c^2 T} = -\lambda. \]
Then

\[ X'' + \lambda X = 0 \quad\text{and}\quad T'' + \lambda c^2 T = 0. \]

The wave equation has separated into two ordinary differential equations.
Now consider the boundary conditions. First, $y(0,t) = X(0)T(t) = 0$ for $t \ge 0$. If $T(t) = 0$ for all $t \ge 0$, then $y(x,t) = 0$ for $0 \le x \le L$ and $t \ge 0$. This is indeed the solution if $f(x) = 0$, since in the absence of initial velocity or a driving force, and with zero displacement, the string remains stationary for all time. However, if $T(t) \ne 0$ for any time, then this boundary condition can be satisfied only if $X(0) = 0$. Similarly, $y(L,t) = X(L)T(t) = 0$ for $t \ge 0$ requires that $X(L) = 0$. We now have a boundary-value problem for $X$:

\[ X'' + \lambda X = 0; \qquad X(0) = X(L) = 0. \]

The values of $\lambda$ for which this problem has nontrivial solutions are the eigenvalues of this problem, and the corresponding nontrivial solutions for $X$ are the eigenfunctions. We solved this regular Sturm–Liouville problem in Example 16.8, obtaining the eigenvalues

\[ \lambda_n = \frac{n^2\pi^2}{L^2}. \]

The eigenfunctions are nonzero constant multiples of

\[ X_n(x) = \sin\left(\frac{n\pi x}{L}\right) \]

for $n = 1, 2, \ldots$. At this point we therefore have infinitely many possibilities for the separation constant and for $X(x)$.
Now turn to $T(t)$. Since the string is released from rest,

\[ \frac{\partial y}{\partial t}(x,0) = X(x)T'(0) = 0. \]

This requires that $T'(0) = 0$. The problem to be solved for $T$ is therefore

\[ T'' + \lambda c^2 T = 0; \qquad T'(0) = 0. \]

However, we now know that $\lambda$ can take on only values of the form $n^2\pi^2/L^2$, so this problem is really

\[ T'' + \frac{n^2\pi^2 c^2}{L^2}T = 0; \qquad T'(0) = 0. \]

The differential equation for $T$ has general solution

\[ T(t) = a\cos\left(\frac{n\pi ct}{L}\right) + b\sin\left(\frac{n\pi ct}{L}\right). \]

Now

\[ T'(0) = \frac{n\pi c}{L}b = 0, \]

so $b = 0$. We therefore have solutions for $T(t)$ of the form

\[ T_n(t) = c_n\cos\left(\frac{n\pi ct}{L}\right) \]

for each positive integer $n$, with the constants $c_n$ as yet undetermined. We now have, for $n = 1, 2, \ldots$, functions

\[ y_n(x,t) = c_n\sin\left(\frac{n\pi x}{L}\right)\cos\left(\frac{n\pi ct}{L}\right). \tag{17.5} \]

Each of these functions satisfies both boundary conditions and the initial condition $y_t(x,0) = 0$. We need to satisfy the condition $y(x,0) = f(x)$. It may be possible to choose some $n$ so that $y_n(x,t)$ is the solution for some choice of $c_n$. For example, suppose the initial displacement is

\[ f(x) = 14\sin\left(\frac{3\pi x}{L}\right). \]

Now choose $n = 3$ and $c_3 = 14$ to obtain the solution

\[ y(x,t) = 14\sin\left(\frac{3\pi x}{L}\right)\cos\left(\frac{3\pi ct}{L}\right). \]

This function satisfies the wave equation, the conditions $y(0,t) = y(L,t) = 0$, the initial condition $y(x,0) = 14\sin(3\pi x/L)$, and the zero initial velocity condition $\partial y/\partial t\,(x,0) = 0$.
However, depending on the initial displacement function, we may not be able to get by simply by picking a particular $n$ and $c_n$ in equation (17.5). For example, if we initially pick the string up in the middle and have initial displacement function

\[ f(x) = \begin{cases} x & \text{for } 0 \le x \le L/2 \\ L - x & \text{for } L/2 < x \le L \end{cases} \tag{17.6} \]

(as in Figure 17.3), then we can never satisfy $y(x,0) = f(x)$ with one of the $y_n$'s. Even if we try a finite linear combination

\[ y(x,t) = \sum_{n=1}^{N} y_n(x,t), \]
FIGURE 17.3  The initial position (17.6): $y(x,0) = x$ for $0 \le x \le L/2$ and $y(x,0) = L - x$ for $L/2 < x \le L$.
we cannot choose $c_1, \ldots, c_N$ to satisfy $y(x,0) = f(x)$ for this function, since $f(x)$ cannot be written as a finite sum of sine functions. We are therefore led to attempt an infinite superposition

\[ y(x,t) = \sum_{n=1}^{\infty} c_n\sin\left(\frac{n\pi x}{L}\right)\cos\left(\frac{n\pi ct}{L}\right). \]

We must choose the $c_n$'s to satisfy

\[ y(x,0) = \sum_{n=1}^{\infty} c_n\sin\left(\frac{n\pi x}{L}\right) = f(x). \]

We can do this! The series on the right is the Fourier sine expansion of $f(x)$ on $[0,L]$. Thus choose the Fourier sine coefficients

\[ c_n = \frac{2}{L}\int_0^L f(\xi)\sin\left(\frac{n\pi\xi}{L}\right)d\xi. \]

With this choice, we obtain the solution

\[ y(x,t) = \sum_{n=1}^{\infty}\left( \frac{2}{L}\int_0^L f(\xi)\sin\left(\frac{n\pi\xi}{L}\right)d\xi \right)\sin\left(\frac{n\pi x}{L}\right)\cos\left(\frac{n\pi ct}{L}\right). \tag{17.7} \]
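As a sketch of how formula (17.7) can be evaluated in practice, the partial sums below approximate the coefficient integrals with a composite trapezoid rule; the plucked-string profile of equation (17.6) with $L = 1$, $c = 1$, and the number of terms retained are illustrative choices of ours:

```python
import math

L, c = 1.0, 1.0

def f(x):
    # plucked-string initial position, equation (17.6)
    return x if x <= L / 2 else L - x

def coeff(n, samples=4000):
    # c_n = (2/L) * integral_0^L f(s) sin(n pi s / L) ds, trapezoid rule
    h = L / samples
    total = 0.5 * (f(0.0) * math.sin(0.0) + f(L) * math.sin(n * math.pi))
    for i in range(1, samples):
        s = i * h
        total += f(s) * math.sin(n * math.pi * s / L)
    return (2.0 / L) * h * total

def y(x, t, terms=60):
    # partial sum of the series solution (17.7)
    return sum(coeff(n)
               * math.sin(n * math.pi * x / L)
               * math.cos(n * math.pi * c * t / L)
               for n in range(1, terms + 1))
```

At $t = 0$ the partial sum should nearly reproduce $f(x)$, and it vanishes identically at the fixed ends.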
This strategy will work for any initial displacement function $f$ which is continuous with a piecewise continuous derivative on $[0,L]$ and satisfies $f(0) = f(L) = 0$. These conditions ensure that the Fourier sine series of $f(x)$ on $[0,L]$ converges to $f(x)$ for $0 \le x \le L$.
In specific instances, where $f(x)$ is given, we can of course explicitly compute the coefficients in this solution. For example, if $L = \pi$ and the initial position function is $f(x) = x\cos(5x/2)$ on $[0,\pi]$, then the $n$th coefficient in the solution (17.7) is

\[ c_n = \frac{2}{\pi}\int_0^{\pi} \xi\cos\left(\frac{5\xi}{2}\right)\sin(n\xi)\,d\xi = \frac{160\,n(-1)^{n+1}}{\pi(5+2n)^2(5-2n)^2}. \]

The solution for this initial displacement function, and zero initial velocity, is

\[ y(x,t) = \frac{160}{\pi}\sum_{n=1}^{\infty}\frac{n(-1)^{n+1}}{(5+2n)^2(5-2n)^2}\sin(nx)\cos(nct). \tag{17.8} \]

Figure 17.4(a) shows graphs of this function (profiles of the string) at times $t = 0$, 0.2, 0.4, 0.7, 0.9 and 1.3 seconds. Figure 17.4(b) shows profiles at times $t = 1.2$, 1.9, 3, 3.5, 4.2 and 4.7. And Figure 17.4(c) shows the graphs at times $t = 5.1$, 5.6, 5.9, 6.4, 7 and 8.3. These snapshots are made in groupings on the same set of axes to convey some sense of the motion with time.
The solution we have derived by separation of variables can be put into the context of Sturm–Liouville theory (Section 16.3). The problem for $X$, namely

\[ X'' + \lambda X = 0; \qquad X(0) = X(L) = 0, \]
FIGURE 17.4(a)  Profiles of the solution at times $t = 0$, 0.2, 0.4, 0.7, 0.9, and 1.3.

FIGURE 17.4(b)  String profiles at times $t = 1.2$, 1.9, 3, 3.5, 4.2, and 4.7.
is a regular Sturm–Liouville problem, and we found its eigenvalues and corresponding eigenfunctions. The final step in the solution was to expand the initial position function in a series of the eigenfunctions. For this problem, this series is the Fourier sine expansion of $f(x)$ on $[0,L]$.
FIGURE 17.4(c)  String profiles at times $t = 5.1$, 5.6, 5.9, 6.4, 7, and 8.3.
17.2.2  Vibrating String with Given Initial Velocity and Zero Initial Displacement

Now consider the case that the string is released from its horizontal position (zero initial displacement), but with an initial velocity given at $x$ by $g(x)$. The boundary value problem for the displacement function is

\[ \frac{\partial^2 y}{\partial t^2} = c^2\frac{\partial^2 y}{\partial x^2} \quad\text{for } 0 < x < L,\ t > 0, \]
\[ y(0,t) = y(L,t) = 0 \quad\text{for } t \ge 0, \]
\[ y(x,0) = 0 \quad\text{for } 0 \le x \le L, \]
\[ \frac{\partial y}{\partial t}(x,0) = g(x) \quad\text{for } 0 \le x \le L. \]

We begin as before with separation of variables. Put $y(x,t) = X(x)T(t)$. Since the partial differential equation and boundary conditions are the same as before, we again obtain

\[ X'' + \lambda X = 0; \qquad X(0) = X(L) = 0, \]

with eigenvalues

\[ \lambda_n = \frac{n^2\pi^2}{L^2} \]

and eigenfunctions constant multiples of

\[ X_n(x) = \sin\left(\frac{n\pi x}{L}\right). \]

Now, however, the problem for $T$ is different, and we have $y(x,0) = 0 = X(x)T(0)$, so $T(0) = 0$. The problem for $T$ is

\[ T'' + \frac{n^2\pi^2 c^2}{L^2}T = 0; \qquad T(0) = 0. \]

(In the case of zero initial velocity we had $T'(0) = 0$.) The general solution of the differential equation for $T$ is

\[ T(t) = a\cos\left(\frac{n\pi ct}{L}\right) + b\sin\left(\frac{n\pi ct}{L}\right). \]

Since $T(0) = a = 0$, solutions for $T(t)$ are constant multiples of $\sin(n\pi ct/L)$. Thus, for $n = 1, 2, \ldots$, we have functions

\[ y_n(x,t) = c_n\sin\left(\frac{n\pi x}{L}\right)\sin\left(\frac{n\pi ct}{L}\right). \]

Each of these functions satisfies the wave equation, the boundary conditions, and the zero initial displacement condition. To satisfy the initial velocity condition $y_t(x,0) = g(x)$, we generally must attempt a superposition

\[ y(x,t) = \sum_{n=1}^{\infty} c_n\sin\left(\frac{n\pi x}{L}\right)\sin\left(\frac{n\pi ct}{L}\right). \]

Assuming that we can differentiate this series term by term, then

\[ \frac{\partial y}{\partial t}(x,0) = \sum_{n=1}^{\infty} c_n\frac{n\pi c}{L}\sin\left(\frac{n\pi x}{L}\right) = g(x). \]

This is the Fourier sine expansion of $g(x)$ on $[0,L]$. Choose the entire coefficient of $\sin(n\pi x/L)$ to be the Fourier sine coefficient of $g(x)$ on $[0,L]$:

\[ c_n\frac{n\pi c}{L} = \frac{2}{L}\int_0^L g(\xi)\sin\left(\frac{n\pi\xi}{L}\right)d\xi, \]

or

\[ c_n = \frac{2}{n\pi c}\int_0^L g(\xi)\sin\left(\frac{n\pi\xi}{L}\right)d\xi. \]

The solution is

\[ y(x,t) = \frac{2}{\pi c}\sum_{n=1}^{\infty}\frac{1}{n}\left( \int_0^L g(\xi)\sin\left(\frac{n\pi\xi}{L}\right)d\xi \right)\sin\left(\frac{n\pi x}{L}\right)\sin\left(\frac{n\pi ct}{L}\right). \tag{17.9} \]

For example, suppose the string is released from its horizontal position with an initial velocity given by $g(x) = x\left(1 + \cos(\pi x/L)\right)$. Compute

\[ \int_0^L g(\xi)\sin\left(\frac{n\pi\xi}{L}\right)d\xi = \int_0^L \xi\left(1+\cos\left(\frac{\pi\xi}{L}\right)\right)\sin\left(\frac{n\pi\xi}{L}\right)d\xi = \begin{cases} \dfrac{L^2(-1)^n}{\pi n(n^2-1)} & \text{if } n \ne 1 \\[2mm] \dfrac{3L^2}{4\pi} & \text{if } n = 1. \end{cases} \]

The solution corresponding to this initial velocity function is

\[ y(x,t) = \frac{3L^2}{2\pi^2 c}\sin\left(\frac{\pi x}{L}\right)\sin\left(\frac{\pi ct}{L}\right) + \frac{2L^2}{\pi^2 c}\sum_{n=2}^{\infty}\frac{(-1)^n}{n^2(n^2-1)}\sin\left(\frac{n\pi x}{L}\right)\sin\left(\frac{n\pi ct}{L}\right). \tag{17.10} \]

If we let $c = 1$ and $L = \pi$, the solution (17.10) becomes

\[ y(x,t) = \frac{3}{2}\sin x\sin t + \sum_{n=2}^{\infty}\frac{2(-1)^n}{n^2(n^2-1)}\sin(nx)\sin(nt). \]

Figure 17.5 shows graphs of this solution (positions of the string) at times $t = 0.4$, 1.2, 1.7, 2.6, 3.5 and 4.3.
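One way to check the series just obtained is against its initial data: differentiating term by term at $t = 0$ should reproduce $g(x) = x(1 + \cos x)$ on $(0, \pi)$. A small numerical check of this (the sample points and number of terms retained are arbitrary choices of ours):

```python
import math

def g(x):
    # initial velocity with L = pi: g(x) = x (1 + cos x)
    return x * (1.0 + math.cos(x))

def yt_at_zero(x, terms=200):
    # term-by-term t-derivative, at t = 0, of the series solution (c = 1, L = pi)
    s = 1.5 * math.sin(x)
    for n in range(2, terms + 1):
        s += 2.0 * (-1) ** n / (n * (n * n - 1)) * math.sin(n * x)
    return s
```

The terms decay like $1/n^3$, so a couple of hundred terms already match $g$ to several decimal places at interior points.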
FIGURE 17.5  String profiles at times $t = 0.4$, 1.2, 1.7, 2.6, 3.5, and 4.3.
17.2.3  Vibrating String with Initial Displacement and Velocity

Consider the motion of the string with both initial displacement $f(x)$ and initial velocity $g(x)$. Formulate two separate problems, the first with initial displacement $f(x)$ and zero initial velocity, and the second with zero initial displacement and initial velocity $g(x)$. We know how to solve both of these. Let $y_1(x,t)$ be the solution of the first problem, and $y_2(x,t)$ the solution of the second. Now let

\[ y(x,t) = y_1(x,t) + y_2(x,t). \]

Then $y$ satisfies the wave equation and the boundary conditions. Further,

\[ y(x,0) = y_1(x,0) + y_2(x,0) = f(x) + 0 = f(x) \]

and

\[ \frac{\partial y}{\partial t}(x,0) = \frac{\partial y_1}{\partial t}(x,0) + \frac{\partial y_2}{\partial t}(x,0) = 0 + g(x) = g(x). \]

Thus $y(x,t)$ is the solution in this case of nonzero initial displacement and velocity functions.
For example, let the initial displacement function be

\[ f(x) = \begin{cases} x & \text{for } 0 \le x \le L/2 \\ L - x & \text{for } L/2 < x \le L \end{cases} \]

and the initial velocity

\[ g(x) = x\left(1+\cos\left(\frac{\pi x}{L}\right)\right). \]

The solution for the displacement function is the sum of the solution $y_1(x,t)$ for just displacement $f(x)$, with zero initial velocity, and the solution $y_2(x,t)$ with zero initial displacement and initial velocity $g(x)$. For $y_1(x,t)$, use the solution (17.7). First evaluate

\[ \frac{2}{L}\int_0^L f(\xi)\sin\left(\frac{n\pi\xi}{L}\right)d\xi = \frac{2}{L}\left( \int_0^{L/2}\xi\sin\left(\frac{n\pi\xi}{L}\right)d\xi + \int_{L/2}^{L}(L-\xi)\sin\left(\frac{n\pi\xi}{L}\right)d\xi \right) = \frac{4L}{n^2\pi^2}\sin\left(\frac{n\pi}{2}\right). \]

Therefore

\[ y_1(x,t) = \frac{4L}{\pi^2}\sum_{n=1}^{\infty}\frac{1}{n^2}\sin\left(\frac{n\pi}{2}\right)\sin\left(\frac{n\pi x}{L}\right)\cos\left(\frac{n\pi ct}{L}\right). \]

We have already solved for $y_2(x,t)$, obtaining

\[ y_2(x,t) = \frac{3L^2}{2\pi^2 c}\sin\left(\frac{\pi x}{L}\right)\sin\left(\frac{\pi ct}{L}\right) + \frac{2L^2}{\pi^2 c}\sum_{n=2}^{\infty}\frac{(-1)^n}{n^2(n^2-1)}\sin\left(\frac{n\pi x}{L}\right)\sin\left(\frac{n\pi ct}{L}\right). \]

The solution with the given initial position and initial velocity is $y(x,t) = y_1(x,t) + y_2(x,t)$. If we let $L = \pi$ and $c = 1$, this solution is

\[ y(x,t) = \frac{4}{\pi}\sum_{n=1}^{\infty}\frac{1}{n^2}\sin\left(\frac{n\pi}{2}\right)\sin(nx)\cos(nt) + \frac{3}{2}\sin x\sin t + \sum_{n=2}^{\infty}\frac{2(-1)^n}{n^2(n^2-1)}\sin(nx)\sin(nt). \]

Graphs of this string profile are shown in Figure 17.6 for times $t = 0.125$, 0.46, 0.93, 1.9, 2.5, 3.4 and 5.2.
FIGURE 17.6  Snapshots of the string at times $t = 0.125$, 0.46, 0.93, 1.9, 2.5, 3.4, and 5.2.
17.2.4  Verification of Solutions

In the solutions we have obtained thus far we have had to use an infinite series

\[ y(x,t) = \sum_{n=1}^{\infty} y_n(x,t) \]

and determine the coefficients in the $y_n$'s by using a Fourier expansion. The question now is whether this infinite sum is indeed a solution of the boundary value problem.
To be specific, consider the problem with initial position function $f(x)$ and zero initial velocity. We derived the proposed solution

\[ y(x,t) = \sum_{n=1}^{\infty} c_n\sin\left(\frac{n\pi x}{L}\right)\cos\left(\frac{n\pi ct}{L}\right), \tag{17.11} \]

in which

\[ c_n = \frac{2}{L}\int_0^L f(\xi)\sin\left(\frac{n\pi\xi}{L}\right)d\xi. \]

Certainly $y(0,t) = y(L,t) = 0$, because every term in the series for $y(x,t)$ vanishes at $x = 0$ and at $x = L$. Further, under reasonable conditions on $f$, the Fourier sine series of $f(x)$ converges to $f(x)$ on $[0,L]$, so $y(x,0) = f(x)$. It is not obvious, however, that $y(x,t)$ satisfies the wave equation, even though each term in the series certainly does. The reason for this uncertainty is that we cannot justify term by term differentiation of the proposed series solution.
We will now demonstrate a remarkable fact, which has other ramifications as well. We will show that the series in equation (17.11) can be summed in closed form. To do this, first write

\[ \sin\left(\frac{n\pi x}{L}\right)\cos\left(\frac{n\pi ct}{L}\right) = \frac{1}{2}\left[ \sin\left(\frac{n\pi(x+ct)}{L}\right) + \sin\left(\frac{n\pi(x-ct)}{L}\right) \right]. \]

Then equation (17.11) becomes

\[ y(x,t) = \frac{1}{2}\left[ \sum_{n=1}^{\infty} c_n\sin\left(\frac{n\pi(x+ct)}{L}\right) + \sum_{n=1}^{\infty} c_n\sin\left(\frac{n\pi(x-ct)}{L}\right) \right]. \tag{17.12} \]

If the Fourier sine series for $f(x)$ converges to $f(x)$ on $[0,L]$, as might normally be expected of a function that can be a displacement function for a string, then

\[ f(x) = \sum_{n=1}^{\infty} c_n\sin\left(\frac{n\pi x}{L}\right) \]

for $0 \le x \le L$, and equation (17.12) becomes

\[ y(x,t) = \frac{1}{2}\left[ f(x+ct) + f(x-ct) \right]. \]

If $f$ is twice differentiable, we can use the chain rule to verify directly that $y(x,t)$ given by this expression satisfies the wave equation, wherever $f(x+ct)$ and $f(x-ct)$ are defined. This raises a difficulty, however, since $f(x)$ is defined only for $0 \le x \le L$. But $t$ can be any nonnegative number, so the numbers $x+ct$ and $x-ct$ can vary over the entire real line. How then can we evaluate $f(x+ct)$ and $f(x-ct)$?
This difficulty can be overcome in two steps. First, extend $f$ to an odd function $f_o$ defined on $[-L,L]$ by setting

\[ f_o(x) = \begin{cases} f(x) & \text{for } 0 \le x \le L \\ -f(-x) & \text{for } -L < x < 0. \end{cases} \]

Notice that $f_o(0) = f_o(L) = f_o(-L) = 0$ because the ends of the string are fixed. Now extend $f_o$ to a periodic function $F$ of period $2L$ by replicating the graph of $f_o$ on successive intervals $[L,3L]$, $[3L,5L]$, $[-3L,-L]$, $[-5L,-3L]$, $\ldots$. Figure 17.7(a) displays the odd extension of $f$ defined on $[0,L]$ to $f_o$ defined on $[-L,L]$, and Figure 17.7(b) shows the periodic extension of $f_o$ to the real line. We now have

\[ y(x,t) = \frac{1}{2}\left[ F(x+ct) + F(x-ct) \right] \tag{17.13} \]

for $0 \le x \le L$ and $t > 0$. Assuming that $f$ is twice differentiable, and that the joins at the ends of intervals where $f$ has been extended to produce $F$ are sufficiently smooth, then $F$ is also twice differentiable, and the chain rule can be used to directly verify that $y(x,t)$ satisfies the
FIGURE 17.7(a)  Odd extension of $f$ to $[-L, L]$.

FIGURE 17.7(b)  Periodic extension $F$ of $f_o$ to the real line.
wave equation. This is an elegant expression for the solution in terms of the initial displacement function and the number $c$, which depends on the material from which the string is made. It is reasonable that the motion should be determined by these quantities.
In practice, there will often be finitely many points in $[0,L]$ at which $f$ is not differentiable. For example, $f(x)$ as given by equation (17.6) is not differentiable at $L/2$. In such a case $y(x,t)$ given by equation (17.13) is the solution in a restricted sense, as there are isolated points at which it does not satisfy all the conditions of the boundary value problem.
Equation (17.13) has an appealing physical interpretation. If we think of $F(x)$ as a wave, then $F(x+ct)$ is this wave translated $ct$ units to the left, and $F(x-ct)$ is the wave translated $ct$ units to the right. The motion of the string (in this case with zero initial velocity) is a sum of two waves, one moving to the right with velocity $c$, the other to the left with velocity $c$, and both waves are determined by the initial displacement function. We will say more about this when we discuss d'Alembert's solution for the motion of an infinitely long string.
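Equation (17.13) is easy to evaluate on a computer once $F$ is coded. The sketch below builds the odd $2L$-periodic extension and averages the two traveling waves; the hat-shaped $f$ and the values of $L$ and $c$ are illustrative choices of ours:

```python
import math

L, c = 1.0, 1.0

def f(x):
    # any displacement with f(0) = f(L) = 0; a hat function as an example
    return x if x <= L / 2 else L - x

def F(x):
    # odd, 2L-periodic extension of f
    x = (x + L) % (2.0 * L) - L        # reduce to [-L, L)
    return f(x) if x >= 0 else -f(-x)

def y(x, t):
    # d'Alembert form (17.13): average of left- and right-moving waves
    return 0.5 * (F(x + c * t) + F(x - c * t))
```

At $t = 0$ this reproduces $f$, and the oddness of $F$ keeps both ends of the string pinned at zero for all time.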
17.2.5  Transformation of Boundary Value Problems Involving the Wave Equation

There are boundary value problems involving the wave equation for which separation of variables does not lead to the solution. This can occur because of the form of the wave equation (for example, there may be an external forcing term), or because of the form of the boundary conditions. Here is an example of such a problem and a strategy for overcoming the difficulty.
Consider the boundary value problem

\[ \frac{\partial^2 y}{\partial t^2} = \frac{\partial^2 y}{\partial x^2} + Ax \quad\text{for } 0 < x < L,\ t > 0, \]
\[ y(0,t) = y(L,t) = 0 \quad\text{for } t \ge 0, \]
\[ y(x,0) = 0, \qquad \frac{\partial y}{\partial t}(x,0) = 1 \quad\text{for } 0 < x < L. \]

$A$ is a positive constant. The term $Ax$ in the wave equation represents an external force which at $x$ has magnitude $Ax$. We have let $c = 1$ in this problem. If we put $y(x,t) = X(x)T(t)$ into the partial differential equation, we get

\[ XT'' = X''T + Ax, \]

and there is no way to separate the $t$ dependency on one side of the equation and the $x$-dependent terms on the other.
We will transform this problem into one for which separation of variables works. Let

\[ y(x,t) = Y(x,t) + \psi(x). \]

The idea is to choose $\psi$ to reduce the given problem to one we have already solved. Substitute $y(x,t)$ into the partial differential equation to get

\[ \frac{\partial^2 Y}{\partial t^2} = \frac{\partial^2 Y}{\partial x^2} + \psi''(x) + Ax. \]

This will be simplified if we choose $\psi$ so that

\[ \psi''(x) + Ax = 0. \]

There are many such choices. By integrating twice, we get

\[ \psi(x) = -A\frac{x^3}{6} + Cx + D, \]

with $C$ and $D$ constants we can still choose any way we like. Now look at the boundary conditions. First,

\[ y(0,t) = Y(0,t) + \psi(0) = 0. \]

This will be just $y(0,t) = Y(0,t)$ if we choose $\psi(0) = D = 0$. Next,

\[ y(L,t) = Y(L,t) + \psi(L) = Y(L,t) - A\frac{L^3}{6} + CL = 0. \]

This will reduce to $y(L,t) = Y(L,t)$ if we choose $C$ so that

\[ \psi(L) = -A\frac{L^3}{6} + CL = 0, \]

or

\[ C = \frac{1}{6}AL^2. \]

This means that

\[ \psi(x) = -\frac{1}{6}Ax^3 + \frac{1}{6}AL^2 x = \frac{1}{6}Ax\left(L^2 - x^2\right). \]

With this choice of $\psi$,

\[ Y(0,t) = Y(L,t) = 0. \]

Now relate the initial conditions for $y$ to initial conditions for $Y$. First,

\[ Y(x,0) = y(x,0) - \psi(x) = -\psi(x) = \frac{1}{6}Ax\left(x^2 - L^2\right). \]

And

\[ \frac{\partial Y}{\partial t}(x,0) = \frac{\partial y}{\partial t}(x,0) = 1. \]

We now have a boundary value problem for $Y(x,t)$:

\[ \frac{\partial^2 Y}{\partial t^2} = \frac{\partial^2 Y}{\partial x^2} \quad\text{for } 0 < x < L,\ t > 0, \]
\[ Y(0,t) = 0,\quad Y(L,t) = 0 \quad\text{for } t > 0, \]
\[ Y(x,0) = \frac{1}{6}Ax\left(x^2 - L^2\right), \qquad \frac{\partial Y}{\partial t}(x,0) = 1 \quad\text{for } 0 < x < L. \]

Using equations (17.7) and (17.9), we immediately write the solution

\[ Y(x,t) = \sum_{n=1}^{\infty}\left( \frac{2}{L}\int_0^L \frac{1}{6}A\xi\left(\xi^2 - L^2\right)\sin\left(\frac{n\pi\xi}{L}\right)d\xi \right)\sin\left(\frac{n\pi x}{L}\right)\cos\left(\frac{n\pi t}{L}\right) \]
\[ \qquad + \frac{2}{\pi}\sum_{n=1}^{\infty}\frac{1}{n}\left( \int_0^L \sin\left(\frac{n\pi\xi}{L}\right)d\xi \right)\sin\left(\frac{n\pi x}{L}\right)\sin\left(\frac{n\pi t}{L}\right) \]
\[ = \frac{2AL^3}{\pi^3}\sum_{n=1}^{\infty}\frac{(-1)^n}{n^3}\sin\left(\frac{n\pi x}{L}\right)\cos\left(\frac{n\pi t}{L}\right) + \frac{2L}{\pi^2}\sum_{n=1}^{\infty}\frac{1-(-1)^n}{n^2}\sin\left(\frac{n\pi x}{L}\right)\sin\left(\frac{n\pi t}{L}\right). \]

The solution of the original problem is

\[ y(x,t) = Y(x,t) + \frac{1}{6}Ax\left(L^2 - x^2\right). \]

Figure 17.8(a) shows graphs of the string's position at times $t = 0.03$, 0.2, 0.5, 0.9, 1.4 and 2.2, with $c = 1$ and $L = \pi$. Figure 17.8(b) shows this string at times $t = 2.8$, 3.7, 4.4, 4.8, 5.3, 6.1 and 6.7. These use $L = \pi$ and $c = 1$.
FIGURE 17.8(a)  Position of the string at times $t = 0.03$, 0.2, 0.5, 0.9, 1.4, and 2.2.

FIGURE 17.8(b)  Position at times $t = 2.8$, 3.7, 4.4, 4.8, 5.3, 6.1, and 6.7.
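The key step in the transformation above is that $\psi(x) = \frac{1}{6}Ax(L^2 - x^2)$ absorbs both the forcing term and the boundary conditions. That can be confirmed directly; here the values of $A$ and $L$ are arbitrary sample choices of ours:

```python
A, L = 2.0, 3.0   # arbitrary sample values for the force constant and string length

def psi(x):
    # psi(x) = (A/6) x (L^2 - x^2), chosen so psi'' + A x = 0 and psi(0) = psi(L) = 0
    return (A / 6.0) * x * (L * L - x * x)

def second_diff(fn, x, h=1e-4):
    # centered second difference approximation of fn''(x)
    return (fn(x + h) - 2.0 * fn(x) + fn(x - h)) / (h * h)
```

Since $\psi$ is a cubic, the centered second difference is exact up to rounding, so the residual $\psi'' + Ax$ computes to essentially zero at every interior point.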
17.2.6  Effects of Initial Conditions and Constants on the Motion

Using separation of variables, we have obtained series solutions of problems involving the vibrating string on a bounded interval. It is interesting to examine the effects that constants occurring in the problem have on the solution. We begin with an example investigating the effect of the constant $c$ on the motion of the string.

EXAMPLE 17.1

Consider again the problem of the wave equation with zero initial displacement and initial velocity given by

\[ g(x) = x\left(1+\cos\left(\frac{\pi x}{L}\right)\right). \]

The solution previously obtained, with $L = \pi$, is

\[ y(x,t) = \frac{3}{2c}\sin x\sin(ct) + \frac{1}{c}\sum_{n=2}^{\infty}\frac{2(-1)^n}{n^2(n^2-1)}\sin(nx)\sin(nct). \]

Figure 17.5 shows graphs of the string's position at various times, with $c = 1$. Now we want to focus on how $c$ influences the motion. Figure 17.9(a) shows the string profile at time $t = 5.3$, with $c = 1.05$. Figures 17.9(b) and (c) show the profile at the same time, with $c = 1.1$ and 1.2, respectively. These graphs are placed on the same set of axes for comparison in Figure 17.9(d). The student is invited to select other times and graph the solution for different values of $c$.
Next, consider a problem in which the initial data of the problem depends on a parameter.
FIGURE 17.9(a)  $t = 5.3$ and $c = 1.05$.

FIGURE 17.9(b)  $t = 5.3$ and $c = 1.1$.

FIGURE 17.9(c)  $t = 5.3$ and $c = 1.2$.

FIGURE 17.9(d)  String profile at time $t = 5.3$ with $c$ having values 1.05, 1.1, and 1.2.
EXAMPLE 17.2

Consider the problem

\[ \frac{\partial^2 y}{\partial t^2} = 144\frac{\partial^2 y}{\partial x^2} \quad\text{for } 0 < x < \pi,\ t > 0, \]
\[ y(0,t) = y(\pi,t) = 0 \quad\text{for } t \ge 0, \]
\[ y(x,0) = 0, \qquad \frac{\partial y}{\partial t}(x,0) = \sin(\alpha x) \quad\text{for } 0 < x < \pi, \]

in which $\alpha$ is a positive number that is not an integer. It is routine to write the solution

\[ y(x,t) = \frac{\sin(\alpha\pi)}{6\pi}\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n^2-\alpha^2}\sin(nx)\sin(12nt). \]

Now compare graphs of this solution at various times, with different choices of $\alpha$. Figure 17.10(a) shows the string profile at $t = 0.5$ for $\alpha$ equal to 0.7, 0.9, 1.5, 4.7 and 9.3. Figure 17.10(b) shows the graphs for these values of $\alpha$ at $t = 1.1$, and Figure 17.10(c) shows the graphs at $t = 2.8$.
We can also follow the motion of the string at different times for the same value of $\alpha$. Figure 17.11(a) shows the string profiles for $\alpha = 0.7$ at times $t = 0.5$, 1.1 and 2.8. Figures 17.11(b), (c), (d) and (e) each show the string profile for a given $\alpha$ and for these three times.
FIGURE 17.10(a)  String profiles at $t = 0.5$ for $\alpha$ equal to 0.7, 0.9, 1.5, 4.7, and 9.3.

FIGURE 17.10(b)  String profiles at $t = 1.1$.

FIGURE 17.10(c)  String profiles at $t = 2.8$.

FIGURE 17.11(a)  Graphs of the string with $\alpha = 0.7$ for times $t = 0.5$, 1.1, and 2.8.

FIGURE 17.11(b)  $\alpha = 0.9$.

FIGURE 17.11(c)  $\alpha = 1.5$.

FIGURE 17.11(d)  $\alpha = 4.7$.

FIGURE 17.11(e)  $\alpha = 9.3$.
In some of the exercises we will ask the student to employ a graphics package to exhibit string profiles at different times and under different conditions.
17.2.7  Numerical Solution of the Wave Equation

We will describe a numerical method for approximating solutions of the wave equation on an interval. The underlying idea is useful in approximating solutions of the heat equation as well,
and involves difference approximations of the derivative. To understand this idea, begin with a function $f$ of a single variable which is differentiable at $x_0$. Approximate

\[ f'(x_0) \approx \frac{f(x_0+h) - f(x_0)}{h} \]

and also

\[ f'(x_0) \approx \frac{f(x_0-h) - f(x_0)}{-h}, \]

with the approximation improving as $h$ is chosen closer to zero. If $h > 0$, these are, respectively, the forward and backward difference approximations of $f'(x_0)$. If we average these we get

\[ f'(x_0) \approx \frac{f(x_0+h) - f(x_0-h)}{2h}. \]

This is the centered difference approximation of $f'(x_0)$. If $f$ is twice differentiable at $x_0$, then

\[ f''(x_0) \approx \frac{f'(x_0+h) - f'(x_0-h)}{2h} \approx \frac{1}{2h}\left[ \frac{f(x_0+2h)-f(x_0)}{2h} - \frac{f(x_0)-f(x_0-2h)}{2h} \right] = \frac{f(x_0+2h) - 2f(x_0) + f(x_0-2h)}{4h^2}. \]

Replacing $2h$ by $h$, we can write

\[ f''(x_0) \approx \frac{f(x_0+h) - 2f(x_0) + f(x_0-h)}{h^2}. \]

This is the centered difference approximation of the second derivative.
Applying these ideas to $y(x,t)$, we can take increments $\Delta x$ in $x$ and $\Delta t$ in $t$ and write centered difference approximations of the second partial derivatives:

\[ \frac{\partial^2 y}{\partial x^2}(x,t) \approx \frac{y(x+\Delta x, t) - 2y(x,t) + y(x-\Delta x, t)}{(\Delta x)^2} \]

and

\[ \frac{\partial^2 y}{\partial t^2}(x,t) \approx \frac{y(x, t+\Delta t) - 2y(x,t) + y(x, t-\Delta t)}{(\Delta t)^2}. \]

We will use these to write numerical approximations of the solution to the problem:

\[ \frac{\partial^2 y}{\partial t^2} = c^2\frac{\partial^2 y}{\partial x^2} \quad\text{for } 0 < x < L,\ t > 0, \]
\[ y(0,t) = y(L,t) = 0 \quad\text{for } t \ge 0, \]
\[ y(x,0) = f(x), \qquad \frac{\partial y}{\partial t}(x,0) = g(x) \quad\text{for } 0 \le x \le L. \]
The $(x,t)$-region of interest is the strip $0 \le x \le L$, $t \ge 0$. Choose a positive integer $N$ and let $\Delta x = L/N$. Partition $[0,L]$ by the points $x_j = j\,\Delta x$, so

\[ 0 < \frac{L}{N} < \frac{2L}{N} < \cdots < \frac{(N-1)L}{N} < \frac{NL}{N} = L. \]

Also choose an increment $\Delta t$ in time and let $t_k = k\,\Delta t$ for $k = 0, 1, 2, \ldots$. In this way form a grid of points $(x_j, t_k)$, called lattice points, over the $(x,t)$-strip, as shown in Figure 17.12. It is convenient to write

\[ y_{j,k} = y(x_j, t_k) = y(j\,\Delta x, k\,\Delta t). \]

Now replace the partial derivatives in the wave equation with centered difference approximations to get

\[ \frac{y_{j,k+1} - 2y_{j,k} + y_{j,k-1}}{(\Delta t)^2} = c^2\,\frac{y_{j+1,k} - 2y_{j,k} + y_{j-1,k}}{(\Delta x)^2} \]

at $(x_j, t_k)$. Solve this for $y_{j,k+1}$ to get

\[ y_{j,k+1} = \left(\frac{c\,\Delta t}{\Delta x}\right)^2\left( y_{j+1,k} - 2y_{j,k} + y_{j-1,k} \right) + 2y_{j,k} - y_{j,k-1}. \tag{17.14} \]

Figure 17.13 shows why this equation is useful. The horizontal lines $t = t_k$ divide the $(x,t)$-strip into horizontal time layers $\Delta t$ units apart. Compute approximate values $y_{j,k}$ at the lattice points $(x_j, t_k)$. The points $(x_j, t_{k+1})$, $(x_{j-1}, t_k)$, $(x_j, t_k)$, $(x_{j+1}, t_k)$ and $(x_j, t_{k-1})$ appear as a diamond configuration, with the middle three points at the $t_k$ level, the last point at the $t_{k-1}$ level, and the first at the highest, $t_{k+1}$, level. If we know the (approximate) value of $y(x,t)$ at each of the last four points (in levels $t_k$ and $t_{k-1}$), then we know all the terms on the right of equation (17.14), hence we know the (approximate) value $y_{j,k+1}$ at the $t_{k+1}$ level. We can work our way up such five-point configurations, always solving for the value of $y(x,t)$ at the highest level, from previously derived values at the two next lower levels.
This process fails at the edges of the $(x,t)$-region because we cannot form this five-point diamond configuration there. However, the initial and boundary information of the problem gives information about $y(x,t)$ at the edges. In particular: $y(x,0) = f(x)$ at each point on the bottom side ($t = 0$) of the strip, and $y(0,t) = y(L,t) = 0$ on the left and right sides of the strip. Thus,

\[ y(0, t_k) = y(L, t_k) = 0, \quad\text{or equivalently,}\quad y_{0,k} = y_{N,k} = 0 \quad\text{for } k = 0, 1, 2, \ldots. \]

And

\[ y(x_j, 0) = y_{j,0} = f(j\,\Delta x) \quad\text{for } j = 1, \ldots, N-1. \]
FIGURE 17.12  Lattice of points at which approximations are made.

FIGURE 17.13  For the wave equation, approximation of $y(x_j, t_{k+1})$ from preceding approximations, three at level $t_k$ and one at level $t_{k-1}$.

FIGURE 17.14  A $t_{-1}$ layer must be created to implement the scheme of Figure 17.13 at the $t_1$ layer.
We have not yet used the initial condition on the velocity. Use the backward difference approximation of the first derivative to write

\[ \frac{\partial y}{\partial t}(x_j, 0) \approx \frac{y(x_j, -\Delta t) - y(x_j, 0)}{-\Delta t} = -\frac{y_{j,-1} - y_{j,0}}{\Delta t} = g(x_j) = g(j\,\Delta x), \quad j = 1, \ldots, N-1. \tag{17.15} \]

Notice that this equation contains a $y_{j,-1}$ term. This is at the layer below the bottom edge ($t = 0$) of the $(x,t)$-strip. There is really no such layer in a natural sense, but we create it artificially using this backward difference approximation in order to use the initial information $\partial y/\partial t\,(x,0) = g(x)$ for $0 \le x \le L$. Solve equation (17.15) for $y_{j,-1}$ to get

\[ y_{j,-1} = y_{j,0} - g(j\,\Delta x)\,\Delta t, \]

enabling us to determine the appropriate values to fill in on this lowest layer, in terms of known values on level zero and the initial velocity function. This provides the diamond configuration of Figure 17.14 when $k = 0$.
The strategy now is to begin by filling in the $y(x,t)$ values at the grid points at levels $k = -1$ and $k = 0$. Then work up the layers, using equation (17.14) to fill in approximate values of $y(x,t)$ at successively higher layers. With today's computing power, this can be done for a very large number of grid points.
One fine point: the number $(c\,\Delta t/\Delta x)^2$ has a bearing on the stability of the method. If this number is less than $1/2$, the method is stable and produces approximations that improve as $\Delta x$ and $\Delta t$ are chosen smaller (keeping $(c\,\Delta t/\Delta x)^2 < 1/2$). If $(c\,\Delta t/\Delta x)^2 \ge 1/2$, the numerical approximations can be unstable, yielding unreliable results.
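The layer-by-layer scheme just described fits in a few lines of code. The following sketch implements equation (17.14) together with the artificial $t_{-1}$ layer from (17.15); the function name and the default parameter values are our own choices:

```python
import math

def wave_fd(f, g, L=1.0, c=1.0, N=10, dt=0.025, steps=4):
    # Explicit scheme (17.14) with fixed ends; returns layers[k][j] ~ y(j dx, k dt).
    dx = L / N
    r = (c * dt / dx) ** 2              # keep below 1/2, as discussed in the text
    # artificial k = -1 layer from the backward difference (17.15), then k = 0
    y_prev = [f(j * dx) - g(j * dx) * dt for j in range(N + 1)]
    y_curr = [f(j * dx) for j in range(N + 1)]
    y_prev[0] = y_prev[N] = y_curr[0] = y_curr[N] = 0.0
    layers = [y_curr[:]]
    for _ in range(steps):
        y_next = [0.0] * (N + 1)        # boundary values stay zero
        for j in range(1, N):
            y_next[j] = (r * (y_curr[j + 1] - 2.0 * y_curr[j] + y_curr[j - 1])
                         + 2.0 * y_curr[j] - y_prev[j])
        y_prev, y_curr = y_curr, y_next
        layers.append(y_curr[:])
    return layers

# sample run: one pure mode sin(pi x), released from rest
layers = wave_fd(lambda x: math.sin(math.pi * x), lambda x: 0.0)
```

For this sample mode the exact solution is the standing wave $\sin(\pi x)\cos(\pi ct)$, and the computed layers should track it to within a few percent over a few steps (the simple backward-difference start-up layer limits the accuracy).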
EXAMPLE 17.3
Consider the problem

∂²y/∂t² = ∂²y/∂x² for 0 < x < 1, t > 0,
y(0, t) = y(1, t) = 0,
y(x, 0) = x cos(πx/2) for 0 ≤ x ≤ 1,
∂y/∂t (x, 0) = { 1 for 0 ≤ x ≤ 1/2; 0 for 1/2 < x ≤ 1 }.
The exact solution is

y(x, t) = −(16/π²) Σ_{n=1}^∞ [2n(−1)ⁿ/(4n² − 1)²] sin(nπx) cos(nπt) − (2/π²) Σ_{n=1}^∞ [(cos(nπ/2) − 1)/n²] sin(nπx) sin(nπt).
We will choose N = 10, so Δx = 0.1. Let Δt = 0.025. Then (cΔt/Δx)² = (0.025/0.1)² = 0.0625 < 1/2. The equations for the approximations are

y_{j,k+1} = 0.0625(y_{j+1,k} − 2y_{j,k} + y_{j−1,k}) + 2y_{j,k} − y_{j,k−1} for j = 1, …, 9, k = 0, 1, 2, …,   (17.16)

y_{j,0} = f(0.1j) for j = 1, …, 9,

and

y_{j,−1} = y_{j,0} − g_j Δt = f(0.1j) − 0.025g(0.1j) for j = 1, …, 9.

Note that we take j from 1 through N − 1 = 9 because j = 0 corresponds to the left side of the (x, t) strip, and j = N = 10 refers to the right side of this strip, and information is given on these sides: y(0, t) = y(1, t) = 0. First, compute the values y_{j,−1} on the lowest horizontal level:

y_{1,−1} = 0.07377, y_{2,−1} = 0.16521, y_{3,−1} = 0.24230, y_{4,−1} = 0.29861, y_{5,−1} = 0.32855, y_{6,−1} = 0.35267, y_{7,−1} = 0.31779, y_{8,−1} = 0.24721, y_{9,−1} = 0.14079.

Next, compute the approximate values y_{j,0}:

y_{1,0} = 0.09877, y_{2,0} = 0.19021, y_{3,0} = 0.26730, y_{4,0} = 0.32361, y_{5,0} = 0.35355, y_{6,0} = 0.35267, y_{7,0} = 0.31779, y_{8,0} = 0.24721, y_{9,0} = 0.14079.

Now systematically move up the t axis, one level at a time. For t = 0.025, putting k = 0 in equation (17.16), we have

y_{j,1} = 0.0625(y_{j+1,0} − 2y_{j,0} + y_{j−1,0}) + 2y_{j,0} − y_{j,−1} for j = 1, …, 9.

The computed values are:

y_{1,1} = 0.12331, y_{2,1} = 0.21431, y_{3,1} = 0.29100, y_{4,1} = 0.34696, y_{5,1} = 0.37662, y_{6,1} = 0.35055, y_{7,1} = 0.31556, y_{8,1} = 0.24160, y_{9,1} = 0.13864.

Next get the approximate values on the k = 2 layer (t = 2(0.025) = 0.05) by putting k = 1 in equation (17.16) and using

y_{j,2} = 0.0625(y_{j+1,1} − 2y_{j,1} + y_{j−1,1}) + 2y_{j,1} − y_{j,0} for j = 1, …, 9.

In this way, we can form approximations at lattice points as high as we want in the (x, t) strip.
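The layer-by-layer computation of this example is easy to mechanize. The following Python sketch (ours, not the text's; the function name and argument layout are our choices) implements the scheme of equations (17.15) and (17.16):

```python
import math

def wave_fd(f, g, c=1.0, L=1.0, N=10, dt=0.025, layers=5):
    """Explicit finite differences for y_tt = c^2 y_xx on [0, L] with
    y(0,t) = y(L,t) = 0, y(x,0) = f(x), y_t(x,0) = g(x)."""
    dx = L / N
    lam = (c * dt / dx) ** 2          # the text's stability criterion: keep lam < 1/2
    x = [j * dx for j in range(N + 1)]
    below = [f(xj) - g(xj) * dt for xj in x]   # artificial layer at t = -dt, eq. (17.15)
    curr = [f(xj) for xj in x]                 # layer at t = 0
    curr[0] = curr[N] = 0.0                    # fixed ends
    levels = [curr[:]]
    for _ in range(layers):
        nxt = [0.0] * (N + 1)                  # ends stay at 0
        for j in range(1, N):                  # interior update, eq. (17.16)
            nxt[j] = lam * (curr[j + 1] - 2 * curr[j] + curr[j - 1]) \
                     + 2 * curr[j] - below[j]
        below, curr = curr, nxt
        levels.append(curr[:])
    return levels
```

Running it with f(x) = x cos(πx/2) and the piecewise g of Example 17.3 reproduces the tabulated values, for instance y_{1,1} ≈ 0.12331 and y_{5,1} ≈ 0.37662.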
SECTION 17.2 PROBLEMS

In each of Problems 1 through 8, solve the boundary value problem

∂²y/∂t² = c² ∂²y/∂x²

using separation of variables. Graph some of the partial sums of the series solution, for selected values of the time.
1.
2
yx 0 = 2 sinx
y x 0 = 0
t
yx 0 = 0 2
8.
y x 0 = x3 − x for 0 ≤ x ≤ 3
t
2 y
2 y =8 2 2
t
x
where
3x fx = 6 − 3x
for 0 ≤ x ≤ 2
for 0 ≤ x ≤ for < x ≤ 2
for 0 < x < 5 t > 0
y0 t = y5 t = 0 yx 0 = 0
for t ≥ 0
y x 0 = 0
t
2 y
2 y =4 2 2
t
x
for 0 ≤ x ≤
for 0 < x < 2 t > 0
y0 t = y2 t = 0 yx 0 = fx
for t ≥ 0
for t ≥ 0
y x 0 = gx
t
for 0 ≤ x ≤ 4
for t ≥ 0
y x 0 = gx
t
for 0 ≤ x < 1/2 for 1/2 ≤ x ≤ 1
and
for 0 ≤ x ≤ 2
for 1 < x ≤ 2
for 0 < x < t > 0
y0 t = y t = 0
for t ≥ 0
y x 0 = − x
t
for 0 ≤ x ≤
9. Solve the boundary value problem
2 y
2 y = 3 2 + 2x 2
t
x
for 0 < x < t > 0
y x 0 = 1
t
0 3
2 y
2 y = 25 2 2
t
x
2
yx 0 = sinx
yx 0 = sin2x
for t ≥ 0
y0 t = y t = 0
5.
for 0 ≤ x ≤ 4
for 0 < x < 3 t > 0
y
y =9 2
t2
x
4.
gx =
for 0 < x < 4 t > 0
y0 t = y3 t = 0
6.
where
for t ≥ 0
2 y
2 y =4 2 2
t
x
for 0 ≤ x < 4 for 4 ≤ x ≤ 5
for 0 < x < 2 t > 0
yx 0 = xx − 2
for 0 ≤ x ≤ 1 for 1 < x < 2
y0 t = y4 t = 0
3.
2 y
2 y =9 2 2
t
x
y0 t = y2 t = 0
for 0 ≤ x ≤ 2
2
y
y =9 2
t2
x
2.
7.
for t ≥ 0
y x 0 = gx
t
2x where gx = 0
0 5−x
for 0 < x < 2 t > 0
y0 t = y2 t = 0 yx 0 = 0
gx =
y0 t = y2 t = 0
for 0 < x < 2 t > 0 for t ≥ 0
y(x, 0) = 0, ∂y/∂t (x, 0) = 0 for 0 ≤ x ≤ 2.
Graph some partial sums of the series solution. Hint: Upon putting y(x, t) = X(x)T(t), we find that the variables do not separate. Put Y(x, t) = y(x, t) + h(x) and choose h to obtain a boundary value problem that can be solved by Fourier series.
10. Solve
∂²y/∂t² = 9 ∂²y/∂x² + x² for 0 < x < 4, t > 0,
y(0, t) = y(4, t) = 0 for t ≥ 0,
y(x, 0) = 0, ∂y/∂t (x, 0) = 0 for 0 ≤ x ≤ 4.
Graph some partial sums of the solution for values of t.
11. Solve
∂²y/∂t² = ∂²y/∂x² − cos(πx) for 0 < x < 2, t > 0,
y(0, t) = y(2, t) = 0 for t ≥ 0,
y(x, 0) = 0, ∂y/∂t (x, 0) = 0 for 0 ≤ x ≤ 2.
Graph some partial sums of the solution for selected values of the time.

12. Transverse vibrations in a homogeneous rod of length ℓ are modeled by the partial differential equation
a² ∂⁴u/∂x⁴ + ∂²u/∂t² = 0 for 0 < x < ℓ, t > 0.
Here u(x, t) is the displacement at time t of the cross-section through x perpendicular to the x axis, and a² = EI/ρA, where E is Young's modulus, I is the moment of inertia of a cross-section perpendicular to the x axis, ρ is the constant density, and A the cross-sectional area, assumed constant.
(a) Let u(x, t) = X(x)T(t) to separate the variables.
(b) Solve for values of the separation constant and for X and T in the case of free ends:
∂²u/∂x² (0, t) = ∂²u/∂x² (ℓ, t) = ∂³u/∂x³ (0, t) = ∂³u/∂x³ (ℓ, t) = 0 for t > 0.
(c) Solve for values of the separation constant and for X and T in the case of supported ends:
u(0, t) = u(ℓ, t) = ∂²u/∂x² (0, t) = ∂²u/∂x² (ℓ, t) = 0 for t > 0.

13. Solve the telegraph equation
∂²u/∂t² + A ∂u/∂t + Bu = c² ∂²u/∂x² for 0 < x < L, t > 0.
Here A and B are positive constants. The boundary conditions are
u(0, t) = u(L, t) = 0 for t ≥ 0.
The initial conditions are
u(x, 0) = f(x), ∂u/∂t (x, 0) = 0 for 0 ≤ x ≤ L.
Assume that A²L² < 4(BL² + c²π²).

14. Consider the boundary value problem
∂²y/∂t² = 9 ∂²y/∂x² + 5x³ for 0 < x < 4, t > 0,
y(0, t) = y(4, t) = 0 for t ≥ 0,
y(x, 0) = cos(πx), ∂y/∂t (x, 0) = 0 for 0 ≤ x ≤ 4.
(a) Write a series solution. (b) Find a series solution when the term 5x³ is removed from the wave equation. (c) In order to gauge the effect of the forcing term on the motion, graph the 40th partial sum of the solution for (a) and (b) on the same set of axes at time t = 0.4 seconds. Repeat this procedure successively for times t = 0.8, 1.4, 2, 2.5, 3 and 4 seconds.

15. Consider the boundary value problem
∂²y/∂t² = 9 ∂²y/∂x² + cos(πx) for 0 < x < 4, t > 0,
y(0, t) = y(4, t) = 0 for t ≥ 0,
y(x, 0) = x(4 − x), ∂y/∂t (x, 0) = 0 for 0 ≤ x ≤ 4.
(a) Write a series solution. (b) Find a series solution when the term cos(πx) is removed from the wave equation. (c) In order to gauge the effect of the forcing term on the motion, graph the 40th partial sum of the solution for (a) and (b) on the same set of axes at time t = 0.6 seconds. Repeat this procedure successively for times t = 1, 1.4, 2, 3, 5 and 7 seconds.

16. Consider the boundary value problem
∂²y/∂t² = 9 ∂²y/∂x² − e⁻ˣ for 0 < x < 4, t > 0,
y(0, t) = y(4, t) = 0 for t ≥ 0,
y(x, 0) = sin(πx), ∂y/∂t (x, 0) = 0 for 0 ≤ x ≤ 4.
(a) Write a series solution. (b) Find a series solution when the term e⁻ˣ is removed from the wave equation. (c) In order to gauge the effect of the forcing term on the motion, graph the 40th partial sum of the solution for (a) and (b) on the same set of axes at time t = 0.6 seconds. Repeat this procedure successively for times t = 1, 1.4, 2, 3, 5 and 7 seconds.
17. Consider the problem
∂²y/∂t² = ∂²y/∂x² for 0 < x < 1, t > 0,
y(0, t) = y(1, t) = 0 for t ≥ 0,
y(x, 0) = f(x), ∂y/∂t (x, 0) = 0 for 0 ≤ x ≤ 1,
where
f(x) = { x for 0 ≤ x ≤ 1/2; 1 − x for 1/2 ≤ x ≤ 1 }.
Use Δx = 0.1 and Δt = 0.025 to compute approximate values of y(x, t) at lattice points in the (x, t) strip 0 ≤ x ≤ 1, t ≥ 0. Carry out the computations for five t layers (that is, for t = 0 through t = 5(0.025) = 0.125).

18. Consider the problem
∂²y/∂t² = ∂²y/∂x² for 0 < x < 2, t > 0,
y(0, t) = y(2, t) = 0 for t ≥ 0,
y(x, 0) = 0, ∂y/∂t (x, 0) = 1 for 0 ≤ x ≤ 2.
Use Δx = 0.1 and Δt = 0.025 and compute approximate values of y(x, t), going up five layers from t = 0 through t = 0.125.

19. Consider the problem
∂²y/∂t² = ∂²y/∂x² for 0 < x < 1, t > 0,
y(0, t) = y(1, t) = 0 for t ≥ 0,
y(x, 0) = sin(πx), ∂y/∂t (x, 0) = 1 for 0 ≤ x ≤ 1.
Use Δx = 0.2 and Δt = 0.025. Compute approximate values of y(x, t), going up five layers from t = 0 through t = 0.125.

20. Consider the problem
∂²y/∂t² = ∂²y/∂x² for 0 < x < 1, t > 0,
y(0, t) = y(1, t) = 0 for t ≥ 0,
y(x, 0) = x(1 − x), ∂y/∂t (x, 0) = x² for 0 ≤ x ≤ 1.
Use Δx = 0.1 and Δt = 0.025. Compute approximate values of y(x, t), going up five layers from t = 0 through t = 0.125.

21. Consider the problem
∂²y/∂t² = ∂²y/∂x² for 0 < x < 1, t > 0,
y(0, t) = y(1, t) = 0 for t ≥ 0,
y(x, 0) = 0, ∂y/∂t (x, 0) = cos(πx) for 0 ≤ x ≤ 1.
Use Δx = 0.1 and Δt = 0.025. Compute approximate values of y(x, t), going up five layers from t = 0 through t = 0.125.
17.3 Wave Motion Along Infinite and Semi-Infinite Strings

17.3.1 Wave Motion Along an Infinite String

If long distances are involved (such as with sound waves in the ocean used to monitor temperature changes), wave motion is sometimes modeled by an infinite string, in which case there are no boundary conditions. As with the finite string, we will consider separately the cases of zero initial velocity and zero initial displacement.

Zero Initial Velocity
Consider the initial value problem
∂²y/∂t² = c² ∂²y/∂x² for −∞ < x < ∞, t > 0,
y(x, 0) = f(x), ∂y/∂t (x, 0) = 0 for −∞ < x < ∞.
There are no boundary conditions, but we will impose the condition that the solution be a bounded function. To separate the variables, let y(x, t) = X(x)T(t) and obtain, as before,
X″ + λX = 0, T″ + λc²T = 0.
Consider cases on λ.

Case 1: λ = 0. Now X(x) = ax + b. This is a bounded solution if a = 0. Thus λ = 0 is an eigenvalue, with nonzero constant eigenfunctions.

Case 2: λ < 0. Write λ = −ω² with ω > 0. Then X″ − ω²X = 0, with general solution
X(x) = ae^{ωx} + be^{−ωx}.
But e^{ωx} is unbounded on (0, ∞), so we must choose a = 0. And e^{−ωx} is unbounded on (−∞, 0), so we must choose b = 0, leaving only the zero solution. This problem has no negative eigenvalue.

Case 3: λ > 0, say λ = ω² with ω > 0. Now X″ + ω²X = 0, with general solution
X_ω(x) = a cos(ωx) + b sin(ωx).
These functions are bounded for all ω > 0. Thus every positive number λ = ω² is an eigenvalue, with corresponding eigenfunction a cos(ωx) + b sin(ωx) for a and b not both zero. We can include Case 1 in Case 3, since a cos(ωx) + b sin(ωx) = constant if ω = 0.

Now consider the equation for T, which we can now write as
T″ + c²ω²T = 0
for ω ≥ 0. This has general solution
T(t) = a cos(ωct) + b sin(ωct).
Now
∂y/∂t (x, 0) = X(x)T′(0) = X(x)ωcb = 0,
so b = 0. Thus solutions for T are constant multiples of T_ω(t) = cos(ωct). For any ω ≥ 0, we now have a function
y_ω(x, t) = X_ω(x)T_ω(t) = [a_ω cos(ωx) + b_ω sin(ωx)] cos(ωct),
which satisfies the wave equation and the condition
∂y/∂t (x, 0) = 0.
We need to satisfy the condition y(x, 0) = f(x). For the similar problem on [0, L], we had a function y_n(x, t) for each positive integer n, and we attempted a superposition Σ_{n=1}^∞ y_n(x, t). Now the eigenvalues fill out the entire nonnegative real line, so replace Σ_{n=1}^∞ with ∫_0^∞ (···) dω in forming the superposition:
y(x, t) = ∫_0^∞ y_ω(x, t) dω = ∫_0^∞ [a_ω cos(ωx) + b_ω sin(ωx)] cos(ωct) dω.   (17.17)
The initial displacement condition requires that
y(x, 0) = ∫_0^∞ [a_ω cos(ωx) + b_ω sin(ωx)] dω = f(x).
The integral on the left is the Fourier integral representation of f(x) for −∞ < x < ∞. Thus choose the constants as the Fourier integral coefficients:
a_ω = (1/π) ∫_{−∞}^∞ f(ξ) cos(ωξ) dξ
and
b_ω = (1/π) ∫_{−∞}^∞ f(ξ) sin(ωξ) dξ.
With this choice of the coefficients, and certain conditions on f (see the convergence theorem for Fourier integrals in Section 15.1), equation (17.17) is the solution of the problem.
EXAMPLE 17.4

Consider the problem
∂²y/∂t² = c² ∂²y/∂x² for −∞ < x < ∞, t > 0,
y(x, 0) = e^{−|x|}, ∂y/∂t (x, 0) = 0 for −∞ < x < ∞.
A graph of the initial position of the string is given in Figure 17.15.

FIGURE 17.15 Graph of y = e^{−|x|}.

To use equation (17.17), compute the Fourier integral coefficients:
a_ω = (1/π) ∫_{−∞}^∞ e^{−|ξ|} cos(ωξ) dξ = (2/π) · 1/(1 + ω²)
and
b_ω = (1/π) ∫_{−∞}^∞ e^{−|ξ|} sin(ωξ) dξ = 0.
(For b_ω we need not actually carry out the integration, because the integrand is an odd function.) The solution is
y(x, t) = (2/π) ∫_0^∞ [1/(1 + ω²)] cos(ωx) cos(ωct) dω.
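As a numerical sanity check (ours, not the text's), the integral solution of Example 17.4 can be compared with the familiar traveling-wave form ½[f(x − ct) + f(x + ct)] of the same solution. The truncation point and step size below are arbitrary choices; the discarded tail of the integral is O(1/Ω):

```python
import math

def y_integral(x, t, c=1.0, omega_max=400.0, n=80000):
    # Simpson's rule for (2/pi) * integral_0^omega_max cos(wx) cos(wct) / (1 + w^2) dw
    h = omega_max / n
    def integrand(w):
        return math.cos(w * x) * math.cos(w * c * t) / (1.0 + w * w)
    s = integrand(0.0) + integrand(omega_max)
    for i in range(1, n):
        s += integrand(i * h) * (4 if i % 2 else 2)
    return (2.0 / math.pi) * s * h / 3.0

f = lambda x: math.exp(-abs(x))   # initial displacement of Example 17.4
```

For instance, y_integral(1.0, 0.5) agrees with ½[f(0.5) + f(1.5)] to within the truncation error, here a few parts in a thousand.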
The solution (17.17) may be written in more compact form as follows. Insert the integral formulas for the coefficients:
y(x, t) = ∫_0^∞ [a_ω cos(ωx) + b_ω sin(ωx)] cos(ωct) dω
= (1/π) ∫_0^∞ [(∫_{−∞}^∞ f(ξ) cos(ωξ) dξ) cos(ωx) + (∫_{−∞}^∞ f(ξ) sin(ωξ) dξ) sin(ωx)] cos(ωct) dω
= (1/π) ∫_0^∞ ∫_{−∞}^∞ [cos(ωξ) cos(ωx) + sin(ωξ) sin(ωx)] f(ξ) cos(ωct) dξ dω
= (1/π) ∫_0^∞ ∫_{−∞}^∞ cos(ω(ξ − x)) f(ξ) cos(ωct) dξ dω.   (17.18)

Zero Initial Displacement
Suppose now the string is released from the horizontal position (zero initial displacement), with initial velocity g(x). The initial value problem for the displacement function is
∂²y/∂t² = c² ∂²y/∂x² for −∞ < x < ∞, t > 0,
y(x, 0) = 0, ∂y/∂t (x, 0) = g(x) for −∞ < x < ∞.
Let y(x, t) = X(x)T(t) and proceed exactly as in the case of initial displacement f(x) and zero initial velocity, obtaining eigenvalues λ = ω² for ω ≥ 0 and eigenfunctions
X_ω(x) = a cos(ωx) + b sin(ωx).
Turning to T, we obtain, again as before,
T(t) = a cos(ωct) + b sin(ωct).
However, this problem differs from the preceding one in the initial condition on T(t). Now we have
y(x, 0) = X(x)T(0) = 0,
so T(0) = 0 and hence a = 0. Thus, in this case, for each ω ≥ 0, T(t) is a constant multiple of sin(ωct). This gives us functions
y_ω(x, t) = [a_ω cos(ωx) + b_ω sin(ωx)] sin(ωct).
Now use the superposition
y(x, t) = ∫_0^∞ [a_ω cos(ωx) + b_ω sin(ωx)] sin(ωct) dω   (17.19)
in order to satisfy the initial condition. Compute
∂y/∂t = ∫_0^∞ [a_ω cos(ωx) + b_ω sin(ωx)] ωc cos(ωct) dω.
We need
∂y/∂t (x, 0) = ∫_0^∞ [ωc a_ω cos(ωx) + ωc b_ω sin(ωx)] dω = g(x).
This is a Fourier integral expansion of the initial velocity function. With conditions on g (such as are given in the convergence theorem for Fourier integrals), choose
ωc a_ω = (1/π) ∫_{−∞}^∞ g(ξ) cos(ωξ) dξ
and
ωc b_ω = (1/π) ∫_{−∞}^∞ g(ξ) sin(ωξ) dξ.
Then
a_ω = (1/ωcπ) ∫_{−∞}^∞ g(ξ) cos(ωξ) dξ and b_ω = (1/ωcπ) ∫_{−∞}^∞ g(ξ) sin(ωξ) dξ.
With these coefficients, equation (17.19) is the solution of the problem.
EXAMPLE 17.5

Suppose the initial displacement is zero and the initial velocity is given by
g(x) = { eˣ for 0 ≤ x ≤ 1; 0 for x < 0 and for x > 1 }.
A graph of this function is shown in Figure 17.16.

FIGURE 17.16 g(x) = eˣ for 0 ≤ x ≤ 1; 0 for x < 0 and for x > 1.

To use equation (17.19) to write the displacement function, compute the coefficients:
a_ω = (1/ωcπ) ∫_{−∞}^∞ g(ξ) cos(ωξ) dξ = (1/ωcπ) ∫_0^1 e^ξ cos(ωξ) dξ
= (1/ωcπ) · [e cos ω + ωe sin ω − 1]/(1 + ω²)
and
b_ω = (1/ωcπ) ∫_0^1 e^ξ sin(ωξ) dξ = −(1/ωcπ) · [ωe cos ω − e sin ω − ω]/(1 + ω²).
The solution is
y(x, t) = (1/cπ) ∫_0^∞ [e cos ω + ωe sin ω − 1]/[ω(1 + ω²)] cos(ωx) sin(ωct) dω
− (1/cπ) ∫_0^∞ [ωe cos ω − e sin ω − ω]/[ω(1 + ω²)] sin(ωx) sin(ωct) dω.

As in the case of wave motion on [0, L], the solution of a problem with nonzero initial velocity and displacement can be obtained as the sum of the solutions of two problems, in one of which there is no initial displacement, and in the other, zero initial velocity.
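The antiderivative evaluations used for a_ω and b_ω above are easy to misstate, so here is a quick machine check of the first one (a sketch of ours; the names and grid size are arbitrary):

```python
import math

def closed_a(w):
    # (e*cos(w) + w*e*sin(w) - 1) / (1 + w^2): the closed form of
    # integral_0^1 e^t cos(wt) dt used in Example 17.5
    return (math.e * math.cos(w) + w * math.e * math.sin(w) - 1.0) / (1.0 + w * w)

def numeric_a(w, n=2000):
    # composite Simpson's rule for integral_0^1 e^t cos(wt) dt
    h = 1.0 / n
    s = 1.0 + math.e * math.cos(w)   # endpoint values at t = 0 and t = 1
    for i in range(1, n):
        t = i * h
        s += math.exp(t) * math.cos(w * t) * (4 if i % 2 else 2)
    return s * h / 3.0
```

The two agree to machine-level accuracy for any fixed ω, which confirms the antiderivative.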
17.3.2 Wave Motion Along a Semi-Infinite String

We will now consider the problem of wave motion along a string fastened at the origin and stretched along the nonnegative x axis. Unlike the case of the string along the entire line, there is now one boundary condition, at x = 0. The problem is
∂²y/∂t² = c² ∂²y/∂x² for 0 < x < ∞, t > 0,
y(0, t) = 0 for t ≥ 0,
y(x, 0) = f(x), ∂y/∂t (x, 0) = g(x) for 0 < x < ∞.
Again, we want a bounded solution. Let y(x, t) = X(x)T(t) and obtain
X″ + λX = 0, T″ + λc²T = 0.
In this problem we have a boundary condition:
y(0, t) = X(0)T(t) = 0,
implying that X(0) = 0. Begin by looking for the eigenvalues and corresponding eigenfunctions. Consider cases on λ.

Case 1: λ = 0. Now X(x) = ax + b. Since X(0) = b = 0, then X(x) = ax. This is unbounded on (0, ∞) unless a = 0, so λ = 0 yields no bounded nontrivial solution for X, and 0 is not an eigenvalue.

Case 2: λ is negative. Now write λ = −ω² to obtain X″ − ω²X = 0. This has general solution
X(x) = ae^{ωx} + be^{−ωx}.
Now X(0) = a + b = 0 implies that b = −a, so X(x) = 2a sinh(ωx). This is unbounded for x > 0 unless a = 0, so this problem has no negative eigenvalue.
Case 3: λ is positive. Now write λ = ω² and obtain
X(x) = a cos(ωx) + b sin(ωx).
Since X(0) = a = 0, only the sine terms remain. Thus every positive number λ = ω² is an eigenvalue, with corresponding eigenfunctions nonzero constant multiples of sin(ωx).

Now the problem for T is T″ + c²ω²T = 0, with general solution
T(t) = a cos(ωct) + b sin(ωct).
At this point we must isolate the problem into one with zero initial displacement or zero initial velocity. Suppose, to be specific, that g(x) = 0. Then T′(0) = 0, so b = 0 and T(t) must be a constant multiple of cos(ωct). We therefore have functions
y_ω(x, t) = c_ω sin(ωx) cos(ωct)
for each ω > 0. Define the superposition
y(x, t) = ∫_0^∞ c_ω sin(ωx) cos(ωct) dω.
Each such function satisfies the wave equation and the boundary condition, as well as ∂y/∂t (x, 0) = 0 for x > 0. To satisfy the condition on initial displacement, we must choose the coefficients so that
y(x, 0) = ∫_0^∞ c_ω sin(ωx) dω = f(x).
This is the Fourier integral expansion of f(x) on [0, ∞), so choose
c_ω = (2/π) ∫_0^∞ f(ξ) sin(ωξ) dξ.
The solution of the problem is
y(x, t) = (2/π) ∫_0^∞ (∫_0^∞ f(ξ) sin(ωξ) dξ) sin(ωx) cos(ωct) dω.
If the problem has zero initial displacement, but initial velocity g(x), then a similar analysis leads to the solution
y(x, t) = ∫_0^∞ c_ω sin(ωx) sin(ωct) dω,
where
c_ω = (2/ωcπ) ∫_0^∞ g(ξ) sin(ωξ) dξ.
EXAMPLE 17.6

Consider wave motion along the half-line governed by the problem:
∂²y/∂t² = 16 ∂²y/∂x² for x > 0, t > 0,
y(0, t) = 0 for t ≥ 0,
y(x, 0) = { sin(πx) for 0 ≤ x ≤ 4; 0 for x > 4 }, ∂y/∂t (x, 0) = 0.
Here c = 4. To write the solution, we need only compute the coefficients
c_ω = (2/π) ∫_0^∞ f(ξ) sin(ωξ) dξ = (2/π) ∫_0^4 sin(πξ) sin(ωξ) dξ
= 8 sin ω cos ω (2cos²ω − 1)/(ω² − π²).
The solution is
y(x, t) = ∫_0^∞ [8 sin ω cos ω (2cos²ω − 1)/(ω² − π²)] sin(ωx) cos(4ωt) dω.
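The coefficient formula of Example 17.6 can be verified numerically (a sketch of ours, not part of the text; grid size arbitrary, and we avoid ω = π, where the formula has a removable singularity):

```python
import math

def c_closed(w):
    # 8 sin(w) cos(w) (2 cos^2(w) - 1) / (w^2 - pi^2), the closed form of c_w
    return 8.0 * math.sin(w) * math.cos(w) * (2.0 * math.cos(w) ** 2 - 1.0) \
           / (w * w - math.pi ** 2)

def c_numeric(w, n=4000):
    # composite Simpson's rule for (2/pi) * integral_0^4 sin(pi s) sin(w s) ds;
    # the integrand vanishes at both endpoints
    h = 4.0 / n
    s = 0.0
    for i in range(1, n):
        t = i * h
        s += math.sin(math.pi * t) * math.sin(w * t) * (4 if i % 2 else 2)
    return (2.0 / math.pi) * s * h / 3.0
```

Both functions compute the same c_ω, so their difference should be at the level of the quadrature error.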
17.3.3 Fourier Transform Solution of Problems on Unbounded Domains

It is useful to have a variety of tools and methods available to solve boundary value problems. To this end, we will revisit problems of wave motion on the line and half-line and approach the solution through the use of a Fourier transform. First, here is a brief description of what is involved in using a transform.

1. The range of values for the variable in which the transform will be performed is one determining factor in choosing a transform. Another is how the information given in the boundary value problem fits into the operational formula for the transform. For example, the operational formula for the Fourier sine transform is ℱ_S[f″](ω) = −ω² f̂_S(ω) + ωf(0), so we must be given information about f(0) in the problem to make use of this transform.

2. If the transform is performed with respect to a variable of the boundary value problem, we obtain a differential equation involving the other variable(s). This differential equation must be solved subject to other information given in the problem. This solution gives the transform of the solution of the original boundary value problem.

3. Once we have the transform of the solution of the boundary value problem, we must invert it to obtain the solution of the boundary value problem itself.

Finally, the Fourier transform of a real-valued function is often complex-valued. If the solution is real-valued, then the real part of the expression obtained using the Fourier transform is the solution. However, because expressions such as e^{−iωx} are often easier to manipulate than cos(ωx) and sin(ωx), we often retain the entire complex expression as the "solution", extracting the real part when we need numerical values, graphs, or other information.

For reference, we will summarize (without conditions on the functions) some facts about the Fourier transform and the Fourier sine and cosine transforms.

Fourier Transform
ℱ[f](ω) = f̂(ω) = ∫_{−∞}^∞ f(x) e^{−iωx} dx
f(x) = (1/2π) ∫_{−∞}^∞ f̂(ω) e^{iωx} dω
ℱ[f″](ω) = −ω² f̂(ω)
Fourier Cosine Transform
ℱ_C[f](ω) = f̂_C(ω) = ∫_0^∞ f(x) cos(ωx) dx
f(x) = (2/π) ∫_0^∞ f̂_C(ω) cos(ωx) dω
ℱ_C[f″](ω) = −ω² f̂_C(ω) − f′(0)

Fourier Sine Transform
ℱ_S[f](ω) = f̂_S(ω) = ∫_0^∞ f(x) sin(ωx) dx
f(x) = (2/π) ∫_0^∞ f̂_S(ω) sin(ωx) dω
ℱ_S[f″](ω) = −ω² f̂_S(ω) + ω f(0)

Fourier Transform Solution of the Wave Equation on the Line
Consider again the problem
∂²y/∂t² = c² ∂²y/∂x² for −∞ < x < ∞, t > 0,
y(x, 0) = f(x), ∂y/∂t (x, 0) = 0 for −∞ < x < ∞.
Because x varies over the entire line, we can try the Fourier transform in the x variable. To do this, transform y(x, t) as a function of x, leaving t as a parameter. First apply ℱ to the wave equation:
ℱ[∂²y/∂t²] = c² ℱ[∂²y/∂x²].
Because we are transforming in x, leaving t alone, we have
ℱ[∂²y/∂t²](ω) = ∫_{−∞}^∞ (∂²y/∂t²)(x, t) e^{−iωx} dx = (∂²/∂t²) ∫_{−∞}^∞ y(x, t) e^{−iωx} dx = (∂²/∂t²) ŷ(ω, t),
where ŷ(ω, t) is the Fourier transform, with respect to x, of y(x, t). The partial derivative with respect to t passes through the integral with respect to x because x and t are independent. For the Fourier transform, in x, of ∂²y/∂x², use the operational formula:
ℱ[∂²y/∂x²] = −ω² ŷ(ω, t).
The transformed wave equation is therefore
∂²ŷ/∂t² (ω, t) = −c²ω² ŷ(ω, t),
or
∂²ŷ/∂t² (ω, t) + c²ω² ŷ(ω, t) = 0.
Think of this as an ordinary differential equation for ŷ(ω, t) in t, with ω carried along as a parameter. The general solution has the form
ŷ(ω, t) = a_ω cos(ωct) + b_ω sin(ωct).
We obtain the coefficients by transforming the initial data. First,
ŷ(ω, 0) = a_ω = ℱ[y(x, 0)](ω) = ℱ[f](ω) = f̂(ω),
the transform of the initial position function. Next,
∂ŷ/∂t (ω, 0) = ℱ[∂y/∂t (x, 0)](ω) = ℱ[0] = ωc b_ω = 0,
because the initial velocity is zero. Therefore b_ω = 0 and
ŷ(ω, t) = f̂(ω) cos(ωct).
We now know the transform of the solution y(x, t). Invert this to find y(x, t):
y(x, t) = (1/2π) ∫_{−∞}^∞ f̂(ω) cos(ωct) e^{iωx} dω.   (17.20)
This is an integral formula for the solution, since f̂ is presumably known to us because we were given f. Since e^{iωx} is complex-valued, we must actually take the real part of this integral to obtain y(x, t). However, the integral is often left in the form of equation (17.20) with the understanding that y(x, t) is the real part.

We will show that the solutions of this problem obtained by Fourier transform and Fourier integral are the same. Write the solution just obtained by transform as
y_tr(x, t) = (1/2π) ∫_{−∞}^∞ f̂(ω) cos(ωct) e^{iωx} dω
= (1/2π) ∫_{−∞}^∞ (∫_{−∞}^∞ f(ξ) e^{−iωξ} dξ) cos(ωct) e^{iωx} dω
= (1/2π) ∫_{−∞}^∞ ∫_{−∞}^∞ e^{−iω(ξ−x)} cos(ωct) f(ξ) dξ dω
= (1/2π) ∫_{−∞}^∞ ∫_{−∞}^∞ [cos(ω(ξ − x)) − i sin(ω(ξ − x))] cos(ωct) f(ξ) dξ dω.
Since the displacement function is real-valued, we must take the real part of this integral, obtaining
y(x, t) = (1/2π) ∫_{−∞}^∞ ∫_{−∞}^∞ cos(ω(ξ − x)) cos(ωct) f(ξ) dξ dω.
Finally, this integrand is an even function of ω, so
(1/2π) ∫_{−∞}^∞ (···) dω = (1/2π) · 2 ∫_0^∞ (···) dω = (1/π) ∫_0^∞ (···) dω,
yielding
y(x, t) = (1/π) ∫_0^∞ ∫_{−∞}^∞ cos(ω(ξ − x)) cos(ωct) f(ξ) dξ dω.
This agrees with the solution (17.18) obtained by Fourier integral.
EXAMPLE 17.7

Solve for the displacement function on the real line if the initial velocity is zero and the initial displacement function is given by
f(x) = { cos(x) for −π/2 ≤ x ≤ π/2; 0 for |x| > π/2 }.
To use the solution (17.20) we must compute
f̂(ω) = ∫_{−∞}^∞ f(ξ) e^{−iωξ} dξ = ∫_{−π/2}^{π/2} cos(ξ) e^{−iωξ} dξ
= { 2cos(ωπ/2)/(1 − ω²) for ω ≠ ±1; π/2 for ω = ±1 }.
f̂ is continuous, since
lim_{ω→1} 2cos(ωπ/2)/(1 − ω²) = π/2.
The solution can be written
y(x, t) = (1/2π) ∫_{−∞}^∞ [2cos(ωπ/2)/(1 − ω²)] cos(ωct) e^{iωx} dω,
with the understanding that y(x, t) is the real part of the integral on the right. If we explicitly take this real part, then
y(x, t) = (1/π) ∫_{−∞}^∞ [cos(ωπ/2)/(1 − ω²)] cos(ωx) cos(ωct) dω.
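The transform computed in Example 17.7 can be spot-checked by direct integration (a sketch of ours; the comparison values of ω are arbitrary but avoid ω = ±1, where the closed form needs its limit):

```python
import math

def fhat_closed(w):
    # 2 cos(pi w / 2) / (1 - w^2); the limit at w = +-1 is pi/2
    return 2.0 * math.cos(math.pi * w / 2.0) / (1.0 - w * w)

def fhat_numeric(w, n=2000):
    # composite Simpson's rule for integral_{-pi/2}^{pi/2} cos(s) cos(w s) ds
    # (the imaginary/odd part of the transform integrates to zero)
    a, b = -math.pi / 2.0, math.pi / 2.0
    h = (b - a) / n
    def fn(s):
        return math.cos(s) * math.cos(w * s)
    total = fn(a) + fn(b)
    for i in range(1, n):
        total += fn(a + i * h) * (4 if i % 2 else 2)
    return total * h / 3.0
```

At ω = 0 both give exactly 2, the area under one arch of the cosine.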
EXAMPLE 17.8

In some instances a clever use of the Fourier transform can yield a closed form solution. Consider the problem
∂²y/∂t² = 9 ∂²y/∂x² for −∞ < x < ∞, t ≥ 0,
y(x, 0) = 4e^{−5|x|}, ∂y/∂t (x, 0) = 0 for −∞ < x < ∞.
Take the transform of the differential equation, obtaining as in the discussion above
∂²ŷ/∂t² (ω, t) = −9ω² ŷ(ω, t),
with general solution
ŷ(ω, t) = a_ω cos(3ωt) + b_ω sin(3ωt).
Now use the initial conditions. Using the initial position function we have
ŷ(ω, 0) = a_ω = ℱ[y(x, 0)](ω) = ℱ[4e^{−5|x|}](ω) = 40/(25 + ω²).
Next, using the initial velocity, write
∂ŷ/∂t (ω, 0) = 3ω b_ω = ℱ[∂y/∂t (x, 0)](ω) = 0,
so b_ω = 0. Then
ŷ(ω, t) = [40/(25 + ω²)] cos(3ωt).
We can now write the solution in integral form as
y(x, t) = ℱ^{−1}[ŷ(ω, t)](x) = (1/2π) ∫_{−∞}^∞ [40/(25 + ω²)] cos(3ωt) e^{iωx} dω.
However, in this case we can explicitly invert ŷ(ω, t), using some facts about the Fourier transform. Begin by using the convolution theorem to write
y(x, t) = ℱ^{−1}[(40/(25 + ω²)) cos(3ωt)]
= ℱ^{−1}[40/(25 + ω²)] ∗ ℱ^{−1}[cos(3ωt)]
= 4e^{−5|x|} ∗ ℱ^{−1}[cos(3ωt)].   (17.21)
We need to compute the inverse Fourier transform of cos(3ωt). Here ω is the variable of the transformed function, with t carried along as a parameter. The variable of the inverse transform will be x. Combine the fact that ℱ[δ(t)] = 1, from Section 15.4.5, with the modulation theorem (Theorem 15.6 in Section 15.3) to get
ℱ[cos(ω₀x)](ω) = π[δ(ω + ω₀) + δ(ω − ω₀)],
in which δ is the Dirac delta function. By the symmetry theorem (Theorem 15.5 of Section 15.3),
ℱ[π(δ(x + ω₀) + δ(x − ω₀))](ω) = 2π cos(ω₀ω).
Therefore
ℱ^{−1}[cos(ω₀ω)](x) = ½[δ(x + ω₀) + δ(x − ω₀)].
Now put ω₀ = 3t to get
ℱ^{−1}[cos(3ωt)](x) = ½[δ(x + 3t) + δ(x − 3t)].
Therefore equation (17.21) gives
y(x, t) = 4e^{−5|x|} ∗ ½[δ(x + 3t) + δ(x − 3t)]
= 2[e^{−5|x|} ∗ δ(x + 3t) + e^{−5|x|} ∗ δ(x − 3t)]
= 2 ∫_{−∞}^∞ e^{−5|x−ξ|} δ(ξ + 3t) dξ + 2 ∫_{−∞}^∞ e^{−5|x−ξ|} δ(ξ − 3t) dξ
= 2e^{−5|x+3t|} + 2e^{−5|x−3t|},
in which the last line was obtained by using the filtering property of the delta function (Theorem 15.13 of Section 15.4.5). This closed form of the solution is easily verified directly.

Transform Solution of the Wave Equation on a Half-Line
We will use a transform to solve a wave problem on a half-line, with the left end fixed at x = 0. This time we will take the case of zero initial displacement, but a nonzero initial velocity:
∂²y/∂t² = c² ∂²y/∂x² for 0 < x < ∞, t > 0,
y(0, t) = 0 for t ≥ 0,
y(x, 0) = 0, ∂y/∂t (x, 0) = g(x) for 0 < x < ∞.
Now the Fourier transform is inappropriate because both x and t range only over the nonnegative real numbers. We can try the Fourier sine or cosine transform in x. The operational formula for the sine transform requires the value of the solution at x = 0, while the formula for the cosine transform uses the value of the derivative at the origin. Since we are given the condition y(0, t) = 0 (fixed left end of the string), we are led to try the sine transform.

Let ŷ_S(ω, t) be the sine transform of y(x, t) in the x variable. Take the sine transform of the wave equation. The partial derivatives with respect to t pass through the transform, and we use the operational formula for the transform of the second derivative with respect to x:
∂²ŷ_S/∂t² (ω, t) = c² ℱ_S[∂²y/∂x²](ω) = −c²ω² ŷ_S(ω, t) + c²ω y(0, t) = −c²ω² ŷ_S(ω, t).
Then
ŷ_S(ω, t) = a_ω cos(ωct) + b_ω sin(ωct).
Now
a_ω = ŷ_S(ω, 0) = ℱ_S[y(x, 0)](ω) = ℱ_S[0] = 0,
and
∂ŷ_S/∂t (ω, 0) = ωc b_ω = ĝ_S(ω),
so
b_ω = (1/ωc) ĝ_S(ω).
Therefore
ŷ_S(ω, t) = (1/ωc) ĝ_S(ω) sin(ωct).
This is the sine transform of the solution. The solution is obtained by inverting:
y(x, t) = ℱ_S^{−1}[(1/ωc) ĝ_S(ω) sin(ωct)](x) = (2/π) ∫_0^∞ (1/ωc) ĝ_S(ω) sin(ωx) sin(ωct) dω.
EXAMPLE 17.9

Consider the following problem on a half-line:
∂²y/∂t² = 25 ∂²y/∂x² for x > 0, t > 0,
y(0, t) = 0 for t ≥ 0,
y(x, 0) = 0, ∂y/∂t (x, 0) = g(x) for 0 < x < ∞,
where
g(x) = { 9 − x² for 0 ≤ x ≤ 3; 0 for x > 3 }.
If we use the Fourier sine transform, then the solution is
y(x, t) = (2/π) ∫_0^∞ (1/5ω) ĝ_S(ω) sin(ωx) sin(5ωt) dω.
All that is left to do is compute
ĝ_S(ω) = ∫_0^∞ g(ξ) sin(ωξ) dξ = ∫_0^3 (9 − ξ²) sin(ωξ) dξ
= [−8cos³ω + 6cos ω − 24ω sin ω cos²ω + 6ω sin ω + 9ω² + 2]/ω³,
yielding an integral expression for the solution.
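The expanded expression for ĝ_S(ω) above is, by the triple-angle identities, equal to the compact form (9ω² − 6ω sin 3ω − 2cos 3ω + 2)/ω³, and the following sketch (ours, not the text's) checks that compact form against direct numerical integration:

```python
import math

def gs_closed(w):
    # compact equivalent of the expanded expression in Example 17.9
    return (9.0 * w * w - 6.0 * w * math.sin(3.0 * w)
            - 2.0 * math.cos(3.0 * w) + 2.0) / w ** 3

def gs_numeric(w, n=3000):
    # composite Simpson's rule for integral_0^3 (9 - s^2) sin(w s) ds;
    # the integrand vanishes at both endpoints
    h = 3.0 / n
    total = 0.0
    for i in range(1, n):
        s = i * h
        total += (9.0 - s * s) * math.sin(w * s) * (4 if i % 2 else 2)
    return total * h / 3.0
```

Agreement at several sample frequencies confirms the antiderivative work.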
SECTION 17.3 PROBLEMS

In each of Problems 1 through 6, consider the wave equation ∂²y/∂t² = c² ∂²y/∂x² on the line, for the given value of c and the given initial conditions y(x, 0) = f(x) and ∂y/∂t (x, 0) = g(x). Solve the problem using the Fourier integral and then again using the Fourier transform.

1. c = 12, f(x) = e^{−5|x|}, g(x) = 0
2. c = 8, f(x) = { 8 − x for 0 ≤ x ≤ 8; 0 for x < 0 and for x > 8 }, g(x) = 0
3. c = 4, f(x) = { sin x for −π ≤ x ≤ π; 0 for |x| > π }, g(x) = 0
4. c = 1, f(x) = { 2 − |x| for −2 ≤ x ≤ 2; 0 for |x| > 2 }, g(x) = 0
5. c = 3, f(x) = 0, g(x) = { e^{−2x} for x ≥ 1; 0 for x < 1 }
6. c = 2, f(x) = 0, g(x) = { 1 for 0 ≤ x ≤ 2; −1 for −2 ≤ x < 0; 0 for x > 2 and for x < −2 }

In each of Problems 7 through 11, consider the wave equation ∂²y/∂t² = c² ∂²y/∂x² on the half-line, with y(0, t) = 0 for t > 0, and for the given value of c and the given initial conditions y(x, 0) = f(x) and ∂y/∂t (x, 0) = g(x) for x ≥ 0. Solve the problem using separation of variables (the Fourier sine integral) and then again using the Fourier sine transform.

7. c = 3, f(x) = 0, g(x) = { x(1 − x) for 0 ≤ x ≤ 1; 0 for x > 1 }
8. c = 3, f(x) = 0, g(x) = { 0 for 0 ≤ x < 4; 2 for 4 ≤ x ≤ 11; 0 for x > 11 }
9. c = 2, f(x) = 0, g(x) = { cos x for π/2 ≤ x ≤ 5π/2; 0 for 0 ≤ x < π/2 and for x > 5π/2 }
10. c = 6, f(x) = −2e^{−x}, g(x) = 0
11. c = 14, f(x) = 0, g(x) = { x²(3 − x) for 0 ≤ x ≤ 3; 0 for x > 3 }

Sometimes the Laplace transform is effective in solving boundary value problems involving the wave equation. Use the Laplace transform to solve the following.

12. ∂²y/∂t² = c² ∂²y/∂x² for x > 0, t > 0,
y(0, t) = { sin(2πt) for 0 ≤ t ≤ 1; 0 for t > 1 },
y(x, 0) = 0, ∂y/∂t (x, 0) = 0 for x > 0.

13. Solve
∂²y/∂t² = c² ∂²y/∂x² for x > 0, t > 0,
y(0, t) = t for t > 0,
y(x, 0) = 0, ∂y/∂t (x, 0) = A for x > 0.
17.4 Characteristics and d'Alembert's Solution

This section will involve repeated chain rule differentiations, which are efficiently written using subscript notation for partial derivatives. For example, ∂u/∂t = u_t, ∂u/∂x = u_x, ∂²u/∂t² = u_tt, and so on. Our objective is to examine a different perspective on the problem
u_tt = c² u_xx for −∞ < x < ∞, t > 0,
u(x, 0) = f(x), u_t(x, 0) = g(x) for −∞ < x < ∞.
Here we are using u(x, t) as the position function because we will be changing variables from the (x, t) plane to a (ξ, η) plane, and we do not want to confuse the solution function with coordinates of points. This boundary value problem, which we have solved using the Fourier integral and again using the Fourier transform, is referred to as the Cauchy problem for the wave equation. We will write a solution that dates to the eighteenth century.

The lines
x − ct = k₁, x + ct = k₂,
with k₁ and k₂ any real constants, are called characteristics of the wave equation. These form two families of lines, one consisting of parallel lines with slope 1/c, the other of parallel lines with slope −1/c. Figure 17.17 shows some of these characteristics. We will see that these lines are closely related to the wave motion. However, our first use of them will be to write an explicit solution of the wave equation in terms of the initial data. Define a change of coordinates
ξ = x − ct, η = x + ct.
This transformation is invertible, since
x = ½(ξ + η), t = (1/2c)(η − ξ).
FIGURE 17.17 Characteristics of the wave equation.
Define U(ξ, η) = u(x(ξ, η), t(ξ, η)). Now compute derivatives:
u_x = U_ξ ξ_x + U_η η_x = U_ξ + U_η,
u_xx = U_ξξ + 2U_ξη + U_ηη,
u_t = U_ξ(−c) + U_η(c),
and
u_tt = −c[U_ξξ(−c) + U_ξη(c)] + c[U_ηξ(−c) + U_ηη(c)] = c² U_ξξ − 2c² U_ξη + c² U_ηη.
Then
u_tt − c² u_xx = −4c² U_ξη.
In the new coordinates, the wave equation is
U_ξη = 0.
This is called the canonical form of the wave equation, and it is an easy equation to solve. First write it as
∂/∂ξ (U_η) = 0.
This means that U_η is independent of ξ, say U_η = h(η). Integrate to get
U = ∫ h(η) dη + F(ξ),
in which F(ξ) is the "constant" of integration of the partial derivative with respect to η. Now ∫ h(η) dη is just another function of η, which we will write as G(η). Thus
U(ξ, η) = F(ξ) + G(η),
where F and G must be twice continuously differentiable functions of one variable, but are otherwise arbitrary. We have shown that the solution of u_tt = c² u_xx has the form
u(x, t) = F(x − ct) + G(x + ct).   (17.22)
Equation (17.22) is called d'Alembert's solution of the wave equation, after the French mathematician Jean le Rond d'Alembert (1717–1783). Every solution of u_tt = c² u_xx must have this form. Now we will show how to choose F and G to satisfy the initial conditions. First,
u(x, 0) = F(x) + G(x) = f(x)   (17.23)
and
u_t(x, 0) = −cF′(x) + cG′(x) = g(x).   (17.24)
Integrate equation (17.24) and rearrange terms to obtain
−F(x) + G(x) = (1/c) ∫_0^x g(ξ) dξ − F(0) + G(0).
Add this equation to equation (17.23) to get
2G(x) = f(x) + (1/c) ∫_0^x g(ξ) dξ − F(0) + G(0).
Therefore
G(x) = ½f(x) + (1/2c) ∫_0^x g(ξ) dξ − ½F(0) + ½G(0).   (17.25)
But then, from equation (17.23),
F(x) = f(x) − G(x) = ½f(x) − (1/2c) ∫_0^x g(ξ) dξ + ½F(0) − ½G(0).   (17.26)
Finally, use equations (17.25) and (17.26) to write the solution as
u(x, t) = F(x − ct) + G(x + ct)
= ½f(x − ct) − (1/2c) ∫_0^{x−ct} g(ξ) dξ + ½F(0) − ½G(0)
+ ½f(x + ct) + (1/2c) ∫_0^{x+ct} g(ξ) dξ − ½F(0) + ½G(0),
or, after cancellations,
u(x, t) = ½[f(x − ct) + f(x + ct)] + (1/2c) ∫_{x−ct}^{x+ct} g(ξ) dξ.   (17.27)
Equation (17.27) is d'Alembert's formula for the solution of the Cauchy problem for the wave equation on the entire line. It is an explicit formula for the solution of the Cauchy problem, in terms of the given initial position and velocity functions.
EXAMPLE 17.10

We will solve the boundary value problem
u_tt = 4u_xx for −∞ < x < ∞, t > 0,
u(x, 0) = e^{−|x|}, u_t(x, 0) = cos(4x) for −∞ < x < ∞.
By d'Alembert's formula, with c = 2, we immediately have
u(x, t) = ½[e^{−|x−2t|} + e^{−|x+2t|}] + ¼ ∫_{x−2t}^{x+2t} cos(4ξ) dξ
= ½[e^{−|x−2t|} + e^{−|x+2t|}] + (1/16)[sin(4(x + 2t)) − sin(4(x − 2t))]
= ½[e^{−|x−2t|} + e^{−|x+2t|}] + (1/8) cos(4x) sin(8t).
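d'Alembert's formula translates directly into code. The sketch below (ours, not the text's; Simpson's rule handles the velocity integral) reproduces the closed form of Example 17.10 at a sample point:

```python
import math

def dalembert(f, g, c, x, t, n=200):
    """d'Alembert's formula (17.27):
    u(x,t) = (f(x-ct) + f(x+ct))/2 + (1/2c) * integral of g over [x-ct, x+ct]."""
    a, b = x - c * t, x + c * t
    h = (b - a) / n
    s = g(a) + g(b)                      # composite Simpson's rule, n even
    for i in range(1, n):
        s += g(a + i * h) * (4 if i % 2 else 2)
    return 0.5 * (f(a) + f(b)) + (s * h / 3.0) / (2.0 * c)

# data of Example 17.10: c = 2, f(x) = e^{-|x|}, g(x) = cos(4x)
f = lambda x: math.exp(-abs(x))
g = lambda x: math.cos(4.0 * x)
```

Evaluating dalembert(f, g, 2.0, x, t) matches ½[e^{−|x−2t|} + e^{−|x+2t|}] + (1/8)cos(4x)sin(8t) to quadrature accuracy.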
17.4.1 A Nonhomogeneous Wave Equation

Using the characteristics, we will write an expression for the solution of the nonhomogeneous problem:
u_tt = c² u_xx + F(x, t) for −∞ < x < ∞, t > 0,
u(x, 0) = f(x), u_t(x, 0) = g(x) for −∞ < x < ∞.
This problem is called nonhomogeneous because of the term F(x, t), which we assume to be continuous for all real x and t ≥ 0. F(x, t) can be thought of as an external driving or damping force acting on the string.

Suppose we want the solution at (x₀, t₀). Recall that the characteristics of the wave equation are straight lines in the (x, t) plane. There are exactly two characteristics through this point, and these are the lines
x − ct = x₀ − ct₀ and x + ct = x₀ + ct₀.
Segments of these characteristics, together with the interval [x₀ − ct₀, x₀ + ct₀], form a characteristic triangle Δ, shown in Figure 17.18. Label the sides of Δ as L, M and I. Since Δ is a region in the (x, t) plane, we can compute the double integral of −F(x, t) over Δ:
−∬_Δ F(x, t) dA = ∬_Δ (c² u_xx − u_tt) dA = ∬_Δ [∂/∂x (c² u_x) − ∂/∂t (u_t)] dA.
Apply Green's theorem to the last integral, with x and t as the independent variables instead of x and y. This converts the double integral to a line integral around the boundary C of Δ. This piecewise smooth curve, which consists of three line segments, is oriented counterclockwise.
t x ct x 0 ct0 x ct x 0 ct0 M x 0 ct0 I
(x 0, t0 ) L x 0 ct0
FIGURE 17.18 Characteristic
triangle.
x
We obtain by Green's theorem,
$$-\iint_\Delta F(x,t)\,dA = \oint_C \left(u_t\,dx + c^2 u_x\,dt\right).$$
Now evaluate the line integral on the right by evaluating it on each segment of $C$ in turn. On $I$, $t = 0$, so $dt = 0$, and $x$ varies from $x_0 - ct_0$ to $x_0 + ct_0$, so
$$\int_I u_t\,dx + c^2 u_x\,dt = \int_{x_0-ct_0}^{x_0+ct_0} u_t(x,0)\,dx = \int_{x_0-ct_0}^{x_0+ct_0} g(\xi)\,d\xi.$$
On $L$, $x + ct = x_0 + ct_0$, so $dx = -c\,dt$ and $dt = -\frac{1}{c}\,dx$, hence
$$\int_L u_t\,dx + c^2 u_x\,dt = \int_L u_t(-c\,dt) + c^2 u_x\left(-\frac{1}{c}\,dx\right) = -c\int_L du = -c\left[u(x_0,t_0) - u(x_0+ct_0,\,0)\right].$$
Finally, on $M$, $x - ct = x_0 - ct_0$, so $dx = c\,dt$ and
$$\int_M u_t\,dx + c^2 u_x\,dt = \int_M u_t(c\,dt) + c^2 u_x\left(\frac{1}{c}\,dx\right) = c\int_M du = c\left[u(x_0-ct_0,\,0) - u(x_0,t_0)\right].$$
$M$ has initial point $(x_0,t_0)$ and terminal point $(x_0-ct_0,\,0)$ because of the counterclockwise orientation of the boundary of $\Delta$. Upon summing these line integrals, we obtain
$$-\iint_\Delta F(x,t)\,dA = \int_{x_0-ct_0}^{x_0+ct_0} g(\xi)\,d\xi - c\left[u(x_0,t_0) - u(x_0+ct_0,\,0)\right] + c\left[u(x_0-ct_0,\,0) - u(x_0,t_0)\right].$$
Then, since $u(x_0 \pm ct_0,\,0) = f(x_0 \pm ct_0)$,
$$\iint_\Delta F(x,t)\,dA = -\int_{x_0-ct_0}^{x_0+ct_0} g(\xi)\,d\xi + 2c\,u(x_0,t_0) - c\left[f(x_0+ct_0) + f(x_0-ct_0)\right].$$
Solve this equation for $u(x_0,t_0)$ to obtain
$$u(x_0,t_0) = \frac{1}{2}\left[f(x_0-ct_0) + f(x_0+ct_0)\right] + \frac{1}{2c}\int_{x_0-ct_0}^{x_0+ct_0} g(\xi)\,d\xi + \frac{1}{2c}\iint_\Delta F(x,t)\,dA.$$
We have used the subscript 0 on $(x_0,t_0)$ to focus attention on the point at which we are evaluating the solution. However, this can be any point with $x_0$ real and $t_0 > 0$. Thus the solution at an arbitrary point $(x,t)$ is
$$u(x,t) = \frac{1}{2}\left[f(x-ct) + f(x+ct)\right] + \frac{1}{2c}\int_{x-ct}^{x+ct} g(\xi)\,d\xi + \frac{1}{2c}\iint_\Delta F(\xi,\eta)\,d\xi\,d\eta.$$
The solution at $(x,t)$ of the problem with the forcing term $F(x,t)$ is therefore d'Alembert's solution of the homogeneous problem (no forcing term), plus $1/2c$ times the double integral of the forcing term over the characteristic triangle having $(x,t)$ as a vertex.
EXAMPLE 17.11

Consider the problem
$$u_{tt} = 25u_{xx} + x^2t^2 \quad\text{for } -\infty < x < \infty,\ t > 0,$$
$$u(x,0) = x\cos(x),\quad u_t(x,0) = e^{-x} \quad\text{for } -\infty < x < \infty.$$
The solution at any point $x$ and time $t$ has the form
$$u(x,t) = \frac{1}{2}\left[(x-5t)\cos(x-5t) + (x+5t)\cos(x+5t)\right] + \frac{1}{10}\int_{x-5t}^{x+5t} e^{-\xi}\,d\xi + \frac{1}{10}\iint_\Delta \xi^2\eta^2\,d\xi\,d\eta.$$
All we have to do is evaluate the integrals. First,
$$\frac{1}{10}\int_{x-5t}^{x+5t} e^{-\xi}\,d\xi = \frac{1}{10}e^{-(x-5t)} - \frac{1}{10}e^{-(x+5t)}.$$
For the double integral of the forcing term, proceed from Figure 17.19:
$$\frac{1}{10}\iint_\Delta \xi^2\eta^2\,d\xi\,d\eta = \frac{1}{10}\int_0^t\int_{x-5(t-\eta)}^{x+5(t-\eta)} \xi^2\eta^2\,d\xi\,d\eta = \frac{1}{12}t^4x^2 + \frac{5}{36}t^6.$$
The solution is
$$u(x,t) = \frac{1}{2}\left[(x-5t)\cos(x-5t) + (x+5t)\cos(x+5t)\right] + \frac{1}{10}e^{-(x-5t)} - \frac{1}{10}e^{-(x+5t)} + \frac{1}{12}t^4x^2 + \frac{5}{36}t^6.$$
In the last example, $u(x,t)$ gives the position function of the string at any given time $t$. The graph of $u(x,t)$ in the $(x,t)$ plane is not a snapshot of the string at any time. Rather, a picture of the string at time $t$ is the graph of the points $(x,\,u(x,t))$, with $t$ fixed at the time of interest. Figure 17.20(a) shows a segment of the string at time $t = 0.3$, for both the forced and unforced motion. Figure 17.20(b) shows a segment of the string for $t = 0.6$, again for both unforced and forced motion. This method of characteristics can also be used to solve boundary value problems involving the wave equation on a bounded interval $[0, L]$. However, this is a good deal more involved than the solution on the entire line, so we will leave this to a more advanced treatment of partial differential equations.
[FIGURE 17.19 The characteristic triangle in the $(\xi,\eta)$ plane, with vertex $(x,t)$ and sides $\xi + 5\eta = x + 5t$ and $\xi - 5\eta = x - 5t$; a horizontal slice at height $\eta$ runs from $(x - 5t + 5\eta,\ \eta)$ to $(x + 5t - 5\eta,\ \eta)$.]
[FIGURE 17.20(a) Profile of the forced and unforced string at $t = 0.3$. FIGURE 17.20(b) Profile of the forced and unforced string at $t = 0.6$.]
17.4.2 Forward and Backward Waves

Continuing with the boundary value problem for the wave equation on the entire real line, we can write d'Alembert's formula (17.27) for the solution as
$$u(x,t) = \frac{1}{2}f(x-ct) - \frac{1}{2c}\int_0^{x-ct} g(\xi)\,d\xi + \frac{1}{2}f(x+ct) + \frac{1}{2c}\int_0^{x+ct} g(\xi)\,d\xi = \varphi(x-ct) + \beta(x+ct),$$
where
$$\varphi(x) = \frac{1}{2}f(x) - \frac{1}{2c}\int_0^x g(\xi)\,d\xi \quad\text{and}\quad \beta(x) = \frac{1}{2}f(x) + \frac{1}{2c}\int_0^x g(\xi)\,d\xi.$$
We call $\varphi(x-ct)$ a forward (or right) wave, and $\beta(x+ct)$ a backward (or left) wave. The graph of $\varphi(x-ct)$ is the graph of $\varphi(x)$ translated $ct$ units to the right. We may therefore think of $\varphi(x-ct)$ as the graph of $\varphi(x)$ moving to the right with velocity $c$. The graph of $\beta(x+ct)$ is the graph of $\beta(x)$ translated $ct$ units to the left. Thus $\beta(x+ct)$ is the graph of $\beta(x)$ moving to the left with velocity $c$. The string profile at time $t$, given by the graph of $y = u(x,t)$ as a function of $x$, is the sum of these forward and backward waves at time $t$.

As an example of this process, consider the boundary value problem in which $c = 1$,
$$f(x) = \begin{cases} 4 - x^2 & \text{for } -2 \le x \le 2, \\ 0 & \text{for } |x| > 2, \end{cases}$$
and $g(x) = 0$. This initial position function is shown in Figure 17.21(a). The solution is a sum of a forward and a backward wave:
$$u(x,t) = \varphi(x-t) + \beta(x+t) = \frac{1}{2}f(x-t) + \frac{1}{2}f(x+t).$$
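The splitting into forward and backward waves is easy to compute directly. A small Python sketch for the example above ($c = 1$, $g = 0$, so each wave is simply $\frac{1}{2}f$ translated):

```python
def f(x):
    # initial position: 4 - x^2 on [-2, 2], zero outside
    return 4.0 - x * x if -2.0 <= x <= 2.0 else 0.0

def u(x, t):
    # superposition of the forward wave f(x - t)/2 and backward wave f(x + t)/2
    return 0.5 * f(x - t) + 0.5 * f(x + t)

# At t = 0 the two half-waves coincide and reproduce f.
assert u(0.0, 0.0) == 4.0
# For t > 2 the supports [-2 + t, 2 + t] and [-2 - t, 2 - t] are disjoint,
# so near the origin the string has returned to rest.
assert u(0.0, 7.0) == 0.0
# The forward wave alone is visible around x = t for large t.
assert u(7.0, 7.0) == 0.5 * f(0.0) == 2.0
```

Evaluating `u` on a grid of $x$ values at several times reproduces the separating humps shown in Figure 17.21.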
[FIGURE 17.21(a) The initial position function: $f(x) = 4 - x^2$ for $-2 \le x \le 2$, $f(x) = 0$ for $|x| > 2$. FIGURE 17.21(b) Superposition of forward and backward waves at $t = 1/3$. FIGURE 17.21(c) $t = 1.2$. FIGURE 17.21(d) $t = 1.6$. FIGURE 17.21(e) $t = 1.8$. FIGURE 17.21(f) $t = 2.1$.]
At any time $t$, the motion consists of the initial position function translated $t$ units to the right, superimposed on the initial position function translated $t$ units to the left. We see the motion as the initial position function (Figure 17.21(a)) moving simultaneously right and left. Because $f(x)$ vanishes outside of $[-2, 2]$, these forward and backward waves actually separate and become disjoint, one continuing to move to the right, and the other to the left on the real line. This process is shown in Figures 17.21(b) through (h).

[FIGURE 17.21(g) $t = 3$. FIGURE 17.21(h) $t = 7$.]
SECTION 17.4 PROBLEMS

In each of Problems 1 through 6, determine the characteristics of the wave equation for the problem
$$u_{tt} = c^2 u_{xx} \quad\text{for } -\infty < x < \infty,\ t > 0,$$
$$u(x,0) = f(x),\quad u_t(x,0) = g(x) \quad\text{for } -\infty < x < \infty,$$
for the given value of $c$, and write the d'Alembert solution.

1. $c = 1$, $f(x) = x^2$, $g(x) = -x$
2. $c = 4$, $f(x) = x^2 - 2x$, $g(x) = \cos(x)$
3. $c = 7$, $f(x) = \cos(x)$, $g(x) = 1 - x^2$
4. $c = 5$, $f(x) = \sin(2x)$, $g(x) = x^3$
5. $c = 14$, $f(x) = e^x$, $g(x) = x$
6. $c = 12$, $f(x) = -5x + x^2$, $g(x) = 3$

In each of Problems 7 through 12, solve the problem
$$u_{tt} = c^2 u_{xx} + F(x,t) \quad\text{for } -\infty < x < \infty,\ t > 0,$$
$$u(x,0) = f(x),\quad u_t(x,0) = g(x) \quad\text{for } -\infty < x < \infty,$$
for the given $c$, $f(x)$ and $g(x)$.

7. $c = 4$, $f(x) = x$, $g(x) = e^{-x}$, $F(x,t) = x + t$
8. $c = 2$, $f(x) = \sin(x)$, $g(x) = 2x$, $F(x,t) = 2xt$
9. $c = 8$, $f(x) = x^2 - x$, $g(x) = \cos(2x)$, $F(x,t) = xt^2$
10. $c = 4$, $f(x) = x^2$, $g(x) = xe^{-x}$, $F(x,t) = x\sin(t)$
11. $c = 3$, $f(x) = \cosh(x)$, $g(x) = 1$, $F(x,t) = 3xt^3$
12. $c = 7$, $f(x) = 1 + x$, $g(x) = \sin(x)$, $F(x,t) = x - \cos(t)$

In each of Problems 13 through 18, write the solution of the problem
$$u_{tt} = u_{xx} \quad\text{for } -\infty < x < \infty,\ t > 0,$$
$$u(x,0) = f(x),\quad u_t(x,0) = 0 \quad\text{for } -\infty < x < \infty,$$
as a sum of a forward and a backward wave. Graph the initial position function and then graph the solution at selected times, showing the solution as a superposition of forward and backward waves moving in opposite directions along the real line.

13. $f(x) = \begin{cases} \sin(2x) & \text{for } -\pi \le x \le \pi \\ 0 & \text{for } |x| > \pi \end{cases}$
14. $f(x) = \begin{cases} 1 - |x| & \text{for } -1 \le x \le 1 \\ 0 & \text{for } |x| > 1 \end{cases}$
15. $f(x) = \begin{cases} \cos(x) & \text{for } -\pi/2 \le x \le \pi/2 \\ 0 & \text{for } |x| > \pi/2 \end{cases}$
16. $f(x) = \begin{cases} 1 - x^2 & \text{for } |x| \le 1 \\ 0 & \text{for } |x| > 1 \end{cases}$
17. $f(x) = \begin{cases} x^2 - x - 2 & \text{for } -1 \le x \le 2 \\ 0 & \text{for } x < -1 \text{ and for } x > 2 \end{cases}$
18. $f(x) = \begin{cases} x^3 - x^2 - 4x + 4 & \text{for } -2 \le x \le 2 \\ 0 & \text{for } |x| > 2 \end{cases}$
17.5 Normal Modes of Vibration of a Circular Elastic Membrane

We will analyze the motion of a membrane (such as a drumhead) fastened onto a circular frame and set in motion with given initial position and velocity. Let the rest position of the membrane be in the $(x,y)$ plane with the origin at the center, and let the membrane have radius $R$. Using polar coordinates, the particle of membrane at $(r,\theta)$ is assumed to vibrate vertically to the $(x,y)$ plane, and its displacement from the rest position at time $t$ is $z(r,\theta,t)$. Equation (17.4) gives the wave equation for this displacement function:
$$\frac{\partial^2 z}{\partial t^2} = c^2\left(\frac{\partial^2 z}{\partial r^2} + \frac{1}{r}\frac{\partial z}{\partial r} + \frac{1}{r^2}\frac{\partial^2 z}{\partial\theta^2}\right).$$
We will assume for the moment that the motion of the membrane is symmetric about the origin, in which case $z$ depends only on $r$ and $t$. Now the wave equation is
$$\frac{\partial^2 z}{\partial t^2} = c^2\left(\frac{\partial^2 z}{\partial r^2} + \frac{1}{r}\frac{\partial z}{\partial r}\right).$$
Let the initial displacement be given by $z(r,0) = f(r)$, and let the initial velocity be
$$\frac{\partial z}{\partial t}(r,0) = g(r).$$
Attempt a solution $z(r,t) = F(r)T(t)$. We obtain, after a routine calculation,
$$T'' + \lambda T = 0 \quad\text{and}\quad F'' + \frac{1}{r}F' + \frac{\lambda}{c^2}F = 0.$$
If $\lambda > 0$, say $\lambda = \omega^2$, the equation for $F$ is a zero order Bessel equation, with general solution
$$F(r) = aJ_0\left(\frac{\omega r}{c}\right) + bY_0\left(\frac{\omega r}{c}\right).$$
Since $Y_0(\omega r/c) \to -\infty$ as $r \to 0$ (the center of the membrane), choose $b = 0$. Now the equation for $T$ is $T'' + \omega^2 T = 0$, with general solution
$$T(t) = d\cos(\omega t) + k\sin(\omega t).$$
We have, for each $\omega > 0$, a function
$$z_\omega(r,t) = a_\omega J_0\left(\frac{\omega r}{c}\right)\cos(\omega t) + b_\omega J_0\left(\frac{\omega r}{c}\right)\sin(\omega t).$$
Since the membrane is fixed on a circular frame,
$$z_\omega(R,t) = a_\omega J_0\left(\frac{\omega R}{c}\right)\cos(\omega t) + b_\omega J_0\left(\frac{\omega R}{c}\right)\sin(\omega t) = 0$$
for $t > 0$. This condition is satisfied if $J_0(\omega R/c) = 0$. Let $j_1, j_2, \ldots$ be the positive zeros of $J_0$, with $j_1 < j_2 < \cdots$
and choose $\omega R/c = j_n$, or
$$\omega_n = \frac{j_n c}{R} \quad\text{for } n = 1, 2, \ldots$$
This yields the eigenvalues of this problem:
$$\lambda_n = \omega_n^2 = \frac{j_n^2 c^2}{R^2}.$$
We now have
$$z_n(r,t) = a_n J_0\left(\frac{j_n r}{R}\right)\cos\left(\frac{j_n ct}{R}\right) + b_n J_0\left(\frac{j_n r}{R}\right)\sin\left(\frac{j_n ct}{R}\right).$$
All of these functions satisfy the boundary condition $z(R,t) = 0$. To satisfy the initial conditions, attempt a superposition
$$z(r,t) = \sum_{n=1}^\infty\left[a_n J_0\left(\frac{j_n r}{R}\right)\cos\left(\frac{j_n ct}{R}\right) + b_n J_0\left(\frac{j_n r}{R}\right)\sin\left(\frac{j_n ct}{R}\right)\right]. \qquad (17.28)$$
Now
$$z(r,0) = f(r) = \sum_{n=1}^\infty a_n J_0\left(\frac{j_n r}{R}\right),$$
a Fourier–Bessel expansion of $f(r)$. Let $s = r/R$ to convert this series to
$$f(Rs) = \sum_{n=1}^\infty a_n J_0(j_n s),$$
in which $s$ varies from 0 to 1. We know from Section 16.3.3 that the coefficients in this expansion are given by
$$a_n = \frac{2}{J_1(j_n)^2}\int_0^1 s\,f(Rs)\,J_0(j_n s)\,ds \quad\text{for } n = 1, 2, \ldots$$
Next we must solve for the $b_n$'s. Compute
$$\frac{\partial z}{\partial t}(r,0) = g(r) = \sum_{n=1}^\infty b_n\frac{j_n c}{R}J_0\left(\frac{j_n r}{R}\right).$$
This is a Fourier–Bessel expansion of $g(r)$. Again referring to Section 16.3.3, we must choose
$$b_n\frac{j_n c}{R} = \frac{2}{J_1(j_n)^2}\int_0^1 s\,g(Rs)\,J_0(j_n s)\,ds,$$
or
$$b_n = \frac{2R}{cj_n J_1(j_n)^2}\int_0^1 s\,g(Rs)\,J_0(j_n s)\,ds$$
for $n = 1, 2, \ldots$ With these coefficients, equation (17.28) is the solution for the position function of the membrane.
The numbers $\omega_n = j_n c/R$ are the frequencies of the normal modes of vibration, which have periods $2\pi/\omega_n = 2\pi R/(j_n c)$. The normal modes of vibration are the functions $z_n(r,t)$. Often these functions are written in phase angle form as
$$z_n(r,t) = A_n J_0\left(\frac{j_n r}{R}\right)\cos(\omega_n t + \delta_n),$$
in which $A_n$ and $\delta_n$ are constants. The first normal mode is
$$z_1(r,t) = A_1 J_0\left(\frac{j_1 r}{R}\right)\cos(\omega_1 t + \delta_1).$$
As $r$ varies from 0 to $R$, $j_1 r/R$ varies from 0 to $j_1$. At any time $t$, a radial section through the membrane takes the shape of the graph of $J_0(x)$ for $0 \le x \le j_1$ (Figure 17.22(a)). The second normal mode is
$$z_2(r,t) = A_2 J_0\left(\frac{j_2 r}{R}\right)\cos(\omega_2 t + \delta_2).$$
Now as $r$ varies from 0 to $R$, $j_2 r/R$ varies from 0 to $j_2$, passing through $j_1$ along the way. Since $J_0(j_2 r/R) = 0$ when $j_2 r/R = j_1$, this mode has a nodal circle (fixed in the motion) at radius
$$r = \frac{j_1 R}{j_2}.$$
A radial section through the membrane takes the shape of the graph of $J_0(x)$ for $0 \le x \le j_2$ (Figure 17.22(b)). Similarly, the third normal mode is
$$z_3(r,t) = A_3 J_0\left(\frac{j_3 r}{R}\right)\cos(\omega_3 t + \delta_3),$$
and this mode has two nodal circles, one at $r = j_1 R/j_3$ and the second at $r = j_2 R/j_3$. Now a radial section has the shape of the graph of $J_0(x)$ for $0 \le x \le j_3$ (Figure 17.22(c)). In general, the $n$th normal mode has $n - 1$ nodal circles (fixed circles in the motion of the membrane), occurring at $r = j_1 R/j_n, \ldots, j_{n-1}R/j_n$.
In the next section we will revisit this problem, this time retaining the $\theta$-dependence of the displacement function. This will lead us to a solution involving a double Fourier sine series.
[FIGURE 17.22(a) First normal mode: the graph of $J_0(x)$ for $0 \le x \le j_1$. FIGURE 17.22(b) Second normal mode: $J_0(x)$ for $0 \le x \le j_2$. FIGURE 17.22(c) Third normal mode: $J_0(x)$ for $0 \le x \le j_3$.]
SECTION 17.5 PROBLEMS

1. Let $c = R = 1$, $f(r) = 1 - r$ and $g(r) = 0$. Using material from Section 16.2 (Bessel functions), approximate the coefficients $a_1$ through $a_5$ in the solution given by equation (17.28), and graph the fifth partial sum of the solution for a selection of different times. Write the (approximate) normal modes $z_n(r,t) = A_n J_0(j_n r)\cos(\omega_n t + \delta_n)$ for $n = 1, \ldots, 5$.

2. Repeat Problem 1, except now use $f(r) = 1 - r^2$ and $g(r) = 0$.

3. Repeat Problem 1, but now use $f(r) = \sin(\pi r)$ and $g(r) = 0$.
17.6 Vibrations of a Circular Elastic Membrane, Revisited

We will continue from the last section with vibrations of an elastic membrane fixed on a circular frame. Now, however, retain the $\theta$-dependence of the displacement function and consider the entire wave equation
$$\frac{\partial^2 z}{\partial t^2} = c^2\left(\frac{\partial^2 z}{\partial r^2} + \frac{1}{r}\frac{\partial z}{\partial r} + \frac{1}{r^2}\frac{\partial^2 z}{\partial\theta^2}\right)$$
for $0 \le r < R$, $-\pi \le \theta \le \pi$, $t > 0$. We will use the initial conditions
$$z(r,\theta,0) = f(r,\theta),\quad \frac{\partial z}{\partial t}(r,\theta,0) = 0,$$
so the membrane is released from rest with the given initial displacement. In cylindrical coordinates, $\theta$ can be replaced by $\theta + 2n\pi$ for any integer $n$, so we will also impose the periodicity conditions
$$z(r,-\pi,t) = z(r,\pi,t) \quad\text{and}\quad \frac{\partial z}{\partial\theta}(r,-\pi,t) = \frac{\partial z}{\partial\theta}(r,\pi,t)$$
for $0 \le r < R$ and $t > 0$.
Put $z(r,\theta,t) = F(r)\Theta(\theta)T(t)$ in the wave equation to get
$$\frac{T''}{c^2 T} = \frac{F'' + (1/r)F'}{F} + \frac{1}{r^2}\frac{\Theta''}{\Theta} = -\lambda$$
for some constant $\lambda$, since the left side depends only on $t$, and the right side only on $r$ and $\theta$. Then
$$T'' + \lambda c^2 T = 0$$
and
$$\frac{r^2 F'' + rF'}{F} + \lambda r^2 = -\frac{\Theta''}{\Theta}.$$
Because the left side depends only on $r$ and the right side only on $\theta$, and these are independent, for some constant $\mu$,
$$\frac{r^2 F'' + rF'}{F} + \lambda r^2 = -\frac{\Theta''}{\Theta} = \mu.$$
Then
$$\Theta'' + \mu\Theta = 0$$
and
$$r^2 F'' + rF' + (\lambda r^2 - \mu)F = 0.$$
In solving these differential equations for $T(t)$, $F(r)$ and $\Theta(\theta)$, we have the following boundary conditions. First, by periodicity,
$$\Theta(-\pi) = \Theta(\pi) \quad\text{and}\quad \Theta'(-\pi) = \Theta'(\pi).$$
Next, because the membrane is fixed on the circular frame,
$$F(R) = 0.$$
Finally, because the initial velocity of the membrane is zero,
$$T'(0) = 0.$$
The problem for $\Theta$ is a periodic Sturm–Liouville problem, which was solved in Section 16.3.1 (Example 16.9). The eigenvalues are
$$\mu_n = n^2 \quad\text{for } n = 0, 1, 2, \ldots,$$
and the eigenfunctions are
$$\Theta_n(\theta) = a_n\cos(n\theta) + b_n\sin(n\theta).$$
With $\mu = n^2$, the problem for $F$ is
$$r^2 F''(r) + rF'(r) + (\lambda r^2 - n^2)F(r) = 0,\quad F(R) = 0.$$
We have seen (Section 15.2.2) that this differential equation has general solution
$$F(r) = aJ_n(\sqrt{\lambda}\,r) + bY_n(\sqrt{\lambda}\,r)$$
in terms of Bessel functions of order $n$ of the first and second kinds. Because $Y_n(\sqrt{\lambda}\,r)$ is unbounded as $r \to 0+$, choose $b = 0$ to have a bounded solution. This leaves $F(r) = aJ_n(\sqrt{\lambda}\,r)$. To find admissible values of $\lambda$, we need
$$F(R) = aJ_n(\sqrt{\lambda}\,R) = 0.$$
We want to satisfy this with $a$ nonzero to avoid a trivial solution. Thus $\sqrt{\lambda}\,R$ must be one of the positive zeros of $J_n$. Let these positive zeros be
$$j_{n1} < j_{n2} < \cdots,$$
doubly indexed because this derivation depends on the choice of $\mu = n^2$. Then
$$\lambda_{nk} = \frac{j_{nk}^2}{R^2},$$
with $j_{nk}$ the $k$th positive zero of $J_n(x)$. The $\lambda_{nk}$'s are the eigenvalues. Corresponding eigenfunctions are nonzero multiples of
$$J_n\left(\frac{j_{nk}}{R}r\right) \quad\text{for } n = 0, 1, 2, \ldots \text{ and } k = 1, 2, \ldots$$
With these values of $\lambda$, the problem for $T$ is
$$T'' + c^2\frac{j_{nk}^2}{R^2}T = 0,\quad T'(0) = 0,$$
with solutions constant multiples of
$$T_{nk}(t) = \cos\left(\frac{j_{nk}}{R}ct\right).$$
We can now form the functions
$$z_{nk}(r,\theta,t) = \left[a_{nk}\cos(n\theta) + b_{nk}\sin(n\theta)\right]J_n\left(\frac{j_{nk}}{R}r\right)\cos\left(\frac{j_{nk}}{R}ct\right)$$
for $n = 0, 1, 2, \ldots$ and $k = 1, 2, \ldots$ Each of these functions satisfies the wave equation and the boundary conditions, together with the condition of zero initial velocity. To satisfy the condition that the initial position is given by $f$, write a superposition
$$z(r,\theta,t) = \sum_{n=0}^\infty\sum_{k=1}^\infty \left[a_{nk}\cos(n\theta) + b_{nk}\sin(n\theta)\right]J_n\left(\frac{j_{nk}}{R}r\right)\cos\left(\frac{j_{nk}}{R}ct\right). \qquad (17.29)$$
Now we need
$$z(r,\theta,0) = f(r,\theta) = \sum_{n=0}^\infty\sum_{k=1}^\infty \left[a_{nk}\cos(n\theta) + b_{nk}\sin(n\theta)\right]J_n\left(\frac{j_{nk}}{R}r\right).$$
To see how to choose these coefficients, first write this equation in the form
$$f(r,\theta) = \sum_{k=1}^\infty a_{0k}J_0\left(\frac{j_{0k}}{R}r\right) + \sum_{n=1}^\infty\left[\sum_{k=1}^\infty a_{nk}J_n\left(\frac{j_{nk}}{R}r\right)\right]\cos(n\theta) + \sum_{n=1}^\infty\left[\sum_{k=1}^\infty b_{nk}J_n\left(\frac{j_{nk}}{R}r\right)\right]\sin(n\theta).$$
For a given $r$, think of $f(r,\theta)$ as a function of $\theta$. The last equation is the Fourier series expansion, on $[-\pi,\pi]$, of this function of $\theta$. Since we know the coefficients in the Fourier expansion of a function of $\theta$, we can immediately write
$$\alpha_0(r) = \frac{1}{2\pi}\int_{-\pi}^\pi f(r,\theta)\,d\theta = \sum_{k=1}^\infty a_{0k}J_0\left(\frac{j_{0k}}{R}r\right)$$
and, for $n = 1, 2, \ldots$,
$$\alpha_n(r) = \frac{1}{\pi}\int_{-\pi}^\pi f(r,\theta)\cos(n\theta)\,d\theta = \sum_{k=1}^\infty a_{nk}J_n\left(\frac{j_{nk}}{R}r\right)$$
and
$$\beta_n(r) = \frac{1}{\pi}\int_{-\pi}^\pi f(r,\theta)\sin(n\theta)\,d\theta = \sum_{k=1}^\infty b_{nk}J_n\left(\frac{j_{nk}}{R}r\right).$$
Now recognize that, for each $n = 0, 1, 2, \ldots$, the last three equations are expansions of functions of $r$ in series of Bessel functions, with sets of coefficients, respectively, $a_{0k}$, $a_{nk}$ and $b_{nk}$. From Section 16.3.3, we know the coefficients in these expansions:
$$a_{0k} = \frac{2}{J_1(j_{0k})^2}\int_0^1 \xi\,\alpha_0(R\xi)\,J_0(j_{0k}\xi)\,d\xi \quad\text{for } k = 1, 2, \ldots$$
and, for $n = 1, 2, \ldots$,
$$a_{nk} = \frac{2}{J_{n+1}(j_{nk})^2}\int_0^1 \xi\,\alpha_n(R\xi)\,J_n(j_{nk}\xi)\,d\xi \quad\text{for } k = 1, 2, \ldots$$
and
$$b_{nk} = \frac{2}{J_{n+1}(j_{nk})^2}\int_0^1 \xi\,\beta_n(R\xi)\,J_n(j_{nk}\xi)\,d\xi \quad\text{for } k = 1, 2, \ldots$$
The idea in calculating the coefficients is to first perform the integrations with respect to $\theta$ to obtain the functions $\alpha_0(r)$, $\alpha_n(r)$ and $\beta_n(r)$. We then obtain the $a_{nk}$'s and the $b_{nk}$'s by evaluating the integrals for the coefficients in these Fourier–Bessel series, which are the coefficients in this type of eigenfunction expansion. In practice, these integrals must be approximated, because the zeros of the Bessel functions of order $n$ can only be approximated.
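The two-stage procedure (a Fourier expansion in $\theta$, then Fourier–Bessel expansions in $r$) can be sketched numerically. In the following hypothetical example $R = 1$ and $f(r,\theta) = (1 - r^2)\cos\theta$, chosen so that only the $n = 1$ cosine terms survive; availability of `scipy` is assumed:

```python
import numpy as np
from scipy.special import jn_zeros, jv
from scipy.integrate import quad

R = 1.0
f = lambda r, th: (1.0 - r**2) * np.cos(th)   # hypothetical initial displacement

def alpha(n, r):
    """Fourier cosine coefficient in theta (n >= 1): (1/pi) * int f cos(n theta)."""
    val, _ = quad(lambda th: f(r, th) * np.cos(n * th), -np.pi, np.pi)
    return val / np.pi

def beta(n, r):
    """Fourier sine coefficient in theta: (1/pi) * int f sin(n theta)."""
    val, _ = quad(lambda th: f(r, th) * np.sin(n * th), -np.pi, np.pi)
    return val / np.pi

def a_coeff(n, k):
    """a_nk = 2 / J_{n+1}(j_nk)^2 * int_0^1 xi alpha_n(R xi) J_n(j_nk xi) dxi."""
    jnk = jn_zeros(n, k)[-1]                  # k-th positive zero of J_n
    val, _ = quad(lambda xi: xi * alpha(n, R * xi) * jv(n, jnk * xi), 0.0, 1.0)
    return 2.0 * val / jv(n + 1, jnk)**2

# For this f, alpha_1(r) = 1 - r^2, while every beta_n and every other alpha_n vanish.
assert abs(alpha(1, 0.5) - 0.75) < 1e-6
assert abs(beta(1, 0.5)) < 1e-6
assert abs(alpha(2, 0.5)) < 1e-6
print(a_coeff(1, 1))   # leading Fourier-Bessel coefficient of 1 - r^2 against J_1
```

Only the $a_{1k}$'s are nonzero here, so the double series (17.29) collapses to a single angular mode.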
SECTION 17.6 PROBLEMS

1. Approximate the vertical deflection of the center of a circular membrane of radius 2 for any time $t > 0$ by computing the first three nonzero terms of the solution for the case $c = 2$, initial displacement $f(r,\theta) = (4 - r^2)\sin(2\theta)$, and zero initial velocity.

2. Use the solution given in this section to prove the plausible fact that the center of the membrane remains undeflected for all time if the initial displacement is an odd function of $\theta$ (that is, $f(r,-\theta) = -f(r,\theta)$). Hint: The only integer order Bessel function that is different from zero at $r = 0$ is $J_0$.
17.7 Vibrations of a Rectangular Membrane

Consider an elastic membrane stretched across a rectangular frame, to which it is fixed. Suppose the frame and the rectangle it encloses occupy the region of the $(x,y)$ plane defined by $0 \le x \le L$, $0 \le y \le K$. The membrane is given an initial displacement and released with a given initial velocity. We want to determine the vertical displacement function $z(x,y,t)$. At any time $t$, the graph of $z = z(x,y,t)$ for $0 < x < L$, $0 < y < K$ is a snapshot of the membrane's position at that time. If we had a film of this function evolving over time, we would have a motion picture of the membrane.
The boundary value problem for $z$ is
$$\frac{\partial^2 z}{\partial t^2} = a^2\left(\frac{\partial^2 z}{\partial x^2} + \frac{\partial^2 z}{\partial y^2}\right) \quad\text{for } 0 < x < L,\ 0 < y < K,\ t > 0,$$
$$z(x,0,t) = z(x,K,t) = 0 \quad\text{for } 0 < x < L,\ t > 0,$$
$$z(0,y,t) = z(L,y,t) = 0 \quad\text{for } 0 < y < K,\ t > 0,$$
$$z(x,y,0) = f(x,y) \quad\text{for } 0 < x < L,\ 0 < y < K,$$
$$\frac{\partial z}{\partial t}(x,y,0) = g(x,y) \quad\text{for } 0 < x < L,\ 0 < y < K.$$
We will solve this problem for the case of zero initial velocity, $g(x,y) = 0$. Attempt a separation of variables, $z(x,y,t) = X(x)Y(y)T(t)$. We get
$$XYT'' = a^2\left(X''YT + XY''T\right),$$
or
$$\frac{T''}{a^2 T} - \frac{Y''}{Y} = \frac{X''}{X}.$$
We are unable to isolate the three variables on different sides of an equation. However, we can argue that the left side is a function of just $y$ and $t$, that the right side is a function of just $x$, and that these three variables are independent. Therefore, for some constant $\lambda$,
$$\frac{T''}{a^2 T} - \frac{Y''}{Y} = \frac{X''}{X} = -\lambda.$$
Now we have
$$X'' + \lambda X = 0 \quad\text{and}\quad \frac{T''}{a^2 T} + \lambda = \frac{Y''}{Y}.$$
In the last equation, the left side depends only on $t$ and the right side only on $y$, so for some constant $\mu$,
$$\frac{T''}{a^2 T} + \lambda = \frac{Y''}{Y} = -\mu.$$
Then
$$Y'' + \mu Y = 0 \quad\text{and}\quad T'' + a^2(\lambda + \mu)T = 0.$$
The variables have been separated, at the cost of introducing two separation constants. Now use the boundary conditions:
$$z(0,y,t) = X(0)Y(y)T(t) = 0$$
implies that $X(0) = 0$. Similarly,
$$X(L) = 0,\quad Y(0) = 0 \quad\text{and}\quad Y(K) = 0.$$
The two problems for $X$ and $Y$ are
$$X'' + \lambda X = 0,\quad X(0) = X(L) = 0$$
and
$$Y'' + \mu Y = 0,\quad Y(0) = Y(K) = 0.$$
These have solutions
$$\lambda_n = \frac{n^2\pi^2}{L^2},\quad X_n(x) = \sin\left(\frac{n\pi x}{L}\right)$$
and
$$\mu_m = \frac{m^2\pi^2}{K^2},\quad Y_m(y) = \sin\left(\frac{m\pi y}{K}\right),$$
with $n$ and $m$ varying independently over the positive integers. The problem for $T$ now becomes
$$T'' + a^2\left(\frac{n^2\pi^2}{L^2} + \frac{m^2\pi^2}{K^2}\right)T = 0.$$
Further, because of the assumption of zero initial velocity,
$$\frac{\partial z}{\partial t}(x,y,0) = X(x)Y(y)T'(0) = 0,$$
so $T'(0) = 0$. Then $T$ must be a constant multiple of
$$\cos\left(\pi a t\sqrt{\frac{n^2}{L^2} + \frac{m^2}{K^2}}\right).$$
For each pair of positive integers $n$ and $m$, we now have a function
$$z_{nm}(x,y,t) = a_{nm}\sin\left(\frac{n\pi x}{L}\right)\sin\left(\frac{m\pi y}{K}\right)\cos\left(\pi a t\sqrt{\frac{n^2}{L^2} + \frac{m^2}{K^2}}\right)$$
that satisfies all of the conditions of the problem, except possibly the initial condition $z(x,y,0) = f(x,y)$. For this, use a superposition
$$z(x,y,t) = \sum_{n=1}^\infty\sum_{m=1}^\infty a_{nm}\sin\left(\frac{n\pi x}{L}\right)\sin\left(\frac{m\pi y}{K}\right)\cos\left(\pi a t\sqrt{\frac{n^2}{L^2} + \frac{m^2}{K^2}}\right).$$
We must choose the constants to satisfy
$$z(x,y,0) = f(x,y) = \sum_{n=1}^\infty\sum_{m=1}^\infty a_{nm}\sin\left(\frac{n\pi x}{L}\right)\sin\left(\frac{m\pi y}{K}\right).$$
We can do this by exploiting a trick we used when introducing Fourier series. Pick a positive integer $m_0$ and multiply both sides of this equation by $\sin(m_0\pi y/K)$ to get
$$f(x,y)\sin\left(\frac{m_0\pi y}{K}\right) = \sum_{n=1}^\infty\sum_{m=1}^\infty a_{nm}\sin\left(\frac{n\pi x}{L}\right)\sin\left(\frac{m\pi y}{K}\right)\sin\left(\frac{m_0\pi y}{K}\right).$$
Now integrate from 0 to $K$ in the $y$ variable, leaving terms in $x$ alone. We get
$$\int_0^K f(x,y)\sin\left(\frac{m_0\pi y}{K}\right)dy = \sum_{n=1}^\infty\sum_{m=1}^\infty a_{nm}\sin\left(\frac{n\pi x}{L}\right)\int_0^K\sin\left(\frac{m\pi y}{K}\right)\sin\left(\frac{m_0\pi y}{K}\right)dy.$$
By orthogonality of these sine functions on $[0,K]$, all of the integrals are zero except for the term $m = m_0$. The series in $m$ therefore collapses to a single term, with
$$\int_0^K\sin^2\left(\frac{m_0\pi y}{K}\right)dy = \frac{K}{2}$$
when $m = m_0$. So far we have
$$\int_0^K f(x,y)\sin\left(\frac{m_0\pi y}{K}\right)dy = \sum_{n=1}^\infty a_{nm_0}\frac{K}{2}\sin\left(\frac{n\pi x}{L}\right).$$
The left side of this equation is a function of $x$. Pick any positive integer $n_0$ and multiply this equation by $\sin(n_0\pi x/L)$:
$$\sin\left(\frac{n_0\pi x}{L}\right)\int_0^K f(x,y)\sin\left(\frac{m_0\pi y}{K}\right)dy = \sum_{n=1}^\infty a_{nm_0}\frac{K}{2}\sin\left(\frac{n\pi x}{L}\right)\sin\left(\frac{n_0\pi x}{L}\right).$$
Integrate, this time in the $x$ variable:
$$\int_0^L\int_0^K f(x,y)\sin\left(\frac{n_0\pi x}{L}\right)\sin\left(\frac{m_0\pi y}{K}\right)dy\,dx = \sum_{n=1}^\infty a_{nm_0}\frac{K}{2}\int_0^L\sin\left(\frac{n\pi x}{L}\right)\sin\left(\frac{n_0\pi x}{L}\right)dx.$$
All the integrals on the right are zero except when $n = n_0$, and this integral is $L/2$. The last equation becomes
$$\int_0^L\int_0^K f(x,y)\sin\left(\frac{n_0\pi x}{L}\right)\sin\left(\frac{m_0\pi y}{K}\right)dy\,dx = \frac{KL}{4}a_{n_0 m_0}.$$
Dropping the zero subscripts, which were just for ease in keeping track of which integers were fixed, we now have
$$a_{nm} = \frac{4}{LK}\int_0^L\int_0^K f(x,y)\sin\left(\frac{n\pi x}{L}\right)\sin\left(\frac{m\pi y}{K}\right)dy\,dx.$$
With this choice of the coefficients, we have the solution for the displacement function.
EXAMPLE 17.12

Suppose the initial displacement is given by
$$z(x,y,0) = x(L-x)y(K-y)$$
and the initial velocity is zero. The coefficients in the double Fourier expansion are
$$a_{nm} = \frac{4}{LK}\int_0^L\int_0^K x(L-x)y(K-y)\sin\left(\frac{n\pi x}{L}\right)\sin\left(\frac{m\pi y}{K}\right)dy\,dx$$
$$= \frac{4}{LK}\left[\int_0^L x(L-x)\sin\left(\frac{n\pi x}{L}\right)dx\right]\left[\int_0^K y(K-y)\sin\left(\frac{m\pi y}{K}\right)dy\right]$$
$$= \frac{16L^2K^2}{n^3m^3\pi^6}\left[(-1)^n - 1\right]\left[(-1)^m - 1\right].$$
The solution for the displacement function in this case is
$$z(x,y,t) = \sum_{n=1}^\infty\sum_{m=1}^\infty \frac{16L^2K^2}{n^3m^3\pi^6}\left[(-1)^n - 1\right]\left[(-1)^m - 1\right]\sin\left(\frac{n\pi x}{L}\right)\sin\left(\frac{m\pi y}{K}\right)\cos\left(\pi a t\sqrt{\frac{n^2}{L^2} + \frac{m^2}{K^2}}\right).$$
SECTION 17.7 PROBLEMS

1. Solve
$$\frac{\partial^2 z}{\partial t^2} = \frac{\partial^2 z}{\partial x^2} + \frac{\partial^2 z}{\partial y^2} \quad\text{for } 0 < x < 2,\ 0 < y < 2,\ t > 0,$$
$$z(x,0,t) = z(x,2,t) = 0 \quad\text{for } 0 < x < 2,\ t > 0,$$
$$z(0,y,t) = z(2,y,t) = 0 \quad\text{for } 0 < y < 2,\ t > 0,$$
$$z(x,y,0) = x\sin(y) \quad\text{for } 0 < x < 2,\ 0 < y < 2,$$
$$\frac{\partial z}{\partial t}(x,y,0) = 0 \quad\text{for } 0 < x < 2,\ 0 < y < 2.$$

2. Solve
$$\frac{\partial^2 z}{\partial t^2} = 9\left(\frac{\partial^2 z}{\partial x^2} + \frac{\partial^2 z}{\partial y^2}\right) \quad\text{for } 0 < x < \pi,\ 0 < y < \pi,\ t > 0,$$
$$z(x,0,t) = z(x,\pi,t) = 0 \quad\text{for } 0 < x < \pi,\ t > 0,$$
$$z(0,y,t) = z(\pi,y,t) = 0 \quad\text{for } 0 < y < \pi,\ t > 0,$$
$$z(x,y,0) = \sin(x)\cos(y) \quad\text{for } 0 < x < \pi,\ 0 < y < \pi,$$
$$\frac{\partial z}{\partial t}(x,y,0) = xy \quad\text{for } 0 < x < \pi,\ 0 < y < \pi.$$

3. Solve
$$\frac{\partial^2 z}{\partial t^2} = 4\left(\frac{\partial^2 z}{\partial x^2} + \frac{\partial^2 z}{\partial y^2}\right) \quad\text{for } 0 < x < 2,\ 0 < y < 2,\ t > 0,$$
$$z(x,0,t) = z(x,2,t) = 0 \quad\text{for } 0 < x < 2,\ t > 0,$$
$$z(0,y,t) = z(2,y,t) = 0 \quad\text{for } 0 < y < 2,\ t > 0,$$
$$z(x,y,0) = 0 \quad\text{for } 0 < x < 2,\ 0 < y < 2,$$
$$\frac{\partial z}{\partial t}(x,y,0) = 1 \quad\text{for } 0 < x < 2,\ 0 < y < 2.$$
CHAPTER 18
The Heat Equation

THE HEAT EQUATION AND INITIAL AND BOUNDARY CONDITIONS · FOURIER SERIES SOLUTIONS OF THE HEAT EQUATION · HEAT CONDUCTION IN INFINITE MEDIA · HEAT CONDUCTION IN AN INFINITE CYLINDER

Heat and radiation phenomena are often modeled by a partial differential equation called the heat equation. We derived a three-dimensional version of the heat equation using Gauss's divergence theorem. We will now examine the heat equation more closely and solve it under a variety of conditions, following a program that parallels the one just carried out for the wave equation.

18.1 The Heat Equation and Initial and Boundary Conditions

Let $u(x,y,z,t)$ be the temperature at time $t$ and location $(x,y,z)$ in a region of space. In Section 13.7.2, we showed that $u$ satisfies the partial differential equation
$$\mu\rho\frac{\partial u}{\partial t} = K\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}\right) + \nabla K\cdot\nabla u,$$
in which $K(x,y,z)$ is the thermal conductivity of the medium, $\mu(x,y,z)$ is the specific heat and $\rho(x,y,z)$ is the density. The term $\nabla K\cdot\nabla u$ is the dot product of the gradients of $K$ and $u$. This is the heat equation in three space variables and time.
If the thermal conductivity of the medium is constant, then $\nabla K$ is the zero vector and the term $\nabla K\cdot\nabla u = 0$. Now the three-dimensional heat equation is
$$\mu\rho\frac{\partial u}{\partial t} = K\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}\right).$$
The one-dimensional heat equation is
$$\frac{\partial u}{\partial t} = \frac{K}{\mu\rho}\frac{\partial^2 u}{\partial x^2}.$$
This equation often applies, for example, to heat conduction in a thin bar whose length is much larger than its other dimensions. To get some feeling for what is involved in the one-dimensional heat equation, we will give a separate derivation of it from basic principles.
Consider a straight, thin bar of constant density $\rho$ and constant cross-sectional area $A$, placed along the $x$ axis from 0 to $L$. Assume that the sides of the bar are insulated and do not allow heat loss, and that the temperature on the cross section of the bar perpendicular to the $x$ axis at $x$ is a function $u(x,t)$ of $x$ and $t$ only. Let the specific heat $\mu$ and the thermal conductivity $K$ be constant.
Consider a typical segment of the bar between $x = \alpha$ and $x = \beta$, as in Figure 18.1. By the definition of specific heat, the rate at which heat energy accumulates in this segment is
$$\int_\alpha^\beta \mu\rho A\frac{\partial u}{\partial t}\,dx.$$

[FIGURE 18.1 $u(x,t)$ is the temperature on the cross section at $x$ at time $t$; the segment of the bar lies between $x = \alpha$ and $x = \beta$, with $0 \le \alpha < \beta \le L$.]

By Newton's law of cooling, heat energy flows within this segment from the warmer to the cooler end at a rate equal to $K$ times the negative of the temperature gradient (difference in temperature at the ends of the segment). Therefore, the net rate at which heat energy enters this segment of the bar at time $t$ is
$$KA\frac{\partial u}{\partial x}(\beta,t) - KA\frac{\partial u}{\partial x}(\alpha,t).$$
Assume that no energy is produced within the segment. Such production could occur, for example, if there is radiation or a heat source such as a chemical reaction. These would also change the mass of the segment with time. In the absence of these effects, the rate at which heat energy accumulates within the segment must balance the rate at which it enters the segment. Therefore
$$\int_\alpha^\beta \mu\rho A\frac{\partial u}{\partial t}\,dx = KA\left[\frac{\partial u}{\partial x}(\beta,t) - \frac{\partial u}{\partial x}(\alpha,t)\right] = KA\int_\alpha^\beta \frac{\partial^2 u}{\partial x^2}\,dx,$$
so
$$\int_\alpha^\beta \left(\mu\rho\frac{\partial u}{\partial t} - K\frac{\partial^2 u}{\partial x^2}\right)dx = 0.$$
This equation must be true for every $\alpha$ and $\beta$ with $0 \le \alpha < \beta \le L$. If the term in parentheses in this integral were nonzero at any $x_0$ and $t_0$, then by continuity we could choose an interval about $x_0$ throughout which this term would be strictly positive or strictly negative. But then this integral of a positive or negative function over that interval would be, respectively, positive or negative, a contradiction. We conclude that
$$\mu\rho\frac{\partial u}{\partial t} - K\frac{\partial^2 u}{\partial x^2} = 0$$
for $0 < x < L$ and for $t > 0$. This is the one-dimensional heat equation. Often this partial differential equation is written
$$\frac{\partial u}{\partial t} = k\frac{\partial^2 u}{\partial x^2},$$
where $k = K/\mu\rho$ is a positive constant depending on the material of the bar. The number $k$ is called the diffusivity of the bar.
This equation certainly does not determine the temperature function $u(x,t)$ uniquely. For example, if $u(x,t)$ is one solution, so is $u(x,t) + c$ for any real number $c$. For uniqueness of the solution, which we expect in models of physical phenomena, we need boundary conditions, specifying information at the ends of the bar at all times, and initial conditions, giving the temperature throughout the bar at some time usually designated as time zero. The heat equation, together with certain initial and boundary conditions, uniquely determines the temperature distribution throughout the bar at all later times.
For example, we might have the boundary value problem
$$\frac{\partial u}{\partial t} = k\frac{\partial^2 u}{\partial x^2} \quad\text{for } 0 < x < L,\ t > 0,$$
$$u(0,t) = T_1,\quad u(L,t) = T_2 \quad\text{for } t \ge 0,$$
$$u(x,0) = f(x) \quad\text{for } 0 \le x \le L.$$
This problem models the temperature distribution in a bar of length $L$, whose left end is kept at constant temperature $T_1$ and right end at constant temperature $T_2$, and whose initial temperature in the cross section at $x$ is $f(x)$. The conditions at the ends of the bar are the boundary conditions, and the temperature at time zero is the initial condition.
As a second example, consider the boundary value problem
$$\frac{\partial u}{\partial t} = k\frac{\partial^2 u}{\partial x^2} \quad\text{for } 0 < x < L,\ t > 0,$$
$$\frac{\partial u}{\partial x}(0,t) = \frac{\partial u}{\partial x}(L,t) = 0 \quad\text{for } t \ge 0,$$
$$u(x,0) = f(x) \quad\text{for } 0 \le x \le L.$$
This problem models the temperature distribution in a bar having no heat loss across its ends. The boundary conditions given in this problem are called insulation conditions.
Still other kinds of boundary conditions can be specified. For example, we might have a combination of fixed temperature and insulation conditions. If the left end is kept at constant temperature $T$ and the right end is insulated, then
$$u(0,t) = T \quad\text{and}\quad \frac{\partial u}{\partial x}(L,t) = 0.$$
Or we might have free radiation (convection), in which the bar loses heat by radiation from its ends into the surrounding medium, which is assumed to be maintained at constant temperature $T$. Now the model consists of the heat equation, the initial temperature function, and the boundary conditions
$$\frac{\partial u}{\partial x}(0,t) = A\left[u(0,t) - T\right],\quad \frac{\partial u}{\partial x}(L,t) = -A\left[u(L,t) - T\right]$$
for $t \ge 0$. Here $A$ is a positive constant. Notice that if the bar is kept hotter than the surrounding medium, then the heat flow, as measured by $\partial u/\partial x$, must be positive at the left end and negative at the right end. The boundary conditions
$$u(0,t) = T_1,\quad \frac{\partial u}{\partial x}(L,t) = -A\left[u(L,t) - T_2\right]$$
are used if the left end is kept at the constant temperature $T_1$ while the right end radiates heat energy into a medium of constant temperature $T_2$.
In two space dimensions, with constant thermal conductivity, the heat equation is
$$\frac{\partial u}{\partial t} = k\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right),$$
while in three space dimensions it is
$$\frac{\partial u}{\partial t} = k\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}\right).$$

SECTION 18.1 PROBLEMS

1. Formulate a boundary value problem modeling heat conduction in a thin bar of length $L$, if the left end is kept at temperature zero and the right end is insulated. The initial temperature in the cross section at $x$ is $f(x)$.

2. Formulate a boundary value problem modeling heat conduction in a thin bar of length $L$, if the left end is kept at temperature $\alpha(t)$ and the right end at temperature $\beta(t)$. The initial temperature in the cross section at $x$ is $f(x)$.

3. Formulate a boundary value problem for the temperature function in a thin bar of length $L$ if the left end is kept insulated and the right end is kept at temperature $\beta(t)$. The initial temperature in the cross section at $x$ is $f(x)$.
18.2 Fourier Series Solutions of the Heat Equation

In this section we will solve several boundary value problems modeling heat conduction on a bounded interval. For this setting we will use separation of variables and Fourier series.

18.2.1 Ends of the Bar Kept at Temperature Zero

Suppose we want the temperature distribution $u(x,t)$ in a thin, homogeneous (constant density) bar of length $L$, given that the initial temperature in the bar at time zero in the cross section at $x$ perpendicular to the $x$ axis is $f(x)$. The ends of the bar are maintained at temperature zero for all time. The boundary value problem modeling this temperature distribution is
$$\frac{\partial u}{\partial t} = k\frac{\partial^2 u}{\partial x^2} \quad\text{for } 0 < x < L,\ t > 0,$$
$$u(0,t) = u(L,t) = 0 \quad\text{for } t \ge 0,$$
$$u(x,0) = f(x) \quad\text{for } 0 \le x \le L.$$
We will use separation of variables. Substitute $u(x,t) = X(x)T(t)$ into the heat equation to get
$$XT' = kX''T,$$
or
$$\frac{T'}{kT} = \frac{X''}{X}.$$
The left side depends only on time, and the right side only on position, and these variables are independent. Therefore for some constant , X T = = − kT X Now u0 t = X0Tt = 0 If Tt = 0 for all t, then the temperature function has the constant value zero, which occurs if the initial temperature fx = 0 for 0 ≤ x ≤ L. Otherwise Tt cannot be identically zero, so we must have X0 = 0. Similarly, uL t = XLTt = 0 implies that XL = 0. The problem for X is therefore X + X = 0
X0 = XL = 0
We seek values of (the eigenvalues) for which this problem for X has nontrivial solutions (the eigenfunctions). This problem for X is exactly the same one encountered for the space-dependent function in separating variables in the wave equation. There we found that the eigenvalues are n =
n2 2 L2
for n = 1 2 , and corresponding eigenfunctions are nonzero constant multiples of nx Xn x = sin L The problem for T becomes T +
n2 2 k T = 0 L2
which has general solution Tn t = cn e−n For n = 1 2 · · · , we now have functions un x t = cn sin
2 2 kt/L2
nx L
e−n
2 2 kt/L2
which satisfy the heat equation on $(0,L)$ and the boundary conditions $u(0,t)=u(L,t)=0$. It remains to find a solution satisfying the initial condition. We can choose $n$ and $c_n$ so that
\[ u_n(x,0)=c_n\sin\frac{n\pi x}{L}=f(x) \]
only if the given initial temperature function is a multiple of this sine function. This need not be the case. In general, we must attempt to construct a solution using the superposition
\[ u(x,t)=\sum_{n=1}^{\infty}c_n\sin\frac{n\pi x}{L}\,e^{-n^2\pi^2 kt/L^2}. \]
Now we need
\[ u(x,0)=\sum_{n=1}^{\infty}c_n\sin\frac{n\pi x}{L}=f(x), \]
CHAPTER 18
The Heat Equation
which we recognize as the Fourier sine expansion of $f(x)$ on $[0,L]$. Thus choose
\[ c_n=\frac{2}{L}\int_0^L f(\xi)\sin\frac{n\pi\xi}{L}\,d\xi. \]
With this choice of the coefficients, we have the solution for the temperature distribution function:
\[ u(x,t)=\sum_{n=1}^{\infty}\left(\frac{2}{L}\int_0^L f(\xi)\sin\frac{n\pi\xi}{L}\,d\xi\right)\sin\frac{n\pi x}{L}\,e^{-n^2\pi^2 kt/L^2}. \tag{18.1} \]
EXAMPLE 18.1
Suppose the initial temperature function is the constant $A$ for $0<x<L$, while the temperature at the ends is maintained at zero. To write the solution for the temperature distribution function, we need to compute
\[ c_n=\frac{2}{L}\int_0^L A\sin\frac{n\pi\xi}{L}\,d\xi=\frac{2A}{n\pi}\bigl(1-\cos n\pi\bigr)=\frac{2A}{n\pi}\bigl(1-(-1)^n\bigr). \]
The solution (18.1) is
\[ u(x,t)=\frac{2A}{\pi}\sum_{n=1}^{\infty}\frac{1-(-1)^n}{n}\sin\frac{n\pi x}{L}\,e^{-n^2\pi^2 kt/L^2}. \]
Since $1-(-1)^n$ is zero if $n$ is even, and equals $2$ if $n$ is odd, we need only sum over the odd integers and can write
\[ u(x,t)=\frac{4A}{\pi}\sum_{n=1}^{\infty}\frac{1}{2n-1}\sin\frac{(2n-1)\pi x}{L}\,e^{-(2n-1)^2\pi^2 kt/L^2}. \]

Verification of the Solution  The function given by equation (18.1) clearly satisfies the boundary and initial conditions of the problem. Each term vanishes at $x=0$ and at $x=L$, and the coefficients were chosen so that $u(x,0)=f(x)$. If we could differentiate this series term by term, it would also be easy to show that $u(x,t)$ satisfies the heat equation, since each term does. When we faced this issue with the wave equation, we used a trigonometric identity to sum the series. Here, because of the rapidly decaying exponential factor in $u(x,t)$, we can easily prove that the series converges uniformly. Choose any $t_0>0$. Then, for $t\ge t_0$,
\[ \left|\frac{1}{2n-1}\sin\frac{(2n-1)\pi x}{L}\,e^{-(2n-1)^2\pi^2 kt/L^2}\right|\le\frac{1}{2n-1}e^{-(2n-1)^2\pi^2 kt_0/L^2}. \]
Because the series
\[ \sum_{n=1}^{\infty}\frac{1}{2n-1}e^{-(2n-1)^2\pi^2 kt_0/L^2} \]
converges, the series for $u(x,t)$ converges uniformly for $0\le x\le L$ and $t\ge t_0$, by a theorem of Weierstrass often referred to as the M-test. By a similar argument, the series obtained by differentiating $u(x,t)$ term by term, once with respect to $t$ or twice with respect to $x$, also converge uniformly. We can therefore differentiate the series term by term, and since each term satisfies the heat equation, so does $u(x,t)$, verifying the solution (18.1). We will now consider the problem of heat conduction in a bar with insulated ends.
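The behavior of the solution (18.1) is easy to examine numerically. The sketch below (illustrative only; the values $A=100$ and $L=k=1$ are our choices, not the text's) sums the odd-index series of Example 18.1.

```python
import math

def u_series(x, t, A=100.0, L=1.0, k=1.0, terms=200):
    """Partial sum of the series solution (18.1) for constant initial
    temperature A with both ends of the bar held at temperature zero:
    u = (4A/pi) * sum 1/(2n-1) sin((2n-1)pi x/L) exp(-(2n-1)^2 pi^2 k t/L^2)."""
    s = 0.0
    for n in range(1, terms + 1):
        m = 2 * n - 1  # only odd indices contribute when f(x) = A
        s += (math.sin(m * math.pi * x / L) / m) * \
             math.exp(-m * m * math.pi**2 * k * t / L**2)
    return 4 * A / math.pi * s
```

Because each term carries the factor $e^{-(2n-1)^2\pi^2 kt/L^2}$, a handful of terms suffices once $t>0$; at $t=0$ the series converges slowly, like an alternating harmonic series, to the constant $A$ in the interior.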
18.2.2 Temperature in a Bar with Insulated Ends

Consider heat conduction in a bar with insulated ends, hence no energy loss across the ends. If the initial temperature is $f(x)$, the temperature function is modeled by the boundary value problem
\[ \frac{\partial u}{\partial t}=k\frac{\partial^2 u}{\partial x^2}\quad\text{for }0<x<L,\ t>0, \]
\[ \frac{\partial u}{\partial x}(0,t)=\frac{\partial u}{\partial x}(L,t)=0\quad\text{for }t>0, \]
\[ u(x,0)=f(x)\quad\text{for }0\le x\le L. \]
Attempt a separation of variables by putting $u(x,t)=X(x)T(t)$. We obtain, as in the preceding subsection,
\[ X''+\lambda X=0,\qquad T'+\lambda kT=0. \]
Now
\[ \frac{\partial u}{\partial x}(0,t)=X'(0)T(t)=0 \]
implies (except in the trivial case of zero temperature) that $X'(0)=0$. Similarly,
\[ \frac{\partial u}{\partial x}(L,t)=X'(L)T(t)=0 \]
implies that $X'(L)=0$. The problem for $X(x)$ is therefore
\[ X''+\lambda X=0,\qquad X'(0)=X'(L)=0. \]
The eigenvalues are
\[ \lambda_n=\frac{n^2\pi^2}{L^2} \]
for $n=0,1,2,\dots$, with eigenfunctions nonzero constant multiples of
\[ X_n(x)=\cos\frac{n\pi x}{L}. \]
The equation for $T$ is now
\[ T'+\frac{n^2\pi^2 k}{L^2}T=0. \]
When $n=0$ we get $T_0(t)=\text{constant}$. For $n=1,2,\dots$,
\[ T_n(t)=c_n e^{-n^2\pi^2 kt/L^2}. \]
We now have functions
\[ u_n(x,t)=c_n\cos\frac{n\pi x}{L}\,e^{-n^2\pi^2 kt/L^2} \]
for $n=0,1,2,\dots$, each of which satisfies the heat equation and the insulation boundary conditions. To satisfy the initial condition we must generally use a superposition
\[ u(x,t)=\frac{1}{2}c_0+\sum_{n=1}^{\infty}c_n\cos\frac{n\pi x}{L}\,e^{-n^2\pi^2 kt/L^2}. \]
Here we wrote the constant term ($n=0$) as $c_0/2$ in anticipation of a Fourier cosine expansion. Indeed, we need
\[ u(x,0)=f(x)=\frac{1}{2}c_0+\sum_{n=1}^{\infty}c_n\cos\frac{n\pi x}{L}, \tag{18.2} \]
the Fourier cosine expansion of $f(x)$ on $[0,L]$. (This is also the expansion of the initial temperature function in the eigenfunctions of this problem.) We therefore choose
\[ c_n=\frac{2}{L}\int_0^L f(\xi)\cos\frac{n\pi\xi}{L}\,d\xi. \]
With this choice of coefficients, equation (18.2) is satisfied, and the superposition above gives the solution of this boundary value problem.
EXAMPLE 18.2
Suppose the left half of the bar is initially at temperature $A$, and the right half at temperature zero. Thus
\[ f(x)=\begin{cases}A & \text{for }0\le x\le L/2,\\ 0 & \text{for }L/2<x\le L.\end{cases} \]
Then
\[ c_0=\frac{2}{L}\int_0^{L/2}A\,d\xi=A \]
and, for $n=1,2,\dots$,
\[ c_n=\frac{2}{L}\int_0^{L/2}A\cos\frac{n\pi\xi}{L}\,d\xi=\frac{2A}{n\pi}\sin\frac{n\pi}{2}. \]
The solution for this temperature function is
\[ u(x,t)=\frac{1}{2}A+\frac{2A}{\pi}\sum_{n=1}^{\infty}\frac{1}{n}\sin\frac{n\pi}{2}\cos\frac{n\pi x}{L}\,e^{-n^2\pi^2 kt/L^2}. \]
Now $\sin(n\pi/2)$ is zero if $n$ is even. Further, if $n=2k-1$ is odd, then $\sin(n\pi/2)=(-1)^{k+1}$. The solution may therefore be written
\[ u(x,t)=\frac{1}{2}A+\frac{2A}{\pi}\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{2n-1}\cos\frac{(2n-1)\pi x}{L}\,e^{-(2n-1)^2\pi^2 kt/L^2}. \]
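As a check on Example 18.2 (again a sketch, with the illustrative values $A=100$ and $L=k=1$), one can sum the cosine series numerically and watch the temperature level off at the mean value $A/2$ of the initial distribution, as insulation (no energy loss) requires.

```python
import math

def u_insulated(x, t, A=100.0, L=1.0, k=1.0, terms=200):
    """Partial sum of the Example 18.2 solution: insulated ends,
    initial temperature A on [0, L/2] and 0 on (L/2, L]."""
    s = 0.0
    for n in range(1, terms + 1):
        m = 2 * n - 1  # only odd indices survive, with sign (-1)^(n+1)
        s += ((-1) ** (n + 1) / m) * math.cos(m * math.pi * x / L) * \
             math.exp(-m * m * math.pi**2 * k * t / L**2)
    return A / 2 + (2 * A / math.pi) * s
```

At $t=0$ the series reproduces the step initial temperature (slowly), and for moderate $t$ the bar has essentially equilibrated at $A/2$.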
18.2.3 Temperature Distribution in a Bar with Radiating End

Consider a thin homogeneous bar of length $L$, with the left end maintained at zero temperature, while the right end radiates energy into the surrounding medium, which is kept at temperature zero. If the initial temperature in the bar's cross section at $x$ is $f(x)$, then the temperature distribution is modeled by the boundary value problem
\[ \frac{\partial u}{\partial t}=k\frac{\partial^2 u}{\partial x^2}\quad\text{for }0<x<L,\ t>0, \]
\[ u(0,t)=0,\qquad \frac{\partial u}{\partial x}(L,t)=-Au(L,t)\quad\text{for }t>0, \]
\[ u(x,0)=f(x)\quad\text{for }0\le x\le L. \]
The boundary condition at $L$ assumes that heat energy radiates from this end at a rate proportional to the temperature there. $A$ is a positive constant called the transfer coefficient.

Let $u(x,t)=X(x)T(t)$ and obtain
\[ X''+\lambda X=0,\qquad T'+\lambda kT=0. \]
Since $u(0,t)=X(0)T(t)=0$, then $X(0)=0$. The condition at the right end of the bar implies that
\[ X'(L)T(t)=-AX(L)T(t), \]
hence
\[ X'(L)+AX(L)=0. \]
The problem for $X$ is therefore
\[ X''+\lambda X=0,\qquad X(0)=0,\qquad X'(L)+AX(L)=0. \]
This is a regular Sturm–Liouville problem, which we solved in Example 16.12 for the case $A=3$ and $L=1$, with $y(x)$ in place of $X(x)$. We will find the eigenvalues and eigenfunctions in this more general setting by following that analysis. Consider cases on $\lambda$.

Case 1: $\lambda=0$. Then $X(x)=cx+d$. Since $X(0)=d=0$, then $X(x)=cx$. But then
\[ X'(L)=c=-AX(L)=-AcL \]
implies that $c(1+AL)=0$. But $1+AL>0$, so $c=0$ and this case has only the trivial solution. Hence $0$ is not an eigenvalue of this problem.

Case 2: $\lambda<0$. Write $\lambda=-\alpha^2$ with $\alpha>0$. Then $X''-\alpha^2X=0$, with general solution
\[ X(x)=ce^{\alpha x}+de^{-\alpha x}. \]
Now $X(0)=c+d=0$, so $d=-c$. Then $X(x)=2c\sinh(\alpha x)$. Next,
\[ X'(L)=2c\alpha\cosh(\alpha L)=-AX(L)=-2Ac\sinh(\alpha L). \]
Now $\alpha L>0$, so $\cosh(\alpha L)>0$ and $\sinh(\alpha L)>0$, and this equation is impossible unless $c=0$. This case therefore yields only the trivial solution for $X$, so the problem has no negative eigenvalue.
Case 3: $\lambda>0$. Now write $\lambda=\alpha^2$ with $\alpha>0$. Then $X''+\alpha^2X=0$, so
\[ X(x)=c\cos(\alpha x)+d\sin(\alpha x). \]
Then $X(0)=c=0$, so $X(x)=d\sin(\alpha x)$. Next,
\[ X'(L)=d\alpha\cos(\alpha L)=-AX(L)=-Ad\sin(\alpha L). \]
Then $d=0$ or
\[ \tan(\alpha L)=-\frac{\alpha}{A}. \]
We can therefore have a nontrivial solution for $X$ if $\alpha$ is chosen to satisfy this equation. Let $z=\alpha L$ to write this condition as
\[ \tan z=-\frac{1}{AL}z. \]
Figure 18.2 shows graphs of $y=\tan z$ and $y=-z/(AL)$ in the $z,y$ plane (with $z$ as the horizontal axis). These graphs have infinitely many points of intersection to the right of the vertical axis. Let the $z$ coordinates of these points of intersection be $z_1,z_2,\dots$, written in increasing order. Since $\alpha=z/L$, the eigenvalues of this problem are
\[ \lambda_n=\alpha_n^2=\frac{z_n^2}{L^2} \]
for $n=1,2,\dots$. Eigenfunctions are nonzero constant multiples of $\sin(\alpha_nx)=\sin(z_nx/L)$.

[Figure 18.2: Eigenvalues of the problem for a bar with a radiating end, located at the intersections of y = tan(z) with the line y = -z/(AL).]
The eigenvalues here are obtained as solutions of a transcendental equation which we cannot solve exactly. Nevertheless, from Figure 18.2 it is clear that there are infinitely many positive eigenvalues, and these can be approximated as closely as we like by numerical techniques. Now the equation for $T$ is
\[ T'+\frac{z_n^2k}{L^2}T=0, \]
with general solution
\[ T_n(t)=c_ne^{-z_n^2kt/L^2}. \]
For each positive integer $n$, let
\[ u_n(x,t)=X_n(x)T_n(t)=c_n\sin\frac{z_nx}{L}\,e^{-z_n^2kt/L^2}. \]
Each of these functions satisfies the heat equation and the boundary conditions. To satisfy the initial condition, we must generally employ a superposition
\[ u(x,t)=\sum_{n=1}^{\infty}c_n\sin\frac{z_nx}{L}\,e^{-z_n^2kt/L^2} \]
and choose the $c_n$'s so that
\[ u(x,0)=\sum_{n=1}^{\infty}c_n\sin\frac{z_nx}{L}=f(x). \]
This is not a Fourier sine series. It is, however, an expansion of the initial temperature function in eigenfunctions of the Sturm–Liouville problem for $X$. From Section 16.3.3, choose
\[ c_n=\frac{\int_0^Lf(\xi)\sin(z_n\xi/L)\,d\xi}{\int_0^L\sin^2(z_n\xi/L)\,d\xi}. \]
The solution is
\[ u(x,t)=\sum_{n=1}^{\infty}\frac{\int_0^Lf(\xi)\sin(z_n\xi/L)\,d\xi}{\int_0^L\sin^2(z_n\xi/L)\,d\xi}\,\sin\frac{z_nx}{L}\,e^{-z_n^2kt/L^2}. \]
If we want to compute numerical values of the temperature at different points and times, we must make approximations. As an example, suppose $A=L=1$ and $f(x)=1$ for $0<x<1$. Use Newton's method to solve $\tan z=-z$ approximately to obtain
\[ z_1\approx 2.0288,\quad z_2\approx 4.9132,\quad z_3\approx 7.9787,\quad z_4\approx 11.0855. \]
Using these values, perform numerical integrations to obtain
\[ c_1\approx 1.9207,\quad c_2\approx 2.6593,\quad c_3\approx 4.1457,\quad c_4\approx 5.6329. \]
Using just the first four terms, we have the approximation
\[ u(x,t)\approx 1.9207\sin(2.0288x)e^{-4.1160kt}+2.6593\sin(4.9132x)e^{-24.1395kt} \]
\[ \qquad\qquad +\,4.1457\sin(7.9787x)e^{-63.6597kt}+5.6329\sin(11.0855x)e^{-122.8883kt}. \]
Depending on the magnitude of $k$, these exponentials may decay so fast that these first few terms suffice for some applications.
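The roots $z_n$ of $\tan z=-z/(AL)$ are easy to bracket: Figure 18.2 shows that the $n$th intersection lies between the vertical asymptote at $(2n-1)\pi/2$ and the zero of $\tan z$ at $n\pi$. A bisection sketch (illustrative; not the Newton iteration used in the text) reproduces the $z_n$ values quoted above for $A=L=1$.

```python
import math

def radiating_roots(count, AL=1.0):
    """First `count` positive roots of tan(z) = -z/(AL), one in each
    interval ((2n-1)pi/2, n pi), found by bisection on
    g(z) = AL*sin(z) + z*cos(z), which has the same roots there."""
    g = lambda z: AL * math.sin(z) + z * math.cos(z)
    roots = []
    for n in range(1, count + 1):
        lo = (2 * n - 1) * math.pi / 2 + 1e-9   # just past the asymptote
        hi = n * math.pi - 1e-9                 # just before tan's zero
        for _ in range(80):
            mid = 0.5 * (lo + hi)
            if g(lo) * g(mid) <= 0:
                hi = mid
            else:
                lo = mid
        roots.append(0.5 * (lo + hi))
    return roots
```

Multiplying $\tan z + z/(AL)=0$ through by $AL\cos z$ (nonzero inside each bracketing interval) gives the polynomial-free form $g(z)=0$ used above, which avoids evaluating $\tan z$ near its asymptotes.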
18.2.4 Transformations of Boundary Value Problems Involving the Heat Equation

Depending on the partial differential equation and the boundary conditions, it may be impossible to separate the variables in a boundary value problem involving the heat equation. Here is an example of a strategy that works for some problems.
Heat Conduction in a Bar with Ends at Different Temperatures  Consider a thin, homogeneous bar extending from $x=0$ to $x=L$. The left end is maintained at constant temperature $T_1$, and the right end at constant temperature $T_2$. The initial temperature throughout the bar in the cross section at $x$ is $f(x)$. The boundary value problem modeling this setting is
\[ \frac{\partial u}{\partial t}=k\frac{\partial^2u}{\partial x^2}\quad\text{for }0<x<L,\ t>0, \]
\[ u(0,t)=T_1,\qquad u(L,t)=T_2\quad\text{for }t>0, \]
\[ u(x,0)=f(x)\quad\text{for }0\le x\le L. \]
We assume that $T_1$ and $T_2$ are not both zero. Attempt a separation of variables by putting $u(x,t)=X(x)T(t)$ into the heat equation to obtain
\[ X''+\lambda X=0,\qquad T'+\lambda kT=0. \]
The variables have been separated. However, we must satisfy $u(0,t)=X(0)T(t)=T_1$. If $T_1=0$, this equation is satisfied by making $X(0)=0$. If, however, $T_1\ne0$, then $T(t)=T_1/X(0)=\text{constant}$. Similarly, $u(L,t)=X(L)T(t)=T_2$ forces $T(t)=T_2/X(L)=\text{constant}$. These conditions are impossible to satisfy except in trivial cases (such as $f(x)=0$ and $T_1=T_2=0$).

We will perturb the temperature distribution function with the idea of obtaining a more tractable problem for the perturbed function. Set
\[ u(x,t)=U(x,t)+\psi(x). \]
Substitute this into the heat equation to get
\[ \frac{\partial U}{\partial t}=k\left(\frac{\partial^2U}{\partial x^2}+\psi''(x)\right). \]
This is the standard heat equation if we choose $\psi$ so that $\psi''(x)=0$. This means that $\psi$ must have the form
\[ \psi(x)=cx+d. \]
Now $u(0,t)=T_1=U(0,t)+\psi(0)$ becomes the friendlier condition $U(0,t)=0$ if $\psi(0)=T_1$. Thus choose $d=T_1$, so that so far $\psi(x)=cx+T_1$. Next, $u(L,t)=T_2=U(L,t)+\psi(L)$ becomes $U(L,t)=0$ if $\psi(L)=cL+T_1=T_2$, so choose
\[ c=\frac{1}{L}(T_2-T_1). \]
Thus let
\[ \psi(x)=\frac{1}{L}(T_2-T_1)x+T_1. \]
Finally, $u(x,0)=f(x)=U(x,0)+\psi(x)$ becomes the following initial condition for $U$:
\[ U(x,0)=f(x)-\psi(x). \]
We now have a boundary value problem for $U$:
\[ \frac{\partial U}{\partial t}=k\frac{\partial^2U}{\partial x^2}, \]
\[ U(0,t)=U(L,t)=0, \]
\[ U(x,0)=f(x)-\frac{1}{L}(T_2-T_1)x-T_1. \]
We know the solution of this problem (equation (18.1)), and can immediately write
\[ U(x,t)=\sum_{n=1}^{\infty}\left(\frac{2}{L}\int_0^L\Bigl[f(\xi)-\frac{1}{L}(T_2-T_1)\xi-T_1\Bigr]\sin\frac{n\pi\xi}{L}\,d\xi\right)\sin\frac{n\pi x}{L}\,e^{-n^2\pi^2kt/L^2}. \]
Once we obtain $U(x,t)$, the solution of the original problem is
\[ u(x,t)=U(x,t)+\frac{1}{L}(T_2-T_1)x+T_1. \]
Physically we may regard this solution as a decomposition of the temperature distribution into a transient part and a steady-state part. The transient part is $U(x,t)$, which decays to zero as $t$ increases. The other term, $\psi(x)$, equals $\lim_{t\to\infty}u(x,t)$ and is the steady-state part. It is independent of time, representing the limiting value which the temperature approaches in the long term. Such decompositions are seen in many physical systems. For example, in a typical electrical circuit the current can be written as a transient part, which decays to zero as time increases, plus a steady-state part, which is the limit of the current function as $t\to\infty$.
EXAMPLE 18.3
Suppose, in the above discussion, $T_1=1$, $T_2=2$, and $f(x)=\tfrac32$ for $0<x<L$. Compute
\[ \frac{2}{L}\int_0^L\Bigl[f(\xi)-\frac{1}{L}(T_2-T_1)\xi-T_1\Bigr]\sin\frac{n\pi\xi}{L}\,d\xi=\frac{2}{L}\int_0^L\Bigl[\frac12-\frac{\xi}{L}\Bigr]\sin\frac{n\pi\xi}{L}\,d\xi=\frac{1+(-1)^n}{n\pi}. \]
The solution in this case is
\[ u(x,t)=\sum_{n=1}^{\infty}\frac{1+(-1)^n}{n\pi}\sin\frac{n\pi x}{L}\,e^{-n^2\pi^2kt/L^2}+\frac{x}{L}+1. \]
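The transient/steady-state split in Example 18.3 can be observed directly. In this sketch ($L=k=1$ are illustrative choices of ours) a partial sum of the series starts out at the initial value $3/2$ and relaxes onto the steady-state line $\psi(x)=x/L+1$.

```python
import math

def u_mixed_ends(x, t, L=1.0, k=1.0, terms=2000):
    """Partial sum of the Example 18.3 solution: ends held at T1 = 1
    and T2 = 2, constant initial temperature f(x) = 3/2."""
    steady = x / L + 1.0  # psi(x), the steady-state part
    s = 0.0
    for n in range(2, terms + 1, 2):  # (1 + (-1)^n) vanishes for odd n
        s += (2.0 / (n * math.pi)) * math.sin(n * math.pi * x / L) * \
             math.exp(-n * n * math.pi**2 * k * t / L**2)
    return steady + s
```

For even modest $t$ the transient sum is negligible and the temperature profile is indistinguishable from the straight line joining the end temperatures.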
18.2.5 A Nonhomogeneous Heat Equation

In this section we will consider a nonhomogeneous heat conduction problem on a finite interval:
\[ \frac{\partial u}{\partial t}=k\frac{\partial^2u}{\partial x^2}+F(x,t)\quad\text{for }0<x<L,\ t>0, \]
\[ u(0,t)=u(L,t)=0\quad\text{for }t\ge0, \]
\[ u(x,0)=f(x)\quad\text{for }0\le x\le L. \]
The term $F(x,t)$ could, for example, account for a heat source within the medium. It is easy to check that separation of variables does not work for this heat equation. To develop another approach, go back to the simple case $F(x,t)=0$. In that event we found a solution
\[ u(x,t)=\sum_{n=1}^{\infty}b_n\sin\frac{n\pi x}{L}\,e^{-n^2\pi^2kt/L^2}, \]
in which $b_n$ is the $n$th coefficient in the Fourier sine expansion of $f(x)$ on $[0,L]$. Taking a cue from this, we will attempt a solution of the current problem of the form
\[ u(x,t)=\sum_{n=1}^{\infty}T_n(t)\sin\frac{n\pi x}{L}. \tag{18.3} \]
The problem is to determine each $T_n(t)$. The strategy for doing this is to derive a differential equation for $T_n(t)$. If $t$ is fixed, then the left side of equation (18.3) is just a function of $x$, and the right side is its Fourier sine expansion on $[0,L]$. We know the coefficients in this expansion, so
\[ T_n(t)=\frac{2}{L}\int_0^Lu(\xi,t)\sin\frac{n\pi\xi}{L}\,d\xi. \tag{18.4} \]
Now assume that, for any choice of $t\ge0$, $F(x,t)$, thought of as a function of $x$, can also be expanded in a Fourier sine series on $[0,L]$:
\[ F(x,t)=\sum_{n=1}^{\infty}B_n(t)\sin\frac{n\pi x}{L}, \tag{18.5} \]
where
\[ B_n(t)=\frac{2}{L}\int_0^LF(\xi,t)\sin\frac{n\pi\xi}{L}\,d\xi. \tag{18.6} \]
This coefficient may of course depend on $t$. Differentiate equation (18.4) to get
\[ T_n'(t)=\frac{2}{L}\int_0^L\frac{\partial u}{\partial t}(\xi,t)\sin\frac{n\pi\xi}{L}\,d\xi. \tag{18.7} \]
Substitute for $\partial u/\partial t$ from the heat equation to get
\[ T_n'(t)=\frac{2k}{L}\int_0^L\frac{\partial^2u}{\partial\xi^2}(\xi,t)\sin\frac{n\pi\xi}{L}\,d\xi+\frac{2}{L}\int_0^LF(\xi,t)\sin\frac{n\pi\xi}{L}\,d\xi. \]
In view of equation (18.6), this equation becomes
\[ T_n'(t)=\frac{2k}{L}\int_0^L\frac{\partial^2u}{\partial\xi^2}(\xi,t)\sin\frac{n\pi\xi}{L}\,d\xi+B_n(t). \tag{18.8} \]
Now apply integration by parts twice to the integral on the right side of equation (18.8), at the end making use of the boundary conditions and of equation (18.4):
\[ \int_0^L\frac{\partial^2u}{\partial\xi^2}\sin\frac{n\pi\xi}{L}\,d\xi=\left[\frac{\partial u}{\partial\xi}\sin\frac{n\pi\xi}{L}\right]_0^L-\frac{n\pi}{L}\int_0^L\frac{\partial u}{\partial\xi}\cos\frac{n\pi\xi}{L}\,d\xi \]
\[ =-\frac{n\pi}{L}\left(\left[u\cos\frac{n\pi\xi}{L}\right]_0^L+\frac{n\pi}{L}\int_0^Lu\sin\frac{n\pi\xi}{L}\,d\xi\right) \]
\[ =-\frac{n^2\pi^2}{L^2}\int_0^Lu(\xi,t)\sin\frac{n\pi\xi}{L}\,d\xi=-\frac{n^2\pi^2}{2L}T_n(t), \]
since $u(0,t)=u(L,t)=0$. Substitute this into equation (18.8) to get, for $n=1,2,\dots$, a first-order ordinary differential equation for $T_n(t)$:
\[ T_n'(t)+\frac{n^2\pi^2k}{L^2}T_n(t)=B_n(t). \]
Next, use equation (18.4) to get the initial condition
\[ T_n(0)=\frac{2}{L}\int_0^Lu(\xi,0)\sin\frac{n\pi\xi}{L}\,d\xi=\frac{2}{L}\int_0^Lf(\xi)\sin\frac{n\pi\xi}{L}\,d\xi=b_n, \]
the $n$th coefficient in the Fourier sine expansion of $f(x)$ on $[0,L]$. Solve the differential equation for $T_n(t)$ subject to this condition to get
\[ T_n(t)=\int_0^te^{-n^2\pi^2k(t-\tau)/L^2}B_n(\tau)\,d\tau+b_ne^{-n^2\pi^2kt/L^2}. \]
Finally, substitute this into equation (18.3) to obtain the solution
\[ u(x,t)=\sum_{n=1}^{\infty}\left(\int_0^te^{-n^2\pi^2k(t-\tau)/L^2}B_n(\tau)\,d\tau\right)\sin\frac{n\pi x}{L} \]
\[ \qquad+\sum_{n=1}^{\infty}\left(\frac{2}{L}\int_0^Lf(\xi)\sin\frac{n\pi\xi}{L}\,d\xi\right)\sin\frac{n\pi x}{L}\,e^{-n^2\pi^2kt/L^2}. \tag{18.9} \]
Notice that the last term is the solution of the problem when the term $F(x,t)$ is absent, while the first term is the effect of the source term on the solution.
EXAMPLE 18.4
Solve the problem
\[ \frac{\partial u}{\partial t}=4\frac{\partial^2u}{\partial x^2}+xt\quad\text{for }0<x<\pi,\ t>0, \]
\[ u(0,t)=u(\pi,t)=0\quad\text{for }t\ge0, \]
\[ u(x,0)=f(x)=\begin{cases}20 & \text{for }0\le x\le\pi/4,\\ 0 & \text{for }\pi/4<x\le\pi.\end{cases} \]
Since we have a formula for the solution, we need only carry out the required integrations. First compute
\[ B_n(t)=\frac{2}{\pi}\int_0^{\pi}\xi t\sin(n\xi)\,d\xi=\frac{2(-1)^{n+1}}{n}t. \]
Now we can evaluate
\[ \int_0^te^{-4n^2(t-\tau)}B_n(\tau)\,d\tau=\frac{2(-1)^{n+1}}{n}\int_0^t\tau e^{-4n^2(t-\tau)}\,d\tau=\frac{(-1)^{n+1}}{8n^5}\bigl(-1+4n^2t+e^{-4n^2t}\bigr). \]
Finally, we need
\[ b_n=\frac{2}{\pi}\int_0^{\pi/4}20\sin(n\xi)\,d\xi=\frac{40}{\pi}\,\frac{1-\cos(n\pi/4)}{n}. \]
We can now write the solution
\[ u(x,t)=\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{8n^5}\bigl(-1+4n^2t+e^{-4n^2t}\bigr)\sin(nx)+\sum_{n=1}^{\infty}\frac{40}{\pi}\,\frac{1-\cos(n\pi/4)}{n}\sin(nx)e^{-4n^2t}. \]
The second term on the right is the solution of the problem with the term $xt$ deleted from the heat equation. Denote this "no-source" solution as
\[ u_0(x,t)=\sum_{n=1}^{\infty}\frac{40}{\pi}\,\frac{1-\cos(n\pi/4)}{n}\sin(nx)e^{-4n^2t}. \]
The solution with the source term is then
\[ u(x,t)=u_0(x,t)+\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{8n^5}\bigl(-1+4n^2t+e^{-4n^2t}\bigr)\sin(nx). \]
To gauge the effect on the solution of the term $xt$ in the heat equation, Figures 18.3(a) through (d) show graphs of $u(x,t)$ and $u_0(x,t)$ at times $t=0.3$, $0.8$, $1.2$, and $1.32$. Both solutions decay to zero quite rapidly as time increases. This is shown in Figure 18.4, which shows the evolution of $u_0(x,t)$ over these times, and in Figure 18.5, which follows $u(x,t)$. The effect of the $xt$ term is to retard this decay. Other terms $F(x,t)$ would of course have different effects.
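The closed-form evaluation of the source integral in Example 18.4 can be verified by quadrature. This sketch (the trapezoid step count is an illustrative choice) compares the two.

```python
import math

def source_closed(n, t):
    """Closed form of int_0^t exp(-4 n^2 (t - tau)) B_n(tau) d tau, with
    B_n(tau) = 2 (-1)^(n+1) tau / n, as computed in Example 18.4."""
    return (-1) ** (n + 1) * (-1 + 4 * n * n * t + math.exp(-4 * n * n * t)) / (8 * n**5)

def source_quad(n, t, steps=4000):
    """The same integral by the composite trapezoid rule."""
    h = t / steps
    total = 0.0
    for i in range(steps + 1):
        tau = i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * math.exp(-4 * n * n * (t - tau)) * (2 * (-1) ** (n + 1) * tau / n)
    return h * total
```

The $1/n^5$ factor in the closed form shows why the source contribution is dominated by its first few modes.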
[Figure 18.3(a)–(d): Comparison of the solutions with and without the source term at times t = 0.3, 0.8, 1.2, and 1.32.]

[Figure 18.4: Evolution of the no-source solution u₀(x,t) at times t = 0.3, 0.8, 1.2, and 1.32.]

[Figure 18.5: Evolution of u(x,t) at times t = 0.3, 0.8, 1.2, and 1.32.]

18.2.6 Effects of Boundary Conditions and Constants on Heat Conduction

We have solved several problems involving heat conduction in a thin homogeneous bar of finite length. As we did with wave motion on an interval, computing power enables us to examine the effects that the various constants and terms appearing in these problems have on the behavior of the solutions.
EXAMPLE 18.5
Consider a thin bar of length $\pi$ whose initial temperature is given by $f(x)=x^2\cos(x/2)$. The ends of the bar are maintained at zero temperature. The temperature function satisfies
\[ \frac{\partial u}{\partial t}=k\frac{\partial^2u}{\partial x^2}\quad\text{for }0<x<\pi,\ t>0, \]
\[ u(0,t)=u(\pi,t)=0\quad\text{for }t>0, \]
\[ u(x,0)=x^2\cos(x/2)\quad\text{for }0\le x\le\pi. \]
The solution is
\[ u(x,t)=\sum_{n=1}^{\infty}\left(\frac{2}{\pi}\int_0^{\pi}\xi^2\cos(\xi/2)\sin(n\xi)\,d\xi\right)\sin(nx)e^{-n^2kt} \]
\[ =\frac{4}{\pi}\sum_{n=1}^{\infty}\frac{16n\pi(-1)^n-64n^3\pi(-1)^n-48n-64n^3}{64n^6-48n^4+12n^2-1}\sin(nx)e^{-n^2kt}. \]
We can examine the effects of the diffusivity constant $k$ on this solution by drawing graphs of $y=u(x,t)$ for various times, with different choices of this constant. Figure 18.6(a) shows the temperature distributions at time $t=0.2$, for $k$ taking the values $0.3$, $0.6$, $1.1$, and $2.7$. Figure 18.6(b) shows the temperature distributions at time $t=1.2$ for these values of $k$.
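The closed-form coefficients in Example 18.5 can be checked against direct numerical integration of $(2/\pi)\int_0^{\pi}\xi^2\cos(\xi/2)\sin(n\xi)\,d\xi$. A quadrature sketch:

```python
import math

def c_closed(n):
    """Closed-form Fourier sine coefficient of x^2 cos(x/2) on [0, pi],
    as in Example 18.5."""
    num = (16 * n * math.pi * (-1) ** n - 64 * n**3 * math.pi * (-1) ** n
           - 48 * n - 64 * n**3)
    den = 64 * n**6 - 48 * n**4 + 12 * n**2 - 1   # = (4 n^2 - 1)^3
    return (4 / math.pi) * num / den

def c_quad(n, steps=20000):
    """(2/pi) int_0^pi xi^2 cos(xi/2) sin(n xi) d xi, trapezoid rule."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps + 1):
        xi = i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * xi * xi * math.cos(xi / 2) * math.sin(n * xi)
    return (2 / math.pi) * h * total
```

Note that the denominator factors as $(4n^2-1)^3$, so the coefficients decay like $1/n^3$, consistent with an initial temperature that vanishes at both ends but has a corner in its odd extension.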
[Figure 18.6(a): Solution at time t = 0.2 with k = 0.3, 0.6, 1.1, and 2.7. Figure 18.6(b): Solution at time t = 1.2 with the same values of k.]
EXAMPLE 18.6
What difference does it make in the temperature distribution whether the ends are insulated or kept at temperature zero? Consider an initial temperature function $f(x)=x^2(\pi-x)$ on a bar of length $\pi$, and let the diffusivity be $k=\tfrac14$. The solution if the ends are kept at temperature zero is
\[ u_1(x,t)=\sum_{n=1}^{\infty}\frac{8(-1)^{n+1}-4}{n^3}\sin(nx)e^{-n^2t/4}. \]
The solution if the ends are insulated is
\[ u_2(x,t)=\frac{\pi^3}{12}+\frac{2}{\pi}\sum_{n=1}^{\infty}\frac{n^2\pi^2(-1)^{n+1}+6(-1)^n-6}{n^4}\cos(nx)e^{-n^2t/4}. \]
Figures 18.7(a) through (d) compare these two solutions at different values of the time. Figure 18.8(a) shows the evolution of the solution with zero end temperatures, and Figure 18.8(b) shows this evolution for the solution with insulated ends.
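Similarly, the sine coefficients quoted in Example 18.6 for $f(x)=x^2(\pi-x)$ have the closed form $(8(-1)^{n+1}-4)/n^3$, which the following quadrature sketch confirms.

```python
import math

def b_closed(n):
    """Fourier sine coefficient of x^2 (pi - x) on [0, pi], Example 18.6."""
    return (8 * (-1) ** (n + 1) - 4) / n**3

def b_quad(n, steps=20000):
    """(2/pi) int_0^pi xi^2 (pi - xi) sin(n xi) d xi, trapezoid rule."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps + 1):
        xi = i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * xi * xi * (math.pi - xi) * math.sin(n * xi)
    return (2 / math.pi) * h * total
```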
18.2.7 Numerical Approximation of Solutions

Consider the standard heat conduction problem
\[ \frac{\partial u}{\partial t}=k\frac{\partial^2u}{\partial x^2}\quad\text{for }0<x<L,\ t>0, \]
\[ u(0,t)=u(L,t)=0\quad\text{for }t\ge0, \]
\[ u(x,0)=f(x)\quad\text{for }0\le x\le L. \]
One strategy for computing a numerical approximation of the solution is to begin by forming a grid over the $x,t$-strip $0\le x\le L$, $t\ge0$, as we did with the wave equation on a bounded interval.
[Figure 18.7(a)–(d): Comparison of the solution u₂ (insulated ends) with the solution u₁ (ends kept at zero temperature) at times t = 0.4, 0.9, 1.5, and 3.6.]

[Figure 18.8(a): Evolution with time of the solution with ends kept at zero temperature. Figure 18.8(b): Evolution of the solution with insulated ends.]

Choose $\Delta x=L/N$, where $N$ is a positive integer, and let $x_j=j\,\Delta x$ for $j=0,1,\dots,N$. Also choose $\Delta t>0$. This defines lattice points $(x_j,t_k)=(j\,\Delta x,k\,\Delta t)$. Denote the approximate value of $u(x_j,t_k)$ by $u_{j,k}$. Use a centered difference approximation for the second space derivative. Because the partial derivative in $t$ in the heat equation is first order, use a forward difference approximation for $\partial u/\partial t$ on the left. The heat equation is replaced with
\[ \frac{u_{j,k+1}-u_{j,k}}{\Delta t}=k\,\frac{u_{j+1,k}-2u_{j,k}+u_{j-1,k}}{(\Delta x)^2}. \]
Solve this equation for $u_{j,k+1}$:
\[ u_{j,k+1}=\frac{k\,\Delta t}{(\Delta x)^2}\bigl(u_{j+1,k}-2u_{j,k}+u_{j-1,k}\bigr)+u_{j,k}. \]
This enables us to approximate solution values at lattice points on the $(k+1)$st horizontal level from information at the next lower level, where approximations have already been made (Figure 18.9).

[Figure 18.9: Approximation of u(x_j, t_{k+1}) is based on approximate values at the three points (x_{j-1}, t_k), (x_j, t_k), and (x_{j+1}, t_k) in the t_k layer.]

Since we are moving up the layers of lattice points, filling in approximations at each layer from the layer below, there must be a starting layer at which we already have information. Data for a starting layer are provided by the initial and boundary conditions:
\[ u_{0,k}=u_{N,k}=0 \]
(values at lattice points on the left and right sides of the strip), and
\[ u_{j,0}=f(x_j)=f(j\,\Delta x). \]
These values are indicated in Figure 18.10.

[Figure 18.10: Boundary data give exact values of u(x,t) at lattice points on the boundary of the strip.]

The quantity $k\,\Delta t/(\Delta x)^2$ should be less than $1/2$ to ensure stability of the method.
EXAMPLE 18.7
Consider the problem
\[ \frac{\partial u}{\partial t}=\frac{\partial^2u}{\partial x^2}\quad\text{for }0<x<1,\ t>0, \]
\[ u(0,t)=u(1,t)=0, \]
\[ u(x,0)=x(1-x)\quad\text{for }0<x<1. \]
This has exact solution
\[ u(x,t)=\frac{8}{\pi^3}\sum_{n=1}^{\infty}\frac{1}{(2n-1)^3}\sin\bigl((2n-1)\pi x\bigr)e^{-(2n-1)^2\pi^2t}. \]
To make numerical approximations, we will choose $\Delta x=0.1$ (so $N=10$) and $\Delta t=0.0025$. In this example $k=1$, so $k\,\Delta t/(\Delta x)^2=1/4<1/2$. We know that
\[ u_{0,k}=u_{10,k}=0. \]
Further,
\[ u_{j,0}=f(j\,\Delta x)=j(0.1)\bigl(1-j(0.1)\bigr). \]
This initiates the approximation. These values are filled in at the lowest ($t=0$) level of lattice points in Figure 18.11. To move from one horizontal layer to the next one up (according to the idea of Figure 18.9), use
\[ u_{j,k+1}=0.25\bigl(u_{j+1,k}-2u_{j,k}+u_{j-1,k}\bigr)+u_{j,k}. \]
From here go on to the $k=1$ ($t=0.0025$) level, obtaining the values shown in Figure 18.12. Figure 18.13 shows the next level, $k=2$, or $t=0.005$, and Figure 18.14 shows the $k=3$, or $t=0.0075$, level. Proceeding in this way, we can fill in approximate values at lattice points on any vertical level of the lattice.
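The marching scheme of Example 18.7 is a few lines of code. This sketch (the function and variable names are ours, not the text's) reproduces the lattice values shown in Figures 18.11 through 18.14.

```python
def heat_explicit(f, L=1.0, k=1.0, N=10, dt=0.0025, layers=4):
    """Forward-time, centered-space scheme for the heat equation with
    zero boundary temperatures.  Returns the list of t-layers, starting
    with the t = 0 layer given by the initial temperature f."""
    dx = L / N
    r = k * dt / dx**2
    assert r < 0.5, "stability condition k*dt/dx^2 < 1/2 violated"
    u = [f(j * dx) for j in range(N + 1)]
    u[0] = u[N] = 0.0                      # boundary temperatures
    history = [u[:]]
    for _ in range(layers):
        # build the next layer from the current one
        u = [0.0] + [r * (u[j + 1] - 2 * u[j] + u[j - 1]) + u[j]
                     for j in range(1, N)] + [0.0]
        history.append(u[:])
    return history

grid = heat_explicit(lambda x: x * (1 - x))  # Example 18.7
```

Here $r=1/4$, and the first computed layer matches Figure 18.12: for instance $u_{1,1}=0.25(0.16-0.18+0)+0.09=0.085$.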
[Figures 18.11–18.14 tabulate the approximate values u_{j,k} at the first few time levels, for j = 0, 1, …, 10:

t₀ = 0:      0, 0.09, 0.16, 0.21, 0.24, 0.25, 0.24, 0.21, 0.16, 0.09, 0
t₁ = 0.0025: 0, 0.085, 0.155, 0.205, 0.235, 0.245, 0.235, 0.205, 0.155, 0.085, 0
t₂ = 0.005:  0, 0.081, 0.150, 0.200, 0.230, 0.240, 0.230, 0.200, 0.150, 0.081, 0
t₃ = 0.0075: 0, 0.078, 0.145, 0.195, 0.225, 0.235, 0.225, 0.195, 0.145, 0.078, 0
t₄ = 0.01:   0, 0.075, 0.141, 0.190, 0.220, 0.230, 0.220, 0.190, 0.141, 0.075, 0

Figure 18.11 shows the known values of u_{j,0}, u_{0,k}, and u_{10,k} at the boundary lattice points; Figure 18.12 shows the approximate values at the t₁ level computed from the known values at the t₀ = 0 level; Figure 18.13 shows the t₂ level computed from the t₁ level; Figure 18.14 shows the approximate values of the solution u(x,t) at successive t-levels.]
SECTION 18.2
PROBLEMS
In each of Problems 1 through 7, write a solution of the boundary value problem. Graph the twentieth partial sum of the temperature distribution function on the same set of axes for different values of the time.

1. $\partial u/\partial t=k\,\partial^2u/\partial x^2$ for $0<x<L$, $t>0$; $u(0,t)=u(L,t)=0$ for $t\ge0$; $u(x,0)=x(L-x)$ for $0\le x\le L$

2. $\partial u/\partial t=4\,\partial^2u/\partial x^2$ for $0<x<L$, $t>0$; $u(0,t)=u(L,t)=0$ for $t\ge0$; $u(x,0)=x^2(L-x)$ for $0\le x\le L$

3. $\partial u/\partial t=3\,\partial^2u/\partial x^2$ for $0<x<L$, $t>0$; $u(0,t)=u(L,t)=0$ for $t\ge0$; $u(x,0)=L\bigl(1-\cos(2\pi x/L)\bigr)$ for $0\le x\le L$

4. $\partial u/\partial t=\partial^2u/\partial x^2$ for $0<x<\pi$, $t>0$; $\partial u/\partial x(0,t)=\partial u/\partial x(\pi,t)=0$ for $t\ge0$; $u(x,0)=\sin x$ for $0\le x\le\pi$

5. $\partial u/\partial t=4\,\partial^2u/\partial x^2$ for $0<x<2$, $t>0$; $\partial u/\partial x(0,t)=\partial u/\partial x(2,t)=0$ for $t\ge0$; $u(x,0)=x(2-x)$ for $0\le x\le2$

6. $\partial u/\partial t=4\,\partial^2u/\partial x^2$ for $0<x<3$, $t>0$; $\partial u/\partial x(0,t)=\partial u/\partial x(3,t)=0$ for $t\ge0$; $u(x,0)=x^2$ for $0\le x\le3$

7. $\partial u/\partial t=2\,\partial^2u/\partial x^2$ for $0<x<6$, $t>0$; $\partial u/\partial x(0,t)=\partial u/\partial x(6,t)=0$ for $t\ge0$; $u(x,0)=e^{-x}$ for $0\le x\le6$
8. A thin, homogeneous bar of length L has insulated ends and initial temperature B, a positive constant. Find the temperature distribution in the bar.
9. A thin, homogeneous bar of length $L$ has initial temperature equal to a constant $B$, and the right end $x=L$ is insulated, while the left end is kept at temperature zero. Find the temperature distribution in the bar.

10. A thin, homogeneous bar of thermal diffusivity $9$, length $2$ cm, and insulated sides has its left end maintained at temperature zero, while its right end is perfectly insulated. The bar has an initial temperature $f(x)=x^2$ for $0\le x\le2$. Determine the temperature distribution in the bar. What is $\lim_{t\to\infty}u(x,t)$?

11. Show that the partial differential equation
\[ \frac{\partial u}{\partial t}=k\frac{\partial^2u}{\partial x^2}+A\frac{\partial u}{\partial x}+Bu \]
can be transformed into a standard heat equation by choosing $\alpha$ and $\beta$ appropriately and letting $u(x,t)=e^{\alpha x+\beta t}v(x,t)$.

12. Use the idea of Problem 11 to solve
\[ \frac{\partial u}{\partial t}=\frac{\partial^2u}{\partial x^2}+4\frac{\partial u}{\partial x}+2u\quad\text{for }0<x<\pi,\ t>0, \]
with $u(0,t)=u(\pi,t)=0$ for $t\ge0$ and $u(x,0)=x(\pi-x)$ for $0\le x\le\pi$.

13. Use the idea of Problem 11 to solve
\[ \frac{\partial u}{\partial t}=\frac{\partial^2u}{\partial x^2}+6\frac{\partial u}{\partial x}\quad\text{for }0<x<4,\ t>0, \]
with $u(0,t)=u(4,t)=0$ for $t\ge0$ and $u(x,0)=1$ for $0\le x\le4$. Graph the twentieth partial sum of the solution for a selection of times.

14. Use the idea of Problem 11 to solve
\[ \frac{\partial u}{\partial t}=\frac{\partial^2u}{\partial x^2}-6\frac{\partial u}{\partial x}\quad\text{for }0<x<\pi,\ t>0, \]
with $u(0,t)=u(\pi,t)=0$ for $t\ge0$ and $u(x,0)=x^2(\pi-x)$ for $0\le x\le\pi$. Graph the twentieth partial sum of the solution for selected times.
15. Solve
\[ \frac{\partial u}{\partial t}=16\frac{\partial^2u}{\partial x^2}\quad\text{for }0<x<1,\ t>0, \]
with $u(0,t)=2$, $u(1,t)=5$ for $t\ge0$ and $u(x,0)=x^2$ for $0\le x\le1$. Graph the twentieth partial sum of the solution for selected times.

16. Solve
\[ \frac{\partial u}{\partial t}=k\frac{\partial^2u}{\partial x^2}\quad\text{for }0<x<L,\ t>0, \]
with $u(0,t)=T$, $u(L,t)=0$ for $t\ge0$ and $u(x,0)=x(L-x)$ for $0\le x\le L$.

17. Solve
\[ \frac{\partial u}{\partial t}=4\frac{\partial^2u}{\partial x^2}-Au\quad\text{for }0<x<9,\ t>0, \]
with $u(0,t)=u(9,t)=0$ for $t\ge0$ and $u(x,0)=0$ for $0\le x\le9$. Here $A$ is a positive constant. Choose $A=\tfrac14$ and graph the twentieth partial sum of the solution for a selection of times, using the same set of axes. Repeat this for the values $A=\tfrac12$, $A=1$, and $A=3$. This gives some sense of the effect of the $-Au$ term in the heat equation on the behavior of the temperature distribution.

18. Solve
\[ \frac{\partial u}{\partial t}=9\frac{\partial^2u}{\partial x^2}\quad\text{for }0<x<L,\ t>0, \]
with $u(0,t)=T$, $u(L,t)=0$ for $t\ge0$ and $u(x,0)=0$ for $0\le x\le L$.

In each of Problems 19 through 23, solve the problem
\[ \frac{\partial u}{\partial t}=k\frac{\partial^2u}{\partial x^2}+F(x,t)\quad\text{for }0<x<L,\ t>0, \]
with $u(0,t)=u(L,t)=0$ for $t\ge0$ and $u(x,0)=f(x)$ for $0\le x\le L$, for the given $F$, $k$, $L$, and $f$. In each, choose a value of the time and, on the same set of axes, graph the twentieth partial sum of the solution of the given problem together with the twentieth partial sum of the solution of the problem with the source term $F(x,t)$ removed. Repeat this for other times. This yields some sense of the significance of $F(x,t)$ for the behavior of the temperature distribution.

19. $k=4$, $F(x,t)=t$, $f(x)=x(\pi-x)$, $L=\pi$

20. $k=1$, $F(x,t)=x\sin t$, $f(x)=1$, $L=4$

21. $k=1$, $F(x,t)=t\cos x$, $f(x)=x^2(5-x)$, $L=5$

22. $k=4$, $F(x,t)=\begin{cases}K & \text{for }0\le x\le1\\ 0 & \text{for }1<x\le2\end{cases}$, $f(x)=\sin(\pi x/2)$, $L=2$

23. $k=16$, $F(x,t)=xt$, $f(x)=K$, $L=3$

24. Devise a definition of continuous dependence of the solution on the initial data for the problem
\[ \frac{\partial u}{\partial t}=k\frac{\partial^2u}{\partial x^2}\quad\text{for }0<x<L,\ t>0, \]
with $u(0,t)=u(L,t)=0$ for $t>0$ and $u(x,0)=f(x)$ for $0<x<L$. Prove that this problem depends continuously on the initial data.

25. Find approximate solution values for the problem
\[ \frac{\partial u}{\partial t}=\frac{\partial^2u}{\partial x^2}\quad\text{for }0<x<1,\ t>0, \]
with $u(0,t)=u(1,t)=0$ for $t\ge0$ and $u(x,0)=x^2(1-x)$ for $0\le x\le1$. Use $\Delta x=0.1$ and $\Delta t=0.0025$. Carry out calculations for the first four horizontal layers, including the $t=0$ layer.

26. Find approximate solution values for the problem
\[ \frac{\partial u}{\partial t}=\frac{\partial^2u}{\partial x^2}\quad\text{for }0<x<2,\ t>0, \]
with $u(0,t)=u(2,t)=0$ for $t\ge0$ and $u(x,0)=\sin(\pi x/2)$ for $0\le x\le2$. Use $\Delta x=0.2$ and $\Delta t=0.0025$. Carry out calculations for the first four horizontal layers, including the $t=0$ layer.

27. Find approximate solution values for the problem
\[ \frac{\partial u}{\partial t}=\frac{\partial^2u}{\partial x^2}\quad\text{for }0<x<1,\ t>0, \]
with $u(0,t)=u(1,t)=0$ for $t\ge0$ and $u(x,0)=x\cos(\pi x/2)$ for $0\le x\le1$. Use $\Delta x=0.1$ and $\Delta t=0.0025$. Carry out calculations for the first four horizontal layers, including the $t=0$ layer.
18.3 Heat Conduction in Infinite Media

We will now consider problems involving the heat equation with the space variable extending over the entire real line or half-line.

18.3.1 Heat Conduction in an Infinite Bar

For a setting in which the length of the medium is very much greater than the other dimensions, it is sometimes suitable to model heat conduction by imagining the space variable free to vary over the entire real line. Consider the problem
\[ \frac{\partial u}{\partial t}=k\frac{\partial^2u}{\partial x^2}\quad\text{for }-\infty<x<\infty,\ t>0, \]
\[ u(x,0)=f(x)\quad\text{for }-\infty<x<\infty. \]
There are no boundary conditions, so we impose the physically realistic condition that solutions should be bounded. Separate the variables by putting $u(x,t)=X(x)T(t)$ to obtain
\[ X''+\lambda X=0,\qquad T'+\lambda kT=0. \]
The problem for $X$ is the same as that encountered with the wave equation on the line, and the same analysis yields eigenvalues $\lambda=\omega^2$ for $\omega\ge0$ and eigenfunctions of the form $a\cos(\omega x)+b\sin(\omega x)$. The problem for $T$ is $T'+\omega^2kT=0$, with general solution a constant times $e^{-\omega^2kt}$, which is bounded for $t\ge0$. We now have, for $\omega\ge0$, functions
\[ u_\omega(x,t)=\bigl[a_\omega\cos(\omega x)+b_\omega\sin(\omega x)\bigr]e^{-\omega^2kt} \]
that satisfy the heat equation and are bounded on the real line. To satisfy the initial condition, attempt a superposition of these functions over all $\omega\ge0$, which takes the form of an integral:
\[ u(x,t)=\int_0^{\infty}\bigl[a_\omega\cos(\omega x)+b_\omega\sin(\omega x)\bigr]e^{-\omega^2kt}\,d\omega. \tag{18.10} \]
We need
\[ u(x,0)=\int_0^{\infty}\bigl[a_\omega\cos(\omega x)+b_\omega\sin(\omega x)\bigr]\,d\omega=f(x). \]
This is the Fourier integral of $f(x)$ on the real line, leading us to choose the coefficients
\[ a_\omega=\frac{1}{\pi}\int_{-\infty}^{\infty}f(\xi)\cos(\omega\xi)\,d\xi \quad\text{and}\quad b_\omega=\frac{1}{\pi}\int_{-\infty}^{\infty}f(\xi)\sin(\omega\xi)\,d\xi. \]
EXAMPLE 18.8
Suppose the initial temperature function is $f(x)=e^{-|x|}$. Compute the coefficients
\[ a_\omega=\frac{1}{\pi}\int_{-\infty}^{\infty}e^{-|\xi|}\cos(\omega\xi)\,d\xi=\frac{2}{\pi}\,\frac{1}{1+\omega^2} \]
and
\[ b_\omega=\frac{1}{\pi}\int_{-\infty}^{\infty}e^{-|\xi|}\sin(\omega\xi)\,d\xi=0. \]
The solution for this initial temperature distribution is
\[ u(x,t)=\frac{2}{\pi}\int_0^{\infty}\frac{1}{1+\omega^2}\cos(\omega x)e^{-\omega^2kt}\,d\omega. \]

The integral (18.10) for the solution is sometimes written in more compact form, reminiscent of the calculation in Section 17.3.1 for Fourier integral solutions of the wave equation on the entire line. Substitute the integrals for the coefficients into the integral for the solution to write
\[ u(x,t)=\int_0^{\infty}\left[\left(\frac{1}{\pi}\int_{-\infty}^{\infty}f(\xi)\cos(\omega\xi)\,d\xi\right)\cos(\omega x)+\left(\frac{1}{\pi}\int_{-\infty}^{\infty}f(\xi)\sin(\omega\xi)\,d\xi\right)\sin(\omega x)\right]e^{-\omega^2kt}\,d\omega \]
\[ =\frac{1}{\pi}\int_0^{\infty}\int_{-\infty}^{\infty}\bigl[\cos(\omega\xi)\cos(\omega x)+\sin(\omega\xi)\sin(\omega x)\bigr]f(\xi)e^{-\omega^2kt}\,d\xi\,d\omega \]
\[ =\frac{1}{\pi}\int_0^{\infty}\int_{-\infty}^{\infty}\cos(\omega(\xi-x))f(\xi)e^{-\omega^2kt}\,d\xi\,d\omega. \]

A Single Integral Expression for the Solution on the Real Line  Consider again the problem
\[ \frac{\partial u}{\partial t}=k\frac{\partial^2u}{\partial x^2}\quad\text{for }-\infty<x<\infty,\ t>0, \]
\[ u(x,0)=f(x)\quad\text{for }-\infty<x<\infty. \]
We have solved this problem to obtain the double integral expression
\[ u(x,t)=\frac{1}{\pi}\int_0^{\infty}\int_{-\infty}^{\infty}\cos(\omega(\xi-x))f(\xi)e^{-\omega^2kt}\,d\xi\,d\omega. \]
Since the integrand is an even function of $\omega$, $\int_0^{\infty}\cdots d\omega=\frac12\int_{-\infty}^{\infty}\cdots d\omega$, and this solution can also be written
\[ u(x,t)=\frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\cos(\omega(\xi-x))f(\xi)e^{-\omega^2kt}\,d\xi\,d\omega. \]
We will show how this solution can be put in terms of a single integral. We need the following.
For real $\beta$ and $\alpha$, with $\alpha \ne 0$,
$$\int_{-\infty}^\infty e^{-\zeta^2}\cos\!\left(\frac{\beta\zeta}{\alpha}\right)d\zeta = \sqrt{\pi}\,e^{-\beta^2/4\alpha^2}.$$

Proof   Let
$$F(x) = \int_0^\infty e^{-\zeta^2}\cos(x\zeta)\,d\zeta.$$
18.3 Heat Conduction in Infinite Media
One can show that this integral converges for all $x$, as does the integral obtained by interchanging $d/dx$ and $\int_0^\infty \cdots\,d\zeta$. We can therefore compute
$$F'(x) = \int_0^\infty -\zeta e^{-\zeta^2}\sin(x\zeta)\,d\zeta.$$
Integrate by parts to get
$$F'(x) = -\frac{x}{2}F(x).$$
Then
$$\frac{F'(x)}{F(x)} = -\frac{x}{2},$$
and an integration yields
$$\ln|F(x)| = -\frac{1}{4}x^2 + c.$$
Then
$$F(x) = Ae^{-x^2/4}.$$
To evaluate the constant $A$, use the fact that
$$F(0) = A = \int_0^\infty e^{-\zeta^2}\,d\zeta = \frac{\sqrt{\pi}}{2},$$
a result found in many integral tables. Therefore
$$\int_0^\infty e^{-\zeta^2}\cos(x\zeta)\,d\zeta = \frac{\sqrt{\pi}}{2}e^{-x^2/4}.$$
Finally, let $x = \beta/\alpha$ and use the fact that the integrand is even with respect to $\zeta$ to obtain
$$\int_{-\infty}^\infty e^{-\zeta^2}\cos\!\left(\frac{\beta\zeta}{\alpha}\right)d\zeta = 2\int_0^\infty e^{-\zeta^2}\cos\!\left(\frac{\beta\zeta}{\alpha}\right)d\zeta = \sqrt{\pi}\,e^{-\beta^2/4\alpha^2}.$$
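As a numerical check on Lemma 18.1 (an illustration, not part of the text's proof), the integral can be approximated by a midpoint rule on a truncated interval and compared with the closed form; the truncation length and step count below are arbitrary choices.

```python
import math

def gauss_cos_integral(beta, alpha, n=200_000, L=8.0):
    # Midpoint-rule approximation of the Lemma 18.1 integral
    #   integral over the real line of exp(-z^2) * cos(beta*z/alpha) dz,
    # truncated to [-L, L]; the tail beyond |z| = 8 is negligible.
    h = 2 * L / n
    total = 0.0
    for i in range(n):
        z = -L + (i + 0.5) * h
        total += math.exp(-z * z) * math.cos(beta * z / alpha)
    return total * h

beta, alpha = 1.7, 0.9
numeric = gauss_cos_integral(beta, alpha)
closed = math.sqrt(math.pi) * math.exp(-beta**2 / (4 * alpha**2))
print(numeric, closed)
```

Both values agree to many digits, which is what the lemma predicts for any real $\beta$ and nonzero $\alpha$.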
Now let
$$\zeta = \omega\sqrt{kt}, \qquad \beta = x - \xi, \qquad \alpha = \sqrt{kt}.$$
Then $\beta\zeta/\alpha = \omega(x - \xi)$ and
$$\int_{-\infty}^\infty e^{-\omega^2 kt}\cos(\omega(x - \xi))\,\sqrt{kt}\,d\omega = \sqrt{\pi}\,e^{-(x - \xi)^2/4kt}.$$
Then
$$\int_{-\infty}^\infty e^{-\omega^2 kt}\cos(\omega(x - \xi))\,d\omega = \frac{\sqrt{\pi}}{\sqrt{kt}}\,e^{-(x - \xi)^2/4kt}.$$
The solution of the heat conduction problem on the real line is therefore
$$u(x,t) = \frac{1}{2\pi}\int_{-\infty}^\infty\int_{-\infty}^\infty f(\xi)\cos(\omega(\xi - x))e^{-\omega^2 kt}\,d\omega\,d\xi = \frac{1}{2\pi}\int_{-\infty}^\infty \frac{\sqrt{\pi}}{\sqrt{kt}}\,e^{-(x - \xi)^2/4kt}f(\xi)\,d\xi.$$
After some manipulation, this solution is
$$u(x,t) = \frac{1}{2\sqrt{\pi kt}}\int_{-\infty}^\infty e^{-(x - \xi)^2/4kt}f(\xi)\,d\xi.$$
This is simpler than the previously stated solution in the sense of containing only one integral.
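The single-integral form is convenient for computation. As an illustration (the piecewise initial temperature below is a hypothetical choice, not from the text), the kernel integral can be evaluated by quadrature and compared with its closed form in terms of the error function:

```python
import math

def heat_kernel_solution(x, t, a, b, k=1.0, n=100_000):
    # u(x,t) = 1/(2*sqrt(pi*k*t)) * integral over [a,b] of exp(-(x-xi)^2/(4kt)) dxi
    # for initial temperature f = 1 on [a,b] and 0 elsewhere (midpoint rule).
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        xi = a + (i + 0.5) * h
        total += math.exp(-(x - xi) ** 2 / (4 * k * t))
    return total * h / (2 * math.sqrt(math.pi * k * t))

def erf_solution(x, t, a, b, k=1.0):
    # The same solution written in closed form via the error function.
    s = 2 * math.sqrt(k * t)
    return 0.5 * (math.erf((x - a) / s) - math.erf((x - b) / s))

x, t, a, b = 0.3, 0.05, -1.0, 1.0
print(heat_kernel_solution(x, t, a, b), erf_solution(x, t, a, b))
```

The quadrature and the erf expression agree closely, since the substitution $s = (x - \xi)/2\sqrt{kt}$ turns the kernel integral into an error-function difference.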
18.3.2   Heat Conduction in a Semi-Infinite Bar

If we consider heat conduction in a bar extending from 0 to infinity, then there is a boundary condition at the left end. If the temperature is maintained at zero at this end, then the problem is
$$\frac{\partial u}{\partial t} = k\frac{\partial^2 u}{\partial x^2} \quad\text{for } 0 < x < \infty,\ t > 0,$$
$$u(0,t) = 0 \quad\text{for } t \ge 0,$$
$$u(x,0) = f(x) \quad\text{for } 0 < x < \infty.$$
Letting $u(x,t) = X(x)T(t)$, the problems for $X$ and $T$ are
$$X'' + \lambda X = 0, \qquad T' + \lambda kT = 0.$$
If we proceed as we did for the real line, we obtain $\lambda = \omega^2$ for $\omega \ge 0$, and functions
$$X_\omega(x) = a_\omega\cos(\omega x) + b_\omega\sin(\omega x).$$
Now, however, we also have the condition
$$u(0,t) = X(0)T(t) = 0,$$
implying that $X(0) = 0$. Thus we must choose each $a_\omega = 0$, leaving $X_\omega(x) = b_\omega\sin(\omega x)$. Solutions for $T$ have the form of constants times $e^{-\omega^2 kt}$, so for each $\omega > 0$ we have functions
$$u_\omega(x,t) = b_\omega\sin(\omega x)e^{-\omega^2 kt}.$$
Each of these functions satisfies the heat equation and the boundary condition $u(0,t) = 0$. To satisfy the initial condition, write a superposition
$$u(x,t) = \int_0^\infty b_\omega\sin(\omega x)e^{-\omega^2 kt}\,d\omega. \tag{18.11}$$
Now the initial condition requires that
$$u(x,0) = \int_0^\infty b_\omega\sin(\omega x)\,d\omega = f(x),$$
so choose the $b_\omega$'s as the coefficients in the Fourier sine integral of $f(x)$ on $[0,\infty)$:
$$b_\omega = \frac{2}{\pi}\int_0^\infty f(\xi)\sin(\omega\xi)\,d\xi.$$
With this choice of coefficients, the function given by equation (18.11) is the solution of the problem.
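The sine-integral solution (18.11) can be exercised numerically with a step initial temperature (1 on $[0,h]$ and 0 beyond, a hypothetical choice made here for checkability). The comparison curve extends the data oddly across $x = 0$ and applies the heat kernel (the method of images), which is not derived in this section but yields the same solution:

```python
import math

def u_sine_integral(x, t, h, k=1.0, n=200_000, W=40.0):
    # u(x,t) = integral_0^W b_w sin(w x) exp(-w^2 k t) dw (midpoint rule),
    # with b_w = (2/pi)(1 - cos(w h))/w for initial temperature 1 on [0,h].
    dw = W / n
    total = 0.0
    for i in range(n):
        w = (i + 0.5) * dw
        b = (2 / math.pi) * (1 - math.cos(w * h)) / w
        total += b * math.sin(w * x) * math.exp(-w * w * k * t)
    return total * dw

def u_images(x, t, h, k=1.0):
    # The same solution via the heat kernel and an odd reflection across x = 0.
    s = 2 * math.sqrt(k * t)
    return (math.erf(x / s)
            - 0.5 * math.erf((x - h) / s)
            - 0.5 * math.erf((x + h) / s))

x, t, h = 0.7, 0.05, 1.0
print(u_sine_integral(x, t, h), u_images(x, t, h))
```

Truncating the frequency integral at $W = 40$ is harmless here because $e^{-\omega^2 kt}$ is astronomically small beyond that point for the chosen $t$.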
EXAMPLE 18.9

Suppose the initial temperature function is given by
$$f(x) = \begin{cases} \pi - x & \text{for } 0 \le x \le \pi,\\ 0 & \text{for } x > \pi.\end{cases}$$
The coefficients in the solution (18.11) are
$$b_\omega = \frac{2}{\pi}\int_0^\pi (\pi - \xi)\sin(\omega\xi)\,d\xi = \frac{2}{\pi}\,\frac{\pi\omega - \sin(\pi\omega)}{\omega^2}.$$
The solution for this initial temperature function is
$$u(x,t) = \frac{2}{\pi}\int_0^\infty \frac{\pi\omega - \sin(\pi\omega)}{\omega^2}\sin(\omega x)e^{-\omega^2 kt}\,d\omega.$$
18.3.3   Integral Transform Methods for the Heat Equation in an Infinite Medium

As we did with the wave equation on an unbounded domain, we will illustrate the use of Fourier transforms in problems involving the heat equation.

Heat Conduction on the Line

Consider again the problem
$$\frac{\partial u}{\partial t} = k\frac{\partial^2 u}{\partial x^2} \quad\text{for } -\infty < x < \infty,\ t > 0,$$
$$u(x,0) = f(x) \quad\text{for } -\infty < x < \infty,$$
which we have solved by separation of variables. Since $x$ varies over the real line, we can attempt to use the Fourier transform in the $x$ variable. Take the transform of the heat equation to get
$$\mathcal{F}\!\left[\frac{\partial u}{\partial t}\right] = k\,\mathcal{F}\!\left[\frac{\partial^2 u}{\partial x^2}\right].$$
Because $x$ and $t$ are independent, the transform passes through the partial derivative with respect to $t$:
$$\mathcal{F}\!\left[\frac{\partial u}{\partial t}\right](\omega) = \int_{-\infty}^\infty \frac{\partial u}{\partial t}(\xi,t)e^{-i\omega\xi}\,d\xi = \frac{\partial}{\partial t}\int_{-\infty}^\infty u(\xi,t)e^{-i\omega\xi}\,d\xi = \frac{\partial \hat u}{\partial t}(\omega,t).$$
For the transform, in the $x$ variable, of the second partial derivative of $u$ with respect to $x$, use the operational formula:
$$\mathcal{F}\!\left[\frac{\partial^2 u}{\partial x^2}\right](\omega) = -\omega^2\hat u(\omega,t).$$
The transform of the heat equation is therefore
$$\frac{\partial \hat u}{\partial t}(\omega,t) + k\omega^2\hat u(\omega,t) = 0,$$
with general solution
$$\hat u(\omega,t) = a_\omega e^{-\omega^2 kt}.$$
To determine the coefficient $a_\omega$, take the transform of the initial condition to get
$$\hat u(\omega,0) = \hat f(\omega) = a_\omega.$$
Therefore
$$\hat u(\omega,t) = \hat f(\omega)e^{-\omega^2 kt}.$$
This is the Fourier transform of the solution of the problem. To retrieve the solution, apply the inverse Fourier transform:
$$u(x,t) = \mathcal{F}^{-1}\!\left[\hat f(\omega)e^{-\omega^2 kt}\right](x) = \frac{1}{2\pi}\int_{-\infty}^\infty \hat f(\omega)e^{-\omega^2 kt}e^{i\omega x}\,d\omega.$$
Of course, the real part of this expression is $u(x,t)$. To see that this solution agrees with that obtained by separation of variables, insert the integral for $\hat f(\omega)$ to obtain
$$
\begin{aligned}
\frac{1}{2\pi}\int_{-\infty}^\infty \hat f(\omega)e^{-\omega^2 kt}e^{i\omega x}\,d\omega
&= \frac{1}{2\pi}\int_{-\infty}^\infty\left(\int_{-\infty}^\infty f(\xi)e^{-i\omega\xi}\,d\xi\right)e^{i\omega x}e^{-\omega^2 kt}\,d\omega\\
&= \frac{1}{2\pi}\int_{-\infty}^\infty\int_{-\infty}^\infty f(\xi)e^{-i\omega(\xi - x)}e^{-\omega^2 kt}\,d\xi\,d\omega\\
&= \frac{1}{2\pi}\int_{-\infty}^\infty\int_{-\infty}^\infty f(\xi)\cos(\omega(\xi - x))e^{-\omega^2 kt}\,d\xi\,d\omega\\
&\quad - \frac{i}{2\pi}\int_{-\infty}^\infty\int_{-\infty}^\infty f(\xi)\sin(\omega(\xi - x))e^{-\omega^2 kt}\,d\xi\,d\omega.
\end{aligned}
$$
Taking the real part of this expression, we have
$$u(x,t) = \frac{1}{2\pi}\int_{-\infty}^\infty\int_{-\infty}^\infty f(\xi)\cos(\omega(\xi - x))e^{-\omega^2 kt}\,d\xi\,d\omega,$$
the solution obtained by separation of variables.
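The relation $\hat u(\omega,t) = \hat f(\omega)e^{-\omega^2 kt}$ also suggests a discrete computation: take an FFT of sampled initial data, multiply by the Gaussian factor, and invert. The sketch below is a periodic approximation on a truncated interval (the domain width, grid size, and Gaussian initial data are arbitrary choices), compared against the exactly known spreading Gaussian:

```python
import numpy as np

# Periodic spectral solution on [-L, L); for rapidly decaying f this
# approximates the whole-line solution u-hat = f-hat * exp(-w^2 k t).
L, N, k, t = 20.0, 1024, 1.0, 0.3
x = -L + (2 * L / N) * np.arange(N)
f = np.exp(-x**2)

w = 2 * np.pi * np.fft.fftfreq(N, d=2 * L / N)   # discrete transform variable
u = np.fft.ifft(np.fft.fft(f) * np.exp(-w**2 * k * t)).real

# A Gaussian stays Gaussian under the heat equation: variance grows by 2kt.
exact = np.exp(-x**2 / (1 + 4 * k * t)) / np.sqrt(1 + 4 * k * t)
print(np.max(np.abs(u - exact)))
```

Because the Gaussian decays far faster than the box width, the periodic images contribute essentially nothing and the discrete answer matches the analytic one to near machine precision.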
Heat Conduction on the Half-Line

Consider again the problem
$$\frac{\partial u}{\partial t} = k\frac{\partial^2 u}{\partial x^2} \quad\text{for } 0 < x < \infty,\ t > 0,$$
$$u(0,t) = 0 \quad\text{for } t \ge 0,$$
$$u(x,0) = f(x) \quad\text{for } 0 < x < \infty,$$
which we have solved by separation of variables. To illustrate the transform technique, we will solve this problem again using the Fourier sine transform. Take the sine transform of the heat equation with respect to $x$, using the operational formula for the transform of the $\partial^2 u/\partial x^2$ term, to get
$$\frac{\partial \hat u_S}{\partial t}(\omega,t) = -\omega^2 k\hat u_S(\omega,t) + k\omega\,u(0,t).$$
Since $u(0,t) = 0$, this is
$$\frac{\partial \hat u_S}{\partial t}(\omega,t) = -\omega^2 k\hat u_S(\omega,t),$$
with general solution
$$\hat u_S(\omega,t) = b_\omega e^{-\omega^2 kt}.$$
Now $u(x,0) = f(x)$, so
$$\hat u_S(\omega,0) = \hat f_S(\omega) = b_\omega,$$
and therefore
$$\hat u_S(\omega,t) = \hat f_S(\omega)e^{-\omega^2 kt}.$$
This is the sine transform of the solution. For the solution, apply the inverse Fourier sine transform to obtain
$$u(x,t) = \frac{2}{\pi}\int_0^\infty \hat f_S(\omega)e^{-\omega^2 kt}\sin(\omega x)\,d\omega.$$
We leave it for the student to insert the integral expression for $\hat f_S(\omega)$ and show that this solution agrees with that obtained by separation of variables.

Laplace Transform Solution of a Boundary Value Problem

We have illustrated the use of the Fourier transform and Fourier sine transform in solving heat conduction problems. Here is an example in which the Laplace transform is the natural transform to use. Consider the problem on a half-line:
$$\frac{\partial u}{\partial t} = k\frac{\partial^2 u}{\partial x^2} \quad\text{for } x > 0,\ t > 0,$$
$$u(x,0) = A \quad\text{for } x > 0,$$
$$u(0,t) = \begin{cases} B & \text{for } 0 \le t \le t_0,\\ 0 & \text{for } t > t_0,\end{cases}$$
in which $A$, $B$ and $t_0$ are positive constants. This specifies a problem with nonzero constant initial temperature and a discontinuous temperature distribution at the left end of the bar. We can write the boundary condition more neatly in terms of the Heaviside function $H$ defined in Section 3.3.2:
$$u(0,t) = B\left[1 - H(t - t_0)\right].$$
Because of the discontinuity in $u(0,t)$, we think of trying a Laplace transform in $t$. Denote $\mathcal{L}[u(x,t)](s) = U(x,s)$, with $s$ the variable of the transformed function, and $x$ carried along as a parameter. Take the Laplace transform of the heat equation:
$$\mathcal{L}\!\left[\frac{\partial u}{\partial t}\right] = k\,\mathcal{L}\!\left[\frac{\partial^2 u}{\partial x^2}\right].$$
For the transform of $\partial u/\partial t$, use the operational formula for the Laplace transform of a derivative:
$$\mathcal{L}\!\left[\frac{\partial u}{\partial t}\right](s) = sU(x,s) - u(x,0) = sU(x,s) - A.$$
The transform passes through $\partial^2 u/\partial x^2$ because $x$ and $t$ are independent:
$$\mathcal{L}\!\left[\frac{\partial^2 u}{\partial x^2}\right](s) = \int_0^\infty e^{-st}\frac{\partial^2 u}{\partial x^2}(x,t)\,dt = \frac{\partial^2}{\partial x^2}\int_0^\infty e^{-st}u(x,t)\,dt = \frac{\partial^2 U}{\partial x^2}(x,s).$$
Transforming the heat equation therefore yields
$$sU(x,s) - A = k\frac{\partial^2 U}{\partial x^2}(x,s).$$
Write this equation as
$$\frac{\partial^2 U}{\partial x^2}(x,s) - \frac{s}{k}U(x,s) = -\frac{A}{k},$$
a differential equation in $x$, for each $s > 0$. The general solution of this equation is
$$U(x,s) = a_s e^{\sqrt{s/k}\,x} + b_s e^{-\sqrt{s/k}\,x} + \frac{A}{s}.$$
The notation reflects the fact that the coefficients will in general depend on $s$. Now, to have a bounded solution we need $a_s = 0$, since $e^{\sqrt{s/k}\,x} \to \infty$ as $x \to \infty$. Therefore
$$U(x,s) = b_s e^{-\sqrt{s/k}\,x} + \frac{A}{s}. \tag{18.12}$$
To obtain $b_s$, take the Laplace transform of $u(0,t) = B[1 - H(t - t_0)]$ to get
$$U(0,s) = B\,\mathcal{L}[1](s) - B\,\mathcal{L}[H(t - t_0)](s) = B\frac{1}{s} - B\frac{1}{s}e^{-t_0 s}.$$
Then
$$U(0,s) = B\frac{1}{s} - B\frac{1}{s}e^{-t_0 s} = b_s + \frac{A}{s},$$
so
$$b_s = \frac{B - A}{s} - \frac{B}{s}e^{-t_0 s}.$$
Put this into equation (18.12) to get
$$U(x,s) = \left(\frac{B - A}{s} - \frac{B}{s}e^{-t_0 s}\right)e^{-\sqrt{s/k}\,x} + \frac{A}{s}.$$
The solution is now obtained by using the inverse Laplace transform:
$$u(x,t) = \mathcal{L}^{-1}\left[U(x,s)\right](t).$$
This inverse can be calculated using standard tables and makes use of the error function and complementary error function, which see frequent use in statistics. These functions are defined by
$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-\zeta^2}\,d\zeta$$
and
$$\operatorname{erfc}(x) = \frac{2}{\sqrt{\pi}}\int_x^\infty e^{-\zeta^2}\,d\zeta = 1 - \operatorname{erf}(x).$$
We obtain
$$
\begin{aligned}
u(x,t) ={}& \left[A\operatorname{erf}\!\left(\frac{x}{2\sqrt{kt}}\right) + B\operatorname{erfc}\!\left(\frac{x}{2\sqrt{kt}}\right)\right]\left[1 - H(t - t_0)\right]\\
&+ \left[A\operatorname{erf}\!\left(\frac{x}{2\sqrt{kt}}\right) + B\operatorname{erfc}\!\left(\frac{x}{2\sqrt{kt}}\right) - B\operatorname{erfc}\!\left(\frac{x}{2\sqrt{k(t - t_0)}}\right)\right]H(t - t_0).
\end{aligned}
$$
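This piecewise erf/erfc solution can be evaluated directly. The sketch below (with arbitrary sample values of $A$, $B$, $t_0$) checks the expected limiting behavior: $u \approx B$ near the end before $t_0$, $u \approx A$ far from the end, and $u \approx 0$ near the end after the boundary temperature is removed.

```python
import math

def u(x, t, A, B, t0, k=1.0):
    # Piecewise solution from the Laplace-transform calculation:
    # boundary temperature B is applied for 0 <= t <= t0, then removed.
    def phi(tt):
        return x / (2 * math.sqrt(k * tt))
    base = A * math.erf(phi(t)) + B * math.erfc(phi(t))
    if t <= t0:
        return base
    return base - B * math.erfc(phi(t - t0))

A, B, t0 = 2.0, 5.0, 1.0
print(u(1e-9, 0.5, A, B, t0))   # near the end, before t0: close to B
print(u(50.0, 0.5, A, B, t0))   # far from the end: close to A
print(u(1e-9, 3.0, A, B, t0))   # near the end, after t0: close to 0
```

These limits match the boundary and initial data of the problem, which is the usual first sanity check on an inverse-transform computation.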
SECTION 18.3   PROBLEMS

In each of Problems 1 through 4, consider the problem
$$\frac{\partial u}{\partial t} = k\frac{\partial^2 u}{\partial x^2} \quad\text{for } -\infty < x < \infty,\ t > 0,$$
$$u(x,0) = f(x) \quad\text{for } -\infty < x < \infty.$$
Obtain a solution first by separation of variables (Fourier integral), and then again by Fourier transform.

1. $f(x) = e^{-4|x|}$
2. $f(x) = \begin{cases} \sin x & \text{for } |x| \le \pi\\ 0 & \text{for } |x| > \pi\end{cases}$
3. $f(x) = \begin{cases} x & \text{for } 0 \le x \le 4\\ 0 & \text{for } x < 0 \text{ and for } x > 4\end{cases}$
4. $f(x) = \begin{cases} e^{-x} & \text{for } -1 \le x \le 1\\ 0 & \text{for } |x| > 1\end{cases}$

In each of Problems 5 through 8, solve the problem
$$\frac{\partial u}{\partial t} = k\frac{\partial^2 u}{\partial x^2} \quad\text{for } 0 < x < \infty,\ t > 0,$$
$$u(0,t) = 0 \quad\text{for } t \ge 0,$$
$$u(x,0) = f(x) \quad\text{for } 0 < x < \infty.$$

5. $f(x) = e^{-\alpha x}$, with $\alpha$ any positive constant.
6. $f(x) = xe^{-\alpha x}$, with $\alpha > 0$.
7. $f(x) = \begin{cases} 1 & \text{for } 0 \le x \le h\\ 0 & \text{for } x > h\end{cases}$ with $h$ any positive number.
8. $f(x) = \begin{cases} x & \text{for } 0 \le x \le 2\\ 0 & \text{for } x > 2\end{cases}$

In each of Problems 9 and 10, use a Fourier transform on the half-line to obtain a solution.

9. $\dfrac{\partial u}{\partial t} = \dfrac{\partial^2 u}{\partial x^2} - tu$ for $x > 0$, $t > 0$; $u(0,t) = 0$ for $t > 0$; $u(x,0) = xe^{-x}$ for $x > 0$

10. $\dfrac{\partial u}{\partial t} = \dfrac{\partial^2 u}{\partial x^2} - u$ for $x > 0$, $t > 0$; $\dfrac{\partial u}{\partial x}(0,t) = f(t)$ for $t > 0$; $u(x,0) = 0$ for $x > 0$

In each of Problems 11 and 12, use the Laplace transform to obtain a solution.

11. $\dfrac{\partial u}{\partial t} = k\dfrac{\partial^2 u}{\partial x^2}$ for $x > 0$, $t > 0$; $u(0,t) = t^2$ for $t > 0$; $u(x,0) = 0$ for $x > 0$

12. $\dfrac{\partial u}{\partial t} = k\dfrac{\partial^2 u}{\partial x^2}$ for $x > 0$, $t > 0$; $u(0,t) = 0$ for $t > 0$; $u(x,0) = e^{-x}$ for $x > 0$
18.4   Heat Conduction in an Infinite Cylinder

We will consider the problem of determining the temperature distribution function in a solid, infinitely long, homogeneous cylinder of radius $R$. Let the axis of the cylinder be along the $z$ axis in $(x,y,z)$ space (Figure 18.15).

[FIGURE 18.15: An infinite cylinder with axis along the $z$ axis; a point of a cross section is labeled with polar coordinates $(r, \theta)$.]

If $u(x,y,z,t)$ is the temperature function, then $u$ satisfies the 3-dimensional heat equation
$$\frac{\partial u}{\partial t} = k\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}\right).$$
It is convenient to use cylindrical coordinates, which consist of polar coordinates in the plane together with the usual $z$ coordinate, as in the diagram. With $x = r\cos\theta$ and $y = r\sin\theta$, let
$$u(x,y,z,t) = U(r,\theta,z,t).$$
We saw in Section 17.1 that
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = \frac{\partial^2 U}{\partial r^2} + \frac{1}{r}\frac{\partial U}{\partial r} + \frac{1}{r^2}\frac{\partial^2 U}{\partial \theta^2}.$$
Thus, in cylindrical coordinates, with $U(r,\theta,z,t)$ the temperature in the cylinder at point $(r,\theta,z)$ and time $t$, $U$ satisfies
$$\frac{\partial U}{\partial t} = k\left(\frac{\partial^2 U}{\partial r^2} + \frac{1}{r}\frac{\partial U}{\partial r} + \frac{1}{r^2}\frac{\partial^2 U}{\partial \theta^2} + \frac{\partial^2 U}{\partial z^2}\right).$$
This is a formidable equation to engage at this point, so we will assume that the temperature at any point in the cylinder depends only on the time $t$ and the horizontal distance $r$ from the $z$ axis. This symmetry assumption means that $\partial U/\partial\theta = \partial U/\partial z = 0$, and the heat equation becomes
$$\frac{\partial U}{\partial t} = k\left(\frac{\partial^2 U}{\partial r^2} + \frac{1}{r}\frac{\partial U}{\partial r}\right) \quad\text{for } 0 \le r < R,\ t > 0.$$
In this case we will write $U(r,t)$ instead of $U(r,\theta,z,t)$. The boundary condition is
$$U(R,t) = 0 \quad\text{for } t > 0.$$
This means that the outer surface of the cylinder is kept at zero temperature. The initial condition is
$$U(r,0) = f(r) \quad\text{for } 0 \le r < R.$$
Separate the variables in the heat equation by putting $U(r,t) = F(r)T(t)$. We obtain
$$F(r)T'(t) = k\left[F''(r)T(t) + \frac{1}{r}F'(r)T(t)\right].$$
Because $r$ and $t$ are independent variables, this yields
$$\frac{F'' + (1/r)F'}{F} = \frac{T'}{kT} = -\lambda,$$
in which $\lambda$ is the separation constant. Then
$$T' + \lambda kT = 0 \quad\text{and}\quad F'' + \frac{1}{r}F' + \lambda F = 0.$$
Further, $U(R,t) = F(R)T(t) = 0$ for $t > 0$, so we have the boundary condition
$$F(R) = 0.$$
The problem for $F$ is a singular Sturm-Liouville problem (see Section 16.3.1) on $[0,R]$, with only one boundary condition. We impose the condition that the solution must be bounded. Consider cases on $\lambda$.

Case 1: $\lambda = 0$. Now
$$F'' + \frac{1}{r}F' = 0.$$
To solve this, put $w = F'(r)$ to get
$$w'(r) + \frac{1}{r}w(r) = 0,$$
or
$$rw' + w = (rw)' = 0.$$
This has general solution $rw(r) = c$, so
$$w(r) = \frac{c}{r} = F'(r).$$
Then
$$F(r) = c\ln(r) + d.$$
Now $\ln(r) \to -\infty$ as $r \to 0+$ (center of the cylinder), so choose $c = 0$ to have a bounded solution. This means that $F(r)$ is constant for $\lambda = 0$. The equation for $T$ in this case is $T' = 0$, with $T$ constant also. In this event $U(r,t)$ is constant. Since $U(R,t) = 0$, this constant must be zero. In fact, $U(r,t) = 0$ is the solution in the case that $f(r) = 0$. If the temperature on the surface is maintained at zero, and the temperature throughout the cylinder is initially zero, then the temperature distribution remains zero at all later times, in the absence of heat sources.

Case 2: $\lambda < 0$. Write $\lambda = -\omega^2$ with $\omega > 0$. Now $T' - k\omega^2 T = 0$ has general solution
$$T(t) = ce^{\omega^2 kt},$$
which is unbounded unless $c = 0$, leading again to $U(r,t) = 0$. This case leads only to the trivial solution.

Case 3: $\lambda > 0$, say $\lambda = \omega^2$. Now $T' + k\omega^2 T = 0$ has solutions that are constant multiples of $e^{-\omega^2 kt}$, and these are bounded for $t > 0$. The equation for $F$ is
$$F''(r) + \frac{1}{r}F'(r) + \omega^2 F(r) = 0,$$
or
$$r^2 F''(r) + rF'(r) + \omega^2 r^2 F(r) = 0.$$
In this form we recognize Bessel's equation of order zero, with general solution
$$F(r) = cJ_0(\omega r) + dY_0(\omega r).$$
$J_0$ is Bessel's function of the first kind of order zero, and $Y_0$ is Bessel's function of the second kind of order zero (see Section 16.2.3). Since $Y_0(\omega r) \to -\infty$ as $r \to 0+$, we must have $d = 0$. However, $J_0(\omega r)$ is bounded on $[0,R]$, so $F(r)$ is a constant multiple of $J_0(\omega r)$. The condition $F(R) = 0$ now requires that this constant be zero (in which case we get the trivial solution), or that $\omega$ be chosen so that
$$J_0(\omega R) = 0.$$
This can be done. Recall that $J_0(x)$ has infinitely many positive zeros, which we arrange as
$$0 < j_1 < j_2 < \cdots.$$
We can therefore have $J_0(\omega R) = 0$ if $\omega R$ is any one of these numbers. Thus choose
$$\omega_n = \frac{j_n}{R}.$$
The numbers $\lambda_n = \omega_n^2 = j_n^2/R^2$ are the eigenvalues of this problem, and the eigenfunctions are nonzero constant multiples of $J_0(j_n r/R)$. We now have, for each positive integer $n$, a function
$$U_n(r,t) = a_n J_0\!\left(\frac{j_n r}{R}\right)e^{-j_n^2 kt/R^2}.$$
To satisfy the initial condition $U(r,0) = f(r)$ we must generally use a superposition
$$U(r,t) = \sum_{n=1}^\infty a_n J_0\!\left(\frac{j_n r}{R}\right)e^{-j_n^2 kt/R^2}.$$
We now must choose the coefficients so that
$$U(r,0) = \sum_{n=1}^\infty a_n J_0\!\left(\frac{j_n r}{R}\right) = f(r).$$
This is an eigenfunction expansion of $f(r)$ in terms of the eigenfunctions of the singular Sturm-Liouville problem for $F(r)$. We know from Section 16.3.3 how to find the coefficients. Let $\xi = r/R$. Then
$$f(R\xi) = \sum_{n=1}^\infty a_n J_0(j_n\xi)$$
and
$$a_n = \frac{2}{J_1(j_n)^2}\int_0^1 \xi f(R\xi)J_0(j_n\xi)\,d\xi.$$
The solution of the problem is
$$U(r,t) = \sum_{n=1}^\infty \left[\frac{2}{J_1(j_n)^2}\int_0^1 \xi f(R\xi)J_0(j_n\xi)\,d\xi\right]J_0\!\left(\frac{j_n r}{R}\right)e^{-j_n^2 kt/R^2}.$$
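The eigenvalues are determined by the zeros $j_n$ of $J_0$, whose first few values the text cites from Section 16.2. As an independent illustration (not the text's method), $J_0$ can be evaluated from its power series and its zeros bracketed and refined by bisection:

```python
import math

def J0(x):
    # Power series for the Bessel function J0; adequate in double
    # precision for moderate arguments (roughly |x| <= 12).
    term, total = 1.0, 1.0
    for m in range(1, 40):
        term *= -(x * x / 4) / (m * m)
        total += term
    return total

def bisect_zero(f, a, b, tol=1e-12):
    # Simple bisection for a sign change of f on [a, b].
    fa = f(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        if fa * f(m) <= 0:
            b = m
        else:
            a, fa = m, f(m)
    return 0.5 * (a + b)

# First three positive zeros j_1 < j_2 < j_3 of J0, bracketed by hand.
zeros = [bisect_zero(J0, 2, 3), bisect_zero(J0, 5, 6), bisect_zero(J0, 8, 9)]
print(zeros)   # approximately [2.4048, 5.5201, 8.6537]
```

The same zeros feed directly into the exponents $e^{-j_n^2 kt/R^2}$ and the eigenfunctions $J_0(j_n r/R)$ of the series solution above.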
SECTION 18.4   PROBLEMS

1. Suppose the cylinder has radius $R = 1$, and, in polar coordinates, initial temperature $U(r,0) = f(r) = r$ for $0 \le r < 1$. Assume that $U(1,t) = 0$ for $t > 0$. Approximate the integral in the solution and write the first five terms in the series solution for $U(r,t)$, with $k = 1$. (The first five zeros of $J_0(x)$ are given in Section 16.2.) Graph this sum of five terms for different values of $t$.

2. Suppose the cylinder has radius $R = 3$, and, in polar coordinates, initial temperature $U(r,0) = f(r) = e^r$ for $0 \le r < 3$. Assume that $U(3,t) = 0$ for $t > 0$. Approximate the integral in the solution and write the first five terms in the series solution for $U(r,t)$, with $k = 16$. Graph this sum of five terms for different values of $t$.

3. Suppose the cylinder has radius $R = 3$, and, in polar coordinates, initial temperature $U(r,0) = f(r) = 9 - r^2$ for $0 \le r < 3$. Assume that $U(3,t) = 0$ for $t > 0$. Approximate the integral in the solution and write the first five terms in the series solution for $U(r,t)$, with $k = \frac{1}{2}$. Graph this sum of five terms for different values of $t$.

4. Determine the temperature distribution in a homogeneous circular cylinder of radius $R$ with insulated top and bottom caps under the assumption that the temperature is independent of both the radial angle and height. Assume that heat is radiating from the lateral surface into the surrounding medium, which has temperature zero, with transfer coefficient $A$. The initial temperature is $U(r,0) = f(r)$. Hint: It will be necessary to know that an equation of the form $kJ_0'(x) + AJ_0(x) = 0$ has infinitely many positive solutions. This can be proved, but assume it here. Solutions of this equation yield the eigenvalues for this problem.
18.5   Heat Conduction in a Rectangular Plate

Consider the temperature distribution $u(x,y,t)$ in a flat, square homogeneous plate covering the region $0 \le x \le 1$, $0 \le y \le 1$ in the plane. The sides are kept at temperature zero and the interior temperature at time zero at $(x,y)$ is given by
$$f(x,y) = x(1 - x^2)y(1 - y).$$
The problem for $u$ is
$$\frac{\partial u}{\partial t} = k\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right) \quad\text{for } 0 \le x \le 1,\ 0 \le y \le 1,\ t > 0,$$
$$u(x,0,t) = u(x,1,t) = 0 \quad\text{for } 0 < x < 1,\ t > 0,$$
$$u(0,y,t) = u(1,y,t) = 0 \quad\text{for } 0 < y < 1,\ t > 0,$$
$$u(x,y,0) = x(1 - x^2)y(1 - y).$$
Let $u(x,y,t) = X(x)Y(y)T(t)$ and obtain
$$X'' + \lambda X = 0, \qquad Y'' + \mu Y = 0, \qquad T' + (\lambda + \mu)kT = 0,$$
where $\lambda$ and $\mu$ are the separation constants. The boundary conditions imply in the usual way that
$$X(0) = X(1) = 0, \qquad Y(0) = Y(1) = 0.$$
The eigenvalues and eigenfunctions are
$$\lambda_n = n^2\pi^2, \quad X_n(x) = \sin(n\pi x) \quad\text{for } n = 1, 2, \ldots$$
and
$$\mu_m = m^2\pi^2, \quad Y_m(y) = \sin(m\pi y)$$
for $m = 1, 2, \ldots$. The problem for $T$ is now
$$T' + (n^2 + m^2)\pi^2 kT = 0,$$
with general solution
$$T_{nm}(t) = c_{nm}e^{-(n^2 + m^2)\pi^2 kt}.$$
For each positive integer $n$ and each positive integer $m$, we now have functions
$$u_{nm}(x,y,t) = c_{nm}\sin(n\pi x)\sin(m\pi y)e^{-(n^2 + m^2)\pi^2 kt}$$
which satisfy the heat equation and the boundary conditions. To satisfy the initial condition, let
$$u(x,y,t) = \sum_{n=1}^\infty\sum_{m=1}^\infty c_{nm}\sin(n\pi x)\sin(m\pi y)e^{-(n^2 + m^2)\pi^2 kt}.$$
We must choose the coefficients so that
$$u(x,y,0) = x(1 - x^2)y(1 - y) = \sum_{n=1}^\infty\sum_{m=1}^\infty c_{nm}\sin(n\pi x)\sin(m\pi y).$$
We find (as in Section 17.7) that
$$
\begin{aligned}
c_{nm} &= 4\int_0^1\int_0^1 x(1 - x^2)y(1 - y)\sin(n\pi x)\sin(m\pi y)\,dx\,dy\\
&= 4\left(\int_0^1 x(1 - x^2)\sin(n\pi x)\,dx\right)\left(\int_0^1 y(1 - y)\sin(m\pi y)\,dy\right)\\
&= 48\,\frac{(-1)^{n+1}}{n^3\pi^3}\,\frac{1 - (-1)^m}{m^3\pi^3}.
\end{aligned}
$$
The solution is
$$u(x,y,t) = \frac{48}{\pi^6}\sum_{n=1}^\infty\sum_{m=1}^\infty \frac{(-1)^{n+1}\left[1 - (-1)^m\right]}{n^3 m^3}\sin(n\pi x)\sin(m\pi y)e^{-(n^2 + m^2)\pi^2 kt}.$$
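At $t = 0$ the double series must reproduce the initial temperature, which gives a direct numerical check on the coefficients; the truncation at 40 terms per index below is an arbitrary choice:

```python
import math

def f(x, y):
    # initial temperature of the worked example (as reconstructed here)
    return x * (1 - x**2) * y * (1 - y)

def u_series(x, y, t, k=1.0, N=40):
    # Partial sum of the double sine series with
    # c_nm = 48 (-1)^(n+1) (1 - (-1)^m) / (pi^6 n^3 m^3).
    total = 0.0
    for n in range(1, N + 1):
        for m in range(1, N + 1):
            c = 48 * (-1) ** (n + 1) * (1 - (-1) ** m) / (math.pi**6 * n**3 * m**3)
            total += (c * math.sin(n * math.pi * x) * math.sin(m * math.pi * y)
                      * math.exp(-(n**2 + m**2) * math.pi**2 * k * t))
    return total

x0, y0 = 0.3, 0.6
print(u_series(x0, y0, 0.0), f(x0, y0))
```

The cubic decay of the coefficients in both indices makes the partial sum converge quickly, so even a modest truncation matches $f$ to several digits.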
SECTION 18.5   PROBLEMS

1. Taking a cue from the problem just solved, write a double series solution for the following more general problem:
$$\frac{\partial u}{\partial t} = k\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right) \quad\text{for } 0 \le x \le L,\ 0 \le y \le K,\ t > 0,$$
$$u(x,0,t) = u(x,K,t) = 0 \quad\text{for } 0 < x < L,\ t > 0,$$
$$u(0,y,t) = u(L,y,t) = 0 \quad\text{for } 0 < y < K,\ t > 0,$$
$$u(x,y,0) = f(x,y).$$

2. Write the solution for Problem 1 in the case that $k = 4$, $L = 2$, $K = 3$ and $f(x,y) = x^2(L - x)\sin(y)(K - y)$.

3. Write the solution for Problem 1 in the case that $k = 1$, $L = \pi$, $K = \pi$, and $f(x,y) = \sin(x)\,y\cos(y/2)$.
CHAPTER 19

The Potential Equation

HARMONIC FUNCTIONS AND THE DIRICHLET PROBLEM · DIRICHLET PROBLEM FOR A RECTANGLE · POISSON'S INTEGRAL FORMULA FOR THE DISK · DIRICHLET PROBLEMS IN UNBOUNDED REGIONS · THE STEADY-

19.1   Harmonic Functions and the Dirichlet Problem

The partial differential equation
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0$$
is called Laplace's equation in two dimensions. In 3 dimensions this equation is
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} = 0.$$
The Laplacian $\nabla^2$ (read "del squared") is defined in 2 dimensions by
$$\nabla^2 u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}$$
and in three dimensions by
$$\nabla^2 u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}.$$
In this notation, Laplace's equation is $\nabla^2 u = 0$. A function satisfying Laplace's equation in a certain region is said to be harmonic on that region. For example, $x^2 - y^2$ and $2xy$ are both harmonic over the entire plane.
Laplace’s equation is encountered in problems involving potentials, such as potentials for force fields in mechanics, or electromagnetic or gravitational fields. Laplace’s equation is also known as the steady-state heat equation. The heat equation in 2- or 3-space dimensions is
u = k& 2 u
t In the steady-state case (the limit as t → ) the solution becomes independent of t, so u/ t = 0 and the heat equation becomes Laplace’s equation. In problems involving Laplace’s equation there are no initial conditions. However, we often encounter the problem of solving & 2 ux y = 0 for x y in some region D of the plane, subject to the condition that ux y = fx y for x y on the boundary of D. This boundary is denoted D. Here f is a function having given values on D, which is often a curve, or made up of several curves (Figure 19.1). The problem of determining a harmonic function having given boundary values is called a Dirichlet problem, and f is called the boundary data of the problem. There are versions of this problem in higher dimensions, but we will be concerned primarily with dimension 2. y ∂D D x
FIGURE 19.1
Typical boundary D of a region D.
The difficulty of a Dirichlet problem is usually dependent on how complicated the region D is. In general, we have a better chance of solving a Dirichlet problem for a region that possesses some type of symmetry, such as a disk or rectangle. We will begin by solving the Dirichlet problem for some familiar regions in the plane.
SECTION 19.1   PROBLEMS

1. Let $f$ and $g$ be harmonic on a set $D$ of points in the plane. Show that $f + g$ is harmonic, as well as $\alpha f$ for any real number $\alpha$.

2. Show that the following functions are harmonic on the entire plane:
(a) $x^3 - 3xy^2$
(b) $3x^2y - y^3$
(c) $x^4 - 6x^2y^2 + y^4$
(d) $4x^3y - 4xy^3$
(e) $\sin(x)\cosh(y)$
(f) $\cos(x)\sinh(y)$
(g) $e^{-x}\cos(y)$

3. Show that $\ln(x^2 + y^2)$ is harmonic on the plane with the origin removed.

4. Show that $r^n\cos(n\theta)$ and $r^n\sin(n\theta)$, in polar coordinates, are harmonic on the plane, for any positive integer $n$. Hint: Look up Laplace's equation in polar coordinates.

5. Show that, for any positive integer $n$, $r^{-n}\cos(n\theta)$ and $r^{-n}\sin(n\theta)$ are harmonic on the plane with the origin removed.
19.2   Dirichlet Problem for a Rectangle

Let $R$ be a solid rectangle, consisting of points $(x,y)$ with $0 \le x \le L$, $0 \le y \le K$. We want to find a function that is harmonic at points interior to $R$, and takes on prescribed values on the four sides of $R$, which form the boundary $\partial R$ of $R$. This kind of problem can be solved by separation of variables if the boundary data is nonzero on only one side of the rectangle. We will illustrate this kind of problem, and then outline a strategy to follow if the boundary data is nonzero on more than one side.
EXAMPLE 19.1

Consider the Dirichlet problem
$$\nabla^2 u(x,y) = 0 \quad\text{for } 0 < x < L,\ 0 < y < K,$$
$$u(x,0) = 0 \quad\text{for } 0 \le x \le L,$$
$$u(0,y) = u(L,y) = 0 \quad\text{for } 0 \le y \le K,$$
$$u(x,K) = (L - x)\sin(x) \quad\text{for } 0 \le x \le L.$$
Figure 19.2 shows the region and the boundary data. Let $u(x,y) = X(x)Y(y)$ and substitute into Laplace's equation to obtain
$$\frac{X''}{X} = -\frac{Y''}{Y} = -\lambda.$$
Then
$$X'' + \lambda X = 0 \quad\text{and}\quad Y'' - \lambda Y = 0.$$
From the boundary conditions,
$$u(x,0) = X(x)Y(0) = 0,$$

[FIGURE 19.2: The rectangle $R$ with corners $(0,0)$, $(L,0)$, $(L,K)$, $(0,K)$; boundary data $(L - x)\sin(x)$ on the top side and 0 on the other three sides.]
so $Y(0) = 0$. Similarly,
$$X(0) = X(L) = 0.$$
The problem for $X(x)$ is a familiar one, with eigenvalues $\lambda_n = n^2\pi^2/L^2$ and eigenfunctions that are nonzero constant multiples of $\sin(n\pi x/L)$. The problem for $Y$ is now
$$Y'' - \frac{n^2\pi^2}{L^2}Y = 0, \qquad Y(0) = 0.$$
Solutions of this problem are constant multiples of $\sinh(n\pi y/L)$. For each positive integer $n = 1, 2, \ldots$ we now have functions
$$u_n(x,y) = b_n\sin\!\left(\frac{n\pi x}{L}\right)\sinh\!\left(\frac{n\pi y}{L}\right)$$
which are harmonic on the rectangle, and satisfy the zero boundary conditions on the bottom and vertical sides of the rectangle. To satisfy the boundary condition on the side $y = K$, we must use a superposition
$$u(x,y) = \sum_{n=1}^\infty b_n\sin\!\left(\frac{n\pi x}{L}\right)\sinh\!\left(\frac{n\pi y}{L}\right).$$
Choose the coefficients so that
$$u(x,K) = \sum_{n=1}^\infty b_n\sin\!\left(\frac{n\pi x}{L}\right)\sinh\!\left(\frac{n\pi K}{L}\right) = (L - x)\sin(x).$$
This is a Fourier sine expansion of $(L - x)\sin(x)$ on $[0,L]$, so we must choose the entire coefficient to be the sine coefficient:
$$b_n\sinh\!\left(\frac{n\pi K}{L}\right) = \frac{2}{L}\int_0^L (L - \xi)\sin(\xi)\sin\!\left(\frac{n\pi\xi}{L}\right)d\xi = 4L^2\,\frac{n\pi\left[1 - (-1)^n\cos(L)\right]}{L^4 - 2L^2n^2\pi^2 + n^4\pi^4}.$$
Then
$$b_n = \frac{4L^2}{\sinh(n\pi K/L)}\,\frac{n\pi\left[1 - (-1)^n\cos(L)\right]}{(L^2 - n^2\pi^2)^2}.$$
The solution is
$$u(x,y) = \sum_{n=1}^\infty \frac{4L^2}{\sinh(n\pi K/L)}\,\frac{n\pi\left[1 - (-1)^n\cos(L)\right]}{(L^2 - n^2\pi^2)^2}\sin\!\left(\frac{n\pi x}{L}\right)\sinh\!\left(\frac{n\pi y}{L}\right).$$

If nonzero boundary data is prescribed on all four sides of $R$, define four Dirichlet problems, in each of which the boundary data is nonzero on only one side. This process is outlined in Figure 19.3. Each of these problems can be solved by separation of variables. If $u_j(x,y)$ is the solution of the $j$th problem, then
$$u(x,y) = \sum_{j=1}^4 u_j(x,y)$$
is the solution of the original problem. This sum will satisfy the original boundary data because each $u_j(x,y)$ satisfies the nonzero data on one side and is zero on the other three.
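The scheme is easy to exercise on the simplest data: with $u = \sin(\pi x)$ on the top of the unit square and 0 on the other sides (a hypothetical choice), the series collapses to the single term $\sin(\pi x)\sinh(\pi y)/\sinh(\pi)$. The sketch below verifies harmonicity with an ad hoc five-point stencil and checks the boundary values:

```python
import math

def u(x, y):
    # Harmonic on the unit square, zero on three sides,
    # equal to sin(pi x) on the top side y = 1.
    return math.sin(math.pi * x) * math.sinh(math.pi * y) / math.sinh(math.pi)

# five-point finite-difference Laplacian at an interior point
h, x0, y0 = 1e-3, 0.4, 0.7
lap = (u(x0 + h, y0) + u(x0 - h, y0) + u(x0, y0 + h) + u(x0, y0 - h)
       - 4 * u(x0, y0)) / h**2
print(lap, u(0.4, 1.0), math.sin(0.4 * math.pi))
```

The stencil value is zero up to discretization error, and the top-side values match the prescribed data exactly because $\sinh(\pi)/\sinh(\pi) = 1$.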
[FIGURE 19.3: The rectangle $R$ with data $f_1(x)$, $f_2(x)$ on the horizontal sides and $g_1(y)$, $g_2(y)$ on the vertical sides, split into four rectangles each carrying the nonzero data on one side and zero on the other three; $u(x,y) = \sum_{j=1}^4 u_j(x,y)$.]
SECTION 19.2   PROBLEMS

In each of Problems 1 through 5, solve the Dirichlet problem for the rectangle, with the given boundary conditions.

1. $u(0,y) = u(1,y) = 0$ for $0 \le y \le \pi$; $u(x,\pi) = 0$ and $u(x,0) = \sin(\pi x)$ for $0 \le x \le 1$

2. $u(0,y) = y(2 - y)$, $u(3,y) = 0$ for $0 \le y \le 2$; $u(x,0) = u(x,2) = 0$ for $0 \le x \le 3$

3. $u(0,y) = u(1,y) = 0$ for $0 \le y \le 4$; $u(x,0) = 0$, $u(x,4) = x\cos(\pi x/2)$ for $0 \le x \le 1$

4. $u(0,y) = \sin(y)$, $u(\pi,y) = 0$ for $0 \le y \le \pi$; $u(x,0) = x(\pi - x)$, $u(x,\pi) = 0$ for $0 \le x \le \pi$

5. $u(0,y) = 0$, $u(2,y) = \sin(y)$ for $0 \le y \le \pi$; $u(x,0) = 0$, $u(x,\pi) = x\sin(x)$ for $0 \le x \le 2$

6. Apply separation of variables to solve the following mixed boundary value problem (mixed means that some boundary conditions are given on the function, and others on its partial derivatives):
$$\nabla^2 u(x,y) = 0 \quad\text{for } 0 < x < a,\ 0 < y < b,$$
$$u(x,0) = 0,\quad \frac{\partial u}{\partial y}(x,b) = 0 \quad\text{for } 0 \le x \le a,$$
$$u(0,y) = 0,\quad u(a,y) = g(y) \quad\text{for } 0 \le y \le b.$$

7. Apply separation of variables to solve the following mixed boundary value problem:
$$\nabla^2 u(x,y) = 0 \quad\text{for } 0 < x < a,\ 0 < y < b,$$
$$u(x,0) = 0,\quad u(x,b) = f(x) \quad\text{for } 0 \le x \le a,$$
$$u(0,y) = 0,\quad \frac{\partial u}{\partial x}(a,y) = 0 \quad\text{for } 0 \le y \le b.$$

8. Solve for the steady-state temperature distribution in a thin flat plate covering the rectangle $0 \le x \le a$, $0 \le y \le b$ if the temperature on the vertical sides and bottom side are kept at zero, and the temperature along the top side is $f(x) = x(x - a)^2$.

9. Solve for the steady-state temperature distribution in a thin flat plate covering the rectangle $0 \le x \le 4$, $0 \le y \le 1$ if the temperature on the horizontal sides is zero, while on the left side it is $f(y) = \sin(\pi y)$ and on the right side it is $f(y) = y(1 - y)$.
19.3   Dirichlet Problem for a Disk

We will solve the Dirichlet problem for a disk of radius $R$ centered about the origin. In polar coordinates, the problem is
$$\nabla^2 u(r,\theta) = 0 \quad\text{for } 0 \le r < R,\ -\pi \le \theta \le \pi,$$
$$u(R,\theta) = f(\theta) \quad\text{for } -\pi \le \theta \le \pi.$$
Laplace’s equation in polar coordinates is
2 u 1 u 1 2 u + + = 0
r 2 r r r 2 2 It is easy to check that the functions 1 r n cosn
and r n sinn
are all harmonic on the entire plane. Thus attempt a solution 1 ur = a0 + an r n cosn + bn r n sinn 2 n=1
To satisfy the boundary condition, we need to choose the coefficients so that 1 uR = f = a0 + an Rn cosn + bn Rn sinn 2 n=1
But this is just the Fourier expansion of f on − , leading us to choose 1 a0 = fd − 1 f cosnd an Rn = − and bn R n =
1 f sinnd −
Then an =
1 f cosnd Rn −
bn =
1 f cosnd Rn −
and
The solution is ur =
1 fd 2 − n r 1 f cosnd cosn + f sinnd sinn + n=1 R − −
This can also be written ur = or
n r 1 1 fd + f cosn − d 2 − n=1 R −
n r 1 cosn − fd 1+2 ur = 2 − n=1 R
EXAMPLE 19.2

Solve the Dirichlet problem
$$\nabla^2 u(r,\theta) = 0 \quad\text{for } 0 \le r < 4,\ -\pi \le \theta \le \pi,$$
$$u(4,\theta) = f(\theta) = \theta^2 \quad\text{for } -\pi \le \theta \le \pi.$$
The solution is
$$u(r,\theta) = \frac{1}{2\pi}\int_{-\pi}^\pi \xi^2\,d\xi + \sum_{n=1}^\infty \left(\frac{r}{4}\right)^n\frac{1}{\pi}\int_{-\pi}^\pi \xi^2\cos(n(\theta - \xi))\,d\xi = \frac{\pi^2}{3} + \sum_{n=1}^\infty \frac{4(-1)^n}{n^2}\left(\frac{r}{4}\right)^n\cos(n\theta).$$

EXAMPLE 19.3

Solve the Dirichlet problem
$$\nabla^2 u(x,y) = 0 \quad\text{for } x^2 + y^2 < 9,$$
$$u(x,y) = x^2y^2 \quad\text{for } x^2 + y^2 = 9.$$
Convert the problem to polar coordinates, using $x = r\cos\theta$ and $y = r\sin\theta$. Let
$$u(x,y) = u(r\cos\theta, r\sin\theta) = U(r,\theta).$$
The condition on the boundary, where $r = 3$, becomes
$$U(3,\theta) = (9\cos^2\theta)(9\sin^2\theta) = 81\cos^2\theta\sin^2\theta = f(\theta).$$
The solution is
$$U(r,\theta) = \frac{1}{2\pi}\int_{-\pi}^\pi 81\sin^2\xi\cos^2\xi\,d\xi + \sum_{n=1}^\infty \left(\frac{r}{3}\right)^n\frac{1}{\pi}\left[\left(\int_{-\pi}^\pi 81\cos^2\xi\sin^2\xi\cos(n\xi)\,d\xi\right)\cos(n\theta) + \left(\int_{-\pi}^\pi 81\cos^2\xi\sin^2\xi\sin(n\xi)\,d\xi\right)\sin(n\theta)\right].$$
Now
$$\int_{-\pi}^\pi 81\sin^2\xi\cos^2\xi\,d\xi = \frac{81\pi}{4},$$
$$\int_{-\pi}^\pi 81\cos^2\xi\sin^2\xi\cos(n\xi)\,d\xi = \begin{cases} 0 & \text{if } n \ne 4,\\ -81\pi/8 & \text{for } n = 4,\end{cases}$$
and
$$\int_{-\pi}^\pi 81\cos^2\xi\sin^2\xi\sin(n\xi)\,d\xi = 0.$$
Therefore
$$U(r,\theta) = \frac{1}{2\pi}\,\frac{81\pi}{4} - \frac{81}{8}\left(\frac{r}{3}\right)^4\cos(4\theta) = \frac{81}{8} - \frac{1}{8}r^4\cos(4\theta).$$
To convert this solution back to rectangular coordinates, use the fact that
$$\cos(4\theta) = 8\cos^4\theta - 8\cos^2\theta + 1.$$
Then
$$U(r,\theta) = \frac{81}{8} - \frac{1}{8}\left[8r^4\cos^4\theta - 8r^2(r^2\cos^2\theta) + r^4\right] = \frac{81}{8} - \frac{1}{8}\left[8x^4 - 8x^2(x^2 + y^2) + (x^2 + y^2)^2\right] = u(x,y).$$
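The closed-form answer can be verified directly: numerically it has zero Laplacian at interior points and equals $x^2y^2$ on the circle $x^2 + y^2 = 9$. A quick sketch (the test point and stencil width are arbitrary):

```python
import math

def u(x, y):
    # closed-form solution of Example 19.3 in rectangular coordinates
    r2 = x * x + y * y
    return 81 / 8 - (8 * x**4 - 8 * x**2 * r2 + r2**2) / 8

# boundary check at a point of the circle x^2 + y^2 = 9
th = 1.234
x, y = 3 * math.cos(th), 3 * math.sin(th)
print(u(x, y), x**2 * y**2)

# five-point finite-difference Laplacian at an interior point
h, x0, y0 = 1e-3, 1.0, 2.0
lap = (u(x0 + h, y0) + u(x0 - h, y0) + u(x0, y0 + h) + u(x0, y0 - h)
       - 4 * u(x0, y0)) / h**2
print(lap)
```

Expanding the bracket gives $x^4 - 6x^2y^2 + y^4$, a harmonic polynomial, which is why the stencil value vanishes up to discretization error.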
SECTION 19.3   PROBLEMS

In each of Problems 1 through 8, find a series solution for the Dirichlet problem for a disk of the given radius, with the given boundary data (in polar coordinates).

1. $R = 3$, $f(\theta) = 1$
2. $R = 3$, $f(\theta) = 8\cos(4\theta)$
3. $R = 2$, $f(\theta) = \theta^2 - \theta$
4. $R = 5$, $f(\theta) = \cos(\theta)$
5. $R = 4$, $f(\theta) = e^{-\theta}$
6. $R = 1$, $f(\theta) = \sin(2\theta)$
7. $R = 8$, $f(\theta) = 1 - \theta^3$
8. $R = 4$, $f(\theta) = e^{2\theta}$

By converting to polar coordinates, write a solution for each of the following Dirichlet problems.

9. $\nabla^2 u(x,y) = 0$ for $x^2 + y^2 < 16$; $u(x,y) = x^2$ for $x^2 + y^2 = 16$
10. $\nabla^2 u(x,y) = 0$ for $x^2 + y^2 < 9$; $u(x,y) = x - y$ for $x^2 + y^2 = 9$
11. $\nabla^2 u(x,y) = 0$ for $x^2 + y^2 < 4$; $u(x,y) = (x - y)^2$ for $x^2 + y^2 = 4$
12. $\nabla^2 u(x,y) = 0$ for $x^2 + y^2 < 25$; $u(x,y) = x^2y^2$ for $x^2 + y^2 = 25$
Poisson’s Integral Formula for the Disk We have a series formula for the solution of the Dirichlet problem for a disk. In this section we will derive an integral formula for this solution. The problem for a disk of radius 1, in polar coordinates, is & 2 ur = 0
for 0 ≤ r < 1 − ≤ ≤
u1 = f for − ≤ ≤ The series solution from the preceding section, with R = 1, is 1 ur = 1 + 2 r n cosn − fd 2 − n=1 The quantity 1 n 1 + 2 r cosn" 2 n=1
(19.1)
19.4 Poisson’s Integral Formula for the Disk
887
is called the Poisson kernel, and is denoted Pr ". In terms of the Poisson kernel, the solution is ur = Pr − fd −
We will now obtain a closed form for the Poisson kernel, yielding an integral formula for the solution. Let $z = re^{i\psi}$. By Euler's formula,
$$z^n = r^ne^{in\psi} = r^n\cos(n\psi) + ir^n\sin(n\psi),$$
so $r^n\cos(n\psi)$, which appears in the Poisson kernel, is the real part of $z^n$, denoted $\operatorname{Re}(z^n)$. Then
$$1 + 2\sum_{n=1}^\infty r^n\cos(n\psi) = \operatorname{Re}\left(1 + 2\sum_{n=1}^\infty z^n\right).$$
But $|z| = r < 1$ in the unit disk, so the geometric series $\sum_{n=1}^\infty z^n$ converges. Further,
$$\sum_{n=1}^\infty z^n = \frac{z}{1 - z}.$$
Then
$$1 + 2\sum_{n=1}^\infty r^n\cos(n\psi) = \operatorname{Re}\left(1 + 2\frac{z}{1 - z}\right) = \operatorname{Re}\left(\frac{1 + z}{1 - z}\right) = \operatorname{Re}\left(\frac{1 + re^{i\psi}}{1 - re^{i\psi}}\right).$$
The rest is just computation to help us extract this real part:
$$\frac{1 + re^{i\psi}}{1 - re^{i\psi}} = \frac{(1 + re^{i\psi})(1 - re^{-i\psi})}{(1 - re^{i\psi})(1 - re^{-i\psi})} = \frac{1 - r^2 + r(e^{i\psi} - e^{-i\psi})}{1 + r^2 - r(e^{i\psi} + e^{-i\psi})} = \frac{1 - r^2 + 2ir\sin\psi}{1 + r^2 - 2r\cos\psi}.$$
Therefore
$$1 + 2\sum_{n=1}^\infty r^n\cos(n\psi) = \operatorname{Re}\left(1 + 2\sum_{n=1}^\infty z^n\right) = \frac{1 - r^2}{1 + r^2 - 2r\cos\psi},$$
hence the solution given by equation (19.1) is
$$u(r,\theta) = \frac{1}{2\pi}\int_{-\pi}^\pi \frac{1 - r^2}{1 + r^2 - 2r\cos(\theta - \xi)}f(\xi)\,d\xi.$$
This is Poisson's integral formula for the solution of the Dirichlet problem for the unit disk. For a disk of radius $R$, a simple change of variables yields the solution
$$u(r,\theta) = \frac{1}{2\pi}\int_{-\pi}^\pi \frac{R^2 - r^2}{R^2 + r^2 - 2Rr\cos(\theta - \xi)}f(\xi)\,d\xi. \tag{19.2}$$
This integral, for the disk of radius $R$, is also known as Poisson's formula.
EXAMPLE 19.4

Revisit the problem
$$\nabla^2 u(r,\theta) = 0 \quad\text{for } 0 \le r < 4,\ -\pi \le \theta \le \pi,$$
$$u(4,\theta) = f(\theta) = \theta^2 \quad\text{for } -\pi \le \theta \le \pi,$$
which was solved by Fourier series in the preceding section. The Poisson integral formula for the solution is
$$u(r,\theta) = \frac{1}{2\pi}\int_{-\pi}^\pi \frac{16 - r^2}{16 + r^2 - 8r\cos(\theta - \xi)}\,\xi^2\,d\xi.$$
This integral cannot be evaluated in closed form, but is often more suitable for numerical approximations than the infinite series solution.
SECTION 19.4   PROBLEMS

In each of Problems 1 through 4, write an integral formula for the solution of the Dirichlet problem for a disk of radius $R$ about the origin, with the given data function on the boundary. Use the integral solution to approximate the value of $u(r,\theta)$ at the given points.

1. $R = 1$, $f(\theta) = \theta$; $(1/2, 3\pi/4)$, $(1/3, 0.2)$, $(0.2, \pi/4)$
2. $R = 4$, $f(\theta) = \sin(4\theta)$; $(1, \pi/6)$, $(3.7, \pi/2)$, $(1, \pi/4)$, $(2.5, \pi/12)$
3. $R = 15$, $f(\theta) = \pi - \theta$; $(4.12, 3\pi/2)$, $(8, \pi/4)$, $(7, 0.3)$
4. $R = 6$, $f(\theta) = e^{-\theta}$; $(5.5, 3\pi/5)$, $(4, 2\pi/7)$, $(1.4, 9\pi/4)$

5. Dirichlet's integral formula can sometimes be used to evaluate quite general integrals. As an example, let $n$ be a positive integer and let $u(r,\theta) = r^n\sin(n\theta)$ for $0 \le r < R$, $0 \le \theta \le 2\pi$. We know that $u$ is harmonic on the entire plane. We may therefore think of $u$ as the solution of the Dirichlet problem on the disk $r \le R$ satisfying $u(R,\theta) = f(\theta) = R^n\sin(n\theta)$. Use the Poisson integral formula of equation (19.2) (knowing in this case the solution) to write
$$r^n\sin(n\theta) = \frac{1}{2\pi}\int_0^{2\pi} \frac{R^2 - r^2}{r^2 - 2rR\cos(\theta - \xi) + R^2}\,R^n\sin(n\xi)\,d\xi.$$
Now evaluate $u(R/2, \pi/2)$ to derive the integral formula
$$\int_0^{2\pi} \frac{\sin(n\xi)}{5 - 4\sin(\xi)}\,d\xi = \frac{\pi}{3\cdot 2^{n-1}}\sin\!\left(\frac{n\pi}{2}\right).$$

6. In Problem 5, evaluate $u(R/2, \pi)$. What integral is obtained?

7. Use the strategy outlined in Problem 5, but now use $u(r,\theta) = r^n\cos(n\theta)$. Obtain integrals by evaluating $u(R/2, \pi/2)$ and $u(R/2, \pi)$.

8. What integral formula is obtained by setting $u(r,\theta) = 1$ in Poisson's integral formula?
19.5   Dirichlet Problems in Unbounded Regions

We will consider the Dirichlet problem for some regions that are unbounded in the sense of containing points arbitrarily far from the origin. For such problems, the Fourier integral, Fourier transform, or Fourier sine or cosine transform may be a good means to a solution.
19.5.1   Dirichlet Problem for the Upper Half Plane
Consider the problem

∇²u(x, y) = 0 for −∞ < x < ∞, y > 0,
u(x, 0) = f(x) for −∞ < x < ∞.

We want a function that is harmonic on the upper half-plane, and takes on given values along the x-axis. Let u(x, y) = X(x)Y(y) and separate the variables in Laplace's equation to obtain

X″ + λX = 0, Y″ − λY = 0.

We want a bounded solution. Consider cases on λ.

Case 1: λ = 0. Now X(x) = ax + b, and we obtain a bounded solution by choosing a = 0. Thus 0 is an eigenvalue of this problem, with constant eigenfunctions.

Case 2: λ = −α² < 0. Now X(x) = ae^{αx} + be^{−αx}. But e^{αx} → ∞ as x → ∞, so we must choose a = 0. And e^{−αx} → ∞ as x → −∞, so we must let b = 0 also, leaving the trivial solution. This problem has no negative eigenvalue.

Case 3: λ = ω² > 0. Now X(x) = a cos(ωx) + b sin(ωx), a bounded function for any constants a and b. The equation for Y now becomes Y″ − ω²Y = 0, with general solution Y(y) = ae^{ωy} + be^{−ωy}. Since y > 0 and ω > 0, e^{ωy} → ∞ as y → ∞, so we need a = 0. However, e^{−ωy} is bounded for y > 0, so Y(y) = be^{−ωy}.

For each ω ≥ 0, we now have a function

u_ω(x, y) = [a_ω cos(ωx) + b_ω sin(ωx)] e^{−ωy}

that satisfies Laplace's equation. Attempt a solution of the problem with the superposition

u(x, y) = ∫₀^∞ [a_ω cos(ωx) + b_ω sin(ωx)] e^{−ωy} dω.

To satisfy the boundary condition, choose the coefficients so that

u(x, 0) = f(x) = ∫₀^∞ [a_ω cos(ωx) + b_ω sin(ωx)] dω.

This is the Fourier integral expansion of f(x), so

a_ω = (1/π) ∫_{−∞}^{∞} f(ξ) cos(ωξ) dξ and b_ω = (1/π) ∫_{−∞}^{∞} f(ξ) sin(ωξ) dξ.
CHAPTER 19
The Potential Equation
With these coefficients, we have the solution, which can be written in a compact form, involving only one integral, as follows. Write

u(x, y) = (1/π) ∫₀^∞ [(∫_{−∞}^{∞} f(ξ) cos(ωξ) dξ) cos(ωx) + (∫_{−∞}^{∞} f(ξ) sin(ωξ) dξ) sin(ωx)] e^{−ωy} dω
        = (1/π) ∫₀^∞ ∫_{−∞}^{∞} [cos(ωξ) cos(ωx) + sin(ωξ) sin(ωx)] f(ξ) e^{−ωy} dξ dω
        = (1/π) ∫_{−∞}^{∞} [∫₀^∞ cos(ω(ξ − x)) e^{−ωy} dω] f(ξ) dξ.

The inner integral can be evaluated explicitly:

∫₀^∞ cos(ω(ξ − x)) e^{−ωy} dω = [e^{−ωy} (−y cos(ω(ξ − x)) + (ξ − x) sin(ω(ξ − x))) / (y² + (ξ − x)²)]₀^∞
                              = y / (y² + (ξ − x)²).

Therefore the solution of the Dirichlet problem for the upper half-plane is

u(x, y) = (y/π) ∫_{−∞}^{∞} f(ξ) / (y² + (ξ − x)²) dξ.    (19.3)
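Formula (19.3) can be checked numerically. For the particular choice f(x) = 1/(1 + x²), the convolution with the Poisson kernel has the closed form (1 + y)/((1 + y)² + x²), a standard fact about Cauchy densities, which the quadrature below reproduces. This is an illustrative sketch, not part of the original text:

```python
import numpy as np
from scipy.integrate import quad

def u(x, y, f):
    # u(x, y) = (y/π) ∫ f(ξ) / (y² + (ξ − x)²) dξ,  equation (19.3)
    val, _ = quad(lambda xi: f(xi) / (y**2 + (xi - x)**2), -np.inf, np.inf)
    return y / np.pi * val

f = lambda x: 1.0 / (1.0 + x**2)                   # boundary data u(x, 0)
exact = lambda x, y: (1.0 + y) / ((1.0 + y)**2 + x**2)

print(u(0.5, 1.2, f), exact(0.5, 1.2))
print(u(2.0, 0.4, f), exact(2.0, 0.4))
```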
To illustrate the technique, we will solve this problem again using the Fourier transform.

Solution Using the Fourier Transform

Apply the Fourier transform in the x variable to Laplace's equation, writing û(ω, y) for the transform of u(x, y). Now ∂/∂y passes through the transform, and we can use the operational rule to take the transform of the derivative with respect to x. We get

−ω² û(ω, y) + ∂²û/∂y² (ω, y) = 0.

The general solution of this differential equation in the y variable is

û(ω, y) = a_ω e^{ωy} + b_ω e^{−ωy}.

Keep in mind that here ω varies over the real line (unlike in the solution by Fourier integral, where ω designated a variable of integration over the half-line). Because e^{ωy} → ∞ as y → ∞, we must have a_ω = 0 for positive ω. But e^{−ωy} → ∞ as y → ∞ if ω < 0, so b_ω = 0 for negative ω. Thus,

û(ω, y) = b_ω e^{−ωy} if ω ≥ 0, and û(ω, y) = a_ω e^{ωy} if ω < 0.

We can consolidate this notation by writing

û(ω, y) = c_ω e^{−|ω|y}.

To solve for c_ω, use the fact that u(x, 0) = f(x) to get û(ω, 0) = f̂(ω) = c_ω. The Fourier transform of the solution is

û(ω, y) = f̂(ω) e^{−|ω|y}.
To obtain u(x, y), apply the inverse Fourier transform to this function:

u(x, y) = 𝔉⁻¹[f̂(ω) e^{−|ω|y}](x)
        = (1/2π) ∫_{−∞}^{∞} f̂(ω) e^{−|ω|y} e^{iωx} dω
        = (1/2π) ∫_{−∞}^{∞} [∫_{−∞}^{∞} f(ξ) e^{−iωξ} dξ] e^{−|ω|y} e^{iωx} dω
        = (1/2π) ∫_{−∞}^{∞} [∫_{−∞}^{∞} e^{−|ω|y} e^{−iω(ξ−x)} dω] f(ξ) dξ.

Now e^{−iω(ξ−x)} = cos(ω(ξ − x)) − i sin(ω(ξ − x)), and a routine integration gives

∫_{−∞}^{∞} e^{−|ω|y} e^{−iω(ξ−x)} dω = 2y / (y² + (ξ − x)²).

The solution by Fourier transform is

u(x, y) = (y/π) ∫_{−∞}^{∞} f(ξ) / (y² + (ξ − x)²) dξ,

in agreement with the solution obtained by using separation of variables.
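The "routine integration" invoked above is 2∫₀^∞ e^{−ωy} cos(ω(ξ − x)) dω, and it can be spot-checked numerically. This is an illustrative sketch only:

```python
import numpy as np
from scipy.integrate import quad

def kernel_integral(a, y):
    # ∫_{-∞}^{∞} e^{-|ω|y} e^{-iωa} dω; the imaginary part cancels by symmetry
    val, _ = quad(lambda w: np.exp(-abs(w) * y) * np.cos(w * a), -np.inf, np.inf)
    return val

a, y = 0.7, 1.3
print(kernel_integral(a, y), 2 * y / (y**2 + a**2))
```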
19.5.2
Dirichlet Problem for the Right Quarter Plane
Sometimes we can use the solution of one problem to produce a solution of another problem. We will illustrate this with the Dirichlet problem for the right quarter-plane:

∇²u(x, y) = 0 for x > 0, y > 0,
u(x, 0) = f(x) for x ≥ 0,
u(0, y) = 0 for y ≥ 0.

The boundary of the right quarter-plane consists of the nonnegative x axis, together with the nonnegative y axis, and information about the function sought must be given on both segments. In this case we are prescribing zero values on the vertical part, and given values f(x) on the horizontal part of the boundary. We could solve this problem by separation of variables. However, if we fold the upper half-plane across the vertical axis, we obtain the right quarter-plane, suggesting that we explore the possibility of using the solution for the upper half-plane to obtain the solution for the right quarter-plane. To do this, let

g(x) = f(x) for x ≥ 0, and g(x) = anything for x < 0.

By "anything," we mean that for the moment we do not care what values are given g(x) for x < 0, but reserve the right to assign these values later. The Dirichlet problem

∇²u(x, y) = 0 for −∞ < x < ∞, y > 0,
u(x, 0) = g(x) for −∞ < x < ∞
for the upper half-plane has the solution

u_hp(x, y) = (y/π) ∫_{−∞}^{∞} g(ξ) / (y² + (ξ − x)²) dξ.

Write this as

u_hp(x, y) = (y/π) ∫_{−∞}^{0} g(ξ) / (y² + (ξ − x)²) dξ + (y/π) ∫₀^∞ g(ξ) / (y² + (ξ − x)²) dξ.

Change variables in the left-most integral by letting w = −ξ. This integral becomes

∫_{−∞}^{0} g(ξ) / (y² + (ξ − x)²) dξ = ∫_{∞}^{0} g(−w) / (y² + (w + x)²) (−1) dw = ∫₀^∞ g(−w) / (y² + (w + x)²) dw.

Now replace the integration variable by ξ again to write

u_hp(x, y) = (y/π) ∫₀^∞ g(−ξ) / (y² + (ξ + x)²) dξ + (y/π) ∫₀^∞ f(ξ) / (y² + (ξ − x)²) dξ,

where in the last integral we have used the fact that g(ξ) = f(ξ) if ξ ≥ 0. Now fill in the "anything" in the definition of g. Observe that the sum of these integrals will vanish on the positive y axis, at points (0, y), if f(ξ) + g(−ξ) = 0 for ξ ≥ 0. This will occur if g(−ξ) = −f(ξ). That is, make g the odd extension of f to the entire real line, obtaining

u_hp(x, y) = (y/π) ∫₀^∞ [1 / (y² + (ξ − x)²) − 1 / (y² + (ξ + x)²)] f(ξ) dξ.

This is the solution of this particular Dirichlet problem for the upper half-plane. But this function is also harmonic on the right quarter-plane, vanishes when x = 0, and equals f(x) if x ≥ 0 and y = 0. Therefore u_hp(x, y) is also the solution of this Dirichlet problem for the right quarter-plane.
EXAMPLE 19.5

Consider the problem

∇²u = 0 for x > 0, y > 0,
u(0, y) = 0 for y > 0,
u(x, 0) = xe^{−x} for x > 0.

The solution is

u(x, y) = (y/π) ∫₀^∞ [1 / (y² + (ξ − x)²) − 1 / (y² + (ξ + x)²)] ξ e^{−ξ} dξ.
EXAMPLE 19.6

We will solve the problem

∇²u = 0 for x > 0, y > 0,
u(0, y) = 0 for y > 0,
u(x, 0) = 1 for x > 0.

The solution is

u(x, y) = (y/π) ∫₀^∞ dξ / (y² + (ξ − x)²) − (y/π) ∫₀^∞ dξ / (y² + (ξ + x)²).

These integrals can be evaluated in closed form. For the first,

(y/π) ∫₀^∞ dξ / (y² + (ξ − x)²) = (y/π) ∫₀^∞ (1/y²) dξ / (1 + ((ξ − x)/y)²)
= (1/π) [arctan((ξ − x)/y)]₀^∞ = 1/2 + (1/π) arctan(x/y).

By a similar calculation,

(y/π) ∫₀^∞ dξ / (y² + (ξ + x)²) = 1/2 − (1/π) arctan(x/y).

Then

u(x, y) = (2/π) arctan(x/y).

This function is harmonic on the right quarter-plane and u(0, y) = 0 for y > 0. Further, if x > 0,

lim_{y→0+} (2/π) arctan(x/y) = (2/π)(π/2) = 1,

as required.
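The closed form in Example 19.6 agrees with direct quadrature of the quarter-plane integral; the following sketch (illustrative, with an arbitrarily chosen test point) confirms it:

```python
import numpy as np
from scipy.integrate import quad

def u_quarter(x, y):
    # u(x,y) = (y/π) ∫₀^∞ [1/(y² + (ξ−x)²) − 1/(y² + (ξ+x)²)] dξ
    val, _ = quad(lambda xi: 1.0/(y**2 + (xi - x)**2) - 1.0/(y**2 + (xi + x)**2),
                  0.0, np.inf)
    return y / np.pi * val

x, y = 2.0, 0.5
print(u_quarter(x, y), 2.0/np.pi * np.arctan(x / y))
```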
19.5.3
An Electrostatic Potential Problem
Consider the problem

∇²u(x, y) = −h for 0 < x < π, y > 0,
u(0, y) = 0 and u(π, y) = 1 for y > 0,
u(x, 0) = 0 for 0 < x < π.

This is a Dirichlet problem if h = 0, but we will assume that h is a positive constant. This problem models the electrostatic potential in the strip consisting of all (x, y) with 0 < x < π and y > 0, assuming a uniform distribution of charge having density h/4π throughout this region. The partial differential equation ∇²u = −h is called Poisson's equation. The boundary of the strip consists of the half-lines x = 0 and x = π with y ≥ 0, and the segment on the x axis with 0 ≤ x ≤ π. The strip and its boundary are shown in Figure 19.4.

[FIGURE 19.4: Strip 0 ≤ x ≤ π, y ≥ 0.]
Consider the possibilities for approaching this problem. Since y > 0, we might consider a Fourier sine or cosine transform in y. The difficulty here is that, in transforming Poisson's equation, we would have to take the transform of −h, and a constant does not have a sine or cosine transform. For example, if we try to compute the Fourier sine transform, we must evaluate

∫₀^∞ −h sin(ωy) dy,

and this integral diverges. Since x varies from 0 to π, we might try a finite Fourier sine or cosine transform in x. If we try the finite Fourier cosine transform, then the operational formula requires that we have information about the derivative of the function at the origin, and we have no such information. However, the finite sine transform's operational formula requires information about the function at the ends of the interval, and this is given in the boundary conditions for y > 0. We will therefore attempt a solution using this transform.

Denote the finite Fourier sine transform of u(x, y) in the x variable as ũ_S(n, y). Now apply the transform with respect to x to Poisson's equation

∂²u/∂x² + ∂²u/∂y² = −h.

By the operational formula,

𝔖[∂²u/∂x²](n) = −n² ũ_S(n, y) − n(−1)ⁿ u(π, y) + n u(0, y).

Because x and y are independent,

∫₀^π (∂²u/∂y²)(x, y) sin(nx) dx = (∂²/∂y²) ∫₀^π u(x, y) sin(nx) dx = ∂²ũ_S/∂y² (n, y).

Finally,

∫₀^π (−h) sin(nx) dx = −(h/n)[1 − (−1)ⁿ].

Therefore Poisson's equation transforms to

−n² ũ_S(n, y) − n(−1)ⁿ u(π, y) + n u(0, y) + ∂²ũ_S/∂y² (n, y) = −(h/n)[1 − (−1)ⁿ].

Now u(π, y) = 1 and u(0, y) = 0, so this equation can be written as

∂²ũ_S/∂y² (n, y) − n² ũ_S(n, y) = n(−1)ⁿ − (h/n)[1 − (−1)ⁿ].

For n = 1, 2, …, this equation has general solution

ũ_S(n, y) = a_n e^{ny} + b_n e^{−ny} + (−1)ⁿ⁺¹/n + (h/n³)[1 − (−1)ⁿ].

For this function to remain bounded for y > 0, choose a_n = 0 for n = 1, 2, …. Then

ũ_S(n, y) = b_n e^{−ny} + (−1)ⁿ⁺¹/n + (h/n³)[1 − (−1)ⁿ].

To solve for b_n, take the transform of the condition u(x, 0) = 0 to get

0 = ũ_S(n, 0) = b_n + (−1)ⁿ⁺¹/n + (h/n³)[1 − (−1)ⁿ].

Then

b_n = (−1)ⁿ/n − (h/n³)[1 − (−1)ⁿ].

We therefore have

ũ_S(n, y) = [(−1)ⁿ/n − (h/n³)(1 − (−1)ⁿ)] e^{−ny} + (−1)ⁿ⁺¹/n + (h/n³)[1 − (−1)ⁿ]
          = [(−1)ⁿ/n − (h/n³)(1 − (−1)ⁿ)] (e^{−ny} − 1).

By the inversion formula, these are the coefficients in the Fourier sine series (in x) of the solution, so the solution is

u(x, y) = (2/π) Σ_{n=1}^{∞} [(−1)ⁿ/n − (h/n³)(1 − (−1)ⁿ)] (e^{−ny} − 1) sin(nx).
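Two properties of this series are easy to verify numerically: each term vanishes at y = 0, and as y → ∞ the exponentials die out, leaving what should be the one-dimensional potential x/π + hx(π − x)/2 (the solution of u″ = −h with u(0) = 0, u(π) = 1). The check below is an illustrative sketch with arbitrarily chosen parameters:

```python
import numpy as np

def u(x, y, h, terms=200_000):
    # partial sum of the sine-series solution for the strip
    n = np.arange(1, terms + 1, dtype=float)
    coef = (-1.0)**n / n - h / n**3 * (1.0 - (-1.0)**n)
    return 2.0/np.pi * np.sum(coef * (np.exp(-n * y) - 1.0) * np.sin(n * x))

x, h = 1.0, 0.5
print(u(x, 0.0, h))                                  # boundary condition u(x, 0) = 0
print(u(x, 40.0, h), x/np.pi + h*x*(np.pi - x)/2)    # large-y limit
```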
SECTION 19.5 PROBLEMS

1. Write an integral solution for the Dirichlet problem for the upper half-plane if the boundary data is f(x) = −1 for −4 ≤ x < 0, f(x) = 1 for 0 ≤ x ≤ 4, and f(x) = 0 for |x| > 4.

2. Write an integral solution for the Dirichlet problem for the upper half-plane if the boundary data is f(x) = e^{−|x|}.

3. Write an integral solution for the Dirichlet problem for the right quarter-plane if u(x, 0) = e^{−x} cos(x) for x > 0 and u(0, y) = 0 for y > 0.

4. Write an integral solution for the Dirichlet problem for the right quarter-plane if u(x, 0) = 0 for x > 0 and u(0, y) = g(y) for y > 0.

5. Write an integral solution for the Dirichlet problem for the right quarter-plane if u(x, 0) = f(x) for x > 0 and u(0, y) = g(y) for y > 0.

6. Write an integral solution for the Dirichlet problem for the lower half-plane y < 0. Derive a solution first using separation of variables, and then by using an appropriate Fourier transform.

7. Solve the electrostatic potential problem for the strip 0 < x < π, y > 0 with the boundary data u(0, y) = 0 and u(π, y) = 0 for y > 0, and u(x, 0) = B sin(x) for 0 < x < π. Here B is a positive constant.

8. Solve the Dirichlet problem for the strip −∞ < x < ∞, 0 < y < 1 if u(x, 0) = 0 for x < 0 and u(x, 0) = e^{−αx} for x > 0, with α a positive number.

9. Solve the Dirichlet problem for the strip 0 < x < π, y > 0 if u(0, y) = 0 and u(π, y) = 2 for y > 0, and u(x, 0) = −4 for 0 < x < π.

10. Solve the following problem in which data on the boundary is a mixture of values of the function and values of a partial derivative of the function:

∇²u = 0 for 0 < x < π, 0 < y < 2,
u(0, y) = 0 and u(π, y) = 4 for 0 < y < 2,
and ∂u/∂y (x, 0) = u(x, 2) = 0 for 0 < x < π.
11. Find the steady-state temperature distribution in a thin, homogeneous flat plate extending over the right quarter plane x ≥ 0, y ≥ 0 if the temperature at y on the vertical side is e^{−y} and the temperature on the horizontal side is zero.

12. Solve for the steady-state temperature distribution in a homogeneous infinite flat plate covering the half-plane x ≥ 0 if the temperature on the boundary x = 0 is f(y), where f(y) = 1 for |y| ≤ 1 and f(y) = 0 for |y| > 1.

13. Solve for the steady-state temperature distribution in an infinite, homogeneous flat plate covering the half-plane y ≥ 0 if the temperature on the boundary y = 0 is zero for x < 4, constant A for 4 ≤ x ≤ 8, and zero for x > 8.

14. Write a general expression for the steady-state temperature distribution in an infinite, homogeneous flat plate covering the strip 0 ≤ y ≤ 1, x ≥ 0 if the temperatures on the left boundary and on the bottom side are zero, and the temperature on the top part of the boundary is f(x).

19.6
A Dirichlet Problem for a Cube

We will illustrate a Dirichlet problem in 3-space. Consider:

∇²u(x, y, z) = 0 for 0 < x < A, 0 < y < B, 0 < z < C,
u(x, y, 0) = u(x, y, C) = 0,
u(0, y, z) = u(A, y, z) = 0,
u(x, 0, z) = 0, u(x, B, z) = f(x, z).

We want a function that is harmonic on the cube (which may have sides of unequal length), and zero on five sides, but with prescribed values f(x, z) on the sixth side. Let u(x, y, z) = X(x)Y(y)Z(z) to obtain

X″/X = −Y″/Y − Z″/Z = −λ,

and then, after a second separation,

Z″/Z = −Y″/Y + λ = −μ.

Then

X″ + λX = 0, Z″ + μZ = 0, and Y″ − (λ + μ)Y = 0.

From the boundary conditions,

X(0) = X(A) = 0, Z(0) = Z(C) = 0, and Y(0) = 0.
The problems for X and Z are familiar ones, and we obtain eigenvalues and eigenfunctions:

λ_n = n²π²/A², X_n(x) = sin(nπx/A) and μ_m = m²π²/C², Z_m(z) = sin(mπz/C),

with n and m independently varying over the positive integers. The differential equation for Y(y) becomes

Y″ − (n²π²/A² + m²π²/C²) Y = 0, Y(0) = 0.

This has solutions that are constant multiples of sinh(β_nm y), where

β_nm = π √(n²/A² + m²/C²).

For each positive integer n and m, we now have a function

u_nm(x, y, z) = c_nm sin(nπx/A) sin(mπz/C) sinh(β_nm y),

which satisfies Laplace's equation and the zero boundary conditions given on five of the faces of the cube. To satisfy the condition on the sixth face, we generally must use a superposition

u(x, y, z) = Σ_{n=1}^{∞} Σ_{m=1}^{∞} c_nm sin(nπx/A) sin(mπz/C) sinh(β_nm y).

Now we must choose the coefficients so that

u(x, B, z) = f(x, z) = Σ_{n=1}^{∞} Σ_{m=1}^{∞} c_nm sin(nπx/A) sin(mπz/C) sinh(β_nm B).

We have encountered this kind of double Fourier sine expansion previously, in treating vibrations of a fixed-frame rectangular elastic membrane. From that experience, we can write

c_nm = 4 / (AC sinh(β_nm B)) ∫₀^A ∫₀^C f(ξ, ζ) sin(nπξ/A) sin(mπζ/C) dζ dξ.

As usual, if nonzero data is prescribed on more than one face, then we split the Dirichlet problem into a sum of problems, on each of which there is nonzero data on only one face.
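The coefficient formula is easy to exercise numerically. Assuming the unit cube (A = B = C = 1) and face data f(x, z) = sin(πx) sin(πz), orthogonality forces every c_nm to vanish except c₁₁ = 1/sinh(π√2); the sketch below (illustrative only) confirms this by midpoint-rule quadrature:

```python
import numpy as np

A = B = C = 1.0
f = lambda x, z: np.sin(np.pi * x) * np.sin(np.pi * z)   # data on the face y = B

def c(n, m, pts=400):
    beta = np.pi * np.sqrt((n / A)**2 + (m / C)**2)
    x = (np.arange(pts) + 0.5) * (A / pts)               # midpoint nodes
    z = (np.arange(pts) + 0.5) * (C / pts)
    X, Z = np.meshgrid(x, z, indexing="ij")
    integral = np.sum(f(X, Z) * np.sin(n*np.pi*X/A) * np.sin(m*np.pi*Z/C)) \
               * (A / pts) * (C / pts)
    return 4.0 / (A * C * np.sinh(beta * B)) * integral

print(c(1, 1), 1.0 / np.sinh(np.pi * np.sqrt(2.0)))
print(c(2, 1))   # ≈ 0 by orthogonality
```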
SECTION 19.6
The Potential Equation
PROBLEMS 3. & 2 ux y z = 0
1. Solve & ux y z = 0 2
ux y 0 = 0 ux y = x − xy − y
for 0 < x < 1 0 < y < 1 0 < z < 1
ux y 0 = ux 1 z = 0
u0 y z = u1 y z = 0
u0 y z = u1 y z = 0
ux 0 z = ux y 0 = 0
ux 0 z = 0 ux y 1 = xy
ux y = 1 ux 2 z = 2
2. Solve & 2 ux y z = 0 for 0 < x < 2 0 < y < 2 0 < z < 1
4. & 2 ux y z = 0
ux y 0 = ux y 1 = 0
for 0 < x < 1 0 < y < 2 0 < z <
ux y 0 = x2 1 − xy2 − y ux y = 0
u0 y z = 0
u0 y z = 0 u1 y z = siny sinz
ux 0 z = 0 ux 2 z = 0
ux 0 z = ux 2 z = 0
u2 y z = z
19.7
The Steady-State Heat Equation for a Solid Sphere

Consider a solid sphere of radius R, centered at the origin. We want to solve for the steady-state temperature distribution, given the temperature at all times on the surface. In the steady-state case, ∂u/∂t = 0 and the heat equation is Laplace's equation ∇²u = 0. We will use spherical coordinates (ρ, θ, φ), in which ρ is the distance from the origin to (x, y, z), θ is the polar angle between the positive x axis and the projection onto the x, y plane of the line from the origin to (x, y, z), and φ is the angle of declination from the positive z axis to this line (Figure 19.5). We will also assume symmetry about the z axis, so u is a function of ρ and φ only. Then ∂u/∂θ = 0, and Laplace's equation becomes

∇²u = ∂²u/∂ρ² + (2/ρ) ∂u/∂ρ + (1/ρ²) ∂²u/∂φ² + (cot φ/ρ²) ∂u/∂φ = 0.
[FIGURE 19.5: Spherical coordinates (ρ, θ, φ).]
The temperature on the surface is u(R, φ) = f(φ).
To separate variables in the differential equation, let u(ρ, φ) = X(ρ)Φ(φ) to obtain

X″Φ + (2/ρ)X′Φ + (1/ρ²)XΦ″ + (cot φ/ρ²)XΦ′ = 0.

Then

ρ² X″/X + 2ρ X′/X = −Φ″/Φ − cot φ Φ′/Φ = λ.

Then

ρ²X″ + 2ρX′ − λX = 0

and

Φ″ + cot(φ)Φ′ + λΦ = 0.

The differential equation for Φ can be written

(1/sin φ) d/dφ [Φ′ sin φ] + λΦ = 0.    (19.4)

Change variables by putting x = cos φ. Then φ = arccos(x). Let G(x) = Φ(arccos x). Since 0 ≤ φ ≤ π, then −1 ≤ x ≤ 1. Compute

Φ′(φ) sin φ = sin φ (dG/dx)(dx/dφ) = sin φ G′(x)(−sin φ) = −sin²φ G′(x) = −(1 − cos²φ) G′(x) = −(1 − x²) G′(x).

Then

d/dφ [Φ′ sin φ] = −d/dφ [(1 − x²) G′(x)] = −(dx/dφ) d/dx [(1 − x²) G′(x)] = −d/dx [(1 − x²) G′(x)] (−sin φ).

Then

(1/sin φ) d/dφ [Φ′ sin φ] = d/dx [(1 − x²) G′(x)],

and equation (19.4) transforms to

[(1 − x²) G′(x)]′ + λ G(x) = 0.

This is Legendre's differential equation (Section 16.1). For bounded solutions, choose λ = n(n + 1) for n = 0, 1, 2, …. These are the eigenvalues of this problem. The eigenfunctions are nonzero constant multiples of the Legendre polynomials P_n(x). For n = 0, 1, 2, … we now have a solution of the differential equation for Φ:

Φ_n(φ) = G(cos φ) = P_n(cos φ).
Now that we know the admissible values for λ, the differential equation for X becomes

ρ²X″ + 2ρX′ − n(n + 1)X = 0.

This is a second order Euler differential equation, with general solution

X(ρ) = aρⁿ + bρ^{−n−1}.

We must choose b = 0 to have a bounded solution at the center of the sphere, because ρ^{−n−1} → ∞ as ρ → 0+. Thus X_n(ρ) = a_n ρⁿ. For each nonnegative integer n, we now have a function

u_n(ρ, φ) = a_n ρⁿ P_n(cos φ)

that satisfies Laplace's equation. To satisfy the boundary condition, write a superposition of these functions:

u(ρ, φ) = Σ_{n=0}^{∞} a_n ρⁿ P_n(cos φ).

We must choose the coefficients to satisfy

u(R, φ) = Σ_{n=0}^{∞} a_n Rⁿ P_n(cos φ) = f(φ).

To put this into the setting of Fourier–Legendre expansions, let φ = arccos(x) to write

Σ_{n=0}^{∞} a_n Rⁿ P_n(x) = f(arccos x).

This is a Fourier–Legendre series for the known function f(arccos x). From Section 16.1.5, the coefficients are

a_n Rⁿ = (2n + 1)/2 ∫_{−1}^{1} f(arccos x) P_n(x) dx,

or

a_n = (2n + 1)/(2Rⁿ) ∫_{−1}^{1} f(arccos x) P_n(x) dx.

The steady-state temperature distribution is

u(ρ, φ) = Σ_{n=0}^{∞} [(2n + 1)/2 ∫_{−1}^{1} f(arccos x) P_n(x) dx] (ρ/R)ⁿ P_n(cos φ).
EXAMPLE 19.7

Consider this solution in a specific case, with f(φ) = φ. Now

u(ρ, φ) = Σ_{n=0}^{∞} [(2n + 1)/2 ∫_{−1}^{1} arccos(x) P_n(x) dx] (ρ/R)ⁿ P_n(cos φ).

We will approximate some of these coefficients by approximating the integrals. From Section 16.1, the first six Legendre polynomials are

P₀(x) = 1, P₁(x) = x, P₂(x) = (3x² − 1)/2, P₃(x) = (5x³ − 3x)/2,
P₄(x) = (35x⁴ − 30x² + 3)/8, P₅(x) = (63x⁵ − 70x³ + 15x)/8.

Approximate:

∫_{−1}^{1} arccos(x) dx = π,
∫_{−1}^{1} x arccos(x) dx ≈ −0.7854,
∫_{−1}^{1} (1/2)(3x² − 1) arccos(x) dx = 0,
∫_{−1}^{1} (1/2)(5x³ − 3x) arccos(x) dx ≈ −4.9087 × 10⁻²,
∫_{−1}^{1} (1/8)(35x⁴ − 30x² + 3) arccos(x) dx = 0,
∫_{−1}^{1} (1/8)(63x⁵ − 70x³ + 15x) arccos(x) dx ≈ −1.2272 × 10⁻².

Taking the first six terms of the series as an approximation to the solution, we obtain

u(ρ, φ) ≈ π/2 − (3/2)(0.7854)(ρ/R) cos φ − (7/2)(0.049087)(ρ/R)³ (1/2)(5 cos³φ − 3 cos φ)
        − (11/2)(0.012272)(ρ/R)⁵ (1/8)(63 cos⁵φ − 70 cos³φ + 15 cos φ).

Some of these terms can be combined, but we have written them all out initially to indicate how they arise.
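The quoted integral values can be reproduced with Gauss–Legendre quadrature; they work out to π, −π/4, 0, −π/64, 0, and −π/256 for n = 0 through 5. The sketch below (illustrative, using numpy's Legendre utilities) recomputes them:

```python
import numpy as np
from numpy.polynomial import legendre as leg

def coeff_integral(n, nodes=4000):
    # ∫_{-1}^{1} arccos(x) P_n(x) dx by Gauss–Legendre quadrature
    x, w = leg.leggauss(nodes)
    Pn = leg.legval(x, [0.0] * n + [1.0])   # coefficient vector selecting P_n
    return float(np.sum(w * np.arccos(x) * Pn))

for n in range(6):
    print(n, coeff_integral(n))
```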
We will return to the Dirichlet problem again when we treat complex analysis. There we will be in a position to exploit conformal mappings. The idea will be to map the region of interest in a certain way to the unit disk. Since we can solve the Dirichlet problem for the disk (that is, we know a formula for the solution), this maps the original problem to a problem we can solve. We then attempt to invert the map to transform the solution for the disk back into the solution for the original region. We will conclude this chapter with a brief discussion of the Neumann problem.
SECTION 19.7
PROBLEMS
1. Write a solution for the steady-state temperature distribution in the sphere if the initial data is given by f = A2 , in which A is a positive constant. Carry out an approximation integration to obtain the coefficients, and write (approximately) the first six terms of the series solution.
2. Carry out the program of Problem 1 for the initial data function f = sin for 0 ≤ ≤ . 3. Carry out the program of Problem 1 for the initial data function f = 3 . 4. Carry out the program of Problem 1 for the initial data function f = 2 − 2 .
5. Solve for the steady-state temperature distribution in a hollowed sphere, given in spherical coordinates by R₁ ≤ ρ ≤ R₂. The inner surface ρ = R₁ is kept at constant temperature T₁, while the outer surface ρ = R₂ is kept at temperature zero. Assume that the temperature distribution is a function of ρ and φ only.

6. Approximate the solution of Problem 5 by writing the first six terms of the series solution, carrying out any required integrations by a numerical method.

7. Solve for the steady-state temperature distribution in a solid closed hemisphere, which in spherical coordinates is given by 0 ≤ ρ ≤ R, 0 ≤ θ ≤ 2π, 0 ≤ φ ≤ π/2. The base disk is kept at temperature zero and the hemispherical surface is kept at constant temperature A. Assume that the distribution is independent of θ.

8. Redo Problem 7, except now the base is insulated instead of being kept at temperature zero.

9. Redo Problem 7 for the case that the temperature on the hemispherical surface is u(R, φ) = f(φ), not necessarily constant.

19.8
The Neumann Problem

A Neumann problem in the plane consists of finding a function that is harmonic on a given region D, and whose normal derivative on the boundary of the region is given. This problem has the form

∇²u(x, y) = 0 for (x, y) in D,
∂u/∂n (x, y) = g(x, y) for (x, y) on ∂D,

where, as usual, ∂D denotes the boundary of D. This boundary is often a piecewise smooth curve in the plane (but not necessarily a closed curve). The normal derivative is defined by

∂u/∂n = ∇u · n,

the dot product of the gradient of u with the unit outer normal to the curve (Figure 19.6). If this normal is n = n₁i + n₂j, then

∂u/∂n = n₁ ∂u/∂x + n₂ ∂u/∂y.

We will use the following.
[FIGURE 19.6: Outer normal n at a point on ∂D.]
LEMMA 19.1  Green's First Identity
Let D be a bounded region in the plane, whose boundary ∂D is a closed, piecewise smooth curve. Let k and h be continuous with continuous first and second partial derivatives on D and its boundary. Then

∮_{∂D} k ∂h/∂n ds = ∬_D (k∇²h + ∇k · ∇h) dA.

In this line integral, ds denotes integration with respect to arc length along the curve bounding D. A 3-dimensional version was proved in Section 13.8.4 using Gauss's divergence theorem. Here is a proof for this version in the plane.

Proof  By Green's theorem write

∮_{∂D} k ∂h/∂n ds = ∮_{∂D} k∇h · n ds = ∬_D div(k∇h) dA.

Now,

div(k∇h) = div(k ∂h/∂x i + k ∂h/∂y j)
         = ∂/∂x (k ∂h/∂x) + ∂/∂y (k ∂h/∂y)
         = k (∂²h/∂x² + ∂²h/∂y²) + (∂k/∂x)(∂h/∂x) + (∂k/∂y)(∂h/∂y)
         = k∇²h + ∇k · ∇h.

Use this result as follows. If k = 1 and h = u, a harmonic function on D, then the double integral is zero because its integrand vanishes, and the line integral is just the line integral of the normal derivative of u over the boundary of the region. But on ∂D, ∂u/∂n = g, a given function. We conclude that

∮_{∂D} ∂u/∂n ds = ∮_{∂D} g ds = 0.

This means that a necessary condition for a Neumann problem to have a solution is that the integral of the given normal derivative around the boundary of the region be zero. This conclusion can be extended to the case that ∂D is not a closed curve. For example, the boundary of the upper half plane is the horizontal axis, which is not a closed curve.
EXAMPLE 19.8

Solve the Neumann problem for a square:

∇²u(x, y) = 0 for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1,

subject to

∂u/∂n = 0

on the left, top, and bottom sides, while

∂u/∂n (1, y) = y² for 0 ≤ y ≤ 1.

Since

∮_{∂D} ∂u/∂n ds = ∫₀¹ y² dy = 1/3 ≠ 0,

this problem has no solution.

Existence can also be a question for a Dirichlet problem. However, for a Dirichlet problem, if the function given on the boundary is well-behaved (for example, continuous), and the region is "simple" (such as a disk, rectangle, half-plane and the like), then the Dirichlet problem has a solution. For Neumann problems, even for simple regions and apparently well-behaved data given for the normal derivative, there may be no solution if the integral of the data function around the boundary is not zero. We will now solve two Neumann problems to illustrate what is involved.
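A small helper makes the solvability test routine. Here it reproduces the computation in Example 19.8 (an illustrative sketch; the function names are ours, not the text's):

```python
from scipy.integrate import quad

def boundary_flux(segments):
    """segments: iterable of (g, a, b) tuples, where g is the normal-derivative
    data on one boundary piece parameterized by arc length over [a, b].
    A Neumann problem can have a solution only if this total is zero."""
    return sum(quad(g, a, b)[0] for g, a, b in segments)

# Example 19.8: data is zero on three sides and y² on the right side x = 1
flux = boundary_flux([(lambda y: y**2, 0.0, 1.0)])
print(flux)   # 1/3 ≠ 0, so the problem has no solution
```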
19.8.1 A Neumann Problem for a Rectangle

Consider the problem

∇²u(x, y) = 0 for 0 < x < a, 0 < y < b,
∂u/∂y (x, 0) = ∂u/∂y (x, b) = 0 for 0 ≤ x ≤ a,
∂u/∂x (0, y) = 0 for 0 ≤ y ≤ b,
∂u/∂x (a, y) = g(y) for 0 ≤ y ≤ b.

For the rectangle, the normal derivative is ∂u/∂x on the vertical sides, and ∂u/∂y on the horizontal sides. As a necessary (but not sufficient) condition for existence of a solution, we require that

∫₀^b g(y) dy = 0.

This example will clarify why there can be no solution without this condition. Let u(x, y) = X(x)Y(y) and obtain

X″ + λX = 0, Y″ − λY = 0.

Now

∂u/∂y (x, 0) = X(x)Y′(0) = 0

implies that Y′(0) = 0. Similarly,

∂u/∂y (x, b) = X(x)Y′(b) = 0

implies that Y′(b) = 0. The problem for Y is

Y″ − λY = 0; Y′(0) = Y′(b) = 0.

This familiar Sturm–Liouville problem has eigenvalues and eigenfunctions

λ_n = −n²π²/b² and Y_n(y) = cos(nπy/b)

for n = 0, 1, 2, ….
Now the problem for X is

X″ − (n²π²/b²) X = 0.

Further,

∂u/∂x (0, y) = X′(0)Y(y) = 0

implies that X′(0) = 0.

For n = 0, the differential equation for X is just X″ = 0, so X(x) = cx + d. Then X′(0) = c = 0, so X(x) = constant in this case.

If n is a positive integer, then the differential equation for X has general solution

X(x) = ce^{nπx/b} + de^{−nπx/b}.

Now

X′(0) = (nπ/b)c − (nπ/b)d = 0

implies that c = d. This gives us

X_n(x) = cosh(nπx/b).

We now have functions u₀(x, y) = constant and, for each positive integer n,

u_n(x, y) = c_n cosh(nπx/b) cos(nπy/b).

To satisfy the last boundary condition (on the right side of the rectangle), use a superposition

u(x, y) = c₀ + Σ_{n=1}^{∞} c_n cosh(nπx/b) cos(nπy/b).

We need

∂u/∂x (a, y) = g(y) = Σ_{n=1}^{∞} c_n (nπ/b) sinh(nπa/b) cos(nπy/b),

a Fourier cosine expansion of g(y) on [0, b]. Notice that the constant term in this expansion of g(y) is zero. This constant term is

(1/b) ∫₀^b g(y) dy,

which we have assumed to be zero. If this integral were not zero, then the cosine expansion of g(y) would have a nonzero constant term, contradicting the fact that it does not. In this event this Neumann problem would have no solution. For the other coefficients in this cosine series, we have

c_n (nπ/b) sinh(nπa/b) = (2/b) ∫₀^b g(ξ) cos(nπξ/b) dξ,

so

c_n = 2 / (nπ sinh(nπa/b)) ∫₀^b g(ξ) cos(nπξ/b) dξ.
With this choice of coefficients, the solution of this Neumann problem is

u(x, y) = c₀ + Σ_{n=1}^{∞} c_n cosh(nπx/b) cos(nπy/b).

The number c₀ is undetermined and remains arbitrary. This is because Neumann problems do not have unique solutions. If u is any solution of a Neumann problem, so is u + c for any constant c, because the boundary condition is on the normal derivative and c vanishes in this differentiation.
19.8.2 A Neumann Problem for a Disk

We will solve the Neumann problem for a disk of radius R centered about the origin. In polar coordinates, the problem is

∇²u(r, θ) = 0 for 0 ≤ r < R, −π ≤ θ ≤ π,
∂u/∂r (R, θ) = f(θ) for −π ≤ θ ≤ π.

The normal derivative here is ∂/∂r, since the line from the origin to a point on this circle is in the direction of the outer normal vector to the circle at that point. A necessary condition for existence of a solution is that

∫_{−π}^{π} f(θ) dθ = 0,

and we assume that f satisfies this condition. As we did with the Dirichlet problem for a disk, attempt a solution

u(r, θ) = (1/2)a₀ + Σ_{n=1}^{∞} (a_n rⁿ cos(nθ) + b_n rⁿ sin(nθ)).

We need

∂u/∂r (R, θ) = f(θ) = Σ_{n=1}^{∞} (n a_n Rⁿ⁻¹ cos(nθ) + n b_n Rⁿ⁻¹ sin(nθ)).

This is a Fourier expansion of f(θ) on [−π, π]. The constant term in this expansion is

(1/2π) ∫_{−π}^{π} f(θ) dθ,

and this must be zero because this Fourier series for ∂u/∂r (R, θ) has a zero constant term. The assumption that this integral is zero is therefore consistent with this boundary condition. For the other coefficients, we need

n a_n Rⁿ⁻¹ = (1/π) ∫_{−π}^{π} f(ξ) cos(nξ) dξ

and

n b_n Rⁿ⁻¹ = (1/π) ∫_{−π}^{π} f(ξ) sin(nξ) dξ.

Thus choose

a_n = 1/(nπRⁿ⁻¹) ∫_{−π}^{π} f(ξ) cos(nξ) dξ

and

b_n = 1/(nπRⁿ⁻¹) ∫_{−π}^{π} f(ξ) sin(nξ) dξ.

Upon inserting these coefficients, the solution is

u(r, θ) = (1/2)a₀ + (R/π) Σ_{n=1}^{∞} (1/n)(r/R)ⁿ ∫_{−π}^{π} [cos(nξ) cos(nθ) + sin(nξ) sin(nθ)] f(ξ) dξ.

We can also write this solution as

u(r, θ) = (1/2)a₀ + (R/π) Σ_{n=1}^{∞} (1/n)(r/R)ⁿ ∫_{−π}^{π} cos(n(ξ − θ)) f(ξ) dξ.

The term a₀/2 is an arbitrary constant. The factor of 1/2 in this arbitrary constant is just customary.
EXAMPLE 19.9

Solve the Neumann problem

∇²u(x, y) = 0 for x² + y² < 1,
∂u/∂n (x, y) = xy² for x² + y² = 1.

Switch to polar coordinates, letting u(r cos θ, r sin θ) = U(r, θ). Then

∇²U(r, θ) = 0 for 0 ≤ r < 1, −π ≤ θ ≤ π,
∂U/∂r (1, θ) = cos θ sin²θ.

First, compute

∫_{−π}^{π} cos θ sin²θ dθ = 0,

so it is worthwhile to try to solve this problem. Write the solution

U(r, θ) = (1/2)a₀ + (1/π) Σ_{n=1}^{∞} (rⁿ/n) ∫_{−π}^{π} cos(n(ξ − θ)) cos ξ sin²ξ dξ.

Evaluate

∫_{−π}^{π} cos(n(ξ − θ)) cos ξ sin²ξ dξ = 0 for n = 2, 4, 5, 6, …,
= (π/4) cos θ if n = 1,
= −(π/4) cos 3θ if n = 3.

The solution is therefore

U(r, θ) = (1/2)a₀ + (1/π) [r (π/4) cos θ − (r³/3)(π/4) cos 3θ]
        = (1/2)a₀ + (1/4) r cos θ − (1/12) r³ cos 3θ.

To obtain the solution in rectangular coordinates, use x = r cos θ, r² = x² + y², and the identity cos 3θ = 4 cos³θ − 3 cos θ to write

u(x, y) = (1/2)a₀ + (1/4)x − (1/3)x³ + (1/4)x(x² + y²).
Again, the solution has an arbitrary constant, which is written with a factor of 1/2 simply because we started with a Fourier series and the constant is often called a₀/2.
19.8.3 A Neumann Problem for the Upper Half Plane

As an illustration of a Neumann problem for an unbounded domain, consider:

∇²u(x, y) = 0 for −∞ < x < ∞, y > 0,
∂u/∂y (x, 0) = f(x) for −∞ < x < ∞.

The boundary of the region is the real axis, and ∂/∂y is the derivative normal to this line. We require that

∫_{−∞}^{∞} f(x) dx = 0

as a necessary condition for a solution to exist.

There is an elegant device for reducing this problem to one we have already solved. Let v = ∂u/∂y. Then

∇²v = ∂²/∂x² (∂u/∂y) + ∂²/∂y² (∂u/∂y) = ∂/∂y (∂²u/∂x² + ∂²u/∂y²) = 0,

so v is harmonic wherever u is. Further,

v(x, 0) = ∂u/∂y (x, 0) = f(x) for −∞ < x < ∞.

Therefore v is the solution of a Dirichlet problem for the upper half plane. But we know the solution of this problem:

v(x, y) = (y/π) ∫_{−∞}^{∞} f(ξ) / (y² + (ξ − x)²) dξ.

Now recover u from v by integrating. To within an arbitrary constant,

u(x, y) = ∫ v(x, y) dy = (1/π) ∫_{−∞}^{∞} [∫ y / (y² + (ξ − x)²) dy] f(ξ) dξ
        = (1/2π) ∫_{−∞}^{∞} ln(y² + (ξ − x)²) f(ξ) dξ + c,

in which c is an arbitrary constant. This gives the solution of the Neumann problem for the upper half-plane.
SECTION 19.8
PROBLEMS 2. Solve
1. Solve & ux y = 0 2
for 0 < x < 1 0 < y < 1
u
u x 0 = 4 cosx x 1 = 0
y
y
u
u 0 y = 1 y = 0
x
x
for 0 ≤ x ≤ 1
for 0 ≤ y ≤ 1
& 2 ux y = 0
for 0 < x < 1 0 < y <
u
u x 0 = x = 0
y
y
u 0 y = y −
x 2
for 0 ≤ x ≤ 1
for 0 ≤ y ≤
19.8 The Neumann Problem
u y = cosy
x
for 0 ≤ y ≤
3. Solve & ux y = 0 2
for 0 < x < 0 < y <
for 0 ≤ x ≤
u
u 0 y = y = 0
x
x
for 0 ≤ y ≤
4. Use separation of variables to solve the mixed boundary value problem for 0 < x < 0 < y <
ux 0 = fx ux = 0
u
u 0 y = y = 0
x
x
for 0 ≤ x ≤
for 0 ≤ y ≤
Does this problem have a unique solution? 5. Attempt a separation of variables to solve for 0 < x < 1 0 < y < 1
ux 0 = ux 1 = 0
for 0 ≤ x ≤ 1
u 0 y = 3y2 − 2y
x
u 1 y = 0
x
for 0 ≤ y ≤ 1
for 0 ≤ r < R − ≤ ≤
u R = sin3
r
for 0 ≤ r < R − ≤ ≤
u R = cos2
r
for − ≤ ≤
for − < x < y > 0
u x 0 = xe−x
y
for − ≤ ≤
for − < x <
9. Solve the following Neumann problem for the upper half plane: & 2 ux y = 0
for − < x < y > 0
u x 0 = e−x sinx for − < x <
y 10. Solve the following Neumann problem for the lower half plane: & 2 ux y = 0
for − < x < y < 0
u x 0 = fx for − < x <
y 11. Solve the following Neumann problem for the right quarter plane: & 2 ux y = 0
u 0 y = 0
x
for x > 0 y > 0 for y ≥ 0
u x 0 = fx for 0 ≤ x <
y 12. Solve the following mixed problem: & 2 ux y = 0
6. Write a series solution for & 2 ur = 0
& 2 ur = 0
& 2 ux y = 0
u x = 6x − 3
y
& 2 ux y = 0
7. Write a solution for
8. Solve the following Neumann problem for the upper half plane:
u x 0 = cos3x
y
& 2 ux y = 0
u0 y = 0
for x > 0 y > 0 for y ≥ 0
u x 0 = fx
y
for 0 ≤ x <
PA RT
7
CHAPTER 20 Geometry and Arithmetic of Complex Numbers
Complex Analysis
CHAPTER 21 Complex Functions
CHAPTER 22 Complex Integration
CHAPTER 23 Series Representation of Functions
CHAPTER 24 Singularities and the Residue Theorem
CHAPTER 25 Conformal Mappings
This part is devoted to the calculus of functions that act on complex numbers and produce complex numbers. Such functions are called complex functions, and we are interested in their derivatives and integrals. These have not only a rich theory, but applications that are sometimes surprising. We will begin with the algebra and geometry of the complex number system.
20
CHAPTER
COMPLEX NUMBERS  MAGNITUDE AND CONJUGATE  INEQUALITIES  ORDERING  LOCI AND SETS OF POINTS IN THE COMPLEX PLANE
Geometry and Arithmetic of Complex Numbers
20.1
Complex Numbers

A complex number is a symbol of the form x + iy, or x + yi, in which x and y are real numbers and i² = −1. Arithmetic of complex numbers is defined by

equality: a + ib = c + id exactly when a = c and b = d,
addition: (a + ib) + (c + id) = (a + c) + i(b + d),
and multiplication: (a + ib)(c + id) = (ac − bd) + i(ad + bc).

In multiplying complex numbers, we proceed exactly as we would with first order polynomials a + bx and c + dx, but with i in place of x:

(a + bi)(c + di) = ac + adi + bci + bdi² = (ac − bd) + (ad + bc)i,

because i² = −1. For example,

(6 − 4i)(8 + 13i) = (6)(8) − (−4)(13) + i[(6)(13) + (−4)(8)] = 100 + 46i.

The real number a is called the real part of a + bi, and is denoted Re(a + bi). The real number b is the imaginary part, denoted Im(a + bi). For example,

Re(−23 + 7i) = −23 and Im(−23 + 7i) = 7.

Both the real and imaginary part of any complex number are real numbers. We may think of the complex number system as an extension of the real number system in the sense that every real number a is a complex number a + 0i. This extension of the reals to the complex numbers has profound consequences, both for algebra and analysis. For example, the polynomial equation x² + 1 = 0 has no real solution, but it has two complex solutions, i and −i. More generally, the fundamental theorem of algebra states that every polynomial of
positive degree n, having complex coefficients (some or all of which may be real), has exactly n roots in the complex numbers, counting repeated roots. This means that we need never extend beyond the complex numbers to find roots of polynomials having complex coefficients, as we have to extend beyond the reals to find the roots of a simple polynomial such as x² + 1.

Complex arithmetic obeys many of the same rules as arithmetic of real numbers. Specifically, for any complex numbers z, w and u,

z + w = w + z (commutativity of addition)
zw = wz (commutativity of multiplication)
(z + w) + u = z + (w + u) (associativity of addition)
(zw)u = z(wu) (associativity of multiplication)
z(w + u) = zw + zu (distributivity)
z + 0 = 0 + z = z
z · 1 = 1 · z = z
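Python's built-in complex type implements exactly these rules (with j in place of i), which makes it handy for checking hand computations such as the product worked out above. A quick illustrative snippet:

```python
z = 6 - 4j
w = 8 + 13j

print(z * w)             # (6·8 − (−4)·13) + (6·13 + (−4)·8)i = 100 + 46i
print(z + w)             # componentwise, like vector addition: 14 + 9i
print(z.real, z.imag)    # Re(z) = 6.0, Im(z) = −4.0
```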
20.1.1 The Complex Plane

Complex numbers admit two natural geometric interpretations. First, we may identify the complex number a + bi with the point (a, b) in the plane, as in Figure 20.1. In this interpretation each real number a, or a + 0i, is identified with the point (a, 0) on the horizontal axis, which is therefore called the real axis. A number 0 + bi, or just bi, is called a pure imaginary number, and is associated with the point (0, b) on the vertical axis. This axis is called the imaginary axis. Because of this correspondence between complex numbers and points in the plane, we often refer to the x, y plane as the complex plane.

When complex numbers were first noticed (in solving polynomial equations), mathematicians were suspicious of them, and even the great eighteenth-century Swiss mathematician Leonhard Euler, who used them in calculations with unparalleled proficiency, did not recognize them as "legitimate" numbers. It was the nineteenth-century German mathematician Carl Friedrich Gauss who fully appreciated their geometric significance, and used his standing in the scientific community to promote their legitimacy to other mathematicians and natural philosophers.

The second geometric interpretation of complex numbers is in terms of vectors. The complex number z = a + bi, or the point (a, b), may be thought of as a vector ai + bj in the plane, which may in turn be represented as an arrow from the origin to (a, b), as in Figure 20.2. The first component of this vector is Re(z) and the second component is Im(z). In this interpretation the definition of addition of complex numbers is equivalent to the parallelogram law for vector addition, since we add two vectors by adding their respective components (Figure 20.3).
FIGURE 20.1 The complex plane.
FIGURE 20.2 Complex numbers as vectors in a plane.
FIGURE 20.3 Parallelogram law for addition of complex numbers.
20.1.2 Magnitude and Conjugate
DEFINITION 20.1

Magnitude

The magnitude of a + bi is denoted |a + bi|, and is defined by
|a + bi| = √(a² + b²)
Of course, the magnitude of zero is zero. As Figure 20.4 suggests, if z = a + ib is a nonzero complex number, then |z| is the distance from the origin to the point (a, b). Alternatively, |z| is the length of the vector ai + bj representing z. For example,
|2 − 5i| = √(4 + 25) = √29.
The magnitude of a complex number is also called its modulus.
DEFINITION 20.2

Conjugate

The complex conjugate (or just conjugate) of z = a + bi is the number denoted z̄, defined by
z̄ = a − bi
We get the conjugate of z by changing the sign of the imaginary part of z. For example,
(3 − 8i)‾ = 3 + 8i,  ī = −i,  and  (−25)‾ = −25.
This operation does not change the real part of z. For z = a + ib we have
Re(z̄) = a = Re(z) and Im(z̄) = −b = −Im(z).
The operation of taking the conjugate can be interpreted as a reflection across the real axis, because the point (a, −b) associated with a − ib is the reflection across the horizontal axis of the point (a, b) associated with a + ib (Figure 20.5). Here are some computational rules for magnitude and conjugate.
FIGURE 20.4 Magnitude of a complex number.
FIGURE 20.5 Conjugate of a complex number.
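Both operations are built into Python: abs gives the magnitude, and the conjugate method gives z̄. A quick check of the definitions above (our sketch, not part of the text):

```python
import math

z = 2 - 5j

# |a + bi| = sqrt(a^2 + b^2); for z = 2 - 5i this is sqrt(29)
assert math.isclose(abs(z), math.sqrt(29))

# The conjugate changes the sign of the imaginary part only
assert z.conjugate() == 2 + 5j
assert z.conjugate().real == z.real
```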
THEOREM 20.1
Let z and w be complex numbers. Then

1. (z̄)‾ = z.
2. (z + w)‾ = z̄ + w̄.
3. (zw)‾ = z̄ w̄.
4. |z̄| = |z|.
5. |zw| = |z| |w|.
6. Re(z) = (1/2)(z + z̄) and Im(z) = (1/2i)(z − z̄).
7. |z| ≥ 0, and |z| = 0 if and only if z = 0.
8. z z̄ = |z|².

Proof  Conclusion (1) states that taking the conjugate of a conjugate returns the original number. This is geometrically evident, since a reflection of (x, y) to (x, −y), followed by a reflection of (x, −y) to (x, y), returns to the original point. For an analytical argument, write
((a + ib)‾)‾ = (a − ib)‾ = a + ib.
For conclusion (5), let z = a + ib and w = c + id. Then
|zw| = |(ac − bd) + i(ad + bc)|
= √((ac − bd)² + (ad + bc)²)
= √(a²c² + b²d² − 2abcd + a²d² + b²c² + 2abcd)
= √(a²c² + a²d² + b²c² + b²d²)
= √((a² + b²)(c² + d²)) = |z| |w|.
A much neater proof of (5) will be available when we know about the polar form of a complex number. The other parts of the theorem are left to the student.
20.1.3 Complex Division

Suppose we want to form the quotient z/w, where w ≠ 0. This quotient is the complex number u such that wu = z. This is not very helpful in actually finding u, however. Here is a computationally effective way of performing complex division. Let z = a + ib and w = c + id and write
(a + ib)/(c + id) = [(a + ib)(c − id)]/[(c + id)(c − id)] = [(ac + bd) + i(bc − ad)]/(c² + d²).
By multiplying and dividing the original fraction by the conjugate of the denominator, we obtain an expression in which the real and imaginary parts of the quotient are apparent. The reason for this is that the new denominator is w w̄, which is the real number |w|². For example,
(2 − 7i)/(8 + 3i) = [(2 − 7i)(8 − 3i)]/[(8 + 3i)(8 − 3i)] = (−5 − 62i)/(64 + 9) = −5/73 − (62/73)i,
so the real part of this quotient is −5/73 and the imaginary part is −62/73.
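The worked quotient (2 − 7i)/(8 + 3i) can be reproduced the same way; this sketch (ours) multiplies by the conjugate of the denominator, exactly as described above:

```python
import math

z = 2 - 7j
w = 8 + 3j

# Multiply numerator and denominator by the conjugate of w;
# the denominator becomes the real number w * conj(w) = |w|^2 = 73.
q = (z * w.conjugate()) / (w * w.conjugate()).real

assert math.isclose(q.real, -5 / 73)   # real part of the quotient
assert math.isclose(q.imag, -62 / 73)  # imaginary part
assert abs(q - z / w) < 1e-12          # Python's own division agrees
```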
20.1.4 Inequalities
There are several inequalities we will have occasion to use.
THEOREM 20.2
Let z and w be complex numbers. Then
1. |Re(z)| ≤ |z| and |Im(z)| ≤ |z|.
2. |z + w| ≤ |z| + |w|.
3. | |z| − |w| | ≤ |z + w|.

Proof  If z = a + ib, then
|Re(z)| = |a| ≤ √(a² + b²) = |z|
and
|Im(z)| = |b| ≤ √(a² + b²) = |z|.

Conclusion (2), which is called the triangle inequality, was proved for vectors. Here is a separate proof in the context of complex numbers:
0 ≤ |z + w|² = (z + w)(z + w)‾ = (z + w)(z̄ + w̄)
= z z̄ + z w̄ + w z̄ + w w̄
= |z|² + z w̄ + (z w̄)‾ + |w|²
= |z|² + 2 Re(z w̄) + |w|²
≤ |z|² + 2 |z w̄| + |w|²
= |z|² + 2 |z| |w̄| + |w|²
= |z|² + 2 |z| |w| + |w|²
= (|z| + |w|)².
In summary,
0 ≤ |z + w|² ≤ (|z| + |w|)².
Upon taking the square root of these nonnegative quantities, we obtain the triangle inequality.

For (3), use the triangle inequality to write
|z| = |(z + w) − w| ≤ |z + w| + |w|,
hence
|z| − |w| ≤ |z + w|.
By interchanging z and w,
|w| − |z| ≤ |z + w|.
Therefore
−|z + w| ≤ |z| − |w| ≤ |z + w|,
so
| |z| − |w| | ≤ |z + w|.
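The three inequalities are easy to spot-check numerically; this sketch (ours, not the text's) tests them on random points, with a tiny slack for floating-point rounding:

```python
import random

random.seed(1)
for _ in range(1000):
    z = complex(random.uniform(-5, 5), random.uniform(-5, 5))
    w = complex(random.uniform(-5, 5), random.uniform(-5, 5))
    assert abs(z.real) <= abs(z) + 1e-12                # |Re z| <= |z|
    assert abs(z.imag) <= abs(z) + 1e-12                # |Im z| <= |z|
    assert abs(z + w) <= abs(z) + abs(w) + 1e-12        # triangle inequality
    assert abs(abs(z) - abs(w)) <= abs(z + w) + 1e-12   # conclusion (3)
```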
20.1.5 Argument and Polar Form of a Complex Number

Let z = a + ib be a nonzero complex number. Then (a, b) is a point other than the origin in the plane. Let this point have polar coordinates (r, θ). As is standard with polar coordinates, the polar angle of (a, b) is not uniquely determined. If we walk out the real axis r units to the right from the origin, and rotate this segment θ0 radians until the segment ends at (a, b), as in Figure 20.6, then a polar angle for (a, b) is any number θ0 + 2nπ, in which n is any integer. A positive choice for n corresponds to rotating the segment from 0 to r an initial θ0 radians to reach (a, b), and then continuing counterclockwise an additional n full circles, which again lands us on (a, b). A negative choice for n corresponds to rotating the segment from 0 to r an initial θ0 radians, and then going around an additional |n| circles clockwise, again ending at (a, b). Thus, by convention, we think of counterclockwise rotations as having positive orientation in the plane, and clockwise rotations as having negative orientation.

To illustrate, consider z = 1 + i. The point (1, 1) has polar coordinates (√2, π/4), since 1 + i is √2 units from the origin, and the segment from the origin to 1 + i makes an angle of π/4 radians with the positive real axis (Figure 20.7). All of the polar coordinates of (1, 1) have the form
(√2, π/4 + 2nπ),
in which n can be any integer.

If nonzero z has polar coordinates (r, θ), then r = |z|. The angle θ (which we will always express in radians) is called an argument of z. Any nonzero number has infinitely many arguments, and they differ from each other by integer multiples of 2π. The arguments of 1 + i are π/4 + 2nπ, for n any integer.

Now recall Euler's formula
e^{iθ} = cos(θ) + i sin(θ).
If θ is any argument of z = a + ib, then (a, b) has polar coordinates (r, θ), so a = r cos(θ) and b = r sin(θ). Combining this fact with Euler's formula, we have
z = a + ib = r cos(θ) + i r sin(θ) = re^{iθ}.
This exponential form for z is called the polar form of z.

Any argument of z can be used in this polar form, because any two arguments θ0 and θ1 differ by some integer multiple of 2π. If, say, θ1 = θ0 + 2kπ, then
re^{iθ1} = r[cos(θ0 + 2kπ) + i sin(θ0 + 2kπ)] = r[cos(θ0) + i sin(θ0)] = re^{iθ0}.
FIGURE 20.6 (a, b) has polar coordinates (r, θ0 + 2nπ), n any integer.
FIGURE 20.7 Polar coordinates of 1 + i are (√2, π/4 + 2nπ), n any integer.
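The standard library's cmath module converts between the two forms directly: cmath.polar returns (r, θ) with θ in [−π, π], and cmath.exp evaluates e^{iθ}. Checking the polar form of 1 + i (our sketch):

```python
import cmath
import math

r, theta = cmath.polar(1 + 1j)
assert math.isclose(r, math.sqrt(2))     # r = |1 + i|
assert math.isclose(theta, math.pi / 4)  # pi/4 is an argument of 1 + i

# Euler's formula reassembles the number: z = r e^{i theta}
z = r * cmath.exp(1j * theta)
assert math.isclose(z.real, 1) and math.isclose(z.imag, 1)
```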
EXAMPLE 20.1
We will find the polar form of −1 + 4i. First, r = |−1 + 4i| = √17. Now consider Figure 20.8. θ is an argument of −1 + 4i, and the angle α will be handy in determining θ. From the diagram, tan(α) = 4, so
θ = π − α = π − tan⁻¹(4).
We can therefore write the polar form
−1 + 4i = √17 e^{i(π − tan⁻¹(4))}.

FIGURE 20.8 π − tan⁻¹(4) is an argument of −1 + 4i.
EXAMPLE 20.2
We will find the polar form of 3 + 3i. As indicated in Figure 20.9, π/4 is an argument of 3 + 3i. The polar form is
3 + 3i = √18 e^{iπ/4}.

FIGURE 20.9 π/4 is an argument of 3 + 3i.

For any real θ,
|e^{iθ}| = √(cos²(θ) + sin²(θ)) = 1.
This means that, in writing the polar form z = re^{iθ}, the magnitude of z is wholly contained in the factor r, while e^{iθ}, which has magnitude 1, contributes all the information about the direction of (nonzero) z from the origin.

Because of the properties of the exponential function, some computations with complex numbers are simplified if polar forms are used. To illustrate, suppose we want to prove that |zw| = |z| |w|, something we did before by algebraic manipulation. Put z = re^{iθ} and w = ρe^{iφ} to immediately get
|zw| = |re^{iθ}| |ρe^{iφ}| = |rρ e^{i(θ+φ)}| = rρ = |z| |w|.
The fact that e^{i(θ+φ)} = e^{iθ} e^{iφ} also means that the argument of a product is the sum of the arguments of the factors, to within an integer multiple of 2π. Put more carefully, if θ0 is any argument of z, and θ1 is any argument of w, and ψ is any argument of zw, then for some integer n,
ψ = θ0 + θ1 + 2nπ.
920
CHAPTER 20
Geometry and Arithmetic of Complex Numbers
Multiplying two complex numbers has the effect of adding their arguments, to within integer multiples of 2π.
EXAMPLE 20.3
Let z = i and w = 2 − 2i. One argument of z is θ0 = π/2, and one argument of w is θ1 = 7π/4 (Figure 20.10). Now zw = i(2 − 2i) = 2 + 2i, and one argument of 2 + 2i is ψ = π/4. With this choice of arguments,
θ0 + θ1 = π/2 + 7π/4 = 9π/4 = ψ + 2π.
If we had chosen θ0 = π/2 and θ1 = −π/4, then we would get
θ0 + θ1 = π/2 − π/4 = π/4 = ψ.

FIGURE 20.10 π/2 is an argument of i, and 7π/4 an argument of 2 − 2i.
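cmath.phase returns one particular argument, the one in [−π, π]; for w = 2 − 2i that is −π/4 rather than the 7π/4 used above (they differ by 2π). With that choice the arguments add exactly, as in the second computation of the example (our sketch):

```python
import cmath
import math

z = 1j
w = 2 - 2j

theta0 = cmath.phase(z)    # pi/2
theta1 = cmath.phase(w)    # -pi/4, which equals 7*pi/4 - 2*pi
psi = cmath.phase(z * w)   # z*w = 2 + 2i, with argument pi/4

assert math.isclose(theta0, math.pi / 2)
assert math.isclose(theta1, -math.pi / 4)
assert math.isclose(psi, theta0 + theta1)  # arguments add, mod 2*pi
```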
20.1.6 Ordering

Given any two distinct real numbers a and b, exactly one of a < b or b < a must be true. The real numbers are said to be ordered. We will show that there is no ordering of the complex numbers. To understand why this is true, we must investigate the idea behind the ordering of the reals.

The ordering of the real numbers is actually a partitioning of the nonzero real numbers into two mutually exclusive sets, N and P, with the following properties:
1. If x is a nonzero real number, then x is in P or −x is in P, but not both.
2. If x and y are in P, then x + y and xy are in P.
Think of P as the set of positive numbers, and N as the set of negative numbers. The existence of such a partition of the nonzero reals satisfying conditions (1) and (2) is the reason why we can order the reals. An ordering is established by defining x < y if and only if y − x is in P. For example, 2 < 5 because 5 − 2 = 3 is positive.

Does there exist a partition of the nonzero complex numbers into two sets, P and N, having properties (1) and (2)? If so, we can order the complex numbers.
20.2 Loci and Sets of Points in the Complex Plane
921
Suppose such a partition exists. Then either i is in P, or −i is in P, but not both. If i is in P, then i · i = i² = −1 is in P by (2), so (−1) · i = −i is in P. But this violates condition (1). If −i is in P, then (−i)(−i) = i² = −1 is in P, so (−1)(−i) = i is in P, again violating (1). This shows that no such partition exists, and the complex numbers cannot be ordered. Whenever we write z < w, we are assuming that z and w are real numbers.
SECTION 20.1 PROBLEMS

In each of Problems 1 through 10, carry out the indicated calculation.

1. (3 − 4i)(6 + 2i)
2. i(6 − 2i) + (1 + i)
3. (2 + i)/(4 − 7i)
4. (2 + i)(−3 − 4i)/[(5 − i)(3 + i)]
5. (17 − 6i)(−4 − 12i)
6. 3i/(−4 + 8i)
7. i(3 − 4i)² + 2
8. (3 + i)(−6 + 2i)²
9. (1 − 8i)³
10. (−3 − 8i)(2i)(4 − i)
11. Prove that, for any positive integer n, i^(4n) = 1, i^(4n+1) = i, i^(4n+2) = −1, and i^(4n+3) = −i.
12. Let z = a + ib. Determine Re(z²) and Im(z²).
13. Let z = a + ib. Determine Re(z² − iz + 1) and Im(z² − iz + 1).
14. Prove that z̄² = z² if and only if z is either real or pure imaginary.
15. Let z, w, and u be complex numbers. Prove that, when represented as points in the plane, these numbers form the vertices of an equilateral triangle if and only if z² + w² + u² = zw + zu + wu.
16. Prove that Re(iz) = −Im(z) and Im(iz) = Re(z).

In each of Problems 17 through 22, determine arg(z). The answer should include all arguments of the number.

17. 3i
18. −2 + 2i
19. −3 + 2i
20. 8 + i
21. −4
22. 3 − 4i

In each of Problems 23 through 28, write the complex number in polar form.

23. −2 + 2i
24. −7i
25. 5 − 2i
26. −4 − i
27. 8 + i
28. −12 + 3i
29. Let z and w be complex numbers such that z̄w ≠ 1, but such that either z or w has magnitude 1. Prove that
|(z − w)/(1 − z̄w)| = 1.
Hint: In problems involving magnitude, it is often useful to recall Theorem 20.1(8). To apply this result, square both sides of the proposed equality.
30. Prove that, for any complex numbers z and w,
|z + w|² + |z − w|² = 2(|z|² + |w|²).
Hint: Keep Theorem 20.1(8) in mind.

20.2 Loci and Sets of Points in the Complex Plane

Sometimes complex notation is very efficient in specifying loci of points in the plane. In this section we will illustrate this, and also discuss the complex representation of certain sets that occur frequently in discussions of complex integrals and derivatives.
20.2.1 Distance
If z = a + ib is any complex number, |z| = √(a² + b²) is the distance from the origin to z (the point (a, b)) in the complex plane. If w = c + id is also a complex number, then
|z − w| = |(a − c) + i(b − d)| = √((a − c)² + (b − d)²)
is the distance between z and w in the complex plane (Figure 20.11). This is the standard formula from geometry for the distance between the points (a, b) and (c, d).
20.2.2 Circles and Disks

If a is a complex number and r is a positive (hence real) number, the equation |z − a| = r is satisfied by exactly those z whose distance from a is r. The locus of points satisfying this condition is the circle of radius r about a (Figure 20.12). This is the way we specify circles in the complex plane, and we often refer to "the circle |z − a| = r."

If a = 0, then any point on the circle |z| = r has polar form z = re^{iθ}, where θ is the angle from the positive real axis to the line from the origin through z (Figure 20.13). As θ varies from 0 to 2π, the point z = re^{iθ} moves once counterclockwise around this circle, starting at z = r on the positive real axis when θ = 0, reaching ri when θ = π/2, −r when θ = π, −ri when θ = 3π/2, and returning to r when θ = 2π.

If a ≠ 0, then the center of the circle |z − a| = r is a instead of the origin. Now a point on the circle has the form
z = a + re^{iθ},
which is simply a polar coordinate system translated to have a as its origin (Figure 20.14). As θ varies from 0 to 2π, this point moves once counterclockwise about this circle.

For example, the equation |z − 3 + 7i| = 4 defines the circle of radius 4 about the point (3, −7) in the plane. The complex number 3 − 7i is the center of the circle. A typical point on the circle has the form z = 3 − 7i + 4e^{iθ} (Figure 20.15).
FIGURE 20.11 |z − w| is the distance between z and w.
FIGURE 20.12 The circle of radius r about a.
FIGURE 20.13 The circle of radius r about the origin.
FIGURE 20.14 The circle of radius r about a, with points a + re^{iθ}.
FIGURE 20.15 The circle |z − 3 + 7i| = 4.
An inequality |z − a| < r specifies all points within the disk of radius r about a. Such a set is called an open disk. Open here means that points on the circumference of the circle bounding this disk are not in this set. A point on this circle would satisfy |z − a| = r, not |z − a| < r. We often indicate in a drawing that a disk is open by drawing a dashed boundary circle (Figure 20.16). For example, |z − i| < 8 specifies the points within the open disk of radius 8 about i.

A closed disk of radius r and center a consists of all points on or within the circle of radius r about a. This set is specified by the weak inequality |z − a| ≤ r. In a drawing of such a set, we often draw a solid circle as boundary to indicate that these points are included in the closed disk (Figure 20.17).
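Membership in these disks is a one-line test on |z − a|. The helper names below are ours, not the text's:

```python
def in_open_disk(z, a, r):
    return abs(z - a) < r   # strictly inside the bounding circle

def in_closed_disk(z, a, r):
    return abs(z - a) <= r  # inside or on the bounding circle

a, r = 1j, 8  # the open disk |z - i| < 8 from the example

assert in_open_disk(0, a, r)           # the origin is inside
assert not in_open_disk(8 + 1j, a, r)  # a boundary point is excluded...
assert in_closed_disk(8 + 1j, a, r)    # ...but lies in the closed disk
```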
20.2.3 The Equation |z − a| = |z − b|

Let w1 and w2 be distinct complex numbers. An equation |z − w1| = |z − w2| can be verbalized as "the distance between z and w1 must equal the distance between z and w2." As Figure 20.18 suggests, this requires that z be on the perpendicular bisector of the line segment connecting w1 and w2. The equation |z − w1| = |z − w2| may therefore be considered the equation of this line.
FIGURE 20.16 |z − a| < r, the open disk of radius r about a.
FIGURE 20.17 |z − a| ≤ r, the closed disk of radius r about a.
FIGURE 20.18 |z − a| = |z − b| is satisfied by all z on the perpendicular bisector of the segment from a to b.
924
CHAPTER 20
Geometry and Arithmetic of Complex Numbers
EXAMPLE 20.4
The equation |z + 6i| = |z − 1 + 3i| is satisfied by all points on the perpendicular bisector of the segment between −6i and 1 − 3i. This is the segment connecting (0, −6) and (1, −3), as shown in Figure 20.19. We can obtain the "standard" equation of this line as follows. First write
|z + 6i|² = |z − 1 + 3i|²,
or
(z + 6i)(z̄ − 6i) = (z − 1 + 3i)(z̄ − 1 − 3i).
This eliminates the absolute value signs. Carry out the multiplications to obtain
z z̄ + 6i(z̄ − z) + 36 = z z̄ − (z + z̄) + 3i(z̄ − z) + 1 + 9.
Let z = x + iy. Then z̄ − z = (x − iy) − (x + iy) = −2iy and −(z + z̄) = −2x, so the last equation becomes
6i(−2iy) + 36 = −2x + 3i(−2iy) + 10,
or
12y = −2x + 6y − 26.
This is the line
y = −(1/3)x − 13/3.

Now consider the inequality
|z + 6i| < |z − 1 + 3i|.
We already know that the equation |z + 6i| = |z − 1 + 3i| describes a line separating the plane into two sets, having this line as boundary (Figure 20.19). The given inequality holds for points in one of these sets, on one side or the other of this line. Clearly z is closer to −6i than to 1 − 3i if z is below the boundary line. Thus the inequality specifies all z below this line, the shaded region in Figure 20.20. The boundary line itself is drawn dashed because points on this line do not belong to this region.

The weak inequality |z + 6i| ≤ |z − 1 + 3i| specifies all points in the shaded region of Figure 20.21, together with all points on the boundary line itself.
FIGURE 20.19 The locus of the equation |z + 6i| = |z − 1 + 3i|.
FIGURE 20.20 Region I consists of points satisfying |z + 6i| < |z − 1 + 3i|.
FIGURE 20.21 The region given by |z + 6i| ≤ |z − 1 + 3i|.
20.2.4 Other Loci
When a geometric argument is not apparent, we try to determine a locus by substituting z = x + iy into the given equation or inequality.
EXAMPLE 20.5
Consider the equation
|z|² + 3 Re(z²) = 4.
If z = x + iy, this equation becomes
x² + y² + 3(x² − y²) = 4,
or
4x² − 2y² = 4.
The graph of this equation is the hyperbola x² − y²/2 = 1 of Figure 20.22. A complex number satisfies the given equation if and only if its representation as a point in the plane lies on this hyperbola.
FIGURE 20.22 Locus of points z with |z|² + 3 Re(z²) = 4: the hyperbola x² − y²/2 = 1.
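A quick numeric check (ours, not the text's) that points on the hyperbola x² − y²/2 = 1 satisfy the original equation:

```python
import math

for x in (1.0, 2.0, -3.0):
    y = math.sqrt(2 * (x * x - 1))  # solve x^2 - y^2/2 = 1 for y >= 0
    z = complex(x, y)
    # |z|^2 + 3*Re(z^2) = (x^2 + y^2) + 3*(x^2 - y^2) = 4 on the curve
    assert math.isclose(abs(z) ** 2 + 3 * (z * z).real, 4)
```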
20.2.5 Interior Points, Boundary Points, and Open and Closed Sets
In the development of the calculus of complex functions, certain kinds of sets and points will be important. For this section let S be a set of complex numbers. A number is an interior point of S if it is in a sense completely surrounded by points of S.
DEFINITION 20.3
Interior Point
A complex number z0 is an interior point of S if there is an open disk about z0 containing only points of S.
This means that, for some positive r, all points satisfying |z − z0| < r are in S. Clearly this forces z0 to be in S as well.
926
CHAPTER 20
Geometry and Arithmetic of Complex Numbers
DEFINITION 20.4
Open Set
S is open if every point of S is an interior point.
EXAMPLE 20.6
Let K be the open disk |z − a| < r (Figure 20.23). Every point of K is an interior point, because about any point in K we can draw a disk of small enough radius to contain only points in K. Thus K is an open set, justifying the terminology "open disk" used previously for a disk that does not include any points on its bounding circle.
EXAMPLE 20.7
Let L consist of all points satisfying |z − a| ≤ r. Now L contains points that are not interior points, specifically those on the circle |z − a| = r. Any open disk drawn about such a point will contain points outside of the closed disk L (Figure 20.24). This set is not an open set.
EXAMPLE 20.8
Let V consist of all z = x + iy with x > 0. This is the right half plane, not including the imaginary axis that forms the boundary between the left and right half planes. As suggested by Figure 20.25, every point of V is an interior point, because about any point z0 = x0 + iy0 with x0 > 0, we can draw a small enough disk that all points it encloses will also have positive real parts. Since every point of V is an interior point, V is an open set.
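For this half plane the required disk radius can be written down explicitly: about z0 the disk of radius Re(z0) stays in V, since |z − z0| < Re(z0) forces Re(z) > 0. A sketch (the function name is ours, not the text's):

```python
def interior_radius(z0):
    # a disk radius about z0 that stays inside V = { z : Re(z) > 0 }
    return z0.real

z0 = 0.25 + 10j
r = interior_radius(z0)
for dz in (0, 0.2, -0.2, 0.2j, -0.2j, 0.1 + 0.1j):
    assert abs(dz) < r         # sample points of the disk |z - z0| < r
    assert (z0 + dz).real > 0  # all of them lie in V
```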
EXAMPLE 20.9
Let M consist of all z = x + iy with x ≥ 0. Every point in M with positive real part is an interior point, just as in the preceding example. But not every point of M is interior. A point z = iy on the imaginary axis is in M, but cannot be enclosed in a disk that contains only points in M, that is, points having nonnegative real part (Figure 20.26). Since M contains points that are not interior points, M is not open.

FIGURE 20.23 An open disk is an open set (all points are interior points).
FIGURE 20.24 Points on |z − a| = r are not interior points of |z − a| ≤ r.
FIGURE 20.25 Half plane Re(z) > 0 (an open set).
FIGURE 20.26 Half plane Re(z) ≥ 0 (not an open set).
EXAMPLE 20.10
Let W consist of all points on the real axis. Then no point of W is an interior point. Any disk, no matter how small the radius, drawn about a point on the real axis will contain points not on this axis, hence not in W. No point of W is an interior point of W.

Returning to the general discussion, boundary points of a set S are complex numbers that are in some sense on the "edge" of S.
DEFINITION 20.5
Boundary Point
A point z0 is a boundary point of S if every open disk about z0 contains at least one point in S and at least one point not in S.
A boundary point itself may or may not be in S. Because the definitions of interior point and boundary point are exclusive, no point can be both an interior point and a boundary point of the same set. The set of all boundary points of S is called the boundary of S, and is denoted ∂S.
EXAMPLE 20.11
The sets K and L of Examples 20.6 and 20.7 have the same boundary, namely the points on the circle |z − a| = r. K does not contain any of its boundary points, while L contains them all.
EXAMPLE 20.12
The set V of Example 20.8 has all the points on the imaginary axis as its boundary points. This set does not contain any of its boundary points. By contrast, M of Example 20.9 has the same boundary points as V , namely all points on the imaginary axis, but M contains all of these boundary points.
928
CHAPTER 20
Geometry and Arithmetic of Complex Numbers
EXAMPLE 20.13
For the real line (W in Example 20.10), every point of W is a boundary point. If we draw any open disk about a real number x, this disk contains a point of W , namely x, and many points not in W . There are no other boundary points of W .
EXAMPLE 20.14
Let E consist of all complex numbers z = x + iy with y > 0, together with the point −23i (Figure 20.27). Then −23i is a boundary point of E, because every disk about −23i certainly contains points not in E, but also contains a point of E, namely −23i itself. Every real number (horizontal axis) is also a boundary point of E.
FIGURE 20.27 −23i is a boundary point of E.
A careful reading of the definitions shows that every point of a set is either an interior point or a boundary point of that set.

THEOREM 20.3

Let S be a set of complex numbers and let z be in S. Then z is either a boundary point of S or an interior point of S.

Proof  Suppose z is in S, but is not an interior point. If D is any open disk about z, then D cannot contain only points of S, and so must contain at least one point not in S. But D also contains a point of S, namely z itself. Thus z must be a boundary point of S.
We emphasize, however, that a set may have boundary points that are not in the set, as occurs in some of the above examples. We also observe that an open set cannot contain any of its boundary points. THEOREM 20.4
Let S be a set of complex numbers. If S is open, then S contains no boundary point.

Proof  Suppose z is in S and S is open. Then some open disk D about z contains only points of S. But then this disk does not contain any points not in S, so z cannot be a boundary point of S.
DEFINITION 20.6

Closed Set
A set of complex numbers is closed if it contains all of its boundary points.
For example, the closed disk |z − z0| ≤ r is a closed set. The boundary points are all the points on the circle |z − z0| = r, and these are all in the set. The set M of Example 20.9 is closed. Its boundary points are all the points on the imaginary axis, and these all belong to the set. The set W of Example 20.10 is closed, since every point in the set is a boundary point, and the set has no other boundary points.

The terms closed and open are not mutually exclusive, and one is not the opposite of the other. A set may be both closed and open, or closed and not open, or open and not closed, or neither open nor closed. For example, the set of all complex numbers is open (every point is an interior point) and also closed (it has no boundary points, so it vacuously contains all of them). A closed disk is closed but not open, and an open disk is open and not closed. The following example gives a set that is neither open nor closed.
EXAMPLE 20.15
Let T consist of all points z = x + iy with −1 ≤ x ≤ 1 and y > 0. This is the infinite strip shown in Figure 20.28. The boundary points are all points −1 + iy with y ≥ 0, all points 1 + iy with y ≥ 0, and all points x with −1 ≤ x ≤ 1. Some of these points are in T , for example, the boundary points −1 + iy with y > 0. This means that T cannot be open. But some of these boundary points are not in T , for example, the points x with −1 ≤ x ≤ 1. Thus T is not closed.
FIGURE 20.28 The strip consisting of all z = x + iy with −1 ≤ x ≤ 1, y > 0.
20.2.6 Limit Points
A number z0 is a limit point of S if there are points of S arbitrarily close to z0 , but different from z0 .
930
CHAPTER 20
Geometry and Arithmetic of Complex Numbers
DEFINITION 20.7
Limit Point
A complex number z0 is a limit point of a set S if every open disk about z0 contains at least one point of S different from z0 .
Limit point differs from boundary point in requiring that every open disk about the point contain something from S other than the point itself. In Example 20.14, −23i is a boundary point of E, but not a limit point of E, because there are open disks about −23i that contain no other point of E.
EXAMPLE 20.16
For the set V of Example 20.8, every point on the vertical axis is a limit point. Given any such point z0 = iy0 , every disk about z0 contains points of V other than iy0 . Thus z0 is both a boundary point and a limit point. This example shows that a limit point of a set need not belong to the set. This set has many other limit points. For example, every number x + iy with x > 0 is a limit point that also belongs to V .
EXAMPLE 20.17
Let Q consist of the numbers i/n for n = 1, 2, 3, …. Every open disk about 0, no matter how small the radius, contains points i/n of Q if we choose n large enough. Therefore 0 is a limit point of Q. In this example, 0 is also a boundary point of Q (its only boundary point not belonging to Q).
EXAMPLE 20.18
Let N consist of all in, with n an integer. Then N has no limit points: an open disk of radius 1/2 about in can have only one point in common with N, namely in itself.

As these examples show, a limit point of a set may or may not be in the set. We claim that the closed sets are exactly those that contain all of their limit points, with the understanding that a set with no limit points contains all of them vacuously.

THEOREM 20.5
Let S be a set of complex numbers. Then S is closed if and only if S contains all of its limit points.

Proof  Suppose first that S is closed and let w be a limit point of S. We will show that w is in S. Suppose w is not in S. Any disk |z − w| < r must contain a point zr of S other than w. But then this disk contains a point in S (namely zr) and a point not in S (namely w itself). Therefore w is a boundary point of S. But S is closed, and so contains all of its boundary points, in particular w. This contradiction shows that w must be in S, hence S contains all of its limit points.

Conversely, suppose that every limit point of S is in S. We want to show that S is closed. To do this, we will show that S contains its boundary points. Let b be a boundary point
of S. Suppose b is not in S. If |z − b| < r is an open disk about b, then this disk contains a point of S, because b is a boundary point. But this point is not b, because we have assumed that b is not in S. Then every open disk about b contains a point of S other than b, hence b is a limit point of S. But we have supposed that every limit point of S is in S, so b is in S. This contradiction proves that S contains all of its boundary points, hence S is closed.

Here are some additional examples of limit points.
EXAMPLE 20.19
Let X consist of all numbers 2 − i/n, with n = 1, 2, 3, …. Then 2 is a limit point (and boundary point) of X. There are no other limit points of X.
EXAMPLE 20.20
Let Q consist of all complex numbers a+ib with a and b rational numbers. Then every complex number is both a limit point and a boundary point of Q. Some limit points of Q are in Q (if a and b are rational), and some are not (if a or b is irrational).
EXAMPLE 20.21
Let P consist of all complex numbers x + iy with −1 ≤ y < 1. Then each point of P is a limit point, and the points x + i are also limit points of P that do not belong to P.
EXAMPLE 20.22
Let D be the open disk |z − z0| < r. Every point in D is a limit point. However, the points on the boundary circle |z − z0| = r, which do not belong to D, are also limit points, as well as boundary points, of D.
20.2.7 Complex Sequences
The notion of complex sequence is a straightforward adaptation from the concept of real sequence.
DEFINITION 20.8
Sequence
A complex sequence {zn} is an assignment of a complex number zn to each positive integer n.
The number zn is the nth term of the sequence. For example, {i^n} has nth term i^n. We often indicate a sequence by listing the first few terms, including enough terms so that the pattern becomes clear and one can predict what zn is for any n. For example, we might write {i^n} as
i, i², i³, …, i^n, ….
FIGURE 20.29 Convergence of {zn} to L.
Convergence of complex sequences is also modeled after convergence of real sequences.
DEFINITION 20.9
Convergence
The complex sequence {zn} converges to the number L if, given any positive number ε, there is a positive number N such that
|zn − L| < ε if n ≥ N.
This means that we can make each term zn as close as we like to L by choosing n at least as large as some number N. Put another way, given any open disk D about L, we can find some term of the sequence so that all terms beyond this one in the list (that is, for large enough index) lie in D (Figure 20.29). This is the same idea as that behind convergence of real sequences, except that on the real line open intervals replace open disks. When {zn} converges to L, we write zn → L, or lim_(n→∞) zn = L. If a sequence does not converge to any number, then we say that the sequence diverges.
EXAMPLE 20.23
The sequence {iⁿ} diverges. This is the sequence i, −1, −i, 1, i, −1, −i, 1, …, and there is no point in the sequence beyond which all the terms approach one specific number arbitrarily closely. For example, if we take the disk |z − i| < 1/2, then the first term of the sequence, and every fourth term after it, is in this disk, but no other term is in this disk.
EXAMPLE 20.24
The sequence {1 + i/n} converges to 1. This follows from the definition since, if ε > 0, then

|1 + i/n − 1| = |i/n| = 1/n < ε

if n is chosen larger than 1/ε. Given ε > 0, we can choose N = 1/ε in the definition of convergence.
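The ε–N bookkeeping in this example is easy to watch numerically. The following sketch (Python, our own illustration rather than anything from the text) checks that every term past a suitable index lies within ε of the limit 1:

```python
# Example 20.24 numerically: z_n = 1 + i/n, so |z_n - 1| = 1/n.
# Any integer N larger than 1/eps works; we take a comfortable margin.
eps = 1e-3
N = 2000                      # any N > 1/eps = 1000 would do
z = lambda n: 1 + 1j / n      # Python writes i as 1j

# every term from index N on lies within eps of the limit 1
assert all(abs(z(n) - 1) < eps for n in range(N, N + 1000))
```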
20.2 Loci and Sets of Points in the Complex Plane
Convergence of a complex sequence can always be reduced to an issue of convergence of two real sequences.

THEOREM 20.6

Let zₙ = xₙ + iyₙ and L = a + ib. Then zₙ → L if and only if xₙ → a and yₙ → b.

For example, let zₙ = (1 + 1/n)ⁿ + ((n + 2)/n)i. We know that

limₙ→∞ (1 + 1/n)ⁿ = e

and

limₙ→∞ (n + 2)/n = 1.

Then

limₙ→∞ zₙ = e + i.
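Theorem 20.6 reduces a complex limit to two real limits, and this particular example can be checked by machine. A small sketch (ours, not from the text):

```python
# z_n = (1 + 1/n)^n + ((n + 2)/n) i: real parts -> e, imaginary parts -> 1,
# so Theorem 20.6 gives z_n -> e + i.
import math

def z(n):
    return (1 + 1/n)**n + ((n + 2)/n) * 1j

L = math.e + 1j
errors = [abs(z(10**k) - L) for k in range(1, 7)]

# the distance to e + i shrinks as n grows
assert all(e2 < e1 for e1, e2 in zip(errors, errors[1:]))
assert errors[-1] < 1e-4
```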
Proof   Suppose first that zₙ → a + bi. Let ε > 0. For some N, |zₙ − L| < ε if n ≥ N. Then, by Theorem 20.2(1), for n ≥ N,

|xₙ − a| = |Re(zₙ − L)| ≤ |zₙ − L| < ε,

so xₙ → a. Similarly, if n ≥ N,

|yₙ − b| = |Im(zₙ − L)| ≤ |zₙ − L| < ε,

so yₙ → b. Conversely, suppose xₙ → a and yₙ → b. Let ε > 0. For some N₁,

|xₙ − a| < ε/2 if n ≥ N₁.

For some N₂,

|yₙ − b| < ε/2 if n ≥ N₂.

Then, for n ≥ N₁ + N₂,

|zₙ − L| = |(xₙ − a) + i(yₙ − b)| ≤ |xₙ − a| + |yₙ − b| < ε/2 + ε/2 = ε,

proving that zₙ → L.

The notion of convergence of a complex sequence is intimately tied to the concept of a limit point of a set.

THEOREM 20.7
Let K be a set of complex numbers and let w be a complex number. Then w is a limit point of K if and only if there is a sequence {kₙ} of points in K, with each kₙ ≠ w, that converges to w.

This is the rationale for the term limit point. A number w can be a limit point of a set only if w is the limit of a sequence of points in the set, all different from w. This holds whether or not w itself is in the set.
For example, consider the open unit disk |z| < 1. We know that i is a limit point of this disk, because any open disk about i contains points of the unit disk different from i. But we can also find a sequence of points in the unit disk converging to i, for example, zₙ = (1 − 1/n)i.

Proof   Suppose first that w is a limit point of K. Then, for each positive integer n, the open disk of radius 1/n about w must contain a point of K different from w. Choose such a point and call it kₙ. Then each kₙ ≠ w, kₙ is in K, and |kₙ − w| < 1/n. Since 1/n → 0 as n → ∞, {kₙ} converges to w. Conversely, suppose there is a sequence of points kₙ in K, all different from w, and converging to w. Let D be any open disk about w, say of radius ε. Because kₙ → w, D must contain all kₙ for n larger than some number N. But then D contains points of K different from w, and hence w is a limit point of K.
20.2.8 Subsequences

A subsequence of a sequence is formed by picking out certain terms to form a new sequence.
DEFINITION 20.10
Subsequence
A sequence {wⱼ} is a subsequence of {zₙ} if there are positive integers n₁ < n₂ < ⋯ such that wⱼ = z_{nⱼ}.
The subsequence is therefore formed from {zₙ} by listing the terms of this sequence,

z₁, z₂, z₃, …,

and then choosing, in order from left to right, some of the zₙ's to form a new sequence. A subsequence is a sequence in its own right, but consists of selected terms of an initially given sequence.
EXAMPLE 20.25
Let zₙ = iⁿ. We can define many subsequences of {zₙ}, but here is one. Let wⱼ = z₄ⱼ for j = 1, 2, …. Then each wⱼ = z₄ⱼ = i⁴ʲ = 1, and every term of this subsequence equals 1. Here nⱼ = 4j in the definition.
If a sequence converges, then every subsequence of it converges to the same limit. To see this, suppose zₙ → L. Let D be an open disk about L. Then "eventually" (that is, for large enough n), every zₙ is in D. If {wⱼ} is a subsequence, then each wⱼ = z_{nⱼ}, so eventually all of these terms will also be in D, and the subsequence also converges to L. However, a subsequence of a divergent sequence may diverge, or it may converge, as Example 20.25 shows. The sequence {iⁿ} diverges, but we were able to choose a subsequence having every term equal to 1, and this subsequence converges to 1.
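A quick computational illustration (ours) of Example 20.25: the divergent sequence iⁿ contains the constant, convergent subsequence z₄ⱼ = 1.

```python
# z_n = i^n cycles through i, -1, -i, 1 and diverges, but picking every
# fourth term gives the subsequence w_j = z_{4j} = i^{4j} = 1, which converges.
z = [1j**n for n in range(1, 101)]          # z_1, ..., z_100 (list index is n-1)
sub = [z[4*j - 1] for j in range(1, 26)]    # w_j = z_{4j}

assert all(abs(w - 1) < 1e-12 for w in sub)                   # every term equals 1
assert len({(round(t.real), round(t.imag)) for t in z}) == 4  # full sequence keeps cycling
```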
It is also possible for a divergent sequence to have no convergent subsequence. For example, let zₙ = ni. This sequence diverges, and we cannot choose a convergent subsequence: no matter what subsequence is chosen, its terms simply increase without bound the farther out we go in the subsequence. In this example, the sequence {ni} is unbounded, hence divergent, and any subsequence is also unbounded and divergent. We get a more interesting result with bounded sequences.
DEFINITION 20.11
Bounded Sequence
{zₙ} is a bounded sequence if for some number M, |zₙ| ≤ M for n = 1, 2, ….
Alternatively, a sequence is bounded if there is some disk that contains all of its terms. We claim that every bounded sequence, convergent or not, has a convergent subsequence.
THEOREM 20.8
Let {zₙ} be a bounded sequence. Then {zₙ} has a convergent subsequence.

This result has important consequences, for example, in treating the Cauchy integral theorem.

Proof   Assuming the corresponding result for bounded real sequences, the conclusion for bounded complex sequences follows from Theorem 20.6. Let zₙ = xₙ + iyₙ form a bounded sequence. Then {xₙ} is a bounded real sequence, and so has a subsequence {x_{nⱼ}} that converges to some real number a. But then {y_{nⱼ}} is also a bounded real sequence, and so has a subsequence {y_{nⱼₖ}} that converges to some real number b. Using these indices, form the subsequence {x_{nⱼₖ}} of {x_{nⱼ}}. This subsequence also converges to a. Then {x_{nⱼₖ} + iy_{nⱼₖ}} is a subsequence of {zₙ} that converges to a + ib.
20.2.9
Compactness and the Bolzano-Weierstrass Theorem
DEFINITION 20.12
Bounded Set
A set K of complex numbers is bounded if, for some number M, |z| ≤ M for all z in K.
A bounded set is therefore one whose points cannot be arbitrarily far from the origin. Certainly any finite set is bounded, as is any open or closed disk. The set of points ni, for integer n, is not bounded. The concepts of closed set and bounded set are independent. However, when combined, they characterize sets that have properties that are important in the analysis of complex functions. Such sets are called compact.
DEFINITION 20.13
Compact Set
A set K of complex numbers is compact if it is closed and bounded.
Any closed disk is compact, while an open disk is not (it is not closed). The set of points ni for integer n is not compact because it is not bounded (even though it is closed). Any finite set is compact. We will now show that any infinite compact set must contain at least one limit point. This is a remarkable result, since closed sets need not contain (or even have) any limit points, and bounded sets need not have limit points.

THEOREM 20.9
Bolzano–Weierstrass
Let K be an infinite compact set of complex numbers. Then K contains a limit point.

Proof   Since K is closed, any limit point of K must be in K. We will therefore concentrate on showing that there is a limit point of K. Choose any number z₁ in K. Because K is infinite, we can choose a second number z₂ in K, distinct from z₁. Next choose some z₃ in K distinct from z₁ and z₂, and continue this process. In this way we generate an infinite sequence {zₙ} of distinct points in K. Because K is a bounded set, this sequence is bounded. Therefore {zₙ} contains a subsequence {z_{nⱼ}} that converges to some number L. Because each term of this sequence is distinct from all the others, we can choose the subsequence so that no z_{nⱼ} equals L. By Theorem 20.7, L is a limit point of K.
We are now ready to begin the calculus of complex functions.
SECTION 20.2
PROBLEMS
In each of Problems 1 through 11, determine the set of all z satisfying the given equation or inequality. In some cases it may be convenient to specify the set by a clearly labeled diagram.

1. |z − 8 + 4i| = 9
2. |z| = |z − i|
3. |z|² + Im(z) = 16
4. |z − i| + |z| = 9
5. |z| + Re(z) = 0
6. z + z̄ = 4
7. Im(z − i) = Re(z + 1)
8. |z| = Im(z − i)
9. |z + 1 + 6i| = |z − 3 + i|²
10. |z − 4i| ≤ |z + 1|
11. |z + 2 + i| > |z − 1|

In each of Problems 12 through 19, a set of points (complex numbers) is given. Determine whether the set is open, closed, open and closed, or neither open nor closed. Determine all limit points of the set, all boundary points, the boundary of the set, and its closure. Also determine whether the set is compact.

12. S is the set of all z with |z| > 2.
13. K is the set of all z satisfying |z − 1| ≤ |z + 4i|.
14. T is the set of all z with 4 ≤ |z + i| ≤ 8.
15. M consists of all z with Im(z) < 7.
16. R is the set of all complex numbers 1/m + (1/n)i, in which m and n may be any positive integers.
17. U is the set of all z such that 1 < Re(z) ≤ 3.
18. V is the set of all z such that 2 < Re(z) ≤ 3 and −1 < Im(z) < 1.
19. W consists of all z such that Re(z) > (Im(z))².
20. Suppose S is a finite set of complex numbers, say consisting of the numbers z₁, z₂, …, zₙ.
(a) Show that S has no limit point.
(b) Show that every zⱼ is a boundary point of S.
(c) Show that S is closed.

In each of Problems 21 through 27, find the limit of the sequence, or state that the sequence diverges.

21. {1 + 2iⁿ/(n + 1)}
22. {i²ⁿ}
23. {((1 + 2n²)/(n − 1))i − n²/5ⁿ}
24. {e^{nπi/3}}
25. {(−i)⁴ⁿ}
26. {sin(nπi)/(1 + 3n²)}
27. {i/(2n² − n)}

28. Consider the sequence {e^{nπi/3}} of Problem 24. Find two different convergent subsequences of this sequence.

29. Find two convergent subsequences of the sequence {i²ⁿ} of Problem 22.
CHAPTER 21

Complex Functions

LIMITS, CONTINUITY, AND DERIVATIVES; THE CAUCHY–RIEMANN EQUATIONS; POWER SERIES; THE EXPONENTIAL AND TRIGONOMETRIC FUNCTIONS; THE COMPLEX LOGARITHM; POWERS
21.1 Limits, Continuity, and Derivatives

A complex function is a function that is defined for complex numbers in some set S, and takes on complex values. If ℂ denotes the set of complex numbers, and f is such a function, then we write f: S → ℂ. This simply means that f(z) is a complex number for each z in S. The set S is called the domain of f. For example, let S consist of all z with |z| < 1 and define f(z) = z² for z in S. Then f: S → ℂ and f is a complex function. Often we define a function by some explicit expression in z, for example

f(z) = (z + i)/(z² + 4).

In the absence of a specified domain S, we agree to allow all z for which the expression for f(z) is defined. This function is therefore defined for all complex z except 2i and −2i.
21.1.1
Limits
The notion of limit for a complex function is modeled after that for real-valued functions, with disks replacing intervals.
DEFINITION 21.1
Limit
Let f: S → ℂ be a complex function and let z₀ be a limit point of S. Let L be a complex number. Then

lim_{z→z₀} f(z) = L
if and only if, given ε > 0, there exists a positive number δ such that |f(z) − L| < ε for all z in S such that 0 < |z − z₀| < δ. When lim_{z→z₀} f(z) = L, we call L the limit of f(z) as z approaches z₀.

Thus, lim_{z→z₀} f(z) = L if function values f(z) can be made to lie arbitrarily close (within ε) to L by choosing z in S (so f(z) is defined) and close enough (within δ) to z₀, but not actually equal to z₀. The condition 0 < |z − z₀| < δ excludes z₀ itself from consideration. We are only interested in the behavior of f(z) at other points close to z₀. Put another way, given an open disk D_ε of radius ε about L, we must be able to find an open disk D_δ of radius δ about z₀, so that every point in D_δ, except z₀ itself, that is also in S, is sent by the function into D_ε. This is illustrated in Figure 21.1. While it is not required in this definition that f(z₀) be defined, we do require that there be points arbitrarily close to z₀ at which f(z) is defined. This is assured by making z₀ a limit point of S, and is the reason for making this a requirement in the definition. It makes no sense to speak of a limit of f(z), as z approaches z₀, if f(z) is not defined as z nears z₀. Even if f(z₀) is defined, there is no requirement in this limit that f(z₀) = L.
EXAMPLE 21.1
Let

f(z) = z² for z ≠ i, and f(i) = 0.

Then lim_{z→i} f(z) = −1, but the limit does not equal f(i). Indeed, even if f(i) were not defined, we would still have lim_{z→i} f(z) = −1.

Often we write f(z) → L as z → z₀
when lim_{z→z₀} f(z) = L. Many limit theorems from real calculus hold for complex functions as well. Suppose lim_{z→z₀} f(z) = L and lim_{z→z₀} g(z) = K. Then

lim_{z→z₀} [f(z) + g(z)] = L + K,

lim_{z→z₀} [f(z) − g(z)] = L − K,

lim_{z→z₀} cf(z) = cL for any number c,

lim_{z→z₀} f(z)g(z) = LK,
FIGURE 21.1   lim_{z→z₀} f(z) = L.

FIGURE 21.2   z approaches z₀ along any path in defining lim_{z→z₀} f(z).
and, if K ≠ 0,

lim_{z→z₀} f(z)/g(z) = L/K.
One significant difference between limits of complex functions and limits of real functions is the way the variable can approach the point. For a real function g, lim_{x→a} g(x) involves the behavior of g(x) as x approaches a from either side. On the line there are only two ways x can approach a. But lim_{z→z₀} f(z) = L involves the behavior of f(z) as z approaches z₀ in the complex plane (or in a specified set S of allowable values), and this may involve z approaching z₀ from any direction (Figure 21.2). The numbers f(z) must approach L along every path of approach of z to z₀ in S. If along a single path of approach of z to z₀, f(z) does not approach L, then f(z) does not have limit L there. This makes lim_{z→z₀} f(z) = L in the complex plane a stronger statement than its real counterpart, requiring more of f(z) for z near z₀ than is required of real functions. We will exploit this fact later to derive facts about complex functions.
21.1.2
Continuity
DEFINITION 21.2
A complex function f: S → ℂ is continuous at a number z₀ in S if and only if

lim_{z→z₀} f(z) = f(z₀).

We say that f is continuous on a set K if f is continuous at each point of K. In particular, if f is continuous at all z for which f(z) is defined, then f is a continuous function.

Many familiar functions are continuous. Any polynomial is continuous for all z, and any rational function (quotient of polynomials) is continuous wherever its denominator is nonzero. When we have complex versions of the trigonometric and exponential functions, we will see that they are also continuous. If f is continuous at z₀, so is |f|. We should expect this. If, as z is chosen closer to z₀, f(z) becomes closer to f(z₀), then it is reasonable that |f(z)| approaches |f(z₀)|. More rigorously,

0 ≤ | |f(z)| − |f(z₀)| | ≤ |f(z) − f(z₀)| → 0

if lim_{z→z₀} f(z) = f(z₀).
If {zₙ} is a sequence of complex numbers and each f(zₙ) is defined, then {f(zₙ)} is also a complex sequence. For example, if f(z) = 2z² and zₙ = 1/n, then f(zₙ) = 2/n². We claim that {f(zₙ)} converges if {zₙ} does, when f is continuous. Another way of saying this is that continuity preserves convergence of sequences.

THEOREM 21.1

Let f: S → ℂ be continuous, and let {zₙ} be a sequence of complex numbers in S. If {zₙ} converges to a number w in S, then {f(zₙ)} converges to f(w).

Here is the idea behind the theorem. Since f is continuous at w, lim_{z→w} f(z) = f(w). This means that f(z) must approach f(w) along any path of approach of z to w in S. But, if zₙ → w, we can think of the zₙ's as determining a path of approach of the variable z to w. Then f(z) must approach f(w) along this path, and hence f(zₙ) → f(w). A converse of Theorem 21.1 can also be proved: if f(zₙ) → f(w) for every sequence {zₙ} of points of S converging to w, then f is continuous at w.

We will now develop a significant property of continuous functions. First, define a complex function (continuous or not) to be bounded if the numbers |f(z)| do not become arbitrarily large.
DEFINITION 21.3
Bounded Function
Let f: S → ℂ. Then f is a bounded function if there is a positive number M such that |f(z)| ≤ M for all z in S.
Alternatively, f is bounded if there is a disk about the origin containing all the numbers f(z) for z in S. A continuous function need not be bounded (look at f(z) = 1/z for z ≠ 0). We claim, however, that a continuous function defined on a compact set is bounded. This is analogous to the result that a real function that is continuous on a closed interval is bounded. On the real line, closed intervals are compact sets.

THEOREM 21.2
Let f: S → ℂ. Suppose S is compact and f is continuous on S. Then f is bounded.

Proof   Suppose f is not bounded. Then, if n is a positive integer, the disk of radius n about the origin cannot contain all f(z) for z in S. This means that there is some zₙ in S such that |f(zₙ)| > n. Now {zₙ} is a sequence of points in the bounded set S, hence has a convergent subsequence {z_{nⱼ}}. Let this subsequence converge to w. Then w is a limit point of S, and S is closed, so w is in S also. Because f is continuous, f(z_{nⱼ}) → f(w). Then, for some N, we can make each f(z_{nⱼ}) lie in the open disk of radius 1 about f(w) by choosing nⱼ ≥ N. But this contradicts the fact that each |f(z_{nⱼ})| > nⱼ. Therefore f must be a bounded function.
We can improve on this theorem as follows. Under the conditions of the preceding theorem, |f(z)|, which is real-valued, actually assumes a maximum and a minimum on S.

THEOREM 21.3

Let f: S → ℂ be continuous, and suppose S is compact. Then there are numbers z₁ and z₂ in S such that, for all z in S,

|f(z₁)| ≤ |f(z)| ≤ |f(z₂)|.
21.1.3
The Derivative of a Complex Function
DEFINITION 21.4
Derivative
Let f: S → ℂ, and suppose S is an open set. Let z₀ be in S. Then f is differentiable at z₀ if, for some complex number L,

lim_{h→0} (f(z₀ + h) − f(z₀))/h = L.

In this event we call L the derivative of f at z₀ and denote it f′(z₀). If f is differentiable at each point of a set, then we say that f is differentiable on this set.

The reason for having S open in this definition is to be sure that there is some open disk about z₀ throughout which f(z) is defined. When the complex number h is small enough in magnitude, then z₀ + h is in this disk and f(z₀ + h) is defined. This allows h to approach zero from any direction in the limit defining the derivative. This will have important ramifications shortly in the Cauchy–Riemann equations.
EXAMPLE 21.2
Let f(z) = z² for all complex z. Then

f′(z) = lim_{h→0} ((z + h)² − z²)/h = lim_{h→0} (2z + h) = 2z

for all z.

For familiar functions such as polynomials, the usual rules for taking derivatives apply. For example, if n is a positive integer and f(z) = zⁿ, then f′(z) = nzⁿ⁻¹. When we develop the complex sine function f(z) = sin(z), we will see that f′(z) = cos(z). Other familiar derivative formulas are:

(f + g)′(z) = f′(z) + g′(z),
(f − g)′(z) = f′(z) − g′(z),
(cf)′(z) = cf′(z),
(fg)′(z) = f(z)g′(z) + f′(z)g(z),
and

(f/g)′(z) = (g(z)f′(z) − f(z)g′(z))/g(z)²   if g(z) ≠ 0.

These conclusions assume that the derivatives involved exist. There is also a complex version of the chain rule. Recall that the composition f ∘ g of two functions is defined by

(f ∘ g)(z) = f(g(z)).

The chain rule for differentiating a composition is

(f ∘ g)′(z) = f′(g(z))g′(z),

assuming that g is differentiable at z and f is differentiable at g(z). Often f′(z) is denoted using the Leibniz notation df/dz. In this notation, the chain rule is

(d/dz) f(g(z)) = (df/dw)(dw/dz),

where w = g(z). Not all functions are differentiable.
EXAMPLE 21.3
Let f(z) = z̄. We will show that f is not differentiable at any point. To see why this is true, compute

(f(z + h) − f(z))/h = ((z̄ + h̄) − z̄)/h = h̄/h.

We want the limit of this quotient as h → 0. But this limit is in the complex plane, and the complex number h must be allowed to approach zero along any path. If h approaches zero along the real axis, then h is real, h̄ = h, and h̄/h = 1 → 1. But if h approaches zero along the imaginary axis, then h = ik for k real, and

h̄/h = −ik/(ik) = −1 → −1

as k → 0. The quotient h̄/h approaches different numbers as h approaches zero along different paths. This means that

lim_{h→0} h̄/h

does not exist, so f has no derivative at any point.
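Example 21.3 can be watched numerically. The sketch below (our own illustration) evaluates the difference quotient h̄/h along the two axes:

```python
# For f(z) = conj(z), (f(z + h) - f(z))/h = conj(h)/h, whatever z is.
# Along the real axis the quotient is 1; along the imaginary axis it is -1,
# so no single limit exists and f has no derivative.
def quotient(h, z=0.3 + 0.7j):
    return ((z + h).conjugate() - z.conjugate()) / h

real_path = [quotient(10.0**-k) for k in range(1, 7)]        # h real
imag_path = [quotient(1j * 10.0**-k) for k in range(1, 7)]   # h = ik

assert all(abs(q - 1) < 1e-8 for q in real_path)
assert all(abs(q + 1) < 1e-8 for q in imag_path)
```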
As with real functions, a complex function is continuous wherever it is differentiable. THEOREM 21.4
Let f be differentiable at z0 . Then f is continuous at z0 .
Proof   We know that

lim_{h→0} [ (f(z₀ + h) − f(z₀))/h − f′(z₀) ] = 0.

Let

η(h) = (f(z₀ + h) − f(z₀))/h − f′(z₀).

Then lim_{h→0} η(h) = 0. Further,

f(z₀ + h) − f(z₀) = hf′(z₀) + hη(h).

Since the right side has limit zero as h → 0,

lim_{h→0} [f(z₀ + h) − f(z₀)] = 0.

This is the same as

lim_{h→0} f(z₀ + h) = f(z₀),

and this in turn implies that lim_{z→z₀} f(z) = f(z₀). Therefore f is continuous at z₀.
21.1.4
The Cauchy–Riemann Equations
We will derive a set of partial differential equations that must be satisfied by the real and imaginary parts of a differentiable complex function. These equations also play a role in potential theory and in treatments of the Dirichlet problem. Let f be a complex function. If z = x + iy, we can always write

f(z) = f(x + iy) = u(x, y) + iv(x, y),

in which u and v are real-valued functions of the two real variables x and y. Then

u(x, y) = Re f(z) and v(x, y) = Im f(z).
EXAMPLE 21.4
Let f(z) = 1/z for z ≠ 0. Then

f(x + iy) = 1/(x + iy) = (1/(x + iy))·((x − iy)/(x − iy)) = x/(x² + y²) − i y/(x² + y²).

For this function,

u(x, y) = x/(x² + y²) and v(x, y) = −y/(x² + y²).
We will now derive a relationship between partial derivatives of u and v, at any point where f is differentiable.

THEOREM 21.5
Cauchy–Riemann Equations
Let f: S → ℂ, with S an open set. Write f = u + iv. Suppose z = x + iy is a point of S and f′(z) exists. Then, at (x, y),

∂u/∂x = ∂v/∂y and ∂v/∂x = −∂u/∂y.
Proof
Begin with

f′(z) = lim_{h→0} (f(z + h) − f(z))/h.

We know that this limit exists, hence must have the same value, f′(z), however h approaches zero. Consider two paths of approach of h to the origin. First, let h → 0 along the real axis (Figure 21.3). Now h is real, and z + h = (x + h) + iy. Then

f′(z) = lim_{h→0} [u(x + h, y) + iv(x + h, y) − u(x, y) − iv(x, y)]/h
      = lim_{h→0} [ (u(x + h, y) − u(x, y))/h + i (v(x + h, y) − v(x, y))/h ]
      = ∂u/∂x + i ∂v/∂x.

Next, take the limit along the imaginary axis (Figure 21.4). Put h = ik with k real, so h → 0 as k → 0. Now z + h = x + i(y + k) and

f′(z) = lim_{k→0} [u(x, y + k) + iv(x, y + k) − u(x, y) − iv(x, y)]/(ik)
      = lim_{k→0} (1/i)[ (u(x, y + k) − u(x, y))/k + i (v(x, y + k) − v(x, y))/k ]
      = −i ∂u/∂y + ∂v/∂y,

in which we have used the fact that 1/i = −i. We now have two expressions for f′(z), so they must be equal:

∂u/∂x + i ∂v/∂x = −i ∂u/∂y + ∂v/∂y.

Setting the real part of the left side equal to the real part of the right, and then the imaginary part of the left side to the imaginary part of the right, yields the Cauchy–Riemann equations.

One extra dividend of this proof is that we have also derived formulas for f′(z) in terms of the real and imaginary parts of f(z). For example, if f(z) = z³, then

f(z) = f(x + iy) = (x + iy)³ = x³ − 3xy² + i(3x²y − y³).

Then

u(x, y) = x³ − 3xy² and v(x, y) = 3x²y − y³,

so

f′(z) = ∂u/∂x + i ∂v/∂x = 3x² − 3y² + i(6xy).

This automatically displays the real and imaginary parts of f′(z). Of course, for this simple function it is just as easy to write directly

f′(z) = 3z² = 3(x + iy)² = 3(x² − y²) + 6xyi.

The Cauchy–Riemann equations constitute a necessary condition for f to be differentiable at a point. If they are not satisfied, then f′(z) does not exist at this point.
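The Cauchy–Riemann equations and the formula f′ = ∂u/∂x + i ∂v/∂x for f(z) = z³ can be sanity-checked with finite differences (our own sketch; the partial derivatives below are approximated numerically, not computed exactly):

```python
# u and v for f(z) = z^3, from the worked example above.
u = lambda x, y: x**3 - 3*x*y**2
v = lambda x, y: 3*x**2*y - y**3

d = 1e-6                      # step for central differences
x, y = 1.2, -0.4              # an arbitrary test point
ux = (u(x + d, y) - u(x - d, y)) / (2*d)
uy = (u(x, y + d) - u(x, y - d)) / (2*d)
vx = (v(x + d, y) - v(x - d, y)) / (2*d)
vy = (v(x, y + d) - v(x, y - d)) / (2*d)

assert abs(ux - vy) < 1e-6 and abs(vx + uy) < 1e-6   # Cauchy-Riemann holds
f_prime = ux + 1j*vx                                 # f'(z) = du/dx + i dv/dx
assert abs(f_prime - 3*(x + 1j*y)**2) < 1e-6         # matches 3z^2
```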
EXAMPLE 21.5
Let f(z) = z̄. Then f(z) = x − iy, and u(x, y) = x, v(x, y) = −y. Now

∂u/∂x = 1 ≠ −1 = ∂v/∂y,

so the Cauchy–Riemann equations do not hold for f at any point, and therefore f is not differentiable at any point.
EXAMPLE 21.6
Let f(z) = z Re(z). Then

f(x + iy) = (x + iy)x = x² + ixy,

so u(x, y) = x² and v(x, y) = xy. Now

∂u/∂x = 2x, ∂v/∂y = x, ∂u/∂y = 0, and ∂v/∂x = y.

The Cauchy–Riemann equations do not hold at any point except z = 0. This means that f is not differentiable at z if z ≠ 0, but may have a derivative at 0. In fact, this function is differentiable at 0, since

f′(0) = lim_{h→0} (f(h) − f(0))/h = lim_{h→0} Re(h) = 0.
While the Cauchy–Riemann equations are necessary for differentiability, they are not sufficient. If the Cauchy–Riemann equations hold at a point z, then f may or may not be differentiable at z. In the preceding example, the Cauchy–Riemann equations held at the origin, and f′(0) existed. Here is an example in which the Cauchy–Riemann equations are satisfied at the origin, but f has no derivative there.
EXAMPLE 21.7
Let

f(z) = z⁵/|z|⁴ for z ≠ 0, and f(0) = 0.

We will show that the Cauchy–Riemann equations are satisfied at z = 0, but that f is not differentiable at 0. First do some algebra to obtain

u(x, y) = (x⁵ − 10x³y² + 5xy⁴)/(x² + y²)² if (x, y) ≠ (0, 0),

v(x, y) = (5x⁴y − 10x²y³ + y⁵)/(x² + y²)² if (x, y) ≠ (0, 0),

and u(0, 0) = v(0, 0) = 0. Compute the partial derivatives at the origin:

∂u/∂x(0, 0) = lim_{h→0} (u(h, 0) − u(0, 0))/h = lim_{h→0} h⁵/(h·h⁴) = 1,

∂u/∂y(0, 0) = lim_{h→0} (u(0, h) − u(0, 0))/h = lim_{h→0} 0 = 0,

∂v/∂x(0, 0) = lim_{h→0} (v(h, 0) − v(0, 0))/h = lim_{h→0} 0 = 0,

and

∂v/∂y(0, 0) = lim_{h→0} (v(0, h) − v(0, 0))/h = lim_{h→0} h⁵/(h·h⁴) = 1.

Therefore the Cauchy–Riemann equations are satisfied at the origin. However, f is not differentiable at 0. Consider

(f(0 + h) − f(0))/h = h⁵/(|h|⁴h) = h⁵/((hh̄)²h) = h²/h̄² = (h/h̄)².

We claim that (h/h̄)² has no limit as h → 0. This is easily seen by converting to polar form. If h = re^{iθ}, then h̄ = re^{−iθ}, and

(h/h̄)² = r²e^{2iθ}/(r²e^{−2iθ}) = e^{4iθ}.

On the line making an angle θ with the positive real axis (Figure 21.5), the difference quotient (f(0 + h) − f(0))/h has the constant value e^{4iθ}, and so approaches this number as h → 0. The difference quotient therefore approaches different values along different paths, and so has no limit as h → 0.

This example means that some condition(s) must be added to the Cauchy–Riemann equations to guarantee existence of a derivative at a point. The following theorem gives sufficient conditions for differentiability.

FIGURE 21.5   h approaching 0 along the ray at angle θ.
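The direction dependence in Example 21.7 shows up immediately in a computation (our own sketch): along each ray h = re^{iθ} the difference quotient is the constant e^{4iθ}.

```python
import cmath

# f(z) = z^5/|z|^4 from Example 21.7
def f(z):
    return z**5 / abs(z)**4 if z != 0 else 0

for theta in (0.0, 0.5, 1.0, 2.0):
    vals = [(f(r * cmath.exp(1j*theta)) - f(0)) / (r * cmath.exp(1j*theta))
            for r in (1.0, 1e-3, 1e-6)]
    # constant along the ray, equal to e^{4 i theta}; different rays give different values
    assert all(abs(val - cmath.exp(4j*theta)) < 1e-9 for val in vals)
```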
THEOREM 21.6
Let f: S → ℂ be a complex function, with S an open set. Let f = u + iv. Suppose u, v, and their first partial derivatives are continuous on S. Suppose also that u and v satisfy the Cauchy–Riemann equations on S. Then f is differentiable at each point of S.

In real calculus, a function whose derivative is zero throughout an interval must be constant on that interval. Here is the complex analogue of this result, together with another result we will need later.
THEOREM 21.7
Let f be differentiable on an open disk D. Let f = u + iv, and suppose u and v satisfy the Cauchy–Riemann equations and are continuous with continuous first partial derivatives in D. Then:

1. If f′(z) = 0 for all z in D, then f(z) is constant on D.
2. If |f(z)| is constant on D, then so is f(z).

Proof
To prove (1), recall from the proof of Theorem 21.5 that, for z in D,

f′(z) = 0 = ∂u/∂x + i ∂v/∂x.

But then ∂u/∂x and ∂v/∂x are zero throughout D. By the Cauchy–Riemann equations, ∂u/∂y and ∂v/∂y are also zero at each point of D. Then u(x, y) and v(x, y) are constant on D, hence so is f(z).

For (2), suppose |f(z)| = k for all z in D. Then

|f(z)|² = u(x, y)² + v(x, y)² = k²    (21.1)

for (x, y) in D. If k = 0, then |f(z)| = 0 for all z in D, hence f(z) = 0 on D. If k ≠ 0, differentiate equation (21.1) with respect to x to get

u ∂u/∂x + v ∂v/∂x = 0.    (21.2)

Differentiate equation (21.1) with respect to y to get

u ∂u/∂y + v ∂v/∂y = 0.    (21.3)

Using the Cauchy–Riemann equations, equations (21.2) and (21.3) can be written

u ∂u/∂x − v ∂u/∂y = 0    (21.4)

and

u ∂u/∂y + v ∂u/∂x = 0.    (21.5)

Multiply equation (21.4) by u and equation (21.5) by v and add the resulting equations to get

(u² + v²) ∂u/∂x = k² ∂u/∂x = 0.

Therefore ∂u/∂x = 0 for all (x, y) in D. By the Cauchy–Riemann equations, ∂v/∂y = 0 throughout D. Now a similar manipulation shows that

∂u/∂y = ∂v/∂x = 0

on D. Therefore u(x, y) and v(x, y) are constant on D, so f(z) is constant also.
SECTION 21.1
PROBLEMS
In each of Problems 1 through 12, find u and v so that f(z) = u(x, y) + iv(x, y), determine all points (if any) at which the Cauchy–Riemann equations are satisfied, and determine all points at which the function is differentiable. Familiar facts about continuity of real-valued functions of two variables may be assumed.

1. f(z) = z − i
2. f(z) = z² − iz
3. f(z) = |z|
4. f(z) = (2z + 1)/z
5. f(z) = iz²
6. f(z) = z + Im(z)
7. f(z) = z/Re(z)
8. f(z) = z³ − 8z + 2
9. f(z) = |z|²
10. f(z) = iz + z̄
11. f(z) = −4z + 1/z
12. f(z) = (z − i)/(z + i)
21.2 Power Series

We now know some facts about continuity and differentiability. However, the only complex functions we have at this point are polynomials and rational functions. A complex polynomial is a function

p(z) = a₀ + a₁z + a₂z² + ⋯ + aₙzⁿ,

in which the aⱼ's are complex numbers, and a rational function is a quotient of polynomials,

R(z) = (a₀ + a₁z + ⋯ + aₙzⁿ)/(b₀ + b₁z + ⋯ + bₘzᵐ).

Polynomials are differentiable for all z, and a rational function is differentiable for all z at which the denominator does not vanish. The vehicle for expanding our catalog of functions, obtaining exponential and trigonometric functions, logarithms, power functions and others, is the power series. We will precede a development of complex power series with some facts about series of constants.
21.2.1 Series of Complex Numbers
We will assume standard results about series of real numbers. Consider a complex series ∑_{n=1}^∞ cₙ, with each cₙ a complex number. The Nth partial sum of this series is the finite sum ∑_{n=1}^N cₙ. The sequence {∑_{n=1}^N cₙ} is the sequence of partial sums of this series, and the series converges if and only if this sequence of partial sums converges. If cₙ = aₙ + ibₙ, then

∑_{n=1}^N cₙ = ∑_{n=1}^N aₙ + i ∑_{n=1}^N bₙ,

so {∑_{n=1}^N cₙ} converges if and only if the real partial sums ∑_{n=1}^N aₙ and ∑_{n=1}^N bₙ converge as N → ∞. Further, if ∑_{n=1}^∞ aₙ = A and ∑_{n=1}^∞ bₙ = B, then

∑_{n=1}^∞ cₙ = A + iB.

We can therefore study any series of complex constants by considering two series of real constants, for which tests are available (ratio test, root test, integral test, comparison test, and others). As with real series, if ∑_{n=1}^∞ cₙ converges, then necessarily limₙ→∞ cₙ = 0. In some instances we can not only show that a series converges, but we can find its sum. The geometric series is an important illustration of this which we will use often.
EXAMPLE 21.8
Consider the series ∑_{n=1}^∞ zⁿ, with z a given complex number. A series which adds successive powers of a single number is called a geometric series. We can sum this series as follows. Let

S_N = ∑_{n=1}^N zⁿ = z + z² + z³ + ⋯ + z^{N−1} + z^N.

Then

zS_N = z² + z³ + ⋯ + z^N + z^{N+1}.

If we subtract these finite sums, most terms cancel and we are left with

S_N − zS_N = (1 − z)S_N = z − z^{N+1}.

Then, for z ≠ 1,

S_N = z/(1 − z) − z^{N+1}/(1 − z).

If |z| < 1, then |z|^{N+1} → 0 as N → ∞, hence z^{N+1} → 0 also, and in this case the geometric series converges:

∑_{n=1}^∞ zⁿ = lim_{N→∞} S_N = z/(1 − z).
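The closed form S_N = z/(1 − z) − z^{N+1}/(1 − z), and the limit z/(1 − z), are easy to confirm numerically (our own sketch):

```python
# Partial sums of the geometric series sum_{n>=1} z^n for one |z| < 1.
z = 0.4 + 0.3j                       # |z| = 0.5 < 1
S = 0
partial_sums = []
for n in range(1, 61):
    S += z**n
    partial_sums.append(S)

limit = z / (1 - z)
# S_N agrees with the closed form z/(1-z) - z^{N+1}/(1-z) ...
assert abs(partial_sums[9] - (limit - z**11 / (1 - z))) < 1e-12
# ... and the error |S_N - z/(1-z)| shrinks like |z|^{N+1}
assert abs(partial_sums[-1] - limit) < 1e-12
```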
If |z| ≥ 1, the geometric series diverges. Sometimes we have a geometric series with first term equal to 1. This is the series

∑_{n=0}^∞ zⁿ = 1 + ∑_{n=1}^∞ zⁿ = 1 + z/(1 − z) = 1/(1 − z) if |z| < 1.

The series ∑_{n=1}^∞ cₙ is said to converge absolutely if the real series ∑_{n=1}^∞ |cₙ| converges.
THEOREM 21.8
If ∑_{n=1}^∞ cₙ converges absolutely, then this series converges.

Proof   Let cₙ = aₙ + ibₙ. Suppose ∑_{n=1}^∞ |cₙ| converges. Since 0 ≤ |aₙ| ≤ |cₙ|, then by comparison, ∑_{n=1}^∞ aₙ converges. Similarly, 0 ≤ |bₙ| ≤ |cₙ|, so ∑_{n=1}^∞ bₙ converges. Then ∑_{n=1}^∞ (aₙ + ibₙ) = ∑_{n=1}^∞ cₙ converges.
EXAMPLE 21.9
Consider the series

∑_{n=1}^∞ (−1)ⁿ (2 − i)/(1 + i)ⁿ.

Compute

|(−1)ⁿ (2 − i)/(1 + i)ⁿ| = √5/(√2)ⁿ.

Now the real series ∑_{n=1}^∞ √5/(√2)ⁿ converges. This is √5 times the real geometric series ∑_{n=1}^∞ (1/√2)ⁿ, which converges because 1/√2 < 1. Therefore the given complex series converges absolutely, hence converges.

The point to Theorem 21.8 is that ∑_{n=1}^∞ |cₙ| is a real series, and we have methods to test real series for convergence or divergence. We can therefore (in the case of absolute convergence) test a complex series for convergence by testing a real series. This approach is not all-inclusive, however, because a series may converge but not converge absolutely. Such a series is said to converge conditionally. For example, the series ∑_{n=1}^∞ (−1)ⁿ/n is well known to converge, but the series of absolute values of its terms is the divergent harmonic series ∑_{n=1}^∞ 1/n. With this background on complex series, we can take up power series.
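Example 21.9 in code (our own sketch): the term magnitudes follow the geometric pattern √5/(√2)ⁿ, and the partial sums settle to a definite complex number, which the geometric series formula with ratio w = −1/(1 + i) pins down as −0.6 + 0.8i.

```python
import math

def term(n):
    return (-1)**n * (2 - 1j) / (1 + 1j)**n

# |term(n)| = sqrt(5) / sqrt(2)^n, a convergent geometric pattern
for n in range(1, 20):
    assert abs(abs(term(n)) - math.sqrt(5) / math.sqrt(2)**n) < 1e-12

# summing the geometric series with ratio w = -1/(1+i) gives (2-i) w/(1-w) = -0.6 + 0.8i
total = sum(term(n) for n in range(1, 200))
assert abs(total - (-0.6 + 0.8j)) < 1e-9
```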
21.2.2 Power Series

DEFINITION 21.5   Power Series

A power series is a series of the form

∑_{n=0}^∞ cₙ(z − z₀)ⁿ,

in which z₀ and each cₙ is a given complex number.
FIGURE 21.6   Convergence at z₁ ≠ z₀ implies convergence on |z − z₀| < r = |z₁ − z₀|.
The summation in a power series begins at n = 0 to allow for a constant term:
Σ_{n=0}^∞ c_n (z − z_0)^n = c_0 + c_1 (z − z_0) + c_2 (z − z_0)^2 + ···.
The number z_0 is the center of the series, and the numbers c_n are its coefficients. Given a power series, we want to know for what values of z, if any, it converges. Certainly any power series converges at its center z = z_0, because there the series is just c_0. The following theorem provides the key to determining whether there are other values of z for which it converges. It says that if we find a point z_1 ≠ z_0 where the power series converges, then the series must converge absolutely at least at all points closer to z_0 than z_1. This gives (absolute) convergence at least at points interior to the disk of Figure 21.6.
THEOREM 21.9

Suppose Σ_{n=0}^∞ c_n (z_1 − z_0)^n converges for some z_1 different from z_0. Then the power series converges absolutely for all z satisfying
|z − z_0| < |z_1 − z_0|.

Proof  Suppose Σ_{n=0}^∞ c_n (z_1 − z_0)^n converges. Then lim_{n→∞} c_n (z_1 − z_0)^n = 0, so for some N,
|c_n (z_1 − z_0)^n| < 1  if n ≥ N.
Then, for n ≥ N,
|c_n (z − z_0)^n| = |c_n (z_1 − z_0)^n| · |z − z_0|^n/|z_1 − z_0|^n ≤ |z − z_0|^n/|z_1 − z_0|^n.
But if |z − z_0| < |z_1 − z_0|, then
|z − z_0|/|z_1 − z_0| < 1,
and then the geometric series
Σ_{n=1}^∞ (|z − z_0|/|z_1 − z_0|)^n
converges. By comparison (since these are series of real numbers),
Σ_{n=N}^∞ |c_n (z − z_0)^n|
converges. But then
Σ_{n=0}^∞ |c_n (z − z_0)^n|
converges, so Σ_{n=0}^∞ c_n (z − z_0)^n converges absolutely, as we wanted to prove.
Apply this conclusion as follows. Imagine standing at z_0 in the complex plane. Looking out in all directions, we may see no other points at which the power series converges. In this event, the series converges only for z = z_0. This is an uninteresting power series. A second possibility is that we see only points at which the power series converges. Now the power series converges for all z. The third possibility is that we see some points at which the series converges, and some at which it diverges. Let R be the distance from z_0 to the nearest point, say ζ, at which the power series diverges. The distance R is critical in the following sense. If z is farther from z_0 than ζ, then the power series must diverge at z. For if it converged at z, then by Theorem 21.9 it would converge at all points closer to z_0 than z, and hence would converge at ζ. If z is closer to z_0 than ζ, then the power series must converge at z, since ζ is the point closest to z_0 at which the series diverges. This means that, in this third case, the power series converges for all z with
|z − z_0| < R
and diverges for all z with
|z − z_0| > R.
The number R is called the radius of convergence of the power series, and the open disk |z − z_0| < R is called the open disk of convergence. The series converges inside this disk, and diverges outside the closed disk |z − z_0| ≤ R. At points on the boundary of this disk, |z − z_0| = R, the series might converge or diverge. If the power series converges only for z = z_0, we let the radius of convergence be R = 0. In this case we do not speak of an open disk of convergence. If the power series converges for all z, let R = ∞. Now the open disk of convergence is the entire complex plane. In this case it is convenient to denote the disk of convergence as |z − z_0| < ∞. Sometimes the radius of convergence can be calculated for a power series by using the ratio test.
EXAMPLE 21.10

Consider the power series
Σ_{n=0}^∞ (−1)^n (2^n/(n + 1)) (z − 1 + 2i)^{2n}.
The center is z_0 = 1 − 2i. We want the radius of convergence of this series.
Consider the magnitude of the ratio of successive terms of this series:
|(−1)^{n+1} (2^{n+1}/(n + 2)) (z − 1 + 2i)^{2n+2}| / |(−1)^n (2^n/(n + 1)) (z − 1 + 2i)^{2n}|
= 2 ((n + 1)/(n + 2)) |z − 1 + 2i|^2 → 2|z − 1 + 2i|^2  as n → ∞.
From the ratio test for real series, the power series will converge absolutely if this limit is less than 1, and diverge if this limit is greater than 1. Thus, the power series converges absolutely if
2|z − 1 + 2i|^2 < 1, or |z − 1 + 2i| < 1/√2,
and the series diverges if
2|z − 1 + 2i|^2 > 1, or |z − 1 + 2i| > 1/√2.
The radius of convergence is 1/√2 and the open disk of convergence is |z − 1 + 2i| < 1/√2 (Figure 21.7).
FIGURE 21.7  The open disk of convergence |z − 1 + 2i| < 1/√2, centered at 1 − 2i.
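A numeric check of the radius 1/√2 found in Example 21.10 (a Python sketch with w = z − 1 + 2i; the sample points and the index n = 100 are arbitrary choices): the nth term has magnitude (2|w|^2)^n/(n + 1), which dies off geometrically inside the disk and blows up outside it.

```python
# n-th term of the series in Example 21.10, written in terms of w = z - 1 + 2i
def term(w, n):
    return (-1) ** n * 2 ** n / (n + 1) * w ** (2 * n)

w_in = 0.5      # |w_in| = 0.5 < 1/sqrt(2) ~ 0.7071: inside the disk
w_out = 0.8     # |w_out| = 0.8 > 1/sqrt(2): outside the disk

assert abs(term(w_in, 100)) < 1e-6    # terms shrink inside the disk
assert abs(term(w_out, 100)) > 1e6    # terms grow outside the disk
```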
Suppose now that a power series has a positive or infinite radius of convergence. For each z in the open disk of convergence, let
f(z) = Σ_{n=0}^∞ c_n (z − z_0)^n.
This defines a function f over this disk. We want to explore the properties of this function, in particular whether it is differentiable. Answering this question requires the following technical lemma.

LEMMA 21.1

The power series Σ_{n=0}^∞ c_n (z − z_0)^n and Σ_{n=1}^∞ n c_n (z − z_0)^{n−1} have the same radius of convergence.
The lemma states that term-by-term differentiation of a power series does not change the radius of convergence. This means that, within the open disk of convergence, a power series defines a differentiable function whose derivative can be obtained by term-by-term differentiation.

THEOREM 21.10

Let Σ_{n=0}^∞ c_n (z − z_0)^n have positive or infinite radius of convergence. For each z in the open disk of convergence, let
f(z) = Σ_{n=0}^∞ c_n (z − z_0)^n.
Then f is differentiable on this open disk, and
f′(z) = Σ_{n=1}^∞ n c_n (z − z_0)^{n−1}.

Using this theorem, we know how to differentiate a function defined by a power series. But there is more to Theorem 21.10 than this. The series Σ_{n=1}^∞ n c_n (z − z_0)^{n−1} is a power series in its own right, having the same radius of convergence as the series Σ_{n=0}^∞ c_n (z − z_0)^n. We can therefore apply the theorem to this differentiated series and obtain
f″(z) = Σ_{n=2}^∞ n(n − 1) c_n (z − z_0)^{n−2}
within the open disk of convergence. Further, we can continue to differentiate as many times as we like within this disk. If f^{(k)}(z) denotes the kth derivative, then
f^{(3)}(z) = Σ_{n=3}^∞ n(n − 1)(n − 2) c_n (z − z_0)^{n−3},
and in general
f^{(k)}(z) = Σ_{n=k}^∞ n(n − 1)(n − 2) ··· (n − k + 1) c_n (z − z_0)^{n−k}.
If the kth derivative is evaluated at z_0, then all terms of the series for f^{(k)}(z_0) having positive powers of z − z_0 vanish, leaving just the constant first term of this differentiated series. In this way, we get
f(z_0) = c_0,  f′(z_0) = c_1,  f″(z_0) = 2c_2,  f^{(3)}(z_0) = 3·2 c_3,
and, in general,
f^{(k)}(z_0) = k(k − 1)(k − 2) ··· (1) c_k.
We can solve these equations for the coefficients in terms of the function and its derivatives at z_0:
c_k = (1/k!) f^{(k)}(z_0)  for k = 0, 1, 2, …,  (21.6)
where k! is the product of the integers from 1 through k, 0! = 1 by convention, and the zeroth derivative f^{(0)}(z) is just f(z). This notation enables us to write one formula for the coefficients without considering the case k = 0 separately. The numbers given by equation (21.6) are the Taylor coefficients of f at z_0, and the power series
Σ_{n=0}^∞ (1/n!) f^{(n)}(z_0) (z − z_0)^n
is called the Taylor series for f at (or about) z_0. We have shown that, if a function f is defined in a disk by a power series centered at z_0, then the coefficients in this power series must be the Taylor coefficients, and the power series must be the Taylor series of f about z_0.
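Term-by-term differentiation can be illustrated with the geometric series: Σ_{n=0}^∞ z^n = 1/(1 − z) differentiates to Σ_{n=1}^∞ n z^{n−1} = 1/(1 − z)^2 inside |z| < 1. A Python sketch (the sample point and cutoff are arbitrary):

```python
# Differentiated geometric series, summed numerically inside |z| < 1,
# compared against the derivative of the closed form 1/(1 - z).
z = 0.2 + 0.3j                 # |z| ~ 0.36 < 1
deriv_series = sum(n * z ** (n - 1) for n in range(1, 200))
assert abs(deriv_series - 1 / (1 - z) ** 2) < 1e-12
```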
We are now in a position to define some of the elementary complex functions, including exponential and trigonometric functions and power functions.
SECTION 21.2
PROBLEMS
In each of Problems 1 through 8, determine the radius of convergence and open disk of convergence of the power series.
1. Σ_{n=0}^∞ ((n + 1)/2^n) (z + 3i)^n
2. Σ_{n=0}^∞ (−1)^n (1/(2n + 1)^2) (z − i)^{2n}
3. Σ_{n=0}^∞ (n^n/(n + 1)^n) (z − 1 + 3i)^n
4. Σ_{n=0}^∞ (2i/(5 + i))^n (z + 3 − 4i)^n
5. Σ_{n=0}^∞ (i^n/2^{n+1}) (z + 8i)^n
6. Σ_{n=0}^∞ ((1 − i)^n/(n + 2)) (z − 3)^n
7. Σ_{n=0}^∞ (n^2/(2n + 1)) (z + 6 + 2i)^n
8. Σ_{n=0}^∞ (n^3/4^n) (z + 2i)^{3n}
9. Is it possible for Σ_{n=0}^∞ c_n (z − 2i)^n to converge at 0 and diverge at i?
10. Is it possible for Σ_{n=0}^∞ c_n (z − 4 + 2i)^n to converge at i and diverge at 1 + i?
11. Consider Σ_{n=0}^∞ c_n z^n, where c_n = 2 if n is even and c_n = 1 if n is odd. Show that the radius of convergence of this power series is 1, but that this number cannot be computed using the ratio test. (This simply means that it is not always possible to use this test to determine the radius of convergence of a power series.)

21.3  The Exponential and Trigonometric Functions

We want to define the complex exponential function e^z so that it agrees with the real exponential function when z is real. For all real x,
e^x = Σ_{n=0}^∞ (1/n!) x^n.
Replace x with z in this series to obtain the power series
Σ_{n=0}^∞ (1/n!) z^n.
Compute
lim_{n→∞} |z^{n+1}/(n + 1)!| / |z^n/n!| = lim_{n→∞} |z|/(n + 1) = 0.
Because this limit is less than 1 for all z, this power series converges for all z, and we can make the following definition.
DEFINITION 21.6  Exponential Function

For complex z, define the complex exponential function e^z by
e^z = Σ_{n=0}^∞ (1/n!) z^n.
THEOREM 21.11

For every complex number z and every positive integer k, the kth derivative of f(z) = e^z is
f^{(k)}(z) = e^z.

Proof  Compute
f′(z) = Σ_{n=1}^∞ (1/n!) n z^{n−1} = Σ_{n=1}^∞ (1/(n − 1)!) z^{n−1} = Σ_{n=0}^∞ (1/n!) z^n = e^z.
Therefore f′(z) = e^z. Continued differentiation now gives f^{(k)}(z) = e^z for any positive integer k.
Therefore f z = ez . Continued differentiation now gives f k z = ez for any positive integer k. We will list properties of the complex exponential function, many of which are familiar from the real case. Conclusion (8) gives the real and imaginary parts of ez , enabling us to write ez = ux y + ivx y. Conclusion (9) has perhaps the main surprise we find when we extend the real exponential function to the complex plane. The complex exponential function is periodic! This period does not manifest itself in the real case because it is pure imaginary. THEOREM 21.12
1. e0 = 1. 2. If g is differentiable at z, then so is egz , and d gz e = g zegz dz 3. ez+w = ez ew for all complex z and w. 4. ez = 0 for all z. 5. e−z = 1/ez . 6. If z is real, then ez is real and ez > 0. 7. (Euler’s Formula) If y is real, then eiy = cosy + i siny 8. If z = x + iy, then ez = ex cosy + iex siny 9. ez is periodic with period 2ni for any integer n.
Proof  (1) is obvious, and (2) follows from the chain rule for differentiation. To prove (3), fix any complex number u and define f(z) = e^z e^{u−z} for all complex z. Then
f′(z) = e^z e^{u−z} − e^z e^{u−z} = 0
for all z. By Theorem 21.7, f(z) is constant on any open disk D: |z| < R. For some number K, f(z) = K for |z| < R. But f(0) = K = e^0 e^u = e^u, so for all z in D,
e^z e^{u−z} = e^u.
Now let u = z + w to get
e^z e^w = e^{z+w}.
Since R can be as large as we want, this holds for all complex z and w.
To prove (4), suppose e^ζ = 0 for some ζ. Then
1 = e^0 = e^{ζ−ζ} = e^ζ e^{−ζ} = 0,
a contradiction. For (5), argue as in (4) that
1 = e^0 = e^{z−z} = e^z e^{−z},
so e^{−z} = 1/e^z.
To prove (7), write
e^{iy} = Σ_{n=0}^∞ (1/n!) (iy)^n = Σ_{n=0}^∞ (1/(2n)!) (iy)^{2n} + Σ_{n=0}^∞ (1/(2n + 1)!) (iy)^{2n+1}
= Σ_{n=0}^∞ (1/(2n)!) i^{2n} y^{2n} + Σ_{n=0}^∞ (1/(2n + 1)!) i^{2n+1} y^{2n+1}.
Now
i^{2n} = (i^2)^n = (−1)^n  and  i^{2n+1} = i · i^{2n} = i(−1)^n,
so
e^{iy} = Σ_{n=0}^∞ ((−1)^n/(2n)!) y^{2n} + i Σ_{n=0}^∞ ((−1)^n/(2n + 1)!) y^{2n+1} = cos(y) + i sin(y),
in which we have used the (real) Maclaurin expansions of cos(y) and sin(y) for real y.
For (8), use (7) to write
e^z = e^{x+iy} = e^x e^{iy} = e^x (cos(y) + i sin(y)).
Finally, for conclusion (9), for any integer n,
e^{z+2nπi} = e^{x+i(y+2nπ)} = e^x (cos(y + 2nπ) + i sin(y + 2nπ)) = e^x cos(y) + i e^x sin(y) = e^z.
Thus for any nonzero integer n, 2nπi is a period of e^z.
Conclusion (8) actually gives the polar form of e^z in terms of x and y. It implies that the magnitude of e^z is e^x, and that an argument of e^z is y. We may state these conclusions:
|e^z| = e^{Re(z)} = e^x  and  arg(e^z) = Im(z) + 2nπ = y + 2nπ.
It is also easy to verify that the conjugate of e^z is e raised to the conjugate of z:
conj(e^z) = e^{conj(z)}.
To see this, write
conj(e^z) = conj(e^x (cos(y) + i sin(y))) = e^x (cos(y) − i sin(y)) = e^{x−iy} = e^{conj(z)}.
For example,
conj(e^{2+6i}) = e^{conj(2+6i)} = e^{2−6i} = e^2 (cos(6) − i sin(6)).
Conclusion (9) can be improved: not only is 2nπi a period of e^z, but these numbers are the only periods. This is part (4) of the next theorem.

THEOREM 21.13
1. e^z = 1 if and only if z = 2nπi for some integer n.
2. e^z = −1 if and only if z = (2n + 1)πi for some integer n.
3. e^z = e^w if and only if z − w = 2nπi for some integer n.
4. If p is a period of e^z, then p = 2nπi for some integer n.

Contrast conclusion (2) of this theorem with conclusion (6) of the preceding theorem. If x is real, then e^x is a positive real number. However, the complex exponential function can assume negative values. Conclusion (2) of this theorem gives all values of z such that e^z assumes the value −1.

Proof
For (1), suppose first that e^z = 1. Then
e^z = 1 = e^x cos(y) + i e^x sin(y).
Then
e^x cos(y) = 1  and  e^x sin(y) = 0.
Now x is real, so e^x > 0, and the second equation requires that sin(y) = 0. Since this is the real sine function, we know all of its zeros, and can conclude that y = kπ for some integer k. Now we must have
e^x cos(y) = e^x cos(kπ) = 1.
But cos(kπ) = (−1)^k for integer k, so
e^x (−1)^k = 1.
For this to be satisfied, we first need (−1)^k to be positive, hence k must be an even integer, say k = 2n. This leaves us with e^x = 1, so x = 0. Therefore
z = x + iy = 2nπi.
Conversely, suppose z = 2nπi for some integer n. Then
e^z = cos(2nπ) + i sin(2nπ) = 1.
Conclusion (2) can be proved by an argument that closely parallels that just done for (1). For (3), if z − w = 2nπi, then
e^{z−w} = e^z/e^w = e^{2nπi} = 1,
so e^z = e^w. Conversely, suppose e^z = e^w. Then e^{z−w} = 1, so by (1), z − w = 2nπi for some integer n.
Finally, for (4), suppose p is a period of e^z. Then e^{z+p} = e^z for all z. But then e^z e^p = e^z, so e^p = 1 and, by (1), p = 2nπi for some integer n.
Using the properties we have derived for e^z, we can sometimes solve equations involving this function.
EXAMPLE 21.11

Find all z such that
e^z = 1 + 2i.
To do this, let z = x + iy, so
e^x cos(y) + i e^x sin(y) = 1 + 2i.
Then
e^x cos(y) = 1  and  e^x sin(y) = 2.
Add the squares of these equations to get
e^{2x} (cos^2(y) + sin^2(y)) = e^{2x} = 5.
Then
x = (1/2) ln(5),
in which ln(5) is the real natural logarithm of 5. Next, divide:
e^x sin(y)/(e^x cos(y)) = tan(y) = 2,
so y = tan^{−1}(2). One solution of the given equation is z = (1/2) ln(5) + i tan^{−1}(2), or approximately 0.8047 + 1.1071i.
We are now ready to extend the trigonometric functions from the real line to the complex plane. We want to define sin(z) and cos(z) for all complex z, so that these functions agree with the real sine and cosine functions when z is real. Following the method used to extend the exponential function from the real line to the complex plane, we begin with power series.
DEFINITION 21.7

For all complex z, let
sin(z) = Σ_{n=0}^∞ ((−1)^n/(2n + 1)!) z^{2n+1}  and  cos(z) = Σ_{n=0}^∞ ((−1)^n/(2n)!) z^{2n}.
The definition presupposes that these series converge for all complex z, a fact that is easy to show. From the power series, it is immediate that
cos(−z) = cos(z)  and  sin(−z) = −sin(z).
By differentiating the series term by term, we find that, for all z,
(d/dz) sin(z) = cos(z)  and  (d/dz) cos(z) = −sin(z).
Euler’s formula states that, for real y, eiy = cosy + i siny. We will now extend this to the entire complex plane. THEOREM 21.14
For every complex number z, eiz = cosz + i sinz The proof follows that of Theorem 21.12(7), with z in place of x. We can express sinz and cosz in terms of the exponential function as follows. First, from Theorem 21.14, eiz = cosz + i sinz and e−iz = cosz − i sinz Solve these equations for sinz and cosz to obtain 1 iz 1 iz e + e−iz e − e−iz and sinz = 2 2i Formulas such as these reveal one of the benefits of extending these familiar functions to the complex plane. On the real line, there is no apparent connection between ex , sinx and cosx. These formulations are also convenient for carrying out many manipulations involving sinz and cosz. For example, to derive the identity cosz =
sin2z = 2 cosz sinz
we have immediately that
2 sin(z) cos(z) = 2 · (1/2i)(e^{iz} − e^{−iz}) · (1/2)(e^{iz} + e^{−iz})
= (1/2i)(e^{2iz} − e^{−2iz} + 1 − 1) = (1/2i)(e^{2iz} − e^{−2iz}) = sin(2z).
Identities involving real trigonometric functions remain true in the complex case, and we will often use them without proof. For example,
sin(z + w) = sin(z) cos(w) + cos(z) sin(w).
Not all properties of the real sine and cosine are passed along to their complex extensions. Recall that |cos(x)| ≤ 1 and |sin(x)| ≤ 1 for real x. Contrast this with the following.

THEOREM 21.15
cos(z) and sin(z) are unbounded in the complex plane.

The proof consists of simply showing that both functions can be made arbitrarily large in magnitude by certain choices of z. Let z = iy with y real. Then
sin(z) = sin(iy) = (1/2i)(e^{−y} − e^{y}),
so
|sin(z)| = (1/2)|e^{y} − e^{−y}|,
and the right side can be made as large as we like by choosing y sufficiently large in magnitude. That is, as z moves away from the origin in either direction along the vertical axis, |sin(z)| increases without bound. It is easy to check that cos(z) exhibits the same behavior. It is often useful to know the real and imaginary parts of these functions.

THEOREM 21.16
Let z = x + iy. Then
cos(z) = cos(x) cosh(y) − i sin(x) sinh(y)
and
sin(z) = sin(x) cosh(y) + i cos(x) sinh(y).

These expressions are routine to derive starting from the exponential expressions for sin(z) and cos(z). We will now show that the complex sine and cosine functions have exactly the same periods and zeros as their real counterparts.

THEOREM 21.17

1. sin(z) = 0 if and only if z = nπ for some integer n.
2. cos(z) = 0 if and only if z = (2n + 1)π/2 for some integer n.
3. sin(z) and cos(z) are periodic with periods 2nπ, for n any nonzero integer. Further, these are the only periods of these functions.

Conclusion (3) means that
cos(z + 2nπ) = cos(z)  and  sin(z + 2nπ) = sin(z)
for all complex z and, conversely, if
cos(z + p) = cos(z)  for all z,
then p = 2nπ, and if
sin(z + q) = sin(z)  for all z,
then q = 2nπ. This guarantees that the sine and cosine functions do not pick up additional periods when extended to the complex plane, as occurs with the complex exponential function.
Proof
For (1), if n is an integer, then
sin(nπ) = (1/2i)(e^{nπi} − e^{−nπi}) = (1/2i)((−1)^n − (−1)^n) = 0.
Thus every z = nπ, with n an integer, is a zero of sin(z). To show that these are the only zeros, suppose sin(z) = 0. Let z = x + iy. Then
sin(x) cosh(y) + i cos(x) sinh(y) = 0.
Then
sin(x) cosh(y) = 0  and  cos(x) sinh(y) = 0.
Since cosh(y) > 0 for all real y, then sin(x) = 0, and for this real sine function, this means that x = nπ for some integer n. Then
cos(x) sinh(y) = cos(nπ) sinh(y) = 0.
But cos(nπ) = (−1)^n ≠ 0, so sinh(y) = 0, and this forces y = 0. Thus z = nπ.
(2) can be proved by an argument similar to that used for (1). For (3), if n is an integer, then
sin(z + 2nπ) = (1/2i)(e^{i(z+2nπ)} − e^{−i(z+2nπ)}) = (1/2i)(e^{iz} e^{2nπi} − e^{−iz} e^{−2nπi}) = (1/2i)(e^{iz} − e^{−iz}) = sin(z),
so each even integer multiple of π is a period of sin(z). To show that there are no other periods, suppose p is a period of sin(z). Then
sin(z + p) = sin(z)
for all complex z. In particular, this must hold for z = 0, so sin(p) = 0, and then by (1), p = nπ for some integer n. But we can also put z = i to have
sin(i + nπ) = sin(i).
Then
e^{i(i+nπ)} − e^{−i(i+nπ)} = e^{−1} − e.
Therefore
e^{−1} cos(nπ) − e cos(nπ) = e^{−1} − e.
If n is even, then cos(nπ) = 1 and this equation is true. If n is odd, then cos(nπ) = −1 and this equation becomes
−e^{−1} + e = e^{−1} − e,
an impossibility. Therefore n is even, and the only periods of sin(z) are even integer multiples of π. A similar argument establishes the same result for periods of cos(z).
Here is an example in which facts about cos(z) are used to solve an equation.

EXAMPLE 21.12

Solve cos(z) = i. Let z = x + iy, so
cos(x) cosh(y) − i sin(x) sinh(y) = i.
Then
sinx sinhy = −1
and
Since coshy > 0 for all real y, the first equation implies that cosx = 0, so x=
2n + 1 2
in which (so far) n can be any integer. From the second equation, 2n + 1 sin sinhy = −1 2 Now sin2n + 1/2 = −1n , so sinhy = −1n+1 with n any integer. Thus y = sinh−1 −1n+1 . The solutions of cosz = i are therefore the complex numbers 2n + 1 + i sinh−1 −1 2
for n an even integer,
and 2n + 1 + i sinh−1 1 2
for n an odd integer.
A standard formula for the inverse hyperbolic sine function gives sinh−1 = ln + 2 + 1 for real. Therefore the solutions can be written √ 2n + 1 + i ln−1 + 2 for n an even integer, 2 and √ 2n + 1 + i ln1 + 2 for n an even integer. 2
The other trigonometric functions are defined by
sec(z) = 1/cos(z),  csc(z) = 1/sin(z),  tan(z) = sin(z)/cos(z),  cot(z) = cos(z)/sin(z),
in each case for all z for which the denominator does not vanish. Properties of these functions can be derived from properties of sin(z) and cos(z).
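The identities of Theorem 21.16, and the definition tan(z) = sin(z)/cos(z), can be spot-checked at an arbitrary point (a Python sketch; the point chosen is arbitrary):

```python
import cmath
import math

x, y = 0.7, -1.3
z = complex(x, y)

# cos(z) = cos(x) cosh(y) - i sin(x) sinh(y)
assert abs(cmath.cos(z) - (math.cos(x) * math.cosh(y)
                           - 1j * math.sin(x) * math.sinh(y))) < 1e-12

# sin(z) = sin(x) cosh(y) + i cos(x) sinh(y)
assert abs(cmath.sin(z) - (math.sin(x) * math.cosh(y)
                           + 1j * math.cos(x) * math.sinh(y))) < 1e-12

# tan(z) = sin(z)/cos(z) wherever cos(z) != 0
assert abs(cmath.tan(z) - cmath.sin(z) / cmath.cos(z)) < 1e-12
```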
SECTION 21.3
PROBLEMS
In each of Problems 1 through 10, write the function value in the form a + bi.
1. e^i
2. sin(1 − 4i)
3. cos(3 + 2i)
4. tan(3i)
5. e^{5+2i}
6. cot(1 − (π/4)i)
7. sin^2(1 + i)
8. cos(2 − i) − sin(2 − i)
9. e^{iπ/2}
10. sin(e^i)
11. Find u(x, y) and v(x, y) such that e^{z^2} = u(x, y) + iv(x, y). Show that u and v satisfy the Cauchy-Riemann equations for all complex z.
12. Find u(x, y) and v(x, y) such that e^{1/z} = u(x, y) + iv(x, y). Show that u and v satisfy the Cauchy-Riemann equations for all z except zero.
13. Find u(x, y) and v(x, y) such that tan(z) = u(x, y) + iv(x, y). Determine where these functions are defined, and show that they satisfy the Cauchy-Riemann equations for these points (x, y).
14. Find u(x, y) and v(x, y) such that sec(z) = u(x, y) + iv(x, y). Determine where these functions are defined and show that they satisfy the Cauchy-Riemann equations for all such points.
15. Prove that sin^2(z) + cos^2(z) = 1 for all complex z.
16. Let z and w be complex numbers.
(a) Prove that sin(z + w) = sin(z) cos(w) + cos(z) sin(w).
(b) Prove that cos(z + w) = cos(z) cos(w) − sin(z) sin(w).
17. Find all solutions of e^z = 2i.
18. Find all solutions of sin(z) = i.
19. Find all solutions of e^z = −2.

21.4  The Complex Logarithm

In real calculus, the natural logarithm is the inverse of the exponential function: for x > 0,
y = ln(x)  if and only if  x = e^y.
In this way, the real natural logarithm can be thought of as the solution of the equation x = e^y for y in terms of x. We can attempt the same approach in seeking a definition of the complex logarithm. Given z ≠ 0, we ask whether there are complex numbers w such that
e^w = z.
To answer this question, put z in polar form as z = re^{iθ}. Let w = u + iv. Then
z = re^{iθ} = e^w = e^u e^{iv}.  (21.7)
Since θ and v are real, |e^{iθ}| = |e^{iv}| = 1, and equation (21.7) gives us r = |z| = e^u. Hence
u = ln(r),
the real natural logarithm of the positive number r. But now equation (21.7) implies that e^{iθ} = e^{iv}, so by Theorem 21.13(3),
iv = iθ + 2nπi,
and therefore
v = θ + 2nπ,
in which n can be any integer. In summary, given nonzero complex z = re^{iθ}, there are infinitely many complex numbers w such that e^w = z, and these numbers are
w = ln(r) + i(θ + 2nπ),
with n any integer. Since θ is an argument of z, and all arguments of z are contained in the expression θ + 2nπ for n an integer, then in terms of z,
w = ln|z| + i arg(z),
with the understanding that there are infinitely many different values for arg(z). Each of these numbers is called a complex logarithm of z. Each nonzero complex number therefore has infinitely many logarithms. To emphasize this, we often write
log(z) = ln|z| + i arg(z).
This is read, "the logarithm of z is the set of all numbers ln|z| + iθ, where θ varies over all arguments of z."
EXAMPLE 21.13

Let z = 1 + i. Then z = √2 e^{i(π/4+2nπ)}. Then
log(z) = ln(√2) + i(π/4 + 2nπ).
Some of the logarithms of 1 + i are
ln(√2) + iπ/4,  ln(√2) + 9iπ/4,  ln(√2) − 7iπ/4.

EXAMPLE 21.14

Let z = −3. An argument of z is π, and in polar form z = 3e^{i(π+2nπ)} = 3e^{(2n+1)πi}. Then
log(z) = ln(3) + (2n + 1)πi.
Some values of log(−3) are ln(3) + πi, ln(3) + 3πi, ln(3) + 5πi, ln(3) − πi, ln(3) − 3πi, and so on.
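Library logarithms return one particular value of log(z); Python's `cmath.log`, for instance, uses the argument in (−π, π]. The other logarithms differ from it by multiples of 2πi, and each one exponentiates back to z (a sketch, using the z of Example 21.13):

```python
import cmath
import math

z = 1 + 1j
w = cmath.log(z)     # principal value: ln(sqrt(2)) + i*pi/4
assert abs(w - complex(math.log(math.sqrt(2)), math.pi / 4)) < 1e-12

# every logarithm of z satisfies e^w = z
for n in (-1, 0, 2):
    assert abs(cmath.exp(w + 2j * math.pi * n) - z) < 1e-12
```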
The complex logarithm is not a function, because with each nonzero z it associates infinitely many different complex numbers. Nevertheless, log(z) exhibits some of the properties we are accustomed to with real logarithm functions, if properly understood.

THEOREM 21.18

Let z ≠ 0. If w is any value of log(z), then e^w = z.

This is the complex-function equivalent of the fact that, in real calculus, e^{ln(x)} = x. It is the condition we used to reason to a definition of log(z).

THEOREM 21.19

Let z and w be nonzero complex numbers. Then each value of log(zw) is a sum of values of log(z) and log(w).

Proof  Let z = re^{iθ} and w = ρe^{iφ}. Then zw = rρ e^{i(θ+φ)}. If ψ is any value of log(zw), then for some integer N,
ψ = ln(rρ) + i(θ + φ + 2Nπ) = (ln(r) + iθ) + (ln(ρ) + i(φ + 2Nπ)).
But ln(r) + iθ is one value of log(z), and ln(ρ) + i(φ + 2Nπ) is one value of log(w), proving the theorem.
Here is an example of the use of the logarithm to solve an equation involving the exponential function.
EXAMPLE 21.15

Solve for all z such that e^z = 1 + 2i.
In Example 21.11 we found one solution by separating the real and imaginary parts of e^z. Using the logarithm, we obtain all solutions as follows: e^z = 1 + 2i means that
z = log(1 + 2i) = ln|1 + 2i| + i arg(1 + 2i) = (1/2) ln(5) + i(arctan(2) + 2nπ),
in which n is any integer.

Sometimes it is convenient to agree on a particular logarithm to use for nonzero complex numbers. This can be done by choosing an argument. For example, we could define, for z ≠ 0,
Log(z) = ln|z| + iθ,  where 0 ≤ θ < 2π.
This assigns to the symbol Log(z) that particular value of log(z) corresponding to the argument of z lying in [0, 2π). For example,
Log(1 + i) = ln(√2) + iπ/4
and
Log(−3) = ln(3) + iπ.
If this is done, then care must be taken in doing computations. For example, in general Log(zw) ≠ Log(z) + Log(w).
SECTION 21.4
PROBLEMS

In each of Problems 1 through 6, determine all values of log(z) and also the value of Log(z) defined in the discussion.
1. −4i
2. 2 − 2i
3. −5
4. 1 + 5i
5. −9 + 2i
6. 5
7. Let z and w be nonzero complex numbers. Show that each value of log(z/w) is equal to a value of log(z) minus a value of log(w).
8. Give an example to show that in general Log(zw) ≠ Log(z) + Log(w) for nonzero complex z and w.

21.5  Powers

We want to assign a meaning to the symbol z^w when w and z are complex numbers and z ≠ 0. We will build this idea in steps. Throughout this section, z is a nonzero complex number.
21.5.1  Integer Powers

Integer powers present no problem. Define z^0 = 1. If n is a positive integer, then z^n = z · z ··· z, a product of n factors of z. For example,
(1 + i)^4 = (1 + i)(1 + i)(1 + i)(1 + i) = −4.
If n is a negative integer, then z^n = 1/z^{|n|}. For example,
(1 + i)^{−4} = 1/(1 + i)^4 = −1/4.

21.5.2  z^{1/n} for Positive Integer n
Let n be a positive integer. A number u such that u^n = z is called an nth root of z, and is denoted z^{1/n}. Like the logarithm and the argument, this is a symbol that denotes more than one number. In fact, we will see that every nonzero complex number has exactly n distinct nth roots. To determine these nth roots of z, let z = re^{iθ}, with r = |z| and θ any argument of z. Then
z = re^{i(θ+2kπ)},
in which k can be any integer. Then
z^{1/n} = r^{1/n} e^{i(θ+2kπ)/n}.  (21.8)
Here r^{1/n} is the unique real nth root of the positive number r. As k varies over the integers, the expression on the right side of equation (21.8) produces complex numbers whose nth powers equal z. Let us see how many such numbers it produces.
For k = 0, 1, …, n − 1, we get n distinct nth roots of z. They are
r^{1/n} e^{iθ/n},  r^{1/n} e^{i(θ+2π)/n},  r^{1/n} e^{i(θ+4π)/n},  …,  r^{1/n} e^{i(θ+2(n−1)π)/n}.  (21.9)
We claim that other choices of k simply reproduce one of these nth roots. For example, if k = n, then equation (21.8) yields
r^{1/n} e^{i(θ+2nπ)/n} = r^{1/n} e^{iθ/n} e^{2πi} = r^{1/n} e^{iθ/n},
the first number in the list (21.9). If k = n + 1, we get
r^{1/n} e^{i(θ+2(n+1)π)/n} = r^{1/n} e^{i(θ+2π)/n} e^{2πi} = r^{1/n} e^{i(θ+2π)/n},
the second number in the list (21.9), and so on. To sum up, for any positive integer n, the number of nth roots of any nonzero complex number z is n. These nth roots are
r^{1/n} e^{i(θ+2kπ)/n}  for k = 0, 1, …, n − 1,
or
r^{1/n} (cos((θ + 2kπ)/n) + i sin((θ + 2kπ)/n))  for k = 0, 1, …, n − 1.
EXAMPLE 21.16
Find the fourth roots of 1 + i.
Since one argument of 1 + i is π/4, and |1 + i| = √2, we have the polar form
1 + i = √2 e^{i(π/4+2kπ)}.
The fourth roots are
2^{1/8} e^{i(π/4+2kπ)/4}  for k = 0, 1, 2, 3.
These numbers are
2^{1/8} e^{iπ/16},  2^{1/8} e^{i9π/16},  2^{1/8} e^{i17π/16},  2^{1/8} e^{i25π/16},
or
2^{1/8} (cos(π/16) + i sin(π/16)),
2^{1/8} (cos(9π/16) + i sin(9π/16)),
2^{1/8} (cos(17π/16) + i sin(17π/16)),
2^{1/8} (cos(25π/16) + i sin(25π/16)).
EXAMPLE 21.17

The nth roots of 1 are called the nth roots of unity. These numbers have many uses, for example in connection with the fast Fourier transform. Since 1 has magnitude 1, and an argument of 1 is zero, the nth roots of unity are
e^{2kπi/n}  for k = 0, 1, …, n − 1.
FIGURE 21.8
If we put ω = e^{2πi/n}, then these nth roots of unity are 1, ω, ω^2, …, ω^{n−1}. For example, the fifth roots of unity are
1, e^{2πi/5}, e^{4πi/5}, e^{6πi/5}, and e^{8πi/5}.
These are
1,  cos(2π/5) + i sin(2π/5),  cos(4π/5) + i sin(4π/5),  cos(6π/5) + i sin(6π/5),  cos(8π/5) + i sin(8π/5).
If plotted as points in the plane, the nth roots of unity form the vertices of a regular polygon with vertices on the unit circle |z| = 1, having one vertex at (1, 0). Figure 21.8 shows the fifth roots of unity displayed in this way. If n is a negative integer, then
z^{1/n} = 1/z^{1/|n|},
in the sense that the numbers represented by the symbol on the left are calculated by taking the numbers produced on the right. These are just the reciprocals of the |n|th roots of z.
21.5.3  Rational Powers

A rational number is a quotient of integers, say r = m/n. We may assume that n is positive and that m and n have no common factors. Write
z^r = z^{m/n} = (z^m)^{1/n},
the nth roots of z^m. It is routine to check that we get the same numbers if we first take the nth roots of z, then raise each to the power m. This is because
(z^m)^{1/n} = (r^m e^{im(θ+2kπ)})^{1/n} = r^{m/n} e^{im(θ+2kπ)/n} = (r^{1/n} e^{i(θ+2kπ)/n})^m = (z^{1/n})^m.
EXAMPLE 21.18

We will find all values of (2 − 2i)^{3/5}.
First, (2 − 2i)^3 = −16 − 16i. Thus we want the fifth roots of −16 − 16i. Now |−16 − 16i| = √512, and 5π/4 is an argument of −16 − 16i. Then
−16 − 16i = 512^{1/2} e^{i(5π/4+2kπ)}
and
(−16 − 16i)^{1/5} = 512^{1/10} e^{i(5π/4+2kπ)/5}.
Letting k = 0, 1, 2, 3, 4, we obtain the numbers
512^{1/10} e^{5πi/20},  512^{1/10} e^{13πi/20},  512^{1/10} e^{21πi/20},  512^{1/10} e^{29πi/20},  512^{1/10} e^{37πi/20}.
These are all values of (2 − 2i)^{3/5}.
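Each of the five numbers just listed should give −16 − 16i when raised to the fifth power (a Python sketch):

```python
import cmath
import math

target = -16 - 16j
# the five values of (-16 - 16i)^{1/5}, k = 0, ..., 4
vals = [512 ** 0.1 * cmath.exp(1j * (5 * math.pi / 4 + 2 * k * math.pi) / 5)
        for k in range(5)]
for v in vals:
    assert abs(v ** 5 - target) < 1e-9
```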
21.5.4  Powers z^w

Suppose z ≠ 0, and let w be any complex number. We want to define the symbol z^w. In the case of real powers, a^b is defined to be e^{b ln(a)} for a > 0. For example, 2^π = e^{π ln(2)}, and this is defined because ln(2) is defined. We will take the same approach to z^w, except now we must allow for the fact that log(z) denotes an infinite set of complex numbers. We therefore define z^w to be the set of all numbers e^{w log(z)}. If w = m/n, a rational number with common factors divided out, then e^{w log(z)} has n distinct values. If w is not a rational number, then z^w is an infinite set of complex numbers.
EXAMPLE 21.19

We will find all values of (1 − i)^{1+i}.
These numbers are obtained as e^{(1+i) log(1−i)}. First, |1 − i| = √2, and −π/4 is an argument of 1 − i (we reach the point (1, −1) by rotating π/4 radians clockwise from the positive real axis). Therefore, in polar form,
1 − i = √2 e^{i(−π/4+2nπ)}.
Thus all values of log(1 − i) are given by
ln(√2) + i(−π/4 + 2nπ).
Every value of (1 − i)^{1+i} is contained in the expression
e^{(1+i)(ln√2 + i(−π/4+2nπ))} = e^{ln√2 + π/4 − 2nπ} e^{i(ln√2 − π/4 + 2nπ)}
= √2 e^{π/4−2nπ} (cos(ln√2 − π/4 + 2nπ) + i sin(ln√2 − π/4 + 2nπ))
= √2 e^{π/4−2nπ} (cos(ln√2 − π/4) + i sin(ln√2 − π/4)).
As n varies over all integer values, this expression gives all values of (1 − i)^{1+i}.
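Python's power operator for complex operands uses the principal logarithm, so `(1 - 1j)**(1 + 1j)` should match the n = 0 value of the expression above (a sketch):

```python
import cmath
import math

principal = (1 - 1j) ** (1 + 1j)      # uses the principal value of log(1 - i)

# n = 0 term of the expression derived in Example 21.19
ln_r = math.log(math.sqrt(2))
formula = (math.sqrt(2) * math.exp(math.pi / 4)
           * cmath.exp(1j * (ln_r - math.pi / 4)))
assert abs(principal - formula) < 1e-12
```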
SECTION 21.5
PROBLEMS
In each of Problems 1 through 14, determine all values of z^w.
1. i^{1+i}
2. (1 + i)^{2i}
3. i^i
4. (1 + i)^{2−i}
5. (−1 + i)^{−3i}
6. (1 − i)^{1/3}
7. i^{1/4}
8. 16^{1/4}
9. (−4)^{2−i}
10. 6^{−2−3i}
11. (−16)^{1/4}
12. ((1 + i)/(1 − i))^{1/3}
13. 1^{1/6}
14. (7i)^{3i}
15. Let n be a positive integer, and let u_1, …, u_n be the nth roots of unity. Prove that Σ_{j=1}^n u_j = 0. Hint: Write each nth root of unity as a power of e^{2πi/n}.
16. Let n be a positive integer, and ω = e^{2πi/n}. Evaluate Σ_{j=0}^{n−1} (−1)^j ω^j.
CHAPTER 22

Complex Integration

CURVES IN THE PLANE  THE INTEGRAL OF A COMPLEX FUNCTION  CAUCHY'S THEOREM  CONSEQUENCES OF CAUCHY'S THEOREM  THE DEFORMATION THEOREM  BOUNDS ON DERIVATIVES
We now know some important complex functions, as well as some facts about derivatives of complex functions. Next we want to develop an integral for complex functions. Real functions are defined over sets of real numbers, and are usually integrated over intervals. Complex functions are defined over sets of points in the plane, and are integrated over curves. Before defining this integral, we will review some facts about curves. For reference, real line integrals are discussed in Chapter 13.
22.1
Curves in the Plane

A curve in the complex plane is a function γ: [a, b] → ℂ, defined on a real interval [a, b] and having complex values. For each number t in [a, b], γ(t) is a complex number, or point in the plane. The locus of such points is the graph of the curve. However, the curve is more than just a locus of points in the plane: γ has a natural orientation, which is the direction the point γ(t) moves along the graph as t increases from a to b. In this sense, it is natural to refer to γ(a) as the initial point of the curve, and γ(b) as the terminal point. If γ(t) = x(t) + iy(t), then the graph of γ is the locus of points (x(t), y(t)) for a ≤ t ≤ b. The initial point of γ is (x(a), y(a)) and the terminal point is (x(b), y(b)), and (x(t), y(t)) moves from the initial to the terminal point as t varies from a to b. The functions x(t) and y(t) are the coordinate functions of γ.
EXAMPLE 22.1
Let Γ(t) = 2t + t²i for 0 ≤ t ≤ 2. Then Γ(t) = x(t) + iy(t),
[Figure 22.1: x = 2t, y = t², for 0 ≤ t ≤ 2.]
[Figure 22.2: Ψ(t) = e^(it) for 0 ≤ t ≤ 3π, with Ψ(0) = 1 and Ψ(3π) = −1.]
where x(t) = 2t and y(t) = t². The graph of this curve is the part of the parabola y = (x/2)², shown in Figure 22.1. As t varies from 0 to 2, the point Γ(t) = (2t, t²) moves along this graph from the initial point Γ(0) = (0, 0) to the terminal point Γ(2) = (4, 4). The arrow on the graph indicates this orientation.
EXAMPLE 22.2
Let Ψ(t) = e^(it) for 0 ≤ t ≤ 3π. Then Ψ(t) = cos(t) + i sin(t) = x(t) + iy(t), so

x(t) = cos(t), y(t) = sin(t).

Since x² + y² = 1, every point on this curve is on the unit circle about the origin. However, the initial point of Ψ is Ψ(0) = 1 and the terminal point is Ψ(3π) = e^(3πi) = −1. This curve is not closed. If this were a racetrack, the race begins at 1 in Figure 22.2 and ends at −1. A circular racetrack does not mean that the starting and ending points of the race are the same. This is not apparent from the graph itself. Ψ is oriented counterclockwise, as the arrow indicates.
EXAMPLE 22.3
Let γ(t) = e^(it) for 0 ≤ t ≤ 4π. This curve is closed, since γ(0) = 1 = γ(4π). However, the point (x(t), y(t)) moves around the unit circle x² + y² = 1 twice as t varies from 0 to 4π. This is also not apparent from just the graph itself (Figure 22.3).
[Figure 22.3: γ(t) = e^(it) for 0 ≤ t ≤ 4π.]
[Figure 22.4: Position vector of a curve: Γ(t) = x(t) + y(t)i ↔ x(t)i + y(t)j.]
[Figure 22.5: Tangent vector to a curve.]
[Figure 22.6: The join Γ₁ ⊕ Γ₂ ⊕ Γ₃ ⊕ Γ₄.]
A curve Γ is simple if Γ(t₁) ≠ Γ(t₂) whenever t₁ ≠ t₂. This means that the same point is never repeated at different times. An exception is made for closed curves, which require that Γ(a) = Γ(b). If this is the only point at which Γ(t₁) = Γ(t₂) with t₁ ≠ t₂, then we call Γ a simple closed curve. The curve γ of Example 22.3 is closed, but not simple. If we define δ(t) = e^(it) for 0 ≤ t ≤ 2π, then (x(t), y(t)) goes around the circle exactly once as t varies from 0 to 2π, and δ is a simple closed curve.

A curve Γ: [a, b] → ℂ is continuous if each of its coordinate functions is continuous on [a, b]. If x(t) and y(t) are differentiable on [a, b], we call Γ a differentiable curve. If x′(t) and y′(t) are continuous, and do not both vanish for the same value of t, we call Γ a smooth curve. All the curves in the above examples are smooth.

In vector terms, we can write Γ(t) = x(t)i + y(t)j (Figure 22.4). If Γ is differentiable, and x′(t) and y′(t) are not both zero, then Γ′(t) = x′(t)i + y′(t)j is the tangent vector to the curve at the point (x(t), y(t)) (Figure 22.5). If Γ is smooth, then x′(t) and y′(t) are continuous, so this tangent vector is continuous. A smooth curve is therefore one having a continuous tangent. To illustrate, in Example 22.3, γ(t) = cos(t) + i sin(t), so γ′(t) = −sin(t) + i cos(t). We can leave this as it is, or write the tangent vector γ′(t) = −sin(t)i + cos(t)j, exploiting the natural correspondence between complex numbers and vectors in the plane.

Sometimes we form a curve Γ by joining several curves Γ₁, …, Γₙ in succession, with the understanding that the terminal point of Γⱼ₋₁ must be the same as the initial point of Γⱼ for j = 2, …, n (Figure 22.6). Such a curve is called the join of Γ₁, …, Γₙ and is denoted

Γ = Γ₁ ⊕ Γ₂ ⊕ · · · ⊕ Γₙ.

The curves Γⱼ are the components of this join. If each component of a join is smooth, then the join is piecewise smooth. It has a continuous tangent at each point, except perhaps at the seams where Γⱼ₋₁ is joined to Γⱼ. If the seams join in a smooth fashion, a join can even have a tangent at each of these points and itself be smooth.
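The tangent computation of Example 22.3 can be spot-checked numerically: a central-difference quotient should approach γ′(t) = ie^(it). A quick Python sketch (the step size and tolerance are our own arbitrary choices, not from the text):

```python
import cmath

def gamma(t):
    # the curve of Example 22.3: gamma(t) = e^{it}
    return cmath.exp(1j * t)

def tangent(t, h=1e-5):
    # central-difference approximation of gamma'(t)
    return (gamma(t + h) - gamma(t - h)) / (2 * h)

# gamma'(t) = i e^{it} = -sin(t) + i cos(t); compare at t = 1.0
t = 1.0
exact = 1j * cmath.exp(1j * t)
error = abs(tangent(t) - exact)
```

The small error confirms that the tangent vector varies continuously along this smooth curve.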
EXAMPLE 22.4
Let Γ₁(t) = e^(it) for 0 ≤ t ≤ π, and let Γ₂(t) = −1 + ti for 0 ≤ t ≤ 3. Then Γ₁(π) = −1 = Γ₂(0), so the terminal point of Γ₁ is the initial point of Γ₂. Figure 22.7 shows a graph of Γ₁ ⊕ Γ₂. This curve is piecewise smooth, being a join of two smooth curves. The join has a tangent at each point except −1, where the connection is made to form the join.

We will define a kind of equivalence of curves. Suppose Γ: [a, b] → ℂ and Φ: [A, B] → ℂ are two smooth curves. We call these curves equivalent if one can be obtained from the other by a change of variables defined by a differentiable, increasing function. This means that there is a function φ taking points of [A, B] to [a, b] such that
1. φ′(t) > 0 for A < t < B,
2. φ(A) = a and φ(B) = b, and
3. Φ(p) = Γ(φ(p)) for A ≤ p ≤ B.

If we think of t = φ(p), then Γ(t) = Φ(p). The curves have the same initial and terminal points and the same graph and orientation, but Γ(t) moves along the graph as t varies from a to b, while Φ(p) moves along the same graph in the same direction as p varies from A to B. Informally, two curves are equivalent if one is just a reparametrization of the other.
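As a quick numerical illustration of this definition (the particular pair of curves below is our own, not from the text): Γ(t) = e^(it) on [0, π] and Φ(p) = e^(iπp²) on [0, 1] are equivalent under φ(p) = πp², which is increasing on (0, 1), and Φ(p) = Γ(φ(p)) at every p:

```python
import cmath, math

def Gamma(t):
    return cmath.exp(1j * t)                  # upper half of the unit circle, 0 <= t <= pi

def Phi(p):
    return cmath.exp(1j * math.pi * p * p)    # same graph, reparametrized, 0 <= p <= 1

def phi(p):
    return math.pi * p * p                    # increasing change of variables [0,1] -> [0,pi]

# Phi should agree with Gamma after the change of variables
max_gap = max(abs(Phi(k / 200) - Gamma(phi(k / 200))) for k in range(201))
```

Both parametrizations trace the same graph in the same direction, so the gap is zero up to rounding.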
[Figure 22.7: The join of Γ₁(t) = e^(it) for 0 ≤ t ≤ π, with Γ₂(t) = −1 + it for 0 ≤ t ≤ 3.]
EXAMPLE 22.5
Let

Γ(t) = t² − 2ti for 0 ≤ t ≤ 1 and Φ(p) = sin²(p) − 2 sin(p)i for 0 ≤ p ≤ π/2.

Both of these curves have the same graph (Figure 22.8), extending from initial point 0 to terminal point 1 − 2i. Let

t = φ(p) = sin(p) for 0 ≤ p ≤ π/2.

Then φ is a differentiable, increasing function that takes [0, π/2] onto [0, 1]. Further, for 0 ≤ p ≤ π/2,

Γ(sin(p)) = sin²(p) − 2 sin(p)i = Φ(p).

These curves are therefore equivalent.

Informally, we will often describe a curve geometrically and speak of a curve and its graph interchangeably. When this is done, it is important to keep track of the orientation along the curve, and whether or not the curve is closed. For example, suppose Γ is the straight line from 1 + i to 3 + 3i (Figure 22.9). This gives the graph and its orientation, and from this we can find Γ. Since the graph is the segment of the straight line from (1, 1) to (3, 3), the coordinate functions are

x = t, y = t for 1 ≤ t ≤ 3.

Then

Γ(t) = x(t) + y(t)i = (1 + i)t for 1 ≤ t ≤ 3
[Figure 22.8: Graph of Γ(t) = t² − 2it for 0 ≤ t ≤ 1.]
[Figure 22.9: Directed line from 1 + i to 3 + 3i.]
[Figure 22.10: Φ(t) = i + 2e^(it), 0 ≤ t ≤ π/2.]
is one representation of the curve that has been described. There are of course other, equivalent representations.

As another example, suppose Φ is the quarter-circle of radius 2 about i, from 2 + i to 3i (Figure 22.10). Again, we have given the graph and its orientation. Using polar coordinates centered at i = (0, 1), we can write the coordinate functions

x(t) = 2 cos(t), y(t) = 1 + 2 sin(t)

for 0 ≤ t ≤ π/2. As a function, this curve can be written

Φ(t) = 2 cos(t) + 2i sin(t) + i = i + 2e^(it) for 0 ≤ t ≤ π/2.

Other, equivalent, representations can also be used.

Finally, we will often make statements such as "f is continuous on Γ," by which we mean that f is a complex function that is continuous at all points on the graph of Γ. And when we refer to "z on Γ," we mean a complex number z lying on the graph of Γ.

Curves are the objects over which we integrate complex functions. We will now define this integral.
SECTION 22.1 PROBLEMS

In each of Problems 1 through 10, graph the curve, determine its initial and terminal points, whether it is closed or not closed, whether or not it is simple, and the tangent to the curve at each point where the tangent exists. This tangent may be expressed as a vector or as a complex function.

1. Γ(t) = 4 − 2i + 2e^(it) for 0 ≤ t ≤ π
2. Γ(t) = ie^(2it) for 0 ≤ t ≤ 2π
3. Γ(t) = t + t²i for 1 ≤ t ≤ 3
4. Γ(t) = 3 cos(t) + 5 sin(t)i for 0 ≤ t ≤ 2π
5. γ(t) = 3 cos(t) + 5 sin(t)i for 0 ≤ t ≤ 4π
6. δ(t) = 4 sin(t) − 2 cos(t)i for −π ≤ t ≤ π/2
7. Ψ(t) = t − t²i for −2 ≤ t ≤ 4
8. Φ(t) = 2t + (1 − t²/2)i for −3 ≤ t ≤ −1
9. Γ(t) = cos(t) − 2 sin(2t)i for 0 ≤ t ≤ 2π
10. γ(t) = t² − t⁴i for −1 ≤ t ≤ 1
22.2 The Integral of a Complex Function

We will define the integral of a complex function in two stages, beginning with the special case that f is a complex function defined on an interval [a, b] of real numbers. An example of such a function is f(x) = x² + sin(x)i for 0 ≤ x ≤ π. It is natural to integrate such a function as

∫₀^π f(x) dx = ∫₀^π x² dx + i ∫₀^π sin(x) dx = π³/3 + 2i.

This is the model we follow in general for such functions.
DEFINITION 22.1

Let f: [a, b] → ℂ be a complex function. Let f(x) = u(x) + iv(x) for a ≤ x ≤ b. Then

∫ₐᵇ f(x) dx = ∫ₐᵇ u(x) dx + i ∫ₐᵇ v(x) dx.
Both of the integrals on the right are Riemann integrals of real-valued functions over [a, b].
EXAMPLE 22.6
Let f(x) = x − ix² for 1 ≤ x ≤ 2. Then

∫₁² f(x) dx = ∫₁² x dx − i ∫₁² x² dx = 3/2 − (7/3)i.
EXAMPLE 22.7
Let f(x) = cos(2x) + i sin(2x) for 0 ≤ x ≤ π/4. Then

∫₀^(π/4) f(x) dx = ∫₀^(π/4) cos(2x) dx + i ∫₀^(π/4) sin(2x) dx = 1/2 + (1/2)i.

In the last example, it is tempting to let f(x) = e^(2ix) and adapt the fundamental theorem of calculus to complex functions to obtain

∫₀^(π/4) f(x) dx = ∫₀^(π/4) e^(2ix) dx = [e^(2ix)/2i]₀^(π/4) = (1/2i)(e^(iπ/2) − 1)
= (1/2i)(cos(π/2) + i sin(π/2) − 1) = (1/2i)(−1 + i) = (1/2)(1 + i).
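Pending that justification, the value can at least be confirmed by direct numerical integration; a sketch in Python (the midpoint rule and its step count are our own arbitrary choices):

```python
import cmath, math

def midpoint(f, a, b, n=100_000):
    # composite midpoint rule for a complex-valued function of a real variable
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

approx = midpoint(lambda x: cmath.exp(2j * x), 0.0, math.pi / 4)
exact = (1 + 1j) / 2     # the value obtained from the antiderivative e^{2ix}/(2i)
error = abs(approx - exact)
```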
We will justify this calculation shortly. We can now define the integral of a complex function over a curve in the plane.
DEFINITION 22.2
Let f be a complex function. Let Γ: [a, b] → ℂ be a smooth curve in the plane. Assume that f is continuous at all points on Γ. Then the integral of f over Γ is defined to be

∫_Γ f(z) dz = ∫ₐᵇ f(Γ(t)) Γ′(t) dt.

Since z = Γ(t) on the curve, this integral is often written

∫_Γ f(z) dz = ∫ₐᵇ f(z(t)) z′(t) dt.

This formulation has the advantage of suggesting the way ∫_Γ f(z) dz is evaluated: replace z with z(t) on the curve, let dz = z′(t) dt, and integrate over the interval a ≤ t ≤ b.
EXAMPLE 22.8
Evaluate ∫_Γ z̄ dz if Γ(t) = e^(it) for 0 ≤ t ≤ π. The graph of Γ is the upper half of the unit circle, oriented counterclockwise from 1 to −1 (Figure 22.11). On Γ, z(t) = e^(it) and z′(t) = ie^(it). Further, f(z(t)) = z̄(t) = e^(−it) because t is real. Then

∫_Γ z̄ dz = ∫₀^π e^(−it) ie^(it) dt = i ∫₀^π dt = πi.
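This computation can be verified numerically by discretizing ∫₀^π f(z(t)) z′(t) dt; a minimal sketch (the midpoint rule and step count are our own choices, not from the text):

```python
import cmath, math

def contour_integral(f, z, dz, a, b, n=100_000):
    # midpoint-rule approximation of the integral of f(z(t)) z'(t) over [a, b]
    h = (b - a) / n
    return h * sum(f(z(a + (k + 0.5) * h)) * dz(a + (k + 0.5) * h) for k in range(n))

val = contour_integral(
    lambda w: w.conjugate(),             # f(z) = conjugate of z
    lambda t: cmath.exp(1j * t),         # z(t) = e^{it}
    lambda t: 1j * cmath.exp(1j * t),    # z'(t) = i e^{it}
    0.0, math.pi)
error = abs(val - math.pi * 1j)          # expected value is pi*i
```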
EXAMPLE 22.9
Evaluate ∫_Φ z² dz if Φ(t) = t + it for 0 ≤ t ≤ 1. The graph of Φ is the straight line segment from the origin to (1, 1), as shown in Figure 22.12. On the curve, z(t) = (1 + i)t. Since f(z) = z²,

f(z(t)) = z(t)² = (1 + i)²t² = 2it² and z′(t) = 1 + i.

Then

∫_Φ z² dz = ∫₀¹ 2it²(1 + i) dt = (−2 + 2i) ∫₀¹ t² dt = (2/3)(−1 + i).
[Figure 22.11: z(t) = e^(it), 0 ≤ t ≤ π.]
[Figure 22.12: z(t) = t + it, 0 ≤ t ≤ 1.]
EXAMPLE 22.10
Evaluate ∫_Γ z Re(z) dz if Γ(t) = t − it² for 0 ≤ t ≤ 2. Here f(z) = z Re(z), and on this curve, z(t) = t − it², so

f(z(t)) = z(t) Re(z(t)) = (t − it²)t = t² − it³.

Further, z′(t) = 1 − 2it, so

∫_Γ z Re(z) dz = ∫₀² (t² − it³)(1 − 2it) dt = ∫₀² (t² − 3it³ − 2t⁴) dt
= ∫₀² (t² − 2t⁴) dt − 3i ∫₀² t³ dt = −152/15 − 12i.
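A numerical spot check of this value (our own verification sketch, not part of the text):

```python
def contour_integral(f, z, dz, a, b, n=100_000):
    # midpoint-rule approximation of the integral of f(z(t)) z'(t) over [a, b]
    h = (b - a) / n
    return h * sum(f(z(a + (k + 0.5) * h)) * dz(a + (k + 0.5) * h) for k in range(n))

val = contour_integral(
    lambda w: w * w.real,            # f(z) = z Re(z)
    lambda t: complex(t, -t * t),    # z(t) = t - it^2
    lambda t: complex(1, -2 * t),    # z'(t) = 1 - 2it
    0.0, 2.0)
exact = complex(-152 / 15, -12)
error = abs(val - exact)
```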
We will show that the integrals of a function over equivalent curves are equal. This is important because we can parametrize a given curve infinitely many different ways, and this should not change the value of the integral of a given function over the curve.

THEOREM 22.1

Let Γ and Φ be equivalent curves and let f be continuous on their graph. Then

∫_Γ f(z) dz = ∫_Φ f(z) dz.

Proof  Suppose Γ: [a, b] → ℂ and Φ: [A, B] → ℂ. Because these curves are equivalent, there is a continuous function φ with positive derivative on [A, B] such that φ(A) = a and φ(B) = b, and Φ(p) = Γ(φ(p)) for A ≤ p ≤ B. By the chain rule,

Φ′(p) = Γ′(φ(p)) φ′(p).

Then

∫_Φ f(z) dz = ∫_A^B f(Φ(p)) Φ′(p) dp = ∫_A^B f(Γ(φ(p))) Γ′(φ(p)) φ′(p) dp.

Let s = φ(p). Then s varies from a to b as p varies from A to B. Continuing from the last equation, we have

∫_Φ f(z) dz = ∫ₐᵇ f(Γ(s)) Γ′(s) ds = ∫_Γ f(z) dz.
Thus far we can integrate only over smooth curves. We can extend the definition to an integral over piecewise smooth curves by adding the integrals over the components of the join.
DEFINITION 22.3

Let Γ = Γ₁ ⊕ Γ₂ ⊕ · · · ⊕ Γₙ be a join of smooth curves. Let f be continuous on each Γⱼ. Then

∫_Γ f(z) dz = Σ_{j=1}^n ∫_{Γⱼ} f(z) dz.
EXAMPLE 22.11
Let Γ₁(t) = 3e^(it) for 0 ≤ t ≤ π/2, and let Γ₂(t) = t² + 3i(t + 1) for 0 ≤ t ≤ 1. Γ₁ is the quarter circle of radius 3 about the origin, extending counterclockwise from 3 to 3i, and Γ₂ is the part of the parabola x = (y − 3)²/9 from 3i to 1 + 6i. Figure 22.13 shows a graph of Γ = Γ₁ ⊕ Γ₂. We will evaluate ∫_Γ Im(z) dz.

[Figure 22.13: The join Γ₁ ⊕ Γ₂, from 3 to 3i to 1 + 6i.]

On Γ₁, write z(t) = 3e^(it) = 3 cos(t) + 3i sin(t). Then

∫_{Γ₁} Im(z) dz = ∫₀^(π/2) Im(z(t)) z′(t) dt = ∫₀^(π/2) 3 sin(t)(−3 sin(t) + 3i cos(t)) dt
= −9 ∫₀^(π/2) sin²(t) dt + 9i ∫₀^(π/2) sin(t) cos(t) dt = −9π/4 + (9/2)i.

On Γ₂, z(t) = t² + 3i(t + 1) and z′(t) = 2t + 3i, so

∫_{Γ₂} Im(z) dz = ∫₀¹ Im(t² + 3i(t + 1))(2t + 3i) dt = ∫₀¹ 3(t + 1)(2t + 3i) dt
= ∫₀¹ (6t² + 6t) dt + 9i ∫₀¹ (t + 1) dt = 5 + (27/2)i.
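The two component integrals of this join can be approximated numerically; a sketch (our own check, not part of the text):

```python
import cmath, math

def contour_integral(f, z, dz, a, b, n=100_000):
    # midpoint-rule approximation of the integral of f(z(t)) z'(t) over [a, b]
    h = (b - a) / n
    return h * sum(f(z(a + (k + 0.5) * h)) * dz(a + (k + 0.5) * h) for k in range(n))

im = lambda w: complex(w.imag, 0)    # f(z) = Im(z), taken as a complex value

# Gamma_1(t) = 3 e^{it} on [0, pi/2]
part1 = contour_integral(im,
                         lambda t: 3 * cmath.exp(1j * t),
                         lambda t: 3j * cmath.exp(1j * t),
                         0.0, math.pi / 2)
# Gamma_2(t) = t^2 + 3i(t + 1) on [0, 1]
part2 = contour_integral(im,
                         lambda t: complex(t * t, 3 * (t + 1)),
                         lambda t: complex(2 * t, 3),
                         0.0, 1.0)
err1 = abs(part1 - complex(-9 * math.pi / 4, 9 / 2))
err2 = abs(part2 - complex(5, 27 / 2))
```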
Then

∫_Γ f(z) dz = (−9π/4 + (9/2)i) + (5 + (27/2)i) = 5 − 9π/4 + 18i.

22.2.1 The Complex Integral in Terms of Real Integrals
It is possible to think of the integral of a complex function over a curve as a sum of line integrals of real-valued functions of two real variables over the curve. Let f(z) = u(x, y) + iv(x, y) and, on the curve Γ, suppose z(t) = x(t) + iy(t) for a ≤ t ≤ b. Now

f(z(t)) = u(x(t), y(t)) + iv(x(t), y(t)) and z′(t) = x′(t) + iy′(t),
so

f(z(t)) z′(t) = [u(x(t), y(t)) + iv(x(t), y(t))][x′(t) + iy′(t)]
= u(x(t), y(t)) x′(t) − v(x(t), y(t)) y′(t) + i[v(x(t), y(t)) x′(t) + u(x(t), y(t)) y′(t)].

Then

∫_Γ f(z) dz = ∫ₐᵇ [u(x(t), y(t)) x′(t) − v(x(t), y(t)) y′(t)] dt
+ i ∫ₐᵇ [v(x(t), y(t)) x′(t) + u(x(t), y(t)) y′(t)] dt.

In the notation of real line integrals,

∫_Γ f(z) dz = ∫_Γ u dx − v dy + i ∫_Γ v dx + u dy.   (22.1)
This formulation allows a perspective that is sometimes useful in developing properties of complex integrals.
EXAMPLE 22.12
Evaluate ∫_Γ iz² dz if Γ(t) = 4 cos(t) + i sin(t) for 0 ≤ t ≤ π/2. Figure 22.14 shows a graph of Γ, which is part of the ellipse

x²/16 + y² = 1.

[Figure 22.14: x = 4 cos(t), y = sin(t) for 0 ≤ t ≤ π/2.]

To evaluate ∫_Γ iz² dz in terms of real line integrals, first compute

f(z) = iz² = −2xy + i(x² − y²) = u + iv,

where

u(x, y) = −2xy and v(x, y) = x² − y².

On the curve, x(t) = 4 cos(t) and y(t) = sin(t). Now equation (22.1) gives us

∫_Γ iz² dz = ∫₀^(π/2) [−8 cos(t) sin(t)][−4 sin(t)] dt − ∫₀^(π/2) [16 cos²(t) − sin²(t)] cos(t) dt
+ i ∫₀^(π/2) [16 cos²(t) − sin²(t)][−4 sin(t)] dt + i ∫₀^(π/2) [−8 cos(t) sin(t)] cos(t) dt
= 1/3 − (64/3)i.
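The real-line-integral route of equation (22.1) can be checked numerically against the stated value 1/3 − (64/3)i; a sketch (our own verification, not part of the text):

```python
import math

def real_line_integrals(n=100_000):
    # evaluate equation (22.1) for f(z) = iz^2 on x = 4 cos t, y = sin t, 0 <= t <= pi/2
    h = (math.pi / 2) / n
    total = 0j
    for k in range(n):
        t = (k + 0.5) * h
        x, y = 4 * math.cos(t), math.sin(t)
        dx, dy = -4 * math.sin(t), math.cos(t)
        u = -2 * x * y           # Re(iz^2)
        v = x * x - y * y        # Im(iz^2)
        total += (u * dx - v * dy) + 1j * (v * dx + u * dy)
    return total * h

val = real_line_integrals()
exact = complex(1 / 3, -64 / 3)
error = abs(val - exact)
```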
We will have an easier way of evaluating simple line integrals such as ∫_Γ iz² dz when we have more properties of complex integrals.

22.2.2 Properties of Complex Integrals
We will develop some properties of ∫_Γ f(z) dz.

THEOREM 22.2  Linearity
Let Γ be a piecewise smooth curve and let f and g be continuous on Γ. Let α and β be complex numbers. Then

∫_Γ [αf(z) + βg(z)] dz = α ∫_Γ f(z) dz + β ∫_Γ g(z) dz.

This conclusion is certainly something we expect of anything called an integral. The result extends to arbitrary finite sums:

∫_Γ Σ_{j=1}^n αⱼ fⱼ(z) dz = Σ_{j=1}^n αⱼ ∫_Γ fⱼ(z) dz.
Orientation plays a significant role in the complex integral, because it is an intrinsic part of the curve over which the integral is taken. Suppose Γ: [a, b] → ℂ is a smooth curve, as typically shown in Figure 22.15. The arrow indicates orientation. We can reverse this orientation by defining the new curve

Γᵣ(t) = Γ(a + b − t) for a ≤ t ≤ b.

Γᵣ is a smooth curve having the same graph as Γ. However, Γᵣ(a) = Γ(b) and Γᵣ(b) = Γ(a): Γᵣ starts where Γ ends, and Γᵣ ends where Γ begins. The orientation has been reversed. We claim that reversing orientation changes the sign of the integral.

[Figure 22.15: Reversing orientation on a curve.]
THEOREM 22.3  Reversal of Orientation

Let Γ: [a, b] → ℂ be a smooth curve. Let f be continuous on Γ. Then

∫_{Γᵣ} f(z) dz = −∫_Γ f(z) dz.

Proof  Let u = a + b − t. By the chain rule,

Γᵣ′(t) = (d/dt) Γ(a + b − t) = Γ′(u) u′(t) = −Γ′(u) = −Γ′(a + b − t).

Then

∫_{Γᵣ} f(z) dz = ∫ₐᵇ f(Γᵣ(t)) Γᵣ′(t) dt = −∫ₐᵇ f(Γ(a + b − t)) Γ′(a + b − t) dt.

Now change variables by putting s = a + b − t. Then

∫_{Γᵣ} f(z) dz = −∫_b^a f(Γ(s)) Γ′(s)(−1) ds = −∫ₐᵇ f(Γ(s)) Γ′(s) ds = −∫_Γ f(z) dz.
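A quick numerical illustration of the sign change (the curve and function below are our own choices, not from the text):

```python
def contour_integral(f, z, dz, a, b, n=100_000):
    # midpoint-rule approximation of the integral of f(z(t)) z'(t) over [a, b]
    h = (b - a) / n
    return h * sum(f(z(a + (k + 0.5) * h)) * dz(a + (k + 0.5) * h) for k in range(n))

f = lambda w: w * w                   # f(z) = z^2
z = lambda t: complex(t, t * t)       # Gamma(t) = t + it^2 on [0, 1]
dz = lambda t: complex(1, 2 * t)
zr = lambda t: z(1.0 - t)             # reversed curve Gamma_r(t) = Gamma(a + b - t)
dzr = lambda t: -dz(1.0 - t)

forward = contour_integral(f, z, dz, 0.0, 1.0)
backward = contour_integral(f, zr, dzr, 0.0, 1.0)
error = abs(forward + backward)       # the two values should be negatives of each other
```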
We need not actually define Γᵣ to reverse orientation in specific integrals: just integrate from b to a instead of from a to b. This reverses the roles of the initial and terminal points and hence the orientation. Or we can integrate from a to b and take the negative of the result.

We will next state a complex version of the fundamental theorem of calculus. It states that, if f has a continuous antiderivative F, then the value of ∫_Γ f(z) dz is the value of F at the terminal point of Γ, minus the value of F at the initial point.

THEOREM 22.4
Let f be continuous on an open set G, and suppose F′(z) = f(z) for z in G. Let Γ: [a, b] → G be a smooth curve in G. Then

∫_Γ f(z) dz = F(Γ(b)) − F(Γ(a)).
Proof
With Γ(t) = z(t) = x(t) + iy(t), and F(z) = U(x, y) + iV(x, y),

∫_Γ f(z) dz = ∫ₐᵇ f(z(t)) z′(t) dt = ∫ₐᵇ F′(z(t)) z′(t) dt = ∫ₐᵇ (d/dt) F(z(t)) dt
= ∫ₐᵇ (d/dt) U(x(t), y(t)) dt + i ∫ₐᵇ (d/dt) V(x(t), y(t)) dt.

Now we can apply the fundamental theorem of calculus to the two real integrals on the right to obtain

∫_Γ f(z) dz = U(x(b), y(b)) + iV(x(b), y(b)) − [U(x(a), y(a)) + iV(x(a), y(a))]
= F(x(b), y(b)) − F(x(a), y(a)) = F(Γ(b)) − F(Γ(a)).
EXAMPLE 22.13
We will compute ∫_Γ (z² + iz) dz if Γ(t) = t⁵ − t cos(t)i for 0 ≤ t ≤ 1.

This is an elementary but tedious calculation if done by computing ∫₀¹ f(z(t)) z′(t) dt. However, if we let G be the entire complex plane, then G is open, and F(z) = z³/3 + iz²/2 satisfies F′(z) = f(z). The initial point of Γ is Γ(0) = 0 and the terminal point is Γ(1) = 1 − cos(1)i. Therefore

∫_Γ (z² + iz) dz = F(Γ(1)) − F(Γ(0)) = F(1 − cos(1)i) − F(0)
= (1/3)(1 − cos(1)i)³ + (i/2)(1 − cos(1)i)² = (1 − cos(1)i)²[(1/3)(1 − cos(1)i) + i/2].
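The claim that the tedious direct computation would give the same answer can be verified numerically; a sketch (our own check, not part of the text):

```python
import math

def contour_integral(f, z, dz, a, b, n=100_000):
    # midpoint-rule approximation of the integral of f(z(t)) z'(t) over [a, b]
    h = (b - a) / n
    return h * sum(f(z(a + (k + 0.5) * h)) * dz(a + (k + 0.5) * h) for k in range(n))

f = lambda w: w * w + 1j * w                               # f(z) = z^2 + iz
z = lambda t: complex(t ** 5, -t * math.cos(t))            # Gamma(t) = t^5 - t cos(t) i
dz = lambda t: complex(5 * t ** 4, -(math.cos(t) - t * math.sin(t)))

F = lambda w: w ** 3 / 3 + 1j * w ** 2 / 2                 # antiderivative of f
direct = contour_integral(f, z, dz, 0.0, 1.0)
via_ftc = F(z(1.0)) - F(z(0.0))
error = abs(direct - via_ftc)
```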
One ramification of Theorem 22.4 is that, under the given conditions, the value of ∫_Γ f(z) dz depends only on the initial and terminal points of the curve. If Φ is also a smooth curve in G having the same initial point as Γ and the same terminal point as Γ, then

∫_Γ f(z) dz = ∫_Φ f(z) dz.

This is called independence of path, about which we will say more later. Another consequence is that, if Γ is a closed curve in G, then the initial and terminal points coincide and

∫_Γ f(z) dz = 0.

We will consider this circumstance in more detail when we consider Cauchy's theorem.

The next result is used in bounding the magnitude of an integral, as we sometimes need to do in making estimates in equations or inequalities.

THEOREM 22.5
Let Γ: [a, b] → ℂ be a smooth curve and let f be continuous on Γ. Then

|∫_Γ f(z) dz| ≤ ∫ₐᵇ |f(z(t))| |z′(t)| dt.

If, in addition, there is a positive number M such that |f(z)| ≤ M for all z on Γ, then

|∫_Γ f(z) dz| ≤ ML,

where L is the length of Γ.

Proof  Write the complex number ∫_Γ f(z) dz in polar form:

∫_Γ f(z) dz = re^(iθ).

Then

r = e^(−iθ) ∫_Γ f(z) dz = e^(−iθ) ∫ₐᵇ f(z(t)) z′(t) dt.

Since r is real,

r = Re(r) = Re[e^(−iθ) ∫ₐᵇ f(z(t)) z′(t) dt] = ∫ₐᵇ Re[e^(−iθ) f(z(t)) z′(t)] dt.

Now for any complex number w, Re(w) ≤ |w|. Therefore

Re[e^(−iθ) f(z(t)) z′(t)] ≤ |e^(−iθ) f(z(t)) z′(t)| = |f(z(t)) z′(t)|,

since |e^(−iθ)| = 1 for θ real. Then

|∫_Γ f(z) dz| = r = |∫ₐᵇ f(z(t)) z′(t) dt| ≤ ∫ₐᵇ |f(z(t)) z′(t)| dt = ∫ₐᵇ |f(z(t))| |z′(t)| dt,

as we wanted to show. If now |f(z)| ≤ M on Γ, then

|∫_Γ f(z) dz| ≤ ∫ₐᵇ |f(z(t))| |z′(t)| dt ≤ M ∫ₐᵇ |z′(t)| dt.

If Γ(t) = x(t) + iy(t), then

|z′(t)| = |x′(t) + iy′(t)| = √(x′(t)² + y′(t)²),

so

|∫_Γ f(z) dz| ≤ M ∫ₐᵇ √(x′(t)² + y′(t)²) dt = ML.
EXAMPLE 22.14
We will obtain a bound on |∫_Γ e^(Re z) dz|, where Γ is the circle of radius 2 about the origin, traversed once counterclockwise. On Γ we can write z(t) = 2 cos(t) + 2i sin(t) for 0 ≤ t ≤ 2π. Now

|e^(Re z(t))| = e^(2 cos t) ≤ e²

for 0 ≤ t ≤ 2π. Since the length of Γ is 4π, then

|∫_Γ e^(Re z) dz| ≤ 4πe².

This number bounds the magnitude of the integral. It is not claimed to be an approximation to the value of the integral to any degree of accuracy.
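Indeed, a numerical evaluation (our own sketch, not part of the text) shows the actual magnitude is well under the ML bound 4πe² ≈ 92.9:

```python
import cmath, math

def contour_integral(f, z, dz, a, b, n=100_000):
    # midpoint-rule approximation of the integral of f(z(t)) z'(t) over [a, b]
    h = (b - a) / n
    return h * sum(f(z(a + (k + 0.5) * h)) * dz(a + (k + 0.5) * h) for k in range(n))

val = contour_integral(
    lambda w: math.exp(w.real) + 0j,     # f(z) = e^{Re z}
    lambda t: 2 * cmath.exp(1j * t),     # the circle of radius 2 about the origin
    lambda t: 2j * cmath.exp(1j * t),
    0.0, 2 * math.pi)
bound = 4 * math.pi * math.e ** 2        # the ML bound from the example
```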
22.2.3 Integrals of Series of Functions

We often want to interchange an integral and a series. We would like conditions under which

∫_Γ Σ_{n=1}^∞ fₙ(z) dz = Σ_{n=1}^∞ ∫_Γ fₙ(z) dz.

We will show that, if we can bound each |fₙ(z)|, for z on the curve, by a positive constant Mₙ so that Σ_{n=1}^∞ Mₙ converges, then we can interchange the summation and the integral and integrate the series term by term.

THEOREM 22.6
Term-by-Term Integration
Let Γ be a smooth curve and let fₙ be continuous on Γ for n = 1, 2, …. Suppose for each positive integer n there is a positive number Mₙ such that Σ_{n=1}^∞ Mₙ converges and, for all z on Γ,

|fₙ(z)| ≤ Mₙ.

Then Σ_{n=1}^∞ fₙ(z) converges absolutely for all z on Γ. Further, if we denote Σ_{n=1}^∞ fₙ(z) = g(z), then

∫_Γ g(z) dz = Σ_{n=1}^∞ ∫_Γ fₙ(z) dz.

Proof  For each z on Γ, the real series Σ_{n=1}^∞ |fₙ(z)| converges by comparison with the convergent series Σ_{n=1}^∞ Mₙ. Now let L be the length of Γ and consider the partial sum

F_N(z) = Σ_{n=1}^N fₙ(z).

Each F_N is continuous on Γ and

|∫_Γ g(z) dz − Σ_{n=1}^N ∫_Γ fₙ(z) dz| = |∫_Γ g(z) dz − ∫_Γ F_N(z) dz| ≤ L max_{z on Γ} |g(z) − F_N(z)|.

Now for all z on Γ,

|g(z) − F_N(z)| = |Σ_{n=N+1}^∞ fₙ(z)| ≤ Σ_{n=N+1}^∞ Mₙ.

If ε is any positive number, we can choose N large enough that Σ_{n=N+1}^∞ Mₙ < ε/L, because Σ_{n=1}^∞ Mₙ converges. But then

max_{z on Γ} |g(z) − F_N(z)| < ε/L,

so

|∫_Γ g(z) dz − Σ_{n=1}^N ∫_Γ fₙ(z) dz| < L(ε/L) = ε

for N sufficiently large. This proves that

lim_{N→∞} Σ_{n=1}^N ∫_Γ fₙ(z) dz = ∫_Γ g(z) dz,

as we wanted to show.

Of course, the theorem applies to a power series within its circle of convergence, with fₙ(z) = cₙ(z − z₀)ⁿ.
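As an illustration of the theorem (our own example, not the text's): take fₙ(z) = zⁿ on the arc Γ(t) = (1/2)e^(it), 0 ≤ t ≤ π, with Mₙ = (1/2)ⁿ. Then Σ fₙ(z) = 1/(1 − z) on Γ, and term-by-term integration can be checked numerically:

```python
import cmath, math

def contour_integral(f, z, dz, a, b, n=100_000):
    # midpoint-rule approximation of the integral of f(z(t)) z'(t) over [a, b]
    h = (b - a) / n
    return h * sum(f(z(a + (k + 0.5) * h)) * dz(a + (k + 0.5) * h) for k in range(n))

z = lambda t: 0.5 * cmath.exp(1j * t)       # arc of radius 1/2, from 1/2 to -1/2
dz = lambda t: 0.5j * cmath.exp(1j * t)

# integral of the sum g(z) = 1/(1 - z)
whole = contour_integral(lambda w: 1 / (1 - w), z, dz, 0.0, math.pi)

# sum of the integrals of f_n(z) = z^n, each via the antiderivative z^{n+1}/(n+1)
a, b = z(0.0), z(math.pi)
term_by_term = sum((b ** (n + 1) - a ** (n + 1)) / (n + 1) for n in range(80))

error = abs(whole - term_by_term)
```

The truncated tail is bounded by Σ_{n>80} (1/2)ⁿ, which is negligible here.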
SECTION 22.2 PROBLEMS

In each of Problems 1 through 15, evaluate ∫_Γ f(z) dz. All closed curves are oriented counterclockwise unless specific exception is made.

1. f(z) = 1; Γ(t) = t² − it for 1 ≤ t ≤ 3
2. f(z) = z² − iz; Γ is the quarter circle about the origin from 2 to 2i
3. f(z) = Re(z); Γ is the line segment from 1 to 2 + i
4. f(z) = 1/z; Γ is the half-circle of radius 4 about the origin, from 4i to −4i
5. f(z) = z − 1; Γ is any piecewise smooth curve from 2i to 1 − 4i
6. f(z) = iz²; Γ is the line segment from 1 + 2i to 3 + i
7. f(z) = sin(2z); Γ is the line segment from −i to −4i
8. f(z) = 1 + z²; Γ is the part of the circle of radius 3 about the origin from −3i to 3i
9. f(z) = −i cos(z); Γ is any piecewise smooth curve from 0 to 2 + i
10. f(z) = z²; Γ is the line segment from −4 to i
11. f(z) = (z − i)³; Γ(t) = t − it² for 0 ≤ t ≤ 2
12. f(z) = e^(iz); Γ is any piecewise smooth curve from −2 to −4 − i
13. f(z) = iz; Γ is the line segment from 0 to −4 + 3i
14. f(z) = Im(z); Γ is the circle of radius 1 about the origin
15. f(z) = z²; Γ is the line segment from −i to 1
16. Find a bound for |∫_Γ cos(z²) dz|, if Γ is the circle of radius 4 about the origin.
17. Find a bound for |∫_Γ 1/(1 + z) dz|, if Γ is the line segment from 2 + i to 4 + 2i.
22.3 Cauchy's Theorem

Cauchy's theorem (or the Cauchy integral theorem) is considered the fundamental theorem of complex integration, and is named for the early nineteenth-century French mathematician and engineer Augustin-Louis Cauchy. He had the idea of the theorem, as well as many of its consequences, but was able to prove it only under what were later found to be unnecessarily restrictive conditions. Édouard Goursat proved the theorem as it is now usually stated, and for this reason it is sometimes called the Cauchy–Goursat theorem. The statement of the theorem implicitly makes use of the Jordan curve theorem, discussed previously in Section 13.2 in connection with Green's theorem. It states that a continuous simple closed curve Γ in the plane separates the plane into two open sets. One of these sets is unbounded, and is called the exterior of Γ, and the other set is bounded and is called the interior of Γ. The (graph of the) curve itself does not belong to either of these sets, but forms the boundary for both. Figure 22.16 illustrates the theorem. Although this conclusion may seem obvious for closed curves we might routinely sketch, it is difficult to prove because of the generality of its statement. Some terminology will make the statement of Cauchy's theorem more efficient.
DEFINITION 22.4
Path
A path is a simple, piecewise smooth curve. A path in a set S is a path whose graph lies in S.
Thus, a path is a join of smooth curves with no self-intersections.
[Figure 22.16: Jordan curve theorem: the interior and exterior of Γ.]
DEFINITION 22.5  Connected Set
A set S of complex numbers is connected if, given any two points z and w in S, there is a path in S having z and w as end points.
S is connected if it is possible to get from any point of S to any other point by moving along some path lying completely in S. An open disk is connected, as is a closed disk, while the set consisting of the two open disks |z| < 1 and |z − 10i| < 1 is not (Figure 22.17), because we cannot get from 0 to 10i without going outside the set.
DEFINITION 22.6
Domain
An open, connected set of complex numbers is called a domain.
D is a domain if

1. about any z in D, there is some open disk containing only points of D, and
2. we can get from any point in D to any other point in D by a path in D.

For example, any open disk is a domain, as is the upper half-plane consisting of all z with Im(z) > 0. A closed disk is not a domain (it is connected but not open), and a set consisting of two disjoint open disks is not a domain (it is open but not connected).
DEFINITION 22.7
Simply Connected
A set S of complex numbers is simply connected if every closed path in S encloses only points of S.
Every open disk is simply connected (Figure 22.18). If we draw a closed path in an open disk, this closed path will enclose only points in the open disk. The annulus of Figure 22.19, consisting of points between two concentric circles, is not simply connected, even though it is connected. We can draw a closed path contained in the annulus, but enclosing the inner boundary circle of the annulus. This curve encloses points not in the annulus, namely those enclosed by the inner boundary circle. We are now ready to state one version of Cauchy’s theorem. THEOREM 22.7
Cauchy’s Theorem
Let f be differentiable on a simply connected domain G. Let Γ be a closed path in G. Then

∮_Γ f(z) dz = 0.

Often integrals around closed paths are denoted ∮. In this notation, the conclusion of the theorem reads ∮_Γ f(z) dz = 0. The oval on the integral sign is just a reminder that the path is closed, and does not alter the way we operate with the integral or the way it is evaluated.
[Figure 22.17: Disjoint open disks form a set that is not connected.]
[Figure 22.18: An open disk is simply connected.]
[Figure 22.19: The set of points between two concentric circles is not simply connected.]
Informally, Cauchy's theorem states that ∮_Γ f(z) dz = 0 if f is differentiable on the curve and on all points enclosed by the curve. As a convention, closed curves are understood to be oriented positively (counterclockwise), unless specific exception is made.
EXAMPLE 22.15
Evaluate ∮_Γ e^(z²) dz, where Γ is any closed path in the plane.

Figure 22.20 shows a typical Γ. Here f(z) = e^(z²) is differentiable for all z, and the entire plane is a simply connected domain. Therefore

∮_Γ e^(z²) dz = 0.
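A numerical spot check on one specific closed path, the unit circle (our own sketch, not part of the text):

```python
import cmath, math

def contour_integral(f, z, dz, a, b, n=100_000):
    # midpoint-rule approximation of the integral of f(z(t)) z'(t) over [a, b]
    h = (b - a) / n
    return h * sum(f(z(a + (k + 0.5) * h)) * dz(a + (k + 0.5) * h) for k in range(n))

val = contour_integral(
    lambda w: cmath.exp(w * w),        # f(z) = e^{z^2}, differentiable everywhere
    lambda t: cmath.exp(1j * t),       # the unit circle, once counterclockwise
    lambda t: 1j * cmath.exp(1j * t),
    0.0, 2 * math.pi)
```

As Cauchy's theorem predicts, the computed value is zero to within discretization error.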
EXAMPLE 22.16
Evaluate

∮_Γ (2z + 1)/(z² + 3iz) dz,

where Γ is the circle |z + 3i| = 2 of radius 2 and center −3i (Figure 22.21).

[Figure 22.20: A simple closed path Γ.]
[Figure 22.21: The circle |z + 3i| = 2.]

We can parametrize Γ(t) = −3i + 2e^(it) for 0 ≤ t ≤ 2π. Γ(t) traverses the circle once counterclockwise as t varies from 0 to 2π. First observe that f(z) is differentiable except at points where the denominator vanishes, 0 and −3i. Use a partial fractions decomposition to write

f(z) = (1/3i)(1/z) + ((6 + i)/3)(1/(z + 3i)).

Since 1/z is differentiable on and within the simply connected domain enclosed by Γ, by Cauchy's theorem,

∮_Γ (1/3i)(1/z) dz = 0.

However, 1/(z + 3i) is not differentiable in the simply connected domain enclosed by Γ, so Cauchy's theorem does not apply to an integral of this function. We will evaluate this integral directly by writing z(t) = −3i + 2e^(it):

∮_Γ ((6 + i)/3)(1/(z + 3i)) dz = ((6 + i)/3) ∫₀^(2π) (1/(z(t) + 3i)) z′(t) dt
= ((6 + i)/3) ∫₀^(2π) (1/(2e^(it))) 2ie^(it) dt = ((6 + i)/3) ∫₀^(2π) i dt = ((6 + i)/3) 2πi.

Therefore

∮_Γ (2z + 1)/(z² + 3iz) dz = ((6 + i)/3) 2πi = −2π/3 + 4πi.
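The final value −2π/3 + 4πi can be confirmed numerically (our own sketch, not part of the text):

```python
import cmath, math

def contour_integral(f, z, dz, a, b, n=100_000):
    # midpoint-rule approximation of the integral of f(z(t)) z'(t) over [a, b]
    h = (b - a) / n
    return h * sum(f(z(a + (k + 0.5) * h)) * dz(a + (k + 0.5) * h) for k in range(n))

val = contour_integral(
    lambda w: (2 * w + 1) / (w * w + 3j * w),   # the integrand of Example 22.16
    lambda t: -3j + 2 * cmath.exp(1j * t),      # the circle |z + 3i| = 2
    lambda t: 2j * cmath.exp(1j * t),
    0.0, 2 * math.pi)
exact = complex(-2 * math.pi / 3, 4 * math.pi)
error = abs(val - exact)
```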
We will see more impressive ramifications of Cauchy’s theorem shortly.
22.3.1
Proof of Cauchy’s Theorem for a Special Case
If we add an additional hypothesis, Cauchy's theorem is easy to prove. Let f(z) = u(x, y) + iv(x, y) and assume that u and v and their first partial derivatives are continuous on G. Now we obtain Cauchy's theorem immediately by applying Green's theorem and the Cauchy–Riemann equations to equation (22.1). If D consists of all points on and enclosed by Γ, then

∮_Γ f(z) dz = ∮_Γ u dx − v dy + i ∮_Γ v dx + u dy
= ∬_D (−∂v/∂x − ∂u/∂y) dA + i ∬_D (∂u/∂x − ∂v/∂y) dA = 0,

because, by the Cauchy–Riemann equations,

∂u/∂x = ∂v/∂y and ∂u/∂y = −∂v/∂x.

This argument is good enough for many settings in which Cauchy's theorem is used. However, it is not an optimal argument because it makes the additional assumption about continuity of the partial derivatives of u and v. A rigorous proof of the theorem as stated involves topological subtleties we do not wish to engage here.

In the next section we will develop some important consequences of Cauchy's theorem.
SECTION 22.3 PROBLEMS

In each of Problems 1 through 12, evaluate the integral of the function over the given closed path. All paths are positively oriented (counterclockwise). In some cases Cauchy's theorem applies, and in some it does not.

1. f(z) = sin(3z); Γ is the circle |z| = 4
2. f(z) = 2z/(z − i); Γ is the circle |z − i| = 3
3. f(z) = 1/(z − 2i)³; Γ is given by |z − 2i| = 2
4. f(z) = z² sin(z); Γ is the square having vertices 0, 1, 1 + i and i
5. f(z) = z; Γ is the unit circle about the origin
6. f(z) = 1/z; Γ is the circle of radius 5 about the origin
7. f(z) = ze^z; Γ is the circle |z − 3i| = 8
8. f(z) = z² − 4z + i; Γ is the rectangle with vertices 1, 8, 8 + 4i and 1 + 4i
9. f(z) = z²; Γ is the circle of radius 7 about the origin
10. f(z) = sin(1/z); Γ is the circle |z − 1 + 2i| = 1
11. f(z) = Re(z); Γ is given by |z| = 2
12. f(z) = z² + Im(z); Γ is the square with vertices 0, −2i, 2 − 2i and 2
22.4 Consequences of Cauchy's Theorem

This section lays out some of the main results of complex integration, with profound implications for understanding the behavior and properties of complex functions, as well as for applications of the integral. As usual, all integrals over closed curves are taken with a counterclockwise orientation, unless otherwise noted.
22.4.1 Independence of Path

In Section 22.2.2 we mentioned independence of path, according to which, under certain conditions on f, the value of ∫_Γ f(z) dz depends only on the end points of the curve, and not on the particular curve chosen between those end points. Independence of path can also be viewed from the perspective of Cauchy's theorem. Suppose f is differentiable on a simply connected domain G, and z₀ and z₁ are points of G. Let Γ₁ and Γ₂ be piecewise smooth curves in G having initial point z₀ and terminal point z₁ (Figure 22.22). If we reverse the orientation on Γ₂, we obtain the new curve, −Γ₂, going from z₁ to z₀. Further, the join of Γ₁ and −Γ₂ forms a closed curve Γ, having initial and terminal point z₀ (Figure 22.23). By Cauchy's theorem and Theorem 22.3,

∮_Γ f(z) dz = 0 = ∫_{Γ₁ ⊕ (−Γ₂)} f(z) dz = ∫_{Γ₁} f(z) dz − ∫_{Γ₂} f(z) dz,

implying that

∫_{Γ₁} f(z) dz = ∫_{Γ₂} f(z) dz.

This means that the integral does not depend on the particular curve (in G) between z₀ and z₁, and is therefore independent of path. This argument is not entirely rigorous, because Γ₁ ⊕ (−Γ₂) need not be a simple curve (Figure 22.24). In fact, Γ₁ and Γ₂ may cross each other any number of times as they progress from z₀ to z₁. Nevertheless, we wanted to point out the connection between Cauchy's theorem and the concept of independence of path of an integral, which was discussed previously.
[Figure 22.22: Paths Γ₁ and Γ₂ from z₀ to z₁.]
[Figure 22.23: Closed curve Γ = Γ₁ ⊕ (−Γ₂).]
[Figure 22.24: Γ₁ ⊕ (−Γ₂) need not be simple.]

If ∫_Γ f(z) dz is independent of path in G, and Γ is any path from z₀ to z₁, we sometimes write

∫_Γ f(z) dz = ∫_{z₀}^{z₁} f(z) dz.

The symbol on the right has the value of the line integral on the left, with Γ any path from z₀ to z₁ in G.
22.4.2 The Deformation Theorem

The deformation theorem enables us, under certain conditions, to replace one closed path of integration with another, perhaps more convenient one.

THEOREM 22.8  Deformation Theorem

Let Γ and γ be closed paths in the plane, with γ in the interior of Γ. Let f be differentiable on an open set containing both paths and all points between them. Then

∮_Γ f(z) dz = ∮_γ f(z) dz.
Figure 22.25 shows the setting of the theorem. We may think of deforming one curve, say γ, to the other. Imagine γ as made of rubber, and continuously deform it into the shape of Γ. In doing this, it is necessary that the intermediate stages of the deformation from γ to Γ only cross over points at which f is differentiable, hence the hypothesis about f being differentiable on some open set containing both paths and all points between them. The theorem states that the integral of f has the same value over both paths when one can be deformed into the other, moving only over points at which the function is differentiable. This means that we can replace Γ with another path γ that may be more convenient to use in evaluating the integral. Consider the following example.

[Figure 22.25: Deforming γ continuously into Γ.]
EXAMPLE 22.17

Evaluate
$$\oint_\Gamma \frac{1}{z - a}\,dz$$
over any closed path $\Gamma$ enclosing the given complex number $a$. Figure 22.26 shows a typical such path. We cannot parametrize $\Gamma$ because we do not know it specifically; it is simply any path enclosing $a$. Let $\gamma$ be a circle of radius $r$ about $a$, with $r$ small enough that $\gamma$ is enclosed by $\Gamma$ (Figure 22.27). Now $f(z) = 1/(z - a)$ is differentiable at all points except $a$, hence on both curves and the region between them. By the deformation theorem,
$$\oint_\Gamma \frac{1}{z - a}\,dz = \oint_\gamma \frac{1}{z - a}\,dz.$$
But $\gamma$ is easily parametrized: $\gamma(t) = a + re^{it}$ for $0 \le t \le 2\pi$. Then
$$\oint_\gamma \frac{1}{z - a}\,dz = \int_0^{2\pi} \frac{1}{re^{it}}\,ire^{it}\,dt = \int_0^{2\pi} i\,dt = 2\pi i.$$
Therefore
$$\oint_\Gamma \frac{1}{z - a}\,dz = 2\pi i.$$

The point is that, by means of the deformation theorem, we can evaluate this integral over any path enclosing $a$. Of course, if $\Gamma$ does not enclose $a$, and $a$ is not on $\Gamma$, then $1/(z - a)$ is differentiable on $\Gamma$ and the set it encloses, so $\oint_\Gamma 1/(z - a)\,dz = 0$ by Cauchy's theorem.

Proof The proof of the theorem employs a technique we will find useful in several settings. Figure 22.28 shows graphs of typical paths $\Gamma$ and $\gamma$. Insert lines $L_1$ and $L_2$ between $\Gamma$ and $\gamma$ (Figure 22.29) and use these to form two closed paths $\Sigma$ and $\Theta$ (shown separated for emphasis in Figure 22.30). One path, $\Sigma$, consists of parts of $\Gamma$ and $\gamma$, together with $L_1$ and $L_2$, with orientation on each piece as shown in order to have positive orientation on $\Sigma$. The other path, $\Theta$, consists of the rest of $\Gamma$ and $\gamma$, again with $L_1$ and $L_2$, with the orientation chosen on each piece so that $\Theta$ has positive orientation. Figure 22.31 shows the paths more realistically,
FIGURE 22.26
FIGURE 22.27
FIGURE 22.28
FIGURE 22.29
FIGURE 22.30
FIGURE 22.31
sharing the inserted segments $L_1$ and $L_2$. In Figure 22.31, $\Gamma$ is oriented counterclockwise, but $\gamma$ is clockwise, due to their orientations as parts of $\Sigma$ and $\Theta$. Because $f$ is differentiable on both $\Sigma$ and $\Theta$ and the sets they enclose, Cauchy's theorem yields
$$\oint_\Sigma f(z)\,dz = \oint_\Theta f(z)\,dz = 0.$$
Then
$$\oint_\Sigma f(z)\,dz + \oint_\Theta f(z)\,dz = 0. \tag{22.2}$$
In this sum of integrals, each of $L_1$ and $L_2$ is integrated over in one direction as part of $\Sigma$ and in the opposite direction as part of $\Theta$. The contributions from these segments therefore cancel in the sum (22.2). Next observe that, in adding these integrals, we obtain the integral over all of $\Gamma$, oriented counterclockwise, together with the integral over all of $\gamma$, oriented clockwise. In view of Theorem 22.3, equation (22.2) becomes
$$\oint_\Gamma f(z)\,dz - \oint_\gamma f(z)\,dz = 0,$$
or
$$\oint_\Gamma f(z)\,dz = \oint_\gamma f(z)\,dz,$$
in which the orientation on both $\Gamma$ and $\gamma$ in these integrals is positive (counterclockwise). This proves the theorem.
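The deformation theorem is easy to test numerically. The following sketch (an illustrative aside in Python; the radius 2 circle, the square with vertices $\pm 3 \pm 3i$, and the step counts are arbitrary choices, not from the text) approximates $\oint dz/z$ over two different closed paths enclosing the origin by summing $f$ at segment midpoints times the segment increments. Both results should be near $2\pi i$, as the theorem predicts.

```python
import cmath

def contour_integral(path_points, f):
    """Approximate a contour integral along a sampled path using the
    midpoint rule on each small segment: sum of f(midpoint) * dz."""
    total = 0j
    for z0, z1 in zip(path_points, path_points[1:]):
        total += f((z0 + z1) / 2) * (z1 - z0)
    return total

N = 4000

# A circle of radius 2 about the origin, traversed counterclockwise.
circle = [2 * cmath.exp(2j * cmath.pi * k / N) for k in range(N + 1)]

# A square with vertices 3+3i, -3+3i, -3-3i, 3-3i, also enclosing 0.
corners = [3 + 3j, -3 + 3j, -3 - 3j, 3 - 3j, 3 + 3j]
square = []
for a, b in zip(corners, corners[1:]):
    square += [a + (b - a) * k / N for k in range(N)]
square.append(corners[-1])

f = lambda z: 1 / z
I_circle = contour_integral(circle, f)
I_square = contour_integral(square, f)
# Both approximate 2*pi*i, even though the paths are quite different.
```

Since $1/z$ is differentiable everywhere between the two curves, the theorem guarantees the two integrals agree; the numerical values differ only by quadrature error.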
22.4.3 Cauchy's Integral Formula

We will now state a remarkable result which gives an integral formula for the values of a differentiable function.

THEOREM 22.9 Cauchy Integral Formula

Let $f$ be differentiable on an open set $G$. Let $\Gamma$ be a closed path in $G$ enclosing only points of $G$. Then, for any $z_0$ enclosed by $\Gamma$,
$$f(z_0) = \frac{1}{2\pi i}\oint_\Gamma \frac{f(z)}{z - z_0}\,dz.$$
We will see many uses for this theorem, but one is immediate. Write the formula as
$$\oint_\Gamma \frac{f(z)}{z - z_0}\,dz = 2\pi i f(z_0).$$
This gives, under the conditions of the theorem, an evaluation of the integral on the left as a constant multiple of the function value on the right.
EXAMPLE 22.18

Evaluate
$$\oint_\Gamma \frac{e^{z^2}}{z - i}\,dz$$
for any closed path $\Gamma$ that does not pass through $i$.
Let $f(z) = e^{z^2}$. Then $f$ is differentiable for all $z$. There are two cases.
Case 1: $\Gamma$ does not enclose $i$. Now $\oint_\Gamma e^{z^2}/(z - i)\,dz = 0$ by Cauchy's theorem, because $e^{z^2}/(z - i)$ is differentiable on and within $\Gamma$.
Case 2: $\Gamma$ encloses $i$. By Cauchy's integral formula, with $z_0 = i$,
$$\oint_\Gamma \frac{e^{z^2}}{z - i}\,dz = 2\pi i f(i) = 2\pi i e^{-1}.$$
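Case 2 of Example 22.18 can be checked numerically. A brief sketch (an illustrative aside; the particular circle $|z - i| = 1$ and the number of sample points are arbitrary choices): parametrize the circle by $z(t) = i + e^{it}$ and apply the trapezoidal rule in $t$, which is very accurate for smooth periodic integrands.

```python
import cmath

N = 2000
center, r = 1j, 1.0         # circle of radius 1 about i, enclosing z0 = i
total = 0j
for k in range(N):
    z = center + r * cmath.exp(2j * cmath.pi * k / N)
    # dz = i (z - center) dt on this circle, with uniform dt = 2*pi/N
    total += cmath.exp(z * z) / (z - 1j) * 1j * (z - center)
integral = total * (2 * cmath.pi / N)

# Cauchy's integral formula predicts 2*pi*i * f(i) with f(z) = e^{z^2}:
expected = 2j * cmath.pi * cmath.exp(-1)
```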
EXAMPLE 22.19

Evaluate
$$\oint_\Gamma \frac{e^{2z}\sin(z^2)}{z - 2}\,dz$$
over any path $\Gamma$ not passing through 2.
Let $f(z) = e^{2z}\sin(z^2)$. Then $f$ is differentiable for all $z$. This leads to two cases.
Case 1: If $\Gamma$ does not enclose 2, then $f(z)/(z - 2)$ is differentiable on the curve and at all points it encloses. Now the integral is zero by Cauchy's theorem.
Case 2: If $\Gamma$ encloses 2, then by the integral formula,
$$\oint_\Gamma \frac{e^{2z}\sin(z^2)}{z - 2}\,dz = 2\pi i f(2) = 2\pi i e^4\sin(4).$$

Observe the distinction between the roles of $f(z)$ in Cauchy's theorem and in Cauchy's integral representation. Cauchy's theorem is concerned with $\oint_\Gamma f(z)\,dz$. The integral representation is concerned with integrals of the form $\oint_\Gamma f(z)/(z - z_0)\,dz$, with $f(z)$ given, but multiplied by a factor $1/(z - z_0)$ which is not defined at $z_0$. If $\Gamma$ does not enclose $z_0$, then $f(z)/(z - z_0) = g(z)$ may be differentiable on $\Gamma$ and the set it encloses, and we can attempt to apply Cauchy's theorem to $\oint_\Gamma g(z)\,dz$. If $z_0$ is enclosed by $\Gamma$, then under the appropriate conditions, the integral formula gives $\oint_\Gamma g(z)\,dz$ in terms of $f(z_0)$.

Proof Here is a proof of the integral representation. First use the deformation theorem to replace $\Gamma$ with a circle $\gamma$ of radius $r$ about $z_0$, as in Figure 22.32. Then
$$\oint_\Gamma \frac{f(z)}{z - z_0}\,dz = \oint_\gamma \frac{f(z)}{z - z_0}\,dz = \oint_\gamma \frac{f(z) - f(z_0) + f(z_0)}{z - z_0}\,dz = \oint_\gamma \frac{f(z) - f(z_0)}{z - z_0}\,dz + f(z_0)\oint_\gamma \frac{1}{z - z_0}\,dz,$$
FIGURE 22.32
in which $f(z_0)$ could be brought outside the first integral because $f(z_0)$ is constant. By Example 22.17,
$$\oint_\gamma \frac{1}{z - z_0}\,dz = 2\pi i,$$
because $\gamma$ encloses $z_0$. Therefore
$$\oint_\Gamma \frac{f(z)}{z - z_0}\,dz = 2\pi i f(z_0) + \oint_\gamma \frac{f(z) - f(z_0)}{z - z_0}\,dz.$$
The integral representation is proved if we can show that the last integral is zero. Write $\gamma(t) = z_0 + re^{it}$ for $0 \le t \le 2\pi$. Then
$$\left|\oint_\gamma \frac{f(z) - f(z_0)}{z - z_0}\,dz\right| = \left|\int_0^{2\pi} \frac{f(z_0 + re^{it}) - f(z_0)}{re^{it}}\,ire^{it}\,dt\right| = \left|\int_0^{2\pi}\left(f(z_0 + re^{it}) - f(z_0)\right)dt\right| \le \int_0^{2\pi}\left|f(z_0 + re^{it}) - f(z_0)\right|dt \le 2\pi\max_{0 \le t \le 2\pi}\left|f(z_0 + re^{it}) - f(z_0)\right|.$$
By continuity of $f(z)$ at $z_0$, $f(z_0 + re^{it}) \to f(z_0)$ as $r \to 0$, so the term on the right in this inequality has limit zero as $r \to 0$. Therefore we can make
$$\left|\oint_\gamma \frac{f(z) - f(z_0)}{z - z_0}\,dz\right|$$
arbitrarily small by choosing $r$ sufficiently small. But this integral is independent of $r$ by the deformation theorem. Hence
$$\oint_\gamma \frac{f(z) - f(z_0)}{z - z_0}\,dz = 0,$$
and the theorem is proved.

The integral representation gives some idea of how strong the condition of differentiability is for complex functions. The integral
$$\oint_\Gamma \frac{f(z)}{z - z_0}\,dz$$
equals $2\pi i f(z_0)$, and so determines $f(z_0)$ at each $z_0$ enclosed by $\Gamma$. But the value of this integral depends only on the values of $f(z)$ on $\Gamma$. Thus, for a differentiable function, knowing function
values on $\Gamma$ determines the values of the function at all points enclosed by $\Gamma$. There is no analogous result for differentiable real functions. Knowing the values of a differentiable real function at the end points of an interval in general gives no information about the values of this function throughout the interval.
22.4.4 Cauchy's Integral Formula for Higher Derivatives

We will now show that a complex function that is differentiable on an open set must have derivatives of all orders on this set. There is no result like this for real functions. A real function that is differentiable need not have a second derivative. And if it has a second derivative, it need not have a third, and so on. Not only does a differentiable complex function have derivatives of all orders; we will show that the $n$th derivative of the function at a point is also given by an integral formula, very much in the spirit of Cauchy's integral formula.

THEOREM 22.10 Cauchy's Integral Formula for Higher Derivatives

Let $f$ be differentiable on an open set $G$. Then $f$ has derivatives of all orders at each point of $G$. Further, if $\Gamma$ is a closed path in $G$ enclosing only points of $G$, and $z_0$ is any point enclosed by $\Gamma$, then
$$f^{(n)}(z_0) = \frac{n!}{2\pi i}\oint_\Gamma \frac{f(z)}{(z - z_0)^{n+1}}\,dz.$$
The integral on the right is exactly what we would obtain by differentiating Cauchy's integral formula for $f(z_0)$ $n$ times with respect to $z_0$ under the integral sign. As with the integral formula, this conclusion is often used to evaluate integrals.
EXAMPLE 22.20

Evaluate
$$\oint_\Gamma \frac{e^{z^3}}{(z - i)^3}\,dz,$$
with $\Gamma$ any path not passing through $i$.
If $\Gamma$ does not enclose $i$, then this integral is zero by Cauchy's theorem, since the only point at which $e^{z^3}/(z - i)^3$ fails to be differentiable is $i$. Thus suppose $\Gamma$ encloses $i$. Because the factor $z - i$ occurs to the third power in the denominator, use $n = 2$ in the theorem, with $f(z) = e^{z^3}$, to get
$$\oint_\Gamma \frac{e^{z^3}}{(z - i)^3}\,dz = \frac{2\pi i}{2!}f''(i) = \pi i f''(i).$$
Now
$$f'(z) = 3z^2 e^{z^3} \quad\text{and}\quad f''(z) = 6z e^{z^3} + 9z^4 e^{z^3},$$
so
$$\oint_\Gamma \frac{e^{z^3}}{(z - i)^3}\,dz = \pi i\left(6ie^{-i} + 9e^{-i}\right) = (-6 + 9i)\pi e^{-i}.$$
The theorem can be proved by induction on n, but we will not carry out the details.
22.4.5 Bounds on Derivatives and Liouville's Theorem

Cauchy's integral formula for higher derivatives can be used to obtain bounds on derivatives of all orders.

THEOREM 22.11

Let $f$ be differentiable on an open set $G$. Let $z_0$ be a point of $G$ and let the open disk of radius $r$ about $z_0$ be in $G$. Suppose $|f(z)| \le M$ for $z$ on the circle of radius $r$ about $z_0$. Then, for any positive integer $n$,
$$\left|f^{(n)}(z_0)\right| \le \frac{Mn!}{r^n}.$$

Proof Let $\gamma(t) = z_0 + re^{it}$ for $0 \le t \le 2\pi$. Then $|f(z_0 + re^{it})| \le M$ for $0 \le t \le 2\pi$. By Theorems 22.10 and 22.5,
$$\left|f^{(n)}(z_0)\right| = \left|\frac{n!}{2\pi i}\oint_\gamma \frac{f(z)}{(z - z_0)^{n+1}}\,dz\right| = \left|\frac{n!}{2\pi}\int_0^{2\pi} \frac{f(z_0 + re^{it})}{r^{n+1}e^{i(n+1)t}}\,ire^{it}\,dt\right| \le \frac{n!}{2\pi}\int_0^{2\pi}\frac{\left|f(z_0 + re^{it})\right|}{r^n}\,dt \le \frac{n!}{2\pi}\,\frac{2\pi M}{r^n} = \frac{Mn!}{r^n}.$$
As an application of this theorem, we will prove Liouville's theorem on bounded, differentiable functions.

THEOREM 22.12 Liouville

Let $f$ be a bounded function that is differentiable for all $z$. Then $f$ is a constant function.

Previously we noted that $\sin(z)$ is not a bounded function in the complex plane, as it is on the real line. This is consistent with Liouville's theorem. Since $\sin(z)$ is differentiable for all $z$, and is clearly not a constant function, it cannot be bounded.

Proof Here is a proof of Liouville's theorem. Suppose $|f(z)| \le M$ for all complex $z$. Choose any number $z_0$ and any $r > 0$. By Theorem 22.11, with $n = 1$,
$$\left|f'(z_0)\right| \le \frac{M}{r}.$$
Since $f$ is differentiable in the entire complex plane, $r$ may be chosen as large as we like, so $|f'(z_0)|$ must be less than any positive number. We conclude that $|f'(z_0)| = 0$, hence $f'(z_0) = 0$. Since $z_0$ is any number, $f(z)$ is constant.
Liouville's theorem can be used to give a simple proof of the fundamental theorem of algebra. This theorem states that any nonconstant complex polynomial $p(z) = a_0 + a_1 z + \cdots + a_n z^n$ has a complex root. That is, for some number $\alpha$, $p(\alpha) = 0$. From this it can be further shown that, if $a_n \ne 0$, then $p(z)$ must have exactly $n$ roots, counting each root $k$ times in the list if its multiplicity is $k$. For example, $p(z) = z^2 - 6z + 9$ has exactly two roots, 3 and 3 (a root of multiplicity 2). This fundamental theorem assumes only elementary terminology for its statement, and is usually included, in some form, in the high school mathematics curriculum. The leading nineteenth-century mathematician Carl Friedrich Gauss considered the theorem so important
that he devised many proofs (some say nearly twenty) over his lifetime. But even today rigorous proofs of the theorem require mathematical terms and devices far beyond those needed to state it.
To prove the theorem using Liouville's theorem, suppose $p(z)$ is a nonconstant complex polynomial and that $p(z) \ne 0$ for all $z$. Then $1/p(z)$ is differentiable for all $z$. Let $p(z) = a_0 + a_1 z + \cdots + a_n z^n$ with $n \ge 1$ and $a_n \ne 0$. We will show that $1/p(z)$ is bounded for all $z$. Since
$$a_n z^n = p(z) - a_0 - a_1 z - \cdots - a_{n-1}z^{n-1},$$
then
$$|a_n||z|^n \le |p(z)| + |a_0| + |a_1||z| + \cdots + |a_{n-1}||z|^{n-1}.$$
Then, for $|z| \ge 1$,
$$|p(z)| \ge |a_n||z|^n - |a_0| - |a_1||z| - \cdots - |a_{n-1}||z|^{n-1} = |z|^{n-1}\left(|a_n||z| - |a_{n-1}| - \frac{|a_{n-2}|}{|z|} - \cdots - \frac{|a_0|}{|z|^{n-1}}\right) \ge |z|^{n-1}\left(|a_n||z| - |a_0| - |a_1| - \cdots - |a_{n-1}|\right),$$
the last inequality holding because $|z| \ge 1$ makes each $|a_j|/|z|^{n-1-j} \le |a_j|$. But then
$$\frac{1}{|p(z)|} \le \frac{1}{|z|^{n-1}\left(|a_n||z| - |a_0| - |a_1| - \cdots - |a_{n-1}|\right)} \to 0$$
as $|z| \to \infty$. Therefore $\lim_{|z|\to\infty} 1/|p(z)| = 0$. This implies that, for some positive number $R$,
$$\frac{1}{|p(z)|} < 1 \quad\text{if } |z| > R.$$
But the closed disk $|z| \le R$ is compact, and $1/|p(z)|$ is continuous, so $1/|p(z)|$ is bounded on this disk by Theorem 21.1. Therefore, for some $M$,
$$\frac{1}{|p(z)|} \le M \quad\text{for } |z| \le R.$$
Now we have $1/|p(z)|$ bounded both inside and outside the disk $|z| \le R$. Putting these bounds together,
$$\frac{1}{|p(z)|} \le M + 1 \quad\text{for all } z,$$
both in $|z| \le R$ and in $|z| \ge R$. This makes $1/p(z)$ a bounded function that is differentiable for all $z$. By Liouville's theorem, $1/p(z)$ must be constant, a contradiction. Therefore there must be some complex $\alpha$ such that $p(\alpha) = 0$, proving the fundamental theorem of algebra.
Complex analysis provides several proofs of this theorem. Later we will see one using a technique for evaluating real integrals of rational functions involving sines and cosines.
22.4.6 An Extended Deformation Theorem

The deformation theorem allows us to deform one closed path $\Gamma$ of integration into another, $\gamma$, without changing the value of the line integral of a differentiable function $f$. A crucial condition for this process is that no stage of the deformation should pass over a point at which $f$ fails to be differentiable. This means that $f$ needs to be differentiable on both curves and the region between them. We will now extend this result to the case that $\Gamma$ encloses any finite number of disjoint closed paths. As usual, unless explicitly stated otherwise, all closed paths are assumed to be oriented positively (counterclockwise).
THEOREM 22.13 Extended Deformation Theorem

Let $\Gamma$ be a closed path. Let $\gamma_1, \ldots, \gamma_n$ be closed paths enclosed by $\Gamma$. Assume that no two of $\Gamma, \gamma_1, \ldots, \gamma_n$ intersect, and no point interior to any $\gamma_j$ is interior to any other $\gamma_k$. Let $f$ be differentiable on an open set containing $\Gamma$, each $\gamma_j$, and all points that are both interior to $\Gamma$ and exterior to each $\gamma_j$. Then
$$\oint_\Gamma f(z)\,dz = \sum_{j=1}^n \oint_{\gamma_j} f(z)\,dz.$$
This is the deformation theorem in the case $n = 1$. Figure 22.33 shows a typical scenario covered by this theorem. With the curves as shown (and the differentiability assumption on $f$), the integral of $f$ about $\Gamma$ is the sum of the integrals of $f$ about each of the closed curves $\gamma_1, \ldots, \gamma_n$. We will outline a proof after an illustration of a typical use of the theorem.
EXAMPLE 22.21

Consider
$$\oint_\Gamma \frac{z}{(z + 2)(z - 4i)}\,dz,$$
where $\Gamma$ is a closed path enclosing $-2$ and $4i$. We will evaluate this integral using the extended deformation theorem.
Place a circle $\gamma_1$ about $-2$ and a circle $\gamma_2$ about $4i$, of sufficiently small radii that neither circle intersects the other or $\Gamma$, and each is enclosed by $\Gamma$ (Figure 22.34). Then
$$\oint_\Gamma \frac{z}{(z + 2)(z - 4i)}\,dz = \oint_{\gamma_1} \frac{z}{(z + 2)(z - 4i)}\,dz + \oint_{\gamma_2} \frac{z}{(z + 2)(z - 4i)}\,dz.$$
Use a partial fractions decomposition to write
$$\frac{z}{(z + 2)(z - 4i)} = \frac{\tfrac{1}{5} - \tfrac{2}{5}i}{z + 2} + \frac{\tfrac{4}{5} + \tfrac{2}{5}i}{z - 4i}.$$

FIGURE 22.33
FIGURE 22.34

Then
$$\oint_\Gamma \frac{z}{(z + 2)(z - 4i)}\,dz = \left(\frac{1}{5} - \frac{2}{5}i\right)\oint_{\gamma_1}\frac{1}{z + 2}\,dz + \left(\frac{4}{5} + \frac{2}{5}i\right)\oint_{\gamma_1}\frac{1}{z - 4i}\,dz + \left(\frac{1}{5} - \frac{2}{5}i\right)\oint_{\gamma_2}\frac{1}{z + 2}\,dz + \left(\frac{4}{5} + \frac{2}{5}i\right)\oint_{\gamma_2}\frac{1}{z - 4i}\,dz.$$
On the right, the second and third integrals are zero by Cauchy's theorem ($\gamma_1$ does not enclose $4i$ and $\gamma_2$ does not enclose $-2$). The first and fourth integrals equal $2\pi i$ by Example 22.17. Therefore
$$\oint_\Gamma \frac{z}{(z + 2)(z - 4i)}\,dz = 2\pi i\left(\frac{1}{5} - \frac{2}{5}i + \frac{4}{5} + \frac{2}{5}i\right) = 2\pi i.$$
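Example 22.21 and Theorem 22.13 can be checked together numerically. A brief sketch (an illustrative aside; the circle radii and step counts are arbitrary choices): integrate $f$ over one large circle enclosing both singularities, and over two small circles, one about each singularity. Both computations should come out near $2\pi i$.

```python
import cmath

def circle_integral(f, center, r, N=4000):
    # Trapezoidal rule on z(t) = center + r e^{it}; dz = i (z - center) dt
    total = 0j
    for k in range(N):
        z = center + r * cmath.exp(2j * cmath.pi * k / N)
        total += f(z) * 1j * (z - center)
    return total * (2 * cmath.pi / N)

f = lambda z: z / ((z + 2) * (z - 4j))

big = circle_integral(f, 0, 10)                   # encloses -2 and 4i
small = circle_integral(f, -2, 0.5) + circle_integral(f, 4j, 0.5)
# Theorem 22.13: big and small should agree, both near 2*pi*i.
```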
Proof A proof of the theorem can be modeled after the proof of the deformation theorem. As suggested in Figure 22.35, draw line segments: $L_1$ from $\Gamma$ to $\gamma_1$, $L_2$ from $\gamma_1$ to $\gamma_2$, \ldots, $L_n$ from $\gamma_{n-1}$ to $\gamma_n$, and, finally, $L_{n+1}$ from $\gamma_n$ to $\Gamma$. Form the closed paths $\Sigma$ and $\Theta$ shown separately in Figures 22.36 and 22.37, and as they actually appear in Figure 22.38. Then
$$\oint_\Sigma f(z)\,dz + \oint_\Theta f(z)\,dz = 0,$$
both integrals being zero by Cauchy's theorem. (By the hypotheses of the theorem, $f$ is differentiable on and inside both $\Sigma$ and $\Theta$.) In this sum of integrals over $\Sigma$ and $\Theta$, each line segment $L_j$ is integrated over in both directions, hence the contributions of the integrals over these segments cancel. Further, in this sum we retrieve the integral of $f(z)$ over all of $\Gamma$, and all of each $\gamma_j$, with the orientation counterclockwise on $\Gamma$ and clockwise on each $\gamma_j$ (note the orientations in Figure 22.38).

FIGURE 22.35
FIGURE 22.36
FIGURE 22.37
FIGURE 22.38

Reversing the orientations on the $\gamma_j$'s, so that all paths are oriented counterclockwise, the last sum becomes
$$\oint_\Gamma f(z)\,dz - \sum_{j=1}^n \oint_{\gamma_j} f(z)\,dz = 0,$$
yielding the conclusion of the theorem.
SECTION 22.4 PROBLEMS

In each of Problems 1 through 12, evaluate $\oint_\Gamma f(z)\,dz$ for the given function and closed (positively oriented) path. These problems may involve Cauchy's theorem, Cauchy's integral formulas, and/or the deformation theorems.

1. $f(z) = \dfrac{z^4}{z - 2i}$; $\Gamma$ is any closed path enclosing $2i$.
2. $f(z) = \dfrac{\sin(z^2)}{z - 5}$; $\Gamma$ is any closed path enclosing 5.
3. $f(z) = \dfrac{z^2 - 5z + i}{z - 1 + 2i}$; $\Gamma$ is the circle $|z| = 3$.
4. $f(z) = \dfrac{2z^3}{(z - 2)^2}$; $\Gamma$ is the rectangle having vertices $4 \pm i$ and $-4 \pm i$.
5. $f(z) = \dfrac{ie^z}{(z - 2 + i)^2}$; $\Gamma$ is the circle $|z - 1| = 4$.
6. $f(z) = \dfrac{\cos(z - i)}{(z + 2i)^3}$; $\Gamma$ is any closed path enclosing $-2i$.
7. $f(z) = \dfrac{z\sin(3z)}{(z + 4)^3}$; $\Gamma$ is the circle $|z - 2i| = 9$.
8. $f(z) = 2iz|z|$; $\Gamma$ is the line segment from 1 to $-i$.
9. $f(z) = -\dfrac{2 + i\sin(z^4)}{(z + 4)^2}$; $\Gamma$ is any closed path enclosing $-4$.
10. $f(z) = |z - i|^2$; $\Gamma$ is the semicircle of radius 1 about 0 from $i$ to $-i$.
11. $f(z) = \mathrm{Re}(z) + 4$; $\Gamma$ is the line segment from $3 + i$ to $2 - 5i$.
12. $f(z) = \dfrac{3z^2\cosh(z)}{(z + 2i)^2}$; $\Gamma$ is the circle of radius 8 about 1.

13. Evaluate
$$\int_0^{2\pi} e^{\cos\theta}\cos(\sin\theta)\,d\theta.$$
Hint: Consider $\oint_\Gamma e^z/z\,dz$, with $\Gamma$ the unit circle about the origin. Evaluate this integral once using Cauchy's integral formula, then again directly by using the coordinate functions for $\Gamma$.

14. Use the extended form of the deformation theorem to evaluate
$$\oint_\Gamma \frac{z - 4i}{z^3 + 4z}\,dz,$$
where $\Gamma$ is a closed path enclosing the origin, $2i$, and $-2i$.
CHAPTER 23

Series Representations of Functions

POWER SERIES REPRESENTATIONS · ISOLATED ZEROS AND THE IDENTITY THEOREM · THE MAXIMUM MODULUS THEOREM · THE LAURENT EXPANSION

We will now develop two kinds of representations of a function $f(z)$ in series of powers of $z - z_0$. The first series will contain only nonnegative integer powers, hence is a power series, and applies when $f$ is differentiable at $z_0$. The second will contain negative integer powers of $z - z_0$ as well, and will be used when $f$ is not differentiable at $z_0$.

23.1 Power Series Representations

We know that a power series that converges in an open disk, or perhaps the entire plane, defines a function that is infinitely differentiable within this disk or the plane. We will now go the other way and show that a function that is differentiable on an open disk is represented by a power series expansion about the center of this disk. This will have important applications, including information about zeros of functions, and the maximum value that can be taken by the modulus $|f(z)|$ of a differentiable function.
THEOREM 23.1 Taylor Series

Let $f$ be differentiable on an open disk $D$ about $z_0$. Then, for each $z$ in $D$,
$$f(z) = \sum_{n=0}^{\infty}\frac{f^{(n)}(z_0)}{n!}(z - z_0)^n.$$
The series on the right is the Taylor series of $f(z)$ about $z_0$, and the number $f^{(n)}(z_0)/n!$ is the $n$th Taylor coefficient of $f(z)$ at $z_0$. The theorem asserts that the Taylor series of $f(z)$ converges to $f(z)$, hence represents $f(z)$, within this disk.
Proof Let $z$ be any point of $D$ and $R$ the radius of $D$. Choose a number $r$ with $|z - z_0| < r < R$ and let $\gamma$ be the circle of radius $r$ about $z_0$ (Figure 23.1). By the Cauchy integral formula, using $w$ for the variable of integration on $\gamma$,
$$f(z) = \frac{1}{2\pi i}\oint_\gamma \frac{f(w)}{w - z}\,dw.$$

FIGURE 23.1

Now
$$\frac{1}{w - z} = \frac{1}{(w - z_0) - (z - z_0)} = \frac{1}{w - z_0}\,\frac{1}{1 - (z - z_0)/(w - z_0)}.$$
Since $w$ is on $\gamma$, and $z$ is enclosed by $\gamma$,
$$\left|\frac{z - z_0}{w - z_0}\right| < 1,$$
so we can use a convergent geometric series to write
$$\frac{1}{w - z} = \frac{1}{w - z_0}\sum_{n=0}^{\infty}\left(\frac{z - z_0}{w - z_0}\right)^n = \sum_{n=0}^{\infty}\frac{1}{(w - z_0)^{n+1}}(z - z_0)^n.$$
Then
$$\frac{f(w)}{w - z} = \sum_{n=0}^{\infty}\frac{f(w)}{(w - z_0)^{n+1}}(z - z_0)^n. \tag{23.1}$$
Since $f$ is continuous on $\gamma$, for some $M$, $|f(w)| \le M$ for $w$ on $\gamma$. Further, $|w - z_0| = r$, so
$$\left|\frac{f(w)}{(w - z_0)^{n+1}}(z - z_0)^n\right| \le M\,\frac{1}{r}\left(\frac{|z - z_0|}{r}\right)^n.$$
Designate
$$M\,\frac{1}{r}\left(\frac{|z - z_0|}{r}\right)^n = M_n.$$
Then $\sum_{n=0}^{\infty} M_n$ converges (this series is a constant times a convergent geometric series). By Theorem 22.6, the series in equation (23.1) can be integrated term by term to yield
$$f(z) = \frac{1}{2\pi i}\oint_\gamma \frac{f(w)}{w - z}\,dw = \sum_{n=0}^{\infty}\left(\frac{1}{2\pi i}\oint_\gamma \frac{f(w)}{(w - z_0)^{n+1}}\,dw\right)(z - z_0)^n = \sum_{n=0}^{\infty}\frac{f^{(n)}(z_0)}{n!}(z - z_0)^n,$$
in which we used Cauchy's integral formula for the $n$th derivative to write the coefficient in the last series. This proves the theorem.

A complex function is said to be analytic at $z_0$ if it has a power series expansion in some open disk about $z_0$. We have just proved that a function that is differentiable on an open disk about a point is analytic at that point.
We only compute the coefficients of a Taylor series by derivative or integral formulas when other means fail. When possible, we use known series and operations such as differentiation
and integration to derive a series representation. This strategy makes use of the uniqueness of power series representations.

THEOREM 23.2

Suppose, in some disk $|z - z_0| < r$,
$$\sum_{n=0}^{\infty} c_n(z - z_0)^n = \sum_{n=0}^{\infty} d_n(z - z_0)^n.$$
Then, for $n = 0, 1, 2, \ldots$, $c_n = d_n$.

Proof If we let $f(z)$ be the function defined in this disk by both power series, then
$$c_n = \frac{f^{(n)}(z_0)}{n!} = d_n.$$
This means that, no matter what method is used to find a power series for $f(z)$ about $z_0$, the end result is the Taylor series.
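The coefficient formula $c_n = f^{(n)}(z_0)/n! = \frac{1}{2\pi i}\oint_\gamma f(w)/(w - z_0)^{n+1}\,dw$ can also be evaluated numerically. A small sketch (an illustrative aside; the choice $f(w) = e^w$, $z_0 = 0$, and the circle radius are arbitrary), where the computed coefficients should come out as $1/n!$:

```python
import cmath, math

def taylor_coeff(f, z0, n, r=1.0, N=2000):
    # c_n = (1/(2 pi i)) * integral of f(w)/(w - z0)^(n+1) over the
    # circle w = z0 + r e^{it}, approximated by the trapezoidal rule in t.
    total = 0j
    for k in range(N):
        w = z0 + r * cmath.exp(2j * cmath.pi * k / N)
        total += f(w) / (w - z0)**(n + 1) * 1j * (w - z0)
    return total * (2 * cmath.pi / N) / (2j * cmath.pi)

# For f(w) = e^w about 0, the nth Taylor coefficient is 1/n!:
coeffs = [taylor_coeff(cmath.exp, 0, n) for n in range(6)]
```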
EXAMPLE 23.1

Find the Taylor expansion of $e^z$ about $i$.
We know that, for all $z$,
$$e^z = \sum_{n=0}^{\infty}\frac{1}{n!}z^n.$$
For an expansion about $i$, the power series must be in terms of powers of $z - i$. Thus write
$$e^z = e^{(z - i) + i} = e^i e^{z - i} = \sum_{n=0}^{\infty}\frac{e^i}{n!}(z - i)^n.$$
This series converges for all $z$. In this example, it would have been just as easy to compute the Taylor coefficients directly:
$$c_n = \frac{f^{(n)}(i)}{n!} = \frac{e^i}{n!}.$$
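The expansion in Example 23.1 converges quickly and is easy to test. A brief sketch (an illustrative aside; the evaluation point $z = 2 + 3i$ and the 60-term cutoff are arbitrary choices):

```python
import cmath

z0 = 1j
z = 2 + 3j                        # arbitrary test point
term, total = cmath.exp(z0), 0j   # term starts at e^i, the n = 0 term
for n in range(60):
    total += term
    term *= (z - z0) / (n + 1)    # next term: e^i (z - i)^{n+1}/(n+1)!
# total should approximate e^z
```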
EXAMPLE 23.2

Write the Maclaurin series for $\cos(z^3)$.
A Maclaurin expansion is a Taylor series about zero. For all $z$,
$$\cos(z) = \sum_{n=0}^{\infty}\frac{(-1)^n}{(2n)!}z^{2n}.$$
All we need to do is replace $z$ with $z^3$:
$$\cos(z^3) = \sum_{n=0}^{\infty}\frac{(-1)^n}{(2n)!}(z^3)^{2n} = \sum_{n=0}^{\infty}\frac{(-1)^n}{(2n)!}z^{6n}.$$
Since this is an expansion about the origin, it is the expansion we seek.
EXAMPLE 23.3

Expand
$$\frac{2i}{4 + iz}$$
in a Taylor series about $-3i$.
We want a series in powers of $z + 3i$. Do some algebraic manipulation and then use a geometric series. In order to get $z + 3i$ into the picture, write
$$\frac{2i}{4 + iz} = \frac{2i}{4 + i(z + 3i) + 3} = \frac{2i}{7 + i(z + 3i)} = \frac{2i}{7}\,\frac{1}{1 + \frac{i}{7}(z + 3i)}.$$
If $|t| < 1$, then
$$\frac{1}{1 + t} = \frac{1}{1 - (-t)} = \sum_{n=0}^{\infty}(-t)^n = \sum_{n=0}^{\infty}(-1)^n t^n.$$
With $t = (z + 3i)i/7$, we have
$$\frac{1}{1 + \frac{i}{7}(z + 3i)} = \sum_{n=0}^{\infty}(-1)^n\left(\frac{i}{7}\right)^n(z + 3i)^n.$$
Therefore
$$\frac{2i}{4 + iz} = \frac{2i}{7}\sum_{n=0}^{\infty}(-1)^n\left(\frac{i}{7}\right)^n(z + 3i)^n = \sum_{n=0}^{\infty}\frac{2(-1)^n i^{n+1}}{7^{n+1}}(z + 3i)^n.$$
Because this is a series expansion of the function about $-3i$, it is the Taylor series about $-3i$. This series converges for
$$\left|\frac{i}{7}(z + 3i)\right| < 1,$$
or $|z + 3i| < 7$. Thus $z$ must be in the open disk of radius 7 about $-3i$. The radius of convergence of this series is 7.

From Section 21.2, we can differentiate a Taylor series term by term within its open disk of convergence. This is sometimes useful in deriving the Taylor expansion of a function.
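The series obtained in Example 23.3 can be checked by partial sums. A brief sketch (an illustrative aside; the test point $z = 2 - 2i$, which satisfies $|z + 3i| = |2 + i| \approx 2.24 < 7$, and the 80-term cutoff are arbitrary):

```python
import cmath

z = 2 - 2j       # then z + 3i = 2 + i, well inside the disk of radius 7
total = 0j
for n in range(80):
    # c_n = 2 (-1)^n i^{n+1} / 7^{n+1}, from Example 23.3
    total += 2 * (-1)**n * 1j**(n + 1) / 7**(n + 1) * (z + 3j)**n

direct = 2j / (4 + 1j * z)
# total should approximate direct
```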
EXAMPLE 23.4

Find the Taylor expansion of $f(z) = 1/(1 - z)^3$ about the origin.
We could do this by algebraic manipulation, but it is easier to begin with the familiar geometric series
$$g(z) = \frac{1}{1 - z} = \sum_{n=0}^{\infty} z^n \quad\text{for } |z| < 1.$$
Then
$$g'(z) = \frac{1}{(1 - z)^2} = \sum_{n=1}^{\infty} nz^{n-1}$$
and
$$g''(z) = \frac{2}{(1 - z)^3} = \sum_{n=2}^{\infty} n(n - 1)z^{n-2} = \sum_{n=0}^{\infty}(n + 1)(n + 2)z^n$$
for $|z| < 1$. Then
$$f(z) = \sum_{n=0}^{\infty}\frac{1}{2}(n + 1)(n + 2)z^n \quad\text{for } |z| < 1.$$
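The coefficients $(n+1)(n+2)/2$ from Example 23.4 are easy to confirm numerically. A minimal sketch (an illustrative aside; the point $z = 0.3 + 0.2i$ with $|z| < 1$ and the 200-term cutoff are arbitrary):

```python
z = 0.3 + 0.2j    # any point with |z| < 1
total = sum((n + 1) * (n + 2) / 2 * z**n for n in range(200))
direct = 1 / (1 - z)**3
# total should approximate direct
```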
When $f(z)$ is expanded in a power series about $z_0$, the radius of convergence of the series will be the distance from $z_0$ to the nearest point at which $f(z)$ is not differentiable. Think of a disk expanding uniformly from $z_0$, free to continue its expansion until it hits a point at which $f(z)$ is not differentiable. For example, suppose $f(z) = 2i/(4 + iz)$, and we want a Taylor expansion about $-3i$. The only point at which $f(z)$ is not defined is $4i$, so the radius of convergence of this series will be the distance between $-3i$ and $4i$, or 7. We obtained this result previously from the Taylor expansion $\sum_{n=0}^{\infty} 2(-1)^n i^{n+1}/7^{n+1}\,(z + 3i)^n$ of $f(z)$.
EXAMPLE 23.5

We will find the radius of convergence of the Taylor series of $\csc(z)$ about $3 - 4i$.
Since $\csc(z) = 1/\sin(z)$, this function is differentiable except at $z = n\pi$, with $n$ any integer. As Figure 23.2 illustrates, $\pi$ is the nearest point to $3 - 4i$ at which $\csc(z)$ is not differentiable. The distance between $\pi$ and $3 - 4i$ is
$$\sqrt{(\pi - 3)^2 + 16},$$
and this is the radius of convergence of the expansion of $\csc(z)$ about $3 - 4i$.

FIGURE 23.2 $\sqrt{(\pi - 3)^2 + 16}$ is the distance from $\pi$ to $3 - 4i$.
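The nearest-singularity computation in Example 23.5 takes one line to verify. A minimal sketch (an illustrative aside; checking the integers $n$ from $-5$ to $5$ is enough, since farther multiples of $\pi$ are clearly more distant):

```python
import math

z0 = 3 - 4j
# Singularities of csc(z) are the zeros of sin(z), namely z = n*pi:
radius = min(abs(n * math.pi - z0) for n in range(-5, 6))
# The minimum occurs at n = 1, giving sqrt((pi - 3)^2 + 16)
```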
Existence of a power series expansion implies the existence of an antiderivative.

THEOREM 23.3

Let $f$ be differentiable in an open disk $D$ about $z_0$. Then there is a differentiable function $F$ such that $F'(z) = f(z)$ for all $z$ in $D$.

Proof We know that $f$ has a power series expansion in $D$:
$$f(z) = \sum_{n=0}^{\infty} c_n(z - z_0)^n.$$
Let
$$F(z) = \sum_{n=0}^{\infty}\frac{c_n}{n + 1}(z - z_0)^{n+1}$$
for $z$ in $D$. It is routine to check that this power series has radius of convergence at least as large as the radius of $D$, and that $F'(z) = f(z)$ for $z$ in $D$.
23.1.1 Isolated Zeros and the Identity Theorem

We will use the Taylor series representation of a differentiable function to derive important information about its zeros.

DEFINITION 23.1

A number $\alpha$ is a zero of $f$ if $f(\alpha) = 0$. A zero $\alpha$ of $f$ is isolated if there is an open disk about $\alpha$ containing no other zero of $f$.

For example, the zeros of $\sin(z)$ are $n\pi$, with $n$ any integer. These zeros are all isolated. By contrast, let
$$f(z) = \begin{cases} \sin(1/z) & \text{if } z \ne 0 \\ 0 & \text{if } z = 0. \end{cases}$$
The zeros of $f$ are 0 and the numbers $1/(n\pi)$, with $n$ any nonzero integer. However, 0 is not an isolated zero, since any open disk about zero contains other zeros $1/(n\pi)$, for $n$ sufficiently large. We will show that the behavior of the zeros in this example disqualifies $f$ from being differentiable at 0.

THEOREM 23.4
Let $f$ be differentiable on an open disk $G$ and let $\alpha$ be a zero of $f$ in $G$. Then either $\alpha$ is an isolated zero of $f$, or there is an open disk about $\alpha$ on which $f(z)$ is identically zero.

This means that a differentiable function that is not identically zero on some disk can have only isolated zeros.

Proof Write the power series expansion of $f(z)$ about $\alpha$,
$$f(z) = \sum_{n=0}^{\infty} c_n(z - \alpha)^n,$$
in some open disk $D$ centered at $\alpha$. Now consider two cases.
Case 1: If each $c_n = 0$, then $f(z) = 0$ throughout $D$.
Case 2: Suppose, instead, that some coefficients in the power series are not zero. Let $m$ be the smallest integer such that $c_m \ne 0$. Then $c_0 = c_1 = \cdots = c_{m-1} = 0$ and $c_m \ne 0$. For $z$ in $D$,
$$f(z) = \sum_{n=m}^{\infty} c_n(z - \alpha)^n = \sum_{n=0}^{\infty} c_{n+m}(z - \alpha)^{n+m} = (z - \alpha)^m\sum_{n=0}^{\infty} c_{n+m}(z - \alpha)^n.$$
Now $\sum_{n=0}^{\infty} c_{n+m}(z - \alpha)^n$ is a power series, and so defines a differentiable function $g(z)$ for $z$ in $D$. Further,
$$f(z) = (z - \alpha)^m g(z).$$
But $g(\alpha) = c_m \ne 0$, so there is some open disk $K$ about $\alpha$ in which $g(z) \ne 0$. Therefore, for $z \ne \alpha$ in $K$, $f(z) \ne 0$ also, hence $\alpha$ is an isolated zero of $f$.

This proof contained additional information about zeros that will be useful later. The number $m$ in the proof is called the order of the zero $\alpha$ of $f(z)$. It is the smallest integer $m$ such that the coefficient $c_m$ in the expansion of $f(z)$ about $\alpha$ is nonzero. Now recall that $c_n = f^{(n)}(\alpha)/n!$. Thus $c_0 = c_1 = \cdots = c_{m-1} = 0$ implies that
$$f(\alpha) = f'(\alpha) = \cdots = f^{(m-1)}(\alpha) = 0,$$
while $c_m \ne 0$ implies that
$$f^{(m)}(\alpha) \ne 0.$$
In summary, an isolated zero $\alpha$ of $f$ has order $m$ if the function and its first $m - 1$ derivatives vanish at $\alpha$, but the $m$th derivative at $\alpha$ is nonzero. Put another way, the order of the zero $\alpha$ is the order of the lowest order derivative of $f$ that does not vanish at $\alpha$. We also derived the important fact that, if $\alpha$ is an isolated zero of order $m$ of $f$, then we can write
$$f(z) = (z - \alpha)^m g(z),$$
where $g$ is also differentiable in some disk about $\alpha$, and $g(\alpha) \ne 0$.
EXAMPLE 23.6

Consider $f(z) = z^2\cos(z)$. 0 is an isolated zero of this differentiable function. Compute
$$f'(z) = 2z\cos(z) - z^2\sin(z)$$
and
$$f''(z) = 2\cos(z) - 4z\sin(z) - z^2\cos(z).$$
Observe that $f(0) = f'(0) = 0$ while $f''(0) \ne 0$. Thus 0 is a zero of order 2 of $f$. In this case, we already have $f(z) = (z - 0)^2 g(z)$ with $g(0) \ne 0$, since we can choose $g(z) = \cos(z)$.
EXAMPLE 23.7

We have to exercise some care in identifying the order of a zero. Consider $f(z) = z^2\sin(z)$. 0 is an isolated zero of $f$. Compute
$$f'(z) = 2z\sin(z) + z^2\cos(z),$$
$$f''(z) = 2\sin(z) + 4z\cos(z) - z^2\sin(z),$$
$$f^{(3)}(z) = 2\cos(z) + 4\cos(z) - 4z\sin(z) - 2z\sin(z) - z^2\cos(z) = 6\cos(z) - 6z\sin(z) - z^2\cos(z).$$
Then
$$f(0) = f'(0) = f''(0) = 0 \quad\text{while}\quad f^{(3)}(0) \ne 0.$$
This means that 0 is a zero of order 3. We can write
$$f(z) = z^2\sum_{n=0}^{\infty}\frac{(-1)^n}{(2n + 1)!}z^{2n+1} = z^3\sum_{n=0}^{\infty}\frac{(-1)^n}{(2n + 1)!}z^{2n} = z^3 g(z),$$
where
$$g(z) = \sum_{n=0}^{\infty}\frac{(-1)^n}{(2n + 1)!}z^{2n}$$
is differentiable (in this example, for all $z$) and $g(0) = 1 \ne 0$.

As a result of Theorem 23.4, we can show that, if a differentiable complex function vanishes on a convergent sequence of points in a domain (open, connected set), then the function is identically zero throughout the entire domain. This is a very strong result, for which there is no analogue for real functions. For example, consider
$$h(x) = \begin{cases} 0 & \text{for } x \le 0 \\ x^2 & \text{for } x > 0, \end{cases}$$
whose graph is shown in Figure 23.3. This function is differentiable for all real $x$, and is identically zero on half the line, but is not identically zero over the entire line. Another difference between differentiability for real and complex functions is evident in this example. While $h$ is differentiable, it does not have a power series expansion about 0. By contrast, a complex function that is differentiable on an open set has a power series expansion about each point of the set.

THEOREM 23.5

Let $f$ be differentiable on a domain $G$. Suppose $\{z_n\}$ is a sequence of distinct zeros of $f$ in $G$, converging to a point of $G$. Then $f(z) = 0$ for all $z$ in $G$.
FIGURE 23.3 $h(x) = x^2$ for $x > 0$; $h(x) = 0$ for $x \le 0$.
23.1 Power Series Representations
G D3 y
D2 ξ3 ξ 2 ξ1 "
D1 f(z) 0
Dn D0
f(z) 0
w
"
ξn1
D x FIGURE 23.4
1015
Distance 2r
Dn1 FIGURE 23.5
D0 " z1 D1 FIGURE 23.6
Proof Suppose $z_n \to \alpha$ in $G$. Since $f$ is continuous, $f(z_n) \to f(\alpha)$. But each $f(z_n) = 0$, so $f(\alpha) = 0$ also, and $\alpha$ must also be a zero of $f$ in $G$. This means that $\alpha$ is not an isolated zero, so by Theorem 23.4, there must be an open disk $D$ about $\alpha$ on which $f(z)$ is identically zero (Figure 23.4). We want to prove that this forces $f(z) = 0$ for all $z$ in $G$.
To do this, let $w$ be any point of $G$. We will show that $f(w) = 0$. Since $G$ is connected, there is a path $\Gamma$ in $G$ from $\alpha$ to $w$. Choose a number $r$ such that every point of $\Gamma$ is at a distance at least $2r$ from the boundary of $G$, and also choose $r$ to be less than the radius of $D$. Now walk along $\Gamma$ from $\alpha$ to $w$, along the way selecting points at distance less than $r$ from each other. This yields points $\alpha = \xi_0, \xi_1, \ldots, \xi_n = w$ on $\Gamma$, as in Figure 23.5. Form an open disk $D_j$ of radius $r$ about each $\xi_j$. (By choice of $r$, none of these disks reaches the boundary of $G$.) Each $\xi_j$ is in $D_{j-1}$, $D_j$, and $D_{j+1}$ for $j = 1, \ldots, n - 1$. Further, $\xi_0 = \alpha$ is in $D_0$ and $D_1$, and $w$ is in $D_{n-1}$ and $D_n$.
Since $\xi_1$ is in $D_0$ and $D_1$, there is a sequence of points in both $D_0$ and $D_1$ converging to $\xi_1$ (Figure 23.6). But $f(z)$ is identically zero on $D_0$, so $f(z)$ vanishes on this sequence. Since this sequence is also in $D_1$, $f(z) = 0$ for all $z$ in $D_1$.
Now $\xi_2$ is in $D_1$ and $D_2$. Choose a sequence of points common to both of these disks, and converging to $\xi_2$. Since $f(z)$ is identically zero on $D_1$, then $f(z) = 0$ at each point of this sequence. But since this sequence is also in $D_2$, then $f(z)$ is identically zero on $D_2$.
Continuing in this way, walk along $\Gamma$ from $\alpha$ to $w$. We find that $f(z)$ is identically zero on each of the disks along the way. Finally, $f(z)$ is zero on $D_n$. But $w$ is in $D_n$, so $f(w) = 0$, and therefore $f(z) = 0$ for all $z$ in $G$.
This theorem leads immediately to the conclusion that two differentiable functions that agree on a convergent sequence in a domain must be the same function. This is called the identity theorem.

COROLLARY 23.1 Identity Theorem

Let $f$ and $g$ be differentiable on a domain $G$. Suppose $f(z)$ and $g(z)$ agree on a convergent sequence of distinct points of $G$. Then $f(z) = g(z)$ for all $z$ in $G$.

Proof Apply Theorem 23.5 to the differentiable function $f - g$.
To get some idea of the power of this result, consider the problem of defining the complex sine function sinz so that it agrees with the real sine function when z is real. How many ways can this be done?
Put another way, is it possible to invent two distinct differentiable complex functions, $f$ and $g$, defined for all $z$, such that, when $z$ is real, $f(x) = g(x) = \sin(x)$? If this could be done, then we would have $f(z) = g(z)$ on a convergent sequence of complex numbers (chosen along the real line) in a domain (the entire plane), so necessarily $f = g$. There can be only one extension of a differentiable function from the real line to the complex plane. This is why, when we extend a real function (such as the exponential or trigonometric functions) to the complex plane, we can be assured that this extension is unique.
23.1.2 The Maximum Modulus Theorem

Suppose $f : S \to \mathbb{C}$, and $S$ is a compact set. We know from Theorem 21.3 that $|f(z)|$ assumes a maximum value on $S$. This means that, for at least one $\zeta$ in $S$, $|f(z)| \le |f(\zeta)|$ for all $z$ in $S$. But this does not give any information about where in $S$ the point $\zeta$ might be. We will now prove that any such $\zeta$ must lie on the boundary of $S$ if $f$ is a differentiable function. This is called the maximum modulus theorem. The name of the theorem derives from the fact that the real-valued function $|f(z)|$ is called the modulus of $f(z)$, and we are concerned with the maximum that the modulus of $f(z)$ has as $z$ varies over a set $S$. We will first show that a differentiable function that is not constant on an open disk cannot have its maximum modulus at the center of the disk.

LEMMA 23.1
Let $f$ be differentiable and not constant on an open disk $D$ centered at $z_0$. Then, for some $z$ in this disk,
$$|f(z)| > |f(z_0)|.$$

Proof. Suppose instead that $|f(z)| \le |f(z_0)|$ for all $z$ in $D$. We will derive a contradiction. Let $\gamma(t) = z_0 + re^{it}$ for $0 \le t \le 2\pi$. Suppose $r$ is small enough that this circle is contained in $D$. By Cauchy's integral formula,
$$f(z_0) = \frac{1}{2\pi i}\oint_\gamma \frac{f(z)}{z - z_0}\,dz = \frac{1}{2\pi}\int_0^{2\pi} f(z_0 + re^{it})\,dt.$$
Then
$$|f(z_0)| \le \frac{1}{2\pi}\int_0^{2\pi} |f(z_0 + re^{it})|\,dt.$$
But $z_0 + re^{it}$ is in $D$ for $0 \le t \le 2\pi$, so $|f(z_0 + re^{it})| \le |f(z_0)|$. Then
$$\frac{1}{2\pi}\int_0^{2\pi} |f(z_0 + re^{it})|\,dt \le \frac{1}{2\pi}\int_0^{2\pi} |f(z_0)|\,dt = |f(z_0)|.$$
The last two inequalities imply that
$$\frac{1}{2\pi}\int_0^{2\pi} |f(z_0 + re^{it})|\,dt = |f(z_0)|.$$
But then
$$\frac{1}{2\pi}\int_0^{2\pi} \left(|f(z_0)| - |f(z_0 + re^{it})|\right)dt = 0.$$
23.1 Power Series Representations
This integrand is continuous and nonnegative for $0 \le t \le 2\pi$. If it were positive for any $t$, then there would be some subinterval of $[0, 2\pi]$ on which the integrand would be positive, and then this integral would be positive, a contradiction. Therefore the integrand must be identically zero:
$$|f(z_0 + re^{it})| = |f(z_0)| \quad\text{for } 0 \le t \le 2\pi.$$
This says that $|f(z)|$ has the constant value $|f(z_0)|$ on every circle about $z_0$ and contained in $D$. But every point in $D$ is on some circle about $z_0$ and contained in $D$. Therefore $|f(z)| = |f(z_0)| = $ constant for all $z$ in $D$. Then by Theorem 21.7, $f(z) = $ constant on $D$. This contradiction proves the lemma.

We can now derive the maximum modulus theorem.

THEOREM 23.6
Maximum Modulus Theorem
Let $S$ be a compact, connected set of complex numbers. Let $f$ be continuous on $S$ and differentiable at each interior point of $S$. Then $|f(z)|$ achieves its maximum value at a boundary point of $S$. Further, if $f$ is not a constant function, then $|f(z)|$ does not achieve its maximum at an interior point of $S$.

Proof. Because $S$ is compact and $f$ is continuous, we know from Theorem 21.3 that $|f(z)|$ achieves a maximum value at some (perhaps many) points of $S$. Let $\zeta$ be such a point. If $\zeta$ is an interior point, then there is an open disk $D$ about $\zeta$ that contains only points of $S$. But then $|f(z)|$ achieves its maximum on this disk at its center. Now there are two cases.

Case 1: $f(z)$ is constant on this disk. By the identity theorem, $f(z)$ is constant on $S$. In this event $|f(z)|$ is constant on $S$, and its maximum is achieved at boundary points as well.

Case 2: $f(z)$ is not constant on this disk. Then $|f(z)| \le |f(\zeta)|$ for $z$ in this disk, contradicting Lemma 23.1. In this case $|f(z)|$ cannot achieve a maximum in the interior of $S$, and so must achieve its maximum at a boundary point.
EXAMPLE 23.8
Let $f(z) = \sin(z)$. We will determine the maximum value of $|f(z)|$ on the square $0 \le x \le \pi$, $0 \le y \le \pi$. First, it is convenient to work with $|f(z)|^2$, since this will have its maximum at the same value of $z$ that $|f(z)|$ does. Now
$$f(z) = \sin(z) = \sin(x)\cosh(y) + i\cos(x)\sinh(y),$$
so
$$|f(z)|^2 = \sin^2(x)\cosh^2(y) + \cos^2(x)\sinh^2(y).$$
By the maximum modulus theorem, $|f(z)|^2$ must achieve its maximum value (for this square) on one of the sides of the square. Look at each side in turn.

On the bottom side, $y = 0$ and $0 \le x \le \pi$, so $|f(z)|^2 = \sin^2(x)$ achieves a maximum value of $1$.

On the right side, $x = \pi$ and $0 \le y \le \pi$, so $|f(z)|^2 = \sinh^2(y)$ achieves a maximum value of $\sinh^2(\pi)$. This is because $\cos^2(\pi) = 1$ and $\sinh(y)$ is a strictly increasing function on $[0, \pi]$.
On the top side of the square, $y = \pi$ and $0 \le x \le \pi$. Now $|f(z)|^2 = \sin^2(x)\cosh^2(\pi) + \cos^2(x)\sinh^2(\pi)$. We need to know where this achieves its maximum value for $0 \le x \le \pi$. This is a problem in single-variable calculus. Let
$$g(x) = \sin^2(x)\cosh^2(\pi) + \cos^2(x)\sinh^2(\pi).$$
Then
$$g'(x) = 2\sin(x)\cos(x)\cosh^2(\pi) - 2\cos(x)\sin(x)\sinh^2(\pi) = \sin(2x)\left(\cosh^2(\pi) - \sinh^2(\pi)\right) = \sin(2x).$$
This derivative is zero on $(0, \pi)$ at $x = \pi/2$, so this is the critical point of $g$. Further,
$$g\!\left(\frac{\pi}{2}\right) = \cosh^2(\pi).$$
At the ends of the interval, we have
$$g(0) = g(\pi) = \sinh^2(\pi) < \cosh^2(\pi).$$
Therefore, on the top side of the square, $|f(z)|^2$ achieves the maximum value of $\cosh^2(\pi)$.

Finally, on the left side of the square, $x = 0$ and $0 \le y \le \pi$, so $|f(z)|^2 = \sinh^2(y)$, with maximum $\sinh^2(\pi)$ on $0 \le y \le \pi$.

We conclude that, on this square, $|f(z)|^2$ has its maximum value of $\cosh^2(\pi)$, which is the maximum value of $|f(z)|^2$ on the boundary of the square. Therefore $|f(z)|$ has a maximum value of $\cosh(\pi)$ on this square.
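The conclusion of Example 23.8 can be checked numerically. The following sketch (not from the text; it uses only the Python standard library) samples $|\sin z|$ on a grid over the square and on its four edges, and confirms that both maxima agree with $\cosh(\pi)$, which is attained on the boundary at $z = \pi/2 + i\pi$:

```python
import cmath
import math

# Numerical sanity check of Example 23.8: on the square 0 <= x <= pi,
# 0 <= y <= pi, the maximum of |sin z| should be cosh(pi), attained on
# the top edge at z = pi/2 + i*pi.
N = 400
xs = [math.pi * k / N for k in range(N + 1)]
ys = [math.pi * k / N for k in range(N + 1)]

# Maximum of |sin z| over a grid covering the whole square
grid_max = max(abs(cmath.sin(complex(x, y))) for x in xs for y in ys)

# Maximum of |sin z| over the four boundary edges only
edges = ([complex(x, 0) for x in xs] + [complex(x, math.pi) for x in xs]
         + [complex(0, y) for y in ys] + [complex(math.pi, y) for y in ys])
edge_max = max(abs(cmath.sin(z)) for z in edges)

print(grid_max, edge_max, math.cosh(math.pi))
# grid maximum and edge maximum agree, and both equal cosh(pi)
```

As the maximum modulus theorem predicts, refining the interior grid never produces a value exceeding the boundary maximum.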
SECTION 23.1
PROBLEMS
In each of Problems 1 through 12, find the Taylor series of the function about the point. Also determine the radius of convergence and open disk of convergence of the series.

1. $\cos(2z)$; $z_0 = 0$

2. $e^{-z}$; $z_0 = -3i$

3. $\dfrac{1}{1-z}$; $z_0 = 4i$

4. $\sin(z^2)$; $z_0 = 0$

5. $\dfrac{1}{1-z^2}$; $z_0 = 0$

6. $\dfrac{1}{2+z}$; $z_0 = 1-8i$

7. $z^2 - 3z + i$; $z_0 = 2-i$

8. $1 + \dfrac{1}{2+z^2}$; $z_0 = i$

9. $(z-9)^2$; $z_0 = 1+i$

10. $e^z - i\sin(z)$; $z_0 = 0$

11. $\sin(z+i)$; $z_0 = -i$

12. $\dfrac{3}{z-4i}$; $z_0 = -5$

13. Suppose $f$ is differentiable in an open disk about zero and satisfies $f''(z) = 2f(z) + 1$. Suppose $f(0) = 1$ and $f'(0) = i$. Find the Maclaurin expansion of $f(z)$.

14. Find the first three terms of the Maclaurin expansion of $\sin^2(z)$ in three ways, as follows: (a) First, compute the Taylor coefficients at $0$. (b) Find the first three terms of the product of the Maclaurin series for $\sin(z)$ with itself. (c) Write $\sin^2(z)$ in terms of the exponential function and use the Maclaurin expansion of this function.

15. Show that
$$\sum_{n=0}^{\infty} \frac{z^{2n}}{(n!)^2} = \frac{1}{2\pi}\int_0^{2\pi} e^{2z\cos(\theta)}\,d\theta.$$
Hint: First show that
$$\left(\frac{z^n}{n!}\right)^2 = \frac{1}{2\pi i}\oint_\gamma \frac{z^n e^{zw}}{n!\,w^{n+1}}\,dw$$
for $n = 0, 1, 2, \ldots$ and $\gamma$ the unit circle about the origin.
16. Find the maximum value of $|\cos(z)|$ on the square $0 \le x \le \pi$, $0 \le y \le \pi$.

17. Find the maximum value of $|e^z|$ on the square $0 \le x \le 1$, $0 \le y \le \pi$.

18. Find the maximum value of $|\sin(z)|$ on the rectangle $0 \le x \le 2\pi$, $0 \le y \le 1$.
23.2
The Laurent Expansion

If $f$ is differentiable in some disk about $z_0$, then $f(z)$ has a Taylor series representation about $z_0$. If a function is not differentiable at $z_0$, it may have a different kind of series expansion about $z_0$, a Laurent expansion. This will have profound implications in analyzing properties of functions and in such applications as evaluating real and complex integrals.

First we need some terminology. The open set of points between two concentric circles is called an annulus. Typically an annulus is described by inequalities
$$r < |z - z_0| < R,$$
in which $r$ is the radius of the inner circle and $R$ the radius of the outer circle (Figure 23.7). We allow $r = 0$ in this inequality, in which case the annulus $0 < |z - z_0| < R$ is a punctured disk (an open disk with the center removed). We also allow $R = \infty$. The annulus $r < |z - z_0| < \infty$ consists of all points outside the inner circle of radius $r$. An annulus $0 < |z - z_0| < \infty$ consists of all complex $z$ except $z_0$. We can now state the main result on Laurent series.
THEOREM 23.7
Let $0 \le r < R \le \infty$. Suppose $f$ is differentiable in the annulus $r < |z - z_0| < R$. Then, for each $z$ in this annulus,
$$f(z) = \sum_{n=-\infty}^{\infty} c_n (z - z_0)^n,$$
where, for each integer $n$,
$$c_n = \frac{1}{2\pi i}\oint_\gamma \frac{f(z)}{(z - z_0)^{n+1}}\,dz,$$
and $\gamma$ is any closed path about $z_0$ lying entirely in the annulus. A typical such $\gamma$ is shown in Figure 23.8.

The series in the theorem, which may include both positive and negative powers of $z - z_0$, is the Laurent expansion, or Laurent series, for $f(z)$ about $z_0$ in the given annulus. This expansion has the appearance
$$\cdots + \frac{c_{-2}}{(z - z_0)^2} + \frac{c_{-1}}{z - z_0} + c_0 + c_1(z - z_0) + c_2(z - z_0)^2 + \cdots.$$
The function need not be differentiable, or even defined, at $z_0$ or at other points within the inner circle of the annulus. The numbers $c_n$ are the Laurent coefficients of $f$ about $z_0$. A Laurent series is a decomposition of $f(z)$ into a sum
$$f(z) = \sum_{n=-\infty}^{-1} c_n(z - z_0)^n + \sum_{n=0}^{\infty} c_n(z - z_0)^n = \sum_{n=1}^{\infty} \frac{c_{-n}}{(z - z_0)^n} + \sum_{n=0}^{\infty} c_n(z - z_0)^n = h(z) + g(z).$$
FIGURE 23.7 Circles $|z - z_0| = r$ and $|z - z_0| = R$ bounding the open annulus $r < |z - z_0| < R$.

FIGURE 23.8 Closed path $\gamma$ enclosing $z_0$ and lying in the annulus $r < |z - z_0| < R$.
The part containing only nonnegative powers of $z - z_0$ defines a function $g(z)$ that is differentiable on $|z - z_0| < R$ (because this part is a Taylor expansion). The part containing only negative powers of $z - z_0$ defines a function $h(z)$ that is not defined at $z_0$. This part determines the behavior of $f(z)$ about a point $z_0$ where $f$ is not differentiable. A proof of the theorem is given on the website.

As with Taylor series, we rarely compute the coefficients in a Laurent expansion using this integral formula (quite the contrary, we will use one of these coefficients to evaluate integrals). Instead, we use known series and algebraic or analytic manipulations. This requires that we be assured that the Laurent expansion of a function in an annulus about a point is unique, and does not change with the method of derivation.

THEOREM 23.8
Let $f$ be differentiable in an annulus $r < |z - z_0| < R$. Suppose, for $z$ in this annulus,
$$f(z) = \sum_{n=-\infty}^{\infty} b_n (z - z_0)^n.$$
Then the $b_n$'s are the Laurent coefficients $c_n$ of $f$, and this series is the Laurent expansion of $f(z)$ in this annulus.

Proof. Choose $\gamma$ as a circle about $z_0$ in the annulus. Let $k$ be any integer. Using Theorem 22.6, we get
$$2\pi i\,c_k = \oint_\gamma \frac{f(w)}{(w - z_0)^{k+1}}\,dw = \oint_\gamma \sum_{n=-\infty}^{\infty} b_n (w - z_0)^{n-k-1}\,dw = \sum_{n=-\infty}^{\infty} b_n \oint_\gamma \frac{1}{(w - z_0)^{k-n+1}}\,dw. \tag{23.2}$$
Now, on $\gamma$, $w = z_0 + re^{it}$ for $0 \le t \le 2\pi$, with $r$ the radius of $\gamma$. Then
$$\oint_\gamma \frac{1}{(w - z_0)^{k-n+1}}\,dw = \int_0^{2\pi} \frac{1}{r^{k-n+1}e^{i(k-n+1)t}}\,ire^{it}\,dt = \frac{i}{r^{k-n}}\int_0^{2\pi} e^{i(n-k)t}\,dt = \begin{cases} 0 & \text{if } k \ne n, \\ 2\pi i & \text{if } k = n. \end{cases}$$
Thus in equation (23.2), all terms in the last series vanish except the term with $n = k$, and the equation reduces to
$$2\pi i\,c_k = 2\pi i\,b_k.$$
Hence for each integer $k$, $b_k = c_k$.

Here are some examples of Laurent expansions.
EXAMPLE 23.9
$e^{1/z}$ is differentiable in the annulus $0 < |z| < \infty$, the plane with the origin removed. Since
$$e^z = \sum_{n=0}^{\infty} \frac{1}{n!}z^n,$$
then, in this annulus,
$$e^{1/z} = \sum_{n=0}^{\infty} \frac{1}{n!}\left(\frac{1}{z}\right)^n = 1 + \frac{1}{z} + \frac{1}{2}\frac{1}{z^2} + \frac{1}{6}\frac{1}{z^3} + \frac{1}{24}\frac{1}{z^4} + \cdots.$$
This is the Laurent expansion of $e^{1/z}$ about $0$, and it converges for all nonzero $z$. This expansion contains a constant term and infinitely many negative integer powers of $z$, but no positive powers.
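Although the text computes these coefficients from the known exponential series, the integral formula of Theorem 23.7 can be checked against them directly. The sketch below (not from the text) approximates the contour integral for $c_n$ over the unit circle with a trapezoid rule and recovers $c_{-n} = 1/n!$ for $e^{1/z}$:

```python
import cmath
import math

# Numerical check that the formula
#   c_n = (1/(2*pi*i)) * integral over |z| = 1 of f(z)/z^(n+1) dz
# recovers the Laurent coefficients of f(z) = exp(1/z) about 0,
# namely c_{-n} = 1/n! for n >= 0 and c_n = 0 for n > 0.
def laurent_coeff(f, n, radius=1.0, steps=20000):
    total = 0j
    for k in range(steps):  # trapezoid rule on z = radius * e^{it}
        t = 2 * math.pi * k / steps
        z = radius * cmath.exp(1j * t)
        dz = 1j * z * (2 * math.pi / steps)   # dz = i z dt
        total += f(z) / z ** (n + 1) * dz
    return total / (2j * math.pi)

f = lambda z: cmath.exp(1 / z)
print(laurent_coeff(f, -1))   # close to 1        (= 1/1!)
print(laurent_coeff(f, -3))   # close to 0.16667  (= 1/3!)
print(laurent_coeff(f, 2))    # close to 0
```

Because the integrand is smooth and periodic in $t$, the trapezoid rule converges very quickly here, so even modest step counts reproduce the coefficients to high accuracy.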
EXAMPLE 23.10
We will find the Laurent expansion of $\cos(z)/z^5$ about zero. For all $z$,
$$\cos(z) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!}z^{2n}.$$
For $z \ne 0$,
$$\frac{\cos(z)}{z^5} = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!}z^{2n-5} = \frac{1}{z^5} - \frac{1}{2}\frac{1}{z^3} + \frac{1}{24}\frac{1}{z} - \frac{1}{720}z + \frac{1}{40\,320}z^3 - \cdots.$$
This is the Laurent expansion of $\cos(z)/z^5$ about $0$. This expansion contains exactly three terms with negative powers of $z$, and the remaining terms contain only positive powers. We can think of $\cos(z)/z^5 = h(z) + g(z)$, where
$$g(z) = -\frac{1}{720}z + \frac{1}{40\,320}z^3 - \cdots$$
is a differentiable function (it is a power series about the origin), and
$$h(z) = \frac{1}{z^5} - \frac{1}{2}\frac{1}{z^3} + \frac{1}{24}\frac{1}{z}.$$
It is $h(z)$ that determines the behavior of $\cos(z)/z^5$ near the origin.
EXAMPLE 23.11
Find the Laurent expansion of
$$\frac{1}{(z+1)(z-3i)}$$
about $-1$. Use partial fractions to write
$$\frac{1}{(z+1)(z-3i)} = \frac{-1+3i}{10}\,\frac{1}{z+1} + \frac{1-3i}{10}\,\frac{1}{z-3i}.$$
$1/(z+1)$ is already expanded around $-1$, so concentrate on the last term:
$$\frac{1}{z-3i} = \frac{1}{-1-3i+(z+1)} = \frac{-1}{1+3i}\,\frac{1}{1 - \dfrac{z+1}{1+3i}} = -\frac{1}{1+3i}\sum_{n=0}^{\infty}\left(\frac{z+1}{1+3i}\right)^n = -\sum_{n=0}^{\infty} \frac{(z+1)^n}{(1+3i)^{n+1}}.$$
This expansion is valid for $|z+1|/|1+3i| < 1$, or $|z+1| < \sqrt{10}$. The Laurent expansion of $1/\big((z+1)(z-3i)\big)$ about $-1$ is therefore
$$\frac{1}{(z+1)(z-3i)} = \frac{-1+3i}{10}\,\frac{1}{z+1} - \frac{1-3i}{10}\sum_{n=0}^{\infty} \frac{(z+1)^n}{(1+3i)^{n+1}},$$
and this representation is valid in the annulus $0 < |z+1| < \sqrt{10}$.

Notice that $\sqrt{10}$ is the distance from $-1$, the center of the Laurent expansion, to the other point, $3i$, at which the function is not differentiable.
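A quick numerical sketch (not from the text) confirms the expansion of Example 23.11: summing the series at a point inside the annulus reproduces the function value, with the coefficients $(-1+3i)/10$ and $(1-3i)/10$ taken from the worked partial-fraction decomposition.

```python
# Check the Laurent series of 1/((z+1)(z-3i)) about -1 at a test point
# inside the annulus 0 < |z+1| < sqrt(10), against direct evaluation.
z = complex(-0.5, 0.5)       # |z + 1| = |0.5 + 0.5i| ~ 0.707 < sqrt(10)
A = (-1 + 3j) / 10           # partial-fraction coefficient of 1/(z+1)
B = (1 - 3j) / 10            # partial-fraction coefficient of 1/(z-3i)

series = A / (z + 1) - B * sum((z + 1) ** n / (1 + 3j) ** (n + 1)
                               for n in range(60))
direct = 1 / ((z + 1) * (z - 3j))
print(abs(series - direct))  # close to 0
```

Sixty terms are far more than needed here, since each term shrinks by the factor $|z+1|/\sqrt{10} \approx 0.22$.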
In the next chapter we will use the Laurent expansion to develop the powerful residue theorem, which has many applications, including the evaluation of real and complex integrals.
SECTION 23.2
PROBLEMS
In each of Problems 1 through 10, write the Laurent expansion of the function in an annulus $0 < |z - z_0| < R$ about the point.

1. $\dfrac{2z}{1+z^2}$; $i$

2. $\dfrac{\sin(z)}{z^2}$; $0$

3. $\dfrac{1-\cos(2z)}{z^2}$; $0$

4. $z^2\cos(i/z)$; $0$

5. $\dfrac{z^2}{1-z}$; $1$

6. $\dfrac{z^2+1}{2z-1}$; $1/2$

7. $\dfrac{e^z}{z^2}$; $0$

8. $\dfrac{\sin(4z)}{z^2}$; $0$

9. $\dfrac{z+i}{z-i}$; $i$

10. $\sinh\!\left(\dfrac{1}{z^3}\right)$; $0$
CHAPTER 24

Singularities and the Residue Theorem

SINGULARITIES · THE RESIDUE THEOREM · SOME APPLICATIONS OF THE RESIDUE THEOREM · THE ARGUMENT PRINCIPLE · AN INVERSION FORMULA FOR THE LAPLACE TRANSFORM · EVALUATION OF
As a prelude to the residue theorem, we will use the Laurent expansion to classify points at which a function is not differentiable.
24.1
Singularities

DEFINITION 24.1
Isolated Singularity
A complex function $f$ has an isolated singularity at $z_0$ if $f$ is differentiable in an annulus $0 < |z - z_0| < R$, but not at $z_0$ itself.

For example, $1/z$ has an isolated singularity at $0$, and $\sin(z)/(z-\pi)$ has an isolated singularity at $\pi$. We will now identify singularities as being of different types, depending on the terms appearing in the Laurent expansion of the function about the singularity.
DEFINITION 24.2
Classification of Singularities
Let $f$ have an isolated singularity at $z_0$. Let the Laurent expansion of $f(z)$ in an annulus $0 < |z - z_0| < R$ be
$$f(z) = \sum_{n=-\infty}^{\infty} c_n (z - z_0)^n.$$
Then:

1. $z_0$ is a removable singularity of $f$ if $c_n = 0$ for $n = -1, -2, \ldots$

2. $z_0$ is a pole of order $m$ ($m$ a positive integer) if $c_{-m} \ne 0$ and $c_{-m-1} = c_{-m-2} = \cdots = 0$

3. $z_0$ is an essential singularity of $f$ if $c_{-n} \ne 0$ for infinitely many positive integers $n$.
These three types cover all the possibilities for an isolated singularity. In the case of a removable singularity, the Laurent expansion has no negative powers of $z - z_0$, and is therefore
$$f(z) = \sum_{n=0}^{\infty} c_n (z - z_0)^n,$$
a power series about $z_0$. In this case we can assign $f(z_0)$ the value $c_0$ to obtain a function that is differentiable in the open disk $|z - z_0| < R$.
EXAMPLE 24.1
Let
$$f(z) = \frac{1 - \cos(z)}{z}$$
for $0 < |z| < \infty$. Since
$$\cos(z) = 1 - \frac{z^2}{2!} + \frac{z^4}{4!} - \frac{z^6}{6!} + \cdots$$
for all $z$, then
$$f(z) = \frac{1 - \cos(z)}{z} = \frac{z}{2!} - \frac{z^3}{4!} + \frac{z^5}{6!} - \cdots = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{(2n)!}z^{2n-1}$$
for $z \ne 0$. The series on the right is actually a power series, having the value $0$ at $z = 0$. We can therefore define a new function
$$g(z) = \begin{cases} (1 - \cos(z))/z & \text{for } z \ne 0, \\ 0 & \text{for } z = 0, \end{cases}$$
which agrees with $f(z)$ for $z \ne 0$, but is defined at $0$ in such a way as to be differentiable there, because $g(z)$ has a power series expansion about $0$. Because it is possible to extend $f$ to a function $g$ that is differentiable at $0$, we call $0$ a removable singularity of $f$. Thus, a removable singularity is one that can be "removed" by appropriately assigning the function a value at the point.
EXAMPLE 24.2
$f(z) = \sin(z)/(z - \pi)$ has a removable singularity at $\pi$. To see this, first write the Laurent expansion of $f(z)$ in $0 < |z - \pi| < \infty$. An easy way to do this is to begin with
$$\sin(z - \pi) = \sin(z)\cos(\pi) - \cos(z)\sin(\pi) = -\sin(z),$$
so
$$\sin(z) = -\sin(z - \pi) = \sum_{n=0}^{\infty} \frac{(-1)^{n+1}}{(2n+1)!}(z - \pi)^{2n+1}.$$
Then, for $z \ne \pi$,
$$\frac{\sin(z)}{z - \pi} = \sum_{n=0}^{\infty} \frac{(-1)^{n+1}}{(2n+1)!}(z - \pi)^{2n} = -1 + \frac{1}{6}(z - \pi)^2 - \frac{1}{120}(z - \pi)^4 + \cdots.$$
Although $f(\pi)$ is not defined, the series on the right is defined for $z = \pi$, and is equal to $-1$ there. We therefore extend $f$ to a differentiable function $g$ defined over the entire plane by assigning the new function the value $-1$ when $z = \pi$:
$$g(z) = \begin{cases} f(z) & \text{for } z \ne \pi, \\ -1 & \text{for } z = \pi. \end{cases}$$
This extension "removes" the singularity of $f$ at $\pi$, since $f(z) = g(z)$ for $z \ne \pi$, and $g(\pi) = -1$.

For $f$ to have a pole at $z_0$, the Laurent expansion of $f$ about $z_0$ must have terms with negative powers of $z - z_0$, but only finitely many such terms. If the pole has order $m$, then this Laurent expansion has the form
$$f(z) = \frac{c_{-m}}{(z - z_0)^m} + \frac{c_{-m+1}}{(z - z_0)^{m-1}} + \cdots + \frac{c_{-1}}{z - z_0} + \sum_{n=0}^{\infty} c_n (z - z_0)^n,$$
with $c_{-m} \ne 0$. This expansion is valid in some annulus $0 < |z - z_0| < R$.
EXAMPLE 24.3
Let $f(z) = 1/(z + i)$. This function is its own Laurent expansion about $-i$, with $c_{-1} = 1$ while all other coefficients are zero. Thus $-i$ is a pole of order $1$ of $f$. This singularity is not removable: there is no way to assign a value to $f(-i)$ so that the extended function is differentiable at $-i$.
EXAMPLE 24.4
If $g(z) = 1/(z + i)^3$, then $g$ has a pole of order $3$ at $-i$. Here the function is its own Laurent expansion about $-i$: the coefficient of $1/(z + i)^3$ is nonzero, while all other coefficients are zero.
DEFINITION 24.3
Simple and Double Poles
A pole of order 1 is called a simple pole. A pole of order 2 is a double pole.
EXAMPLE 24.5
Let
$$f(z) = \frac{\sin(z)}{z^3}.$$
For $z \ne 0$,
$$f(z) = \frac{1}{z^3}\sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}z^{2n+1} = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}z^{2n-2} = \frac{1}{z^2} - \frac{1}{6} + \frac{1}{120}z^2 - \frac{1}{5\,040}z^4 + \cdots.$$
Therefore $f$ has a double pole at $0$.
EXAMPLE 24.6
$e^{1/z}$ is defined for all nonzero $z$, and, for $z \ne 0$,
$$e^{1/z} = \sum_{n=0}^{\infty} \frac{1}{n!}\frac{1}{z^n}.$$
Since this Laurent expansion has infinitely many negative powers of $z$, $0$ is an essential singularity of $e^{1/z}$.

We will discuss several results that are useful in identifying poles of a function.
THEOREM 24.1
Condition for a Pole of Order m
Let $f$ be differentiable in the annulus $0 < |z - z_0| < R$. Then $f$ has a pole of order $m$ at $z_0$ if and only if
$$\lim_{z \to z_0} (z - z_0)^m f(z)$$
exists, is finite, and is nonzero.

Proof. Expand $f(z)$ in a Laurent series in this annulus:
$$f(z) = \sum_{n=-\infty}^{\infty} c_n (z - z_0)^n \quad\text{for } 0 < |z - z_0| < R.$$
Suppose first that $f$ has a pole of order $m$ at $z_0$. Then $c_{-m} \ne 0$ and $c_{-m-1} = c_{-m-2} = \cdots = 0$, so the Laurent series is
$$f(z) = \sum_{n=-m}^{\infty} c_n (z - z_0)^n.$$
Then
$$(z - z_0)^m f(z) = \sum_{n=-m}^{\infty} c_n (z - z_0)^{n+m} = \sum_{n=0}^{\infty} c_{n-m} (z - z_0)^n = c_{-m} + c_{-m+1}(z - z_0) + c_{-m+2}(z - z_0)^2 + \cdots,$$
and therefore $\lim_{z \to z_0} (z - z_0)^m f(z) = c_{-m} \ne 0$.

Conversely, suppose $\lim_{z \to z_0} (z - z_0)^m f(z) = L \ne 0$. We want to show that $f$ has a pole of order $m$ at $z_0$. Let $\epsilon > 0$. Because of this limit, there is a positive $\delta < R$ such that
$$\left|(z - z_0)^m f(z) - L\right| < \epsilon \quad\text{if } 0 < |z - z_0| < \delta.$$
Then, for such $z$,
$$\left|(z - z_0)^m f(z)\right| < |L| + \epsilon.$$
In particular, if $|z - z_0| = \delta$, then
$$|f(z)|\,|z - z_0|^{-n-1} < (|L| + \epsilon)|z - z_0|^{-n-m-1} = (|L| + \epsilon)\delta^{-n-m-1}.$$
The coefficients in the Laurent expansion of $f(z)$ about $z_0$ are given by
$$c_n = \frac{1}{2\pi i}\oint_\gamma \frac{f(z)}{(z - z_0)^{n+1}}\,dz,$$
in which we can choose $\gamma$ to be the circle of radius $\delta$ about $z_0$. Then
$$|c_n| \le \frac{1}{2\pi}\,(2\pi\delta)\max_{z \text{ on } \gamma}|f(z)|\,|z - z_0|^{-n-1} < \delta(|L| + \epsilon)\delta^{-n-m-1} = (|L| + \epsilon)\delta^{-n-m}.$$
Now $\delta^{-n-m}$ can be made as small as we like by choosing $\delta$ small, if $n < -m$. We conclude that $|c_n| = 0$, hence $c_n = 0$, if $n < -m$. Thus the Laurent expansion of $f(z)$ about $z_0$ has the form
$$f(z) = \frac{c_{-m}}{(z - z_0)^m} + \frac{c_{-m+1}}{(z - z_0)^{m-1}} + \cdots + \frac{c_{-1}}{z - z_0} + \sum_{n=0}^{\infty} c_n (z - z_0)^n,$$
and, since the first part of the proof shows that $c_{-m}$ is the value of the limit, $c_{-m} = L \ne 0$. Therefore $f$ has a pole of order $m$ at $z_0$, as we wanted to show.
EXAMPLE 24.7
Look again at Example 24.3. Since
$$\lim_{z \to -i} (z + i)f(z) = \lim_{z \to -i} (z + i)\frac{1}{z + i} = 1 \ne 0,$$
$f$ has a simple pole at $-i$. In Example 24.4,
$$\lim_{z \to -i} (z + i)^3 g(z) = \lim_{z \to -i} (z + i)^3 \frac{1}{(z + i)^3} = 1 \ne 0,$$
so $g$ has a pole of order $3$ at $-i$.
In Example 24.5,
$$\lim_{z \to 0} z^2\,\frac{\sin(z)}{z^3} = \lim_{z \to 0} \frac{\sin(z)}{z} = 1 \ne 0,$$
so $\sin(z)/z^3$ has a double pole at $0$. It is a common error to think that this function has a pole of order $3$ at zero, because the denominator has a zero of order $3$ there. However,
$$\lim_{z \to 0} z^3\,\frac{\sin(z)}{z^3} = \lim_{z \to 0} \sin(z) = 0,$$
so by Theorem 24.1 the function cannot have a third-order pole at $0$.

If $f(z)$ is a quotient of functions, it is natural to look for poles at places where the denominator vanishes. Our first result along these lines deals with a quotient in which the denominator vanishes at $z_0$, but the numerator does not. Recall that $g(z)$ has a zero of order $k$ at $z_0$ if $g(z_0) = \cdots = g^{(k-1)}(z_0) = 0$, but $g^{(k)}(z_0) \ne 0$. The order of the zero is the order of the lowest-order derivative that does not vanish at the point.
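The limit test of Theorem 24.1 lends itself to a quick numerical experiment. The sketch below (not from the text) evaluates $(z - z_0)^m f(z)$ at a point very close to $z_0$; a value near a nonzero constant suggests a pole of order $m$, while a value near $0$ rules that order out:

```python
import cmath

# Numerical illustration of Theorem 24.1: for f(z) = sin(z)/z^3 at z0 = 0,
# m = 2 gives a nonzero finite value while m = 3 gives a value near 0,
# consistent with a double (not triple) pole at 0.
def probe(f, z0, m, k=8):
    z = z0 + 10.0 ** (-k)        # a point close to z0 on the real axis
    return (z - z0) ** m * f(z)

f = lambda z: cmath.sin(z) / z ** 3
print(probe(f, 0, 2))   # close to 1: consistent with a pole of order 2
print(probe(f, 0, 3))   # close to 0: not a pole of order 3
```

Such a probe is only heuristic (it samples one direction of approach), but it is a useful cross-check on hand computations of pole order.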
THEOREM 24.2
Let $f(z) = h(z)/g(z)$, where $h$ and $g$ are differentiable in some open disk about $z_0$. Suppose $h(z_0) \ne 0$, but $g$ has a zero of order $m$ at $z_0$. Then $f$ has a pole of order $m$ at $z_0$. We leave a proof of this result to the student.
EXAMPLE 24.8
$$f(z) = \frac{4\sin(z) + 2}{\sin^6(z)}$$
has a pole of order $6$ at $0$, because the numerator does not vanish at $0$, and the denominator has a zero of order $6$ at $0$. By the same token, $f$ has a pole of order $6$ at $n\pi$ for any integer $n$.

Theorem 24.2 does not apply if the numerator also vanishes at $z_0$. The example $f(z) = \sin(z)/z^3$ is instructive. The numerator has a zero of order $1$ at $0$, and the denominator a zero of order $3$ at $0$, and we saw in Example 24.5 that the quotient has a pole of order $2$. It would appear that the orders of the zeros of numerator and denominator subtract (or cancel) to give the order of a pole at the point. This is indeed the case.
THEOREM 24.3
Poles of Quotients
Let $f(z) = h(z)/g(z)$, and suppose $h$ and $g$ are differentiable in some open disk about $z_0$. Let $h$ have a zero of order $k$ at $z_0$ and $g$ a zero of order $m$ at $z_0$, with $m > k$. Then $f$ has a pole of order $m - k$ at $z_0$. A proof of this is left to the student. By allowing $k = 0$, this theorem includes the case that the numerator $h(z)$ has no zero at $z_0$.
EXAMPLE 24.9
Consider
$$f(z) = \frac{(z - 3\pi/2)^4}{\cos^7(z)}.$$
The numerator has a zero of order $4$ at $3\pi/2$, and the denominator has a zero of order $7$ there, so the quotient $f$ has a pole of order $3$ at $3\pi/2$.
EXAMPLE 24.10
Let $f(z) = \tan^3(z)/z^9$. The numerator has a zero of order $3$ at $0$, and the denominator has a zero of order $9$ at $0$. Therefore $f$ has a pole of order $6$ at $0$.

There are also some useful results stated in terms of products, rather than quotients. We claim that the order of a pole of a product is the sum of the orders of the poles of the factors at a given point.
THEOREM 24.4
Poles of Products
Let f have a pole of order m at z0 and let g have a pole of order n at z0 . Then fg has a pole of order m + n at z0 .
EXAMPLE 24.11
Let
$$f(z) = \frac{1}{\cos^4(z)\,(z - \pi/2)^2}.$$
Here $f(z)$ is a product, which we write for emphasis as
$$f(z) = \left(\frac{1}{\cos^4(z)}\right)\left(\frac{1}{(z - \pi/2)^2}\right).$$
Now $1/\cos^4(z)$ has a pole of order $4$ at $\pi/2$, and $1/(z - \pi/2)^2$ has a pole of order $2$ there, so $f$ has a pole of order $6$ at $\pi/2$. $f$ also has poles of order $4$ (not $6$) at $z = (2n+1)\pi/2$ for every integer $n$ other than $0$, since the factor $1/(z - \pi/2)^2$ is differentiable at those points.

We are now prepared to develop the powerful residue theorem.
SECTION 24.1
PROBLEMS
In each of Problems 1 through 12, determine all singularities of the function and classify each singularity as removable, a pole of a certain order, or an essential singularity.

1. $\dfrac{\cos(z)}{z^2}$

2. $\dfrac{4\sin(z) + 2}{(z+i)^2(z-i)}$
3. $\dfrac{e^{1/z}}{z+2i}$

4. $\dfrac{\sin(z)}{z-\pi}$

5. $\dfrac{\cos(2z)}{(z-1)^2(1+z^2)}$

6. $\dfrac{z}{(z+1)^2}$

7. $\dfrac{z-i}{z^2+1}$

8. $\dfrac{\sin(z)}{\sinh(z)}$

9. $\dfrac{z}{z^4-1}$

10. $\tan(z)$

11. $\dfrac{1}{\cos(z)}$

12. $e^{1/(z(z+1))}$

13. Let $f$ be differentiable at $z_0$ and let $g$ have a pole of order $m$ at $z_0$. Let $f(z_0) \ne 0$. Prove that $fg$ has a pole of order $m$ at $z_0$.

14. Let $h$ and $g$ be differentiable at $z_0$, with $g(z_0) \ne 0$, and let $h$ have a zero of order $2$ at $z_0$. Prove that $g(z)/h(z)$ has a pole of order $2$ at $z_0$.
15. Suppose $h$ and $g$ are differentiable at $z_0$ and $g(z_0) \ne 0$, while $h$ has a zero of order $3$ at $z_0$. Prove that $g(z)/h(z)$ has a pole of order $3$ at $z_0$.

24.2
The Residue Theorem

To see a connection between Laurent series and the integral of a function, suppose $f$ has a Laurent expansion
$$f(z) = \sum_{n=-\infty}^{\infty} c_n (z - z_0)^n$$
in some annulus $0 < |z - z_0| < R$. Let $\Gamma$ be a closed path in this annulus enclosing $z_0$. According to Theorem 23.7, the Laurent coefficients are given by an integral formula. In particular, the coefficient of $1/(z - z_0)$ is
$$c_{-1} = \frac{1}{2\pi i}\oint_\Gamma f(z)\,dz.$$
Therefore
$$\oint_\Gamma f(z)\,dz = 2\pi i\,c_{-1}. \tag{24.1}$$
Knowing this one coefficient in the Laurent expansion yields the value of this integral. This fact gives this coefficient a special importance, so we will give it a name.
DEFINITION 24.4
Residue
Let $f$ have an isolated singularity at $z_0$, with Laurent expansion $f(z) = \sum_{n=-\infty}^{\infty} c_n(z - z_0)^n$ in some annulus $0 < |z - z_0| < R$. Then the coefficient $c_{-1}$ is called the residue of $f$ at $z_0$, and is denoted $\operatorname{Res}(f, z_0)$.
We will now extend the idea behind equation (24.1) to include the case that $\Gamma$ may enclose any finite number of points at which $f$ is not differentiable.
24.2 The Residue Theorem
!2 z2
!1 z1 !3
!n
$
z3
zn
1031
FIGURE 24.1
THEOREM 24.5
Residue Theorem
Let $\Gamma$ be a closed path and let $f$ be differentiable on $\Gamma$ and at all points enclosed by $\Gamma$, except for $z_1, \ldots, z_n$, which are all the isolated singularities of $f$ enclosed by $\Gamma$. Then
$$\oint_\Gamma f(z)\,dz = 2\pi i \sum_{j=1}^{n} \operatorname{Res}(f, z_j).$$
In words, the value of this integral is $2\pi i$ times the sum of the residues of $f$ at the singularities of $f$ enclosed by $\Gamma$.

Proof. Enclose each singularity $z_j$ with a closed path $\gamma_j$ (Figure 24.1) so that each $\gamma_j$ is in the interior of $\Gamma$, encloses exactly one singularity, and does not intersect any other $\gamma_k$. By the extended deformation theorem,
$$\oint_\Gamma f(z)\,dz = \sum_{j=1}^{n} \oint_{\gamma_j} f(z)\,dz = 2\pi i \sum_{j=1}^{n} \operatorname{Res}(f, z_j).$$
The residue theorem is only as effective as our efficiency in evaluating residues of a function at singularities. If we had to actually write the Laurent expansion of $f$ about each singularity to pick off the coefficient of the $1/(z - z_j)$ term, the theorem would be difficult to apply in many instances. What adds to its importance is that, at least for poles, there are efficient ways of calculating residues. We will now develop some of these.

THEOREM 24.6
Residue at a Simple Pole
If $f$ has a simple pole at $z_0$, then
$$\operatorname{Res}(f, z_0) = \lim_{z \to z_0} (z - z_0)f(z).$$

Proof. If $f$ has a simple pole at $z_0$, then its Laurent expansion about $z_0$ is
$$f(z) = \frac{c_{-1}}{z - z_0} + \sum_{n=0}^{\infty} c_n (z - z_0)^n$$
in some annulus $0 < |z - z_0| < R$. Then
$$(z - z_0)f(z) = c_{-1} + \sum_{n=0}^{\infty} c_n (z - z_0)^{n+1},$$
so
$$\lim_{z \to z_0} (z - z_0)f(z) = c_{-1} = \operatorname{Res}(f, z_0).$$
EXAMPLE 24.12
$f(z) = \sin(z)/z^2$ has a simple pole at $0$, and
$$\operatorname{Res}(f, 0) = \lim_{z \to 0} z\,\frac{\sin(z)}{z^2} = \lim_{z \to 0} \frac{\sin(z)}{z} = 1.$$
If $\Gamma$ is any closed path in the plane enclosing the origin, then by the residue theorem,
$$\oint_\Gamma \frac{\sin(z)}{z^2}\,dz = 2\pi i\,\operatorname{Res}(f, 0) = 2\pi i.$$
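The value $2\pi i$ predicted by the residue theorem in Example 24.12 can be checked by direct numerical integration around the unit circle. The following sketch (not from the text) approximates the contour integral with a trapezoid rule:

```python
import cmath
import math

# Numerical check of Example 24.12: integrate sin(z)/z^2 around the unit
# circle and compare the result with 2*pi*i.
def contour_integral(f, center=0j, radius=1.0, steps=20000):
    total = 0j
    for k in range(steps):               # trapezoid rule on z = c + r e^{it}
        t = 2 * math.pi * k / steps
        z = center + radius * cmath.exp(1j * t)
        dz = 1j * (z - center) * (2 * math.pi / steps)
        total += f(z) * dz
    return total

val = contour_integral(lambda z: cmath.sin(z) / z ** 2)
print(val)   # close to 2*pi*i = 6.28318...j
```

Changing the radius of the circle leaves the result unchanged, as the deformation theorem requires, since the only singularity enclosed is the simple pole at $0$.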
EXAMPLE 24.13
Let
$$f(z) = \frac{z - 6i}{(z - 2)^2(z + 4i)}.$$
Then $f$ has a simple pole at $-4i$ and a double pole at $2$. Theorem 24.6 will not help us with the residue of $f$ at $2$, but at the simple pole,
$$\operatorname{Res}(f, -4i) = \lim_{z \to -4i} (z + 4i)\,\frac{z - 6i}{(z - 2)^2(z + 4i)} = \lim_{z \to -4i} \frac{z - 6i}{(z - 2)^2} = \frac{-4i - 6i}{(-4i - 2)^2} = -\frac{2}{5} + \frac{3}{10}i.$$
Before looking at residues at poles of order greater than $1$, the following version of Theorem 24.6 is sometimes handy.

COROLLARY 24.1
Let $f(z) = h(z)/g(z)$, where $h$ is continuous at $z_0$ and $h(z_0) \ne 0$. Suppose $g$ is differentiable at $z_0$ and has a simple zero there. Then $f$ has a simple pole at $z_0$ and
$$\operatorname{Res}(f, z_0) = \frac{h(z_0)}{g'(z_0)}.$$

Proof. $f$ has a simple pole at $z_0$ by Theorem 24.2. By Theorem 24.6,
$$\operatorname{Res}(f, z_0) = \lim_{z \to z_0} (z - z_0)\frac{h(z)}{g(z)} = \lim_{z \to z_0} \frac{h(z)}{\big(g(z) - g(z_0)\big)/(z - z_0)} = \frac{h(z_0)}{g'(z_0)}.$$
EXAMPLE 24.14
Let
$$f(z) = \frac{4iz - 1}{\sin(z)}.$$
Then $f$ has a simple pole at $\pi$, and, by Corollary 24.1,
$$\operatorname{Res}(f, \pi) = \frac{4i\pi - 1}{\cos(\pi)} = 1 - 4\pi i.$$
In fact, $f$ has a simple pole at $n\pi$ for any integer $n$, and
$$\operatorname{Res}(f, n\pi) = \frac{4in\pi - 1}{\cos(n\pi)} = (-1)^n(-1 + 4n\pi i).$$
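The $h(z_0)/g'(z_0)$ values of Example 24.14 can be verified against the defining contour integral: integrating $f$ around a small circle about each pole and dividing by $2\pi i$ should reproduce the residue. A sketch (not from the text):

```python
import cmath
import math

# Numerical check of Example 24.14: the residue of (4iz - 1)/sin(z) at n*pi
# should be (-1)^n * (-1 + 4*n*pi*i).  Compare the closed form with a
# contour integral of f over a small circle about each pole.
def contour_residue(f, z0, radius=0.5, steps=20000):
    total = 0j
    for k in range(steps):               # trapezoid rule on z = z0 + r e^{it}
        t = 2 * math.pi * k / steps
        z = z0 + radius * cmath.exp(1j * t)
        total += f(z) * 1j * (z - z0) * (2 * math.pi / steps)
    return total / (2j * math.pi)

f = lambda z: (4j * z - 1) / cmath.sin(z)
for n in (0, 1, 2, -1):
    z0 = n * math.pi
    predicted = (-1) ** n * (-1 + 4 * n * math.pi * 1j)
    print(n, contour_residue(f, z0), predicted)
```

The radius $0.5$ keeps each circle well inside the distance $\pi$ to the neighboring poles, so each integral picks up exactly one residue.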
EXAMPLE 24.15
Evaluate
$$\oint_\Gamma \frac{4iz - 1}{\sin(z)}\,dz$$
with $\Gamma$ the closed path of Figure 24.2.

FIGURE 24.2 $\Gamma$ encloses only the singularities $-\pi$, $0$, $\pi$, and $2\pi$ of $\dfrac{4iz - 1}{\sin(z)}$.

$\Gamma$ encloses the poles $-\pi$, $0$, $\pi$, and $2\pi$, but no other singularities of $f$. By the residue theorem and Example 24.14,
$$\oint_\Gamma \frac{4iz - 1}{\sin(z)}\,dz = 2\pi i\left[\operatorname{Res}(f, 0) + \operatorname{Res}(f, \pi) + \operatorname{Res}(f, 2\pi) + \operatorname{Res}(f, -\pi)\right]$$
$$= 2\pi i\left[-1 + (1 - 4\pi i) + (-1 + 8\pi i) + (1 + 4\pi i)\right] = 2\pi i\,(8\pi i) = -16\pi^2.$$

Here is a formula for the residue of a function at a pole of order greater than $1$.

THEOREM 24.7
Residue at a Pole of Order m
Let $f$ have a pole of order $m$ at $z_0$. Then
$$\operatorname{Res}(f, z_0) = \frac{1}{(m-1)!}\lim_{z \to z_0} \frac{d^{m-1}}{dz^{m-1}}\left[(z - z_0)^m f(z)\right].$$
If $m = 1$ (simple pole), then $(m-1)! = 0! = 1$ by definition, and the derivative of order $m - 1 = 0$ is defined to be just the function itself. With these conventions, the conclusion of the theorem reduces to the result for residues at simple poles when $m = 1$.

Proof. In some annulus about $z_0$,
$$f(z) = \frac{c_{-m}}{(z - z_0)^m} + \frac{c_{-m+1}}{(z - z_0)^{m-1}} + \cdots + \frac{c_{-1}}{z - z_0} + \sum_{n=0}^{\infty} c_n (z - z_0)^n.$$
It is $c_{-1}$ we want. Write
$$(z - z_0)^m f(z) = c_{-m} + c_{-m+1}(z - z_0) + \cdots + c_{-1}(z - z_0)^{m-1} + \sum_{n=0}^{\infty} c_n (z - z_0)^{n+m}.$$
The right side of this equation is a power series about $z_0$, and can be differentiated any number of times within its open disk of convergence. Compute
$$\frac{d^{m-1}}{dz^{m-1}}\left[(z - z_0)^m f(z)\right] = (m-1)!\,c_{-1} + \sum_{n=0}^{\infty} (n+m)(n+m-1)\cdots(n+2)\,c_n (z - z_0)^{n+1}.$$
In the limit as $z \to z_0$, this equation yields
$$\lim_{z \to z_0} \frac{d^{m-1}}{dz^{m-1}}\left[(z - z_0)^m f(z)\right] = (m-1)!\,c_{-1} = (m-1)!\operatorname{Res}(f, z_0).$$
EXAMPLE 24.16
Let
$$f(z) = \frac{\cos(z)}{(z + i)^3}.$$
Then $f$ has a pole of order $3$ at $-i$. By Theorem 24.7,
$$\operatorname{Res}(f, -i) = \frac{1}{2!}\lim_{z \to -i} \frac{d^2}{dz^2}\left[(z + i)^3\,\frac{\cos(z)}{(z + i)^3}\right] = \frac{1}{2}\lim_{z \to -i} \frac{d^2}{dz^2}\cos(z) = -\frac{1}{2}\cos(-i) = -\frac{1}{2}\cos(i).$$
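Example 24.16 can be replayed numerically. In the sketch below (not from the text), the second derivative required by Theorem 24.7 is estimated with a central difference and compared with the exact answer $-\cos(i)/2 = -\cosh(1)/2$:

```python
import cmath

# Numerical check of Example 24.16 via Theorem 24.7: for
# f(z) = cos(z)/(z+i)^3, we have (z+i)^3 f(z) = cos(z), so the residue at
# -i is (1/2!) * d^2/dz^2 cos(z) evaluated at z = -i.  Estimate the second
# derivative by a central difference and compare with -cos(i)/2.
def second_derivative(g, z0, h=1e-4):
    return (g(z0 + h) - 2 * g(z0) + g(z0 - h)) / h ** 2

res_numeric = second_derivative(cmath.cos, -1j) / 2
print(res_numeric)          # close to -0.7715...
print(-cmath.cos(1j) / 2)   # exact value: -cosh(1)/2
```

The central difference has error of order $h^2$, so with $h = 10^{-4}$ the two printed values agree to roughly eight digits.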
Here are some examples of the residue theorem in evaluating complex integrals.
EXAMPLE 24.17
Let
$$f(z) = \frac{2iz - \cos(z)}{z^3 + z}.$$
We want to evaluate $\oint_\Gamma f(z)\,dz$, with $\Gamma$ a closed path that does not pass through any singularity of $f$.
The singularities of $f$ are simple poles at $0$, $i$, and $-i$. First compute the residue of $f$ at each of these points. Here it is convenient to use Corollary 24.1:
$$\operatorname{Res}(f, 0) = \frac{-\cos(0)}{1} = -1,$$
$$\operatorname{Res}(f, i) = \frac{2i \cdot i - \cos(i)}{3i^2 + 1} = \frac{-2 - \cos(i)}{-2} = 1 + \frac{1}{2}\cos(i),$$
and
$$\operatorname{Res}(f, -i) = \frac{2i(-i) - \cos(-i)}{3(-i)^2 + 1} = -1 + \frac{1}{2}\cos(i).$$
Now consider cases.

1. If $\Gamma$ does not enclose any of the singularities, then $\oint_\Gamma f(z)\,dz = 0$ by Cauchy's theorem.

2. If $\Gamma$ encloses $0$ but not $i$ or $-i$, then
$$\oint_\Gamma f(z)\,dz = 2\pi i\,\operatorname{Res}(f, 0) = -2\pi i.$$

3. If $\Gamma$ encloses $i$ but not $0$ or $-i$, then
$$\oint_\Gamma f(z)\,dz = 2\pi i\left(1 + \frac{1}{2}\cos(i)\right).$$

4. If $\Gamma$ encloses $-i$ but not $0$ or $i$, then
$$\oint_\Gamma f(z)\,dz = 2\pi i\left(-1 + \frac{1}{2}\cos(i)\right).$$

5. If $\Gamma$ encloses $0$ and $i$ but not $-i$, then
$$\oint_\Gamma f(z)\,dz = 2\pi i\left(-1 + 1 + \frac{1}{2}\cos(i)\right) = \pi i\cos(i).$$

6. If $\Gamma$ encloses $0$ and $-i$ but not $i$, then
$$\oint_\Gamma f(z)\,dz = 2\pi i\left(-1 - 1 + \frac{1}{2}\cos(i)\right) = 2\pi i\left(-2 + \frac{1}{2}\cos(i)\right).$$

7. If $\Gamma$ encloses $i$ and $-i$ but not $0$, then
$$\oint_\Gamma f(z)\,dz = 2\pi i\left(1 + \frac{1}{2}\cos(i) - 1 + \frac{1}{2}\cos(i)\right) = 2\pi i\cos(i).$$

8. If $\Gamma$ encloses all three singularities, then
$$\oint_\Gamma f(z)\,dz = 2\pi i\left(-1 + 1 + \frac{1}{2}\cos(i) - 1 + \frac{1}{2}\cos(i)\right) = 2\pi i\left(-1 + \cos(i)\right).$$
EXAMPLE 24.18
Let
$$f(z) = \frac{\sin(z)}{z^2(z^2 + 4)}.$$
We want to evaluate $\oint_\Gamma f(z)\,dz$, where $\Gamma$ is a closed path enclosing $0$ and $2i$ but not $-2i$.
By Theorem 24.3, $f$ has a simple pole at $0$, not a double pole, because $\sin(z)$ has a simple zero at $0$. $f$ also has simple poles at $2i$ and $-2i$. Only the poles at $0$ and $2i$ are of interest in using the residue theorem, because $\Gamma$ does not enclose $-2i$. Compute
$$\operatorname{Res}(f, 0) = \lim_{z \to 0} z f(z) = \lim_{z \to 0} \frac{\sin(z)}{z}\,\frac{1}{z^2 + 4} = \frac{1}{4}$$
and
$$\operatorname{Res}(f, 2i) = \lim_{z \to 2i} (z - 2i)f(z) = \lim_{z \to 2i} \frac{\sin(z)}{z^2(z + 2i)} = \frac{\sin(2i)}{(-4)(4i)} = \frac{i}{16}\sin(2i).$$
Then
$$\oint_\Gamma \frac{\sin(z)}{z^2(z^2 + 4)}\,dz = 2\pi i\left(\frac{1}{4} + \frac{i}{16}\sin(2i)\right).$$
EXAMPLE 24.19
We will evaluate
$$\oint_\Gamma e^{1/z}\,dz$$
for $\Gamma$ any closed path not passing through the origin.

There are two cases. If $\Gamma$ does not enclose the origin, then $\oint_\Gamma e^{1/z}\,dz = 0$ by Cauchy's theorem. If $\Gamma$ does enclose the origin, then use the residue theorem. We need $\operatorname{Res}(e^{1/z}, 0)$. As we found in Example 24.6, $0$ is an essential singularity of $e^{1/z}$. There is no simple general formula for the residue of a function at an essential singularity. However,
$$e^{1/z} = \sum_{n=0}^{\infty} \frac{1}{n!}\frac{1}{z^n}$$
is the Laurent expansion of $e^{1/z}$ about $0$, and the coefficient of $1/z$ is $1$. Thus $\operatorname{Res}(e^{1/z}, 0) = 1$ and
$$\oint_\Gamma e^{1/z}\,dz = 2\pi i.$$
Next we will look at a variety of applications of the residue theorem.
SECTION 24.2
PROBLEMS
In each of Problems 1 through 16, use the residue theorem to evaluate the integral over the given path.

1. $\oint_\Gamma \dfrac{1+z}{(z-1)^2(z+2i)}\,dz$; $\Gamma$ is the circle of radius 7 about $-i$

2. $\oint_\Gamma \dfrac{2z}{(z-i)^2}\,dz$; $\Gamma$ is the circle of radius 3 about 1

3. $\oint_\Gamma \dfrac{e^{z^2}}{z}\,dz$; $\Gamma$ is the circle of radius 2 about $-3i$

4. $\oint_\Gamma \dfrac{\cos z}{4+z^2}\,dz$; $\Gamma$ is the square of side length 3 and sides parallel to the axes, centered at $-2i$

5. $\oint_\Gamma \dfrac{z+i}{z^2+6}\,dz$; $\Gamma$ is the square of side length 8 and sides parallel to the axes, centered at the origin

6. $\oint_\Gamma \dfrac{z-i}{2z+1}\,dz$; $\Gamma$ is the circle of radius 1 about the origin

7. $\oint_\Gamma \dfrac{z}{\sinh^2 z}\,dz$; $\Gamma$ is the circle of radius 1 about $\frac12$

8. $\oint_\Gamma \dfrac{\cos z}{ze^z}\,dz$; $\Gamma$ is the circle of radius $\frac18$ about $\frac{i}{2}$

9. $\oint_\Gamma \dfrac{iz}{(z^2+9)(z-i)}\,dz$; $\Gamma$ is the circle of radius 2 about $-3i$

10. $\oint_\Gamma e^{2/z^2}\,dz$; $\Gamma$ is the square with sides parallel to the axes and of length 3, centered at $-i$

11. $\oint_\Gamma \dfrac{8z-4i+1}{z+4i}\,dz$; $\Gamma$ is the circle of radius 2 about $-i$

12. $\oint_\Gamma \dfrac{z^2}{z-1+2i}\,dz$; $\Gamma$ is the square of side length 4 and sides parallel to the axes, centered at $1-2i$

13. $\oint_\Gamma \coth z\,dz$; $\Gamma$ is the circle of radius 2 about $i$

14. $\oint_\Gamma \dfrac{1-z^2}{z^3-8}\,dz$; $\Gamma$ is the circle of radius 2 about 2

15. $\oint_\Gamma \dfrac{e^{2z}}{z(z-4i)}\,dz$; $\Gamma$ is any closed path enclosing 0 and $4i$

16. $\oint_\Gamma \dfrac{z^2}{z-1}\,dz$; $\Gamma$ is any closed path enclosing 1

17. With $h$ and $g$ as in Problem 14 of Section 24.1, show that
$$\operatorname{Res}\left(\frac{g(z)}{h(z)}, z_0\right) = 3\,\frac{g''(z_0)}{h'''(z_0)} - \frac{3}{2}\,\frac{g'(z_0)h^{(4)}(z_0)}{(h'''(z_0))^2}.$$

18. With $h$ and $g$ as in Problem 15 of Section 24.1, show that
$$\operatorname{Res}\left(\frac{g(z)}{h(z)}, z_0\right) = 3\,\frac{g''(z_0)}{h'''(z_0)} - \frac{3}{10}\,\frac{g(z_0)h^{(5)}(z_0)}{(h'''(z_0))^2} + 9\,\frac{1}{(h'''(z_0))^3}\left(\frac{g(z_0)(h^{(4)}(z_0))^2}{24} - \frac{g'(z_0)h^{(4)}(z_0)h'''(z_0)}{6}\right).$$

19. Let $g$ and $h$ be differentiable at $z_0$. Suppose $g(z_0)\ne 0$ and let $h$ have a zero of order $k$ at $z_0$. Prove that $g(z)/h(z)$ has a pole of order $k$ at $z_0$, and
$$\operatorname{Res}\left(\frac{g(z)}{h(z)}, z_0\right) = \left(\frac{k!}{h^{(k)}(z_0)}\right)^{k}\det\begin{pmatrix} H_k & H_{k+1} & H_{k+2} & \cdots & H_{2k-1}\\ 0 & H_k & H_{k+1} & \cdots & H_{2k-2}\\ 0 & 0 & H_k & \cdots & H_{2k-3}\\ \vdots & \vdots & \vdots & & \vdots\\ 0 & 0 & 0 & \cdots & H_{k+1}\\ G_0 & G_1 & G_2 & \cdots & G_{k-1}\end{pmatrix},$$
where
$$H_j = \frac{h^{(j)}(z_0)}{j!}\quad\text{and}\quad G_j = \frac{g^{(j)}(z_0)}{j!}.$$
24.3 Some Applications of the Residue Theorem

24.3.1 The Argument Principle

The argument principle is an integral formula for the difference between the number of zeros and the number of poles of a function (counting multiplicities) enclosed by a given closed path $\Gamma$.

THEOREM 24.8   Argument Principle

Let $f$ be differentiable on a closed path $\Gamma$ and at all points in the set $G$ of points enclosed by $\Gamma$, except at possibly a finite number of poles of $f$ in $G$. Let $Z$ be the number of zeros of $f$ in $G$, and $P$ the number of poles of $f$ in $G$, with each pole and zero counted $k$ times if its multiplicity is $k$. Then

$$\frac{1}{2\pi i}\oint_\Gamma\frac{f'(z)}{f(z)}\,dz = Z - P.$$

Proof   Observe first that the only points in $G$ where $f'/f$ might possibly have a singularity are the zeros and poles of $f$ in $G$.
1038
CHAPTER 24
Singularities and the Residue Theorem
Now suppose that $f$ has a zero of order $k$ at $z_0$ in $G$. We will show that $f'/f$ must have a simple pole at $z_0$, and that $\operatorname{Res}(f'/f, z_0) = k$. To see this, first note that, because $z_0$ is a zero of order $k$, $f(z_0) = f'(z_0) = \cdots = f^{(k-1)}(z_0) = 0$ while $f^{(k)}(z_0)\ne0$. Then, in some open disk about $z_0$, the Taylor expansion of $f(z)$ is

$$f(z) = \sum_{n=k}^{\infty}c_n(z-z_0)^n = \sum_{n=0}^{\infty}c_{n+k}(z-z_0)^{n+k} = (z-z_0)^k\sum_{n=0}^{\infty}c_{n+k}(z-z_0)^n = (z-z_0)^k g(z),$$

where $g$ is differentiable at $z_0$ (because it has a Taylor expansion there) and $g(z_0) = c_k \ne 0$. Now, in some annulus $0 < |z-z_0| < R$,

$$\frac{f'(z)}{f(z)} = \frac{k(z-z_0)^{k-1}g(z) + (z-z_0)^k g'(z)}{(z-z_0)^k g(z)} = \frac{k}{z-z_0} + \frac{g'(z)}{g(z)}.$$

Since $g'(z)/g(z)$ is differentiable at $z_0$, then $f'(z)/f(z)$ has a simple pole at $z_0$, and $\operatorname{Res}(f'/f, z_0) = k$.

Next, suppose $f$ has a pole of order $m$ at $z_1$. In some annulus about $z_1$, $f(z)$ has Laurent expansion

$$f(z) = \sum_{n=-m}^{\infty}d_n(z-z_1)^n$$

with $d_{-m}\ne0$. Then

$$(z-z_1)^m f(z) = \sum_{n=-m}^{\infty}d_n(z-z_1)^{n+m} = \sum_{n=0}^{\infty}d_{n-m}(z-z_1)^n = h(z),$$

with $h$ differentiable at $z_1$ and $h(z_1) = d_{-m}\ne0$. Then $f(z) = (z-z_1)^{-m}h(z)$, so in some annulus about $z_1$,

$$\frac{f'(z)}{f(z)} = \frac{-m(z-z_1)^{-m-1}h(z) + (z-z_1)^{-m}h'(z)}{(z-z_1)^{-m}h(z)} = \frac{-m}{z-z_1} + \frac{h'(z)}{h(z)}.$$

Therefore $f'/f$ has a simple pole at $z_1$, with $\operatorname{Res}(f'/f, z_1) = -m$.

Therefore the sum of the residues of $f'(z)/f(z)$ at singularities of this function in $G$ counts the zeros of $f$ in $G$, according to multiplicity, minus the number of poles of $f$ in $G$, again according to multiplicity. The conclusion now follows from the residue theorem.
EXAMPLE 24.20

We will evaluate $\oint_\Gamma \cot\pi z\,dz$, with $\Gamma$ the closed path of Figure 24.3. Write

$$\cot\pi z = \frac{\cos\pi z}{\sin\pi z} = \frac{1}{\pi}\,\frac{f'(z)}{f(z)},$$

where $f(z) = \sin\pi z$. Since $f$ has five simple zeros and no poles enclosed by $\Gamma$, the argument principle yields

$$\oint_\Gamma \cot\pi z\,dz = \frac{1}{\pi}\,2\pi i(5-0) = 10i.$$
FIGURE 24.3 (the closed path $\Gamma$ of Example 24.20, enclosing the five zeros 0, 1, 2, 3, 4 of $\sin\pi z$ on the real axis)
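A numerical check of Example 24.20 (my addition, not the author's): integrating $\cot\pi z$ around any path that encloses exactly the five zeros $0,1,2,3,4$ of $\sin\pi z$ should give $10i$. The circle of radius 2.5 about 2 is an assumed stand-in for the path of Figure 24.3.

```python
import cmath, math

def contour_integral(f, center, radius, n=5000):
    # Midpoint rule on z(t) = center + r e^{it}, dz = i (z - center) dt.
    dt = 2 * math.pi / n
    total = 0j
    for k in range(n):
        z = center + radius * cmath.exp(1j * (k + 0.5) * dt)
        total += f(z) * 1j * (z - center) * dt
    return total

def cot_pi(z):
    return cmath.cos(math.pi * z) / cmath.sin(math.pi * z)

# The circle of radius 2.5 about 2 encloses exactly the integers 0..4.
I = contour_integral(cot_pi, 2.0, 2.5)
print(I)  # close to 10i
```

Each simple pole of $\cot\pi z$ contributes residue $1/\pi$, so the residue theorem also predicts $2\pi i\cdot 5/\pi = 10i$, consistent with the argument principle.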
24.3.2 An Inversion Formula for the Laplace Transform

If $f$ is a complex function defined at least for all $t$ on $[0,\infty)$, the Laplace transform of $f$ is

$$\mathcal{L}[f](z) = \int_0^{\infty}e^{-zt}f(t)\,dt$$

for all $z$ such that this integral is defined and finite. If $\mathcal{L}[f] = F$, then $F$ is the Laplace transform of $f$, and $f$ is an inverse Laplace transform of $F$. Sometimes we write $f = \mathcal{L}^{-1}[F]$, although this requires additional conditions for uniqueness because there are in general many functions having a given $F$ as their Laplace transform. We will give a formula for $\mathcal{L}^{-1}[F]$ in terms of the sum of the residues of $e^{zt}F(z)$ at the poles of $F$.

THEOREM 24.9   Inverse Laplace Transform

Let $F$ be differentiable for all $z$ except for a finite number of points $z_1,\dots,z_n$, which are all poles of $F$. Suppose for some real $\sigma$, $F$ is differentiable for all $z$ with $\operatorname{Re}(z) > \sigma$. Suppose also that there are numbers $M$ and $R$ such that

$$|zF(z)| \le M \quad\text{for } |z| > R.$$

For $t \ge 0$, let

$$f(t) = \sum_{j=1}^{n}\operatorname{Res}(e^{zt}F(z), z_j).$$

Then

$$\mathcal{L}[f](z) = F(z)\quad\text{for } \operatorname{Re}(z) > \sigma.$$

The condition that $F$ is differentiable for $\operatorname{Re}(z) > \sigma$ means that all of the poles of $F$ lie on or to the left of the vertical line $x = \sigma$. It is also assumed that $zF(z)$ is a bounded function for $z$ outside some sufficiently large circle about the origin. For example, this condition is satisfied by any rational function (quotient of polynomials) in which the degree of the denominator exceeds that of the numerator.
EXAMPLE 24.21

Let $a > 0$. We want an inverse Laplace transform of $F(z) = 1/(a^2+z^2)$. This can be found in tables of Laplace transforms. To use the theorem, note that $F$ has simple poles at $\pm ai$. Compute

$$\operatorname{Res}\left(\frac{e^{zt}}{a^2+z^2}, ai\right) = \frac{e^{ait}}{2ai}$$

and

$$\operatorname{Res}\left(\frac{e^{zt}}{a^2+z^2}, -ai\right) = \frac{e^{-ait}}{-2ai}.$$

An inverse Laplace transform of $F$ is given by

$$f(t) = \frac{1}{2ai}\left(e^{ait} - e^{-ait}\right) = \frac{1}{a}\sin(at)$$

for $t \ge 0$.
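Each residue in Example 24.21 can itself be checked numerically, since $\operatorname{Res}(f, z_0) = \frac{1}{2\pi i}\oint f(z)\,dz$ over a small circle about $z_0$. The sketch below (mine, not the text's; the values $a = 3$, $t = 0.7$ are arbitrary) sums the two residues of $e^{zt}F(z)$ and compares with $\sin(at)/a$.

```python
import cmath, math

def residue(f, center, radius=0.5, n=4000):
    # Res(f, center) = (1/(2*pi*i)) * contour integral of f over a
    # small circle about the pole, by the midpoint rule.
    dt = 2 * math.pi / n
    total = 0j
    for k in range(n):
        z = center + radius * cmath.exp(1j * (k + 0.5) * dt)
        total += f(z) * 1j * (z - center) * dt
    return total / (2j * math.pi)

a, t = 3.0, 0.7  # arbitrary sample values
F = lambda z: cmath.exp(z * t) / (a * a + z * z)
ft = residue(F, a * 1j) + residue(F, -a * 1j)
print(ft.real, math.sin(a * t) / a)  # both about 0.2877
```

The imaginary parts of the two residues cancel, leaving the real value $\sin(at)/a$ predicted by the theorem.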
EXAMPLE 24.22

We want a function whose Laplace transform is

$$F(z) = \frac{1}{(z^2-4)(z-1)^2}.$$

$F$ has simple poles at $\pm2$ and a double pole at 1. Compute

$$\operatorname{Res}\left(\frac{e^{zt}}{(z^2-4)(z-1)^2}, 2\right) = \lim_{z\to2}\frac{e^{zt}}{(z+2)(z-1)^2} = \frac14 e^{2t},$$

$$\operatorname{Res}\left(\frac{e^{zt}}{(z^2-4)(z-1)^2}, -2\right) = \lim_{z\to-2}\frac{e^{zt}}{(z-2)(z-1)^2} = -\frac{1}{36}e^{-2t},$$

and

$$\operatorname{Res}\left(\frac{e^{zt}}{(z^2-4)(z-1)^2}, 1\right) = \lim_{z\to1}\frac{d}{dz}\left[\frac{e^{zt}}{z^2-4}\right] = \lim_{z\to1}e^{zt}\,\frac{t(z^2-4)-2z}{(z^2-4)^2} = -\frac13 te^{t} - \frac29 e^{t}.$$

An inverse Laplace transform of $F$ is given by

$$f(t) = -\frac13 te^{t} - \frac29 e^{t} + \frac14 e^{2t} - \frac{1}{36}e^{-2t}$$

for $t \ge 0$, and $\mathcal{L}[f](z) = F(z)$ for $\operatorname{Re}(z) > 2$ (since all poles of $F$ occur on or to the left of the line $\operatorname{Re}(z) = 2$).

In these sections we can see a theme developing. A variety of problems (zeros of functions, inverse Laplace transforms, others to be discussed) can be approached by integrating an appropriately chosen complex function over an appropriately chosen path. The function and path must be selected so that the integral gives us the quantity we want to calculate, perhaps after some limit process. We can then use the residue theorem to explicitly evaluate the integral. Depending on the problem, choosing the right function and the right path can be a nontrivial task, but at least this method provides an approach.
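As a sanity check on Example 24.22 (not part of the text), one can Laplace-transform the recovered $f(t)$ by direct quadrature and compare with $F(s)$ at a sample point, say $s = 3$, where $F(3) = 1/((9-4)(3-1)^2) = 1/20$.

```python
import math

def f(t):
    # The inverse transform found in Example 24.22.
    return (-t * math.exp(t) / 3 - 2 * math.exp(t) / 9
            + math.exp(2 * t) / 4 - math.exp(-2 * t) / 36)

def laplace(g, s, T=40.0, n=400000):
    # Midpoint approximation of the truncated transform
    # integral of e^{-st} g(t) over [0, T]; the tail is negligible
    # here because e^{-3t} f(t) decays like e^{-t}.
    h = T / n
    return h * sum(math.exp(-s * (k + 0.5) * h) * g((k + 0.5) * h)
                   for k in range(n))

s = 3.0
val = laplace(f, s)
print(val, 1.0 / ((s * s - 4) * (s - 1) ** 2))  # both about 0.05
```

Agreement at one (or several) sample points with $\operatorname{Re}(s) > 2$ is strong evidence that the residue computation recovered the right inverse transform.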
24.3.3 Evaluation of Real Integrals

We will illustrate the use of the residue theorem in evaluating several general classes of real integrals.

Integrals of $\int_0^{2\pi}K(\cos\theta,\sin\theta)\,d\theta$   Let $K(x, y)$ be a quotient of polynomials in $x$ and $y$, for example,

$$K(x,y) = \frac{x^3y - 2xy^2 + x - 2y}{x^4 + x^3}.$$

If we replace $x$ with $\cos\theta$ and $y$ with $\sin\theta$, we obtain a quotient involving sums of products of positive integer powers of $\cos\theta$ and $\sin\theta$. We are interested in evaluating integrals of the form

$$\int_0^{2\pi}K(\cos\theta, \sin\theta)\,d\theta.$$

The idea will be to show that this real integral is equal to an integral of a certain complex function over the unit circle. We then use the residue theorem to evaluate this complex integral, obtaining the value of the real integral.

To execute this strategy, let $\gamma$ be the unit circle, oriented counterclockwise as usual. Parametrize $\gamma$ by $z = e^{i\theta}$ for $0\le\theta\le2\pi$. On this curve, $z = e^{i\theta}$ and $\bar z = e^{-i\theta} = 1/z$, so

$$\cos\theta = \frac12\left(e^{i\theta}+e^{-i\theta}\right) = \frac12\left(z+\frac1z\right)$$

and

$$\sin\theta = \frac1{2i}\left(e^{i\theta}-e^{-i\theta}\right) = \frac1{2i}\left(z-\frac1z\right).$$

Further, on $\gamma$, $dz = ie^{i\theta}\,d\theta = iz\,d\theta$, so

$$d\theta = \frac{1}{iz}\,dz.$$

Now we have

$$\oint_\gamma K\left(\frac12\left(z+\frac1z\right), \frac1{2i}\left(z-\frac1z\right)\right)\frac{1}{iz}\,dz = \int_0^{2\pi}K(\cos\theta,\sin\theta)\frac{1}{ie^{i\theta}}\,ie^{i\theta}\,d\theta = \int_0^{2\pi}K(\cos\theta,\sin\theta)\,d\theta.$$

This converts the real integral we want to evaluate into the integral of a complex function $f(z)$ over the unit circle, where

$$f(z) = K\left(\frac12\left(z+\frac1z\right), \frac1{2i}\left(z-\frac1z\right)\right)\frac{1}{iz}.$$

Use the residue theorem to evaluate $\oint_\gamma f(z)\,dz$, obtaining

$$\int_0^{2\pi}K(\cos\theta,\sin\theta)\,d\theta = 2\pi i\sum_{p}\operatorname{Res}(f, p). \qquad(24.2)$$

The sum on the right is over all of the poles $p$ of $f(z)$ enclosed by the unit circle. Poles occurring outside the unit circle are not included in the calculation. Finally, equation (24.2) assumes that $f(z)$ has no singularities on the unit circle.

The procedure for evaluating $\int_0^{2\pi}K(\cos\theta,\sin\theta)\,d\theta$, then, is to compute $f(z)$, determine its poles within the unit circle, evaluate the residues there, and apply equation (24.2). This is a very powerful method that often yields closed-form evaluations of integrals for which standard techniques of integration from real calculus are inadequate.
EXAMPLE 24.23

We will evaluate

$$\int_0^{2\pi}\frac{\sin^2\theta}{2+\cos\theta}\,d\theta.$$

The function $K$ in the above discussion is

$$K(x,y) = \frac{y^2}{2+x}.$$

The first step is to replace $x = \cos\theta$ with $(z+1/z)/2$ and $y = \sin\theta$ with $(z-1/z)/2i$, and then multiply by $1/iz$, to produce the complex function

$$f(z) = K\left(\frac12\left(z+\frac1z\right), \frac1{2i}\left(z-\frac1z\right)\right)\frac{1}{iz} = \frac{\left(\frac1{2i}\left(z-\frac1z\right)\right)^2}{2+\frac12\left(z+\frac1z\right)}\,\frac1{iz} = \frac{i}{2}\,\frac{z^4-2z^2+1}{z^2(z^2+4z+1)}.$$

$f$ has a double pole at 0 and simple poles at the zeros of $z^2+4z+1$, which are $-2+\sqrt3$ and $-2-\sqrt3$. Of these two simple poles of $f$, the first is enclosed by $\gamma$ and the second is not, so discard $-2-\sqrt3$. By equation (24.2),

$$\int_0^{2\pi}\frac{\sin^2\theta}{2+\cos\theta}\,d\theta = 2\pi i\left[\operatorname{Res}(f,0) + \operatorname{Res}(f,-2+\sqrt3)\right].$$

Now

$$\operatorname{Res}(f,0) = \lim_{z\to0}\frac{d}{dz}\left[z^2 f(z)\right] = \lim_{z\to0}\frac{d}{dz}\left[\frac{i}{2}\,\frac{z^4-2z^2+1}{z^2+4z+1}\right] = i\lim_{z\to0}\frac{z^5+6z^4+2z^3-4z^2-3z-2}{(z^2+4z+1)^2} = -2i$$

and

$$\operatorname{Res}(f,-2+\sqrt3) = \frac{i}{2}\left[\frac{z^4-2z^2+1}{2z(z^2+4z+1)+z^2(2z+4)}\right]_{z=-2+\sqrt3} = \frac{i}{2}\,\frac{42-24\sqrt3}{-12+7\sqrt3}.$$

Then

$$\int_0^{2\pi}\frac{\sin^2\theta}{2+\cos\theta}\,d\theta = 2\pi i\left(-2i + \frac{i}{2}\,\frac{42-24\sqrt3}{-12+7\sqrt3}\right) = \frac{90-52\sqrt3}{12-7\sqrt3}\,\pi,$$

approximately 1.68357. In applying this method, if a complex number results, check the calculations, because a real integral must have a real value.
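A direct numerical integration (my addition) confirms the closed-form value of Example 24.23; the integrand is smooth and periodic, so the midpoint rule converges very quickly.

```python
import math

n = 200000
h = 2 * math.pi / n
num = h * sum(math.sin((k + 0.5) * h) ** 2 / (2 + math.cos((k + 0.5) * h))
              for k in range(n))
closed = math.pi * (90 - 52 * math.sqrt(3)) / (12 - 7 * math.sqrt(3))
print(num, closed)  # both about 1.68357
```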
EXAMPLE 24.24

We will evaluate

$$\int_0^{\pi}\frac{1}{\beta+\cos\theta}\,d\theta,$$

where $\beta > 1$. Since the method we have developed deals with integrals over $[0, 2\pi]$, we must first decide how to accommodate an integral over $[0, \pi]$. Write

$$\int_0^{2\pi}\frac{1}{\beta+\cos\theta}\,d\theta = \int_0^{\pi}\frac{1}{\beta+\cos\theta}\,d\theta + \int_{\pi}^{2\pi}\frac{1}{\beta+\cos\theta}\,d\theta.$$

Let $w = 2\pi - \theta$ in the last integral to obtain

$$\int_{\pi}^{2\pi}\frac{1}{\beta+\cos\theta}\,d\theta = \int_{\pi}^{0}\frac{1}{\beta+\cos(2\pi-w)}(-1)\,dw = \int_0^{\pi}\frac{1}{\beta+\cos w}\,dw.$$

Therefore

$$\int_0^{\pi}\frac{1}{\beta+\cos\theta}\,d\theta = \frac12\int_0^{2\pi}\frac{1}{\beta+\cos\theta}\,d\theta,$$

and we can concentrate on the integral over $[0,2\pi]$. First produce the function

$$f(z) = \frac{1}{\beta+\frac12\left(z+\frac1z\right)}\,\frac{1}{iz} = \frac{-2i}{z^2+2\beta z+1}.$$

$f$ has simple poles at

$$z = -\beta\pm\sqrt{\beta^2-1}.$$

Since $\beta > 1$, these numbers are real. Only one of them,

$$z_1 = -\beta+\sqrt{\beta^2-1},$$

is enclosed by $\gamma$. The other is outside the unit disk and is irrelevant for our purposes. Then

$$\int_0^{\pi}\frac{1}{\beta+\cos\theta}\,d\theta = \frac12\int_0^{2\pi}\frac{1}{\beta+\cos\theta}\,d\theta = \frac12\,2\pi i\operatorname{Res}(f,z_1) = \pi i\,\frac{-2i}{2z_1+2\beta} = \frac{\pi}{\sqrt{\beta^2-1}}.$$
Before continuing with other kinds of real integrals we can evaluate using the residue theorem, we will take a brief excursion and give another, perhaps surprising, proof of the fundamental theorem of algebra. This argument is originally due to N.C. Ankeny, and the version we give appeared in Lion Hunting and Other Mathematical Pursuits, by R.P. Boas (The Mathematical Association of America Dolciani Mathematical Expositions, Volume 15).

Let $p(z)$ be a nonconstant polynomial with complex coefficients. We want to show that, for some number $z$, $p(z) = 0$. First, we may assume that $p(x)$ is real if $x$ is real. To see why this is true, let

$$p(z) = a_0 + a_1z + \cdots + a_nz^n,$$

where $a_n\ne0$. Denote

$$\bar p(z) = \bar a_0 + \bar a_1 z + \cdots + \bar a_n z^n.$$

Then $q(z) = p(z)\bar p(z)$ is a nonconstant polynomial. Further, if $z = x$ is real, then $\bar x = x$ and

$$q(x) = p(x)\bar p(x) = (a_0+a_1x+\cdots+a_nx^n)(\bar a_0+\bar a_1x+\cdots+\bar a_nx^n) = |a_0+a_1x+\cdots+a_nx^n|^2$$

is real. We could then use $q(z)$ in our argument in place of $p(z)$, since $q(z)$ is a polynomial with no zero if $p(z)$ has no zero.

Thus suppose that $p(z)\ne0$ for all $z$, and $p(x)$ is real if $x$ is real. Because $p(x)$ is continuous and never zero for real $x$, $p(x)$ must be strictly positive or strictly negative for all real $x$. But then

$$\int_0^{2\pi}\frac{1}{p(2\cos\theta)}\,d\theta \ne 0.$$

But, by the method we have just discussed, with $\gamma$ the unit circle, we conclude that

$$\int_0^{2\pi}\frac{1}{p(2\cos\theta)}\,d\theta = \oint_\gamma\frac{1}{p(z+1/z)}\,\frac{1}{iz}\,dz = \frac1i\oint_\gamma\frac{z^{n-1}}{r(z)}\,dz \ne 0,$$

where

$$r(z) = z^np\left(z+\frac1z\right) = z^n\left[a_0 + a_1\left(z+\frac1z\right) + a_2\left(z+\frac1z\right)^2 + \cdots + a_n\left(z+\frac1z\right)^n\right] = z^n\left[a_0 + a_1\frac{z^2+1}{z} + a_2\frac{(z^2+1)^2}{z^2} + \cdots + a_n\frac{(z^2+1)^n}{z^n}\right].$$

From this it is clear that $r(z)$ is a polynomial. If $r(\zeta) = 0$ for some $\zeta\ne0$, then we would have $p(\zeta+1/\zeta) = 0$, so $\zeta+1/\zeta$ would be a zero of $p$, a contradiction. Further, $r(0) = a_n\ne0$ because $p$ has degree $n$. Therefore $r(z)\ne0$ for all $z$, so $z^{n-1}/r(z)$ is a differentiable function for all $z$. But then, by Cauchy's theorem,

$$\frac1i\oint_\gamma\frac{z^{n-1}}{r(z)}\,dz = 0,$$

a contradiction. We conclude that $p(z) = 0$ for some number $z$, proving the fundamental theorem of algebra.

Evaluation of $\int_{-\infty}^{\infty}p(x)/q(x)\,dx$   We will now consider real integrals of the form

$$\int_{-\infty}^{\infty}\frac{p(x)}{q(x)}\,dx,$$

in which $p$ and $q$ are polynomials with real coefficients and no common factors, $q$ has no real zeros, and the degree of $q$ exceeds the degree of $p$ by at least 2. These conditions are sufficient to ensure convergence of this improper integral. As with the preceding class of integrals, the strategy is to devise a complex integral that is equal to this real integral, then evaluate the complex integral using the residue theorem. To
FIGURE 24.4 (the closed path $\Gamma$: the semicircle $\gamma$ of radius $R$ in the upper half-plane together with the segment $S$ from $-R$ to $R$ on the real axis, enclosing the poles $z_1, z_2, \dots, z_m$)
do this, first observe that $q(z)$ has real coefficients, so its zeros occur in complex conjugate pairs. Suppose the zeros of $q$ are $z_1, \bar z_1, z_2, \bar z_2, \dots, z_m, \bar z_m$, with each $z_j$ in the upper half-plane $\operatorname{Im}(z) > 0$ and each $\bar z_j$ in the lower half-plane $\operatorname{Im}(z) < 0$. Let $\Gamma$ be the curve shown in Figure 24.4, consisting of a semicircle $\gamma$ of radius $R$ and the segment $S$ from $-R$ to $R$ on the real axis, with $R$ large enough that $\Gamma$ encloses all the poles $z_1,\dots,z_m$ of $p(z)/q(z)$ in the upper half-plane. Then

$$\oint_\Gamma\frac{p(z)}{q(z)}\,dz = 2\pi i\sum_{j=1}^{m}\operatorname{Res}(p/q, z_j) = \int_S\frac{p(z)}{q(z)}\,dz + \int_\gamma\frac{p(z)}{q(z)}\,dz. \qquad(24.3)$$

On $S$, $z = x$ for $-R\le x\le R$, so

$$\int_S\frac{p(z)}{q(z)}\,dz = \int_{-R}^{R}\frac{p(x)}{q(x)}\,dx.$$

Next consider the integral over $\gamma$. Since the degree of $q(z)$ exceeds that of $p(z)$ by at least 2,

$$\text{degree of } z^2p(z) \le \text{degree of } q(z).$$

This means that, for sufficiently large $R$, $z^2p(z)/q(z)$ is bounded for $|z|\ge R$. That is, for some number $M$,

$$\left|\frac{z^2p(z)}{q(z)}\right| \le M \quad\text{for } |z|\ge R.$$

Then

$$\left|\frac{p(z)}{q(z)}\right| \le \frac{M}{|z|^2} \le \frac{M}{R^2} \quad\text{for } |z|\ge R,$$

so

$$\left|\int_\gamma\frac{p(z)}{q(z)}\,dz\right| \le \frac{M}{R^2}(\text{length of }\gamma) = \frac{M}{R^2}\,\pi R = \frac{\pi M}{R} \to 0 \quad\text{as } R\to\infty.$$

Thus, in the limit as $R\to\infty$ in equation (24.3), the first integral on the right has limit $\int_{-\infty}^{\infty}p(x)/q(x)\,dx$, and the second integral has limit zero. In the limit as $R\to\infty$, equation (24.3) yields

$$\int_{-\infty}^{\infty}\frac{p(x)}{q(x)}\,dx = 2\pi i\sum_{j=1}^{m}\operatorname{Res}\left(\frac{p(z)}{q(z)}, z_j\right). \qquad(24.4)$$

Equation (24.4) provides a general method for evaluating integrals of rational functions over the real line, under the assumptions made above. It is not necessary to repeat the derivation of this equation each time it is used: simply determine the zeros of $q(z)$ in the upper half-plane, evaluate the residue of $p/q$ at each such zero (which is a pole of $p/q$ whose order must be determined), and apply equation (24.4).
EXAMPLE 24.25

We will evaluate

$$\int_{-\infty}^{\infty}\frac{1}{x^6+64}\,dx.$$

Here $p(z) = 1$ and $q(z) = z^6+64$. The degree of $q$ exceeds that of $p$ by 6, and $q$ has no real zeros. The zeros of $z^6+64$ are the sixth roots of $-64$. To find these, put $-64$ in polar form:

$$-64 = 64e^{i(\pi+2n\pi)},$$

in which $n$ can be any integer. The six sixth roots of $-64$ are

$$2e^{i(\pi+2n\pi)/6}\quad\text{for } n = 0, 1, 2, 3, 4, 5.$$

The three roots in the upper half-plane are

$$z_1 = 2e^{i\pi/6},\quad z_2 = 2e^{i\pi/2} = 2i,\quad\text{and}\quad z_3 = 2e^{5i\pi/6}.$$

We need the residue of $1/(z^6+64)$ at each of these simple poles. Corollary 24.1 is convenient to use here:

$$\operatorname{Res}\left(\frac{1}{z^6+64}, 2e^{i\pi/6}\right) = \frac{1}{6(2e^{i\pi/6})^5} = \frac{1}{192}e^{-5i\pi/6},$$

$$\operatorname{Res}\left(\frac{1}{z^6+64}, 2i\right) = \frac{1}{6(2i)^5} = -\frac{i}{192},$$

and

$$\operatorname{Res}\left(\frac{1}{z^6+64}, 2e^{5i\pi/6}\right) = \frac{1}{6(2e^{5i\pi/6})^5} = \frac{1}{192}e^{-25i\pi/6} = \frac{1}{192}e^{-i\pi/6}.$$

Then

$$\int_{-\infty}^{\infty}\frac{1}{x^6+64}\,dx = 2\pi i\,\frac{1}{192}\left(e^{-5i\pi/6} - i + e^{-i\pi/6}\right) = \frac{\pi i}{96}\left[\cos\frac{5\pi}{6} - i\sin\frac{5\pi}{6} - i + \cos\frac{\pi}{6} - i\sin\frac{\pi}{6}\right].$$

Now

$$\cos\frac{5\pi}{6} + \cos\frac{\pi}{6} = 0 \quad\text{and}\quad \sin\frac{5\pi}{6} + \sin\frac{\pi}{6} = 1,$$

so

$$\int_{-\infty}^{\infty}\frac{1}{x^6+64}\,dx = \frac{\pi i}{96}(-2i) = \frac{\pi}{48}.$$
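Again a quick check, not in the text: integrating $1/(x^6+64)$ numerically over a long symmetric interval should reproduce $\pi/48 \approx 0.06545$.

```python
import math

def g(x):
    return 1.0 / (x ** 6 + 64.0)

# The integrand is even, so integrate over [0, L] and double.
# The tail beyond L = 200 is smaller than 1e-12.
n, L = 200000, 200.0
h = L / n
I = 2 * h * sum(g((k + 0.5) * h) for k in range(n))
print(I, math.pi / 48)  # both about 0.06545
```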
Integrals of $\int_{-\infty}^{\infty}(p(x)/q(x))\cos(cx)\,dx$ and $\int_{-\infty}^{\infty}(p(x)/q(x))\sin(cx)\,dx$   Suppose $p$ and $q$ are polynomials with real coefficients and no common factors, that the degree of $q$ exceeds the degree of $p$ by at least 2, and that $q$ has no real zeros. We want to evaluate integrals

$$\int_{-\infty}^{\infty}\frac{p(x)}{q(x)}\cos(cx)\,dx \quad\text{and}\quad \int_{-\infty}^{\infty}\frac{p(x)}{q(x)}\sin(cx)\,dx,$$

in which $c$ is any positive number. Again, we proceed by looking for the integral of a suitably chosen complex function over a suitably chosen closed curve. Consider

$$\oint_\Gamma\frac{p(z)}{q(z)}e^{icz}\,dz,$$

where $\Gamma$ is the closed path of the preceding subsection, enclosing all the zeros $z_1,\dots,z_m$ of $q$ lying in the upper half-plane. Here is why this integral is promising. With $\Gamma$ consisting of the semicircle $\gamma$ and the segment $S$ on the real axis, as before,

$$\oint_\Gamma\frac{p(z)}{q(z)}e^{icz}\,dz = \int_\gamma\frac{p(z)}{q(z)}e^{icz}\,dz + \int_{-R}^{R}\frac{p(x)}{q(x)}e^{icx}\,dx = \int_\gamma\frac{p(z)}{q(z)}e^{icz}\,dz + \int_{-R}^{R}\frac{p(x)}{q(x)}\cos(cx)\,dx + i\int_{-R}^{R}\frac{p(x)}{q(x)}\sin(cx)\,dx = 2\pi i\sum_{j=1}^{m}\operatorname{Res}\left(\frac{p(z)}{q(z)}e^{icz}, z_j\right).$$

As $R\to\infty$, one can show that $\int_\gamma(p(z)/q(z))e^{icz}\,dz \to 0$, leaving

$$\int_{-\infty}^{\infty}\frac{p(x)}{q(x)}\cos(cx)\,dx + i\int_{-\infty}^{\infty}\frac{p(x)}{q(x)}\sin(cx)\,dx = 2\pi i\sum_{j=1}^{m}\operatorname{Res}\left(\frac{p(z)}{q(z)}e^{icz}, z_j\right). \qquad(24.5)$$

The real part of the right side of equation (24.5) is $\int_{-\infty}^{\infty}(p(x)/q(x))\cos(cx)\,dx$, and the imaginary part is $\int_{-\infty}^{\infty}(p(x)/q(x))\sin(cx)\,dx$.

EXAMPLE 24.26

We will evaluate

$$\int_{-\infty}^{\infty}\frac{\cos(cx)}{(x^2+\alpha^2)(x^2+\beta^2)}\,dx,$$

in which $c$, $\alpha$ and $\beta$ are positive numbers and $\alpha\ne\beta$. The zeros of the denominator in the upper half-plane are $\alpha i$ and $\beta i$, and these are simple poles of

$$f(z) = \frac{e^{icz}}{(z^2+\alpha^2)(z^2+\beta^2)}.$$

Compute

$$\operatorname{Res}(f,\alpha i) = \frac{e^{ic\alpha i}}{2\alpha i(\beta^2-\alpha^2)} = \frac{e^{-c\alpha}}{2\alpha i(\beta^2-\alpha^2)}$$

and

$$\operatorname{Res}(f,\beta i) = \frac{e^{-c\beta}}{2\beta i(\alpha^2-\beta^2)}.$$
Then

$$\int_{-\infty}^{\infty}\frac{\cos(cx)}{(x^2+\alpha^2)(x^2+\beta^2)}\,dx + i\int_{-\infty}^{\infty}\frac{\sin(cx)}{(x^2+\alpha^2)(x^2+\beta^2)}\,dx = 2\pi i\left[\frac{e^{-c\alpha}}{2\alpha i(\beta^2-\alpha^2)} + \frac{e^{-c\beta}}{2\beta i(\alpha^2-\beta^2)}\right] = \frac{\pi}{\beta^2-\alpha^2}\left(\frac{e^{-c\alpha}}{\alpha} - \frac{e^{-c\beta}}{\beta}\right).$$

Separating real and imaginary parts, we have

$$\int_{-\infty}^{\infty}\frac{\cos(cx)}{(x^2+\alpha^2)(x^2+\beta^2)}\,dx = \frac{\pi}{\beta^2-\alpha^2}\left(\frac{e^{-c\alpha}}{\alpha} - \frac{e^{-c\beta}}{\beta}\right)$$

and

$$\int_{-\infty}^{\infty}\frac{\sin(cx)}{(x^2+\alpha^2)(x^2+\beta^2)}\,dx = 0.$$
The latter is obvious because the integrand is an odd function.

Integrals Using Indented Contours   Equation (24.4) enables us to evaluate certain improper integrals of quotients of polynomials, assuming that the denominator has no real zeros. We will extend this result to the case that the denominator has simple real zeros. Consider

$$\int_{-\infty}^{\infty}\frac{p(x)}{q(x)}\,dx,$$

in which $p$ and $q$ are polynomials with real coefficients and no common factors, and the degree of $q$ exceeds that of $p$ by at least 2. Suppose $q$ has complex zeros $z_1,\dots,z_m$ in the upper half-plane, as well as simple real zeros $t_1,\dots,t_k$. Let $\Gamma$ be the path of Figure 24.5, including a semicircle $\gamma$ of radius $R$ about the origin, small semicircles $\gamma_j$ of radius $\varepsilon$ centered at each real zero $t_j$, and segments $L_j$ along the real line connecting these semicircles as shown. We call such a path an indented path because of the small semicircles about the real zeros of $q(x)$. Let $\varepsilon$ be small enough that no two of these semicircles intersect, and no $t_j$ is enclosed by any $\gamma_k$ with $j\ne k$. Also suppose $R$ is large enough that $\Gamma$ encloses each $z_j$. Note that each $t_j$ is outside $\Gamma$. By the residue theorem,

$$\oint_\Gamma\frac{p(z)}{q(z)}\,dz = 2\pi i\sum_{j=1}^{m}\operatorname{Res}\left(\frac{p(z)}{q(z)}, z_j\right) = \int_\gamma\frac{p(z)}{q(z)}\,dz + \sum_{j=1}^{k}\int_{\gamma_j}\frac{p(z)}{q(z)}\,dz + \sum_{j=1}^{k+1}\int_{L_j}\frac{p(x)}{q(x)}\,dx. \qquad(24.6)$$

We want to investigate what happens in equation (24.6) when $R\to\infty$ and $\varepsilon\to0$.

FIGURE 24.5 (the indented path: the large semicircle $\gamma$, small semicircles $\gamma_1, \gamma_2, \gamma_3$ about the real zeros $t_1, t_2, t_3$, and segments $L_1,\dots,L_4$ along the real axis)
When $R\to\infty$, we claim that $\int_\gamma(p(z)/q(z))\,dz\to0$, by an argument like that we have done before. The sum of the residues at zeros of $q$ is unchanged in this limit. As $\varepsilon\to0$, the semicircles $\gamma_j$ contract to $t_j$, and the segments $L_j$ expand to cover the interval $[-R,R]$, and then the entire real line as $R\to\infty$. This means that, in equation (24.6),

$$\sum_{j=1}^{k+1}\int_{L_j}\frac{p(x)}{q(x)}\,dx \to \int_{-\infty}^{\infty}\frac{p(x)}{q(x)}\,dx.$$

It is not yet clear what happens to each integral $\int_{\gamma_j}(p(z)/q(z))\,dz$ in this process. We will show that each of these integrals approaches $-\pi i$ times the residue of $p(z)/q(z)$ at the real simple pole $t_j$. To see this, write the Laurent expansion of $p(z)/q(z)$ about $t_j$:

$$\frac{p(z)}{q(z)} = \frac{c_{-1}}{z-t_j} + \sum_{s=0}^{\infty}c_s(z-t_j)^s = \frac{c_{-1}}{z-t_j} + g(z),$$

where $g$ is differentiable at $t_j$. On $\gamma_j$, $z = t_j + \varepsilon e^{it}$, where $t$ varies from $\pi$ to 0 (for counterclockwise orientation on $\Gamma$). Then

$$\int_{\gamma_j}\frac{p(z)}{q(z)}\,dz = c_{-1}\int_{\gamma_j}\frac{1}{z-t_j}\,dz + \int_{\gamma_j}g(z)\,dz = c_{-1}\int_{\pi}^{0}\frac{1}{\varepsilon e^{it}}\,i\varepsilon e^{it}\,dt + \int_{\pi}^{0}g(t_j+\varepsilon e^{it})\,i\varepsilon e^{it}\,dt = -\pi ic_{-1} + i\varepsilon\int_{\pi}^{0}g(t_j+\varepsilon e^{it})e^{it}\,dt = -\pi i\operatorname{Res}\left(\frac{p(z)}{q(z)}, t_j\right) + i\varepsilon\int_{\pi}^{0}g(t_j+\varepsilon e^{it})e^{it}\,dt.$$

Now $i\varepsilon\int_{\pi}^{0}g(t_j+\varepsilon e^{it})e^{it}\,dt \to 0$ as $\varepsilon\to0$. Therefore

$$\int_{\gamma_j}\frac{p(z)}{q(z)}\,dz \to -\pi i\operatorname{Res}\left(\frac{p(z)}{q(z)}, t_j\right).$$

Therefore, as $R\to\infty$ and $\varepsilon\to0$ in equation (24.6), we get

$$2\pi i\sum_{j=1}^{m}\operatorname{Res}\left(\frac{p(z)}{q(z)}, z_j\right) = -\pi i\sum_{j=1}^{k}\operatorname{Res}\left(\frac{p(z)}{q(z)}, t_j\right) + \int_{-\infty}^{\infty}\frac{p(x)}{q(x)}\,dx,$$

hence

$$\int_{-\infty}^{\infty}\frac{p(x)}{q(x)}\,dx = \pi i\sum_{j=1}^{k}\operatorname{Res}\left(\frac{p(z)}{q(z)}, t_j\right) + 2\pi i\sum_{j=1}^{m}\operatorname{Res}\left(\frac{p(z)}{q(z)}, z_j\right). \qquad(24.7)$$

In a sense, the simple poles of $p(z)/q(z)$ on the real line contribute "half residues", having been enclosed by semicircles instead of circles, while the poles in the upper half-plane contribute "full residues" to this sum.
EXAMPLE 24.27

Evaluate

$$\int_{-\infty}^{\infty}\frac{3x+2}{x(x-4)(x^2+9)}\,dx.$$

Here

$$f(z) = \frac{3z+2}{z(z-4)(z^2+9)}.$$

The denominator has simple real zeros at 0 and 4, and simple complex zeros $-3i$ and $3i$. Only $3i$ is in the upper half-plane. Compute the residues:

$$\operatorname{Res}(f,0) = \lim_{z\to0}zf(z) = \frac{2}{-36} = -\frac{1}{18},$$

$$\operatorname{Res}(f,4) = \lim_{z\to4}(z-4)f(z) = \frac{14}{100} = \frac{7}{50},$$

and

$$\operatorname{Res}(f,3i) = \lim_{z\to3i}(z-3i)f(z) = \frac{9i+2}{3i(3i-4)(6i)} = \frac{2+9i}{72-54i}.$$

Then

$$\int_{-\infty}^{\infty}\frac{3x+2}{x(x-4)(x^2+9)}\,dx = \pi i\left(-\frac{1}{18}+\frac{7}{50}\right) + 2\pi i\,\frac{2+9i}{72-54i} = -\frac{14\pi}{75}.$$
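Equation (24.7) gives a principal-value result, and it can be checked numerically if the real poles at 0 and 4 are handled by symmetric pairing: adding the contributions at $x = t+u$ and $x = t-u$ cancels the $1/(x-t)$ parts, leaving a bounded integrand. This verification sketch is my addition, with arbitrary cutoffs.

```python
import math

def f(x):
    # Integrand of Example 24.27.
    return (3 * x + 2) / (x * (x - 4) * (x ** 2 + 9))

def midpoint(g, a, b, n):
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def pv_around(t, w, n):
    # Principal value of the integral of f over [t - w, t + w]:
    # pair x = t + u with x = t - u so the simple-pole parts cancel.
    return midpoint(lambda u: f(t + u) + f(t - u), 0.0, w, n)

R = 4000.0
total = (midpoint(f, -R, -1.0, 150000)
         + pv_around(0.0, 1.0, 150000)
         + midpoint(f, 1.0, 3.0, 100000)
         + pv_around(4.0, 1.0, 150000)
         + midpoint(f, 5.0, R, 150000))
print(total, -14 * math.pi / 75)  # both about -0.5864
```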
Integrals of $\int_0^{\infty}x^a\,p(x)/q(x)\,dx$   Let $0 < a < 1$. We will consider integrals of the form

$$\int_0^{\infty}x^a\,\frac{p(x)}{q(x)}\,dx,$$

in which $p$ and $q$ are polynomials with real coefficients and no common factors, $q$ has no positive zeros, and the degree of $q$ exceeds the degree of $p$ by at least 2. We also assume that either $q(0)\ne0$, or $q(z)$ has a simple zero at the origin. Let the nonzero zeros of $q$ be $z_1,\dots,z_m$. These are all the nonzero zeros of $q$, not just those in the upper half-plane. Since the coefficients of $q$ are real, this list includes complex conjugate pairs.

Choose $r$ small enough and $R$ large enough that the closed path $\Gamma$ shown in Figure 24.6 encloses $z_1,\dots,z_m$. $\Gamma$ consists of $\gamma_R$ ("most" of the circle of radius $R$ about 0), $\gamma_r$ ("most" of the circle of radius $r$ about 0), and the line segments $L_1$ and $L_2$ connecting $\gamma_r$ and $\gamma_R$. We will eventually let $r\to0$ and $R\to\infty$, but some work is required first.

We must agree on a meaning for $z^a$, since this symbol generally denotes a (possibly infinite) set of different numbers. Write $z = \rho e^{i\theta}$ for some $\theta$ in $[0,2\pi)$, and define $z^a = \rho^a e^{ia\theta}$. As $z$ approaches $L_1$,

$$f(z) = \frac{z^ap(z)}{q(z)} \to \frac{x^ap(x)}{q(x)},$$
FIGURE 24.6 (the path $\Gamma$: $\gamma_R$ is most of the circle of radius $R$ about 0, $\gamma_r$ is most of the circle of radius $r$, and $L_1$, $L_2$ are segments along the positive real axis connecting them; the nonzero zeros $z_1,\dots,z_m$ of $q$ lie between the two circles)
where $r < x < R$. But as $z$ approaches $L_2$, the lower side of the positive real axis,

$$f(z) \to \frac{x^ae^{2\pi ai}p(x)}{q(x)}.$$

The reason for this is that the argument increases by $2\pi$ as $z$ approaches the positive real axis from below, and then

$$z^a = \rho^ae^{i(\theta+2\pi)a} = \rho^ae^{ia\theta}e^{2\pi ai}.$$

By the residue theorem,

$$\oint_\Gamma\frac{z^ap(z)}{q(z)}\,dz = 2\pi i\sum_{j=1}^{m}\operatorname{Res}(f, z_j) = \int_{\gamma_R}\frac{z^ap(z)}{q(z)}\,dz + \int_{\gamma_r}\frac{z^ap(z)}{q(z)}\,dz + \int_{L_1}\frac{x^ap(x)}{q(x)}\,dx + \int_{L_2}\frac{x^ae^{2\pi ai}p(x)}{q(x)}\,dx.$$

On $L_1$, $x$ varies from $r$ to $R$, while on $L_2$, $x$ varies from $R$ to $r$ to maintain counterclockwise orientation on $\Gamma$. The last equation becomes

$$2\pi i\sum_{j=1}^{m}\operatorname{Res}(f,z_j) = \int_{\gamma_R}\frac{z^ap(z)}{q(z)}\,dz + \int_{\gamma_r}\frac{z^ap(z)}{q(z)}\,dz + \int_{r}^{R}\frac{x^ap(x)}{q(x)}\,dx + \int_{R}^{r}\frac{x^ae^{2\pi ai}p(x)}{q(x)}\,dx.$$

By making estimates on the two circular arcs, we can show that the first two integrals in the last equation tend to zero as $r\to0$ and $R\to\infty$. In this limit, the last equation becomes

$$2\pi i\sum_{j=1}^{m}\operatorname{Res}(f,z_j) = \int_0^{\infty}\frac{x^ap(x)}{q(x)}\,dx + \int_{\infty}^{0}\frac{x^ae^{2\pi ai}p(x)}{q(x)}\,dx.$$

From this we obtain

$$\int_0^{\infty}\frac{x^ap(x)}{q(x)}\,dx = \frac{2\pi i}{1-e^{2\pi ai}}\sum_{j=1}^{m}\operatorname{Res}(f,z_j). \qquad(24.8)$$
EXAMPLE 24.28

We will use equation (24.8) to evaluate

$$\int_0^{\infty}\frac{x^{1/3}}{x(x^2+1)}\,dx.$$

Here $p(z) = 1$, $a = \frac13$ and $q(z) = z(1+z^2)$, with simple zeros at 0, $i$ and $-i$. Compute

$$\operatorname{Res}\left(\frac{z^{1/3}}{q(z)}, i\right) = \frac{i^{1/3}}{2i^2} = -\frac{(e^{i\pi/2})^{1/3}}{2} = -\frac12 e^{i\pi/6}$$

and

$$\operatorname{Res}\left(\frac{z^{1/3}}{q(z)}, -i\right) = \frac{(e^{3i\pi/2})^{1/3}}{-2} = -\frac12 e^{i\pi/2}.$$

Then

$$\int_0^{\infty}\frac{x^{1/3}}{x(x^2+1)}\,dx = \frac{2\pi i}{1-e^{2\pi i/3}}\left(-\frac12 e^{i\pi/6} - \frac12 e^{i\pi/2}\right) = \frac{2\pi i}{1-e^{2\pi i/3}}\left(-\frac{\sqrt3}{4} - \frac34 i\right) = \pi.$$
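The value $\pi$ in Example 24.28 can be confirmed numerically. Substituting $x = u^3$ (my device, not the text's) removes the $x^{-2/3}$ singularity at the origin, since $\int_0^\infty x^{-2/3}/(1+x^2)\,dx = 3\int_0^\infty du/(1+u^6)$.

```python
import math

# With x = u^3 the integral of Example 24.28 becomes
# 3 * integral of 1/(1 + u^6) over [0, inf), which is smooth at u = 0.
# The tail beyond L = 200 is negligible (below 1e-11).
n, L = 200000, 200.0
h = L / n
I = 3 * h * sum(1.0 / (1.0 + ((k + 0.5) * h) ** 6) for k in range(n))
print(I, math.pi)  # both about 3.14159
```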
Many other kinds of real integrals can be evaluated using complex integral techniques. Some of these require considerable ingenuity in seeking the right function to integrate over the right path to get the result that is wanted.

The Cauchy Principal Value   Since we have been dealing with improper integrals, we will mention the Cauchy principal value. An integral

$$I = \int_{-\infty}^{\infty}g(x)\,dx$$

is defined to be

$$\lim_{r\to-\infty}\int_r^0 g(x)\,dx + \lim_{R\to\infty}\int_0^R g(x)\,dx$$

if both of these integrals converge. These limits are independent of each other. The Cauchy principal value of $I$ is defined to be

$$\operatorname{CPV}\int_{-\infty}^{\infty}g(x)\,dx = \lim_{R\to\infty}\int_{-R}^{R}g(x)\,dx.$$

This is a special case of the two independent limits defining $I$. In the event that $\int_{-\infty}^{\infty}g(x)\,dx$ converges, certainly the value of $I$ agrees with the Cauchy principal value of the integral. However, it is possible for an integral to have a finite CPV, but to diverge in the broader sense of the definition of $I$. This occurs with $\int_{-\infty}^{\infty}x\,dx$, which certainly diverges. However, this integral has Cauchy principal value 0, because for any positive $R$,

$$\int_{-R}^{R}x\,dx = 0.$$

In some of the examples we have discussed, we were actually computing Cauchy principal values, whenever we took a limit of $\int_{-R}^{R}g(x)\,dx$ as $R\to\infty$. In most cases the conditions imposed ensured that the improper integral converged in the more general sense as well.
SECTION 24.3   PROBLEMS

1. Evaluate $\oint_\Gamma \dfrac{z}{1+z^2}\,dz$, with $\Gamma$ the circle $|z| = 2$, first using the residue theorem, and then by the argument principle.

2. Evaluate $\oint_\Gamma \tan\pi z\,dz$, with $\Gamma$ the circle $|z| = \pi$, first by the residue theorem, and then by the argument principle.

3. Evaluate $\oint_\Gamma \dfrac{z+1}{z^2+2z+4}\,dz$, with $\Gamma$ the circle $|z| = 1$, first by the residue theorem, and then by the argument principle.

4. Let $p(z) = (z-z_1)(z-z_2)\cdots(z-z_n)$, with $z_1,\dots,z_n$ distinct complex numbers. Let $\Gamma$ be a positively oriented closed path enclosing each of the $z_j$'s. Prove that
$$\oint_\Gamma\frac{p'(z)}{p(z)}\,dz = 2\pi in,$$
first by the residue theorem, and then by the argument principle.

In each of Problems 5 through 9, find an inverse Laplace transform of the function, using residues.

5. $\dfrac{z}{z^2+9}$

6. $\dfrac{1}{(z+3)^2}$

7. $\dfrac{1}{(z-2)^2(z+4)}$

8. $\dfrac{1}{(z^2+9)(z-2)^2}$

9. $(z+5)^{-3}$

In each of Problems 10 through 22, evaluate the integral.

10. $\displaystyle\int_0^{2\pi}\frac{1}{6+\sin\theta}\,d\theta$

11. $\displaystyle\int_0^{2\pi}\frac{1}{2-\cos\theta}\,d\theta$

12. $\displaystyle\int_{-\infty}^{\infty}\frac{1}{x^4+1}\,dx$

13. $\displaystyle\int_{-\infty}^{\infty}\frac{1}{x^6+1}\,dx$

14. $\displaystyle\int_{-\infty}^{\infty}\frac{1}{x(x+4)(x^2+16)}\,dx$

15. $\displaystyle\int_{-\infty}^{\infty}\frac{1}{(x-4)(x^5+1)}\,dx$

16. $\displaystyle\int_{-\infty}^{\infty}\frac{\sin x}{x^2-4x+5}\,dx$

17. $\displaystyle\int_{-\infty}^{\infty}\frac{1}{x^2-2x+6}\,dx$

18. $\displaystyle\int_{-\infty}^{\infty}\frac{\cos^2 x}{(x^2+4)^2}\,dx$

19. $\displaystyle\int_0^{\infty}\frac{x^{3/4}}{x^4+1}\,dx$

20. $\displaystyle\int_{-\infty}^{\infty}\frac{x\sin 2x}{x^4+16}\,dx$

21. $\displaystyle\int_0^{2\pi}\frac{1}{2+\sin^2\theta}\,d\theta$

22. $\displaystyle\int_0^{2\pi}\frac{\sin\theta+\cos\theta}{2-\cos\theta}\,d\theta$

23. Let $\beta$ be a positive number. Show that
$$\int_{-\infty}^{\infty}\frac{\cos\beta x}{x^2+1}\,dx = \pi e^{-\beta}.$$

24. Let $\alpha$ and $\beta$ be positive numbers. Show that
$$\int_{-\infty}^{\infty}\frac{\cos\beta x}{(x^2+\alpha^2)^2}\,dx = \frac{\pi}{2\alpha^3}(1+\alpha\beta)e^{-\alpha\beta}.$$

25. Let $\alpha$ and $\beta$ be distinct positive numbers. Show that
$$\int_0^{2\pi}\frac{1}{\alpha^2\cos^2\theta+\beta^2\sin^2\theta}\,d\theta = \frac{2\pi}{\alpha\beta}.$$

26. Let $\alpha$ be a positive number. Show that
$$\int_0^{\pi/2}\frac{1}{\alpha+\sin^2\theta}\,d\theta = \frac{\pi}{2\sqrt{\alpha(1+\alpha)}}.$$

27. Let $\beta$ be a positive number. Show that
$$\int_0^{\infty}e^{-x^2}\cos 2\beta x\,dx = \frac{\sqrt\pi}{2}e^{-\beta^2}.$$
Hint: Integrate $e^{-z^2}$ about the rectangular path having corners at $\pm R$ and $\pm R + i\beta$. Use Cauchy's theorem to evaluate this integral, set this equal to the sum of the integrals on the sides of the rectangle, and take the limit as $R\to\infty$. Assume the standard result that
$$\int_0^{\infty}e^{-x^2}\,dx = \frac{\sqrt\pi}{2}.$$

28. Derive Fresnel's integrals:
$$\int_0^{\infty}\cos(x^2)\,dx = \int_0^{\infty}\sin(x^2)\,dx = \frac12\sqrt{\frac{\pi}{2}}.$$
Hint: Integrate $e^{iz^2}$ over the closed path bounding the sector $0\le|z|\le R$, $0\le\theta\le\pi/4$, shown in Figure 24.7. Use Cauchy's theorem to evaluate this integral, then evaluate it as the sum of integrals over the boundary segments of the sector. Show that the integral over the circular arc tends to zero as $R\to\infty$, and use the integrals over the straight segments to obtain Fresnel's integrals.

FIGURE 24.7 (the sector of radius $R$ between the positive real axis and the ray $\theta = \pi/4$)

29. Let $\alpha$ and $\beta$ be positive numbers. Show that
$$\int_0^{\infty}\frac{x\sin\alpha x}{x^4+\beta^4}\,dx = \frac{\pi}{2\beta^2}e^{-\alpha\beta/\sqrt2}\sin\frac{\alpha\beta}{\sqrt2}.$$

30. Let $0 < \beta < \alpha$. Show that
$$\int_0^{2\pi}\frac{1}{(\alpha+\beta\cos\theta)^2}\,d\theta = \frac{2\pi\alpha}{(\alpha^2-\beta^2)^{3/2}}.$$
CHAPTER 25

Conformal Mappings

FUNCTIONS AS MAPPINGS
CONFORMAL MAPPINGS
CONSTRUCTION OF CONFORMAL MAPPINGS BETWEEN DOMAINS
HARMONIC FUNCTIONS AND THE DIRICHLET PROBLEM
COMPLEX FUNCTION MODELS

In the calculus of real functions of a single real variable, we may gain some insight into the function's behavior by sketching its graph. For complex functions we cannot make the same kind of graph, since a complex variable $z = x+iy$ by itself involves two real variables. However, we can set $w = f(z)$ and make two copies of the complex plane, one for $z$ and the other for image points $w$. As $z$ traces out a path or varies over a set $S$ in the $z$ plane, we can plot image points $w = f(z)$ in the $w$ plane, yielding a picture of how the function acts on this path or on points in $S$. The set of all image points $f(z)$ for $z$ in $S$ is denoted $f(S)$. When looked at in this way, a function is called a mapping or transformation. This idea is diagrammed in Figure 25.1.

FIGURE 25.1 (a set $S$ in the $z$ plane and its image $f(S)$ in the $w$ plane under the mapping $w = f(z)$)

Thinking of a function as a mapping can be a powerful tool in solving certain kinds of problems, including the analysis of fluid motion and the solution of partial differential equations, particularly Dirichlet problems. We will develop some ideas about mappings, then turn to applications.
25.1 Functions as Mappings

First we need some terminology. Let $f$ be a complex function and $D$ a set of points in the plane on which $f(z)$ is defined. Let $D^*$ also be a set of complex numbers.

DEFINITION 25.1

1. $f$ maps $D$ into $D^*$ if $f(z)$ is in $D^*$ for every $z$ in $D$. In this event, we write $f: D\to D^*$.
2. $f$ maps $D$ onto $D^*$ if $f(z)$ is in $D^*$ for every $z$ in $D$ and, conversely, if $w$ is in $D^*$, then there is some $z$ in $D$ such that $w = f(z)$. In this event, we call $f$ an onto mapping.

Thus $f: D\to D^*$ is onto if every point of $D^*$ is the image under $f$ of some point in $D$.
EXAMPLE 25.1

Let $f(z) = iz$ for $|z|\le1$. Then $f$ acts on points in the closed unit disk $D: |z|\le1$. If $z$ is in $D$, then $|f(z)| = |iz| = |z|\le1$, so the image of any point in this disk is also in this disk. Here $f$ maps $D$ into $D$. (So $D^* = D$ in the definition.)

This mapping is onto. If $w$ is in $D$, then $z = w/i$ is also in $D$, and

$$f(z) = f(w/i) = i(w/i) = w.$$

Every point in the unit disk is the image of some point in this disk under this mapping.

We can envision this mapping geometrically. Since $i = e^{i\pi/2}$, if $z = re^{i\theta}$, then

$$f(z) = iz = re^{i\pi/2}e^{i\theta} = re^{i(\theta+\pi/2)},$$

so $f$ takes $z$ and adds $\pi/2$ to its argument. This rotates the line from the origin to $z$ by $\pi/2$ radians counterclockwise. The action of $f$ on $z$ can be envisioned as in Figure 25.2. Since this function is simply a counterclockwise rotation by $\pi/2$ radians, it is clear why $f$ maps the unit disk onto itself.

FIGURE 25.2  The mapping $f(z) = iz$ for $|z|\le1$.
Often we have the function f and the set D of complex numbers to which we want to apply this mapping. We then must analyze fz to determine the image of D under the mapping. In effect, we are finding D∗ so that f is a mapping of D onto D∗ .
EXAMPLE 25.2
Let $f(z) = z^2$ for $z$ in the wedge $D$ shown in Figure 25.3. $D$ consists of all complex numbers on or between the nonnegative real axis and the line $y = x$.
FIGURE 25.3  $w = z^2$ maps $D$ one-to-one onto $D^*$.
In polar form, $z = re^{i\theta}$ is in $D$ if $0\le\theta\le\pi/4$. Then $f(z) = z^2 = r^2e^{2i\theta}$, so $f$ has the effect of squaring the magnitude of $z$ and doubling its argument. If $z$ has argument between 0 and $\pi/4$, then $z^2$ has argument between 0 and $\pi/2$. This spreads the wedge $D$ out to cover the entire right quarter-plane, consisting of points on or between the nonnegative real and imaginary axes. If we call this right quarter-plane $D^*$, then $f$ maps $D$ onto $D^*$.

Some functions map more than one point to the same image. For example, $f(z) = \sin z$ maps all integer multiples of $\pi$ to zero. If each image point is the image of exactly one point, then the mapping is called one-to-one.
DEFINITION 25.2
A mapping $f: D\to D^*$ is one-to-one if distinct points of $D$ map to distinct points in $D^*$.
Thus $f$ is one-to-one (or $1-1$) if $z_1\ne z_2$ implies that $f(z_1)\ne f(z_2)$. The notions of one-to-one and onto are independent of each other. A mapping may have one of these properties, or both, or neither. The mapping $f(z) = z^2$ of Example 25.2 maps the wedge $0\le\arg(z)\le\pi/4$ in one-to-one fashion onto the right quarter-plane. However, $f(z) = z^2$ does not map the entire complex plane to the complex plane in a one-to-one manner, since $f(-z) = f(z)$. This function does map the plane onto itself, since, given any complex number $w$, there is some $z$ such that $f(z) = z^2 = w$.
EXAMPLE 25.3

Let $h(z) = z^2$ for all $z$. $h$ maps the entire plane onto itself, but is not one-to-one. If $z = x+iy$, then

$$h(z) = x^2 - y^2 + 2ixy = u + iv,$$

where $u = x^2 - y^2$ and $v = 2xy$. We will use this formulation to determine the image under $h$ of a vertical line $x = a$. Any point on this line has the form $z = a+iy$, and maps to

$$h(a+iy) = u + iv = a^2 - y^2 + 2iay.$$

Points on the line $x = a$ map to points $(u, v)$ with $u = a^2 - y^2$ and $v = 2ay$. Write $y = v/2a$ (assuming that $a\ne0$) to obtain

$$u = a^2 - \frac{v^2}{4a^2}, \quad\text{or}\quad v^2 = 4a^2(a^2 - u),$$

the equation of a parabola in the $(u, v)$ plane. $h$ maps vertical lines $x = a\ne0$ to parabolas.
If a = 0, the vertical line x = a is the imaginary axis, consisting of points z = iy. Now hz = −y2 , so h maps the imaginary axis in the x y plane to the nonpositive part of the real axis in the u v plane. Figure 25.4 shows the parabolic image of a line x = a = 0. The larger a is, the more the parabola opens up to the left (intersects the v axis further from the origin). As a is chosen smaller, approaching 0, these parabolas become “flatter”, approaching the nonpositive real axis in the u v plane. A horizontal line y = b consists of points z = x + ib, which map to hz = x + ib2 = x2 − b2 + 2ixb = u + iv Now u = x2 − b2 and v = 2xb, so, for b = 0, v2 = 4b2 b2 + u A typical such parabola is shown in Figure 25.5, opening up to the right. These parabolas also upen up more the larger b is. If b = 0, the line y = b is the real axis in the z plane, and this maps to hx = x2 , giving the nonnegative real axis in the u v plane as x takes on all real values.
FIGURE 25.4 w = z^2 maps vertical lines to parabolas opening left.

FIGURE 25.5 w = z^2 maps horizontal lines to parabolas opening right.
EXAMPLE 25.4
We will look at the exponential function w = E(z) = e^z as a mapping. Write

w = u + iv = e^{x+iy} = e^x cos(y) + i e^x sin(y),

so u = e^x cos(y) and v = e^x sin(y). As a mapping of the entire plane to itself, E is not onto (no number maps to zero), and E is also not one-to-one (all points z + 2nπi have the same image, for n any integer). Consider a vertical line x = a in the (x, y) plane. The image of this line consists of points u + iv with

u = e^a cos(y), v = e^a sin(y).

Then u^2 + v^2 = e^{2a}, so the line x = a maps to the circle of radius e^a about the origin in the (u, v) plane. Actually, as the point z = a + iy moves along this vertical line, the image point u + iv makes one complete circuit around the circle every time y varies over an interval of length 2π, since cos(y + 2nπ) = cos(y) and sin(y + 2nπ) = sin(y). We may therefore think of a vertical line as infinitely many intervals of length 2π strung together, and the exponential function wraps each interval once around the circle u^2 + v^2 = e^{2a} (Figure 25.6).
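Both facts, that the image lies on the circle of radius e^a and that the line wraps around it with period 2π, can be checked with a few lines of Python (an added illustration, not part of the text):

```python
import cmath
import math

# The image of the vertical line x = a under w = e^z lies on the circle
# |w| = e^a, and y and y + 2*pi give the same image point.
a = 1.5
for y in [0.0, 1.0, 3.0, -2.0]:
    w = cmath.exp(complex(a, y))
    assert abs(abs(w) - math.exp(a)) < 1e-12        # on u^2 + v^2 = e^(2a)
    w2 = cmath.exp(complex(a, y + 2 * math.pi))
    assert abs(w - w2) < 1e-9                        # period 2*pi*i: same point
print("the line x = 1.5 wraps onto the circle of radius e^1.5")
```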
FIGURE 25.6 w = e^z wraps a vertical line around a circle, covering the circle once for every interval of length 2π.

FIGURE 25.7 w = e^z maps horizontal lines to half-rays from the origin.

FIGURE 25.8 w = e^z maps the rectangle shown to a wedge bounded by two half-rays and two circles.
The image of a point z = x + ib on the horizontal line y = b is a point u + iv with

u = e^x cos(b), v = e^x sin(b).

As x varies over the real line, e^x varies from 0 to ∞ over the positive real axis. The point (e^x cos(b), e^x sin(b)) moves along a half-line from the origin to infinity, making an angle of b radians with the positive real axis (Figure 25.7). In polar coordinates this half-line is θ = b. Using these results, we can find the image of any rectangle in the (x, y) plane having sides parallel to the axes. Let the rectangle have sides on the lines x = a, x = b, y = c and y = d (in the (x, y) plane in Figure 25.8). These lines map, respectively, to the circles

u^2 + v^2 = e^{2a}, u^2 + v^2 = e^{2b}

and the half-lines

θ = c and θ = d.

The wedge in the (u, v) plane in Figure 25.8 is the image of the rectangle under this exponential mapping. Given a mapping f and a domain D, here is a strategy that is often successful in determining f(D). Suppose D has a boundary made up of curves Γ1, …, Γn. Find the images of these curves, f(Γ1), …, f(Γn). These form curves in the w plane, bounding two sets, labeled I and II in Figure 25.9. f(D) is one of these two sets. To determine which it is, choose a convenient point ζ in D and locate f(ζ). The set containing f(ζ) will be f(D).
EXAMPLE 25.5
We will determine the image, under the mapping w = f(z) = sin(z), of the strip S consisting of all z with −π/2 < Re(z) < π/2 and Im(z) > 0. S is shown in Figure 25.10.
FIGURE 25.9

FIGURE 25.10 Strip bounded by the vertical lines x = −π/2 and x = π/2 and the x axis.
The boundary of S consists of the segment −π/2 ≤ x ≤ π/2 on the real axis, together with the half-lines x = −π/2 and x = π/2 for y ≥ 0. We will carry out the strategy of looking at images of the lines bounding S. First,

w = u + iv = sin(x)cosh(y) + i cos(x)sinh(y).

If x = −π/2, then w = u + iv = −cosh(y). Since 0 ≤ y < ∞ on this part of the boundary of S, cosh(y) varies from 1 to ∞. The image of the left vertical boundary of S is therefore the interval (−∞, −1] on the real axis in the (u, v) plane. If x = π/2, a similar analysis shows that the image of the right vertical boundary of S is [1, ∞) on the real axis in the (u, v) plane. Finally, if y = 0, then w = sin(x). As x varies from −π/2 to π/2, sin(x) varies from −1 to 1. Thus [−π/2, π/2] maps to [−1, 1] in the (u, v) plane. Figure 25.11 shows these results. The boundary of S maps onto the entire real axis in the (u, v) plane. This axis is the boundary of two sets in the w plane, the upper half-plane and the lower half-plane. Choose any convenient z in S, say z = i. Its image is w = sin(i) = i sinh(1), lying in the upper half-plane. Therefore the image of S is the upper half-plane. Orientation plays a role in these mappings. Imagine walking along the boundary of S in a counterclockwise sense. This means that we start somewhere up the left boundary x = −π/2, walk down this line to the real axis, then turn left and walk along this axis to x = π/2, then turn left again and proceed up the right boundary line. Follow the movement of the image point f(z)
FIGURE 25.11 w = sin(z) maps x = −π/2, y ≥ 0, to u ≤ −1; −π/2 ≤ x ≤ π/2 to −1 ≤ u ≤ 1; and x = π/2, y ≥ 0, to u ≥ 1.
as z takes this route. As z moves down the line x = −π/2, f(z) = sin(z) begins somewhere to the left of −1 on the real axis in the w plane, and moves toward w = −1. As z turns the first corner and moves along the real axis in the z plane from −π/2 to π/2, f(z) continues through −1 and proceeds along the real axis to w = +1. Finally, z turns upward and moves along the line x = π/2, and f(z) moves through w = 1 and out the real axis in the w plane. As z traverses the boundary of the strip in a counterclockwise sense (interior of S to the left), f(z) traverses the boundary of the upper half-plane from left to right (interior of this half-plane to the left). In this example, as z moves over the boundary of D in a positive (counterclockwise) sense, f(z) moves over the boundary of f(D) in a positive sense. In the next section we will discuss mappings that preserve angles and sense of rotation.
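The conclusion of Example 25.5 can be confirmed numerically. The Python sketch below (an added illustration, not part of the text) samples points of the strip S and checks that every image under sin lands in the upper half-plane:

```python
import cmath
import math

# For z = x + iy with |x| < pi/2 and y > 0, Im(sin z) = cos(x) sinh(y) > 0,
# so every sample image should lie in the upper half-plane.
for x in [-1.5, -0.5, 0.0, 0.7, 1.5]:
    for y in [0.1, 1.0, 3.0]:
        assert abs(x) < math.pi / 2          # z is inside the strip S
        w = cmath.sin(complex(x, y))
        assert w.imag > 0                    # image lies in Im(w) > 0
# the interior point i used in the text maps to i*sinh(1)
assert abs(cmath.sin(1j) - 1j * math.sinh(1.0)) < 1e-12
print("images of all sample strip points lie in the upper half-plane")
```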
SECTION 25.1
PROBLEMS
1. In each of (a) through (e), find the image of the given rectangle under the mapping w = e^z. Sketch the rectangle in the z plane and its image in the w plane.
(a) 0 ≤ x ≤ π, 0 ≤ y ≤ π
(b) −1 ≤ x ≤ 1, −π/2 ≤ y ≤ π/2
(c) 0 ≤ x ≤ 1, 0 ≤ y ≤ π/4
(d) 1 ≤ x ≤ 2, 0 ≤ y ≤ π
(e) −1 ≤ x ≤ 2, −π/2 ≤ y ≤ π/2
2. In each of (a) through (e), find the image of the rectangle under the mapping w = cos(z). Sketch the rectangle and its image in each case.
(a) 0 ≤ x ≤ 1, 1 ≤ y ≤ 2
(b) π/2 ≤ x ≤ π, 1 ≤ y ≤ 3
(c) 0 ≤ x ≤ π, π ≤ y ≤ 2π
(d) π ≤ x ≤ 2π, 1 ≤ y ≤ 2
(e) 0 ≤ x ≤ π/2, 0 ≤ y ≤ 1
3. In each of (a) through (e), find the image of the rectangle under the mapping w = 4 sin(z). Sketch the rectangle and its image in each case.
(a) 0 ≤ x ≤ π/2, 0 ≤ y ≤ π/2
(b) π/4 ≤ x ≤ π/2, 0 ≤ y ≤ π/4
(c) 0 ≤ x ≤ 1, 0 ≤ y ≤ π/6
(d) π/3 ≤ x ≤ π/2, 0 ≤ y ≤ π/2
(e) 1 ≤ x ≤ 2, 1 ≤ y ≤ 2
4. Determine the image of the sector π/4 ≤ θ ≤ 5π/4 under the mapping w = z^2. Sketch the sector and its image.
5. Determine the image of the sector π/6 ≤ θ ≤ π/3 under the mapping w = z^3. Sketch the sector and its image.
6. Show that the mapping

w = (1/2)(z + 1/z)

maps the circle |z| = r onto an ellipse with foci 1 and −1 in the w plane. Sketch a typical circle and its image.
7. Show that the mapping of Problem 6 maps a half-line θ = constant onto a hyperbola with foci ±1 in the w plane. Sketch a typical half-line and its image.
8. Show that the mapping w = 1/z maps every straight line to a circle or straight line, and every circle to a circle or straight line. Give an example of a circle that maps to a line, and a line that maps to a circle.
9. Determine the image of the infinite strip 0 ≤ Im(z) ≤ 2 under the mapping w = e^z.
10. Let D consist of all z in the rectangle having vertices ±α and ±α + i, with α a positive number.
(a) Determine the image of D under the mapping w = cos(z). Sketch D and its image.
(b) Determine the image of D under the mapping w = sin(z). Sketch this image.
(c) Determine the image of D under the mapping w = 2z^2. Sketch this image.
25.2 Conformal Mappings

Let f : D → D∗ be a mapping.
DEFINITION 25.3
Angle Preserving Mapping
We say that f preserves angles if, for any z0 in D, whenever two smooth curves in D intersect at z0 and the angle between these curves at z0 is θ, then the images of these curves intersect at the same angle θ at f(z0).
This idea is illustrated in Figure 25.12. The images of Γ1 and Γ2 are curves f(Γ1) and f(Γ2) in D∗. Suppose Γ1 and Γ2 intersect at z0, and that their tangents have an angle θ between them there. We require that the tangents to f(Γ1) and f(Γ2) intersect at f(z0) at the same angle. If this condition holds for all smooth curves passing through each point of D, then f is angle preserving on D.
w
!2
f(!2)
!1
f(z 0)
z0
FIGURE 25.12
DEFINITION 25.4
f(!1)
Angle-preserving mapping.
Orientation Preserving Mapping
f preserves orientation if a counterclockwise rotation in D is mapped by f to a counterclockwise rotation in D∗ .
This idea is illustrated in Figure 25.13. If L1 and L2 are lines through any point z0 in D, and the sense of rotation from L1 to L2 is counterclockwise, then the sense of rotation from f(L1) to f(L2) through f(z0) in D∗ must also be counterclockwise. Of course f(L1) and f(L2) need not be straight lines, but one can still consider a sense of rotation from the tangent to f(L1) to the tangent to f(L2) at f(z0). By contrast, Figure 25.14 illustrates a mapping that does not preserve orientation.
L2 z0
L1
FIGURE 25.13
mapping.
w
f (L 2 ) f (z 0)
z
f (L 1)
Orientation-preserving
L2 L1
FIGURE 25.14
mapping.
w f (L 1) f (L 2 )
A nonorientation-preserving
Preservation of angles and orientation are independent concepts. A mapping may preserve one but not the other. If f : D → D∗ preserves both, we say that f is conformal.
DEFINITION 25.5
Conformal Mapping
f : D → D∗ is a conformal mapping if f is angle preserving and orientation preserving.
The following theorem generates many examples of conformal mappings.

THEOREM 25.1

Let f : D → D∗ be a differentiable function defined on a domain D. Suppose f′(z) ≠ 0 for all z in D. Then f is conformal.

Thus, a differentiable function with a nonvanishing derivative on a domain (connected, open set) maps this set in such a way as to preserve both angles and orientation. We will sketch an argument showing why this is true. Let z0 be in D and let Γ be a smooth curve in D through z0. Then f(Γ) is a smooth curve through f(z0) in D∗ (Figure 25.15). If w = f(z) and w0 = f(z0), then

w − w0 = [(f(z) − f(z0))/(z − z0)] (z − z0).

Now recall that the argument behaves like a logarithm, in the sense that any argument of a product is a sum of arguments of the individual factors. Then

arg(w − w0) = arg[(f(z) − f(z0))/(z − z0)] + arg(z − z0).    (25.1)

In Figure 25.16, θ is the angle between the positive real axis and the line through z and z0, so θ is an argument of z − z0. The angle ψ between the positive real axis and the line through w and w0 in the w plane is an argument of w − w0. In the limit as z → z0, equation (25.1) gives us

ψ = arg f′(z0) + θ.

It is here that we use the assumption that f′(z0) ≠ 0, because 0 has no argument. If Γ∗ is another smooth curve through z0, then by the same reasoning,

ψ∗ = arg f′(z0) + θ∗.

Then

ψ − ψ∗ = θ − θ∗
FIGURE 25.15

FIGURE 25.16
(to within integer multiples of 2π). But θ − θ∗ is the angle between the tangents to Γ and Γ∗ at z0, and ψ − ψ∗ is the angle between the tangents to f(Γ) and f(Γ∗) at f(z0). Therefore f preserves angles. The last "equation" also implies that f preserves orientation, since the sense of rotation from Γ to Γ∗ is the same as the sense of rotation from f(Γ) to f(Γ∗). A reversal of the sense of rotation would be implied if we had found that ψ − ψ∗ = θ∗ − θ. For example, w = sin(z) is differentiable, with a nonvanishing derivative, on the strip −π/2 < Re(z) < π/2, and so is a conformal mapping of this strip onto a set in the w plane. A composition of conformal mappings is conformal. Suppose f maps D conformally onto D∗, and g maps D∗ conformally onto D∗∗. Then g ∘ f maps D conformally onto D∗∗ (Figure 25.17), because angles and orientation are preserved at each stage of the mapping. We will now consider an important class of conformal mappings.
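The angle-preservation argument above can be illustrated numerically. In the sketch below (added here; the choice f(z) = z^2 and the sample directions are arbitrary), tangent directions d at z0 map to f′(z0)·d, so the angle between two curves is unchanged wherever f′(z0) ≠ 0:

```python
import cmath

# Tangent directions d at z0 map to f'(z0) * d, so the angle between two
# curves through z0 is preserved wherever f'(z0) is nonzero.
def angle_between(d1, d2):
    return abs(cmath.phase(d2 / d1))

z0 = 1.0 + 2.0j
fprime = 2 * z0                              # f(z) = z^2, f'(z0) = 2 z0 != 0
d1, d2 = cmath.exp(0.3j), cmath.exp(1.1j)    # two tangent directions at z0
before = angle_between(d1, d2)
after = angle_between(fprime * d1, fprime * d2)
assert abs(before - after) < 1e-12
print("the angle between image tangents equals the original angle")
```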
25.2.1 Linear Fractional Transformations

Often we have domains D and D∗ (for example, representing areas of a fluid flow), and we want to produce a conformal mapping of D onto D∗. This can be a formidable task. Linear fractional transformations are relatively simple conformal mappings that will sometimes serve this purpose.
DEFINITION 25.6
Linear Fractional Transformation
A linear fractional transformation is a function

T(z) = (az + b)/(cz + d),

in which a, b, c and d are given complex numbers, and ad − bc ≠ 0.
Other names for this kind of function are Möbius transformation and bilinear transformation. The function is defined except at z = −d/c, which is a simple pole of T. Further,

T′(z) = (ad − bc)/(cz + d)^2,
FIGURE 25.17 A composition of conformal mappings is conformal.
and this is nonzero if z ≠ −d/c. T is therefore a conformal mapping of the plane with the point z = −d/c removed. The condition ad − bc ≠ 0 ensures that T is one-to-one, hence invertible. If we set w = (az + b)/(cz + d), then the inverse mapping is

z = (dw − b)/(−cw + a),

which is also a linear fractional transformation. We will look at some special kinds of linear fractional transformations.
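The inverse formula is easy to verify by a round trip. In the sketch below (added here; the coefficient values are arbitrary sample choices), applying T and then the stated inverse returns each sample point:

```python
# A linear fractional transformation and its inverse z = (dw - b)/(-cw + a),
# checked by composing the two maps.
def T(z, a, b, c, d):
    assert a * d - b * c != 0          # required for invertibility
    return (a * z + b) / (c * z + d)

def T_inv(w, a, b, c, d):
    return (d * w - b) / (-c * w + a)

a, b, c, d = 1 + 1j, 2.0, 1j, 3.0
for z in [0.0, 1 + 2j, -4j, 5.0]:
    w = T(z, a, b, c, d)
    assert abs(T_inv(w, a, b, c, d) - z) < 1e-12
print("T_inv(T(z)) = z for all sample points")
```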
EXAMPLE 25.6
Let w = T(z) = z + b, with b constant. This is called a translation because T shifts z horizontally by Re(b) units and vertically by Im(b) units. For example, if T(z) = z + 2 − i, then T takes z and moves it two units to the right and one unit down (Figure 25.18). We can see this with the following points and their images:

0 → 2 − i,
1 → 3 − i
i → 2
4 + 3i → 6 + 2i
EXAMPLE 25.7
Let w = T(z) = az, with a a nonzero constant. This is called a rotation/magnification. To see why, first observe that

|w| = |a| |z|.

If |a| > 1, this transformation lengthens a complex number, in the sense of lengthening the line from the origin to z. If |a| < 1 it shortens this distance. Thus the term magnification. Now write the polar forms z = re^{iθ} and a = Ae^{iφ}. Then

T(z) = Are^{i(θ+φ)},

so the transformation adds φ to the argument of any nonzero complex number. This rotates the number counterclockwise through an angle φ. This is the reason for the term rotation. The total effect of the transformation is therefore a scaling and a rotation. As a specific example, consider

w = (2 + 2i)z.

This will map

i → −2 + 2i, 1 → 2 + 2i, and 1 + i → 4i,

as shown in Figure 25.19. As Figure 25.20 suggests, in general the image of z is obtained by multiplying the magnitude of z by |2 + 2i| = √8, and rotating the line from the origin to z counterclockwise through π/4 radians. If |a| = 1, T(z) = az is called a pure rotation, since in this case there is no magnification effect, just a rotation through an argument of a.
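The polar-form claim for this specific example can be confirmed directly (an added check, not part of the text): 2 + 2i has modulus √8 and argument π/4, and the three sample points behave as stated.

```python
import cmath
import math

# Multiplication by 2 + 2i scales |z| by sqrt(8) and rotates by pi/4,
# since 2 + 2i = sqrt(8) * e^{i pi/4}.
a = 2 + 2j
assert abs(abs(a) - math.sqrt(8)) < 1e-12
assert abs(cmath.phase(a) - math.pi / 4) < 1e-12
# the three sample points of the text
assert a * 1j == -2 + 2j
assert a * 1 == 2 + 2j
assert a * (1 + 1j) == 4j
print("w = (2 + 2i)z scales by sqrt(8) and rotates by pi/4")
```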
FIGURE 25.18

FIGURE 25.19

FIGURE 25.20 The mapping T(z) = (2 + 2i)z.

FIGURE 25.21 Image of z under an inversion.
EXAMPLE 25.8
Let w = T(z) = 1/z. This mapping is called an inversion. For z ≠ 0,

|w| = 1/|z|

and

arg(w) = arg(1) − arg(z) = −arg(z)

(within integer multiples of 2π). This means that we arrive at T(z) by moving 1/|z| units from the origin along the line from 0 to z, and then reflecting this point across the real axis (Figure 25.21). This maps points inside the unit disk to the exterior of it, and points exterior to the interior, while points on the unit circle remain on the unit circle (but get moved around this circle, except for 1 and −1). For example, if z = (1 + i)/√2, then 1/z = (1 − i)/√2 (Figure 25.22). We will now show that translations, rotation/magnifications and inversions are the fundamental linear fractional transformations, in the sense that any such mapping can be achieved as a sequence of transformations of these three kinds. To see how to do this, begin with

T(z) = (az + b)/(cz + d).

If c = 0, then

T(z) = (a/d)z + b/d

is a rotation/magnification followed by a translation:

z → (a/d)z → (a/d)z + b/d.
FIGURE 25.22 Image of a point on the unit circle under an inversion.
If c ≠ 0, then T is the result of the following sequence:

z → cz → cz + d → 1/(cz + d) → [(bc − ad)/c] · 1/(cz + d) → a/c + [(bc − ad)/c] · 1/(cz + d) = (az + b)/(cz + d) = T(z),

in which the first step is a rotation/magnification, the second a translation, the third an inversion, the fourth a rotation/magnification, and the last a translation.
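This decomposition can be verified numerically. The Python sketch below (added here; the coefficients are arbitrary sample values with c ≠ 0 and ad − bc ≠ 0) applies the five elementary steps in order and compares the result with T:

```python
# The sequence of elementary maps reproduces T(z) = (az + b)/(cz + d).
a, b, c, d = 2 + 1j, -1.0, 1 + 0j, 4j

def T(z):
    return (a * z + b) / (c * z + d)

def via_steps(z):
    z = c * z                         # rotation/magnification
    z = z + d                         # translation
    z = 1 / z                         # inversion
    z = ((b * c - a * d) / c) * z     # rotation/magnification
    return z + a / c                  # translation

for z in [0.0, 1 + 1j, -3 + 2j]:
    assert abs(T(z) - via_steps(z)) < 1e-12
print("the five elementary steps reproduce T(z) at all sample points")
```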
This way of breaking a linear fractional transformation into simpler components has two purposes. First, we can analyze general properties of these transformations by analyzing the simpler component transformations. Perhaps more important, we sometimes use this sequence to build conformal mappings between given domains. The following is a fundamental property of linear fractional transformations. We use the term line to mean straight line.

THEOREM 25.2
A linear fractional transformation maps any circle to a circle or line, and any line to a circle or line.

Proof Because of the preceding discussion, we need to verify this only for translations, rotation/magnifications and inversions. It is obvious geometrically that a translation maps a circle to a circle and a line to a line. Similarly, a rotation/magnification maps a circle to a circle and a line to a line. Now we need to determine the effect of an inversion on a circle or line. Begin with the fact that any circle or line in the plane is the graph of an equation

A(x^2 + y^2) + Bx + Cy + R = 0,

in which A, B, C, and R are real numbers. This graph is a circle if A ≠ 0 and a line if A = 0 and B and C are not both zero. With z = x + iy, this equation becomes

A|z|^2 + (B/2)(z + z̄) + (C/2i)(z − z̄) + R = 0.

Now let w = 1/z. The image in the w plane of this locus is the graph of

A(1/|w|^2) + (B/2)(1/w + 1/w̄) + (C/2i)(1/w − 1/w̄) + R = 0.

Multiply this equation by ww̄ (the same as |w|^2) to get

R|w|^2 + (B/2)(w + w̄) − (C/2i)(w − w̄) + A = 0.
In the w plane, this is the equation of a circle if R ≠ 0, and a line if R = 0 and B and C are not both zero. As the proof shows, translations and rotation/magnifications actually map circles to circles and lines to lines, while an inversion may map a circle to a circle or line, and a line to a circle or line.
EXAMPLE 25.9
Let w = T(z) = i(z − 2) + 3. This is the sequence

z → z − 2 → i(z − 2) → i(z − 2) + 3 = w,    (25.2)

a translation by 2 to the left, followed by a counterclockwise rotation by π/2 radians (an argument of i is π/2), and then a translation by 3 to the right. Since this mapping does not involve an inversion, it maps circles to circles and lines to lines. As a specific example, consider the circle K given by

(x − 2)^2 + y^2 = 9,

with radius 3 and center (2, 0). Write this equation as

x^2 + y^2 − 4x − 5 = 0,

or

|z|^2 − 2(z + z̄) − 5 = 0.

Solve w = i(z − 2) + 3 to get z = −i(w − 3) + 2 and substitute this expression for z into the last equation. After some routine manipulation this gives us

|w|^2 − 3(w + w̄) = 0.

With w = u + iv, this is

(u − 3)^2 + v^2 = 9,

a circle of radius 3 and center (3, 0) in the (u, v) plane. This result could have been predicted geometrically from the sequence of elementary mappings (25.2), shown in stages in Figure 25.23. The sequence first moves the circle 2 units to the left, then (multiplication by i) rotates it by π/2 radians (which leaves the center and radius the same), and then translates it 3 units right. The result is a circle of radius 3 about (3, 0).
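The conclusion of Example 25.9 is easy to check by sampling the circle K (an added check, not part of the text): since T(z) − 3 = i(z − 2), every point with |z − 2| = 3 has |T(z) − 3| = 3.

```python
import cmath
import math

# T(z) = i(z - 2) + 3 carries the circle |z - 2| = 3 onto |w - 3| = 3.
def T(z):
    return 1j * (z - 2) + 3

for k in range(12):
    theta = 2 * math.pi * k / 12
    z = 2 + 3 * cmath.exp(1j * theta)      # a point on |z - 2| = 3
    assert abs(abs(T(z) - 3) - 3) < 1e-12  # image lies on |w - 3| = 3
print("the image circle has center 3 and radius 3")
```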
EXAMPLE 25.10
We will examine the effect of an inversion w = 1/z on the vertical line Re(z) = a ≠ 0. On this line, z = a + iy, and its image consists of points

w = 1/(a + iy) = a/(a^2 + y^2) − [y/(a^2 + y^2)]i = u + iv.

It is routine to check that

(u − 1/(2a))^2 + v^2 = 1/(4a^2),
FIGURE 25.23 The circle (x − 2)^2 + y^2 = 9 mapped in stages (a)–(d): translation, rotation, translation.
so the image of this vertical line is a circle in the (u, v) plane having center (1/(2a), 0) and radius 1/(2|a|). In preparation for constructing mappings between given domains, we will show that we can always produce a linear fractional transformation mapping three given points to three given points.

THEOREM 25.3
Three Point Theorem
Let z1, z2 and z3 be three distinct points in the z plane, and w1, w2 and w3 three distinct points in the w plane. Then there is a linear fractional transformation T of the z plane to the w plane such that

T(z1) = w1, T(z2) = w2, and T(z3) = w3.

Proof Let w = T(z) be the solution for w in terms of z and the six given points in the equation

(w1 − w)(w3 − w2)(z1 − z2)(z3 − z) = (z1 − z)(z3 − z2)(w1 − w2)(w3 − w).    (25.3)

Substitution of z = zj into this equation yields w = wj for j = 1, 2, 3.
EXAMPLE 25.11
We will find a linear fractional transformation that maps

3 → i, 1 − i → 4, and 2 − i → 6 + 2i.

Put

z1 = 3, z2 = 1 − i, z3 = 2 − i

and

w1 = i, w2 = 4, w3 = 6 + 2i

in equation (25.3) to get

(i − w)(2 + 2i)(2 + i)(2 − i − z) = (3 − z)(1)(i − 4)(6 + 2i − w).

Solve for w to get

w = T(z) = [(20 + 4i)z − (68 + 16i)] / [(6 + 5i)z − (22 + 7i)].

Then each T(zj) = wj.
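Equation (25.3) can be solved for w once and reused for any three point pairs. The sketch below (added here; `three_point_map` is an illustrative helper name, not from the text) does this and checks it against the data of Example 25.11:

```python
# Solve equation (25.3) for w: with A = (w3 - w2)(z1 - z2)(z3 - z) and
# B = (z1 - z)(z3 - z2)(w1 - w2), the equation (w1 - w) A = B (w3 - w)
# gives w = (B w3 - A w1)/(B - A).
def three_point_map(z1, z2, z3, w1, w2, w3):
    def T(z):
        A = (w3 - w2) * (z1 - z2) * (z3 - z)
        B = (z1 - z) * (z3 - z2) * (w1 - w2)
        return (B * w3 - A * w1) / (B - A)
    return T

# the data of Example 25.11
T = three_point_map(3, 1 - 1j, 2 - 1j, 1j, 4, 6 + 2j)
assert abs(T(3) - 1j) < 1e-12
assert abs(T(1 - 1j) - 4) < 1e-12
assert abs(T(2 - 1j) - (6 + 2j)) < 1e-12
print("T maps 3 -> i, 1 - i -> 4, 2 - i -> 6 + 2i")
```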
FIGURE 25.24 Stereographic projection identifying the complex sphere with the extended complex plane.
One can show that specification of three points and their images uniquely determines a linear fractional transformation. In the last example, then, T is the only linear fractional transformation mapping the three given points to their given images. When dealing with mappings, it is sometimes convenient to replace the complex plane with the complex sphere. To visualize how this is done, consider the three-dimensional coordinate system in Figure 25.24. A sphere of radius 1 is placed with its south pole at the origin and north pole at (0, 0, 2). Think of the (x, y) plane as the complex plane. For any (x, y) in this plane, the line from (0, 0, 2) to (x, y) intersects the sphere in exactly one point S(x, y). This associates with each point on the sphere, except (0, 0, 2), a unique point in the complex plane, and conversely. This mapping is called the stereographic projection of the sphere (minus its north pole) onto the plane. This punctured sphere is called the complex sphere. The point (0, 0, 2) plays the role of a point at infinity. This is motivated by the fact that, as (x, y) is chosen farther from the origin in the (x, y) plane, S(x, y) moves closer to (0, 0, 2) on the sphere. The point (0, 0, 2) is not associated with any complex number, but gives a way of envisioning infinity as a point, something we cannot do in the plane. The extended complex plane (consisting of all complex numbers, together with infinity) is in a one-to-one correspondence with this sphere, including its north pole. To get some feeling for the point at infinity, consider the line y = x in the (x, y) plane. This consists of complex numbers x + xi. If we let x → ∞, the point (1 + i)x moves out this line indefinitely far from the origin. The image of this line on the complex sphere is part of a great circle, and the image point S(x, x) on the sphere approaches (0, 0, 2) as x → ∞. This enables us to think of (1 + i)x as approaching a specific location we can point to in this limit process, instead of just going farther from the origin.
In defining a linear fractional transformation, it is sometimes convenient to map one of the three given points in the last theorem to infinity. This can be done by deleting the factors in equation (25.3) involving w3.

THEOREM 25.4
Let z1, z2, z3 be three distinct complex numbers, and w1, w2 distinct complex numbers. Then there is a linear fractional transformation w = T(z) such that

T(z1) = w1, T(z2) = w2, and T(z3) = ∞.

Proof Such a transformation is obtained by solving for w in the equation

(w1 − w)(z1 − z2)(z3 − z) = (z1 − z)(w1 − w2)(z3 − z2).    (25.4)
EXAMPLE 25.12
We will find a linear fractional transformation mapping

i → 4i, 1 → 3 − i, 2 + i → ∞.

Solve for w in the equation

(4i − w)(i − 1)(2 + i − z) = (i − z)(−3 + 5i)(1 + i)

to get

w = T(z) = [(5 − i)z − 1 + 3i] / (−z + 2 + i).
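Equation (25.4) can likewise be solved for w directly. The sketch below (added here; `three_point_map_inf` is an illustrative helper name, not from the text) checks the resulting map against the data and closed form of Example 25.12:

```python
# Solve equation (25.4) for w: with P = (z1 - z2)(z3 - z) and
# Q = (z1 - z)(z3 - z2)(w1 - w2), the equation (w1 - w) P = Q gives
# w = w1 - Q/P, and P -> 0 as z -> z3, so T(z3) = infinity.
def three_point_map_inf(z1, z2, z3, w1, w2):
    def T(z):
        P = (z1 - z2) * (z3 - z)
        Q = (z1 - z) * (z3 - z2) * (w1 - w2)
        return w1 - Q / P
    return T

# the data of Example 25.12
T = three_point_map_inf(1j, 1, 2 + 1j, 4j, 3 - 1j)
assert abs(T(1j) - 4j) < 1e-12
assert abs(T(1) - (3 - 1j)) < 1e-12
assert abs(T(0) - (-1 + 3j) / (2 + 1j)) < 1e-12   # agrees with the closed form
assert abs(T(2 + 1j + 1e-8)) > 1e6                # blows up near z3 = 2 + i
print("T maps i -> 4i, 1 -> 3 - i, and sends 2 + i to infinity")
```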
Some other properties of linear fractional transformations are pursued in the exercises. We now turn to the problem of constructing conformal mappings between given domains.
SECTION 25.2
PROBLEMS
In each of Problems 1 through 5, find a linear fractional transformation taking the given points to the indicated images.
1. 1 → 1, 2 → −i, 3 → 1 + i
2. i → i, 1 → −i, 2 → 0
3. 1 → 1 + i, 2i → 3 − i, 4 → ∞
4. −5 + 2i → 1, 3i → 0, −1 → ∞
5. 6 + i → 2 − i, i → 3i, 4 → −i
In each of Problems 6 through 12, find the image of the given circle or line under the linear fractional transformation.
6. w = 2i/z, Re(z) = −4
7. w = 2iz − 4, Re(z) = 5
8. w = (z − i)/(iz), (1/2)(z + z̄) + (1/2i)(z − z̄) = 4
9. w = (z − 1 + i)/(2z + 1), |z| = 4
10. w = 3z − i, |z − 4| = 3
11. w = (2z − 5)/(z + i), (3/2)(z + z̄) − (1/2i)(z − z̄) − 5 = 0
12. w = [(1 + 3i)z − 2]/z, |z − i| = 1
13. Prove that the mapping w = z̄ is not conformal.
14. Prove that the composition of two linear fractional transformations is a linear fractional transformation.
15. Prove that every linear fractional transformation has an inverse, and that this inverse is also a linear fractional transformation. (T∗ is an inverse of T if T ∘ T∗ and T∗ ∘ T are both the identity mapping, taking each point to itself.)
16. Show that there is no linear fractional transformation mapping the open disk |z| < 1 onto the set of points bounded by the ellipse u^2/4 + v^2 = 1/16.
In Problems 17 and 18, the setting is the extended complex plane, which includes the point at infinity.
17. A point z0 is a fixed point of a mapping f if f(z0) = z0. Suppose f is a linear fractional transformation that is neither a translation nor the identity mapping f(z) = z. Prove that f must have either one or two fixed points, but cannot have three. Why does this conclusion fail to hold for translations? How many fixed points can a translation have?
18. Let f be a linear fractional transformation with three fixed points. Prove that f is the identity mapping.
In each of Problems 19, 20 and 21, write the linear fractional transformation as the end result of a sequence of mappings, each of which is a translation, rotation/magnification or inversion.
19. w = (iz − 4)/z
20. w = (z − 4)/(2z + i)
21. w = iz + 6 − 2 + i
22. w = (z − i)/(z + 3 + i)
25.3 Construction of Conformal Mappings Between Domains

A strategy for solving some kinds of problems (for example, Dirichlet problems) is to find the solution for a "simple" domain (for example, the unit disk), then map this domain conformally to the domain of interest. This mapping may carry the solution for the disk to a solution for the latter domain. Of course, this strategy is predicated on two steps: finding a domain for which we can solve the problem, and being able to map this domain to the domain for which we want the solution. We will now discuss the latter problem. Although in practice it may be an imposing task to find a conformal mapping between given domains, the following result assures us that such a mapping exists, with one exception.
THEOREM 25.5
Riemann Mapping Theorem
Let D∗ be a domain in the w plane, and assume that D∗ is not the entire w plane. Then there exists a one-to-one conformal mapping of the unit disk |z| < 1 onto D∗.

This powerful result implies the existence of a conformal mapping between given domains. Suppose we want to map D onto D∗ (neither of which is the entire plane). Insert a third plane, the ζ plane, between the z plane and the w plane, as in Figure 25.25. By Riemann's theorem, there is a one-to-one conformal mapping g of the unit disk |ζ| < 1 onto D∗. Similarly, there is a one-to-one conformal mapping f of |ζ| < 1 onto D. Then g ∘ f⁻¹ is a one-to-one conformal mapping of D onto D∗. In theory, then, two domains, neither of which is the entire plane, can be mapped conformally in one-to-one fashion onto one another. This does not, however, make such mappings easy to find. In attempting to find such mappings, the following observation is useful. A conformal mapping of a domain D onto a domain D∗ will map the boundary of D to the boundary of D∗. We use this as follows. Suppose D is bounded by a path C (not necessarily closed) which separates the z plane into two domains, D and D̃. These are called complementary domains. Similarly, suppose D∗ is bounded by a path C∗ which separates the w plane into complementary domains D∗ and D̃∗ (Figure 25.26). Try to find a conformal mapping f that sends points of C to points of C∗. This may be easier than trying to find a mapping of the entire domain. This mapping will then send D to either D∗ or to D̃∗. To see which, choose a point z0 in D and see whether f(z0) is in D∗ or D̃∗. If f(z0) is in D∗ (Figure 25.27(a)), then f : D → D∗ and we have our conformal mapping. If f(z0) is in D̃∗, as in Figure 25.27(b), then f : D → D̃∗. This is not yet the mapping we want, but sometimes we can take another step and use f to manufacture a conformal mapping from D to D∗. We will now construct some conformal mappings, beginning with very simple ones and building up to more difficult problems.
FIGURE 25.25 Mapping D onto D∗ through the unit disk.

FIGURE 25.26

FIGURE 25.27(a)

FIGURE 25.27(b)
EXAMPLE 25.13
Suppose we want to map the unit disk D: |z| < 1 conformally onto the disk D∗: |w| < 3. Clearly a magnification w = f(z) = 3z will do this, because all we have to do is expand the unit disk to a disk of radius 3 (Figure 25.28). Notice that this mapping carries the boundary of D onto the boundary of D∗.
EXAMPLE 25.14
Map the unit disk D: |z| < 1 conformally onto the domain |w| > 3. Here we are mapping D to the complementary domain of the preceding example. We already know that f(z) = 3z maps D conformally onto |w| < 3. Combine this map with an inversion, letting

g(z) = f(1/z) = 3/z.

This maps |z| < 1 to |w| > 3 (Figure 25.29). Again, the boundary of the unit disk maps to the boundary of |w| > 3, which is the circle of radius 3 about the origin in the w plane.
EXAMPLE 25.15
We will map the unit disk D: |z| < 1 onto the disk D∗: |w − i| < 3, of radius 3 and centered at i in the w plane. Figure 25.30 suggests one way to construct this map. We want to expand the unit disk's radius by a factor of 3, then translate the resulting disk up one unit. Thus, map in steps:

z → 3z → 3z + i,
FIGURE 25.28 Mapping of |z| < 1 onto |w| < 3.

FIGURE 25.29 Mapping of |z| < 1 onto |w| > 3.
FIGURE 25.30 Mapping of |z| < 1 onto |w − i| < 3.

FIGURE 25.31
a magnification followed by a translation. The mapping is w = f(z) = 3z + i. This maps the unit circle |z| = 1 to the circle |w − i| = 3 because

|w − i| = |3z| = 3|z| = 3.

Further, the origin in the z plane (the center of D) maps to i in the w plane, and i is the center of D∗, so f : D → D∗.
EXAMPLE 25.16
Suppose we want to map the right half-plane D: Re(z) > 0 onto the unit disk D∗: |w| < 1. The domains are shown in Figure 25.31. The boundary of D is the imaginary axis Re(z) = 0. We will map this to the boundary of the unit disk, |w| = 1. To do this, pick three points on Re(z) = 0, and three on |w| = 1, and use these to define a linear fractional transformation. There is a subtlety, however. To maintain positive orientation (counterclockwise on closed curves), choose three points in succession down the imaginary axis, so a person walking along these points sees the right half-plane on the left. Map these to points in order counterclockwise around |w| = 1. For example, we can choose

z1 = i, z2 = 0, and z3 = −i

on the imaginary axis in the z plane, and map these in order to

w1 = 1, w2 = i, w3 = −1.

From equation (25.3), we have

(1 − w)(−1 − i)(i)(−i − z) = (i − z)(−i)(1 − i)(−1 − w).

Solve for w:

w = T(z) = −i(z − 1)/(z + 1).

This conformal mapping must take the right half-plane to the interior or exterior of the unit disk in the w plane. To see which it is, pick a point in Re(z) > 0, say z = 1. Since T(1) = 0 is in D∗, T maps the right half-plane to the unit disk |w| < 1, as we want.
EXAMPLE 25.17
Suppose we want to map the right half-plane to the exterior of the unit disk, the domain w > 1. We have T Rez > 0 → w < 1 from the preceding example. If we follow this map (taking
u
25.3 Construction of Conformal Mappings Between Domains
Re(z) > 0 onto the unit disk) with an inversion (taking the unit disk to the exterior of the unit disk), we will have the map we want. Thus, with T(z) as in the last example, let

f(z) = 1/T(z) = i (z + 1)/(z − 1)

As a check, 1 + i is in the right half-plane, and

f(1 + i) = i (2 + i)/i = 2 + i

is exterior to the unit disk in the w plane.
EXAMPLE 25.18
We will map the right half-plane Re(z) > 0 conformally onto the disk |w − i| < 3. We can do this as a composition of mappings we have already constructed. Put an intermediate ζ plane between the z and w planes (Figure 25.32). From Example 25.16, map Re(z) > 0 onto the unit disk |ζ| < 1 by

ζ = f(z) = −i (z − 1)/(z + 1)

Now use the mapping of Example 25.15 to send the unit disk |ζ| < 1 onto the disk |w − i| < 3:

w = g(ζ) = 3ζ + i

The composition g ∘ f is a conformal mapping of Re(z) > 0 onto |w − i| < 3:

w = (g ∘ f)(z) = g(f(z)) = 3f(z) + i = 3(−i (z − 1)/(z + 1)) + i = 2i(−z + 2)/(z + 1)
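A short numerical sketch (not from the text) confirms that the composition agrees with the simplified formula and lands in |w − i| < 3; the test points are arbitrary:

```python
# Check (not from the text) that the composition g(f(z)) matches the
# simplified formula 2i(-z + 2)/(z + 1) and maps Re(z) > 0 into |w - i| < 3.

def f(z):                      # Example 25.16: right half-plane -> unit disk
    return -1j * (z - 1) / (z + 1)

def g(zeta):                   # Example 25.15: unit disk -> |w - i| < 3
    return 3 * zeta + 1j

def w(z):                      # the composition, as simplified in the text
    return 2j * (-z + 2) / (z + 1)

for z in (1, 0.5 + 2j, 3 - 1j):
    assert abs(g(f(z)) - w(z)) < 1e-12   # same mapping
    assert abs(w(z) - 1j) < 3            # image lies in the target disk
```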
EXAMPLE 25.19
We will map the infinite strip S: −π/2 < Im(z) < π/2 onto the unit disk |w| < 1. Recall from Example 25.4 that the exponential function maps horizontal lines to half-lines from the origin. The boundary of S consists of two horizontal lines, Im(z) = −π/2 and Im(z) = π/2. On the lower boundary line, z = x − iπ/2, so

e^z = e^x e^(−iπ/2) = −i e^x

which varies over the negative imaginary axis as x takes on all real values. On the upper boundary of S, z = x + iπ/2, and e^z = i e^x varies over the positive part of the imaginary axis as x varies over the real line. The imaginary axis forms the boundary of the right half-plane Re(z) > 0, as well as of the left half-plane Re(z) < 0. The mapping w = e^z must map S to one of these complementary domains. However, this mapping sends 0 to 1, in the right half-plane, so the mapping w = f(z) = e^z maps S to the right half-plane. We want to map S onto the unit disk. But now we know a mapping of S onto the right half-plane, and we also know a mapping of the right half-plane onto the unit disk. All we have to do is put these together.
FIGURE 25.32 Mapping of Re(z) > 0 onto |w − i| < 3.
FIGURE 25.33 |Im(z)| < π/2 → Re(ζ) > 0 → |w| < 1 yields a mapping of |Im(z)| < π/2 onto |w| < 1.
In Figure 25.33, put a ζ plane between the z and w planes. Map ζ = f(z) = e^z, taking S onto the right half-plane Re(ζ) > 0. Next map

w = g(ζ) = −i (ζ − 1)/(ζ + 1)

taking the right half-plane Re(ζ) > 0 onto the unit disk |w| < 1. Therefore, the function

w = (g ∘ f)(z) = g(f(z)) = g(e^z) = −i (e^z − 1)/(e^z + 1)

is a conformal mapping of S onto |w| < 1. In terms of hyperbolic functions, this mapping can be written

w = −i tanh(z/2)

The conformal mapping of the last example is not a linear fractional transformation. These are convenient to use whenever possible. However, even when we know from the Riemann mapping theorem that a conformal mapping exists between two domains, we are not guaranteed that we can always find such a mapping in the form of a linear fractional transformation.
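The hyperbolic form is easy to test numerically. The following sketch (not from the text) checks the tanh identity and samples arbitrary points of the strip:

```python
import cmath, random

# Check (not from the text) that w = -i tanh(z/2) equals -i(e^z - 1)/(e^z + 1)
# and maps the strip |Im(z)| < pi/2 into the unit disk.

def w(z):
    return -1j * cmath.tanh(z / 2)

random.seed(1)
for _ in range(100):
    z = complex(random.uniform(-10, 10),
                random.uniform(-1.55, 1.55))   # 1.55 < pi/2, so z is in S
    e = cmath.exp(z)
    assert abs(w(z) - (-1j * (e - 1) / (e + 1))) < 1e-9   # tanh identity
    assert abs(w(z)) < 1                                  # lands in |w| < 1
```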
EXAMPLE 25.20
We will map the disk |z| < 2 onto the domain D*: u + v > 0 in the (u, v) plane. These domains are shown in Figure 25.34. Consider mappings we already have that relate to this problem. First, we can map |z| < 2 to |ζ| < 1 by a simple magnification (multiply by 1/2). But we also know a mapping from the unit disk to the right half-plane. Finally, we can obtain D* from the right half-plane by a
FIGURE 25.34 The domains D: |z| < 2 and D*: u + v > 0.
FIGURE 25.35 The stages |z| < 2 → |ζ| < 1 → Re(ξ) > 0 → u + v > 0.
counterclockwise rotation through π/4 radians, an effect achieved by multiplying by e^(iπ/4). This suggests the strategy of constructing the mapping we want in the stages shown in Figure 25.35:

|z| < 2 → |ζ| < 1 → Re(ξ) > 0 → u + v > 0

The first step is achieved by

ζ = z/2

Next, use the inverse of the mapping from Example 25.16 and name the variables ζ and ξ to get

ξ = (1 + iζ)/(1 − iζ)

This maps |ζ| < 1 → Re(ξ) > 0. Finally, perform the rotation: w = e^(iπ/4) ξ. In sum,

w = e^(iπ/4) ξ = e^(iπ/4) (1 + iζ)/(1 − iζ) = e^(iπ/4) (1 + iz/2)/(1 − iz/2) = ((2 + iz)/(2 − iz)) e^(iπ/4)

This maps the disk |z| < 2 conformally onto the half-plane u + v > 0. For example, 0 is in the disk, and

w(0) = e^(iπ/4) = (√2/2)(1 + i)

is in u + v > 0. We will briefly discuss the Schwarz–Christoffel transformation, which may be used when a domain has a polygon as a boundary.
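The staged construction above can be sanity-checked numerically; this sketch (not from the text) samples random points of the disk, with the seed and sample count arbitrary:

```python
import cmath, math, random

# Check (not from the text) that w = e^{i pi/4}(2 + iz)/(2 - iz) sends
# |z| < 2 into the half-plane u + v > 0.

def w(z):
    return cmath.exp(1j * math.pi / 4) * (2 + 1j * z) / (2 - 1j * z)

# w(0) should be (1 + i)/sqrt(2), as computed in the example:
assert abs(w(0) - (1 + 1j) / 2 ** 0.5) < 1e-12

random.seed(2)
for _ in range(100):
    r, t = 2 * random.random(), 2 * math.pi * random.random()
    z = r * cmath.exp(1j * t)          # a point with |z| < 2
    img = w(z)
    assert img.real + img.imag > 0     # lands in u + v > 0
```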
25.3.1 Schwarz–Christoffel Transformation
Suppose we want a conformal mapping of the upper half-plane onto the interior of a polygon P, which could be a triangle, rectangle, pentagon, or other polygon. A linear fractional transformation will not do this. However, the Schwarz–Christoffel transformation was constructed just for this purpose. Let P have vertices w1, ..., wn in the w plane (Figure 25.36). Let the exterior angles of P be θ1, ..., θn. We claim that there are constants z0, a, and b, with Im(z0) > 0, and real numbers x1, ..., xn such that the function

f(z) = a ∫_{z0}^{z} (ζ − x1)^{−θ1/π} (ζ − x2)^{−θ2/π} ··· (ζ − xn)^{−θn/π} dζ + b     (25.5)
FIGURE 25.36 The polygon P with vertices w1, ..., wn and exterior angles θ1, ..., θn.
FIGURE 25.37 Points x1, ..., xn on the real axis and their images under f.
is a conformal mapping of the upper half-plane onto the interior of P. This integral is taken over any path from z0 to z lying in the upper half-plane. The factors (ζ − xj)^{−θj/π} are defined using the complex logarithm obtained by taking the argument lying in [0, 2π). Any function of the form of equation (25.5) is called a Schwarz–Christoffel transformation.

To see the idea behind this function, suppose each xj < xj+1. If z is in the upper half-plane, let

g(z) = a(z − x1)^{−θ1/π} (z − x2)^{−θ2/π} ··· (z − xn)^{−θn/π}

Then f′(z) = g(z) and

arg f′(z) = arg(a) − (θ1/π) arg(z − x1) − ··· − (θn/π) arg(z − xn)

As we saw in the discussion of Theorem 25.1, arg f′(z) is the number of radians by which the mapping f rotates tangent lines, if f′(z) ≠ 0. Now imagine z moving from left to right along the real axis (Figure 25.37), which is the boundary of the upper half-plane. On (−∞, x1), f(z) moves along a straight line (no change in the angle). As z passes over x1, however, arg(z − x1) drops from π to 0, so arg f′(z) changes by θ1. This angle remains fixed as z moves from x1 toward x2. As z passes over x2, arg f′(z) changes by θ2, then remains at this value until z reaches x3, where arg f′(z) changes by θ3, and so on. Thus arg f′(z) remains constant on intervals (xj−1, xj) and increases by θj as z passes over xj. The net result is that the real axis is mapped to a polygon P* having exterior angles θ1, ..., θn. These numbers are actually determined by θ1, ..., θn−1, since

Σ_{j=1}^{n} θj = 2π

P* has the same exterior angles as P but need not be the same as P because of its location and size. We may have to rotate, translate, and/or magnify P* to obtain P. These effects are achieved by choosing x1, ..., xn to make P* similar to P, and then choosing a (rotation/magnification) and b (translation) to obtain P. If we choose xn = ∞, then x1, ..., xn−1 are mapped to the vertices of P. In this case the Schwarz–Christoffel transformation is

f(z) = a ∫_{z0}^{z} (ζ − x1)^{−θ1/π} (ζ − x2)^{−θ2/π} ··· (ζ − x_{n−1})^{−θ_{n−1}/π} dζ + b     (25.6)

It can be shown that any conformal mapping of the upper half-plane onto a polygon must have the form of a Schwarz–Christoffel transformation. In practice a Schwarz–Christoffel transformation can be difficult or impossible to determine in closed form because of the integration.
EXAMPLE 25.21
We will map the upper half-plane onto a rectangle. Choose x1 = 0, x2 = 1, and x3 as any real number greater than 1. The Schwarz–Christoffel transformation of equation (25.6) has the form

f(z) = a ∫_{z0}^{z} dζ / √(ζ(ζ − 1)(ζ − x3)) + b

with a and b chosen to fit the dimensions of the rectangle and its orientation with respect to the axes. The radical appears because the exterior angles of a rectangle are all equal to π/2, so Σ_{j=1}^{4} θj = 4(π/2) = 2π. This integral is an elliptic integral and cannot be evaluated in closed form.
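Although the integral has no elementary antiderivative, it is easy to evaluate numerically. The following sketch (not from the text) takes the arbitrary choices a = 1, b = 0, z0 = 0, and x3 = 2; the real integral over [0, 1] then gives the length of one side of the image rectangle, and the substitution t = sin²(u) removes the endpoint singularities:

```python
import math

# Illustrative numeric evaluation (not from the text) of the elliptic integral
#   I = integral from 0 to 1 of dt / sqrt(t(1 - t)(2 - t))
# (one side of the image rectangle for x3 = 2).  Substituting t = sin^2(u):
#   I = integral from 0 to pi/2 of 2 du / sqrt(2 - sin^2(u))

def side_length(n=100000):
    h = (math.pi / 2) / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h          # midpoint rule on the smooth integrand
        total += 2.0 / math.sqrt(2.0 - math.sin(u) ** 2)
    return total * h

print(side_length())   # about 2.62206, i.e. sqrt(2) * K(1/sqrt(2))
```

The value is a complete elliptic integral of the first kind, which is the sense in which the mapping "cannot be evaluated in closed form."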
EXAMPLE 25.22
We will map the upper half-plane onto the strip S: Im(w) > 0, −c < Re(w) < c in the w plane. Here c is a positive constant. The upper half-plane and the strip S are shown in Figure 25.38. To use the Schwarz–Christoffel transformation, we must think of S as a polygon with vertices −c, c, and ∞. Choose x1 = −1 to map to −c and x2 = 1 to map to c. Map ∞ to ∞. The exterior angles of the strip are π/2 and π/2, so θ1 = θ2 = π/2. The transformation has the form

w = f(z) = a ∫_{z0}^{z} (ζ + 1)^{−1/2} (ζ − 1)^{−1/2} dζ + b

We will choose z0 = 0 and b = 0. Write

(ζ − 1)^{−1/2} = (−(1 − ζ))^{−1/2} = −i(1 − ζ)^{−1/2}

With −ai = A, we have

w = f(z) = A ∫_{0}^{z} dζ / (1 − ζ²)^{1/2}

This integral is reminiscent of the real integral representation of the inverse sine function. Indeed, we can write

w = A sin^{−1}(z)

by which we mean that

z = sin(w/A)

To choose A so that −1 maps to −c and 1 to c, we need

c = A sin^{−1}(1)
FIGURE 25.38 The upper half-plane and the strip S: Im(w) > 0, −c < Re(w) < c.
Since sin^{−1}(1) = π/2, choose c/A = π/2, or

A = 2c/π

The mapping is

w = (2c/π) sin^{−1}(z)

If we choose c = π/2, this mapping is just w = sin^{−1}(z), mapping the upper half-plane onto the strip Im(w) > 0, −π/2 < Re(w) < π/2. This is consistent with the result of Example 25.5.
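A hedged numerical sketch (not from the text) checks the mapping, using the principal branch of the complex arcsine; the strip half-width c and the sample points are arbitrary choices:

```python
import cmath, math, random

c = 2.0                                   # an arbitrary strip half-width

def w(z):
    return (2 * c / math.pi) * cmath.asin(z)

# The chosen boundary points map as required:
assert abs(w(-1) + c) < 1e-9
assert abs(w(1) - c) < 1e-9

random.seed(3)
for _ in range(100):
    z = complex(random.uniform(-5, 5), random.uniform(0.01, 5))  # Im z > 0
    img = w(z)
    assert img.imag > 0 and -c < img.real < c   # lands in the strip S
```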
SECTION 25.3
PROBLEMS
In each of Problems 1 through 6, find a linear fractional transformation mapping the first domain onto the second.

1. |z| < 3 onto |w − 1 + i| < 6
2. |z| < 3 onto |w − 1 + i| > 6
3. |z + 2i| < 1 onto |w − 3| > 2
4. Re(z) > 1 onto Im(w) > −1
5. Re(z) < 0 onto |w| < 4
6. Im(z) > −4 onto |w − i| > 2
7. Find a conformal mapping of the upper half-plane Im(z) > 0 onto the wedge 0 < arg(w) < π/3.
8. Let w = Log(z), in which the logarithm is given a unique value for each nonzero z by restricting the argument of z to lie in [0, 2π). Show that this mapping takes Im(z) > 0 onto the strip 0 < Im(w) < π.
9. Show that the Schwarz–Christoffel transformation

f(z) = 2i ∫_{0}^{z} (ζ + 1)^{−1/2} (ζ − 1)^{−1/2} ζ^{−1/2} dζ

maps the upper half-plane onto the rectangle with vertices 0, c, c + ic, and ic, where c = Γ(1/2)Γ(1/4)/Γ(3/4). Here Γ is the gamma function.
10. Define the cross ratio of z1, z2, z3, and z4 to be the image of z1 under the linear fractional transformation that maps z2 → 1, z3 → 0, z4 → ∞. Denote this cross ratio as [z1, z2, z3, z4]. Suppose T is any linear fractional transformation. Show that T preserves the cross ratio. That is,

[z1, z2, z3, z4] = [T(z1), T(z2), T(z3), T(z4)]

11. Prove that [z1, z2, z3, z4] is the image of z1 under the linear fractional transformation defined by

w = ((z − z3)(z2 − z4)) / ((z − z4)(z2 − z3))

12. Prove that [z1, z2, z3, z4] is real if and only if the zj's are on the same circle or straight line.
25.4 Harmonic Functions and the Dirichlet Problem

Given a set D of points in the plane, let ∂D denote the boundary of D. A Dirichlet problem for D is to find a solution of Laplace's equation

∂²u/∂x² + ∂²u/∂y² = 0

for (x, y) in D, satisfying the boundary condition

u(x, y) = f(x, y) for (x, y) on ∂D

Here f is a given function, usually assumed to be continuous on the boundary of D.
A function satisfying Laplace’s equation in a set is said to be harmonic on that set. Thus the Dirichlet problem for a set is to find a function that is harmonic on that set, and satisfies given data on the boundary of the set. Chapter 19 is devoted to solutions of Dirichlet problems using methods from real analysis. Our purpose here is to apply complex function methods to Dirichlet problems. The connection between a Dirichlet problem and complex function theory is given by the following.
THEOREM 25.6
Let D be an open set in the plane, and let f(z) = u(x, y) + iv(x, y) be differentiable on D. Then u and v are harmonic on D. That is, the real and imaginary parts of a differentiable complex function are harmonic.

Proof   By the Cauchy–Riemann equations,

∂²u/∂x² + ∂²u/∂y² = ∂/∂x(∂u/∂x) + ∂/∂y(∂u/∂y) = ∂/∂x(∂v/∂y) + ∂/∂y(−∂v/∂x) = ∂²v/∂x∂y − ∂²v/∂y∂x = 0
Therefore u is harmonic on D. The proof that v is harmonic is similar.

Conversely, given a harmonic function u, there is a harmonic function v so that f(z) = u(x, y) + iv(x, y) is differentiable. Such a v is called a harmonic conjugate for u.

THEOREM 25.7
Let u be harmonic on a domain D. Then, for some v, u(x, y) + iv(x, y) defines a differentiable complex function for z = x + iy in D.

Proof   Let

g(z) = ∂u/∂x − i ∂u/∂y

for (x, y) in D. Using the Cauchy–Riemann equations and Theorem 21.6, we find that g is differentiable on D. Then, for some function G, G′(z) = g(z) for z in D. Write

G(z) = U(x, y) + iV(x, y)

Now

G′(z) = ∂U/∂x − i ∂U/∂y = g(z) = ∂u/∂x − i ∂u/∂y

Therefore

∂U/∂x = ∂u/∂x and ∂U/∂y = ∂u/∂y

on D. Then, for some constant K,

U(x, y) = u(x, y) + K
Let f(z) = G(z) − K. Then f is differentiable at all points of D. Further,

f(z) = G(z) − K = U(x, y) + iV(x, y) − K = u(x, y) + iV(x, y)

We may therefore choose v(x, y) = V(x, y), proving the theorem.

Given a harmonic function u, we will not be interested in actually producing a harmonic conjugate v. However, we will exploit the fact that there exists such a function to produce a differentiable complex function f = u + iv, given harmonic u. This enables us to apply complex function methods to Dirichlet problems. As a preliminary, we will derive two important properties of harmonic functions.

THEOREM 25.8
Mean Value Property

Let u be harmonic on a domain D. Let (x0, y0) be any point of D, and let C be a circle of radius r centered at (x0, y0), contained in D and enclosing only points of D. Then

u(x0, y0) = (1/2π) ∫_{0}^{2π} u(x0 + r cos(θ), y0 + r sin(θ)) dθ

As θ varies from 0 to 2π, (x0 + r cos(θ), y0 + r sin(θ)) moves once about the circle of radius r about (x0, y0). The conclusion of the theorem is called the mean value property because it states that the value of a harmonic function at the center of any circle in the domain is the average of its values on the circle.

Proof   For some v, f = u + iv is differentiable on D. Let z0 = x0 + iy0. By Cauchy's integral formula,

u(x0, y0) + iv(x0, y0) = f(z0) = (1/2πi) ∮_C f(z)/(z − z0) dz
= (1/2πi) ∫_{0}^{2π} (f(z0 + re^{iθ})/(re^{iθ})) i re^{iθ} dθ
= (1/2π) ∫_{0}^{2π} u(x0 + r cos(θ), y0 + r sin(θ)) dθ + (i/2π) ∫_{0}^{2π} v(x0 + r cos(θ), y0 + r sin(θ)) dθ

By taking the real and imaginary parts of both sides of this equation, we get the conclusion of the theorem.

If D is a bounded domain, then the set consisting of D together with all boundary points of D is called the closure of D. The closure is a closed and bounded, hence compact, set. If u(x, y) is continuous on the closure of D, then u(x, y) must achieve a maximum value there. If u is also harmonic on D, we claim that this maximum must occur at a boundary point of D. This is reminiscent of the maximum modulus theorem, from which it follows.

THEOREM 25.9
Let D be a bounded domain. Suppose u is continuous on the closure of D and harmonic on D. Then u(x, y) achieves its maximum value at a boundary point of D.

Proof   First produce v so that f = u + iv is differentiable on D. Define

g(z) = e^{f(z)}

for all z in D. Then g is differentiable on D. By the maximum modulus theorem, |g(z)| achieves its maximum at a boundary point of D. But

|g(z)| = |e^{u(x,y)+iv(x,y)}| = e^{u(x,y)}

Since the real exponential function is strictly increasing, e^{u(x,y)} and u(x, y) must achieve their maximum values at the same point. Therefore u(x, y) must achieve its maximum at a boundary point of D.

For example, u(x, y) = x² − y² is harmonic on the open unit disk x² + y² < 1 and continuous on its closure x² + y² ≤ 1. This function must therefore achieve its maximum value for x² + y² ≤ 1 at a boundary point of this disk, namely at a point for which x² + y² = 1. We find that this maximum value is 1, occurring at (1, 0) and at (−1, 0).
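The mean value property is easy to test numerically. This sketch (not from the text) averages the harmonic function u(x, y) = x² − y² over equally spaced points on a circle; the center, radius, and sample count are arbitrary choices:

```python
import math

# Numerical check (not from the text) of the mean value property for the
# harmonic function u(x, y) = x^2 - y^2.

def u(x, y):
    return x * x - y * y

x0, y0, r, n = 1.3, -0.7, 2.0, 100000
avg = sum(u(x0 + r * math.cos(2 * math.pi * k / n),
            y0 + r * math.sin(2 * math.pi * k / n)) for k in range(n)) / n

# The average over the circle equals the value at the center:
assert abs(avg - u(x0, y0)) < 1e-6
```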
25.4.1 Solution of Dirichlet Problems by Conformal Mapping
We want to use conformal mappings to solve Dirichlet problems. The strategy is to first solve the Dirichlet problem for a disk. Once we have this, we can attempt to solve a Dirichlet problem for another domain D by constructing a conformal mapping between the unit disk and D, and applying this mapping to the solution we have for the disk.

In Section 19.3 we derived Poisson's integral formula for the solution of the Dirichlet problem for a disk, using Fourier methods. We will use complex function methods to obtain a form of this solution that is particularly suited to use with conformal mappings.

We want a function u that is harmonic on the disk D̃: |z| < 1 and assumes given values u(x, y) = g(x, y) on the boundary circle. Suppose u is harmonic on a slightly larger disk |z| < 1 + ε. If v is a harmonic conjugate of u, then f = u + iv is differentiable on this disk. We may, by adding a constant if necessary, choose v so that v(0, 0) = 0. Expand f in a Maclaurin series

f(z) = Σ_{n=0}^{∞} a_n z^n     (25.7)

Then, with an asterisk denoting the complex conjugate,

u(x, y) = Re f(x + iy) = (1/2)[f(z) + f(z)*] = (1/2)[Σ_{n=0}^{∞} a_n z^n + Σ_{n=0}^{∞} a_n* (z*)^n] = a_0 + (1/2) Σ_{n=1}^{∞} [a_n z^n + a_n* (z*)^n]

(using the fact that a_0 is real because v(0, 0) = 0). Now let ζ be on the unit circle γ. Then |ζ|² = ζζ* = 1, so ζ* = 1/ζ and the series is

u(ζ) = a_0 + (1/2) Σ_{n=1}^{∞} [a_n ζ^n + a_n* ζ^{−n}]

Multiply this equation by ζ^m/2πi and integrate over γ. Within the open disk of convergence, the series and the integral can be interchanged. We get

(1/2πi) ∮_γ u(ζ) ζ^m dζ = (a_0/2πi) ∮_γ ζ^m dζ + (1/2) Σ_{n=1}^{∞} [a_n (1/2πi) ∮_γ ζ^{n+m} dζ + a_n* (1/2πi) ∮_γ ζ^{m−n} dζ]     (25.8)

Recall that

∮_γ ζ^k dζ = 0 if k ≠ −1, and ∮_γ ζ^k dζ = 2πi if k = −1
Therefore, if m = −1 in equation (25.8), we have

(1/2πi) ∮_γ u(ζ) (1/ζ) dζ = a_0

If m = −n − 1 with n = 1, 2, 3, ..., we obtain

(1/2πi) ∮_γ u(ζ) ζ^{−n−1} dζ = a_n/2

Substitute these coefficients into equation (25.7) to get

f(z) = Σ_{n=0}^{∞} a_n z^n = (1/2πi) ∮_γ u(ζ) (1/ζ) dζ + Σ_{n=1}^{∞} [(1/πi) ∮_γ u(ζ) ζ^{−n−1} dζ] z^n
= (1/2πi) ∮_γ u(ζ) [1 + 2 Σ_{n=1}^{∞} (z/ζ)^n] (1/ζ) dζ

Since |z| < 1 and |ζ| = 1, then |z/ζ| < 1 and the geometric series in this equation converges:

Σ_{n=1}^{∞} (z/ζ)^n = (z/ζ)/(1 − z/ζ) = z/(ζ − z)

Then

f(z) = (1/2πi) ∮_γ u(ζ) [1 + 2z/(ζ − z)] (1/ζ) dζ = (1/2πi) ∮_γ u(ζ) ((ζ + z)/(ζ − z)) (1/ζ) dζ

If u(ζ) = g(ζ), the given values for u on the boundary of the unit disk, then, for |z| < 1,

u(x, y) = Re f(z) = Re[(1/2πi) ∮_γ g(ζ) ((ζ + z)/(ζ − z)) (1/ζ) dζ]     (25.9)
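As a numerical sanity check of formula (25.9) (not part of the text): for the boundary data g(ζ) = Re(ζ), the harmonic extension to the disk is u(x, y) = Re(z), and the contour integral can be evaluated by quadrature after parametrizing ζ = e^{iθ}:

```python
import cmath, math

# Check (not from the text) of formula (25.9) for g(zeta) = Re(zeta).
# With zeta = e^{i theta}, d(zeta) = i zeta d(theta), the formula becomes
#   u = Re[ (1/2pi) * integral over [0, 2pi] of g (zeta + z)/(zeta - z) ]

def u(z, n=20000):
    total = 0j
    for k in range(n):
        zeta = cmath.exp(1j * 2 * math.pi * (k + 0.5) / n)
        total += zeta.real * (zeta + z) / (zeta - z)
    return (total / n).real

for z in (0.3 + 0.4j, -0.5j, 0.1 + 0.1j):
    assert abs(u(z) - z.real) < 1e-6     # matches the known extension Re(z)
```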
This is an integral formula for the solution of the Dirichlet problem for the unit disk. We leave it as an exercise for the student to retrieve the Poisson integral formula from this expression by putting z = re^{iθ} and ζ = e^{iφ}.

Equation (25.9) is well suited to solving certain Dirichlet problems by conformal mappings. Suppose we know a differentiable, one-to-one conformal mapping T: D → D̃, where D̃ is the unit disk |w| < 1 in the w plane. Assume that T maps C, the boundary of D, onto the unit circle C̃ bounding D̃, and that T^{−1} is also a differentiable conformal mapping. To help follow the discussion, we will use ζ for an arbitrary point of C̃, ξ for a point on C, and (x̃, ỹ) for a point in the w plane (Figure 25.39). Now consider a Dirichlet problem for D:

∂²u/∂x² + ∂²u/∂y² = 0 for (x, y) in D

u(x, y) = g(x, y) for (x, y) on C
FIGURE 25.39 The conformal mapping T: D → D̃ and its inverse T^{−1}.
If w = T(z), then z = T^{−1}(w) and we define

g̃(w) = g(T^{−1}(w)) = g(z)

In the w plane, we now have a Dirichlet problem for the unit disk:

∂²ũ/∂x̃² + ∂²ũ/∂ỹ² = 0 for (x̃, ỹ) in D̃

ũ(x̃, ỹ) = g̃(x̃, ỹ) for (x̃, ỹ) on C̃

From equation (25.9) the solution of this problem for the unit disk is the real part of

f̃(w) = (1/2πi) ∮_{C̃} g̃(ζ) ((ζ + w)/(ζ − w)) (1/ζ) dζ

Finally, recall that T maps C onto C̃, and let ζ = T(ξ) for ξ on C to obtain

u(x, y) = Re f(z) = Re[(1/2πi) ∮_C g̃(T(ξ)) ((T(ξ) + T(z))/(T(ξ) − T(z))) (T′(ξ)/T(ξ)) dξ]

Since g̃(T(ξ)) = g(T^{−1}(T(ξ))) = g(ξ), we have the solution

u(x, y) = Re f(z) = Re[(1/2πi) ∮_C g(ξ) ((T(ξ) + T(z))/(T(ξ) − T(z))) (T′(ξ)/T(ξ)) dξ]     (25.10)
This solves the Dirichlet problem for the original domain D.

To illustrate this technique, we will solve the Dirichlet problem for the right half-plane:

∂²u/∂x² + ∂²u/∂y² = 0 for x > 0, −∞ < y < ∞

u(0, y) = g(y) for −∞ < y < ∞
z−1 w = Tz = −i z+1 Compute T z =
−2i z + 12
From equation (25.10), the solution is the real part of 1 −i − 1/ + 1 − iz − 1/z + 1 1 −2i fz = u d 2i C −i − 1/ + 1 + iz − 1/z + 1 −i − 1/ + 1 + 12 1 1 z − 1 = u d i C − z 2 − 1
1086
CHAPTER 25
Conformal Mappings
The boundary C of the right half plane is the imaginary axis, and is not a closed curve. Parametrize C as = 0 t = it, with t varying from − to for positive orientation on D (as we walk down this axis, D is over our left shoulder). We get 1 − −1 itz − 1 idt fz = u0 t i it − z 1 + t2 itz − 1 1 1 = u0 t dt − it − z 1 + t2 The solution is the real part of this integral. Now t, u0 and 1/1+t2 are real, so concentrate on the term containing i and z = x + iy: itz − 1 itx − ty − 1 itx − ty − 1 −it − x + iy = = it − z it − x − iy it − x − iy −it − x + iy =
txt − y − itx2 + ityt − y + txy + it − y + x x2 + t − y2
The real part of this expression is x1 + t2 x2 + t − y2 Therefore ux y = Re fz = =
1 1 x1 + t2 u0 t 2 dt 2 − x + t − y 1 + t2
1 x gt 2 dt − x + t − y2
This is an integral formula for the solution of the Dirichlet problem for the right half-plane.
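The half-plane formula can be checked against a case that integrates in closed form. This sketch (not from the text) uses the boundary data g(t) = 1 on [−1, 1] and 0 elsewhere, an arbitrary choice for which the antiderivative is an arctangent:

```python
import math

# Check (not from the text) of the half-plane formula for g(t) = 1 on [-1, 1].
# For this g, the integral can be done exactly:
#   u(x, y) = (1/pi) [ arctan((1 - y)/x) + arctan((1 + y)/x) ]

def u_numeric(x, y, n=100000):
    h = 2.0 / n                            # midpoint rule over [-1, 1]
    return sum(x / (x * x + (-1 + (k + 0.5) * h - y) ** 2)
               for k in range(n)) * h / math.pi

def u_exact(x, y):
    return (math.atan((1 - y) / x) + math.atan((1 + y) / x)) / math.pi

for x, y in ((0.5, 0.0), (2.0, 1.5), (1.0, -3.0)):
    assert abs(u_numeric(x, y) - u_exact(x, y)) < 1e-6
```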
SECTION 25.4
PROBLEMS
1. Using complex function methods, write an integral solution for the Dirichlet problem for the upper half-plane Im(z) > 0.
2. Using complex function methods, write an integral solution for the Dirichlet problem for the right quarter-plane Re(z) > 0, Im(z) > 0, if the boundary conditions are u(x, 0) = f(x) and u(0, y) = 0.
3. Write an integral solution for the Dirichlet problem for the disk |z − z0| < R.
4. Write a formula for the solution of the Dirichlet problem for the right half-plane if the boundary condition is given by u(0, y) = 1 for −1 ≤ y ≤ 1 and u(0, y) = 0 for |y| > 1.
5. Write a formula for the solution of the Dirichlet problem for the unit disk if the boundary condition is given by u(x, y) = x − y for (x, y) on the unit circle.
6. Write a formula for the solution of the Dirichlet problem for the unit disk if the boundary condition is given by u(e^{iθ}) = 1 for 0 ≤ θ ≤ π/4 and u(e^{iθ}) = 0 for π/4 < θ < 2π.
7. Write a formula for the solution of the Dirichlet problem for the strip −1 < Im(z) < 1, Re(z) > 0, if the boundary condition is given by u(x, 1) = u(x, −1) = 0 for 0 < x < ∞ and u(0, y) = 1 − |y| for −1 ≤ y ≤ 1.
8. Write an integral formula for the solution of the Dirichlet problem for the strip −1 < Re(z) < 1, Im(z) > 0, if the boundary condition is given by u(x, 0) = 1 for −1 < x < 1 and u(−1, y) = u(1, y) = e^{−y} for 0 < y < ∞.
25.5 Complex Function Models of Plane Fluid Flow

We will discuss how complex functions and integration are used in modeling and analyzing the flow of fluids. Consider an incompressible fluid, such as water under normal conditions. Assume that we are given a velocity field V(x, y) in the plane. By assuming that the flow depends only on two variables, we are taking the flow to be the same in all planes parallel to the complex plane. Such a flow is called plane-parallel. This velocity vector is also assumed to be independent of time, a circumstance described by saying that the flow is stationary. Write

V(x, y) = u(x, y)i + v(x, y)j

Since we can identify vectors and complex numbers, we will by a mild abuse of notation write the velocity vector as a complex function

V(z) = V(x + iy) = u(x, y) + iv(x, y)

Given V(z), think of the complex plane as divided into two sets. The first is the domain D on which V is defined. The complement of D consists of all complex numbers not in D. Think of the complement as comprising channels confining the fluid to D, or as barriers through which the fluid cannot flow. This enables us to model fluid flow through a variety of configurations and around barriers of various shapes.

Suppose γ is a closed path in D. From vector analysis, if we parametrize γ by x = x(s), y = y(s), with s arc length along the path, then the vector x′(s)i + y′(s)j is a unit tangent vector to γ, and

(ui + vj) · ((dx/ds)i + (dy/ds)j) ds = u dx + v dy

This is the dot product of the velocity with the tangent to the path, leading us to interpret

∮_γ u dx + v dy

as a measure of the flow of the fluid along γ. The value of this integral is called the circulation of the fluid along γ.

The vector −y′(s)i + x′(s)j is a unit normal vector to γ, being perpendicular to the tangent vector (Figure 25.40). Therefore

−∮_γ (ui + vj) · (−(dy/ds)i + (dx/ds)j) ds = ∮_γ −v dx + u dy

is the negative of the integral of the normal component of the velocity along the path. When this integral is not zero, it is called the flux of the fluid across the path. This gives a measure
FIGURE 25.40 Unit tangent T and unit normal N to the closed path γ.
of fluid flowing across γ, out from the region bounded by γ. When this flux is zero for every closed path in the domain of the fluid, the fluid is called solenoidal.

A point z0 = x0 + iy0 is a vortex of the fluid if the circulation has a nonzero constant value on every closed path about z0 in the interior of some punctured disk 0 < |z − z0| < r. The constant value of the circulation is the strength of the vortex. If ∮_γ −v dx + u dy has the same positive value k for all closed paths γ about z0 in some punctured disk about z0, then we call z0 a source of strength k; if k is negative, z0 is a sink of strength k.

The connection between the velocity field of a fluid and complex functions is provided by the following.
Let u and v be continuous with continuous first and second partial derivatives in a simply connected domain D. Suppose ui + vj is irrotational and solenoidal in D. Then u and −v satisfy the Cauchy–Riemann equations in D, and f(z) = u(x, y) − iv(x, y) is a differentiable complex function on D. Conversely, if u and −v satisfy the Cauchy–Riemann equations on D, then ui + vj defines an irrotational, solenoidal flow on D.

Proof   Let γ be any closed path in D. If M is the interior of γ, then every point in M is also in D by the assumption that D is simply connected. By Green's theorem,

∮_γ u dx + v dy = ∬_M (∂v/∂x − ∂u/∂y) dA = 0

because the flow is irrotational. But the flow is also solenoidal, so, again by Green's theorem,

∮_γ −v dx + u dy = ∬_M (∂u/∂x + ∂v/∂y) dA = 0

Because M can be any set of points in D bounded by a closed path, the integrands in both of these double integrals must be zero throughout D, so

∂u/∂x = ∂(−v)/∂y and ∂u/∂y = −∂(−v)/∂x
By Theorem 21.6, f(z) = u(x, y) − iv(x, y) is differentiable on D. The converse follows by a similar argument.

Theorem 25.10 provides some insight into irrotational, solenoidal flows. If the flow is irrotational, then

curl(ui + vj) = (∂v/∂x − ∂u/∂y)k = 0

as shown in the proof of the theorem. This curl is a vector normal to the plane of the flow. From the discussion of Section 12.5.2, the curl of ui + vj is twice the angular velocity of the particle of fluid at (x, y). The fact that this curl is zero for an irrotational flow means that the fluid particles may experience translations and distortions in their motion, but no rotation. There is no swirling effect in the fluid. If the flow is solenoidal, then

div(ui + vj) = ∂u/∂x + ∂v/∂y = 0

A further connection between flows and complex functions is provided by the following.
THEOREM 25.11
Let f be a differentiable function defined on a domain D. Then the conjugate f′(z)* is an irrotational, solenoidal flow on D. Conversely, if V = ui + vj is an irrotational, solenoidal vector field on a simply connected domain D, then there is a differentiable complex function f defined on D such that f′(z)* = V. Further, if f(z) = φ(x, y) + iψ(x, y), then

∂φ/∂x = u,  ∂φ/∂y = v,  ∂ψ/∂x = −v,  and ∂ψ/∂y = u
We leave a proof of this result to the student.

In view of the fact that f′(z)* is the velocity of the flow, we call f a complex potential of the flow. Theorem 25.11 implies that any differentiable function f(z) = φ(x, y) + iψ(x, y) defined on a simply connected domain determines an irrotational, solenoidal flow

f′(z)* = (∂φ/∂x + i ∂ψ/∂x)* = (u(x, y) − iv(x, y))* = u(x, y) + iv(x, y)

We call φ the velocity potential of the flow, and curves φ(x, y) = k are called equipotential curves. The function ψ is called the stream function of the flow, and curves ψ(x, y) = c are called streamlines. We may think of w = f(z) as a conformal mapping wherever f′(z) ≠ 0. A point at which f′(z) = 0 is called a stagnation point of the flow. Thinking of f as a mapping, in the w plane we have

w = f(z) = φ(x, y) + iψ(x, y)

Equipotential curves φ(x, y) = k map under f to vertical lines Re(w) = k, and streamlines ψ(x, y) = c map to horizontal lines Im(w) = c. Since these sets of vertical and horizontal lines are mutually orthogonal in the w plane, the streamlines and equipotential curves in the z plane also form orthogonal families. Every streamline is orthogonal to each equipotential curve at any point where they intersect. This conclusion fails at a stagnation point, where the mapping may not be conformal.

Along an equipotential curve φ(x, y) = k,

dφ = (∂φ/∂x) dx + (∂φ/∂y) dy = u dx + v dy = 0
Now ui + vj is the velocity of the flow at (x, y), and x′(s)i + y′(s)j is a unit tangent to the equipotential curve through (x, y). Since the dot product of these two vectors is zero, from the fact that dφ = 0 along the equipotential curve, we conclude that the velocity is orthogonal to the equipotential curve through (x, y), provided that (x, y) is not a stagnation point. Similarly, along a streamline ψ(x, y) = c,

dψ = (∂ψ/∂x) dx + (∂ψ/∂y) dy = −v dx + u dy = 0
so the normal to the velocity vector is orthogonal to the streamline. This means that the velocity is tangent to the streamline, and justifies the interpretation that the particle of fluid at (x, y) is moving in the direction of the streamline at that point. We therefore interpret streamlines as the trajectories of particles in the fluid. If we sat on a particular particle, we would ride along a streamline. For this reason, graphs of streamlines form a picture of the motions of the particles of fluid. The remainder of this section is devoted to some illustrations of these ideas.
EXAMPLE 25.23
Let f(z) = −Ke^{iα} z, in which K is a positive constant and 0 ≤ α ≤ 2π. Write

f(z) = −K(cos(α) + i sin(α))(x + iy) = −K(x cos(α) − y sin(α)) − iK(y cos(α) + x sin(α))

If f(z) = φ(x, y) + iψ(x, y), then

φ(x, y) = −K(x cos(α) − y sin(α)) and ψ(x, y) = −K(y cos(α) + x sin(α))

Equipotential curves are graphs of φ(x, y) = −K(x cos(α) − y sin(α)) = constant. Since K is constant, equipotential curves are graphs of x cos(α) − y sin(α) = k, or

y = cot(α) x + b

in which b is constant. These are straight lines with slope cot(α). Streamlines are graphs of

y = −tan(α) x + d

straight lines with slope −tan(α). These lines make an angle −α with the positive real axis, as in Figure 25.41. These are the trajectories of the flow, which may be thought of as moving along these straight lines. The streamlines and equipotential curves are orthogonal, their slopes being negative reciprocals. Now compute

f′(z)* = (−Ke^{iα})* = −Ke^{−iα}

This implies that the velocity has constant magnitude K. In summary, f models a uniform flow with velocity of constant magnitude K, making an angle −α with the positive real axis, as in Figure 25.41.
EXAMPLE 25.24
Consider the flow represented by the complex potential f(z) = z². This function is differentiable for all z, but f′(0) = 0, so the origin is a stagnation point. We will see what effect this has on the flow and determine the trajectories. With z = x + iy, f(z) = x² − y² + 2ixy, so

φ(x, y) = x² − y² and ψ(x, y) = 2xy

Equipotential curves are hyperbolas

x² − y² = k
FIGURE 25.41 Streamlines of the flow with complex potential f(z) = −Ke^{iα} z.
FIGURE 25.42 Equipotential curves and streamlines of the flow with complex potential f(z) = z².
if k ≠ 0. Streamlines are hyperbolas xy = c if c ≠ 0. Some curves of these families are shown in Figure 25.42. If k = 0 the equipotential curves are graphs of x² − y² = 0, which are two straight lines y = x and y = −x through the origin. If c = 0 the streamlines are the axes x = 0 and y = 0. The velocity of the flow is f′(z)* = 2z*, so f models a nonuniform flow having velocity of magnitude 2|z| at z. We can envision this flow as a fluid moving along the streamlines. In any quadrant, the particles move along the hyperbolas xy = c, with the axes acting as barriers of the flow (think of sides of a container holding the fluid).
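The orthogonality of the two families of hyperbolas can be seen directly. This sketch (not from the text) checks that the gradients of φ = x² − y² and ψ = 2xy, which are normal to the equipotential curves and streamlines respectively, are perpendicular at arbitrary sample points:

```python
# Check (not from the text) that streamlines xy = c and equipotential curves
# x^2 - y^2 = k of f(z) = z^2 cross at right angles away from the origin:
# the gradients of phi and psi have zero dot product.

def grad_phi(x, y):
    return (2 * x, -2 * y)

def grad_psi(x, y):
    return (2 * y, 2 * x)

for x, y in ((1.0, 2.0), (-0.5, 3.0), (4.0, -1.0)):
    gx1, gy1 = grad_phi(x, y)
    gx2, gy2 = grad_psi(x, y)
    assert gx1 * gx2 + gy1 * gy2 == 0     # the dot product vanishes exactly
```

This is the general C-R orthogonality: ∇φ · ∇ψ = u(−v) + v(u) = 0 wherever f is differentiable.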
EXAMPLE 25.25
We will analyze the complex potential

f(z) = (iK/2π) Log(z)

Here K is a positive number and Log(z) denotes that branch of the logarithm defined by Log(z) = (1/2)ln(x² + y²) + iθ, where θ is the argument of z lying in 0 ≤ θ < 2π, for z ≠ 0. If z = x + iy, then

f(z) = (iK/2π)[(1/2)ln(x² + y²) + iθ] = −Kθ/2π + i(K/4π)ln(x² + y²)

Now

φ(x, y) = −Kθ/2π and ψ(x, y) = (K/4π)ln(x² + y²)

Equipotential curves are graphs of θ = constant, and these are half-lines from the origin making an angle θ with the positive real axis. Streamlines are graphs of ψ(x, y) = constant, and these are circles about the origin. Since streamlines are trajectories of the fluid, particles are moving on circles about the origin. Some streamlines and equipotential curves are shown in Figure 25.43.
FIGURE 25.43 Streamlines and equipotential lines of the flow having complex potential f(z) = (iK/2π)Log(z).
FIGURE 25.44 Streamlines of the flow having complex potential f(z) = (iK/2π)Log(z).
It is easy to check that f′(z) = (iK/2π)(1/z) if z ≠ 0. On a circle |z| = r, the magnitude of the velocity of the fluid is

|f′(z)| = (K/2π)(1/|z|) = K/2πr.

This velocity increases as r → 0, so we have particles of fluid swirling about the origin, with increasing velocity toward the center (origin) (Figure 25.44). The origin is a vortex of the flow. To calculate the circulation of the flow about the origin, write the velocity u + iv as the conjugate of f′(z):

u + iv = conj(f′(z)) = −(iK/2π)(1/conj(z)) = −(iK/2π)(z/|z|²) = (K/2π)(y/(x² + y²)) − i(K/2π)(x/(x² + y²)).

If Γ is the circle of radius r about the origin, then on Γ, x = r cos θ and y = r sin θ, so

∮_Γ (u dx + v dy) = ∫_0^{2π} [(K/2π)(r sin θ/r²)(−r sin θ) − (K/2π)(r cos θ/r²)(r cos θ)] dθ = ∫_0^{2π} (−K/2π) dθ = −K.

This is the value of the circulation on any circle about the origin. By a similar calculation, we get

∮_Γ (−v dx + u dy) = 0,
so the origin is neither a source nor a sink. In this example we may restrict |z| > R and think of a solid cylinder about the origin as a barrier, with the fluid swirling about this cylinder (Figure 25.45).
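These two line integrals can be checked numerically. The sketch below (the values of K, r, and the number of quadrature points are arbitrary choices of ours, not from the text) approximates the circulation and the flux of the vortex flow on a circle of radius r:

```python
import cmath
import math

# Numerical check for the vortex potential f(z) = (iK/2pi) Log z.
# The velocity components (u, v) satisfy u + iv = conjugate(f'(z)),
# with f'(z) = iK/(2*pi*z).
K = 3.0       # vortex strength (arbitrary test value)
r = 1.5       # radius of the circle Gamma (arbitrary test value)
n = 20000     # number of quadrature points

circulation = 0.0   # approximates the integral of u dx + v dy
flux = 0.0          # approximates the integral of -v dx + u dy
dtheta = 2 * math.pi / n
for i in range(n):
    theta = i * dtheta
    z = r * cmath.exp(1j * theta)
    w = (1j * K / (2 * math.pi * z)).conjugate()  # u + iv
    u, v = w.real, w.imag
    dx = -r * math.sin(theta) * dtheta            # x = r cos(theta)
    dy = r * math.cos(theta) * dtheta             # y = r sin(theta)
    circulation += u * dx + v * dy
    flux += -v * dx + u * dy

print(circulation)  # close to -K = -3.0
print(flux)         # close to 0
```

The circulation comes out close to −K and the flux close to 0, consistent with a pure vortex that is neither a source nor a sink.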
EXAMPLE 25.26
We can interchange the roles of streamlines and equipotential curves in the preceding example by setting f(z) = K Log z, with K a positive constant. Now

f(z) = (K/2) ln(x² + y²) + iKθ,
25.5 Complex Function Models of Plane Fluid Flow
1093
FIGURE 25.45 Flow around a cylindrical barrier of radius R.

FIGURE 25.46 Equipotential curves and streamlines for the potential f(z) = K Log z.
so

φ(x, y) = (K/2) ln(x² + y²) and ψ(x, y) = Kθ.

The equipotential curves are circles about the origin and the streamlines are half-lines emanating from the origin (Figure 25.46). As they must, these circles and lines form orthogonal families of curves. The velocity of this flow is

u + iv = conj(f′(z)) = conj(K/z) = K x/(x² + y²) + iK y/(x² + y²).
Let Γ be a circle of radius r about the origin. Now we find that

∮_Γ (u dx + v dy) = 0

and

∮_Γ (−v dx + u dy) = 2πK.

The origin is a source of strength 2πK. We can think of particles of fluid streaming out from the origin, moving along straight lines with decreasing velocity as their distance from the origin increases.
EXAMPLE 25.27
We will model flow around an elliptical barrier. From Example 25.25, the complex potential f(z) = (iK/2π) Log z for |z| > R models flow with circulation −K about a cylindrical barrier of radius R about the origin. To model flow about an elliptical barrier, conformally map the circle |z| = R to an ellipse. For this, consider the mapping

w = z + a²/z,

in which a is a positive constant. This is called a Joukowski transformation, and is used in analyzing fluid flow around airplane wings because of the different images of the circle that result by making different choices of a. Let z = x + iy and w = X + iY. We find that the circle x² + y² = R² is mapped to the ellipse

X²/(1 + (a/R)²)² + Y²/(1 − (a/R)²)² = R²,

provided that a ≠ R. This ellipse is shown in Figure 25.47. If a = R, the circle maps to the segment [−2a, 2a] on the real axis.
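The ellipse equation can be verified numerically by pushing points of the circle |z| = R through the map; in this sketch the values of a and R are arbitrary test choices of ours:

```python
import cmath
import math

# Check numerically that w = z + a**2/z maps the circle |z| = R onto the
# ellipse  X^2/(1 + (a/R)^2)^2 + Y^2/(1 - (a/R)^2)^2 = R^2  when a != R.
a, R = 1.0, 2.0

def joukowski(z, a):
    """Joukowski transformation w = z + a^2/z."""
    return z + a * a / z

for k in range(200):
    theta = 2 * math.pi * k / 200
    w = joukowski(R * cmath.exp(1j * theta), a)
    X, Y = w.real, w.imag
    lhs = X**2 / (1 + (a / R)**2)**2 + Y**2 / (1 - (a / R)**2)**2
    assert abs(lhs - R**2) < 1e-9

print("all image points lie on the ellipse")
```

Choosing a closer to R flattens the ellipse, and a = R collapses it onto the segment [−2a, 2a], as described above.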
FIGURE 25.47 Joukowski transformation mapping a circle to an ellipse.
Solve for z in the Joukowski transformation. As a quadratic equation, this yields two solutions, and we choose

z = (w + √(w² − 4a²))/2.

Compose this mapping with the complex potential function for the circular barrier in Example 25.25. We get

F(w) = f(z(w)) = (iK/2π) Log((w + √(w² − 4a²))/2).

This is the complex potential for flow in the w plane about an elliptical barrier if R > a, and about the flat plate −2a ≤ X ≤ 2a, Y = 0, if R = a.
We will conclude this section with an application of complex integration to fluid flow. Suppose f is a complex potential for a flow about a barrier whose boundary is a closed path Γ. Let the thrust of the fluid outside the barrier be the vector Ai + Bj. Then a theorem of Blasius asserts that

A − iB = (1/2)iρ ∮_Γ (f′(z))² dz,

in which ρ is the constant density of the fluid. Further, the moment of the thrust about the origin is given by

Re(−(1/2)ρ ∮_Γ z (f′(z))² dz).

In practice, these integrals are usually evaluated using the residue theorem.
SECTION 25.5
PROBLEMS
1. Analyze the flow given by the complex potential f(z) = az, in which a is a nonzero complex constant. Sketch some equipotential curves and streamlines, determine the velocity, and determine whether the flow has any sources or sinks.

2. Analyze the flow having complex potential f(z) = z³. Sketch some equipotential curves and streamlines.

3. Sketch some equipotential curves and streamlines for the flow having potential f(z) = cos(z).

4. Sketch some equipotential curves and streamlines for the flow having potential f(z) = z + iz².
5. Analyze the flow having potential f(z) = K Log(z − z₀), in which K is a nonzero real constant and z₀ is a given complex number. Show that z₀ is a source for this flow if K > 0 and a sink if K < 0. Sketch some equipotential curves and streamlines of this flow.

6. Analyze the flow having potential f(z) = K Log((z − a)/(z − b)), where K is a nonzero real number and a and b are distinct complex numbers. Sketch some equipotential curves and streamlines for this flow.
7. Let f(z) = k(z + 1/z), with k a nonzero real constant. Sketch some equipotential curves and streamlines for this flow. Show that f models flow around the upper half of the unit circle.

8. Let f(z) = ((m − ik)/2π) Log((z − a)/(z − b)), in which m and k are nonzero real numbers and a and b are distinct complex numbers. Show that this flow has a source or sink of strength m and a vortex of strength k at both a and b. (A point combining properties of a source (or sink) and a vortex is called a spiral vortex.) Sketch some equipotential curves and streamlines for this flow.

9. Analyze the flow having potential f(z) = k(z + 1/z) + (ib/2π) Log z,
in which k and b are nonzero real constants. Sketch some equipotential curves and streamlines for this flow.

10. Analyze the flow having potential f(z) = iKa√3 Log((2z − ia√3)/(2z + ia√3)), with K and a positive constants. Show that this potential models an irrotational flow around a cylinder 4x² + 4(y − a)² = a² with a flat boundary along the x axis. Sketch some equipotential curves and streamlines for this flow.

11. Use Blasius's theorem to show that the force per unit width on the cylinder in Problem 10 has vertical component 2√3πaρK², with ρ the constant density of the fluid.
PART 8 Probability and Statistics
CHAPTER 26 Counting and Probability
CHAPTER 27 Statistics
Few areas of expertise have as profound an influence on us as statistics. Manufactured products are tested for reliability by performing statistical analyses of random samples. Risk analysis is an application of statistics. Statistics determine costs and availability of insurance, whether a new drug will receive FDA approval for distribution, which television programs will be shown and when, boundaries of congressional districts in the United States, the distribution of some kinds of federal aid, public health programs, and on and on. Professionals in business, medicine, science, engineering, and other areas use statistical analyses because they provide rigorous tools for analyzing information and drawing conclusions with measurable degrees of confidence. Understanding and using ideas from statistics require some knowledge of probability, which in turn makes use of counting techniques. Based on this chain of dependence, we will begin this part with counting, followed by probability, and then statistics.
CHAPTER 26
Counting and Probability

COMBINATIONS, PERMUTATIONS, MULTIPLICATION PRINCIPLE, PROBABILITIES, CONDITIONAL PROBABILITIES, BAYES'S THEOREM, TREE DIAGRAMS
Counting may sound simple enough. However, counting problems can be subtle. For example, how many lowercase, unordered nine-letter codes can be formed from the English alphabet if each vowel can be used no more than once? We will begin with techniques for solving the kinds of counting problems that we will encounter in statistics.
26.1
The Multiplication Principle

Suppose we have some process that proceeds in n independent stages. Independent means that the outcome of one stage is not influenced by the outcome of the others. How many different ways can the entire process be carried out? As a simple example, suppose we are designing a car (the process), and we have four door designs and six fender designs. Door and fender selection are the two stages. How many different cars can we design from these choices, if each door looks good with each fender, and two cars are different if they differ in either a fender, a door, or both? This problem is easily solved. With the first door, we can use any of six fenders for six possibilities. With the second door, we can choose any of six fenders for six more cars. Similarly, there are six fenders that can go with the third door and six with the fourth. The total number of car possibilities is 6 + 6 + 6 + 6 = 24. Or, since there are six fenders with each of four doors, the number of cars is 4 · 6 = 24. To take a slightly more complicated example, suppose we have four doors, six fender styles, and nine hood shapes. Now how many cars can be formed? Since we can form 24 cars from the doors and fenders, and each can go with one of the hoods, there are 24 · 9 = 216 cars that can be made in this way. The answer in this case is 4 · 6 · 9.
These examples suggest a general principle.

The Multiplication Principle Suppose a process consists of n independent stages, or steps. Suppose stage j can be done in sj ways. Then the total number of ways of carrying out the process is the product

s1 s2 · · · sn.

One way to look at this result is to envision n boxes, the jth holding sj objects (or choices):

[ s1 ] [ s2 ] [ s3 ] · · · [ sn−1 ] [ sn ]

The total number of ways of picking one object from each box is the product of the number of objects in each box.
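The principle is a one-line computation; in this sketch the helper name is ours, and the stage counts are the car example from the text:

```python
from math import prod

# The multiplication principle: the number of ways to carry out an
# n-stage independent process is the product of the per-stage counts.
def count_outcomes(stage_counts):
    """Total number of outcomes of independent stages."""
    return prod(stage_counts)

print(count_outcomes([4, 6]))     # doors and fenders: 24
print(count_outcomes([4, 6, 9]))  # doors, fenders, and hoods: 216
```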
EXAMPLE 26.1
A person flips a coin nine times. How many outcomes are there? We want to count something, namely the number of outcomes. However, before we count something, we should be clear on what we are counting. What is an outcome? An outcome here is the result of nine flips, hence will be a string of nine symbols, with the jth symbol an H (if toss j came up a head) or a T (if toss j was a tail). A typical outcome might be

T H H T H H H T H

The number of outcomes is the number of different nine-letter strings, with each letter a T or an H. Think of nine empty boxes. There are two possible results we can put in the first box, namely T or H. There are also two for the second, and so on. Further, these flips are independent. The result of any flip does not influence the result of any other flip. The total number of possible outcomes of nine flips is therefore two multiplied by itself nine times, or 2⁹, which equals 512. This is an application of the multiplication principle, with n = 9 and each sj = 2. Similarly, the number of possible outcomes with thirty flips is

2³⁰ = 1 073 741 824.
EXAMPLE 26.2
A game consists of flipping a coin four times, then rolling five dice. What is the number of outcomes? Here the activity or experiment has nine stages. The first four are all coin tosses, each with two possible outcomes. The last five are dice tosses, each with six outcomes. The number of possible outcomes is 2 · 2 · 2 · 2 · 6 · 6 · 6 · 6 · 6, or 2⁴6⁵, which equals 124 416. In the formalism of the multiplication principle, n = 9 and s1 = s2 = s3 = s4 = 2, while s5 = s6 = s7 = s8 = s9 = 6.
EXAMPLE 26.3
We want to form identification codes by choosing seven integers from 1 through 9, inclusive. How many ways can this be done?
26.1 The Multiplication Principle
1101
This problem has a wrinkle to it. If we choose the seven integers with replacement (that is, each can be any integer from 1 through 9, inclusive), then the total number of outcomes is 9⁷, or 4 782 969. If, however, the seven integers are chosen without replacement, then every time we pick one it is used up, and we cannot choose it again. This means that there are nine choices for the first integer picked, but only eight for the second, seven for the third, and so on. Now the number of possible outcomes is 9 · 8 · 7 · 6 · 5 · 4 · 3, or 181 440.
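The two counts in Example 26.3 can be contrasted in a few lines; the helper names in this sketch are ours:

```python
# Selection with and without replacement: seven digits chosen from 1..9.
def count_with_replacement(n, r):
    """Each of the r picks can be any of n symbols."""
    return n ** r

def count_without_replacement(n, r):
    """Ordered picks; a symbol is used up once chosen: n(n-1)...(n-r+1)."""
    total = 1
    for i in range(r):
        total *= n - i
    return total

print(count_with_replacement(9, 7))     # 4782969
print(count_without_replacement(9, 7))  # 181440
```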
SECTION 26.1
PROBLEMS
1. A game consists of flipping a coin seven times and rolling a die five times.
(a) What is a typical outcome?
(b) How many possible outcomes are there of this game?

2. A room has forty chests of drawers, and each chest has six drawers. A worker is assigned the task of inspecting each drawer in each chest. If each chest and each drawer takes one second, how many seconds will it take the worker to complete this task?

3. A game consists of rolling ten dice, then choosing an integer falling between 7 and 21, inclusively. How many different outcomes are there to this game?

4. A man's outfit for the day consists of a pair of shoes, a pair of trousers, a shirt, a tie, a jacket or a sweater (but not both), and a coat and a hat. He has seven pairs of shoes, twelve pairs of trousers, fifteen shirts, ten ties, three jackets, four sweaters, two coats and four hats. How many different outfits are possible, assuming that all the pairings making up an outfit are compatible (so, for example, any shirt looks good with any sweater)? How does the answer change, if at all, if an outfit consists of a pair of shoes, a pair of trousers, a shirt, tie, jacket and/or sweater, and a coat and/or hat?

5. ID codes for a company's employees are to be formed as strings of six symbols. Each of the first three symbols can be any of the integers 1, 3, 7, or 9, and each of the last three can be any of the letters A, C, D, E, or K. The company has 3 300 employees. Determine whether it is possible to assign each employee a different code in this way.

6. A factory produces tops for coffee tables. For each top, any of six colored panels is selected, then any of fifteen designer patterns is stamped on the selected panel. Finally, the underside of the panel is given any of four different finishes. How many different coffee table tops can be produced in this way?

7. A woman can fill out a set of golf clubs by choosing any one of seven putters, any one of fifteen drivers, any one of eight wedges, and any one of seventeen irons. How many different sets are there?

8. A carnival game consists of a large wheel with the numbers 1 through 50 spaced equally around the perimeter. A pointer is pivoted at the center and is spun by the operator, eventually coming to land on one of the numbers. The pointer is spun four times. What does an outcome of these four spins look like, and how many different possible outcomes are there?

9. A committee of four is to be formed on the city council. The first member must be chosen from among seven residents of south side, the second from among eleven residents of city center, the third from among six residents of the park area, and the fourth from among fourteen residents of north beach. How many different committees can be formed?

10. Six coins are each flipped twenty times. What does an outcome look like, and how many different outcomes are there?
26.2
Permutations

Suppose we have a collection of n objects, and we want to arrange or list them in some order. How many ways are there to do this? This question has an important but simple answer. Think of having n boxes. We can put any of the n objects in the first box, so there are n ways to fill this box. With one object chosen, there are n − 1 ways to fill the second box, then n − 2 ways to pick an object for the third box, and so on. Continue in this way. For the third from the last box, there are three remaining objects to choose from. For the next to last, there are two to choose from. Finally, there is only one object left, and that goes into the last box. By the multiplication principle, the total number of ways of making these choices is

n · (n − 1) · (n − 2) · · · 3 · 2 · 1,

which is the product of the integers from 1 through n. Our conclusion is: The number of ways of ordering n objects is the product of the integers from 1 through n, inclusive.
An ordering of objects is called a permutation of these objects. Thus, the number of permutations of three objects is 6, the number of permutations of four objects is 24, and the number of permutations of n objects is 1 · 2 · 3 · · · n. This product occurs frequently in counting and other contexts (for example, coefficients in Taylor expansions), and is denoted n! (read "n factorial"). In this language: The number of permutations of n objects is n!
Here is a short list of factorials: n
n!
1 2 3 4 5 6 10 15 20 30 50
1 2 6 24 120 720 3,628,800 13077 × 1012 24329 × 1018 26525 × 1032 30414 × 1064
Factorials grow at an extremely rapid rate; 10! is already well over three million.
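The table above can be reproduced with the standard library's factorial function:

```python
from math import factorial

# Recompute a few rows of the factorial table.
for n in (1, 2, 3, 4, 5, 6, 10, 15, 20):
    print(n, factorial(n))
# factorial(10) = 3628800 -- already "well over three million".
```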
EXAMPLE 26.4
A lottery entry involves choosing any ordering of the objects a, b, c, d, e, and then any ordering of the objects A, B, C, D. How many possible outcomes are there in this lottery? Think of the game as consisting of two stages, the first choosing any ordering of a, b, c, d, e, the second choosing an ordering of A, B, C, D. There are 5! = 120 ways of doing the first step, and 4! = 24 ways of doing the second step. Further, the two steps are independent.
26.2 Permutations
1103
By the multiplication principle, there are 120 · 24 = 2880 possible outcomes. For example, one outcome is

b c a d e B A D C.

Suppose we require that a be in the first position, and A in the last. How many outcomes are possible with this requirement? Think of nine boxes, with a already placed in the first and A in the last. In the second through fifth places, there can be any arrangement of b, c, d, and e. There are 4! such arrangements. In the sixth through eighth places, there can be any arrangement of B, C, and D. There are 3! = 6 such arrangements. There are therefore 4! · 3! = 24 · 6 = 144 outcomes.
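Both counts in Example 26.4 can be confirmed by brute-force enumeration:

```python
from itertools import permutations

# An outcome is an ordering of a,b,c,d,e followed by an ordering of A,B,C,D.
lower = "abcde"
upper = "ABCD"
outcomes = [p + q for p in permutations(lower) for q in permutations(upper)]
print(len(outcomes))  # 5! * 4! = 2880

# Constrained case: 'a' in the first position and 'A' in the last.
constrained = [o for o in outcomes if o[0] == "a" and o[-1] == "A"]
print(len(constrained))  # 4! * 3! = 144
```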
EXAMPLE 26.5
Suppose we are playing a poker game in which each hand has seven cards, but the order in which the cards are played is important. A particular player is dealt a jack, a king, a four, a two, a ten, a seven, and a queen. How many different ordered hands can the player form? Since there are seven different cards, there are 7! = 5040 different orderings. This is the number of different hands that can be formed. The actual value of the cards is irrelevant information in reaching this conclusion. Given any seven different cards, there are 5040 orderings of them.
SECTION 26.2
PROBLEMS
1. How many ways can the first nine letters of the alphabet be arranged in different orders?

2. How many different codes can be formed from the lower case letters of the English alphabet, if a code consists of seventeen distinct letters, with different orders counting as different codes?

3. An ID number for each employee in a certain company consists of a string of nine numbers, consisting of an arrangement of the integers 1 through 9. How many different ID numbers can be formed?

4. Suppose we have five symbols available, and these are %, $, #, ∗ and &. One plan is to form passwords by using different orderings of these five symbols. A second plan is to use ordered strings chosen from these five symbols, in which any symbol can be used one or more times. How many different passwords are possible under each plan?

5. (a) How many arrangements are there of the objects a b c d f g h?
(b) How many arrangements are there if we insist on using only those beginning with a?
(c) How many arrangements are there if a must be first and g fifth in the list?

6. We want to form ID numbers by using n distinct symbols and allowing any order for their arrangement. How large must n be to accommodate 20 000 people? How many for 1 000 000 people?

7. Letters a through f are to be arranged in a three by two pattern, with one symbol in each location. For example,

da
bf
ce

is one such pattern. How many patterns can be formed in this way?

8. Bowling pins are traditionally arranged in a triangle of fifteen identical pins. Suppose we decide to number the pins 1 through 15. How many different arrangements are there of these pins?

9. A lottery is run as follows. Twelve slips of paper are placed in a bowl. Each slip has a different symbol on it. A player makes an ordered list of these twelve symbols, and wins if they are drawn from the bowl in this order. How many possible different outcomes are there of this lottery?

10. The letters a through l are to be arranged in some order. How many arrangements are there that have a in the second place, d in the fifth place and k in the seventh place?

11. How many ways are there of choosing, with order, the even integers out of the integers 1 through 12?

12. We have seen that n! increases at a fast pace as n is chosen larger. Because factorials arise in many contexts, there are formulas that can be used to approximate n! when n is large, but which can be computed in fewer steps than it takes to compute n! as the product of the integers from 1 through n. Stirling's formula states that

n! ≈ √(2πn) (n/e)ⁿ,

with accuracy improving as n is chosen larger. To test this supposition, make a three-column table, having values of n for n = 1, 2, …, 20 in the first column. In the second column, compute n!, and in the third column, the Stirling approximation. Compute the percentage error in the approximation in each case.
26.3
Choosing r Objects from n Objects

Suppose we have n objects, and we want to choose r of them, with 1 ≤ r ≤ n. How many ways are there to do this? The answer depends on whether or not we take order into account. First consider the case that we do.
26.3.1
r Objects from n Objects, with Order
Given n objects, how many ways are there to make an ordered list of r of these objects? Using the multiplication principle, it is not difficult to derive a general formula for this number.
THEOREM 26.1
Let r and n be positive integers, with 1 ≤ r ≤ n. Then the number of ways of picking r objects from n objects, taking order into account, is

n!/(n − r)!.

To understand this conclusion, envision r boxes. There are n choices of objects to put in box 1, leaving n − 1 choices for box 2, then n − 2 for box 3, and so on, until finally there are n − r + 1 objects left from which to pick one for box r. By the multiplication principle, the number of ways of making these choices of r objects from the given n objects is the product

n(n − 1)(n − 2) · · · (n − r + 1).
This product is

n(n − 1) · · · (n − r + 1) = [1 · 2 · · · (n − r − 1)(n − r)(n − r + 1) · · · n] / [1 · 2 · · · (n − r)] = n!/(n − r)!.
This number is often denoted nPr, which is an abbreviation for "the number of permutations of r objects chosen from n objects." Thus,

nPr = n!/(n − r)!.

For example, if n = 4 and r = 2, then

4P2 = 4!/2! = 4 · 3 = 12.

This is the number of ways of choosing two ordered objects from four objects. As a convenience, define 0! = 1. In the present context, this makes sense. Suppose, for example, we want to compute 6P6. This is the number of ways of choosing six ordered objects from six objects. But this is exactly the number of permutations of six objects, which we know equals 6! Now compute

6P6 = 6!/(6 − 6)! = 6!/0! = 6!.
EXAMPLE 26.6
An election is being held and there are eight nominations for three offices. A ballot consists of three lines. The name filled in first receives one vote for president, the second name receives one vote for vice president, and the third name receives one vote for janitor. No name can be duplicated on any ballot. How many different ballots are possible in this election? Obviously the order in which candidate names are listed is significant here. The number of ballots is the number of ways of choosing three from eight, with order:

8P3 = 8!/5! = 6 · 7 · 8 = 336.
EXAMPLE 26.7
An ID code is to consist of eight different lowercase letters of the English alphabet, written in some order, together with an ordered string of any five distinct integers, chosen from 1 through 9, inclusive. How many different codes are there? There are 26P8 ways of choosing eight of the 26 letters of the alphabet, with order, and 9P5 ways of choosing five of the integers 1 through 9, with order. By the multiplication principle, the number of codes is

26P8 · 9P5 = (26!/18!)(9!/4!) = (19 · 20 · · · 26)(5 · 6 · · · 9) ≈ 9.5242 × 10¹⁴.
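The count in Example 26.7 can be checked with the standard library's perm function, where perm(n, r) = n!/(n − r)!:

```python
from math import perm

# Example 26.7: eight ordered letters from 26, times five ordered digits from 9.
codes = perm(26, 8) * perm(9, 5)
print(codes)  # 952422831360000, about 9.5242 x 10**14
```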
26.3.2
r Objects from n Objects, without Order
Sometimes we want the number of ways of choosing r objects out of n objects, but the order in which the selected objects are listed does not matter. This number is denoted nCr, and is called the number of combinations of r objects chosen from n objects. Whenever the word "combination" is used, order is not taken into account. When "permutation" is used, order is a factor. We already know how to compute nCr. Given n and r, nPr differs from nCr only in taking the orderings into account. There are r! ways of ordering any r of the n objects. Therefore each combination of r of the n objects counted in nCr gives rise to r! different orderings of these objects, and these are all included in the number nPr. This means that

nPr = r! · nCr.

Therefore,

nCr = (1/r!) nPr = n!/(r!(n − r)!).
For example, suppose we want to choose two objects out of four. If order is taken into account, then there are twelve possibilities, since

4P2 = 4!/2! = 12.

Explicitly, the possible choices are

ab ac ad bc bd cd
ba ca da cb db dc

If order is not to be taken into account, then ab and ba are counted as one choice. Similarly, ac and ca are the same, and so on. There are six distinct choices (without order) of two of these four letters, and they are

ab ac ad bc bd cd.

Without having this explicit list, we could still know that there are six such combinations by computing

4C2 = 4!/(2!2!) = (3 · 4)/2 = 6.
EXAMPLE 26.8
Elections are being held to form a committee to consist of four club members. There are seven nominations. A ballot consists of a list of four of the nominees, without regard to order, because the committee has no hierarchy. How many different ballots are there? Discount ordering and look for combinations, not permutations, of four members chosen from the seven nominees. The number of these combinations is

7C4 = 7!/(4!3!) = (2 · 3 · 4 · 5 · 6 · 7)/((2 · 3 · 4)(2 · 3)) = 35.
EXAMPLE 26.9
Consider a poker game in which seven cards are dealt face down. Bets are placed after all hands have been dealt, at which time each player has seven cards making up the hand. What is the total number of possible hands?
Assume a standard 52-card deck. Then the number of hands is the number of ways of choosing seven from 52 cards, without regard to order:

52C7 = 52!/(7!(52 − 7)!) = 52!/(7!45!) = (46 · 47 · 48 · 49 · 50 · 51 · 52)/(2 · 3 · 4 · 5 · 6 · 7) = 133 784 560.
Order does not matter here, because bets are made only after each player receives all seven cards, and nothing would change if the same seven cards were dealt in a different order. Now suppose each player is dealt one card face up, and then each player is dealt one card face down, at which time bets are made. Then each player is dealt one more card face down, followed by another round of betting. This continues until each player has seven cards (so six are face down, and there are six rounds of betting). Notice the difference between this and the first game. In the first game, the player only cares about which seven cards have been dealt, and the order made no difference. In the second game, order matters. If a queen is dealt first to a player (for all to see) and a second queen is dealt as the second card (for only the player to see), the player might bet in the early rounds differently than if the second queen is dealt in the last round. And opponents might bet differently if both queens had been dealt face down. Now order is vital to betting strategy. With order, the number of possible hands is

52P7 = 52!/45! = 46 · 47 · 48 · 49 · 50 · 51 · 52 = 674 274 182 400.
It is routine to check that this is 7! times 52C7.

Often nCr is written as the binomial coefficient (n r), with n printed above r. For example,

(6 2) = 6!/(2!4!) = 15 and (5 2) = 5!/(2!3!) = 10.
This notation is often seen in connection with binomial expansions. If x and y are numbers, and n is a positive integer, then

(x + y)ⁿ = Σ_{k=0}^{n} (n k) xᵏ yⁿ⁻ᵏ.

Because of their appearance as coefficients in a binomial expansion, the numbers (n k) are called binomial coefficients. The coefficient of xᵏ (which is the same as the coefficient of xⁿ⁻ᵏ) is the number of ways of choosing k objects from n objects, disregarding the order of the choice. This makes sense, because if we choose k from n objects, then we have automatically also chosen n − k of the n objects.
26.3.3
Tree Diagrams
In working with permutations and combinations, it is often handy to use a systematic device called a tree diagram. An illustration will clarify what this is.
Consider strings of symbols of length 5, using either an H or a T in each place. For example, HHTHT
and HTTTT
are two such strings. There are 2⁵ = 32 strings. Now, suppose we want the number of strings having exactly two T's (and therefore three H's). Think of a string of five boxes. We want to pick out two to put T in, the rest automatically getting an H. There are 5C2 = 10 ways of doing this. Order is irrelevant here because, once we pick two boxes, we put an identical T in each, then an identical H in the other three. None of this involves a new idea. But now suppose we want to actually list all ten of these strings. Here is a systematic way to do this. Put a dot on a piece of paper to represent a starting point. Next, place two dots to the right (or left, if you prefer), one for H and the other for T, with a line (or edge, or branch) from the starting dot to each of these. So far the diagram displays the outcomes of the first flip. Next draw two lines, ending in dots for H and T, from each of the first H and T dots. The diagram so far, Figure 26.1, shows outcomes of the first two flips. Continue, putting two lines from each of the four end dots drawn so far, with each line ending with an H or a T dot (Figure 26.2). The pattern is obvious, and we carry out two more steps (two lines from each H or T, then repeat once more), until strings of five edges form along each path from the starting dot to the right-most dots (Figure 26.3). The resulting configuration is called a tree diagram, and it is useful for reading information about
FIGURE 26.1 Outcomes of the first two coin flips.

FIGURE 26.2 Outcomes of the first three coin flips.
FIGURE 26.3 Outcomes of five coin flips.
the strings. Each path of five edges, from the starting dot to an end dot on the right, details one complete sequence of five flips of the coin. By following various paths, we can, for example, easily pick out exactly those having two T ’s and three H’s: HHHTT HHTHT HHTTH HTHHT HTHTH HTTHH THHHT THHTH THTHH TTHHH Although we knew without the tree diagram that there would be ten of these outcomes, the tree diagram is a systematic way of listing them all, if that is needed.
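The same enumeration the tree diagram performs by hand can be done programmatically:

```python
from itertools import product

# All strings of H and T of length five, then those with exactly two T's;
# there are 5C2 = 10 of the latter, matching the list above.
strings = ["".join(s) for s in product("HT", repeat=5)]
two_ts = [s for s in strings if s.count("T") == 2]

print(len(strings))  # 2**5 = 32
print(len(two_ts))   # 10
print(two_ts[0])     # HHHTT
```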
FIGURE 26.4 Outcomes of flipping two coins (or one coin twice), then rolling two dice (or one die twice).
EXAMPLE 26.10
A game consists of flipping two coins, then rolling two dice. An outcome will consist of a string of length four, the first two symbols recording the results of the coin flips (H or T) and the last two the numbers that come up on the dice (1 through 6). There are 2 · 2 · 6 · 6 (or 144) outcomes. Figure 26.4 shows a tree diagram for this game. By following all of the paths from the left starting dot to the end dots on the right, we can read all of the outcomes explicitly. For example, all of the outcomes in which the two dice come up both 3, or both 4, or one 3 and one 4, can be easily read from the diagram:

HH33 HT33 TH33 TT33
HH44 HT44 TH44 TT44
HH34 HT34 TH34 TT34
HH43 HT43 TH43 TT43
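Example 26.10's outcomes can be enumerated directly, including the sixteen outcomes listed from the diagram:

```python
from itertools import product

# Two coin flips followed by two dice rolls; each outcome is a
# four-symbol string such as HT34.
outcomes = ["".join(map(str, o))
            for o in product("HT", "HT", range(1, 7), range(1, 7))]
print(len(outcomes))  # 2 * 2 * 6 * 6 = 144

# Outcomes in which each die shows a 3 or a 4:
subset = [o for o in outcomes if o[2] in "34" and o[3] in "34"]
print(len(subset))  # 16, the outcomes listed in the text
```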
SECTION 26.3
PROBLEMS
1. Seven members of an audience of 25 people are to be chosen to win a prize. The first name drawn will win half of the planet, the second name drawn will win an airplane, the third name drawn a new house, and so on down to the last name drawn, which will win fifty cents worth of merchandise at the nearest drug store. How many different possible outcomes are there of this drawing?

2. There are five positions open on the board of a swimming club, and sixteen people are eligible for election to the board. A ballot consists of a list of names of five of the eligible members, with order being important because new board members are assigned positions of decreasing importance, depending on how far down the list a person is selected. How many different ballots are possible in this election?

3. A car dealer has a lottery. First, 22 names are selected at random from a database of past customers. Six of these names will be winners. The first name chosen can go through the lot and pick any car. The second person can do the same, but perhaps the best car has already been chosen by the first person. The third person also goes through the lot, taking third pick, and so on. How many different possibilities are there for lists of people to go through the lot and make choices?

4. A game consists of choosing and listing, in order, three numbers from the integers 1, 2, …, 20.
(a) How many different choices are there?
(b) What percentage of these begins with the number 4? Would this answer change if the first number were 17 instead of 4?
(c) What percentage of these choices ends with the number 9? Would this number be the same if the last number were 11 instead of 9?
(d) How many choices are there if the first number must be 3 and the last number 15?

5. How many different ways can a ten-card hand be dealt from a standard 52-card deck, if the order in which the cards are dealt is unimportant? What is the number if the order of the deal is significant?

6. How many different nine-man lineups can be formed from a roster of 17 players, if order of selection is not a factor? How many can be formed if order makes a difference?

7. How many different ways can four drumsticks be chosen from a barrel containing twenty drumsticks, if order of selection does not matter?

8. A company is selecting twelve of its forty employees to lay off. How many different ways can such a selection be made, if the order of choice is unimportant?

9. A carnival game begins with the contestant drawing five cards from a standard deck. Contrast the number of different ways this can be done if order is important, with the number of different hands that can be drawn if order does not count.

10. How many different ways can a seven-person committee be selected from a group of seventeen people, (a) if the order of selection is important, and (b) if the order does not matter?

11. Let n and k be integers with 0 ≤ k ≤ n. Show that n Ck = n Cn−k.

12. Five coins are tossed.
(a) Draw a tree diagram for this experiment.
(b) List all outcomes in which the second and third coins are heads.
(c) List all outcomes in which the first coin comes up heads and the fourth, tails.
(d) List all outcomes in which two coins come up heads and two come up tails.

13. Three dice are rolled.
(a) Draw a tree diagram for this experiment.
(b) List all outcomes in which the sum of the dice is even.
(c) List all outcomes in which the first and third die are odd.
(d) List all outcomes in which the first die comes up 2 or 5, and the last die is 3, 4 or 6.

14. Two coins are tossed and then two dice are rolled.
(a) Draw a tree diagram for this experiment.
(b) List all outcomes having a head on the first coin and the sum of the dice odd.
(c) List all outcomes in which the first die is a 1 or 5.
(d) List all outcomes in which both coins come up tails and the second die is 1, 3 or 4.
26.4

Events and Sample Spaces

In this section we begin to establish a framework within which we will be able to compute probabilities of certain kinds of events. In order to isolate the essential issues, we begin with a simple probability problem that we can all solve.
Flip an honest coin. What is the probability that it comes up heads? Everyone knows that this probability is 1/2. Although this probability is sometimes stated as 50%, we prefer to specify any probability as a number from 0 to 1 inclusive.
Why is this obvious? Perhaps without explicit verbalization, 1/2 seems natural for this probability because a head is one of two equally likely outcomes, heads or tails. Formally,

Probability(head) = (number of ways the coin can come up heads)/(total number of possible outcomes of one flip) = 1/2.
Try this reasoning on a similar but slightly more complicated problem. Roll a single die. What is the probability that it comes up 3? It is common to guess that this probability must be 1/6. Why? There are six possible outcomes (six faces on the die), and they are all equally likely (honest die), so 3 will likely come up one time in six, for a probability of 1/6. Again,

Probability(rolling 3) = (number of ways 3 can come up)/(total number of possible outcomes of one roll) = 1/6.
Notice a common thread in both of these probabilities. In each, we performed an experiment (flip a coin, roll a die), and there were certain outcomes. The probability of an outcome was the number of ways this outcome could occur, divided by the total number of possible outcomes. These simple examples are useful in suggesting an approach that will enable us to handle more complicated probability questions.
The fundamental setting for any question in probability is an experiment, by which we mean simply something that is done. It could be a coin flip, a roll of a die, pulling an item from a production line, picking twelve cards from a deck, choosing a marble from a jar, guessing a lottery number, choosing a patient in a clinical trial, or almost any other action that has a finite number of outcomes. We will not consider experiments with infinitely many outcomes. We assume that each experiment we deal with has known outcomes. The number of outcomes may be very large, so we do not want to have to actually make a list to compute a probability. But we do need to know what the outcomes are and at least be able to describe them all by some rule or rules. We will also have to be able to count them. For example, suppose an experiment is to choose an integer at least as large as 1 and no larger than ten hundred billion. This experiment has ten hundred billion possible outcomes, namely all of the integers in the specified range. It would take us a long time to actually write these all down in a list, and there is no need to do this. The important thing is that we know all the outcomes, what they look like, and how many there are.
DEFINITION 26.1
Sample Space
The set of all outcomes of an experiment is called its sample space.
EXAMPLE 26.11
An experiment is to flip two coins. There are four possible outcomes. If the sample space is called Q, then

Q = {HH, TT, HT, TH}.

These are all the outcomes. Notice that TH and HT are separate outcomes. There are two ways we can get a head and a tail: a head on the first coin and a tail on the second, or a head on the second coin and a tail on the first.
EXAMPLE 26.12
Roll two dice. How many outcomes are there?
We can record the result of rolling two dice as a pair (a, b), where a is the number that comes up on the first die, and b is the number on the second die. There are six choices for a, and six for b, so by the multiplication principle, there are 6 · 6 = 36 outcomes. If we list them explicitly, the sample space W is

W = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6),
(2,1), (2,2), (2,3), (2,4), (2,5), (2,6),
(3,1), (3,2), (3,3), (3,4), (3,5), (3,6),
(4,1), (4,2), (4,3), (4,4), (4,5), (4,6),
(5,1), (5,2), (5,3), (5,4), (5,5), (5,6),
(6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}.
EXAMPLE 26.13
An experiment consists of choosing (without replacement) five different letters of the English alphabet and writing them in some order. What are the outcomes, and how many outcomes are there?
Suppose we use only lowercase letters. Then there are 26 letters from which to choose five. Order counts because of the way the experiment was defined. Thus there are

26 P5 = 26!/21! = 22 · 23 · 24 · 25 · 26 = 7,893,600

outcomes, each of which is a string of five different letters (because the letters were chosen without replacement). Typical outcomes are acdfg and gdfac. We know what an outcome looks like, and how many there are.
Once we understand the concepts of an experiment and the resulting sample space, we can define an event for an experiment.
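As a quick check (ours, not part of the text), the count 26 P5 can be confirmed with Python's standard library:

```python
import math

# Ordered selections of 5 distinct letters from 26, order significant:
# 26 P 5 = 26!/21! = 22 * 23 * 24 * 25 * 26.
n = math.perm(26, 5)
print(n)  # 7893600
assert n == math.factorial(26) // math.factorial(21)
assert n == 22 * 23 * 24 * 25 * 26
```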
DEFINITION 26.2
Event
An event for an experiment is a set of outcomes.
The rationale for this definition is that we sometimes want the probability of something that is more complicated than a single outcome, and consists of several outcomes specified in some way.
EXAMPLE 26.14
Flip three coins. A natural way to record an outcome of this experiment is as a 3-character string of H's and T's. Since each flip has two possible outcomes, there are 2^3 = 8 outcomes. This is a small enough number that we can actually write out the sample space explicitly:

S = {HHH, HHT, HTH, HTT, TTT, TTH, THT, THH}.

An event in this experiment is any collection of outcomes. The entire sample space S is an event. Another is

E1 = {HHH, TTT, HTH}.

Another event is

E2 = {HHH, TTT}.

This is the event that all three flips come up the same. Still another event is

E3 = {HTT, THT, TTH},

the event that there is exactly one head.
EXAMPLE 26.15
Consider the experiment of rolling two dice. The sample space W consists of 36 outcomes, each of the form (a, b), where a and b can independently be any integer from 1 through 6, inclusively. Suppose we are interested in the event E that the sum of the numbers that come up on the dice is 5. E consists of all outcomes (a, b) with a + b = 5:

E = {(1,4), (2,3), (3,2), (4,1)}.

These are all the ways the dice can total 5.
Suppose we want the event B consisting of all the ways we can roll a total of 2. This is the event

B = {(1,1)}.

There is only one way to roll a total of 2. We have the intuition that it is less likely to roll a total of 2 than a total of 5, because there is only one way to roll 2, but four ways to roll 5.
It will be useful to have an empty event. This event is denoted ∅, and consists of no outcomes. To illustrate, in the last example, the event that the dice total 13 is empty, containing no outcomes, since this can never happen.
We are now ready to compute probabilities. In doing this, we emphasize a crucial point: the difference between an outcome and an event. An outcome is something that occurs when the experiment is performed. An event is a collection of one or more outcomes (or perhaps even no outcomes). Outcomes are the building blocks of events.
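The events of Example 26.15 can be built mechanically from the sample space; this is a sketch of ours, not part of the text:

```python
from itertools import product

# Sample space for two dice: 36 ordered pairs (a, b).
W = list(product(range(1, 7), repeat=2))

E = [p for p in W if sum(p) == 5]          # dice total 5
B = [p for p in W if sum(p) == 2]          # dice total 2
empty = [p for p in W if sum(p) == 13]     # dice total 13: the empty event

print(len(W), len(E), len(B), len(empty))  # 36 4 1 0
```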
SECTION 26.4
PROBLEMS
In each of Problems 1 through 10, an experiment is described. Describe a typical outcome, and count the number of outcomes. Determine the number of outcomes in each of the events defined for each experiment. It is not necessary to list all of the outcomes of an event explicitly.
1. Roll four dice. Event A consists of all outcomes in which the dice total 9. Event B consists of all outcomes in which each die comes up 1 or 2.

2. Draw seven cards from a standard 52-card deck, with order of the draw having no importance. Event W consists of all hands having only cards numbered 2 or 3. Event K consists of all hands having only kings and/or aces. Event M consists of all hands with exactly one ace.

3. Pick four distinct lowercase letters from the English alphabet, without regard to order. Event A consists of the outcomes having an a and a z. Event W consists of the outcomes having only letters between a and g, inclusively.

4. Pick five distinct lowercase letters from the English alphabet, taking order into account. Event I consists of all outcomes beginning with zk. Event W consists of all outcomes having third letter e. Event P consists of all outcomes with chjk as the last four letters.

5. Flip three coins and then roll three dice. Event C consists of all outcomes with three heads and the dice totalling 15. Event D consists of all outcomes with at least two heads and the dice totalling at least 15. Event E consists of all outcomes with exactly one head and only a five or six showing on each die.

6. Pick an integer from 0 through 10 inclusive, then three letters from the English alphabet. M is the event that the number chosen is even. K is the event that the number is 3, 5 or 7, and one of the letters is w.

7. Five dice are rolled. U is the event that the dice total at least 28. K is the event that the dice total no more than 7.

8. Twenty-three coins are flipped. R is the event that at least twenty heads come up. Q is the event that the last nineteen flips all came up heads.

9. Sixteen cards are drawn (without regard to order) from a standard 52-card deck. Y is the event that every card drawn was a face card or ace. M is the event that four jacks, four kings and three aces were drawn (together with other cards).

10. A ten-sided polygon has its sides numbered 1 through 10. Four sides are selected at random. Y is the event that the sides total exactly 14. P is the event that sides 2, 3 and 7 were picked.

26.5
The Probability of an Event

Suppose some experiment has taken place, and we focus on a certain event. We want to assign a number, between 0 and 1, to this event in such a way as to give a measure of the likelihood, or probability, that this event will occur if we perform this experiment. In this scheme, probability 0 means that the event cannot occur, probability 1 means that it is certain, and numbers strictly between 0 and 1 are assigned to events that are possible, but not certain. Further, these numbers must be assigned so that larger probabilities correspond in a reasonable way to "more likely" events.
To calculate the probability of an event, the following must be in place.

1. There must be an experiment and a sample space S = {o1, o2, …, oN} with finitely many outcomes. We call oj the jth outcome.
2. For each outcome oj, there must be given, or determined, a number Pr(oj), with 0 < Pr(oj) < 1. Pr(oj) is called the probability of oj.
3. The sum of the probabilities of all the outcomes must equal 1:

Pr(o1) + Pr(o2) + ··· + Pr(oN) = 1.

The function Pr, operating on the outcomes, is called a probability function, or probability measure, for this experiment. When we have a probability function, the probability of an event E is denoted Pr(E), and is defined to be the sum of the probabilities of the outcomes in E:

Pr(E) = sum of the probabilities of the outcomes in the event E.

For example, if E = {o3, o7, o10}, then

Pr(E) = Pr(o3) + Pr(o7) + Pr(o10),

in which each of the probabilities of outcomes on the right is assumed known.
The probability of the empty event ∅ is defined to be zero, and the probability of the entire sample space (which is an event) is one:

Pr(∅) = 0,  Pr(S) = 1.

The fact that Pr(S) = 1 is a necessary consequence of Property 3 of the probability function. S is itself an event containing all of the outcomes, and the probability of this event is by definition the sum of the probabilities of all the outcomes in S, which must equal one.
Finding a probability function for an experiment is now the central issue in computing probabilities of events.
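The three requirements above translate directly into code; here is a minimal sketch (our own helper names, not from the text), storing a probability function as a map from outcomes to probabilities:

```python
# Minimal sketch of a probability function on a finite sample space,
# stored as a dict mapping each outcome to its probability.
def pr_event(pr, event):
    """Pr(E): the sum of the probabilities of the outcomes in E."""
    return sum(pr[o] for o in event)

pr = {"H": 0.5, "T": 0.5}                   # honest coin: Pr(H) = Pr(T) = 1/2
assert abs(sum(pr.values()) - 1.0) < 1e-12  # Property 3: probabilities sum to 1

print(pr_event(pr, {"H", "T"}))  # Pr(S) = 1.0
print(pr_event(pr, set()))       # Pr(empty event) = 0
```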
EXAMPLE 26.16
Consider a single coin flip. We do not need all of this machinery for such a simple experiment, but it is instructive to see how it works here. This experiment is to flip a coin once. There are two outcomes, which we call H and T, so the sample space is S = {H, T}. Here o1 = H and o2 = T, and the number of outcomes is N = 2.
We need to assign a probability to each outcome. If the coin is honest, then we expect a head to have the same likelihood of coming up as a tail, so we set

Pr(H) = Pr(T) = 1/2.

This is consistent with Property 3, since the sum of the probabilities of all the outcomes is one:

Pr(H) + Pr(T) = 1/2 + 1/2 = 1.

In this experiment, there are only four events, namely the empty event ∅, {H} consisting of just H, {T} consisting of just T, and {H, T}, which is the entire sample space.
EXAMPLE 26.17
We will redo the last example with a twist. Suppose the coin is dishonest, and on any flip a head is twice as likely as a tail. If o1 = H and o2 = T, then

Pr(o1) = 2 Pr(o2).

Now, to be a probability measure, we require that

Pr(o1) + Pr(o2) = 1.

These equations imply that

Pr(H) = 2/3 and Pr(T) = 1/3.

This is a legitimate probability function for this experiment, but it is different from that of the preceding example because the coin is dishonest in a known way.
We will now move to more complicated experiments, beginning with a class of special probability problems having equally likely outcomes. Suppose an experiment has N outcomes o1, …, oN, and each is equally likely:

Pr(o1) = Pr(o2) = ··· = Pr(oN).

Since

Pr(o1) + Pr(o2) + ··· + Pr(oN) = N Pr(o1) = 1,

then necessarily

Pr(o1) = 1/N.

Because all the outcomes are equally likely,

Pr(oj) = 1/N for j = 1, 2, …, N.
Such an experiment is called an equally likely outcome experiment. We also say that the sample space is an equally likely sample space. In such a case, if an event has k outcomes in it, then the probability of this event is the sum of the probabilities of the outcomes in the event, which is 1/N added to itself k times, or k/N. We will record these results for future use.

THEOREM 26.2

In an equally likely outcome experiment with N outcomes, the probability of each outcome is 1/N. Further, if an event E has k outcomes, then

Pr(E) = k/N.

Another, sometimes useful, way of writing this result is that, in an equally likely outcome experiment,

Probability of an event E = (number of outcomes in E)/(total number of outcomes of the experiment)
= (number of ways the event can occur)/(total number of outcomes of the experiment).   (26.1)
This reduces many probability calculations to counting problems.
EXAMPLE 26.18
Flip three coins. What is the probability of getting exactly two heads?
There are 2^3 = 8 outcomes, each equally likely. The probability of any one outcome is 1/8. The event E of getting exactly two heads is

E = {HHT, HTH, THH},

containing three outcomes. The probability of getting exactly two heads is therefore

Pr(E) = 3/8.

EXAMPLE 26.19

Three dice are rolled. What is the probability that the dice total 17?
Each outcome is a triple (a, b, c), with a the number that came up on the first die, b that on the second, and c the number on the third die. Assuming honest dice, all outcomes are equally likely. There are N = 6^3 = 216 outcomes. The event E we are interested in (dice total 17) is

E = {(6,6,5), (6,5,6), (5,6,6)}.

There are k = 3 outcomes in E. By equation (26.1), Pr(E) = 3/216.
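Both of the last two examples can be verified by enumerating the equally likely outcomes; this is a check of ours, not from the text:

```python
from itertools import product
from fractions import Fraction

# Three coins: Pr(exactly two heads).
coins = list(product("HT", repeat=3))
two_heads = [c for c in coins if c.count("H") == 2]
print(Fraction(len(two_heads), len(coins)))  # 3/8

# Three dice: Pr(total is 17).
dice = list(product(range(1, 7), repeat=3))
total_17 = [d for d in dice if sum(d) == 17]
print(Fraction(len(total_17), len(dice)))    # 3/216, which reduces to 1/72
```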
EXAMPLE 26.20
An experiment consists of flipping 25 honest coins. What is the probability that exactly seven coins come up heads?
This is an equally likely outcome experiment. We need the number of outcomes, and the number of outcomes in which there are exactly seven heads.
Since there are 25 flips, each with two possible outcomes, the experiment has 2^25 = 33,554,432 outcomes. The event E we are interested in is that exactly seven tosses yield H (so the other 18 are all tails). The number of outcomes having exactly seven H's is the number of ways of choosing seven out of 25 without regard to order, and this number is

25 C7 = 25!/(7!(25 − 7)!) = 25!/(7!18!) = 480,700.

Now we have both the numerator and denominator in equation (26.1), and

Pr(E) = 480700/33554432 ≈ 0.0143.

What is the probability of coming up with at least 22 heads? Let F be this event. The key phrase here is "at least." We obtain at least 22 heads if we toss exactly 22 heads, or exactly 23, or exactly 24, or all 25 heads. These are all different outcomes comprising parts of event F. We will count all of these outcomes:

number of outcomes with at least 22 heads
= number of outcomes with exactly 22 heads
+ number of outcomes with exactly 23 heads
+ number of outcomes with exactly 24 heads
+ number of outcomes with exactly 25 heads
= 25 C22 + 25 C23 + 25 C24 + 25 C25
= 25!/(3!22!) + 25!/(2!23!) + 25!/(1!24!) + 25!/25! = 2626.

From equation (26.1),

Pr(F) = 2626/33554432 ≈ 0.000078.
As intuition might suggest, this event is not one to bet on.
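The binomial-coefficient counts in this example are easy to confirm with the standard library (our own check, not from the text):

```python
import math

total = 2 ** 25                        # all outcomes of 25 coin flips
exactly_7 = math.comb(25, 7)           # outcomes with exactly seven heads
at_least_22 = sum(math.comb(25, k) for k in range(22, 26))

print(total, exactly_7, at_least_22)   # 33554432 480700 2626
# exactly_7 / total is about 0.0143; at_least_22 / total is about 0.000078.
```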
EXAMPLE 26.21
Suppose a person is dealt five cards, without regard to order, from a standard deck. What is the probability that the hand has four aces in it?
The total number of outcomes is the number of ways of choosing five cards from 52, disregarding order. This number is

52 C5 = 52!/(5!47!) = 2,598,960.

This is the denominator in equation (26.1). Now we need the numerator. How many unordered five-card poker hands have exactly four aces? If four of the cards are aces, the fifth card can be any of the remaining 48 cards. Therefore the probability of being dealt four aces is

Pr(four aces) = 48/2598960,

which is approximately 0.000018. This is why, in westerns, a gambler showing four aces usually gets shot.
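A quick check of these counts (ours, not part of the text):

```python
import math

hands = math.comb(52, 5)                        # unordered five-card hands
four_aces = math.comb(4, 4) * math.comb(48, 1)  # four aces plus one other card

print(hands, four_aces)  # 2598960 48
# four_aces / hands is approximately 0.000018.
```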
We will consider experiments in which not all outcomes have the same probability when we have more machinery to deal with them efficiently.
SECTION 26.5
PROBLEMS
1. Suppose five honest coins are flipped.
(a) Find the probability of getting exactly two heads.
(b) Find the probability of getting at least two heads.

2. Roll four dice.
(a) What is the probability that exactly two 4's come up?
(b) What is the probability that exactly three 4's come up?
(c) What is the probability that at least two 4's come up?
(d) What is the probability that the dice total 22?

3. Two cards are selected from a standard deck. The order of the draw is unimportant.
(a) What is the probability that both cards were kings?
(b) What is the probability that neither card was an ace or face card?

4. Four letters are selected from the English alphabet (only lowercase is used). The order of the selection is recorded.
(a) What is the probability that the first letter was q?
(b) What is the probability that a and b are two of the letters?
(c) What is the probability that the letters are a, b, d, z, in this order?

5. Eight bowling balls are in a bin. Two are defective, having been manufactured as cubes instead of the traditional spherical shape. A person uses a remote gripping device to pick three of the balls out of the bin, sight unseen. The order of the choice is unimportant.
(a) What is the probability that none of the balls chosen was defective?
(b) What is the probability that exactly one defective ball was taken?
(c) What is the probability that both defective balls were taken?

6. Seven pyramid-shaped (tetrahedral) dice are tossed. The faces on each pyramid are numbered 1 through 4.
(a) What is the probability that all seven dice came up 3?
(b) What is the probability that five dice came up 1, and two came up 4?
(c) What is the probability that the sum of the numbers that came up is 26?
(d) What is the probability that the sum of the numbers that came up is at least 26?

7. Twenty balls in an urn are numbered 1 through 20. A blindfolded contestant draws five balls from the urn, with the order of the draw recorded.
(a) What is the probability that balls 1, 2, 3, 4, and 5 were drawn (in this order)?
(b) What is the probability that the number 3 ball was selected?
(c) What is the probability that an even-numbered ball was drawn?

8. Seven drawers in a desk contain a 50-cent piece, while two other drawers contain a thousand-dollar bill. A person is allowed to choose three drawers at random. The order of the draw is unimportant.
(a) What is the probability that the person gets at least $1,000?
(b) What is the probability that the person ends up with less than one dollar?
(c) What is the probability that the payoff is $1.50?

9. Five cards are drawn, without regard to order, from a standard deck.
(a) What is the probability that the hand contains exactly one jack and exactly one king?
(b) What is the probability that the hand contains at least two aces?

10. Twenty integers are chosen at random, and without regard to order, from the integers 0 to 100, inclusively.
(a) What is the probability that all of the numbers chosen are larger than 79?
(b) What is the probability that one of the numbers is 5?
26.6

Complementary Events

If E is an event for some experiment, then E consists of certain outcomes. The outcomes that are not in E form the complement of E, denoted E^C. This is another event. Notice that the complement of E^C is E again: (E^C)^C = E. We may think of E^C as the event "E does not occur".
The outcomes in E, together with those in E^C, are all of the outcomes. Further, E and E^C have no outcomes in common. If we listed all of the outcomes in E, and then all those in E^C, we would list all the outcomes in the sample space S, with no repetitions. This means that

Pr(S) = Pr(E) + Pr(E^C) = 1.

Another way of looking at this equation is that, for any event E, it is certain (probability 1) that either E occurs, or E does not occur (which is the same as E^C occurs). If we write this equation in a slightly different way, we obtain the following.

Principle of Complementarity:

Pr(E) = 1 − Pr(E^C).

This equation holds for E = S as well, since then E^C = ∅, so Pr(E) = Pr(S) = 1 and Pr(E^C) = Pr(∅) = 0.
The principle of complementarity is useful because it offers us the choice of computing Pr(E) or Pr(E^C), and one might be easier than the other.
EXAMPLE 26.22
Roll three dice. There are 6^3 = 216 outcomes. Consider the event E that the dice come up with a total of at least 5. Compute Pr(E).
One way to do this is by using equation (26.1):

Pr(E) = (number of outcomes in E)/216.

Counting the number of outcomes in E is certainly possible, but it will be tedious. E contains all outcomes with the dice summing to 5 or more, meaning that the dice can total 5, 6, 7, 8, …, 18. We would have to count the number of ways each of these totals can occur.
However, look at E^C, which consists of all tosses of the three dice in which the total is 4 or less. There are many fewer such outcomes, making them easier to count. The only way the dice can total 4 is to come up (2,1,1) or (1,2,1) or (1,1,2) (three ways). The only way they can total less than 4 is to total 3, which can happen only one way (they all come up 1). Therefore E^C has exactly four outcomes in it, so

Pr(E^C) = 4/216 = 1/54,

and by the principle of complementarity,

Pr(E) = 1 − 1/54 = 53/54 ≈ 0.98.

Rolling a total of at least 5 with three dice is not certain, but it is very likely.
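Counting the complement directly is easy to check by enumeration (our own sketch, not from the text):

```python
from itertools import product
from fractions import Fraction

dice = list(product(range(1, 7), repeat=3))    # 216 outcomes of three dice
complement = [d for d in dice if sum(d) <= 4]  # E^C: total is 4 or less

pr_comp = Fraction(len(complement), len(dice))
print(len(complement))  # 4
print(pr_comp)          # 1/54
print(1 - pr_comp)      # Pr(E) = 53/54
```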
SECTION 26.6

PROBLEMS
1. Seven dice are rolled. What is the probability that at least two of them come up 4?

2. Fourteen coins are tossed. What is the probability that at least three come up heads?

3. Five cards are drawn from a standard deck, without regard to order. What is the probability that at least one is a face card, an ace, or numbered 4 or higher?

4. Two coins are tossed and five dice are rolled. What is the probability that two heads and at least one 4 come up?

5. Four numbers are chosen at random and without regard to order from among the integers from 1 through 55, inclusively. What is the probability that at least one number is greater than 4?

6. Six dice are rolled. What is the probability that they total at least 11?

7. Here is the famous Birthday Problem. Suppose N people are in a room. What is the probability that at least two have the same birthday?
26.7

Conditional Probability

Probability depends in subtle ways on how much is known. Here is an illustration of this. Suppose we pick a card at random from a deck of 52 cards. What is the probability that we chose a diamond? Since one fourth of the cards are diamonds, the probability of choosing a diamond is 1/4.
However, suppose, as the card was drawn, it flashed for an instant in front of a mirror and we saw that the card's suit is red. We immediately know that the card must be a diamond or a heart. This narrows our choice down to one suit out of two, not four, so the probability of a diamond is now 1/2. The additional knowledge that the suit was red has changed the sample space, which now consists of only those outcomes that are consistent with the additional information that the card chosen was red.
Which is correct, 1/4 or 1/2? The answer is that both are right, because these numbers are different probabilities. The first is a straightforward probability of picking a particular suit of card from the entire deck. The second is a conditional probability, in which we not only pick a card, but also have additional information that eliminates certain outcomes as possibilities, hence reduces the sample space.
To put all of this on a firm footing that we can work with, suppose an experiment leads to outcomes and a sample space S. Single out a particular event U, having positive probability. (This will be the additional information of the conditional probability.) If E is any event, define E ∩ U to be the event consisting of the outcomes common to E and to U. For example, if

E = {a, b, c, d, e, f, g, h} and U = {k, g, h, a, w, z},

then

E ∩ U = {a, g, h}.

This is also an event. E ∩ U is read "E intersect U," or "the intersection of E and U," or, in the probability context, "E and U." Notice that E ∩ U = U ∩ E, since the act of taking all outcomes common to E and U does not depend on the order in which E and U are written. It is possible that there are no outcomes common to E and U, in which case E ∩ U = ∅, the empty event.
Now imagine that the experiment is performed, but that, by some means (such as the mirror in the card experiment), we know that U occurs. The conditional probability of E, knowing U, is denoted Pr(E|U) and is computed as

Pr(E|U) = Pr(E ∩ U)/Pr(U).   (26.2)

By taking the probability of E ∩ U in the numerator, we are factoring in the information that U is assumed known, so we only look at outcomes in E that are also in U. In the denominator, we have the probability of U computed with respect to the original sample space of the experiment. In effect, we are considering only outcomes of E that are in U, and thinking of U as the new sample space.
We will redo the card example to illustrate these ideas and how equation (26.2) works.
EXAMPLE 26.23
Pick a card out of a 52-card deck. What is the probability that a diamond is chosen? Since 13 of the 52 cards are diamonds,

Pr(E) = 13/52 = 1/4.

Now let U be the event that the card is a diamond or a heart. This is the information flashed in the mirror. In this example E ∩ U = E, because every outcome in E (card is a diamond) is already in U (card is a diamond or heart). Further, U consists of exactly half of the cards, so Pr(U) = 1/2 (in the original experiment). Then

Pr(E|U) = Pr(E ∩ U)/Pr(U) = Pr(E)/Pr(U) = (1/4)/(1/2) = 1/2,

consistent with intuition. Knowing that the card is a diamond or heart, the probability of picking a diamond is 1/2.
EXAMPLE 26.24
Toss three dice. What is the probability of rolling a total of 5?
This is not a conditional probability, but we need this result for the conditional probability we will consider. The sample space consists of triples (a, b, c), in which each letter can be any integer from 1 through 6 inclusive. There are 6^3 = 216 outcomes. The outcomes in which the dice total 5 are

(1,1,3), (1,3,1), (3,1,1), (1,2,2), (2,1,2), (2,2,1),

six in number. These constitute the event E: roll a total of 5. Then

Pr(E) = 6/216 = 1/36.

Now suppose a person sitting on the side knows that two dice are loaded, and must always come up 2. This means that at least two dice must come up 2 on any roll of all three dice. Let U consist of all outcomes

(x, 2, 2), (2, x, 2), and (2, 2, x),
in which x (for the honest die) can be any of 1, 2, 3, 4, 5, or 6. There are 16 such outcomes (note that when x = 2 all three triples are the same 2 2 2), so PrU =
2 16 = 216 27
Now E ∩ U = {(1, 2, 2), (2, 1, 2), (2, 2, 1)}, the outcomes common to E and U. There are three such outcomes, so

Pr(E ∩ U) = 3/216 = 1/72

The conditional probability that E will happen, knowing that U happens, is

Pr(E|U) = Pr(E ∩ U)/Pr(U) = (1/72)/(2/27) = 3/16
This is the probability that the dice will sum to 5, as far as the person who knows about the two loaded dice is concerned. Notice that the probability of E (dice sum to 5) is only 1/36. However, to the person who knows that two dice always come up 2, the probability of the dice summing to 5 is considerably improved, to 3/16. Knowing that two dice always come up 2 changes the sample space by eliminating certain outcomes.

Examine equation (26.2) more closely. By using equation (26.1) twice, we can write

Pr(E ∩ U) = (number of outcomes common to E and U)/(number of outcomes in S)

and

Pr(U) = (number of outcomes in U)/(number of outcomes in S)

From these equations a conditional probability can be computed as

Pr(E|U) = Pr(E ∩ U)/Pr(U)
        = [(number of outcomes common to E and U)/(number of outcomes in S)] / [(number of outcomes in U)/(number of outcomes in S)]
        = (number of outcomes common to E and U)/(number of outcomes in U)    (26.3)

This is exactly the result we would get by equation (26.1) if we used the fact that U is known to alter the experiment, thinking of U as the new sample space, and counting only those outcomes in E that are also in U. There is nothing in equation (26.3) that is not already in equation (26.2), but it makes explicit the cancellations carried out in the last example to compute a conditional probability.
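The dice computations in Example 26.24 can be checked by brute-force enumeration. The sketch below is our own verification, not part of the text; the variable names are invented for illustration.

```python
from fractions import Fraction
from itertools import product

# Enumerate all 6^3 = 216 equally likely rolls of three dice.
rolls = list(product(range(1, 7), repeat=3))

E = [r for r in rolls if sum(r) == 5]                        # dice total 5
U = [r for r in rolls if sum(1 for d in r if d == 2) >= 2]   # at least two 2s

pr_E = Fraction(len(E), len(rolls))                          # 6/216 = 1/36
pr_U = Fraction(len(U), len(rolls))                          # 16/216 = 2/27
pr_E_and_U = Fraction(len(set(E) & set(U)), len(rolls))      # 3/216 = 1/72

# Conditional probability via equation (26.2): Pr(E|U) = Pr(E ∩ U)/Pr(U).
pr_E_given_U = pr_E_and_U / pr_U
print(pr_E, pr_U, pr_E_given_U)   # 1/36 2/27 3/16
```

Counting the outcomes directly, as in equation (26.3), gives the same 3/16 without ever dividing by the size of the full sample space.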
EXAMPLE 26.25
Four coins are tossed. Two of them fall within view of an observer, who sees that they are both heads. What is the probability, to this observer, that exactly three of the coins come up heads?
The experiment is to toss four coins. An outcome consists of a string abcd of four letters, each of which is either an H or a T. The event E we are interested in consists of all such strings with exactly three H's. E has four outcomes in it:

E = {THHH, HTHH, HHTH, HHHT}

Let U be the event that at least two heads have come up, since this is what the observer knows when he or she sees two heads. Then U consists of all of the outcomes with exactly two heads, or exactly three heads, or exactly four heads. The number of these outcomes is

4C2 + 4C3 + 4C4 = 6 + 4 + 1 = 11
Further, every outcome in E is also in U , so the number of outcomes common to E and U is four, the number of outcomes in E. By equation (26.3), the conditional probability that E happens, knowing that U happens, is PrE U =
number of outcomes common to E and U 4 = number of outcomes in U 11
The knowledge that at least two heads must come up makes the outcome of exactly three heads more likely than it would be without this information.
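Example 26.25 is also small enough to enumerate. The following Python sketch is our own check, with invented names, confirming the 4/11 computed above.

```python
from fractions import Fraction
from itertools import product

# All 2^4 = 16 outcomes of tossing four coins, as strings like 'HTHH'.
outcomes = [''.join(p) for p in product('HT', repeat=4)]

E = [o for o in outcomes if o.count('H') == 3]   # exactly three heads
U = [o for o in outcomes if o.count('H') >= 2]   # at least two heads

# Every outcome of E lies in U, so equation (26.3) reduces to |E| / |U|.
pr_E_given_U = Fraction(len([o for o in E if o in U]), len(U))
print(pr_E_given_U)   # 4/11
```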
SECTION 26.7   PROBLEMS
1. Two coins are flipped. (a) What is the probability that the first one came up heads? (b) What is the probability that the first one came up heads, if we know that at least one came up heads?
2. Four coins are flipped. (a) What is the probability that exactly three came up heads? (b) What is the probability that exactly three came up heads, if we saw that one came up tails?
3. Four coins are flipped. (a) What is the probability that at least three came up tails? (b) What is the probability that at least three came up tails, if we saw two coins land tails?
4. Six cards are dealt from a standard deck without regard to order. (a) What is the probability that exactly two face cards were dealt? (b) What is the probability that exactly two face cards were dealt, if we know that a king was dealt?
5. What is the probability that four rolls of the dice total exactly 19? What is the probability of totaling exactly 19 if we know that one die came up 1?
6. What is the probability that two rolled dice sum to at least 9, if we know that one die came up even?
7. What is the probability that a five-card poker hand (unordered) has four aces, if we know that a four of spades was dealt?
8. What is the probability that a toss of seven coins will produce at least five heads, if we know that four of them came up heads?
9. Calculate the probability that four dice will all come up odd. What is this probability if we know that one came up 1 and another came up 5? What is the probability if we know that the second die came up 6?
10. Suppose A and B are events. Denote A ∪ B as the event containing all outcomes in A and all those in B. For example, if A = {a, b, c, d} and B = {c, d, e, f}, then A ∪ B = {a, b, c, d, e, f}. Show that

Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)

Conclude that, if A and B have no outcomes in common, then Pr(A ∪ B) = Pr(A) + Pr(B). However, give an example to show that this equality fails to hold if A and B have outcomes in common.
26.8   Independent Events

Informally, two events are independent if the knowledge that one occurs implies no information about the probability of the other occurring. This can be stated in terms of conditional probability.
DEFINITION 26.3
Events E and U are independent if

Pr(E|U) = Pr(E) or Pr(U|E) = Pr(U)

If E and U are not independent, then we say that they are dependent.
EXAMPLE 26.26
Consider the simple experiment of flipping one coin once. The sample space is S = {H, T}, and the nonempty events are {H}, {T}, and {H, T}. We claim that the events E = {H} and U = {T} are dependent. The reason for this is that E ∩ U = ∅, so Pr(E ∩ U) = 0, forcing Pr(E|U) = 0 (equation (26.2)). However, Pr(E) = 1/2, so

Pr(E|U) ≠ Pr(E)

Similarly, U ∩ E = ∅, so Pr(U|E) = 0, but Pr(U) = 1/2, so

Pr(U|E) ≠ Pr(U)

Therefore U and E are not independent. Although we have formally verified that E and U are not independent, the dependence of these events in this experiment is obvious. In one coin flip, if we know that one of the events {H} or {T} occurs, then we know that the other does not.
EXAMPLE 26.27
Draw two cards from a deck by first drawing one card, then drawing a second card (without replacement). The outcomes are pairs (a, b), in which a is the first card drawn, and b the second. Order makes a difference here, since, for example, if a king of hearts is drawn first, then b could not be a king of hearts. The card drawn first influences the possibilities for the card drawn second. The sample space S consists of all pairs (a, b) with a and b cards, but a ≠ b. There are 52 · 51 = 2652 outcomes in S. Consider the events E: a king is drawn first, and U: a jack, queen, king or ace is drawn second.
Outcomes in E are pairs (a, b) with a a king and b any card different from that drawn for a. There are four ways of drawing a king, and after this, 51 ways of drawing the second card, so there are 4 · 51 = 204 outcomes in E. Then

Pr(E) = 204/2652
How many outcomes are in U? U will consist of all pairs (a, b) in which a and b are both drawn from the jacks, queens, kings and aces, together with all pairs (a, b) with only b drawn from the jacks, queens, kings or aces. In the first category, there are 16 possibilities for a and after that 15 for b, for 16 · 15 = 240 possible outcomes. In the second category, with only b a jack, queen, king or ace, there are 36 choices for a and 16 for b, for a total of 36 · 16 = 576 outcomes. Therefore U has 240 + 576 = 816 outcomes. To compute the conditional probability Pr(E|U), we need to look at E ∩ U. This event consists of all outcomes (a, b) with a a king, and b drawn from the jacks, queens, kings and aces. There are four choices for a, and then 15 for b, so E ∩ U has 4 · 15 = 60 outcomes in it. Then

Pr(E|U) = (number of outcomes common to E and U)/(number of outcomes in U) = 60/816

Since 204/2652 ≠ 60/816,

Pr(E) ≠ Pr(E|U)

Similarly, it is routine to check that Pr(U) = 816/2652 and Pr(U|E) = 60/204, so

Pr(U) ≠ Pr(U|E)

This means that E and U are not independent, and are therefore dependent. This makes sense intuitively. If we know U, then we know that a jack, queen, king, or ace was drawn second. This changes the probability that a king was drawn first.
EXAMPLE 26.28
We will repeat the experiment of the preceding example, but this time with replacement. That is, draw a card, record the result, then replace the card in the deck, shuffle, and draw again. Now the sample space S consists of all pairs (a, b) with a and b possibly the same card. There are 52 · 52 = 2704 outcomes in S. Let E and U be the events of the preceding example. Now E consists of outcomes (a, b), with a a king (four possibilities), but b can be any card (52 possibilities). There are now 4 · 52 = 208 outcomes in E, different from the preceding example because this experiment is different. Now

Pr(E) = 208/2704
This happens to be the same probability of this event in the previous experiment. Now there are more outcomes, but also more ways E can happen. Now consider pairs a b in U . There are 52 possible choices for a, and 16 for b, for a total of 52 · 16 = 832 outcomes in U .
Finally, the outcomes common to E and U are the outcomes (a, b) with a a king and b a jack, queen, king or ace. There are four ways to choose a, and (with replacement) 16 cards from which to draw b, for 4 · 16 = 64 outcomes common to E and U. Therefore

Pr(E|U) = (number of outcomes common to E and U)/(number of outcomes in U) = 64/832 = 208/2704 = Pr(E)

The probability of E occurring is the same as the probability of E occurring, knowing that U occurs. Events E and U are therefore independent. Again, this makes sense intuitively. Because we are drawing with replacement, knowing that a face card was drawn second does not tell us anything about the first card drawn.
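The with-replacement computation can be verified by enumerating all 2704 ordered pairs. This sketch is ours; the encoding of cards as the integers 0 through 51, with 0–3 the kings and 0–15 the jacks, queens, kings, and aces, is an arbitrary convention for illustration.

```python
from fractions import Fraction
from itertools import product

deck = range(52)            # cards encoded 0..51
kings = set(range(4))       # 0-3 represent the four kings
high = set(range(16))       # 0-15 represent jacks, queens, kings, aces

# Draw twice WITH replacement: 52 * 52 equally likely ordered pairs.
pairs = list(product(deck, repeat=2))

E = [(a, b) for (a, b) in pairs if a in kings]   # king drawn first
U = [(a, b) for (a, b) in pairs if b in high]    # jack/queen/king/ace second

pr_E = Fraction(len(E), len(pairs))                       # 208/2704
pr_E_given_U = Fraction(len(set(E) & set(U)), len(U))     # 64/832
print(pr_E == pr_E_given_U)   # True: E and U are independent
```

Running the same enumeration without replacement (pairs with a ≠ b, as in Example 26.27) makes the two probabilities come out unequal, matching the dependence found there.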
26.8.1 The Product Rule

Equation (26.2) can be written

Pr(E ∩ U) = Pr(E|U) Pr(U)    (26.4)

This equation is called the product rule, and it is particularly useful in the context of independent events. If E and U are independent, then Pr(E|U) = Pr(E) or Pr(U|E) = Pr(U). If Pr(E|U) = Pr(E), then the product rule becomes

Pr(E ∩ U) = Pr(E) Pr(U)

If Pr(U|E) = Pr(U), we can use the fact that E ∩ U = U ∩ E and use the product rule to write

Pr(E ∩ U) = Pr(U ∩ E) = Pr(U|E) Pr(E) = Pr(U) Pr(E)

In either event, then, if E and U are independent, then Pr(E ∩ U) = Pr(E) Pr(U).

Conversely, suppose E and U are events, and Pr(E ∩ U) = Pr(E) Pr(U). Then by the product rule,

Pr(E) Pr(U) = Pr(E ∩ U) = Pr(E|U) Pr(U)

so Pr(E|U) = Pr(E), and therefore E and U are independent. We can summarize these conclusions as follows.

THEOREM 26.3

Events E and U are independent if and only if

Pr(E ∩ U) = Pr(E) Pr(U)    (26.5)
This result is used in two ways. First, it provides another way of determining whether or not two events are independent. Equation (26.5) is sometimes easier to check than the two conditions in the definition of independence. Second, equation (26.5) provides an often convenient way of computing the probability of an event E ∩ U, consisting of outcomes common to E and U, as a product of the individual probabilities of E and U.
Keep in mind, however, that the product rule applies only when E and U are independent. In this case, equation (26.5) reads "the probability of E and U is the product of the probability of E with the probability of U."
EXAMPLE 26.29
Suppose the probability of having a boy is 0.49, and that of having a girl, 0.51. A family has four children. What is the probability that exactly three of them are boys?

Here the experiment is to have four children. An outcome is a string of four letters, for example, gbbg for girl, boy, boy, girl. The event E that we are interested in is that the family has three boys and one girl:

E = {bbbg, bbgb, bgbb, gbbb}

The probability of any one of these is the product of the probabilities of each letter, g or b. The reason for this is that, in looking at four births, any of these letters can be a g or a b independent of what the others are. Therefore

Pr(bbbg) = (0.49)(0.49)(0.49)(0.51) ≈ 0.06 = Pr(bbgb) = Pr(bgbb) = Pr(gbbb)

The probability of E is the sum of the probabilities of each outcome in E:

Pr(E) = 0.06 + 0.06 + 0.06 + 0.06 = 0.24

This probability is based on the information that girls are slightly more probable than boys in individual births. If boys and girls were equally likely (each with probability 1/2), then we would proceed as follows. E has four outcomes in it, and the sample space has 2⁴ = 16 outcomes. Now the probability of E is 4/16 = 0.25.
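The computation in Example 26.29 is a binomial probability, and can be checked in a few lines of Python. This check is ours, not the text's; it uses the four equally probable orderings counted above.

```python
from math import comb

p_boy, p_girl = 0.49, 0.51

# Each of the C(4,3) = 4 orderings of three boys and one girl has the
# same probability, so Pr(E) is a single binomial term.
pr_three_boys = comb(4, 3) * p_boy**3 * p_girl
print(round(pr_three_boys, 2))   # 0.24

# With equally likely sexes the answer becomes 4/16 = 0.25.
print(comb(4, 3) / 2**4)   # 0.25
```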
SECTION 26.8   PROBLEMS
1. Four coins are flipped. E is the event that exactly one coin comes up heads. U is the event that at least three coins come up tails. Determine whether E and U are independent.
2. Two cards are drawn from a standard deck, without replacement. E is the event that both cards are aces. U is the event that one card is a diamond and the other is a spade. Determine whether these events are independent.
3. Two dice are rolled. E is the event that the dice total more than 11. U is the event that at least one die comes up an even number. Determine whether these events are independent.
4. A family has six children. E is the event that at least three are girls. U is the event that at least two are girls. Determine whether these events are independent.
5. An experiment consists of flipping two coins and then rolling two dice. E is the event that at least one coin comes up heads. U is the event that at least one die comes up 6. Determine whether these events are independent.
6. An experiment consists of picking two cards from a standard deck, with replacement. E is the event that the first card drawn was a king. U is the event that the second card drawn was an ace. Determine whether E and U are independent.
7. Two cards are dealt from an honest deck, without replacement. E is the event that the first card dealt was a jack of diamonds. U is the event that the second card was a club or spade. Determine whether these events are independent.
8. Four coins are flipped. E is the event that the first coin comes up heads. U is the event that the last coin comes up tails. Determine whether these events are independent.
9. Suppose a coin has been shaved so that the probability of tossing a head is 0.40. This coin is flipped four times. What is the probability of getting at least two heads? What is the probability of getting exactly two heads?
10. A dishonest die comes up only 1, 4 or 6. This die is rolled three times. What is the probability that the total is 5?
11. A jar contains twenty marbles: eight red, eight blue and four green. The probability of drawing a blue marble is twice that of drawing a green, and three times that of drawing a red. What is the probability of drawing exactly two red marbles if three marbles are drawn at random, without replacement?

26.9   Tree Diagrams in Computing Probabilities

Tree diagrams, in conjunction with the product rule, often provide a convenient way of computing certain probabilities in which the experiment can be broken down into a sequence of steps.
EXAMPLE 26.30
A cabinet has two drawers, labeled left and right. In the left drawer are three envelopes, e1, e2, and e3, and in the right drawer are four vouchers, v1 through v4. A drawer is chosen at random, then an object is selected at random from that drawer. What is the probability that the object drawn is v3?

We may think of this experiment as proceeding in two stages, each of which is an experiment in its own right: Experiment 1—choose a drawer. Experiment 2—choose an object from a drawer. The tree diagram of Figure 26.5 displays the sequence of experiments. First, there is a probability of 1/2 of choosing either drawer, hence the numbers on the first two branches of the tree. There is a probability of 1/3 of choosing any of the envelopes, if drawer one was chosen. There is a probability of 1/4 of choosing any of the vouchers, if drawer two was chosen. The branches from each drawer to the last outcomes are conditional probabilities, since, for example, we can only pick an envelope if drawer one was chosen. This is where the product rule for probabilities comes in. To find the probability of choosing voucher v3, first follow the lower branch to drawer two, with probability 1/2, then the branch from drawer two to v3, which has conditional probability 1/4. The product of these two numbers gives the probability of choosing v3:

Pr(v3) = Pr(drawer two) · Pr(v3, knowing drawer two was selected) = (1/2)(1/4) = 1/8

As we might expect by symmetry, the probability of choosing any envelope is the same (namely (1/2)(1/3) = 1/6), and the probability of choosing any voucher is the same ((1/2)(1/4) = 1/8). Notice that the probabilities of all of the outcomes sum to one, as they must:

3(1/6) + 4(1/8) = 1
The tree diagram is actually taking us through a sequence of conditional probabilities. The first stage of the experiment is to choose a drawer, but after this, the choice of envelope or voucher is a conditional probability, based on the choice of the drawer. In this example, there
[FIGURE 26.5   Probabilities of outcomes of choosing a drawer, then an envelope or voucher.]
were only two stages to the experiment, but more stages could be accommodated by continuing the tree with more edges from each stage to the next one. For the probability of any final outcome o, first identify each path leading to o. On each such path, multiply the probabilities on each branch of the path, and then add these results.
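The multiply-along-each-path rule can be mechanized. The sketch below is our own encoding of the tree in Figure 26.5 as nested branch lists; the data layout and names are invented for illustration.

```python
from fractions import Fraction

F = Fraction

# Each top-level entry is (branch probability, leaves); each leaf is
# (conditional probability, outcome label), mirroring Figure 26.5.
tree = [
    (F(1, 2), [(F(1, 3), 'e1'), (F(1, 3), 'e2'), (F(1, 3), 'e3')]),   # left drawer
    (F(1, 2), [(F(1, 4), 'v1'), (F(1, 4), 'v2'),
               (F(1, 4), 'v3'), (F(1, 4), 'v4')]),                    # right drawer
]

# Product rule: multiply probabilities along each root-to-leaf path.
outcome_prob = {}
for p_branch, leaves in tree:
    for p_leaf, label in leaves:
        outcome_prob[label] = p_branch * p_leaf

print(outcome_prob['v3'])              # 1/8
print(sum(outcome_prob.values()))      # 1, as the outcome probabilities must
```

A deeper experiment, such as the urns of the next example, would simply nest one more level of branches before the leaves.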
EXAMPLE 26.31
A room contains three urns. One urn has two baskets. One basket holds an envelope containing a $1 bill, an envelope containing $50 and an envelope containing $100. The other basket has three envelopes, one containing a quarter, one a half dollar and one a silver dollar. The second urn has two baskets. One basket holds an envelope containing nothing and an envelope containing $1,000, while the second basket has three envelopes, containing $0.50, $1, and $500. The third urn contains four baskets. The first basket has four envelopes, three holding nothing and the fourth, $200. The second basket has two envelopes, one with nothing and one with $300. The third basket has three envelopes, with, respectively, $1, $50, and $500. The fourth basket contains two envelopes, with $100 and $500. A person gets to choose one urn, then from it, one basket, and from it, one envelope. Determine the outcomes and the probability of each.

This is an example of a problem that takes longer to state than to solve. Figure 26.6 shows a tree diagram for this experiment. By following paths to each payoff, as many times as it occurs as an end result, we obtain the following probabilities:
[FIGURE 26.6   Outcomes of choosing an urn, then a basket, then an envelope.]
Pr($0) = (1/3)(1/2)(1/2) + 3 · (1/3)(1/4)(1/4) + (1/3)(1/4)(1/2) = 3/16

Pr($0.25) = (1/3)(1/2)(1/3) = 1/18

Pr($0.50) = (1/3)(1/2)(1/3) + (1/3)(1/2)(1/3) = 1/9

Pr($1) = (1/3)(1/2)(1/3) + (1/3)(1/2)(1/3) + (1/3)(1/2)(1/3) + (1/3)(1/4)(1/3) = 7/36

Pr($50) = (1/3)(1/2)(1/3) + (1/3)(1/4)(1/3) = 1/12

Pr($100) = (1/3)(1/2)(1/3) + (1/3)(1/4)(1/2) = 7/72

Pr($200) = (1/3)(1/4)(1/4) = 1/48

Pr($300) = (1/3)(1/4)(1/2) = 1/24

Pr($500) = (1/3)(1/2)(1/3) + (1/3)(1/4)(1/3) + (1/3)(1/4)(1/2) = 1/8

Pr($1,000) = (1/3)(1/2)(1/2) = 1/12
Again, these probabilities of all the outcomes must sum to 1.
SECTION 26.9   PROBLEMS
1. A cabinet has four drawers. Two of the drawers each contain two envelopes, each containing $10; one drawer contains one envelope with $5 and one envelope with $50; and one drawer contains one empty envelope and one envelope with $1,200. A person can choose any drawer, and from that drawer, any one envelope. What are the outcomes and their probabilities?
2. A room has three urns in it. One urn has two compartments, one with nothing in it, and one with a key which can open any of four safes. One safe has diamonds, one has stocks, one has cash, and one has a Cracker Jacks whistle of sentimental value to the owner of the urns. The other two urns each have three compartments. One of these urns has two empty compartments and one filled with Confederate currency. The third urn has one compartment filled with expensive perfume, one filled with stock certificates, and the third with the deed to a mansion. A person can pick an urn and any compartment and, if the key comes up, any one of the safes. What are the outcomes and their probabilities?
3. A wealthy sultan has six automobile sheds which look identical from the outside. Each shed contains a number of identical containers, with each container closed, but holding one vehicle. One shed has two identical Fords and a Chevrolet (in their containers). A second shed has a VW Beetle (circa 1952) and a Porsche. A third shed has a Lamborghini and a very nice tricycle. A fourth shed has two Mercedes SUVs and a Honda Civic. A fifth shed has a World War I tank (partially destroyed in battle) and a Porsche. And the sixth shed has three mountain bicycles and a mint condition Stanley Steamer. A person can pick a shed, and then any container in that shed. What are the outcomes and their probabilities?
4. A traveler can choose to fly in a Piper Cub that seats one passenger, a company jet seating eight passengers, or a jumbo jet seating seven hundred passengers. The traveler can pick any plane and any passenger seat on that plane. What is the probability that the traveler picks an odd-numbered seat on the jumbo jet?
5. A person can choose any of five houses, each of which has four upstairs bedrooms. In one of the houses, three bedrooms are empty and one contains an antique chair worth $50,000. In another house, two bedrooms are empty, one contains $1,000, and one contains a newly minted nickel. In a third house, each bedroom contains $500. And in the last two houses, two bedrooms contain $1,500 each, one contains $20, and one contains a person-eating lion which has not been fed recently. If the person is to pick a house and then a bedroom, what are the outcomes and their probabilities?

26.10   Bayes' Theorem

Bayes' Theorem is named for the Reverend Thomas Bayes (1702–1761), although it only appeared in print after his death. It enables us to determine the conditional probability of E, knowing U, if we know the individual probabilities of E and U ∩ E (which is the event U and E), as well as the conditional probabilities of U knowing E and of U knowing E^C. The rule is as close as we normally come in mathematics to "getting something for nothing," and its derivation appears at first to be just a simplistic sequence of substitutions. If done in the right order, however, a very useful formula results. Begin with an experiment, and consider events E and U. Recall that E ∩ U = U ∩ E. Now

Probability of E, assuming U = Pr(E|U) =
Pr(E ∩ U)/Pr(U)
  = Pr(U ∩ E)/Pr(U)
  = Pr(U|E) Pr(E)/Pr(U)    (26.6)
in which we used the product rule (equation (26.4)) in the numerator to go from the next to last line to the last line. Now consider Figure 26.7, in which U and E are shown as typical sets of outcomes (events). We can split the outcomes in U into two kinds: those that are also in E, hence in U ∩ E, and those that are not in E, hence are in U ∩ E^C. Furthermore, U ∩ E and U ∩ E^C have no outcomes in common. Therefore

Pr(U) = Pr(U ∩ E) + Pr(U ∩ E^C)

Now apply the product rule (26.4) to each of the probabilities on the right to obtain

Pr(U) = Pr(U|E) Pr(E) + Pr(U|E^C) Pr(E^C)

Substitute this result into the denominator on the right in equation (26.6) to get

Pr(E|U) = Pr(U|E) Pr(E) / [Pr(U|E) Pr(E) + Pr(U|E^C) Pr(E^C)]    (26.7)
Equation (26.7) is known as Bayes' Theorem, although it is actually a special case of a more general result to be stated shortly (equation (26.8)). The theorem enables us to determine the conditional probability of E, knowing U, if we know: the probability of E, the probability of E^C, the probability of U knowing E, and the probability of U knowing E^C. The tree diagram in Figure 26.8 shows points representing the events E and E^C, which together contain all the outcomes. Outcome U is given. From point E (the case that E occurs),
[FIGURE 26.7   U consists of outcomes in U that are in E (U ∩ E) and outcomes in U that are not in E (U ∩ E^C).]
the two conditional events, U|E and U^C|E, are shown. And from vertex E^C, the case that E does not occur, the two conditional events, U|E^C and U^C|E^C, are shown. Four of the branches are labeled with probabilities. The numerator in Bayes' Theorem is the product of the probabilities on branches 1 and 2. The denominator is the sum of this product and the product of the probabilities on branches 3 and 4. The paths ending in conditional probabilities involving U^C are not relevant in computing Pr(E|U) in this way. Bayes' Theorem is all about drawing probability inferences from certain kinds of available information. Here is a typical application of the theorem.

EXAMPLE 26.32
A factory produces wombles. Suppose, over a given day, the probability of producing a defective womble is 0.04. Suppose the probability that a defective womble will result in injury to a womble user is 0.02, while the probability that a womble user is injured through no fault in the product is 0.06. A lawsuit is in progress, in which the plaintiff has been injured and wishes a settlement from the company. The defense attorney wants some measure of whether a womble was to blame. The defense therefore wants to know the probability that the product was defective, if it is known that an injury occurred. Let U be the event that injury occurs, and let E be the event that a defective womble was produced. From the given information,

Pr(E) = 0.04,   Pr(U|E) = 0.02,   and   Pr(U|E^C) = 0.06

We want to determine Pr(E|U). Figure 26.9 shows the three probabilities just listed, together with the probability

Pr(E^C) = 1 − 0.04 = 0.96

which we infer from the principle of complementarity. We now have all the terms occurring on the right side of Bayes' Theorem, so

Pr(E|U) = Pr(U|E) Pr(E) / [Pr(U|E) Pr(E) + Pr(U|E^C) Pr(E^C)]
        = (0.02)(0.04) / [(0.02)(0.04) + (0.06)(0.96)] ≈ 0.0137
[FIGURE 26.8   Bayes' theorem (simplest case).]

[FIGURE 26.9   Computing probabilities of a defective womble, knowing that an injury occurred.]
In Figure 26.8, points for U^C|E and U^C|E^C were included for completeness, although paths to these two points were not used in Bayes' Theorem. In Figure 26.9, only the points representing events used in Bayes' Theorem are included. There is a more general form of Bayes' Theorem in which the single event E is replaced by k events E1, …, Ek. Now we find that

Pr(Ej|U) = Pr(Ej) Pr(U|Ej) / [Pr(E1) Pr(U|E1) + Pr(E2) Pr(U|E2) + ··· + Pr(Ek) Pr(U|Ek)]    (26.8)
for j = 1, …, k. Figure 26.10 illustrates this conclusion. Like Figure 26.9, this tree diagram contains only paths needed for terms in equation (26.8).
EXAMPLE 26.33
An airplane manufacturing company receives shipments of parts from five companies. The following table gives the percent of the total parts needs filled by each company, and the probability for each company that a part is defective (data taken over some period of time). A defective part is found. What is the probability that it came from Company 4?
[FIGURE 26.10   General form of Bayes' theorem.]
Company   Percent Supplied   Probability of Defect
   1             15                  .03
   2              2                  .04
   3              4                  .01
   4             48                  .02
   5             27                  .08
Imagine a part has been chosen from the parts inventory. Let Ej be the event that the part came from company j, for j = 1, 2, 3, 4, 5. Let U be the event that the part is defective. We want to compute Pr(E4|U). From equation (26.8) we have enough information to do this:

Pr(E4|U) = Pr(E4) Pr(U|E4) / [Pr(E1) Pr(U|E1) + Pr(E2) Pr(U|E2) + ··· + Pr(E5) Pr(U|E5)]
         = (0.48)(0.02) / [(0.15)(0.03) + (0.02)(0.04) + (0.04)(0.04) + (0.48)(0.02) + (0.27)(0.08)]
         = 0.25
rounded to two decimal places. Notice that the percent supplied by each company (Column 2 of the table) is converted to decimals in thinking of these percents as probabilities. Just for comparison, consider the probability that the defective part came from the next largest supplier, Company 5. The denominator in applying equation (26.8) is the same as that used for Company 4, and we need only recompute the numerator:

Pr(E5|U) = (0.27)(0.08) / [(0.15)(0.03) + (0.02)(0.04) + (0.04)(0.04) + (0.48)(0.02) + (0.27)(0.08)]
         = 0.57

to two decimal places. The disparity is due to the fact that, while Company 5 supplies a little more than half the parts Company 4 does, it has four times the failure rate on its parts.
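Equation (26.8) computes all five posterior probabilities at once from two lists of numbers. The sketch below is our own; note that it uses the defect factors exactly as they appear in the worked computation above, which is what reproduces the text's rounded answers of 0.25 and 0.57.

```python
# Prior: fraction of parts supplied by each company, Column 2 as decimals.
supplied = [0.15, 0.02, 0.04, 0.48, 0.27]
# Pr(defect | company j), taken from the factors in the worked computation.
defect = [0.03, 0.04, 0.04, 0.02, 0.08]

# Equation (26.8): the denominator is the total probability of a defect,
# and Pr(E_j | U) is each company's share of it.
total = sum(s * d for s, d in zip(supplied, defect))
posterior = [s * d / total for s, d in zip(supplied, defect)]

print(round(posterior[3], 2))   # Company 4: 0.25
print(round(posterior[4], 2))   # Company 5: 0.57
```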
SECTION 26.10   PROBLEMS
1. A company hires people in four different groupings to install automobile carpet. The company has found that people with at least ten years experience generally produce products that are defective 1% of the time; people with five to ten years of experience, 3% of the time; people with one to five years experience, 5.7% of the time; and people with less than one year, 11.3% of the time. Of the total workforce preparing carpet, those with ten or more years comprise 45% of the total; those with five to ten years, 37%; those with one to five years, 7%; and those with less than one year, 11%. For each of these categories, calculate the probability that a defective installation was done by a worker in that group.
2. Suppose 37% of a company's computer keyboards are manufactured in Los Angeles, and 63% in Detroit. Suppose 2% of the keyboards made in Los Angeles have some problem, and 5.3% of those made in Detroit are defective. Calculate the probability that a given defective keyboard was manufactured in Los Angeles, and also the probability that it came from Detroit.
3. A drug trial performed with seriously ill patients produces the following data:

                                Boys   Girls   Adult Men   Adult Women
Survived at least two years      257    320       104          152
Survived less than one year      619    471        51           38

(a) If a patient is selected randomly from this pool, what is the probability of choosing an adult male, if we are told that the person survived at least two more years?
(b) What is the probability of choosing a girl if we are told that the person survived less than one year?
4. A certain herb is grown in three places on Earth. It is known that this herb has medicinal value in treating certain illnesses, but that sometimes it is fatal. In the accompanying table, column two gives the percent of the total herb production of the world that is grown in that location, and the third column gives the percent of the herb from that location that is fatal.

Location               % of Total   Percent Fatal
Lower Amazon               62            14
Sub-Saharan Africa         25             6
Newark, New Jersey         13            1/2

For each location, calculate the probability that a sample of the herb came from that location, if it was found to be fatal.
5. Target pistols are manufactured in six cities. The following table gives the percent of the total production that comes from each city, together with the probability that the gun explodes upon firing.

City   % of Total   Probability of Exploding
  1        15               .02
  2         7               .01
  3        21               .06
  4         4               .02
  5         9               .03
  6        44               .09

Suppose a gun explodes. Calculate, for each city, the probability that the gun came from there.
26.11   Expected Value

For anyone visiting a carnival or gambling casino, the expected value of a game or experiment is a very useful number to be able to compute. Here is an example to illustrate the idea.
EXAMPLE 26.34
A carnival game has a large wheel with the numbers 1 through 50 equally spaced around the outer rim. It costs $1 to bet on which number will end up at the top when the wheel is spun and allowed to come to rest. A winner receives $25. Is this a profitable game to play?

In view of the payoff, which is twenty-five times the cost of making a bet, it apparently is believed that such a game is a winning proposition for the player. At least, the game is popular and seems to draw a lot of players. On the other hand, we should be suspicious, since the carnival is a profit-making enterprise and the owners of the game must be making money or they would change the game. Who is right?

We will answer this question by trying to define a sense of how much we should expect to win or lose each time we play. On one spin of the wheel, there are fifty equally likely outcomes, assuming an honest wheel (a reasonable assumption, since the player can bet on any number). On any number we pick, the probability of winning is 1/50, and the probability of losing is 49/50. Since winning pays $25 and losing costs $1, compute

(1/50)($25) − (49/50)($1) = −$0.48

This is the expected value of this game from the player's perspective. A player should expect on average to lose (because of the negative sign) forty-eight cents on each spin. Therefore the "house" expects to win this much on each spin, with each player. Although the game owner will occasionally make payouts, the owner stands on average to make nearly a half dollar per player per game.

Thought of in a slightly different way, the player will on average win $25 for 1/50, or 2%, of the time, and lose $1 for 49/50, or 98%, of the time. Although the winning payoff is many times the loss on each game, a person will most likely lose so many more times than he or she wins that on average each person will contribute forty-eight cents to the carnival for each game.
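The wheel computation is a two-term weighted sum, and exact rational arithmetic avoids any rounding question. This check is ours, not the text's.

```python
from fractions import Fraction

# One spin: win $25 with probability 1/50, lose the $1 bet otherwise.
p_win = Fraction(1, 50)
expected = p_win * 25 + (1 - p_win) * (-1)

print(float(expected))   # -0.48: the player loses 48 cents per spin on average
```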
In general, we obtain the expected value of an experiment by multiplying the probability of each outcome by the value of that outcome, with a plus sign for value gained and a minus sign for value lost, and adding the results. The expected value therefore depends on perspective, since a loss for one person may be a win for another. Of course, the concept of expected value assumes that each outcome has been given a value or payoff of some kind.
EXAMPLE 26.35
Twelve coins are flipped. We win $8 every time at least eight heads come up, and lose $2 otherwise, with no significance placed on the order of the coins. What is the expected value of this game?
The number of ways of obtaining at least eight heads is the sum of the numbers of ways of obtaining exactly eight, nine, ten, eleven, or twelve heads. This number is

C(12, 8) + C(12, 9) + C(12, 10) + C(12, 11) + C(12, 12) = 495 + 220 + 66 + 12 + 1 = 794.

Since there are 2^12 = 4096 equally likely outcomes, the probability of obtaining eight or more heads is

Pr(eight or more heads) = 794/4096.

The probability of this not occurring is

Pr(fewer than eight heads) = 1 − 794/4096 = 3302/4096.

The expected value of this game is

(794/4096)(8) − (3302/4096)(2) = −252/4096,

or about −0.0615 dollars. On average, the owner expects to make slightly over six cents on each game.
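The count of favorable outcomes and the resulting expected value can be checked with a short Python sketch (not from the text):

```python
from math import comb

# Probability of at least eight heads in twelve fair coin flips,
# and the resulting expected value of the coin game.
ways = sum(comb(12, k) for k in range(8, 13))   # 495 + 220 + 66 + 12 + 1 = 794
p_win = ways / 2**12                            # 794/4096
p_lose = 1 - p_win                              # 3302/4096
expected = 8 * p_win - 2 * p_lose
print(ways, round(expected, 4))                 # 794, about -0.0615
```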
EXAMPLE 26.36
Based on performance over the past thirty years, a salesman estimates the following probabilities for commissions on sales over the coming year: 0.25 probability of getting $400 from shoe sales, 0.15 probability of getting $200 from trouser sales, 0.35 probability of getting $600 from sport jacket sales, 0.20 probability of getting $1,000 from sales of complete golf outfits, and 0.05 probability of getting $300 from sales of derby hats. He wants to calculate his expected commission to estimate his income for tax purposes. The information is most easily read by making a table:

Commission    400    200    600    1000    300
Probability   0.25   0.15   0.35   0.20    0.05

Notice that the probabilities add up to 1, since presumably all possibilities for commission income have been included. The expected commission income is

400(0.25) + 200(0.15) + 600(0.35) + 1000(0.20) + 300(0.05) = 555

dollars.
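A weighted sum like this is conveniently computed from a payoff-to-probability table; here is a minimal Python sketch (not from the text):

```python
# Expected commission income as a weighted sum of payoffs.
commissions = {400: 0.25, 200: 0.15, 600: 0.35, 1000: 0.20, 300: 0.05}

# The probabilities must sum to 1 for this to be a valid distribution.
assert abs(sum(commissions.values()) - 1) < 1e-12

expected = sum(c * p for c, p in commissions.items())
print(round(expected, 2))   # 555.0
```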
SECTION 26.11
PROBLEMS
1. A game consists of flipping seven coins, without regard to order. If at least three heads come up, the player wins $5. If not, the player loses $9. What is the player's expected value for this game?

2. Five dice are tossed. If at least three come up 5 or 6, the house pays $50. If not, the house wins $4. What is the player's expected value?

3. Six cards are dealt from a standard deck, without regard to order. If one or more aces comes up, the player wins $45. Otherwise he or she loses $30. What is the player's expected value?

4. Nine coins are flipped. If seven or more come up tails, the player wins $50. Otherwise the player loses $3. What is the player's expected value?
5. Twenty marbles, labeled 1 through 20, are in a jar. A game consists of reaching in blind and pulling out three marbles. If at least two are even, the player wins $3. Otherwise, the player loses $7. What is the player's expected value?

6. Sixteen dice are rolled, and unordered results are recorded. If at least five sixes come up, the player wins $70; otherwise the player pays $4. What is the player's expected value?

7. Seven hats are in a closet. Four are black and three are red. A person reaches in and pulls out four hats, without regard to order. If at least two are red, the player wins $10. Otherwise the player loses $5. What is the player's expected value?

8. Six coins are flipped, then four dice are rolled. If four heads or a total of at least 24 on the dice result, the player wins $25. Otherwise the player loses $3. What is the expected value of this game?
CHAPTER
27
BELL CURVES, NORMALLY DISTRIBUTED DATA, BINOMIAL DISTRIBUTION, STUDENT t-DISTRIBUTION, CORRELATION COEFFICIENT, CONFIDENCE INTERVAL, REGRESSION LINE
Statistics
With this background on counting and probability, we can begin to look at some of the ideas behind statistics.
27.1
Measures of Center and Variation

Many processes generate numbers, in tables of data, graphs, or some other format. Such numbers are called data. A statistic is a measure that assists in the analysis of data in order to extract information and draw conclusions from it. In this section we will discuss two kinds of elementary statistics. The first kind is measures of center, which attempt to quantify in some sense the "middle" or "center" of a data set. The second kind is measures of variation, which attempt to quantify the "spread" of the data, giving a sense of how the numbers vary from one another.
27.1.1
Measures of Center
Suppose we have N numbers, A1, A2, …, AN, not necessarily all distinct.
DEFINITION 27.1
Mean or Average
The mean, or average, of numbers A1, A2, …, AN is their sum divided by N:

mean of A1, A2, …, AN = (A1 + A2 + · · · + AN)/N.

This number is often denoted x̄. Sometimes the Greek letter μ (mu) is also used.
EXAMPLE 27.1
The mean of the numbers 16, −25, 27, 12, 15 and 31 is

x̄ = (16 − 25 + 27 + 12 + 15 + 31)/6 = 76/6 = 38/3.

As this example shows, the mean of a set of numbers need not be one of the numbers in the set.
In computing means, some numbers may occur more than once in the data list. For example, in a list of daily high temperatures in a city over a thirty year period, some temperatures may be repeated. These are still listed because they accurately record the data of interest.
The mean of a data set is a statistic that provides what is in a sense a "balance point" for the data. Consider a standard 12 inch ruler, and suppose the data set consists of the numbers 2, 3, 3.5, 4, 8, 11 and 11.5. Imagine a one gram weight placed at each of these points on the ruler, as in Figure 27.1. At what point should a fulcrum be placed so that the ruler balances? The answer is, it should be placed at the mean of the data set, which is

x̄ = (2 + 3 + 3.5 + 4 + 8 + 11 + 11.5)/7 ≈ 6.1429,

just past the half-way mark on the ruler. In this way the mean behaves like the center of mass of the data. Mean is a useful statistic, but it must be treated with care.
FIGURE 27.1 The mean is the balance point for the data set.
EXAMPLE 27.2
Last year there were eight graduates of a very prestigious electrical engineering program. Upon graduation, they all received good job offers with starting salaries listed below:

Person      Starting Salary
Jones       $80,000
Smith       $85,000
Connolly    $78,000
Andrews     $90,000
Finch       $80,000
Hatton      $87,000
Douglas     $92,000
Seagram     $6,000,000

The average starting salary is x̄ = $824,000. This is very misleading. The first seven graduates listed certainly did very well, but nothing like this. What the data do not reveal is that Seagram hit 200 home runs in 250 at bats in college and could pitch every other day, either right handed
or left handed, with a 153 mile per hour fast ball and a curve ball that breaks at an angle of 3/4 radians. In this data set, Seagram would be called an outlier, which is a point very far removed from all of the others. Outliers skew the mean and invite false conclusions if care is not taken. In some analyses of (fairly large) data sets, the highest and lowest numbers are dropped before computing a mean. For example, in some figure skating scoring systems, the high and low scores are deleted and the other scores averaged. A different measure of center is given by the median. This number locates the “middle” of a data set, without taking into account the actual magnitude of the numbers themselves.
DEFINITION 27.2
Median
Suppose we are given an ordered data set A1, A2, …, AN, in which Aj ≤ Aj+1 for j = 1, …, N − 1. The median of this set is a number M having the property that half of the numbers in the data set are ≤ M, and half of the numbers are ≥ M.
The mean gives a center of gravity (weighted center) of the data set, taking the magnitude and sign of each number into account. By contrast, the median gives a geometric center (actual middle of the list), and only assumes that the numbers are written in nondecreasing order, without other regard to their magnitude. Like the mean, the median may or may not itself be a number in the data set. It is easy to verify that if N is odd, then M = A_(N+1)/2, and if N is even, then

M = (A_(N/2) + A_(N/2+1))/2.

In the case that N is odd, one number, namely A_(N+1)/2, is at the exact center of the ordered list, so in this case the median is a number in the list. When N is even, the median is the average of entry N/2 and entry N/2 + 1 in the list, counting from the left, and may or may not be a number in the list.
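The two-case rule for the median translates directly into code; here is a Python sketch (ours, not the text's):

```python
def median(data):
    """Median by the rule above: middle entry if N is odd,
    average of the two middle entries if N is even."""
    a = sorted(data)       # the rule assumes a nondecreasing list
    n = len(a)
    if n % 2 == 1:
        return a[(n + 1) // 2 - 1]        # A_(N+1)/2, shifted to 0-based indexing
    return (a[n // 2 - 1] + a[n // 2]) / 2  # average of entries N/2 and N/2 + 1

# Sixteen numbers: median is the average of the 8th and 9th sorted entries.
print(median([-4, 16, 1, 2, 2, 3, 7, 5, 15, -4, 2, 1, 1, 3, 9, 8]))  # 2.5
```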
EXAMPLE 27.3
Find the median of the data set −4, 16, 1, 2, 2, 3, 7, 5, 15, −4, 2, 1, 1, 3, 9, 8. First order (or sort) the list:

−4, −4, 1, 1, 1, 2, 2, 2, 3, 3, 5, 7, 8, 9, 15, 16.

Here N = 16, so

M = (A8 + A9)/2 = (2 + 3)/2 = 2.5.

In this case the median is not in the set. There are eight numbers in the data set to the left of 2.5, and eight numbers to the right.
EXAMPLE 27.4
Consider the data set

−4, 16, 1, 2, 2, 3, 7, 5, 15, −4, 2, 1, 1, 3, 9, 8, 1.

As an ordered list, this is

−4, −4, 1, 1, 1, 1, 2, 2, 2, 3, 3, 5, 7, 8, 9, 15, 16,

with N = 17 numbers, an odd number. If we denote these numbers B1, …, B17, the median is

M = B_(N+1)/2 = B9 = 2.

The median is 2, but it is the third 2 from the left in the ordered list, not one of the other 2's. In this case the median is a number in the list.
Data are frequently given by a frequency table, particularly if N is large. This is a table that gives each number according to its frequency in the list, saving the trouble of writing the number down as many times as it is repeated.
EXAMPLE 27.5
Data (already ordered) are given in the frequency table

Data Point   −3   −1    4    7   12   17
Frequency     5    2   12    6   12   21

The data has −3 listed five times, −1 two times, and so on. The sum of the frequencies is N = 58, and this is the number of data points. The mean is

x̄ = (−3(5) + (−1)(2) + 4(12) + 7(6) + 12(12) + 17(21))/58 = 574/58 = 287/29,

approximately 9.897. Because the number of data points is even, the median is (A29 + A30)/2. Counting the frequencies from the left, the 29th number in the ordered list occurs in the group of twelve 12's, and equals 12. The 30th number is also 12, so the median is 12.
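Working directly from a frequency table avoids expanding the full list; a Python sketch of this idea (our own, with the table of Example 27.5):

```python
# Mean and median from a frequency table, without expanding all 58 data points.
table = {-3: 5, -1: 2, 4: 12, 7: 6, 12: 12, 17: 21}

n = sum(table.values())                           # 58 data points
mean = sum(x * f for x, f in table.items()) / n   # 574/58

# Cumulative frequencies over the sorted values locate any entry
# of the ordered list by its position.
cumulative = []
count = 0
for x in sorted(table):
    count += table[x]
    cumulative.append((x, count))

def entry(k):
    # k-th entry (1-based) of the ordered data list
    for x, c in cumulative:
        if k <= c:
            return x

median = (entry(29) + entry(30)) / 2
print(n, round(mean, 3), median)    # 58 9.897 12.0
```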
27.1.2 Measures of Variation Measures of variation are indicators of how “spread out” the data is, as opposed to where a “center” might be located. The simplest measure of variation is the range.
DEFINITION 27.3
Range
The range of a data set is the difference between the largest number in the set and the smallest number in the set.
27.1 Measures of Center and Variation
1147
EXAMPLE 27.6
In the data set −4, −4, 1, 1, 1, 1, 2, 2, 2, 3, 3, 5, 7, 8, 9, 15, 16, the range is 16 − (−4) = 20. If these numbers are marked on the number line, the number at the left end of the list is twenty units from the number at the right end.
Note that the data set need not be ordered to compute the range. However, if the set is large, it is certainly easier to locate the largest and smallest numbers if the list is ordered. Another measure of variation is the standard deviation, which considers variations of data points from the mean of the data.
DEFINITION 27.4
Standard Deviation
Let A1, …, AN be numbers in a data set, having mean x̄.
1. If these numbers represent a sample within a (generally much) larger population, then the standard deviation of this sample is

s = √( (1/(N − 1)) Σ_{j=1}^{N} (A_j − x̄)² ).

2. If these numbers represent the entire population, or total set of items under consideration, then the standard deviation is

σ = √( (1/N) Σ_{j=1}^{N} (A_j − x̄)² ).
For example, an x-ray unit might take data at 4000 points. The readings at these points would be the entire population of data, and would be used for the standard deviation σ. It might be useful, however, to periodically take samples of 30 of these points, for which we would use the standard deviation s. The numbers s² and σ² (squares of the standard deviations) are called the variance of the data, and are used in some contexts. The standard deviation of A1, …, AN is zero if and only if each Aj = x̄. In this case the data points are all the same and there is no deviation from the mean. However, if one or more Aj differs greatly from the mean, then the standard deviation will be correspondingly large. A large standard deviation indicates data that are "spread out", while a small standard deviation corresponds to data that are clustered together, hence all lie fairly close to the mean. Of course, the idea of variation from the mean makes no sense if there is only one number, so N > 1 is assumed in defining the standard deviation s. It is a routine exercise in algebra to show that
s = √( ( N Σ_{j=1}^{N} A_j² − ( Σ_{j=1}^{N} A_j )² ) / ( N(N − 1) ) ).
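The defining formula for s and its algebraic shortcut can be checked numerically against each other; a Python sketch (ours, with an arbitrary data set):

```python
from math import sqrt

def sample_sd_definition(a):
    # s from the definition: sqrt( sum (A_j - xbar)^2 / (N - 1) )
    n = len(a)
    xbar = sum(a) / n
    return sqrt(sum((x - xbar) ** 2 for x in a) / (n - 1))

def sample_sd_shortcut(a):
    # s from the algebraically equivalent shortcut formula
    n = len(a)
    return sqrt((n * sum(x * x for x in a) - sum(a) ** 2) / (n * (n - 1)))

data = [1.1, 4.6, 9.2, 15.7, 28]
print(abs(sample_sd_definition(data) - sample_sd_shortcut(data)) < 1e-9)  # True
```

The shortcut form is convenient in hand or machine computation because it needs only the running sums of the data and of their squares, not a second pass through the data after the mean is known.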
Some statisticians use 1/N in place of 1/(N − 1) in defining s. We will use 1/(N − 1) because, for a given s, if x̄ is known, then any N − 1 of the data points determine the remaining one.
EXAMPLE 27.7
Consider the data set consisting of the numbers 1.1, 1.3, 1.6, 2, 2.1, 2.2, 2.4 and 2.5. These numbers are all fairly close together, with a range of only 2.5 − 1.1 = 1.4. The mean of these numbers is

x̄ = (1.1 + 1.3 + 1.6 + 2 + 2.1 + 2.2 + 2.4 + 2.5)/8 = 1.9.

For the standard deviation, first compute the sum of the squares of the differences of the numbers from the mean:

(1.1 − 1.9)² + (1.3 − 1.9)² + (1.6 − 1.9)² + (2 − 1.9)² + (2.1 − 1.9)² + (2.2 − 1.9)² + (2.4 − 1.9)² + (2.5 − 1.9)² = 1.84.

The standard deviation is

s = √(1.84/7) ≈ 0.5127

to four decimal places. This small standard deviation is due to the fact that these numbers are fairly close to one another with not much spread. Contrast this standard deviation with a similar calculation for the following data set, which is more "spread out."
EXAMPLE 27.8
Consider the data set 1.1, 4.6, 9.2, 15.7 and 28. The range is 28 − 1.1 = 26.9. Routine calculations give x̄ = 11.72 and σ = 9.4943. The large standard deviation reflects the fact that the numbers in the data set are spread out. Greater spread would result in a larger standard deviation. For example, if 1.1 were replaced by −12.7, making the range 40.7, then the mean of the new data set is 8.96, the sum of the squared deviations from this mean is 896.17, and the standard deviation is s = √(896.17/4) ≈ 14.97. Figures 27.2 and 27.3 show scatter plots of the data in Examples 27.7 and 27.8, respectively, giving a visual sense of the connection between spread of the data and standard deviation. As another example of the idea of s as a spread, we would expect the standard deviation of the data set of electrical engineering starting salaries to be very large. A routine calculation shows that this standard deviation is approximately $2,091,400, all due to the outlier Seagram.
FIGURE 27.2 Data of Example 27.7.
FIGURE 27.3 Data of Example 27.8.
SECTION 27.1
PROBLEMS
1. Sometimes the range of a data set is used to make a rough estimate of the standard deviation, according to the rule

s ≈ (1/4)(largest number in the list − smallest number).
Compute s, then try this estimate for the following data sets: (a) −5 −2 −15 0 0 1 1 1 2 7 9 116 (b) 7 1 −4 2 2 1 4 1 1 2 6 3 (c) 5 −2 −10 5 2 2 5 3 5
2. Compute the mean, median, and standard deviation s for each of the data sets:
(a) −4 −6 25 3 8 5 −3 −4 8 3 22
(b) 1 1 1 −1 2 3 −1 4 2
(c) 3 −4 2 15 −4 −4 2 1 7
(d) 93 95 97 10 84 87 88 88 41
(e) −16 −14 −10 0 0 1 1 3 5 7

3. Data are given by the following frequency table:

Data point   −3   −1    0    1    3    4
Frequency     4    2    6    4   12    3

Find the mean, median, and standard deviation s for this data set.

4. Data are given by the following frequency table:

Data point   −12   −9.7   −8   −7.6   −5.1    4
Frequency      4      2    6      4     12    3

Find the mean, median, and standard deviation s for this data set.

27.2 Random Variables and Probability Distributions

A random variable on an experiment is a function that assigns a real number to each outcome of the experiment, based in some specified way on the random outcomes of the experiment. If X is such a random variable, then X(o) is a real number for each outcome o of the experiment.
EXAMPLE 27.9
Consider the experiment of flipping three coins. Each outcome is a string of three letters, each an H or T . If o is an outcome, let Xo = number of heads in o. X assigns a number to each outcome. In this example, for any outcome o, Xo must be one of the numbers 0, 1, 2, or 3. The outcomes of this experiment are o1 = HHH
o2 = HHT
o3 = HTH
o4 = THH
o5 = HTT
o6 = THT
o7 = TTH
o8 = TTT
and Xo1 = 3
Xo2 = Xo3 = Xo4 = 2
Xo5 = Xo6 = Xo7 = 1
and
Xo8 = 0
For any repetition of this experiment, any outcome may occur, hence X is a random variable.
EXAMPLE 27.10
Consider the experiment of flipping six coins, but repeating these six flips fifteen times. An outcome consists of a list of fifteen strings of six letters each, each letter being an H or T. If o is an outcome, let X(o) equal the total number of tails that came up in the fifteen repetitions of six flips. X(o) can be any number from 0 through 90, inclusive, and X is a random variable, because for any fifteen repetitions of the six flips, the outcome is random.
DEFINITION 27.5

Probability Distribution
A probability distribution on a random variable X is a function P which assigns a probability P(x) to each value x that the random variable can assume.
If Σ_x denotes a summation over all values x that the random variable can assume, then a probability distribution must satisfy

Σ_x P(x) = 1

and 0 ≤ P(x) ≤ 1 for each value x. Both of these requirements are consistent with our understanding of a probability function. The only new wrinkle is that now we have a probability function not on the outcomes themselves, but on the values of the random variable X. The probability P(X(o)) is the probability, not of the outcome o itself, but of X(o), the value of the random variable at o. We are using P for this probability distribution, to distinguish it from a probability Pr that may be defined directly on the outcomes of the experiment. Pr(o) is the probability that the outcome o occurs, and P(x) is the probability that the random variable assumes the value x. We could therefore write Σ_x P(x) as Σ_o P(X(o)), the first sum being over the values x that X takes on, and the second sum over the outcomes o of the experiment.
EXAMPLE 27.11
Flip three coins. There are 2³ = 8 outcomes o1, …, o8, each with probability Pr(oj) = 1/8. Define a random variable X as follows. For each outcome oj, let X(oj) = the number of tails in oj. Then X has the following values:

X(HHH) = 0, X(HHT) = X(HTH) = X(THH) = 1, X(TTH) = X(THT) = X(HTT) = 2, X(TTT) = 3.

There are four values this random variable can assume, namely 0, 1, 2, and 3. A probability distribution on X is given by assigning a probability to each value X can assume. In this example, assign

P(0) = 1/8, P(1) = 3/8, P(2) = 3/8, and P(3) = 1/8.

The probability function Pr acts on the outcomes of the experiment, and every outcome is equally likely. The probability distribution P acts on the values of the random variable, giving a probability for each value of the random variable to occur. Not all values that the random variable assumes are equally likely. Three of the eight outcomes have two tails, so P(2) = 3/8, and so on.
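A distribution like this can be built mechanically by counting tails over all equally likely outcomes; here is a Python sketch (ours, not the text's):

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# Build the distribution P on X = number of tails in three coin flips,
# directly from the eight equally likely outcomes.
outcomes = ["".join(t) for t in product("HT", repeat=3)]   # 8 strings
counts = Counter(o.count("T") for o in outcomes)
P = {x: Fraction(c, 8) for x, c in counts.items()}

print(P)   # P(0) = 1/8, P(1) = 3/8, P(2) = 3/8, P(3) = 1/8
assert sum(P.values()) == 1   # a valid probability distribution
```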
The mean and standard deviation for a random variable are defined as follows.
DEFINITION 27.6
Mean and Standard Deviation
Let X be a random variable. Suppose X has probability distribution P.
1. The mean of X is the number μ defined by

μ = Σ_x x P(x).

2. The standard deviation of X is denoted σ, and is defined by

σ = √( Σ_x (x − μ)² P(x) ).

The sum Σ_x is over all values that the random variable can assume.
The expression for μ is reminiscent of the expected value of an experiment or game. Each term of the sum for μ is the product of a payoff x and the probability that the random variable achieves this value. As a special case, if the numerical values of X are x1, x2, …, xN and if X is equally likely to achieve each of these values, then P(xj) = 1/N and the formula for μ becomes

μ = Σ_{j=1}^{N} xj (1/N) = (1/N) Σ_{j=1}^{N} xj = x̄ = mean of the data set x1, …, xN.

Further, in this case

σ = √( Σ_{j=1}^{N} (xj − μ)²/N ) = √( Σ_{j=1}^{N} (xj − x̄)²/N ),

the familiar formula for the standard deviation of a population. It is a routine calculation to show that

σ = √( Σ_x x² P(x) − μ² ).
EXAMPLE 27.12
If we flip four coins, there are 2⁴ = 16 outcomes:

o1 = HHHH, o2 = TTTT, o3 = THHH, o4 = HTHH,
o5 = HHTH, o6 = HHHT, o7 = HTHT, o8 = HHTT,
o9 = HTTH, o10 = THHT, o11 = THTH, o12 = TTHH,
o13 = HTTT, o14 = THTT, o15 = TTHT, o16 = TTTH.
Define a random variable X by letting X(o) equal the number of tails in the outcome o. From the list of outcomes, we have:

X(o1) = 0, X(o2) = 4,
X(o3) = X(o4) = X(o5) = X(o6) = 1,
X(o7) = X(o8) = X(o9) = X(o10) = X(o11) = X(o12) = 2,
and X(o13) = X(o14) = X(o15) = X(o16) = 3.

X can take on five numerical values. We can assign a probability to each numerical value of X by defining

P(0) = P(4) = 1/16, P(1) = 4/16, P(2) = 6/16, and P(3) = 4/16.

For example, the value 3 (for three tails) occurs four times out of the sixteen outcomes of the experiment, so its probability of occurring as a numerical value of X is 4/16. Notice that these probabilities sum to 1. The mean of X is

μ = Σ_x x P(x) = 0(1/16) + 4(1/16) + 1(4/16) + 2(6/16) + 3(4/16) = 2.

On average, with four flips, we expect to see two tails. The standard deviation for X is

σ = √( Σ_x (x − μ)² P(x) )
  = √( (0 − 2)²(1/16) + (4 − 2)²(1/16) + (1 − 2)²(4/16) + (2 − 2)²(6/16) + (3 − 2)²(4/16) ) = 1.
Random variables are not restricted to have only finitely many values. For example, if the experiment is to pick a positive integer k, and X(k) = k, then X is a random variable which assumes infinitely many values. This leads to the following definition, which includes a subtlety involving orders of infinity.
DEFINITION 27.7
Random Variables
A discrete random variable is one that assumes either a finite or countable number of values. A continuous random variable is one that assumes an uncountable infinity of values.
CHAPTER 27
1154
Statistics
An infinite set is called countable if it can be put into one-to-one correspondence with the set of positive integers. For example, suppose E consists of all powers 2^n, with n a positive integer. The correspondence n ↔ 2^n, for n any positive integer, matches each positive integer with a unique number in E, and conversely. E is therefore countable, or countably infinite. An infinite set that cannot be put into one-to-one correspondence with the positive integers is said to be uncountable, or uncountably infinite. The real numbers form an uncountable set, because it can be shown that there is no one-to-one match between the positive integers and the real numbers. The real numbers have a higher order of infinity than the positive integers, even though both sets are infinite. As an example of a continuous random variable, suppose the experiment is to choose a number (not necessarily an integer) between 0 and 1 inclusive. There are uncountably many such numbers. If X(k) = k for 0 ≤ k ≤ 1, then X is continuous because the number of values X can assume is uncountably infinite.
SECTION 27.2
PROBLEMS
1. Suppose two dice are rolled. If o is an outcome, let X(o) = the sum of the numbers on the dice. Determine the probability distribution of X, as well as the mean and standard deviation for X.

2. Four coins are tossed. If o is an outcome, let X(o) = 1 if two or more tails come up, and X(o) = 3 otherwise. Determine the probability distribution of X, and its mean and standard deviation.

3. A wheel having the numbers 1 through 20 equally spaced about its circumference is spun, and the number that is on top when the wheel comes to rest is recorded. If o is an outcome, let X(o) = the number of factors in the prime factorization of o. For example, if 7 comes up, then X(7) = 1 because 7 is prime. But X(15) = 2 because 15 = 3 · 5, and X(12) = 3 because 12 = 2 · 2 · 3. By convention, 1 is not prime and has no prime factors. Determine the probability distribution of X, and the mean and standard deviation of X.

4. Suppose two dice are rolled. If o is an outcome, and the two dice show different numbers, let X(o) be the quotient of the larger number divided by the smaller. If o is an outcome in which both dice come up the same, let X(o) be this common value. Determine the probability distribution of X, and its mean and standard deviation.

5. Two cards are picked in succession (without replacement) from a standard deck. If o is an outcome, let X(o) equal the sum of the numbers on the cards if both cards are numbered; X(o) = 11 if exactly one card is a jack, queen, king or ace; and X(o) = 12 if two of the cards are in the jack, queen, king, ace group. Determine the probability distribution for X, as well as its mean and standard deviation.

6. An experiment is to choose any integer from 1 to 30, inclusively. If o is an outcome, let X(o) = 1 if o is divisible by 2, X(o) = 2 if o is divisible by 3 but not 2, and X(o) = 3 otherwise. Determine the probability distribution of X, and its mean and standard deviation.

27.3 The Binomial and Poisson Distributions

Not every distribution that can be defined is interesting. However, the binomial and Poisson distributions have important applications. This section is devoted to these two distributions.
27.3.1 The Binomial Distribution The binomial distribution is used in a very specific setting. Suppose we have some experiment or procedure that involves a fixed number of trials (repetitions). For example, in flipping a coin
fifty times, the experiment consists of the fifty flips, and each flip is a trial. Assume that the following conditions are satisfied.

1. Each trial must have exactly two outcomes. These might be called A and B, or S (success) and F (failure), or H and T (for a coin), or some other designation.
2. The trials are independent. The outcome of one trial is unaffected by the outcome of any other trial.
3. For each trial, the probabilities remain constant. For example, with coin tosses, each toss might have 1/2 probability for a head, and 1/2 for a tail, and these probabilities do not change as more trials are performed.

Now suppose N trials are carried out, and the result of each trial is S or F. Suppose S has probability p on any one trial, meaning that F has probability q = 1 − p. Let X be the number of times S comes up in the N trials. Then X is a random variable, taking on the values 0, 1, …, N, and the probability distribution P of X is the binomial distribution

P(x) = C(N, x) p^x q^(N−x)  for x = 0, 1, …, N.   (27.1)

Recall that C(N, x), also written NCx, is the number of ways of choosing x objects from N objects without regard to order, and is given by

C(N, x) = N!/(x!(N − x)!).

To see why equation (27.1) holds, consider the probability of getting S exactly x times in the N trials. Imagine picking x boxes out of N boxes, without regard to order. There are C(N, x) ways of doing this. For each such way, the probability of that way occurring is p^x q^(N−x), because S comes up x times, each with probability p, and F comes up the other N − x times, each with probability q. Therefore the probability of getting S exactly x times is

P(x) = (number of ways this outcome occurs) · (probability of each occurrence) = C(N, x) p^x q^(N−x).

Notice that this is the term having p to the x power in the binomial expansion of (p + q)^N. Since p + q = 1,

Σ_{x=0}^{N} P(x) = Σ_{x=0}^{N} C(N, x) p^x q^(N−x) = (p + q)^N = 1^N = 1,

as is required for the probability distribution of a random variable.
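Formula (27.1) codes directly into a short function; here is a Python sketch (the function name is ours):

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x) = C(n, x) p^x q^(n-x), the binomial distribution of (27.1)."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

# The probabilities over x = 0, ..., n must sum to 1, by the binomial expansion.
total = sum(binomial_pmf(x, 12, 0.79) for x in range(13))
print(abs(total - 1) < 1e-9)   # True
```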
EXAMPLE 27.13
A district attorney is deciding on strategy for considering a plea bargain. The D.A. knows that jurors make a correct vote for guilt or innocence about 79% of the time. There are twelve jurors, who vote independently, and it takes a vote of at least ten to convict. What is the probability that at least ten jurors will reach a correct decision?
We want the probability of at least ten out of twelve votes being correct (as decided by the evidence and presentation). Let N = 12, p = 0.79 and q = 1 − 0.79 = 0.21. Since at least ten jurors means exactly ten, exactly eleven, or exactly twelve, compute

P(10) + P(11) + P(12) = C(12, 10)(0.79)^10 (0.21)² + C(12, 11)(0.79)^11 (0.21) + (0.79)^12
= (12 · 11/2)(0.79)^10 (0.21)² + 12(0.79)^11 (0.21) + (0.79)^12 ≈ 0.523.

This is the probability that at least ten jurors reach a correct decision. As another example, the probability that exactly eight jurors reach a correct decision is

P(8) = C(12, 8)(0.79)^8 (0.21)^4 ≈ 0.146.

For a binomial probability distribution, the mean and standard deviation have a particularly simple form.
THEOREM 27.1

For a binomial probability distribution on N trials, μ = Np and σ = √(Npq).

These are routine computations. For the mean, write

μ = Σ_{k=0}^{N} k C(N, k) p^k q^(N−k) = Σ_{k=1}^{N} k C(N, k) p^k q^(N−k) = Σ_{k=1}^{N} N C(N − 1, k − 1) p^k q^(N−k),

because

k C(N, k) = k N!/(k!(N − k)!) = N (N − 1)!/((k − 1)!(N − k)!) = N C(N − 1, k − 1).

Now change the summation index to m = k − 1 to write

μ = Σ_{m=0}^{N−1} N C(N − 1, m) p^(m+1) q^(N−1−m) = Np Σ_{m=0}^{N−1} C(N − 1, m) p^m q^(N−1−m) = Np(p + q)^(N−1) = Np,

because p + q = 1. A similar calculation verifies that σ = √(Npq).
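Theorem 27.1 is easy to check numerically for particular N and p by computing μ and σ from the definitions; a Python sketch (ours, with the juror values N = 12, p = 0.79):

```python
from math import comb, sqrt

# Mean and standard deviation of the binomial distribution, computed
# from the definitions, should match Np and sqrt(Npq).
N, p = 12, 0.79
q = 1 - p
P = [comb(N, x) * p**x * q**(N - x) for x in range(N + 1)]

mu = sum(x * P[x] for x in range(N + 1))
sigma = sqrt(sum((x - mu) ** 2 * P[x] for x in range(N + 1)))

assert abs(mu - N * p) < 1e-9
assert abs(sigma - sqrt(N * p * q)) < 1e-9
```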
EXAMPLE 27.14
An assembly line produces automatic veebles. It has been found that the probability that a veeble is defective (fails to veebulate) is 0.02. If 40 veebles are selected at random, what is the probability that no more than three are defective?
Let N = 40 and p = 0.02, so q = 0.98. The probability distribution is binomial, with P(x) = the probability that x of the forty veebles are defective, for x = 0, 1, …, 40. If no more than three are defective, then none, one, two, or three could be defective. We want

P(0) + P(1) + P(2) + P(3) = C(40, 0) q^40 + C(40, 1) p q^39 + C(40, 2) p² q^38 + C(40, 3) p³ q^37
= (0.98)^40 + 40(0.02)(0.98)^39 + 780(0.02)²(0.98)^38 + 9880(0.02)³(0.98)^37 ≈ 0.992.

This probability makes sense intuitively. The probability of one veeble being defective is quite small, so it should be very likely that fewer than four are defective. The mean of this probability distribution is

μ = Np = 40(0.02) = 0.8.

The standard deviation is

σ = √(Npq) = √(40(0.02)(0.98)) ≈ 0.885.

27.3.2 The Poisson Distribution
Suppose we have some specific interval or segment in mind. This could be an interval in time, or, in terms of a space unit, some segment such as a length, distance, area, or something else. We are often interested in the occurrence of some event falling within this interval or segment. Let X be the number of times the event occurs in the given interval. Then X is a random variable, assuming that the occurrence of falling or not falling in the interval is a random event. The probability that the event falls in the interval x times can be shown to have the form

P(x) = λ^x e^(−λ)/x!   (27.2)

in which λ is a ratio of certain occurrences per interval, according to the setting of the problem. P is a probability distribution on the random variable X, and is called the Poisson distribution. The binomial distribution is discrete, and in fact finite, since it deals with the probability of some event occurring a certain number of times out of a given number. By contrast, the Poisson distribution is not finite, since x can take on the values 0, 1, 2, …. In the Poisson distribution, it is assumed that the occurrences under consideration are independent of each other, and are uniformly distributed over the interval of interest. The occurrences are therefore not allowed to cluster in one particular part of the interval.
EXAMPLE 27.15
A game consists of tossing darts onto a large flat mat that has been divided into 450 blocks of 6 square inches each. In one session, 370 darts were thrown. Suppose we want the probability that one block was hit exactly twice or exactly four times.
We will use the Poisson distribution because the game involves an event (a dart hitting the mat) occurring over an interval, which in this case is a 6-square-inch block of the mat. First, compute

λ = 370/450 ≈ 0.822

darts per block. Let P(x) = λ^x e^(−λ)/x! for x = 0, 1, 2, …. We want the probability of a block being hit exactly two or exactly four times. Thus, compute

P(2) + P(4) = (0.822)² e^(−0.822)/2! + (0.822)⁴ e^(−0.822)/4! ≈ 0.157.

Since (450)(0.157) = 70.65, we would expect on average that 71 blocks will be hit exactly two or four times. For contrast, suppose only 80 darts were thrown. Now take

λ = 80/450 ≈ 0.178,

so

P(2) + P(4) = (0.178)² e^(−0.178)/2! + (0.178)⁴ e^(−0.178)/4! ≈ 0.0133.

Since (450)(0.0133) = 5.985, we would expect six of the blocks to be hit exactly two or four times if 80 darts are thrown.
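The arithmetic in this example is easy to check with a few lines of code. The following sketch (ours, not the text's; it uses only the Python standard library) evaluates the Poisson formula (27.2) for the two dart counts:

```python
import math

def poisson_pmf(x, lam):
    # Poisson probability P(x) = lam^x e^(-lam) / x!  -- equation (27.2)
    return lam ** x * math.exp(-lam) / math.factorial(x)

# 370 darts over 450 blocks: lam = 370/450, about 0.822 darts per block
lam = 370 / 450
p = poisson_pmf(2, lam) + poisson_pmf(4, lam)
print(round(p, 3))      # 0.157, so about 450 * 0.157 = 70.65 blocks

# For contrast, 80 darts: lam = 80/450, about 0.178
lam2 = 80 / 450
p2 = poisson_pmf(2, lam2) + poisson_pmf(4, lam2)
print(round(p2, 4))     # 0.0133, so about 6 blocks
```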
It is possible to show that the mean of the Poisson distribution (27.2) is μ = λ, and the standard deviation is σ = √λ.
SECTION 27.3
PROBLEMS
Binomial Distribution

1. In each of (a) through (h), a number N of trials is performed, each with probability p of success. Use a binomial distribution to compute the probability of x successes for the given values.
(a) N = 8, p = 0.43, x = 2
(b) N = 4, p = 0.7, x = 3
(c) N = 6, p = 0.5, x = 3
(d) N = 10, p = 0.6, 2 ≤ x ≤ 5
(e) N = 8, p = 0.4, x ≥ 7
(f) N = 10, p = 0.58, 2 ≤ x ≤ 4
(g) N = 10, p = 0.35, x = 3 or x = 7
(h) N = 7, p = 0.24, x = 1 or x = 3 or x = 5
2. A professor gives a test consisting of six multiple choice questions, each offering four answers from which to choose. A student who knows less than nothing takes the test by the method of random choice, simply guessing on each question.
(a) What is the probability that the student gets exactly one question right?
(b) What is the probability of getting exactly two right?
(c) What is the probability of getting four or five right?
(d) What is the probability of getting all six questions right?
3. In Problem 2, change the number of choices for each question to three. Now answer questions (a) through (d).
4. In Problem 2, keep the number of choices given with each question at four, but suppose now there are ten questions. Answer the questions posed in Problem 2, and also determine the probability of getting five, six, or seven questions correct.
5. An auto manufacturer purchases starting mechanisms from an independent contractor. Every week, a quality control officer pulls one hundred starters out of the shipment from the contractor and inspects each. A starter is either defective or not, with no range in between. If two or fewer are defective, the total shipment of the week is accepted. Otherwise it is returned to the contractor. Further, it is known that overall the starter company has a 3.2% rate of production of defective products. What is the probability that the next shipment will be accepted?
6. In a clinical trial, it was found that 7% of those in the control group (who did not receive the drug) experienced a particular side effect commonly associated with this drug. Suppose the same 7% rate was found in those taking the drug in the trial. Find the probability that among the 1000 subjects taking the drug exactly 92 exhibited the side effect.
Poisson Distribution

7. For λ = 1.3, plot points of the Poisson distribution corresponding to x = 0, 1, 2, …, 8. Draw a smooth curve through these points. This will give some sense of the fact that a Poisson distribution has an approximately bell-shaped curve, although here x ≥ 0 and the curve is not an exact bell curve.
8. Assuming a Poisson distribution, determine P(x) for each of the following.
(a) λ = 0.9, x = 6
(b) λ = 0.85, x = 10
(c) λ = 0.92, x = 4
(d) λ = 0.87, x = 8
(e) λ = 0.94, 1 ≤ x ≤ 5
(f) λ = 0.64, 3 ≤ x ≤ 5
(g) λ = 0.75, x = 3 or x = 8
(h) λ = 0.97, x = 1 or x = 3 or x = 10
9. In a splatter test conducted over a full day, a car windshield is subdivided into 320 pieces, and it is found that 295 were hit by some material kicked up from the road by other cars. Assuming a Poisson distribution, determine
(a) the probability that some piece is hit three times
(b) the probability that some piece is hit from two to five times, inclusively
10. A new computer chip has one million microdots etched on its surface. In the manufacturing process, 997 850 dots are imprinted with information. Assuming a Poisson distribution, determine the probability that the process will incorrectly etch exactly three information packets on the same dot.
11. A golf driving range subdivides the area between 100 and 300 yards from the tee into 500 regions. In one four hour period, 476 balls were driven into this area. Assuming a Poisson distribution, compute the probability that any region was hit 3 times. What is the probability that a region was hit from two to six times, inclusive?
12. A gaming machine randomly selects a positive integer each time fifty cents is inserted. A trained cocker spaniel always bets on 15. The probability of 15 coming up is 1/24. The spaniel plays the game 100 times. Assuming a Poisson distribution, find the probability
(a) of winning 7 times
(b) of winning 3 times
(c) of not winning even once
27.4 A Coin Tossing Experiment, Normally Distributed Data, and the Bell Curve

Here is a simple but instructive experiment. Flip a coin twelve times and record the number of heads. If we perform this experiment once, some number of heads from 0 to 12, inclusive, will come up. We would be surprised to see no heads or twelve heads, but this is not impossible. An outcome of, say, five, six or seven heads would not cause much reaction. Repeat the experiment (12 flips) a few times, recording the number of heads each time. On any given set of 12 flips, we can get any number of heads from 0 to 12, inclusive. Now suppose we conduct N sets of 12 flips. Record the number of heads on each set of 12 flips. Is there any pattern that becomes apparent in carrying out large numbers of repetitions of 12 flips? Since in any repetition the number of heads can be any integer from 0 through 12, we might at first guess that there is not. However, a perhaps surprising regularity begins to occur if we look not just at outcomes, but at frequencies of outcomes. Instead of making a list of
the number of heads in each repetition, count, over all the repetitions, the frequency of the occurrence of any given number of heads. For example, suppose we run 50 repetitions of 12 flips. If five heads occur in exactly 15 of these repetitions, we say that 5 has a frequency of 15. If eight heads occur in exactly 12 of the repetitions, then 8 has a frequency of 12. In this way, we can look at the frequency of each outcome over all the repetitions. To illustrate, below are the results of N = 21 repetitions of the 12 coin flips, recording the number of heads on each set. (If you carry out your own 21 sets, you may get different outcomes.) The first set of twelve flips had four heads, the seventh set eight heads, the thirteenth had six heads, and so on. Exactly three of the sets had four heads, so four heads has frequency 3. Similarly, 8 heads occurs six times in row two of Table 27.1, so 8 heads has a frequency of 6, and 9 heads has a frequency of 3, and so on. Table 27.2 is a frequency table, summarizing the results of Table 27.1 in terms of frequencies. It is convenient to display this frequency table in a graph of number of heads (horizontal) against frequency (vertical). The resulting bar graph is shown in Figure 27.4. This bar graph, based on only 21 repetitions, does not suggest any pattern. However, Figures 27.5 through 27.9 show, in turn, frequency graphs for 86, 254, 469, 920 and, finally,
TABLE 27.1
Set of twelve flips:  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21
Number of heads:      4  6  9  8  5  6  8  6  8   9   8   6   6   8   8   6   3   9   4   4   6
TABLE 27.2
Number of heads:                        0  1  2  3  4  5  6  7  8  9  10  11  12
Frequency of this number in 21 trials:  0  0  0  1  3  1  7  0  6  3   0   0   0
FIGURE 27.4 12 coin flips, 21 repetitions.
FIGURE 27.5 12 coin flips, 86 repetitions.

FIGURE 27.6 12 coin flips, 254 repetitions.

FIGURE 27.7 12 coin flips, 469 repetitions.
FIGURE 27.8 12 coin flips, 920 repetitions.

FIGURE 27.9 12 coin flips, 1414 repetitions.
1414 repetitions of the 12 flips. Figure 27.10 shows a summary of the bar charts of Figures 27.5 through 27.9, except that, instead of bars, we have drawn a curve through the tops of the bars. The bottom curve is for 86 repetitions of the 12 flips, the next for 254 repetitions, and so on through 469, 920, and (the highest curve) 1414 repetitions. Notice that, as the number of repetitions increases, the frequency graphs in Figure 27.10 approach a bell curve. We will give a more careful characterization of bell curves shortly, but for now, this is a curve having the general appearance of the graph of Figure 27.11. The fact that the frequency charts approach a bell-shaped curve as N increases is characterized by saying that the data of these trials are normally distributed. Normal distribution means that there is a bell curve which can be fit (approximately) to the data. We will see that such a bell curve contains a great deal of information about probabilities of the experiment. Part of our task will be to see how to extract this information. There are many other normally distributed data sets whose graphs approximate bell curves. For example, Table 27.3 gives score ranges on a standardized test that is administered nationally, and the percentage of the students taking the test who scored in each range that year.
FIGURE 27.10 Line graphs of frequencies of heads for 86, 254, 469, 920 and 1414 repetitions.
FIGURE 27.11 Typical bell curve.
This data is graphed as a bar chart in Figure 27.12 (a). If a curve is drawn through the tops of the bars, as in Figure 27.12 (b), an approximately bell-shaped curve results. We have now seen two examples of curves we have referred to as "bell curves". Here is a definition of this term.
TABLE 27.3
Range      Percent of Students in this Range
200–240     0.9
250–290     1.8
300–340     4.6
350–390     8.9
400–440    13.5
450–490    17.6
500–540    17.5
550–590    14.3
600–640    10.6
650–690     6.4
700–740     2.5
750–800     1.4
FIGURE 27.12(a) Standardized test scores.

FIGURE 27.12(b) Standardized test scores bell curve.
DEFINITION 27.8 Bell Curve

A bell curve is the graph of an exponential function

y = (1/(σ√(2π))) e^(−(x−μ)²/2σ²)

in which σ is any positive number and μ is any number.
Figure 27.13 shows graphs of three bell curves drawn on the same axes for comparison. All curves have μ = 6, and the choices for σ are 3.5, 4 and 4.5. The following are some important properties shared by every bell curve.
1. The curve lies entirely above the horizontal axis and has a single maximum point at x = μ.
2. The curve is symmetric about the vertical line x = μ.
3. The curve is asymptotic to the horizontal axis as x → ∞ and as x → −∞.
4. The curve has exactly two points of inflection.
5. The area bounded by the bell curve and the x-axis is 1:

(1/(σ√(2π))) ∫_{−∞}^{∞} e^(−(x−μ)²/2σ²) dx = 1.

We will see that σ is actually a standard deviation, and μ is a median.
FIGURE 27.13 Bell curves corresponding to μ = 6 and σ equal to 3.5, 4 and 4.5.
It is Property 5 which makes possible the following connection between a bell curve and probability.
THEOREM 27.2

Suppose the bell curve y = (1/(σ√(2π))) e^(−(x−μ)²/2σ²) is the graph of a continuous random variable X. Then,

1. If a < b, and P(a ≤ x ≤ b) denotes the probability that an observed data value X(x) falls within [a, b], then

P(a ≤ x ≤ b) = (1/(σ√(2π))) ∫_a^b e^(−(x−μ)²/2σ²) dx.

(See the shaded portion of Figure 27.14.)

2. If P(x ≥ a) denotes the probability that an observed data value X(x) is at least as large as a, then

P(x ≥ a) = (1/(σ√(2π))) ∫_a^∞ e^(−(x−μ)²/2σ²) dx.

(See the shaded portion of Figure 27.15.)

3. If P(x ≤ b) denotes the probability that an observed data value X(x) is no larger than b, then

P(x ≤ b) = (1/(σ√(2π))) ∫_{−∞}^b e^(−(x−μ)²/2σ²) dx.

(See the shaded part of Figure 27.16.)

FIGURE 27.14 Shaded area equals P(a ≤ x ≤ b).

FIGURE 27.15 Shaded area equals P(x ≥ a).

FIGURE 27.16 Shaded area equals P(x ≤ b).
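None of these integrals has an elementary antiderivative, but the substitution z = (x − μ)/(σ√2) reduces each one to the error function erf, which most numerical libraries supply. A short sketch in Python (the function names are ours, not the text's):

```python
import math

def normal_cdf(x, mu, sigma):
    # P(X <= x): area under the bell curve to the left of x,
    # expressed through the error function erf.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_between(a, b, mu, sigma):
    # P(a <= x <= b): area under the bell curve from a to b (part 1 of the theorem)
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

# Sanity checks: the total area is 1, and half the area lies left of mu.
print(round(prob_between(-50.0, 50.0, 0.0, 1.0), 6))   # 1.0
print(normal_cdf(6.0, 6.0, 2.0))                       # 0.5
```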
A few years ago these integrals would be approximated using tables constructed for this purpose. Now, however, the integrals are quickly approximated by mathematics computation software, such as MAPLE or MATHEMATICA. Now we must make the significant leap from a continuous random variable to a discrete random variable, since the latter are what are often generated by experiments or procedures that we actually carry out. We have seen two examples (coin flips and standardized test scores) in which the data approximated a bell curve. Such data, which are often values of a discrete random variable, are said to be normally distributed. We will now show how the mean and standard deviation
of normally distributed data can be used to associate the data with a particular bell curve (by specifying the constants μ and σ), and then how conclusions about the probability of certain outcomes related to the data can be drawn from the curve, using the theorem. Suppose we carry out some set of trials (such as flipping a coin, or inspecting items for defects, or determining whether a drug has beneficial results), and each trial has exactly two possible outcomes, which we will call A and B, or success and failure, or some other bimodal designation. Suppose also the outcomes of separate trials are independent of each other. We therefore have a binomial random variable which we will call X, where X(x) = number of times A occurs in x. Let the probability that A occurs in any trial be p. (For an honest coin flip, p = 1/2.) By complementarity the probability that B occurs is q = 1 − p. The frequency of A after N trials, as N increases, approaches a normal distribution given by a bell curve. The mean of the frequencies of any set of N trials given by this bell curve is

μ = Np

and the standard deviation of this data is

σ = √(Np(1 − p)).

The bell curve, or normal curve, associated with these trials is the graph of

y = (1/(σ√(2π))) e^(−(x−μ)²/2σ²).   (27.3)
Now suppose we pick an interval of values on the x-axis, and we want the probability that X(x) falls within this interval (that is, that A occurs between a and b times, inclusive, in x). Here we must be careful. The graph of the continuous bell curve from equation (27.3) is only approximated by the graph of the discrete random variable. We must therefore make an adjustment in relating area under this curve to a probability. This is done by what is called a continuity adjustment in the limits of integration, effected by adding 1/2 to the upper limit of integration, and subtracting 1/2 from the lower limit. This is described in the following theorem.

THEOREM 27.3

1. Let a ≤ x ≤ b. Assume that the random variable X has a binomial distribution. Then P(a ≤ x ≤ b), the probability that an observed data value X(x) falls within [a, b], is given by

P(a ≤ x ≤ b) ≈ (1/(σ√(2π))) ∫_{a−1/2}^{b+1/2} e^(−(x−μ)²/2σ²) dx.

2. P(x ≥ a) = probability that the observed value is at least as large as a

≈ (1/(σ√(2π))) ∫_{a−1/2}^{∞} e^(−(x−μ)²/2σ²) dx.

3. P(x ≤ b) = probability that the observed value is no larger than b

≈ (1/(σ√(2π))) ∫_{−∞}^{b+1/2} e^(−(x−μ)²/2σ²) dx.
27.4 A Coin Tossing Experiment, Normally Distributed Data, and the Bell Curve
1169
Keep in mind that this continuity adjustment in the limits of integration applies only to the bell curve approximation of a binomial distribution. The reason these conclusions are stated as approximations is that we are approximating the discrete random variable X by a continuous bell curve (chosen with and determined by X), and also the limits of integration include a continuity adjustment which is itself an approximation. We will illustrate how the theorem is used to extract numerical information about events in an experiment.
EXAMPLE 27.16
Flip a coin 12 times. Previously we performed many repetitions of the 12 flips, charting the frequencies of the occurrences of heads. Here N = 12, and the probability of obtaining a head on any one flip of the coin is p = 1/2, assuming an honest coin. The only other outcome of any coin toss is a tail, which has probability q = 1 − 1/2 = 1/2. The mean of the (approximately) normally distributed data is

μ = Np = 12(1/2) = 6,

not a surprising result. Ideally, if we flip a coin 12 times, we expect on average to obtain 6 heads, although in a specific run of 12 flips we might get any number of heads from 0 through 12. The standard deviation is

σ = √(Npq) = √(12(1/2)(1/2)) = √3.

The bell curve for this experiment is

y = (1/(σ√(2π))) e^(−(x−μ)²/2σ²) = (1/(√3 √(2π))) e^(−(x−6)²/6) = (1/√(6π)) e^(−(x−6)²/6).

A graph of this curve is shown in Figure 27.17. The maximum occurs at x = 6, and the line x = 6 is also the axis of symmetry. Suppose we want to compute the probability that the number of heads that comes up falls between, say, 4 and 8. Taking care to use the continuity adjustment, this probability is

P(4 ≤ x ≤ 8) ≈ (1/√(6π)) ∫_{3.5}^{8.5} e^(−(x−6)²/6) dx ≈ 0.85109.

The probability is approximately 0.85 that a head will come up 4, 5, 6, 7 or 8 times. Figure 27.18 shows this probability as an area under the approximating normal curve.
FIGURE 27.17 Bell curve y = (1/√(6π)) e^(−(x−6)²/6).
FIGURE 27.18 P(4 ≤ x ≤ 8), with the continuity adjustment.
In this example we can compute P(4 ≤ x ≤ 8) exactly using a binomial distribution for comparison:

P(4) + P(5) + P(6) + P(7) + P(8)
= C(12,4)(1/2)⁴(1/2)⁸ + C(12,5)(1/2)⁵(1/2)⁷ + C(12,6)(1/2)⁶(1/2)⁶ + C(12,7)(1/2)⁷(1/2)⁵ + C(12,8)(1/2)⁸(1/2)⁴
= (1/2¹²)[C(12,4) + C(12,5) + C(12,6) + C(12,7) + C(12,8)]
= (495 + 792 + 924 + 792 + 495)/2¹² ≈ 0.854,

differing from the approximation by about three thousandths. If we left out the continuity adjustment, we would compute P(4 ≤ x ≤ 8) as

(1/√(6π)) ∫_4^8 e^(−(x−6)²/6) dx,

which is approximately 0.75179. We know that this is incorrect because we have computed P(4 ≤ x ≤ 8) exactly in this case using a binomial distribution.

What is the probability that the number of heads will be 10 or more? This is

P(x ≥ 10) ≈ (1/√(6π)) ∫_{10−1/2}^{∞} e^(−(x−6)²/6) dx ≈ 0.02165.

As we might expect, getting 10 or more heads in 12 flips is very unlikely. This probability is represented by the shaded area in Figure 27.19.

What is the probability of getting from 0 to 6 heads, inclusive? This is the area of the shaded region in Figure 27.20, and is given by

P(x ≤ 6) ≈ (1/√(6π)) ∫_{−∞}^{6+1/2} e^(−(x−6)²/6) dx ≈ 0.61359.

It is a common error to think that P(x ≤ 6) should be 1/2. However, there are seven ways the number of heads can be 6 or less, but only six ways it can be more than 6. This probability also dramatizes the need for the continuity adjustment. The area under this bell curve, for x ≤ 6, is indeed half the total area bounded by the curve, or 1/2. However, the probability P(x ≤ 6) is not 1/2. The apparent discrepancy is due to the fact that this bell curve is only an approximation to the data of the experiment.

Finally, what is the probability that there will be exactly 7 heads? This probability is

P(x = 7) ≈ (1/√(6π)) ∫_{7−1/2}^{7+1/2} e^(−(x−6)²/6) dx ≈ 0.193.

Again, this is a probability we can compute without a bell curve as

P(7) = C(12,7)(1/2)⁷(1/2)⁵ ≈ 0.193.
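The comparison in this example (exact binomial sum versus bell-curve integral with the continuity adjustment) can be automated; a sketch, with function names of our own choosing:

```python
import math

def binom_prob(N, p, a, b):
    # Exact P(a <= x <= b) from the binomial distribution
    return sum(math.comb(N, x) * p**x * (1 - p)**(N - x) for x in range(a, b + 1))

def normal_approx(N, p, a, b):
    # Bell-curve approximation with the 1/2 continuity adjustment (Theorem 27.3)
    mu = N * p
    sigma = math.sqrt(N * p * (1 - p))
    cdf = lambda x: 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))
    return cdf(b + 0.5) - cdf(a - 0.5)

# 12 flips of an honest coin, 4 to 8 heads:
print(round(binom_prob(12, 0.5, 4, 8), 3))      # 0.854 (exact)
print(round(normal_approx(12, 0.5, 4, 8), 3))   # 0.851 (approximation)
```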
FIGURE 27.19 P(x ≥ 10), with the continuity adjustment.
FIGURE 27.20 P(x ≤ 6), with the continuity adjustment.
EXAMPLE 27.17
Suppose we roll a die 50 times. We are interested in the frequency of having 4 come up. Think of each roll or trial as having two outcomes of interest, either 4 or not 4. Since there are six faces on the die, the probability of getting a 4 on one roll is p = 1/6, while the probability of getting not 4 is 5/6. Since N = 50 for fifty trials, the mean of the normally distributed data is

μ = 50(1/6) = 25/3,

approximately 8.33. The standard deviation is

σ = √(50(1/6)(5/6)) = √(250/36) = (5/6)√10,

approximately 2.64. The bell curve associated with these trials is the graph of

y = (1/(σ√(2π))) e^(−(x−μ)²/2σ²) = (6/(5√(20π))) e^(−9(x−25/3)²/125).

Suppose, for example, we want the probability that, on fifty rolls of the die, the number of times 4 comes up is between 2 and 15, inclusively. This probability is (approximately) the
FIGURE 27.21 P(2 ≤ x ≤ 15), using the bell curve y = (6/(5√(20π))) e^(−9(x−25/3)²/125).
shaded area under the bell curve shown in Figure 27.21. This area appears to be nearly all of the area under the curve, so we expect P(2 ≤ x ≤ 15) to be close to 1. Compute

P(2 ≤ x ≤ 15) ≈ (6/(5√(20π))) ∫_{2−1/2}^{15+1/2} e^(−9(x−25/3)²/125) dx ≈ 0.99198.

Suppose we want the probability that the number of 4's that come up in fifty rolls is 47 or 48. The area between the bell curve and the horizontal axis for x between 47 and 48 is nearly 0. In fact, it is so small that it doesn't show up on the graph (in the scale of the drawing). This means that, although 47 or 48 4's are not impossible in fifty rolls, it is extremely unlikely that either of these numbers of 4's will come up. To check this intuition, compute

P(47 ≤ x ≤ 48) ≈ (6/(5√(20π))) ∫_{47−1/2}^{48+1/2} e^(−9(x−25/3)²/125) dx ≈ 7.7325 × 10⁻⁴⁸.
EXAMPLE 27.18
Suppose an airline finds (by analyzing past flight information) that its planes fly an average of 1500 miles per month per passenger, with a standard deviation of 90 miles. Suppose we want the probability that a typical passenger flies between 1400 and 1600 miles per month. Under the assumption that the data is normally distributed, we have μ = 1500 and σ = 90, and these determine the appropriate bell curve for this study:

y = (1/(σ√(2π))) e^(−(x−μ)²/2σ²) = (1/(90√(2π))) e^(−(x−1500)²/16200).

The probability that a typical passenger flies between 1400 and 1600 miles per month is approximately

P(1400 ≤ x ≤ 1600) ≈ (1/(90√(2π))) ∫_{1400−1/2}^{1600+1/2} e^(−(x−1500)²/16200) dx ≈ 0.73586.
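When μ and σ are given directly, as in this example, the same error-function evaluation applies. A sketch (ours, not the text's) reproducing the computation:

```python
import math

def normal_prob(a, b, mu, sigma):
    # P(a <= x <= b) under the bell curve, with the 1/2 continuity adjustment
    cdf = lambda x: 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))
    return cdf(b + 0.5) - cdf(a - 0.5)

# mu = 1500 miles, sigma = 90 miles, as in the example
p = normal_prob(1400, 1600, 1500, 90)
print(round(p, 3))   # 0.736
```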
27.4.1 The Standard Bell Curve

The bell curve corresponding to μ = 0 and σ = 1 is the graph of

y = (1/√(2π)) e^(−z²/2).   (27.4)
The graph is shown in Figure 27.22, with the vertical axis as the line of symmetry. This curve is called the standard or normal bell curve. It is routine to use z as the variable in this context, and statisticians often refer to values on the horizontal axis as z-scores. One reason for having a standard bell curve is that it provides a reference point from which to compile certain information in tables. To illustrate one case where this is useful, suppose we have in mind a particular area under the standardized bell curve. For example, we may have a particular target probability that we are aiming for, and we want to know a value z = z0 so
FIGURE 27.22 Standard bell curve y = (1/√(2π)) e^(−z²/2).
that the area under the curve, to the left of z0, gives that probability. If the probability we are aiming for is k, then we need to solve

(1/√(2π)) ∫_{−∞}^{z0} e^(−z²/2) dz = k

for z0, a highly nontrivial task. Table 27.4 gives some commonly used probabilities, and the value of z0 such that the area to the left of z0 under the standard bell curve gives this probability. For example,

(1/√(2π)) ∫_{−∞}^{1.645} e^(−z²/2) dz = 0.95002
TABLE 27.4
Area to the Left of z0     z0
0.99                       2.33
0.95                       1.645
0.90                       1.29
0.85                       1.04
0.80                       0.84
0.75                       0.67
FIGURE 27.23 Area under the standard bell curve to the left of z0 = 1.29 is approximately 0.90.
and

(1/√(2π)) ∫_{−∞}^{1.29} e^(−z²/2) dz = 0.90147.

The last integral is illustrated in Figure 27.23.
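Solving for z0 given the area k is "highly nontrivial" in closed form, but numerically it is routine root-finding, since the area is an increasing function of z0. A bisection sketch (our own, not from the text) reproduces rows of Table 27.4:

```python
import math

def std_normal_cdf(z):
    # Area under the standard bell curve (27.4) to the left of z
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_for_area(k, lo=-10.0, hi=10.0, tol=1e-10):
    # Bisection: the CDF is increasing, so we can bracket the root and halve.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if std_normal_cdf(mid) < k:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(z_for_area(0.95), 3))   # 1.645
print(round(z_for_area(0.99), 3))   # 2.326 (the table rounds this to 2.33)
```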
27.4.2 The 68, 95, 99.7 Rule
Suppose we have some experiment that involves a fixed number of trials. For example, in flipping a coin fifty times, the experiment consists of the fifty flips, each of which is a trial. Let the probability of the result of a trial that we are interested in (such as heads for a coin toss) be p. Compute the mean μ = Np and standard deviation σ = √(Np(1 − p)). Define the random variable X(x) = number of occurrences of the trial result of interest in outcome x. Then the bell curve associated with X is

y = (1/(σ√(2π))) e^(−(x−μ)²/2σ²).   (27.5)
27.4 A Coin Tossing Experiment, Normally Distributed Data, and the Bell Curve
1177
It is routine to check (solve y″(x) = 0) that this curve has two points of inflection, namely x = μ + σ and x = μ − σ. The points of inflection are therefore one standard deviation to the left and right of the mean, hence to the left and right of the bell curve's axis of symmetry. Moving one, two or three standard deviations to the left or right of the mean is particularly significant, according to the following rule.

The 68, 95, 99.7 Rule

The area under the bell curve between μ − σ and μ + σ is approximately 68% of the total area. The area under the bell curve between μ − 2σ and μ + 2σ is approximately 95% of the total area. The area under the bell curve between μ − 3σ and μ + 3σ is approximately 99.7% of the total area.

In terms of probabilities: The probability that the outcome falls between μ − σ and μ + σ is approximately 0.68. The probability that the outcome falls between μ − 2σ and μ + 2σ is approximately 0.95. The probability that the outcome falls between μ − 3σ and μ + 3σ is approximately 0.997.

The probability that an outcome falls to the left of μ − 3σ or to the right of μ + 3σ is 1 − 0.997 = 0.003. Hence it is nearly certain that any outcome of the experiment will fall within three standard deviations of the mean.
EXAMPLE 27.19
Suppose we have a dishonest coin, and the probability of a head on any toss is p = 0.8. Flip the coin 1000 times. Compute

μ = (1000)(0.8) = 800

and

σ = √((1000)(0.8)(0.2)) ≈ 12.65.

Now μ − σ = 787.35 and μ + σ = 812.65, so in the thousand flips, there is a probability of about 0.68 of seeing between 787 and 813 heads. There is a probability of about 0.95 of seeing between 774 and 826 heads, and a probability of about 0.997 of seeing between 762 and 838 heads.
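For a symmetric interval from μ − kσ to μ + kσ, the bell-curve probability collapses to erf(k/√2), which gives the three constants of the rule directly. A sketch (ours) checking the rule and this example's numbers:

```python
import math

N, p = 1000, 0.8                       # the dishonest coin of the example
mu = N * p                             # 800
sigma = math.sqrt(N * p * (1 - p))     # sqrt(160), about 12.65

def prob_within(k):
    # P(mu - k*sigma <= x <= mu + k*sigma); for a symmetric interval
    # about the mean this reduces to erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 3))              # 1 0.683, 2 0.954, 3 0.997
print(round(mu - sigma, 2), round(mu + sigma, 2))   # 787.35 812.65
```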
SECTION 27.4
PROBLEMS
1. Roll a die 90 times. We are interested in the outcome that a 6 comes up.
(a) Determine μ and σ for this experiment.
(b) Write the equation of the corresponding bell curve and graph the curve.
Determine the probability that a 6 comes up:
(c) between 30 and 60 times
(d) between 2 and 80 times
(e) at least 35 times
(f) fewer than 20 times
(g) exactly 45 times
2. A coin is flipped 100 times.
(a) Determine μ and σ for this experiment.
(b) Write the equation of the appropriate bell curve and graph the curve.
Determine the probability that a head comes up:
(c) between 20 and 40 times
(d) between 10 and 50 times
(e) between 45 and 55 times
(f) at least 30 times
(g) fewer than 60 times
(h) exactly 55 times
3. In a certain kingdom, babies are born with a probability of 0.48 for boys and 0.52 for girls. In a particular year, it is expected that 350 children will be born. We are interested in the outcome that a newborn child was a girl.
(a) Determine μ and σ for this experiment.
(b) Write the equation of the bell curve and graph the curve.
Determine the probability that:
(c) at least 220 of the babies will be girls
(d) at least 150 of the babies will be girls
(e) the number of girl babies will fall between 120 and 250
(f) the number of girl babies will be exactly 180
4. It is known that, when a stratigather is produced by a certain company, it may be either left handed or right handed. The probability that a particular stratigather will be right handed is 0.58, and that it will be left handed, 0.42. A thousand stratigathers are picked at random. Compute the probability
(a) that at least 400 of these are right handed.
(b) that between 400 and 600 are right handed.
(c) that no more than 450 are right handed.
(d) that exactly 520 are right handed.
5. A bus company finds that its buses drive an average of 700 miles per month for each passenger with a standard deviation of 65 miles. Assume that the data consisting of the number of miles ridden per passenger per month
is normally distributed. Determine the probability that a typical passenger rides between 250 and 600 miles per month. What is the probability that the passenger rides between 600 and 900 miles per month?
6. An auto rental firm finds that its cars travel an average of 940 miles for each rental, with a standard deviation of 76 miles. Find the probability that a customer who rents the car drives it between 750 and 1000 miles. What is the probability that the car is driven between 300 and 500 miles?
7. In a recent year in the American League, hitters averaged .247, with a standard deviation of .021. Determine the probability that a hitter averaged between .245 and .270. What is the probability that a hitter averaged over .260? What is the probability that a batter averages .300 or more?
8. A coin has a probability of 0.42 of coming up heads if it is flipped. The coin is flipped 2000 times.
(a) Determine a and b so that there is a probability of about 0.68 of seeing between a and b heads.
(b) Determine a and b so that there is a probability of about 0.95 of seeing between a and b heads.
(c) Determine a and b so that there is a probability of about 0.99 of seeing between a and b heads.
9. An honest die is rolled, and we are interested in the outcome that the number that comes up is even. A total of 550 tosses are made.
(a) Determine a and b so that there is a probability of about 0.68 of seeing between a and b even tosses.
(b) Determine a and b so that there is a probability of about 0.95 of seeing between a and b even tosses.
(c) Determine a and b so that there is a probability of about 0.99 of seeing between a and b even tosses.
27.5 Sampling Distributions and the Central Limit Theorem

We now have some facts and information about data sets and how certain statistics can be extracted from them. We now move toward addressing a fundamental problem of statistics. In statistical studies, the term population refers to the set of all the objects under study. For example, if we are studying airplane parts coming off a factory production line in a certain time period, then the population would consist of all the parts produced in that time. Suppose we want to measure the success of some process (such as manufacturing) by estimating how many defective products are produced. An obvious strategy is to sample the product to try to infer from the sample a measure of how many products are defective in the entire population. How can we do this in a reasonable and efficient manner? Clearly the answer lies somewhere between the extremes of inspecting every item (probably costly and inefficient), and inspecting none or only one (of questionable value). How do we decide how many items
to sample, and what confidence can we have in conclusions drawn from samples? In particular, can we tie the number of samples to a predetermined level of confidence, so that we can set the sample size to achieve this confidence level? To approach this important issue, let us agree that, in taking a sample from a collection of objects, we will sample with replacement. This means that, if we pick an object and make some measurement on it, then we replace this object back in the population before picking another object to test. This way each sample is chosen from the entire population. One reason for doing this is that usually very large numbers of objects are involved, so sampling with or without replacement makes little numerical difference. In addition, sampling with replacement ensures that the selection of each sample object is independent from the selection of any other, since all objects are chosen from the entire set of objects. This is a considerable simplification. We also will agree that sample size is fixed at some number n throughout the discussion. Thus, all samples in a given study are taken with replacement, and have the same sample size. When we take samples involving numerical quantities, each sample will have a mean (mean of the numbers in the sample). The sampling distribution of the mean is the probability distribution of the means of the samples, assuming that all the samples have the same size. We also may compute other statistics from the samples, such as medians of the samples and standard deviations of the samples, and average these.
EXAMPLE 27.20
A contractor looks at the number of buildings his company has completed in each of the past five years, and the numbers are 1, 4, 6, 8 and 9. These numbers constitute the population under consideration. Samples of size two are taken, with replacement. Of course, in reality we would not take samples of two out of five, but this example has the sole purpose of illustrating the discussion. With replacement, there are 5² = 25 samples of two of the numbers. The samples are listed in Table 27.5, together with each sample mean and sample standard deviation. If we average all fifty numbers contained in the samples, we find that this mean is 5.6. Column 2 of the table contains the mean of each sample, which in this case is an average of just two numbers for each sample. If the means x̄ of the samples are averaged (that is, average the numbers in column two), we find that this mean is also 5.6. This is not a coincidence. The mean of all the numbers in the samples is the same as the average of all the sample means. We say that this sample statistic (mean of the samples) targets the population parameter (mean of all the items appearing in the samples). The sampling distribution of the mean is the probability distribution of the sample means (which are in Column 2). There are 25 samples, and, for example, the mean 5 occurs four times, so P(5) = 4/25. Similarly, the mean 3.5 occurs twice, so P(3.5) = 2/25. The entire sampling distribution is:

P(1) = 1/25, P(2.5) = P(3.5) = 2/25, P(4) = 1/25, P(4.5) = 2/25, P(5) = 4/25, P(6) = 3/25, P(6.5) = P(7) = P(7.5) = 2/25, P(8) = 1/25, P(8.5) = 2/25, P(9) = 1/25.

As required, these probabilities sum to 1. If we compute the mean of the sample standard deviations (Column 3), we get 2.2627. However, if we compute the standard deviation of the fifty items in the samples, we get 2.8705. The standard deviation of the samples is not targeted by the mean of the sample standard deviations.
CHAPTER 27
Statistics
TABLE 27.5

Sample   Sample Mean x̄   Sample Standard Deviation
1,1      1                0
1,4      2.5              2.1213
1,6      3.5              3.5355
1,8      4.5              4.9497
1,9      5                5.6569
4,1      2.5              2.1213
4,4      4                0
4,6      5                1.4142
4,8      6                2.8284
4,9      6.5              3.5355
6,1      3.5              3.5355
6,4      5                1.4142
6,6      6                0
6,8      7                1.4142
6,9      7.5              2.1213
8,1      4.5              4.9497
8,4      6                2.8284
8,6      7                1.4142
8,8      8                0
8,9      8.5              0.7071
9,1      5                5.6569
9,4      6.5              3.5355
9,6      7.5              2.1213
9,8      8.5              0.7071
9,9      9                0
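The arithmetic of Example 27.20 is easy to verify by brute force. The following Python sketch (illustrative, not part of the original text) enumerates all 25 samples of Table 27.5 and confirms that the mean of the sample means targets the population mean, while the mean of the sample standard deviations does not target the standard deviation of the sampled items:

```python
from itertools import product
from statistics import mean, pstdev, stdev

# Population from Example 27.20: buildings completed in each of five years.
population = [1, 4, 6, 8, 9]

# All 5^2 = 25 samples of size two, drawn with replacement (Table 27.5).
samples = list(product(population, repeat=2))

sample_means = [mean(s) for s in samples]
sample_sds = [stdev(s) for s in samples]  # n-1 denominator, as in the table

# The mean of the sample means targets the population mean (5.6) ...
print(mean(sample_means))  # 5.6

# ... but the mean of the sample standard deviations (about 2.26) does not
# target the standard deviation of the fifty sampled items (about 2.87).
all_items = [x for s in samples for x in s]
print(mean(sample_sds), pstdev(all_items))
```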
In making inferences about a population by observing samples, the sampling distribution of a proportion often plays a role. Suppose a sample of n objects from the population contains k objects of one type. Then the sample proportion of that sample, for that type of object, is k/n. The sampling distribution of the proportion, with respect to that type of object, is the probability distribution of the sample proportions for it.
EXAMPLE 27.21
Suppose, in trying to determine quality of manufacture of a product, ten samples of 20 items each are taken off the line (with replacement). Table 27.6 shows the data and the sample ratios (number defective divided by sample size). The sample ratio for each sample (with regard to defective items) is the number of defective items in the sample, divided by n = 20. Since 1/20 occurs four times out of ten as a sample ratio, P(1/20) = 4/10 = 0.4. In this way we compute the sampling distribution of this proportion: P(1/20) = 0.4, P(0) = 0.4, P(1/10) = 0.2.
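As a quick check of Example 27.21, a short Python sketch (illustrative, not from the text) tallies the sample ratios of Table 27.6 into the sampling distribution of the proportion:

```python
from collections import Counter

# Number of defective items in each of the ten samples of n = 20 (Table 27.6).
defects = [1, 0, 2, 0, 0, 1, 2, 1, 1, 0]
n = 20

ratios = [d / n for d in defects]  # sample ratios: 1/20, 0, 1/10, ...

# Sampling distribution: probability of each ratio among the ten samples.
dist = {r: count / len(ratios) for r, count in Counter(ratios).items()}
print(dist)  # {0.05: 0.4, 0.0: 0.4, 0.1: 0.2}
```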
We now have the vocabulary to state the important central limit theorem, which relates a distribution of the original population itself to a distribution of the sample means.
TABLE 27.6

Sample   Number Defective   Sample Ratio
1        1                  1/20
2        0                  0
3        2                  1/10
4        0                  0
5        0                  0
6        1                  1/20
7        2                  1/10
8        1                  1/20
9        1                  1/20
10       0                  0

THEOREM 27.4

Central Limit Theorem
Suppose a random variable X has mean μ and standard deviation σ. Then the distribution of sample means will approach a normal distribution as the sample size increases, regardless of whether or not the original population distribution is normally distributed. Further,
1. The mean of all the sample means equals μ.
2. The standard deviation of all sample means is approximately σ/√n, where n is the sample size. This approximation can be improved by increasing n.
The first conclusion can be phrased: the mean of the sample means equals the mean of the population. The second conclusion justifies the sense in which we can approximate the standard deviation of the total population using the standard deviation of the sample means. The central limit theorem justifies the approximation of a distribution by a normal distribution. Even when the original random variable does not have a normal distribution, the sample means will approach a normal distribution. As a rule of thumb (that is, true except for "pathological" cases), we can safely approximate the distribution of the sample means by a normal distribution if n > 30. The following example illustrates the idea of the central limit theorem, and emphasizes the crucial difference that must be understood between the distribution of the original random variable and the distribution of the sample means.
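To see the theorem numerically, here is a small simulation (illustrative, not from the text): draw many samples of size n from the uniformly distributed digits 0 through 9, a decidedly non-normal population, and compare the standard deviation of the sample means with σ/√n.

```python
import random
import statistics as st

random.seed(1)  # reproducible run

population = range(10)          # digits 0-9, far from normally distributed
mu = st.mean(population)        # 4.5
sigma = st.pstdev(population)   # about 2.872

n = 36                          # sample size (comfortably above 30)
trials = 20_000
means = [st.mean(random.choices(population, k=n)) for _ in range(trials)]

# Conclusion 1: the mean of the sample means is close to mu.
# Conclusion 2: the spread of the sample means is close to sigma / sqrt(n).
print(st.mean(means), st.pstdev(means), sigma / n ** 0.5)
```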
EXAMPLE 27.22
Imagine that forty samples of six digits each have been drawn, with replacement, from the integers 0 through 9, inclusive. Table 27.7 shows the samples and the sample means. The sample columns have forty entries of six digits each, hence a total of 240 digits (the population), each an integer from 0 through 9, inclusive. The random variable X is defined by X(k) = number of occurrences of k in this set of 240 digits. Figure 27.24 shows a graph of these values of X(k). Clearly this is not a normal distribution, nor should we expect it to be. Figure 27.25 shows a graph of the distribution of the sample means. This distribution is not normal, because n = 6 is too small. However, if n were chosen larger, the sample means would more closely approximate a true normal distribution.
TABLE 27.7

Sample    Sample Mean    Sample    Sample Mean
289186    5.66           188635    5.16
957888    7.5            405396    4.5
231332    2.33           148235    3.83
767099    6.33           332674    4.16
686505    5              962669    6.33
859901    5.33           445186    4.66
063159    4              637269    5.50
501547    3.66           521802    3
855909    6              040848    4
161078    3.83           358298    5.83
809973    6              568735    5.66
290800    3.16           630761    3.83
925079    5.33           607495    5.16
732111    2.5            188965    6.16
795556    6.16           201037    2.16
145776    5              749018    4.83
276535    4.66           180629    4.33
605812    3.66           291554    4.33
633966    5.50           958087    6.16
747358    5.66           799365    6.5
FIGURE 27.24 Graph of X(k).
It is routine to compute the mean of the population, obtaining

μ = 387/80 = 4.8375.

A similar routine calculation shows that the mean of the sample means is also 387/80, as expected. The standard deviation of the population is found to be

σ = √492159/240,
FIGURE 27.25 Distribution of the sample means.
which is approximately 2.9231. The standard deviation of the sample means is

σ_means = √86919/240,

or approximately 1.2284. If we compute σ/√n, which is σ/√6 for this example, we get 1.1933. According to the central limit theorem, this number should approximate σ_means, and indeed it is within 0.035 of σ_means. Even with a small n, this approximation is good to about three hundredths in this example.
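All of these figures can be recomputed directly from the data of Table 27.7. The following Python sketch (illustrative, not part of the text) treats the 240 digits as the population and the forty six-digit samples as the samples:

```python
from statistics import mean, pstdev

# The forty samples of Table 27.7, as strings of digits.
rows = """289186 957888 231332 767099 686505 859901 063159 501547
855909 161078 809973 290800 925079 732111 795556 145776
276535 605812 633966 747358 188635 405396 148235 332674
962669 445186 637269 521802 040848 358298 568735 630761
607495 188965 201037 749018 180629 291554 958087 799365"""
samples = rows.split()

digits = [int(d) for s in samples for d in s]        # the 240-digit population
means = [mean(int(d) for d in s) for s in samples]   # the 40 sample means

print(mean(digits))               # 4.8375 = 387/80
print(pstdev(digits))             # about 2.9231 = sqrt(492159)/240
print(pstdev(means))              # about 1.2284 = sqrt(86919)/240
print(pstdev(digits) / 6 ** 0.5)  # about 1.1933, close to the line above
```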
EXAMPLE 27.23
The headmaster of an exclusive private high school is interested in the probability that his twenty-two seniors will average at least 670 out of 900 on a standardized test used by colleges in making admission decisions about applicants. The national average on this test is 640, with a standard deviation of 120. Think of taking samples of size n = 22 from the national pool of students taking this test, and use the approximately normally distributed sample means to draw conclusions about this sample. For the entire population, μ = 640 and σ = 120. For the sample means,

μ_means = μ = 640 and σ_means = 120/√22.

The bell curve for the sample means is the graph of

y = (1/(√(2π)(120/√22))) e^{−(x−640)²/(2(120/√22)²)} = (√11/(120√π)) e^{−11(x−640)²/14400}.
FIGURE 27.26 Area under the bell curve y = (√11/(120√π)) e^{−11(x−640)²/14400} for x ≥ 670.
We want the area under this bell curve for x ≥ 670. This area is shown in Figure 27.26, and is approximately equal to

(√11/(120√π)) ∫_{670}^{900} e^{−11(x−640)²/14400} dx,

or 0.12048. Perhaps on average, the headmaster can expect the entire class to average this high once in ten years, although it also could happen two years in a row, or never in this headmaster's tenure. What is the probability that the class average falls between 640 and 700? This is

(√11/(120√π)) ∫_{640}^{700} e^{−11(x−640)²/14400} dx,

or 0.49049, very close to one half. What is the probability that a group of ten of the students averages better than 660? Now use n = 10 in forming the samples, so the bell curve is

y = (√5/(120√π)) e^{−5(x−640)²/14400},

and the probability we want is

(√5/(120√π)) ∫_{660}^{900} e^{−5(x−640)²/14400} dx,

which is 0.29908, about 3/10.
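The three probabilities in this example are just areas under normal curves, so they can be checked with the error function from the Python standard library (an illustrative sketch, not part of the text):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 640, 120  # national mean and standard deviation of the test

def p_mean_between(a, b, n):
    """P(a < sample mean < b) for samples of size n, using the CLT."""
    se = sigma / sqrt(n)  # standard deviation of the sample means
    return phi((b - mu) / se) - phi((a - mu) / se)

print(p_mean_between(670, 900, 22))  # about 0.12048
print(p_mean_between(640, 700, 22))  # about 0.49049
print(p_mean_between(660, 900, 10))  # about 0.29908
```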
SECTION 27.5 PROBLEMS
1. Thirty samples are taken of five-digit numbers. Each of these digits has been drawn, with replacement, from the integers 3 through 9, inclusive. The samples are given in the following table:

74974  78365  94489  84937  33933
76693  57733  58438  59786  58393
43856  86934  96649  88533  48576
85994  43865  67339  59643  58834
36496  59378  47937  75488  84938
48659  43958  48896  69537  78953

(a) Compute the sample means, the mean of the sample means, and the mean of the population.
(b) Compute the standard deviation of the population and the standard deviation of the sample means, and verify that σ_means ≈ σ/√n for the appropriate choice of n.

2. Thirty samples are taken of seven-digit numbers. Each of these digits has been drawn, with replacement, from the integers 0 through 9, inclusive. The samples are given in the following table:

9388204  7939720  5773902  1912038  1749823
8902817  3022850  5492018  4491002  4910292
7921082  5910293  9017839  4491029  4019283
8019283  4911029  3019402  5616209  9201898
5501829  3371932  0002910  3611390  4092179
4039112  9571120  9912118  8401938  7109032

(a) Compute the sample means, the mean of the sample means, and the mean of the population.
(b) Compute the standard deviation of the population and the standard deviation of the sample means, and verify that σ_means ≈ σ/√n for the appropriate choice of n.

3. A recruiter for a government agency wants his candidates to be successful on a test administered by the agency to prospective employees. He would like his group of thirty recruits to score at least 700 out of the possible 950. Over the time this test has been administered, the average score has been 670, with a standard deviation of 105.
(a) Estimate the probability that all thirty of the recruits will score in the 700–750 range.
(b) Estimate the probability that all the recruits will score at least 670.
(c) The recruiter gets a bonus if seven of the candidates score 800 or higher. Should the recruiter look forward to a bonus?

4. A uniform maker has a large order from a new branch of the military. On average, the pants legs are 31 inches, with a standard deviation of 3.2 inches. Suppose six new recruits are chosen at random.
(a) Find the probability that their mean length is at least 29 inches.
(b) Find the probability that their mean length is in the 29–33 inch range.
(c) Find the probability that their mean length is no greater than 32 inches.

5. A state uses standard braces for its highway bridges. The braces can handle a mean pressure of 5000 pounds per square inch, with a standard deviation of 800. Suppose twenty braces are tested at random throughout the state.
(a) What is the probability that their mean pressure tolerance is at least 4800 pounds per square inch?
(b) What is the probability that their mean pressure tolerance falls in the 4700–5100 range?
(c) What is the probability that their mean pressure tolerance is not less than 4500 pounds per square inch?

6. A product coming off an assembly line has been found to weigh on average 17 pounds, with a standard deviation of 1.8. Thirty samples are chosen at random from the line.
(a) What is the probability that these samples will weigh on average between 16.8 and 17.2 pounds?
(b) What is the probability that they weigh on average at least 16.7 pounds?
(c) Find the probability that they do not weigh on average less than 17.2 pounds.
27.6 Confidence Intervals and Estimating Population Proportion

If we somehow know that exactly five of every thousand cars coming out of a plant are defective, then there is no need to take samples, and we do not need statistics. The need for statistical analysis arises when we do not have perfect knowledge of the outcome of some
process, as is usually the case in real-world scenarios. In this section we will focus on how to estimate (make an educated guess about) certain statistics, such as the mean and standard deviation of a population. We also want to be able to determine a confidence interval, which is a set of values measuring, in a sense we will define, ranges of accuracy to be expected of our estimates.
Throughout this section we will have some population proportion p. We are examining some question or process which has exactly two outcomes, called success and failure. We form random samples of size n, and consider the sample proportions p̂ = x/n, where n is the size of the sample, and x is the number of successes in the sample. This number is between 0 and 1 and may be thought of as a probability. The complementary probability is q̂ = 1 − p̂. As a typical example, we might draw samples of 900 trout from a stream over a day, and find that 47% are male. The sample size here is n = 900, and the sample proportion is p̂ = 0.47. We also assume that there is a fixed number of independent trials, and that the resulting distribution can be approximated by a normal distribution.
In making estimates of various statistics or numbers associated with the population, we will use the notion of a confidence interval, which is an interval of values used to estimate the statistic. The corresponding confidence level of the estimate is the ratio of the times that this interval actually does contain the statistic of interest, taken over a large number of repetitions or samples. This confidence level is routinely denoted 1 − α. The most common confidence levels, expressed as percents, are 90%, 95%, or 99%. Below 90% is often unacceptable as a worthwhile measure of accuracy of the estimate. Associate with 90% the probability 0.90, so α = 0.1 and 1 − α = 0.9. Similarly, for 95%, use α = 0.05, so 1 − α = 0.95. And for 99%, use α = 0.01, so 1 − α = 0.99.
If, for example, we are using 95% as a confidence level to estimate the population proportion p, and this determines (in a way we will describe) an interval a < p < b, then we say we are 95% confident that the true value of p, as estimated, falls between a and b.
When these percentages are written as decimals, we may think of them as probabilities, or areas bounded by the standard bell curve, which is the graph of

y = (1/√(2π)) e^{−z²/2}.

Part of this graph is shown in Figure 27.27. For a given α, there is a point Z such that the area under the graph to the right of Z and to the left of −Z is α, hence the area between −Z and Z is 1 − α. This number Z, which depends on α, is denoted z_{α/2}. The use of α/2 in the subscript is a reminder that the bell graph has two tails (to the left of −z_{α/2} and to the right of z_{α/2}), having total area α, so the area under just one tail is α/2. The number z_{α/2} is called a critical value, separating those sample proportions (between −z_{α/2} and z_{α/2}) that are likely from those that are unlikely (in the right or left tails of the bell curve). Table 27.8 gives the critical values corresponding to confidence levels of 85%, 90%, 95%, and 99%. These percents are written as the decimals 0.85, 0.90, 0.95, and 0.99 to have interpretations as areas. For example, the area under the standard bell curve between −z_{α/2} and z_{α/2} for α = 0.15 is

(1/√(2π)) ∫_{−1.44}^{1.44} e^{−z²/2} dz = 0.85013,

corresponding to 85% of the sample proportions falling in this interval. We may also think of 0.85 as the probability that a sample proportion falls between −1.44 and 1.44 for this standard curve.
FIGURE 27.27 Critical point z_{α/2} for the standard bell curve.
TABLE 27.8

Confidence Level (As a Decimal)   α      z_{α/2}
0.85                              0.15   1.44
0.90                              0.10   1.645
0.95                              0.05   1.96
0.99                              0.01   2.575
From previous results, we can write

μ = np and σ = √(npq).

Since these results are for n trials, we can write the mean of the sample proportions as

μ_p̂ = np/n = p,

and the standard deviation of the sample proportions as

σ_p̂ = √(npq)/n = √(pq/n).

In general, we do not know p (or q), and we want to estimate these values from p̂ and q̂. Replace p by p̂ and q by q̂ in the expression for σ_p̂ to obtain √(p̂q̂/n). The probability is 1 − α
that a sample proportion will differ from the actual population proportion (which we want to estimate) by no more than z_{α/2}√(p̂q̂/n). This leads us to define the maximum error of the estimate to be

E = z_{α/2} √(p̂q̂/n)    (27.6)

or

E = z_{α/2} √(p̂(1 − p̂)/n).

The confidence interval associated with this is

p̂ − E < p < p̂ + E.
EXAMPLE 27.24
Suppose samples of 900 fish are drawn from a stream each day, and suppose on a particular day it is found that 47% were male. Here n = 900 and p̂ = 0.47. Based on the information we have, 0.47 is the best estimate we can make for p, the proportion of the total fish population that is male. Suppose we want the maximum error associated with a 95% confidence level. For a 95% confidence level, α = 0.05, and we read from Table 27.8 the critical value z_{α/2} = 1.96. The maximum error is

E = 1.96 √((0.47)(0.53)/900) = 0.0326078.

Generally, we round maximum error calculations to three decimal places, so take E = 0.033. The confidence interval is

0.47 − 0.033 < p < 0.47 + 0.033, or 0.437 < p < 0.503.

There is a probability of 1 − α (or 0.95) that the actual population proportion falls in this interval.
Often, we want to determine the size of the sample we need to estimate some statistic with a certain level of confidence. To do this for the population proportion, begin with the expression (27.6) for the maximum error and solve for n to get

n = (z_{α/2})² p̂q̂ / E².    (27.7)

Use this to estimate the size n that should be used to estimate p with a confidence interval p̂ − E < p < p̂ + E, assuming that p̂ is known. Of course, usually equation (27.7) will not yield an integer, so in practice take n to be the smallest integer larger than (z_{α/2})² p̂q̂/E². For example, if the right side of equation (27.7) is 956.24, then choose n = 957. In many practical situations, we do not know p̂. If this is the case, replace p̂q̂ in equation (27.7) with 0.25 to write

n = 0.25(z_{α/2})²/E².
The rationale for this is that 0.25 is the largest value that p̂q̂ can assume, occurring when p̂ = q̂ = 0.5.
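Equation (27.6) is simple to apply in code. This Python sketch (illustrative, not from the text) reproduces the interval of Example 27.24:

```python
from math import sqrt

def proportion_interval(p_hat, n, z):
    """Confidence interval p_hat - E < p < p_hat + E, with
    E = z * sqrt(p_hat * (1 - p_hat) / n) as in equation (27.6)."""
    e = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - e, p_hat + e

# Example 27.24: n = 900 fish, 47% male, 95% level (z = 1.96 from Table 27.8).
low, high = proportion_interval(0.47, 900, 1.96)
print(round(low, 3), round(high, 3))  # 0.437 0.503
```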
EXAMPLE 27.25
A member of a United Nations team wants to survey the population of a certain country to determine the commonality of households having at least two telephones. She wants to be 90% confident of her results, and to be in error by no more than 3%. How many households should she survey?
Here we take 0.90 to be the confidence level that is wanted, and E = 0.03. There is no information about p̂, so compute

n = 0.25(z_{α/2})²/E².

For a 90% confidence level, Table 27.8 gives z_{α/2} = 1.645. Thus compute

0.25(1.645)²/(0.03)² = 751.67.

She should do a random survey of 752 households.
Suppose, from another recent study, it is estimated that p̂ = 0.32. Now use equation (27.7) and compute

(1.645)²(0.32)(0.68)/(0.03)² = 654.26.

This estimate tells us to survey 655 households. This, however, assumes that the "recent study" was recent enough that the number of two-telephone households might not have changed much. If, for example, this study was done ten years ago, we might play safe and go with 752 households in the survey.
Suppose we wanted a 95% instead of a 90% confidence level, assuming no information about p̂. Now we would use α = 0.05 and z_{α/2} = 1.96. Compute

0.25(1.96)²/(0.03)² = 1067.11,

so she should survey 1068 households. It is not surprising that more households need to be included in the survey to increase the confidence level.
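The sample-size estimates of Example 27.25 follow the same pattern; here is a small Python sketch (illustrative, not from the text):

```python
from math import ceil

def sample_size(z, e, p_hat=None):
    """Smallest n giving maximum error e at critical value z, equation (27.7).
    With no prior estimate of p, use p_hat * q_hat = 0.25 (the worst case)."""
    pq = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    return ceil(z ** 2 * pq / e ** 2)

print(sample_size(1.645, 0.03))        # 752  (90% level, no prior estimate)
print(sample_size(1.645, 0.03, 0.32))  # 655  (90% level, prior p_hat = 0.32)
print(sample_size(1.96, 0.03))         # 1068 (95% level, no prior estimate)
```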
SECTION 27.6 PROBLEMS
1. A parachute assembly and packing plant produces thousands of parachutes each month. One month, 1200 are inspected and it is found that 0.2% have a defect.
(a) Find an estimate with a 99% confidence interval of the proportion of parachutes produced in a month that are defective.
(b) Redo (a), except with a 95% confidence level.
(c) Redo (a), with a 90% confidence level.
(d) Suppose only 800 parachutes are inspected instead of 1200, and it is still found that 0.2% are defective. Now determine an estimate with a 95% confidence interval of the proportion of parachutes produced in a month that are defective. How has decreasing the size of the sample affected the confidence interval?

2. A drug being tested for a certain condition appears to have side effects. In a large clinical trial conducted nationally, 750 patients who took the drug are selected
at random, and it is found that 22% developed skin rashes.
(a) Determine an estimate with a 99% confidence interval of the proportion of the entire patient population that will be expected to develop a skin rash.
(b) Redo (a) for a 95% confidence interval.

3. A survey taken over a week has inspected 200 fish in a lake and found that 87 were man-eating guppies.
(a) Determine an estimate with a 95% confidence interval of the proportion of the fish population of the lake that are man-eating guppies.
(b) Determine an estimate with an 85% confidence interval of the proportion of the fish population that are man-eating guppies.

4. A plant produces packaged chicken for distribution to grocery stores throughout the midwest. In a sample of 100 packages of chicken taken over one work day, it is found that seven contain the blue virus, which has no other effect on the person who consumes the chicken than to turn their skin permanently light blue.
(a) Determine an estimate with a 99% confidence interval of the proportion of one day's chicken packages that contain the virus.
(b) Determine an estimate with a 90% confidence interval of the proportion of one day's chicken packages that contain the virus.

5. A marketing executive wants to survey a city to determine how many people have purchased his company's product over the past twelve months.
(a) Suppose he wants a 95% confidence level in the results of the survey with an error of no more than 2%. How many people should he survey?
(b) How many people should he survey if, in (a), he is willing to allow the error to be no more than 5%?
(c) Suppose, in the previous year's study, it was found that p̂ = 0.37. Now make an estimate of how many people he should survey for a 95% confidence level and an error of no more than 2%.
(d) Suppose another person in marketing notes an economic trend suggesting that a better estimate for p̂ is 0.32. How will this affect the conclusion of (c) about the number of people to be surveyed?
6. A public health official is charged with completing an annual survey of a county population, checking for incidents of a certain viral infection over the past year.
(a) Suppose she wants a 99% confidence level in the results of the survey with an error of no more than 3%. How many people should she survey?
(b) How many people should she survey if, in (a), she is willing to settle for a 95% confidence level?
(c) Suppose, in the previous year's study, it was found that p̂ = 0.12. Make an estimate of how many people she should survey for a 99% confidence level and an error of no more than 2%.

7. A professional pollster is charged with taking a survey throughout a large state, in order to estimate the number of qualified voters who favor the Freeload Party candidate.
(a) Suppose he wants an 85% confidence level in the results of the survey with an error of no more than 7%. How many people should he survey?
(b) How many people should he survey if, in (a), he is willing to allow the error to be no more than 12%?
(c) Suppose, in the previous election, it was found that p̂ = 0.06. Now make an estimate of how many people he should survey, for an 85% confidence level and an error of no more than 7%.
27.7 Estimating Population Mean and the Student t Distribution

This section follows the theme of the preceding one, except there we wanted to estimate the population proportion, and now we are concerned with estimating the population mean μ. We assume the same setting of random independent samples of uniform size, and a distribution that can be approximated as a normal distribution.
Let x̄ be the sample mean of sample x. As noted previously, in the absence of more definitive information, it is reasonable to estimate the population mean as the sample mean. The issue now is to develop a concept of maximum error enabling us to state confidence intervals. Previously we saw that, if σ is the standard deviation of the population, then σ/√n is the standard deviation of the sample means, with n the sample size as usual. In the preceding
section, equation (27.6) gave the maximum error in the estimate of the population proportion. The adaptation of this to the maximum error in the estimate of the population mean is

E = z_{α/2} σ/√n.    (27.8)

We will still use E for this maximum error, taking from context whether we are referring to the population proportion or the mean. The corresponding confidence interval in estimating μ from x̄ is

x̄ − E < μ < x̄ + E.

If we solve equation (27.8) for n, we get

n = (z_{α/2} σ/E)².

The right side of this equation is used to estimate the size of samples that should be taken to have a maximum error of E and a confidence level determined by the critical point z_{α/2}.
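As a sketch of how equation (27.8) is used in practice (illustrative code, not from the text; the numbers are those of Example 27.26, which follows):

```python
from math import ceil, sqrt

def mean_interval(x_bar, sigma, n, z):
    """x_bar - E < mu < x_bar + E with E = z * sigma / sqrt(n), eq. (27.8)."""
    e = z * sigma / sqrt(n)
    return x_bar - e, x_bar + e

def samples_needed(sigma, e, z):
    """Equation (27.8) solved for n: n = (z * sigma / e)**2, rounded up."""
    return ceil((z * sigma / e) ** 2)

# Numbers as in Example 27.26: sigma = 0.082, n = 210, 99% level (z = 2.575).
low, high = mean_interval(12.7, 0.082, 210, 2.575)
print(round(low, 3), round(high, 3))  # 12.685 12.715

# Sample size for a 95% level (z = 1.96) and maximum error 0.02.
print(samples_needed(0.082, 0.02, 1.96))  # 65
```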
EXAMPLE 27.26
Suppose a study of blood pressure is being planned. Sample groups of size n = 210 are to be used, and a 99% confidence level is sought. It is known from many studies that σ = 0.082, and the sample mean is x̄ = 12.7. We want to estimate the population mean and the confidence interval. From Table 27.8, choose z_{α/2} = 2.575. Then

E = 2.575 (0.082/√210) = 0.014571,

so

x̄ − 0.014571 < μ < x̄ + 0.014571, or 12.685 < μ < 12.715.

Suppose we go back to the planning stages, and imagine that we have not yet decided on the number of samples to take. We want n so that we have a 95% confidence level. For 95%, use z_{α/2} = 1.96. We need some information about the maximum error. Suppose we want E = 0.02. Now estimate

n = ((0.082)(1.96)/0.02)² = 64.577,

so use n = 65 as the sample size.
This example assumed the ideal condition that we had information about σ. Often, perhaps usually, we do not know σ. In such a case, it is common practice to use the student t distribution instead of the normal distribution. Assuming that the population is approximately normal, the student t distribution, or just t distribution, is defined by

t(x̄) = (x̄ − μ)/(s/√n).
Because of the appearance of n in this definition, there is a different t distribution for each positive integer n. Each is bell-shaped in general appearance, but is not a normal distribution.
As n increases, the corresponding t distributions approach the standard normal curve (Figure 27.28). As with the standard bell curve, t_{α/2} is that number such that the area under the t distribution to the right of t_{α/2}, and to the left of −t_{α/2}, is α, hence the area between −t_{α/2} and t_{α/2} is 1 − α (Figure 27.29). In determining t_{α/2}, n plays an important role, since the distribution is different for each n.
In the present context, with estimation of the population mean as the objective, note that the mean x̄ of each sample x is a sum of n numbers, divided by n. If x̄ is known, any n − 1 of these numbers can be chosen to have any values, and the nth number is then determined. We therefore say that t_{α/2} has n − 1 degrees of freedom. In a table of critical points for t distributions, there will be a row or column associated with the number of degrees of freedom, and from this t_{α/2} can be read for given values of α and n.
In practice, we use t-critical points just as we used z-critical points previously. Now, given n and α, which in turn is determined by a stated confidence level, define the maximum error E_t as

E_t = t_{α/2} s/√n.

The corresponding confidence interval is

x̄ − E_t < μ < x̄ + E_t.
FIGURE 27.28 Student t distributions for n = 3 and n = 7, compared to the standard bell curve.
FIGURE 27.29 Critical points −t_{α/2} and t_{α/2} for the student t distribution.
EXAMPLE 27.27
An international gymnastics oversight committee is checking blood levels of athletes for the performance-enhancing chemical hyperjump. Random samples of 35 gymnasts are taken, and it is known that the sample mean is x̄ = 0.4 milligrams. We want to construct the maximum error if a confidence level of 99% is needed, and then find the associated confidence interval about the population mean.
For a 99% confidence level, α = 0.01. With n = 35, the number of degrees of freedom is 34, and we find from a standard table that t_{α/2} = 2.728. Suppose from the sample we compute the standard deviation s = 0.03. Now compute the maximum error:

E_t = 2.728 (0.03/√35) = 0.013833.

The confidence interval is

0.4 − 0.013833 < μ < 0.4 + 0.013833, or 0.386 < μ < 0.414

to three decimal places.
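When σ is unknown, the same computation runs through the t critical value. Here is a Python sketch of Example 27.27 (illustrative, not from the text; the critical value 2.728 is taken from a t table with 34 degrees of freedom):

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """x_bar - E_t < mu < x_bar + E_t with E_t = t_crit * s / sqrt(n);
    t_crit is the critical value with n - 1 degrees of freedom."""
    e = t_crit * s / sqrt(n)
    return x_bar - e, x_bar + e

# Example 27.27: n = 35 gymnasts, x_bar = 0.4 mg, s = 0.03, 99% level.
low, high = t_interval(0.4, 0.03, 35, 2.728)
print(round(low, 3), round(high, 3))  # 0.386 0.414
```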
SECTION 27.7 PROBLEMS

1. (a) Suppose a study of blood concentration of a certain protein is being done. Groups of 350 patients are sampled. It is known from parallel studies that σ = 1.04, and the sample mean is x̄ = 7.2. For a 99% confidence level, estimate the population mean and the confidence interval.
(b) Suppose in (a), the sample size has not yet been determined, but n should be chosen to have a 99% confidence level and an error of no more than 0.2. Estimate the sample size that should be used.

2. (a) Samples of size 50 are being drawn from a company's production line. It has been found that x̄ = 11.9 and σ = 0.7. If a 95% confidence level is wanted, estimate the population mean and determine the confidence interval.
(b) Suppose in (a), the sample size has not yet been determined, but n should be chosen to have a 99% confidence level and an error of no more than 0.2. Estimate the sample size that should be used.

3. (a) Samples of 100 patients are being drawn randomly from a clinical trial. If it is estimated that σ = 2.4 and x̄ = 10.6, estimate the population mean and confidence interval for a 95% confidence level.
(b) Suppose in (a) that the sample size has not been set, but n must be chosen to have a 99% confidence level with an error no more than 0.1. Estimate the sample size that should be used.

4. Random samples of 50 lawnmower motors are taken from a plant each week, checking for a certain amount of contaminant in the fuel line. Assume that x̄ = 0.8 and s = 0.05. Use the t-distribution to construct a 95% confidence interval in estimating the population mean. Hint: A table estimates the critical value here as 2.009.

5. Random samples of 75 students are drawn from the senior year population of a large city public school system in studying absences per month. With x̄ = 3 and s = 0.7, construct a 95% confidence interval in estimating the population mean. Hint: The critical value is approximately 1.992.

6. Random samples of fish of size 200 are taken from fishing waters near a certain island, checking for irregularities in gill structure. With x̄ = 7 and s = 0.01, construct a 95% confidence interval in estimating the population mean. Hint: The critical value is approximately 1.971.
27.8
Correlation and Regression In this section, we are concerned with a collection of n data pairs xi yi , which we can plot as points in the plane We say that a correlation exists between the x values and the y values when there is some specific relationship between them. Although there are many different kinds of correlations, we will be concerned here with linear correlations, and whether a linear correlation exists between a given data set of points. A linear correlation exists when the points xi yi are approximately distributed about a straight line in the plane. Figure 27.30 shows a data set with a linear correlation, while there is no such correlation in the data set of Figure 27.31. The data set of Figure 27.30 would be said to have a positive linear correlation (between x and y values), while the data set of Figure 27.32 has a negative linear correlation. These characterizations depend on the slope of the line that is approximately along the points. Define the number
$$c = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{(n-1)s_x s_y} \qquad (27.9)$$
FIGURE 27.30 Data with a positive linear correlation.

FIGURE 27.31 Data with no significant linear correlation.

FIGURE 27.32 Data with a negative linear correlation.
CHAPTER 27
Statistics
to be the linear correlation coefficient of the data. As usual,

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

is the mean of the x_i's, and ȳ is the mean of the y_i's. Sometimes (x̄, ȳ) is called the centroid of the data. The term s_x is the standard deviation of x_1, …, x_n, while s_y is the standard deviation of y_1, …, y_n. It is routine to carry out the algebra to show that

$$c = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\,\sqrt{n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}}.$$

While this expression for c may appear more complicated than equation (27.9), it is more explicit in the sense that it involves only the data points themselves, and not their means or standard deviations. It is routine to check that −1 ≤ c ≤ 1. Further, if the points lie on a line of positive slope, say y_i = ax_i + b with a > 0 for i = 1, …, n, then c = 1. In general, if c is near 1 or −1, we say that there is a significant linear correlation between the x_i's and y_i's. If c is near 0, we say that there is no significant linear correlation between the x_i's and the y_i's. Although this is vague, we will firm up this idea shortly.

The linear correlation coefficient is used in connection with tables of critical values. Table 27.9 shows a typical part of such a table, constructed for a 95% (α = 0.05) confidence level. Corresponding to the number n of samples, the critical value has this significance: if c is greater than the critical value, then there is a 1 − 0.05 (or 0.95) probability (confidence) that there is a significant linear correlation between the x_i's and the y_i's. For example, with n = 15 points, if c > 0.514, there is a 1 − 0.05 = 0.95 probability that there is a significant linear correlation between the points. If n = 50 and c > 0.279, there is a 0.95 probability of a significant linear correlation. If c is less than or equal to the critical value, then there is only a 0.05 probability of a linear correlation between the x_i's and y_i's.

Notice that, as the number of sample points increases, we can infer a significant linear correlation with smaller values of c, while, if n is small (and therefore provides less information), it takes a larger c to draw this inference.
TABLE 27.9 Critical values of c for α = 0.05

  n     critical value
  10    0.632
  15    0.514
  20    0.444
  25    0.396
  30    0.361
  35    0.335
  40    0.312
  45    0.294
  50    0.279
  60    0.254
  80    0.220
  100   0.196
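The test described above is easy to automate. The following is a minimal sketch (Python, chosen for illustration; `linear_correlation`, `CRITICAL_C`, and `is_significant` are our own names, with the critical values copied from Table 27.9), using the sums-only form of the correlation coefficient:

```python
import math

def linear_correlation(xs, ys):
    """Linear correlation coefficient c, in the sums-only form equivalent to (27.9)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    num = n * sxy - sx * sy
    den = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return num / den

# Critical values of c at the 95% (alpha = 0.05) level, from Table 27.9.
CRITICAL_C = {10: 0.632, 15: 0.514, 20: 0.444, 25: 0.396, 30: 0.361,
              35: 0.335, 40: 0.312, 45: 0.294, 50: 0.279,
              60: 0.254, 80: 0.220, 100: 0.196}

def is_significant(xs, ys):
    """True when |c| exceeds the tabulated critical value for n = len(xs)."""
    return abs(linear_correlation(xs, ys)) > CRITICAL_C[len(xs)]
```

For data lying exactly on a line of positive slope the function returns c = 1 (up to rounding), matching the discussion above; we use |c| so that strong negative correlations are also flagged.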
EXAMPLE 27.28
The following table gives gold medal heights in the pole vault for the Olympics, for years extending from 1932 through 1996. Heights are in feet.

Year    1932   1936   1948   1952   1956   1960   1964   1968
Height  14.15  14.27  14.10  14.92  14.96  15.42  16.73  17.71

Year    1972   1976   1980   1984   1988   1992   1996
Height  18.04  18.04  18.96  18.85  19.77  19.02  19.42

We will investigate this data set for a significant linear correlation, or the absence of one. A scatter graph of the data is shown in Figure 27.33. The x-values are the years and the y-values are the heights. We find that x̄ = 1966.9333, ȳ = 16.9573, s_x = 19.6813, and s_y = 2.1138. A routine computation yields c = 0.9546. From Table 27.9 with n = 15, because c > 0.514, we infer that there is a 0.95 probability that there is a significant linear correlation between x- and y-values. This is intuitively apparent from Figure 27.33.
FIGURE 27.33 Olympic gold medal vaulting heights.
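The statistics quoted in Example 27.28 can be checked directly from equation (27.9). A short script (ours, not the text's; it assumes the heights as tabulated above):

```python
import math

years = [1932, 1936, 1948, 1952, 1956, 1960, 1964, 1968,
         1972, 1976, 1980, 1984, 1988, 1992, 1996]
heights = [14.15, 14.27, 14.10, 14.92, 14.96, 15.42, 16.73, 17.71,
           18.04, 18.04, 18.96, 18.85, 19.77, 19.02, 19.42]

n = len(years)
xbar = sum(years) / n
ybar = sum(heights) / n

# Sample standard deviations (divisor n - 1), as used in equation (27.9).
sx = math.sqrt(sum((x - xbar) ** 2 for x in years) / (n - 1))
sy = math.sqrt(sum((y - ybar) ** 2 for y in heights) / (n - 1))

# Correlation coefficient straight from equation (27.9).
c = sum((x - xbar) * (y - ybar)
        for x, y in zip(years, heights)) / ((n - 1) * sx * sy)

print(xbar, ybar, sx, sy, c)
```

The printed c comfortably exceeds the n = 15 critical value 0.514 from Table 27.9, reproducing the conclusion of the example.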
EXAMPLE 27.29

Consider the following data set:

x_i   0   0.5   1.0   1.5   2.0    2.5   3.0   3.5    4.0
y_i   1   2.12  6.24  2.44  11.58  3.72  1.04  25.63  2.63

Figure 27.34 shows a scatter graph of this data. There does not appear to be a significant linear correlation. To check this intuition, we find that c = 0.3884153476. Table 27.9 does not have
FIGURE 27.34 Data with no significant linear correlation.
a critical value corresponding to n = 9, so we will use the value for n = 10 as a best estimate. This critical value is 0.632. Since c < 0.632, there is only a 0.05 probability that there is a significant linear correlation between the x-values and the y-values.

Correlation must not be confused with causality. That two sets of data have a significant linear correlation does not imply that the x-values cause the y-values. There may or may not be such a cause and effect relationship. In the case of pole vaulting heights, it is hard to believe that the year is a cause of the height achieved in the pole vault. The fact that the heights increase with passing years is due to a complex set of factors, probably including conditioning of the athletes, improvements in vaulting technique, and changes in the materials from which vaulting poles are made.

To understand the idea of regression, consider again the data shown in Figure 27.33. We found that there was a significant linear correlation between x-values and y-values. This suggests that the data, plotted as points in the plane, form an "approximate" straight line. It is possible to determine a straight line L which is a best fit to the data. This line is called the regression line for the data. It is defined to be the best least squares fit to the data, in the sense of minimizing the sum of the squares of the vertical distances between the y_i's and the respective y-values on the line corresponding to the x_i's. If we write the equation of the regression line as y = a + bx, then the standard least squares method yields

$$a = \frac{\sum_{i=1}^{n} y_i \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} \qquad (27.10)$$

and

$$b = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}. \qquad (27.11)$$

We can also compute a from the expression

$$a = \bar{y} - b\bar{x} \qquad (27.12)$$

if we have computed the means of the x_i's and the y_i's.
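Equations (27.11) and (27.12) translate directly into code. A minimal sketch (Python; `regression_line` is our own name for illustration):

```python
def regression_line(xs, ys):
    """Least-squares line y = a + b*x, via equations (27.11) and (27.12)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # equation (27.11)
    a = sy / n - b * (sx / n)                      # equation (27.12): a = ybar - b*xbar
    return a, b
```

For data lying exactly on a line the fit recovers the intercept and slope, up to floating-point rounding; for scattered data it returns the least-squares line described above.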
EXAMPLE 27.30
We will determine the regression line for the heights of Olympic pole vaults. Use equations (27.11) and (27.12) to compute a and b, obtaining a = −185.637 and b = 0.103. The regression line for this data has equation

y = −185.637 + 0.103x.

Figure 27.35 again shows the data (as in Figure 27.33), along with the regression line. The data end at 1996. If we use the regression line to try to project ahead to 2008, we get

y = −185.637 + 0.103(2008) = 21.187 ft.

This would require an extraordinary effort. But we learn a more dramatic lesson if we use the line to project to the 3000 Olympics. Now we get

y = −185.637 + 0.103(3000) = 123.36 ft.

No one would believe this. The point is that the regression line in this example has positive slope, and it will return as large a value of y as we want if we put in a sufficiently large value of x. However, there are clearly limits to how high a human being can vault. We do not know what this limit is, but 124 feet seems too high. As this example shows, we must not use a regression line to attempt to project conclusions about y-values that are "too far" outside the data set from which the line was derived. The further we get from the data, the less reliable our projected conclusions are.

There is a very useful interpretation that can be made for the square of the linear correlation coefficient c. Suppose we have sample points (x_1, y_1), …, (x_n, y_n), and there is a significant linear correlation between the x_i's and the y_i's. The regression line is y = a + bx, with a and b from equations (27.10) (or (27.12)) and (27.11). For a given x_i, write y_{r,i} = a + bx_i to distinguish between y_i and the y-coordinate of the point on the regression line corresponding to x_i.
FIGURE 27.35 Olympic vaulting heights and the regression line.
If we think of the regression line as defining a sense of how the data "should" arrange itself, we can look to the regression line to "explain" the y-values. That is, think of y_{r,i} as the number that should go with x_i, and y_i as the number that actually does go with x_i in the sample. This makes it reasonable to compute the ratio of the deviations of the data from the mean that are explained by the regression line to the total deviations of the data from the mean. We can quantify these ideas as follows. Interpret

$$\sum_{i=1}^{n}(y_i - \bar{y})^2$$

as the total deviation of the y_i's from their mean. Further,

$$\sum_{i=1}^{n}(y_{r,i} - \bar{y})^2$$

is the deviation of the regression line y-values from the mean of the y_i's. Think of this as the deviation explained by the regression line. The ratio we are interested in, which we will temporarily call k, is therefore

$$k = \frac{\sum_{i=1}^{n}(y_{r,i} - \bar{y})^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}.$$

Next,

$$\sum_{i=1}^{n}(y_i - y_{r,i})^2$$

is the deviation of the regression line values from the actual y-values. Think of this as the deviation for which the regression line presents no rationale. Now observe that, for the least squares line,

$$\sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}(y_{r,i} - \bar{y})^2 + \sum_{i=1}^{n}(y_i - y_{r,i})^2.$$

Therefore,

$$k = \frac{\sum_{i=1}^{n}(y_{r,i} - \bar{y})^2}{\sum_{i=1}^{n}(y_{r,i} - \bar{y})^2 + \sum_{i=1}^{n}(y_i - y_{r,i})^2} = \frac{\text{deviations of the } y_i\text{'s from the mean explained by the regression line}}{\text{total deviations of the } y_i\text{'s from the mean}}.$$

Now comes what may be a surprise. If the algebra is carried out, we find that k = c². Thus c enables us to detect a significant linear correlation, and c² gives a measure of how well the deviations of the y_i's from the mean are explained by the regression line.
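The identity k = c² can be confirmed numerically for any data set by fitting the least squares line and computing both sides. A small sketch (ours; the function name is illustrative):

```python
import math

def explained_ratio_and_c_squared(xs, ys):
    """Return (k, c^2) for the least-squares fit; the two should agree."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    syy = sum((y - ybar) ** 2 for y in ys)
    b = sxy / sxx                       # least-squares slope
    a = ybar - b * xbar                 # equation (27.12)
    yr = [a + b * x for x in xs]        # regression-line values y_{r,i}
    k = sum((v - ybar) ** 2 for v in yr) / syy   # explained / total (squared deviations)
    c = sxy / math.sqrt(sxx * syy)               # correlation coefficient
    return k, c * c
```

Running this on any non-degenerate sample shows the two quantities agree to floating-point precision, which is the content of the k = c² claim above.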
EXAMPLE 27.31
A state compiles statistics on deer hunters and deaths in the deer population each year. For a period of ten years, these are given by the following table:

Number of Hunters          168  172  194  204  230  270  295  320  390  402
Deaths in Deer Population   28   30   36   39   42   50   46   57   61   62

We find that x̄ = 264.50, ȳ = 45.10, s_x = 85.885, and s_y = 12.270. A routine computation yields c = 0.9699. With n = 10 in Table 27.9, the critical value is 0.632. Because c > 0.632, there is a 0.95 probability that there is a significant linear correlation between x- and y-values. A scatter plot of these values is given in Figure 27.36.

FIGURE 27.36 Hunters vs. deer deaths.

Now we ask: how many of the deaths in the deer population are explained by the number of hunters? We expect some deer to be killed by hunters, and some to die from other causes. Compute

c² = (0.9699)² = 0.9407.

We conclude that 94% of the deer deaths can be explained by the regression line, which means that 94% can be explained by the presence of the hunters. The other 6% must be accounted for by other factors.

Finally, we would like to develop a kind of confidence interval for predictions based on a regression line. In the present context, such an interval will be called a prediction interval. Begin with the spread of the error of the estimate, which is analogous to standard deviation in the sense that it measures the spread of the sample points about the regression line. This spread of the error of the estimate is denoted S_r and is defined by

$$S_r = \sqrt{\frac{\sum_{i=1}^{n}(y_i - y_{r,i})^2}{n - 2}}.$$

It is routine to verify that

$$S_r = \sqrt{\frac{\sum_{i=1}^{n} y_i^2 - a\sum_{i=1}^{n} y_i - b\sum_{i=1}^{n} x_i y_i}{n - 2}},$$

with a and b given by equations (27.10) and (27.11) (or (27.12)), respectively.
Next, analogous to the process developed for confidence intervals, define the maximum error r for a given x-value (which is in the x-population, but need not be one of the actual sample values x_i) to be

$$r = t_{\alpha/2}\, S_r \sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{x})^2}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}}.$$

Here t_{α/2} is a critical point of the Student t-distribution corresponding to a confidence level 1 − α. We have usually taken α = 0.05 to have a 95% confidence level in the conclusion. The number t_{α/2} must be determined from a table. The difference between the current and previous use is that previously, in estimating a population mean, we had n − 1 degrees of freedom, and now we have n − 2. This information must be used in order to reference the value of t_{α/2} from a table of critical values for the t-distribution. Now, for a given x, we define the prediction interval for the corresponding y to be

y(x) − r < y < y(x) + r,

where y(x) = a + bx, the y-value given by the equation of the regression line for the given value of x.
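The full prediction-interval computation can be sketched as follows (Python; `prediction_interval` is our own name, and the caller must supply the t critical value for n − 2 degrees of freedom from a table, e.g. approximately 2.306 for n = 10 at the 95% level):

```python
import math

def prediction_interval(xs, ys, x, t_crit):
    """Prediction interval (lower, y(x), upper) for y at the given x.

    t_crit is the Student t critical value t_{alpha/2} with n - 2
    degrees of freedom, looked up from a table by the caller.
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(u * u for u in xs)
    sxy = sum(u * v for u, v in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope, equation (27.11)
    a = sy / n - b * (sx / n)                       # intercept, equation (27.12)
    # Spread of the error of the estimate S_r (divisor n - 2).
    sr = math.sqrt(sum((v - (a + b * u)) ** 2 for u, v in zip(xs, ys)) / (n - 2))
    xbar = sx / n
    # Maximum error r for this x.
    r = t_crit * sr * math.sqrt(1 + 1 / n + n * (x - xbar) ** 2 / (n * sxx - sx ** 2))
    y_hat = a + b * x
    return y_hat - r, y_hat, y_hat + r
```

Applied to the hunter/deer data of Example 27.31 with x = 350 and t_crit = 2.306, this reproduces the worked computation of Example 27.32 below.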
EXAMPLE 27.32
Consider again the hunters and the deaths in the deer population given over a ten-year period. Suppose we want to estimate how many deer kills would correspond to x = 350 hunters. It is routine to compute the equation of the regression line, obtaining y = 8.4509 + 0.1386x. We have stated that x̄ = 264.50. We need to compute y_{r,i} = a + bx_i for i = 1, …, 10. These values are:

x_i                 168      172      194      204      230      270      295      320      390      402
y_{r,i} = a + bx_i  31.7357  32.2901  35.3393  36.7253  40.3289  45.8729  49.3379  52.8029  62.5049  64.1681
Compute

$$\sum_{i=1}^{10}(y_i - y_{r,i})^2 = 80.3589$$

and

$$S_r = \sqrt{\frac{\sum_{i=1}^{10}(y_i - y_{r,i})^2}{8}} = 3.1694.$$

Then

$$r = 2.306(3.1694)\sqrt{1 + \frac{1}{10} + \frac{10(350 - 264.50)^2}{10(765{,}989) - (2645)^2}} = 8.0398.$$

Finally, compute y(350) = a + b(350) = 8.4509 + 0.1386(350) = 56.961, the y-value on the regression line corresponding to x = 350. The prediction interval is y(x) − r < y < y(x) + r, or

56.961 − 8.0398 < y < 56.961 + 8.0398.
In the present context, we will use two decimal places and write the prediction interval as

48.92 < y < 65.00.

The meaning of this interval is that, for a hunter population of 350, the number of deer deaths predicted by the regression line is 56.96 (or 57), and we have a confidence level of 95% (because of the way we chose the critical value t_{α/2}) that the actual number is between 49 and 65. Again, we can change the confidence level from 95% to another value by making the appropriate change in the choice of t_{α/2}.
SECTION 27.8
PROBLEMS
For each of Problems 1 through 8, draw a scatter plot of the data, compute c, and determine whether or not there is a significant linear correlation between the x_i's and the y_i's. If there is not, the problem is completed. If there is, determine the equation of the regression line and, for each given x-value, determine the y-value predicted by the regression line, as well as the corresponding prediction interval with a 95% confidence level.
7. 0 −31
2.5 1.22
6.1 7.2
8.3 10.37
9.8 12.9
11.6 15.85
15.3 20.7
19.5 28.19
22.9 32.7
25.5 38.6
29.6 43.92
32.2 46.8
36.2 56.2
40.8 61.9
17.2 25.2
x = −4 111 301, and 667. 8.
2. 0.2 0.9 1.4 1.9 2.6 3.4 4.2 6.8 10 15.9 2.8 0.93 −061 −21 −39 −65 −89 −161 −272 −4241
xi yi
0 3
1.3 0.5
1.9 −084
5.4 7.4 12.9 −74 −1232 −248
2.3 −174 15.3 −276
2.5 −190 19.4 −359
2.8 −273 22.5 −437
3.1 4.8 −30 −69 25.9 −485
x = 15 43 13, and 32.
x = 52 86, and 178. 3. xi yi
x = 35 47, and 73
0.98 1.3 2.1 2.8 3.6 5.2 5.8 7.3 9.7 13.6 3.91 4.10 4.81 5.07 6.21 7.15 7.66 9.04 10.84 14.26
x = 04 62, and 151.
xi yi
xi 3.7 3.9 4.2 4.6 4.8 5.2 5.7 6.3 6.5 6.9 yi −12 3.9 −56 3.8 −129 4.2 −272 4.8 −186 15.9
xi yi
1. xi yi
6.
1.9 2.5 5.4 8.91 10.5 17.3 18.9 21.7 25.3 −14 4.2 3.15 −56 10.7 11.2 −68 −153 2.7
34.7 9.4
x = 0 92, and 388.
9. A basketball coach is attempting to evaluate the impact of an expensive player on the team’s won/loss record. He considers the following table, which gives number of games per season in which the player has seen at least twenty five minutes of floor time, together with wins over the season.
4. xi 4.8 5.8 7.3 9.5 15.3 19.5 27.4 32.6 36.3 41.9 yi 34.8 41.1 47.2 57.1 76.9 98.6 133.8 148.4 164.9 197.5
xi : Games yi : Wins
62 47
58 46
59 47
73 56
69 55
75 62
72 55
81 63
46 38
79 64
(a) Show that there is a significant linear correlation between games in which the player participated, and team wins.
x = 29 29 and 46. 5.
(b) Write the equation of the regression line. xi yi
3 1
34 13
38 19
42 228
59 382
75 542
x = 06 30 390, and 421.
123 90
218 191
298 249
353 287
(c) Estimate the percentage of wins (in games the player played in) that are explained by this player’s floor presence.
CHAPTER 27
1204
Statistics
(d) Predict with a 95% confidence level how many games the team will win next season if this player is in 85 games. Write the prediction interval for this estimate.
(a) Show that there is a significant linear correlation between number of visits and the number of thefts.
10. A drug is being tested. One purpose of the test is to demonstrate the efficacy of the drug, but another is to measure incidence of a certain side effect (lowered body temperature). The following table gives numbers of randomly chosen patients in ten states, together with the number exhibiting the side effect:
(c) Estimate the percentage of thefts that are explained by this person’s number of visits to the city each year.
xi : Patients yi : Side Effect
5,000 160
5,500 167
4,200 135
10,000 305
6,900 221
8,800 265
12,500 394
6,000 190
8,000 235
5,700 183
(a) Show that there is a significant linear correlation between number of patients and incidents of this side effect.
(b) Determine the equation of the regression line.
(d) Predict with a 95% confidence level how many thefts are likely to occur if this person visits New York 25 times next year. Write the prediction interval for this estimate. 12. A city is experimenting with an additive to the water with the objective of decreasing tooth decay. Over a ten year period, records have been kept on pounds of the chemical put into the water system and the number of children twelve or under who were cavity free that year.
(b) Determine the equation of the regression line. (c) Estimate the percentage of patients that will have this side effect in a patient population of 17 000. (d) Predict with a 95% confidence level how many patients out of a population of 17 500 will exhibit this side effect. Write the prediction interval for this estimate. 11. Over a period of ten years, the number of times a certain individual has visited New York each year has been recorded. Records have also been kept of the number of incidents of major jewelry thefts in the city for each year in this period. The numbers are as follows: xi : Visits yi : Thefts
17 6
12 5
15 6
22 8
14 5
9 3
18 7
21 8
10 5
13 5
xi : Additive (pounds) yi : Cavity free
410 3550
500 4412
475 4119
390 3420
470 4090
525 4598
480 4203
510 4452
590 5143
600 5259
(a) Show that there is a significant linear correlation between number of pounds of the chemical, and the number of children who were cavity free. (b) Determine the equation of the regression line. (c) Estimate the percentage of children who were cavity free that is explained by the chemical added to the water supply. (d) Predict with a 95% confidence level how many children are likely to be cavity free if 650 pounds are used. Write the prediction interval for this estimate.
Answers and Solutions to Selected Problems

CHAPTER 1

Section 1.1

1. Yes, since 2√(x − 1) · 1/(2√(x − 1)) = 1 for x > 1
3. Yes
5. Yes
7. (d/dx)(y² + xy − 2x² − 3x − 2y) = (y − 4x − 3) + (2y + x − 2)y′ = 0
13. y = 3 − e^(−x)
15. y = 2 sin²x − 2
17. Direction field for y′ = x + y; solution satisfying y(2) = 2
19. Direction field for y′ = xy; solution satisfying y(0) = 2
21. Direction field for y′ = sin y; solution satisfying y(1) = π/2
23. Direction field for y′ = y sin x − 3x²; solution satisfying y(0) = 1
25. Direction field for y′ − y cos x = 1 − x²; solution satisfying y(2) = 2
27. Hint: The lineal element to the solution curve through (x₀, z) has equation y = (q(x₀) − p(x₀)z)(x − x₀) + z
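The direction-field problems above amount to tabulating the slope f(x, y) of the lineal element at each grid point before sketching. A small sketch of that tabulation (ours; the function name is illustrative, and plotting is left to any graphing tool):

```python
def slope_field(f, xs, ys):
    """Slope of the lineal element of y' = f(x, y) at each grid point,
    as one would tabulate before sketching a direction field."""
    return {(x, y): f(x, y) for x in xs for y in ys}

# Example: y' = x + y on the grid [-4, 4] x [-4, 4] (Problem 17 above).
field = slope_field(lambda x, y: x + y, range(-4, 5), range(-4, 5))
```

Each entry gives the slope of the short segment drawn at that point; a solution curve is sketched by following these segments.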
Section 1.2 1 ; also y = 0 7. secy = Ax 9. Not separable 1 − cx 1 2 2 11. 2 y − y + lny + 1 = lnx − 2 13. lny = 3x2 − 3 15. 3y sin3y + cos3y = 9x2 − 5 3 √ 4 1/3 1/3 17. 45 F 19. V = V0 + k t ; V0 = initial volume 21. 8.57 kg 23. 21 e−6 3 √ 25. 3888 2 s, or approximately 91 min, 39 s 27. (a) 576 s (b) 489 s (approximate) 29. No 1. 2x2 = y3 + c
3. Not separable
5. y =
Section 1.3 1. y = cx3 + 2x3 ln x 3. y = 21 x − 41 + ce−2x 5. y = 4x2 + 4x + 2 + ce2x x−1 ex dx + ce−x 9. y = x2 − x − 2 11. y = x + 1 + 4x + 1−2 13. y = 23 e4x − 11 7. y = e−x ex 3 x2 15. y = −2x2 + cx 17. A1 t = 50 − 30e−t/20 A2 t = 75 + 90e−t/20 − 75e−t/30 A2 t has its minimum value of 5450 81 pounds at 60 ln 95 minutes.
Section 1.4 1. 2xy2 + exy + y2 = c
5. y3 + xy + ln x = c 7. coshx sinhy = c 9. 3xy4 − x = 47 11. x sin2y − x = 13. Not exact 15. = −3 x2 y3 − 3xy − 3y2 = c 24
17. + c = = M and + c = =N
x
x
y
y 3. Not exact
Section 1.5 1. 3.
5. 11. 19.
N M − is independent of x.
x
y
N 1
M = 1 = −1, so this equation is not exact. (b) x = 2 (c) y = (a)
y
x x (d) x y = xa yb for all a and b satisfying a + b = −2 1 x2 y = c or y = −1 e3y xe3y − ey = c 7. x2 y x4 y2 + 2x3 y3 = c 9. y+1 1 e−3x y−4 y3 − 1 = ky3 e3x 13. y = 4 − ln x 15. x x2 y3 − 2 = −9 17. x
cM = c M = c N = cN ex ex sinx − y = 21 21.
y
y
x
x 1 M
1 y2
1 2 y = 4e−x /3 y
Section 1.6 x 2 3. y = 1/1 + ex /2 5. y ln y − x = cy 7. xy − x2 − y2 = c c − ln x 4/7 2ex ce − br ar − dc 2 13. y = 2x 15. h = k= 11. y = 2 + 2 9. y = x−1 c − 75 x−5/4 cx − 1 ce − 1 ae − bd ae − bd 17. 3x − 22 − 2x − 2y + 3 − y + 32 = c 19. 2x + y − 32 = cy − x + 3 21. x − y + 32 = 2x + c 23. 3x − 2y − 8 ln x − 2y + 4 = x + c 25. Let the dog start at A 0, and suppose the man walks upward into the upper half-plane. The dog’s path is along the √ √ 1 2 graph of y = − A A − x + √ A − x3/2 + A, and the dog catches the man at A 2A/3 at time t = 2A/3v 3 3 A units. v a 27. (a) r = a − , a spiral (b) revolution 2v √ a + v2 + a2 2 1 a 2 2 2 v + a + ln (c) Distance = 2 v2 v 1. y = x +
Section 1.7
√ 8 1358 , about 18.91 ft/s 3. Velocity = 8 30, about 43.82 ft/s 9 3 √ √ √ Velocity = 12 5, about 26.84 ft/s; time = 23 ln6 + 35, about 2.15 s √ √ 3 40 x Time = dx, about 1.7117 seconds 9. Velocity = 2 210, about 28.98 ft/s √ 8 10 x3 − 100 Max height = 342.25 ft; object hits the ground at 148 ft/s, 4.75 s after drop vt = 32 − 32e−t for 0 ≤ t ≤ 4 vt = 81 + ke−8t /1 − ke−8t ft/s for t ≥ 4, with k = e32 3e4 − 4/5e4 − 4 limt→ vt = 8 ft/s; st = 32t + e−t − 1 for 0 ≤ t ≤ 4 st = 8t + 2 ln1 − ke−8t + 64 + 32e−4 − 2 ln2e4 /5e4 − 4 for t ≥ 4 √ 17.5 ft/s 17. t = 2 R/g, where R = radius of the circle VC = 76 V when t = ln20/2; i 21 ln20 = 16 A 21. i1 0+ = i2 0+ = 3/20 amp i3 0+ = 0 001 EC −t/RC (a) qt = EC + q0 − ECe (b) EC (d) −RC ln q0 − EC 3 y = c − 4 ln x 27. x2 + 2y2 − 4y = c 29. y2 lny2 − 1 + 2x2 = c
1. Velocity = 5. 7. 11. 13.
15. 19. 23. 25.
Section 1.8 1 1 1 −1n+1 n x (c) yn = 1 + x − x2 + x3 − x4 + · · · + 2 6 24 n! n n+1 −1 n 1 1 1 −1 x = 1 + x − x2 + x3 − x4 + · · · + xn + · · · (d) y = 2 − n! 2 6 24 n! n=0
7. (b) y = 2 − e−x
9. (b) y = 73 + 23 x3 (c) y0 = 3 y1 = 73 + 23 x3 = y2 = y3 = · · · = yn (d) y = 3 + 2x − 1 + 2x − 12 + 23 x − 13 = y1
CHAPTER 2 Section 2.2 1. y = cosh2x
3. y =
12 −3x e 5
− 75 e−8x
5. y = 2x4 − 4x4 ln x
4 6 7. y1 and y2 are solutions of x2 y − 4xy + 6y = 0. We must write this equation as y − y + 2 y = 0 to have the form x x to which the theorem applies, and then we must have x = 0. On an interval not containing 0, the Wronskian of these solutions is nonzero.
9. y − y − 2y = 0 has solutions y1 = e−x and y2 = e2x , but y1 y2 = ex is not a solution. Many other examples also work. 11. Hint: Recall that at a relative extremum x0 y x0 = 0. 13. Hint: Consider Wx0 .
Section 2.3 1. y = c1 cos2x + c2 sin2x 3. y = c1 e5x + c2 xe5x 5. y = c1 x2 + c2 x2 ln x 7. y = c1 x4 + c2 x−2 cosx sinx 9. y = c1 + c2 11. y = c1 e−ax + c2 xe−ax √ √ x x 13. (a) y4 = c1 x + c2 (b) y − 1ey = c1 x + c2 or y = c3 (c) y = c1 ec1 x /c2 − ec1 x or y = 1/c3 − x (d) y = ln secx + c1 + c2 (e) y = ln c1 x + c2
Section 2.4 1. y = c1 e−2x + c2 e3x 7. 13. 17. 21.
3. y = e−3x c1 + c2 x 5. y = e−5x c1 cosx + c2 sinx √ √ √ √
c1 cos3 7x/2 + c2 sin3 7x/2 9. y = e7x c1 + c2 x 11. y = e−2x c1 cos 5x + c2 sin 5x y=e y = 5 − 2e−3x 15. yx = 0 y = 97 e3x−2 + 57 e−4x−2 19. y = ex−1 29 − 17x √ √ y = ex+2/2 cos 215 x + 2 + √515 sin 215 x + 2 −3x/2
1 ax e d − ac + cex + ac − d + ce−x 2 (c) lim→0 x = x (use l’Hôpital’s rule)
23. (a) x = eax c + d − acx
(b) x =
Section 2.5 1. y = c1 x2 + c2 x−3 7. 13. 17. 19.
5. y = c1 x4 + c2 x−4 √ √ y = c1 x + c2 x 9. y = x c1 + c2 lnx 11. y = x c1 cos 39 lnx/2 + c2 sin 39 lnx/2 y = x−2 3 cos4 ln−x − 2 sin4 ln−x 15. y = −3 + 2x2 y = −x−3 cos2 ln−x y = −4x−12 1 + 12 lnx 21. y = 11 x2 + 17 x−2 4 4 −2
3. y = c1 cos2 lnx + c2 sin2 lnx
−3
−12
3/2
Section 2.6 1. y = c1 cosx + c2 sinx − cosx ln secx + tanx 3. y = c1 cos3x + c2 sin3x + 4x sin3x + 43 ln cos3x cos3x −x
5. y = c1 ex + c2 e2x − e2x cose−x
7. y = c1 e + c2 e − x + x − 4 9. y = e c1 cos3x + c2 sin3x + 2x2 + x − 1 11. y = c1 e2x + c2 e4x + ex 13. y = c1 ex + c2 e2x + 3 cosx + sinx 15. y = e2x c1 cos3x + c2 sin3x + 13 e2x − 21 e3x 2x
2
x
17. y = c1 e2x + c2 e−x + 13 xe2x 19. y = c1 e2x + c2 e−3x − 16 x − 361
21. y = c1 x2 + c2 x4 + x
23. y = c1 cos2 lnx + c2 sin2 lnx − 41 cos2 lnx lnx
25. y = c1 e2x + c2 xe2x + e3x − 41
27. y = 47 e2x − 43 e−2x − 47 xe2x − 41 x 19 −6x 29. y = 38 e−2x − 120 e + 15 e−x + 127
31. y = 2e4x + 2e−2x − 2e−x − e2x
33. y = − 17 e2x + 55 e3x + 521 cos2x − 525 sin2x 4 13 √ √ 1 4x+1 35. y = 11 55 − e cosh 14x + 1 + √114 5e − 198 sinh 14x + 1 + 111 e−x e 37. y = 4e−x − sin2 x − 2
39. y = 2x3 + x−2 − 2x2
41. y = x − x2 + 3 coslnx + sinlnx
Section 2.7 √ √ 1. y = e−2t 5 cosh 2t + √102 sinh 2t √ y = √52 e−2t sinh 2t graphed below
3. y = 2.5e^(−t)[2 cos(2t) + sin(2t)]; the solution y = 2.5e^(−t) sin(2t), with y(0) = 0, is graphed in the text
5. y = (A/√2) e^(−2t) sinh(√2 t), graphed in the text
9. y = (A/2) e^(−t) sin(2t), graphed in the text
7. y = Ate^(−2t), graphed in the text
11. y = (1/60)(57e^(−8t) − 52e^(−3t)), down positive
13. At most once; a condition on y(0) alone is not enough, as one needs to specify y′(0) also.
15. Increasing C decreases the frequency
17. Hint: Show that (m/k)² = (d/a)²
21. (a) y = (1/373)(e^(−3t)[2266 cosh(√7 t) + (6582/√7) sinh(√7 t)] − 28 cos(3t) + 72 sin(3t))
    (b) y = (1/373)(e^(−3t)[28 cosh(√7 t) + (2106/√7) sinh(√7 t)] − 28 cos(3t) + 72 sin(3t))
    These functions are graphed in the text.
19. y = (A/(2mω₀)) t sin(ω₀t)
23. (a) y = (1/15)(e^(−t/2)[98 cos(√11 t/2) + (74/√11) sin(√11 t/2)] − 8 cos(3t) + 4 sin(3t))
    (b) y = (1/15)(e^(−t/2)[8 cos(√11 t/2) + (164/√11) sin(√11 t/2)] − 8 cos(3t) + 4 sin(3t))
    These functions are graphed in the text.
25. it = 015e−00625t − 54 × 10−7 e−333327t + 0015 cos20t − 000043 sin20t 27. it = 00001633e−t + 000161e−03177t + 0000023e−t cos6t − 0000183e−t sin6t
CHAPTER 3 Section 3.1 1.
1 1 − s−1 s+1
3.
13. 2 cos4t − 45 sin4t
16s s2 + 42
5.
15. 3e7t + t
1 s − s2 s2 + 25
7.
17. e4t − 6te4t
2 6 8 16 3 s 9. 4 − 2 + 2 + + s3 s2 s s s s + 16 5 23. s1 + e−3s
11. −2e−16t
25. From the graph, ft = 0 if 0 < t ≤ 5, and ft = 5 if 5 < t ≤ 10, and ft = 0 if 10 < t ≤ 25. Further, 5e−5s 1 − e−5s . ft + 25 = ft, so f is periodic of period T = 25. Thus £ fs = s1 − e−25s E 1 29. ft = h if 0 < t ≤ a and ft = 0 if a < t ≤ 2a. Further, ft + 2a = ft, so f is periodic 27. 2 s + 2 1 − e−s/ h . of period 2a, and L fs = s1 + e−as
Section 3.2 1. y = 41 − 13 e−4t 4
4 −4t 4 3. y = − 17 e + 17 cost + 171 sint
7. y =
+
22 2t e 25
−
13 2t te 5
3 25
cost −
4 25
sint
9. y =
1 16
+
5. y = − 41 + 21 t + 17 e2t 4 1 t − 33 16 16
15 cos4t + 64 sin4t
Section 3.3 1 6 3 2 s 1 3. 1 − e−7s + 2 cos7e−7s − 2 sin7e−7s − + s + 24 s + 22 s + 2 s s +1 s +1 2 s 2 s 1 1 11 4 1 1 − 9. + − − e−2s 5. 2 − e−3s − 2 e−3s 7. + s s s s + 1 s + 13 s + 12 + 1 s2 + 1 s s2 + 1 s2 + 1 s2 + 4s − 5 1 2 24 1 15 −16s 4 1 e 11. 2 13. 2 − − 2 + 15. + + 17. e2t sint s + 4s + 132 s s s s s + 55 s + 53 s + 52 √ √ √ 19. cos3t − 2Ht − 2 21. √12 e−3t sinh 2t 23. e−3t cosh2 2t − 2√1 2 e−3t sinh2 2t 1.
25. 29. 31.
1
1 − cos4t − 21Ht − 21 27. y = cos2t + 43 1 − cos2t − 4Ht − 4 16 √ y = − 41 + 121 e2t−6 + 16 e−t−6 cos 3t − 4Ht − 6 y = − 41 + 25 et − 203 cos2t − 15 sin2t + − 41 + 25 et−5 + 203 cos2t − 5 − 15 sin2t − 5Ht − 5
33. Eout = 5e−4t + 10 1 − e−4t−5 Ht − 5
35. it =
k k 1 − e−Rt/L − 1 − e−Rt−5/L Ht − 5 R R
K K 37. £ KHt − a − KHt − bs = e−as − e−bs s s c−t c−t t−a t−a Ht − a + h − Ht − b − h Ht − c 39. £ h b−a c−b b−a c−b hc − a e−bs h e−cs h e−as − + = b − a s2 c − bb − a s2 c − b s2
Section 3.4 cosat − cosbt t sinat if b2 = a2 if b2 = a2 b − ab + a 2a 1 1 5. 4 1 − cosat − 3 t sinat 7. 21 − 21 e−2t−4 Ht − 4 9. yt = e3t ∗ ft − e2t ∗ ft a 2a 11. yt = 41 e6t ∗ ft − 41 e2t ∗ ft + 2e6t − 5e2t 13. yt = 13 sin3t ∗ ft − cos3t + 13 sin3t 1.
1
sinh2t − sin2t 16
3.
15. yt = 43 et − 41 e2t − 121 e−2t − 13 et ∗ ft + 41 e2t ∗ ft + 121 e−2t ∗ ft 17. ft = 21 e−2t − 23 √ √ 19. ft = cosht 21. ft = 3 + 25 15et/2 sin 15t/2 23. ft = 41 e−2t + 43 e−6t
Section 3.5 1. y = 3 e−2t−2 − e−3t−2 Ht − 2 − 4 e−2t−5 − e−3t−5 Ht − 5 3. y = 6e−2t − e−t + te−t 5. t = B + 9e−2t − B + 6e−3t 0 = 3 0 = B 7. 3/ 9. 4 11. Eout = 10e−4t−2 Ht − 2 − 10e−4t−3 Ht − 3 13. 0 if t < a ft − a if t ≥ a m k 15. yt = v0 sin t k m
Section 3.6 1. xt = −2 + 2et/2 − t yt = −1 + et/2 − t
3. xt = 49 + 13 t − 49 e3t/4 yt = − 23 + 23 e3t/4
5. xt = 43 − 43 e2t/3 + 21 t2 + 21 t yt = − 23 e2t/3 + t + 23 7. xt = e−t cost + t − 1 yt = e−t sint + t2 − t 9. xt = 1 − e−t − 2te−t yt = 1 − e−t 11. y1 t = 21 et + 21 e−t − 1 − t y2 t = − 41 t2 − 21 t y3 t = − 16 et + 16 e−t − 13 t 2 13. i1 t = 15 1 − 21 e−t/2 − 85
e−t−4/2 − cos2t − 4 + 29 sin2t − 4Ht − 4 and 1 −t/2 2 −t−4/2 + 85 e − cos2t − 4 − 4 sin2t − 4Ht − 4 i2 t = 10 e
5 1 4 4 15. x1 t = 36 − 20 cos2t − 45 cos3t − 365 − 201 cos2t − 2 − 45 cos3t − 2 Ht − 2 and
2 2 x2 t = 181 − 101 cos2t + 45 cos3t − 181 − 101 cos2t − 2 + 45 cos3t − 2 Ht − 2
17. m1 y1 = ky2 − y1 m2 y2 = −ky2 − y1 y1 0 = y1 0 = y2 0 = 0 y2 0 = d. Then m1 s2 + kY1 − kY2 = 0 and 2 m2 s2 + kY2 − kY1 = m2 ds. Replace Y2 with m1 sk +k Y1 in the second equation to get kd Y1 s = . The quadratic term in the denominator indicates that the objects will m1 s s2 + km1 + m2 /m1 m2 m1 m2 . oscillate with period 2 km1 + m2
1 −t−1 3 −t−1/6
+ 20 e 19. i1 = 10 e Ht − 1 i2 = − 101 e−t−1 + 101 e−t−1/6 Ht − 1 21. x1 t = 9e−t/100 + e−3t/50 + 3 e−t−3/100 − e−3t−3/50 Ht − 3 x2 t = 6e−t/100 − e−3t/50 + 2e−t−3/100 + 3e−3t−3/50 Ht − 3
Section 3.7 1. y = −1 + ce−2/t
3. y = 7t2
5. y = ct2 e−t
7. y = 4
9. y = 3t2 /2
CHAPTER 4 Section 4.1 1 6 1. y = −2 − 13 x3 + 121 x4 − 601 x5 − 120 x +···
3. y = 3 + 25 x − 12 + 56 x − 13 + 245 x − 14 + 16 x − 15 + · · ·
5. y = 7 + 3x − 1 − 2x − 12 − x − 13 + x − 14 + · · · 7. y = −3 + x + 4x2 + 76 x3 + 13 x4 + · · · 1 1 1 9. y = 1 + x − x2 + x3 + x4 + · · · 4 4 32
31 6 379 8 5 6 43 8 1741 11. y = a0 1 − 21 x2 + 16 x4 − 720 x + 40320 x − 21 x2 − 18 x4 + 144 x − 5760 x + 1209600 x10 + · · ·
1 1 4 1 6 1 x − x + x8 + · · · 13. y = x − 1 + A 1 − x2 + 2 42! 83! 164! 1 2n 2 x = x − 1 + A + 1e−x /2 y = x − 1 + A −1n n!2n n=0 an−1 1 6 1 1 for n ≥ 1 a1 = 0 y1 = 1 − 16 x3 + 180 15. an+2 = − x − 12960 x9 + 1710720 x12 − · · · a0 = 0 n + 1n + 2 1 7 1 1 y2 = x − 121 x4 + 504 x − 45360 x10 + 7076160 x13 − · · · 1 1 1 1 1 17. y = a0 1 + 45 x5 + 45910 x10 + 459101415 x15 + · · · + a1 x + 56 x6 + 561011 x11 + 1 1 1 1 x16 + · · · + 21 x2 + 267 x7 + 2671112 x12 + 26711121617 x17 + · · · 5610111516
1 1 1 1 6 1 7 1 1 7 5 x − x + · · · + a1 x − x2 + x3 − x4 + x −··· 19. y = a0 1 + x4 − x5 + 12 60 360 2520 2 6 24 120 1 1 1 1 1 1 3 6 9 4 7 21. y = a 1 − 23 x + 2356 x − 235689 x + · · · + b x − 34 x + 3467 x − 3467910 x10 + · · · 3 3x3 1 4 23. y = 2 − 3e−x = −1 + 3x − x2 + − x +··· 2 3! 8 x4 + · · · 25. y = 15 + 15 e2x 32 sinx − 6 cosx = −1 + 4x + 11x2 + 343 x3 + 27 4
Answers and Solutions to Selected Problems Section 4.2 1 1. a0 arbitrary, a1 = 1 2a2 − a0 = −1 an = an−2 for n ≥ 3, n 1 1 1 1 1 1 x4 + 246 x6 + 2468 x8 + · · · + x + 13 x3 + 35 x5 + 357 x7 + 3579 x9 + · · · y = a0 + a0 − 1 21 x2 + 24 1 3. a0 arbitrary, a1 + a0 = 0 2a2 + a1 = 1 an+1 = −an + an−2 for n ≥ 2 n + 1 2 1 13 1 4 11 5 31 6
1 2 1 3 7 4 y = a0 1 − x + 2! x + 3! x − 4! x + · · · + 2! x − 3! x + 4! x + 5! x − 6! x + · · · n − 1 a for n ≥ 1 5. a0 a1 arbitrary, a2 = 21 3 − a0 , an+2 = n + 1n + 2 n y = a0 + a1 x + 3 − a0 2!1 x2 + 4!1 x4 + 6!3 x6 + 35 x8 + 357 x10 + · · · 8! 10! n − 1an−1 − 2an 7. a0 a1 arbitrary, a2 + a0 = 0 6a3 + 2a1 = 1 an+2 = for n ≥ 2 1 1 3 1 4 n + 1n + 2 1 2 y = a0 + a1 x − a0 x + 6 − 3 a1 x + 6 a0 + 12 a1 x + · · · 9. a0 a1 arbitrary, 2a2 + a1 + 2a0 = 1 6a3 + 2a2 + a1 = 0 n − 2an − n + 1an+1 for n ≥ 3 an+2 = n + 1n 1 + 2 1 2 1 yx = a0 + a1 x + 2 − a0 − 2 a1 x + 3 a0 − 16 x3 − 241 + 121 a0 x4 + · · · −a2k−1 + 1/2k!−1k 1 11. a0 arbitrary, a1 = 1 a2k = − 2k for k ≥ 1 a2k−2 for k ≥ 1 a2k+1 = 2k + 1 1 1 y = a0 1 − 21 x2 + 24 x4 − 246 x6 + · · · + x − 2!1 x3 + 13 x5 − · · · 5!
Section 4.3
1. 0, regular; 3, regular
3. 0, regular; 2, regular
5. 0, irregular; 2, regular
7. 4r² − 2r = 0; c_n = c_{n−1}/((2n + 2r)(2n + 2r − 1)) for n ≥ 1; y1 = Σ (n = 0 to ∞) x^(n+1/2)/(2n + 1)!, y2 = Σ (n = 0 to ∞) xⁿ/(2n)!
9. 9r² − 9r + 2 = 0; c_n = −4c_{n−1}/(9(n + r)(n + r − 1) + 2) for n ≥ 1;
y1 = x^(2/3)[1 − (4/(3·4))x + (4²/(3·4·6·7))x² − (4³/(3·4·6·7·9·10))x³ + (4⁴/(3·4·6·7·9·10·12·13))x⁴ − (4⁵/(3·4·6·7·9·10·12·13·15·16))x⁵ + ···] = x^(2/3) + Σ (n = 1 to ∞) (−1)ⁿ 4ⁿ [2·5·8···(3n − 1)]/(3n + 1)! x^(n+2/3),
y2 = x^(1/3) + Σ (n = 1 to ∞) (−1)ⁿ 4ⁿ [1·4·7···(3n − 2)]/(3n)! x^(n+1/3)
11. 2r² − r = 0; c_n = −(2n + r − 1)c_{n−1}/((n + r)(2n + 2r − 1)) for n ≥ 1;
y1 = x^(1/2) + Σ (n = 1 to ∞) ((−1)ⁿ(2n + 3)/(3·n!)) x^(n+1/2), y2 = 1 + Σ (n = 1 to ∞) ((−1)ⁿ 2ⁿ(n + 1)/(1·3·5···(2n − 1))) xⁿ
13. 2r² − r − 1 = 0; c1 = −2c0/(2r + 3); c_n = (2c_{n−2} − (2n + r − 1)c_{n−1})/(2(n + r)² − (n + r) − 1) for n ≥ 2;
y1 = x − (2/(1!·5))x² + (18/(2!·5·7))x³ − (164/(3!·5·7·9))x⁴ + (2284/(4!·5·7·9·11))x⁵ − (37272/(5!·5·7·9·11·13))x⁶ + ···,
y2 = x^(−1/2) − x^(1/2) + (2/3)x^(3/2) − (13/(3!·3))x^(5/2) + (119/(4!·3·5))x^(7/2) − (1353/(5!·3·5·7))x^(9/2) + ···
15. 9r² − 4 = 0; (9r² + 18r + 5)c1 = 0, so c1 = 0; c_n = −9c_{n−2}/(9(n + r)² − 4) for n ≥ 2;
y1 = x^(2/3) − (3/(2²·1!·5))x^(8/3) + (3²/(2⁴·2!·5·8))x^(14/3) − (3³/(2⁶·3!·5·8·11))x^(20/3) + (3⁴/(2⁸·4!·5·8·11·14))x^(26/3) − (3⁵/(2¹⁰·5!·5·8·11·14·17))x^(32/3) + ···,
y2 = x^(−2/3) − (3/(2²·1!·1))x^(4/3) + (3²/(2⁴·2!·1·4))x^(10/3) − (3³/(2⁶·3!·1·4·7))x^(16/3) + (3⁴/(2⁸·4!·1·4·7·10))x^(22/3) − (3⁵/(2¹⁰·5!·1·4·7·10·13))x^(28/3) + ···
Section 4.4
1. y1 = c0(x − 1); y2 = c0*[(x − 1)ln x − 3x + (1/4)x² + (1/36)x³ + (1/288)x⁴ + (1/2400)x⁵ + ···]
3. y1 = c0(x⁴ + 2x⁵ + 3x⁶ + 4x⁷ + 5x⁸ + ···) = c0 x⁴/(x − 1)²; y2 = c0*(3 − 4x)/(x − 1)²
5. y1 = c0[x^(1/2) − (1/(2·1!·3))x^(3/2) + (1/(2²·2!·3·5))x^(5/2) − (1/(2³·3!·3·5·7))x^(7/2) + (1/(2⁴·4!·3·5·7·9))x^(9/2) + ···];
y2 = c0*[1 − x + (1/(2·2!·3))x² − (1/(2²·3!·3·5))x³ + (1/(2³·4!·3·5·7))x⁴ − ···]
7. y1 = c0[x² + (1/3!)x⁴ + (1/5!)x⁶ + (1/7!)x⁸ + (1/9!)x¹⁰ + ···]; y2 = c0*[x − x² + (1/2!)x³ − (1/3!)x⁴ + (1/4!)x⁵ − ···]
9. y1 = c0(1 − x); y2 = c0*[1 + (1/2)(x − 1)ln x − 2/x]
11. (a) 25r² + 15r − 4 = 0, with roots r1 = 1/5, r2 = −4/5; (b) y1 = Σ (n = 0 to ∞) c_n x^(n+1/5) with c0 ≠ 0, and y2 = k y1(x) ln x + Σ (n = 0 to ∞) c_n* x^(n−4/5)
13. (a) 48r² − 20r − 8 = 0, with roots r1 = 2/3, r2 = −1/4; (b) y1 = Σ (n = 0 to ∞) c_n x^(n+2/3) with c0 ≠ 0; y2 = Σ (n = 0 to ∞) c_n* x^(n−1/4) with c0* ≠ 0
15. (a) 4r² − 10r = 0, with roots r1 = 5/2, r2 = 0; (b) y1 = Σ (n = 0 to ∞) c_n x^(n+5/2) with c0 ≠ 0; y2 = Σ (n = 0 to ∞) c_n* xⁿ with c0* ≠ 0
CHAPTER 5 Section 5.1
In Problems 1 through 6, approximate solutions were computed by Euler's method with h = 0.2, then h = 0.1 and h = 0.05.
1. Solution of y′ = y sin x, y(0) = 1 on [0, 4]
x     h = 0.2       h = 0.1       h = 0.05      Exact Solution Value
0     1             1             1             1
0.2   1             1.009098344   1.011503107   1.02013342
0.4   1.03973387    1.06048863    1.07118496    1.08213832
0.6   1.12071215    1.15460844    1.17239843    1.19084648
0.8   1.24727249    1.29838437    1.32567276    1.35431161
1     1.42622019    1.50052665    1.54082744    1.58359518
1.2   1.66624478    1.77177248    1.82981081    1.89201471
1.4   1.97684583    2.12354101    2.205173      2.29339409
1.6   2.36646226    2.56550146    2.67726796    2.79882454
1.8   2.83955291    3.10178429    3.24991245    3.41167064
2     3.39261128    3.72595725    3.91473574    4.12121011
2.2   4.00958982    4.41563127    4.64536355    4.89640429
2.4   4.65793762    5.12853123    5.39367649    5.68251387
2.6   5.28719068    5.90260482    6.09104775    6.403782
2.8   5.83230149    6.36250556    6.65682772    6.97423289
3     6.22305187    6.73296375    7.01389293    7.31547887
3.2   6.39869129    6.85637053    7.10744867    7.37646684
3.4   6.32398766    6.70882212    6.92043335    7.14775408
3.6   6.0007799     6.30806368    6.47904791    6.66425664
3.8   5.46968635    5.70948505    5.8457521     5.99525134
4     4.80035219    4.99149302    5.10259695    5.22598669
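The h = 0.2 column of Problem 1 can be reproduced with a few lines of Python (a sketch of Euler's method as described above; the closed form e^(1 − cos x) is the exact solution of y′ = y sin x, y(0) = 1):

```python
import math

def euler(f, x0, y0, h, n_steps):
    """Forward Euler: y_{k+1} = y_k + h*f(x_k, y_k)."""
    xs, ys = [x0], [y0]
    for _ in range(n_steps):
        ys.append(ys[-1] + h * f(xs[-1], ys[-1]))
        xs.append(xs[-1] + h)
    return xs, ys

f = lambda x, y: y * math.sin(x)             # Problem 1: y' = y sin x, y(0) = 1
xs, ys = euler(f, 0.0, 1.0, 0.2, 20)
print(ys[2])                                  # ~1.03973387 (h = 0.2 column at x = 0.4)
exact = lambda x: math.exp(1 - math.cos(x))   # exact solution
print(exact(0.2))                             # ~1.02013342 (Exact column at x = 0.2)
```

Halving h (0.2, 0.1, 0.05) roughly halves the error at each x, as the columns of the table show; Euler's method is first order.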
3. Solution of y′ = 3xy, y(0) = 5 on [0, 4]
x     h = 0.2       h = 0.1       h = 0.05        Exact
0     5             5             5               5
0.2   5             5.15          5.22810641      5.30918274
0.4   5.6           5.90531       6.14480554      6.35624575
0.6   6.944         7.66399928    8.09129921      8.58003432
0.8   9.44384       10.9426582    11.8990307      13.0584824
1     13.9768832    17.2324981    19.4849022      22.4084454
1.2   22.3630131    29.7949892    35.4287514      43.3556883
1.4   38.4643826    56.3244477    71.3385803      94.5792316
1.6   70.774464     115.972038    158.672557      232.627372
1.8   138.717949    259.17431     388.901099      645.121012
2     288.533335    626.631648    1047.95246      2017.14397
2.2   634.773337    1634.25534    3097.73286      7111.2827
2.4   1472.67414    4584.73992    10024.1822      28266.6493
2.6   3593.32491    13800.0672    35439.1284      126682.333
2.8   9198.91176    44461.0564    136620.046      640137.266
3     24653.0835    152981.603    573253.363      3647081.85
3.2   69028.6339    560983.536    2613461.49      23427893.9
3.4   201563.611    2188060.18    12923807.1      169682214
3.6   612753.378    9060757.21    69209615.8      1.38565379(10^9)
3.8   1936300.68    39765851.2    400745113       1.27581728(10^10)
4     6351066.22    184664659     2.50522053(10^9)  1.32445611(10^11)
5. Solution of y′ = y − cos x, y(1) = −2 on [1, 5]
x     h = 0.2        h = 0.1        h = 0.05       Exact
1     −2             −2             −2             −2
1.2   −2.50806046    −2.52479287    −2.53395265    −2.54372206
1.4   −3.0821441     −3.1216086     −3.14338517    −3.16674525
1.6   −3.73256636    −3.80291652    −3.84303839    −3.88424161
1.8   −4.47323972    −4.58543259    −4.6482934     −4.71647511
2     −5.32244724    −5.49105224    −5.58620527    −5.68995512
2.2   −6.30370733    −6.54791245    −6.68669139    −6.838775
2.4   −7.44674857    −7.79161134    −7.98890878    −8.20617878
2.6   −8.78861954    −9.26662206    −9.54186557    −9.84641081
2.8   −10.3749657    −11.0279477    −11.4063233    −11.826918
3     −12.2615144    −13.1430765    −13.6570618    −14.2309923
3.2   −14.5158187    −15.6943098    −16.3855953    −17.1609616
3.4   −17.2193235    −18.7815545    −19.7034487    −20.7420636
3.6   −20.4698286    −22.5256874    −23.7461298    −25.127167
3.8   −24.2844426    −27.0726284    −28.6779576    −30.5025422
4     −29.1031376    −32.5982807    −34.6979486    −37.0949271
4.2   −34.7930364    −39.3145364    −42.046999     −45.1801881
4.4   −41.6536915    −47.4765804    −51.0166552    −55.0939413
4.6   −49.9228433    −57.3917761    −61.9598251    −67.2445787
4.8   −59.8849815    −69.4304735    −75.3038559    −82.1292389
5     −71.8794775    −84.03149      −91.5664948    −100.35338
7. 2.70481383; this approximation is too small because y″(x) > 0.
9. A drag coefficient of 0.3 gives y(40) ≈ 23875 feet. A drag coefficient of 0.8 gives y(40) ≈ 3515 feet.
11. With h = 0.1 we get y(40) ≈ 3426 feet, so the drums will likely rupture on impact.
Section 5.2
In Problems 1 through 9, the approximate solutions were computed using h = 0.1 by the modified Euler, Taylor, and RK4 methods.
1. Solution of y′ = sin(x + y), y(0) = 2 on [0, 4]
x     Modified Euler   Taylor        RK4
0     2                2             2
0.2   2.16260835       2.16331964    2.16257799
0.4   2.27764497       2.27864781    2.27783452
0.6   2.34149618       2.34249201    2.34198641
0.8   2.35864818       2.35950334    2.35938954
1.0   2.33660171       2.33728697    2.33750216
1.2   2.28294468       2.28347612    2.28392071
1.4   2.20420589       2.20461236    2.20519759
1.6   2.10562833       2.10593799    2.106598
1.8   1.99129888       1.99153503    1.99222519
2.0   1.86436976       1.86455048    1.8652422
2.2   1.72727096       1.72740984    1.72808569
2.4   1.58188451       1.58199169    1.58264163
2.6   1.42967893       1.42976193    1.4303807
2.8   1.27181015       1.27187458    1.27245995
3.0   1.10919644       1.10924652    1.10979812
3.2   0.942574127      0.942613018   0.94313596
3.4   0.772538925      0.772569046   0.773056001
3.6   0.599577036      0.599600237   0.60057301
3.8   0.424088544      0.424106255   0.424535308
4.0   0.246405248      0.24641858    0.24682153
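The RK4 column above is generated by the classical fourth-order Runge-Kutta step; a generic sketch (the demonstration ODE y′ = y, whose exact solution is eˣ, is not one of the book's problems):

```python
import math

def rk4_step(f, x, y, h):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(x, y)
    k2 = f(x + h / 2, y + h * k1 / 2)
    k3 = f(x + h / 2, y + h * k2 / 2)
    k4 = f(x + h, y + h * k3)
    return y + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)

# Accuracy check on y' = y, y(0) = 1, integrated to x = 1:
y, x, h = 1.0, 0.0, 0.1
for _ in range(10):
    y = rk4_step(lambda x, y: y, x, y, h)
    x += h
print(abs(y - math.e))   # error ~ 2e-6, far smaller than Euler's at this h
```

The fourth-order error explains why the RK4 column tends to sit closest to the exact values in these tables.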
3. Solution of y′ = cos y + e^(−x), y(0) = 1 on [0, 4]
x     Modified Euler   Taylor        RK4
0     1                1             1
0.2   1.26402944       1.26403233    1.26466198
0.4   1.45281544       1.45269679    1.45391187
0.6   1.58349547       1.58325838    1.58485267
0.8   1.67072292       1.67040421    1.67218108
1.0   1.72600007       1.7256358     1.7274517
1.2   1.75807746       1.75769492    1.75945615
1.4   1.77354268       1.77316065    1.77481079
1.6   1.77733578       1.77696658    1.7784748
1.8   1.7731483        1.77279968    1.77415235
2.0   1.76339926       1.76459409    1.7637228
2.2   1.75078179       1.75182385    1.75107799
2.4   1.73641198       1.73731088    1.73668018
2.6   1.72133454       1.72210246    1.72157515
2.8   1.7062763        1.70692664    1.7064905
3.0   1.69172569       1.69227208    1.69191512
3.2   1.67799425       1.6784499     1.67816081
3.4   1.66526414       1.66564147    1.6650987
3.6   1.65362453       1.65393488    1.65375148
3.8   1.64309917       1.64335274    1.64320934
4.0   1.63382381       1.63387303    1.63384681
5. Solution of y′ = −y + e^(−x), y(0) = 4 on [0, 4]
x     Modified Euler   Taylor        RK4
0     4                4             4
0.2   3.43920787       3.43898537    3.43866949
0.4   2.95033866       2.94997425    2.94940876
0.6   2.5257356        2.52528799    2.52453424
0.8   2.1581561        2.15766738    2.15677984
1.0   1.84087289       1.84037264    1.83939807
1.2   1.56772495       1.56723338    1.56621078
1.4   1.33313308       1.33266345    1.33162448
1.6   1.13209122       1.13165172    1.13062135
1.8   0.960142926      0.959737036   0.958734358
2.0   0.813341791      0.812973396   0.812012458
2.2   0.688221511      0.687889673   0.686980287
2.4   0.581743491      0.581447053   0.58059555
2.6   0.491259275      0.4909963     0.49020621
2.8   0.414468233      0.414236234   0.413508964
3.0   0.349378445      0.349174974   0.348509965
3.2   0.294270343      0.294092618   0.293488305
3.4   0.247663405      0.247508773   0.246962588
3.6   0.208285963      0.20815189    0.207660638
3.8   0.175048125      0.174932237   0.174492328
4.0   0.14701764       0.146917746   0.146525383
11. Solution of y′ = −y/x + x, y(1) = 1 on [1, 5]
x     Euler         Modified Euler   Improved Euler   Exact
1.0   1             1                1                1
1.2   1.01909091    1.03616507       1.03583333       1.03555556
1.4   1.10307692    1.13040682       1.13             1.12952381
1.6   1.23666667    1.27101358       1.270625         1.27
1.8   1.41176471    1.45144731       1.45111111       1.45037037
2.0   1.62368421    1.66777308       1.6675           1.6666667
2.2   1.86952381    1.9174816        1.91727273       1.91636364
2.4   2.1473913     2.19889755       2.19875          2.19777778
2.6   2.456         2.51085981       2.51076923       2.50974359
2.8   2.79444445    2.85253832       2.8525           2.85142858
3.0   3.16206897    3.223324         3.22333334       3.22222223
3.2   3.5583871     3.62275978       3.6228125        3.62166667
3.4   3.9830303     4.050496         4.05058824       4.04941177
3.6   4.43571428    4.50626061       4.50638889       4.50518519
3.8   4.91621621    4.98983879       4.99             4.98877194
4.0   5.42435897    5.50105862       5.50125          5.50000001
4.2   5.95999999    6.03978091       6.03999999       6.03873016
4.4   6.52302235    6.60589176       6.60613636       6.60484849
4.6   7.11333332    7.19929706       7.19956521       7.19826087
4.8   7.73085105    7.81991837       7.82020832       7.81888889
5.0   8.37551019    8.46768981       8.46799999       8.46666667
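The Exact column of Problem 11 can be checked against a closed form: multiplying y′ = −y/x + x by x gives (xy)′ = x², so xy = x³/3 + 2/3 and y = x²/3 + 2/(3x) (the ODE is inferred from the tabulated values, which it reproduces). A quick check:

```python
def y_exact(x):
    """Closed form for Problem 11: y' = -y/x + x, y(1) = 1."""
    return x * x / 3 + 2 / (3 * x)

for x, tab in [(1.2, 1.03555556), (2.0, 1.6666667), (5.0, 8.46666667)]:
    print(x, y_exact(x), tab)   # each pair agrees to the digits tabulated
```

This makes Problem 11 a convenient benchmark for comparing the Euler variants in the table.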
Section 5.3
In Problems 1 through 5, approximate solutions were computed using h = 0.1 and the Taylor, modified Euler, RK4, and Adams-Bashforth-Moulton methods.
1. Solution of y′ = 4y² − x, y(3) = 0 on [3, 7]
x     Taylor         Modified Euler  RK4            Adams-Bashforth-Moulton
3.0   0              0               0              0
3.2   −0.54950962    −0.535839146    −0.53549231    −0.535492931
3.4   −0.806871572   −0.798461491    −0.806778016   −0.80455223
3.6   −0.903569977   −0.900545864    −0.907274543   −0.905024282
3.8   −0.950052097   −0.949091031    −0.952253424   −0.952144216
4.0   −0.981689238   −0.981381742    −0.982583      −0.982752774
4.2   −1.00868179    −1.00856977     −1.00898001    −1.00908338
4.4   −1.03393561    −1.03388121     −1.03403163    −1.03408442
4.6   −1.05828751    −1.05825061     −1.05831574    −1.05832243
4.8   −1.08199512    −1.0819642      −1.08200303    −1.0820113
5.0   −1.10515444    −1.10512619     −1.10515651    −1.10515675
5.2   −1.1278152     −1.1277886      −1.12781558    −1.12781814
5.4   −1.1500123     −1.14998697     −1.15001214    −1.15001326
5.6   −1.17177457    −1.17175035     −1.17177421    −1.17177585
5.8   −1.19312738    −1.19310414     −1.1931269     −1.19312833
6.0   −1.21409353    −1.21407118     −1.21409298    −1.21409441
6.2   −1.23469371    −1.23467218     −1.23469309    −1.23469455
6.4   −1.25494679    −1.25492601     −1.25494612    −1.25494751
6.6   −1.27487005    −1.27487995     −1.27486934    −1.27487076
6.8   −1.29447936    −1.29445989     −1.29447861    −1.29447997
7.0   −1.31378934    −1.31377045     −1.31378855    −1.31378993
3. Solution of y′ = x² + 4y, y(0) = −2 on [0, 4]
x     Taylor         Modified Euler  RK4            Adams-Bashforth-Moulton
0     −2             −2              −2             −2
0.2   −4.3786        −4.37798        −4.44723874    −4.44723874
0.4   −9.56702144    −9.56504339     −9.87011722    −9.86993267
0.6   −20.8862678    −20.8813151     −21.8900872    −21.8887964
0.8   −45.6106649    −45.5991965     −48.5658508    −48.561028
1.0   −99.6738964    −99.648156      −107.830182    −107.815098
1.2   −217.977103    −217.920101     −239.588317    −239.545057
1.4   −476.967742    −476.842265     −532.640112    −532.522265
1.6   −1044.09633    −1043.82086     −1184.58908    −1184.2788
1.8   −2286.14646    −2285.54246     −2635.15604    −2643.5871
2.0   −5006.52093    −5005.19732     −5862.83785    −5860.82572
2.2   −10964.9933    −10962.0934     −13045.0526    −13040.0457
2.4   −24016.1713    −24009.8188     −29027.1396    −29014.8203
2.6   −52603.1881    −52589.273      −64591.2795    −64561.245
2.8   −115219.882    −115189.402     −143730.717    −143658.051
3.0   −252375.158    −252308.393     −319836.86     −319662.181
3.2   −552799.719    −552653.478     −711719.918    −711302.336
3.4   −1210489.3     −1210528.97     −1583764.49    −1582771.01
3.6   −2652240.7     −2651539.05     −3524297.25    −3521943.51
3.8   −5809463.99    −5807927.1      −7842502.92    −7836947.07
4.0   −12725045.4    −12721679       −17451668.8    −17438597.6
5. Solution of y′ = 4x³ − xy + cos y, y(0) = 4 on [0, 4]
x     Taylor        Modified Euler  RK4           Adams-Bashforth-Moulton
0     4             4               4             4
0.2   3.7794333     3.78043259      3.77999805    3.77999805
0.4   3.4063912     3.41050545      3.41104815    3.4110294
0.6   2.98742511    2.99596552      2.99802794    2.99786651
0.8   2.67888513    2.69113585      2.69319067    2.69290979
1.0   2.61904369    2.63355977      2.63372362    2.63352302
1.2   2.90467175    2.9211945       2.91895169    2.91894526
1.4   3.62976051    3.65076771      3.64732119    3.64756467
1.6   4.97054479    4.99929765      4.99555968    4.99474741
1.8   7.06207082    7.07183105      7.05743353    7.05537836
2.0   9.37312564    9.37941313      9.38527061    9.3863398
2.2   12.2268713    12.2487484      12.2600988    12.2388627
2.4   15.6095086    15.605291       15.6111703    15.6243401
2.6   19.4141045    19.408775       19.4310567    19.377544
2.8   23.4681005    23.5079285      23.5044464    23.5176327
3.0   28.1280137    28.1077469      28.1104867    28.1137305
3.2   33.10814      33.052868       33.0802212    32.9831436
3.4   38.2813467    38.2882438      38.3144963    38.2409524
3.6   43.7987621    43.8484638      43.8554581    43.8459653
3.8   49.6967946    49.7487578      49.746549     49.7574115
4.0   55.936434     55.9845424      55.9835947    55.9910457
7. Solution of y′ = 2xy − y³, y(0) = 2 on [0, 4]
x     h = 0.2       h = 0.1
0     2             2
0.2   1.22921715    1.27928486
0.4   1.07091936    1.08638842
0.6   1.04836859    1.05688972
0.8   1.01157152    1.10836346
1.0   1.14419902    1.21475378
1.2   1.31901324    1.35628443
1.4   1.49026217    1.51174278
1.6   1.65343971    1.66272411
1.8   1.7992031     1.79992887
2.0   1.92265942    1.92272594
2.2   2.02954279    2.03410264
2.4   2.1312142     2.13707792
2.6   2.23463857    2.23374303
2.8   2.23692569    2.32542008
3.0   2.42896574    2.41296935
3.2   2.50245762    2.49699181
3.4   2.55106451    2.57793593
3.6   2.55394614    2.65615241
3.8   2.38180227    2.7319243
4.0   2.10320803    2.80548524
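The Adams-Bashforth-Moulton columns above come from the standard fourth-order predictor-corrector pair, started with RK4. A minimal sketch (the demonstration ODE y′ = y is not one of the book's problems):

```python
import math

def abm4(f, x0, y0, h, n_steps):
    """Fourth-order Adams-Bashforth-Moulton predictor-corrector,
    with three RK4 start-up steps."""
    def rk4(x, y):
        k1 = f(x, y); k2 = f(x + h/2, y + h*k1/2)
        k3 = f(x + h/2, y + h*k2/2); k4 = f(x + h, y + h*k3)
        return y + (h/6)*(k1 + 2*k2 + 2*k3 + k4)
    xs, ys = [x0], [y0]
    for _ in range(3):                       # start-up values
        ys.append(rk4(xs[-1], ys[-1])); xs.append(xs[-1] + h)
    fs = [f(x, y) for x, y in zip(xs, ys)]
    for n in range(3, n_steps):
        # Adams-Bashforth predictor:
        yp = ys[n] + (h/24)*(55*fs[n] - 59*fs[n-1] + 37*fs[n-2] - 9*fs[n-3])
        xn1 = xs[n] + h
        # Adams-Moulton corrector, one pass:
        yc = ys[n] + (h/24)*(9*f(xn1, yp) + 19*fs[n] - 5*fs[n-1] + fs[n-2])
        xs.append(xn1); ys.append(yc); fs.append(f(xn1, yc))
    return xs, ys

# Sanity check on y' = y, y(0) = 1, whose solution is e^x:
xs, ys = abm4(lambda x, y: y, 0.0, 1.0, 0.1, 10)
print(abs(ys[-1] - math.e))   # small (fourth-order accurate)
```

Each step needs only two fresh evaluations of f, which is why multistep methods are attractive once the start-up values are in hand.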
CHAPTER 6 Section 6.1
√ √ √ √ √ 1. 2 + 2i + 3j 2 − 2i − 9j + 10k 38 63 4i − 6j + 10k 3 2i + 18j − 15k √ √ 3. 3i − k i − 10j + k 29 3 3 4i − 10j 3i + 15j − 3k √ √ 5. 3i − j + 3k −i + 3j − k 3 2 3 2i + 2j + 2k 6i − 6j + 6k 7. F + G = 3i − 2j F−G = i 9. F + G = 2i − 5j
F − G = j
11. F = −(1/2)i − (1/2)j
13. F = 12j
15. F = −9i + 6j
(sketches of the vectors omitted)
17. x = 3 + 6t, y = −t, z = 0; −∞ < t < ∞
19. x = 0, y = 1 + t, z = 3 + 2t; −∞ < t < ∞
21. x = 2 + 3t, y = −3 − 9t, z = 6 + 2t; −∞ < t < ∞
23. F = 3i + 3√3 j
25. F = (15/√2)i − (15/√2)j
Section 6.2
1. F · G = 2; cos θ = 2/√14; not orthogonal; |F · G| = 2 < √14 = ‖F‖ ‖G‖
3. −23; −23/(√29 √41); not orthogonal; 23 < √29 √41
5. −18; −9/10; not orthogonal; 18 < √10 √40 = 20
7. 3x − y + 4z = 4  9. 4x − 3y + 2z = 25  11. 7x + 6y − 5z = −26
13. 112/(11√105)  15. 113/(5√590)
17. F = O  19. Hint: F · U = ‖F‖ cos θ, which is a maximum when cos θ = 1
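The comparisons in Problems 1 through 5 are instances of the Cauchy-Schwarz bound |F · G| ≤ ‖F‖ ‖G‖. A spot-check in Python (the vectors here are hypothetical, not the book's data):

```python
import math

def dot(f, g):
    return sum(a * b for a, b in zip(f, g))

def norm(f):
    return math.sqrt(dot(f, f))

F, G = (1, 2, 3), (4, -5, 6)                 # hypothetical vectors
cos_theta = dot(F, G) / (norm(F) * norm(G))
print(abs(dot(F, G)) <= norm(F) * norm(G))   # True (Cauchy-Schwarz)
print(-1 <= cos_theta <= 1)                  # True, so theta is well defined
```

Orthogonality is just the special case dot(F, G) == 0, which is how Problems 1 through 5 are decided.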
Section 6.3
In Problems 1 through 5, the values are given in order for F × G, cos θ, sin θ, and the common value of ‖F‖ ‖G‖ sin θ and ‖F × G‖.
1. 8i + 2j + 12k; −4/√69; √53/√69; 2√53
3. −8i − 12j − 5k; −12/(√29 √13); √233/(√29 √13); √233
5. 18i + 50j − 60k; 62/(5√2 √109); √1606/(5√2 √109); 2√1606
7. Not collinear; x − 2y + z = 3  9. Not collinear; 2x − 11y + z = 0  11. Not collinear; 29x + 37y − 12z = 30
13. 7√2  15. 2√209  17. 92  19. 98  21. 22  23. i − j + 2k  25. 7i + j − 7k
27. F × (G + H) = det|i j k; f1 f2 f3; g1 + h1, g2 + h2, g3 + h3| = det|i j k; f1 f2 f3; g1 g2 g3| + det|i j k; f1 f2 f3; h1 h2 h3| = F × G + F × H
31. [F G H] = (a1 i + b1 j + c1 k) · det|i j k; a2 b2 c2; a3 b3 c3|
= det|a1 b1 c1; a2 b2 c2; a3 b3 c3|
= a1(b2 c3 − c2 b3) + b1(c2 a3 − a2 c3) + c1(a2 b3 − b2 a3)
Section 6.4
In Problems 1 through 5, we give in order the sum of the two vectors, their dot product, and the cosine of the angle between them.
1. 5e1 + 5e2 + 6e3 + 5e4 + e5; 0; 0
3. 17e1 − 4e2 + 3e3 + 6e4; 24; 6/(√30 √17)
5. 6e1 + 7e2 − 6e3 − 2e5 − e6 + 11e7 + 4e8; −94; −47/(√37 √303)
7. S is a subspace of R5.  9. S is not a subspace of R6.  11. S is a subspace of R4.  13. S is a subspace of R4.
15. ‖F + G‖² + ‖F − G‖² = (F + G) · (F + G) + (F − G) · (F − G) = F · F + 2F · G + G · G + F · F − 2F · G + G · G = 2F · F + 2G · G = 2(‖F‖² + ‖G‖²)
17. Using part of the calculation from Problem 15, ‖F + G‖² = ‖F‖² + 2F · G + ‖G‖². If also ‖F + G‖² = ‖F‖² + ‖G‖², then F · G = 0 and the vectors are orthogonal.
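Problem 15's parallelogram law holds in Rⁿ for any n; a numerical spot-check on random vectors in R⁵ (a sketch, not part of the exercise set):

```python
import math
import random

def dot(f, g):
    return sum(a * b for a, b in zip(f, g))

# Check ||F+G||^2 + ||F-G||^2 = 2(||F||^2 + ||G||^2) on random pairs:
random.seed(1)
for _ in range(100):
    F = [random.uniform(-1, 1) for _ in range(5)]
    G = [random.uniform(-1, 1) for _ in range(5)]
    S = [a + b for a, b in zip(F, G)]
    D = [a - b for a, b in zip(F, G)]
    assert math.isclose(dot(S, S) + dot(D, D), 2 * (dot(F, F) + dot(G, G)))
print("parallelogram law verified on 100 random pairs")
```

Of course the algebraic proof in Problem 15 is what establishes the identity; the check just illustrates it.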
Section 6.5
1. Independent  3. Independent  5. Dependent  7. Dependent  9. Independent
13. Dependent; [F G H] = 0  15. Independent; [F G H] = −44 ≠ 0
17. A basis consists of (1, 0, 0, −1) and (0, 1, −1, 0); the dimension is 2.
19. A basis consists of (1, 0, −2) and (0, 1, 1); the dimension is 2.
21. A basis consists of (1, 0, 0, 0), (0, 0, 1, 0), and (0, 0, 0, 1); the dimension is 3.
23. A basis consists of (1, 4); the dimension is 1.
CHAPTER 7 Section 7.1
1.
5.
9.
11. 13. 17.
23.
2 + 2x − x2 −12x + 1 − xx + ex + 2 cosx x x 2x x −22 − 2x + e + 2e cosx 4 + 2x + 2e + 2xe −10 −34 −16 −30 −14 −36 0 68 196 20 10 −2 −11 −8 −45 ; BA is not defined. 7. AB = 128 −40 −36 −8 72 −5 1 15 61 −63 ⎞ ⎛ 3 −18 −6 −42 66 ⎜ −2 12 4 28 −44 ⎟ ⎟ ⎜ 36 12 84 −132 ⎟ AB = 115 BA = ⎜ −6 ⎝ 0 0 0 0 0 ⎠ 4 −24 −8 −56 88 410 36 −56 227 AB is not defined; BA = 17 253 40 −1 39 −84 21 AB is not defined; BA = −16 −13 −5 15. AB = BA is not defined. −23 38 3 AB is 14 × 14; BA is 21 × 21. 19. AB is not defined; BA is 4 × 2. 21. AB is not defined; BA is 7 × 6. ⎞ ⎞ ⎛ ⎛ 2 7 7 4 4 14 17 17 18 18 ⎜ 7 8 9 9 9 ⎟ ⎜ 17 34 33 26 26 ⎟ ⎟ ⎟ ⎜ ⎜ A3 = ⎜ 7 9 8 9 9 ⎟ and A4 = ⎜ 17 33 34 26 26 ⎟. ⎝ 4 9 9 6 7 ⎠ ⎝ 18 26 26 25 24 ⎠ 4 9 9 7 6 18 26 26 24 25 The number of distinct v1 − v4 walks of length 3 is A3 14 , or 4; the number of v1 − v4 walks of length 4 is A4 14 = 18. The number of distinct v2 − v3 walks of length 3 is A3 23 = 9; the number of distinct v2 − v4 walks of length 4 is 26. 14 10 −26
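The walk-counting rule used in Problems 23 and 25 (entry (i, j) of Aᵏ counts the distinct vi-vj walks of length k) is easy to illustrate with plain matrix multiplication. A sketch on a small hypothetical path graph v1 - v2 - v3 (not the book's graph):

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

# Adjacency matrix of the path v1 - v2 - v3 (hypothetical example):
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
A2 = matmul(A, A)
A3 = matmul(A2, A)
print(A2[0][2])   # 1: the single length-2 walk v1-v2-v3
print(A3[0][1])   # 2: v1-v2-v1-v2 and v1-v2-v3-v2
```

The book's values (A³)₁₄ = 4 and (A⁴)₁₄ = 18 in Problem 23 come from exactly this computation on its 5-vertex graph.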
−2 −5 −43
6 −6 −8
3.
Answers and Solutions to Selected Problems ⎞ ⎞ ⎛ ⎛ 4 2 3 3 2 10 10 11 11 10 42 32 41 ⎜ 2 3 2 2 3 ⎟ ⎜ 10 ⎜ 32 30 32 6 10 10 6 ⎟ ⎟ ⎟ ⎜ ⎜ ⎜ 25. A2 = ⎜ 3 2 4 3 2 ⎟ A3 = ⎜ 11 10 10 11 10 ⎟, and A4 = ⎜ 41 32 42 ⎝ 3 2 3 4 2 ⎠ ⎝ 11 10 11 10 10 ⎠ ⎝ 41 32 41 2 3 2 2 3 10 6 10 10 6 32 30 32 The number of distinct v4 − v5 walks of length 2 is 2, the number of v2 − v3 walks of length 3 v1 − v2 walks of length 4 is 32, and the number of v4 − v5 walks of length 4 is 32. ⎛
41 32 41 42 32 is 10,
32 30 32 32 30 the
⎞ ⎟ ⎟ ⎟. ⎠ number of
Section 7.2 In Problems 1 through 7, the first matrix is A, the second is . ⎞ ⎛ ⎞ ⎛ 1 √0 0 −2 √1 4√ 2 √ 1. ⎝ 0 3 16 3 3 3 ⎠ ⎝ 0 3 0 ⎠ 1 −2 4 8 0 0 1 ⎛ ⎞ ⎛ ⎞ 40√ 5√ −15 0 5 √0 √ 3. ⎝ −2 + 2 13 14 + 9 13 6 + 5 13 ⎠ ⎝ 1 0 13 ⎠ 2 9 5 0 0 1 −1 0 3 0 √ 15 30 √ 120√ −36 28 −20 7. 5. 1 3 −3 + 2 3 15 + 8 3 −13 3 44
0 28 9
1 0 14
0 0 1
0 4 0
Section 7.3 In Problems 1 through 11, the reduced matrix AR is given first, then a matrix such that A = AR . ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎛ 0 1 0 −1 0 0 1 1 −4 −1 0 1 1 0 1 0 5 0 0 1⎟ ⎜ 0 0 0 1⎟ ⎜0 1⎟ ⎜0 ⎜0 0 1 2 0 1 0 5. ⎝ 3. ⎝ 1. 0 0⎠ ⎝1 0 0 0 0⎠ ⎝ 0 0 1 0⎠ 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 ⎛ 1 0 0 1 0 0 −8 −2 38 1 1 0 0 0 0 1 ⎜ 0 1 0 0 1 0 ⎝ 37 43 −7 7. 9. 1 1 11. 0 1 23 21 270 2 2 0 0 1 0 0 1 19 −29 11
⎞ 0 1 −3 0 0 1⎟ 0 −6 17 ⎠ 1 0 0 ⎞ 0 21 −1 ⎟ 0 0 1⎠ − 17
2 7
− 37
Section 7.4 In Problems 1 through 13, first the reduced matrix is given, then its rank, a basis for its row space (as row vectors), the dimension of the row space, a basis for the column space (as column vectors) and the dimension of the column space. 1 −3 1 0 1 0 − 35 1 −4 2 2 2 0 1 2 −3 1 2 2 2 2 3. 2 −4 1 3 2 2 0 2 1. 2 2 0 1 35 −3 4 0 0 1 0 − 41 21 −4 8 5. 2 2 8 −4 3 2 1 −1 1 0 2 1 5 −1 1 0 1 −4 2 ⎞ ⎞ ⎛ ⎞ ⎛ ⎛ ⎛ ⎞ 1 0 0 2 2 1 0 1 0 1 −1 3 ⎟ ⎟ ⎜ ⎟ ⎜ ⎜ ⎜ ⎟ 7. ⎝ 3 2 2 1 1 −1 3 0 0 1 3 ⎝ 3 0 0 1 ⎠ 0 ⎠ ⎝ 0 ⎠ ⎝ 1 ⎠ 0 0 0 4 0 7 3 4 0 1 0 0 0 1 0 3 0 4 3 6 1 0 2 2 2 3 6 1 0 3 9. 2 2 2 0 0 1 2 2 −3 1 0 0 1 0 5 3 0 1 0 3 −3 2 2 1 0 5 0 0 2 3 11. 2 0 0 0 0 1 5 −2 1 0 −11 1 2 0 0 1 −3 2 −2 5 7 0 1 −3 2 13. 11 −4 0 0 0
Answers and Solutions to Selected Problems
A18
Section 7.5 ⎛ ⎞ ⎞ ⎛ ⎜ −1 1 ⎜ 0 ⎜ ⎜ 1 ⎟ ⎜ −1 ⎟ 3. 0 (only the trivial solution) 5. ⎜ 1. ⎝ +⎝ ⎠ ⎠ ⎜ 0 1 ⎜ 0 ⎝ 1 0 ⎛
− 49 − 47 − 58 13 8
⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠
1 ⎛ ⎛ ⎜ ⎜ 9. ⎜ ⎝
5 14 11 7 6 7
⎞ ⎟ ⎟ ⎟ ⎠
1
⎜ ⎜ ⎜ ⎜ ⎜ 11. ⎜ ⎜ ⎜ ⎜ ⎜ ⎝
1 1 0 1 1 0 0
⎛ ⎞ ⎞ ⎛ 0 −2 ⎜ − 23 ⎟ ⎟ ⎜ 1 ⎟ ⎜ ⎟ ⎟ ⎜ 2 ⎟ ⎟ ⎜ 2 ⎟ ⎟ ⎜ ⎜ 3 ⎟ ⎟ ⎜ −3 ⎟ ⎟ ⎜ 4 ⎟ ⎟ ⎜ ⎟+⎜ − ⎟+! ⎜ 0 ⎟ ⎟ ⎜ 3 ⎟ ⎟ ⎜ ⎜ 0 ⎟ ⎟ ⎟ ⎜ ⎜ ⎟ ⎟ ⎜ 0 ⎟ ⎟ ⎜ ⎟ ⎟ ⎜ ⎝ 1 ⎠ ⎠ ⎝ 0 ⎠ 0 1 ⎞
⎛
− 56
⎜ −2 ⎜ 3 ⎜ 8 ⎜ − 3 7. ⎜ ⎜ 2 ⎜ −3 ⎜ ⎝ 1 0
⎛
⎞
− 59
⎜ − 10 ⎟ ⎜ ⎟ 9 ⎜ 13 ⎟ ⎜ − ⎟ 9 ⎟ + ⎜ ⎜ ⎟ ⎜ − 19 ⎟ ⎜ ⎟ ⎝ ⎠ 0 1
⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠
Section 7.6 1. 2 3. 0 5. 1 7. 2 9. 1 11. 3 13. Yes, provided that rank(A) < number of unknowns in the system.
Section 7.7 In each of Problems 1 through 13, the unique or system has no solution. ⎛ ⎛ 0 ⎞ ⎛ 1 ⎞ ⎛ ⎞ ⎜ ⎜ 0 ⎟ ⎜ 1 ⎟ 1 ⎜ ⎜ 1 ⎟ ⎜ 3 ⎟ ⎜ ⎜ ⎟ ⎟ ⎟ ⎜ ⎜ 1. ⎝ 21 ⎠ 3. ⎜ 2 ⎟ + ⎜ 2 ⎟ + ! ⎜ ⎜ ⎜ 0 ⎟ ⎜ 1 ⎟ ⎜ ⎝ ⎠ ⎠ ⎝ 4 ⎝ 0 1 0 0 ⎛ ⎜ ⎜ ⎜ 7. ⎜ ⎜ ⎝
− 21 −1 3 1 0
⎞
⎛
⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟+⎜ ⎟ ⎜ ⎠ ⎝
− 43 1 −2 0 1
⎞
9 8
⎞
⎟ ⎜ 2 ⎟ ⎟ ⎜ ⎟ ⎟ ⎜ ⎟ ⎟+⎜ 0 ⎟ ⎟ ⎜ ⎟ ⎠ ⎝ 0 ⎠ 0
⎛
⎞ ⎛ 22 ⎞ − 19 15 15 ⎜ 3⎟ ⎜ −5⎟ ⎜ ⎟ ⎜ ⎟ 11. ⎜ 67 ⎟ + ⎜ 121 ⎟ ⎝ 15 ⎠ ⎝− 15 ⎠
⎛
general solution is given, whichever applies, or it is noted that the
⎞ ⎛ 9 ⎞ − 17 2 2 ⎜ ⎟ −6 ⎟ ⎟ ⎜ 3 ⎟ 51 ⎟ 25 ⎟ ⎜ − 4 ⎟+⎜ 4 ⎟ ⎟ ⎜ ⎟ 0 ⎟ ⎜ 0 ⎟ 0 ⎠ ⎝ 0 ⎠ 1 0 ⎛ ⎞ ⎛ 1 −1 ⎜ 0 ⎜ 1 ⎟ ⎜ ⎟ ⎜ ⎜ ⎟ ⎜ ⎜ 0 ⎜ 0 ⎟ ⎜ ⎟ ⎜ ⎟+⎜ 1 0 9. ⎜ ⎜ ⎟ ⎜ ⎜ 0 ⎜ 0 ⎟ ⎜ ⎟ ⎜ ⎜ ⎟ ⎜ ⎝ 0 ⎝ 0 ⎠ 0 0
⎛ ⎞ ⎛ ⎛ 2 ⎞ −2 −4 ⎞ −1 ⎜ ⎟ −4 ⎟ ⎜ 2 ⎟ ⎜ 9 ⎟ ⎜ ⎜ 7 ⎟ ⎟ ⎜ −2 ⎟ ⎜ ⎜ −38 ⎟ ⎜ ⎟ ⎜ ⎟ 5. ⎜ 3 ⎟ + ⎜ 3 ⎟ + ⎜ − 11 ⎟ ⎜ 2 ⎟ 2 ⎟ ⎜ −4 ⎟ ⎜ ⎝ ⎠ ⎠ ⎝ 0 ⎠ ⎝ 1 0 0 0 1 ⎛ 3⎞ ⎞ ⎛ 1 ⎞ ⎛ 29 ⎞ ⎛ ⎞ − 14 −1 −7 14 ⎜ 0⎟ ⎟ ⎜ 0⎟ ⎜ 0⎟ ⎜ 0⎟ ⎜ ⎟ ⎜ ⎜ ⎟ ⎟ ⎟ ⎜ ⎟ ⎜ 3⎟ ⎟ ⎜ 1 ⎟ ⎜ 1⎟ ⎜ ⎟ ⎜ 14 ⎟ ⎟ ⎜− 14 ⎟ ⎜ 7 ⎟ ⎜ 0⎟ ⎜ ⎟ ⎜ ⎜ ⎟ ⎟ ⎟ ⎜ ⎟ ⎟ + ! ⎜ 0⎟ + ⎜ 0⎟ + ⎜ 0⎟ + ⎜ 0⎟ ⎜ ⎟ ⎜ ⎜ ⎟ ⎟ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎜ 0⎟ ⎟ ⎟ ⎜ ⎟ ⎜ 1⎟ ⎟ ⎜ 0⎟ ⎜ 0⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎜ ⎟ ⎟ ⎟ ⎜ ⎟ ⎝ 0⎠ ⎠ ⎝ 0⎠ ⎝ 0⎠ ⎝ 1⎠ 0
0
1
0
⎛ 16 ⎞ 57
⎜ ⎟ 13. ⎝ 99 57 ⎠ 23
57  0 1
15. AX = B has a solution if and only if rank(A) = rank([A | B]). Since AR is a reduced matrix, this occurs only if the last n − r rows of B are zero rows; hence b_{r+1} = b_{r+2} = ··· = b_n = 0.
Section 7.8 1 −1 1. 2 5
2 1
1 −2 3. 1 12
2 5
3 1 5. 12 −3
−2 6
⎛ −6 1 ⎜ 7. ⎝ 3 31 1
11 10 −7
⎞ 2 ⎟ −1⎠ 10
⎛ −6 1 ⎜ 9. ⎝ 3 12 −3
6 9 3
⎞ 0 ⎟ −2⎠ 2
Answers and Solutions to Selected Problems ⎞ −23 ⎟ 1 ⎜ ⎜−75⎟ 11. ⎟ ⎜ 11 ⎝ −9⎠ 14 ⎛
⎞ 22 1⎜ ⎟ 13. ⎝27⎠ 7 30 ⎛
A19
1 −21 14 15. 5 0
CHAPTER 8 Section 8.1 1. 1, 2, 3 (the identity permutation) is even, as are 2, 3, 1 and 3, 1, 2; the permutations 1, 3, 2 and 2, 1, 3 and 3, 2, 1 are odd.
Section 8.2 1. −3
3. −62
Section 8.3
−1sgnp b1p1 b2p2 bnpn = −1sgnp a1p1 a2p2 anpn p p = n −1sgnp a1p1 a2p2 anpn = n detA
1. detB =
p
3. Using the result of Problem 1, detA = −At = −1n detAt = −1n detA. If n is odd, then −1n = −1, so detA = −detA, hence detA = 0.
Section 8.4 1. −22
3. −14
5. −2247
7. −122
9. −72
Section 8.5 1. 32 3. 3 5. −773 7. −152 9. 1693 13. 1 = detIn = detAA −1 = detA detA−1 = detA detAt = detA2 , hence detA = ±1.
Section 8.6 3. −1440
1. 2240
Section 8.7
1.
6 1 13 −1
1 2
⎛
210 1 ⎜ ⎜ 899 9. ⎜ 378 ⎝ 275 −601
3. −42 −124 −64 122
1 −4 1 5 42 223 109 −131
1 1
⎛
5 1 ⎜ −8 5. ⎝ 32 −2 ⎞ 0 −135⎟ ⎟ ⎟ −27⎠ 81
3 −24 −14
⎞ 1 ⎟ 24⎠ 6
⎛
−1 1 ⎜ 7. ⎝−8 29 −1
25 −3 −4
⎞ −21 ⎟ 6⎠ 8
Section 8.8 1. x1 = −11/47 x2 = −100/47 3. x1 = −1/2 x2 = −19/22 x3 = 2/11 5. x1 = 5/6 x2 = −10/3 x3 = −5/6 7. x1 = −86 x2 = −109/2 x3 = −43/2 x4 = 37/2 9. x1 = 11/31 x2 = −409/93 x3 = −1/93 x4 = 116/93
Section 8.9 1. 21
3. 61
5. 61
A20
Answers and Solutions to Selected Problems
CHAPTER 9 Section 9.1 In the following, when an eigenvalue is listed twice in succession, this eigenvalue has multiplicity 2, but does not have two linearly independent eigenvectors (hence only one eigenvector is listed for such an eigenvalue). In Problems 1–15, MAPLE code for the Gerschgorin circles is included. √ √ √ √ 6/2 − 6/2 1 − 6 1. 1 + 6 1 1 plot({[1+(3)*sin(t),3*cos(t), t=-Pi..Pi],[1+(2)*sin(t),2*cos(t), t=-Pi..Pi]}, scaling=CONSTRAINED); # and plot POINTS at (1,0) 7 0 3. −5 2 −1 1 plot({[2+sin(t),cos(t), t=-Pi..Pi]}, scaling=CONSTRAINED); # and plot POINTS at (-5,0) and (2,0) √ √ √ √ 1 −1 + 47i 1 −1 − 47i 5. 3 + 47i 3 − 47i 2 2 4 4 plot({[1+ (6)*sin(t),6*cos(t), t=-Pi..Pi],[2+ (2)*sin(t),2*cos(t), t=-Pi..Pi]}, scaling=CONSTRAINED); # and plot POINTS at (1,0) and (2,0) ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ 0 2 0 ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ 7. 0 ⎝1⎠ 2 ⎝1⎠ 3 ⎝2⎠ 0 0 3 plot({[3*sin(t),3*cos(t), t=-Pi..Pi]}, scaling=CONSTRAINED); # and plot POINTS at (2,0), (0,0) and (3,0) ⎛ ⎞ ⎛ ⎞ 1 1 ⎜ ⎟ ⎜ ⎟ 9. 0 0 ⎝0⎠ −3 ⎝0⎠ 3 0 plot({[-3+(2)*sin(2),2*cos(t),t=-Pi..Pi],[sin(t),cos(t),t=-Pi..Pi]}, scaling=CONSTRAINED); # and plot POINTS at (-3,0) and (0,0) ⎞ ⎛ ⎞ ⎛ 0 −16 ⎟ ⎜ ⎟ ⎜ 11. 2 2 ⎝0⎠ −14 ⎝ 0 ⎠ 1 1 plot({[-14+sin(t),cos(t),t=-Pi..Pi],[2+sin(t),cos(t),t=-Pi..Pi]}, scaling=CONSTRAINED); # and plot POINTS at (-14,0) and (2,0) ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ 6 0 14 ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ 13. 0 ⎝ 7⎠ 1 ⎝0⎠ 7 ⎝0⎠ 1 10 5 plot({[1+(2)*sin(t),2*cos(t),t=-Pi..Pi],[7+(5)*sin(t),5*cos(t),t=-Pi..Pi]}, scaling=CONSTRAINED); # and plot POINTS at (1,0), (0,0) and (7,0) √ ⎞ √ ⎞ ⎞ ⎛ ⎞ ⎛ ⎛ ⎛ 0 −2 −7 + 53 −7 − 53 ⎜0⎟ 1 ⎜−11⎟ ⎟ 1 ⎟ ⎜ ⎜ √ √ 0 0 ⎟ ⎜ ⎟ ⎜ ⎟ ⎟ ⎜ ⎜ 15. 1 ⎜ ⎟; 2 ⎜ ⎟; −1 + 53 ⎜ ⎟; −1 − 53 ⎜ ⎟ ⎝1⎠ 2 ⎝ 0⎠ ⎠ 2 ⎠ ⎝ ⎝ 0 0 0 1 2 2 plot({[-4+(2)*sin(t),2*cos(t),t=-Pi..Pi],[3+sin(t),cos(t),t=-Pi..Pi]}, scaling=CONSTRAINED); # and plot POINTS at (-4,0), (1,0), (2,0) and (3,0) 17. The characteristic polynomial is I2 − A = 2 − + ! + ! − 2 . Now + !2 − 4! − 2 = − !2 + 42 ≥ 0, so the eigenvalues are real. 19. Suppose AE = E and E = O. 
Then A2 E = AAE = AE = AE = 2 E, so 2 is an eigenvalue of A2 with eigenvector E. The same idea holds for k > 2.
Answers and Solutions to Selected Problems
A21
21. The constant term of pA = In − A is found by setting = 0, yielding − A, which equals −1n A. 0 can be an eigenvalue of A if and only if this constant term is zero, which occurs exactly when A = 0, and this occurs exactly when A is singular.
Section 9.2 √ √ 1 3 + 7i 3 − 7i 3. Not diagonalizable 1. 8 −8 −8 ⎞ ⎛ 1 0 0 0 √ √ ⎜ 2+3 5 ⎟ ⎟ ⎜0 1 2−341 3 41 ⎟ √ √ 9. ⎜ ⎜ −1− 5 ⎟ −1+ 5 ⎠ ⎝0 0 2 2 0 0 1 1
⎛
0 ⎜ 5. ⎝1 0
5 1 0
0
⎞
⎟ − 23 ⎠ 1
7. Not diagonalizable
11. Hint: A2 has n linearlyindependent eigenvalues 1 n .Thus eigenvectors X1 Xn with associated A2 − j In Xj = A− j In A+ j In Xj = O. Now show that either A− j In Xj = O or A+ j In Xj = O. 1 0 0 222 15. 13. 1 1 − 518 518 221 0 4
Section 9.3
√ √ ⎞ √ √ ⎛ √1+ 2 √1− 2√ √ √ 1+ 2 1 − 2 ⎝ 4+2√2 4−2 2 ⎠ 3. 5 + 2 ; 5 − 2 ; √ 1 √ √ 1 √ 1 1 4+2 2 4−2 2 √ √ ⎞ ⎛ ⎛ √ ⎞ √ ⎞ ⎛ 1− 2 1+ 2 √ √ 0 √ √ 1+ 2 1− 2 4+2 2 4−2 2 ⎟ √ ⎜ √ ⎜ ⎟ ⎟ ⎜ 1 1 ⎟ −1 + 2 ⎝ 1 ⎠; −1 − 2 ⎝ 1 ⎠; ⎜ ⎝0 √4+2√2 √4−2√2 ⎠ 0 0 1 0 0 √ √ ⎞ ⎛ ⎛ √ ⎞ √ ⎞ ⎛0 √ 5+ 41 √ 5− 41√ √ 41 41 5 + 5 − 82+10 41 82−10 41 ⎟ √ √ 1 ⎟ 1 ⎟ ⎜ ⎜ ⎜ ⎟ 1 0 0 5 + 41 ⎝ 0 ⎠; 5 − 41 ⎝ 0 ⎠; ⎜ ⎝ ⎠ 2 2 √ 4 √ 0 √ 4 √ 4 4 82+10 41 82−10 41 ⎛ ⎞ ⎞ ⎞ 1 ⎛ ⎛ 0 0 √ √ 0 0 ⎟ √ √ √ ⎟ 1 √ ⎟ ⎜ 1 − √1− 17√ ⎟ ⎜0 − √1+ 17√ ⎜ ⎜ 1 + 17 ⎝−1 − 17⎠; 1 − 17 ⎝−1 + 17⎠; ⎜ 34+2 17 34−2 17 ⎟ ⎝ ⎠ 2 2 4 4 √ 4 √ √ 4 √ 0 34+2 17 34+2 17 ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ 1 0 0 0 0 0 0 ⎜0⎟ ⎜1⎟ ⎜−1⎟ ⎜0 0 √1 − √12 ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ 2 0 ⎜ ⎟; −1 ⎜ ⎟; 3 ⎜ ⎟; ⎜ ⎟ 1 √1 ⎠ ⎝0⎠ ⎝1⎠ ⎝ 1⎠ ⎝0 0 √ 2 2 1 0 0 0 1 0 0
1 −2 1 1 1. 0 ; 5 ; √ 2 1 5 2 ⎛ ⎞ 0 ⎜ ⎟ 5. 3 ⎝0⎠; 1 ⎛ ⎞ 0 ⎜ ⎟ 7. 0 ⎝1⎠; 0 ⎛ ⎞ 1 ⎜ ⎟ 9. 0 ⎝0⎠; 0 ⎛ ⎞ 1 ⎜0 ⎟ ⎜ ⎟ 11. 0 ⎜ ⎟; ⎝0 ⎠ 0
−2 1
Section 9.4 1.
1 1
⎛
1 6
3.
1 −2
−2 1
−1 ⎜ 0 ⎜ 5. ⎜ 1 ⎝− 2
0
− 21
0
2
2
0
−1
3 2
0
−1
⎞
3⎟ 2⎟
⎟ 0⎠
√ √ 7. −1 + 2 5y12 + −1 − 2 5y22
1 √ √ √ √ 2 2 2 9. 2 + 29y1 + 2 − 29y2 11. 2 + 13y1 + 2 − 13y22 13. −y12 + y22 + 2y32 √ √ 15. 21 61y12 − 21 16y22 = 5; hyperbola 17. 5y12 − 5y22 = 8; hyperbola 19. −2x12 + 2x1 x2 + 6x22 21. 6x12 + 2x1 x2 − 14x1 x3 + 2x22 + x32
Section 9.5
i ; not diagonalizable 1 ⎛ ⎞ ⎞ ⎛ ⎛ ⎞ ⎞ ⎛ 1 1 1 1 1 1 √ √ ⎜ 0 ⎟ √ ⎜ √ ⎟ √ ⎜ √ ⎟ ⎜ 0 3i − 3i ⎟ ⎟ ⎟ ⎜ 3. Skew-hermitian; 0 ⎜ ⎝ 1 + i ⎠; 3i ⎝ 3i ⎠; − 3i ⎝ − 3i ⎠; ⎝ ⎠ 1+i −1 − i −1 − i −1 − i −1 − i 2 2 2 − 5 + 3 = 0. and corresponding eigenvectors are, approximately, 5. Hermitian;⎛eigenvalues satisfy 3 − 3⎛ ⎞ ⎞ Eigenvalues ⎛ ⎞ 1 1 1 ⎟ ⎟ ⎜ ⎟ ⎜ ⎜ 4051374 ⎝ 0525687 ⎠; 0482696 ⎝−1258652⎠; −153407 ⎝ −2267035 ⎠; −1477791i 2607546i −0129755i ⎛ ⎞ 1 1 1 ⎜ ⎟ −1258652 −2267035 ⎠ ⎝ 0525687 −0129755i 2607546i −147791i 1. None of these; 2 2
3 2 7. Skew-hermitian; 4i = 0. Eigenvalues and eigenvectors are, approximately, ⎛ eigenvalues ⎞ satisfy −⎛i + 5 −⎞ ⎞ ⎛ −i i i ⎟ ⎟ ⎟ ⎜ ⎜ ⎜ −2164248i ⎝−3164248⎠; 0772866i ⎝0227134⎠; 2391382i ⎝−1391382⎠; 2924109 0587772 −1163664 ⎞ ⎛ −i i i ⎟ ⎜ ⎝−3164248 0227134 −1391382⎠ 2924109 0587772 −1163664 ⎛ ⎞ ⎛ ⎛ √ ⎞ √ ⎞ √ ⎞ ⎛ √ 0 4+3 2 4−3 2 0 4+3 2 4−3 2 √ ⎜ √ ⎜ ⎜ ⎟ ⎟ ⎟ ⎜ ⎟ 9. Hermitian; 0 ⎝ i ⎠; 4 + 3 2 ⎝ −1 ⎠; 4 − 3 2 ⎝ −1 ⎠; ⎝ i −1 −1 ⎠ 1 −i −i 1 −i −i
CHAPTER 10 Section 10.1
3e6t ; x1 t = −3e2t + 3e6t x2 t = 3e2t + e6t e6t √ √ √ 4e1−2 3t 4e1+2 3t √ 1+2√3t √ 5 3. t = + 1 − 53 3 e1−2 3t , √ 1+2√3t √ 1−2√3t ; x1 t = 1 + 3 3 e −1 + 3e −1 − 3e √ √ √ √ x2 t = 1 − 16 3 e1+2 3t + 1 + 16 3 e1−2 3t ⎞ ⎛ t 0 e−3t e ⎟ ⎜ et 3e−3t ⎠; x1 t = 10et − 9e−3t x2 t = 24et − 27e−3t x3 t = 14et − 9e−3t 5. t = ⎝ 0 t t −3t −e e e
1. t =
−e2t e2t
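Problem 1's printed answer can be verified directly. Its fundamental matrix has Φ(0) = [[−1, 3], [1, 1]], so the coefficient matrix must be A = Φ′(0)Φ(0)⁻¹ = [[5, 3], [1, 3]] (an inference; the system itself is not reproduced here), and the components x1(t) = −3e^(2t) + 3e^(6t), x2(t) = 3e^(2t) + e^(6t) should satisfy X′ = AX. A pure-Python spot-check:

```python
import math

# Problem 1's solution components, as printed above:
x1 = lambda t: -3 * math.exp(2 * t) + 3 * math.exp(6 * t)
x2 = lambda t:  3 * math.exp(2 * t) +     math.exp(6 * t)

# Coefficient matrix recovered from Phi'(0) Phi(0)^{-1} (inferred):
a11, a12, a21, a22 = 5, 3, 1, 3

# Check X' = AX numerically with a centered difference:
h = 1e-6
for t in (0.0, 0.3, 0.7):
    d1 = (x1(t + h) - x1(t - h)) / (2 * h)
    d2 = (x2(t + h) - x2(t - h)) / (2 * h)
    assert math.isclose(d1, a11 * x1(t) + a12 * x2(t), rel_tol=1e-6)
    assert math.isclose(d2, a21 * x1(t) + a22 * x2(t), rel_tol=1e-6)
print("x1, x2 satisfy X' = AX")
```

The same Φ′(0)Φ(0)⁻¹ trick recovers the coefficient matrix from any of the fundamental matrices printed in this chapter.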
Section 10.2
0 7c1 e3t 7e3t is a fundamental matrix; general solution is Xt = tC = . 1. t = 5c1 e3t + c2 e−4t 5e3t e−4t c1 + c2 e2t 1 e2t ; Xt = tC = 3. t = 2t −1 e −c1 + c2 e2t ⎛ ⎞ ⎞ ⎛ 3t −4t −e 1 2e c1 + 2c2 e3t − c3 e−4t ⎜ ⎟ ⎟ ⎜ 3e3t 2e−4t ⎠; Xt = ⎝ 6c1 + 3c2 e3t + 2c3 e−4t ⎠ 5. t = ⎝ 6 −13 −2e3t e−4t −13c1 − 2c2 e3t + c3 e−4t
Answers and Solutions to Selected Problems
A23
e−3t 2e4t is a fundamental matrix; the solution of the initial value problem is −3e4t 2e−3t 6e4t − 5e−3t Xt = . −9e4t − 10e−3t ⎞ ⎞ ⎞ ⎛ ⎛ 6 −t 51 4t ⎞ ⎛ 2t ⎛ −t 0 e4t 0 e2t 3e3t 4e − 3e3t 3e −5e + 5 e ⎟ ⎟ ⎟ ⎟ ⎜ ⎜ ⎜ ⎜ t = ⎝1 e2t e3t ⎠; Xt = ⎝2 + 4e2t − e3t ⎠ 11. t = ⎝−5e−t −e2t 0 ⎠; Xt = ⎝ 2e−t − 3e2t ⎠ 0 e2t 1 0 e3t 2 − e3t 0 3e2t 2t 8 2t t t 2 5e cost 5e sint 2e cos2t 2e sin2t c1 t + c2 t 17. 15. t = c1 t8 − 2c2 t2 et 2 cost + sint et 2 sint − cost e2t sin2t −e2t cos2t ⎞ ⎛ e−t sin2t 0 e−t cos2t 2 cost − 14 sint ⎟ ⎜ −t −t e cos2t − 2 sin2t e sin2t + 2 cos2t⎠ 21. ⎝ 0 10 cost − 20 sint 3e−t cos2t 3e−t sin2t e−2t ⎞ ⎛ t 2e + 5et cost + sint 3t 3t 2te3t e c1 e + 2c2 te3t ⎟ ⎜ t ; general solution Xt = ⎝2e + et 2 cost + 6 sint⎠ 25. t = 0 e3t c2 e3t 2et + et cost + 3 sint ⎛ 2t ⎞ ⎞ ⎛ 2t e 3e5t 27te5t c1 e + 3c2 + 27c3 te5t ⎜ ⎟ ⎟ ⎜ 3e5t 3 + 27te5t ⎠; Xt = ⎝ 3c2 + 3 + 27tc3 e5t ⎠ ⎝0 0 −e5t 2 − 9te5t
−c2 + 2 − 9tc3 e5t ⎛ ⎞ ⎛ ⎞ 3t t 2 3e 2c1 + 3c2 e3t + c3 et e 0 ⎜0 2e3t 0 −2et ⎟ ⎜ 2c e3t − 2c et ⎟ 5 + 2te6t ⎜ ⎟ ⎟ ⎜ 2 4 ; Xt = 31. Xt = ⎜ ⎟ ⎟ ⎜ ⎝1 23t ⎝c1 + 2c2 e3t − 2c4 et ⎠ 3 + 2te6t 0 −2et ⎠ t t 0 0 0 e c4 e ⎞ ⎛ ⎛ 2t ⎞ 2 cost + 6 sint −4t −e + 1 + 22te ⎜−2 cost + 4 sint⎟ 3c1 e2t + c2 e6t ⎟ ⎟ ⎜ ⎜ Xt = ⎝ −6e2t + 10e−4t ⎠ 35. Xt = ⎜ 37. Xt = ⎟ 1 − 9te2t ⎠ ⎝ −c1 e2t + c2 e6t 12e−4t 2t 4 − 9te 3t t 3t 2te c1 e + 5c2 e7t e 41. eAt = Xt = −c1 et + c2 e7t 0 e3t ⎞ ⎛ 2t 23 + 3te5t − 23 e2t −1 + 9te5t + e2t e 4t 4t −4te 1 − 2te ⎟ ⎜ 45. eAt = ⎝ 0 eAt = 1 + 3te5t 9te5t ⎠ te4t 1 + 2te4t 0 −te5t 1 − 3te5t
7. t =
9.
13.
19.
23.
27.
29.
33.
39.
43.
Section 10.3
1. Xt =
c1 1 + 2t + 2c2 t + t2 e3t
3. Xt =
c1 + 1 + tc2 + 2t + t2 − t3 e6t
c1 + c2 t + 4t2 − t3 e6t
−2c1 t + 1 − 2tc2 + t − t2 e3t + 23 et ⎞ ⎛ c2 e t ⎜ 1 − 2c et + c − 9c e3t ⎟ −1 − 14tet ⎟ ⎜ 2 3 4 7. Xt = 5. Xt = ⎜ ⎟ 3 − 14tet ⎠ ⎝ 2c4 e3t t 3t c1 − 5c2 t + 1 + 3te + c3 e ⎞ ⎛ 6 + 12t + 21 t2 e−2t 2 t e + 35 e6t − 25 et + 25 e6t ⎟ ⎜ 5 1 2 −2t −1 is a transition matrix. 9. Xt = ⎝ 2 + 12t + 2 t e ⎠ 11. t 0 = 3 t − 35 et + 35 e6t e + 25 e6t 5 3 −2t 3 + 38t + 66t2 + 13 t e 6 ⎞ ⎛ −3t e−3t − et −e−3t + et −e + 2et 3c1 e2t + c2 e6t − 4e3t − 103 ⎜ −3t t −3t t −3t t⎟ 13. ⎝−3e + 3e 3e − 2e −3e + 3e ⎠ 15. Xt = −c1 e2t + c2 e6t + 23 −e−3t + et e−3t − et −e−3t + 2et
68 54 cos3t − 145 sin3t + 407 c1 et + 5c2 e7t + 145
2 + 41 + te2t 2 24 −2 + 21 + 2te2t −c1 et + c2 e7t + 145 cos3t + 145 sin3t − 487 ⎛ 1 2t ⎞ − 4 e + 2 + 2tet − 43 − 21 t 10 cost + 25 t sint − 5t cost ⎜ ⎟ 23. Xt = ⎝ e2t + 2 + 2tet − 1 − t ⎠ 21. Xt = 5 cost + 25 sint − 25 t cost 5 2t 3 1 t − 4 e + 2te − 4 − 2 t 17. Xt =
19. Xt =
CHAPTER 11 Section 11.1 1. a gives the initial value of (hence initial displacement); b gives the initial rate of change of with respect to time. There are no restrictions (at least mathematically) on a and b. 3. Yes
Section 11.2 1. Xt =
−c1 cost − c2 sint ; 4c1 − c2 cost + c1 + 4c2 sint
a phase portrait is shown below.
c1 e−3t + 7c2 e2t ; c1 e−3t + 2c2 e2t
3. Xt =
a phase portrait is shown below.
y
y 20
20
10
10
2
6 4 2 0 10
4
x
6
60 40 20 0
20 10
20
20 c1 e5t + 2c2 e4t ; c2 e4t a phase portrait is shown below.
5. Xt =
y
120
80
40
0
20
x
20 40 60
7.
4x dy = − 4x2 + 9y2 = c; integral curves are ellipses with center (0, 0). dx 9y
40
x
9. $\dfrac{dy}{dx} = \dfrac{x-1}{y+2}$; $(x-1)^2 - (y+2)^2 = c$; integral curves are hyperbolas with center $(1, -2)$.
11. $\dfrac{dy}{dx} = \dfrac{x+y}{x}$; $y = x\ln|cx|$.
13. These systems have the same graphs of trajectories, but with opposite orientations.
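The integral-curve claims above can be spot-checked numerically. A minimal sketch for Problem 7's equation dy/dx = −4x/(9y): stepping along a solution with a standard RK4 integrator (the stepper, step size, and start point are our own choices, not the text's), the quantity 4x² + 9y² should stay constant, confirming that the integral curves are ellipses.

```python
# Along any solution of dy/dx = -4x/(9y), F(x, y) = 4x^2 + 9y^2 is constant,
# so the integral curves are the ellipses 4x^2 + 9y^2 = c.
def f(x, y):
    return -4.0 * x / (9.0 * y)

def rk4_step(x, y, h):
    # classical fourth-order Runge-Kutta step for y' = f(x, y)
    k1 = f(x, y)
    k2 = f(x + h / 2, y + h * k1 / 2)
    k3 = f(x + h / 2, y + h * k2 / 2)
    k4 = f(x + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

x, y = 0.0, 1.0                  # starts on the ellipse 4x^2 + 9y^2 = 9
c0 = 4 * x**2 + 9 * y**2
h = 1e-3
for _ in range(500):             # integrate out to x = 0.5
    y = rk4_step(x, y, h)
    x += h
drift = abs(4 * x**2 + 9 * y**2 - c0)
print(drift)                     # ~0: the point never leaves its ellipse
```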
Section 11.3
1. eigenvalues −2, −2; improper nodal sink (phase portrait given below)
3. ±2i, center (phase portrait given below)
5. 4 ± 5i; spiral point (phase portrait given below)
9. $-2 \pm \sqrt{3}\,i$; spiral point (phase portrait given below)
7. 3, 3; improper nodal source (phase portrait given below)
Section 11.4
1. stable and asymptotically stable improper node
3. stable but not asymptotically stable center
5. unstable spiral point
7. unstable improper node
9. stable and asymptotically stable spiral point
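The classifications in Section 11.4 all follow from the eigenvalues of the coefficient matrix of x′ = Ax. A minimal sketch of that test (the function name, tolerance, and sample matrices are our own, not the text's):

```python
import numpy as np

def classify(A, tol=1e-9):
    """Stability of x' = Ax, read off the real parts of A's eigenvalues."""
    re = np.linalg.eigvals(np.asarray(A, dtype=float)).real
    if np.all(re < -tol):
        return "asymptotically stable"
    if np.any(re > tol):
        return "unstable"
    return "stable but not asymptotically stable"

print(classify([[-2, 1], [0, -2]]))  # repeated eigenvalue -2 (a sink)
print(classify([[0, -2], [2, 0]]))   # purely imaginary eigenvalues (a center)
```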
Section 11.5
1. (0, 0) is an unstable spiral point; (−3/2, 3/4) is an unstable saddle point (phase portrait given below).
3. (0, 0) is an unstable saddle point; (−5, −5) is an asymptotically stable nodal sink (phase portrait given below).
5. (0, 0) is a center of the linear system, and may be a center or spiral point of the nonlinear system; (1/2, −1/8) and (−1/2, 1/8) are unstable saddle points (phase portrait given below).
7. (0, 0) is an asymptotically stable node or spiral point (phase portrait given below).
9. (0, 0) is an unstable saddle point; (3/8, 3/2) is an asymptotically stable spiral point.
Section 11.6 1. stable
3. asymptotically stable
5. unstable
7. unstable
Section 11.7
1. $\dfrac{\partial f}{\partial x} + \dfrac{\partial g}{\partial y} = 3 + 4x^2 - 2\sin x \ge 1 + 4x^2 > 0$
3. $\dfrac{\partial f}{\partial x} + \dfrac{\partial g}{\partial y} = \cosh x - 1 + 15e^{3y} > 0$
7. $\dfrac{dr}{dt} = 0$, $\dfrac{d\theta}{dt} = r^2 - 1$; trajectories are the origin (a critical point), all points on the circle $r = 1$ (each is a critical point), and the closed circles $r = a \ne 1$. Each closed trajectory $r = a \ne 1$ is a stable limit cycle.
9. $\dfrac{dr}{dt} = r(1 - r^2)(4 - r^2)(9 - r^2)$, $\dfrac{d\theta}{dt} = -1$; trajectories are the origin (a critical point), and the circles $r = 1$, $r = 2$, and $r = 3$ are each closed trajectories. The circles $r = 1$ and $r = 3$ are asymptotically stable limit cycles, and $r = 2$ is an unstable limit cycle.
11. $xx' + yy' = (x^2 + y^2)\left(4 - x^2 - 9y^2\right)$. The ellipse $x^2 + 9y^2 = 4$ is a closed trajectory, and the annular region $4/9 \le r^2 \le 4$ has the stated property.
13. The ellipse $4x^2 + y^2 = 4$ is a limit cycle; the region defined by $1 \le r^2 \le 4$ has the stated property.
15. No; $dr/dt = r(1 + r^2) > 0$ for $r > 0$.
17. No; $dy/dt > 0$ and the system has no critical points, hence no closed trajectory.
19. No; $\dfrac{\partial f}{\partial x} + \dfrac{\partial g}{\partial y} > 0$.
21. $x^4 + 2y^2 = C$ is a closed trajectory for all positive $C$.
23. Apply Liénard's theorem with $p(x) = x^2 - 1$ and $q(x) = 2x + \sin x$.
25. $x^4 + 2y^2 = C$ are closed trajectories for each positive $C$.
27. $\ln(1 + x^2 + y^2) = C$ are closed trajectories for each positive $C$.
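The limit-cycle classification in Problem 9 comes from the sign of dr/dt near each circle: radii drift toward r = 1 and r = 3 and away from r = 2. A sketch of that sign check (the sample radii are our own choices):

```python
# Sign check for Problem 9, dr/dt = r(1 - r^2)(4 - r^2)(9 - r^2).
def drdt(r):
    return r * (1 - r**2) * (4 - r**2) * (9 - r**2)

# radii move toward r = 1 and r = 3 (stable), away from r = 2 (unstable)
print(drdt(0.9) > 0, drdt(1.1) < 0)   # True True
print(drdt(1.9) < 0, drdt(2.1) > 0)   # True True
print(drdt(2.9) > 0, drdt(3.1) < 0)   # True True
```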
CHAPTER 12 Section 12.1 1. 3. 5. 7. 9.
11.
d
ftFt = −12 sin3ti + 12t 2 cos3t − 3t sin3tj + 8 cos3t − 3t sin3tk dt d
Ft × Gt = 1 − 4 sinti − 2tj − cost − t sintk dt d
ftFt = 1 − 8t3 i + 6t2 cosht − 1 − 2t3 sinhtj + −6t2 et + et 1 − 2t3 k dt d
Ft × Gt = tet 2 + t j − k dt √ position: Ft = sinti + costj − sintj + 45k; length: st = 2026t; + 45tk;tangent:F t = costi 45s s s position: Gs = Fts = sin √ i + cos √ j+ √ k 2026 2026 2026 √ 2 2 2 2 2 position: Ft = 2t 4t k = t 2i + 3j + 4k; tangent: F t = 2t2i + 3j + 4k; length: st = 29t − 1; i + 3t j + s 2i + 3j + 4k position: Gs = 1 + √ 29
Section 12.2
4t 6 6 9 + 4t2 at = 2k aT = √ aN = √ % = 9 + 4t2 3/2 9 + 4t2 9 + 4t2 1 1 T= √
3i + 2tk N = √
−2ti + 3k B = −j 9 + 4t2 9 + 4t2 1 3. v = 2i − 2j + k v = 3 a = O aT = aN = % = 0 T = 13 2i − 2j + k N = √ i + j (or any unit vector perpendicular 2 √ to T), B = 16 2−i + j + 4k √ √ 5. v = −3e−t i + j − 2k v = 3 6e−t a = 3e−t i + j − 2k aT = −3 6e−t 1 1 1 aN = 0 % = 0 T = √ −i − j + 2k N = √ i − j, or any unit vector perpendicular to T B = √ i + j + k 2 3 6 7. v = 2 coshtj − 2 sinhtk v = 2 cosh2t a = 2 sinhtj − 2 coshtk 1 2 sinh2t 2 aN = % = aT = 2 cosh2t3/2 cosh2t cosh2t 1 1
coshtj − sinhtk N =
− sinhtj − coshtk B = −i T= cosh2t cosh2t 1. vt = 3i + 2tk vt =
9. v = 2ti + j + !k v = 2t 2 + 2 + ! 2 a = 2i + j + !k T = 2sgnt 2 + 2 + ! 2 , where sgnt equals 1 i + j + !k N is any unit vector perpendicular to 1 if t ≥ 0 and −1 if t < 0 aN = % = 0 T = 2 + 2 + ! 2 T, and B = T × N.
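The curvature values in this section follow from the standard formula κ = |v × a|/|v|³. A sketch for Problem 1's curve r(t) = 3t i + t² k, whose reported curvature is 6/(9 + 4t²)^(3/2):

```python
import numpy as np

def curvature(t):
    v = np.array([3.0, 0.0, 2.0 * t])    # r'(t)
    a = np.array([0.0, 0.0, 2.0])        # r''(t)
    return np.linalg.norm(np.cross(v, a)) / np.linalg.norm(v) ** 3

print(curvature(1.0))                    # ~0.1280, matching 6/(9 + 4)**1.5
```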
Section 12.3 1. 5. 9.
11. 13. 15. 17.
G
G
G
G = 3i − 4yj = −4xj 3. = 2yi − sinxj = 2xi
x
y
x
y
G
F
F
F
G = 6xi + j = −2j 7. = −4z2 sinxi − 3x2 yzj + 3x2 yk = −x3 zj + x3 k = 8z cosxi − x3 yj
x
y
x
y
z
F
F = −yz4 cosxyi + 3y4 zj − sinhz − xk = −xz4 cosxyi + 12xy3 zj
x
y
F = −4z3 sinxyi + 3xy4 j + sinhz − xk
z 1 1 x = x y = z = ex+k x = x y = z = ex−2 x+c x−1 x = x y = ex x − 1 + c x2 = −2z + k x = x y = ex x − 1 − e2 z = 21 12 − x2 √ x = c y = y 2ez = k − siny x = 3 y = y z = ln1 + 41 2 − 21 siny Take any constant F, say F = i + j + k. 23. This is impossible.
Section 12.4
√ √ 1. yzi + xzj + xyk i + j + k 3 and − 3 √ √ 3. 2y + ez i + 2xj + xez k 2 + e6 i − 4j − 2e6 k 20 + 4e6 + 5e12 and − 20 + 4e6 + 5e12 5. 2y sinh2xyi + 2x sinh2xyj − coshzk − cosh1k cosh1 and − cosh1 1 1 7. √ 8y2 − z + 16xy − x 9. √ 2x2 z3 + 3x2 yz2 3 5 √ √ 11. x + y + 2z = 4 x = y = 1 + 2t z = 21 + 2t 13. x = y x = 1 + 2t y = 1 − 2t z = 0 √ 15. x = 1 x = 1 + 2t y = z = 1 17. cos−1 45/ 10653 ≈ 111966 rad √ 19. cos−1 1/ 2, or /4 rad 21. Level surfaces are planes x + z = k. The streamlines of & are parallel to &.
Section 12.5 In 1, 3, and 5, & · F is given first, then & × F. 1. 4, O
3. 2y + xey + 2 ey − 2xk
5. 2x + y + z O
In 7, 9, and 11, & is given. 7. 11. 13. 15.
i − j + 4zk 9. −6x2 yz2 i − 2x3 z2 j − 4x3 yzk
cosx + y + z − x sinx + y + zi − x sinx + y + zj − x sinx + y + zk & · F = & · F + & · F & × F = & × F + & × F Hint: Let F = f1 i + f2 j + f3 k and G = g1 i + g2 j + g3 k and compute both sides of the proposed identity.
CHAPTER 13 Section 13.1
√ 1. 0 3. 26 2 5. sin3 − 81 7. 0 3 2 √ 27 15. 14 cos1 − cos3 17. − 2
9. − 422 5
√ 11. 48 2
Section 13.2 1. −8
3. −12
5. −40
7. 512
9. 0
11.
95 4
13. −12e4 + 4e2
Section 13.2.1 1. 0
3. $2\pi$ if $C$ encloses the origin, 0 if $C$ does not.
5. 0
Section 13.3 1. Conservative; x y = xy3 − 4y.
3. Conservative; x y = 8x2 + 2y − 13 y3 .
5. Conservative; $\varphi(x, y) = \ln(x + y^2)$.
9. −27
11. 5 + ln3/2
15. e3 − 24e − 9e−1
13. −5
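A potential such as Problem 1's φ(x, y) = xy³ − 4y can be checked with the conservativeness test used in this section, ∂F₂/∂x = ∂F₁/∂y for F = ∇φ. A finite-difference sketch (the test point and step size are our own choices):

```python
# F is the gradient of phi(x, y) = x*y**3 - 4*y, written out analytically.
def F(x, y):
    return (y**3, 3 * x * y**2 - 4)

h = 1e-5
x, y = 1.3, -0.7
# central differences for dF2/dx and dF1/dy; both should equal 3*y**2
dF2_dx = (F(x + h, y)[1] - F(x - h, y)[1]) / (2 * h)
dF1_dy = (F(x, y + h)[0] - F(x, y - h)[0]) / (2 * h)
print(dF2_dx, dF1_dy)   # both ~1.47
```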
Section 13.4 √ 1. 125 2
3.
Section 13.5 1.
49 12
12
35
33 24 35 35
7. Conservative; $\varphi(x, y) = e^y \sin 2x - \frac{1}{2}y^2$.
2
293/2 − 27 6
28 √ 2 3
5.
√ 3. 9 2 0 0 2
Section 13.6
-
7.
9
ln4 + 8
5. 78 0 0 27 13
√ √ 17 + 4 17
√ 9. −10 3
7. 32
1. Hint: Apply Green's theorem to $\oint_C \left(-\dfrac{\partial\varphi}{\partial y}\,dx + \dfrac{\partial\varphi}{\partial x}\,dy\right)$.
3. Hint: The unit tangent to $C$ is $x'(s)\mathbf{i} + y'(s)\mathbf{j}$. Show that the unit outer normal is $\mathbf{N} = y'(s)\mathbf{i} - x'(s)\mathbf{j}$. Then $D_N\varphi(x, y) = \nabla\varphi\cdot\mathbf{N} = \dfrac{\partial\varphi}{\partial x}\dfrac{dy}{ds} - \dfrac{\partial\varphi}{\partial y}\dfrac{dx}{ds}$. But then $D_N\varphi(x, y)\,ds = -\dfrac{\partial\varphi}{\partial y}\,dx + \dfrac{\partial\varphi}{\partial x}\,dy$. Now apply Green's theorem to $\oint_C \left(-\dfrac{\partial\varphi}{\partial y}\,dx + \dfrac{\partial\varphi}{\partial x}\,dy\right)$.
Section 13.7 1.
256 3
3. 0
5.
8 3
7. 2
9. 0, because $\nabla\cdot(\nabla\times\mathbf{F}) = 0$.
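Problem 9's answer rests on the identity div(curl F) = 0. A finite-difference sketch with an arbitrary smooth field of our own choosing (the field, test point, and step size are not from the text):

```python
import numpy as np

def F(p):
    x, y, z = p
    return np.array([x * y * z, np.sin(x + z), y**2 - x * z])

h = 1e-4
def partial(g, p, i):
    # central difference of vector field g in coordinate direction i
    e = np.zeros(3); e[i] = h
    return (g(p + e) - g(p - e)) / (2 * h)

def curl(p):
    J = np.column_stack([partial(F, p, i) for i in range(3)])  # J[r, c] = dF_r/dx_c
    return np.array([J[2, 1] - J[1, 2], J[0, 2] - J[2, 0], J[1, 0] - J[0, 1]])

p0 = np.array([0.4, -1.1, 0.7])
div_curl = sum(partial(curl, p0, i)[i] for i in range(3))
print(div_curl)   # ~0, as the identity requires
```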
Section 13.8 1. −8 15. −403
3. −16 17. 2e
−2
5. − 323
7. −108
9. x y = x − 2y + z
11. Not conservative.
13. Not conservative.
19. 71
CHAPTER 14 Section 14.1 1. Graphs of the second, third, and fourth partial sums are shown below.
[Graphs of the partial sums for n = 2, 3, and 4.]
Section 14.2 2 16 1 −1n 1 sinh + sinh cosnx 5. sin2n − 1x 2 n + 1 2n −1 n=1 n=1 nx nx 4 3 2 13 1 16 + −1n 2 2 cos + sin 9. + sin2n − 1x 7. 3 n=1 n 2 n 2 2 n=1 2n − 1 nx 1 −1n cos 13. The Fourier coefficients of f and g are the same. 11. sin3 + 6 sin3 2 2 3 3 n=1 n − 9
1. 4
3.
Section 14.3
n n 6 11 2n 1 2n + 1. The Fourier series is 4 sin − sin + 2 2 cos − cos + 2−1n + 18 n=1 n 3 3 n 3 3
nx n n n 6 6 18 2n 1 2n n cos + 4 cos + cos − 15−1 + sin − sin sin n3 3 3 3 n 3 3 n2 2 3 3 6 n=1 18 n nx − 3 3 cos − −1n sin ; this series converges to 23 if x = 3 or if x = −3, to 2x if −3 < x < −2, to n 3 3 −2 if x = −2, to 0 if −2 < x < 1, to 21 if x = 1, and to x2 if 1 < x < 3.
11 41 − 32n 1 3. Let n = n/3. The Fourier series is sinh3 − 2 cosh3 + −1n sinh3 + + 3 1 + 2n 31 + 2n 3 n=1 6
6 8 cosh3 4 2 − 3 42n − 1 cosh3 6n cosn x + −1n sinh3 sinn x; this + n n 2 2 − n 2 2 2 1 + n 1 + n 31 + n 1 + 2n 2 n=1 converges to 18 cosh3 if x = −3 or x = 3, and to x2 e−x if −3 < x < 3.
6 + 2 1 2 2 −1n 2 n n +2 1 − −1 −1 sinnx; this converges to 21 2 + 2 for 5. cosnx + + + 2 3 6 n n n n n=1 n=1 x = or x = −, to x2 if − < x < 0, to 1 if x = 0, and to 2 if 0 < x < . 4 1 2n − 1x sin ; converges to −1 if −4 < x < 0, to 0 if x = −4, 0 or 4, and to 1 if 0 < x < 4. 7. n=1 2n − 1 4 1 − −1n e− 2 1 − e− + cosnx; converges to e−x for − ≤ x ≤ . 9. n=1 n2 + 1
12. $-\pi^2/12$
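The convergence claims for series like Problem 7 of Section 14.3 — the square-wave series (4/π) Σ sin((2n−1)πx/4)/(2n−1), which converges to 1 on (0, 4) — can be checked by summing a long partial sum at an interior point (the point x = 2 and truncation length are our own choices):

```python
import math

def partial_sum(x, N):
    # N-term partial sum of (4/pi) * sum sin((2n-1)*pi*x/4)/(2n-1)
    return (4 / math.pi) * sum(
        math.sin((2 * n - 1) * math.pi * x / 4) / (2 * n - 1)
        for n in range(1, N + 1))

print(partial_sum(2.0, 1000))   # ~1.0, the claimed limit on (0, 4)
```

At x = 2 the series is alternating, so the truncation error is bounded by the first omitted term, about 6 × 10⁻⁴ for N = 1000.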
Section 14.4 1. cosine series: 4 (this function isits own Fourier cosine expansion), converging to 4 for 0 ≤ x ≤ 3; sine series: 1 16 2n − 1x sin , converging to 0 if x = 0 or x = 3, and to 4 for 0 < x < 3 n=1 2n − 1 3
2 1 −1n 2n − 1 2n − 1x cosx − cos , converging to 0 if 0 ≤ x < or x = 2, to − 21 2 n=1 2n − 32n + 1 2 −1n 2n − 1 n 2n − 1x 2 2 sin − sinnx, if x = , and to cosx if < x < 2; sine series: n=1 2n − 32n + 1 2 n=2 n − 1 1 converging to 0 if 0 ≤ x < or x = 2, to − 2 if x = , and to cosx if < x < 2 nx 4 16 −1n 5. cosine series: + 2 , converging to x2 for 0 ≤ x ≤ 2; sine series: cos 2 3 n=1 n 2
nx −1n 21 − −1n 8 sin + , converging to x2 for 0 ≤ x < 2 and to 0 for x = 2 − n=1 n n3 2 2
nx 12 6 2n 4 2n 1 sin + 2 2 cos − 2 2 1 + −1n cos , converging to x if 7. cosine series: + 2 n=1 n 3 n 3 n 3 0 ≤ x < 2, to 1 if x = 2, and to 2 − x if 2 < x ≤ 3; sine series:
4 2n 2 12 nx 2n n − cos + −1 , converging to x if 0 ≤ x < 2, to 1 if x = 2, to sin sin 2 2 3 n 3 n 3 n=1 n 3. cosine series:
2 − x if 2 < x < 3, and to 0 if x = 3 n nx n 5 16 1 4 9. cosine series: + 2 − sin cos , converging to x2 if 0 ≤ x ≤ 1, and to 1 if cos 2 3 6 n=1 n 4 n 4 4 n 2−1n nx 64 n 16 + − 1 − sin , converging to x2 if 1 < x ≤ 4; sine series: sin cos 2 2 4 n3 3 4 n 4 n=1 n 0 ≤ x ≤ 1, to 1 if 1 < x ≤ 4, and to 0 if x = 4 11. Let gx = 21 fx + f−x and hx = 21 fx − f−x. Then g is even and h is odd, and fx = gx + hx. 1 13. − 2 4
Section 14.5
−1n − 1 1 −1n+1 + sinnx . This series converges cosnx + 4 n2 n n=1 to 0 for − < x < 0, to x for 0 < x < , and to 21 f0+ + f0−, or 0, at x = 0. (b) f is continuous, hence piecewise continuous on − . By Theorem 14.5, its Fourier series can be integrated term-by-term to yield the integral of the sum of the Fourier series. x 0 if − ≤ x ≤ 0 (c) First, . ftdt = 1 2 x if 0<x≤ − 2
3. (a) The Fourier series of f on − is
This function is represented by the series obtained by integrating the Fourier series
term-by-term from − to x to 1 1 2 −1n+1 1 −1n − 1 1 n sinnx + − cosnx + −1 . obtain: x + + 4 4 n2 n n n n=1 1 −1n+1 cosnx. 5. (a) For − ≤ x ≤ x sinx = − cosx + 2 2 2 n=2 n − 1 (b) f is continuous with continuous first and second derivatives on − , and f− = f. Theorem 14.6 gives −1n 1 us x cosx + sinx = sinx + 2 n sinnx for − < x < . 2 2 n=2 n − 1 1 1 (c) The Fourier series of x cosx + sinx on − is sinx + sinnx. 2n−1n 2 2 n −1 n=2
Section 14.6
1 2 cos nx − n=1 n 2 nx 8 1 n 48 + tan−1 n cos 2nx − 1 + n2 2 cos 9. 7. 16 + 2 n=1 2 2 n=1 4n2 − 1 2 3 2 1 2 1 2n − 1x cos nx + −1n cos + 1 − −1n 13. + 11. n=1 n 2 2 n=1 2n − 1 2 2
3. Hint: Write the definition of f x + p and use the periodicity of f .
5. 1 −
15. it =
120−1n+1 10 − n2 1200−1n+1 cosnt + sinnt 2 2 2 2 n 100n2 + 10 − n2 2 n=1 n 100n + 10 − n
Section 14.7 1. 3 +
1 2nix/3 3i e n=− n=0 n
3.
1 3 − 4 2
n 1 n sin + i cos − 1 enix/2 2 2 n=− n=0 n
2 1 3i 1 1 1 + e2n−1ix/2 7. − 2 e2n−1ix 2 n=− n=0 2n − 1 2 n=− n=0 2n − 12
n 5 5 5 5i 1 1 2n sin enix/6 and gx = + cos −1 − 9. fx = + 3 n=− n=0 n 3 3 n=− n=0 n 3 6 2n nix/6 i sin e . f and g have the same frequency spectra but different phase spectra. 3
5.
CHAPTER 15 Section 15.1
2 sin 2 cos sinxd, converging to − if x = −, to x for − < x < , to if x = , and − 2 2 2 0 to 0 if x > . 2 1 − cos sinxd, converging to − 21 if x = −, to −1 if − < x < 0, to x = 0 if x = 0, to 1 if 0 0 < x < , to 21 if x = , and to 0 if x > . 1
400 cos100 + 20 0002 − 4 sin100 cosxd, converging to x2 if −100 < x < 100, to 5000 if 3 0 x = ±100, and to 0 if x > 100. 2
− sin sin2 cosx − cos sin2 sinxd, converging to sinx if 2 − 1 0 −3 ≤ x ≤ , and to 0 if x < −3 or x > . 2 cosxd, converging to e−x for all real x. 1 + 2 0
1.
3.
5.
7.
9.
Section 15.2
4
10 sin10 − 502 − 1 cos10 − 1 sinxd; 3 4
10 cos10 − 502 − 1 sin10 cosxd; both integrals converge to x2 for cosine integral: 3 0 0 ≤ x < 10, to 50 if x = 10, and to 0 for x > 10. 2 2
1 + cos − 2 cos4 sinxd; cosine integral:
2 sin4 − sin cosxd; 3. sine integral: 0 0 3 both integrals converge to 1 for 0 < x < 1, to for x = 1, to 2 for 1 < x < 4, to 1 for x = 4, and to 0 for x > 4. The 2 cosine integral converges to 1 at x = 0, while the sine integral converges to 0 at x = 0. 6 2 4 5. sine integral:
1 + 1 − 2 cos − 2 cos3 + sin sinxd; cosine integral: 2 0 6 2 4
2 − 1 sin + 2 sin3 +
cos − 1 cosxd; both integrals converge to 1 + 2x for 2 0 1 0 < x < to 2 3 + 2 for x = to 2 for < x < 3 to 1 for x = 3 and to 0 for x > 3. The sine integral converges to 0 for x = 0, while the cosine integral converges to 1 for x = 0. 2 3 2 2 + 2 7. sine integral: sinxd; cosine integral: cosxd; both integrals converge to 4 + 4 4 + 4 0 0 −x e cosx for x > 0. The cosine integral converges to 1 for x = 0, and the sine integral converges to 0 for x = 0. 1. sine integral:
0
2k 2k
1 − cosc sinxd; cosine integral: sinc cosxd; both integrals converge 0 0 1 to k for 0 < x < c, to 2 k for x = c, and to 0 for x > c. The cosine integral converges to k for x = 0, and the sine integral to 0 for x = 0. 1 x e− cosxd = and e− sinxd = . 11. For all x 1 + x2 1 + x2 0 0
9. sine integral:
Section 15.3 1. 3. 5.
7.
9. 19.
1 − 2 2 i 3 −2 eix d; converging to xe−x for all real x. −8 − 1 − 2 2 + 42 2 1 − 2 2 + 42 2 sin5 eix d; converging to sinx for −5 < x < 5, and to 0 for x ≥ 5. i 2 − 2 −
i 1 1 sin + e−1 2 cos + 2 cos − sin eix d; converging to x for −1 < x < 1, −e−1 2 − +1 +1 to 21 1 + e−1 for x = 1, to 21 −1 + e−1 for x = −1, and to e−x for x > 1.
sin/2 − 1 − sin/2 1 cos/2 + i + + i cos/2 eix d; converging to cosx − 2 − 2 − 1 2 − 1 2 − 1 2 − 1 for 0 < x < /2, to sinx for −/2 < x < 0, to 0 for x > /2, to 21 at x = 0, to − 21 at x = −/2, and to 0 at x = /2. 4 24 2i 10
cos − 1 11. − e−2i sin 13. e−1+4ik/4 15. e− 17. e2i 1 + 4i 16 + 2 2 −8t2 −4it e e 18 21. Ht + 2e−10−5−3it 23. Ht 2e−3t − e−2t
Section 15.4 5 −2i−3 −3−3 i − 1 7. e e 9. Htte−t 3 + i 3
3 −4it 1 1 1 1 −2t+3 −2t−3 e
1 − e Ht + 3 −
1 − e Ht − 3 13. + 4 4 2 9 + t + 22 9 + t − 22 25 1 3 19. 3 502 sin5 + 20 cos5 − 4 sin5 tC = 0 tR = 3 e−4 cos4 − 1 4 −4 −4 −4 sin4 e sin4 2
e tC = 2 tR = − + i + e cos4 − 1 2 2 2 +1 +1 +1 +1 3 1 4 2 2 2 − 3 −8 sin2 − 4 cos2 + 2 sin2 + 2i 3 8 cos2 − 4 sin2 tC = 0 tR = 3
1. i H−e3 − He−3 11. 17. 21. 23.
3.
26 2 + i2
5.
Section 15.5 1 fˆs = 2 1+ 1 + 2
1 sinK1 − sinK1 + + 3. fˆC = 2 1 − 1+
ˆfS = − 1 cosK1 + − cosK1 − 2 − 1 2 1+ 1−
1 1 1 5. fˆC = + 2 1 + 1 + 2 1 + 1 − 2 1 1− 1+ − fˆS = 2 1 + 1 + 2 1 + 1 − 2 positive L; 7. Sufficient conditions are: f and f 3 continuous on 0 f 4 piecewise continuous on 0 L for every 3 ftdt and f tdt and ft → 0 f t → 0 f t → 0; and f t → 0 as t → . Also needed are 0 0 convergent. 1. fˆC =
Section 15.6 1. 5. 7. 11. 13. 15.
K 2 2 2
1 − −1n for n = 1 2 3. − 3 + −1n for n = 1 2 − n n n3 n −1n n sina 0 ˜S n = for n = 1 2 if a is not an integer; if a = m, a positive integer, then f /2 a2 − n2 n 2 /2 if n = 0 n − ˜
1 − −1 . e for n = 1 2 9. f n = n C
−1 − 1/n2 if n = 1 2 n2 + 1 2 1 6 6 3 f˜C 0 = 4 f˜C n = 4 + −1n − 4 if n = 1 2 . 4 n n2 n a
1 − −1n cosa for n = 0 1 2 , if a is not an integer. a2 − n2 f x sinnxdx and integrate by parts. Write f˜S n =
if n = m . if n = m
0
Section 15.7 1. D u0 = D u2 = D u4 =
5 j=0 5 j=0 5
cosj ≈ −23582 D u1 =
5
cosje−ij/3 ≈ 29369 − 42794i
j=0
cosje−2ij/3 ≈ 13292 − 16579 × 10−2 i D u3 =
D u−4 =
cosje−ij ≈ 96238 × 10−2
j=0
cosje−4ij/3 ≈ 13292 + 16579 × 10−2 i D u−1 =
j=0
D u−2 =
5
5 j=0 5
cosje
2ij/3
cosje
4ij/3
−2
5
cosjeij/3 ≈ 29369 + 42794i
j=0 5
≈ 13292 + 16579 × 10 i D u−3 =
cosjeij ≈ 96238 × 10−2
j=0 −2
≈ 13292 − 16579 × 10 i
j=0 5 1 1 −ij/3 = 245 D u1 = e ≈ 81667 − 40415i D u2 ≈ 65 − 17321i j + 1 j + 1 j=0 j=0 D u3 ≈ 61667 D u4 ≈ 65 + 17321i D u−1 ≈ 81667 + 40415i D u−2 ≈ 65 + 17321i D u−3 ≈ 61667 D u−4 ≈ 65 − 17321i 5. D u0 = 55 D u1 ≈ −60 + 31177i D u2 ≈ −140 + 10392i D u3 = 15 D u4 ≈ −140 − 10392i D u−1 ≈ −60 − 31177i D u−2 ≈ −140 − 10392i D u−3 ≈ −15 D u−4 ≈ −140 + 10392i 5 5 7. The inverse is uj 5j=0 , where u0 = 16 1 + ik ≈ −13333 + 16667i u1 = 16 1 + ik eik/3 ≈ −42703
3. D u0 =
5
k=0
k=0
+54904i u2 ≈ −16346 × 10−2 + 561i u3 ≈ 33333 + 5i u4 ≈ 84968 + 27233i, and u5 ≈ 15937 − 2049i 6 6 1 1 9. u0 = e−ik ≈ 10348 + 14751 × 10−2 i u1 = e−ik e2ik/7 ≈ 93331 − 29609i 7 k=0 7 k=0 u2 ≈ −94163 × 10−2 + 88785 × 10−2 i u3 ≈ −23947 × 10−2 + 62482 × 10−2 i u4 ≈ 43074 × 10−3 + 51899 × 10−2 i u5 ≈ 25788 × 10−2 + 43852 × 10−2 i u6 ≈ 51222 × 10−2 + 34325 × 10−2 i 11. u0 ≈ −1039 u1 ≈ 42051 + 29456i u2 ≈ 13143 + 31205 × 10−2 i u3 ≈ 13143 − 31205 × 10−2 i u4 ≈ 42051 − 29456i ki cos2 − 1 1 2 1 sin2 + . For the DFT cose−ki d = − 13. The Fourier coefficients are dk = 2 0 2 2 k2 − 1 2 2 k2 − 1 127 1 approximation, choose N = 128, and approximate dk by fk = cosj/64e−ijk/64 . 128 j=0 Then 1 d0 = sin2 ≈ 045465 f0 ≈ 046017 2 d1 ≈ −51259 × 10−2 − 2508i and f1 ≈ −45737 × 10−2 − 25075i d2 ≈ −11816 × 10−2 − 11562i and f2 ≈ −62931 × 10−3 − 11553i d3 ≈ −51767 × 10−3 − 75984 × 10−2 i and f3 ≈ 34589 × 10−4 − 75849 × 10−2 i
d−1 ≈ −51259 × 10−2 + 2508i and f−1 ≈ −45737 × 10−2 + 25075i d−2 ≈ −11816 × 10−2 + 11562i f−2 ≈ −62931 × 10−3 + 11553i d−3 ≈ −51767 × 10−3 + 75984 × 10−2 i f−3 ≈ 34589 × 10−4 + 75849 × 10−2 i 1 1 1 if k = 0, and d0 = 13 . The DFT approximation is 2 e−2ki d = +i 15. Now, dk = 2 2 2 k 2k 0 127 j 1 fk = e−ijk/64 . 128 j=0 128 d0 = 13 f0 ≈ 32944 d1 ≈ 50661 × 10−2 + 15915i f1 ≈ 46765 × 10−2 + 15912i d2 ≈ 12665 × 10−2 + 79577 × 10−2 i f2 ≈ 87691 × 10−3 + 79514 × 10−2 i d3 ≈ 5629 × 10−3 + 53052 × 10−2 i f3 ≈ 17329 × 10−3 + 52956 × 10−2 i d−1 ≈ 50661 × 10−2 − 15915i f−1 ≈ 46765 × 10−2 − 15912i d−2 ≈ 12665 × 10−2 − 79577 × 10−2 i f−2 ≈ 87691 × 10−3 − 79514 × 10−2 i d−3 ≈ 5629 × 10−3 − 53052 × 10−2 i f−3 ≈ 17329 × 10−3 − 52956 × 10−2 i
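The values in Problem 1 of Section 15.7, D[u]ₖ = Σⱼ₌₀⁵ cos(j) e^(−iπjk/3), form a 6-point DFT, so `numpy.fft.fft` (which uses the same e^(−2πijk/N) convention) reproduces them directly. Note that the printed answers appear with decimal points stripped: "−23582" is −0.23582, and "29369 − 42794i" is 2.9369 − 0.42794i.

```python
import numpy as np

u = np.cos(np.arange(6))   # u_j = cos(j), j = 0, ..., 5
D = np.fft.fft(u)          # 6-point DFT, same sign convention as the text
print(D[0])                # ~ -0.23582 (the sum of the samples)
print(D[1])                # ~ 2.9369 - 0.42794i
```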
Section 15.8 1. The complex Fourier series is 2 +
i ikt e , so S10 18 = 10207 + 16653 × 10−16 i. Approximate k k=− k=0
127 1 1 ≈ V eik/8 , which is 10552 + 10−14 i 8 128 k=−0 k 10207 + 16653 × 10−16 i − 10552 − 20983 × 10−16 i = 0003452
1 sin2 i cos2 − 1 ikt − + k e . Then 3. The complex Fourier series is 2 2 k2 − 1 2 2 k2 − 1 k=− S10
k=±1
127 1 1 1 = 10672 − 34694 × 10−18 i. Approximate S10 ≈ V eik/8 , which works out to S10 8 8 128 k=0 k 10428 + 38025 × 10−15 i 10672 − 34694 × 10−18 i − 10428 + 38025 × 10−15 i = 0002440 1 3 1 i 2k2 2 − 3 2kit e 5. The complex Fourier series is + + , and 4 k=− k=0 4 2 k2 4 3 k3 127 1 1 ≈ V eki/2 , which works out to S10 41 = −72901 × 10−4 + 10408 × 10−17 i. Approximate S10 4 128 k=0 k 34826 × 10−3 + 91593 × 10−16 i − 72901 × 10−4 + 10408 × 10−17 i − 34826 × 10−3 + 91593 × 10−16 i ≈ 0004212 7. 014386 − 012455i 9. −65056 × 10−3 − 2191 × 10−3 i. In the graphs associated with Problems 11 and 13, the series 1 points are the actual values computed from the transform, and the series two points are the DFT approximations. In all cases the approximations can be improved by choosing N larger. 2 sin2 + cos2 − sin − cos 2 cos2 − sin2 − cos + sin 11. fˆ = +i . Note: in the sum 2 2 of equation (15.24), 11 ≤ j ≤ 20 because ft is zero outside the interval [1,2). Generate the following table using L = 4: k 1 2 3 4 5 6 7 8 9 10 11 12
fˆ k/4 13845 − 05673i 10579 − 10421i 05761 − 13488i 02068 − 14404i −05162 − 13098i −09483 − 09865i −12108 − 05325i −12708 − 00295i −11323 + 04386i −08329 + 07966i −04359 + 09953i −00166 + 10168i
DFT fˆ k/4 13764 − 05714i 10445 − 1048i 05558 − 13521i −0573 − 14372i −05453 − 12959i −09749 − 09602i −12288 − 04945i −12751 + 0170i −11194 + 04866i −08026 + 08393i −03909 + 10258i 0374 + 10299i
ˆ f k/4 14962 1485 14667 14406 14078 13684 13227 12711 12143 11525 10866 10169
DFT fˆ k/4 14903 14796 14619 14383 1406 13684 13247 12752 12206 11613 10978 10306
The plots below compare the (a) real parts, (b) imaginary parts, and (c) absolute values of $\hat{f}(k\pi/4)$ and the DFT approximations.
13. $\hat{f}(\omega) = \dfrac{\sin\omega}{\omega} + i\,\dfrac{\cos\omega - 1}{\omega}$. With L = 4 we obtain the following table (with j summing from 0 to 10 in equation (15.24)):
k 1 2 3 4 5 6 7 8 9 10 11 12
fˆ k/4 9896 − 1244i 9589 − 2448i 9089 − 3577i 8415 − 4597i 7592 − 5477i 6650 − 6195i 5623 − 6733i 4546 − 7081i 3458 − 7236i 2394 − 7205i 1388 − 6997i 0470 − 6633i
DFT fˆ k/4 10686 − 1318i 1035 − 25925i 98047 − 37821i 90716 − 48489i 81791 − 57604i 71616 − 64909i 60574 − 70222i 49076 − 73447i 37537 − 74574i 26362 − 73678i 15924 − 70914i 0655 − 66506i
ˆ f k/4 99739 98965 97675 95888 93614 90885 87722 84147 80198 75923 71333 66496
DFT fˆ k/4 10767 1067 10509 10286 10004 96654 92738 88334 83488 78252 7268 66828
The plots below compare the (a) real parts, (b) imaginary parts, and (c) absolute values, of fˆ k/4 and the DFT approximations.
15. The Fourier series of f is $\displaystyle\sum_{n=-\infty,\,n\ne 0}^{\infty} \frac{i}{n\pi}\left(-1 + (-1)^n\right)e^{n\pi i t/2}$, or $\displaystyle\frac{4}{\pi}\sum_{n=1}^{\infty}\frac{1}{2n-1}\sin\frac{(2n-1)\pi t}{2}$.
Then $S_N(t) = \displaystyle\frac{2}{\pi}\sum_{n=1}^{N}\frac{1}{n}\left(1-(-1)^n\right)\sin\frac{n\pi t}{2}$. The $N$th Cesàro sum is $\sigma_N(t) = \displaystyle\sum_{n=-N,\,n\ne 0}^{N}\left(1-\frac{|n|}{N}\right)\frac{i}{n\pi}\left(-1+(-1)^n\right)e^{n\pi i t/2}$, or $\sigma_N(t) = \displaystyle\frac{2}{\pi}\sum_{n=1}^{N}\left(1-\frac{n}{N}\right)\frac{1}{n}\left(1-(-1)^n\right)\sin\frac{n\pi t}{2}$.
Graphs are drawn for N = 5, 10, 25.
0.5 1.0 N = 25 N i 2 −1n − cosn/2enit = cosn/2 − −1n sinnt N t = n=−Nn=0 n n=1 n n i n 2 N N −1n − cosn/2enit = cosn/2 − −1n sinnt 1− 1− N n N n n=−Nn=0 n=1 Graphs are drawn for N = 5 10 25.
17. SN t =
N
0.5 1.0 N = 25 N 1 1 1 − −1n 17 i 7 n n + 1 − −1 −1 + + − 1 enit 4 n=−Nn=0 2 n2 2 n 2 2 N 5 17 2 1 − −1n n + − + 3−1 sinnt = cosnt − 4 n=1 n2 2 n 2 n 1 1 − −1n N 1 17 i 7 n n N t = + 1 − −1
−1 1− + + − 1 enit 4 n=−Nn=0 N 2 n2 2 n 2 2
n 1 − −1n N 5 17 2 n + − + 3−1 = cosnt − sinnt 1− 4 n=1 N n2 2 n 2 Graphs are drawn for N = 5 10 25 (see next page).
19. SN t =
N
N i −2 3−1n − 1enit/2 = 1 + 3−1n − 1 sinnt/2 N t = n n n=−Nn=0 n=1 N −2 1 − Nn 3−1n − 1 sinnt/2 1+ n
21. SN t = 1 +
n=1
Hamming filtered partial sum: HN t = 1 + N
N −2 054 + 046 cosn/N3−1n − 1 sinnt/2 n n=1
2 −a 2 n2 /N 2 e 3−1n − 1 sinnt/2 n n=1 1 Graphs are drawn for N = 5 10 25, with a = in the Gauss filter. 2
Gauss filtered partial sum: GN t = 1 +
Section 15.9
1. Power spectrum of y(t) = 4 sin 80t − sin 20t
3. Power spectrum of yt = 3 cos90t − sin30t
5. Corrupted signal of y(t) = cos 30t + cos 70t + cos 140t
Frequency components of the corrupted signal
7. Corrupted signal of y(t) = cos 20t + sin 140t + cos 240t
Frequency components of the corrupted signal
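The power-spectrum peaks in this section come from an FFT of the sampled signal. A minimal sketch for Problem 1, assuming the arguments are 80πt and 20πt (i.e., components at 40 Hz and 10 Hz — the sampling rate, duration, and that reading of the arguments are our own assumptions):

```python
import numpy as np

fs = 512                                    # 1 second sampled at 512 Hz
t = np.arange(fs) / fs
y = 4 * np.sin(80 * np.pi * t) - np.sin(20 * np.pi * t)
spectrum = np.abs(np.fft.rfft(y))
peaks = np.argsort(spectrum)[-2:]           # with this sampling, bin index = Hz
print(sorted(int(k) for k in peaks))        # [10, 40]
```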
CHAPTER 16 Section 16.1 3. We will illustrate for P3 x. By Rodrigues’s formula, 1 d3 1 1 x2 − 13 = 120x3 − 72x = 5x3 − 3x. P3 x = 3 2 3! dx3 48 2 1 6 − 2k! 3 = 1 and x3−2k = −1k 5. For example, with n = 3 we have 2 8k!3 − k!3 − 2k! k=0 5 4! 3 6! x3 − x = x3 − x = P3 x 83!3! 82!1! 2 2 After cancellations because Pn x is one solution, we obtain 7. Substitute Qx = Pnxux in Legendre’s equation. P x Pn v x 2x 2x u + 2 − = −2 n + u = 0. Let v = u to write . Integrate to obtain 2 Pn 1 − x vx Pn x 1 − x2 1 1 . Then ux = dx and lnvx = −2 lnPn x − ln1 − x2 , so vx = Pn x2 1 − x2 Pn x2 1 − x2 Qx = Pn xux. −112 2 + 4 + 1008 1 12 2 − 10 1 5x3 −3x+660 63x5 −70x3 +15x+· · · . 13. For −1 < x < 1 sinx/2 = 2 x +168 4 2 6 8 In the graph, the sum of these terms is indistinguishable from the function.
1 5 15 15 1 1 + − cos1 sin1 + − cos2 1 3x2 − 1 + 15. For −1 < x < 1 sin2 x = − cos1 sin1 + 2 2 8 8 4 2
531 585 585 1 cos1 sin1 − + cos2 1 35x4 − 30x2 + 3 + · · · . In the graph, the sum of these terms is 32 32 16 8 indistinguishable from the function.
71 3 11 1 5x3 − 3x + 63x5 − 70x3 + 15x + · · · . The graph below 17. For −1 < x < 0 and for 0 < x < 1 fx = x − 2 82 16 8 shows the function and the sum of these three terms of the Fourier–Legendre expansion.
Note that in 13 and 15, just the first three terms of the eigenfunction expansion approximate the function so closely that the two graphs are virtually indistinguishable. In 17, it is clear that we must include more terms of the eigenfunction expansion to reasonably approximate f(x).
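Problem 3's Rodrigues computation — differentiate (x² − 1)³ three times and divide by 2³·3! to get P₃(x) = (5x³ − 3x)/2 — can be replayed symbolically with numpy's polynomial arithmetic (a sketch; the use of `numpy.polynomial` is our own choice):

```python
import math
import numpy as np

w = np.polynomial.Polynomial([-1, 0, 1]) ** 3      # (x^2 - 1)^3
p3 = w.deriv(3) / (2**3 * math.factorial(3))       # Rodrigues: divide by 48
print(p3.coef)                                     # coefficients 0, -1.5, 0, 2.5
```

The coefficient array [0, −1.5, 0, 2.5] is exactly (5x³ − 3x)/2 in ascending powers.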
Section 16.2 1. Let y = xa J bxc and compute y = axa−1 J bxc + xa bcxc−1 J bxc and Y = aa − 1xa−2 J bxc + 2axa−1 bcxc−1 + xa bcc − 1xc−2 J bxc + xa b2 c2 x2c−2 J bxc . Substitute these into the differential equation and simplify to get c2 xa−2 bxc 2 J bxc + bxc J bxc + bxc 2 − 2 J bxc = 0. 3. y = c1 J1/3 x2 + c2 J−1/3 x2 5. y = c1 x−1 J3/4 2x2 + c2 x−1 J−3/4 2x2 7. y = c1 x4 J3/4 2x3 + c2 x4 J−3/4 2x3 9. y = c1 x−2 J1/2 3x3 + c2 x−2 J−1/2 3x3 √ √ 11. y1 = c1 J3 x + c2 Y3 x 13. y = c1 J4 2x1/3 + c2 Y4 2x1/3 15. y = c1 x2/3 J1/2 x + c2 x2/3 J−1/2 x
17. Substitute into the differential equation and use the fact that $J_{1/3}(z)$ satisfies $z^2 J_{1/3}'' + z J_{1/3}' + \left(z^2 - \frac{1}{9}\right)J_{1/3} = 0$.
19. $y = c_1\sqrt{x}\,J_2(\sqrt{x}) + c_2\sqrt{x}\,Y_2(\sqrt{x})$
21. $y = c_1 x^2 J_2(x) + c_2 x^2 Y_2(x)$
29. (a) The sum of the first five terms of the Fourier–Bessel expansion is $\approx 1.67411\,J_2(5.135x) - 0.77750\,J_2(8.417x) + 0.8281\,J_2(11.620x) - 0.6201\,J_2(14.796x) + 0.6281\,J_2(17.960x)$. From the graphs of $f(x)$ and the first five terms of this expansion, more terms would have to be included to achieve reasonable accuracy.
(c) The sum of the first five terms of the Fourier–Bessel expansion is $0.85529\,J_2(5.135x) - 0.21338\,J_2(8.417x) + 0.35122\,J_2(11.620x) - 0.20338\,J_2(14.796x) + 0.025800\,J_2(17.960x)$. As the graphs indicate, more terms are needed to reasonably approximate the function by the partial sum of the Fourier–Bessel expansion.
Section 16.3
2 2n − 1 for n = 1 2 ; eigenfunctions nonzero constant multiples of 1. regular problem on [0, L]; eigenvalues 2L 2n − 1 sin x 2L 2n − 1 2n − 1 2 x cos 3. regular on [0, 4]; 2 4 2 4 1 5. periodic on [−3 3]; 0 is an eigenvalue with eigenfunction 1; for n = 1 2 n2 is an eigenvalue with 9 eigenfunction an cosnx/3 + bn sinnx/3, not both an and bn zero √ 1√ . There are infinitely many such solutions, of 7. regular on [0, 1]; eigenvalues are positive solutions of tan = 2 which the first four are approximately 0.43, and 89.82. Corresponding to an eigenvalue n , √ √ 10.84, 40.47, eigenfunctions have the form 2 cos n x + sin x. 9. regular on 0 1 + n2 for n = 1 2 e−x sinnx for n = 1 2 n n2 2 11. regular on 1 e3 1 + for n = 1 2 x−1 sin lnx for n = 1 2 9 3 2 sinnx. The tenth partial sum of the series is compared to the function in the graph. 13. For 0 < x < 1 1 − x = n=1 n
√ √ 4 2 cosn/2 − 2 sinn/2 − −1n 2n − 1 cos x . This converges to −1 for 0 < x < 2 15. The expansion is 2n − 1 8 n=1 and 1 for 2 < x < 4, and to 0 if x = 2. The tenth partial sum of the series is compared to the function in the graph.
17. For $-3 < x < 3$, $x^2 = 3 + \dfrac{36}{\pi^2}\displaystyle\sum_{n=1}^{\infty}\frac{(-1)^n}{n^2}\cos\frac{n\pi x}{3}$. The tenth partial sum of the series is compared to the function in the graph.
4 1 1 2n − 1 2n − 1 x Now, f · n = x dx = 19. Normalized eigenfunctions are n x = √ cos x4 − x √ cos 8 8 0 2 2 √ 4−1n + 2n − 1 √ 4−1n + 2n − 1 2 4 2 512 128 2 −128 2 , so ≤ f · f = 0 x 4 − x2 dx = 15 , 3 2n − 13 3 2n − 13 n=1 4−1n + 2n − 1 2 512 1 . ≤ = or √ 3 2n − 13 960 n=1 15128 22
Section 16.4 3. The interval on which 13t is nonzero is disjoint from the interval on which −21 t is nonzero, so 13 t−21 t is identically zero, hence − 13 t−21 tdt = 0. Graphs of 13 t and −21 t are shown below.
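Problem 3's disjoint-support argument can be checked numerically. A sketch assuming the Haar convention ψ_{j,k}(t) = ψ(2ʲt − k), with ψ = 1 on [0, 1/2) and −1 on [1/2, 1) — the convention and grid are our own assumptions:

```python
import numpy as np

def haar(t):
    # the Haar mother wavelet, vectorized over t
    return np.where((t >= 0) & (t < 0.5), 1.0,
                    np.where((t >= 0.5) & (t < 1.0), -1.0, 0.0))

t = np.linspace(-10, 10, 200001)
p13 = haar(2.0 * t - 3)       # psi_{1,3}, supported on [1.5, 2)
pm21 = haar(0.25 * t - 1)     # psi_{-2,1}, supported on [4, 8)
overlap = np.max(np.abs(p13 * pm21))
print(overlap)                # 0.0: the product, hence the integral, vanishes
```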
5. A graph of $\psi(2t - 3)$ is shown below.
0.5 1.0 7. The Fourier 16 is seriesof ft on −16 n 6 3n n 3n 4 −2 sin + sin − sin − 2 sin + 4 2 n 8 16 n=1 n n nt 4 3n n 6 n 3n sin cos + − − cosn + 2 cos − cos + cos − 2 cos + 4 16 n 4 2 n 8 16 n nt sin . The graph below compares the function with the fiftieth partial sum of its Fourier series. cos 4 16
4 6 n n nt 1 3n −14 sin − 8 sin + 16 sin cos + 9. The Fourier series of ft on [−16, 16] is 2 4 8 16 n=1 n 1 n 3n nt n 3−1n + 2 cos + 3 + 8 cos − 16 cos sin . The graph below compares the function n 2 4 8 16 with the fiftieth partial sum of its Fourier series.
CHAPTER 17 Section 17.1
1. Compute $\dfrac{\partial^2 y}{\partial x^2} = -\dfrac{n^2\pi^2}{L^2}\sin(n\pi x/L)\cos(n\pi ct/L)$ and $\dfrac{\partial^2 y}{\partial t^2} = -\dfrac{c^2 n^2\pi^2}{L^2}\sin(n\pi x/L)\cos(n\pi ct/L)$.
3. Compute $\dfrac{\partial^2 y}{\partial x^2} = \dfrac{1}{2}\left[f''(x+ct) + f''(x-ct)\right]$ and $\dfrac{\partial^2 y}{\partial t^2} = \dfrac{c^2}{2}\left[f''(x+ct) + f''(x-ct)\right]$.
5. The problem for the displacement function $z(x, y, t)$ is
$\dfrac{\partial^2 z}{\partial t^2} = c^2\left(\dfrac{\partial^2 z}{\partial x^2} + \dfrac{\partial^2 z}{\partial y^2}\right)$ for $0 < x < a$, $0 < y < b$,
$z(x, y, 0) = f(x, y)$, $\dfrac{\partial z}{\partial t}(x, y, 0) = 0$ for $0 < x < a$, $0 < y < b$,
$z(0, y, t) = z(a, y, t) = z(x, 0, t) = z(x, b, t) = 0$.
Section 17.2 1. 3. 5. 7.
2−1n 16−1n 2n − 1x 2n − 1ct sin sin + sinnx sinnct yx t = 3 3 2 2 2 2 n=1 2n − 1 c n=1 n c 22n − 1t 108 2n − 1x sin yx t = sin 44 2n − 1 3 3 n=1 √ 24 2n − 1x −1n+1 sin cos 2n − 1 2t yx t = 2 2 n=1 2n − 1 −32 32n − 1t 2n − 1x cos sin yx t = 3 3 2 2 n=1 2n − 1 4 n n 3nt nx + cos − cos sin sin 2 2 2 4 2 2 n=1 n
9. Let Yx t = yx t + hx and substitute into the problem to choose hx = 19 x3 − 49 x. The problem for Y becomes
2 Y
2 Y = 3
t2
x2 Y0 t = Y2 t = 0 4 Y 1 x 0 = 0 Yx 0 = x3 − x 9 9 t √ nx 32 n 3t n cos , and then yx t = Yx t − hx. −1 sin We find that Yx t = 33 3n 2 2 n=1 11. Let Yx t = yx t + hx and choose hx = cosx − 1. The problem for Y is
2 Y
2 Y = 2 2
t
x Y0 t = Y2 t = 0
Y x 0 = 0
t 16 1 2n − 1x 2n − 1t sin cos , and then This problem has solution Yx t = 2 2 2 n=1 2n − 1 2n − 1 − 4 yx t = Yx t + 1 − cosx. r t nx 1 r t nx 2A L 13. ux t = e−At/2 rn cos n + sin n , where Cn = dx and Cn sin fx sin L AL 2L 2L rn 0 L n=1 rn = 4BL2 + n2 2 c2 − A2 L2 Yx 0 = cosx − 1
64 1 nx 2 1 sin 15. (a) The solution with the forcing term is yf x t = − 3 3 2 2n − 1 9 2n − 1 2n − 1 − 16 4 n=1 32n − 1t 1 cos +
cosx − 1. 4 9 2 128 32n − 1t 2n − 1x cos . sin (b) Without the forcing term, the solution is yx t = 3 3 4 4 n=1 2n − 1 Both solutions are graphed together for times t = 0.5, 0.6, 4.9, and 9.8, using the same axes to allow comparison.
17. yj1 = 0.25 for j = 1, 2, …, 19; y12 = 0.08438, yj2 = 0.05 for j = 2, 3, …, 19; y13 = 0.13634, y23 = 0.077149, yj3 = 0.075 for j = 3, 4, …, 19; y14 = 0.17608, y24 = 0.10786, y34 = 0.10013, yj4 = 0.1 for j = 4, 5, …, 19; y15 = 0.20055, y25 = 0.14235, y35 = 0.12574, y45 = 0.12501, yj5 = 0.125 for j = 5, 6, …, 19
19. We give yjk for j = 1, 2, …, 9, first for k = −1, then k = 0, 1, …, 5.
yj,−1: 0.08075, 0.127, 0.14475, 0.14, 0.11875, 0.087, 0.05075, 0.016, −0.01125
yj0: 0.081, 0.128, 0.147, 0.144, 0.125, 0.096, 0.063, 0.032, 0.009
yj1: 0.079125, 0.1735, 0.14788, 0.147, 0.13063, 0.10475, 0.075375, 0.0485, 0.030125
yj2: 0.0057813, 0.02115, 0.77078, 0.14903, 0.13567, 0.11328, 0.087906, 0.065531, 0.050516
yj3: −0.055066, 0.27160, 1.3199, 0.18908, 0.14015, 0.12162, 0.10062, 0.083022, 0.068688
yj4: −0.092055, 0.3768, 1.7328, 0.29675, 0.14653, 0.12981, 0.11355, 0.10072, 0.083463
yj5: −0.093987, 0.53745, 1.9712, 0.48652, 0.16125, 0.13803, 0.12669, 0.11814, 0.0941
Section 17.3
−1 sin 10 cosx cos12td 3. yx t = sinx sin4td 25 + 2 2 2 − 1 0 0
1 2 cos − sin 1 −2 cos + 2 sin cosx + sinx sin3td 5. yx t = e−2 e 3 4 + 2 3 4 + 2 0 2 2 − sin − 2 cos sinx cos3td 7. yx t = 3 0 1. yx t =
1 sin/2 − sin5/2 sinx sin2td 2 − 1 3 −
16 cos3 − 12 cos + 122 sin cos2 − 32 sin − 8 sin cos2 11. yx t = 75 0 + 2 sin + 2 sinx sin14td x x 13. yx t = At + 1 − A t − H t− c c 9. yx t =
0
Section 17.4
1. Characteristics are the lines x − t = k₁ and x + t = k₂;
y(x, t) = (1/2)[(x − t)² + (x + t)²] + (1/2)∫_{x−t}^{x+t} (−ξ) dξ = x² + t² − xt
3. Characteristics are x − 7t = k₁, x + 7t = k₂;
y(x, t) = (1/2)[cos(x − 7t) + cos(x + 7t)] + t − x²t − (49/3)t³
5. Characteristics are x − 14t = k₁, x + 14t = k₂; y(x, t) = (1/2)(e^{x−14t} + e^{x+14t}) + xt
7. y(x, t) = x + (1/8)(e^{−(x+4t)} − e^{−(x−4t)}) + (1/2)xt² + (1/6)t³
9. y(x, t) = x² + 64t² − x + (1/32)(sin(2(x + 8t)) − sin(2(x − 8t))) + (1/12)xt⁴
11. y(x, t) = (1/2)(cosh(x − 3t) + cosh(x + 3t)) + t + (1/4)xt⁴
In each of 13, 15, and 17, the graphs show a progression of the motion as a sum of forward and backward waves.
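Problem 1's closed form can be spot-checked against d'Alembert's formula directly; the midpoint-rule quadrature and the sample point below are illustrative choices (c = 1, f(x) = x², g(x) = −x as in Problem 1).

```python
def dalembert(f, g, c, x, t, steps=10_000):
    """d'Alembert solution of y_tt = c^2 y_xx with y(x,0) = f, y_t(x,0) = g."""
    a, b = x - c * t, x + c * t
    h = (b - a) / steps
    # midpoint rule for the integral of g over [x - ct, x + ct]
    integral = sum(g(a + (k + 0.5) * h) for k in range(steps)) * h
    return 0.5 * (f(a) + f(b)) + integral / (2.0 * c)

x0, t0 = 0.7, 1.3
y = dalembert(lambda s: s * s, lambda s: -s, 1.0, x0, t0)
print(y, x0 * x0 + t0 * t0 - x0 * t0)  # the two values agree
```

The quadrature is exact (up to roundoff) here because g is linear, so the numeric value matches x² + t² − xt.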
Section 17.5
1. We find that (approximately) a₁ = [2/J₁(2.405)²]∫₀¹ x f(x) J₀(2.405x) dx = 2(0.1057)/0.2695 = 0.78442, a₂ = 0.04112, a₃ = −0.81366, a₄ = −0.3752, a₅ = −0.64709. The fifth partial sum of the series gives the approximation z(r, t) ≈ 0.78442J₀(2.405r)cos(2.405t) + 0.04112J₀(5.520r)cos(5.520t) − 0.81366J₀(8.654r)cos(8.654t) − 0.3752J₀(11.792r)cos(11.792t) − 0.64709J₀(14.931r)cos(14.931t). The graph below shows z(r, t) at various times.
3. Approximately, z(r, t) ≈ 1.2534J₀(2.405r)cos(2.405t) − 0.88824J₀(5.520r)cos(5.520t) − 0.2489J₀(8.654r)cos(8.654t) − 0.11336J₀(11.792r)cos(11.792t) − 0.19523J₀(14.931r)cos(14.931t). The graph below shows z(r, t) at selected times.
Section 17.6 1. Compute 1 1 1 0 r = 4 − r 2 sin2 d = 4 − r 2 2 2 − 2 ⎧ 0 ⎪ ⎨ 2 4−r sin2 cosnd = 4 − r 2 n r = 1 ⎪ − − ⎩ 2
if n = 2 if n = 2
4 − r2 2 sin sinnd = 0 − 1 2 1 2 Thus, zr t = j 1 − J j d J r cosj0k t 0 0k 0 0k
J1 j0k 2 0 2 k=1 1 4 1 2 + j − 1J j d J r cos2 cosj2k t 2 2k 2
J3 j2k 2 0 2 2k k=1 1 4−1p+1 1 + j sinp J j dJ r sinjpq t p pq pq 2 2 0 p=1 q=1 pjpq Jp+1 jpq n r =
≈ 11081J0 12025r cos240483t − 013975J0 2760r cos552008t +04555J0 43270r cos865373t − 002105J0 58960r cos117915t +001165J0 74655r cos1443092t + · · · − 29777J2 25675r cos2 cos51356t −14035J2 42085r cos2 cos841724t − 11405J2 58100r cos2 cos116198t −083271J2 7398r cos2 cos147960t − · · ·
Section 17.7
1 8−1n+1 2 16 1 nx n 2 + 3 −1 − 1 sin siny cos 1. zx y t = n + 4t n=1 n n 2 2
Answers and Solutions to Selected Problems 16 2n − 1x 2m − 1x cos 3. zx y t = sin 2 2 2 2 2 n=1 m=1 2n − 12m − 1 2n − 1 + 2m − 1 2n − 12 + 2m − 12 t × sin
CHAPTER 18 Section 18.1
1. ∂u/∂t = k ∂²u/∂x² for 0 < x < L, t > 0; u(0, t) = ∂u/∂x(L, t) = 0 for t ≥ 0; u(x, 0) = f(x) for 0 ≤ x ≤ L.
3. ∂u/∂t = k ∂²u/∂x² for 0 < x < L, t > 0; ∂u/∂x(0, t) = 0 and u(L, t) = t for t ≥ 0; u(x, 0) = f(x) for 0 ≤ x ≤ L.
Section 18.2
In these solutions, exp(A) = e^A.
8L2 −2n − 12 2 kt 2n − 1x exp sin 3 3 L L2 n=1 2n − 1 −16L 2n − 1x −32n − 12 2 t sin exp ux t = 2 L L2 n=1 2n − 1 2n − 1 − 4 2 4 2 cosnxe−4n t ux t = 2 − 2 3 n n=1 nx 2 2 1 1 − e−6 −1n e−n t/18 cos ux t = 1 − e−6 + 12 2 2 6 36 + n 6 n=1 4B 2n − 1x −2n − 12 2 kt sin exp ux t = 2L 4L2 n=1 2n − 1
1. ux t = 3. 5. 7. 9.
11. Substitute e^{αx+βt}v(x, t) into the partial differential equation and solve for α and β so that v_t = kv_xx. We get α = −A/2 and β = kB − A²/4.
13. Let u(x, t) = e^{−3x−9t}v(x, t). Then v_t = v_xx, v(0, t) = v(4, t) = 0, and v(x, 0) = e^{3x}. Then v(x, t) = Σ_{n=1}^∞ [2nπ(1 − e^{12}(−1)ⁿ)/(144 + n²π²)] sin(nπx/4) e^{−n²π²t/16}. Graphs of the solution are shown for times t = 0.003, 0.02, 0.08, and 1.3.
15. Let u(x, t) = v(x, t) + f(x) and choose f(x) = 3x + 2 to have v_t = 16v_xx, v(0, t) = v(1, t) = 0, and v(x, 0) = x² − f(x). Then v(x, t) = Σ_{n=1}^∞ 2[4n²π²(−1)ⁿ + 2(−1)ⁿ − 2 − 2n²π²]/(n³π³) sin(nπx) e^{−16n²π²t} and u(x, t) = v(x, t) + 3x + 2. Graphs of the solution for times t = 0.005, 0.009, and 0.01.
17. Let u(x, t) = e^{−At}w(x, t). Then w_t = 4w_xx, w(0, t) = w(9, t) = 0, and w(x, 0) = 3x. Obtain w(x, t) = Σ_{n=1}^∞ [54(−1)^{n+1}/(nπ)] sin(nπx/9) e^{−4n²π²t/81}. The graphs compare solutions at times t = 0.008, 0.04, and 0.6 for A = 1/4, 1, and 3.
19. u(x, t) = (1/8π) Σ_{n=1}^∞ [(1 − (−1)ⁿ)/n⁵](−1 + 4n²t + e^{−4n²t}) sin(nx) + (4/π) Σ_{n=1}^∞ [(1 − (−1)ⁿ)/n³] sin(nx) e^{−4n²t}. The graphs compare solutions with and without the source term, for times t = 0.8, 0.4, and 1.3.
21. ux t = nx 50 1 − cos 5−1n 2 2 −n2 2 t/25 −25 + n sin t + 25e n3 3 n2 2 − 25 5 n=1 nx 2 2 4−1n + 2 e−n t/25 + −250 sin n3 3 5 n=1 . The graphs compare solutions with and without the source term, for times t = 0.7, 1.5, 2.6, and 4.2.
23. u(x, t) = (27/128) Σ_{n=1}^∞ [(−1)^{n+1}/(n⁵π⁵)](−9 + 16n²π²t + 9e^{−16n²π²t/9}) sin(nπx/3) + 2K Σ_{n=1}^∞ [(1 − (−1)ⁿ)/(nπ)] sin(nπx/3) e^{−16n²π²t/9}.
The graphs show the solution with and without the source term, at times t = 0.05 and 0.2, with K = 1/2.
25. In the following, j = 1, 2, …, 9.
uj0: 0.009, 0.032, 0.063, 0.096, 0.125, 0.144, 0.147, 0.128, 0.081
uj1: 0.0125, 0.034, 0.0635, 0.095, 0.1225, 0.14, 0.1415, 0.121, 0.0725
uj2: 0.01475, 0.064125, 0.089, 0.094, 0.1195, 0.136, 0.136, 0.114, 0.0665
uj3: 0.023381, 0.058, 0.084031, 0.099125, 0.11725, 0.13188, 0.1305, 0.10763, 0.06175
27. In the following, j = 1, 2, …, 9.
uj0: 0.098769, 0.19021, 0.2673, 0.32361, 0.35355, 0.35267, 0.31779, 0.24721, 0.14079
uj1: 0.096937, 0.18622, 0.26211, 0.31702, 0.34585, 0.34417, 0.30887, 0.23825, 0.13220
uj2: 0.095124, 0.18307, 0.25697, 0.3105, 0.33822, 0.33577, 0.30004, 0.22939, 0.12566
uj3: 0.09330, 0.17956, 0.25188, 0.30405, 0.33062, 0.32745, 0.29131, 0.22112, 0.12018
Section 18.3
1 8 2 cosxe− kt d 0 16 + 2 8 cos3 − cos + 4 sin cos2 − 2 sin cos cosx ux t = 2 0
4 −2 sin cos3 + sin cos + 8 cos4 − 8 cos2 + 2 sinx e− kt d − 2 2 2 sinxe− kt d ux t = 0 2 + 2 2 1 − cosh 2 sinxe− kt d ux t = 0 4 2 2 sinxe− t e−t /2 d ux t = 0 1 + 2 2 t x 2t − erfc √ ux t = d, in which erfc is the complementary error function. 0 2 k
1. ux t = 3.
5. 7. 9. 11.
Section 18.4
1. U(r, t) = Σ_{n=1}^∞ [2/J₁(jₙ)²](∫₀¹ ξ f(ξ) J₀(jₙξ) dξ) J₀(jₙr) e^{−jₙ²t}; the fifth partial sum, with approximate values inserted, is U(r, t) ≈ 0.8170J₀(2.405r)e^{−5.785t} − 1.1394J₀(5.520r)e^{−30.47t} + 0.7983J₀(8.654r)e^{−74.89t} − 0.747J₀(11.792r)e^{−139.04t} + 0.6315J₀(14.931r)e^{−222.93t}. A graph of this function is shown for times t = 0.003, 0.009, 0.04, and 0.7.
3. U(r, t) = Σ_{n=1}^∞ [2/(9J₁(jₙ)²)](∫₀³ ξ(9 − ξ²) J₀(jₙξ/3) dξ) J₀(jₙr/3) e^{−jₙ²t/18}; the fifth partial sum, with approximate values inserted, is U(r, t) ≈ 9.9722J₀(2.405r/3)e^{−5.78t/18} − 1.258J₀(5.520r/3)e^{−30.47t/18} + 0.4093J₀(8.654r/3)e^{−74.89t/18} − 0.1889J₀(11.792r/3)e^{−139.04t/18} + 0.1048J₀(14.931r/3)e^{−222.93t/18}. A graph of this partial sum is shown for times t = 0.003, 0.009, 0.08, and 0.4.
Section 18.5
1. u(x, y, t) = Σ_{n=1}^∞ Σ_{m=1}^∞ b_nm sin(nπx/L) sin(mπy/K) e^{−α_nm kt}, where α_nm = π²(n²/L² + m²/K²) and b_nm = (4/LK)∫₀^K ∫₀^L f(x, y) sin(nπx/L) sin(mπy/K) dx dy
3. u(x, y, t) = Σ_{m=1}^∞ [8(−1)^{m+1}m/(π(2m − 1)(2m + 1))] sin(x) sin(my) e^{−(1+m²)t}
CHAPTER 19
Section 19.1
1. ∇²(f + g) = (f + g)_xx + (f + g)_yy = f_xx + f_yy + g_xx + g_yy = ∇²f + ∇²g, and ∇²(αf) = (αf)_xx + (αf)_yy = α(f_xx + f_yy) = α∇²f.
3. Compute ∂[ln(x² + y²)]/∂x = 2x/(x² + y²) and ∂²[ln(x² + y²)]/∂x² = 2(y² − x²)/(x² + y²)². Similarly, ∂²[ln(x² + y²)]/∂y² = 2(x² − y²)/(x² + y²)². Then ∇²[ln(x² + y²)] = 0, provided that x² + y² ≠ 0.
5. Recall that, in polar coordinates, Laplace's equation is ∂²u/∂r² + (1/r)∂u/∂r + (1/r²)∂²u/∂θ² = 0. It is routine to verify by substitution that the given functions are harmonic.
Section 19.2 1. ux y = 3. ux y =
−1 sinx sin hy − sinh 2 32 n−1n+1
sinnx sinhny 2 sinh4n 2n − 12 2n + 12 nx ny 1 1 −1n − 1 sinx sinhy + sin sinh 16n 5. ux y = sinh 2 2 n − 22 n + 2 sinhn 2 /2 2 2 n=1n=2 1 siny sinhx + sinh2 2n − 1y 2n − 1x sinh , 7. ux y = cn sin 2a 2a n=1 a 2 2n − 1x where cn = dx. fx sin a sinh 2n − 1b/2a 0 2a −1 1 − −1n 2 9. ux y = siny sinhx − 4 + 2 sinny sinhnx sinh4 3 n3 n=1 sinh4n n=1
Section 19.3
n 1 1 r 3. ur = 2 + 2−1n 2 2 cosn + n sinn 3 2 n n=1 n − 1 1 r e −1n
− cosn − n sinn + e2 cosn + e2 n sinn 5. ur = sinh + n=1 4 n2 + 1 n 2 r
n2 2 −1n sinn − 6−1n sinn 7. ur = 1 + 3 8 n n=1
1. ur = 1
9. In polar coordinates, the problem is ∇²U(r, θ) = 0 for r < 4, U(4, θ) = 16cos²θ. This has solution U(r, θ) = 8 + r²(cos²θ − 1/2). In rectangular coordinates, the solution is u(x, y) = (1/2)(x² − y²) + 8.
11. In polar coordinates, the solution is U(r, θ) = r²(2cos²θ − 1), so u(x, y) = x² − y².
Section 19.4
3 2 3 1 = d = 98696/; u /3 ≈ 4813941647/, 2 8 0 5/4 − cos − 4 u02 /4 ≈ 8843875590/ 3. u4 ≈ 15525/ u12 3/2 ≈ 302/ u8 /4 ≈ 11156/ u7 0 ≈ 24851/ Rn 1 2 R2 − R2 /4 5. With ur = r n sinn, compute uR/2 /2 = n sinn/2 = 2 2 2 2 0 R + R /4 − R2 cos − /2 Rn sinnd. Upon dividing out common powers of R and solving for the integral, we obtain 2 sinn 1 2 sinn/2 = d. n 2 3 5 − 4 sin 0 2 n 2 1 1 7. cos = cosnd −1n = cosnd n−1 n−1 32 2 5 − 4 sin 32 5 + 4 cos 0 0
1. u
Section 19.5
1 4−x 4+x arctan − arctan for − < x < y > 0 y y y 1 1 ux y = e− cosd − 0 y2 + − x2 y2 + + x2 2 2 ux y = f sind sinxe−y d + g sind sinye−x d 0 0 0 0 h 2 ux y = Be−y sinx + 1 − −1n 1 − e−ny sinnx 3 n=1 n
−1n −ny 4 −1n 2 e −2 sinnx. − +6 Using a finite Fourier sine transform in x, we get ux y = n=1 n n n 2 sinye−x d ux y = 0 1 + 2 y 8 A A x−8 x−4 − arctan + arctan ux y = d = 4 y2 + − x2 y y
1. ux y = 3. 5. 7. 9. 11. 13.
Section 19.6
4−1n+m √ sinnx sinmy sinh n2 + m2 z n=1 m=1 nm 2 sinh n2 + m2 16 3. ux y z = 2 2n − 12m − 1 sinh2 2m − 12 + 2 2n − 12 n=1 m=1 × sin2n − 1x sin2m − 1z sinh 2m − 12 + 2 2n − 12 y ⎡ 1. ux y z =
⎢ ⎢ 16 ⎢ + ⎢ 2 2n − 12m − 1 n=1 m=1 ⎣
sinh
1 2m − 12 + 2 2n − 12 4
⎤ ⎥ ⎥ 2m − 12 2m − 1y 2 2 × sin2n − 1x sin sinh + 2n − 1 z ⎥ ⎥ 2 4 ⎦
Section 19.7 1. u =
2n + 1A 1 n arccos2 Pn d Pn cos 2 R −1 n=0
≈ 29348A − 37011A 3
R
P1 cos + 11111A
2 R
P2 cos
4 5 P3 cos + 03200A P4 cos − 02120 P5 cos + · · · R R R 2 3. u ≈ 60784 − 98602 P cos + 52360 P2 cos R 1 R 3 4 5 −24044 P3 cos + 15080 P4 cos − 09783 P5 cos + · · · R R R
1 1 5. u = R −1 T R R2 − R1 1 1 2 −05397A
1
AP2n−1 xdx 4n − 1A 1 = P2n−1 xdx. 1 R2n−1 0 R2n−1 0 P2n−1 x2 dx n=1 4n − 1 1 9. u = a2n−1 2n−1 P2n−1 cos, where a2n−1 = 2n−1 farccosxP2n−1 xdx R 0 n=1 7. u =
2n−1
a2n−1
P2n−1 cos, where a2n−1 =
0
Section 19.8
3.
5. 7. 9. 11.
1
4 cosx cosh1 − y + C. 4 cosxdx = 0, a solution may exist. We find ux y = − sinh Since 0 cos3xdx = 0 6x − 3dx = 0, a solution may exist. We find ux y = 12 −1n − 1 1 cos3x cosh3 − y + cosnx coshny + C. 3 −3 sinh3 n=1 n sinhn 2 2 n −1n + 61 − −1n sinny coshn1 − x ux y = 2 n4 4 sinh n=1 1 R r 2 ur = a0 + 2 cos2 − 1 2 2 R 1 lny2 + − x2 e− sind + c ux y = 2 − 2 a cosxe−y d + c, with a = − f cosd ux y = 0 0
1. Since
0
CHAPTER 20 Section 20.1 1. 26 − 18i
3.
1 1 + 18i 65 n 4n+1
11. i4n = i2 2 n = 1 = 1 i 13. 23. 27. 29.
5. 4 + 228i
7. 6 − i
1 −1632 + 2024i 4225 4n 2 4n+3 4n 2
9.
= ii4n = i, since i4n = 1 i4n+2 = i i = −1 i = i i i = −i + 2n 19. − tan−1 23 + 2n 17. a2 − b2 + b + 1 2ab − a 21. + 2n 2 √ √ 25. 29 cos − tan−1 25 + i sin − tan−1 25 2 2 cos3/4 + i sin3/4 √ −1 1 + i sin tan−1 18 65 cos tan 8 z−w z−w = = 1 z − w = 1. Hint: If z = 1, then z¯z = 1 and 1 − z¯ w z¯z − w¯z ¯z z − w
Section 20.2 1. 5. 9. 13.
15. 17.
19.
21.
√ Circle of radius 9 with center (8, −4) 3. Circle of radius 21 65 with center (0, − 21 ) The real axis for x ≤ 0 7. The line y = x + 2 The line 8x + 10y + 27 = 0 11. The half-plane 3x + y + 2 > 0 K is the closed half-plane 2x + 8y + 15 ≥ 0; every point of K is a limit point of K, and there are no other limit points; boundary points of K are those points on the line 2x + 8y + 15 = 0; K is closed but not compact (because K is not bounded). M consists of all points below the line y = 7; limit points are points of M and points on the line y = 7; boundary points are points x + 7i; M is open; M is not compact. U consists of all points x + iy with 1 < x ≤ 3; limit points are points of U and points on the line x = 1; boundary points are points on the lines x = 1 and x = 3; U is neither open nor closed; U is not compact (neither closed nor bounded). W consists of all x = iy with x > y2 . These are points (x y) inside and to the right of the parabola x = y2 . This set is open, and not compact. Limit points are all points of W and points on the parabola; boundary points are the points on the parabola. 1 + 2i 23. 2 − i 25. −1 27. 23 i
29. If n is even, say n = 2m, then i2n = i4m = 1, so {1} is one convergent subsequence; if n = 2m + 1, then i2n = i4m i2 = i2 = −1, so {−1} is another convergent subsequence. There are others.
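The period-4 cycle of powers of i used in Problem 11 of Section 20.1 and in Problem 29 above is easy to confirm numerically; for small integer exponents CPython computes these complex powers exactly, so the equalities below hold without tolerance (a sketch).

```python
# i^(4n) = 1, i^(4n+1) = i, i^(4n+2) = -1, i^(4n+3) = -i for n = 0, 1, 2, ...
i = 1j
for n in range(10):
    assert i ** (4 * n) == 1
    assert i ** (4 * n + 1) == i
    assert i ** (4 * n + 2) == -1
    assert i ** (4 * n + 3) == -i
print("powers of i cycle with period 4")
```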
CHAPTER 21 Section 21.1 1. u = x v = y − 1; Cauchy–Riemann equations hold at all points; f is differentiable for all complex z. 3. u = x2 + y2 v = 0; nowhere; nowhere 5. u = 0 v = x2 + y2 ; the Cauchy–Riemann equations hold at (0, 0); nowhere 7. u = 1 y/x; nowhere; nowhere 9. u = x2 − y2 v = −2xy; 0 0; nowhere y x v = −4y − 2 ; Cauchy–Riemann equations hold for all nonzero z; differentiable for z = 0 11. u = −4x + 2 x + y2 x + y2
Section 21.2
1. 2; |z + 3i| < 2  3. 1; |z − 1 + 3i| < 1  5. 2; |z + 8i| < 2  7. 1; |z + 6 + 2i| < 1
9. No; i is closer to 2i than 0 is, so if the series converged at 0 it would have to converge at i.
11. |c_{n+1}/c_n| is either 2 or 1/2, depending on whether n is odd or even, so |c_{n+1}/c_n| has no limit. However, lim_{n→∞} |c_n|^{1/n} = 1, so the radius of convergence is 1 by the nth root test applied to Σ_{n=0}^∞ c_n zⁿ.
Section 21.3
1. cos(1) + i sin(1)  3. cos(3)cosh(2) − i sin(3)sinh(2)  5. e⁵cos(2) + ie⁵sin(2)
7. (1/2)(1 − cos(2)cosh(2)) + (1/2)i sin(2)sinh(2)  9. i
11. u = e^{x²−y²} cos(2xy), v = e^{x²−y²} sin(2xy)
13. u = sin(x)cos(x)/(cos²x cosh²y + sin²x sinh²y), v = cosh(y)sinh(y)/(cos²x cosh²y + sin²x sinh²y)
15. sin²z + cos²z = [(1/2i)(e^{iz} − e^{−iz})]² + [(1/2)(e^{iz} + e^{−iz})]² = 1
17. z = ln2 + (π/2 + 2nπ)i, n any integer  19. z = ln2 + (2n + 1)πi, n any integer
Section 21.4
1. ln(4) + (4n − 1)(π/2)i; ln(4) − (π/2)i
3. ln(5) + (2n + 1)πi; ln(5) + πi
5. ln√85 + ((2n + 1)π + tan⁻¹(−2/9))i; ln√85 + (π + tan⁻¹(−2/9))i
7. Hint: In polar form, z/w = |z/w| e^{i(arg(z) − arg(w))}.
Section 21.5 1. ie
−2n+/2
−2n+/2
32n+3/4
3ln2 3ln2 − i sin cos 2 2
3. e 5. e n n 7. cos + + i sin + 9. 16e2n+1 cosln4 − i sinln4 8 2 8 2 2n + 1 2n + 1 + i sin 13. cosn/3 + i sinn/3 11. 2 cos 4 4
15. The nth roots of unity are ωₖ = e^{2kπi/n} for k = 0, 1, …, n − 1. Now use the fact that Σ_{k=0}^{n−1} zᵏ = (zⁿ − 1)/(z − 1) for z ≠ 1, with z = e^{2πi/n}.
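The two facts used in Problem 15 — each root satisfies zⁿ = 1, and all n of them sum to 0 — can be checked numerically with cmath (a sketch; the range of n tested is arbitrary).

```python
import cmath

def roots_of_unity(n):
    """The n-th roots of unity w_k = e^{2*pi*i*k/n}, k = 0..n-1."""
    return [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

for n in range(2, 12):
    roots = roots_of_unity(n)
    s = sum(roots)
    assert abs(s) < 1e-9                    # geometric-sum identity: sum is 0
    assert all(abs(w ** n - 1) < 1e-9 for w in roots)  # each satisfies z^n = 1
print("checked n = 2..11")
```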
CHAPTER 22
Section 22.1
The graphs of the curves in Problems 1, 3, 5, 7, and 9 are not reproduced here.
1. initial point 6 − 2i, terminal point 2 − 2i; simple and not closed; tangent Γ′(t) = 2ie^{it} = −2 sin(t)i + 2 cos(t)j
3. 1 + i, 3 + 9i; simple and not closed; Γ′(t) = 1 + 2ti = i + 2tj
5. 3, 3; closed but not simple; Γ′(t) = −3 sin(t) + 5 cos(t)i = −3 sin(t)i + 5 cos(t)j
7. −2 − 4i, 4 − 16i; simple and not closed; Γ′(t) = 1 − ti = i − tj
9. 1, cos(2) − 2 sin(4)i; simple and not closed; Γ′(t) = −sin(t) − 4 cos(2t)i = −sin(t)i − 4 cos(2t)j
Section 22.2
1. 8 − 2i  3. (3/2)(1 + i)  5. (1/2)(−13 + 4i)  7. −(1/2)(cosh(8) − cosh(2))
9. −(1/2)[e⁻¹(cos(2) + i sin(2)) − e(cos(2) − i sin(2))]  11. 10 + 210i  13. (25/2)i  15. (2/3)(1 + i)
16. 1/√z (or any larger number)
3. 0
5. 2i
7. 0
9. 0
11. 4i
Section 22.4 1. 32i 3. 2i−8 + 7i 9. −5121 − 2i cos256
5. −2e2 cos1 − i sin1 − 39i 13. 2 11. − 13 2
7. i6 cos12 − 36 sin12
CHAPTER 23 Section 23.1 −1n 2n 2n 2 z z < (that is, the series converges for all complex z) n=0 2n! √ 1 z − 4in z − 4i < 17 5. n + 1zn z < 1 3. n+1 1 − 4i n=0 n=0
1.
7. −3 + 1 − 2iz − 2 + i + z − 2 + i2 z < 9. 63 − 16i + −16 + 2iz − 1 − i + z − 1 − i2 z < −1n z + i2n+1 z < 11. n=0 2n + 1! n 2n 2 + 2n−1 2n z +i z2n+1 z < 13. 1 + iz + 2n! 2n + 1! n=1 15. Fix z and think of w as the variable. Define fw = ezw . Then f n w = zn ezw . By Cauchy’s integral formula, 1 ezw n! ezw zn = dw, with $ the unit circle about the origin. Then dw, so f n 0 = zn = 2i $ wn+1 n! 2i $ wn+1 n 2 n 2 n n z 1 z n ezw 1 1 1 z z z zw zw dw = e dw. Then = e dw = n! 2i $ n!wn+1 2i n=0 $ n!wn+1 2i $ n=0 n! w w n=0 n! 1 1 = ezw+1/w dw. Now let w = ei on $ to derive the result. 2i $ w 17. The maximum must occur at a boundary point of the rectangle. Consider each side. On the left vertical side, x = 0 and ez = eiy = 1. On the right vertical side, ez = e1 eiy = e has maximum e. On the lower horizontal side, ez = ex for 0 ≤ x ≤ 1, with maximum e. On the upper horizontal side, ez = ex has maximum e. Thus the maximum of ez on this rectangle is e.
Section 23.2 1.
1 −1n + z − in , for 0 < z − i < 2 z − i n=0 2in+1
5. −
1 − 2 − z − 1 0 < z − 1 < z−1
−1n+1 4n 2n−2 z z < 2n! n=1 1 2i z2n 7. 2 + 0 < z < 9. 1 + 0 < z − i < z z−i n=0 n + 1!
3.
CHAPTER 24
Section 24.1
1. pole of order 2 at z = 0  3. essential singularity at z = 0
5. simple poles at i and −i, pole of order 2 at 1  7. simple pole at −i, removable singularity at i
9. simple poles at 1, −1, i, and −i  11. simple poles at (2n + 1)π/2
Answers and Solutions to Selected Problems 13. f has a Taylor expansion fz =
A65
an z − z0 n in some open disk about z0 , and g has a Laurent expansion of the
n=0
b b z − z0 n in some annulus 0 < z − z0 < r. Then fg has an expansion of the form form gz = −1 + z − z0 n=0 n b−1 a0 + c z − z0 n in this annulus, and b−1 a0 = 0 because b−1 = 0 and a0 = fz0 = 0. z − z0 n=0 n
1 gz gz = in some annulus hz z − z0 3 qz 0 < z − z0 < r. Now g/q is analytic at z0 , and so has Taylor expansion gz/qz = cn z − z0 n in some disk
15. Write hz = z − z0 3 qz, where q is analytic at z0 and qz0 = 0. Then
n=0
about z0 . Further, c0 = 0 because qz0 = 0 and gz0 = 0. Then, in some annulus about z0 c0 gz = + c z − z0 n−2 . hz z − z0 3 n=1 n
Section 24.2 and at −2i 251 9 + 12i; the value of the integral is therefore 2i. 8i e − 1 3. 0 5. 2i 7. 2i 9. −i/4 11. 0 13. 2i 15. 2 an z − z0 n and hz = bn z − z0 n , with a0 = 0 and b3 = 0. From Problem 15, Section 24.1, 18. Write gz = n=0 n=3 gz n n n n = d z − z0 , with d−3 = 0. Write gz = an z − z0 = bn z − z0 dn z − z0 and hz n=−3 n n=0 n=3 n=−3 equate the coefficient of z − z0 n on the left with the coefficient of z − z0 n in the product on the right. We get a0 = d−3 b3 a1 = d−3 b4 + d−2 b3 a2 = d−3 b5 + d−2 b4 + d−1 b3 . Use these to solve for d−1 in terms of coefficients 1 1 a0 a1 a2 b1 b4 and use the fact that an = g n z0 bn = hn z0 . n! n! 1. The residue at 1 is
1 16 − 12i, 25
Section 24.3 1. 2i 3. 0 1 −4t 5. cos3t 7. 36 e − 361 e2t + 16 te2t 9. 21 t2 e−5t √ √ √ 1 + 5e−4 11. 2/ 3 13. 2/3 15. 41 e−2 2 sin2 2 17. −/128 19. 32 3 17 cos/5 + 16 cos3/5 16 sin 21. − 5 5 289 + 168 cos2/5 + 136 cos4/5 + 32 cos6/5 cosx 1 − 23. Re seiz /z2 + 1 i = − 21 ie− , so dx = 2i − ie = e− . 2 2 − x + 1 z dz, with $ the unit 25. With the trigonometric substitutions, we obtain −4i 2 2 2 2 2 2 2 $ − z + 2 + z + − 0 circle. The two poles within the unit disk are z = ±
− , and the residue at each is −i/2. Therefore, the value +
of the integral is 2i−i/, or 2/. −z2 e dz = 0, where $ is the rectangular path. Writing the integral over each piece of the $ R 2 2 e−x dx + e−R+it idt+ boundary (starting on the bottom and going counterclockwise), we have −R 0 −R 0 2 2 2 2 e−x+i dx + e−−R+it idt = 0. Let R → . The second integral is e−R et e−2iRt dt, and this goes to R 0 2 e−x dx− zero as R → . Similarly, the fourth integral has limit zero. The first and third integrals give − √ 2 2 2 2 2 2 e−x e e−2ix dx = 0. Then = e e−x cos2xdx + ie− e−x sin2xdx. The last integral is zero − − − √ 2 2 because the integrand is odd. Finally, write = 2e 0 ex cos2xdx, since this integrand is even. Now solve for the integral.
27. By Cauchy’s theorem,
Answers and Solutions to Selected Problems
A66
2
eiz dz = 0. Now integrate over each piece of $, going counterclockwise and beginning /4 2 2i 0 R 2 i/4 2 eix dx + eiR e Riei d + eire ei/4 dr = 0. The second integral tends to with the segment 0 R
28. By Cauchy’s theorem,
$
0
0
R
zero as R → , and in the limit √ the lastequation becomes √ 1 2 2 2 2 −r 2 1 + i 1 + i . Equate the real part of each side and the
cosx + i sinx dx = e dr = 2 2 2 0 0 imaginary part of each side to evaluate Fresnel’s integrals. 2 1 1 1 z 30. dz = −4i d = dz. The integrand has double 2 2 2 + 2z + 2 + cos iz z $ $ 0 1 z+ + 2 z 1 1 2 2 poles at − ± − , but only − + 2 − 2 is enclosed by the unit circle $. Compute 2 i 1 −4iz 1 2 2 − 2 − + Re s =− 2 . Then d = 2 . 2 2 2 3/2 z + 2z + − + cos2 − 2 3/2 0 2 1 1 d = 2 d. Finally, check that 2 2 + cos 0 0 + cos
CHAPTER 25 Section 25.1 1. The images are given by the following diagrams.
v
y π D
w=ez
C
A
π
B
x (a)
C' –eπ
D' –1
A'
y 2
C'
C
D' –1 – π2
B
D
B'
C B 1
A
C'
θ=
π
D' A'
x (c)
B'
1
e
u
e A'
e2 B'
u
v
y π
u
(b) v
4
e
A'
y π
1 e
x
1
A
u
v π
D
B' eπ
1
D
C
A 1
B 2
x (d)
C'
D'
Answers and Solutions to Selected Problems v
y 2
D
e2
C'
π
C D'
–1
1 e
x
2
u
A' A
B
– π2
B'
(e)
3. The images are given by the following diagrams.
C'
5. The sector consisting of all w with an argument in /2 . 1 1 1 1 7. Hint: Put z = rei and obtain u = r+ cosk v = r− sink, where the half-line is = k. 2 r 2 r 9. The entire w plane with the origin excluded.
Section 25.2 16 − 16i + −7 + 13iz 4 − 75i + 3 + 22iz 3 + 8i − 1 + 4iz 3. w = 5. w = 4 + 7i − 2 + 3iz 4 − 8i + −1 + 2iz −21 + 4i + 2 + 3iz 11 1 208 19 377 v = 10 9. u − 2 + v + 2 = 11. u − 12 + v + 2 = 21 63 3969 4 16 w = z¯ reverses orientation. az + b dw − b and ad − bc = 0, then z = − is also a linear fractional transformation. If w = cz + d cw − a az + b has either one or two solutions, depending on whether or not c is zero. A translation has Hint: Show that z = cz + d no fixed point.
1. w = 7. 13. 15. 17.
In each of Problems 19 and 21, there are many solutions, of which one is given here. 1 1 1 19. z → → −4 → −4 + i = w 21. z → iz → iz − 2 − 7i = w z z z
Section 25.3 In Problems 1, 3 and 5, there are many mappings having the stated property. We give one such mapping in each case. −4iz + 1 3z + 2 + 6i 5. w = 7. w = z1/3 1. w = 2z + 1 − i 3. w = z + 2i z−1 1 $m$n 9. Hint: Evaluate f1 f−1 f0, and f and then use the result that for positive tm−1 1 − tn−1 dt = $m + n 0 integers m and n.
Section 25.4
gt y dt, where ux 0 = gx − t − x2 + y2 2 1 gx0 + R cost y0 + R sint 3. ux y = 2 0
R2 − x − x0 2 − y − y0 2 × 2 dt R + x − x0 2 + y − y0 2 − 2Rx − x0 cost − 2Ry − y0 sint 1 2 r cost − r sint1 − r 2 5. ur cos r sin = dt 2 0 1 + r 2 − 2r cost − 1 1 1 − t cost/2 7. ux y = 8 −1 1 + sin2 t/2 4 sinhx/2 cosy/2 1 + sin2 t/2 + sinhx siny 1 − sint/2 dt × sinh2 x/2 + sin2 y/2 − 2 coshx/2 siny/2 sint/2 + sin2 t/2 1. ux y =
Section 25.5 1. With a = Kei , equipotential curves are x y = K x cos − y sin = constant. These are lines of the form y = cotx + b. Streamlines are x y = K y cos + x sin = constant, which are lines y = − tanx + b. Velocity = f z = Ke−i . There are no sources or sinks. 3. x y = cosx coshy x y = − sinx sinhy. Equipotential curves are graphs of y = cosh−1 K/ cosx, streamlines are graphs of y = sinh−1 C/ sinx. 5. x y = K lnz − z0 x y = K argz − z0 . Equipotential curves are circles z − z0 = r and streamlines are rays emanating from z0
C
−vdx + udv = 2K, with C the circle of radius r about z0 .
Answers and Solutions to Selected Problems
A69
x y x . Equipotential curves are graphs of x + 2 + i y − = c, streamlines are graphs x2 + y 2 x2 + y 2 x + y2 y of y − 2 = d. x + y2
b x argz = c. Streamlines are graphs of − 9. Equipotential curves are graphs of K x + 2 2 x + y 2
y b ib 1 k y− 2 + + lnz = d. Stagnation points occur where f = 0, or z = k 1 − x + y2 4 z2 2z ib b2 ± 1− . z=− 4k 16 2 k2
7. fz = k x +
CHAPTER 26
Section 26.1
1. (a) abcdefghijkl, where each of the letters a, …, g can be either an H (for heads) or T (for tails), and each of the symbols h, …, l can be any of the integers from 1 through 6 inclusive. (b) 2⁷6⁵
3. 6¹⁰; 15
5. There are 4³5³ = 8000 outcomes, more than enough.
7. 7(15)(8)(17), or 14,280  9. 7(11)(6)(14), or 6,468
Section 26.2 1. 9! 3. 9! 5. (a) 7! (b) 6! (c) 5! 7. 6! 9. 12! 11. 6!
Section 26.3 1. 5. 7.
25 P7 ,
or 2,422,728,000 3. 22 P6 , or 53,721,360 or 5.7408(1016 ) if order counts; 52 C10 , or 158,200,024,220 if order is unimportant 9. with order: 52 P5 , or 311,875,200; without order, 52 C5 , or 2,598,960 20 C4 , or 4,845 52 P10 ,
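The counts in this section come straight from `math.perm` and `math.comb`; note that the correct value of 52C10 is 15,820,024,220 (the printed figure carries an extra digit). A quick standard-library check:

```python
import math

assert math.perm(25, 7) == 2_422_728_000          # ordered choices of 7 from 25
assert math.perm(22, 6) == 53_721_360             # ordered choices of 6 from 22
assert math.perm(52, 10) == 57_407_703_889_536_000  # about 5.7408e16
assert math.comb(52, 10) == 15_820_024_220        # order unimportant
assert math.comb(20, 4) == 4_845
assert math.perm(52, 5) == 311_875_200
assert math.comb(52, 5) == 2_598_960
print("all counts verified")
```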
Section 26.4 1. Typical outcome: a b c d, where each letter can be any of the integers 1, 2, 3, 4, 5, 6. There are 64 outcomes. A has 40 outcomes in it, and B has 16 outcomes. 3. Typical outcome: ! , with ! and four distinct letters (English) alphabet. There are 26 C4 outcomes. A has 24 C2 outcomes, and W has 7 C4 outcomes. 5. Typical outcome: a b c d e f , where each of a b and c can be H or T , and each of d e and f can be any 1, 2, 3, 4, 5, 6. There are 23 63 outcomes. C has 10 outcomes, D has 80 outcomes, E has 24 outcomes. 7. Typical outcome: a b c d e, in which each letter can be any of 1, 2, 3, 4, 5, 6. U has 21 outcomes, K has 21 outcomes. 9. Typical outcome: a b p, with each letter a distinct card; there are 52 C16 outcomes. Y has one outcome, M has 4(40 C5 ) outcomes.
Section 26.5 1. (a) 5/16 (b) 13/16 3. (a) 1/221 (b) 105/221 5. (a) 5/14 (b) 15/28 (c) 9/28 7. (a) 1/1860480 (b) 1/4 (c) 1271/1292 9. (a) 1892/23205 (b) 2257/54145
Section 26.6 1. 1 − 756 + 57 /67 , about 0.3302
3. 1 −8 C5 /52 C5 , about 0.9998
5. 1 − 1/341,055  7. For n people, the probability of at least two having the same birthday is 1 − [365·364···(365 − n + 1)]/365ⁿ, assuming a 365-day year.
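The formula in Problem 7 is easy to evaluate directly; the familiar crossover at n = 23 people makes a convenient check (a sketch).

```python
def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday (365-day year)."""
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct

print(round(p_shared_birthday(22), 4), round(p_shared_birthday(23), 4))  # 0.4757 0.5073
```

With 23 people the probability first exceeds one half, even though 23 is far smaller than 365.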
A70
Answers and Solutions to Selected Problems
Section 26.7 1. (a) 1/2 (b) 2/3
3. (a) 5/16 (b) 5/11
5. (a) 56/1296 (b) 4/671
7. 1/249,900
9. 1/16, 1/4, 0
Section 26.8 1. 5. 9. 11.
dependent 3. dependent independent 7. dependent at least two heads, 0.4864, exactly two heads, 0.3456 probability of drawing exactly two reds is 108/113
Section 26.9 1. Pr(10) = 1/2, Pr(5) = Pr(50) = Pr(0) = Pr(1200) = 1/8 3. Pr(Ford) = 1/9, Pr(Chevy) = 1/18, Pr(Porsche) = 1/6, Pr(VW) = 1/12, Pr(Lamborghini) = 1/12, Pr(tricycle) = 1/12, Pr(Honda) = 1/18, Pr(Mercedes) = 1/9, Pr(tank) = 1/12, Pr(Steamer) = 1/24, Pr(bike) = 1/8 5. Pr(0) = 1/4, Pr(50, 000) = 1/20, Pr(.05) = Pr(1000) = 1/20 Pr(500) = Pr(1500) = 1/5, Pr(20) = Pr(lion) = 1/10
Section 26.10 1. Pr(≥10 yearsitem defective) = 0.14, Pr5−10 yearsitem defective = 035, Pr1−5 yearsitem defective = 012, Pr<1 yearitem defective = 0039 3. Pr(adult malesurvived ≥2 years = 0039 Pr(girlsurvived <1 year = 0032 5. Pr(City 1gun explodes) = 0.05 Pr(City 2gun explodes) = 0012 Pr(City 3gun explodes) = 0212 Pr(City 4gun explodes) = 0014 Pr(City 5gun explodes) = 0045 Pr(City 6gun explodes) = 0666
Section 26.11 1. win $1.82 per game 3. lose $0.23 per game 5. lose $2 per game 7. win $4.43 per game
CHAPTER 27 Section 27.1 1. (a) s = 4809, approximation yields 4.15 (c) s = 49497, approximation yields 3.75 3. x = 12258 median = 1 s = 86872
Section 27.2 1. P2 = 1/36 = P12 P3 = P11 = 2/36 P4 = P10 = 3/36 P5 = P9 = 4/36 P6 = P8 = 5/36 P7 = 6/36 = 7 = 24152 3. P0 = 1/20 P1 = 8/20 P2 = 6/20 P3 = 4/20 P4 = 1/20 = 18 = 098 5. Pr4 = 6/1326 Pr5 = 16/1326 Pr6 = 22/1326 Pr7 = 32/1326 Pr8 = 38/1326 Pr9 = 48/1326, Pr10 = 54/1326 Pr11 = 640/1326 Pr12 = 190/1326 Pr13 = 64/1326 Pr14 = 54/1326 Pr15 = 48/1326, Pr16 = 38/1326 Pr17 = 32/1326 Pr18 = 22/1326 Pr19 = 16/1326 Pr20 = 6/1326, = 11566 = 233
Section 27.3 1. (a) 0.178 (c) 0.313 (e) 0.0085 (g) 0.273 3. (a) 0.263 (b) 0.329 (c) 0.099 (d) 0.0014
Answers and Solutions to Selected Problems 5. 0.376 9. (a) 0.052 (b) 0.23519 11. For three hits, 0.055. For two to six hits, 0.24653.
Section 27.4
1. (a) μ = 15, σ = 5/√2  (b) y = (1/(5√π))e^{−(x−15)²/25}  (c) 2.0551×10⁻⁵  (d) 0.999  (e) 1.7396×10⁻⁸
3. (a) μ = 16.8, σ = 9.3467  (b) y = (1/(9.3467√(2π)))e^{−(x−16.2)²/174.72}  (c) 3.0087×10⁻⁵  (d) 0.99974  (e) ≈ 1
5. probability of between 250 and 600 miles is 0.0417; probability of between 600 and 900 miles is 0.979
7. probability of between .245 and .270 is 0.539; probability of over .260 is 0.35162; probability of .300 or over is 0.01135
9. (a) a = 263, b = 287  (b) a = 251, b = 299  (c) a = 239, b = 311
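Probabilities like those above reduce to differences of the normal cumulative distribution function, which the standard library expresses through `math.erf`; the one-sigma check of about 68.27% holds for any μ and σ (a sketch, using Problem 1's μ = 15, σ = 5/√2 as the example).

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def prob_between(a, b, mu, sigma):
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

mu, sigma = 15.0, 5.0 / sqrt(2.0)
print(round(prob_between(mu - sigma, mu + sigma, mu, sigma), 4))  # 0.6827
```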
Section 27.5 1. mean of the sample means = population mean = 5.97 √ standard deviation of the population = = 21446 standard deviation of the sample means = 06629 / 5 = 0959 3. (a) 0.059 (b) 0.5 (c) 0.00053 (bonus extremely unlikely) 5. (a) 0.868 (b) 0.665 (c) 0.997
Section 27.6 1. 3. 5. 7.
(a) (a) (a) (a)
(a) 0.01 < p < 0.03 (b) 0.012 < p < 0.028 (c) 0.013 < p < 0.027; (a) 0.366 < p < 0.504 (b) 0.385 < p < 0.485; (a) 2,401 (b) 384 (c) 2215.9; (a) 105.8 (b) 36 (c) 24
Section 27.7 1. (a) 7186 < < 7214 (b) 180 3. (a) 10533 < < 10647 (b) 3,819 5. 2839 < < 3161
Section 27.8
1. c = 0.99895, significant linear correlation; regression line: y = 3.0196 + 0.81813x; y(0.4) = 3.3469, 2.9297 < y < 3.7641; y(6.2) = 8.092, 7.701 < y < 8.4830; y(15.1) = 15.373, 14.879 < y < 15.687
3. c = 0.042039; no significant linear correlation exists.
5. c = 0.99845, significant linear correlation; regression line: y = −1.3931 + 0.87678x; y(0.6) = −0.86703, −2.5493 < y < 0.81527; y(3) = 1.2372, −0.4168 < y < 2.8912; y(3.9) = 3.2801, 3.0848 < y < 3.4754; y(4.21) = 3.5519, 3.3488 < y < 3.755
7. c = 0.99919, significant linear correlation; regression line: y = −2.8232 + 1.5892x; y(−4) = −9.18, −11.339 < y < −7.0212; y(11.1) = 14.817, 12.838 < y < 16.796; y(30.1) = 45.012, 43 < y < 47.024; y(66.7) = 103.18, 100.41 < y < 105.95
9. (a) c = 0.98198, hence a significant linear correlation. (b) y = 1.375 + 0.7704x (c) 96% of these wins are accounted for by this player's presence. (d) 67 wins projected, with 62 < y < 72.
11. (a) c = 0.95235, significant linear correlation. (b) y = 0.71719 + 0.33661x (c) 91% (d) 9 thefts, with 7.64 < y < 10.62
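The Section 27.8 answers come from least-squares regression; the square of the correlation coefficient c gives the fraction of variance accounted for (0.98198² ≈ 0.96, the 96% quoted in 9(c)). A minimal least-squares sketch on made-up data:

```python
from math import sqrt

# Hypothetical data; least-squares slope/intercept and correlation coefficient.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.1, 7.9, 10.1]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
syy = sum((y - my) ** 2 for y in ys)

slope = sxy / sxx
intercept = my - slope * mx
c = sxy / sqrt(sxx * syy)        # correlation coefficient
r_squared = c ** 2               # fraction of variance explained
```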
Index
A Acceleration, 481–483, 488–490 defined, 482 normal components, 488–490 tangential components, 488–490 Adams-Bashforth method, 199 Adams-Moulton method, 199 Adjacency matrix, 248 Adjacent, defined, 247 Algebra, 203–211, 223–224, 228–235, 237–298 column operations, 307–311 homogeneous systems of linear equations, 272–280 linear dependence and independence, 228–235 linear equations, systems of, 237, 272–298 matrices, theorem of, 241 matrix, 239–242 matrix addition, 239 matrix notation, 242–243 multiplication of matrices, 239–241, 246–247 row operations, 251–258, 261–265, 307–311 vector, 203–211 vector space Rn , 223–224 vectors, theorem of, 206 Almost linear systems, 431–451 Amplitude spectrum, complex Fourier series, 633–635 Analysis, see Complex analysis; Fourier analysis; Multiresolution analysis; Vector analysis Analytic function, 156 Angle preserving mapping, 1062 Arc length, line integrals with respect to, 525–528 Archimedes’ principle, 567–568 Area, surface, 557 Argument, 918–920, 1037–1039 complex numbers, 918–920 principle, 1037–1039 Augmented matrix, 285–286 Autonomous systems, defined, 406 Average, see Mean B Basis, 234 Bayes’ theorem, 1134–1138 Beats, 102–103 Bell curve, 1165–1176
continuity adjustment, 1170–1172 defined, 1165 normal distribution, 1166–1168 standard, 1174–1176 Bendixson theorem, 466–467 Bernoulli equation, 42–43, 45–46 Bessel functions, 171–172, 178–180, 719–745, 749–750, 759–761 applications of, 727–732 defined, 171 eigenfunctions, as, 749–750 equation of order, 171 equation of zero order, 178 first kind of order, 171–172, 721–722 Fourier–Bessel coefficients, 741–744 Fourier–Bessel expansions, 739–741 gamma function, 719–721 generating function, 732 inequality, 759–761 integral formula, 733–734 interlacing lemma, 739 linear independent solution, 173–178, 178–180 modified, 725–727 orthogonality, 740–741 recurrence relation, 735–736 second kind, 722–725 second kind of order, 178–180, 722–725 solutions, 721–722 zeros, 737–739 Bessel’s inequalities, 618–620 Binomial coefficients, 1107 Binomial distribution, 1154–1157 Bolzano–Weierstrass theorem, 936 Boundary conditions, 781–785, 841–844, 851–853, 857–859, 871–873 heat conduction, effects on, 857–859 heat equation, 841–844, 851–853 insulated, 843, 847–848 Laplace transform solution for values, 871–873 temperature, 843, 844–848 transformations of values, 851–853 wave equation, 781–785 Boundary points, 927–928 Bounded function, 942–943
Bounded sequences, points, 935 Bounded set, points, 935 C Calculus, see Differential calculus; Integral calculus Cauchy principal value, 1052 Cauchy’s theorem, 990–1005 consequences of, 994–1005 deformation theorem, 995–997, 1002–1005 higher derivatives, 1000–1002 independence of path, 994–995 integral formula, 997–1000 Liouville’s theorem, and, 1001–1002 proof of, 993 terminology for, 990–991 Cauchy–Riemann equations, 945–950 Cauchy–Schwarz inequality, 216–217, 225 inequality in Rn , 225 theorem of, 216–217 Center, measures of, 1143–1146 mean, 1143–1145 median, 1145–1146 Central limit theorem, 1181–1184 Cesàro filter, 690–692 Characteristic equation, 73–74 Characteristic polynomial, 326–328 Characteristic vectors, see Eigenvectors Characteristics, wave equation, 822–830 Circulation, fluid flow, 1087 Closed set, points, 929 Coefficients, 587–588, 679–681 DFT approximation of, 679–681 Fourier, 587–588 Cofactor expansion, 311–314 column, by a, 313 defined, 311 row, by a, 312 Column operations, 307–311 Column spaces, matrices, 266–269 Compact set, points, 935–936 Complementary events, 1121 Complex analysis, 911–1095 conformal mappings, 1055–1095 functions, 939–973 integration, 975–1005 numbers, geometry and arithmetic of, 913–937 residue theorem, 1030–1054 series representations of functions, 1007–1022 singularities, 1023–1030 Complex Fourier integral, 642–652 Complex Fourier series, 630–635 Complex numbers, 630–631, 913–937, 991 argument, 918–920 conjugate, 915–916 defined, 913 division, 916 geometry and arithmetic of, 913–937 inequalities, 917 loci, 921–936 magnitude, 915–916 ordering, 920–921
planes, 914, 921–936 points, sets of, 921–936 polar form, 918–920 review of, 630–631 rules of, 914 simply connected, 991–992 Complex sequences, points, 931–934 Conditional probability, 1122–1125 Conditions, 781–785, 798–801, 834–837, 841–844 boundary, 781–785, 841–844 heat equation, 841–844 initial, 781–785, 798–801, 841–844 periodicity, vibration, 834–837 wave equation, 781–785, 798–801 Confidence intervals, 1185–1189 Conformal mappings, 1062–1095 angle preserving, 1062 complex function models, 1087–1094 construction of between domains, 1072–1080 defined, 1063 Dirichlet problem, 1080–1086 fluid flow, 1087–1094 harmonic functions, 1080–1086 Joukowski transformation, 1093–1094 linear fractional transformations, 1064–1071 mean value property, 1082 Möbius transformation, 1064 orientation preserving, 1062 Riemann mapping theorem, 1072–1077 Schwartz–Christoffel transformation, 1077–1080 three point theorem, 1069–1071 Conjugate, complex numbers, 915–916 Connected set, defined, 991 Conservative field, test for, 539–543 Conservative vector field, 536 Consistent system of equations, 285–288 Continuity adjustment, 1168–1172 Continuity function, 941–942 Continuous random variable, 1153 Convergence, 593–609, 611, 613, 620–622, 757–759, 762–763, 932–934, 952–956 cosine series, 611 eigenfunction expansions, 757–759 end points, at, 599–600 Fourier series, 593–609, 611, 613, 620–622 Gibbs phenomenon, 606–609 mean, 762–763 open disk of, 954 partial sums, 604–606 points, 932–934 power series, 952–956 radius of, 954 sine series, 613 theorems, 596–599, 601–604 uniform and absolute, 620–622 Convolution, 134–139, 657–660 commutativity of, 137, 657 defined, 134 Fourier transform, 657–660 frequency, 658–659 inverse version, 136
linearity, 657 theorem, 135–136 time, 658 Correlation, 1194–1198 Cosine, 609–612, 640–642, 670–671, 673 Fourier series, 609–612 Fourier transform, 670–671, 673 integral, 640–642 Counting, see Probability Cramer's rule, 318–320 Critical damping, 96 Critical points, 413, 424–431, 466 asymptotically stable, 428 defined, 425 enclosure of, 466 linear systems, 413 nonlinear systems, 424–431 stability of, 427 Critically damped forced motion, 99 Cross product, 217–223 defined, 217 parallelogram, 220–221 properties of, 218–219 rectangular parallelopiped, 221–222 right-hand rule, 219 Crystals, random walks in, 247–250 Curl, 510–512, 513–514, 576 defined, 510 physical interpretation of, 513–514, 576 Stokes's theorem, 576 Curvature, 483–488, 491–492 defined, 483 function of t, 491–492 unit normal vector, 486–488 Curves, 517–525, 975–979, 1089 complex planes, in, 975–979 continuous, 977 coordinate functions, 975–976 differentiable, 977 equipotential, 1089 equivalent, 977–979 initial point, 975 line integrals of, 517–525 simple, 977 terminal point, 975
D d’Alembert’s solution, 822–830 Damping constant, 94 Data, statistical, 1143 Deformation theorem, 995–997, 1002–1005 Derivatives, 601, 652–655, 943–945, 1000–1002 bounds on, 1001–1002 Cauchy’s integral formula for, 1000–1002 complex functions, of, 943–945 Fourier transform of a, 652–655 higher, 1000–1002 Liouville’s theorem, 1001–1002 right and left, 601
Determinants, 299–322 cofactor expansions, 311–314 column operations, 307–311 Cramer’s rule, 318–320 defined, 301–302 evaluation of, 307–311 matrix inverse, 315–318 matrix tree theorem, 320–322 minor, 311 permutations, 299–301 properties of, 303–307 row operations, 307–311 triangular matrices, 314–315 Diagonalization, 330–339, 384–386, 398–401 matrices, 330–339 solution of X = AX + G, 398–401 solution of X = AX, 384–386 Differential calculus, 473, 475–515 acceleration, 481–483, 488–490 curl, 510–512, 513–514 curvature, 483–488, 491–492 directional derivatives, 501–504 divergence, 510–511, 512–513 Frenet formulas, 492–493 gradient field, 499–501, 504–509 level surfaces, 503–507 normal components, 488–490 normal lines, 508–509 one variable, 475–481 streamlines, 495–499 tangent planes, 507–508 tangential components, 488–490 torsion, 493 unit normal vector, 486–488 vector fields, 493–495 vector functions, 475–481 velocity, 481–483 Differential equations, 1–200, 359, 361–402, 403–472, 779, 781–840, 841–878, 879–909 Bernoulli equation, 42–43, 45–46 Bessel’s equation, 171, 178 characteristic, 73–74 defined, 1 Euler’s equation, 78–81 exact, 26–32 first order, 3–60 heat equation, 841–878 higher-order, 91–92 homogeneous, 38–42, 45–46, 64–68 introduction to, 1–2 Laplace transform, 107–154 linear, 22–26, 37, 61–62, 361–402 nonhomogeneous, 68, 82–93 nonlinear, 403–472 numerical approximations of solutions, 181–200 order, defined, 2 partial, 779, 781–840, 841–878, 879–909 polynomial coefficients, with, 150–154 potential equation, 879–909 second order, 61–106
Differential equations (continued) separable, 11–21, 37 series solutions, 155–180 solution, defined, 2 systems of, 359, 361–402, 403–472 wave equation, 781–840 Diffusivity, defined, 842 Dirac delta function, 139–144, 660–661 filtering property, 140–142 Fourier transform, 660–661 unit impulses, 139–144 Direct fields, first order differential equations, 7–10 Direction fields, nonlinear systems, 406–413 Directional derivatives, 501–504 Dirichlet problem, 879–886, 888–898, 1080–1086 conformal mappings, 1080–1086 cube, for a, 896–898 defined, 880 disk, for a, 883–886 electrostatic potential, 893–895 harmonic functions and, 879–880, 1080–1086 planes, 889–893 rectangle, for a, 881–883 solution of by conformal mapping, 1083–1086 unbounded regions, 888–895 Discrete Fourier transform (DFT), 675–681, 685–689 approximation of, 685–689 coefficients, approximation of, 679–681 inverse N -point, 678–679 linearity, 678 N -point, 676–678, 685–689 periodicity, 678 Discrete random variable, 1153 Displacement, 791–794 initial, 793–794 wave equation, 791–794 zero initial, 791–793 Distributions, sampling of, 1178–1185 Divergence, 510–511, 512–513, 564–567, 570–571 conservation of mass principle, 570–571 defined, 510 Gauss theorem, 564–567 physical interpretation of, 512–513 theorem, 564–567, 570–571 Division, complex, 916 Domain, 539–545, 578–579, 815–821, 908, 991, 1072–1080 Cauchy’s theorem, 991 conditions of, 540 conformal mappings, construction of between, 1072–1080 conservative field, test for, 539–545 defined, 991 plane, in a, 540–542 Riemann mapping theorem, 1072–1077 simply connected, 543 3-space, in surfaces, 578–579 unbounded, 815–821, 908 wave equation, solution on unbounded, 815–821 Dot product, 211–217 Cauchy–Schwarz inequality, 216–217 defined, 211
orthogonal vectors, 215–216 properties of, 211–212 E Eigenfunction, 749–752, 755–759, 763–765 completeness of, 763–765 expansions, 755–759 Sturm–Liouville theory, 745–755 Eigenvalues, 323–330 defined, 323–324 characteristic polynomial, 326–328 Gerschgorin's theorem, 328–330 Eigenvectors, 323–330 Electrical circuits, 51–53, 57, 103–105, 106, 129–132 first order differential equation applications to, 51–53, 57 Kirchhoff's current and voltage law, 52 second order differential equations, analogy with, 103–105, 106 shifting theorems, analysis of by, 129–132 Electrostatic potential, 893–895 Elementary matrix, 253–258 Enclosure of critical points, 466 End points, convergence at, 599–600 Equality matrices, 238 Equilibrium point, 413 Equipotential curves, 1089 Euler's equation, 78–81 Euler's formula, 75 Euler's method, 182–190, 193–194 defined, 183 modified, 193–194 numerical approximation solutions, 182–190 Events, 1112–1122, 1126–1129 complementary, 1121–1122 defined, 1114 experiment, 1112–1113 independent, 1126–1129 principle of complementarity, 1121 probability of, 1116–1120 product rule, 1128–1129 Exact differential equations, 26–32 defined, 28 potential function, 28 test for exactness, 30–32 Existence and uniqueness theorems, 59–60 Expansions, 581–582, 708–711, 739–741, 745–765, 1019–1022 eigenfunction, 582, 745–755, 755–759 Fourier–Bessel, 739–741 Fourier–Legendre series, 709–711 Laurent, 1019–1022 orthogonality, 708–709, 740–741 Sturm–Liouville theory, 745–755 wavelets, 582, 765–778 Exponential functions, 957–966 Exponential matrix solutions, 386–392 F Fast Fourier transform (FFT), 694–699 analysis of tides in Morro Bay, 697–699 analyzing power spectral densities in signals, 695 filtering noise from a signal, 696 procedure of, 694
Filtering, 140–142, 660–661, 667–669, 689–692, 696 bandpass, 667–669 Cesàro filter, 690–692 Dirac delta function, 140–142 Fourier transformations, 660–661, 667–669 lowpass, 667–669 noise, using FFT, 696 property, 140–142 sampled Fourier series, 689–692 First order differential equations, 3–60 Bernoulli, 42–43, 45–46 direct fields, 7–10 electrical circuits, applications to, 51–53, 57 exact, 26–32 general solutions, 3–4 homogeneous, 38–42, 45–46 implicitly defined solutions, 4 initial value problem, 6–7, 58–60 integral curves, 5–6 integrating factors, 33–38 linear, 22–26 mechanics, applications to, 46–51, 55–57 orthogonal trajectories, applications to, 53–55, 57 particular solutions, 3–4 preliminary concepts, 3–11 Riccati, 43–46 separable equations, 11–21 theorems for, 30, 59 Fluid flow, 495–499, 1087–1094 circulation, 1087 complex function models for, 1087–1094 equipotential curves, 1089 flux, 1087–1088 lines, 495–499 plane-parallel, 1087 potential, 1089 solenoidal, 1088 stagnation point, 1089 stationary, 1087 streamlines, 495–499, 1089 vector fields, 495–499 vortex, strength of, 1088 Flux, fluids, 1087–1088 Forced motion, 98–100 Fourth order (RK4) Runge-Kutta method, 195–197 Fourier analysis, 581–582, 583–635, 637–699, 701–745 Fourier integral, 637–652 Fourier series, 583–635, 681–693 Fourier transform, 642–681, 685–689, 694–699 importance of, 581–582 special functions, 581–582, 701–745 Fourier integral, 637–652 complex, 642–652 cosine, 640–642 formulas, 637–640 Laplace's, 641–642 sine, 640–642 Fourier series, 583–635, 681–693, 786–808, 844–864 amplitude spectrum, 633–635 Bessel's inequalities, 618–620
coefficients, 587–588 complex, 630–635 convergence of, 593–609 cosine, 609–612 differentiation, 614–615, 617–618 endpoints, 599–600 even and odd functions, 589–592 frequency spectrum, 633–635 function, of a, 586–593 Gibbs phenomenon, 606–609 heat equation, solutions, 844–864 integration, 614–617 left derivative, 601 Parseval's theorem, 622–623 partial sums of, 604–606 phase angle form, 623–630 piecewise continuous function, 593–595 piecewise smooth function, 595–596 right derivative, 601 sampled, 681–693 sine, 612–614 special functions, 581–582 wave equation, solutions of, 786–808 Fourier transform, 642–681, 685–689, 694–699, 816–821, 869–873 applications of, 652–670 approximation of, 685–689 complex Fourier integral and, 642–652 convolution, 657–660 cosine, 670–671, 673 defined, 644 derivative, of a, 652–655 Dirac delta function, 660–661 discrete (DFT), 675–681, 685–689 fast (FFT), 694–699 filtering, 660–661, 667–669, 696 finite, 673–675 frequency differentiation, 655–656 frequency shifting, 649 heat equation, integral transform methods for, 869–873 integral, of a, 656 modulation, 651 operational formula, 672 scaling, 649–650 Shannon sampling theorem, 665–667 sine, 671–672, 673–674 symmetry, 650–651 time reversal, 650 time shifting, 647–649 wave equation, solution on unbounded domains, 815–821 windowed, 661–665 Frenet formulas, 492–493 Frequency differentiation, 655–656 Frequency of outcomes, 1159–1164 Frequency shifting, Fourier transform theorem of, 649 Frequency spectrum, complex Fourier series, 633–635 Frobenius, method of, 166–171, 173–178 linear independent solutions, 173–178 series, 168 singular points, 167–171 theorem, 170
Functions, 122–125, 475–481, 589–596, 600–661, 701–745, 879–880, 939–973, 980–990, 1007–1022. See also Bessel functions; Eigenfunctions analytic, 1008 bounded, 942–943 Cauchy–Riemann equations, 945–950 complex, 939–973, 980–990 continuity, 941–942 derivative of complex, 943–945 Dirac delta, 660–661 even and odd, 589–592 exponential, 957–966 generating, 704–706, 732 harmonic, 879–880 Heaviside, 122–125 integrals of complex, 980–990 integrals of series of, 988–989 limit, 939–941 logarithm, 966–969 piecewise continuous, 593–595 piecewise smooth, 595–596 power series, 950–957 powers, 969–972 pulse, 124–125 series representations of, 1007–1022 special, 701–745 trigonometric, 957–966 vector, 475–481 G Gamma function, 719–721 Gauss's divergence theorem, 564–567 Gauss-Jordan method, 276–279 General solutions, first order differential equations, 3–4 Generating function, 704–706, 732 Bessel functions, 732 Legendre polynomials, 701–719 Gerschgorin's theorem, 328–330 Gibbs phenomenon, 606–609 Gradient field, 499–501, 504–509 defined, 500 normal vector, as, 504–509 Graph, defined, 274 Green's theorem, 528–535, 563–564, 903 extension of, 532–535 first identity, 903 rewritten, 563–564 uses of, 530–532 vector integral calculus, 528–535 H Haar wavelets, 767–774, 774–775 Harmonic functions, 879–880, 1080–1086 conformal mappings, 1080–1086 Dirichlet problems and, 879–880, 1080–1086 Heat conduction, 857–859, 865–878. See also Steady-state heat equation effects of boundary conditions and constants on, 857–859 infinite cylinder, in, 873–877 infinite media, in, 865–873
integral transform methods, 869–873 Laplace transform solution, 871–873 rectangular plate, in, 877–878 Heat equation, 568–570, 841–878 boundary conditions, 841–844 determination of, 568–570 Fourier series solutions, 844–864 heat conduction, 857–859, 865–878 initial conditions, 841–844 integral transform methods, 869–873 nonhomogeneous, 854–857 numerical approximation of, 859–862 temperature, 843, 844–848 transfer coefficient, 849 vector integral calculus, as, 568–570 Heaviside functions, 122–125 Hermitian matrices, 355–357 Homogeneous equations, 38–42, 45–46, 64–68, 272–280. See also Nonhomogeneous equations complete pivoting, 276 first order differential, 38–42, 45–46 Gauss-Jordan method, 276–279 linear dependence and independence, 65–66 second order differential, 64–68 solutions of, 272–280 systems of linear equations, 272–280 Wronskian test, 66–67 Hooke’s law, 94 I Identity matrix, 244 Identity theorem, 1015–1016 Implicitly defined solutions, 4 Improper node, 420–421, 429 Inconsistent system of equations, 285, 289–292 Independence of path, 536–545, 987, 994–995 Cauchy’s theorem, 994–995 complex integrals, 987 defined, 537 potential theory in the plane, and, 536–545 test for a conservative field, 539–543 Independent events, 1126–1129 Inequalities, complex numbers, 917 Initial conditions, 6, 781–785, 798–801, 841–844 defined, 6 heat equation, 841–844 motion, effects on, 798–801 temperature, 843 wave equation, 781–785, 798–801 Initial value problem, 6–7, 58–60, 116–120, 156–161 defined, 6 existence and uniqueness for solutions of, 58–60 first order differential equations, 6–7 Laplace transform, solutions of using, 116–120 power series solutions of, 156–161 theorems for, 59, 116–117, 156, 159 Input frequency, 100 Instability, Lyapunov’s direct method for, 454 Integral calculus, 473, 517–580 applications of surface integrals, 557–562 Archimedes’ principle, 567–568
conservative vector field, 536 curves, 517–525 divergence theorem, 564–567, 570–571 domain, 540–542 Gauss's divergence theorem, 564–567 Green's theorem, 528–535, 563–564 heat equation, 568–570 independence of path, 536–545 lines, 517–528 potential theory, 536–545, 576–579 Stokes's theorem, 572–579 surfaces, 545–562 Integral curves, 5–6 Integrals, 517–528, 545–562, 637–652, 656, 980–990, 997–1000, 1040–1052 arc length, with respect to, 525–528 Cauchy principal value, 1052 Cauchy's formula, 997–1000 complex functions, of, 980–990 complex in terms of real, 983–985 evaluation of real, 1040–1052 Fourier, 637–652 Fourier transform of, 656 independence of path, 987 line, 517–528 linearity, 985 properties of complex, 985–989 residue theorem, 1040–1052 reversal of orientation, 986 series of functions, of, 988–989 surface, 545–562 term-by-term integration, 988–989 vector calculus, 517–528 Integrating factors, 22, 33–38 defined, 22, 33 linear equations and, 22, 37 nonzero function, 33–34 separable equations and, 37 Integration, 975–1005 Cauchy's theorem, 990–1005 complex functions, 975–1005 curves, complex plane, 975–979 integrals, 980–989 term-by-term, 988–989 Interior points, 925 Inverses, 136, 293–298, 315–318, 678–679 convolution, 136 determinant formula for matrix, 315–318 matrix, 293–298, 315–318 method of finding A⁻¹, 295–296 N-point, DFT, 678–679 nonsingular matrix, 294 singular matrix, 294 systems of linear equations, relation to, 296–297 uniqueness of, 294 J Jordan curve theorem, 529 Joukowski transformation, 1093–1094 Jump discontinuities, 593
K Kirchhoff's current and voltage law, 52 L Laplace transform, 107–154, 379–398, 641–642, 871–873, 1039–1040 boundary condition values, solution for, 871–873 convolution, 134–139 defined, 107 derivative, 116–117 differential equations, solving, 150–154 Dirac delta function, 139–144 functions, table of, 109–111 higher derivative, 117 initial value problems, solution of using, 116–120 integrals, 641–642 inversion formula for, 1039–1040 Lerch's theorem, 114 linearity of, 112 piecewise continuity, 112–113 polynomial coefficients, solving equations with, 150–154 residue theorem, 1039–1040 shifting theorems, 120–134 systems, solution of, 144–150 theorems for, 112, 113, 114, 115, 116–117, 120, 126, 135–136, 137, 140, 150, 152 unit impulses, 139–144 variation of parameters method and, 397–398 Laplace's equation, see Steady-state heat equation Lattice points, 803 Laurent expansion, 1019–1022 Leading entry, 258 Legendre polynomials, 701–719, 750–752 approaches to, 701–704 derivative formula, 715–716 eigenfunctions, as, 749–750 Fourier–Legendre coefficients, 711–713 Fourier–Legendre series, 709–711 generating function, 704–706 integral formula, 716–717 orthogonality, 708–709 recurrence relation, 706–708 Rodrigues' formula, 715–716 zeros, 713–714 Lerch's theorem, 114 Level surfaces, 503–507 Liénard theorem, 468–470 Limit cycles, 461–466 Bendixson theorem, 466–467 enclosure of critical points, 466 Liénard theorem, 468–470 Poincaré-Bendixson theorem, 467–468 van der Pol equation, 468–470 Limit function, 939–941 Limit points, 929–931 Line integrals, 517–528 Lineal elements, 8 Linear combinations in Rⁿ, 229 Linear correlation, 1194–1198
Linear dependence and independence, 65–68, 228–235, 365–371 defined, 65, 230, 365 linear differential equations for, 365–371 vectors, 230–235 Wronskian test, 66 Linear differential equations, 22–26, 37, 61–62, 361–402, 413–424 defined, 22 dependence, 365–371 diagonalization, 384–386, 398–401 exponential matrix solutions, 386–392 first order, 26–27, 37, 361–373 homogeneous system X = AX, 365–372 independence, 365–371 integrating factors, and, 37 nonhomogeneous system X = AX + G, 372–373 phase portrait, 413–424 second order, 61–62 solution of X = AX + G, 394–402 solution of X = AX when A is constant, 374–394 systems of, 361–402 theory of systems, 361–347 variation of parameters method, 394–398 Linear equations, 237, 242–243, 272–293, 296–297 Gauss-Jordan method, 276–279 homogeneous systems of, solution of, 272–280 inverses, relation to, 296–297 nonhomogeneous systems, 283–293 notation for systems of, 242–243 solution space, 280–283 use of matrices and, 237 Linear fractional transformations, 1064–1071 defined, 1064 three point theorem, 1069–1071 Linear independence, 365–371 Linearity, complex integrals, 985 Liouville’s theorem, 1001–1002 Loci, see Points Logarithm, 966–969 complex, 966–969 natural, 966 Lyapunov’s stability criteria, 451–461 direct method, 454 positive definite, 453 semidefinite, 453 M Magnitude, complex numbers, 915–916 Mappings, 1055–1095 angle preserving, 1062 conformal, 1062–1095 defined, 1055 functions as, 1055–1061 inversion, 1066–1067 orientation preserving, 1062 stereographic projection, 1070 Mass, surface, 557–559 Mathematical modeling, 20 Matrices, 237–298, 299–322, 330–339, 339–347, 352–358 addition of, 239 adjacency, 248
algebra, 239–242 augmented, 285–286 cofactor, 311 column operations, 307–311 column spaces, 266–269 crystals, random walks in, 247–250 defined, 238 determinants, 299–322 diagonalization of, 330–339 elementary, 253–258 equality, 238 Hermitian, 355–357 homogeneous systems of linear equations, 272–280 identity, 244 inverses, 293–298 linear equations, systems of, 237, 272–298 matrix inverse, 293–298, 315–318 minor, 311 multiplication of, 239–241, 246–247 nonhomogeneous systems of linear equations, 283–293 nonsingular, 294 notation for systems of linear equations, 242–243 orthogonal, 339–343 product of, with a scalar, 239 rank, 269–271 reduced row echelon, 259–261 row echelon form of, 258–266 row equivalence, 257 row operations, 251–258, 261–265, 307–311 row spaces, 266–269 singular, 294 Skew-Hermitian, 355–357 solution space, 280–283 symmetric, 343–347 transpose, 245 tree theorem, 320–322 triangular, 314–315 unitary, 352–355 zero, 243–244 Maximum modulus theorem, 1016–1018 Mean, 759–761, 762–763, 1082, 1143–1145, 1152–1153, 1190–1193 approximation in, 759–761 Bessel inequality, 759–761 convergence in, 762–763 defined, 1143 Parseval’s theorem, 762–763 population, estimating, 1190–1193 random variables, 1152–1153 statistical, 1143–1145, 1152–1153 value property, 1082 Mechanical systems, 46–51, 55–57, 93–106 applications to, 46–51, 55–57, 93–106 beats, 102–103 critical damping, 96 electrical circuits, analogy with, 103–105, 106 first order differential equations, 46–51, 55–57 forced motion, 98–100 motion, 48–49, 50–51 Newton’s laws, 46–47 overdamping, 95–96
resonance, 100–102 second order differential equations, 93–106 terminal velocity, 47–48 underdamping, 96–97 unforced motion, 95–98 velocity, 49–50 Median, statistical, 1145–1146 Minor, matrices, 311 Mixing problems, 24–25 Möbius transformation, see Linear fractional transformations Modulation, Fourier transform theorem of, 651 Motion, 48–49, 50–51, 93–106, 98–100, 798–801, 808–822 beats, 102–103 critically damped forced, 99 damping constant, 94 effects of initial conditions and constants on, 798–801 electrical circuit, analogy with, 103–105 forced, 98–100 Fourier transform solution on unbounded domains, 815–821 Hooke's law, 94 infinite string, along an, 808–813 mathematical models of, 48–49, 50–51 mechanical systems, 48–49, 50–51, 98–100 Newton's laws, 46–47, 94 overdamped forced, 99 overdamping, 95–97 resonance, 100–102 semi-infinite string, along an, 813–815 spring equation, 94–95 underdamped forced, 100 unforced, 95–97 wave, 808–822 Multiplication principle, 1099–1101 Multiresolution analysis, 774–775, 775–776 defined, 774 Haar wavelets, 774–775 scaling function, 776 wavelets, general construction of, 775–776 N Natural frequency, 100 Neumann problem, 902–908 disk, for a, 906–908 Green's first identity, 903 rectangle, for a, 904–906 unbounded domain, 908 Newton's laws, 46–47, 94, 842 cooling, 842 motion, 46–47, 94 Nodal source, 415–418, 429 linear systems, 415–418 nonlinear systems, 429 Noise, FFT filtering, 696 Nonhomogeneous equations, 68, 82–93, 283–293, 372–373, 825–828, 854–857 consistent, 285–288 existence and uniqueness of solutions, 285–292 heat equation, 854–857 higher-order differential equations, 91–92
inconsistent, 285, 289–292 linear, systems of, 283–293 solution of X = AX + G, 372–373 structure of solutions, 284–285 superposition, principle of, 91 systems of linear, 283–293, 372–373 theorem for, 68 undetermined coefficients, method of, 85–90 variation of parameters, method of, 82–84 wave equation, 825–828 Nonlinear differential equations, 403–472 almost linear systems, 431–451 critical points, 424–431, 466 direction fields, 406–413 limit cycles, 461–466 Lyapunov’s stability criteria, 451–461 periodic solutions, 466–470 phase plane, 406–408 phase portrait, 406–413 solutions, existence of, 403–406 systems, 403–406 trajectories, 407–431 translation, 409–412 uniqueness, 404 Nonsingular matrix, 294 Normal distribution, 1162, 1167–1168 Normal lines, 508–509 Normal vector, 486–488, 504–509, 545–562 defined, 504 gradient field, as, 504–509 surfaces, 548–551 unit, 486–488 Numbers, see Complex numbers Numerical approximations, 181–200, 859–862 Adams-Bashforth method, 199 Adams-Moulton method, 199 Euler’s method, 182–190, 193–194 features of, 181–182 heat equation solutions, 859–862 multistep methods, 197–200 one-step methods, 190–197 Runge-Kutta methods, 195–197 Taylor method, second order, 190–192 O One-step methods, numerical approximation, 190–197 Open set, points, 926–927 Operational formula, Fourier transform, 672 Order p method, 197 Ordering, complex numbers, 920–921 Orientation preserving mapping, 1062 Origin (center), 423–424, 429–431 linear systems, 423–424 nonlinear systems, 429–431 Orthogonal expansions, see Expansions Orthogonal matrices, 339–343 Orthogonal trajectories, 53–55, 57 Orthogonal vectors, 215 Orthogonality, 708–709, 740–741 Bessel functions, 740–741 Legendre polynomials, 708–709
Overdamped forced motion, 99 Overdamping, 95–96 P Parallelogram, 206–207, 220–221 cross product, 220–221 law, 206–207 Parseval’s theorem, 622–623 Partial differential equations, 779, 781–840, 841–878, 879–909 heat equation, 841–878 potential equation, 879–909 use of, 779 wave equation, 781–840 Particular solutions, first order differential equations, 3–4 Path, defined, 990 Periodicity conditions, vibration, 834–837 Permutations, 299–301, 1102–1103 defined, 299, 1102 determinants, 299–301 probability, 1102–1103 Phase plane, nonlinear systems, 406–408 Phase portrait, 406–413, 413–424 linear systems, 413–424 nonlinear systems, 406–413 Piecewise, 112–113, 552–553, 593–595, 595–596 continuity, 112–113 continuous function, 593–595 functions, 593–595, 595–596 smooth function, 595–596 surfaces, smooth, 552–553 Plane-parallel flow, 1087 Planes, 507–508, 889–893, 914, 921–936, 975–979, 1070, 1087–1094 complex, 914, 921–936, 975–979 curves in, 975–979 Dirichlet problem, 889–893 extended complex, 1070 fluid flow, complex function models of, 1087–1094 tangent, 507–508 Poincaré-Bendixson theorem, 467–468 Points, 166–173, 921–936, 975 Bessel functions, 171–172 Bolzano-Weierstrass theorem, 936 boundary, 927–928 bounded sequences, 935 bounded set, 935 circles, in, 922–923 closed set, 929 compact set, 935–936 complex pane, in the, 921–936 complex sequences, 931–934 convergence, 932–934 disks, in, 923 distance between, 922, 923–924 equation z − a = z − b, 923–924 Frobenius, method of, 166–173 initial, 975 interior, 925 irregular singular, 167 limit, 929–931 loci, 921–936
open set, 926–927 ordinary, 166 regular singular, 167 sets of, 921–936 singular, 166–173 subsequences, 934–935 terminal, 975 Poisson distribution, 1157–1158 Poisson’s integral formula, 886–888 Polar form, complex numbers, 918–920 Poles, 1024, 1026–1029, 1031–1036 condition for, 1026–1028 double, 1026 order m, of, 1024, 1026–1028, 1033–1036 products, of, 1029 quotients, of, 1028–1029 residue at, 1031–1036 simple, 1031–1033 single, 1026 Polynomial coefficients, solving equations with, 150–154 Population, 1178, 1185–1193 defined, 1178 distributions, sampling of, 1178–1185 mean, estimating, 1190–1193 student t distributions, 1191–1193 Potential equation, 879–909 Dirichlet problems, 879–886 electrostatic, 893–895 harmonic functions, 879–880 Laplace’s equation, 879–880 Neumann problem, 902–908 Poisson’s integral formula, 886–888 steady-state heat equation, 879–880, 898–901 Potential function, 28 Potential theory, 536–545, 576–579 independence of path, 536–545 Stokes theorem, 576–579 3-space, 576–579 Power series, 156–161, 161–166, 950–957, 1007–1018 complex numbers, 951–952 complex polynomial, 950 convergence, 953–956 defined, 952 identity theorem, 1015–1016 initial value problems, 156–161 isolated zeros, 1012–1015 maximum modulus theorem, 1016–1018 recurrence relations, 161–166 representations of functions, 1007–1018 solutions, 156–161, 161–166 Taylor coefficients, 956 Taylor series, 1007–1012 Power spectral densities, FFT analysis of, 695 Powers, 969–972 complex numbers, 969–972 integer, 969 positive integer n, 969–971 rational, 971–972 Prediction interval, 1198–1203 Principal axis theorem, 350–351
Probability, 1097, 1099–1141, 1150–1154
  Bayes' theorem, 1134–1138
  choosing r objects from n objects, 1104–1111
  complementary events, 1121
  conditional, 1122–1125
  counting and, 1099–1141
  distributions, 1150–1154
  events, 1112–1122, 1126–1129
  expected value, 1139–1140
  experiment, 1112–1113
  independent events, 1126–1129
  multiplication principle, 1099–1101
  permutations, 1102–1103
  sample space, 1112–1114
  statistics and, 1097
  tree diagrams, 1107–1111, 1130–1133
Probability distribution, defined, 1151
Product rule, 1128–1129
Products, poles of, 1029
Proper node, 419, 429
Pulse functions, 124–125
Pure imaginary number, 914
Q
Quadratic forms, 347–352
Quotients, poles of, 1028–1029
R
Random variables, 1150–1154
  continuous, 1153
  countable, 1154
  defined, 1153–1154
  discrete, 1153
  experiment of, 1150
  mean, 1152–1153
  probability distributions, and, 1150–1154
  standard deviation, 1152–1153
  uncountable, 1154
Random walk, defined, 247
Range, statistical, 1146–1147
Rank, matrices, 269–271
Rectangular parallelepiped, cross product, 221–222
Recurrence relations, 161–166, 706–708, 735–736
  Bessel functions, 735–736
  Legendre polynomials, 701–719
Reduced matrix, 258–261
Reduction of order, 69–72
Regression, 1198–1201
Residue, 1030–1054
  applications of, 1037–1054
  argument principle, 1037–1039
  Cauchy principal value, 1052
  defined, 1030
  Laplace transform, inversion formula for, 1039–1040
  pole of order m, 1033–1036
  real integrals, evaluation of, 1040–1052
  simple pole, 1031–1033
  theorem, 1030–1054
Resonance, 100–102
Reversal of orientation, complex integrals, 986
Riccati equation, 43–46
Riemann mapping theorem, 1072–1077
Right-hand rule, 219
Rodrigues' formula, 715–716
Row echelon form, 258–266
  elementary row operations, 261–265
  reduced matrix, 258–261
  reducing, 261
Row equivalence, 257
Row operations, 251–258, 261–265, 307–311
  determinants, evaluation of by, 307–311
  elementary matrices, 251–258
  reduced matrices, 261–265
Row spaces, matrices, 266–269
Runge-Kutta methods, 195–197
S
Saddle point, 418, 429
Sample space, 1112–1114
Scalar field, 499
Scalar, 203–205, 239
  defined, 203
  product of, with a vector, 204–205
  product of, with a matrix, 239
Scaling, 649–650, 776
  Fourier transform theorem of, 649–650
  function, 776
Schwarz–Christoffel transformation, 1077–1080
Second order differential equations, 61–106
  characteristic equation, 73–74
  constant coefficient homogeneous linear equation, 72–78
  defined, 61
  electrical circuits, analogy with, 103–105, 106
  Euler's equation, 78–81
  Euler's formula, 75
  homogeneous equations, 64–68
  linear dependence and independence, 65–68
  linear equations, 61–62
  mechanical systems, applications for, 93–106
  nonhomogeneous equations, 68–69, 82–93
  preliminary concepts, 61–62
  reduction of order, 69–72
  superposition, principle of, 91
  theory of solutions, 62–69
Separable differential equations, 11–21, 37
  applications of, 14–20
  defined, 11
  integrating factors, and, 37
  mathematical modeling, 20
  Torricelli's law, 18–19
Separation constant, 786
Series representations of functions, 1007–1022
  functions, of, 1007–1022
  identity theorem, 1015–1016
  isolated zeros, 1012–1015
  Laurent expansion, 1019–1022
  maximum modulus theorem, 1016–1018
  power, 1007–1018
  Taylor, 1007–1012
Series solutions, 155–180. See also Points
  analytic function, 156
  Bessel functions, 171–172, 178–180
  Frobenius, method of, 166–171, 173–178
  initial value problems, 156–161
  linearly independent, 173–178, 178–180
  logarithm factors, 173–180
  power series solutions, 156–161, 161–166
  recurrence relations, 161–166
  second solutions, 173–180
  singular points, 166–173
Shannon sampling theorem, 665–667
Shannon wavelets, 776–778
Shifting theorems, 120–134
  electrical circuits, analysis of, 129–132
  first, 120–122
  Heaviside functions, 122–125
  pulse functions, 124–125
  second, 125–129
Simply connected, complex numbers, 991–992
Sine, 612–614, 640–642, 671–672, 673–674
  Fourier series, 612–614
  Fourier transform, 671–672, 673–674
  integral, 640–642
Singular matrix, 294
Singularities, 1023–1030
  classification of, 1025
  essential, 1024
  isolated, 1023
  pole of order m, 1024, 1026–1028
  poles, 1026–1029
  removable, 1024–1025
68, 95, 99.7 rule, 1176–1177
Skew-Hermitian matrices, 355–357
Solenoidal fluid, 1088
Solution space, 280–283
Spanning set, 229–230
Special functions, 581–582, 701–745
  Bessel functions, 719–745
  Legendre polynomials, 701–719
  Sturm–Liouville theory, 745–755
Speed, see Velocity
Spheres, 1070
Spiral point, 421–424, 429
Spring equation, 94–95
Stability, 424–431, 451–461
  critical points, 424–431
  direct method, 454
  Lyapunov's criteria, 451–461
Stagnation point, 1089
Standard deviation, 1147–1149, 1152–1153
  defined, 1147
  random variables, 1152–1153
Standard representation, 207
Stationary flow, 1087
Statistics, 1097, 1143–1204
  bell curve, 1165–1176
  binomial distribution, 1154–1157
  center, measures of, 1143–1146
  central limit theorem, 1181–1184
  confidence intervals, 1185–1189
  continuity adjustment, 1168–1172
  correlation, 1194–1198
  data, 1143
  defined, 1143
  frequency of outcomes, 1159–1164
  mean, 1143–1145, 1152–1153
  median, 1145–1146
  normal distribution, 1162, 1167–1168
  Poisson distribution, 1157–1158
  population, 1178, 1185–1193
  prediction interval, 1198–1203
  probability and, 1097
  probability distributions, 1150–1154
  random variables, 1150–1154
  range, 1146–1147
  regression, 1198–1201
  sampling distributions, 1178–1185
  68, 95, 99.7 rule, 1176–1177
  standard deviation, 1147–1149, 1152–1153
  variation, measures of, 1146–1149
Steady-state heat equation, 879–880, 898–901
  harmonic function, as, 878–880
  solid sphere, for a, 898–901
Stokes's theorem, 572–579
  curl, physical interpretation of, 576
  integral calculus, 572–579
  potential theory in 3-space, 576–579
Streamlines, 495–499, 1089
  defined, 495
  fluid flow, 1089
  vector fields, 495–499
Sturm–Liouville theory, 745–755
  eigenfunctions, 749–752
  periodic problem, 747, 748–749
  regular problem, 745–746, 747–748
  singular problem, 747, 749–752
  theorem, 752–755
Subsequences, points, 934–935
Subspace of vectors, 226–227, 227–228, 229–235
  basis, 234
  defined, 226
  dimensions, 235
  R2, theorem of, 227–228
  spanning set, 229–230
Superposition, principle of, 91
Surfaces, 503–507, 545–562
  area, 557
  center of mass, 559
  flux of a vector field across, 560–562
  integrals, 553–562, 557–562
  level, 503–507
  mass, 557–559
  normal vector to, 548–551
  piecewise smooth, 552–553
  tangent plane to, 551–552
  3-space, 545, 576–579
Symmetric matrices, 343–347
Symmetry, Fourier transform theorem of, 650–651
T
Tangent planes, 503–509, 551–552
  defined, 504
  differential calculus, 503–509
  equation of, 505–506
  surfaces, to, 551–552
Taylor method, second order, 190–192
Taylor series, 956, 1007–1012
  coefficients, 956
  theorems for, 1007–1012
Temperature, 843, 844–848
  conditions of, 843
  distribution, 848–851
  insulated boundary conditions, 843, 847–848
  zero, 844–846
Term-by-term integration, 988–989
Terminal velocity, 47–48
Test for exactness, 30–32
Three point theorem, 1069–1071
3-space, 545–554, 576–579
  domain, 578–579
  potential theory in, 576–579
  surfaces in, 545–556
Tides, FFT analysis of, 697–699
Time reversal, Fourier transform theorem of, 650
Time shifting, Fourier transform theorem of, 647–649
Torricelli's law, 18–19
Torsion, 493
Trajectories, 407–413, 413–424
  linear systems, 413–424
  nonlinear systems, 407–413
  translation, 409–412
Transfer coefficient, 849
Transformations, 660–661, 667–669, 851–853, 1064–1071, 1077–1080, 1093–1094. See also Mappings
  boundary values, 851–853
  Fourier, 660–661, 667–669
  Joukowski, 1093–1094
  linear fractional, 1064–1071
  Schwarz–Christoffel, 1077–1080
Transpose, 245
Tree diagrams, 1107–1111, 1130–1133
  computing probabilities, 1130–1133
  configuration of, 1107–1111
Trigonometric functions, 957–966
U
Unbounded domains, 815–821, 908
Underdamped forced motion, 100
Underdamping, 96
Undetermined coefficients, method of, 85–90
Unforced motion, 95–97
Uniform and absolute convergence, 620–622
Unique solutions, 58–60, 285–292, 294
  existence and, theorem for, 59–60
  initial value problems, 58–60
  inverses, 294
  nonhomogeneous systems of linear equations, 285–292
Uniqueness, nonlinear systems, 404
Unit impulses, 139–144
Unit normal vector, 486–488
Unit vectors, 207
Unitary matrices, 352–355
V
van der Pol equation, 468–470
Variation of parameters method, 82–84, 394–398
  Laplace transform, 379–398
  nonhomogeneous equations, solution for, 82–84
  solution of X′ = AX + G, 394–398
Variation, measures of, 1146–1149
  range, 1146–1147
  standard deviation, 1147–1149
Vector analysis, 473, 475–515, 517–580
  differential calculus, 473, 475–515
  integral calculus, 473, 517–580
Vector fields, 493–499, 560–562
  defined, 493
  differential calculus, 493–495
  flux across surfaces, 560–562
  streamlines (flow lines), 495–499
Vector functions, 475–481
  differential calculus, 475–481
  one variable, 475–481
Vector space Rn, 223–228
  algebra of, 223–224
  Cauchy-Schwarz inequality in, 225
  dot product of, 224–225
  n-vector, defined, 223
  subspace, 226–227, 227–228
Vectors, 201–358
  algebra and geometry of, 203–211
  basis, 254
  Cauchy-Schwarz inequality, 216–217
  components, 204
  cross product, 217–223
  defined, 203
  determinants, 299–322
  diagonalization, 330–339
  dimension, 235
  dot product, 211–217
  eigenvalues, 323–330
  linear equations, 237–298
  linear independence, 228–235
  matrices, 237–298, 330–339, 339–347, 352–358
  norm of, 203
  orthogonal, 215
  parallel, 205
  parallelogram law, 206–207
  product of a scalar and, 204
  quadratic forms, 347–352
  space Rn, 223–228
  spanning set, 229–230
  subspace, 226–227, 227–228
  sum of, 206
  unit, 207
Velocity, 47–48, 481–483, 786–794
  initial, 791–794
  terminal, 47–48
  vector, analysis, 481–483
  wave equation, 786–794
  zero initial, 786–791
Vibration, 831–840
  circular membrane, 831–837
  normal modes, 831–833
  periodicity conditions, 834–837
  rectangular membrane, 837–840
  wave equation, 831–840
Vortex, strength of, 1088
W
Walk, defined, 247
Wave equation, 781–840
  approximations, 801–804
  boundary conditions, 781–785
  boundary value problems, 786–798
  characteristics, 822–830
  d'Alembert's solution, 822–830
  displacement, 791–794
  forward and backward, 828–830
  Fourier series solutions, 786–808
  Fourier transform solution on unbounded domains, 815–821
  initial conditions, 781–785, 798–801
  lattice points, 803
  motion, 798–801, 808–822
  nonhomogeneous, 825–828
  numerical solution, 801–805
  velocity, 786–794
  vibration, 831–840
Wavelets, 581–582, 765–778
  construction of, 775–776
  expansion, 774
  Haar, 767–774, 774–775
  idea behind, 765–767
  multiresolution analysis, 774–775, 775–776
  Shannon, 776–778
  use of, 582
Windowed Fourier transform, 661–665
Wronskian test, 66–67
Z
Zero matrix, 243–244
Zero row, 258
Zeros, 713–714, 737–739, 1012–1015
  Bessel functions, 737–739
  isolated, 1012–1015
  Legendre polynomials, 701–719
Guide to Notation

The following symbols and notation are used throughout this text. Each symbol is paired with a section in which it is defined or used. Standard symbols, such as notation for integrals and sums, are not included.

W[f, g]   Wronskian of f and g   (2.2)
ℒ[f]   Laplace transform of f   (3.1)
ℒ[f](s)   Laplace transform of f evaluated at s   (3.1)
ℒ⁻¹[F]   inverse Laplace transform of F   (3.1)
H(t)   Heaviside function   (3.3.2)
δ(t)   Dirac delta function   (3.5)
⟨a, b, c⟩   vector with three components   (6.1)
‖v‖   norm (magnitude) of a vector v   (6.1)
F · G   dot product of F and G   (6.2)
F × G   cross product of F and G   (6.3)
Rⁿ   n-space; set of all n-vectors   (6.4)
[aij]   matrix whose i, j element is aij   (7.1)
Onm   n × m zero matrix   (7.1.3)
In   n × n identity matrix   (7.1.3)
Aᵗ   transpose of A   (7.1.3)
A_R   reduced row echelon form of A   (7.3)
rank(A)   rank of A   (7.4)
[A ⋮ B]   augmented matrix   (7.7.2)
A⁻¹   inverse of A   (7.9)
|A| or det(A)   determinant of A   (8.2)
Aij   often denotes the minor of the i, j element of A   (9.1)
p_A(λ)   characteristic polynomial of A   (9.1)
Ω   in the context of a system X′ = AX, denotes a fundamental matrix (10.1); in the context of the fast Fourier transform, denotes the set of nth roots of unity (15.9.1)
T   often denotes a unit tangent vector to a curve   (12.1)
κ   curvature   (12.2)
N   often denotes a normal (or unit normal) to a curve   (12.2)
∇   del operator   (12.5)
∇φ or grad φ   gradient of φ   (12.4)
D_u φ(P)   directional derivative of φ in the direction of u, evaluated at P   (12.4)
∫_C f dx + g dy + h dz   line integral over C   (13.1)
∫_C F · dR   another notation for ∫_C f dx + g dy + h dz, with F = f i + g j + h k   (13.1)
∫_C f(x, y, z) ds   line integral of f with respect to arc length   (13.1.1)
∂(f, g)/∂(u, v)   Jacobian of f and g with respect to u and v   (13.4.1)
∬_Σ f(x, y, z) dσ   surface integral of f over a surface Σ   (13.4.4)
f(x₀−), f(x₀+)   left and right limits, respectively, of f at x₀   (14.3)
f′(x₀−), f′(x₀+)   left and right derivatives (respectively) of f at x₀   (14.3.2)
ℱ[f], or f̂   Fourier transform of f   (15.3)
ℱ⁻¹[f]   inverse Fourier transform of f   (15.3)
ℱwin[f]   windowed Fourier transform of f   (15.4.6)
ℱwin^t₀[f]   windowed Fourier transform of shifted f   (15.4.6)
ℱ_C[f], or f̂_C   Fourier cosine transform of f   (15.5)
ℱ_S[f], or f̂_S   Fourier sine transform of f   (15.5)
C[f] or f̃_C(n)   finite Fourier cosine transform of f   (15.6)
S[f] or f̃_S(n)   finite Fourier sine transform of f   (15.6)
∇²u   Laplacian of u   (13.7.2, 19.1)
uj   (15.7)
σ_N(t)   in the context of Fourier series, denotes the Nth Cesàro sum of f   (15.8.2)
Z(t)   in the context of filtering, denotes a filter function   (15.8.2)
L²(R)   space of square integrable functions defined on the real line   (16.5.2)
Pn(x)   nth Legendre polynomial   (16.1)
Tn(x)   nth Chebyshev polynomial   (16.4.1)
Ln(x)   nth Laguerre polynomial   (16.4.2)
Hn(x)   nth Hermite polynomial   (16.4.3)
Γ(x)   gamma function   (16.2.1)
Jn(x)   Bessel function of the first kind of order n   (16.2.2)
Yn(x)   Bessel function of the second kind of order n   (16.2.3)
γ   sometimes used to denote Euler's constant   (16.2.3)
I₀(x), K₀(x)   modified Bessel functions of the first and second kinds, respectively, of order zero   (16.2.4)
χ[0,1)   characteristic function of [0, 1)   (16.5.2)
φmn(t) = φ(2ᵐt − n)   functions used in constructing Haar wavelets   (16.5.2)
ψmn(t)   Haar wavelets   (16.5.2)
Re(z)   real part of z   (20.1)
Im(z)   imaginary part of z   (20.1)
z̄   complex conjugate of z   (20.1.2)
|z|   magnitude (modulus) of z   (20.1.2)
arg(z)   argument of z   (20.1.5)
∫_Γ f(z) dz   integral of a complex function f over a curve Γ   (22.2)
Res(f, z₀)   residue of f at z₀   (24.2)
f: D → D*   f is a mapping from D into D*   (25.1)
nPr = n!/(n − r)!   (26.3)
nCr = n!/(r!(n − r)!)   (26.3)
(n r)   alternate notation for nCr   (26.3)
Pr(E)   probability of an event E   (26.5)
E^C   complement of event E   (26.6)
Pr(E|U)   conditional probability of E, assuming U   (26.7)
x̄ or μ   often used for mean or average   (27.1)
s or σ   in the context of statistics, usually denotes standard deviation   (27.1)
z_{α/2}   with reference to a bell curve, usually denotes a critical value   (27.6)
t_{α/2}   with reference to a Student t-distribution, usually denotes a critical value   (27.7)