Advances in COMPUTERS VOLUME 3
Advances
in
COMPUTERS

edited by

FRANZ L. ALT
National Bureau of Standards, Washington, D. C.

and

MORRIS RUBINOFF
University of Pennsylvania and Pennsylvania Research Associates, Philadelphia, Pennsylvania
associate editors
A. D. BOOTH
R. E. MEAGHER
VOLUME 3
Academic Press, New York and London, 1962
COPYRIGHT © 1962
BY ACADEMIC PRESS INC.
ALL RIGHTS RESERVED

NO PART OF THIS BOOK MAY BE REPRODUCED IN ANY FORM, BY PHOTOSTAT, MICROFILM, OR ANY OTHER MEANS, WITHOUT WRITTEN PERMISSION FROM THE PUBLISHERS.

ACADEMIC PRESS INC.
111 FIFTH AVENUE
NEW YORK 3, N. Y.

United Kingdom Edition Published by
ACADEMIC PRESS INC. (LONDON) LTD.
BERKELEY SQUARE HOUSE, BERKELEY SQUARE, LONDON W. 1

Library of Congress Catalog Card Number 59-15761

PRINTED IN THE UNITED STATES OF AMERICA
Contributors to Volume 3
GARRETT BIRKHOFF, Department of Mathematics, Harvard University, Cambridge, Massachusetts
E. F. CODD, Development Laboratory, Data Systems Division, International Business Machines Corporation, Poughkeepsie, New York
SAMUEL D. CONTE,* Computation and Data Reduction Center, Space Technology Laboratories, Inc., Los Angeles, California
REED C. LAWLOR,† Electronic Data Retrieval Committee of the American Bar Association, Los Angeles, California
HAROLD K. SKRAMSTAD, U. S. Naval Ordnance Laboratory, Corona, California
RICHARD S. VARGA, Computing Center, Case Institute of Technology, Cleveland, Ohio
PHILIP WOLFE, Mathematics Division, The RAND Corporation, Santa Monica, California
DAVID YOUNG, Computation Center, University of Texas, Austin, Texas

* Present address: Computer Sciences Center, Purdue University, Lafayette, Indiana.
† Mailing address: Park Central Building, 412 West Sixth Street, Los Angeles, California.
Preface

The editors of Advances in Computers have been joined by Morris Rubinoff. This addition, whose effect will be more fully felt in future volumes, is expected to strengthen still further the tendency to broad coverage of different aspects of the computer field, a tendency which has guided us in the earlier volumes and which is demanding increased emphasis as the field is being split into more and more areas of specialization. In the present volume are represented applications, both scientific and data processing; methods, both of numerical analysis and of computer programming; and engineering considerations in computer selection. Taken together with the articles on artificial intelligence, logical design, components and others appearing in the earlier volumes, these contributions appear to constitute a fully representative sample of computer science and technology. We hope that they will be taken as an antidote to specialization.

FRANZ L. ALT
September, 1962
Contents

CONTRIBUTORS TO VOLUME 3
PREFACE
CONTENTS OF VOLUMES 1 AND 2

The Computation of Satellite Orbit Trajectories
SAMUEL D. CONTE
1. The Problems Posed by Artificial Satellites
2. The Equations of Motion
3. Methods of Integration
4. General Perturbation Methods
5. Accuracy Tests for Integration Programs
6. Orbit Determination and Tracking Methods
7. Organization of a Tracking and Prediction Program
Bibliography

Multiprogramming
E. F. CODD
1. Introduction
2. Early Contributions
3. Current Scope of Multiprogramming
4. Batch Multiprogramming
5. The Optimizing Problem
6. Multiprogramming with Two or More Processing Units
7. Concluding Remarks
8. Acknowledgments
Bibliography

Recent Developments in Nonlinear Programming
PHILIP WOLFE
1. Introduction
2. Differential Gradient Methods
3. Large-Step Gradient Methods
4. Simplicial Methods
5. Columnar Procedures
6. The Cutting-plane Method
7. Initiating an Algorithm
8. Computer Routines and Literature
Bibliography

Alternating Direction Implicit Methods
GARRETT BIRKHOFF, RICHARD S. VARGA, and DAVID YOUNG
INTRODUCTION
1. General Remarks
2. The Matrix Problem
3. Basic ADI Operators
PART I: STATIONARY ADI METHODS (Case m = 1)
4. Error Reduction Matrix
5. Norm Reduction
6. Application
7. Optimum Parameters
8. The Function F
9. Helmholtz Equation in a Rectangle
10. Monotonicity Principle
11. Crude Upper Bound
12. Eigenvalues of H, V
PART II: COMMUTATIVE CASE
13. Introduction
14. Problems Leading to Commutative Matrices
15. The Peaceman-Rachford Method
16. Methods for Selecting Iteration Parameters for the Peaceman-Rachford Method
17. The Douglas-Rachford Method
18. Applications to the Helmholtz Equation
PART III: COMPARISON WITH SUCCESSIVE OVERRELAXATION VARIANTS
19. The Point SOR Method
20. Helmholtz Equation in a Square
21. Block and Multiline SOR Variants
22. Analogies of ADI with SOR
PART IV: NUMERICAL EXPERIMENTS
23. Introduction
24. Experiments with the Dirichlet Problem
25. Analysis of Results
26. Conclusions
27. Experiments Comparing SOR Variants with ADI Variants
Appendix A: The Minimax Problem for One Parameter
Appendix B: The Minimax Problem for m > 1 Parameters
Appendix C: Nonuniform Mesh Spacings and Mixed Boundary Conditions
Appendix D: Necessary Conditions for Commutativity
Bibliography

Combined Analog-Digital Techniques in Simulation
HAROLD K. SKRAMSTAD
1. Comparison of Analog and Digital Computers in Simulation
2. Interconnected Analog and Digital Computers
3. Example of a Combined Solution
4. Analog-Digital Arithmetic in a Digital Computer
5. Systems Using Analog-Digital Variables
Bibliography

Information Technology and the Law
REED C. LAWLOR
1. Introduction
2. Information Growth
3. Mechanization in Law Practice
4. Applications of Symbolic Logic to Law
5. Information Storage and Retrieval
6. Punched Cards and Notched Cards
7. Prediction of Court Decisions
8. Thinking Machines
9. The Law of Computers
10. Use of Computers in Court
11. New Horizons
Bibliography
Exhibit "A"

AUTHOR INDEX
SUBJECT INDEX
Contents of Volume 1

General-Purpose Programming for Business Applications
CALVIN C. GOTLIEB
Numerical Weather Prediction
NORMAN A. PHILLIPS
The Present Status of Automatic Translation of Languages
YEHOSHUA BAR-HILLEL
Programming Computers to Play Games
ARTHUR L. SAMUEL
Machine Recognition of Spoken Words
RICHARD FATEHCHAND
Binary Arithmetic
GEORGE W. REITWIESNER

Contents of Volume 2

A Survey of Numerical Methods for Parabolic Differential Equations
JIM DOUGLAS, JR.
Advances in Orthonormalizing Computation
PHILIP J. DAVIS and PHILIP RABINOWITZ
Microelectronics Using Electron-Beam-Activated Machining Techniques
KENNETH R. SHOULDERS
Recent Developments in Linear Programming
SAUL I. GASS
The Theory of Automata, a Survey
ROBERT MCNAUGHTON
The Computation of Satellite Orbit Trajectories*

SAMUEL D. CONTE†

Computation and Data Reduction Center, Space Technology Laboratories, Inc., Los Angeles, California
1. The Problems Posed by Artificial Satellites
2. The Equations of Motion
   2.1 The Cowell Method
   2.2 The Encke Method
   2.3 Variation of Parameter Methods
3. Methods of Integration
   3.1 Runge-Kutta Method
   3.2 Multi-step Methods
   3.3 Special Methods for Second Order Equations
   3.4 Accumulated Round-off Error
   3.5 Integration in Multirevolution Steps
4. General Perturbation Methods
   4.1 The Diliberto Theory
   4.2 Numerical Results
5. Accuracy Tests for Integration Programs
   5.1 Comparison with Analytic Formulas
   5.2 Consistency Checks
   5.3 Double Precision Operations
   5.4 Use of Integrals of the Motion
   5.5 Comparison Using Different Methods
   5.6 Estimates of Accumulated Truncation and Round-off Errors
   5.7 Numerical Comparison of Special Perturbation Methods
6. Orbit Determination and Tracking Methods
   6.1 Editing the Observational Data
   6.2 The Least Squares Problem
   6.3 Stagewise Differential Corrections and Shifting Parameters
   6.4 The Partial Derivatives
   6.5 The Choice of Burnout Parameter Coordinates
   6.6 Error Analysis in Differential Corrections
7. Organization of a Tracking and Prediction Program
   7.1 Input and Conversion Block
   7.2 Trajectory Integration Block
   7.3 Partial Derivative and Residual Block
   7.4 The Differential Correction Package
   7.5 The Ephemeris Processor
Bibliography

* The preparation of this paper was made possible by the support, both direct and indirect, of the Computation and Data Reduction Center at Space Technology Laboratories, Inc. In particular, the author is indebted to I. J. Abrams, D. D. Morrison, O. K. Smith, and R. J. Mercer for many of the techniques and ideas contained in this survey paper.
† Present Address: Director, Computer Sciences Center, Purdue University, Lafayette, Indiana.
1. The Problems Posed by Artificial Satellites
The post-Sputnik era has seen a phenomenal growth in the computation of the orbits of artificial satellites. There are literally dozens of installations actively engaged in computing orbits and in related research. Most of the personnel involved are novices in the field of astrodynamics and must inevitably be unfamiliar with the methods and even the literature of classical astronomy on orbit determination. While familiarity with these methods would be of inestimable value to the modern astrodynamicist, the problems of artificial satellite theory differ in some important respects from those of classical astronomy and therefore call for new methods of attack. Some of these differences arise from the emphasis on and importance of engineering aspects; others concern the relative importance on near-earth satellite orbits of such perturbative forces as drag and oblateness. Rapidly changing forces such as arise from the near approach of a satellite to a planet also lead to new effects and to the need for refined computing techniques. The necessity for real time orbital determination and prediction presents new problems which were not encountered by astronomers. The advent of the high speed computing machine makes possible the consideration of complicated forces and effects which were too laborious to be considered in the days of the desk calculator, and at the same time makes necessary a re-evaluation of the numerical techniques used for orbit determination. The engineer is sometimes prone to accept machine results on faith. The numerical analyst is aware of the many errors which enter in a complicated way into the computations, and it is his responsibility to make clear to the engineer to what extent he can trust machine results. In this paper we propose to discuss and evaluate methods of orbit prediction and determination on high speed computers, pointing out, whenever possible, the sources and the magnitude of errors which enter into the computation.
At the outset it may be well to distinguish, as some writers have done, between "feasibility orbits" and "precision orbits." Feasibility studies are concerned with the over-all performance characteristics of a missile configuration, with fuel and payload requirements, with the optimization of orbit parameters, and with optimum guidance requirements. In such studies extreme accuracy is usually not required, two or three significant digits being normally adequate. In feasibility work it is permissible in some cases to use crude mathematical models, as is done, for example, in replacing the true orbit of a planet by an approximating circle or in omitting perturbations due to small perturbative forces such as oblateness on an interplanetary flight. In some cases the n-body problem may be approximated by a succession of two-body problems. In addition such studies are often based on the use of approximate values for the physical constants. Even in such studies, however, proper evaluation of the results requires that the nature and magnitude of the approximation be fully understood.

In precision orbit work, on the other hand, one should strive for the ultimate accuracy possible within the limitations of the information available, the machine being used, and the mission under consideration. Precision orbits are required for accurate navigation, for the proper conduct and interpretation of physical experiments, and for determining improved values of physical and astronomical constants. The number of significant digits required for precision work will generally be on the order of six or more, depending on the objective of the study. The mathematical model should include, when applicable, perturbations due to nongravitational as well as gravitational forces. Aerodynamic drag effects, for example, on near-earth satellites are very significant. Even small effects such as those caused by electromagnetic forces, meteoritic drag, radiation pressure and relativistic effects may be significant for some missions. Gravitational anomalies may also introduce appreciable effects. The effect of the earth's oblateness on near earth satellites is now fairly well understood; for lunar orbits the oblateness of the moon may need to be considered. In addition to the forces, every effort should be made to obtain the best available physical and astronomical constants.
Indeed, as noted above, one of the major uses of precision orbits is to determine better these very constants. The present uncertainty in the solar parallax, for example, leads to an uncertainty in a Martian flight of some 50,000 miles; the uncertainty in the principal oblateness parameter, which is on the order of 4 × 10⁻⁶, leads to appreciable effects on the orbital elements of a near earth satellite. Even the tables of the planetary coordinates introduce uncertainties of several hundred miles in the position of a planet in its orbit. The effect of the inaccuracy in the planetary coordinates is considerably more striking if it is necessary to obtain the velocity of the planet by interpolation. It can be shown that errors in the velocity of the earth of up to 1 fps are encountered if the velocities are obtained by interpolation in the tables. This error, when translated into errors in the satellite's velocity, will subsequently lead to errors of several thousand miles on flights to Mars or Venus.

Assuming that an adequate mathematical model is being used and that the physical constants are known exactly, one is faced with the problem of maintaining accuracy in the numerical computations. Computational errors, if not properly controlled, may be as serious as the physical errors. Among the computational errors are those which arise from the truncation of series, from the use of approximating polynomials, from the accumulation of rounding errors, from loss of significance through differencing nearly equal quantities, and from small divisors. Techniques for estimating these errors and methods for reducing their effects will be discussed in succeeding sections.

2. The Equations of Motion
No attempt will be made here to derive the equations of motion for a satellite acting under the influence of a central body force and various perturbative forces. This material is well covered in several books [37]. However, the equations will be given here both for reference purposes and to serve as a basis for subsequent discussion. Assuming an inertial rectangular equatorial coordinate system centered at the earth, the equations of motion for a vehicle with respect to the earth center have the form

    r̈ = -μ r/r³ + F_0 + F_1 + F_2    (2.1)

where r = (x, y, z) is the position vector of the vehicle relative to the central body, r = (x² + y² + z²)^1/2, μ = Gm where G is the gravitational constant and m is the mass of the earth, F_0 = (F_0x, F_0y, F_0z) is the force due to oblateness, F_1 = (F_1x, F_1y, F_1z) is the force due to the attraction of planetary bodies other than the central body, and F_2 = (F_2x, F_2y, F_2z) is the force due to drag. The term

    -μ r/r³ = (-μ x/r³, -μ y/r³, -μ z/r³)    (2.2)

is the attraction of the central body on the vehicle. More specifically, the components of the oblateness force, including only the first three harmonics, are built from the oblateness parameters J₁, J₂, J₃ and powers of R/r.
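Before the individual perturbing forces are spelled out, it may help to see what numerical integration of (2.1) involves. The following is a minimal sketch, not the article's program: it keeps only the central-body term (2.2), uses classical fourth-order Runge-Kutta, and assumes illustrative units (kilometers and seconds, with a modern value of μ for the earth).

```python
import math

MU = 398600.4418  # earth's gravitational parameter, km^3/s^2 (assumed modern value)

def accel(state):
    """Central-body term (2.2): d2r/dt2 = -mu r / r^3; F0, F1, F2 omitted."""
    x, y, z, vx, vy, vz = state
    r3 = (x * x + y * y + z * z) ** 1.5
    return (-MU * x / r3, -MU * y / r3, -MU * z / r3)

def deriv(state):
    """First-order form of (2.1): d(r, v)/dt = (v, a)."""
    ax, ay, az = accel(state)
    return (state[3], state[4], state[5], ax, ay, az)

def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step of size h seconds."""
    k1 = deriv(state)
    k2 = deriv(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = deriv(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = deriv(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h * (a + 2 * b + 2 * c + d) / 6.0
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

# Circular-orbit check: at radius r0 with speed sqrt(mu/r0) the radius
# should stay constant, so any drift measures the integration error.
r0 = 7000.0  # km
state = (r0, 0.0, 0.0, 0.0, math.sqrt(MU / r0), 0.0)
for _ in range(100):  # 100 steps of 10 s
    state = rk4_step(state, 10.0)
radius = math.sqrt(state[0] ** 2 + state[1] ** 2 + state[2] ** 2)
```

A circular orbit with known radius is one of the simplest of the consistency checks the article discusses later: any drift in the computed radius is pure integration error.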
Here J₁, J₂, J₃ are the oblateness parameters corresponding to the second, third, and fourth harmonics respectively, and R is the radius of the earth. The components of the perturbations due to other planetary bodies, including the moon, are:

    F_1x = -μ Σ_i m_i [(x - x_i)/ρ_i³ + x_i/r_i³]    (similarly for y and z)    (2.4)

where

    x_i = x component of the position of the ith body with respect to the origin,
    ρ_i = [(x - x_i)² + (y - y_i)² + (z - z_i)²]^1/2,
    r_i = (x_i² + y_i² + z_i²)^1/2,
    m_i = ratio of the mass of the ith body to that of the central body.
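The sum in (2.4), with its direct and indirect terms, transcribes into code directly. A sketch (the moon-like mass ratio and position used in the check below are illustrative numbers, not values from the article):

```python
def third_body_accel(r, bodies, mu):
    """Perturbation (2.4): for each disturbing body, the direct attraction on
    the vehicle minus the indirect term that appears because the coordinate
    origin (the earth's center) is itself accelerated by that body.
    `bodies` is a list of (m_i, (x_i, y_i, z_i)) with m_i the mass ratio
    of the i-th body to the central body."""
    x, y, z = r
    ax = ay = az = 0.0
    for m_i, (xi, yi, zi) in bodies:
        rho3 = ((x - xi) ** 2 + (y - yi) ** 2 + (z - zi) ** 2) ** 1.5
        ri3 = (xi * xi + yi * yi + zi * zi) ** 1.5
        ax -= mu * m_i * ((x - xi) / rho3 + xi / ri3)
        ay -= mu * m_i * ((y - yi) / rho3 + yi / ri3)
        az -= mu * m_i * ((z - zi) / rho3 + zi / ri3)
    return ax, ay, az
```

At the origin the direct and indirect terms cancel exactly, which makes a convenient sanity check: the perturbation is a tidal difference, not the full attraction of the disturbing body.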
The form of the drag vector will depend on the model assumed for the atmosphere. If the vehicle is assumed to be spherical so that lift can be neglected, and if variations of the density with latitude or time of day are ignored, the components take the form

    F_2x = -D ẋ_a/V_a    (similarly for y and z)    (2.5)

where

    D = ρ(h) V_a² A C_D/2W,

    ρ(h) = density of the atmosphere at height h above the surface of the earth,
    V_a = (ẋ_a² + ẏ_a² + ż_a²)^1/2 = velocity of the missile with respect to the surrounding air,
    A = cross-sectional area of the vehicle,
    W = mass of the vehicle,
    C_D = drag coefficient,
    ẋ_a = ẋ + ω_e y,
    ẏ_a = ẏ - ω_e x,
    ż_a = ż,
and where the rotation rate of the atmosphere is assumed to be equal to that of the earth, ω_e. The density ρ(h) may be approximated over a limited range in height h by a formula of the type

    ρ = exp (c₁ + c₂h + c₃h²)
where the coefficients c_i (i = 1, 2, 3), which may differ for various height levels, are obtained by making a fit of ρ(h) to the latest standard atmosphere table.

If the perturbative forces in (2.1) are all neglected, the motion reduces to that of a body moving in an inverse square law field. In this case the orbit is a conic section and explicit analytic formulas can be given for the motion of the vehicle. Another problem which can be solved explicitly is the restricted three body problem. In the general case analytic solutions are not available and recourse must be made to numerical integration, or possibly to series solutions. Our major concern will be with the various possible forms of the equations of motion and with the methods of numerical integration which will lead to the most efficient computational schemes for solving (2.1). Three mathematical formulations which are in common use today (those of Cowell and Encke and the variation-of-parameters method) will be described and evaluated. The two major criteria to be used in this evaluation are over-all accuracy and computational efficiency, although other criteria such as simplicity and versatility will be discussed.

2.1 The Cowell Method
The direct integration of (2.1) in rectangular coordinates is referred to in the literature as Cowell's method. If the position and velocity components (x₀, y₀, z₀, ẋ₀, ẏ₀, ż₀) are given at a time t₀, this system of three second-order equations can be integrated directly to obtain the total velocity and position at any subsequent time t. In this form the equations, although nonlinear, are very simple and symmetrical. For the integration itself no conversion from one coordinate system to another is necessary and integration time per step is very nominal. However, because the total accelerations are integrated, the attractions change rapidly with small time changes, so that small integration steps are required to maintain accuracy, and in addition a large number of significant figures must be carried. Over-all computing time depends upon the computing time per step and on the number of steps required for a given accuracy. For many orbits the Cowell method requires about ten times as many integration steps as the Encke or variation of parameter methods, and although computing time per step is approximately 50% less, over-all computing time may be considerably greater. The Cowell method also suffers more from accumulation of round-off errors, which may lead to serious degradation of accuracy. In general, round-off errors will accumulate as some power of the number of steps. Since the Cowell method requires more integrating steps for a given accuracy, round-off error will be proportionately larger and over-all accuracy may suffer as a result. In general, a method which requires the smallest number of integration steps should be preferred because round-off error effects will be minimized. For many orbits (including lunar orbits) and in general for any orbits for which two body motion is a reasonable approximation, the Cowell method is probably the least efficient computationally of the three methods to be discussed. Studies at Aeronutronic [2] and Republic Aviation [33] confirm these remarks. There are some orbits in which the Cowell method does show up to good advantage, however, particularly those where the thrust and perturbations are changing rapidly. Moreover, it is universally applicable to all types of orbits and presents no fundamental difficulties in the exceptional cases of nearly circular, nearly parabolic or even hyperbolic orbits. Because of its simplicity and universality of application, every installation engaged in serious work in space trajectories should have available a routine based on Cowell's method, in spite of its apparent deficiencies on many types of orbits.

2.2 The Encke Method
Over a century ago the German astronomer J. F. Encke observed that the motion of a celestial body, such as a planet, deviates only slightly from two-body motion. He proposed that instead of integrating the total accelerations as in the Cowell method one integrate only the deviations of the actual motion from that of a reference conic. Encke's method has been adapted to the earth satellite problem in recent years with striking success. When the reference orbit is that for two-body motion, analytic solutions are available. Since, theoretically, these equations can be solved exactly and since the deviations from the two-body motion are assumed to be small, it seems intuitively evident that much larger integrating steps for the deviations should be possible. Moreover, greater accuracy should be possible when working with a fixed word length because round-off and truncation errors enter only into the deviations. To insure that the deviations are kept small compared to the reference orbit motion, a process of rectification is necessary. Whenever the deviations from the fixed reference orbit become large (on the order of 1000 miles) a new reference orbit is determined. The frequency of rectification will depend upon the type of orbit. The computing time per integration step is at least 50% greater than for the Cowell method, particularly when frequent rectification is necessary, but the integrating step may be ten times as large, so that the Encke method compares very favorably in over-all computational efficiency with the Cowell method.
The advantages of Encke's method are particularly evident on lunar flights, where the deviations from two-body motion are slight, but seem to be less marked on near earth satellites, where the effects of oblateness and drag cause frequent rectification. To derive the equations for Encke's method we rewrite the equations of motion (2.1) in the form

    r̈ = -μ r/r³ + F    (2.6)

where F is the vector sum of all perturbations acting on the vehicle. If the perturbations are neglected, the resulting equations

    r̈_e = -μ r_e/r_e³    (2.7)

describe a conic section with the center of the earth at one focus. Analytic formulas are known for r_e, ṙ_e as functions of time. Now letting ξ represent the deviations of the actual orbit from the reference orbit, one has

    ξ = r - r_e    (2.8)

and

    ξ̈ = r̈ - r̈_e = -μ (r/r³ - r_e/r_e³) + F.    (2.9)

Equations (2.9) may now be integrated for the components of ξ and ξ̇, and the components of the actual motion r and ṙ obtained from (2.8). From a computational point of view the solution of the two-body formulas for r_e and ṙ_e is very critical and is the source of the largest error in the Encke method. Several sources of computational errors will be mentioned in succeeding paragraphs. In particular, errors in computing the mean motion n and the mean anomaly M when the time is large are especially serious. To avoid these errors it is recommended that the computation of n and M be done in double precision.

If the rectification were exact, the initial conditions for ξ₀ and ξ̇₀ would be zero. Due to rounding and loss of significance errors, the conversion from elliptic elements to rectangular coordinates will not be exact and hence the initial displacements ξ₀, ξ̇₀ will not vanish. Since the reference orbit need not be the true osculating conic section, one can simply consider that the erroneous elements so obtained define a conic section and that the deviations be based on this. Thus the initial displacements will in general not vanish at rectification but instead will be given values equal to the difference between the actual coordinates at the time and those computed from the slightly erroneous elliptic elements.

In attempting to carry out numerically the integrations in (2.9) we are immediately faced with one of a number of computational difficulties which arise in the Encke method. At the beginning of the flight r/r³ and r_e/r_e³ are very nearly equal to each other and significance will be lost in differencing these quantities. To avoid this loss of significance a method involving a series expansion is given in the literature. However, a simpler exact form more suitable for computation is the following:

    ξ̈ = (μ/r_e³) [(1 - ρ³) r - ξ] + F    (2.10)
where ρ = r_e/r. In order to start the integration of (2.10) we must have the elements of the reference orbit and the initial values of ξ, ξ̇ at time t = t₀. If the initial conditions are given in rectangular coordinates we must first compute the elements of the osculating conic section (a, e, i, Ω, ω, M) (see Fig. 2). The formulas differ for elliptic motion (a > 0) and for hyperbolic motion (a < 0). To determine the semimajor axis a, the eccentricity e and the mean anomaly M we proceed as follows:

    r = (x² + y² + z²)^1/2
    v = (v_x² + v_y² + v_z²)^1/2
    a = (2/r - v²/μ)⁻¹
    T₁ = 1 - r/a
    T₂ = (x v_x + y v_y + z v_z)/(μ|a|)^1/2
    e = (T₁² + T₂²)^1/2    for a > 0
      = (T₁² - T₂²)^1/2    for a < 0    (2.11)
    E = tan⁻¹ (T₂/T₁)      for a > 0
      = tanh⁻¹ (T₂/T₁)     for a < 0
    p = a(1 - e²)
    M = E - e sin E        for a > 0
      = e sinh E - E       for a < 0.
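For the elliptic branch (a > 0), formulas (2.11) go over into code almost line by line. A sketch (not the article's routine; canonical units with μ = 1 are an assumption for the example, and the two-argument arctangent is used so that E lands in the correct quadrant):

```python
import math

def elements_from_state(r_vec, v_vec, mu=1.0):
    """a, e, E, M of the osculating ellipse from position and velocity,
    following the pattern of (2.11) (elliptic branch, a > 0)."""
    x, y, z = r_vec
    vx, vy, vz = v_vec
    r = math.sqrt(x * x + y * y + z * z)
    v = math.sqrt(vx * vx + vy * vy + vz * vz)
    a = 1.0 / (2.0 / r - v * v / mu)                     # vis-viva
    T1 = 1.0 - r / a                                     # = e cos E
    T2 = (x * vx + y * vy + z * vz) / math.sqrt(mu * a)  # = e sin E
    e = math.hypot(T1, T2)
    E = math.atan2(T2, T1)
    M = E - e * math.sin(E)
    return a, e, E, M

# At perigee of an orbit with r = 1 and speed 1.1 (mu = 1):
# a = 1/(2 - 1.21) = 1/0.79 and e = 1 - r/a = 0.21.
a, e, E, M = elements_from_state((1.0, 0.0, 0.0), (0.0, 1.1, 0.0))
```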
To determine the inclination i, the longitude of the node Ω and the argument of perigee ω, it is convenient to introduce unit vectors P, Q directed along the semimajor axis toward perigee and in the direction of the velocity vector when the satellite is at perigee.
    P = (cos E/r) r - (a/μ)^1/2 sin E v
    Q = (1 - e²)^(-1/2) [(sin E/r) r + (a/μ)^1/2 (cos E - e) v]    (2.12)

for elliptic motion; for hyperbolic motion replace the trigonometric functions by hyperbolic functions and (a/μ)^1/2 by (-a/μ)^1/2. The remaining elements can now be computed as follows:

    ω = tan⁻¹ (P_z/Q_z)
    Ω = tan⁻¹ [(P_y Q_z - P_z Q_y)/(P_x Q_z - P_z Q_x)].    (2.13)
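Recovering the orientation elements from P and Q as in (2.13) amounts to a pair of two-argument arctangents. In the sketch below, taking i from the z component of the orbit normal W = P × Q is an assumed convention (the article does not show its formula for i at this point), and the builder that constructs P, Q from the angles via the standard rotation formulas is included only to allow a round-trip check:

```python
import math

def pq_from_elements(i, Omega, omega):
    """Unit vectors P (toward perigee) and Q from the orientation elements,
    via the standard rotation formulas."""
    co, so = math.cos(omega), math.sin(omega)
    cO, sO = math.cos(Omega), math.sin(Omega)
    ci, si = math.cos(i), math.sin(i)
    P = (co * cO - so * sO * ci, co * sO + so * cO * ci, so * si)
    Q = (-so * cO - co * sO * ci, -so * sO + co * cO * ci, co * si)
    return P, Q

def orientation_elements(P, Q):
    """Invert: omega and Omega as in (2.13); i (assumed convention) from the
    z component of the orbit normal W = P x Q, since W_z = cos i."""
    Px, Py, Pz = P
    Qx, Qy, Qz = Q
    omega = math.atan2(Pz, Qz)
    Omega = math.atan2(Py * Qz - Pz * Qy, Px * Qz - Pz * Qx)
    i = math.acos(Px * Qy - Py * Qx)
    return i, Omega, omega
```

Building P, Q from chosen angles and recovering them again exercises both directions of the conversion.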
Formulas (2.11)-(2.13) are used initially, or whenever rectification is required, in order to convert from rectangular coordinates to elliptic elements. To carry out the integration we must know r_e, ṙ_e at any time t. From the elements of the reference orbit the mean motion n = (μ/a³)^1/2 and the time of perigee passage τ are known. We then find the mean anomaly M = n(t - τ) at time t and solve Kepler's equation for the eccentric anomaly E:

    M = E - e sin E     (a > 0)
      = e sinh E - E    (a < 0).    (2.14)
To compute the position and velocity in rectangular coordinates from the elliptic elements we define a rectangular coordinate system in the plane of the orbit with the x_w-axis directed toward perigee and the y_w-axis along the semi-latus rectum. Then

    x_w = a(cos E - e)
    y_w = a(1 - e²)^1/2 sin E
    r = a(1 - e cos E)    (2.15)
    ẋ_w = -(μa)^1/2 sin E/r
    ẏ_w = (μp)^1/2 cos E/r
for elliptic motion, with the usual modifications for hyperbolic motion. The unit vectors P, Q introduced in (2.12) can also be obtained from the orientation elements (i, Ω, ω) as follows:

SATELLITE ORBIT TRAJECTORIES

P_x = cos ω cos Ω − sin ω sin Ω cos i
P_y = cos ω sin Ω + sin ω cos Ω cos i
P_z = sin ω sin i        (2.16)
Q_x = −sin ω cos Ω − cos ω sin Ω cos i
Q_y = −sin ω sin Ω + cos ω cos Ω cos i
Q_z = cos ω sin i.

Finally the rectangular coordinates at a point on the reference orbit are obtained from

r_e = x_w P + y_w Q        (2.17)
ṙ_e = ẋ_w P + ẏ_w Q.

Kepler's equation (2.14) plays an important role in all orbit calculations. In the Encke method it must be solved at each step of the integration, and any inaccuracies in its solution will affect subsequent calculations. Many methods have been proposed for solving it for the eccentric anomaly corresponding to a given time. Perhaps the most convenient and fastest method for computational purposes on a high speed computer is Newton's method. Starting with an initial approximation E₀ to E, which can be taken as the value of E determined at the previous integration step, the method consists of iterating using the algorithm

ΔE_n = (M − E_n + e sin E_n) / (1 − e cos E_n),        e < 1
ΔE_n = −(M + E_n − e sinh E_n) / (1 − e cosh E_n),     e > 1        (2.18)
E_{n+1} = E_n + ΔE_n.

The iterations are continued until E_{n+1} agrees with E_n to as many significant digits as are desired. The rate of convergence is quadratic, but the number of iterations will clearly depend on the goodness of the initial approximation. In the usual case two or three iterations are sufficient to yield seven-place accuracy. There are cases, however, where this iterative procedure may fail to converge or where convergence will be very slow. This occurs, for instance, in the important case of nearly parabolic orbits, when e ≈ 1 and E is small, so that the denominator of (2.18) is close to zero. On a typical lunar flight an error of 1 × 10⁻⁷ in E will lead to an error of 100 feet in the position coordinates. The exceptional case of nearly parabolic orbits is discussed later.
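As a concrete illustration, the Newton iteration (2.18) is easily programmed. The sketch below is illustrative only; the function name, tolerance, and iteration cap are choices made here, not part of the text:

```python
import math

def solve_kepler(M, e, E0=None, tol=1e-12, max_iter=20):
    """Solve Kepler's equation (2.14) for E by Newton's method (2.18).

    Elliptic case  (e < 1): M = E - e*sin(E).
    Hyperbolic case (e > 1): M = e*sinh(E) - E.
    E0 is the starting guess, e.g. the E from the previous step.
    """
    E = M if E0 is None else E0
    for _ in range(max_iter):
        if e < 1.0:
            # Newton correction for f(E) = E - e sin E - M
            dE = (M - E + e * math.sin(E)) / (1.0 - e * math.cos(E))
        else:
            # Newton correction for f(E) = e sinh E - E - M
            dE = -(M + E - e * math.sinh(E)) / (1.0 - e * math.cosh(E))
        E += dE
        if abs(dE) < tol:
            break
    return E
```

In practice the E determined at the previous integration step is passed as E0, so that, as noted above, two or three iterations suffice.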
2.2.1 The Encke Method with E or v as Independent Variable

Some authors advocate the use of the eccentric anomaly E or the true anomaly v as independent variables in the Encke method. The equations of motion for the deviations ξ in terms of the eccentric anomaly E are easily derived. From the relations
M = n(t − τ) = E − e sin E,        r_e = a(1 − e cos E),        a > 0

we obtain

dE/dt = √(μ/a) / r_e.

From the Encke equations with time as independent variable

d²ξ/dt² = f(ξ, t)

it now follows that

d²ξ/dE² = (a r_e²/μ) [ (μ e sin E / r_e³) dξ/dE + f(ξ, t) ].

Using the relation

tan(v/2) = √((1 + e)/(1 − e)) tan(E/2),

Encke's equations with true anomaly v as independent variable become

d²ξ/dv² = (r_e⁴/μp) [ (2μ e sin v / r_e³) dξ/dv + f(ξ, t) ].
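The half-angle relation between v and E quoted above is easy to check numerically on a two-body orbit. A small sketch; the values of a and e are arbitrary test data, not taken from the text:

```python
import math

a, e = 2.0, 0.3        # arbitrary two-body test values
E = 1.1                # sample eccentric anomaly

# in-plane position from (2.15)
xw = a * (math.cos(E) - e)
yw = a * math.sqrt(1.0 - e * e) * math.sin(E)

# true anomaly computed two ways: from the position directly,
# and from the half-angle relation tan(v/2) = sqrt((1+e)/(1-e)) tan(E/2)
v_direct = math.atan2(yw, xw)
v_half = 2.0 * math.atan(math.sqrt((1.0 + e) / (1.0 - e)) * math.tan(E / 2.0))
print(abs(v_direct - v_half))   # agrees to round-off
```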
The principal advantage in using E or v as independent variable is that the necessity of solving Kepler's equation is avoided altogether, thus resulting in a more rapid and somewhat more accurate determination of E. A secondary advantage is that the integration step size is more nearly uniform when using E or v instead of time. On highly eccentric orbits, for example, when a variable step integration routine is used and time is the independent variable, the step size must be reduced considerably near perigee to maintain accuracy. If v is the independent variable, the step size Δv will remain nearly constant. Frequent changes in the integration step size are costly in machine time and also result in a small loss of integration accuracy. On the other hand time is a much more natural variable, and engineers usually require output at specified times. The most convenient way to obtain information at a specified time is the following. If E is the independent variable and E_j are the points at which information is available, solve Kepler's equation the "easy way" to obtain the corresponding times t_j. Since the position, velocity, and accelerations are available at each time t_j, one can interpolate for any desired information at a time t which falls between successive values of t_j. This interpolation will have to be based in general on nonequally spaced intervals, and either a Lagrangian or Hermite type interpolation formula may be used [23].

2.2.2 Parabolic and Nearly Parabolic Orbits

When the reference conic is an ellipse (e < 1) or a hyperbola (e > 1) the formulas given in Section 2.2 are reasonably accurate. When the orbit is nearly parabolic (e ≈ 1) the use of either an ellipse or a hyperbola as an osculating conic leads to severe computational difficulties. For parabolic orbits the true anomaly v is used in place of the eccentric anomaly E because its range is much greater. The position and velocity in the plane of the reference conic are computed from the formulas
r = p / (1 + cos v)
x_w = r cos v
y_w = r sin v        (2.19)
ẋ_w = −√(μ/p) sin v
ẏ_w = √(μ/p) (1 + cos v)
while the inertial rectangular coordinates are obtained exactly as in the elliptic case. Kepler's equation in terms of the true anomaly is

M = tan(v/2) + (1/3) tan³(v/2)        (2.20)

and Newton's method applied to this yields the iterative formulas

Δv_n = 2 [M − tan(v_n/2) − (1/3) tan³(v_n/2)] cos⁴(v_n/2)
v_{n+1} = v_n + Δv_n.
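This Newton iteration on the true anomaly can be sketched as follows; the function name, tolerance, and iteration cap are illustrative assumptions, and the derivative used is f′(v) = (1/2)sec⁴(v/2):

```python
import math

def solve_barker(M, v0=0.0, tol=1e-12, max_iter=25):
    """Solve M = tan(v/2) + (1/3) tan^3(v/2) for the true anomaly v
    by Newton's method.  Since d/dv [tan(v/2) + tan^3(v/2)/3]
    = (1/2) sec^4(v/2), the correction is dv = -f / ((1/2) sec^4(v/2))."""
    v = v0
    for _ in range(max_iter):
        t = math.tan(v / 2.0)
        f = t + t ** 3 / 3.0 - M
        fp = 0.5 * (1.0 + t * t) ** 2   # (1/2) sec^4(v/2), since sec^2 = 1 + tan^2
        dv = -f / fp
        v += dv
        if abs(dv) < tol:
            break
    return v
```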
As mentioned above, substantial numerical errors may be sustained in nearly parabolic orbits when using the standard formulas for the reference orbit. A good discussion of nearly parabolic orbits is given in Herrick [21]. To see how these numerical difficulties arise, we may examine in detail some of the formulas used for the range of parameter values under consideration:

r = a(1 − e cos E),        (2.21)
M = E − e sin E,        (2.22)
a = p / (1 − e²).        (2.23)
Consider first (2.23). If e ≈ 1, a small error in e will be magnified into a much larger error in a. By considering the differential

da = [2ae / (1 − e²)] de

it is seen that an error in the seventh significant figure in e can lead to an error in the fourth significant figure in a. Since a is used to compute the distance r from the focus, it is clear that such errors will be propagated. A similar loss of significance occurs in computing r when e is near one and E is near zero, as is the case in most parabolic orbits. A loss of one or more digits will be sustained in differencing e and e cos E. A similar remark applies to the solution of Kepler's equation (2.22). This loss is especially dangerous when Kepler's equation is used to compute the time t or the mean anomaly M initially, since the initial timing error so introduced will seriously affect subsequent calculations. Herrick [21] has derived a formula which avoids some of these difficulties. In place of solving Kepler's equation for the time he proposes to use a series in the quantity AX², where

X = tan(v/2),        A = (1 − e)/(1 + e).

The series converges for AX² < 1, but its use is advised only when AX² < 0.2 (E < 49°). When E > 49°, Kepler's equation (2.18) is used. On orbits for which the two body problem is a reasonable approximation, the Encke method is computationally more efficient than the Cowell method. This advantage is especially marked on lunar flights, where accuracies comparable to those for the Cowell method can be obtained in 1/2 to 1/3 of the computing time. It is much less sensitive to the accumulation of round-off errors, it is capable of maintaining accuracy for much longer periods of time, and, in addition, it can reflect the effect of very small perturbations. On an IBM 704, using a fourth order integration method, an Encke program will require about 100 integrating steps and about 25 sec of computing time on a typical lunar flight with an error of 0.1 miles.
If higher order integration formulas are used, it is possible to take as few as 20 integration steps, with a running time of 15-20 sec for the same accuracy. On near earth satellite orbits the advantages of the Encke method are not nearly so marked. See, for example, the comparison of special perturbation methods on a typical earth satellite orbit given in Section 5. In the latter case the use of reference orbits more complicated than conic sections holds promise of yielding comparable computational efficiency. The major disadvantages of the Encke method are its complexity and the need for special care in handling the analytic expressions for the reference orbit in order to obtain maximum efficiency.

2.3 Variation of Parameter Methods
In the Encke method the elements of a reference orbit are determined at a fixed instant of time and the deviations from this reference orbit computed. Whenever desired, a new reference orbit is computed. In variation-of-parameter methods equations are derived for the instantaneous rate of change of a selected set of orbital elements, i.e., the reference orbit is continually rectified. In general, for earth satellites one would expect the orbital elements to change more slowly than rectangular coordinates. This in turn implies that a larger integrating step size can be taken for a required accuracy. Several sets of orbital elements have been proposed and are described in the literature under the names of Stromgren, Merton, Hansen, and Oppolzer [34]. More recently S. Herrick has introduced a specially chosen set of parameters which define the osculating orbit at each epoch. Herrick's method has been programmed for several installations and appears to be well suited to the computation of the orbits of earth satellites. A brief description of the method and the formulas will be given here. Details of the derivation are given in Baker [3]. For the purposes of this section the equations of motion (2.1) are written in the form

r̈ = −μ r/r³ + F        (2.24)

where F is the sum total of all perturbative forces. A set of unit vectors P, Q, W is first introduced (see Fig. 2) with P pointing toward perigee, Q pointing 90° east of P in the plane of the orbit, and W completing a right handed orthonormal system. The parameters for Herrick's method are taken to be vectors a(t) and b(t) along P and Q together with the mean anomaly M and the mean motion n, i.e.,

a = eP
b = e√p Q        (2.25)
M = n(t − t₀)
n = √μ a^(−3/2).
Here t₀ is the initial time, a is the semimajor axis (not the magnitude of a), e is the instantaneous eccentricity, and p is the parameter of the orbit. The differential equations for the instantaneous rate of change of these elements are

a(t) = a₀ + ∫ a′ dt
b(t) = b₀ + ∫ b′ dt        (2.26)

n(t) = n₀ + ∫ n′ dt
M(t) = M₀ + n₀(t − t₀) + ∫∫ n′ dt dt + ∫ M′ dt        (2.27)

(all integrals taken from t₀ to t) where the perturbative variations a′, b′, n′, M′ are to be defined. Equations (2.27) determine the mean anomaly. The system (2.26)-(2.27) is redundant, since only six independent equations are needed to describe the motion, but the redundancy is used as a check on the computations. To relate the Herrick elements to the rectangular coordinates of the vehicle we proceed as follows. Assuming that a, b, M are known at time t we form:

a·a = a_x² + a_y² + a_z² = e²        eccentricity e
b·b = b_x² + b_y² + b_z² = e²p        parameter p        (2.28)
a = p/(1 − e²)        semimajor axis a
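The recovery of the scalar elements from the vectors a and b in (2.28) amounts to two dot products and a division. A minimal sketch, assuming unit orthogonal vectors P and Q and illustrative values of e and p; the helper name is an invention of this example:

```python
import math

def elements_from_ab(a_vec, b_vec):
    """Recover e, p and the semimajor axis from Herrick's vectors
    a = e*P and b = e*sqrt(p)*Q, following (2.28)."""
    e2 = sum(c * c for c in a_vec)    # a.a = e^2
    e2p = sum(c * c for c in b_vec)   # b.b = e^2 * p
    e = math.sqrt(e2)
    p = e2p / e2
    a_semi = p / (1.0 - e2)           # a = p / (1 - e^2)
    return e, p, a_semi
```

For example, with e = 0.2, p = 1.5, P = (1, 0, 0) and Q = (0, 1, 0), the vectors a = (0.2, 0, 0) and b = (0, 0.2√1.5, 0) return the original e and p and a = 1.5/0.96.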
For elliptic orbits we now compute E from M via Kepler's equation

M = E − e sin E

and the planar rectangular coordinates from

x_w = a(cos E − e)
y_w = a√(1 − e²) sin E
r = a(1 − e cos E)        magnitude of radius vector
ẋ_w = −√(μa) sin E / r
ẏ_w = √(μp) cos E / r.

The position and velocity in inertial rectangular coordinates then are
r = x_w P + y_w Q
ṙ = ẋ_w P + ẏ_w Q

and for later convenience we also form
D = e√a sin E
H = −e x_w = −r·a.
To carry out the integration of equations (2.26)-(2.27) the variations a′, b′, n′, M′ must be obtained. Letting F_x, F_y, F_z be the components of the perturbative force, we compute:

√μ D′ = r·F = xF_x + yF_y + zF_z
2ṙ·F = 2(ẋF_x + ẏF_y + żF_z)        (2.29)
H′ = 2DD′ − r²D′/√a
The mean anomaly M and mean motion n can now be updated using (2.27) and the vectors a, b updated using (2.26). It is clear that these equations are considerably more complicated to program than the Cowell equations and that the computing time per step is considerably greater. Moreover, the two body formulas must still be solved at every step, since the perturbations are expressed in rectangular coordinates. In addition, accuracy is difficult to maintain on low eccentricity orbits, as can be seen from the computation of v′ and subsequently of M′. For low eccentricity orbits a different set of parameters which avoids the small divisor problem has been proposed by Herrick and is described in his book [62]. No universal set of parameters which applies equally well to all types of orbits has yet been developed, so that different sets of parameters must be used for essentially different types of orbits (e ≈ 1, e ≈ 0, and e moderate). It is often claimed that these formulas are less sensitive to round-off accumulation because they have been expressed as a system of eight first-order equations. It can be shown that round-off errors do accumulate less rapidly with first order systems than with second order systems. This conclusion is, however, not valid here, since the integration of (2.27) for the mean anomaly is essentially a double integration and any errors introduced into the mean longitude here are propagated throughout. On the other hand, the Herrick method does allow the use of larger time steps because the dependent variables are slowly changing functions of time. Except for low eccentricity orbits, it is a stable method relative to the growth of round-off and truncation errors and it is capable of yielding good accuracy over long time periods. It is particularly well suited to the integration of orbits where the total perturbation forces, including thrust, are small. See Section 5 for the results of a comparison of the Herrick method with the Cowell and Encke methods on an earth satellite orbit.
3. Methods of Integration
A great variety of methods are available for the numerical integration of systems of first order differential equations. The general criteria for the choice of an integration method are the following:
(a) Storage requirements for the subroutine
(b) Complexity of the formulas
(c) Flexibility
(d) Speed
(e) Over-all accuracy.
A flexible computing program for the artificial satellite problem which is designed for a variety of missions, which takes into account all significant perturbations, and which provides for detailed input and output conversions will inevitably lead to a very large, complicated routine. The number of instructions will be on the order of 20,000 or more. The typical application of such a program, whether for mission design purposes, for prediction of actual motion, or for tracking purposes, will involve large time spans and frequent usage and therefore considerable amounts of computing time. In view of these requirements, it will be assumed that a large high speed digital computer of the IBM 7090 type is available. Criteria (a) and (b) are then definitely of secondary importance, while speed and over-all accuracy are of primary importance. While extreme accuracy is not always necessary (in feasibility studies or in preliminary design studies, for example), the routine should be capable of producing very accurate results when they are desired, even at the expense of increased computing time.
By flexibility we mean the facility with which the program can perform the following functions:
(a) Print out results at a specified time or at a set of equally spaced times.
(b) Change the step size based on an error criterion.
(c) Restart the integration when discontinuities are introduced.
While some of these functions can be performed by the user, it is more convenient and faster to have these built into the integration subroutine. The flexibility features affect both the speed of the integration and the accuracy of the results. Frequent interruption of the basic integrating routine results in increased computing time, because the restarting procedure, except for Runge-Kutta methods, is expensive; it also tends to reduce accuracy, although the exact reason for this is not clear. In discussing the various integration methods emphasis will be placed on the factors which affect speed and on the errors, round-off and truncation, which affect accuracy. An integration routine for space trajectory calculations should strive to obtain a balance among the following requisites:
(1) It should allow the largest possible step size compatible with a given error criterion, i.e., higher order methods are preferable to lower order methods. This will serve to increase the speed and minimize the accumulation of round-off error.
(2) It should be capable of automatically changing the step size based on an adequate estimate of the local truncation error.
(3) The procedure for changing the step size or for restarting should be as inexpensive as possible.
(4) Adequate safeguards should be provided against excessive accumulation of round-off error.
(5) Provision should be made for the output of information at any desired time based on the use of interpolation formulas. This is particularly true when frequent output is required, as in station ephemeris printout or during tracking operations when a great many observations must be processed.
This function is perhaps more conveniently performed outside of the subroutine, although the subroutine should be written with this use in mind.

3.1 Runge-Kutta Methods
In this country Runge-Kutta methods are very popular, particularly a version due to Gill which requires only 3N storage cells (N being the number of equations) and which attempts to reduce the growth of round-off error. On modern computing machines the savings in storage are insignificant, and round-off error control, while quite important, can be accomplished more simply and more universally by double precision
accumulation of the increment, a process which will be described later. There no longer appears to be any good reason for using the Gill version, and it is recommended that if a Runge-Kutta method is desired the standard fourth order method be used. The formulas for this method are well known [23]. For a first-order equation y′ = f(x, y), assuming that the value y_i is known at the ith step, the next value y_{i+1} is obtained by the formulas

y_{i+1} = y_i + (1/6)(k₁ + 2k₂ + 2k₃ + k₄) + O(h⁵)
k₁ = h f(x_i, y_i)
k₂ = h f(x_i + h/2, y_i + k₁/2)        (3.1)
k₃ = h f(x_i + h/2, y_i + k₂/2)
k₄ = h f(x_i + h, y_i + k₃).
The increment Δy_i thus requires the evaluation of the derivative function four times for each integrating step. Since the derivative evaluation is the most time consuming portion, the necessity for four such evaluations is one of the least desirable features of this Runge-Kutta method. A second disadvantage is that the coefficient in the principal part of the local truncation error is an extremely complicated function of y and its derivatives. In trajectory calculations it is important to use the largest integrating step possible for a required accuracy in order to reduce both machine time and round-off error. The difficulty of evaluating the error term makes the choice of the proper integrating step difficult. Various methods have been proposed for eliminating this defect. One suggestion is that the integration for y_{i+1} be performed first with a step of length h and then with two steps of length h/2. This will yield two values for y_{i+1}. The difference between these two values then provides a reasonable estimate of the local truncation error. This method requires three times as much computing per step and for this reason cannot be seriously considered. Another suggestion is the following: having obtained three successive equally spaced values y_{i−1}, y_i, y_{i+1}, use Simpson's rule over the two intervals to obtain a new estimate of y_{i+1},

ȳ_{i+1} = y_{i−1} + (h/3)(y′_{i−1} + 4y′_i + y′_{i+1}),

which has the same local truncation error as the Runge-Kutta method. One can then use the difference ȳ_{i+1} − y_{i+1} to decide whether to increase or reduce the basic step size. This appears to be a reasonable and inexpensive procedure, although there is no rigorous mathematical justification for it, nor has the method been used extensively. Another procedure is to examine the magnitude of the attractions acting on the vehicle (for example, the distance from the earth might be one simple criterion) and choose the interval on the basis of experience. When integrations are performed over a great many time steps, the accumulation of round-off errors can lead to very serious degradation in the accuracy of the results. A very simple, effective, and inexpensive method for reducing accumulated round-off error in any numerical integration method is that of double precision accumulation. In this procedure the dependent variable is always carried in double precision form, the derivative calculations are carried out in single precision, and the increment Δy_i is added to the dependent variable y_i in double precision. This process is effective in either fixed point or floating point operations. Round-off error will be discussed in more detail later. To summarize, Runge-Kutta methods are capable of good accuracy, follow the solution curves very well, do not suffer from excessive growth of round-off error, and can change the step size at will. On the other hand the usual method is of fourth order, so that relatively small step sizes must be taken; it is slow, because four derivative evaluations are required at each step; and there is no simple criterion for deciding how to change the step size. By and large Runge-Kutta methods require more computing time than other methods and for this reason are not recommended for space trajectory calculations.

3.2 Multistep Methods
Multistep methods, as opposed to single-step methods such as those of Runge-Kutta type, require information at several successive equally spaced time steps. Such methods usually lead to a finite difference approximation to the solution of a first-order equation y′ = f(x, y) of the form

α_k y_{n+k} + ... + α₀ y_n = h(β_k f_{n+k} + ... + β₀ f_n)        (3.2)

where α₀, ..., α_k; β₀, ..., β_k are fixed constants, y_n is the value of y at x = x₀ + nh, and f_n = f(x_n, y_n). If β_k = 0 the formula is open and y_{n+k} can be obtained directly; if β_k ≠ 0, the formula is closed, since f_{n+k} requires knowledge of y_{n+k} and hence (3.2) must be solved by iteration. Since closed formulas are more accurate than open formulas, it is customary to obtain a first estimate of y_{n+k} using a predictor formula and then to use (3.2) as a corrector formula. Using difference operators in (3.2) which are of order k, it is theoretically possible to obtain formulas whose local truncation error is of degree h^(2k+1). However, since the difference
operator is of order k and the differential equation is of first order, (3.2) will have in general k solutions, all but one of which are extraneous. There is some danger in the use of such formulas that small errors, such as round-off, introduced at some point of the computation will become magnified and eventually even dominate the true solution. This phenomenon is referred to as instability. Dahlquist [9] has studied such finite difference approximations and has shown that if the method is to be stable the local truncation error can be at most of degree h^(k+3). Moreover, such formulas of maximum degree must be closed. Thus it is possible to generate families of finite difference equations from (3.2) which are stable. We list here two familiar predictor-corrector pairs of the type (3.2). The Milne formulas for first order equations are

y_{n+1} = y_{n−3} + (4h/3)(2y′_n − y′_{n−1} + 2y′_{n−2}),        error (28/90)h⁵y⁽⁵⁾        (3.3)
y_{n+1} = y_{n−1} + (h/3)(y′_{n+1} + 4y′_n + y′_{n−1}),        error −(1/90)h⁵y⁽⁵⁾

and the Adams-Moulton formulas are

y_{n+1} = y_n + (h/24)(55y′_n − 59y′_{n−1} + 37y′_{n−2} − 9y′_{n−3}),        error (251/720)h⁵y⁽⁵⁾        (3.4)
y_{n+1} = y_n + (h/24)(9y′_{n+1} + 19y′_n − 5y′_{n−1} + y′_{n−2}),        error −(19/720)h⁵y⁽⁵⁾.

As the error terms indicate, both methods are of fourth order, although for the Milne method the truncation error is somewhat smaller. For sufficiently small values of h the Adams-Moulton formulas are unconditionally stable. The Milne method on the other hand is unstable, and while several modifications have been suggested to remove this instability [18], it is probably best to avoid this method. On the assumption that the fifth derivative is constant over an interval, the difference between the predicted and corrected values of y provides a good estimate of the local truncation error and thus provides a basis for deciding whether to reduce or increase the step size. In using formulas of this type it is advisable to use the corrector formula once only. If the predicted and corrected values do not agree to the desired number of places, then the integrating step should be reduced. Thus each step involves two derivative evaluations, as contrasted with four for Runge-Kutta methods. This fact, together with the ease of determining the local truncation error, makes the Adams-Moulton formulas considerably faster. On the other hand, this method, as well as all multistep methods, requires an independent starting method whenever the step size is to be changed. While special formulas can be derived to obtain the starting values or to change the interval, it is very convenient to use a Runge-Kutta
method for this purpose. Double precision accumulation is also recommended for round-off error control. To summarize, the Adams-Moulton predictor-corrector method is of the same order of accuracy as the fourth order Runge-Kutta method, it is very stable, and it is considerably faster. When combined with a Runge-Kutta method for starting and for changing intervals it makes for a convenient, rapid method which should be preferred to a straight Runge-Kutta method. Its major disadvantage is that it is of low order, thus restricting the interval step-size for a required accuracy. Of the numerous methods of higher order which are available, we mention briefly the Adams open formula. Written in terms of differences this formula is

y_{n+1} = y_n + h(1 + (1/2)∇ + (5/12)∇² + (3/8)∇³ + (251/720)∇⁴ + (95/288)∇⁵ + ...) y′_n        (3.5)

where ∇y′_n = y′_n − y′_{n−1} is the backward difference operator. If only third differences are retained and these differences are expressed in terms of ordinates, we obtain the first of formulas (3.4). By retaining a sufficient number of differences a formula of any order may be obtained. While this method is satisfactory, the Gauss-Jackson method, sometimes erroneously referred to as the Cowell method, is commonly used by astronomers and is particularly well suited to the equations of motion expressed in rectangular coordinates. Because of its length and complexity a complete treatment of the Gauss-Jackson method of integration will not be reproduced here. The reader is referred to Hildebrand [23] for the derivation and usage of the formulas. These formulas are often expressed in the form of differences, but it is more convenient to express the differences in terms of ordinates. Let the basic second order equation to be integrated have the form ẍ = X(x, ẋ, t). Let x_{i−1}, x_{i−2}, ..., x₀; ẋ_{i−1}, ..., ẋ₀; and X_{i−1}, ..., X₀ be i given successive equally spaced values of the position x, the velocity ẋ, and the attractions X. Then the formulas for the predicted values of x and ẋ at step i have the form
x_i = h² [ ″X_i + Σ c_k X_k ]
ẋ_i = h [ ′X_{i−1/2} + Σ d_k X_k ]        (3.6)

where the first sum ′X_{i−1/2} and the second sum ″X_i are given by the recurrence formulas

′X_{i−1/2} = ′X_{i−3/2} + X_{i−1}        (3.7)
″X_i = ″X_{i−1} + ′X_{i−1/2}.
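The effect of the first and second sums in (3.6)-(3.7) can be seen already in the lowest order member of this family, Störmer's rule x_{n+1} = 2x_n − x_{n−1} + h²X_n rewritten in summed form; the full Gauss-Jackson coefficient sets are not reproduced here, and the function name and test equation below are illustrative:

```python
import math

def stormer_summed(f, x0, xdot0, h, nsteps):
    """Integrate xdd = f(x) by Stormer's rule in summed form.
    The first sum s accumulates h^2 * f (as in (3.7)), and x is
    rebuilt by adding s, so only one rounded addition touches x
    per step.  Algebraically identical to the unsummed rule
    x_{n+1} = 2 x_n - x_{n-1} + h^2 f(x_n)."""
    x = x0
    # first difference x_1 - x_0 from a Taylor-series start
    s = h * xdot0 + 0.5 * h * h * f(x0)
    xs = [x0]
    for _ in range(nsteps):
        x = x + s                 # second-sum step: x_{n+1} = x_n + s
        xs.append(x)
        s = s + h * h * f(x)      # first-sum step:  s <- s + h^2 f
    return xs
```

Applied to ẍ = −x with x(0) = 1, ẋ(0) = 0, the result tracks cos t with the O(h²) global accuracy of Störmer's rule.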
Using the predicted values x_i, ẋ_i one may calculate X_i from the equations of motion and then apply corrector formulas to obtain improved values of x_i, ẋ_i. The corrector formulas will have the form

x_i = h² [ ″X_i + Σ c′_k X_k ]        (3.8)
ẋ_i = h [ ′X_{i−1/2} + Σ d′_k X_k ].
The difference between the predicted and corrected values can now be used to decide whether the interval h is too large or too small for the accuracy required. For the case when eleven successive values are assumed known, the coefficients c_k, c′_k, d_k, d′_k are given to nine significant figures in Garrett [14]. The starting values must again be obtained by some independent method, both for the attractions and for their first and second sums. The starting values are usually obtained by a complicated iterative process, although a Runge-Kutta method could be used. In the latter case a considerably smaller integrating step must be used, since the truncation error of the Gauss-Jackson method is O(h^(n+2)) if nth differences are retained. A special procedure for doubling or halving an interval must also be available. The Gauss-Jackson method will allow fairly large integrating steps if a sufficient number of differences are retained. Moreover the second sum process inhibits the growth of round-off errors, as will be shown in Section 3.4. It is probably the best integration method for Encke and Cowell programs, where one is basically dealing with second order systems of equations expressed in rectangular coordinates. It appears to be less well adapted to first order systems such as arise in variation-of-parameter methods. In general, it is most efficient on orbits which require relatively infrequent changes in the integration step size.

3.3 Special Methods for Second Order Equations
In the absence of drag and thrust perturbations, the equations of motion in rectangular coordinates have the form y″ = f(x, y), i.e., first derivative terms are absent in the forcing function. Special integration methods, both of the multistep and Runge-Kutta type, have been proposed to take advantage of this feature. A Runge-Kutta method of this type is the following:

k₁ = h f(x_n, y_n)
k₂ = h f(x_n + h/2, y_n + (h/2)y′_n + (h/8)k₁)
k₃ = h f(x_n + h, y_n + h y′_n + (h/2)k₂)        (3.9)
y_{n+1} = y_n + h[y′_n + (1/6)(k₁ + 2k₂)]
y′_{n+1} = y′_n + (1/6)(k₁ + 4k₂ + k₃).
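Written out in code, a step of this kind reads as follows. Only the y-update of (3.9) is fully legible in the text above, so the k₂ and k₃ stages here are the standard Runge-Kutta-Nyström choices and should be read as an assumption, not a quotation:

```python
def nystrom_step(f, x, y, yp, h):
    """One Runge-Kutta-Nystrom step for y'' = f(x, y):
    three derivative evaluations instead of four."""
    k1 = h * f(x, y)
    k2 = h * f(x + h / 2.0, y + (h / 2.0) * yp + (h / 8.0) * k1)
    k3 = h * f(x + h, y + h * yp + (h / 2.0) * k2)
    y_new = y + h * (yp + (k1 + 2.0 * k2) / 6.0)
    yp_new = yp + (k1 + 4.0 * k2 + k3) / 6.0
    return y_new, yp_new
```

On the test equation y″ = −y with y(0) = 1, y′(0) = 0, ten steps of h = 0.1 reproduce cos 1 with fourth order accuracy.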
This method has a local truncation error of order h⁵, just as the corresponding method for first order equations, but involves only three derivative evaluations instead of four. A saving of about 25% in computation time for the same accuracy thus appears possible. Of the various multistep methods we mention the predictor-corrector formulas usually ascribed to Milne and Störmer:

y_{n+1} = y_n + y_{n−2} − y_{n−3} + (h²/4)(5f_n + 2f_{n−1} + 5f_{n−2})        (3.10)
y_{n+1} = 2y_n − y_{n−1} + (h²/12)(f_{n+1} + 10f_n + f_{n−1}).

The apparent advantage of this method is that a local truncation error of order h⁶ is achieved, compared to an h⁵ error for similar formulas for first-order systems. As shown in Hildebrand [23] this advantage is illusory, since the total propagated error is only of order h⁴. In addition, these formulas are more sensitive to the growth of round-off errors than first order methods, as shown by Henrici [18]. In practice there appears to be little to recommend (3.10) for the free-flight problem. Moreover, since it is frequently necessary to include nonconservative forces, it is probably advisable to avoid the use of either method (3.9) or (3.10).
3.4 Accumulated Round-off Error

Since a machine works with a finite number of digits, and since some sort of rounding is involved in each calculation, it is clear that some significance will be lost due to round-off. Indeed the loss due to round-off can be disastrous in some problems, and every effort should be made to assess the extent of such loss. The proper assessment of round-off error is, however, complicated by the fact that very little is known about it and few thorough studies have been made. Rademacher [35] made a study of round-off error accumulation and obtained some results assuming a statistical model of round-off propagation based on fixed point operations and symmetric rounding. Brouwer [8] studied the accumulation of round-off errors for the n-body problem when the Cowell formulation and the Gauss-Jackson integration method were used. He also assumed symmetric rounding and fixed point arithmetic, but he was concerned primarily with hand computations. His results indicate that, for a fixed step size, the error growth is of order N^(3/2), where N is the number of integration steps. Experience on high speed computers of the IBM type with floating point operations indicates that this estimate is probably pessimistic, particularly if the process of double precision accumulation as described in Section 3.1 is used. A more realistic estimate, based partially on experience and partially on analysis, indicates an error growth of order N^(1/2). Henrici [18] has recently
made a study of round-off error propagation and it may be worthwhile to summarize some of his results here. The most important factors affecting round-off error propagation in the solution of differential equations are the following: (1) the method of integration used; (2) the kind of arithmetic, i.e., fixed point or floating point; (3) how the rounding operations are performed. For simplicity we consider second order equations of the form y″ = f(x, y), which are typical for the free flight problem. For machine solution this differential equation can be replaced by a straight finite difference scheme, of which (3.10) is typical, or by a summed finite difference scheme, of which (3.6) is typical. If infinitely many places were carried by computing machines the results would be the same for either method. There is a considerable difference, however, if finitely many places are carried. The integration methods being considered are of the form

α_k y_{n+k} + ... + α₀ y_n = h²[β_k f_{n+k} + ... + β₀ f_n].        (3.11)

Let ȳ_n denote the numerically calculated values of y_n, let f̄_n = f(x_n, ȳ_n), and define the local round-off errors ε_n by

α_k ȳ_{n+k} + ... + α₀ ȳ_n = h²[β_k f̄_{n+k} + ... + β₀ f̄_n] + ε_n.
We now adopt a statistical model of round-off propagation. The ε_n are considered to be independent random variables whose expected value E and variance V are assumed to have the form

E(ε_n) = μ p(x_n),        V(ε_n) = σ² q(x_n)        (3.12)

where p, q are functions of x but independent of h, and μ, σ are functions of the step-size h but independent of x. Using perturbation methods and probability theory, Henrici shows that for the orthodox methods of using (3.11)

E(r_n) = (μ/h²) {m(x_n) + O(h)}
V(r_n) = (σ²/h³) {v(x_n) + O(h)}        (3.13)

where r_n = ȳ_n − y_n is the accumulated round-off error after n integration steps and m(x), v(x) are known functions which satisfy certain differential equations. The differential equations satisfied by m and v are:

m″ = gm + p,        m(a) = m′(a) = 0
v‴ = 4gv′ + 2g′v + 2q,        v(a) = v′(a) = v″(a) = 0        (3.14)
SATELLITE ORBIT TRAJECTORIES
where $g(x) = f_y[x, y(x)]$. Equations (3.14) can be integrated along with the original equation y'' = f(x, y). The summed form of (3.11) is equivalent to the pair of difference equations

$$\alpha_k' y_{n+k} + \cdots + \alpha_0' y_n = h\left[\beta_k F_{n+k} + \cdots + \beta_0 F_n\right], \tag{3.15}$$

$$F_{n+k} = F_{n+k-1} + h f_{n+k}. \tag{3.16}$$

Now let $\varepsilon_n$ denote the local round-off errors for (3.15) and $\eta_n$ those for (3.16), and assume as before that $E(\eta_n) = \nu P(x_n)$, $V(\eta_n) = \tau^2 Q(x_n)$, and that the $\varepsilon_n$, $\eta_n$ are mutually independent. Then in place of (3.13) we obtain
$$E(r_n) = \frac{\mu}{h}\{M(x_n) + O(h)\} + \frac{\nu}{h}\{N(x_n) + O(h)\},$$
$$V(r_n) = \frac{\sigma^2}{h}\{R(x_n) + O(h)\} + \frac{\tau^2}{h}\{S(x_n) + O(h)\}, \tag{3.17}$$
where M, N, R, S are again obtained as solutions of certain differential equations. Thus both the expected value and the standard deviation of the accumulated round-off error are improved by a factor of the order of h in the summed method. These results indicate the superiority of the Gauss-Jackson method described in Section 3.2 over the unsummed methods of which (3.10) is typical.

In order to obtain quantitative results for a particular method of integration one has to make more definite assumptions about the distribution of the local error, i.e., about p, q, P, Q and the constants $\mu$, $\nu$, $\sigma$, $\tau$. For fixed point operations and symmetric rounding, and assuming uniform distribution between the limits -u/2 and u/2, where u denotes the basic unit of the machine, the distributions are $p = P = 0$, $q = Q = 1$, $\sigma = \tau = u/\sqrt{12}$. Some experiments on simple equations show that quite accurate predictions of the distribution of accumulated round-off error can be made. Experiments applied specifically to the equations for the n-body problem are currently being conducted at Space Technology Laboratories, Inc.

Experimental results also indicate that the method of rounding has a considerable effect on round-off error propagation. In general symmetric rounding is much better than rounding by chopping. IBM machines normally round by chopping unless a code is written to achieve symmetric rounding. This fact, together with the short floating point word length of IBM machines (8+ decimal digits), constitutes two of the poorest features of the 700 series. On the basis of these studies and experiments the following conclusions concerned with minimizing round-off error accumulation seem warranted: (1) the summed form of finite difference integration schemes should be preferred; (2) symmetric rounding should be used; (3) fixed point arithmetic is preferable to floating point arithmetic, because a longer word length is thereby obtained (10+ digits on IBM machines) and because round-off error analysis is much simpler; (4) double precision accumulation as described in Section 3.1 should always be used.

3.5 Integration in Multirevolution Steps
In many problems, such as that of determining the lifetime of a satellite, it is necessary to obtain fairly accurate estimates of the orbital elements over hundreds or even thousands of revolutions. Straight integration of the equations of motion cannot be considered, both because of the extensive machine time required and because accuracy cannot be maintained over that many revolutions. Several papers have appeared recently suggesting that difference equations for the orbital elements be obtained and that an integration be performed in multirevolution steps. Let $y_n$ represent the orbital elements at the end of the nth revolution, and let $f_n$ represent the changes in the elements over the next revolution. Numerical integration over this revolution leads to the difference equation

$$y_{n+1} - y_n = f(t_n, y_n) = f_n. \tag{3.18}$$
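Equation (3.18) advances the elements one revolution per step; taking k revolutions per integration step is the multirevolution idea. A schematic sketch follows, in which the per-revolution change f and all numbers are illustrative assumptions, not values from the text:

```python
# (3.18): y_{n+1} - y_n = f(t_n, y_n), one revolution per step.
# A multirevolution step of size k, using only the lowest order term of a
# predictor of the kind discussed below, is y_{n+k} ~ y_n + k * f(t_n, y_n).

def f(n, y):
    # hypothetical slow per-revolution drift of an orbital element
    return -1.0e-4 * y

def one_rev_at_a_time(y0, revs):
    y = y0
    for n in range(revs):
        y += f(n, y)
    return y

def multirev(y0, revs, k):
    y = y0
    for n in range(0, revs, k):
        y += k * f(n, y)          # k revolutions per integration step
    return y

y_fine = one_rev_at_a_time(1.0, 1000)    # 1000 steps
y_coarse = multirev(1.0, 1000, 50)       # only 20 steps
print(y_fine, y_coarse)
```

For this smooth drift the two runs agree to about three decimals while the multirevolution run takes fifty times fewer steps; the higher backward differences in the formulas below exist to recover the accuracy lost by so crude a step.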
Multirevolution steps in y can be obtained through the use of predictor-corrector difference equations of the form

$$y_k - y_0 = k\sum_{j\ge 0} \lambda_j\,\nabla^j f_k, \tag{3.19}$$

where the $\nabla^j f_k$ represent multi-interval backward differences, i.e.,

$$\nabla f_n = f_n - f_{n-k}.$$

The coefficients $\lambda_j$ are given as jth degree polynomials in 1/k. The methods proposed by Smith [40] and Cohen and Hubbard [7] use backward difference formulas of the type (3.19), while those given by Thomas [44] use the Gauss-Jackson central difference formulas. Numerical experiments by Smith on an IBM 709 using Herrick's method for the numerical integration led to extremely disappointing results unless the solution of (3.18) was obtained with extremely high precision. On the other hand, experiments by Thomas using a machine with a 12-digit word length and based on Cowell's method led to optimistic conclusions. Because of the possible application of this approach to the satellite lifetime problem, further investigation of its capabilities appears warranted.

4. General Perturbation Methods
The large majority of vehicles launched over the past three years have been placed into orbits within 1000 miles of the earth. The major forces affecting the motion of such satellites are drag and the oblateness of the earth. If the perigee distance is greater than 300 miles, the motion is relatively drag-free and the oblateness effect dominates. For short term prediction of the motion on such orbits the special perturbation methods discussed above are capable of yielding very good accuracy. For long term prediction over hundreds and even thousands of revolutions, however, they suffer from two important disadvantages: first, step by step integration is subject to the inevitable accumulation of round-off and truncation errors, which will eventually destroy completely the validity of the results; and second, they are extremely time consuming, since many thousands of integration steps will be required. As a rule of thumb, one can estimate that 100 integration steps per revolution will be required for a near-earth satellite orbit. Assuming an orbital period of 144 min, or 10 revolutions per day, 100 steps per revolution would require 10^5 integration steps for 100 days. Using Brouwer's estimate that the round-off error in the mean longitude accumulates as the 3/2 power of the number of steps implies a loss in accuracy, due to round-off alone, of seven significant figures. This implies that double precision on IBM machines would be necessary to maintain an accuracy of six or more figures.

Since the oblateness parameters in the earth's gravitational potential are small, and since they enter analytically in the differential equations of the motion, it is reasonable to expect that the motion can be described adequately by power series expansions in these oblateness parameters. Those methods which attempt to obtain series expansions in the oblateness parameters, or perhaps in the eccentricity, are known as general perturbation methods. These methods have some important advantages over the special perturbation methods, at least theoretically. These are: (1) the position and velocity at any epoch can be obtained without the necessity for step by step integration, and consequently there is no significant loss of accuracy because of accumulated round-off or truncation error; (2) they are extremely rapid, requiring less than 1 sec of computing time per point on an IBM 704 after the initialization parameters have been obtained; (3) they are theoretically capable of maintaining accuracy over hundreds of revolutions; (4) they allow for a clearer interpretation of the sources of the perturbative force and of the qualitative nature of orbital motion. Thus the recent discovery of the "pear" shaped earth was made possible by an analysis of the long period terms in a general perturbation study applied to the 1958 β2 satellite [32]. In this instance general perturbation theory led to the discovery of the existence of the third harmonic in the earth's gravitational potential, a discovery which might have been almost impossible to arrive at using special perturbation methods.
On the other hand they suffer from a number of disadvantages, both theoretical and practical. From the theoretical aspect, although mathematicians and astronomers have investigated this problem for hundreds of years, no theoretical proof of convergence of the series expansions has yet been given, nor have any estimates of the truncation error in these expansions been obtained, except for the special case of equatorial orbits. In addition the general perturbation methods do not naturally lend themselves to the inclusion of nonconservative forces such as drag, and this imposes a severe limitation on the type of orbit to which they are applicable. From a practical point of view, the series expansions are extremely complicated, even to first order terms in the parameters, and almost hopeless for the higher order terms. The derivation of these higher order terms is extremely laborious, the correctness of the expansion is difficult to establish, and the programming of the method is extremely involved. For most of these methods there is a serious degradation in accuracy for small eccentricity orbits or for orbits near the critical inclination angle of 63.4°, where small divisor problems arise. Even for noncritical orbits, the accuracy is limited, and for short term prediction cannot be as accurate as numerical integration. A large number of general perturbation methods have been proposed. The classical theories of Hansen and Delaunay have been modified so as to apply to the artificial satellite problem, the former by Dr. Musen [31] and the latter by Dr. Brouwer [6]. More recently King-Hele suggested a different approach [25], and Brenner and his associates at Stanford Research Institute have developed a complete first order theory based on a modification of the King-Hele theory [4]. A novel approach based on the mathematical theory of periodic two-surfaces of Diliberto [10] is being developed at Space Technology Laboratories.
Other theories have been proposed by Kozai [27], Garfinkel [13], and Struble [42]. A complete exposition of these theories cannot be undertaken in this paper, but a few comments concerning their current and potential usefulness seem appropriate. Apparently the only general perturbation method which is relatively complete and which has actually been used on the near-earth satellite problem is the Hansen method. A program based on the Hansen method was initiated under the Vanguard Project and written by IBM, and is now being expanded under the jurisdiction of NASA. In this theory the disturbed eccentric anomaly is used as independent variable, and extensive use is made of Fourier series expansions. The program is extremely complicated and represents an investment of a great many man-years of effort. It seems to have been used successfully on a great many satellites. The accuracy of the predictions using this program varies with the particular orbit but is stated to be roughly about 0.1° of arc over one revolution. No complete write-up of the equations appears to be available for Hansen's method, although publication is now contemplated.

Brouwer's adaptation of Delaunay's method, which was published in 1959, seems to hold great promise for a practical perturbation method. The theory is complete and available to first order in the periodic terms for the first five harmonics of the earth's gravitational potential, and to second order in the secular terms. It is relatively simple to program and requires a very small initialization time. In these respects it is recommended over Hansen's method. A comparison with respect to accuracy and length of time of validity is much more difficult to make, and such a study has not been undertaken. The Delaunay theory suffers from the problem of small divisors for small eccentricities, particularly in the mean anomaly and in the argument of perigee; from singularities in the short period terms of the odd harmonics, both at low eccentricities and at low inclinations; and from resonance difficulties for satellites with inclinations near the critical value 63.4°. Research is currently underway to remove the singularities from the Delaunay theory, and to include a drag theory as well [24, 41]. Some tests of the Delaunay theory conducted at STL indicate that the major error arises in the computation of the mean anomaly, so that the error is largely along the orbit rather than perpendicular to it. Most of the other theories mentioned above are not complete or have not been sufficiently tested to determine their usefulness.

4.1 The Diliberto Theory
To establish the flavor of general perturbation methods, and also because the Diliberto theory is not now available in the literature, a brief exposition of this theory will be given here. We begin by considering the equations for the drag-free motion of a satellite about an oblate spheroid. In polar coordinates (r, θ, φ) these equations are

$$\ddot{r} - r\dot{\theta}^2 - r\sin^2\theta\,\dot{\phi}^2 = \frac{\partial V}{\partial r}, \tag{4.1a}$$

$$\frac{1}{r}\frac{d}{dt}\left(r^2\dot{\theta}\right) - r\sin\theta\cos\theta\,\dot{\phi}^2 = \frac{1}{r}\frac{\partial V}{\partial\theta}, \tag{4.1b}$$

$$\frac{d}{dt}\left[r^2\sin^2\theta\,\dot{\phi}\right] = \frac{1}{r\sin\theta}\frac{\partial V}{\partial\phi}, \tag{4.1c}$$

where V is the gravitational potential, which, including only the second harmonic, has the form

$$V = \frac{\mu}{r}\left[1 - J\left(\frac{R}{r}\right)^2 P_2(\cos\theta)\right], \tag{4.2}$$

where R is the equatorial radius of the earth and J is a
measure of the earth's oblateness. The solution of (4.1) with the gravitational potential (4.2) is the main problem of artificial satellite theory. More complete theories would be based on a gravitational potential of the form

$$V = \frac{\mu}{r}\left[1 - \sum_{n=2}^{\infty} J_n\left(\frac{R}{r}\right)^n P_n(\cos\theta)\right],$$

where $P_n$ is the Legendre polynomial of order n. The force field is assumed to be rotationally symmetric, so that $\partial V/\partial\phi = 0$, and hence (4.1c) can be integrated to give

$$r^2\sin^2\theta\,\dot{\phi} = p, \qquad p \text{ a constant}, \tag{4.3}$$

which states that the angular momentum about the polar axis is constant. Also φ is a monotonic function of time and can therefore be used as an independent variable. Introducing two new dependent variables

$$U = \cot\theta, \qquad W = \frac{1}{r\sin\theta}, \tag{4.4}$$

Eqs. (4.1) with φ as independent variable become

$$\frac{d^2W}{d\phi^2} + W = A(1 + U^2)^{-3/2} + \lambda W^2\left[3(1 + U^2)^{-7/2} - 4(1 + U^2)^{-5/2}\right], \tag{4.5}$$

together with a companion equation for U, where $A = \mu/p^2$ and $\lambda = J\mu R^2/p^2$. The solution of (4.5) will yield the position and velocity as a function of the longitude φ. Equation (4.3) must be integrated if the position as a function of time is desired. Equations (4.5) can be interpreted as a system of coupled harmonic oscillators, and this interpretation suggests the introduction of the phase space variables

$$x_1 = U, \quad x_2 = \frac{dU}{d\phi}, \qquad y_1 = W, \quad y_2 = \frac{dW}{d\phi}.$$

For a spherical earth (λ = 0), Eqs. (4.5) expressed in terms of the $x_i$, $y_i$ become a first order system, (4.6).
One can now find functions $\alpha_i(x_1, x_2)$ so that under the new variables

$$z_i = y_i - \alpha_i(x_1, x_2), \qquad i = 1, 2,$$

(4.6) becomes uncoupled, i.e., takes the form (4.7). The phase space has been decomposed into the product of two planes. If polar coordinates are introduced into the two planes,

$$x_1 = r_1\sin\theta_1, \quad x_2 = r_1\cos\theta_1, \qquad z_1 = r_2\sin\theta_2, \quad z_2 = r_2\cos\theta_2,$$

the unperturbed equations (4.7) become

$$\frac{dr_i}{d\phi} = 0, \qquad \frac{d\theta_i}{d\phi} = 1 \qquad (i = 1, 2).$$

Applying the same transformations to the perturbed system (4.5) yields equations of the form (4.9); explicit expressions for the perturbing functions $H_i$, $R_i$ appearing in (4.9) are given by Diliberto [10]. The theory now postulates the existence of integrals of the system (4.9) which are called periodic two-surfaces. They are closely related to the concept of periodic integrals. An approximation theory is then developed based on this assumption. The theory is characterized by the fact that it splits the description of the solutions into two distinct parts; one element is the description of the periodic surface, the other is a description of the position on the surface. The description of the surface requires two doubly periodic functions $S_1$, $S_2$ so that

$$r_i = S_i(\theta_1, \theta_2, J). \tag{4.10}$$

These functions are obtained to first order in J by an expansion procedure for the corresponding periodic integrals. Once the functions $S_i$ are approximated, the position on the surface is obtained from expansions of the form

$$\theta_1 = \theta_1^0 + \lambda F_1(\phi, \phi^0) + O(\lambda^2),$$
$$\theta_2 = \theta_2^0 + \lambda F_2(\phi, \phi^0) + O(\lambda^2). \tag{4.11}$$

The superscripted variables are obtained from the initial elements for the unperturbed orbit. The functions $F_1$, $F_2$ are of course quite complicated and must be solved by iteration. For a given value of the longitude φ,
(4.10)-(4.11) are solved for $\theta_i$, $r_i$, and from these one can obtain the position and velocity directly. The theory as developed thus far is complete to first order in λ, but an analytic relationship for φ as a function of time has not yet been derived.

4.2 Numerical Results
Some numerical results are included here to give the reader a feeling for the kind of accuracy that can be expected from general perturbation methods. A typical noncritical orbit was chosen with initial elements a = 1.5, e = 0.2, i = 45°, and period 155 min. The only perturbation considered was the second harmonic in the gravitational potential. The standard was provided by a Cowell double integration routine which carried out the numerical integration over 64 revolutions at a step size of 1/8 min. For this standard the position and velocity are correct to seven significant digits after 64 revolutions. Both the Diliberto and Delaunay methods were then used to obtain the position and velocity at 20-min intervals in each revolution. Table I gives the maximum difference in the position coordinates in feet over selected revolutions. In the Diliberto method the longitude φ corresponding to a given time was obtained from the numerical integration, since no time-longitude relationship has yet been developed for this method. The column Delaunay I gives the results using the mean anomaly as output from the Cowell run, and the last column gives the results when the Delaunay method computes its own mean anomaly. The error in the Diliberto and Delaunay I columns must be attributed directly to the fact that in the expansions terms of order higher than the first have been neglected. The Delaunay I results are somewhat better than the Diliberto results, the maximum errors being on the order of 1 mile and 3 miles respectively. The Delaunay II results are somewhat poorer. The major part of this error can be attributed directly to small errors in the mean anomaly as computed by the Delaunay formulas. The error is thus almost entirely along the path of the motion rather than perpendicular to it. At the end of 64 revolutions (about 9940 min) the mean anomaly determined from the Delaunay theory differs from that of the numerical integration by 0.035°.
The error Δx in any position component x can be obtained from the formula

$$\Delta x = \frac{\Delta M}{n}\,\dot{x},$$

where ΔM is the error in the mean anomaly, n is the mean motion, and $\dot{x}$ is the velocity in feet per second; ΔM/n is the equivalent time error. For this orbit at t = 9940 min, ΔM = 0.035°,

$$n = \frac{2\pi}{155(60)}\ \text{rad/sec} = \frac{360°}{9300\ \text{sec}}, \qquad \dot{x} = -20{,}500\ \text{fps},$$

and Δx ≈ -18,500 ft. The error in the mean anomaly thus accounts for almost 90% of the actual error in the x-coordinate, which is -21,000 feet, and this is true even though the secular terms in the mean anomaly are computed to second order terms. After 1000 revolutions a loss in accuracy of approximately six significant digits will be sustained from this source.

TABLE I. NUMERICAL RESULTS, GENERAL PERTURBATION METHODS
Revolution number    Diliberto     Delaunay I    Delaunay II
        1            150 ft        150 ft        320 ft
        2            250           240           650
       20            2800          1400          7000
       40            5600          3000          14,000
       64            14,000        5000          22,000
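The along-track error estimate preceding Table I can be checked arithmetically; a small sketch, using the numbers quoted in the text (degrees, seconds, and feet):

```python
# Delta_x = (Delta_M / n) * xdot: a mean anomaly error Delta_M is
# equivalent to a time error Delta_M / n, which the along-track
# velocity converts into a position error.
delta_M = 0.035                  # mean anomaly error after 64 revolutions, deg
n = 360.0 / (155 * 60)           # mean motion for a 155-min period, deg/sec
xdot = -20500.0                  # velocity component, ft/sec

delta_x = (delta_M / n) * xdot
print(round(delta_x))            # about -18,500 ft

# Fraction of the observed -21,000 ft error in x explained this way:
print(round(100 * delta_x / -21000))
```

The result, roughly -18,500 ft, is close to 90% of the observed -21,000 ft error, as stated in the text.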
5. Accuracy Tests for Integration Programs
Having decided on a mathematical formulation and on a method for numerical integration, it is then important to know how accurately a trajectory can be computed. Put differently, one may ask whether it is possible to estimate the totality of computational errors for a given orbit at any particular time. Several proposals have been suggested for an accuracy study, among these being: (1) comparison of the computed results with analytic formulas for problems to which solutions are known; (2) a consistency check on the integration; (3) use of complete double precision; (4) use of energy and angular momentum integrals; (5) comparison of results obtained by using different mathematical formulations and different methods of numerical integration; (6) use of numerical methods which estimate the propagated truncation error and the accumulated round-off error. The relative efficiency of these tests will be discussed in the following sections.
5.1 Comparison with Analytic Formulas
To determine the accuracy of a particular method, one can compare the results for an idealized two-body motion where the analytic solution is known. The assumption is then made that similar accuracies are attainable on nonidealized orbits. This approach is of limited value for the following reasons: (1) The two-body problem is essentially two-dimensional in nature. The effects and the accuracy on true three-dimensional orbits may be significantly different. (2) This method provides no check at all on the Encke and Variation of Parameter methods, which use two-body motion as a reference orbit. The only special perturbation method which does not become degenerate for two-body motion is the Cowell method. (3) The computation of the position and velocity at a given time from the two-body formulas will itself lead to errors. This is especially true for the limiting case of nearly parabolic orbits. Thus when great accuracy is required, it will not be clear whether the integration or the analytic formulas introduce the larger error.

The first two objections, but not the third, can be partially removed by basing the comparison on a three-dimensional problem for which an analytic solution is known. Among such problems are the restricted three-body problem and the problem of a vehicle moving under the attraction of two fixed centers of gravitation. For both of these problems an exact solution is known in terms of elliptic functions, although these solutions are fairly complicated. Comparisons based on these two problems have been made by Baker [2] and Pines [33]. Those problems for which an analytic solution is known are very special in nature, and it is not at all clear whether the generalization of results to more complicated motion is valid. The results may be useful for orbits which resemble closely those on which the comparison is based.

5.2 Consistency Checks
In this method the same trajectory is computed several times using either (a) a fixed step size for each integration, but reduced for succeeding integrations, e.g., h, h/2, h/4; or (b) a variable step size based on an error criterion which is successively tightened, e.g., ε = 10⁻⁸, 10⁻⁹. The number of digits of agreement in the position and velocity is then taken as an indication of the accuracy of the computation. As the step size is reduced the truncation error becomes smaller but the round-off errors become larger. There is therefore an optimum step size for a given method of numerical integration. In the absence of an absolute standard it is
difficult to determine this optimum step size. Moreover, there are certain systematic errors, such as those caused by cancellation, which are not uncovered by simply reducing the step size. Thus it is possible for two successive integrations to agree to N significant digits over a broad range of time values and yet for both results to be correct to somewhat fewer significant digits. Thus the consistency check does not provide a positive test of accuracy, although it does give useful information, especially when it can be ascertained that round-off and cancellation errors are not major factors.

The h² extrapolation method can also be used to obtain improved accuracy. Thus, if a pth order integration method is being used, and if two integrations at fixed step sizes h and h/2 are available, the formula

$$y^*(t) = y_{h/2}(t) + \frac{y_{h/2}(t) - y_h(t)}{2^p - 1}$$

will yield an extrapolated value for the dependent variable y at time t. This extrapolation formula is obtained under the assumption that the coefficient c in the truncation error O(ch^p) is constant over an interval and that the error is entirely truncation. If these assumptions are approximately fulfilled, the extrapolated value will have considerably more accuracy than can be obtained by continuing to reduce the integration step size, primarily because round-off errors will affect the results for very small step sizes. This is not recommended as a general procedure, since it involves the storing of all dependent variables at each of the desired times, but it is useful when good accuracy using a single precision routine is required.

5.3 Double Precision Operations
For an idealized problem in which all constants are assumed known exactly, a completely double precision program will allow the use of integration steps so small that truncation errors will be negligible, without incurring the danger of round-off error accumulation. On idealized problems involving reasonable time spans, a double precision program should be able to provide an accurate standard against which to test single precision programs. The use of double precision programs for the integration of real orbits is not recommended, however. Apart from the fact that double precision operations are very expensive in machine time, the physical constants, the initial conditions, and the planetary coordinates are seldom known to more than six significant figures. The addition of dummy digits to these constants will certainly not lead to increased accuracy. Except for some reduction in round-off error, double precision programs have very little advantage over
single precision programs in producing significantly more accurate results on real problems.

5.4 Use of Integrals of the Motion
For some problems an integral of the motion can be used as a check on the solution. Thus for the two-body problem the energy integral is constant, and for the motion of a satellite about an oblate earth the vertical component of the angular momentum is invariant. These integrals are usually stable in the presence of errors, so that errors in the position and velocity cancel when the integral is evaluated. A physical explanation of this lies in the fact that the major portion of the error is along the line of motion rather than perpendicular to it. Thus the major error is accounted for by a time discrepancy, so that the integral is actually evaluated at a point on the orbit but at a different time. Even though the integral is constant to N significant digits, the actual position and velocity may be considerably worse. The energy integral does provide a good negative test: the method is generally poor if the integral does not remain constant. In some cases the energy integral has remained constant to seven digits while the position coordinates were correct to only two digits. Of course, these integrals cannot be used when nonconservative forces are present, but even when they do apply they provide a very inconclusive test.

5.5 Comparison of Different Methods
Agreement between two distinct formulations, and if possible two distinct methods of integration, provides probably the best indication of accuracy for an arbitrary orbit. Thus if an Encke integration and a Cowell integration agree at all points to seven significant digits, one is reasonably safe in assuming that results for the theoretical orbit are correct to that many digits. This conclusion is based on the assumption that round-off is a random process and that systematic errors such as cancellation are different for different methods. The probability of such errors causing identical effects over a large number of points is almost surely very small. Of course, agreement among three distinct methods provides an even better indication of accuracy. For groups which are heavily engaged in the problems of orbit prediction, it is recommended that at least two distinct methods be available for orbit integration.

5.6 Estimates of Accumulated Truncation and Round-off Errors
Theories have been developed which make it possible to obtain estimates of the accumulated truncation error and the accumulated round-off error at a given time. If these estimates were available, then it would be theoretically possible to choose an integrating step which would minimize the sum of these errors, and at the same time an indication of the over-all accuracy would be available. One such theory is described by Kopal [26] and is applied specifically to the satellite problem by Garrett et al. [14]. Briefly the method proceeds as follows: Let the system of equations to be solved be
$$\dot{y}_i = f_i(y_1, \ldots, y_n) \qquad (i = 1, 2, \ldots, n). \tag{5.1}$$
Let the system adjoint to the variational equations for (5.1) be
$$\dot{\lambda}_i = -\sum_{j=1}^{n} a_{ji}(t)\,\lambda_j \qquad (i = 1, \ldots, n), \tag{5.2}$$
where $a_{ij}(t) = (\partial f_i/\partial y_j)$. The system (5.2) is solved n times under n sets of initial conditions. If $z_i(T)$ represents the error in $y_i$ at time T, the total truncation error will be given by

$$z_i(T) = \int_0^T \sum_{k=1}^{n} b_k(t)\,\lambda_{ik}(t)\,dt \qquad (i = 1, \ldots, n), \tag{5.3}$$
where $b_k(t)$ is the local truncation error at time t and $\lambda_{ik}(t)$ is the solution of (5.2) beginning at the time t = T with initial conditions

$$\lambda_{ik}(T) = \delta_{ik} = \begin{cases} 1, & i = k, \\ 0, & i \neq k; \end{cases}$$
the functions $b_k(t)$ are estimates of the local truncation error, which for a multistep integration method will have the form $Ch^{p+1}y^{(p+1)}(t)$. The method requires a forward integration of (5.1) to obtain and store the position and velocity at equally spaced time steps t = kh, k = 0, 1, 2, .... These are used to compute $y^{(p+1)}$, the derivatives of y of order p + 1, at the corresponding times, which are also stored. The adjoint equations (5.2) are then solved backwards in time using n sets of initial conditions corresponding to the identity matrix. With the values of $\lambda_{ik}$ so obtained and the estimates of $b_k(t)$, the integral in (5.3) is evaluated by quadratures for each of the $z_i(T)$. The result will be an estimate of the total truncation error in the position and velocity at time t = T. A similar procedure can be used to obtain the accumulated round-off error, where now the $b_k(t)$ will represent the round-off error at time t. This procedure is clearly very cumbersome, it requires extensive storage, and it is very expensive. Moreover, several approximations and assumptions must be made concerning the magnitude of the local truncation error and the distribution of the round-off errors, and the approximations to the derivatives must certainly be poor. It is not clear whether the error bounds so obtained are realistic. This method has been programmed for the Cowell method [see 14] and some results are given. A good evaluation of the usefulness of this technique as applied to the orbit problem is not now available. Because of its complexity it will probably not gain wide acceptance, nor can it be strongly recommended.

A simpler and more realistic theory for estimating accumulated truncation and round-off error has recently been proposed by Henrici. Beginning again with the system (5.1), one next derives the associated nonhomogeneous variational system

$$\dot{\lambda}_i = \sum_{j=1}^{n} a_{ij}(t)\,\lambda_j + b_i(t) \qquad (i = 1, \ldots, n), \tag{5.4}$$
where again $a_{ij}(t) = [\partial f_i(t)/\partial y_j]$, and the $b_i(t)$ are estimates of the local truncation error for the method of integration being considered. The system (5.4) is solved simultaneously with the system (5.1) under the initial conditions $\lambda_i(t_0) = 0$. The coefficients $a_{ij}(t)$ are evaluated along the solution curve obtained from (5.1). The solutions $\lambda_i(t)$ are proportional to the "discretization" or accumulated truncation error in $y_i(t)$ at time t. This process is at once much simpler than that described by Kopal, much less time consuming, and more convenient for machine computation. Moreover, it is based on a firm mathematical foundation. Indeed it is shown by Henrici [18] that if a pth order multistep integration method is being used, so that $b_i(t) = -Cy_i^{(p+1)}(t)$, and if e(t) represents the error at time t, then asymptotically as h → 0,

$$e(t) \sim h^p\,\lambda(t), \tag{5.5}$$
where X(t) is the solution of (5.4). For the n-body problem the matrix (aᵢⱼ) is not difficult to derive. Indeed at many installations the variational equations (5.4) are also required to obtain the partial derivatives of the vehicle's position and velocity with respect to initial parameters. This use of the variational system will be described in Section 6. The bᵢ(t) must in general be obtained numerically by differencing the solutions yᵢ(t) obtained from (5.1). This theory for (5.1) can also be extended to a consideration of round-off errors following the method outlined in Section 3, but this will not be pursued here. It may be worthwhile to illustrate this approach to determining the discretization error with a simple example. Written as a first order system the equations of motion for a two-dimensional elliptic orbit are:

    ẏ = Ay,    A = [  0       1    0       0
                     -μ/r³    0    0       0
                      0       0    0       1
                      0       0   -μ/r³    0 ]    (5.6)

where r² = y₁² + y₃², and y = (y₁, y₂, y₃, y₄) is a vector of position and
SATELLITE ORBIT TRAJECTORIES
velocity of the vehicle in a rectangular coordinate system. The nonhomogeneous variational system is

    λ̇ = Bλ + b,    B = [  0                 1    0                 0
                          μ(3y₁² - r²)/r⁵    0    3μy₁y₃/r⁵         0
                           0                 0    0                 1
                          3μy₁y₃/r⁵          0    μ(3y₃² - r²)/r⁵   0 ]    (5.7)
where λ is the vector of errors corresponding to y. If a fourth order multistep method of integration is being used, then the essential components of the vector b will be (y₁⁽ᵛ⁾(t), y₂⁽ᵛ⁾(t), y₃⁽ᵛ⁾(t), y₄⁽ᵛ⁾(t)). Normally b(t) will have to be obtained numerically since the solution y(t) will not be known analytically. If, however, we change the time scale in (5.6) so that μ = 1, and if we prescribe the initial conditions y(0) = (1, 0, 0, 1), then (5.6) leads to circular motion for which an explicit solution is known. The solution is given by y(t) = (cos t, -sin t, sin t, cos t) and the vector b(t) by (sin t, cos t, -cos t, sin t). The solution of (5.7) is then λ(t) = (t sin t, t cos t, -t cos t, t sin t). The accumulated truncation error e(t) at any time t is thus seen to oscillate with a period of 2π and to grow linearly with time. Its numerical value can be obtained from the formula
e(t) = ch⁴λ(t), where h is the integration step size. For the Adams-Moulton fourth order method c = 19/720. From this formula one can decide on the proper step size for a required accuracy. If for example we require seven decimal places of accuracy in the solution y(t) at the end of the first revolution, then for t = 2π the maximum error will be

    E = (19/720) h⁴ × 2π

and for this to be less than 5 × 10⁻⁸ will require a step size h of approximately 1/40, or about 240 integration steps. If a multistep method of the Gauss-Jackson type is used with local truncation error of order h⁹, and if c is assumed to be on the order of 1/1000, then approximately thirty-five integration steps would be required for the same accuracy. These estimates of course neglect round-off error effects. If an odd order integration method is used, so that locally the error is O(h²ˢ⁺²), for the circular orbit problem the propagated truncation error will behave like ch²ˢ⁺¹t² cos t as compared with ch²ˢt cos t for an even order
method. It thus appears that for sufficiently large times there is no advantage in choosing, say, a fifth order method instead of a fourth order method. To what extent this situation carries over to the n-body equations is not known, but the conjecture that it may actually hold raises interesting possibilities about the choice of integration methods.

5.7 Numerical Comparison of Special Perturbation Methods
In this section we give the results of a numerical study conducted at STL whose objective was to yield quantitative estimates of the computational efficiency of the three special perturbation methods discussed in Section 2. The idealized orbit selected for the study is the one described in Section 4. This is an earth satellite orbit in which the only perturbation force included is that due to the second harmonic in the earth's gravitational potential. The orbit is defined by the initial elements: a = 1.5 earth radii, e = 0.2, i = 45°, Ω = ω = M = 0. The period of the unperturbed orbit is 155 min, perigee distance is 800 miles, apogee distance is 3200 miles, and the energy integral E has the value

    E = -0.0018451686 (e.r./min)²

In rectangular coordinates the initial values are x₀ = 1.2 e.r., y₀ = z₀ = 0, ẋ₀ = 0, ẏ₀ = ż₀ = √(μ/2), with μ = 0.0055302633 (e.r.³/min²) and J = 0.001638. Since an analytic solution to this problem is not known, it was first necessary to obtain a standard against which to check the integrations. This standard was provided by a completely double precision Cowell program using the second-difference integration scheme (3.10). The accuracy attainable on an orbit of this type was checked by integrating the unperturbed orbit (J = 0) and comparing the results with the known analytic solution. Uniform agreement to one or two digits in the eighth significant figure was obtained. For the perturbed orbit, the results provided by the standard are correct to at least seven significant figures, an accuracy more than adequate for this study. Single precision, floating point programs for the Cowell, Herrick, and Encke methods were then run on an IBM 7090 and compared with the double precision standard. Great care was used to insure that the physical constants and the initial conditions were identical in all programs. The following table gives the method of integration used, the local truncation error criterion, the number of integration steps required, the computing
TABLE II. NUMERICAL RESULTS - SPECIAL PERTURBATION METHODS

    Formulation   Method of       Error        Number     Computing   Maximum
                  integration     criterion    of steps   time (min)  Δr (ft)
    Cowell        Gauss-Jackson   1 × 10⁻¹⁰    10,200       5.75        800
    Encke         Gauss-Jackson   7 × 10⁻¹⁰     6395        5.31       1700
    Herrick       Adams-Moulton   5 × 10⁻¹⁰    (7000)      11.45        400
time for 64 revolutions, and the maximum error in the distance Δr over the 64 revolutions. The Adams-Moulton and Gauss-Jackson methods are described in Section 3. The former is a fourth-order method, the latter a sixth-order method, and both are of predictor-corrector type with variable step size capability. For each formulation several runs were made with successively tighter error criteria and the most accurate of these selected for the comparison. To put it differently, the error criterion for each of the programs is optimum and the results obtained are the best possible with a single precision program. The Cowell method required almost twice as many integrating steps as the Encke and Herrick methods but only about 15% more computing time than the Encke method. The relatively large computing time required by the Herrick method is partially accounted for by the fact that the Adams-Moulton formulas are of lower order than the Gauss-Jackson formulas. Since the latter will allow perhaps twice as large an integrating step for the same accuracy, the adjusted computing time would be about six minutes, or only slightly more than for the Cowell method. In general, there appears to be no large differential in adjusted computing time among the three methods. Relative to accuracy the Herrick method yielded the best results and the Encke method the worst, although again there appears to be no order of magnitude preference for any one method. The favorable showing of the Cowell method is accounted for largely by the conscious effort to reduce round-off error growth by the use of double precision accumulation. As mentioned earlier, both the Encke and Herrick methods are considerably more complicated programs and require much more careful numerical analysis. In particular, inaccuracies introduced in the Encke method during the rectification process probably account for most of the errors. In the results given here, four rectifications were used.
Rectifying more frequently, or less frequently, led to poorer results. It is entirely possible that the errors in the Encke results can be reduced by an even more careful analysis of the operations involved. For this type of orbit there appears to be very little reason to prefer Encke's method to Cowell's method.
A more detailed comparison relative to accuracy is contained in Graphs I-IV, where the errors in position, mean anomaly, energy integral, and semimajor axis are plotted at selected times on the 20th, 40th, and 64th revolutions. All methods gave exceedingly good results on the first few revolutions and hence these errors are not included. Graphs I and II show that the Encke errors are growing at about four times the rate of the Herrick errors and two times the Cowell errors. A comparison of Graphs I and II also shows a strong correlation between position and mean anomaly errors. Indeed for all methods over 90% of the position error in magnitude and direction is accounted for by the mean anomaly error; i.e., the error is largely along the path of the motion. We observed earlier that this phenomenon applies to general perturbation methods as well. From these results, we may conclude that the dominant errors in position arise from accumulated truncation or approximation errors, which are systematic in nature, rather than from round-off, which is essentially random in nature. Graph III is a plot of the errors in the semimajor axis for the Cowell and Herrick methods. The Encke results are omitted because they are more erratic, but generally they are of the same order of magnitude. This plot, as well as those for e and i, which are not included here, shows that the geometrical elements a, e, i can be more accurately determined than the position. Thus, for the Herrick method, the absolute value of the maximum error in a is 3 × 10⁻⁷ e.r., while the maximum error in the position is 2 × 10⁻⁵ e.r. Graph IV is a plot of ΔE, the difference between the known energy integral and that obtained from the integration. It will be seen that for all methods the energy E remains constant to at least seven significant figures while the position coordinates are correct to only four or five significant figures.
This observation supports the statement made earlier that the constancy of a known integral of the motion is a poor positive test of accuracy. The results presented here may be assumed to be representative for earth satellite orbits in which the dominant perturbative force is that due to oblateness, and which avoid the exceptional cases of zero eccentricity, zero inclination, and critical inclination. The results should not be assumed to be typical for lunar or interplanetary trajectories where the dominant forces are different in nature. Indeed, as observed earlier, the Encke method does seem to have a clear advantage over the Cowell method relative to speed on lunar flights. While this study is by no means complete, it does give an indication of what one may expect in everyday use of machine programs based on special perturbation methods.
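The circular-orbit error analysis above can be exercised numerically. The sketch below (Python with NumPy; the Runge-Kutta integrator, the step count, and all variable names are illustrative choices of this sketch, not part of the original study) integrates the equations of motion (5.6) together with the variational system (5.7), with B the Jacobian [∂fᵢ/∂yⱼ], and compares the computed λ(t) at t = 2π with the closed form (t sin t, t cos t, -t cos t, t sin t); it also repeats the step-size arithmetic based on E = (19/720)h⁴ × 2π.

```python
import numpy as np

MU = 1.0  # time scale chosen so that mu = 1, as in the circular-orbit example

def rhs(t, s):
    """Right side of the coupled system s = (y, lam): y solves the two-body
    equations (5.6); lam solves the variational system (5.7) with forcing
    b(t) = (sin t, cos t, -cos t, sin t)."""
    y, lam = s[:4], s[4:]
    r2 = y[0] ** 2 + y[2] ** 2
    r3, r5 = r2 ** 1.5, r2 ** 2.5
    ydot = np.array([y[1], -MU * y[0] / r3, y[3], -MU * y[2] / r3])
    B = np.array([
        [0.0, 1.0, 0.0, 0.0],
        [MU * (3 * y[0] ** 2 - r2) / r5, 0.0, 3 * MU * y[0] * y[2] / r5, 0.0],
        [0.0, 0.0, 0.0, 1.0],
        [3 * MU * y[0] * y[2] / r5, 0.0, MU * (3 * y[2] ** 2 - r2) / r5, 0.0],
    ])
    b = np.array([np.sin(t), np.cos(t), -np.cos(t), np.sin(t)])
    return np.concatenate([ydot, B @ lam + b])

def rk4(s, t_end, n):
    """Classical fourth-order Runge-Kutta over n equal steps."""
    h, t = t_end / n, 0.0
    for _ in range(n):
        k1 = rhs(t, s)
        k2 = rhs(t + h / 2, s + h / 2 * k1)
        k3 = rhs(t + h / 2, s + h / 2 * k2)
        k4 = rhs(t + h, s + h * k3)
        s = s + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return s

T = 2 * np.pi  # one revolution of the circular orbit y(0) = (1, 0, 0, 1)
lam = rk4(np.array([1.0, 0, 0, 1, 0, 0, 0, 0]), T, 2000)[4:]
lam_exact = np.array([T * np.sin(T), T * np.cos(T), -T * np.cos(T), T * np.sin(T)])
err = np.max(np.abs(lam - lam_exact))

# Step size for seven-place accuracy with the Adams-Moulton constant c = 19/720:
h_max = (5e-8 * 720 / (19 * T)) ** 0.25  # roughly 1/40, i.e. a few hundred steps
```

With 2000 steps per revolution the computed λ agrees with the closed form to well below single-precision level, and h_max comes out near 1/40, consistent with the step count quoted in the text.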
[GRAPH 1. Position errors. GRAPH 2. Mean anomaly errors. Plotted at selected times on the 20th, 40th, and 64th revolutions.]
[GRAPH 3. Semimajor axis errors Δa. GRAPH 4. Energy integral errors ΔE. Abscissa: T in minutes.]
6. Orbit Determination and Tracking Methods
Once a satellite has been launched, it is important to determine accurately the actual orbit of the satellite. A preliminary orbit determination is necessary for obtaining steering information for the nearest observing station and in addition for deciding on the amount and the timing of a corrective thrust so as to achieve a certain objective. A more definitive determination would be needed to obtain improved information about certain astronomical and geophysical data. The basis for improved orbit determination rests upon observational data which is made available during the course of the flight. The observing stations may have optical, radar, or Doppler equipment; the observations may take the form of angular information, range from the observing station to the satellite, or range rate. The observations are of varying accuracy and reliability. In general optical observations are more accurate than radar observations and moreover are highly reliable. Optical accuracies on the order of 0.2″ of arc are attainable. Optical devices cannot always be used, however, under certain atmospheric conditions, and they cannot measure range. Electronic devices include pulsed radar systems, radio interferometers, and Doppler systems. Range and range rate can typically be determined to a few parts in a million. The observational data will undergo processing both at the observation site and at a central computational site. The on-site calculations may include corrections for instrumental or environmental errors, and changes in coordinate systems. It is advisable to keep the on-site reductions to a minimum, however, and to rely heavily on a central computational facility to make the necessary transformations and corrections.
The central facility should be responsible for the following functions: (1) receive and transform early data; (2) edit the data for systematic or accidental errors; (3) determine a preliminary orbit; (4) transmit quick-look data to the receiving stations; (5) as more data is received, determine an improved orbit using differential corrections; (6) determine a definitive orbit after sufficient data has been obtained. As indicated above it is customary to distinguish three phases of orbit determination: preliminary, improved, and definitive. During the preliminary phase information may be limited. For a friendly satellite one has available the nominal precalculated burnout conditions and trajectory as well as launch site observations. Abundant data is usually available and this information is sufficient to apply differential corrections immediately. In the case of a foreign satellite very incomplete data of various types and varying accuracy may be available. A variety of methods are available for preliminary orbit determination based on the type of information given,
among these being those associated with the names of Laplace, Gibbs, Gauss, and Herrick. Baker [8] gives an excellent summary of these methods, but the variations in these approaches will not be pursued here. As more data is obtained it will be possible to determine an improved orbit from which ephemerides can be computed for that portion of the satellite's life during which the satellite is being tracked. In addition to station ephemeris printouts, improved orbits are useful in determining salient characteristics of the flight, such as its lifetime and period, and also in providing trajectory information which may be needed if later corrective thrusts are to be applied. A definitive orbit is one which best fits all of the observational data. It can be determined at leisure. Its primary function is to lay the foundation for the proper interpretation of physical experiments and for more accurate determination of geophysical and astronomical constants. Definitive orbits have been used to improve our knowledge of the shape of the earth, the atmospheric density profile, and the astronomical unit. In an attempt to determine the astronomical unit at Space Technology Laboratories, some 20,000 observations based on the Pioneer V satellite were processed over a period of six months before a definitive orbit was obtained. Differential corrections are indispensable to improved and definitive orbit calculations. This section will be concerned primarily with a description of the method of differential corrections as applied to orbit determination. Theoretically an orbit is completely determined if six independent initial parameters are given. The objective of a differential correction process is to determine these initial parameters so that the best fit possible is obtained between the subsequent trajectory based on these parameters and the observations.
In addition to these injection parameters, differential corrections may also be used to determine constants such as the drag coefficient and the solar parallax, and even to refine radar station locations. In describing the process we shall be concerned primarily with the determination of the injection (burnout) parameters.

6.1 Editing the Observational Data
The goodness of the determination will depend largely upon the accuracy of the observations. It is important to discover and eliminate any data which is grossly in error, especially early in the flight, since a distorted fit may result when only a small number of observations are being used. The errors may be accidental in nature, such as transcription errors, or more systematic, such as those arising from faulty instrumentation. The editing is sometimes performed by the orbit analyst by inspection of the data or by passing a freehand curve through the points. However, this is generally
too slow during the early stages of a flight and will certainly not uncover systematic errors. A data editing routine which pre-processes the data is recommended. A brief description of such an editing routine follows. Let (tᵢ, yᵢ) (i = 1, . . . , N) represent a discrete set of observations yᵢ of a particular type at times tᵢ over one pass from a given station. The first differences of the yᵢ are examined and the points contributing the largest such differences are determined on a statistical basis. The local pattern configurations of these points are then examined and some of these points are rejected on the basis of inconsistency. A regression analysis is then used on the remaining points to obtain an average regression function m(t). Now the original data is compared with the estimates of m(t) and the residuals are obtained, as well as the variance σ². Those data points which lie outside of a specified band width from m(t) are rejected as outliers. Of course one must use care to insure that good points are not rejected. While such a routine cannot hope to detect all of the errors, it will certainly eliminate all of those which are likely to distort the fit. More subtle errors, such as bias errors, can probably be detected from statistical evidence after a fit has been made. A detailed description of such a routine, which has been successfully used at STL, is given by Aroian and Robison [1].

6.2 The Least Squares Problem
The differential correction process which is commonly used to determine the injection parameters for a satellite is usually justified by setting up a statistical model and making certain assumptions concerning the distribution of missile and radar errors [49]. We shall content ourselves with a heuristic approach with the objective of stressing the basic programming and computational techniques and problems. The trajectory of a satellite is completely determined when the burnout position and velocity are specified. Several coordinate systems are available for specifying the initial values, among the more common being: (a) (x₀, y₀, z₀, ẋ₀, ẏ₀, ż₀), inertial rectangular coordinates (see Fig. 1); (b) (a, e, i, Ω, ω, τ), instantaneous classical elliptic elements at time t₀ (see Fig. 2); (c) (r, α, δ, v, β, A), local spherical coordinates at time t₀, with r defined as the distance from the origin of the coordinate system to the satellite, α the right ascension and δ the declination of the position, v the magnitude of the velocity, β the angle between the velocity vector and the local vertical, and A the azimuthal angle of the velocity vector measured from north (see Fig. 1). While the corrections may be applied to any of these sets of initial values, there are distinct differences among them relative to computational efficiency and convergence properties. These differences will be discussed
FIG. 1. Inertial rectangular and local spherical coordinates.
later. It is convenient to pursue a general treatment here. Thus let p = (p₁, . . . , p₆) denote any set of six independent burnout (or injection) parameters. Let v = v(t, p) denote a radar observation from any station (e.g., elevation angle E, azimuth A, range R, range rate Ṙ). If the time t and any value of p are given, the equations of motion can be integrated and v(t, p) computed from the trajectory information and from analytic expressions for the type of observation. In a typical situation an approximate value p₀ of the true burnout parameter p will be known, either from the nominal prelaunch conditions or from a preliminary orbit determination. We will also have an observed value of v which in the absence of error would be the true value of v. The object of a differential correction procedure is to determine the correction vector q which must be added to the approximate value p₀ to obtain the true burnout vector p, i.e.,

    p = p₀ + q,

or in terms of components,

    pᵢ = pᵢ⁰ + qᵢ    (i = 1, . . . , 6).
The radar coordinates are then expanded in a Taylor's series about the computed value v(t, p₀). If p₀ is sufficiently close to p, terms of order higher than the first in the corrections may be neglected. Thus we have

    δ = a⁰·q

where δ is the difference between the true and computed values of v, and a⁰ is a vector of partial derivatives of v with respect to the initial elements evaluated at p = p₀. Since there are six unknown corrections (q₁, . . . , q₆) to be determined, we must have at least six observations. In general there will be many observations available, say N ≥ 6. We thus obtain the so-called equations of condition

    δᵢ = aᵢ⁰·q    (i = 1, . . . , N).    (6.4)
In general because of errors in the observations we will not know the δᵢ exactly. Instead we will have an approximate value δ′ᵢ,

    δ′ᵢ = δᵢ + εᵢ,

where εᵢ is a random error. The usual procedure is to determine the correction vector q so as to minimize the weighted sum of squares of these errors, i.e., to minimize

    F(q) = Σᵢ₌₁ᴺ wᵢεᵢ² = Σᵢ₌₁ᴺ wᵢ(δ′ᵢ - aᵢ⁰·q)².    (6.5)
The solution of this least squares problem is obtained in the usual way by solving the system of equations

    ∂F/∂qⱼ = 0    (j = 1, . . . , 6)

or, written more compactly,

    AᵀWAq = AᵀWδ′    (6.7)

where A is an N × 6 matrix of partial derivatives, W is a diagonal matrix whose elements are the weights wᵢ, AᵀWA is a symmetric 6 × 6 matrix, and δ′ is a column vector of residuals. These are the so-called normal
equations from which corrections q and a new estimate of the burnout vector p′ are obtained. Using p′ the trajectory is recomputed and new residuals v(t, p) - v(t, p′) are obtained. Because of the linearity assumption, the residuals may not have been sufficiently reduced, and it may be necessary to recompute new partial derivatives evaluated at p′ and to iterate the least squares solution until the residuals are sufficiently reduced or until no further change in the residuals is obtained. The best choice for the weights wᵢ would be 1/σᵢ², where σᵢ is the standard deviation of the errors in the observations. However, the σᵢ are not known a priori. Normally one takes an estimate for wᵢ and keeps it fixed while carrying out the iterated least squares solution. It has been suggested that, since a posteriori estimates of the standard deviations σᵢ are available after the first iteration, the wᵢ be modified accordingly and that this new set of weights be used in the next iteration. This procedure changes the fundamental nature of the least squares problem, and it is not clear that it will converge, or if it does converge, to what it will converge. Indeed it has been shown [29] that for some cases this procedure, with the weights recomputed from the residuals of iteration k, will certainly diverge. In general, even if the assumed fixed weights are only approximate but reasonable, and if p₀ is close to p, the iterated least squares method will converge to a solution which is independent of the weights. In some cases, however, the choice of wᵢ is critical. We shall indicate later another more sophisticated method for choosing the weights. The differential correction procedure will sometimes fail to converge or will sometimes lead to unreasonably large corrections. Mathematically this will occur when the normal matrix in (6.7) is ill-conditioned, or nearly singular, an effect which results when the determinant is small in a relative sense or when the ratio of the dominant eigenvalue of AᵀWA to its smallest eigenvalue is large. In cases like this small changes in the coefficients may lead to large changes in the solution of the system. Thus there may not be a unique solution within the accuracy desired. Failure of the iterated least squares procedure to converge is generally an indication that the initial estimate p₀ was not close enough to the true burnout vector to make valid the assumption of linearity on which the differential correction process is based. Among the reasons for nonconvergence or nonuniqueness of the solution are the following: (a) The initial estimate p₀ may be very poor. (b) The observations may be too few or concentrated on a portion of arc which leads to indeterminacy, as for example at perigee. (c) The observations may be too restricted in type, as for example angular data only from a single station. (d) The quality of the observations may be poor.
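The iterated weighted least squares correction just described can be sketched in a few lines. Everything below is an illustrative stand-in: the two-parameter observation model v(t, p), the noise level, and the starting estimate are made up so that the example is self-contained (a real orbit program would obtain v(t, p) and the partial derivatives from an integrated trajectory); the normal equations AᵀWAq = AᵀWδ′ and the fixed weights wᵢ = 1/σᵢ² follow the text.

```python
import numpy as np

rng = np.random.default_rng(1)

def v(t, p):
    """Computed observation v(t, p). A real program would integrate the
    equations of motion; this sinusoid is a made-up stand-in."""
    return p[0] * np.sin(t + p[1])

def partials(t, p):
    """Rows of the N x 2 matrix A: the partial derivatives dv/dp."""
    return np.column_stack([np.sin(t + p[1]), p[0] * np.cos(t + p[1])])

t = np.linspace(0.0, 4.0, 40)
p_true = np.array([2.0, 0.5])
sigma = 0.01
obs = v(t, p_true) + rng.normal(0.0, sigma, t.size)   # observed values

p = np.array([1.5, 0.2])                              # initial estimate p0
W = np.diag(np.full(t.size, 1.0 / sigma ** 2))        # weights w_i = 1/sigma_i^2
for _ in range(10):
    A = partials(t, p)
    delta = obs - v(t, p)                             # residuals delta'
    # normal equations  A^T W A q = A^T W delta'
    q = np.linalg.solve(A.T @ W @ A, A.T @ W @ delta)
    p = p + q                                         # corrected estimate
    if np.max(np.abs(q)) < 1e-12:
        break

cov = np.linalg.inv(A.T @ W @ A)  # variance-covariance matrix of the corrections
```

The diagonal of (AᵀWA)⁻¹ gives the variances of the estimated corrections, the interpretation used later in Section 6.3.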
In some cases when the data is not sufficiently separated to determine the burnout vector completely, it may still be possible to determine some of the initial parameters. The data may for example allow a good determination of the magnitude of the velocity v and the flight path angle β but not of the other elements. In such a case, one would carry out the differential correction process for only two corrections. Several techniques to reduce ill-conditioning of the normal system and to improve convergence of the iterated problem have been suggested. One simple and effective technique is to solve, instead of the normal system (6.7), the system

    (AᵀWA + D)q = AᵀWδ′    (6.8)

where D is a diagonal matrix whose elements are chosen so as to restrict the size of the corrections which can be made on any one iteration. By proper choice of the elements dⱼ of D convergence can be forced. The difficulty with this technique is that it is not clear how the elements of D must be chosen so as to guarantee convergence and at the same time make the convergence reasonably rapid. A technique which provides a systematic method for preserving linearity has been proposed by Morrison [28]. This method, which makes use of the eigenvalues and eigenvectors of the normal matrix, will be described here briefly. The objective of a differential correction process is to minimize a sum of squares of residuals F(q), which we assume in the form

    F(q) = ‖Aq - b‖²    (6.9)

where the norm of a vector w of N components is defined by ‖w‖² = Σᵢ₌₁ᴺ wᵢ². This leads to the normal system (6.7). Instead of solving this problem, we may ask for a vector q which will minimize F(q) under the constraint that the corrections qⱼ do not exceed certain bounds. The solution of this problem as developed in [28] is as follows: Let B = AᵀA, c = Aᵀb, and B₁ = D⁻¹BD⁻¹, where D is a diagonal matrix whose elements will be given later. The normal equations may be written

    Bq = c

or

    B₁y = D⁻¹c,  where y = Dq.    (6.10)
Since B₁ is symmetric and positive definite it has a set of n (= 6) positive eigenvalues λᵢ and corresponding eigenvectors uᵢ, i.e.,

    B₁uᵢ = λᵢuᵢ    (i = 1, . . . , n),    (6.11)

or B₁U = UΛ, where U is the matrix of eigenvectors (u₁, . . . , uₙ) and Λ is the diagonal matrix of eigenvalues. Since UᵀU = I, (6.11) becomes
B₁ = UΛUᵀ, and the normal system (6.10) becomes

    (UΛUᵀ)y = D⁻¹c

or

    Λz = UᵀD⁻¹c,  where z = Uᵀy.

It can then be shown that if we choose z to satisfy these equations, with its components limited so that |zⱼ| ≤ 1, the corrections remain within the prescribed bounds and the sum of squares of the residuals is actually reduced. Under the transformations given above, since |zⱼ| ≤ 1,

    ‖z‖² = ‖Dq‖² ≤ n.
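The eigenvalue-eigenvector solution can be sketched as follows. The matrix A, the residual vector, and the bounds M below are made-up data; np.linalg.eigh stands in for the Jacobi routine mentioned in the text, and clipping z to |zⱼ| ≤ 1 is one way (an assumption of this sketch) of imposing the constraint. With dⱼ = √n/Mⱼ, the clipped solution then satisfies |qⱼ| ≤ Mⱼ.

```python
import numpy as np

def constrained_correction(A, delta, M):
    """Scaled-eigenvalue solution of the normal equations with each
    correction bounded, |q_j| <= M_j."""
    n = M.size
    B = A.T @ A                        # normal matrix B = A^T A
    c = A.T @ delta                    # c = A^T delta
    d = np.sqrt(n) / M                 # d_j = sqrt(n)/M_j
    Dinv = np.diag(1.0 / d)
    B1 = Dinv @ B @ Dinv               # scaled matrix D^-1 B D^-1
    lam, U = np.linalg.eigh(B1)        # eigenvalues and eigenvectors of B1
    z = (U.T @ (Dinv @ c)) / lam       # z = Lambda^-1 U^T (D^-1 c)
    z = np.clip(z, -1.0, 1.0)          # enforce |z_j| <= 1
    q = Dinv @ (U @ z)                 # y = U z,  q = D^-1 y
    # diagonal of B^-1 via B^-1 = D^-1 (U Lambda^-1 U^T) D^-1
    diagBinv = np.diag(Dinv @ U @ np.diag(1.0 / lam) @ U.T @ Dinv)
    return q, diagBinv

rng = np.random.default_rng(0)
A = rng.normal(size=(12, 3))           # made-up 12 x 3 equations of condition
delta = rng.normal(size=12)
M = np.array([0.5, 0.5, 0.5])          # bounds within which linearity holds
q, diagBinv = constrained_correction(A, delta, M)
```

Because ‖z‖₂ ≤ √n implies ‖Dq‖₂ ≤ √n, each component obeys dⱼ|qⱼ| ≤ √n, i.e. |qⱼ| ≤ Mⱼ.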
In actual practice we proceed as follows: (1) First choose Mⱼ such that for |qⱼ| < Mⱼ linearity will be preserved. (2) Set dⱼ = √n/Mⱼ. (3) Compute the normal matrix B and the vector c. (4) Find the scaled matrix D⁻¹BD⁻¹ and the vector D⁻¹c. (5) Find the eigenvalues λᵢ and the eigenvector matrix U of B₁. This can be done most efficiently with the Jacobi method, a routine for which is available in the SHARE library for IBM machines. (6) Find in succession the vectors z = Λ⁻¹Uᵀ(D⁻¹c), y = Uz, and q = D⁻¹y. (7) Also compute the diagonal of B⁻¹ using B⁻¹ = D⁻¹(UΛ⁻¹Uᵀ)D⁻¹. The theory indicates that the iterated least squares problem will always converge if the bounds Mⱼ are chosen small enough. The theory does not indicate how these bounds are to be chosen. They are related to the bounds on the second derivatives of the radar coordinates, but these are not usually available. It is not difficult to obtain reasonable values for the Mⱼ based on experimentation. Using for the initial parameters (r₀, α₀, δ₀, v₀, β₀, A₀), a set of reasonable bounds is M₁ = 200,000 ft, M₂ = M₃ = 3°, M₄ = 1000 fps, M₅ = M₆ = 3°, where all angles are expressed in degrees, distances in feet, and velocities in feet per second. This eigenvalue-eigenvector procedure requires perhaps 20 or 30 times as much computing time as the straightforward Gaussian solution of the
system (6.7), perhaps 1 sec on an IBM 709 computer. However, this is still a tiny fraction of the time required to compute the trajectory and the matrix A of partial derivatives, and the increased information obtained, together with the guarantee of convergence, makes the procedure well worth the additional time involved. A routine for the eigenvalue-eigenvector solution of the constrained minimization problem has been developed at Space Technology Laboratories and is available to interested parties [30]. Using this procedure detailed information can be obtained about the nature of the difficulties encountered. In particular correlation effects can be easily discovered. By examining the eigenvectors and eigenvalues, one can say which elements are well determined and which poorly determined, and one can obtain estimates of the standard deviations of the corrections from the standard deviations of the radar errors.

6.3 Stagewise Differential Corrections and Shifting Parameters
After all of the available observations have been used to determine the best set of trajectory parameters, new observations may become available and it is reasonable to expect that this new information will lead to further improvement in the trajectory parameters. One could read in all of the data, both new and old, and perform a least squares fit. However, since this is wasteful of machine time, it is desirable to read in only the new data to obtain modified parameters. This problem has been discussed by Swerling [43! and others. In treating this problem, it is useful to consider a statistical model for the tracking of space vehicles. In the approach described in Section 6.2 the errors in the observations are assumed to be independent and normally distributed with mean zero. Under these assumptions the matrix W is the inverse of the covariance matrix of the observational errors and the inverse of the normal matrix (ATWA)-l is the variance-covariance matrix of the estimates of the corrections due to these errors in the observations. The diagonal elements of (ATWA)-l are the variances-i.e., the square of the standard deviations-of the components of the correction vector and the off diagonal elements are the covariances. Now let p' be the best set of initial parameters obtained from all of the old observations and let a,' be its covariance matrix. Let 6~ represent new residuals based on the new observations, let q, = p - p' be the desired corrections to the latest parameter estimate p' and let AN be the matrix of partial derivatives of the new observations evaluated for p = p'. One can then solve the system of equations (6.12)
for the corrections q_N. This process can be repeated as often as desired. For
SATELLITE ORBIT TRAJECTORIES
handling a large number of observations it is convenient to batch them into groups of, say, 50 to 100 observations and apply (6.12) to these groups successively. After tracking an earth satellite over many revolutions, or over a fairly long arc of an interplanetary flight, it may be desirable to choose a set of parameters at some new epoch t_1 at which to make the corrections. One reason for this is that the normal matrix becomes increasingly more ill-conditioned if corrections are made at time t_0 when data is used over too many revolutions. Another reason is that the integration of the equations of motion from launch time is time consuming. At the same time one wishes to preserve some of the information gained from the earlier least squares fits, although the new observations should be given more weight than the old. At any stage of tracking we have an estimate p^0(t_0) = (p_1^0, . . . , p_6^0) of the true burnout vector p(t_0) and the associated covariance matrix σ(t_0) for parameters at time t_0. From integrating the trajectory based on p^0(t_0) we also have an estimate p^0(t_1) = (p_1^0(t_1), . . . , p_6^0(t_1)) of the parameters at time t_1. If we wish to shift to a new time t_1 at which to make succeeding corrections, we must obtain the covariance matrix associated with the normal matrix at time t_1. This may be done as follows. To first order terms we have

    p(t_0) − p^0(t_0) = S[p(t_1) − p^0(t_1)]    (6.13)

where S is a matrix of partial derivatives whose elements are
    S_{ij} = ∂p_i(t_0)/∂p_j(t_1).

The matrix S is not immediately available, but its elements may be computed from

    ∂p_i(t_0)/∂p_j(t_1) = Σ_k [∂p_i(t_0)/∂x_k(t_1)][∂x_k(t_1)/∂p_j(t_1)]    (6.14)

where x = (x_1, x_2, x_3, x_4, x_5, x_6) is the vector of missile position and velocity. In symbolic matrix form (6.14) is

    S = Q^{-1} P    (6.15)

where Q = ∂x(t_1)/∂p(t_0) and P = ∂x(t_1)/∂p(t_1).
The matrices are evaluated for p^0(t_0). The elements of Q are ordinarily available since they are needed for the differential correction procedure, and the elements of P can be obtained directly since x(t_1) is known explicitly as a function of the parameters p. Then the covariance matrix at time t_1 is obtained from

    σ^{-1}(t_1) = S^T(t_0) σ^{-1}(t_0) S(t_0).    (6.16)
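A numerical sketch of the epoch shift: assuming, as in the reconstruction of (6.15) above, that S = Q^{-1}P with Q = ∂x(t_1)/∂p(t_0) and P = ∂x(t_1)/∂p(t_1), the information (inverse covariance) matrix transforms as in (6.16). The helper below is illustrative only; its name and argument conventions are not from the original program.

```python
import numpy as np

def shift_epoch_information(info_t0, Q, P):
    """Shift the information matrix sigma^{-1} from epoch t0 to epoch t1.

    info_t0 : sigma^{-1}(t0), information on the parameters at t0
    Q       : d x(t1) / d p(t0), from the variational equations
    P       : d x(t1) / d p(t1), available in closed form
    S = Q^{-1} P maps parameter changes at t1 into changes at t0,
    and the information transforms as S^T sigma^{-1}(t0) S.
    """
    S = np.linalg.solve(Q, P)       # S = Q^{-1} P, per (6.15)
    return S.T @ info_t0 @ S        # per (6.16)
```

A useful sanity check on the convention: when P = Q the map S is the identity and the information matrix is unchanged.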
SAMUEL D. CONTE
The matrix σ^{-1}(t_1) is then added to the normal matrix obtained from observations made after time t_1, and succeeding corrections are made to the parameters at time t_1.
6.4 The Partial Derivatives
The most time-consuming portion of any differential correction procedure is that required for obtaining the partial derivatives of the radar coordinates. The method by which these derivatives are obtained is therefore of considerable importance even though extreme accuracy is not necessary. Three methods for obtaining these derivatives will be described.
6.4.1 The Variant Method
Letting v(t, p) = v(t, p_1, . . . , p_6) denote as before any observation and p^0 the nominal value of the initial burnout parameter vector, we must compute (∂v/∂p_j)_{p=p^0}. To obtain an approximation to ∂v/∂p_1, for example, we may form the difference quotient

    ∂v/∂p_1 |_{p=p^0} ≈ [v(t, p_1^0 + Δp_1^0, p_2^0, . . . , p_6^0) − v(t, p_1^0, . . . , p_6^0)] / Δp_1^0.
This requires running two trajectories, with the parameter p_1^0 perturbed by an amount Δp_1^0 in one of them. To obtain all six of the partial derivatives requires seven trajectory computations in all, usually performed simultaneously. The amount of the increment Δp_j^0 must be chosen carefully to obtain maximum accuracy. If it is chosen too small, accuracy will suffer from loss of significance through subtraction of nearly equal quantities; if chosen too large, accuracy will suffer because the difference quotient will be a poor approximation to the derivative. The best value of the increments to be used will depend upon the particular parameter and upon the coordinate system being used, and should be determined by experimentation. The variant method suffers from some obvious disadvantages. The computation of seven trajectories is very slow and leads to awkward computational procedures, including the necessity of storing large amounts of information. This is particularly true when real time operations are involved. Moreover, the accuracy attainable is at best of low order because of the use of the first order difference approximation and also because of accumulated round-off and truncation errors coupled with loss of significance through cancellation. On the other hand, the variant method is simple in concept, it requires very little additional programming, and it can be applied readily to the determination of any constants such as the solar parallax, the drag coefficient, etc.
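A sketch of the variant method follows. In practice each evaluation of the observation requires integrating a full trajectory; here any callable stands in for that computation, which lets the example be checked against a closed-form case. The names are illustrative only.

```python
import numpy as np

def variant_partials(observe, p0, deltas):
    """Approximate d(observation)/d(parameter) by one-sided differences.

    observe : maps a parameter vector to the scalar observation v(t, p);
              in a tracking program this runs a full trajectory.
    p0      : nominal burnout parameter vector.
    deltas  : increment for each parameter; as the text notes, these
              must be tuned per parameter and per coordinate system.
    Requires len(p0) + 1 evaluations: the nominal plus one per parameter.
    """
    v_nom = observe(p0)
    partials = np.empty(len(p0))
    for j, dp in enumerate(deltas):
        p = p0.copy()
        p[j] += dp
        partials[j] = (observe(p) - v_nom) / dp   # first-order quotient
    return partials
```

With too small a delta the subtraction loses significance; with too large a delta the quotient is a poor approximation to the derivative, exactly the trade-off described above.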
6.4.2 The Variational Method
The derivatives of the radar coordinates depend upon the variations of the solution as a function of the initial conditions. If (x, y, z, ẋ, ẏ, ż) gives the position and velocity of the vehicle in an inertial rectangular coordinate system and p^0 = (p_1^0, . . . , p_6^0) a set of independent initial parameters, we wish to obtain the following matrix of partial derivatives at any time t:
    ⎡∂x/∂p_1^0 · · · ∂x/∂p_6^0⎤
    ⎢∂y/∂p_1^0 · · · ∂y/∂p_6^0⎥
    ⎢∂z/∂p_1^0 · · · ∂z/∂p_6^0⎥
    ⎢∂ẋ/∂p_1^0 · · · ∂ẋ/∂p_6^0⎥    (6.18)
    ⎢∂ẏ/∂p_1^0 · · · ∂ẏ/∂p_6^0⎥
    ⎣∂ż/∂p_1^0 · · · ∂ż/∂p_6^0⎦
One way of forming this matrix is to use the variant method described in Section 6.4.1. However, a more direct approach is to solve a related system of variational equations. If the equations of motion are assumed to have the form

    ẍ = f(x, y, z, t)
    ÿ = g(x, y, z, t)
    z̈ = h(x, y, z, t)

then the variational equations corresponding to one component of the burnout vector, say p_k, are

    ξ̈ = f_x ξ + f_y η + f_z ζ
    η̈ = g_x ξ + g_y η + g_z ζ    (6.19)
    ζ̈ = h_x ξ + h_y η + h_z ζ

where ξ = ∂x/∂p_k, η = ∂y/∂p_k, ζ = ∂z/∂p_k, and the coefficients in (6.19) are to be evaluated upon the unperturbed or nominal solution. The initial conditions to be used in solving (6.19) are the derivatives of the initial position and velocity with respect to p_k. If in particular p is the set (x_0, y_0, z_0, ẋ_0, ẏ_0, ż_0) and if p_k is one component of p, say x_0, then ξ_0 = 1, η_0 = ζ_0 = ξ̇_0 = η̇_0 = ζ̇_0 = 0. For each fixed value of p_k we solve the variational equations (6.19) under the appropriate initial conditions, and we obtain one column of the matrix (6.18). To obtain the full matrix of partial derivatives (6.18) the system (6.19) must be solved
for each variation, with initial conditions given by the 6 × 6 unit matrix when the parameters p_k are the initial rectangular coordinates. In all a set of 42 first-order equations must be solved. Fortunately the coefficients in (6.19) are all evaluated for the same nominal values, and since the six sets of variational equations are solved simultaneously, the over-all computing time required is about two or three times that required for the basic equations of motion. This method is therefore at least twice as fast as the variant method and is moreover considerably more accurate, the accuracy being comparable to that of the basic integration itself. In addition it is computationally more efficient since all the information required is available simultaneously. It should be noted that the solutions of the variational equations are also useful for determining miss coefficients, for setting the requirements for guidance accuracy, and for determining changes in initial conditions which result in impact or in meeting certain terminal conditions. To illustrate the form of the variational equations we derive them for the case of a vehicle moving under the influence of an oblate earth and n spherical bodies. The equations of motion, as specialized from (2.1) and including only the second harmonic in the Earth's gravitational potential, are
with similar equations for ÿ and z̈, and where μ, m_i, m, r_i, ρ_i, and J' are defined in Section 2. Letting ξ = ∂x/∂p_k, η = ∂y/∂p_k, ζ = ∂z/∂p_k, the variational equations are

    ⎛ξ̈⎞   ⎡P_xx − Q + J'(x²S − U)   P_xy + xyJ'S             P_xz + xzJ'T           ⎤ ⎛ξ⎞
    ⎜η̈⎟ = ⎢P_xy + xyJ'S             P_yy − Q + J'(y²S − V)   P_yz + yzJ'T           ⎥ ⎜η⎟    (6.21)
    ⎝ζ̈⎠   ⎣P_xz + xzJ'T             P_yz + yzJ'T             P_zz − Q + J'(z²T − 3U)⎦ ⎝ζ⎠

where P, Q, S, T, U, and V are functions of position arising from the point-mass and oblateness terms of the potential, and all other elements of the matrix in (6.21) are obtained from symmetry. This matrix is of course somewhat more complicated if perturbations due to drag and higher order oblateness terms are included. When the burnout parameters are other than the rectangular set (x_0, y_0, z_0, ẋ_0, ẏ_0, ż_0), the initial conditions for the variational equations will be different. Since our personal preference, for reasons to be given later, is the local spherical coordinates (r_0, α_0, δ_0, v_0, A_0, β_0), it may be worthwhile to exhibit the initial conditions for each of the variations in this coordinate system. The equations relating the rectangular and spherical coordinates are (Fig. 1):

    x_0 = r_0 cos δ_0 cos α_0
    y_0 = r_0 cos δ_0 sin α_0
    z_0 = r_0 sin δ_0
    ẋ_0 = v_0[(cos β_0 cos δ_0 − cos A_0 sin β_0 sin δ_0) cos α_0 − sin A_0 sin β_0 sin α_0]
    ẏ_0 = v_0[(cos β_0 cos δ_0 − cos A_0 sin β_0 sin δ_0) sin α_0 + sin A_0 sin β_0 cos α_0]
    ż_0 = v_0[cos A_0 sin β_0 cos δ_0 + cos β_0 sin δ_0]
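The relations between the rectangular and local spherical burnout coordinates are easy to mechanize, and a numerical difference then checks the initial conditions for the variations; for the α_0 variation, for instance, differentiating the relations gives (ξ_0, η_0, ζ_0, ξ̇_0, η̇_0, ζ̇_0) = (−y_0, x_0, 0, −ẏ_0, ẋ_0, 0). The sketch below is illustrative; the function name is hypothetical.

```python
import numpy as np

def burnout_to_rectangular(r0, a0, d0, v0, A0, b0):
    """Rectangular position and velocity from the local spherical burnout
    parameters (r0, alpha0, delta0, v0, A0, beta0)."""
    x0 = r0 * np.cos(d0) * np.cos(a0)
    y0 = r0 * np.cos(d0) * np.sin(a0)
    z0 = r0 * np.sin(d0)
    # common factor in the velocity components
    C = np.cos(b0) * np.cos(d0) - np.cos(A0) * np.sin(b0) * np.sin(d0)
    xd0 = v0 * (C * np.cos(a0) - np.sin(A0) * np.sin(b0) * np.sin(a0))
    yd0 = v0 * (C * np.sin(a0) + np.sin(A0) * np.sin(b0) * np.cos(a0))
    zd0 = v0 * (np.cos(A0) * np.sin(b0) * np.cos(d0) + np.cos(b0) * np.sin(d0))
    return np.array([x0, y0, z0, xd0, yd0, zd0])
```

Differencing this conversion in each spherical parameter reproduces, column by column, the initial-condition sets needed by the variational equations.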
From these we are to derive the initial values

    ξ_0 = ∂x_0/∂p_k,  η_0 = ∂y_0/∂p_k,  ζ_0 = ∂z_0/∂p_k,
    ξ̇_0 = ∂ẋ_0/∂p_k,  η̇_0 = ∂ẏ_0/∂p_k,  ζ̇_0 = ∂ż_0/∂p_k,

where p_k is any one of the set (r_0, α_0, δ_0, v_0, A_0, β_0). For the r_0 perturbations we have

    ξ_0 = cos δ_0 cos α_0;  η_0 = cos δ_0 sin α_0;  ζ_0 = sin δ_0;
    ξ̇_0 = η̇_0 = ζ̇_0 = 0.    (6.22a)

For the α_0 perturbations we have

    ξ_0 = −y_0;  η_0 = x_0;  ζ_0 = 0;
    ξ̇_0 = −ẏ_0;  η̇_0 = ẋ_0;  ζ̇_0 = 0.    (6.22b)

For the δ_0 perturbations:

    ξ_0 = −z_0 cos α_0;  η_0 = −z_0 sin α_0;  ζ_0 = r_0 cos δ_0;
    ξ̇_0 = −v_0[(cos β_0 sin δ_0 + cos A_0 sin β_0 cos δ_0) cos α_0];
    η̇_0 = −v_0[(cos β_0 sin δ_0 + cos A_0 sin β_0 cos δ_0) sin α_0];    (6.22c)
    ζ̇_0 = v_0[cos β_0 cos δ_0 − cos A_0 sin β_0 sin δ_0].

For the v_0 perturbations:

    ξ_0 = 0;  η_0 = 0;  ζ_0 = 0;
    ξ̇_0 = ẋ_0/v_0;  η̇_0 = ẏ_0/v_0;  ζ̇_0 = ż_0/v_0.    (6.22d)

For the A_0 perturbations:

    ξ_0 = 0;  η_0 = 0;  ζ_0 = 0;
    ξ̇_0 = v_0[sin A_0 sin δ_0 cos α_0 − cos A_0 sin α_0] sin β_0;
    η̇_0 = v_0[sin A_0 sin δ_0 sin α_0 + cos A_0 cos α_0] sin β_0;    (6.22e)
    ζ̇_0 = −v_0[sin A_0 cos δ_0 sin β_0].

For the β_0 perturbations:

    ξ_0 = 0;  η_0 = 0;  ζ_0 = 0;
    ξ̇_0 = −v_0[(sin β_0 cos δ_0 + cos A_0 cos β_0 sin δ_0) cos α_0 + sin A_0 cos β_0 sin α_0];
    η̇_0 = −v_0[(sin β_0 cos δ_0 + cos A_0 cos β_0 sin δ_0) sin α_0 − sin A_0 cos β_0 cos α_0];    (6.22f)
    ζ̇_0 = v_0[cos A_0 cos β_0 cos δ_0 − sin β_0 sin δ_0].
In actual usage the radar observations are referenced to a topocentric rectangular coordinate system (f, g, h, ḟ, ġ, ḣ) located at the station site. In terms of these, the usual radar coordinates (R = range, A = azimuth, E = elevation, Ṙ = range rate) may be expressed as
    R = (f² + g² + h²)^{1/2} = R(x, y, z)
    A = cos^{-1}[g/(f² + g²)^{1/2}] = A(x, y, z)
    E = sin^{-1}(h/R) = E(x, y, z)    (6.23)
    Ṙ = (fḟ + gġ + hḣ)/R = Ṙ(x, y, z, ẋ, ẏ, ż).
The second equality indicates the functional relationship obtained when the topocentric coordinates are expressed in terms of the inertial rectangular coordinates (x, y, z, ẋ, ẏ, ż). Explicit formulas for these are given by Hanson and Routh [16]. If the partial derivative of, say, the range R relative to any one of the initial parameters p = (r_0, α_0, δ_0, v_0, A_0, β_0) is desired, one forms

    ∂R/∂p = (∂R/∂x)(∂x/∂p) + (∂R/∂y)(∂y/∂p) + (∂R/∂z)(∂z/∂p)

where (∂R/∂x, ∂R/∂y, ∂R/∂z) are obtained by differentiating Eqs. (6.23) and (∂x/∂p, ∂y/∂p, ∂z/∂p) are obtained from the solution of the variational equations corresponding to each component of the burnout vector p. For range rate, which depends on velocity as well as position, we must use

    ∂Ṙ/∂p = (∂Ṙ/∂x)(∂x/∂p) + (∂Ṙ/∂y)(∂y/∂p) + (∂Ṙ/∂z)(∂z/∂p) + (∂Ṙ/∂ẋ)(∂ẋ/∂p) + (∂Ṙ/∂ẏ)(∂ẏ/∂p) + (∂Ṙ/∂ż)(∂ż/∂p).
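The chain rule above combines the analytic gradient of the radar coordinate with the variational-equation output. The sketch below illustrates this for range and range rate, under the simplifying (and unrealistic) assumption that the station sits at the coordinate origin so that (f, g, h) coincide with (x, y, z); the names are illustrative.

```python
import numpy as np

def range_and_rate_partials(state, dstate_dp):
    """Partials of range R and range rate Rdot w.r.t. the burnout
    parameters, by the chain rule.

    state     : (x, y, z, xdot, ydot, zdot) on the nominal trajectory
    dstate_dp : 6 x 6 variational matrix d(state)/d(parameters)
    """
    pos, vel = state[:3], state[3:]
    R = np.linalg.norm(pos)
    Rd = pos @ vel / R
    # gradient of R: position only; gradient of Rdot: position and velocity
    dR_dstate = np.concatenate([pos / R, np.zeros(3)])
    dRd_dstate = np.concatenate([vel / R - Rd * pos / R**2, pos / R])
    return dstate_dp.T @ dR_dstate, dstate_dp.T @ dRd_dstate
```

With dstate_dp the identity, the routine simply returns the gradients of R and Ṙ with respect to the inertial coordinates themselves.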
The variational method can be used effectively with any numerical integration scheme. It gives very accurate values of the partial derivatives and leads to a computationally efficient method. A complete derivation of these variational equations for the N-body problem with oblateness and drag terms included is given in Smith [39].
6.4.3 The Analytic Method
Let a represent the vector of instantaneous elliptic elements (a, e, i, Ω, ω, τ) and let a^0 = (a_0, e_0, i_0, Ω_0, ω_0, τ_0) be their values at time t = t_0. Corrections are then desired to a^0. The equations of condition are of the form

    Δv = Σ_{i=1}^{6} (∂v/∂a_i^0) Δa_i^0
where v = v(t, a) represents any observation at time t and Δv is the difference between the observed and the computed observation. The observations may also be considered functions of the rectangular coordinates p = (x, y, z, ẋ, ẏ, ż). The partial derivatives can be obtained by the chain rule. Symbolically, letting a, a^0, p represent any component of the vectors a, a^0, p, we have

    (∂v/∂a^0) = (∂v/∂p)(∂p/∂a)(∂a/∂a^0)    (6.24)
where the parentheses represent 6 × 6 matrices. In practice the matrix (∂a/∂a^0), which gives the change in the elements at time t relative to the initial elements, is assumed to be the identity matrix; i.e., the elements are assumed to change so slowly that the effect on the partial derivatives is negligible. This assumption is reasonably good over a small number of revolutions of an earth satellite but will certainly be a poor approximation over many revolutions. However, the assumption is necessary if we wish to obtain completely analytic formulas for the partial derivatives. Under this assumption (6.24) yields, for example,

    ∂v/∂a_0 = Σ_{i=1}^{6} (∂v/∂p_i)(∂p_i/∂a)
with similar expressions for the other partial derivatives. Since the radar coordinate v is known explicitly as a function of the rectangular coordinates p, and since the rectangular coordinates are known explicitly as functions of the elliptic elements, the partial derivatives can be computed analytically and evaluated at any required time. As might be expected, the formulas
are rather complicated. The formulas for the derivatives of rectangular coordinates relative to the elliptic elements were first derived by Eckert and Brouwer [11] and more recently by Smith [39]. Similar formulas which take into account drag and oblateness are derived by Garrett and others [14]. The use of these analytic formulas makes unnecessary the repeated solution of the equations of motion or of the variational equations, and leads to an unquestionably faster procedure for carrying out a differential correction process. Since it is necessary to neglect the effect of perturbations acting on the vehicle, and since secular effects are not taken into account, there is some danger that the formulas used may not be accurate enough over extended time intervals. The following conclusions may be reached concerning the three methods which have been discussed for obtaining the matrix of partial derivatives: (1) The variant method is computationally the least efficient. However, its simplicity and universality of application make it a useful tool in special cases. It is not recommended for the main problem of tracking, that of determining the burnout parameters. (2) The variational method gives the most accurate results and is very efficient computationally. However, it is complicated and must be changed with the force model. (3) The analytic method is the fastest computationally. It is particularly suitable for tracking near earth satellites, although its accuracy is questionable over long time intervals.
6.5 The Choice of Burnout Parameter Coordinates
As noted above, any number of coordinate systems are available for the burnout parameter vector. Moreover, it is not necessary that the integration be carried out in the same coordinate system. Thus one may integrate in rectangular coordinates but make corrections based on a spherical coordinate representation of the burnout parameters. The three most commonly used burnout vectors are rectangular, elliptic, and spherical. Apart from considerations of convenience, there are apparently fundamental reasons for preferring one coordinate system to another. It has been observed that the coordinate system used affects the rate of convergence of the differential correction process and the accuracy of the determination, at least for certain orbits. The mathematical reasons behind this observation are not clear, but they are apparently related to the conditioning of the normal matrix. This matrix appears to be more nearly orthogonal, for instance, when local spherical coordinates are used than when elliptic elements are used, so that the corrections are more nearly independent in the former case. The use of elliptic elements for differential corrections will sometimes
lead to difficulties for low eccentricity orbits or for critical inclination orbits. In particular, an attempt to determine the eccentricity e on nearly circular orbits may lead to negative eccentricities. There are, of course, many sets of initial elements on which to base differential corrections, and one might hope that among these there would be a “best” set, although the mathematical criteria for such a set are difficult to establish. Aeronutronics [8] has recently proposed the use of the initial position vectors U_0, V_0 (Fig. 2) which, it is claimed, are applicable without restriction to all types of earth satellite orbits. The vector U_0 is the initial radial unit vector with components (x/r, y/r, z/r), and the vector V_0 is the transverse unit vector in the orbit plane perpendicular to U_0. These two vectors plus the semimajor axis and the quantities e sin E_0, e cos E_0 constitute the six parameters of the representation.
FIG. 2. Projection of orbit on celestial sphere.
Experience indicates that, for lunar or interplanetary flights, the rate of convergence and the accuracy attainable with differential corrections are best when local spherical coordinates are used and poorest when elliptic elements are used.
6.6 Error Analysis in Differential Corrections
6.6.1 Sources of Error
The estimate of the burnout parameters obtained from differential corrections is affected by three primary error sources: (1) the linearization process; (2) errors in the computations; (3) errors in the observations. It has already been observed that the iterated differential correction process may fail to converge if the linearity assumption is violated. In Section 6.2 a method was described for solving a constrained least squares problem in which the size of the corrections is restricted sufficiently so that linearity does hold. This approach will generally lead to convergence even when the initial estimate of the burnout parameter is poor. Computational errors may arise from several sources including: (a) the basic integration for position and velocity; (b) the computation of the matrix A of partial derivatives; (c) the interpolation for the exact time of the observation; and (d) the solution of the least squares problem. Integration and partial derivative errors become especially serious when long time periods are involved. The secular error in the mean anomaly increases with time, thus limiting the time over which accurate predictions can be made. The normal matrix will become increasingly more ill-conditioned as the partial derivatives decrease in accuracy. Interpolation for the exact time of the observation should be very carefully done, using double precision if necessary. The normal system of equations will usually be somewhat ill-conditioned, and great care should be used in solving it. If the eigenvalues and eigenvectors of the normal matrix are desired, the Jacobi rotation method [30] is probably the best available. In general, effective numerical techniques are available for reducing the impact of computational errors, so that these errors should not be dominant except in the special case of tracking over long time periods.
The more important of the observational errors are of the following types: (a) random errors; (b) bias or instrumentation errors in measuring range, elevation, azimuth, or range rate; (c) errors in measuring the time of the observation; and (d) station location errors, particularly due to errors in the geographic latitude and longitude of the observing station. The observation errors are difficult to assess and are usually treated on a statistical basis. Some knowledge about the standard deviations of these errors and of their distributions can generally be obtained from experimentation. Relative to these errors two important problems can be formulated. The first problem is this: under certain assumptions about the distribution of these errors, determine their effect on the estimate of the burnout parameters obtained from differential corrections. The second problem is: using differential corrections, determine the errors, e.g., bias or station location errors, for a particular station. Although the two problems are related, they will briefly be considered separately. A gross estimate of the errors in the corrections obtained from the least squares solution may be obtained as follows. Let q represent the vector of corrections, A the matrix of partial derivatives with respect to the initial parameters, and b the vector of residuals, i.e., the difference between the measured and the computed values of the observations. The normal system with normalized weights is A^T A q = A^T b, and the solution is q = (A^T A)^{-1} A^T b. Let δA represent the matrix of errors in A and δb the totality of errors in the observations. Then the errors δq in the corrections can be estimated to first order by

    δq = (A^T A)^{-1} {A^T (δb − δA q) + δA^T (b − A q)}.
If the errors δq are larger than q, then the iterated least squares process is almost certain to diverge. In practice δA and δb will not be known very well, so that estimates based on this computation will be very crude. An even simpler indication of the convergence of the iterations can be obtained by examining the covariance matrix (A^T A)^{-1}. If the elements of this matrix are large, for example, the estimate of q is probably poor and continued iteration will probably lead to divergence. If more is known about the distribution of the errors in the observations, a more detailed analysis of the errors in the estimate of q can be made following the statistical approach discussed in Section 6.6.2.
6.6.2 A Statistical Estimate of Errors
We will consider here the first of the problems posed in Section 6.6.1, i.e., under proper assumptions about the nature of the observational errors, determine the covariance matrix of the estimated corrections in the burnout position and velocity as determined by differential corrections. For simplicity we consider that the observations are over one pass from a single station. The observations are subject to random, bias, or station location errors. We make the following assumptions about these errors. The random errors are independent, normally distributed with mean zero. The bias errors are constant over each pass for each of the four observation types
(R, A, E, Ṙ); these are independent of each other, and the covariance matrix of these errors is given. The station location errors arise from imprecise knowledge of latitude and longitude, and affect only elevation and azimuth measurements; these errors are independent of each other and of the bias and random errors, and their covariance matrix is known. The nominal values of the bias and survey errors are assumed to be zero. Denote the errors of the three types by ε^(1), ε^(2), ε^(3) and let σ_ε(i) (i = 1, 2, 3) be their respective covariance matrices. If there are n observations then ε^(1) will be an n × 1 column vector, ε^(2) is a 4 × 1 vector of bias errors, and ε^(3) is a 2 × 1 vector of survey errors; σ_ε(1) is a diagonal (n × n) matrix, σ_ε(2) is a diagonal (4 × 4) matrix, and σ_ε(3) is a (2 × 2) diagonal matrix. The elements of these matrices are the variances, which are assumed known approximately from experimentation. If v is a typical observation and b the residual, an expansion to first order terms in the residuals yields

    b = Σ_{i=1}^{6} (∂v/∂p_i) q_i + ε^(1) + Σ_{i=1}^{4} (∂v/∂ε_i^(2)) ε_i^(2) + Σ_{i=1}^{2} (∂v/∂ε_i^(3)) ε_i^(3).    (6.24)
The partial derivatives ∂v/∂p_i are obtained by any of the methods described in Section 6.4; the partial derivatives with respect to bias errors, ∂v/∂ε_i^(2), are either 0 or 1 depending upon the type of observation; the partial derivatives with respect to location errors, ∂v/∂ε_i^(3), are zero if the observation is range or range rate, while for elevation or azimuth they can be obtained either (1) by evaluating known analytic formulas for these partial derivatives or (2) by the variant method, i.e., by numerical differencing of perturbed trajectories. The residual vector b will then have the form
    b = A q + ε^(1) + B ε^(2) + L ε^(3)    (6.25)
where A is an (n × 6) matrix of partial derivatives of the observations with respect to the initial parameters, B is an (n × 4) matrix of bias partial derivatives, and L is an (n × 2) matrix of survey partial derivatives. If we denote by q̂ the estimate obtained by differential corrections, q the true corrections, and q̃ = q̂ − q the errors in this estimate, then

    q̃ = (A^T S A)^{-1} A^T S [ε^(1) + B ε^(2) + L ε^(3)]
      = (A^T S A)^{-1} [(A^T S) ε^(1) + (A^T S B) ε^(2) + (A^T S L) ε^(3)],    (6.26)

and the covariance matrix of this estimate is

    σ_q̂ = E[q̃ q̃^T] = (A^T S A)^{-1}
         + [(A^T S A)^{-1}(A^T S B)] σ_ε(2) [(A^T S A)^{-1}(A^T S B)]^T
         + [(A^T S A)^{-1}(A^T S L)] σ_ε(3) [(A^T S A)^{-1}(A^T S L)]^T,    (6.27)
where S = σ_ε(1)^{-1}. The matrices S, σ_ε(2), σ_ε(3) are assumed given a priori; the matrices A and L are obtained as output from a tracking program; the bias matrix B, which consists of zeros or ones depending on the observation type, will also be output from the program. Thus the covariance matrix σ_q̂ can be computed after the estimates q̂ have been obtained. In practice it will be most convenient to store A, B, L on tapes and to use a separate program to carry out the matrix manipulations in (6.27). The input to this program would be S = σ_ε(1)^{-1}, σ_ε(2), σ_ε(3).
Generalization to more than one station and more than one pass is immediate. Since the bias errors from a second station, for example, will be independent of those from the first, it is only necessary to add to (6.27) a matrix of the second type with the appropriate B and σ_ε(2) matrices. The effect of new station location errors is obtained by adding another matrix of the third type in (6.27) with the corresponding L and σ_ε(3) matrices. If it seems desirable to consider that bias errors are not constant over an entire pass but are constant over separate portions of the pass, this can also be handled as above by considering each portion as independent of the others. The second problem posed in Section 6.6.1, that of determining the bias or station location errors, can be handled by solving an augmented problem. In the case considered above the vector q will consist of the six position and velocity corrections, the four bias corrections, and the two station location corrections. Thus q will have twelve components and the normal matrix will be (12 × 12) instead of (6 × 6). As more stations and more biases are to be determined, the augmented normal matrix will grow in size and numerical difficulties may be expected. The relative simplicity of the approach suggested here is due largely to the assumption of independence of the various types of errors. More sophisticated theoretical models which remove this and other assumptions can, of course, be derived, but the practical difficulties involved in such models are enormous. The procedure outlined here can be added to an existing tracking program, and has successfully been used at Space Technology Laboratories, Inc. for design trajectory studies. The steps outlined above for a statistical error analysis are conceptually simple, but because the matrices involved are very poorly conditioned, the practical implementation of such a program on a computer is very difficult.
In a program based on this analysis, it was found necessary to carry out all matrix multiplications and inversions in at least double-precision floating-point arithmetic on an IBM 7090 in order to retain any significance in the results.
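Assembling (6.27) is mechanical once A, B, L, and the covariances are on hand. A sketch follows, with hypothetical names and no attempt at the double-precision care the text warns about; with zero bias and survey covariances it reduces to the familiar (A^T S A)^{-1}.

```python
import numpy as np

def correction_covariance(A, B, L, S, cov_bias, cov_survey):
    """Covariance (6.27) of the estimated corrections under random,
    bias, and station-location errors.

    S          : inverse covariance of the random errors, sigma_e(1)^{-1}
    cov_bias   : sigma_e(2), covariance of the bias errors
    cov_survey : sigma_e(3), covariance of the survey errors
    """
    N_inv = np.linalg.inv(A.T @ S @ A)
    Gb = N_inv @ (A.T @ S @ B)    # sensitivity to bias errors
    Gl = N_inv @ (A.T @ S @ L)    # sensitivity to survey errors
    return N_inv + Gb @ cov_bias @ Gb.T + Gl @ cov_survey @ Gl.T
```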
7. Organization of a Tracking and Prediction Program
A complete prediction and tracking program will inevitably be extremely complicated. The number of instructions will be well over 40,000, the liberal use of tapes is essential, and several man-years of effort will be required to produce it. Such a program should be very flexible, because unforeseen needs will demand frequent modification and because new developments aimed at improvements in computational efficiency will be the rule rather than the exception. It is therefore advisable to organize the program in the form of macro-blocks, each of which is substantially independent of the others in that each block can be replaced without altering the basic structure. The flow diagram at the end of this section shows the major blocks in such a program. A brief description of the function of each block, together with some suggestions for improving computational efficiency, is given here, although the exact equations needed are not included. In general, the diagram follows closely that used by Space Technology Laboratories in its General Tracking Program (GTP), a routine that has been used successfully for tracking a large number of American satellites, including the Able series, some of the Pioneer and Explorer series, the Discoverer series, and the Transit and Courier series. GTP will perform on option any combination of three major functions: trajectory integration, differential corrections, and ephemeris generation. Since the original submission of this manuscript, the author has directed the development of a substantially more sophisticated tracking program at Aerospace Corporation. The program is called TRACE. A complete description of this program, together with a detailed derivation of equations, is contained in Ref. [&I.
7.1 Input and Conversion Block
7.1.1 Input Conversion
The program should be capable of accepting initial conditions in any one of the following coordinate systems: instantaneous elliptic elements, inertial rectangular equatorial, inertial rectangular ecliptic, geo-spherical; and of converting these to the working coordinates of the tracking system, which are not necessarily the same as the integrating coordinates. In GTP, the tracking coordinates are (r, α, δ, v, β, A) while the integrating coordinates are earth-centered inertial rectangular equatorial. This sub-block will also accept control parameters which will: (1) select the perturbations to be included, such as drag, oblateness terms, and relevant planetary bodies; (2) set the integration accuracy criterion; (3) specify which major functions are to be performed: trajectory, differential correction, or ephemeris generation.
7.1.2 Observation Processor
If the differential correction option is selected, radar or optical data from any one of several stations is accepted and converted. Station location information is stored in this block for any pre-selected station. Depending upon the equipment available at a station, the observational data will be transmitted in a variety of forms including direction cosines, declination and right ascension, elevation, azimuth, range, and range rate. This routine (1) converts these to a uniform set: range, range rate, elevation, azimuth, and range acceleration; (2) applies necessary corrections such as refraction; and (3) converts observation times to minutes after the reference time.
7.1.3 Data Editing Routine
This routine processes the data, as described in Section 6, to eliminate gross errors and blunders in the data.
7.2 Trajectory Integration Block
Among the more important subroutines to be incorporated in this block are the following:
7.2.1 The Numerical Integration Subroutine
GTP operates with a combined Runge-Kutta and Gauss-Jackson method with automatic step size selection utilizing sixth differences. The Runge-Kutta method is used for starting the second-sum method and for integration up to exact observation times when small integration steps are required. In the latter case, basic information for the second-sum method is saved for resumption of integration in that mode.
7.2.2 Planetary Coordinates Look-Up Routine
Coordinates of the planets are stored on tape and read into the high speed computer in blocks. Interpolation to the required time is performed with Everett's central difference interpolation formula through fourth differences. The coordinates are referred to a fixed equinox, that of 1950.0.
7.2.3 Coordinate Transformation Subroutine
On interplanetary flights, provision is made for shifting the center of
coordinates to the dominant body. This routine will supply the proper constants, such as the mass ratios, in each operating mode and will provide the necessary planetary coordinate subtractions. In addition, it is necessary to obtain the velocities of one body, say the earth, relative to the sun. This is done by numerical differentiation but should be done very carefully, since any errors committed here will propagate. In general, three operating modes are allowed: from an earth-centered to a sun-centered to a planet-centered coordinate system. On a lunar flight, it is possible to shift to a moon-centered system.

7.2.4 An Atmosphere Subroutine

This routine should provide the density profile for a given altitude above the earth's surface.
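The planetary look-up routine of 7.2.2 interpolates with Everett's central-difference formula through fourth differences. The following is a minimal sketch of that formula; the function name and the six-point tabular layout are assumptions for illustration, not the GTP code itself.

```python
def everett_interpolate(f_values, p):
    """Everett central-difference interpolation through fourth
    differences.  f_values holds six equally spaced samples
    f(-2h), f(-h), f(0), f(h), f(2h), f(3h) bracketing the
    interval [0, h]; p in [0, 1] is the interpolation fraction."""
    f = f_values
    # Second differences centered at x = -1, 0, 1, 2 (in units of h)
    d2 = [f[i - 1] - 2 * f[i] + f[i + 1] for i in range(1, 5)]
    # Fourth differences centered at x = 0 and x = 1
    d4 = [d2[i - 1] - 2 * d2[i] + d2[i + 1] for i in range(1, 3)]
    q = 1.0 - p
    # Everett coefficients: binomial-type factors for the even differences
    E2 = q * (q * q - 1) / 6.0
    F2 = p * (p * p - 1) / 6.0
    E4 = q * (q * q - 1) * (q * q - 4) / 120.0
    F4 = p * (p * p - 1) * (p * p - 4) / 120.0
    f0, f1 = f[2], f[3]            # the two points bracketing the interval
    return (q * f0 + p * f1
            + E2 * d2[1] + F2 * d2[2]
            + E4 * d4[0] + F4 * d4[1])
```

Because the formula retains differences through the fourth, it reproduces any polynomial of degree five or less exactly, which is why a modest tabular spacing suffices for the stored planetary coordinates.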
7.2.5 An Output Subroutine

7.3 Partial Derivatives and Residual Block
This block operates simultaneously with block 7.2 because the variational equations, which supply the derivatives of the rectangular coordinates relative to the initial conditions, require the output from the basic trajectory. This block then consists of:

7.3.1 The Variational Equation Package

The initial conditions for these equations must first be computed using the formulas given in Section 6. In general, six different sets of initial conditions are computed and six sets of variational equations solved to yield the 6 × 6 matrix (∂x/∂p) for the state vector x = (x, y, z, ẋ, ẏ, ż) and the vector p = (r₀, α₀, δ₀, v₀, β₀, A₀) of initial spherical coordinates.

7.3.2 Partial Derivatives of Radar Coordinates

Using the output from 7.3.1 and analytic expressions for the radar coordinates (R, A, E, Ṙ) as functions of the position and velocity, this routine computes the derivatives of the radar coordinates with respect to the initial spherical coordinates.

7.3.3 The Residuals and Equations of Condition

This routine forms the difference between the observed and computed values of (R, A, E, Ṙ, R̈) and sets up the equations of condition for the differential correction process.

7.4 The Differential Correction Package
7.4.1 The Least Squares Subroutine

The normal matrix AᵀA is formed, and the constrained least squares problem is solved for the corrections q and the improved estimate of the burnout parameters p′. The eigenvalues and eigenvectors of B = D⁻¹AᵀAD⁻¹ and the elements of the covariance matrix (AᵀA)⁻¹ are printed out on-line. New residuals are also computed and the integration block is re-entered, if necessary, for another iteration.
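The core of the least squares step (form the normal matrix AᵀA and solve for the corrections q) can be sketched as follows. This is a plain normal-equations solver for illustration only; the constraint handling, the scaling matrix D, and the eigenvalue output of the actual subroutine are omitted, and the function name is an assumption.

```python
def least_squares_correction(A, residuals):
    """Solve the normal equations (A^T A) q = A^T r for the
    parameter corrections q, where A is the m-by-n matrix of
    equations of condition and r the vector of residuals."""
    m, n = len(A), len(A[0])
    # Form A^T A and A^T r explicitly
    ATA = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
           for i in range(n)]
    ATr = [sum(A[k][i] * residuals[k] for k in range(m)) for i in range(n)]
    # Gaussian elimination with partial pivoting on the augmented matrix
    M = [row[:] + [b] for row, b in zip(ATA, ATr)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution
    q = [0.0] * n
    for r in range(n - 1, -1, -1):
        q[r] = (M[r][n] - sum(M[r][c] * q[c] for c in range(r + 1, n))) / M[r][r]
    return q
```

For a consistent overdetermined system the corrections come out exactly; in the orbit problem they are added to the current burnout-parameter estimate and the trajectory is re-integrated.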
7.4.2 Updating the Trajectory
This routine follows the procedure outlined in Section 6 for constructing the covariance matrix at a time t1 so that a least squares fit can be carried out with new observations obtained after time t1.

7.5 The Ephemeris Processor
After the differential correction process has converged, it is necessary to produce a sighting ephemeris for each of the observing stations. The sighting data must be processed into a format suitable for acceptance by each station. Such sighting data will usually be required at equally spaced and fairly frequent time intervals. The most efficient way to obtain this information is to interpolate, using a Lagrangian formula, for the position and velocity at the required times and then to convert these into radar coordinates.

Bibliography

1. Aroian, L. A., and Robison, D. E., A data editing routine. Space Technology Labs. Internal Report No. PA 2952-01, July (1960).
2. Baker, R. M. L., et al., Efficient precision orbit computations. Aeronutronics PAM 59-444 (1959).
3. Baker, R. M. L., Astrodynamics. Academic Press, New York, 1960.
4. Brenner, J. L., and Latta, G. E., Satellite orbit theory using a new coordinate system. Proc. Roy. Soc. (London) (1960).
5. Brouwer, D., On the accumulation of errors in numerical integration. Astron. J. 46 (1937).
6. Brouwer, D., Solution of the problem of artificial satellite theory without drag. Astron. J. 64, 378-397, November (1959).
7. Cohen, C. J., and Hubbard, E. C., Algorithm applicable to numerical integration of orbits in multi-revolution steps. U. S. Naval Weapons Lab. Report No. 1705, May (1960).
8. Contribution to astrodynamics. Aeronutronics Publ. U-880, June (1960).
9. Dahlquist, G., Convergence and stability in the numerical integration of ordinary differential equations. Math. Scand. 4 (1956).
10. Diliberto, S. P., Kyner, W. T., and Freund, R., The application of periodic surface theory to the study of satellite orbits. Astron. J. April (1961).
11. Eckert, W. J., and Brouwer, D., The use of rectangular coordinates in the differential correction of orbits. Astron. J. 46, August (1937).
12. Ehricke, K., Space Flight. Van Nostrand, Princeton, New Jersey, 1959.
13. Garfinkel, B., On the motion of a satellite of an oblate earth. Astron. J. 63 (1958).
14. Garrett, J. R., et al., Satellite orbit determination and error analyses of procedures. Holloman Air Force Base, New Mexico, Project No. A-398.
15. Hamming, R. W., Stable predictor-corrector formulas for ordinary differential equations. J. Assoc. Computing Machinery 6, January (1959).
16. Hanson, G., and Routh, D., Space Technology Labs. General Tracking Program. CDRC Report No. 9830.30-018 (1960).
17. Helvey, T. C., ed., Space Trajectories, p. 119ff. Academic Press, New York, 1960.
18. Henrici, P., Discrete Variable Methods in Ordinary Differential Equations. Wiley, New York, 1961.
19. Herrick, S., Precision orbits and observation reduction. Univ. of California (Los Angeles) Astrodynamical Report No. 1 (1959).
20. Herrick, S., Astronautics information. Proc. Seminar on Orbit Determination, Jet Propulsion Lab., Pasadena, California, February (1960).
21. Herrick, C. E., On the computation of nearly parabolic orbits. Astron. J. 65, No. 6 (1960).
22. Herrick, S., Astrodynamics. Van Nostrand, Princeton, New Jersey, 1961.
23. Hildebrand, F., Introduction to Numerical Analysis. McGraw-Hill, New York, 1956.
24. Hori, G., Development of Delaunay theory at critical inclination. Astron. J. 65, 291 (1960).
25. King-Hele, D. G., The effect of the earth's oblateness on the orbit of a near satellite. Proc. Roy. Soc. (London) 247, 49-72 (1958).
26. Kopal, Z., Numerical Analysis. Wiley, New York, 1955.
27. Kozai, Y., The motion of a near earth satellite. Astron. J. 64 (1959).
28. Morrison, D., A method for nonlinear minimization problems. Space Technology Labs. Internal Report No. "-140 (1959).
29. Morrison, D., Some remarks on weights in least squares. Space Technology Labs. Internal Report No. PA 2027-1-9, October (1960).
30. Morrison, D., and Holt, J., General linear equation solver. Reports RWLS4F (709) and RWEGGS (FORTRAN), SHARE, September (1960).
31. Musen, P., Application of Hansen's theory to the motion of an artificial satellite in the gravitational field of the earth. J. Geophys. Research 64, December (1959).
32. O'Keefe, J. A., Eckels, A., and Squires, R. K., Vanguard measurements give pear-shaped component of earth's figure. Science 129, 565-566 (1959).
33. Pines, S., Payne, M., and Wolf, H., Comparison of special perturbation methods in celestial mechanics. ARL, ARDC, August (1960).
34. Planetary coordinates for the years 1960-1980. H. M. Nautical Almanac (1958).
35. Rademacher, H., On the accumulation of errors in processes of integration on high-speed calculating machines. Proc. Symposium on Large Scale Digital Computing, Ann. Computing Lab. Harvard 16 (1948).
36. Shapiro, I., The Prediction of Ballistic Missile Trajectories from Radar Observations. McGraw-Hill, New York, 1957.
37. Smart, W. M., Celestial Mechanics. Longmans, Green, Boston, 1953.
38. Smith, O. K., Perturbation methods for free flight trajectories beyond the atmosphere. Space Technology Labs. Report No. GM-TM-0165-0038 (1958).
39. Smith, O. K., Derivatives of radar coordinates with respect to elliptic elements. Space Technology Labs. Internal Report No. PA 2138-1,2 (1958).
40. Smith, O. K., Predictor-corrector formulas for difference equations. Space Technology Labs. Report No. STL/TR-60400040294 (1960).
41. Smith, O. K., The computation of coordinates from Brouwer's solution of the artificial satellite problem. Submitted to Astron. J.
42. Struble, R., A Rigorous Theory of Satellite Motion. North Carolina State, 1960.
43. Swerling, P., A proposed stagewise differential correction procedure for satellite tracking and prediction. Rand Corp. Report No. 1292, January (1958).
44. Thomas, L. M., and Mace, D., An extrapolation formula for stepping the calculation of the orbit of an artificial satellite several revolutions at a time. Astron. J. 65, June (1960).
45. Titus, J., The Encke lunar trajectory program. Space Technology Labs. Report No. TR-60-0000-00249, August (1960).
46. TRACE: Aerospace Orbit Determination Program, 1215-2-115. Aerospace Corporation, Los Angeles, California, March, 1960.
Multiprogramming

E. F. CODD

Development Laboratory, Data Systems Division
International Business Machines Corporation
Poughkeepsie, New York
1. Introduction . . . 78
1.1 Concurrency . . . 78
1.2 Inter- and Intraprogram Concurrency . . . 78
1.3 Time Sharing . . . 79
1.4 Space Sharing . . . 79
1.5 Possible Difficulties . . . 80
2. Early Contributions . . . 81
3. Current Scope of Multiprogramming . . . 83
3.1 Environments . . . 83
3.2 Plan of Attack . . . 86
4. Batch Multiprogramming . . . 87
4.1 Goals . . . 87
4.2 The Computer System . . . 88
4.3 The Pending Workload . . . 90
4.4 The Four Phases . . . 91
4.5 Phase Concurrency . . . 92
4.6 Task Concurrency . . . 92
4.7 IO Requests and WAIT Pseudo-Ops . . . 95
4.8 Execution Control . . . 95
5. The Optimizing Problem . . . 104
5.1 Scheduling, Allocation, and Queueing . . . 105
5.2 Nonscheduled Mode . . . 106
5.3 Space-Scheduled Mode . . . 106
5.4 Space-Time Scheduled Mode . . . 108
5.5 Tape Allocation . . . 109
5.6 Disk Allocation . . . 110
5.7 Core-Storage Allocation . . . 112
5.8 Advance Commitments . . . 115
5.9 Short-Range Scheduling . . . 116
5.10 Queueing . . . 120
6. Multiprogramming with Two or More Processing Units . . . 122
6.1 Motivation and Requirements . . . 122
6.2 Note on Implementation . . . 125
6.3 Rules for Interruption . . . 125
6.4 Protection and Relocation . . . 128
6.5 Special Operations . . . 130
6.6 The Queueing System . . . 134
7. Concluding Remarks . . . 150
8. Acknowledgments . . . 152
Bibliography . . . 152
1. Introduction

1.1 Concurrency
The basic concern of multiprogramming is concurrency of operations within computer systems. Many different types of concurrency are incorporated in the computers in use today. Not all of these types are of interest in multiprogramming. Operations which are performed concurrently range from the very small scale to the very large. One extreme is represented by the simultaneous operation of two triggers during the execution of a single instruction. The other extreme is represented by the concurrent activity of two processing units sharing a common core storage, each capable of interpreting and executing its own stream of instructions. To eliminate consideration of small-scale concurrency, we focus attention on concurrent operations each of which requires at least one machine instruction for its specification. This qualification alone is, however, not enough. It is also desirable to eliminate local concurrency in which the execution of an instruction is overlapped by the execution of one or more of its neighbors in a single instruction stream. This particular type of concurrency gives rise to highly specialized problems [2]. We are now in a position to define multiprogramming as the technology associated with the concurrent execution of instructions which are not restricted to being immediate neighbors in any instruction stream. The most common form of concurrency which is consistent with this definition is that of overlapping input-output (IO) operations with each other and with processing unit (PU) operations. A few computer systems are now coming into use which contain two or more PU's capable of concurrent operation: for example, the GAMMA 60 [5] and the RW 400 [16]. Exploitation of systems containing two or more PU's poses some additional problems. For the most part, however, the problems encountered are similar. This similarity is recognized in the definition of multiprogramming, which encompasses both the single and multiple PU cases.
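In a modern setting, the simplest instance of this definition, IO overlapped with PU work, can be sketched with a reader thread filling a buffer while the main program processes records. The function and names below are illustrative only, not drawn from any system described here.

```python
import queue
import threading

def overlapped_process(records, transform):
    """Overlap 'input' with processing: a reader thread fills a
    small buffer queue (standing in for an IO channel reading
    tape records) while the main program consumes and processes
    each record as it becomes available."""
    buf = queue.Queue(maxsize=2)   # double buffering
    SENTINEL = object()            # marks end of input

    def reader():
        for rec in records:        # stands in for the IO channel
            buf.put(rec)
        buf.put(SENTINEL)

    threading.Thread(target=reader, daemon=True).start()

    out = []
    while True:
        rec = buf.get()            # PU waits only if the buffer is empty
        if rec is SENTINEL:
            break
        out.append(transform(rec))
    return out
```

When the transform is slow relative to input, the reader keeps the buffer full and the PU never idles waiting for a record; this is exactly the overlapping of IO with PU operations that the definition singles out.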
1.2 Inter- and Intraprogram Concurrency
An observer taking an instantaneous look at a computer in action might see one unit servicing instructions belonging to one program and another unit servicing a second. In such a case, the concurrency may be characterized as interprogram. If, on the other hand, both units are servicing the same program, intraprogram concurrency is exhibited. The distinction between inter- and intraprogram concurrency is very much dependent upon the use being made of the term “program.” It is partly for this reason that multiprogramming is defined in a sufficiently general way to encompass both types of concurrency. An additional reason is that, if the main objective is to load all the computer units as heavily as possible, it is inadequate to confine one's attention to either inter- or intraprogram concurrency alone: one is forced to exploit both to the limit.

1.3 Time Sharing
Now suppose that the observer watches one unit over a period of time. He may notice that this unit services first one program, then another. These programs are said to time-share the unit in question, and the unit itself is called a time-shared facility. Normally, the processing unit(s) and IO channels are time-shared facilities in a computer system. If the mean time spent by the unit in servicing each program is sufficiently small, the observer may get the impression that the programs are receiving simultaneous service from this one unit. Of course, this impression is false. However, time-sharing and true concurrency are complementary aspects of multiprogramming: it would be a most unusual situation in which a unit was being time-shared and no other unit was engaged in concurrent activity.

1.4 Space Sharing
When a set of programs or program segments is being executed in a concurrent manner, space must be found in core storage to accommodate the instructions and associated data. If the total core-storage requirement does not exceed the available capacity, it is clearly possible to allocate disjoint areas privately to each member of the set. The space reservations in core storage may, in this simple case, be regarded as permanent; that is, not subject to cancellation unless an emergency arises. Looking at core storage as a whole, we may say that it is being space-shared by the programs in question. Looking at any one particular location, however, we do not observe sharing. If, on the other hand, the total core-storage requirement does exceed the available capacity, it becomes necessary to treat one or more of the space reservations as temporary. Thus, some core locations, indeed perhaps all of them, are sometimes in use by one program and sometimes by another. Generally speaking, the only truly permanent space reservation in core storage is enjoyed by the supervisory program, and even it may not be permanently assigned all the core space it could use. Core storage is only one of several space-shared facilities to be considered in a comprehensive computer system. There may be a large, quasi random-access auxiliary memory such as high-speed magnetic disk units, and space in such memory must be shared in some manner. Then again, the set of tape units constitutes a space-shared facility in which the normal unit of space is the tape unit itself. Allocation of space on all these facilities is an important part of the concurrency exploitation problem.

1.5 Possible Difficulties
In articles describing computer systems, much emphasis has been placed on the provision of abilities to execute various types of operations concurrently. It is true that careful consideration must be given to this question in planning and designing a new computer. All too often, however, the difficulties of exploiting these concurrency abilities have been left as “an exercise for the programmer.” To avoid disappointing, if not disastrous, consequences, the exploitation problem must be thoroughly examined at the machine planning stage. Consider some of the questions which may arise. What is the user's objective? Is the prime objective fast response to a variety of messages or transactions entering the system at random intervals, or is it the minimization of time to process a given set of programs regardless of delay to individual members of the set? Perhaps the objective is a combination of these extremes. What features or schemes are needed or desirable for facilitating the control and coordination of concurrent operations? To what extent should these features be provided in hardware rather than by programming? Can the burden of controlling and coordinating concurrent operations be assumed entirely by a combination of hardware and systems programming, leaving the problem programmer free to express his problem in a conventional sequential manner, if he wishes? Are any changes needed in problem-oriented languages to permit effective exploitation of certain types of concurrency? Must the operating staff be superhuman to keep pace with a machine which is handling several programs at a time and consuming them at a very rapid rate? When a machine malfunction, a programming blunder, or an operator error occurs, what aids are needed for identifying the source of trouble and determining which programs have been adversely affected? How can the risk of accidental or fraudulent manipulation of one program by another be virtually eliminated?
Are not new techniques in numerical analysis required if machines with large numbers of autonomous arithmetic units become a reality? Is a centralized large-scale computer system more economical than several less powerful decentralized systems? These questions are but a few of those associated with multiprogramming today. As new forms of machine organization, new applications, and new machine environments develop, we may expect the subject of multiprogramming to evolve from its present primitive beginnings to become an important branch of the computing art.

2. Early Contributions
Multiprogramming has not been called into existence by any single pronouncement or development or conjecture. Almost every computer built to date has contributed something to the subject, usually by possessing some new form of concurrency, or some new interlocks which facilitate exploitation of concurrency. Contributions have also been made in the systems programming area: for example, development of buffer service routines (and corresponding pseudo-ops) which provide reading or writing of successive records from or to tape asynchronously with respect to the main program. The earliest use of the term “multiprogramming,” which the writer has been able to find in the literature, occurs in a paper by Rochester published in 1955 [17]. In this paper Rochester discusses the problem of handling efficiently such applications as premium-notice writing, which requires relatively small amounts of computing and relatively large amounts of tape reading and writing. “The trick used,” he states, “is called ‘multiprogramming’ and it involves doing two or more jobs at once.” Referring to an IBM 705 system, he goes on to state, “The Tape Record Coordinator . . . runs its master tape units for the file maintenance job almost autonomously while passing over inactive records. Then when an active record is found, the calculator briefly interrupts the work it was doing [on another job] to deal with this active record.” This “trick” proved to be highly successful in several field installations. For example, the Cataloging Division of the Department of Defense cut the total time for their daily workload almost by half in the following way. A large tape file (involving some 200 tape reels) had to be processed on a daily basis. Two versions of the file-processing program were placed in different parts of core storage, each version being assigned the processing of a corresponding half of the file. 
A third program provided supervision of the other two, enabling them to operate asynchronously, and enabling their corresponding tape-record coordinators to operate concurrently. In 1954 an interesting experiment was conducted at the National Bureau of Standards under the direction of A. L. Leiner. A direct coupling was made between two normally self-sufficient computers, namely SEAC [1] and DYSEAC [11]. The coupling was such that each computer behaved as if it were an input-output device for the other. The resulting system was capable of handling efficiently problems which the two component computers could scarcely have handled if each were working alone. This experiment foreshadowed the later development of systems (for example, the NBS PILOT [12]) containing two or more virtually autonomous processing units closely coupled for fast interaction. A key factor in making IO and PU concurrency more readily exploitable, and in a more general manner, has been the development of automatic interruption of programs. This feature has removed the necessity for awkward and time-consuming testing to find out whether an IO operation has been completed or whether some exceptional event, such as machine malfunction, has occurred. One of the earliest machines to be equipped with program interruption was the UNIVAC I [8]. However, the only condition which could cause an interruption in this machine was arithmetic overflow. In the NBS DYSEAC the very significant step was made of extending interruption to input-output operations. Leiner [10] comments: “This enables optimum use to be made of the computer's concurrent operating ability; at the same time it relieves the programmer of the responsibility for working out, exactly, the timing interrelationship between the mutually unsynchronized internal and external portions of the program.” At the National Aeronautics and Space Administration a UNIVAC 1103A was applied to the on-line reduction of wind-tunnel data, and at the suggestion of Turner [see 14] an external signal originating from the wind tunnel was used to trigger an interruption in the computer program. A further extension of automatic interruption has been made in the TX-2 computer [8] developed at the Lincoln Laboratory of the Massachusetts Institute of Technology.
Upon interruption, control is automatically given to one of 25 possible programs: specifically, that one which possesses the highest priority of all those currently in a ready state. For each instruction, the programmer is required to permit or prohibit interruption. One program may accordingly hold up indefinitely activity on all others. In a paper published in 1958 in the British Computer Journal, Gill [9] of Ferranti Ltd. gave a very lucid account of the problems and potentialities of multiprogramming. This was the first paper on the subject of such a comprehensive nature, and undoubtedly contributed greatly to the current widespread interest the subject enjoys. His introduction of the term “parallel programming” in place of “multiprogramming” appears to have started a fad for inventing even more synonyms; for example, “concurrent programming,” “asynchronous programming,” “simultaneous programming,” and the like.

3. Current Scope of Multiprogramming
There exists today a wide variety of environments in which computer systems are required to operate. Many new types of computer organization are being developed to cope with these environments. Multiprogramming appears to serve different objectives in each. Common to all these objectives, however, is the basic objective of obtaining the maximum possible throughput rate. Somewhat minor variations of meaning are attached to “throughput rate” in different situations. Accordingly, we proceed to examine several types of environment with a view to determining more specifically the role which multiprogramming plays in each. It will be observed that much of the variation in the environments to be described is due to the different time scales upon which service is measured.

3.1 Environments
3.1.1 Commercial Batch Processing

Consider first the typical commercial installation. It is concerned with the processing of large quantities of business data on a batched basis. The bulk of the work may be characterized as maintenance of magnetic tape files, sorting, and report generation. The programs tend to be large in size and not subject to rapid obsolescence. The runs are to a large extent production runs to be processed according to a comparatively elastic schedule with as much as an hour's leeway, even for a critical run. Consequently, when a backlog of work exists (and this is commonly the case), the principal objective is to dispose of this backlog in the shortest possible time. Throughput rate is here identified with the rate at which the given backlog is processed. Suppose the computer system can accommodate two or more of these runs simultaneously. Such a set of runs may be completed in a shorter time if the runs are executed concurrently rather than sequentially. In the concurrent mode, however, any one run belonging to the set may take longer to complete than if it were executed alone. Multiprogramming may therefore yield an increased throughput rate for the backlog as a whole, and a reduced throughput rate for certain individual members of the backlog. In commercial batch processing, delays of an hour (or more in some cases) may be tolerated in individual runs if this leads to a significant time saving in disposing of the entire backlog.
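This trade-off can be made concrete with a crude timing model. Assume each run needs a total of cpu minutes of processing-unit time and io minutes of tape time, and that in the concurrent mode the IO of one run overlaps perfectly with the CPU work of another; both assumptions are idealizations for illustration, not a description of any particular system.

```python
def sequential_makespan(runs):
    """Total elapsed time if the runs execute one after another,
    each run's CPU and IO strictly serialized.
    Each run is a (cpu_minutes, io_minutes) pair."""
    return sum(cpu + io for cpu, io in runs)

def ideal_concurrent_makespan(runs):
    """Idealized lower bound for one PU with fully overlapped IO:
    the machine finishes no sooner than the larger of the two
    resource totals, since CPU work and IO each form a single
    stream that cannot be compressed."""
    total_cpu = sum(cpu for cpu, _ in runs)
    total_io = sum(io for _, io in runs)
    return max(total_cpu, total_io)
```

For two runs of (10, 30) and (20, 15) minutes, the sequential makespan is 75 minutes while the idealized concurrent bound is max(30, 45) = 45 minutes: the backlog as a whole finishes sooner even though each individual run may take longer than it would alone.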
3.1.2 Problem Solving with Programmer Off-Line

The typical large-scale scientific and engineering installation is serving many users (possibly numbered in the hundreds), most of whom require short compiling and debugging runs. The length of these runs on current equipment varies from a few seconds to 30 minutes or more. Perhaps as much as 50% of the machine use is accounted for this way. The remaining amount is consumed by production runs of widely varying length, say from 10 minutes to several hours, some to be processed urgently, others when convenient. As with the commercial environment, it is common to be faced with a backlog of runs to be processed. The multiprogramming objective is again an improved throughput rate on the entire backlog, but with perhaps more stringent limits on the delay imposed upon individual runs, particularly compiling and debugging runs. The greater emphasis on individual service for these users is due to their productivity being significantly affected by the average number of machine runs per day they can obtain.¹

3.1.3 Problem Solving with Programmer On-Line

Several universities, notably M.I.T. [18, 19] and Carnegie Institute of Technology, are currently engaged in the development of techniques to bring the user back into close communication with the machine in order to accelerate certain types of problem-solving activity. Each of several users is provided with a typewriter and other terminal equipment connected to the machine as input-output devices. From time to time each user's program receives short bursts of service (a few seconds in length) until the user suspends activity on his program to contemplate displayed results and consider his next move. These programs are supplemented by more conventional problem-solving runs with the programmer off-line. Multiprogramming plays an essential part in this environment.
If the on-line problem-solving activities were not overlapped, the throughput rate expressed in terms of problems solved would drop enormously, and the entire operation would become prohibitively expensive.

¹ It was once common to get only one compiling and debugging run per normal shift; now, however, with higher performance machines, this number has climbed to three or four.

3.1.4 Industrial and Military Monitoring and Control

For monitoring or control purposes a computer system may be connected with automatic data sources operating at high speed; for example, a wind tunnel or space vehicle [7]. During the monitoring or control periods the machine must keep pace with the external events, but may return to other, less urgent activities, in the intervals between these periods. In this environment, extreme emphasis is placed on speed of handling of the incoming stream of data. Concurrency between jettisoning the less urgent work and getting the urgent data into core storage, and also between transmitting this data and processing the data already in core storage, may be employed to increase the throughput rate for the urgent work. If control rather than mere monitoring is involved, it is likely that a very high degree of reliability is required. For this reason there may be at least two processing units in the system, each one capable of handling the critical parts of the workload alone. During normal operation, the throughput rate of all work, critical and noncritical, may be increased by concurrent activity on all processing units capable of functioning.

3.1.5 Commercial Real-Time Processing

At the present time a few commercial installations are being established with the objective of processing transactions as they arise instead of batching them for later sorting and processing. A good example is the handling of airline reservations and related applications on a centralized general-purpose computer system [15]. Remotely located agent sets are linked by a communications network to the central computer. At each terminal, very prompt response is required from the computer system for the majority of the message types to avoid annoying delays to the customer. The average time allowed for both transmitting and processing may be only five seconds. Because of this need for fast response, the programs for the various types of transactions, together with most records in the centralized shared files, must be extremely accessible. Now, access to currently available, quasi-random, auxiliary storage (i.e., disk or drum) takes from 2 to 200 msec or more. Therefore, in order to maintain sufficiently high throughput rates to meet the transaction response requirements, several transactions must be handled concurrently.
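The arithmetic behind this conclusion can be sketched with Little's law (L = λW). The figures below are hypothetical, chosen to match the 2 to 200 msec access times mentioned above, and are not taken from the reservation system described.

```python
def max_serial_rate(accesses_per_txn, access_ms):
    """Transactions per second attainable if disk accesses are
    strictly serialized, one transaction at a time."""
    return 1000.0 / (accesses_per_txn * access_ms)

def required_concurrency(arrival_rate, accesses_per_txn, access_ms):
    """Little's law (L = lambda * W): the mean number of
    transactions that must be in progress at once so that an
    offered load of arrival_rate txns/sec can be sustained while
    each transaction spends accesses_per_txn * access_ms waiting
    on auxiliary storage."""
    wait_s = accesses_per_txn * access_ms / 1000.0
    return arrival_rate * wait_s
```

If each transaction needs four 100-msec accesses, strictly serial handling tops out at 2.5 transactions per second; sustaining 50 per second then requires about 20 transactions in progress concurrently, which is why the serial approach cannot meet the response requirement.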
Further, as in the industrial and military control applications, two or more processing units may be needed for reliability reasons.

3.1.6 Ultrahigh-Performance Requirements
Finally, consider an environment in which problems involving very large amounts of computation must be processed within relatively short time periods. Examples of such problems may be found in the fields of hydrodynamics and linear programming. As research progresses in these (and similar) fields, the magnitude of the problems to be solved grows steadily. It seems most unlikely that circuit speeds can be improved indefinitely. As the fundamental barrier of the speed of light is approached, the distances over which signals travel become critically important. Less centralized logic appears to provide at least a partial solution to this problem. Observing current trends in the costs of fast storage and logic, it seems inevitable that a point will be reached at which two processing units of given speed working concurrently on different parts of a single problem will cost less than one unit which has double this speed. This point may have been reached already. The decision to employ two or more processing units in a computer system to achieve high performance leads inevitably to the multiprogramming of single problems.
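Multiprogramming a single problem across functionally identical processing units can be sketched as follows. Splitting a sum is the simplest possible decomposition and is chosen purely for illustration; note that CPython threads do not actually execute arithmetic concurrently, so treat this as a structural sketch of the decompose-and-combine pattern rather than a performance demonstration.

```python
import threading

def parallel_sum(data, nworkers=2):
    """One problem split across nworkers identical 'processing
    units': each worker sums an interleaved slice of the data,
    and the partial results are combined at the end."""
    results = [0] * nworkers

    def work(i):
        results[i] = sum(data[i::nworkers])   # this worker's share

    threads = [threading.Thread(target=work, args=(i,))
               for i in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                              # wait for all units
    return sum(results)
```

The join step is the essential coordination point: no unit may use the combined result until every unit has finished its share, which is exactly the kind of intraprogram interlock the later sections of this article examine.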
3.1.7 Mixed Environments
In actual practice the “pure” types of environment we have described are not very likely to exist in isolation. Almost every large-scale computer installation is, or will be, faced with a mixture of these types. For example, a commercial installation may have both a real-time and a batch-processing workload.
3.2 Plan of Attack
Before starting to consider in detail some of the problems associated with multiprogramming, it is convenient to break the subject down into manageable pieces. There are three important ways of classifying multiprogramming. First, and perhaps most obvious, is a division into batch multiprogramming on the one hand and real-time multiprogramming on the other. Second is a division based on the types of program interdependencies which may exist. Programs may, of course, be completely independent of one another; or interdependencies of various degrees of complexity may occur. At one end of the scale is the simple case of final results of one program being initial operands for another. At the other end, we may have concurrent programs sharing access to files, sharing use of subroutines, and possessing interlocks as complicated as those in the supervisory program itself. Third is a division based on the multiplicity of processing units in the computer system. If a system has two or more processing units, an important distinction may be drawn between the case in which these units are functionally identical and that in which they are functionally specialized. Of these three we select the third as a basis for discussion. Accordingly we deal first with the case of a system containing just one processing unit, and later with the case of a system containing two or more functionally identical processing units. In this brief account we place considerable emphasis on batch multiprogramming with independent programs. The lack of emphasis on real-time multiprogramming should not be interpreted as a measure of the
MULTIPROGRAMMING
importance attached to it. On the contrary, there is every indication that real-time multiprogramming will be of prime importance, while batch multiprogramming will play a supporting role. However, many of the multiprogramming problems and difficulties associated with real-time processing are included in those associated with batch processing. For example, in contrast to a batch workload, there is very little scope for scheduling a real-time workload, since most of the demands upon the system cannot be significantly delayed. In what follows an attempt will be made to separate the administrative techniques from the optimizing techniques. The former permit concurrent operations to take place in a safe and orderly manner. The latter are aimed at maximizing throughput rate in the sense appropriate to the environment. The tools available for satisfying the various requirements and overcoming the numerous difficulties are (1) the hardware itself, (2) the supervisory program, (3) the compiler, and (4) the language (both source and object).
4. Batch Multiprogramming
4.1 Goals
The system and techniques to be described have two main objectives: (1) optimizing and (2) administrative. The optimizing objective is to minimize the execution time of the pending workload, assuming (for the time being) there are no precedence or urgency constraints. The administrative objective is to shield the problem programmer, operating staff, and maintenance engineers from the complexities of concurrency. The requirements which a multiprogramming system should meet were described in more detail in a paper by Codd et al. [4]. The following six requirements were proposed:
“(a) Independence of Preparation. The multiprogramming scheme should permit programs to be independently written and compiled. This is particularly important if the programs are not related to one another. The question of which programs are to be coexecuted with which should not be prejudged even at the compiling stage.
(b) Minimum Information from Programmer. The programmer should not be required to provide any additional information about his program for it to be run successfully in the multiprogrammed mode. On the other hand, he should be permitted to supply extra information (such as expected execution time if run alone) to enable the multiprogramming system to run the program more economically than would be possible without this information.
(c) Maximum Control by Programmer. It may be necessary in a multiprogramming scheme to place certain of the machine’s features beyond the programmer’s direct control (for example, both clocks in IBM STRETCH). This reduction in direct influence by the problem programmer must not only be held to an absolute minimum, but must also result in no reduction in the effective logical power available to the programmer.
(d) Noninterference. No program should be allowed to introduce error or undue delay into any other program. Causes of undue delay include a program which gets stuck in a loop, and failure of an operator to complete a requested manual operation within a reasonable time.
(e) Automatic Supervision. The multiprogramming scheme must assume the burden of the added operating complexity. Thus, instructions for handling cards, tapes, and forms should originate from the multiprogramming system. Similarly, machine malfunctions, programming errors, and operator mistakes should be reported to the responsible party in a standard manner by the multiprogramming system. Again, all routine scheduling should be handled automatically by the system in such a way that the supervisory staff can make coarse or fine adjustments at will. Further responsibilities of the system include accounting for the machine time consumed by each job and making any time studies required for operating or maintenance purposes.
(f) Flexible Allocation of Space and Time. Allocation of space in core and disk storage, assignment of input-output units, and control of time-sharing should be based upon the needs of the programs being executed (and not upon some rigid subdivision of the machine).”
4.2 The Computer System
For the sake of clarity it is necessary to make assumptions about the computer system to be exploited. We shall attempt to keep these assumptions few in number and simple in character, adding to them only as necessary. A STRETCH-like system is assumed, including the following items of equipment: (1) one processing unit; (2) a large core storage (e.g., 64,000 words); (3) one disk channel with two disk units attached; and (4) two tape channels, each with four magnetic tape units attached. The units are interconnected as in Fig. 1. Each disk has just one seeking mechanism and a storage capacity of, say, two million words. A seeking activity on one disk unit may proceed concurrently with either a seeking or a data-transmission activity on the other. Only one disk unit at a time is allowed to transmit data over the disk channel. Similarly, only one tape unit at a time is allowed to transmit data over the tape channel to which it is attached. Any of the other units on this channel may, however, be engaged in activities other than transmitting data, for example, rewinding. Each I/O channel is sufficiently autonomous to take an I/O operation to completion once the operation has been initiated by the processing unit.
Fig. 1. Computer system with single processing unit.
Upon completion, the channel signals the processing unit. If the interruption system is enabled, the signal causes a program interruption. Otherwise, the signal is held until the system is re-enabled, and the interruption occurs at that time. The processing unit is assumed to have high performance with operation times of a few microseconds. This assumption is convenient primarily because generality and automatism would have to be sacrificed to some extent in slower machines. Moreover, the mismatch between processing and I/O speeds would very likely be less, and hence the profitability of batch multiprogramming would also be less.
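The signalling rule just described can be sketched as follows. This is our own illustration in modern Python, not part of the original text; the class and method names are invented for the sketch.

```python
# Hypothetical sketch of the rule above: a channel-completion signal causes
# an interruption at once if the interruption system is enabled; otherwise
# the signal is held and the interruption occurs when the system is
# re-enabled.

class ProcessingUnit:
    def __init__(self):
        self.enabled = True
        self.pending = []      # signals held while interruptions are disabled
        self.taken = []        # interruptions actually taken, in order

    def signal_end_of_operation(self, channel):
        if self.enabled:
            self.taken.append(channel)
        else:
            self.pending.append(channel)

    def disable(self):
        self.enabled = False

    def enable(self):
        # Held signals cause interruptions as soon as the system is re-enabled.
        self.enabled = True
        while self.pending:
            self.taken.append(self.pending.pop(0))

pu = ProcessingUnit()
pu.signal_end_of_operation("tape channel 1")   # taken immediately
pu.disable()
pu.signal_end_of_operation("disk channel")     # held
pu.enable()                                    # held signal now delivered
```

The essential point the sketch captures is that no completion signal is ever lost; disabling merely defers its delivery.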
The core storage is sufficiently modular that references by the processing unit are seldom delayed by I/O references. Incidentally, the core storage bus itself is assumed never to be a limiting factor. Thus, the effective speed of the processing unit is not reduced as the I/O load is increased. This is important because multiprogramming normally results in an increased I/O load. Card readers, printers, and punches have been omitted, partly for the sake of simplicity, and partly because their use in the batch environment with a high-performance machine for anything other than supervisory activity is highly questionable. It frequently proves to be more economical to operate the central machine on a tape-to-tape basis, and use satellite computers for preparing input tapes and interpreting output tapes. In the computer system illustrated in Fig. 1 the programs being concurrently executed share time on (1) the processing unit, (2) the core storage bus, (3) core storage modules, (4) the disk channel, (5) the two disk-seeking mechanisms, and (6) the tape channels. Such programs share space in (1) core storage, (2) disk storage, and (3) tape storage. For the time being we shall assume that individual tape units are not shared, since this would normally entail too much overhead activity in the form of tape searching.² We shall also assume that a program in the execution phase is assigned an area in core storage, and if the program needs it, an area in disk storage. The assignment of core, disk, and tape space is treated as a permanent and private reservation to be held by a program until completion or termination of the program by the supervisory control. For this particular configuration of equipment the number of programs which may be operating concurrently at any instant is five, if tape rewinds are excluded from consideration. Note that this does not limit the number of programs which may be in the execution phase together.
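One plausible tally behind the figure of five concurrent programs is sketched below. The decomposition is our own reading of the configuration rules (one transmission per channel, plus an overlapped seek on the second disk unit), not an enumeration given in the text.

```python
# Illustrative tally (our assumption, not the author's enumeration) of the
# five activities that may proceed at one instant in the Fig. 1
# configuration, tape rewinds excluded.

facilities = {
    "processing unit": 1,
    "disk channel (one unit transmitting)": 1,
    "seek on the other disk unit": 1,
    "tape channel 1 (one unit transmitting)": 1,
    "tape channel 2 (one unit transmitting)": 1,
}

concurrent_activities = sum(facilities.values())
```
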
4.3 The Pending Workload
The machine must be made aware of the nature of its pending workload if it is to attempt to minimize the time for disposing of the workload. Records representing run requests are accordingly read into core storage from a tape unit reserved for program loading. Normally, accompanying each run-request record on tape are records containing the corresponding program and data. If, however, the program is retained in disk storage on a more or less permanent basis (presumably because it is so frequently used), then the run-request record on tape may be accompanied by data records only.
² The overhead activity may be justifiable for extremely fast tape units.
Until a run-request record enters core storage, the machine cannot be said to be aware of the corresponding run. Hence, the pending workload, as far as the machine is concerned, consists of all those runs for which (1) requests have entered core storage, and (2) the corresponding programs have not yet been selected for preparation and execution.
4.4 The Four Phases
Each run from the pending workload is taken through four phases: (1) scheduling, (2) preparation, (3) execution, and (4) rewind-demount. Let us follow a particular run through this sequence. In the scheduling phase, the run request is examined by the scheduling routine to determine if, in view of the current state of the machine, this is an opportune time to begin preparations for the execution of the run. Several examinations may take place before the scheduling routine finally decides to place the run in the preparation phase. At this point the scheduling routine allocates tape units, disk space, and core space to the program. A method for determining which units and storage areas are to be assigned will be described later. In the preparation phase the allocated areas are loaded and initial tape reels (if any) are mounted. Changing a tape reel by hand usually takes 30 or more seconds. Automatic reel changers are likely to take some 10 seconds. In either case, a fast computer can do a lot of work in periods of this magnitude. As soon as its core area is loaded, the given program may be brought into the execution phase, even though its preparation phase may not be completed yet. Of course, this implies that interlocks are provided to hold up any program in the execution phase which attempts to use a disk area or a tape unit which is not fully prepared. These interlocks are also useful in handling in-execution reel changes associated with multireel files. The activity of a program on the processing unit may be terminated altogether for a given run either because the program itself requested it or because the supervisory control demanded it. In either case, the end of processing is not necessarily coincident with the end of the execution phase. The program may have I/O operations either active or pending. Some or all of these may entail transmission of data and hence require use of the core reservation for this program. When all these data-transmission operations have been either completed or for some reason canceled, the core reservation may be released for use by the other programs. At this point, the program is said to reach the end of the execution phase. The only activities for this program which may still be unfinished are tape rewinding and demounting operations, and these constitute the rewind-demount phase.
Neither outputting nor compiling is treated as a phase. The assumption regarding output is that during an execution phase, a program places its final results on one or more output tapes, which are later interpreted by a satellite computer. As far as possible, a compiling run is treated just like any other run, the compiler behaving like a problem program, and the program to be compiled being treated as data.
4.5 Phase Concurrency
The time spent by any one program in each phase is significant enough to warrant concurrent treatment of the phases. We shall assume that, subject to the space limitations of the machine, any number of programs may be in each of the four phases. Consequently, if an observer were to take an instantaneous look at the entire system, he might see several programs in the scheduling phase (awaiting selection by the scheduling routine), several in the preparation phase, several in the execution phase, and several in the rewind-demount phase. Success in overlapping the preparation phase of a given program with the preparation and execution phases of other programs depends very largely on the tape-unit requirements of the program. If the number of tapes to be mounted for this program is less than or equal to the number currently unreserved, it may be possible to overlap the preparation phase completely with activities on other programs. A similar remark applies to the overlapping of the rewind-demount phase of a given program with the execution phase of other programs.
4.6 Task Concurrency
We now wish to examine the execution phase in more detail, and consider first a program P running alone. Suppose that, at some point, P calls for a record to be read into core storage from one of the tape units. While this record is being transmitted to core storage, the processing unit is free to continue servicing P. If P called for this record sufficiently in advance of the need for the information contained therein, then P would be able to continue to supply the processing unit with useful work throughout the transmission period. We then obtain a timing diagram as in Fig. 2.
Fig. 2. Timing diagram illustrating continuation of processing throughout the time interval needed to complete an I/O operation.
If, on the other hand, program P reaches a point beyond which further processing must await the arrival of the record in core storage, then P releases the processing unit pending completion of the transmission operation. If a supervisory program is present, P accomplishes this release by means of the pseudo-op WAIT, which not only passes control to the supervisory program, but also requests that P itself be placed in the NOT READY (for processing) state. Upon completion of the I/O transmission operation, the supervisory program may pass control back to P (Fig. 3) or, more generally, it may place P in the READY (for processing) state.
Fig. 3. Timing diagram illustrating continuation of processing for only part of the time interval needed to complete an I/O operation.
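The WAIT handshake just described can be sketched as follows. This is a minimal modern-Python illustration of ours, not the author's implementation; the class and method names are invented, and the channel activity is reduced to a stub.

```python
# Sketch of the WAIT pseudo-op: a program issues an I/O request, later
# issues WAIT when it cannot proceed without the data, and the supervisor
# holds it NOT READY until the channel reports end of operation.

READY, NOT_READY = "READY", "NOT READY"

class Supervisor:
    def __init__(self):
        self.state = {}        # program name -> READY / NOT READY
        self.awaiting = {}     # program name -> symbolic file awaited

    def read_request(self, prog, file):
        # In the real system this would initiate the channel operation;
        # here it is a stub, since only the state changes matter.
        pass

    def wait(self, prog, file):
        # The WAIT pseudo-op: release the PU and become NOT READY.
        self.state[prog] = NOT_READY
        self.awaiting[prog] = file

    def end_of_operation(self, file):
        # Channel completion: any program awaiting this file becomes READY.
        for prog, awaited in list(self.awaiting.items()):
            if awaited == file:
                del self.awaiting[prog]
                self.state[prog] = READY

sup = Supervisor()
sup.state["P"] = READY
sup.read_request("P", "FILE-A")   # P keeps computing meanwhile
sup.wait("P", "FILE-A")           # P releases the processing unit
sup.end_of_operation("FILE-A")    # channel signals completion; P is READY again
```

Note that end of operation makes P merely eligible to run; as the text says, the supervisor may pass control back to P or simply mark it READY for later selection.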
Figure 4 illustrates the behavior of a program A when run alone. Figure 5 shows what might happen if a second program B were executed concurrently with A. Several items are worth noting. First, queues³ are formed on the processing unit and channel 3. Second, program A no longer spends time in the NOT READY state due to I/O task A-1. This is because A is displaced from the processing unit by B for a period of time at least as long as A previously spent in the NOT READY state. Third, the delay experienced by program A in obtaining service from channel 3 and for I/O task A-2 results in A entering the NOT READY state; when run alone this was not necessary.
Fig. 4. Behavior of program A running alone.
Fig. 5. Behavior of program A when run concurrently with program B.
³ The queue discipline exhibited on the PU is time-limited FIFO.
As a consequence, we may state the following general rule: for
every I/O request issued by a program it is necessary for that program to be prepared to supply a WAIT pseudo-op, because the extent of delay in servicing the request is unpredictable by the programmer or compiler.
4.7 I/O Requests and WAIT Pseudo-Ops
All I/O requests are channeled through the supervisory control, partly because it may be necessary to queue these requests and partly to keep each program from damaging others (inter-program protection is treated in Section 4.8). Thus the compiled program contains no absolute READ, WRITE, etc., operations. In their place it contains the pseudo-ops READ REQUEST, WRITE REQUEST, etc., each of which consists of a call for the supervisor followed by a designation of the operation to be performed together with a symbolic identification of the file to be operated upon. Similarly, when a program needs to release the processing unit, it does not issue an absolute WAIT operation, but rather the pseudo-op WAIT. This again is a call for the supervisor followed by information as to what condition is being awaited. Three versions appear to be useful: (1) WAIT for specified symbolic file, (2) WAIT for any symbolic file, and (3) WAIT for specified number of symbolic files. In each case it is understood that only those symbolic files belonging to the program issuing the WAIT are pertinent. Each of these pseudo-ops calls for the supervisor to place the current program in the NOT READY state. The program is later returned to the READY state upon completion of (1) the specified task, (2) any task, or (3) the specified number of tasks. The only use made of the absolute WAIT operation occurs when the supervisory control finds that there are no programs in the READY state, nor any that can be made ready by transferring the data of a pending output request to an output buffer. Other pseudo-ops will be discussed later.
4.8 Executive Control
The supervisory control may be neatly divided functionally and structurally into two parts: the execution control, which supervises the execution phase; and the pre- and post-execution control, which supervises all other phases. The principal functions of the execution control are:
(1) Preservation of the status of arithmetic registers, mode settings, etc., when switching the PU from one program to another.
(2) Handling service requests, including queue formation and selection.
(3) Buffer service.
(4) Inter-program protection.
(5) In-execution relocation.
(6) Program suspension, dumping, and relocation.
(7) Communication with the operating staff.
(8) Space release and request.
(9) Time studies.
(10) Miscellaneous services.
4.8.1 Status Preservation
A processing unit which operates so that no status information (i.e., operand, result, indicator, or mode) is carried over within the unit from one instruction to the next is said to be “core-storage reflected.” Few, if any, processing units in use today are completely core-storage reflected. An example of a machine which is nearly so is the Honeywell 800 [IS]. Thus, if interruption is normally permitted at the end of executing any instruction (and we shall assume that it is), we are faced with the need to preserve all the PU status information for the interrupted program. At some later time when this program is resumed, it is usually necessary (though not always) to return control to the instruction which, in the usual flow sense, immediately follows the point of interruption. Immediately prior to this return of control, the PU status information is normally restored unchanged so that execution will continue as though no interruption had occurred. Note that an important item of PU status information is the instruction-counter contents. One might hope that it would be sufficient to provide one dump area per program for PU status information. Unfortunately, it appears necessary to provide two per program. The basic reason is that for I/O and arithmetic types of interruption, the supervisory program cannot, in general, provide a complete interruption service. After initial, general handling of an interruption by the supervisor, a program may supply additional, special handling by means of an interrupt routine of any length. Because of this indefinite length, the supervisor must be prepared to switch the PU from program A to some other at a time when an interrupt routine belonging to A is being executed.
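The argument for two dump areas per program can be made concrete with the following sketch. It is our own modern-Python illustration, not the author's design; the names and the dictionary used for "status" are invented for the example.

```python
# Sketch of the double dump area: one save area holds the status of the
# interrupted main program, and a second holds the status of the program's
# own interrupt routine when the supervisor must displace it in turn.

class Program:
    def __init__(self, name):
        self.name = name
        # [0]: main-program status, [1]: interrupt-routine status
        self.save_areas = [None, None]

    def save(self, level, status):
        self.save_areas[level] = status

    def restore(self, level):
        status, self.save_areas[level] = self.save_areas[level], None
        return status

a = Program("A")
a.save(0, {"instruction counter": 100})  # A is interrupted by an I/O signal
# A's interrupt routine runs, and is itself displaced by the supervisor:
a.save(1, {"instruction counter": 705})
# Resumption proceeds in the reverse order:
inner = a.restore(1)   # finish the interrupt routine first
outer = a.restore(0)   # then resume the main program
```

With only one area, the second save would overwrite the first and the main program's status would be lost.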
4.8.2 Service Requests
As the execution of a program proceeds, it may from time to time issue an I/O request in the form of a pseudo-op to the supervisory control. If, after determining which channel and unit are required, the supervisory control finds these are both available, it initiates the operation (or chain of operations) without delay. If the unit is not yet prepared (for example, a tape unit is involved and the required reel is not yet mounted), further service on the PU for this program is temporarily suspended by the supervisory program. This is an example of a program being placed in the NOT READY (for processing) state, although it did not specifically request such action. If either the channel or unit or both are busy, the request is placed in a queue which may be ordered, for example, by time of arrival. On some occasion, not necessarily the next one, when the channel and unit become available, the request is taken out of the queue and is serviced. Whenever program A is returned from the NOT READY state to READY for processing, we have, in effect, an implied request for service from the processing unit. This request is serviced without delay if the processing unit would otherwise be idle. If, however, program B were being serviced, the supervisory program selects a program from the processing unit queue, treating B as well as A as a member of this queue, and examines the situation in accordance with the queue-selection rule to determine which program should be serviced next. Selection rules for queues on the processing unit and I/O channels are discussed later.
4.8.3 Buffer Service
If a program requires a set of records to be read into core storage one by one in the order in which they are stored on tape or disk, a buffer area may be set aside which is large enough to hold two or more records. While the program is processing a record in one part of the buffer, the supervisory program can very often be reading subsequent records into other parts. The pseudo-op GET is used by the program to indicate a readiness to advance to the next record and discard the current one. The supervisory program permits this advance if the next record is already in core storage and prepares to refill the area occupied by the record which is now obsolete. If the supervisor has fallen behind in the buffer refilling activity and the next record is not yet available, the program is put in the NOT READY state for the time being.
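The GET discipline just described can be sketched as follows, purely as a modern-Python illustration of ours; the class, the three-record capacity, and the record names are invented for the example.

```python
# Sketch of GET with a multi-record buffer: the supervisor reads ahead
# into free slots; GET advances to the next record if it has arrived, and
# otherwise the program would be placed in the NOT READY state.

from collections import deque

class BufferedFile:
    def __init__(self, capacity=3):
        self.buffer = deque()
        self.capacity = capacity

    def refill(self, record):
        # Supervisor reads ahead while the program processes earlier records.
        if len(self.buffer) < self.capacity:
            self.buffer.append(record)
            return True
        return False                 # buffer full; refill must wait

    def get(self):
        # GET: advance to the next record and discard the current one,
        # or report that the program must go NOT READY.
        if self.buffer:
            return self.buffer.popleft()
        return "NOT READY"

f = BufferedFile()
f.refill("record 1")
f.refill("record 2")
first = f.get()       # record already in core: advance permitted
second = f.get()
third = f.get()       # supervisor has fallen behind: NOT READY
```

Each successful GET frees a slot, which is what lets the supervisor's read-ahead and the program's processing overlap.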
Accordingly, unlike the READ REQUEST, it is not necessary for the program to supply a pseudo-op WAIT for any pseudo-op GET issued. Buffered writing is handled in an analogous but complementary manner by the pseudo-op PUT. Buffering may also be applied to the reading of records from disk storage in some order other than that in which they are stored, provided the program is normally able to supply disk addresses well in advance of the need for processing the corresponding records. Buffered writing into nonconsecutive disk locations also requires that disk addresses be supplied, but not necessarily in advance of the need to write the corresponding records. Buffering should not be overlooked simply because there may be other programs in the execution phase concurrently. A relatively small investment in buffer space can impart an elastic quality to a program’s demands for service. Thus, if delay is experienced in obtaining service from one facility, the program may still be able to remain active on others.
4.8.4 Interprogram Protection
There are many ways in which one program, P, can cripple another, Q, if they are in the machine together. One obvious way is for P to modify an instruction or data word in the core or disk area assigned to Q. Another way would be for P to backspace a tape assigned to Q. Havoc might be created (and probably would be) if P transferred control to some arbitrary point of Q. In each of these cases, if Q is the supervisory program, the entire workload, current and pending, is placed in jeopardy. Protection measures are accordingly vital to the success of multiprogramming with independent programs. We shall see later that these measures may be extremely desirable when closely related programs are being processed concurrently. Two degrees of protection are readily identifiable: malfunction-proof protection, which is inviolable no matter what machine malfunctions or program errors occur; and program-proof protection, which can only be violated if machine malfunction occurs. Malfunction-proof protection appears to be unattainable in practice. For, if information were stored in a read-only memory, a malfunction might occur on read-out and hence yield an incorrect copy even though the original were unchanged. However, it is possible to achieve a level of protection arbitrarily close to malfunction-proof. At the present time it appears justifiable to go beyond mere program-proof protection only for certain military applications, and for protecting the copy of the supervisory program which is kept in disk storage for restoration in case of an emergency. Program-proof protection appears to be attainable rather readily and is ample to guard against accidental or fraudulent manipulation of one program by another. The question arises: should this degree of protection be extended to prevent a program from merely reading information which does not belong to it? If the applications require that commercial or national security be preserved, the answer is clearly yes.
Whatever the applications, read protection is a useful adjunct to write protection as a program debugging aid. The protection problem can be summed up as follows: each program is to be contained throughout its execution phase within the assigned core area(s), disk area(s), and tape units; and the supervisory program is to have access to any part of the machine. We now consider the protection problem in five parts: (1) references to core storage by the processing unit, (2) references to core storage by the I/O channels, (3) references to disk storage, (4) references to tape units, and (5) special operations reserved for supervisory control only. Note that protection of the contents of high-speed registers in the processing unit is provided by the status-preservation activity of the supervisory control (4.8.1).
4.8.4.1 PU References to Core Storage. Because of the extreme frequency of references to core storage by the PU, it is unreasonable to monitor these references by programmed inspection. Many hardware schemes have been devised. They may be classified by the following properties: (1) Resolution: what is the smallest block of core storage which can be protected? (2) Adjacency: can nonadjacent areas (not necessarily at the extreme ends of core storage) be simultaneously protected? (3) Treatment of potential violation: is an alarm given or the address silently transformed to some safe value? (4) Treatment of fetching: are data-fetching and instruction-fetching violations detected, and if so, are they identified separately from data storing? (5) Performance: is any penalty paid in performance due to the monitoring activity? Some examples follow. In STRETCH, references by the processing unit to core storage are monitored to see if the effective address falls within a certain fixed area or within a second variable area. If so, the reference is suppressed and an interruption occurs. The boundaries of the variable area are specified by two addresses stored within the fixed area. These addresses may be changed only if the interrupt system is disabled. The resolution of this scheme is one word. Nonadjacent areas cannot be simultaneously protected (unless they are at the extreme ends of core storage). A potential violation causes an interruption and, if necessary, the illegal address may be reconstructed.
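The STRETCH-style check can be sketched as a simple bounds test. This is our illustration in modern Python, not the machine's logic; the particular boundary values are invented for the example.

```python
# Sketch of STRETCH-style address monitoring: every PU reference is
# compared against a fixed protected area and a variable area whose
# boundaries are held in the fixed area; a reference into either is
# suppressed and causes an interruption.

def check_reference(address, fixed, variable):
    """fixed and variable are (lower, upper) word-address bounds, inclusive."""
    for lower, upper in (fixed, variable):
        if lower <= address <= upper:
            return "suppress and interrupt"
    return "permit"

FIXED = (0, 63)          # hypothetical area holding the boundary words
VARIABLE = (4096, 8191)  # boundaries changeable only with interruptions disabled

r1 = check_reference(10, FIXED, VARIABLE)     # inside the fixed area
r2 = check_reference(5000, FIXED, VARIABLE)   # inside the variable area
r3 = check_reference(200, FIXED, VARIABLE)    # outside both: permitted
```

Because the test is a pair of comparisons per reference, it can be done in parallel with address translation, which is consistent with the text's remark that no performance penalty is paid.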
Protection is provided on fetches as well as stores, and the interruption indicators distinguish between three cases: instruction fetch, data fetch, and data store. This feature is useful not only for debugging but also as an aid to dynamic core-storage allocation. No performance penalty is paid in STRETCH for the address monitoring. The protection scheme in ATLAS has a coarser resolution, 512 words, but nonadjacent blocks of this size can be simultaneously protected.
4.8.4.2 I/O References to Core Storage. Usually, references by the I/O channels to core storage are not monitored by hardware, due chiefly to the additional expense which would be incurred for each channel if one permitted each channel to be servicing a different program. If each I/O request specifies a single core-storage area, only a small amount of information (a single word or less) needs to be inspected. This inspection can be performed just prior to the servicing of the I/O request. In this way the requesting program gets no opportunity to modify the area definition after its inspection and before it is dispatched to the appropriate channel. A single core-storage area per I/O request is very common due to the widespread adoption of GET-PUT buffering. However, one important class of programs, namely sorting programs, tends to use I/O requests of the scatter-read, scatter-write type involving many core-storage areas per I/O request. In such cases inspection and prevention of modification of the area definition words become more space- and time-consuming.
4.8.4.3 References to Disk Storage. Throughout the execution phase of a program, the supervisory control retains information concerning the area(s) of disk storage assigned to this program and the corresponding symbolic file name(s) which the program uses in lieu of disk base addresses. When the program issues the pseudo-op SEEK REQUEST, it supplies a symbolic file name and a relative arc number. The supervisor checks that these items represent a disk address to which this program is allowed access. Later when a READ REQUEST or WRITE REQUEST is issued, the control word is examined to insure that the entire disk area affected lies within this program’s assignment. When reading from or writing into consecutively located sets of arcs, a program need not issue a SEEK REQUEST for each READ REQUEST or WRITE REQUEST. Accordingly, for protection (and for other reasons, as we shall see later) the supervisory control must keep track of the last position reached on behalf of this program in seeking, reading, or writing.
4.8.4.4 References to Tape Units.
When referring to a tape unit for reading or writing, a program uses a symbolic file name. The supervisory control checks that there is an absolute tape unit assigned to this program and file name, and if so, proceeds to service the request, other conditions permitting. An additional protection measure being adopted rather widely nowadays is that of requiring all tapes to possess an initial record identifying the reel. The supervisory control may accordingly check, whenever a tape reel is mounted, that it is the one requested and that it has been put on the proper unit.
4.8.4.5 Restricted Operations. Certain machine operations must be reserved for the sole use of the supervisory control. Included in these restricted operations are those which directly control input-output and the interruption system. If a problem program attempts to issue one of these restricted operations, an interruption should occur in order that the supervisor may immediately gain control. In particular, the problem program must not be allowed to disable the interruption system entirely; it would, however, be permissible for it to turn off the arithmetic class of interruptions, because this action does not affect any other programs.
4.8.5 In-Execution Relocation
Suppose that, before a program enters the execution phase, it is converted to absolute form (except for references to disk and tape, which remain semisymbolic). Now, suppose that, at some arbitrary stage of its execution, we desire to move this program to some other part of core storage. It is most unlikely that it will be possible to relocate that program correctly. The reason for this is that, unless some very constraining conventions have been adopted, there is no way of deciding whether the index registers, accumulator, and other registers contain location-dependent or location-independent information. Why should it be necessary or desirable to relocate a program at some arbitrary stage of its execution? It is normally neither necessary nor desirable for short runs. For long runs, however, any one of the following possibilities may be encountered. First, a lengthy program may be scheduled to operate on a standby basis; that is, it is normally kept with its associated data in disk storage. From time to time, when the level of activity in the machine is getting low, this standby program is called into core storage and its execution is resumed. Then later when, for example, the preparation phase of some other program or an in-execution reel change has been completed, the standby program is suspended and dumped back into disk storage. Each time the program is recalled from disk storage it may have to go into a new position in core storage, because its former position may now be occupied by other programs.
Second, a lengthy program not being operated on a standby basis may nevertheless have to be dumped in disk storage, because some fault may have become apparent in the program, and an attempt is to be made to diagnose the trouble without removing the program from the machine. One solution to this problem consists of providing in hardware a means of executing directly a program expressed in either relative or differential form. Thus, in the relative case every address which is to be used to refer to core storage, be it an instruction fetch, data fetch, or data store, is incremented by the contents of a register containing the base address of the program. In the differential case, the instruction counter supplies the
increment. Modification by the base address fits in nicely with a single-area core-storage protection scheme. The Atlas computer goes a step further and provides the ability to execute a program in semisymbolic form.
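In modern notation, the relative (base-register) scheme might be sketched as follows; the word-addressed core list, the single-area bounds check, and all names are illustrative assumptions, not a description of any particular machine.

```python
# Sketch of base-register (relative) addressing: every core-storage
# reference is incremented by the program's base address, so the program
# can be moved by copying its words and changing one register.

class RelocatedProgram:
    def __init__(self, core, base, length):
        self.core = core          # shared core storage (a list of words)
        self.base = base          # base-address register for this program
        self.length = length      # extent, for the single-area protection check

    def fetch(self, relative_address):
        if not 0 <= relative_address < self.length:
            raise MemoryError("protection violation")  # interrupt to supervisor
        return self.core[self.base + relative_address]

    def store(self, relative_address, word):
        if not 0 <= relative_address < self.length:
            raise MemoryError("protection violation")
        self.core[self.base + relative_address] = word

    def relocate(self, new_base):
        # Copy the program's words to the new area and update the base
        # register; all relative addresses inside the program stay valid.
        words = self.core[self.base:self.base + self.length]
        self.core[new_base:new_base + self.length] = words
        self.base = new_base

core = [0] * 32
p = RelocatedProgram(core, base=4, length=8)
p.store(0, 99)
p.relocate(20)
```

Note how the protection check and the relocation mechanism share the same pair of quantities (base and extent), which is the sense in which base-address modification "fits in nicely" with single-area protection.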
4.8.6 Program Suspension, Dumping, and Restoration
It is a simple matter to bring a program to a complete halt without regard for the problem of resuming its operation; one merely stops its activity abruptly on all facilities which are currently servicing it. If, however, execution of the program is to be resumed at a later time with a minimum of effort, any I/O activities currently in progress should be allowed to come to completion in the normal way. Each completion may generate an implied request for PU service. All of these requests must be stacked rather than serviced; otherwise, the program is not effectively halted. Thus, for each program we need to be able to stack interrupts for as many channels as the program uses during the execution phase. Note that this interrupt stacking is invoked not only for a program which has been put in the NOT READY state in order to bring it to a halt, but also for each program in the READY state, so long as the selection rule for the PU queue keeps such a program from being serviced by the PU. In addition, as a convenience to the programmer, he may be provided with the pseudo-ops STACK I/O INTERRUPTS and RELEASE I/O INTERRUPTS, which again make use of the interrupt stacking feature. If a program A and its associated data are to be temporarily dumped into disk storage and later restored, the dumping should not be initiated until all data transmission has been successfully completed for all of A’s pending and active I/O operations. Restoration and resumption of execution are then quite straightforward.
4.8.7 Communication with the Operating Staff
A small number of different types of messages, all simple in character, will suffice for communication between the supervisory program and the operating staff. An I/O typewriter proves to be quite adequate as a terminal device for this purpose. Consider, first, messages originating from the supervisory program.
These should include tape mounting and demounting instructions, reports of machine malfunctions and program error, indications of overrun of a program, and reports of major items of progress (for example, the completion of a run). Certain types of outgoing messages are desirable as responses to incoming ones. For example, the instruction “repeat last message” is useful if the last message sent in was unintelligible to the supervisor or if an error was detected in transmission of the message. The outgoing message “z done” may also be helpful as an indication that the supervisory program has completed action on an incoming message marked z.
Messages from the operating staff to the supervisory program should include (1) instructions to modify the current schedule (in case very urgent work comes to hand); (2) notification regarding units being taken out of service for repair and units being put back into service; and (3) requests for diagnostic information about the supervisor itself. A message of the type “z done” may prove useful for transmission to, as well as from, the supervisory program. Signalling the completion of a tape-mounting operation is best done at the tape unit itself.
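A minimal sketch of this two-way typewriter traffic, with invented message texts and a stub supervisor, might look like this:

```python
# Sketch of supervisor/operator messages over the I/O typewriter:
# incoming messages carry a mark; the supervisor answers "repeat last
# message" when the input is unintelligible, and "<mark> done" when it
# has completed action on the message.  All formats are invented.

class StubSupervisor:
    def __init__(self):
        self.log = []
    def modify_schedule(self):
        self.log.append("schedule modified")
    def unit_out_of_service(self):
        self.log.append("unit withdrawn")

def handle_message(mark, text, supervisor):
    actions = {"modify schedule": supervisor.modify_schedule,
               "unit out of service": supervisor.unit_out_of_service}
    action = actions.get(text)
    if action is None:
        return "repeat last message"   # garbled or unknown request
    action()
    return f"{mark} done"              # completion acknowledgement

sup = StubSupervisor()
reply_ok = handle_message("z", "modify schedule", sup)
reply_bad = handle_message("q", "xqzzy", sup)
```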
4.8.8 Space Release and Request
If a program reaches a point in its execution phase at which it no longer needs some portion of the space assigned to it, it may issue the pseudo-op FREE, and designate symbolically the space it is willing to give up. Since the execution control decodes all pseudo-ops issued to the supervisor, it decodes the FREE pseudo-op in particular. It then calls for adjustment of corresponding allocation entries in the control table. This may in turn call for the scheduling routine, which is part of the pre-execution control. There is no particular difficulty about freeing a tape unit. Freeing disk space and core space, on the other hand, can under certain circumstances be meaningless, because the space allocation scheme is unable to exploit the released space. Here, we must remember that the disk and core allocation schemes may be heavily constrained by the respective protection devices or procedures. For example, release of the middle 1000 words of a 3000-word core-storage block is useless unless the core-storage protection scheme permits the simultaneous protection of nonadjacent areas. Similar remarks apply to in-execution requests for additional core or disk space. This subject will be discussed further in Section 5.2.
4.8.9 Time Studies
In many computer installations, particularly those concerned with problem solving with the programmer off-line, users are requested to supply an estimated time for each run as part of the corresponding run request. This estimate is used by the operating staff to limit the amount of machine time used by the program. The intent is to avoid undue waste of machine time due to faulty programs. Accounting for machine use is normally handled by clocking each program into and out of its execution phase. When independent programs are being run concurrently, the time spent by a program in the execution phase ceases to be a good guide either for terminating runs before completion or for accounting purposes. This is due to the fact that any program is likely to experience delay waiting for other programs to relinquish time-shared facilities. Therefore, the duration of the
execution phase of any program is, in general, dependent upon the program mix. A measure is needed which, for a given run, is independent of its core, disk, or tape assignment, and also of the activities of any program with which it may be executed. The PU service time, defined as the total time spent by the PU servicing the program in question, possesses this property of invariance. The service time (similarly defined) for each tape unit assigned to the program and for the disk channel are also invariant with respect to allocation and program mix. In contrast, the service time for any particular tape channel or disk-seeking mechanism is highly dependent upon the tape and disk allocation, respectively; in addition, the observed seeking service times are highly dependent upon program mix. Any or all of the service times which are invariant with respect to allocation and program mix may be used as factors in an accounting formula, along with space factors such as the amount of core, disk, and tape space reserved for the program. The PU service time is likely to be the most suitable means of determining when an uncompleted run is to be terminated. The accumulation of both time and space statistics for each run yields information about an installation’s workload which is invaluable for determining the kind of scheduling and allocation best suited to this workload.
4.8.10 Miscellaneous Services
A supervisory program may provide many services which are desirable if only one program at a time is permitted to be in the execution phase. Some of these services may become vital if two or more programs are executed concurrently. Examples of such services are automatic recovery from tape errors and reporting certain types of program errors. One highly desirable service is associated with the output of final results, namely, the use of a shared output tape.
Whenever a program is ready to put out a record consisting of final results, it calls for the supervisory program to write this record on a tape designated for shared output. The supervisory program appends to the record a label identifying the program which created it. Accordingly, records created by different programs may follow one another on the tape, and a satellite computer may be used to select all records pertaining to any one program in a single pass of the tape.
5. The Optimizing Problem
Ideally, the optimizing problem might be formulated as follows. We are given a computer system equipped with space-shared and time-shared facilities. A set of runs (programs and associated data) is to be executed so
as to minimize the time for the whole set. The space and time requirements, including the pattern of demands on the time-shared facilities, are given for each program executed alone with the associated data. The time-minimizing algorithm is itself executed on the computer. Its own space requirements and pattern of behavior, together with the time consumed in its (possibly repeated) execution, must be taken into account in the minimizing procedure. This ideal problem appears to be unsolvable. The practical problem is beset with additional difficulties. First, the behavior of a program may depend very heavily upon the data associated with it for a given run. Hence, observations of its behavior on past runs may be quite unreliable as a guide to its behavior on subsequent runs. Second, the activity of the supervisory program may interfere in a relatively unpredictable way with the activity of the programs being supervised. This interference may be felt on the disk and tape channels as well as on the processing unit. Third, considerable variation may be expected in the speed with which tape-handling operators respond to reel-changing instructions. Fourth, if runs are taken out of their arrival sequence, the space and time overheads are difficult to predict, since they are dependent on the arrival sequence itself, as well as on the characteristics of each run to be done. Fifth, delays which may be incurred in recovery from machine malfunctions of various types are also virtually unpredictable. These practical difficulties suggest the adoption of an empirical approach in which heavy emphasis is placed on short-range, rather than long-range, decision-making. In an article entitled “Multiprogram Scheduling” [8], the writer described an approach which might be used on either a short- or long-range basis. This approach has now been adapted by R. H.
Ramey and the writer in order to include the special aspects of tape scheduling and allocation, and to emphasize short-range decision-making more heavily. Before proceeding to describe various approaches, the terms “scheduling,” “allocation,” and “queueing” need clarification.
5.1 Scheduling, Allocation, and Queueing
By “scheduling” we mean the determination of which run is next to enter the preparation phase and at what stage of the proceedings this entry is made. “Allocation” refers to the determination of which parts of core and disk storage, and which tape units, are to be assigned to a program under consideration by the scheduling routine. The term “queueing” is used primarily to refer to the set of queue selection rules (or queue disciplines) applied to the servicing of demands arising from programs in the execution phase. Note that in some programming circles the term “scheduling” has been used rather oddly to denote nothing more than queueing in the execution phase.
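One concrete queue discipline of the kind meant here, for the PU queue, might be sketched as follows: priority order with first-come-first-served tie-breaking. The entry format and the priority field are assumptions for illustration, not part of the system described in the text.

```python
# Sketch of a queue discipline for the PU queue: among programs awaiting
# PU service, select the lowest priority number first, breaking ties in
# arrival order.  A monotone sequence number preserves FCFS among equals.
import heapq

class PUQueue:
    def __init__(self):
        self._heap, self._seq = [], 0

    def stack(self, program, priority=0):
        heapq.heappush(self._heap, (priority, self._seq, program))
        self._seq += 1

    def select(self):
        # The queue selection rule: pop the next program to be serviced.
        return heapq.heappop(self._heap)[2] if self._heap else None

q = PUQueue()
q.stack("A", priority=1)
q.stack("B", priority=0)
q.stack("C", priority=0)
order = [q.select(), q.select(), q.select()]
```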
We may now identify three principal modes of operation, according to the information assumed to be available for each run: (1) the nonscheduled mode (no information); (2) the space-scheduled mode (space information only); and (3) the space-time scheduled mode (space and time information). The fourth logical alternative, the time-scheduled mode, seems to be of little practical interest. A cursory description of the three principal modes now follows.
5.2 Nonscheduled Mode
In the nonscheduled mode, space reservations are acquired by each program only as the need for space becomes vital to the continuation of the program’s loading or execution. For practical reasons, a demand for more space in core and disk storage results in the assignment of a block of locations large enough to keep the assignment overhead reasonably low and consistent with the resolution of the protection scheme. Whenever a request arises for core space, disk space, or a tape unit, and the space currently available is insufficient, the program issuing the request is, at the very least, held up until the required space is available. It may be necessary to remove the program from the machine. This case arises when, for example, execution of two or more programs has been started and it is subsequently discovered that their combined tape-unit requirements exceed the number available on the machine. A space overload might also occur in disk or core storage. A disk overload is about as awkward to handle and as easy to avoid as a tape overload. A core overload, however, may be readily handled by the supervisory program, if programs and their associated data are not location-dependent at any time during their execution phase. The nonscheduled mode is most undesirable if the workload entails a significant amount of tape handling. It provides no assurance that reel-changing activities will be handled concurrently with productive processing of other parts of the workload by the computer system. Note that, in any event, optimizing is confined to the choice of suitable queue disciplines for the execution phase. The nonscheduled mode has very little application in batch multiprogramming. It is discussed here only for the purpose of exhibiting an extreme approach, and thereby putting some more reasonable approaches in their proper perspective.
5.3 Space-Scheduled Mode
We shall assume that the space requirements for a run are completely specified in the corresponding run request, except for compile-and-execute
runs. In the latter case the compilation phase will generate a run request with the appropriate space information to be used for the execution phase. Time information is assumed not to be available. One simple approach is to service run requests in the order of their arrival. Upon reading a run request into core storage, the scheduling routine compares the tape space required with that available. If all the tape space needed is available, tape units are allocated and the execution control is notified that tape-handling instructions must be issued to the operators. The scheduler then calls for allocation of the required disk space. If space is available and there is a block of initial disk data to be loaded, loading is initiated. Finally, when tape and disk requirements have been met, an attempt is made to find a sufficiently large block of core storage for the run. If successful, the program is loaded into the assigned area and adjusted if necessary to the form appropriate for this particular place in core storage. Now, of course, any of the comparisons of space needed and space available may reveal a deficiency in that available. The preparation phase for this program is then stalled until sufficient space of the kind needed becomes available. Release of space reservations normally results from a program reaching the end of its execution phase. Possible elaborations of this approach include permitting piecemeal allocation and preparation of tape units whenever any of the needed tape space is available. Also, a program requiring no tape-unit assignment (for example, one that uses the shared output tape for its final results) might be allowed to overtake another program in the preparation phase if this other program were held up for lack of tape units. As we shall see when we deal with tape allocation, some saving in the number of reel changes can usually be achieved with this simple approach.
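The arrival-order check just described — tape units, then disk space, then core — can be sketched roughly as follows; the resource quantities and the request format are invented for illustration.

```python
# Sketch of the simple space-scheduled mode: run requests are serviced
# in arrival order, and a run enters the preparation phase only when
# its tape, disk, and core requirements can all be met at once.

def try_prepare(request, free):
    """Attempt to admit one run.  Returns True and debits `free` if all
    three space requirements are met; returns False, leaving `free`
    unchanged, if any resource is short (the run then stalls)."""
    needed = [("tape_units", request["tape_units"]),
              ("disk_words", request["disk_words"]),
              ("core_words", request["core_words"])]
    if any(free[kind] < amount for kind, amount in needed):
        return False          # preparation stalls until space is released
    for kind, amount in needed:
        free[kind] -= amount
    return True

free = {"tape_units": 6, "disk_words": 100_000, "core_words": 32_000}
queue = [{"tape_units": 4, "disk_words": 40_000, "core_words": 24_000},
         {"tape_units": 4, "disk_words": 20_000, "core_words": 4_000}]
results = [try_prepare(r, free) for r in queue]
```

Here the second request stalls for lack of tape units even though disk and core space remain, which is exactly the situation the piecemeal-allocation and overtaking elaborations above are meant to relieve.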
Of course, the composition of program mixes in the execution phase is quite uncontrolled. Nevertheless, this type of operation may be profitable. Is there any possible advantage in servicing run requests out of their order of arrival? In the absence of time information for each run, only two potential gains are apparent. First, a further reduction in reel changes may be achieved. Second, the average number of programs run concurrently may be increased. An example will illustrate this latter point. Suppose there are twenty runs to be executed, all of approximately equal length, and the run requests arrive in the following sequence: ten runs, each requiring three-quarters of the total core storage available, followed by ten runs, each requiring one-quarter. If the runs are taken in their order of arrival, the average number of programs run concurrently is approximately 1.5, while an average of 2 can be achieved if they are taken out of this order. Now, an increase in the average number of programs run concurrently does not necessarily result in a reduced time to execute the entire set of runs, but
the statistics collected by a particular installation may show that such a reduction is likely. To service runs out of the sequence of arrival of the corresponding requests, a more sophisticated scheduling routine is required, and provision must be made for storing the programs and initial data in a more readily accessible medium than tape. Thus, an area in disk storage which is termed the staging file is set aside for this purpose. We may expect the most frequently used programs to be kept permanently in disk storage until obsolete. Whenever any one of these is required for a run, it need not be loaded into the staging file, as it is already in an accessible spot. 5.4 Space-Time Scheduled Mode
It is unrealistic to assume that detailed information can be made available about the expected pattern of use of each of the time-shared facilities by each program. It is reasonable, however, for a large class of programs, to assume that each program’s over-all use of the time-shared facilities is given in approximate form. We have already observed that the supervisory program can readily accumulate for each run of a program the total PU service, disk-channel service, and service from each tape unit. In addition to these times, the supervisory program can generate an approximate (in some cases, exact) value for the execution time of a program which would have been observed had that program been run alone. A convenient way to characterize a program’s expected use of the machine is therefore as follows: (1) the space requirements; (2) the execution time if run alone; and (3) time fractions for the PU, disk channel, and each tape file; where the time fraction r_ij of program i on facility j is equal to the corresponding service time p_ij divided by the execution time t_i if run alone:

r_ij = p_ij / t_i
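With invented service times, the computation of the time fractions r_ij = p_ij / t_i might look like this:

```python
# Time fraction r_ij = p_ij / t_i: the fraction of program i's
# stand-alone execution time t_i spent being serviced by facility j.
# The figures below are invented for illustration.

def time_fractions(service_times, execution_time_alone):
    return {facility: p / execution_time_alone
            for facility, p in service_times.items()}

# Program i: 600 s execution time if run alone; service times observed
# by the supervisory program on each facility, in seconds.
r = time_fractions({"PU": 450.0, "disk_channel": 150.0, "tape_file_1": 300.0},
                   execution_time_alone=600.0)
```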
The values of p_ij (j = 1, 2, ...) and t_i which are produced by the supervisory program in one run of a program i may be supplied to the programmer for use on a subsequent run. If the program is expected to make the same gross demands for time as before, these values may be fed back to the scheduling routine unchanged. If change is expected, the programmer is called upon to make rough estimates of new values. For example, the programmer might reasonably be asked to select one of the five values 0, 1/4, 1/2, 3/4, 1 for an expected time fraction. Note that, if the programmer had to provide his own assignment of tape units, he could not do this intelligently without some knowledge of the time fractions for each of his tape files. Having considered a possible characterization for the time requirements
of a program, we may now discuss how such information can be used. In scheduling it may be used to select mixes of programs which are likely to prove more profitable. In tape allocation, it may be used to achieve more uniform loadings of the tape channels. (Incidentally, this latter use would be completely unnecessary if the tape channels were equipped with a crosspoint switch, so that any tape unit could communicate with core storage over any nonbusy tape channel.) As in the space-scheduled mode, the order in which run requests are serviced needs to be resolved. In the space-time scheduled mode, adherence to the sequence in which run requests arrive in core storage would appear inconsistent with the provision of time information. Accordingly, as each run request comes in, the program and initial data (if any) which accompany the request are normally transmitted to the disk staging file, later to be selected for preparation and execution when the scheduling routine considers the prevailing conditions to be opportune. An algorithm for scheduling in the space-time mode is discussed later. First, we consider the special problems of allocating space in tape, disk, and core storage. In passing, we note that the choice of scheduling mode and the more basic choice of executing programs one at a time or concurrently must be based (at least in part) upon the nature of the workload. For example, if the workload consists of runs, all of which have a PU time fraction of unity, then it would be pointless (in the batch environment and assuming a single-PU computer system) to place more than one program at a time into the execution phase. In this case, concurrency between the execution phase of one program and the preparation phase of another might still prove profitable.
5.5 Tape Allocation
The choice of method for allocating tapes to tape units can have a very considerable effect upon the throughput rate in both sequential and multiprogrammed operation. Well-planned allocation can result in a reduction in the number of reel changes. It can also provide more uniform loading of the tape channels, with the result that channel bottlenecks are avoided or reduced in magnitude. To understand the problem we must consider the various states which may be assumed by tapes and tape units. Three initial states, u, v, w, and three final states, U, V, W, are defined for tapes. To treat the initial states, suppose a program is about to use a tape for the first time in a run. If the information on this tape is of no significance to the program, the initial state of the tape is u. Otherwise, the initial state is either v or w. It is w if, after the significant information was recorded, the tape was left mounted on the tape unit for use by this program. It is v if the tape was demounted and the information saved off the machine.
To treat the final states, suppose that a program has just finished using a tape. If the information on this tape is now obsolete, the final state of the tape is U. Otherwise, the final state is V or W. It is W if the tape is left mounted on the tape unit so that its information content may be used in some other run. It is V if the tape is demounted and saved for some other run. Note that a program may transform a tape from any one of the three initial states to any one of the three final states. The popular term for a tape with initial state u and final state U is “scratch.” Five states A, E, U, V, W are defined for tape units. These states are pertinent when the allocation routine is being executed. If a unit is already assigned, it is in state A. If not assigned and no tape is mounted, the unit is in state E. If, finally, a tape is mounted but the unit is not assigned, the unit assumes the same state as the final tape state (i.e., U, V, or W). The allocation of tapes to tape units may now be treated in terms of these five tape-unit states and the three initial states for tapes. The permissible couplings between tape-unit states and initial tape states are as follows: Eu, Ev, Uu, Uv, Vu, Vv, Ww. Couplings of type Uu and Ww require no tape handling at all. Couplings of type Eu and Ev require tape mounting only. All other types of couplings require tape demounting as well as tape mounting. The following allocation rules minimize the number of reel changes required to prepare a given run, assuming there are enough unallocated tape units to accommodate all the initial tape files for this run. First, dispose of all Ww couplings, which were predetermined. Second, make all possible Uu couplings. All remaining couplings necessarily involve tape mounting. Third, make all possible Eu and Ev couplings. Now, if there are any tapes left to be assigned, a full reel change is necessary for each.
Finally, therefore, make Uv, Vu, and Vv couplings as necessary for the remaining tapes. In each of the second, third, and final steps, any one of several tapes may be assigned to any one of a corresponding number of tape units. Whenever this is the case, the freedom of association is exploited to make the loading of the tape channels as uniform as possible.
5.6 Disk Allocation
Reservations of space in disk storage are conveniently divided into two classes: short term and long term. Short-term reservations are intended for the use of programs (and associated data) for which a run request has been received. Such reservations are normally cancelled upon completion or termination of the run. Long-term reservations may last for a day or, in some cases, for weeks.
The long-term reservations consist of seven large areas into which disk storage is divided. These areas are now listed with their uses.
(1) Systems File. This area is reserved for systems programs such as the supervisory program, compilers, and debugging aids. Only occasionally is it necessary to write into this area. At all other times this area is used on a read-only basis.
(2) Nonsystems Program File. This area is reserved for programs other than systems programs which are executed sufficiently frequently to justify their retention in disk storage. Changes in the contents of this area are likely to be considerably more frequent than in the systems file, due to new programs being introduced and old ones being deleted.
(3) Staging File. This area is required if run requests are being serviced in some sequence other than their arrival sequence. The information for each run, other than that already present in some other disk-storage area, must be transmitted into the staging file if the run is not selected for preparation and subsequent execution at the time of the arrival of the corresponding run request.
(4) Execution File. This area is reserved for the use of all nonstandby programs in the execution phase.
(5) Log File. This area is reserved for the current section of the operating log for the machine.
(6) Dump File. This area is reserved for emergency dumping of the entire contents of core storage, and therefore has a capacity equal to that of core storage.
(7) Standby File. This area is reserved for programs being operated on a standby basis. In general, programs in this area have reached some arbitrary stage of execution, and are accompanied by data and intermediate results.
The boundaries of all seven files are fixed on a long-term basis. Short-term reservations are made within the boundaries of the execution file, the staging file, and the standby file. For the sake of brevity, we discuss allocation within the execution file only.
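The seven-way division might be recorded in a table of long-term boundaries; the arc figures below are invented, with only the dump file constrained to equal core-storage capacity, as the text requires.

```python
# Sketch of the long-term division of disk storage into the seven fixed
# areas listed above.  All boundaries (in arcs) are invented; only the
# dump file's size is fixed, to equal the assumed core-storage capacity.

CORE_CAPACITY_ARCS = 500   # assumed size of core storage, in arcs

disk_layout = {
    "systems_file":            (0,      2_000),   # read-only in normal use
    "nonsystems_program_file": (2_000,  5_000),
    "staging_file":            (5_000,  8_000),   # out-of-sequence run servicing
    "execution_file":          (8_000, 12_000),   # nonstandby programs in execution
    "log_file":                (12_000, 12_500),
    "dump_file":               (12_500, 13_000),  # must hold all of core storage
    "standby_file":            (13_000, 16_000),
}

def owner(arc):
    """Return which long-term file a given disk arc falls in."""
    for name, (lo, hi) in disk_layout.items():
        if lo <= arc < hi:
            return name
    raise ValueError("arc outside disk storage")
```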
Within the disk space allocated to a particular program it is desirable to permit the programmer to organize his data into separate files. Each of these files is identified by a symbolic address, rather than an absolute one, as mentioned in connection with disk protection (4.8.4.3). Initial states analogous to those defined for tapes may be defined for disk files. Thus, we consider a disk file belonging to a program which is about to enter the execution phase. If the information in the disk area assigned to this file is not required to be of significance to the program, the state of the file is u. If the program depends upon the loader to copy initial disk data
from the master input tape into the assigned area, the state of the file is v. Finally, if the program depends upon some program other than the loader to produce results which become the initial data for this file, the state is w. Allocation of disk space to the files belonging to a given program may be handled in many ways. We assume that the area assigned to any single file always consists of a set of consecutively addressed arcs. The allocation routine may or may not require that the files be placed adjacent to one another. Arguments for permitting scattered placement of a program’s disk files include the following. (1) The space in the execution region of disk storage may be better utilized because a tighter packing is achievable. (2) A file of results produced by one program can become a file of data for another without moving the information. (3) If time fractions (or equivalent information) are available for each file, more uniform loading of the disk-seeking mechanisms may be attainable through placing some files on the first disk unit and others on the second. These advantages of scattered placement are accompanied by the penalty of extra overhead activity in the allocation routine, and also the possibility of higher seek times, which will occur if the files of two programs are interspersed and one of the programs terminates, leaving the other to seek back and forth between widely separated files. More tests are necessary to determine if a program’s requirements for disk space can be satisfied. For simplicity of exposition, we assume that scattered placement of a program’s disk files is not permitted. Now, under these assumptions, the problem of allocating disk space to nonstandby programs entering the execution phase is essentially the same as that of allocating core space to such programs. We proceed to treat this problem in the next section.
5.7 Core-Storage Allocation
The problem to be treated first is that of allocating space in the execution region (Fig. 6) of core storage to nonstandby programs. Each program is assumed to require a single set of consecutively addressed words (or space units⁴). Such a set is called a block of core space. In this context, blocks may consist of any number of space units. The block allocated to a nonstandby program is held by the program until the end of its execution phase. The expected duration of the execution phase is initially assumed to be unknown or ignored.

⁴ It matters little whether the unit of space is a single word or many words.

Fig. 6. Organization of storage for batch multiprogramming. [Figure: master input tape and run-request file feeding core storage; programs and initial data, shared output, execution set; disk storage and tape storage.]

Because of the wide variation in the core-space requirements for different runs, it is desirable to allocate space so that as much as possible of the space which remains unallocated belongs to a single block. Therefore, whenever only x words are to be allocated from an available block of y words (y > x), the set of x words at one end or the other of this block is selected. The original block is accordingly divided into an allocated block of x words and an unallocated block of (y − x) words. Generally, when a new request for space is about to be considered, there exist several unallocated blocks separated from one another by allocated blocks (Fig. 7).

Fig. 7. Example of allocated and unallocated blocks of space in the execution region of core or disk storage.

Suppose that the allocation routine maintains a list of
unallocated blocks ordered by size. A request for x words is satisfied, if possible, by scanning the unallocated-block list for the smallest block which contains at least x words. When such a block B is found, an arbitrary choice is made between the two extreme positions in B, except when one of the boundaries of B happens to be a boundary of the entire execution region; in this case, the extreme position on this region boundary is selected. A scheme such as this, which ignores occupancy times, may be perfectly adequate if all occupancy times are comparable in magnitude, a common characteristic of the commercial real-time environment (3.1.5). In batch multiprogramming, however, wide variation in the occupancy times for different runs is normal. The resulting difficulty may be demonstrated by an example. If expected occupancy times are ignored, it is clearly possible for a very long program P to be given a position in core storage which is not on either boundary of the execution region (Fig. 8). As long as P occupies this nonboundary position, the largest block of space which can be made available is the larger of S1 and S2. If P were in a boundary position, a block as large as S1 + S2 could be made available.

Fig. 8. Fragmentation of unallocated space by a long program occupying a central position in the execution region.

MULTIPROGRAMMING

If the computer is capable of executing a program in relative form, a periodic relocation of long programs toward the boundaries of the execution region solves the problem. In the case of a computer capable of executing a program with semisymbolic addresses, the problem arises only if the space unit of the allocation scheme is not an integral multiple of the symbolic block size. For a computer which has neither a relative nor a semisymbolic addressing capability, we may solve the problem, at least partially, in the following way. Associated with each nonstandby program is a maximum occupancy time. When such a program is being considered for placement in an unallocated block B1 flanked by two allocated blocks B2 and B3, we require that the maximum occupancy time for this program be less than or equal to the larger of the remaining maximum occupancy times of B2 and B3. If the unallocated block B1 has a boundary in common with the execution region (i.e., it is flanked on at most one side by an allocated block), no time constraint is applied. Variations of this rule may be worth considering, depending upon the average number of programs sharing core storage. The allocation of core space to standby programs is somewhat simpler, because such programs normally retain their core reservations only for the short time intervals associated with rewinding and reel-changing activities. Thus, an allocation rule such as the first one mentioned in this section may be used. We observe that, in a machine which has neither a relative nor a semisymbolic addressing capability, standby programs may be constrained to operate in those core locations originally assigned to them. In such circumstances, it would appear wise to have at least one standby program assigned to one boundary of the execution region and at least one to the other.
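A minimal sketch of this best-fit rule in modern Python, for illustration only (the region bounds, the (start, size) block representation, and all names are invented for the example):

```python
# Best-fit allocation from a size-ordered free list, carving the request
# from one end of the chosen block; an end lying on a boundary of the
# execution region is preferred, as described above.

REGION_START = 0          # invented execution-region boundaries (words)
REGION_END = 1000

def allocate(free_blocks, x):
    """free_blocks: list of (start, size) kept sorted by size.
    Returns the allocated (start, size), or None if no block fits."""
    for i, (start, size) in enumerate(free_blocks):
        if size >= x:
            del free_blocks[i]
            if start == REGION_START:       # prefer a region boundary
                alloc, rest = (start, x), (start + x, size - x)
            else:                           # upper end (covers REGION_END too)
                alloc, rest = (start + size - x, x), (start, size - x)
            if rest[1] > 0:                 # return the remainder to the list
                free_blocks.append(rest)
                free_blocks.sort(key=lambda b: b[1])
            return alloc
    return None

free = [(0, 300), (500, 200)]
free.sort(key=lambda b: b[1])
print(allocate(free, 150))   # smallest fitting block is (500, 200) -> (550, 150)
```

Taking the allocation from an extreme position, rather than the middle, is what keeps the remainder in one piece.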
5.8 Advance Commitments
In Section 5 arguments have been presented for short-range, rather than long-range, scheduling. We now consider how short the scheduling range may usefully be, when programs are being serviced in an order which is not predetermined. Suppose the scheduling routine has determined that P is the next program to enter the preparation phase, even though sufficient space for P will not be available until the execution phase of at least one current program is finished. Such a decision by the scheduling routine is called an advance commitment. For practical reasons, it is particularly undesirable to call for tapes to be mounted for P, then cancel the commitment to P and call for these tapes to be demounted, not having used them.
The chief incentive for making an advance commitment to P is that it may be possible to complete the tape-mounting activity for P while the execution phase of the current programs proceeds. The chief difficulty in making an advance commitment to P is that the selection of P itself frequently requires a judgment to be made as to which of the current programs will be completed first. There are three special cases in which advance commitments may be made without difficulty. In the first case, the workload consists entirely of programs which behave very much as predicted (under- and overrunning being very rare). This case is sufficiently unlikely that we shall not discuss it further. In the second case, only one program Q is in the execution phase, none is in the preparation phase, and none of the pending programs can fit with Q or be profitably mixed with Q. In the third case, P is considered sufficiently urgent that its execution phase must be started as soon as possible without displacing any current programs, but it cannot be accommodated until one (or more) of these programs release space. A program may be placed in the urgent category if it remains in the scheduling phase an excessive length of time. Since the scheduling algorithm given in the next section will tend to delay programs with large space requirements in favor of programs with lower space requirements, it may be desirable to treat programs as urgent if they are delayed too long. Apart from these special cases, advance commitments are likely to prove unprofitable. Normally, therefore, scheduling is extremely short range and is based very heavily on the conditions which have actually developed, rather than projected conditions. The selection of P as the next program to enter the preparation phase is normally not made until its preparation can be initiated and taken to completion without waiting for release of space by any other program.

5.9 Short-Range Scheduling
Barring an emergency such as a unit becoming unserviceable, scheduling activity is required only when the execution phase of a run is completed (or terminated) or a new run request is read into core storage. In the first case (end of execution phase) the scheduling routine considers which run of all those pending would make the most profitable addition to the scheduled load. If successful in finding a profitable addition, it includes this run in the schedule and then searches for a profitable addition to the augmented schedule, and so on, until no further profitable additions may be made at this time. The preparation phase is then initiated for all selected runs. In the second case (run-request arrival) only one run need be considered; namely, the one whose request has just arrived. It is scheduled or
temporarily shelved according to whether it makes a profitable or unprofitable addition to the scheduled load. It would be most direct to identify the profitability of an addition to the schedule with the resulting time saved in disposing of the scheduled and pending workload. However, estimates of this time saving may be grossly in error due to (1) programs, especially long ones, overrunning or underrunning their estimated service requirements; and (2) delays suffered by each program in each mix being difficult to predict. These uncertainties suggest an approach in which profitability is treated as a simple function of the loads on the time-shared facilities. The load λ_j on a time-shared facility j may be measured by the sum of the time fractions for this facility over all the scheduled programs X (excluding advance commitments); that is,

   λ_j = Σ_{i∈X} r_ij.

Loads defined in this way must be normalized, since a time-shared facility cannot dispense more than one unit of service in one unit of time. We therefore define the normalized load as

   λ̄_j = λ_j / max {1, max_k (λ_k)}.
As an example, consider a machine with three time-shared facilities j = 1, 2, 3. Let there be three programs i = 1, 2, 3 with time fractions r_ij listed as follows:

            j = 1    j = 2    j = 3
   i = 1     3/4      1/4       0
   i = 2     1/4      1/2      1/3
   i = 3      1       1/2      1/2

If the set X of scheduled programs consists of program 2 only, we have the following loads:

   X = {2}        λ_1 = 1/4,  λ_2 = 1/2,  λ_3 = 1/3
                  λ̄_1 = 1/4,  λ̄_2 = 1/2,  λ̄_3 = 1/3

Programs 1 and 2 together yield

   X = {1, 2}     λ_1 = 1,    λ_2 = 3/4,  λ_3 = 1/3
                  λ̄_1 = 1,    λ̄_2 = 3/4,  λ̄_3 = 1/3

while programs 1, 2, 3 together yield

   X = {1, 2, 3}  λ_1 = 2,    λ_2 = 5/4,  λ_3 = 5/6
                  λ̄_1 = 1,    λ̄_2 = 5/8,  λ̄_3 = 5/12

Notice the effect of normalizing the loads in this last case.
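The loads and normalization of this example can be reproduced directly. A small sketch (exact fractions are used to avoid rounding; the `Fraction` type and function names are conveniences of the illustration):

```python
from fractions import Fraction as F

# Time fractions r[i][j] for programs i = 1..3 on facilities j = 1..3,
# as in the example above.
r = {1: [F(3, 4), F(1, 4), F(0)],
     2: [F(1, 4), F(1, 2), F(1, 3)],
     3: [F(1),    F(1, 2), F(1, 2)]}

def loads(X):
    """Raw loads: lambda_j = sum of r[i][j] over scheduled programs i in X."""
    return [sum(r[i][j] for i in X) for j in range(3)]

def normalized(lam):
    """Divide by max(1, max load): a facility cannot dispense more than
    one unit of service per unit time."""
    m = max(F(1), max(lam))
    return [x / m for x in lam]

print(loads({1, 2}))                  # loads 1, 3/4, 1/3 as in the text
print(normalized(loads({1, 2, 3})))   # normalized loads 1, 5/8, 5/12
```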
Associated with the set P of pending and scheduled programs are the total expected service times required for each time-shared facility. When normalized, these times become dimensionless fractions which may be used both as target loads for the corresponding facilities and as measures of the importance of attaining these target loads. Let a_ij be the expected service time required by program i on facility j. The total expected service time required from j by set P is given by

   W_j = Σ_{i∈P} a_ij,

and the normalized weight w_j is given by

   w_j = W_j / max_k (W_k).

We define the rating r_X of a given set X of scheduled programs by

   r_X = Σ_j w_j Δ(w_j − λ̄_j),

where

   Δ(w_j − λ̄_j) = 0 if w_j − λ̄_j ≤ 0,
   Δ(w_j − λ̄_j) = w_j − λ̄_j otherwise.

The Δ function is used in place of |w_j − λ̄_j| or (1 − λ̄_j) because it appears desirable neither to discourage nor to encourage loads greater than the target values. On the other hand, it is desirable to discourage loads less than the target values. Suppose a program has just ended and the set of programs remaining in the schedule is denoted X. Each of the pending programs which can be accommodated in the tape, disk, and core space available is rated as an addition to the schedule by tentatively including it in the set X to give a set X′ and computing the corresponding value of r_X′. That program is selected which gives the minimum value of r_X′, provided⁵ this value is less than the rating r_X of the unaugmented set X. Whenever the set X becomes empty, we may bypass all comparisons of space required and space available, and omit the rating procedure entirely. In such an event, we select the longest tapeless program if one is available; otherwise, we select the longest program from the pending workload. If a program Q is selected when the rating procedure is invoked, Q is included in the set X until its execution is completed or terminated. The space available is adjusted to reflect the inclusion of Q in the schedule. Now, a second attempt is made to find a pending program which can be profitably accommodated. Such attempts continue so long as they are

⁵ This provision may be advantageously omitted when the set X has only one member. If the membership of X is not increased beyond two by the current scheduling activity and the mix (R, Q) is less profitable than the original single program R, then R may be run with priority over Q on all facilities.
successful. Failure to select a program for inclusion in X causes the scheduling routine to become inactive until triggered once again by an end-of-execution phase or run-request arrival. This simple procedure is unfortunately inadequate without some modification, because it ignores the special properties of the disk and tape facilities. When two or more programs share use of a single disk, the resulting seeking activity may be considerably in excess of that which would have been experienced had these programs been run alone. Then again, the profitability of adding a given program to a given schedule may be highly dependent on the various allocations for its tape files which are currently attainable. The following procedure takes these considerations into account for a system with a single disk unit (equipped with just one seeking mechanism) and two tape channels. Extensions of this procedure to handle more than one disk unit or seek mechanism and more than two tape channels are readily apparent. Time fractions r_ij, loads λ_j, and weights w_j are relabeled as follows for clarity.

                     time fraction    load      weight
   PU                r_i,PU           λ_PU      w_PU
   DISK              r_i,DK           λ_DK      w_DK
   TAPE FILE f       r_i,TFf          *         *
   TAPE CHANNEL y    *                λ_TCy     *
   TAPE SET          *                *         w_TP

Note: in this table, "*" means "irrelevant."

The disk time fraction r_i,DK includes both transmission and that part of the seeking activity which would be experienced if program i were run alone. In addition to this, we require an estimate δr_i,DK of the extra seeking activity which is incurred when program i is run concurrently with another program which uses the disk. An approximate measure for this is δr_i,DK = m_i S / t, where (1) m_i is the total number of disk reads and writes requested by program i, excluding those for which program i explicitly requests seeks; and (2) S is the average seek time for the disk. Whenever a run request for a nonstandby program i arrives, the weights are updated as follows:

   W′_j = W_j + a_ij,    W′_max = max_j (W′_j),    w′_j = W′_j / W′_max.

The end-of-execution phase of a nonstandby program triggers a similar recomputation of weights, the service times for the departing program being subtracted instead of added. The primes on the w's are now dropped.
To rate a program Q as an addition to the set X of scheduled programs, we tentatively allocate the tape files (see Section 5.5) to determine the channel loads λ_TC1 and λ_TC2. The disk and PU loads are computed as

   λ_PU = Σ_{i∈X′} r_i,PU,    λ_DK = Σ_{i∈X′} r_i,DK,

where X′ is X augmented by program Q. We also require the induced disk load

   δλ_DK = Σ_{i∈X′} δr_i,DK if more than one program in X′ uses the disk,
   δλ_DK = 0 otherwise.

Loads are now normalized as follows:

   m = max {1, λ_PU, λ_DK + δλ_DK, λ_TC1, λ_TC2},
   λ̄_PU = λ_PU/m,   λ̄_DK = λ_DK/m,   λ̄_TC1 = λ_TC1/m,   λ̄_TC2 = λ_TC2/m.

Finally, the rating of program Q is given by

   r_Q = w_PU Δ(w_PU − λ̄_PU) + w_DK Δ(w_DK − λ̄_DK)
       + w_TP Δ(w_TP − λ̄_TC1) + w_TP Δ(w_TP − λ̄_TC2).

Note that the inclusion of the induced disk load in the determination of m, together with its exclusion from λ̄_DK, gives the desired effect of a potential reduction in the normalized load on the disk.
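The rating of a candidate program, including the Δ function and the treatment of the induced disk load, can be sketched as follows. This is an illustrative reconstruction; the parameter names and all numerical values are invented.

```python
# Rating r_Q = sum over facilities of w_j * delta(w_j - normalized load),
# where delta penalizes only loads below target, and the induced disk
# seeking enters the normalizer m but not the disk load itself.

def delta(d):
    # Neither reward nor penalize loads above target; penalize shortfalls.
    return d if d > 0 else 0.0

def rate(w, lam_pu, lam_dk, d_lam_dk, lam_tc1, lam_tc2):
    m = max(1.0, lam_pu, lam_dk + d_lam_dk, lam_tc1, lam_tc2)
    n_pu, n_dk = lam_pu / m, lam_dk / m      # d_lam_dk excluded here
    n_tc1, n_tc2 = lam_tc1 / m, lam_tc2 / m
    w_pu, w_dk, w_tp = w
    return (w_pu * delta(w_pu - n_pu) + w_dk * delta(w_dk - n_dk)
            + w_tp * delta(w_tp - n_tc1) + w_tp * delta(w_tp - n_tc2))

# Invented target weights and loads for the schedule X augmented by Q:
w = (1.0, 0.8, 0.5)
r_q = rate(w, lam_pu=0.6, lam_dk=0.8, d_lam_dk=0.4, lam_tc1=0.5, lam_tc2=0.3)
print(r_q)
```

With the induced seeking included, m exceeds 1 and every normalized load shrinks except the disk's own contribution, which is computed from λ_DK alone; this is the "potential reduction" noted in the text.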
5.10 Queueing
Four requirements for the queue disciplines used in batch multiprogramming are: (1) a tendency to keep all facilities heavily loaded, particularly the bottleneck facility; (2) the ability to handle urgent programs on a high priority basis; (3) the ability to expedite programs consuming large amounts of space, if space-time is a bottleneck; and (4) simplicity. To observe the first requirement, it is necessary to exploit the special
properties of the various time-shared facilities; for example, the ability of the processing unit to be interrupted. It is also necessary for the queueing scheme to be responsive to changing patterns of behavior on the part of the programs currently being coexecuted. The following scheme is likely to be found satisfactory in batch multiprogramming. On the processing unit, priority is given at queue selection time to that ready program which required the shortest burst of service from the PU when it last obtained such service. Each time a program relinquishes the PU or is displaced from it, the recorded length of the last burst is replaced. Consequently, the processing priority of a program is subject to continual change, depending not only upon the variability of its own behavior, but also on that of the other current programs. Queue selection is carried out when any of the following events occur. (1) A program being serviced by the PU becomes not ready for processing (for example, a WAIT pseudo-op is issued for an I/O operation which has not been completed). (2) A program, formerly not ready, becomes ready for processing (for example, an I/O operation being awaited is now completed). (3) A program being serviced by the PU has failed to relinquish it and has not been displaced from it for x units of time, where x is a fixed arbitrary limit imposed on the length of continuous PU service to a single program. At queue selection time, all programs ready for processing are considered to be in the PU queue, even the program serviced most recently (provided it is still ready). This queueing rule tends to expedite entry into the NOT READY state of those programs which have a tendency to enter this state. When in this state, these programs are utilizing the I/O equipment. Meanwhile, the PU is kept well loaded by programs which tend to (but are not allowed to) monopolize it. These latter programs may also use the I/O equipment concurrently with the PU.
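The shortest-last-burst rule can be sketched in a few lines (program names and burst figures are invented for the illustration):

```python
# PU queue rule described above: among ready programs, pick the one whose
# most recent PU burst was shortest. last_burst is replaced each time a
# program relinquishes or is displaced from the PU.

last_burst = {'P1': 12.0, 'P2': 3.5, 'P3': 7.0}   # last burst lengths (ms)
ready = {'P1', 'P2'}                              # P3 is awaiting an I/O

def select_next():
    """Queue selection: shortest last burst among ready programs."""
    return min(ready, key=lambda p: last_burst[p]) if ready else None

print(select_next())   # P2: its last burst (3.5 ms) was the shortest
```

I/O-bound programs, whose PU bursts are short, are thus hurried back toward the NOT READY state, as the text describes.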
Associated with this queueing rule on the PU, first-come-first-served (FIFO) would be appropriate on each tape channel. On the disk channel, unnecessary delays in seeking may be avoided by selecting that service request which entails the least movement of the seek mechanism, taking its current position into consideration. This rule needs slight modification, because it could result in a request for service being ignored for an indefinitely long time interval if an outlying area of the disk were involved. A simple solution is to make sweeps in alternate directions, clearing up all the current requests for one direction before changing direction. E. S. Lowry has proposed the following, more sophisticated scheme. For program P_i, the quantity
   Y_i = Σ_j r_ij,

where j ranges over those time-shared facilities for which P_i is currently unready, gives a good measure of the capacity of work on P_i to create readiness on other facilities. The values of r_ij (supplied initially if the space-time scheduled mode is being used) are added to and subtracted from Y_i whenever P_i goes unready or ready on a facility j. The initial value of Y_i is the sum of r_ij for all j except the PU. Queue selection for the PU and tape channels entails choosing that program which has the largest value of Y_i at this instant, unless two or more programs possess values very close to this maximum. In this case, priority is given to that request (associated with those programs with values of Y_i at or near the maximum) which is likely to restore buffer levels to equilibrium most promptly. Note that, if buffer levels are kept in or near a state of equilibrium, all Y_i are likely to be very small.
6. Multiprogramming with Two or More Processing Units

6.1 Motivation and Requirements
The incorporation of two or more functionally identical, self-sufficient, general-purpose processing units (PU's)⁶ in a single computer system is becoming increasingly important as higher processing speeds and higher degrees of reliability are demanded. We focus attention on systems in which the PU's share the use of memory and input-output channels because, in such systems, the current workload can be distributed over the various system components in an extremely flexible way, and this will very often yield high overall efficiency. An example of this type of system is illustrated in Fig. 9. The workload is assumed to consist of a number of independent programs (referred to as problem programs or PP's) which need to be protected from damaging one another when they are concurrently executed. Each of these problem programs may itself be a complex of dependent programs which can be executed concurrently, subject to constraints contained within the programs themselves in the form of special operations to be discussed later. We shall deal with the object form of these programs and omit consideration of their source form and compilation. A fundamental requirement is that the system is to continue in operation if any PU's become inoperable, so long as at least one PU remains operable. This requirement has many implications; in particular, the following:

⁶ Note that the PU's may differ in their performance characteristics and construction, but are completely interchangeable in all functional respects.
Fig. 9. Organization of computer system. [Figure: processing units sharing core storage, with instructions fetched and I/O interruptions routed to the PU's; two tape channels serving tape units 1–4 and 1′–4′, and a disk channel serving the disk units.]
(1) The supervisory program must not be permanently associated with any particular PU or subset of PU's, nor must it require the undivided attention of a whole PU. (2) It must be possible to initiate I/O activities on any channel from any PU. (3) Every PU must be capable of responding to interruptions of all types, including I/O interruptions. Of course, to avoid duplicate handling of I/O interruptions, it is desirable at any instant for only one PU to be designated to receive such interruptions. (4) Every problem program must be in such a form that the correctness of its execution does not depend on which PU's, and how many PU's, are available to service the program. A second requirement is that two or more PU's be permitted to execute concurrently a single set of instructions using different, but possibly intersecting, sets of data.⁷ This requirement arises when processing large arrays of data. In these circumstances, segmentation of data for concurrent operation is more readily effected and likely to be more profitable than segmentation of instructions. When several concurrent executions of a single set of instructions are to be initiated, the starting address for fetching instructions is insufficient to characterize uniquely each of the corresponding requests for processing service. For each request we require, in addition to the starting address, those index quantities and parameters which are pertinent to this request. All of these items may be made available in a set of consecutive locations. A request may then be uniquely characterized by specifying the base address of this set. This base address is called the first base of the corresponding request. Note that this requirement would not arise at all in a system in which all the PU's were functionally specialized. A third requirement is a means of preventing more than one PU from concurrently modifying an item in memory.
Consider, for example, two processing units concurrently servicing a common section of code, where the code requires each PU to reduce a certain count in memory by unity. If the first PU fetched the count from memory after the second PU had fetched it, but before the second PU had stored its reduced value, then, when both PU's have completed their decrementing and storing of the count, the value in memory is only one less, instead of two less, than the original value. To overcome this difficulty it is necessary to provide in the system for temporarily blocking references to selected locations in memory, or entry by more than one PU into selected sections of code. The latter approach appears more satisfactory and is adopted in this paper.

⁷ Note that a shared set of instructions may be a large or a small set, and may or may not contain branches and loops.
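The lost-update hazard described here is easy to reproduce on modern hardware, and the "blocked section of code" remedy corresponds to a mutual-exclusion lock. A sketch in Python, with the threading module standing in for the hardware blocking mechanism (this is an illustration, not the paper's mechanism):

```python
import threading

count = 200_000
lock = threading.Lock()

def decrement_many(n):
    global count
    for _ in range(n):
        with lock:          # only one "PU" may enter this section at a time
            count -= 1      # fetch, decrement, store, without interleaving

# Two concurrent units each reduce the count 100,000 times. With the lock
# the result is exactly 200,000 lower; without it, interleaved fetch/store
# pairs could lose decrements, leaving the count too high.
t1 = threading.Thread(target=decrement_many, args=(100_000,))
t2 = threading.Thread(target=decrement_many, args=(100_000,))
t1.start(); t2.start(); t1.join(); t2.join()
print(count)  # 0
```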
A final requirement is that there be a priority scheme which permits urgent programs to be processed expeditiously and normal programs to be processed with relative priorities which yield good utilization of the equipment. Priorities must be under the control of the supervisory program. Accordingly, no problem program must be able to alter the priorities allocated by the supervisory program, or impede their implementation by holding on to a processing unit which is needed by a higher-priority program. Thus, it is necessary to provide for displacement (from a PU) of one program by another program at an arbitrary stage of execution of the first. An address is required for dumping (and later restoring) the PU status (arithmetic and index registers, etc.) of the displaced program; this address is called the second base.

6.2 Note on Implementation
In what follows, the supervisory functions are described as though they were implemented by an ordinary program executable by any PU. For the sake of simplicity, and because the supervisory activity may be expected to take but a small percentage of the time of the PU's, we shall assume that the supervisory program always operates in the blocked mode; i.e., at any instant only one PU may be executing it. An alternative implementation would be by a microprogram. However, whether program or microprogram is used, the supervisor is retained in read-only memory because of its critical role in the operation of the whole system. Supervisory tables, on the other hand, are held in regular core storage and are protected by the same means applied to problem programs.

6.3 Rules for Interruption
Interruptions generated within a given processing unit are normally handled by this same unit. Examples are: (1) the interval timer in this PU reaching zero; (2) program exceptions, such as an invalid op code or illegal address; (3) data exceptions, such as overflow, zero divisor, flagged data word, etc.; (4) a CALL SUPERVISOR operation being encountered in the program. In cases (1), (2), and (4) the supervisory program is called in on the same PU in which the interruption occurred. In case (3) the interruption causes a trap to a location within the execution area assigned to the problem program. An exceptional case is that of an interruption generated as a result of a malfunction within some PU. The corresponding interruption signal causes the malfunctioning PU to stop with its operable bit turned off (i.e., set to indicate that this PU is not operable in the normal sense for the time being). This signal also interrupts some other PU which
has its operable bit on (which one does not matter). Thus the supervisory program may be executed by a PU which is still functioning properly and may also take note of the temporary unavailability of the malfunctioning PU. We have already noted that all PU's must be capable of receiving and acting upon I/O interruptions, and that only one PU at a time is so designated. If a malfunction is detected in the PU designated to receive I/O interruptions, these interruptions must be switched, either automatically or under the control of the supervisor, to some other PU which is operable. We choose the latter approach because it facilitates implementation of the priority requirement. Thus, whenever the supervisor assigns work to a PU (or attempts to find work and fails) it determines which PU has the lowest priority activity, and selects that one to receive I/O interruptions, at least until the next work assignment for PU's is considered. Note that idleness is the lowest priority activity of all.

6.3.1 Nondata-Exception Interruptions

Suppose the supervisory program is not currently active on any PU. Then, every operable PU is capable of accepting, and acting upon, a nondata-exception interruption generated within itself and is said to be in the normal state. In addition, one of these PU's is set to accept and act upon an I/O interruption, should it arise. Now, suppose a nondata-exception interruption occurs on PU_n. Further nondata-exception interruptions in PU_n are not disabled because, as we shall see, their occurrence means an emergency situation. PU_n is, however, placed in the executing supervisor state. In this state nondata-exception interruptions cause a trap to special locations designated for emergency action.
Simultaneously with PU_n being placed in the executing supervisor state, all other PU's and all I/O channels are placed on notice that each one may continue to operate concurrently with PU_n until such time as a nondata-exception interruption is generated within that I/O channel or PU. This channel or PU must then enter the waiting for supervisor state, in which all of its registers are frozen until the supervisor can be made available to it. In this way any interruptions requiring supervisory treatment which are generated at a time when the supervisor is busy handling some other activity are held in suspension until the supervisor is free to work on them. The supervisory program is constructed so that it does not generate interruptions at all unless a malfunction has occurred in the PU, memory bus, or memory. Therefore, the generation of an interrupt condition in a PU which is executing the supervisor calls for emergency action. Upon completing its activity on PU_n, the supervisor places this PU in the on-notice state and issues a release signal to all I/O channels and all
PU's, including itself. If none of the PU's is waiting for the supervisor, all PU's and all channels are released by this signal from the on-notice state and returned to the normal state. If any of the PU's are waiting for the supervisor, one of these is selected in an arbitrary way for placement into the executing supervisor state, and all other PU's and all channels are kept in the on-notice state. Finally, after PU's waiting for the supervisor have passed one by one through the executing supervisor state, the frozen channels, if any, are selected one by one to submit their waiting interruptions to the PU with the lowest priority activity. Note that the executing supervisor state is not defined for channels, because they only generate interrupts; they do not act upon them. We may now summarize the interruption states of I/O channels and PU's by means of the state diagrams in Fig. 10.
Fig. 10. Summary of interruption states of I/O channels and PU's. [State diagrams: an I/O channel moves among NORMAL, ON NOTICE, and WAITING FOR SUPERVISOR; a PU moves among NORMAL, ON NOTICE, WAITING FOR SUPERVISOR, and EXECUTING SUPERVISOR, with transitions driven by interruptions, supervisor busy/free status, and release signals.]
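The PU transitions summarized in Fig. 10 can be rendered as a small state table. This is a modern sketch of the figure, not part of the original text; the event names are invented and the release transitions are simplified.

```python
# PU interruption states from Fig. 10: NORMAL, ON_NOTICE,
# WAITING_FOR_SUPERVISOR, EXECUTING_SUPERVISOR, with triggering events
# approximated from the description in the text.

TRANSITIONS = {
    ('NORMAL', 'interrupt_supervisor_free'): 'EXECUTING_SUPERVISOR',
    ('NORMAL', 'other_pu_enters_supervisor'): 'ON_NOTICE',
    ('ON_NOTICE', 'interrupt_supervisor_busy'): 'WAITING_FOR_SUPERVISOR',
    ('WAITING_FOR_SUPERVISOR', 'supervisor_available'): 'EXECUTING_SUPERVISOR',
    ('EXECUTING_SUPERVISOR', 'release_no_waiters'): 'NORMAL',
    ('EXECUTING_SUPERVISOR', 'release_with_waiters'): 'ON_NOTICE',
}

def step(state, event):
    """Apply one transition; unlisted (state, event) pairs leave the state."""
    return TRANSITIONS.get((state, event), state)

s = 'NORMAL'
for e in ['other_pu_enters_supervisor', 'interrupt_supervisor_busy',
          'supervisor_available', 'release_no_waiters']:
    s = step(s, e)
print(s)  # NORMAL: the PU has cycled through all four states and back
```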
When a PU is in the normal or on-notice state, it is either executing a problem program or it is idle (i.e., it is not assigned to any stream and is merely marking time). Conversely, when a PU is executing a problem program or is idle, it must be in the normal or on-notice state. In the waiting for supervisor state, a PU is assigned to a stream and is therefore not idle in the sense defined above. The handling of nondata-exception interruptions by the supervisor is facilitated by the availability of two pieces of information in the hardware. First, the identifying number n of PU_n may be read by the supervisor and used to locate table entries in memory for PU_n. Second, the identifying number m of I/O channel m is deposited by hardware in a standard supervisory location whenever an interruption from that channel is taken by the PU designated to receive and act upon I/O interruptions.

6.3.2 Data-Exception Interruptions

A data-exception interruption on a given PU disables further data-exception interruptions on that PU only, and has no effect on nondata-exception interruptions or their associated interruption states described above. It is the responsibility of the problem program generating a data-exception interruption to handle this interruption. A hardware operation ENABLE DATA EXCEPTIONS is available for use by problem programs. When one stream is displaced by another on a PU, the data-exception state (enabled or disabled) is saved for the former and restored for the latter, along with all the other items such as accumulator contents, index registers, etc.

6.4 Protection and Relocation
Each PU is equipped with its own protection system and, since all PU's are functionally identical, so are all the protection systems. The protection system of a given PU is automatically disabled when a nondata-exception interruption occurs on that PU, thus permitting the supervisory program to have access to normally protected memory areas and registers, including the protection system registers themselves. Just prior to passing control back to a problem program on some PU, the supervisory program sets up the protection system registers with values appropriate to this problem program. As it passes control back to this problem program, the supervisor re-enables the protection system on this PU.

A simple form of relocation and protection will suffice for the purpose of illustration. Consider any one of the PU's and suppose its protection system is enabled. All addresses developed within this PU for use in referring to memory are incremented just prior to use by the contents A of a relocation register, and the incremented values are compared with the
MULTIPROGRAMMING
contents B of an upper boundary register. Whenever the addition results in overflow, or the comparison indicates that the relocated address exceeds the upper boundary, a program exception interruption occurs and the supervisor is brought into action on this PU. Normally, A is the base address of the problem program. When two PU's are servicing the same problem program, the values of A and B in effect on one PU are normally equal respectively to the values of A and B in effect on the other PU. Of course, it is certainly possible to have different pairs of values in effect, but the primary need for protection is between, rather than within, problem programs.

6.4.1 Restricted Hardware Operations

The majority of operations provided in hardware are freely usable by any program. Certain ones, however, are intended for use by the supervisory program only, while others are intended for use by problem programs only. For example, absolute IO instructions are in the supervisor-only class, while the operation CALL SUPERVISOR is in the problem-program-only class. An attempt by a problem program to issue an operation restricted to use by the supervisor, or vice versa, is aborted and a nondata-exception interruption occurs. This is realized in hardware by having the PU examine its own interruption state upon receipt of a restricted operation. A table of these restricted operations follows:
Supervisor only              Problem program only
CALL PROBLEM PROGRAM         CALL SUPERVISOR
SET AND WAIT                 ENABLE DATA EXCEPTIONS
START OTHER PU
SWITCH IO INTERRUPTIONS
START CHANNEL, etc.
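The relocation-and-protection check of Section 6.4, together with the restricted-operation check just described, can be sketched in present-day terms as follows. The word size, function names, and state encoding are illustrative assumptions, not details from the chapter:

```python
# Sketch of the relocation register A / upper-boundary register B check,
# and of a PU examining its own state on receipt of a restricted operation.
# A 15-bit address space is assumed purely for illustration.

ADDRESS_LIMIT = 1 << 15  # assumed address-space size

def relocate(address, base_a, bound_b, protection_enabled):
    """Relocate a program address by A and check it against B.

    Returns the absolute address, or raises to model a program-exception
    interruption that brings the supervisor into action on this PU.
    """
    if not protection_enabled:      # the supervisor runs with protection disabled
        return address
    absolute = address + base_a
    if absolute >= ADDRESS_LIMIT:   # the addition overflowed
        raise RuntimeError("program exception: address overflow")
    if absolute > bound_b:          # relocated address exceeds upper boundary B
        raise RuntimeError("program exception: above upper boundary")
    return absolute

SUPERVISOR_ONLY = {"CALL PROBLEM PROGRAM", "SET AND WAIT", "START OTHER PU",
                   "SWITCH IO INTERRUPTIONS", "START CHANNEL"}
PROBLEM_ONLY = {"CALL SUPERVISOR", "ENABLE DATA EXCEPTIONS"}

def check_restricted(op, pu_in_supervisor_state):
    """The PU examines its own interruption state on receipt of a restricted op."""
    if op in SUPERVISOR_ONLY and not pu_in_supervisor_state:
        raise RuntimeError("nondata-exception interruption: supervisor-only op")
    if op in PROBLEM_ONLY and pu_in_supervisor_state:
        raise RuntimeError("nondata-exception interruption: problem-program-only op")
```

Note that both checks are purely local to the PU, which is what allows all PU's to remain functionally identical.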
We now define the first four supervisory operations and the first PP operation in this table. The others need no elaboration.

(a) CALL PROBLEM PROGRAM, address p. Let the contents of the specified memory location p be an address b. The executing PU, say PUn, fetches a starting address from memory location b and places it in its instruction counter. PUn then enables its protection system, prepares so that any future nondata-exception interruption will trap to nonemergency locations, places itself in the on-notice state, and sends a release signal to all PU's. Prior to issuing this operation, the supervisor fetches the first base
or second base (whichever is appropriate) of the stream to be started, and stores this address b in memory location p. It also restores all arithmetic and index registers per b, and sets up the relocation base register and upper boundary register if new values are required.

(b) SET AND WAIT.⁸ This operation places a PU which encounters it in the idle, on-notice state and causes the release signal to be issued to all PU's including itself. In the idle state, a PU is ready to accept either a start signal from any other PU or an IO interruption. Of course, the IO interruption switch must be set to this PU if it is actually to receive an IO interruption.

(c) START OTHER PU, PU number. Let the PU encountering this be PUx and the PU specified by the PU number be PUn. If n = x, a program exception interruption occurs. If n ≠ x and the operable bit of PUn is off, a program exception interruption occurs. Otherwise, PUn is placed in the waiting-for-supervisor state, ready to start execution at a supervisory location associated with leaving the idle state. Actual execution starts on PUn as soon as it is permitted to go into the executing-supervisor state by PUx, or some other PU, leaving that state.

(d) SWITCH IO INTERRUPTIONS, PU number. This operation sets the IO interruption switch to the specified PU. It is permissible, in this case, for the PU encountering this operation and the specified PU to be identical.

(e) CALL SUPERVISOR. Upon receipt of this operation from a problem program stream, PUn disables its own protection system, traps to a nonemergency location assigned for this kind of interruption, and prepares for any future interruptions to trap to emergency locations. In addition, PUn places itself in the executing-supervisor state or the waiting-for-supervisor state according to the availability of the supervisor.

6.5 Special Operations
A small set of special operations is adequate to permit a programmer to express his program so that two or more PU's can work on it concurrently, if they happen to be available. These special operations, accompanied by appropriate addresses, are essentially instructions to the supervisory program. They enable the problem programmer to start new streams of activity, stop them, regroup them, and whenever necessary, block them. In addition, these operations provide a vehicle of expression which permits a program to be independent of which PU's, and how many, are available to service the program at any instant. A list of the operations follows.

⁸ An operation similar to this was defined in 1957 for the IBM STRETCH computer by F. P. Brooks, Jr.

Processing        Input-Output
START             GET
STOP              PUT
BLOCK
UNBLOCK
We now proceed to give an introductory description of these operations, after which we can deal with the queueing system and describe these operations in more detail. A typical format for one of these operations is as follows.

Location    Contents
m           CALL SUPERVISOR, special op code
m+1         Slot
m+2         First Base, Second Base
The slot is a word in which the supervisor can deposit addresses and bits for queueing purposes.
6.5.1 START, Slot, First Base, Second Base

From the user's viewpoint, this operation calls for continuing the execution of the current stream of instructions and starting the concurrent execution of a second stream at a start address specified indirectly by the first base. The index quantities and parameters for this stream are retrievable from a set of consecutive locations starting at the first base. If status dumping is required at any time in the execution of the stream now called, it is to be effected per the second base.

From the supervisor's viewpoint, this operation represents a request for starting concurrent execution of a second stream, a request which may be serviced either now or later, depending on the availability of units and the competing requirements of other programs and of other streams in this program. Accordingly, the supervisor incorporates the request in the queue for processing service. The request is kept in the processing queue while being serviced as well as while waiting for service. Service for this request is not considered complete until the following operation is encountered.
6.5.2 STOP

No slot or addresses are needed for this operation. The stream in which this operation is encountered is terminated. The supervisory program removes from the processing queue the corresponding request, and scans the queue for new work for the PU which was servicing this request.

6.5.3 BLOCK, Slot

No addresses are needed for this operation. From the user's viewpoint, this operation prevents more than one processing unit at a time from entering the code which immediately follows this operation. Actually, entry of a second PU is only prohibited if it is attempted via this operation itself. This limited form of protection⁹ is quite adequate for this purpose. The supervisor determines whether entry into the code immediately following is permissible or not. If it is, a bit is set in the slot to prohibit any other entry via this operation until further notice. If this bit (the blocking bit) is already on, indicating that entry is not permissible, the request which was being serviced by this processing unit is re-chained into the processing queue in such a way as to indicate that availability of the code as well as of the unit is awaited.

6.5.4 UNBLOCK, Block Address

No slot is needed for this operation. This operation permits a new entry to this section of code at the block address, such entry having been previously prohibited by a BLOCK operation located at this block address. Upon receipt of the UNBLOCK operation, the supervisor ascertains whether there is a request awaiting availability of this section of code. If there is no such request, the blocking bit for this section is turned off. Otherwise, the blocking bit is left on, and that request for this code which is next in line is given a new position in the queue which indicates that availability of a unit only is awaited.

6.5.5 Programming Example

Within a section of code denoted A we desire to make two calls for a section B.
The resulting executions of B are intended to proceed concurrently with one another and with the continuation of A. Finally, all three calls are to be regrouped on a section C. One method of programming this requirement is shown in Fig. 11. The strings of dots indicate sections of code of any length or complexity. In particular, these sections may contain conditional branches.

⁹ This form of protection should not be confused with interprogram protection.

Another
[Figure 11 shows the three-call scheme as a flow diagram: each of the three calls ends with BLOCK; reduce count by 1; test count; BRANCH. The first two arrivals issue UNBLOCK and STOP, while the last resets the count to 3, issues UNBLOCK, and proceeds to section C.]
Fig. 11. Three-call programming scheme.
point of interest is that regrouping may be made to depend on any logical combination of states of the three calls by replacing the counting with suitable manipulation of bits.
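The counting scheme of Fig. 11 is, in present-day terms, a join on three concurrent activities. A minimal sketch, in which a lock stands in for the BLOCK/UNBLOCK slot and the counter is reset to 3 exactly as in the figure (thread machinery and names are illustrative, not the chapter's mechanism):

```python
import threading

count = 3
count_lock = threading.Lock()   # plays the role of the BLOCK slot
regrouped = threading.Event()   # set when the three calls regroup on section C

def finish_call():
    """Executed at the end of each of the three calls (the continuation of A
    and the two executions of B)."""
    global count
    with count_lock:            # BLOCK: one stream at a time in this code
        count -= 1              # reduce count by 1
        last = (count == 0)     # test count
        if last:
            count = 3           # reset count to 3 for possible reuse
    # UNBLOCK happens implicitly on leaving the `with` block; the first two
    # arrivals simply STOP, the last proceeds to section C.
    if last:
        regrouped.set()

threads = [threading.Thread(target=finish_call) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert regrouped.is_set()
```

As the text notes, replacing the counter with individual bits would let the regrouping condition be any logical combination of the three calls' states.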
6.5.6 The IO Operations: GET and PUT

As mentioned earlier, problem programs do not issue absolute IO instructions at all. The compiled (or object) form of these programs contains the pseudo-ops GET and PUT wherever the program is ready to acquire an input record or to dispose of an output record, respectively. The IO unit involved is designated symbolically by a number called the symbolic unit (SU). A symbolic unit corresponds to an absolute tape unit or an absolute region of disk storage located on one of the disk units. The user specifies which type of storage he needs, but the allocation routine within the supervisory program determines which physical unit and, in the case of disk storage, which absolute region is to be assigned.

When the symbolic unit corresponds to a tape unit, the GET operation implies that the next record from tape (i.e., next in arrival sequence) is required, and the PUT operation similarly implies a sequential disposition on tape. When the symbolic unit corresponds to a disk region, the choice between the sequential and nonsequential modes of referencing must be made.

For simplicity, we assume that each problem program defines its own buffer areas in core storage and provides a buffer directory for use by the supervisory program. The supervisor attempts to keep read buffers as full as possible and write buffers as empty as possible, but observes an established set of priorities in so doing.
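The supervisor's handling of a GET can be sketched as follows: if the current read buffer for the symbolic unit is already full, the record's base address is planted in the slot and the stream resumes at once; otherwise the stream is marked buffer-stalled. The dictionary fields and the single `current_buffer` per symbolic unit are illustrative assumptions, not the chapter's data layout:

```python
def service_get(su_node, request_node):
    """Service a GET against one symbolic unit; return True if the stream
    may resume immediately, False if it is suspended until IO completes."""
    buffer_node = su_node["current_buffer"]
    if buffer_node["ready"]:
        # Record already in core: plant its (relative) base address in the
        # slot following the GET and let the stream resume on the same PU.
        request_node["slot"] = buffer_node["base_address"]
        buffer_node["in_use"] = True
        return True
    # Record not yet acquired: remember whom to wake on the IO interruption
    # and suspend processing service to this stream only.
    buffer_node["stalled_request"] = request_node
    request_node["buffer_stalled"] = True
    return False

buf = {"ready": True, "base_address": 4096, "in_use": False}
su = {"current_buffer": buf}
req = {"slot": None, "buffer_stalled": False}
assert service_get(su, req)
assert req["slot"] == 4096
```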
Thus, when a GET operation is issued from a processing stream belonging to a problem program, the required record may be already available in core storage. In this case, the supervisor plants the base address of this record (relative to the base address of the problem program) in the slot immediately following the GET operation and permits immediate resumption of the stream on the same PU as before. Whenever the supervisor discovers upon receipt of a GET operation that it has not yet acquired the record now needed (this is invariably the case with the nonsequential mode of reading from disk storage), it suspends processing service to the stream in question until that record has been acquired. It determines if retrieval of the record can be initiated without violating established priorities, and seeks other work for this processing unit. Eventually, when the record has been read into the appropriate buffer area, the supervisor is activated by the corresponding IO interruption, and it changes the status of the service request for the suspended stream to indicate that it is merely awaiting availability of a processing unit. If the stream now released for action is of higher processing priority than the lowest-priority stream previously being serviced, then the newly released stream obtains service immediately on this PU. The PUT operation is treated similarly.

6.6 The Queueing System
Queueing itself has the effect of decoupling timewise the creation of demands by programs on the one hand and their servicing on the other. When symbolic-to-absolute translation is incorporated in the queueing system, spacewise¹⁰ decoupling is obtained also. The queueing system proposed herein has the following additional properties:

(1) Interprogram priorities are assignable by the supervisor independently for all forms of service.
(2) In the case of disk units it is possible for the priorities to reflect the physical location of the assigned regions and, as a result, cause each arm to service the request involving the least movement in a given direction (inward or outward) until all requests in that direction have been serviced, and then to change direction.
(3) Each problem program may generate any number of requests for processing service, subject only to core storage limitations.
(4) While a problem program may through program error mutilate its own requests for service, it cannot damage the rest of the queueing structure, unless a malfunction has occurred in the protection system of a PU or in memory itself.

¹⁰ In this context, space connotes areas in core and disk storage, also sets of IO units of all kinds.
(5) Processing requests which are awaiting availability of (a) a section of code and a PU, (b) a buffer area and a PU, (c) a PU only, and (d) nothing (i.e., the request is being serviced) are all distinguished from one another.

The queues to be considered are those for processing service (one queue only), tape channel service (one queue per channel), disk channel service (one queue per channel), and seek service (one queue per disk unit).

In describing the structure of the various queues, three core storage areas are of vital interest. The first is the area in which supervisory tables are located. Each queue has its origin or root in this area. The second area is the control area for each problem program. This area contains PP control nodes which contain information necessary to the proper servicing of the corresponding program. Like the supervisory tables, all PP control areas are accessible to the supervisor only. Finally, there is the execution area for each problem program, which is freely accessible by that program and by the supervisor, but by no other program. The request nodes for a PP are all located in the corresponding execution area.

The queueing structures are illustrated in Figs. 12, 13, and 14. Each queue consists of a set of nodes (marked by dots in the figures) linked together to form a tree. Each node consists of a set of several consecutively located words in core storage. The location of any node is defined to be the location of its first word. A link from node x to node y is realized by storing in a slot in node x the location of node y.
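This linking convention can be sketched with a dictionary standing in for core storage; the field names and the numeric "locations" are illustrative only:

```python
# Nodes are sets of consecutive words; a link from node x to node y is made
# by storing y's location in a slot of x. V/H slots follow Table I below.

memory = {}

def make_node(location, **fields):
    """Create a node at the given core location, with empty V and H links."""
    memory[location] = dict(v_addr=None, v_bit=0, h_addr=None, h_bit=0, **fields)
    return location

def link_vertical(x, y):
    """Chain node y vertically beneath node x by storing y's location in x."""
    memory[x]["v_addr"] = y
    memory[x]["v_bit"] = 1   # the V bit says a vertical successor exists

root = make_node(100, label="processing service")
ctrl = make_node(200, label="PP2 control")
link_vertical(root, ctrl)
```

A horizontal link would be made the same way through the `h_addr`/`h_bit` slots, giving exactly the tree shapes of Figs. 12 to 14.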
6.6.1 The Processing Service Tree

The processing service tree begins in the supervisory tables at a supervisory node labelled processing service (Fig. 12). Chained from this node is a set of processing control nodes, there being exactly one such node for every problem program. The ordering of these control nodes in the tree determines the priority with which problem programs receive processing service.

Attached to each processing control node is a subtree consisting of processing request nodes. Each of these nodes consists of a START operation and its associated slot, first base, and second base. Those request nodes which, in the diagram, are aligned horizontally with the corresponding control node represent requests which may be serviced concurrently. For example, in Fig. 12 request nodes t1, u1, v1 belonging to PP2 are aligned horizontally with the PP2 control node for processing. Thus, three distinct processing units may, if available, be assigned to service these requests concurrently, one PU per request. Those request nodes which are aligned in a vertical chain must not be
[Figure 12 shows the processing service tree: the processing service node in the supervisory tables chains to control nodes for PP 4, PP 2, PP 7, PP 1, etc., each lying in the corresponding PP control area; each control node chains in turn to the request nodes in that program's execution area, across the protection boundary. PP = problem program.]
Fig. 12. Processing service tree.
serviced concurrently. For example, in Fig. 12, u1 is blocking u2 from using some shared section of code. Accordingly, the stream corresponding to u1 must issue an UNBLOCK operation before u2 can be serviced. Note that, as far as the user is concerned, only two types of precedence constraints (blocked code and unready buffers) are handled within the queueing system. More general requirements for precedence constraints between sections of a program may be implemented by means of regrouping, an example of which was illustrated in Paragraph 6.5.5.

[Figure 13 shows the tape channel service tree: the channel 3 service node and channel 3 assignment node in the supervisory program chain to SU control nodes for PP 2, PP 1, PP 7, and PP 4, each lying in the corresponding PP control area and chaining to request nodes in that program's execution area, across the protection boundary. SU = symbolic unit; PP = problem program.]
Fig. 13. Tape channel service tree.

We have observed that upon receipt of a STOP from some stream the supervisor is required to remove the corresponding request node from the processing queue; also, from time to time the supervisor is required to displace streams temporarily from PU's. Both of these supervisory activities are facilitated by the processing assignment nodes located in the supervisory
[Figure 14 shows the disk channel service trees (one per disk unit): the seek service nodes for disk unit 1, disk unit 2, etc., each with its arm assignment nodes, chain to SU control nodes (SU 4, SU 2, ...) for PP 2, PP 5, and PP 4, which in turn chain to buffer request nodes in the corresponding execution areas, across the protection boundary. SU = symbolic unit; PP = problem program.]
Fig. 14. Disk channel service tree.
tables. There is one such node for each PU in the system, and it contains two addresses, one pointing to the control node and the other to the request node currently being serviced by the PU in question. The four types of processing nodes introduced above are now listed with
their contents in Table I; V stands for vertical chain, H for horizontal chain. In Table I, the items marked X in the processing request node are located in the slot associated with every START operation. They are placed in the slot by the supervisory program, not by a problem program.

TABLE I

Node                                      Contents
Processing service (one only)             V address; V bit
Processing control (one for each PP)      V address; V bit; H address; H bit
Processing request (one for each stream)  START (special operation); V address (X); V bit (X); H address (X); H bit (X); Busy bit (X); Buffer-stalled bit (X); Base bit (X); First base; Second base
Processing assignment (one for each PU)   Address of processing control node being serviced; Address of processing request node being serviced; Operable bit; Busy bit
The V and H addresses are in every case the addresses of successor nodes in the service tree, vertically and horizontally respectively. The V and H bits indicate whether or not there is a vertical or horizontal successor, respectively. The busy bit indicates whether or not a PU is currently assigned to, and hence is servicing, this request. The buffer-stalled bit is turned on if, in servicing this request, a GET or PUT is encountered and the buffer in question is not ready for use by the problem program. The base bit indicates which base (first or second) is to be used for setting up a PU assigned to service this request. The first set-up following receipt by the supervisor of a START is effected per the first base. All subsequent set-ups and, of course, all dumps for this stream are effected per the second base.
The items in the processing assignment node apply to the corresponding PU and have self-evident definitions.
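The queue scan that finds new work for a PU (entry point 2 in the STOP flow of Fig. 17 below) can be sketched as follows. The dictionary representation is illustrative; the rule of walking only the horizontal chains (vertically chained requests are blocked and must not be serviced) follows Fig. 12:

```python
def find_new_assignment(service_node):
    """Scan the processing service tree, in priority order, for the first
    request whose busy bit and buffer-stalled bit are both off. Under each
    control node, only the horizontal chain of concurrently serviceable
    requests is examined; vertical chains hang off blocked requests."""
    control = service_node.get("v")           # highest-priority control node
    while control is not None:
        request = control.get("h")            # head of horizontal request chain
        while request is not None:
            if not request["busy"] and not request["buffer_stalled"]:
                return control, request       # assign this stream to the PU
            request = request.get("h")
        control = control.get("v")            # next control node, by priority
    return None                               # none runnable: issue SET AND WAIT

# Two requests under one program: the first is busy, the second is runnable.
r2 = {"busy": False, "buffer_stalled": False, "h": None}
r1 = {"busy": True, "buffer_stalled": False, "h": r2}
pp2 = {"h": r1, "v": None}
root = {"v": pp2}
assert find_new_assignment(root) == (pp2, r2)
```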
[Figure 15 shows, as a flow diagram, the supervisory handling of START on PUn from stream (c, ra), the new request node being at rd: obtain c from the PUn assignment node and attach the request to the end of the horizontal chain from the control node at c; (entry point 1) turn off the base bit, buffer-stalled bit, and busy bit in this request node; then scan in sequence the PUx assignment nodes, x = 1, 2, 3, ..., for an idle, operable PU. If some PUy is chosen, place c, rd in the PUy assignment node, turn on the busy bit in the request node, and issue START OTHER PU to PUy; on the release signal, PUy prepares to service the assigned stream (c, rd) (see entry point 3 in STOP). Whether or not a PU was chosen, prepare to resume stream (c, ra) on PUn (using the second base b2 of the request node for this stream) and issue CALL PROBLEM PROGRAM per b2.]
Fig. 15.
[Figure 16 shows queue states before and after the START of Fig. 15: before, PUn is assigned to (c, ra) and PUy is idle; after, PUn remains assigned to (c, ra) while PUy is assigned to the new request (c, rd), now marked busy, and another request rc in the tree remains code-blocked.]
Fig. 16.
In these figures:
c       address of a processing control node
r       address of a processing request node
(c, r)  identification of a stream which has a request node at r chained to a control node at c

We omit details concerning the manipulation of chain addresses and bits required to couple a node into a chain or uncouple it from a chain, since the reader can with little effort fill in these details for himself.
6.6.3 IO Service Trees

We assume that cross-channel switching is not provided for the tape or disk channels. If it were, only minor changes would be required in the following scheme. The most important of these would be the combination of all trees for individual tape channels into one tree for all tape channel service.
[Figure 17 shows, as a flow diagram, the supervisory handling of STOP on PUn from stream (c, rb) at location h: obtain c, rb from the PUn assignment node and remove the corresponding request from the processing queue; if this request node was the only one attached to the corresponding control node, the job has ended, and this control node is removed from the processing queue. (Entry point 2) Scan the processing queue for a new assignment, starting from the beginning of the queue (i.e., per the V address of the processing service node), and choose the first request found for which both the busy bit and the buffer-stalled bit are off. If none is found, issue SET AND WAIT; otherwise turn on the busy bit in the chosen request node and place c2, rf for the chosen request in the PUn assignment node. (Entry point 3) Prepare to service the assigned stream, i.e., load the registers of PUn per the first or second base (according as the base bit is off or on), turn the base bit on, set relocation and protection addresses per c, and issue CALL PROBLEM PROGRAM per the first or second base.]
Fig. 17.
Figures 13 and 14 illustrate respectively the tree for a particular tape channel and the trees (one per disk unit) for a disk channel. The channel service node and assignment node are very similar to their processing counterparts in information content and function. The channel control nodes and request nodes are, however, very different from their processing counterparts. We begin by discussing the complete set of control nodes for a single problem program. The single node for processing has already been described. Each of the remaining control nodes corresponds to a symbolic unit (SU) used or ignored by this program. The SU nodes are consecutively
[Figure 18 shows queue states before and after the STOP of Fig. 17: before, PUn is assigned to (c1, rb), with other requests in the trees busy, buffer-stalled, or code-blocked; after, the node rb has been removed and PUn is assigned to (c2, rf), now marked busy, the remaining requests keeping their busy, buffer-stalled, and code-blocked states.]
Fig. 18.
numbered and are ordered in memory by this number. Thus, the supervisor is able to locate the appropriate SU node quickly whenever it receives a GET or PUT accompanied by a symbolic unit number. The information within the SU node enables the supervisor to identify the absolute unit (and, in the case of disk storage, the region) corresponding
[Figure 19 shows, as a flow diagram, the supervisory handling of BLOCK on PUn from stream (c, rd) at location h: examine the blocking bit in the BLOCK slot. If it is off, turn the blocking bit on in the BLOCK slot, obtain rd from the PUn assignment node and place it in the BLOCK slot, and resume service to the previous stream (c, rd) on PUn. If it is on, obtain the address rb of the request node for the blocking stream from the BLOCK slot and vertically chain onto it the current request node on PUn (the address rd of this node being obtained from the PUn assignment node), turn off the busy bit in the request thus vertically chained, and scan for new work for PUn (see entry point 2 in STOP).]
Fig. 19.
to this SU. As usual, this correspondence is set up at loading time by the allocation routine associated with the supervisor. Other items in the SU node identify the mode in which the unit is being operated (sequential or nonsequential) and the state of readiness of the set of buffer areas associated with this SU.¹¹

The request nodes chained to a given SU node define buffer areas with origins and sizes determined by the programmer (of course, the origins are compiled in relative form). Normally, these buffer nodes are chained to their SU node permanently, i.e., throughout the execution phase of the problem program. In addition to defining a buffer area and indicating whether its current use is for reading or writing, a buffer node describes the state of this area: Is it ready for use by some stream, is it in use by a stream, is there a stream stalled because this buffer area is not ready and, if so, where is the processing request for this stalled stream? Chain addresses and bits are also included in each buffer node. Their use depends on whether the sequential or nonsequential mode is in effect for this symbolic unit.

6.6.3.1 Sequential Mode. In the sequential mode a set of records is to be transmitted into or out of core storage preserving the ordering in which

¹¹ A buffer area is ready for a GET operation if it is full and for a PUT operation if it is empty. A set of buffer areas is ready if all members of the set are ready.
[Figure 20 shows queue states before and after the BLOCK of Fig. 19: before, PUn is assigned to (c, rd) and the BLOCK slot is 0, with other requests busy or code-blocked. After, if entry was not blocked, the diagram is the same except that the BLOCK slot has changed from 0 to 1, rd. After, if entry was blocked by rb, the BLOCK slot is unchanged at 1, rb, and the request rd is chained vertically beneath rb as code-blocked.]
Fig. 20.
they are originally stored or computed. Thus, if a SU has associated with it a buffer consisting of several areas and the sequential mode is in effect, the supervisor services these buffer areas in a fixed, cyclic manner. Further, either all areas of a given buffer are designated as receiving areas for input, or all are designated as dispatching areas for output. The buffer nodes associated with a SU node in the sequential mode are vertically chained in a cycle. In the special case of a buffer consisting of only one buffer area, this rule requires that the buffer node be chained to itself vertically; i.e., its V address is set equal to its location. As the supervisor completes servicing one buffer area, it prepares to service the next in
[Figure 21 shows, as a flow diagram, the supervisory handling of UNBLOCK, h, on PUn: obtain rb from the PUn assignment node and examine the V bit in that node. If no other request is waiting for this code, turn off the blocking bit in the BLOCK slot addressed by h. If a request is waiting for this code, uncouple the next waiting request (c, rd) from the vertical chain on rb and attach it to the end of the horizontal chain on c, leaving any other waiting requests (c, re) vertically attached to it; place rd in the BLOCK slot addressed by h, and look for a PU to service stream (c, rd) (see entry point 1 in START). In either case, resume service to stream (c, rb) on PUn.]
Fig. 21.
the cycle by copying the V address of the node just serviced into the SU node.

6.6.3.2 Nonsequential Mode. In the nonsequential mode the buffer areas belonging to a given buffer may be serviced in any order or concurrently. Concurrent service for a given SU is, however, limited to seeking (we assume two or more arms per disk unit, several disk units per transmission channel). Any buffer area associated with a SU in the nonsequential mode may be used for input or output, regardless of its immediately previous use and regardless of the use being made of other buffer areas associated with this SU.

Each GET or PUT operation issued for a SU in the nonsequential mode is accompanied by a relative disk address. Any stream which issues a GET operation of this type is unconditionally stalled because, even if there is a buffer area immediately available for receiving input, some time must elapse before the desired record has been read into this area. Note that processing activities for this program as a whole are not stalled; other streams may continue to function. Any stream which issues a nonsequential
[Figure 22 shows queue states before and after the UNBLOCK, h, of Fig. 21: before, PUn is assigned to (c, rb), the BLOCK slot holds 1, rb, and requests rd and re are chained beneath rb as code-blocked, with another request buffer-stalled. After, rd has been uncoupled and attached horizontally as busy, re remains code-blocked beneath it, and the buffer-stalled request is unchanged.]
Fig. 22.
operation is stalled only if all buffer areas for the specified SU are now unready for use by the program.

The buffer nodes for a SU in the nonsequential mode are chained horizontally from the SU node. The ordering of nodes in this chain has no special significance. If the disk unit to which this SU is allocated has two arms, any two buffer nodes belonging to this SU may be concurrently assigned to the arms.

6.6.3.3 Completion of IO Operations. Successful completion of data transmission to or from a tape or disk unit activates the supervisor on some PU, and implies that a buffer area previously unready for program use is now ready. The supervisor finds the buffer node for this area by indirect referencing via the channel assignment node, the arm assignment node (disk
only), and the SU node, sets the ready bit on, and determines if there is a stream stalled due to this buffer area being unready at the time a GET or PUT operation was issued. If a stream is waiting for this buffer area, the supervisor obtains from the buffer node the address of the processing request node for the stalled stream, and turns the buffer-stalled bit off, thus making that request eligible for further processing service. If the relevant SU node is in the sequential mode, the supervisor now advances an address in this node so that it points to the next one of its buffer nodes needing supervisory attention.

Regardless of mode, the supervisor looks for work for this channel. In the case of a tape channel it can start scanning the corresponding service tree immediately. For a disk channel, however, the completion of a data transmission operation represents an opportunity to put into effect as many outstanding requests for seek service as possible. Accordingly, the supervisor scans the seek service tree for each unit on this channel which has at least one idle arm, and assigns as many of the idle arms as possible to buffer nodes needing service. Then, a disk unit is selected by taking the next unit (in a fixed cycle) which has an arm assignment such that the seeking is complete but the transmission still remains to be done. The channel is then assigned to such an arm.
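The completion handling just described can be sketched as follows; the dictionary fields are illustrative stand-ins for the buffer node's ready bit and its pointer to the stalled stream's processing request:

```python
def io_complete(buffer_node):
    """On successful completion of transmission into or out of a buffer area:
    mark the area ready and, if a stream was stalled on it, clear that
    stream's buffer-stalled bit so its request again awaits only a PU."""
    buffer_node["ready"] = True
    stalled = buffer_node.get("stalled_request")
    if stalled is not None:
        stalled["buffer_stalled"] = False
        buffer_node["stalled_request"] = None

req = {"buffer_stalled": True}
buf = {"ready": False, "stalled_request": req}
io_complete(buf)
```

After this, the request becomes visible again to the queue scan of entry point 2, which is how the suspended stream eventually regains a processing unit.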
However, the incidence of stalled streams may be expected to be higher for disk operations (the majority of which are likely to be nonsequential) than for tape operations (normally sequential). Moreover, the time taken to service a set of seek requests for a given disk unit may be sensitive to the order in which those requests are serviced. Thus it may be profitable to select next that request which entails the least arm movement. If this particular rule were followed, it would be possible for requests for remote areas to be left unserviced for indefinite periods. One simple modification is to service that request which entails the least arm movement in a given direction (inward or outward) until all requests for that direction have been disposed of, and then change the direction. Figure 14 illustrates an implementation of this rule. A bidirectional chain couples together all the SU nodes for a given disk unit in a sequence determined by the physical location of the corresponding
MULTIPROGRAMMING

TABLE II. IO Nodes and Their Contents
(D = relevant to disk only; T = relevant to tape only; all other items relevant to both)

Channel service (one per tape channel):
    T  V address
    T  V bit

Seek service (one per disk unit):
    D  V1 address
    D  V1 bit
    D  V2 address
    D  V2 bit

SU control (one per SU):
       V1 address
       V1 bit
    D  V2 address
    D  V2 bit
       Absolute channel
       Absolute unit
    D  Disk region base address
    D  Disk region upper boundary
       Mode bit
       Stream stalled bit
       Buffer all ready bit
       Address of buffer node

Buffer request (one per buffer area):
       Chain address (V or H)
       Chain bit (V or H)
       Base address of buffer area
       Stream stalled bit
       Address of processing request node of stalled stream
       Ready bit
       In use bit
       GET/PUT bit
    D  Relative disk address

Tape channel assignment (one per tape channel):
    T  Address of SU control node being serviced
    T  Operable bit
    T  Busy bit

Disk channel assignment (one per disk channel):
    D  Address of arm assignment node being serviced
    D  Operable bit
    D  Busy bit

Arm assignment (one per arm):
    D  Address of SU control node being serviced
    D  Address of buffer request node being serviced
    D  Operable bit
    D  Busy bit
    D  Ready for transmission bit
    D  Inward/outward bit
disk storage regions. This chain is, of course, established and maintained by the allocation routine. When the supervisor is looking for work for an arm, it can find that outstanding request which is for the nearest region in a given direction by first scanning the buffer nodes belonging to the SU just serviced and, if no work is found, proceeding to the next SU node in the direction currently in force. Should the end of the chain be reached with no work found, the direction of scan is reversed.

6.6.3.5 Summary of IO Nodes. The seven types of IO nodes introduced above are now listed in Table II with their contents; D and T denote items relevant to disk only and tape only, respectively. All other items are relevant to both. Two vertical chain addresses and two vertical chain bits are required for the service and control nodes used with disk units. The address and bit denoted V1 are used for inward scanning, those denoted V2 for outward. The reader is invited to develop for himself flow charts suitable for the GET and PUT operations, in the sequential and nonsequential modes, and for IO interruptions which indicate successful completion of data transmission.

7. Concluding Remarks
Very few computer systems in existence today possess two or more processing units sharing a common set of core storage modules and input-output channels. The writer's opinion is that the demand for this type of system will be greater when an adequate solution is found to the problem of exploiting such a system. The proposals in Section 6 represent an attempt to find an adequate solution to this problem.

Consider some of the properties of the proposed scheme from the user's point of view. First, if the user desires to write his program in a conservative way, as if it were to be executed by a machine with only one processing unit, he may do so by creating only one processing stream. This is certainly reasonable if the running time of his program is short. In fact, whether his single-stream program is short or long, good utilization of the system can very probably be achieved through running other, independent programs concurrently with his, provided his program does not monopolize memory space.

Second, if the user's problem is expected to consume large amounts of processing time or memory space, he may segment his data or instructions or both and create as many streams as he pleases. The special operations for starting, stopping, and blocking these streams are quite simple for the user, even though complicated for the supervisory program. These operations are, in fact, suitable for use in a problem-oriented source language, since their definition imposes few constraints on the hardware. Note that a user is not required to segment his instructions and data to fit individual memory modules.12

Third, interprogram protection (an example of which was illustrated) provides the individual user with security for his program and data against accidental or fraudulent manipulation.

Now, consider some of the properties of the proposed scheme from the installation point of view. First, space and time can be assigned to programs in an extremely flexible way, thus giving the supervisory program, and hence the installation, a large measure of control over the use of the system. In particular, processing priorities may be determined in any way the installation sees fit, since they are not constrained by the relative speeds of various units. Second, the scheme accommodates real time processing as well as batch processing. (Note, however, that for certain types of real time processing more elaborate protection would be required.)

Implementation of the supervisory program does not appear to present any new problems. Most of the logic (and complexity) required in a multiprogramming supervisor for a system with two or more PU's is already present in a multiprogramming supervisor for a system with but a single
PU.

A word of caution is perhaps in order for both the designer and the user. First, the complexities of communication between memory-sharing PU's are such that this type of system is not suited to low performance PU's. Second, an over-zealous programmer may over-segment his program. It is important to remember that supervisory overhead is incurred in starting and controlling each stream created, no matter whether the supervisor is implemented in hardware or in software. The programmer is therefore advised not to create a new stream unless the set of operations to be performed in this stream is expected to take at least as long as the average IO operation.

Today, the outstanding problem associated with multiprogramming is that of finding better methods of automatically allocating storage. This problem is now receiving a good deal of attention and its investigation may be expected to lead to a growing body of theory concerning the nature of programming itself.

Although at this time it appears that few computing installations employ anything more than a restricted form of multiprogramming in their daily operations, we can expect a radical change in this situation within the next decade. The need to make effective use of machines with vastly increased performance and capacity will lead to widespread adoption of quite general multiprogramming techniques.

12 We assume that (1) the set of memory modules is addressable as a single homogeneous memory, and (2) any location may be used for instructions or data.
8. Acknowledgments
The author wishes to express his appreciation to the many people who have contributed to this article in one form or another. Particular thanks are due to Messrs. E. S. Lowry and R. H. Ramey, Miss E. McDonough, and Messrs. C. A. Scalzi, F. E. Howley and S. F. Grisoff, who have contributed many valuable ideas and suggestions in the course of their experimental work in multiprogramming under the author's direction. In addition the author wishes to thank Miss E. McDonough for assistance in preparing the flow charts of Section 6; Dr. D. S. Henderson for helpful criticisms of this same section; Messrs. J. W. Franklin and P. J. Nelson for their painstaking editing of the first and second halves respectively; and Mr. E. S. Lowry for editorial assistance in the final stages.
Bibliography

1. Alexander, S. N., The National Bureau of Standards Eastern Automatic Computer. Proc. Joint AIEE-IRE Conf., Philadelphia, Pennsylvania, pp. 84-89, February (1952).
2. Cocke, J., and Kolsky, H. G., The virtual memory in the STRETCH computer. Proc. Eastern Joint Computer Conf., Boston, Massachusetts, pp. 82-93, December (1959).
3. Codd, E. F., Multiprogram scheduling. Communs. Assoc. Computing Machinery 3, 347-350 and 413-418 (1960).
4. Codd, E. F., Lowry, E. S., McDonough, E., and Scalzi, C. A., Multiprogramming STRETCH: feasibility considerations. Communs. Assoc. Computing Machinery 2, 13-17 (1959).
5. Dreyfus, P., System design of GAMMA 60. Proc. Western Joint Computer Conf., Los Angeles, California, pp. 130-132, May (1958).
6. Eckert, J. P., Weiner, J. R., Welsh, H. F., and Mitchell, H. F., The UNIVAC system. Proc. Joint AIEE-IRE Conf., Philadelphia, Pennsylvania, pp. 6-14, February (1952).
7. Everett, R. R., Zraket, C. A., and Benington, H. D., SAGE, a data processing system for air defense. Proc. Eastern Joint Computer Conf., Washington, D.C., pp. 148-155, December (1957).
8. Frankovich, J. M., and Peterson, H. P., A functional description of the Lincoln TX-2 computer. Proc. Western Joint Computer Conf., Los Angeles, California, pp. 146-155, February (1957).
9. Gill, S., Parallel programming. Computer J. 1, 2-10 (1958).
10. Leiner, A. L., System specifications for the DYSEAC. J. Assoc. Computing Machinery 1, No. 2, 57-81 (1954).
11. Leiner, A. L., and Alexander, S. N., System organization of DYSEAC. IRE Trans. on Electronic Computers EC-3, 1-10 (1954).
12. Leiner, A. L., Notz, W. A., Smith, J. L., and Weinberger, A., PILOT, a new multiple computer system. J. Assoc. Computing Machinery 6, 313-335 (1959).
13. Lourie, N., Schrimpf, H., Reach, R., and Kahn, W., Arithmetic and control techniques in a multiprogram computer. Proc. Eastern Joint Computer Conf., Boston, Massachusetts, pp. 75-81, December (1959).
14. Mersel, J., Program interrupt on the UNIVAC scientific computer. Proc. Western Joint Computer Conf., San Francisco, California, p. 52, February (1956).
15. Plugge, W. R., and Perry, M. N., American Airlines' "Sabre" electronic reservation system. Proc. Western Joint Computer Conf., Los Angeles, California, pp. 593-602, May (1961).
16. Porter, R. E., The RW400, a new polymorphic data system. Datamation, pp. 8-14, January-February (1960).
17. Rochester, N., The computer and its peripheral equipment. Proc. Eastern Joint Computer Conf., Boston, Massachusetts, pp. 64-69, November (1955).
18. Teager, H. M., Time-sharing project. M.I.T. Computation Center Semi-Annual Report No. 6, January (1960).
19. Teager, H. M., and McCarthy, J., Time-shared program testing. 14th Nat. Meeting Assoc. Computing Machinery, Cambridge, Massachusetts, pp. 12-1 and 12-2, September (1959).
Recent Developments in Nonlinear Programming

PHILIP WOLFE

Mathematics Division, The RAND Corporation, Santa Monica, California
1. Introduction
   1.1 The Motivation for Nonlinear Programming
   1.2 The Mathematical Programming Problem
   1.3 Some Definitions
2. Differential Gradient Methods
   2.1 The General Approach
   2.2 The Direct Differential Gradient Method
   2.3 The Lagrangian Differential Gradient Method
3. Large-Step Gradient Methods
   3.1 The General Approach
   3.2 The Simplex Method for Linear Programming
   3.3 The Simplex-Corrected Gradient Method
   3.4 Projected-Gradient Procedures
4. Simplicial Methods
   4.1 The Role of the Simplex Method
   4.2 The Simplex Method for Linear Programming
   4.3 The Simplex Method for Quadratic Programming
5. Columnar Procedures
   5.1 Columnar Procedures in General
   5.2 Separable Programming
   5.3 The Decomposition Procedure
6. The Cutting-Plane Method
7. Initiating an Algorithm
8. Computer Routines and Literature
   8.1 Direct Differential Gradient Methods
   8.2 Lagrangian Differential Gradient Methods
   8.3 The Simplex-Corrected Gradient Method
   8.4 Projected-Gradient Procedures
   8.5 Quadratic Programming
   8.6 Separable Programming
   8.7 The Decomposition Procedure
   8.8 The Cutting-Plane Method
Bibliography
1. Introduction

1.1 The Motivation for Nonlinear Programming
While nonlinear programming has been a topic of discussion among people concerned with allocation problems about as long as linear programming has, not much was done about it until three or four years ago. The recent past has shown a great increase in the amount of attention devoted to this area, however, and some of the reasons for this new interest are readily ascertained. On the theoretical side, it would seem that a large part of the purely mathematical theory of linear programming is now well in hand [ I d ] . While, of course, the subject will never be exhausted for the mathematician, he perhaps no longer feels justified in expecting sweeping new results in that area; but when the hypothesis of linearity is dismissed, it takes away with it many of the known results and intuitions we have about these problems. The resulting gap in our knowledge, together with the great practical interest the resulting nonlinear problems have, offers the theoretician a continuing challenge. Likewise the subject of linear programming seems very well in hand on the computational side. While we may never have computer routines and algorithms capable of solving the largest problems we would like to formulate, we can now solve linear programming problems involving several thousand variables and about 500 equations, problems whose input data may have taken years to collect; and it is fairly clear what must be done to increase this capacity even further. For nonlinear programming, nothing of the sort can be said. Quite a few algorithms for various kinds of nonlinear problems have been proposed-these notes are devoted to describing the majority of them-but very few have ever been put into practice. Some procedures are demonstrably quite inefficient, while others will require much computer programming and experience to test. 
As will be seen below, there is no lack of information as to how nonlinear programming might be done, but on the other hand there is almost no information on how nonlinear programming should be done. Here, then, is a challenge to the computational expert, one with which he is beginning to grapple seriously. Finally, the practical value of the solution of these problems is high. Almost no real problem is linear; linearity represents our compromise between reality and the limitations of our tools for dealing with it. The user who has only linear techniques for handling his mathematical programming problems is confronted with one of two expensive alternatives: he must accept the results of a linearization of his problem and bear the expense of their deviation from reality, or undertake the imprecise and laborious task of making the results better by heuristic methods. During
the last few years, the largest users of linear programming techniques have become impatient with these restrictions. They have trained themselves in the use and interpretation of these methods, and have set up the elaborate data-collecting agencies which are required by their mathematical models. As soon as effective procedures in this area are devised, they will be used, which was not true when linear programming first appeared. As we have said, most problems are really nonlinear. While the petroleum industry, for example, which constitutes the major user of linear programming in the United States, has had extraordinary success with linear techniques, the parameters of interest in the refining and blending processes do not behave in precisely a linear manner. At least one firm is now using a nonlinear programming procedure in its daily production work, despite the primitiveness of the particular method being used. One simple type of nonlinear problem has been fairly well in hand for about two years: problems having linear constraints, but with a convex, quadratic cost function, can be solved with reasonable efficiency. An algorithm for this problem has been applied to the minimization of power losses in electrical distribution networks, to the calculation of investment portfolios so as to minimize risk, and to the scheduling of investment and importation of scarce commodities. (The nonlinearity in this last problem arose from the nonlinearity of import cost as a function of previous purchases. It represents about the largest nonlinear problem ever solved, having some 200 variables and 100 constraints.) The present paper attempts a survey of the majority of techniques which have been proposed for nonlinear programming problems.
Disparate as they are, they may be grouped as we do under several broad headings in such a way that the techniques belonging to one group are similar in concept, are aimed at a certain class of problem, and may have similar computational effectiveness. Our grouping of the methods presented here is given at the end of this paper, where the references are cited in connection with that outline rather than in the text.

1.2 The Mathematical Programming Problem
The general mathematical programming problem can be posed this way: given the objective function f(x) = f(x1, . . . , xn) and the m constraint functions gi(x) = gi(x1, . . . , xn), determine x = (x1, . . . , xn) so as to
Maximize f(x) under the constraints

    gi(x) ≤ 0    for i = 1, . . . , m.        (1.1)
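To fix ideas, here is a toy instance of (1.1) in Python, with a concave f and convex gi as will be assumed in Section 1.3. The particular functions and the crude grid search are illustrative inventions for evaluating the definitions; the grid search is not one of the algorithms surveyed below.

```python
# A toy instance of problem (1.1): maximize f(x) subject to g_i(x) <= 0.
# f is concave and each g_i convex; the brute-force grid search merely
# makes the definitions concrete. All particulars are made up.
def f(x1, x2):
    return -(x1 - 1.0)**2 - (x2 - 2.0)**2    # concave objective

constraints = [
    lambda x1, x2: x1**2 + x2**2 - 4.0,      # g1: inside circle of radius 2
    lambda x1, x2: -x1,                      # g2: x1 >= 0
]

best, best_val = None, float("-inf")
steps = [i * 0.05 for i in range(-60, 61)]   # grid over [-3, 3]^2
for x1 in steps:
    for x2 in steps:
        if all(g(x1, x2) <= 0 for g in constraints):   # feasible?
            v = f(x1, x2)
            if v > best_val:
                best, best_val = (x1, x2), v
print(best)  # best feasible grid point; the true solution lies on the circle, toward (1, 2)
```

The unconstrained peak (1, 2) is infeasible, so the maximizer sits on the boundary g1(x) = 0, illustrating the remark below that boundaries and vertices of the constraint set usually play an important role.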
Naturally, the functions f and gi will have to be restricted in some way, as will be done below. It is important to note that (1.1) comprises all the constraints put upon x, which is otherwise allowed to be any collection of n real numbers. It would be a very serious matter to require besides, for example, that the components of x be integers; and while this kind of restriction is dealt with elsewhere for the case of linear programming, it is inappropriate to the bulk of our work. Incidentally, while the constraints xj ≥ 0 (for each j) are usually imposed in a programming problem, there is no need especially to mention them here, since they are just special cases of the constraints (1.1). Figure 1 illustrates this problem in two dimensions (n = 2). The shaded
Fig. 1. Nonlinear programming problem: x = (x1, x2); m = 3.
area is the constraint set, the set of all points x satisfying the relations (1.1). Note that its boundaries are just the places where some constraint function vanishes. These boundaries, as well as the vertices of the constraint set (points, of which there are two in the figure, where so many gi vanish that a unique point is defined), usually play an important role in the problem. A contour of the function f is a set of points on which the function has a constant value; a number of these are sketched. Geometrically stated, the programming problem is that of finding the highest-valued contour having some point in common with the constraint set.

1.3 Some Definitions
The gradient at x, denoted by ∇f(x), has been drawn for a point in the constraint set. Defined by

    ∇f(x) = (∂f/∂x1, . . . , ∂f/∂xn),

it is always perpendicular to that contour of f which passes through x, and points in the direction of steepest ascent, that is, the direction of maximum rate of change of f per unit distance. The gradient plays an essential role, although sometimes a well-concealed one, in most mathematical programming procedures, for it is through the gradient that linear approximations are constructed to nonlinear functions, making them numerically tractable. Specifically, when all the derivatives of f(x) exist, so that ∇f(x) can be computed, and are continuous, then the equation of the plane tangent to the graph of z = f(x) at the point (x0, z0) is

    z = z0 + ∇f(x0) (x - x0)        (1.2)

(see Fig. 2). Since this tangent plane is the best linear approximation to f in the neighborhood of x0, the formula (1.2) will serve to replace f(x) where linear computations are to be done. One more assumption will be made about the function f: it is to be concave, which means that the graph of the function is to lie below any of its tangent planes (as in Fig. 2). Part of the importance of this requirement lies in the fact that any linear approximation to f by functions like (1.2) will then be one-sided, always giving values at least as high as those sought, so that they place bounds on the possible answers. Another consequence of concavity is the fact that for any number K, the set of all x such that f(x) ≥ K is a convex set, a set which contains the entire line-segment joining any two of its points (see Fig. 1 for a two-dimensional example). The functions gi will all be assumed to be convex. A convex function is the
Fig. 2. Concave function of two variables.
negative of a concave function; its graph lies above any of its tangent planes (see Fig. 3). For any K, the set of all x such that gi(x) ≤ K is convex. In particular, each of the inequalities of (1.1) defines a convex set, and thus the constraint set itself, being the common part of several convex sets, is convex. The assumptions about f and the gi together give the programming problem an important property: any local solution is the proper solution of the problem. A local solution is a point x0 satisfying the constraints such that no constrained point can be found in the immediate neighborhood of x0 which gives a higher value of f. (The point x = a of Fig. 3 is an example of such a local maximum point which would not be the solution of the maximization problem; there the objective is convex instead of concave.) If x1 is the solution of the programming problem under the assumptions above on f and gi, then for any other point x0 in the constraint set, all points on the segment joining x0 and x1 will have values of f at least as high as f(x0), so that x0 cannot be a local solution without being the proper solution as well. While this "local implies global" property is the most important use of
Fig. 3. Convex function of one variable.
convexity in nonlinear programming, and for the simpler procedures its only use, it plays a more subtle, yet vital, role in some of the procedures to be discussed.

2. Differential Gradient Methods

2.1 The General Approach
The contours of the problem of Fig. 1 are commonly thought of as the geodetic contours of a "hill" up which we are trying to move the point x, as constrained by the "fences" gi(x) ≤ 0. The ordinary man has no difficulty moving up such a hill, knowing as he does the direction ∇f(x) of steepest ascent at the point at which he is standing. He takes one step at a time, as quickly as possible, so as always to increase his altitude; when he is constrained by a fence, he moves along it, so long as he keeps going up; and he finally stops when no step he can take will move him higher. It is simplest to describe his motion in terms of infinitesimal steps dx taken during the elapse of an infinitesimal time dt, or in terms of his (vector) velocity dx/dt. In the most primitive kind of steepest ascent his motion could be described just by
    dx/dt = ∇f(x),        (2.1)
that is, by

    dxj/dt = ∂f(x)/∂xj    (j = 1, . . . , n);
but, of course, this equation takes no account of the constraints of the problem. Unless we are fortunate enough to have a solution of the problem interior to the constraint set, the point x will eventually wander outside it. The two "differential gradient" methods presented here, the "direct" and the "Lagrangian", adopt different methods for enforcing the problem's constraints.

2.2 The Direct Differential Gradient Method
If a constraint gi(x) ≤ 0 is violated, that is so because gi(x) is too big. This function can be reduced by motion in the direction -∇gi(x), just as (2.1) increases f; and all this can be done at the same time by the differential equations

    dx/dt = ∇f(x) - Σi δi(x) ∇gi(x),        (2.2)

where

    δi(x) = 0 if gi(x) ≤ 0,
    δi(x) = K if gi(x) > 0.        (2.3)

In (2.3) the number K is chosen sufficiently large to keep x from leaving the constraint set, for example, larger than the maximum of all |∇f(x)| / |∇gi(x)| for any x on the boundary. The terms of Σi δi(x) ∇gi(x) serve to "kick back" x when it tends to leave the constraint set. Of course, the differential equations (2.2) must not be taken too seriously from an analytic point of view. They will not, in general, have solutions, owing to the discontinuity of the terms δi(x). The equations are to be taken rather as a guide to the formulation of a computational procedure for this problem. In digital computations, dx and dt would be replaced by finite intervals Δx and Δt; the interval Δt would be chosen (perhaps) small, and Eq. (2.2), with dx/dt replaced by Δx/Δt, used to obtain the new point x + Δx to replace x in these formulas. Figure 4 sketches the course of x under this procedure for small, nearly uniform steps. Evidently there is a great deal of room for experimentation in the matter of proper step size, decrease of step size as the solution is approached, and in a number of other technical matters. This procedure certainly has the most immediate, intuitive appeal of all those discussed here. While for convex problems it will, of course, find their proper solution, it will also find a local solution for just about any problem, a property that not all the methods share. On the other hand, we suspect
Fig. 4. Direct differential gradient method.
that it is likely to be far less efficient for convex problems than some of the methods below, which may require no more computation to take one very large step toward the solution of the problem than this procedure does to take one of its small steps.

2.3 The Lagrangian Differential Gradient Method
In this type of procedure the discontinuities of the corrections (2.3) applied to the steepest ascent path are avoided by taking a more relaxed approach to the satisfaction of the constraints. The process is governed by the differential equations

    dx/dt = ∇f(x) - Σi ui ∇gi(x)        (2.4)

and

    dui/dt = gi(x) if ui > 0 or gi(x) > 0,
    dui/dt = 0 otherwise.        (2.5)

Now the right-hand sides of the differential equations are continuous if the derivatives of f and gi are, and the equations will possess solutions, although it will still be impractical to try to give them in closed form; the computation must be done pretty much as before. If the point x violates constraint i, the number ui will tend to rise, by equation (2.5); and whenever ui is positive, equations (2.4) tend to nudge it back across the boundary of gi(x) ≤ 0. Note that (2.5) prevents ui from becoming negative: there is no penalty attached to having gi(x) < 0. It can be shown that under suitable conditions the above differential equations have solutions xj(t), ui(t) which converge to values (x10, . . . , xn0, u10, . . . , um0) = (x0, u0) as t → ∞. It can then be argued that x0 solves the programming problem, as follows. Evidently at the point (x0, u0) we must have dx/dt = 0 and dui/dt = 0. Supposing that the differential equations were started with all ui ≥ 0, by (2.5) no ui will ever go negative, so that

    ui0 ≥ 0  (all i).        (2.6)

Equation (2.5) also makes it impossible that ui0 > 0 and gi(x0) > 0 at the same time, so that

    ui0 gi(x0) = 0  (all i).        (2.7)

Equation (2.4) yields

    ∇f(x0) = Σi ui0 ∇gi(x0).        (2.8)

In order to show that x0 solves the programming problem, suppose that some other x satisfying the constraints of the problem is given; we will show that f(x) ≤ f(x0). The equation of the tangent plane to the graph of f at the point (x0, f(x0)) is

    z = f(x0) + ∇f(x0) (x - x0)

[see Eq. (1.2)]. Since f is concave, its graph lies below this plane, so that for any x we have

    f(x) ≤ f(x0) + ∇f(x0) (x - x0).

For the convex functions gi, the reverse is true: we have

    gi(x) ≥ gi(x0) + ∇gi(x0) (x - x0)
for any x and i. These two relations, together with (2.8), give

    f(x) ≤ f(x0) + ∇f(x0) (x - x0)
         = f(x0) + Σi ui0 ∇gi(x0) (x - x0)
         ≤ f(x0) + Σi ui0 [gi(x) - gi(x0)]
         = f(x0) + Σi ui0 gi(x) - Σi ui0 gi(x0).

Since now each gi(x) ≤ 0, and each ui0 gi(x0) = 0, we have at last

    f(x) ≤ f(x0).

The Lagrangian differential procedure has the odd property that the point x may wander in and out of the constraint set during the computation; the terms ui which eventually force it in do not get large until x has been outside for some time. The final values of the multipliers are, however, of great interest, being the "shadow prices" associated with the constraints of the problem, or the "marginal costs" of the resources whose limited availabilities are asserted by the constraints. They are generally found as by-products in the various solution techniques for these problems, and not introduced explicitly into them as in this procedure. The name "Lagrangian" is given to this method because the ui actually play the role of Lagrange multipliers in the computation. Solving the differential equations has been an elaborate way of solving, for the Lagrangian function

    f(x) - Σi ui gi(x),

the relations

    ∇x [f(x) - Σi ui gi(x)] = 0.
Owing to the peculiar nature of the constraints gi(x) ≤ 0, being inequalities instead of equations as in the classical kind of constrained-extremum problem, the multipliers have unusual requirements set upon them: effectively, they are required to satisfy (2.6) and (2.7) above. The formulation of mathematical programming problems in terms of such Lagrangians has been extensively developed, and forms a fruitful and unifying point of view for much of the mathematical theory of these problems.
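Both differential schemes are easy to mimic with finite steps. The sketch below applies Eqs. (2.2)-(2.3) and (2.4)-(2.5) to an invented toy problem: maximize f(x) = -(x1 - 2)^2 - (x2 - 2)^2 subject to g(x) = x1 + x2 - 2 ≤ 0, whose solution is (1, 1) with multiplier u = 2. The constant K and the step sizes are arbitrary illustrative choices, and clamping u at zero is one hedge against discrete overshoot of (2.5).

```python
# Finite-step sketches of the two differential gradient methods on a toy
# problem: maximize f(x) = -(x1-2)^2 - (x2-2)^2 s.t. g(x) = x1 + x2 - 2 <= 0.
# grad f = (-2(x1-2), -2(x2-2)); grad g = (1, 1). All constants illustrative.
def grad_f(x):
    return [-2*(x[0]-2), -2*(x[1]-2)]

def g(x):
    return x[0] + x[1] - 2

def direct(x, K=10.0, dt=0.01, steps=5000):
    """Eqs. (2.2)-(2.3): subtract K*grad g whenever g(x) > 0 ("kick back")."""
    for _ in range(steps):
        d = K if g(x) > 0 else 0.0                  # delta_i(x) of (2.3)
        gf = grad_f(x)
        x = [x[0] + dt*(gf[0] - d), x[1] + dt*(gf[1] - d)]
    return x

def lagrangian(x, u=0.0, dt=0.005, steps=20000):
    """Eqs. (2.4)-(2.5): the multiplier u rises while the constraint is
    violated and nudges x back; u never goes negative."""
    for _ in range(steps):
        gf, gx = grad_f(x), g(x)
        x = [x[0] + dt*(gf[0] - u), x[1] + dt*(gf[1] - u)]   # (2.4)
        if u > 0 or gx > 0:                                  # (2.5)
            u = max(0.0, u + dt*gx)
    return x, u

print([round(v, 2) for v in direct([0.0, 0.0])])   # oscillates near (1, 1)
x, u = lagrangian([0.0, 0.0])
print([round(v, 2) for v in x], round(u, 2))       # -> [1.0, 1.0] 2.0
```

The direct iterate chatters across the fence in a band whose width is set by K and the step size, while the Lagrangian iterate spirals into the solution and delivers the shadow price u = 2 as a by-product, just as the text describes.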
3. Large-Step Gradient Methods

3.1 The General Approach
A "large-step gradient" method is conceived as having the same basic motivation as a differential gradient method; an effort is made to move in the direction of the gradient of the objective function. These methods concentrate, however, on taking as large a step Δx in that direction as
possible, the length of the step being limited only by the constraint boundaries, utter misbehavior of the objective function, or some limitation inherent in the procedure other than the "small step" stipulation of the differential methods. It will be noted that this class is not entirely well-distinguished from the previous class nor from those which will be taken up later, the "simplicial" procedures. On the one hand, a differential procedure could certainly be performed with large steps in its initial phases, but ultimately a small-step requirement would have to be imposed to assure convergence, and in any case its underlying reasoning is differential. On the other hand, one of the procedures of this section is certainly simplicial, if anything is: the simplex method for linear programming; nevertheless, we shall exhibit it as a sort of gradient method. (It will also be used within one of the two other procedures described here.) Both of the procedures given here are aimed at problems having a nonlinear objective function but linear constraints. The ability to take a large step in the direction of the gradient of the objective seems to require linear constraints. If the same methods are pushed to cases of nonlinear constraints, it appears that they would spend so much effort dealing with those constraints that they would perform in about the same way that differential gradient methods do.

3.2 The Simplex Method for Linear Programming

Figure 5 gives a geometric visualization of the linear programming problem. This one has seven constraints: three of them have the form -xj ≤ 0 and the other four are given by more complicated linear functions of x. Since the constraints are all linear, the faces of the constraint set are planar. The ith face (i = 1, . . . , 7) is the set of all x for which gi(x) = 0; they have been so labeled in the diagram.
The vertices of the constraint set are identified by listing the faces which meet in them: for example, the vertex P2 is identified as lying on the planes g1(x) = g5(x) = g7(x) = 0. The simplex method operates this way: let f(x) = Σj cj xj; then ∇f(x) = c, so ∇f(x) is constant. Evidently the solution of the problem will be a point of the constraint set located as far as possible in the direction of the vector c. Take any vertex of the constraint set; the direction numbers of all the edges leading out of the point may be calculated, so that those edges making an acute angle with c can be determined. Following such an edge from the current vertex will yield another having a higher value of f. The process is repeated until a vertex is reached none of whose edges makes an acute angle with c; that vertex is the solution of the problem.
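The fact that the solution lies at a vertex can be checked by brute force on a small example. The sketch below uses hypothetical two-variable data (not the seven-constraint problem of Fig. 5): it enumerates the intersections of pairs of constraint lines, keeps the feasible ones, and picks the vertex maximizing c · x.

```python
# Brute-force illustration (hypothetical 2-variable problem, not Fig. 5):
# since the optimum of a linear program lies at a vertex, we may enumerate
# all intersections of pairs of constraint lines, keep the feasible ones,
# and take the one maximizing c.x.
from itertools import combinations

# constraints a.x <= b, stored as (a1, a2, b); the first two are -x1 <= 0, -x2 <= 0
cons = [(-1, 0, 0), (0, -1, 0), (1, 1, 4), (1, 0, 3)]
c = (3, 2)  # maximize 3*x1 + 2*x2

def intersect(p, q):
    # intersection of the two lines a.x = b, if any
    (a1, a2, b), (d1, d2, e) = p, q
    det = a1 * d2 - a2 * d1
    if det == 0:
        return None
    return ((b * d2 - a2 * e) / det, (a1 * e - b * d1) / det)

def feasible(x):
    return all(a1 * x[0] + a2 * x[1] <= b + 1e-9 for a1, a2, b in cons)

verts = [v for p, q in combinations(cons, 2)
         if (v := intersect(p, q)) is not None and feasible(v)]
best = max(verts, key=lambda v: c[0] * v[0] + c[1] * v[1])
print(best)   # the optimal vertex (3.0, 1.0), objective value 11
```

The simplex method, of course, avoids this exhaustive enumeration by walking from vertex to adjacent vertex along edges that increase the objective.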
RECENT DEVELOPMENTS IN NONLINEAR PROGRAMMING
FIG. 5. Linear programming: the simplex method.
This is sketched in Fig. 5. Beginning at O, we find the path O P1 P2 P3 around the constraint set terminating in a solution. The simplex method constitutes an efficient means for performing the calculations that are necessary for this process. While we shall not discuss it in detail, let us see how it proceeds from the vertex O of Fig. 5 to the next vertex. At O, all the relations of the problem can be described in terms of the three independent variables x1, x2, x3. Moving from O to another vertex will be accomplished by increasing one of these variables until a constraint is met. It is reasonable to choose that variable which will make the greatest increase in the objective: that will be the one for which cj (j = 1, 2, 3) is the largest. (In the figure, it is x2.) Retaining x1 = x3 = 0, we move to P1. At P1, it will again be possible to express all the relations of the problem in terms of three independent variables and try again. This time they will be chosen as x1, x3, and, say, x5 = −g5(x), the last choice made so that they will all be zero at P1 and nonnegative inside the constraint set; and the objective function may be re-expressed in terms of them, say as d1x1 + d3x3 + d5x5. The cycle may then be repeated. In every case, the simplex method concentrates on one variable in the problem, increasing it when
profitable, and altering the others so as to remain on the edges and vertices of the constraint set. It was mentioned above that there is a sense in which the simplex method is a gradient method. Generally speaking, a gradient vector of a function may be defined as a direction in which the directional derivative of the function is maximal: more precisely, as a vector Δx = (Δx1, . . . , Δxn) maximizing

Σj (∂f/∂xj) Δxj

and such that ‖Δx‖ = 1, where ‖Δx‖ denotes the length of Δx. When we take

‖Δx‖² = (Δx1)² + · · · + (Δxn)²,

it is easy to show that the vector which does this is indeed proportional to ∇f(x) = (∂f/∂x1, . . . , ∂f/∂xn), which is therefore satisfactory to use. The above definition of ‖Δx‖ is not, however, the only one that can be used: many other notions of “length” are available, and perhaps equally plausible. One of the more interesting would be

‖Δx‖ = |Δx1| + · · · + |Δxn|.
If this convention is used and the maximization problem posed again, a new answer is found: its components must all be zero excepting the jth, where ∂f/∂xj is the maximum of all the partial derivatives, and Δxj = sgn(∂f/∂xj). Now this is precisely the rule of progress of the simplex method, since ∂f/∂xj = cj: we try only to increase xj. Thus, with this more liberal interpretation of the concept of the gradient, the simplex method is a gradient method. Evidently the bulk of calculation in this method is connected with expressing the data of the problem in terms of a selected set of independent variables, and changing this mode of expression from step to step. Fortunately, because of the linearity of all the relations, this work is of the same kind as is done in handling linear equations; and since the set of independent variables changes in only one member between steps, the work turns out to be very simple.

3.3 The Simplex-Corrected Gradient Method
Figure 6 poses the problem of maximizing a nonlinear objective function under linear constraints; the latter are drawn like those of Fig. 5. The point P of the figure is located outside the constraint set, and the objective function is taken to be the negative of the square of the distance from the point
FIG. 6. Simplex-corrected gradient method.
x to P. The problem is then that of finding the nearest point to P within the constraint set. The objective is evidently concave, as desired. The simplex-corrected gradient procedure will yield the sequence z0, z1, z2, . . . of points of the constraint set which will converge to the solution of the problem. There will also be formed an auxiliary sequence x0, x1, x2, . . . of vertices of the constraint set which are used in the process; these are produced through use of the simplex method in a certain way. Initially, let x0 be any vertex of the constraint set, and set z0 = x0. Now suppose that k steps of the procedure have been taken. There will be at hand a point zk belonging to the constraint set and a vertex xk of the constraint set. Then:
(1) Calculate ∇f(zk).
(2) Using ∇f(zk) as a (temporarily) constant objective vector and xk as an initial extreme point, take one step of the simplex method in the maximization of the linear function defined by ∇f(zk), obtaining a new extreme point; call this extreme point xk+1.
(3) Choose the point zk+1 on the segment joining zk to xk+1 so as to maximize f on that segment.
This sequence of steps defines a recursive process for generating the sequence z0, z1, . . . . It is assumed that the one-dimensional maximization problem of step (3) is not too difficult to perform. The special simplex-method application of steps (1) and (2) is not difficult; suitably handled, the work of step (2) is not a great deal more than that involved in performing one step of the simplex method for a problem having permanently fixed costs. It should be mentioned that, in case the constraint set is not bounded, it is possible that step (2) may result in the taking of an “unbounded” simplex step. Step (3) need only be slightly altered to incorporate this possibility. An estimate of the speed of convergence of the procedure is given by the following result: letting M be the maximum value of f as constrained, there exists a constant K such that

M − f(zk) ≤ K/k for all k.
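As a concrete illustration, here is a minimal sketch of the procedure on a hypothetical problem: the constraint set is the unit box [0, 1]², so the simplex step reduces to picking the corner of the box favored by the gradient, and f(z) = −‖z − P‖² as in Fig. 6.

```python
# Sketch of the simplex-corrected gradient procedure for a box constraint set
# (a hypothetical stand-in for the polyhedron of Fig. 6, chosen so that the
# simplex step is trivial: maximizing a linear function over the unit box
# just picks a corner).  Objective: f(z) = -|z - P|^2, i.e. find the point
# of the box nearest the exterior point P.
P = (2.0, 0.5)

def grad_f(z):                      # gradient of -|z - P|^2
    return [-2.0 * (zi - pi) for zi, pi in zip(z, P)]

def lp_corner(g):                   # vertex of [0,1]^2 maximizing g.x
    return [1.0 if gi > 0 else 0.0 for gi in g]

z = [0.0, 0.0]                      # z0 = x0 = a vertex of the box
for _ in range(20):
    x = lp_corner(grad_f(z))        # one simplex step toward a new extreme point
    d = [xi - zi for xi, zi in zip(x, z)]
    dd = sum(di * di for di in d)
    if dd < 1e-12:
        break
    # exact line search for the quadratic f on the segment [z, x]
    t = -sum((zi - pi) * di for zi, pi, di in zip(z, P, d)) / dd
    t = max(0.0, min(1.0, t))
    z = [zi + t * di for zi, di in zip(z, d)]
print(z)   # approximately [1.0, 0.5], the nearest point of the box to P
```

With a quadratic objective the one-dimensional maximization is available in closed form, which is why this case is especially convenient.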
Unfortunately, the procedure is unlikely to terminate in the exact solution of the problem, although it does in the case of a quadratic objective, where other exact methods also exist. On the other hand, this will be no objection if it arrives in the near neighborhood of the solution with sufficient speed.

3.4 Projected-gradient Procedures
Methods of the “projected gradient” kind can be viewed as resulting from the attempt to make a differential gradient method take as large steps as possible, while never allowing the point x to leave the constraint set. Figure 7 illustrates such a procedure, beginning at the point x0 and generating the sequence of points x1, x2, . . . .

FIG. 7. Projected-gradient method.

Below, by “plane” is meant the entirety of a single hyperplane of the form gi(x) = 0, whose intersection with the constraint set yields, in general, one of its faces. Starting with the point xk in the constraint set, either one or two successors of xk are determined by the following steps. A particular set of planes is associated with xk at all times; initially, let this be the set of all planes which pass through xk.
(1) Calculate ∇f(xk).
(2) Find the projection of ∇f(xk) onto the intersection of all the planes associated with xk. (In case there are no planes, as when xk is interior to the constraint set, this intersection is the whole space, and thus the projection is ∇f(xk) itself.)
(3) If the projection is different from zero, extend a ray from xk in that direction, and define xk+1 to be the farthest point along the ray belonging to the constraint set.
(a) If f(xk+1) > f(xk), then the cycle is complete.
(b) Otherwise, choose xk+2 so as to maximize the function f on the segment xk xk+1; this completes the cycle.
(4) If the projection is equal to zero, then ∇f(xk) may be written

∇f(xk) = Σi ri Ai
as a linear combination of normals Ai of the planes associated with xk (the Ai are chosen to point away from the constraint set).
(a) If all ri ≥ 0, then xk is the solution of the problem.
(b) Otherwise, define a new set of planes to be associated with xk by deleting from the present set some one plane for which ri < 0, and return to step (2).
As with the simplex-corrected gradient method, it is assumed here that the one-dimensional maximization problem which may have to be solved in step (3b) is not difficult to do, which is the case. In Figure 7, the points x4 and x6 have been obtained as the result of maximizing f on segments of the form xk xk+1; at these maxima ∇f is, of course, perpendicular to the segment in question. Convergence of the procedure to a solution of the nonlinear problem is not difficult to show. Unlike the previous gradient methods, this procedure does not completely reduce to the simplex method for a linear problem; but it does so reduce if the points xk are vertices of the constraint set. In the same way that the simplex method may be looked on as a gradient method, a procedure very closely related to the above, which need not actually use the projection of ∇f(xk) in the calculation, can be given. The projection of ∇f(xk) onto the face in which xk lies turns out to be precisely the direction of steepest ascent for the function f per unit distance in the Euclidean metric, if that direction is chosen so as to keep one in the constraint set. If, on the other hand, some other metric were used, as in the discussion of the gradient aspect of the simplex method above, a somewhat different algorithm would be obtained. The metric ‖Δx‖ = Max {|Δx1|, . . . , |Δxn|}, for example, changes the work of step (2) from that of finding the projection of ∇f(xk) onto the face of xk to that of determining a point y maximizing ∇f(xk) · y under the linear constraints of the original problem augmented by the constraints |xjk − yj| ≤ 1; the direction of motion away from xk is then that of the ray from xk through y. (The first reference cited under the heading of projected-gradient methods employs the projection; the second employs this latter procedure.)

4. Simplicial Methods
4.1 The Role of the Simplex Method
In the current practice of the art of nonlinear programming, those computational methods which are not essentially gradient methods use the simplex method for linear programming to do their basic computational work. As will be seen below, there are three ways in which this happens. The first way we have called “simplicial”: it is the solution of a nonlinear problem (specifically, the quadratic programming problem) by what is almost exactly the simplex method itself. Once the problem has been set up, no additional data need be generated. This is not the case with the other two types of method; they depend in an essential way on the process of linearization, and make use of the linear programming process to solve linear problems, derived from the nonlinear problems, which give approximate answers to the real problems, the results thus obtained being used to improve the approximation further. They are essentially infinite methods. In the sense in which the term is used here, there is only one “simplicial” procedure current, that for quadratic programming.

4.2 The Simplex Method for Linear Programming
The simplex method for linear programming problems was described briefly in Section 3.2. We return to it here only to permit the development of those features which are used in the nonlinear programming algorithms described later. The usual statement of the linear programming problem is:

Maximize Σj cj yj subject to yj ≥ 0 (all j), Σj aij yj ≤ bi (all i). (4.1)
The Lagrange multipliers, or “dual variables,” ui (i = 1, . . . , m) described in Section 2.3 play an important role in the forms of the simplex method used in calculation. The duality theorem for linear programming states that y1, . . . , yn solve the problem (4.1) if and only if there exist u1, . . . , um ≥ 0 such that

Σi ui aij ≥ cj (all j) (4.2)

and

Σi ui bi = Σj cj yj. (4.3)
The main features of the so-called primal simplex method can be described in these terms. Suppose that u1, . . . , um and some subset J of the indices 1, . . . , n are known for which yj = 0 unless j ∈ J. (It is not difficult to satisfy these starting conditions, if they do not obtain, by embedding the given problem in a certain larger one, as described in Section 7.) If now it happens that inequality (4.2) holds for all other j as well, then the problem is solved, by the duality theorem. If, however, it is not, then it is possible to select some j for which

Σi ui aij − cj < 0 (4.4)

(whence necessarily yj = 0), and, by allowing the corresponding yj to become positive, obtain u and y satisfying (4.2, 4.3) with the index-set J augmented by the index j. Continuing in this way, the simplex method ultimately yields u and y satisfying the relations (4.2, 4.3) for all indices, thus solving the problem. The efficiency of this process is based largely on the fact that the amount of computing that must be done when one index is added is small. In actual practice, one does not try to maintain the constraints (4.2) throughout the calculation; by choosing instead always to increase that zero variable yj for which

Maxj [cj − Σi ui aij] (4.5)

is achieved, and ensuring that the constraints (4.1) of the original problem remain in force, the same results can be obtained. The dual simplex method can be described in a similar way. In it, at the start of an iteration one has u and y satisfying all the constraints of the form (4.2), but not necessarily satisfying all those of the form (4.1). Then, one by one, the set of indices i for which Σj aij yj ≤ bi actually holds is increased until it is complete, at which time, by the duality theorem, the solution is at hand. The linearization methods that will be discussed in Sections 5.3 and 6 below proceed by augmenting the data of a given linear programming
problem, representing an approximation to the nonlinear programming problem, with additional data in order to improve the approximation. A columnar method does this by adding data in the form of columns (indexed on j) to the matrix aij of (4.1); thus for these, the primal simplex method is an efficient way of proceeding from the solution of one approximate problem to that of the next. The cutting-plane method, on the other hand, adds constraints; for this method, the dual simplex method is appropriate. Note that the presence of variables, such as our x, unrestricted by inequalities of the form yj ≥ 0 which have been assumed here, does not change the essentials of the simplex method. The linear programming problem is, in fact, usually simpler if these are present; since they are not especially restricted, they might be solved for and eliminated from the problem. Even when they are not eliminated, it is easy to carry out the necessary calculations with them present but ignored.
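A minimal numerical sketch of the selection rule (4.5), with made-up data: given trial multipliers u, the variable to increase is the index j achieving the maximum reduced cost cj − Σi ui aij.

```python
# Hedged sketch of the entering-variable rule (4.5) with hypothetical data.
c = [6.0, 4.0, 3.0]
a = [[2.0, 3.0, 1.0],       # a[i][j]
     [4.0, 1.0, 2.0]]
u = [1.0, 0.5]              # current dual (multiplier) estimates

# reduced cost c_j - sum_i u_i a_ij for each column j
reduced = [cj - sum(u[i] * a[i][j] for i in range(len(u)))
           for j, cj in enumerate(c)]
j_enter = max(range(len(c)), key=lambda j: reduced[j])
print(reduced, j_enter)     # the first variable has the largest reduced cost
```

When every reduced cost is nonpositive, condition (4.2) holds for all j and, by the duality theorem, the current solution is optimal.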
4.3 The Simplex Method for Quadratic Programming

The “quadratic programming” problem is a considerable specialization of the general nonlinear programming problem. It has only one nonlinearity, its objective function, which is a quadratic function of the variables of the problem. All the constraints are assumed to be linear. In the notation of Section 1:

f(x) = Σj pj xj − Σj,k Qjk xj xk (j, k = 1, . . . , n), (4.6)
gi(x) = Σj aij xj − bi (i = 1, . . . , m). (4.7)

Since the function f is to be concave, the n × n matrix Q must be positive semidefinite. While this problem may seem a very simple one compared with the general problems treated elsewhere here, it has one remarkable property: an exact solution may be obtained, as in linear programming, by linear methods. This property is due essentially to the fact that the gradient of f is a linear function of x, and that the gi are linear. The conditions which yield a solution of the quadratic programming problem follow from the observations made in Section 2.3 on the use of Lagrange multipliers in inequality-constrained problems. It was shown there that x solves the mathematical programming problem if, and only if, there exist multipliers ui ≥ 0 (i = 1, . . . , m) such that

∇f(x) = Σi ui ∇gi(x) (4.8)

and

Σi ui gi(x) = 0. (4.9)

It is convenient to restate equation (4.9). Define

yi = −gi(x) for all i. (4.10)

The equation can then be restated: for each i,

either ui = 0 or yi = 0. (4.11)
Expressed in this way, as an alternative regarding the vanishing of variables in a set of linear relations, it will be possible to achieve it through the variable-selection mechanics of the simplex method. For the present problem, Eq. (4.8) becomes

−2 Σk Qjk xk − Σi ui aij = −pj (j = 1, . . . , n). (4.12)
This equation can be made to obtain by the same device used to obtain an initial solution for the relations of a linear programming problem (see Section 7): introducing a vector variable z, which is the difference between the left- and right-hand sides of (4.12), attempt to reduce z to zero. First, some x, u satisfying all the other requirements are chosen; x = 0, u = 0 will do. Then the variable z is chosen so that, for a suitable initial value, the left-hand side of (4.12) will have the value −pj. The complete set of relations needed is

Σj aij xj + yi = bi (all i),
−2 Σk Qjk xk − Σi ui aij − pj zj = −pj (all j), (4.13)
yi ≥ 0, ui ≥ 0, xj ≥ 0, zj ≥ 0 (all i, j).

It has the initial solution xj = 0, ui = 0, yi = bi, zj = 1. The task of the algorithm is to:

Minimize Σj zj under the constraints (4.13) (4.14)

and the restriction:

if ui ≠ 0, retain yi = 0, and vice versa. (4.15)
This last restriction makes a special proof of convergence necessary, since it is a restriction which the simplex method does not ordinarily enjoy. Otherwise, the algorithm proceeds exactly as does the simplex method for linear programming. It can be shown that if Q is a positive definite matrix, then the algorithm will terminate with z = 0, so that the terminal x will solve the quadratic problem. As is often the case with procedures in this field, the algorithm seems to work also in cases for which termination has not been proved, as when Q is only semidefinite.
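The optimality conditions above are easy to check numerically. The following sketch (with made-up data) verifies equation (4.12) and the complementarity restriction (4.11) at the known solution of a two-variable quadratic program.

```python
# Numerical check of conditions (4.8)-(4.12) on a tiny quadratic program
# (hypothetical data):  maximize 2*x1 + 2*x2 - x1^2 - x2^2
#                       subject to x1 + x2 - 1 <= 0.
# The solution x = (0.5, 0.5) with multiplier u = (1,) should satisfy
#     -2*Q x - a' u = -p        (equation (4.12))
# together with u_i >= 0 and u_i * g_i(x) = 0 (restriction (4.11)).
p = [2.0, 2.0]
Q = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 1.0]]            # g_1(x) = x1 + x2 - 1
b = [1.0]

x = [0.5, 0.5]
u = [1.0]

for j in range(2):
    lhs = -2.0 * sum(Q[j][k] * x[k] for k in range(2)) \
          - sum(u[i] * a[i][j] for i in range(1))
    assert abs(lhs - (-p[j])) < 1e-9          # (4.12) holds
g1 = sum(a[0][j] * x[j] for j in range(2)) - b[0]
assert u[0] >= 0.0 and abs(u[0] * g1) < 1e-9  # u >= 0 and complementarity
print("conditions (4.11) and (4.12) verified")
```

The algorithm of this section, of course, finds such an (x, u, y) pair by simplex pivoting rather than by being handed the answer.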
5. Columnar Procedures

5.1 Columnar Procedures in General
Columnar methods are those in which a linear approximation to a given mathematical programming problem, in the form of a linear programming
problem, is refined through the addition of columns to its data. The columns result from the evaluation of the functions in the original problem on a grid of points spanning a suitable portion of the space of the problem variable x. Let x1, x2, . . . , xT be a collection of n-vectors. Any point x of the convex hull of this collection (the smallest convex set containing it) may be written

x = Σt λt xt, where Σt λt = 1 and λt ≥ 0 (all t). (5.1)
Given any function h of x, the linearization of h on the “grid” x1, . . . , xT is attained through the approximation

h(x) ≈ Σt λt h(xt). (5.2)

Any mathematical programming problem becomes a linear problem in the variables λt if x and h(x) are replaced throughout by their representations above. In general, the λt are not uniquely given by (5.1) for given x, so there is some degree of freedom in this representation, and hence in the representation of h; and whatever process is used for handling the linear problem can take advantage of this. Using these representations, the original mathematical programming problem may be stated in the approximate form:

Maximize Σt λt f(xt) (5.3)

subject to

Σt λt gi(xt) ≤ 0 (all i), Σt λt = 1, λt ≥ 0 (all t). (5.4)
The main question here is, of course, the relation of the solution of this linear programming problem to the solution of the original problem. Part of the answer follows from the assumptions which have been made on the functions involved. Let λ̄1, . . . , λ̄T be the solution of the problem defined in Eqs. (5.3) and (5.4). Then

x̄ = Σt λ̄t xt (5.5)

is offered as an approximate solution of the original problem. Owing to the convexity of the functions gi, we have

gi(x̄) = gi(Σt λ̄t xt) ≤ Σt λ̄t gi(xt) ≤ 0, (5.6)

so that x̄ satisfies its constraints. How closely f(x̄) approximates the maximum obtained in the linear problem is determined by the fineness of the grid in general. But it does follow from the concavity of f that

f(x̄) = f(Σt λ̄t xt) ≥ Σt λ̄t f(xt), (5.7)

so that x̄ gives at least as high a value of the objective function as is indicated by the solution of the linear problem.
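The two inequalities just stated are easy to verify numerically. The sketch below (one variable, made-up functions) checks (5.6) and (5.7) for an arbitrary feasible choice of the weights λt.

```python
# Check of inequalities (5.6) and (5.7) for a one-variable example
# (hypothetical data): f(x) = -(x - 1.3)^2 is concave, g(x) = x - 2 is
# convex (here linear).  For any weights lam satisfying (5.1), the point
# x_bar = sum lam_t * x_t must satisfy
#   g(x_bar) <= sum lam_t g(x_t)   and   f(x_bar) >= sum lam_t f(x_t).
f = lambda x: -(x - 1.3) ** 2
g = lambda x: x - 2.0

grid = [0.0, 1.0, 2.0, 3.0]
lam = [0.1, 0.5, 0.4, 0.0]            # sums to 1, all nonnegative

x_bar = sum(l * x for l, x in zip(lam, grid))
lin_f = sum(l * f(x) for l, x in zip(lam, grid))
lin_g = sum(l * g(x) for l, x in zip(lam, grid))
assert g(x_bar) <= lin_g + 1e-12      # (5.6): x_bar is feasible
assert f(x_bar) >= lin_f - 1e-12      # (5.7): objective at least as high
print(x_bar, f(x_bar), lin_f)
```

Here the grid point x_bar = 1.3 happens to be the unconstrained maximizer, so the gap between f(x_bar) and the linearized objective is visible directly.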
The principal technical problem in using a columnar method is that of dealing with a sufficiently fine grid without having to handle enormous quantities of data. In the sequel, two different approaches to the solution of this problem will be studied.

5.2 Separable Programming
Separable programming is the application of the grid procedure described in Section 5.1 to a problem in which each nonlinear function is separable, that is, may be written as the sum of separate functions of the components xj of the point x:

f(x) = Σj fj(xj), gi(x) = Σj gij(xj) (all i). (5.8)

In this application, the linearization technique is applied separately to each variable xj. Suppose that for each j a sequence of values xj1, . . . , xjT has been chosen (we suppose the same number T chosen for each j). Write

xj = Σt λjt xjt, (5.9)

and use the approximation

h(xj) ≈ Σt λjt h(xjt) (5.10)
for any function h of xj alone. When this is done for each variable, the resulting linear programming problem, derived from Eqs. (5.3) and (5.4), is:

Maximize Σj Σt λjt fj(xjt) (5.11)

subject to

Σj Σt λjt gij(xjt) ≤ 0 (all i), Σt λjt = 1 (all j), λjt ≥ 0 (all j, t). (5.12)
From the solutions λ̄jt of this problem the approximate solutions

xj = Σt λ̄jt xjt (all j) (5.13)

of the original problem are obtained. In the separable case, since the linear approximations are handled separately for each variable, the same results are obtained as if the general procedure of the previous section were used with a grid consisting of all points of the form

x = (x1^t1, x2^t2, . . . , xn^tn), tj = 1, . . . , T, (5.14)

which are T^n in number. There are, however, only nT variables in this problem, and (m + 1)nT evaluations of the nonlinear functions involved, rather than the T^n variables and (m + 1)T^n evaluations which direct application of the general procedure would entail. The total number of equations, however, is increased from m + 1 to m + n. Convergence of the solutions (5.13) to the solution of the original problem as the grid size goes to zero for a separable problem is assured by elementary arguments using the convexity and continuity of the objective function. It is remarkable, however, that through the use of special devices in applying the simplex algorithm to the solution of the linear programming problem (5.11, 5.12), it is possible to obtain local solutions for problems in which our assumptions of convexity and concavity do not hold. While a discussion of this would go outside the bounds of this paper, it should be mentioned that this extension considerably augments the class of functions which can be treated. The class of separable concave or convex functions, although of considerable practical value, omits many of the most common algebraic expressions. The function xy, for example, is not such, but is the sum of the convex function ¼(x + y)² and the concave function −¼(x − y)². The linear change of variables u = x + y, v = x − y then renders it separable.
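The substitution just described can be checked directly; the sketch below verifies the identity xy = ¼(x + y)² − ¼(x − y)² at a few sample points.

```python
# Quick check of the separating substitution for the product x*y:
# with u = x + y and v = x - y,  x*y = (1/4)u^2 - (1/4)v^2,
# the sum of a convex term in u and a concave term in v.
for x, y in [(2.0, 3.0), (-1.5, 4.0), (0.0, 7.0)]:
    u, v = x + y, x - y
    assert abs(x * y - (0.25 * u ** 2 - 0.25 * v ** 2)) < 1e-9
print("x*y = (x+y)^2/4 - (x-y)^2/4 verified")
```

After the change of variables, each of the two quadratic terms can be approximated on its own one-dimensional grid as in (5.9) and (5.10).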
5.3 The Decomposition Procedure
In the case of nonseparable nonlinear problems, any grid of reasonable fineness covering a large region will include a tremendous number of points, posing considerable data-processing problems for a computational routine. In actual fact, however, only a small number of these points would ever be actually used in the computation. On account of the basic properties of the simplex method, the final solution of the approximating linear programming problem would involve only m + 1 points, and probably only some small multiple of this number would be used, even temporarily, in the course of arriving at the final solution. These facts indicate that it would be well to investigate how grid points might be generated when needed, rather than all set down a priori. The decomposition algorithm for linear programming is a device for using the data of very large linear programming problems of a certain form to generate recursively just the needed data for a smaller linear programming problem whose iterated solutions solve the larger problem. What follows is essentially the application of this method to our nonlinear programming problem, conceived as being represented by a linear program of the form (5.3, 5.4) constructed from an arbitrarily fine grid. Let a grid x1, . . . , xT be given, and let the associated linear programming problem (5.3, 5.4) be solved, yielding, as well as the solution λ̄1, . . . , λ̄T, the dual solution ū0, ū1, . . . , ūm (for convenience, the equation Σt λt = 1 is numbered 0). Allowing complete freedom in the choice of grid points, we may pose the question: Of all possible points xT+1 that might be adjoined to the given grid as a further refinement, which point would the simplex method first choose as contributing the most to the solution of the thus extended linear programming problem? The answer is given by formula (4.5). In the present problem, the data constituting a column of the linear programming problem are given by

ct = f(xt), a0t = 1, ait = gi(xt) (t = 1, . . . , T). (5.15)
The column to be adjoined is thus given by the solution of the problem:

Maximize f(x) − ū0 − Σi ūi gi(x) (x unconstrained). (5.16)

Letting xT+1 be this solution, a new column for the linear programming problem is constructed according to (5.15), a new variable λT+1 added, and the simplex method once more employed to find a new solution to the expanded linear programming problem. The repeated application of the procedure of the above paragraph constitutes the decomposition algorithm for nonlinear programming. It can be shown that, when the functions involved satisfy our convexity assumptions, the process converges to a solution of the original nonlinear programming problem, in the sense that any limit point of the sequence

x̄T = Σt λ̄t xt (T = 1, 2, . . .) (5.17)
is a solution of the problem. As far as making efficient use of the grid points needed, it would seem that the decomposition algorithm is about as good as possible. Actually, the burden of the work has been shifted to the subproblem (5.16), which must be solved afresh each iteration using the ūi from the previous iteration; the over-all efficiency of the procedure depends on how readily it can be solved. In the general case, it is not necessarily substantially easier to solve than the original problem, but in many special cases it is. The fact that it is expressed without inequalities often makes classical extremization techniques practical. For example, suppose that the original problem is separable: that is, that f and the gi are of the form given by equations (5.8). Then the problem (5.16) becomes:

Maximize Σj [fj(xj) − Σi ūi gij(xj)] − ū0. (5.18)

Since there are no constraints on x in this problem, its solution is obtained when each of the terms of the summation is independently maximized. The new xT+1 is thus made up of the components xjT+1 = x̄j obtained from the solutions of the n problems:

Maximize fj(xj) − Σi ūi gij(xj). (5.19)
In most practical cases, the problem (5.19) is readily solved by elementary calculus.
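As a sketch of how one such subproblem might look (hypothetical one-dimensional data, solved by the elementary calculus just mentioned):

```python
# One decomposition subproblem (5.19) solved by elementary calculus
# (made-up data): with f_j(x) = -(x - 1)^2 and a single constraint term
# g_1j(x) = 2x, the unconstrained maximum of
#     f_j(x) - u_1 * g_1j(x) = -(x - 1)^2 - 2*u_1*x
# is found by setting the derivative to zero: -2(x - 1) - 2*u_1 = 0.
u1 = 0.25                      # current dual value from the master program
x_new = 1.0 - u1               # stationary point of the concave subproblem
# the new grid point would contribute the column (5.15):
column = {"c": -(x_new - 1.0) ** 2, "a0": 1.0, "a1": 2.0 * x_new}
print(x_new, column)
```

The column so produced is adjoined to the master linear program, which is re-solved to obtain fresh duals, and the cycle repeats.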
6. The Cutting-plane Method
The cutting-plane procedure is the second of the two principal types of linearization procedures considered here. The columnar procedures were based on the idea that the convex constraint set might be represented as the set of all convex combinations of a sufficiently dense set of points; the cutting-plane method is based on the “dual” proposition that it can be represented as the intersection of all the half spaces which contain it. As with the columnar methods, the basic theoretical concept is clear, but its usefulness depends on the efficiency with which the representation, or an essential part of it, can be carried out to a satisfactorily close approximation. It is most convenient to describe the method for a mathematical programming problem having a linear objective function. No generality is lost in doing this, since by the addition of one variable and one constraint the original problem may be embedded in one having this form. Define
ḡi(x1, . . . , xn+1) = gi(x1, . . . , xn) for i = 1, . . . , m,
ḡm+1(x1, . . . , xn+1) = xn+1 − f(x1, . . . , xn). (6.1)
Then the problem

Maximize xn+1 subject to ḡi(x1, . . . , xn+1) ≤ 0 for i = 1, . . . , m + 1 (6.2)
+ vg%(xt)(x- xt) 6 0.
(6.3) Note that the left-hand side of (6.3) is never greater than gz(z), since gz is concave; so that if x happens to belong to the constraint set-that is, if ga(z) 6 0 for all i-then x will satisfy every inequality of the form (6.3), for any xt.
RECENT DEVELOPMENTS IN NONLINEAR PROGRAMMING
181
Let now a sequence of points xl, . . . , xT be given. The linear programming problem to be solved as an approximation to the original problem is: Maximize g@>
f(z)
+ Vg;(xt>(x- d ) < 0,
subject to
i t
= =
1,. . . , m ; l , , . ., T.
(6.4)
If the solution x = x of this problem happens to satisfy all the original constraints, then it would be the solution of the original problem; because it would maximize the objective over a constraint set-that defined by (6.4)-which is at least as large as the original. The recursive step of the cutting plane procedure is this: If x does not satisfy all the constraints of the original problem, define xT+l = x ; use xT+I to construct m new linear inequalities of the form (6.4), and solve the new linear programming problem. This step is illustrated in Fig. 8 for a problem having two nonlinear constraints and a linear objective which is maximal at the right-hand side of the page. The convergence of the procedure is not difficult to prove. The only starting condition which must be assumed is that an initial set of points xl,. . . ,xT can be chosen so that the objective of the linear programming problem is bounded above. (If any of the family of linear problems thus generated should have no point satisfying its constraints, it would follow that the original problem had the same property, so that satisfaction of the constraints is guaranteed.) It is noteworthy that, unless the process terminates, the added point x T + ~ always lies outside the constraint set. Neither does that point satisfy all the constraints constructed from it for the next iteration, since letting x = zT+l in relation (6.4) gives g,(xT+l) 6 0, which cannot hold for all i. The point xT+l thus lies on the opposite side from the constraint set of the hyperplane gz(xT+l) Vg,(xT+’)(x - xT+1) = 0 (6.5)
+
for such i that g.(xTfl) > 0. These hyperplanes constitute “cuts,” cutting off pieces of the polyhedral constraint set defined by (6.4) and producing an improved approximation to the original constraint set in the neighborhood of the point xT+l. Considerable advantage can be taken of the fact that the linear program (6.4) does not change a great deal from one iteration to the next. As mentioned in Section 4, the dual simplex method makes it possible to add constraints to a linear problem which has already been solved and efficiently find a solution to the new problem. In this respect the cutting plane method is a sort of “dual” of the columnar methods, in which columns
PHILIP WOLFE
FIG. 8. The cutting-plane method.
rather than rows are added to a linear programming problem at each iteration.
It would seem further to be a good idea not to add all the constraints of the form

g_i(x^{T+1}) + ∇g_i(x^{T+1})(x − x^{T+1}) ≤ 0   (6.6)
RECENT DEVELOPMENTS IN NONLINEAR PROGRAMMING
at each iteration, since those constraints for which already g_i(x^{T+1}) ≤ 0 may remain satisfied indefinitely, and thus not need consideration. The most practical scheme seems to be that of adding at each iteration only a single linear constraint, formed for that i which achieves

Max_i g_i(x^{T+1}),   (6.7)

the "most unsatisfied" constraint. Each iteration will then consist of the addition of a single linear constraint to the linear program (6.4), and the work of obtaining the new solution will be quite small.
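The one-cut-per-iteration scheme just described is easy to sketch in code. The following is a minimal illustration (not from the text), using SciPy's linprog for the successive linear programs; the test problem, box bounds, iteration limit, and tolerances are all arbitrary choices made for the illustration.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical test problem: maximize x0 + x1 subject to the single convex
# constraint g(x) = x0^2 + x1^2 - 1 <= 0; the optimum value is sqrt(2).
# The box bounds make the initial linear program bounded above.
c = np.array([1.0, 1.0])

def g(x):
    return x @ x - 1.0

def grad_g(x):
    return 2.0*x

A, b = [], []                        # accumulated cuts  A y <= b
bounds = [(-2.0, 2.0), (-2.0, 2.0)]

x = np.zeros(2)
for _ in range(100):
    res = linprog(-c,                # linprog minimizes, so negate the objective
                  A_ub=np.array(A) if A else None,
                  b_ub=np.array(b) if b else None,
                  bounds=bounds)
    x = res.x
    if g(x) <= 1e-8:                 # x satisfies the original constraint: done
        break
    # Cut off x with the linearization g(x) + grad_g(x).(y - x) <= 0,
    # i.e. grad_g(x).y <= grad_g(x).x - g(x), as in (6.6).
    A.append(grad_g(x))
    b.append(grad_g(x) @ x - g(x))

assert g(x) < 1e-3
assert abs(c @ x - np.sqrt(2.0)) < 1e-3
```

Since this example has only one nonlinear constraint, the cut added at each pass is automatically the one for the "most unsatisfied" g_i of (6.7).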
7. Initiating an Algorithm
In discussing those algorithms above which seek always to increase the value of the objective function for points required to satisfy all the constraints of the problem, it has been assumed that such points were at hand to begin with. Of course, this is not the case in general for a practical problem. Most of these algorithms, however, can themselves be applied to the task of finding a suitable starting point in this way: The original problem can be embedded in a so-called "Phase One" problem, having an objective function of special form, such that (1) a starting point is readily found, and (2) when the special objective is maximized, a point satisfying the constraints is at hand.
The Phase One problem can be constructed in a simple way for use with any procedure applicable to the general nonlinear programming problem. Since each constraint function g_i is convex, so is Max [0, g_i]. This function is not necessarily differentiable, but its square is if g_i is differentiable, and it is furthermore convex. Thus, the Phase One objective
F(x) = −Σ_i (Max [0, g_i(x)])²   (7.1)
is concave and differentiable. An algorithm that can solve this problem without constraints may be started with any initial point. When F has been maximized to zero, the terminal point x will satisfy the constraints of the original problem. If it should happen that the maximum of F is negative, then the original constraints are necessarily inconsistent.
It is usually simpler to use the following alternative objective function for Phase One:

F(x) = −Σ_i Max [0, g_i(x)].   (7.2)

Since, however, this function is not differentiable in the neighborhood of points x for which some g_i(x) = 0, a somewhat special treatment of the procedure will be necessary.
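A small sketch (an illustration, not from the text) of unconstrained gradient ascent on the differentiable Phase One objective (7.1); the two constraint functions, the starting point, the step size, and the iteration count are arbitrary choices.

```python
import numpy as np

# Hypothetical pair of convex constraints:
#   g1(x) = x0^2 + x1^2 - 4 <= 0   (disk of radius 2)
#   g2(x) = 1 - x0 <= 0            (half-plane x0 >= 1)
gs    = [lambda x: x[0]**2 + x[1]**2 - 4.0,
         lambda x: 1.0 - x[0]]
grads = [lambda x: np.array([2.0*x[0], 2.0*x[1]]),
         lambda x: np.array([-1.0, 0.0])]

def F(x):      # the concave, differentiable Phase One objective (7.1)
    return -sum(max(0.0, g(x))**2 for g in gs)

def gradF(x):
    return -sum(2.0*max(0.0, g(x))*dg(x) for g, dg in zip(gs, grads))

x = np.array([-3.0, 0.0])          # any starting point will do
for _ in range(500):
    x = x + 0.05*gradF(x)          # fixed-step gradient ascent on F

# F has been driven essentially to zero, so x (nearly) satisfies the constraints.
assert all(g(x) <= 1e-3 for g in gs)
assert F(x) > -1e-6
```

Driving F to zero only approaches feasibility asymptotically here, which is why a small tolerance is used in the final check.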
Given any point x, define I(x) to be the set of all values of the index i for which g_i(x) > 0. Define the temporary objective function

F_x(x) = −Σ_{i∈I(x)} g_i(x);   (7.3)
note that F_x(x) = F(x). Employing now any algorithm which is applicable, proceed with the solution of the problem

Maximize F_x(x) subject to g_i(x) ≤ 0 (all i not in I(x))   (7.4)
until a point x̄ is found satisfying the constraints of (7.4) for which g_i(x̄) ≤ 0 for some i in I(x). Now redefine x to be this new x̄, and return to the problem (7.4). In effect, this process works toward satisfying all the constraints at once, retaining each for good as soon as it is satisfied. It is especially effective in the case of linear programming, where it is the standard procedure for finding a starting solution; the fact that F_x is linear makes it work very well.
In problems for which the objective function and the constraints are not of the same type, it may be well to use one sort of procedure for obtaining a starting point and another for subsequent maximization. This is particularly true if the constraints are linear and no "nearly feasible" point is at hand, in which case the simplex method Phase One seems to be by far the most efficient.
8. Computer Routines and Literature
This section will present literature citations for the nonlinear programming methods which have been studied and will describe what computational experience has been made available regarding them.
There is no single reference which covers the subject of mathematical programming in the broad sense in which it is used here. The existing textbooks in the field deal almost exclusively with linear programming. Of these, that of Gass [12] has the widest modern coverage. It is brought up to date in Gass [12a].
In contrast to the state of affairs in linear programming, almost nothing is clearly known regarding the relative computational efficiency of various techniques for nonlinear programming. Each method discussed here has been programmed in some form for a large-scale computer; but only one computer routine has been made publicly available; and most of the known routines have not been fashioned for general-purpose usage or even properly documented. There seem also to have been no comparative experiments run under similar conditions. What follows will thus serve only as an indication of the level of activity and not as a guide to the user with a problem to solve.

8.1 Direct Differential Gradient Methods
There has been a certain amount of activity in this area relating to digital computers for some ten years now. The best comprehensive reference up to 1957 is the monograph of Brown [4], which surveys a number of variants of the basic method. Proposed applications for digital computers since then have been given by Fiacco et al. [9], Carroll [5], and Rosen [18]. This type of procedure lends itself very nicely to analog computation, a fact which has been exploited by Ablow and Brigham [1], Pyne [16], and DeLand [7].
There seems to have been a considerable amount of experimentation with direct differential gradient methods, but no reports have been published. One gets the impression that these methods may have worked well for some particular small problems, but that they have not succeeded in a general way for large problems.

8.2 Lagrangian Differential Gradient Methods
The earliest experience with these procedures seems to be that of Manne [14]. Since then, most of the work in the area is due to Arrow et al., whose collection of papers [2] contains most of what has been done.
Manne reports mildly encouraging experience with this sort of method as programmed for the IBM CPC Model 2 (1953). Thomas Marschak [see 2] reports on some experiments with another version programmed for the JOHNNIAC computer (whose speed is approximately 1/75th that of the IBM 704). Approximately thirty minutes were required for the solution of a five-equation, fourteen-variable linear programming problem.

8.3 The Simplex-corrected Gradient Method
This procedure is described by Frank and Wolfe [10]. No computational experience with it has been reported.

8.4 Projected-gradient Procedures
Several closely related procedures of this kind have been proposed. One is given in considerable detail by Rosen [17], and another by Zoutendijk [24]. Some of the work of Frisch [11] is based upon this approach. What seems to be a general-purpose routine embodying this procedure for problems having nonlinear objectives and linear constraints has been programmed by Rosen and R. P. Merrill for the IBM 704 and 7090, and some results using this program have been reported [17]. The routine is not presently available to others.
It is believed that one of the Frisch procedures has been programmed for a large-scale computer, but no results have appeared.
An excellent detailed survey of these procedures has been written by Witzgall [20]. His presentation, more precise than some of those above, exhibits them as small variants of one basic scheme.

8.5 Quadratic Programming
The algorithm described above for the quadratic programming problem (convex quadratic objective, linear constraints) is that of Wolfe [22]. It has been embodied in the SHARE 704-7090 routine RSQP1, which will handle problems for which the sum of the numbers of variables and constraints does not exceed 250. Some computational experience is described in ref. [22].
Another procedure is given by Beale [3], which has recently been programmed for the Ferranti Mercury by D. G. Prinz.

8.6 Separable Programming
The basic separable programming procedure, described for computer implementation for problems having quite general objectives and constraints, is Miller's [15]. This procedure is in regular use on the IBM 7090 by the Standard Oil Company of California. The routine is not being made available at present.

8.7 The Decomposition Procedure
The decomposition procedure is described for the case of linear constraints by Dantzig [6] and for the general case by Wolfe [21]. An experimental routine was written by Shapiro [19] for the IBM 704 embodying this procedure for the particular case of the "chemical equilibrium" problem.

8.8 The Cutting-plane Method
The basic cutting-plane method is given by Kelley [13], and some comments on it by Wolfe [23]. Some computational experience has been obtained with it by Dornheim [8] and by Griffith and Stewart [12b], but no detailed report of this experience has been given, and computer routines embodying the method are not available.

Bibliography

1. Ablow, C. M., and Brigham, G., An analog solution of programming problems. Operations Research 3, No. 4, 388-394 (1955).
2. Arrow, K. J., Hurwicz, L., and Uzawa, H., Studies in linear and non-linear programming. In Stanford Mathematical Studies in the Social Sciences, Vol. II. Stanford Univ. Press, Stanford, California, 1958.
3. Beale, E. M. L., On quadratic programming. Naval Research Logist. Quart. 6, 227-243 (1959).
4. Brown, R. R., Gradient methods for the computer solution of system optimization problems. MIT Dept. Electrical Engineering, WADC Tech. Note No. 57-159, September (1957).
5. Carroll, C. W., The created response surface technique for optimizing nonlinear restrained systems. Operations Research 9, No. 2, 169-184 (1961).
6. Dantzig, G. B., General convex objective forms. Rand Corp. Paper No. P-1664, April (1959).
7. DeLand, E. C., Continuous programming methods on an analog computer. Rand Corp. Paper No. P-1815, September (1959).
8. Dornheim, F. R., Optimization subject to nonlinear constraints using the SIMPLEX method and its application to gasoline blending (Sinclair Research Laboratories, Harvey, Illinois). Presented at Optimization Techniques Symposium, New York University, May (1960).
9. Fiacco, A. V., Smith, N. M., and Blackwell, D., A more general method for nonlinear programming. Presented at Seventeenth National Meeting of the Operations Research Society of America, New York, May (1960).
10. Frank, M., and Wolfe, P., An algorithm for quadratic programming. Naval Research Logist. Quart. 3, No. 1-2, 95-110 (1956).
11. Frisch, R., The multiplex method for linear programming. Memorandum, Univ. Socialøkon. Institut, Oslo, October (1955).
12. Gass, S. I., Linear Programming. McGraw-Hill, New York, 1958.
12a. Gass, S. I., Recent developments in linear programming. Advances in Computers 2, 295-377 (1961).
12b. Griffith, R. E., and Stewart, R. A., A nonlinear programming technique for the optimization of continuous processing systems. Management Science 7, No. 4, 379-392 (1961).
13. Kelley, J. E., Jr., The cutting-plane method for solving convex programs. J. Soc. Ind. and Appl. Math. 8, No. 4, 703-712 (1960).
14. Manne, A. S., Concave programming for gasoline blends. Rand Corp. Paper No. P-383, April (1953).
15. Miller, C. E., The SIMPLEX method for local separable programming. Report of the Electronic Computer Center, Standard Oil Company of California, San Francisco, August (1960).
16. Pyne, I. B., Linear programming on an electronic analogue computer. AIEE Trans. Ann., Part I, pp. 139-143 (1956).
17. Rosen, J. B., The gradient projection method for nonlinear programming. I. Linear constraints. J. Soc. Ind. and Appl. Math. 8, No. 1, 181-217 (1960).
18. Rosen, J. B., The gradient projection method for nonlinear programming. II. Nonlinear constraints. Shell Development Company, Emeryville, California, 1961.
19. Shapiro, M., and Dantzig, G. B., Solving the chemical equilibrium problem using the decomposition principle. Rand Corp. Paper No. P-2056, August (1960).
20. Witzgall, C., Gradient-projection methods for linear programming. Princeton Univ. and IBM Corp. Report No. 2, August (1960).
21. Wolfe, P., The generalized SIMPLEX method. Rand Corp. Paper No. P-1818, May (1959).
22. Wolfe, P., The simplex method for quadratic programming. Econometrica 27, No. 3, 382-398 (1959).
23. Wolfe, P., Accelerating the cutting-plane method for nonlinear programming. Rand Corp. Paper No. P-2010, June (1960).
24. Zoutendijk, G., Methods of Feasible Directions. Elsevier, Amsterdam, 1960.
Alternating Direction Implicit Methods*

GARRETT BIRKHOFF, RICHARD S. VARGA, AND DAVID YOUNG

Department of Mathematics, Harvard University, Cambridge, Massachusetts; Computing Center, Case Institute of Technology, Cleveland, Ohio; and Computation Center, University of Texas, Austin, Texas
Introduction
1. General Remarks  190
2. The Matrix Problem  191
3. Basic ADI Operators  192

Part I: Stationary ADI Methods (Case m = 1)
4. Error Reduction Matrix  194
5. Norm Reduction  195
6. Application  196
7. Optimum Parameters  198
8. The Function F  199
9. Helmholtz Equation in a Rectangle  200
10. Monotonicity Principle  202
11. Crude Upper Bound  203
12. Eigenvalues of H, V  204

Part II: Commutative Case
13. Introduction  205
14. Problems Leading to Commutative Matrices  206
15. The Peaceman-Rachford Method  210
16. Methods for Selecting Iteration Parameters for the Peaceman-Rachford Method  211
17. The Douglas-Rachford Method  217
18. Applications to the Helmholtz Equation  222

Part III: Comparison with Successive Overrelaxation Variants
19. The Point SOR Method  224
20. Helmholtz Equation in a Square  225
21. Block and Multiline SOR Variants  227
22. Analogies of ADI with SOR  229

Part IV: Numerical Experiments
23. Introduction  231
24. Experiments with the Dirichlet Problem  232
25. Analysis of Results  242
26. Conclusions  249

Appendix A: The Minimax Problem for One Parameter  254
Appendix B: The Minimax Problem for m > 1 Parameters  259
Appendix C: Nonuniform Mesh Spacings and Mixed Boundary Conditions  263
Appendix D: Necessary Conditions for Commutativity  266
Bibliography  271

* Work supported in part by the Office of Naval Research, under Contract Nonr-1866(34).
INTRODUCTION
1. General Remarks
Alternating direction implicit methods, or ADI methods as they are called for short, constitute powerful techniques for solving elliptic and parabolic partial difference equations. However, in contrast with systematic overrelaxation methods, their effectiveness is hard to explain rigorously with any generality. Indeed, to provide a rational explanation for their effectiveness must be regarded as a major unsolved problem of linear numerical analysis.
The present article attempts to survey the current status of this problem, as regards elliptic partial difference equations in the plane. It is divided into four parts and four appendices. Part I deals with ADI methods which iterate a single cycle of alternating directions. In this case, the theory of convergence is reasonably satisfactory. Part II studies the rate of convergence of ADI methods using m > 1 iteration parameters, in the special case that the basic linear operators H, V, Σ in question are all permutable. In this case, the theory of convergence and of the selection of good iteration parameters is now also satisfactory.
Part III surveys what is known about the comparative effectiveness of ADI methods and methods of systematic overrelaxation, from a theoretical standpoint. Part IV analyzes the results of some systematic numerical experiments which were performed to test comparative convergence rates of different methods. The four appendices deal with various technical questions and generalizations. No attempt has been made to survey practical applications of ADI methods to industrial problems.
2. The Matrix Problem
Consider the self-adjoint partial differential equation

−∂[A ∂u/∂x]/∂x − ∂[C ∂u/∂y]/∂y + Gu = S(x, y),   (2.1)

where the function G is nonnegative, while A and C are positive. Let the solution of Eq. (2.1) be sought in the interior of a bounded plane region R, which assumes given values u = u(x, y) on the boundary ∂R of R.
To find an approximate solution to the preceding Dirichlet boundary value problem, one commonly [8, Section 20] first covers R with a square or rectangular mesh having mesh-lengths h, k, approximating the boundary by nearby mesh-points at which u is approximately known. One then takes the values u(x_i, y_j) of u on the set R(h, k) of interior mesh-points as unknowns. On R(h, k), one approximates −hk ∂[A ∂u/∂x]/∂x by H and −hk ∂[C ∂u/∂y]/∂y by V, where H and V are difference operators of the form

Hu(x, y) = −a(x, y)u(x + h, y) + 2b(x, y)u(x, y) − c(x, y)u(x − h, y),   (2.2)
Vu(x, y) = −α(x, y)u(x, y + k) + 2β(x, y)u(x, y) − γ(x, y)u(x, y − k).   (2.3)

The most common¹ choices for a, b, c, α, β, γ are

a = kA(x + h/2, y)/h,  c = kA(x − h/2, y)/h,  2b = a + c,   (2.4)
α = hC(x, y + k/2)/k,  γ = hC(x, y − k/2)/k,  2β = α + γ.   (2.5)

These choices make H and V symmetric matrices, acting on the vector space of functions u = u(x_i, y_j) with domain R(h, k). We will normally consider only the case h = k of a square network; general networks will be treated in Appendix C.
For any h > 0, the preceding "discretization" defines an approximate solution of the given Dirichlet boundary value problem for (2.1), as the algebraic solution of a vector equation (system of linear algebraic equations) of the form

(H + V + Σ)u = k.   (2.6)

In (2.6), Σ is the nonnegative diagonal matrix whose lth diagonal entry, associated with the interior mesh-point x_l = (x_i, y_j), is h²G(x_i, y_j). The vector k is computed by adding to the source terms h²S(x_i, y_j) the terms in (2.2)-(2.3) associated with points on the boundary of R, for which one can substitute approximate known values of u.

¹ Other possible choices are discussed in Birkhoff and Varga [1, Section 2].
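As a concrete illustration (not from the text): for the model case A = C = 1, h = k on the unit square, (2.4)-(2.5) give a = c = α = γ = 1 and 2b = 2β = 2, so that, with mesh-points ordered by rows, H = I ⊗ T and V = T ⊗ I for T = tridiag(−1, 2, −1). A short numpy sketch, with an arbitrary constant G chosen for the example:

```python
import numpy as np

# Assemble H, V, Sigma of (2.2)-(2.6) for the model case A = C = 1, h = k,
# on an N x N grid of interior mesh-points of the unit square.
N = 20
h = 1.0/(N + 1)
T = 2.0*np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)   # tridiag(-1, 2, -1)
H = np.kron(np.eye(N), T)      # row differences (tridiagonal in row ordering)
V = np.kron(T, np.eye(N))      # column differences
G = 1.0                        # a constant G(x, y), chosen arbitrarily
Sigma = (h*h*G)*np.eye(N*N)

# (H + V + Sigma) is symmetric positive definite, so (2.6) is uniquely solvable:
rng = np.random.default_rng(0)
u_exact = rng.standard_normal(N*N)
k_vec = (H + V + Sigma) @ u_exact
u = np.linalg.solve(H + V + Sigma, k_vec)
assert np.linalg.norm(u - u_exact) < 1e-8

# H and V are Stieltjes matrices: positive definite, nonpositive off-diagonal.
assert np.all(np.linalg.eigvalsh(H) > 0) and np.all(np.linalg.eigvalsh(V) > 0)
assert np.all(H - np.diag(np.diag(H)) <= 0)
```

The Kronecker-product construction also makes visible why H is tridiagonal when points are ordered by rows and V when ordered by columns, but not both simultaneously.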
Our concern here is with the rapid solution of the vector equation (2.6) for large networks.² For this purpose, it is essential to keep in mind some general properties of the matrices Σ, H, and V. As already stated, Σ is a nonnegative diagonal matrix. Moreover H and V have positive diagonal entries and nonpositive off-diagonal entries. Because of the Dirichlet boundary conditions for (2.1), the diagonal dominance of H and V implies that they are positive definite [19, Section 1.4]; such real symmetric and positive definite matrices with nonpositive off-diagonal entries are called Stieltjes matrices. If the network R(h, k) of interior mesh-points is connected, then H + V and H + V + Σ are also irreducible; it is known³ that if a Stieltjes matrix is irreducible, then its matrix inverse has all positive entries.
The matrices H and V are also diagonally dominated, by which we mean that the absolute value of the diagonal entry in any row is greater than or equal to the sum of the off-diagonal entries. For any θ ≥ 0, the same is true a fortiori of H + θΣ and V + θΣ, and of θ₁H + θ₂V + θ₃Σ if θ₁ > 0, θ₂ > 0. The above matrices are all diagonally dominated Stieltjes matrices.
By ordering the mesh-points by rows, one can make H tridiagonal; by ordering them by columns, one can make V tridiagonal. That is, both H and V are similar to tridiagonal matrices, but one cannot in general make them both tridiagonal simultaneously.
It can be shown that the approximate solution of (2.1) for mixed boundary conditions on ∂R, of the form

∂u/∂n + d(x, y)u = U(x, y),   d ≥ 0 on ∂R,   (2.7)

can be reduced to a matrix problem of the form (2.6) having the same properties. This is also true of rectangular meshes with variable mesh-lengths h_i, k_j, as will be shown in Part II and Appendix C; see also [10, 19].

3. Basic ADI Operators
From now on, we will consider only the iterative solution of the vector equation (2.6). Since it will no longer be necessary to distinguish the approximate solutions u from the exact solution u(x, y), we will cease to use boldface type. Equation (2.6) is clearly equivalent, for any matrices D and E, to each of the two vector equations

(H + Σ + D)u = k − (V − D)u,   (3.1)
(V + Σ + E)u = k − (H − E)u,   (3.2)

² We will not consider the truncation or roundoff errors.
³ See Varga [19], Chapter III, Section 3.5; irreducibility is defined there in Chapter I, Section 1.4.
provided (H + Σ + D) and (V + Σ + E) are nonsingular. This was first observed by Peaceman and Rachford in [16] for the case Σ = 0, D = E = ρI a scalar matrix. In this case, (3.1)-(3.2) reduce to

(H + ρI)u = k − (V − ρI)u,
(V + ρI)u = k − (H − ρI)u.

The generalization to Σ ≠ 0 and arbitrary D = E was made by Wachspress and Habetler [24; see also 23]. For the case Σ = 0, D = E = ρI which they considered, Peaceman and Rachford proposed solving (2.6) by choosing an appropriate sequence of positive numbers ρ_n, and calculating the sequence of vectors u_{n+1/2}, u_{n+1} defined from the sequence of matrices D_n = E_n = ρ_nI, by the formulas

(H + Σ + D_n)u_{n+1/2} = k − (V − D_n)u_n,   (3.3)
(V + Σ + E_n)u_{n+1} = k − (H − E_n)u_{n+1/2}.   (3.4)

Provided the matrices which have to be inverted are similar to positive definite (hence nonsingular) well-conditioned tridiagonal matrices under permutation matrices, each of Eqs. (3.3) and (3.4) can be rapidly solved by Gauss elimination. The aim is to choose the initial trial vector u_0 and the matrices D_1, E_1, D_2, E_2, . . . so as to make the sequence {u_n} converge rapidly.
Peaceman and Rachford considered the iteration of (3.3) and (3.4) when D_n and E_n are given by D_n = ρ_nI and E_n = ρ̄_nI. This defines the Peaceman-Rachford method:

u_{n+1/2} = (H + Σ + ρ_nI)^{-1}[k − (V − ρ_nI)u_n],   (3.5)
u_{n+1} = (V + Σ + ρ̄_nI)^{-1}[k − (H − ρ̄_nI)u_{n+1/2}].   (3.6)

The rate of convergence will depend strongly on the choice of the iteration parameters ρ_n, ρ̄_n.
An interesting variant of the Peaceman-Rachford method was suggested by Douglas and Rachford [7, p. 422, (2.3)], again for the case Σ = 0. It can be defined for general Σ ≥ 0 by

(H₁ + ρ_nI)u_{n+1/2} = k − (V₁ − ρ_nI)u_n,   (3.7)
u_{n+1} = (V₁ + ρ_nI)^{-1}[V₁u_n + ρ_n u_{n+1/2}],   (3.8)

where H₁ and V₁ are defined as H + ½Σ and V + ½Σ, respectively. This amounts to setting D_n = E_n = ρ_nI − ½Σ in (3.3) and (3.4), and making some elementary manipulations. Hence (3.7) and (3.8) are also equivalent to (2.6), if u_n = u_{n+1/2} = u_{n+1}.
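A minimal sketch (an illustration, not from the text) of the Peaceman-Rachford iteration (3.5)-(3.6) with a single fixed ρ on the model problem with Σ = 0; the mesh size, parameter choice, and iteration count are arbitrary, and dense solves stand in for the tridiagonal Gauss eliminations the text describes.

```python
import numpy as np

# Model problem of Section 2 with A = C = 1, Sigma = 0, on an N x N grid.
N = 15
T = 2.0*np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)   # tridiag(-1, 2, -1)
H, V = np.kron(np.eye(N), T), np.kron(T, np.eye(N))
I2 = np.eye(N*N)

rng = np.random.default_rng(1)
u_exact = rng.standard_normal(N*N)
k = (H + V) @ u_exact                       # right-hand side of (2.6)

mu = 4.0*np.sin(np.arange(1, N+1)*np.pi/(2*(N+1)))**2  # eigenvalues of T
rho = np.sqrt(mu.min()*mu.max())            # a good single parameter (cf. Section 7)

u = np.zeros(N*N)
for _ in range(60):                         # 60 double sweeps of (3.5)-(3.6)
    u_half = np.linalg.solve(H + rho*I2, k - (V - rho*I2) @ u)
    u = np.linalg.solve(V + rho*I2, k - (H - rho*I2) @ u_half)

assert np.linalg.norm(u - u_exact) < 1e-6*np.linalg.norm(u_exact)
```

In a production code each of the two half-step systems would of course be solved as N independent tridiagonal systems, one per mesh row or column.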
PART I: STATIONARY ADI METHODS (CASE m = 1)
4. Error Reduction Matrix
In Part I, we will discuss only the case that D_n = D and E_n = E are independent of n, so that ρ_n = ρ and ρ̄_n = ρ̄ in the preceding formulas. In this case, it was shown by Wachspress and Habetler [24, Theorem 1] that iteration of the Peaceman-Rachford method (3.5)-(3.6) is always convergent for D = E when D + ½Σ is positive definite and symmetric and H + V is positive definite. This is always the case in the Dirichlet problem of Section 2, if one chooses D = E = ρI − ½Σ, where ρ is a positive number.
We now consider the effect of the Peaceman-Rachford and Douglas-Rachford methods on the error vector, defined as the difference e_n = u_n − u_∞ between the approximate solution u_n obtained after the nth iteration, and the exact solution u_∞ of the vector Eq. (2.6). A straightforward calculation shows that, for the Peaceman-Rachford method, the effect of a single iteration of (3.5)-(3.6) is to multiply the error vector e_n by the error reduction matrix T_ρ defined by

T_ρ = (V + Σ + ρI)^{-1}(H − ρI)(H + Σ + ρI)^{-1}(V − ρI).   (4.1)

Likewise, the error reduction matrix for the Douglas-Rachford method (3.7)-(3.8) with all ρ_n = ρ is given by

W_ρ = (V₁ + ρI)^{-1}(H₁ + ρI)^{-1}(H₁V₁ + ρ²I)
    = [H₁V₁ + ρ(V₁ + H₁) + ρ²I]^{-1}(H₁V₁ + ρ²I).   (4.2)

If one assumes that D_n = ρI − ½Σ = E_n also for the generalized Peaceman-Rachford method (3.3)-(3.4), then from (4.1):

T_ρ = (V₁ + ρI)^{-1}(H₁ − ρI)(H₁ + ρI)^{-1}(V₁ − ρI),   (4.3)

and the matrices W_ρ and T_ρ are related by

2W_ρ = I + T_ρ.   (4.4)

But other choices are possible. For example, with D_n = ρ_nI = E_n, the Douglas-Rachford method is

(H + Σ + ρ_nI)u_{n+1/2} = k − (V − ρ_nI)u_n,   (4.5)
(V + Σ + ρ_nI)u_{n+1} = (V + Σ)u_n + ρ_n u_{n+1/2}.   (4.6)

The error reduction matrix for ρ = ρ_n is therefore

U_ρ = (V + Σ + ρI)^{-1}[ρ(H + Σ + ρI)^{-1}(ρI − V) + V + Σ].   (4.7)

Error reduction matrices for still other ADI methods of the form (3.3)-(3.4) will be studied in Section 7.
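The identity (4.1) can be checked numerically on a small example (an illustration, not from the text; the grid size, Σ, and ρ are arbitrary choices): one Peaceman-Rachford sweep from u_0 should give u_1 − u_∞ = T_ρ(u_0 − u_∞).

```python
import numpy as np

N = 6
T = 2.0*np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
H, V = np.kron(np.eye(N), T), np.kron(T, np.eye(N))
I2 = np.eye(N*N)
Sigma = 0.1*np.diag(np.random.default_rng(2).random(N*N))  # nonnegative diagonal
rho = 1.0

u_inf = np.random.default_rng(3).standard_normal(N*N)
k = (H + V + Sigma) @ u_inf            # exact solution of (2.6) is u_inf

u0 = np.zeros(N*N)                     # one sweep of (3.5)-(3.6)
u_half = np.linalg.solve(H + Sigma + rho*I2, k - (V - rho*I2) @ u0)
u1 = np.linalg.solve(V + Sigma + rho*I2, k - (H - rho*I2) @ u_half)

# T_rho = (V+S+rI)^{-1} (H-rI) (H+S+rI)^{-1} (V-rI), as in (4.1)
T_rho = np.linalg.solve(V + Sigma + rho*I2,
        (H - rho*I2) @ np.linalg.solve(H + Sigma + rho*I2, V - rho*I2))
assert np.linalg.norm((u1 - u_inf) - T_rho @ (u0 - u_inf)) < 1e-10
```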
5. Norm Reduction
For fixed D, E, the preceding ADI methods have the form u_{n+1} = Mu_n + b, where M is a fixed real matrix and b a fixed real vector. In the terminology of Forsythe and Wasow [8], they are stationary iterative methods. For such methods, it is well known [8, p. 218] that the asymptotic rate of convergence is determined by the spectral radius Λ(M) of the associated (error reduction) matrix M. This is defined as the maximum of the magnitudes of the eigenvalues of M; thus

Λ(M) = Max_l {|λ_l(M)|}.   (5.1)

Here the subscript l refers to the lth eigenvalue. A stationary iterative method is convergent if and only if its spectral radius is less than one. More generally, the spectral radius α = Λ(M) is the greatest number such that the asymptotic error after n iterations, for n large, is O(β^n) for any β > α. Hence R = −log Λ(M) measures the rapidity of convergence; R is called the asymptotic rate of convergence of M.
In applying the convergence criterion Λ(M) < 1 to ADI methods, it is convenient to use the following well-known result.⁴

LEMMA 5.1. For the norm ||x|| = (x′Qx)^{1/2}, Q any real positive definite matrix, if, for a fixed real matrix M, ||Mx|| ≤ γ||x|| for all real x, then Λ(M) ≤ γ.

This must be combined with another lemma, which expresses the algebraic content of a theorem of Wachspress and Habetler [24, Theorem 1].

LEMMA 5.2. Let P and S be positive definite real matrices, with S symmetric. Then Q = (P − S)(P + S)^{-1} is norm-reducing for real x relative to the norm ||x|| = (x′S^{-1}x)^{1/2}.

Proof. For any norm ||x||, the statement that Q is norm-reducing is equivalent to the statement that ||(P − S)y|| < ||(P + S)y|| for every nonzero vector y = (P + S)^{-1}x. In turn, this is equivalent for the special norm ||x|| = (x′S^{-1}x)^{1/2} to the statement that y′(P + S)′S^{-1}(P + S)y > y′(P − S)′S^{-1}(P − S)y for all nonzero y. Expanding the bilinear terms, canceling, and dividing by two, this is equivalent to the condition that y′(P + P′)y > 0 for all nonzero y. But this is the hypothesis that P is positive definite.⁶

⁴ See Householder [12a], where the general result for complex matrices is given. The phrase "norm-reducing" there refers to the Euclidean norm only in special cases.
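Lemma 5.2 is easy to test numerically on random matrices (an illustration, not from the text; the sizes and shifts are arbitrary choices that guarantee the hypotheses).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
M = rng.uniform(-1.0, 1.0, (n, n))
P = M + 6.0*np.eye(n)          # shift makes the symmetric part of P pos. definite
A = rng.standard_normal((n, n))
S = A @ A.T + np.eye(n)        # symmetric positive definite

Q = (P - S) @ np.linalg.inv(P + S)
Sinv = np.linalg.inv(S)
norm = lambda x: float(np.sqrt(x @ Sinv @ x))   # ||x|| = (x' S^{-1} x)^{1/2}

for _ in range(100):           # Q reduces this norm for every (random) x
    x = rng.standard_normal(n)
    assert norm(Q @ x) < norm(x)

# Hence, by Lemma 5.1, the spectral radius of Q is below one.
assert max(abs(np.linalg.eigvals(Q))) < 1.0
```

Note that P here is deliberately nonsymmetric; only its symmetric part need be positive definite.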
THEOREM 5.1. Any stationary ADI process (3.3)-(3.4) with all D_n = D and all E_n = E is convergent, provided Σ + D + E is symmetric and positive definite, and 2H + Σ + D − E and 2V + Σ + E − D are positive definite.

Proof. It suffices to show that Λ(T) < 1. But since similar matrices have the same eigenvalues and hence the same spectral radius, the error reduction matrix

T = (V + Σ + E)^{-1}(H − E)(H + Σ + D)^{-1}(V − D)   (5.2)

of (3.3)-(3.4) has the same spectral radius as

T̂ = (V + Σ + E)T(V + Σ + E)^{-1}
  = [(H − E)(H + Σ + D)^{-1}][(V − D)(V + Σ + E)^{-1}].   (5.3)

By Lemma 5.2, both factors in square brackets reduce the norm [x′(Σ + D + E)^{-1}x]^{1/2} = ||x||, provided Σ + D + E = 2S, R_H = [H + ½Σ + (D − E)/2] and R_V = [V + ½Σ + (E − D)/2] are positive definite, and Σ + D + E is also symmetric.⁷

6. Application

It is easy to apply the preceding result to difference equations (2.2)-(2.3) arising from the Dirichlet problem for the self-adjoint elliptic differential equation (2.1). In this case, as stated in Section 2, H and V are diagonally dominated (positive definite) Stieltjes matrices. The same properties hold a fortiori for θ₁H + θ₂V + θ₃Σ if all θ_i ≥ 0 and θ₁ + θ₂ > 0. Hence the hypotheses of Theorem 5.1 are fulfilled for D = ρI − θ′Σ/2, E = ρ̄I − θ̄′Σ/2 for any ρ, ρ̄ > 0 and θ, θ̄ with 0 ≤ θ, θ̄ ≤ 2. Substituting into (3.3)-(3.4), we get the following

COROLLARY 6.1. If ρ, ρ̄ > 0 and 0 ≤ θ, θ̄ ≤ 2, then the stationary ADI method defined with θ′ = 2 − θ and θ̄′ = 2 − θ̄ by

(H + θΣ/2 + ρI)u_{n+1/2} = k − (V + θ′Σ/2 − ρI)u_n,   (6.1)
(V + θ̄Σ/2 + ρ̄I)u_{n+1} = k − (H + θ̄′Σ/2 − ρ̄I)u_{n+1/2},   (6.2)

is convergent. In fact, it is norm-reducing for the norm defined by

||x||² = x′(Σ + D + E)^{-1}x = x′[(ρ + ρ̄)I + (θ + θ̄ − 2)Σ/2]^{-1}x.

⁶ Note that P is not assumed to be symmetric, but only to be such that x′(P + P′)x > 0 for all real x ≠ 0.
⁷ This result, for D − E = 0, is due to Wachspress, Sheldon, and Habetler (see [23, 24]). For the analogous result on W_ρ, see Birkhoff and Varga [1].
COROLLARY 6.2. The Douglas-Rachford method is convergent for any fixed ρ > 0.

The proof is immediate from (4.4), with θ = θ̄ = 1. This result shows also that, if θ = θ̄ = 1 and if the largest⁸ eigenvalue of T_ρ is positive, the rate of convergence is less than half that of T_ρ.
The convergence of the Douglas-Rachford method has not yet been established for other values of θ, except when HΣ = ΣH and VΣ = ΣV. In a connected network R(h, k), this implies that Σ = σI is a scalar matrix, as has been shown in [1]. If HΣ = ΣH and VΣ = ΣV, then the two middle terms of (4.1) are permutable, and so we have

T_ρ = K^{-1}(H − ρI)(V − ρI),   K = (H + Σ + ρI)(V + Σ + ρI).

This can be compared with the identities

I = K^{-1}(HV + (Σ + ρI)(H + V) + 2ρΣ + ρ²I + Σ²),
U_ρ = K^{-1}(HV + Σ(H + V) + ρΣ + ρ²I + Σ²).

For any α, we therefore have

K[αI + (1 − α)T_ρ] = HV + ρ²I + (αΣ + (2α − 1)ρI)(H + V) + α(2ρΣ + Σ²).

When α = (ρ + σ)/(2ρ + σ) and Σ = σI, this is just KU_ρ, proving

LEMMA 6.1. If α = (ρ + σ)/(2ρ + σ), and if Σ = σI, then the error reduction matrix (4.7) is U_ρ = αI + (1 − α)T_ρ.

COROLLARY 6.3. If Σ = σI, then Λ(U_ρ) < 1.

When Σ = σI is a scalar matrix, one can reduce the discussion of stationary ADI methods of the form (6.1)-(6.2) to the case θ = θ̄ = 1, using the following result.
LEMMA 6.2. If Σ = σI, then (6.1)-(6.2) are equivalent, for

ρ′ = ρ + θσ/2 − σ/2,   ρ̄′ = ρ̄ + θ̄σ/2 − σ/2,

to:

(H₁ + ρ′I)u_{n+1/2} = k − (V₁ − ρ′I)u_n,   (6.3)
(V₁ + ρ̄′I)u_{n+1} = k − (H₁ − ρ̄′I)u_{n+1/2}.   (6.4)

With H₁ = H + Σ/2 and V₁ = V + Σ/2 as in (3.7) and (3.8), the verification is immediate. Lemma 6.2 is very helpful in choosing good parameters ρ and θ, as we will now see.

⁸ Since T_ρ may have complex eigenvalues, the condition is that an eigenvalue of largest magnitude be positive.
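Lemma 6.1 can also be checked numerically, using the forms of (4.1) and (4.7) given earlier (an illustration, not from the text; the mesh size and the values of σ and ρ are arbitrary).

```python
import numpy as np

N = 5
T1 = 2.0*np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
H, V = np.kron(np.eye(N), T1), np.kron(T1, np.eye(N))
I2 = np.eye(N*N)
sigma, rho = 0.3, 1.0
S = sigma*I2                    # the scalar case Sigma = sigma*I

# T_rho from (4.1) and U_rho from (4.7):
T_rho = np.linalg.solve(V + S + rho*I2,
        (H - rho*I2) @ np.linalg.solve(H + S + rho*I2, V - rho*I2))
U_rho = np.linalg.solve(V + S + rho*I2,
        (V + S) - rho*np.linalg.solve(H + S + rho*I2, V - rho*I2))

alpha = (rho + sigma)/(2*rho + sigma)
assert np.linalg.norm(U_rho - (alpha*I2 + (1 - alpha)*T_rho)) < 1e-10
```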
7. Optimum Parameters
For any given fixed p, p > 0 and 8, 8 satisfying 0 S 0, 8 6 2, Corollary 6.1 shows that (6.1)-(6.2) is convergent. We now estimate its asymptotic rate of convergence. By Theorem 5.1, this is R = -In [A(T)] = -In [A(p)],where as in (5.3) and in (6.1)-(6.2),
T̃ = [(H_θ - ρ̃I)(H_θ + ρI)⁻¹][(V_θ̃ - ρI)(V_θ̃ + ρ̃I)⁻¹], (7.1)

with the notational convention

H_θ = H + θΣ/2 and V_θ̃ = V + θ̃Σ/2. (7.2)

Both products in square brackets in (7.1) are symmetric matrices, and hence have real eigenvalues, if θ̃ = θ or if Σ = σI is a scalar matrix. For simplicity, we now assume θ̃ = θ; we let a be the least and b the largest eigenvalue of H_θ; we let α be the least and β the largest eigenvalue of V_θ′; and we restrict θ so that 0 < a ≤ b and 0 < α ≤ β. Then the first product in square brackets in (7.1) reduces the Euclidean norm by a factor sup_{a≤μ≤b} |(μ - ρ̃)/(μ + ρ)|, or less, and the second product reduces it by a factor less than or equal to sup_{α≤ν≤β} |(ν - ρ)/(ν + ρ̃)|. Hence T̃ reduces the Euclidean norm by a factor

φ(a, b; α, β; ρ, ρ̃) = sup_{a≤μ≤b} |(μ - ρ̃)/(μ + ρ)| · sup_{α≤ν≤β} |(ν - ρ)/(ν + ρ̃)| (7.3)
or less. By Lemma 5.1, we conclude
THEOREM 7.1. Let a, b and α, β be the least and greatest eigenvalues of H_θ and V_θ′, respectively. Then, for all ρ, ρ̃,

λ(T̃) ≤ φ(a, b; α, β; ρ, ρ̃). (7.4)

It will be shown in Appendix A that there exist optimum parameters: values ρ* and ρ̃* of ρ and ρ̃ such that φ(a, b; α, β; ρ*, ρ̃*) = min_{ρ,ρ̃} φ(a, b; α, β; ρ, ρ̃). The following corollary is immediate.

COROLLARY 7.1. Under the hypotheses of Theorem 7.1, with θ̃ = θ, the spectral radius of the generalized Peaceman-Rachford method (6.1)-(6.2) with optimum parameters is at most

φ̂(a, b; α, β) = min_{ρ,ρ̃} φ(a, b; α, β; ρ, ρ̃). (7.5)

In Appendix A, we will discuss the problem of obtaining such optimum parameters ρ* and ρ̃*. But for the present, we will confine our attention to the simpler problem of optimizing ρ subject to the constraint ρ̃ = ρ: that is, to the problem of determining a single optimum rho. We have

COROLLARY 7.2. In Corollary 7.1, let ρ̃ = ρ. Let a, b and α, β be the least and greatest eigenvalues of H_θ and V_θ′, respectively. Then, for all ρ,
ALTERNATING DIRECTION IMPLICIT METHODS
λ(T_ρ) ≤ φ(a, b; α, β; ρ, ρ). (7.6)

The right member of (7.6) defines a function of the eigenvalue bounds and ρ which is so important that we shall denote it by a special symbol.
DEFINITION. The functions φ(ρ) and F(a, b; α, β) are defined, for given 0 < a ≤ b and 0 < α ≤ β, by

φ(ρ) = φ(a, b; α, β; ρ) = sup_{a≤σ≤b} |(σ - ρ)/(σ + ρ)| · sup_{α≤τ≤β} |(τ - ρ)/(τ + ρ)| (7.7)

and

F(a, b; α, β) = min_{ρ>0} φ(a, b; α, β; ρ). (7.8)

Note that F is a minimax of a family of rational functions; its existence will be established in Appendix A. The following restatement of the key inequality (7.6) follows from the definition of F.
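The definitions (7.7)-(7.8) are easy to realize numerically. The sketch below (function names, sample bounds, and the grid search are our own illustrative choices, not part of the text) evaluates φ(ρ) and approximates F; each factor of (7.7) is monotone on either side of σ = ρ, so its supremum over an interval is attained at an endpoint.

```python
import numpy as np

def phi(a, b, alpha, beta, rho):
    # Each factor |(s - rho)/(s + rho)| of (7.7) is monotone on either side
    # of s = rho, so its supremum over an interval is attained at an endpoint.
    f1 = max(abs((a - rho) / (a + rho)), abs((b - rho) / (b + rho)))
    f2 = max(abs((alpha - rho) / (alpha + rho)), abs((beta - rho) / (beta + rho)))
    return f1 * f2

def F_approx(a, b, alpha, beta, n=200001):
    # Approximate F(a, b; alpha, beta) of (7.8) by a grid search over rho.
    rhos = np.linspace(min(a, alpha), max(b, beta), n)
    vals = np.array([phi(a, b, alpha, beta, r) for r in rhos])
    i = int(vals.argmin())
    return rhos[i], vals[i]

rho_star, F_val = F_approx(1.0, 3.0, 1.0, 3.0)
# When a = alpha and b = beta, the optimum is rho* = sqrt(ab) (Section 8),
# here sqrt(3); and F < 1, as shown in Section 8.
```

The grid minimum reproduces the closed-form optimum of Section 8 in this symmetric case.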
COROLLARY 7.3. In Theorem 7.1, for the optimum ρ = ρ̃ = ρ*, we have the asymptotic rate of convergence R* which satisfies

R* = -ln λ(T_ρ*) ≥ -ln F(a, b; α, β). (7.9)

This corollary shows plainly that one can break down the problem of approximating ρ* and bounding Λ₀ = λ(T_ρ*) into two parts: estimating the least and greatest eigenvalues of H_θ and V_θ′, and knowing the function F. We will discuss the second of these questions first, referring to Appendix A for details.
8. The Function F
Some properties of the function F follow almost immediately from its definition by (7.8).

LEMMA 8.1. (Monotonicity Principle) If a′ ≤ a, b ≤ b′, α′ ≤ α, and β ≤ β′, then F(a, b; α, β) ≤ F(a′, b′; α′, β′).

For, the range of values of σ and τ in (7.7) is enlarged, independently of ρ. Hence, for all ρ,

φ(a, b; α, β; ρ) ≤ φ(a′, b′; α′, β′; ρ).

From this inequality and (7.8), Lemma 8.1 follows immediately.
LEMMA 8.2. For all c > 0,

F(ca, cb; cα, cβ) = F(a, b; α, β). (8.1)

For, the substitutions a → ca, b → cb, α → cα, β → cβ, ρ → cρ leave the definition of F unaffected.
By the symmetry of the definition, we also have

F(a, b; α, β) = F(α, β; a, b), (8.2)

and likewise φ(a, b; α, β; ρ) = φ(α, β; a, b; ρ) for all ρ. It is easy to show that 0 ≤ φ(ρ) < 1 for all ρ > 0, and hence that F < 1. The exact value of F can be computed [keeping the symmetry (8.2) in mind] using Appendix A. Theorem A.1 asserts that if ab ≤ αβ, then F is given by (A.10) as (8.3), with ρ* = √(ab) in the first case, and ρ* = √(αβ) in the second case. Note that all factors in (8.3) are positive. Using the preceding formula in Corollary 7.3, we obtain the following result.
THEOREM 8.1. By choosing ρ* as √(ab) or as √(αβ), we can make the asymptotic rate of convergence of the Peaceman-Rachford method (6.1)-(6.2) at least -ln F, where F is given by (8.3), with a, b and α, β the least and greatest eigenvalues of H_θ and V_θ′ (or vice-versa, whichever makes ab ≤ αβ).

9. Helmholtz Equation in a Rectangle

As an example, consider the modified Helmholtz equation G₀u - ∇²u = S in the rectangle ℛ: 0 ≤ x ≤ X, 0 ≤ y ≤ Y. This is the special case A = C = 1, G = G₀ ≥ 0 of (2.1), to which one can reduce any elliptic DE (2.1) with constant coefficients by elementary transformations. In this example, the Dirichlet problem has a known basis of orthogonal eigenfunctions

u_pq = sin(πpx/X) sin(πqy/Y). (9.1)

On the set ℛ_h of interior mesh-points of any subdivision of ℛ into squares of side h = X/M = Y/N, these u_pq for p = 1, …, M - 1 and q = 1, …, N - 1 are also a basis of orthogonal eigenvectors for the three operators H, V, Σ defined in Section 2. In fact,

Hu_pq = μ_pq u_pq,  Vu_pq = ν_pq u_pq,  Σu_pq = σu_pq, (9.2)

where μ_pq = 4 sin²(πp/2M), ν_pq = 4 sin²(πq/2N), σ = h²G₀. These eigenvalues μ_pq, ν_pq range from the small positive numbers μ_M = 4 sin²(π/2M), ν_N = 4 sin²(π/2N) to 4 - μ_M, 4 - ν_N. More specifically, we have the inequalities

4 sin²(π/2M) ≤ μ_pq ≤ 4 cos²(π/2M), (9.3)
4 sin²(π/2N) ≤ ν_pq ≤ 4 cos²(π/2N). (9.4)

Since the three matrices H, V, and Σ have a common set of eigenvectors (9.1), these are also eigenvectors for the error reduction matrices T_ρ, W_ρ, and U_ρ defined by Eqs. (4.3), (4.2), and (4.7), and their generalizations to arbitrary θ. The associated eigenvalues, which express the factor by which the u_pq-component of the error function is multiplied, are therefore given by (9.5) and (9.6), and, by (4.4),

λ_pq(W_ρ) = [1 + λ_pq(S_ρ)]/2, (9.7)

where S_ρ denotes the special case of T_ρ obtained by the choice θ = θ′ = 1, suggested by Sheldon and Wachspress. Using these general results, it is evident from (9.5) that the Peaceman-Rachford method is convergent for the Helmholtz equation in the rectangle provided ρ > (1 - θ)σ/2; if θ ≥ 1, it is convergent if ρ > 0. Hence, by (9.7), the Douglas-Rachford method with θ = 1 is convergent (in this special case) provided ρ > 0. It is also convergent, by (9.5), if θ = 2. For θ = θ′ = 1, T_ρ = S_ρ, one can also compute the exact optimum rho and corresponding most rapid asymptotic rate of convergence for the Helmholtz equation in a rectangle. By formula (9.5), the spectral radius is given by (9.8).
For any fixed ρ, the two factors inside the absolute value signs are monotone, and so the maximum absolute value of each is assumed for one of the extreme values of μ_p and ν_q, numbers which are given by (9.3) and (9.4) respectively. As a consequence, we obtain

λ(T_ρ) = φ(a, b; α, β; ρ), (9.9)

where a = μ_M + σ/2, b = 4 - μ_M + σ/2, α = ν_N + σ/2, β = 4 - ν_N + σ/2. Note that a + b = α + β = 4 + σ, whence Corollary A.1 of Appendix A is applicable. It yields the following result, since ab ≤ αβ if M ≥ N.

THEOREM 9.1. For the Helmholtz equation in a rectangle, with M ≥ N, the optimum ρ for the Peaceman-Rachford method with θ = θ′ = 1 is

ρ* = √(αβ) = [(4 sin²(π/2N) + σ/2)(4 cos²(π/2N) + σ/2)]^{1/2}. (9.10)
The corresponding spectral radius is given by (9.11). In the case σ = 0 (of the Laplace equation), the preceding formulas simplify. Then ρ* = 2 sin(π/N), and the associated spectral radius is

λ(T_ρ*) = [(sin(π/N) - 2 sin²(π/2M)) / (sin(π/N) + 2 sin²(π/2M))] · [(cos(π/2N) - sin(π/2N)) / (cos(π/2N) + sin(π/2N))]. (9.12)
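Formula (9.12) is easy to spot-check. The sketch below (the choice M = 8, N = 4 is ours, for illustration) compares the closed form against a brute-force maximum of |λ_pq(T_ρ*)| over the eigenvalues (9.2) with σ = 0; the two agree to rounding.

```python
import math

M, N = 8, 4                           # mesh subdivisions, M >= N as in Theorem 9.1
rho = 2 * math.sin(math.pi / N)       # optimum rho* for sigma = 0

# Brute-force spectral radius of T_rho over the eigenvalues (9.2):
# lambda_pq = (mu_p - rho)(nu_q - rho) / ((mu_p + rho)(nu_q + rho)).
mus = [4 * math.sin(math.pi * p / (2 * M)) ** 2 for p in range(1, M)]
nus = [4 * math.sin(math.pi * q / (2 * N)) ** 2 for q in range(1, N)]
spec = max(abs((mu - rho) * (nu - rho) / ((mu + rho) * (nu + rho)))
           for mu in mus for nu in nus)

# Closed form (9.12)
s, c = math.sin(math.pi / (2 * N)), math.cos(math.pi / (2 * N))
f1 = (math.sin(math.pi / N) - 2 * math.sin(math.pi / (2 * M)) ** 2) / \
     (math.sin(math.pi / N) + 2 * math.sin(math.pi / (2 * M)) ** 2)
f2 = (c - s) / (c + s)
# spec and f1 * f2 agree, and both are < 1
```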
10. Monotonicity Principle
For most regions and most difference equations (i.e., for most choices of H and V), the eigenvalues μ_p of H and ν_q of V cannot be varied independently to produce an eigenvalue of T_ρ. As a result, though the spectral radius is bounded above by the right side of (9.11) for the Helmholtz equation with Dirichlet-type boundary conditions, on any rectangular mesh 𝔐(h, k) in which no connected row has more than M + 1 and no column more than N + 1 (N ≤ M) consecutive points, one does not know that ρ* as given by (9.10) is really the optimum rho. In such cases (for arbitrary self-adjoint elliptic difference equations with Dirichlet-type boundary conditions), one can still determine good values of rho by relating the given boundary value problem to the Helmholtz equation in a rectangle, and applying Weyl's monotonicity principle⁹ [25a].
THEOREM 10.1. Let A and B be two real n × n symmetric matrices, with eigenvalues α₁ ≤ … ≤ α_n and β₁ ≤ … ≤ β_n, respectively. Let the eigenvalues of C = A + B be γ₁ ≤ … ≤ γ_n. Then α_i + β_j ≤ γ_k ≤ α_l + β_m if i + j - 1 ≤ k ≤ l + m - n.
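Theorem 10.1 can be spot-checked numerically. The sketch below (matrix size, random seed, and tolerance are arbitrary choices of ours) draws random symmetric A and B and verifies both inequalities for all admissible index triples.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n)); A = (A + A.T) / 2   # random symmetric A
B = rng.standard_normal((n, n)); B = (B + B.T) / 2   # random symmetric B
a = np.sort(np.linalg.eigvalsh(A))        # alpha_1 <= ... <= alpha_n
b = np.sort(np.linalg.eigvalsh(B))        # beta_1  <= ... <= beta_n
g = np.sort(np.linalg.eigvalsh(A + B))    # gamma_1 <= ... <= gamma_n

# alpha_i + beta_j <= gamma_k whenever k >= i + j - 1 (1-based indices),
# and gamma_k <= alpha_i + beta_j whenever k <= i + j - n.
ok = True
for i in range(1, n + 1):
    for j in range(1, n + 1):
        k = i + j - 1
        if k <= n:
            ok &= a[i - 1] + b[j - 1] <= g[k - 1] + 1e-10
        k2 = i + j - n
        if k2 >= 1:
            ok &= g[k2 - 1] <= a[i - 1] + b[j - 1] + 1e-10
```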
This principle has many immediate corollaries for the operators H, V, H + θΣ, V + θΣ, and so on. For instance, it shows that if σ_min is the smallest eigenvalue of Σ, then the eigenvalues of H + θΣ, θ ≥ 0, exceed those of H (arranged in descending order) by at least θσ_min. Likewise, it shows that the eigenvalues of H and V increase when A(x, y) and C(x, y) are increased in (2.1), since one adds a diagonally dominated Stieltjes matrix to each, and such matrices are symmetric and positive definite.¹⁰ Finally, it shows that if the spectral radius (= Euclidean norm) of B is

⁹ We omit the proof.
¹⁰ In general, only nonnegative definite; but, in the present case, they are positive definite if A(x, y) and C(x, y) are increased at all points.
at most β, then the eigenvalues of C = A + B differ from those of A, arranged in the same order, by at most β.
11. Crude Upper Bound
Using the preceding observations, one can easily obtain a crude upper bound¹¹ for Λ₀ = λ(T_ρ*) and in fact a "good" rho ρ₁ such that φ(ρ₁) is less than unity by an appreciable amount. One need only combine Theorem 8.1 with the monotonicity principles of Section 10. For simplicity, we consider only the case of constant h, k. First, one observes that the matrices H and V are changed by positive semidefinite matrices when A(x, y) and C(x, y) are increased in (2.1), and also when Σ is increased. It follows by Theorem 10.1 that if A(x, y) and C(x, y) are replaced at all mesh-points by their maximum and minimum values Ā, C̄ and A̲, C̲, respectively, then the spectrum is shifted up resp. down, as regards all spectral values. Second, if the network 𝔐(h, k) is embedded in a larger (rectangular) network 𝔐̄ by any extension of the coefficient-functions A(x, y) and C(x, y), then the least eigenvalue is decreased (or left unchanged) and the greatest increased (or left unchanged). This is because, on 𝔐, the effect of H and V is that of a matrix which is a principal minor of the corresponding matrices H̄ and V̄ on 𝔐̄. The least and greatest eigenvalues ā_min and ā_max of H̄ have eigenfunctions v, w with support in 𝔐̄ such that vH̄v′ = ā_min vv′ and wH̄w′ = ā_max ww′, respectively. Hence

ā_min = min_{v≠0} [vH̄v′/vv′] ≤ a_min ≤ a_max ≤ max_{w≠0} [wH̄w′/ww′] = ā_max,
and likewise for V. Combining the two preceding observations, we obtain the following result.

THEOREM 11.1. Suppose that 𝔐 = 𝔐(h, k) can be embedded in a rectangle with sides of length Mh and Nk parallel to the axes. Then

λ(T_ρ) ≤ φ(a, b; α, β; ρ), (11.1)

where

a = 4A̲ sin²(π/2M) + σ̲/2,  b = 4Ā cos²(π/2M) + σ̄/2, (11.2)
α = 4C̲ sin²(π/2N) + σ̲/2,  β = 4C̄ cos²(π/2N) + σ̄/2. (11.3)

COROLLARY 11.1. If A(x, y) = C(x, y) and M ≥ N in Theorem 11.1, then φ(ρ₁) ≤ F(a, b; α, β), where ρ₁ = √(αβ).

¹¹ This result was obtained for the Laplace equation in Varga [17].
Proof. In this case, a + b = α + β; hence the conclusion follows. If A ≠ C, however, in general a + b ≠ α + β.
12. Eigenvalues of H and V

One can obtain arbitrarily close approximations to the minimum eigenvalues μ₁ (and ν₁) of H (and V). For any nonzero vector x, the Rayleigh quotient satisfies x′Hx/x′x ≥ μ₁; if y = Hx is any positive vector, then min_i [(Hx)_i/x_i] ≤ μ₁. Wachspress [25] has invented an iterative process, based on the Stieltjes property of H and the inverse power method, for computing μ₁ with arbitrary accuracy. Similar remarks apply to ν₁. The less crucial maximum eigenvalues of H and V are bounded above by Gerschgorin's Circle Theorem [19], often with sufficient accuracy. For small mesh-length h, accurate asymptotic bounds can be found using the fact that on each connected row (resp. column) of 𝔐_h, H (resp. V) defines a discrete Sturm-Liouville system. Such discrete Sturm-Liouville systems have been thoroughly studied in the literature.¹¹ᵃ The least eigenvalue of the matrix H, for small fixed h, is approximately h² times the lowest eigenvalue of the corresponding continuous Sturm-Liouville system, a fact which gives a convenient asymptotic expression for μ₁(h). The error in this bound is small for h small.¹¹ᵇ The largest eigenvalue corresponds to an eigenvector whose components oscillate in sign, and is about equal to 4Ā, the maximum being taken over 𝔐. The error is ordinarily O(h), but is O(h²) if A = A(y). Similar estimates can be obtained for V. But the fact that the extreme eigenvalues in question can be accurately estimated does not imply that ρ* or λ(T_ρ) can be accurately estimated. As has already been observed, μ_m and ν_n cannot be varied independently except in special cases (to be treated in Part II).

¹¹ᵃ See [9], Chapter X; also [10a].
¹¹ᵇ See [12b].
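The inverse power method mentioned above is easy to sketch (this is a generic version, not Wachspress's specific scheme; names and the model matrix are ours): solving Hy = x repeatedly amplifies the eigenvector belonging to μ₁, and the Rayleigh quotient of the iterate converges to μ₁. The model matrix is the one-dimensional operator tridiag(-1, 2, -1), whose least eigenvalue 4 sin²(π/2(n + 1)) matches the pattern of (9.2)-(9.3).

```python
import numpy as np

def smallest_eigenvalue(H, iters=200):
    """Inverse power method: repeatedly solve H y = x and normalize;
    the Rayleigh quotient of the iterate converges to mu_1."""
    x = np.ones(H.shape[0])
    for _ in range(iters):
        x = np.linalg.solve(H, x)
        x /= np.linalg.norm(x)
    return x @ H @ x          # Rayleigh quotient x'Hx/x'x with ||x|| = 1

n = 20
H = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # tridiag(-1, 2, -1)
mu1 = smallest_eigenvalue(H)
# mu1 agrees with the known value 4 sin^2(pi / (2(n + 1)))
```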
PART II: COMMUTATIVE CASE

13. Introduction
It was proved in Birkhoff and Varga [1] that, for m > 1, the analysis of the asymptotic convergence rates discussed in Douglas and Rachford was applicable to the self-adjoint elliptic difference equations of Section 1 in a connected plane network 𝔐, if and only if the symmetric matrices H, V, and Σ of (2.6) were commutative, that is, if and only if

HV = VH,  HΣ = ΣH,  VΣ = ΣV. (13.1)
In this chapter we study the extension of this observation to matrices generally. Accordingly, we consider the vector equation

(H + V + Σ)u = k, (13.2)

where Σ is a nonnegative diagonal matrix and where H + V + Σ is nonsingular. As in [1] we make the following assumptions:

HV = VH, (13.3)
Σ = σI (σ a nonnegative constant).¹² (13.4)

We do not assume that H or V is symmetric. Instead, we make the following weaker assumption:

H and V are similar to nonnegative diagonal matrices. (13.5)

Conditions (13.3)-(13.5) are related to (13.1) through the following:

THEOREM 13.1. If H and V are positive definite symmetric matrices, and if H + V is irreducible, then conditions (13.3)-(13.5) are equivalent to the commutativity condition (13.1).

The importance of conditions (13.3)-(13.5) for the study of ADI methods depends on the following theorem of Frobenius:¹³

THEOREM 13.2. The matrices H and V have a common basis of eigenvectors if and only if HV = VH and H and V are similar to diagonal matrices.

From this it follows that H and V have a common basis of eigenvectors and nonnegative eigenvalues if and only if (13.3) and (13.5) hold. If (13.3)-(13.5) hold, then for any nonnegative constants θ₁ and θ₂ the matrices H + θ₁Σ and V + θ₂Σ also have a common basis of eigenvectors and nonnegative eigenvalues.

In Section 14 we exhibit a class of problems involving elliptic partial differential equations which lead to systems of linear algebraic equations of the form (13.2) where the matrices H, V, and Σ satisfy (13.3)-(13.5). In Sections 15-17 we describe how the assumption of conditions (13.3)-(13.5) leads to effective methods for choosing iteration parameters and for accelerating the convergence of the Peaceman-Rachford and the Douglas-Rachford methods. The application to the Helmholtz equation is given in Section 18.

¹² We remark that by a slight generalization of Lemma 2 of ref. [1] one can show that if H + V is irreducible then (13.4) is equivalent to the conditions HΣ = ΣH and VΣ = ΣV.
¹³ See Exercise 1 in Thrall and Tornheim [14a], p. 190.

14. Problems Leading to Commutative Matrices
It has already been shown in Section 9 that the Dirichlet problem for the modified Helmholtz equation in a rectangle leads to matrices H and V which have a common basis of eigenvectors and positive eigenvalues. It then follows from the remark after Theorem 13.2 that H and V satisfy (13.3) and (13.5). Since Σ = σI, with σ ≥ 0, (13.4) holds also. It was shown in Ref. [1] that if HV = VH, where the matrices H and V arise from a differential equation of the form (2.1) and from the difference approximations (2.2)-(2.3), then the region is a rectangle, and the differential equation is the modified Helmholtz equation. However, as observed by Wachspress,¹⁴ one can obtain matrices H, V, and Σ satisfying (13.3)-(13.5) from more general differential equations of the form

-∂/∂x[E₁(x)F₁(y) ∂u/∂x] - ∂/∂y[E₂(x)F₂(y) ∂u/∂y] + KE₂(x)F₁(y)u = S(x, y) (14.1)

in the rectangle ℛ: 0 ≤ x ≤ X, 0 ≤ y ≤ Y. The functions E₁(x), F₁(y), E₂(x), F₂(y) are assumed to be continuous and positive in ℛ, and K is a nonnegative constant. Evidently (14.1) is a special case of (2.1) with A(x, y) = E₁(x)F₁(y), C(x, y) = E₂(x)F₂(y), and G(x, y) = KE₂(x)F₁(y). A difference equation leading to commutative matrices H, V, and Σ is obtained as follows: First, choose mesh sizes h and k such that X/h and Y/k are integers. Next divide (14.1) by E₂(x)F₁(y), obtaining

-[1/E₂(x)] ∂/∂x[E₁(x) ∂u/∂x] - [1/F₁(y)] ∂/∂y[F₂(y) ∂u/∂y] + Ku = S(x, y)/E₂(x)F₁(y). (14.2)

Replacing -hk ∂[E₁∂u/∂x]/∂x and -hk ∂[F₂∂u/∂y]/∂y by the expressions¹⁵ given in (2.2) and (2.3), respectively, and substituting in (14.2) we obtain

(H + V + Σ)u(x, y) = t(x, y), (14.3)

¹⁴ Private communication and Ref. [24].
¹⁵ If one were to use the difference equation of Section 2, one would obtain matrices H and V which, though symmetric, would not in general commute.
where

Hu(x, y) = A₀(x)u(x, y) - A₁(x)u(x + h, y) - A₃(x)u(x - h, y), (14.4)
Vu(x, y) = C₀(y)u(x, y) - C₂(y)u(x, y + k) - C₄(y)u(x, y - k), (14.5)
Σ = hkK, (14.6)

and t(x, y) = hkS(x, y)/E₂(x)F₁(y), A₁(x) = kE₁(x + (h/2))/hE₂(x), C₂(y) = hF₂(y + (k/2))/kF₁(y), etc. We now prove
THEOREM 14.1. Let H, V, and Σ be the matrices arising from the solution of the Dirichlet problem in a rectangle for the differential equation (14.2) and using the difference equation (14.3). Then H, V, and Σ satisfy conditions (13.3)-(13.5).

Proof. We first prove

LEMMA 14.2. Under the conditions of Theorem 14.1, conditions (13.4) and (13.5) hold whether or not the region is a rectangle.

Proof. Because of (14.6) the matrix Σ satisfies (13.4). To show that H and V satisfy (13.5) we observe that the matrices H^(S) = FH and V^(S) = FV, where F is a diagonal matrix with nonnegative diagonal elements corresponding to the function F(x, y) = E₂(x)F₁(y), are the same as the matrices which one obtains by using the difference approximations (2.2) and (2.3) in (14.2). But in Section 2 it was shown that H^(S) and V^(S) are symmetric and positive definite. It then follows that H_F = F^{1/2}HF^{-1/2} = F^{-1/2}H^(S)F^{-1/2} and V_F = F^{1/2}VF^{-1/2} = F^{-1/2}V^(S)F^{-1/2} are symmetric. Moreover, for any nonzero vector v we have (H_F v, v) = (F^{-1/2}H^(S)F^{-1/2}v, v) = (H^(S)F^{-1/2}v, F^{-1/2}v) > 0, since F^{-1/2}v ≠ 0; it follows that H_F is positive definite. Similarly, V_F is positive definite. Hence H_F and V_F, and consequently H and V, are similar to diagonal matrices with positive diagonal elements.

To complete the proof of Theorem 14.1, it remains to show that H and V commute. This is equivalent to showing that H̄V̄ = V̄H̄, where H̄ and V̄ are difference operators which correspond to H and V, respectively. Actually, H̄ and V̄ are simply the operators H and V defined by (14.4) and (14.5) but restricted to functions defined only on 𝔐(h, k). In order to avoid the necessity of writing special formulas for Hu and Vu for points adjacent to the boundary, where certain terms in (14.4) and (14.5) would be omitted, we write

H̄u(x, y) = A₀(x)u(x, y) - Ā₁(x, y)u(x + h, y) - Ā₃(x, y)u(x - h, y), (14.7)

¹⁶ Theorem 14.1 can be generalized to include problems involving mixed boundary conditions and nonuniform mesh sizes, as shown in Appendix C.
V̄u(x, y) = C₀(y)u(x, y) - C̄₂(x, y)u(x, y + k) - C̄₄(x, y)u(x, y - k), (14.8)

where

Ā₁(x, y) = A₁(x)Γ(x + h, y),  Ā₃(x, y) = A₃(x)Γ(x - h, y), (14.9)
C̄₂(x, y) = C₂(y)Γ(x, y + k),  C̄₄(x, y) = C₄(y)Γ(x, y - k), (14.10)

and where Γ(x, y) = 1 if (x, y) is in 𝔐(h, k) and Γ(x, y) = 0 otherwise. The use of the "projection operator" Γ is especially convenient for the computation of products of operators. We now prove

LEMMA 14.3. Let H̄ and V̄ be difference operators defined over the rectangular network¹⁷ 𝔐(h, k) by (14.7) and (14.8). Then H̄ and V̄ commute.

Proof. For any u(x, y) defined on 𝔐(h, k) we seek to show that H̄V̄u(x, y) = V̄H̄u(x, y) for all (x, y) in 𝔐(h, k). Evidently both H̄V̄u(x, y) and V̄H̄u(x, y) are linear combinations of u(x, y) and other values of u in 𝔐(h, k). The coefficient of u(x + h, y) for H̄V̄u(x, y) is -Ā₁(x, y)C₀(y) = -A₁(x)C₀(y)Γ(x + h, y), which is equal to the coefficient of u(x + h, y) for V̄H̄u(x, y). Moreover, the coefficients of u(x + h, y + k) are

A₁(x)C₂(y)Γ(x + h, y)Γ(x + h, y + k)

for H̄V̄u(x, y) and

A₁(x)C₂(y)Γ(x, y + k)Γ(x + h, y + k)

for V̄H̄u(x, y). If (x + h, y + k) does not belong to 𝔐(h, k), both coefficients are zero. Otherwise, since the region is rectangular and since (x, y) is in 𝔐(h, k), it follows that both (x + h, y) and (x, y + k) belong to 𝔐(h, k). Thus the two coefficients are equal. Similar arguments hold for the coefficients of u(x - h, y), u(x, y + k), etc., and the lemma is proved.

The proof of Theorem 14.1 is now complete. We remark that the matrices H_F and V_F considered in Lemma 14.2 commute provided H and V commute. For problems to be solved on large automatic computing machines it may be advantageous to use symmetric matrices because of the savings in storage. The operators H_F and V_F corresponding to the matrices H_F and V_F are given by (14.4) and (14.5) with

A₁(x) = kE₁(x + (h/2)) / h[E₂(x)E₂(x + h)]^{1/2},  A₀(x) = k(E₁[x + (h/2)] + E₁[x - (h/2)])/hE₂(x), etc.

¹⁷ A network 𝔐(h, k) is "rectangular" if it consists of the points (x₀ + ih, y₀ + jk), where i = 0, 1, …, p and j = 0, 1, …, q, for some x₀, y₀, h > 0 and k > 0.
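Lemma 14.3 can be verified directly in a few lines. The sketch below (the grid indexing and the particular coefficient values are our own illustrative choices) assembles matrices of the form (14.7)-(14.8) on a small rectangular network, with A-coefficients depending on x only and C-coefficients on y only as required, and checks that they commute.

```python
import numpy as np

# Interior points of a rectangular network, indexed by integers (i, j);
# illustrative coefficients: A's depend on i (i.e., on x) only, C's on j only.
P, Q = 5, 4
pts = [(i, j) for i in range(P) for j in range(Q)]
num = {p: n for n, p in enumerate(pts)}

A0 = lambda i: 2.0 + 0.1 * i; A1 = lambda i: 1.0 + 0.2 * i; A3 = lambda i: 0.5 + 0.1 * i
C0 = lambda j: 2.0 + 0.3 * j; C2 = lambda j: 1.0 + 0.1 * j; C4 = lambda j: 0.7 + 0.2 * j

H = np.zeros((len(pts), len(pts)))
V = np.zeros((len(pts), len(pts)))
for (i, j), n in num.items():
    H[n, n] = A0(i); V[n, n] = C0(j)
    if (i + 1, j) in num: H[n, num[(i + 1, j)]] = -A1(i)   # Gamma = 1 inside,
    if (i - 1, j) in num: H[n, num[(i - 1, j)]] = -A3(i)   # 0 outside (14.9)-(14.10)
    if (i, j + 1) in num: V[n, num[(i, j + 1)]] = -C2(j)
    if (i, j - 1) in num: V[n, num[(i, j - 1)]] = -C4(j)

# Lemma 14.3: on a rectangular network these matrices commute.
comm = np.abs(H @ V - V @ H).max()
```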
Theorem 14.1 shows that, with a self-adjoint differential equation of the form (2.1), for there to exist a function P(x, y) such that the matrices H, V, and Σ satisfy (13.3)-(13.5), it is sufficient that the differential equation have the form (14.1). Here H, V, and Σ arise from the use of the difference approximations (2.2) and (2.3) for the differential equation obtained by multiplying both sides of (2.1) by P(x, y) = 1/E₂(x)F₁(y). In Appendix D it is shown that the condition is also necessary. It is natural to ask whether a similar necessary condition might hold for elliptic equations more generally. In this vein, Heller [14] has shown that for the equation

A(x, y) ∂²u/∂x² + C(x, y) ∂²u/∂y² + D(x, y) ∂u/∂x + E(x, y) ∂u/∂y + G(x, y)u = S(x, y) (14.17)
it is sufficient that A and D depend only on x, that C and E depend only on y, and that G is a constant. However, these conditions are not necessary,¹⁸ as the following example shows.

Example. Consider the problem of solving the equation

∂²u/∂x² + ∂²u/∂y² + [2/(x + y)](∂u/∂x + ∂u/∂y) = 0 (14.18)

in the unit square 0 < x < 1, 0 < y < 1 with prescribed values on the boundary of the square. Writing the difference operators H and V in the form (14.4)-(14.5) we have A₀(x, y) = C₀(x, y) = 2, A₁(x, y) = C₂(x, y) = (x + y + h)(x + y)⁻¹ and A₃(x, y) = C₄(x, y) = (x + y - h)(x + y)⁻¹. By direct computation one can show that the operators H̄ and V̄ commute. Hence so do H and V. To show that the matrices H and V satisfy (13.5), we observe that H is a tridiagonal matrix whose diagonal elements are positive and whose elements on the adjacent diagonals are negative. Replacing the nonzero off-diagonal elements a_{i,j} by √(a_{i,j}a_{j,i}) we get a symmetric matrix which is similar to the original matrix. Thus H has real eigenvalues and is similar to a diagonal matrix. Because of weak diagonal dominance, H has nonnegative eigenvalues and is similar to a nonnegative diagonal matrix. Since the same is true of V, condition (13.5) holds. We remark that one could make (14.18) self-adjoint by multiplying both sides by -(x + y)², obtaining
-∂/∂x[(x + y)² ∂u/∂x] - ∂/∂y[(x + y)² ∂u/∂y] = 0. (14.19)
Since this equation is not of the form (14.1), it follows from the necessary condition for self-adjoint equations stated above that the matrices H and V corresponding to (14.19), based on the difference approximations (2.2) and (2.3), will not commute even if one first multiplies both sides of (14.19) by any nonnegative function. Thus even though (14.19) and (14.18) are equivalent equations, by the use of one difference equation we obtain matrices H, V, and Σ which satisfy (13.3)-(13.5), while with the other difference equation we do not. The question of how general the differential equations of the form (14.17) can be in order for the associated matrices H, V, and Σ to satisfy (13.3)-(13.5) remains to be studied.

¹⁸ This contradicts a statement of Heller [14, p. 162]. Even the weaker conditions that there exists a nonvanishing function P such that PA and PD depend only on x, that PC and PE depend only on y, and that PG is constant are not necessary.

15. The Peaceman-Rachford Method
We now consider the Peaceman-Rachford method for solving (13.2), defined by

(H₁ + ρ_nI)u_{n+1/2} = k - (V₁ - ρ_nI)u_n, (15.1)
(V₁ + ρ_nI)u_{n+1} = k - (H₁ - ρ_nI)u_{n+1/2},

where H₁ = H + Σ/2, V₁ = V + Σ/2. Equation (15.1) is derived from (6.3)-(6.4) by replacing ρ′ and ρ̃′ by ρ_n. If the matrices H, V, and Σ satisfy (13.3)-(13.5), then there exists a common basis of eigenvectors for H₁ = H + (σ/2)I and for V₁ = V + (σ/2)I. Moreover, if v is an eigenvector of such a basis, then

H₁v = μv,  V₁v = νv, (15.2)

where μ and ν are suitable eigenvalues of H₁ and V₁, respectively. Hence the eigenvalues of T_ρ are all of the form (μ - ρ)(ν - ρ)/[(μ + ρ)(ν + ρ)], where

T_ρ = (V₁ + ρI)⁻¹(H₁ - ρI)(H₁ + ρI)⁻¹(V₁ - ρI). (15.3)

(See (4.1).) Moreover,

∏_{i=1}^m [(μ - ρ_i)(ν - ρ_i)] / [(μ + ρ_i)(ν + ρ_i)] (15.4)

is an eigenvalue of ∏_{i=1}^m T_{ρ_i}. Evidently all eigenvalues of ∏_{i=1}^m T_{ρ_i} are given by (15.4) for some eigenvalues μ of H₁ and ν of V₁. Thus we have

λ(∏_{i=1}^m T_{ρ_i}) ≤ max_{μ,ν} ∏_{i=1}^m |(μ - ρ_i)(ν - ρ_i)| / [(μ + ρ_i)(ν + ρ_i)], (15.5)

where μ and ν range over all eigenvalues of H₁ and V₁, respectively. In cases such as that of Section 14, where the eigenvectors of a common basis of H₁ and V₁ include all pairs (μ_i, ν_j) of eigenvalues μ_i of H₁ and ν_j of V₁, one has equality. In actual practice there are usually so many eigenvalues of H₁ and V₁ that it is not practical to consider them individually even when they are known. It is, however, often practical to estimate upper and lower bounds for the eigenvalues of H₁ and V₁. Thus, having estimated a, b, α, and β such that a ≤ μ ≤ b, α ≤ ν ≤ β, one seeks to minimize

Ψ_m(a, b; α, β; ρ) = max_{a≤μ≤b, α≤ν≤β} ∏_{i=1}^m |(μ - ρ_i)(ν - ρ_i)| / [(μ + ρ_i)(ν + ρ_i)], (15.6)

where ρ = (ρ₁, ρ₂, …, ρ_m). Frequently, it is convenient to use the inequality

Ψ_m(a, b; α, β; ρ) ≤ [Φ_m(ã, b̃, ρ)]², (15.7)

where ã = min(a, α), b̃ = max(b, β), and where

Φ_m(ã, b̃, ρ) = max_{ã≤x≤b̃} ∏_{i=1}^m |(x - ρ_i)/(x + ρ_i)|. (15.8)

The problems of minimizing Ψ_m and Φ_m are equivalent to the problems of determining the minimax of the rational functions involved over certain domains. For the case m = 1 the problem of minimizing Ψ_m is solved in Appendix A. The problem of minimizing Φ_m for m = 2^r, r an integer, has been solved by Wachspress [25]. The solution is sketched in Appendix B, which also contains a general discussion of the problem of minimizing Φ_m.
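The quantity Φ_m of (15.8) is easy to evaluate by sampling, which is often enough for comparing parameter sets. A sketch (function name, sample interval, and the particular parameter sets are our own illustrative choices):

```python
import numpy as np

def Phi_m(a_t, b_t, rhos, n=20001):
    """Evaluate Phi_m(a~, b~, rho) of (15.8) by sampling x on [a~, b~]."""
    x = np.linspace(a_t, b_t, n)
    prod = np.ones_like(x)
    for r in rhos:
        prod *= np.abs((x - r) / (x + r))
    return prod.max()

# With the single parameter rho = sqrt(a~ b~) this reduces to the m = 1
# optimum of Section 8; two parameters spread over [a~, b~] do better.
a_t, b_t = 0.05, 4.0
one = Phi_m(a_t, b_t, [np.sqrt(a_t * b_t)])
two = Phi_m(a_t, b_t, [b_t * (a_t / b_t) ** 0.25, b_t * (a_t / b_t) ** 0.75])
```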
16. Methods for Selecting Iteration Parameters for the Peaceman-Rachford Method
We now consider two choices of the iteration parameters for the Peaceman-Rachford method defined by (15.1). One choice of parameters was presented by Peaceman and Rachford in [16]. The other was given by Wachspress [23, 24]. Though neither choice of parameters is optimum, nevertheless, their use makes the Peaceman-Rachford method effective. We choose a and b so that for all eigenvalues μ of H₁ and ν of V₁ we have a ≤ μ, ν ≤ b, and we let

c = a/b. (16.1)

By (15.5), (15.6), and (15.7) the spectral radius of ∏_{i=1}^m T_{ρ_i} satisfies¹⁹

λ(∏_{i=1}^m T_{ρ_i}) ≤ [Φ_m(a, b, ρ)]², (16.2)

where, by (15.8), Φ_m(a, b, ρ) is given by

Φ_m(a, b, ρ) = max_{a≤x≤b} ∏_{i=1}^m |(x - ρ_i)/(x + ρ_i)|. (16.3)

The parameters of Peaceman and Rachford are

ρ_i^(P) = b c^{(2i-1)/2m},  i = 1, 2, …, m, (16.4)

and those of Wachspress are

ρ_i^(W) = b c^{(i-1)/(m-1)},  m ≥ 2, i = 1, 2, …, m. (16.5)

¹⁹ The exponent 2 at the end of (16.2) was omitted in [16] and in Young and Ehrlich [29].
We first consider the average rate of convergence R_m for the Peaceman-Rachford parameters when m is held fixed. We define R_m for any choice of parameters by²⁰

R_m = -(1/m) log λ(∏_{i=1}^m T_{ρ_i}). (16.6)

Moreover, by (16.2) we have

R_m ≥ R̃_m = -(2/m) log Φ_m(a, b, ρ). (16.7)

THEOREM 16.1. For fixed m, if the iteration parameters are given by (16.4) then

Φ_m(a, b, ρ^(P)) ≤ δ, (16.8)

where

δ = (1 - z)/(1 + z) (16.9)

and

z = c^{1/2m}. (16.10)

Moreover, as c → 0,

R_m ≥ R̃_m = (4/m)z + O(z²). (16.11)
Proof. The inequality (16.8) is proved in Appendix B, (B.14). To prove (16.11) we first note that by (16.7), (16.8), and (16.9),

R̃_m ≥ -(2/m) log δ = (4/m)z + O(z²). (16.12)

On the other hand, by (16.3), (16.4), and (16.7) we have the opposite estimate (16.13).
Equation (16.11) follows from (16.12) and (16.13). We next seek to optimize the choice of m for a given c. We estimate the average rate of convergence from (16.6) and (16.8) as

R̃_m^(P) = -(2/m) log δ, (16.14)

where δ is given by (16.9). We note that, by (16.7) and (16.8), R_m ≥ R̃_m ≥ R̃_m^(P).

Following a method of Douglas [4] we study the behavior of R̃_m^(P) as a function of m, where m is assumed to be a continuous variable. Because the right member of (16.9) is a monotone-decreasing function of m, by (16.10), a one-to-one correspondence between m and δ is defined. Solving (16.9) and (16.10) for m we obtain

m = log c / (2 log [(1 - δ)/(1 + δ)]). (16.15)

Substituting in (16.14) we obtain

R̃_m^(P) = -4 log δ log [(1 - δ)/(1 + δ)] / log c. (16.16)

Equating to zero the first derivative of the above expression with respect to δ we obtain

[(1 - δ²)/2] log [(1 - δ)/(1 + δ)] = δ log δ. (16.17)

It is easy to prove

LEMMA 16.2. The function R̃_m^(P) defined by (16.16) is maximized when

δ = δ̂ = √2 - 1 ≈ 0.414, (16.18)

and the corresponding value of R̃_m^(P) is

R̃_m^(P) = 4(log δ̂)² / (-log c) ≈ 3.11 / (-log c). (16.19)

²⁰ Evidently for m = 1, R_m is just the asymptotic rate of convergence as defined in Section 5.
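Both claims of Lemma 16.2 can be confirmed numerically: δ̂ = √2 - 1 satisfies (16.17) because at this value both (1 - δ²)/2 = δ and (1 - δ)/(1 + δ) = δ, and 4(log δ̂)² ≈ 3.11 as in (16.19). A short check:

```python
import math

# delta_hat = sqrt(2) - 1 satisfies (16.17): at this value both
# (1 - delta^2)/2 = delta and (1 - delta)/(1 + delta) = delta.
d = math.sqrt(2) - 1
lhs = (1 - d * d) / 2 * math.log((1 - d) / (1 + d))
rhs = d * math.log(d)
# lhs == rhs up to rounding, and 4 (log delta_hat)^2 gives the 3.11 of (16.19)
coeff = 4 * math.log(d) ** 2
```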
Of course, the value δ̂ = 0.414 will in general correspond to a nonintegral value of m, and the actual value of R̃_m^(P) would in general be less than indicated by (16.19). In actual practice one would use the following procedure: (1) Estimate a and b, and compute c = a/b. (2) Find the smallest integer m such that

δ̂^{2m} ≤ c, (16.20)

where δ̂ = √2 - 1 ≈ 0.414. (3) Determine the iteration parameters by (16.4). (4) The estimated average rate of convergence is then given by (16.14), with δ computed from (16.9)-(16.10).
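The four steps above translate directly into a short routine (the function name and the sample N are ours):

```python
import math

def peaceman_rachford_parameters(a, b):
    """Steps (1)-(4): choose m as the smallest integer with
    delta_hat**(2m) <= c, then take rho_i = b * c**((2i-1)/(2m)) as in (16.4)."""
    c = a / b
    delta_hat = math.sqrt(2) - 1
    m = 1
    while delta_hat ** (2 * m) > c:
        m += 1
    rhos = [b * c ** ((2 * i - 1) / (2 * m)) for i in range(1, m + 1)]
    return m, rhos

# Example: for the Laplace equation on a square net (Section 9) one may take
# a = 4 sin^2(pi/2N), b = 4 cos^2(pi/2N); here N = 32 is illustrative.
N = 32
a = 4 * math.sin(math.pi / (2 * N)) ** 2
b = 4 * math.cos(math.pi / (2 * N)) ** 2
m, rhos = peaceman_rachford_parameters(a, b)
```

The parameters come out in decreasing order, all lying between a and b.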
For the above procedure we prove
THEOREM 16.3. If for given a and b the number of iteration parameters m is chosen as the smallest integer satisfying (16.20), and if the iteration parameters are chosen by (16.4), then for any η > 0 and for sufficiently small c

R_m ≥ R̃_m ≥ [4(log δ̂)² - η]/(-log c) ≈ (3.11 - η)/(-log c), (16.21)

where δ̂ = √2 - 1 ≈ 0.414. Moreover,

lim inf_{c→0} {R̃_m(-log c)} ≥ 4(log δ̂)² ≈ 3.11, (16.22)

and

lim sup_{c→0} {R̃_m(-log c)} ≤ 4 |log δ̂| {|log δ̂| + δ̂} ≈ 4.57. (16.23)
Proof. If δ̂^{2m} ≤ c then z ≥ δ̂ by (16.10). Consequently, by (16.7), (16.8), and (16.9), we have

R̃_m ≥ -(2/m) log δ ≥ -(2/m) log δ̂.

But since δ̂^{2(m-1)} ≥ c we have

R̃_m(-log c) ≥ 4[(m - 1)/m](log δ̂)²,

and, since m → ∞ as c → 0, (16.22) follows. By (16.7), the inequality (16.21) holds. On the other hand, by (16.13) we have, using the formula

-(1/2) log [(1 - z)/(1 + z)] = z + z³/3 + z⁵/5 + ⋯,

the estimate (16.24), and hence (16.25). But by (16.20) we have m ≥ (1/2)(log c/log δ̂), and, because of (16.20) and (16.10), it follows that

lim_{c→0} z = lim_{c→0} c^{1/2m} = δ̂.

Consequently (16.23) holds, and the proof of Theorem 16.3 is complete. For a given m, we have by (16.7) and (16.25)
ALTERNATING DIRECTION IMPLICIT METHODS
215
COROLLARY. If the p , are chosen by (16.4),then
(i 5:)
%(a, b, P ) 2 - exp [-2z3/(l - z ) ~ = ] 6c-223/(1-2)2.(16.27) We now consider the parameters of Wachspress given by (16.5).For the case of fixed m we prove
THEOREM 16.4. For given m, i f the iteration parameters are given by (16.5)
then
(16.28) where
(16.29) (16.30)
(16.31)
Proof. The inequality (16.28)is proved in Appendix B, (B.16).To prove (16.31)we first note that, by (16.7),(16.28),and (16.29)
R,
2 --log 2
+
e =4 y 0(Y2). m m On the other hand, by (16.3),(16.5),and (16.7)we have
(16.32)
(16.33) and hence, by (16.5),
4
R , S --log
(l - yy)
+
-
log
fi ('1 +-
i=3
y2'-3)
yZip3
=
4
-y
m
+ O(y2).
(16.34)
From this (16.31)follows, and the proof of Theorem 16.3 is complete. We now look for an m which will maximize the average convergence rate as estimated by 2 Eg.)= -log e, (16.35) m where E is given by (16.29).We note that by (16.7)and (16.28) R, 2 R, 2 ELrn. As in the case of the Peaceman-Rachford parameters we consider 7?Lm as a function of e, where by (16.29)and (16.30),e and ma re related by (16.29). Because e is a monotone -decreasing function of m,a one-to-one correspond-
216
GARRETT BIRKHOFF, RICHARD S. VARGA, AND DAVID YOUNG
ence between m and e is defined. If we were to replace m by m - 1 in (16.35), then we would have, by (16.29) and (16.30),
R??
= -8 log
4 log [(l - 4 ) / ( 1 + dill. log c
By Lemma 16.2 the optimum value of 4; would be :, the optimum of el would be
z
= 82 =
(di- 1 ) 2
=
3 - 245
A
(16.36)
4 5 - 1 & 0.414 and 0.172.
(16.37)
Of course the value Z = 0.172 will be inaccurate not only because of the replacement of m by (m - 1) in (16.35) but also because the value of m corresponding to 2 by (16.29)-(16.30) will not be an integer. I n actual practice one would use the following procedure: (1) Estimate a and b, and compute c = a/b. (2) Find the smallest integer m such that jjzcm-1,
5 -c
(16.38)
5 - 1 = 0.414. where 8 = 4 (3) Determine the iteration parameters by (16.5). (4) The estimated average convergence rate is given by 2 RAW,”’= -log €, m where In spite of the fact that the above procedure does not give the best value of m,we can prove
THEOREM 16.5. If for given a and b the number of iteration parameters m i s chosen as the smallest integer satisfying (16.38), and if the iteration parameters are chosen by (16.5), then for any 7) > 0 and for sz@iciently small c R, L R, 2 16(10g 8)’ -log c where 8
=
~
6.22 - 7, -log c
(16.39)
4 5 - 1 & 0.414. Moreover,
Lim {%(-log c)} L 16(log &)z
A
6.22,
(16.40)
c+o
and
Lim {R,(-log c)} 5 8 I log 8 1 (1 c+o
log 81
+ +8}
7.66.
(16.41)
Proof. If 8z(m-1) 6 c, then y 2 8. Consequently, by (16.7), (16.28), and (16.29) we have
ALTERNATING DIRECTION IMPLICIT METHODS
217
But since 82(m-2) 2 c we have, by (16.37),
R,(-log
c) 2 8(log $ 2
("?n
___
".
Moreover, since m 03 as c + 0, (16.40) follows. By (16.7), the inequality (16.29) holds. On the other hand, by (16.34) we have 1 - 2i--1 ---f
But, by (16.24) we have (16.42) Thus, from (16.38) it follows that
-
R,( -log c) I8 log I log l - y+8jlog81 l + Y
y3
(1 - YI2
.
Because of (16.30) and (16.38) it follows that Limy c-0
=
lim ~ 1 / 2 ( m - 1 )
=
8.
c+O
Thus (16.41) follows, and the proof of Theorem 16.5 is complete. For given m, we have by (16.7) and (16.42)
COROLLARY. If the pi are chosen by (16.4), then (16.43) Theorems 16.3 and 16.5 show that the Wachspress parameters are superior to the Peaceman-Rachford parameters by a factor of approximately two, provided that the values of m are chosen by (16.38) and (16.20), respectively. Numerical experiments described in Part IV tend to confirm this superiority. 17. The Douglas-Rachford Method
In Sections 3 and 4, two variants of the Douglas-Rachford method are given. The first is defined by (3.7)-(3.8); the second is defined by (4.5)(4.6). Because of the assumptions (13.3)-(13.5) on the matrices H , V , and 2 we can express the eigenvalues of the error-reduction matrices W , and
GARRETT BIRKHOFF, RICHARD S. VARGA, AND DAVID YOUNG
21 8
Up,defined by (4.4) and (4.7), respectively, in terms of the eigenvalues p‘ of H and Y’ of V . Thus xw =
(p’ (p’
+ u’)(.’ + u’) + P 2 + u’ + P)
+ u’ +
(17.1)
P)(V’
is an eigenvalue of W, where
.>
(17.2)
u’ = u/2,
and xu =
+ d(.’ + + + + + P) (P’ + + Pa
(P’
P>(V’
P2
(17.3)
Q
is an eigenvalue of Up. Both variants of the Douglas-Rachford method are identical if u = 0. We now show that for u > 0 the variant corresponding to W , is superior to the other variant. We show that using p’ = p u’ with the first variant yields an eigenvalue XW which is smaller for all positive p’ and Y’ than the corresponding eigenvalue of Up.This will imply that A(W,) 5 A(U,) and that for any PI, PZ, . . . , pm,
+
Kow, replacing p by
But since [(P’
.>
p
+ u’ in (17.1) yields
+’.(I. + + + PO
P21 - [(P’
+
.’)(Y’
+ u’) + + u’)zl (P
= u’(p’
+ + u”2, Y‘)
which is positive for u > 0, it follows that Xu > Xw for all p’ and d . Hence, for u > 0 the first variant of the Douglas-Rachford method is superior to the second. Henceforth we shall consider only the first variant. From (17.1), if p and v are eigenvalues of H I = H 0’1 and VI = V u’1,respectively, then
+
+
m
i=l
G
crv + Pi2 + PJ(V + Pi)
is an eigenvalue of H E l W,,?. It is convenient to define
and
(17.4)
ALTERNATING DIRECTION IMPLICIT METHODS
219
Evidently, we have
5 *E)(a, b, a, P, P ) 5 @,?(a, 6, P ) where a = Min (a, a),6 convergence R, by
=
(17.7)
Max (b, 0).We also define the average rate of (17.8)
Evidently by (17.7) we have
R, 2
R,
1
(D)
= -- log@,
m
- 6 (a, ,p).
(17.9)
The solution to the problem of minimizing for the case m = 1 is given in Appendix A. It is also shown that if a b = a p, then the Peaceman-Rachford method with the optimum single parameter is a t least as effective as the Douglas-Rachford method with the optimum single parameter. We now study the convergence of the Douglas-Rachford method with parameters as given by (16.4). This selection of parameters was used by Douglas and Rachford [7]. We shall assume that the eigenvalues 1.1 of H and v of V all lie in the range a 5 p , v 5 b. We now prove
+
+
THEOREM 17.1. For $xed m, if the
pi
are given by (16.4), then (17.10)
where
+
+(1 a'), and where 6 i s given by (16.9). Moreover, as c 60
B,
=
=2 z
m
(17.11) =
a / b -+0 we have
+ O@),
(17.12)
where z is given by (16.10). Proof. The inequality (17.10) is proved in a manner similar to that used in Appendix B to prove (B.14). To prove (17.12) we first note that by (17.6), (17.8), (17.10), and (17.11),
-
R, 2
1
-- log So =
m
1
--log m
1 + 2 2
+
(1
2)'
On the other hand, by (17.6): (17.7), and (17.9)
and by (16.4),
"~
1
2 m
____ = - z
+
O(2').
220
GARRETT BIRKHOFF, RICHARD S. VARGA, AND DAVID YOUNG
This completes the proof of Theorem 17.1. We next seek to optimize the choice of m for a given c. We estimate the average rate of convergence from (17.8) and (17.10) as (17.14) Relating m and 6 as in the case of the Peaceman-Rachford method we have, by (17.11)) (16.9), and (16.10), (17.15) Equating to zero the first derivative of
I??,”’ with respect to 6 we obtain
+
1-6 6(1 - 62) log -= (1 6 2 ) log (4 1 + 6 Let 8 be the solutionz1of (17.16) in the range 0 < 6 putations lead to the value 80 A 0.60.
+
$62).
(17.16)
< 1. Numerical com(17.17)
The corresponding value of 6 0 is 0.68. I n actual practice one would use the following procedure: Estimate a and b and compute c = a/b. Fine the smallest integer m such that $0 A
(S) 1-6
SC)
(17.18)
(17.19)
where 8 satisfies (17.16) and (17.17). By (17.17) one would actually use (0.25)2mS c. (17.20) Determine the iteration parameters by (16.4). The estimated average rate of convergence is given by (17.21) where (17.22) For the above procedure we prove The solution is unique in the range 0
< B < 1 since d2EAD’/dP < 0 in that
range.
ALTERNATING DIRECTION IMPLICIT METHODS
221
THEOREM 17.2. If for given a and b the number of iteration parameters m i s chosen as the smallest integer such that (17.19) i s satisjied, and if the iteration parameters are chosen by (16.4), then for any q > 0 and for sumiently small c 210g(310g(;+
R, 2
E,
i 8 2 )
-log c
=
where 8 satisfies (17.16) and 8
A
-* -
'*07 - ', -log c
(17.23)
1.16.
(17.25)
0.68. Moreover,
and
Lim {R,(
-log c)} 5 2
c-0
A
Proof. Let x be given by (16.10) and let (17.26)
+ 8').
If m satisfies (17.19), then 9,5 c, z 5 z, 6 5 8, and 60 5 $(l Consequently, by (17.9) and (17.10) we have
Em(-log c) 2 2(-log 2) log
(; + ;P) (A). m-1
Since m --+ m as c -+0, (17.24) follows. By (17. 9), the inequality (17.23) holds. On the other hand, by (17.13) we have 1
112
R, 5 -;logrI i = l
1 + z2(2i-1) (1 2 2 i - 1 ) 2
+
-
1 --logm
1
1 - 2 2
+
I --log-
2 c " 22i-1 +;
m (1 zI2 But by (17.19) we have
-
(1
a=2
R,( -log c) S 2jlog P /
+ z)2 < -
1 - 22 + 2 -. 23 1 --log---m (I z ) ~ m 1 - z2
+
+2
A}*
(17.27)
222
GARRETT BIRKHOFF, RICHARD S. VARGA, AND DAVID YOUNG
Because of (16.10), (17.19), and (17.26) it follows that Limz
=
1-8
-- 2 A 0.25. +
c-0
Hence (17.25) follows, and the proof of Theorem 17.2 is complete. For given m,we have by (17.9) and (17.27)
COROLLARY. If the p i are chosen by (16.4), then 1 - 22 @ P ( a ,b, p ) 2 exp [-2zS/(1 - z ) ~ ] (1 zIZ = 60 exp [-2z3/(l - z ) ~ ] . (17.28) Theorems 17.1 and 17.2 show that the Douglas-Rachford met,hod with the parameters (16.4) is much less effective than the Peaceman-Rachford method either with fixed m or with m chosen as a function of c = a/b by (17.19) and (16.20) for the respective methods. The Douglas-Rachford method is inferior to an even greater extent to the Peaceman-Rachford method with the Wachspress parameters for the case where m is allowed to depend on c. This does not necessarily imply, of course, that if optimum parameters were used with each method, the Douglas-Rachford method would be inferior to the Peaceman-Rachford method. However, as stated earlier, for the case a b = a p and m = 1, the Peaceman-Rachford method is definitely better. ~
+
+
+
18. Applications to the Helmholtz Equation
I n this section we apply the results of Sections 4 and 5 to the Dirichlet problem for the modified Helmholtz equation, (18.1) where Go is a nonnegative constant, in the rectangle 0 S x S X , 0 5 y 5 Y . As in Section 9, we assume that the mesh size is the same in both coordinate directions, and that for some integers M and N (18.2)
It follows from (9.3)-(9.4) that the eigenvalues p of H I (u/2)1 satisfy the eigenvalues v of V1 = V
+
=
H
+ (u/2)I and (18.3)
ALTERNATING DIRECTION IMPLICIT METHODS
where g
=
h2Go.If L a
=
=
223
max ( M , N ) , ? r u + 2 5 p,v 5 4 cos2+ - = b. 2L 2
? F u
4 sin2 2L
(18.4)
Given m, one could determine m iteration parameters for the PeacemanRachford method by (16.4) for the Peaceman-Rachford parameters and by (16.5) for the Wachspress parameters. One would also use (16.4) for the iteration parameters for the Douglas-Rachford method. On the other hand, if one lets m depend on c = a/b, then m can be determined by (16.20) and (16.38) for the Peaceman-Rachford and the Wachspress parameters, respectively, and by (17.19)-(17.20) for the Douglas-Rachford method. We now determine asymptotic formulas for the rates of convergence, with both parameter choices for the Peaceman-Rachford method and for the Douglas-Rachford method as h -+0. Evidently, by (18.4), we have =
where 2 we have
=
($ + %)h2 + O(h4),
(18.5)
max (X, Y ) .By Theorems 16.1, 16.2, 16.4, 16.5, 17.1, and 17.2,
THEOREM 18.1. For the Dirichlet problem for the modified Helmholtz equation (18.1) in the rectangle 0 5 x 5 X,0 S y 5 Y let RE), RAW, and RZ) denote respectively the average rates of convergence of the Peaceman-Rachford method with the Peaceman-Rachford parameters (16.4), the PeacemanRachford method with the Wachspress parameters (16.5), and the DouglasRachford method with the parameters of (16.4). Then as h + 0 ,
i m RZ) 2
4
- ('h)l/m
+ O(h!Vm),
8 (Kh)l/(m-l)+ O(h2/("')) RAW 2 m (18.6)
+
where K 2 = (7r2/4Z2) Go/8. I f the number m of iterations in each case i s chosen by (16.20), (16.38), and (17.19) respectively, then for any 7 > 0 and for sufiiently small h we have
224
GARRETT BIRKHOFF, RICHARD S. VARGA, AND DAVID YOUNG
PART 111: COMPARISON WITH SUCCESSIVE OVERRELAXATION VARIANTS 19. The Point SOR Method
For the solution of the matrix equation
+ +
Au = ( H V Z)U = k , introduced in Section 2, the n x n matrix A is, by its construction, real, symmetric, and positive definite. We now split the matrix A into A = D - E - E', (19.1) where D is a real diagonal matrix, and E' and E' are respectively strictly lower and upper triangular matrices.22Since A is positive definite, then D is a diagonal matrix with real positive diagonal entries, and is thus also positive definite. It is convenient to denote the strictly lower and upper triangular matrices D-I E and D-I E' respectively by L and U. Thus,
A
D ( I - L - V). (19.2) The point successive overrelaxation (SOR) method of Young [26] and Frankel [lo]is defined by =
+
+
(19.3) ( D - wE)u,+1 = [ ( I - w)D WE'>Un wlc, where uo is again some initial vector approximation of the unique solution of (19.1). The quantity w in (19.3) is called the relaxation factor. Since D - w E is triangular, this procedure is easily carried out and easily programmed. It is convenient to write (19.3) equivalently as
+ ( D - wE)-'l<, c, = (D - wE)-l{(l - W)D + WE'}. lln+l =
where
J?,u,,
(19.4)
(19.5)
The convergence properties of the matrix c, as a function of the parameter w follows from the following general result of Ostrowski [ l 4 ] .
THEOREM 19.1. Let the n x n matrix A be given by (19.1), where D i s hermitian and positive definite, and let D - W Ebe nonsingular for all 0 5 w 5 2. T h e n A($,) < 1 i f and only i f A i s positive definite and 0 < w < 2. Thus, as long as we choose any w with 0 < w < 2, we are sure of convergence for the successive overrelaxation iterative method of (19.3). While A(S,) is continuous function of w for the closed interval 0 5 w I 2 , 22
A square matrix T = I It,,,\ I is called strictly lower triangular if tt,? = 0 for a l l j 2 i.
ALTERNATING DIRECTION IMPLICIT METHODS
2 25
we are assured of the existence of an Wb such that A(&) 2 h(C,,) for all 0 I w 5 2. But Ostrowski’s Theorem doesn’t aid in the practical determination of this W b in the general case. Young [26] considered a class of matrices A = D - E - E‘ which possess the property that D is a real diagonal matrix, and that the real triangular matrices E and E’ could be simultaneously expressed, after a suitable permutation of indices, as
1 1;
0 0 0 F‘ E = -(19.6) F 0 E‘ = Here, the partitionings of both matrices are the same, with the diagonal submatrices being square and null. Denoting B=L+U as the point Jacobi matrix, Young proved withz3the assumption of (19.6) :
[
[TIT]‘
THEOREM 19.2. Let A = D - E - E’, where D i s a positive definite diagonal matrix, and E and E’ are real matrices of the form (19.6). I f A i s positive definite, then h(C,) > A(C,b) = - I, for UZZ W # W b , (19.7) where (19.8) The assumption (19.6) on the form of the matrices E and E’ is a special case of what Young calls Property ( A ) for matrices. Note that this assumption precisely characterizes the optimum relaxation factor w., If R ( M ) E -log h ( M ) is the asymptotic rate of convergence for the matrix M , then Young proved R(C,,)
-
2 h { R ( B ) }1’2
(19.9)
as A(B) 1 1. This simple formula shows that, for small mesh size, successive overrelaxation with optimum parameters gives order of magnitude improvements in computing time over the Jacobi (and Gauss-Seidel) iterative methods. 20. Helmholtz Equation in a Square
The basic theoretical results established for AD1 methods (Theorem 5.1) and the point SOR method (Theorem 19.2) were for one-parameter stationary iterative methods, i.e., where one fixed parameter is used in the course 23
This is actually a special case of results in [ 2 6 ] .
GARRETT BIRKHOFF, RICHARD S. VARGA, AND DAVID YOUNG
226
of computation. We now compare the one-parameter Peaceman-Rachford method with the one-parameter point SOR method for the numerical solution of the Helmholtz equation Gou - V 2 u = S in the unit square with uniform mesh spacing h = 1/N. To make this comparison equitable, we shall optimize the asymptotic rate of convergence of each method as a function of its single parameter. For the point successive overrelaxation method, the condit,ions of Theorem 19.2 are fulfilled by the matrix A = D - E - E' for any N > 1. I n this case, i t can be verified that A{D-l(E
where a
=
minA(C,)
- cos (a/N) + E')} = A(B) = 4 4cos+(a/N) Goh2 1 + a/4
(20.1)
Goh2.Thus, from Theorem 19.2, =
h(C,,) =
Wb
-1
0
cos (a/N)
y
+ a/4 + [(l + a/4)2 - cos2(a/N)211/2 (20.2) For the Peaceman-Rachford method, the matrix 2 in this special case is al. The eigenvalues of the matrices H1 H + (l/2)2 and V+ (1/2)2 can be conveniently calculated from (8.3)-(8.4), and all lie in the interval 4 sin2 (a/2N) + a / 2 5 z 5 4 cos2 (7r/2N) + a/2. Appealing to =
{l
=
V1 =
Theorem 8.1, we conclude for this problem that
where
6 - 4 sin2 ( a / 2 N ) 6 4 sin2 ( s / 2 N ) a/2
(20.3) + j3 = ((4 sin2 ( a / 2 N ) + u/2)(4 cos2 (a/2N) + ~ / 2 ) ) l / ~ . (20.4)
minA(T,) P
>o
=
A(T;)
=
+
But it can be verified that the expressions in (20.2) and (20.3) are identical. Thus we obtain the resultz4 THEOREM 20.1. For the Helmholtz equation in the unit square, the optimized one parameter Peaceman-Rachford method and the optimized point SOR method have identical asymptotic rates of convergence for all h > 0.
We point out, however, that numerical requirements for these two methods are different, since the Peaceman-Rachford method requires roughly twice as much arithmetic computations per mesh sweep as does the point SOR method. This will be discussed more in detail in Part IV. We now consider the asymptotic convergence rates of these methods for small h = 1/N. For the point successive overrelaxation method, it follows from (19.9) and (20.1) that
R(T;) = R(C,,) *4
N
2(a2
+ Go)'12h,h -+ 0 ,
In Varga [I71 only the special case GO = 0 was considered.
(20.5)
ALTERNATING DIRECTION IMPLICIT METHODS
227
whereas for the point Gauss-Seidel method, the special case w = 1 of (19.3) kzq for pmposes of comparison, an asymptotic rate of convergence
R(&)
- ( + 2) 1r2
h2, h -+0.
(20.6)
Young and Ehrlich [29] have extended the analysis for Laplace's equation (Go = 0) of a single optimized parameter Peaceman-Rachford method to that of a fixed number m 2 1 of optimized parameters used cyclically, and they showed that (20.7)
See also Section 18. If, however, the number of parameters m is allowed t o change as a function of the mesh spacing h, it can be shown (Section 18) that (20.8)
for all h sufficiently small. These results for the Helmholtz equation for the square rest firmly on the fact that the matrices H and V possess a common basis of orthonormal eigenvectors. But for such problems, the results of (20.5) and (20.8) show that m-parameter AD1 methods are superior for m > 1, in terms of asymptotic rates of convergence, to point SOR methods for all sufficiently small mesh spacings h.
21. Block and Multiline SOR Variants
Several extensions of the results of Section 19 are of practical and theore€icaI interest. First, Ostrowski's Theorem 19.1 permits the use of nondiagonal matrices D. This, however, means that the corresponding SOR method of (19.3) requires the direct solution of nondiagonal matrix equations, like those first introduced in the definition of AD1 methods in (3.3)-(3.4). Second, Young's Theorem 19.2 can be similarly rigorously extended to the case where D is not diagonal, and the corresponding method is called the block or multiline XOR method. One can also show, for irreducible Stieltjes matrices A , that the asymptotic rate of convergence is increased as one passes from point to block or multiline SOR methods, which makes these extensions of practical value. See Varga [19, 201 and references given there. It is relevant to point out that multiline SOR methods are theoretically a special case of block SOR methods, but in actual practical computations, the entries of a block correspond to the mesh points of k adjacent horizontal
228
GARRETT BIRKHOFF, RICHARD S. VARGA, AND DAVID YOUNG
(or vertical) mesh lines, hence the name k-line SOR. Parter [I51 shows that the rate of convergence of k-line Jacobi method for Laplace’s equation in a rectangle is k (21.1) R(B‘”) 5 X2h2,h + 0
-
where X2 is the minimum eigenvalue of the Helmholtz equation vzp
+
xzp =
(21.2)
0.
Theorem 19.2 can be applied, and we conclude from (19.9) that
R(&p’)
-
2& Ah, h -+ 0.
(21.3)
Thus, increasing the number k of lines in SOR methods yields improved asymptotic rates of convergence, but these asymptotic results are always O(h), as h + 0, in contrast with AD1 methods which have asymptotic convergence rates O(hllm)for this problem. See Section 18. Moreover, the arithmetic requirements2s per mesh point of the multiline SOR methods 6) multiplies and ( k 7) increase linearly with k , in that roughly (k additions are needed per mesh point. These combined observations suggest that k = 1 or k = 2 be used in practical problems. Another generalization of Young’s work is based on the concept of weakly cyclic matrices of index p 2 2, an outgrowth of earlier work by Frobenius. We say that a matrix M is weakly cyclic of index p 2 2 if there exists a permutation matrix P for which P M PT has the form
+
+
(21.4)
where the diagonal submatrices are square and null. If we assume that thc U is of this form, then we can state [21] matrix B = L
+
THEOREM 21.1. Let B = L
+
U be weakly cyclic of index p 2 2. If the has nonnegative real eigenvalues less than unity, then (21.5) A(&J > A(&,) = (W - l ) ( p - I), 0 # WE,, where W b i s the unique positive root less than p / ( p - 1) of AP(B)w~” = p”(p - l ) ‘ - P ( ~ b - 1). (21.6) Moreover, R(&) ( 2 p 2 / ( p - 1))1’2{(R(B)}1~2 as h ( B ) increases to unity. matrix
B p
-
25 See [ZO],where some representative arithmetic requirements are given for SOR variants.
ALTERNATING DIRECTION IMPLICIT METHODS
229
The case p = 2 is originally due to Young [26]. Other extensions of Young’s work are worth mentioning. First, it is apparent that the successive overrelaxation method is basically a one-parameter iterative method, in that one selects a single optimum relaxation factor. In generalizing this to iterative methods using a sequence of wi’s, Golub and Varga [ I l l make use of a familiar idea of considering Chebyshev polynomials26 in the matrix B. It is shown that use of optimum relaxation factors wi’s is always superior for any number of iterations to the original successive overrelaxation method of Young and Frankel, but asymptotic convergence rates are unaffected. This superiority has been confirmed in numerical experiments (See Ref. [ I l l ) . In Part IV, these improved SOR variants are compared numerically with AD1 methods.
22. Analogies of AD1 with SOR
The theory of successive overrelaxation for weakly cyclic matrices of index p 2 2 can be applied to AD1 methods. First, we write the equations (3.1)-(3.2), leading to the definition of the Peaceman-Rachford method, in the form (with 6’ = 1 - 6 ) : (22.1) u = ( H 68 p I ) - l { k - (V 6’Z - p l ) u } ,
+ + + u = (v + ez + pi)-l{k - ( H + 6’8 - p l ) ~ } .
(22.2)
Dealing with column rectors with 2n componcnts and 2n X 2n matrices, this can be written as (22.3)
where
c, = ( H + 6 8 + pi)-’(pi - 6’8 - IT); D, = (v + ex + pi)-ypi- e’z - H ) .
(22.4)
The 2n x 2n matrix of (22.4) is thus weakly cyclic of index 2, and applying the successive overrelaxation method with w = 1 (called the Gauss-Seidel or single-step method), we obtain
,,p+s
=
C,I,@)
+ gl;
+
up+’) = D p u(“+’) 1 g2.
(22.5)
Except for notation, this is equivalent to (3.3)-(3.4) for a single fixed acceleration parameter p. Similarly, it can be shown [ I 8 1 that the Peaceman-Rachford method, with nz parameters pz used cyclically, is just the successive overrelaxation 26 The use of Chebyshev polynomials in problems of numerical analysis goes back to Flanders and Shortley [ I & ] .
230
GARRETT BIRKHOFF, RICHARD S. VARGA, AND DAVID YOUNG
method with w = 1 applied to a 2mn x 2mn matrix which is weakly cyclic of index 2m. There is another interesting comparison between SOR methods and AD1 methods. Consider the numerical solution of the Helmholtz equation of Section 2 on a uniform mesh h in a rectangle, and suppose that the initial error vector do) is such that all its components are zero, save one (which we assume is positive). Then, one iteration of the Peaceman-Rachford method distributes this error at a single point over the entire mesh. On the N -2 other hand, if the rectangle is M h x Nh, it could take up to M iterations of the point SOR method to accomplish the same task. See [19]. Intuitively, successive overrelaxation seems less attractive from this point of view. A final analogy [19] between these different methods is that both can be thought of as approximations to the time-dependent parabolic partial differential equation
+
dU
- = -G(x,
at
Y)U
a +ax
with prescribed initial conditions. SOR methods can be viewed as explicit approximations to (22.6) in which the relaxation factor w = At plays the role of the time increment from one step to the next. AD1 methods, on the other hand, can be viewed as implicit approximations (like the CrankNicolson method), in which the iteration parameter p = 2/At plays the role of the reciprocal of the time increment.
ALTERNATING DIRECTION IMPLICIT METHODS
231
PART IV: NUMERICAL EXPERIMENTS
23. Introduction I n this chapter we describe some numerical experiments which were conducted to test the theoretical predictions of Part I1 on the convergence of the Peaceman-Rachford method. One set of experiments involved the solution of the Dirichlet problem with Laplace’s equation for the regions shown in Fig. 1. These experiments were run at the University of Texas Region I.
Unit square ( h = -1 1 5
Region 11.
1 L 1 1
’ 10 ’ 20 ’ 40 ’ 80 ’ 160 )
4 X 10
Unit square with
4 - square
10
removed from center
Region 111.
Unit square with
1 5
X
1
- square 5
removed from each corner
1 1 Region IV. Unit square with - X - square 2 2 removed from one corner
(h=L 10
Region V.
’
1 20
’
r ’ 180 -’ 1601 , 40
Right isosceles triangle with two equal sides of length unity ( h = -1 1 1 1 1 -1) 5
’ 10 ’ 20 ’
40
’
80
’ 160
232
GARRETT BIRKHOFF, RICHARD S. VARGA, AND DAVID YOUNG
and are described in Sections 24-26. Another series involving more general differential equations and boundary conditions were conducted at the Gulf Research and Development Company. They are described in Section 27. 24. Experiments with the Dirichlet Problem
For each of the regions shown in Fig. 1 the five-point finite difference equation analog of the Dirichlet problem with Laplace’s equation was solved27 for a number of mesh sizes using the Peaceman-Rachford method and the successive overrelaxation method. I n every case, the boundary values were assumed to vanish; hence both the exact solution of the Dirichlet problem and the finite difference analog vanish identically. The advantage of this choice is that a t each stage the approximate value at a given point is exactly equal to the error at that point. I n each experiment, starting values of unity were assumed at each interior mesh point, and the iterative process was terminated when the approximate values at all mesh points became less than in absolute value. We remark that the term “successive overrelaxation” means point successive overrelasation as distinguished from block overrelaxation (see Section 21). For the Peaceman-Rachford method three choices of iteration parameters were used: the Peaceman-Rachford parameters of (16.4) ; the Wachspress parameters (16.5);and the optimum parameters. The optimum parameters were chosen by a procedure of Wachspress [25] for m = 1, 2, 4 (see Appendix B). For m = 3 the determination was made numerically on the computer using a successive approximation procedure. One, two, three, and four Peaceman-Rachford parameters and optimum parameters were used, while two, three, four, and five Wachspress parameters were used. Mesh sizes of h = 1/5,1/10,1/20,1,40, 1/80, l/l20, and 1/160 were used,2* though not all mesh sizes were used for every region or every parameter choice. Table I lists the numerical values of the parameters used. For the successive overrelaxation method the optimum relaxation factors were determined analytically for the square region using (19.8) and (20.1) and empirically for other regions to within f0.01, and are given in Table
*’
The following computing machines were used: the Control Data 1604 computers a t the University of Texas, a t the Control Data Corporation, Minneapolis, Minnesota, and a t the National Bureau of Standards, Boulder, Colorado; the IBM 704 and 709 computers a t Texas Agricultural and Mechanical College. The work of L. W. Ehrlich and W. P. Cash of the University of Texas Computation Center is gratefully acknowledged. 28 I n previous experiments by Young and Ehrlich [29],one, two, three, and four Peaceman-Rachford parameters were used with mesh sizes of h = 1/5, 1/10, 1/20, and 1/40.
ALTERNATING DIRECTION IMPLICIT METHODS
233
11. Because of the large amount of machine time which would have been required, the mesh sizes of h = 1/160 and h = 1/120 were not used, and h = 1/80 was used only with the square. Tables I1 and I11 give values of N ; for the Peaceman-Rachford method for the values of h and m indicated. Here N ; refers to an actual or estimated number of iterations, and a and p have the following meanings:
1
P Peaceman-Rachford parameters
a = W Wachspress parameters
B optimum parameters
’
=
I
o observed number of iterations v “virtual” number of iterations, as defined below t predicted number of iterations, as defined below c predicted number of iterations, as defined below
For a given m, the virtual number of iterations Nua was determined by Nya = log 106/-1og ha where A“ is the estimated mean spectral radius found by estimating the limiting value of the quantit,iesjn = for n = m 1, m 2, . . , where en denotes the maximum absolute value of the approximate solution (and hence in this case of the error) after n iterations. In the case of the square, the matrices H and V have a common basis of eigenvectors. As long as in the expansion of eo in terms of the common basis of eigenvectors the component of the vector v associated with the largest eigenvalue of nT= T p ,does not vanish, then j,, will approach the mth root of the spectral radius of IIT=1 T,,,.
+
m-,
+
TABLEI
ITERATION PARAMETERS h-1
m
5 5
1 2
5
3
5
4
5
5
Peaceman-Rachford 1.1755705 0.67009548 2.0623419 0.55560485 1.1755705 2.4873180 0.50591866 0.88754970 1.5570576 2.7315972 -
-
Wachspress 0.38196601 3.6180340 0.38196601 I .1755705 3.6180340 0.38196601 0.80817863 1.7099760 3.6180340 0.38196601 0.67009550 1.1755705 2.0623419
Optimum 1.1755705 0.54887621 2.5178099 0.45359594 1.1755705 3.0466903 0.42174787 0.78715591 1.7556445 3.2767586
234
GARRETT BIRKHOFF, RICHARD S. VARGA, AND DAVID YOUNG
TABLE I. (Continued)

h^-1 = 10
  m = 1   Peaceman-Rachford: 0.61803400
          Optimum: 0.61803400
  m = 2   Peaceman-Rachford: 0.24596235, 1.1529451
          Wachspress: 0.097886967, 3.9021131
          Optimum: 0.18760957, 2.0359623
  m = 3   Peaceman-Rachford: 0.18092034, 0.61803400, 2.1112388
          Wachspress: 0.097886967, 0.61803399, 3.9021131
          Optimum: 0.13497175, 0.61803400, 2.8299802
  m = 4   Peaceman-Rachford: 0.15516607, 0.38988857, 0.97967999, 2.4616595
          Wachspress: 0.097886967, 0.33438737, 1.1422860, 3.9021131
          Optimum: 0.11821609, 0.33457893, 1.1416320, 3.2310830
  m = 5   Wachspress: 0.097886967, 0.24596234, 0.61803399, 1.5529451, 3.9021131

h^-1 = 20
  m = 1   Peaceman-Rachford: 0.31319083
          Optimum: 0.31319083
  m = 2   Peaceman-Rachford: 0.087907193, 1.1158188
          Wachspress: 0.024623319, 3.9753768
          Optimum: 0.024623319, 3.9753767
  m = 3   Peaceman-Rachford: 0.057556742, 0.31319083, 1.7042051
          Wachspress: 0.024623319, 0.31286893, 3.9753768
          Optimum: 0.040456047, 0.31319083, 2.4245772
  m = 4   Peaceman-Rachford: 0.046572773, 0.16592687, 0.59115497, 2.1061338
          Wachspress: 0.024623319, 0.13407789, 0.73007542, 3.9753768
          Optimum: 0.033125729, 0.14039288, 0.69723598, 2.9550133
  m = 5   Wachspress: 0.024623319, 0.087771701, 0.31286893, 1.1152452, 3.9753768

h^-1 = 40
  m = 1   Peaceman-Rachford: 0.15695853
          Optimum: 0.15695853
  m = 2   Peaceman-Rachford: 0.031115900, 0.79174897
          Wachspress: 0.0061653325, 3.9938348
          Optimum: 0.022434444, 1.0981320
  m = 3   Peaceman-Rachford: 0.018143240, 0.15695853, 1.3578601
          Wachspress: 0.0061653325, 0.15691819, 3.9938348
          Optimum: 0.012261962, 0.15695853, 2.0091540
  m = 4   Peaceman-Rachford: 0.013854188, 0.069884949, 0.35252200, 1.7782335
          Wachspress: 0.0061653325, 0.053345898, 0.46157849, 3.9938348
          Optimum: 0.0093924997, 0.058924690, 0.41809268, 2.6229420
  m = 5   Wachspress: 0.0061653325, 0.031103904, 0.15691819, 0.79164722, 3.9938348
ALTERNATING DIRECTION IMPLICIT METHODS
235
TABLE I. (Continued)

h^-1 = 80
  m = 1   Peaceman-Rachford: 0.078519631
          Optimum: 0.078519631
  m = 2   Peaceman-Rachford: 0.011003253, 0.56031907
          Wachspress: 0.0015419275, 3.9984583
          Optimum: 0.0078568620, 0.78470673
  m = 3   Peaceman-Rachford: 0.0057152520, 0.078519631, 1.0787508
          Wachspress: 0.0015419275, 0.078519632, 3.9984582
          Optimum: 0.0037654768, 0.078520642, 1.6373638
  m = 4   Peaceman-Rachford: 0.0041190070, 0.029393390, 0.20975235, 1.4968007
          Wachspress: 0.0015419275, 0.021183944, 0.29103799, 3.9984582
          Optimum: 0.0026918638, 0.024740608, 0.24919891, 2.2903583
  m = 5   Wachspress: 0.0015419275, 0.011003253, 0.078519632, 0.56031907, 3.9984582

h^-1 = 120
  m = 1   Peaceman-Rachford: 0.052353896
          Optimum: 0.052353896
  m = 2   Peaceman-Rachford: 0.0059900539, 0.45758027
          Wachspress: 0.00068535005, 3.9993148
          Optimum: 0.0042633301, 0.64290834
  m = 3   Peaceman-Rachford: 0.0029079793, 0.052353897, 0.94255504
          Wachspress: 0.00068535005, 0.052353897, 3.9993148
          Optimum: 0.0018960170, 0.052354899, 1.4456572
  m = 4   Peaceman-Rachford: 0.0020261500, 0.017708830, 0.15477761, 1.3527777
          Wachspress: 0.00068535005, 0.012338721, 0.22214056, 3.9993148
          Optimum: 0.0013022248, 0.014899126, 0.18396586, 2.1048059
  m = 5   Wachspress: 0.00068535005, 0.0059900539, 0.052353897, 0.45758027, 3.9993148

h^-1 = 160
  m = 1   Peaceman-Rachford: 0.039267386
          Optimum: 0.039267386
  m = 2   Peaceman-Rachford: 0.0038908000, 0.39630090
          Wachspress: 0.00038551904, 3.9996147
          Optimum: 0.0027647161, 0.55771640
  m = 3   Peaceman-Rachford: 0.0018004230, 0.039267385, 0.85642517
          Wachspress: 0.00038551904, 0.039267386, 3.9996147
          Optimum: 0.0011669595, 0.039267903, 1.3213529
  m = 4   Peaceman-Rachford: 0.0012247357, 0.012360483, 0.12474654, 1.2589880
          Wachspress: 0.00038551904, 0.0084082046, 0.18338369, 3.9996147
          Optimum: 0.00077925470, 0.010397443, 0.14829872, 1.9787209
  m = 5   Wachspress: 0.00038551904, 0.0038908000, 0.039267385, 0.39630090, 3.9996147
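A sketch for regenerating the parameter sets above from the bounds a = 4 sin²(πh/2) and b = 4 cos²(πh/2) of Section 9. The geometric spacings below are assumptions patterned on the formulas of Part II (the text's exact normalizations may differ slightly, so the match to the table is only to three or four decimals):

```python
import math

def spectral_bounds(h):
    # Eigenvalue bounds for the model Dirichlet problem on the unit square.
    return 4 * math.sin(math.pi * h / 2) ** 2, 4 * math.cos(math.pi * h / 2) ** 2

def peaceman_rachford_params(a, b, m):
    # Geometric spacing strictly inside [a, b]: p_i = b*(a/b)^((2i-1)/(2m)).
    return sorted(b * (a / b) ** ((2 * i - 1) / (2 * m)) for i in range(1, m + 1))

def wachspress_params(a, b, m):
    # Geometric spacing including both endpoints (requires m >= 2):
    # p_i = a*(b/a)^((i-1)/(m-1)), i = 1, ..., m.
    return [a * (b / a) ** ((i - 1) / (m - 1)) for i in range(1, m + 1)]

a, b = spectral_bounds(1 / 5)
print([round(p, 7) for p in peaceman_rachford_params(a, b, 2)])  # ~ [0.6701..., 2.0623...]
print([round(p, 7) for p in wachspress_params(a, b, 4)])         # ~ [0.3819660, ..., 3.6180340]
```

For h = 1/5 these reproduce the Table I entries 0.67009548, 2.0623419 and 0.38196601, 0.80817863, 1.7099760, 3.6180340 to three or more decimal places.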
TABLE II. PREDICTED AND OBSERVED NUMBERS OF ITERATIONS FOR THE PEACEMAN-RACHFORD METHOD AND THE SUCCESSIVE OVERRELAXATION METHOD (a)

[For Regions I and II and for h^-1 = 5, 10, 20, 40, 80, 120, and 160, the table lists the numbers of iterations N_t^α, N_v^α, and N_o^α (α = P for the Peaceman-Rachford parameters, W for the Wachspress parameters) for m = 1, ..., 5, together with the observed SOR iteration counts, the corresponding optimum relaxation factors ω, and the predicted SOR counts N_t.]

(a) For explanation of N_β^α; α = P, W, B; β = t, c, o, v; and N_t, see Section 24.
TABLE III. OBSERVED NUMBERS OF ITERATIONS FOR THE PEACEMAN-RACHFORD METHOD AND THE SUCCESSIVE OVERRELAXATION METHOD (a)

[For Regions III, IV, and V and for h^-1 = 5, 10, 20, 40, 80, 120, and 160, the table lists the observed numbers of iterations for the Peaceman-Rachford method with m = 1, ..., 5 parameters, together with the observed SOR iteration counts, the corresponding optimum relaxation factors ω, and the predicted SOR counts N_t.]

(a) For explanation of N_β^α; α = P, W, B; β = t, c, o, v; and N_t, see Section 24.
TABLE IV. PREDICTED AND OBSERVED NUMBERS OF ITERATIONS USING PEACEMAN-RACHFORD AND WACHSPRESS PARAMETERS

[For Regions I and II, the table lists the predicted, virtual, and observed numbers of iterations N_t^P, N_v^P, N_o^P and N_t^W, N_v^W, N_o^W for m = 1, ..., 10.]

Mesh size: h = 1/40. (For explanation of symbols see Section 24.)
The predicted number of iterations N_t^α was determined by N_t^α = log 10^6 / R̄_m, where R̄_m is defined by (16.7) and equals, for the case of the square,

    R̄_m = −(2/m) log φ_m(a, b, p).    (24.1)

In each case the function φ_m(a, b, p) was evaluated numerically to at least four decimal places of accuracy on the computer. The predicted numbers of iterations N_c^P and N_c^W were determined by N_c^P = log 10^6 / R̄_m^(P) and N_c^W = log 10^6 / R̄_m^(W), where R̄_m^(P) and R̄_m^(W) are lower bounds for R̄_m for the Peaceman-Rachford parameters and for the Wachspress parameters, respectively, which are given by (16.14) and (16.36).

For the successive overrelaxation method the observed numbers of iterations are given in Tables II and III in the rows labeled "SOR" and in the columns headed "N_o^P". The corresponding values of ω are also indicated. The predicted number of iterations N_t was determined by solving the equation^29

    4 N_t (ω − 1)^(N_t − 1) = 10^−6.    (24.2)

Table IV gives predicted, virtual, and observed numbers of iterations

^29 Young [25b] showed that the number of iterations needed to reduce the l_2 norm of an initial error vector by a factor α did not exceed n, where 5n(ω − 1)^(n−1) = α, provided ω is the optimum relaxation factor. Varga [fg] showed that this equation could be replaced by νn(ω − 1)^(n−1) = α for some constant ν, which can be shown to be less than 4.
using the Peaceman-Rachford parameters and the Wachspress parameters with up to 10 parameters for the Regions I and II and for h = 1/40. Figures 2-6 show graphs, with logarithmic scales, of the observed number of iterations versus h^-1 for the successive overrelaxation method and for
FIG. 2. (Region I: observed number of iterations versus h^-1, 5 ≤ h^-1 ≤ 160; logarithmic scales.)
the Peaceman-Rachford method with the Peaceman-Rachford parameters and with the Wachspress parameters. Reciprocal slopes of straight lines fitted to the data points in each case are given in Table V. These slopes are also given for the case of the optimum parameters, though the corresponding graphs are not included in Figs. 2-6.
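The SOR prediction (24.2) quoted above can be solved for N_t numerically. The sketch below uses the reconstructed form 4N(ω − 1)^(N−1) = 10^−6 (the leading constant is the bound from footnote 29 and is an assumption of this sketch):

```python
import math

def predicted_sor_iterations(omega, eps=1e-6, coeff=4.0):
    # Solve coeff * N * (omega - 1)**(N - 1) = eps for N. In log form,
    # f(N) = log(coeff*N) + (N - 1)*log(omega - 1) - log(eps) is eventually
    # decreasing (since 0 < omega - 1 < 1), so the root can be bracketed
    # by doubling and then bisected.
    def f(n):
        return math.log(coeff * n) + (n - 1) * math.log(omega - 1) - math.log(eps)
    lo, hi = 1.0, 10.0
    while f(hi) > 0:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# omega = 1.86 was the observed optimum relaxation factor for Region I, h = 1/40.
print(round(predicted_sor_iterations(1.86)))
```

For ω = 1.86 this yields N_t ≈ 134, within a couple of iterations of the tabulated 136 (the published ω is rounded to two decimals); for ω = 1.50 it yields the tabulated N_t = 28.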
25. Analysis of Results
For the case of the square, the numbers of iterations N_t^α for the Peaceman-Rachford method predicted by the theory of Part II agree closely with the observed values N_o^α. In fact, for m > 1 the values of N_t^α differ
FIG. 3. (Region II: observed number of iterations versus h^-1 for SOR and for the Peaceman-Rachford method with Peaceman-Rachford and Wachspress parameters, m = 1, ..., 5; logarithmic scales.)
from the corresponding N_o^α by at most five iterations and usually by much less. The agreement is especially good in view of the fact that changing the order in which the ρ_i are used sometimes changes the number of iterations by two or three.
The close agreement is to be expected since the actual rate of convergence R_m is given by (16.6), and since, as noted in Section 15, the inequality (15.5) becomes an equality
FIG. 4. (Observed number of iterations versus h^-1 for SOR and for the Peaceman-Rachford method, m = 1, ..., 5; logarithmic scales.)
in the case of the Helmholtz equation. Thus, the only difference between R̄_m as given by (16.7) and the actual rate of convergence R_m lies in the approximation of
FIG. 5. (Observed number of iterations versus h^-1 for SOR and for the Peaceman-Rachford method with Peaceman-Rachford and Wachspress parameters, m = 1, ..., 5; logarithmic scales.)
the maximum of ∏_{i=1}^m |(γ − p_i)/(γ + p_i)|, where γ is any eigenvalue of H or V, by φ_m(a, b, p), the corresponding maximum taken over the whole interval a ≤ γ ≤ b.
But for small h, by (9.2), there will be a large number of eigenvalues of H and V distributed over the interval a ≤ γ ≤ b; hence the error in the above approximation will be slight. Since for each parameter choice φ_m(a, b, p) was evaluated on the computer to at least four decimal places of accuracy, the discrepancies between the N_t^α and the N_v^α are primarily due to roundoff. It is expected that closer
FIG. 6. (Observed number of iterations versus h^-1 for SOR and for the Peaceman-Rachford method with Peaceman-Rachford and Wachspress parameters, m = 1, ..., 5; logarithmic scales.)
agreement would result if a tighter convergence criterion were used, thus minimizing the influence of the particular initial error vector which was present. In support of this, we note that the virtual numbers of iterations N_v^α agree much more closely with the predicted numbers of iterations N_t^α than do the observed numbers of iterations N_o^α, especially in the case of the Peaceman-Rachford parameters. Thus the actual rate of convergence as measured by the N_v^α agrees closely with the predicted rate of convergence as measured by N_t^α. For regions other than the square, the predictions of the numbers of iterations based on the theory of Part II are no longer valid. The observed
TABLE V. RECIPROCALS OF SLOPES OF LINES REPRESENTING LOG N_o^α VERSUS LOG h^-1

[For Regions I-V and parameter choices P, W, and B, the table lists the reciprocal slopes for m = 1, ..., 5, together with the reciprocal slopes times m.]

m = number of parameters; P: Peaceman-Rachford parameters; W: Wachspress parameters; B: optimum parameters. For the Wachspress parameters, the reciprocal slopes were multiplied by (m − 1).
number of iterations is seldom more than, and never more than twice, that for the square. In the case of the successive overrelaxation method, it can be proved^30 that the rate of convergence for a given region is at most that for any region which includes the given region. For the Peaceman-Rachford method with one parameter, it was shown in Sections 10 and 11 that if a region can be embedded in a rectangle, then the rate of convergence of the Peaceman-Rachford method for the rectangle using the best value of ρ is at most that using the same ρ for the given region. Since for m = 1 the optimum ρ was used for the square for each mesh size, this result is applicable here, and is confirmed by the numerical results. For m > 1 the following conjecture is offered:

For a region which can be embedded in a rectangle, the rate of convergence of the Peaceman-Rachford method for a given set of iteration parameters is at least θ times that for the rectangle, where θ is a constant such that 1 ≥ θ ≥ 1/2.

As a consequence of the agreement between the numbers of iterations as predicted by the theory of Part II and the actual number, it follows that the Peaceman-Rachford method is extremely effective. Thus, from Theorem 18.1 it follows that for fixed m, the number of iterations is O(h^(−1/m)),^31 and that if a good value of m is used,^32 then the number of iterations is O(|log h|). Consequently, one would expect that, asymptotically for small h, log N_o^α would be a linear function of log h^-1 with slope 1/m for the Peaceman-Rachford parameters and with slope 1/(m − 1) for the Wachspress parameters. Inspection of the graphs of Figs. 2-6 reveals that the observed data points do indeed lie roughly on straight lines. Moreover, as indicated by Table V, the slopes of the lines are close to the predicted values for small m, especially for the square. For other regions, where the theory of Part II does not apply, and for larger m, the agreement is not as close.

The discrepancy for the larger m may be explained by the fact that the quantity (a/b)^(1/2m) = (πh/2)^(1/m), which is assumed to be small in the derivation of the asymptotic formulas (18.6), is actually rather large for m = 4 even for h as small as 1/160. In this case the value is 0.315. Presumably, the actual slopes would be closer to the predicted slopes if much smaller values of h were used.

Although the values of h used were not small enough to test whether N_o^α is O(|log h|) if a suitable value of m is used for each h, nevertheless, there

^30 See Young [25b].
^31 For the Wachspress parameters this would be O(h^(−1/(m−1))).
^32 Determined by (16.20) and (16.38) for the Peaceman-Rachford parameters and for the Wachspress parameters, respectively.
seems no reason to doubt its validity. In any case, both N_o^α and N_t^α increased very slowly as h decreased; for example, even with h = 1/160, only twenty-two iterations were required using five Wachspress parameters. The main increase in computer time as h decreased was simply due to the presence of more mesh points rather than to the increase in the number of iterations.

In comparing the effectiveness of the Peaceman-Rachford method with that of the successive overrelaxation method, one must remember that twice as much machine time was required per iteration with the Peaceman-Rachford method as with the successive overrelaxation method. For the case m = 1, it was shown in Part III that the spectral radii of the two methods are identical for the square provided that the optimum iteration parameters are used in each case. However, since the Jordan normal form of the matrix corresponding to the successive overrelaxation method is not diagonal, the number of iterations is somewhat larger. For this reason, although the spectral radius of the method using the optimum relaxation factor ω_b is (ω_b − 1), the predicted number of iterations is determined by (24.2). This yields a larger value than if N_t had been determined by the usual formula (ω_b − 1)^(N_t) = 10^−6, which would be valid if the Jordan normal form of the corresponding matrix were diagonal.

While the number of iterations for the successive overrelaxation method is slightly larger than for the Peaceman-Rachford method for m = 1, nevertheless, because only half as much time is required per iteration, the successive overrelaxation method is definitely superior to the Peaceman-Rachford method with one parameter. However, since the number of iterations with the successive overrelaxation method is asymptotically proportional to (2πh)^−1 as compared to (m/4)(2/πh)^(1/m) for the Peaceman-Rachford method with the Peaceman-Rachford parameters, the superiority of the latter method for m > 1 is evident.

This superiority is amply reflected in Tables II and III and in Figs. 2-6, not only for the square but for the other regions as well. Estimating the number of iterations for the successive overrelaxation method as five hundred seventy for the case h = 1/160 and comparing with the twenty-two iterations required using the Peaceman-Rachford method with five Wachspress parameters, the latter method is faster by a factor of nearly thirteen to one.

We now consider the choice of iteration parameters. Theorems 16.2 and 16.5 indicate that the Wachspress parameters are superior to the Peaceman-Rachford parameters provided one chooses good values of m by (16.20) and (16.38), respectively. The results of Tables II and III confirm this superiority for the case of the square. There seems little to choose between the two parameter choices for the other regions. The optimum parameters are not appreciably better than the Wachspress parameters. Because of the
theoretical superiority of the Wachspress parameters over the Peaceman-Rachford parameters, and because the Wachspress parameters are easy to determine as compared with the optimum parameters, their use is recommended.

Concerning the choice of the number of iteration parameters, m, the values predicted for the square by (16.20) for the Peaceman-Rachford parameters can be estimated from Tables II and IV by observing where N_c^P is smallest. This follows since (16.20) was derived by maximizing R̄_m^(P) and since N_c^P = log 10^6 / R̄_m^(P). In the case of h = 1/40 the smallest value of N_c^P occurs for m = 3 or 4, whereas the value of m from (16.20) is 4. By (16.38) the predicted optimum value of m for the Wachspress parameters would be 5. The fact that N_c^W is smaller for m = 9 than for m = 5 is a reflection of the inexactness of the approximation used in Section 16 to derive (16.38).^33

It is to be noted that for h = 1/80, 1/120, and 1/160, the values of m determined by (16.20) would be 5, 5, and 6, respectively, and those determined by (16.38) would be 6, 6, and 7, respectively. Such values of m were not used, but if they had been, presumably fewer iterations would have been required.

Returning now to the case h = 1/40, based on the observed values N_o^P and N_o^W, it appears that it would have been better to use larger values of m, say between 1 1/2 and 2 times those indicated by (16.20) and (16.38). In support of this, we note that N_t^P and N_t^W appear to decrease for all m up to 10. Moreover, even with N_c^P and N_c^W there is only a small increase for values of m larger than those given by (16.20) and (16.38), respectively. Consequently, it seems safer to use a value of m which is slightly too large than to use one which is too small.

26. Conclusions
The following conclusions and recommendations summarize the results of the preceding experiments; they seem reliable at least for the Laplace equation with given boundary values (the Dirichlet problem) and a square mesh.

(1) The rate of convergence of the Peaceman-Rachford method is accurately predicted by the theory of Part II.

(2) For each of the other regions which were embedded in the square, the number of iterations required was usually less than, and never

^33 On the other hand, we note that N_c^W agrees more closely with N_t^W than N_c^P does with N_t^P. This is as expected, because the bound (B.15) of φ_m(a, b, p) for the Wachspress parameters took into account two factors of (16.3), while the corresponding bound (B.14) for the Peaceman-Rachford parameters uses only one factor.
more than twice, that required for the square. It is conjectured that this is true in general for any region embedded in any rectangle.

(3) The Peaceman-Rachford method is an extremely effective method and, for small h, is much superior to the successive overrelaxation method. In fact, by suitable choice of parameters, the number of iterations increases only as |log h|; hence the increase in computer time involved in passing to a smaller mesh size is almost entirely due to the increase in the number of points, and only very slightly due to an increase in the number of iterations.

(4) The Wachspress parameters are recommended in preference to the Peaceman-Rachford and to the optimum parameters. Unless other information is available, it is recommended that the number of parameters used be chosen between 1 1/2 and 2 times that obtained from (16.38).
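The asymptotic growth laws cited in the comparison with SOR — (2πh)^−1 for SOR against (m/4)(2/πh)^(1/m) for the Peaceman-Rachford parameters — can be evaluated directly. The values below are growth factors only; the common |log ε| factor cancels in the comparison:

```python
import math

def sor_growth(h):
    # SOR iteration count grows like (2*pi*h)^-1.
    return 1 / (2 * math.pi * h)

def adi_growth(h, m):
    # Peaceman-Rachford with m P-R parameters: growth like (m/4)*(2/(pi*h))^(1/m).
    return (m / 4) * (2 / (math.pi * h)) ** (1 / m)

h = 1 / 160
print(round(sor_growth(h), 2))     # ~ 25.46
print(round(adi_growth(h, 5), 2))  # ~ 3.15
```

Even after doubling the ADI figure to account for the factor of two in work per sweep, the Peaceman-Rachford method retains roughly a fourfold advantage at this mesh size.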
27. Experiments Comparing SOR Variants with ADI Variants
The following is a brief summary of experimental results^34 obtained on the IBM-704 and 7090 at the Gulf Research Laboratory (Harmarville, Pa.), comparing the latest SOR variant with the Peaceman-Rachford method. The experimental results of the previous sections specifically compared the point SOR method with the Peaceman-Rachford method, and in no case (Table III) did the point SOR method with optimum ω require fewer iterations than the Peaceman-Rachford method. The situation is however changed when the newer variant of SOR, using the two-line iterative method (Section 21) coupled with the cyclic Chebyshev semi-iterative method (Section 21), is similarly compared.

Using the same regions for the Dirichlet problem, the same starting values of unity, and the same method for terminating iterations as described in Section 23, the total number of iterations for each method was normalized by the relative amount of arithmetic required by each method per mesh point. Specifically, the arithmetic requirements on the IBM-704 for point SOR, the 2-line cyclic Chebyshev method, and the Peaceman-Rachford method were in the proportions 1 : 1.26 : 2.05, and the numbers of observed iterations were multiplied by these constants and called normalized iterations; these normalized iterations are then directly proportional to actual machine time.

^34 By Harvey S. Price and Richard S. Varga.
The curves in Figs. 7 and 8 illustrate the basic results of this experimentation. For each mesh spacing, each process was optimized with respect to acceleration parameters. This means that in the case of the 2-line cyclic Chebyshev method, estimates of the spectral radius of the Jacobi matrix were varied to find fastest convergence. For the Peaceman-Rachford method, the number of parameters to be used cyclically was similarly varied. From these curves, we see that there is a substantial decrease in iterative time in passing from the point SOR to the 2-line cyclic Chebyshev semi-iterative method. Second, in each of these cases (and in all other cases actually considered) we see that there is a critical value h* of the mesh spacing such that if h > h*, it is better to use the 2-line cyclic Chebyshev iterative method, but for all h < h*, the optimized Peaceman-Rachford
FIG. 7. (Region I: normalized iterations versus h^-1.)
method is superior in terms of actual machine time. Again, the curves of Figs. 7 and 8 indicate that the Peaceman-Rachford method for small h is vastly superior to any of the SOR variants. These figures also show that there is a great variation in this critical value h* from problem to problem.

Also in this experimental program at Gulf were problems of the general form

    −(P_1 u_x(x, y))_x − (P_2 u_y(x, y))_y + σu(x, y) = f(x, y),  (x, y) ∈ R,    (27.1)

where R is a bounded connected set with boundary Γ, subject to boundary conditions of the form (27.2).
FIG. 8. (Region II: normalized iterations versus h^-1.)
In particular, cases where P_1 and P_2 were discontinuous (typical of problems occurring in reactor and petroleum engineering) were similarly considered. For one such problem, it was possible to select two parameters ρ_1 > ρ_2 > 0 such that the spectral radius of the associated Peaceman-Rachford method was Λ(T_{ρ_2} T_{ρ_1}) = 13.48. This divergence is complementary to the known convergence of Theorem 5.1 for a fixed value of ρ, and should serve to warn the unsuspecting reader of possible divergence in his use of ADI methods.
APPENDIX A
THE MINIMAX PROBLEM FOR ONE PARAMETER

1. Peaceman-Rachford Method

In this section, we again examine the minimax function F(a, b; α, β) defined by (7.8), which arose in connection with the Peaceman-Rachford method. For 0 < a ≤ b, 0 < α ≤ β, F(a, b; α, β) is defined as a minimax function by the formulas

    φ(a, b, ρ) = max over a ≤ μ ≤ b of |(μ − ρ)/(μ + ρ)|    (A.1)

and

    F(a, b; α, β) = min over ρ of φ(a, b, ρ) φ(α, β, ρ).    (A.2)

Clearly φ < 1 for any ρ > 0, and φ > 1 for any ρ < 0; moreover, φ tends continuously to 1 as ρ → ∞. Hence the minimum in (A.2) is assumed for at least one finite positive "optimum rho" ρ*. Again, for fixed ρ ≥ 0, the continuous function |(μ − ρ)/(μ + ρ)| is decreasing for μ < ρ and increasing for μ > ρ; moreover, it has its minimum when μ = ρ. Hence its maximum value on a ≤ μ ≤ b occurs at μ = a or μ = b. Comparing the values there, we obtain

    φ(a, b, ρ) = (b − ρ)/(b + ρ)  if 0 ≤ ρ ≤ √(ab),
    φ(a, b, ρ) = (ρ − a)/(ρ + a)  if ρ ≥ √(ab).    (A.3)

This completes the determination of φ; it is analytic for all nonnegative ρ ≠ √(ab), and continuous everywhere. It is also easy to determine the unique "optimum rho" ρ* which minimizes φ(a, b, ρ). Since ln φ is an increasing function of φ, ρ* is that rho which minimizes ln φ. By (A.3),

    d(ln φ)/dρ = −2b/(b² − ρ²)  if 0 ≤ ρ < √(ab),
    d(ln φ)/dρ = 2a/(ρ² − a²)  if ρ > √(ab).    (A.4)

Hence the optimum rho is

    ρ* = √(ab),    (A.5)

and

    φ(a, b, ρ*) = tanh [(ln c)/4],  where c = b/a.    (A.6)

Since 0 ≤ φ < 1 for 0 < ρ < +∞, it follows that F(a, b; α, β) ≤ tanh [(ln c*)/4], where c* = min (b/a, β/α).
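A quick numerical check of (A.3), (A.5), and (A.6), with illustrative bounds:

```python
import math

def phi(a, b, rho):
    # phi(a, b, rho) from (A.3): the maximum over [a, b] is attained
    # at an endpoint, so two candidates suffice.
    return max(abs((b - rho) / (b + rho)), abs((rho - a) / (rho + a)))

a, b = 0.5, 8.0
rho_star = math.sqrt(a * b)       # optimum rho, (A.5)
best = phi(a, b, rho_star)

# (A.6): the minimum value equals tanh((ln c)/4) with c = b/a.
print(abs(best - math.tanh(math.log(b / a) / 4)) < 1e-12)   # True

# A brute-force scan over rho confirms that no value does better.
scan = min(phi(a, b, 0.01 * k) for k in range(1, 2001))
print(scan >= best - 1e-9)                                   # True
```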
It is also evident that

    F(a, b; α, β) ≥ [min over ρ of φ(a, b, ρ)] · [min over ρ of φ(α, β, ρ)],    (A.7)

equality holding if and only if φ(a, b, ρ) has the same "optimum rho" ρ* as φ(α, β, ρ). Referring back to (A.5), we obtain the following result.

LEMMA A.1. F satisfies the inequality

    F(a, b; α, β) ≥ tanh [(ln (b/a))/4] · tanh [(ln (β/α))/4],    (A.8)

equality holding if and only if ab = αβ.

COROLLARY A.1. If a = α and b = β, we have F(a, b; a, b) = [tanh ((ln c)/4)]², where c = b/a.

We now try to determine F generally. Since ln F is an increasing function of F, we have ln F = min over ρ of [ln φ(a, b, ρ) + ln φ(α, β, ρ)]. Moreover, by the remark after (A.3), the sum in brackets is continuous everywhere, and analytic for ρ ≠ √(ab), √(αβ). Finally, differentiating (A.4) again, we obtain d²(ln φ)/dρ² < 0 for all ρ ≠ √(ab). A similar result holds for φ(α, β, ρ), and so we get

    d²[ln φ(a, b, ρ) + ln φ(α, β, ρ)]/dρ² < 0,    (A.9)

for ρ ≠ √(ab), √(αβ). Since a minimum cannot occur where the second derivative is negative, we conclude:

LEMMA A.2. In all cases, ρ* = √(ab) or ρ* = √(αβ).

Substituting back into (A.2) and (A.3), we obtain the following definitive result.

THEOREM A.1. If ab ≤ αβ, then F(a, b; α, β) = F(α, β; a, b) is the smaller of the two numbers

    φ(a, b, √(ab)) φ(α, β, √(ab))  and  φ(a, b, √(αβ)) φ(α, β, √(αβ)).    (A.10)

The first option occurs if ρ* = √(ab); the second if ρ* = √(αβ). The following condition, which we mention without proof, states which value of ρ is optimal.

THEOREM A.2. Let ab ≤ αβ. If a + b ≥ α + β, then ρ* = √(ab). If a ≤ α and b ≤ β, then ρ* = √(ab) if αβ ≥ ab, and ρ* = √(αβ) if αβ ≤ ab.
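Theorem A.1 (and Lemma A.2 behind it) can be checked by brute force; the bounds below are illustrative, chosen with ab ≤ αβ:

```python
import math

def phi(a, b, rho):
    # phi from (A.3): endpoint maximum of |(mu - rho)/(mu + rho)| on [a, b].
    return max(abs((b - rho) / (b + rho)), abs((rho - a) / (rho + a)))

a, b = 0.5, 8.0         # ab = 4
alpha, beta = 1.0, 9.0  # alpha*beta = 9, so ab <= alpha*beta

# Brute-force minimization of the product phi(a,b,rho)*phi(alpha,beta,rho).
F = min(phi(a, b, 0.001 * k) * phi(alpha, beta, 0.001 * k)
        for k in range(1, 20000))

# The two candidate values of Theorem A.1 (rho* at sqrt(ab) or sqrt(alpha*beta)).
option1 = phi(a, b, math.sqrt(a * b)) * phi(alpha, beta, math.sqrt(a * b))
option2 = phi(a, b, math.sqrt(alpha * beta)) * phi(alpha, beta, math.sqrt(alpha * beta))
print(abs(F - min(option1, option2)) < 1e-6)   # True
```

Here the second option wins, i.e. ρ* = √(αβ), consistent with Lemma A.2.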
COROLLARY A.2. If a + b = α + β and ab ≤ αβ, then ρ* = √(ab).

Caution. In Theorem A.2, the value of ρ* is not necessarily unique. For example, if aβ = αb, the two values ρ* = √(ab) and ρ* = √(αβ) are both optimal, though they are in general distinct.
2. Douglas-Rachford Method

We now determine the minimax function F_D(a, b; α, β) for the Douglas-Rachford method, defined on the domain 0 < a ≤ b, 0 < α ≤ β by the formula

    F_D(a, b; α, β) = min over ρ > 0 of max over a ≤ μ ≤ b, α ≤ ν ≤ β of (μν + ρ²)/[(μ + ρ)(ν + ρ)].    (A.11)

Clearly, 0 < (μν + ρ²)/(μ + ρ)(ν + ρ) < 1 if ρ > 0 for μ, ν as specified, and it tends to 1 continuously as ρ → 0, ∞. Hence the minimum in (A.11) is assumed for some finite positive optimum ρ = ρ*. One easily verifies the algebraic identity

    (μν + ρ²)/[(μ + ρ)(ν + ρ)] = 1/2 + (μ − ρ)(ν − ρ)/[2(μ + ρ)(ν + ρ)].    (A.12)

On the other hand, from (A.1)-(A.2), using the remarks after (A.2), one can derive the following alternative formula for F:

    F(a, b; α, β) = min over ρ of max over a ≤ μ ≤ b, α ≤ ν ≤ β of |(μ − ρ)(ν − ρ)|/[(μ + ρ)(ν + ρ)].    (A.13)

This will be compared with the following consequence of formulas (A.11) and (A.12):

    F_D(a, b; α, β) = 1/2 + (1/2) min over ρ > 0 of φ_D(a, b; α, β; ρ),    (A.14)

where

    φ_D(a, b; α, β; ρ) = max over a ≤ μ ≤ b, α ≤ ν ≤ β of (μ − ρ)(ν − ρ)/[(μ + ρ)(ν + ρ)].    (A.15)

We can compute φ_D by (A.3). If ab ≤ αβ and α ≤ b, then

    φ_D = (b − ρ)(β − ρ)/[(b + ρ)(β + ρ)]  if 0 ≤ ρ ≤ √(ab),
    φ_D = (ρ − a)(ρ − α)/[(ρ + a)(ρ + α)]  if √(αβ) ≤ ρ.    (A.16)

If α > b, then φ_D is negative for b < ρ < α, and so F_D < 1/2; this case is atypical for elliptic difference equations.
When a = α and b = β (for example, if H = V, as for the Helmholtz problem in a square), φ_D(a, b; a, b; ρ) = [φ(a, b, ρ)]² by (A.16) and (A.3). Hence, in this special case, F_D = (1 + F)/2. In general, one merely has the inequality

    F_D(a, b; α, β) ≤ [1 + F(a, b; α, β)]/2,    (A.17)

which is evident if one compares (A.13) with (A.14)-(A.15). A complete discussion involves an elaborate analysis of special cases, and so we merely state a partial result without proof.

THEOREM A.3. If a ≤ α ≤ b ≤ β, then the optimum rho ρ_D* for the Douglas-Rachford method is √(αb), and the spectral radius of the error reduction matrix is

    Λ_D(ρ_D*) = 2√(αb)/(α + b + 2√(αb)).    (A.18)

If ab ≤ αβ ≤ bβ, then

    ρ_D* = {[(a + α)bβ − (b + β)aα]/[(b + β) − (a + α)]}^(1/2),    (A.19)

and the spectral radius of the error reduction matrix is the corresponding value Λ_D(ρ_D*).    (A.20)

For the Helmholtz equation in a rectangle, treated in Section 9, a + b = α + β and so β ≤ b. Hence (A.19)-(A.20) hold, and so F_D > F except in trivial cases.
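The identity (A.12), and the special case F_D = (1 + F)/2 for identical spectra, can be verified by brute force over grids (illustrative bounds and step sizes):

```python
import itertools

def dr_factor(mu, nu, rho):
    # Douglas-Rachford factor (mu*nu + rho^2)/((mu + rho)(nu + rho)).
    return (mu * nu + rho**2) / ((mu + rho) * (nu + rho))

def pr_factor(mu, nu, rho):
    # Peaceman-Rachford factor |(mu - rho)(nu - rho)|/((mu + rho)(nu + rho)).
    return abs((mu - rho) * (nu - rho)) / ((mu + rho) * (nu + rho))

a, b = 0.5, 8.0

# Identity (A.12): dr = 1/2 + signed_pr/2 at every point checked.
for mu, nu, rho in itertools.product([a, 1.0, b], [a, 2.0, b], [0.7, 2.0]):
    signed = (mu - rho) * (nu - rho) / ((mu + rho) * (nu + rho))
    assert abs(dr_factor(mu, nu, rho) - (0.5 + signed / 2)) < 1e-12

# With identical spectra (alpha, beta) = (a, b): F_D = (1 + F)/2.
mus = [a + (b - a) * k / 40 for k in range(41)]
rhos = [0.05 * k for k in range(1, 200)]
F = min(max(pr_factor(mu, nu, r) for mu in mus for nu in mus) for r in rhos)
FD = min(max(dr_factor(mu, nu, r) for mu in mus for nu in mus) for r in rhos)
print(abs(FD - (1 + F) / 2) < 1e-9)   # True
```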
3. Parameter Translation
As in Section 7, we define

    ψ(a, b; α, β; ρ, ρ') = max over a ≤ μ ≤ b, α ≤ ν ≤ β of |(μ − ρ')(ν − ρ)|/[(μ + ρ)(ν + ρ')],

and we define G as the minimax function

    G(a, b; α, β) = min over ρ, ρ' of ψ(a, b; α, β; ρ, ρ'),    (A.21)

all for 0 < a ≤ b and 0 < α ≤ β. Since the functions whose extrema are sought are continuous, the existence of ψ for ρ > −a and ρ' > −α, and hence that of G, follows by simple compactness arguments. Any pair ρ*, ρ'* minimizing ψ will be called optimal, for the reason stated in Section 7.

The function ψ is closely related to the function φ. Indeed, setting Δ = (ρ − ρ')/2, μ₁ = μ + Δ, ν₁ = ν − Δ, and τ = (ρ + ρ')/2, clearly

    (μ − ρ')(ν − ρ)/[(μ + ρ)(ν + ρ')] = (μ₁ − τ)(ν₁ − τ)/[(μ₁ + τ)(ν₁ + τ)].    (A.22)
Substituting into (A.21) and maximizing, we get ψ(a, b; α, β; ρ, ρ') = ψ(a + Δ, b + Δ; α − Δ, β − Δ; τ, τ). Now taking the minimax, we get

    G(a, b; α, β) = min over Δ of F(a + Δ, b + Δ; α − Δ, β − Δ).    (A.23)

We will now calculate this expression. One easily verifies that (a + Δ)(b + Δ) = (α − Δ)(β − Δ) if and only if Δ = (αβ − ab)/(a + b + α + β). With this choice of Δ, both options in (A.10) assume the same value. Hence we have

THEOREM A.4. For Δ = (αβ − ab)/(a + b + α + β),

    G(a, b; α, β) ≤ F(a + Δ, b + Δ; α − Δ, β − Δ).    (A.24)

It is attractive to speculate that the preceding inequality can be reversed, so that one optimizes the iteration parameters by a translation making ab = αβ.
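The translation of Theorem A.4 can be checked in a couple of lines (illustrative bounds):

```python
# Theorem A.4's shift Delta equalizes the products of the translated
# spectral bounds: (a + D)(b + D) = (alpha - D)(beta - D).
a, b = 0.5, 8.0         # eigenvalue bounds for H
alpha, beta = 1.0, 9.0  # eigenvalue bounds for V

delta = (alpha * beta - a * b) / (a + b + alpha + beta)
lhs = (a + delta) * (b + delta)
rhs = (alpha - delta) * (beta - delta)
print(abs(lhs - rhs) < 1e-12)   # True
```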
APPENDIX B: THE MINIMAX PROBLEM FOR m > 1 PARAMETERS

1. Optimum Parameters

For any 0 < a ≤ b and any positive integer m, we now define the functions

φ_m(a, b; ρ) = max_{a ≤ μ ≤ b} |∏_{i=1}^{m} (μ − ρ_i)/(μ + ρ_i)|   (B.1)

and

F_m(a, b) = min_ρ φ_m(a, b; ρ),  ρ = (ρ_1, ..., ρ_m),   (B.2)

which generalize the definitions of φ(a, b; ρ) and F(a, b; α, β) in formulas (A.1) and (A.2) of Appendix A. It is evident that φ_m is a symmetric function of the ρ_i; that is, it is invariant under any permutation of the subscripts i = 1, ..., m. Hence, without loss of generality, we can assume that the ρ_i are arranged in ascending order, so that

ρ_1 ≤ ρ_2 ≤ ... ≤ ρ_m.   (B.3)

This assumption will be made below. Because the factors in (B.1) are homogeneous of degree zero, it is also evident that

φ_m(a, b; ρ) = φ_m(ca, cb; cρ)  and  F_m(a, b) = F_m(ca, cb),   (B.4)

for any c > 0. That is, the value of F_m(a, b) depends only on the ratio b/a and the positive integer m.

An optimum m-vector ρ* = ρ*(a, b; m) for given a, b, and m is defined as a real m-vector which minimizes φ_m(a, b; ρ), that is, such that φ_m(a, b; ρ*) = F_m(a, b). The existence and continuity of φ_m for fixed a, b is evident, since the product on the right of (B.1) is continuous and the domain a ≤ μ ≤ b is compact. The existence of ρ* then follows since φ_m(a, b; ρ) is decreased if a negative ρ_i is replaced by −ρ_i, since φ_m < 1 if all ρ_i are positive, and since φ_m → 1 as all ρ_i → +∞; this makes the domain where φ_m(a, b; ρ) ≤ 1 − ε compact, and nonvoid for sufficiently small ε > 0. The uniqueness of ρ*, a more difficult question, is also known. It expresses the fact that the family of rational functions expressible as products of the form ∏ [(μ − ρ_i)/(μ + ρ_i)] has the following basic property.

Chebyshev Property. For given 0 < a < b and m ≥ 1, there is a unique optimum m-vector ρ* with a < ρ_1* < ρ_2* < ... < ρ_m* < b, such that F_m(a, b) = φ_m(a, b; ρ*). This vector is determined by the property that the product ∏_{i=1}^{m} (μ − ρ_i)/(μ + ρ_i) in (B.1) assumes its maximum absolute value F_m(a, b), with alternating signs, in exactly m + 1 points τ_i, with

a = τ_0 < ρ_1* < τ_1 < ... < τ_{m−1} < ρ_m* < τ_m = b.
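For m = 1, where the optimum parameter is √(ab), the equioscillation can be seen concretely: the single factor attains its extreme magnitude F_1(a, b) at the two endpoints μ = a and μ = b with opposite signs. A small numerical illustration (the interval and grid size are arbitrary choices of ours):

```python
import math

# Equioscillation for m = 1: with rho = sqrt(a*b), the factor
# (mu - rho)/(mu + rho) reaches its maximum magnitude over [a, b]
# at mu = a and mu = b, with opposite signs.

a, b = 1.0, 100.0
rho = math.sqrt(a * b)                    # optimum single parameter

def factor(mu):
    return (mu - rho) / (mu + rho)

# scan the interval on a fine grid
grid = [a + (b - a) * i / 10000 for i in range(10001)]
max_abs = max(abs(factor(mu)) for mu in grid)

# F_1(a, b) = (sqrt(b) - sqrt(a)) / (sqrt(b) + sqrt(a))
F1 = (math.sqrt(b) - math.sqrt(a)) / (math.sqrt(b) + math.sqrt(a))
print(max_abs, F1)                        # both 9/11 here
print(factor(a), factor(b))               # equal magnitude, opposite signs
```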
For the proof of the fact that the functions in question have the stated property, the reader is referred to Wachspress [25]. It is closely related to the fact that the family of rational functions ∏ (μ − ρ_i)/(μ + ρ_i) is varisolvent³⁶ (unisolvent of variable degree). The following symmetry property is also very helpful:

φ_m(a, b; ρ_1, ..., ρ_m) = φ_m(a, b; ab/ρ_1, ..., ab/ρ_m).   (B.5)

This identity is a corollary of the fact that the correspondence μ → ab/μ maps the interval a ≤ μ ≤ b onto itself, combined with the evident algebraic identity

[(ab/μ) − (ab/ρ_i)]/[(ab/μ) + (ab/ρ_i)] = (ρ_i − μ)/(ρ_i + μ).

From (B.5) and the Chebyshev Property, it follows that

ρ_{m+1−i}* = ab/ρ_i*.   (B.6)

In particular, for odd m = 2n − 1, it implies ρ_n* = √(ab), as was proved for n = 1 by elementary methods in Appendix A. From this Symmetry Property and the Chebyshev Property, it follows that for even m = 2n, τ_n = √(ab). As shown by Wachspress [25], one can use the correspondence μ → (μ + ab/μ)/2 to establish the following sharper result.
THEOREM B.1. For any even positive integer m = 2n,

F_{2n}(a, b) = φ_{2n}(a, b; ρ_1*, ..., ρ_{2n}*) = F_n(√(ab), (a + b)/2) = φ_n(√(ab), (a + b)/2; σ_1*, ..., σ_n*).   (B.7)

The optimum 2n-vector ρ* is related to the optimum n-vector σ* by

ρ_{n−j+1}* = σ_j* − √[(σ_j*)² − ab],  ρ_{n+j}* = σ_j* + √[(σ_j*)² − ab],

so that

σ_j* = (ρ_{n+j}* + ab/ρ_{n+j}*)/2 = (ρ_{n−j+1}* + ab/ρ_{n−j+1}*)/2.   (B.8)

For n = 1, the optimum parameter for the interval a ≤ μ ≤ b is √(ab). Hence the case m = 2 can be explicitly calculated from (B.7) and (B.8) as follows.
COROLLARY B.1. For 0 < a ≤ b and m = 2, we have

√2 ρ_{1,2}* = [(a + b)√(ab)]^{1/2} ∓ [(a + b)√(ab) − 2ab]^{1/2},   (B.9)

and so

F_2(a, b) = {a + b − [2(a + b)√(ab)]^{1/2}}/{a + b + [2(a + b)√(ab)]^{1/2}}.   (B.10)

Making repeated use of (B.8), one can explicitly compute the optimum m-vector for m = 2^p, any power of two. One can also compute F_m using

³⁶ See Rice [16b].
(B.7). Specifically, one first computes the nested sequence of values, tending to the arithmetico-geometric mean of a and b:

a_0 = a,  b_0 = b,  a_{i+1} = √(a_i b_i),  b_{i+1} = (a_i + b_i)/2.   (B.11)

With these definitions, we obtain from (B.7)

F_{2^p}(a, b) = F_1(a_p, b_p) = [√(b_p) − √(a_p)]/[√(b_p) + √(a_p)].   (B.12)
FIG. 9. [Curves labeled m = 4 and m = 8, plotted against log_10 a; ordinate scale from 0 to 1.0.]
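The computation just described is easy to carry out. The sketch below is our own illustration (the function names and the sample interval are assumptions, not the authors' notation): it generates the nested intervals of (B.11), doubles the parameter count by repeated use of (B.8), and cross-checks the m = 2 case against the closed form of Corollary B.1.

```python
import math

# Optimum ADI parameters for m = 2**p, following (B.8) and (B.11).
# The innermost interval (a_p, b_p) carries the single optimum
# parameter sqrt(a_p * b_p); each application of (B.8) then doubles
# the parameter count.

def agm_sequence(a, b, p):
    """The nested values a_i, b_i of (B.11), tending to the AGM of a, b."""
    seq = [(a, b)]
    for _ in range(p):
        a, b = math.sqrt(a * b), (a + b) / 2.0
        seq.append((a, b))
    return seq

def optimum_parameters(a, b, p):
    """Optimum m-vector, m = 2**p, by repeated use of (B.8)."""
    seq = agm_sequence(a, b, p)
    ap, bp = seq[-1]
    rhos = [math.sqrt(ap * bp)]          # single parameter, innermost interval
    for ai, bi in reversed(seq[:-1]):
        ab = ai * bi
        new = []
        for s in rhos:                   # each sigma_j* yields two rho's
            r = math.sqrt(s * s - ab)
            new += [s - r, s + r]
        rhos = sorted(new)
    return rhos

def F(a, b, p):
    """F_m(a, b), m = 2**p: apply (B.7) p times, then the m = 1 formula."""
    ap, bp = agm_sequence(a, b, p)[-1]
    return (math.sqrt(bp) - math.sqrt(ap)) / (math.sqrt(bp) + math.sqrt(ap))

a, b = 1.0, 100.0
rhos2 = optimum_parameters(a, b, 1)      # the case m = 2

# closed form of Corollary B.1 for m = 2
s = math.sqrt((a + b) * math.sqrt(a * b) / 2.0)
closed = sorted([s - math.sqrt(s * s - a * b), s + math.sqrt(s * s - a * b)])
print(rhos2, closed, F(a, b, 1))         # the two m = 2 computations agree
```

In accordance with (B.6), each optimum pair produced this way multiplies to ab.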
When m is not a power of two, the optimum parameters can still be computed effectively using an algorithm of Remes [16a].³⁷ This method is described and applied to compute numerical values by de Boor and Rice [3a].

³⁷ See also Stiefel [16d].
2. Good Iteration Parameters

For many purposes, one can approximate F_m(a, b) sufficiently well by relatively simple explicit formulas for ρ. Such choices of parameters may be called good parameters, since they give rates of convergence for ADI methods not too far from the optimum. A very simple and quite good parameter vector for arbitrary a, b, and m was suggested by Peaceman and Rachford [16]. Their suggestion was to use

ρ_i^{(P)} = a k^{2i−1},  where k = (b/a)^{1/2m}.   (B.13)

THEOREM B.2. For the Peaceman-Rachford parameter vector defined by (B.13), we have the inequality

φ_m(a, b; ρ^{(P)}) ≤ (k − 1)/(k + 1) = [1 − (a/b)^{1/2m}]/[1 + (a/b)^{1/2m}].   (B.14)
Proof. Let μ be given. Since each factor in (B.1) is less than one in magnitude, it suffices to show that one factor is bounded by (k − 1)/(k + 1). But either μ is in (a, ak), or in (bk^{−1}, b), or in some interval (ρ_{i−1}^{(P)}, ρ_i^{(P)}). In the first two cases, since ρ_1^{(P)} = ak and ρ_m^{(P)} = bk^{−1},

0 ≤ (ρ_1^{(P)} − μ)/(ρ_1^{(P)} + μ) ≤ (k − 1)/(k + 1)  or  0 ≤ (μ − ρ_m^{(P)})/(μ + ρ_m^{(P)}) ≤ (k − 1)/(k + 1).

In the third case, since ρ_i^{(P)} = k²ρ_{i−1}^{(P)}, the point μ lies within a factor k of either ρ_{i−1}^{(P)} or ρ_i^{(P)}, and the corresponding factor in (B.1) is again at most (k − 1)/(k + 1) in magnitude. Comparing these inequalities, (B.14) follows immediately.

Wachspress [23, 24] has pointed out that a better theoretical³⁸ upper bound is given, for all m > 1, by the choice
ρ_i^{(W)} = a d^{2i−2},  where d = (b/a)^{1/(2m−2)}.   (B.15)

THEOREM B.3. The Wachspress parameter vector (B.15) satisfies

φ_m(a, b; ρ^{(W)}) ≤ [(d − 1)/(d + 1)]².   (B.16)

Proof. Each factor in (B.1) is less than one in magnitude, while for one i, μ is in the interval (ρ_i^{(W)}, ρ_{i+1}^{(W)}). For this i, since ρ_{i+1}^{(W)} = d²ρ_i^{(W)}, the product of the two corresponding factors in (B.1) satisfies

0 ≤ [(μ − ρ_i^{(W)})/(μ + ρ_i^{(W)})] [(ρ_{i+1}^{(W)} − μ)/(ρ_{i+1}^{(W)} + μ)] ≤ [(d − 1)/(d + 1)]²,

completing the proof.

A still better parameter vector is defined by de Boor and Rice [3a].

³⁸ Though the bound (B.16) is better than (B.14), the inequality φ_m(a, b; ρ^{(W)}) ≤ φ_m(a, b; ρ^{(P)}) does not hold in all cases (remark by J. Rice and C. de Boor).
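Both bounds are easy to check numerically. In the sketch below, φ_m is estimated by brute force on a fine grid; the function names, the sample interval, and the choice m = 4 are our illustrative assumptions.

```python
import math

# Peaceman-Rachford parameters (B.13), Wachspress parameters (B.15),
# and a brute-force grid estimate of phi_m(a, b; rho) from (B.1),
# used to check the bounds (B.14) and (B.16) numerically.

def phi(a, b, rhos, n=20000):
    """Grid estimate of max over [a, b] of |prod (mu - r)/(mu + r)|."""
    best = 0.0
    for i in range(n + 1):
        mu = a + (b - a) * i / n
        prod = 1.0
        for r in rhos:
            prod *= (mu - r) / (mu + r)
        best = max(best, abs(prod))
    return best

def peaceman_rachford(a, b, m):
    k = (b / a) ** (1.0 / (2 * m))
    return [a * k ** (2 * i - 1) for i in range(1, m + 1)]

def wachspress(a, b, m):                 # requires m > 1
    d = (b / a) ** (1.0 / (2 * m - 2))
    return [a * d ** (2 * i - 2) for i in range(1, m + 1)]

a, b, m = 1.0, 100.0, 4
k = (b / a) ** (1.0 / (2 * m))
d = (b / a) ** (1.0 / (2 * m - 2))
bound_PR = (k - 1) / (k + 1)             # right side of (B.14)
bound_W = ((d - 1) / (d + 1)) ** 2       # right side of (B.16)

phi_PR = phi(a, b, peaceman_rachford(a, b, m))
phi_W = phi(a, b, wachspress(a, b, m))
print(phi_PR, bound_PR)                  # estimate lies within the bound
print(phi_W, bound_W)                    # estimate within the smaller bound
```

For this interval and m the Wachspress choice also gives the smaller φ_m, though, as the footnote remarks, that ordering does not hold in all cases.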
APPENDIX C: NONUNIFORM MESH SPACINGS AND MIXED BOUNDARY CONDITIONS

In Section 14 it was assumed that the mesh spacings in the two coordinate directions were equal, and Dirichlet conditions were assumed on the boundary of a rectangle. We now seek to show that if the region is a rectangle one can obtain commutative matrices even if the mesh spacing is nonuniform and even if the mixed boundary condition (2.7) is used on some sides of the rectangle, provided that d(x, y) is constant on each of these sides.

Let Ω_D = Ω_D(L_1, L_2) denote the set of intersections of a family L_1 of horizontal lines and a family L_2 of vertical lines. Two points of Ω_D are said to be adjacent if they lie on the same horizontal or vertical line segment and if there are no other points of Ω_D in between. Following Forsythe and Wasow³⁹ we designate the distances from a point (x, y) to adjacent mesh points in the increasing x, increasing y, decreasing x, and decreasing y directions, respectively, by h_E = h_E(x), h_N = h_N(y), h_W = h_W(x), and h_S = h_S(y). The four points adjacent to (x, y) are thus (x + h_E(x), y), (x, y + h_N(y)), (x − h_W(x), y), and (x, y − h_S(y)).

Given the problem of solving (14.1) in a rectangle, we let L_1 and L_2 be arbitrary except that the horizontal sides of the rectangle must belong to L_1, and the vertical sides to L_2. We assume that on each side of the rectangle either u is given or else (2.7) holds with d a constant on each such side. The set ℛ_D = ℛ_D(L_1, L_2) consists of the interior mesh points and those points of Ω_D on the boundary for which the mixed conditions apply. For each interior mesh point of ℛ_D we approximate the differential equation (14.2) by the difference equation defined by (14.3)-(14.6), with coefficients given by (C.1) and satisfying

C_0(y) = C_2(y) + C_4(y),   (C.2)

Σ = hkK.   (C.3)

Here h_E′ = h_E/2, h_N′ = h_N/2, etc., and h and k are arbitrary positive numbers which might be chosen as the mesh spacings in the x- and y-directions if these were constant.

³⁹ See [8], p. 194.
For points of ℛ_D which are on the boundary of the rectangle we develop a difference equation based on both the differential equation (14.2) and the mixed boundary condition. Consider, for example, the case of a point (x, y) on the left vertical side, where the boundary condition (2.7) takes the form (C.4). The formulas for the difference operator V will be the same as for interior points. To represent the differential operator −[∂(E_1(x) ∂u/∂x)/∂x]/E_2(x) we use an approximation (C.5) involving (∂u/∂x)_r and (∂u/∂x)_l, the values of ∂u/∂x at the points (x + h_E′, y) and (x, y), respectively. But (∂u/∂x)_l is determined by the boundary condition (C.4). If we use the central difference approximation h_E^{−1}[u(x + h_E, y) − u(x, y)] for (∂u/∂x)_r, we obtain the difference equation (C.7). Similar formulas can be derived for the other sides of the rectangle.

We now seek to show that the matrices H, V, and Σ obtained from the difference equation satisfy conditions (13.3)-(13.5). By (C.1) and (C.7) the coefficients of the values of u appearing in the expression for Hu(x, y) depend on x alone. Similarly, the coefficients in Vu(x, y) depend on y alone. Hence the projection operators H̄ and V̄ of (14.9)-(14.10) have coefficients of the required form. Therefore it follows by Lemma 14.3 that H̄ and V̄ commute, and hence the corresponding matrices H and V commute. Hence condition (13.3) holds. Moreover, by (C.3) condition (13.4) holds.

To show that the matrices H and V are similar to nonnegative diagonal matrices we note that FH and FV are symmetric, where F is a diagonal matrix with positive diagonal elements which correspond to the function F(x, y) which equals E_2(x)F_1(y)(h_E + h_W)(h_N + h_S) at points of ℛ_D inside the rectangle. At points of ℛ_D on the left vertical side, F(x, y) = E_2(x)F_1(y)(h_N + h_S)h_E. Similar formulas hold for the other sides of the rectangle. Since the matrices FH and FV are symmetric and have diagonal dominance, it follows that they are nonnegative definite. Also, as in Lemma 14.2 it follows at once that H and V satisfy (13.5). Thus conditions (13.3)-(13.5) are satisfied by the matrices H, V, and Σ.
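The commutativity argument can be illustrated concretely. The sketch below is not the difference equation (14.3)-(14.6) itself; it is a minimal analogue (the coefficient functions and mesh sizes are arbitrary choices of ours) in which the coefficients of H depend on x alone and those of V on y alone, on a rectangular mesh with Dirichlet conditions, and the two matrices then commute exactly.

```python
# Sketch: on a rectangular mesh, if the coefficients of H depend on x
# alone and those of V on y alone, the matrices commute.  H and V are
# built as variable-coefficient three-point difference operators in
# the x- and y-directions; grid sizes and coefficients are illustrative.

nx, ny = 4, 3                     # interior mesh points in x and y
pts = [(i, j) for i in range(nx) for j in range(ny)]
idx = {p: n for n, p in enumerate(pts)}
N = len(pts)

def zeros():
    return [[0.0] * N for _ in range(N)]

# coefficients depending on x alone (for H) and y alone (for V)
ax = [1.0 + 0.5 * i for i in range(nx + 1)]   # a(i + 1/2), illustrative
cy = [2.0 + 0.3 * j for j in range(ny + 1)]   # c(j + 1/2), illustrative

H, V = zeros(), zeros()
for (i, j) in pts:
    r = idx[(i, j)]
    H[r][r] = ax[i] + ax[i + 1]
    if i > 0:      H[r][idx[(i - 1, j)]] = -ax[i]
    if i < nx - 1: H[r][idx[(i + 1, j)]] = -ax[i + 1]
    V[r][r] = cy[j] + cy[j + 1]
    if j > 0:      V[r][idx[(i, j - 1)]] = -cy[j]
    if j < ny - 1: V[r][idx[(i, j + 1)]] = -cy[j + 1]

def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(N)) for c in range(N)]
            for r in range(N)]

HV, VH = matmul(H, V), matmul(V, H)
diff = max(abs(HV[r][c] - VH[r][c]) for r in range(N) for c in range(N))
print(diff)   # 0.0: H and V commute
```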
APPENDIX D: NECESSARY CONDITIONS FOR COMMUTATIVITY

In Section 14 and in Appendix C we have given some sufficient conditions on the differential equation and the region for the matrices H, V, and Σ to satisfy conditions (13.3)-(13.5). We now present some necessary conditions. We restrict our attention to the Dirichlet problem with the differential equation (2.1),

−[A(x, y)u_x]_x − [C(x, y)u_y]_y + G(x, y)u = S(x, y),   (D.1)

where G(x, y), A(x, y), and C(x, y) belong to class⁴⁰ C^{(2)} in ℛ + 𝔅 and where G ≥ 0, A > 0, and C > 0 in ℛ + 𝔅. We assume that a square mesh of length h is used and that ℛ and 𝔅 are such that for an infinite sequence, ℋ, of values of h tending to zero all boundary points of the network belong to 𝔅, and moreover for all sufficiently small h in this sequence ℛ_h is connected. We now prove

THEOREM D.1. For the Dirichlet problem for the region ℛ and the differential equation (D.1), let there exist a nonvanishing function P(x, y) such that for all h in ℋ the matrices H, V, and Σ satisfy conditions (13.3)-(13.5), where H, V, and Σ are derived from the equation

P(x, y){−[A(x, y)u_x]_x − [C(x, y)u_y]_y + G(x, y)u} = P(x, y)S(x, y),   (D.2)

using the difference approximations (2.2) and (2.3). Then there exist a nonnegative constant K and functions E_1(x), E_2(x), F_1(y), F_2(y) which are positive and belong to class C^{(2)} in ℛ + 𝔅, such that

A(x, y) = E_1(x)F_1(y),  C(x, y) = E_2(x)F_2(y),  G(x, y) = K E_2(x)F_1(y),
P(x, y) = c/[E_2(x)F_1(y)]  (c is a constant).   (D.3)
Proof. The difference operators H and V corresponding to (D.2) are given by

Hu(x, y) = A_0(x, y)u(x, y) − A_1(x, y)u(x + h, y) − A_3(x, y)u(x − h, y),   (D.4)

Vu(x, y) = C_0(x, y)u(x, y) − C_2(x, y)u(x, y + h) − C_4(x, y)u(x, y − h),   (D.5)

where A_1(x, y) = P(x, y)A(x + (h/2), y), A_3(x, y) = P(x, y)A(x − (h/2), y), etc. The so-called "projection operators" H̄ and V̄ are defined as in Section 14 by

⁴⁰ Functions with continuous second partial derivatives are said to be of class C^{(2)}.
H̄u(x, y) = A_0(x, y)u(x, y) − Ā_1(x, y)u(x + h, y) − Ā_3(x, y)u(x − h, y),   (D.6)

V̄u(x, y) = C_0(x, y)u(x, y) − C̄_2(x, y)u(x, y + h) − C̄_4(x, y)u(x, y − h),   (D.7)

where Ā_1(x, y) = A_1(x, y)Γ(x + h, y), Ā_3(x, y) = A_3(x, y)Γ(x − h, y), etc., and where Γ(x, y) = 1 or 0 according to whether or not (x, y) belongs to ℛ_h.

We now prove two lemmas about general difference operators of the form (D.6)-(D.7).

LEMMA D.2. If the coefficients A_i(x, y), i = 0, 1, 3, and C_i(x, y), i = 0, 2, 4, are positive in ℛ_h, and if H̄ and V̄ commute, then ℛ_h is rectangular, A_0(x, y) depends only on x, and C_0(x, y) depends only on y.

Proof. We first show that for any (x, y), if any three of the four points (x, y), (x + h, y), (x, y + h), and (x + h, y + h) belong to ℛ_h, then the fourth does also. This and the assumption that ℛ_h is connected will prove that ℛ_h is rectangular. Let us assume that the three points (x, y), (x + h, y), and (x + h, y + h) belong to ℛ_h. Equating coefficients of u(x + h, y + h) in the expressions for H̄V̄u(x, y) and V̄H̄u(x, y) we have Ā_1(x, y)C̄_2(x + h, y) = Ā_1(x, y + h)C̄_2(x, y), or

A_1(x, y)C_2(x + h, y)Γ(x + h, y)Γ(x + h, y + h) = A_1(x, y + h)C_2(x, y)Γ(x + h, y + h)Γ(x, y + h).

But since Γ(x + h, y + h) = Γ(x + h, y) = 1, and since none of the coefficients A_i(x, y) or C_i(x, y) vanishes, equality is possible only if Γ(x, y + h) = 1. Hence (x, y + h) belongs to ℛ_h. Since similar arguments hold in the other cases, it follows that ℛ_h is rectangular.

If the rectangular network ℛ_h had only one column of points, then C_0(x, y) would clearly be independent of x. Otherwise, let (x, y) and (x + h, y) be any two adjacent points of ℛ_h. Equating coefficients of u(x + h, y) in the expressions for H̄V̄u(x, y) and V̄H̄u(x, y) we have −Ā_1(x, y)C_0(x + h, y) = −Ā_1(x, y)C_0(x, y), or equivalently

−A_1(x, y)C_0(x + h, y)Γ(x + h, y) = −A_1(x, y)C_0(x, y)Γ(x + h, y).

But since Γ(x + h, y) = 1, and since A_1(x, y) > 0, we have C_0(x + h, y) = C_0(x, y). Since this is true for any point (x, y) of ℛ_h such that (x + h, y) is also in ℛ_h, it follows that C_0(x, y) is independent of x. Similarly, A_0(x, y) is independent of y, and Lemma D.2 is proved.

We shall call the difference operators H and V symmetric if the corresponding matrices H and V, respectively, are symmetric. Symmetry of H implies that the coefficient of u(x + h, y) in the expression for Hu(x, y) is the same as the coefficient of u(x, y) in the expression for Hu(x + h, y),
assuming that both (x, y) and (x + h, y) are in ℛ_h. One can readily verify that necessary and sufficient conditions for symmetry of H̄ and V̄ are

Ā_1(x, y) = Ā_3(x + h, y)  (for (x, y) and (x + h, y) in ℛ_h),   (D.8)

C̄_2(x, y) = C̄_4(x, y + h)  (for (x, y) and (x, y + h) in ℛ_h).   (D.9)

We now prove

LEMMA D.3. Under the hypotheses of Lemma D.2, if H̄ and V̄ are symmetric, then the nonzero values of Ā_1(x, y) and Ā_3(x, y) depend only on x, and the nonzero values of C̄_2(x, y) and C̄_4(x, y) depend only on y.

Proof. The network ℛ_h is rectangular, by Lemma D.2. If (x, y) and (x, y + h) are any two points in ℛ_h such that Ā_1(x, y) and Ā_1(x, y + h) do not vanish, then Γ(x + h, y) = Γ(x + h, y + h) = 1. Hence (x + h, y) and (x + h, y + h) belong to ℛ_h. Equating the coefficients of u(x + h, y + h) in the expressions for H̄V̄u(x, y) and V̄H̄u(x, y) we obtain Ā_1(x, y)C̄_2(x + h, y) = Ā_1(x, y + h)C̄_2(x, y), or

A_1(x, y)C_2(x + h, y) = A_1(x, y + h)C_2(x, y).   (D.10)

Also, equating the coefficients of u(x* − h, y + h) in the expressions for H̄V̄u(x*, y) and V̄H̄u(x*, y), where x* = x + h, we obtain Ā_3(x + h, y)C̄_2(x, y) = Ā_3(x + h, y + h)C̄_2(x + h, y), or

A_3(x + h, y)C_2(x, y) = A_3(x + h, y + h)C_2(x + h, y),

and, by (D.8),

A_1(x, y)C_2(x, y) = A_1(x, y + h)C_2(x + h, y).   (D.11)

Combining (D.10) and (D.11) we obtain [A_1(x, y)]² = [A_1(x, y + h)]², and since A_1 > 0 we have

A_1(x, y) = A_1(x, y + h).

Since this is true for any two points (x, y) and (x, y + h) in ℛ_h, it follows that the nonzero values of Ā_1(x, y) are independent of y. Similar arguments can be used to prove this about Ā_3(x, y), and to show that the nonzero values of C̄_2(x, y) and C̄_4(x, y) are independent of x. Thus Lemma D.3 is proved.

In order to apply Lemma D.3 to the proof of Theorem D.1, since H̄ and V̄ are not in general symmetric, we construct operators H̄^{(N)} and V̄^{(N)} which are both symmetric and commutative. We let

H̄^{(N)}u(x, y) = A_0^{(N)}(x, y)u(x, y) − Ā_1^{(N)}(x, y)u(x + h, y) − Ā_3^{(N)}(x, y)u(x − h, y),   (D.12)

V̄^{(N)}u(x, y) = C_0^{(N)}(x, y)u(x, y) − C̄_2^{(N)}(x, y)u(x, y + h) − C̄_4^{(N)}(x, y)u(x, y − h),   (D.13)

where

A_0^{(N)} = A_0,  C_0^{(N)} = C_0,  A_1^{(N)}(x, y) = P^{1/2}(x, y)P^{1/2}(x + h, y)A(x + (h/2), y), etc.   (D.14)
Here, as usual, Ā_1^{(N)}(x, y) = A_1^{(N)}(x, y)Γ(x + h, y), Ā_3^{(N)}(x, y) = A_3^{(N)}(x, y)Γ(x − h, y), etc. It is easy to see that H̄^{(N)} and V̄^{(N)} are symmetric. To show that they commute we consider the associated matrices H^{(N)} and V^{(N)}. Evidently, if F(x, y) = 1/P(x, y) and if the diagonal matrix F corresponds to the function F(x, y), then H^{(N)} = F^{1/2}HF^{−1/2} and V^{(N)} = F^{1/2}VF^{−1/2}. Clearly, if HV = VH, then H^{(N)}V^{(N)} = V^{(N)}H^{(N)}. Hence, by Lemma D.3 it follows that the nonzero coefficients Ā_1^{(N)}(x, y) and C̄_2^{(N)}(x, y) depend only on x and y, respectively. In particular, A_1^{(N)}(x, y) = P^{1/2}(x, y)P^{1/2}(x + h, y)A(x + (h/2), y) must be independent of y except for points (x, y) of ℛ_h such that (x + h, y) does not belong to ℛ_h. It follows that

P^{1/2}(x, y)P^{1/2}(x + h, y)A(x + (h/2), y) = θ(x, h)   (D.15)

for all h in ℋ and for all (x, y) in ℛ_h except as noted above. Since P(x, y) is continuous, the limit of both sides of the above equation exists as h → 0 through the sequence ℋ, and we have

P(x, y)A(x, y) = X(x) ≡ lim_{h→0, h∈ℋ} θ(x, h).   (D.16)

Since this is true for all points (x, y) which for some h in ℋ belong to ℛ_h and such that (x + h, y) is in ℛ_h, and since such points are dense in ℛ + 𝔅, it follows by continuity that (D.16) holds throughout ℛ + 𝔅. Similarly, we have for some continuous function Y(y)

P(x, y)C(x, y) = Y(y).   (D.17)

Substituting (D.16) in (D.15) we have

A(x + (h/2), y)/[A(x, y)A(x + h, y)]^{1/2} = θ(x, h)/[X(x)X(x + h)]^{1/2} ≡ θ_1(x, h),   (D.18)
or

log A(x + (h/2), y) − ½ log A(x, y) − ½ log A(x + h, y) = log θ_1(x, h) ≡ θ_2(x, h).   (D.19)

Since A belongs to class C^{(2)} and is positive in ℛ + 𝔅, we may differentiate (D.19) with respect to y; since the right member is independent of y, we have

T(x + (h/2), y) − 2^{−1}T(x, y) − 2^{−1}T(x + h, y) = 0,   (D.20)

where

T(x, y) = ∂ log A(x, y)/∂y.   (D.21)

The general solution of the difference equation (D.20) is

T(x, y) = α(y) + xβ(y)   (D.22)

for suitable functions α(y) and β(y). Upon substituting (D.22) in (D.21) and integrating we have

A(x, y) = E_1(x)F_1(y) exp [xY_1(y)]   (D.23)

for suitable functions E_1(x), F_1(y), and Y_1(y). Similarly, for suitable E_2(x), F_2(y), X_1(x) we have

C(x, y) = E_2(x)F_2(y) exp [yX_1(x)].   (D.24)

But by (D.16) and (D.17) we have A(x, y)/C(x, y) = X(x)/Y(y), so that

E_1(x)F_1(y) exp [xY_1(y)]/{E_2(x)F_2(y) exp [yX_1(x)]} = X(x)/Y(y).   (D.25)

But by (D.23) and (D.24), for some constant a (absorbing exponential factors of x alone into E_1(x) and of y alone into F_2(y)),

X_1(x) = Y_1(y) = a,

and hence

A(x, y) = E_1(x)F_1(y)e^{axy},  C(x, y) = E_2(x)F_2(y)e^{axy}.   (D.26)

Moreover, by (D.16), (D.17), and (D.26) there exists a constant c different from zero such that

P(x, y) = c/[E_2(x)F_1(y)e^{axy}].   (D.27)

Since the diagonal matrix Σ which corresponds to the function h²P(x, y)G(x, y) must be a constant times the identity matrix, by (13.4), it follows that for some constant K

G(x, y) = K E_2(x)F_1(y)e^{axy}.   (D.28)

To determine the constant a we use the fact that A_0^{(N)}(x, y) and C_0^{(N)}(x, y) are independent of y and x, respectively. By (D.14), (D.26), and (D.27) we must have

A_0^{(N)}(x, y) = [c/E_2(x)][E_1(x + h/2)e^{ahy/2} + E_1(x − h/2)e^{−ahy/2}]

independent of y. But since E_1(x) is a positive function this is clearly impossible unless a = 0. Therefore (D.3) follows from (D.26), (D.27), and (D.28), and Theorem D.1 is proved.

Even if the diagonal matrix Σ is not a constant times the identity matrix, one might try to obtain matrices H′, V′, and Σ′ = 0 satisfying conditions (13.3)-(13.5) by letting H′ = H + γΣ, V′ = V + (1 − γ)Σ for some constant γ. Conceivably H′ and V′ might commute even though H and V did not. This is clearly not possible, of course, if Σ = σI. It can be shown that if G(x, y), A(x, y), and C(x, y) are of class C^{(3)}, then the conditions of Theorem D.1 are necessary in order for H′ and V′ to commute. We omit the proof.
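The role of the separability conditions (D.3) can be seen numerically from the converse direction: with a coefficient A(x, y) that is not of the form E_1(x)F_1(y), the matrices H and V fail to commute. The sketch below exhibits this on a small square; the mesh, the coefficient A(x, y) = 1 + xy, and the choices P ≡ 1 and C ≡ 1 are our illustrative assumptions.

```python
# Sketch of the necessity result: if the coefficient A(x, y) of (D.1)
# is non-separable, the difference operators H and V in general no
# longer commute.  Here A(x, y) = 1 + x*y while C = 1 and P = 1;
# Dirichlet conditions on the unit square, illustrative mesh.

n = 4                                  # interior points per direction
h = 1.0 / (n + 1)
pts = [(i, j) for i in range(n) for j in range(n)]
idx = {p: k for k, p in enumerate(pts)}
N = len(pts)

def A(x, y):                           # non-separable coefficient
    return 1.0 + x * y

H = [[0.0] * N for _ in range(N)]
V = [[0.0] * N for _ in range(N)]
for (i, j) in pts:
    x, y = (i + 1) * h, (j + 1) * h
    r = idx[(i, j)]
    aE, aW = A(x + h / 2, y), A(x - h / 2, y)   # A at half-integer points
    H[r][r] = aE + aW
    if i > 0:     H[r][idx[(i - 1, j)]] = -aW
    if i < n - 1: H[r][idx[(i + 1, j)]] = -aE
    V[r][r] = 2.0                               # C = 1 throughout
    if j > 0:     V[r][idx[(i, j - 1)]] = -1.0
    if j < n - 1: V[r][idx[(i, j + 1)]] = -1.0

def matmul(Am, Bm):
    return [[sum(Am[r][k] * Bm[k][c] for k in range(N)) for c in range(N)]
            for r in range(N)]

HV, VH = matmul(H, V), matmul(V, H)
diff = max(abs(HV[r][c] - VH[r][c]) for r in range(N) for c in range(N))
print(diff)   # > 0: H and V fail to commute
```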
Bibliography

1. Birkhoff, G., and Varga, R. S., Implicit alternating direction methods. Trans. AMS 92, 13-24 (1959).
2. Bruce, G. H., Peaceman, D. W., Rachford, H. H., and Rice, J. D., Calculation of unsteady-state gas flow through porous media. Trans. AIMME 198, 79-91 (1953).
3. Conte, S., and Dames, R. J., An alternating direction scheme for the biharmonic difference equations. Math. Tables Aid Comput. 12, 198-205 (1958).
3a. de Boor, C. M., and Rice, J. R., Tchebycheff approximation by a∏[(x − r_i)/(x + r_i)] and application to ADI iteration. To appear in J. Soc. Ind. Appl. Math.
4. Douglas, J., Jr., A note on the alternating direction implicit method for the numerical solution of heat flow problems. Proc. AMS 8, 409-411 (1957).
5. Douglas, J., Jr., On the numerical integration of ∂²u/∂x² + ∂²u/∂y² = ∂u/∂t by implicit methods. J. Soc. Ind. Appl. Math. 3, 42-65 (1955).
6. Douglas, J., Jr., Alternating direction iteration for mildly nonlinear elliptic differential equations. Numer. Math. 3, 92-98 (1961).
7. Douglas, J., Jr., and Rachford, H., On the numerical solution of heat conduction problems in two and three space variables. Trans. AMS 82, 421-439 (1956).
8. Forsythe, G. E., and Wasow, W. R., Finite-Difference Methods for Partial Differential Equations. Wiley, New York, 1960.
9. Fort, T., Finite Differences. Oxford Univ. Press, London and New York, 1948.
10. Frankel, S., Convergence rates of iterative treatments of partial differential equations. Math. Tables Aid Comput. 4, 65-75 (1950).
10a. Gantmakher, F., and Krein, M., Sur les matrices complètement non négatives et oscillatoires. Compositio Math. 4, 445-476 (1937).
11. Golub, G. H., and Varga, R. S., Chebyshev semi-iterative methods, successive overrelaxation iterative methods, and second order Richardson iterative methods, Part I. Numer. Math. 3, 147-156 (1961); Part II. Numer. Math. 3, 157-168 (1962).
12. Heller, J., Simultaneous successive and alternating direction schemes. J. Soc. Ind. Appl. Math. 8, 150-173 (1960).
12a. Householder, A. S., The approximate solution of matrix problems. J. Assoc. Computing Machinery 5, 205-243 (1958).
12b. Kryloff, N., Les méthodes de solution approchée des problèmes de la physique mathématique. Mém. Sci. Math. No. 49, 68 pp. (1931).
13. Lees, M., Alternating direction and semi-explicit difference methods for parabolic partial differential equations. Numer. Math. 3, 398-412 (1962).
14. Ostrowski, A. M., On the linear iterative procedures for symmetric matrices. Rend. Mat. Appl. [5] 14, 140-163 (1954).
15. Parter, S. V., "Multi-line" iterative methods for elliptic difference equations and fundamental frequencies. Numer. Math. 3, 305-319 (1961).
16. Peaceman, D. W., and Rachford, H. H., Jr., The numerical solution of parabolic and elliptic differential equations. J. Soc. Ind. Appl. Math. 3, 28-41 (1955).
16a. Remes, E., Sur un procédé convergent d'approximations successives pour déterminer les polynômes d'approximation. Compt. Rend. Acad. Sci. 198, 2063-2065 (1934); Sur le calcul effectif des polynômes d'approximation de Tchebichef. Ibid. 199, 337-340 (1934).
16b. Rice, J. R., Tchebycheff approximations by functions unisolvent of variable degree. Trans. AMS 99, 298-302 (1961).
16c. Shortley, D., and Flanders, G. A., J. Appl. Phys. 21, 1326-1332 (1950).
16d. Stiefel, E. L., Numerical methods of Tchebycheff approximation. In On Numerical Approximation (R. Langer, ed.), pp. 217-233. Univ. of Wisconsin Press, Madison, Wisconsin, 1959.
16e. Thrall, R. M., and Tornheim, L., Vector Spaces and Matrices, p. 190. Wiley, New York, 1957.
17. Varga, R. S., Overrelaxation applied to implicit alternating direction methods. Proc. Intern. Congr. on Information Processing, Paris, pp. 85-90, June (1958).
18. Varga, R. S., p-cyclic matrices: A generalization of the Young-Frankel successive overrelaxation scheme. Pacific J. Math. 9, 617-628 (1959).
19. Varga, R. S., Matrix Iterative Analysis. Prentice-Hall, Englewood Cliffs, New Jersey, 1962.
20. Varga, R. S., Factorization and normalized iterative methods. In Boundary Problems in Differential Equations, pp. 121-141. Univ. of Wisconsin Press, Madison, Wisconsin, 1960.
21. Varga, R. S., Orderings of the successive overrelaxation scheme. Pacific J. Math. 9, 925-939 (1959).
22. Varga, R. S., Higher order stable implicit methods for solving parabolic partial differential equations. J. Math. and Phys. 40, 220-231 (1961).
23. Wachspress, E. L., CURE: a generalized two-space-dimension multigroup coding for the IBM 704. Knolls Atomic Power Laboratory Report No. KAPL 1724, General Electric Co., Schenectady, New York, April (1957).
24. Wachspress, E. L., and Habetler, G. J., An alternating-direction-implicit iteration technique. J. Soc. Ind. Appl. Math. 8, 403-424 (1960).
25. Wachspress, E. L., Optimum alternating-direction-implicit iteration parameters for a model problem. J. Soc. Ind. Appl. Math. 10, 339-350 (1962).
25a. Weyl, H., Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen. Math. Ann. (Leipzig) 71, 441-479 (1912).
25b. Young, D., Iterative methods for solving partial difference equations of elliptic type. Ph.D. Thesis, Harvard (1950).
26. Young, D., Iterative methods for solving partial difference equations of elliptic type. Trans. AMS 76, 92-111 (1954).
27. Young, D., Ordvac solutions of the Dirichlet problem. J. Assoc. Computing Machinery 2, 137-161 (1955).
28. Young, D., On the solution of linear systems by iteration. AMS Symposium on Numer. Anal. 6 (1956).
29. Young, D., and Ehrlich, L., Numerical experiments involving boundary problems in differential equations, pp. 143-162. Univ. of Wisconsin Press, Madison, Wisconsin, 1960.
Combined Analog-Digital Techniques in Simulation

HAROLD K. SKRAMSTAD

U. S. Naval Ordnance Laboratory, Corona, California*
1. Comparison of Analog and Digital Computers in Simulation
2. Interconnected Analog and Digital Computers
3. Example of a Combined Solution
4. Analog-Digital Arithmetic in a Digital Computer
5. Systems Using Analog-Digital Variables
Bibliography
1. Comparison of Analog and Digital Computers in Simulation

One of the most common types of problems to which computers have been applied is the study of complex nonlinear dynamic systems subjected to external disturbances. Such systems are represented mathematically by nonlinear systems of differential equations with time as the independent variable. Computers used to solve problems of this type are commonly called simulators, since often the passage of time in the computer solution is proportional to time in the system under study, and thus the system under study is "simulated" by the computer. Examples of typical types of dynamic problems to which computers (or simulators) have been applied are shown in Table I.

The comparative characteristics of analog and digital computers when applied to the solution of dynamic systems problems are shown in Table II. It is evident that the electronic analog computer has features which make it particularly adaptable to this type of problem. The more important of these features are continuous representation of variables, high speed of operation, and ease of incorporating actual systems hardware. The fact that all mathematical operations are going on in parallel in separate analog computing elements gives this technique a decided speed advantage over digital machines in which most operations are carried out in sequence in a

* Former affiliation: National Bureau of Standards, Washington, D.C.
TABLE I. EXAMPLES OF DYNAMIC SYSTEMS SIMULATION PROBLEMS

Aircraft dynamic response
Aircraft landing shocks
Automatic control systems
Automatic pilots
Automobile suspension systems
Chemical equilibrium
Chemical kinetics
Chemical process control
Economic systems
Electron optics
Exothermic reactor control
Feedback amplifiers
Geophysics
Heat exchangers
Helicopter vibrations
Hydraulic transmissions
Insect population dynamics
Linear and nonlinear circuits
Missile guidance and control systems
Modulation systems
Noise effects
Nuclear physics
Nuclear reactor control
Operations research
Particle injection into accelerators
Radar tracking circuits
Servomechanisms
Spring systems
Seismometer systems
Transducers
Transient heat conduction
Torsional vibrations
Transmission lines
Transistor performance
Vibration of structures
Vibrometers
TABLE II. COMPARATIVE CHARACTERISTICS OF ANALOG AND DIGITAL COMPUTERS FOR SIMULATION OF DYNAMIC SYSTEMS

Analog | Digital
Continuous representation of variables | Variables represented by discrete numbers
Parallel operation of components | Operations performed in sequence
High operation speed | Slower operation speed
Increasing problem size requires proportional increase in computer size at no increase in solution time | Increasing problem size requires increase in memory size and time of solution
Accuracy and dynamic range limited by physical measurement capability | Accuracy and dynamic range extendable to any desired degree
Continuous integration | Integration by finite difference calculus
Difficult to handle and store discrete data | Easy to handle and store discrete data
Solution progresses at rate proportional to rate of progression of system studied | Ordinarily no correspondence between rate of solution and rate of system studied
Easy to include parts of actual system in simulation | Requires analog-digital converters and real-time computing speeds to include system parts
Behavior of system easily visualized as simulation proceeds | Output data ordinarily not available as solution proceeds
single time-shared arithmetic unit. Also, many operations such as integration, trigonometric resolution, and function generation are performed on a digital computer by approximate numerical techniques requiring many computing steps or iterations. The only limitation on the highest natural frequencies that can be simulated with fidelity in an analog computer is the bandwidth or frequency response of the individual analog elements employed. The limitations on the speed of a digital computer are such factors as memory cycle time, order code, the time required to perform common operations such as add, subtract, multiply, and divide, and input/output equipment speed. In real-time simulation the digital computer must carry out all the calculations called for in the mathematical model a sufficient number of times per second to follow accurately the highest natural frequencies present in the solution. This usually demands an iteration rate for integration of the order of ten times the highest natural frequency involved in the problem, dependent on the numerical algorithm chosen.

If real-time operation is not required, an increase in the time of solution can relieve the dynamic requirements on the individual components of an analog computer, but any increase in problem size will call for a corresponding increase in the size of the computer. In the digital case, however, increasing the time of solution automatically increases the size of the problem that can be solved. If the memory capacity is sufficient, no increase in equipment is needed to handle a larger problem. Thus a digital computer may be said to be time expandable as opposed to the hardware expandable analog computer. The biggest deficiency of the analog computer is that its precision and dynamic range are limited by physical measurement capability, and thus it cannot be applied to problems where the precision and dynamic range required are beyond its capabilities.
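The iteration-rate rule of thumb can be illustrated with a small experiment. In the sketch below, the oscillator, the choice of a classical fourth-order Runge-Kutta rule, and the step counts are our illustrative assumptions: ten integration steps per natural period track the true solution closely, while three steps per period do not.

```python
import math

# Illustration of the iteration-rate rule of thumb: integrate the
# undamped oscillator x'' = -(w**2) x with classical 4th-order
# Runge-Kutta steps.  At ten steps per natural period the error over
# one period is small; at three steps per period it is large.

def rk4_period_error(steps_per_period, periods=1):
    w = 2.0 * math.pi                 # natural frequency: period = 1
    h = 1.0 / steps_per_period
    x, v = 1.0, 0.0                   # x(0) = 1, x'(0) = 0; exact x(t) = cos(w*t)
    def f(x, v):                      # state derivative of (x, v)
        return v, -w * w * x
    for _ in range(steps_per_period * periods):
        k1x, k1v = f(x, v)
        k2x, k2v = f(x + h / 2 * k1x, v + h / 2 * k1v)
        k3x, k3v = f(x + h / 2 * k2x, v + h / 2 * k2v)
        k4x, k4v = f(x + h * k3x, v + h * k3v)
        x += h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return abs(x - 1.0)               # exact solution returns to 1 each period

print(rk4_period_error(10), rk4_period_error(3))   # small versus large error
```

As the text notes, the precise rate needed depends on the integration algorithm; a cruder rule than Runge-Kutta would need correspondingly more steps per period.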
2. Interconnected Analog and Digital Computers
Many simulation type problems may be solved almost equally well on either an analog or a digital computer. There are many types of problems, however, that are not handled well on either type alone, and consideration has been given to the advantages to be derived from interconnecting analog and digital machines through appropriate analog-to-digital and digital-to-analog converter equipment. In such a system, part of the problem can be solved on the digital computer and part on the analog computer, with exchange of data taking place through interconnecting data channels.

Primarily, combined systems of this type are needed to handle problems which involve highly accurate calculations on low frequency data as well as lower accuracy calculations on high frequency data. These problems require the high bandwidth of analog computers as well as the high precision and dynamic range of the digital computer. For example, missile guidance and control systems often involve high frequencies in the dynamic equations of motion of the vehicle and at the same time low frequency, large dynamic range variables, such as position, which need to be known to an accuracy of perhaps a few feet in several hundred thousand feet. Solving problems of this nature on a digital computer alone would require very large size computers or excessive running time due to their low bandwidth capacity. On the other hand, analog computers would not be capable of yielding the required precision. Other examples of problems which might better be solved by interconnection of analog and digital computers are:

(1) Simulation of hybrid systems, such as a digitally controlled physical process. The analog computer might be used to simulate the process and the digital computer to simulate the digital controller.

(2) Problems handled adequately on analog computers except for requiring the generation of arbitrary functions of two or more variables, generation of pure time delays, axis transformation, or long-time integration, jobs which frequently are more easily accomplished on digital computers.

(3) Very large dynamic problems which would require excessive time on a digital computer or excessive equipment on the analog, but which, by proper division of the problem, could be solved in a reasonable time and with a reasonable amount of equipment on a combined facility.

Work in interconnecting analog and digital computers began in 1954 at Convair, San Diego […], and involved interconnecting an ERA 1103 digital computer to a large analog facility. In 1956, a combined system was put into operation simulating the flight of an Atlas missile. Shortly thereafter, at a new location, an IBM 704 digital computer was linked to a large Electronic Associates analog computer through Epsco "Add-a-Verter" conversion equipment. At about the same time, Ramo-Wooldridge linked together an 1103 and an Electronic Associates analog computer through similar "Add-a-Verter" equipment [1].
This laboratory (now under Space Technology Laboratories) is now linking a Packard-Bell PB 250 digital computer to their analog computer through this same Add-a-Verter equipment [14, 24]. In 1957, the National Bureau of Standards interconnected their SEAC digital computer, a Mid-Century analog computer, and appropriate display and control equipment for simulation of complex man-machine systems, such as a ground controlled intercept system [33, 34]. The analog computer was used to simulate the dynamics of the interceptor aircraft being flown by a pilot in a simulated cockpit. The velocity components of the interceptor were converted to digital form, and used as inputs to the digital computer. The digital computer was used to keep track of the interceptor's position as well as that of all other aircraft in an
area under study. This facility has also been used for studies of air traffic control. The IBM Research Laboratories in Yorktown Heights have interconnected a PACE analog computer to an IBM 704 through an Add-a-Link converter unit. This system has been used for the simulation of chemical processes [31]. At IBM, Owego, a similar combined facility has been applied to the simulation of automatic navigation systems. At the General Electric Missile & Space Vehicles Department, an IBM 7070 and a PACE analog computer with automatic digital input and output equipment have been linked by a special interconnecting link, called HYCOL, which includes analog-to-digital and digital-to-analog conversion equipment built by Packard-Bell [27]. In this system the analog computer is being put completely (except for patching) under the control of the digital computer. This facility is intended for use in exploring some of the increasingly complex problems of space navigation and control. At the University of Minnesota, an 1103 digital computer and a Reeves analog computer have been linked with Add-a-Link equipment [36]. This combined facility is designed as an educational tool for the study of control systems, particularly in the field of chemical engineering. It is also being used to investigate the ability of the combined system to function as a general purpose mathematical computing machine. The Armament & Flight Control Division of Autonetics has interconnected a large PACE analog computer with a G-15 digital computer, using Packard-Bell conversion equipment. This combination is used primarily in the simulation of complex weapon control systems. At Grumman Aircraft Engineering Corporation, a 704 digital computer and a large analog facility have been interconnected [8, 9] with a data-link manufactured by Adage Incorporated. They now have an IBM 7090 linked with their analog facility.
The FAA has set up, at the National Aviation Facilities Experimental Center at Atlantic City, a large scale man-machine simulator for the study of terminal area air traffic control systems [18]. Their simulation effort involves the interconnection of digital computers with special purpose analog aircraft generators and displays. Combined facilities also exist at Convair Astronautics, where an IBM 7090 is linked to an analog computer with the Add-a-Verter equipment, and at Douglas, where their analog computer is linked to a Bendix G-15 computer. North American Aviation and the Naval Ordnance Laboratory, Corona, are interconnecting their analog facilities with Packard-Bell TRICE digital differential analyzer equipment, which will provide the capability of precise long term integration and extremely rapid and precise axis rotation and other nonlinear transformations. This list is not exhaustive, and there are many
others not mentioned here which are now in operation. It is presented only to show the rapidly increasing use of combined facilities.

Figure 1 shows a block diagram of a combined analog-digital facility. Analog-to-digital converters are required to convert analog voltages to digital quantities for introduction into the digital computer, and digital-to-analog converters to convert digital numbers to analog voltages. Special conversion control equipment is required to control the timing of the conversions in either direction.

Fig. 1. Combined analog-digital computer facility.

In a typical simulation, the digital computer will receive information sampled from the analog computer, use this data in a computation, and transmit the results back to the analog computer. It is generally required that the continuous data from the analog computer be sampled at a definite frequency, and it is ordinarily preferable to control the sampling from a timing source external to the digital computer rather than from the computer itself. It is necessary, of course, to synchronize the input and output of the digital computer with this timing source.
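A rough way to see why the external timing source constrains the digital program is to budget the arithmetic that must fit between timer ticks. A minimal sketch: the 10 μsec add and 25 μsec multiply times echo the execute times quoted later in this article, while the operation counts and sampling rates are invented for illustration:

```python
# Per-operation execute times, as quoted later in this article (illustrative).
ADD_TIME = 10e-6      # seconds per addition
MUL_TIME = 25e-6      # seconds per multiplication

def fits_in_period(n_adds, n_muls, sample_rate_hz):
    """Return True if one pass of the model completes within one sample period."""
    compute_time = n_adds * ADD_TIME + n_muls * MUL_TIME
    return compute_time <= 1.0 / sample_rate_hz

# A hypothetical model pass of 50 additions and 20 multiplications
# takes 1.0 msec of arithmetic.
print(fits_in_period(50, 20, 500))    # fits within a 2 msec sample period
print(fits_in_period(50, 20, 2000))   # fails at a 2000-per-second sampling rate
```

When the budget fails, either the sampling rate must be lowered, the model simplified, or more of the high-frequency work moved to the analog side.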
ANALOG-DIGITAL TECHNIQUES IN SIMULATION
28 1
A special problem which occurs when combining analog and digital computations is the time lag introduced into the problem by the time required for the analog voltage to be sampled, converted to digital form, introduced into the digital computer, the digital calculations made, and the result returned again as a voltage to the analog computer. This is equivalent to a transport delay. If this delay is objectionable, some type of extrapolation must be performed in the digital computer to update the data before conversion to analog form. The necessity for the analog computer to receive inputs in continuous form produces another problem. Usually some type of interpolation of the digital data is used, such as holding the converted data from the digital computer constant until each new value is received. More sophisticated methods involve proper interconnection of analog components using higher order derivatives from the digital computer. In applying a combined system to a particular problem, certain decisions must be made, such as: (1) which calculations to do on the analog and which on the digital computer, (2) what precision of analog-to-digital and digital-to-analog conversion is necessary, and (3) what is the optimum sampling period for various variables. As mentioned previously, high frequency, low accuracy computations should be handled on the analog, while the low frequency, high precision calculations should be given to the digital. The precision of conversion is usually made five to ten times greater than that of the information it handles. The sampling period for any variable needs to be long enough so that the digital calculations required during each period can be completed, and yet short enough so that several samples will be obtained during each cycle of the highest frequency present in the analog input.
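The hold and extrapolation schemes just described can be compared on a toy signal. A sketch under invented numbers: a 1-cps sine, a 0.05-sec sampling period, and a transport delay of one full sample period:

```python
import math

DT = 0.05          # sampling period, seconds (hypothetical)
DELAY = 0.05       # transport delay of one conversion/compute cycle (hypothetical)

def signal(t):
    """The analog variable being sampled: a 1-cps sine wave."""
    return math.sin(2 * math.pi * 1.0 * t)

hold_err = extrap_err = 0.0
prev = signal(-DT)
for k in range(40):
    t = k * DT
    s = signal(t)
    true_now = signal(t + DELAY)             # what the analog side really needs
    hold = s                                 # zero-order hold: last sample, uncorrected
    extrap = s + (s - prev) / DT * DELAY     # first-order extrapolation across the lag
    hold_err = max(hold_err, abs(hold - true_now))
    extrap_err = max(extrap_err, abs(extrap - true_now))
    prev = s

print(hold_err, extrap_err)   # extrapolation substantially reduces the delay error
```

The extrapolated value is in error only to second order in the delay, which is why updating the data in the digital computer before conversion is worthwhile when the transport delay is objectionable.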
Thus the time scale that can be used for a particular simulation is dependent upon the speed of the converters and the speed of calculation of the digital computer, as well as the highest frequencies present in the analog computer variables.

3. Example of a Combined Solution
To illustrate the limitations of either an all-analog or an all-digital solution, and the advantages of a combined analog-digital solution for certain types of problems, let us consider the terminal phase of the missile intercept problem in two dimensions considered by Grumman Aircraft, and illustrated in Fig. 2 [8, 9]. Equating velocities normal and transverse to the line of sight gives the following equations:

R(dφ/dt) = -V_m sin α + V_t sin β    (3.1)

-dR/dt = V_m cos α + V_t cos β    (3.2)
Fig. 2. Illustrative missile intercept problem.
From the geometry, the angles α and β are given by the following equations:

α = θ - φ    (3.3)

β = φ - γ + π    (3.4)

Let us assume a control equation with a second-order time lag given by

τ²(d²δ/dt²) + 2τ(dδ/dt) + δ = N(dφ/dt)*    (3.5)

where δ is a control surface deflection of the missile, (dφ/dt)* is an amplitude-limited value of dφ/dt, N is the proportional navigation constant, and τ is the control system time constant. The missile transfer function is assumed to be of second order, giving the following relationship between θ and δ, where ω and ζ are the natural angular frequency of the missile and the damping coefficient, respectively:

(1/ω²)(d²θ/dt²) + (2ζ/ω)(dθ/dt) + θ = kδ    (3.6)
A basic difficulty with an all-analog solution of this problem is the scaling. In this case, R is initially of the order of magnitude of 50,000 ft and approaches zero for a direct hit. Miss distances of the order of plus or minus two feet are desired. If we scale so that 100 volts represents R = 50,000 ft, R = 2 ft would correspond to 4 mv, which is well below the probable noise level in practical operation of the analog computer. Another variable which presents a scaling problem is dφ/dt, the rate of change of the line-of-sight angle. Initially, when R is large, dφ/dt is small, but it approaches a large value as R approaches zero. This does not offer any particular problem in the guidance equation loop, as this signal is limited. However, the true dφ/dt is required to compute the kinematics of the problem.

A disadvantage of a digital method of solution lies in the relatively long computing time required for each solution. If a parametric study or statistical analysis of the system is made, the number of solutions required for any one analysis may be very large.

Figure 3 shows how this problem would be solved in a combined system. The airframe and control equations are solved on the analog computer, since it can easily meet the precision and frequency requirements of these equations. The digital computer handles the equations involving R and dφ/dt, since the dynamic range required here is large. To handle the high frequencies involved in the control and aerodynamic equations, an all-digital solution would have to use about a 0.025-sec time interval for numerical integration until a range of about 100 ft is reached, and a sampling interval of 0.001 sec for the remainder of the flight. This would result in a running time of about eight times real time. However, in the combined analog-digital solution, since the dynamic equations are solved by the analog computer and the equations solved by the digital computer have parameters which do not vary rapidly, it is estimated that the solution could be run at about one-tenth real time. Thus a considerable time saving can be realized over an all-digital solution.
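As a purely illustrative check of the digital side, the line-of-sight kinematics, Eqs. (3.1)-(3.4), can be integrated directly. The speeds, the head-on geometry, and the fixed headings below are invented and carry none of the study's fidelity questions; the 0.025-sec step is the coarse interval quoted above:

```python
import math

VM, VT = 2000.0, 1000.0        # missile and target speeds, ft/sec (invented)
R, phi = 50000.0, 0.0          # initial range (ft) and line-of-sight angle
theta, gamma = 0.0, math.pi    # fixed missile attitude and target heading (head-on)
DT = 0.025                     # coarse integration interval, sec
t = 0.0

while R > 100.0:               # integrate until the close-in phase begins
    alpha = theta - phi                                          # Eq. (3.3)
    beta = phi - gamma + math.pi                                 # Eq. (3.4)
    phi_dot = (-VM * math.sin(alpha) + VT * math.sin(beta)) / R  # Eq. (3.1)
    R_dot = -(VM * math.cos(alpha) + VT * math.cos(beta))        # Eq. (3.2)
    R += R_dot * DT
    phi += phi_dot * DT
    t += DT

print(t, R)   # closure at the combined 3000 ft/sec closing speed
```

In the combined facility, of course, only the R and dφ/dt equations would run in the digital machine, with α and β supplied through the converters from the analog solution of the control and airframe equations.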
4. Analog-Digital Arithmetic in a Digital Computer
Combined analog-digital techniques have been devised to perform the arithmetic operations in a digital computer. One technique is based on the fact that a digital-to-analog converter is inherently a multiplier, and an analog-to-digital converter inherently a divider. For example, a digital-to-analog converter, shown diagrammatically in Fig. 4(a), multiplies the analog reference voltage (E) by the digital input (X) to produce an analog output signal (XE). Also, an analog-to-digital converter, shown diagrammatically in Fig. 4(b), divides the analog input (XE) by the analog reference signal (E) to produce the digital quotient (X). Figures 4(c) and 4(d) show
Fig. 3. Block diagram for combined solution. (The digital computer solves the R and dφ/dt equations; the analog computer solves the control and airframe equations; D-A converters are provided for plotting, under common system and conversion control.)
Fig. 4. Hybrid arithmetic: (a) digital-to-analog converter; (b) analog-to-digital converter; (c) multiplier; (d) divider.
how converters might be interconnected to perform digital multiplication and division, respectively. Hybrid arithmetic units based on converters, particularly as applied in control computers, are discussed in detail in references [4, 5, 10]. A system using hybrid arithmetic has been studied at the IBM Research Laboratories, Yorktown Heights, N.Y., and an arithmetic unit has been built to demonstrate the feasibility of the system. If scaling is arranged properly so as to operate as close to full scale as possible on the converters, precisions of 1 part in 10^4 to 1 part in 10^5 can be achieved, which is sufficient for many problems. The speed of calculation depends primarily on the speed of the conversion equipment, and an arithmetic unit of this hybrid type would have a speed advantage over a purely digital unit mainly in cases in which the arithmetic unit has many digital-analog components operating in parallel.

Another method of combining analog and digital techniques in the arithmetic unit of a digital computer, known as the "pulsed analog" technique, is under development at the MIT Electronic Systems Laboratory [11, 15, 22]. Figure 5 shows how the third-degree polynomial

y = Ax³ + Bx² + Cx + D

can be evaluated using a digital-to-analog converter, an analog-to-digital converter, three summing amplifiers, three multipliers, and four coefficient potentiometers. In this system the constants A, B, C, and D would have to be set up on analog coefficient potentiometers. Using typical components, the evaluation of the polynomial would require about 10 μsec. On the basis of a 10 μsec add execute time and a 25 μsec multiply execute time, the evaluation of the polynomial on a digital computer would require 115 μsec.

Fig. 5. Use of pulsed-analog components to evaluate the polynomial Ax³ + Bx² + Cx + D.

A second, more general pulsed-analog configuration is shown in Fig. 6. G1 and G2 are sample gates which sample their input voltages on command. G3 to G6 are storage gates which sample and hold their input voltages after disconnection from their inputs. In this case, the variable x and the constants A, B, C, and D must be obtained from the digital memory. The sequence in which the various sample and storage gates are activated and deactivated under control of the digital program is given in the figure.

Fig. 6. Alternative pulsed-analog configuration for polynomial expansion. (Program: 1. decode and store x; 2. decode and store A; 3. decode and store B, store Ax + B; 4. transfer Ax + B; 5. decode and store C, store Ax² + Bx + C; 6. transfer Ax² + Bx + C; 7. decode and store D, store y = Ax³ + Bx² + Cx + D. Seven steps, about 70 μsec.)

The computation time saved with the pulsed-analog technique depends, of course, on the amount of special purpose analog equipment one is willing to use, the frequency of oft-repeated computational sequences, and the time consumed in carrying out these sequences by digital means. The technique is expected to increase the effective computation speed of a digital machine by a factor of two or more, and is currently being applied in the design of a flexible operational flight trainer.

5. Systems Using Analog-Digital Variables
A somewhat different method of combining analog and digital techniques is to represent quantities not by numbers, as in the digital computer, or by electrical voltages, as in the electronic analog, but by the sum of a number and an electrical voltage, where full scale on the electrical voltage is equivalent to one unit in the least significant digit of the digital part of the number. For example, if we assume 100 volts in the analog part is equal to 0.001 in the digital part, a quantity which is represented digitally as the number 0.846753 would be represented in this combined system by the digital number 0.846 plus an analog voltage equal to 75.3 volts. An alternate representation would be a digital part of 0.847 and an analog voltage of -24.7 volts. Figure 7 shows how a sine wave of angular frequency ω would be represented in this hybrid system if a factor of 10 is carried digitally. It is obvious that to gain a given factor in precision, the rate of change of the analog voltage is increased by the same factor, and thus the basic limitation on such a combined system will be the bandwidth of the analog components. A computer based on this type of numerical representation is described by the author [35, 38], and some additional computing elements are described in reference [30]. It is similar in organization to an analog computer in that it consists of computer components such as integrators, multipliers,
summers, etc., interconnected in open or closed loops to solve a problem. Let us consider what form some of the computer components, such as an integrator and a multiplier, take in such a combined system. Assume we wish to obtain the following:

y = (1/T) ∫_0^t x dt    (5.1)

where x and y are functions of the time and T is the time constant of the integration. Let us also assume that the problem has been scaled so that the maximum value of all dependent variables will not exceed unity. Let each of the two dependent variables x and y consist of a digital part and an analog part, denoted by the subscripts D and A, respectively. Thus, we have

x = x_D + x_A    (5.2)

y = y_D + y_A    (5.3)

y = (1/T) ∫_0^t (x_D + x_A) dt.    (5.4)

Let us assume time to be divided into discrete equal intervals of duration Δt, and that the digital parts of x and y can change only at times which are integral multiples of Δt. We may then write for the value of y at a time t somewhere in the nth interval:

y = (1/T) [ Σ_{i=1}^{n-1} (x_D)_i Δt + (x_D)_n {t - (n - 1) Δt} + ∫_0^t x_A dt ]    (5.5)

where (x_D)_i is the value of x_D during the ith interval Δt. Figure 8 shows a curve of x as an arbitrary function of t. The area under this curve from t = 0 to any arbitrary t would equal the integral in Eq. (5.4) or the bracketed expression in Eq. (5.5). The first term in the bracketed expression, represented by area 1, is the integral of the digital part of x up to the time (n - 1) Δt. The second term, represented by area 2, is the integral of the digital part of x between (n - 1) Δt and t. The third term, represented by area 3, is the integral of the analog part of x from t = 0 to t.

Figure 9 is a block diagram of an integrator unit. It contains an input digital register x_D, a digital register R, two digital-to-analog converters, a conventional analog integrator, a special resettable analog integrator, an analog summer, and a comparator unit. The register y_D shown on the far right of the figure is the input register of the next component to which this unit might be connected in solving a problem. E is the analog reference voltage supplied to the digital-to-analog converters, and a is the digital equivalent of the reference voltage E. At the beginning of each Δt period, the values x_D and R are sampled and converted to analog voltages which are held constant during the period,
Fig. 9. Integrator unit. (Input register x_D, register R, two digital-to-analog converters, a resettable analog integrator with time constant Δt, a conventional analog integrator, an analog summer, and a comparator; ±a increments pass to the register of the following unit.)
unaffected by future changes in x_D or R which occur during the period. The value of x_D is then algebraically added to the R register. The voltage V_1, which represents that portion of the prior summation of (x_D)_i Δt which is of analog magnitude, is given during the nth interval Δt by:

V_1 = -ER_n    (5.6)

The voltage V_2, which provides integration of the current x_D value within the nth interval Δt, and which is reset to zero at the end of this interval, is given by:

V_2 = -(E(x_D)_n/Δt) {t - (n - 1) Δt}.    (5.7)
The voltage V_3, which results from the purely analog integration of the continuously varying analog part of x, is given by:

V_3 = -(E/Δt) ∫_0^t x_A dt.    (5.8)
These three voltages are added in the analog summer to give the voltage V. The analog part of the output of the integrator is equal to:

Ey_A/a = V = -(V_1 + V_2 + V_3).    (5.9)
If, at any time during a period Δt, the voltage V at the output of the analog summer exceeds a predetermined upper threshold, this is sensed by the comparator and, during the next Δt interval, immediately following the addition of x_D to R, unity is subtracted from the R register and a is added to the input register of the following unit (y_D in Fig. 9). Conversely, if the voltage V falls below a predetermined lower threshold, unity is added to the R register and a is subtracted from the input register of the following unit. It is easily shown that the time constant T of this integrator is equal to Δt/a.
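The increment-and-carry bookkeeping of Eqs. (5.5)-(5.9) can be sketched in discrete form. This is a simplification, not the circuit: x is not split into digital and analog parts, a unit reference voltage is assumed, and the comparator is reduced to a carry whenever the accumulated residue exceeds one increment:

```python
import math

A = 0.001     # the digital increment 'a'
DT = 0.001    # clock period, so the time constant T = DT / A = 1.0

def integrate(x_fn, n_steps):
    """Accumulate x in R-register units, carrying whole increments into y_D."""
    R = 0.0       # sub-increment residue, in units of one increment
    yD = 0.0      # digital part of the output
    for k in range(n_steps):
        x = x_fn(k * DT)
        R += x * DT / A          # this period's contribution, in increment units
        while R >= 1.0:          # comparator: carry an increment of a into y_D
            R -= 1.0
            yD += A
        while R <= -1.0:
            R += 1.0
            yD -= A
    return yD + R * A            # digital part plus the residual analog-magnitude part

result = integrate(math.cos, 1000)   # integrate cos(t) over one second with T = 1
print(result)                        # close to sin(1)
```

The carries never change the total yD + R·a; they only move completed increments from the analog-magnitude residue into the digital register, which is exactly the role of the comparator and the ±a transfers described above.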
A multiplier unit operating in this combined representation is shown in Fig. 10. Suppose we wish to obtain the product z = xy. Assuming as before that each variable consists of a digital part and an analog part, we have

z_D + z_A = x_D y_D + x_D y_A + x_A y_D + x_A y_A    (5.10)

where the subscripts D and A signify digital and analog parts, respectively. As seen from the figure, a multiplier unit has three digital registers, for x_D, y_D, and R; three digital-to-analog converters; an analog summer; an analog multiplier; and a comparator unit. At the beginning of each period Δt, the values of x_D, y_D, and R are sampled and converted to voltages which are held constant during the period. If, during the period, x_D receives an increment (or decrement) a from another unit, y_D is added to (or subtracted from) R; and if y_D receives an increment (or decrement) a from another unit, x_D is added to (or subtracted from) R. If both x_D and y_D change during Δt, the additions to R must either
Fig. 10. Multiplier unit. (Registers x_D, y_D, and R; three digital-to-analog converters; an analog multiplier; an analog summer forming V = -(V_1 + V_2 + V_3 + V_4) = -Ez_A/a; and a comparator with upper and lower threshold voltages.)
be done serially, using the new x_D or y_D obtained after each addition to R for the next addition to R, or some other system must be used to obtain a true digital product x_D y_D. The quantity x_D y_D can contain twice as many digits as x_D or y_D; the more significant part will be of digital magnitude and appear in z_D, the input register of the following unit; and the less significant part will be of analog magnitude and remain in the R register. The reference voltage E is applied to the digital-to-analog converter connected to register R, producing an output voltage V_1 = ER; the input voltage Ey_A/a is applied to the converter connected to the register x_D, producing an output voltage V_2 = Ex_D y_A/a; and the input voltage Ex_A/a is applied to the converter connected to register y_D, producing an output voltage V_3 = Ey_D x_A/a. An analog multiplier is connected to the two analog inputs Ey_A/a and Ex_A/a. Its output, attenuated by a, produces a voltage V_4 = Ex_A y_A/a. An analog summer sums the voltages V_1, V_2, V_3, and V_4 to produce a voltage V equal to -Ez_A/a. During the next Δt after the voltage V exceeds (or falls below) predetermined threshold voltages, unity is subtracted from (or added to) the R register and the number a is added to (or subtracted from) the input register of the following unit (z_D in Fig. 10). It should be noted that for small values of a the analog multiplier may be omitted, producing a maximum error of a. For values of a less than the resolution of the analog components, say 0.001 or less, this error is negligible. If one of the factors to be multiplied is a constant, the equipment required is simplified, since only one digital register needs to be capable of accepting increments, and the R register receives additions from only one other register. If the factor is a purely digital quantity, one of the digital-to-analog converters and the analog multiplier may be omitted.
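Eq. (5.10) and the effect of omitting the analog multiplier can be checked numerically. A sketch with arbitrary sample values (the 0.846753 of the earlier representation example, and an invented second factor):

```python
A = 0.001     # the digital increment 'a'

def split(v):
    """Split a value into its digital part (on the a-grid) and analog remainder."""
    d = int(v / A) * A
    return d, v - d          # remainder is smaller than one increment a

x, y = 0.846753, 0.372911    # arbitrary sample operands
xD, xA = split(x)
yD, yA = split(y)

full = xD * yD + xD * yA + xA * yD + xA * yA    # all four terms of Eq. (5.10)
without_mult = xD * yD + xD * yA + xA * yD      # analog multiplier omitted

print(abs(x * y - full), abs(x * y - without_mult))
```

The four-term sum reproduces the product exactly, while dropping the x_A·y_A term costs only the product of two sub-increment parts, each smaller than a, which is why the analog multiplier may be omitted for small a.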
Summing may most easily be done by permitting each integrator or multiplier unit to accept digital increments and analog voltages from several units. For example, in the integrator of Fig. 9, if the ±a increments from a number of other units are connected to its x_D register, and if the sum of the increments put out by these units is Na during any period, the increment in x_D would equal Na. The analog outputs from the other units would each be connected to an input summing resistor in the analog integrator. In the case of the multiplier unit, if the ±a increments from a number of other units are connected to its x_D register, and if the sum of the increments put out by these units is Na, y_D would be summed into the R register N times. The analog outputs from the other units would be connected to the inputs of an analog summer whose output would form the analog input Ex_A/a to the multiplier.
For a maximum speed-precision product to be obtained, the value of Δt should be as small as possible consistent with hardware limitations. The smaller Δt is made, however, the greater the bandwidth required in the operational amplifiers, since the analog voltages must be capable of a full-scale voltage excursion E during the time Δt. For any particular problem, the part of the variable to be carried digitally depends upon the particular compromise between precision and speed of solution desired. For example, consider integration of the function x = sin ωt and assume Δt = 0.001 sec and a = 0.001. Since the maximum rate of change of this function should not exceed a/Δt, the highest frequency representable at full-scale amplitude would be one radian per second, and the precision (assuming an analog precision of 0.001 of full scale) would be one part in one million. If we chose to carry only a factor of ten digitally (a = 0.1), the highest frequency representable at full-scale amplitude would be 100 radians/sec and the precision would be one part in ten thousand. It is seen that if analog components of sufficient bandwidth could be produced, the precision-speed product of this combined system should be greater than that possible with a parallel digital differential analyzer having equal-length digital registers and an equal iteration rate, by a factor equal to the resolution of the analog components, perhaps a factor of one thousand. It is believed that the greatest usefulness of such a combined system would be on simulation problems where the precision required is somewhat greater than that obtainable by analog methods and that also require the real-time speed of the analog computer. If the precision required is of the order of ten to a hundred times that obtainable by analog methods, the integrators and multipliers of the combined system would contain short digital registers, and only moderate requirements would be put on the speed of the switching circuits and the bandwidth of the analog components.

Bibliography

1. Bauer, W. F., and West, O. P., A system for general-purpose analog-digital computation. J. Assoc. Computing Machinery 4, No. 1 (1957).
2. Bauer, W. F., Aspects of real-time simulation. IRE Trans. on Electronic Computers EC-7, 134-136 (1958).
3. Baxter, D. C., and Milsum, J. H., Requirements for a hybrid analog-digital computer. ASME Paper No. 59-A-304, October (1959).
4. Birkel, G., Jr., Mathematical approach to hybrid computing. Proc. Natl. Symposium on Space Electronics and Telemetry, San Francisco, California, Paper No. 2.1, September (1959).
5. Birkel, G., Jr., Hybrid computers for process control. AIEE Transaction Paper No. 60-978, presented at the Joint Automatic Control Conf., MIT, Cambridge, Massachusetts, pp. 726-734, September (1960).
6. Birkel, G., Jr., Scaling and information transfer in combined analog-digital computer
systems. Proc. Combined Analog-Digital Computer Systems Symposium, Philadelphia, Pennsylvania, December (1960).
7. Blanyer, C. G., and Mori, H., Analog, digital, and combined analog-digital computers for real-time simulation. Proc. Eastern Joint Computer Conf., Washington, D.C., pp. 104-110, December (1957).
8. Burns, A. J., and Kopp, R. E., A communication link between an analog and a digital computer (data-link). Grumman Aircraft Engineering Corp. Research Report RE-142, October (1960).
9. Burns, A. J., and Kopp, R. E., Combined analog-digital simulation. Proc. Eastern Joint Computer Conf., Washington, D.C., pp. 114-123, December (1961).
10. Burns, M. C., High-speed hybrid computer. Proc. Natl. Symposium on Space Electronics and Telemetry, San Francisco, California, Paper No. 2.2, September (1959).
11. Connelly, M. E., Analog-digital computers for real-time simulation. MIT Final Report No. ESL-FR-110, June (1961).
12. Greenstein, J. L., Application of AD-DA verter system in combined analog-digital computer operation. Presented at the Pacific General Meeting of the AIEE, AIEE CP No. 56-842, June (1956).
13. Greenstein, J. L., A two-channel data link for combined analog-digital simulation. Presented at the AIEE Summer General Meeting, Montreal, Canada, AIEE CP No. 57-856, June (1957).
14. Hartsfield, E., Timing considerations in a combined simulation system. Proc. Combined Analog-Digital Computer Systems Symposium, Philadelphia, Pennsylvania, December (1960).
15. Herzog, A. W., Pulsed analog computer for simulation of aircraft. Proc. IRE 47, No. 5 (1959).
16. Horwitz, R. D., Testing systems by combined analog and digital simulation. Control Eng. 6, No. 9 (1959).
17. Hurney, P. A., Jr., Combined analogue and digital computing techniques for the solution of differential equations. Proc. Western Joint Computer Conf., San Francisco, California, pp. 64-68, February (1956).
18. Jackson, A., and Ottoson, H., Air traffic control system simulator. Proc.
Combined Analog-Digital Computer Systems Symposium, Philadelphia, Pennsyluania. December (1960). 19. Leger, R. M., Specifications for analog-digital-analog converting equipment for simulation use. AIEE Paper No. 56-860, June (1956). 20. Leger, R. M., and Greenstein, J. L., Simulate digitally, or by combining analog and digital computing facilities. Control. Eng. pp. 145-153, September (1956). 21. Leger, R. M., Requirements for simulation of complex control systems, Proc. First Flight Simulation Symposium. White Sands Proving Ground Special Report No. 9, September (1957). 22. Lee, R. C., and Cox, F. B., A high-speed analog-digital computer for simulation. IRE Trans. on Electronic Computers EC-8, No. 2 (1959). 23. McLeod, J. H., and Leger, R. M., Combined analog and digital systems-why, when, and how. Instr. & Automation 30,1126-1130, June (1957). 24. Neustadt, L. W., and Bekey, G. A., Combined simulation at STL. Proc. Combined Analog Digital Computer S p t e m s Symposium, Philadelphia, Pennsylvania. December (1960). 25. Nothman, M. H., Combined analog-digital control systems. Elec. M.fg. June (1958). 26. Palevsky, M., Hybrid analog-digital computing systems. Znstr. R. Automation October (1957).
298
HAROLD K. SKRAMSTAD
27. Paskman, M., and Heid, J., The combined analog-digital computer system. Proc. Combined Analog Cigital Ccmputer Systems Symposium, Philadelphia, Pennsylvania. December (1960). 28. Paul, R. J. A., and Maxwell, M. E., The genera! trend towards digital analogue techniques. Proc. Secondes Journ6es Intern. de Calcul Analogique, StrasbouTg pp. 403408, September (1958). 29. Peet, W. J., 11, Some aspects of computer linkage system design. Proc. Combined Analog Digital Computer Systems Symposium, Philadelphia, Pennsylvania. December (1960). 30. Schmid, H., Combined analog-digital computing elements. Proc. Western Joint Computer Conf., Los Angeles, California pp. 299-314, May (1961). 31. Shapiro, S., and Lapides, L., A combined analog-digital computer for simulation of chemical processes. Proc. Combined Analog-Digital Computer Systems Symposium, Philadelphia, Pennsylvania. December (1960). 32. Shumate, M. S., Simulation of sampled-data systems using analog-to-digital computers. Proc. Western Joint Computer Conf., S a n Francisco, California pp. 331-338, March (1959). 33. Skramstad, H. K., Combined andog-digital simulation of sampled data systems. Presented a t the AIEE Summer General Meeting, Montreal, Canada, June (1957). 34. Skramstad, H. K., Ernst, A. A., and Nigro, J. P., An analog-digital simulator for the design and improvement of man-machine systems. PTOC. Eastern Joint Computer Conf., Washington, D.C. pp. 9(t96, December (1957). 35. Skramstad, H. K., A combined analog-digital differential analyzer. Proc. Eastern Joint Computer Conf., Boston, Massachusetts pp. 94-100, December (1959). 36. Stein, M. L., A general-purpose analog-digital computer system. Proc. Combined Analog-Digital Computer S y s t e m Symposium, Philadelphia, Pennsylvania, December (1960). 37. Susskind, A. K., Notes on Analog-Digital Conoersion Techniques. Technology Press (M.I.T.) and Wiley, New York, 1957. 38. Urban, W. D., Hahn, W. R., Jr., and Skramstad, H. 
K., Combined analog-digital differential analyzer (CADDA). Proc. Combined Analog-Digital Computer Systems Symposium, Philadelphia, Pennsylvania. December (1960). 39. West, G. P., Computer control experience gained from operation of a large combined analog digital computation system. Proc. Joirt A I E E - I R E Symposium on “Computers in Control” pp. 95-97, May (1958). 40. West, G . P., Combined analog-digital computing system. Handbook of Automation, Computation, and Control, Vol. 2, Chapter 30. Wiley, New York, 1959. 41. Wilson, A. N., Recent experiments in missile flight dynamics simulation with the Convair “Addaverter” system. Proc. Combined Analog-Digatal Computer Systems Symposium, Philadelphia, Pennsylvania. December (1960). 42. Wilson, A. N., Use of a combined analog-digital system for re-entry vehicle flight simulation. Proc. Eastern Joint Computer Conf., Washington, D.C. pp. 105-113, December (1961). 43. Wortzman, D., Use of a digital-analog arithmetic unit within a digital computer. Proc. Eastern. Joint Computer Conf., N e w York, pp. 269-282, December (1960). 44. Wright, R. E., Analog-digital computer linkage system for Bendix G-15. Proc. Combined Analog-Digital Computer Systems Symposium, Philadelphia, Pennsylvania. December (1960).
Information Technology and the Law REED
C. LAWLOR
Patent lawyer, Los Angeles, California; Chairman, Electronic Data Retrieval Committee
of the American Bar Association
1. Introduction
2. Information Growth
3. Mechanization in Law Practice
4. Applications of Symbolic Logic to Law
5. Information Storage and Retrieval
6. Punched and Notched Cards
7. Prediction of Court Decisions
8. Thinking Machines
9. The Law of Computers
10. Use of Computers in Court
11. New Horizons
Bibliography
Exhibit "A"
1. Introduction
The information explosion that is occurring in other fields is also occurring in law. Today the lawyer must cope with more laws, more regulations, more problems, and more judicial and administrative bodies than ever before. His old methods of work are inadequate. He needs new tools. Many of the tools are here. But scientists and engineers must help him learn to use these tools.

The conjunction, "Information Technology and the Law," is commutative and therefore symmetrical. When the relation between information technology and the law is examined a little more closely, it is soon recognized that two oppositely polarized relations exist. We are not only concerned with how the new technology may be used in law, but also with how law applies to the new technology. Lawyers today are interested in making use of information technology in recording, classifying, analyzing, utilizing, and applying the law. In addition, lawyers are becoming concerned with the application of the common law, which has evolved through centuries of trial and error, and also modern statutory law, to problems that are being created by information technology. Some lawyers are turning to the computer and to modern mathematical science to help them in the solution of all their problems, both old and new. At the same time, the use of computers is creating new legal problems which can be solved best by an understanding of computers as well as an understanding of established legal principles. The purpose of this paper is to provide an overview of the activity occurring at the interface between information technology and law and to suggest which problems may be most significant from the standpoint of the lawyer, the courts, and the public as a whole. In this paper we will take some short excursions into a number of technical fields, including the fields of symbolic logic, information retrieval, prediction of decisions, and the law of computers.
2. Information Growth
The rate of growth of information is very similar in many fields. A survey of the magnitude of the task that confronts the legal profession has been given by Layman E. Allen et al. [4, 5]. That the growth of information in the field of law is similar to that in the fields of medicine and biology is apparent from Fig. 1. In this figure, the number of new periodicals first published in the respective fields in various years is indicated. In Fig. 2, the number of words in the federal statutes in force in various years is indicated [fir]. It is interesting to replot these data to semilogarithmic scales to ascertain to what extent the exponential law of growth applies. This has been done in Fig. 3. From Fig. 3 it is apparent that the growth of the number of periodicals is regular in the fields of medicine and biology, but erratic in the field of law. Nevertheless, the rate of growth in the number of periodicals is about the same in all three fields, and constant over long periods; i.e., growth in all three fields is truly very nearly exponential. But when we examine the number of words in the federal statutes, we find that the rate of growth is not constant but is increasing. We leave it to others to determine whether this is good or bad. But we wish to suggest that what America needs today may be heroic legislators and legal draftsmen who will spend their time repealing laws and reducing them to fewer words. For example, it is suggested that the number of words in the Internal Revenue Code could be reduced by at least 70% without loss of revenue or fairness.

There is, of course, a danger that when computer technology and other phases of information technology are applied extensively to the field of law, instead of making matters better, they may make them worse as far as quantity is concerned. Just as freeways result in inflation of traffic problems, there is danger that computers may inflate the information problem.

FIG. 1. Rates of growth of periodical literature in medicine, biology, and law.
This might arise because of the reliance that legislators and lawyers, and others, too, will place on the high speed and large storage capacity of computers. It is important, therefore, when considering the application of information technology to law, that some effort be made to restrain the tendency that one might otherwise have to use many words and to cover many contingencies unnecessarily. Unless this is done, the rate of growth of legal information may increase to such an extent that even the largest and highest-speed computers will not be able to cope with the information problems that computers themselves have encouraged. This warning against information inflation also applies just as well to fields other than law.

FIG. 2. Growth of federal statutes in force in the United States.
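The replotting test described above rests on a simple fact: exponential growth appears as a straight line on semilogarithmic axes, so fitting a line to the logarithms of the counts measures how nearly exponential the growth is. The sketch below illustrates the idea; the year and count figures are invented for the example and do not reproduce the data behind Figs. 1 to 3.

```python
import math

# Hypothetical periodical counts at various dates (illustrative only;
# not the actual data plotted in Fig. 3).
years = [1880, 1900, 1920, 1940, 1960]
counts = [50, 110, 240, 530, 1160]

# On semilogarithmic axes, exponential growth is a straight line:
# log(count) = a + b * year.  Fit the slope b by least squares.
n = len(years)
logs = [math.log(c) for c in counts]
xbar = sum(years) / n
ybar = sum(logs) / n
b = sum((x - xbar) * (y - ybar) for x, y in zip(years, logs)) / \
    sum((x - xbar) ** 2 for x in years)

# The fitted exponential rate implies a doubling time.
doubling_time = math.log(2) / b
print(f"growth rate {b:.4f} per year, doubling every {doubling_time:.1f} years")
```

A roughly constant slope b over the whole period corresponds to the "truly very nearly exponential" growth observed for the periodical literature; an increasing slope corresponds to the faster-than-exponential growth observed for the federal statutes.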
3. Mechanization in Law Practice
Though he was not the first to suggest the application of modern information technology to law, Dr. Lucien Mehl, the French jurist, set forth a nice prognostic summary of the possible use of automatic equipment in law in November 1958, in the following words [57]: "It may seem an ambitious step to try to apply mechanization or automation to the legal sciences. However, a machine for processing information can be an effective aid in searching for sources of legal information, in developing legal argument, in preparing the decision of the Administrator or Judge, and, finally, in checking the coherence of solutions arrived at."
FIG. 3. Growth of information in medicine, biology, and law.
Mehl suggested that two types of “law machines” might be developed: (1) “The documentary or information machine” and (2) “The consultation or judgment machine.” We will have occasion herein to refer to the application of computers to some of the problems mentioned by Mehl and
also to some of the more prosaic applications of modern technology to law. First we will consider the simplest. There was once a time when it was undignified to prepare legal documents with a typewriter. Greater respect was given to documents written in longhand than to documents written with the typewriter. This is still true about wills. The problems with which we are primarily concerned relate to the use of more modern mechanical devices to aid lawyers in service to their clients. Today the lawyer receives many of the benefits of the first industrial revolution in that mechanical means, including typewriters, dictating machines, and telephones, are accepted as ordinary tools of the profession. But few lawyers have utilized the fruits of the second industrial revolution in their practice. Today a few law offices have turned to data-processing systems to help them in the analysis of the internal activities of a law office and, more particularly, to prepare records of the time spent on legal research, in conference with clients, and in the study, analysis, and preparation of documents, in order to determine what is a fair charge to make for services rendered and also to increase the efficiency of operation of a law office [55]. Shell Development (Emeryville, California) utilizes computers to schedule the work of its Patent Department. Johnson and Johnson (New Brunswick, New Jersey) utilizes computers to evaluate and preserve its trademarks in various countries and to monitor its contracts respecting trademarks. Carl G. Paffendorf (New York) has developed computer programs which can prepare plans for estates, taking into account numerous aspects of the law and a client's financial and personal problems. Kenneth L. Black, of Auto-Typist Institute (Gainesville, Florida), now offers a service utilizing the Auto-Typist technique for composing legal documents that utilize standard paragraphs.
A few lawyers have turned to the use of automatic typewriters which use punched tape, either to reduce the expense of rewriting documents or to aid them in the composition of documents that make use of standard paragraphs. The list includes Homer Montague (Washington, D.C.), Scott D. Kellogg (Oakland, California), and Harold I. Boucher (San Francisco, California). Such methods will go into wider use when inexpensive silent typewriters capable of preparing machine-readable records are available. To the best of the author’s knowledge, no lawyer has yet used punched tapes or other machine-readable records of documents prepared by or for him for the production of indices to those documents. There is an unrecognized need for an inexpensive technique which can be used to index automatically, in alphabetical order, the significant terms appearing in legal
documents of all kinds, whether they be contracts, statutes, articles, depositions, or trial records. Very few lawyers realize that automatic indexing is even possible.
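The automatic indexing called for above, alphabetizing the significant terms of a document, reduces in its simplest form to tokenizing the text, discarding common "stop" words, and sorting what remains with a record of where each term occurs. The stop-word list and the contract fragment below are invented for illustration.

```python
import re
from collections import defaultdict

# A small illustrative stop-word list; a real indexer would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "by",
              "is", "are", "shall", "be", "said", "such"}

def build_index(text):
    """Return an alphabetical index: term -> sorted list of line numbers."""
    index = defaultdict(set)
    for lineno, line in enumerate(text.splitlines(), start=1):
        for word in re.findall(r"[a-z]+", line.lower()):
            if word not in STOP_WORDS:
                index[word].add(lineno)
    return {term: sorted(lines) for term, lines in sorted(index.items())}

# A fragment of contract-like text, invented for the example.
document = """The Lessee shall pay rent to the Lessor.
The Lessor shall maintain the premises.
Rent is due on the first day of the month."""

for term, lines in build_index(document).items():
    print(f"{term:<10} {', '.join(map(str, lines))}")
```

The same machine-readable record used to typeset a document could thus yield its index as a by-product, which is the unrecognized opportunity the text describes.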
4. Applications of Symbolic Logic to Law
The application of symbolic logic to law is not new. The very first problem solved by George Boole in his work, Laws of Thought, is a problem in Jewish dietary law. By means of the class calculus originated by him, George Boole solved the following problem [15]: "Define the classes of foods that exist and those that do not exist under a law that permits a person to eat only clean beasts, which are defined as those which both divide the hoof and chew the cud."
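Boole's problem can be worked in modern Boolean terms by enumerating the four elementary classes of beasts; the encoding below is a sketch in present-day notation, not Boole's own class calculus.

```python
from itertools import product

# Each beast either divides the hoof or not, and chews the cud or not,
# giving four elementary classes.  The law permits eating only "clean"
# beasts: those that both divide the hoof and chew the cud.
for divides_hoof, chews_cud in product([True, False], repeat=2):
    clean = divides_hoof and chews_cud
    print(f"divides hoof={divides_hoof!s:<5}  chews cud={chews_cud!s:<5}  "
          f"may be eaten: {clean}")
```

Of the four classes, only one (hoof-dividing cud-chewers) exists as permitted food under the law; the other three are forbidden, which is the class division Boole's calculus derives.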
In this work, Boole also showed that there is an interrelationship between his symbolic logic and the theory of probability. He applied his logic to the analysis of jury verdicts in the chapter entitled "On Judgments."

Edmund C. Berkeley, in an article published in 1937 dealing with "Boolean Algebra . . . and Applications to Insurance" [11], employed the class calculus to analyze problems involving the crediting of last payments made under inconsistent rules of an insurance company.

In about 1952, members of various committees of the American Bar Association began to exchange communications on the applications of symbolic logic to law. More particularly, early attempts were made by Miss Elizabeth Freret (Washington, D.C.), John E. Scheifly (Los Angeles, California), and Vincent P. Biunno (Newark, New Jersey) to apply both the class calculus and the propositional calculus to problems arising in the application of various sections of the Internal Revenue Code. Without doubt, others also have tried their hands at the art. This interest may have been stimulated, in part at least, by the work of Edmund C. Berkeley [11] and also by an article on the subject of symbolic logic by John E. Pfeiffer, published in Scientific American [HI]. In that article, Pfeiffer reported that symbolic logic had been used by a number of insurance companies in the analysis of war damage clauses and group insurance contracts. However, attempts to identify this earlier work have not borne fruit.

In 1956, Ilmar Tammelo proposed that the first-order functional calculus be applied to law [65, 15]. In 1961 Ward Waddell, a lawyer of San Diego, California, who never went past calculus in college mathematics, authored a fine work that also relates to the application of the first-order functional calculus to law, with particular emphasis on the Hohfeldian theory of
rights and duties [68]. So far as is known to the author, however, no one has succeeded in making use of the first-order functional calculus in the development of any new legal theory or in the solution of any legal problems.

Commencing in 1957, Layman E. Allen, a mathematician and Professor of Law at Yale University, published a series of articles [2-4] showing how symbolic logic could be applied to various problems, including the analysis of contracts, the organization of arguments, and legislative draftsmanship. He has concentrated (but not exclusively) on the development of techniques that avoid mathematical symbols, thus opening the door to the use of symbolic logic techniques by lawyers having a minimum background in mathematics. Some of his methods distribute unabbreviated natural-language sentences vertically on a page and use the expressions "or" and "and" to express logical connections, and horizontal lines extending across the page to express implication and coimplication relations. Allen has also described new techniques for diagramming the relationships of alternative subsidiary propositions that apply under the terms of a compound legal proposition [5]. Figure 4a represents a compound sentence that expresses many alternative subsidiary sentences, each of equal apparent validity, while Fig. 4b represents a compact diagrammatic representation of the various parts of that compound sentence, arranged in such a way as to facilitate the recognition of different subsidiary propositions (an ambiguity that requires a more complex symbolism has been omitted in this presentation). Notice the switch-circuit character of the diagram. Every possible set of connections that completes a circuit from one side of the diagram to the other corresponds to a different subsidiary proposition that may be valid under the rule of law expressed by the compound sentence.
In this particular case, the single compound sentence is capable of leading to at least twenty-four correct interpretations. Allen has also utilized similar diagramming techniques for analyzing syntactic ambiguities that arise because of the various meanings of the words "and" and "or" and the difficulties of interpretation created by uncertain association of modifiers with the terms modified [5]. A typical example involves the question as to what is meant by the following statement: "The general purposes of the provisions governing the definitions of offenses are: (a) To forbid and prevent conduct that unjustifiably and inexcusably inflicts or threatens substantial harm to individual and public interests; . . ."

In a survey made by Allen he found great disparity in the opinions of both lawyers and law professors as to what subsidiary purposes that can be composed from the foregoing predicate were purposes that were intended by the author of the statement.
"In all that concerns the right of acquiring, possessing or disposing of every kind of property, real or personal, citizens of the United States in Serbia and Serbian subjects in the United States, shall enjoy the rights which the respective laws grant or shall grant in each of these states to the subjects of the most favored nation."

FIG. 4. Analysis of compound proposition: (a) compound proposition; (b) switch-path equivalent with text.
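The count of at least twenty-four subsidiary propositions can be checked by enumerating the alternatives in the treaty sentence. The decomposition assumed here, three rights, two kinds of property, two classes of persons, and two tenses of "grant," is read directly from the sentence itself; each combination corresponds to one path through the switch-circuit diagram.

```python
from itertools import product

# Alternatives in each group of the compound treaty sentence.
rights = ["acquiring", "possessing", "disposing of"]
property_kinds = ["real", "personal"]
parties = ["citizens of the United States in Serbia",
           "Serbian subjects in the United States"]
grants = ["grant", "shall grant"]

# Each path through the switch-circuit diagram picks one alternative
# from each group, yielding one subsidiary proposition.
propositions = [
    f"In all that concerns the right of {r} {p} property, {who} shall "
    f"enjoy the rights which the respective laws {g} ..."
    for r, p, who, g in product(rights, property_kinds, parties, grants)
]
print(len(propositions))  # 3 * 2 * 2 * 2 = 24
```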
In July of 1958, Bryant M. Smith, writing anonymously, published an article in the Stanford Law Review [64a] on the subject of "Mens Rea and Murder by Torture in California." Though the body of this article was written in plain legal English, the footnotes were written in symbolic logic.

The author has developed a mathematical theory of patent claims [49]. He has pointed out that patent claims define classes of objects, processes, or compositions of matter. In his method the claims are rewritten, however, as logical products of propositions; and then the propositional calculus is employed to determine whether an object described in terms of the same propositions does or does not fall within the scope of the patent. This work has led to the development of a computer program for designing around a patent, whenever possible, while still complying with the specifications of a manufacturer.

A committee in the United States Patent Office has under consideration the adoption of a rule permitting claims to be rewritten as strings of sentences, instead of as long predicates composed of strings of clauses. The latter style of writing, which has been in use for many, many decades, has been condemned by Judge John R. Brown of the Fifth Circuit Court of Appeals [14]. The relative understandability of claims written in the current style compared with claims written as strings of sentences can easily be appreciated by the reader from the following examples. Under current practice, a patent on a simple, fictitious electric light (invented many years ago) might be written as follows:
I CLAIM:
1. An electric light including: a base having two mutually insulated electrical contacts mounted thereon; a transparent envelope rigidly connected to said base; and a coiled metal filament within said envelope, the two ends of said filament being electrically connected to said contacts.
2. An electric light including: a base, a transparent envelope sealed to said base, two electrical conductors extending through said base into the space enclosed by said envelope, and a coiled filament within said envelope.

The same invention could be just as adequately claimed in simple Rudolf Flesch grammar-school English, as follows:

I claim as my invention an object described in one of the following paragraphs:
1. The object is an electric light. The light includes a base. Two mutually insulated electrical contacts are mounted on the base. The light includes a transparent envelope. The envelope is rigidly connected to the
base. The light includes a filament. The filament is composed of metal. The filament is coiled. The two ends of the filament are connected to the contacts. The filament is within the envelope. 2. The object is an electric light. The light includes a base. Two electrical conductors extend through the base. The conductors extend into the space enclosed by the envelope. The light includes a transparent envelope. The envelope is sealed to the base. The light includes a filament. The filament is coiled. The filament is within the envelope.
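The author's method of rewriting a claim as a logical product (conjunction) of simple propositions can be sketched directly: a device falls within the claim exactly when every proposition of the claim is true of it, i.e., when the conjunction holds. The propositions below paraphrase claim 1 of the fictitious electric light; the accused devices are invented for the example.

```python
# Claim 1 of the fictitious electric light, as a set of propositions
# whose logical product defines the claimed class of objects.
CLAIM_1 = {
    "is an electric light",
    "includes a base",
    "has two mutually insulated contacts on the base",
    "includes a transparent envelope connected to the base",
    "includes a coiled metal filament inside the envelope",
    "filament ends are connected to the contacts",
}

def within_scope(claim, device_facts):
    """A device falls within the claim when the conjunction of the claim's
    propositions is true of it, i.e. the claim is a subset of the facts."""
    return claim <= device_facts

# Hypothetical accused devices.
bulb = CLAIM_1 | {"envelope is evacuated"}       # all claim facts, plus more
gadget = {"is an electric light", "includes a base"}

print(within_scope(CLAIM_1, bulb))    # True: falls within the claim
print(within_scope(CLAIM_1, gadget))  # False: claim elements missing
```

The "designing around" program mentioned above would, in these terms, search for a set of device facts that satisfies a manufacturer's specifications while falsifying at least one proposition of every claim.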
To the untrained eye of the layman, whether he be a non-patent lawyer, scientist, or judge, which type of claim conveys more clearly what is covered by the patent? Which is better: a short claim that is a complex predicate of a long sentence, or a long claim made up of a string of simple sentences?

The practical application of symbolic logic to law has been retarded partly because of the lack of training of lawyers in this field, partly because of the Tower-of-Babel, jungle-like, and sometimes contradictory varieties of notation employed in symbolic logic, which impede the study and learning of symbolic logic by nonmathematicians, and partly because the symbolic logicians, being primarily pure scientists, are more interested in extending and developing theories than in finding everyday uses for them. Lawyers need the help of logicians in pointing the way to learning the symbolic logic methods of solving their problems. To be most helpful, the logicians need to improve their methods of communication with the outside world. As a step in this direction, Layman E. Allen has invented a dice game that is used for teaching logic in Polish notation to six-year-olds and up [7], thus outdoing Lewis (Alice-in-Wonderland) Carroll [17], who aimed to make symbolic logic understandable to intelligent fourteen-year-olds and up. It also helps when authors illustrate principles of logic with law problems [11, 65], however simple they may be.

We have little knowledge as to the extent to which symbolic logic is being applied to law in various foreign countries. However, we do have information that the study of symbolic logic is required in the law schools of Poland [70]. In America, extensive courses in symbolic logic have been offered, so far as I know, only under the leadership of Layman E. Allen at the Yale Law School. Without doubt, symbolic logic is being studied by law professors and law students in many other law schools.
It is to be hoped that scientists and engineers will encourage interest in this field whenever the opportunity to do so is presented and that they will help lawyers who are eager to learn, but who lack advanced mathematical backgrounds. In this way, the disciplines of law and modern logic can join hands for the benefit of the public. But Boolean algebra is not enough. It is only an introduction. In order to apply symbolic logic effectively to law, one must go beyond Boolean algebra. One must utilize the propositional calculus and modal logic, particularly deontic and alethic logic. A great opportunity awaits the logician who can translate advanced logic into a language which can be used by ordinary people for solving everyday complex problems.

5. Information Storage and Retrieval
There is a great need for the use of computers for storing legal information, for finding relevant stored material, and for presenting it in a form most useful for lawyers. This need was first pointed out by Lewis O. Kelso [@] in 1946. He said: "Today the lawyer works substantially as he worked before the industrial revolution. Only automatic legal research will save him from plying one of the most confused, ill-paid, and unsatisfactory professions in the world of tomorrow."
In law, research means finding and analyzing the law applicable to a problem. The most important immediate problem which is of interest to the bar, to clients, and to courts, and which might be solved by computer technology, involves the development of more efficient methods for finding published decisions and statutes which are relevant to a particular set of facts, so that a lawyer can analyze these decisions and statutes and present them in an orderly fashion for a judge to use as a basis for arriving at a just decision in the particular case before him. In addition, our legislators are exceedingly interested in knowing what laws have been enacted which relate to different subjects, so that better laws can be prepared and so that superseded, inconsistent laws can be removed from the books.

In the United States there are fifty-two main governments: the federal government, the government of the District of Columbia, and fifty state governments. Under most of these, we have various county governments, city governments, boroughs, and others. In addition, each of these governments has many administrative agencies. The number of statutes and regulations to which individual citizens are subject, and to which professional men in various fields are subject, is increasing more than exponentially. Perhaps we can improve our legislative method some day so as to reduce the number of statutes and regulations. But for the present, it looks as if there will be continuous growth of such statutes and regulations. For this reason today's problem is to develop improved techniques for finding which statutes, regulations, and cases are applicable to particular sets of facts or particular individuals or groups. There are approximately 29 million appellate decisions published in
the United States. Some of these are Supreme Court decisions which apply to all of us. Some of them are decisions of local courts, either federal, state, county, or city, that apply to some of us in particular geographical areas. Because of the origin of laws in various areas of the country and because of the precedents which are at the foundation of the decisions in different areas (such as the New England area, the South, and the West), the rules of law applicable in these different areas to the same situation are sometimes very different; in some cases, they are almost contradictory. To eliminate the apparent contradiction between the rules of law in different jurisdictions may be impossible. But knowledge of the laws of other jurisdictions is helpful to courts in applying principles of law to particular sets of facts.

Today it is often like looking for a pearl on a pebbly beach when a lawyer goes to the law library to find a case which would be recognized as a precedent in a particular jurisdiction and which applies to the particular set of facts in his client's case. Often the lawyer can't even tell whether there is a pearl among the pebbles. He is forced to sample the pebbles manually by picking them up in small batches and to study many of them carefully with his legal eye before he dares conclude whether or not there is a pearl and, if so, whether it is black, or white, or grey.

In practice, the presentation of legal arguments involves the citation of cases which, it is believed by the lawyers, should be considered by a judge in arriving at a decision. Finding all applicable cases and all applicable statutes by manual methods and selecting representative cases to cite is difficult and costly. A properly designed and operated machine-processing system can locate a pearl on a pebbly beach with greater certainty than can a lawyer, and with the expenditure of less lawyer effort per pearl, and for fewer client dollars per pearl.
This is especially true when many different searchers are trying to locate different pearls on the same beach and where every searcher can make use of the same pearl if it applies to his problem.

The first suggestion to apply mechanical searching systems to the field of law seems to have been made by Lewis O. Kelso [@]. Stimulated by the earlier suggestions of Dr. Vannevar Bush to mechanize searching methods in scientific fields, Mr. Kelso proposed the use of a "Law-dex" system similar to the present Minicard system and Filesearch as early as 1946. In about 1955 Vincent P. Biunno, of the New Jersey Law Institute, proposed the use of a continuously moving tape for making simultaneous searches in the field of law. In the system suggested by Biunno, all information that someone might want to retrieve would be recorded on tape, and this tape would be moved continuously past a number of readout stations. This would make possible the simultaneous solution of a number of law problems for different lawyers, possibly greatly reducing the cost per
search. Chernerin [see 44] suggested a number of years ago that perhaps a good way to distribute and retrieve information might be to transmit it continuously on a radio beam, so that selected parts of the information could be retrieved by those in need of it at various receiving stations. While the suggestion was made in connection with other problems, this method could also be applied to law, at least theoretically. The efficiency and effectiveness of any such system could be increased by employing a large number of carrier waves simultaneously, so that the cycling period for the broadcast of all decisions could be shortened. But it may be more economical to duplicate magnetic tapes or other recordings bearing legal information and fly them to various computer centers.

The problem of storage and retrieval of legal information divides itself into a number of steps:

(1) The selection and analysis of the information to be stored.
(2) Selection and application of the coding process, if any is to be used.
(3) Selection and application of the storage method.
(4) Analysis and preparation of questions to be answered in machine language.
(5) Retrieval of the desired information.
(6) Display of the retrieved information.
(7) Marketing of services.
The preparation of the legal information that is to be stored involves both the selection of the material and its analysis in preparation for storage. One basic question that needs to be answered when one considers how to store legal information is whether or not the full text should be stored verbatim. By full-text verbatim storage is meant placing every word of a decision or statute on tape in natural language, letter for letter, in alphanumeric code. Verbatim storage has a great advantage: if the full text is stored, mistakes of omission and mistakes of commission (prejudice, if you please) that might otherwise be made by predigestion of the law prior to storage are avoided. On the other hand, storage of full texts requires greater storage capacity and longer search time. Full-text storage is similar to color photography: it contains much more information than a precoded analysis of the same material. For many purposes color photography is superior to black and white photography, especially if definition has been lost in a coarse process such as stippling. Besides verbatim storage of full text, it is also possible to store digests verbatim, whether they are prepared by professional digesters or by automatic abstracting methods. Another method is
INFORMATION TECHNOLOGY AND THE LAW
to prepare lists of key words which characterize the material in question. Another method still is to prepare concept profiles based upon a limited selected list of words [&?]. The application of electronic data-processing systems to legal research was successfully demonstrated publicly for the first time in the summer of 1960 [37]. These demonstrations, which were conducted at the American Bar Association Convention, were sponsored by the Electronic Data Retrieval Committee of the Bar Activities Section of the American Bar Association, the Health Law Center of the University of Pittsburgh, and the United States Patent Office. They fulfilled an eight-year-old dream of the Electronic Data Retrieval Committee of the American Bar Association. At these demonstrations, two types of IBM computers were employed for storage and retrieval of legal material, and information of four different kinds was stored. On the IBM 650, the following three types of information were stored:
(1) The full texts of all of the hospital law statutes of all the states were stored verbatim, together with some labor law material furnished by Lawyers’ Cooperative.
(2) Key words characterizing the facts present in all decided food-product liability cases were stored.
(3) The digests of all the oil and gas law cases of the past two years were stored verbatim.
This application of the IBM 650 was developed at the Health Law Center of the University of Pittsburgh. John F. Horty, Director of the Health Law Center, and his associates at the University of Pittsburgh deserve credit for being the first to apply electronic data-processing systems successfully to law problems and for demonstrating this work publicly. They applied it to the hospital law statutes. The digests of the food-product liability cases were provided by F. Reed Dickerson, Professor of Law, Indiana University School of Law. The digests of the oil and gas cases were provided by Robert A. Wilson, Vice-President of the Southwestern Legal Foundation, of Dallas, Texas [37]. In addition, at the ABA Convention key word digests of headnotes of all of the design patent law cases which had been decided in the previous twenty years were stored on magnetic discs of the IBM RAMAC 305. These digests were supplied by the Bureau of National Affairs. The preparation of the material and the demonstrations were made under the direction of Donald D. Andrews, then Director of Research and Development of the United States Patent Office [s].
The material thus stored on magnetic tapes and magnetic discs was
retrieved in response to logic equations in which the terms represented different English words that had been stored. The stored material was also employed to prepare indices of the text. There are approximately 430 hospital law statutes, and each was assigned a different document number. The computer indexed the stored material automatically. Table I represents a portion of the word index, or dictionary, produced automatically. Here each word is listed in alphabetical order, and opposite each word is a list of the document numbers assigned to the different statutes in which it appears. Words such as

TABLE I. PARTIAL AUTOMATIC INDEX TO HOSPITAL LAW STATUTES BY JOHN F. HORTY

IMPRISONED, IMPRISONMENT, IMPROPER, IMPROVED, IMPROVEMENT, IMPROVEMENTS, IMPROVING, INC., INCAPABLE, INCAPACITY, INCIDENT, INCIDENTAL, INCLUDE, INCLUDED, INCLUDES, INCLUDING, INCLUSION, INCLUSIVE, INCOME, INCOMPETENT, INCONSISTENT, INCONVENIENCE, INCORPORATED, INCORPORATION, INCORPORATORS, INCREASE, INCUMBERED, INCURRED, INCURRING, IND

[Each entry in the original is followed by the document numbers of the statutes containing the word; the number columns are too badly reproduced in this copy to be restored.]
“the,” “of,” “and,” and the like, which have no index value, were excluded automatically. Table II represents a list of statutes which was produced by the IBM 650 in response to the question:
“Please supply me with a list of statutes and copies of statutes dealing with restrictions upon remuneration for officials of charitable corporations.” Table III shows the text of several of the statutes actually supplied in response to this same question. In order to retrieve such information, it is necessary to prepare a logical expression relating the various terms that are in the dictionary (Table I) with the terms that are in the question. Such an expression is represented in Fig. 5.

LAWYER’S QUESTION
Please supply me with a list of statutes and copies of statutes dealing with restrictions upon remuneration for officials of charitable corporations.

COMPUTER QUESTION
(a + b + c + d + e + f)(g + h + i + j + k + l)

a = trustee      g = compensation
b = trustees     h = compensations
c = officer      i = salary
d = officers     j = salaries
e = director     k = remuneration
f = directors    l = wages

FIG. 5. Type of question employed to retrieve hospital law statutes.

Here it will be noted that the question resembles a formula of symbolic logic. Actually, the expression is the logical product of two terms, each of which is a logical sum of several key words. The sign “+” means “or”; the juxtaposition of the two parenthesized groups represents “and.” The computer is programmed to identify and retrieve each document that contains one or more words in the first parenthetical expression and one or more words in the second parenthetical expression. The preparation of the logical expression depends, of course, upon the type of retrieval technique employed. While methods could be employed for retrieving information according to word roots or by coded terms, the method employed here was one of selecting statutes which included words identical with those in the logical expression. Whenever such a matching technique is required, as in this case, it is necessary to take this fact into account in the preparation of questions, so as to include in each logical
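In modern terms, the matching performed by the machine can be sketched as follows. This is an illustrative reconstruction only: the question groups are those of Fig. 5, but the statute texts and document numbers are hypothetical.

```python
# Sketch of the Fig. 5 retrieval scheme: a document is retrieved when it
# contains at least one word from EACH parenthesized group, i.e. the
# logical product of logical sums.  Statute texts here are invented.

def matches(document_words, question):
    """question is a list of groups; each group is a set of alternative words
    (a logical sum).  all(...) forms the logical product of the groups."""
    return all(group & document_words for group in question)

# The computer question of Fig. 5:
question = [
    {"trustee", "trustees", "officer", "officers", "director", "directors"},
    {"compensation", "compensations", "salary", "salaries",
     "remuneration", "wages"},
]

statutes = {  # document number -> text (hypothetical)
    147: "no trustees of the corporation shall be entitled to any compensation",
    201: "the hospital shall maintain sanitary conditions at all times",
}

retrieved = [num for num, text in statutes.items()
             if matches(set(text.split()), question)]
print(retrieved)  # -> [147]
```

Document 147 supplies “trustees” from the first sum and “compensation” from the second, so it is retrieved; document 201 supplies neither and is passed over.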
TABLE II. PARTIAL LIST OF HOSPITAL LAW STATUTES RETRIEVED IN RESPONSE TO QUESTION IN FIG. 5

CONN. GEN. STAT. =S 12-81 /1958/.
ME. REV. STAT. ANN. C. 91-A, =S 10 /SUPP. 1957/.
N.Y. TAX LAW =S 4.
TENN. CODE ANN. =S 67-502 /1955/.
[A fifth entry, a Wisconsin statute /1959, as amended/, is illegible in this copy.]

TABLE III. PARTIAL TEXT OF HOSPITAL LAW STATUTES RETRIEVED IN RESPONSE TO QUESTION IN FIG. 5

NEB. REV. STAT. =S 21-1503 /REISSUE 1954/.
=S 21-1503. STOCK, DIVIDENDS, SALARIES, PROHIBITED. SUCH /NONPROFIT/ CORPORATION SHALL HAVE NO CAPITAL STOCK, AND SHALL PAY NO DIVIDENDS OR SALARIES TO ITS INCORPORATORS OR BOARD OF DIRECTORS.

NEV. REV. STAT. =S 81.310 /1957/.
=S 81.310. POWERS OF CORPORATION. THE /NONPROFIT/ CORPORATION SHALL, AS AN INCIDENT OF ITS PURPOSE AND WITHOUT ANY NECESSITY FOR EXPRESSING THE SAME IN ITS ARTICLES OF INCORPORATION, HAVE THE FOLLOWING POWERS, WHICH IT MAY EXERCISE IN FULL MEASURE WITHOUT THE NECESSITY OF OBTAINING ANY ORDER OF COURT BY AUTHORIZATION, APPROVAL OR CONFIRMATION ... 15. TO APPOINT AND PAY OFFICERS AND AGENTS TO CONDUCT AND ADMINISTER THE AFFAIRS OF THE CORPORATION, BUT NO MEMBER OF THE BOARD OF TRUSTEES SHALL ...

NEV. REV. STAT. =S 85.050 /1957/.
=S 85.050. TRUSTEES NOT TO RECEIVE COMPENSATION. EXCEPTION. NO TRUSTEES OF THE /NONPROFIT/ CORPORATION SHALL BE ENTITLED TO ANY COMPENSATION EXCEPT UNDER SOME SPECIAL EMPLOYMENT OF THE BOARD, OR AUTHORITY EXPRESSED IN THE ORIGINAL DEED OR INSTRUMENT OF TRUST.

WYO. COMP. STAT. ANN. =S 44-1009 /1945/.
=S 44-1009. OFFICERS RECEIVE NO SALARIES. NO OFFICER OF ANY CORPORATION FORMED UNDER THE PROVISIONS OF THE 9TH SUB-DIVISION OF =SECTION 1 /=S 44-1001, HOSPITALS AND .../ SHALL RECEIVE ANY SALARY OR REMUNERATION FROM SUCH ...
sum (...) all different spellings of a word found in the dictionary and, likewise, all different grammatical forms and inflections, such as singular and plural forms, variations due to differences in case endings, declension, conjugation, and the like, and even different parts of speech involving the same concept. Furthermore, antonyms and synonyms must be grouped together, because negatives and double negatives might otherwise cause the retrieval system to overlook important documents. For example, a search of all statutes that include the term “sanitary” would fail to recover documents that include the phrase “not unsanitary.” For this reason the search must be made for the logical sum of the two words “sanitary” and “unsanitary.” Richard F. C. Hayden (former chairman of the Electronic Data Retrieval Committee of the American Bar Association and now judge of the Superior Court in Los Angeles, California) has suggested that words that must be considered equivalent for search purposes be called “searchonyms,” regardless of whether they are synonyms, antonyms, homonyms, or misspellings. One advantage of recording the full text without predigestion by an abstracter is that the stored record may then be examined at any later time by a person who has a different interest or a different point of view from the one who prepared the abstract. This has now been done with all the statutes of Pennsylvania. Table IV, for example, shows a small portion of a list of the Pennsylvania statutes which was retrieved in response to the question: “Where does the word ‘patent’ appear in the Pennsylvania statutes?” A study of these statutes reveals some that deal only with land patents, some that deal only with patents on inventions, and some that deal with so-called patent medicines. This simple question could be answered by reference to an automatically generated dictionary.
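The grouping of searchonyms into a single logical sum before searching can be sketched in the same spirit. The searchonym table and document contents below are hypothetical illustrations, not Hayden's actual lists.

```python
# Illustrative "searchonym" table: every word that must be treated as
# equivalent for search purposes, including inflections, misspellings,
# and even antonyms (so that "not unsanitary" is not overlooked), is
# mapped to one search group forming a single logical sum.
SEARCHONYMS = {
    "sanitary": {"sanitary", "unsanitary", "sanitation"},
    "salary": {"salary", "salaries", "compensation", "wages"},
}

def expand(term):
    """Return the logical sum of searchonyms for a query term."""
    return SEARCHONYMS.get(term, {term})

def search(term, documents):
    """Retrieve every document containing any searchonym of the term."""
    group = expand(term)
    return sorted(num for num, words in documents.items() if group & words)

docs = {  # document number -> set of indexed words (hypothetical)
    12: {"premises", "not", "unsanitary"},
    35: {"premises", "sanitary"},
    77: {"hospital", "license"},
}
print(search("sanitary", docs))  # -> [12, 35]
```

Without the expansion, document 12 (“not unsanitary”) would be missed by a literal search for “sanitary,” which is precisely the failure the text describes.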
But more advanced computer programs now undergoing development at the University of Pittsburgh may be used to select one of these groups of laws in preference to the others. Since the machine has difficulty with negatives, however, it may be difficult to search for patents that are not land patents; how soon searching techniques can be developed that are free of this limitation can only be guessed. A question seeking hospital law statutes dealing with copyrights quickly brought forth the answer, “None.” How long would it have taken a lawyer interested in this question to convince himself that there was no pearl on the beach if he had worked without a computer? A telephone call to a law computer center could give him the answer with little effort on his part and at much lower cost to a client. Some comparative tests of the speed and reliability of lawyers and computers in finding the law would be desirable. In the demonstrations that were made at the American Bar Association
TABLE IV. PARTIAL LIST OF PENNSYLVANIA STATUTES INCLUDING THE WORD “PATENT”

[A keyword-in-context listing produced at the University of Pittsburgh: each line of the original shows the context preceding the keyword, the keyword PATENT itself, the context following it, and the citation (PA. STAT. ANN., title and section) of the statute in which it appears. The entries cover, among others, letters patent of associations and corporations, land patents of the Commonwealth, and so-called patent medicines. The columns of the original are too badly reproduced in this copy to be restored.]
Convention in 1960, English words were recorded in alphanumeric form. No coding was employed, and only a few standard abbreviations were used. A question might arise as to the desirability of employing coding. Certainly, one advantage of using code symbols instead of English words is that less total storage capacity would be required. However, unless special precautions are taken for mechanical translation of the coded expressions into English words during the retrieval or reproduction process, the reader supplied with an answer in code would be required to interpret not only the law but also the code symbols. The resultant loss of time might be more costly than the additional cost required to store and retrieve information in English-word form. A modified system might utilize information stored in two forms, one in code and one in English. The coded material could be searched to produce code answers that are stored temporarily on a record which is then used in conjunction with the English-word record to write out an answer in English-word form. This method would reduce the needed storage capacity and also the search time where a large body of law is involved, and yet meet the requirement of supplying answers in language already familiar to lawyers; but it would require prescience to anticipate questions far in advance of the occurrence of the problems that suggest them. At the 1961 Annual Convention of the American Bar Association, an application of computers to the retrieval of legal information was demonstrated by Robert A. Morgan, of Oklahoma State University [60]. This demonstration was made on an IBM 1401. The material selected consisted of regulations of the Internal Revenue Code relating to gift tax law and recent cases applicable to that field of law. In this case, the material was analyzed and coded according to points of law.
The searching technique involved locating cases which involved all the points of law of interest to the interrogating lawyer. In this case, the decisions were printed out at a rate of 600 lines per minute. It was interesting to watch lawyers queue up to present their questions and then step over to the printer a few minutes later to receive the printed solution. A printer that prints material at the rate of twenty legal pages a minute could not help but impress lawyers who have difficulty getting that much material typed in a day. Under some conditions it may be desirable to store information in an entirely different form than on magnetic tape or magnetic discs. The Recordak or Minicard system would be just as suitable, and perhaps more economical, where searches are made by means of code symbols. In such a system, coded representations of the key words and concepts are recorded photographically along with the pages of the text to which they apply. By searching the photographic records electro-optically, documents to be studied further can be selected and either projected onto a screen for viewing or printed out for delivery to a lawyer at a remote point.
All of the Pennsylvania statutes have now been recorded on magnetic tape at the Health Law Center of the University of Pittsburgh. Studies of the context and structure of these statutes are being made, and methods for searching the statutes to find material pertinent to lawyers’ questions are undergoing development. Two simple, interesting observations regarding these statutes are in order. All in all, there are about six million English words in these statutes, and approximately half of them are words, such as “the,” “and,” “but,” “of,” “a,” and “is,” which are of little indexing value. A total of only 26,000 different English words, including all inflections thereof, is found in those statutes [M]. As many as 500 terms may be used for searching simultaneously, whether the 500 words form one logical question for one search or several logical questions for several searches. The time required for searching the entire body of statutes with respect to a single question is approximately one-half hour; in that half hour all the sections referring to a particular subject may be located. The average time required per search is reduced somewhat as the number of searches being made simultaneously is increased. Investigations are being made under the direction of Robert A. Wilson at the Southwestern Legal Foundation into methods for storing and retrieving legal decisions on a large scale. Additional studies of all possible methods of retrieval of statutory law are being made at the American Bar Foundation under the direction of William B. Eldridge. Other projects are under way in other sections of the country. At the present time the bottleneck in legal retrieval is the manual keypunching of the cards that are fed to the machines during the storage process.
It is to be hoped that in the near future someone will supply the legal profession with a machine which will read its old books photoelectrically and record the information directly onto a storage medium which can then be read automatically for retrieval purposes. As presently contemplated, the recording will be in alphanumeric form; in other words, the records will be in the form of dots and blanks. But there is no technical reason why images of the type cannot be recorded directly, albeit in microscopic form, and then searched directly by means of pattern-recognizing machines. At the present time the plan is to present questions in alphanumeric form; but there is no technical reason why a machine could not be built that would respond to an oral question. Though it may sound fantastic in the present state of the art, the day may even come when a machine will print out the text of all relevant material in response to an oral question
asked by a man with an accent, and the answer may even be supplied orally with the same accent. In any event, it is suggested that for the immediate future it may be far more economical to record printed information in full text rather than to spend time abstracting or digesting cases in order to abbreviate the material prior to storage. Besides recording English words or coded expressions corresponding to those words, it may also be possible to describe the concepts of various cases or parts of cases by means of a new language that has been developed by Martin Kalikow of General Electric. Kalikow has developed a list of terms which can be used to express the concept of any word in a limited dictionary. The total number of terms employed by him is only about 450. Each word in the dictionary is, in effect, represented by the logical product of a selected few of the 450 terms. Kalikow has used this dictionary in indexing the broadest claims of 6500 patents. In effect, therefore, since only 450 terms are involved and the concept profile of each claim is represented by the logical product of the equivalent terms in his language, the concept of each of these broadest claims can be punched onto a single 90-column card with room to spare; and these cards may then be searched to locate patents that should be studied to determine which patents are infringed by a particular device. The probability of false dropouts when using this system is negligible. The Patent Office searching process is by far the largest information retrieval system in daily use in the world. At the present time, this work is done almost entirely by manual techniques. The Research and Development Division of the United States Patent Office is in the process of developing and testing various techniques which can be employed to mechanize a large part of this work. Whatever can be done in the field of patents can also be done in the field of law.
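Kalikow's concept profiles lend themselves to a simple sketch: treat each of the roughly 450 terms as one punch position, a claim profile as the set of its punched positions, and a search as a test that every term of the query concept is present in the profile. The vocabulary, profiles, and patent numbers below are invented for illustration.

```python
# Sketch of concept-profile matching in the spirit of Kalikow's scheme:
# each patent claim is profiled as a subset of a small fixed vocabulary
# (one card column per term), and a query concept matches a claim when
# every one of its terms appears in the claim profile (subset test).
# The vocabulary, profiles, and patent numbers here are hypothetical.
VOCABULARY = ["rotate", "fasten", "electrical", "fluid", "measure", "heat"]
TERM_BIT = {term: 1 << i for i, term in enumerate(VOCABULARY)}

def profile(terms):
    """Encode a set of terms as a bit mask, one punch position per term."""
    mask = 0
    for t in terms:
        mask |= TERM_BIT[t]
    return mask

claims = {  # patent number -> profile of its broadest claim
    2950001: profile({"rotate", "fasten", "electrical"}),
    2950002: profile({"fluid", "measure"}),
}

query = profile({"rotate", "electrical"})
# A claim matches when its profile contains every bit of the query.
hits = [pat for pat, p in claims.items() if p & query == query]
print(hits)  # -> [2950001]
```

Because a fixed 450-term vocabulary needs only 450 punch positions, one card per claim suffices, which is the point of the 90-column-card remark above.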
There are about 2,500,000 published decisions and approximately 3,000,000 patents; the bulk of the material involved in the two fields is of the same order. While the Patent Office is experimenting with the recording of full text, most of its efforts in the past have involved the use of patent examiners to prepare digests of patents in terms of coded language. At the International Patent Office Symposium on Information Retrieval held in Washington, D.C., in October, 1961, it was pointed out that approximately five or six hours are required to analyze the average patent in this way. Dr. C. S. de Haan, of The Netherlands Patent Office, pointed out that, at the present rate, all of the United States patents could be analyzed in twenty years, provided that the Patent Office continued with its present staff but did no other work during that period [20]. This over-all analysis alone demonstrates the importance of developing automatic methods for recording printed text in machine-readable form without predigestion of the text
material. Not only would such an automatic recording system accelerate the work of translating printed law libraries into machine-readable form expeditiously and, perhaps, economically, but it would also record the text in such a way that any part of the material could be located and reproduced automatically by a skillful programmer. As all of these retrieval techniques are developed and improved, and as their practicability is demonstrated in other fields, it is to be hoped that those who work in these fields will bear in mind that lawyers as a whole probably spend more time retrieving information from their libraries than the members of any other single profession except patent examiners.
6. Punched Cards and Notched Cards
The British Patent Office has used punched cards for searching since 1905. These punched cards are about 12” x 18” and have 4800 punch positions. More recently, representatives of many other patent offices have been making use of punched cards in connection with the searching of patents [go]. For many years Universal Studios has indexed the plots of its stories on punched cards. In 1957, F. B. McKinnon and his associates at the American Bar Foundation experimented with the use of punched cards to compare statutes of various states with corresponding statutes of the State of Illinois [56]. Richard F. C. Hayden, attorney of Los Angeles, demonstrated the application of edge-notched cards to the indexing of depositions at the 1958 convention of the American Bar Association. Roy N. Freed [32] has described how punched cards may be employed in the analysis of cases involving a large number of exhibits, such as occur in antitrust cases. William Cochran, one of the law examiners of the United States Patent Office, has used edge-notched cards to index decisions in the field of patent law. The author has used edge-notched cards to index cases in the preparation of briefs and in the analysis of decisions. It is thus seen that some storage and retrieval methods are available which can be of help to lawyers now. Lawyers do not need to wait for large computer centers to come to their aid, and the experience they gain today with such a simple system as edge-notched cards will help them tomorrow in their use of more advanced systems. More sophisticated peekaboo systems are also available. At the National Bureau of Standards a novel peekaboo system has been
developed. In this system, peekaboo cards are punched that correspond respectively to terms that characterize individual documents. Copies of the documents are recorded on microfilm, and the microfilm images are mounted on a drum. When documents referring to particular combinations of terms are to be located, the corresponding peekaboo cards are placed on a viewing plate in front of the drum. Two cross-hairs are manually brought to intersect at a point where holes of the cards coincide. This automatically rotates the drum to bring the image of the corresponding document into position in a projector, which causes an enlarged view of the document to appear on a screen in front of the user. If the document is of interest, a copy of it is produced automatically merely by operation of a lever. Since this system can record as many as 50,000 pages of documents, it offers possibilities for the recording of about one hundred volumes of 500-page law books. However, it suffers from the disadvantage that the text must be analyzed in terms of a code. Jonkers Business Machines (Gaithersburg, Md.) is also developing an application of a peekaboo system to a restricted field of law. The Jonkers system employs a series of peekaboo cards that are placed one upon another on a light table. The points of coincidence of holes on all the cards are used in a conventional manner to identify the corresponding decisions that involve all of the factors represented by the stack of cards.
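The logic of a peekaboo search (light passes only where every card in the stack has a hole) is simply set intersection, and can be sketched as follows; the terms and document numbers are hypothetical.

```python
# A peekaboo card is, in effect, a bitmap over document positions: one
# hole per document in which the card's term occurs.  Stacking cards on a
# light table and noting where light passes through computes the
# intersection, i.e. the documents containing ALL of the chosen terms.
# The terms and document numbers below are hypothetical.
cards = {
    "hospital":   {3, 8, 14, 21},
    "license":    {3, 14, 40},
    "revocation": {14, 21, 40},
}

def coincident_holes(terms):
    """Documents whose holes line up on every selected card."""
    stack = [cards[t] for t in terms]
    return sorted(set.intersection(*stack))

print(coincident_holes(["hospital", "license", "revocation"]))  # -> [14]
```

Dropping a card from the stack relaxes the search: two cards alone would pass light at positions 3 and 14 in this example.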
7. Prediction of Court Decisions
When a client is involved in a lawsuit, he asks the perennial question, “What are my chances of winning?” Strangely, clients seldom ask, “Will justice prevail?” Lawyers, however, are not mere advocates. They are counselors. They are also officers of the court. Lawyers do not merely engage in surgical law requiring a court operation; they also engage in prophylactic law [16]. Frequently lawyers counsel their clients to the effect that “Your cause is unjust” or “You have little chance of winning” or “You should settle the case” or “You had better do so-and-so to avoid a collision with the law.” For example, a lawyer often counsels his client in such a way that the client can avoid making mistakes which would entangle him in expensive, time-consuming legal proceedings. Clients want black-or-white answers, but lawyers can seldom give them. This is not because lawyers do not know what they are talking about. In fact, the lawyer who frequently says “I am not sure” probably understands the law a great deal better than the lawyer who says “No, it is impossible” or “Yes, you are absolutely right” most of the time. In the field of medicine,
Professor Haynes, of Rush Medical, used to say: “‘Always,’ ‘never,’ and ‘must be’ are words that must never be used in medicine.” The same principle always (!) applies in law. We all must recognize that the reliability of the opinion of any lawyer or any layman or any judge is impaired by incomplete information, by undiscovered misrepresentation of facts as they are presented by clients and witnesses to the attorneys and to the courts, and by the frailties of human nature, including lack of understanding, bias, lack of knowledge of what rules to apply, and even lack of attention. While many of these factors may not be subject to precise analysis, mathematical methods should nevertheless enable us to analyze law problems, to make the legal system more efficient, and to help it attain the ends of justice more effectively. The question arises, of course, as to whether one can predict the decision of a court and, if so, how and with what degree of reliability. If we attempt to apply mathematical methods to the prediction of law decisions, we are confronted with a basic question: Do the laws of chaos apply, or the laws of order; or do both kinds apply, depending on conditions? Are decisions a matter of chance, like throwing a seven at dice; or are decisions a matter of regular, predictable action within limits that narrow with time, like the launching of a satellite? De Laplace [21], Cournot [19], Boole [13], Molina [59], and others have long suggested the applicability of the theory of probability to the study of the legislative process, the judicial process, and the jury system. In spite of all of this fine theoretical work, very little has been done to apply the principles of probability and statistical analysis to the practical problems with which a lawyer is concerned. Lee Loevinger has presented a review of this and related work [53, 54].
Fred Kort, a political scientist at the University of Connecticut, has developed two different methods for predicting decisions of various courts. His methods have been applied particularly to certain classes of civil rights cases which reach the United States Supreme Court. In his first method [45, 24, 25, 46], Fred Kort developed a linear weighting technique based upon the following assumptions: (1) Figures of merit can be calculated for a set of court decisions related to the same subject. The figures of merit have the following properties: (a) If the figure of merit (cv) for a case exceeds a predetermined value, then a particular party will win. (b) If the figure of merit is less than that amount, the party will lose. (2) The figure of merit is the sum of positive numerical constants (fv) that correspond to different pivotal factors or factual situations of the case in any given field of law. The numerical constant of each factor is omitted from the sum if the factor is absent from the case, but is included in the sum if the factor is present in the case.
REED C. LAWLOR
In Table V there is shown a table of pivotal factors involved in the right-to-counsel cases decided by the United States Supreme Court, as identified by Kort, and the weights of those factors, as calculated by Kort. These cases involve the question as to whether a person charged with a crime was deprived of the right to be represented by a lawyer at any time of the criminal proceeding, in violation of his constitutional rights. Associated with each pivotal factor is a numerical constant (fv). According to Kort’s principle, the figure of merit, or case value (cv), is the sum of the constants (fv) corresponding to the factual situations present in the case. In other words, the figure of merit or case value (cv) for a particular case is given by the equation

(cv) = Σi (fv)i Ti

TABLE V. WEIGHTS OF FACTS IN RIGHT-TO-COUNSEL CASES ACCORDING TO KORT’S FIRST METHOD

Pivotal factors:
Crime subject to capital punishment: 51.8
Crime subject to life imprisonment, etc.: 29.9
Crime subject to twenty or thirty years imprisonment, etc.: 25.6
Crime subject to five or ten years imprisonment, etc.: 19.2

No previous experience in court, etc.
Arraignment without the assistance of counsel
No assistance of counsel between arraignment and trial, etc.
No assistance of counsel at the trial, etc.
No assistance of counsel at the time of sentencing
No assistance of counsel at any other phase of the proceeding
No advice of the “right to counsel,” etc.
Request of assigned counsel denied
Opportunity of consultation with own counsel denied
No explicit waiver of the “right to counsel”
Detention incommunicado, etc.
Detention and trial in a hostile environment
Deception of the defendant, etc.
No explicit presentation of charges
Coercion or intimidation to plead guilty
Consequences of the plea of guilty not explained
Request of additional time, etc., denied
Accelerated trial
Procedural or substantive error, etc.
Jurisdictional issue

fv (in the order printed): 50.1, 51.8, 59.4, 40.9, 38.0, 5.2, 43.7, 43.7, 31.2, 56.5, 32.5, 53.9, 31.5, 31.5, 68.3, 18.9, 14.2, 25.2
where Ti = 1 or 0, depending upon whether the ith factor is present or absent. In effect, each of the numerical constants (fv) is a weight of the corresponding pivotal factor or fact. The weights of the facts that are present in the case are simply added up; and then the total weight for the case is compared with a critical borderline value, which lies somewhere between 370 and 378, to determine whether the case has sufficient weight for it to be decided in favor of the alleged criminal or against him. Table VI shows a list of cases for which such figures of merit have been calculated.

TABLE VI. WEIGHTS OF RIGHT-TO-COUNSEL CASES ACCORDING TO KORT’S FIRST METHOD

Pro cases:
Herman v. Claudy, 350 U.S. 116 (1956): 591.5
*Smith v. O’Grady, 312 U.S. 329 (1941): 491.3
*White v. Ragen, 324 U.S. 760 (1945): 490.7
*House v. Mayo, 324 U.S. 42 (1945): 487.7
*De Meerleer v. Michigan, 329 U.S. 663 (1947): 478.2
Marino v. Ragen, 332 U.S. 561 (1947): 457.0
Chandler v. Fretag, 348 U.S. 3 (1954): 448.4
Palmer v. Ashe, 342 U.S. 134 (1951): 443.7
*Williams v. Kaiser, 323 U.S. 471 (1945): 435.9
*Rice v. Olson, 324 U.S. 786 (1945): 430.1
Uveges v. Pennsylvania, 335 U.S. 437 (1948): 422.9
*Powell v. Alabama, 287 U.S. 45 (1932): 412.7
Wade v. Mayo, 334 U.S. 672 (1948): 399.7
Townsend v. Burke, 334 U.S. 736 (1948): 399.3
*Tomkins v. Missouri, 323 U.S. 485 (1945): 392.2
*Hawk v. Olson, 326 U.S. 271 (1945): 389.1
Gibbs v. Burke, 337 U.S. 773 (1949): 380.5
Massey v. Moore, 348 U.S. 105 (1954): 378.2

Con cases:
*Foster v. Illinois, 332 U.S. 134 (1947): 370.4
Bute v. Illinois, 333 U.S. 640 (1948): 366.0
*Betts v. Brady, 316 U.S. 455 (1942): 340.6
Gryger v. Burke, 334 U.S. 728 (1948): 334.4
Quicksall v. Michigan, 339 U.S. 660 (1950): 320.2
Gayes v. New York, 332 U.S. 145 (1947): 299.5
*Canizio v. New York, 327 U.S. 82 (1946): 292.5
*Carter v. Illinois, 329 U.S. 173 (1946): 260.5
*Avery v. Alabama, 308 U.S. 444 (1940): 153.1
Stroble v. California, 343 U.S. 181 (1952): 116.1

The cases which were decided in favor of the alleged criminal all lie above the boundary line. Those that were decided against him all lie below the boundary line. Notice that the figures of merit for the cases above the line exceed 378, while all of those below the line are less than 370. Only by the analysis of further cases to be decided in the future can it be ascertained whether there is a sharp critical value and, if so, what its exact value may be, or whether the boundary is fuzzy with both pro cases and con cases on opposite sides. The twenty-six constants (fv) of Table V were actually calculated by Fred Kort by a very involved but ingenious technique from the analysis of only fourteen cases [47]. It might seem that twenty-six cases would be required to ascertain the values of twenty-six constants. This, however, is not true. Kort used only fourteen cases, and his method works! The fourteen cases used for calculating the twenty-six constants are indicated with asterisks. The cases which were tested by use of the constants are shown without asterisks. Of the fourteen cases tested, twelve fell in the same areas as the original fourteen, according to whether they were “pro” cases or “con” cases. The two remaining new cases helped to narrow the threshold value that Kort assumed separates all “pro” cases from all “con” cases in the decisions of the United States Supreme Court in this field of law. Fred Kort’s theory has been criticized as being unsound by Franklin M. Fisher [24, 25]. But Glendon A. Schubert and Elmer T. Prosper [64] approached Kort’s work differently. They asked: “Since Kort’s method is reliable, how can we explain it?” They used a “scalogram” technique.
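The mechanics of Kort's first method can be sketched in a few lines of Python. The case-value formula and the borderline between 370 and 378 follow the text; the factor names are drawn from Table V, but the pairing of names to weights here is illustrative only, since the exact published pairings are not fully reproduced above.

```python
# Sketch of Kort's first method: a case value (cv) is the sum of the
# weights (fv) of the pivotal factors present in the case, compared
# against a borderline value between 370 and 378.
# Factor-to-weight pairings below are illustrative, not Kort's exact table.

WEIGHTS = {
    "crime_subject_to_capital_punishment": 51.8,
    "no_previous_experience_in_court": 50.1,
    "arraignment_without_counsel": 51.8,
    "no_counsel_between_arraignment_and_trial": 59.4,
    "no_counsel_at_trial": 40.9,
    "no_advice_of_right_to_counsel": 43.7,
    "no_explicit_waiver_of_right_to_counsel": 56.5,
    "coercion_to_plead_guilty": 68.3,
}

def case_value(facts_present):
    # cv = sum over i of (fv)_i * T_i, where T_i = 1 when factor i is present
    return sum(fv for factor, fv in WEIGHTS.items() if factor in facts_present)

def predict(facts_present, borderline=375.0):
    # "pro" = decided in favor of the accused; "con" = decided against him
    return "pro" if case_value(facts_present) > borderline else "con"
```

A hypothetical case presenting all eight factors above scores 422.5 and is predicted "pro"; a case presenting only one or two of them falls far below the borderline and is predicted "con".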
Schubert and Prosper showed that the Supreme Court judges who have decided the right-to-counsel cases can be arranged (that is, ranked) in a particular order along one axis, and that the cases themselves can be arranged, or ranked, along another axis; and that when the votes of all the justices are recorded on the chart, a line of separation can then be drawn between almost all the votes in favor of the accused and almost all the votes against the accused. The results of this analysis are represented in Table VII.

TABLE VII. SCALOGRAM OF THE RIGHT-TO-COUNSEL CASES ACCORDING TO SCHUBERT AND PROSPER
(The chart of the individual justices’ pro and con votes cannot be reproduced here; the cases, in scale order, were: 1. De Meerleer (1947), 2. White (1945), 3. Marino (1947), 4. Reece (1955), 5. Herman (1956), 6. Gibbs (1949), 7. Smith (1941), 8. Hawk (1945), 9. Chandler (1954), 10. Massey (1954), 11. House (1945), 12. Williams (1945), 13. Tomkins (1945), 14. Olium (1948), 15. Townsend (1948), 16. Wade (1948), 17. Palmer (1951), 18. Moore (1957), 19. Rice (1945), 20. Uveges (1948), 21. Gryger (1948), 22. Bute (1948), 23. Gayes (1947), 24. Groban (1957), 25. Foster (1947), 26. Carter (1946), 27. Betts (1942), 28. Stroble (1952), 29. Quicksall (1950), 30. Canizio (1946), 31. Avery (1940).)

Here you will note that the double vertical line is used to separate the favorable cases from the unfavorable cases. The “+” signs indicate favorable (pro) votes, while the “−” signs indicate unfavorable (con) votes of the individual judges. More particularly, you will note that certain judges favor the accused criminal almost invariably, while other judges are more evenly divided, and still others usually voted against the accused, all as indicated by the various steps in the broken line which separates the favorable decisions from the unfavorable decisions. Such scalogram analysis has great merit in that it indicates relations between the reactions of the justices to the cases in a single field of law. But while the scalogram technique can be employed to scale the attitudes of the justices in general, it does not appear to be suitable for the prediction of the outcome of individual cases. Kort’s methods, however, can be used to some extent, at least, for prediction. Fred Kort has now gone further and has applied the statistical methods of Hotelling to the analysis of decisions [47, 48]. The Hotelling technique involves the calculation of cross-correlation coefficients. The results obtained by applying this method to the same right-to-counsel cases are represented by Table VIII.

TABLE VIII.
WEIGHTS OF RIGHT-TO-COUNSEL CASES ACCORDING TO KORT’S SECOND METHOD

Decisions in favor of the petitioner:
Powell v. Alabama, 287 U.S. 45 (1932): 6.87
Smith v. O’Grady, 312 U.S. 329 (1941): 7.73
Williams v. Kaiser, 323 U.S. 471 (1945): 5.97
Tomkins v. Missouri, 323 U.S. 485 (1945): 5.68
House v. Mayo, 324 U.S. 42 (1945): 6.05
White v. Ragen, 324 U.S. 760 (1945): 7.29
Rice v. Olson, 324 U.S. 786 (1945): 6.25
Hawk v. Olson, 326 U.S. 271 (1945): 7.11
De Meerleer v. Michigan, 329 U.S. 663 (1947): 8.27
Marino v. Ragen, 332 U.S. 561 (1947): 7.84
Wade v. Mayo, 334 U.S. 672 (1948): 5.62
Townsend v. Burke, 334 U.S. 736 (1948): 5.49
Uveges v. Pennsylvania, 335 U.S. 437 (1948): 4.76
Gibbs v. Burke, 337 U.S. 773 (1949): 6.18
Palmer v. Ashe, 342 U.S. 134 (1951): 6.94
Chandler v. Fretag, 348 U.S. 3 (1954): 6.63
Massey v. Moore, 348 U.S. 105 (1954): 5.09
Herman v. Claudy, 350 U.S. 116 (1956): 8.02
Moore v. Michigan, 355 U.S. 155 (1957): 7.80

Decisions against the petitioner:
Avery v. Alabama, 308 U.S. 444 (1940): 2.51
Betts v. Brady, 316 U.S. 455 (1942): 3.64
Canizio v. New York, 327 U.S. 82 (1946): 2.02
Carter v. Illinois, 329 U.S. 173 (1946): 1.20
Foster v. Illinois, 332 U.S. 134 (1947): 3.06
Gayes v. New York, 332 U.S. 145 (1947): 2.79
Bute v. Illinois, 333 U.S. 640 (1948): 3.07
Gryger v. Burke, 334 U.S. 728 (1948): 3.68
Quicksall v. Michigan, 339 U.S. 660 (1950): 3.08
Stroble v. California, 343 U.S. 181 (1952): 1.99
In re Groban, 352 U.S. 330 (1957): 0.51
Crooker v. California, 357 U.S. 433 (1958): 2.87
Cicenia v. Lagay, 357 U.S. 504 (1958): 2.87
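The separation between the two groups of cases can be quantified directly from the extreme case values in the tables. A small sketch (the function name is an invention of this illustration; only the lowest and highest scores in each group matter for the computation):

```python
# Normalized gap between the lowest pro case value and the highest con
# case value, expressed as a fraction of the full range of case values.

def gap_fraction(pro_scores, con_scores):
    gap = min(pro_scores) - max(con_scores)
    full_range = max(pro_scores) - min(con_scores)
    return gap / full_range

# First method: extremes 591.5/378.2 (pro) and 370.4/116.1 (con),
# giving roughly 8 parts in 480.
first = gap_fraction([378.2, 591.5], [116.1, 370.4])

# Second method: extremes 8.27/4.76 (pro) and 3.68/0.51 (con),
# giving roughly 100 parts in 800.
second = gap_fraction([4.76, 8.27], [0.51, 3.68])
```

The second method's normalized gap comes out nearly an order of magnitude wider than the first method's, which is the comparison the text draws between the two tables.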
As shown in Table VIII, with this method a very wide difference exists between the figures of merit for the favorable cases, which are above the line, and the figures of merit for the unfavorable cases, which are below the line. You will recall that in Fred Kort’s first approach there was no such wide separation between the figures of merit of favorable cases and those of the unfavorable cases. In the first method, the gap between the pro cases and the con cases was about 8 parts in 480 (Table VI). But in the second method, the gap is about 100 parts in 800 (Table VIII). Nagel has also suggested the application of correlation techniques to the prediction of court decisions [61, 62]. The author has also developed a mathematical technique for predicting United States Supreme Court decisions [50]. This method is based, in part, upon the idea that the U.S. Supreme Court and each individual justice on the Court are self-consistent. A mathematical theory of stare decisis has been developed that is applicable where such a rule of consistency applies. This method has been applied to the right-to-counsel cases that had been previously studied by Kort and Schubert. With this method, Boolean equations were developed which describe how the United States Supreme Court as a whole and how each justice on it votes when presented with any arbitrarily selected subset of the set of facts that has appeared in prior right-to-counsel cases. In applying this technique, twenty-seven cases were employed for the development of the equations, and the equations were tested on the remaining ten cases. All ten calculated votes for the Court as a whole proved to be correct. Of the thirty-seven votes that were calculated for individual justices, thirty-five proved to be correct. This is fairly good, since only a tiny sample of twenty-seven subsets (cases) out of the 100,000,000,000 possible subsets of facts was used in the development of the Boolean equations.
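The Boolean-equation idea can be illustrated with a toy model. The fact names and voting rules below are hypothetical, not the author's actual equations: each justice's vote is modeled as a Boolean function of the facts present, and the Court's decision follows from the individual votes.

```python
# Toy model of Boolean vote equations: each justice is a Boolean function
# of the case's fact set; the court's decision is the majority of votes.
# Fact names and the three rules are hypothetical illustrations.

JUSTICES = {
    "A": lambda f: "capital_crime" in f or "no_counsel_at_trial" in f,
    "B": lambda f: "no_counsel_at_trial" in f and "no_explicit_waiver" in f,
    "C": lambda f: "no_explicit_waiver" in f,
}

def court_vote(facts):
    pro_votes = sum(1 for rule in JUSTICES.values() if rule(facts))
    return "pro" if pro_votes * 2 > len(JUSTICES) else "con"

# With n binary facts there are 2**n possible fact subsets; a vocabulary
# of about 37 facts would give 2**37 = 137,438,953,472 subsets, the order
# of magnitude of the figure quoted in the text.
```

The point the text makes is the sparseness of the training data: a handful of decided cases pins down Boolean functions defined over an astronomically larger space of possible fact combinations.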
By assuming order instead of chance, the probable error of prediction seems to have been reduced. A computer program utilizing the FORTRAN language was then developed by the author for predicting decisions, based upon the Boolean equations mentioned above and the author’s mathematical theory of stare decisis. Examples of predictions made with the computer program during the preliminary stages of this development are shown in Appendix A.* In this appendix, typical examples of the output obtained with this program are shown for two cases. In this treatment a pro decision is a decision in favor of the accused, while a con decision is a vote that is unfavorable to him. It will be noted that, in these computer-composed analyses, prediction is
* I wish to thank Edgar A. Jones, Jr., Professor of Law, Law School at the University of California at Los Angeles and Chairman of the UCLA Committee for Interdisciplinary Studies of Law and the Administration of Justice, for sponsoring my use of the IBM 7090 at the Western Data Processing Center of the Graduate School of Business Administration for this work.
made in accordance with four different techniques, one being based on Kort’s first method and three being based on the author’s work. The general nature of the four techniques is described below:

First Technique. In this analysis a pro cranberry case is a potential precedent that meets the following requirements: (1) All of the facts of the pro cranberry case are present in the new case under consideration. (2) No other facts are present in the pro cranberry case. A case is a pro cranberry case only if it meets the foregoing requirements and only if at least one of the justices voted pro in that case. A justice who voted pro on the pro cranberry case should also vote pro in the case under consideration, if he is consistent. Similarly, a con cranberry case is a potential precedent that meets the following requirements: (1) All of the facts of the new case are present in the con cranberry case. (2) No other facts are present in the new case. A case is a con cranberry case only if it meets the foregoing requirements and only if at least one of the justices voted con in that case. A justice who voted con in a con cranberry case should also vote con in the case under consideration, if he is consistent.

Second Technique. In this technique the computer proceeds to analyze the facts of the new case in the light of the Boolean equation that has been developed for predicting how the court as a whole will vote.

Third Technique. Now, the Boolean formulas that have been developed for various individual justices are applied to the facts of the case under consideration. The votes of the individual justices are then printed out. It is to be noted that in Case 36 of Appendix A the vote of Justice Frankfurter was not correctly predicted. All the other predicted votes in the two cases in Appendix A are correct.

Fourth Technique. The computer then adds up the weights of the facts present in the case, using the weights calculated by Kort. It then compares the total weight of the facts of the case with an arbitrarily selected value of 375.0 (between 370 and 378) and then predicts that the decision will be a pro decision if the weight of the case is more than 375, but a con decision if it is less than 375.
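Modeling a case as a set of fact identifiers, the cranberry tests of the First Technique reduce to subset checks. In this sketch the term "cranberry case" and its two definitions follow the text, while the data representation and function names are inventions of the illustration:

```python
# First Technique as set logic: a pro cranberry precedent contains no
# facts beyond those of the new case (precedent is a subset of the new
# case); a con cranberry precedent contains every fact of the new case
# (the new case is a subset of the precedent).

def is_pro_cranberry(precedent_facts, new_facts):
    return precedent_facts <= new_facts

def is_con_cranberry(precedent_facts, new_facts):
    return new_facts <= precedent_facts

def predicted_vote(justice_history, new_facts):
    """justice_history: list of (facts, vote) pairs for one justice.
    Returns 'pro', 'con', or None when no cranberry precedent controls."""
    for facts, vote in justice_history:
        if vote == "pro" and is_pro_cranberry(facts, new_facts):
            return "pro"
        if vote == "con" and is_con_cranberry(facts, new_facts):
            return "con"
    return None
```

The `None` branch corresponds to the situation the text notes below: a consistent justice is only *compelled* by a cranberry precedent, so some other technique must supply a prediction when no such precedent exists.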
The important point about the prediction technique exemplified in Appendix A is that it is based upon the assumption that the decisions of the courts and the individual justices are self-consistent and not a matter of chance or whim. It is also important to note that with this prediction technique the votes of individual justices are predictable even when no cranberry case exists which forms a precedent that compels the justice to reach any particular decision.
It can be shown that the linear equation employed by Kort applies if the occurrences of the factors are statistically independent of each other for both the pro cases and the con cases. This assumes that the votes of the justices are a matter of chance. However, it can also be shown that the linear equation of Kort applies if, in fact, certain types of logical relations exist between the facts of the cases and the vote of the court [50]. Both Kort’s first method and the author’s method have proved to be reliable in the cases to which they are applicable. Theoretically, at least, the predictions by Kort’s method and the author’s method will be contradictory under certain circumstances. So far such circumstances have not arisen. It is conceivable that in the not-too-distant future it will be possible to record decisions, statutes, regulations, and other rules of law in data-processing systems, to feed sets of facts to those machines, and to draw from them automatically a prediction as to the likelihood that a decision made by a particular court on the particular set of facts will be favorable or unfavorable. Computer-operated probability prediction methods have been used effectively in other fields, such as medicine [51, 69], and have been proposed for use in evaluating the reliability of international intelligence information [1]. The reliability with which it will be possible to predict court decisions is still to be ascertained. The work to date has been too meager and too ineffective to use as a basis for calculating probable error with any substantial degree of reliability. One thing is certain. Regardless of what mathematicians and computers say, the courts will have the last word, and it is anticipated that, for many, many years to come, at the very most computers will be used only to prepare outlines of decisions for consideration by lawyers and the courts.
Furthermore, by virtue of the fact that the lawyers and the courts are relieved of some of the more tedious work, they will be able to perceive more easily the solutions to more difficult problems and will often reach conclusions far different from those that would be attained by merely following the dictates of a computer. In this connection, it will be well to remember the words of Theodore H. Lassagne, general counsel for Librascope, Inc., which he prepared at the time that the application of information retrieval techniques to law was demonstrated on the IBM 1401 at St. Louis in August, 1961:

“This is the tale of the 1401,
The law clerk that was nobody’s son.
It spent its days in a furious hunt
For authorities, dictum, and argumunt.
But after it found them, it burned with shame;
The Supreme Court reversed it just the same.”
8. Thinking Machines
There is considerable debate as to whether or not machines can really think. The answer to this question depends, of course, on how thinking is defined. Ashby [9] has defined the intellectual power of a machine in terms of the ratio of intelligence output to intelligence input, and he has defined thinking in terms of selection or choice. As a matter of fact, a great deal of our educational system is designed to teach children to make choices between the spaces provided at the ends of questions in true-false or multiple-choice tests. The fact that machines may some day be able to make such choices almost perfectly, while our children will continue to make such choices very imperfectly, does not demonstrate that such machines lack all ability to think. Ashby has pointed out that it has not yet been demonstrated that men cannot possibly construct machines which have more intellectual power than the men who built them. To those who abhor the idea that a machine can think, let me offer this definition: “Thinking is any process by means of which correct conclusions cannot be drawn from premises by a machine unless a man programs the machine and arranges for data to be fed to it.” But to those who can accept the idea that a machine can think in the sense that it can simulate the thinking of the human mind very closely, let me offer the definition: “Thinking is any process by means of which relations between propositions can be tested or derived and described in language understandable to a man, other than those relations which can be recognized instantly by man.” Gelernter et al. [I53] have demonstrated how to solve problems in the field of plane geometry by means of the IBM 704. It is questionable whether law problems are any more difficult. We can look forward some day to seeing proofs of geometrical problems written in plain English for the benefit of us ordinary people. Such logical machine processes serve best when they serve most.
They will serve most when they present their results in ordinary language that does not require decoding by the reader. These and other techniques can be extended to law. Without doubt the day will come when legal propositions can be fed to a machine and factual statements can also be fed to a machine; and then the machine will write out what conclusions necessarily follow, which are impossible, and which are uncertain, and will set forth the probability of the truth of the uncertain conclusions and will also write out all of the steps of the reasoning process. Lawyers and judges will then be able to study these machine-made arguments to ascertain the accuracy of the instructions given to the machine and from this the reliability and applicability of the conclusions to the cases at hand.
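The classification of conclusions into "necessarily follow," "impossible," and "uncertain" described above is, at bottom, a propositional-logic computation. A minimal sketch, in which the legal propositions and the sample rule are hypothetical:

```python
# Sketch of mechanical legal deduction: given premises over Boolean
# propositions, classify a conclusion as "necessary", "impossible", or
# "uncertain" by enumerating every truth assignment consistent with the
# premises. Propositions and rules below are hypothetical.

from itertools import product

def classify(variables, premises, conclusion):
    """premises and conclusion are functions from an assignment dict to bool."""
    outcomes = set()
    for values in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, values))
        if all(p(world) for p in premises):   # keep worlds where premises hold
            outcomes.add(conclusion(world))
    if outcomes == {True}:
        return "necessary"
    if outcomes == {False}:
        return "impossible"
    return "uncertain"

# Hypothetical rule: no counsel at trial plus no waiver implies a violation.
variables = ["no_counsel", "no_waiver", "violation"]
premises = [
    lambda w: (not (w["no_counsel"] and w["no_waiver"])) or w["violation"],
    lambda w: w["no_counsel"],
    lambda w: w["no_waiver"],
]
print(classify(variables, premises, lambda w: w["violation"]))  # necessary
```

Brute-force enumeration is exponential in the number of propositions, which is adequate for a sketch but would need proper theorem-proving machinery, of the kind Gelernter demonstrated for geometry, at realistic scale.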
When such a machine is first used, I can well imagine many lawyers saying, “What a revolting development this is!” Others will use it as a crutch. But still others will welcome the help of the machine and employ it with the same care that an engineer or scientist uses with a computer. When that day comes, many persons will disagree as to whether the results of computer programs represent thinking. In any event, computer processes based on programs that compose reliable solutions out of millions of possible combinations simulate thinking of a rather laborious and advanced kind. Furthermore, such computer processes may actually be more reliable than human thinking. The effectiveness of legal reasoning will be improved when a lawyer’s thinking and a judge’s thinking are commonly supplemented by computer processes that solve complex problems, such as selection of pertinent references [5, 8, 37, 42], designing around patent claims [49], or predicting court decisions [24, 45-48, 50, 61, 62]. When that day comes, lawyers and judges will be able to relieve their minds of the more menial but laborious thinking tasks which now consume a large part of their time and energy and, like the scientists and engineers of today, will be able to devote more of their attention to more difficult, more serious problems.
9. The Law of Computers
Roy N. Freed, of Philadelphia, has made a study of the law of computers [27, 29, 30]. It is his general conclusion that very few changes in the law will be required in order for lawyers, the courts, and people as a whole to cope with the legal problems involved in the manufacture, sale, and use of computers. Courts that learned about wagon wheels were able to adapt themselves to automobile wheels. Courts that were able to solve problems of explosions in seagoing vessels have been able to cope with the problems of jet airplanes. Courts that have been able to cope with the problems of books can also cope with the problems of data-processing systems. Nevertheless, some new problems are bound to arise. One of the interesting questions is whether and, if so, when and to what extent, the use of information in a computer is an infringement of a copyright. Suppose that the full text of a copyrighted book is recorded in machine-readable form. Then at what stage, if ever, does the owner or user of the computer infringe the copyright of the book? Certainly, if the computer is commanded at any time to reproduce the entire book, this would result in an infringement of the copyright. An exception might arise if the sole purpose of the printing is to check the accuracy of the recording. Suppose that only part of the book is reproduced.
This may or may not be copyright infringement, depending upon whether or not the ordinary printing of that portion of the book would be copyright infringement under the same circumstances. In this connection, it is to be borne in mind that certain uses of portions of books, for quotation or comparative analysis or argument (as in lawyers’ briefs), as distinguished from competitive use, are considered fair use. But now, suppose that, in the use of an information retrieval system in which the full text of books is recorded, the search system is operated in such a way that all it prints out are references to the pages and lines of the original book where the material may be found. This, it is submitted, would not involve an act of copyright infringement under the law as it now stands. Here the computer does not compete with the book; it helps a reader use the book more effectively. The manufacture of punched cards, magnetic tape records, thermoplastic tape records, and other machine-readable records which do not make use of visually legible printed format is analogous to the perforation of rolls of music and the manufacture of phonograph records. The latter acts are not considered infringements of copyrights, though under some circumstances they may involve trespass of some other kind of right. In White Smith Music Co. v. Apollo, 209 U.S. 1, 12; 52 L. Ed. 655; 28 S. Ct. 319, the United States Supreme Court held in 1908 that neither the act of perforating music on a roll nor the perforated roll itself constituted infringement of a musical copyright. The Court held, in effect, that a music roll is merely a mechanical device. The same rule applies to phonograph records, magnetic tapes, and the like when used in a normal way.
In 1909, in an effort to protect the interests of composers, a special law was enacted under which phonograph records could be produced by persons other than the owner of a copyright of the musical composition if, but only if, the owner of the musical composition first made phonograph records himself or permitted others to do it. Then the law went on to specify that if the newcomer made a phonograph record, he was required to pay certain royalties to the owner of the copyright on the musical composition. But that law does not apply to literary material. It applies only to musical compositions. Consequently, it would not apply, for example, to legal literature. It therefore appears that, as the law stands today, it might not be unlawful to record all of the copyrighted legal material in the United States in machine-readable form, so long as it is never reproduced in visually legible form. Whether such acts would constitute unfair competition would depend upon the law of the individual state in which the alleged wrongful act occurs. However, if one were to attempt such an ambitious project or even a
much smaller one, he should be cautioned that courts do sometimes change their minds; and a court might readily determine that, under modern conditions, even a punched card is legible. If, in fact, the literary work were to be recorded on magnetic tape or thermoplastic tape in a form which might be visually read by a trained user, such as by means of polarized light or fine iron dust, a court, under modern circumstances, might again hold that the recording is an infringement and, therefore, in effect overrule the precedent of White Smith Music Co. v. Apollo, which was handed down fifty-four years ago. Phonograph recordings per se have never been subject to copyright protection. Such a record submitted to the register of copyrights would be returned as not constituting copyright subject matter. But photographs and motion pictures are copyrightable. Recently a copyright was registered on a magnetic tape recording of Gian Carlo Menotti’s opera, “The Consul” [19a]. The tape itself was submitted to the Copyright Office in lieu of a motion picture. Some attention has been given to the question as to whether computers think. An important related question arises as to whether the product of a computer can be copyrighted or patented. Datatron has programmed a computer to compose music. Librascope has programmed a computer to compose poetry. It has been suggested that computers can make inventions [50, 40]. Certainly a listing of a program can be copyrighted. It is usually just a printing of what was composed by a man. Mathematical tables representing the labor of a mathematician are the subject of copyright. What if these same tables were generated by a computer? There are some people who apparently feel that it is undignified to permit the musical compositions and poems produced by computers to be copyrighted. Undoubtedly there are others who think that inventions require the operation of the human intellect.
These people would deny that a beautiful modern painting daubed on a canvas by a blind child is art. But is it the origin that determines whether a novel combination of paint strokes or musical notes or mechanical devices has merit? Or does the value depend solely on the result achieved? Under the copyright law, the term “author” includes the person who makes a composition for hire. There is no analogy in the patent law. In the copyright law, the work of a composer can be copyrighted in the name of the person who employed him. But in the field of patent law, only the person who exercised the intellectual effort in making the invention can file a patent application. Though the subject is too long to discuss here, it is suggested that, in those cases where the owner of a computer employs someone to program the computer to generate a composition, whether it
3 38 REED C. LAWLOR be musical, literary, choreographic, or artistic-the composition would be copyrightable in the name of the owner of the program. I n the case of inventions first described by a computer, the person who programmed the computer to bring the invention into existence would be the inventor; or, if perchance the computer produced the invention accidentally or as a result of some random process, the person who first perceived and appreciated that the output of the computer was new and useful would be the inventor. This is somewhat analogous to what occurs under the patent law as it applies to newly discovered varieties of plants, where the person who first discovers that the new species exists is deemed to be the inventor, and as it applies to accidental discoveries that occur in the middle of research efforts directed to achieve wholly different results. Many of the problems of civil and criminal law take a new twist when applied to computer technology. Under certain conditions the personal property of a bank is not taxable by the county where the personal property exists; but real property is taxable. Real property includes fixtures and improvements. When this issue was considered in a recent case [?'I], the Court held that the computer is taxable as an improvement in real property, emphasizing that the building was constructed to house the computer and included expensive features especially adapting it for use with the computer as well as provisions for expansion, and further that it was intended that the computer remain permanently in place, except for removal for repair or obsolescence or for replacement by more efficient equipment. I n the law of evidence, certain types of original documents or business records can be introduced in evidence with very little proof about the manner in which they were made, other than that they were made in the ordinary course of business. 
A question arises as to whether a magnetic tape or other machine-readable record is an original document under the law of evidence, especially when the record was made by punching keys at a time when no punched paper record was being made. Where original documents are not available, secondary evidence regarding their contents may usually be introduced upon proof that the original document is not available. Does this mean that a print-out made from a tape can be introduced in evidence instead of the tape itself? Or is it necessary to introduce the tape into evidence and put someone to the trouble of reading it out? The potentiality of computers for fraud and forgery is limited only by the imagination of evil men. When we get to the point where some computers write checks and other computers honor them in blind obedience to a program, we may have some very serious problems on our hands.
INFORMATION TECHNOLOGY AND THE LAW
The Internal Revenue Service is instituting a program for recording the history of each taxpayer on 1/2 inch of tape, updating these records every year, and purging the records every three years. Since, if the system is adequately programmed, fraudulent concealment of income may be detected more easily than heretofore, the enforcement of the Internal Revenue Code may be simplified and made more certain, with resultant advantages in favor of honest taxpayers. Except for special problems such as the foregoing, it appears that computers are hardly in any different category from other types of machinery insofar as the law is concerned, whether it be the law of contracts, the law of torts, tax law, criminal law, or otherwise. It is to be anticipated that modern information technology, and computer technology especially, will solve more problems for lawyers and the courts than they will create.
10. Use of Computers in Court
Reference has already been made to the possible use of computers for indexing depositions and trial records, for information retrieval, and for the prediction of decisions. Computers can also be used in the administration of court proceedings, not only in the docketing and scheduling of hearings and trials, but in the programming of steps to be taken in many kinds of proceedings. They may also be used to prove relations between facts. In St. Louis, punch-card systems have been introduced for the control of various stages of probate proceedings [36]. Several years ago Judge Richard F. C. Hayden, while in the Attorney General's office in San Diego, made an unusual use of computers [35]. His case involved a charge of bribery of a public official. The question arose as to whether a positive correlation existed between the deposit of funds in certain bank accounts and the speed with which building permits were issued to those who made the deposits. In this particular case, a large amount of data was involved. The statistical analysis was facilitated by means of a computer. And when opposing counsel challenged its reliability by asking for further analyses, those analyses merely added support to the charge of bribery! It is a general rule of trial practice that a lawyer should not ask a question unless he knows the answer. Interrogation of computers poses new problems for trial lawyers.
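The statistical question in the bribery matter described above, whether deposits and permit-issuance speed move together, is a product-moment correlation problem. The sketch below is a hypothetical illustration rather than the actual analysis performed in that case; all figures are invented, and the Pearson formula is supplied here only for exposition.

```python
def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: dollars deposited, and days until a permit issued.
deposits = [0, 50, 200, 500, 1000]
days_to_permit = [60, 45, 30, 14, 7]

# A positive correlation between deposits and *speed* means larger deposits
# go with fewer days of delay, so we correlate deposits against -days.
r = pearson_r(deposits, [-d for d in days_to_permit])
print(round(r, 2))   # close to +1 for this invented data
```

A coefficient near +1 would support the inference the prosecution drew; a coefficient near zero would undercut it.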
REED C. LAWLOR
11. New Horizons
Until March, 1960, we did not know when it would be possible to demonstrate how electronic data-processing systems can actually be used in finding statutes and cases of interest to lawyers. Nevertheless, such work was accomplished in the Health Law Center at the University of Pittsburgh at that time, and then eminently successful demonstrations were made at the American Bar Association Annual Convention in Washington, D.C., at the end of August, 1960. Again, successful demonstrations were made at the American Bar Association Annual Convention in St. Louis in August, 1961. Many lawyers are now aware of the possibilities of automatic legal research. Many wish it were here to serve them now. While automatic legal research is the major field of immediate interest to lawyers in their daily practice, there are other possible applications of information technology which could be of immediate value to lawyers and judges in their practice. Though we cannot anticipate what will be done next in this field, let us look into the crystal ball for a few minutes to try to see what the future may hold. (1) The application of peekaboo systems to the search for legal material on a practical and possibly commercially profitable scale is probably just around the corner. The systems developed at the National Bureau of Standards and the system undergoing development at Jonkers Business Machines, Inc., are at the stage where they can be useful to lawyers. (2) The American Bar Foundation is considering issuance of a biweekly KWIC index of the titles of new laws enacted in all the states. (3) Today the bottleneck in the system involves the punching of cards as a preliminary step to storing information on tape or disc. Systems will soon come into use which will read books automatically and transcribe printed words into reproducible signals on magnetic tapes or other recording media. This would greatly expedite the automatic storage of large bodies of written information.
Such records could then be analyzed by methods somewhat similar to those demonstrated at the 1960 and 1961 ABA Conventions for locating subject matter relevant to a lawyer's question. (4) At the present time, methods are undergoing development whereby printing of books and the like can be automated. The printing machines used for this purpose are driven by punched tape upon which the material to be printed has previously been recorded in code form. There is no technological reason that stands in the way of coding this material in such a form that the tapes can be used not
merely for the printing of books, but also for recording the same information on reproducible media where it will be available for retrieval purposes. By use of the same tape for printing books and for recording the text in retrievable form, great economies can be expected. (5) Systems are undergoing development for automatically abstracting documents. These systems are potentially applicable to the preparation of digests and abstracts of law decisions. (6) Special typewriters are available today which automatically punch paper tape during the typing process. Normally such tape is used for automatically retyping the same material when it is to be reproduced. Such tapes could be used for storing the material in retrievable form on magnetic tapes, magnetic discs, or the like. Such systems could find immediate use for automatically indexing depositions and trial records and also for automatically indexing new statutes and other verbal texts. (7) One of the most worthwhile and valuable projects that could be undertaken would be the development of a system for recording and retrieving new bills being submitted to Congress and to the various legislative bodies throughout the country in such a way that persons who are interested in proposed bills respecting various subjects can be informed about them at the time that they are first introduced. Such a project might include, for example, the recording of bills in machine-readable form at the time that they are originally typed in the office of a Congressman or at the time that they are first printed in the Government Printing Office, and then the transmission of those machine-readable records to a center where they could be stored and continuously analyzed to match the questions of persons who are interested in various subjects. 
With such a retrieval system the interested persons, whether they be Congressmen, businessmen, scientists, ordinary folks interested in the public welfare, or even lobbyists, could be notified of the introduction of bills of interest at the early stages of the consideration of such bills. By use of the same material, indexes of statutes could be automatically updated continuously for the benefit of all concerned. By applying electronic and other mechanical information storage and retrieval methods to law, it will be possible to improve the efficiency with which law-case searching and statute searching can be accomplished. Through this, and in other ways, legal processes can be improved and the ends of justice served more efficiently. Just how the application of information storage and retrieval techniques will be made available to members
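The bill-notification scheme just described amounts to matching each incoming bill, as it is recorded in machine-readable form, against stored keyword profiles of the persons to be notified. A minimal sketch of that matching step follows; the bill titles, subscriber names, and keyword sets are invented for illustration and are not drawn from the article.

```python
# Hypothetical bills, as (number, title) pairs, and hypothetical
# subscriber "interest profiles" expressed as keyword sets.
bills = [
    ("H.R. 101", "an act relating to water rights and irrigation districts"),
    ("S. 202", "an act to amend the copyright law respecting sound recordings"),
]

profiles = {
    "Smith": {"copyright", "patent"},
    "Jones": {"water", "irrigation"},
}

def notify(bills, profiles):
    """Return {subscriber: [bill numbers whose titles match the profile]}."""
    hits = {name: [] for name in profiles}
    for number, title in bills:
        words = set(title.lower().split())
        for name, keywords in profiles.items():
            if words & keywords:          # any keyword appears in the title
                hits[name].append(number)
    return hits

print(notify(bills, profiles))
```

A production system of the kind the article envisions would of course match against the full text of each bill and run continuously as new bills arrive, but the profile-matching kernel is no more than this.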
of the Bar throughout the country can only be guessed. If electronic processes are employed, due to the inherent nature of the problem and the costs involved, it may be necessary for the work to be sponsored by Bar Associations or law book publishers in various areas throughout the country and the information made available at various widely dispersed searching centers. When this work is first commenced, it will probably be by different organizations in different areas of the country in entirely different fields of law. While coding techniques may eventually be employed, it is not beyond the realm of possibility that large bodies of published decisions will soon be recorded on magnetic tape or some other information-storage medium. It has been calculated that the total number of tapes required by present techniques to store all the words of all published United States appellate decisions is of the same order of magnitude as the number of tapes used by a large life insurance company in the operation of its business. The task is not insurmountable. Only 20/20 hindsight will make it clear someday how simple the task really was. While the day may be far off when a machine will be able to think through a lawyer's problems and present him in writing with the step-by-step solution, nevertheless, within the next decade, if not within just the next few years, it is very possible that electronic data-processing systems will go into regular use to supply lawyers with statutory material, case material, and textbook material which applies to the problems on which they are currently working for clients. The attainment of this dream will require close cooperation and imagination of scientists, lawyers, manufacturers, government, and business.
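The order-of-magnitude claim about tape storage can be checked by a back-of-envelope calculation. Every figure below is an assumption chosen for illustration; the article does not state the numbers behind its own estimate.

```python
# Assumed inputs -- none of these figures come from the article.
decisions = 2_000_000          # published U.S. appellate decisions (assumed)
words_per_decision = 3_000     # average opinion length in words (assumed)
chars_per_word = 6             # average characters per word, with a space
chars_per_tape = 10_000_000    # capacity of one reel at early-1960s densities (assumed)

total_chars = decisions * words_per_decision * chars_per_word
tapes = total_chars // chars_per_tape
print(tapes)   # on the order of thousands of reels under these assumptions
```

Under these assumptions the whole corpus fits on a few thousand reels, which is indeed comparable to a large commercial tape library rather than an impossibility.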
To paraphrase the words written by Kelso [43] fifteen years ago: "The American Bar and American Science will do well to think seriously of mechanizing the drudgery of the practice of law, in order that the really irreplaceable human contributions of lawyers may be liberated for more effective use for the benefit of mankind."
Just as the lawyer of today finds the use of the telephone, the dictaphone, the typewriter, and the adding machine indispensable to the modern law practice, the lawyer of tomorrow will also find the use of automatic information-processing systems indispensable to his practice. John F. Horty's reaction to the effect of computers and other phases of modern information technology on law has been summarized by him as follows [38]: "Though it is too early to determine just how this second revolution is going to occur in law, I could never be more convinced than now that, once we have opened up the box that lets the mathematicians and other scientists examine what the lawyer
does, the legal profession, legal research, the entire administration of justice will never be the same again."
The law is here to stay. Information technology is here to stay. The twain have met. And they will beget a more efficient, more economical, more equitable, more logical, and more wise administration of justice for all. The application of modern information technology to the problems of law on a large scale will change the course of legal history.

Bibliography

NOTE: The abbreviation M.U.L.L. refers to Modern Uses of Logic in Law (quarterly newsletter of the ABA Special Committee on Electronic Data Retrieval), American Bar Association, 1155 East 60 Street, Chicago 37, Illinois.

1. ACF Electronics, Data Processing Dept., Digitalized Logic and Its Applications. ACF Industries, Inc., Washington, D.C., 1955.
2. Allen, L. E., Symbolic logic: A razor-edged tool for drafting and interpreting legal documents. Yale Law J. 66, 833 (1957).
3. Allen, L. E., Modern logic: A useful language for lawyers. Proc. 1st Natl. Law and Electronics Conf., Arrowhead Lake, California, October (1960), sponsored by Univ. of California at Los Angeles. Matthew Bender, Albany, 1962.
4. Allen, L. E., Brooks, R. B. S., and James, P. A., Automatic retrieval of legal literature: Why and how. M.U.L.L. pp. 17-24 (1959).
5. Allen, L. E., Brooks, R. B. S., and James, P. A., Storage and retrieval of legal information: Possibilities of automation. M.U.L.L. pp. 6&84 (1960); see also Report to Walter E. Meyer Research Institute of Law, New Haven (March 1, 1961).
6. Allen, L. E., Uses of symbolic logic in law practice. Proc. ABA Electronic Data Retrieval Comm. Conf., St. Louis, Missouri (1961); see M.U.L.L. in press (1962).
7. Allen, L. E., WFF 'N PROOF series of educational games. ALLI (Accelerated Learning of Logic Institute), 10822 Blucher Ave., Granada Hills, California.
8. Andrews, D. D., Application of random access techniques to case law. Proc. ABA Patent Trademark and Copyright Section Symposium on Information Retrieval, St. Louis, Missouri pp. 228-232 (1961).
9. Ashby, W. R., Design for an intelligence-amplifier. In Automata Studies (C. E. Shannon and J. McCarthy, eds.), pp. 215-234. Princeton Univ. Press, Princeton, New Jersey, 1956.
10. Bartholomew, P. C., The Supreme Court and modern objectivity. N.Y. State Bar J. 33, 157 (1961).
11. Berkeley, E. C., Boolean Algebra (The Technique for Manipulating "and," "or," "not" and Conditions) and Applications to Insurance. Edmund C. Berkeley, New York, 1952; also see Record Am. Inst. Actuaries 26, 373 (1937); 27, 167 (1938).
12. Biunno, V. P., History of electronic methods for legal research. Proc. ABA Data Retrieval Comm. Conf., Washington, D.C. p. 36 (1960), Bureau of National Affairs, Washington, D.C., 1961; see also M.U.L.L. pp. 99-102 (1960).
13. Boole, G., An Investigation of the Laws of Thought (1854), pp. 84 and 376. Dover Publications, New York, 1951.
14. Brown, J. R., Thermo King Corp. v. White's Trucking Service, Inc., 190 U.S.P.Q. 90, 95 (1961), Bureau of National Affairs, Washington, D.C., 1961.
15. Brown, J. R., Electronic brains and the legal mind: Computing the data computer's collision with law. Yale Law J. 71, 239 (1961).
16. Brown, L., Manual of Preventive Law. Prentice-Hall, Englewood Cliffs, New Jersey, 1950.
17. Carroll, Lewis (C. L. Dodgson), Symbolic Logic, Part I: Elementary (1897). Berkeley Enterprises, New York, 1955.
18. Clark, R. L. (edited by Duke Univ. Faculty of Law), On Mr. Tammelo's conception of juristic logic. J. Legal Educ. 8, 491-496 (1956).
18a. Colby, R., Letter dated April 28, 1961, re "The Consul," Bull. Copyright Soc. U.S.A. 8, 205-206 (1960-61), Copyright Society of U.S.A., New York, 1961.
19. Cournot, A. A., Exposition de la Théorie des Chances et des Probabilités (1843), translation of Chapters XV and XVI. New Jersey Law Institute, Newark, New Jersey, 1954. (Copies can be obtained from Vincent P. Biunno, 605 Broad St., Newark 2, New Jersey.)
20. de Haan, C. S., Proc. 1st Intern. Patent Office Workshop on Information Retrieval, Washington, D.C. (1961), Patent Office Society, Washington, D.C., 1962.
21. de Laplace, Pierre Simon, Marquis, A Philosophical Essay on Probabilities (1819), Chapter XIII. Dover Publications, New York, 1951.
22. Dickerson, F. R., The electronic searching of law. Am. Bar Assoc. J. 47 (9), 902-908 (1961).
23. Fiordalisi, V., Progress and problems in the application of electronic data processing systems to legal research. Proc. ABA Electronic Data Retrieval Comm. Conf., Washington, D.C. p. 22 (1960), Bureau of National Affairs, Washington, D.C., 1961; see also M.U.L.L. pp. 174-179 (1960).
24. Fisher, F. M., The mathematical analysis of Supreme Court decisions: The use and abuse of quantitative methods. Am. Polit. Sci. Rev. 52 (2), 321 (1958).
25. Fisher, F. M., On the existence and linearity of perfect predictors in "content analysis." M.U.L.L. pp. 1-9 (1960).
26. Freed, R. N., A lawyer's guide through the computer maze. Practical Lawyer 6, 1-5 (November 1960).
27. Freed, R. N., Some legal aspects of computer use in business and industry. J. Ind. Eng. pp. 289-291 (July-August 1961).
28. Freed, R. N., The importance of a systems approach to mechanized legal research. Proc. ABA Patent Trademark and Copyright Section Symposium on Information Retrieval, St. Louis, Missouri pp. 179-188 (1961).
29. Freed, R. N., Try suing a computer! Legal tangles in E.D.P. Management Rev. pp. 4-11 (August 1961), Am. Management Assoc., New York.
30. Freed, R. N., Prepare now for machine-assisted legal research. Am. Bar Assoc. J. 47 (8), 764-767 (1961).
31. Freed, R. N., How computer specialists can help lawyers. J. Ind. Eng. 12, 324-327 (September-October 1961).
32. Freed, R. N., Machine data processing systems for the trial lawyer. Practical Lawyer 6, 73-96 (April 1960).
33. Gelernter, H., Hansen, J. R., and Loveland, D. W., Empirical explorations of the geometry theorem machine. Proc. Western Joint Computer Conf., San Francisco 17, 143 (1960).
34. Hayden, R. F. C., Electronics and the administration of justice. Proc. 1st Natl. Law and Electronics Conf., Arrowhead Lake, California, October (1960), sponsored by Univ. of California at Los Angeles. Matthew Bender, Albany, 1962.
35. Hayden, R. F. C., How electronic computers work: A lawyer looks inside the new machines. Proc. ABA Electronic Data Retrieval Comm. Conf., St. Louis, Missouri (1961); see M.U.L.L. in press (1962).
36. Hensley, D. R., Punched cards produce progress in Probate Court. Am. Bar Assoc. J. 48 (2), 138-139 (1962).
37. Horty, J. F., Experience with the application of electronic data processing systems in general law. Proc. ABA Electronic Data Retrieval Comm. Conf., Washington, D.C. p. 3 (1960), Bureau of National Affairs, Washington, D.C., 1961; see also M.U.L.L. pp. 158-168 (1960).
38. Horty, J. F., The keywords in combination approach to computer research in law with comments on costs. Proc. ABA Electronic Data Retrieval Comm. Conf., St. Louis, Missouri (1961); see M.U.L.L. pp. 54-64 (March, 1962).
39. Horty, J. F., Kehl, W. B., Bacon, C. R. T., and Mitchell, D. S., An information retrieval language for legal studies. Commun. Assoc. Computing Machinery 4 (9), 380-389 (1961).
40. Jancin, J., The electronic inventor-A fantasy. J. Patent Office Soc. 43, 857-861 (Dec. 1961).
41. Jacobstein, J. M., The computer and legal implications. Quaere (Dec. 1960), semiannual of the Univ. of Colorado School of Law.
42. Kalikow, M., Patent infringement determined by information retrieval. Proc. ABA Patent Trademark and Copyright Section Symposium on Information Retrieval, St. Louis, Missouri pp. 188-200 (1961).
43. Kelso, L. O., Does the law need a technological revolution? Rocky Mountain Law Rev. 18, 378 (1946).
44. Kent, A., Experience with the application of electronic data processing systems in general law. M.U.L.L. pp. 179-185 (1960).
45. Kort, F., Predicting Supreme Court decisions mathematically: A quantitative analysis of the "right-to-counsel" cases. Am. Polit. Sci. Rev. 51, 1 (March 1957).
46. Kort, F., Reply to Fisher's "Mathematical Analysis of Supreme Court Decisions." Am. Polit. Sci. Rev. 52, 339 (June 1958).
47. Kort, F., The quantitative content analysis of judicial opinions. Polit. Research: Organization and Design 3, 11 (March 1960).
48. Kort, F., The quantitative content analysis of judicial decisions. Annual of Political Science (Prof. Heinz Eulau, Stanford, California, ed.), in press. The Free Press, Glencoe, Illinois, 1962.
49. Lawlor, R. C., Analysis of patent claims by mathematical logic. Proc. ABA Patent Trademark and Copyright Section Symposium on Information Retrieval, St. Louis, Missouri pp. 201-228 (1961). (Proceedings are obtainable at ABA, Chicago, Illinois.)
50. Lawlor, R. C., Prediction of Supreme Court Decisions. Proc. 2nd Natl. Law and Electronics Conf., Arrowhead Lake, California (May 1962), sponsored by Univ. of California at Los Angeles and Systems Development Corp. Matthew Bender, Albany (in press).
51. Ledley, R. S., and Lusted, L. B., The use of electronic computers to aid in medical diagnosis. Proc. IRE 47, 1970-1977 (November 1959).
52. Lewis, G. J., Electrical revolution in legal research. Illinois Bar J. 47, 680 (April 1959).
53. Loevinger, L., The element of predictability in judicial decision making. Proc. 1st Natl. Law and Electronics Conf., Arrowhead Lake, California, October (1960), sponsored by Univ. of California at Los Angeles. Matthew Bender, Albany, 1962.
54. Loevinger, L., Jurimetrics-Science and prediction in the field of law. Proc. ABA Electronic Data Retrieval Comm. Conf., St. Louis, Missouri (1961); see M.U.L.L. in press (1962).
55. Mathews, G. E., Computer dollars and sense in lawyer's time records. Practical Lawyer 7, &22 (May 1961).
56. McKinnon, F. B., Leary, J. C., and Levinson, D., The American Bar Foundation project on the survey of American statutory law. Proc. Systems of Information Retrieval Conf., Cleveland, Ohio (April 1957), sponsored by Western Reserve Univ. Center for Documentation and Communication Research.
57. Mehl, L., Automation in the legal world-machine processing of legal information on the "law machine." Proc. Symposium on the Mechanization of Thought Processes, Teddington, Middlesex, England (November 1958), sponsored by the (British) National Physical Laboratory.
58. Melton, J. S., and Bensing, R. C., Searching legal literature electronically: Results of a test program. Minnesota Law Rev. 45, 229 (December 1960).
59. Molina, E. C., The Science of Chance Invades the Realm of the Law. New Jersey Law Institute, Newark, New Jersey, 1954. (See ref. 18.)
60. Morgan, R. T., The point of law approach to computer research in law. Proc. ABA Electronic Data Retrieval Comm. Conf., St. Louis, Missouri (1961); see M.U.L.L. pp. 44-48 (March, 1962).
61. Nagel, S., Weighting variables in judicial prediction. M.U.L.L. pp. 93-96 (1960).
62. Nagel, S., Using simple calculations to predict judicial decisions. Am. Behavioral Scientist 4, 24-28 (December 1960).
63. Pfeiffer, J. E., Symbolic logic. Sci. American pp. 22-24 (December 1950).
64. Schubert, G. A., Quantitative Analysis of Judicial Behavior. The Free Press, Glencoe, Illinois, 1959; reviewed by F. Kort in M.U.L.L. pp. 143-145 (1960).
64a. Smith, B. M., Mens rea and murder by torture in California. Stanford Law Rev. 10, 672-693 (July 1958).
65. Suppes, P., Introduction to Logic. Van Nostrand, Princeton, New Jersey, 1957.
66. Tammelo, I. (edited by Duke Univ. Faculty of Law), Sketch for a symbolic juristic logic. J. Legal Educ. 8, 277 (1956); see also R. L. Clark's criticism [18].
67. U.S. Senate, 85th Congress, Mid-session. Hearings of subcommittee of the committee on government operations on S. 3126 (Science and Technology Act of 1958), Part I, pp. 250-251.
68. Waddell, W., Jr., Structure of Laws as Represented by Symbolic Methods. Ward Waddell, Jr., San Diego, California, 1961.
69. Warner, H. R., Toronto, A. F., Veasey, L. G., and Stephenson, R., A mathematical approach to medical diagnosis. J. Am. Med. Assoc. 177, 177-183 (1961).
70. Ziembinski, Z., Logic in law schools in Poland. M.U.L.L. pp. 98-99 (1960).
71. Bank of America National Trust and Savings, etc. vs. County of Los Angeles, etc., Nos. 758,864 and 784,689, Superior Court of Los Angeles. (Decision not published.)
EXHIBIT“A”. EXAMPLES OF PREDICTIONS OF DECISIONS IN RIGHT-TO-COUNSEL CASESMADEON IBM iO90 .
W.
-
-.
F t GRUARY
lily62
JOHN 4AWYER
__
16-MPlN STREET HGPE T O h N U.S.A.
__ -.
-.
RE.
34
YOUR D O C K t T NO.
_ _
. _
O i A K MR.
- ___ ,
i
LAWYER
____
AN ANALYSIS HAS BEN M A D E O F THE R I G H T - T O C O U N S E L L S E - -THAT YOU PRESENTEI) E N T I T L t D ANONYMOUS NOS. 6 AND 7 V. B A K L K . 3 6 0 U . S . 287 (1959) T H E P N A L Y S I S WAS P E R F U K M E D ON A N I B M 7090 H I T H A S P E L I A L F O K T H A N PRUGKAM. E X C E P T FOR T H E M E l H O O D E V E L O P E D B Y D E V E L O P E D BYF R E D KDKT-!- THE M E T H O D S _ E M P L O Y E 0 -!EKEIN_WERf R t E U C. L A k L O R .
--
__
THE R E S U L T S S E T - F O R T H ~IELOW F O R E C A S T THE V O T E OF THE UNITEU S I A T - E S - S U P R E M E COURT. A S A h l l O L E r A N D CHA-VOTES O F THE I h 3 I VIUUAL JUSTICES..ASSUMING 1 H A T T h E Y A N D THk C O U R I K E C O L N I I C
__IN -
15
19
Y ~ U RC A S E
_ _
THE-FULLOMING
ARE-PRFGNT--- -
FACTS
--
T H E P E T I T I I I N E R L A C K E D A S S I S T A N C E - O F CUUNSE-L A T SOME P H A S E O F _TH$ P_RO_C_EEDING UTHkK T H A N A T
T I M E S O F T H E A K R A I G N M C N I , THE P R E P A R A T I O N F O R T R I A L , THE T R I A L , OR T H E S E N T E N C I N G . THE P t T I T I O N E R NEVER E X P L I C I r L Y H A I V t O R I G H T i TO COUNStL. -
Q LHE R-_FACTS- FQUNO -1 N -PK E V I OU
-.
- F I R S T TECHNIQUE
--
@ /
1-H Lb__F_I_c_L&!A t _A B S E N 1.
C A SE-S--LN-
- - -
..
c.-B”
30 0s .-
I
- -
-
__
Encn L ~ F THE
FOLLUWIN~ - P w - C R A N O C K R Y ~ n s t r sINCLCD~S O N L Y PRESENT I N _ Y O U s _ C $ S k . THE J U S T I C E S WHO v c r E D PRO ON T H E RESPECTIVE LASE?, A K E INDICATED.
F-A L T S - TH A. T - AKE -.
- . _.
. .
NUNt
.
.
.
.
.
.
.
.
.
-
-
- .........
C A L H O F T H t F O L L O U I N G -CONCI
__
3 3 4 U. .
5.
_-_ -.
728 ( 1 9 4 8 ) -
I
.
.
-._-
-
.
--
-
. . .
. . . . . . . . . . . . . . . . . .
.......................
__ .
- . ., . .
.. - . . . . . . . . . . . ’ IA 7
..................
- ..............
Al
..............
.....................
MR. JOHN LAWYER FEBRUARY. - 1 i l . 9 6 2 PAGE 2
.-.-,
,
,
,
....
. -.
..
. . . . . . . . . . .
._
..-...._._- __...... . -. _*_-.
.
...........................
2 o INV I NKSt O N LR.oBAN, 352 . .- . W K F U K . T E R..............
.-
u-.- s.
.....
3 3 0 -~ ~ - ~ ~ ~ ~ - - - - - - - . - - - - - - . -
..............
-
__ _. WKroN . . . . . . .CLARK H A X L b N ........ _ _ _ .......
..- S E C f J N 0 , T E C H N I Q U E
-
.
.............
UASEC UPON .... A.- S P E C I A L , . ~ W E I ~ C H ~ r FORMULA ~I~G .THAT HAS BEEN DEVEL"PfO, IT Is PR~,,ICTE" T H A T T~,E-.Gdu.R.T~n~s- A. wtra.LE..~' Gsi-L.c--.. CGNSIDEKLYUUR ~.. CAS:E-,A.-C!N-. .C/\SE, . . . . . . . . . ... .--
._THIRO ..TECHN
fQUE
-. . . . . . . . . . . . . . . . . . .
................
.............
BASEC -_UPON -___. .--S -P E C I A L
W t I ~ H I _ I N C _ F O ~ R M U L ~ S - T ~ A THAVk H-E-KN--DtVELr OPED F O R THE I N D I V I D U A L JUSTLCES. I T IS A N r I C I P A T E D THAT EACH _ _OF- THE FOLLOWING J U S T I C E S NOW S I T J ' I N G Tklf-CCURT H I L L l K E A T YOUR CASE A S A -PRO- CASE O K A S A -CON- CASE A S - I.FIUICPTEO. . . . . . . . . . . . . . . . . . . . . . . ......... - . .... P R O -. . . . . . BLACK CON FRANKFUHTER PRG UOUGLAS __ . CCN CLARK .pH!! .-. . . .WAKR.tl'?-.. .......................
--
^
...
-
.
-
_. -
. . . . . . ,
NO___FOKMULAS - P A V E YET UEEN- L)EV-ELUPJI)- F U R _ ~ D ~ ~ E _ I I I C ~ ~ / N ( ; - T W E V O - ~ E - _ I I F THE FOLLOWING J U S T I C E S I N T H I S K I N 0 OF CASE.
__. .
........................................................
..... ..
HA RL AN --k" BREh9AN I TAKER. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . STEWART ......
I T
....................
-.
.....
._ ...
-IS! T H E R E F O K E ~ P R t D I C A T E D TH.AT ,Tilt, WOULD HE.
COURT O N T H I S CASE
. . . . . . . . . . . .
VlJJE
.
.UF .TtIE, S!JPREt'.E-
. . . . . . . . . . . . . .
54.10 .
~.
. . . . . . . . . . . . . . . . . . . . . . . . . .........
......
-. .." _.
.-.... .... .- .........
-. . . . . . . . . . . . . . . .
.-._-
.- ....... A2
....
........
......
...........
~
MI(.
JUHN-LAWYER - FE!ltUAHY .. . I * LYbZ. ... PALE 3 ......
-.
..
.......
~-~~
.............
........
.....
~
S I N C L YOUR CASE H A 5 A CASE .VALUE L E S S T H A N 3 7 5 . 0 , KOI(1.S I E C H N I Q U t I N D I C A T E S T H A T T t l E S U P R E M E C U U K l WOULO T R E A T YGUR C A S t A S A -CONC4SE. . _ . .- .__ ................
-
SUNMAKY
-
-
....
..........................
- .................................................
...........
.........
...... ._
....
-.-
.
.____I__.. I..__.
.......
. _ ......
................
.
"
..........
.
.....
.......
____.____-_I_____.-___--
-
.
..........................................
...
---
....
- ................................. ..
.................................
...
.................
349
.........
.
---
.
...
___
-.____._-..__I______._-_-
- ..............
..
..........
._
..............
---.
FEHKUARY
1 , 11962
MK. J C t i N L A W Y C R 16 M b l N S T R E t T
H C M E TUWY U.S.A. KL.
U E A K MK. .
YUUK I ) u c K E T
36
NO.
LAWYtK
___
A N A N A L Y S I S H A S H t C N MADE O F THE R I G H T TO C O U N S E L LASt l H A T YOU P K E S E N T k U E N T I T L E D M C N E A L V. C U L V € K , 365 U.S. 169 ( 1 9 6 1 ) T H E A N A L Y S I S WAS P E K F O R M E O O h A N I B M 709C W I T H A S P t C I A L F O R T R 4 h PRUGKAH. E X C E P T F U R T H t H E l H O U D E V t L U P E D B Y F K t D K U K T , T b E M E T H O D S E M P L O Y E D H E K E I N WERL U E V K L O P C O B Y REED C. LAkLUR.
-
T H t f i t S U L I S S E T F O R T H D E L U l j F O R t C h S T T H t V O T t LIF T H E U N I T E D S l ~ ~ St U SP K E M E C O U R T , A S A WHOLE, AN0 T H t V I I T E S OF T H E I N D I V I O U A L J U S T I C E S , A S S U M I N L T H A T THEY A N D T H E C O U K T R E C O G F c I Z E A S_ _I Y_P O R T A N T . I & _ I H E C A S E T H E 5 4 M E F-ALTS T H A I . Y b L J _ - S T A T C _ A R E PRESENT. FUUR D I F F E R t k T P R l r O I C T I O N T L C H N I G U E S ARE EMPLOYED.
.
IN YOUR CASE THE FOLLOWING FACTS ARE PRESENT-

 3  THE PETITIONER WAS CHARGED WITH A CRIME SUBJECT TO TWENTY
    TO THIRTY YEARS IMPRISONMENT.
 6  THE PETITIONER IS EITHER ILLITERATE OR HAS A SUBNORMAL
    EDUCATION OR HAD ONLY LIMITED CONTACT WITH THE CULTURE IN
    WHICH HE WAS TRIED.
 7  THE PETITIONER HAS A RECORD OF CONFINEMENT IN A MENTAL
    INSTITUTION PRIOR TO THE COMMISSION OF THE ACTS INVOLVED
    IN THE CRIMINAL CHARGE.
 9  THE PETITIONER HAD NO PREVIOUS EXPERIENCE IN COURT OR HAD
    NO ADEQUATE KNOWLEDGE OF COURT PROCEDURE IN SPITE OF
    PREVIOUS CONVICTIONS.
11  THE PETITIONER WAS ARRAIGNED AT A TIME WHEN HE LACKED
    ASSISTANCE OF COUNSEL.
12  THE PETITIONER HAD NO ASSISTANCE OF COUNSEL BETWEEN
    ARRAIGNMENT AND TRIAL OR THE HEARING ON THE PLEA OF
    GUILTY.
13  THE PETITIONER HAD NO ASSISTANCE OF COUNSEL AT THE TRIAL
    OR THE HEARING ON THE PLEA OF GUILTY.
14  THE PETITIONER HAD NO ASSISTANCE AT THE TIME OF
    SENTENCING.
17  THE PETITIONERS REQUEST FOR ASSIGNMENT OF COUNSEL WAS
    DENIED.
19  THE PETITIONER NEVER EXPLICITLY WAIVED RIGHT TO COUNSEL.
39  A COMPLEX ISSUE WAS PRESENT SUCH AS WHETHER THE TRIAL
    COURT HAD JURISDICTION OF THE CASE OR COMPLICATED CHARGES
    WERE INVOLVED.
B1
MR. JOHN LAWYER     FEBRUARY 1, 1962     PAGE 2
OTHER FACTS FOUND IN PREVIOUS CASES IN THIS FIELD ARE ABSENT.

FIRST TECHNIQUE

THERE ARE NO CRANBERRY CASES EITHER PRO OR CON.
SECOND TECHNIQUE

BASED UPON A SPECIAL WEIGHTING FORMULA THAT HAS BEEN
DEVELOPED FOR THE COURT AS A WHOLE, IT IS PREDICTED THAT
YOUR CASE WILL BE CONSIDERED A -PRO- CASE.
THIRD TECHNIQUE

BASED UPON SPECIAL WEIGHTING FORMULAS THAT HAVE BEEN
DEVELOPED FOR THE INDIVIDUAL JUSTICES, IT IS ANTICIPATED THAT
EACH OF THE FOLLOWING JUSTICES NOW SITTING ON THE COURT
WILL TREAT YOUR CASE AS A -PRO- CASE OR AS A -CON- CASE AS
INDICATED.

NO FORMULAS HAVE YET BEEN DEVELOPED FOR PREDICTING THE VOTE
OF THE FOLLOWING JUSTICES IN THIS KIND OF CASE.

HARLAN
BRENNAN
WHITTAKER
STEWART

IT IS THEREFORE PREDICTED THAT THE VOTE OF THE COURT ON THIS
CASE WOULD BE

4 PRO VOTES
1 CON VOTES
4 UNDETERMINED VOTES
FOURTH TECHNIQUE

BASED UPON THE TECHNIQUE DEVELOPED BY FRED KORT, PROFESSOR
OF POLITICAL SCIENCE AT THE UNIVERSITY OF CONNECTICUT, YOUR
CASE HAS A CASE VALUE OF

466.60

SINCE THE CASE VALUE IS EQUAL TO OR EXCEEDS 375.0, KORTS
TECHNIQUE INDICATES THAT THE SUPREME COURT WOULD TREAT YOUR
CASE AS A -PRO- CASE.
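[Editor's note: the fourth technique above reduces to a threshold test: a weighted sum of the facts present in the case (the "case value") is compared with 375.0, and a value at or above the threshold predicts a pro decision. A minimal Python sketch of that rule follows; only the 375.0 threshold, the pro/con rule, and the 466.60 case value come from the letter, while the fact weights shown are hypothetical placeholders.]

```python
# Sketch of Kort's case-value threshold rule as the letter states it.
# Only the 375.0 threshold, the pro/con rule, and the 466.60 value are
# from the letter; the per-fact weights below are hypothetical.
KORT_THRESHOLD = 375.0

def case_value(present_facts, weights):
    """Weighted sum over the fact numbers found present in the case."""
    return sum(weights[fact] for fact in present_facts)

def kort_prediction(value):
    """PRO if the case value equals or exceeds the threshold, else CON."""
    return "PRO" if value >= KORT_THRESHOLD else "CON"

# Hypothetical weights for three fact numbers, chosen only so the total
# matches the letter's reported case value of 466.60.
weights = {3: 150.0, 6: 180.0, 7: 136.6}
v = case_value([3, 6, 7], weights)
print(v, kort_prediction(v))  # the letter's computed value and prediction
```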
B2

MR. JOHN LAWYER     FEBRUARY 1, 1962     PAGE 3
SUMMARY

PRO VOTES            4
CON VOTES            1
UNCERTAIN VOTES      4

PREDICTION . . . . . . . . . . . . . . . PRO
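[Editor's note: the letter's third technique and summary amount to a simple tally: each justice with a weighting formula contributes a PRO or CON prediction, and justices without formulas (Harlan, Brennan, Whittaker, and Stewart in the letter) are counted as undetermined. A Python sketch of that tally follows; only the 4-1-4 totals and the four unmodeled justices come from the letter, and the PRO/CON assignment among the remaining five 1962 justices is a hypothetical illustration.]

```python
from collections import Counter

# Per-justice predictions; None marks a justice with no formula yet.
# The four None entries follow the letter; the PRO/CON assignment among
# the other five sitting justices is hypothetical illustration only.
predictions = {
    "WARREN": "PRO", "BLACK": "PRO", "DOUGLAS": "PRO", "CLARK": "PRO",
    "FRANKFURTER": "CON",
    "HARLAN": None, "BRENNAN": None, "WHITTAKER": None, "STEWART": None,
}

tally = Counter(
    "UNDETERMINED" if vote is None else vote for vote in predictions.values()
)
print(tally["PRO"], "PRO VOTES")                    # 4 PRO VOTES
print(tally["CON"], "CON VOTES")                    # 1 CON VOTES
print(tally["UNDETERMINED"], "UNDETERMINED VOTES")  # 4 UNDETERMINED VOTES
```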
Author Index

Numbers in parentheses are reference numbers and are included to assist in locating references when the authors’ names are not mentioned in the text. Numbers in italics refer to the page on which the reference is listed.
A
Ablow, C. M., 185, 186
Alexander, S. N., 82 (1, 11), 152, 153
Allen, L. E., 300 (4, 5), 306, 309, 335 (5), 343
Andrews, D. D., 313 (8), 335 (8), 343
Aroian, L. A., 50, 74
Arrow, K. J., 185 (2), 186
Ashby, W. R., 334, 343
B
Bacon, C. R. T., 346
Baker, R. M. L., 7 (2), 15, 21 (2), 36, 49, 74
Bartholomew, P. C., 343
Bauer, W. F., 278 (1), 296
Baxter, D. C., 296
Beale, E. M. L., 186, 186
Bekey, G. A., 278 (24), 297
Benington, H. D., 84 (7), 152
Bensing, R. C., 346
Berkeley, E. C., 305, 309 (11), 343
Birkel, G., Jr., 285 (4, 5), 296
Birkhoff, G., 191, 196, 197 (1), 205 (1), 206 (1), 271
Biunno, V. P., 343
Blackwell, D., 185 (9), 187
Blanyer, C. G., 297
Boole, G., 305, 325, 343
Brenner, J. L., 30, 74
Brigham, G., 185, 186
Brooks, R. B. S., 300 (4, 5), 306 (4, 5), 335 (5), 343
Brouwer, D., 25, 30, 64, 74
Brown, J. R., 308, 343
Brown, L., 324 (16), 344
Brown, R. R., 185, 187
Bruce, G. H., 271
Burns, A. J., 279 (8, 9), 281 (8, 9), 297
Burns, M. C., 285 (10), 297

C
Carroll, C. W., 185, 187
Carroll, Lewis, 309, 344
Clark, R. L., 305 (18), 344
Cocke, J., 78 (2), 152
Codd, E. F., 87, 105 (3), 152
Cohen, C. J., 28, 74
Colby, R., 337 (Ma), 344
Connelly, M. E., 287 (11), 297
Conte, S., 271
Cournot, A. A., 325, 344
Cox, F. B., 287 (22), 297

D
Dahlquist, B., 22, 74
Dames, R. J., 271
Dantzig, G. B., 186 (19), 187
de Boor, C. M., 261, 262, 271
de Haan, C. S., 344
DeLand, E. C., 185, 187
de Laplace, Pierre Simon, 325, 344
Dickerson, F. R., 344
Diliberto, S. P., 30, 33, 74
Dornheim, F. R., 186, 187
Douglas, J., Jr., 193, 205, 210, 271
Dreyfus, P., 78 (5), 152
E
Eckels, A., 29 (32), 75
Eckert, J. P., 82 (6), 152
Eckert, W. J., 64, 74
Ehricke, K., 74
Ehrlich, L., 211, 227, 232, 273
Ernst, A. A., 278 (34), 298
Everett, R. R., 84 (7), 152
F
Fiacco, A. V., 185, 187
Fiordalisi, V., 344
Fisher, F. M., 325 (24, 25), 328, 335 (24), 344
Flanders, G. A., 229, 272
Forsythe, G. E., 191 (8), 195 (8), 263 (8), 271
Fort, T., 204 (9), 271
Frank, M., 185, 187
Frankel, S., 192 (10), 271
Frankovich, J. M., 82 (8), 152
Freed, R. N., 335, 344
Freund, R., 30 (10), 33 (10), 74
Frisch, R., 185, 187
G
Gantmakher, F., 204 (10a), 272
Garfinkel, B., 30, 74
Garrett, J. R., 24, 39 (14), 64, 74
Gass, S. I., 156 (12), 184, 187
Gelernter, H., 334, 344
Gill, S., 82, 152
Golub, G. H., 229 (11), 272
Greenstein, J. L., 297
Griffith, R. E., 186, 187
H
Habetler, G. J., 193, 194, 195, 196, 206 (24), 211 (24), 262 (24), 272
Hahn, W. R., Jr., 288 (38), 298
Hamming, R. W., 22 (15), 74
Hansen, J. R., 334 (33), 344
Hanson, G., 62, 74
Hartsfield, E., 278 (14), 297
Hayden, R. F. C., 339, 344
Heid, J., 279 (27), 298
Heller, J., 209, 272
Henrici, P., 25, 40, 75
Hensley, D. R., 339 (36), 345
Herrick, C. E., 13, 14, 75
Herrick, S., 17, 75
Herzog, A. W., 287 (15), 297
Hildebrand, F., 13 (23), 20 (23), 23, 25, 75
Holt, J., 56 (30), 66 (30), 75
Hori, G., 31 (24), 75
Horty, J. F., 313 (37), 321 (38), 335 (37), 342, 345
Horwitz, R. D., 297
Householder, A. S., 195, 212
Hubbard, E. C., 28, 74
Hurney, P. A., Jr., 297
Hurwicz, L., 185 (2), 186
J
Jackson, A., 279 (18), 297
Jacobstein, J. M., 345
James, P. A., 300 (4, 5), 306 (4, 5), 335 (5), 343
Jancin, J., 337 (40), 345
K
Kahn, W., 96 (13), 153
Kalikow, M., 313 (42), 335 (42), 345
Kehl, W. B., 345
Kelley, J. E., Jr., 186, 187
Kelso, L. O., 310, 311, 342, 345
Kent, A., 312 (44), 345
King-Hele, D. G., 30, 75
Kolsky, H. G., 78 (2), 152
Kopal, Z., 39, 75
Kopp, R. E., 279 (8, 9), 281 (8, 9), 297
Kort, F., 325 (45, 46), 328, 330, 335 (45, 46, 47, 48), 345
Kozai, Y., 30, 75
Krein, M., 204 (10a), 272
Kryloff, N., 276
Kyner, W. T., 30 (10), 33 (10), 74
L
Lapides, L., 279 (31), 298
Latta, G. E., 30 (4), 74
Lawlor, R. C., 308 (49), 331 (50), 333 (50), 335 (49, 50), 337 (50), 345
Leary, J. C., 346
Ledley, R. S., 333 (51), 345
Lee, R. C., 287 (22), 297
Lees, M., 272
Leger, R. M., 297
Leiner, A. L., 82 (11, 12), 152
Levinson, D., 346
Lewis, G. J., 345
Loevinger, L., 325, 345
Lourie, N., 96 (13), 153
Loveland, D. W., 334 (33), 344
Lowry, E. S., 87 (4), 152
Lusted, L. B., 333 (51), 345
M
McCarthy, J., 84 (19), 153
McDonough, E., 87 (4), 152
Mace, D., 28 (44), 76
McKinnon, F. B., 346
McLeod, J. H., 297
Manne, A. S., 185, 187
Mathews, G. E., 304 (55), 346
Maxwell, M. E., 298
Mehl, L., 302, 346
Melton, J. S., 346
Mersel, J., 82 (14), 153
Miller, C. E., 186, 187
Milsum, J. H., 296
Mitchell, D. S., 345
Mitchell, H. F., 82 (6), 152
Molina, E. C., 325, 346
Morgan, R. T., 320, 346
Mori, H., 297
Morrison, D., 53 (29), 54 (28), 56 (30), 66 (30), 75
Musen, P., 30, 75
N
Nagel, S., 331, 335 (61, 62), 346
Neustadt, L. W., 278 (24), 297
Nigro, J. P., 278 (34), 298
Nothman, M. H., 297
Notz, W. A., 82 (12), 153
O
O'Keefe, J. A., 29 (32), 75
Ostrowski, A. M., 224, 272
Ottoson, H., 279 (18), 297
P
Palevsky, M., 297
Parter, S. V., 228, 272
Paskman, M., 279 (27), 298
Paul, R. J. A., 298
Payne, M., 7 (33), 36 (33), 75
Peaceman, D. W., 193, 211, 262, 271
Peet, W. J., 298
Perry, M. N., 85 (15), 153
Peterson, H. P., 82 (8), 152
Pfeiffer, J. E., 305, 346
Pines, S., 7 (33), 36, 75
Plugge, W. R., 85 (15), 153
Porter, R. E., 78 (16), 153
Pyne, I. B., 185, 187
R
Rachford, H. H., 193, 205, 211, 219, 262, 271, 272
Rademacher, H., 25, 75
Reach, R., 96 (13), 153
Remes, E., 261, 272
Rice, J. R., 260, 261, 262, 271, 272
Robison, D. E., 50, 74
Rochester, N., 81, 153
Rosen, J. B., 185 (17), 187
Routh, D., 62, 74
S
Scalzi, C. A., 87 (4), 152
Schmid, H., 288 (30), 298
Schrimpf, H., 96 (13), 153
Schubert, G. A., 328, 346
Shapiro, I., 50 (36), 75
Shapiro, M., 186, 187
Shapiro, S., 279 (31), 298
Shortley, D., 229, 272
Shumate, M. S., 298
Skramstad, H. K., 278 (33, 34), 288 (35, 38), 298
Smart, W. M., 4 (37), 75
Smith, B. M., 308, 346
Smith, J. L., 82 (12), 153
Smith, N. M., 185 (9), 187
Smith, O. K., 28, 31 (41), 63, 64, 75
Squires, R. K., 29 (32), 75
Stein, M. L., 279 (36), 298
Stephenson, R., 333 (69), 346
Stewart, R. A., 186, 187
Stiefel, E. L., 261, 272
Struble, R., 30, 76
Suppes, P., 305 (65), 309 (65), 346
Susskind, A. K., 298
Swerling, P., 50 (43), 56, 75
T
Tammelo, I., 346
Teager, H. M., 84 (18, 19), 153
Thomas, L. M., 28, 76
Thrall, R. M., 205, 272
Titus, J., 76
Tornheim, L., 205, 272
Toronto, A. F., 333 (69), 346

U
Urban, W. R., 288 (38), 298
Uzawa, H., 185 (2), 186

V
Varga, R. S., 191, 192 (19), 196, 197 (1), 203, 204 (19), 205 (1), 206 (1), 226, 227, 228 (20, 21), 229 (11), 230 (19), 240, 271
Veasey, L. G., 333 (69), 346

W
Wachspress, E. L., 193, 194, 195, 196, 204, 206 (24), 211, 232, 260, 262, 272
Waddell, W., Jr., 306, 346
Warner, H. R., 333 (69), 346
Wasow, W. R., 191 (8), 195 (8), 263 (8), 271
Weinberger, A., 82 (12), 153
Weiner, J. R., 82 (6), 152
Welsh, H. F., 82 (6), 152
West, G. P., 278 (1), 296, 298
Weyl, H., 202, 273
Wilson, A. N., 278 (42), 279 (42), 298
Witzgall, C., 186, 187
Wolf, H., 7 (33), 36 (33), 75
Wolfe, P., 185, 186 (22), 187
Wortzman, D., 285 (43), 298
Wright, R. E., 298

Y
Young, D., 211, 225 (as), 227, 229, 232, 240, 247, 273

Z
Ziembinski, Z., 309 (70), 346
Zoutendijk, G., 185, 187
Zraket, C. A., 84 (7), 152
Subject Index

A
Abstracting, automatic, 341
Accuracy tests for integration, 35-47
Adams-Moulton formulas, 22
Adjoint system of differential equations, 39
Advance commitments, 115-116
Allocation of computer components, 105-106; see also Tape, Disk, Core storage
Analog-digital variables, 288-296; see also Arithmetic
Analog-to-digital, see Converters
Arithmetic, analog-digital, 283-288
  hybrid, 285
Asymptotic rate of convergence, 195, 200, 212, 223, 225-227
B
Batch processing, of computer problems, 83
Batch multiprogramming, 86, 87-104, 113
Biology, information growth in, 301, 303
BLOCK operation, 132, 144, 145
Blocks, of core storage space, 112, 114
Boolean algebra, 305
Buffer service, 97-98
Burnout parameters, 49, 50-55, 64-66
C
Calculus of classes, 305
  of functions, 305
Chebyshev polynomials, 229
  property, 259
  cyclic semi-iterative method, 250
Columnar procedures, 174, 175-180
Combined analog-digital computers, 277, 280
Combined analog-digital solution, examples, 281-283
Comparison of analog and digital computers, 275-277
  of analytic and numerical integration, 36
  among numerical integration methods, 38, 42-47
Concept profile, 322
Concurrency of computer operations, 78; see also Phase, Task c.
Convergence, of simplex corrected gradient method, 170
  of separable programming, 178
  of cutting plane method, 181
  of ADI methods, see Asymptotic rate of c.
Converters, analog-digital and digital-analog, 277, 280, 283-285
Copyright, see Infringement
Copyrighting of computer products, 337, 338
Core storage allocation, 112-115
  modules, 90
Court decisions, prediction of, 324-333
  retrieval of information, 311-323
Courts, see Law
Cowell method, 6, 7, 14, 28, 42-47
Crank-Nicolson method, 230
Cutting-plane method, 174, 180-183, 186

D
Data editing, 49, 50, 71
Data exceptions, 128
Decomposition procedure, 178-180, 186
Definitive orbit, 48, 49
Delaunay theory, 30, 31, 34, 35
Density of atmosphere, 5
Determination of orbits, 48-69
Diagonally dominated matrices, 192, 202
Differential corrections, 49, 51, 66-69, 72
  stagewise, 56-58
Differential gradient methods, 161-165
  direct, 162-163
  Lagrangian, 163-165
Digital-to-analog, see Converters
Diliberto theory, 30, 31-35
Dirichlet problem, 231-250
Disk allocation, 110-112
Divergence of ADI methods, 253
Double precision, 37
  accumulation, 21, 23
Douglas-Rachford method, 193, 197, 206, 217-222, 256-257
  parameters, 219, 223
Drag, 5, 49
Dumping, of computer programs, 102
Dump file, 111
E
Eccentricity, 9
Eccentric anomaly, 10
Elliptic orbits, 9, 10
Encke method, 7-15, 42-47
  with E or v as independent variable, 11-13
Ephemeris generation, 70
  processor, 74
Equations of motion, 4-18
Error reduction matrix, 194-195, 197, 217
Evidence, use of computer records as, 338, 339
Execution control, 95-104
  file, 111
  phase, 91
  region, 114
F
Feasibility orbits, 2, 3
FIFO (first-come-first-served), 121
Fraud by computers, 338
Frobenius, theorem of, 205
Function generation, 276, 278
G
Gauss-Jackson method, 23, 27, 71
Gauss-Seidel method, 225, 229
Gerschgorin’s circle theorem, 204
Gradient methods, see differential, large-step, projected, simplex-corrected
H
Hansen method, 30
Helmholtz equation, 200-202, 206, 222-223, 225-227, 243
Herrick’s method, 15, 28, 42-47, 49
Hybrid arithmetic, 285
  integrator, 290-293
  multiplier, 293-295
  systems, 278, 288-296
Hyperbolic orbits, 9, 10

I
Ill-conditioning, 54
Improved orbit, 48, 49
Indexing, automatic, 304, 341
Inertial coordinate system, 4
Information growth, 299-302
  retrieval, 310-323
  storage, 310-323
Infringement of copyright by computer records, 335-337
Initiation of algorithms (in mathematical programming), 183-184
Injection parameters, see Burnout p.
Input conversion, 70; see also Data editing
Input-output requests, 95
Input-output service trees, 137, 138, 141-150
Instability, 22
Insurance, application of Boolean algebra to, 305
Integrals of motion, 38
Integration (of ordinary differential equations), methods of, 18-28
Integrator, hybrid, 290-293
Interconnected analog and digital computers, 277-281
Internal Revenue Code, 305
Interprogram protection, 98-101, 132, 151
Interruption rules, 125-128
Inverse square law, 6
IO, see Input-output
Irreducible matrices, 192, 205, 227
Iteration parameters, 190, 193
  good, 202, 203, 262
  optimal, 198-199, 202, 231-250, 254-258, 259-261
  see also Douglas-Rachford, Peaceman-Rachford, Wachspress

J
Jacobi matrix, 225
Jacobi method, 225
  k-line, 228
Jordan normal form, 248
K
Kepler’s equation, 10-13
L
Lagrange multiplier, 165, 173
Large-step gradient methods, 165-172
Law courts, use of computers in, 339; see also Court decisions
Law, information growth in, 299-303
Law questions affecting computers, 335-339
Laws, indexing and retrieval of, 320, 340
Least squares, 50-56, 72
  weighted, 53
Legal questions, see Law
Load, on time-shared computer components, 117
  target, 118
Log (operating) file, 111
M
Mean anomaly, 9, 17
Mean motion, 17
Mechanization in law practice, 302-305
Medicine, information growth in, 301, 303
Microfilm images, automatic retrieval of, 324
Microprogramming, 125
Milne formulas, 22
Milne-Stormer formulas, 25
Minimax, 199, 211, 254-262
Missile intercept problem, 281-283
Mixed boundary conditions, 263-265
Monotonicity principle, 199, 202, 203
Moulton, see Adams
Multiplier, hybrid, 293-295
Multistep methods (of integration), 21-24, 25, 40
Multirevolution steps, 28
N
Nondata exceptions, 126-128
Nonscheduled mode of computer operation, 106
Nonstandby programs, 112
Nonsystems program file, 111
Nonuniform mesh spacings, 263-265
Norm reduction, 195-196
Normal equations, 52-55
Normalized iterations, 250
Normalized weights, for computer programs, 118
NOT READY state, 93, 102
Notations of symbolic logic, 309
Notched cards, 323-324
O
Oblateness force, 4
Observation processing, 71
Observational errors, 53, 66
Operating staff of computer, communication with, 102-103
Optical observation, 48
Optimization, of computer programs, 104-122
  of iteration parameters, see Iteration parameters, optimal
Orientation elements, 10
Ostrowski’s theorem, 224-225, 227
Overrelaxation, see Successive overrelaxation method
P
Parabolic and near-parabolic orbits, 13-15
Parameters, see Douglas-Rachford, Iteration, Peaceman-Rachford, Wachspress
Partial derivatives, computation of, 58-64, 72
Patent claim, 308-309
Patent searching, 322, 323
Patenting of computer inventions, 337, 338
Pattern recognizing machines, 321
Peaceman-Rachford method, 193, 200, 206, 210-217, 226, 229, 231-250, 254-256
  parameters of, 202, 211-215, 231-250, 262
Peekaboo systems, 323, 340
Pending workload, of computers, 90-91
Periodic two surfaces, 33
Perturbation methods, 28-35, 42-47
Petroleum industry, use of mathematical programming in, 157
Phase, see Execution, Preparation, Rewind-demount, Scheduling
  concurrency, 92
Point Jacobi matrix, 225
Post-execution control, 95
Precision orbits, 2, 3
Prediction, see Court decisions
Predictor-corrector formulas, 22, 25
Pre-execution control, 95
Preliminary orbit, 48
Preparation phase, 91
Printing, automatic, 340
Priorities, for computer programs, 120, 125, 134
Processing service tree, 135-140
Program mixes, 109
Projected-gradient procedures, 170-172, 185-186
Projection operators, 266-267
Property “A” of matrices, 225
Protection, 128-130; see also Interprogram p.
Pulsed analog techniques, 286-288
Punched cards, applications in law practice, 304, 323-324
Q
Quadratic programming, see Simplex method
Queue selection time, 121
Queuing of computer programs, 105-106, 120-122, 134-150
Queuing rules, 148-150
R
Radar observation, 48
Ranking of judges, 328, 329
Rating, of computer programs, 118, 120
READ REQUEST (pseudo-operation), 100
READY state, 93, 102
Real-time processing, of computer problems, 85
  multiprogramming, 86
Rectification, 7, 10
Reference conic, r. orbit, 7-10
Relaxation factor, 224
  optimal, 225
Reliability, of court decision predictions, 325
Relocation, 101-102, 128-130
Remes algorithm, 261
Residuals, 54, 72
Restoration of computer programs, 102
Restricted operations, 100-101, 129-130
Rewind-demount phase, 91
Round-off errors, accumulation of, 21, 25-28, 38-42
Run requests, 90
Runge-Kutta methods, 19-21, 24, 71
S
SUBJECT INDEX
Space-scheduled computer operation, 106108 Space sharing by computer programs, 7980 Space-time scheduled computer operation, 106, 108-109 Spectral radius, 195, 202, 233, 251, 253, 257 Staging file, 108, 111 Standard deviation, of errors of observation, 53 Standby program file, 111, 115 Stare decisis, 331 STARToperation, 131, 140, 141 Stationary iterative methods, 195 Statistical estimate of errors, 67-69 Status preservation, 96 Statutes, see Laws Stieltjes matrices, 192, 196, 202, 204, 227 STOPoperation, 132, 142, 143 Successive overrelaxation method (SOR), 190, 224-230, 231-250 point, 224-227 block, 227-229 multiline, 227-229 Supervisory tables, 125 Suspension, of computer programs, 102 Symbolic logic, 305-310 Symmetric rounding, 27 Syntactic ambiguities, 306, 307 Systems file, 111
361
T Tape allocation, 109-110 Task concurrency, 92 Throughput rate, 83 Time sharing, by computer programs, 79, 117 Typewriters, automatic, 304, 341
U UNBLOCK operation, 132, 146, 147 Uniform loading of channels, 109 Unisolvent, 260
V Variance-covariance matrix, 56 Variation of parameters, 15-18 Varisolvent, 260 Virtual number of iterations, 233
W Wachspress parameters, 204,211,215-217, 231-250, 262 WAIT (pseudo-operation), 93, 95 Weakly cyclic matrices, 228 WRITE REQUEST (pseudo-operation), 100