Simulation and the Monte Carlo Method
Simulation and the Monte Carlo Method
REUVEN Y. RUBINSTEIN
Technion, Israel Institute of Technology
John Wiley & Sons
New York Chichester Brisbane Toronto Singapore
A NOTE TO THE READER This book has been electronically reproduced from digital information stored at John Wiley & Sons, Inc. We are pleased that the use of this new technology will enable us to keep works of enduring scholarly value in print as long as there is a reasonable demand for them. The content of this book is identical to previous printings.
Copyright © 1981 by John Wiley & Sons, Inc. All rights reserved. Published simultaneously in Canada.
Reproduction or translation of any part of this work beyond that permitted by Sections 107 or 108 of the 1976 United States Copyright Act without the permission of the copyright owner is unlawful. Requests for permission or further information should be addressed to the Permissions Department, John Wiley & Sons, Inc.
Library of Congress Cataloging in Publication Data

Rubinstein, Reuven Y. Simulation and the Monte Carlo method. (Wiley series in probability and mathematical statistics) Includes bibliographies and index. 1. Monte Carlo method. 2. Digital computer simulation. I. Title. II. Series. QA298.R8 519.2'82 81-1873 ISBN 0-471-08917-6 AACR2
To my wife Rina and to my friends Eitan Finkelstein and Alexandr Lerner, Russian refuseniks.
Preface

In the last 15 years more than 3000 articles on simulation and the Monte Carlo method have been published. There is a real need for a book providing a detailed treatment of the statistical aspects of these topics. This book attempts to fill this need, at least partially. I hope it will make the users of simulation and the Monte Carlo method more knowledgeable about these topics.

It is assumed that the readers are familiar with the basic concepts of probability theory, mathematical statistics, and integral and differential equations, and that they have an elementary knowledge of vector and matrix operations. Sections 6.5, 6.6, 7.3, and 7.6 require more sophistication in probability, statistics, and stochastic processes; they can be omitted on a first reading. Since most complex simulations are implemented on digital computers, a rudimentary acquaintance with computer programming will probably be an asset to the readers of this book, though no computer programs are included.

Chapter 1 describes concepts such as systems, models, and the ideas of Monte Carlo and simulation. A discussion of these concepts seems necessary, as there is no uniform terminology in the literature. Instead of giving rigid definitions, I try to make clear what I mean when I use these terms. In addition to the terminology, some examples and ideas of simulation and Monte Carlo methods are given. Chapter 2 deals with several alternative methods for generating random and pseudorandom numbers on a computer, as well as several statistical methods for testing the "randomness" of pseudorandom numbers. Chapter 3 describes methods for generating random variables and random vectors from different probability distributions. Chapter 4 provides a basic treatment of Monte Carlo integration, and Chapter 5 provides a solution of linear, integral, and differential equations by Monte Carlo methods.
It is shown that, in order to find a solution by Monte Carlo methods, we must choose a proper distribution and present
the problem in terms of its expected value. Then, taking a sample from this distribution, we can estimate the expected value. In addition, variance reduction techniques (importance sampling, control variates, stratified sampling, antithetic variates, etc.) are discussed. Chapter 6 deals with simulating regenerative processes and in particular with estimating some output parameters of the steady-state distribution associated with these processes. Simulation results for several practical problems are presented, and variance reduction techniques are given as well. Chapter 7 discusses random search methods, which are also related to Monte Carlo methods. In this chapter I describe how random search methods can be successfully applied to solving complex optimization problems.

The final version of this book was written during my 1980 summer visit at the IBM Thomas J. Watson Research Center. I express my gratitude to the Computer Sciences Department for their hospitality and for providing a rich intellectual environment. A number of people have contributed corrections and suggestions for improvement of the earlier draft of the manuscript, especially P. Feigin, I. Kreimer, O. Maimon, H. Nafetz, G. Samorodnitsky, and E. Yaschin of Technion, Israel Institute of Technology, and P. Heidelberger and S. Lavenberg of the IBM Thomas J. Watson Research Center. It is a pleasure to acknowledge my debt to them. I would also like to express my indebtedness to Beatrice Shube of John Wiley & Sons and to Eliezer Goldberg of Technion for their efficient editorial guidance. Many thanks to Marylou Dietrich of IBM and to Eva Gaster of Technion for their excellent typing.

Finally, I thank the following authors and publishers for granting permission for publication of the cited material:

Pages 12-17 based on Handbook of Operations Research, Foundations and Fundamentals, edited by Joseph J. Moder and Salah E. Elmaghraby, Van Nostrand Reinhold Company, 1978, pp. 570-573.

Pages 23-25 based on D. E.
Knuth, The Art of Computer Programming: Seminumerical Algorithms, Vol. 2, Addison-Wesley, Reading, Massachusetts, 1969, pp. 155-156.

Pages 199-208 based on R. Y. Rubinstein, Selecting the best stable stochastic system, Stochastic Processes and Their Applications, 1980 (to appear).

Pages 253-255 based on R. Y. Rubinstein and I. Weisman, The Monte Carlo method for global optimization, Cahiers du Centre d'Etudes de Recherche Opérationnelle, 21, No. 2, 1979, pp. 143-149.
Pages 248-251 based on R. Y. Rubinstein and A. Kornovsky, Local and integral properties of a search algorithm of the stochastic approximation type, Stochastic Processes Appl., 6, 1978, pp. 129-134.
REUVEN Y. RUBINSTEIN

Haifa, Israel
March 1981
Contents

1. SYSTEMS, MODELS, SIMULATION, AND THE MONTE CARLO METHODS
   1.1 Systems
   1.2 Models
   1.3 Simulation and the Monte Carlo Methods
   1.4 A Machine Shop Example
   References

2. RANDOM NUMBER GENERATION
   2.1 Introduction
   2.2 Congruential Generators
   2.3 Statistical Tests of Pseudorandom Numbers
       2.3.1 Chi-Square Goodness-of-Fit Test
       2.3.2 Kolmogorov-Smirnov Goodness-of-Fit Test
       2.3.3 Cramér-von Mises Goodness-of-Fit Test
       2.3.4 Serial Test
       2.3.5 Run-Up-and-Down Test
       2.3.6 Gap Test
       2.3.7 Maximum Test
   Exercises
   References

3. RANDOM VARIATE GENERATION
   3.1 Introduction
   3.2 Inverse Transform Method
   3.3 Composition Method
   3.4 Acceptance-Rejection Method
       3.4.1 Single-Variate Case
       3.4.2 Multivariate Case
       3.4.3 Generalization of von Neumann's Method
       3.4.4 Forsythe's Method
   3.5 Simulation of Random Vectors
       3.5.1 Inverse Transform Method
       3.5.2 Multivariate Transformation Method
       3.5.3 Multinormal Distribution
   3.6 Generating from Continuous Distributions
       3.6.1 Exponential Distribution
       3.6.2 Gamma Distribution
       3.6.3 Beta Distribution
       3.6.4 Normal Distribution
       3.6.5 Lognormal Distribution
       3.6.6 Cauchy Distribution
       3.6.7 Weibull Distribution
       3.6.8 Chi-Square Distribution
       3.6.9 Student's t-Distribution
       3.6.10 F Distribution
   3.7 Generating from Discrete Distributions
       3.7.1 Binomial Distribution
       3.7.2 Poisson Distribution
       3.7.3 Geometric Distribution
       3.7.4 Negative Binomial Distribution
       3.7.5 Hypergeometric Distribution
   Exercises
   References

4. MONTE CARLO INTEGRATION AND VARIANCE REDUCTION TECHNIQUES
   4.1 Introduction
   4.2 Monte Carlo Integration
       4.2.1 The Hit or Miss Monte Carlo Method
       4.2.2 The Sample-Mean Monte Carlo Method
       4.2.3 Efficiency of the Monte Carlo Method
       4.2.4 Integration in the Presence of Noise
   4.3 Variance Reduction Techniques
       4.3.1 Importance Sampling
       4.3.2 Correlated Sampling
       4.3.3 Control Variates
       4.3.4 Stratified Sampling
       4.3.5 Antithetic Variates
       4.3.6 Partition of the Region
       4.3.7 Reducing the Dimensionality
       4.3.8 Conditional Monte Carlo
       4.3.9 Random Quadrature Method
       4.3.10 Biased Estimators
       4.3.11 Weighted Monte Carlo Integration
       4.3.12 More about Variance Reduction (Queueing Systems and Networks)
   Exercises
   References
   Additional References

5. LINEAR EQUATIONS AND MARKOV CHAINS
   5.1 Simultaneous Linear Equations and Ergodic Markov Chains
       5.1.1 Adjoint System of Linear Equations
       5.1.2 Computing the Inverse Matrix
       5.1.3 Solving a System of Linear Equations by Simulating a Markov Chain with an Absorbing State
   5.2 Integral Equations
       5.2.1 Integral Transforms
       5.2.2 Integral Equations of the Second Kind
       5.2.3 Eigenvalue Problem
   5.3 The Dirichlet Problem
   Exercises
   References

6. REGENERATIVE METHOD FOR SIMULATION ANALYSIS
   6.1 Introduction
   6.2 Regenerative Simulation
   6.3 Point Estimators and Confidence Intervals
   6.4 Examples of Regenerative Processes
       6.4.1 A Single Server Queue GI/G/1
       6.4.2 A Repairman Model with Spares
       6.4.3 A Closed Queueing Network
   6.5 Selecting the Best Stable Stochastic System
   6.6 The Regenerative Method for Constrained Optimization Problems
   6.7 Variance Reduction Techniques
       6.7.1 Control Variates
       6.7.2 Common Random Numbers in Comparing Stochastic Systems
   Exercises
   References

7. MONTE CARLO OPTIMIZATION
   7.1 Random Search Algorithms
   7.2 Efficiency of Random Search Algorithms
   7.3 Local and Integral Properties of Optimum Trial Random Search Algorithm RS4
       7.3.1 Local Properties of the Algorithm
       7.3.2 Integral Properties of the Algorithm
   7.4 Monte Carlo Method for Global Optimization
   7.5 A Closed Form Solution for Global Optimization
   7.6 Optimization by Smoothed Functionals
   Appendix
   Exercises
   References

INDEX
CHAPTER 1

Systems, Models, Simulation, and the Monte Carlo Methods

In this chapter we discuss the concepts of systems, models, simulation, and Monte Carlo methods. This discussion seems necessary in the absence of a unified terminology in the literature. We do not give rigid definitions, however, but explain what we mean when using the above-mentioned terms.

1.1 SYSTEMS
By a system we mean a set of related entities, sometimes called components or elements. For instance, a hospital can be considered as a system, with doctors, nurses, and patients as elements. The elements have certain characteristics, or attributes, that have logical or numerical values. In our example an attribute can be, for instance, the number of beds, the number of X-ray machines, skill, quantity, and so on. A number of activities (relations) exist among the elements, and consequently the elements interact. These activities cause changes in the system. For example, the hospital has X-ray machines that have an operator. If there is no operator, the doctors cannot have X-rays of the patients taken. We consider both internal and external relationships. The internal relationships connect the elements within the system, while the external relationships connect the elements with the environment, that is, with the world outside the system. For instance, an internal relationship is the relationship or interaction between the doctors and nurses, or between
[Fig. 1.1.1 Graphical representation of a system: input enters the system, which produces output; a feedback loop connects the output back to the input.]
the nurses and the patients. An external relationship is, for example, the way in which the patients are delivered to the emergency room. We can represent a system by a diagram, as in Fig. 1.1.1. The system is influenced by the environment through the input it receives from the environment. When a system has the capability of reacting to changes in its own state, we say that the system contains feedback. A nonfeedback, or open-loop, system lacks this characteristic. For an example of feedback consider a waiting line; when there are more than a certain number of patients, the hospital can add more staff to handle the increased workload.

The attributes of the system elements define its state. In our example the number of patients waiting for a doctor describes the system's state. When a patient arrives at or leaves the hospital, the system moves to a new state. If the behavior of the elements cannot be predicted exactly, it is useful to take random observations from the appropriate probability distributions and to average the performance of the objective.

We say that a system is in equilibrium, or in the steady state, if the probability of being in some state does not vary in time. There are still actions in the system, that is, the system can still move from one state to another, but the probabilities of its moving from one state to another are fixed. These fixed probabilities are limiting probabilities that are realized after a long period of time, and they are independent of the state in which the system started. A system is called stable if it returns to the steady state after an external shock to the system. If the system is not in the steady state, it is in a transient state.

We can classify systems in a variety of ways. There are natural and artificial systems, adaptive and nonadaptive systems. An adaptive system reacts to changes in its environment, whereas a nonadaptive system does not.
Analysis of an adaptive system requires a description of how the environment induces a change of state. Suppose that over a period of time the number of patients increases. If the hospital adds more staff to handle the increased workload, we say that the hospital is an adaptive system.
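The idea of limiting probabilities that are independent of the starting state can be made concrete with a small numerical sketch. This example is not from the text: the two-state system and its transition probabilities below are invented purely for illustration.

```python
import random

# A hypothetical two-state system (say, a facility that is "quiet" or "busy")
# with fixed transition probabilities; the numbers are illustrative only.
P = [[0.9, 0.1],   # from state 0: stay in 0 with prob. 0.9, move to 1 with 0.1
     [0.5, 0.5]]   # from state 1: move to 0 with prob. 0.5, stay with 0.5

def long_run_fractions(steps, start, seed=0):
    """Simulate the system and return the fraction of time spent in each state."""
    rng = random.Random(seed)
    state, visits = start, [0, 0]
    for _ in range(steps):
        # Move to the next state according to the fixed transition probabilities.
        state = 0 if rng.random() < P[state][0] else 1
        visits[state] += 1
    return [v / steps for v in visits]

# The fixed (limiting) probabilities solve pi0 * 0.1 = pi1 * 0.5 together with
# pi0 + pi1 = 1, giving pi = (5/6, 1/6).  The simulated fractions approach
# these values regardless of the state in which the system started.
print(long_run_fractions(200_000, start=0))
print(long_run_fractions(200_000, start=1))
```

Both runs print fractions close to (5/6, 1/6), illustrating that the steady-state probabilities do not depend on the initial state.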
1.2 MODELS
The first step in studying a system is building a model. The importance of models and model-building has been discussed by Rosenblueth and Wiener [32], who wrote: No substantial part of the universe is so simple that it can be grasped and controlled without abstraction. Abstraction consists in replacing the part of the universe under consideration by a model of similar but simpler structure. Models ... are thus a central necessity of scientific procedure.
A scientific model can be defined as an abstraction of some real system, an abstraction that can be used for prediction and control. The purpose of a scientific model is to enable the analyst to determine how one or more changes in various aspects of the modeled system may affect other aspects of the system or the system as a whole. A crucial step in building the model is constructing the objective function, which is a mathematical function of the decision variables. There are many types of models. Churchman et al. [4] and Kiviat [15] described the following kinds:

1 Iconic models Those that pictorially or visually represent certain aspects of a system.
2 Analog models Those that employ one set of properties to represent some other set of properties that the system being studied possesses.
3 Symbolic models Those that require mathematical or logical operations and can be used to formulate a solution to the problem at hand.
In this book, however, we are concerned only with symbolic models (which are also called abstract models); that is, we deal with models consisting of mathematical symbols or flowcharts. All other models (iconic, analog, verbal, physical, etc.), although no less important, are excluded from this book. There are many advantages to using mathematical models. According to Fishman [8] they do the following:

1 Enable investigators to organize their theoretical beliefs and empirical observations about a system and to deduce the logical implications of this organization.
2 Lead to improved system understanding.
3 Bring into perspective the need for detail and relevance.
4 Expedite the analysis.
5 Provide a framework for testing the desirability of system modifications.
6 Allow for easier manipulation than the system itself permits.
7 Permit control over more sources of variation than direct study of a system would allow.
8 Are generally less costly than the system.

An additional advantage is that a mathematical model describes a problem more concisely than, for instance, a verbal description does. On the other hand, there are at least three reservations in Fishman's monograph [8] which we should always bear in mind while constructing a model. First, there is no guarantee that the time and effort devoted to modeling will return a useful result and satisfactory benefits. Occasional failures occur because the level of resources is too low. More often, however, failure results when the investigator relies too much on method and not enough on ingenuity; the proper balance between the two leads to the greatest probability of success. The second reservation concerns the tendency of an investigator to treat his or her particular depiction of a problem as the best representation of reality. This is often the case after much time and effort have been spent and the investigator expects some useful results. The third reservation concerns the use of the model to predict beyond the range of its applicability without proper qualification.

Mathematical models can be classified in many ways. Some models are static, others are dynamic. Static models are those that do not explicitly take time-variation into account, whereas dynamic models deal explicitly with time-variable interaction. For instance, Ohm's law is an example of a static model, while Newton's law of motion is an example of a dynamic model. Another distinction concerns deterministic versus stochastic models. In a deterministic model all mathematical and logical relationships between the elements are fixed. As a consequence these relationships completely determine the solutions. In a stochastic model at least one variable is random.
While building a model, care must be taken to ensure that it remains a valid representation of the problem. In order to be useful, a scientific model necessarily embodies elements of two conflicting attributes: realism and simplicity. On the one hand, the model should serve as a reasonably close approximation to the real system and incorporate most of the important aspects of the system. On the other hand, the model must not be so complex that it is impossible to understand and manipulate. Being a formalism, a model is necessarily an abstraction. Often we think that the more details a model includes, the better it resembles reality. But adding details makes the solution more difficult and
converts the method for solving a problem from an analytical one to an approximate numerical one. In addition, it is not even necessary for the model to approximate the system closely in order to indicate the measure of effectiveness for the various alternatives. All that is required is that there be a high correlation between the prediction by the model and what would actually happen with the real system. To ascertain whether this requirement is satisfied or not, it is important to test and establish control over the solution.

Usually, we begin testing the model by re-examining the formulation of the problem and revealing possible flaws. Another criterion for judging the validity of the model is determining whether all mathematical expressions are dimensionally consistent. A third useful test consists of varying input parameters and checking that the output from the model behaves in a plausible manner. The fourth test is the so-called retrospective test. It involves using historical data to reconstruct the past and then determining how well the resulting solution would have performed if it had been used. Comparing the effectiveness of this hypothetical performance with what actually happened then indicates how well the model predicts reality. However, a disadvantage of retrospective testing is that it uses the same data that guided formulation of the model. Unless the past is a true replica of the future, it is better not to resort to this test at all.

Suppose that the conditions under which the model was built change. In this case the model must be modified and control over the solution must be established. Often, it is desirable to identify the critical input parameters of the model, that is, those parameters subject to changes that would affect the solution, and to establish systematic procedures to control them.
This can be done by sensitivity analysis, in which the respective parameters are varied over their ranges to determine the degree of variation in the solution of the model.

After constructing a mathematical model for the problem under consideration, the next step is to derive a solution from this model. There are analytic and numerical solution methods. An analytic solution is usually obtained directly from its mathematical representation in the form of a formula. A numerical solution is generally an approximate solution obtained by substituting numerical values for the variables and parameters of the model. Many numerical methods are iterative; that is, each successive step in the solution uses the results from the previous step. Newton's method for approximating the root of a nonlinear equation can serve as an example. Two special types of numerical methods are simulation and the Monte Carlo methods. The following section discusses these.
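As a minimal sketch of such an iterative numerical method, here is Newton's method for approximating a root of a nonlinear equation. The equation x**2 - 2 = 0 is an arbitrary illustrative choice, not an example taken from the text:

```python
def newton(f, df, x0, tol=1e-10, max_iter=50):
    """Approximate a root of f by Newton's method, starting from x0.

    Each successive step uses the result of the previous one:
    x_{k+1} = x_k - f(x_k) / f'(x_k).
    """
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:   # stop when the correction is negligible
            break
    return x

# Approximate sqrt(2) as the positive root of x**2 - 2 = 0.
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root)   # approximately 1.41421356...
```

Starting from x0 = 1, the iterates 1.5, 1.4167, 1.41422, ... converge to sqrt(2) in a handful of steps, each computed from the previous one, which is what makes the method iterative.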
1.3 SIMULATION AND THE MONTE CARLO METHODS
Simulation has long been an important tool of designers, whether they are simulating a supersonic jet flight, a telephone communication system, a wind tunnel, a large-scale military battle (to evaluate defensive or offensive weapon systems), or a maintenance operation (to determine the optimal size of repair crews). Although simulation is often viewed as a "method of last resort" to be employed when everything else has failed, recent advances in simulation methodologies, availability of software, and technical developments have made simulation one of the most widely used and accepted tools in system analysis and operations research. Naylor et al. [28] define simulation as follows: Simulation is a numerical technique for conducting experiments on a digital computer, which involves certain types of mathematical and logical models that describe the behavior of a business or economic system (or some component thereof) over extended periods of real time.
This definition is extremely broad, however, and can include such seemingly unrelated things as economic models, wind tunnel testing of aircraft, war games, and business management games. Naylor et al. [28] write: The fundamental rationale for using simulation is man's unceasing quest for knowledge about the future. This search for knowledge and the desire to predict the future are as old as the history of mankind. But prior to the seventeenth century the pursuit of predictive power was limited almost entirely to the purely deductive methods of such philosophers as Plato, Aristotle, Euclid, and others.
Simulation deals with both abstract and physical models. Some simulation with physical and abstract models might involve participation by real people. Examples include link-trainers for pilots and military or business games. Two types of simulation involving real people deserve special mention: one is operational gaming, the other man-machine simulation. The term "operational gaming" refers to those simulations characterized by some form of conflict of interest among players or human decision makers within the framework of the simulated environment; by observing the players, the experimenter may be able to test hypotheses concerning the behavior of the individuals and/or the decision system as a whole.
In operational gaming a computer is often used to collect, process, and produce information that human players, usually adversaries, need to make decisions about system operation. Each player's objective is to perform as well as possible. Moreover, each player's decisions affect the information that the computer provides as the game progresses through simulated time. The computer can also play an active role by initiating predetermined or random actions to which the players respond. War games and business management games are commonly discussed in the operational gaming literature (see, e.g., Morgenthaler [23] and Shubik [38]). Military gaming is essentially a training device for military leaders; it enables them to test the effects of alternative strategies under simulated war conditions. For example, the Naval Electronic Warfare Simulator, developed in the 1950s, consisted of a large analog computer designed primarily to assess ship damage and to provide information to two opposing forces regarding their respective effectiveness in a naval engagement [14, pp. 15, 16]. The exercise, which is one form of simulation gaming, has been used as an educational device for naval fleet officers in the final stages of their training. Business games are also a type of educational tool, but for training managers or business executives rather than military leaders. A business game is a contrived situation which imbeds players in a simulated
business environment, where they must make management-type decisions from time to time, and their choices at one time generally affect the environmental conditions under which subsequent decisions must be made. Further, the interaction between decisions and environment is determined by a refereeing process which is not open to argument from the players [30, pp. 7, 8].
In man-machine simulation there is no need for gaming. While interacting with the computer, real people in the laboratory perform the data reduction and analysis. The following two examples are drawn from Fishman [8]. The Rand Systems Research Laboratory employed simulation to generate stimuli for the study of information processing centers [14, p. 16]. The principal features of a radar site were reproduced in the laboratory, and by carefully controlling the synthetic input to the system and recording the behavior of the human detectors it was possible to examine the relative effectiveness of various man-machine combinations and procedures. In 1956 Rand established the Logistics System Laboratory under U.S. Air Force sponsorship [10]. The first study in this laboratory involved
simulation of two large logistics systems in order to compare their effectiveness under different management and resource utilization policies. Each system consisted of men and machines, together with policy rules for the use of such resources in simulated stress situations such as war. The simulated environment required a specified number of aircraft in flying and alert states, while the system's capability to meet these objectives was limited by malfunctioning parts, procurement and transportation delays, and the like. The human participants represented management personnel, while higher echelon policies in the utilization of resources were simulated on the computer. The ultimate criteria of the effectiveness of each system were the number of operationally ready aircraft and the dollar cost of maintaining this number. Although the purpose of the first study in this laboratory was to test the feasibility of introducing new procedures into an existing air force logistics system and to compare the modified system with the original one, the second laboratory problem had quite a different objective. Its purpose was to improve the design of the operational control system through the use of simulation.

Naylor et al. [28] describe many situations where simulation can be successfully used. We mention some of them. First, it may be either impossible or extremely expensive to obtain data from certain processes in the real world. Such processes might involve, for example, the performance of large-scale rocket engines, the effect of proposed tax cuts on the economy, or the effect of an advertising campaign on total sales. In this case we say that the simulated data are necessary to formulate hypotheses about the system. Second, the observed system may be so complex that it cannot be described in terms of a set of mathematical equations for which analytic solutions are obtainable. Most economic systems fall into this category.
For example, it is virtually impossible to describe the operation of a business firm, an industry, or an economy in terms of a few simple equations. Simulation has been found to be an extremely effective tool for dealing with problems of this type. Another class of problems that leads to similar difficulties is that of large-scale queueing problems involving multiple channels that are either parallel or in series (or both). Third, even though a mathematical model can be formulated to describe some system of interest, it may not be possible to obtain a solution to the model by straightforward analytic techniques. Again, economic systems and complex queueing problems provide examples of this type of difficulty. Although it may be conceptually possible to use a set of mathematical equations to describe the behavior of a dynamic system
operating under conditions of uncertainty, present-day mathematics and computer technology are simply incapable of handling a problem of this magnitude. Fourth, it may be either impossible or very costly to perform validating experiments on the mathematical models describing the system. In this case we say that the simulation data can be used to test alternative hypotheses. In all these cases simulation is the only practical tool for obtaining relevant answers. Naylor et al. [28] have suggested that simulation analysis might be appropriate for the following reasons:

1 Simulation makes it possible to study and experiment with the complex internal interactions of a given system, whether it be a firm, an industry, an economy, or some subsystem of one of these.
2 Through simulation we can study the effects of certain informational, organizational, and environmental changes on the operation of a system by making alterations in the model of the system and observing the effects of these alterations on the system's behavior.
3 Detailed observation of the system being simulated may lead to a better understanding of the system and to suggestions for improving it, suggestions that otherwise would not be apparent.
4 Simulation can be used as a pedagogical device for teaching both students and practitioners basic skills in theoretical analysis, statistical analysis, and decision making. Among the disciplines in which simulation has been used successfully for this purpose are business administration, economics, medicine, and law.
5 Operational gaming has been found to be an excellent means of stimulating interest and understanding on the part of the participant, and is particularly useful in the orientation of persons who are experienced in the subject of the game.
6 The experience of designing a computer simulation model may be more valuable than the actual simulation itself.
The knowledge obtained in designing a simulation study frequently suggests changes in the system being simulated. The effects of these changes can then be tested via simulation before implementing them on the actual system.
7 Simulation of complex systems can yield valuable insight into which variables are more important than others in the system and how these variables interact.
8 Simulation can be used to experiment with new situations about which we have little or no information, so as to prepare for what may happen.
SYSTEMS, MODELS, SIMULATION, AND THE MONTE CARLO METHODS
9 Simulation can serve as a "preservice test" to try out new policies and decision rules for operating a system, before running the risk of experimenting on the real system.
10 Simulations are sometimes valuable in that they afford a convenient way of breaking down a complicated system into subsystems, each of which may then be modeled by an analyst or team that is expert in that area [23, p. 373].
11 Simulation makes it possible to study dynamic systems in either real time, compressed time, or expanded time.
12 When new components are introduced into a system, simulation can be used to help foresee bottlenecks and other problems that may arise in the operation of the system [23, p. 375].
Computer simulation also enables us to replicate an experiment. Replication means rerunning an experiment with selected changes in parameters or operating conditions being made by the investigator. In addition, computer simulation often allows us to induce correlation between these random number sequences to improve the statistical analysis of the output of a simulation. In particular, a negative correlation is desirable when the results of two replications are to be summed, whereas a positive correlation is preferred when the results are to be differenced, as in the comparison of experiments. Simulation does not require that a model be presented in a particular format. It permits a considerable degree of freedom, so that a model can bear a close correspondence to the system being studied. The results obtained from simulation are much the same as observations or measurements that might have been made on the system itself. To demonstrate the principles involved in executing a discrete simulation, an example of simulating a machine shop is given in Section 1.4. Many programming systems have been developed, incorporating simulation languages. Some of them are general-purpose in nature, while others are designed for specific types of systems. FORTRAN, ALGOL, and PL/1 are examples of general-purpose languages, while GPSS, SIMSCRIPT, and SIMULA are examples of special simulation languages. Simulation is indeed an invaluable and very versatile tool in those problems where analytic techniques are inadequate. However, it is by no means ideal. Simulation is an imprecise technique. It provides only statistical estimates rather than exact results, and it only compares alternatives rather than generating the optimal one. Simulation is also a slow and costly way to study a problem. It usually requires a large amount of time and great expense for analysis and programming. Finally, simulation yields only numerical data about the performance of the system, and sensitivity
analysis of the model parameters is very expensive. The only possibility is to conduct a series of simulation runs with different parameter values.

We have defined simulation as a technique of performing sampling experiments on the model of the system. This general definition is often called simulation in a wide sense, whereas simulation in a narrow sense, or stochastic simulation, is defined as experimenting with the model over time; it includes sampling stochastic variates from probability distributions [19]. Therefore stochastic simulation is actually a statistical sampling experiment with the model. This sampling involves all the problems of statistical design and analysis. Because sampling from a particular distribution involves the use of random numbers, stochastic simulation is sometimes called Monte Carlo simulation.

Historically, the Monte Carlo method was considered to be a technique, using random or pseudorandom numbers, for solution of a model. Random numbers are essentially independent random variables uniformly distributed over the unit interval [0, 1). Actually, what are available at computer centers are arithmetic codes for generating sequences of pseudorandom digits, where each digit (0 through 9) occurs with approximately equal probability (likelihood). Consequently, the sequences can model successive flips of a fair ten-sided die. Such codes are called random number generators. Grouped together, these generated digits yield pseudorandom numbers with any required number of digits. We discuss random and pseudorandom numbers in the next chapter.

One of the earliest problems connected with the Monte Carlo method is the famous Buffon's needle problem. The problem is as follows. A needle of length l units is thrown randomly onto a floor composed of parallel planks of equal width d units, where d > l. What is the probability that the needle, once it comes to rest, will cross (or touch) a crack separating the planks on the floor?
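The needle experiment can be simulated directly on a computer. The sketch below is illustrative Python, not part of the original text: a throw is summarized by the distance of the needle's center from the nearest crack and the acute angle between needle and cracks, and the observed crossing frequency estimates the probability 2l/(pi d) derived next in the text (here with l = 1 and d = 2, so that pi can be recovered from the estimate). The seed and number of throws are arbitrary choices.

```python
import math
import random

def buffon_estimate(throws, l=1.0, d=2.0):
    """Estimate the crossing probability by simulated throws.

    A throw is summarized by the distance x of the needle's center from
    the nearest crack (uniform on [0, d/2]) and the acute angle theta
    between the needle and the cracks (uniform on [0, pi/2]); the needle
    crosses when x <= (l/2)*sin(theta).
    """
    hits = 0
    for _ in range(throws):
        x = random.uniform(0.0, d / 2.0)
        theta = random.uniform(0.0, math.pi / 2.0)
        if x <= (l / 2.0) * math.sin(theta):
            hits += 1
    return hits / throws

random.seed(7)                       # arbitrary seed, for reproducibility
p_hat = buffon_estimate(200_000)     # estimate of 2l/(pi*d); here 1/pi
pi_hat = 1.0 / p_hat                 # with l = 1, d = 2: pi = 1/P
```

With a few hundred thousand throws the estimate of pi is typically good to about two decimal places, which illustrates the slow (square-root) convergence of such sampling experiments.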
It can be shown that the probability of the needle hitting a crack is P = 2l/(πd), which can be estimated as the ratio of the number of throws hitting a crack to the total number of throws. In the beginning of the century the Monte Carlo method was used to examine the Boltzmann equation. In 1908 the famous statistician Student used the Monte Carlo method for estimating the correlation coefficient in his t-distribution. The term "Monte Carlo" was introduced by von Neumann and Ulam during World War II as a code word for the secret work at Los Alamos; it was suggested by the gambling casinos at the city of Monte Carlo in Monaco. The Monte Carlo method was then applied to problems related to the atomic bomb. The work involved direct simulation of behavior concerned with random neutron diffusion in fissionable material. Shortly thereafter Monte Carlo methods were used to evaluate complex multidimensional integrals and to solve certain integral equations, occurring in physics, that were not amenable to analytic solution.

The Monte Carlo method can be used not only for solution of stochastic problems, but also for solution of deterministic problems. A deterministic problem can be solved by the Monte Carlo method if it has the same formal expression as some stochastic process. In Chapter 4 we show how the Monte Carlo method can be used for evaluating multidimensional integrals and some parameters of queues and networks. In Chapter 5 the Monte Carlo method is used for solution of certain integral and differential equations. Another field of application of the Monte Carlo methods is sampling of random variates from probability distributions, which Morgenthaler [23] calls model sampling. Chapter 3 deals with sampling from various distributions.

The Monte Carlo method is now the most powerful and commonly used technique for analyzing complex problems. Applications can be found in many fields, from radiation transport to river basin modeling. Recently, the range of applications has been broadening, and the complexity and computational effort required have been increasing, because realism is associated with more complex and extensive problem descriptions. Finally, we mention some differences between the Monte Carlo method and simulation:

1 In the Monte Carlo method time does not play as substantial a role as it does in stochastic simulation.
2 The observations in the Monte Carlo method, as a rule, are independent. In simulation, however, we experiment with the model over time so, as a rule, the observations are serially correlated.
3 In the Monte Carlo method it is possible to express the response as a rather simple function of the stochastic input variates. In simulation the response is usually a very complicated one and can be expressed explicitly only by the computer program itself.

1.4 A MACHINE SHOP EXAMPLE
This example is quoted from Gordon [11, pp. 570-573]. For better understanding of the example an important distinction to be made is whether an entity is permanent or temporary. Permanent entities can be compactly and efficiently represented in tables, while temporary entities will be volatile records and are usually handled by the list processing technique described later.
Consider a simple machine shop (or a single stage in the manufacturing process of a more complex machine shop). The shop is to machine five types of parts. The parts arrive at random intervals and are distributed randomly among the different types. There are three machines, all equally able to machine any part. If a machine is available at the time a part arrives, machining begins immediately. If all machines are busy upon arrival, the part will wait for service. On completion of machining the part will be dispatched to a certain destination, depending on its type. The progress of the part is not followed after it is dispatched from the shop. However, a count of the number of parts dispatched to each destination is kept. Clearly, there are two types of elements in the system: parts and machines. There will be a stream of temporary elements, that is, the parts that enter and leave the system. There is no point in representing the different types of parts as different elements; rather, the type is an attribute of the parts. As indicated before, it is simpler to consider the group of machines as a single permanent element, having as attributes the number of machines and a count of the number currently busy. The activities causing changes in the system are the generation of parts, waiting, machining, and departing.

(a) System Image A set of numbers is needed to record the state of the system at any time. This set of numbers is called the system image, since it reflects the state of the system. The simulation proceeds by deciding, from the system image, when the next event is due to occur and what type of event it will be; testing whether it can be executed; and executing the changes to the image implied by the event. The image must have a number representing clock time, and this number is advanced, in uneven steps, with the succession of events in the system.
For each part record, there are four numbers to represent the part type, the arrival time, the machining time, and the time the part will next be involved in an event. The first three of these items are random variates derived by the methods described in Chapters 3 and 4. The next event time, in general, depends on the state of the system, and must be derived as the simulation proceeds. The organization used for the system image is illustrated in Fig. 1.4.1. There are four frames in this figure, representing successive states of the system. The frames are read from left to right and from top to bottom. The frame in the top left corner is the initial state. The description of the system image is made in terms of that particular frame.
Fig. 1.4.1 Machine shop example. [Four frames of the system image, each showing the next arrival, the waiting parts, the parts being machined with their next event times, the clock time, and the counters of completed parts by type.]
The top line of the system image represents the part due to enter the system next. As shown here, it is a type 2 part, will require 75 minutes of machining, and is due to arrive at time 1002. This, of course, is also its next event time. Below the next arrival listing is an open-ended list of the parts that have arrived and are now waiting for service. Currently, there are two waiting parts. As indicated, they are listed in order of arrival. Because the waiting parts are delayed, it is not possible to predict a next event time for them. It is necessary to see whether there is a waiting part when a machine finishes, and to offer service to the first part in the waiting line. The next rows of numbers represent the parts now being machined, in this case limited to three. Once machining begins, the time to finish can be derived and entered as the next event time. Three parts are occupying the machines at this time and they have been listed in the order in which they will finish. Finally, a number represents the clock time, here set to an initial value of 1000, and there are five counters showing how many parts of each type have been completed. Note that it is not customary to precalculate all the random variates. Instead, each is calculated at the time it is needed, so a simulation program continually switches between the examination and manipulation of the system image and the subroutines that calculate the random variates.

(b) The Simulation Process Looking now at the system image in Fig. 1.4.1, assume all events that can be executed up to time 1000 have been processed. It is now time to begin one more cycle. The first step is to find the next potential event by scanning all the event times. Because of the ordering of the parts being machined, it is, in fact, necessary only to compare the time of the next arrival with the first listed time in the machining section.
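The next-event cycle just described (scan the event times, advance the clock to the earliest one, execute that event) can be sketched in a few lines. The sketch below is illustrative Python and is not Gordon's program: the interarrival and machining distributions, the seed, and the stopping time are invented for the demonstration, and the event list is held in a heap so that the earliest event is always first.

```python
import heapq
import random

# Assumed (not from Gordon's example): exponential interarrival times with
# mean 20, exponential machining times with mean 50, run from time 1000 to 2000.
random.seed(42)
clock = 1000.0
machines_free = 3
waiting = []                 # FIFO waiting line of (part_type, machining_time)
counters = [0] * 5           # completed parts, by part type
events = []                  # event list: a heap of (time, kind, part)

def schedule_arrival(now):
    """Generate the successor part and insert it as the next future arrival."""
    part = (random.randrange(5), random.expovariate(1 / 50.0))
    heapq.heappush(events, (now + random.expovariate(1 / 20.0), "arrival", part))

schedule_arrival(clock)
while events and clock < 2000.0:
    clock, kind, part = heapq.heappop(events)     # next potential event
    if kind == "arrival":
        schedule_arrival(clock)
        if machines_free:                         # start machining at once,
            machines_free -= 1
            heapq.heappush(events, (clock + part[1], "finish", part))
        else:                                     # or join the waiting line
            waiting.append(part)
    else:                                         # a machining completion
        counters[part[0]] += 1
        machines_free += 1
        if waiting:                               # serve the first waiter
            nxt = waiting.pop(0)
            machines_free -= 1
            heapq.heappush(events, (clock + nxt[1], "finish", nxt))
```

Note how the clock advances in uneven steps from one event time to the next, exactly as described for the system image above.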
With the numbers shown in frame 1, the next event is the arrival of a part at time 1002, so the clock is updated to this time in the second frame. The arriving part finds all machines busy and must join the waiting line. The successor to the part just arrived is generated and inserted as the next future arrival, due to arrive at time 1018. Another cycle can now begin. The next event is the completion of machining a part at time 1003. The third frame of Fig. 1.4.1 shows the state of the system at the end of this event. The clock is updated to 1003 and the finished part is removed from the system, after incrementing by 1 the counter for that part type. There is a waiting part, so machining is started on the first part in the waiting line, and its next event time, derived from the machining time of 84, is calculated as 1087. In this case the new part for machining has the largest
finish time, and it joins the end of the waiting line. The records in the waiting line and the machine segment are all moved down one line. There is then another completion at 1017 that, as before, leads to a counter being incremented and service being offered to the first part in the waiting line. In this case, however, the machining time is short enough for the new part to finish ahead of one whose machining started earlier, so, instead of being the last listed part, the new part becomes the second in the list. This is shown in the last frame of Fig. 1.4.1.

(c) Statistics Gathering The purpose of the simulation, of course, is to learn something about the system. In this case only the counts of the number of completed parts by type have been kept. Depending upon the purpose of the simulation study, other statistics could be gathered. Simulation language programs include routines for collecting certain typical statistics. Among the commonly used types of statistics are the following:
1 Counts Counts give the number of elements of a given type or the number of times some event occurred.
2 Utilization of equipment This can be counted in terms of the fraction of time the equipment is in use or in terms of the average number of units in use.
3 Distributions This means distributions of random variates, such as processing times and response times, together with their means and standard deviations.

(d) List Processing In the machine shop example it was convenient to describe the records as though they were located in one of three places, corresponding to whether they represented parts that were arriving, waiting, or being processed. The simulation was described in terms of moving the records from one place to the next, possibly with some resorting. A computer program that used this approach would be very inefficient because of the large amount of data movement involved. Much better control and efficiency are obtained by using list processing. With this technique each record consists of a number of contiguous words (or bytes), some of which are reserved for constructing a list of the records. Each record contains, in a standard position, the address of the next record in the list. This is called a pointer. A special word, called a header, located in a known position, contains a pointer to the first record in the list. The last record in the list has an end-of-list symbol in place of its pointer. If the list happens to be empty, the end-of-list symbol appears in the header. The pointers, beginning from the header, place the records in a specific order, and allow a program to search the records by following the chain of
pointers. These lists, in fact, are usually called chains. There may be another set of pointers tracing through the chain from end to beginning so that a program can move along the chain in either direction. It is also possible for a record to be on more than one chain, simply by reserving pointer space for each possible chain. Removing or adding a record, or reorganizing the order of a chain, now becomes a matter of manipulating pointers. To remove C from a chain of the records A, B, C, D, ..., the pointer of B is redirected to D. If the record is being discarded, its storage space would probably be returned to another chain from which it can be reassigned later. To put the record Z between B and C, the pointer of B is directed to Z and the pointer of Z is set to indicate C. Reordering a chain consists of a series of removals and insertions. As can be seen, list processing does not require that records be physically moved. It therefore provides an efficient way of transferring records from one category to another by moving them on and off chains, and it can easily manage lists that are constantly changing size; these are two properties that are very desirable in simulation programming. Therefore list processing is used in the implementation of all major discrete system simulation languages, including the GPSS and SIMSCRIPT simulation programs.
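A minimal sketch of these pointer manipulations, using Python objects in place of machine words (the record names A, B, C, D, Z follow the example in the text; removing C and then inserting Z are each a single pointer redirection):

```python
class Record:
    """A record with a name and a pointer to the next record; None plays
    the role of the end-of-list symbol."""
    def __init__(self, name):
        self.name = name
        self.next = None

def chain(names):
    """Build a chain from a list of names and return the header pointer."""
    header = prev = None
    for n in names:
        rec = Record(n)
        if prev is None:
            header = rec
        else:
            prev.next = rec
        prev = rec
    return header

def names(header):
    """Follow the chain of pointers from the header, collecting the names."""
    out = []
    while header is not None:
        out.append(header.name)
        header = header.next
    return out

head = chain(["A", "B", "C", "D"])

# Remove C: redirect the pointer of B to D (no record is physically moved).
b = head.next
b.next = b.next.next

# Insert Z after B: B's pointer is directed to Z, and Z's pointer is set
# to indicate B's old successor.
z = Record("Z")
z.next = b.next
b.next = z
```

After these two operations the chain reads A, B, Z, D, and every record still occupies its original storage location.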
REFERENCES

1 Ackoff, R. L., Towards a system of systems concepts, Manage. Sci., 17, 1971, 661-671.
2 Burt, J. M., D. P. Graver, and M. Perlas, Simple stochastic networks: Some problems and procedures, Nav. Res. Logist. Quart., 17, 1970, 439-459.
3 Chorafas, D. N., Systems and Simulation, Academic, New York, 1965.
4 Churchman, C. W., R. L. Ackoff, and E. L. Arnoff, Introduction to Operations Research, Wiley, New York, 1959.
5 Emshoff, J. R. and R. L. Sisson, Design and Use of Computer Simulation Models, Macmillan, New York, 1970.
6 Ermakov, S. M., Monte Carlo Method and Related Questions, Nauka, Moscow, 1976 (in Russian).
7 Evans, G. W., G. F. Wallace, and G. L. Sutherland, Simulation Using Digital Computers, Prentice-Hall, Englewood Cliffs, New Jersey, 1967.
8 Fishman, G. S., Concepts and Methods in Discrete Event Digital Simulation, Wiley, New York, 1973.
9 Fishman, G. S., Principles of Discrete Event Simulation, Wiley, New York, 1978.
10 Geisler, M. A., The use of man-machine simulation for support planning, Nav. Res. Logist. Quart., 7, 1960, 421-428.
11 Gordon, G., System Simulation, Prentice-Hall, Englewood Cliffs, New Jersey, 1969.
12 Handbook of Operations Research: Foundations and Fundamentals, edited by J. J. Moder and S. E. Elmaghraby, Van Nostrand Reinhold, New York, 1978.
13 Hammersley, J. M. and D. C. Handscomb, Monte Carlo Methods, Wiley, New York; Methuen, London, 1964.
14 Harman, H. H., Simulation: A survey, Report SP-260, System Development Corporation, Santa Monica, California, 1961.
15 Hillier, F. S. and G. J. Lieberman, Introduction to Operations Research, Holden-Day, San Francisco, California, 1968, Chapter 14.
16 Hollingdale, S. H. (Ed.), Digital Simulation in Operational Research, American Elsevier, New York, 1967.
17 IBM Corporation, Bibliography on Simulation, Form No. 320-0924, 112 East Post Road, White Plains, New York, 1966.
18 Kiviat, P. J., Digital Computer Simulation: Modeling Concepts, Report RM-5378-PR, The Rand Corporation, Santa Monica, California, 1967.
19 Kleijnen, J. P. C., Statistical Techniques in Simulation, Part I, Marcel Dekker, New York, 1974.
20 Lewis, P. A. W., Large-scale computer-aided statistical mathematics, Naval Postgraduate School, Monterey, California, in Proc. Computer Science and Statistics: 6th Annual Symp. Interface, Western Periodical Co., Hollywood, California, 1972.
21 Lucas, H. C., Performance evaluation and monitoring, Comput. Surv., 3, 1971, 79-91.
22 Maisel, H. and G. Gnugnoli, Simulation of Discrete Stochastic Systems, Science Research Associates, Palo Alto, California, 1972.
23 Morgenthaler, G. W., The theory and application of simulation in operations research, in Progress in Operations Research, edited by R. L. Ackoff, Wiley, New York, 1961.
24 McLeod, J. (Ed.), Simulation, McGraw-Hill, New York, 1968.
25 McMillan, C., Jr., and R. Gonzalez, Systems Analysis: A Computer Approach to Decision Models, Rev. ed., Richard D. Irwin, Homewood, Illinois, 1965.
26 Mikhailov, G. A., Some Problems in the Theory of the Monte Carlo Method, Nauka, Novosibirsk, U.S.S.R., 1974 (in Russian).
27 Mize, J. H. and J. G. Cox, Essentials of Simulation, Prentice-Hall, Englewood Cliffs, New Jersey, 1968.
28 Naylor, T. J., J. L. Balintfy, D. S. Burdick, and K. Chu, Computer Simulation Techniques, Wiley, New York, 1966.
29 Naylor, T. J., Computer Simulation Experiments with Models of Economic Systems, Wiley, New York, 1971.
30 Proc. Conf. Business Games, sponsored by the Ford Foundation and School of Business Administration, Tulane University, April 26-28, 1961.
31 Reitman, J., Computer Simulation Applications: Discrete-Event Simulation for the Synthesis and Analysis of Complex Systems, Wiley, New York, 1971.
32 Rosenblueth, A. and N. Wiener, The role of models in science, Philos. Sci., 12, No. 4, Oct. 1945, 316-321.
33 Smith, J., Computer Simulation Models, Hafner, New York, 1968.
34 Sobol, I. M., Computational Methods of Monte Carlo, Nauka, Moscow, 1973 (in Russian).
35 Shreider, Y. A. (Ed.), Method of Statistical Testing: Monte Carlo Method, Elsevier, Amsterdam, 1964.
36 Stephenson, R. E., Computer Simulation for Engineers, Harcourt Brace Jovanovich, New York, 1971.
37 Shubik, M., On gaming and game theory, Manage. Sci., Professional Series, 18, 1972, 37-53.
38 Shubik, M., A Preliminary Bibliography on Gaming, Department of Administrative Sciences, Yale University, New Haven, Connecticut, 1970.
39 Shubik, M., Bibliography on simulation, gaming, artificial intelligence and allied topics, J. Amer. Stat. Assoc., 55, 1960, 736-751.
40 Tocher, K. D., The Art of Simulation, D. Van Nostrand, Princeton, New Jersey, 1963.
41 Yakowitz, S. J., Computational Probability and Simulation, Addison-Wesley, Reading, Massachusetts, 1977.
CHAPTER 2

Random Number Generation

2.1 INTRODUCTION
In this chapter we are concerned with methods of generating random numbers on digital computers. The importance of random numbers in the Monte Carlo method and simulation has been discussed in Chapter 1. The emphasis in this chapter is mainly on the properties of numbers associated with uniform random variates. The term random number is used instead of uniform random number. Many techniques for generating random numbers have been suggested, tested, and used in recent years. Some of these are based on random phenomena, others on deterministic recurrence procedures. Initially, manual methods were used, including such techniques as coin flipping, dice rolling, card shuffling, and roulette wheels. It was believed that only mechanical (or electronic) devices could yield "truly" random numbers. These methods were too slow for general use, and moreover, sequences generated by them could not be reproduced. Shortly following the advent of the computer it became possible to obtain random numbers with its aid. One method of generating random numbers on a digital computer consists of preparing a table and storing it in the memory of the computer. In 1955 the RAND Corporation published [46] a well known table of a million random digits that may be used in forming such a table. The advantage of this method is reproducibility; its disadvantage is its lack of speed and the risk of exhausting the table. In view of these difficulties, John von Neumann [56] suggested the mid-square method, using the arithmetic operations of a computer. His idea was to take the square of the preceding random number and extract the
middle digits; for example, if we are generating four-digit numbers and arrive at 5232, we square it to obtain 27,373,824; the next number consists of the middle four digits, namely 3738, and the procedure is repeated. This raises a logical question: how can such sequences, defined in a completely deterministic way, be random? The answer is that they are not really random, but only seem so, and are in fact referred to as pseudorandom or quasi-random; still we call them random, with the appropriate reservation. Von Neumann's method likewise proved slow and awkward for statistical analysis; in addition, the sequences tend to cyclicity, and once a zero is encountered the sequence terminates. We say that the random numbers generated by this or any other method are "good" ones if they are uniformly distributed, statistically independent, and reproducible. A good method is, moreover, necessarily fast and requires minimum memory capacity. Since all these properties are rarely, if ever, realized, some compromise must be found. The congruential methods for generating pseudorandom numbers, discussed in the next section, were designed specifically to satisfy as many of these requirements as possible.
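The mid-square iteration is easy to sketch. The following Python is illustrative (not from the text); it reproduces the 5232 example above by padding the square to twice the digit count and slicing out the middle digits:

```python
def mid_square(seed, n, digits=4):
    """Von Neumann's mid-square method: square the current number and
    keep its middle `digits` digits as the next number."""
    nums, x = [], seed
    for _ in range(n):
        sq = str(x * x).zfill(2 * digits)   # pad to 2*digits digits
        mid = len(sq) // 2
        x = int(sq[mid - digits // 2 : mid + digits // 2])
        nums.append(x)
    return nums

# The text's example: 5232**2 = 27,373,824, whose middle four digits are 3738.
seq = mid_square(5232, 3)
```

Running the iteration longer from most seeds exhibits exactly the defects mentioned above: short cycles, and absorption at zero.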
2.2 CONGRUENTIAL GENERATORS
The most commonly used present-day method for generating pseudorandom numbers is one that produces a nonrandom sequence of numbers according to some recursive formula based on calculating the residues modulo some integer m of a linear transformation. It is readily seen from this definition that each term of the sequence is available in advance, before the sequence is actually generated. Although these processes are completely deterministic, it can be shown [31] that the numbers generated by the sequence appear to be uniformly distributed and statistically independent. Congruential methods are based on a fundamental congruence relationship, which may be expressed as [32]

X_{i+1} ≡ (aX_i + c) (mod m),  i = 1, ..., n,    (2.2.1)

where the multiplier a, the increment c, and the modulus m are nonnegative integers. The modulo notation (mod m) means that

X_{i+1} = aX_i + c − mk_i,    (2.2.2)

where k_i = [(aX_i + c)/m] denotes the largest integer in (aX_i + c)/m. Given an initial starting value X_0 (also called the seed), (2.2.2) yields a congruence relationship (modulo m) for any value i of the sequence {X_i}.
Generators that produce random numbers according to (2.2.1) are called mixed congruential generators. Random numbers on the unit interval (0, 1) can be obtained by

U_i = X_i / m.    (2.2.3)

Clearly, such a sequence will repeat itself in at most m steps, and will therefore be periodic. For example, let a = c = X_0 = 3 and m = 5; then the sequence obtained from the recursive formula X_{i+1} ≡ 3X_i + 3 (mod 5) is X_i = 3, 2, 4, 0, 3, .... It follows from (2.2.2) that X_i < m for all i. This inequality means that the period of the generator cannot exceed m; that is, the sequence {X_i} contains at most m distinct numbers (the period of the generator in the example is 4, while m = 5). Because of the deterministic character of the sequence, the entire sequence recurs as soon as any number is repeated. We say that the sequence "gets into a loop"; that is, there is a cycle of numbers that is repeated endlessly. It is shown [31] that all sequences having the form X_{i+1} = f(X_i) "get into a loop." We want, of course, to choose m as large as possible to ensure a sufficiently large sequence of distinct numbers in a cycle. Let p be the period of the sequence. When p equals its maximum, that is, when p = m, we say that the random number generator has a full period. It can be shown [31] that the generator defined in (2.2.1) has a full period, m, if and only if:

1 c is relatively prime to m; that is, c and m have no common divisor.
2 a ≡ 1 (mod g) for every prime factor g of m.
3 a ≡ 1 (mod 4) if m is a multiple of 4.
Condition 1 means that the greatest common divisor of c and m is unity. Condition 2 means that a = g[a/g] + 1. Let g be a prime factor of m; then denoting k = [a/g], we may write

a = 1 + gk.    (2.2.4)

Condition 3 means that

a = 1 + 4[a/4]    (2.2.5)

if m/4 is an integer. Greenberger [19] showed that the correlation coefficient between X_i and X_{i+1} lies between the values

(1/a)(1 − 6(c/m)(1 − c/m)) ± a/m,

and that its upper bound is achieved when a = m^{1/2}, irrespective of the value of c.
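Both the period-4 example above and the full-period conditions can be checked directly. The sketch below is illustrative Python; the full-period parameters m = 16, a = 5, c = 3 are chosen here to satisfy conditions 1-3 and are not from the text:

```python
def lcg(a, c, m, seed, n):
    """Mixed congruential generator: X_{i+1} = (a*X_i + c) mod m."""
    xs, x = [], seed
    for _ in range(n):
        x = (a * x + c) % m
        xs.append(x)
    return xs

# The text's example: a = c = X0 = 3, m = 5 yields 2, 4, 0, 3, 2, ... (period 4).
short = lcg(3, 3, 5, 3, 5)

# Parameters chosen to satisfy the full-period conditions: c = 3 is relatively
# prime to m = 16; a - 1 = 4 is divisible by 2, the only prime factor of 16;
# and a = 5 satisfies a = 1 (mod 4) since 16 is a multiple of 4.
full = lcg(5, 3, 16, 0, 16)   # visits all 16 residues before repeating
```

Enumerating `full` confirms that all m = 16 distinct values appear in one cycle, whereas the first generator repeats after only four.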
Since most computers utilize either a binary or a decimal digit system, we select m = 2^β or m = 10^β, respectively, where β denotes the word length of the particular computer. We discuss both cases separately in the following.

For a binary computer we have from condition 1 that m = 2^β guarantees a full period. It follows also from (2.2.1) that, for m = 2^β, the parameter c must be odd and

a ≡ 1 (mod 4),    (2.2.6)

which can be achieved by setting a = 2^r + 1, r ≥ 2. It is noted in the literature [25, 35, 44] that good statistical results can be achieved while choosing m = 2^35, a = 2^7 + 1, and c = 1.

For a decimal computer m = 10^β. In order to generate a sequence with a full period, c must be a positive number not divisible by g = 2 or g = 5, and the multiplier a must satisfy the condition a ≡ 1 (mod 20), or alternatively, a = 10^r + 1, r > 1. Satisfactory statistical results have been achieved [1] by choosing a = 101, c = 1, β ≥ 4. In this case X_0 had little or no effect on the statistical properties of the generated sequences.

The second widely used generator is the multiplicative generator

X_{i+1} = aX_i (mod m),    (2.2.7)

which is a particular case of the mixed generator (2.2.1) with c = 0. It can be shown [1, 2, 5, 31] that, generally, a full period cannot be achieved here, but a maximal period can, provided that X_0 is relatively prime to m and a meets certain congruence conditions. For a binary computer we again choose m = 2^β, and it is shown [31] that the maximal period is achieved when a = 8r ± 3, where r is any positive integer. The procedure for generating pseudorandom numbers on a binary computer* can be written as:

1 Choose any odd number as a starting value X_0.
2 Choose an integer a = 8r ± 3, where r is any positive integer. Choose a close to 2^{β/2} (if β = 35, a = 2^17 + 3 is a good selection).
3 Compute aX_i using fixed-point integer arithmetic. This product will consist of 2β bits, from which the high-order β bits are discarded; the low-order β bits represent X_{i+1}.
4 Calculate U_i = X_i / 2^β to obtain a uniformly distributed variable.

*This procedure and the one that follows are reproduced almost verbatim from Ref. 31.
5 Each successive random number X_{i+1} is obtained from the low-order bits of the product aX_i.
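This binary procedure can be sketched directly in Python, where reducing modulo 2^β stands in for the hardware discarding of the high-order bits. The parameters below (β = 35, a = 2^17 + 3, and the odd seed) are illustrative choices of the a = 8r ± 3 form, not values prescribed by the text:

```python
# Illustrative parameters: beta = 35 and a = 2**17 + 3, which is of the
# form 8r + 3 and lies close to 2**(beta/2).
BETA = 35
M = 2 ** BETA
A = 2 ** 17 + 3

def multiplicative(seed, n, a=A, m=M):
    """X_{i+1} = a*X_i (mod m): reducing mod 2**beta is exactly keeping
    the low-order beta bits of the product a*X_i."""
    us, x = [], seed
    for _ in range(n):
        x = (a * x) % m              # discard the high-order bits
        us.append(x / m)             # U_i = X_i / 2**beta, in (0, 1)
    return us

us = multiplicative(12345, 10_000)   # the starting value must be odd
```

Because the seed and multiplier are odd, every X_i stays odd and the sequence never reaches zero, which is why the maximal (rather than full) period applies.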
For a decimal computer m = loB. I t is shown in Ref. 49 that the maximal period is achieved when a = 200r %+p, where r is any positive integer and p is any of the following 16 numbers: (3, 11, 13, 19, 2 I , 27,29,37,53,59,61,67,69,77,83,9 1). The procedure for generating random numbers on a decimal computer can be written as: 1 Choose any odd integer not divisible by 5 as a starting value A’,. 2 Choose an integer a = 2 0 r 2 p for a constant multiplier, where r is any integer and p is any of the values 3, 11, 13, 19, 21, 27, 29,37,53,59,61,67,69,77,83,91. Choose u close to IOfl’’. (If p- 10, Q = lO0,OaO 2 3 is a good selection.)
3 Compute aX_i using fixed-point integer arithmetic. This product will consist of 2β digits, from which the high-order β digits are discarded, and the low-order β digits are the value of X_{i+1}. Integer multiplication instructions automatically discard the high-order digits.
4 The decimal point must be shifted β digits to the left to convert the random number (which is an integer) into a uniformly distributed variate defined over the unit interval, U_i = X_i/10^β.
5 Each successive random number X_{i+1} is obtained from the low-order digits of the product aX_i.
Another type of generator, in which X_{i+1} depends on more than one of the preceding values, is the additive congruential generator [17]

X_{i+1} = X_i + X_{i−k} (mod m),   k = 1, 2, ..., i − 1.   (2.2.8)

In the particular case k = 1 we obtain the well-known Fibonacci sequence, which behaves like sequences produced by the multiplicative congruential method with a = (1 + √5)/2. Unfortunately, a Fibonacci sequence is not satisfactorily random, but its statistical properties improve as k increases.

RESUME: We have seen that a sequence of pseudorandom numbers
produced by a congruential generator is completely defined by the numbers X_0, a, c, and m. In order to obtain satisfactory statistical results our choice must be based on the following six principles.*

1 The number X_0 may be chosen arbitrarily. If the program is run several times and a different source of random numbers is desired each time, set X_0 equal to the last value attained by X on the preceding run, or (if more convenient) set X_0 equal to the current date and time.

*These six principles are reproduced by permission from Knuth [31, pp. 155-156].
2 The number m should be large. It may conveniently be taken as the computer's word length, since this makes the computation of (aX + c) (mod m) quite efficient. The computation of (aX + c) (mod m) must be done exactly, with no roundoff error.
3 If m is a power of 2 (i.e., if a binary computer is being used), pick a so that a (mod 8) = 5. If m is a power of 10 (i.e., if a decimal computer is being used), choose a so that a (mod 200) = 21. This choice of a, together with the choice of c given below, ensures that the random number generator will produce all m different possible values of X before it starts to repeat.
4 The multiplier a should be larger than √m, preferably larger than m/100, but smaller than m − √m. The best policy is to take some haphazard constant to be the multiplier, such as a = 3,141,592,621 (which satisfies both of the conditions in 3).
5 The constant c should be an odd number when m is a power of 2 and, when m is a power of 10, should also not be a multiple of 5.
6 The least significant (right-hand) digits of X are not very random, so decisions based on the number X should always be primarily influenced by the most significant digits. It is generally better to think of X as a random fraction X/m between 0 and 1, that is, to visualize X with a decimal point at its left, than to regard X as a random integer between 0 and m − 1. To compute a random integer between 0 and k − 1, we would multiply by k and truncate the result.
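The period claims in principles 3 and 5 can be verified by brute force for a small modulus; this sketch (illustrative, not from the text) counts the cycle length of the map X → (aX + c) mod m:

```python
def period(a, c, m, x0=0):
    """Cycle length of the map X -> (a*X + c) mod m, starting from x0."""
    seen = {}
    x = x0
    for i in range(2 * m):
        if x in seen:
            return i - seen[x]
        seen[x] = i
        x = (a * x + c) % m

m = 2**10
assert period(a=5, c=3, m=m) == m    # a mod 8 == 5 and c odd: all m values occur
assert period(a=4, c=3, m=m) < m     # conditions violated: the cycle is shorter
```

The same check works for the decimal conditions, e.g. a = 221 satisfies a (mod 200) = 21.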
Finally, we present in this section the IBM System/360 Uniform Random Number Generator, a multiplicative congruential generator that utilizes the full word size, which is equal to 32 bits with 1 bit reserved for algebraic sign. Therefore an obvious choice for m is 2^31. A pure congruential generator (c = 0) with m = 2^k (k > 0) can have a maximum period length of m/4. Thus the maximum period length is 2^31/4 = 2^29. The period length also depends on the starting value. When the modulus m is prime, the maximum possible period length is m − 1. The largest prime less than or equal to 2^31 is 2^31 − 1. Hence, if we choose m = 2^31 − 1, the uniform random number generator will have a maximum period length of m − 1 = 2^31 − 2, which is only the upper bound on the period length. The maximum period length depends on the choice of the multiplier. Note that the conditions ensuring a maximum period length do not necessarily guarantee good statistical properties for the generator, although the choice of the particular multiplier 7^5 does satisfy some known conditions regarding the statistical performance of the generated sequence.

The System/360 Generator can be described as follows. Choose any X_0 > 0. For n ≥ 1,

X_n = 7^5 X_{n−1} (mod 2^31 − 1) = 16,807 X_{n−1} (mod 2^31 − 1).

The random numbers are (see (2.2.3)) U_n = X_n/(2^31 − 1). The results of the statistical tests of the System/360 Uniform Random Number Generator indicate that it is very satisfactory. Versions of this generator are used in the IBM SL/MATH package, the IBM version of APL, the Naval Postgraduate School random number generator package LLRANDOM, and the International Mathematics and Statistics Library (IMSL) package. The generator is also used in the simulation programming language SIMPL/I. The assembly language subroutines GGL1 and GGL2 of IBM Corporation (1974) also implement this generator, as well as the FORTRAN subroutine GGL.

2.3 STATISTICAL TESTS OF PSEUDORANDOM NUMBERS
In this section we describe some statistical tests for checking independence and uniformity of a sequence of pseudorandom numbers produced by a computer program. As mentioned earlier, a sequence of pseudorandom numbers is completely deterministic, but insofar as it passes the set of statistical tests, it may be treated as a sequence of "truly" random numbers, that is, as a sample from U(0, 1). Our object in this section is to provide some idea of these tests rather than present rigorous proofs. For a more detailed discussion of this topic the reader is referred to Fishman [11] and Knuth [31].
2.3.1 Chi-Square Goodness-of-Fit Test
The chi-square goodness-of-fit test, proposed by Pearson in 1900, is perhaps the best known of all statistical tests. Let X_1, ..., X_N be a sample drawn from a population with unknown cumulative distribution function (c.d.f.) F_X(x). We wish to test the null hypothesis

H_0 : F_X(x) = F_0(x),   for all x,

where F_0(x) is a completely specified c.d.f., against the alternative

H_1 : F_X(x) ≠ F_0(x),   for some x.
Assume that the N observations have been grouped into k mutually exclusive categories, and denote by N_j and Np_j the observed number of trial outcomes and the expected number for the jth category, j = 1, ..., k, respectively, when H_0 is true.
The test criterion suggested by Pearson uses the following statistic:

Y = Σ_{j=1}^{k} (N_j − Np_j)² / (Np_j),   (2.3.1)

which tends to be small when H_0 is true and large when H_0 is false. The exact distribution of the random variable Y is quite complicated, but for large samples its distribution is approximately chi-square with k − 1 degrees of freedom [15]. Under the H_0 hypothesis we expect

P(Y > χ²_{1−α}) = α,   (2.3.2)

where α is the significance level, say 0.05 or 0.1; the quantile χ²_{1−α} that corresponds to probability 1 − α is given in the tables of the chi-square distribution.

When testing for uniformity we simply divide the interval [0, 1] into k nonoverlapping subintervals of length 1/k, so that Np_j = N/k. In this case we have

Y = (k/N) Σ_{j=1}^{k} (N_j − N/k)²,   (2.3.3)

and (2.3.2) can again be applied for testing random number generators. To ensure the asymptotic properties of Y it is often recommended in the literature to choose N ≥ 5k, with k a power of 2 on a binary computer and a power of 10 on a decimal computer.
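A sketch of the uniformity test based on (2.3.3); the sample here comes from Python's own generator and is only illustrative:

```python
import random

def chi_square_uniform(u, k):
    """Y = (k/N) * sum_j (N_j - N/k)**2 over k equal subintervals of [0, 1)."""
    counts = [0] * k
    for x in u:
        counts[min(int(x * k), k - 1)] += 1
    N = len(u)
    return (k / N) * sum((nj - N / k) ** 2 for nj in counts)

random.seed(1)
sample = [random.random() for _ in range(5000)]
y = chi_square_uniform(sample, k=10)
# under H0, Y is approximately chi-square with k - 1 = 9 degrees of freedom,
# so values far above the 1 - alpha quantile (about 16.9 at alpha = 0.05) reject H0
```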
2.3.2 Kolmogorov-Smirnov Goodness-of-Fit Test

Another test well known in the statistical literature is the one proposed by Kolmogorov and developed by Smirnov. Let X_1, ..., X_N again denote a random sample from an unknown c.d.f. F_X(x). The sample cumulative distribution function, denoted by F_N(x), is defined as
F_N(x) = (1/N)(number of X_i less than or equal to x) = (1/N) Σ_{i=1}^{N} I_{(−∞, x]}(X_i),   (2.3.4)

where I_{(−∞, x]}(·) is the indicator random variable (r.v.), that is, I_{(−∞, x]}(X_i) = 1 if X_i ≤ x and 0 otherwise. For fixed x, F_N(x) is itself an r.v., since it is a function of the sample.
Let us show that F_N(x) has the same distribution as the sample mean of a Bernoulli distribution, namely

P[F_N(x) = k/N] = (N choose k) [F_X(x)]^k [1 − F_X(x)]^{N−k}.   (2.3.5)

Denote V_i = I_{(−∞, x]}(X_i); then V_i has a Bernoulli distribution with parameter P(V_i = 1) = P(X_i ≤ x) = F_X(x). Since Σ_{i=1}^{N} V_i has a binomial distribution with parameters N and F_X(x), and since F_N(x) = (1/N) Σ_{i=1}^{N} V_i, the result follows immediately. From (2.3.5) we see that

E F_N(x) = F_X(x)   (2.3.6)

and

var F_N(x) = (1/N) F_X(x)[1 − F_X(x)].   (2.3.7)
Equations (2.3.6) and (2.3.7) show that, for fixed x, F_N(x) is an unbiased and consistent estimator of F_X(x), irrespective of the form of F_X(x). Since F_N(x) is the sample mean of the random variables I_{(−∞, x]}(X_i), i = 1, ..., N, it follows from the central-limit theorem that F_N(x) is asymptotically normally distributed with mean F_X(x) and variance (1/N) F_X(x)[1 − F_X(x)]. We are interested in estimating F_X(x) for every x (or rather, for a fixed x) and in finding how close F_N(x) is to F_X(x) jointly over all values x. The result

lim_{N→∞} P[ sup_{−∞<x<∞} |F_N(x) − F_X(x)| > ε ] = 0   (2.3.8)

is known as the Glivenko-Cantelli theorem, which states that for every ε > 0 the step function F_N(x) converges uniformly to the distribution function F_X(x). Therefore for large N the deviation |F_N(x) − F_X(x)| between the true function F_X(x) and its statistical image F_N(x) should be small for all values of x. The random quantity

D_N = sup_{−∞<x<∞} |F_N(x) − F_X(x)|,   (2.3.9)
which measures how far F_N(x) deviates from F_X(x), is called the Kolmogorov-Smirnov one-sample statistic. Kolmogorov and Smirnov proved that, for any continuous distribution F_X(x),

lim_{N→∞} P(√N D_N ≤ x) = H(x) = 1 − 2 Σ_{i=1}^{∞} (−1)^{i−1} e^{−2i²x²},   x > 0.   (2.3.10)
The function H(x) has been tabulated, and the approximation was found to be sufficiently close for practical applications so long as N exceeds 35. The c.d.f. H(x) does not depend on the distribution from which the sample was drawn; that is, the limiting distribution of √N D_N is distribution-free. This fact allows D_N to be broadly used as a statistic for goodness-of-fit. For instance, assume that we have the random sample X_1, ..., X_N and wish to test H_0 : F_X(x) = F_0(x) for all x, where F_0(x) is a completely specified c.d.f. (in our case F_0(x) is the uniform distribution on the interval (0, 1)). If H_0 is true, which means that we have a good random number generator, then
√N sup_{−∞<x<∞} |F_N(x) − F_0(x)|

is approximately distributed as the c.d.f. H(x). If H_0 is false, which means that we have a bad random number generator, then F_N(x) will tend to be near the true c.d.f. F_X(x) rather than near F_0(x), and consequently sup_{−∞<x<∞} |F_N(x) − F_0(x)| will tend to be large. Hence a reasonable test criterion is to reject H_0 if sup_{−∞<x<∞} |F_N(x) − F_0(x)| is large. The Kolmogorov-Smirnov goodness-of-fit test with significance level α rejects H_0 if and only if √N D_N > x_{1−α}, where the quantile x_{1−α} is given in the tables of H(x).

Before we leave the chi-square and Kolmogorov-Smirnov tests, a word is in order on the similarity and difference between them. The similarity lies in the fact that both of them indicate how well a given set of observations (pseudorandom numbers) fits some specified distribution (in our case the uniform distribution); the difference is that the Kolmogorov-Smirnov test applies to continuous (jumpless) c.d.f.'s and the chi-square to distributions consisting exclusively of jumps (since all the observations are divided into k categories). Still, the chi-square test may be applied to a continuous F_X(x), provided its domain is divided into k parts and the variables within each part are disregarded. This is essentially what we did earlier when testing whether or not the sequence obtained from the random number generator comes from the uniform distribution. When applying the chi-square test, allowance must be made for its sensitivity to the number of classes and their widths, arbitrarily chosen by the statistician. Another difference is that chi-square requires grouped data whereas Kolmogorov-Smirnov does not. Therefore when the hypothesized distribution is continuous, Kolmogorov-Smirnov allows us to examine the goodness-of-fit for each of the n observations, instead of only for k classes, where k ≤ n. In this sense Kolmogorov-Smirnov makes more complete use of the available data.
As regards the efficiency of the Kolmogorov-Smirnov and chi-square tests, at present too few theoretical results are available to allow meaningful judgment.
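For the uniform null hypothesis F_0(x) = x, the statistic D_N of (2.3.9) reduces to a simple computation over the sorted sample. A sketch (the 5% critical value 1.36 for √N · D_N is the usual asymptotic approximation):

```python
def ks_statistic(sample):
    """D_N = sup_x |F_N(x) - F_0(x)| for F_0(x) = x on [0, 1]."""
    u = sorted(sample)
    n = len(u)
    d_plus = max(i / n - x for i, x in enumerate(u, start=1))
    d_minus = max(x - (i - 1) / n for i, x in enumerate(u, start=1))
    return max(d_plus, d_minus)

# reject H0 at roughly the 5% level when sqrt(N) * D_N exceeds 1.36
```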
2.3.3 Cramer-von Mises Goodness-of-Fit Test [4]

This test, like the preceding two, belongs to the goodness-of-fit tests, and
its object is the same as theirs: for a given sample X_1, ..., X_N from some unknown c.d.f. F_X(x) we wish to test the null hypothesis

H_0 : F_X(x) = F_0(x),

where F_0(x) is a completely specified distribution, against the alternative

H_1 : F_X(x) ≠ F_0(x)

for at least one value of x. Denote by X_(1), ..., X_(N) the order statistics and consider the following test statistic:

Y = 1/(12N) + Σ_{i=1}^{N} [F_0(X_(i)) − (2i − 1)/(2N)]².   (2.3.12)
In other words, the ordinate of F_0(x) is found at each value in the ordered sample X_(i), and from it is subtracted the quantity (2i − 1)/(2N), which is the average of the values just before and just after the jump at X_(i), that is, the average of (i − 1)/N and i/N. The differences are squared, so that positive differences do not cancel the negative ones, and the results are added together. The quantiles of Y are tabulated by using an asymptotic distribution function of Y as given by Anderson and Darling [2]. The Cramer-von Mises goodness-of-fit test, with significance level α, rejects H_0 if and only if Y > y_{1−α}, where the quantity y_{1−α} can be found from the appropriate tables.
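Statistic (2.3.12) can be computed directly from the sorted sample; a sketch for a fully specified F_0 (here defaulting to the uniform c.d.f.):

```python
def cramer_von_mises(sample, F0=lambda x: x):
    """Y = 1/(12N) + sum_i [F0(X_(i)) - (2i - 1)/(2N)]**2   (2.3.12)."""
    xs = sorted(sample)
    n = len(xs)
    s = sum((F0(x) - (2 * i - 1) / (2 * n)) ** 2
            for i, x in enumerate(xs, start=1))
    return 1 / (12 * n) + s
```

When F_0(X_(i)) hits (2i − 1)/(2N) exactly, Y attains its minimum 1/(12N).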
2.3.4 Serial Test [31]
The serial test is used to check the degree of randomness between successive numbers in a sequence and represents an extension of the chi-square goodness-of-fit test. Let X_1 = (U_1, ..., U_k), X_2 = (U_{k+1}, ..., U_{2k}), ..., X_N = (U_{(N−1)k+1}, ..., U_{Nk}) be a sequence of N k-tuples. We wish to test the hypothesis that the r.v.'s X_1, X_2, ..., X_N are independent and uniformly distributed over the k-dimensional unit hypercube. Dividing this hypercube into r^k elementary hypercubes, each with volume 1/r^k, and denoting by N_{j_1, ..., j_k} the number of k-tuples falling within the element

((j_1 − 1)/r, j_1/r] × ··· × ((j_k − 1)/r, j_k/r],   j_i = 1, ..., r;  i = 1, ..., k,

we have that the statistic

Y = (r^k/N) Σ_{j_1, ..., j_k = 1}^{r} (N_{j_1, ..., j_k} − N/r^k)²   (2.3.13)

has an asymptotic chi-square distribution with r^k − 1 degrees of freedom. Since there are r^k hypercubes within which X_i may fall, the question of available space arises. If k = 3 and r = 1000, the serial test requires 1000³ = 10^9 counters, a problematic requirement in terms of both storage and search. In these circumstances the test is rarely used for k > 2.
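A sketch of the serial test for pairs (k = 2); the dictionary stores only occupied cells, which matters precisely because the number of cells r^k grows so fast:

```python
def serial_statistic(u, k=2, r=4):
    """Chi-square statistic (2.3.13) over the r**k cells of the unit k-cube."""
    n_tuples = len(u) // k
    counts = {}
    for j in range(n_tuples):
        cell = tuple(min(int(x * r), r - 1) for x in u[j * k:(j + 1) * k])
        counts[cell] = counts.get(cell, 0) + 1
    expected = n_tuples / r**k
    occupied = sum((c - expected) ** 2 for c in counts.values())
    empty = (r**k - len(counts)) * expected**2   # cells with zero count
    return (occupied + empty) / expected         # ~ chi-square, r**k - 1 d.f.
```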
2.3.5 The Run-Up-and-Down Test [43]
For this test the magnitude of each element is compared with that of its immediate predecessor in the given sequence. If the next element is larger, we have a run-up; if smaller, a run-down. We thus observe whether the sequence increases or decreases and for how long. A decision concerning the pseudorandom number generator may then be based on the number and length of the runs. For example, the seven-term sequence 0.2, 0.4, 0.1, 0.3, 0.6, 0.7, 0.5 consists of a run-up of length 1, followed by a run-down of length 1, followed by a run-up of length 3, and finally a run-down of length 1, and may be characterized by the binary symbols 1 0 1 1 1 0, where 1 denotes a run-up and 0 a run-down. More generally, suppose there are N terms, say X_1 < X_2 < ··· < X_N when arranged in order of magnitude; the time-ordered sequence of observations represents a permutation of these N numbers. There are N! permutations, each of them representing a possible set of sample observations. Under the null hypothesis each of these alternatives is equally likely to occur. The test of randomness, using runs-up and runs-down for the sequence X_1, ..., X_N of dimension N, is based on the derived sequence of dimension N − 1, whose ith element is 0 or 1 depending on whether X_{i+1} − X_i, i = 1, ..., N − 1, is negative or positive. A large number of long runs should not occur in a "truly" random sample. The test rejects the null hypothesis if there are at least r runs of length t or more, where both r and t are determined by the desired significance level. The means, variances, and covariances of the numbers of runs of length t or more are given in Levene and Wolfowitz [34].
The expected numbers of occurrences of runs in a "truly" random sample are [43]

(2N − 1)/3   for total runs,
(5N + 1)/12   for runs of length 1,
(11N − 14)/60   for runs of length 2,
...
2[(k² + 3k + 1)N − (k³ + 3k² − k − 4)]/(k + 3)!   for runs of length k (k ≤ N − 2),
2/N!   for runs of length N − 1.

Tables of the exact probabilities of at least r runs of length t or more are available in Olmstead [44] for N ≤ 14, from which the appropriate critical region can be found. A test of randomness can also be based on the total number of runs, whether up or down, irrespective of their lengths. The hypothesis of randomness is rejected when the total number of runs is small. Levene [33] has shown that the r.v.

Z = [U − (2N − 1)/3] / [(16N − 29)/90]^{1/2},   (2.3.14)

where U is the total number of runs, has a standard normal distribution, so that for large N the test of significance can be readily done.
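A sketch of the statistic (2.3.14); the helper reproduces the four runs of the seven-term example above:

```python
import math

def total_runs(x):
    """Number of runs up and down in the sequence x (no ties assumed)."""
    up = [1 if b > a else 0 for a, b in zip(x, x[1:])]
    return 1 + sum(s != t for s, t in zip(up, up[1:]))

def runs_z(x):
    """Z of (2.3.14), approximately standard normal for large N."""
    n, u = len(x), total_runs(x)
    return (u - (2 * n - 1) / 3) / math.sqrt((16 * n - 29) / 90)

example = [0.2, 0.4, 0.1, 0.3, 0.6, 0.7, 0.5]  # derived binary sequence: 1 0 111 0
```

A strongly negative Z signals too few runs, i.e., the sequence trends for too long.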
2.3.6 Gap Test [31]

The gap test is concerned with the randomness of the digits in a sequence of numbers. Let U_1, ..., U_N be such a sequence. We say that a subsequence U_j, U_{j+1}, ..., U_{j+r+1} represents a gap of length r if U_j and U_{j+r+1} lie between α and β (0 ≤ α < β ≤ 1) but U_{j+i}, i = 1, ..., r, do not. For a "true" sequence of random numbers with β − α = 0.1, the probability of obtaining a gap of length r is given in Ref. 44 and is equal to

P(r) = (0.9)^r (0.1).   (2.3.15)
A chi-square goodness-of-fit test based on the comparison of the expected and actual numbers of gaps of length r may again be used.
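A sketch of the gap counting (the half-open interval [alpha, beta) is an implementation convention, not from the text); with β − α = 0.1 the observed gap lengths can be compared with the expected counts from (2.3.15) by a chi-square test:

```python
def gap_lengths(u, alpha=0.0, beta=0.1):
    """Lengths of the gaps between successive visits of u to [alpha, beta)."""
    gaps, current = [], None
    for x in u:
        if alpha <= x < beta:
            if current is not None:
                gaps.append(current)   # close the gap at this visit
            current = 0                # a new gap starts after this visit
        elif current is not None:
            current += 1
    return gaps

# under H0, P(gap length = r) = (0.9)**r * 0.1 when beta - alpha = 0.1
```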
2.3.7 Maximum Test [35]
Let Y_j^k = max(U_{(j−1)k+1}, ..., U_{jk}), j = 1, ..., N, be the sequence of N maxima of successive k-tuples. It is shown in Ref. 35 that, if the sequence U_1, ..., U_{Nk} is from U(0, 1), then (Y_1^k)^k, ..., (Y_N^k)^k is also from U(0, 1). To check whether or not U_1, ..., U_{Nk} is a "true" sequence of random numbers, we can apply the chi-square or the Kolmogorov-Smirnov test to the sequence {(Y_j^k)^k, j = 1, ..., N}.

The reader might ask: "How many tests do we need to check the random number generator?" and also "Which of them should we choose?" In fact, more computer time may be spent testing random numbers than generating them. Another question that arises is: "What should be done with the sequence of numbers if it passes most of the tests but fails one of them?" These questions, as well as many others, must be solved by the statistician.
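The transformation behind the maximum test is easy to verify: if Y_j is the maximum of k independent uniforms, then P(Y_j ≤ x) = x^k, so Y_j^k is again uniform. A sketch producing the derived sequence (the sample again comes from Python's generator for illustration):

```python
import random

def max_test_sequence(u, k):
    """(Y_j)**k for Y_j = max of the j-th k-tuple of u; uniform under H0."""
    return [max(u[j * k:(j + 1) * k]) ** k for j in range(len(u) // k)]

random.seed(3)
u = [random.random() for _ in range(5000)]
derived = max_test_sequence(u, k=5)
# `derived` can now be fed to a chi-square or Kolmogorov-Smirnov test
```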
EXERCISES

1 Consider a sequence

X_{i+1} = f(X_i),

where X_1, X_2, ... are integers, 0 ≤ X_i < m, and 0 ≤ f(X_i) < m.
(a) Show that the sequence is ultimately periodic, in the sense that there exist numbers λ and μ for which the values X_0, X_1, ..., X_μ, ..., X_{μ+λ−1} are distinct, but X_{n+λ} = X_n when n ≥ μ. Find the maximum and minimum possible values of μ and λ.
(b) Show that there exists an n > 0 such that X_n = X_{2n}; the smallest such value of n lies in the range μ ≤ n ≤ μ + λ, and the value of X_n is unique in the sense that, if X_n = X_{2n} and X_m = X_{2m}, then X_m = X_n (hence m − n is a multiple of λ). From Knuth [31].
2 Prove that the middle-square method using 2n-digit numbers to the base β has the following disadvantage: if a number X whose most significant n digits are zero ever appears, then the succeeding numbers will get smaller and smaller until zero occurs repeatedly. From Knuth [31].
3 A sequence generated as in exercise 1 must begin to repeat after at most m values have been generated. Suppose we generalize the method so that X_{i+1} depends on X_{i−1} as well as on X_i; formally, let f(x, y) be a function such that, if 0 ≤ x, y < m, then 0 ≤ f(x, y) < m. The sequence is constructed by selecting X_0 and X_1 arbitrarily, and then letting

X_{i+1} = f(X_i, X_{i−1}),   for i > 0.

Show that the maximum period conceivably attainable in this case is m². From Knuth [31].
4 Given the two conditions that c is odd and a ≡ 1 (mod 4), prove that they are necessary and sufficient to guarantee the maximum length period in the sequence

X_{i+1} = aX_i + c (mod m)

when m = 2^e, e ≥ 2. From Knuth [31].

5 Prove that the sequence

X_{i+1} = aX_i + c (mod m),

with m = 10^e, e ≥ 3, and c not a multiple of 2 and not a multiple of 5, will have a full period if and only if a ≡ 1 (mod 20). From Knuth [31].

6 Show that the random function

S_N(x) = (1/N) Σ_{i=1}^{N} I(x − X_i),   where I(t) = 1 if t ≥ 0 and I(t) = 0 if t < 0,

is the empirical distribution function of a sample X_1, X_2, ..., X_N; this should be done by showing that S_N(x) = F_N(x) for all x.
7 Let F_n(x) be the empirical distribution function for a random sample of size n from U(0, 1). Define

X_n(t) = √n [F_n(t) − t]  and  Z_n(t) = (1 + t) X_n(t/(1 + t)),   for 0 ≤ t ≤ 1.

Prove that var[X_n(t)] ≤ var[Z_n(t)] for all 0 ≤ t ≤ 1 and all n.

8 Find the minimum sample size N required such that
P(D_N < 0.05) ≥ 0.95.

9 A random sample of size 10 is obtained:

0.503, 0.621, 0.581, 0.480, 0.320, 0.447, 0.203, 0.551, 0.710, 0.386.

For a level of significance α = 0.05, test the null hypothesis

H_0 : F_X(x) = F_0(x),   for all x,

where F_0(x) is the uniform c.d.f., that is, F_0(x) = 0 if x < 0, F_0(x) = x if 0 ≤ x ≤ 1, and F_0(x) = 1 if x > 1, using:
(a) The Kolmogorov-Smirnov test.
(b) The Cramer-von Mises test.
REFERENCES
1 Allard, J. L., A. R. Dobell, and T. E. Hull, Mixed congruential random number generators for decimal machines, J. Assoc. Comp. Mach., 10, 1963, 131-141.
2 Anderson, T. W. and D. A. Darling, Asymptotic theory of certain "goodness of fit" criteria based on stochastic processes, Ann. Math. Stat., 23, 1952, 193-212.
3 Barnett, V. D., The behavior of pseudo-random sequences generated on computers by the multiplicative congruential method, Math. Comp., 16, 1962, 63-69.
4 Conover, W. J., Practical Nonparametric Statistics, Wiley, New York, 1971.
5 Coveyou, R. R., Serial correlation in the generation of pseudo-random numbers, J. Assoc. Comp. Mach., 7, 1960, 72-74.
6 Coveyou, R. R. and R. D. MacPherson, Fourier analysis of uniform random number generators, J. Assoc. Comp. Mach., 14, 1967, 100-119.
7 Dieter, U., Pseudo-random numbers: The exact distribution of pairs, Math. Comp., 25, 1971, 855-883.
8 Dieter, U. and J. Ahrens, An exact determination of serial correlations of pseudo-random numbers, Numer. Math., 17, 1971, 101-123.
9 Downham, D. Y., The runs up and down test, Comp. J., 12, 1969, 373-376.
10 Downham, D. Y. and F. D. K. Roberts, Multiplicative congruential pseudo-random number generators, Comp. J., 10, 1967, 74-77.
11 Fishman, G., Principles of Discrete Event Simulation, Wiley, New York, 1978.
12 Forsythe, G. E., Generation and testing of random digits, U.S. Nat. Bur. Stand. Appl. Math. Ser., No. 12, pp. 34-35, 1951.
13 Franklin, J. N., Deterministic simulation of random processes, Math. Comp., 17, 1963, 28-59.
14 Franklin, J. N., Numerical simulation of stationary and non-stationary Gaussian random processes, Soc. Indust. Appl. Math. Rev., 7, 1965, 68-80.
15 Gibbons, J. D., Nonparametric Statistical Inference, McGraw-Hill Kogakusha, Tokyo, 1971.
16 Gorenstein, S., Testing a random number generator, Comm. Assoc. Comp. Mach., 10, 1967, 111-118.
17 Green, B. F., J. E. K. Smith, and L. Klem, Empirical tests of an additive random number generator, J. Assoc. Comp. Mach., 6, 1959, 527-537.
18 Greenberger, M., Notes on a new pseudo-random number generator, J. Assoc. Comp. Mach., 8, 1961, 163-167.
19 Greenberger, M., An a priori determination of serial correlation in computer generated random numbers, Math. Comp., 15, 1961, 383-389.
20 Greenberger, M., Method in randomness, Comm. Assoc. Comp. Mach., 8, 1965, 177-179.
21 Gruenberger, F., Tests of random digits, Math. Tab. Aids Comp., 5, 1950, 244-245.
22 Gruenberger, F. and A. M. Mark, The d² test of random digits, Math. Tab. Aids Comp., 5, 1951, 109-110.
23 Hammer, P. C., The mid-square method of generating digits, U.S. Nat. Bur. Stand. Appl. Math. Ser., No. 12, p. 33, 1951.
24 Hull, T. E. and A. R. Dobell, Random number generators, Soc. Indust. Appl. Math. Rev., 4, 1962, 230-254.
25 Hull, T. E. and A. R. Dobell, Mixed congruential random number generators for binary machines, J. Assoc. Comp. Mach., 11, 1964, 31-40.
26 Hutchinson, D. W., A New Uniform Pseudo-Random Number Generator, File 651, Department of Computer Science, University of Illinois, Urbana, Illinois, April 27, 1965.
27 Hutchinson, D. W., A new uniform pseudorandom number generator, Comm. Assoc. Comp. Mach., 9, 1966, 432-433.
28 IBM Corporation, Random Number Generation and Testing, Form C20-8011, 1959.
29 IBM Corporation, General Purpose Simulation System/360 User's Manual, GH 20-0326, White Plains, New York, January 1970.
30 Jansson, B., Random Number Generators, Almqvist and Wiksell, Stockholm, 1966.
31 Knuth, D. E., The Art of Computer Programming: Seminumerical Algorithms, Vol. 2, Addison-Wesley, Reading, Massachusetts, 1969.
32 Lehmer, D. H., Mathematical methods in large-scale computing units, Ann. Comp. Lab. Harvard Univ., 26, 1951, 141-146.
33 Levene, H., On the power function of tests of randomness based on runs up and down, Ann. Math. Stat., 23, 1952, 34-56.
34 Levene, H. and J. Wolfowitz, The covariance matrix of runs up and down, Ann. Math. Stat., 15, 1944, 58-69.
35 MacLaren, M. D. and G. Marsaglia, Uniform random number generators, J. Assoc. Comp. Mach., 12, 1965, 83-89.
36 Marsaglia, G., Random numbers fall mainly in the planes, Proc. Nat. Acad. Sci., 61, Sept. 1968, 25-28.
37 Marsaglia, G., The structure of linear congruential sequences, in Applications of Number Theory to Numerical Analysis, edited by S. K. Zaremba, Academic Press, New York, 1972.
38 Mood, A. M., F. A. Graybill, and D. C. Boes, Introduction to the Theory of Statistics, 3rd ed., McGraw-Hill, New York, 1974.
39 Moore, P. G., A sequential test for randomness, Biometrika, 40, 1953, 111-115.
40 Moshman, J., The generation of pseudo-random numbers on a decimal calculator, J. Assoc. Comp. Mach., 1, 1954, 88-91.
41 Moshman, J., Random number generation, in Mathematical Methods for Digital Computers, Vol. 2, edited by A. Ralston and H. S. Wilf, Wiley, New York, 1967, 249-263.
42 Nance, R. and C. Overstreet, Bibliography on random number generation, Comp. Rev., 13, 1972, 495-508.
43 Naylor, T. et al., Computer Simulation Techniques, Wiley, New York, 1966.
44 Olmstead, P. S., Distribution of sample arrangements for runs up and down, Ann. Math. Stat., 17, 1946, 24-33.
45 Owen, D. B., Handbook of Statistical Tables, Addison-Wesley, Reading, Massachusetts, 1962.
46 Page, E. S., Pseudo-random elements for computers, Appl. Stat., 8, 1959, 124-131.
47 Rand Corporation, A Million Random Digits with 100,000 Normal Deviates, Free Press, Glencoe, Illinois, 1955.
48 Rotenberg, A., A new pseudo-random number generator, J. Assoc. Comp. Mach., 7, 1960.
49 Taussky, O. and J. Todd, Generation and testing of pseudo-random numbers, in Symposium on Monte Carlo Methods, edited by H. A. Meyer, Wiley, New York, 1956, 15-28.
50 Tausworthe, R. C., Random numbers generated by linear recurrence modulo two, Math. Comp., 19, 1965, 201-209.
51 Thompson, W. E., ERNIE: a mathematical and statistical analysis, J. Roy. Stat. Soc., A, 122, 1959, 301-333.
52 Tippett, L. H. C., Random sampling numbers, in Tracts for Computers, No. XV, Cambridge University Press, New York, 1927.
53 Tocher, K. D., The Art of Simulation, English Universities Press, London, 1963.
54 Tootill, J. P. R., W. D. Robinson, and A. G. Adams, The runs up-and-down performance of Tausworthe pseudo-random number generators, J. Assoc. Comp. Mach., 18, 1971, 381-399.
55 Van Gelder, A., Some new results in pseudo-random number generation, J. Assoc. Comp. Mach., 14, 1967, 785-792.
56 Von Neumann, J., Various techniques used in connection with random digits, U.S. Nat. Bur. Stand. Appl. Math. Ser., No. 12, 36-38, 1951.
57 Westlake, W. J., A uniform random number generator based on the combination of two congruential generators, J. Assoc. Comp. Mach., 14, 1967, 337-340.
58 Whittlesey, J., A comparison of the correlational behavior of random number generators, Comm. Assoc. Comp. Mach., 11, 1968, 641-644.
CHAPTER 3

Random Variate Generation

3.1 INTRODUCTION
In this chapter we consider some procedures for generating random variates (r.v.'s) from different distributions. These procedures are based on the following three methods: the inverse transform method, the composition method, and the acceptance-rejection method, which are described, respectively, in Sections 3.2, 3.3, and 3.4. Some generalizations of von Neumann's acceptance-rejection method are given in Section 3.4.3. Several techniques for generating random vectors are the subject of Section 3.5. Sections 3.6 and 3.7 describe generation of random variates from the most widely used continuous and discrete distributions, respectively. The notation and mode of algorithm presentation are similar to those in Fishman [12] and are used here to provide uniformity with other works in the field of random variate generation. For convenience we refer to sampling from a particular distribution by placing the name of the distribution or type of random variate before the word generation. For example, exponential generation denotes sampling from an exponential distribution. For simplicity U is a uniform deviate with probability density function (p.d.f.)

f_U(u) = 1 for 0 ≤ u ≤ 1 and 0 otherwise,
V is a standard exponential deviate with p.d.f.

f_V(v) = e^{−v} for v ≥ 0 and 0 otherwise,
and Z is a standard normal deviate with p.d.f.

φ(z) = (1/√(2π)) e^{−z²/2},   −∞ < z < ∞.

X usually denotes the random variable with p.d.f. f_X(x) from which we wish to generate a value.
3.2 INVERSE TRANSFORM METHOD
Let X be a random variable with cumulative probability distribution function (c.d.f.) F_X(x). Since F_X(x) is a nondecreasing function, the inverse function F_X^{−1}(y) may be defined for any value of y between 0 and 1 as: F_X^{−1}(y) is the smallest x satisfying F_X(x) ≥ y, that is,

F_X^{−1}(y) = inf{x : F_X(x) ≥ y},   0 ≤ y ≤ 1.   (3.2.1)

Let us prove that if U is uniformly distributed over the interval (0, 1), then (Fig. 3.2.1)

X = F_X^{−1}(U)   (3.2.2)

has cumulative distribution function F_X(x). The proof is straightforward:

P(X ≤ x) = P[F_X^{−1}(U) ≤ x] = P[U ≤ F_X(x)] = F_X(x).   (3.2.3)

So to get a value, say x, of a random variable X, obtain a value, say u, of a random variable U, compute F_X^{−1}(u), and set it equal to x.
Fig. 3.2.1 Inverse probability integral transformation method.
The algorithm IT-1:
1 Generate U from U(0, 1).
2 X ← F_X^{−1}(U).
3 Deliver X.
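Algorithm IT-1 in executable form (a sketch; the inverse c.d.f. is passed in as a function):

```python
import random

def inverse_transform(F_inv):
    """Algorithm IT-1: generate U from U(0,1), then deliver X = F^{-1}(U)."""
    u = random.random()   # step 1
    return F_inv(u)       # steps 2 and 3

# e.g. F(x) = x**2 on [0, 1] has F^{-1}(u) = sqrt(u)
random.seed(0)
x = inverse_transform(lambda u: u ** 0.5)
```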
Example 1 Generate an r.v. with p.d.f.

f_X(x) = 2x for 0 ≤ x ≤ 1 and 0 otherwise.   (3.2.4)

The c.d.f. is

F_X(x) = 0 for x < 0;  F_X(x) = x² for 0 ≤ x ≤ 1;  F_X(x) = 1 for x > 1.

Applying (3.2.2), we have

X = F_X^{−1}(U) = U^{1/2},   0 ≤ U ≤ 1.

Therefore to generate a variate X from the p.d.f. (3.2.4) we generate U from U(0, 1) and then take the square root of U.

Example 2 Generate an r.v. from the uniform distribution U(a, b), that is,
f_X(x) = 1/(b − a) for a ≤ x ≤ b and 0 otherwise.

The c.d.f. is

F_X(x) = 0 for x < a;  F_X(x) = (x − a)/(b − a) for a ≤ x ≤ b;  F_X(x) = 1 for x > b,

and

X = F_X^{−1}(U) = a + (b − a)U.
Example 3 Let X_1, ..., X_n be independent and identically distributed (i.i.d.) r.v.'s with c.d.f. F_X(x). Define Y_n = max(X_1, ..., X_n) and Y_1 = min(X_1, ..., X_n). Generate Y_n and Y_1. The distributions of Y_n and Y_1 are, respectively [23],

F_{Y_n}(y) = [F_X(y)]^n

and

F_{Y_1}(y) = 1 − [1 − F_X(y)]^n.

Applying (3.2.2), we get

Y_n = F_X^{−1}(U^{1/n})

and

Y_1 = F_X^{−1}(1 − U^{1/n}).

In the particular case where X = U we have

Y_n = U^{1/n}

and
Y_1 = 1 − U^{1/n}.

To apply this method F_X(x) must exist in a form for which the corresponding inverse transform can be found analytically. Distributions in this group are the exponential, uniform, Weibull, logistic, and Cauchy. Unfortunately, for many probability distributions it is either impossible or extremely difficult to find the inverse transform, that is, to solve
u
q
j*(t)df
-03
with respect to x. Even in the case when Fi-' exists in an explicit form, the inverse transform method is not necessarily the most efficient method for generating random variates.
Example 4 Generate a random variable from the piecewise constant p.d.f. (Fig. 3.2.2)

f_X(x) = C_i,  x_{i-1} < x <= x_i,  i = 1, 2, ..., n, and 0 otherwise,

where C_i >= 0, a = x_0 < x_1 < ... < x_{n-1} < x_n = b. Denote P_i = ∫_{x_{i-1}}^{x_i} f_X(x) dx, i = 1, ..., n, and F_i = Σ_{j<=i} P_j, F_0 = 0; then

F_X(x) = F_{i-1} + ∫_{x_{i-1}}^{x} C_i dt = F_{i-1} + C_i(x - x_{i-1}),

where i = max{j : x_{j-1} <= x}.
Fig. 3.2.2  Piecewise constant p.d.f.
Now solving F_X(X) = U with respect to X, we obtain

X = x_{i-1} + (U - F_{i-1})/C_i,  where F_{i-1} <= U < F_i.

To carry out the method:
1 Generate U from U(0, 1).
2 Find i from Σ_{j=1}^{i-1} P_j < U <= Σ_{j=1}^{i} P_j, i = 1, ..., n.
3 X <- x_{i-1} + (U - F_{i-1})/C_i.
4 Deliver X.
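A minimal Python sketch of this algorithm (the function name is illustrative; `bisect` performs the search for i in step 2):

```python
import bisect
import itertools
import random

def piecewise_constant_variate(xs, cs, u=None):
    # xs: breakpoints a = x_0 < ... < x_n = b; cs: the constants C_1, ..., C_n
    if u is None:
        u = random.random()
    # P_i = C_i (x_i - x_{i-1}); cum[i-1] = F_i = P_1 + ... + P_i
    probs = [c * (hi - lo) for c, lo, hi in zip(cs, xs, xs[1:])]
    cum = list(itertools.accumulate(probs))
    i = bisect.bisect_left(cum, u * cum[-1])      # index with F_{i-1} < u <= F_i
    f_prev = cum[i - 1] if i > 0 else 0.0
    return xs[i] + (u * cum[-1] - f_prev) / cs[i]
```

With cs = [0.5, 0.5] on [0, 2] the density is uniform, so the quantiles are easy to verify by hand.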
Example 5 Let f_X(x) be represented as

f_X(x) = Σ_{i=1}^{n} f_i(x),  f_i >= 0.   (3.2.5)

Denote

P_i = ∫_{-∞}^{∞} f_i(x) dx,  F_i = Σ_{j=1}^{i} P_j,  F_0 = 0,

and

φ_i(x) = ∫_{-∞}^{x} f_i(t) dt.

Let us prove that

X = φ_i^{-1}(U - F_{i-1}),  where F_{i-1} <= U < F_i.   (3.2.6)

It is easy to see that φ_i/P_i is a c.d.f. and that (U - F_{i-1})/P_i is distributed U(0, 1) conditional on F_{i-1} < U <= F_i. Therefore the r.v. obtained by inverting the c.d.f. φ_i/P_i at (U - F_{i-1})/P_i has p.d.f. f_i/P_i conditional on F_{i-1} < U <= F_i, and this r.v. is exactly X = φ_i^{-1}(U - F_{i-1}). Noticing that f_X(x) = Σ_{i=1}^{n} (f_i/P_i)P_i, the result follows immediately.

To carry out the method:
1 P_i <- ∫_{-∞}^{∞} f_i(x) dx, i = 1, ..., n.
2 F_i <- Σ_{j=1}^{i} P_j, i = 1, ..., n.
3 Generate U from U(0, 1).
4 Find i from F_{i-1} < U <= F_i, F_0 = 0.
5 φ_i(x) <- ∫_{-∞}^{x} f_i(t) dt.
6 X <- φ_i^{-1}(U - F_{i-1}).
7 Deliver X.
As an example, let [22]

f_X(x) = (3/8)(1 + x^2),  -1 <= x <= 1.

Assume f_1(x) = 3/8 and f_2(x) = (3/8)x^2, -1 <= x <= 1. Then P_1 = 3/4, P_2 = 1/4, φ_1(x) = (3/8)(x + 1), φ_2(x) = (1/8)(x^3 + 1), and

X = (8/3)U - 1 if 0 <= U < 3/4,  X = (8U - 7)^{1/3} if 3/4 <= U <= 1.
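The worked example translates directly into code; the two branches are the inverses φ_1^{-1} and φ_2^{-1} computed above (a sketch under the stated decomposition):

```python
import random

def decomposed_variate(u=None):
    # f(x) = (3/8)(1 + x^2) on [-1, 1]; f1 = 3/8 (P1 = 3/4), f2 = (3/8)x^2 (P2 = 1/4)
    if u is None:
        u = random.random()
    if u < 0.75:
        return (8.0 / 3.0) * u - 1.0           # phi_1^{-1}: (3/8)(x + 1) = u
    t = 8.0 * u - 7.0                          # phi_2^{-1}: (1/8)(x^3 + 1) = u - 3/4
    return abs(t) ** (1.0 / 3.0) * (1.0 if t >= 0 else -1.0)
```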
3.3 COMPOSITION METHOD
This method is employed by Butler [7]. Refs. 11, 22, 29, and 35 exploit this method to great advantage. In this technique f_X(x), the p.d.f. of the distribution to be simulated, is expressed as a probability mixture of properly selected density functions. Mathematically, let g(x|y) be a family of one-parameter density functions, where y is the parameter identifying a unique g(x). If a value of y is drawn from the continuous cumulative function F_Y(y) and then if X is
sampled from g(x|y) for that chosen y, the density function for X will be

f_X(x) = ∫ g(x|y) dF_Y(y).   (3.3.1)

If y is an integer parameter, then

f_X(x) = Σ_i P_i g(x|y = i),   (3.3.2)

where

Σ_i P_i = 1,  P_i > 0;  i = 1, 2, ...;  P_i = P(Y = i).

By using this technique some important distributions can be generated. This technique may be applied for generating complex distributions from simpler distributions that are themselves easily generated by the inverse transform technique or by the acceptance-rejection technique. Another advantage of this technique is that we can sometimes find a decomposition (3.3.2) that assigns high probabilities P_i to p.d.f.'s from which sampling X is inexpensive and concomitantly assigns low probabilities P_i to p.d.f.'s from which sampling X is expensive.
Example 1 Generate an r.v. from
Therefore
Example 2 (Butler [7]) Generate an r.v. from

f_X(x) = ∫_1^∞ g(x|y) dF_Y(y),  x >= 0.

Let

F_Y(y) = 1 - y^{-n},  y >= 1,

and g(x|y) = y e^{-yx}. A variate y is now drawn from the distribution F_Y(y). Once this y is selected, it determines a particular g(x) = y e^{-yx}. The desired variate from f_X(x) is then simply a variate generated from g(x) = y e^{-yx}. To carry out the composition method:
1 Generate U_1, U_2 from U(0, 1).
2 Y <- U_1^{-1/n}.
3 X <- -(1/Y) ln U_2.
4 Deliver X.

Example 3 Generate an r.v. from

F_X(x) = Σ_{i=1}^{∞} P_i x^i,  0 <= x <= 1,

where Σ_{i=1}^{∞} P_i = 1, P_i >= 0. The algorithm can be written directly:
1 Generate U_1 and U_2 from U(0, 1).
2 Find i from Σ_{k=1}^{i-1} P_k < U_1 <= Σ_{k=1}^{i} P_k, where Σ_{k=1}^{0} P_k = 0.
3 X <- U_2^{1/i}.
4 Deliver X.
3.4 ACCEPTANCE-REJECTION METHOD
This method is due to von Neumann [34] and consists of sampling a random variate from an appropriate distribution and subjecting it to a test to determine whether or not it will be acceptable for use.

3.4.1 Single-Variate Case
Let X be generated from f_X(x), x ∈ I. To carry out the method we represent f_X(x) as

f_X(x) = C h(x) g(x),   (3.4.1)

where C >= 1, h(x) is also a p.d.f., and 0 < g(x) <= 1. The method consists of the following steps:
1 Generate U from U(0, 1) and Y from h(y). If U <= g(Y), accept Y as a variate generated from f_X(x).
2 If the inequality is violated, reject the pair U, Y and try again.
The theory behind this method is based on the following.
Theorem 3.4.1 Let X be a random variate distributed with the p.d.f. f_X(x), x ∈ I, which is represented as

f_X(x) = C g(x) h(x),

where C >= 1, 0 < g(x) <= 1, and h(x) is also a p.d.f. Let U and Y be distributed U(0, 1) and h(y), respectively. Then

f_Y(x | U <= g(Y)) = f_X(x).   (3.4.2)

Proof By Bayes' formula

f_Y(x | U <= g(Y)) = f_Y(x) P(U <= g(Y) | Y = x)/P(U <= g(Y)).   (3.4.3)

We can directly compute

P(U <= g(Y) | Y = x) = g(x)   (3.4.4)

and

P(U <= g(Y)) = ∫_I g(y) h(y) dy = 1/C.   (3.4.5)

Upon substituting (3.4.4) and (3.4.5) into (3.4.3), we obtain

f_Y(x | U <= g(Y)) = C g(x) h(x) = f_X(x).  Q.E.D.
The efficiency of the acceptance-rejection method is determined by the inequality U <= g(Y) (see (3.4.5)). Since the trials are independent, the probability of success in each trial is p = 1/C. The number of trials N before a successful pair U, Y is found has a geometric distribution:

P_N(n) = p(1 - p)^n,  n = 0, 1, ...,   (3.4.6)

with the expected number of trials equal to C. Algorithm AR-1 describes the necessary steps.

Algorithm AR-1
1 Generate U from U(0, 1).
2 Generate Y from the p.d.f. h(y).
3 If U <= g(Y), deliver Y as the variate generated from f_X(x).
4 Go to step 1.
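Algorithm AR-1 in Python; the representation f(x) = 2x = C h(x) g(x) with h uniform, C = 2, g(x) = x is our illustrative choice, not from the book:

```python
import random

def acceptance_rejection(g, h_sampler):
    # Algorithm AR-1: repeat until U <= g(Y), then deliver Y
    while True:
        y = h_sampler()
        if random.random() <= g(y):
            return y

# sketch: f(x) = 2x on [0, 1] with h = U(0, 1), C = 2, g(x) = x
random.seed(0)
sample = [acceptance_rejection(lambda x: x, random.random) for _ in range(4000)]
```

On average each delivered variate here costs C = 2 trials, in line with (3.4.6).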
For this method to be of practical interest the following criteria must be used in selecting h(x):
1 It should be easy to generate an r.v. from h(x).
2 The efficiency of the procedure 1/C should be large, that is, C should be close to 1 (which occurs when h(x) is similar to f_X(x) in shape).
To illustrate this method (Fig. 3.4.1) let us choose C such that f_X(x) <= C h(x) for all x ∈ I, where C >= 1. The problem then is to find a function ψ(x) = C h(x) such that ψ(x) >= f_X(x) and a function h(x) = ψ(x)/C from which the r.v.'s can be easily generated.

The maximum efficiency is achieved when f_X(x) = ψ(x) for all x ∈ I. In this case 1/C = C = 1, g(x) = 1, and there is no need for the acceptance-rejection method because h(x) = f_X(x) (to generate a variate from f_X(x) is the same as from h(x)). There exist an infinite number of ways to choose h(x) to satisfy (3.4.1). Many papers connected with choosing h(x) have been written, and we consider some of them later. In the particular case when ψ(x) = M, a <= x <= b, and
h(x) = 1/(b - a),  a <= x <= b,   (3.4.7)

we obtain from (3.4.1)

C = M(b - a)   (3.4.8)

and

g(x) = f_X(x)/M.   (3.4.9)

Von Neumann [34] first considered the acceptance-rejection method for this particular case, and his algorithm can be described as follows.

Algorithm AR-2
1 Generate U_1 and U_2 from U(0, 1).
2 Y <- a + U_2(b - a).
3 If U_1 <= f_X(Y)/M, deliver Y as the variate generated from f_X(x).
4 Go to step 1.

We now consider three examples. The first two are related to Algorithm AR-2 and the third to Algorithm AR-1.
Example 1 Generate a random variate from

f_X(x) = 3x^2,  0 <= x <= 1.

Here M = 3, a = 0, and b = 1. To apply Algorithm AR-2:
1 Generate two uniform random variates U_1 and U_2 from U(0, 1).
2 Test to see if U_1 <= U_2^2.
3 If the inequality holds, accept U_2 as the variate generated from f_X(x).
4 If the inequality is violated, reject U_1 and U_2 and repeat steps 1 through 3.
Example 2 Generate a random variate from

f_X(x) = (2/(πR^2)) (R^2 - x^2)^{1/2},  -R <= x <= R.

Assume M = 2/(πR); then Algorithm AR-2 is as follows:
1 Generate two uniform random variates U_1 and U_2 from U(0, 1).
2 Compute Y = (2U_2 - 1)R.
3 If U_1 <= f_X(Y)/M, which is equivalent to (2U_2 - 1)^2 <= 1 - U_1^2, then accept Y = (2U_2 - 1)R as the variate generated from f_X(x).
4 If the inequality is violated, reject U_1 and U_2 and repeat steps 1 through 3.

The expected number of trials is C = 4/π and the efficiency is 1/C = π/4 ≈ 0.785.
Example 3 Generate a random variate from

f_X(x) = x^{a-1} e^{-x}/Γ(a),  0 < x,  0 < a < 1.

To apply the acceptance-rejection method we use the inequality

x^{a-1} e^{-x} <= x^{a-1} for 0 < x <= 1,  x^{a-1} e^{-x} <= e^{-x} for x > 1,

which is the same as f_X(x) <= C h(x) with

h(x) = (ae/(a + e)) x^{a-1} for 0 < x <= 1,  h(x) = (ae/(a + e)) e^{-x} for x > 1,

C = (a + e)/(a e Γ(a)),

and we obtain from (3.4.1)

g(x) = e^{-x} for 0 < x <= 1,  g(x) = x^{a-1} for x > 1.

To generate a random variate from f_X(x) we generate two random variates U and Y from U(0, 1) and h(y), respectively, and then apply the acceptance rule U <= g(Y). Note that the random variate Y can easily be generated by the inverse transform method. To apply Algorithm AR-1:
1 Generate U from U(0, 1).
2 Generate Y from h(y).
3 If 0 < Y <= 1 and U <= e^{-Y}, or if Y > 1 and U <= Y^{a-1}, deliver Y.
4 Go to step 1.

The efficiency of the method is

1/C = a e Γ(a)/(a + e),

and the mean number of trials is C = (a + e)/(a e Γ(a)).
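A Python sketch of example 3; the two branches sample h by conditional inverse transforms (mass e/(a+e) on (0, 1], a/(a+e) on (1, ∞)) and then apply U <= g(Y):

```python
import math
import random

def gamma_lt1_variate(a):
    # f(x) = x^(a-1) e^(-x) / Gamma(a) with 0 < a < 1
    p = math.e / (a + math.e)                  # P(Y <= 1) under h
    while True:
        u, v = random.random(), random.random()
        if v < p:
            y = (v / p) ** (1.0 / a)           # h on (0, 1]: density a y^(a-1)
            if u <= math.exp(-y):              # g(y) = e^(-y)
                return y
        else:
            y = 1.0 - math.log((1.0 - v) / (1.0 - p))   # h on (1, inf): e^-(y-1)
            if u <= y ** (a - 1.0):            # g(y) = y^(a-1)
                return y

random.seed(11)
xs = [gamma_lt1_variate(0.5) for _ in range(4000)]
```

The sample mean approximates a, the mean of the gamma distribution with scale 1.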
Let us assume that h(x) is known up to a parameter β, that is, h(x) = h(x, β). It is shown (see Michailov [22] and Tocher [33]) that the optimal β, which minimizes C, satisfies (3.4.10).
3.4.2 Multivariate Case

Theorem 3.4.1 can easily be extended to the multivariate case. The proof of the following theorem will be left to the reader.

Theorem 3.4.2 Let X = (X_1, ..., X_n) be a random vector distributed with the p.d.f. f_X(x), x = (x_1, ..., x_n) ∈ D, where D = {(x_1, ..., x_n) : a_i <= x_i <= b_i, i = 1, ..., n}, and suppose f_X(x) <= M. Generate U, U_1, ..., U_n from U(0, 1) and define Y = (Y_1, ..., Y_n), where Y_i = a_i + (b_i - a_i)U_i, i = 1, ..., n. Then

f_Y(y | U <= f_X(Y)/M) = f_X(y).

We can see that this theorem is an extension of von Neumann's method described in Algorithm AR-2 to the multivariate case.

Example 4 Generate a random vector uniformly distributed over the complex region G (Fig. 3.4.2). The algorithm is straightforward:
1 Generate a random vector Y uniformly distributed in Ω, where Ω is a nice region (multidimensional rectangle, hypersphere, hyperellipsoid, etc.).
Fig. 3.4.2
Generating a random vector uniformly distributed over a complex area.
2 If Y ∈ G, accept Y as a variate uniformly distributed in G.
3 Go to step 1.

Example 5 Generate a random vector uniformly distributed on the surface of an n-dimensional unit sphere. To do so, we simulate a random vector uniformly distributed in the n-dimensional hypercube {-1 <= x_i <= 1}, i = 1, ..., n, and then accept or reject the sample (X_1, ..., X_n), depending on whether the point (X_1, ..., X_n) is inside or outside the n-dimensional sphere. The algorithm is as follows:
1 Generate U_1, ..., U_n from U(0, 1).
2 X_i <- 1 - 2U_i, i = 1, ..., n, and Y^2 <- Σ_{i=1}^{n} X_i^2.
3 If Y^2 <= 1, accept Z = (Z_1, ..., Z_n), where Z_i = X_i/Y, i = 1, ..., n, as the desired vector.
4 Go to step 1.
The efficiency of the method is equal to the ratio

1/C = (volume of the sphere)/(volume of the hypercube) = π^{n/2}/(n 2^{n-1} Γ(n/2)).

For even n (n = 2m)

1/C = (π/4)^m / m!

and

lim_{m→∞} 1/C = 0.

In other words, for n big enough the acceptance-rejection method is inefficient.
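Example 5 fits in a few lines of Python; `in_unit_ball` is the interior variant obtained by skipping the projection (a sketch, with our own function names):

```python
import math
import random

def on_unit_sphere(n):
    # accept points of the cube [-1, 1]^n falling inside the ball, project to surface
    while True:
        x = [1.0 - 2.0 * random.random() for _ in range(n)]
        r2 = sum(t * t for t in x)
        if 0.0 < r2 <= 1.0:
            r = math.sqrt(r2)
            return [t / r for t in x]

def in_unit_ball(n):
    # same rejection loop, but deliver the accepted point itself
    while True:
        x = [1.0 - 2.0 * random.random() for _ in range(n)]
        if sum(t * t for t in x) <= 1.0:
            return x

random.seed(3)
z = on_unit_sphere(3)
```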
Remark To generate a random vector uniformly distributed inside an n-dimensional unit sphere, we have to rewrite only step 3 in the last algorithm as follows:
3 If Y^2 <= 1, accept (X_1, ..., X_n) as the desired vector.

3.4.3 Generalization of von Neumann's Method
There are various modifications and generalizations of von Neumann's method [10, 29]. For simplicity consider the single-variate case. Consider a random vector Y = (Y_1, Y_2) distributed h_{Y_1,Y_2}(y_1, y_2), -∞ < y_1 < ∞, y_2 ∈ [0, M], and let T(x) be an arbitrary continuous function such that sup_x T(x) = M. Similarly to (3.4.2) let us find

F_X(x) = P(Y_1 <= x | Y_2 <= T(Y_1)).

By Bayes' formula

F_X(x) = P(Y_1 <= x, Y_2 <= T(Y_1))/P(Y_2 <= T(Y_1)).   (3.4.11)

Differentiating F_X(x) with respect to x, we obtain

f_X(x) = f_{Y_1}(x | Y_2 <= T(Y_1)) = [∫_0^{T(x)} h_{Y_1,Y_2}(x, y_2) dy_2]/P(Y_2 <= T(Y_1)).   (3.4.12)

Theoretically, (3.4.12) offers an infinite number of possibilities for choosing h and T so as to define a proper f_X(x). But, practically, this formula has no direct application for generating r.v.'s from f_X(x). Let Y_1 and Y_2 be independent; then (3.4.12) becomes

f_X(x) = h_{Y_1}(x) H_{Y_2}(T(x))/P(Y_2 <= T(Y_1)),   (3.4.13)

where H_{Y_2} is the c.d.f. of Y_2. Consider some particular cases, as follows.

CASE 1  Write

f_X(x) = C h_{Y_1}(x) H_{Y_2}(T(x)),   (3.4.14)

where

C^{-1} = ∫ h_{Y_1}(y_1) H_{Y_2}(T(y_1)) dy_1   (3.4.15)

is the efficiency of the method. Thus if Y_1 and Y_2 are independent and if f_X(x) can be represented as (3.4.14), we have

f_{Y_1}(x | Y_2 <= T(Y_1)) = f_X(x).

We can see that (3.4.14) is similar to (3.4.1). When g(x) = H_{Y_2}(T(x)) both (3.4.1) and (3.4.14) coincide. In the particular case when T(x) = x we obtain

f_X(x) = C h_{Y_1}(x) H_{Y_2}(x).   (3.4.16)

Algorithm AR-3 describes the acceptance-rejection method for case 1.

Algorithm AR-3
1 Generate Y_1 from h_{Y_1}(y).
2 Generate Y_2 from h_{Y_2}(y).
3 If Y_2 <= T(Y_1), deliver Y_1.
4 Go to step 1.
Example 6 Generate a random variate from the beta distribution

f_X(x) = x^{a-1}(1 - x)^{β-1}/B(a, β),  0 <= x <= 1.   (3.4.17)

Let us use (3.4.16), assuming

h_{Y_1}(x) = β(1 - x)^{β-1},  0 <= x <= 1,   (3.4.18)

and

H_{Y_2}(x) = x^{a-1},  0 <= x <= 1.   (3.4.19)

By the inverse transform method we have

Y_1 = 1 - U_1^{1/β}   (3.4.20)

and

Y_2 = U_2^{1/(a-1)},   (3.4.21)

and Algorithm AR-3 is as follows:
1 Generate U_1 and U_2 from U(0, 1).
2 Y_1 <- 1 - U_1^{1/β}.
3 Y_2 <- U_2^{1/(a-1)}.
4 If Y_2 <= Y_1, deliver Y_1.
5 Go to step 1.
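Example 6 as a Python sketch (it assumes a > 1, so that x^{a-1} is a c.d.f. on [0, 1], and β > 0):

```python
import random

def beta_ar3(a, b):
    # AR-3 for beta(a, b): Y1 = 1 - U1^(1/b) from h, Y2 = U2^(1/(a-1)) from H;
    # deliver Y1 when Y2 <= Y1
    while True:
        y1 = 1.0 - random.random() ** (1.0 / b)
        y2 = random.random() ** (1.0 / (a - 1.0))
        if y2 <= y1:
            return y1

random.seed(4)
xs = [beta_ar3(2.0, 3.0) for _ in range(4000)]
```

For beta(2, 3) the mean is 2/5, which the sample reproduces.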
Example 7 Consider again the problem of generating a random variate from the beta distribution (3.4.17). Let us make use of (3.4.14), assuming

h_{Y_1}(x) = a x^{a-1},  0 <= x <= 1,   (3.4.22)

and

H_{Y_2}(T(x)) = (1 - x)^{β-1},  0 <= x <= 1,   (3.4.23)

that is,

T(x) = 1 - x   (3.4.24)

and

H_{Y_2}(x) = x^{β-1}.   (3.4.25)

By the inverse transform method Y_1 = U_1^{1/a} and Y_2 = U_2^{1/(β-1)}, and Algorithm AR-3 is as follows:
1 Generate U_1 and U_2 from U(0, 1).
2 Y_1 <- U_1^{1/a}.
3 Y_2 <- U_2^{1/(β-1)}.
4 If Y_2 <= 1 - Y_1, deliver Y_1.
5 Go to step 1.
Remark If f_X(x) can be represented as f_X(x) = C h_{Y_1}(x)[1 - H_{Y_2}(T(x))], then it is easy to see that Algorithm AR-3 can be written as follows.

Algorithm AR-3'
1 Generate Y_1 from h_{Y_1}(y).
2 Generate Y_2 from h_{Y_2}(y).
3 If Y_2 >= T(Y_1), deliver Y_1.
4 Go to step 1.

CASE 2  Let 0 <= T(x) <= M and let Y_2 be from U(0, M), that is,

h_{Y_2}(y_2) = 1/M for 0 <= y_2 <= M, and 0 otherwise.   (3.4.26)

Then it follows directly from (3.4.13) that

f_X(x) = C_1 h_{Y_1}(x) T(x),   (3.4.27)

where

C_1^{-1} = ∫ h_{Y_1}(y) T(y) dy.   (3.4.28)

The efficiency of the method is

1/C = (1/M) ∫ h_{Y_1}(y) T(y) dy.   (3.4.29)

Substituting C_1 = C/M in (3.4.1) and denoting g(x) = T(x)/M, we obtain

f_X(x) = C h_{Y_1}(x) g(x),   (3.4.30)

which is exactly (3.4.1). So case 2 corresponds to Algorithm AR-1.
Example 8 Consider again the problem of generating a random variate from the beta distribution (3.4.17), representing f_X(x) as in (3.4.30), that is, applying Algorithm AR-1 and taking into account that

g(x) = H_{Y_2}(x) = x^{a-1}   (3.4.31)

and

g(x) = H_{Y_2}(T(x)) = (1 - x)^{β-1},   (3.4.32)

respectively, for examples 6 and 7. Algorithm AR-1 for example 6 (see (3.4.17) through (3.4.21)) can be written as:
1 Generate U_1 and U_2 from U(0, 1).
2 Y <- 1 - U_1^{1/β}.
3 If U_2 <= Y^{a-1}, deliver Y.
4 Go to step 1.

Similarly, for example 7 Algorithm AR-1 can be written as:
1 Generate U_1 and U_2 from U(0, 1).
2 Y <- U_1^{1/a}.
3 If U_2 <= (1 - Y)^{β-1}, deliver Y.
4 Go to step 1.

CASE 3  Let a <= x <= b, 0 <= T(x) <= M, and let Y_1 and Y_2 be independent r.v.'s distributed U(a, b) and U(0, M), respectively. We immediately obtain from (3.4.14)

f_X(x) = C_2 T(x).   (3.4.33)

Rewriting f_X(x) in the standard way (3.4.1), f_X(x) = C h(x) g(x), we have

C = M(b - a),   (3.4.34)

h(x) = 1/(b - a) for a <= x <= b, and 0 otherwise,   (3.4.35)

g(x) = T(x)/M.   (3.4.36)

Therefore case 3 corresponds to Algorithm AR-2. We can easily see that Algorithm AR-3 generalizes both Algorithms AR-1 and AR-2 in the sense that, when h_{Y_2}(x) is uniform, we obtain Algorithm AR-1, and when both h_{Y_1}(x) and h_{Y_2}(x) are uniform, we obtain Algorithm AR-2. But (3.4.1) generalizes (3.4.14) in the sense that the c.d.f. H_{Y_2}(T(x)) is a particular case of g(x), 0 <= g(x) <= 1.

RESUME: Formula (3.4.1) generalizes (3.4.14). In the particular case when g(x) can be represented as a c.d.f. H_{Y_2}(T(x)) from which a random variate Y_2 can be easily generated, Algorithm AR-3 generalizes Algorithm AR-1 and as a rule saves computation (CPU) time. Formula (3.4.14) can be extended easily to the multivariate case.
Theorem 3.4.3 Let Y = (Y_1, ..., Y_n) be a random vector with p.d.f. h_Y(x), x = (x_1, ..., x_n), and let W be a random variable with p.d.f. h_W(w), w ∈ [0, M]. Let T(x) be an arbitrary continuous function such that sup_x T(x) = M. Then

f_{Y_1,...,Y_n}(x_1, ..., x_n | W <= T(Y)) = C h_Y(x) H_W(T(x)),   (3.4.37)

where

C^{-1} = ∫ h_Y(x) H_W(T(x)) dx.   (3.4.38)

The proof of this theorem is left for the reader.

3.4.4 Forsythe's Method
Forsythe's method is a rejection technique for sampling from a continuous distribution. The original idea is attributed to von Neumann [34]. Forsythe [15] described the method explicitly. Other descriptions are given by Ahrens and Dieter [2] and Fishman [12] with applications to different distributions. Our nomenclature follows that of Forsythe.
Suppose we wish to generate a random variable X from any p.d.f. of the form

f_X(x) = c e^{-h(x)},  0 <= x < ∞,   (3.4.39)

where h(x) is an increasing function of x over the range [0, ∞). In the first stage of the method an interval is selected for x, and in the second stage the value of x is determined within the interval by a rejection. For each k = 1, 2, ..., K (K is defined below) pick g_k as large as possible subject to the constraints

h(g_k) - h(g_{k-1}) <= 1,  g_0 = 0.   (3.4.40)

Next compute

r_k = ∫_0^{g_k} f_X(x) dx,  k = 1, ..., K.   (3.4.41)

Here the number of intervals, K, is chosen as the least index such that r_K exceeds the largest number less than one that can be represented in the computer. (K may be chosen smaller if we set r_K = 1 and if we are willing to truncate the generated variable by reducing any value above g_K to the interval [g_{K-1}, g_K).) Finally, compute

d_k = g_k - g_{k-1},  k = 1, ..., K,   (3.4.42)

and the function

G_k(x) = h(g_{k-1} + x) - h(g_{k-1}) <= h(g_k) - h(g_{k-1}) <= 1,  0 <= x <= d_k.   (3.4.43)

Now we present the algorithm. Steps 1 to 3 determine which interval [g_{k-1}, g_k) the variable will belong to. Steps 4 to 8 determine the value within that interval.
Algorithm F-1
1 Set k <- 1. Generate U from U(0, 1).
2 If U <= r_k, go to step 4 (the kth interval is selected).
3 If U > r_k, set k <- k + 1 and go back to step 2.
4 Generate another uniform deviate U and set X <- U d_k.
5 Set t <- G_k(X).
6 Generate U_1, U_2, ..., U_N, where N is such that t >= U_1 >= U_2 >= ... >= U_{N-1} but U_{N-1} < U_N (N = 1 if t < U_1).
7 If N is even, reject X and return to step 1.
8 If N is odd, accept X and deliver Y = g_{k-1} + X.
The proof of the method is given in Forsythe [15] (see also Fishman [12, p. 400] and Ahrens and Dieter [2]).
Example 1 Exponential Distribution For h(x) = x, f_X(x) is the standard exponential distribution and we have g_k = k, d_k = 1, and r_k = 1 - e^{-k} for all k.
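Algorithm F-1 specialized to this example is easy to code, since G_k(x) = x and d_k = 1; a sketch:

```python
import math
import random

def forsythe_exponential():
    # Algorithm F-1 for h(x) = x: g_k = k, d_k = 1, r_k = 1 - e^(-k), G_k(x) = x
    while True:
        # stage 1: pick the interval [k-1, k) using a single uniform
        k, u = 1, random.random()
        while u > 1.0 - math.exp(-k):
            k += 1
        # stage 2: accept/reject the offset inside the interval
        x = random.random()            # X = U * d_k with d_k = 1
        t, n = x, 0                    # t = G_k(X)
        while True:
            w = random.random()
            n += 1
            if t < w:                  # descending run broken at the n-th uniform
                break
            t = w
        if n % 2 == 1:                 # odd N: accept
            return (k - 1) + x
        # even N: reject and restart

random.seed(5)
ys = [forsythe_exponential() for _ in range(4000)]
```

No logarithm of a fresh uniform is needed per variate; the exponentiation e^{-k} can be tabulated.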
Example 2 Normal Distribution For h(x) = x^2/2, f_X(x) corresponds to the positive half of the normal distribution and we have g_0 = 0, g_1 = 1, d_1 = 1, d_2 = 3^{1/2} - 1, ..., and, for k >= 2, g_k = (2k - 1)^{1/2} and d_k = (2k - 1)^{1/2} - (2k - 3)^{1/2}. Also

G_k(x) = x^2/2 + g_{k-1} x,  0 <= x <= d_k.
The advantage of this method is that it provides a rejection technique for densities of the form (3.4.39) without the need for exponentiation. If G_k(x) is easier to calculate than e^{-h(x)}, as it is for many members of the exponential family, the method can yield fast algorithms. An important feature of the method is that it does not specify a unique algorithm, but rather a family of algorithms, subject to (3.4.40) being satisfied. The interval widths d_k can be chosen at will. A disadvantage of the method is that it requires tables of the constants g_k, d_k, and r_k.

3.5 SIMULATION OF RANDOM VECTORS

3.5.1 Inverse Transform Method
Let X = (X_1, ..., X_n) be a random vector to be generated from the given c.d.f. F_X(x). We distinguish the following two cases.

CASE 1 The random variables X_1, ..., X_n are independent. In this case the joint p.d.f. is

f_{X_1,...,X_n}(x_1, ..., x_n) = ∏_{i=1}^{n} f_i(x_i),   (3.5.1)

where f_i(x_i) is the marginal p.d.f. of the random variable X_i. It is easy to see that, in order to generate the random vector X = (X_1, ..., X_n) from the c.d.f. F_X(x), we can apply the inverse transform method

X_i = F_i^{-1}(U_i),  i = 1, ..., n,   (3.5.2)

to each variable separately.
Example 1 Let X_i be independent r.v.'s with the p.d.f.

f_i(x_i) = 1/(b_i - a_i),  a_i < x_i < b_i,  i = 1, ..., n, and 0 otherwise.

To generate the random vector X = (X_1, ..., X_n) with the joint p.d.f.

f_{X_1,...,X_n}(x_1, ..., x_n) = 1/∏_{i=1}^{n}(b_i - a_i) for (x_1, ..., x_n) ∈ D, and 0 otherwise,

where D = {(x_1, ..., x_n) : a_i <= x_i <= b_i, i = 1, ..., n}, we apply the inverse transform formula (3.5.2) and get X_i = a_i + (b_i - a_i)U_i, i = 1, ..., n.

CASE 2 The random variables are dependent. In this case the joint c.d.f. can be written as

F_X(x) = F_1(x_1) F_2(x_2 | x_1) ... F_n(x_n | x_1, ..., x_{n-1}),   (3.5.3)

where F_i(x_i | x_1, ..., x_{i-1}) is the conditional c.d.f. of X_i given X_1 = x_1, ..., X_{i-1} = x_{i-1}.
Theorem 3.5.1 Let U_1, ..., U_n be independent uniformly distributed random variates from U(0, 1). Then the vector X = (X_1, ..., X_n), which is obtained from the solution of the following system of equations

F_1(X_1) = U_1
F_2(X_2 | X_1) = U_2
...
F_n(X_n | X_1, ..., X_{n-1}) = U_n,   (3.5.4)

is distributed according to F_X(x). The proof of this theorem is similar to the proof of (3.2.2) and is left for the reader. The procedure for generating random variates from (3.5.3) contains only two steps:
1 Generate n independent uniformly distributed variates from U(0, 1).
2 Solve the system of equations (3.5.4) with respect to X = (X_1, ..., X_n).

There are n! ordered combinations (possibilities) to represent the variables X_1, ..., X_n in the vector X, and therefore n! possibilities to generate X while solving (3.5.4). Thus for n = 2 (n! = 2) we can write f_{X_1,X_2}(x_1, x_2) in two different ways:

f_{X_1,X_2}(x_1, x_2) = f_1(x_1) f_2(x_2 | x_1)   (3.5.5)

and

f_{X_1,X_2}(x_1, x_2) = f_2(x_2) f_1(x_1 | x_2).   (3.5.6)

The efficiency of simulation will generally depend on the order in which the random variates X_i, i = 1, ..., n, are taken in forming the random vector X. The following example, which is taken from Sobol [29], uses both formulas (3.5.5) and (3.5.6) for generating a two-variate random vector X = (X_1, X_2) and shows the difference in their efficiency.
Example 1 Let

f_{X_1,X_2}(x_1, x_2) = 6x_1 if x_1 + x_2 <= 1, x_1 >= 0, x_2 >= 0, and 0 otherwise.

CASE 1  Write

f_{X_1,X_2}(x_1, x_2) = f_1(x_1) f_2(x_2 | x_1).

The marginal p.d.f. of the r.v. X_1 is

f_1(x_1) = ∫_{-∞}^{+∞} f_{X_1,X_2}(x_1, x_2) dx_2 = 6x_1(1 - x_1),  0 < x_1 <= 1.

The conditional p.d.f. of the r.v. X_2, given X_1 = x_1, is

f_2(x_2 | x_1) = 1/(1 - x_1),  0 <= x_2 <= 1 - x_1.

The corresponding marginal and conditional distribution functions are, respectively,

F_1(x_1) = 3x_1^2 - 2x_1^3,  0 <= x_1 <= 1,

and

F_2(x_2 | x_1) = x_2/(1 - x_1),  0 <= x_2 <= 1 - x_1,

and the system (3.5.4) is

3X_1^2 - 2X_1^3 = U_1,  X_2 = (1 - X_1)U_2.

CASE 2  Write

f_{X_1,X_2}(x_1, x_2) = f_2(x_2) f_1(x_1 | x_2).

The marginal p.d.f. of the r.v. X_2 is

f_2(x_2) = 3(1 - x_2)^2,  0 <= x_2 <= 1,

and the conditional p.d.f. of X_1, given X_2 = x_2, is

f_1(x_1 | x_2) = 2x_1/(1 - x_2)^2,  0 <= x_1 <= 1 - x_2.

The corresponding marginal and conditional distribution functions are

F_2(x_2) = 1 - (1 - x_2)^3

and

F_1(x_1 | x_2) = ∫_0^{x_1} f_1(t | x_2) dt = x_1^2(1 - x_2)^{-2},  0 <= x_1 <= 1 - x_2,

and the system (3.5.4) is

1 - (1 - X_2)^3 = U_1,  X_1^2(1 - X_2)^{-2} = U_2.

Inasmuch as 1 - U is distributed in the same way as U, the last system can be written

(1 - X_2)^3 = U_1,  X_1^2 = U_2(1 - X_2)^2.

Comparing both cases, we can see that the first system is rather difficult to solve (we would have to solve a cubic equation), while the second system has the trivial solution

X_2 = 1 - U_1^{1/3},  X_1 = U_2^{1/2}(1 - X_2) = U_2^{1/2} U_1^{1/3}.

Unfortunately, there is no way to find a priori the optimal order of representing the variates in the vector to minimize the CPU time.

Remark For independent r.v.'s the efficiency of simulation does not depend on the order in which the r.v.'s are taken in forming the random vector X.
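The trivial (case 2) solution in Python; the factorization and inverses are those derived above (a sketch):

```python
import random

def triangle_variate():
    # f(x1, x2) = 6 x1 on the triangle x1 + x2 <= 1, x1, x2 >= 0
    u1, u2 = random.random(), random.random()
    x2 = 1.0 - u1 ** (1.0 / 3.0)          # from (1 - X2)^3 = U1
    x1 = u2 ** 0.5 * (1.0 - x2)           # from X1^2 = U2 (1 - X2)^2
    return x1, x2

random.seed(6)
pts = [triangle_variate() for _ in range(4000)]
```

Under this density E[X_1] = 1/2, which the sample mean reproduces.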
An alternative method for generating random vectors is the acceptance-rejection method based on Theorem 3.4.3.

3.5.2 Multivariate Transformation Method

This method can sometimes be useful for generating both random variables and random vectors.
Suppose that we are given the joint p.d.f. f_{X_1,...,X_n}(x_1, ..., x_n) of the n-dimensional continuous random variable (X_1, ..., X_n). Let

K = {(x_1, ..., x_n) : f_{X_1,...,X_n}(x_1, ..., x_n) > 0}.   (3.5.7)

Assume that the joint density of the random variables Y_1 = g_1(X_1, ..., X_n), ..., Y_k = g_k(X_1, ..., X_n) is desired, where k is an integer satisfying 1 <= k <= n. If k < n, we introduce additional new random variables Y_{k+1} = g_{k+1}(X_1, ..., X_n), ..., Y_n = g_n(X_1, ..., X_n) for judiciously selected functions g_{k+1}, ..., g_n; then we find the joint distribution of Y_1, ..., Y_n; finally, we find the desired marginal distribution of Y_1, ..., Y_k from the joint distribution of Y_1, ..., Y_n. This device of possibly introducing additional random variables makes the transformation y_1 = g_1(x_1, ..., x_n), ..., y_n = g_n(x_1, ..., x_n) a transformation from an n-dimensional space to an n-dimensional space. Henceforth we assume that we are seeking the joint distribution of Y_1 = g_1(X_1, ..., X_n), ..., Y_n = g_n(X_1, ..., X_n) (rather than the joint distribution of Y_1, ..., Y_k) when we are given the joint probability density of X_1, ..., X_n. We state our results for n = 2. The generalization for n > 2 is straightforward.

Let f_{X_1,X_2}(x_1, x_2) be given. Set K = {(x_1, x_2) : f_{X_1,X_2}(x_1, x_2) > 0}. We want to find the joint distribution of Y_1 = g_1(X_1, X_2) and Y_2 = g_2(X_1, X_2) for known functions g_1(x_1, x_2) and g_2(x_1, x_2). Now suppose that y_1 = g_1(x_1, x_2) and y_2 = g_2(x_1, x_2) define a one-to-one transformation that maps K onto, say, D. Then x_1 and x_2 can be expressed in terms of y_1 and y_2; so we can write, say, x_1 = φ_1(y_1, y_2) and x_2 = φ_2(y_1, y_2). Note that K is a subset of the x_1x_2 plane and D is a subset of the y_1y_2 plane consisting of points (y_1, y_2) for which there exists an (x_1, x_2) ∈ K such that (y_1, y_2) = [g_1(x_1, x_2), g_2(x_1, x_2)]. The determinant

det( ∂x_1/∂y_1  ∂x_1/∂y_2
     ∂x_2/∂y_1  ∂x_2/∂y_2 )

is called the Jacobian of the transformation and is denoted by J. The above discussion permits us to state Theorem 3.5.2.
Theorem 3.5.2 Let X_1 and X_2 be jointly continuous random variables with density function f_{X_1,X_2}(x_1, x_2). Set K = {(x_1, x_2) : f_{X_1,X_2}(x_1, x_2) > 0}. Assume that:
1 y_1 = g_1(x_1, x_2) and y_2 = g_2(x_1, x_2) define a one-to-one transformation of K onto D.
2 The first partial derivatives of x_1 = φ_1(y_1, y_2) and x_2 = φ_2(y_1, y_2) are continuous over D.
3 The Jacobian of the transformation is nonzero for (y_1, y_2) ∈ D.
Then the joint density of Y_1 = g_1(X_1, X_2) and Y_2 = g_2(X_1, X_2) is given by

f_{Y_1,Y_2}(y_1, y_2) = |J| f_{X_1,X_2}(φ_1(y_1, y_2), φ_2(y_1, y_2)) I_D(y_1, y_2),   (3.5.8)

where I_D is the indicator function of the set D.

The proof is essentially the derivation of the formulas for transforming variables in double integrals. For the proof, the reader is referred to Neuts [25].
For the single-variate case the transformation formula (3.5.8) becomes

f_Y(y) = f_X(g^{-1}(y)) |dg^{-1}(y)/dy| I_{I_Y}(y).   (3.5.9)

Here f_X(x) is the given p.d.f., f_Y(y) is the desired p.d.f., and I_Y is the interval of variation of Y = g(X). We can see that (3.5.9) is a particular case of (3.5.8).
Example 1 Let Z_1 and Z_2 be two independent standard normal random variables. Let Y_1 = Z_1 + Z_2 and Y_2 = Z_1/Z_2. Then

z_1 = y_1 y_2/(1 + y_2),  z_2 = y_1/(1 + y_2)

and

|J| = |y_1|/(1 + y_2)^2.

To find the marginal distribution of, say, Y_2, we must integrate out y_1, that is,

f_{Y_2}(y_2) = ∫_{-∞}^{∞} (1/2π) exp[-y_1^2(1 + y_2^2)/(2(1 + y_2)^2)] |y_1|/(1 + y_2)^2 dy_1.

Let

t = y_1^2(1 + y_2^2)/(2(1 + y_2)^2);

then the integral reduces to

f_{Y_2}(y_2) = 1/(π(1 + y_2^2)),

a Cauchy density. In other words, the ratio of two independent standard normal random variables has a Cauchy distribution. To generate an r.v. from a Cauchy distribution we generate Z_1 and Z_2 from N(0, 1) and take their ratio.

Example 2
Let X_i have a gamma distribution

f_{X_i}(x_i) = x_i^{n_i - 1} e^{-x_i}/Γ(n_i),  x_i >= 0, n_i > 0, and 0 otherwise,

with parameters n_i and 1 for i = 1, 2, and assume X_1 and X_2 are independent. Suppose now that the distribution of Y_1 = X_1/(X_1 + X_2) is desired. We have only the one function y_1 = g_1(x_1, x_2) = x_1/(x_1 + x_2), so we have to select another to use the transformation technique. Since x_1 and x_2 occur in the exponent of their joint density as their sum, x_1 + x_2 is a good choice. Let y_2 = x_1 + x_2; then x_1 = y_1 y_2, x_2 = y_2 - y_1 y_2, and

J = det(  y_2    y_1
         -y_2  1 - y_1 ) = y_2.

Hence

f_{Y_1,Y_2}(y_1, y_2) = [y_1^{n_1-1}(1 - y_1)^{n_2-1}/(Γ(n_1)Γ(n_2))] y_2^{n_1+n_2-1} e^{-y_2}.

It turns out that Y_1 and Y_2 are independent and that Y_1 has a beta distribution with parameters n_1 and n_2. Thus to generate a random variate from the beta distribution we generate two gamma variates X_1 and X_2 and then calculate X_1/(X_1 + X_2).
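For integer parameters this recipe can be sketched with the reproductive property of Section 3.6.2 (a gamma variate as a sum of exponentials); the helper names are ours:

```python
import math
import random

def erlang_variate(m):
    # G(m, 1) as -ln of a product of m uniforms (sum of m exp(1) variates)
    return -math.log(math.prod(random.random() for _ in range(m)))

def beta_from_gammas(n1, n2):
    # Y1 = X1 / (X1 + X2) is beta(n1, n2) when X1, X2 are independent gammas
    x1, x2 = erlang_variate(n1), erlang_variate(n2)
    return x1 / (x1 + x2)

random.seed(7)
ys = [beta_from_gammas(2, 3) for _ in range(4000)]
```

For beta(2, 3) the mean is 2/5, matching the sample mean.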
3.5.3 Multinormal Distribution
A random vector X = (X_1, ..., X_n) has a multinormal distribution if the p.d.f. is given by

f_X(x) = (2π)^{-n/2} |Σ|^{-1/2} exp[-(1/2)(x - μ)^T Σ^{-1}(x - μ)],   (3.5.10)

denoted by N(μ, Σ). Here μ = (μ_1, ..., μ_n) is the mean vector and Σ is the (n × n) covariance matrix

Σ = (σ_ij),  i, j = 1, ..., n,   (3.5.11)

which is positive definite and symmetric; |Σ| is the determinant of Σ, and Σ^{-1} is the inverse matrix of Σ. Inasmuch as Σ is positive definite and symmetric, there exists a unique lower triangular matrix

C = ( c_11  0    ...  0
      c_21  c_22 ...  0
      ..................
      c_n1  c_n2 ... c_nn )   (3.5.12)
such that

Σ = CC^T.   (3.5.13)

Then the vector X can be represented as

X = CZ + μ,   (3.5.14)

where Z = (Z_1, ..., Z_n) is a normal vector with zero mean and covariance matrix equal to the identity matrix, that is, all components Z_i, i = 1, ..., n, of Z are distributed according to the standard normal distribution N(0, 1). In order to obtain C from Σ = CC^T the so-called "square root method" can be used, which provides a set of recursive formulas for computation of the elements of C. It follows from (3.5.14) that

X_1 = c_11 Z_1 + μ_1.   (3.5.15)

Therefore var X_1 = σ_11 = c_11^2, and we obtain c_11 = σ_11^{1/2}. Proceeding with (3.5.14), we obtain

X_2 = c_21 Z_1 + c_22 Z_2 + μ_2   (3.5.16)

and

var X_2 = σ_22 = var(c_21 Z_1 + c_22 Z_2) = c_21^2 + c_22^2.   (3.5.17)

From (3.5.15) and (3.5.16)

E[(X_1 - μ_1)(X_2 - μ_2)] = σ_12 = E[c_11 Z_1(c_21 Z_1 + c_22 Z_2)] = c_11 c_21.   (3.5.18)

From (3.5.17) and (3.5.18)

c_21 = σ_12/σ_11^{1/2}   (3.5.19)

and

c_22 = (σ_22 - c_21^2)^{1/2}.   (3.5.20)

Generally, c_ij can be found from the following recursive formulas:

c_ij = (σ_ij - Σ_{k=1}^{j-1} c_ik c_jk)/c_jj,  j < i,

c_jj = (σ_jj - Σ_{k=1}^{j-1} c_jk^2)^{1/2}.   (3.5.21)
Algorithm MN-1 describes the necessary steps for generating a multinormal variate.

Algorithm MN-1
1 Compute the elements of the matrix C from the recursive formulas (3.5.21), where Σ = CC^T.
2 Generate Z = (Z_1, ..., Z_n) from N(0, 1).
3 X <- CZ + μ.
4 Deliver X.
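Algorithm MN-1 with the square-root (Cholesky) recursion (3.5.21), as a dependency-free Python sketch:

```python
import math
import random

def cholesky(sigma):
    """Lower triangular C with sigma = C C^T, via the recursion (3.5.21)."""
    n = len(sigma)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(c[i][k] * c[j][k] for k in range(j))
            if i == j:
                c[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                c[i][j] = (sigma[i][j] - s) / c[j][j]
    return c

def multinormal(mu, sigma):
    # Algorithm MN-1: X = C Z + mu with Z standard normal
    c = cholesky(sigma)
    z = [random.gauss(0.0, 1.0) for _ in mu]
    return [mu[i] + sum(c[i][k] * z[k] for k in range(i + 1))
            for i in range(len(mu))]
```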
3.6 GENERATING FROM CONTINUOUS DISTRIBUTIONS
This section describes generating procedures for various single-variate continuous distributions.

3.6.1 Exponential Distribution
An exponential variate X has p.d.f.

f_X(x) = (1/β) e^{-x/β},  x >= 0, β > 0, and 0 otherwise,   (3.6.1)

denoted by exp(β).

Procedure E-1 By the inverse transform method

U = F_X(X) = 1 - e^{-X/β},   (3.6.2)

so that

X = -β ln(1 - U).   (3.6.3)

Since 1 - U is distributed in the same way as U, we have

X = -β ln U.   (3.6.4)
For sampling purposes we may assume β = 1: if Y is sampled from the standard exponential distribution exp(1), then X = βY is from exp(β).

Algorithm E-1
1 Generate U from U(0, 1).
2 X <- -β ln U.
3 Deliver X.
Although this technique seems very simple, the computation of the natural logarithm on a digital computer involves a power series expansion (or some equivalent approximation) for each uniform variate generated.

Procedure E-2 We now prove a proposition that can be useful for generating from the exponential distribution exp(1).
Proposition Let U_1, ..., U_n, U_{n+1}, ..., U_{2n-1} be independent uniformly distributed random variables, and let U_(1), ..., U_(n-1) represent the order statistics corresponding to the random sample U_{n+1}, ..., U_{2n-1}. Assume U_(0) = 0 and U_(n) = 1; then the r.v.'s

Y_k = -(U_(k) - U_(k-1)) ln ∏_{i=1}^{n} U_i,  k = 1, ..., n,   (3.6.5)

are independent and distributed exp(1).

Proof Denote

X_k = U_(k) - U_(k-1),  k = 1, ..., n - 1,

and

X_n = -ln ∏_{i=1}^{n} U_i.

It will be shown in Section 3.6.2 that X_n is from the Erlang distribution, that is,

f_{X_n}(x) = x^{n-1} e^{-x}/(n - 1)!,  x >= 0.   (3.6.6)

It is also known (Feller [11]) that the vector (X_1, ..., X_{n-1}) is distributed

f_{X_1,...,X_{n-1}}(x_1, ..., x_{n-1}) = (n - 1)!   (3.6.7)

inside the simplex

Σ_{k=1}^{n-1} x_k <= 1,  x_k >= 0,  k = 1, ..., n - 1.   (3.6.8)

Hence

f_{Y_1,...,Y_n}(y_1, ..., y_n) = ∏_{i=1}^{n} e^{-y_i},  y_i >= 0, i = 1, ..., n.   (3.6.9)

Q.E.D.

For n = 3, for example, Y_1 = U_(1)X_3, Y_2 = (U_(2) - U_(1))X_3, and Y_3 = (1 - U_(2))X_3.
Algorithm E-2 describes the necessary steps.

Algorithm E-2
1  Generate 2n - 1 uniformly distributed random variates U_1, ..., U_{2n-1}.
2  Arrange the variates U_1, ..., U_{n-1} in order of increasing magnitude, that is, define them to be the order statistics U_(1), ..., U_(n-1).
3  Y_k ← -(U_(k) - U_(k-1)) ln ∏_{i=n}^{2n-1} U_i,   k = 1, ..., n.
4  Deliver Y_1, ..., Y_n as r.v.'s from exp(1).

Comparing (3.6.5) with the inverse transform method Y_k = -ln U_k, k = 1, ..., n, we find that the advantage of Algorithm E-2 is that it requires only one computation of ln ∏_{i=n}^{2n-1} U_i for generating n exponential variates simultaneously, whereas the inverse transform method requires a separate computation of ln U_k for each variate Y_k, k = 1, ..., n. The disadvantage of Algorithm E-2 is that it needs 2n - 1 uniform variates rather than the n uniform variates required by the inverse transform method. In addition, Algorithm E-2 requires arranging the uniform variates U_1, ..., U_{n-1} into the order statistics U_(1), ..., U_(n-1) and then computing the spacings U_(k) - U_(k-1), which is also time consuming. Simulating both algorithms, we find that Algorithm E-2 is faster than the standard inverse transform Algorithm E-1 for n = 3 to n = 6. The optimal n is 4.
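As a concrete sketch of Algorithm E-2 (in Python; the function name is illustrative, not from the text):

```python
import math
import random

def exponential_batch(n):
    """Sketch of Algorithm E-2: n independent exp(1) variates from 2n - 1
    uniforms, using a single logarithm for the whole batch."""
    # Spacings of n - 1 sorted uniforms partition [0, 1] into n intervals.
    cuts = [0.0] + sorted(random.random() for _ in range(n - 1)) + [1.0]
    # One log of a product of n uniforms gives an Erlang(n, 1) variate.
    factor = -math.log(math.prod(random.random() for _ in range(n)))
    return [(cuts[k] - cuts[k - 1]) * factor for k in range(1, n + 1)]
```

Only one logarithm is computed per batch of n variates, at the cost of sorting n - 1 uniforms.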
There are many alternative procedures (Ahrens and Dieter [1], Marsaglia [19]) for generating from exp(β) without the benefit of a logarithmic transformation, procedures that are based on the composition method, the acceptance-rejection method, and Forsythe's method [15] (see also example 1, Section 3.4.4). The reader is also referred to Fishman's monograph [12].
Before leaving the exponential distribution we want to introduce von Neumann's ingenious method [34] for generating from exp(1), a method that was later extended by Forsythe [15] and Ahrens and Dieter [2] for generating from various distributions. Let {X_i : i = 0, 1, ...} be a sequence of i.i.d. r.v.'s from the uniform distribution on (0, 1), and define an r.v. N, taking positive integer values, through {X_i} by the inequalities

X_0 ≥ X_1 ≥ ... ≥ X_{N-1},   X_{N-1} < X_N.

We accept the sequence {X_i} if N is odd; otherwise we reject it and repeat the process until N turns out odd. Let T be the number of sequences rejected before an odd N appears (T = 0, 1, ...) and let X_0 be the value of the first variable in the accepted sequence; then Y = T + X_0 is from exp(1). It is shown in ref. 34 that generation of one exponential variate in this way requires on the average (1 + e)(1 - e^{-1})^{-1} ≈ 6 random numbers.
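The procedure just described can be sketched as follows (function name is illustrative):

```python
import random

def von_neumann_exp():
    """Sketch of von Neumann's method: an exp(1) variate without logarithms.
    T counts rejected sequences; the accepted X0 supplies the fractional part."""
    t = 0
    while True:
        x0 = random.random()
        n, last = 1, x0
        x = random.random()
        while x <= last:        # extend the descending run X0 >= X1 >= ...
            last = x
            x = random.random()
            n += 1
        if n % 2 == 1:          # N odd: accept the sequence
            return t + x0
        t += 1                  # N even: reject and start a new sequence
```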
3.6.2  Gamma Distribution

A random variable X has a gamma distribution if its p.d.f. is defined as

f_X(x) = x^{α-1} e^{-x/β} / (Γ(α) β^{α}),   x ≥ 0,   α > 0,   β > 0,

and 0 otherwise, and is denoted by G(α, β). Note that for α = 1, G(1, β) is exp(β). Inasmuch as the c.d.f. of the gamma distribution does not exist in explicit form, the inverse transform method cannot be applied. Therefore alternative methods of generating gamma variates must be considered.
Procedure G-1
One of the most important properties of the gamma distribution is the reproductive property, which can be successfully used for gamma generation. Let X_i, i = 1, ..., n, be a sequence of independent random variables from G(α_i, β). Then X = ∑_{i=1}^{n} X_i is from G(α, β), where α = ∑_{i=1}^{n} α_i. If α is an integer, say α = m, a random variate from the gamma distribution G(m, β) can be obtained by summing m independent exponential random variates exp(β), that is,

X = β ∑_{i=1}^{m} (-ln U_i) = -β ln ∏_{i=1}^{m} U_i,   (3.6.10)

which is called the Erlang distribution and denoted Er(m, β). Algorithm G-1 describes generating r.v.'s from Er(m, β).

Algorithm G-1
1  X ← 0.
2  Generate Y from exp(1).
3  X ← X + Y.
4  If α = 1, X ← βX and deliver X.
5  α ← α - 1.
6  Go to step 2.
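Formula (3.6.10) can be sketched directly (function name is illustrative):

```python
import math
import random

def erlang(m, beta):
    """Sketch of (3.6.10): an Er(m, beta) variate as minus beta times
    the log of a product of m uniforms."""
    return -beta * math.log(math.prod(random.random() for _ in range(m)))
```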
It is not difficult to see that the mean computation (CPU) time for generation from the Erlang distribution is an increasing linear function of α. However, if α is nonintegral, (3.6.10) is not applicable and some difficulties arise in generating gamma variates. For some time no exact method was known and approximate techniques were used. The most common was the so-called probability switch method [24]. Let m = [α] be the integral part of α and let δ = α - m. With probability δ, generate a random variate from G(m + 1, β); with probability 1 - δ, generate a random variate from G(m, β). This mixture of gamma variates with integral shape parameters will approximate the desired gamma distribution. The technique works only when α ≥ 1. In the particular case when α - m = 1/2, gamma variates from G(α, 1) can be generated exactly by adding half the square of a standard normal variate to the variate generated in (3.6.10).
Procedure G-2
Jöhnk [16] suggested a technique that exactly generates* variates from G(δ, β), where 0 < δ < 1.

Theorem 3.6.1  Let W and V be independent variates from the beta distribution Be(δ, 1 - δ) (see Section 3.6.3) and from exp(1), respectively. Then X = βVW is a variate from G(δ, β).

Proof  Let u = v and x = βvw. Then w = x/(βu) and v = u. The Jacobian of this transformation is

J = ∂(v, w)/∂(u, x) = 1/(βu).   (3.6.11)

The joint distribution of (u, x) is therefore given by

f_{U,X}(u, x) = e^{-u} (x/(βu))^{δ-1}(1 - x/(βu))^{-δ} / [B(δ, 1 - δ) βu],   0 < x < βu,

and 0 otherwise.   (3.6.12)

*It is understood that when we say a method "exactly generates" random variables on a computer, the exactness is limited by the computer used and by the randomness of the underlying pseudorandom number generator.
The marginal distribution of X is

f_X(x) = x^{δ-1} e^{-x/β} / (Γ(δ) β^{δ}),

which is G(δ, β).   Q.E.D.

Algorithm G-2
1  Generate two variates W and V from Be(δ, 1 - δ) and exp(1), respectively.
2  Compute X = βVW, which is from G(δ, β).
3  Deliver X.

To generate a variate from G(α, β) with α = δ + m, we generate an r.v. Y from Er(m, 1) and then compute X = β(Y + VW), which is from G(α, β).
Recently, a number of procedures for sampling from G(α, β), based on the acceptance-rejection method, have been suggested by Ahrens and Dieter [3], Cheng [9], Fishman [13], Tadikamalla [30, 31], and Wallace [35]. Let us consider some of them.

Procedure G-3
Wallace [35] suggested a procedure for generating from G(α, 1) with α > 1, based on both the acceptance-rejection and probability switch methods. Let

f_X(x) = C h(x) g(x),

where h(x) is a mixture of the two Erlang distributions Er(m, 1) and Er(m + 1, 1):

h(x) = P x^{m-1} e^{-x} / (m - 1)! + (1 - P) x^{m} e^{-x} / m!,   x ≥ 0,   (3.6.13)

C = (m - 1)! m^{δ} / Γ(m + δ),   (3.6.14)

and

g(x) = (x/m)^{δ} [1 + δ(x/m - 1)]^{-1}.   (3.6.15)

It can be found from (3.4.10) that the optimal P is equal to 1 - δ, where δ = α - [α]. It follows from (3.6.14) that the mean number of trials C is a monotone decreasing function of m for a fixed δ, and

lim_{m→∞} (m - 1)! m^{δ} / Γ(m + δ) = 1,
that is, asymptotically the execution time does not depend on δ and achieves the optimal efficiency C = 1. Algorithm G-3 describes Wallace's procedure.

Algorithm G-3
1  Compute δ = α - m, where m = [α].
2  Generate U_1, ..., U_m from U(0, 1).
3  With probability 1 - δ compute

V = -ln ∏_{i=1}^{m} U_i.

4  With probability δ compute

V = -ln ∏_{i=1}^{m+1} U_i.

5  Generate another uniform variate U from U(0, 1).
6  If U ≤ (V/m)^{δ} / [1 + δ((V/m) - 1)], deliver V as an r.v. from G(α, 1).
7  Go to step 2.

The following three procedures are reproduced with little change from Ref. 12.
Procedure G-4
Fishman [13] describes another procedure for generating from G(α, 1), α ≥ 1. Here

f_X(x) = C h(x) g(x),   (3.6.16)

g(x) = x^{α-1} exp[-x(1 - 1/α)] / [α^{α-1} exp(1 - α)],   (3.6.17)

h(x) = (1/α) e^{-x/α}.   (3.6.18)

The probability of success on a trial is

1/C = Γ(α) / (α^{α} e^{1-α}).   (3.6.19)

For large α the mean number of trials is

C ≈ e (α/2π)^{1/2}.   (3.6.20)

It is not difficult to see that the condition U ≤ g(Y), where the r.v. Y is from exp(α), can be written as V_2 ≥ (α - 1)(V_1 - ln V_1 - 1), where V_1 and V_2 are independent r.v.'s from exp(1).

Algorithm G-4
1  A ← α - 1.
2  Generate V_1 and V_2 from exp(1).
3  If V_2 < A(V_1 - ln V_1 - 1), go to step 2.
4  Deliver X = αV_1 as a variate from G(α, 1).
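Algorithm G-4 can be sketched as follows (function name is illustrative):

```python
import math
import random

def gamma_fishman(alpha):
    """Sketch of Algorithm G-4 (Fishman): G(alpha, 1) for alpha >= 1 via
    rejection from an exponential envelope with mean alpha."""
    a = alpha - 1.0
    while True:
        v1 = -math.log(random.random())     # exp(1)
        v2 = -math.log(random.random())     # exp(1)
        if v2 >= a * (v1 - math.log(v1) - 1.0):
            return alpha * v1               # accepted variate from G(alpha, 1)
```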
Procedure G-5
This procedure is due to Cheng [9] and describes gamma generation from G(α, 1) for α > 1 with execution time asymptotically independent of α. In Cheng's procedure

h(x) = λμ x^{λ-1} / (μ + x^{λ})^{2},   x ≥ 0,   and 0 otherwise,   (3.6.21)

C = 4α^{α} / (Γ(α) e^{α} λ),   (3.6.22)

g(x) = x^{α-λ} (μ + x^{λ})^{2} e^{α-x} / (4 α^{α+λ}),   (3.6.23)

where μ = α^{λ} and λ = (2α - 1)^{1/2}. The execution time C is a monotonically decreasing function of α such that for α = 1, C = 1.47, and for α = 2, C = 1.25; asymptotically

lim_{α→∞} C = 2/√π ≈ 1.13.   (3.6.24)
Let b = α - ln 4 and d = α + λ. Then Cheng's algorithm can be written as follows.

Algorithm G-5
1  Sample U_1 and U_2 from U(0, 1).
2  V ← λ^{-1} ln[U_1/(1 - U_1)].
3  X ← α e^{V}.
4  If b + dV - X ≥ ln(U_1^2 U_2), deliver X.
5  Go to step 1.

Procedure G-6
Ahrens and Dieter [3] suggested an alternative procedure for generating from G(α, β) with α > 1 and execution time asymptotically independent of α. Their procedure makes use of the truncated Cauchy distribution. Let
f_X(x) = C h(x) g(x),   x ≥ 0,   (3.6.25)

where h(x) is the Cauchy density truncated to [0, ∞),

h(x) = H'_γ(x) / [1 - H_γ(0)],   x ≥ 0,   (3.6.26)

g(x) = [1 + ((x - γ)/β)^{2}] (x/γ)^{γ} e^{γ-x},   (3.6.27)

and

H'_γ(x) = (1/πβ) [1 + ((x - γ)/β)^{2}]^{-1},   -∞ < x < ∞,   (3.6.28)

H_γ(x) = 1/2 + (1/π) tan^{-1}((x - γ)/β),   -∞ < x < ∞,   (3.6.29)

are the p.d.f. and c.d.f. of the Cauchy distribution, respectively, with parameters γ = α - 1 and β = (2α - 1)^{1/2}. It follows from (3.6.25) and (3.6.28) that h(x) is the truncated Cauchy distribution with parameters γ and β. To apply the acceptance condition U ≤ g(Y), we have to generate an r.v. Y from the truncated Cauchy density h(y). The c.d.f. of Y is

F_Y(y) = [H_γ(y) - H_γ(0)] / [1 - H_γ(0)],   y ≥ 0,   (3.6.30)

where H_γ(y) is given in (3.6.29). Substituting (3.6.29) in (3.6.30) and using the inverse transform formula Y = F^{-1}(U), we obtain

Y = β tan{π(U[1 - H_γ(0)] + H_γ(0) - 1/2)} + γ,   (3.6.31)

where by (3.6.29)

H_γ(0) = 1/2 - (1/π) tan^{-1}(γ/β).   (3.6.32)
It is readily seen that the condition U ≤ g(Y) is equivalent to

-V ≤ ln[1 + (Y' - γ)^{2}/β^{2}] + γ ln(Y'/γ) - Y' + γ,   (3.6.33)

where V is from exp(1) and

Y' = γ + β tan π(U - 1/2),

which can be found from U = H_γ(Y').

Algorithm G-6
1  γ ← α - 1.
2  Generate U from U(0, 1).
3  Y' ← γ + β tan π(U - 1/2).
4  Generate V from exp(1).
5  If -V ≤ ln[1 + (Y' - γ)^{2}/β^{2}] + γ ln(Y'/γ) - Y' + γ, deliver Y'.
6  Go to step 2.
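The acceptance test of Algorithm G-6 can be sketched as follows (function name is illustrative; candidates Y' ≤ 0 are rejected outright, since they lie outside the support of the target density):

```python
import math
import random

def gamma_cauchy(alpha):
    """Sketch of Algorithm G-6: G(alpha, 1), alpha > 1, via a Cauchy envelope
    with location gamma = alpha - 1 and scale beta = sqrt(2*alpha - 1)."""
    g = alpha - 1.0
    b = math.sqrt(2.0 * alpha - 1.0)
    while True:
        u = random.random()
        y = g + b * math.tan(math.pi * (u - 0.5))   # Cauchy candidate
        if y <= 0.0:
            continue                                # outside the support
        v = -math.log(random.random())              # exp(1)
        if -v <= (math.log(1.0 + ((y - g) / b) ** 2)
                  + g * math.log(y / g) - y + g):
            return y
```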
The following two procedures for generating from G(α, 1) are due to Tadikamalla [30, 31].

Procedure G-7
In this procedure [30] h(x) is from Er(m, β), that is,

h(x) = x^{m-1} e^{-x/β} / [(m - 1)! β^{m}],   x ≥ 0.   (3.6.34)

Then it is readily shown that

g(x) = x^{δ} exp[-x(1 - 1/β)] / {[δβ/(β - 1)]^{δ} e^{-δ}},   x ≥ 0,   (3.6.35)

and

C = [δβ/(β - 1)]^{δ} e^{-δ} (m - 1)! β^{m} / Γ(α),   (3.6.36)

where δ = α - m and m = [α]. The value of β that maximizes the efficiency can be found from (3.4.10) and is equal to α/m. Tadikamalla showed by simulation that his procedure is faster than Fishman's Procedure G-4 for 3 ≤ α ≤ 19 and is comparable for other values of α. For 1 ≤ α < 2 both methods coincide. This is not surprising. The reason for the greater efficiency of this procedure is that the Erlang distribution Er(m, β), with m = [α], approximates the gamma distribution G(α, 1) better than the exponential distribution exp(α) (see Procedure G-4) does.
In addition, Tadikamalla's procedure is better than Ahrens and Dieter's Procedure G-6 for α ≤ 8.

Algorithm G-7
1  Compute δ ← α - m, where m = [α].
2  Generate m independent random variates U_1, ..., U_m from U(0, 1).
3  Compute Y = -β ln ∏_{i=1}^{m} U_i.
4  Generate another uniform variate U from U(0, 1).
5  If U ≤ Y^{δ} exp[-Y(1 - 1/β)] / {[δβ/(β - 1)]^{δ} e^{-δ}}, deliver Y.
6  Go to step 2.

Procedure G-8
In this procedure [31] h(x) is from the Laplace (double exponential) distribution with location parameter α - 1 and scale parameter β, that is,
h(x) = (1/2β) e^{-|x - (α-1)|/β},   -∞ < x < ∞.   (3.6.37)

Then it is readily shown that the acceptance function g(x) is given by (3.6.38).

Algorithm G-8
1  Generate a random variate Y from the Laplace distribution with location parameter α - 1 and scale parameter β.
2  If Y < 0, go to step 1.
3  Generate a uniform random variate U from U(0, 1).
4  If U ≤ g(Y) (see (3.6.38)), deliver X = Y.
5  Go to step 1.
Table 3.6.1  The Relative Efficiencies (1/C) and the Average Number of Random Numbers Required (N) for Certain Algorithms

              Fishman          Tadikamalla 1     Tadikamalla 2
    α       1/C      N        1/C      N        1/C      N
   1.5    0.7953    2.5     0.7953    2.5     0.8642    2.3
   2.5    0.6029    3.3     0.8871    3.4     0.7872    2.5
   3.5    0.5047    4.0     0.9222    4.3     0.7565    2.6
   5.5    0.3992    5.0     0.9520    6.3     0.7304    2.7
   8.5    0.3194    6.3     0.9695    9.3     0.7174    2.8
  10.5    0.2868    7.0     0.9755   11.3     0.7144    2.8
  15.5    0.2355    8.5     0.9836   16.3     0.7132    2.8
  20.5    0.2045    9.8     0.9876   21.3     0.7149    2.8
  30.5    0.1674   11.9     0.9917   31.3     0.7195    2.8
 100.5    0.0920   21.7     1.0000  101.0     0.7355    2.7
Tadikamalla [31] compared the relative efficiency and CPU timing of his procedures with Fishman's [13] and Ahrens and Dieter's [3] procedures. Table 3.6.1 gives the relative efficiencies and the number of uniform random numbers required by these procedures for some selected values of α. The efficiencies of Ahrens and Dieter's method are not given in Table 3.6.1 because they have to be calculated numerically and the details are not available in Ref. 3. For increasing values of α the efficiency of Fishman's algorithm decreases, while the efficiencies of Tadikamalla's first algorithm (G-7) and of Ahrens and Dieter's algorithm increase. The efficiency of Tadikamalla's second algorithm (G-8) decreases as α increases up to a certain value and then increases again. Also note that the number of uniforms required by Tadikamalla's second algorithm (G-8) remains fairly constant.
Table 3.6.2 gives the CPU timings for these four methods on an IBM 370/165 computer for selected values of α. These timings are based on generating 10,000 variates and using the subroutine TIMER available on the IBM computer. The following observations can be made about the methods compared above.
1  Fishman's procedure is the simplest of all the procedures, and its CPU time per trial is constant for any α. As α increases, the number of trials required for one gamma variate increases (efficiency decreases), and thus the CPU time per variate increases with α.
Table 3.6.2  Average CPU Times (in Microseconds) to Generate One Gamma Variate on the IBM 370/165 Computer

    α     Fishman   Tadikamalla 1   Ahrens and Dieter   Tadikamalla 2
   1.5      127          137             N/A                138
   2.5      175          176             N/A                152
   3.5      213          184             225                157
   5.5      260          225             218                162
   8.5      334          307             210                166
  10.5      380          354             209                167
  15.5      473          470             203                167
  20.5      559          596             194                166
  30.5      693          850             190                165
  50.5       -            -              181                164
 100.5       -            -              171                162
2  Tadikamalla's first procedure (G-7) is also simple, and in this case the number of trials per gamma variate decreases as α increases. However, the CPU time per trial increases with α (more uniforms are required per trial), so the average CPU time per variate for this procedure increases with α. Tadikamalla's procedure is faster than Fishman's for 3 ≤ α ≤ 19 and the same as Fishman's for 1 ≤ α < 2.
3  Tadikamalla's second procedure (G-8) is faster than Fishman's and than Tadikamalla's first procedure (G-7) for α > 2, and is faster than Ahrens and Dieter's for all α. The average CPU time required per variate for Tadikamalla's second procedure remains fairly constant for medium and large values of α.
3.6.3  Beta Distribution

An r.v. X has a beta distribution if the p.d.f. is

f_X(x) = x^{α-1} (1 - x)^{β-1} / B(α, β),   0 ≤ x ≤ 1,   α, β > 0,   (3.6.40)

and is denoted by Be(α, β). There are several ways of generating from Be(α, β).

Procedure Be-1
This procedure is based on the result from Section 3.5.2 (example 2), which says: if Y_1 and Y_2 are independent r.v.'s from G(α, 1) and G(β, 1),
respectively, then

X = Y_1 / (Y_1 + Y_2)   (3.6.41)

is from Be(α, β). The corresponding algorithm is as follows.

Algorithm Be-1
1  Generate Y_1 from G(α, 1).
2  Generate Y_2 from G(β, 1).
3  X ← Y_1 / (Y_1 + Y_2).
4  Deliver X.
Algorithm Be-2 (integer α and β)
1  Generate α + β - 1 uniform random variates U_1, ..., U_{α+β-1} from U(0, 1).
2  Find the order statistic U_(α), which is from Be(α, β).
It can be shown that the total number of comparisons needed to find U_(α) is equal to (α/2)(α + 2β - 1); that is, this procedure is not efficient for large α and β. Many procedures for sampling from Be(α, β) with nonintegral α and β have been proposed recently (see Ahrens and Dieter [3], Cheng [8], Jöhnk [16], and Michailov [22]). We consider only a few of them.

Procedure Be-3
The simplest procedure for generating from Be(α, β) with arbitrary nonintegral α and β uses the mode of the density,

M = f_X(x*),   where   x* = (α - 1)/(α + β - 2).   (3.6.42)

The following algorithm, Be-3, is based on the acceptance-rejection Algorithm AR-2.

Algorithm Be-3
1  M ← f_X((α - 1)/(α + β - 2)).
2  Generate U_1 and U_2 from U(0, 1).
3  If MU_1 ≤ [Γ(α + β)/Γ(α)Γ(β)] U_2^{α-1}(1 - U_2)^{β-1}, deliver U_2 as a variate from Be(α, β).
4  Go to step 2.

Procedure Be-4
This procedure is due to Jöhnk [16] and is based on the following theorem.

Theorem 3.6.2  Let U_1 and U_2 be two uniform variates from U(0, 1) and let Y_1 = U_1^{1/α} and Y_2 = U_2^{1/β}. If Y_1 + Y_2 ≤ 1, then

X = Y_1 / (Y_1 + Y_2)   (3.6.43)

is from Be(α, β).

Proof  It is obvious that

f_{Y_1}(y_1) = α y_1^{α-1},   0 ≤ y_1 ≤ 1,   (3.6.44)

f_{Y_2}(y_2) = β y_2^{β-1},   0 ≤ y_2 ≤ 1,   (3.6.45)

and

f_{Y_1 Y_2}(y_1, y_2) = αβ y_1^{α-1} y_2^{β-1}.   (3.6.46)

Let X = Y_1/(Y_1 + Y_2) and W = Y_1 + Y_2, so that y_1 = xw and y_2 = w(1 - x). The Jacobian of this transformation is

J = | ∂y_1/∂x  ∂y_1/∂w ; ∂y_2/∂x  ∂y_2/∂w | = | w  x ; -w  1-x | = w.   (3.6.47)
By Bayes' formula,

f_X(x) = f_{X | Y_1 + Y_2 ≤ 1}(x).   (3.6.49)

Substituting (3.6.50) and (3.6.51) into (3.6.49), we obtain that X is from Be(α, β).   Q.E.D.

The efficiency of the method is equal to

1/C = Pr(Y_1 + Y_2 ≤ 1) = Γ(α + 1)Γ(β + 1) / Γ(α + β + 1).   (3.6.52)

For integer α and β

1/C = α! β! / (α + β)!.   (3.6.53)

Table 3.6.3 represents the mean number of trials C as a function of α and β. Asymptotically,

lim_{α→∞} C = lim_{β→∞} C = ∞.

Thus for large α or β Jöhnk's procedure is not efficient.
Table 3.6.3  The Mean Number of Trials C as a Function of α and β

              β
    α      1     3     5
    1      2     4     6
    3      4    20    56
    5      6    56   252
Algorithm Be-4
1  j ← 1.
2  Generate U_j and U_{j+1} from U(0, 1).
3  Y_1 ← U_j^{1/α},   Y_2 ← U_{j+1}^{1/β}.
4  If Y_1 + Y_2 > 1, then j ← j + 2 and go to step 2.
5  Deliver X = Y_1 / (Y_1 + Y_2).
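Jöhnk's procedure can be sketched compactly (function name is illustrative):

```python
import random

def beta_johnk(alpha, beta):
    """Sketch of Theorem 3.6.2 / Algorithm Be-4: Be(alpha, beta) from powers
    of uniforms; efficient only for small alpha and beta (see Table 3.6.3)."""
    while True:
        y1 = random.random() ** (1.0 / alpha)
        y2 = random.random() ** (1.0 / beta)
        if y1 + y2 <= 1.0:
            return y1 / (y1 + y2)
```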
Procedure Be-5
This procedure is based on the results of examples 6 and 7 from Section 3.4.3. As follows from (3.4.20) and (3.4.24), the efficiencies of the acceptance-rejection method AR-3 in examples 6 and 7 are given by (3.6.54) and (3.6.55), respectively. For integer α and β we have, respectively,

1/C = (α - 1)! β! / (α + β - 1)!   (3.6.56)

and

1/C = α! (β - 1)! / (α + β - 1)!.   (3.6.57)
In both cases, (3.6.56) and (3.6.57), the efficiencies are a little higher than in Jöhnk's procedure Be-4 (see (3.6.53)). It is interesting to note that for β > α it is more efficient to represent f_X(x) in the form of (3.4.18) through (3.4.20), and for α > β it is more efficient to represent f_X(x) in the form of (3.4.22) through (3.4.24).

Procedure Be-6
In this procedure h(x) is Be(m, n), that is,

h(x) = (m + n - 1)! x^{m-1}(1 - x)^{n-1} / [(m - 1)!(n - 1)!],   0 ≤ x ≤ 1,   (3.6.58)

where m = [α] and n = [β]. Then

g(x) = x^{δ_1}(1 - x)^{δ_2} / y*,   (3.6.59)

where δ_1 = α - m, δ_2 = β - n, and B(r, s) = Γ(r)Γ(s)/Γ(r + s). It is quite easy to prove that the function y = x^{δ_1}(1 - x)^{δ_2} is concave on [0, 1] and achieves its unique maximum

y* = δ_1^{δ_1} δ_2^{δ_2} / (δ_1 + δ_2)^{δ_1+δ_2}   at the point   x* = δ_1 / (δ_1 + δ_2).

The efficiency of the procedure is

1/C = B(α, β) / [B(m, n) y*].   (3.6.61)

Comparing (3.6.61) with (3.6.56) and (3.6.57), we can also readily prove that Procedure Be-6 is more efficient than Procedure Be-5 for α ≥ 2, β ≥ 2.

Algorithm Be-6
1  Generate U from U(0, 1).
2  Generate Y from Be(m, n).
3  If U ≤ Y^{δ_1}(1 - Y)^{δ_2} / y*, deliver Y.
4  Go to step 1.

Remark  If δ_1 = 0, then g(x) = (1 - x)^{δ_2}, y* = 1, and C = B(m, n)/B(m, β). If δ_2 = 0, then g(x) = x^{δ_1}, y* = 1, and C = B(m, n)/B(α, n). If δ_1 = δ_2 = 0, then C = 1.
3.6.4  Normal Distribution

A random variable X has a normal distribution if the p.d.f. is

f_X(x) = (1/σ√(2π)) exp[-(x - μ)^2 / 2σ^2],   -∞ < x < ∞,   (3.6.62)

and is denoted N(μ, σ²). Here μ is the mean and σ² is the variance. Since X = μ + σZ, where Z is the standard normal variable denoted by N(0, 1), we consider only generation from N(0, 1). As we mentioned in Section 3.2, the inverse transform method cannot be applied to the normal distribution, and some alternative procedures have to be employed. We consider some of them. More about generation from the normal distribution can be found in Fishman [12].
Procedure N-1
This approach is due to Box and Muller [6]. Let us prove that, if U_1 and U_2 are independent random variates from U(0, 1), then the variates

Z_1 = (-2 ln U_1)^{1/2} cos 2πU_2,
Z_2 = (-2 ln U_1)^{1/2} sin 2πU_2   (3.6.63)

are independent standard normal deviates. To see this let us rewrite the system (3.6.63) as

Z_1 = (2V)^{1/2} cos 2πU,
Z_2 = (2V)^{1/2} sin 2πU,   (3.6.64)

where V is from exp(1) and U = U_2. It follows from (3.6.64) that

Z_1^2 + Z_2^2 = 2V   and   Z_2/Z_1 = tan 2πU.

The Jacobian of the transformation is J = 1/2π, and hence

f_{Z_1 Z_2}(z_1, z_2) = (1/2π) e^{-(z_1^2 + z_2^2)/2}.   (3.6.65)

The last formula represents the joint p.d.f. of two independent standard normal deviates.

Algorithm N-1
1  Generate two independent random variates U_1 and U_2 from U(0, 1).
2  Compute Z_1 and Z_2 simultaneously by substituting U_1 and U_2 into the system of equations (3.6.63).
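Algorithm N-1 can be sketched as follows (function name is illustrative):

```python
import math
import random

def box_muller():
    """Sketch of Algorithm N-1 (Box-Muller): a pair of independent N(0, 1)
    deviates from a pair of independent uniforms."""
    u1, u2 = random.random(), random.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)
```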
Procedure N-2
This procedure is based on the acceptance-rejection method. Let the r.v. X be distributed

f_X(x) = (2/π)^{1/2} e^{-x^2/2},   x ≥ 0.   (3.6.66)

Since the standard normal distribution is symmetric about zero, we can assign a random sign to the r.v. generated from (3.6.66) and obtain an r.v. from N(0, 1). To generate an r.v. from (3.6.66) write f_X(x) as

f_X(x) = C h(x) g(x),   (3.6.67)

where

h(x) = e^{-x},   x ≥ 0,   (3.6.68)

g(x) = exp[-(x - 1)^2 / 2].   (3.6.69)

The efficiency of the method is equal to 1/C = (π/2e)^{1/2} ≈ 0.76. The acceptance condition U ≤ g(Y) is

U ≤ exp[-(Y - 1)^2 / 2],   (3.6.70)

which is equivalent to

-ln U ≥ (Y - 1)^2 / 2,   (3.6.71)

where Y is from exp(1). Since -ln U is also from exp(1), the last inequality can be written

V_2 ≥ (V_1 - 1)^2 / 2,   (3.6.72)

where both V_1 = Y and V_2 = -ln U are from exp(1).

Algorithm N-2
1  Generate V_1 and V_2 from exp(1).
2  If V_2 < (V_1 - 1)^2 / 2, go to step 1.
3  Generate U from U(0, 1).
4  If U ≥ 0.5, deliver Z = -V_1.
5  Deliver Z = V_1.
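Algorithm N-2 can be sketched as follows (function name is illustrative):

```python
import math
import random

def normal_ar():
    """Sketch of Algorithm N-2: |N(0, 1)| by rejection from exp(1)
    (acceptance rate about 0.76), followed by a random sign."""
    while True:
        v1 = -math.log(random.random())
        v2 = -math.log(random.random())
        if v2 >= (v1 - 1.0) ** 2 / 2.0:
            return -v1 if random.random() >= 0.5 else v1
```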
Remark  In order to obtain Algorithm N-2 we can represent f_X(x) as

f_X(x) = C h_{Y_1}(x)(1 - H_{Y_2}(T(x))),

where

h_{Y_1}(x) = h(x) = e^{-x},   H_{Y_2}(T(x)) = 1 - e^{-T(x)},   T(x) = (x - 1)^2 / 2,

and then apply Algorithm AR-3'.

Procedure N-3
In this procedure we make use of the logistic distribution [32]

h(x) = (1/θ) e^{-x/θ} (1 + e^{-x/θ})^{-2},   -∞ < x < ∞.   (3.6.73)

It is shown numerically in Ref. 32 that

θ = 0.626657,   1/C = 0.9196,   (3.6.74)

and

g(x) = 0.25 [1 + exp(-1.5957x)]^2 exp(-x^2/2 + 1.5957x).   (3.6.75)

Algorithm N-3 is as follows.

Algorithm N-3
1  Generate U_1 and U_2 from U(0, 1).
2  Y ← -0.626657 ln(1/U_1 - 1).
3  If U_2 ≤ g(Y), deliver Y.
4  Go to step 1.
Procedure N-4
This procedure is based on the relationship between the normal distribution, the chi-squared distribution, and a vector uniformly distributed on the n-dimensional unit sphere. Let Z_1, ..., Z_n be i.i.d. r.v.'s distributed N(0, 1) and let X = (∑_{i=1}^{n} Z_i^2)^{1/2}; then it can be shown by the multivariate transformation method that the vector

Y = (Y_1, ..., Y_n) = (Z_1/X, ..., Z_n/X)   (3.6.76)

is distributed uniformly on the n-dimensional unit sphere.* Now, taking into account that X^2 = ∑_{i=1}^{n} Z_i^2 has the chi-squared distribution with n degrees of freedom (see Section 3.6.8), the algorithm for generating from N(0, I), where I is a unit matrix of size n, is as follows.

Algorithm N-4
1  Generate a random vector Y = (Y_1, ..., Y_n) uniformly distributed on the n-dimensional unit sphere.
2  Generate a chi-square distributed random variate χ² with n degrees of freedom.
3  Z_k ← (χ²)^{1/2} Y_k,   k = 1, ..., n.
4  Deliver Z = (Z_1, ..., Z_n).

Since the efficiency of the algorithm for generating Y = (Y_1, ..., Y_n) (see example 5, Section 3.4.2) decreases as n increases, it would be interesting to find the optimal n in order to minimize the CPU time while sampling from N(0, I).

Procedure N-5
This procedure relies on the central limit theorem, which says that if X_i, i = 1, ..., n, are i.i.d. r.v.'s with E(X_i) = μ and var(X_i) = σ², then

Z = (∑_{i=1}^{n} X_i - nμ) / (n^{1/2} σ)   (3.6.77)

converges asymptotically with n to N(0, 1). Consider the particular case when all X_i, i = 1, ..., n, are from U(0, 1). We find that μ = 1/2 and σ² = 1/12, so that

Z = (∑_{i=1}^{n} U_i - n/2) / (n/12)^{1/2}.   (3.6.78)

A good approximation can already be obtained for n = 12. In this case

Z = ∑_{i=1}^{12} U_i - 6.   (3.6.79)

Algorithm N-5 is straightforward.

Algorithm N-5
1  Generate 12 uniformly distributed random variates U_1, ..., U_12 from U(0, 1).
2  Z ← ∑_{i=1}^{12} U_i - 6.
3  Deliver Z.

*An alternative algorithm for generating a vector uniformly distributed on the n-dimensional unit sphere is given in example 5, Section 4.3.2.
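Algorithm N-5 can be sketched in one line (function name is illustrative):

```python
import random

def normal_clt():
    """Sketch of Algorithm N-5: the sum of 12 uniforms minus 6 as an
    approximate N(0, 1) deviate."""
    return sum(random.random() for _ in range(12)) - 6.0
```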
Procedure N-6
Another approximation technique for generating from N(0, 1) is given in Tocher [33]; it makes use of the following approximation:

(2/π)^{1/2} e^{-x^2/2} ≈ 2k e^{-kx} (1 + e^{-kx})^{-2}   (3.6.80)

for x > 0 and k = (8/π)^{1/2}. The c.d.f. for the approximation is

F(x) = (1 - e^{-kx}) / (1 + e^{-kx}),   x > 0.

The inverse transformation is

X = (1/k) ln[(1 + U)/(1 - U)].   (3.6.81)

Attaching a random sign to this variate we obtain the desired variate.

Algorithm N-6
1  Generate U_1 and U_2 from U(0, 1).
2  X ← (π/8)^{1/2} ln[(1 + U_1)/(1 - U_1)].
3  If U_2 ≤ 0.5, deliver Z = -X.
4  Deliver Z = X.
3.6.5  Lognormal Distribution

Let X be from N(μ, σ²). Then Y = e^X has the lognormal distribution with p.d.f.

f_Y(y) = (1/yσ√(2π)) exp[-(ln y - μ)^2 / 2σ^2],   y > 0,   (3.6.82)

and 0 otherwise.

Algorithm LN-1
1  Generate Z from N(0, 1).
2  X ← μ + σZ.
3  Y ← e^X.
4  Deliver Y.
3.6.6  Cauchy Distribution

An r.v. X has a Cauchy distribution, denoted by C(α, β), if the p.d.f. is equal to

f_X(x) = β / {π[β² + (x - α)²]},   β > 0,   -∞ < x < ∞,   (3.6.83)

and the c.d.f. is equal to

F_X(x) = 1/2 + (1/π) tan^{-1}((x - α)/β).   (3.6.84)

Applying the inverse transform method, we obtain

X = α - β / tan(πU).

Algorithm C-1 describes the necessary steps.

Algorithm C-1
1  Generate U from U(0, 1).
2  X ← α - β / tan(πU).
3  Deliver X.
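Algorithm C-1 can be sketched as follows (function name is illustrative):

```python
import math
import random

def cauchy(alpha, beta):
    """Sketch of Algorithm C-1: inverse transform for C(alpha, beta)."""
    return alpha - beta / math.tan(math.pi * random.random())
```

Since the Cauchy distribution has no mean, a sanity check should look at the sample median rather than the sample mean.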
The next algorithm is based on the following two properties: (a) if Z_1 and Z_2 are independent variates from N(0, 1), then Y = Z_1/Z_2 is from C(0, 1); (b) if X is from C(0, 1), then Y = βX + α is from C(α, β). The last property can be obtained directly from the transformation formula (3.5.9).

Algorithm C-2
1  Generate Z_1 and Z_2 from N(0, 1).
2  X ← βZ_1/Z_2 + α.
3  Deliver X.

The third algorithm is based on the following property [18]: (c) if Y_1 and Y_2 are independent r.v.'s, both from U(-1/2, 1/2), and Y_1² + Y_2² ≤ 1/4, then X = Y_1/Y_2 is from C(0, 1).

Algorithm C-3
1  Generate U_1 and U_2 from U(0, 1).
2  Y_1 ← U_1 - 1/2 and Y_2 ← U_2 - 1/2.
3  If Y_1² + Y_2² > 1/4, go to step 1.
4  X ← βY_1/Y_2 + α.
5  Deliver X.

The efficiency of the algorithm is P(Y_1² + Y_2² ≤ 1/4) = π/4 ≈ 0.785, so the algorithm is relatively efficient.
3.6.7  Weibull Distribution

An r.v. has a Weibull distribution if the p.d.f. is equal to

f_X(x) = (α/β^{α}) x^{α-1} e^{-(x/β)^{α}},   x ≥ 0,   (3.6.86)

and 0 otherwise, and is denoted by W(α, β). To generate X by the inverse transform method note that

U = F_X(x) = 1 - e^{-(x/β)^{α}},   (3.6.87)

so

x = β(-ln(1 - U))^{1/α}.   (3.6.88)

Since 1 - U is also from U(0, 1), we have

x = β(-ln U)^{1/α},   (3.6.89)

or

(X/β)^{α} = -ln U.   (3.6.90)

Taking into account that -ln U is from exp(1), the algorithm for generating an r.v. from the Weibull distribution can be written as follows.

Algorithm W-1
1  Generate V from exp(1).
2  X ← βV^{1/α}.
3  Deliver X.
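Algorithm W-1 can be sketched as follows (function name is illustrative):

```python
import math
import random

def weibull(alpha, beta):
    """Sketch of Algorithm W-1: X = beta * V**(1/alpha) with V from exp(1)."""
    v = -math.log(random.random())
    return beta * v ** (1.0 / alpha)
```

For α = 1 the procedure reduces to inverse-transform sampling from exp(β).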
3.6.8  Chi-square Distribution

Let Z_1, ..., Z_k be from N(0, 1). Then

Y = ∑_{i=1}^{k} Z_i²   (3.6.91)

has the chi-square distribution with k degrees of freedom and is denoted χ²(k). Formula (3.6.91) says, "the sum of the squares of independent standard normal random variables has a chi-square distribution with degrees of freedom equal to the number of terms in the sum." One approach for generating a chi-square variate from χ²(k) is to generate k standard normal random variables and then apply (3.6.91). Another approach makes use of the fact that χ²(k) is a particular case of a gamma density with gamma parameters α and β equal, respectively, to k/2 and 2. Consider two cases.

CASE 1  If k is even, then Y can be computed as

Y = -2 ln ∏_{i=1}^{k/2} U_i.   (3.6.92)

Formula (3.6.92) requires k/2 uniform variates, compared to k in (3.6.91). It also requires one logarithmic transformation, compared to k logarithmic and k cosine or sine transformations for generating Z_i from N(0, 1), i = 1, ..., k (see (3.6.63) and (3.6.64)).
CASE 2  If k is odd, then

Y = -2 ln ∏_{i=1}^{(k-1)/2} U_i + Z²,   (3.6.93)

where Z is from N(0, 1) and the U_i are from U(0, 1). For k > 30 the normal approximation for chi-square variates can be used, based on the following formula [24]:

Z = √(2Y) - √(2k - 1).

Solving for Y, the chi-square variate, we obtain

Y = (Z + √(2k - 1))² / 2.   (3.6.94)
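Cases 1 and 2 together can be sketched as follows (function name is illustrative; the odd case draws its single normal deviate by the Box-Muller transform of Section 3.6.4):

```python
import math
import random

def chi_square(k):
    """Sketch of (3.6.92)/(3.6.93): chi-square(k) from k//2 uniforms via one
    logarithm, plus one squared normal deviate when k is odd."""
    y = -2.0 * math.log(math.prod(random.random() for _ in range(k // 2)))
    if k % 2 == 1:
        z = math.sqrt(-2.0 * math.log(random.random())) * math.cos(
            2.0 * math.pi * random.random())
        y += z * z
    return y
```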
Remark  Let Y_1, Y_2, and Y_3 be chi-square random variables with degrees of freedom 2(α + β), 2α, and 2β, respectively; then

X = Y_2 / (Y_2 + Y_3)

has a beta density with parameters α and β. Applying formula (3.6.92), we get

X = ln ∏_{i=1}^{α} U_i / ln ∏_{i=1}^{α+β} U_i.

3.6.9  Student's t Distribution

Let Z have a standard normal distribution, let Y have a chi-square distribution with k degrees of freedom, and let Z and Y be independent; then

X = Z / (Y/k)^{1/2}   (3.6.95)

has a Student's t distribution with k degrees of freedom. To generate X we simply generate Z as described in Section 3.6.4 and Y as described in Section 3.6.8 and apply (3.6.95). For k ≥ 30 the normal approximation can be used.

3.6.10  F Distribution
Let Y_1 be a chi-square random variable with k_1 degrees of freedom, let Y_2 be a chi-square random variable with k_2 degrees of freedom, and let Y_1 and Y_2 be independent. Then the random variable

X = (Y_1/k_1) / (Y_2/k_2)   (3.6.96)

is distributed as an F distribution with k_1 and k_2 degrees of freedom. To generate an F variate we first produce two chi-square variates and then use (3.6.96).

Remark 1.  If X has an F distribution with k_1 and k_2 degrees of freedom, then 1/X has an F distribution with k_2 and k_1 degrees of freedom.

Remark 2.  If X is an F-distributed random variable with k_1 and k_2 degrees of freedom, then

(k_1/k_2)X / [1 + (k_1/k_2)X]   (3.6.97)

has a beta density with parameters α = k_1/2 and β = k_2/2.
3.7  GENERATING FROM DISCRETE DISTRIBUTIONS

In this section we describe several procedures for generating stochastic variates from most of the well known discrete distributions. We start with the inverse transform method, which is generally easily implemented and widely used. Let X be a discrete r.v. with probability mass function (p.m.f.)

Pr(X = x_k) = P_k,   k = 0, 1, ...,   (3.7.1)

and with c.d.f.

g_k = Pr(X ≤ x_k) = ∑_{s=0}^{k} P_s.   (3.7.2)

Then

g_{k-1} < U ≤ g_k,   (3.7.3)

where U is from U(0, 1). Thus

X = min{x_k : g_{k-1} < U ≤ g_k}.   (3.7.4)

Algorithm IT-2, which is called the inverse transform algorithm, describes generating discrete r.v.'s. This algorithm is based on logical comparison of U with the g_k's and is as follows.

Algorithm IT-2
1  C ← P_0.
2  B ← C.
3  K ← 0.
4  Generate U from U(0, 1).
5  If U ≤ B (that is, U ≤ g_K), deliver X = x_K.
6  K ← K + 1.
7  C ← A_K C   (P_K = A_K P_{K-1}).
8  B ← B + C   (g_K = g_{K-1} + P_K).
9  Go to step 5.

Here P_0 and A_{k+1} = P_{k+1}/P_k are distribution dependent. The recurrence formulas

P_{k+1} = A_{k+1} P_k   (3.7.5)

and

g_{k+1} = g_k + P_{k+1}   (3.7.6)

in steps 7 and 8 are straightforward to calculate. Most discrete r.v.'s are nonnegative integer valued, that is, x_k = k, k = 0, 1, .... In what follows we consider only these r.v.'s. It is easy to see that the mean number of trials

C = 1 + ∑_{k=1}^{∞} k P_k = 1 + E(X)   (3.7.7)

is equal to the expected value plus one additional trial. Table 3.7.1 represents the values of P_0, A_{k+1}, and C for the most well known discrete distributions. In order to generate an r.v. from a specified discrete distribution, we take the corresponding values of P_0 and A_{k+1} from Table 3.7.1 and then run Algorithm IT-2.
In many cases we can improve the efficiency of the inverse transform method IT-2 by starting the search for X at k = m, m being an interior point (for example, the mode or the median), rather than at k = 0. We assume that tables of P_k and g_k are available. The procedure is as follows. If U ≥ g_m, the search proceeds upward through

g_{m+i} = g_{m+i-1} + P_{m+i},   (3.7.8)

P_{m+i} = P_{m+i-1} A^+_{m+i},   i = 1, 2, ...,   (3.7.9)

and, if U < g_m, it proceeds downward through the analogous recurrences.   (3.7.10)
+
L
-PI+
4
c.
-.
-.C 4
I
4
9,
L
4
2 L
N
I C W
h
h
4
s
Y
97
98
RANDOM VARIATE GENfXATION
where A^+_{k+1} and A^-_{k-1} are distribution dependent and their values are available to compute. Algorithm IT-3 describes the necessary steps.

Algorithm IT-3
1   D ← g_m.
2   E ← P_m.
3   Generate U from U(0, 1).
4   K ← m.
5   If U > g_m, go to step 12.
6   D ← D - E   (g_{K-1} = g_K - P_K).
7   If U > D, deliver X = K; go to step 1.
8   K ← K - 1.
9   If K = 0, deliver X = K; go to step 1.
10  E ← E A^-_{K}   (P_{K-1} = A^-_{K-1} P_K).
11  Go to step 6.
12  K ← K + 1.
13  E ← E A^+_{K}   (P_K = A^+_{K} P_{K-1}).
14  D ← D + E.
15  If U ≤ D, deliver X = K.
16  Go to step 12.
Table 3.7.2 represents the values of P_0, m (mode), A^+_{k+1}, and A^-_{k-1} for the most well known discrete distributions. It is easy to see that for an integer m the number of trials (the number of logical comparisons of U with the g_k's) is the following r.v.:

N = 2 + (m - X)   if X = 0, 1, ..., m,
N = 1 + (X - m)   if X = m + 1, m + 2, ....   (3.7.11)

Hence the mean number of trials is

C = ∑_{x=0}^{m} [2 + (m - x)]P_x + ∑_{x=m+1}^{∞} [1 + (x - m)]P_x   (3.7.12)

  = 1 + E(X) - γ(m),   (3.7.13)
Table 3.7.2  Discrete Unimodal Distributions

Binomial
  P_x = C(n, x) p^x (1 - p)^{n-x},   x = 0, 1, ..., n,   0 < p < 1
  P_0 = (1 - p)^n,   modal value m = [(n + 1)p]
  A^+_{k+1} = p(n - k) / [(k + 1)(1 - p)],   A^-_{k} = k(1 - p) / [(n - k + 1)p]

Poisson
  P_x = e^{-λ} λ^x / x!,   x = 0, 1, ...,   λ > 0
  P_0 = e^{-λ},   modal value m = [λ]
  A^+_{k+1} = λ / (k + 1),   A^-_{k} = k / λ

Negative binomial
  P_x = C(x + r - 1, x) p^r (1 - p)^x,   x = 0, 1, ...,   0 < p < 1
  P_0 = p^r
  A^+_{k+1} = (r + k)(1 - p) / (k + 1),   A^-_{k} = k / [(r + k - 1)(1 - p)]

Hypergeometric
  support: max(0, n_1 + m - n) ≤ x ≤ min(n_1, m)
where

γ(m) = 2 ∑_{x=0}^{m} x P_x - g_m + m - 2m g_m.   (3.7.14)
It follows from (3.7.7) and (3.7.13) that Algorithm IT-3 is more efficient than Algorithm IT-2 for m such that y( m ) > 0. However, y ( m ) is not necessarily positive for each m. The following example illustrates this point.
Example 1  Assume that the r.v. X has the following p.m.f.:

P_0 = 1/4,  P_1 = 1/2,  P_2 = 1/4,  P_x = 0 otherwise.

Let m = 1; then

γ(1) = 2 Σ_{x=0}^1 x P_x − g_1 + 1 − 2g_1 = 1 − 3/4 + 1 − 3/2 = −0.25 < 0,

and therefore Algorithm IT-2 is more efficient than Algorithm IT-3. Nevertheless, in many cases it is possible to choose the starting point m in such a way that γ(m) > 0, and therefore it is possible for IT-3 to be more efficient than IT-2.

Lemma 3.7.1
If there exists m > 0 such that

P_0 ≤ Σ_{x=1}^m (2x − 1) P_x  and  g_m ≤ 1/2,  (3.7.15)

then γ(m) > 0.

Proof  Condition P_0 ≤ Σ_{x=1}^m (2x − 1) P_x is equivalent to

2 Σ_{x=0}^m x P_x − g_m > 0,  (3.7.16)

and, correspondingly, condition g_m ≤ 1/2, m > 0, is equivalent to

m − 2m g_m > 0.  (3.7.17)

Both (3.7.16) and (3.7.17) yield γ(m) > 0.  Q.E.D.

Note 1  We can see that Lemma 3.7.1 is valid if P_0 ≤ Σ_{x=1}^m P_x.
GENERATING FROM DISCRETE DISTRIBUTIONS
This condition is not restrictive and holds for practically all discrete distributions.

Lemma 3.7.2  γ(m) achieves its maximum at the point m_0 or m_0 + 1, where m_0 = max{m : g_m ≤ 1/2}, depending on whether g_{m_0} + g_{m_0+1} ≤ 1 or g_{m_0} + g_{m_0+1} > 1.

Proof  It is straightforward to obtain from (3.7.14) that

Δγ(m) = γ(m + 1) − γ(m) = 1 − g_m − g_{m+1}.  (3.7.18)

For m < m_0 we have g_m + g_{m+1} ≤ 1, and therefore Δγ(m) ≥ 0; for m > m_0 we have, correspondingly, g_m + g_{m+1} > 1 and Δγ(m) < 0. Therefore γ(m) is a unimodal function with the maximum at m_0 or m_0 + 1, depending on whether g_{m_0} + g_{m_0+1} ≤ 1 or g_{m_0} + g_{m_0+1} > 1.  Q.E.D.

Note 2  In other words, Lemma 3.7.2 says that γ(m) achieves its maximum at the median or at a point neighboring the median on the left. As a corollary from these two lemmas we obtain the following theorem.
Theorem 3.7.1  The optimal starting point in Algorithm IT-3 is either the median m_0 = max{m : g_m ≤ 1/2}, if P_0 ≤ Σ_{x=1}^{m_0} (2x − 1) P_x and g_{m_0} + g_{m_0+1} ≤ 1, or m_0 + 1, if P_0 ≤ Σ_{x=1}^{m_0+1} (2x − 1) P_x and g_{m_0} + g_{m_0+1} > 1.

Note 3  Theorem 3.7.1 is valid not only for integer nonnegative valued r.v.'s, but for any discrete r.v. with values x_0, x_1, ..., since Algorithm IT-3 is determined not by the sequence x_0, x_1, ..., but by its indices 0, 1, ....

In the rest of this chapter we consider some alternative procedures for generating discrete r.v.'s. Generally, procedures for generating discrete variates are simpler than procedures for generating continuous variates, and we describe them only briefly.

3.7.1  Binomial Distribution
An r.v. X has a binomial distribution if the p.m.f. is equal to

P_x = (n choose x) p^x (1 − p)^{n−x},  x = 0, 1, ..., n,  (3.7.19)

and is denoted by B(n, p). Here 0 < p < 1 is the probability of success in a single trial, and n is the number of trials. To apply the inverse transform method IT-2 we must check the following condition after step 5: if K = n, terminate the procedure with X = K = n.
It is also worthwhile to note that, if Y is from B(n, p), then n − Y is from B(n, 1 − p). Hence for purposes of efficiency we generate X from B(n, p) according to

X = Y,  Y from B(n, p),  if p ≤ 1/2;
X = n − Y,  Y from B(n, 1 − p),  if p > 1/2.  (3.7.20)

For larger n the inverse transform procedure becomes time consuming, and we can consider the normal distribution as an approximation to the binomial. As n increases, the distribution of

Z = (X − np) / [np(1 − p)]^{1/2}  (3.7.21)

approaches N(0, 1). To obtain a binomial variate we generate Z from N(0, 1), solve (3.7.21) with respect to X, and round to a nonnegative integer, that is,

X = max(0, [−0.5 + np + Z(np(1 − p))^{1/2}]),  (3.7.22)

where [a] denotes the integer part of a. We should consider replacing the binomial with the approximate normal when np > 10 for p ≥ 1/2 and n(1 − p) > 10 for p < 1/2.

It is shown [22] that, if m is the mode, then for large n the mean number of trials in Algorithm IT-3 is equal to

E[N(X)] ≈ [2np(1 − p)/π]^{1/2} + 3/2.  (3.7.23)

Comparing Algorithms IT-2 and IT-3 (compare (3.7.7) with (3.7.23)), we see that for large n the mean number of trials is proportional to np and [np(1 − p)]^{1/2}, respectively. So for large n Algorithm IT-3 is essentially more efficient than Algorithm IT-2. The acceptance-rejection method can also successfully be implemented for generating from B(n, p) (see Ahrens and Dieter [4] and Marsaglia [20]). Descriptions of algorithms for this and their efficiency can be found in Fishman's monograph [12].
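A minimal sketch of the inverse transform search with the reflection (3.7.20) in Python; `binomial_inverse` is a hypothetical name, and the probabilities are updated through the ratio P_{k+1}/P_k = [(n − k)/(k + 1)][p/(1 − p)] rather than recomputed:

```python
import random

def binomial_inverse(n, p, u=None):
    """Generate X from B(n, p) by the inverse transform method (IT-2).

    For p > 1/2 we generate Y from B(n, 1 - p) and deliver X = n - Y,
    as in (3.7.20), so the expected search length stays small.
    """
    if u is None:
        u = random.random()
    flip = p > 0.5
    q = 1 - p if flip else p
    pk = (1 - q) ** n            # P_0
    g = pk                       # running c.d.f. g_k
    k = 0
    while u > g and k < n:
        pk *= (n - k) / (k + 1) * q / (1 - q)   # P_{k+1} from P_k
        k += 1
        g += pk
    return n - k if flip else k
```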
3.7.2  Poisson Distribution

An r.v. X has a Poisson distribution if the p.m.f. is equal to

P_x = λ^x e^{−λ} / x!,  x = 0, 1, ...;  λ > 0,  (3.7.24)

and is denoted by P(λ).
It is well known (Feller [11]) that, if the time intervals between events are from exp(1/λ), then the number of events occurring in a unit interval of time is from P(λ). Mathematically, it can be written

Σ_{i=0}^{X−1} T_i ≤ 1 ≤ Σ_{i=0}^X T_i,  (3.7.25)

where the T_i, i = 0, 1, ..., X, are from exp(1/λ). Since T_i = −(1/λ) ln U_i, the last formula can be written as

−(1/λ) Σ_{i=0}^{X−1} ln U_i ≤ 1 ≤ −(1/λ) Σ_{i=0}^X ln U_i,  X = 0, 1, ...,  (3.7.26)

or

Π_{i=0}^{X−1} U_i ≥ e^{−λ} ≥ Π_{i=0}^X U_i.  (3.7.27)

The following algorithm is written with respect to (3.7.27):

1 A ← 1.
2 K ← 0.
3 Generate U_K from U(0, 1).
4 A ← U_K A.
5 If A < e^{−λ}, deliver X = K.
6 K ← K + 1.
7 Go to step 3.

For large λ (λ > 10) we can approximate the Poisson distribution by the normal distribution. As λ increases, the distribution of

Z = (X − λ) / λ^{1/2}  (3.7.28)

approaches N(0, 1). To obtain a Poisson variate we generate Z from N(0, 1); then by analogy with (3.7.22) we obtain

X = max(0, [λ + Zλ^{1/2} − 0.5]),  (3.7.29)

where [a] is the integer part of a. It is shown in Ref. 22 that, if m is the mode, then for large λ the mean execution time in Algorithm IT-3 is similar to (3.7.23) and is equal to

E[N(X)] ≈ (2λ/π)^{1/2} + 3/2.  (3.7.30)
The mean numbers of trials in Algorithms IT-2 and IT-3 are proportional, respectively, to λ and λ^{1/2}, and therefore Algorithm IT-3 is again essentially more efficient than Algorithm IT-2.
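The multiplication form (3.7.27) of the algorithm translates directly into code; `poisson_multiplication` is a hypothetical name for this sketch:

```python
import math
import random

def poisson_multiplication(lam, rng=random.random):
    """Generate X from P(lam): multiply uniforms until the running
    product falls below exp(-lam); the number of complete factors
    before that happens is the Poisson variate."""
    threshold = math.exp(-lam)
    a, k = 1.0, 0
    while True:
        a *= rng()
        if a < threshold:
            return k
        k += 1
```

Note that the expected number of uniforms per variate grows linearly with λ, which is why the normal approximation (3.7.29) is preferred for large λ.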
3.7.3  Geometric Distribution

An r.v. has the geometric distribution if the p.m.f. is equal to

P_x = p(1 − p)^x,  x = 0, 1, ...,  0 < p < 1,  (3.7.31)

and is denoted by Ge(p). The geometric distribution describes the number of failures before the first success in a series of Bernoulli trials. The following procedure for generating from Ge(p) is based on the relationship between the exponential and geometric distributions. Let Y be from exp(β); then X = [Y], the integer part of Y, has p.m.f.

P_x = P(x ≤ Y < x + 1) = e^{−x/β}(1 − e^{−1/β}),  x = 0, 1, ...,  (3.7.32)

which is Ge(p = 1 − e^{−1/β}). For β = −1/ln(1 − p), (3.7.32) is identical to (3.7.31). Therefore

X = [βV] = [ln U / ln(1 − p)],  (3.7.33)

where V = −ln U is a standard exponential variate; that is, X is from Ge(p). Hence to generate an r.v. from Ge(p) we generate an r.v. from the exponential distribution with β = −1/ln(1 − p) and take the integer part. The CPU time for this procedure is constant, whereas the CPU time for the inverse transform method is proportional to 1/p. However, because this procedure requires generation from the exponential distribution and rounding, it is more efficient than Algorithm IT-2 only for p < 0.25.
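In code the whole procedure is one line; the helper below (a hypothetical name) returns the integer part of an exponential variate with β = −1/ln(1 − p), exactly as in (3.7.33):

```python
import math
import random

def geometric(p, u=None):
    """Generate X from Ge(p) in constant time via (3.7.33):
    X = [ln U / ln(1 - p)], the number of failures before the
    first success in Bernoulli trials with success probability p."""
    if u is None:
        u = random.random()
    return int(math.log(u) / math.log(1.0 - p))
```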
3.7.4  Negative Binomial Distribution

The p.m.f. for the negative binomial distribution is

P_x = (x + r − 1 choose x) p^r (1 − p)^x,  x = 0, 1, ...;  r > 0,  0 < p < 1,  (3.7.34)

and is denoted by NB(r, p). When r is an integer the distribution is called the Pascal distribution, which describes the number of failures occurring before the rth success in a series of Bernoulli trials. This implies that the geometric distribution is a special case of the Pascal distribution with r = 1.
The following algorithm describes generating from the Pascal distribution with parameters r and p, denoted PS(r, p).

1 X ← 0.
2 Y ← 0.
3 Generate U from U(0, 1).
4 If U > p, go to step 8.
5 Y ← Y + 1.
6 If Y = r, deliver X.
7 Go to step 3.
8 X ← X + 1.
9 Go to step 3.

An alternative procedure is based on the reproductive property of the negative binomial distribution, analogous to that for the gamma distribution. Let X_i, i = 1, ..., n, denote a sequence of i.i.d. r.v.'s from NB(r_i, p). Then X = Σ_{i=1}^n X_i is from NB(r, p), where r = Σ_{i=1}^n r_i. Suppose that r_i = 1, i = 1, ..., r, which means that X_i, i = 1, ..., r, are from Ge(p); then X = Σ_{i=1}^r X_i is from NB(r, p). The algorithm is straightforward and contains the following steps:

1 Generate X_1, ..., X_r from Ge(p).
2 X ← Σ_{i=1}^r X_i.
3 Deliver X.

This procedure is more efficient than the inverse transform method IT-2 for p > 0.75. Another possible method for generating an r.v. from NB(r, p) makes use of the following relationship (see Johnson and Kotz [18, p. 127]):
Pr(X ≤ k) = Pr(Y ≥ r),  (3.7.35)

where X is from NB(r, p) and Y is from B(r + k, p). The reader is asked to describe an algorithm based on (3.7.35), assuming that an r.v. Y from B(r + k, p) is given.

The next procedure is based on the relationship of the negative binomial distribution with the gamma and Poisson distributions. Suppose we have a mixture of Poisson distributions, such that the parameter λ of the Poisson distributions

P(X = x | λ) = λ^x e^{−λ} / x!,  x = 0, 1, ...,
varies according to G(α, β), that is,

f_λ(λ) = λ^{α−1} e^{−λ/β} / [Γ(α) β^α],  λ ≥ 0,  α > 0,  β > 0.  (3.7.36)

Then

P(X = x) = ∫_0^∞ P(X = x | λ) f_λ(λ) dλ = (α + x − 1 choose x) [β/(β + 1)]^x [1/(β + 1)]^α.  (3.7.37)

So X is from NB(α, 1/(β + 1)). It is obvious that, when λ is from G(r, (1 − p)/p), (3.7.37) is identical to (3.7.34). The algorithm is as follows:

1 Generate an r.v. λ from G(r, (1 − p)/p).
2 Generate X from P(λ).
3 Deliver X.

It is not difficult to see that an alternative algorithm for generating an r.v. from NB(r, p) is the following:

1 Generate λ from G(r, 1).
2 Generate X from P(λ(1 − p)/p).
3 Deliver X.
3.7.5  Hypergeometric Distribution

An r.v. X has a hypergeometric distribution if the p.m.f. is equal to

P_x = (n_1 choose x)(n − n_1 choose m − x) / (n choose m),  max(0, n_1 + m − n) ≤ x ≤ min(n_1, m),  (3.7.38)

and is denoted H(n, m, n_1). The hypergeometric distribution describes sampling without replacement from a finite population. It has three parameters, n, m, and n_1, which have the following meanings: n, the size of the total population, which is divided into two classes; m, the size of the sample (m < n) that is taken from the total population n without replacement; and n_1, the size of the
population in the first class (n − n_1 is the size of the population in the second class). Generation from H(n, m, n_1) involves simulating a sampling experiment without replacement, which is merely a Bernoulli trials method of generating from B(n, p) with n and p varying, depending, respectively, on the total number of elements that have been previously drawn from the total population and the number of first-class elements that have been drawn. The original value n = n_0 is reduced according to the formula

n_i = n_{i−1} − 1,  i = 1, ..., m,  (3.7.39)

when an element in a sample of m is drawn. Similarly, the value p = p_0 = n_1/n, when the ith element in the sample is drawn, becomes

p_i = (n_{i−1} p_{i−1} − δ) / (n_{i−1} − 1),  i = 1, ..., m,  (3.7.40)

where δ = 1 when the ith sample element belongs to the first class, and δ = 0 when it belongs to the second class.
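The updating rules (3.7.39) and (3.7.40) amount to the following sketch (`hypergeometric` is a hypothetical name; the running quotient `first_class / remaining` is exactly p_i):

```python
import random

def hypergeometric(n, m, n1, rng=random.random):
    """Generate X from H(n, m, n1) by simulating sampling without
    replacement: m Bernoulli trials whose success probability is
    updated as elements are removed from the population."""
    remaining, first_class, x = n, n1, 0
    for _ in range(m):
        if rng() < first_class / remaining:   # p_i, per (3.7.40)
            x += 1
            first_class -= 1
        remaining -= 1                        # n_i, per (3.7.39)
    return x
```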
EXERCISES

1 Describe an algorithm for generating from the Laplace (double exponential) distribution

f_X(x) = (1/(2β)) e^{−|x|/β},  β > 0,  −∞ < x < ∞,

using the inverse transform method.
2 Apply the inverse transform method for generating from the extreme value distribution
3 Describe an algorithm for generating from logistic distribution
4 Consider the triangular random variable with the density function

f_X(x) = 0,  if x < 2a or x ≥ 2b;
f_X(x) = (x − 2a) / (b − a)^2,  if 2a ≤ x < a + b;
f_X(x) = (2b − x) / (b − a)^2,  if a + b ≤ x < 2b,

and the distribution function

F_X(x) = 0,  if x < 2a;
F_X(x) = (x − 2a)^2 / [2(b − a)^2],  if 2a ≤ x < a + b;
F_X(x) = 1 − (2b − x)^2 / [2(b − a)^2],  if a + b ≤ x < 2b;
F_X(x) = 1,  if x ≥ 2b.

This random variable can be considered as a sum of two independent random variables uniformly distributed between a and b. Show that, applying the inverse transform method, we obtain

X = 2a + (b − a)[2U]^{1/2},  if 0 ≤ U ≤ 0.5;
X = 2b + (a − b)[2(1 − U)]^{1/2},  if 0.5 < U ≤ 1.
5 Let

f_X(x) = c_i x,  if x_{i−1} ≤ x ≤ x_i, i = 1, ..., n;  f_X(x) = 0 otherwise,

where c_i ≥ 0, x_0 = a, x_n = b, a ≥ 0. Using the inverse transform method, prove that

X = [x_{i−1}^2 + 2(U − F_{i−1})/c_i]^{1/2},  F_{i−1} ≤ U < F_i,

where F_i = Σ_{k=1}^i ∫_{x_{k−1}}^{x_k} c_k x dx and F_0 = 0. Describe an algorithm for generating from f_X(x).
6 Let X_1, ..., X_n be i.i.d. r.v.'s from exp(λ).
(a) Show that Y_n = min(X_1, ..., X_n) is distributed exp(λ/n).
(b) Describe an algorithm for generating from Y_n.
7 Let U_1, ..., U_{α+β−1} be i.i.d. r.v.'s from U(0, 1). Prove that the αth order statistic U_{(α)} is from Be(α, β).
8 The joint density of the r.v.'s X and Y is of the form f(u^2 + v^2) for all u and v. Show that their ratio X/Y has a Cauchy density.
9 Describe two alternative algorithms, correspondingly, for Examples 4 and 5 of Section 3.4 by making use of Theorem 3.4.2.
10 Describe algorithms for generating from the following p.d.f.'s:
(a) f_{X,Y}(x, y) = c e^{−(x+y)},  x ≥ 0, y ≥ 0.
(b) f_{X,Y}(x, y) = c x e^{−xy},  0 ≤ x ≤ 2, y ≥ 0.
(c) For generating from N(μ, Σ), where

μ = (μ_1, μ_2, μ_3) = (1, 2, 3)  and  Σ = [1 1 0; 1 2 0; 0 0 3].
11 Let Y_1 and Y_2 be i.i.d. r.v.'s from U(−1/2, 1/2). Prove that, if Y_1^2 + Y_2^2 ≤ 1/4, then Y_1/Y_2 is from C(0, 1).
12 Let V_1, ..., V_n be i.i.d. r.v.'s from exp(1) and let X = Σ_{i=1}^n V_i. Prove that the vector

Y = (Y_1, ..., Y_n) = (V_1/X, ..., V_n/X)

is distributed uniformly on the simplex Σ_{i=1}^n y_i = 1, 0 < y_i < 1, i = 1, ..., n.
13 Let Z_1, ..., Z_n be i.i.d. r.v.'s distributed N(0, 1) and let X = (Σ_{i=1}^n Z_i^2)^{1/2}. Prove that the vector

Y = (Y_1, ..., Y_n) = (Z_1/X, ..., Z_n/X)

is distributed uniformly on the sphere Σ_{i=1}^n y_i^2 = 1.
14 Let Y_{(1)}, ..., Y_{(n)} be order statistics from U(0, 1). Prove that the vector X = (X_1, ..., X_n), where

X_1 = Y_{(1)},  X_2 = Y_{(2)} − Y_{(1)},  ...,  X_n = Y_{(n)} − Y_{(n−1)},

is distributed uniformly inside the simplex Σ_{i=1}^n x_i ≤ 1, x_i > 0.
15 Consider the p.d.f.
Let
Using (3.4.10), prove that the maximum efficiency is achieved when β = 1.
16 Describe an algorithm for generating from Be(α, β), making use of the inequality

x^{α−1}(1 − x)^{β−1} ≤ x^{α−1} + (1 − x)^{β−1},  0 ≤ x ≤ 1,  α > 0,  β > 0,

and assuming

h(x) = [αβ/(α + β)] [x^{α−1} + (1 − x)^{β−1}],
g(x) = x^{α−1}(1 − x)^{β−1} / [x^{α−1} + (1 − x)^{β−1}].
Compare the efficiency of this procedure with the efficiency of Jöhnk's procedure, Be-4.
17 Describe an algorithm for generating from G(α, 1) by the acceptance-rejection method AR-1, assuming

h(x) = p E(β, m) + (1 − p) E(β, m + 1),

that is, h(x) is a mixture of two Erlang distributions, where m = [α] and p = α/m.
18 Prove that Procedure Be-6 is more efficient than Procedure Be-5 for α ≥ 2, β ≥ 2.
19 Describe an acceptance-rejection algorithm for generating an r.v. from N(0, 1), representing f_X(x) = C g(x) h(x, β) and assuming that

h(x, β) = β / [π(β^2 + x^2)],  −∞ < x < ∞.

Verify that the optimal β = 1, the efficiency 1/C = e^{1/2}/(2π)^{1/2} = 0.6578, and

g(x) = 0.8243(1 + x^2) e^{−x^2/2}.

From Tadikamalla and Johnson [32].
20 Describe an algorithm for generating from the truncated Erlang distribution
g(x)==0.8243(1+ x Z ) e - X 2 / 2 . From Tadikamalla and Johnson [32]. 20 Describe an algorithm for generating from truncated Erlang distribution
fx(x)
c
m -- te
-x / B
@"(m
- I)!
'
x > l,B>O,m=1,2
and find c.
21 Prove that, if f_X(x) can be represented as f_X(x) = C h(x) g[F_h(x)], then Algorithm AR-3 can be rewritten as AR-3′.
22 The p.m.f. for the uniform discrete distribution is

P_x = 1/(b − a + 1),  x = a, a + 1, ..., b,

where b and a are integers and b > a. Prove that X = [a + (b − a + 1)U] has the desired distribution, and describe an algorithm for generating an r.v. from P_x. Here [a] is the integer part of a.
23 Let Y be from the Bernoulli distribution, that is,

P_y = p^y (1 − p)^{1−y},  y = 0, 1,  0 < p < 1.

Prove that, if Y_1, ..., Y_n are i.i.d. r.v.'s from the Bernoulli distribution, then X = Σ_{i=1}^n Y_i is from B(n, p). Describe an algorithm for generating an r.v. from B(n, p), using the above result. For purposes of efficiency use the fact that, if X is from B(n, p), then n − X is from B(n, 1 − p).
24 Prove (3.7.25), that is, if the time intervals between events are from exp(1/λ), then the number of events occurring in a unit interval of time is from P(λ).
25 Prove that y = x^{α−1}(1 − x)^{β−1} is a concave function on (0, 1] and has a maximum equal to

(α − 1)^{α−1} (β − 1)^{β−1} / (α + β − 2)^{α+β−2}.

26 Let X and X_1 be i.i.d. r.v.'s and let Y = aX + (1 − a)X_1, where 0 ≤ a ≤ 1. Prove that the correlation coefficient

ρ_{X,Y} = a / [a^2 + (1 − a)^2]^{1/2}.

Describe an algorithm for generating a pair of r.v.'s (X, Y) for which ρ_{X,Y} = ρ.
27 Prove Theorems 3.4.2 and 3.4.3.
28 By analogy with Theorem 3.4.2 formulate a theorem that is a multidimensional version of Algorithm AR-1, and prove it.
29 Let X = (X_1, ..., X_n) be uniformly distributed inside an n-dimensional unit sphere. Prove that the vector Y = CW is uniformly distributed inside the ellipsoid Y^T Σ^{−1} Y ≤ K^2, where Σ is a symmetric and positive definite (n × n) matrix and C is the lower triangular matrix (3.5.13) such that Σ = CC′. Hint: use the fact that the vector W = (W_1, ..., W_n) = KX is uniformly distributed inside the n-dimensional sphere

W^T W = W_1^2 + W_2^2 + ··· + W_n^2 ≤ K^2

with radius K.
REFERENCES

1 Ahrens, J. H. and U. Dieter, Computer methods for sampling from the exponential and normal distributions, Comm. Assoc. Comp. Mach., 15, 1972, 873-882.
2 Ahrens, J. H. and U. Dieter, Extensions of Forsythe's method for random sampling from the normal distribution, Math. Comp., 27, 1973, 927-937.
3 Ahrens, J. H. and U. Dieter, Computer methods for sampling from gamma, beta, Poisson and binomial distributions, Computing, 12, 1974, 223-246.
4 Ahrens, J. H. and U. Dieter, Non-Uniform Random Numbers, Institut für Mathematische Statistik, Technische Hochschule in Graz, Austria, 1974.
5 Anderson, T. W., An Introduction to Multivariate Statistical Analysis, Wiley, New York, 1958.
6 Box, G. E. P. and M. E. Muller, A note on the generation of random normal deviates, Ann. Math. Stat., 29, 1958, 610-611.
7 Butler, J. W., Machine sampling from given probability distributions, in Symposium on Monte Carlo Methods, edited by M. A. Meyer, Wiley, New York, 1956.
8 Cheng, R. C. H., Generating Beta variates with non-integral shape parameters, Comm. Assoc. Comp. Mach., 21, 1978, 317-322.
9 Cheng, R. C. H., The generation of gamma variables, Appl. Stat., 26, 1977, 71-75.
10 Ermakov, S. M., Monte Carlo Method and Related Questions, Nauka, Moscow, 1976 (in Russian).
11 Feller, W., An Introduction to Probability Theory and Its Applications, Wiley, New York, 1950.
12 Fishman, G. S., Principles of Discrete Event Simulation, Wiley, New York, 1978.
13 Fishman, G. S., Sampling from the gamma distribution on a computer, Comm. Assoc. Comp. Mach., 19, 1976, 407-409.
14 Fishman, G. S., Sampling from the Poisson distribution on a computer, Computing, 17, 1976, 147-156.
15 Forsythe, G. E., Von Neumann's comparison method for random sampling from the normal and other distributions, Math. Comp., 26, 1972, 817-826.
16 Jöhnk, M. D., Erzeugung von betaverteilten und gammaverteilten Zufallszahlen, Metrika, 8, 1964, 5-15.
17 Johnson, N. L. and S. Kotz, Discrete Distributions, Houghton-Mifflin, 1969.
18 Johnson, N. L. and S. Kotz, Continuous Univariate Distributions, Vols. 1 and 2, Houghton-Mifflin, 1970.
19 Marsaglia, G., Generating exponential random variables, Ann. Math. Stat., 32, 1961, 899-900.
20 Marsaglia, G., Generating discrete random variables in a computer, Comm. Assoc. Comp. Mach., 6, 1963, 37-38.
21 Marsaglia, G., M. D. MacLaren, and T. A. Bray, A fast procedure for generating normal random variables, Comm. Assoc. Comp. Mach., 7, 1964.
22 Michailov, S. A., Some Problems in the Theory of the Monte Carlo Methods, Nauka, Novosibirsk, U.S.S.R., 1974 (in Russian).
23 Mood, A. M., F. A. Graybill, and D. C. Boes, Introduction to the Theory of Statistics, 3rd ed., McGraw-Hill, New York, 1974.
24 Naylor, T. H. et al., Computer Simulation Techniques, Wiley, New York, 1966.
25 Neuts, M., Probability, Allyn and Bacon, 1972.
26 Phillips, D. T. and C. Beightler, Procedures for generating gamma variates with non-integer parameter sets, J. Stat. Comp. Simulation, 1972, 197-208.
27 Relles, D., A simple algorithm for generating binomial random variables, J. Amer. Stat. Assoc., 67, 1972, 612-613.
28 Robinson, D. W. and P. A. W. Lewis, Generating gamma and Cauchy random variables: An extension to the Naval Postgraduate School random number package, 1975.
29 Sobol, I. M., Computational Methods of Monte Carlo, Nauka, Moscow, 1973 (in Russian).
30 Tadikamalla, P. R., Computer generation of gamma random variables, Comm. Assoc. Comp. Mach., 21, 1978, 419-422.
31 Tadikamalla, P. R., Computer generation of gamma random variables, II, Comm. Assoc. Comp. Mach., 21, 1978, 925-928.
32 Tadikamalla, P. R. and M. E. Johnson, Simple rejection methods for sampling from the normal distribution, in Proceedings of the First International Conference on Mathematical Modeling, X. J. Avula, Ed., St. Louis, Missouri, 1977, 573-577.
33 Tocher, K. D., The Art of Simulation, Van Nostrand, Princeton, New Jersey, 1963.
34 Von Neumann, J., Various techniques used in connection with random digits, U.S. Nat. Bur. Stand. Appl. Math. Ser., No. 12, pp. 36-38, 1951.
35 Wallace, N. D., Computer generation of gamma random variates with non-integral shape parameters, Comm. Assoc. Comp. Mach., 17, 1974, 691-695.
36 Walker, A. J., An efficient method for generating discrete random variables with general distributions, Trans. Math. Software, 3, No. 3, September 1977, 253-257.
37 Whittaker, J., Generating gamma and beta random variables with non-integral shape parameters, Appl. Stat., 23, 1974, 210-214.
38 Wilde, D. J., Optimum Seeking Methods, Prentice-Hall, Englewood Cliffs, New Jersey, 1964.
39 Yakowitz, S. J., Computational Probability and Simulation, Addison-Wesley, Reading, Massachusetts, 1977.
CHAPTER 4

Monte Carlo Integration and Variance Reduction Techniques

4.1 INTRODUCTION
The importance of good numerical integration schemes is evident. There are many deterministic quadrature formulas for computation of ordinary integrals with well behaved integrands. The Monte Carlo method is not competitive in this case. But if the function fails to be regular (i.e., to have continuous derivatives of moderate order), numerical analytic techniques such as the trapezoidal and Simpson's rules become less attractive. Especially in the case of multidimensional integrals, application of such rules (formulas) runs into severe difficulties. It is often more convenient to compute such integrals by a Monte Carlo method, which, although less accurate than conventional quadrature formulas, is much simpler to use. It is shown that each integral can be represented as an expected value (parameter), and the problem of estimating an integral by the Monte Carlo method is equivalent to the problem of estimating an unknown parameter. For convenience we use the expression "estimating the integral" rather than "estimating the unknown parameter." In Section 4.3.12 we consider several practical examples of estimating such parameters (integrals).
4.2 MONTE CARLO INTEGRATION
In this section we consider two simple techniques for computing one-dimensional integrals

I = ∫_a^b g(x) dx  (4.2.1)

by a Monte Carlo method. The first technique is called "the hit or miss Monte Carlo method" and is based on the geometrical interpretation of an integral as an area; the second is called "the sample-mean Monte Carlo method" and is based on the representation of an integral as a mean value.

4.2.1 The Hit or Miss Monte Carlo Method

Consider the problem of calculating the one-dimensional integral (4.2.1), where, for simplicity, we assume that the integrand g(x) is bounded,

0 ≤ g(x) ≤ c,  a ≤ x ≤ b.
Let (X, Y) be a random vector uniformly distributed over the rectangle Ω = {(x, y) : a ≤ x ≤ b, 0 ≤ y ≤ c} with probability density function (p.d.f.)

f_{XY}(x, y) = 1/[c(b − a)],  if (x, y) ∈ Ω;  f_{XY}(x, y) = 0 otherwise.  (4.2.2)
otherwise. What is the probability p that the random vector ( X , Y ) falls within the area under the curve g ( x ) ? Denoting S = { ( x , y ) : y Ig ( x ) ) and observing that the area under the curve g(x) is a r e a u n d e r g ( x ) = a r e a S = i bg(x)d.x,
c
‘
i
~
--__
-gM
a
b
. Fig. 42.1 Graphical representation of the hit or miss Monte Carlo method.
we obtain

p = area S / area Ω = I / [c(b − a)].  (4.2.3)

Let us assume that N independent random vectors (X_1, Y_1), (X_2, Y_2), ..., (X_N, Y_N) are generated. The parameter p can be estimated by

p̂ = N_H / N,  (4.2.4)

where N_H is the number of occasions on which g(X_i) ≥ Y_i, i = 1, 2, ..., N, that is, the number of "hits," and N − N_H is the number of "misses"; we score a miss if g(X_i) < Y_i, i = 1, ..., N, as depicted in Fig. 4.2.1. It follows from (4.2.3) and (4.2.4) that the integral I can be estimated by

θ̂_1 = c(b − a) N_H / N.  (4.2.5)

In other words, to estimate the integral I we take a sample of size N from the distribution (4.2.2), count the number N_H of hits (below the curve g(x)), and apply (4.2.5). Since each of the N trials constitutes a Bernoulli trial with probability p of a hit,

E(θ̂_1) = c(b − a) E(p̂) = c(b − a) p = I,  (4.2.6)

that is, θ̂_1 is an unbiased estimator of I. The variance of p̂ is

var p̂ = p(1 − p)/N,  (4.2.7)
which, together with (4.2.3), gives

var p̂ = (1/N) {I / [c(b − a)]^2} [c(b − a) − I].  (4.2.8)

Thus

var θ̂_1 = [c(b − a)]^2 var p̂ = [c(b − a)]^2 p(1 − p)/N = (I/N)[c(b − a) − I],  (4.2.9)

and the standard deviation is

σ_{θ̂_1} = [var θ̂_1]^{1/2} = N^{−1/2} {I [c(b − a) − I]}^{1/2}.
Note that the standard deviation of the estimator θ̂_1 is of order N^{−1/2}. How many trials do we have to perform, according to the hit or miss Monte Carlo method, if we require

P[|θ̂_1 − I| < ε] ≥ α?  (4.2.10)

Chebyshev's inequality,

P[|θ̂_1 − I| < ε] ≥ 1 − var θ̂_1 / ε^2,  (4.2.11)

together with (4.2.10), gives

α ≤ 1 − var θ̂_1 / ε^2.  (4.2.12)

Substituting (4.2.9) in (4.2.12), we obtain

α ≤ 1 − p(1 − p)[c(b − a)]^2 / (N ε^2).  (4.2.13)

Solving (4.2.13) with respect to N, we have

N ≥ p(1 − p)[c(b − a)]^2 / [(1 − α) ε^2],  (4.2.14)
which is the required number of trials for (4.2.10) to hold. When N is sufficiently large we can apply the central limit theorem, which says that for N sufficiently large the random variable (r.v.)

θ_N = (θ̂_1 − I) / (var θ̂_1)^{1/2}  (4.2.15)

is distributed approximately according to the standard normal distribution, that is,

P(θ_N ≤ x) ≈ Φ(x),  (4.2.16)

where

Φ(x) = (2π)^{−1/2} ∫_{−∞}^x e^{−t^2/2} dt.  (4.2.17)

We can easily verify that the confidence interval with level 1 − 2α for I is

(θ̂_1 + z_α (var θ̂_1)^{1/2}, θ̂_1 − z_α (var θ̂_1)^{1/2}),  (4.2.18)

where

z_α = Φ^{−1}(α).  (4.2.19)
Hammersley and Handscomb [10] write:

Historically, hit or miss methods were once the ones most usually propounded in explanation of Monte Carlo techniques; they were, of course, the easiest methods to understand (particularly if explained in the kind of graphical language involving a curve in a rectangle).
Hit or Miss Monte Carlo Method Algorithm

1 Generate a sequence {U_i}_{i=1}^{2N} of 2N random numbers.
2 Arrange the random numbers into N pairs (U_1, U′_1), (U_2, U′_2), ..., (U_N, U′_N) in any fashion such that each random number U_i is used exactly once.
3 Compute X_i = a + U_i(b − a) and g(X_i), i = 1, 2, ..., N.
4 Count the number of cases N_H for which g(X_i) > cU′_i.
5 Estimate the integral I by

θ̂_1 = c(b − a) N_H / N.
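The five steps above can be sketched as follows (`hit_or_miss` is a hypothetical name; each pair (U_i, U′_i) is drawn on the fly rather than pre-arranged):

```python
import random

def hit_or_miss(g, a, b, c, n, rng=random.random):
    """Estimate I = int_a^b g(x) dx by the hit or miss method (4.2.5):
    theta_1 = c (b - a) N_H / N, where N_H counts the points
    (X_i, c U'_i) that fall below the curve g."""
    hits = 0
    for _ in range(n):
        x = a + rng() * (b - a)
        y = c * rng()
        if y < g(x):
            hits += 1
    return c * (b - a) * hits / n
```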
4.2.2 The Sample-Mean Monte Carlo Method

Another way of computing the integral
I = ∫_a^b g(x) dx

is to represent it as an expected value of some random variable. Indeed, let us rewrite the integral as

I = ∫_a^b [g(x)/f_X(x)] f_X(x) dx,  (4.2.20)

assuming that f_X(x) is any p.d.f. such that f_X(x) > 0 when g(x) ≠ 0. Then

I = E[g(X)/f_X(X)],  (4.2.21)

where the random variable X is distributed according to f_X(x). Let us assume for simplicity

f_X(x) = 1/(b − a),  if a < x < b;  f_X(x) = 0 otherwise;  (4.2.22)
MONTE CARLO INTEGRATION
then (4.2.23)
and An unbiased estimator of I is its sample mean s
N
(4.2.25)
The variance of 8, is equal to E(6:) - [E(f?,)]’,so that var 8, = var
[
1
N
(b - u )
] A[
g(Xi) = is= I
(6 -
u)’l”p ,(
x)
b-a
Sample-Mean Monte Carlo Algorithm

1 Generate a sequence {U_i}_{i=1}^N of N random numbers.
2 Compute X_i = a + U_i(b − a), i = 1, ..., N.
3 Compute g(X_i), i = 1, ..., N.
4 Compute the sample mean θ̂_2 according to (4.2.25), which estimates I.
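In code (`sample_mean` is a hypothetical name), the estimator (4.2.25) needs only one uniform number per evaluation of g:

```python
import random

def sample_mean(g, a, b, n, rng=random.random):
    """Estimate I = int_a^b g(x) dx by the sample-mean method (4.2.25):
    theta_2 = (b - a) times the average of g(X_i), with X_i uniform
    on (a, b)."""
    total = 0.0
    for _ in range(n):
        total += g(a + rng() * (b - a))
    return (b - a) * total / n
```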
4.2.3 Efficiency of the Monte Carlo Method

Suppose two Monte Carlo methods exist for estimating the integral I. Let θ̂_1 and θ̂_2 be two estimators produced by these methods such that

E(θ̂_1) = E(θ̂_2) = I.  (4.2.27)

We denote by t_1 and t_2 the units of computing time required for evaluating the random variables θ̂_1 and θ̂_2, respectively. Let the variance associated with the first method be var θ̂_1 and that associated with the second method be var θ̂_2. Then we say that the first method is more efficient than the second method if

(t_1 var θ̂_1) / (t_2 var θ̂_2) < 1.  (4.2.28)

Let us now compare the efficiency of the hit or miss Monte Carlo method with that of the sample-mean Monte Carlo method.
Proposition 4.2.1

var θ̂_2 ≤ var θ̂_1.

Proof  Subtracting (4.2.26) from (4.2.9), we obtain

var θ̂_1 − var θ̂_2 = [(b − a)/N] ∫_a^b g(x)[c − g(x)] dx.  (4.2.29)

Note that 0 ≤ g(x) ≤ c; therefore

∫_a^b g(x)[c − g(x)] dx ≥ 0,  (4.2.30)

and further var θ̂_1 − var θ̂_2 ≥ 0.  Q.E.D.
Assuming that the computing times t_1 and t_2 for θ̂_1 and θ̂_2 are approximately equal, we conclude that the sample-mean method is more efficient than the hit or miss method. If var θ̂_1 and var θ̂_2 are unknown, we can replace them by their sample estimators; for instance,

S_2^2 = [1/(N − 1)] Σ_{i=1}^N [g(X_i)(b − a) − θ̂_2]^2,  (4.2.31)

and then estimate the efficiency by

ε̂ = (t_1 S_1^2) / (t_2 S_2^2).  (4.2.32)

It is interesting to note that, in estimating the integral by θ̂_1 and θ̂_2, we do not need to know the function g(x) explicitly. We need only be able to evaluate g(x) at any point x.
Integration in the Presence of Noise
Suppose now that g(x) is measured with some error, that is, we observe i = 1,2,. . . ,N, instead of g, where ei are independent identically distributed (i.i.d.) random variables with
g(x,) = g ( x i ) + ei,
E ( E )= 0,
var ( E ) = u 2
(4.2.33)
and IEf
< k < 00.
(4.2.34)
Let (X, Y) be a random vector distributed with p.d.f.

f_{XY}(x, y) = 1/[c_1(b − a)],  if a ≤ x ≤ b, 0 ≤ y ≤ c_1;  f_{XY}(x, y) = 0 otherwise,

where c_1 ≥ g(x) + k. Then, by analogy with θ̂_1 for the hit or miss method, we obtain

θ̃_1 = c_1(b − a) N_H / N,  (4.2.35)

where N_H is the number of hits, that is, of cases g̃(X_i) ≥ Y_i, i = 1, ..., N. By analogy with θ̂_2 for the sample-mean Monte Carlo method with

f_X(x) = 1/(b − a),  if a ≤ x ≤ b;  f_X(x) = 0 otherwise,

we obtain

θ̃_2 = [(b − a)/N] Σ_{i=1}^N g̃(X_i).  (4.2.36)

We can show that both r.v.'s θ̃_1 and θ̃_2 are unbiased and converge almost surely (a.s.) and in mean square to I, and that the sample-mean method is again more efficient than the hit or miss method.
4.3 VARIANCE REDUCTION TECHNIQUES
Variance reduction can be viewed as a means to use known information about the problem. In fact, if nothing is known about the problem, variance reduction cannot be achieved. At the other extreme, that is, complete knowledge, the variance is equal to zero and there is no need for simulation. Variance reduction cannot be obtained from nothing; it is merely a way of not wasting information. One way to gain this information is through a direct crude simulation of the process. Results from this simulation can then be used to define variance reduction techniques that will refine and improve the efficiency of a second simulation. Therefore the more that is known about the problem, the more effective the variance reduction techniques that can be employed. Hence it is always important to define clearly what is known about the problem. Knowledge of a process to be simulated can be qualitative, quantitative, or both.
4.3.1 Importance Sampling
Let us consider the problem of estimating the multiple integral*

I = ∫_D g(x) dx,  x ∈ D ⊂ R^n.  (4.3.1)
We suppose that g ∈ L_2(D) (in other words, that ∫ g^2(x) dx exists and therefore that I exists). The basic idea of this technique [14] consists of concentrating the distribution of the sample points in the parts of the region D that are of most "importance," instead of spreading them out evenly. By analogy with (4.2.20) and (4.2.21) we can represent the integral (4.3.1) as

I = ∫_D [g(x)/f_X(x)] f_X(x) dx = E[g(X)/f_X(X)].  (4.3.2)
Here X is any random vector with p.d.f. f_X(x) such that f_X(x) > 0 for each x ∈ D ⊂ R^n. The function f_X(x) is called the importance sampling distribution. It is obvious from (4.3.2) that ζ = g(X)/f_X(X) is an unbiased estimator of I, with the variance

var ζ = ∫_D [g^2(x)/f_X(x)] dx − I^2.  (4.3.3)

In order to estimate the integral we take a sample X_1, ..., X_N from the p.d.f. f_X(x) and substitute its values in the sample-mean formula

θ̂ = (1/N) Σ_{i=1}^N g(X_i)/f_X(X_i).  (4.3.4)
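A sketch of (4.3.4) in Python (`importance_sampling`, `f_pdf`, and `f_sample` are hypothetical names): the caller supplies the importance density and a sampler for it, and the function averages the ratios g(X_i)/f_X(X_i). In the usage shown, ∫_0^1 3x² dx = 1 is estimated with f_X(x) = 2x, whose shape roughly follows g.

```python
import math
import random

def importance_sampling(g, f_pdf, f_sample, n):
    """Estimate I = int g(x) dx by (4.3.4): the sample mean of
    g(X)/f_X(X), with X drawn from the importance density f_X."""
    total = 0.0
    for _ in range(n):
        x = f_sample()
        total += g(x) / f_pdf(x)
    return total / n

# Example: I = int_0^1 3 x^2 dx = 1, with importance density
# f_X(x) = 2x sampled by inversion as X = sqrt(U).
estimate = importance_sampling(lambda x: 3 * x * x,
                               lambda x: 2 * x,
                               lambda: math.sqrt(random.random()),
                               10000)
```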
We now show how to choose the distribution of the r.v. X in order to minimize the variance of ζ, which is the same as to minimize the variance of θ̂.

Theorem 4.3.1  The minimum of var ζ is equal to

var ζ* = [∫_D |g(x)| dx]^2 − I^2  (4.3.5)
*Formula (4.3.1) is a Lebesgue integral, and it is assumed that the domain of integration is bounded (has finite measure). Readers not familiar with Lebesgue integrals may assume it to be a Riemann integral.
and occurs when the r.v. X is distributed with p.d.f.

f_X(x) = \frac{|g(x)|}{\int_D |g(x)|\,dx}.  (4.3.6)

Proof  Formula (4.3.5) follows directly if we substitute (4.3.6) into (4.3.3). In order to prove that var \zeta^* \le var \zeta it is enough to prove that

\left( \int_D |g(x)|\,dx \right)^2 \le \int_D \frac{g^2(x)}{f_X(x)}\,dx,  (4.3.7)

which can be obtained from the Cauchy-Schwarz inequality. Indeed,

\left( \int_D |g(x)|\,dx \right)^2 = \left( \int_D \frac{|g(x)|}{\sqrt{f_X(x)}} \sqrt{f_X(x)}\,dx \right)^2 \le \int_D \frac{g^2(x)}{f_X(x)}\,dx \int_D f_X(x)\,dx = \int_D \frac{g^2(x)}{f_X(x)}\,dx.  (4.3.8)
Q.E.D.

Corollary  If g(x) > 0, then the optimal p.d.f. is

f_X(x) = \frac{g(x)}{I}  (4.3.9)

and var \zeta = 0. This method is unfortunately useless, since the optimal density contains the integral \int_D |g(x)|\,dx, which is practically equivalent to computing I. In the case where g(x) has constant sign it is precisely equivalent to calculating I. But if we already know I, we do not need Monte Carlo methods to estimate it. Not all is lost, however. The variance can be essentially reduced if f_X(x) is chosen so as to have a shape similar to that of |g(x)|. When choosing f_X(x) in such a way we have to take into consideration the difficulties of sampling from such a p.d.f., especially if |g(x)| is not a well-behaved function. In estimating the integral, we can save CPU time if the sample X_1, \ldots, X_N is taken in the subregion D' = \{x : g(x) \ne 0\} of D. This is the same as defining

f_X(x) > 0 if g(x) \ne 0, \qquad and \qquad f_X(x) = 0 if g(x) = 0.  (4.3.10)
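As a sketch of these ideas, the following Python fragment applies the estimator (4.3.4) to a simple one-dimensional case. The integrand g(x) = e^x on [0, 1] (so that I = e - 1) and the importance density f_X(x) = (1 + x)/1.5, chosen only because its shape is roughly similar to |g(x)|, are illustrative assumptions, not examples from the text.

```python
import math
import random

def importance_mc(g, f, sample_f, n, rng):
    # Importance-sampling estimator (4.3.4): average g(X)/f(X) with X ~ f.
    total = 0.0
    for _ in range(n):
        x = sample_f(rng)
        total += g(x) / f(x)
    return total / n

g = math.exp                       # integrand on [0, 1]; true I = e - 1
f = lambda x: (1.0 + x) / 1.5      # density with a shape similar to |g|
# Inverse-transform sampler for f: solve (x + x^2/2)/1.5 = u for x.
sample_f = lambda rng: -1.0 + math.sqrt(1.0 + 3.0 * rng.random())

rng = random.Random(0)
est = importance_mc(g, f, sample_f, 100_000, rng)
```

Because g(x)/f(x) varies much less over [0, 1] than g(x) itself, the estimator has a smaller variance than crude sampling with the same N.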
Consider the problem of choosing the parameters of the distribution f_X(x) in an optimal way. We assume that the p.d.f. f_X(x) is determined up to a vector of parameters a, that is, f_X(x) = f_X(x, a). For instance, if f_X(x) represents a one-dimensional normal distribution, that is, X ~ N(\mu, \sigma^2), then the unknown parameters can be the expected value \mu and the variance \sigma^2. We want to choose the vector of parameters a to minimize the variance of \zeta, that is,

\min_a var \zeta = \min_a \left[ \int_D \frac{g^2(x)}{f_X(x, a)}\,dx - I^2 \right].  (4.3.11)

The last problem is equivalent to

\min_a \int_D \frac{g^2(x)}{f_X(x, a)}\,dx.  (4.3.12)

The function

V(a) = \int_D \frac{g^2(x)}{f_X(x, a)}\,dx  (4.3.13)

can be multiextremal, and generally it is difficult to find the optimal a. Some techniques for global optimization are discussed in Chapter 7.

4.3.2 Correlated Sampling
Correlated sampling is one of the most powerful variance reduction techniques. Frequently, the primary objective of a simulation study is to determine the effect of a small change in the system. The sample-mean Monte Carlo method would make two independent runs, with and without the change in the system being simulated, and subtract the results obtained. Unfortunately, the difference being calculated is often small compared to the separate results, while the variance of the difference will be the sum of the variances of the two runs, which is usually significant. If, instead of being independent, the two simulations use the same random numbers, the results can be highly positively correlated, which provides a reduction in the variance. Another way of viewing correlated sampling through random number control is to realize that the use of the same random numbers generates identical histories in those parts of the two systems that are the same. Thus the aim of correlated sampling is to produce a high positive correlation between two similar processes, so that the variance of the difference is considerably smaller than it would be if the two processes were statistically independent.
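A minimal sketch of this idea follows. The two "systems" are simply two exponential distributions (rates 1.0 and 1.25, chosen purely for illustration); driving both inverse transforms with the same uniform stream, X_i = F_1^{-1}(U_i) and Y_i = F_2^{-1}(U_i), makes the per-sample differences nearly deterministic, while independent streams do not.

```python
import math
import random

def inv_exp(u, rate):
    # Inverse c.d.f. of the exponential distribution with the given rate.
    return -math.log(1.0 - u) / rate

def diff_independent(n, rng):
    # Two independent runs: separate random numbers for each system.
    d = [inv_exp(rng.random(), 1.0) - inv_exp(rng.random(), 1.25)
         for _ in range(n)]
    return sum(d) / n

def diff_common(n, rng):
    # Correlated sampling: the same U drives both systems.
    d = []
    for _ in range(n):
        u = rng.random()
        d.append(inv_exp(u, 1.0) - inv_exp(u, 1.25))
    return sum(d) / n

rng = random.Random(1)
est_ind = diff_independent(10_000, rng)
est_crn = diff_common(10_000, rng)   # true difference: 1/1.0 - 1/1.25 = 0.2
```

With common random numbers the difference inside the sum is 0.2 times a single exponential variate, so its variance is far below the sum of the two separate variances.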
Unfortunately, there is no general procedure that can be implemented in correlated sampling. However, in the following two situations correlated sampling can be successfully employed.

1 The value of a small change in a system is to be calculated.
2 The difference in a parameter in two or more similar cases is of more interest than its absolute value.
Let us assume that we desire to estimate \Delta l = I_1 - I_2, where*

I_1 = \int_{D_1} g_1(x) f_1(x)\,dx, \qquad x \in D_1 \subset R^n,  (4.3.14)

and

I_2 = \int_{D_2} g_2(x) f_2(x)\,dx, \qquad x \in D_2 \subset R^n.  (4.3.16)

Then the procedure for correlated sampling is as follows:

1 Generate X_1, \ldots, X_N from f_1(x) and Y_1, \ldots, Y_N from f_2(x).
2 Estimate \Delta l using

\Delta\hat\theta = \hat\theta_1 - \hat\theta_2,  (4.3.18)

where

\hat\theta_1 = \frac{1}{N} \sum_{i=1}^{N} g_1(X_i)  (4.3.19)

and

\hat\theta_2 = \frac{1}{N} \sum_{i=1}^{N} g_2(Y_i).  (4.3.20)

Writing

\sigma_1^2 = E(\hat\theta_1 - I_1)^2, \qquad \sigma_2^2 = E(\hat\theta_2 - I_2)^2,  (4.3.21)

the variance of \Delta\hat\theta is

var \Delta\hat\theta = \sigma_1^2 + \sigma_2^2 - 2\,cov(\hat\theta_1, \hat\theta_2),  (4.3.22)

*Introducing g(x) = \phi(x)/f_X(x), where f_X(x) is a p.d.f., the integral I = \int \phi(x)\,dx can be written as I = \int g(x) f_X(x)\,dx. An unbiased estimator of the last integral is \eta = g(X), and the integral can be estimated by (1/N) \sum_{i=1}^{N} g(X_i).
where

cov(\hat\theta_1, \hat\theta_2) = E[(\hat\theta_1 - I_1)(\hat\theta_2 - I_2)].  (4.3.23)

Now if \hat\theta_1 and \hat\theta_2 are statistically independent, then

cov(\hat\theta_1, \hat\theta_2) = 0  (4.3.24)

and

var \Delta\hat\theta = \sigma_1^2 + \sigma_2^2.  (4.3.25)
However, if the random variables X and Y are positively correlated and if g_1(x) is similar to g_2(x) in shape, then the random variables \hat\theta_1 and \hat\theta_2 will also be positively correlated, that is, cov(\hat\theta_1, \hat\theta_2) > 0, and the variance of \Delta\hat\theta may be greatly reduced. Thus the key to reducing the variance of \Delta\hat\theta is to ensure positive correlation between the estimators \hat\theta_1 and \hat\theta_2. This can be achieved in several ways. The easiest way is to obtain correlated samples through random number control. Specifically, this can be accomplished by using the same (common) sequence of random numbers U_1, \ldots, U_N in both simulations, that is, the sequences X_1, \ldots, X_N and Y_1, \ldots, Y_N are generated using X_i = F_1^{-1}(U_i) and Y_i = F_2^{-1}(U_i), respectively. Clearly, if f_X is similar to f_Y, the r.v.'s X_i and Y_i will be highly positively correlated, since they both use the same random numbers. It is difficult to be specific as to how random number control should be applied generally. As a rule, however, to achieve maximum correlation, common random numbers should be used whenever the similarities in problem structure permit this. Such an example is given in Section 6.7.2, where some output parameters of regenerative processes are compared.
4.3.3 Control Variates

The use of control variates is another technique for reducing the variance. In this technique, instead of estimating a parameter directly, we consider the difference between the problem of interest and some analytical model. Applications of control variates are very general [10, 12, 13]. Most of them concern queues and queueing networks (see Sections 4.3.12 and 6.7). Our nomenclature follows Lavenberg and Welch's paper [13]. A random variate C is a control variate for Y if it is correlated with Y and if its expectation \mu_C is known. The control variate C is used to construct an estimator for \mu = E(Y) that has a smaller variance than the estimator Y. For any \beta,

Y(\beta) = Y - \beta(C - \mu_C)  (4.3.26)
is an unbiased estimator of \mu. Now

var[Y(\beta)] = var[Y] - 2\beta\,cov[Y, C] + \beta^2 var[C].  (4.3.27)

Hence if 2\beta\,cov[Y, C] > \beta^2 var[C], variance reduction is achieved. The value of \beta that minimizes var[Y(\beta)] is easily found to be

\beta^* = \frac{cov[Y, C]}{var[C]},  (4.3.28)

and the minimum variance is equal to

var[Y(\beta^*)] = (1 - \rho_{YC}^2)\,var[Y],

where \rho_{YC} is the correlation coefficient between Y and C. Hence the more C is correlated with Y, the greater the reduction in variance. Another type of control variate is one for which the mean E(C) is unknown but is equal to \mu, that is, E(C) = E(Y) = \mu. Any linear combination

Y(\beta) = \beta Y + (1 - \beta) C

is again an unbiased estimator of \mu, and if Y and C are correlated, variance reduction will be achieved. We now extend the above results to the case of more than one control variate. Let C = (C_1, \ldots, C_Q) be a vector of Q control variates, let \mu_C be the known mean vector corresponding to C, that is, \mu_C = (\mu_1, \ldots, \mu_Q), where \mu_q = E[C_q], and let \beta be any vector. Then

Y(\beta) = Y - \beta^t(C - \mu_C)  (4.3.29)
is an unbiased estimator of \mu. Here t is the transpose operator. The vector \beta^* that minimizes var[Y(\beta)] (see [13]) is

\beta^* = \Sigma_C^{-1} \sigma_{YC},  (4.3.30)

where \Sigma_C is the covariance matrix of C and \sigma_{YC} is a Q-dimensional vector whose components are the covariances between Y and the C_q's. The resulting minimum variance is

var[Y(\beta^*)] = (1 - R_{YC}^2)\,var[Y],  (4.3.31)

where

R_{YC}^2 = \frac{\sigma_{YC}^t \Sigma_C^{-1} \sigma_{YC}}{var[Y]}  (4.3.32)

is the square of the multiple correlation coefficient between Y and C.
As before, the larger the multiple correlation coefficient R_{YC}^2 between C and Y, the greater the variance reduction. Again, if Y_1, \ldots, Y_{Q+1} are Q + 1 different unbiased estimators of the unknown \mu, then

\sum_{i=1}^{Q+1} \beta_i Y_i,  (4.3.33)

where \sum_{i=1}^{Q+1} \beta_i = 1, is also an unbiased estimator of \mu. For practical application of control variates there are two key problems. First, control variates must be found that are highly correlated with the estimators of interest. Second, since the vector \sigma_{YC} and the matrix \Sigma_C are in general unknown, the optimal coefficient vector \beta^* is unknown and must be estimated. Further, its estimation must be incorporated into effective statistical procedures, and we now turn our attention to these questions. Let Y_k, k = 1, \ldots, K, be a sample from f_Y(y). An unbiased estimator of \mu is

\bar{Y} = \frac{1}{K} \sum_{k=1}^{K} Y_k.  (4.3.34)

The variance of \bar{Y} is equal to var[Y]/K and is estimated by

s^2(\bar{Y}) = \frac{1}{K(K-1)} \sum_{k=1}^{K} (Y_k - \bar{Y})^2.

The random variable (\bar{Y} - \mu)/s(\bar{Y}) has approximately a t-distribution with K - 1 degrees of freedom. The confidence interval can be found from

prob\left[ \bar{Y} - t_{K-1}\left(1 - \frac{\alpha}{2}\right) s(\bar{Y}) \le \mu \le \bar{Y} + t_{K-1}\left(1 - \frac{\alpha}{2}\right) s(\bar{Y}) \right] = 1 - \alpha.  (4.3.35)
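A small sketch of the single control variate (4.3.26) with \beta^* estimated from the data as in (4.3.28): the target \mu = E[e^U] with U uniform on [0, 1], and the control C = U with known mean \mu_C = 0.5, are illustrative choices, not examples from the text.

```python
import math
import random

def control_variate_estimate(n, rng):
    # Estimate mu = E[g(U)] for g(x) = exp(x), U ~ Uniform(0,1),
    # using C = U as a control variate with known mean mu_C = 0.5.
    ys, cs = [], []
    for _ in range(n):
        u = rng.random()
        ys.append(math.exp(u))
        cs.append(u)
    ybar = sum(ys) / n
    cbar = sum(cs) / n
    # Sample estimates of cov[Y, C] and var[C] give beta* as in (4.3.28).
    cov_yc = sum((y - ybar) * (c - cbar) for y, c in zip(ys, cs)) / (n - 1)
    var_c = sum((c - cbar) ** 2 for c in cs) / (n - 1)
    beta = cov_yc / var_c
    # Controlled estimator (4.3.26): Y(beta) = Y - beta (C - mu_C).
    return ybar - beta * (cbar - 0.5)

rng = random.Random(2)
est = control_variate_estimate(20_000, rng)   # true value: e - 1
```

Since e^U and U are almost linearly related on [0, 1], the correlation is close to 1 and most of the variance of the crude mean is removed.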
Let C_k be the value of C for the kth run. Then if the optimal coefficient vector \beta^* were known, we would use the estimator

Y_k(\beta^*) = Y_k - \beta^{*t}(C_k - \mu_C)  (4.3.36)
for the kth replication. The estimator based on K runs would be

\bar{Y}(\beta^*) = \frac{1}{K} \sum_{k=1}^{K} Y_k(\beta^*),

and a confidence interval could be obtained by replacing \bar{Y} and Y_k with \bar{Y}(\beta^*) and Y_k(\beta^*), respectively, in (4.3.34) and (4.3.35). In this case (\beta^* known)

var[\bar{Y}(\beta^*)] = (1 - R_{YC}^2)\,var[\bar{Y}],  (4.3.37)

and the variance reduction given by (4.3.31) would be obtained. Furthermore, the ratio of the mean confidence interval widths would be approximately proportional to the ratio of the standard deviations, and hence the confidence interval width would be reduced by approximately (1 - R_{YC}^2)^{1/2}. However, in practice \beta^* is unknown and hence must be estimated. We estimate it by the sample equivalent of (4.3.30), that is, by
\hat\beta^* = \hat\Sigma_C^{-1} \hat\sigma_{YC},  (4.3.38)

where \hat\sigma_{YC} and \hat\Sigma_C are the sample covariance vector and the sample covariance matrix, whose elements are given by

\hat\sigma_{YC,q} = \frac{1}{K-1} \sum_{k=1}^{K} (Y_k - \bar{Y})(C_{qk} - \bar{C}_q)

and

\hat\Sigma_{C,qr} = \frac{1}{K-1} \sum_{k=1}^{K} (C_{qk} - \bar{C}_q)(C_{rk} - \bar{C}_r),

where C_{qk} is the qth element of C_k and \bar{C}_q is the average of C_{qk}, k = 1, \ldots, K. Substituting \hat\beta^* for \beta^* in (4.3.36), we obtain

Y_k(\hat\beta^*) = Y_k - \hat\beta^{*t}(C_k - \mu_C)

and

\bar{Y}(\hat\beta^*) = \frac{1}{K} \sum_{k=1}^{K} Y_k(\hat\beta^*).

In general, \bar{Y}(\hat\beta^*) is a biased estimator of \mu, since \hat\beta^* and the C_k are dependent. Also, the Y_k(\hat\beta^*) are dependent, so we cannot directly use the t-statistic to obtain a confidence interval for \mu. However, if we assume (Y, C) to have a multivariate normal distribution, then it is shown in
[13] that \bar{Y}(\hat\beta^*) is an unbiased estimator of \mu and that

\frac{\bar{Y}(\hat\beta^*) - \mu}{s(\bar{Y}(\hat\beta^*))}  (4.3.39)
has a t-distribution with K - Q - 1 degrees of freedom. Hence a confidence interval can be obtained from

prob\left[ \bar{Y}(\hat\beta^*) - t_{K-Q-1}\left(1 - \frac{\alpha}{2}\right) s(\bar{Y}(\hat\beta^*)) \le \mu \le \bar{Y}(\hat\beta^*) + t_{K-Q-1}\left(1 - \frac{\alpha}{2}\right) s(\bar{Y}(\hat\beta^*)) \right] = 1 - \alpha.  (4.3.40)
Further, the ratio s^2(\bar{Y}(\hat\beta^*))/s^2(\bar{Y}) is given [13] by

\frac{var[\bar{Y}(\hat\beta^*)]}{var[\bar{Y}]} = \frac{K-2}{K-Q-2}\,(1 - R_{YC}^2).  (4.3.41)

We can see from (4.3.41) that there exists a trade-off between (K-2)/(K-Q-2) and 1 - R_{YC}^2. At one extreme, if K is not large with respect to Q, the factor (K-2)/(K-Q-2) can nullify the potential variance reduction. At the other extreme, we expect the factor 1 - R_{YC}^2 to be a decreasing function of Q. It was indicated in [13] that for finite K the number of control variates Q has to be relatively small. It would be interesting to find the optimal Q as a function of K by making some assumptions about R_{YC}^2. The major cost involved in the application of control variates is the effort required to develop a reasonable set of control variates. This requires understanding the model in sufficient detail to define possible control variates and estimators of interest. There are only a few published reports describing the application of control variates to practical problems. However, judging from them, we hope that variance reduction in the range 0.25 to 0.75 can be realized in practical situations. Now we consider how control variates can be used in estimating the integral

I = \int g(x) f_X(x)\,dx.  (4.3.42)
Let g_0(x) be a function that approximates g(x) well and for which the expectation E[g_0(X)] is known. The function g_0(x) is a control variate for g(x). Denoting Y = g(X), C = g_0(X), and \mu_C = \int g_0(x) f_X(x)\,dx, we have
Y(\beta) = g(X) - \beta(g_0(X) - \mu_C),

which is an unbiased estimator of the integral I. Taking a sample X_1, \ldots, X_N from f_X(x), we can estimate the integral I by

\frac{1}{N} \sum_{i=1}^{N} [g(X_i) - \beta^*(g_0(X_i) - \mu_C)],
where \beta^* is the optimal \beta, which minimizes var[Y(\beta)]. The efficiency of this technique depends on how well g_0(x) approximates g(x). But it is sometimes difficult to find a g_0(x) that approximates g(x) well enough and such that E[g_0(X)] is known. In many cases no approximation is known for g(x). This can be overcome by simulating some values of X (making a pilot run) and plotting the results. The extension to the case of Q control variates (see (4.3.29)) in calculating the integral I is as follows. Let \phi(X) = [\phi_1(X), \ldots, \phi_Q(X)] be a vector of control variates with known mean vector \mu_\phi, that is, \mu_q = E[\phi_q(X)]. Then for any vector \beta,

Y(\beta) = g(X) - \beta^t(\phi(X) - \mu_\phi)  (4.3.43)

is an unbiased estimator of \mu. Denoting Y = g(X), \phi(X) = C, and \mu_\phi = \mu_C, we obtain formula (4.3.29).

4.3.4
Stratified Sampling
This technique is well known in statistics [3]. For stratified sampling we break the region D into m disjoint subregions D_i, i = 1, 2, \ldots, m, that is, D = \bigcup_{i=1}^{m} D_i, D_k \cap D_j = \emptyset, k \ne j, where \emptyset is the empty set. Then we define

I_i = \int_{D_i} g(x) f_X(x)\,dx,  (4.3.44)

which can be estimated separately by the Monte Carlo method (for instance, by the sample-mean Monte Carlo method). The idea of this technique is similar to the idea of importance sampling: we also take more observations (samples) in the parts of the region D that are more "important," but the effect of reducing the variance is achieved by concentrating more samples in the more important subsets D_i, rather than by choosing the optimal p.d.f. Let us define

P_i = \int_{D_i} f_X(x)\,dx.  (4.3.45)
Introducing

g_i(x) = g(x) if x \in D_i, \qquad g_i(x) = 0 otherwise,  (4.3.47)

we can rewrite the integral I_i as

I_i = \int_D g_i(x) f_X(x)\,dx = P_i E[g(X_i)],  (4.3.48)

where the r.v. X_i is distributed according to f_X(x)/P_i on D_i. Inasmuch as I_i is expressed as an expected value, the sample-mean estimator for I_i can be written as

T_i = P_i g(X_i).  (4.3.49)

The integral I_i can be estimated by

\hat{T}_i = \frac{P_i}{N_i} \sum_{k_i=1}^{N_i} g(X_{k_i}), \qquad i = 1, \ldots, m,  (4.3.50)

and the integral I by

\hat\theta_6 = \sum_{i=1}^{m} \hat{T}_i = \sum_{i=1}^{m} \frac{P_i}{N_i} \sum_{k_i=1}^{N_i} g(X_{k_i}).  (4.3.51)

We may quickly verify that

var \hat\theta_6 = \sum_{i=1}^{m} \frac{P_i^2 \sigma_i^2}{N_i},  (4.3.52)

where \sigma_i^2 = var[g(X_i)]. If the stratification is well carried out, the variance of \hat\theta_6 may be less than the variance of the sample-mean estimator \hat\theta based on \sum_{i=1}^{m} N_i = N samples.
Once the subsets D_1, \ldots, D_m are selected, the next requirement is to define the number of samples to assign to each subset. More specifically, let N_i be the number of samples assigned to the subset D_i, where

\sum_{i=1}^{m} N_i = N.  (4.3.53)

The following theorem tells us how to stratify in an optimal way.

Theorem 4.3.2  For a given partitioning D = \bigcup_{i=1}^{m} D_i, the minimum

\min_{N_1, \ldots, N_m} var \hat\theta_6, \qquad subject to \sum_{i=1}^{m} N_i = N,  (4.3.54)

occurs when

N_i = N \frac{P_i \sigma_i}{\sum_{j=1}^{m} P_j \sigma_j}  (4.3.55)

and is equal to

\min var \hat\theta_6 = \frac{1}{N} \left( \sum_{i=1}^{m} P_i \sigma_i \right)^2.  (4.3.56)

The proof of the theorem is left to the reader. Thus when the stratification regions are prescribed, the minimum variance of \hat\theta_6 occurs when the N_i are proportional to P_i \sigma_i. This theorem, as well as Theorem 4.3.1, has no important direct application, because the values of \sigma_i are usually unknown. One practical suggestion is to make a small "pilot" run to obtain rough estimates of the \sigma_i. Such estimates would help in determining the optimal N_i, with the appropriate trade-off between the cost of sampling and the degree of precision desired. Let us choose N_i = P_i N (we assume that P_i can be calculated analytically).

Proposition 4.3.1  var \hat\theta_6 \le var \hat\theta; that is, if the sample size N_i in each subset D_i is proportional to P_i (i.e., if N_i = N P_i), then the variance of the stratified sampling method will be less than or equal to the variance of the sample-mean method.
Proof  Substituting N_i = N P_i in (4.3.52), we obtain

var \hat\theta_6 = \frac{1}{N} \sum_{i=1}^{m} P_i\,var[g(X_i)].  (4.3.57)

From the Cauchy-Schwarz inequality we have

\left( \sum_{i=1}^{m} P_i E[g(X_i)] \right)^2 \le \sum_{i=1}^{m} P_i \{E[g(X_i)]\}^2,  (4.3.58)

since \sum_{i=1}^{m} P_i = 1. For the sample-mean estimator,

var \hat\theta = \frac{1}{N} \left[ \sum_{i=1}^{m} P_i E[g^2(X_i)] - \left( \sum_{i=1}^{m} P_i E[g(X_i)] \right)^2 \right],  (4.3.59)

which together with (4.3.58) can be written as

var \hat\theta \ge \frac{1}{N} \sum_{i=1}^{m} P_i \left( E[g^2(X_i)] - \{E[g(X_i)]\}^2 \right) = \frac{1}{N} \sum_{i=1}^{m} P_i\,var[g(X_i)].  (4.3.60)

Comparing (4.3.57) and (4.3.60), we immediately obtain the proof of this proposition. Q.E.D.

In other words, Proposition 4.3.1 states: there is no function g(x) \in L^2(D, f) such that the stratified sampling method would be worse than the sample-mean method when choosing N_i = P_i N. Of course, if the last assumption does not hold, the stratified sampling method may be worse than the sample-mean method. In Exercise 6 such an example is presented. It can be proven that the efficiency of stratified sampling in comparison with the sample-mean method is approximately m^2. In the particular case when P_i = 1/m and N_i = N/m, we obtain the so-called systematic sampling method [8]. The procedure for systematic sampling is as follows:

1 Divide the range [0, 1] of the cumulative distribution into m intervals, each of width 1/m.
2 Generate \{U_{k_i}, k_i = 1, \ldots, N/m; i = 1, \ldots, m\} from \mathcal{U}(0, 1).
3 Set Y_{k_i} = (i - 1 + U_{k_i})/m, k_i = 1, \ldots, N/m; i = 1, \ldots, m.
4 Set X_{k_i} = F^{-1}(Y_{k_i}).
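The steps of the systematic sampling procedure can be sketched as follows. Here X is taken uniform on [0, 1], so F^{-1}(y) = y, and the integrand e^x is an illustrative assumption, not an example from the text.

```python
import math
import random

def systematic_sampling_mc(g, m, n, rng):
    # Systematic sampling (P_i = 1/m, N_i = N/m) for the integral of g
    # over [0, 1], with X uniform so that F^{-1}(y) = y.
    per = n // m                     # N_i = N/m samples per stratum
    total = 0.0
    for i in range(m):
        for _ in range(per):
            u = rng.random()         # step 2: U ~ Uniform(0, 1)
            y = (i + u) / m          # step 3 (0-indexed strata)
            total += g(y)            # step 4: X = F^{-1}(Y) = Y
    return total / (per * m)

rng = random.Random(3)
est = systematic_sampling_mc(math.exp, 10, 10_000, rng)  # integral of e^x
```

Each stratum contributes only the variation of g within a subinterval of width 1/m, which is what produces the variance reduction described above.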
The estimator for the integral I is then the stratified estimator (4.3.51) with P_i = 1/m and N_i = N/m, that is,

\frac{1}{N} \sum_{i=1}^{m} \sum_{k_i=1}^{N/m} g(X_{k_i}),

and its sample variance is estimated stratum by stratum as in (4.3.52).
4.3.5 Antithetic Variates

This technique is due to Hammersley and Morton [11]. In this technique we seek two unbiased estimators Y' and Y'' for some unknown parameter I (in our case I is the unknown integral), having strong negative correlation. Note that \frac{1}{2}(Y' + Y'') will be an unbiased estimator of I with variance

var\left[ \frac{1}{2}(Y' + Y'') \right] = \frac{1}{4} var Y' + \frac{1}{4} var Y'' + \frac{1}{2} cov(Y', Y''),  (4.3.61)

and it follows from the last equation that, if the covariance cov(Y', Y'') is strongly negative, the method of antithetic variates can be effective in reducing the variance. As an example, consider the integral

I = \int_0^1 g(x)\,dx,

which is equal to

I = E[g(U)] = E[g(1 - U)], \qquad U \sim \mathcal{U}(0, 1).  (4.3.62)

The estimator of I is then

Y = \frac{1}{2}(Y' + Y'') = \frac{1}{2}[g(U) + g(1 - U)].  (4.3.63)

Y is an unbiased estimator of I, because both Y' = g(U) and Y'' = g(1 - U) are unbiased estimators of I. To estimate I we can take a sample of size N from the uniform distribution and find

\hat\theta_7 = \frac{1}{2N} \sum_{i=1}^{N} [g(U_i) + g(1 - U_i)].  (4.3.64)

The time required for one computation by (4.3.64) is twice that required by the sample-mean method. Therefore the estimator (4.3.64) will be more efficient than the sample-mean estimator \hat\theta of (4.2.25), with a = 0 and b = 1, only if

var \hat\theta_7 \le \frac{1}{2} var \hat\theta.  (4.3.65)

Proposition 4.3.2  If g(x) is a continuous, monotonically nonincreasing (or nondecreasing) function with continuous first derivative, then (4.3.65) holds.
Proof  Let us assume without loss of generality that N = 1. It follows from (4.3.61) that

var \hat\theta_7 = \frac{1}{2} \int_0^1 g^2(x)\,dx + \frac{1}{2} \int_0^1 g(x) g(1-x)\,dx - I^2.  (4.3.66)

Therefore

2\,var \hat\theta_7 - var \hat\theta = \int_0^1 g(x) g(1-x)\,dx - I^2.

The proposition will be proved if we prove

\int_0^1 g(x) g(1-x)\,dx \le I^2.  (4.3.67)

Let us assume that g(x) is a monotonically nondecreasing function with continuous first derivative (the proof when g(x) is nonincreasing is similar), such that g(1) > g(0). Let us introduce the auxiliary function

\psi(x) = \int_0^x g(1-t)\,dt - xI,  (4.3.68)

such that \psi(0) = \psi(1) = 0. The first derivative

\psi'(x) = g(1-x) - I  (4.3.69)

is a monotone (nonincreasing) function with \psi'(0) > 0 and \psi'(1) < 0. Therefore

\psi(x) \ge 0, \qquad x \in [0, 1].  (4.3.70)

Since g'(x) \ge 0, it follows that \int_0^1 \psi(x) g'(x)\,dx \ge 0. Integrating by parts and using \psi(0) = \psi(1) = 0, we get

\int_0^1 \psi(x) g'(x)\,dx = -\int_0^1 \psi'(x) g(x)\,dx \ge 0,  (4.3.71)

and substituting (4.3.69) into (4.3.71), we obtain

\int_0^1 g(x) g(1-x)\,dx \le I \int_0^1 g(x)\,dx = I^2,

which is (4.3.67).

Q.E.D.
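The estimator (4.3.64) can be sketched as follows; the monotone integrand e^x on [0, 1] is an illustrative assumption, chosen so that Proposition 4.3.2 applies.

```python
import math
import random

def antithetic_mc(g, n, rng):
    # Antithetic variates (4.3.64): average (1/2)[g(U) + g(1 - U)] over n
    # uniform draws; g(U) and g(1 - U) are negatively correlated when g
    # is monotone, which reduces the variance of the pair average.
    total = 0.0
    for _ in range(n):
        u = rng.random()
        total += 0.5 * (g(u) + g(1.0 - u))
    return total / n

rng = random.Random(4)
est = antithetic_mc(math.exp, 50_000, rng)   # integral of e^x over [0,1]
```

For g(x) = e^x the negative covariance cancels most of the variance of the two halves, so the pair average is far more stable than two independent draws.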
More generally, let

I = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx, \qquad x \in R^1.  (4.3.72)

Then by analogy with (4.3.64) an unbiased estimator of I is

\hat\theta_7 = \frac{1}{2N} \sum_{i=1}^{N} [g(X_i) + g(X_i')],  (4.3.73)
where

X_i = F^{-1}(U_i)  (4.3.74)

and

X_i' = F^{-1}(1 - U_i),  (4.3.75)

and F(x) is the cumulative distribution function (c.d.f.) of X. The pairs X_i and X_i' are, of course, correlated, since the same random numbers U_i, i = 1, \ldots, N, were used to generate both r.v.'s X_i and X_i'. Furthermore, these r.v.'s are negatively correlated, and therefore \hat\theta_7 may have a smaller variance than the sample-mean estimator. Let us rewrite (4.3.51) for the case when the region D = \{x : x \in [0, 1]\}. We have

\hat\theta_6 = \sum_{i=1}^{m} \frac{P_i}{N_i} \sum_{j=1}^{N_i} g(a_{i-1} + P_i U_{ij}),  (4.3.76)

where 0 = a_0 < a_1 < \cdots < a_m = 1, P_i = a_i - a_{i-1}, and U_{ij} is a sample from \mathcal{U}(0, 1). Letting m = 2, N_i = N, and denoting a_1 = a, we get for (4.3.76)

\hat\theta_6 = \frac{1}{N} \sum_{j=1}^{N} \{ a g(a U_{1j}) + (1 - a) g[a + (1 - a) U_{2j}] \}.  (4.3.77)

Let us now make U_{1j} and U_{2j} dependent. Assuming U_{1j} = U_{2j} = U_j, we obtain

\hat\theta_6' = \frac{1}{N} \sum_{j=1}^{N} \{ a g(a U_j) + (1 - a) g[a + (1 - a) U_j] \}  (4.3.78)

or, alternatively, assuming U_{1j} = 1 - U_{2j} = U_j, we have

\hat\theta_6'' = \frac{1}{N} \sum_{j=1}^{N} \{ a g(a U_j) + (1 - a) g[a + (1 - a)(1 - U_j)] \}.  (4.3.79)
It is easy to see that both \hat\theta_6' and \hat\theta_6'' are estimators of the antithetic variates type. If a = \frac{1}{2}, then (4.3.79) reduces to (4.3.64). Consider now a case with two strata for (4.3.72). Assume the domain of f_X(x) is broken up by a point x_1 into the ranges -\infty < x < x_1 and x_1 \le x < \infty. By analogy with (4.3.79) an unbiased estimator of I is
\frac{1}{N} \sum_{i=1}^{N} [a g(X_i) + (1 - a) g(X_i')],  (4.3.80)

where

X_i = F^{-1}(a U_i)  (4.3.81)

and

X_i' = F^{-1}[a + (1 - a) U_i].  (4.3.82)

In the particular case when a = \frac{1}{2}, (4.3.80)-(4.3.82) reduce to (4.3.73).
We can try to obtain an a that minimizes the variance in (4.3.80). Generally, this problem is difficult to solve, because the variance need not be unimodal with respect to a. In Chapter 7 some techniques for multiextremal optimization are considered.
4.3.6 Partition of the Region
In this technique [21] we break the region D into two parts, D = D_1 \cup D_2, representing the integral I as

I = \int_{D_1} g(x)\,dx + \int_{D_2} g(x)\,dx.  (4.3.83)

Let us assume that the integral

I_1 = \int_{D_1} g(x)\,dx  (4.3.84)

can be calculated analytically, and let us define a truncated p.d.f.

f_{\tilde{X}}(x) = \frac{f_X(x)}{1 - P} if x \in D_2, \qquad f_{\tilde{X}}(x) = 0 otherwise,  (4.3.85)

where P = \int_{D_1} f_X(x)\,dx. Formula (4.3.83) can be written as

I = I_1 + (1 - P)\,E\left[ \frac{g(\tilde{X})}{f_X(\tilde{X})} \right],  (4.3.86)

where \tilde{X} is distributed with the truncated p.d.f. (4.3.85). An unbiased estimator of I is then

\eta_8 = I_1 + (1 - P) \frac{g(\tilde{X})}{f_X(\tilde{X})},  (4.3.87)

and the integral I can be estimated by

\hat\theta_8 = I_1 + \frac{1 - P}{N} \sum_{i=1}^{N} \frac{g(\tilde{X}_i)}{f_X(\tilde{X}_i)}.  (4.3.88)
Proposition 4.3.3

var \hat\theta_8 \le (1 - P)\,var \hat\theta_3.  (4.3.89)

Proof  We have from (4.3.4) that

var \hat\theta_3 = \frac{1}{N} \left( \int_D \frac{g^2(x)}{f_X(x)}\,dx - I^2 \right)  (4.3.90)

and, correspondingly, from (4.3.88) that

var \hat\theta_8 = \frac{1}{N} \left( (1 - P) \int_{D_2} \frac{g^2(x)}{f_X(x)}\,dx - (I - I_1)^2 \right).  (4.3.91)

Multiplying (4.3.90) by (1 - P) and subtracting (4.3.91), we obtain

(1 - P)\,var \hat\theta_3 - var \hat\theta_8 = \frac{1}{N} \left( (1 - P) \int_{D_1} \frac{g^2(x)}{f_X(x)}\,dx - (1 - P) I^2 + (I - I_1)^2 \right).  (4.3.92)

Now introducing

c_1 = \int_{D_1} \frac{g^2(x)}{f_X(x)}\,dx - \frac{I_1^2}{P},

which is nonnegative by the Cauchy-Schwarz inequality, we have

N[(1 - P)\,var \hat\theta_3 - var \hat\theta_8] = (1 - P) c_1 + (P^{1/2} I - P^{-1/2} I_1)^2 \ge 0,

and Proposition 4.3.3 is proved. Q.E.D.

As a result of the proposition, we find that this technique is at least (1 - P)^{-1} times more efficient than the sample-mean Monte Carlo method.
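The following fragment sketches the estimator (4.3.88) in one dimension. The choices g(x) = e^x on [0, 1], f_X uniform, and D_1 = [0, 0.5] (so that I_1 = e^{0.5} - 1 is exact and P = 0.5) are illustrative assumptions.

```python
import math
import random

def partition_mc(n, rng):
    # Partition of the region on [0, 1] with f_X uniform:
    # D1 = [0, 0.5] is integrated analytically, D2 = (0.5, 1] is sampled
    # from the truncated (conditional) uniform density f_X(x)/(1 - P).
    i1 = math.exp(0.5) - 1.0          # exact integral of e^x over D1
    p = 0.5                           # P = integral of f_X over D1
    # With f_X = 1, g(x)/f_X(x) = g(x); X-tilde is uniform on (0.5, 1].
    s = sum(math.exp(0.5 + 0.5 * rng.random()) for _ in range(n))
    return i1 + (1.0 - p) * s / n     # estimator (4.3.88)

rng = random.Random(5)
est = partition_mc(50_000, rng)       # integral of e^x over [0, 1]
```

Only the part of the integral that cannot be done analytically is left to simulation, in line with the (1 - P)^{-1} efficiency gain above.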
4.3.7 Reducing the Dimensionality
This approach is due to Buslenko [21] and is sometimes called the expected-value technique. Let us assume that the integral

I = \int_D g(x) f_X(x)\,dx, \qquad x \in D \subset R^n,  (4.3.94)

can be represented as

I = \int \int g(y, z) f_{Y,Z}(y, z)\,dy\,dz,  (4.3.95)

where

y = (x_1, \ldots, x_r) \in D_1 \subset R^r

and

z = (x_{r+1}, \ldots, x_n) \in D_2 \subset R^{n-r}.

Assume also that the integration with respect to z can be performed analytically, that is, the marginal p.d.f.

f_Y(y) = \int f_{Y,Z}(y, z)\,dz  (4.3.96)

and the conditional expectation

E_Z[g(Y, Z) \mid Y = y] = \int g(y, z) \frac{f_{Y,Z}(y, z)}{f_Y(y)}\,dz  (4.3.97)

can be found analytically. It is obvious that

I = E_Y \{ E_Z[g(Y, Z) \mid Y] \}.  (4.3.98)

An unbiased estimator of I is

\eta_9 = E_Z[g(Y, Z) \mid Y],  (4.3.99)

and I can be estimated by

\hat\theta_9 = \frac{1}{N} \sum_{i=1}^{N} E_Z[g(Y_i, Z) \mid Y_i],  (4.3.100)

where Y_i, i = 1, \ldots, N, are distributed with p.d.f. f_Y(y).
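A minimal sketch of the estimator (4.3.100): with Y and Z independent Uniform(0, 1) and g(y, z) = e^y z (illustrative assumptions), the inner expectation is available in closed form, E_Z[g | Y] = 0.5 e^Y, so only Y is simulated.

```python
import math
import random

def conditional_expectation_mc(n, rng):
    # Reducing the dimensionality: for g(y, z) = exp(y) * z with Y, Z
    # independent Uniform(0, 1), the expectation over z is analytic,
    # E_Z[g | Y] = 0.5 * exp(Y), so the z-dimension is integrated out.
    return sum(0.5 * math.exp(rng.random()) for _ in range(n)) / n

rng = random.Random(6)
est = conditional_expectation_mc(50_000, rng)   # true value: (e - 1) / 2
```

By Proposition 4.3.4 below, averaging the conditional expectation can never have larger variance than averaging g(Y, Z) itself.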
Proposition 4.3.4  If integration can be performed analytically with respect to some of the variables, then the variance will be reduced, that is,

var \eta_9 \le var \eta_5,  (4.3.101)

where \eta_5 is the sample-mean estimator.

Proof  The proof is quite simple. Denote V = g(Y, Z). Using the well-known formula [17]

var V = var_Y \{ E_Z(V \mid Y) \} + E_Y \{ var_Z(V \mid Y) \}  (4.3.102)

and noticing that \eta_5 = V, \eta_9 = E_Z(V \mid Y), and E_Y \{ var_Z(V \mid Y) \} \ge 0, the result follows immediately. Q.E.D.

4.3.8 Conditional Monte Carlo
Proof The proof is quite simple. Denote V = g( Y,Z). Now using the well known formula [I71 var V-var,{€,(Y~Y)} +E,(var,(VIY)) (4.3.102) and noticing that q4 = V,v9 = E,[g(ZIY)]= E , ( V ( Y ) , and E,-var,(V(Y) Q.E.D. 2 0, the result folIows immediately. 43% Couditional Monte Carlo
If the problem under consideration is very complex-the sample space is complicated, or the p.d.f. is difficult to generate from-then it may be possible to embed the given sample space in a much larger space in which the desired density function appears as a conditional probability. Simulation of the large problem can be much simpler than the original complex problem and, despite the added computation required to calculate the conditional probabilities, the gain in efficiency can be quite high. This technique was developed by Trotter and Tukey [24]. Our nomenclature follows Hammersley and Handscomb's book [lo]. Consider again the problem of estimating (4.3.103) Let D be embedded in a product space 51 = D X X . Each point of s 1 5 D X R can be written in the form z = (x,y),+wherex E D andy E R. Let h ( z ) 5 h ( x , y ) be an arbitrary density function, let + ( z ) = + ( x , y ) be an arbitrary real function, both defined on 52, and let
(4.3.104) We also assume that both h ( r ) and J / ( x ) are never zero. We may regard x andy as the first and second coordinates of z so that x is a function of z, which maps the points 52 onto D. Let dz denote the volume element swept out in S2 when x and y sweep out volume elements dx and dy in D and R, respectively. The Jacobian of the transformation I = ( x , y ) is dx 4 9 ( z ) =4(x,y) =dz '
(4.3.105)
We define the weight function

w(z) = \frac{\phi(z)\,\vartheta(z)\,f_X(x)}{h(z)\,\psi(x)}.  (4.3.106)

Then we have the following identity:

I = \int_D g(x) f_X(x)\,dx = \int_\Omega g(x) w(z) h(z)\,dz,  (4.3.107)

so that

I = E[g(X) w(Z)],  (4.3.108)

where X is the first coordinate of the random vector Z sampled from \Omega with p.d.f. h(z). The unbiased estimator of I is then of the form

\eta_{10} = g(X) w(Z).  (4.3.109)

Both functions \phi and h, and also the region R, are at our disposal; we may choose them to simplify the sampling procedure and to minimize the variance of the estimator \eta_{10}. We now consider a particular case. Let h(z) be a given distribution on the product space \Omega = D \times R, and let f_X(x \mid y_0) be the conditional distribution of h(z) given y = y_0. If we write P(y) for the p.d.f. of Y when Z = (X, Y) has p.d.f. h(z), we have

h(z)\,\vartheta(z) = f_X(x \mid y) P(y),  (4.3.110)

and comparison of (4.3.106) and (4.3.110) gives

w(z) = \frac{\phi(z)\,f_X(x)}{\psi(x)\,f_X(x \mid y)\,P(y)}.  (4.3.111)

In particular, if the target density is the conditional one, f_X(x) = f_X(x \mid y_0),  (4.3.112)

then (4.3.111) becomes

w(Z) = \frac{\phi(Z)\,f_X(X \mid y_0)}{\psi(X)\,f_X(X \mid Y)\,P(Y)}.  (4.3.113)
This leads to the following rule. Suppose that Z = (X, Y) is distributed on \Omega with p.d.f. h(z) = h(x, y); then

\eta_{10} = g(X) w(Z),

where w(Z) is given by (4.3.113), is an unbiased estimator of the conditional expectation of g(X) given that Y = y_0. Note that this rule requires neither sampling from the possibly awkward space D nor evaluation of the possibly complicated function f_X, and \phi is available for variance reduction.

4.3.9 Random Quadrature Method
Ermakov [4] suggested a quite general method of Monte Carlo integration based on orthonormal functions. We need some preliminary results before describing this method. Let \phi_i(x), i = 0, 1, \ldots, m, be a system of orthonormal functions over the region D, that is,

\int_D \phi_i(x) \phi_j(x)\,dx = \delta_{ij},  (4.3.114)

and let

g_m(x) = \sum_{i=0}^{m} c_i \phi_i(x)  (4.3.115)

be an interpolation formula for a given function g(x). The problem is to choose the c_i for a given set of points x_i \in D in such a way that

g_m(x_i) = g(x_i), \qquad i = 0, 1, \ldots, m;  (4.3.116)

that is, at the points x_i we require coincidence of the original g(x) and the approximating function g_m(x). To find the c_i we have to solve the following system of linear equations with respect to c_0, \ldots, c_m:

c_0 \phi_0(x_0) + c_1 \phi_1(x_0) + \cdots + c_m \phi_m(x_0) = g(x_0),
c_0 \phi_0(x_1) + c_1 \phi_1(x_1) + \cdots + c_m \phi_m(x_1) = g(x_1),
\ldots
c_0 \phi_0(x_m) + c_1 \phi_1(x_m) + \cdots + c_m \phi_m(x_m) = g(x_m).  (4.3.117)

Applying, for instance, Cramer's rule, we find

c_0 = \frac{w_g(x_0, x_1, \ldots, x_m)}{w(x_0, x_1, \ldots, x_m)},  (4.3.118)

where

w(x_0, x_1, \ldots, x_m) = \det[\phi_j(x_i)]_{i,j=0}^{m}  (4.3.119)
is the (m+1) \times (m+1) determinant and w_g(x_0, x_1, \ldots, x_m) is the corresponding determinant in which the column vector (\phi_0(x_0), \ldots, \phi_0(x_m)) is replaced by the right-hand-side vector (g(x_0), g(x_1), \ldots, g(x_m)). With these results at hand, let us consider the problem of calculating the integral

I_0 = \int_D \phi_0(x) g(x)\,dx.  (4.3.120)

Substituting (4.3.115) into the last formula, we have

\hat{I}_0 = \int_D \phi_0(x) g_m(x)\,dx,  (4.3.121)

which is an approximation of I_0 and is called an interpolation quadrature formula [4] for I_0. Taking into consideration the orthonormality condition (4.3.114), we immediately obtain

\hat{I}_0 = c_0.  (4.3.122)

Therefore the value of the integral I_0 is approximately equal to the coefficient c_0 in the interpolation formula (4.3.115) and can be calculated by Cramer's rule (4.3.118). Ermakov [4] suggested choosing the points x_i \in D in the interpolation formula (4.3.115) according to some probabilistic law rather than determining them in advance. Assuming that the x_i, the c_i, or both of them are random variables, he called (4.3.115) a random quadrature formula, which is a natural generalization of the interpolation formula (4.3.115) with deterministic x_i and c_i. He proved the following theorem.
Theorem 4.3.3  Let

\hat{I}_0 = \frac{w_g(X_0, \ldots, X_m)}{w(X_0, \ldots, X_m)} if X = (X_0, \ldots, X_m) \in B_+ \subset R^{m+1}, \qquad \hat{I}_0 = 0 if X \in B_0 \subset R^{m+1},  (4.3.123)

where X is distributed with the p.d.f.

h(x_0, \ldots, x_m) = \frac{[w(x_0, \ldots, x_m)]^2}{(m+1)!},  (4.3.124)

and where

B_0 = \{x : w(x) = 0\} \qquad and \qquad B_+ = \{x : w(x) \ne 0\}.
Then \hat{I}_0 is an unbiased estimator of I_0, that is,

E(\hat{I}_0) = I_0,  (4.3.125)

with variance

var \hat{I}_0 = \frac{1}{(m+1)!} \int_{B_+} w_g^2(x_0, \ldots, x_m)\,dx_0 \cdots dx_m - I_0^2.  (4.3.126)
The proof of the theorem, as well as some generalizations and applications, can be found in Ermakov's monograph [4]. This method offers great possibilities because of its general character. But it also has some weak points: first, we must define a set of orthonormal functions over the region D; second, we must find an efficient way of sampling X_0, X_1, \ldots, X_m with the joint p.d.f. \frac{1}{(m+1)!} [w(x_0, x_1, \ldots, x_m)]^2. Even then the computation of \hat{I}_0 is generally no small matter, and therefore the random quadrature method seems to be of rather limited practicality.

4.3.10
Biased Estimators
Until now we have considered unbiased estimators for computing integrals. Using biased estimators, we can sometimes achieve useful results. Let us estimate the integral

I = \int_D g(x)\,dx  (4.3.127)

by

\hat\theta_{11} = \frac{\sum_{i=1}^{N} g(U_i)}{\sum_{i=1}^{N} f_X(U_i)}  (4.3.128)

instead of using the usual sample-mean estimator

\hat\theta_3 = \frac{1}{N} \sum_{i=1}^{N} \frac{g(X_i)}{f_X(X_i)}.  (4.3.129)

Here U is distributed uniformly in D, and X is distributed according to f_X(x).
It is clear that E(\hat\theta_{11}) \ne I in general, that is, \hat\theta_{11} is a biased estimator of I. Let us show that \hat\theta_{11} is consistent. To prove consistency let us represent \hat\theta_{11} as a ratio of two random variables \hat\theta_{11}' and \hat\theta_{11}'', that is,

\hat\theta_{11} = \frac{\hat\theta_{11}'}{\hat\theta_{11}''},  (4.3.130)

where

\hat\theta_{11}' = \frac{1}{N} \sum_{i=1}^{N} g(U_i)  (4.3.131)

and

\hat\theta_{11}'' = \frac{1}{N} \sum_{i=1}^{N} f_X(U_i).  (4.3.132)

By the strong law of large numbers,

\lim_{N \to \infty} \frac{\sum_{i=1}^{N} g(U_i)}{\sum_{i=1}^{N} f_X(U_i)} = \frac{\int_D g(x)\,dx}{\int_D f_X(x)\,dx} = I \qquad a.s., \qquad provided \int_D |g(x)|\,dx < \infty.  (4.3.135)
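The ratio estimator (4.3.128) can be sketched as follows; the integrand e^x and the density f_X(x) = (1 + x)/1.5 on [0, 1] are illustrative assumptions, with f_X deliberately chosen to have a shape similar to g.

```python
import math
import random

def weighted_uniform_sampling(g, f, n, rng):
    # Biased but consistent ratio estimator (4.3.128):
    # sum g(U_i) / sum f(U_i) with U_i ~ Uniform(0, 1).  Because f is a
    # p.d.f. on [0, 1], the ratio converges a.s. to integral(g) = I.
    us = [rng.random() for _ in range(n)]
    return sum(g(u) for u in us) / sum(f(u) for u in us)

rng = random.Random(7)
est = weighted_uniform_sampling(math.exp,
                                lambda x: (1.0 + x) / 1.5,
                                50_000, rng)   # integral of e^x on [0,1]
```

Note that only uniform random numbers are needed; f_X is evaluated but never sampled from, which is exactly the advantage discussed next.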
One major advantage of this method is that the sample is taken from a uniform distribution rather than from a general f_X(x), from which the generation of r.v.'s can be difficult (recall, for instance, that in importance sampling f_X(x) has to be approximately proportional to |g(x)|, and if g(x) is a complicated function, it is difficult to generate from f_X(x)). Powell and Swann [20] called this method weighted uniform sampling. They showed that for sufficiently large N this method is more efficient than the sample-mean method.

4.3.11 Weighted Monte Carlo Integration
Yakowitz et al. [27] suggested estimating the integral

I = \int_0^1 g(x)\,dx

using the following Monte Carlo procedure:

1 Generate U_1, \ldots, U_N from \mathcal{U}(0, 1).
2 Arrange U_1, \ldots, U_N in increasing order U_{(1)}, \ldots, U_{(N)}.
3 Estimate the integral by

\hat\theta_{13} = \sum_{i=0}^{N} \frac{1}{2} [g(U_{(i)}) + g(U_{(i+1)})](U_{(i+1)} - U_{(i)}),  (4.3.137)

where U_{(0)} = 0 and U_{(N+1)} = 1. They proved the following proposition.

Proposition 4.3.5  Assume g(x) is a function with a continuous second derivative on [0, 1]. If \{U_{(i)}\}_{i=1}^{N} is the ordered sample associated with N independent uniform observations, then

var \hat\theta_{13} = E(\hat\theta_{13} - I)^2 \le \frac{k}{N^4},  (4.3.138)

where k is some positive constant. It is also shown in [27] that in the one-dimensional case var \hat\theta_{13} = O(1/N^4), which is much less than the O(1/N) variance of the sample-mean Monte Carlo method, and that in the two-dimensional case var \hat\theta_{13} = O(1/N^2), which is bigger than var \hat\theta_{13} in the one-dimensional case but still less than the O(1/N) variance of the sample-mean Monte Carlo method. Unfortunately, Yakowitz et al.'s method becomes inefficient as the dimensionality of x increases.
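The three-step procedure above amounts to a trapezoidal rule on the random partition formed by the ordered sample; a sketch (the integrand e^x is an illustrative assumption):

```python
import math
import random

def weighted_mc_trapezoid(g, n, rng):
    # Estimator (4.3.137): sort n uniforms, then apply the trapezoidal
    # rule on the partition 0 = U(0) < U(1) < ... < U(n) < U(n+1) = 1.
    us = sorted(rng.random() for _ in range(n))
    pts = [0.0] + us + [1.0]
    return sum(0.5 * (g(pts[i]) + g(pts[i + 1])) * (pts[i + 1] - pts[i])
               for i in range(len(pts) - 1))

rng = random.Random(8)
est = weighted_mc_trapezoid(math.exp, 1_000, rng)  # integral of e^x
```

Even with N = 1000 the error is far below what the sample-mean method achieves, consistent with the O(1/N^4) rate of Proposition 4.3.5.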
4.3.12 More about Variance Reduction (Queueing Systems and Networks)
In this section we consider two more examples of the application of variance reduction techniques, taken from Refs. 29, 32, and 33. The first example is a single server queue GI/G/1; the second, a network. Some other examples of variance reduction applied to different problems can be found in Refs. 28 through 46.

(a) Single Server Queue GI/G/1 [46]  Consider a single server queueing system GI/G/1, with general distributions of service and interarrival time. We assume that, if an arriving customer finds the server free, his service commences immediately, and he departs from the system after completion of his service. If the arriving customer finds the server busy, he enters the waiting room and waits for his turn to be served. Customers are served on a first-in-first-out (FIFO) basis. Let $S_i$ denote the service time of the $i$th customer, who arrives at time $t_i$, and let $A_i = t_i - t_{i-1}$, $i \ge 1$, denote the interarrival time (the time between the arrivals of the $(i-1)$th and $i$th customers). Assume that the sequences $\{S_i, i \ge 0\}$ and $\{A_i, i \ge 1\}$ each consist of i.i.d. r.v.'s and are themselves independent. Let $\mu$ be the mean service rate, and let $\lambda$ be the mean arrival rate, that is,

$$E(S_i) = \mu^{-1} \qquad\text{and}\qquad E(A_i) = \lambda^{-1}.$$
The parameter $\rho = \lambda/\mu$ is called the traffic intensity and measures the congestion of the queueing system. The necessary and sufficient condition for the system to reach a steady state (to become stable) is $\rho < 1$. To measure the performance of the system we can use the mean waiting time of the $i$th customer (time from arrival to commencement of service); the number of customers in the system at time $t$; the amount of time in the interval $[0, t]$ that the server is busy; or the total number of customers who have been served in the interval $[0, t]$. As our measure of performance we take the mean waiting time of the $i$th customer and denote it by $E(W_i)$. We assume that customer 0 arrives at time $t_0 = 0$ and finds an empty system. The following recursive formula is well known [33]:
$$W_0 = 0, \qquad W_i = \max(W_{i-1} - A_i + S_{i-1},\, 0) = (W_{i-1} - A_i + S_{i-1})^{+}, \qquad i = 1, 2, \ldots.\qquad(4.3.139)$$

Usually, for the GI/G/1 queueing system it is difficult to find $E(W_i)$ analytically, and simulation may be used. In order to estimate $E(W_i)$ we run the queueing system $N$ times, each time starting from $t_0 = 0$, obtain a
sequence of service times $\{S_{ik}, i \ge 0, k = 1,\ldots,N\}$ and a sequence of interarrival times $\{A_{ik}, i \ge 1, k = 1,\ldots,N\}$, and estimate $E(W_i)$ by the sample-mean formula
$$\bar W_i = \frac{1}{N}\sum_{k=1}^{N} W_{ik},\qquad(4.3.140)$$

where $W_{ik} = (W_{(i-1)k} - A_{ik} + S_{(i-1)k})^{+}$, $W_{0k} = 0$.

We now explain how the antithetic and control variates methods can be applied for variance reduction, thereby improving the efficiency of the simulation. Both methods are based on reuse of the same random numbers.

Antithetic variates.  Let $F_1(x)$ be the c.d.f. of the interarrival time $A_i$ and let $F_2(x)$ be the c.d.f. of the service time $S_i$. Let us generate two sequences of random numbers $\{U_{ik}^{(1)}, i \ge 0, k = 1,\ldots,N\}$ and $\{U_{ik}^{(2)}, i \ge 0, k = 1,\ldots,N\}$, and obtain two corresponding sequences $A_{ik} = F_1^{-1}(U_{ik}^{(1)})$ and $S_{ik} = F_2^{-1}(U_{ik}^{(2)})$ of interarrival and service times. Introducing the antithetic sequences $\{1 - U_{ik}^{(1)}, i \ge 0, k = 1,\ldots,N\}$ and $\{1 - U_{ik}^{(2)}, i \ge 0, k = 1,\ldots,N\}$, we can define another two sequences $A'_{ik} = F_1^{-1}(1 - U_{ik}^{(1)})$ and $S'_{ik} = F_2^{-1}(1 - U_{ik}^{(2)})$ of interarrival and service times and estimate the mean waiting time $E(W_i)$ by

$$\bar W_i^{(a)} = \frac{1}{2N}\sum_{k=1}^{N}\bigl(W_{ik} + W'_{ik}\bigr),\qquad(4.3.141)$$

where

$$W'_{ik} = \Bigl[W'_{(i-1)k} - F_1^{-1}\bigl(1 - U_{ik}^{(1)}\bigr) + F_2^{-1}\bigl(1 - U_{(i-1)k}^{(2)}\bigr)\Bigr]^{+}.$$

Now

$$\operatorname{var}\bar W_i^{(a)} = \frac{1}{4N}\bigl[\operatorname{var} W_i + \operatorname{var} W'_i + 2\operatorname{cov}(W_i, W'_i)\bigr] = \frac{1}{2N}\bigl[\operatorname{var} W_i + \operatorname{cov}(W_i, W'_i)\bigr].\qquad(4.3.142)$$

By analogy with (4.3.65) we can conclude that the method of antithetic variates will be more efficient than the sample-mean method if

$$\operatorname{var}\bar W_i^{(a)} \le \tfrac{1}{4}\operatorname{var}\bar W_i,\qquad(4.3.143)$$

which means that $\operatorname{cov}(W_i, W'_i)$ is negative and $|\operatorname{cov}(W_i, W'_i)| > \tfrac{1}{2}\operatorname{var} W_i$.
Page [46] suggested estimating $E(W_i)$ by

$$\bar W_i^{(p)} = \frac{1}{2N}\sum_{k=1}^{N}\bigl(W_{ik} + W''_{ik}\bigr),\qquad(4.3.144)$$

where $W''_{ik} = \bigl[W''_{(i-1)k} - F_1^{-1}\bigl(U_{ik}^{(2)}\bigr) + F_2^{-1}\bigl(U_{(i-1)k}^{(1)}\bigr)\bigr]^{+}$. Comparing the estimators $\bar W_i^{(a)}$ and $\bar W_i^{(p)}$, we can see that the antithetic pairs $1 - U_{ik}^{(1)}$ and $1 - U_{ik}^{(2)}$ in $\bar W_i^{(a)}$ were replaced, correspondingly, by $U_{ik}^{(2)}$ and $U_{ik}^{(1)}$ in $\bar W_i^{(p)}$. Mitchell [45] proved that, for any $i > 0$, both estimators $\bar W_i^{(a)}$ and $\bar W_i^{(p)}$ are more efficient than the sample-mean estimator.

Control variates.  It is suggested in Ref. 33 that

$$C_i = C_{i-1} - A_i + S_{i-1}, \qquad C_0 = 0,\qquad(4.3.145)$$

be chosen as a control variate for $W_i = \max(W_{i-1} - A_i + S_{i-1},\, 0)$, $W_0 = 0$. Table 4.3.1 presents $\operatorname{var}(\bar W_i)$ for different methods and for the 200th customer, based on 25 runs. The service time has an exponential distribution with mean $\mu^{-1} = 1.111$; the interarrival time is assumed to be constant and equal to unity, and at time $t_0 = 0$ there are no customers in the system. We can see that the effect of variance reduction by the antithetic and control variates is substantial.

(b) Networks

(i) Antithetic variates.  To illustrate the use of antithetic variates for networks, consider the network shown in Fig. 4.3.1. Suppose we wish to estimate the expected completion time of $T = T_1 + T_2$ by simulation, assuming that $T_1$ and $T_2$ are independent. The procedure of using antithetic variates for estimating $E(T)$ is straightforward and can be written as:

1 Generate two sequences of random numbers $\{U_i^{(1)}, i = 1,\ldots,N\}$ and $\{U_i^{(2)}, i = 1,\ldots,N\}$.
Table 4.3.1  var(W_i) for Different Methods

Method       Sample-Mean    Antithetic Variates    Control Variates
var(W_i)     10.678         1.770                  1.427

Source: Data from Ref. 33.
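The Lindley recursion (4.3.139) and the antithetic pairing described above can be sketched as follows. The parameters here (deterministic interarrival time 1, exponential service with mean 0.9 so that $\rho = 0.9$, customer 200, 400 replications) are illustrative assumptions, not the exact experiment of Table 4.3.1:

```python
import math
import random
import statistics

MEAN_S, I, N = 0.9, 200, 400   # assumed parameters for the sketch

def w_i(us):
    """Waiting time of customer I via the Lindley recursion (4.3.139):
    W_0 = 0, W_j = (W_{j-1} - A_j + S_{j-1})^+, with A_j = 1 and the
    service time S_{j-1} = F2^{-1}(u_{j-1}) (inverse exponential c.d.f.)."""
    w = 0.0
    for j in range(1, I + 1):
        s_prev = -MEAN_S * math.log(1.0 - us[j - 1])
        w = max(w - 1.0 + s_prev, 0.0)
    return w

rng = random.Random(3)
plain, anti = [], []
for _ in range(N):
    us = [rng.random() for _ in range(I)]
    w = w_i(us)
    plain.append(w)
    # antithetic replication: reuse 1 - u for the service times, average the pair
    anti.append(0.5 * (w + w_i([1.0 - u for u in us])))

print(statistics.variance(plain), statistics.variance(anti))
```

Because the waiting time is a monotone function of the underlying uniforms, the antithetic pair is negatively correlated and the pair-averaged replications show a markedly smaller variance, in the spirit of Table 4.3.1.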
Fig. 4.3.1  Network (from Ref. 29): two links, T_1 and T_2.
2 Compute $T_{1i} = F_1^{-1}(U_i^{(1)})$, $T_{2i} = F_2^{-1}(U_i^{(2)})$, $T'_{1i} = F_1^{-1}(1 - U_i^{(1)})$, and $T'_{2i} = F_2^{-1}(1 - U_i^{(2)})$.
3 Estimate $E(T)$ by

$$\bar T^{(a)} = \frac{1}{2N}\sum_{i=1}^{N}\bigl(T_i + T'_i\bigr).\qquad(4.3.146)$$

Let us assume that both $T_1$ and $T_2$ are distributed $\exp(1)$. Then, denoting $T_i = T_{1i} + T_{2i}$ and $T'_i = T'_{1i} + T'_{2i}$, we obtain

$$\operatorname{var}\bigl(\bar T^{(a)}\bigr) = \frac{1}{4N}\operatorname{var}(T_i + T'_i) = \frac{1}{N}\Bigl(2 - \frac{\pi^2}{6}\Bigr) \approx \frac{0.355}{N}.$$

On the other hand, in the sample-mean method with $2N$ runs we have

$$\operatorname{var}(\bar T) = \frac{2}{2N} = \frac{1}{N}.\qquad(4.3.147)$$

Thus the variance has been reduced to about one third. It can be proven, by analogy with Proposition 4.3.3, that for any continuous r.v.'s $T_1$ and $T_2$ the method of antithetic variates is more efficient than the sample-mean method. This simple example has been chosen solely to simplify the presentation. The method of antithetic variates can be successfully employed for more complex networks.

Control variates.  Consider the network shown in Fig. 4.3.2. We are interested in finding $E(T_{AB})$, the mean completion time of the network. We
Fig. 4.3.2  Network.
Fig. 4.3.3  The upper control network.
Fig. 4.3.4  The lower control network.
assume that all $T_i$, $i = 1,\ldots,10$, are independent exponentially distributed r.v.'s with the same mean, equal to 10. Even in this case it is difficult to calculate $E(T_{AB})$ because of the "crossing" link of duration $T_{10}$. It is suggested in Ref. 29 that the control networks be chosen as subnetworks of the original complex network, formed by deleting links with low probabilities of falling within the critical path. Two such control networks are shown in Figs. 4.3.3 and 4.3.4: the upper and lower control networks, respectively. For these two control networks the mean completion times are available analytically. Table 4.3.2 presents simulation results for the expected value and the variance of the completion time for the network in Fig. 4.3.2. The
Table 4.3.2  Simulation Results for the Network in Fig. 4.3.2

                           Sample-   Antithetic   Control Variates
Method                     Mean      Variates     Upper Network   Lower Network
Expected Value E(T_AB)     55.1      54.1         54.3            53.8
Variance var(T_AB)         6.2       4.0          3.8             3.1

Source: Data from Ref. 29.
following methods are considered: sample-mean, antithetic variates, and control variates, using both the upper and the lower control networks. The simulation results are based on 50 runs. It is clear that the degree of variance reduction depends on our skill in selecting the control networks, which is not an easy problem.
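For the two-link network of Fig. 4.3.1 ($T = T_1 + T_2$ with independent $\exp(1)$ links), a short simulation confirms the roughly one-third variance ratio derived above; the seed and run counts are arbitrary assumptions:

```python
import math
import random
import statistics

N = 2000
rng = random.Random(5)

def exp_inv(u):
    """Inverse c.d.f. of exp(1): F^{-1}(u) = -ln(1 - u)."""
    return -math.log(1.0 - u)

plain, anti = [], []
for _ in range(N):
    u1, u2 = rng.random(), rng.random()
    t = exp_inv(u1) + exp_inv(u2)                    # completion time T_i
    t_anti = exp_inv(1.0 - u1) + exp_inv(1.0 - u2)   # antithetic T'_i
    anti.append(0.5 * (t + t_anti))
    # two independent plain runs, so both methods use 2N simulations in total
    plain.append(t)
    plain.append(exp_inv(rng.random()) + exp_inv(rng.random()))

v_plain = statistics.variance(plain) / len(plain)  # variance of the 2N-run mean
v_anti = statistics.variance(anti) / len(anti)     # variance of estimator (4.3.146)
print(v_anti / v_plain)  # close to 2 - pi^2/6, roughly one third
```

The observed ratio matches the theoretical value $2 - \pi^2/6 \approx 0.355$, i.e. the variance is reduced to about one third of the sample-mean value.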
EXERCISES

1 Apply Chebyshev's inequality to find the minimal sample size $N$ for which the following formula will hold:

$$P\{|\theta_N - I| \le \varepsilon\} = \alpha,$$

where

$$\theta_N = (b-a)\frac{1}{N}\sum_{i=1}^{N} g(X_i)$$

and $X \sim \mathcal{U}(a, b)$.
2 Assuming that for sufficiently large $N$ the estimator $\theta_N$ is approximately normally distributed, find the confidence interval for $I$ with the level of significance $\alpha$.
3 Prove Theorem 4.3.2. Hint: apply Bellman's dynamic programming recursive equation, or Lagrangian multipliers.

4 Let $I = \sum_{i=1}^{n} a_i I_i$, where $I_i = \int g_i(x)\,dx$ and the $a_i$ are known coefficients. An unbiased estimator of $I$ is

$$\theta = \frac{\sum_{i=1}^{n} a_i g_i(X)}{f_X(X)},$$

where $f_X(x)$ is a multidimensional distribution.
(a) Prove that $\min_{f_X(x)}\operatorname{var}(\theta)$ is achieved when $f_X(x) = |Q(x)|/\int |Q(x)|\,dx$, where $Q(x) = \sum_{i=1}^{n} a_i g_i(x)$, and is equal to

$$\Bigl(\int |Q(x)|\,dx\Bigr)^2 - I^2.$$

(b) Prove that, if $X_1,\ldots,X_n$ are independent, then $\min_{f_X(x)}\operatorname{var}(\theta)$ is achieved when

$$f_{X_i}(x) = \frac{|g_i(x)|}{\int |g_i(x)|\,dx}, \qquad i = 1,\ldots,n,$$

and is equal to

$$\sum_{i=1}^{n} a_i^2\Bigl[\Bigl(\int |g_i(x)|\,dx\Bigr)^2 - I_i^2\Bigr].$$

From Evans [5].

5 Consider the integral $I = \int_a^b g(x)\,dx$, which can be estimated both by the sample-mean Monte Carlo method,

$$\theta = (b-a)\frac{1}{N}\sum_{i=1}^{N} g(X_i),$$

and by the antithetic variates method,

$$\bar\theta = \frac{b-a}{2N}\sum_{i=1}^{N}\bigl[g(X_i) + g(a + b - X_i)\bigr],$$

where the sample $X_i$, $i = 1,\ldots,N$, is taken from $\mathcal{U}(a, b)$. Under the assumptions of Proposition 4.3.2 prove $\operatorname{var}\bar\theta \le \frac{1}{2}\operatorname{var}\theta$.

6 Let $m = 2$, $N_1 = N_2 = N/2$, in the stratified sampling method. According to Proposition 4.3.1, $P_1 = \frac{1}{2}$ and $P_2 = \frac{1}{2}$. Prove that if we choose $P_1 = \frac{1}{4}$ and $P_2 = \frac{3}{4}$, then for any $g(x) \in L_2$, $\operatorname{var}\hat\theta > \operatorname{var}\theta$, that is, the stratified sampling method is worse than the sample-mean method. From Ermakov [4].

7 Prove by induction on $m$ that
$$\int w^2(x_0, x_1, \ldots, x_m)\,dx_0\,dx_1\cdots dx_m = (m+1)!,$$

where $w(x_0, x_1,\ldots,x_m)$ is defined in (4.3.119). From Sobol [22].
8 Find an estimator for

$$I = \int_0^{\infty} g(x)e^{-kx}\,dx, \qquad k > 0,$$

assuming that the sample is taken from the exponential distribution $f_X(x) = \lambda e^{-\lambda x}$, $\lambda > 0$. Prove that, for $g(x) = cx^n$, $c > 0$, the minimum variance of the estimator will be achieved when $\lambda = k/(n+1)$. From Sobol [22].

9 Let $U$ be a random number and let $X = aU + b$ and $X' = a(1-U) + b$. Show that the correlation coefficient between $X$ and $X'$ is equal to $-1$.
10 Consider the following network: assume that $T_i$, $i = 1, 2, 3$, are i.i.d. r.v.'s with common c.d.f. $F_T(t)$. Write two formulas for estimating the expected completion time $E(T_{AB})$, using the following methods:

(a) Sample-mean Monte Carlo method.
(b) Antithetic variates.
11 Prove that, while integrating in a situation of noise (see Section 4.2.4), both estimators $\theta_1$ and $\theta_2$ converge a.s. and in mean square to $I = \int g(x)\,dx$, and that $\operatorname{var}\theta_2 < \operatorname{var}\theta_1$.

12 Let $I = \int g(x)h(x)\,dx = E(g(X))$, where $h(x)$ is a p.d.f. Let $f_X(x)$ be another p.d.f. An unbiased estimator of $I$ is

$$\theta = \frac{1}{N}\sum_{i=1}^{N}\frac{g(X_i)h(X_i)}{f_X(X_i)},$$

where the sample $X_1,\ldots,X_N$ is taken from $f_X(x)$. Prove that the minimal variance of $\theta$ is achieved when $f_X(x)$ is proportional to $|g(x)|h(x)$ and is equal to

$$\frac{1}{N}\Bigl[\Bigl(\int |g(x)|h(x)\,dx\Bigr)^2 - I^2\Bigr].$$

13 Show that the method of antithetic variates is a particular case of the method of control variates.
REFERENCES

1 Burt, J. M. and M. B. Garman, Conditional Monte Carlo: A simulation technique for stochastic network analysis, Manage. Sci., 18, 1971, 207-217.
2 Clark, C. E., Importance sampling in Monte Carlo analyses, Oper. Res., 9, 1961, 603-620.
3 Cochran, W. G., Sampling Techniques, 2nd ed., Wiley, New York, 1966.
4 Ermakov, S. M., Monte Carlo Method and Related Questions, Nauka, Moscow, 1976 (in Russian).
5 Evans, D. H., Applied multiplex sampling, Technometrics, 5, No. 3, 1963, 341-359.
6 Garman, M. B., More on conditional sampling in the simulation of stochastic networks, Manage. Sci., 17, 1972, 90-95.
7 Gray, K. G. and K. J. Travers, The Monte Carlo Method, Stipes, Champaign, Illinois, 1978.
8 McGrath, E. J., Fundamentals of Operations Research, West Coast University, 1970.
9 Halton, J. H., A retrospective and prospective survey of the Monte Carlo method, Soc. Indust. Appl. Math. Rev., 12, 1970, 1-63.
10 Hammersley, J. M. and D. C. Handscomb, Monte Carlo Methods, Wiley, New York, 1964.
11 Hammersley, J. M. and K. W. Morton, A new Monte Carlo technique: antithetic variates, Proc. Cambridge Phil. Soc., 52, 1956, 449-474.
12 Kahn, H. and A. W. Marshall, Methods of reducing sample size in Monte Carlo computations, Oper. Res., 1, 1953, 263-278.
13 Lavenberg, S. S. and P. D. Welch, A perspective on the use of control variables to increase the efficiency of Monte Carlo simulations, Research Report RC 8161, IBM Corporation, Yorktown Heights, New York, 1980.
14 Marshall, A. W., The use of multi-stage sampling schemes in Monte Carlo computations, in Symposium on Monte Carlo Methods, edited by M. A. Meyer, Wiley, New York, 1956, pp. 123-140.
15 Michailov, G. A., Some Problems in the Theory of the Monte Carlo Method, Nauka, Novosibirsk, U.S.S.R., 1974 (in Russian).
16 Mitchell, B., Variance reduction by antithetic variates in GI/G/1 queueing simulation, Oper. Res., 21, 1973, 988-997.
17 Mood, A. M., F. A. Graybill, and D. C. Boes, Introduction to the Theory of Statistics, 3rd ed., McGraw-Hill, New York, 1974.
18 Moshman, J., The application of sequential estimation to computer simulation and Monte Carlo procedures, J. Assoc. Comp. Mach., 5, 1968, 343-352.
19 Neuts, M., Probability, Allyn and Bacon, 1972.
20 Powell, M. J. D. and J. Swann, Weighted uniform sampling—a Monte Carlo technique for reducing variance, J. Inst. Math. Appl., 2, 1966, 228-238.
21 Shreider, Yu. A. (Ed.), The Monte Carlo Method (the Method of Statistical Trials), Pergamon, Elmsford, New York, 1966.
22 Sobol, I. M., Computational Methods of Monte Carlo, Nauka, Moscow, 1973 (in Russian).
23 Spanier, J., An analytic approach to variance reduction, Soc. Indust. Appl. Math. J. Appl. Math., 18, 1972, 172-192.
24 Trotter, H. F. and J. W. Tukey, Conditional Monte Carlo for normal samples, in Symposium on Monte Carlo Methods, edited by M. A. Meyer, Wiley, New York, 1956, pp. 64-79.
25 Wendel, J. G., Groups and conditional Monte Carlo, Ann. Math. Stat., 28, 1957, 1048-1052.
26 Yakowitz, S. J., Computational Probability and Simulation, Addison-Wesley, Reading, Massachusetts, 1977.
27 Yakowitz, S. et al., Weighted Monte Carlo integration, Soc. Indust. Appl. Math. J. Numer. Anal., 15, No. 6, 1978, 1289-1300.
ADDITIONAL REFERENCES (SECTION 4.3.12)

28 Burt, J. M., Jr. and M. Garman, Monte Carlo techniques for stochastic network analysis, in Proceedings of the Fourth Conference on the Applications of Simulation, December 9-11, pp. 146-153.
29 Burt, J. M., Jr., D. P. Gaver, and M. Perlas, Simple stochastic networks: Some problems and procedures, Naval Res. Logist. Quart., 17, 1970, 439-460.
30 Carter, G. and E. Ignall, A simulation model of fire department operations, Inst. Elec. Electron. Eng. Trans. Syst., Man, Cybern., 6, 1970, 282-292.
31 Carter, G. and E. Ignall, Virtual measures for computer simulation experiments, Report P-4817, The Rand Corporation, Santa Monica, California, April 1972.
32 Gaver, D. P. and G. S. Shedler, Control variable methods in the simulation of a model of a multiprogrammed computer system, Nav. Res. Logist. Quart., 18, 1971, 435-450.
33 Gaver, D. P. and G. L. Thompson, Programming and Probability Models in Operations Research, Brooks/Cole, Monterey, California, 1973.
34 Iglehart, D. L., Functional limit theorems for the queue GI/G/1 in light traffic, Adv. Appl. Prob., 3, 1971, 269-281.
35 Iglehart, D. L. and P. A. W. Lewis, Variance reduction for regenerative simulations, I: Internal control and stratified sampling for queues, Technical Report 86-22, Control Analysis Corporation, Palo Alto, California, 1976.
36 Lavenberg, S. S., Efficient estimation of work rates in closed queueing networks, in Proceedings in Computational Statistics, Physica Verlag, Vienna, 1974, pp. 353-362.
37 Lavenberg, S. S., Regenerative simulation of queueing networks, Research Report RC 7087, IBM Corporation, Yorktown Heights, New York, 1978.
38 Lavenberg, S. S., T. L. Moeller, and C. H. Sauer, Concomitant control variables applied to the regenerative simulation of queueing systems, Research Report RC 6413, IBM Corporation, 1977.
39 Lavenberg, S. S., T. L. Moeller, and P. D. Welch, Control variables applied to the simulation of queueing models of computer systems, in Computer Performance, North Holland, Amsterdam, 1977, pp. 459-467.
40 Lavenberg, S. S., T. L. Moeller, and P. D. Welch, Statistical results on multiple control variables with application to variance reduction in queueing network simulation, Research Report, IBM Corporation, Yorktown Heights, New York, 1978.
41 Lavenberg, S. S. and C. H. Sauer, Sequential stopping rules for the regenerative method of simulation, IBM J. Res. Develop., 21, 1977, 545-558.
42 Lavenberg, S. S. and G. S. Shedler, Derivation of confidence intervals for work rate estimators in a closed queueing network, Soc. Indust. Appl. Math. J. Comp., 4, 1975, 108-124.
43 Lavenberg, S. S. and D. R. Slutz, Introduction to regenerative simulation, IBM J. Res. Develop., 19, 1975, 458-462.
44 Lavenberg, S. S. and D. R. Slutz, Regenerative simulation of an automated tape library, IBM J. Res. Develop., 19, 1975, 463-475.
45 Mitchell, B., Variance reduction by antithetic variates in GI/G/1 queueing simulations, Oper. Res., 21, 1973, 988-997.
46 Page, E. S., On Monte Carlo methods in congestion problems, Oper. Res., 13, 1965, 300-305.
CHAPTER 5

Linear Equations and Markov Chains

In this chapter we show how Monte Carlo methods can be used to solve linear algebraic, integral, and differential equations. As a rule, Monte Carlo methods are not competitive with classical numerical methods for solving systems of linear equations (some special cases where Monte Carlo methods can be used are considered at the end of Section 5.1.3). We discuss the Monte Carlo methods, however, because they serve to introduce analogous Monte Carlo methods for solving integral equations. These methods are widely used, since numerical methods are not efficient in this latter case. This chapter is constructed as follows: In Section 5.1 we solve a system of linear equations and find the elements of the inverse matrix in the system by simulating discrete-time Markov chains. The problem of finding a solution of integral equations by simulating continuous-time Markov chains is the subject of Section 5.2. Finally, in Section 5.3 we construct a Markov chain for solving the Dirichlet problem.
5.1 SIMULTANEOUS LINEAR EQUATIONS AND ERGODIC MARKOV CHAINS
A Monte Carlo solution to a system of linear equations is based on one proposed by von Neumann and Ulam and extended by Forsythe and Leibler [4]. Let us consider a system of simultaneous linear equations written in vector form

$$Bx = f,\qquad(5.1.1)$$
where the vector $x' = (x_1,\ldots,x_n)$ is to be found, and the matrix $B = \|b_{ij}\|_1^n$ and the vector $f' = (f_1,\ldots,f_n)$ are given; the prime denotes the transpose operation. Introducing $I - A = B$, where $I$ is an identity matrix, system (5.1.1) can be rewritten as

$$x = Ax + f.\qquad(5.1.2)$$

Suppose

$$\|A\| < 1.\qquad(5.1.3)$$
Under this assumption we can solve (5.1.2) by applying the following recursive equation:

$$x^{(k+1)} = Ax^{(k)} + f.\qquad(5.1.4)$$

Assuming $x^{(0)} = 0$ and $A^0 \equiv I$, we have

$$x^{(k+1)} = \bigl(I + A + \cdots + A^{k-1} + A^k\bigr)f = \sum_{m=0}^{k} A^m f.\qquad(5.1.5)$$
Taking the limit, for $B$ nonsingular,

$$\lim_{k\to\infty} x^{(k)} = \lim_{k\to\infty}\sum_{m=0}^{k} A^m f = (I - A)^{-1} f = B^{-1} f = x,\qquad(5.1.6)$$
we obtain the exact solution of $x$. The $j$th coordinate of the vector $x^{(k+1)}$ is equal to

$$x_j^{(k+1)} = f_j + \sum_{i_1} a_{j i_1} f_{i_1} + \sum_{i_1, i_2} a_{j i_1} a_{i_1 i_2} f_{i_2} + \cdots + \sum_{i_1, i_2, \ldots, i_k} a_{j i_1} a_{i_1 i_2}\cdots a_{i_{k-1} i_k} f_{i_k}.\qquad(5.1.7)$$
We also consider the problem of finding the inner product

$$(h, x) = h_1 x_1 + \cdots + h_n x_n,\qquad(5.1.8)$$

where $h$ is a given vector and $x$ is a solution of (5.1.2). It is readily seen that by setting

$$h' = (0,\ldots,0,1,0,\ldots,0),\qquad(5.1.9)$$

with the 1 in the $i$th place, we obtain $x_i$.
In order to solve (5.1.2) let us introduce an arbitrary ergodic Markov chain (M.C.)

$$P = \|P_{ij}\|_1^n,\qquad(5.1.10)$$

$$\sum_{i=1}^{n} p_i = 1, \qquad \sum_{j=1}^{n} P_{ij} = 1, \qquad p_i \ge 0, \quad P_{ij} \ge 0, \quad i, j = 1,\ldots,n,$$

such that*

1. $p_i > 0$ if $h_i \ne 0$,
2. $P_{ij} > 0$ if $a_{ij} \ne 0$, $i, j = 1,\ldots,n$,  (5.1.11)
where $p_i$ and $P_{ij}$ are, respectively, the initial distribution and the transition probabilities of the Markov chain. We first consider the problem of estimating $(h, x^{(k+1)})$, which approximates $(h, x)$. Let $k$ be a given integer and let us simulate the Markov chain (5.1.10)-(5.1.11) $k$ units of time. We associate with the Markov chain a particle that passes through the sequence of states $i_0, i_1,\ldots,i_k$. Define

$$W_m = \frac{a_{i_0 i_1} a_{i_1 i_2}\cdots a_{i_{m-1} i_m}}{P_{i_0 i_1} P_{i_1 i_2}\cdots P_{i_{m-1} i_m}},\qquad(5.1.12)$$
which can be written recursively as

$$W_m = W_{m-1}\,\frac{a_{i_{m-1} i_m}}{P_{i_{m-1} i_m}}, \qquad W_0 = 1.\qquad(5.1.13)$$
We also define the random variable (r.v.)

$$\eta_k(h) = \frac{h_{i_0}}{p_{i_0}}\sum_{m=0}^{k} W_m f_{i_m},\qquad(5.1.14)$$

associated with the sample path $i_0 \to i_1 \to \cdots \to i_k$, which has probability $p_{i_0} P_{i_0 i_1} P_{i_1 i_2}\cdots P_{i_{k-1} i_k}$. Now we are able to prove the following

Proposition 5.1.1

$$E[\eta_k(h)] = \Bigl(h, \sum_{m=0}^{k} A^m f\Bigr) = \bigl(h, x^{(k+1)}\bigr),\qquad(5.1.15)$$

that is, $\eta_k(h)$ is an unbiased estimator of the inner product $(h, x^{(k+1)})$.

*The Markov chain need not be homogeneous; we are considering the homogeneous case for simplicity only.
Proof
Each path $i_0 \to i_1 \to \cdots \to i_k$ will be realized with probability

$$P(i_0, i_1,\ldots,i_k) = p_{i_0} P_{i_0 i_1}\cdots P_{i_{k-1} i_k}.\qquad(5.1.16)$$

While simulating the M.C. (5.1.10)-(5.1.11), since the r.v. $\eta_k(h)$ is defined along the path $i_0 \to i_1 \to \cdots \to i_k$, we have

$$E[\eta_k(h)] = \sum_{i_0=1}^{n}\cdots\sum_{i_k=1}^{n}\eta_k(h)\,p_{i_0} P_{i_0 i_1}\cdots P_{i_{k-1} i_k},\qquad(5.1.17)$$

which, together with (5.1.12) through (5.1.14), gives

$$E[\eta_k(h)] = \sum_{m=0}^{k}\sum_{i_0=1}^{n}\cdots\sum_{i_k=1}^{n} h_{i_0} W_m f_{i_m} P_{i_0 i_1}\cdots P_{i_{k-1} i_k}.\qquad(5.1.18)$$

Using the property $\sum_{j=1}^{n} P_{ij} = 1$, the last formula can be written as

$$E[\eta_k(h)] = \sum_{m=0}^{k}\sum_{i_0=1}^{n}\cdots\sum_{i_m=1}^{n} h_{i_0} W_m f_{i_m} P_{i_0 i_1}\cdots P_{i_{m-1} i_m}.\qquad(5.1.19)$$

Taking into account that

$$W_m P_{i_0 i_1}\cdots P_{i_{m-1} i_m} = a_{i_0 i_1}\cdots a_{i_{m-1} i_m}$$

and

$$\sum_{i_0}\cdots\sum_{i_m} h_{i_0} a_{i_0 i_1}\cdots a_{i_{m-1} i_m} f_{i_m} = (h, A^m f),$$

we immediately obtain

$$E[\eta_k(h)] = \Bigl(h, \sum_{m=0}^{k} A^m f\Bigr) = \bigl(h, x^{(k+1)}\bigr).$$

Q.E.D.

To estimate $(h, x^{(k+1)})$ we simulate $N$ random paths $i_0^{(s)} \to i_1^{(s)} \to \cdots \to i_k^{(s)}$, $s = 1, 2,\ldots,N$, of length $k$ each, and then find the sample mean

$$\bar\theta_k = \frac{1}{N}\sum_{s=1}^{N}\eta_k^{(s)}(h).\qquad(5.1.20)$$
The Procedure for Estimating $(h, x^{(k+1)})$

1 Choose any integer $k > 0$.
2 Simulate $N$ independent random paths $i_0^{(s)} \to i_1^{(s)} \to \cdots \to i_k^{(s)}$, $s = 1,\ldots,N$, of the Markov chain (5.1.10)-(5.1.11).
3 Find

$$\eta_k^{(s)}(h) = \frac{h_{i_0^{(s)}}}{p_{i_0^{(s)}}}\sum_{m=0}^{k} W_m^{(s)} f_{i_m^{(s)}},\qquad(5.1.21)$$

where

$$W_m^{(s)} = W_{m-1}^{(s)}\,\frac{a_{i_{m-1}^{(s)} i_m^{(s)}}}{P_{i_{m-1}^{(s)} i_m^{(s)}}}, \qquad W_0^{(s)} = 1.\qquad(5.1.22)$$

4 Calculate

$$\bar\theta_k = \frac{1}{N}\sum_{s=1}^{N}\eta_k^{(s)}(h),\qquad(5.1.23)$$

which is an unbiased estimator of the inner product $(h, x^{(k+1)})$. Taking the limit of (5.1.15), we obtain
$$\lim_{k\to\infty} E[\eta_k(h)] = (h, x),\qquad(5.1.24)$$

provided that the von Neumann series $I + A + A^2 + \cdots$ converges and the path $i_0 \to i_1 \to \cdots \to i_k \to \cdots$ is infinitely long; we thus obtain an unbiased estimator of $(h, x)$. The sample mean is then of the form

$$\bar\theta = \frac{1}{N}\sum_{s=1}^{N}\eta^{(s)}(h),\qquad(5.1.25)$$

where

$$\eta^{(s)}(h) = \frac{h_{i_0^{(s)}}}{p_{i_0^{(s)}}}\sum_{m=0}^{\infty} W_m^{(s)} f_{i_m^{(s)}}\qquad(5.1.26)$$

and

$$W_m^{(s)} = W_{m-1}^{(s)}\,\frac{a_{i_{m-1}^{(s)} i_m^{(s)}}}{P_{i_{m-1}^{(s)} i_m^{(s)}}}, \qquad W_0^{(s)} = 1.\qquad(5.1.27)$$

We note that the inner products $(h, \sum_{m=0}^{k} A^m f)$ for different $h$ and $f$ can be found from (5.1.23) by using the same random paths $i_0^{(s)} \to i_1^{(s)} \to \cdots \to i_k^{(s)}$, $s = 1,\ldots,N$, of the M.C. (5.1.10)-(5.1.11).
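The procedure (5.1.21)-(5.1.23) can be sketched as follows. The $2\times 2$ system, the uniform chain, and all parameter values below are assumptions chosen purely for illustration:

```python
import random

# Small illustrative system x = A x + f; the exact solution of
# (I - A) x = f is x = (8/3, 34/9), so (h, x) = x_1 for h = e_1.
A = [[0.2, 0.3], [0.1, 0.4]]
f = [1.0, 2.0]
h = [1.0, 0.0]
n = len(A)
p = [1.0 / n] * n                      # initial distribution of the chain
P = [[1.0 / n] * n for _ in range(n)]  # uniform ergodic transition matrix

def eta(k, rng):
    """One realization of the estimator eta_k(h) of (5.1.14):
    eta = (h_{i0}/p_{i0}) * sum_{m=0}^k W_m f_{i_m}, with
    W_0 = 1 and W_m = W_{m-1} a_{i_{m-1} i_m} / P_{i_{m-1} i_m}."""
    i = rng.choices(range(n), weights=p)[0]
    lead = h[i] / p[i]
    w, total = 1.0, f[i]
    for _ in range(k):
        j = rng.choices(range(n), weights=P[i])[0]
        w *= A[i][j] / P[i][j]
        total += w * f[j]
        i = j
    return lead * total

rng = random.Random(11)
N, k = 40000, 20
est = sum(eta(k, rng) for _ in range(N)) / N
print(round(est, 2))  # approaches x_1 = 8/3 = 2.666...
```

The same sampled paths could be reused, as the text notes, to estimate $(h, x)$ for several different $h$ and $f$, and with $f = e_r$ the same loop yields entries $b_{jr}^{-1}$ of the inverse matrix (Section 5.1.2).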
Remark
In the particular case where $A = aP$, $0 < a < 1$, we have $W_m = a^m$ and

$$\eta_k(h) = \frac{h_{i_0}}{p_{i_0}}\sum_{m=0}^{k} a^m f_{i_m}.$$

5.1.1 Adjoint System of Linear Equations
Let us define for the system of linear equations (5.1.2) an associated system of linear equations

$$x^* = A'x^* + h,\qquad(5.1.28)$$

where $A' = \|a'_{ij}\|_1^n$ is the transpose of $A$. It is readily seen that

$$(h, x) = (f, x^*).\qquad(5.1.29)$$

Indeed, we have from (5.1.2) and (5.1.28) $(x^*, x) = (x^*, Ax) + (x^*, f)$ and $(x, x^*) = (A'x^*, x) + (h, x)$, respectively. Now (5.1.29) follows because $(A'x^*, x) = (x^*, Ax)$. We call the pair (5.1.2) and (5.1.28) adjoint systems. A direct consequence of (5.1.29) is that there exists another unbiased estimator of $(h, x)$, which can be written as

$$\eta_k^*(f) = \frac{f_{i_0}}{p_{i_0}^*}\sum_{m=0}^{k} W_m^* h_{i_m},\qquad(5.1.30)$$

where

$$W_m^* = W_{m-1}^*\,\frac{a'_{i_{m-1} i_m}}{P^*_{i_{m-1} i_m}}, \qquad W_0^* = 1,$$

are defined on the sample path $i_0 \to i_1 \to \cdots \to i_k$, which is obtained from the Markov chain defined by the following:
$$P^* = \|P^*_{ij}\|_1^n,$$

$$\sum_{i=1}^{n} p_i^* = 1, \qquad \sum_{j=1}^{n} P^*_{ij} = 1, \qquad p_i^* \ge 0, \quad P^*_{ij} \ge 0, \quad i, j = 1,\ldots,n,$$

such that

1. $p_i^* > 0$ if $f_i \ne 0$,
2. $P^*_{ij} > 0$ if $a'_{ij} \ne 0$, $i, j = 1,\ldots,n$.
In the particular case for which $P$ in (5.1.10)-(5.1.11) is a doubly stochastic matrix, that is,

$$\sum_{i=1}^{n} P_{ij} = 1 \qquad\text{and}\qquad \sum_{j=1}^{n} P_{ij} = 1,\qquad(5.1.31)$$
$P^*$ can be chosen equal to $P'$. Assuming also $A' = A$, then together with (5.1.31) we obtain $P' = P$, and (5.1.30) becomes

$$\eta_k^*(h) = \frac{f_{i_0}}{p_{i_0}}\sum_{m=0}^{k} W_m h_{i_m}.\qquad(5.1.32)$$

Comparing (5.1.14) with (5.1.32), we can see that even in this case, that is, when $A' = A$ and $P' = P$, $\eta_k^*(h) \ne \eta_k(h)$. The difference between $\eta_k^*(h)$ and $\eta_k(h)$ is in the terms $f$ and $h$, whose roles are interchanged. We return now to the original problem (5.1.2) of estimating all coordinates $x_j$ of the vector $x$. In order to estimate the $j$th coordinate $x_j$ of $x$ we assume
$$h' = e_j = (0,\ldots,0,1,0,\ldots,0)$$

and start simulating the M.C. from the state $j$, that is, $p_{i_0} = p_j = 1$. The corresponding path is then $j \to i_1 \to i_2 \to \cdots \to i_k$. Denoting

$$\eta_k(e_j) = \sum_{m=0}^{k} W_m f_{i_m},\qquad(5.1.33)$$

we immediately obtain the corollary

$$E[\eta_k(e_j)] = x_j^{(k+1)},\qquad(5.1.34)$$

and also

$$\bar\theta_k(e_j) = \frac{1}{N}\sum_{s=1}^{N}\eta_k^{(s)}(e_j), \qquad E[\bar\theta_k(e_j)] = x_j^{(k+1)}.\qquad(5.1.35)$$
It follows from (5.1.33) that, in order to estimate all the components $x_j$, $j = 1, 2,\ldots,n$, of the vector $x$, we have to simulate $n$ random paths $j \to i_1 \to i_2 \to \cdots \to i_k$, $j = 1, 2,\ldots,n$, of the Markov chain (5.1.10)-(5.1.11), each time starting from a new state $i_0 = j$.
Looking carefully at (5.1.33), we find that all $\eta_k(e_j)$, $j = 1, 2,\ldots,n$, are similar. They differ only in the initial terms, which are associated with the choice of the initial state $i_0$. Thus for $\eta_k(e_j)$ and $\eta_k(e_r)$ we have $a_{j i_1}/P_{j i_1}$ and $a_{r i_1}/P_{r i_1}$, respectively. We now turn to the question of whether or not all the components $x_j$ of $x$ can be estimated simultaneously by simulating one path. The answer is affirmative. We start this topic with the following
Definition  The path $i_0 \to i_1 \to \cdots \to i_T$ will be called covering if it has visited each state $j = 1,\ldots,n$ at least once.
Let $i_0 \to i_1 \to \cdots \to i_T \to \cdots$ be an infinite realization from the Markov chain (5.1.10)-(5.1.11). Because our Markov chain is ergodic, each state will be visited infinitely many times, and the first hitting time of the state $j$, $T_j = \min\{t : i_t = j\}$, is finite almost surely (a.s.). With this result in hand, the procedure for finding all the estimates $\eta_k(e_j)$, $j = 1, 2,\ldots,n$, from one realization can be written in the following way:
1 Simulate a covering path

$$i_0 \to i_1 \to \cdots \to i_T \to \cdots \to i_{T+k},\qquad(5.1.36)$$

where $T = \max_j\{T_j\} = \max_j \min\{t : i_t = j\}$, $j = 1,\ldots,n$, and $k$ is some fixed number.

2 Find the first hitting time $T_j = \min\{t : i_t = j\}$ for each state $j = 1,\ldots,n$ separately.

3 Take the subpath $i_{T_j} \to i_{T_j+1} \to \cdots \to i_{T_j+k}$ (which is part of the generated path) for each state $j = 1,\ldots,n$ separately.

4 Calculate all

$$\eta_k(e_j) = \sum_{m=T_j}^{T_j+k} W_m f_{i_m}, \qquad j = 1,\ldots,n,\qquad(5.1.37)$$

where

$$W_{T_j} = 1, \qquad W_m = W_{m-1}\,\frac{a_{i_{m-1} i_m}}{P_{i_{m-1} i_m}}, \qquad m = T_j+1,\ldots,T_j+k,$$

are defined on the same path (5.1.36), starting at the different points $T_j$ associated with the first hitting times. Each subpath $i_{T_j} \to i_{T_j+1} \to \cdots \to i_{T_j+k}$ is of the same length $k$. Thus $i_0 \to i_1 \to \cdots \to i_T$ will be a covering path of minimal length (in a given realization).
5 Simulate $N$ such independent random paths $i_0^{(s)} \to \cdots \to i_{T_j}^{(s)} \to \cdots \to i_{T_j+k}^{(s)}$, $s = 1,\ldots,N$, and find

$$\bar\theta_k(e_j) = \frac{1}{N}\sum_{s=1}^{N}\eta_k^{(s)}(e_j),\qquad(5.1.39)$$

which estimates $x_j$. Therefore all r.v.'s $\eta_k(e_j)$, $j = 1,\ldots,n$, are defined on the same path and calculated according to the same formula (5.1.37). The only difference between them is the starting point, which is determined by the first hitting time $T_j$ and is a random variable.
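The covering-path procedure above can be sketched as follows; the small $2\times 2$ system and the uniform chain are again illustrative assumptions:

```python
import random

# Small illustrative system x = A x + f; the exact solution is x = (8/3, 34/9).
A = [[0.2, 0.3], [0.1, 0.4]]
f = [1.0, 2.0]
n, k = len(A), 20
P = [[1.0 / n] * n for _ in range(n)]  # uniform ergodic transition matrix

def covering_path_estimates(rng):
    """One covering-path realization: extend the walk until every state has
    been visited, then k further steps; eta_k(e_j) of (5.1.37) is computed on
    the subpath that starts at the first hitting time T_j of state j."""
    path = [rng.randrange(n)]
    hit = {path[0]: 0}                      # first hitting times T_j
    while len(hit) < n:
        path.append(rng.choices(range(n), weights=P[path[-1]])[0])
        hit.setdefault(path[-1], len(path) - 1)
    while len(path) < max(hit.values()) + k + 1:
        path.append(rng.choices(range(n), weights=P[path[-1]])[0])
    est = []
    for j in range(n):                      # subpath i_{T_j} -> ... -> i_{T_j+k}
        w, total, i = 1.0, f[j], j
        for t in range(hit[j] + 1, hit[j] + k + 1):
            w *= A[i][path[t]] / P[i][path[t]]
            total += w * f[path[t]]
            i = path[t]
        est.append(total)
    return est

rng = random.Random(2)
N = 20000
sums = [0.0] * n
for _ in range(N):
    sums = [a + b for a, b in zip(sums, covering_path_estimates(rng))]
xhat = [s / N for s in sums]
print([round(v, 2) for v in xhat])  # approaches x = (2.67, 3.78)
```

All coordinates of $x$ are estimated from each single realization, which is the point of the covering-path construction.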
Proposition  For each $j = 1,\ldots,n$, the r.v. $\eta_k(e_j)$ computed from the covering path satisfies $E[\eta_k(e_j)] = x_j^{(k+1)}$.

Proof  The proof of this proposition is based on the strong Markov property, given in Ref. 2, Proposition 1.22, p. 117, which states that for any homogeneous Markov chain and any bounded function $g$ defined on the state space we have

$$E[g(i_t, i_{t+1},\ldots)\mid i_t = j] = E[g(i_0, i_1,\ldots)\mid i_0 = j].$$

In our notation,

$$E[\eta_k(e_j)\mid T_j = l] = E[g(i_l, i_{l+1},\ldots,i_{l+k})\mid i_l = j] = E[g(i_0,\ldots,i_k)\mid i_0 = j].$$

By Proposition 5.1.1, $E[g(i_0,\ldots,i_k)\mid i_0 = j] = x_j^{(k+1)}$. Since $E[\eta_k(e_j)\mid T_j = l]$ does not depend on $l$, we have

$$E[\eta_k(e_j)\mid T_j = l] = E[\eta_k(e_j)] = x_j^{(k+1)}.\qquad(5.1.41)$$

Q.E.D.
A similar argument shows that the covering-path estimator $\tilde\theta_k(e_j)$ has the same variance as $\bar\theta_k(e_j)$.

Proof
Similarly,

$$\operatorname{var}\tilde\theta_k(e_j) = \frac{1}{N}\Bigl(E\bigl\{[\tilde\eta_k(e_j)]^2\bigr\} - \bigl[x_j^{(k+1)}\bigr]^2\Bigr) = \frac{1}{N}\operatorname{var}\tilde\eta_k(e_j).$$

Now, again using Proposition 1.22 of Ref. 2 (p. 117), we have

$$E\bigl\{[\tilde\eta_k(e_j)]^2\mid T_j = l\bigr\} = E\bigl[g^2(i_l,\ldots,i_{l+k})\mid i_l = j\bigr] = E\bigl[g^2(i_0,\ldots,i_k)\mid i_0 = j\bigr] = E\bigl\{[\eta_k(e_j)]^2\bigr\}.$$

Q.E.D.

Therefore $\operatorname{var}[\tilde\theta_k(e_j)] = \operatorname{var}[\bar\theta_k(e_j)]$.
To compare the efficiencies of the two methods we use (4.2.28), which can be written

$$\varepsilon = \frac{t\,\operatorname{var}\bar\theta_k(e_j)}{\tilde t\,\operatorname{var}\tilde\theta_k(e_j)},\qquad(5.1.42)$$

and assume without loss of generality $N = 1$. Since $\operatorname{var}\tilde\theta_k(e_j) = \operatorname{var}\bar\theta_k(e_j)$, we have $\varepsilon = t/\tilde t$. In the first case we have $n$ trajectories, each of length $k$, so the total length of these trajectories is $t = nk$. In the second case we have one trajectory of length $\max_j\{T_j\} + k$, with mean $\tilde t = E(\max_j\{T_j\}) + k$. It is obvious that the second algorithm is on the average more efficient when $n > 1$ and $k \ge (n-1)^{-1}E[T]$, where $T = \max_{j=1,\ldots,n}\{T_j\}$. Because the first hitting time $T_j$, $j = 1,\ldots,n$, of each state is finite a.s., it can be proven that

$$\varepsilon = \frac{t}{\tilde t} \to n \quad\text{a.s. as } k \to \infty,\qquad(5.1.43)$$

that is, asymptotically the method of the covering path is $n$ times more efficient than the standard Monte Carlo method. The efficiency of the second method can be improved if we can find $i_0 = l$ such that

$$E\Bigl[\max_{j=1,\ldots,n}\{T_j\}\,\Big|\,i_0 = l\Bigr] = \min_{i_0 = 1,\ldots,n} E\Bigl[\max_{j=1,\ldots,n}\{T_j\}\,\Big|\,i_0\Bigr]$$

and then take this $i_0 = l$ as the starting point of the path or, equivalently, choose the initial distribution as

$$p_i = \begin{cases} 1, & i = l,\\ 0, & i \ne l.\end{cases}$$
5.1.2 Computing the Inverse Matrix
It follows from (5.1.6) that

$$x = \sum_{m=0}^{\infty} A^m f = B^{-1} f,$$

where $B^{-1} = \|b_{jr}^{-1}\|_1^n = I + A + A^2 + \cdots$. The $j$th coordinate of $x$ is

$$x_j = \sum_{r=1}^{n} b_{jr}^{-1} f_r.\qquad(5.1.44)$$

Setting $f = e_r = (0,\ldots,0,1,0,\ldots,0)$, with the 1 in the $r$th place, we obtain

$$x_j = b_{jr}^{-1},\qquad(5.1.45)$$

and the estimator $\eta_k$ in (5.1.33) becomes

$$\eta_k(b_{jr}^{-1}) = \sum_{m:\,i_m = r} W_m.\qquad(5.1.46)$$

Here the summation with respect to $W_m$ is taken over the indices $m$ for which $i_m = r$, that is, over the times when the particle visits the state $r$. The sample mean is then

$$\bar\theta_k(b_{jr}^{-1}) = \frac{1}{N}\sum_{s=1}^{N}\eta_k^{(s)}(b_{jr}^{-1}),\qquad(5.1.47)$$

where $s = 1, 2,\ldots,N$ is the path number. Thus setting
-
,...,O , l , O ,...,0)
(5.1.48)
i
and f=f;=e,=O
,...,0,1,0,...,0, L---y---.l
r
we can estimate all the elements bi;' of the inverse matrix B - ' by (5.1.47). Inasmuch as the problem of determining bj; is a particular case of the problem of finding x,. we can estimate all the elements b,;' of the j t h row of the inverse matrix B - ' simultaneously with xi. Thus the Monte Carlo method provides a way of estimating a single element or any collection of
the elements of $B^{-1}$. This desirable feature differentiates the Monte Carlo method from other numerical methods, in which, as a rule, all the elements of $B^{-1}$ are computed simultaneously. By solving the adjoint system we can estimate simultaneously all the elements $b_{jr}^{-1}$ of the $r$th column of the inverse matrix $B^{-1}$. It follows also from (5.1.36) through (5.1.39) that all the elements $b_{jr}^{-1}$ can be estimated simultaneously with the $x_j$'s from the covering path. Before leaving this section we want to turn the readers' attention to the analogy that exists between calculating integrals and solving systems of linear equations by Monte Carlo methods. Calculating the integral

$$I = \int g(x)\,dx,$$

we introduce any p.d.f. $f_X(x)$ such that

$$I = E\!\left[\frac{g(X)}{f_X(X)}\right],$$

where $X$ is distributed with p.d.f. $f_X(x)$ and $f_X(x) > 0$ when $g(x) \ne 0$. Then, taking a sample of size $N$ from $f_X(x)$, we estimate the integral $I$ by (see (4.3.4))

$$\theta = \frac{1}{N}\sum_{i=1}^{N}\frac{g(X_i)}{f_X(X_i)}.$$
While solving the system of linear equations we introduce any ergodic Markov chain (5.1.10)-(5.1.11). Then, simulating our Markov chain, we obtain the path $i_0 \to i_1 \to \cdots \to i_k$ with probability $P(i_0, i_1,\ldots,i_k) = p_{i_0} P_{i_0 i_1}\cdots P_{i_{k-1} i_k}$. The element $x_j^{(k+1)}$ of the vector $x^{(k+1)}$ can be written (see (5.1.7)) as

$$x_j^{(k+1)} = f_j + \sum_{i_1} a_{j i_1} f_{i_1} + \sum_{i_1, i_2} a_{j i_1} a_{i_1 i_2} f_{i_2} + \cdots + \sum_{i_1, i_2, \ldots, i_k} a_{j i_1} a_{i_1 i_2}\cdots a_{i_{k-1} i_k} f_{i_k},$$

where the path $j \to i_1 \to \cdots \to i_k$ occurs with probability $P_{j i_1} P_{i_1 i_2}\cdots P_{i_{k-1} i_k}$; here $i_0 = j$ and $p_j = 1$. Now, considering $N$ random paths $j^{(s)} \to i_1^{(s)} \to i_2^{(s)} \to \cdots \to i_k^{(s)}$, $s = 1,\ldots,N$, we can estimate $x_j^{(k+1)}$ by (5.1.39). Comparing (4.3.4) and (5.1.39), we realize that both problems, calculating the integral and solving the system of linear equations, can be reduced to the problem of estimating the expected value of some random function. In our case the random functions are $g(X)/f_X(X)$ and $\eta_k(e_j)$, respectively. These results allow us to suggest a general Monte Carlo procedure for solving different problems, which can be written as:
3
1 Find a suitable distribution associated with the problem. 2 Take a sample from this distribution.
3 Substitute the values from the sample in a proper formula, which estimates the solution. 5.1.3. solving a System d Linear Equations by Simulating a Markov Chain with an Absorbing State
Another possibility of estimating (h, x) is by simulating a Markov chain with an absorbing state, as was suggested by Forsythe and Leibler [4]. Consider the (n+1)-state Markov chain with transition matrix

$$\tilde P = \begin{pmatrix} P_{11} & \cdots & P_{1n} & P_{1,n+1} \\ \vdots & & \vdots & \vdots \\ P_{n1} & \cdots & P_{nn} & P_{n,n+1} \\ 0 & \cdots & 0 & 1 \end{pmatrix} \qquad (5.1.50)$$

with

$$P_{ij} \ge 0, \quad i, j = 1, \dots, n, \qquad P_{i,n+1} = g_i = 1 - \sum_{j=1}^{n} P_{ij} \ge 0,$$

and with initial probabilities

$$\sum_{i=1}^{n} p_i = 1, \qquad p_i \ge 0, \quad i = 1, 2, \dots, n,$$

which is essentially an augmented version of the matrix (5.1.10). Here p_i and P_ij are, respectively, the initial and the transition probabilities. Assume also:

1 p_i > 0 if h_i ≠ 0,

2 P_ij > 0 if a_ij ≠ 0, i, j = 1, 2, ..., n.  (5.1.51)
The state n + 1 is called an absorbing state of the Markov chain (5.1.50)-(5.1.51). It is well known (Çinlar [2]) that, if there exists a state i, i = 1, ..., n, such that P_{i,n+1} > 0, then all the random paths i_0 → i_1 → ⋯ → i_ν terminate in state n + 1 a.s., and the expected time of termination of each random path is finite, that is, E(ν) < ∞. We start to simulate our Markov chain (5.1.50)-(5.1.51) by choosing the initial state i_0 according to the probability p_{i_0}, i_0 = 1, 2, ..., n, where Σ_{i_0} p_{i_0} = 1. Consider now a particle that is in state i_0. The particle either will be absorbed with probability g_{i_0} in state i_0 or will pass to another state i_1 with probability P_{i_0 i_1}. Generally, if at time m − 1 the particle arrived at the state i_{m−1}, then it will either be absorbed from there with probability g_{i_{m−1}} or will continue along the random path to the next state i_m with probability P_{i_{m−1} i_m}. The random path i_0 → i_1 → ⋯ → i_ν has probability

$$p_{i_0} P_{i_0 i_1} \cdots P_{i_{\nu-1} i_\nu} g_{i_\nu},$$

where

$$g_{i_\nu} = P_{i_\nu, n+1} = 1 - \sum_{j=1}^{n} P_{i_\nu j}$$

is the probability of absorption from state i_ν. Consider any r.v. η, which is defined on the path i_0 → i_1 → ⋯ → i_ν. The expectation of η is

$$E(\eta) = \sum_{k=0}^{\infty} \sum_{i_0, \dots, i_k} \eta_k\, p_{i_0} P_{i_0 i_1} \cdots P_{i_{k-1} i_k} g_{i_k},$$

where η_k is defined on the path that terminates exactly after k units of time. Let

$$\eta_{(k)}(h) = \frac{h_{i_0}}{p_{i_0}}\, \frac{W_k f_{i_k}}{g_{i_k}}, \qquad (5.1.52)$$

where W_k is the same as in (5.1.12).

Proposition 5.1.4

$$E\left[\eta_{(k)}(h)\right] = (h, x), \qquad (5.1.53)$$

that is, η_{(k)}(h) is an unbiased estimator of the inner product (h, x), provided E(k) < ∞.
Substituting (5.1.22) in (5.1.54) and taking (5.1.12) into account, we obtain

$$E\left[\eta_{(k)}(h)\right] = \sum_{k=0}^{\infty} \sum_{i_0, \dots, i_k} \frac{h_{i_0}}{p_{i_0}}\, \frac{W_k f_{i_k}}{g_{i_k}}\, p_{i_0} P_{i_0 i_1} \cdots P_{i_{k-1} i_k} g_{i_k} = \sum_{k=0}^{\infty} \sum_{i_0, \dots, i_k} h_{i_0} a_{i_0 i_1} \cdots a_{i_{k-1} i_k} f_{i_k}. \qquad (5.1.55)$$

Now, comparing (5.1.55) with (5.1.19), we immediately obtain (5.1.53). Q.E.D.

The procedure for estimating (h, x) is:

1 Simulate N independent random paths i_0^{(s)} → i_1^{(s)} → ⋯ → i_{k_s}^{(s)}, s = 1, ..., N, from the Markov chain (5.1.50)-(5.1.51).

2 Determine

$$\eta_{(k_s)}^{(s)}(h) = \frac{h_{i_0^{(s)}}}{p_{i_0^{(s)}}}\, \frac{W_{k_s}^{(s)} f_{i_{k_s}^{(s)}}}{g_{i_{k_s}^{(s)}}},$$

where W_{k_s}^{(s)} is the same as in (5.1.22).

3 Estimate (h, x) by

$$(h, x) \approx \frac{1}{N} \sum_{s=1}^{N} \eta_{(k_s)}^{(s)}(h).$$
In the particular case where a_ij ≥ 0 and Σ_{j=1}^n a_ij < 1, the matrix P in (5.1.50) can be chosen as

$$\tilde P = \begin{pmatrix} a_{11} & \cdots & a_{1n} & P_{1,n+1} \\ \vdots & & \vdots & \vdots \\ a_{n1} & \cdots & a_{nn} & P_{n,n+1} \\ 0 & \cdots & 0 & 1 \end{pmatrix},$$

that is, P_ij = a_ij, i, j = 1, ..., n. In this case W_k = 1 and

$$\eta_{(k)}(h) = \frac{h_{i_0}}{p_{i_0}}\, \frac{f_{i_k}}{g_{i_k}}.$$

There are, however, few applications of these techniques. The reason is that the Monte Carlo method is not competitive with classical numerical analysis in solving systems of linear equations. Still, there are some situations where the Monte Carlo method can be successfully used:

1 The size of the matrix A = ||a_ij||_1^n is sufficiently large (n > 10³), and a very rough approximation is required.

2 It is necessary to find (h, x) for different h and f, where x = Ax + f. As mentioned above, such problems can be solved (estimated) simultaneously by simulating only one Markov chain.
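The absorbing-state procedure above can be sketched in code. The following is an illustrative sketch, not from the book: the matrix A, the vector f, the weight vector h, the initial distribution p, and the sample size are all arbitrary assumptions, and the particular case P_ij = a_ij (so that W_k = 1) is used.

```python
import random

# Illustrative sketch (not from the book) of the absorbing-state procedure:
# estimate (h, x) for x = A x + f with the particular choice P_ij = a_ij,
# so that W_k = 1.  A, f, h, p, and N below are arbitrary assumptions.
A = [[0.3, 0.1], [0.2, 0.4]]          # a_ij >= 0, row sums < 1
f = [1.0, 1.0]
h = [1.0, 0.0]                        # h = e_1, so (h, x) = x_1
p = [0.5, 0.5]                        # initial distribution; p_i > 0 where h_i != 0

def eta(rng):
    """One replication: run the chain to absorption, return eta_(k)(h)."""
    i = rng.choices(range(len(f)), weights=p)[0]
    lead = h[i] / p[i]                # h_{i_0} / p_{i_0}
    while True:
        g = 1.0 - sum(A[i])           # absorption probability from state i
        if rng.random() < g:
            return lead * f[i] / g    # W_k = 1 since P_ij = a_ij
        i = rng.choices(range(len(f)), weights=A[i])[0]

rng = random.Random(12345)
N = 200_000
est = sum(eta(rng) for _ in range(N)) / N
print(est)   # for this A and f the solution is x = (1.75, 2.25), so (h, x) = 1.75
```

With P_ij = a_ij the weight W_k collapses to 1; for a general sub-stochastic P one would accumulate the factors a_{i_{m-1} i_m}/P_{i_{m-1} i_m} along the path.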
5.2 INTEGRAL EQUATIONS
One of the most fruitful applications of Monte Carlo methods is in the solution of integral equations. The reason is that such equations cannot be solved efficiently by classical numerical analysis. The idea of solving integral equations by a Monte Carlo method is similar to that of solving simultaneous linear equations. Both methods use Markov chains for simulation. There exists ample literature on solving integral equations by Monte Carlo methods (see [3, 7-9]). Its history is connected with the problem of neutron transport, which is described in Spanier and Gelbard's monograph [9]. One of the earliest methods for solving integral equations by a Monte Carlo method was proposed by Albert [1] and was later developed in Refs. 3, 7, and 8. Before proceeding with this topic we need some background on integral transforms.

5.2.1 Integral Transforms
Throughout this section we follow Sobol [8]. Let K be an integral operator such that

$$K\psi(x) = \int_D K(x, x_1)\, \psi(x_1)\, dx_1, \qquad x \in D, \qquad (5.2.1)$$

which maps the function ψ(x) into Kψ(x). Kψ(x) is usually called the first iteration of ψ with respect to the kernel K. The second iteration is

$$K^2\psi(x) = \int_D K(x, x_1)\, K\psi(x_1)\, dx_1. \qquad (5.2.2)$$

Proceeding recursively we obtain

$$K^k\psi(x) = \int_D K(x, x_1)\, K^{k-1}\psi(x_1)\, dx_1, \qquad (5.2.3)$$

the kth iteration of ψ with respect to the kernel K. We can estimate such integrals by quadrature methods or by Monte Carlo methods, as described in Chapter 4. However, there exists another Monte Carlo method of estimating such integrals, a method that is similar to the method of solving systems of simultaneous linear equations and that is based on simulating a Markov chain. Before describing the method let us introduce some notation and make some assumptions.
For any two functions h(x) and ψ(x) their inner product is denoted by (h, ψ), where

$$(h, \psi) = \int h(x)\, \psi(x)\, dx. \qquad (5.2.4)$$

Assume that

$$\psi(x) \in L^2(D), \qquad (5.2.5)$$

$$h(x) \in L^2(D), \qquad (5.2.6)$$

$$K(x, y) \in L^2(D \times D), \qquad (5.2.7)$$

that is,

$$\int \psi^2\, dx < \infty, \qquad (5.2.8)$$

$$\int h^2\, dx < \infty, \qquad (5.2.9)$$

$$\iint K^2\, dx\, dy < \infty, \qquad (5.2.10)$$

respectively. It is easy to prove, using the Cauchy-Schwarz inequality, that, if conditions (5.2.8) and (5.2.9) are met, then |(h, ψ)| < ∞. Indeed,

$$|(h, \psi)| \le \left(\int h^2\, dx\right)^{1/2} \left(\int \psi^2\, dx\right)^{1/2} < \infty.$$

In exercise 2 the reader is asked to prove Kψ(x) ∈ L²(D), given (5.2.5) and (5.2.7). With these results we can return to our problem of evaluating K^kψ. As we mentioned before, the method of evaluating K^kψ is similar to that for solving the system of linear equations described in Section 5.1.1. From now on we consider the problem of finding the inner product (h, K^kψ), which is similar to the problem (h, Σ_{m=0}^k A^m f). The reader is asked to keep this similarity in mind. By analogy with (5.1.10) and (5.1.11) let us introduce any continuous Markov chain
$$x_0 \to x_1 \to \cdots \to x_k \qquad (5.2.12)$$

satisfying ∫P(x, y) dy = 1 and ∫p(x) dx = 1, such that

1 p(x) > 0 if h(x) ≠ 0,

2 P(x, y) > 0 if K(x, y) ≠ 0,  (5.2.13)

where p(x) and P(x, y) are, respectively, the initial and the transition densities of the Markov chain (5.2.12)-(5.2.13). By analogy with Proposition 5.1.1 we can readily prove the following

Proposition 5.2.1

$$E\left[\eta_k(h)\right] = (h, K^k\psi), \qquad (5.2.14)$$

where the r.v. η_k(h) is defined on the path x_0 → x_1 → ⋯ → x_k as

$$\eta_k(h) = \frac{h(x_0)}{p(x_0)}\, W_k\, \psi(x_k), \qquad (5.2.15)$$

with

$$W_k = \prod_{m=1}^{k} \frac{K(x_{m-1}, x_m)}{P(x_{m-1}, x_m)}. \qquad (5.2.16)$$
Assuming for some given y that h(x) = p(x) = δ(x − y), where δ(·) is Dirac's delta function, we immediately obtain E[η_k(h)] = K^kψ(y). The procedure for estimating the inner product (h, K^kψ) is:

1 Choose an initial density p(x) and a transition density P(x, y) satisfying (5.2.12)-(5.2.13).

2 Simulate N independent random paths x_0^{(s)} → x_1^{(s)} → ⋯ → x_k^{(s)}, s = 1, 2, ..., N, from the Markov chain (5.2.12)-(5.2.13).

3 Find

$$\eta_k^{(s)}(h) = \frac{h(x_0^{(s)})}{p(x_0^{(s)})}\, W_k^{(s)}\, \psi(x_k^{(s)}), \qquad s = 1, \dots, N, \qquad (5.2.17)$$

where

$$W_k^{(s)} = \prod_{m=1}^{k} \frac{K(x_{m-1}^{(s)}, x_m^{(s)})}{P(x_{m-1}^{(s)}, x_m^{(s)})}. \qquad (5.2.18)$$

4 Calculate

$$\bar\theta_k = \frac{1}{N} \sum_{s=1}^{N} \eta_k^{(s)}(h) \approx (h, K^k\psi), \qquad (5.2.19)$$

which is an unbiased estimator of the inner product (h, K^kψ).
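Steps 1 through 4 can be sketched as follows. This is an illustrative example, not from the book: the kernel K, the functions ψ and h, the uniform densities p and P on D = [0, 1], and the constants are all assumptions, chosen so that the exact value (h, K²ψ) = 13/48 can be checked by hand.

```python
import random

# Illustrative sketch (not from the book): estimating (h, K^k psi) by the
# four-step procedure above, for an arbitrary example kernel on D = [0,1].
# Both the initial density p(x) and the transition density P(x, y) are
# taken uniform on [0,1], so each likelihood-ratio factor is K itself.
K   = lambda x, y: (x + y) / 2.0     # example kernel
psi = lambda x: 1.0                  # example psi
h   = lambda x: 1.0                  # example h
k   = 2                              # number of iterations of the kernel

def eta(rng):
    x = rng.random()                 # x_0 ~ p(x) = 1 on [0,1]
    w = h(x) / 1.0                   # h(x_0)/p(x_0)
    for _ in range(k):
        y = rng.random()             # x_m ~ P(x_{m-1}, .) = 1 on [0,1]
        w *= K(x, y) / 1.0           # accumulate W_k
        x = y
    return w * psi(x)

rng = random.Random(1)
N = 100_000
est = sum(eta(rng) for _ in range(N)) / N
print(est)   # the exact value of (h, K^2 psi) here is 13/48, about 0.271
```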
5.2.2 Integral Equations of the Second Kind
Consider the following integral equation of the second kind:

$$z(x) = \int_D K(x, x_1)\, z(x_1)\, dx_1 + f(x), \qquad (5.2.20)$$

which can be written as

$$z = Kz + f. \qquad (5.2.21)$$

Let us assume that f(x) ∈ L²(D), K(x, x_1) ∈ L²(D × D), and

$$\|K\| = \sup_x \int_D |K(x, y)|\, dy < 1. \qquad (5.2.22)$$

Under these assumptions, by analogy with (5.1.4), we can estimate (5.2.20) by applying the following recursive equation:

$$z^{(k+1)} = Kz^{(k)} + f. \qquad (5.2.23)$$

Setting z^{(0)} = 0 and K^0 = I, we get

$$z^{(k+1)} = f + Kf + \cdots + K^k f = \sum_{m=0}^{k} K^m f. \qquad (5.2.24)$$

Taking the limit as k → ∞,

$$\lim_{k \to \infty} z^{(k)} = \lim_{k \to \infty} \sum_{m=0}^{k} K^m f = z,$$

we obtain the exact solution z, provided the von Neumann series converges. One way of estimating (h, z) is via simulation of a continuous Markov chain similar to (5.1.10)-(5.1.11). The following proposition can be readily proved [8] by analogy with Proposition 5.1.1.

Proposition 5.2.2
For any given function h,

$$E\left[\eta_k(h)\right] = \left(h, \sum_{m=0}^{k} K^m f\right), \qquad (5.2.25)$$

where the r.v. η_k(h) is defined on the path x_0 → x_1 → ⋯ → x_k such that

$$\eta_k(h) = \frac{h(x_0)}{p(x_0)} \sum_{m=0}^{k} W_m f(x_m), \qquad (5.2.26)$$

with

$$W_0 = 1, \qquad W_m = \prod_{l=1}^{m} \frac{K(x_{l-1}, x_l)}{P(x_{l-1}, x_l)}. \qquad (5.2.27)$$
The sample mean

$$\bar\eta_k = \frac{1}{N} \sum_{s=1}^{N} \eta_k^{(s)}(h) \qquad (5.2.28)$$

estimates the inner product (h, Σ_{m=0}^k K^m f). Assuming again h(x) = p(x) = δ(x − y), we obtain (5.2.23). Considering an infinite path x_0 → x_1 → ⋯ → x_k → ⋯, we define the random variable

$$\eta(h) = \frac{h(x_0)}{p(x_0)} \sum_{m=0}^{\infty} W_m f(x_m). \qquad (5.2.29)$$
It can be shown that for η(h) to be an unbiased estimator of (h, z), that is,

$$E\left[\eta(h)\right] = (h, z), \qquad (5.2.30)$$

it is not enough to assume the convergence of the von Neumann series Σ_{m=0}^∞ K^m f, that is,

$$\sum_{m=0}^{\infty} K^m f < \infty. \qquad (5.2.31)$$

The reader is asked in exercise 5 to prove (5.2.30), provided

$$\sum_{m=0}^{\infty} |K|^m |f| < \infty. \qquad (5.2.32)$$

It is obvious that, when K(x, y) ≥ 0 and f(x) ≥ 0, both (5.2.31) and (5.2.32) coincide. Another way of estimating (h, z) is via simulation of a continuous Markov chain with an absorbing state similar to that of (5.1.50)-(5.1.51). Consider the random path x_0 → x_1 → ⋯ → x_{(k)} with the absorption time k, which is a random variable such that E(k) < ∞. Define on this path the r.v. (compare with (5.1.52))

$$\eta_{(k)}(h) = \frac{h(x_0)}{p(x_0)}\, \frac{W_k f(x_k)}{g(x_k)}, \qquad (5.2.33)$$

where g(x) is the absorption probability, p(x) is the initial distribution, and

$$W_k = \prod_{m=1}^{k} \frac{K(x_{m-1}, x_m)}{P(x_{m-1}, x_m)}. \qquad (5.2.34)$$

Then by analogy with Proposition 5.1.4 we can readily prove

$$E\left[\eta_{(k)}(h)\right] = (h, z), \qquad (5.2.35)$$

provided Σ_{m=0}^∞ |K|^m |f| < ∞.
To estimate (h, z) we simulate N random paths x_0^{(s)} → x_1^{(s)} → ⋯ → x_{(k_s)}^{(s)} with an absorbing state and find

$$\bar\eta = \frac{1}{N} \sum_{s=1}^{N} \eta_{(k_s)}^{(s)}(h). \qquad (5.2.36)$$
The problem x = Ax + f can be considered as a particular case of the problem z = Kz + f. Indeed, let us partition the region D into n mutually disjoint subregions D_i, i = 1, 2, ..., n, such that D = ∪_{i=1}^n D_i, and let us assume that f(x) and K(x, x_1) are constant functions in each subregion, that is,

$$f(x) = f_i, \qquad x \in D_i,$$

$$K(x, x_1) = a_{ij}, \qquad x \in D_i,\ x_1 \in D_j. \qquad (5.2.37)$$

Then, for any x ∈ D_i,

$$z(x) = \sum_{j=1}^{n} a_{ij} \int_{D_j} z(x_1)\, dx_1 + f_i. \qquad (5.2.38)$$

Inasmuch as z(x) does not depend on x within each subregion, the last formula can be written as

$$z_i = \sum_{j=1}^{n} a_{ij} z_j + f_i. \qquad (5.2.39)$$

Thus, by partitioning the region D into n disjoint subregions, we can find the solution of the integral equation (5.2.20) by solving the system of linear equations (5.2.39).
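The absorbing-state estimator (5.2.33) can be sketched as follows for an arbitrary example kernel; nothing here is from the book. We take D = [0, 1], initial density p(x) = 1, a sub-stochastic transition density P(x, y) = 0.5 (so the absorption probability is g(x) = 0.5 everywhere), and a kernel with ‖K‖ = 3/4 < 1; the exact answer (h, z) = 48/23 follows from solving z = Kz + f directly for this kernel.

```python
import random

# Illustrative sketch (not from the book): estimating (h, z) for z = Kz + f
# via the absorbing-chain estimator (5.2.33).  The kernel, f, h, and the
# chain are arbitrary assumptions on D = [0,1]: p(x) = 1, P(x, y) = 0.5,
# hence g(x) = 1 - integral of P = 0.5 everywhere.
K = lambda x, y: (x + y) / 2.0       # example kernel, ||K|| = 3/4 < 1
f = lambda x: 1.0
h = lambda x: 1.0
g = 0.5                              # absorption probability (= P(x, y) here)

def eta(rng):
    x = rng.random()                 # x_0 ~ p(x) = 1
    w = h(x) / 1.0                   # h(x_0)/p(x_0), with W_0 = 1
    while rng.random() >= g:         # continue the path with probability 1 - g
        y = rng.random()
        w *= K(x, y) / g             # W_k = prod K(x_{m-1},x_m)/P(x_{m-1},x_m)
        x = y
    return w * f(x) / g              # eta_(k)(h) of (5.2.33)

rng = random.Random(7)
N = 200_000
est = sum(eta(rng) for _ in range(N)) / N
print(est)   # the exact value of (h, z) here is 48/23, about 2.087
```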
5.2.3 Eigenvalue Problem

Consider the following homogeneous integral equation:

$$z(x) = \lambda \int K(x, x_1)\, z(x_1)\, dx_1, \qquad (5.2.40)$$

which can be written as

$$z = \lambda K z. \qquad (5.2.41)$$

If z ≢ 0, then λ is called an eigenvalue and z(x) is called the eigenfunction corresponding to λ. Let us assume that the smallest eigenvalue λ_1 is positive and that the kernel K(x, y) = K(y, x) is symmetric and positive definite, that is, (Kψ, ψ) > 0 if ψ ≢ 0. Under these assumptions, for any two positive functions f and h we have (see Sobol [8])

$$\lim_{m \to \infty} \frac{(h, K^{m+1} f)}{(h, K^m f)} = \frac{1}{\lambda_1} \qquad (5.2.42)$$

and

$$\lim_{m \to \infty} K^m f(x)\, (K^m f, K^m f)^{-1/2} = z(x), \qquad (5.2.43)$$

where z(x) is the eigenfunction corresponding to λ_1. We can estimate (h, K^m f) and K^m f simultaneously by a Monte Carlo method as described in Section 5.2.1. For further discussion of eigenvalue problems we refer the reader to Hammersley and Handscomb [5] and Sobol [8].

Until now we have not made any special assumptions about our Markov chains. We have required only that the estimators η_k(h) and η_{(k)}(h) be unbiased. It is clear that the variance of both η_k(h) and η_{(k)}(h) depends on the transition probabilities P_ij. Since in solving linear and integral equations we have, respectively, sums and integrals to deal with, it should be possible to use some of the variance reduction techniques of Chapter 4 for better efficiency. In this context the reader is referred to Michailov [7] and Ermakov [3].
5.3 THE DIRICHLET PROBLEM

One of the earliest and most popular illustrations of the Monte Carlo method is the solution of Dirichlet's problem [4]. Dirichlet's problem is to find a continuous and differentiable function u over a given domain D with boundary D_0, satisfying

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = F(x, y), \qquad (x, y) \in D, \qquad (5.3.1)$$

and

$$u(x, y) = g(x, y) \qquad \text{for } (x, y) \in D_0, \qquad (5.3.2)$$

where g = g(x, y) is some prescribed function. Equation (5.3.1) with F(x, y) = 0 is called the Laplace equation; with F(x, y) ≠ 0 it is known as the Poisson equation. Generally, there is no analytical solution known to this problem, and we have to apply a numerical method. We usually start by covering D with a grid and replacing the differential equation by its finite-difference approximation. Let us denote the closure of D by D̄, that is, D ∪ D_0 = D̄, and
the coordinates of the grid by x_α = αh and y_β = βh, where h is the step size. Taking the two-dimensional case for convenience, we call the point (x_α, y_β) ∈ D̄ an interior point of D̄ if its four neighbor points, namely, (x_α − h, y_β), (x_α + h, y_β), (x_α, y_β − h), and (x_α, y_β + h), also belong to D̄. We call (x_α, y_β) ∈ D̄ a boundary point if not all four of its neighbor points belong to D̄. Taking this definition into account, we have for any interior point

$$\frac{u_{\alpha+1,\beta} - 2u_{\alpha\beta} + u_{\alpha-1,\beta}}{h^2} + \frac{u_{\alpha,\beta+1} - 2u_{\alpha\beta} + u_{\alpha,\beta-1}}{h^2} = F_{\alpha\beta}, \qquad (x_\alpha, y_\beta) \in D, \qquad (5.3.3)$$

which is the finite-difference approximation of (5.3.1). Here u_{αβ} = u(x_α, y_β), F_{αβ} = F(x_α, y_β), u_{α±1,β} = u(x_α ± h, y_β), and u_{α,β±1} = u(x_α, y_β ± h). The last equation can be rewritten as

$$u_{\alpha\beta} = \tfrac{1}{4}\left(u_{\alpha-1,\beta} + u_{\alpha+1,\beta} + u_{\alpha,\beta-1} + u_{\alpha,\beta+1} - h^2 F_{\alpha\beta}\right). \qquad (5.3.4)$$

The boundary condition (5.3.2) is then

$$u_{\alpha\beta} = g_{\alpha\beta}, \qquad (x_\alpha, y_\beta) \in D_0. \qquad (5.3.5)$$

It is not difficult to see that, by numbering all the points (x_α, y_β) ∈ D̄ in any order, we can rewrite (5.3.4) and (5.3.5) as

$$u_i = \sum_{j=1}^{n} a_{ij} u_j + f_i, \qquad i = 1, 2, \dots, n. \qquad (5.3.6)$$

Here n is the number of mesh points (x_α, y_β) ∈ D̄, which is also equal to the order of the matrix ||a_ij||_1^n. The matrix ||a_ij|| has a specific structure: all diagonal elements are equal to zero; each row corresponding to an interior point of D has four elements equal to 1/4, all other elements being zero; each row corresponding to a boundary point also contains elements equal to 1/4 or zero, but the number of elements equal to 1/4 is the same as the number of neighboring points, which is always less than 4. Thus the Dirichlet problem is approximated by a system of linear equations (5.3.6), which can be solved by the Monte Carlo methods described in Section 5.1.3.
EXERCISES

1 Describe an algorithm for simulating an ergodic Markov chain.
2 Prove that Kψ(x) ∈ L²(D), given (5.2.5) and (5.2.7).
3 Prove that E[η_k²(h)] < ∞, given (5.2.5) through (5.2.7).

5 Prove (5.2.30), given Σ_{m=0}^∞ |K|^m |f| < ∞.

6 Prove (5.2.35), given Σ_{m=0}^∞ |K|^m |f| < ∞.
7 Consider the recursive formula (5.2.23),

$$z^{(k+1)} = Kz^{(k)} + f.$$

Assume z^{(0)} = ψ(x), where ψ(x) is any function. Then

$$z^{(k+1)} = \sum_{m=0}^{k} K^m f + K^{k+1}\psi.$$

Define

$$\eta_k(h) = \frac{h(x_0)}{p(x_0)} \left[\sum_{m=0}^{k} W_m f(x_m) + W_{k+1}\psi(x_{k+1})\right]$$

and prove that

$$E\left[\eta_k(h)\right] = (h, z^{(k+1)}).$$

8 Prove (5.1.43), that is, prove that asymptotically the method of the covering path is n times more efficient than the standard Monte Carlo method.

9 Consider the system of linear equations x = Ax + f, where

$$A = \begin{pmatrix} 0.5 & 0.2 \\ 0.3 & 0.4 \end{pmatrix}, \qquad f = \begin{pmatrix} 2 \\ 3 \end{pmatrix}.$$
The exact solution of this system is x = (x_1, x_2) = (7.5, 8.75). Simulate the following Markov chain with an absorbing state:

$$\tilde P = \begin{pmatrix} 0.5 & 0.2 & 0.3 \\ 0.3 & 0.4 & 0.3 \\ 0 & 0 & 1 \end{pmatrix}$$

and estimate the exact solution x = (x_1, x_2) = (7.5, 8.75) by making a run of 1000 replications of the Markov chain.
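A possible sketch for exercise 9 (an assumption, not the book's solution): reading the chain as the particular case P_ij = a_ij of Section 5.1.3 makes A the upper 2 × 2 block of the chain, and f = (2, 3) is then recovered from f = (I − A)x with the stated solution x = (7.5, 8.75).

```python
import random

# Hypothetical reading of exercise 9 (not the book's solution): P_ij = a_ij,
# so A is the upper 2x2 block of the chain and f = (I - A)x = (2, 3).
A = [[0.5, 0.2], [0.3, 0.4]]
f = [2.0, 3.0]
p = [0.5, 0.5]                        # initial distribution

def replicate(rng):
    """One absorbing path; returns one-replication estimates of (x_1, x_2)."""
    i0 = i = rng.choices((0, 1), weights=p)[0]
    while True:
        g = 1.0 - sum(A[i])           # absorption probability (0.3 from both states)
        if rng.random() < g:
            score = f[i] / g          # W_k = 1 since P_ij = a_ij
            # h = e_1 and h = e_2 are handled with the same paths
            return (score / p[0], 0.0) if i0 == 0 else (0.0, score / p[1])
        i = rng.choices((0, 1), weights=A[i])[0]

rng = random.Random(42)
N = 1000                              # the run length asked for in the exercise
reps = [replicate(rng) for _ in range(N)]
x1_hat = sum(r[0] for r in reps) / N
x2_hat = sum(r[1] for r in reps) / N
print(x1_hat, x2_hat)                 # estimates of x = (7.5, 8.75)
```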
REFERENCES

1 Albert, G. E., A general theory of stochastic estimates of the Neumann series for solution of certain Fredholm integral equations and related series, in Symposium on Monte Carlo Methods, edited by M. A. Meyer, Wiley, New York, 1956, pp. 37-46.

2 Çinlar, E., Introduction to Stochastic Processes, Prentice-Hall, Englewood Cliffs, New Jersey, 1975.

3 Ermakov, S. M., Monte Carlo Method and Related Questions, Nauka, Moscow, 1976 (in Russian).

4 Forsythe, G. E. and R. A. Leibler, Matrix inversion by a Monte Carlo method, Math. Tables Other Aids Comput., 4, 1950, 127-129.

5 Hammersley, J. M. and D. C. Handscomb, Monte Carlo Methods, Methuen, London, 1964.

6 Halton, J. H., A retrospective and prospective survey of the Monte Carlo method, Soc. Indust. Appl. Math. Rev., 12, 1970, 1-63.

7 Michailov, G. A., Some Problems in the Theory of the Monte Carlo Method, Nauka, Novosibirsk, U.S.S.R., 1974 (in Russian).

8 Sobol, I. M., Computational Methods of Monte Carlo, Nauka, Moscow, 1973 (in Russian).

9 Spanier, J. and E. M. Gelbard, Monte Carlo Principles and Neutron Transport Problems, Addison-Wesley, Reading, Massachusetts, 1969.
CHAPTER 6

Regenerative Method for Simulation Analysis

6.1 INTRODUCTION

It has already been mentioned in Chapter 1 that many real-world problems are too complex to be solved by analytical methods and that the most practical approach to their study is through simulation. In this chapter we consider simulation of a stochastic system, that is, of a system with random elements. Simulation of such systems can be considered as a statistical experiment, in which we seek valid statistical inferences about some unknown parameters associated with the output of the system (or the associated model) being simulated. However, classical methods of statistics are often unsuitable for estimating these parameters. The reason, as we see later, is that the observations made on the simulated system are highly correlated and nonstationary in time; under these circumstances it is difficult (actually impossible) to carry out adequate statistical analyses of the simulated data. To overcome these difficulties a procedure based on regenerative phenomena, called the regenerative method, has recently been developed. Historically, Cox and Smith [4] were the first to suggest use of regenerative phenomena for simulating a queueing system with Poisson arrivals. This idea was extended by Kabak [39] and Poliak [59]. Quite recently, Crane and Iglehart [6-9] developed a methodology for the regenerative method, based on a unified approach to analyzing the output of simulations of those systems that have the property of self-regeneration, that is, of invariably returning (at particular times) to the conditions under which the future of the simulation becomes a probabilistic replica of its past. In other words, if the simulation output is viewed as a stochastic process, the
regenerative property means that at those particular times the future behavior of the process is independent of its past behavior, and is governed by the same probability law; that is, at those times the stochastic process "starts afresh probabilistically." Crane and Iglehart showed that a wide variety of problems, such as communication networks, queues, maintenance and inventory control systems, can be cast into a common framework using regenerative phenomena; they then proposed a simple technique for obtaining point estimators and confidence intervals for parameters associated with the simulation output. The regenerative method also provides answers to the following important problems: how and when to start the simulation, how long to run it, when to begin collecting data, and how to deal with highly correlated data. The theory and practice of the regenerative method are now in the process of rapid development. The list of references contains about 100 relevant papers known to the author. An excellent introduction to the regenerative method can be found in Crane and Lemoine's book [10]. Iglehart's forthcoming monograph [38] will present a rigorous development of both the theory and practice. Many other recently obtained results, in particular regarding simulation of response time in networks of queues, are to be found in Iglehart and Shedler's monograph [37]. This chapter is organized as follows. The basic ideas of the regenerative method are discussed in Section 6.2. Section 6.3 deals with statistical problems, in particular with the confidence interval for the expected values of some functions defined on the steady-state distribution of the process being simulated. In Section 6.4 the ideas of the regenerative method are illustrated for a single-server queue, a repairman system, and a closed queueing system. Choice of the best among a set of competing systems is the subject of Section 6.5.
Section 6.6 deals with a linear programming problem in which the coefficients are unknown and represent the output parameters of regenerative processes. Variance reduction techniques in regenerative simulation are the subject of Section 6.7.
6.2 REGENERATIVE SIMULATION

We start this section with the definition of a regenerative process. Roughly speaking, a stochastic process {X(t): t ≥ 0} is called regenerative if there exist certain random times 0 < T_0 < T_1 < T_2 < ⋯ forming a renewal process* such that at each such time the future of the process becomes a probabilistic replica of the process itself. Informally, this means that at these times the future behavior of the process is independent of its past behavior and is invariably governed by the same law. In other words, the part of the process {X(t): T_{i−1} < t ≤ T_i} defined between any pair of successive times is a statistically independent probabilistic replica of any other part of the same process defined between any other pair of successive times. The times {T_i: i ≥ 0} are called regeneration times, and the time between T_{i−1} and T_i is referred to as the length of the ith cycle. Formally [5], a stochastic process {X(t): t ≥ 0} is regenerative if there exists a sequence T_0, T_1, ... of stopping times† such that:

1 T = {T_i: i = 0, 1, ...} is a renewal process.

2 For any l, m ∈ {0, 1, ...} and t_1, ..., t_l > 0, the random vectors {X(t_1), ..., X(t_l)} and {X(T_m + t_1), ..., X(T_m + t_l)} are identically distributed, and the processes {X(t): t < T_m} and {X(T_m + t): t ≥ 0} are independent.

*A sequence of random variables {T_n: n ≥ 0} is a renewal process provided that T_0 = 0 and T_n − T_{n−1} (n ≥ 1) are i.i.d. r.v.'s.
For example, let {X_n: n ≥ 0} be an irreducible, aperiodic, and positive recurrent Markov chain with a countable state space I = {0, 1, ...}, and let j be a fixed state; then every time at which state j is entered is a time of regeneration. Let us select a fixed state of the Markov chain (M.C.), say state 0. We then obtain a sequence of stopping times {T_i: i ≥ 0} such that 0 = T_0 < T_1 < T_2 < ⋯ and X_{T_i} = 0 almost surely (a.s.); that is, once the system enters state 0, the simulation can proceed without any knowledge of its past history. For another example, let us consider the queue size at time t for a GI/G/1 queueing system. Suppose the time origin is taken to be an instant of departure at which time the departing customer leaves behind exactly j customers. Then every time a departure occurs leaving behind j customers, the future of the stochastic process after such a time obeys exactly the same probability law as when the process started at time zero. More examples of regenerative processes are considered in Section 6.4. It is shown in Ref. 8 that under certain mild regularity conditions the process {X(t): t ≥ 0} has a limiting steady-state distribution in the sense that there exists a random vector X such that

$$\lim_{t \to \infty} P\{X(t) \le x\} = P\{X \le x\}.$$

†A random variable T taking values in [0, +∞) is a stopping time [5] for a stochastic process {X(t): t ≥ 0}, provided that for every finite t ≥ 0, the occurrence or nonoccurrence of the event {T ≤ t} can be determined from the history {X(s): s ≤ t} of the process up to time t.
This type of convergence is known as weak convergence and is denoted X(t) ⇒ X as t → ∞. The random vector X is called the steady-state vector. Let f: R^k → R be a given real-valued measurable function, and suppose we wish to estimate the value r = E{f(X)}, where X is the steady-state vector. For the M.C. {X_n: n ≥ 0} we have

$$r = E\{f(X)\} = \sum_{i \in I} f(i)\, P\{X = i\} = \sum_{i \in I} f(i)\, \pi_i. \qquad (6.2.1)$$

Here, π = {P(X = i): i ∈ I} is the steady-state (stationary) distribution of the regenerative process {X_n: n ≥ 0}, and f(i) can be interpreted as the penalty (reward) paid in state i. To find r we can solve the following linear system of stationary equations, π = πP, where P = {P_ij: i, j ∈ I} is the transition matrix, and then apply (6.2.1). Let us assume that the values f(i) are known but the transition matrix is unknown. It is clear that the value r cannot be found analytically, since π is determined by P, and simulation must be used. Another case is when P is known but the state space is very large; in this case it may be quite difficult to solve the system π = πP, and we must resort to simulation again. Possible functions f of interest are the following:
1 If

$$f(i') = \begin{cases} 1, & i' = i, \\ 0, & \text{otherwise}, \end{cases}$$

then E{f(X)} = P(X = i) = π_i.

2 If

$$f(i') = \begin{cases} 1, & i' \le j, \\ 0, & \text{otherwise}, \end{cases}$$

then E{f(X)} = P(X ≤ j).

3 If f(i) = i^p, p > 0, then E{f(X)} = E{X^p}.

4 If f(i) = c_i = cost of being in state i, then E{f(X)} = Σ_{i∈I} c_i π_i (the stationary expected cost per unit time).
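When P is known and the state space is small, r in (6.2.1) can of course be computed without simulation. The following sketch (an arbitrary example, not from the book) solves π = πP by power iteration and applies case 4 above; simulation becomes necessary precisely when P is unknown or the state space is too large for such a computation.

```python
# Illustrative sketch (not from the book): when P is known and small, r of
# (6.2.1) can be computed directly by solving pi = pi P (here by power
# iteration) and then averaging f over pi.  The chain and the costs c_i
# below are arbitrary assumptions.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]                 # transition matrix
c = [0.0, 1.0, 4.0]                   # f(i) = c_i, cost of being in state i

pi = [1.0, 0.0, 0.0]                  # any starting distribution
for _ in range(500):                  # pi <- pi P until convergence
    pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]

r = sum(c[i] * pi[i] for i in range(3))
print(pi, r)                          # pi = (5/21, 9/21, 7/21), r = 37/21
```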
Let τ_i denote the interval between the ith and the (i + 1)th regeneration times, that is, τ_i = T_{i+1} − T_i, i ≥ 0; τ_i is referred to as the length of the ith cycle. Next, assume E(τ_i) < ∞, and define

$$Y_i = \int_{T_i}^{T_{i+1}} f(X(t))\, dt \qquad (6.2.2)$$
or

$$Y_i = \sum_{n=T_i}^{T_{i+1}-1} f(X_n), \qquad (6.2.3)$$

depending on whether the process {X(t): t ≥ 0} is continuous-time or discrete-time. In other words, Y_i is the penalty (reward) during the cycle of length τ_i = T_{i+1} − T_i. Naturally, Y_i is a random variable (r.v.) because so are τ_i and f(X_i). We now formulate two fundamental propositions that are used extensively in the rest of this chapter.

Proposition 6.2.1. The sequence {(Y_i, τ_i): i ≥ 1} consists of independent and identically distributed (i.i.d.) random vectors.

Proposition 6.2.2. If τ_1 is aperiodic,* E(τ_1) < ∞, and E{|f(X)|} < ∞, then

$$r = E\{f(X)\} = \frac{E(Y_1)}{E(\tau_1)}. \qquad (6.2.4)$$
There is an analogous ratio formula when τ_1 is periodic. For proof of these propositions the reader is referred to [5]. Proposition 6.2.1 says that the behavior patterns of the system during different cycles are statistically independent and identically distributed. Proposition 6.2.2 enables us to estimate the value r = E(Y_1)/E(τ_1) (which is the same as r = E(Y_i)/E(τ_i)) by classical statistical methods, and to find point estimators and confidence intervals for r. These two problems are the subject of the next section.
In this section we consider several point estimators and confidence intervals for the ratio E ( Y , ) / E ( 7 , ) . The problem we consider is as follows: given the Lid. sequence of random vectors { ( Y , T ~ ) : l}, ~ > find point estimators and construct IOO(1 - &)% confidence intervals for the ratio E( r,)lQ7,1. *The random variable T, is periodic with period X > 0 if, with probability 1, it assumes values in the set (0,A, 2X,. .. ] and A is the largest such number. if there is no sucb A, then T , is said to be aperiodic.
Let Z_i = Y_i − rτ_i. It is readily seen that the Z_i's are i.i.d. r.v.'s, since the vectors (Y_i, τ_i) also are. Note also that

$$E(Z_i) = 0 \qquad (6.3.1)$$

and

$$\sigma^2 = \operatorname{var}(Z_1) = \operatorname{var}(Y_1) - 2r \operatorname{cov}(Y_1, \tau_1) + r^2 \operatorname{var}(\tau_1). \qquad (6.3.2)$$

Denote Ȳ = (1/n) Σ_{i=1}^n Y_i and τ̄ = (1/n) Σ_{i=1}^n τ_i; then by virtue of the central limit theorem (c.l.t.) we have

$$\frac{n^{1/2}(\bar Y - r\bar\tau)}{\sigma} \Rightarrow N(0, 1), \qquad \text{as } n \to \infty, \qquad (6.3.3)$$

where ⇒ denotes weak convergence and it is assumed that σ² < ∞. The last formula can be rewritten as

$$\frac{n^{1/2}(\hat r - r)}{\sigma/\bar\tau} \Rightarrow N(0, 1), \qquad (6.3.4)$$

where r̂ = Ȳ/τ̄. Inasmuch as σ is unknown, we cannot obtain a confidence interval for r directly from (6.3.4). However, we can estimate σ² in (6.3.2) from the sample, that is, by

$$s^2 = s_{11} - 2\hat r s_{12} + \hat r^2 s_{22}, \qquad (6.3.5)$$

where

$$s_{11} = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar Y)^2, \qquad s_{12} = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar Y)(\tau_i - \bar\tau), \qquad s_{22} = \frac{1}{n-1}\sum_{i=1}^{n}(\tau_i - \bar\tau)^2.$$

It is straightforward to see that s² → σ² a.s. as n → ∞, so (6.3.4) can be rewritten as

$$\frac{n^{1/2}(\hat r - r)}{s/\bar\tau} \Rightarrow N(0, 1), \qquad \text{as } n \to \infty, \qquad (6.3.6)$$

and the 100(1 − δ)% confidence interval for r = E(Y_1)/E(τ_1) is

$$\left[\hat r - \frac{z_\delta s}{\bar\tau n^{1/2}},\ \hat r + \frac{z_\delta s}{\bar\tau n^{1/2}}\right], \qquad (6.3.7)$$

where z_δ = Φ^{−1}(1 − δ/2), Φ is the standard normal distribution function, and r̂ = Ȳ/τ̄ is the point estimator of E(Y_1)/E(τ_1). The procedure for
obtaining a 100(1 − δ)% confidence interval for r can be written as follows:

1 Simulate n cycles of the regenerative process.

2 Compute the sequence τ_1, ..., τ_n and the associated sequence Y_1, ..., Y_n (use (6.2.2) and (6.2.3), respectively, for a continuous-time or discrete-time process).

3 Compute Ȳ = (1/n) Σ_{i=1}^n Y_i and τ̄ = (1/n) Σ_{i=1}^n τ_i and find the point estimator by

$$\hat r = \frac{\bar Y}{\bar\tau}. \qquad (6.3.8)$$

4 Construct the confidence interval by

$$\left[\hat r - \frac{z_\delta s}{\bar\tau n^{1/2}},\ \hat r + \frac{z_\delta s}{\bar\tau n^{1/2}}\right],$$

where z_δ = Φ^{−1}(1 − δ/2) and Φ is the standard normal distribution function.
It is readily seen that r̂ = Ȳ/τ̄, referred to as the classical estimator [28], is a biased but consistent estimator of E(Y_1)/E(τ_1). Iglehart [28] suggested, for the same purpose, the following alternatives:

BEALE ESTIMATOR

$$\hat r_B = \hat r\, \frac{1 + s_{12}/(n\bar Y\bar\tau)}{1 + s_{22}/(n\bar\tau^2)}. \qquad (6.3.9)$$

FIELLER ESTIMATOR

$$\hat r_F = \frac{\bar Y\bar\tau - k_\delta s_{12}}{\bar\tau^2 - k_\delta s_{22}}, \qquad (6.3.10)$$

where

$$k_\delta = \frac{\left[\Phi^{-1}(1 - \delta/2)\right]^2}{n}.$$

JACKKNIFE ESTIMATOR

$$\hat r_J = \frac{1}{n}\sum_{s=1}^{n}\theta_s, \qquad (6.3.11)$$

where

$$\theta_s = n\hat r - (n-1)\,\frac{n\bar Y - Y_s}{n\bar\tau - \tau_s}.$$

TIN ESTIMATOR

$$\hat r_T = \hat r\left[1 + \frac{1}{n}\left(\frac{s_{12}}{\bar Y\bar\tau} - \frac{s_{22}}{\bar\tau^2}\right)\right]. \qquad (6.3.12)$$

Let us now cite some results from Ref. 28. The four point estimators (6.3.9) through (6.3.12), as well as the classical estimator, are biased. Their expected value can be expressed as

$$E[\hat r(n)] = r + \frac{c_1}{n} + \frac{c_2}{n^2} + o\!\left(\frac{1}{n^2}\right). \qquad (6.3.13)$$

The point estimators (6.3.9), (6.3.11), and (6.3.12) have been suggested in order to reduce the bias of (6.3.13) up to order 1/n². For the jackknife method c_1 = 0, since the jackknife is constructed to cancel the 1/n term in (6.3.13). The reader is asked to prove that for both the Beale and Tin estimators c_1/n is also equal to zero. Since both n^{1/2}(r̂ − r̂_B) → 0 and n^{1/2}(r̂ − r̂_T) → 0 a.s. as n → ∞, we can replace r̂ both in (6.3.6) for the c.l.t. and in (6.3.7) for the confidence interval without changing the results. For the jackknife method, formulas (6.3.6) and (6.3.7) can be written, respectively, as

$$\frac{n^{1/2}(\hat r_J - r)}{s_J} \Rightarrow N(0, 1) \qquad \text{as } n \to \infty \qquad (6.3.14)$$

and

$$\left[\hat r_J - \frac{z_\delta s_J}{n^{1/2}},\ \hat r_J + \frac{z_\delta s_J}{n^{1/2}}\right],$$

where

$$s_J^2 = \frac{1}{n-1}\sum_{s=1}^{n}(\theta_s - \hat r_J)^2.$$
The Fieller method yields the following 100(1 − δ)% confidence interval:

$$I_F = \frac{(\bar Y\bar\tau - k_\delta s_{12}) \pm \left[(\bar Y\bar\tau - k_\delta s_{12})^2 - (\bar\tau^2 - k_\delta s_{22})(\bar Y^2 - k_\delta s_{11})\right]^{1/2}}{\bar\tau^2 - k_\delta s_{22}}, \qquad (6.3.15)$$

where

$$k_\delta = \frac{\left[\Phi^{-1}(1 - \delta/2)\right]^2}{n}.$$
The performances of these estimators were compared numerically (via simulating several stochastic models), and the following results were obtained [28]. For short runs the jackknife method is recommended both for point estimators and confidence intervals, because it produces slightly better statistical results than other methods. Two minor drawbacks of the jackknife method are a large memory requirement and slightly more complex programming. Additional storage addresses of the order of 2n are required, where n is the number of cycles observed. Where the storage requirement for the jackknife method is excessive, the Beale or Tin methods are recommended for point estimates and the classical method for the confidence intervals. The Fieller method is recommended for neither point estimates nor confidence intervals. It is found to be heavily biased for short runs and more complicated than the classical method. The above mentioned five point estimators were based on simulating n cycles of regenerative processes. Another possibility is to consider point estimators based on simulation for a fixed (but large) length of time t. In this case the number of cycles N_t in the interval (0, t] is a random variable given by

$$N_t = \sum_{s=1}^{\infty} 1_{[0,t]}(T_s),$$

where 1_{[0,t]} is the indicator function of the interval [0, t]. Replacing n by N_t, we can modify all the point estimators (6.3.8) through (6.3.12), preserving their consistency. For example, for the classical estimator we have

$$\hat r(N_t) = \frac{\sum_{i=1}^{N_t} Y_i}{\sum_{i=1}^{N_t} \tau_i}.$$

Thus, asymptotically, there is little difference between point estimators based on simulating n regenerative cycles and those based on simulating for a fixed length of time t. The c.l.t. in this case is

$$\frac{t^{1/2}(\hat r(N_t) - r)}{\sigma/E(\tau_1)^{1/2}} \Rightarrow N(0, 1) \qquad \text{as } t \to \infty.$$
Recently Heidelberger and Meketon [32] considered estimators based on simulations for a relatively short length of time $t$. They defined the estimators

$$\hat{r}(N_t) = \frac{\sum_{i=1}^{N_t} Y_i}{\sum_{i=1}^{N_t} \tau_i} \qquad (6.3.17)$$

and

$$\hat{r}(N_t + 1) = \frac{\sum_{i=1}^{N_t+1} Y_i}{\sum_{i=1}^{N_t+1} \tau_i}. \qquad (6.3.18)$$

They then showed that

$$E\{\hat{r}(N_t)\} = r + O\!\left(\frac{1}{t}\right) \qquad (6.3.19)$$

and

$$E\{\hat{r}(N_t + 1)\} = r + o\!\left(\frac{1}{t}\right), \qquad (6.3.20)$$
so that a bias reduction is achieved by continuing the simulation until the first regeneration after time $t$. The bias reduction is comparable to that of the jackknife, Beale, and Tin estimators, since $t$ is proportional to the number of cycles. Table 6.4.3 lists empirical results from simulations of a closed queueing network model for these estimators. We turn now to the problem of determining run length. The $100(1-\delta)\%$ confidence interval for a large but fixed number of cycles has a width approximately equal to
$$\frac{2\sigma\,\Phi^{-1}(1-\delta/2)}{E(\tau_1)\, n^{1/2}}. \qquad (6.3.21)$$
In terms of the duration time $t$, (6.3.21) can be written as (see [24])

$$\frac{2\sigma\,\Phi^{-1}(1-\delta/2)}{\left(E(\tau_1)\, t\right)^{1/2}}. \qquad (6.3.22)$$

Note that neither $\sigma$ nor $E(\tau_1)$ is known in advance. Hence it may be worthwhile to take a small sample and obtain rough estimates for $\sigma$ and $E(\tau_1)$. Such estimates would form a basis for a final decision on run length
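For instance, the pilot-sample idea can be sketched as follows (Python; the pilot values $\sigma \approx 1.0$ and $E(\tau_1) \approx 2.0$ are assumed purely for illustration). It solves (6.3.21) for the number of cycles $n$ that gives a prescribed half-width:

```python
import math
from statistics import NormalDist

def cycles_needed(sigma, e_tau, delta, half_width):
    """Smallest n such that the 100(1 - delta)% confidence interval of (6.3.21),
    with half-width sigma * Phi^{-1}(1 - delta/2) / (E(tau_1) * sqrt(n)),
    is no wider than the prescribed half_width."""
    z = NormalDist().inv_cdf(1 - delta / 2)      # Phi^{-1}(1 - delta/2)
    return math.ceil((sigma * z / (e_tau * half_width)) ** 2)

# Assumed pilot estimates: sigma ~ 1.0, E(tau_1) ~ 2.0; 90% confidence,
# desired half-width 0.05.
n = cycles_needed(sigma=1.0, e_tau=2.0, delta=0.10, half_width=0.05)
```

With these (made-up) pilot values the rule calls for $n = 271$ cycles; halving the half-width would roughly quadruple $n$.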
and level of confidence. We wish to emphasize that all ratio estimators described in this section are designed for simulations with a fixed number of cycles n or a fixed run length t. An alternative possibility would be to consider procedures based on sequential stopping rules.
6.4 EXAMPLES OF REGENERATIVE PROCESSES
In this section we consider three examples of regenerative processes, taken from Refs. 6, 10, and 49: a single server queue, a repair model with spares, and a closed queueing network.

6.4.1 A Single Server Queue GI/G/1 [6]
This example was described in Section 4.3.12 and is briefly recapitulated here. Let $W_i$ and $S_i$ be the waiting time and service time, respectively, of the $i$th customer in a single server queue, and let $A_{i+1}$ be the time between the arrivals of the $i$th and $(i+1)$th customers. We assume that $\{S_i, i \ge 0\}$ are i.i.d. with $E(S_i) = \mu^{-1}$ and that $\{A_i, i \ge 1\}$ are i.i.d. with $E(A_i) = \lambda^{-1}$. Let the traffic intensity $\rho$ be defined by $\rho = \lambda/\mu$. We assume that customer number 0 arrives at time 0 to an empty system. Let $X_i = S_{i-1} - A_i$ for $i \ge 1$. The waiting time process $\{W_i, i \ge 0\}$ can be defined recursively by

$$W_0 = 0, \qquad W_i = (W_{i-1} + X_i)^+, \quad i \ge 1.$$

It is known [36] that, if $\rho < 1$, there exists an infinite number of indices $i$ such that $W_i = 0$ and a random variable $W$ such that $W_i \Rightarrow W$ as $i \to \infty$. Thus we choose the zero state as our return state, and regenerations occur whenever a customer arrives to find an empty queue. We are interested in estimating $E(W)$, which is finite if $E(S_i^2) < \infty$. Since no analytical results are available for calculating the steady-state waiting time $E(W)$, we estimate it via simulation by making use of the classical estimator (6.3.8). The simulation results are shown in Fig. 6.4.1. We see that customers 1, 3, 4, 7, 11, and 16 find the server idle, that is, $W_1 = W_3 = W_4 = W_7 = W_{11} = W_{16} = 0$, while customers 2, 5, 6, 8, 9, 10, 12, 13, 14, and 15 find the server busy and wait in the queue before being served.
It follows from Fig. 6.4.1 that the simulation data contain five complete cycles with the following pairs $\{(Y_i, \tau_i), i = 1, \ldots, 5\}$: $(Y_1, \tau_1) = (10, 2)$, $(Y_2, \tau_2) = (0, 1)$, $(Y_3, \tau_3) = (30, 3)$, $(Y_4, \tau_4) = (50, 4)$, and $(Y_5, \tau_5) = (60, 5)$. The sixth cycle will start with the arrival of customer 16. Using the
Fig. 6.4.1 Sample output of queueing simulation (waiting times plotted against customer number).
classical estimator $\hat{r} = \sum_{i=1}^{5} Y_i \big/ \sum_{i=1}^{5} \tau_i$, we obtain

$$\hat{r} = \frac{10 + 0 + 30 + 50 + 60}{2 + 1 + 3 + 4 + 5} = \frac{150}{15} = 10.$$
This result can also be obtained by using the sample-mean estimator
$$\bar{r} = \frac{1}{N}\sum_{i=1}^{15} W_i = \frac{150}{15} = 10.$$
Here $N = \sum_{i=1}^{5} \tau_i = 15$ is the length of the run and $\sum_{i=1}^{15} W_i = \sum_{i=1}^{5} Y_i$. A logical question arises: if both point estimators $\hat{r}$ and $\bar{r}$ are equal (we assume that the length of the run $N$ is equal to $n$ complete cycles, $n < N$), why do we need all the ratio estimators (6.3.8) through (6.3.12), (6.3.17), and (6.3.18), based on the regenerative phenomenon? The answer can be found if we consider not only point estimators for $r = E(W)$ but confidence intervals as well. In order to construct confidence intervals in the sense of classical statistics, the simulation data must form a sequence of i.i.d. samples from the same underlying probability distribution. The simulation data from the queueing system are the sequence of waiting times $W_1, \ldots, W_N$. Note, however, that if we start our simulation with an empty queueing system, then the first few waiting times tend to be short, that is, they are correlated, and as a rule the sample-mean estimator $\bar{r}$ will be a biased estimator of $r = E(W)$.
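To make the cycle bookkeeping concrete, the following minimal sketch (not from the book) runs the Lindley recursion for an M/M/1 queue with $\lambda = 5$ and $\mu = 10$ (the parameters of Table 6.4.1), groups the waiting times into regenerative cycles, and applies the classical estimator (6.3.8):

```python
import random

def mm1_classical_estimator(lam, mu, n_cycles, seed=0):
    """Estimate r = E(W) by sum(Y_i)/sum(tau_i) over n_cycles regenerative
    cycles; a new cycle starts whenever a customer arrives to an empty system."""
    rng = random.Random(seed)
    Y, tau = [], []        # per-cycle sum of waiting times, cycle length
    y, t = 0.0, 0
    w = 0.0                # customer 0 arrives at time 0 to an empty system
    while len(Y) < n_cycles:
        y += w
        t += 1
        x = rng.expovariate(mu) - rng.expovariate(lam)   # X_i = S_{i-1} - A_i
        w = max(w + x, 0.0)                              # Lindley recursion
        if w == 0.0:       # the next customer finds the server idle
            Y.append(y)
            tau.append(t)
            y, t = 0.0, 0
    return sum(Y) / sum(tau)

r_hat = mm1_classical_estimator(lam=5.0, mu=10.0, n_cycles=2000, seed=42)
```

For these parameters the theoretical value is $E(W) = \lambda/(\mu(\mu - \lambda)) = 0.1$, so a run of 2000 cycles should land near the 0.110 reported in Table 6.4.1.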
Table 6.4.1. Simulation Results for the M/M/1 Queue

Parameter          Theoretical Value    Point Estimate    Confidence Interval
r = E(W)           0.100                0.110             [0.096, 0.123]
E(W²)              0.040                0.046             [0.035, 0.056]
E{(W − 0.1)⁺}      0.120                0.133             [0.116, 0.148]
E(τ₁)              2.000                2.110             [2.012, 2.207]
σ(W)               0.173                0.182             [0.141, 0.271]

Source: Ref. 6. Note: Number of cycles n = 2000; level of confidence 100(1 − δ)% = 90%; number of replications N = 10; λ = 5, μ = 10.
To overcome this difficulty we can run the model until it reaches the steady state and then start collecting the simulation data. The problem of determining the steady-state distribution is a difficult one, however, requiring considerable computation (CPU) time; moreover, unless we start from it, $W_i$ and $W_{i+1}$ will again be correlated (if $W_i$ is short, then $W_{i+1}$ will also tend to be short, and vice versa). Since the r.v.'s $W_1, \ldots, W_N$ are correlated, classical statistical methods cannot be applied in constructing confidence intervals for $r = E(W)$. Still, this difficulty can be overcome by using the regenerative property, namely, by grouping the simulation data into independent pairs (blocks) $(Y_1, \tau_1), \ldots, (Y_n, \tau_n)$, which yields the different ratio estimators (see (6.3.8) through (6.3.12), (6.3.17), and (6.3.18)) and the associated confidence intervals by means of classical statistics. Table 6.4.1 presents simulation results for the queueing system M/M/1 with $\lambda = 5$, $\mu = 10$, based on a run of 2000 cycles. Confidence intervals at the 90% level are given for the parameters $E(W)$, $E(W^2)$, $E\{(W - 0.1)^+\}$, $E(\tau_1)$, and $\sigma(W)$. The function $(W - 0.1)^+$ may be interpreted as a penalty for long waiting times.
6.4.2 A Repairman Model with Spares [10]
We now consider a repairman problem with $n$ operating units and $m$ spares (Fig. 6.4.2). Each of the operating units fails with rate $\lambda$. A failed unit enters a queue for service from one of $s$ repairmen on a first-in-first-out (FIFO) basis and is replaced by a spare (if available). The distribution of the i.i.d. repair times is exponential with mean $\mu^{-1}$ for each repairman. A
Fig. 6.4.2 Repairman model with spares ($s$ repairmen, $m$ spares, $n$ operating units).
repaired unit enters the pool of spares unless there are fewer than $n$ units in operation, in which case it immediately becomes operational. Denoting by $X(t)$ the number of units in service or waiting in the queue for service, $\{X(t), t \ge 0\}$ is a birth and death process with state space $I = \{0, 1, \ldots, m + n\}$ and rates
$$\lambda_i = \begin{cases} n\lambda, & 0 \le i < m\\ (n + m - i)\lambda, & m \le i \le m + n \end{cases} \qquad \mu_i = \begin{cases} i\mu, & 1 \le i \le s\\ s\mu, & s \le i \le m + n. \end{cases}$$
Let us simulate the system for $T$ units of time and take as output the values $X(t)$, $0 \le t \le T$, where $X(t)$ is the number of units at the repair facility at time $t$. The sample mean $(1/T)\int_0^T X(t)\,dt$ is a consistent estimator for $E(X)$, where $E(X)$ is the mean number of units at the repair facility under steady-state conditions. However, unless the value $X(0)$ is obtained by sampling from the steady-state distribution of $X$, the sample mean will be a biased estimator due to the initial conditions. Moreover, if $t_1$ is close to $t_2$, then $X(t_1)$ and $X(t_2)$ will be highly correlated, because the number of units in the repair facility usually does not change quickly. Due to the initial bias of the estimator and to the correlation of the output data, it is impossible to apply classical statistics in estimating the steady-state value $r = E(X)$. However, by again applying the regenerative approach the difficulty can be overcome. From here on we repeat in essence what was done for the queueing simulation. The process $\{X(t) : t \ge 0\}$ is a regenerative one in continuous time, and $P(X(t) = i) \to P(X = i)$ as $t \to \infty$ for all $i \in I$.
Table 6.4.2. Simulation Results for Repairman Model

Parameter    Theoretical Value    Confidence Interval
-            5.353                [5.238, 5.432]
-            1.269                [1.201, 1.325]
-            0.465                [0.444, 0.475]
-            0.988                [0.987, 0.990]
-            0.012                [0.010, 0.013]
-            42.021               [37.459, 47.681]
-            73.375               [65.262, 83.342]

Source: Ref. 10. Note: Run length = 500 cycles; level of confidence 95%.
Suppose we start the simulation at time $T_0 = 0$ with $n$ operating units and $m$ spares, that is, at $T_0 = 0$ the repair facility is empty. The sequence of regeneration times is then $\{T_i : i \ge 0\}$, where $T_i$ is defined as the $i$th time at which the repair facility becomes empty. In other words, the system “starts afresh probabilistically,” or regenerates itself, at each time $T_i$. For any real-valued measurable function $f$ we define

$$Y_i = \int_{T_i}^{T_{i+1}} f\left(X(t)\right)dt;$$
then the pairs $(Y_1, \tau_1), \ldots, (Y_n, \tau_n)$, where $\tau_i = T_{i+1} - T_i$, are i.i.d. Suppose that the simulation time $T$ exactly equals $n$ cycles; then

$$\bar{r} = \frac{\sum_{i=1}^{n} Y_i}{\sum_{i=1}^{n} \tau_i}$$

is a biased but consistent estimator for $r = E\{f(X)\} = E(Y_1)/E(\tau_1)$. Table 6.4.2 gives simulation results for some output parameters based on a run of 500 cycles; one of these parameters represents the expected number of failures over a cycle. It is assumed that $n = 10$, $m = 5$, $s = 4$, and $\mu = 2$, and that the “lifetime” of an operating unit is exponentially distributed with $\lambda = 5$.
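The continuous-time bookkeeping can be sketched in the same spirit (Python; the parameters $n = 3$, $m = 2$, $s = 2$, $\lambda = 0.3$, $\mu = 1.0$ are hypothetical, chosen so that the empty state recurs often, and are not those behind Table 6.4.2). The sketch simulates the birth and death process with exponential holding times, closes a cycle each time the repair facility empties, and compares the ratio estimate of $E(X)$ with the exact value from the stationary distribution:

```python
import random

def rates(i, n, m, s, lam, mu):
    """Birth and death rates of the machine-repair process in state i."""
    birth = n * lam if i < m else (n + m - i) * lam
    death = min(i, s) * mu
    return birth, death

def exact_mean(n, m, s, lam, mu):
    """E(X) from the product-form stationary distribution of the birth-death chain."""
    w = [1.0]
    for i in range(n + m):
        b, _ = rates(i, n, m, s, lam, mu)
        _, d = rates(i + 1, n, m, s, lam, mu)
        w.append(w[-1] * b / d)
    return sum(i * wi for i, wi in enumerate(w)) / sum(w)

def regenerative_estimate(n, m, s, lam, mu, n_cycles, seed=1):
    """Ratio estimator sum(Y_i)/sum(tau_i), where Y_i integrates X(t) over the
    i-th cycle and a cycle is the time between successive visits to state 0."""
    rng = random.Random(seed)
    total_y = total_tau = 0.0
    x = 0
    for _ in range(n_cycles):
        y = t = 0.0
        while True:
            b, d = rates(x, n, m, s, lam, mu)
            hold = rng.expovariate(b + d)       # holding time in state x
            y += x * hold                       # integral of X(t) over the hold
            t += hold
            x += 1 if rng.random() < b / (b + d) else -1
            if x == 0:                          # repair facility empty: cycle ends
                break
        total_y += y
        total_tau += t
    return total_y / total_tau

n, m, s, lam, mu = 3, 2, 2, 0.3, 1.0            # assumed illustrative parameters
est = regenerative_estimate(n, m, s, lam, mu, n_cycles=20000)
exact = exact_mean(n, m, s, lam, mu)
```

Because the ratio estimator is consistent, `est` should sit close to `exact` after a long run, while a short run started from a non-stationary state would show the initial bias discussed above.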
6.4.3 A Closed Queueing Network [49]
Consider a closed queueing system that is a model of the time-sharing computer system in Fig. 6.4.3. The network comprises $M$ service centers and a fixed number $N$ of customers. Service center 1 consists of $N$
Fig. 6.4.3 Closed queueing network model of a time-sharing computer system.
terminals (identical parallel servers); hence a customer at this center never has to wait for a server to become free. Service center 2 is a single server processor that is shared: all customers receive service immediately, and if there are $k$ customers present, each customer is served at $1/k$ of the server's rate. Service centers $3, \ldots, M$ represent peripheral input-output devices (single server queues), each of which is scheduled on a FIFO basis. A customer completing service at service center 1 immediately enters service center 2, and immediately thereafter service center $j$ with probability $p_j > 0$, $j = 3, \ldots, M$, where $\sum_{j=3}^{M} p_j = 1$. After completing service at service center $j$, $j = 3, \ldots, M$, the customer enters service center 1 with probability $1 - p$ or service center 2 with probability $p$. Service times at service centers $j = 1, 2, \ldots, M$ are i.i.d. and exponentially distributed with mean $\mu_j^{-1}$. It is assumed that routing through the network is Markovian and that all service and routing mechanisms are mutually independent. Let $Q(t) = (Q_1(t), \ldots, Q_M(t))$, where $Q_j(t)$ is the number of customers at service center $j$ at time $t$. It can then be shown [49] that $\{Q(t) : t \ge 0\}$ is a continuous-time irreducible Markov chain, and hence a regenerative process. We define a response time as the time interval between a customer's departure from service center 1 and his next return to it, and let $W_i$ be the $i$th such completed response time. Then $W = \{W_i, i \ge 0\}$ is regenerative, with regeneration occurring whenever the last customer arrives at service center 1, leaving centers $2, \ldots, M$ empty. Again, we are interested in the expected stationary response time $r = E(W)$, which is known to be finite [49]. Let $d_i$ be the utilization of service center $i$, that is, the long run average proportion of time the server there is busy. The particular parameters chosen for this model are listed in Fig. 6.4.3 and yield $d_2 = 0.894$, $d_3 = 0.268$, $r = 8.65$.
Table 6.4.3 presents point estimates and 90% confidence intervals for several of the ratio estimators discussed in Section 6.3.
TABLE 6.4.3. Point Estimates and 90% Confidence Intervals for E(W) = 8.65 in Closed Queueing Network

Estimate        N = 5, t = 220    N = 10, t = 440    N = 30, t = 1320    N = 50, t = 2200
r̂(N_t)         8.28 ± 0.10       8.64 ± 0.10        8.23 ± 0.17         8.93 ± 0.23
r̂(N_t + 1)     8.46 ± 0.07       8.55 ± 0.07        8.60 ± 0.07         8.50 ± 0.09
Classical r̂    8.71 ± 0.09       8.62 ± 0.07        8.59 ± 0.07         8.63 ± 0.07
Jackknife       8.56 ± 0.07       8.61 ± 0.07        8.60 ± 0.08         8.62 ± 0.08

Source: Ref. 22. Note: N = number of cycles simulated; t = number of response times simulated; R = 200 replications for t = 220 and 440; R = 100 replications for t = 1320; R = 60 replications for t = 2200.
6.5 SELECTING THE BEST STABLE STOCHASTIC SYSTEM
In this section we consider some techniques for selecting the best system from among $N$ alternative systems according to a certain criterion. Assume that $N$ ($N \ge 2$) stochastic systems are being simulated, each giving rise to a regenerative process $\{X^i(t) : t \ge 0\}$, $i = 1, \ldots, N$. For example, $N$ alternative designs may be considered for a new system. Suppose that the measure of performance for the $i$th system is

$$r_i = E\{f(X^i)\}, \qquad i = 1, \ldots, N, \qquad (6.5.1)$$

where $f$ is a real-valued bounded measurable function and $X^i$ is the steady-state random variable of the regenerative process $\{X^i(t) : t \ge 0\}$. The problem is to choose the best system, that is, the system with the smallest value of $r_i$:

$$r_l = \min_{i=1,\ldots,N} r_i = \min_{i=1,\ldots,N} E\{f(X^i)\}. \qquad (6.5.2)$$
(We are minimizing $r_i$; the alternative problem of maximizing $r_i$ can be considered as well.) Iglehart [30] presents a method based on the following scheme. Two positive numbers $P^*$ and $\delta^*$ are specified; then with probability $P^*$ the system with the smallest (largest) $r_i$ is selected, whenever that value of $r_i$ is separated by at least $\delta^*$ from the other $r_i$'s. Two procedures have been considered in Ref. 30 for this problem. The first procedure is sequential and the second is two-stage. Both procedures involve the use of normal
approximations and require large samples in terms of the number of cycles of the regenerative processes simulated. We consider here another adaptive approach, suggested by Rubinstein [61]. Our method is based on an iterative procedure that selects the best system with probability 1. We start solving the problem (6.5.2) by considering the following linear programming problem:
$$\min_p W(p) = \min_p \sum_{i=1}^{N} E\{f(X^i)\}\, p_i \qquad (6.5.3)$$

subject to

$$\sum_{i=1}^{N} p_i = 1, \qquad p_i \ge 0, \quad i = 1, \ldots, N. \qquad (6.5.4)$$

If there exists a unique solution of (6.5.2), then the problem (6.5.3)-(6.5.4) is equivalent to (6.5.2), and its solution is given by a vector $p^*$ with a single nonzero component (in position $l$):

$$p^* = (0, \ldots, 0, 1, 0, \ldots, 0). \qquad (6.5.5)$$
The algorithm for solving the problem (6.5.3)-(6.5.4) is based on a step-by-step correction of the probability vector $p[n]$, where $n$ denotes the step number. There exists a mechanism, provided by (6.5.9) below, which ensures that $p_i[n] \ge \varepsilon[n]$, $i = 1, \ldots, N$, where $\{\varepsilon[n]\}_{n=1}^{\infty}$ is a monotone decreasing sequence of positive numbers, subject to (6.5.13) and (6.5.14) below. On the $n$th step the $i$th system, $i \in \{1, \ldots, N\}$, is chosen by simulating the distribution $p[n-1]$. We denote this event by $X[n] = X^i$. One cycle of the process $\{X^i(t) : t \ge 0\}$ is carried out. Denote by $\nu^i[n]$, $i = 1, \ldots, N$, the total number of renewal cycles made by the $i$th system up to and including the $n$th step. We check whether or not the inequality $\nu^k[n-1] \ge n\varepsilon[n]$, $k \in \{1, \ldots, i-1, i+1, \ldots, N\}$, is satisfied for all systems. If for some indices $k_1, \ldots, k_s \in \{1, \ldots, i-1, i+1, \ldots, N\}$ this inequality does not hold, then one additional cycle is carried out for each of the systems $k_1, \ldots, k_s$, so that ultimately

$$\nu^k[n] \ge n\varepsilon[n], \qquad k = 1, \ldots, N. \qquad (6.5.6)$$

We record

$$\tau_n^k = T^k_{\nu^k[n]} - T^k_{\nu^k[n]-1}, \qquad k = i, k_1, \ldots, k_s,$$

the lengths of the cycles performed, and for each $k$ calculate

$$Y_n^k = \int_{T^k_{\nu^k[n]-1}}^{T^k_{\nu^k[n]}} f\left(X^k(t)\right)dt, \qquad k = i, k_1, \ldots, k_s, \qquad (6.5.7)$$

if the process $\{X^k(t) : t \ge 0\}$ is continuous-time.
In the case of a discrete-time process the integral should be replaced by the corresponding sum over the $\nu^k[n]$th cycle. Set also

$$Y_n^k = \tau_n^k = 0, \qquad \text{if } k \notin \{i, k_1, \ldots, k_s\}. \qquad (6.5.8)$$

We construct a new distribution $p[n]$ by the following recurrence formula:

$$p[n] = \Pi_{S_{\varepsilon[n]}}\left\{p[n-1] - \gamma[n]\, B(n \mid i)\right\}. \qquad (6.5.9)$$
Here $S_\varepsilon$ is a simplex in $R^N$:

$$S_\varepsilon = \left\{p = (p_1, \ldots, p_N) : \sum_{k=1}^{N} p_k = 1,\; 0 < \varepsilon \le p_k,\; k = 1, \ldots, N\right\};$$

$\Pi_{S_\varepsilon}$ is the projection operator onto the simplex $S_\varepsilon$, such that, for any $x \in R^N$,

$$\|x - \Pi_{S_\varepsilon}(x)\| = \min_{y \in S_\varepsilon} \|x - y\|; \qquad (6.5.10)$$

and $B(\cdot \mid \cdot)$ is a vector $(B_1(\cdot \mid \cdot), \ldots, B_N(\cdot \mid \cdot))$ whose $k$th component is built from $Y_n^k$, $p_k[n-1]$, and $\tau^k[n]$ ((6.5.11)), with the cumulative sums

$$Y^k[n] = Y^k[n-1] + Y_n^k, \qquad \tau^k[n] = \tau^k[n-1] + \tau_n^k, \qquad k = 1, \ldots, N. \qquad (6.5.12)$$
The initial values of $p[0] \in S_{\varepsilon[0]}$, $Y[0] = (Y^1[0], \ldots, Y^N[0])$, and $\tau[0] = (\tau^1[0], \ldots, \tau^N[0])$ can be chosen arbitrarily, for example, $Y^k[0] = 0$, $\tau^k[0] = 0$, $k = 1, \ldots, N$. The sequences $\{\gamma[n]\}_{n=1}^{\infty}$ and $\{\varepsilon[n]\}_{n=1}^{\infty}$ must be chosen so that the following conditions are satisfied:

$$\gamma[n] \downarrow 0, \qquad \varepsilon[n] \downarrow 0, \qquad (6.5.13)$$

$$\sum_{n=1}^{\infty} \gamma[n] = \infty, \qquad (6.5.14)$$

$$\sum_{n=1}^{\infty} \gamma^2[n]\, \varepsilon^{-1}[n] < \infty, \qquad (6.5.15)$$

$$\sum_{n=1}^{\infty} \gamma[n]\, \varepsilon^{-1}[n]\, n^{-1/2} < \infty. \qquad (6.5.16)$$
Remark 1. In order to satisfy conditions (6.5.13) through (6.5.16), take, for example, $\gamma[n] \sim n^{-1}$, $\varepsilon[n] \sim n^{-0.4}$.
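The projection $\Pi_{S_\varepsilon}$ onto the truncated simplex can be computed exactly: shift the lower bound $\varepsilon$ out, project onto a standard simplex of total mass $1 - N\varepsilon$, and shift back. The sort-based routine below is one common way of doing this (a sketch, not the book's own implementation):

```python
def project_simplex(v, z):
    """Euclidean projection of v onto {x : x_i >= 0, sum(x_i) = z}."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - z) / j
        if uj - t > 0:          # keep the largest j for which this holds
            theta = t
    return [max(vi - theta, 0.0) for vi in v]

def project_truncated_simplex(p, eps):
    """Projection onto S_eps = {p : sum(p_i) = 1, p_i >= eps}."""
    q = project_simplex([pi - eps for pi in p], 1.0 - len(p) * eps)
    return [qi + eps for qi in q]

# A gradient step p[n-1] - gamma * B may leave the simplex; project it back.
p = project_truncated_simplex([0.9, 0.2, -0.1], eps=0.05)
```

Here the result is (0.825, 0.125, 0.05): the violated coordinate is clipped to $\varepsilon$ and the excess is redistributed so the components again sum to one, while a point already inside $S_\varepsilon$ is returned unchanged.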
Remark 2. We assume that $\tau_n^k \ge \tau_0 > 0$, $k = 1, \ldots, N$, $n = 1, 2, \ldots$, that is, that a cycle is taken into account only if it is of some minimal length (which can be considered as the sensitivity threshold of the measuring instrument).
Remark 3. The r.v.'s $Y^k[n]$ and $\tau^k[n]$, $k = 1, \ldots, N$, $n \ge 1$, defined in (6.5.12), store the information obtained up to and including the $n$th step. We should also note that, for each fixed $k$, only $\nu^k[n]$ summands in both $Y^k[n]$ and $\tau^k[n]$ are nonzero.
Theorem 6.5.1. If the values of the function $f$ are uniformly bounded by some constant $D$ and if there exists a unique optimal solution $p^*$ of the problem (6.5.3)-(6.5.4), then for any initial distribution $p[0] \in S_{\varepsilon[0]}$ the sequence $\{p[n]\}_{n=1}^{\infty}$ generated by the algorithm (6.5.9)-(6.5.14) converges to $p^*$ with probability 1.
Corollary. The theorem remains valid if we assume that the values of the function $f$ cannot be observed directly, but are measured with a random noise. In other words,

$$E\{f(X^i)\} = E\{E_\xi\{Q(X^i, \xi)\}\}, \qquad i = 1, \ldots, N,$$

where $\xi$ is a random vector with an unknown time-independent probability distribution function. In this case we can consider another random process:

$$\{U^i(t) : t \ge 0\} = \{Q(X^i(t), \xi)\}, \qquad i = 1, \ldots, N.$$

If $\{X^i(t) : t \ge 0\}$ is regenerative, then $\{U^i(t) : t \ge 0\}$ is also regenerative, the values of $Q$ are uniquely defined for each value of the steady-state r.v. $U^i$ of the process $\{U^i(t) : t \ge 0\}$, and

$$E\{f(X^i)\} = E\{E_\xi\{Q(X^i, \xi)\}\} = E\{Q(U^i)\}, \qquad i = 1, \ldots, N.$$
Proof of the Theorem. Before proving the theorem let us introduce some notation. Let

$$t_i[n] = \hat{r}_i[n] - r_i, \qquad t[n] = \max_i |t_i[n]|, \qquad (6.5.17)$$

where $\hat{r}_i[n] = Y^i[n]/\tau^i[n]$, $r_i = E\{f(X^i)\}$, $i = 1, \ldots, N$, and $n = 1, 2, \ldots$.
On the $n$th step the state of the algorithm can be described by a $4N$-dimensional vector

$$\Xi[n] = \left(p[n],\; \tau[n] = (\tau^1[n], \ldots, \tau^N[n]),\; Y[n] = (Y^1[n], \ldots, Y^N[n]),\; \nu[n] = (\nu^1[n], \ldots, \nu^N[n])\right).$$

We first prove the following lemma.
Lemma. For any $\Xi[0]$ such that $p[0] \in S_{\varepsilon[0]}$ and $\tau[0] > 0$,

$$\sum_{n=1}^{\infty} \gamma[n]\, E\{|t_i[n]| \mid \Xi[0]\} < \infty, \qquad i = 1, \ldots, N. \qquad (6.5.18)$$
Proof. Without loss of generality set $i = 1$ and define

$$Z_n^1 = Y_n^1 - r_1 \tau_n^1, \qquad n = 1, 2, \ldots.$$

If a cycle of the regenerative process $\{X^1(t) : t \ge 0\}$ was not carried out on the $n$th step, then $Z_n^1 = 0$. For all $n$'s such that a cycle of $\{X^1(t) : t \ge 0\}$ was performed on the $n$th step, the $Z_n^1$ are i.i.d. r.v.'s with $E(Z_n^1) = 0$ and variance $\sigma_{Z^1}^2$. Define also

$$Z^1[n] = Z^1[n-1] + Z_n^1, \qquad n = 1, 2, \ldots, \qquad Z^1[0] = Y^1[0] - r_1 \tau^1[0].$$
Then by the Cauchy-Schwarz inequality,

$$E\{|t_1[n]| \mid \Xi[0]\} = E\left\{\frac{|Z^1[n]|}{\tau^1[n]} \;\Big|\; \Xi[0]\right\} \le E^{1/2}\{(Z^1[n])^2 \mid \Xi[0]\}\, E^{1/2}\{(\tau^1[n])^{-2} \mid \Xi[0]\}$$
$$\le \left((Z^1[0])^2 + n\sigma_{Z^1}^2\right)^{1/2}\left(\tau^1[0] + \tau_0\, \nu^1[n]\right)^{-1},$$

where $\tau_0$ was defined in Remark 2. Since by (6.5.6) $\nu^1[n] \ge n\varepsilon[n]$, we have

$$E\{|t_1[n]| \mid \Xi[0]\} \le \left((Z^1[0])^2 + n\sigma_{Z^1}^2\right)^{1/2}\left(\tau^1[0] + \tau_0\, n\varepsilon[n]\right)^{-1}.$$

Thus for $n$ large enough

$$E\{|t_1[n]| \mid \Xi[0]\} \le A_1\, \varepsilon^{-1}[n]\, n^{-1/2}, \qquad (6.5.19)$$

where $A_1 = A_1(\Xi[0])$. Inequality (6.5.19) and condition (6.5.16) imply the convergence of the series (6.5.18). Q.E.D.
Corollary. For any state $\Xi[n]$ of the algorithm on the $n$th step,

$$\sum_{m=n+1}^{\infty} \gamma[m]\, E\{t[m] \mid \Xi[n]\} < \infty.$$
Now we can prove our theorem.
Consider the vector $p^*[n] \in S_{\varepsilon[n]}$ such that

$$p_l^*[n] = 1 - (N-1)\varepsilon[n], \qquad p_k^*[n] = \varepsilon[n], \quad k \ne l, \qquad (6.5.20)$$

where $l$ is defined by (6.5.2) and is unique by the condition of Theorem 6.5.1. We have:
The first sum in (6.5.26) exists by (6.5.15) and the second by the corollary from the lemma. Taking the conditional expectation of both sides
in (6.5.26), we obtain

$$E\{v[n] \mid \Xi[n-1]\} = E\{\|p[n] - p^*[n]\|^2 \mid \Xi[n-1]\} + L\varepsilon[n] + D^2 N \sum_{m=n+1}^{\infty} \gamma^2[m]\,\varepsilon^{-1}[m-1] + 2N \sum_{m=n+1}^{\infty} \gamma[m]\, E\{E\{t[m] \mid \Xi[n]\} \mid \Xi[n-1]\}$$
$$= E\{\|p[n] - p^*[n]\|^2 \mid \Xi[n-1]\} + L\varepsilon[n] + D^2 N \sum_{m=n+1}^{\infty} \gamma^2[m]\,\varepsilon^{-1}[m-1] + 2N \sum_{m=n+1}^{\infty} \gamma[m]\, E\{t[m] \mid \Xi[n-1]\}. \qquad (6.5.27)$$

The last equality in (6.5.27) is justified by the fact that $\Xi[n]$ is a Markov chain taking values in $R^{4N}$. Using (6.5.25),

$$E\{v[n] \mid \Xi[n-1]\} \le \|p[n-1] - p^*[n-1]\|^2 + L\varepsilon[n-1] + D^2 N \sum_{m=n}^{\infty} \gamma^2[m]\,\varepsilon^{-1}[m-1] + 2N \sum_{m=n}^{\infty} \gamma[m]\, E\{t[m] \mid \Xi[n-1]\} = v[n-1]. \qquad (6.5.28)$$

Thus $v[n]$ is a supermartingale [5] with respect to $\Xi[n]$, and $v[n] \to v$ a.s. as $n \to \infty$. On the other hand,

$$u[n] = 2N \sum_{m=n+1}^{\infty} \gamma[m]\, E\{t[m] \mid \Xi[n]\} \qquad (6.5.29)$$

is also a supermartingale, since

$$E\{u[n] \mid \Xi[n-1]\} = 2N \sum_{m=n+1}^{\infty} \gamma[m]\, E\{t[m] \mid \Xi[n-1]\} \le u[n-1]. \qquad (6.5.30)$$

Therefore $u[n] \to u$ a.s. as $n \to \infty$, and thus $\|p[n] - p^*[n]\|^2 \to v - u$ a.s. Taking the unconditional (i.e., conditioned on $\Xi[0]$) expectation of both sides of the first inequality in (6.5.24), using (6.5.25), and summing up from
$n = 1$ to $n = n_1$, we obtain

$$E\{\|p[n_1] - p^*[n_1]\|^2\} \le \|p[0] - p^*[0]\|^2 + D^2 N \sum_{n=1}^{n_1} \gamma^2[n]\,\varepsilon^{-1}[n-1] - 2\sum_{n=1}^{n_1} \gamma[n]\, E\{W(p[n-1]) - W(p^*[n-1])\} + 2N \sum_{n=1}^{n_1} \gamma[n]\, E\{t[n]\}. \qquad (6.5.31)$$

As $n_1 \to \infty$ the last sum converges according to the lemma. Therefore

$$\sum_{n=1}^{\infty} \gamma[n+1]\, E\{W(p[n]) - W(p^*[n])\} < \infty. \qquad (6.5.32)$$
By the Fatou lemma,

$$\sum_{n=1}^{\infty} \gamma[n+1]\,\{W(p[n]) - W(p^*[n])\} < \infty \quad \text{a.s.} \qquad (6.5.33)$$

From (6.5.33) and (6.5.14) follows the existence of a subsequence $n_k$ such that

$$\|p[n_k] - p^*[n_k]\|^2 \to 0 \quad \text{a.s. as } n_k \to \infty.$$

Therefore $v - u = 0$ a.s., and $\|p[n] - p^*[n]\| \to 0$ a.s. as $n \to \infty$. On the other hand, $p^*[n] \to p^*$, and so $p[n] \to p^*$ a.s. as $n \to \infty$. Q.E.D.
Example (search for an optimal policy in a Markov decision process in the absence of a priori information). Consider a system with $I$ states $S_1, \ldots, S_I$. At every stage $n = 1, 2, \ldots$, one of $M$ possible decisions $D_1, \ldots, D_M$ must be made. Denote by $S[n]$ and $D[n]$ the state and the decision made at stage $n$, respectively. If $S[n] = S_i$ and $D[n] = D_k$, then the system moves at the next stage, $n + 1$, into the state $S_j$ with an a priori unknown probability

$$\pi_{ij}^{(k)} = \Pr\{S[n+1] = S_j \mid S[n] = S_i,\; D[n] = D_k\}.$$

This transition, if it occurs, is followed by a random reward (or penalty) $c_{ij}^{(k)}$
with an a priori unknown expectation. The expected payoff at state $S_i$, after the decision $D_k$ is made, is given by

$$q_i^{(k)} = \sum_{j=1}^{I} \pi_{ij}^{(k)}\, c_{ij}^{(k)}.$$
A policy is a vector of indices $P = (k_1, \ldots, k_I)$ which determines what decision should be made at each state: for every $i = 1, \ldots, I$, $k_i$ is an integer lying between 1 and $M$, and at state $S_i$ decision $D_{k_i}$ should be made. Suppose that some fixed policy $P = (k_1, \ldots, k_I)$ is maintained. The system then constitutes a Markov chain with transition probabilities $\pi_{ij}^{(k_i)}$. Henceforth it is assumed that for every policy $P$ the corresponding Markov chain is ergodic. Denote by $\pi_1^{(P)}, \ldots, \pi_I^{(P)}$ the steady-state probabilities of this chain, that is,

$$\pi_i^{(P)} = \lim_{n \to \infty} \Pr\{S[n] = S_i\}, \qquad i = 1, \ldots, I.$$
The problem is to find a policy $P$ for which the expected payoff

$$r(P) = \sum_{i=1}^{I} \pi_i^{(P)}\, q_i^{(k_i)}$$

is minimal. There are $N = M^I$ possible policies. For each policy $P_m = (k_1^m, \ldots, k_I^m)$, $m = 1, \ldots, N$, let $r_m = r(P_m)$. The problem is therefore to choose the policy with the smallest value of $r_m$. The regenerative process $\{X^m(t) : t \ge 0\}$ corresponding to the policy $P_m$ is the Markov chain whose states are $S_1, \ldots, S_I$ and whose transition probabilities are $\pi_{ij}^{(k_i^m)}$, $i, j = 1, \ldots, I$. The regeneration times $\beta_n^m$, $n = 0, 1, 2, \ldots$, for this policy are the times of visiting a certain fixed state, say $S_1$. Since the algorithm (6.5.9)-(6.5.16) does not require any a priori information about the regenerative processes $\{X^m(t) : t \ge 0\}$, $m = 1, \ldots, N$, or about the values of $r_1, \ldots, r_N$, it can be applied for finding the optimal policy for the Markov decision process described above.
6.6 THE REGENERATIVE METHOD FOR CONSTRAINED OPTIMIZATION PROBLEMS [62]

In this section we consider an algorithm for solving a linear programming problem whose coefficients represent some unknown characteristics of regenerative processes.
Let us consider the following linear programming problem:

$$\min_p r_0(p) = \min_p \sum_{i=1}^{N} E\{f_0(X^i)\}\, p_i \qquad (6.6.1)$$

subject to

$$r_j(p) = \sum_{i=1}^{N} E\{f_j(X^i)\}\, p_i \le 0, \qquad j = 1, \ldots, M, \qquad (6.6.2)$$

$$p = (p_1, \ldots, p_N), \qquad p_i \ge 0, \qquad \sum_{i=1}^{N} p_i = 1. \qquad (6.6.3)$$
Here $X^i$, $i = 1, 2, \ldots, N$, are the steady-state r.v.'s of the regenerative processes $\{X^i(t) : t \ge 0\}$, $i = 1, 2, \ldots, N$; the functions $f_j$, $j = 0, 1, \ldots, M$, are real measurable bounded functions defined on the ranges of these processes. $E\{f_j(X^i)\}$ can be viewed as a performance index of the $i$th system, $i = 1, \ldots, N$.
We assume that the values $E\{f_j(X^i)\}$, $i = 1, \ldots, N$, $j = 0, 1, \ldots, M$, are unknown a priori; therefore the standard simplex method for solving this linear programming problem cannot be applied. Our solution for this problem is based on the penalty function given below and the regenerative approach studied in the previous sections. Before we start solving this problem let us note that, if we drop (6.6.2) from the LP problem (6.6.1)-(6.6.3), then the problem (6.6.1)-(6.6.3) is identical to the problem (6.5.3)-(6.5.4), which is of course the same as the problem (6.5.2). The problem (6.5.3)-(6.5.4) is referred to as an unconstrained problem (UC) and is therefore a particular case of the constrained LP problem (6.6.1)-(6.6.3). We start solving the problem (6.6.1)-(6.6.3) by introducing the following penalty function:
$$F(p, \beta) = r_0(p) + \sum_{j=1}^{M} \beta_j \left[r_j(p)\right]^+, \qquad (6.6.4)$$

where $\beta_j > 0$, $j = 1, \ldots, M$. The operator $[\cdot]^+$ is defined by

$$[x]^+ = \begin{cases} x, & x > 0\\ 0, & x \le 0. \end{cases} \qquad (6.6.5)$$
(6.6.6)
,,
where p satisfies (6.6.3)and the sequences (pj[n]}2- j = 1,.
. .,N,satisfy
the conditions given in (6.6.17) through (6.6.22) below.

Now we propose an adaptive algorithm that converges with probability one to the optimal solution of the LP problem (6.6.1)-(6.6.3). The algorithm is similar to the algorithm (6.5.8)-(6.5.16) and is based on a step-by-step correction of the probability vector $p[n]$, where $n$ denotes the step number. As in the algorithm (6.5.8)-(6.5.16), there exists a mechanism, provided by (6.6.12) below, that ensures that $p_i[n] \ge \varepsilon[n]$, $i = 1, \ldots, N$, where $\{\varepsilon[n]\}_{n=1}^{\infty}$ is a monotone decreasing sequence of positive numbers, subject to (6.6.16) through (6.6.21) below. On the $n$th step the $i$th system, $i \in \{1, \ldots, N\}$, is chosen by simulating the distribution $p[n-1]$. We denote this event by $X[n] = X^i$. One cycle of the process $\{X^i(t) : t \ge 0\}$ is carried out. Denote by $\nu^i[n]$, $i = 1, \ldots, N$, the total number of cycles made by the $i$th system up to and including the $n$th step. We check whether or not the inequality $\nu^k[n-1] \ge n\varepsilon[n]$, $k \in \{1, \ldots, i-1, i+1, \ldots, N\}$, is satisfied for all systems. If for some indices $k_1, \ldots, k_s \in \{1, \ldots, i-1, i+1, \ldots, N\}$ this inequality does not hold, then one additional cycle is carried out for each system $k_1, \ldots, k_s$, so that ultimately

$$\nu^k[n] \ge n\varepsilon[n], \qquad k = 1, \ldots, N, \qquad (6.6.8)$$

holds. We record also

$$\tau_n^k = T^k_{\nu^k[n]} - T^k_{\nu^k[n]-1}, \qquad k = i, k_1, \ldots, k_s, \qquad (6.6.9)$$

the lengths of the cycles performed, and for each $k$ calculate the $M + 1$ numbers
the lengths of the cycles performed. and for each k calculate M + 1 numbers
Y ~ ~ ~ ~ ~ ~ ~ ' ~ ~ ~k =(I ,~k ,,..., ~ ( k,, , ) j =) O~, l z, ..., , M, (MI-
I
(6.6.10)
if the process { X k ( t ): r 2 0) is continuous-time. In the case of discrete-parameter processes the integral should be replaced by the corresponding sum over the (vk[n])th cycle. Set also
$$Y_n^{kj} = \tau_n^k = 0, \qquad \text{if } k \notin \{i, k_1, \ldots, k_s\}, \quad j = 0, 1, \ldots, M. \qquad (6.6.11)$$
The new distribution $p[n]$ is updated according to the following recurrence formula:

$$p[n] = \Pi_{S_{\varepsilon[n]}}\left\{p[n-1] - \gamma[n]\, B(n \mid i)\right\}. \qquad (6.6.12)$$
Here $S_\varepsilon$ is a simplex in $R^N$:

$$S_\varepsilon = \left\{p = (p_1, \ldots, p_N) : \sum_{k=1}^{N} p_k = 1,\; 0 < \varepsilon \le p_k,\; k = 1, \ldots, N\right\};$$

$\Pi_{S_\varepsilon}$ is the projection operator onto the simplex $S_\varepsilon$, such that, for any $z \in R^N$,

$$\|z - \Pi_{S_\varepsilon}(z)\| = \min_{y \in S_\varepsilon} \|z - y\|; \qquad (6.6.13)$$

and $B(\cdot \mid \cdot)$ is a vector $(B_1(\cdot \mid \cdot), \ldots, B_N(\cdot \mid \cdot))$ whose components are built from the quantities $Y_n^{kj}$, $\tau_n^k$, and the penalty coefficients $\beta^j[n]$ ((6.6.14), (6.6.15)), with the cumulative sums

$$Y^{kj}[n] = Y^{kj}[n-1] + Y_n^{kj}, \qquad \tau^k[n] = \tau^k[n-1] + \tau_n^k, \qquad k = 1, \ldots, N, \quad j = 0, 1, \ldots, M. \qquad (6.6.16)$$

The initial values of $p[0] \in S_{\varepsilon[0]}$, $Y[0]$, and $\tau[0]$ can be chosen arbitrarily. In the above, the sequences
$\{\gamma[n]\}_{n=1}^{\infty}$, $\{\varepsilon[n]\}_{n=1}^{\infty}$, and $\{\beta^j[n]\}_{n=1}^{\infty}$, $j = 1, \ldots, M$, must be chosen in such a way that the following conditions are satisfied:

$$\gamma[n] \downarrow 0, \qquad \varepsilon[n] \downarrow 0, \qquad \beta^j[n] \uparrow \infty, \qquad (6.6.17)$$

$$\sum_{n=1}^{\infty} \gamma[n] = \infty, \qquad (6.6.18)$$

$$\sum_{n=1}^{\infty} \left(\gamma[n]\, \beta^j[n]\right)^2 \varepsilon^{-1}[n] < \infty, \qquad (6.6.19)$$

$$\sum_{n=1}^{\infty} n^{-1/2}\, \gamma[n]\, \beta^j[n]\, \varepsilon^{-1}[n] < \infty. \qquad (6.6.20)$$
Remark 1. In order to satisfy conditions (6.6.17) through (6.6.22) we can take, for example, $\gamma[n] \sim n^{-1}$, $\varepsilon[n] \sim n^{-0.2}$, $\beta^j[n] \sim n^{0.2}$.
Remark 2. We assume that $\tau_n^i \ge \tau_0 > 0$, $i = 1, \ldots, N$, $n = 1, 2, \ldots$, that is, a cycle will be taken into account only if it is of some minimal length (which can be considered as the sensitivity threshold of the measuring instrument).
Remark 3. The r.v.'s $Y^{kj}[n]$ and $\tau^k[n]$, $k = 1, \ldots, N$, $j = 0, 1, \ldots, M$, $n \ge 1$, defined in (6.6.16), accumulate the information obtained up to and including the $n$th step. It is worth noting that, for each fixed $k$, only $\nu^k[n]$ summands in both $Y^{kj}[n]$ and $\tau^k[n]$ are nonzero.
We now formulate a theorem, which is proved in Rubinstein and Karnovsky [62].
Theorem 6.6.1. If the values of the functions $f_j$, $j = 0, 1, \ldots, M$, are uniformly bounded by some constant $D$ and if there exists a unique optimal solution $p^*$ of the LP problem, then for any initial distribution $p[0] \in S_{\varepsilon[0]}$ the sequence $\{p[n]\}_{n=1}^{\infty}$ generated by the algorithm (6.6.7)-(6.6.22) converges with probability 1 to $p^*$.

Corollary 1. Since the UC problem (6.5.3)-(6.5.4) is a special case of the LP problem (6.6.1)-(6.6.3), the algorithm (6.6.7)-(6.6.22) solves the UC problem as well.

Corollary 2. The theorem remains valid if we assume that the values of the functions $f_j$ cannot be observed directly, but can be measured with a random noise. In other words,
$$E\{f_j(X^i)\} = E\{E_\xi\{Q_j(X^i, \xi)\}\}, \qquad i = 1, \ldots, N, \quad j = 0, 1, \ldots, M,$$

where $\xi$ is a random vector with an unknown time-independent probability distribution function. In this case we can consider another random process, exactly as in the corollary to Theorem 6.5.1.
6.7 VARIANCE REDUCTION TECHNIQUES
In Chapter 4 we studied several variance reduction techniques- namely: correlated and stratified sampling, antithetic and control variates- for estimating integrals. the mean waiting time in the GI/G/I queueing system, and the expected completion time in networks. Here we deal further with variance reduction techniques for estimating some output parameters of the steady-state distribution of regeneration processes. To understand how expensive simulations can be, consider estimating, via simulation E [ W ] , the expected stationary waiting time in an N / M / I queue. Usually, we would not simulate an M / N / l queue since analytic results are available. However, despite its simplicity it can be very expensive to estimate E [ W ] .It is therefore a good candidate for testing simulathe average of tion methodologies. Let the traffic intensity p < 1; then PN, the first N waiting times, has an asymptotically normal distribution with mean E [ W ]and variance a 2 / N . Therefore a confidence interval for E [ W ] may be constructed. A major problem in any simulation is how long to run it. One possibility is to run the simulation until the half length of a prescribed confidence interval. Table 6.7.1 lists the run lengths needed for the M/M/I queue to have a half iength of 0.10 E ( W ) . It follows from this tabIe that as p Table 6.7J Sompfep Sizes for tfre M / M / 1 Queue Requiretl
    ρ       E(W)      σ²            N
    0.10    0.111     0.375         8,200
    0.20    0.250     1.39          6,020
    0.30    0.429     3.96          5,830
    0.40    0.667     10.6          6,430
    0.50    1.00      29.0          7,850
    0.60    1.50      88.5          10,600
    0.70    2.33      335           16,700
    0.80    4.00      1,976         33,400
    0.90    9.00      35,901        119,000
    0.95    19.0      607,600       455,000
    0.99    99.0      3.95 x 10^8   1.09 x 10^7

Source: Ref. 24. Note: N = number of customers that must be simulated for a 90% confidence interval for E[W] to have a half length of 0.1 E[W]; ρ = λ/μ < 1 and E[W] = λ/(μ(μ − λ)).
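The effect the table quantifies can be reproduced with a few lines of code. The following sketch (the parameter values are illustrative, not taken from the table) generates M/M/1 waiting times through the Lindley recursion W_{k+1} = max(W_k + S_k − A_{k+1}, 0) and compares the long-run average with the analytic value E[W] = λ/(μ(μ − λ)):

```python
import random

def mm1_waiting_times(lam, mu, n, seed=0):
    """Successive M/M/1 waiting times via the Lindley recursion
    W_{k+1} = max(W_k + S_k - A_{k+1}, 0)."""
    rng = random.Random(seed)
    w, out = 0.0, []
    for _ in range(n):
        out.append(w)
        s = rng.expovariate(mu)   # service time of the current customer
        a = rng.expovariate(lam)  # time until the next arrival
        w = max(w + s - a, 0.0)
    return out

lam, mu, n = 0.5, 1.0, 200_000            # rho = 0.5, illustrative values
w = mm1_waiting_times(lam, mu, n)
est = sum(w) / n
analytic = lam / (mu * (mu - lam))        # E[W] for the M/M/1 queue
print(est, analytic)
```

For heavily loaded queues (ρ near 1) the same program needs a far larger N before the estimate stabilizes, which is exactly the growth shown in the table. Note also that successive waiting times are dependent, so a naive sample variance understates σ² when forming the confidence interval.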
increases beyond 0.3 the required run lengths increase rapidly, and for large values of ρ simulation is no longer a practical method. In the following two sections we consider control variates and common random numbers (correlated sampling) techniques for variance reduction while simulating stochastic processes, and we give some practical recommendations for their application. The results of these sections are based on Heidelberger [24], Heidelberger and Iglehart [23], and Lavenberg, Moeller, and Sauer [45], and are reproduced mostly from them.

6.7.1 Control Variates
The method of control variates has already been described in Sections 4.3.3 and 4.3.12 and is only reviewed briefly here. Let {X_n, n ≥ 0} be a sequence of i.i.d. random variables with unknown mean r = E(X_n). We are interested in estimating r via simulation. Let σ_x² = σ²(X_n) be the variance of X_n. We can estimate r by

    X̄_N = (1/N) Σ_{n=1}^{N} X_n

and then form a confidence interval by using the c.l.t. Suppose now that we have another sequence of random variables {C_n, n ≥ 0}, called control variates, such that the C_n's are i.i.d., that X_n and C_n are correlated (usually achieved by simulating X_n and C_n with the same stream of random numbers), and that r_c = E(C_n) is known. Let β be some constant and set

    Z_n(β) = X_n − β(C_n − r_c).    (6.7.1)
Then {Z_n(β), n ≥ 0} are i.i.d. with mean r and some variance denoted by σ²(β). Let

    Z̄_N(β) = (1/N) Σ_{n=1}^{N} Z_n(β);

then Z̄_N(β) → r a.s. as N → ∞ by the strong law of large numbers, and, by the c.l.t.,

    N^{1/2}[Z̄_N(β) − r]/σ(β) ⇒ N(0, 1)    (6.7.2)

as N → ∞.
It can be readily shown (see also Section 4.4.3) that β = β*, which minimizes the variance σ²(β), is equal to

    β* = cov(X_n, C_n)/σ²(C_n),

and that

    σ²(β*) = (1 − ρ²(X_n, C_n))σ_x²,    (6.7.3)

where ρ(X_n, C_n) is the correlation coefficient between X_n and C_n.
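A minimal numerical illustration of (6.7.1)-(6.7.3), with an assumed toy problem rather than one from the text: r = E[e^U] for U uniform on (0, 1) is estimated with the control variate C = U, whose mean r_c = 1/2 is known, and β is set to the sample analogue of β*:

```python
import math
import random

rng = random.Random(1)
N = 100_000
u = [rng.random() for _ in range(N)]
x = [math.exp(v) for v in u]   # X_n with unknown mean r = e - 1
c = u                          # control C_n with known mean r_c = 0.5

xbar = sum(x) / N
cbar = sum(c) / N
var_c = sum((v - cbar) ** 2 for v in c) / (N - 1)
cov_xc = sum((a - v0) * (v - cbar) for a, v, v0 in zip(x, c, [xbar] * N)) / (N - 1)
beta = cov_xc / var_c          # sample analogue of beta* = cov(X, C)/var(C)

z = [a - beta * (v - 0.5) for a, v in zip(x, c)]   # Z_n(beta) as in (6.7.1)
zbar = sum(z) / N
var_x = sum((a - xbar) ** 2 for a in x) / (N - 1)
var_z = sum((a - zbar) ** 2 for a in z) / (N - 1)
print(zbar, var_z / var_x)     # the ratio estimates 1 - rho^2(X, C)
```

Because e^U and U are almost linearly related, their correlation is close to 1 and the variance reduction in (6.7.3) is dramatic.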
Formula (6.7.1) can be extended easily to the case of more than one control variate. Indeed, let C_n = (C_{n1}, ..., C_{nQ}) be a vector of Q control variates, let r_c = (r_1, ..., r_Q) be the known mean vector corresponding to C_n, and let β be any vector. Then

    Z_n(β) = X_n − β′(C_n − r_c)    (6.7.4)
is an unbiased estimator of r. Another type of control variate C_n = (C_{n1}, ..., C_{nQ}) is one for which the mean vector E(C_n) is unknown but each component satisfies E(C_{nq}) = r, q = 1, ..., Q. In this case

    Z_n(β) = Σ_{q=1}^{Q} β_q C_{nq},    (6.7.5)

with Σ_{q=1}^{Q} β_q = 1, is again an unbiased estimator of r. We now consider two examples of applying control variates, in which formulas (6.7.4) and (6.7.5) are used and variance reduction is achieved. The first example deals with (6.7.5), the second with (6.7.4).
Example 1 Let {X_n, n ≥ 0} be an irreducible, aperiodic, positive recurrent Markov chain with state space I = {0, 1, 2, ...} and transition matrix P = {p_ij, i, j ∈ I}. It is known from Section 6.2 that X_n ⇒ X as n → ∞ (⇒ denotes weak convergence), where X is the steady-state random variable having the stationary distribution π = {π_i, i ∈ I}, and π can be found from the solution of the system of linear equations π = πP. Let f: I → R be a real-valued function on I and define

    r = E(f(X)) = πf = Σ_{i∈I} π_i f(i).

Here πf = Σ_{i∈I} π_i f(i) is the inner product* of π and f. We are interested in estimating r. If the matrix P is unknown, or the state space I is so large that it is difficult to solve π = πP, it may become necessary to estimate r = πf

*For simplicity we use this form rather than the more conventional (π, f).
via simulation. This can be done as follows (see also (6.2.1) through (6.2.4)). Pick some state in I, say 0, and set T_0 = 0. Define

    T_m = inf{n > T_{m−1}: X_n = 0},    m ≥ 1.

We say that a regeneration occurs at time T_m, and the time between T_m and T_{m+1}, that is, τ_m = T_{m+1} − T_m, is referred to as the length of the mth cycle. Let k be some positive integer and let r_v = πf_v = E(f_v(X)), v = 0, 1, ..., k. For each m ≥ 0 and v = 0, 1, ..., k, define Y_m(v) by

    Y_m(v) = Σ_{n=T_m}^{T_{m+1}−1} f_v(X_n).

It follows from Proposition 6.2.2 that, if π|f_v| < ∞, then

    r_v = E(Y_m(v))/E(τ_m).    (6.7.6)
Let Z_m(v) = Y_m(v) − r_v τ_m. By (6.7.6) we have, for each v = 0, 1, ..., k and each m ≥ 0,

    E(Z_m(v)) = 0.    (6.7.7)

Define

    r̂_v(M) = Σ_{m=1}^{M} Y_m(v) / Σ_{m=1}^{M} τ_m    (6.7.8)

and

    X̄_v(N) = (1/N) Σ_{n=1}^{N} f_v(X_n)

for each v = 0, 1, ..., k. Then r̂_v(M) → r_v a.s. as M → ∞ and X̄_v(N) → r_v a.s. as N → ∞. Observe that r̂_v(M) is an estimator for r_v based on M cycles of the process, and X̄_v(N) is an estimator for r_v based on N transitions of the process. Because {Z_m(v): m ≥ 0} are i.i.d., it is readily possible to prove the following two c.l.t.'s:
Proposition 6.7.1 Let Σ_k be the (k+1) × (k+1) covariance matrix of the Z_m(v)'s, whose (i, j)th entry is σ_ij = E[Z_m(i)Z_m(j)]. If E(|f_v(X)|) < ∞ for each v = 0, 1, ..., k, then

    M^{1/2}[(r̂_0(M) − r_0), ..., (r̂_k(M) − r_k)] ⇒ N(0, Σ_k/E²(τ_1))    (6.7.12)

and

    N^{1/2}[(X̄_0(N) − r_0), ..., (X̄_k(N) − r_k)] ⇒ N(0, Σ_k/E(τ_1)).    (6.7.13)
The proof of this proposition is given in Ref. 24. Now let β be a (k+1)-dimensional row vector of real numbers whose vth entry is β(v). Let r, r̂(M), and X̄(N) denote (k+1)-dimensional column vectors whose vth entries are r_v, r̂_v(M), and X̄_v(N), respectively. A simple application of the continuous mapping theorem (Theorem 5.1 of Billingsley [1]) yields the following.

Proposition 6.7.2 Let σ_k²(β) = βΣ_kβ′ = Σ_{i=0}^{k} Σ_{j=0}^{k} β(i)σ_ij β(j). Under the hypotheses of Proposition 6.7.1,

    M^{1/2}[βr̂(M) − βr] ⇒ N(0, σ_k²(β)/E²(τ_1))    (6.7.14)

and

    N^{1/2}[βX̄(N) − βr] ⇒ N(0, σ_k²(β)/E(τ_1)),    (6.7.15)

where βr = Σ_{v=0}^{k} β(v)r_v is the inner product of β and r.
In order to form confidence intervals for the r_v's (or for linear combinations of the r_v's) it is necessary to know the σ_ij's as well as E(τ_1). These constants are usually unknown and must be estimated. In addition, β may be a fixed but unknown vector, so it too must be estimated. The following proposition, the proof of which is also given in Ref. 24, tells us that we may replace these quantities in Proposition 6.7.2 by any sequence of strongly consistent estimators while preserving the asymptotic normality.

Proposition 6.7.3 Suppose that τ̄_1(M) → E(τ_1) a.s., that σ̂_ij(M) → σ_ij a.s. for each i and j, and that β̂(i, M) → β(i) a.s. for each i. Let Σ̂_k(M) be the matrix whose (i, j)th entry is σ̂_ij(M), let β̂(M) be the vector whose ith component is β̂(i, M), and let σ̂_k(β̂, M) = β̂(M)Σ̂_k(M)β̂′(M). Then

    M^{1/2} τ̄_1(M)[β̂(M)r̂(M) − β̂(M)r]/σ̂_k^{1/2}(β̂, M) ⇒ N(0, 1)    (6.7.16)

as M → ∞.
We turn now to the problem of choosing the functions f_v with a view to achieving variance reductions. Heidelberger [24-26] suggested several ways of choosing f_v, v = 0, ..., k. We consider only one of them [24]. Let

    f_v = P^v f,    v = 0, 1, ..., k,    (6.7.17)

where P^v is the v-step transition matrix of the process. It is shown in Ref. 24 that in this case, that is, when f_v = P^v f, all r_v = πf_v, v = 0, 1, ..., k, are equal to r, and that if E(|f(X)|) < ∞, then πf = π(Pf). Since r_v = r, v = 0, 1, ..., k, it is obvious that

    r̂_v(M) = Σ_{m=1}^{M} Y_m(v) / Σ_{m=1}^{M} τ_m → E(Y_1(v))/E(τ_1) = πf_v = r  a.s.    (6.7.18)
Therefore each r̂_v(M), v = 0, 1, ..., k, is a strongly consistent estimator for r, and we could use any one of them for this purpose. However, better results can be achieved by using all of them simultaneously, for instance, using (6.7.5), which can be written as

    r̂_β(M) = Σ_{v=0}^{k} β(v) r̂_v(M),    (6.7.19)

where Σ_{v=0}^{k} β(v) = 1. Variance reduction can be achieved if we choose the β(v)'s so as to minimize the asymptotic variance σ_k²(β) of r̂_β(M). Mathematically it can be written as

    minimize σ_k²(β) = βΣ_kβ′    (6.7.20)

subject to

    Σ_{v=0}^{k} β(v) = 1.    (6.7.21)

The solution of this problem, which can be obtained using Lagrange multipliers, is

    β* = eΣ_k^{-1}/(eΣ_k^{-1}e′)    (6.7.22)

and

    σ_k²(β*) = (eΣ_k^{-1}e′)^{-1},    (6.7.23)
where e denotes the (k+1)-dimensional row vector each of whose components is 1, and ′ denotes transposition. Formulas (6.7.10) and (6.7.11) can now be rewritten as

    M^{1/2}[r̂_{β*}(M) − r] ⇒ N(0, σ_k²(β*)/E²(τ_1))    (6.7.24)

and

    N^{1/2}[X̄_{β*}(N) − r] ⇒ N(0, σ_k²(β*)/E(τ_1)),    (6.7.25)

where r̂_{β*}(M) = Σ_{v=0}^{k} β*(v) r̂_v(M) and X̄_{β*}(N) = Σ_{v=0}^{k} β*(v) X̄_v(N); both r̂_{β*}(M) → r a.s. as M → ∞ and X̄_{β*}(N) → r a.s. as N → ∞.
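The constrained minimization (6.7.20)-(6.7.21) and its solution (6.7.22)-(6.7.23) can be checked numerically. The sketch below uses an assumed 2 × 2 covariance matrix (the case k = 1); the entries are arbitrary illustrative values:

```python
# Check of (6.7.20)-(6.7.23) for two estimators (k = 1):
# minimize beta' Sigma beta subject to beta(0) + beta(1) = 1, with
# beta* = e Sigma^{-1} / (e Sigma^{-1} e'), e = (1, 1).

s00, s01, s11 = 4.0, 1.0, 2.0          # an assumed covariance matrix Sigma_k
det = s00 * s11 - s01 * s01
inv = [[s11 / det, -s01 / det], [-s01 / det, s00 / det]]   # Sigma_k^{-1}

row = [inv[0][0] + inv[1][0], inv[0][1] + inv[1][1]]       # e Sigma^{-1}
norm = row[0] + row[1]                                     # e Sigma^{-1} e'
beta = [row[0] / norm, row[1] / norm]                      # beta*, sums to 1

def quad(b):
    """sigma_k^2(b) = b' Sigma b for the 2x2 case."""
    return b[0] * b[0] * s00 + 2 * b[0] * b[1] * s01 + b[1] * b[1] * s11

var_opt = quad(beta)   # equals 1/(e Sigma^{-1} e') by (6.7.23)
print(beta, var_opt)
```

Since β = (1, 0) is feasible in (6.7.21), the minimized variance can never exceed the variance of any single estimator.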
Since the covariance matrix Σ_k is in general unknown, it is necessary to estimate it. If Σ̂_k(M) is any estimator such that Σ̂_k(M) → Σ_k a.s. as M → ∞, then it is clear that Σ̂_k^{-1}(M) → Σ_k^{-1} a.s. as M → ∞. Letting

    r̂_{β̂}(M) = Σ_{v=0}^{k} β̂(v, M) r̂_v(M)    (6.7.26)

and applying Proposition 6.7.3, we have

    M^{1/2} τ̄_1(M)[r̂_{β̂}(M) − r]/σ̂_k^{1/2}(β̂, M) ⇒ N(0, 1),    (6.7.27)

where σ̂_k(β̂, M) → σ_k²(β*) a.s. as M → ∞ and τ̄_1(M) is any sequence of numbers such that τ̄_1(M) → E(τ_1) a.s. as M → ∞. A corresponding c.l.t. exists for the X̄_v(N)'s as well.

This method is called the "method of multiple estimates" because it combines several different estimates of the same quantity. In order to apply it, the functions f_v must be computed (usually before the start of the simulation). For computational efficiency f_v can be defined recursively by f_0 = f and f_v = Pf_{v−1} for v ≥ 1. This saves having to compute the v-step transition function P^v, a potentially large computational economy. If the state space is finite and the transition matrix is sparse, the work involved in calculating f_v for a few values of v may not be too heavy. We note that to form the estimates X̄_v(N) (or r̂_v(M)) we must evaluate f_v(X_n) for each value of v and each transition n. This tends to increase the amount of time needed for each transition simulated. However, if the variance reduction obtained is sufficiently large, the potential savings in
the number of transitions that need to be simulated will more than offset the extra work per transition. We also note that additional work must be done at the end of each cycle to update the estimate of the covariance matrix Σ_k (using no variance reduction technique, we need only update σ_00). It is shown (see [24]) that σ_k²(β*) → 0 as k → ∞. For many types of Markov chains we can expect substantial variance reductions even when k is relatively small (say 2 or 3). For countable I we have

    f_k(i) = Σ_{j∈I} p_ij^{(k)} f(j) = E[f(X_k) | X_0 = i].    (6.7.28)

Thus if the Markov chain makes transitions only to "neighboring" states and if f(j) is close to f(i) for j close to i, it can be seen from (6.7.28) that, for small k, f_k(i) and f(i) should be nearly the same. This means that X̄_k(N) and X̄_0(N) will be highly correlated, a condition that generally results in good variance reduction. Many queueing networks exhibit this special type of structure.

Ideally, we would like to choose the "optimal" value of k in the sense that, for a given computer budget, we would like to pick the value that yields the narrowest confidence intervals for r (part of the budget must be allocated to calculation of the f_v's). To perform such an optimization we would have to know σ_k²(β*) for each k ≥ 0. These quantities are generally unknown, and even to estimate them would require calculating the f_v's and then simulating the Markov process for an additional number of cycles. The disadvantage of such a procedure is that the cost of computing the f_v's may be higher than the gain achieved through variance reduction. Generally speaking, the success of this technique depends on our ability to compute and store the functions f_v efficiently.

The method of multiple estimates can be extended to certain types of continuous-time processes, such as continuous-time Markov chains and semi-Markov processes (see [24]). To find out the efficiency of this method Heidelberger [24] considered the following four examples: the queue length process in a finite-capacity M/M/1 queue, the queue length process in the repair problem with spares, and the waiting time processes in both M/M/1 and M/M/2 queues. These processes were chosen because analytic results are readily available, thereby making a comparison between analytic and simulation results possible. Despite their simplicity, these processes are by no means "easy" to simulate, in particular the heavily loaded queues, which require very large run lengths to get good simulation estimates. The simulation results, which are also presented in Ref. 24, show that for all four examples substantial variance reduction was obtained. However, as this method
entails additional computations both before and during the course of the simulation, we would recommend using it only when it is computationally advantageous to do so. In the case of a Markov chain it is likely that the method will be most effective if the transition matrix of the process is sparse, in which case the preliminary calculations can be carried out with relative ease. It is for this type of process that the method is recommended.
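The whole procedure can be sketched end to end on a toy chain. The following is not Heidelberger's implementation, only a hedged illustration with k = 1: the chain, the function f, and the run length are assumptions made for the example. The function f_1 = Pf is computed before the run, cycles regenerate at state 0, and the weights come from (6.7.22) with the covariance matrix estimated from the Z_m's:

```python
import random

P = [[0.5, 0.5, 0.0],
     [0.3, 0.4, 0.3],
     [0.0, 0.6, 0.4]]
f0 = [0.0, 1.0, 2.0]                                             # f, assumed
f1 = [sum(P[i][j] * f0[j] for j in range(3)) for i in range(3)]  # f_1 = P f

rng = random.Random(7)

def step(i):
    """One inverse-transform transition from state i."""
    u, acc = rng.random(), 0.0
    for j, p in enumerate(P[i]):
        acc += p
        if u <= acc:
            return j
    return 2

M = 20_000
Y = [[0.0, 0.0] for _ in range(M)]   # Y_m(0), Y_m(1) accumulated per cycle
tau = [0] * M
state = 0
for m in range(M):                    # one regenerative cycle per pass
    while True:
        Y[m][0] += f0[state]
        Y[m][1] += f1[state]
        tau[m] += 1
        state = step(state)
        if state == 0:
            break

T = sum(tau)
r = [sum(y[v] for y in Y) / T for v in (0, 1)]   # ratio estimators (6.7.8)
Z = [[Y[m][v] - r[v] * tau[m] for v in (0, 1)] for m in range(M)]
s = [[sum(z[i] * z[j] for z in Z) / M for j in (0, 1)] for i in (0, 1)]
det = s[0][0] * s[1][1] - s[0][1] ** 2
inv = [[s[1][1] / det, -s[0][1] / det], [-s[0][1] / det, s[0][0] / det]]
row = [inv[0][0] + inv[1][0], inv[0][1] + inv[1][1]]             # e Sigma^{-1}
beta = [row[0] / (row[0] + row[1]), row[1] / (row[0] + row[1])]  # (6.7.22)

r_combined = beta[0] * r[0] + beta[1] * r[1]
var_single = s[0][0]
var_combined = sum(beta[i] * s[i][j] * beta[j] for i in (0, 1) for j in (0, 1))
print(r_combined, var_combined / var_single)
```

Both r̂_0(M) and the combined estimator converge to r = πf (here 20/21 for the assumed chain); the printed variance ratio shows how much the combination gains on this particular chain.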
Example 2 We consider now another example of variance reduction, taken from Ref. 45. Before starting this example we need more mathematical background on the regenerative method. Let X again be the steady-state vector of the regenerative process {X(t): t ≥ 0}, let f and g be given real-valued measurable functions, and suppose we want to estimate

    p = E{f(X)}/E{g(X)}.    (6.7.29)

It follows from Proposition 6.2.2 that, if E{|f(X)|} < ∞ and E{|g(X)|} < ∞, then

    p = E(Y_i)/E(Z_i),    (6.7.30)

where

    Y_i = ∫_{T_i}^{T_{i+1}} f(X(t)) dt    and    Z_i = ∫_{T_i}^{T_{i+1}} g(X(t)) dt,    i = 1, 2, ...,

are dependent random variables defined with respect to a single cycle. In the particular case where g ≡ 1 we have Z_i = τ_i, and (6.7.30) becomes (6.2.4). The classical point estimator for p obtained from M cycles is

    p̂ = Σ_{i=1}^{M} Y_i / Σ_{i=1}^{M} Z_i,    (6.7.31)

and for sufficiently large M

    M^{1/2}(p̂ − p)/σ ≈ N(0, 1),    (6.7.32)
where

    σ² = E[(Y_i − pZ_i)²]/E²(Z_i).

Furthermore, if we replace σ with its estimator σ̂ such that

    σ̂² = (1/(M − 1)) Σ_{i=1}^{M} (Y_i − p̂Z_i)² / [(1/M) Σ_{i=1}^{M} Z_i]²,    (6.7.33)

the c.l.t. (6.7.32) will still hold; therefore a confidence interval for p can be obtained.

Assume now that we have Q pairs of dependent random variables {(Y^(q), Z^(q))}, q = 1, ..., Q, defined with respect to a single cycle. Denote

    μ_q = E(Y^(q))/E(Z^(q)).    (6.7.34)

Assume also that μ_q, q = 1, ..., Q, is known, but that the expected values of the pairs (Y^(q), Z^(q)) are unknown. In order to apply control variates in this case the sequence of i.i.d. pairs of random vectors

    R_n = {(Y_n, Z_n), (Y_n^(1), Z_n^(1)), ..., (Y_n^(Q), Z_n^(Q))},    n = 1, ..., M,    (6.7.35)

is collected, and then the Q-dimensional vector of control variates C = (C_1, ..., C_Q) is defined by

    C_q = Σ_{n=1}^{M} Y_n^(q) / Σ_{n=1}^{M} Z_n^(q),    q = 1, ..., Q.    (6.7.36)

Now, by analogy with (6.7.4), for any vector β a point estimator for p using these control variates is

    p̂(β) = p̂ − β′(C − μ_c),    (6.7.37)

where μ_c = (μ_1, ..., μ_Q). Note that because p̂ and C_q, q = 1, ..., Q, are biased estimators, respectively, for p and μ_q, q = 1, ..., Q, the estimator p̂(β) is also biased, which differentiates it from the unbiased estimator Z_n(β) in (6.7.4). However, p̂(β) is a strongly consistent estimator of p, and, for M sufficiently large,

    M^{1/2}[p̂(β) − p]/σ(β) ≈ N(0, 1),    (6.7.38)
where

    σ²(β) = σ² − 2β′σ_c + β′Σβ.    (6.7.39)

The value of β that minimizes σ²(β) is (see (4.3.30))

    β* = Σ^{-1}σ_c,    (6.7.40)

where the matrix Σ and the vector σ_c have elements

    (Σ)_pq = cov[(Y^(p) − μ_p Z^(p))/E[Z^(p)], (Y^(q) − μ_q Z^(q))/E[Z^(q)]]    (6.7.41)

and

    (σ_c)_q = cov[(Y − pZ)/E[Z], (Y^(q) − μ_q Z^(q))/E[Z^(q)]].    (6.7.42)

The resulting minimum value of σ²(β) is

    σ²(β*) = (1 − R²)σ²,    (6.7.43)

where

    R² = σ_c′Σ^{-1}σ_c/σ².    (6.7.44)
Finally, for M sufficiently large,

    M^{1/2}[p̂(β̂*) − p]/σ̂(β̂*) ≈ N(0, 1),    (6.7.45)

where β̂* is an estimator of β* and σ̂²(β̂*) is an estimator of σ²(β*). As M increases, σ̂²(β̂*)/σ̂² approaches 1 − R², and therefore variance reduction can be achieved.

Now we start with the example given in Ref. 45. Consider a GI/G/1 queue with i.i.d. interarrival times A_i and i.i.d. service times S_i. Let μ_1 be the mean interarrival time and μ_2 the mean service time. Assume that the traffic intensity ρ = μ_2/μ_1 < 1; this means that the waiting time process {W_i, i ≥ 0}, defined by W_i = (W_{i−1} + S_{i−1} − A_i)^+, i ≥ 1, and W_0 = 0, is a regenerative process with regeneration points {T_k, k = 1, 2, ...}, where T_k is the serial number of the kth customer who arrives to find the system empty, and T_1 = 1 (consult Section 6.4.1). The steady-state waiting time E(W) = p can be estimated by

    p̂ = Σ_{i=1}^{M} Y_i / Σ_{i=1}^{M} Z_i,
where

    Y_i = Σ_{j=T_i}^{T_{i+1}−1} W_j

and Z_i = T_{i+1} − T_i. Define

    Y_i^(1) = Σ_{j=T_i}^{T_{i+1}−1} A_j,

the duration of the ith busy cycle (busy period plus idle time), and

    Y_i^(2) = Σ_{j=T_i}^{T_{i+1}−1} S_j,

the duration of the ith busy period. It is known [45] that E(Y^(1)) = μ_1 E(Z) and E(Y^(2)) = μ_2 E(Z), where E(Z) = E(Z_i), i = 1, 2, .... The following vector of control variates C = (C_1, C_2), with components (see also (6.7.36))

    C_q = Σ_{n=1}^{M} Y_n^(q) / Σ_{n=1}^{M} Z_n,    q = 1, 2,

is considered in Ref. 45, and the point estimator p̂(β) given in (6.7.37) is adapted for the parameter p = E(W). It is shown numerically in Ref. 45 that substantial variance reduction is obtained by simulating the GI/G/1 queue and some other queueing models while using these control variates.
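For the M/M/1 special case these concomitant controls can be sketched directly. Everything below is an illustrative reconstruction, not the code of Ref. 45: λ, μ, and the run length are assumed, the controls follow (6.7.36), and β is the sample analogue of (6.7.40)-(6.7.42):

```python
import random

lam, mu = 0.5, 1.0
mu1, mu2 = 1.0 / lam, 1.0 / mu   # known mean interarrival and service times
rng = random.Random(3)

M = 20_000
Y, Z, Y1, Y2 = [], [], [], []
for _ in range(M):
    w = y = y1 = y2 = 0.0
    z = 0
    while True:                   # one regenerative cycle of the queue
        y += w                    # accumulate waiting times
        z += 1
        s = rng.expovariate(mu)
        a = rng.expovariate(lam)
        y1 += a                   # interarrival sum: busy cycle duration
        y2 += s                   # service sum: busy period duration
        w = w + s - a
        if w <= 0.0:              # next customer finds the system empty
            break
    Y.append(y); Z.append(z); Y1.append(y1); Y2.append(y2)

sy, sz = sum(Y), sum(Z)
p_hat = sy / sz                                 # ratio estimator (6.7.31)
C = [sum(Y1) / sz - mu1, sum(Y2) / sz - mu2]    # centered controls (6.7.36)

zbar = sz / M
def cov(u, cu, v, cv):
    """Sample analogue of the covariances in (6.7.41)-(6.7.42)."""
    return sum((a - cu * b) * (c - cv * b)
               for a, b, c in zip(u, Z, v)) / (M * zbar ** 2)

S = [[cov(Y1, mu1, Y1, mu1), cov(Y1, mu1, Y2, mu2)],
     [cov(Y1, mu1, Y2, mu2), cov(Y2, mu2, Y2, mu2)]]
sig = [cov(Y, p_hat, Y1, mu1), cov(Y, p_hat, Y2, mu2)]
det = S[0][0] * S[1][1] - S[0][1] ** 2
beta = [(S[1][1] * sig[0] - S[0][1] * sig[1]) / det,
        (S[0][0] * sig[1] - S[0][1] * sig[0]) / det]   # beta* = Sigma^{-1} sigma_c

p_cv = p_hat - beta[0] * C[0] - beta[1] * C[1]          # estimator (6.7.37)
var_plain = cov(Y, p_hat, Y, p_hat)
var_cv = var_plain - beta[0] * sig[0] - beta[1] * sig[1]
print(p_hat, p_cv, var_cv / var_plain)
```

The printed variance ratio estimates 1 − R² of (6.7.43); for these assumed parameters the analytic value of p = E(W) is λ/(μ(μ − λ)).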
6.7.2 Common Random Numbers in Comparing Stochastic Systems [23]
In this section we show how the method of common random numbers may be used in the simulation of discrete and continuous Markov chains for variance reduction. Suppose we have two irreducible, aperiodic, positive recurrent Markov chains in discrete time, and we wish to construct a confidence interval for r_1 − r_2 = E{f_1(X^1)} − E{f_2(X^2)} by simulating the two processes. Here X^i, i = 1, 2, is the steady-state r.v. of the regenerative process X^i = {X_t^i: t ≥ 0}, and the f_i are given real-valued functions defined on the state space I_i of process X^i. Let us consider the following two point estimators of r_i:

    r̂_i(n) = Σ_{k=1}^{n} Y_k(i) / Σ_{k=1}^{n} τ_k(i)    (6.7.46)

and

    X̄_i(N) = (1/N) Σ_{t=0}^{N−1} f_i(X_t^i),    (6.7.47)

where n is the number of simulated cycles, N is the number of steps, and it is assumed without loss of generality that T_0 = 0 and X_0^i = 0, i = 1, 2. The two c.l.t.'s are the following:

    n^{1/2}[r̂_i(n) − r_i] ⇒ N(0, η_i²)    (6.7.48)

and

    N^{1/2}[X̄_i(N) − r_i] ⇒ N(0, σ_i²)    (6.7.49)

as n and N → ∞, where η_i² and σ_i² are the appropriate asymptotic variance constants. To construct a confidence interval for r_1 − r_2 we can simulate the two processes X^1 and X^2 independently and apply the bivariate c.l.t.

    N^{1/2}[X̄_N − r] ⇒ N(0, Λ),    (6.7.50)

where X̄_N = (X̄_1(N), X̄_2(N)), r = (r_1, r_2), and N(0, Λ) is a two-dimensional normal vector with mean vector 0 = (0, 0) and covariance matrix Λ = diag(σ_1², σ_2²). It can be readily shown (see [23]) that

    N^{1/2}[(X̄_1(N) − X̄_2(N)) − (r_1 − r_2)]/σ ⇒ N(0, 1),    (6.7.51)

where σ² = σ_1² + σ_2². A c.l.t. similar to (6.7.51), but based on simulating m cycles, can also be obtained to construct a confidence interval for r_1 − r_2.

Now we turn our attention to the problem of using common random numbers while generating sample paths for X^1 and X^2. Our goal in using common random numbers is to produce a shorter confidence interval for r_1 − r_2 for the same length of simulation run. In other words, we seek a c.l.t. similar to (6.7.51) but with a smaller value of σ. To accomplish this we
generate the bivariate Markov chain X = {X_n: n ≥ 0}, where X_n = (X_n^1, X_n^2). At each jump of the process X the same random number is used to generate the jumps of the two marginal chains X^1 and X^2. The marginals of the process X are seen to have the same distributions as the original chains X^1 and X^2; however, the marginal chains are now dependent. The state space of the chain X is denoted by F, which is a (possibly proper) subset of I_1 × I_2. We assume here that the chain X is also irreducible, aperiodic, and positive recurrent. (These conditions are not automatic but usually hold for practical simulations.) Furthermore, we assume for convenience that (0, 0) ∈ F and use that state to form regenerative cycles. Note that X_n ⇒ X as n → ∞, and the marginal distributions of X are the same as those of X^1 and X^2, namely {π_i(j): j ∈ I_i} for i = 1, 2. For any real-valued function f: F → R satisfying E{|f(X)|} < ∞, the regenerative method can be applied to X to estimate E{f(X)}. Let X_0 = (0, 0), T_0 = 0, and define the (m+1)st entrance of X to state (0, 0) to be

    T_{m+1} = inf{n > T_m: X_n = (0, 0)},    m ≥ 0.

Also, let τ_m = T_{m+1} − T_m, m ≥ 0, be the length of the mth cycle and

    Y_m(i) = Σ_{n=T_m}^{T_{m+1}−1} f_i(X_n^i),    m ≥ 0.
Set Z_m(i) = Y_m(i) − r_i τ_m. Since the ratio formula (6.2.4) still holds for the process X, E_{(0,0)}(Z_m(i)) = 0 for i = 1, 2. Let

    σ_ij = E_{(0,0)}[Z_m(i)Z_m(j)],    i, j = 1, 2,

which we assume is finite and nonzero. Since the vectors Z_m = (Z_m(1), Z_m(2)) are i.i.d., the standard c.l.t. yields

    n^{-1/2} Σ_{m=1}^{n} Z_m ⇒ N(0, Σ),    (6.7.52)

where Σ = {σ_ij}. By analogy with (6.7.49) and (6.7.50) it can be shown (see [23]) that

    N^{1/2}[X̄_N − r] ⇒ N(0, B)    (6.7.53)

and

    N^{1/2}[(X̄_1(N) − X̄_2(N)) − (r_1 − r_2)]/σ̃ ⇒ N(0, 1),    (6.7.54)

where B is the matrix with entries σ_ij/E_{(0,0)}(τ_1) and

    σ̃² = (σ_11 − 2σ_12 + σ_22)/E_{(0,0)}(τ_1).
A c.l.t. similar to (6.7.54), but in terms of n regenerative cycles, can also be obtained. Now consider the marginals of (6.7.53) in conjunction with (6.7.49). Since the marginals of the chain X have the same stochastic structure as the chains X^1 and X^2 considered separately, these two c.l.t.'s must be identical. Hence

    σ_i² = σ_ii/E_{(0,0)}(τ_1),    i = 1, 2.    (6.7.55)

Thus, upon comparing the constant σ² in (6.7.51) with σ̃² in (6.7.54), we conclude that σ̃² < σ² if and only if σ_12 > 0. The measure of variance reduction we use is

    R = σ²/σ̃².    (6.7.56)

So, for example, if R = 2, then only half as many steps of the Markov chain X need be simulated to obtain a confidence interval of specified length for r_1 − r_2 as would be required when simulating X^1 and X^2 independently. In addition, of course, only one stream of random numbers need be generated. While we have worked here with discrete-time Markov chains, the same method can be used for continuous-time Markov chains, semi-Markov processes, and discrete-time Markov processes with a general state space.

The following definition and properties will be used in obtaining nonnegative correlation.

Definition 1 Random variables Y = (Y_1, ..., Y_n) are said to be associated if cov{f(Y), g(Y)} ≥ 0 for all nondecreasing functions f and g for which E{f(Y)}, E{g(Y)}, and E{f(Y)g(Y)} exist.

PROPERTY 1. Any subset of associated random variables is associated.

PROPERTY 2. If two sets of associated random variables are independent of one another, then their union is a set of associated random variables.

PROPERTY 3. The set consisting of a single random variable is associated.

PROPERTY 4. Nondecreasing functions of associated random variables are associated.

A class of processes for which nonnegative correlation can be guaranteed is the class of stochastically monotone Markov chains (s.m.m.c.). In the following definition let i be a fixed index.
Definition 2 Let X^i = {X_n^i, n ≥ 0} be a real-valued Markov process with initial distribution P_i(x) = P(X_0^i ≤ x) and transition function P_i(x, A) = P{X_{n+1}^i ∈ A | X_n^i = x} (for measurable sets A). X^i is said to be an s.m.m.c. if, for every y, P_i(x, (−∞, y]) is a nonincreasing function of x.

Define the inverse distribution functions P_i^{-1}(·) and P_i^{-1}(x, ·) by

    P_i^{-1}(u) = inf{y: P_i(y) ≥ u}    (6.7.57)

and

    P_i^{-1}(x, u) = inf{y: P_i(x, (−∞, y]) ≥ u}.    (6.7.58)

Henceforth we assume that the sample paths of X^i are generated on the computer using the inverse transformation scheme

    X_0^i = P_i^{-1}(U_0),    (6.7.59)

    X_n^i = P_i^{-1}(X_{n−1}^i, U_n),    n ≥ 1,    (6.7.60)

where {U_n, n ≥ 0} is a sequence of random numbers. Notice that, if X^i is an s.m.m.c., then P_i^{-1}(x, u) is a nondecreasing function of both arguments. This fact enables us to show that for each n ≥ 0 the random variables (X_0^1, ..., X_n^1, X_0^2, ..., X_n^2) are associated.

Theorem 6.7.1 If X^1 and X^2 are both s.m.m.c.'s with sample paths generated by (6.7.59) and (6.7.60), then, for each n ≥ 0, {X_0^1, ..., X_n^1, X_0^2, ..., X_n^2} are associated random variables.

Proof The proof is by induction. For n = 0, Property 3 implies that {U_0} is associated, and since P_i^{-1}(U_0) is a nondecreasing function of U_0 for each i, Property 4 yields that {X_0^1, X_0^2} are associated. Assume now that {X_0^1, ..., X_n^1, X_0^2, ..., X_n^2} are associated. Since U_{n+1} is independent of this set, {X_0^1, ..., X_n^1, X_0^2, ..., X_n^2, U_{n+1}} are associated by Property 2. The map that takes these random variables into {X_0^1, ..., X_{n+1}^1, X_0^2, ..., X_{n+1}^2} is nondecreasing because X^1 and X^2 are both s.m.m.c.'s. Property 4 then yields the final result. Q.E.D.
The following theorem, whose proof is found in Ref. 23, shows that, when simulating s.m.m.c.'s using common random numbers, a reduction in variance is achieved.

Theorem 6.7.2 Let X^1 and X^2 both be s.m.m.c.'s with sample paths generated by (6.7.59) and (6.7.60), and let f_1 and f_2 be nondecreasing functions. If the covariance σ_12 is well defined and finite, then σ_12 ≥ 0.

The efficiency of common random numbers in variance reduction was checked for different output parameters of regenerative processes, and substantial variance reduction was achieved only in some particular cases. The effect of variance reduction decreases with increasing complexity of the processes being simulated. The method is effective only where the expected cycle length is sufficiently short. If preliminary simulation runs indicate that the expected cycle length is excessive, it is suggested that independent simulations be performed.
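The inversion scheme (6.7.59)-(6.7.60) and the effect of common random numbers can be illustrated on two small stochastically monotone chains. The transition matrices below are assumed for the example (each row's cumulative distribution is nonincreasing in the state, so both chains are s.m.m.c.'s); in the CRN case the same uniforms drive both chains:

```python
import random

P1 = [[0.6, 0.4, 0.0], [0.3, 0.4, 0.3], [0.0, 0.4, 0.6]]
P2 = [[0.5, 0.5, 0.0], [0.2, 0.4, 0.4], [0.0, 0.3, 0.7]]

def inverse_step(P, x, u):
    """(6.7.58): smallest state y whose cumulative row probability reaches u."""
    acc = 0.0
    for y, p in enumerate(P[x]):
        acc += p
        if u <= acc:
            return y
    return len(P[x]) - 1

def diff_estimates(crn, reps=400, N=300, seed=11):
    """Replicate the time-average difference with or without CRN."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        x1 = x2 = 0
        s1 = s2 = 0.0
        for _ in range(N):
            u1 = rng.random()
            u2 = u1 if crn else rng.random()   # common vs independent streams
            x1 = inverse_step(P1, x1, u1)
            x2 = inverse_step(P2, x2, u2)
            s1 += x1
            s2 += x2
        out.append(s1 / N - s2 / N)
    return out

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

d_crn = diff_estimates(True)
d_ind = diff_estimates(False, seed=12)
R = var(d_ind) / var(d_crn)   # empirical analogue of the measure (6.7.56)
print(R)
```

Values of the printed ratio above 1 indicate that common random numbers shorten the confidence interval for r_1 − r_2, as Theorem 6.7.2 predicts for these monotone chains with the identity function as f_i.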
EXERCISES

1 For the data given in Fig. 6.4.1 construct a 90% confidence interval using the classical estimator

    r̂ = Σ_{i=1}^{n} Y_i / Σ_{i=1}^{n} τ_i,

where n is the number of cycles.
2 Prove by induction that, if f_v = P^v f, where P^v is the v-step transition matrix, then r_v = πf_v is equal to r = πf. Here π is the steady-state distribution of P. From Heidelberger [24].

3 Prove that, if π|f| < ∞, then πf = π(Pf). From Heidelberger [24].

4 Prove that the solution of the problem (6.7.20)-(6.7.21) is (6.7.22)-(6.7.23).

5 Consider the following system of linear equations: Y = αPY + f, where P is an (n × n) ergodic Markov transition matrix with stationary distribution π = πP and |α| < 1. Prove that

    (π, Y) = r/(1 − α),

where r = (π, f).
6 Inventory Model. Consider a situation in which a commodity is stocked in order to satisfy demand. An inventory (s, S) policy is characterized by two positive numbers s and S with S > s. If the available stock quantity is greater than s, do not order. If the amount of inventory on hand plus on order is at most s, order to bring the quantity of stock up to S. Let X_j denote the level of inventory on hand plus on order in period j after ordering, and let d_j denote the demand in period j. Then the stock values

    X_{j+1} = X_j − d_j  if d_j ≤ X_j − s,    X_{j+1} = S  otherwise,

define a Markov chain with state space I = {s, s + 1, ..., S − 1, S}, where it is assumed that s ≤ X_n ≤ S. As a numerical example let s = 2 and S = 5, with a given demand distribution P(d_j = k), k = 0, 1, 2, 3, from which the transition matrix can be written down.

(a) Find the stationary probabilities π_i, i ∈ I, analytically and by simulating the Markov chain, making a run of 1000 cycles.
(b) Describe a program to simulate the regenerative process {X(n): n ≥ 0}, including a flow diagram, a listing of the program, and the random number generator.

7 M/M/1 Queue. Run this queueing model for 2000 cycles. From the simulated data:
(a) Fill out a table similar to Table 6.4.1, taking the same parameters, that is, λ = 5, μ = 10, and the 90% confidence interval.
(b) Describe your random number generator, a flow diagram, and a listing of your program.
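A sketch for the inventory chain of Exercise 6. The demand distribution and the ordering rule below are assumptions made for the illustration (the uniform demand probabilities are not the book's values): the stationary distribution is obtained by power iteration and compared with empirical state frequencies from a long simulated run.

```python
import random

s, S = 2, 5
demand_p = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}  # assumed P(d = k), illustrative
states = list(range(s, S + 1))                   # I = {2, 3, 4, 5}

def next_state(x, d):
    """Assumed (s, S) rule: keep x - d while it stays at or above s,
    otherwise order up to S."""
    return x - d if d <= x - s else S

# exact stationary distribution by power iteration on the induced matrix
P = {x: {y: 0.0 for y in states} for x in states}
for x in states:
    for d, p in demand_p.items():
        P[x][next_state(x, d)] += p

pi = {x: 1.0 / len(states) for x in states}
for _ in range(500):
    pi = {y: sum(pi[x] * P[x][y] for x in states) for y in states}

# empirical state frequencies from a long simulated run
rng = random.Random(5)
counts = {x: 0 for x in states}
x, steps = S, 100_000
for _ in range(steps):
    counts[x] += 1
    d = rng.choices(list(demand_p), weights=list(demand_p.values()))[0]
    x = next_state(x, d)

est = {y: counts[y] / steps for y in states}
print(pi, est)
```

Regenerative cycles for part (a) can be formed at the returns to state S, which the chain visits whenever an order is placed.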
8 Repairman Model with Spares. Select the same parameters as in Section 6.4.2, that is, assume n = 10, m = 5, s = 4, μ = 2, λ = 5, and choose the 95% confidence level. Run the model for 500 cycles and, from the simulated data:

(a) Fill out a table similar to Table 6.4.2.
(b) Describe your random number generator, a flow diagram of your program, and a listing of your program.
REFERENCES

1 Billingsley, P., Convergence of Probability Measures, Wiley, New York, 1968.
2 Carson, J. S., Variance reduction techniques for simulated queueing processes, Ph.D. thesis, Department of Industrial Engineering, University of Wisconsin, Madison, Wisconsin, 1978.
3 Carson, J. S. and A. M. Law, Conservation equations and variance reduction in queueing simulations, Technical Report 77-25, Department of Industrial Engineering, University of Wisconsin, Madison, Wisconsin, 1977.
4 Cox, D. R. and W. L. Smith, Queues, Methuen, London, 1961.
5 Çinlar, E., Introduction to Stochastic Processes, Prentice-Hall, Englewood Cliffs, New Jersey, 1975.
6 Crane, M. A. and D. L. Iglehart, Simulating stable stochastic systems, I: General multiserver queues, J. Assoc. Comp. Mach., 21, 1974, 103-113.
7 Crane, M. A. and D. L. Iglehart, Simulating stable stochastic systems, II: Markov chains, J. Assoc. Comp. Mach., 21, 1974, 114-123.
8 Crane, M. A. and D. L. Iglehart, Simulating stable stochastic systems, III: Regenerative processes and discrete-event simulations, Oper. Res., 23, 1975, 33-45.
9 Crane, M. A. and D. L. Iglehart, Simulating stable stochastic systems, IV: Approximation techniques, Manage. Sci., 21, 1975, 1215-1224.
10 Crane, M. A. and A. J. Lemoine, An Introduction to the Regenerative Method for Simulation Analysis, Springer-Verlag, New York, 1977.
11 Esary, J. D., F. Proschan, and D. W. Walkup, Association of random variables, with applications, Ann. Math. Stat., 38, 1967, 1466-1474.
12 Fiacco, A. V. and G. P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques, Wiley, New York, 1968.
13 Fishman, G. S., Concepts and Methods in Discrete Event Digital Simulation, Wiley, New York, 1973.
14 Fishman, G. S., Statistical analysis for queueing simulations, Manage. Sci., 20, 1973, 363-369.
15 Fishman, G. S., Estimation in multiserver queueing simulations, Oper. Res., 22, 1974, 72-78.
16 Fishman, G. S., Achieving specific accuracy in simulation output analysis, Comm. Assoc. Comp. Mach., 20, 1977, 310-315.
17 Gass, S. I., Linear Programming: Methods and Applications, 3rd ed., McGraw-Hill, New York, 1969.
18 Gaver, D. P. and G. S. Shedler, Control variable methods in the simulation of a model of a multiprogrammed computer system, Nav. Res. Logist. Quart., 18, 1971, 435-450.
19 Gaver, D. P. and G. L. Thompson, Programming and Probability Models in Operations Research, Brooks/Cole, Monterey, California, 1973.
20 Gunther, F. L., The almost regenerative method for stochastic system simulations, Technical Report ORC 75-21, Operations Research Center, University of California, Berkeley, California, 1975.
21 Heidelberger, P., Variance reduction techniques for the simulation of Markov processes, Ph.D. thesis, Department of Operations Research, Stanford University, Stanford, California, 1978.
22 Heidelberger, P. and M. Meketon, Bias reduction in regenerative simulation, Research Report RC 8397, IBM Corporation, Yorktown Heights, New York, 1980.
23 Heidelberger, P. and D. L. Iglehart, Comparing stochastic systems using regenerative simulation and common random numbers, Adv. Appl. Prob., 11, 1979, 804-819.
24 Heidelberger, P., Variance reduction techniques for the simulation of Markov processes, I: Multiple estimates, IBM J. Res. Develop. (to appear).
25 Heidelberger, P., Variance reduction techniques for the simulation of Markov processes, II: Matrix iterative methods, Acta Inform., 13, 1980, 21-37.
26 Heidelberger, P., A variance reduction technique that increases the regeneration frequency, in Current Issues in Computer Simulation, Academic, New York, 1979, pp. 257-269.
27 Hordijk, A., D. L. Iglehart, and R. Schassberger, Discrete time methods for simulating continuous time Markov chains, Adv. Appl. Prob., 8, 1976, 772-788.
28 Iglehart, D. L., Simulating stable stochastic systems, V: Comparison of ratio estimators, Nav. Res. Logist. Quart., 22, 1975, 553-565.
29 Iglehart, D. L., Simulating stable stochastic systems, VI: Quantile estimation, J. Assoc. Comp. Mach., 23, 1976, 347-360.
30 Iglehart, D. L., Simulating stable stochastic systems, VII: Selecting the best system, in Algorithmic Methods in Probability, Vol. 7, edited by M. Neuts, North-Holland, Amsterdam, 1977, 37-50.
31 Iglehart, D. L., Regenerative simulation for extreme values, Technical Report 43, Department of Operations Research, Stanford University, Stanford, California, 1977.
32 Iglehart, D. L. and P. A. W. Lewis, Variance reduction for regenerative simulations, I: Internal control and stratified sampling for queues, Technical Report 86-22, Control Analysis Corporation, Palo Alto, California, 1976.
33 Iglehart, D. L. and G. S. Shedler, Regenerative simulation of response times in networks of queues, J. Assoc. Comp. Mach., 25, 1978, 449-460.
34 Iglehart, D. L. and G. S. Shedler, Simulation of response times in finite-capacity open networks of queues, Oper. Res., 26, 1978, 896-914.
35 Iglehart, D. L. and G. S. Shedler, Regenerative simulation of response times in networks of queues, II: Multiple job types, Research Report RJ 2256, IBM Corporation, San Jose, California, 1978.
36 Iglehart, D. L., The regenerative method for simulation analysis, in Current Trends in Programming Methodology, Vol. III, Software Engineering, edited by K. M. Chandy and R. T. Yeh, Prentice-Hall, Englewood Cliffs, New Jersey, 1978.
37 Iglehart, D. L. and G. S. Shedler, Regenerative Simulation of Response Times in Networks of Queues, Springer-Verlag, New York, 1980.
38 Iglehart, D. L., Regenerative simulation. Forthcoming.
39 Kabak, I. W., Stopping rules for queueing simulations, Oper. Res., 16, 1968, 431-437.
40 Karlin, S. and H. M. Taylor, A First Course in Stochastic Processes, 2nd ed., Academic, New York, 1975.
41 Kiefer, J. and J. Wolfowitz, On the theory of queues with many servers, Trans. Amer. Math. Soc., 78, 1955, 1-18.
42 Knuth, D. E., The Art of Computer Programming, Vol. 2, Seminumerical Algorithms, Addison-Wesley, Reading, Massachusetts, 1969.
43 Lavenberg, S. S., Efficient estimation of work rates in closed queueing networks, in Proceedings in Computational Statistics, Physica Verlag, Vienna, 1974, pp. 353-362.
44 Lavenberg, S. S., Regenerative simulation of queueing networks, Research Report RC 7087, IBM Corporation, Yorktown Heights, New York, 1978.
45 Lavenberg, S. S., T. L. Moeller, and C. H. Sauer, Concomitant control variables applied to the regenerative simulation of queueing systems, Oper. Res., 27, 1979, 134-160.
REFERENCES
S., T. L. Moeller, and P. D. Welch. Control variables applied to the sirnubation of queueing models of computer systems, in C o m e r Per/omumce. North Holland Amsterdam, 1977. pp. 459-467. Lavenberg, S.S.. T.L. Moelkr. and P. D. Welch, Statistical results on multiple control variables with application to variance reduction in queueing network simulation, Research Report RC 7423. 1BM Corporation, 1978. Lavenbcrg, S. S. and C. H. Sauer, Sequential stopping rules for the regenerative method of simulation, 1EM J. Res. k l q . , 21, 1977. 545-558. Lavenberg, S. S. and G. S. Shedler. Derivation of confidence intervals for work rate estimators in a closed queueing network. Soc. lndust. Appi. Math. J . Cw., 4, 1975, 108- 124. Lavenberg, S. S. and D. R. Slutz, Introduction to regenerative simulation, IEM J. Res. D e ~ l o p .19. , 1975,458-462. tavenberg. S.S. and D. R Slutz. Regenerative simulation of an automated tape library, IBM J. Res. Develop., 19, 1975,463-475. Law, A. M., Efficient estimators for simulated queueing system, Technical Report ORC 74-7. Operations Research Center, University of California, Berkeley, California, 1974. Law, A. M.,Efficient estimators for simulated queueing systems. Manage. Sci., 22, 1975, 30-41. Law, A. M..Confidence intervals in discrete event simulation: A comparison of replication and batch means. Technical Report 76-13, Department of Industrial Engineering, University of Wisconsin, Madison, Wisconsin, 1976. Law. A. M. and J. S. Carson, A sequential procedure for determining the length of a steady-state simulation, Technical Report 77- 12, Department of Industrial Engineering, University of Wisconsin, Madison. Wisconsin, 1977. Law, A. M.and W.D. Kelton, Confidence intervals for steady-state simulations, 11: A survey of sequential procedures, Technical Report 784, Department of Industrial Engineering, University of Wisconsin, Madison, Wisconsin, 1978. Lewis, P.A. W., A. S. Goodman,and J. 
M.Milter, A pseudo-random number generator for the System/360, IBM Syst. J.. 8. 1969, 199-200. Mitchell, B., Variance reduction by antithetic variates in GI/ G / I qwuaing simulations, W r . Res., 21. 1973, 988-997. Poliak, D. G., Some methods of efficient simulation for queueing system, Eng, Cybern. (J’SY), I, 1970, 75-85. Robinson, D.W., Determinants of run lengths in simulation of stable stochmtic systems, Technical Report 86-21, Control Analysis Corporation, Palo Alto, California, 1976. Rubinstein, Y. R., Selecting the best stable stochastic system, Stochasik Processes Appl., 10, 1980. 75-851. Rubinstein. Y. R. and A. Karnovsky, The regenerative method for constrained optimization problems, in OR’79, edited by K. B. Haley. North-Holland, Araslcrdam. 1979, 931-949. Seila, A, F., Quantile estimation methods in discrete event simulations of stochastic systems, Technical Report 76-12, Curriculum in Operations Research and Systems Analysis, University of North Carolina, Chapel Hill. North Carolina, 1976. Varga, R. S., Murrix Iteratiuc Analysis, Prentice-Hall, Englewood Cliffs. New Jersey, 1%2.
46 Lavenberg, S.
47
48 49
50 51
52 53 54
55
56
57 58 59
60 61 62
63
64
233
CHAPTER 7
Monte Carlo Optimization

Optimization is the science of selecting the best of many possible decisions in a complex real-life environment. The subject of this chapter is Monte Carlo optimization, a subject playing an important role in finding extrema, that is, minima or maxima, of complicated nonconvex real-valued functions. We show how Monte Carlo methods can be successfully applied to complex optimization problems where the convex optimization methods (see Avriel [2]) fail. Before proceeding to the rest of the chapter, however, we explain what we mean by local and global extrema for unconstrained optimization.

Consider a real-valued function g with domain D in R^n. The function g is said to have a local maximum at a point x* ∈ D if there exists a real number δ > 0 such that g(x) ≤ g(x*) for all x ∈ D satisfying ‖x − x*‖ < δ. We define a local minimum in a similar way, but with the inequality g(x) ≥ g(x*). If the inequality g(x) ≤ g(x*) is replaced by the strict inequality

g(x) < g(x*),   x ∈ D, x ≠ x*,

we have a strict local maximum; and if the sense of the inequality g(x) < g(x*) is reversed, we have a strict local minimum. We say that the function g has a global (absolute) maximum (strict global maximum) at x* ∈ D if g(x) ≤ g(x*) [g(x) < g(x*)] holds for every x ∈ D. A similar definition holds for a global minimum (strict global minimum). A global maximum at x* implies that g(x) takes on its greatest value g(x*) at that point no matter where else we may search in the set D. A local maximum, on the other hand, only guarantees that the value of g(x) is a maximum with respect to other points nearby, specifically in a region about x*.
Thus a function may have many local maxima, each with a different value of g(x), say, g(x_j*), j = 1,...,k. The global maximum can always be chosen from among these local maxima by comparing their values and choosing one such that

g(x*) = max_{1≤j≤k} g(x_j*),

where x* ∈ {x_j*, j = 1,...,k}. It is clear that every global maximum (minimum) is also a local maximum (minimum); however, the converse of this statement is, in general, not true. If g(x) is a convex function in R^n and D ⊂ R^n is a convex set, then every local minimum of g at x ∈ D is also a global minimum of g over D [2].

7.1 RANDOM SEARCH ALGORITHMS
Consider the following deterministic optimization problem:

max_{x∈D} g(x) = g(x*) = g*,   (7.1.1)
where g(x) is a real-valued bounded function defined on a closed bounded domain D ⊂ R^n. It is assumed that g achieves its maximum value at a unique point x*. The function g(x) may have many local maxima in D but only one global maximum. When g(x) and D have some attractive properties, for instance, g(x) is a differentiable concave function and D is a convex region, then, as previously mentioned, a local maximum is also a global maximum and problem (7.1.1) can be solved explicitly by mathematical programming methods (see Avriel [2]). If the problem cannot be solved explicitly, then numerical methods, in particular Monte Carlo methods, can be applied.

For better understanding of the subsequent text we describe an iterative gradient algorithm, assuming for simplicity that the set D = R^n. According to the gradient algorithm, we approximate the point x* step by step. If on the ith iteration (i = 1, 2, ...) we have reached the point x_i, then the next point x_{i+1} is chosen as

x_{i+1} = x_i + a_i ∇g(x_i),   a_i > 0,   (7.1.2)
where

∇g(x) = (∂g(x)/∂x_1, ..., ∂g(x)/∂x_n)

is the gradient of g(x), where ∂g(x)/∂x_k, k = 1,...,n, are the partial derivatives, and where a_i > 0 is the step parameter. If the function g(x) is not differentiable or if the analytic expression of g(x) is not given explicitly (only the values of g(x) can be observed at each point x ∈ D), then the finite difference gradient algorithm

x_{i+1} = x_i + a_i ∇̂g(x_i)   (7.1.3)

can be applied. In (7.1.3), ∇̂g(x_i) is the finite difference estimate of the gradient ∇g(x_i). Under some rather mild conditions (see Avriel [2]) on g(x) and a_i, the algorithm (7.1.3) converges to the local extremum x*.

In the case where either g(x) or the region D is nonconvex, the classical numerical optimization methods fail. However, Monte Carlo methods, in particular random search algorithms, can be applied. If we assume, for instance, that g(x) is a multiextremal function, then procedures (7.1.2) and (7.1.3) converge only to one of the local extrema, subject to the choice of the initial point x_0 from which the algorithms (7.1.2) and (7.1.3) start. We consider several random search algorithms capable of finding the extremum x* for complex nonconvex functions. The random search algorithms have been described in many papers and books (see Ermolyev [9], Katkovnik [17], Rastrigin [28], and Rubinstein [31-36]) and successfully implemented for various complex optimization problems. We now consider several random search algorithms.

Random Search Double Trial Algorithm (Algorithm RS-1)

x_{i+1} = x_i + (a_i/(2β_i))[g(x_i + β_i Ξ_i) − g(x_i − β_i Ξ_i)]Ξ_i,   a_i > 0, β_i > 0.   (7.1.4)
According to this algorithm, at the ith iteration we generate a random vector Ξ_i continuously distributed on the n-dimensional unit sphere, calculate the increment (see Fig. 7.1.1)

Δg(Ξ_i) = g(x_i + β_i Ξ_i) − g(x_i − β_i Ξ_i),   (7.1.5)
Fig. 7.1.1 Graphical representation of the double trials random search algorithm RS-1.
and choose the next point according to (7.1.4). It is not difficult to see that this algorithm generalizes the gradient algorithm (7.1.3). Only in the particular case where Ξ_i is taken in the direction of the gradient do procedures (7.1.3) and (7.1.4) coincide.
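As an illustration, here is a minimal Python sketch of Algorithm RS-1 applied to a toy quadratic objective. The objective g, the step sequences a_i and β_i, and the helper unit_sphere are our own illustrative choices, not prescribed by the text:

```python
import numpy as np

def unit_sphere(n, rng):
    """Random vector uniformly distributed on the n-dimensional unit sphere."""
    z = rng.standard_normal(n)
    return z / np.linalg.norm(z)

def rs1(g, x0, iters=500, seed=0):
    """Double trial random search (Algorithm RS-1), eq. (7.1.4)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for i in range(1, iters + 1):
        a_i, b_i = 1.0 / i, 1.0 / i ** 0.25   # illustrative step sequences
        xi = unit_sphere(len(x), rng)
        # increment (7.1.5), based on two observations of g
        dg = g(x + b_i * xi) - g(x - b_i * xi)
        x = x + a_i / (2 * b_i) * dg * xi
    return x

# maximize g(x) = -||x - c||^2, whose unique maximizer is c
c = np.array([1.0, -2.0])
x_star = rs1(lambda x: -np.sum((x - c) ** 2), x0=np.zeros(2))
```

For this concave test function the double-trial increment is exactly proportional to the projection of the gradient on Ξ_i, so the iterates contract toward c.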
Nonlinear Tactic Random Search Algorithm (Algorithm RS-2)

x_{i+1} = x_i + (a_i/β_i) Sign(Y_i) Y_i Ξ_i,   a_i > 0, β_i > 0,   (7.1.6)

where

Y_i = g(x_i + β_i Ξ_i) − g(x_i)   (7.1.7)

and

Sign Y = 1 if Y > 0, and Sign Y = 0 if Y ≤ 0.

According to this algorithm, we perform a trial step in the random direction Ξ_i and check Sign Y_i. If Y_i > 0, then x_{i+1} = x_i + (a_i/β_i)Y_i Ξ_i. If Y_i ≤ 0, then x_{i+1} = x_i and no iteration is made.
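A minimal sketch of one RS-2 iteration under the same illustrative conventions (the toy objective and step sequences are our own choices):

```python
import numpy as np

def rs2_step(g, x, a_i, b_i, rng):
    """One iteration of the nonlinear tactic random search (Algorithm RS-2):
    a trial step is taken in a random direction, and the point moves only
    when the trial increment Y_i of (7.1.7) is positive."""
    xi = rng.standard_normal(len(x))
    xi /= np.linalg.norm(xi)              # direction uniform on the unit sphere
    y = g(x + b_i * xi) - g(x)            # trial increment Y_i
    if y > 0:
        return x + (a_i / b_i) * y * xi   # accepted step of (7.1.6)
    return x                              # Y_i <= 0: stay at x_i

# toy run: maximize g(x) = -||x||^2 starting from (3, 4)
rng = np.random.default_rng(1)
g = lambda x: -np.sum(x ** 2)
x = np.array([3.0, 4.0])
for i in range(1, 3001):
    x = rs2_step(g, x, a_i=1.0 / i, b_i=i ** -0.25, rng=rng)
```

Roughly half of the trial directions are rejected for this objective, so RS-2 spends one observation per iteration and moves only on improvements.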
Linear Tactic Random Search Algorithm (Algorithm RS-3)
This algorithm contains the following steps:
1 i ← 0; generate Ξ_i.
2 Calculate the increment Y_i = g(x_i + β_i Ξ_i) − g(x_i).
3 If Y_i ≤ 0, go to step 6.
4 x_{i+1} = x_i + (a_i/β_i) Y_i Ξ_i,   a_i > 0, β_i > 0.   (7.1.8)
5 Go to step 7.
6 x_{i+1} ← x_i; i ← i + 1; generate Ξ_i.
7 Go to step 2.
Thus if Y_i > 0, we perform as many iterations as possible in the initially chosen random direction Ξ_i; if Y_i ≤ 0, we generate a new random vector Ξ_i and perform only one iteration according to the nonlinear tactic random search algorithm RS-2. It is not difficult to see that search in the same direction versus choice of a new direction is subject to the shape of g(x). The flatter the gradient lines, the more iterations will be performed according to step 4 and correspondingly the fewer iterations according to step 6. In the particular case where g(x) is a linear function, all iterations will be performed according to step 4 in the direction of the vector x_1 = x_0 + a_0 β_0^{-1} Y_0 Ξ_0, where Ξ_0 is the first random vector such that Y_0 > 0, and no iteration will be performed according to step 6. This is the reason why this algorithm is called a linear tactic random search algorithm.
Optimum Trial Random Search Algorithm (Algorithm RS-4)
This algorithm comprises the following steps:
1 Choose N > 1 independent random points x_i + β_i Ξ_ik on the sphere {x_i + β_i Ξ_i}, where Ξ_i is a random vector continuously distributed on the unit sphere with realizations Ξ_ik, k = 1,...,N.
2 Consider the sequence of increments

Y_ik = g(x_i + β_i Ξ_ik) − g(x_i),   k = 1,...,N.   (7.1.9)

3 Set

Y_i^0 = max_{1≤k≤N} Y_ik   (7.1.10)

and let Ξ_i^0 denote the direction that has produced this maximum.
4 The point x_{i+1} is chosen according to the following iterative procedure:

x_{i+1} = x_i + a_i β_i^{-1} Y_i^0 Ξ_i^0,   a_i > 0, β_i > 0.   (7.1.11)

Thus the next point x_{i+1} is chosen in the direction Ξ_i^0 of the greatest increase Y_i^0 of the function g(x); that is, the vector Ξ_i^0 corresponds to the optimal trial among those available.
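A sketch of one RS-4 iteration in Python. The update follows (7.1.11) as stated, without any extra acceptance rule; the toy objective, trial count N, and step sequences are our own choices:

```python
import numpy as np

def rs4_step(g, x, a_i, b_i, N, rng):
    """One iteration of the optimum trial random search (Algorithm RS-4)."""
    xis = rng.standard_normal((N, len(x)))
    xis /= np.linalg.norm(xis, axis=1, keepdims=True)      # N directions on the unit sphere
    ys = np.array([g(x + b_i * xi) - g(x) for xi in xis])  # increments (7.1.9)
    k = int(np.argmax(ys))                                 # best trial, eq. (7.1.10)
    return x + (a_i / b_i) * ys[k] * xis[k]                # update (7.1.11)

# toy run: maximize g(x) = -||x - c||^2 with N = 2 trials per iteration
rng = np.random.default_rng(2)
c = np.array([1.0, -2.0])
g = lambda x: -np.sum((x - c) ** 2)
x = np.zeros(2)
for i in range(1, 1001):
    x = rs4_step(g, x, a_i=1.0 / i, b_i=i ** -0.25, N=2, rng=rng)
```

Note that if all N increments are negative, (7.1.11) still moves the point, but backward along the least unfavorable direction.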
Statistical Gradient Random Search Algorithm (Algorithm RS-5)
This algorithm can be described as follows.
1 Choose N > 1 independent random points x_i + β_i Ξ_ik on the sphere {x_i + β_i Ξ_i}, where Ξ_i is a random vector continuously distributed on the unit sphere with realizations Ξ_ik, k = 1,...,N.
2 Calculate the sequence of increments

Y_ik = g(x_i + β_i Ξ_ik) − g(x_i),   k = 1,...,N.   (7.1.12)

3 Set

∇̃g_i = (1/N) Σ_{k=1}^{N} Y_ik Ξ_ik.   (7.1.13)

4 The point x_{i+1} is chosen according to

x_{i+1} = x_i + a_i β_i^{-1} ∇̃g_i,   a_i > 0, β_i > 0.   (7.1.14)
Thus, given x_i, the next point x_{i+1} is chosen in the direction ∇̃g_i, which is a result of averaging the sample Ξ_i1,...,Ξ_iN weighted with their corresponding increments (7.1.12). In the particular case where N = n and

Ξ_ik = e_k = (0,...,0,1,0,...,0),   k = 1,...,n,

with the 1 in the kth position, we obtain the following finite difference gradient algorithm:

x_{i+1} = x_i + a_i ∇̂g(x_i),   (7.1.15)

where

∇̂g(x) = ( [g(x_1 + β_1, x_2,...,x_n) − g(x)]/β_1, ..., [g(x_1,...,x_{n−1}, x_n + β_n) − g(x)]/β_n ).
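A sketch of one RS-5 iteration. The averaged, increment-weighted direction of (7.1.13) acts as a noisy gradient estimate; the toy objective and step sequences are our own illustrative choices:

```python
import numpy as np

def rs5_step(g, x, a_i, b_i, N, rng):
    """One iteration of the statistical gradient random search (Algorithm RS-5)."""
    xis = rng.standard_normal((N, len(x)))
    xis /= np.linalg.norm(xis, axis=1, keepdims=True)      # N directions on the unit sphere
    ys = np.array([g(x + b_i * xi) - g(x) for xi in xis])  # increments (7.1.12)
    grad = (ys[:, None] * xis).sum(axis=0) / N             # statistical gradient (7.1.13)
    return x + (a_i / b_i) * grad                          # update (7.1.14)

# toy run: maximize g(x) = -||x||^2; the averaged direction tracks the gradient
rng = np.random.default_rng(3)
g = lambda x: -np.sum(x ** 2)
x = np.array([2.0, -3.0])
for i in range(1, 1501):
    x = rs5_step(g, x, a_i=1.0 / i ** 0.7, b_i=i ** -0.25, N=4, rng=rng)
```

For a quadratic objective each increment is, to first order, proportional to the projection of the gradient on the trial direction, so the weighted average points (on average) along the gradient, as stated below for the linear case.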
It is not difficult to prove that, for a linear function, the direction of ∇̃g_i on the average coincides with that of the gradient of g(x). This is the reason why the algorithm is called a "statistical gradient algorithm."

Consider the following stochastic optimization problem:

max_{x∈D⊂R^n} E[φ(x, W)] = max_{x∈D⊂R^n} g(x) = g(x*) = g*.   (7.1.16)
Here φ(x, W) is a function of two variables, x and W, x* is the optimal point of g(x), which is assumed to be unique, and W is an r.v. with unknown p.d.f. f_W(w). We assume that at each point x ∈ D only an individual realization of φ(x, W) can be observed. It is clear that, if the p.d.f. f_W(w) is unknown, problem (7.1.16) cannot be solved analytically. However, numerical methods can be applied.
One widely used numerical method for solving (7.1.16) is the stochastic approximation method. This method was originated by Robbins and Monro [30], who suggested a procedure for finding a root of a regression function measured with noise. Kiefer and Wolfowitz [19] considered a procedure for finding x* in the optimization problem (7.1.16) where x ∈ R^1. The procedures of Robbins-Monro and Kiefer-Wolfowitz were generalized by Dvoretzky [8]. Hundreds of papers and many books have been written in the past 15 years about stochastic approximation procedures, their convergence, and their applications. The reader is referred to Wilde [44] and Wasan [43]. We consider the following algorithm:

x_{i+1} = x_i + a_i ∇̂φ(x_i, W_i),   (7.1.17)

where

∇̂φ(x, W) = ( [φ(x_1 + β_1, x_2,...,x_n, W_11) − φ(x_1 − β_1, x_2,...,x_n, W_12)]/(2β_1), ..., [φ(x_1, x_2,...,x_n + β_n, W_n1) − φ(x_1, x_2,...,x_n − β_n, W_n2)]/(2β_n) )
is the estimate of the gradient ∇g(x). It is readily seen that in the absence of noise, that is, when W ≡ 0, ∇̂φ(x, W) = ∇̂g(x) and (7.1.17) coincides with (7.1.3). In addition, if the realizations of the noise are independent and E(W) = 0, then ∇̂φ(x, W) is an unbiased estimator of ∇̂g(x). Proof of convergence of algorithm (7.1.17) to x*, subject to some conditions on the sequences {a_i}, {β_i} and the function φ(x, W), can be found, for instance, in Dvoretzky [8], Gladyshev [13], and Wasan [43].

It is not difficult to understand that the random search algorithms can also be used for solving problem (7.1.16). For instance, by analogy with (7.1.17) the random search double trial algorithm (Algorithm RS-1) can be written as

x_{i+1} = x_i + (a_i/(2β_i))[φ(x_i + β_i Ξ_i, W_i1) − φ(x_i − β_i Ξ_i, W_i2)]Ξ_i.   (7.1.18)
We can see that, for the same reasons as the random search algorithm (7.1.4) extends the gradient algorithm (7.1.3), the random search algorithm (7.1.18) extends the stochastic approximation algorithm (7.1.17). Proof of convergence of (7.1.18) to x* can be found in Rubinstein [31]. In analogy with (7.1.18) we can adapt any of the random search algorithms RS-2 through RS-5 for solving problem (7.1.16).
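The noisy double-trial recursion (7.1.18) can be sketched as follows. The noise model, noise level, and step sequences are our own illustrative assumptions:

```python
import numpy as np

def rs1_noisy(phi, x0, iters=3000, seed=0):
    """Double trial random search for the stochastic problem (7.1.16),
    following (7.1.18): only noisy observations phi(x) = g(x) + W are used."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for i in range(1, iters + 1):
        a_i, b_i = 1.0 / i, i ** -0.25     # illustrative step sequences
        xi = rng.standard_normal(len(x))
        xi /= np.linalg.norm(xi)
        # two independent noisy observations, as in (7.1.18)
        d = phi(x + b_i * xi, rng) - phi(x - b_i * xi, rng)
        x = x + a_i / (2 * b_i) * d * xi
    return x

# g(x) = -||x||^2 observed through additive zero-mean noise W
phi = lambda x, rng: -np.sum(x ** 2) + 0.1 * rng.standard_normal()
x_star = rs1_noisy(phi, x0=np.array([2.0, 2.0]))
```

Because the noise has zero mean and the steps a_i/(2β_i) are summable in mean square, the noise contribution averages out, in line with the convergence result cited above.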
7.2 EFFICIENCY OF THE RANDOM SEARCH ALGORITHMS
The random search algorithms can be compared according to different criteria. Usually they are compared according to their local and integral properties [28, 29]. Local properties are associated with a single iteration of the random search algorithm; integral properties, with many iterations. Comparing different algorithms according to integral properties, we usually define:
1 The initial condition from which the search starts.
2 A set of test functions (linear, quadratic, parabolic, multiextremal, etc.) for which the extremum is sought.
3 Some criteria that must be achieved during optimization.

The following criteria can be used. Find an index k corresponding to the best algorithm among S algorithms available, such that:
(a) where the number of iterations i is given.
It is readily seen that the first three problems are associated with finding the best algorithm when the number of iterations i is given; the last two involve finding the best algorithm that hits, in the minimum number of iterations, a given region R_1 or R_2 containing the extremum point x*. In Section 7.3 we consider some local and integral properties of Algorithm RS-4. Generally, the problem of comparison of different algorithms according to their integral properties is difficult to solve. Some attempts to overcome this difficulty have been made by Rastrigin [28]. Another interesting problem is how to find the optimal combination of algorithms, each of which is capable of finding the extremum of g(x). This problem is solved in Rubinstein [33] and uses Bellman's principle of optimality.

Now we consider some local properties of the random search algorithms, assuming that some point x_i has been reached and that we are allowed to make only a single step (iteration). Let x_{i+1}^(s), s = 1,...,S, be the point (the state of the system) after this single iteration. Let us define the efficiency of the random search algorithms as

C_n^(s) = E(Δx_i^(s)) / E(N_i^(s)),   (7.2.1)

where Δx_i^(s) is the projection of the vector x_{i+1}^(s) − x_i on the direction of the vector x* − x_i, and N_i^(s) is the number of observations (measurements) of g(x) required by the algorithm in the ith step. For simplicity we consider only the case where g(x) is approximately a linear function, which is the same as to assume that in the Taylor expansion

g(x_{i+1}) = g(x_i + Δx_i) ≈ g(x_i) + Δx_i^T ∇g(x_i).   (7.2.2)
Therefore at each iteration made by the random search algorithms we approximate g(x) linearly on the interval Δx_i. It is proven in [32] that, for a rather wide class of functions optimized by random search algorithms, under certain conditions on the sequences {a_i} and {β_i} there exists a number I sufficiently large and such that for i ≥ I a linear approximation of g(x), that is, (7.2.2), is valid.

Substituting (7.2.2) in any of the four random search Algorithms RS-1, RS-2, RS-4, and RS-5 (see, respectively, (7.1.4), (7.1.6), (7.1.11), and
(7.1.14)), we readily obtain

x_{i+1}^(s) = x_i + a_i^(s) ∇g(x_i) cos φ_i^(s) + o(Δx_i^(s)),   (7.2.3)

where a_i^(s) is defined in (7.2.4) and s = 1, 2, 4, 5 corresponds to RS-1, RS-2, RS-4, and RS-5. The distribution of φ_i^(s) depends on the specific algorithm and on the distribution of the random vector Ξ_i^(s). Let us assume without loss of generality that a_i^(s) = 1. Then, taking into account that for a linear function g(x) the direction of the vector x* − x_i coincides with the direction of the gradient ∇g(x_i), we can express the efficiency C_n (see (7.2.1)) as

C_n^(s) = E(cos φ^(s)) / E(N^(s)).   (7.2.5)
We consider here only the efficiencies of the random search Algorithms RS-1 and RS-4, assuming that the vector Ξ is uniformly distributed on the surface of the unit n-dimensional sphere.

(a) The Double Trial Random Search Algorithm RS-1 It follows from (7.1.4), (7.2.3), and (7.2.4) that φ_i^(1) is the random angle between the vector Ξ_i^(1), uniformly distributed on the n-dimensional sphere, and the vector ∇g(x_i). We assume here that the direction of the gradient corresponds to φ_i^(1) = 0. Furthermore, it follows from (7.2.5) that the distribution of φ^(1) does not depend on i; therefore the index i can be omitted. We also omit for convenience the index (1) in φ^(1). It is shown in the Appendix that φ has the p.d.f.*

h_n(φ) = B_n sin^{n−2} φ,   −π/2 < φ ≤ π/2,   (7.2.6)

where

B_n = Γ(n/2) / [√π Γ((n−1)/2)].   (7.2.7)

*We use for convenience −π/2 ≤ φ ≤ π/2 rather than 0 ≤ φ ≤ π (see Appendix).
Since for Algorithm RS-1 we need two observations of g(x), at the points x + βΞ and x − βΞ, respectively, the efficiency C_n (see (7.2.5)) is

C_n^(1) = E(cos φ)/2.   (7.2.8)

The expected value and the variance of cos φ are, respectively,

E(cos φ) = 2B_n/(n − 1)   (7.2.9)

and

var(cos φ) = E(cos²φ) − [E(cos φ)]².   (7.2.10)

Substituting (7.2.9) in (7.2.8), we obtain

C_n = B_n/(n − 1),   (7.2.11)

and further relationships can also be easily verified.
Table 7.2.1 and Fig. 7.2.1 represent the efficiency C,, and var(cos p) = u2 as
Table 7.2.1 The Efficiency and σ² as Functions of n for Algorithm RS-1

n     C_n      σ²       C_n/σ
2     0.3184   0.5995   0.4112
3     0.25     0.416    0.3876
4     0.2125   0.314    0.3792
5     0.1875   0.26     0.3677
6     0.1702   0.221    0.3602
7     0.1556   0.1957   0.3518
8     0.1452   0.166    0.3564
9     0.1367   0.1401   0.3652
10    0.1294   0.1344   0.3529
11    0.123    0.2268   0.3538
Fig. 7.2.1 The efficiency and σ² as functions of n for Algorithm RS-1.
a function of the space size n, from which it follows that, as n increases, both the efficiency and the variance decrease. When n → ∞, E(cos φ) → 0 and C_n → 0; that is, the random search Algorithm RS-1 becomes inefficient.
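The closed forms (7.2.7) and (7.2.11) are easy to check numerically. The sketch below compares C_n = B_n/(n−1) with a Monte Carlo estimate of E(cos φ)/2; folding the angle into |cos φ| reflects the sign correction built into the double-trial step (this folding, and the sample size, are our own modeling choices):

```python
import numpy as np
from math import gamma, pi, sqrt

def B(n):
    """Normalizing constant B_n of (7.2.7)."""
    return gamma(n / 2) / (sqrt(pi) * gamma((n - 1) / 2))

def efficiency_rs1(n):
    """C_n = B_n / (n - 1), eq. (7.2.11)."""
    return B(n) / (n - 1)

def efficiency_rs1_mc(n, samples=200_000, seed=0):
    """Monte Carlo estimate of C_n = E(cos phi)/2 for a direction uniform
    on the unit n-sphere; |cos| folds the angle into (-pi/2, pi/2]."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((samples, n))
    cos_phi = np.abs(z[:, 0]) / np.linalg.norm(z, axis=1)
    return cos_phi.mean() / 2
```

For n = 2,...,5 the exact values 0.3183, 0.25, 0.2122, 0.1875 reproduce, up to rounding, the C_n column of Table 7.2.1.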
(b) The Optimum Trial Random Search Algorithm RS-4 It follows from (7.1.11), (7.2.3), and (7.2.4) that

cos φ_i^(4) = max(cos φ_i1, ..., cos φ_iN).

Since the distribution of φ_i^(4) does not depend on the step number i, we can again omit the index i. We also omit for convenience the index (4) in φ_i^(4). To find the efficiency of Algorithm RS-4 let us find the distribution of V = cos φ, where φ is distributed according to (7.2.6) and (7.2.7). By the transformation method (see Section 3.5.2) we obtain

P_V(v) = B_n (1 − v²)^{(n−3)/2},   −1 ≤ v ≤ 1.   (7.2.12)
The c.d.f. and p.d.f. of V_N^0 = max(V_1,...,V_N) are, respectively,

F_N(v) = [F_V(v)]^N   (7.2.13)

and

P_N(v) = N[F_V(v)]^{N−1} P_V(v).   (7.2.14)

The expected value and the variance of V_N^0 are given in (7.2.15) and (7.2.16), respectively. For n = 3 we have

P_3(v) = (N/2^N)(1 + v)^{N−1},   (7.2.17)

E(V_N^0) = (N − 1)/(N + 1),   (7.2.18)

var(V_N^0) = 4N/[(N + 1)²(N + 2)].   (7.2.19)

It follows from (7.2.5) that the efficiency of Algorithm RS-4 is

C_n^(4) = E(V_N^0)/N.   (7.2.20)

For n = 3 we obtain

C_3 = (N − 1)/[N(N + 1)].   (7.2.21)

The optimal value of C_3 equals 1/6 and is achieved when N is equal to 2 or 3. Generally, it is difficult to find C_n and var(V_N^0) for n > 3. Table 7.2.2 and Fig. 7.2.2 represent simulation results for C_n and var(V_N^0) as functions of n for the optimal number of trials N*, on the basis of 100 runs. It is interesting to note that the optimal N* = 2 and does not depend on n. Comparing Algorithms RS-1 and RS-4 for a linear function, we conclude that RS-1 is more efficient than RS-4 for all n > 1. The variance associated with Algorithm RS-4 for the optimal N* = 2 is always less than that associated with RS-1. The intuitive explanation for this can be given as follows. Taking two random trials according to Algorithm RS-1, we always
Table 7.2.2 The Efficiency and var(V_N^0) as Functions of n for Algorithm RS-4

n     C_n      var(V_N^0)   N*    C_n/σ
3     0.198    0.236        2     0.4075
4     0.159    0.171        2     0.3845
5     0.137    0.134        2     0.3743
6     0.121    0.110        2     0.3647
7     0.109    0.093        2     0.3575
9     0.092    0.070        2     0.3478
11    0.081    0.050        2     0.3622

Note: The sample size is equal to 100.
find a feasible random direction toward the extremum, which is generally not true for Algorithm RS-4. Indeed, the probability of finding such a direction (success) in N independent trials is equal to

P(N) = 1 − (1 − p)^N.

Here p is the probability of success in a single trial. Taking into account that for a linear function p = 1/2, we obtain, for the optimal N* = 2,

P(N* = 2) = 3/4;

that is, the probability of a success in Algorithm RS-4 is equal to 3/4. Defining the efficiency as C^(s)/σ^(s), where σ^(s) = [var(cos φ^(s))]^{1/2}, we see from Tables 7.2.1 and 7.2.2 that both Algorithms RS-1 and RS-4 have approximately the same efficiency.
Fig. 7.2.2 The efficiency and the var(V_N^0) as functions of n for Algorithm RS-4 (the sample size is equal to 100).
7.3 LOCAL AND INTEGRAL PROPERTIES OF THE OPTIMUM TRIAL RANDOM SEARCH ALGORITHM RS-4
This section is based on Ref. 35.

7.3.1 Local Properties of the Algorithm
The term "local properties" refers here to the convergence of the vector x_{i+1} − x_i to the direction of greatest increase of the function g(x) as the number of trials m tends to infinity. Assume that g(x) is a continuous function and

φ(x, W) = g(x) + W,   x ∈ D ⊂ R^n,   (7.3.1)

that is, each measurement of the function g(x) is accompanied by additive noise W, and assume that the vector Ξ is continuously distributed on the unit sphere with a density f(Ξ). Let B be the set on the surface of the unit sphere defined by the condition f(Ξ) > 0 and let B̄ be the closure of B. Let us also assume that the maximum

max_{Ξ∈B̄} g(x + βΞ)   (7.3.2)

occurs at the unique point x + βΞ^0. We are concerned with the asymptotic behavior of the sequence of optimum-trial directions {Ξ_m^0}_{m=1}^∞ defined by (7.3.3).
Theorem 7.3.1 The vector Ξ^0 is almost surely (a.s.) the only limiting vector of the sequence {Ξ_m^0}_{m=1}^∞ if and only if the noise W satisfies the following property: For a.s. any sequence {W_k}_{k=1}^∞ of W's realizations and for any c > 0, there exists a natural number K_c (which depends on the sequence) such that

W_k < W̄_k + c,   (7.3.4)

where

W̄_k = max_{K_c ≤ i ≤ k} W_i.   (7.3.5)
Proof (1) Sufficiency Let us prove that for every δ > 0 the δ-neighborhood S(Ξ^0, δ) of the point Ξ^0 contains almost all optimum-trial directions Ξ_m^0 when m is sufficiently large. The proof is by contradiction. Assume that there exists δ > 0 such that the following holds: There is a
positive probability that a realization {Ξ_m}_{m=1}^∞ contains a subsequence {Ξ_mk}_{k=1}^∞ such that

g(x + βΞ_mk) + W_mk > g(x + βΞ_j) + W_j,   1 ≤ j ≤ m_k − 1,   (7.3.6)

and at the same time Ξ_mk ∉ S(Ξ^0, δ). Continuity of g(x) implies that we can choose η > 0 and δ_1 < δ such that

inf_{Ξ∈B∩S(Ξ^0,δ_1)} g(x + βΞ) > sup_{Ξ∉S(Ξ^0,δ)} g(x + βΞ) + 2η.   (7.3.7)
(a) The case of unbounded noise Assume the sequence {W_mk}_{k=1}^∞ is unbounded and satisfies

W_mk < W̄_mk + η,   m_k > K_η.   (7.3.8)

Denote by m̄_k the number of the trial in which the maximum W̄_mk is achieved, that is, W_m̄k = W̄_mk and m̄_k < m_k hold. The sequence of indices {m̄_k}_{k=1}^∞ is a.s. unbounded, because {W_mk}_{k=1}^∞ is a.s. unbounded. Therefore the event

Ξ_m̄k ∈ B ∩ S(Ξ^0, δ_1)

will a.s. occur for some m̄_k0 > K_η, since at each trial there is a constant nonzero probability of its occurrence. Comparing the results obtained in trials m̄_k0 and m_k0, it follows from (7.3.7) and (7.3.8) that

g(x + βΞ_m̄k0) + W_m̄k0 > g(x + βΞ_mk0) + W_mk0 + η,

which contradicts (7.3.6). Q.E.D.

(b) The case of bounded noise If sup W = W_max < ∞, then the sequence {W_m}_{m=1}^∞ a.s. contains an infinite subsequence {W_mi}_{i=1}^∞ such that

W_max − η < W_mi ≤ W_max,   i = 1, 2, ....

On the other hand, there exists a.s. a particular subscript m_i0 such that

Ξ_mi0 ∈ B ∩ S(Ξ^0, δ_1).

Thus for any m > m_i0 satisfying Ξ_m ∉ S(Ξ^0, δ),

g(x + βΞ_mi0) + W_mi0 > g(x + βΞ_m) + W_mi0 + η ≥ g(x + βΞ_m) + W_m,

which contradicts (7.3.6). Q.E.D.
(2) Necessity Assume that the set C of sequences {W_k}_{k=1}^∞ not satisfying the theorem's condition has a probability P(C) > 0. For each sequence from C there exists a number c > 0 and a subsequence {W_ki}_{i=1}^∞ such that

W_ki ≥ W̄_ki + c.   (7.3.9)

Our task now is to prove that with probability P(C) the vector Ξ^0 is not the only limiting vector of the sequence {Ξ_m^0}_{m=1}^∞. What we actually prove is a somewhat stronger statement: namely, that the set of limiting vectors contains the set

V_c = B̄ ∩ {Ξ : g(x + βΞ^0) − c < g(x + βΞ) ≤ g(x + βΞ^0)}.   (7.3.10)

To prove this statement it suffices to show that for any y ∈ V_c and any δ > 0 the sequence {Ξ_m^0}_{m=1}^∞ will visit the neighborhood S(y, δ) infinitely often. Indeed, for any trial there exists a constant positive probability of entering the set S(y, δ) ∩ V_c. This implies that the subsequence of trials {k_i} satisfying (7.3.9) a.s. contains a new subsequence {k_il} such that Ξ_kil ∈ S(y, δ) ∩ V_c holds. The vectors Ξ_kil will be optimum-trial directions, since for any j, 1 ≤ j ≤ k_il − 1,

g(x + βΞ_kil) + W_kil > g(x + βΞ^0) − c + W̄_kil + c ≥ g(x + βΞ_j) + W_j.

Q.E.D.

Remark In the case without noise (W = 0 a.s.) we can explicitly calculate the number of trials required to enter a prescribed δ-neighborhood S(Ξ^0, δ) of the point Ξ^0 with a prescribed probability p. Define
α = P{Ξ ∈ S(Ξ^0, δ)};

that is, α is the probability of visiting S(Ξ^0, δ) at each single trial. The probability of visiting S(Ξ^0, δ) at least once in m trials is equal to

p_m = 1 − (1 − α)^m.   (7.3.11)

Thus if we want p_m ≥ p, it suffices to produce

m ≥ ln(1 − p)/ln(1 − α)   (7.3.12)

trials. In the case where p = 1 − α,

m ≥ ln α / ln(1 − α).   (7.3.13)

Table 7.3.1 shows some values of m as a function of α.
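In code, (7.3.12) and (7.3.13) amount to one line each. The printed table entries round the ratio to an integer (note, e.g., that ln 0.01/ln 0.99 ≈ 458.2), so the sketch below verifies the defining inequality rather than any particular rounding convention:

```python
from math import ceil, log

def trials_needed(alpha, p):
    """Smallest m with p_m = 1 - (1 - alpha)**m >= p, from (7.3.12)."""
    return ceil(log(1 - p) / log(1 - alpha))

def trials_needed_special(alpha):
    """The case p = 1 - alpha of (7.3.13): m >= ln(alpha)/ln(1 - alpha)."""
    return ceil(log(alpha) / log(1 - alpha))

m = trials_needed(0.1, 0.9)   # alpha = 0.1, target probability 0.9
```

The value m is the smallest trial count that makes the hit probability at least p, as the assertions below confirm.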
Table 7.3.1 Dependence of m on α

α    0.500  0.200  0.100  0.050  0.020  0.010  0.005  0.002  0.001
m    1      8      22     58     194    458    1057   3104   6903
7.3.2 Integral Properties of the Algorithm

The term "integral properties" refers to the convergence of Algorithm RS-4 to the point of extremum x*.
Theorem 7.3.2 Suppose that g(x) has bounded second derivatives. Let

E(‖ξ_i‖² | x_0, x_1, ..., x_i) ≤ M < ∞   (7.3.14)

for ‖x_j‖ ≤ ζ < ∞, j = 0, 1, ..., i, where ξ_i = β_i^{-1} Y_i^0 Ξ_i^0. Let the normalizing factor γ_i satisfy the condition

0 < γ_i (η_i ‖V_i‖ + Δ_i) < ∞,   (7.3.15)

where η_i = 1 if ‖V_i‖ > 0 and η_i = 0 if ‖V_i‖ = 0 (V_i is defined in (7.3.18)), and let a_i and β_i be such that

a_i ≥ 0,   β_i ≥ 0,   Σ_{i=1}^∞ a_i β_i < ∞,   Σ_{i=1}^∞ β_i² < ∞,   Σ_{i=1}^∞ a_i = ∞;   (7.3.16)

then the optimum trial random search algorithm

x_{i+1} = π(x_i + a_i γ_i ξ_i)   (7.3.17)

converges a.s. to x*. Here π(·) denotes the projection operator on D (i.e., for every x ∈ R^n, π(x) ∈ D and ‖x − π(x)‖ = min_{y∈D} ‖x − y‖).

Proof Since g(x) has bounded second derivatives, it is readily shown that

E(ξ_i | x_i) = C_i ∇g(x_i) + β_i V_i,   (7.3.18)

where C_i and the vector V_i have bounded components, that is, C_i < ∞, ‖V_i‖ < ∞. Further, convergence of (7.3.17) to x* follows from Ref. 10, Theorem 1.
7.4 MONTE CARLO METHOD FOR GLOBAL OPTIMIZATION
(a) Deterministic Optimization Problem The problem of finding the global extremum of g(x) (see (7.1.1)) has been approached in a number of different ways. The earliest methods were associated with the grid technique, in which the function is evaluated at equispaced points throughout D. We consider only Evtushenko's algorithm [11] in such a deterministic sense. Some other deterministic approaches for global optimization are given in Dixon [7], Shubert [37], and Strongin [39]. Evtushenko makes the following assumptions about the function and the objective:

1 The function satisfies the Lipschitz condition, that is,

|g(x_1) − g(x_2)| ≤ L‖x_1 − x_2‖

for any x_1, x_2 ∈ D, L > 0.
2 Each x ∈ D_ε, where

D_ε = {x : |g(x) − g(x*)| < ε},

is accepted as an approximation for x*.

Evtushenko's algorithm is as follows.

Algorithm GL-1
1 Evaluate the function at N equispaced points x_1, ..., x_N throughout D and define

y_k = g(x_k),   k = 1, ..., N.

2 Estimate g* by M_N = max(y_1, ..., y_N).
The theoretical background to this approach is very simple. Let V_i be the sphere ‖x − x_i‖ ≤ r_i, where r_i = L^{-1}(g(x_i) − M_N + ε). Then for any x ∈ V_i, g(x) ≥ g(x_i) − L‖x − x_i‖ ≥ M_N − ε. Hence if the spheres V_i, i = 1, ..., N, cover the whole set D, then M_N cannot differ from g* by more than ε, and the problem is solved. In the simplest case where D is an interval, a ≤ x ≤ b, Evtushenko proposed the following procedure:

x_1 = a + ε/L,   M_1 = g(x_1),

x_{k+1} = x_k + [2ε + M_k − g(x_k)]/L,   M_k = max(g(x_k), M_{k−1}).
The number of function evaluations required to solve the problem is greatest in the case of a monotonically increasing function, namely

N = L(b − a)/(2ε).
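The one-dimensional procedure above can be sketched as follows; the test function, Lipschitz constant, and accuracy are illustrative assumptions, not examples from the text.

```python
import math

def evtushenko_max(g, a, b, L, eps):
    """Sequential covering search for the maximum of an L-Lipschitz g on [a, b]."""
    x = a + eps / L              # x_1 = a + eps/L
    gx = g(x)
    M = gx                       # running record M_k
    n = 1                        # number of function evaluations
    while True:
        # x_{k+1} = x_k + [2*eps + M_k - g(x_k)]/L: the step is smallest
        # (2*eps/L) when x_k is itself the current record point.
        x += (2.0 * eps + M - gx) / L
        if x > b:
            return M, n
        gx = g(x)
        n += 1
        M = max(M, gx)

# Illustration: g(x) = sin(x) on [0, pi] with L = 1; the true maximum is 1,
# and the returned record lies within eps of it.
M, n = evtushenko_max(math.sin, 0.0, math.pi, L=1.0, eps=0.01)
```

Note how the step length adapts: far from the record the sweep accelerates, which is exactly why the monotonically increasing case is the worst one.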
Most algorithms for global optimization contain random elements and are related to the Monte Carlo method. We consider some such algorithms. Brooks [4] suggested, for solving problem (7.1.1), the following "pure" random search algorithm.

Algorithm G1-2
1 Generate X_1, ..., X_N from any p.d.f. f_X(x) such that f_X(x) > 0 when x ∈ D.
2 Find Y_k = g(X_k), k = 1, ..., N.
3 Estimate g* by M_N = max(Y_1, ..., Y_N).
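Algorithm G1-2 is a one-liner in practice; the domain and objective below are illustrative assumptions.

```python
import random

def pure_random_search(g, sample, N):
    """Algorithm G1-2: estimate g* = max g(x) by the best of N random probes;
    `sample` draws one point from a p.d.f. that is positive on D."""
    return max(g(sample()) for _ in range(N))

# Illustration: maximize g(x, y) = -(x^2 + y^2) over D = [-1, 1]^2,
# whose true maximum is 0 at the origin.
random.seed(0)
g = lambda p: -(p[0] ** 2 + p[1] ** 2)
sample = lambda: (random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0))
M_N = pure_random_search(g, sample, N=2000)
```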
This algorithm was also discussed in Ref. 36. Our nomenclature follows that reference, and our discussion is based on it. Let P be the probability measure defined on B, the Borel σ-field of D, so that (D, B, P) is a probability space. Let g^{-1}(a, b] = {x ∈ D : a < g(x) ≤ b}, and let F(y) = P{Y_1 ≤ y}; then

F(y) = P{g(X_1) ≤ y} = P(g^{-1}(−∞, y])

and Y_1, ..., Y_N are independent identically distributed (i.i.d.) random variables (r.v.'s) on R^1 with a cumulative probability distribution function (c.d.f.) F.

Proposition 7.4.1. Suppose P assigns a positive probability to every neighborhood of x*, and suppose g is continuous at x*; then

lim_{N→∞} M_N = g*  a.s. (7.4.1)
Proof. It is clear that F(g*) = 1, and for each δ > 0 we have 1 − F(g* − δ) = P{g* − δ < g(X_1) ≤ g*} > 0 by our assumption. Let A_N(δ) be the event {M_N ≤ g* − δ}; then P(A_N(δ)) = F^N(g* − δ) and Σ_{N=1}^∞ P{A_N(δ)} = F(g* − δ)/[1 − F(g* − δ)] < ∞. By the Borel-Cantelli lemma P{M_N ≤ g* − δ infinitely often} = 0 for all δ > 0, and thus (7.4.1) follows. Q.E.D.
The choice of P, and consequently the resulting F, depends on our prior knowledge of x*. If it is known that a certain region is more likely to include x*, then it would be more efficient to assign a higher probability to
that region. If nothing is known a priori about x*, a uniform distribution over D can be assumed. In guaranteeing (7.4.1) the exact choice of P is immaterial. However, the rate of convergence is determined by the properties of F. For example, by a theorem of Gnedenko [14], if there exists a constant α > 0 such that

lim_{δ↓0} [1 − F(g* − cδ)]/[1 − F(g* − δ)] = c^α,  ∀c > 0, (7.4.2)

then

lim_{N→∞} P{(g* − M_N)/a_N ≤ t} = 1 − exp(−t^α),  t ≥ 0, (7.4.3)

with a_N determined by F(g* − a_N) = (N − 1)/N. Some more properties of M_N are listed below.
1 Geometric distribution. Let N_δ be the first N for which M_N > g* − δ. Then N_δ is a geometric r.v., that is,

P{N_δ = k} = F^{k−1}(g* − δ)[1 − F(g* − δ)],  k = 1, 2, .... (7.4.4)

Consequently, it is well known that

EN_δ = 1/[1 − F(g* − δ)] ≡ η_δ

and

P{N_δ ≤ k} = 1 − F^k(g* − δ) ≡ P_{δ,k}.

As δ↓0,

P_{δ,[η_δ]} = 1 − (1 − 1/η_δ)^{[η_δ]} → 1 − e^{−1} ≈ 0.63

(here [v] is the integer part of v). Hence η_δ = EN_δ is approximately a 63% confidence bound for N_δ, the number of trials necessary to make M_N > g* − δ (δ > 0 small). Let a = 1 − F(g* − δ); then P_{δ,k} = 1 − (1 − a)^k. For every given pair (a, β) the smallest k for which P_{δ,k} ≥ β is k(a, β) = ln(1 − β)/ln(1 − a), and Table 7.3.1 with k(a, 1 − a) = m can be used again.

2 Lack of memory. It is well known that (7.4.4) implies

P{N_δ > k + m | N_δ > m} = P{N_δ > k}. (7.4.5)

In terms of M_N we thus have

P{M_{k+m} ≤ g* − δ | M_m ≤ g* − δ} = P{M_k ≤ g* − δ},

because the events {N_δ > k} and {M_k ≤ g* − δ} are identical. It follows that, given m successive failures (to enter {y : y > g* − δ}), the conditional distribution of the number of trials necessary for the first success equals its unconditional distribution. In particular we have

E(N_δ | M_m ≤ g* − δ) = m + EN_δ. (7.4.6)

3 Poisson approximation. If (7.4.2) or (7.4.3) holds, then Z_{δ,N}, the number of Y_i, i = 1, 2, ..., N, for which Y_i > g* − δ, is asymptotically Poisson distributed. More precisely, for fixed N and δ > 0, Z_{δ,N} is a binomial r.v. with parameters N and p = 1 − F(g* − δ). When (7.4.2) holds, by substituting δ = ca_N in (7.4.2) we obtain N[1 − F(g* − ca_N)] → c^α, which implies that Z_{ca_N,N} converges in distribution to a Poisson r.v. with parameter c^α.
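The geometric behavior of N_δ is easy to check empirically; the target distribution below (uniform values, so F(y) = y) is an illustrative assumption.

```python
import random

# For g(X) uniform on [0, 1] (so F(y) = y, g* = 1) and delta = 0.1, the
# number of trials N_delta until a value exceeds g* - delta is geometric
# with mean eta = 1/(1 - F(g* - delta)) = 10.
random.seed(1)
delta = 0.1

def first_success():
    k = 1
    while random.random() <= 1.0 - delta:   # failure: Y <= g* - delta
        k += 1
    return k

trials = [first_success() for _ in range(20000)]
mean_N = sum(trials) / len(trials)                            # ~ eta = 10
frac_within_eta = sum(t <= 10 for t in trials) / len(trials)  # ~ 0.63
```

The ~63% figure is the confidence level discussed under property 1 above.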
The problem of finding the global maximum of g(x) can be reduced to that of finding the mode of a density function associated with g(x). Indeed, if g(x) ≥ 0, x ∈ D, then ψ(x) = c^{-1} g(x), where c = ∫ g(x) dx, is a density function, and the problems of finding the global maximum of g(x) and finding the mode of ψ(x) are equivalent. The latter can be solved by one of the methods mentioned in Refs. 41, 42, and 46. If g(x) is unrestricted in sign but bounded, that is, if |g(x)| ≤ k, then f_X(x) = c^{-1}(g(x) + k), where c = ∫ (g(x) + k) dx, is again a density function.
A natural extension of the "pure" random search algorithm G1-2 is the so-called multistart algorithm [7], which is probably the one most frequently used in practice for global optimization. In this approach we use any iterative procedure (gradient, random search, etc.) for local optimization and run it from a number of different starting points x_{0j}, j = 1, ..., N. The set of all terminating points hopefully includes the global maximum x*. The multistart algorithm is as follows.

Algorithm G1-3
1 Generate X_{01}, ..., X_{0N} from any p.d.f. f_X(x) > 0, x ∈ D (usually X_0 is chosen to be uniformly distributed over D).
2 Consider X_{01}, ..., X_{0N} as the starting points; then apply N times a local optimization algorithm (gradient, random search, etc.) and find the local extrema x_1*, ..., x_N* of g(x) associated with X_{01}, ..., X_{0N}.
3 Estimate x* by the best of the points x_1*, ..., x_N*, that is, the one with the largest value of g.
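A minimal multistart sketch, with a crude random-search hill climber standing in for the local routine; the objective and all parameters are illustrative assumptions.

```python
import math, random

def local_maximize(g, x0, step=0.05, iters=300):
    """Crude local random-search hill climber (a stand-in for any local
    routine: gradient ascent, stochastic approximation, etc.)."""
    x, gx = x0, g(x0)
    for _ in range(iters):
        y = x + random.uniform(-step, step)
        gy = g(y)
        if gy > gx:
            x, gx = y, gy
    return x, gx

def multistart(g, a, b, N):
    """Algorithm G1-3: run the local optimizer from N uniform starting
    points and keep the best terminal point."""
    results = [local_maximize(g, random.uniform(a, b)) for _ in range(N)]
    return max(results, key=lambda r: r[1])

# Illustrative multiextremal objective with global maximum g = 1 at x = 0.
random.seed(2)
g = lambda x: math.cos(3.0 * x) - 0.1 * x * x
x_star, g_star = multistart(g, -3.0, 3.0, N=20)
```

The method succeeds whenever at least one starting point lands in the region of attraction of the global maximum, which motivates the probability computed next.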
Let us define D_j as the set of starting points X_0 from which the algorithm will converge to the j-th local maximum. We call D_j the region of attraction of the j-th local maximum. Let us assume that the number of local maxima is finite, and let X_0 be uniformly distributed over D; then the probability
of at least one X_{0j}, from a sequence of N points drawn at random over D, falling in the region of attraction D* of the global maximum equals

P = 1 − [1 − m(D*)/m(D)]^N, (7.4.7)

where m(·) is the measure of D. A more sophisticated approach to the global optimization problem was suggested by Chichinadze [5], who introduced a probability function P(v), defined as the probability of g(x) < v; that is, if m(Y) is the measure of the level set

Y = {x : g(x) < v},

then

P(v) = m(Y)/m(D). (7.4.8)

The function P(v) is, of course, not available, but if we calculate g(x) at N points distributed at random over D and count the number M of these points for which g(x) < v, then M/N approximates P(v). It is not difficult to see that the global maximum corresponds to P(v) = 1 and the global minimum to P(v) = 0. To find the solution of P(v) = 1, Chichinadze suggested approximating P(v) by a linear combination of a set of given polynomial functions P_i(v), i = 1, ..., k,

P(v) = Σ_{i=1}^k λ_i P_i(v). (7.4.9)

The range of v was divided at the points v_j, j = 1, ..., s, and the optimal values of λ_i were determined by minimizing

Σ_{j=1}^s [M_j/N − Σ_{i=1}^k λ_i P_i(v_j)]², (7.4.10)

where M_j is the number of points for which g(x) < v_j, j = 1, ..., s. The root v* of P(v) = 1 was then determined to obtain an estimate of the global maximum of g(x).
Considerable attention has been paid in multiextremal optimization to random search algorithms. Gaviano [12] showed that if

x_{i+1} = x_i + a_i ξ_i (7.4.11)

and

a_i = arg global max_{a} g(x_i + a ξ_i), (7.4.12)
then

lim_{i→∞} P{|g(x_i) − g(x*)| < ε} = 1 (7.4.13)

for every ε > 0. Here ξ is a vector uniformly distributed on the surface of a unit n-dimensional sphere. If D is a finite space and if a bound on the first derivative of g(x) is known, then Evtushenko's [11] or Shubert's [37] one-dimensional global optimization techniques could be used to find the optimal a_i. However, for a general function, a global optimization along the lines of (7.4.12) is difficult to perform.
Matyas [22] proved the convergence to x* of the following random search algorithm.

Algorithm G1-4
1 Generate Y_1, Y_2, ... from an n-dimensional normal distribution with zero mean and covariance matrix Σ, that is, Y ~ N(0, Σ).
2 Select an initial point x_1 ∈ D.
3 Compute g(x_1).
4 i ← 1.
5 If x_i + Y_i ∈ D, go to step 8.
6 x_{i+1} ← x_i.
7 Go to step 10.
8 Compute g(x_i + Y_i).
9 x_{i+1} ← x_i + Y_i if g(x_i + Y_i) ≥ g(x_i) − ε, where ε > 0; x_{i+1} ← x_i otherwise.
10 i ← i + 1.
11 Go to step 5.
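The steps above can be transcribed directly; the objective, domain, step covariance, and ε below are illustrative assumptions.

```python
import random

def matyas_search(g, x0, in_D, sigma=0.3, eps=1e-3, iters=800):
    """Algorithm G1-4 (Matyas): take a Gaussian step Y_i; accept x_i + Y_i
    only if it stays in D (steps 5-7) and does not decrease g by more than
    eps (step 9)."""
    x = list(x0)
    gx = g(x)
    for _ in range(iters):
        y = [c + random.gauss(0.0, sigma) for c in x]
        if not in_D(y):
            continue                      # steps 6-7: x_{i+1} = x_i
        gy = g(y)
        if gy >= gx - eps:                # step 9: near-monotone acceptance
            x, gx = y, gy
    return x, gx

# Illustration: maximize g = -(x^2 + y^2) on the box D = [-2, 2]^2,
# starting far from the maximizer (0, 0).
random.seed(3)
g = lambda p: -(p[0] ** 2 + p[1] ** 2)
in_D = lambda p: all(-2.0 <= c <= 2.0 for c in p)
x_best, g_best = matyas_search(g, [1.5, -1.5], in_D)
```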
According to this algorithm, a step is made from the point x_i in the direction Y_i only if x_i + Y_i ∈ D and g(x_i + Y_i) ≥ g(x_i) − ε.
The following procedure, based on cluster analysis, was introduced into global optimization by Becker and Lago [3].

Algorithm G1-5
1 Select N points uniformly distributed in D.
2 Take N_1 < N of these points with the greatest function values.
3 Apply a cluster analysis to these N_1 points, grouping them into discrete clusters; then find the boundaries of each cluster and define a new domain D_1 ⊂ D, which hopefully contains the global maximum.
4 Replace D by D_1 and perform steps 1 through 3 several times.
This is a heuristic algorithm, and its ability to find the global maximum depends on the cluster analysis technique used in step 3 and on the parameters N and N_1. There exists a positive probability of missing the global maximum. However, in practice this technique is widely used for global optimization. More on cluster analysis for global optimization can be found in Gomulka [15], Price [27], and Törn [40].

(b) Stochastic Optimization Problem. Consider the stochastic optimization problem (7.1.16), assuming that

g(x, W) = g(x) + W, (7.4.14)

which means that g(x) is measured with some error W. The following Monte Carlo algorithm, which is similar to Algorithm G1-2, can be used for estimating g* in (7.1.16).

Algorithm G1-2'
1 Generate X_1, ..., X_N from any probability density function (p.d.f.) f_X(x) (f_X(x) > 0, x ∈ D).
2 Find Y_k = g(X_k, W_k) = g(X_k) + W_k, k = 1, ..., N.
3 Estimate g* by M_N = max(Y_1, ..., Y_N).
Let the W_k be i.i.d. r.v.'s with a given c.d.f. H. We also assume that the W_k and the X_k are independent and that W_+ = inf{u : H(u) = 1} ≤ ∞. The following proposition is proven in Ref. 36.
Proposition 7.4.2. Under the conditions of Proposition 7.4.1,

lim_{N→∞} M_N = g* + W_+  a.s. (7.4.15)

Proof. Let E_N = max_{1≤i≤N} W_i. We say that {E_N} is stable if there exists a sequence of constants {q_N} such that for all δ > 0

lim_{N→∞} P{|E_N − q_N| > δ} = 0. (7.4.16)
We consider three cases.
1 W_+ < ∞, in which case our estimate for g* is M_N − W_+, and we certainly have

lim_{N→∞} (M_N − W_+) = g*  a.s. (7.4.17)

2 W_+ = ∞, but {E_N} is stable, in which case (7.4.16) implies

lim_{N→∞} (M_N − q_N) = g*  in probability, (7.4.18)

and q_N is determined by H(q_N) = (N − 1)/N. A necessary and sufficient condition for case 2 is [14]

lim_{u→∞} [1 − H(u + δ)]/[1 − H(u)] = 0,  ∀δ > 0. (7.4.19)

We thus see that, if W_+ and q_N are known, we still have convergent algorithms in (7.4.17) and (7.4.18).
3 W_+ = ∞, but {E_N} is not stable. Here we have by (7.4.15) M_N → ∞ a.s. Q.E.D.
The following examples will demonstrate these ideas.
1 If the W_i are normally distributed with mean 0 and variance σ², then (7.4.19) holds and {E_N} is stable with q_N = σ(2 log N)^{1/2}.
2 Suppose that the W_i have the generalized double exponential distribution, that is, a density proportional to exp(−|w|^α), α > 0. Then by (7.4.19) {E_N} is not stable for α ≤ 1, but is stable for α > 1 with q_N = (log(N/2))^{1/α}.
Algorithm G1-3 can also be adapted for the stochastic optimization problem (7.1.16) by rewriting step 2 as follows:
2 Consider X_{01}, ..., X_{0N} as the starting points; then apply N times a local iterative procedure (stochastic approximation, random search, etc.) that is able to find the associated local extrema x_1*, ..., x_N* of E[g(x, W)] = g(x).
(c) Constrained Optimization. Consider the following constrained optimization problem: find the global extremum of

g_0(x) (7.4.20)

subject to

g_k(x) ≤ 0,  k = 1, ..., m. (7.4.21)

We assume that the convex programming methods (see Avriel [2]) cannot be applied because the convexity assumptions do not hold either for the region D = {x : g_k(x) ≤ 0, k = 1, ..., m} or for the function g_0(x).
Let us consider two cases.
1 If the region D = {x : g_k(x) ≤ 0, k = 1, ..., m} is known, and we can readily generate r.v.'s on D, then Algorithms G1-2 through G1-5 can be directly applied for finding the global extremum of (7.4.20) and (7.4.21).
2 If the region D = {x : g_k(x) ≤ 0, k = 1, ..., m} is either unknown explicitly or is complex, but another region D_0 that contains D and has a simple shape is known, then we generate r.v.'s on D_0 and accept or reject them according to whether X ∈ D or X ∈ (D_0 − D). Next we can again apply Algorithms G1-2 through G1-5.
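Case 2 is just acceptance-rejection; the enclosing box, constraint, and objective below are illustrative assumptions.

```python
import random

def sample_in_D(in_D, sample_D0, max_tries=100000):
    """Draw from the simple enclosing region D0 and accept only points
    satisfying the constraints (X in D); reject points in D0 - D."""
    for _ in range(max_tries):
        x = sample_D0()
        if in_D(x):
            return x
    raise RuntimeError("no point of D found; is m(D) negligible in D0?")

# Illustration: D is the unit disk {x^2 + y^2 <= 1} enclosed in the box
# D0 = [-1, 1]^2; the accepted points feed Algorithm G1-2 for g = x + y,
# whose maximum over D is sqrt(2) ~= 1.414 on the boundary.
random.seed(4)
sample_D0 = lambda: (random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0))
in_D = lambda p: p[0] ** 2 + p[1] ** 2 <= 1.0
g = lambda p: p[0] + p[1]
M_N = max(g(sample_in_D(in_D, sample_D0)) for _ in range(3000))
```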
7.5 A CLOSED FORM SOLUTION FOR GLOBAL OPTIMIZATION
This section is based on the results of Meerkov [23] and Pincus [25]. Both papers deal with multiextremal optimization and use the classical Laplace formula for certain integrals. We follow Pincus [25]. Consider the optimization problem

min_{x∈D⊂R^n} g(x) = g(x*) = g*,

where g(x) is a continuous function, D is a closed bounded domain, and x* is the unique optimum point. Pincus [25] proved the following theorem.
Theorem 7.5.1. Let g(x) = g(x_1, ..., x_n) be a real-valued continuous function over a closed bounded domain D ⊂ R^n. Further, assume there is a unique point x* ∈ D at which min_{x∈D} g(x) is attained (there are no restrictions on relative minima). Then the coordinates x_i* of the minimization point are given by

x_i* = lim_{λ→∞} [∫_D x_i exp(−λg(x)) dx] / [∫_D exp(−λg(x)) dx],  i = 1, ..., n. (7.5.1)
In particular the theorem is valid when D is convex and the objective function g is strictly convex. The proof of the theorem is based on the Laplace formula, which for sufficiently large λ can be written as

∫_D exp(−λg(x)) dx ≈ c(λ) exp(−λg(x*)) (7.5.2)

or, equivalently,

lim_{λ→∞} [∫_D exp(−λg(x)) dx]^{1/λ} = exp(−g(x*)). (7.5.3)
We now outline a Monte Carlo method, based on the work of Metropolis et al. [24] (see also [26]), for evaluating the coordinates of the minimization point x* = (x_1*, ..., x_n*), that is, for approximating the ratio appearing on the right-hand side of (7.5.1). For fixed λ, (7.5.1) can be written as

x_i*(λ) = [∫_D x_i exp(−λg(x)) dx] / [∫_D exp(−λg(x)) dx]. (7.5.4)

For large λ the major contribution to the integrals appearing in (7.5.1) comes from a small neighborhood of the minimizing point x*. Metropolis' sampling procedure [24], described below, is based on simulating a Markov chain that spends, in the long run, most of the time visiting states near the minimizing point and is more efficient than a direct Monte Carlo, which estimates the numerator and the denominator separately. The idea of the method is to generate samples with density
f_λ(x) = exp(−λg(x)) / ∫_D exp(−λg(x)) dx, (7.5.5)

where the denominator of (7.5.5) is not known. This is done as follows. Partition the region D into a finite number N of mutually disjoint subregions D_j and replace integrals over D by corresponding Riemann sums using the partition {D_j}. Fix a point y^j = (y_1^j, ..., y_n^j) ∈ D_j. Then construct an irreducible ergodic Markov chain {X_k} with state space {y^1, ..., y^N} and with transition probabilities p_{ji}, 1 ≤ i, j ≤ N, satisfying π_i = Σ_j π_j p_{ji}, i = 1, ..., N, where π_j = exp(−λg(y^j)) / Σ_{h=1}^N exp(−λg(y^h)); that is, {π_j} is the invariant distribution for the Markov chain. It should be noted that, in the last expression for π_j, we have assumed for simplicity that all subregions D_j have equal volumes. Then, using the strong law of large numbers for Markov chains, we have with probability 1

lim_{m→∞} m^{-1} Σ_{k=1}^m X_{k,i} = Σ_{j=1}^N y_i^j π_j ≈ x_i*(λ). (7.5.6)
The sampling error for each component X_{k,i} of the vector X_k is (see [26])

E[(m^{-1} Σ_{k=1}^m X_{k,i} − μ_i)²] ≤ c/m,

where c is a positive number and μ_i = Σ_{j=1}^N y_i^j π_j. From Chebyshev's inequality we have

P{|m^{-1} Σ_{k=1}^m X_{k,i} − μ_i| > ε} ≤ c/(mε²).

We now turn to the question of how Metropolis constructs a Markov chain with the required invariant distribution. He starts with a symmetric transition probability matrix P* = (p*_{ij}), 1 ≤ i, j ≤ N, that is, p*_{ij} = p*_{ji}, p*_{ij} > 0, Σ_{j=1}^N p*_{ij} = 1, and the known ratios π_i/π_j, and defines the transition matrix of the Markov chain {X_k} as follows:
p_{ij} = p*_{ij} min(1, π_j/π_i),  i ≠ j,
p_{ii} = p*_{ii} + Σ_{j≠i} p*_{ij}[1 − min(1, π_j/π_i)]. (7.5.7)

It is shown in Ref. 16 that a Markov chain with the above transition matrix has the invariant distribution {π_i}, that is, π_i = Σ_j π_j p_{ji}. A chain with such a transition matrix can be realized as follows. Given that the chain is in state y^i at time k, that is, {X_k = y^i}, the state at time k + 1 is determined by choosing a new state according to the distribution {p*_{ij}, j = 1, ..., N}. If the state chosen is y^j, we calculate the ratio π_j/π_i. If π_j/π_i ≥ 1, we accept y^j as the new state at time k + 1; if π_j/π_i < 1, we take y^j as the state of the Markov chain at time k + 1 with probability π_j/π_i, and y^i as the new state at time k + 1 with probability 1 − π_j/π_i. It is also shown in Ref. 16 that this procedure leads to a Markov chain with transition matrix P = (p_{ij}).
It should be noted that (7.5.1) can be useful not only for finding the global optimum in a multiextremal problem, but also for solving nonlinear equations (see [20]) and some kinds of problems in statistical mechanics as well (see [16]).
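The whole construction can be sketched on a one-dimensional grid: invariant probabilities π_j ∝ exp(−λg(y^j)), a symmetric uniform proposal, Metropolis acceptance, and the ergodic average (7.5.6). The grid, λ, and the multiextremal test function are illustrative assumptions.

```python
import math, random

random.seed(5)
g = lambda x: (x - 0.7) ** 2 + 0.1 * math.cos(25.0 * x)  # unique global min near 0.63
grid = [j / 1000.0 for j in range(1001)]                 # points y^j on D = [0, 1]
lam = 200.0

def metropolis_mean(m):
    i = 0                                   # start the chain at y^0
    total = 0.0
    for _ in range(m):
        j = random.randrange(len(grid))     # symmetric proposal p*_ij = 1/N
        ratio = math.exp(-lam * (g(grid[j]) - g(grid[i])))  # pi_j / pi_i
        if ratio >= 1.0 or random.random() < ratio:
            i = j                           # accept y^j with prob min(1, pi_j/pi_i)
        total += grid[i]
    return total / m                        # ergodic average -> sum_j y^j pi_j

x_hat = metropolis_mean(100000)             # approximates x*(lambda) in (7.5.4)
```

Only the ratios π_j/π_i are ever needed, so the unknown normalizing denominator of (7.5.5) cancels, which is the point of the method.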
7.6 OPTIMIZATION BY SMOOTHED FUNCTIONALS
Consider the following stochastic optimization problem (see (7.1.16)):

min_{x∈D⊂R^n} E_W[φ(x, W)] = min_{x∈D⊂R^n} g(x) = g(x*), (7.1.16')

where φ(x, W) is a stochastic function with unknown p.d.f. p(w), D is a convex bounded domain, and x* is the unique optimal point. We also assume that g(x) is bounded for each x ∈ D and var_W[φ(x, W)] < ∞. For solving this problem let us introduce the following convolution function:

ĝ(x, β) = ∫_{−∞}^{∞} h̃(v, β) g(x − v) dv = ∫_{−∞}^{∞} h̃(x − v, β) g(v) dv, (7.6.1)

which is called a smoothed functional [18]. In order for ĝ(x, β) to have nice smoothing properties, let us make some assumptions about the kernel h̃(v, β):
1 h̃(v, β) = (1/β^n) h(v/β) = (1/β^n) h(v_1/β, ..., v_n/β) is a piecewise differentiable function with respect to v.
2 lim_{β→0} h̃(v, β) = δ(v), where δ(v) is Dirac's delta function.
3 lim_{β→0} ĝ(x, β) = g(x), if x is a point of continuity of g(x).
4 h̃(v, β) is a p.d.f., that is, ĝ(x, β) = E_V[g(x − V)].
We assume that the original function g(x) is not "well behaved." For instance, it can be a multiextremal function or have a fluctuating character (see Fig. 7.6.1). We expect "better behavior" from the smoothed function ĝ(x, β) than from the original one. The idea of smoothed functionals is as follows: for a given function g(x), construct a smoothed function ĝ(x, β) and, operating only with ĝ(x, β), find the extremum of g(x). In other words, while operating only with
Fig. 7.6.1 A badly behaved function.
ĝ(x, β), we want to avoid all fluctuations and local extrema of g(x) and find x*. It is obvious that the effect of smoothing depends on the parameter β: for large β the effect of smoothing is large, and vice versa. When β → 0, it follows from condition 2 that ĝ(x, β) → g(x) and that there is no smoothing. It is intuitively clear that, to avoid fluctuations and local extrema, β has to be sufficiently large at the start of the optimization. However, on approaching the optimum, we can reduce the effect of smoothing by letting β vanish, since at the extremum point x* we want coincidence of both extrema, that of g(x) and that of ĝ(x, β). Accordingly, we speak of a set of smoothed functions ĝ(x, β_s), s = 1, 2, ..., while constructing an iterative procedure for finding x*. Before describing the iterative procedure for solving problem (7.1.16'), we derive some attractive properties of ĝ(x, β).
PROPERTY 1. If g(x) is convex, then ĝ(x, β) is also convex.

The proof of this property is straightforward. For 0 ≤ λ ≤ 1,

λĝ(x, β) + (1 − λ)ĝ(y, β) − ĝ(λx + (1 − λ)y, β)
  = ∫ h̃(v, β)[λg(x − v) + (1 − λ)g(y − v) − g(λx + (1 − λ)y − v)] dv. (7.6.2)

The convexity of g(x) implies

g(λx + (1 − λ)y − v) = g(λ(x − v) + (1 − λ)(y − v)) ≤ λg(x − v) + (1 − λ)g(y − v). (7.6.3)

Substituting (7.6.3) in (7.6.2) and taking into account that h̃(v, β) ≥ 0, we obtain the proof immediately.
PROPERTY 2. It is readily seen that the gradient of the smoothed function ĝ(x, β) may be expressed as

ĝ_x(x, β) = ∫_{−∞}^{∞} h̃_x(x − v, β) g(v) dv = ∫_{−∞}^{∞} h̃_v(v, β) g(x − v) dv (7.6.4)

and is called a smoothed gradient. Using the right-hand side of (7.6.4), together with condition 1, we obtain

ĝ_x(x, β) = (1/β) ∫_{−∞}^{∞} h_v(v) g(x − βv) dv, (7.6.5)

where

h_v(v) = (∂h(v)/∂v_1, ..., ∂h(v)/∂v_n) (7.6.6)

is the gradient of h(v) and ∂h(v)/∂v_k, k = 1, ..., n, are the partial derivatives. It is important to note that, to find a gradient of the smoothed function ĝ(x, β), we do not need to know the gradient of g(x), which sometimes
does not exist at all. We consider also the following smoothed function:

g̃(x, β) = ∫_{−∞}^{∞} h̃(v, β)[g(x + v) + g(x − v)] dv. (7.6.7)

By analogy with (7.6.4) and (7.6.5) we can obtain the smoothed gradient for g̃(x, β):

g̃_x(x, β) = (1/β) ∫_{−∞}^{∞} h_v(v)[g(x − βv) − g(x + βv)] dv. (7.6.8)
Now we give two examples of kernels h̃(v, β) that satisfy conditions 1 through 4, and find their smoothed gradients according to (7.6.8).

Example 1. Let h(v) be the n-dimensional standard multinormal density,

h(v) = (2π)^{−n/2} exp(−||v||²/2). (7.6.9)

Then the smoothed gradient of g(x) is

g̃_x(x, β) = (1/β) ∫_{−∞}^{∞} v h(v)[g(x + βv) − g(x − βv)] dv. (7.6.10)

Example 2. Let h(v) be the uniform density on the surface of the unit n-dimensional sphere, (7.6.11)

that is, let the random vector V be uniformly distributed over the surface of the unit sphere. The smoothed gradient equals

g̃_x(x, β) = (1/β) ∫_{||v||=1} v h(v)[g(x + βv) − g(x − βv)] dv. (7.6.12)
Having g̃_x(x, β) at our disposal, we can construct, for instance, an iterative gradient algorithm

x_{i+1} = π(x_i − a g̃_x(x_i, β_i)),  a > 0, (7.6.13)

and find the conditions under which x_i converges to x* in the deterministic optimization problem min_{x∈D⊂R^n} g(x) = g(x*), which is a particular case of (7.1.16'), with p(w) being a Dirac δ function. Here π(·) denotes the projection operator on D (i.e., for every x ∈ R^n, π(x) ∈ D and ||x − π(x)|| = min_{y∈D} ||x − y||), and a is a step parameter. Since g(x) is not a "well behaved" function, the multiple integrals g̃_x(x, β) and g̃(x, β) are usually not available in explicit form, and numerical methods have to be used. One of them is, as we know, the Monte Carlo method. For instance, an estimator of g̃_x(x, β) can be found by the sample-mean Monte Carlo method (see Section 4.2.2):
ξ(x, β) = (1/(Nβ)) Σ_{j=1}^N [h_v(V_j)/f(V_j)][g(x − βV_j) − g(x + βV_j)], (7.6.14)

which is called a parametrical statistical gradient (PSG) [18]. Here f(v) is a p.d.f. from which a sample of length N is taken. Assuming f(v) = h(v), we obtain the PSG in Examples 1 and 2, respectively, as

ξ(x, β) = (1/(Nβ)) Σ_{j=1}^N V_j[g(x + βV_j) − g(x − βV_j)] (7.6.15)

and

ξ(x, β) = (1/(Nβ)) Σ_{j=1}^N V_j[g(x + βV_j) − g(x − βV_j)], (7.6.16)

where the r.v.'s V_j in (7.6.15) and (7.6.16) are generated from (7.6.9) and (7.6.11), respectively.
By analogy with (7.6.7) the smoothed gradient of φ(x, W) is

φ̃_x(x, β, W) = ∫_{−∞}^{∞} h̃_v(v, β)[φ(x − v, W_1) − φ(x + v, W_2)] dv
  = (1/β) ∫_{−∞}^{∞} h_v(v)[φ(x − βv, W_1) − φ(x + βv, W_2)] dv, (7.6.17)
and by analogy with (7.6.14) the sample-mean Monte Carlo estimator for the smoothed gradient of φ(x, W) is

ξ(x, β) = (1/(Nβ)) Σ_{j=1}^N [h_v(V_j)/f(V_j)][φ(x − βV_j, W_{1j}) − φ(x + βV_j, W_{2j})]. (7.6.18)

Assuming f(v) = h(v), by analogy with (7.6.15) and (7.6.16) we have the PSG for Examples 1 and 2, respectively:

ξ(x, β) = (1/(Nβ)) Σ_{j=1}^N V_j[φ(x + βV_j, W_{1j}) − φ(x − βV_j, W_{2j})], (7.6.19)

with the V_j generated from (7.6.9), and the same expression with the V_j generated from (7.6.11). (7.6.20)

From (7.6.18) through (7.6.20) it follows that the estimator ξ(x, β) of the smoothed gradient φ̃_x(x, β, W) is constructed on the basis of observations of φ(x, W) alone. Both the "artificial" random variable V and the "natural" random variable W are averaged in these equations. Table 7.6.1 presents some smoothed gradients and their estimators. Assuming that the r.v.'s V and W are mutually independent and taking the expectation of ξ(x, β) with respect to W and V, we obtain

E[ξ(x, β)] = g̃_x(x, β). (7.6.21)
That is, the PSG ξ(x, β) is an unbiased estimator of the smoothed gradient g̃_x(x, β). Assuming also the independence of the W_j's, j = 1, ..., N, we obtain for the variance of the sth component of ξ(x, β)

var[ξ_s(x, β)] ≤ σ_s²(x) N^{-1} β^{-2},  s = 1, ..., n, (7.6.23)

where σ_s²(x) is the variance of h_{v,s}(V)[φ(x − βV, W_1) − φ(x + βV, W_2)]. (7.6.24)
[Table 7.6.1 Smoothed gradients and their estimators: stochastic approximation [13] and random search [28] algorithms; formula (7.6.19).]
It is clear that

E⟨ξ(x, β), ξ(x, β)⟩ = Σ_{s=1}^n var[ξ_s(x, β)] + ⟨E[ξ(x, β)], E[ξ(x, β)]⟩
  ≤ σ² N^{-1} β^{-2} + ⟨g̃_x(x, β), g̃_x(x, β)⟩. (7.6.25)

Here h_{v,s}(V) is the sth coordinate of the vector h_v(V), ⟨·,·⟩ denotes the scalar product, and

σ² = n² max_s sup_{x∈D} σ_s²(x). (7.6.26)
Note that n² appears in (7.6.26) rather than n because of the covariance terms. Taking into account that g(x) is bounded for all x ∈ D and var_W[φ(x, W)] < ∞, we can readily conclude that σ_s²(x) < ∞ for all x ∈ D and therefore σ² < ∞. Now problem (7.1.16') can be solved by the following algorithm:

x_{i+1} = π(x_i − a ξ(x_i, β_i)). (7.6.27)

Theorem 7.6.1. Assume that the iterative process is constructed in accordance with (7.6.27) and that for every x ∈ D and for every i the following conditions are satisfied:

⟨(x − x*), g̃_x(x, β_i)⟩ ≥ K_1||x − x*||² − γ_i, (7.6.28)
⟨g̃_x(x, β_i), g̃_x(x, β_i)⟩ ≤ K_2||x − x*||², (7.6.29)
0 < a < 2K_1 K_2^{-1}, (7.6.30)
lim_{i→∞} β_i^{-2} N_i^{-1} = 0, (7.6.31)
lim_{i→∞} γ_i = 0, (7.6.32)
E||x_1||² < ∞, (7.6.33)

where ||x|| = (Σ_{s=1}^n x_s²)^{1/2} is the norm of x, and K_1 and K_2 are positive constants. Then process (7.6.27) converges in mean square to the point x*, that is, lim_{i→∞} E||x_i − x*||² = 0. If we replace condition (7.6.31) by

Σ_{i=1}^∞ β_i^{-2} N_i^{-1} < ∞ (7.6.34)

and condition (7.6.32) by

Σ_{i=1}^∞ γ_i < ∞, (7.6.35)
then process (7.6.27) converges with probability 1 to x*, that is, lim_{i→∞} ||x_i − x*|| = 0 a.s.
Proof. Without loss of generality we can set x* = 0. Taking the conditional expectation of ||x_{i+1}||², given x_1, ..., x_i, we obtain from (7.6.27)

E(||x_{i+1}||² | x_1, ..., x_i) ≤ ||x_i||² − 2a⟨x_i, E[ξ(x_i, β_i)]⟩ + a² E⟨ξ(x_i, β_i), ξ(x_i, β_i)⟩. (7.6.36)

Substituting E[ξ(x, β)] = g̃_x(x, β) in (7.6.36), we obtain

E(||x_{i+1}||² | x_1, ..., x_i) ≤ ||x_i||² − 2a⟨x_i, g̃_x(x_i, β_i)⟩ + a² E⟨ξ(x_i, β_i), ξ(x_i, β_i)⟩. (7.6.37)

Now taking (7.6.26) through (7.6.29) into account, we obtain

E(||x_{i+1}||² | x_1, ..., x_i) ≤ ||x_i||² − 2aK_1||x_i||² + 2aγ_i + a²β_i^{-2}N_i^{-1}σ² + a²K_2||x_i||²
  = (1 − 2aK_1 + a²K_2)||x_i||² + a²N_i^{-1}β_i^{-2}σ² + 2aγ_i. (7.6.38)

Taking the expectation of both sides of the last inequality and iterating, we obtain

E||x_{i+1}||² ≤ (1 − 2aK_1 + a²K_2)^i E||x_1||² + Σ_{s=1}^i (a²β_s^{-2}N_s^{-1}σ² + 2aγ_s)(1 − 2aK_1 + a²K_2)^{i−s}. (7.6.39)

It follows from (7.6.30) that 1 − 2aK_1 + a²K_2 < 1; therefore (7.6.39) can be rewritten as

E||x_{i+1}||² ≤ K_3^i E||x_1||² + Σ_{s=1}^i (a²β_s^{-2}N_s^{-1}σ² + 2aγ_s)K_3^{i−s}, (7.6.40)

where

K_3 = 1 − 2aK_1 + a²K_2. (7.6.41)

The first term in (7.6.40) converges to 0 as i → ∞, since K_3 < 1 and E||x_1||² < ∞ (see (7.6.33)). Thus the theorem will be proven if we prove that

lim_{i→∞} Σ_{s=1}^i (a²β_s^{-2}N_s^{-1}σ² + 2aγ_s)K_3^{i−s} = 0.

To prove this we assume that for any number ε we have chosen a number T such that, for all s > T, a²β_s^{-2}N_s^{-1}σ² + 2aγ_s is less than ε. Then

Σ_{s=1}^i (a²β_s^{-2}N_s^{-1}σ² + 2aγ_s)K_3^{i−s} ≤ K_3^{i−T} Σ_{s=1}^T (a²σ²β_s^{-2}N_s^{-1} + 2aγ_s)K_3^{T−s} + ε Σ_{s=T+1}^i K_3^{i−s}. (7.6.42)

In view of the fact that T is finite, the first term in (7.6.42) tends to zero as i → ∞, since K_3 < 1. Using the formula for a geometric progression, we obtain for the second term

lim_{i→∞} ε (1 − K_3^{i−T})/(1 − K_3) = ε/(1 − K_3).

Since ε may be any positive number, we have lim_{i→∞} E||x_i||² = 0. This completes the proof of the first part of the theorem.
To prove the convergence of (7.6.27) with probability 1, it is sufficient to show that Σ_{i=1}^∞ E||x_i||² < ∞. Summing both sides of (7.6.40), we have by (7.6.34) and (7.6.35) that Σ_{i=1}^∞ E||x_i||² < ∞, from which the result follows. Q.E.D.
Remark 1. The theorem remains valid for the deterministic optimization problem

min_{x∈D⊂R^n} g(x) = g(x*),

which is a particular case of problem (7.1.16) when W = 0.

Remark 2. Condition (7.6.28), together with (7.6.32), allows g(x) to be nonconvex.
APPENDIX
Let Ξ be a random vector uniformly distributed over the surface of a unit n-dimensional sphere with its center at the origin, and let R be any given unit vector issuing from the origin (see Fig. 7.A.1).
The p.d.f. of the random angle φ between Ξ and R is sought. For reasons of symmetry we confine ourselves to the semisphere 0 ≤ φ ≤ π. The p.d.f. is then [28]

h_n(φ) = B_n sin^{n−2} φ,  0 ≤ φ ≤ π,

where

B_n = [∫_0^π sin^{n−2} φ dφ]^{-1}

is the normalizing constant. The expected value of the r.v. φ is E(φ) = π/2, from which it follows that, on the average, R and Ξ are orthogonal. It is readily verified that, as n increases, h_n(φ) approaches Dirac's δ function, that is,

lim_{n→∞} h_n(φ) = δ(φ − π/2).
Fig. 7.A.2 represents h_n(φ) for different n.

[Fig. 7.A.2 The density function of φ for different n.]
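These claims are easy to verify by simulation: a normalized standard Gaussian vector is uniform on the sphere, so we can measure the angle to a fixed axis and check that its mean is π/2 and that it concentrates there as n grows. Sample sizes are illustrative.

```python
import math, random

random.seed(8)

def random_angle(n):
    """Angle between R = e_1 and a uniform random direction in R^n
    (a normalized standard Gaussian vector is uniform on the sphere)."""
    v = [random.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    return math.acos(v[0] / norm)

def mean_sd(n, m=20000):
    xs = [random_angle(n) for _ in range(m)]
    mu = sum(xs) / m
    return mu, math.sqrt(sum((x - mu) ** 2 for x in xs) / m)

mu3, sd3 = mean_sd(3)        # n = 3: mean pi/2, wide spread
mu50, sd50 = mean_sd(50)     # n = 50: mean pi/2, concentrated near pi/2
```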
EXERCISES
1 Find the efficiency C_n (7.2.8) and var(cos φ) of Algorithm RS-2 analytically and of Algorithm RS-5 by simulation. For Algorithm RS-5 describe the random number generator and the flow diagram of your program.
2 Prove that for a linear function g(x) the direction of the statistical gradient in Algorithm RS-5 (see (7.1.13)) coincides, on the average, with that of the gradient of g(x).
3 By analogy with Algorithm RS-1 (see (7.1.18)), describe the nonlinear tactic Algorithm RS-2, the linear tactic Algorithm RS-3, and the statistical gradient Algorithm RS-5 for solving problem (7.1.16).
4 Prove that, if g(x) is convex in R^n and if the point x* at which g(x) attains its minimum value is unique, then ĝ(x, β) (see (7.6.1)) is strictly convex.
5 Given a linear function ⟨c, x⟩ invariant for the convolution (7.6.1), that is, ∫ h̃(x − v, β)⟨c, v⟩ dv = ⟨c, x⟩, prove that ĝ(x, β) ≥ g(x).
6 Prove (7.6.4) and (7.6.5).
7 Prove that, if h_n(φ) = B_n|sin^{n−1} φ|, 0 ≤ φ ≤ 2π, then c = cos φ is distributed according to (7.2.12).
8 Consider the following modification of Algorithm RS-1 (see (7.1.4)). Find the efficiency C_n (7.2.8) and var(cos φ), assuming that g(x) is a linear function.
REFERENCES
1 Archetti, F., A sampling technique for global optimization, in Towards Global Optimization, edited by L. C. W. Dixon and G. P. Szegö, North Holland, American Elsevier, New York, 1975.
2 Avriel, M., Nonlinear Programming: Analysis and Methods, Prentice-Hall, Englewood Cliffs, New Jersey, 1976.
3 Becker, R. W. and G. V. Lago, A global optimization algorithm, Eighth Allerton Conference on Circuits and System Theory, 1970, pp. 3-13.
4 Brooks, S. H., A discussion of random methods for seeking maxima, Oper. Res., 6, 1958, 244-251.
5 Chichinadze, V. K., Random search to determine the extremum of the function of several variables, Eng. Cybern., 1, 1967, 115-123.
6 Devroye, L. P., On the convergence of statistical search, Inst. Elec. Electron. Eng. Trans. Syst., Man, Cybern., 6, 1976, 46-56.
7 Dixon, L. C. W., Global optimization without convexity, Technical Report N85, The Hatfield Polytechnic Numerical Optimization Centre, July 1977.
8 Dvoretzky, A., On stochastic approximation, in Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, 1956, pp. 39-55.
9 Ermolyev, Yu. M., Stochastic Programming Methods, Nauka, Moscow, 1976 (in Russian).
10 Ermolyev, Yu. M., On the method of generalized stochastic gradients and quasi-Féjer sequences, Cybernetics, 5, 1969, 208-220.
11 Evtushenko, Yu. G., Numerical methods for finding global extrema (case of a non-uniform mesh), U.S.S.R. Comput. Math. Math. Phys., 11, No. 6, 1971, pp. 38-55.
12 Gaviano, M., Some general results on the convergence of random search algorithms in minimization problems, in Towards Global Optimization, edited by L. C. W. Dixon and G. P. Szegö, North Holland, American Elsevier, New York, 1975.
13 Gladyshev, E. G., On stochastic approximation, Theory Prob. Appl., 1966, No. 2, 272-275.
14 Gnedenko, B. V., Sur la distribution du terme maximum d'une série aléatoire, Ann. Math., 44, 1943, 423-453.
15 Gomulka, J., Numerical experience with Törn's clustering algorithm and two implementations of Branin's method, in Towards Global Optimization, Vol. 2, edited by L. C. W. Dixon and G. P. Szegö, North Holland, American Elsevier, New York, 1977.
16 Hammersley, J. M. and D. C. Handscomb, Monte Carlo Methods, Wiley, New York; Methuen, London, 1964.
17 Katkovnik, V. Ya., Linear Estimation and Stochastic Optimization Problems, Nauka, Moscow, 1976 (in Russian).
18 Katkovnik, V. Ya. and Yu. Kulchitsky, Convergence of a class of random search algorithms, Automat. Remote Control, 1972, No. 8, 1321-1326.
19 Kiefer, J. and J. Wolfowitz, Stochastic estimation of the maximum of a regression function, Ann. Math. Stat., 23, 1952, 462-466.
20 Kleiza, V., On the modeling of nonlinearity by the sequence of Markov chains, Lith. Math. J., XV, No. 4, 1975, 125-130.
21 Mangasarian, O. L., Nonlinear Programming, McGraw-Hill, New York, 1969.
22 Matyas, J., Random optimization, Automat. Remote Control, 26, 1965, 246-253.
23 Meerkov, S. M., Deceleration in the search for the global extremum of a function, Automat. Remote Control, 1972, No. 12, 129-139.
24 Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, Equations of state calculations by fast computing machines, J. Chem. Phys., 21, 1953, 1087-1092.
25 Pincus, M., A closed form solution of certain programming problems, Oper. Res., 16, 1968, 690-694.
26 Pincus, M., A Monte Carlo method for the approximate solution of certain types of constrained optimization problems, Oper. Res., 18, 1970, 1225-1228.
27 Price, W. L., A controlled random search procedure for global optimization, in Towards Global Optimization, Vol. 2, edited by L. C. W. Dixon and G. P. Szegö, North Holland, American Elsevier, New York, 1977.
28 Rastrigin, L. A., The Stochastic Methods of Search, Nauka, Moscow, 1968 (in Russian).
29 Rastrigin, L. A. and Y. Rubinstein, The comparison of the random search and the stochastic approximation while solving the problem of optimization, Automat. Control, 2, No. 5, 1969, 23-29.
30 Robbins, H. and S. Monro, A stochastic approximation method, Ann. Math. Stat., 22, 1951, 400-407.
REI'ERENC'ES
31
275
Rubinstein, Y.,Convergence of the random search algorithm, Auromaf. Control, 3, No. I,
1969.46-49. 32 Rubinstein, Y., Piece-wise-linear representation of function in situation of noise, Automat. Contro!, 2, No. 5, 1968, 36-42. 33 Rubinatein, Y.,choice of the optimal search strategy. J . Opfimizar. 'lirteoryAppl., IS,No. 3. March 1976, 309-317. 34 Rubinstein, Y. and J. Har-El. Optimal performane of learning automata in switched random environments, Imr. Elec. Electron. Eng. Tram. Sy$t., Man, Cyber.. SMC-7, 1977, 674- 678.
35 Rubinstein. Y.and A. Karnovsky, Local and integral properties of a search algorithm of the stochastic approximation type. Sfmhnstic Processes Appl., 6, 1978, 129- 134. 36 Rubinstein. Y.and 1. Weissman, The Monte-Carlo method for global optimization, Cah. Cen. Etud. Rech. Qper., 21, No. 2, 1979, 143-419. 37 Shubert, 6.O., A sequential method for searching the global maximum of a function. Six. Indust. Appl. Math. J . Numer. AMI., 1972. No.9,379-388. 38 Suitti, C., Cnnvergence proof of minimization algorithms for nonconvex functions, JOTA. 23, 1977,203-210. 39 Strongn, R. G., Simple search algonthm for global extremum of function of several variables and its use in functions approximation problem, Rndiofizjku, 7, No. 15, 1972, 1077- 1085. 40 Tom, A., A search clustering approach to the global optimization problem, in Towwdr G t W Optimixalion. Vol. 2, edited by I-. C . W.Dixon and G. P. Szcgo, North HoUand, American Elsevier, New York. 1977. 41 Van Ryrin, J., On strong consistency of density estimations, Ann. Muth. Bar., 40, 1%9, 1765.- 1772. 42 Venter, H. J., On estimation of the mode. Ann. Math. Stat., 38. 1%7, 1446- 1455. 43 Wasan, M.T.. Stocliartic Approximafion. Camhridge University Pnsu. New York, 1%9. 44 Wilde. D. J., Optimum Seeking Merho&, PrentiLmHall, Englewood Cliffs, New Jersey, 1964.
Yakowitz, S. J. and L. Fisher, On sequential search for the maximum of unknown function. J. Math. A w l . Appl., 41. 1973, 234-259. 46 Zielinski, R., A Monte Carlo estimation of the maximum of a function. Algorithm, MI, NO. 13, 1970, 5-7. 45
Index

Acceptance-rejection method, 45
Ahrens, J., 35, 70, 75
Antithetic variates, 135, 149, 151
Avriel, M., 234, 235, 273
Biased estimator, 145
Cheng, R. C. H., 73, 75, 112
Çinlar, E., 182, 231
Closed queueing network, 197
Composition method, 43
Conditional Monte Carlo, 141
Confidence interval, 187, 188
Congruential generators, 21, 22
Constrained optimization problem, 208, 259
Control variates, 126, 150, 214
Correlated sampling, 124
Crane, M. A., 183, 231
Devroye, L. P., 273
Dieter, U., 70, 111
Dirichlet problem, 179
Dixon, L. C. W., 252, 273
Dvoretzky, A., 240, 273
Efficiency of Monte Carlo method, 119
Eigenvalue problem, 178
Ergodic Markov chains, 160
Ermakov, S. M., 17, 143, 156
Estimates: interval, 187; point, 187
Estimators: Beale, 189; Fieller, 189; jackknife, 189; Tin, 190
Fishman, G. S., 8, 17, 73, 86, 112
Forsythe, G., 56, 70, 158
Gaver, D. P., 157
Generalization of von Neumann's method, 51
Generation: beta, 80; binomial, 101; Cauchy, 91; chi-square, 93; discrete uniform, 95; Erlang, 71; exponential, 67; extreme value, 107; gamma, 71; geometric, 104; hypergeometric, 106; logistic, 107; lognormal, 91; multinormal, 65; negative binomial, 104; normal, 86; Poisson, 102; Student, 94; Weibull, 92
Global maximum, 234
Global optimization, 234, 252
Halton, J. H., 156
Hammersley, J. M., 18, 141, 156
Handscomb, D. C., 141, 156
Heidelberger, P., 192, 218, 220
Iglehart, D. L., 183, 199, 231
Importance sampling, 122
Integral equations, 173
Inventory model, 230
Inverse matrix, 168
Inverse transform method, 39
Jöhnk, M. D., 72, 81, 82
Katkovnik, V. Ya., 236, 274
Kiefer, J., 240, 274
Kleijnen, J. P. C., 18
Knuth, D., 31, 232
Lavenberg, S. S., 126, 156, 232
Law, A. M., 233
Lewis, P. A. W., 18, 233
Linear equations, 158
Local extrema, 234
Markov chain, 160, 185
Marsaglia, G., 36, 70, 112
Marshall, A. W., 156
Michailov, S. A., 18, 81, 112, 156
Mitchell, B., 156, 233
Monte Carlo integration, 115
Monte Carlo methods, 6, 11, 12
Monte Carlo optimization, 234
Multiextremal function, 236
Multiplicative generator, 23
Naylor, 6, 8, 9, 18, 36
Networks, 150
Neuts, M., 112, 156
Page, E. S., 36, 150, 157
Pincus, M., 260, 274
Queueing: GI/G/1, 193; M/M/1, 195, 230
Random quadrature method, 143
Random search algorithms, 235, 268: double trials algorithm, 236; nonlinear tactic algorithm, 237; optimum trial algorithm, 238; statistical gradient, 239
Rastrigin, L. A., 236, 242, 274
Regenerative process, 184
Regenerative simulation, 184
Repairman model with spares, 195, 230
Rubinstein, Y. R., 200, 212, 233, 236, 242
Selecting best stable stochastic system, 199
Smoothed functional, 253
Smoothed gradient, 264, 266
Spanier, J., 156
Stochastic approximation, 240
Stochastic optimization problem, 258
Stopping time, 185
Stratified sampling, 131
Tests of pseudorandom numbers, 26: chi-square goodness-of-fit test, 26; Cramér-von Mises goodness-of-fit test, 30; gap test, 32; Kolmogorov-Smirnov goodness-of-fit test, 27; maximum test, 33; run-up-and-down test, 31; serial test, 30
Tocher, D. D., 19, 37, 90, 112
Tukey, J. W., 141
Variance reduction techniques, 121, 213: antithetic variates, 135, 149, 159; common random numbers, 224; conditional Monte Carlo, 141; control variates, 126, 150, 214; correlated sampling, 124; importance sampling, 122; stratified sampling, 131
Von Neumann, J., 37, 113
Walker, A. J., 113
Wasan, M. T., 275
Weighted Monte Carlo method, 147
Welch, P., 126
Wolfowitz, J., 240
Yakowitz, S. J., 19, 113, 147, 156
WILEY SERIES IN PROBABILITY AND MATHEMATICAL STATISTICS
ESTABLISHED BY WALTER A. SHEWHART AND SAMUEL S. WILKS

Editors
Ralph A. Bradley, J. Stuart Hunter, David G. Kendall, Geoffrey S. Watson