CBMS-NSF REGIONAL CONFERENCE SERIES IN APPLIED MATHEMATICS
A series of lectures on topics of current research interest in applied mathematics under the direction of the Conference Board of the Mathematical Sciences, supported by the National Science Foundation and published by SIAM.
GARRETT BIRKHOFF, The Numerical Solution of Elliptic Equations
D. V. LINDLEY, Bayesian Statistics, A Review
R. S. VARGA, Functional Analysis and Approximation Theory in Numerical Analysis
R. R. BAHADUR, Some Limit Theorems in Statistics
PATRICK BILLINGSLEY, Weak Convergence of Measures: Applications in Probability
J. L. LIONS, Some Aspects of the Optimal Control of Distributed Parameter Systems
ROGER PENROSE, Techniques of Differential Topology in Relativity
HERMAN CHERNOFF, Sequential Analysis and Optimal Design
J. DURBIN, Distribution Theory for Tests Based on the Sample Distribution Function
SOL I. RUBINOW, Mathematical Problems in the Biological Sciences
PETER D. LAX, Hyperbolic Systems of Conservation Laws and the Mathematical Theory of Shock Waves
I. J. SCHOENBERG, Cardinal Spline Interpolation
IVAN SINGER, The Theory of Best Approximation and Functional Analysis
WERNER C. RHEINBOLDT, Methods of Solving Systems of Nonlinear Equations
HANS F. WEINBERGER, Variational Methods for Eigenvalue Approximation
R. TYRRELL ROCKAFELLAR, Conjugate Duality and Optimization
SIR JAMES LIGHTHILL, Mathematical Biofluiddynamics
GERARD SALTON, Theory of Indexing
CATHLEEN S. MORAWETZ, Notes on Time Decay and Scattering for Some Hyperbolic Problems
FRANK HOPPENSTEADT, Mathematical Theories of Populations: Demographics, Genetics and Epidemics
RICHARD ASKEY, Orthogonal Polynomials and Special Functions
L. E. PAYNE, Improperly Posed Problems in Partial Differential Equations
SAUL ROSEN, Lectures on the Measurement and Evaluation of the Performance of Computing Systems
HERBERT B. KELLER, Numerical Solution of Two Point Boundary Value Problems
J. P. LASALLE, The Stability of Dynamical Systems—Z. ARTSTEIN, Appendix A: Limiting Equations and Stability of Nonautonomous Ordinary Differential Equations
DAVID GOTTLIEB and STEVEN A. ORSZAG, Numerical Analysis of Spectral Methods: Theory and Applications
PETER J. HUBER, Robust Statistical Procedures
HERBERT SOLOMON, Geometric Probability
FRED S. ROBERTS, Graph Theory and Its Applications to Problems of Society
JURIS HARTMANIS, Feasible Computations and Provable Complexity Properties
ZOHAR MANNA, Lectures on the Logic of Computer Programming
ELLIS L. JOHNSON, Integer Programming: Facets, Subadditivity, and Duality for Groups and Semi-Group Problems
SHMUEL WINOGRAD, Arithmetic Complexity of Computations
J. F. C. KINGMAN, Mathematics of Genetic Diversity
MORTON E. GURTIN, Topics in Finite Elasticity
THOMAS G. KURTZ, Approximation of Population Processes
JERROLD E. MARSDEN, Lectures on Geometric Methods in Mathematical Physics
BRADLEY EFRON, The Jackknife, the Bootstrap, and Other Resampling Plans
Lectures on the Measurement and Evaluation of the Performance of Computing Systems SAUL ROSEN Purdue University
SOCIETY for INDUSTRIAL and APPLIED MATHEMATICS • 1976 PHILADELPHIA, PENNSYLVANIA 19103
Copyright 1976 by Society for Industrial and Applied Mathematics All rights reserved Second printing 1981 Third printing 1983 ISBN: 0-89871-020-0
Printed for the Society for Industrial and Applied Mathematics by J. W. Arrowsmith Ltd., Bristol, England BS3 2NT
Acknowledgments This monograph is based on a series of ten lectures delivered at a regional conference on Measurement and Evaluation of Computer Systems held at the College of William and Mary on July 15-July 19, 1974. The conference was sponsored by the Conference Board of the Mathematical Sciences and the National Science Foundation. Some of the research discussed here was supported by the National Science Foundation under Grant GJ-41289. I want to take this opportunity to thank the many people at the College of William and Mary who made the arrangements, and who extended their very gracious hospitality to me and my wife and to my youngest daughter, Janet. I especially want to mention Dr. Norman Gibbs, a former student and colleague at Purdue University, who first suggested to me the idea of a regional conference at William and Mary, Dr. William G. Poole, Jr. who handled many of the preconference negotiations, and Dr. Stuart W. Katzke, director of the conference, who so ably handled all of the details of organizing and running the conference. My thanks go also to the National Science Foundation and the Conference Board of the Mathematical Sciences, and to the participants in the conference who contributed so much to the exchange of ideas and information that took place there. SAUL ROSEN
Contents

Acknowledgments

Chapter 1 INTRODUCTION
1.1. Aspects of performance
1.2. Capacity
1.3. Workload and benchmarks
1.4. Performance prediction
1.5. Literature

Chapter 2 A CONCEPTUAL MODEL OF A COMPUTING SYSTEM
2.1. Introduction
2.2. Resources and processes
2.3. Terminology
2.4. The supervisor process
2.5. The spooling process
2.6. Time-sharing process
2.7. Job processes

Chapter 3 PERSONAL OBSERVATION
3.1. Introduction
3.2. Human engineering
3.3. Turnaround and response
3.4. Direct observation
3.5. Elementary considerations
3.6. Examples

Chapter 4 ACCOUNTING SYSTEMS
4.1. Introduction
4.2. Validity of data
4.3. Understanding the system
4.4. Accounting and billing systems
4.5. Dayfile analyzer
4.6. Consistency of data

Chapter 5 SOFTWARE PROBES
5.1. Introduction
5.2. Sampling probes
5.3. Trace programs
5.4. Tracing probes
5.5. The probe process
5.6. Other tracing probes

Chapter 6 HARDWARE MONITORS
6.1. Introduction
6.2. IBM hardware monitors
6.3. Commercially available hardware monitors
6.4. Hardware monitors and software probes
6.5. Minicomputer-based hardware monitors
6.6. Hardware program monitors

Chapter 7 BENCHMARK WORKLOAD MODELS
7.1. Introduction
7.2. Real benchmark models
7.3. Synthetic benchmark models
7.4. Simulating a benchmark workload
7.5. System dependent workload models
7.6. Use of benchmarks in computer design
7.7. Comments on equipment selection

Chapter 8 SIMULATION
8.1. Introduction
8.2. Simulation languages
8.3. Characteristics and difficulties of simulation
8.4. A classic simulation study
8.5. General simulation models
8.6. Simulation of a time-sharing process
8.7. Length of simulation runs
8.8. The starting and stopping problem
8.9. Trace-driven modeling
8.10. Models driven by empirical distributions
Simulation program listings and outputs

Chapter 9 MATHEMATICAL MODELS
9.1. Models of computing systems
9.2. The single server queue
9.3. Time slicing
9.4. Mathematical modeling and simulation
9.5. Networks of queues

References
CHAPTER 1
Introduction
Our topic is the measurement and evaluation of the performance of computing systems. Just about everything that is done with the aid of a computer is an instance of computer system performance that could be a subject of measurement and evaluation. We shall limit the scope of our discussions to certain types of general purpose computing systems. In Chapter 2 we shall try to describe in broad outline some of the important characteristics of this type of system. We shall not discuss a particular real system, but rather an abstraction, a model that represents many computing systems of the class under consideration. We are mainly interested in multipurpose computer systems whose characteristics involve multiprogramming, multiprocessors, communication lines serving many users, and on-line terminal systems. We study such systems, at least in part, because they are inherently interesting. It has been suggested that they are the most complex artifacts that man has ever devised. It has also been suggested that they are too complicated for the tasks for which they were intended, that their complexity makes them inherently inefficient, that they are temporary phenomena that will soon disappear. All of these suggestions may be true.
There are many practical reasons for measurement and evaluation. The advertising literature and even the technical literature (in this area it is sometimes hard to distinguish between the two) tell many stories of how applications of measurement techniques led to large savings, increases in efficiency, informed decisions about whether or not to upgrade hardware, etc. All of these practical considerations can be seen to result from a better understanding of computing systems, and the goal of all of our investigations is thus to gain a better understanding of computing systems through the techniques of measurement and performance evaluation.
1.1. Aspects of performance. The aspects of performance that we shall consider fall into three general categories: throughput, responsiveness and quality. There is, of course, a great deal of overlap and mutual interaction among these categories. We shall discuss computing systems that are combined hardware and software systems. The descriptive phrase "extended machine" was used in an earlier generation to represent this concept. The machine that serves the user, the machine the user knows and sees, is the combination of hardware and software that performs his jobs.
1.1.1. Throughput. The throughput of a system is the amount of work actually accomplished by the system. Throughput is measured in terms of items such as: number of jobs processed, number of processor hours used, amount of input/output data transferred, number of transactions processed, number of terminal hours logged, etc. A good billing and accounting system can give a useful combined measure of total throughput in terms of the amounts billed to the users of the system. The accounting system should also contain a great deal of information about various components of system throughput. The use of accounting system data in performance studies is discussed in Chapter 4. The throughput of a system is usually considerably below the maximum capacity of the system. This is partly due to inefficiencies that are bound to exist in any complex system. It is partly by design, since a heavily loaded system almost always gives poor response to its users. An attempt to push the throughput of a system up to and beyond its capacity produces saturation phenomena that are not easily predictable. Figure 1.1 is a typical graph of throughput as a function of system load. Beyond a certain point the throughput actually decreases as the load increases.
FIG. 1.1. Typical throughput of a batch system as a function of size of load
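As an illustration of how such throughput figures might be tallied, the following short sketch (written in Python purely for illustration) aggregates a few hypothetical accounting records into job counts, processor hours, input/output volume and total amount billed. The record format and the numbers are assumptions made for this sketch, not the format or content of any real accounting system.

# Illustrative only: tallying throughput measures of the kind listed above
# from a few hypothetical accounting records.
from collections import Counter

records = [
    {"job_class": "batch",    "cpu_hours": 0.12, "io_kilowords": 340, "billed": 4.10},
    {"job_class": "batch",    "cpu_hours": 0.55, "io_kilowords": 900, "billed": 11.25},
    {"job_class": "terminal", "cpu_hours": 0.02, "io_kilowords": 15,  "billed": 0.80},
]

jobs_by_class = Counter(r["job_class"] for r in records)
totals = {
    "jobs processed":  len(records),
    "processor hours": sum(r["cpu_hours"] for r in records),
    "i/o kilowords":   sum(r["io_kilowords"] for r in records),
    "amount billed":   sum(r["billed"] for r in records),  # combined billed measure of throughput
}
print(jobs_by_class)
print(totals)

Real accounting records carry far more detail than this; their use in performance studies is the subject of Chapter 4.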
1.1.2. Responsiveness. Every task submitted to a computer has a stated or an implied response time requirement. Turnaround time is an important measure of the responsiveness of a batch processing system, or of the batch processing component of a more general purpose system. Turnaround is the elapsed time between the time the user submits a job and the time the results are available to him. In some computing systems there may be a significant interval between the time the user submits a job and the time the computer becomes aware of it. For example, there may be a courier service that delivers card decks from a pickup station to the computing center. Similarly there may be a considerable interval between the time the computer finishes printing its results and the time the output is seen by the user. Turnaround may be quite dependent on factors other than the capability of the computing system. It may depend very strongly on the administration of the computing center.
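The turnaround statistics referred to below, average turnaround overall and by job class, can be computed directly from pairs of submission and results-available times. The following sketch is illustrative only; the job classes and times (in hours) are invented.

# Illustrative only: average turnaround, overall and by job class, from
# hypothetical (job class, time submitted, time results available) triples.
from collections import defaultdict
from statistics import mean

jobs = [
    ("short", 9.00,  9.15),
    ("short", 9.50,  9.70),
    ("long",  8.00, 14.50),
    ("long", 10.00, 23.75),
]

turnaround_by_class = defaultdict(list)
for job_class, submitted, available in jobs:
    turnaround_by_class[job_class].append(available - submitted)

all_turnarounds = [t for ts in turnaround_by_class.values() for t in ts]
print("average turnaround (hours):", mean(all_turnarounds))
for job_class, ts in sorted(turnaround_by_class.items()):
    print(job_class, "jobs:", mean(ts))

In practice such figures would be drawn from the accounting data discussed in Chapter 4.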
Most computing centers establish policies that favor some classes of jobs at the expense of other classes. A typical policy attempts to guarantee good turnaround for all short jobs and for longer jobs that have some reason to be granted special priority. In a loaded system this may result in very long delays for long non-priority jobs. Some computing centers use their pricing policies to permit the user some control over turnaround. He may choose fast turnaround service at a premium price, or overnight or weekend service at a discount. Average turnaround, average turnaround by job class, and turnaround time as a function of job characteristics, are data that can give a good deal of insight into the performance of a system. These data are usually very sensitive to the nature and magnitude of the load on the system. A system with good turnaround is almost always running with a great deal of idle capacity. One of the recurring problems in performance analysis is to estimate the effect on turnaround when the load on a system increases and approaches the saturation level. 1.1.2.1. Response time. Response time is a term used in connection with on-line interactive systems, the so-called time-sharing systems, in which there are
FIG. 1.2. Typical response time graph for a time-sharing system
a number of users (sometimes a large number) at keyboard terminals, or at other types of terminals that permit interaction with the computing system. Our discussion of such systems will be in terms of keyboard terminals. The response time for an interaction is measured from the time the user strikes a key that transmits a request for service (a carriage return or other attention key) to the time the response to that request starts printing at the terminal. Response time will be discussed in more detail in Chapter 2 and in later chapters. Average response time and response time distributions are important performance data in time-sharing systems and in the time-sharing components of general purpose systems. A very large amount of the literature of computer performance analysis deals with the study of the response time characteristics of real systems and of models of such systems. Here again it is usually necessary to have a good deal of unused capacity in order to maintain good response. A very difficult and important problem is to estimate the effect on the response time distribution of an increase in the number of simultaneous users of a time-sharing system, and the extent to which service deteriorates as that number approaches and possibly exceeds the capacity of the system. A typical curve of response time as a function of number of terminals is shown in Fig. 1.2. Chapters 8 and 9 discuss mathematical and simulation models of time-sharing systems. Turnaround can be a factor in response time as experienced by the user. A good deal of activity at interactive terminals in a general purpose system consists of the preparation and editing of program files, which are sent as jobs to the batch system. The results can be retrieved and displayed at the terminal, errors can be corrected, the job resubmitted, etc. The speed with which jobs, especially small jobs, can be processed for the terminal user is an important aspect of the performance of such systems.
1.1.3. Quality. Performance analysts deal mostly with quantitative measures of performance. They respond to questions such as: How much work can the system do under actual or hypothetical conditions? How fast can a real or hypothetical system respond to requests for various kinds of service? Most of the literature in the field addresses itself to questions of this kind, and the measurements of throughput and response supply data that can be used in studies in these areas. It is also important, though usually much more difficult, to consider qualitative aspects of performance. In the study of such qualitative aspects, performance analysis overlaps many areas of computer science and computer engineering. It might be interesting to study the whole computer field from the point of view of performance evaluation, but that is not our purpose here. In this section we shall merely point out and discuss briefly a few of the questions that should be raised, and subjects that should be considered, in evaluating the quality of performance of a computing system.
1.1.3.1. Hardware reliability. Computing systems and components of systems experience failures that cause all or part of the system to malfunction. The mean time between failures is often used as a measure of hardware reliability. This is a
much better measure than up-time or down-time percentages, which were popular measures for the early computer systems and are still sometimes used. Systems that were essentially unusable because of high error frequency could still report 90% and more up-time if the duration of each malfunction were short. Mean time to failure is not always unequivocal, since it is not always clear whether a number of failures that occur close together should be reported as multiple failures or as a single failure. Another useful performance measure in this area is mean time to repair. It is usually assumed that the computer hardware gives correct answers, and this is generally true. Even though they are infrequent, there are hardware errors, and the extent and validity of error detection and correction is an aspect of performance that should be considered. Many computing systems, even relatively modest ones, are networks that involve communications equipment and a variety of processors and peripheral equipment. The very high level of reliability that is usually present in the central computer does not give any insight into the reliability of the whole network.
1.1.3.2. Software quality. Some of the same measures that are used for hardware can be used for software in assessing its reliability. Mean time to failure and mean time to repair are relevant measures. Valid data in this area are rarely available. Software reliability has only recently started to receive serious attention as an area of research and study. The first major conference in this area was held in 1973 [IE73]. Software quality considerations include the quality and availability of language processors, of libraries, and utility routines. The nature and ease of use of the job control language can be an important factor in performance. Many computer resources are wasted because efficient ways of using the system often require that users of high-level programming languages must know and use difficult low-level command languages. The quality of the documentation may be another important performance factor. In time-sharing systems the quality of the editors, the capacity and convenience of file-storage systems, the availability of incremental compilers and other appropriate software, are at least as important as average response time in evaluating the performance of the system. There are time-sharing systems that cost as low as $1000-$2000 per terminal served, and others that cost $10,000-$20,000 and more per terminal. These systems may very well all have similar response time characteristics. At least some of the differences in price can be attributed to qualitative differences in performance that are seldom measured or analyzed in detail.
1.1.3.3. Quality of service. There are many management considerations that affect the quality of performance of a computing system. In a poorly run or overloaded center, jobs may be lost, tapes that should be protected may be overwritten, poorly maintained printers may give unsatisfactory output. Job submission and retrieval facilities may be inconvenient or inadequate. The quality
of the operational policies and the competence of the operating staff can often have more effect on performance than the quality of either the hardware or the software. 1.2. Capacity. The capacity of a general purpose computing system is defined as the maximum throughput that the system is able to achieve while providing the required level of responsiveness and quality. Other definitions of capacity might be appropriate in special purpose systems. For example, the capacity of a time sharing system might be defined as the number of simultaneous on-line users that the system can support with a stated mean and/or maximum response time. A major goal of many performance studies is to estimate the capacity of existing systems and of proposed systems. There are many factors that contribute to the capacity of a computing system. One of these, often overemphasized but still important, is the speed of the computing equipment. 1.2.1. Central processor speed. In his very interesting history of computers [GO72], Goldstine reports von Neumann's enthusiastic response when he was told that the Eniac project was developing a machine that would perform more than 333 multiplications per second. In very early computers a few simple data like multiply time, add time, memory size, and memory cycle time, were sometimes enough to characterize the capacity of the computer. The "Gibson mix" was the best known of many efforts to provide a more precise evaluation of the raw speed of a computer, by using a weighted average of the execution times of a number of different instructions. The weights were the expected frequency of occurrence of the various instructions in typical programs. Another approach to measuring processor speed, and to comparing the speed of different computers, is the use of kernels. Kernels are short examples of typical computer code whose speed of execution can be measured or estimated. A measure of processor speed that is often used, especially in promotional literature, is the maximum number of instructions that can be executed per second. There is still some excitement generated by announcements that computers can (or will be able to) execute hundreds of millions or even billions of instructions per second. These speeds represent very impressive technological achievements, but the speed attainable in short bursts by the fastest components of a computing system usually gives very little information about the capacity of a system. 1.2.2. Disc-based systems. The high speed processing units with their correspondingly fast central memory usually end up spending large amounts of time waiting for the relatively slow peripheral storage devices, and the even slower input-output units. Even when central processors are not waiting, they are usually forced to devote much of their processing time to "overhead" tasks of the large and inefficient operating systems that came along with the "third generation" computing systems.
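Returning for a moment to the raw speed measures of 1.2.1, the weighted average idea behind the Gibson mix can be written out in a few lines. The instruction classes, weights and execution times below are invented for illustration; they are not the published Gibson mix figures.

# A sketch of an instruction-mix measure of raw processor speed: a weighted
# average of instruction execution times, weighted by the expected frequency
# of each instruction class in typical programs.  All figures are illustrative.
mix = {                    # class: (weight as fraction of executed instructions, time in microseconds)
    "load/store":    (0.30, 1.0),
    "fixed add/sub": (0.25, 0.8),
    "branch":        (0.20, 0.6),
    "floating add":  (0.10, 1.6),
    "floating mul":  (0.05, 2.5),
    "other":         (0.10, 1.2),
}

avg_time_us = sum(weight * time for weight, time in mix.values())
print("weighted average instruction time: %.2f microseconds" % avg_time_us)
print("implied raw speed: %.0f thousand instructions per second" % (1000 / avg_time_us))

Such a figure characterizes only the processor; as the surrounding discussion makes clear, it says little about the capacity of a disc-limited system.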
Some of the first performance measurements on the then new OS 360 system were presented by IBM to the SHARE organization in 1965. In a number of test runs there was little difference in total processing time required by the slow model 40 processor and the very much faster model 65 of the 360 line. This was a dramatic illustration of the fact that the system that was being tested was a disc-based and disc-access limited system. It was only through the improvements of peripheral storage (faster devices, more and faster channels) and through more effective use of peripheral storage (for example, multiprogramming, buffering, etc.), that the extra speed of the main frame could be translated into extra capacity for the system as a whole.
1.2.3. Other hardware and software features. In a paper written in 1968 [RO68], I stated that "all of the (hardware) features that have been designed into digital computers may be considered to be ... reflections of software needs." I might equally well have stated that all hardware and software developments are a reflection of the need and desire to improve the capacity and the performance of computing systems. There is a great deal that could be said about the ways in which the logic of hardware and software organization affect capacity and performance. A discussion of the many aspects of hardware and software that influence performance would be an appropriate subject for a text in computer design. There are no such texts, but the reader is referred to [BU62], [TH70] and [BE71] for interesting discussions of hardware performance considerations in the design of specific computers. We shall not pursue this subject further here, since it would take us beyond the projected scope and magnitude of this monograph.
1.3. Workload and benchmarks. The workload of a system is defined as the set of tasks or jobs that the system is called upon to process. In a system in which the workload characteristics are known and can be specified quantitatively, it is possible to define a measure of capacity as the number of workload units that can be processed in a unit of time. Thus, a system whose only function is to handle transactions of a certain kind (for example, an inventory control system) can be said to have a capacity of so many transactions per hour. If several different computing systems are proposed to handle this type of workload, it is only necessary to measure or estimate their average time per transaction in order to obtain an estimate of their capacity for handling the workload. A benchmark is a program or a set of programs that is selected or designed for the purpose of testing and/or measuring the performance of a computing system, or comparing the performance of a number of computing systems. Benchmarks are usually chosen or constructed to be typical of the jobs a system runs, and representative of the system's workload. For a system whose workload can be characterized by a benchmark set of programs, the time it takes to run the benchmark set (or preferably the reciprocal of that time) can be taken as a measure of the capacity of the system. Such benchmark sets are frequently used in equipment procurement studies to provide
an estimate of the relative capacities of different systems, and of different system configurations. Benchmarks and workload characterization will be discussed in Chapter 7. In a general purpose system it may be difficult or even impossible to provide a valid characterization of the workload, especially since the workload may vary significantly, even over relatively short periods of time. There may exist only a very general concept of the kinds of work that such a system will be required to handle. This is characteristic of the many situations in which the computer functions as a public or semi-public utility. Examples are computing centers at universities and research establishments, computer service bureaus, and many multipurpose industrial and military computer installations. For existing systems of this type various measurement techniques can be used to obtain measures of throughput under a variety of real and synthetic workloads. The measurement techniques will be discussed in later chapters. Measures of throughput under varying load conditions can be used to obtain estimates of capacity as a function of the nature of the workload of the system.
1.4. Performance prediction. The performance analyst is very often faced with the problem of estimating the capacity (and other performance characteristics) of systems which cannot be measured under actual production loads. These are estimates that are used in decisions about designing or procuring and installing new systems, and making additions or changes in existing systems. In some cases examples of the proposed new system exist, perhaps in prototype form, in the manufacturing plant or in another user's installation. In such cases a certain amount of testing and measurement may be possible, and the analyst is then called upon to design such tests and to make extrapolations from them. Systems on which no measurements at all are possible provide a greater challenge and much greater difficulty. Every hardware and software design project involves, either explicitly or implicitly, predictions concerning the performance of the resulting product. Manufacturers who develop new products, and users who order products before their use is well established, often stake very large amounts of money on the expected performance of a product. It is surprising how little effort has been devoted to performance prediction, even for huge projects involving expenditures of millions of dollars over periods of many years. As the computer industry has matured, there has developed a greater appreciation and understanding of the need for performance prediction and evaluation, and the major computer manufacturers and some government and industrial organizations have established groups and departments devoted to these efforts. An example is the Federal Computer Performance Evaluation and Simulation Center, which was set up by the General Services Administration in Washington, D.C. [CO74]. Performance prediction is not and probably can never be an exact science. Simulation and mathematical modeling techniques can be used to estimate aspects of the performance of new systems, and of the effect of changes in existing systems. Some of these techniques will be discussed in later chapters. Modeling
techniques are important for developing insight into performance, and they can lead to the solution of many interesting performance problems. Performance analysts must be aware of these techniques and of the problems and areas in which they can be applied successfully. They must also be aware of their limitations. Real systems, especially general purpose systems, are often very complex structures that cannot be represented adequately by the relatively simple models for which analytical solutions can be found or for which successful simulation studies can be devised. In discussing some specific changes that were considered in connection with the Purdue MACE operating system on the CDC 6500 [RO73], I stated: "It would be nice to have an accurate model that could predict the change in performance that would be produced by a change in a system component. We have expended considerable effort in applying modeling and simulation techniques, but the models still seem to be much too coarse to be able to give us this kind of information. With the present state of the art, it is still necessary to rely on more informal techniques based on careful extrapolation from measurements of the existing system, to estimate expected performance improvement." There is a great deal of work going on in the area of improving techniques for modeling, and for the solution or simulation of models, some of which will be discussed in Chapters 8 and 9. 1.5. Literature. There have been a large number of survey papers and general discussions of performance evaluation, including several books [DR73], [ST74] and a number of bibliographies [BU69], [CR71], [AN72], [MI72]. Every major computer conference proceedings contains articles in this area, and there have been a number of conferences specifically in the area of measurement and evaluation of performance [AC71], [AC73]. Because of the dominant position of IBM in the computer field most performance studies have involved IBM equipment, and the IBM Systems Journal provides some of the most useful papers on performance analysis. Their special issue on system performance evaluation (Volume 8, No. 4, 1969) contains a number of papers that have had a great deal of influence on subsequent developments in this area. The SHARE organization of large IBM users has had an ongoing Computer Measurement and Evaluation project, and selected papers in this area presented to SHARE in the years 1967-1973 are published in [SH74]. A great deal of the performance evaluation literature deals with immediate practical problems in connection with specific systems, and many of the papers are only of transient interest. Some of the best general papers in the area of performance evaluation are those written by T. E. Bell while he was associated with the Computer Performance Analysis project at Rand Corporation. In [BL73] he lists the important tools (or techniques) of performance evaluation: personal inspection, accounting data, hardware monitors,
software monitors, benchmarks, simulation models, analytical models. Any author who plans to cover the field of computer performance analysis at all completely should include a discussion of these techniques and their application. The relative weight (that is, the amount of space) given to each is a matter of background and interest. Some authors might choose to devote a great deal of space to queuing theory, in an attempt to develop a mathematical basis for performance analysis. Others feel, and state, that no progress will be made until we find a way to measure and model the workload of a computing system, an area subsumed under the heading of benchmarks in Bell's list of tools and techniques. Still others prefer to concentrate on the gathering and analysis and reduction of real data from running systems. All of Bell's seven categories will be covered to some extent in the chapters that follow. The size of each chapter is not necessarily an indication of my opinion of the relative importance of each topic. Personal inspection is important, but there is not very much of a general nature that can be said about it, and its chapter is therefore quite short. The chapter on hardware monitors is probably shorter than it should be, but my own experience in this area is quite limited. The chapter on simulation may be longer than it should be relative to the others, but I thought it would be interesting to include actual examples of simulation programs and their output. The examples and accompanying text expanded that chapter, perhaps beyond its level of importance in the total performance evaluation picture.
CHAPTER 2
A Conceptual Model of a Computing System
2.1. Introduction. In subsequent chapters we shall present and discuss measurement and evaluation techniques, with examples of the application of these techniques to one particular computing system and with data gathered during the running of that system. A complete description of the system would run to hundreds or even thousands of pages, depending on the level of detail of the description. It would be necessary to know a great deal about the system in order to judge the validity of the data presented and in order to use and interpret such data in an analysis and evaluation of the performance of the system. However, the measurement and evaluation techniques that are used and the data, considered simply as examples of the kind of information produced by the measurement tools, can be discussed and understood in terms that apply to a large class of computing systems of which this particular system is one instance. This class of computing systems can be described by means of a conceptual model that is an abstraction of characteristics that are common to most, if not in all cases to all, members of the class. This chapter is a brief presentation of such a conceptual model. It is an attempt to present it on a level of detail that will be adequate for an understanding and appreciation of the material in the chapters that follow. The reader can judge its success for himself. The system on which the data presented in later chapters was collected is the Purdue MACE operating system that runs on Control Data 6000 (or Cyber 70) equipment [AB70], [AB74]. This system has much in common with Control Data's KRONOS system, and also (though less) with their Scope system. These systems and IBM's various systems on their 360 and 370 computers were kept very much in mind in developing the conceptual model presented here, but there was also an awareness of features of other large systems, including the Univac EXEC systems on their 1100 series, and the GCOS systems on the large Honeywell machines. The model is based on these large scale general purpose systems, but it may also apply to smaller systems, even to some of the very capable mini-computer systems that have been developed. All features of the model need not be present in a particular instance of a system that is described in terms of the model. Figure 2.1 presents a skeletal outline of the system model.
2.2. Resources and processes. The elements of the computing system model are resources and processes. The primary resources are central processors, central
FIG. 2.1. Conceptual model of a computing system
memory, channels and peripheral storage. Input-output devices, terminals and auxiliary processors (for example, front-end or communications processors) are secondary resources that are often under the control of a specific process. A process is an activity of the system that involves the execution of sets of programs and the use and modification of sets of data, in the performance of a major function for the system itself or for a user of the system. The programs and data that are used by a process reside in peripheral storage. (There are exceptions in the case of processes that communicate with external devices, but these will be ignored in this discussion.) A process becomes active when part (in some cases, all) of its programs and data are moved into central memory. Processors execute programs for active processes, and channels transfer data and programs between peripheral storage and central memory. Processes use central memory space, central processor time, channel time and peripheral storage space; and records that are kept of their use of these commodities provide some of the primary data for performance analysis. 2.2.1. Permanent processes and jobs. There are a few processes that are designed to remain active over long periods of time. These include the supervisor process, the spooling process, and the time-sharing process, all of which will be described below. These are called permanent processes. The supervisor process, for example, must be active whenever the system is operating. It is characteristic of the permanent processes that they are not scheduled by the system job scheduler. Most processes are not permanent, and are referred to as jobs. Jobs are loaded into the system by a spooling process or by a time-sharing process. They are activated and deactivated by the system scheduler which is part of the supervisor process. Jobs that are in the system and that are not active exist as job files in queues in peripheral storage. 2.3. Terminology. A file is a named collection of data terminated by an end of file mark. In IBM terminology a file would be called a data set. The term core memory or simply core is often used in place of the term central memory, since between 1954 and 1972 almost all central memory was made of arrays of magnetic cores. The term disc memory or simply disc is often used in place of the term peripheral storage. Disc units are used as peripheral storage in most computing systems (in 1974), but drum storage systems are also widely used, and other, nonrotating, peripheral storage systems may soon be introduced. 2.4. The supervisor process. The supervisor process is a permanent process that contains an executive program and other programs that contribute to the system's ultimate purpose, which is to run user jobs. The executive program provides a variety of services for all of the active processes. Any significant change of state in the system involves one or more requests for the execution of executive functions, and a record of the execution of these functions can be one of the most
useful sources of data on the performance of the system. Part of the executive program must remain permanently resident in central memory, but the execution of some of the functions provided by the executive may require that program segments be loaded from peripheral storage. Our model does not require a knowledge of all or even most of the executive functions. These will vary in different hardware systems and even in different operating systems using the same hardware. The supervisor process has several major functions that are carried out by supervisor programs. Among these are the programs that carry out the system scheduling and the dispatching functions. It is important to distinguish clearly between these functions, since the nomenclature used in describing them is often ambiguous and can lead to some confusion. In our model the system scheduler manages the allocation of that part of central memory that is available for dynamic allocation to job processes. A separate memory management or paging supervisor program should be added if paging problems that are characteristic of virtual memory systems are to be discussed. This important specialized area of performance analysis has been discussed very extensively in the literature and will not be covered in this monograph. A good survey and bibliography in that area is contained in [DE70]. 2.4.1. The system scheduler. The scheduler is the supervisor program that decides which job processes are to be active in the system at any given time. The scheduler can make a job active by allocating central memory to it, and then by causing all or part of its job file to be loaded into central memory. Central memory space may become available either because a job has released some or all of the space allocated to it (for example, if a job terminates), or because a job is temporarily suspended, either by the scheduler or by the action of some other program. In the case of a suspended job, the scheduler can make the central memory space occupied by that job available for reallocation by causing the job to be rolled out from central memory to peripheral storage. There are various parameters associated with each job, including a job priority, which are used by the scheduler to determine which jobs should be made active, and which active jobs should be suspended. A scheduling strategy is an algorithm used by the scheduler in making this determination. It is assumed that a job that is preempted (suspended) and rolled out to peripheral storage goes into a rollout queue, from which it can later be rolled back into central memory and resumed (made active), and continue to execute from the point in its program at which it was suspended. The scheduling strategies will thus in general be preempt-resume strategies. The scheduling strategy used can have a very significant effect on the performance of a computing system. It is usually very difficult to predict the effect on performance of a change in scheduling strategy. Typically, such a change will improve some areas of performance and worsen other areas. Thus, even if one can predict the changes in performance, it may still be extremely difficult to evaluate them.
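To make the preempt-resume idea concrete, here is a deliberately oversimplified sketch of one scheduling pass, written in Python for illustration: jobs carry a priority and a central memory requirement, the highest priority waiting job that fits is made active, and lower priority active jobs are rolled out to the rollout queue when necessary. The single memory pool, the absence of priority aging, and all of the numbers are assumptions made for brevity; no real scheduler is this simple.

# Illustrative sketch of a preempt-resume scheduling pass (all figures invented).
CENTRAL_MEMORY = 100                     # memory available for dynamic allocation to jobs

class Job:
    def __init__(self, name, priority, memory):
        self.name, self.priority, self.memory = name, priority, memory

def schedule(input_queue, active, rollout_queue):
    """One pass: activate waiting jobs, rolling out lower priority active jobs if needed."""
    free = CENTRAL_MEMORY - sum(j.memory for j in active)
    waiting = sorted(input_queue + rollout_queue, key=lambda j: j.priority, reverse=True)
    for job in waiting:
        while job.memory > free:
            victims = [j for j in active if j.priority < job.priority]
            if not victims:
                break
            victim = min(victims, key=lambda j: j.priority)
            active.remove(victim)        # preempt: roll the job out to peripheral storage
            rollout_queue.append(victim)
            free += victim.memory
        if job.memory <= free:
            (input_queue if job in input_queue else rollout_queue).remove(job)
            active.append(job)           # rolled in (or loaded) and made active
            free -= job.memory

active, rollouts = [], []
inputs = [Job("A", 3, 40), Job("B", 1, 50), Job("C", 5, 70)]
schedule(inputs, active, rollouts)
print([j.name for j in active], [j.name for j in rollouts], [j.name for j in inputs])

A real scheduler would also weigh the other job parameters mentioned above, and, as noted, even a small change in such a strategy can have effects that are hard to predict.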
2.4.2. Dispatcher. One or more of the processors in the system are central processors. These perform the main computing and data manipulation functions of the system. The dispatcher is the supervisor program that assigns central processors to active processes. A process may be active and not need a central processor. For example, it may be waiting for an input/output operation to complete. An active process that needs a central processor is said to be in Wait state until a central processor is assigned to it. When a central processor is assigned to a process, the process is said to be executing. The dispatcher maintains a dispatching queue of those processes that are in Wait state and those that are executing. The central processor time allocation unit is called a quantum. A quantum is the maximum amount of time that a process can use a central processor before that processor becomes eligible for reassignment by the dispatcher. Whenever a central processor becomes eligible for assignment, because it has been interrupted or has become idle or because a process has finished a time quantum, the dispatcher assigns the processor to one of the processes in the dispatching queue in accordance with its dispatching strategy. If the queue is empty, the processor remains idle and eligible for assignment. At the end of a quantum the dispatcher may permit a process that was executing to retain the central processor for an additional quantum, or it may put that process into Wait state and reassign the central processor. A typical dispatching strategy might give permanent processes priority over job processes, but assign quanta of central processor time to job processes on a round-robin basis. A typical quantum might be 10 or 20 milliseconds or more. The quantum size is an item in the dispatching strategy. Considerations in determining quantum size include the speed of the central processor, and the time it takes to switch a central processor from serving one process to serving another. The purpose of the round-robin allocation of central processor time is to permit all active jobs to make some progress toward completion while they are tying up central memory and possibly other system resources. Another typical dispatching strategy is to attempt to favor input/output-bound jobs at the expense of compute-bound jobs. The value of this type of dispatching strategy can be enhanced in a system in which the scheduling strategy maintains a good mix of input/output-bound and compute-bound jobs in the set of active jobs. The size of the quantum, and the dispatching strategy may have significant effects on system performance. These effects have been studied in [BA70] and [SH72]. Some systems permit a job process that is using a central processor to retain that processor for as long as it can use it, subject only to preemption by the supervisor process. The effect is the same as would be observed in a system that uses a very large central processor quantum size. This may be a valid strategy in a system in which switching from one task to another is prohibitively expensive, or where there is only very limited multiprogramming activity. There is some danger of confusion between the use of the word quantum in connection with central processor dispatching, and its frequent use in the computer literature in connection with time slicing in time-sharing systems. I have
suggested the use of the word partum in place of quantum in the present context, but I hesitate to use an unfamiliar word to describe a simple concept. In this monograph I shall try to avoid any confusion in the use of the word quantum through use of time slice rather than quantum in the discussion of time-sharing processes.
2.4.3. Input-output manager. The input-output manager is the supervisor program that handles requests from all of the processes for the transfer of information between central memory and peripheral storage units. It may also handle transfers of information to some or all input-output devices. A typical input request may ask for a record from a specified file. The I/O manager uses appropriate tables to determine the peripheral unit and the position on that unit at which the record is located. It must have a mechanism for denying or queuing the request if the unit is busy or, in the case of several units on the same channel, if the channel is busy.
2.5. The spooling process. The word "spooling" is based on the acronym Simultaneous Peripheral Operations On Line introduced by IBM in 1960 to describe a feature of their 7070 system. It has become an accepted part of computer system nomenclature. The spooling process is a permanent process that reads jobs from one or more input devices and puts the jobs into input queues in peripheral storage. It also reads files from output queues in peripheral storage and transmits them to output devices such as printers, punches, microfiche machines, etc. The output spooling process may have a scheduling function, and the order in which output files are released to output devices may have a significant effect on the observed performance of the system.
2.6. Time-sharing process. A time-sharing process is a permanent process that provides interactive computing service to a number of on-line terminals. Each active (i.e., logged-on) terminal alternates between user state and system state. User state time is sometimes called think time, although in our model it includes printing and typing time as well. System time is also called response time. For each active terminal associated with the time-sharing process there is a current task. At the end of a user's input to the system a carriage return or other attention signal transfers the terminal from user state to system state, and causes a request for service to be transmitted to the time-sharing process. There is space, usually in the fastest peripheral storage available, that is allocated as swapping storage for the time-sharing process. Each task associated with an active terminal has an information area in the swapping storage that contains programs and data necessary for initiating or continuing the execution of the programs that must be run in order to respond to the terminal's request. When a particular task is selected as the next one to be served, its information area is moved from swapping storage into a central memory area that belongs to the time-sharing process. All of the services provided to a permanent process, including quanta of central
processor time and access to peripheral storage, are available to the task while it is occupying the time-sharing process' central memory space. Unless a task terminates or is preempted, service to the task will continue for a length of time called a time slice, at the end of which its information area is swapped out, and the information area that belongs to the next task to be served is swapped into central memory for the next time slice. A time slice may be, for example, 50, 150, or 250 milliseconds. The time slice size is an element of the time-sharing process scheduling strategy. The amount of service time needed to respond to requests from terminals may vary from just a few milliseconds to several seconds or more. The service request may thus require one time slice (or a fraction thereof), or may require a number of time slices. The time-sharing process provides time slices to the system state terminals according to its scheduling strategy. The simplest strategy is a round-robin strategy in which a task that requests a time slice starts out at the end of a queue consisting of all tasks that need time slices. It goes back to the end of the queue for each additional time slice that it needs. When the time-sharing process has provided the required service for a terminal, it initiates a response to the terminal and places the terminal in user state. A variant of the model would permit a time-sharing process to provide service to more than one task at a time. In order to simplify some of the later discussions we shall assume that a time-sharing process serves one task at a time, and permit two or even more time-sharing processes to be active simultaneously. 2.7. Job processes. Jobs are users of system resources. On the most elementary level a job is specified by listing the nature and the amount of the system resources it will use. The simplest job specification will list the amount of central memory needed, the amount of central processor time used, and the amount of I/O transferred, possibly along with a priority level to be used by the system scheduler. In addition to the total resource requirements, it is necessary to model the sequence in which some of the resources are used. In a very simple model, an active job can be considered to consist of a sequence of alternating central processor requests and channel requests. A central processor request specifies the amount of central processor time required before the next channel request. A channel request specifies a channel number and the amount of information to be transferred. The devices associated with the different channels are not assumed to be identical, and thus the time it takes to satisfy an I/O request for a given amount of information may be quite different for different channels. A job that makes a channel request gives up the central processor, and becomes ineligible for central processor dispatching until the channel request has been satisfied. Some systems permit a job to retain a central processor while one or more input-output requests are being processed for that job. It is possible to achieve maximum overlap between central processor utilization and channel utilization for the job in such a system. Attempts to optimize the performance of one job process may sometimes result in slowing the execution of other active jobs, or even of permanent processes [RO73].
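The elementary job model just described can be written down directly. In the sketch below a job is a central memory requirement, a priority, and a sequence of alternating central processor requests and channel requests; the field names and units are illustrative assumptions, not part of any particular system's job specification.

# Illustrative sketch of the elementary job model: total resource demands plus
# a sequence of alternating central processor requests and channel requests.
from dataclasses import dataclass, field
from typing import List, Tuple, Union

@dataclass
class CPURequest:
    milliseconds: float        # central processor time needed before the next channel request

@dataclass
class ChannelRequest:
    channel: int               # channel number (devices on different channels may differ in speed)
    words: int                 # amount of information to transfer

@dataclass
class JobSpec:
    memory_words: int          # central memory needed
    priority: int              # used by the system scheduler
    steps: List[Union[CPURequest, ChannelRequest]] = field(default_factory=list)

    def totals(self) -> Tuple[float, int]:
        """Total CPU time (ms) and total I/O (words) implied by the step sequence."""
        cpu = sum(s.milliseconds for s in self.steps if isinstance(s, CPURequest))
        io = sum(s.words for s in self.steps if isinstance(s, ChannelRequest))
        return cpu, io

job = JobSpec(memory_words=20_000, priority=2, steps=[
    CPURequest(30), ChannelRequest(channel=1, words=512),
    CPURequest(80), ChannelRequest(channel=2, words=2048),
    CPURequest(10),
])
print(job.totals())            # (120, 2560)

A simulation of the kind discussed in Chapter 8 would consume these steps in order, waiting on the appropriate channel after each channel request.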
A job may be permitted to have a terminal, or in some cases several terminals, associated with it. A job that provides interactive service to a terminal is not the same kind of thing as a time-sharing process. The job is not a permanent process. It is scheduled by the system scheduler. The scheduling algorithm may make special provision for such jobs. Note that the system scheduler is not involved at all with a time-sharing process since a time-sharing process is a permanent process. A system without time-sharing processes may still be a time-sharing system. The time-sharing process is a mechanism that attempts to provide some, possibly limited, interactive services to a large number of users. Its main service may, for example, be a file creator and editor that provides a conversational system for submitting batch jobs to the system.
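As a postscript to the description of the time-sharing process in 2.6, the following toy round-robin sketch serves a set of tasks one time slice at a time and records the response time of each. Swapping time and think time are ignored, a single queue is assumed, and the service demands are invented; it is meant only to make the time slice mechanism concrete.

# Illustrative round-robin time-slice sketch for a time-sharing process.
from collections import deque

TIME_SLICE = 0.15                       # seconds, e.g. a 150 millisecond time slice

def round_robin(service_times):
    """service_times: {task name: seconds of service needed}.  Returns response times."""
    remaining = dict(service_times)
    queue = deque(remaining)            # all tasks are assumed to request service at time zero
    clock, response = 0.0, {}
    while queue:
        task = queue.popleft()
        used = min(TIME_SLICE, remaining[task])
        clock += used
        remaining[task] -= used
        if remaining[task] > 1e-12:
            queue.append(task)          # back to the end of the queue for another time slice
        else:
            response[task] = clock      # service complete; response time measured from time zero
    return response

print(round_robin({"edit": 0.05, "compile": 1.0, "status": 0.1}))

Chapter 8 develops far more careful simulations of a time-sharing process; this fragment only illustrates the queueing discipline.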
CHAPTER 3
Personal Observation
3.1. Introduction. It is possible to learn a good deal about the performance of a computing system through personal observation, without the use of sophisticated data gathering or analytical tools. Direct observation may be the only way to study and evaluate some of the important qualitative aspects of computing system performance. A computer room usually satisfies environmental requirements with respect to measurable quantities such as temperature and humidity, but even these should be checked. There are other unmeasurable environmental parameters, some of which are simply questions of good housekeeping. Dust and dirt in the computer room may be the cause of poor tape and disc performance. Cards and tapes stacked apparently at random on equipment are usually a sign of sloppy procedures that result in lost and improperly run jobs.
3.2. Human engineering. There are important human engineering considerations that can and do affect performance. The layout of the computer room, the accessibility of readers and printers and tape and disc units, the facilities for communication between operators in multi-operator installations, can all affect performance. A performance analyst can observe if input decks are handled properly and with care, if tapes are retrieved and restored and mounted and dismounted carefully and efficiently, if output is permitted to accumulate, or is quickly and accurately distributed back to the user. These are not all traditional areas of computer performance evaluation, but they are important parts of the performance of the system as seen by the user.
3.3. Turnaround and response. In the performance areas that are more frequently studied, such as turnaround and response time, an analyst who can arrange to use a computing system in the same ways in which typical users use it can often gain insights into the performance of the system that might not be apparent in data provided by the accounting logs and software and hardware probes that are discussed in Chapters 4, 5 and 6. A good way to get a feel for the turnaround provided by a system is to submit batch jobs and wait for the output. A good way to get some idea of the response time is to spend some time exercising various features of the system from a keyboard terminal. In both of these cases the analyst may also learn a great deal about the quality of the system's performance. Informal observation and informal studies have many of the same problems as more formal ones. In order to be able to generalize about turnaround or response,
it is necessary to make tests under "typical" conditions, and since typical conditions can usually neither be defined nor achieved, it becomes necessary to make a large number of tests under different sets of conditions so as to make the results representative. The in-house analyst who lives with the system can make such frequent tests. The analyst who is called in for a brief study would probably have to rely on information provided by users of the system. Observing users of the system, and discussing with them their experiences and satisfactions and dissatisfactions with the system, can provide the analyst with a general feel for how well the system is performing, and may suggest the existence and source of performance problems and inadequacies.
3.4. Direct observation. An analyst who understands a particular system well can often get a great deal of information by observing system displays and lights associated with various hardware components. The fact that a unit is idle, or that a channel is underutilized, or that a particular system resource is a system bottleneck, would usually become apparent in any well-planned system evaluation study. Some of this same type of information can be obtained much more directly by observing that certain indicator lamps are rarely lit while others are almost continuously on. Intelligent and informed observation of a system can sometimes be the most important part of the performance evaluation process. It is sometimes possible to obtain considerable insight into performance problems by asking some obvious questions and making routine observations of how the system is working.
3.5. Elementary considerations. When an electrical appliance is not functioning, the advice usually given by the analyst (repairman) is, "Make sure that it's plugged in and turned on before you look for any other problems." There are analogous situations with respect to some computer performance problems, especially in connection with the use of peripheral storage devices. There are cases where devices have remained unused for extended periods of time because the "SYSGEN" never turned on the device as far as the operating system was concerned. There are situations in which a necessary software module was not installed or a necessary software change was not made at the time a hardware device was installed. It is not unusual for equipment manufacturers to deliver equipment with powerful capabilities which are not used by typical FORTRAN and COBOL programs. The ability of a device to perform disc seeks on a number of spindles simultaneously means nothing if the seeks are not being issued in such a way as to take advantage of this feature. Position sensing capabilities built into modern disc pack controllers remain unused if the programming language processors, and the input-output packages they use, are not designed to use them. The performance analyst, who is aware of the current state of development of the standard software products, is often in a position to explain why the installation of new and more powerful equipment has not produced the performance improvements that were expected as quickly as they were expected.
3.6. Examples. Later chapters will include some examples of measurements taken on a particular CDC 6000 installation at Purdue University. The remainder of this chapter will contain a few examples taken from this same system, to illustrate the kinds of performance information that are available to direct personal observation, and some thoughts on the uses and limitations of this kind of information. There is no attempt at any kind of completeness, since there is almost no limit to the kinds of things that could be observed. Three separate work stations in the computing center were visited and observed, and some comments on each are presented here.

1. A card reader. As I walk down the hall in the basement of the computing center, I come to a self-service card reader with a rated card-reading speed of 1,000 cards per minute, and a queue of about 15 students waiting to load their jobs. I observe a very significant pause between the time that one student finishes loading his job and the time the next student's job starts feeding through the reader. This pause includes time for the first student to walk away from the reader, time for the second student to approach the reader, insert a header card in front of his deck, put the deck into the hopper and start the reader. Then there is a short delay before the reader picks the first few cards, another very short delay while the software is presumably making some job validity checks, and finally the full speed reading of the rest of the deck. Since most of the decks are short, the rated maximum reading speed of the card reader has little effect on the actual throughput of the device. The mechanics of getting people up to and away from the reader are of great significance. The only way to get information about queuing at the reader is the very old-fashioned way of watching and counting. The user at the end of the queue, who may wait 15 to 20 minutes before he can reach the reader to submit his job, is not going to be impressed with internal reports of job turnaround time that ignore his waiting time.

2. A terminal room. Further down the hall there is a small room with four public keyboard terminals that are in almost constant use. I know that the system is very heavily loaded at this time, and as a result any system measurement tools would show that response is slow. Just what does this mean to these four users? I note that two of the terminals are clearly keeping up with, or staying ahead of, their users. At one of them the obviously inexperienced user is consulting a manual to figure out what to do next. At the second the user is creating a file, and his typing speed is clearly the limiting factor. At the third terminal the user is impatiently waiting for the results of a job he has submitted. Every 10 seconds the system types out his current job status, and from the looks of things, he would do better to request a batch printer printout, and leave his terminal. The fourth user has just retrieved a file from the permanent file system, and even though this took almost a minute because of the heavy load, he seems satisfied with this kind of response for this type of request.

There is not much information here, but repeated observation of terminal use does give one some background feeling that makes response time averages and distributions more meaningful. There are some obvious truths about the response
time of time-sharing systems that sometimes need to be reinforced. One such obvious truth is that whether or not response time is satisfactory depends on the nature of the application. Even when internal data shows that average response time is poor, it is possible, indeed even probable, that most users who are creating and editing small job files and occasionally submitting jobs to be run are getting better than adequate response. At the same time the student who is trying to complete a CAI (Computer Assisted Instruction) assignment in a half-hour segment of real time may find the response intolerable.

3. The system console. Among the innovations in large computer design introduced by the CDC 6000 systems, one of the most interesting was the use of a twin cathode ray tube console driven by display programs in the system's peripheral processors. These displays permit constant monitoring of the performance of the system. A major purpose is to give the operator information that will permit him to intervene intelligently in the operation. The amount and types of operator intervention that are required in the running of a computing system, and the nature and extent of the information that is immediately available to the operator, are factors that should be taken into account in an evaluation of system performance. This is a type of performance evaluation data that must at least partially be obtained by observing the operator and the displays.

There are many specialized displays available to the operator, but most of the data necessary for following the principal activities of the system are presented on a few major displays. Thus a single display shows the status of each active process (see Chapter 2) and of each processor. It shows the degree of multiprogramming, the activity of the preempt-resume scheduler, the activity of the processors as the job mix changes back and forth between processor limited and input-output limited, the activity of the channels, and thus indirectly the activity of the I/O devices. Another display shows the activity of the time-sharing subsystem, including the status of each terminal, and the switching of process time among them. Another shows the status of the system queues, and the availability of peripheral storage space on public disc storage units. Several displays show the current values of parameters that can be changed directly from the console keyboard in response to unusual loading or queuing conditions. Changing these parameters introduces changes in some of the scheduling strategies of the system. For example, they might increase or decrease the central memory space allocated to jobs submitted through the terminal system, or they might increase or decrease the priority given to jobs as their resource usage approaches their maximum allocation. Experience has shown that caution and restraint should be used in making any strategy changes in a system while it is running under a heavy production load. Changes often have unanticipated side effects, and may worsen the situation they were meant to improve.

Direct observation of the system displays provides a great deal of data, but the interrelationships among the data are not at all obvious. The observer can easily be misled into believing that he understands more about the system than he really
does. Personal observation is important for understanding and evaluating systems, but it presents only a small part of the total performance picture, of which other parts are discussed in subsequent chapters.
CHAPTER 4
Accounting Systems

4.1. Introduction. This chapter has a twofold purpose. The first is to explain and illustrate one of the important sources of data and information about the performance of computing systems. The second is to discuss some general problems concerning the interpretation and validity of data. This latter discussion applies equally to sections on data collection and analysis in other chapters. Here and in subsequent chapters we shall discuss some specific data gathering tools, and give examples of data from the Purdue MACE operating system running on Control Data 6000 series equipment. Data and techniques for obtaining data about specific systems are of little general interest in themselves. The samples of data presented in these chapters are not presented to give the reader information about the performance of this particular system. Their purpose is only to enhance the text by providing examples of kinds of data that may be of interest, and of some of the problems involved in collecting, understanding and verifying such data.

4.2. Validity of data. All data that is collected through the observation of complex systems should be approached with skepticism. I recently came across an interesting comment attributed to Sir Arthur Eddington [CH74] to the effect that "you cannot believe in astronomical observations before they are confirmed by theory". A similar statement could be made about observations that produce data in many other fields. For the computer field I would paraphrase Eddington's remark to state that you cannot believe in data about the performance of large computing systems unless they can be explained in terms of a conceptual model of the system. Even relatively simple performance statistics can be misleading. The physical sciences place a great deal of emphasis on the reproducibility of experiments. In matters of any importance, experiments are repeated in a number of laboratories by different groups, sometimes using different techniques, collecting data that may eventually establish a high level of confidence in (or possibly disprove) earlier results. The data that we deal with in computer performance studies are rarely of the type that will be confirmed by other independent studies. Decisions about the validity of data are usually based solely on the judgement and integrity of individual investigators.

4.3. Understanding the system. In order to understand and interpret data concerning the performance of a computing system, it is necessary to understand
the system itself at an appropriate level of depth and detail. In some cases a performance analyst with only a superficial understanding of a system can come up with important insights about the reasons why the system is not performing up to capacity. We have already discussed situations in which simple observations of the system in production can suggest important system improvements. Simple observations of routine data on the system can sometimes be similarly useful. Data on equipment utilization that show a printer or a disc unit or a channel consistently underutilized can suggest changes that result in better system balance and better system performance. System bottlenecks, especially very bad ones, can sometimes become apparent through relatively simple observation and analysis.

There are many aspects of performance that cannot be studied and understood without a detailed understanding of the system. It is usually not possible to achieve this type of understanding without a good deal of very specific knowledge of, and experience with, the system being studied or systems very much like it. Such knowledge may only be available to people who have worked very closely with the system, to high level system designers and implementers, and to performance evaluation experts associated with the designers and implementers. A performance evaluation group working for a computer manufacturer might be the only group competent to do an accurate evaluation of the performance of that manufacturer's hardware and software system. Data and evaluations produced by groups that have a vested interest in the sale of the system they are measuring will almost always show some bias, if only in terms of the selection of data that is to be presented, and the context in which it is presented. System development projects at universities and other research centers are often under great pressure to publish their accomplishments in the public literature, and there is frequently a tendency to publish data in such a way as to enhance the reputation of those engaged in a system development project, even when the performance of the system under development falls short of the system's original goals.

4.4. Accounting and billing systems. Accounting and billing are major problems in large general purpose computing systems. In a multiprogramming system the amount of time that a job occupies central memory and peripheral storage space, and the amount of processor time used, may vary significantly in different runs of the same job, depending on the resource requirements of the other jobs that are active while it is being run. It is not at all obvious just how to create a billing system that is fair to all users, and a considerable literature exists in this area that is not of direct interest here. At least partially as a result of these problems, every major operating system contains an elaborate accounting subsystem that keeps detailed records about the utilization of system resources. As a minimum, in a typical system, each job creates one or more accounting messages containing information about the resources that the job has used. These messages are saved in an accounting file that serves as the input to billing and accounting runs. In the standard IBM 370 systems the subsystem that produces the resource utilization records is known as the System Management Facility (SMF) [IB73]
which is itself an optional component of the operating system and which contains numerous options that determine which data are to be included. In the CDC systems from which we shall draw our examples, the file that holds the raw accounting records is called the "accounting dayfile." Other terminology is used in other systems, but in almost every case the data that is produced can provide a great deal of information about the performance of the system, including the information contained in standard billing and accounting reports, but often including a great deal more. There is almost no limit to the amount of summary data and the number of reports that can be produced from this data. Here, as in any large scale data collection and data reduction application, the power and speed of the computer and its printers can be a mixed blessing, since it is so easy to produce frequent, voluminous, and often superfluous reports.

4.4.1. Daily summary report. One of the most important measures of performance is the actual volume of production achieved in the day-to-day operation of the system. A short daily summary report can be very useful for keeping management and systems personnel informed about a number of performance parameters, and for making them aware of difficulties and anomalies that may occur. Weekly and monthly and other regular reports may also be useful, in the same way as they are useful in any large production operation, for management and control purposes. Here we shall only consider an example of a daily report, keeping in mind the fact that the management group that gets this report receives a similar report every day, and also receives periodic summary reports. A very brief glance is usually enough to determine if anything unexpected happened during the previous day.

Tables 4.1 and 4.2 are two pages of a report that summarizes one day's operation (April 22, 1974) of the academic computing center at Purdue University. These tables are typical of this kind of report. They are not meant to show the results of a typical day's operation, since in this kind of system the computing load shows very marked daily and seasonal variations. A quick glance at this report by one who receives them regularly would indicate that this was a day in which total production was unusually heavy, but that nothing unusual or unexpected happened during that day.

At the beginning of the report there is a "dead-start history" for the two major computers, a CDC 6500 and a CDC 6400 that run under the Purdue Dual MACE system. Dead starts correspond to IPL's (Initial Program Loads) in IBM systems, and to warm starts (and possibly cold starts) in a number of other systems. The data here reveals that there were no unscheduled dead starts, i.e., no system crashes and recoveries on April 22. A study of these reports for the whole month of April showed that on the average there was one system recovery per day. Such recoveries require that jobs that are active at the time of the crash be restarted. Other jobs are usually unaffected. Recoveries of this type usually go completely unnoticed by those who have submitted batch jobs to the system. They are, of course, noticed by on-line users, and by anyone waiting to submit a job at the time the crash occurs.
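Before turning to the report itself, the accounting messages described above can be pictured concretely. The following is a minimal sketch, in C, of the kind of per-job accounting message such a subsystem might append to its accounting file. The field names and layout are purely illustrative; they are not the actual MACE dayfile or IBM SMF record formats.

    /* Hypothetical per-job accounting message.  The field names and layout are
       illustrative only; they are not the actual MACE dayfile or SMF formats. */
    #include <stdio.h>
    #include <time.h>

    struct acct_record {
        char   job_name[8];     /* job identifier                       */
        char   account[8];      /* account or charge category           */
        time_t start, end;      /* wall-clock start and end of the job  */
        double cp_seconds;      /* central processor time charged       */
        double io_units;        /* input/output transfer units          */
        long   lines_printed;
        long   cards_punched;
    };

    int main(void)
    {
        struct acct_record r = { "JOB0042", "5000", 0, 0,
                                 12.01, 790.4, 1532, 0 };
        r.start = time(NULL);
        r.end   = r.start + 37;                      /* pretend the job ran 37 seconds */

        FILE *dayfile = fopen("dayfile.acct", "ab"); /* append, like a growing log     */
        if (dayfile == NULL) { perror("dayfile"); return 1; }
        fwrite(&r, sizeof r, 1, dayfile);            /* one message per completed job  */
        fclose(dayfile);
        return 0;
    }

A billing run, or a report generator of the kind discussed below, is then simply a program that reads this file back and summarizes it.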
TABLE 4.1
Summary of one day's operation of a computing centre
[Purdue University CDC 6400/6500 dayfile summary for 04/22/74. The report gives the dead-start history and dayfile coverage for the two machines (20.709 hours covered on the 6500, with 18.602 production hours and 2.107 hours lost; 20.208 hours on the 6400, with 18.111 production hours and 2.097 hours lost), the combined machine performance history (9557 jobs completed, 30.865 CP hours, 55.80 percent central processor utilization, 805.057 terminal connect hours, 26.775 plot hours, 4,453,910 lines printed, 71,004 cards punched), and a job history (jobs, CP hours, I/O units, line and plot hours, lines, cards) broken down by account category (1000 through 9000 levels and overhead). Detailed entries are not reproduced here.]
TABLE 4.2
Summary of one day's operation (cont.)
[Hourly job termination distribution for 04/22/74 by job origin: campus-site batch, remote-site batch (Ft. Wayne, Westville, Hammond, Rose Poly), PROCSY print and terminal-print jobs, interactive console jobs, and unclassified jobs, with CP and I/O hours for each origin; followed by hourly system CP and I/O statistics for the 6500 and the 6400. Totals: 55.124 maximum possible CP hours, about 32.23 CP hours used, 58 percent CP utilization, 23.435 I/O hours. Detailed entries are not reproduced here.]
I stated that there were no recoveries on April 22, and I have a very authentic computer printout to prove it. However, there were 872 terminal sessions that day, and if you talked to a user at a terminal you might get a slightly different report. There are a whole host of problems that could have occurred in the terminal itself, in the communication lines, in the front end computer, in the time-sharing process, etc., etc. If any of these problems occurred while he was logged on, he might insist that the system had crashed even though the data presented here showed it had not.

4.4.2. Central processor time. There is no point to discussing all of the types of information contained in the report in Table 4.1. We shall discuss a few items because of some insight that they give into problems of accuracy and validity of data. The most important of these is central processor utilization. The combined system has three central processors. Two of these are part of the 6500 system, sharing the same central memory. The 6400 system is a separate computer with its own central memory and its own central processor and peripheral processors and channels. Thus there are really two distinct systems of the type illustrated in Fig. 2.1. In normal operation the 6500 runs two time-sharing processes and handles all on-line and remote computing activities, and the two systems share in the processing of user jobs. Measurement and analysis efforts are complicated by the fact that the two systems share a number of peripheral storage devices, and each may thus cause access delays for the other.

Table 4.1 shows that 30.865 hours of central processor time were charged to various account categories on April 22. There is an overhead category that accumulates central processor time used by operating system tasks and not charged to users. For example, the job that produced this accounting report is an overhead job. Scheduling and some other activities of the time-sharing processes contribute to the accumulation of overhead central processor time. A dayfile analyzer program, which will be discussed later in this chapter and which is used to get more detailed reports about system activity, is not an overhead job. It is a 9000 level job, a category that represents computing center overhead, but not operating system overhead.

The summary in Table 4.1 provides an adequate report of central processor utilization for all of the practical purposes for which the report will be used. However, there is some central processor utilization that is left out. The supervisor process is called at a very high frequency (on the order of 1000 or more times per second) to execute central processor executive functions for the other processes and processors in the system. Many of these functions are so short that it would add significant overhead to each occurrence to account for their processor use. We know through the use of the software probe that will be discussed in Chapter 5 that somewhere between 6 and 10% of the central processor time available is used for these functions, and that this represents central processor utilization that is not included in the data in Table 4.1.

There is another central processor utilization figure on the second page of this report (Table 4.2) that comes from a quite different source. Raw total central
processor usage and I/O unit transfers are accumulated in locations in central memory, and the amount accumulated is periodically written out to the accounting dayfile. Amounts are printed roughly at hourly intervals to give information about fluctuations in the usage of these resources in the course of the day. The total central processor time recorded here is about an hour and a half more than that recorded in Table 4.1. Is this a measure of the time used by the supervisor process for its executive functions? The answer is that some but not all of this central monitor time is included here, but there are other things as well. The central processor accounting in Table 4.1 is for all completed jobs for which a billing record has been produced. There are jobs that are started, that use a measurable amount of processor and I/O resources, and that for one reason or another are rolled back and either dropped or restarted. This will be especially apparent on days in which a system failure occurs that requires fairly major recovery procedures. In some types of recovery, all jobs that were active at the time of recovery have to be rolled back (i.e., restarted). The raw CP and I/O statistics in Table 4.2 will include the time they used prior to rollback. The job time statistics in Table 4.1 will not.

Depending on which of the two tables one uses, the central processor utilization on April 22 was 56% or 58% of the total central processor time available. An auditor, or a performance analyst, who sees a collection of these daily reports would see that the central processor utilization rarely goes any higher than that recorded here. Does this mean that there is a great deal of unused capacity that would be available for handling peak loads? This is an example of the type of conclusion that is sometimes drawn from superficial examination of system performance data. If one could define, and then determine, the capacity of the system, it is not obvious what the level of central processor utilization would be when the system was running at its full capacity. It might very well be 60%, and attempts to push it above that level might merely aggravate congestion and queuing problems in other components of the system.

An example of the danger of equating efficiency with central processor utilization appeared in [SC70], in which Schwetman reported the results of measuring the running time of the same set of jobs under two different operating systems on a CDC 6600 at the University of Texas. The clock time for running the set of jobs was about half as long in the UTEX 1 system as it was in the system that it replaced, even though central processor utilization was over 95% in both systems. This was central processor time charged to the jobs themselves, not to system overhead. A careless analyst might have assumed that the 95% processor utilization achieved in the earlier system was about as good as could be expected, and that the system was running at its capacity. The central processor utilization data could not and did not give any information about the inefficient and wasteful use of processor time in the run time environment provided by the earlier system.

4.5. Dayfile analyzer. The daily summary report shown in Tables 4.1 and 4.2 is just one of many reports that can be produced from the accounting dayfile data, or
TABLE 4.3
Example of typical dayfile analyzer output
[Distribution report for 04/22/74 whose primary parameter (P5) is central processor time in seconds. The 8679 jobs are grouped into classes with upper bounds 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256 and 512 seconds, plus a class for longer jobs; for each class the report gives the number of jobs, the CP time used, and the corresponding percentages and cumulative percentages. Secondary (S) tables give the distribution over the same classes of core utilization (MWD per second), final field length, in-core time and I/O count, followed by summary statistics such as the mean CP time per job (12.01 seconds) and its standard deviation (61.55 seconds). Detailed entries are not reproduced here.]
TABLE 4.4
Dayfile analyzer output (cont.)
[A further distribution report over the same CP-time classes. The secondary tables here give the rollout count, jobs that did not reach a second pass, the ratio of in-core time to elapsed (JA-JB) time, and the ratio of CP time to in-core time, each with summary statistics. Detailed entries are not reproduced here.]
TABLE 4.5
Dayfile analyzer output (cont.)
[Distribution report whose primary parameter is the percent of the job-card time limit actually used, in classes of 10 percent up to 110 percent and above, giving for each class the number of jobs and the CP time used. The mean percent of time limit used is about 20 percent. Detailed entries are not reproduced here.]
from analogous data on other systems. Programs have been developed at many computer installations that collect and display and analyze this type of data. An example is the "dayfile analyzer" program that was developed at Purdue University by William C. D'Avanzo for use as a research tool in performance measurement and as a source of information for Computing Center management. The dayfile analyzer is a data reduction program that scans the accounting dayfile, and produces a series of tables and reports. It is a rather long-running program of the type that can easily produce more information than one can digest on a regular schedule; hence it is only run on demand when something unusual occurs, or when a special study is under way.

Some of the output of a dayfile analyzer run for April 22, 1974 is presented in Tables 4.3 to 4.6. April 22 is the same day for which the daily summary report is presented in Tables 4.1 and 4.2. Daily reports toward the end of April indicated that the 22nd would be the peak computing day of the Spring semester. The dayfile analyzer program was run that day in order to obtain information about the performance of the system under a very heavy load.

The bulk of the output of the dayfile analyzer is a series of reports showing the distributions of selected performance parameters. Examples of these reports are shown in Tables 4.3, 4.4 and 4.5. Table 4.3 is a typical distribution report whose primary parameter, P5, is central processor time. The 8679 jobs are divided into classes according to the number of central processor seconds used. Note that the 9,557 jobs indicated in Table 4.1 included 878 terminal sessions that are not included here. The classes are 0 to 0.5 sec., 0.5 to 1 sec., 1 to 2 sec., 2 to 4 sec., etc. The two tables labeled P5 show the distribution of jobs in these classes, and the distribution of central processor time in these classes, showing actual counts or quantities, and percentages and cumulative percentages. The S or secondary tables show the distribution of other parameters into these same classes. Thus, 13.3% of all input/output transferred was for the 486 jobs that used between 16 and 32 central processor seconds, and 9.5% of all input/output was done for the 16 jobs that used more than 512 seconds of central processor time. At the bottom of the table there are summary statistics that show means and standard deviations, etc. Thus, for example, the average amount of central processor time used by a job was 12.01 seconds, with a standard deviation of 61.55 seconds. This very large standard deviation, relative to the mean, is typical of service time distributions encountered in many computer performance studies.

Table 4.6 is a summary report that gives a great deal of information about the total resources used by the system, and about the amounts of system resources used by the average job. The dayfile analyzer reports constitute one of the very important sources of information about the workload and the performance of the system. The tables presented here show only a small fraction of the total information contained in a full dayfile analysis. There are many uses for this kind of information. A few examples will be discussed here. Some others will be mentioned in later chapters.
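The heart of a distribution report of this kind is a simple class-binning pass over the per-job accounting records. The following is a minimal sketch, in C, of that kind of data reduction, using the doubling class boundaries of Table 4.3. The input format, one CP-seconds value per job on standard input, is an assumption made for the example and is not the actual dayfile format.

    /* Sketch of the class-binning at the heart of a distribution report such as
       Table 4.3: jobs are grouped by CP seconds into doubling classes
       (<0.5, 1, 2, 4, ..., 512, and above).  Input format is assumed. */
    #include <stdio.h>

    #define NCLASS 12
    static const double bound[NCLASS - 1] =
        { 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512 };

    int main(void)
    {
        long   njobs[NCLASS] = {0};
        double cptime[NCLASS] = {0};
        long   total_jobs = 0;
        double total_cp = 0.0, t;

        while (scanf("%lf", &t) == 1) {          /* one value per completed job      */
            int c = 0;
            while (c < NCLASS - 1 && t >= bound[c])
                c++;                             /* find the class this job falls in */
            njobs[c]++;    cptime[c] += t;
            total_jobs++;  total_cp  += t;
        }
        if (total_jobs == 0 || total_cp == 0.0)
            return 0;

        double cum_jobs = 0.0, cum_cp = 0.0;
        printf("%8s %8s %6s %6s %12s %6s %6s\n",
               "class", "jobs", "pct", "cum", "cp-seconds", "pct", "cum");
        for (int c = 0; c < NCLASS; c++) {
            cum_jobs += njobs[c];
            cum_cp   += cptime[c];
            printf("%8.1f %8ld %6.1f %6.1f %12.1f %6.1f %6.1f\n",
                   c < NCLASS - 1 ? bound[c] : 9999.0,   /* 9999 marks the ">" class */
                   njobs[c],
                   100.0 * njobs[c] / total_jobs,
                   100.0 * cum_jobs / total_jobs,
                   cptime[c],
                   100.0 * cptime[c] / total_cp,
                   100.0 * cum_cp / total_cp);
        }
        return 0;
    }

The same pass, repeated for each secondary parameter, produces the S tables; the expensive part of the real program is extracting and validating the fields from the raw dayfile records, not the binning itself.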
TABLE 4.6
Dayfile analyzer summary statistics
[Percent usage by job type (PROCSY, regular batch, other batch, export, console, other) of CP time, central memory, I/O units, in-core time, rollout time, rollout count, lines printed and jobs, with overall totals; followed by miscellaneous statistics by job type, including mean number of CPUs and mean core in use, I/O units transferred per minute, rollouts per minute, jobs processed per minute, and mean CP time, central memory, I/O units, in-core time, rollout time and number of rollouts per job. Detailed entries are not reproduced here.]
4.5.1. Applications of dayfile analyzer data. It is often typical of this kind of system that large amounts of system resources are consumed by a relatively small number of jobs. The extent to which this is true on any particular day can easily be deduced from the dayfile analyzer report. Thus the report for April 22nd gives data as shown in Table 4.7.

TABLE 4.7
Resources used by jobs that show highest resource utilization

                              Percent of jobs    Percent of utilization
Central processor time             20.5                  87.3
                                    7.8                  70.2
Input/Output count                 15.4                  70.5
                                    5.9                  56.8
Print lines                        23.0                  66.3
                                    9.3                  42.7

This data indicates the extent to which resource utilization can be controlled by concentrating on large jobs (a sketch of how such figures are derived from the per-job records is given at the end of this subsection). As part of its output, not shown in the tables reproduced here, the dayfile analyzer also produces a list that contains detailed information about all jobs whose resource use in any one of a number of categories exceeds certain specified limits. This information is very important in the interpretation of all other data concerning that day's operation, since a single job, or a small set of jobs, can have a great deal of impact on the total performance of the system, and can distort the average values of most of the measured parameters.

Computing center policies aim to give fast turnaround to small jobs at the expense of large jobs. Dayfile analysis results have been used in setting scheduling parameters that implement this policy, and then in determining the extent to which these policies are successful. The preempt-resume scheduling techniques that are used to implement such policies introduce considerable system overhead that is reflected in the number of times jobs are rolled out of memory and rolled back in. Dayfile analysis data about the number of rollouts and the amount of data involved permits us to monitor this problem, and to adjust the scheduling parameters to keep rollout activity within reasonable bounds.

One of the parameters used by the system job scheduler is the user's estimate of the maximum amount of central processor time that his job is going to use. Table 4.5 suggests that these estimates are not very accurate. On that day the average job used only about 20% of the time indicated on the time limit field of its job card. This type of data casts some doubt on the usefulness of this parameter in estimating the job's central processor time requirements. However, Table 4.5 also shows that more than 50% of all central processor time is used by jobs that actually used more than 50% of their estimated time.
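Figures of the kind shown in Table 4.7 amount to nothing more than a threshold computation over the per-job records. A minimal sketch in C follows; the 8-second threshold and the input format (one CP-seconds value per job on standard input) are assumptions made for the example, not part of the actual dayfile analyzer.

    /* Sketch of how figures like those in Table 4.7 can be obtained from per-job
       records: for a chosen threshold, what fraction of the jobs exceed it and
       what fraction of the total resource they account for. */
    #include <stdio.h>

    int main(void)
    {
        const double threshold = 8.0;      /* e.g., jobs using 8 or more CP seconds */
        long   jobs = 0, big_jobs = 0;
        double total = 0.0, big_total = 0.0, t;

        while (scanf("%lf", &t) == 1) {
            jobs++;  total += t;
            if (t >= threshold) { big_jobs++; big_total += t; }
        }
        if (jobs == 0 || total == 0.0)
            return 0;

        printf("jobs using >= %.1f CP sec: %.1f%% of jobs, %.1f%% of CP time\n",
               threshold,
               100.0 * big_jobs / jobs,
               100.0 * big_total / total);
        return 0;
    }

The same pattern, applied to I/O units or print lines, or with the threshold expressed as a fraction of the job's time limit, yields the other figures quoted above.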
TABLE 4.8
Distribution of FORTRAN compile time (CTIME) in seconds

Class (sec)              <0.5    2.0    3.5    5.0    6.5    8.0    9.5   11.0   12.5   14.0   15.5      >     SUM
No. of items/class       2317   1270    320    172    113    100     33     72     41     42     28    128   4,636
Percent in each class    50.0   27.4    6.9    3.7    2.4    2.2    0.7    1.6    0.9    0.9    0.6    2.8
Cumulative percentages   50.0   77.4   84.3   88.0   90.4   92.6   93.3   94.8   95.7   96.6   97.2  100.0

Total for this class    488.0 1344.4  873.4  717.9  654.9  724.7  286.7  738.2  474.8  554.8  407.7 4228.0  11,494
Mean for this class       0.2    1.1    2.7    4.2    5.8    7.2    8.7   10.3   11.6   13.2   14.6   33.0
Percent in each class     4.2   11.7    7.6    6.2    5.7    6.3    2.5    6.4    4.1    4.8    3.5   36.8
Cumulative percentages    4.2   15.9   23.5   29.8   35.5   41.8   44.3   50.7   54.8   59.7   63.2  100.0

Mean = 2.5    Standard deviation = 7.1    Min = 0.0    Max = 125.3    C.V. = 2.86
4.5.2. Additional data. There is always some conflict between the performance analyst's desire for as much information as possible, and the need to minimize the impact on the production system by limiting the time and space requirements of the data-gathering tools. A typical system will give the user information concerning the processing of every control card, and the duration and resource utilization of each job step. Not all of this information is necessary for accounting purposes, but it could be accumulated for the purposes of performance analysis, at least on occasions when special studies are to be made. Table 4.8 is an example of data concerning FORTRAN compile time that is not normally accessible through the accounting dayfile analyzer program, but that can be obtained, with some special effort, by a resourceful analyst who can obtain the cooperation of the operating staff of the computing center. For the particular day in which the data of Table 4.8 was collected, the table shows that half of the FORTRAN compilations were done in less than 0.5 seconds, and that only about 5% of the compilations took more than 10 seconds. It also shows that here again the standard deviation of the compile time distribution was very large relative to the mean.

4.6. Consistency of data. Even some of the most basic units of performance may change their meaning as new operating systems and new versions of operating systems are installed. Comparisons of performance between different systems, and over considerable periods of time for the same system, should always be suspect. It has already been pointed out that the amount of processor time charged to a job may vary from run to run under the same system, and may vary very significantly under different operating systems on the same hardware system. The definition of a job may be quite different in different systems, and the use of number of jobs run as a measure of system productivity may lead to erroneous conclusions. For example, a terminal user in the Purdue MACE system may access a "permanent" file by typing a GET command. Before 1972, the GET command caused a short job to be generated and inserted into the input queue with very high priority. The job would move the selected file from permanent storage to working storage, and the terminal user could then proceed. Some other interactive functions, including all other permanent file commands, were also handled through this technique of generating small jobs. A new version of the operating system, installed in the summer of 1972, provided for the direct execution of these functions by a time-sharing process of the type described in Chapter 2. As a result, all statistical summaries showed a significant decrease in the number of jobs, and a significant increase in the size of the average job. It would be easy for someone who was unaware of the significance of the 1972 system change to draw incorrect conclusions about trends in the number and size of jobs from an analysis of year-to-year changes in these statistics.
CHAPTER 5
Software Probes

5.1. Introduction. The data discussed in the preceding chapter is summary data, produced by system programs at well-defined times that mark the beginning or the end of a job or part of a job. Each statement in the job control (control card) language produces one record or a small packet of records that are placed in the accounting log or dayfile. There is no detailed record kept of the execution of job processes, or of the supervisor process itself. It would be possible to collect a great deal more performance information in the accounting log than is required for accounting, but this approach to system measurement can introduce intolerable overhead into the system. A more practical approach is to use the methods of Chapter 4 to study the more general aspects of system behavior, and to use system probes to fill in the missing details.

A system probe is a device that collects information produced during the execution of job processes and system processes. The data collected by the probe provides the input for data reduction and analysis and evaluation routines. These routines can be running along with the probe as part of the probe process, but in most cases it is more practical to use the probe simply as a data gathering device, and to perform the reduction and analysis in separate runs that do not form part of the load on the computer while the probe is in operation. Probes can be hardware devices, or they can be implemented as software that runs along with, or as part of, the system that is being measured. Probes are sometimes referred to as "system monitors" in the performance evaluation literature. I prefer to avoid the use of the word "monitor" in connection with software probes, since "monitor" has been used with other meanings in various software systems. This chapter will deal only with software probe techniques, and will present some examples to illustrate the kinds of data and the kinds of performance information that can be obtained.

5.2. Sampling probes. Software probes fall into two general categories: sampling probes and tracing probes. A sampling probe attempts to obtain random samples of values of variables that are of interest to the performance analyst. These samples can then be used to obtain estimates of the actual distributions of these variables, their means, standard deviations, etc. The techniques and results of statistical sampling theory can be used to test the significance and consistency of the results. A typical sampling probe is given control of the system periodically for very short intervals of time. During such an interval the probe takes a "snapshot" of the
system, i.e., it reads the values in a number of significant registers, and the state of a number of significant flip-flops. The result of each snapshot is an output record containing one sample value for each of the parameters of interest. The file of output records provides the input to the statistical analysis programs associated with the probe.

As an example, the output record might contain information as to which input/output channels were busy at the time each snapshot was taken. If the sample is large enough to give a good estimate of the probability that a given channel was busy at an arbitrary time during the run, this probability will then provide a good estimate of channel utilization. The snapshot would normally include the address of the location in memory of the instruction that was being executed when the snapshot was triggered. In systems in which the supervisor and other permanent processes occupy fixed areas of memory, a frequency distribution of these addresses can be used to obtain an estimate of the fraction of total time the central processor spent in the permanent processes, and the fraction spent in connection with user jobs. It is thus possible to estimate processor idle time and system overhead, and to break system overhead down into its components.

Snapshot techniques, and all other probe techniques, are not limited in their application to the study of the performance of permanent or supervisor processes. The same techniques can be used very effectively to study the performance of all kinds of programs: language processors, large applications systems, and any other jobs that are run frequently and that use enough system resources to warrant some effort devoted to studying and improving their performance.

The sampling probe may be triggered periodically by a system clock, or it may be triggered by a recurring system event. In either case care must be exercised to make sure that the samples are indeed representative random samples of the parameters that are studied. A major advantage of the sampling probe over the tracing probe, that will be discussed below, is the fact that a sampling probe can usually be installed with little or no change to the operating system of the computer whose performance is being measured. For this reason the software probes used by commercial performance evaluation companies are sampling probes. Sampling probes can provide a great deal of information about the performance of a system at a relatively low cost [KO71].

The rest of this chapter will be devoted to a discussion of tracing probes and some examples of their use. Such probes are less generally available as performance measurement tools than sampling probes. However, where they are available they can give much more information, and more detailed information about the behavior of a system than can be obtained by sampling probes. Tracing probes are also more general, since a tracing probe can be used as a sampling probe through the implementation of appropriate probe routines. There are many problems associated with the use of a tracing probe, and with the interpretation of the data it provides, but it is worth spending considerable effort on solving these problems, since data obtained through the use of software
tracing probes can lead to insights into the operation of a computing system that are not obtainable in any other way.

5.3. Trace programs. Trace programs were among the earliest types of debugging aids available for programmers of high speed digital computers. A trace records data about the contents of some critical registers before or after the execution of each instruction, and produces an output record for each instruction executed. The use of trace programs fell into disrepute in early computer generations because trace routines frequently produced large volumes of output that overtaxed available printing capacity. Basic trace programs are also inherently very slow. They are simulators of the machine whose programs they are tracing, and execute those programs interpretively. Selective trace programs that trace only specified areas of a program, or only specified types of instructions, helped speed up the tracing process and cut down the output. Hardware aids on various machines, for example, the Trap Transfer instructions on the IBM 704 in the 1950's and the Program-Event Recording Facility on the IBM 370 in the 1970's, have helped to rehabilitate the tracing concept, as has the incorporation of tracing facilities in higher level languages. The purpose of the hardware features is to permit parts, or selected instruction types, of programs to be traced while the program itself is running at or close to the normal speed with which it would run in the absence of the trace.

5.3.1. Tracing the supervisor process. An instruction by instruction trace of the supervisor process provides, in effect, a detailed simulation of a multiprogramming system. As in the case of all interpretive traces, the execution is very slow and the output can easily become excessive. In cases where timing information is not important, the output can yield a great deal of performance information. A detailed trace, the Instruction Trace Monitor (ITM), was used in connection with IBM's TSS 67. Even though special hardware was introduced to aid in the accumulation of the trace data, it was "totally impractical to use ITM in a 'user' environment." This quotation is from [DE69], where it is further pointed out that the data collected has been used for studies of addressing patterns in virtual memory.

5.4. Tracing probes. The tracing probe that we shall discuss is a mechanism for providing a detailed record of the executive functions executed by the supervisor process, while avoiding some of the slowdown and overhead problems associated with tracing. It represents an attempt to produce an event trace of the supervisor process while the system is executing at normal speed in a production environment. In order to discuss some of the concepts and problems involved in the use of a tracing probe we shall look at a specific software probe that functions in connection with the Purdue Dual MACE operating system. As mentioned in Chapter 4, the data presented here are presented only to illustrate the use of the techniques, and the kinds of data that can be obtained, and the kinds of questions
that might be answered through the use of such data. These data are not included here to provide information about the performance of this particular system. In fact, partially as a consequence of the data gathered by this and other techniques, the system has been modified and improved in ways that make most of the data previously gathered obsolete as indicators of system performance. This situation has already been alluded to in Chapter 4. It is very difficult to obtain a valid comparison of system performance over a time period that includes significant system changes.

The software probe program from which we shall draw our examples was implemented by Joel Ewing at the Purdue University Computing Center. Details are contained in Computing Center documents and in [EW75]. Vic Abell designed the supervisor process for the Dual MACE system [AB74], including the features and facilities necessary for the implementation of efficient data-gathering tools. Herb Schwetman designed and implemented data reduction and data presentation programs used with the software probe.

5.4.1. Supervisor process. Since the supervisor process controls all of the programs running in the system, it must itself be actively involved in any attempt to trace its execution. Thus, in the particular system from which we are taking our examples, and in other similar systems, the supervisor process itself produces the trace information and supplies event records to a special probe process. Each event record corresponds to the execution of a supervisor executive function (also called a central monitor function), and contains the time at which the function was performed, the function type, and data extracted from a number of relevant registers and storage locations. These records in time sequence provide an almost complete and detailed picture of the operation of the system, since essentially all significant events that occur in the system involve one or more function requests to the supervisor process. Data extracted by the tracing probe provide the raw material for data reduction and analysis programs that give a great deal of information about the performance of the system. They can also provide input for controlled "trace-driven" experiments, either using the real system or models of the system at various levels of detail.

5.4.2. Supervisor process overhead. It takes time for the supervisor process to produce the event records. This is a necessary overhead cost of the data gathering facility, and the data analyst should be aware of the extent to which this overhead cost perturbs the system that is being measured. It would not be reasonable to pay this overhead cost during times when no data-gathering process is in operation. The code that produces the event records may be part of a special version of the supervisor process that is only used when a probe process is running. Alternatively, there can be a single supervisor process that is modified at the time a probe process is activated, and that can be restored to its original condition when the probe process is removed. This latter approach is used in the supervisor in the
Purdue system, in which a number of switches are set when the probe process is installed. There is actually a small residual overhead associated with testing these switches when the probe is not present. This overhead ranges between 1 and 7 microseconds per potential event record, and so far at least it has been considered to be negligible, or at least tolerable. The increase in supervisor overhead time when a probe process is active and event records are being produced is roughly 50-55 microseconds per event record. This is considered to be relatively small in view of the multi-millisecond nature of most of the phenomena that are being studied. These numbers and this conclusion are of course specific to one system. The overhead involved in producing event records could be very significant in connection with the analysis of some other types of activities on other systems.

The probe process provides an area of memory that serves as an event record buffer. This central memory buffer receives inputs in the form of event records from the supervisor process. The probe process must remove these records from the buffer at least as fast, on the average, as the supervisor process delivers them. This usually involves writing the event records (or a selected or modified subset) to peripheral storage. Either disc or tape storage could be used, depending on the volume of data and on the configuration of the system. For the purposes of this discussion we shall assume that the data is written on magnetic tapes. The rate of record generation can be very high. In the Purdue system, on a computer that is not extremely fast by present-day standards, records, each containing several 60-bit words, are generated at a rate of about 1,000 per second.

The design of the probe is a compromise. The probe would lose much of its value if it used too many system resources. Thus, the size of the record buffer must be kept as small as possible, consistent with adequate performance of the probe. It is necessary to consider what happens if the supervisor process generates a record and finds that the record buffer is full and there is no place to put the new record. There are two alternatives. The supervisor may simply stop and wait until the probe process makes additional buffer space available. It is assumed, of course, that the system is designed in such a way as to permit the probe process to continue even though the supervisor is temporarily blocked. This approach was taken in the SIPE system that is described in [DE69]. In connection with SIPE, Deniston points out that "SIPE recordings are not normally made during a test run intended to measure absolute system performance." In more recent years most software probes, including IBM's GTF [IB72] for their 360 and 370, and the Purdue probe discussed here, permit the supervisor process to go on at its normal pace, "losing" all the event records it generates until the probe process frees up buffer space and indicates that it can again accept event records. The first event record that the probe process then gets will be a "lost data" record that tells how many records were discarded. There are very serious and difficult data analysis problems that arise because of the possibility of lost data, and the probe would become almost useless if the lost data situation occurred very frequently. When using this approach there is usually an implicit assumption that the event buffer is large enough and the probe process input-output handling is fast enough to make lost data conditions infrequent.
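As a concrete illustration of the non-blocking policy just described, the following C sketch shows one way the record-producing path could be organized. It is only a sketch under assumed names and sizes (event_rec, BUF_SLOTS, the lost-data type code); it is not the actual Dual MACE supervisor code, which was of course written for CDC hardware and not in C, and it ignores the real supervisor/probe synchronization details.

    /* Illustrative sketch, not the actual Dual MACE code: a fixed-size event
     * buffer shared by the supervisor (producer) and the probe process
     * (consumer).  When the buffer is full the supervisor discards records and
     * keeps running; a "lost data" record reporting the count is emitted as
     * soon as space becomes available again. */
    #include <stddef.h>

    #define BUF_SLOTS  256          /* assumed buffer size                      */
    #define LOST_DATA  (-1)         /* assumed type code for a lost-data record */

    struct event_rec {
        unsigned long time_ms;      /* time of the executive function           */
        int           type;         /* executive function (event) type          */
        int           cpu;          /* central processor involved, if any       */
        int           control_pt;   /* control point the function was done for  */
        unsigned long data[4];      /* selected register and storage contents   */
    };

    static struct event_rec buf[BUF_SLOTS];
    static size_t head, tail;       /* supervisor writes at head, probe reads at tail */
    static unsigned long lost;      /* records discarded while the buffer was full    */

    static int buffer_full(void) { return (head + 1) % BUF_SLOTS == tail; }

    /* Called by the supervisor for each executive function while the probe is on. */
    void emit_event(const struct event_rec *r)
    {
        if (buffer_full()) {        /* no room: discard, but never block the supervisor */
            lost++;
            return;
        }
        if (lost != 0) {            /* space again: first record how much was discarded */
            struct event_rec lr = { r->time_ms, LOST_DATA, 0, 0, { lost, 0, 0, 0 } };
            buf[head] = lr;
            head = (head + 1) % BUF_SLOTS;
            lost = 0;
            if (buffer_full()) { lost = 1; return; }   /* current record is lost too */
        }
        buf[head] = *r;
        head = (head + 1) % BUF_SLOTS;
    }

    /* The probe process drains the buffer (typically to tape) by advancing tail. */
    int take_event(struct event_rec *out)
    {
        if (head == tail) return 0; /* buffer empty */
        *out = buf[tail];
        tail = (tail + 1) % BUF_SLOTS;
        return 1;
    }

The point of the design is visible in emit_event: the supervisor's cost per record is bounded and independent of the probe process, at the price of shifting the lost-data problem into the analysis programs.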
When using the other alternative, in which the supervisor waits for buffer space when necessary, there is an implicit assumption that the total waiting time will be small enough to make the resultant slowing down of the system tolerable. In that kind of system it is very difficult to measure accurately the amount of slowing down that can be attributed to full event buffers. In a probe that is meant to be used for performance evaluation in normal production environments it is probably better to accept the possibility of lost data, and to design the data analysis programs to take this possibility into account.

One cause of lost data might be the inability of the probe process to move its data to tape fast enough because of competition with jobs using tapes. One alternative would be to provide a separate dedicated tape channel, controller and tape unit. This is more in the spirit of the hardware probe approach. Another alternative is to limit or eliminate other tape jobs while the probe is in operation. Unfortunately, tape jobs are often an important part of the environment that is being studied.

5.4.3. The experimental approach. The software tracing probe, in terms of the space that it occupies and the resources that it uses, is so costly that it is neither practical nor desirable to run the probe all of the time, or even a large part of the time. There are situations in which it may be reasonable to gather as much information as possible, and to decide afterward what to do with it and how to analyze it. These are situations in which special conditions prevail that may prove difficult to duplicate. It is easy to think of examples in areas like astronomy and space exploration in which there may be a once-in-a-lifetime occasion to observe certain phenomena. In the case of computer performance data, there may be special conditions, especially peak loading conditions, that occur sufficiently infrequently to warrant the collection and saving of large amounts of raw data. Even in such cases the data usually proves to be of marginal value, since systems rarely remain constant and comparable over long periods of time.

In just about every normal situation the use of an instrument as powerful and prolific as the software probe should be carefully planned and scheduled. The overhead of the probe and the volume of data that it can produce make it important to decide beforehand what data is needed and how much, and what programs and techniques will be used to reduce and analyze the data. The area of experimental design is worthy of serious study on the part of performance analysts, and a good deal of statistical theory is relevant. The difficulty of a valid scientific approach in this area should not be underestimated. Relatively few complete studies have been reported. One interesting example is [AN72b].

5.5. The probe process. During the time it is in operation and gathering data, the software probe operates as a permanent process with a high dispatching priority. The particular probe program in use in a probe run is one out of a library of probe programs, each of which is designed to implement a particular probe strategy. Thus one probe program might be designed simply to move all records to
tape for future analysis. Another might select only certain types of information records and place them on tape. Still another might be designed to do on-the-fly analysis of certain types of records, producing summary results on tape or disc units. Data about the system resources used by the probe process itself are automatically collected by any nonselective probe. In a typical data gathering run the Purdue probe uses 6-10% of the available central processor time, and thus has a very considerable impact on the system it is measuring.

The following sections will present some examples of data and reports obtained through the use of the Purdue MACE software probe and some of its associated analysis routines. Information of the type presented here has been almost indispensable as an aid in the development and enhancement of many features of the operating system.

5.5.1. Data produced by the event trace. Table 5.1 is a sample of the most basic records produced by the Purdue software probe. We call this "raw" data, although even here there is some evidence of selection and editing. It should again be emphasized that we are dealing with an event trace, not an instruction by instruction trace. Time is recorded in milliseconds. Each printed line corresponds to a function executed by the executive routine (central monitor) of the supervisor process. The column headed T shows the event type. An event of type 4 indicates the assignment of a central processor. The fourth column, CP, is the control point number. Thus, the fifth event in Table 5.1 shows that central processor 0 was assigned to control point 2 at time 1407.01155 (1155 milliseconds past 2:07 P.M.). The twelfth event shows central processor 1 being assigned to control point 16.

A control point number is associated with a process when it becomes active, and the process is identified by that number until the process is terminated or rolled out of central memory. Permanent processes have permanent control point numbers. At the time this data sample was taken, numbers 00 and 17 were assigned to the supervisor process, 01 was the spooling process, 14 was the probe process, and 15 and 16 were time-sharing processes. Numbers from 2 to 13, if used, belong to job processes, which are given a temporary control point number each time they are activated. The time covered in Table 5.1 is 65 milliseconds, and not all active processes are represented by events in this time period.

Events of type 6 are events that take place only at a time-sharing process. The type 6 events listed here show terminal 15's task being swapped out at the end of a time slice (time 1169) and terminal 2's task being swapped in for the start of a new time slice (time 1170). Events of type 0 are peripheral processor events, many of which have to do with input and output. The OR request in the second event in the table shows channel 7 being reserved (0042 0007) at time 1153 for peripheral processor 4, which is doing an input/output function for the time-sharing process at control point 16. At time 1171 the OR request (0025 0007) shows the corresponding drop channel operation.
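To make the preceding description a little more concrete, the fragment below sketches how an analysis program might render a few of these record types in readable form. The record layout, field names and the printed wording are assumptions made for the example; they do not reproduce the actual 60-bit Dual MACE record format shown in Table 5.1.

    /* Illustrative decoder for event records of the kind shown in Table 5.1.
     * The type codes follow the description in the text (type 4 = central
     * processor assignment, type 6 = time-sharing event, type 0 = peripheral
     * processor event); the struct layout is an assumption for the example. */
    #include <stdio.h>

    struct event_rec {
        unsigned long time_ms;      /* milliseconds past the hour, e.g. 1155 in 1407.01155 */
        int           type;         /* the T column of Table 5.1                            */
        int           cpu;          /* central processor number, for type 4 events          */
        int           control_pt;   /* the CP column: control point of the event            */
        unsigned long data[4];      /* remaining record contents, not decoded here          */
    };

    void print_event(const struct event_rec *e)
    {
        switch (e->type) {
        case 4:
            printf("%6lu  CPU %d assigned to control point %d\n",
                   e->time_ms, e->cpu, e->control_pt);
            break;
        case 6:
            printf("%6lu  time-sharing event at control point %d\n",
                   e->time_ms, e->control_pt);
            break;
        case 0:
            printf("%6lu  peripheral processor event for control point %d\n",
                   e->time_ms, e->control_pt);
            break;
        default:
            printf("%6lu  type %d event at control point %d\n",
                   e->time_ms, e->type, e->control_pt);
            break;
        }
    }

    int main(void)
    {
        /* the fifth event of Table 5.1: CPU 0 assigned to control point 2 at 1155 msec */
        struct event_rec e = { 1155, 4, 0, 2, { 0 } };
        print_event(&e);
        return 0;
    }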
TABLE 5.1
Raw data from the Purdue software probe
TABLE 5.2
Channel usage and CPU utilization
TABLE 5.3
Control point and peripheral processor utilization
TABLE 5.4
Software probe data gathering summary
TABLE 5.5
Part of data analysis report
TABLE 5.6
Software probe data for a time-sharing process
TABLE 5.7
Probe data from CDC 6400 systems
It is not important for anyone, except perhaps a system programmer, to be able to read and understand the event records in the form presented in Table 5.1. The examples given here are samples of the kinds of data that provide the raw material on which a performance analyst can build data reduction and analysis routines. There is no point in printing large amounts of raw data, but anyone who wants to attain a reasonably deep and complete understanding of the system should read the documents that are necessary for the understanding and interpretation of such data. An examination of the raw data can reveal redundancies and inefficiencies that would be hard to find in any other way, and that would almost certainly be concealed in summary reports of the type illustrated in Tables 5.2 to 5.7.

The following is an example of a system performance bug that was found in one of the early runs of the tracing probe on the Purdue MACE system. A sample of raw event data showed that every write operation on a permanent file disc unit was preceded by an apparently unnecessary short read from the same disc. On investigation, it turned out that the purpose of the read was to fetch a record that was used as an interlock to control write access to the permanent files during system debugging. This particular interlock scheme was probably the best practical method of achieving the needed results at the time it was installed. By the time the probe was available, channel contention on the permanent file disc unit had become a serious problem. The probe data showed the extent to which the interlock mechanism was contributing to the problem. This was one of the motivations for system changes that permitted a less costly interlock mechanism to be used.

It is sometimes of interest to list all of the events that have to do with a particular control point. For example, a listing of the events that involve control point 15 or 16 will give more detailed information about the time-sharing processes in the system. An examination of the list of data events at these control points revealed an interesting problem that has since been corrected. The time-sharing process seemed to be quite efficient in its use of central processor time, even under heavy loads, but it was extremely inefficient and used unacceptably large amounts of central processor time when it was "idle," i.e., when no current time-sharing task needed service. This is another example of a problem that showed up most clearly in a detailed data listing.

5.5.2. Examples of probe analyses. In a series of data-gathering experiments, Joel Ewing installed a probe process that could be activated periodically for 5-minute (300,000 millisecond) intervals during test periods that lasted for several hours. Some of the data from one of these 5-minute intervals is presented in Tables 5.2 and 5.3. All of the information in these tables, and in the other tables and summaries presented in this chapter, is based on analyses of records of the type shown in Table 5.1.
Table 5.2 includes a summary of channel usage statistics. Channels 0, 1, 2, 3, 5 and 11 are the disc channels that are of major interest. The waits are a result of channel contention. The times shown include access plus transfer time. Thus there were 1905 input/output requests addressed to channel 0 in the 5-minute period of this report. Of these 1905 requests, 788 were delayed an average of 106.2 msec each. Once it obtained access to the channel, the average I/O operation was completed in 89.76 msec. Channel contention thus added 43.92 msec to the average time to complete an I/O request to the disc on channel 0. We know from examining data taken over many sample periods that this was a 5-minute period with very heavy disc activity. There is a great deal of variation from one 5-minute period to another, even within the same hour. There seems to be no such thing as a typical 5-minute period.

Some fairly straightforward conclusions can be drawn from looking at a number of channel usage statistics reports. We know that the disc unit on channel 11 is a faster, more modern unit than those on the other channels. The data shows that channel 11 is underutilized compared to the others, and that its I/O completion time is much faster. It is a multispindle device (a CDC 844, analogous to IBM's 3330 series), and another spindle was scheduled to be added in the Spring of 1975. Data like that in Table 5.2 will then show the extent to which utilization of channel 11 has been increased, and may suggest the addition of still another spindle. This data, along with much additional data, also suggests the fairly obvious conclusion that the system would perform better if some or all of the discs on channels 0-5 were replaced by discs similar to those on channel 11. It would be an interesting and difficult study to try to determine the extent to which performance would be improved by such replacement. Some simulation and modeling techniques that might be suitable for such a study will be discussed in Chapters 8 and 9, but ultimately the success of any of these techniques relies on the availability of the types of data discussed here. Economic considerations might make such a study a purely academic exercise at the present time.

Table 5.2 also includes a summary of central processor utilization by control point. As explained in Chapter 2, the central processors are switched at high frequency among the processes that are ready to use central processor time. The central processor time used by control point 0 is idle time; that assigned to CP 17 is used by supervisor functions that run like user jobs. In this particular run the probe process itself (CP 14) used 8.7% of all available central processor time and almost 15% of all central processor time actually used. The time-sharing processes at CP 15 and 16 used 12.6% of all available central processor time and more than 21% of all central processor time used. The user jobs run at control points 2-13 (11 and 12 were not used in this 5-minute period).

The upper section of Table 5.3 shows the number of jobs started and completed and the rollout activity by control point. Between them, Tables 5.2 and 5.3 give some indication of the level of multiprogramming in the system, and of the amount of activity carried on by the system scheduler and by the central processor dispatcher.
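The channel 0 figures quoted above for this sample period can be checked by a back-of-the-envelope calculation: only 788 of the 1905 requests actually waited, so the 106.2 msec average wait of a delayed request, spread over all requests, comes to about 43.9 msec. The short fragment below merely restates that arithmetic; the numbers are those reported in Table 5.2.

    /* Re-deriving the channel 0 contention figure quoted in the text:
     * 788 of 1905 requests waited an average of 106.2 msec each, so the wait
     * averaged over all requests is 788 * 106.2 / 1905, about 43.9 msec. */
    #include <stdio.h>

    int main(void)
    {
        double requests = 1905.0;  /* I/O requests to channel 0 in the 5-minute sample */
        double delayed  =  788.0;  /* requests that had to wait for the channel        */
        double avg_wait =  106.2;  /* average wait of a delayed request, msec          */
        double service  =  89.76;  /* average access-plus-transfer time, msec          */

        double added = delayed * avg_wait / requests;      /* about 43.9 msec */
        printf("contention adds %.2f msec to the average request\n", added);
        printf("average completion time per request: %.2f msec\n", service + added);
        return 0;
    }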
Table 5.3 also shows peripheral processor utilization during the same 5-minute period. Of special interest is the PP usage pattern, which shows that all 10 peripheral processors were in use 25.7% of the time. This is one indication of a peripheral processor saturation problem that is known to exist in the system. The full report of each 5-minute probe activity contains additional reports, in much greater detail, that show the frequency of use and time of execution of the central processor executive functions and of the peripheral processor programs. Information of this type has been extremely useful in many efforts aimed at decreasing system overhead, pointing out areas in which improvements might be most needed or most productive, and showing the improvements, and sometimes the deficiencies, produced by changes made in an effort to improve these areas.

There are many ways in which the data contained in the event records produced by the software probe could be summarized and presented. Tables 5.4 and 5.5 are two pages of a report designed by Herb Schwetman and programmed with the help of Steve Bruell for use with Purdue probe data. As indicated on page 1 of the report, it covered slightly less than 15 minutes of real time, during which almost 800,000 records were generated. The records were collected on magnetic tape, and the program that produced the report was run later as a standard user job that used more than 500 seconds of central processor time. The lost data problem mentioned earlier was encountered during the collection of the data analyzed here. As indicated in Table 5.4, there were 36 occasions in which records were lost, and a total of 3701 records were lost during the running of the probe. This was typical at the time this data was collected, but improvements in data-gathering techniques since then have reduced the lost data problem to more manageable proportions.

Table 5.5 shows channel utilization and control point utilization during the period for which the data was gathered. Other pages of this report, not reproduced here, provide an alternate presentation of the data in Tables 5.2 and 5.3, along with much additional information. The volume of data produced by the probe is such as to make it worth while to spend considerable effort to design useful reports. It is not within the scope of our discussion here to consider the relative merits of different methods of data presentation. Graphs of various kinds are frequently used, especially in reports designed for use by management personnel or for others who may not want the level of technical detail that is often present in tabular presentations. An interesting type of graphical presentation is the circular chart which has become known as the Kiviat graph. The reader is referred to [SN74] for an example of effective use of Kiviat graphs in connection with a computer performance study.

5.5.3. Time-sharing process data. Table 5.6 is part of a report produced from data gathered by a selective probe program designed to provide information about the performance of the time-sharing processes in the Purdue MACE system. This table describes about one minute of activity of one of the two time-sharing processes. The previous page of the report, not included here,
showed that there were 39 terminals active at this time-sharing process. The "TCB time spent in system state" is the response time for the 78 interactions that finished during the one-minute interval described in this part of the report. For these interactions the average response time was about 3.7 seconds with a standard deviation of about 9.5 seconds. The "MESA CP time per system state interaction" is the service time provided by the time-sharing process. The average service time for these 78 interactions was about 780 milliseconds. The average think time, "TCB time spent in user state", was about 28 seconds.

A great deal of data of this type was collected and is still being collected in efforts aimed at understanding and improving the behavior of the time-sharing subsystem. In Chapters 8 and 9 we shall see how some of this information has been used in simulation models and in other models.

5.5.4. Multicomputer systems. There are some special problems that arise in connection with attempts to obtain performance data from multicomputer systems. The CDC 6500, which was the source of all of the data presented so far in this chapter, is itself a multiprocessor system. The successful use of the software tracing probe on the 6500 is based on the fact that all 12 processors are controlled and synchronized by a single supervisor process. As has already been mentioned in Chapter 4, the system running at Purdue also has a CDC 6400 which shares most of the peripheral storage devices used by the 6500. The 6400 has its own supervisor process that controls the activity of its single central processor and its 10 peripheral processors. The Dual MACE system provides interlocks that permit orderly storage allocation on the shared devices, and communication facilities that permit file table entries and job processes to be allocated to either computer and to be moved from one computer to the other according to a total system strategy. Aside from this loose coupling, which is maintained through common Extended Core Storage, the two machines run asynchronously and independently, but they do affect each other's performance through conflicts for access to the shared peripheral devices.

Table 5.7 is based on probe data taken on the 6400 at about the same time as the data in Table 5.2 was obtained from the 6500. Channels 0, 1, 2, 3, 5 and 11 in the channel usage table of Table 5.7 communicate with the same discs as the corresponding channels in Table 5.2. The data shows some of the results of an attempt to concentrate the 6400 disc activity on channel 5 while leaving the other discs mainly at the disposal of the 6500. Only the first of the three columns of CPU utilization figures in Table 5.7 is meaningful, and the other two should be ignored, since the 6400 has only one central processor.

Simultaneous collection of data on the two major computers in the system represents a first step in an attempt to study the coupled system. An examination of the raw data on which this first step is based has indicated that some of the data in these tables may not represent exactly what we want them to represent. There are apparently some cases in which a peripheral processor on one computer requests and gets access to a channel, and then drops the channel because the corresponding disc unit is busy servicing a request from the other computer. From
the point of view of the first computer this looks like a request for disc access that has been satisfied, and is included in the channel usage statistics, as it probably should be. However, any interpretation of channel usage statistics as representing disc access time statistics becomes doubtful. It now seems necessary to go into more detail in order to get disc access data as well as channel usage data.

This is already perhaps too much detail about one particular system. I include it here to reinforce the comments made in Chapter 4 about the need to be very cautious and skeptical about any data that purports to represent the behavior of complex systems. I would guess that most such data contains inaccuracies and ambiguities. Some of these do not matter, but it is an important part of the performance analyst's job to evaluate the quality of the data, and to investigate and perhaps reject conclusions based on questionable data.

Many difficult multicomputer analysis problems arise from the fact that all communication with terminals goes through front-end computers, each of which runs under its own operating system. The software probe techniques described in this chapter are not necessarily appropriate for front-end communications processors. Data of the type gathered and presented in Table 5.6 is observed from the point of view of the time-sharing process. From that point of view, an interaction starts at the time the time-sharing process becomes aware of an attention request, and ends at the time that the process has finished with that request. From the point of view of the user at a terminal, the interaction starts at the time he hits the carriage return key or an attention or interrupt key. The front-end processor itself is running an elementary time-sharing system that recognizes the carriage return, and executes a more or less complex program. It may, for example, do a cyclic sum check on the input line, translate the line into a code acceptable to the central computer, and move the line into an appropriate buffer, before it raises a status bit to inform the time-sharing process in the central computer that an input from that terminal is ready. The front-end processor is simultaneously (in the sense of a time-sharing system) serving many other terminals, and queuing delays can and will occur. A routine that runs periodically (every 150 msec.) in a peripheral processor of the 6500 system collects all of the status information from all of the front-end machines, and calls on still another peripheral processor routine to update an attention array that belongs to the time-sharing process. The time at which the updating is done is the time that the process sees as the beginning of the interaction.

On output the situation may be even more complicated. A peripheral processor routine writes output lines to the front-end machine, and when its output operation is complete the time-sharing process signals the end of the time slice and the end of the interaction. As in the case of input, we are ignoring possible queuing and translation delays in the front-end processor. Because of buffer size limitations it is possible that only part of the output message was transmitted to the front-end machine. When more buffer space is available, for example, when the first part of the message has been printed on the terminal, the front-end computer
will automatically initiate a new interaction with the time-sharing process to get the rest of the message. Thus one interaction from the point of view of the user may become several interactions from the point of view of the time-sharing process. To the best of my knowledge, the data presented in Table 5.6 is correct, but as indicated in the above discussion, an accurate interpretation of the meaning of the data requires considerable knowledge of the details of the system from which the data was collected.

5.6. Other tracing probes. There has not been a great deal published about the use of tracing probes. They are expensive tools that are difficult to use, and even though they provide more information about multiprogramming systems than can be obtained in any other way, this information is specific to the system measured and does not make for interesting papers in the general literature. Also, there may be techniques of this type used in connection with proprietary operating system development by computer manufacturers who do not have strong incentives to publish their techniques and their results.

Measurement techniques, including tracing probe techniques, were employed rather extensively in the GECOS III development, and several papers were published in connection with that work, of which the most important is [CA68]. There was a large number of measurement tools employed in connection with IBM's TSS 360 development. We have already mentioned the existence of a full interpretive trace, ITM, for that system, and a tracing probe called SIPE, the System Internal Performance Evaluation program, described in a very interesting article by Deniston [DE69]. We have also mentioned GTF, the Generalized Trace Facility, provided as a product by IBM [IB72] for their 360/370 series. Beretvas [BR73] indicates that some modifications to GTF were necessary before it would provide the information needed for use as a performance evaluation tool. Schwetman's Ph.D. dissertation [SC70] describes a tracing probe installed as part of a peripheral processor supervisor in a CDC 6600 system at the University of Texas. Joel Ewing's paper [EW75] contains an annotated list of software and hardware probes that have been developed and used, and a useful bibliography of this whole area.
CHAPTER 6
Hardware Monitors

6.1. Introduction. From the time of the very earliest computers, panel lights were used to indicate when the various control flip-flops in the system were set, and when the various system components were being used. The engineers who design and build and maintain electronic computing equipment have always had to have the ability to attach oscilloscopes and other instruments to test points in the system while the system is running, without interfering with the running system. We have already discussed in Chapter 3 how observation of indicator lights during production runs can give the analyst useful information about the performance of the system. The time during which a particular flip-flop is set provides a direct measure of the time during which the function it controls is in use. Direct observation can give only a very rough estimate of such times. They can, however, be measured to any desired degree of accuracy by electronic devices attached to appropriate test points in the system.

There is no way of knowing who first had the idea of using electronic counters attached to test points to collect information about system performance. Counters can be used to measure the fraction of the total time that a flip-flop is set, or that a component is in use. Such counters are the basis of most hardware monitoring equipment, and provide the data for most hardware based performance studies.

We shall use the term "hardware monitor" in this chapter rather than "hardware probe", which would be more consistent with the use of "software probe" in Chapter 5. The use of the term "hardware monitor" is well established in the performance measurement field, and there is no ambiguity of meaning of the kind that exists in the case of "software monitor." In connection with hardware monitors, the word "probe" is frequently used to designate a wire or sensor that is attached to a system test point.

6.2. IBM hardware monitors. Although it was already present in a simplified way in the Univac I, the ability to overlap computing with input-output operations became an almost standard feature of the late first-generation and early second-generation computers. A hardware monitor that measured the extent to which input-output channels were being used simultaneously with the central processor could give a good picture of the extent to which the system took advantage of this capability. IBM apparently had such a device in connection with their 7090 computer in 1961 [BO69, footnote 3]. A later version of that device was made available to IBM 7000 series customers on temporary loan, to permit them to run evaluation studies.
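The counting idea just described is simple enough to capture in a few lines. The following sketch simulates, in software, what such a monitor does continuously in hardware: at each instant it looks at a few busy signals, combines them logically, and advances the corresponding counters; dividing each counter by the elapsed-time counter gives overlap percentages of the kind discussed in the next section. The signal names and the discrete "tick" are illustrative assumptions, not a description of any particular monitor.

    /* Software illustration of hardware-monitor counters: each tick stands for
     * one unit of elapsed time; busy signals from test points are combined with
     * simple AND/OR logic and gated into counters.  At the end of a run, each
     * counter divided by the elapsed-time counter gives a utilization figure. */
    #include <stdio.h>

    struct signals { int cpu_busy, chan_a_busy, chan_b_busy; };   /* assumed test points */

    static unsigned long elapsed, cpu, any_chan, overlap;

    void tick(struct signals s)
    {
        int chan = s.chan_a_busy || s.chan_b_busy;    /* OR of the channel signals   */
        elapsed++;
        if (s.cpu_busy)          cpu++;
        if (chan)                any_chan++;
        if (s.cpu_busy && chan)  overlap++;           /* AND of CPU with any channel */
    }

    int main(void)
    {
        /* a tiny made-up sequence of signal states, one entry per time unit */
        struct signals trace[] = { {1,1,0}, {1,0,0}, {0,1,1}, {1,1,1}, {0,0,0}, {1,0,1} };
        size_t n = sizeof trace / sizeof trace[0];

        for (size_t i = 0; i < n; i++)
            tick(trace[i]);

        printf("central processor busy      %5.1f%%\n", 100.0 * cpu      / elapsed);
        printf("at least one channel busy   %5.1f%%\n", 100.0 * any_chan / elapsed);
        printf("CPU and channel overlapped  %5.1f%%\n", 100.0 * overlap  / elapsed);
        return 0;
    }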
One of the early IBM hardware monitors is described in [BO69]. A set of sensors or probes was provided, to be attached to appropriate test points. The monitor contained sixteen 11-decimal-digit counters, and a patchboard to permit signals from more than one test point to be combined through logical AND and OR circuits before being fed to a counter. Thus a counter could record the amount of time that the central processor and one or more I/O channels were simultaneously active. Another might record the time during which at least one input-output channel was active. One of the counters was used to record the total time during which data was collected. A display register and switch arrangement permitted the analyst to observe the value in any selected counter at any time. A mode of operation was provided in which the contents of all 16 counters were punched out into cards at regular intervals. The resulting card deck could then be processed on a computer to produce performance reports. The standard report, which showed the extent of overlap in usage among system components, was called a "computer profile."

A more sophisticated device, also built by IBM, is described in [SH67]. The "time-sharing system performance activity recorder" used up to 256 sensors to feed up to 48 counters. The contents of the counters could be recorded on magnetic tape units for off-line processing.

6.3. Commercially available hardware monitors. The development of the complicated multiprogramming systems for third-generation computers led to a great increase in the use of measurement techniques of all kinds to enable the system designers and implementers and the users to understand what was going on in their computing systems. They also led to the development of a performance evaluation industry, with companies like Comress, Boole and Babbage, Computer Synectics, Tesdata and others offering measurement products and evaluation studies. Several of these companies offered, and some still offer, hardware monitors with capabilities similar to those discussed in the preceding section. Under pressure from this small developing industry, IBM restricted the use of its hardware monitoring activities to its own installations and to specific sales-oriented studies at customer installations. We shall not describe the various commercially available hardware monitoring products. A discussion of the use of those available up to sometime in 1973 is contained in [MO73]. Summaries of the characteristics and capabilities of hardware monitors, and of other commercially marketed computer evaluation products, such as software monitors and simulation packages, are published from time to time by computer information service companies and by magazines that serve the computer industry. Individual products are described in some detail in brochures published by their manufacturers. Users groups have been formed for the interchange of information about the use of various evaluation products.

6.4. Hardware monitors and software probes. The software tracing probes discussed in Chapter 5 can get much more information in much greater detail than
would be possible through use of a hardware monitor, but at a very large cost in system resources, and with a major perturbation of the system being measured. Software sampling probes are closer analogues to hardware monitors. They can provide a similar level of detail of measurement with a relatively small perturbation of the system that is being measured. A comparison of the results of using a software sampling probe and a hardware probe in one particular study is presented in [PE74].

A major advantage of a hardware monitor is the fact that, when properly used, it will not perturb the system that is being measured. In reference to hardware monitors, Bonnet [BO69] states that "monitoring . . . has no effect on system performance and requires no program modification." Warner [WA71] states that "the hardware monitor does not degrade or interfere with the system it is measuring and requires no system overhead." These statements are essentially true. Care must be taken to make sure that the probes are properly placed so that they measure the things that are claimed to be measured. Improperly placed probes may give results that appear correct, but that are spurious. Improperly placed probes may interfere with the system operation, and in extreme cases may cause malfunctions or may even damage the equipment.

Almost all of the commercially produced hardware monitors have been designed primarily for use with IBM 360 and 370 equipment, since this equipment represents considerably more than half of all medium and large scale computers installed. Lists of test points for IBM systems are available, and considerable experience has been developed in the use of monitoring equipment. Reports of successful and profitable use of hardware measurement devices appear regularly in the weekly trade paper "Computerworld," and in various trade magazines. The technology is not quite as well developed for some other computers. Thus Noe in [NO74] reports some problems and difficulties in his attempts to use a hardware monitor on CDC 6000 series equipment. He recommends to the user, "Don't take too seriously the common statement, 'Hardware monitors do not interfere with the host system.'"

The use of a hardware monitor requires the lease or purchase of a hardware device, and the development of some expertise in its use. At least some of the capabilities of a hardware monitor could be built into the hardware of a computing system. Features in some large systems already represent a start in this direction, and it is a direction of development that will probably be pursued in future systems.

6.5. Minicomputer-based hardware monitors. The development of very low cost minicomputers and microcomputers in the 1970's has made it possible to incorporate extensive processing capability in the hardware monitor itself, which in turn makes it possible for the hardware monitor to perform on-line a good deal of the data reduction and data presentation that was previously done off-line. It also makes it possible to use much more complicated logic than could be incorporated in plugboards. Commercially available hardware monitors in 1975
[The figure reproduces SPR results for a SNOBOL run, 15.02.15, 04/05/72: 83968 samples were taken; 33417 samples, or 39.9 per cent of the total, were for this program; 71 samples were taken during Extended Core Storage transfers; 3784 samples, or 11.3 per cent of the total for this program, were within the range (001100, 001177); 0 rollout-rollin operations were performed upon this program during the sampling period. The sampling interval was approximately 100 microseconds, giving a sampling rate of approximately 10,000 samples per second. A histogram then shows the number and percentage of samples at each address in the range (001100, 001136).]

FIG. 6.1. Sample of SPR output
contain minicomputer driven printers and graphic displays that provide continuous performance information.

6.6. Hardware program monitors. The idea of a program monitor is to use a hardware device, which may be or include a computer, to read and record the successive values of the program register of a central processor whose performance is of interest. It might also be possible to read and record the contents of other registers. This is done while the computing system that is being monitored is running either a real load or a test load. As in the case of a software probe, the hardware program monitor may be a sampling probe, or it may attempt to capture all values of the registers it is monitoring. In the latter case, and sometimes even in the former, it may accumulate very large amounts of data that are analyzed in runs that are independent of the data-gathering process. They may, for example, be analyzed on a different, more powerful computer.

Apple [AP65] describes such a device designed in 1961 for use with the IBM 7000 series computers. Data could be captured on 39 lines, which usually monitored the current value of the program counter and the operation code. This data would be dumped onto a specially modified high performance tape system for off-line processing. Since data could sometimes arrive faster than it could be recorded, provisions were included to record lost-data occurrences, and optionally to stop the host machine until data could be stored on tape.

Rock and Emerson [RE69] describe a program monitor for large Univac equipment that was used to record transfers of control (jumps) rather than the entire instruction stream. In the CDC 6000 multiprocessor system there have been a number of studies made in which one peripheral processor was used as a program monitor to obtain performance information about the rest of the system [ST68]. As an example, a peripheral processor routine, SPR, is available in the Purdue MACE system to be used as a measurement tool for any running program. The routine SPR is loaded into a peripheral processor from which it obtains samples of the values of the P register during the operation of the program that is being measured. The result is a frequency distribution that shows how often the P register was found in various areas of memory. Figure 6.1 is an example of the output of SPR for a particular program. This output can be correlated with a program listing to determine how much time is spent on the various subroutines and code sections of the program, and can be extremely useful in connection with projects aimed at improving the efficiency of programs and system components.

It is not really clear whether techniques of this type should be considered hardware probes or software probes. Combinations of hardware and software have been used in many performance studies, and, as computer technology advances, one may expect the distinctions between hardware and software to become blurred. It certainly seems reasonable to expect that most relatively large computers of the next generation will have built-in instrumentation that will automatically make available a good deal of the information that must now be obtained by hardware probes, and also much of the data that can be derived through software probes.
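A minimal sketch of the SPR idea described above follows: sample the program counter at a fixed interval, count the samples falling in each address range, and print a frequency distribution of the sort shown in Fig. 6.1. The read_p_register stub, the bucket size and the output format are assumptions made for the example; an actual SPR run reads the P register of the measured program from a peripheral processor.

    /* Illustrative sketch of a sampling program monitor in the style of SPR:
     * the P register (program counter) is sampled at a fixed interval, the
     * samples are counted per address range, and a histogram in the style of
     * Fig. 6.1 is printed.  read_p_register() is only a stand-in for the
     * peripheral processor's access to the measured program's P register. */
    #include <stdio.h>
    #include <stdlib.h>

    #define RANGES     64
    #define RANGE_SIZE 0100u            /* octal 100 (64) words per bucket -- assumed */

    static unsigned long hist[RANGES];
    static unsigned long samples;

    static unsigned read_p_register(void)       /* stub standing in for the hardware */
    {
        return RANGE_SIZE * (unsigned)(rand() % 16) + (unsigned)(rand() % RANGE_SIZE);
    }

    void sample_once(void)                      /* one sample, every ~100 microseconds */
    {
        unsigned p = read_p_register();
        hist[(p / RANGE_SIZE) % RANGES]++;
        samples++;
    }

    void report(void)                           /* frequency distribution and bar chart */
    {
        for (unsigned i = 0; i < RANGES; i++) {
            if (hist[i] == 0)
                continue;
            double pct = 100.0 * (double)hist[i] / (double)samples;
            printf("(%06o,%06o) %8lu %5.1f  ",
                   i * RANGE_SIZE, (i + 1) * RANGE_SIZE - 1, hist[i], pct);
            for (int stars = 0; stars < (int)(pct + 0.5); stars++)
                putchar('*');
            putchar('\n');
        }
    }

    int main(void)
    {
        for (int i = 0; i < 10000; i++)
            sample_once();
        report();
        return 0;
    }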
CHAPTER 7
Benchmark Workload Models

7.1. Introduction. The terms "workload" and "benchmark" have been defined and discussed briefly in Chapter 1. With few exceptions, mostly in the area of hardware performance, every significant question about computer system performance must take into account the kind of work that the system is expected to perform. Using slightly different terminology, we can state that every serious performance evaluation study uses a model of the workload of the system whose performance is being studied. The validity of the results of the study usually depends very critically on the accuracy and validity of the workload model. Providing an adequate model of the workload of the system may be one of the most difficult tasks faced by the performance analyst. Equally difficult, and even more frequent, is the task of estimating the validity of the results of studies in which simplifying assumptions regarding the workload have been made for practical reasons, in order to make it possible to obtain any results at all from the studies.

The importance of workload models has been recognized by many writers in the field. Grenander and Tsao [GR72] wrote: "We believe that no real significant advance in the evaluation of systems can be expected until some breakthrough is made in the characterization of the workload." Ferrari [FE72] states: "The lack of satisfactory workload characterization techniques is one of the main reasons for the primitive state of this important branch of computer engineering" (i.e., performance evaluation).

Models of computing system workloads fall into two categories, depending on whether they are designed to be used with (i.e., to drive) a model of the system, or to run on the system itself. In the present chapter we shall limit the discussion to workload models that are designed to run on one or more existing computers. I shall refer to such workload models as benchmark models. It is conceivable that a benchmark model could also be used to drive a model, for example a detailed simulation model, of a system. The defining characteristic is the fact that the benchmark workload model can run on an actual system, and that standard system measurement tools can be used to evaluate the performance of the system on which the benchmark model is run. A good bibliography, with abstracts of the major papers in the area of benchmarking and workload definition, has recently been published by the National Bureau of Standards [WA74].

7.2. Real benchmark models. A real benchmark model is a set of jobs selected from among the jobs that are actually run as part of the workload of a system.
For some purposes a benchmark model may be an actual section of the workload of the system, consisting of all jobs submitted during a given time period. It is more usual to construct a real benchmark model by selecting a set of jobs that display some characteristics that are appropriate for the purposes for which the model is being created. Benchmarks are frequently used to aid in the selection of competitive hardware and software systems.

When the PUFFT system [RO65] was developed at Purdue University, a set of relatively short FORTRAN jobs was used as a benchmark model to determine to what extent PUFFT had achieved its purpose of speeding up the processing of such FORTRAN jobs on the 7090/7094. When it came time to consider purchasing a third-generation computing system, this benchmark set, slightly augmented, was submitted as an example of part of an academic computing center workload, to be run on the equipment proposed by several different manufacturers. Some useful information was obtained from this exercise, even though every attempt to run the benchmark set ran into difficulties. Most of the difficulties had to do with the fact that FORTRAN programs are not always machine-independent. Differences in word length, for example, caused many problems. Jobs were removed from the set and others were substituted in an attempt to obtain a set that might run on several different machines. In some cases jobs were changed to permit them to run on a particular system. In one case (the IBM 360) the proposed model 65 was not available for benchmark tests, and the jobs were run on a slower model as a substitute. It was clear that what was being tested was not the inherent capacity of the computers themselves, but the capabilities of the computers running a particular software system. In every case these were prototype software systems, even though the vendors themselves were not aware of the extent to which the software would be rewritten in subsequent years. The performance on this FORTRAN benchmark set was one factor in the decision process that led to the procurement of CDC 6000 equipment.

Much more recently, Washington University in St. Louis constructed a more elaborate benchmark model including COBOL as well as FORTRAN programs [ST72]. The purpose was equipment selection, and they ran into some of the same problems Purdue had run into in the much earlier equipment selection study mentioned above. Their results indicated that progress is being made toward the kind of language processor compatibility that makes it possible to run some programs with little or no change on several different computing systems. Such compatibility is necessary if real benchmarks are to be used successfully and conveniently in equipment comparisons. Ferrari in his key paper in this field [FE72] prefers to use the term "hybrid" rather than "real" to describe the benchmark models discussed here, since the jobs have been specially selected, and some have been modified, for inclusion in the benchmark model.

7.2.1. Validation benchmarks. Real benchmark models are often created whose purpose is to validate the proper operation of a system. It is rather amazing that computing systems of great complexity work correctly and consistently over
long periods of time, and such performance should not be taken for granted. Both hardware and software are subject to routine maintenance and to changes and improvements. A benchmark set is almost essential to prove that the system can still perform at least as well as it could before changes were made. Subsets and special benchmark sets may be needed for hardware and software subsystems. The FORTRAN benchmarks mentioned above are an example. Validation benchmarks may also be used for performance evaluation, but they often provide a benchmark model that is not at all representative of the workload. An atypical job that uses unusual features of the operating system may be ideal for inclusion in a validation benchmark. It is quite usual to include jobs that have run into special system problems in the past, in order to try to guarantee that new system changes have not reintroduced old system bugs. 7.2.2. Difficulties in constructing real performance benchmark models. If a relatively simple system environment can be assumed, it may be possible to select a reasonable set of real jobs that is typical and representative of the system workload [JO65]. In a large general purpose system, an effort to construct an adequate performance benchmark model runs up against some fairly obvious serious problems for which there is no satisfactory solution. A large system provides for the cataloguing and retrieval of files of programs and data. Most jobs of any significance make use of these features. Some make use of extensive data bases, others simply of a facility for storing a program during its debugging phase. Any program that refers in any way to a permanent or catalogued file system will create problems if it is included in a benchmark set. The benchmark model could include its own permanent file system that is loaded as part of the performance study. This might be appropriate for a single large isolated study, but is hardly practical where benchmark studies are part of a continuing evaluation of performance. It also might not work at all in the case of benchmark programs that modify catalogued files. Even if these problems are overcome, one must be aware of possible violations of rights of access and privacy of information that occur through the inclusion of part of a user's data base in a benchmark set. It would be easy to decide simply to omit any job that used the permanent file system, but the resulting benchmark could hardly be called representative of the system workload. Another question that arises in setting up a benchmark is what to do about the terminal system. Batch jobs submitted through the terminal system can be included in the real benchmark, but interactive computing cannot be included. One might run the benchmark while users are actually at terminals doing certain standard things. It might be possible to collect a set of volunteers to help in a single isolated test during which a benchmark set of jobs is run, but the practical difficulties of doing this with very many terminals, or on any kind of regular schedule, are insuperable. The use of synthetic benchmark techniques to represent the terminal system while the real benchmarks are run might provide a reasonable compromise.
Any devices with real time or other special requirements can further complicate an attempt to provide a really adequate benchmark workload model. A relatively small number of very large jobs may have a considerable impact on system performance, but it is usually not practical to include any very large jobs in a benchmark set. It may be necessary to misrepresent the characteristics of the workload in order to avoid very long and expensive benchmark runs.

7.3. Synthetic benchmark models. A synthetic benchmark job is a job that has been designed specifically for inclusion in a benchmark model. A synthetic benchmark model is a set of synthetic benchmark jobs assembled for purposes of system measurement or testing or evaluation. Real and synthetic benchmark jobs may be mixed to produce mixed benchmark models. The best-known synthetic benchmark job is one that was presented in a paper by Buchholz [BU69a] in which he states: "The job described here is a greatly simplified file maintenance procedure, which exercises both the central processing unit and the major input/output devices, with activity parameters being specified in a manner independent of the system. A complete PL/I version is shown as an example." By changing the activity parameters in the above-mentioned job it is possible to change the ratio of input/output to central processor activity. Variations of this synthetic benchmark job have been used in a number of studies [WO71], [SR74]. In [SR74], Sreenivasan and Kleijnen used a large number of instances (88) of the Buchholz job with different parameters to simulate a "representative" workload. They used IBM's SMF (see Chapter 4) to obtain information about the distributions of central processor time and input/output in samples of the real workload, and chose the parameters of the instances of the Buchholz job so as to produce the same distributions in the representative workload. A recent article [OL74] by a group in the Navy Department presents a strong argument for the use of synthetic benchmarks, and comments on the relative ease with which they can be constructed and used in studies that would be very difficult to attack by other methods.

7.4. Simulating a benchmark workload. The benchmark sets considered so far consist of actual jobs, either real or synthetic, that are run on the computer under analysis. Another approach is to use representations of jobs rather than the jobs themselves to simulate the workload of the computer. The representations that we consider here are referred to as job scripts. A simulated benchmark workload consists of a set of job scripts that exercises the system as if the system were running the corresponding jobs. It is considered to be a benchmark workload since the results of the runs are analyzed by standard system measurement tools. Ross Garmoe [GA73] has designed a benchmark workload simulation package as part of a performance study at Purdue University. As of this writing, only a preliminary version has been implemented. The package consists of two routines, BENCH and MARK.
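Before looking at BENCH and MARK in more detail, the following sketch may help make the idea of a parameterized synthetic job concrete. It is written in Python rather than PL/I and is not a transcription of the job in [BU69a]; the parameter names, record size, and file layout are invented for the illustration. The point is only that a single loop mixing record reads and writes with a tunable amount of arithmetic gives direct control over the ratio of input/output to central processor activity.

```python
import os
import tempfile

def synthetic_job(n_records=500, record_len=200, compute_per_record=2000):
    """A toy file-maintenance job: read each record, do a tunable amount of
    arithmetic on it, and write it back.  Raising compute_per_record makes the
    job processor-bound; raising n_records or record_len makes it I/O-bound."""
    path = os.path.join(tempfile.gettempdir(), "synthetic_master_file")
    with open(path, "wb") as f:                      # build the master file
        for i in range(n_records):
            f.write(bytes((i + j) % 256 for j in range(record_len)))

    checksum = 0
    with open(path, "r+b") as f:                     # the maintenance pass
        for i in range(n_records):
            f.seek(i * record_len)
            record = bytearray(f.read(record_len))
            for k in range(compute_per_record):      # central processor activity
                checksum = (checksum * 31 + record[k % record_len]) % 1_000_003
            record[0] = checksum % 256               # "update" the record
            f.seek(i * record_len)
            f.write(record)
    return checksum

synthetic_job()                                      # a roughly balanced instance
synthetic_job(compute_per_record=20_000)             # a processor-heavy variant
```

Running many instances of such a job with different parameter settings, chosen so that the resulting mixture matches measured distributions of processor time and input/output, is essentially the device used in [SR74].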
The workload is assumed to consist of jobs from a number of distinct job classes. Examples of classes are: small FORTRAN jobs, large FORTRAN jobs, assembly language jobs, COBOL jobs, etc. BENCH produces the sequence of job scripts that simulates the system workload. The BENCH data structure consists of a set of tables that contains the assumed characteristics of each class of jobs. These characteristics include distributions of central processor time used and input/output units transferred. A job class may be represented as a sequence of job steps with different resource usage distributions for each step, for example, for compilation, for loading and for execution. The input to BENCH is an estimated distribution of the workload by class. In creating a job script for the simulated workload, BENCH first selects a class at random from the input distribution, and then uses random selection from the distributions in the tables that describe that job class to produce a script for the workload file. The distribution of jobs by classes, and the data in the class descriptions, are obtained from an analysis of data obtained by the measurement techniques discussed in preceding chapters. The output of BENCH is a file of job scripts whose distribution of jobs by class, and whose resource utilization patterns within classes, are an approximation to the distributions observed or measured in the actual workload. The extent to which the simulated workload is representative of the actual workload depends on many factors, including of course the extent to which the measurements used in estimating the distributions are themselves typical or representative.

The routine MARK is a job script interpreter whose input is the sequence of job scripts produced by BENCH. MARK uses system resources in accordance with the directions contained in the scripts. A script item that calls for 50 milliseconds of central processor time will put MARK into a 50 msec loop. A script item that calls for reading m characters from a file will cause MARK to read m characters from a dummy file on an appropriate device. A number of MARK programs can run simultaneously to produce the effect of multiprogramming.

7.4.1. Interactive workload simulation. Mention has already been made of the special difficulties that arise in an attempt to construct a benchmark workload model for a general purpose system that supports interactive computing. Efforts have been made to examine and characterize the remote terminal user and to estimate his contribution to the workload of the system [BO74]. Most benchmark studies either omit this type of computing or assume that its effect can be taken into account through the inclusion of special synthetic or simulated benchmarks. Simulated benchmarks designed especially for measurement of time-sharing systems have been used quite extensively [SA70], [SC70]. Hardware and software methods may be used to fool the system into acting as it would if a number of terminals were logged on and performing typical terminal scripts. Control Data Corporation has a program called a "stimulator" that simulates the effect of n terminals (n is a variable parameter) on a running KRONOS system on CDC
6000 equipment. Similar routines have been developed in connection with other systems.

7.5. System dependent workload models. The simulated benchmarks discussed above illustrate one of the major difficulties in the area of workload characterization. Workload models tend to be very strongly system dependent. The simulated benchmarks make essential use of the hardware and software features of the system for which they are being constructed. They can be very useful for studies of the performance of that system, and for studying proposed changes to that system. While the techniques for model creation can be used in many different systems, the models themselves cannot. Only some relatively simple real and synthetic benchmark sets can be used across a whole class of computing systems. Such benchmark sets, though useful, are very limited in the extent to which they can actually represent the workload of a complex system. The problem of system dependence in workload models is not limited to benchmark models. It is also characteristic of the trace-driven and distribution-driven approaches discussed in Chapter 8. The need for system independent techniques of workload characterization has been one of the major motivations for theoretical studies of models of program behavior [DE68], [GR75].

7.6. Use of benchmarks in computer design. Benchmarks are frequently used in connection with the design of a computer in order to evaluate and choose between proposed features of the design. Such benchmarks cannot be run on the computer being designed, since it does not yet exist, but they can be used as input to a program that simulates the proposed new computer on an existing computer. The program simulates the details of each instruction, or sometimes even each machine cycle of the computer that is being designed. In Chapter 8 we shall discuss discrete event simulators that attempt to model computing systems. The type of simulation discussed here is quite different. It provides a detailed representation of part of the system, usually a central computer or a major processing component of such a computer. The simulation is often straightforward, but very, very slow. It may pay to use very large amounts of computer time to test a design of a product that may sell hundreds or even thousands of copies. Such simulations are a standard tool of computer design. In the case in which the new computer is very much faster than existing computers on which it can be simulated, the cost of such simulations may become prohibitive to the point where they cannot be used.

A very interesting performance study of this type is reported in [CN68] in connection with IBM's introduction of cache memory (later called buffer memory) on their 360 model 85. The authors state that "detailed timing studies of 26 different jobs were made, covering the complete range of characteristics anticipated for the model 85." They also point out that more than 1000 reels of trace data collected on existing 360 models were processed by their simulator during this performance study.
7.7. Comments on equipment selection. In the very early 1960's, I was part of a consulting team assembled to evaluate proposals submitted by eight different computer hardware manufacturing companies to supply a major computing system for a new command and control center for the armed services. The obvious question was raised over and over again in our deliberations. Just what was the workload for which this computing system was to be procured? Part of the group went to Washington to interrogate the agency that had issued the request for bids, and they came back empty-handed. In essence they were told: "The problem would be easy to solve, and we wouldn't need to hire an expensive consulting team to evaluate these proposals, if we ourselves were able to describe the workload of a command and control system in any straightforward way." We were forced to make fairly arbitrary assumptions about what the system would have to do, assumptions based on very general documents describing the philosophy of military command and control, and we then made subjective evaluations of the extent to which the proposed systems would implement this philosophy. Ultimately the contract was awarded to one of the lowest bidders, mostly because awarding it to higher bidders would have required more justification, and hence more knowledge and information than we could acquire in the time that was available to us. I found the whole process very unsatisfactory.

In more recent years some computing system procurement methods used by federal government agencies have become more sophisticated. In a typical large procurement each bidder is required to implement a fairly large application program that is representative of the workload of the system. A specified language, usually COBOL or FORTRAN, must be used, and the award goes to the lowest bidder whose equipment can run the application faster than a specified minimum acceptable speed. The equipment must, of course, satisfy some other criteria, but the capability shown in running the application is the crucial test.

Here again the results are not always satisfactory. There is a real danger that the benchmark application is not an adequate representation of the workload, and that the computer that just manages to perform the test within the maximum allowed time may be marginal for the total workload. The requirement that the system must actually be able to run the benchmark keeps manufacturers from proposing equipment that has not yet been designed and built. It also keeps them from proposing new equipment that could be delivered on time, and that might provide real advantages to the customer. The requirement that every bidder must run the same application benchmark may be necessary to guarantee fair treatment to all vendors, but in some cases there is a danger that it may lead to the procurement of equipment that is approaching obsolescence.
CHAPTER 8
Simulation

8.1. Introduction. The idea of using computer simulation as a tool in the study of the performance of computer systems goes back almost to the beginning of electronic computing. Early simulations were done mostly in connection with the design of new computers. At the most detailed level, logic simulators were (and still are) used to check validity and performance, pulse by pulse and circuit by circuit. On a higher level an instruction simulator simulates the execution of programs on a projected machine (or on another existing machine). Since these simulators execute instructions interpretively, it is not difficult to insert and accumulate timing information, and thus to obtain estimates of the speed with which the simulated machine will execute various test programs. Simulation of many kinds of complex systems was undertaken even during the first computer generation, and computer simulation has become an important area of study and development with a large and extensive literature which includes many books and articles and the proceedings of numerous symposia. Bibliographies can be found in [BE72] and [MA70]. The discussion here will be limited to a brief consideration of the use of discrete event simulation models for the study of the performance of computing systems, with examples based on the study of a time-sharing process of the type discussed in Chapter 2.

8.2. Simulation languages. Computer simulations can be written in assembly language, or in FORTRAN or one of the other general purpose algorithmic languages (ALGOL, PL/I, COBOL, Pascal, etc.). They can be—and usually are—written in special purpose discrete event simulation languages. The object code produced by these "problem-oriented" languages is usually inefficient, and they do not always provide some of the features one might hope for, but they have one overwhelming advantage. It is usually possible to write and debug, and if necessary change, a simulation written in an appropriate simulation language in a small fraction of the time that it would take in any lower level language. There are many useful simulation languages and the reader is referred to [TE66] and [FI73] for surveys and discussions of their various features. One of the first, and still one of the most important of the discrete event simulation languages is GPSS, first implemented on IBM's 700 series computers, and extended and updated in versions for their 360/370 series [GO69], [GO75]. Another of the early simulation languages was SIMSCRIPT which, in its more recent versions, is one of the best-known and most widely used simulation languages [KI68].
SIMSCRIPT gives its users the capability of incorporating FORTRAN subprograms in a simulation written in the SIMSCRIPT language. GASP [PR74] provides similar capabilities. GPSS is sometimes referred to as a transaction-oriented language, and SIMSCRIPT as an event-oriented language. The language SIMULA, based on ALGOL 60 exemplifies still another approach, that of a process-oriented simulation language. A more recent example of such a language is ASPOL, which is described in [MA73]. The following statement appears in [MA74]: "In the author's view, process-oriented simulation languages such as SIMULA or ASPOL are unquestionably superior to other types of simulation languages for the development of large scale computer system simulation models." The author referred to in this quotation is Myron H. MacDougall, of Control Data Corporation, who was in charge of the design and implementation of ASPOL. 8.2.1. ASPOL. When several good languages exist in an area, the determining factors in the decision to use a particular language are usually considerations such as the availability of language processors, local support and documentation, and personal contact with others who have used the language. For these types of reasons, and not as a result of a study of the relative merits of alternative languages, I chose ASPOL as the language to be used in simulation studies of some aspects of the Purdue MACE system. The simulation program examples in this chapter will therefore be in ASPOL. I feel that the language is at a high enough level that even the reader who has never heard of ASPOL before will be able to follow the examples in as much detail as he wishes, provided only that he has some background in general purpose programming language concepts. ASPOL is written as a FORTRAN pre-processor, and thus includes FORTRAN as a sublanguage, and permits FORTRAN subprograms to be included in an ASPOL simulation. The fact that it is based on FORTRAN makes the use of ASPOL more comfortable for those who are used to FORTRAN than the ALGOL-based SIMULA language, which is in some ways more powerful than ASPOL. ASPOL is not a block-structured language in the sense that SIMULA and ALGOL are, but it does use the DO WHILE and IF THEN ELSE constructs along with the BEGIN, END statement parentheses that were introduced in ALGOL. This makes it relatively convenient to avoid the use of GO TO statements, and to write programs that have many of the attributes recommended by advocates of "structured programming." The sample ASPOL programs in this chapter are written in this "structured programming" style. When using a simulation language it is often necessary to be familiar not only with the structure of the application, but also with the structure and philosophy of the simulation language and of its processor. The book "Structured Programming" [DA72] consists of three monographs, of which the third, "Hierarchical Program Structures" by Dahl and Hoare, contains a detailed concise implementation of a process-oriented simulation language, with examples (Section 7.1, pp. 210-220). One would, of course, have to read the
monograph from the beginning (i.e., from page 175) to understand all of Section 7.1, but it is well worth the effort. The monograph by Dahl and Hoare is a discussion of the underlying concepts of SIMULA 67, but the section on simulation applies equally well to ASPOL and to other process-oriented simulation languages. I recommend it very strongly to the reader who wishes to have a more complete understanding of this type of simulation language.

8.3. Characteristics and difficulties of simulation. Simulation has a number of characteristics that make it very attractive as a tool for the study of the performance of computing systems. Simulation languages make it relatively easy to write simulation programs, and it is at least theoretically possible to simulate a system with any desired level of detail and accuracy. I consider the use of simulation, combined with the use of measurement techniques to provide input data and to validate simulation models, to be a promising and productive area for research and analysis studies whose goal is a better understanding of the behavior of computing systems.

Simulation does have some very serious difficulties and drawbacks that make some authors and analysts skeptical of the value of results achieved so far by this methodology, and of its potential as a practical tool for achieving insight into the performance of complex computing systems. Thus, Grenander and Tsao [GR72] state: "We doubt that the measurement and simulation activities of a particular system have improved significantly the general understanding of computer systems." Even when the goal is only to achieve better understanding of one particular system, experience has shown that the writing and testing and debugging and validation of a simulation model of a complex system can be a long and difficult process.

Simulations can use very large amounts of computer time. A thorough simulation study may require an analysis of the behavior of a model in which several parameters are varied independently. The number of simulation runs needed is a product whose factors are the numbers of values of each parameter. This product can easily become large enough to tax the resources of the largest computers.

It is easy to underestimate the difficulty and the cost of a simulation study. The complexity of the simulation model is of the same order of magnitude as the complexity of the system at the level at which it is being simulated. A fully detailed simulation will have most of the simulated system's complexity, and the problems and difficulty of producing a correct simulation program are quite analogous to the problems of producing and debugging the system itself. Nielsen [NI67] mentions a computer manufacturer who "decided that the development of a suitable simulation model might require as much effort as was budgeted for the development of the proposed time-sharing system itself." Bell [BE72] urges the investigator to "state objectives clearly and in writing at the beginning of a simulation effort" ... "because the effort required to simulate a computer system is often very great. Analysts should consider carefully the
probable value of the results before embarking on it." Huesmann and Goldberg [HU67], after pointing out many of the drawbacks of using simulation, state that "perhaps the foremost reason for simulation's popularity is the lack of a viable alternative." 8.3.1. Debugging a simulation. The problems and techniques of debugging a simulation model are similar to those of a real system. A complex system usually contains bugs that do not appear, and are not removed, until the system has been subjected to extensive use in a production environment. There is nothing quite analogous to extensive use under production conditions in the case of simulation models of computer systems. All of the bugs that are going to be found must be found in test and debugging runs. An alternative might be to attempt a formal proof of the correctness of the simulation program. Although program proof methods may never be powerful enough to make it practical to prove the correctness of a complicated simulation program, the rigorous analysis of the consequences of each program section that is associated with an attempt to produce a formal proof may help make the debugging of complex simulation programs manageable. 8.4. A classic simulation study. One of the most interesting and important of the early computer simulation studies was carried out by Norman R. Nielsen in his Ph.D. dissertation at Stanford University [NI66]. IBM had announced the 360 model 67 and the TSS 360 software system in 1964-1965, in response to the demand for time-sharing systems by universities and other research centers [RO69]. Since the system did not exist, it was not possible to test its performance directly. Nielsen used preliminary specifications to set up a simulation model that revealed gross inadequacies in the proposed system. This study was one of the factors that caused a number of customers (including Purdue University) to cancel their orders for 360/67 systems. On the positive side, these factors led to improvements in the TSS 360 system and to the development of a number of alternative systems [RO72]. It does not detract at all from the importance and influence of Nielsen's study to point out that here, as in other areas of performance analysis, the most striking results are achieved when the systems studied have serious defects that can be discovered with relatively simple models and techniques. 8.5. General simulation models. A number of very general computer system simulation models have been developed by commercial organizations. The SCERT system which is marketed by Comress Inc. is perhaps the best known. Their simulator "accepts specifications of the hardware and software to be simulated, and constructs a 'configuration model' representing the performance capabilities of the components of the system." This quotation is from a promotional brochure that describes the SCERT 70 system. The system also accepts definitions of the runs that will make up the workload of the system that is to be simulated. Definitions of runs can be input in a number of
forms, one of which would be the information produced through the combined use of a hardware probe and an accounting log. The simulated workload drives the configuration model, and the output is a series of reports that may be used to predict the performance of the simulated system.

Another product of this type is SAM, the System Analysis Machine, which is offered by Applied Data Research. The following quotations are from an article by Roger Buchanan in "Computerworld" on December 4, 1974: "SAM has an automatic model generation feature which generates accurate models of current application jobs by using SMF data to build a model of each job step." Its library contains "pre-defined submodels of hardware components written in SAM language" ... "In addition, a system software library is available which contains highly parametrized models of the most commonly used system software."

Products like SCERT and SAM have been used in competitive system procurements to predict the price/performance characteristics of systems proposed for handling a specified workload. Losing bidders sometimes question the validity of this modeling approach. There seems to be no reason why this approach should not give reasonable though perhaps very rough performance figures. The difficulty of validating any performance models has already been noted, and will be discussed further in later sections. It is even more difficult to validate the very general models used in this kind of modeling system. Here as elsewhere a good understanding of the modeling system and of the system being modeled is essential in the interpretation of the data. It is sometimes difficult to get such an understanding in a commercial environment. A company that supplies modeling and evaluation services may, understandably, be reluctant to make the details of its software (and possibly hardware) available to the public, and thus to its competitors.

8.6. Simulation of a time-sharing process. Two simulation programs will be discussed here as examples of the use of a process-oriented simulation language, and as vehicles for the discussion of some of the concepts and problems associated with the simulation of computing systems. The simulation programs model two different scheduling strategies for a time-sharing process of the type discussed in Chapter 2. The model was developed in connection with a study of the MESA time-sharing subsystem of the Purdue MACE operating system. Data collection and mathematical modeling techniques discussed in other chapters were used in this study along with simulation. As a result of the studies of the performance of the system, some important changes were made in the structure of the MESA time-sharing process, changes that will require the programming of a new and more complicated model if further simulation studies are to be conducted. The programs discussed here thus represent only a very preliminary phase of what may become a full-scale simulation study. The ASPOL programs and the results of a single run on each of them are included at the end of this chapter.
8.6.1. Distributions. Both programs make the same assumptions about the distributions that drive the simulations. Every interaction has its think time (user state time) selected at random from an exponential distribution with a mean of 24 seconds, and requests a service time which is 20 milliseconds (ms.) plus a quantity chosen at random from a hyperexponential distribution with a mean of 700 ms. and a standard deviation of 1200 ms. The values used for means and standard deviations are typical values that were derived from data obtained through use of the software probe described in Chapter 5. In the MESA system, a time slice cannot end while certain peripheral processor operations are in progress, and slices may thus be longer than the nominal 250 ms. allotted. In the models, the time slice is therefore 250 ms. plus a quantity selected at random from an exponential distribution with a mean of 100 ms.

The assumption of an exponential distribution for the time slice increment seems to be reasonable on the basis of measured data. The measured service time distribution is definitely not exponential. For these simulations we used the hyperexponential distribution, which seems to provide a fair approximation to the observed service time distributions. The exponential distribution of think times has been used in theoretical studies, and was tentatively carried along into this model. A more exact simulation would use a think time distribution that provides a better fit to observed think time distributions. Exponential distributions are discussed in Chapter 9. A hyperexponential distribution is a weighted sum of exponential distributions [KL74]. The particular distribution that is referenced by the call HYPERX(u, v) in ASPOL is a two-stage hyperexponential distribution
H2(t) = 1 - p·exp(-2pt/u) - (1 - p)·exp(-2(1 - p)t/u),

with mean equal to u, and with variance equal to v. It is easy to show that H2(t) has the mean u, and that it will have variance v if p is chosen to satisfy the equation

p(1 - p) = u²/(2(u² + v)),   that is,   p = ½[1 - √((v - u²)/(v + u²))].
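A sampling routine for this distribution is short. The following Python sketch is an illustration of a HYPERX-like generator based on the balanced two-stage form given above; it is not the ASPOL built-in, and the final lines merely check that the sample mean and standard deviation come out near the 700 ms. and 1200 ms. used for service times in these models.

```python
import math
import random
import statistics

def hyperx(u, v, rng=random):
    """Draw one sample from a two-stage hyperexponential distribution with
    mean u and variance v (requires v > u**2).  With probability p the sample
    is exponential with mean u/(2p); otherwise exponential with mean u/(2(1-p))."""
    p = 0.5 * (1.0 - math.sqrt((v - u * u) / (v + u * u)))
    if rng.random() < p:
        return rng.expovariate(2.0 * p / u)          # mean u / (2p)
    return rng.expovariate(2.0 * (1.0 - p) / u)      # mean u / (2(1 - p))

# Check against the service-time parameters used in the models:
# mean 700 ms, standard deviation 1200 ms.
samples = [hyperx(700.0, 1200.0 ** 2) for _ in range(200_000)]
print(statistics.mean(samples), statistics.pstdev(samples))
```

In the models, the full service request is then 20 ms. plus such a sample, and the time slice is 250 ms. plus an exponential sample with a mean of 100 ms.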
8.6.2. Models of scheduling strategies. The two programs (models) assume that the number of active terminals remains constant throughout the study. In the runs whose results are presented here, the number of terminals N is equal to 30. Each terminal is represented by a task, and the conceptual model assumes that there are N tasks that circulate around the closed systems shown in Figs. 8.1 and 8.2. A complete circuit around a loop by a task corresponds to an interaction.

8.6.2.1. Model 1 (Figure 8.1). When a task enters facility 1 at B it is delayed in facility 1 by an amount of time selected at random from the think time distribution. When it leaves facility 1 at A, it makes a service request of s milliseconds, where s is selected at random from the service time distribution. Its system time, which is the response time for this particular interaction, is the time it takes to get from A back to B.
FIG. 8.1. A simple model of a time-sharing process
From A it goes into queue 1, which is a First-in First-out queue. When it reaches the head of this queue it will be given a time slice in facility 2, which represents the time-sharing process. The size of the time slice is selected from the time-slice distribution. The task may use as much of that time slice as it needs. If it finishes its service time within that time slice, it leaves facility 2 as soon as it has finished, and goes back to facility 1. If it has not finished its service time by the end of its first time slice, it goes into queue 2, which is once again a First-in First-out queue with the same time-slice distribution as queue 1. Since queue 1 has priority over queue 2, facility 2 takes its next transaction from queue 2 if and only if queue 1 is empty. A transaction that requires several time slices to complete its service request circulates back to queue 2 after each of its time slices in facility 2, until it achieves its service time and goes back to facility 1.

8.6.2.2. Model 2 (Figure 8.2). The scheduling algorithm operates in two phases. In phase 1 only those interactions in queue 1 are considered. The interactions in queue 1 are those that have not yet had one time slice of service. The entries in queue 1 are ordered by task number, which is the same as the terminal number. A top-of-queue pointer moves past each task in order, and
FIG. 8.2. A two-phase multi-queue model of a time-sharing process
gives it a time slice of service if it has an interaction in queue 1. The interaction may terminate in its first time slice and go back into user state in facility 1. If it needs more time slices it goes into queue 2. The pointer moves around until it has pointed to all N tasks in the current phase 1. It is then repositioned just past the last task to which it assigned a time slice, and phase 1 ends. The pointer now marks the task that will be considered first in the next phase 1. An interaction that arrives in queue 1 after phase 1 has begun, but before the pointer has passed its task number, will get a time slice in the current phase 1. An interaction that arrives in queue 1 after the pointer has passed its task number will have to wait until the next phase 1 before it gets its first time slice.

In phase 2, a fixed total number K of time slices is distributed among those interactions that have already had at least one time slice. During phase 2, the next time slice goes to the interaction that is first on the lowest-numbered of the queues 2 to 64. Interactions in these queues are ordered by task number, with the highest task number first. In phase 2, if an interaction completes a time slice and needs at least one more, it moves to the next higher-numbered queue. If it is already in the highest-numbered queue it moves back to its task number position in that queue. The number K is a scheduling strategy parameter. A small value of K increases the preference given to short interactions. A larger value of K gives more time to longer interactions. Phase 2 ends when K time slices have been used. It can end earlier, if all of the queues from 2 to 64 become empty before K time slices have been used. The scheduler alternates between phase 1 and phase 2. This rather complicated model results from an attempt to simulate accurately the scheduling strategy that was in use at the time this simulation program was written.

8.6.3. Accuracy of the model. The fact that the strategy can be simulated accurately does not imply that the total simulation model is accurate. The model is still a very much simplified representation of the system that is being modeled. In any modeling activity, it is important to be aware of the assumptions that have been made. In both of these models, the assumptions that the number of terminals (tasks) remains constant, and that each task can be represented by parameters chosen from a single known set of distributions, are very strong simplifying assumptions that affect the accuracy of the representation of the system by the model. It would not be difficult to change the model to handle a varying number of terminals. The problem of accurate characterization of the tasks that go through the system is a very fundamental one. The use of empirical distributions and of trace data will be discussed in later sections. The more general problem of workload characterization has been discussed briefly in Chapter 1 and at greater length in Chapter 7. The validity of a simulation study usually depends more on the accuracy of the representation of the workload that drives it than on the accuracy of the representation of the system that is being driven. Both must be considered in any estimate of the validity of the model.
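Before turning to the ASPOL programs themselves, the following self-contained Python sketch shows the general shape of an event-driven simulation of Model 1 under the distributions of Section 8.6.1: N terminals alternating between think time and a service request, one facility dispensing time slices, and queue 1 (first slice) given priority over queue 2. It is an illustration of the technique, not a transcription of Program 1; all times are in milliseconds, and the data-collection window mirrors the condition on TIN discussed in Section 8.8.

```python
import heapq
import math
import random
import statistics

random.seed(2)
THINK = 24_000.0                          # mean think time, ms
SLICE_BASE, SLICE_EXTRA = 250.0, 100.0    # slice = 250 ms + exponential(mean 100 ms)

def hyperx(u, v):
    """Two-stage hyperexponential sample with mean u and variance v."""
    p = 0.5 * (1.0 - math.sqrt((v - u * u) / (v + u * u)))
    rate = 2.0 * p / u if random.random() < p else 2.0 * (1.0 - p) / u
    return random.expovariate(rate)

def service_time():
    return 20.0 + hyperx(700.0, 1200.0 ** 2)

def simulate(n_terminals=30, horizon=6_000_000.0):
    events, seq = [], 0                   # heap of (time, sequence number, kind, task)
    q1, q2 = [], []                       # FIFO queues; queue 1 has priority
    busy = False
    remaining, started, responses = {}, {}, []

    def push(t, kind, task):
        nonlocal seq
        heapq.heappush(events, (t, seq, kind, task))
        seq += 1

    def start_slice(now):
        nonlocal busy
        task = q1.pop(0) if q1 else q2.pop(0)
        slice_len = SLICE_BASE + random.expovariate(1.0 / SLICE_EXTRA)
        used = min(remaining[task], slice_len)        # leave early if finished
        remaining[task] -= used
        busy = True
        push(now + used, "slice_end", task)

    for task in range(n_terminals):                   # all terminals start in user state
        push(random.expovariate(1.0 / THINK), "request", task)

    while events:
        now, _, kind, task = heapq.heappop(events)
        if now > horizon:
            break
        if kind == "request":                         # think time over, make a request
            remaining[task] = service_time()
            started[task] = now
            q1.append(task)
        else:                                         # a time slice has ended
            busy = False
            if remaining[task] <= 0.0:                # interaction complete
                if 500_000.0 < started[task] < 5_800_000.0:   # measurement window
                    responses.append(now - started[task])
                push(now + random.expovariate(1.0 / THINK), "request", task)
            else:
                q2.append(task)                       # needs at least one more slice
        if not busy and (q1 or q2):
            start_slice(now)
    return responses

r = simulate()
print(len(r), statistics.mean(r) / 1000.0, statistics.pstdev(r) / 1000.0)  # seconds
```

A corresponding sketch of Model 2 would add the two-phase pointer and the numbered queues 2 to 64; the structure above is the part the two models share.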
8.6.4. Simulation programs. Program 1 (pages 103 to 104) simulates the scheduling strategy of Model 1. Our discussion of this program will be rather detailed in order to give the reader some insight into the logical structure of the running (object time) system associated with a simulation language like ASPOL. In order to make the simulation programs more readable, we have departed from some aspects of good programming practice. The values of the various parameters are included directly in the programs. Production versions of these programs would be somewhat longer, and all parameters that might change from run to run would be variables which would be set by DEFINE statements at the beginning of the program. The word "process" is used with two different meanings in this section. In order to avoid confusion we shall use "process" with the meaning defined in Chapter 2 only when it is preceded by the words "time-sharing," and refers to the time-sharing process that is being simulated. When discussing ASPOL programs, the word "process" will be used in its technical sense as defined in the ASPOL manual [CD72]: "A process is a dynamic entity, a particular and unique instance of a specified set of activities." ... "A process description is a program which apparently is being executed simultaneously for all the currently-existing processes it represents."

Program 1 consists of a short driving program SIM MESA on lines 1 to 13, and a description of the process INTER that makes up the rest of the program. During execution of the driving program, the FOR statement that starts at line 10 of page 1 of the program listing causes the logical equivalent of 30 copies of the code for the process INTER to be created. Each copy represents a terminal and is identified by the integer value I with which it is initiated. Even though the simulation program does not actually create 30 copies of the code, it is convenient to talk in terms of 30 separate copies. As each copy of the program INTER is initiated it starts executing and continues executing until it comes to a waiting or a queueing instruction such as HOLD or WAIT or RESERVE or QUEUE. The DO on line 29, which is not preceded by a FOR phrase, directs the next statement to be repeated over and over again until stopped by some action external to that statement. The statement that is repeated starts with the BEGIN on line 30 of page 1 and ends with the END on line 25 of page 2. It is essentially the whole INTER routine, and represents a single complete interaction. The 30 copies of the code for INTER generated by the INITIATE statement (line 11, page 1) execute concurrently, thus simulating 30 simultaneously active terminals. The concurrency is, of course, simulated, since in fact only one copy can be executing at a time.

A master routine keeps track of simulated time and maintains a number of lists. A copy of a process, represented by a pointer to the part of the program that represents that copy, can be on one and only one list at any given time. All processes that are ready to execute are on a ready list in priority order, with the one with highest priority first. Processes of equal priority will execute in the order in which they were placed on the list. A process that goes into execution is said to have been activated.
Processes that are not executing may be waiting for the passage of a known amount of time, or may be waiting for an event to occur. If their next time of activation is known, they will be on the activation list in the order of increasing time of activation. If a process is waiting for an event to occur, it will be in a list associated with the occurrence of that event. The list may be a queue, in which case the first process (i.e., the one with highest priority, or the one waiting longest in the case of equal priority) is moved to the ready list when the event occurs, and the next one moves to the head of the queue to wait for the next occurrence of that event. If the list is not a queue, then all waiting processes become ready to execute when the event occurs. The process in execution can run until it comes to a waiting or queuing instruction. While it is executing it may cause an event to take place by executing statements like SET or RELEASE. Whenever this occurs control goes back to the master program which moves any appropriate waiting processes to the ready list, and then gives control to the first process on that list. When a process comes to a waiting or queuing instruction, control goes to the master routine, which places it in its proper sequence on the activation list if the time it will have to wait is known (i.e., a constant). Otherwise the process is placed on the list associated with the event for which it is waiting. The master routine then activates the next process that is ready to execute. If there are no more processes ready to execute, it advances simulated time to the activation time of the first process on the activation list, and that process is then placed in execution. The driving routine SIM MESA is itself a process, and the HOLD instruction in line 12 puts it into the activation list with a wait time of 6,000,000. When 6,000,000 units of simulated time have passed, the driving process will be activated, and the execution of END SIM will terminate the simulation. Let us now follow one interaction for one terminal. The HOLD statement on line 31 corresponds to think time. It causes this copy of the code of INTER to be placed on the activation list until an amount of simulated time equal to the value selected from the built-in exponential distribution has passed. This copy will then be activated and statements 32 to 35 are executed. Statement 32 selects a service time INTIM from the hyperexponential distribution. The time at which system time started is then stored in TIN, and the RESERVE statement gives control to the master program, which places this copy on a queue for the facility MCP declared in line 2. The facility MCP corresponds to what we have called the time-sharing process. It is facility 2 in Fig. 8.1. The interaction will be complete when this copy has spent INTIM units of time in control of the facility MCP. Instead of using two separate queues, this simulation uses a single queue with two different priorities. All of the copies with priority 1,000 are in queue 1, and all of the copies with priority 100 are in queue 2 of Fig. 8.1. After the passage of some amount of simulated time, as recorded in the global variable TIME, this copy will be at the head of the queue and will take control of MCP at the time that it is released by another process. It will hold MCP until it executes a RELEASE statement. A slice time Y is selected in line 36. Line 37 asks if all of the service time, INTIM, can be included in this one time slice. The code between 40 and 50
handles this case. The HOLD on line 41 gives control back to the master program. This copy of INTER will not be ready to execute again until an amount of simulated time equal to its service time has passed. When this copy resumes execution it will RELEASE the facility MCP, once again returning control to the master routine, which gives control of MCP to the next process in the MCP queue. The current copy remains on the ready list, and when it resumes its execution the logic of the IF THEN ELSE statement sends control to the END at line 25 of page 2, and the indefinite repeat induced by the DO on line 29 of page 1 sends this terminal (or copy) back into user state. If the service time cannot be completed in one time slice, the IF in line 37 sends control to the ELSE clause that starts on line 51 and that makes up the rest of the program. Line 55 represents the end of the first time slice, after which MCP is released on line 2 of page 2. The new priority of 100 places the copy in the second queue, and its remaining service time requirement is TIMR. The reader who has followed so far should find the rest of the program easy to follow.

The simulation of Model 2 (Fig. 8.2) is a bit more complicated (pages 110-112). An attention array ATT is declared on line 10 of SIM in such a way that it can be made common to all processes by the single macro-statement ABLOCK. An additional process, CONTROL, is initiated to handle the queuing and the alternation between phase 1 and phase 2. The 30 copies of the process INTER are quite similar to those in the first program, but here, at the end of its think time, each copy sets its identifying attention bit in the common array ATT. The identification is the copy number (terminal number). The copy then gives up control, and waits until the process CONTROL gives it a signal to proceed. There is a group of events GO, one for each terminal. The indexed SET (GO) instructions on line 48 of page 1 and line 5 of page 2 permit the corresponding copies to proceed, i.e., to become ready to execute. All of CONTROL up to line 10 of page 2 controls phase 1 of the scheduling algorithm of model 2, in which an interaction gets its first time slice. After CONTROL issues each SET GO, it waits for an event ATSE to be set. A copy of INTER sets ATSE when it completes its first slice. The group of 63 events, QU(63), declared in line 14 (page 1), sets up 63 corresponding queues. The code in the ELSE clause on page 3 causes a case of INTER to move into successively higher-numbered queues as it gets more and more slices. The CONTROL code in lines 14-21 of page 2 uses the index J to allocate 10 slices to phase 2. Thus the parameter K of the scheduling algorithm has been set equal to 10 for this particular run. Each case of INTER sets the switch STSE when it completes a time slice other than the first, and CONTROL uses the WAIT (STSE) instruction to regain control at the end of each such time slice.

The IF statement on line 23, page 2, simulates the situation in which there are no incomplete interactions in the system, i.e., all of the terminals are in user state, and facility MCP is idle. This IF statement was omitted in an early undebugged version of the simulation program, causing the program to go into an infinite loop in the routine CONTROL when the idle situation arose. In the idle system there was no wait executed in CONTROL, and therefore no way to give control back to the master routine to cause time to advance to the beginning of the next
interaction. The event ATCON, which is set by every case of INTER just before it makes its service request, is needed to provide a waiting list for CONTROL whenever the simulated system becomes idle. It is interesting to note that an early version of the time-sharing process MESA in the Purdue MACE system had a serious performance bug in the handling of the situation in which the time-sharing process was idle.

8.6.5. Simulation results. The results of the two simulation runs are presented along with the programs. The simulation language makes it quite easy to produce detailed reports and graphs through use of the TABLE declarations and the RECORD statements. In Program 2, the MONITOR statements on lines 15 to 19, page 1, cause the simulator master program to maintain statistics for the first 20 queues, and this data is reported on page 1 of the simulation output report.

Both programs were also run for a 40-terminal system in which we would expect essentially 100% utilization of the facility MCP. It would serve no useful purpose here to give detailed results of these simulation runs. Formula (9.11) of Chapter 9 suggests that the system modeled here should be able to give good service to about 33 terminals. Assuming 100% utilization for MCP, formula (9.10) predicts that the average response time for the 40-terminal system would be 4.8 seconds. The simulation runs showed MCP busy only 96% of the time, and for both models they showed an average response time of 5.7 seconds. As might be expected, the standard deviation for Model 2 (16.4 seconds) was larger than for Model 1 (12.7 seconds), and the maximum response time for Model 2 (512 seconds) was larger than that for Model 1 (157 seconds). For Model 2 several additional runs were made with different values of K and with several different values of N. Each run used about 200 seconds of central processor time on the CDC 6400 computer and cost about $18.00 at the standard rates charged at Purdue's computing center. Thus even this very elementary simulation study, whose purpose was more pedagogic than technical, turned out to be fairly expensive: a warning about the probable cost of a more comprehensive simulation effort. The simulation runs give some insight into the extent to which the more complicated scheduling strategy can be expected to succeed in giving better response to short service requests at the expense of those with long service requests. In other runs additional output tables were produced that showed the distributions of response time for a number of ranges of service time. It is easy to get almost any desired amount of detailed information about the performance of a simulation model. It is easy to fall into the error of thinking that the detailed behavior of the model reflects the detailed behavior of the system that it represents.

8.7. Length of simulation runs. A simulation program is part of a study or experiment designed to produce information about and insight into the performance of a process or a system. The simulation should run long enough to produce results that satisfy the needs of the analyst or investigator who is making the study.
The simplest approach to determining an appropriate running time is an empirical approach. Produce summary results after every m seconds of simulated time and keep running until some critical parameters have stabilized. This was done for the simulation programs described here. Elementary queuing theory considerations predict that the models will achieve a steady state, and a leveling off of response time, for example, can be taken as an indication that the steady state has been reached. There is some question as to whether the time-sharing process itself reaches a stable state, since our data indicates an unexpectedly large amount of fluctuation in the number of terminals logged on. This is an area worthy of further study, but for now we shall consider the model rather than the underlying system. In some situations it may be adequate to observe the fluctuation of a critical parameter, and then make an arbitrary decision as to how long to run. In a sophisticated simulation study this observation of critical parameters will be done by the simulation program itself, through use of a statistical analysis routine that is activated at intervals during the simulation run. The distributions observed in studies of time-sharing systems frequently have standard deviations that are large compared with their means. For example, the response-time distribution for our simulation example 1 has a mean of 2.806 and a standard deviation of 6.417. This was based on 5911 simulated interactions. If we assume that these 5911 interactions represented a random sample from the set of all interactions, elementary sampling theory states that the standard error of estimate for the distribution of sample means is

σ/√n = 6.417/√5911 ≈ 0.083.
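The arithmetic behind these limits is easily checked. The short Python fragment below is an illustration added here, not part of the original lectures; it reproduces, from the summary statistics just quoted, the numbers used in the next sentence:

```python
import math

# Summary statistics reported for the Program 1 run (response time, in seconds)
mean = 2.806
std_dev = 6.417
n = 5911                                     # simulated interactions treated as a sample

std_error = std_dev / math.sqrt(n)           # standard error of the sample mean
low, high = mean - 2 * std_error, mean + 2 * std_error

print(f"standard error = {std_error:.3f}")               # about 0.083
print(f"2-sigma interval: {low:.3f} to {high:.3f}")      # about 2.639 to 2.973
```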
Arbitrarily taking twice this standard error as a confidence limit, we might then say that the probability is very high that the average response time is between 2.639 and 2.973. It has been observed by many authors that the sets of values of response times and other parameters do not represent random samples at all. The response to a particular service request may depend in very complicated ways on the service requests made by preceding interactions. Methods of autocorrelation analysis rather than those of sampling theory are appropriate in determining confidence limits for means and other estimated statistics. The reader is referred to Chapter 10 of [FI73] and the references given there and in [MA70] for a discussion of the use of sophisticated statistical analysis techniques to determine how long a simulation should run. Another, more recent, reference is [CR74] in which the authors address "questions of simulation run duration, and of starting and stopping simulations . . . from the viewpoint of classical statistics." The paper [SC74] discusses the practical use of the methods of [CR74] in a test simulation of a time-sharing system. 8.8. The starting and stopping problem. Another serious problem is the start-up problem. A real time-sharing system starts with a sequence of log-ons
when the system first comes on, and may gradually build up to its full load. A separate start-up strategy could be incorporated in the model. It is clear that any performance statistics collected during the initial start-up period would be atypical, and might adversely affect the validity of the average values and distributions of the results. In the simple simulation model presented here all terminals are assumed to be entering user state at time 0 when the simulation starts, but no data is collected on any transactions that start before 500 seconds of simulated time have elapsed. Since average user time is taken to be 24 seconds, almost all terminals will have been through a number of interactions before data-gathering starts, and at least some of the starting transient will have been dissipated. At the end of the simulation there will be a number of interactions that are incomplete. If we simply leave these interactions out of our totals we shall be introducing a bias into the result, since we shall be leaving out a disproportionate number of long interactions. For example, if we define interactions with response time greater than 20 seconds as long interactions, then all long interactions that start in the last 20 seconds would be discarded, along with a number of very long interactions that started even earlier; while all short interactions that started earlier, and many short ones that started in the last 20 seconds, would be included in the statistical summary. For this reason we do not include any interactions that start in the last 200 seconds. All of the data that goes into the results are collected within the scope of the condition: IF((TIN.GT.500000.).AND.(TIN.LT.5800000.))THEN. The reader is referred to the references mentioned at the end of the preceding section for more sophisticated statistical approaches to the starting and stopping problem. 8.9. Trace-driven modeling. In the simulation examples that we have discussed, we have used simple distributions to drive the system. Relative advantages and disadvantages of using mathematical models and simulation models are discussed briefly in Chapter 9. There are serious difficulties in both areas, but one major advantage of simulation models lies in the fact that there are essentially no restrictions on the distributions that can be used to characterize the parameters of the system. Simulation languages usually provide the facility for obtaining random values from any of a number of standard probability distributions. It is also quite easy to introduce and use empirical distributions which can be provided to the system in a number of ways. A very interesting approach to providing workload driving data to a simulation model of a computing system is to use data provided by trace programs. The software tracing probes discussed in Chapter 4 can be especially useful for this purpose. The concept of trace-driven modeling was introduced by Cheng [CH69] in 1969. He states: "In the trace-driven approach, data traced on a real running system are used to drive the model. The workload and the activities of system
components in response to the workload are supplied as input to the model in the form of trace data." This attractive approach has been used by other investigators. Sherman, Baskett and Browne [SH72] used this technique to study a number of central processor dispatching strategies, using "trace data for 500 jobs on May 13, 1970." This specific designation of the source of data used illustrates the major difficulty with trace-driven modeling. The trace data that was used was collected in one relatively short interval. The way in which the data were used indicates that the authors considered the data adequately representative of the system workload for the purpose of the study that was being undertaken. In other more elaborate studies one might duplicate experiments using traces taken at different times, and use statistical tests to determine the consistency of the results, or one might combine the results of traces taken at different times in an attempt to produce a model of the workload that is more representative than any single set of trace data. In trace-driven modeling it is often not possible or not convenient to use the raw trace data as input to the model. The trace data may itself merely be the input to a procedure that produces as its output the model of the workload that drives the model of the computing system. Trace-driven modeling has been mostly discussed in connection with simulation modeling. In a sense every model is a simulation of the system being modeled. The trace-driven approach can be used to drive other models, or to drive a real system, in order to study the effect of proposed changes in system components or system strategies. 8.10. Models driven by empirical distributions. The simulation examples in this chapter are distribution-driven models. The driving distributions are the mathematical distributions: exponential, hyperexponential, etc., that are provided in the library associated with the simulation language processor. Another approach, related to the trace-driven modeling approach, is to use frequency distributions derived from measurements of the parameters of the system. The simulation programs discussed in § 8.6 have been modified to use frequency distributions, supplied in the form of tabular input data, in place of the mathematical distributions supplied in ASPOL. This seems to be a very attractive way to use data obtained from system measurements to enhance the validity of a simulation model.
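As an illustration of the idea, the sketch below shows one simple way a tabular frequency distribution might be sampled in a general-purpose language. Python is used here only for concreteness (ASPOL's own tabular-input facility is not shown), and the histogram values are hypothetical, not measurements from the text:

```python
import bisect
import random

def make_empirical_sampler(upper_limits, frequencies, rng=random):
    """Return a sampler for a measured frequency table.

    upper_limits[i] is the upper end of the i-th histogram cell and
    frequencies[i] the measured count in that cell.  A value is drawn by
    choosing a cell in proportion to its count and then a point uniformly
    within the cell (a piecewise-uniform inverse transform).
    """
    total = float(sum(frequencies))
    cum, running = [], 0.0
    for f in frequencies:
        running += f / total
        cum.append(running)
    lowers = [0.0] + list(upper_limits[:-1])

    def sample():
        u = rng.random()
        i = bisect.bisect_left(cum, u)
        return rng.uniform(lowers[i], upper_limits[i])

    return sample

# Hypothetical measured service-time histogram (milliseconds); not data from the text.
draw_service = make_empirical_sampler(
    upper_limits=[100, 250, 500, 1000, 2000, 5000],
    frequencies=[3100, 2400, 1500, 600, 250, 80],
)
print([round(draw_service()) for _ in range(5)])
```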
SIMULATION PROGRAM LISTINGS AND OUTPUTS
PROGRAM 1. ASPOL source listing (CDC CYBER 70; two pages in the original). The listing is too badly garbled in this reproduction to be read line by line. In outline, the master routine SIM MESA declares the facility MCP and the tables RINT (response time), INT (interaction time), SLICE (slice time), SINGLE (one-slice response) and ATWAIT (attention wait), initiates 30 copies of the process INTER, and holds for 6,000,000 ms of simulated time. Each copy of INTER alternates an exponential think time of mean 24,000 ms with a hyperexponential service request, obtains the facility MCP for time slices of length 250 + EXPNTL(100) ms at priorities 1000 and 100, and records its statistics only when its interaction starts between 500,000 and 5,800,000 ms of simulated time, as described in the text above.
ASPOL DATA REPORT for Program 1 (simulated time 6,000,000 ms; actual run time 50.967 seconds). The tabular and plotted output is largely unreadable in this reproduction; the recoverable summary figures are: facility MCP, utilization .82, mean busy period 263.5, 13,663 requests; queue MCP, 15,226 entries, mean queue length 2.309 (maximum 13), mean queueing time 910.07. Table RESPONSE TIME (1): 5,911 entries, mean 2806.042, standard deviation 6417.681, maximum 85,298. Table INTERACTION TIME: 5,911 entries, mean 736.004, standard deviation 1265.253. Table SLICE TIME: 18,653 entries, mean 263.561, standard deviation 124.770. Table 1SLICE RESPONSE (1): 2,822 entries, mean 345.377, standard deviation 210.270. Table ATTENTION WAIT: 6,697 entries, mean 172.885, standard deviation 174.949. (All times are in milliseconds.)
PROGRAM 2. ASPOL source listing (three pages in the original), again too badly garbled to be read line by line. In outline, the master routine declares the facility MCP, the tables RINT (response time), INTERAC (interaction time), SINGLE (one-slice response) and ATWAIT (attention wait), the shared attention array ATT(63) (made common to all processes through the macro ABLOCK), and the events ATCON, GO(63), ATSE, QU(63) and STSE; it places MONITORs on queues QU(1) through QU(20), initiates 30 copies of INTER and one copy of CONTROL, and holds for 6,000,000 ms. CONTROL implements the two-phase scheduling algorithm of Model 2 with K = 10 slices allocated to phase 2, and each copy of INTER posts its attention bit, waits for its GO event, and then receives its service in slices, moving through the queues QU(1) to QU(63), as described in the text above.
ASPOL DATA REPORT for Program 2 (simulated time 6,000,000 ms; actual run time 281.823 seconds). Recoverable summary figures: facility MCP, utilization .83, mean busy period 265.019, 18,700 requests. Queue statistics are reported for QU(1) through QU(20): the number of entries falls from 3,467 for QU(1) to 68 for QU(20), with mean queueing times of a few hundred milliseconds for the low-numbered queues and maxima ranging into tens of thousands of milliseconds for the higher-numbered ones. Table RESPONSE TIME (2): 5,826 entries, mean 3162.899, standard deviation 6213.182, maximum 144,520. Table INTERAC TIME HYPER: 6,613 entries, mean 749.424, standard deviation 1325.032. Table ONE SLICE TIME (2): 2,779 entries, mean 1440.820, standard deviation 1267.895. Table ATTENTION WAIT: 5,886 entries, mean 1276.145, standard deviation 1570.677. (All times are in milliseconds.)
CHAPTER 9
Mathematical Models
9.1. Models of computing systems. The use of mathematical models in the analysis of computing systems is one of the most interesting and important areas of research in performance measurement and evaluation. A mathematical solution of a model that is a valid representation of a computer subsystem can give a great deal of insight into the performance of that subsystem. A few formulas derived through mathematical analysis might provide more useful information than could be obtained through the use of expensive and elaborate probes, or from large numbers of long simulation runs. The aim of a mathematical model is to abstract some of the essential characteristics of a system, and to find relationships among the parameters of the model that lend themselves to mathematical treatment and solution. Computing systems are extremely complex structures that do not easily lend themselves to description in terms of systems of equations that can be solved either in closed form or through computation. It is usually necessary to make strong simplifying assumptions in order to bring the mathematical and computational complexities within the range of known practical methods. A model that cannot be solved at all is useless. A model that has been simplified to the point where it has lost its validity as a representation of the system being studied may provide solutions that are equally useless. One of the major goals of this whole area of research is to develop new methods, and to extend existing methods to the solution of more complex models that provide closer approximations to real systems than those that can now be solved. The measurement techniques described in earlier chapters can be used to calibrate a model by providing appropriate values for the parameters and the statistical distributions that drive the model. To the extent that mathematical solutions predict performance parameters of the system that is being modeled, it may be possible to validate the model by comparing predicted values with values derived from an analysis of the measured data. We shall discuss a few queuing models that have been used in the study of time-sharing systems, and shall touch briefly on the "network of queues" models that have been used to study some multiprogramming systems. A large literature exists in this area, including several bibliographies [MC69], [AN72]. Additional and more recent collections of references can be found in the dissertations [OM73] and [BU71], and in a number of the other references. There are large areas of mathematics that are relevant to the topics discussed in this chapter, especially probability theory and queuing theory. There are many good standard texts on probability including [FE50] and [PA65]. Volume 1 of [KL75] is a very
recent and thorough text on queuing theory. A second volume of [KL75], not yet published at the time of this writing, will be devoted to the applications of queuing theory to the modeling of computing systems. Our discussion here will be brief and informal, and will rely very much on references to the published literature. Our purpose is to illustrate the nature of some of the assumptions that are typically made, and the nature of some of the results that can be achieved. 9.2. The single server queue. A simple single server queuing system is illustrated in Fig. 9.1. Service requests arrive at distinct points in time, t_1, t_2, ..., t_i, ... . The service request that arrives at time t_i needs s_i time units of service, and the server can supply service to only one request at a time. If the server is idle at the time the i-th request arrives, the request enters the server immediately, and departs from the system s_i time units later. If the server is busy when a request arrives, the request enters the queue. When the server finishes serving the request with which it is currently occupied, another request is selected from the queue, and its service request is then satisfied. The algorithm used to select the next request from the queue is called the "service discipline". The simplest discipline is FCFS (First-Come First-Served), in which the request that has been waiting longest is chosen as the next one to be served. Another possible discipline is LCFS in which the request that has arrived most recently is the one selected.
FIG. 9.1. A single server queuing system
The requests in the queue are usually requests for different amounts of service time, and other examples of service disciplines are shortest request first (SRF) or longest request first (LRF). We shall assume the First-Come First-Served discipline wherever there is no other service discipline specified. We also assume that the queues are large enough to store all service requests that need to be queued. Many different phenomena can be represented by queuing systems. In particular the server may represent a time-sharing process, and the service requests may represent requests from terminals as users reach the end of their think time, and
hit a carriage return or other attention key to request service from the system. (See [CH70] for a good presentation of the theory of the single server queue and its applications to computing systems.) 9.2.1. The infinite source model. The rate at which service requests arrive is one of the important characteristics of a queuing system. In order to be able to use some of the classical results of probability theory, it is often convenient to assume that the probability P(t) that at least one request will arrive in a time interval (t_0, t_0 + t) satisfies the relationship

P(t) = λt + o(t),    (9.1)

where λ is a constant, and o(t)/t → 0 as t → 0. We then say that the requests arrive according to a Poisson process with density λ. The characteristics of a Poisson arrival process are derived in standard texts on probability and statistics. The probability that exactly n requests will arrive in an interval of length t is given by the well-known formula

P_n(t) = ((λt)^n / n!) e^(−λt).
The probability that no requests will arrive in an interval of length t is P_0(t) = e^(−λt), and the probability that at least one request will arrive is therefore

A(t) = 1 − e^(−λt).    (9.2)

A(t) may be interpreted as the probability that the interval from any starting time t_0 until the next request occurs is ≤ t. If we start at the time of arrival of the i-th request, t_i, A(t) represents the probability that the interarrival time t_{i+1} − t_i will be less than t. For the Poisson process the interarrival times are thus independent random variables with the exponential distribution A(t). The constant λ is the average rate at which requests arrive in the system, and 1/λ is the average interarrival time. Since the standard deviation of an exponential distribution is equal to its mean, the standard deviation of the interarrival times is also equal to 1/λ. The exponential distribution is characterized by the Markov or memoryless property which can be described, a bit informally, by the statement that at any time t_0, the probability that a request will arrive in the next t seconds is independent of anything that occurred prior to t_0, and is thus also independent of the amount of time that has passed since the last request arrived. The assumption that the probability P(t) is independent of the interval starting point t_0 means that it is also independent of the number of requests that may already be in the queue at that time. This assumption cannot be true in any finite terminal system, but it may be approximately true as the number of terminals gets large compared with the number of requests in the queue at any one time. Models that make this assumption are called "infinite source" models.
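The connection between exponential interarrival times and Poisson counts is easy to check numerically. The following small simulation is an added illustration with arbitrary parameter values, not part of the original text; it compares an observed count probability with the formula above:

```python
import math
import random

lam = 0.5            # arrival rate, requests per second (illustrative)
t = 4.0              # length of the observation interval, seconds
trials = 100_000
rng = random.Random(1)

# Generate arrivals by summing exponential interarrival times and count
# how many fall in (0, t]; compare with the Poisson probability for n = 2.
n_target, hits = 2, 0
for _ in range(trials):
    clock, count = 0.0, 0
    while True:
        clock += rng.expovariate(lam)
        if clock > t:
            break
        count += 1
    hits += (count == n_target)

poisson_prob = (lam * t) ** n_target * math.exp(-lam * t) / math.factorial(n_target)
print(f"simulated P(n=2) = {hits / trials:.4f}, formula = {poisson_prob:.4f}")
```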
9.2.2. Exponential service time assumption. In order to simplify the mathematics, especially in connection with complicated models, the assumption is frequently made that the service time probability distribution is an exponential distribution. In more technical terms we state that the service times s_i are independent identically distributed random variables with the exponential distribution

B(t) = 1 − e^(−μt).    (9.3)

B(t) can be interpreted as the probability that a service request selected at random will require no more than t time units of service. The average service time per request is 1/μ, which is also the standard deviation of the service time distribution. Because of the memoryless property of the exponential distribution, B(t) can also be interpreted as the probability that the additional service time required by a request that is currently being serviced is less than t. For the queuing system of Fig. 9.1 with interarrival time distribution (9.2) and service time distribution (9.3), the service rate is defined as

ρ = λ/μ.
Since the service requests arrive at an average rate of λ requests per time unit, and require an average of 1/μ time units of service, it is clear that the queue would grow indefinitely large in time if λ/μ were greater than 1. We shall therefore assume that ρ is less than 1. For this simple single server system with Poisson arrivals and an exponential distribution of service times, a classical result of queuing theory states that the system will eventually reach a steady state in which the average number of requests in the system (either in the queue or in the server) is given by the formula

E(n) = ρ/(1 − ρ).
The average service time required by each of the requests already in the system (including the additional time required by the one in the server) is 1/μ. It therefore seems plausible, and can be proved, that the average response time of this simple model is

W = (1/μ)/(1 − ρ).    (9.6)
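A small numerical illustration of the last two formulas follows; the parameter values are arbitrary and do not come from the text:

```python
# Steady-state M/M/1 quantities from the two formulas above (illustrative values).
lam = 1.0             # arrival rate, requests per second
mu = 1.0 / 0.72       # service rate: mean service time of 720 ms

rho = lam / mu                        # server utilization, must be < 1
mean_in_system = rho / (1 - rho)      # average number of requests in the system
mean_response = (1 / mu) / (1 - rho)  # average response time W, formula (9.6)

print(f"rho = {rho:.3f}, E(n) = {mean_in_system:.3f}, W = {mean_response:.3f} s")
```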
9.2.3. M/M/1 and M/G/1 queues. There is a standard abbreviated notation for specifying the nature of a queuing model by a sequence of symbols separated by slashes. The queuing model discussed in the preceding section would be designated by the sequence M/M/1/∞ or, more simply, by the sequence M/M/1. The first symbol designates the input process, the second designates the service distribution, the third is the number of servers. The symbol ∞ indicates that this is an infinite source model. This is the default assumption that is made if no symbol
at all is present beyond the third. One may also include a specification for the maximum number of requests that may be in the system simultaneously. If no such maximum is specified, it is assumed to be infinite. A service discipline may also be specified, and if omitted is assumed to be FCFS. The letter M designates a process or distribution that displays the Markov memoryless property. Thus in the M/M/1 model, the first M designates the Poisson input process, and the second M designates the exponential service time distribution. The 1 specifies that it is a single server queue. The letter E is reserved for the Erlang distributions which are used in many queuing system studies. A queuing system of type M/G/1 would be a single server queuing system with a Poisson arrival process, but with an unspecified, i.e., a general, service time probability distribution. For an M/G/1 system of the type represented in Fig. 9.1, we assume that the mean, E(S) = 1/μ, and the standard deviation σ(S) of the service time distribution are known. We also assume that ρ = λ/μ < 1, and define c by the formula

c = σ(S)/E(S).
It can be shown that the queuing system reaches a steady state in which the average response time is given by the formula

W = E(S)[1 + ρ(1 + c²)/(2(1 − ρ))].    (9.7)
For the exponential service time distribution in which σ(S) = E(S), c is equal to 1, and the response time in (9.7) reduces to that in (9.6). Formula (9.7) illustrates very graphically how, in the case of the M/G/1 model, the response time depends on the variance as well as on the mean of the service time distribution, and how, assuming constant service rate, the response time increases as the ratio of the variance to the square of the mean increases. 9.2.4. The finite source model. A more realistic model of a time-sharing process than that of Fig. 9.1, especially if the number of terminals is not very large, is the single server queuing system with a finite number of users N, illustrated in Fig. 9.2. In such a system it is not reasonable to assume that the arrival rate of service requests is independent of the number of requests already in the queue. In the extreme case in which all N terminals have service requests in the system, there can be no new service requests at all until at least one service request is completed. A system of this type is self-regulating in the sense that, as requests accumulate in the queue, the rate of arrival of new requests decreases. The finite source model was studied quite extensively in connection with the machine repair problem, which was one of the early practical applications of queuing theory. In the simplest machine repair model there are N automatic machines that work continuously so long as they are not in need of repairs, and there is a single repairman who can service one machine at a time.
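Formula (9.7) is easy to experiment with numerically. The sketch below, in Python with arbitrary parameter values (neither the code nor the numbers come from the original), also checks that it collapses to (9.6) when the service time is exponential, so that c = 1:

```python
def mg1_response(arrival_rate, mean_service, std_service):
    """Mean M/G/1 response time in the Pollaczek-Khinchine form of (9.7)."""
    rho = arrival_rate * mean_service
    assert rho < 1, "the queue is unstable when rho >= 1"
    c = std_service / mean_service          # coefficient of variation of service time
    return mean_service * (1 + rho * (1 + c * c) / (2 * (1 - rho)))

lam, es = 1.0, 0.72                          # illustrative values
print(mg1_response(lam, es, es))             # exponential service: agrees with (9.6)
print(es / (1 - lam * es))                   # (9.6) directly, for comparison
print(mg1_response(lam, es, 2 * es))         # more variable service: larger W
```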
FIG. 9.2. A finite source single server queuing system
In the finite source model the simplifying assumption, analogous to the assumption (9.1) of the Poisson arrival process in the infinite source model, is the assumption that for each terminal in user state the probability P(t) that the terminal will submit a request in a time interval (t_0, t_0 + t) satisfies the relationship P(t) = λt + o(t) as t → 0, where λ is a constant. Since λ, and hence P(t), is independent of t_0, the probability that a new service request will be made by a particular terminal in user state in the next interval of length t is independent of how long the terminal had already been in user state at the beginning of the interval. This is the Markov assumption for the finite source model, and if in addition the service time is assumed to have an exponential distribution, the model is said to be of type M/M/1/N. There is an elegant solution to the machine repair problem with the Markov assumptions [FE50], from which one can deduce the following formula for the average steady state response time W:

W = N/(μ(1 − p_0)) − 1/λ,    (9.8)
where p_0 is that fraction of the total time that the time-sharing process (or the repairman) is idle. The quantity (1 − p_0) can be interpreted as the probability that the process (or repairman) is busy. The value of p_0 can be calculated from the formulas

p_n = p_0 (N!/(N − n)!)(λ/μ)^n,  n = 0, 1, ..., N,  and  p_0 + p_1 + ... + p_N = 1,    (9.9)

where p_n is the steady state probability that there are n requests in the system.
In this model 1/μ is the expected (average) value of the service time E(S), and 1/λ is the average value of the user time or think time E(U). The response time formula (9.8) can be rewritten in the form

W = N · E(S)/(1 − p_0) − E(U).    (9.10)
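A small calculator makes the finite source formulas concrete. The p_0 expression used below is the standard machine repair result in the form assumed for (9.9) above, and the numerical values are illustrative only:

```python
from math import factorial

def p0_mm1n(n_terminals, think_rate, service_rate):
    """Idle probability p_0 for the M/M/1/N machine-repair model (form assumed in (9.9))."""
    r = think_rate / service_rate
    total = sum(factorial(n_terminals) // factorial(n_terminals - n) * r ** n
                for n in range(n_terminals + 1))
    return 1.0 / total

def response_time(n_terminals, mean_think, mean_service, p0):
    """Average response time W from formula (9.10)."""
    return n_terminals * mean_service / (1.0 - p0) - mean_think

# Illustrative values: 30 terminals, 24 s mean think time, 720 ms mean service time.
N, EU, ES = 30, 24.0, 0.72
p0 = p0_mm1n(N, 1.0 / EU, 1.0 / ES)
print(f"p0 = {p0:.3f}, W = {response_time(N, EU, ES, p0):.2f} s")
```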
A very simple argument, quoted in [AD69] and credited there to Khinchin, shows that the response time formula (9.10) is valid for the finite source model M/G/1/N with a general service time distribution with mean E(S). A formula for p_0 for that model is given on pages 317 and 318 of [AD69]. Even relatively elementary applications of queuing theory lead to complicated formulas that contain the Laplace-Stieltjes transforms of the distribution functions involved. The formula for p_0 in [AD69] is expressed in terms of products of such transforms. The calculation of p_0 for a service time distribution other than the exponential distribution can be very much more complicated than that for the M/M/1/N queue indicated in (9.9). Buzen [BU74] describes calculations carried out for several different service distributions, and provides a set of tables that show the relative errors in response time introduced by using the infinite source assumption in place of the finite source assumption for a range of values of N and p_0. As might be expected, in a busy system in which p_0 is small, N must be quite large before the response predicted by the infinite source model provides a close approximation to that predicted by the finite source model. In connection with research involving finite source queuing models at Purdue University, William C. D'Avanzo has shown that formula (9.10) applies under even more general assumptions, for any G/G/1/N system that converges to a steady state. In the more general case it may not be possible to calculate p_0, but in a real time-sharing system it may be possible to measure p_0, and to obtain an estimate of W from estimates or measurements of the distributions of service requests and user times. 9.2.5. Some practical results. Formula (9.10) was first used in the context of a study of a time-sharing system by Scherr [SC67] in his Ph.D. dissertation that contains his classic study of the CTSS system at MIT. As the use of the time-sharing process approaches 100%, p_0 approaches 0 and formula (9.10) becomes

W = N · E(S) − E(U).    (9.11)
The average response time W thus becomes a linear function of N with slope E(S). In this saturation condition every additional terminal that requests service from the system will increase the response time by an amount E(S). Scherr took the value of N for which W = 0, that is,

N = E(U)/E(S),    (9.12)
as an estimate of the number of terminals that the process is able to support with good response. As the number of terminals increases over the value given in (9.12), formula (9.10) is an estimate of the extent to which response time will be degraded. As part of his study of CTSS, Scherr observed and measured the user time at a number of terminals. As a result of this direct measurement he used 35 seconds as an estimate of mean user time. In a much later analysis of the IBM OS/360 TSO system [LA72], he again used the above formulas with a 35-second average think time to estimate the number of terminals that could be supported by a TSO partition. It is very difficult to get accurate measures of think time through internal measurements, especially where front-end computers are involved. As has been pointed out in Chapter 5, it is even difficult to define think time accurately enough to measure it under such conditions. In an early study at Purdue, mean think time was measured as 24 seconds, and mean service time as 720 msec. According to formula (9.12), the time-sharing process could then support 33 terminals. With 40 active terminals, assuming that p_0 = 0, formula (9.10) predicted an average response time of 4.8 seconds. This agreed fairly well with observed response times in the early production versions of the system. The version of the Purdue MACE system that was in use at the time these measurements were made ran two time-sharing processes, and if the availability of time at the time-sharing process were the only critical resource, one would expect to be able to support twice as many terminals as with one process. The two processes compete with each other and with other processes for various resources, and it would be very difficult to analyze the effect of such competition. The elementary formulas (9.10) and (9.12) seem to provide a good first approximation, even in this case. These formulas are useful in providing a quantitative expression for the intuitively obvious fact that there is a strong relationship between the average service time and the average response time of the system. The formulas permit rough calculations of the improvement in system response that can be expected through steps taken to cut down the service time requirements of the average user. 9.3. Time slicing. In a simple queuing system in which time slicing is not used, the full amount of service requested will be granted to the user when he reaches the head of the queue. His total response time is therefore equal to his time in the queue, which is independent of the size of his own service request, plus the length of his own request.
One of the purposes of most time-sharing scheduling strategies is to favor those users who make short service requests at the expense of those with longer requests. There are many possible strategies (see, for example [CO68a]), some of which may deny or delay access to the queue to requests that exceed set limits, and some of which rearrange the order of service requests in the queue, so that shorter requests will be handled ahead of longer ones. 9.3.1. The time-sliced round-robin model (Figure 9.3). In this model, each user is placed at the end of the queue when he enters the system. When he gets to the head of the queue he receives a maximum time slice δ. If his service request S is ≤ δ, his interaction is complete and he exits the system. Otherwise he goes back to the end of the queue. Each time he arrives at the head of the queue he gets another time slice, until he finally satisfies his service time request.
FIG. 9.3. A time-sliced round-robin model
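A direct way to see how this discipline behaves is to simulate it. The sketch below is an added Python illustration, not part of the original; it implements the round-robin model of Fig. 9.3 with Poisson arrivals and exponential service and compares the observed mean response time with the FCFS value of formula (9.6), which, as noted in the following paragraph, it should approach in a long run:

```python
import random

def simulate_round_robin(lam, mu, quantum, n_jobs, seed=1):
    """Simulate the round-robin model of Fig. 9.3 and return the mean response time."""
    rng = random.Random(seed)
    arrive, remaining = [], []
    t = 0.0
    for _ in range(n_jobs):
        t += rng.expovariate(lam)               # Poisson arrivals
        arrive.append(t)
        remaining.append(rng.expovariate(mu))   # exponential service requests
    finish = [0.0] * n_jobs
    queue, nxt, clock = [], 0, 0.0
    while nxt < n_jobs or queue:
        if not queue:                           # server idle: jump to the next arrival
            clock = max(clock, arrive[nxt])
            queue.append(nxt)
            nxt += 1
        i = queue.pop(0)
        work = min(quantum, remaining[i])       # at most one time slice
        clock += work
        remaining[i] -= work
        while nxt < n_jobs and arrive[nxt] <= clock:
            queue.append(nxt)                   # arrivals during the slice join the queue
            nxt += 1
        if remaining[i] > 1e-12:
            queue.append(i)                     # unfinished request returns to the end
        else:
            finish[i] = clock
    return sum(finish[i] - arrive[i] for i in range(n_jobs)) / n_jobs

lam, mu = 0.8, 1.0                              # illustrative rates, rho = 0.8
print("round robin  :", simulate_round_robin(lam, mu, quantum=0.1, n_jobs=50_000))
print("formula (9.6):", (1 / mu) / (1 - lam / mu))
```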
For many of the service disciplines that have been used in the study of queuing models of time-sharing systems, including the FCFS discipline that we are assuming in most of our discussion here, time slicing does not change the average response times given in formulas (9.6) and (9.10). While the average remains unchanged, the response characteristics of the system are quite different, since in a time-sliced system the response time to a user depends in a rather complicated way on the length of the user's service request. In the infinite source model M/M/1, the response time as a function of the size of the service request has been calculated in [CO68] and [SH67]. The formulas derived for response as a function of the length of the service request would take
about half a page, and individual values and sets of values of response time for various choices of λ, μ, and ρ can easily be calculated on a computer, or even on one of the more sophisticated hand calculators. There is some interest in considering the special case in which the length of the time slice tends to zero. This is called the "processor-sharing model", since the model behaves as if all the requests in the queue are continuously getting equal shares of server (processor) time. The processor-sharing model is an approximation to a model in which the time slice is small. For the round-robin processor-sharing model the average response time formula reduces to W(S) = S/(1 − ρ), where S is the length of the service time request [CO68]. For fixed service rate ρ, the average response time W in this model is a linear function of the amount of service requested. The factor 1/(1 − ρ) is the "expansion factor". It represents the factor by which a user's response time is multiplied because of the presence of other users in a time-sharing system. The round-robin model has been solved for the finite source case, M/M/1/N (see [AD69]). The formula for response time as a function of the length of a service request, along with a definition of the terms in which the formula is presented, would take 5 or 6 pages, and will not be presented here. The formula is of use only as a guide to the development of a computer program that can be used to provide the desired values of response time and that can and has been used to provide graphs that correlate parameters such as number of terminals, interarrival and service time, and the size of the time slice used. The referenced paper presents a number of such graphs. The author defines the service rate as Nλ/μ, and notes that when this service rate is held constant, the average response time seems to be close to a linear function of the length of the service request. 9.3.2. Feedback queuing models. Figure 9.4 represents a feedback queuing model (FBN) that has been studied by several investigators [SC67], [CO68]. The model is similar to, but simpler than, the simulation model considered in Chapter 8 (Fig. 8.2). In Fig. 9.4 we assume that a service request enters the system through the first queue, and gets one time slice when it reaches the head of the queue. It leaves the system if its service request is satisfied by this time slice. At the end of each time slice the server gives the next slice to the service request at the head of the lowest level nonempty queue. Whenever a task finishes a time slice in queue i, it moves to the end of queue i + 1 if it has not yet had enough time slices to complete its service request and leave the system. When a task gets to the head of the Nth queue it remains at the head of that queue until it has received enough time slices to finish its service request, but it is preempted between time slices if any tasks appear in lower level queues. A solution to the FBN model under the M/M/1 assumptions of exponential distribution for inter-arrival time and service time is given in [CO68]. The solution is in the form of a formula for the average response time W as a function of the service time. The formula, which could be stated in less than one page, is not reproduced here. The reference [CO68] contains a number of graphs showing the results of calculations for this model and for a number of related models.
FIG. 9.4. Feedback queuing model
Schrage [SC67] studied the model FB∞ in which the number of feedback queues is unlimited, and derived formulas for average response time under the assumptions M/G/1. It is pointed out in [BU74] that feedback queuing models have not yet been solved under the M/G/1/N finite source assumptions. 9.4. Mathematical modeling and simulation. A single formula embodied in a simple computer program contains all the information and more that could be obtained from huge numbers of simulation runs. There is a tremendous advantage in having the mathematical solution, provided of course that the mathematical model is an adequate representation of the system that is being studied.
The scheduling strategy in the simulation model for the system of Fig. 8.2 used a two-cycle scheduler, in which a fixed number of time slices was allocated to the interactions in the higher level queues in the second cycle, before new service requests in the lowest level queue were honored in the next first cycle. This type of scheduling strategy could probably not be handled by the methods by which the model in Fig. 9.4 was solved. The simulation model used a hyperexponential service time distribution in the example presented, and it has also been run with other distributions, including histograms derived from system measurements. The simulation model assumed a finite source of service requests. The mathematical model that has been solved is an infinite source model. The mathematical model provides a difficult mathematical problem. It is hard to quantize such things, but in some sense one might state that its level of difficulty is such as to tax the state of the art of solving such problems, and even apparently small changes in the assumptions would put it beyond the state of the art. The simulation model of Fig. 8.2 is almost at the other extreme. It is presented as an example of a technique, and unnecessary simplifying assumptions were made to keep the example short. From the point of view of programming effort it would be easy to include a great deal more detail in the model. One could, for example, include a study of the interference between the time-sharing processes and the other processes of the system. One could include the effect of demands on other facilities and resources, such as input-output and swapping storage. Writing the simulation program might be a nontrivial programming exercise, but it would be a straightforward effort on the part of a competent programmer. Even though it might be easy to write such a simulation program, it might be prohibitively expensive to run the simulation enough times, and for long enough times, to produce answers to any meaningful questions about the system that is being simulated. In most cases mathematical models and simulation studies are not used to get exact solutions to specific questions about real systems. Mathematical models are often used to provide insights into the behavior of idealized systems, and order-of-magnitude estimates of the behavior of systems and system components. Simulation models can go a bit farther, and can come closer to the complexity of a real system, but, as pointed out in Chapter 8, there is usually a very high cost associated with this additional detail, and practical considerations may force the simulation model to make simplifying assumptions analogous to those made in mathematical models. 9.5. Networks of queues. A very important and interesting approach to modeling computing systems is the use of queuing network models. The best-known example is the central server model of a multiprogramming system that is discussed and studied in detail by Buzen [BU71], [BU73]. The model illustrated in Fig. 9.5 assumes that there are N programs in the system which has one CPU and L peripheral devices. A program starts at the end of the CPU queue. When it reaches the head of the queue it gets service according to a service time distribution with mean 1/u_0. There is now a probability p_0 that
FIG. 9.5. Central server model
the program will terminate, in which case it is assumed that it is immediately replaced with another program at the end of the CPU queue, so that the number N of programs in the system remains constant. If it does not terminate, the probability is p_1, p_2, ..., p_L respectively that it will enter queues Q_1, Q_2, ..., Q_L for one of the L peripheral devices. After it gets to the head of queue Q_i it will receive service from the peripheral device D_i according to a service time distribution with mean 1/u_i. It will then go back to the central processor queue and continue to circulate around the system. The probabilities p_0, p_1, p_2, ..., p_L are assumed to be known, perhaps from system measurements. They must satisfy the constraint

p_0 + p_1 + p_2 + ... + p_L = 1.

In most network of queues studies, all of the service time distributions are assumed to be exponential. The more general network of queues model assumes that N customers circulate around a network that contains L + 1 interconnected servers. Each server has an associated service queue. The quantity p_ij represents the probability that a customer leaving server i will enter the service queue at server j. Thus all of the p_ij are ≤ 1 and, for all i,

p_i0 + p_i1 + ... + p_iL = 1.    (9.13)
The network of queues model is solved in [JA63] and also in [GO67] under the assumption that the servers have independent exponential service time distributions. The solution is the steady state probability p(n_0, n_1, ..., n_L) that there are n_j customers present at the j-th server, where

n_0 + n_1 + ... + n_L = N.    (9.14)
The solution can be presented in a very simple form. Let y_0, y_1, y_2, ..., y_L be any solution of the system of equations

y_j = y_0 p_0j + y_1 p_1j + ... + y_L p_Lj,  j = 0, 1, ..., L.
The relations (9.13) make it certain that this system of equations will have solutions. The steady state probabilities are then

p(n_0, n_1, ..., n_L) = (1/G)(y_0/u_0)^n_0 (y_1/u_1)^n_1 ... (y_L/u_L)^n_L,    (9.15)
where 1/u_k is the average service time for the kth server, and G is a normalizing constant chosen to make the sum of all values of p(n_0, n_1, n_2, ..., n_L) equal to 1. Thus

G = Σ (y_0/u_0)^n_0 (y_1/u_1)^n_1 ... (y_L/u_L)^n_L,    (9.16)
where the sum is taken over all values of the set (n_0, n_1, ..., n_L) that satisfy (9.14). These formulas assume an especially simple form for the central server model, and Buzen's dissertation [BU73] discusses that model and presents some interesting applications of its use. Although the formulas for the solution of the network of queues model seem quite simple, a completely straightforward attempt at evaluating them would not be practical for a network of any magnitude. Buzen's dissertation contains an algorithm designed to make the computations manageable, even for large networks. His algorithm has been programmed for a number of computers, and it provides a useful tool in the area of mathematical modeling of computing systems. The solutions presented in (9.14) through (9.16) represent one of the few instances in which it is possible to obtain closed-form solutions to a fairly general model that can represent some of the very complicated relationships that exist in a computing system. It is often useful and interesting to obtain a solution for a model that is a good representation of the structure of a system, even if the required exponential service time distributions are not good representations of the actual service time distributions in the real system that is being modeled. Networks of queues models are a subject of very active research by a number of investigators. Networks in which there are different classes of customers are being studied, and attempts are being made to replace the exponential service time
assumption with less restrictive assumptions. A paper [MU74] describing recent progress in this area has been widely circulated through private channels, and a more complete version is scheduled to appear in the Journal of the ACM in the near future.
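The computational difficulty mentioned above is the evaluation of the normalizing constant G in (9.16), whose direct sum ranges over every way of distributing the N customers among the L + 1 servers. The sketch below is an added illustration of the kind of convolution recursion described by Buzen for load-independent exponential servers; the relative utilizations x_k = y_k/u_k are assumed to have been obtained already from the traffic equations, and the numerical values are purely illustrative:

```python
from itertools import product

def normalizing_constant(x, n_customers):
    """G(N) for a closed network of load-independent exponential servers.

    x[k] = y_k / u_k is the relative utilization of server k.  g[n] accumulates,
    server by server, the sum of prod(x_k ** n_k) over all ways of placing n
    customers on the servers processed so far (Buzen's convolution recursion).
    """
    g = [1.0] + [0.0] * n_customers
    for xk in x:
        for n in range(1, n_customers + 1):
            g[n] += xk * g[n - 1]
    return g[-1]

# Illustrative central-server-like example: a CPU and two devices, 5 programs.
x = [1.0, 0.6, 0.3]
print("G(5) by recursion :", normalizing_constant(x, 5))

# Brute-force check of (9.16) for this small case.
direct = sum(x[0] ** a * x[1] ** b * x[2] ** c
             for a, b, c in product(range(6), repeat=3) if a + b + c == 5)
print("G(5) by direct sum:", direct)
```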
References
[AB70] V. A. ABELL, S. ROSEN AND R. E. WAGNER, Scheduling in a general purpose operating system, AFIPS Conf. Proc., 37 (1970), FJCC, pp. 89-96.
[AB74] V. A. ABELL, The Purdue Dual-MACE operating system, Purdue University Computing Center Document LO-DMACE, Lafayette, Indiana, 1974.
[AC71] Proc. of ACM SIGOPS Workshops on System Performance Evaluation at Harvard University, 1971, ACM, New York, N.Y.
[AC73] Proc. 1st Annual SIGME Symposium on Measurement and Evaluation, ACM, New York, 1973. Note: Proc. 2nd Annual Sigmetrics Symposium is published in Performance Evaluation Rev., 3 (1974), No. 4.
[AD69] I. ADIRI AND B. AVI-ITZHAK, A time-sharing queue with a finite number of customers, J. ACM, 16 (1969), pp. 315-323.
[AN72] H. A. ANDERSON, JR. AND R. G. SARGENT, Modeling, evaluation and performance measurements of time-sharing computer systems, Comput. Rev., 13 (1972), pp. 603-608.
[AN72b] H. A. ANDERSON, JR., An empirical investigation into foreground-background scheduling for an interactive computing system, IBM Res. Rep. RC3941, Yorktown Heights, N.Y., July, 1972.
[AP65] C. T. APPLE, The program monitor—a device for program performance measurement, Proc. ACM National Conf., 1965, pp. 66-75.
[BA70] F. BASKETT, J. C. BROWNE AND W. M. RAIKE, The management of a multi-level non-paged memory system, AFIPS Conf. Proc., 36 (1970), SJCC, pp. 459-465.
[BE71] C. G. BELL AND A. NEWELL, Computer Structures: Readings and Examples, McGraw-Hill, New York, 1971.
[BE72] T. E. BELL, Objectives and problems in simulating computers, AFIPS Conf. Proc., 41 (1972), FJCC, pp. 287-297.
[BL73] , Performance determination—the selection of tools, if any, Ibid., 42 (1973), pp. 31-38.
[BO74] S. J. BOIES, User behavior on an interactive computer system, IBM Systems J., 13 (1974), pp. 2-18.
[BO69] A. J. BONNER, Using system monitor output to improve performance, Ibid., 8 (1969), pp. 290-298.
[BR73] THOMAS BERETVAS, System-independent tracing for prediction of system performance, Proc. ACM SIGSIM Symposium on Simulation of Computer Systems, 1973, pp. 209-213.
[BU62] W. BUCHHOLZ, ed., Planning a Computer System, McGraw-Hill, New York, 1962.
[BU69] , A selected bibliography on computer systems performance evaluation, IEEE Comp. Group News, 2 (1969), No. 8, pp. 21-22.
[BU69a] , A synthetic job for measuring system performance, IBM Systems J., 8 (1969), pp. 309-318.
[BU71] J. P. BUZEN, Queuing network models of multiprogramming, Ph.D. dissertation, Harvard University, Cambridge, Mass., 1971.
[BU73] , Computational algorithms for closed queuing networks with exponential servers, Comm. ACM, 16 (1973), pp. 527-531.
[BU74] J. P. BUZEN AND P. S. GOLDBERG, Guidelines for the use of infinite source queuing models in the analysis of computer system performance, AFIPS Conf. Proc., 43 (1974), National Comp. Conf., pp. 371-374.
[CA67] P. CALINGAERT, System performance evaluation: survey and appraisal, Comm. ACM, 10 (1967), pp. 903-914.
[CA68] D. J. CAMPBELL AND W. J. HEFFNER, Measurement and analysis of large operating systems during system development, AFIPS Conf. Proc., 33 (1968), Part 1, FJCC, pp. 903-914.
[CD72] Control Data Cyber 70 Computer Systems. A simulation process-oriented language (ASPOL) reference manual, Special Support Division, Control Data Corporation, Sunnyvale, Calif., 1972.
[CH69] P. S. CHENG, Trace-driven system modeling, IBM Systems J., 8 (1969), pp. 280-289.
[CH70] W. CHANG, Single-server queuing processes in computing systems, Ibid., 9 (1970), pp. 36-71.
[CH74] S. CHANDRASEKHAR, Observation must be confirmed by theory, Univ. of Chicago Magazine, Chicago, Ill., Summer, 1974, p. 16.
[CN68] C. J. CONTI, D. H. GIBSON AND S. H. PITKOWSKI, Structural aspects of the system/360 model 85, IBM Systems J., 7 (1968), pp. 2-14.
[CO68] E. G. COFFMAN AND L. KLEINROCK, Feedback queuing models for time-shared systems, J. ACM, 15 (1968), pp. 549-576.
[CO68a] , Computer scheduling methods and their countermeasures, AFIPS Conf. Proc., 32 (1968), SJCC, pp. 11-21.
[CO74] Comptroller General of the United States, Tools and techniques for improving the efficiency of Federal automatic data processing operations, Document B-115369, U.S. General Accounting Office, Washington, D.C., June, 1974.
[CR71] S. CROOK, J. MINKER AND J. YEH, Key word in context index and bibliography on computer system evaluation techniques, Tech. Rep. TR-146, University of Maryland Computer Science Center, College Park, Md., 1971.
[CR74] M. A. CRANE AND D. L. IGLEHART, Simulating stable stochastic systems I. General multiserver queues, J. ACM, 21 (1974), pp. 103-113.
[DA72] O. J. DAHL, E. W. DIJKSTRA AND C. A. R. HOARE, Structured Programming, Academic Press, New York, 1972.
[DE68] P. J. DENNING, The working set model for programming behavior, Comm. ACM, 11 (1968), pp. 323-333.
[DE69] W. R. DENISTON, SIPE: a TSS/360 software measurement technique, Proc. ACM National Conference, 1969, pp. 229-245.
[DE70] P. J. DENNING, Virtual memory, ACM Comp. Surveys, 2 (1970), No. 3, pp. 153-189.
[DR73] M. E. DRUMMOND, JR., Evaluation and Measurement Techniques for Digital Computer Systems, Prentice-Hall, Englewood Cliffs, N.J., 1973.
[EW75] J. C. EWING, Measurement and analysis of an interactive/remote job entry subsystem in a batch oriented system, Research report, Purdue University Computing Center, Lafayette, Indiana, 1975.
[FE50] W. FELLER, An Introduction to Probability Theory and Its Applications, John Wiley, New York, 1950.
[FE72] D. FERRARI, Workload characterization and selection in computer performance measurement, Computer, 5 (1972), No. 4, pp. 18-24.
[FI73] G. S. FISHMAN, Concepts and Methods in Discrete Event Digital Simulation, John Wiley, New York, 1973.
[GA73] R. A. GARMOE, Bench/Mark—a performance evaluation tool, Internal document, Purdue University Computing Center, Lafayette, Indiana, 1973.
[GO67] N. J. GORDON AND G. F. NEWELL, Closed queuing systems with exponential servers, Operations Res., 15 (1967), pp. 254-265.
[GO69] R. L. GOULD, GPSS/360—an improved general purpose simulator, IBM Systems J., 8 (1969), pp. 16-27.
[GO72] H. H. GOLDSTINE, The Computer from Pascal to von Neumann, Princeton University Press, Princeton, N.J., 1972.
[GO75] G. GORDON, The Application of GPSS V to Discrete System Simulation, Prentice-Hall, Englewood Cliffs, N.J., 1975.
[GR72] U. GRENANDER AND R. F. TSAO, Quantitative methods for evaluating computer system performance: A review and proposals, Statistical Computer Performance Evaluation, Academic Press, New York, 1972.
[GR75] G. S. GRAHAM, Program behavior and memory management, Ph.D. dissertation, Purdue University Computer Science Dept., Lafayette, Indiana, 1975.
[HE74] H. F. HERTEL AND R. A. MERIKALLIO, The system simulators—a modular approach to systems modeling, Proc. ACM SIGSIM Symposium on the Simulation of Computer Systems, 1974, pp. 197-209.
[HU67] L. R. HUESMANN AND R. P. GOLDBERG, Evaluating computer systems through simulation, Comput. J., 10 (1967), pp. 150-156.
[IB72] GTF (Generalized Trace Facility), IBM System/360 Operating System Service Aids, Document S360-31, GC28-6719-2, IBM Corporation, Boulder, Colorado, 1972.
[IB73] IBM Systems Reference Library, OS SMF, S360-31, GC28-6712-7, IBM Corporation, Boulder, Colorado, 1973.
[IE73] Record of the 1973 IEEE Symposium on Computer Software Reliability, IEEE Computer Society, New York, 1973.
[JO65] E. O. JOSLIN, Application benchmarks: the key to meaningful computer evaluations, Proc. ACM National Conf., 1965, pp. 27-37.
[KI68] P. J. KIVIAT, R. VILLANUEVA AND H. M. MARKOWITZ, The SIMSCRIPT II Programming Language, Prentice-Hall, Englewood Cliffs, N.J., 1968.
[KL75] L. KLEINROCK, Queuing Systems, Wiley-Interscience, New York, 1975.
[KO71] K. W. KOLENCE, A software view of measurement tools, Datamation, 17 (1971), pp. 32-38.
[LA72] E. R. LASSETTRE AND A. L. SCHERR, Modelling the performance of the OS/360 time-sharing option (TSO), Statistical Computer Performance Evaluation, Academic Press, New York, 1972.
[LU71] H. C. LUCAS, JR., Performance evaluation and monitoring, ACM Comp. Surveys, 3 (1971), No. 3, pp. 79-91.
[MA70] M. H. MACDOUGALL, Computer system simulation: an introduction, Ibid., 2 (1970), No. 3, pp. 191-209.
[MA73] M. H. MACDOUGALL AND J. S. McALPINE, Computer system simulation with ASPOL, Proc. ACM SIGSIM Symposium on the Simulation of Computer Systems, 1973, pp. 93-103.
[MA74] M. H. MACDOUGALL, Simulating the NASA mass data storage, Proc. ACM SIGSIM Symposium on the Simulation of Computer Systems, 1974, pp. 33-43.
[MC69] J. McKENNEY, A survey of analytical time-sharing models, ACM Comp. Surveys, 1 (1969), No. 2, pp. 105-116.
[MI72] E. F. MILLER, JR., Bibliography on techniques of computer performance analysis, Computer, 5 (1972), No. 8, pp. 39-47.
[MO73] J. A. MORRIS, Hardware measurement—past, present and future, paper presented to SHARE and reprinted in [SH74], Vol. 2, pp. 308-332.
[MU74] R. R. MUNTZ AND F. BASKETT, Open, closed, and mixed networks of queues with different classes of customers, preliminary copy of a paper to appear in J. ACM with additional joint authors K. M. Chandy and F. Palacios Gomez.
[NI66] N. R. NIELSEN, The analysis of general purpose computer time-sharing systems, Document 40-10-1, Stanford Computation Center, Stanford, Calif., 1966.
[NI67] ———, The simulation of time-sharing systems, Comm. ACM, 10 (1967), pp. 397-412.
[NO72] J. D. NOE AND G. J. NUTT, Validation of a trace-driven CDC 6400 simulation, AFIPS Conf. Proc., 40 (1972), SJCC, pp. 749-757.
[NO74] J. D. NOE, Acquiring and using a hardware monitor, Datamation, 20 (1974), No. 4, pp. 89-95.
[OM73] K. OMAHEN, Analytic models of multiple resource systems, Ph.D. dissertation, University of Chicago, Chicago, Ill., 1973.
[PA65] A. PAPOULIS, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, New York, 1965.
[PE74] T. G. PETERSON, A comparison of software and hardware monitors, Performance Evaluation Rev., 3 (1974), No. 2, pp. 2-5.
[PR74] A. PRITSKER, The GASP IV Simulation Language, Wiley-Interscience, New York, 1974.
[RE69] D. J. ROEK AND W. C. EMERSON, A hardware instrumentation approach to evaluation of a large scale system, Proc. 24th ACM National Conf., 1969, pp. 351-367.
[RO65] S. ROSEN, R. A. SPURGEON AND J. K. DONNELLY, PUFFT—The Purdue University Fast FORTRAN Translator, Comm. ACM, 8 (1965), pp. 661-666.
[RO68] S. ROSEN, Hardware design reflecting software requirements, AFIPS Conf. Proc., 33 (1968), Part 2, FJCC, pp. 1443-1449.
[RO69] ———, Electronic computers: a historical survey, ACM Comp. Surveys, 1 (1969), pp. 7-36.
[RO72] ———, Programming systems and languages, 1965-1975, Comm. ACM, 15 (1972), pp. 591-600.
[RO73] ———, Improving operating system performance, Proc. 2nd Texas Conference on Computing Systems, 1973, pp. 8-1 to 8-5.
[SA70] J. H. SALTZER AND J. W. GINTELL, The instrumentation of Multics, Comm. ACM, 13 (1970), pp. 493-500.
[SC67] A. L. SCHERR, An Analysis of Time-Shared Computer Systems, MIT Press, Cambridge, Mass., 1967.
[SC70] H. D. SCHWETMAN, A study of resource utilization and performance evaluation of large scale computer systems, Ph.D. dissertation, Document TSN-12, University of Texas Computation Center, Austin, Texas, 1970.
[SC74] ———, Analysis of a time-sharing subsystem: A preliminary report, Performance Evaluation Rev., 3 (1974), No. 4, pp. 65-75.
[SE69] P. H. SEAMAN AND R. C. SOUCY, Simulating operating systems, IBM Systems J., 8 (1969), pp. 264-279.
[SH67] F. D. SCHULMAN, Hardware measurement device for IBM System/360 time-sharing evaluation, Proc. 22nd ACM National Conf., 1967, pp. 103-109.
[SH72] S. SHERMAN, F. BASKETT AND J. C. BROWNE, Trace-driven modelling and analysis of CPU scheduling in a multiprogramming system, Comm. ACM, 15 (1972), pp. 1063-1069.
[SH74] SHARE INC., Computer Measurement and Evaluation, Selected Papers from the SHARE Project, Vols. 1 and 2, 25 Broadway, New York, N.Y., 1974.
[SN74] R. SNYDER, A quantitative study of the addition of extended core storage, Performance Evaluation Rev., 3 (1974), No. 1, pp. 10-33.
[SR74] K. SREENIVASAN AND A. J. KLEINMAN, On the construction of a representative synthetic workload, Comm. ACM, 17 (1974), pp. 127-133.
[ST68] D. F. STEVENS, System evaluation on the Control Data 6600, Proc. IFIP Congress 68, pp. C34-C38.
[ST72] J. C. STRAUSS, A benchmark study, AFIPS Conf. Proc., 41 (1972), Part 2, FJCC, pp. 1225-1233.
[ST74] S. STIMLER, Data Processing Systems: Their Performance, Evaluation, Measurement, and Improvement, Motivational Learning Programs, Inc., Trenton, New Jersey, 1974.
[TE66] D. TEICHROEW AND J. F. LUBIN, Computer simulation—discussion of the technique and comparison of languages, Comm. ACM, 9 (1966), pp. 723-741.
[TH70] J. E. THORNTON, Design of a Computer: The Control Data 6600, Scott and Foresman, Glenview, Illinois, 1970.
[WA71] C. D. WARNER, Monitoring: A key to cost efficiency, Datamation, 17 (1971), pp. 40-49.
[WA74] J. L. WALKOWICZ, Benchmarking and workload definition: A selected bibliography with abstracts, NBS Special Publication 405, U.S. Govt. Printing Office, SD Catalogue No. C13.10:405, Washington, D.C., 1974.
[WO71] D. C. WOOD AND E. H. FORMAN, Throughput measurement using a synthetic job stream, AFIPS Conf. Proc., 39 (1971), FJCC, pp. 51-55.