DATA HANDLING IN SCIENCE AND TECHNOLOGY - VOLUME 11
Experimental design: a chemometric approach
DATA HANDLING IN SCIENCE AND TECHNOLOGY Advisory Editors: B.G.M. Vandeginste and S.C. Rutan
Other volumes in this series:
Volume 1 Microprocessor Programming and Applications for Scientists and Engineers by R.R. Smardzewski
Volume 2 Chemometrics: A Textbook by D.L. Massart, B.G.M. Vandeginste, S.N. Deming, Y. Michotte and L. Kaufman
Volume 3 Experimental Design: A Chemometric Approach by S.N. Deming and S.L. Morgan
Volume 4 Advanced Scientific Computing in BASIC with Applications in Chemistry, Biology and Pharmacology by P. Valko and S. Vajda
Volume 5 PCs for Chemists, edited by J. Zupan
Volume 6 Scientific Computing and Automation (Europe) 1990, Proceedings of the Scientific Computing and Automation (Europe) Conference, 12-15 June, 1990, Maastricht, The Netherlands, edited by E.J. Karjalainen
Volume 7 Receptor Modeling for Air Quality Management, edited by P.K. Hopke
Volume 8 Design and Optimization in Organic Synthesis by R. Carlson
Volume 9 Multivariate Pattern Recognition in Chemometrics, illustrated by case studies, edited by R.G. Brereton
Volume 10 Sampling of Heterogeneous and Dynamic Materials Systems: theories of heterogeneity, sampling and homogenizing by P.M. Gy
Volume 11 Experimental Design: A Chemometric Approach (Second, revised and expanded edition) by S.N. Deming and S.L. Morgan
Experimental design: a chemometric approach
Second, revised and expanded edition
STANLEY N. DEMING
Department of Chemistry, University of Houston, Houston, TX 77004, U.S.A.
and
STEPHEN L. MORGAN
Department of Chemistry, University of South Carolina, Columbia, SC 29208, U.S.A.
ELSEVIER Amsterdam - London - New York - Tokyo 1993
ELSEVIER SCIENCE PUBLISHERS B.V.
Sara Burgerhartstraat 25
P.O. Box 211, 1000 AE Amsterdam, The Netherlands

Library of Congress Cataloging-in-Publication Data
Deming, Stanley N., 1944-
Experimental design: a chemometric approach / Stanley N. Deming and Stephen L. Morgan. - 2nd rev. and expanded ed.
p. cm. - (Data handling in science and technology; v. 11)
Includes bibliographical references and index.
ISBN 0-444-89111-0
1. Chemistry-Statistical methods. 2. Experimental design. I. Morgan, Stephen L., 1949- . II. Title. III. Series.
QD75.4.S8D46 1993 92-38312
543'.00724-dc20 CIP

ISBN 0-444-89111-0

© 1993 Elsevier Science Publishers B.V. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher, Elsevier Science Publishers B.V., Copyright & Permissions Department, P.O. Box 521, 1000 AM Amsterdam, The Netherlands.

Special regulations for readers in the USA - This publication has been registered with the Copyright Clearance Center Inc. (CCC), Salem, Massachusetts. Information can be obtained from the CCC about conditions under which photocopies of parts of this publication may be made in the USA. All other copyright questions, including photocopying outside of the USA, should be referred to the copyright owner, Elsevier Science Publishers B.V., unless otherwise specified.

No responsibility is assumed by the Publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein.

This book is printed on acid-free paper.

Printed in The Netherlands
Contents

Preface to the first edition
Preface to the second edition

Chapter 1. System Theory
1.1 Systems
1.2 Inputs
1.3 Outputs
1.4 Transforms
1.5 Measurement scales
Exercises

Chapter 2. Response Surfaces
2.1 Elementary concepts
2.2 Continuous and discrete factors and responses
2.3 Constraints and feasible regions
2.4 Factor tolerances
Exercises

Chapter 3. Basic Statistics
3.1 The mean
3.2 Degrees of freedom
3.3 The variance
3.4 Sample statistics and population statistics
3.5 Enumerative vs. analytic studies
Exercises

Chapter 4. One Experiment
4.1 A deterministic model
4.2 A probabilistic model
4.3 A proportional model
4.4 Multiparameter models
Exercises

Chapter 5. Two Experiments
5.1 Matrix solution for simultaneous linear equations
5.2 Matrix least squares
5.3 The straight line model constrained to pass through the origin
5.4 Matrix least squares for the case of an exact fit
5.5 Judging the adequacy of models
5.6 Replication
Exercises

Chapter 6. Hypothesis Testing
6.1 The null hypothesis
6.2 Confidence intervals
6.3 The t-test
6.4 Sums of squares
6.5 The F-test
6.6 Level of confidence
Exercises

Chapter 7. The Variance-Covariance Matrix
7.1 Influence of the experimental design
7.2 Effect on the variance of b,
7.3 Effect on the variance of b,
7.4 Effect on the covariance of b, and b,
7.5 Optimal design
Exercises

Chapter 8. Three Experiments
8.1 All experiments at one level
8.2 Experiments at two levels
8.3 Experiments at three levels: first-order model
8.4 Experiments at three levels: s...
8.5 Centered experimental designs and coding
8.6 Self interaction
Exercises

Chapter 9. Analysis of Variance (ANOVA) for Linear Models
9.1 Sums of squares
9.2 Additivity of sums of squares and degrees of freedom
9.3 Coefficients of determination and correlation
9.4 Statistical test for the effectiveness of the factors
9.5 Statistical test for the lack of fit
9.6 Statistical test for a set of parameters
9.7 Statistical significance
Exercises

Chapter 10. An Example of Regression Analysis on Existing Data
10.1 The data
10.2 Preliminary observations
10.3 Statistical process control charts of the data
10.4 Cyclical patterns in the data
10.5 Discussion of process details
10.6 An abandoned search for a driving force
10.7 Observation of a day-of-the-week effect
10.8 Observations about prior check is
10.9 The linear model
10.10 Regression analysis of the initial model
10.11 Descriptive capability
10.12 Future work

Chapter 11. A Ten-Experiment Example
11.1 Allocation of degrees of freedom
11.2 Placement of experiments
11.3 Results for the reduced model
11.4 Results for the expanded model
11.5 Coding transformations of parameter estimates
11.6 Confidence intervals for response surfaces
Exercises

Chapter 12. Approximating a Region of a Multifactor Response Surface
12.1 Elementary concepts
12.2 Factor interaction
12.3 Factorial designs
12.4 Coding of factorial designs
12.5 Star designs
12.6 Central composite designs
12.7 Canonical analysis
12.8 Confidence interval
12.9 Rotatable designs
12.10 Orthogonal designs
12.11 Scaling
12.12 Mixture designs
Exercises

Chapter 13. Confidence Intervals for Full Second-Order Polynomial Models
13.1 The model and a design
13.2 Normalized uncertainty and normalized information
13.3 The central composite design
13.4 A rotatable central composite design
13.5 An orthogonal rotatable central composite design
13.6 A three-level full factorial design
13.7 A small star design within a larger factorial design
13.8 A larger rotatable central composite design
13.9 A larger three-level full factorial design
13.10 The effect of the distribution of replicates
13.11 Non-central composite designs
13.12 A randomly generated design
13.13 A star design
13.14 Rotatable polyhedral designs
13.15 The flexing geometry of full second-order polynomial models
13.16 An extreme effect illustrated with the hexagonal design
13.17 Two final examples
Exercises

Chapter 14. Factorial-Based Designs
14.1 Coding and symbols
14.2 Classical mathematical treatment of a ful...
14.3 Classical vs. regression factor effects
14.4 Factor units and their effect on factor effects
14.5 An example of the use of a 2^3 factorial design
14.6 The Yates’ algorithm
14.7 Some comments about full factorial designs
14.8 Fractional replication
14.9 Confounding
14.10 Blocking
14.11 Saturated fractional factorial designs and screening
14.12 Plackett-Burman designs
14.13 Taguchi ideas
14.14 Latin square designs
Exercises

Chapter 15. Additional Multifactor Concepts and Experimental Designs
15.1 Confounding
15.2 Randomization
15.3 Completely randomized designs
15.4 Randomized paired comparison designs
15.5 Randomized complete block designs
15.6 Coding of randomized complete block designs
Exercises

Appendix A. Matrix Algebra
A.1 Definitions
A.2 Matrix addition and subtraction
A.3 Matrix multiplication
A.4 Matrix inversion
A.5 Exercises

Appendix B. Critical Values of t

Appendix C. Critical Values of F, α = 0.05

References

Subject Index
To Bonnie, Stephanie, and Michael, and to Linda and Neal
Preface to the first edition

As analytical chemists, we are often called upon to participate in studies that require the measurement of chemical or physical properties of materials. In many cases, it is evident that the measurements to be made will not provide the type of information that is required for the successful completion of the project. Thus, we find ourselves involved in more than just the measurement aspect of the investigation - we become involved in carefully (re)formulating the questions to be answered by the study, identifying the type of information required to answer those questions, making appropriate measurements, and interpreting the results of those measurements. In short, we find ourselves involved in the areas of experimental design, data acquisition, data treatment, and data interpretation.

These four areas are not separate and distinct, but instead blend together in practice. For example, data interpretation must be done in the context of the original experimental design, within the limitations of the measurement process used and the type of data treatment employed. Similarly, data treatment is limited by the experimental design and measurement process, and should not obscure any information that would be useful in interpreting the experimental results. The experimental design itself is influenced by the data treatment that will be used, the limitations of the chosen measurement process, and the purpose of the data interpretation.

Data acquisition and data treatment are today highly developed areas. Fifty years ago, measuring the concentration of fluoride ion in water at the parts-per-million level was quite difficult; today it is routine. Fifty years ago, experimenters dreamed about being able to fit models to large sets of data; today it is often trivial. Experimental design is also today a highly developed area, but it is not easily or correctly applied.
We believe that one of the reasons “experimental design” is not used more frequently (and correctly) by scientists is because the subject is usually taught from the point of view of the statistician rather than from the point of view of the researcher. For example, one experimenter might have heard about factorial designs at some point in her education, and applies them to a system she is currently investigating; she finds it interesting that there is a “highly significant interaction” between factors A and B, but she is disappointed that all this work has not revealed to her the particular combination of A and B that will give her optimal results from her system. Another experimenter might be familiar with the least squares fitting of straight lines to data; the only problems he chooses to investigate are those that can be reduced to straight-line relationships. A third experimenter might be asked to do a screening study using Plackett-Burman designs; instead, he transfers out of the research division.
We do not believe that a course on the design of experiments must necessarily be preceded by a course on statistics. Instead, we have taken the approach that both subjects can be developed simultaneously, complementing each other as needed, in a course that presents the fundamentals of experimental design. It is our intent that the book can be used in a number of fields by advanced undergraduate students, by beginning graduate students, and (perhaps more important) by workers who have already completed their formal education. The material in this book has been presented to all three groups, either through regular one-semester courses, or through intensive two- or three-day short courses. We have been pleased by the confidence these students have gained from the courses, and by their enthusiasm as they study further in the areas of statistics and experimental design.

The text is intended to be studied in one way only - from the beginning of Chapter 1 to the end of Chapter 15. The chapters are highly integrated and build on each other: there are frequent references to material that has been covered in previous chapters, and there are many sections that hint at material that will be developed more fully in later chapters. The text can be read “straight through” without working any of the problems at the ends of the chapters; however, the problems serve to reinforce the material presented in each chapter, and also serve to expand the concepts into areas not covered by the main text. Relevant literature references are often given with this latter type of problem.

The book has been written around a framework of linear models and matrix least squares. Because we authors are so often involved in the measurement aspects of investigations, we have a special fondness for the estimation of purely experimental uncertainty. The text reflects this prejudice.
We also prefer the term “purely experimental uncertainty” rather than the traditional “pure error”, for reasons we as analytical chemists believe should be obvious.

One of the important features of the book is the sums of squares and degrees of freedom tree that is used in place of the usual ANOVA tables. We have found the “tree” presentation to be a more effective teaching tool than ANOVA tables by themselves.

A second feature of the book is its emphasis on degrees of freedom. We have tried to remove the “magic” associated with knowing the source of these numbers by using the symbols n (the total number of experiments in a set), p (the number of parameters in the model), and f (the number of distinctly different factor combinations in the experimental design). Combinations of these symbols appear on the “tree” to show the degrees of freedom associated with various sums of squares (e.g., n - f for SSpe).

A third feature is the use of the J matrix (a matrix of mean replicate response) in the least squares treatment. We have found it to be a useful tool for teaching the effects (and usefulness) of replication.
We are grateful to a number of friends for help in many ways. Grant Wernimont and L. B. Rogers first told us why statistics and experimental design should be important to us as analytical chemists. Ad Olansky and Lloyd Parker first told us why a clear presentation of statistics and experimental design should be important to us; they and Larry Bottomley aided greatly in the early drafts of the manuscript. Kent Linville provided many helpful comments on the early drafts of the first two chapters. A large portion of the initial typing was done by Alice Ross; typing of the final manuscript was done by Lillie Gramann. Their precise work is greatly appreciated.

We are grateful also to the Literary Executor of the late Sir Ronald A. Fisher, F. R. S., to Dr. Frank Yates, F. R. S., and to Longman Group Ltd., London, for permission to partially reprint Tables III and V from their book Statistical Tables for Biological, Agricultural and Medical Research, 6th ed., 1974.

Finally, we would like to acknowledge our students who provided criticism as we developed the material presented here.
S. N. Deming Houston, Texas August 1986
S. L. Morgan Columbia, South Carolina
Preface to the second edition

In the six years that have elapsed since publication of the first edition, increasing interest in the quality of products and processes has led to a greater interest in the application of designed experiments. Designed experiments are being used more frequently and more effectively in research, development, manufacturing, and marketing. We welcome this change.

We are still convinced that a course on the design of experiments does not necessarily have to be preceded by a course on statistics. Instead, we believe even more strongly that both subjects can be developed simultaneously in a course that presents the fundamentals of experimental design.

A section has been added to Chapter 3 on the distinction between analytic vs. enumerative studies. A section on mixture designs has been added to Chapter 12. A new chapter on the application of linear models and matrix least squares to observational data has been added (Chapter 10). Chapter 13 attempts to give a geometric “feel” to concepts such as uncertainty, information, orthogonality, rotatability, extrapolation, and rigidity of the design. Finally, Chapter 14 expands on some aspects of factorial-based designs.

We are grateful to a number of friends for help in many ways. David Councilman and Robin Corbit typed much of the manuscript. John and Josephine Palasota assisted in gathering reference material. Mary Flynn supplied data for use as an example in one of the chapters. Finally, we would like to acknowledge again our students who provided criticism as we developed and expanded the material presented here.

S. N. Deming Houston, Texas August 1992
S.L. Morgan Columbia, South Carolina
CHAPTER 1
System Theory
General system theory is an organized thought process to be followed in relating cause and effect [Bertalanffy (1968), Iberall (1972), Laszlo (1972), Sutherland (1975), Weinberg (1975), Gold (1977), Jeffers (1978), and Vemuri (1978)]. The system is treated as a bounded whole with inputs and outputs external to the boundaries, and transformations occurring within the boundaries. Inputs, outputs, and transformations can be important or trivial - the trick is to determine which. The system of determination involves a study and understanding of existing theory in the chosen field; a study and understanding of past observations and experiments more specific to the chosen problem; and new experiments specific to the problem being studied.

General system theory is a versatile tool for investigating many research and development projects [Gall (1986), Lacy (1986), and Weinberg and Weinberg (1988)]. Although other approaches to research and development often focus on the detailed internal structure and organization of the system, our approach here will be to treat the system as a whole and to be concerned with its overall behavior.
1.1 Systems

A system is defined as a regularly interacting or interdependent group of items forming a unified whole [Bertalanffy (1968)]. Systems are often called processes [Scherkenbach (1988), Scholtes (1988), and Scherkenbach (1991)]. A system or process is described by its borders, by what crosses the borders, and by what goes on inside. Thus, we often speak of a solar system when referring to a sun, its planets, their moons, etc.; a thermodynamic system when we are describing compounds in equilibrium with each other; and a digestive system if we are discussing certain parts of the body. Other examples of systems are ecological systems, data processing systems, and economic systems. We even speak of the system when we mean some part of the established order around us, some regularly interacting or interdependent group of items forming a unified whole of which we are a part.

Figure 1.1 General system theory view of relationships among inputs, transforms, and outputs.

Figure 1.2 General system theory view of the algebraic relationship y = x + 2.

General system theory views a system as possessing three basic elements - inputs, transforms, and outputs (see Figure 1.1). An example of a simple system is the mathematical relationship

y = x + 2 (1.1)
This algebraic system is diagrammed in Figure 1.2. The input to the system is the independent variable x. The output from the system is the dependent variable y. The transform that relates the output to the input is the well defined mathematical relationship given in Equation 1.1. The mathematical equation transforms a given value of the input, x, into an output value, y. If x = 0, then y = 2. If x = 5, then y = 7, and so on. In this simple system, the transform is known with certainty.

Figure 1.3 is a system view of a wine-making process. In this system, there are two inputs (yeast and fruit), two outputs (percent alcohol and “bouquet”), and a transform that is probably not known with certainty.

Most systems are much more complex than the simple examples shown here. In general, there will be many inputs, many outputs, many transforms, and considerable subsystem structure. A more realistic view of most systems is probably similar to that shown in Figure 1.4.
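The algebraic system of Equation 1.1 can be sketched in a few lines of Python. This is only an illustration: the function name `transform` is our own choice and does not appear in the text. The input is x, the transform is the known relationship y = x + 2, and the output is y.

```python
def transform(x):
    """Transform of the algebraic system in Equation 1.1: y = x + 2."""
    return x + 2


# The two input values used in the text: x = 0 and x = 5.
for x in (0, 5):
    y = transform(x)
    print(f"input x = {x} -> output y = {y}")
```

Because the transform is known with certainty, the output is fully determined by the input. For a system such as the wine-making process, no comparably simple function could be written down in advance.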
Figure 1.3 General system theory view of a wine-making process. Inputs: yeast and fruit. Outputs: % alcohol and bouquet.

Figure 1.4 General system theory view emphasizing internal structures and relationships within a system.

1.2 Inputs

We will define a system input as a quantity or quality that might have an influence on the system. The definition of a system input is purposefully broad. It could have been made narrower to include only those quantities and qualities that do have an influence on the system. However, because a large portion of the early stages of much research and development is concerned with determining which inputs do have an influence and which do not, such a narrow definition would assume that a considerable amount of work had already been carried out. The broader definition used here allows the inclusion of quantities and qualities that might eventually be shown to have no influence on the system, and is a more useful definition for the early and speculative stages of most research.

We will use the symbol x with a subscript to represent a given input. For example, x1 means “input number one,” x2 means “input number two,” xi means “the ith input,” and so on. The intensity setting of an input is called a level. It is possible for an input to be at different levels at different times. Thus, if x1 designates the x in the previous algebraic example, it might have had the value 0 when we were first interested in the system; now we might want x1 to have the value 5. To designate these different sets of conditions, a second subscript is added. Thus, x11 = 0 means that in the first instance, x1 = 0; and x12 = 5 means that in the second instance, x1 = 5.

Ambiguity is possible with this notation: e.g., x137 might refer to the level of input number one under the 37th set of conditions, or it might refer to the level of input 13 under the seventh set of conditions, or it might refer to input number 137. To avoid this ambiguity, subscripts greater than nine can be written in parentheses or separated with commas. Thus, x1(37) and x1,37 refer to the level of input number one under the 37th set of conditions, x(13)7 and x13,7 refer to the level of input 13 under the seventh set of conditions, and x(137) refers to input number 137.

Input variables and factors
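The double-subscript notation for input levels described above (first subscript: the input; second subscript: the set of conditions) can be sketched in Python. This is a minimal illustration; the names `levels`, `naive_label`, and `comma_label` are hypothetical and are not part of the text. Keying a dictionary by the pair (input number, set of conditions) keeps the two subscripts separate, and the two label functions show why running the subscripts together is ambiguous.

```python
# Levels of each input, keyed by (input number, set of conditions).
# The two entries are the values used in the text: x11 = 0 and x12 = 5.
levels = {(1, 1): 0, (1, 2): 5}


def naive_label(i, j):
    """Run the two subscripts together, as in the ambiguous form x137."""
    return f"x{i}{j}"


def comma_label(i, j):
    """Separate the two subscripts with a comma, as the text suggests."""
    return f"x{i},{j}"


# Input 1 under condition 37 and input 13 under condition 7 collide
# when the subscripts are simply run together...
print(naive_label(1, 37), naive_label(13, 7))
# ...but stay distinct once the subscripts are separated by a comma.
print(comma_label(1, 37), comma_label(13, 7))
# Level of input 1 under the second set of conditions.
print(levels[(1, 2)])
```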
A system variable is defined as a quantity or quality associated with the system that may assume any value from a set containing more than one value. In the algebraic system described previously, “x” is an input variable: it can assume any
one of an infinite set of values. “Yeast” and “fruit” are input variables in the wine-making process. In the case of yeast, the amount of a given strain could be varied, or the particular type of yeast could be varied. If the variation is of extent or quantity (e.g., the use of one ounce of yeast, or two ounces of yeast, or more) the variable is said to be a quantitative variable. If the variation is of type or quality (e.g., the use of Saccharomyces cerevisiae, or Saccharomyces ellipsoideus, or some other species) the variable is said to be a qualitative variable. Thus, “yeast” could be a qualitative variable (if the amount added is always the same, but the type of yeast is varied) or it could be a quantitative variable (if the type of yeast added is always the same, but the amount is varied). Similarly, “fruit” added in the wine-making process could be a qualitative variable or a quantitative variable. In the algebraic system, x is a quantitative variable.
A factor is defined as one of the elements contributing to a particular result or situation. It is an input that does have an influence on the system. In the algebraic system discussed previously, x is a factor; its value determines what the particular result y will be. “Yeast” and “fruit” are factors in the wine-making process; the type and amount of each contributes to the alcohol content and flavor of the final product. In the next several sections we will further consider factors under several categorizations.
Known and unknown factors
In most research and development projects it is important that as many factors as possible be known. Unknown factors can be the witches and goblins of many projects - unknown factors are often uncontrolled, and as a result such systems appear to behave excessively randomly and erratically. Because of this, the initial phase of many research and development projects consists of screening a large number of input variables to see if they are factors of the system; that is, to see if they have an effect on the system. The proper identification of factors is clearly important (see Table 1.1).
TABLE 1.1 Possible outcomes in the identification of factors.
Type of input | Identified as a factor | Not identified as a factor
A factor | Desirable for research and development | Random and erratic behavior
Not a factor | Unnecessary complexity | Desirable for research and development
[Figure 1.5 diagram labels: Known Factor; Unknown Factor; System]
Figure 1.5 Symbols for a known factor (solid arrow) and an unknown factor (dotted arrow).
[Figure 1.6 diagram labels: Yeast, Fruit; Wine-Making System; % Alcohol, Bouquet]
Figure 1.6 Symbols for controlled factors (arrows with dot at tail) and an uncontrolled factor (arrow without dot).
If an input
variable is a factor and it is identified as a factor, the probability is increased for the success of the project. If an input variable truly is a factor but it is not included as an input variable and/or is not identified as a factor, random and erratic behavior might result. If an input variable is not a factor but it is falsely identified as a factor, an unnecessary input variable will be included in the remaining phases of the project and the work will be unnecessarily complex. Finally, if an input variable is not a factor and is not identified as a factor, ignoring it will be of no consequence to the project. The first and last of the above four possibilities are the desired outcomes. The second and third are undesirable outcomes, but undesirable for different reasons and with different consequences. The third possibility, falsely identifying an input variable as a factor, is unfortunate but the consequences are not very serious: it might be expensive, in one way or another, to carry the variable through the project, but its presence will not affect the ultimate outcome. However, the second possibility, not identifying a factor, can be very serious: omitting a factor can often cause the remaining results of the project to be worthless. In most research and development, the usual approach to identifying important factors uses a statistical test that is concerned with the risk (α) of stating that an input variable is a factor when, in fact, it is not - a risk that is of relatively little consequence (see Table 1.1). Ideally, the identification of important factors should also be concerned with the potentially much more serious risk (β) of stating that an
input variable is not a factor when, in fact, it is a factor (see Table 1.1). This subject is discussed further in Chapter 6.
A known factor can be shown as a solid arrow pointing toward the system; an unknown factor can be shown as a dotted arrow pointing toward the system (see Figure 1.5).
Controlled and uncontrolled factors
The word “control” is used here in the sense of exercising restraint or direction over a factor - that is, the experimental setting of a factor to a certain quantitative or qualitative value.
Controllable factors are desirable in experimental situations because their effects can usually be relatively easily and unambiguously detected and evaluated. Examples of individual controllable factors include x, yeast, fruit, temperature, concentration, time, amount, number, and size.
Uncontrollable factors are undesirable in experimental situations because their effects cannot always be easily or unambiguously detected or evaluated. Attempts are often made to minimize their effects statistically (e.g., through randomization of experiment order - see Section 15.2) or to separate their effects, if known, from those of other factors (e.g., by measuring the level of the uncontrolled factor during each experiment and applying a “known correction factor” to the experimental results). Examples of individual uncontrollable factors include incident gamma ray background intensity, fluctuations in the levels of the oceans, barometric pressure, and the much maligned “phase of the moon”.
A factor that is uncontrollable by the experimenter might nevertheless be controlled by some other forces. Incident gamma ray background intensity, fluctuations in the levels of the oceans, barometric pressure, and phase of the moon cannot be controlled by the experimenter, but they are “controlled” by the “Laws of Nature”. Such factors are usually relatively constant with time (e.g., barometric pressure over a short term), or vary in some predictable way (e.g., phase of the moon).
A controlled factor will be identified as an arrow with a dot at its tail; an uncontrolled factor will not have a dot. In Figure 1.6, temperature is shown as a controlled known factor; pressure is shown as an uncontrolled known factor. “Yeast” and “fruit” are presumably controlled known factors.
Intensive and extensive factors
Another categorization of factors is based on their dependence on the size of a system. The value of an intensive factor is not a function of the size of the system. The value of an extensive factor is a function of the size of the system. The temperature of a system is an intensive factor. If the system is, say, 72°C, then
it is 72°C independent of how large the system is. Other examples of intensive factors are pressure, concentration, and time. The mass of a system, on the other hand, does depend on the size of the system and is therefore an extensive factor. Other examples of extensive factors are volume and heat content.
Masquerading factors
A true factor exerts its effect directly on the system and is correctly identified as doing so. A masquerading factor also exerts its effect directly on the system but is incorrectly assigned some other identity.
“Time” is probably the most popular costume of masquerading factors. Consider again the wine-making process shown in Figure 1.3, and imagine the effect of using fruit picked at different times in the season. In general, the later it is in the season, the more sugar the fruit will contain. Wine begun early in the season will probably be “drier” than wine started later in the season. Thus, when considering variables that have an effect on an output called “dryness,” “time of year” might be identified as a factor - it has been correctly observed that wines are drier when made earlier in the season. But time is not the true factor; sugar content of the fruit is the true factor, and it is masquerading as time. The masquerade is successful because of the high correlation between time of year and sugar content.
A slightly different type of masquerade takes place when a single, unknown factor influences two outputs, and one of the outputs is mistaken for a factor. G. E. P. Box has given such masquerading factors the more dramatic names of either “latent variables” or “lurking variables” [Box, Hunter, and Hunter (1978), and Joiner (1981)]. Suppose it is observed that as foaming in the wine-making system increases, there is also an increase in the alcohol content. The process might be envisioned as shown in Figure 1.7. Our enologists seek to increase the percent alcohol by introducing foaming agents. Imagine their surprise and disappointment when the intentionally increased foaming does not increase the alcohol content. The historical evidence is clear that increased foaming is accompanied by increased alcohol content! What is wrong?
[Figure 1.7 diagram labels: Foaming; Wine-Making System; % Alcohol]
Figure 1.7 One view of a wine-making process suggesting a relationship between foaming and percent alcohol.
[Figure 1.8 diagram labels: Sugar Content; Wine-Making System; Foaming, % Alcohol]
Figure 1.8 Alternative view of a wine-making process suggesting sugar content as a masquerading factor.
What might be wrong is that foaming and percent alcohol are both outputs related to the lurking factor “sugar content” (see Figure 1.8). The behavior of the system might be such that as sugar content is increased, foaming increases and alcohol content increases; as sugar content is decreased, foaming decreases and alcohol content decreases. “Sugar content,” the true factor, is thus able to masquerade as “foaming” because of the high correlation between them. A corrected view of the system is given in Figure 1.9.
Experiment vs. observation
Think for a moment why it is possible for factors to masquerade as other variables. Why are we sometimes fooled about the true identity of a factor? Why was “time of year” said to be a factor in the dryness of wine? Why was “foaming” thought to be a factor in the alcohol content of wine?
One very common reason for the confusion is that the factor is identified on the basis of observation rather than experiment [Snedecor and Cochran (1980)]. In the wine-making process, it was observed that dryness is related to the time of the season; the causal relationship between time of year and dryness was assumed. In a second example, it was observed that percent alcohol is related to foaming; the causal relationship between foaming and alcohol content was assumed.
But consider what happened in the foaming example when additional foaming agents were introduced. The single factor “foaming” was deliberately changed. An experiment was performed. The result of the experiment clearly disproved any hypothesized causal relationship between alcohol content and foaming.
[Figure 1.9 diagram labels: Sugar Content; Wine-Making System; Foaming, % Alcohol]
Figure 1.9 Corrected view of a wine-making process showing relationships between sugar content and both foaming and percent alcohol.
An observation involves a measurement on a system as it is found, unperturbed and undisturbed. An experiment involves a measurement on a system after it has been deliberately perturbed and disturbed by the experimenter. In an important paper summarizing the current state of knowledge about cause and effect relationships, Holland (1986) has concluded that the “analysis of causation should begin with studying the effects of causes rather than the traditional approach of trying to define what the cause of a given effect is.” That is a very powerful argument in favor of designed experiments.
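The difference between observation and experiment in the foaming example can be mimicked with a small simulation. The sketch below is purely illustrative: the function simulate_wine, its coefficients, and its noise levels are hypothetical and are not taken from any real fermentation data. It assumes that sugar content drives both foaming and alcohol, while added foaming agents change only the foaming.

```python
import random


def simulate_wine(sugar, added_foam, rng):
    """Hypothetical wine-making system in which sugar drives both outputs.

    Foaming agents (added_foam) raise foaming only; they have no effect
    on alcohol. All coefficients are invented for illustration.
    """
    foaming = 2.0 * sugar + added_foam + rng.gauss(0.0, 0.5)
    alcohol = 0.5 * sugar + rng.gauss(0.0, 0.2)
    return foaming, alcohol


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Observation: the system is measured as found; sugar varies on its own.
observed = [simulate_wine(rng.uniform(15.0, 25.0), 0.0, rng) for _ in range(200)]
r_observed = correlation([f for f, _ in observed], [a for _, a in observed])

# Experiment: sugar is held fixed and foaming is deliberately changed.
experiment = [simulate_wine(20.0, rng.uniform(0.0, 20.0), rng) for _ in range(200)]
r_experiment = correlation([f for f, _ in experiment], [a for _, a in experiment])

print(f"observed   r(foaming, alcohol) = {r_observed:.2f}")
print(f"experiment r(foaming, alcohol) = {r_experiment:.2f}")
```

Under these assumed numbers the observed correlation should be close to 1, while the correlation in the deliberately perturbed system should be close to 0, which is the pattern the enologists encountered.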
1.3 Outputs
We will define a system output as a quantity or quality that might be influenced by the system. Again, the definition of system output is purposefully broad. It could have been made narrower to include only those quantities and qualities that are influenced by the system. However, to do so presupposes a more complete knowledge of the system than is usually possessed at the beginning of a research and development project. System outputs that are influenced by the system are called responses. System outputs can include such quantities and qualities as yield, color, cost of raw materials, and public acceptance of a product.
We will use the symbol y to represent a given output. For example, y1 means “output number one,” y2 means “output number two,” and yi means “the ith output.” The intensity value of an output is called its level. It is possible for an output to be at different levels at different times; in keeping with the notation used for inputs, a second subscript is used to designate these different sets of conditions. Thus, y25 refers to the level of output y2 under the fifth set of conditions, and y(11)(17) or y11,17 refers to the level of the 11th output under the 17th set of conditions.
Important and unimportant responses
Most systems have more than one response. The wine-making process introduced in Section 1.1 is an example. Percent alcohol and bouquet are two responses, but there are many additional responses associated with this system. Examples are the amount of carbon dioxide evolved, the extent of foaming, the heat produced during fermentation, the turbidity of the new wine, and the concentration of ketones in the final product. Just as factors can be classified into many dichotomous sets, so too can responses. One natural division is into important responses and unimportant responses, although the classification is not always straightforward. The criteria for classifying responses as important or unimportant are seldom based solely on the system itself, but rather are usually based on elements external to the system. For example, in the wine-making process, is percent alcohol an important or
unimportant response? To some persons, alcohol content could well be a very important response; any flavor would probably be ignored. Yet to an enophile of greater discrimination, percent alcohol might be of no consequence whatever: “bouquet” would be supremely important. Almost all responses have the potential of being important. Events or circumstances external to the system usually reveal the category - important or unimportant - into which a given response should be placed.
Responses as factors
Most of the responses discussed so far have an effect on the universe outside the system; it is the importance of this effect that determines the importance of the response. Logically, if a response has an effect on some other system, then it must be a factor of that other system. It is not at all unusual for variables to have this dual identity as response and factor. In fact, most systems are seen to have a rather complicated internal subsystem structure in which there are a number of such factor-response elements (see Figure 1.4). The essence of responses as factors is illustrated in the drawings of Rube Goldberg (1930) in which an initial cause triggers a series of intermediate factor-response elements until the final result is achieved. Occasionally, a response from a system will act as a true factor to the same system, a phenomenon that is generally referred to as feedback. (This is not the same as the situation of masquerading factors.) Feedback is often classified as positive if it enhances the response being returned as a factor, or negative if it diminishes the response. Heat produced during a reaction in a chemical process is an example of positive feedback that is often of concern to chemical engineers (see Figure 1.10). Chemical reactions generally proceed more rapidly at higher temperatures than they do at lower temperatures. Many chemical reactions are exothermic, giving off heat as the reaction proceeds. If this heat is not removed, it increases the temperature of the system. If the temperature of the system increases, the reaction will proceed faster. If it proceeds faster, it produces more heat. If it produces more heat, the temperature goes still
[Figure 1.10 diagram labels: Temperature; System; Heat]
Figure 1.10 Example of positive feedback: as heat is produced, the temperature of the system increases.
[Figure 1.11 diagram labels: Extension; Pain]
Figure 1.11 Example of negative feedback: as pain increases, extension decreases.
higher until at some temperature either the rate of heat removed equals the rate of heat produced (an equilibrium), or the reaction “goes out of control” (temperature can no longer be controlled), often with disastrous consequences.
Examples of negative feedback are common. A rather simple example involves pain. If an animal extends an extremity (system factor = length of extension) toward an unpleasant object such as a cactus, and the extremity encounters pain (system response = pain), the pain is transformed into a signal that causes the extremity to contract (i.e., negative extension), thereby decreasing the amount of pain (see Figure 1.11). Negative feedback often results in a stable equilibrium. This can be immensely useful for the control of factors within given tolerances.
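The two outcomes described for the exothermic reaction, equilibrium or loss of control, can be sketched numerically. The model below is a hypothetical illustration and not a description of any real reactor: it assumes that heat production doubles for every 10-degree rise in temperature, that heat removal is proportional to the temperature difference from the coolant, and that all quantities are in arbitrary units. The function name simulate_reactor and every parameter value are invented for this sketch.

```python
def simulate_reactor(cooling_coefficient, t_coolant=20.0, steps=2000,
                     dt=0.1, t_limit=500.0):
    """Step a toy exothermic system forward in time (simple Euler steps).

    Returns (final_temperature, ran_away). The doubling-per-10-degrees
    heat production rule is an assumption made only for illustration.
    """
    temperature = t_coolant
    for _ in range(steps):
        # Positive feedback: a hotter system produces heat faster.
        heat_produced = 2.0 ** ((temperature - t_coolant) / 10.0)
        # Heat removal grows with the difference from the coolant temperature.
        heat_removed = cooling_coefficient * (temperature - t_coolant)
        temperature += dt * (heat_produced - heat_removed)
        if temperature > t_limit:
            return temperature, True  # the reaction "goes out of control"
    return temperature, False  # heat removed has balanced heat produced


strong_cooling_temperature, strong_cooling_ran_away = simulate_reactor(0.5)
weak_cooling_temperature, weak_cooling_ran_away = simulate_reactor(0.05)

print("strong cooling:", round(strong_cooling_temperature, 2), strong_cooling_ran_away)
print("weak cooling ran away:", weak_cooling_ran_away)
```

With the stronger cooling coefficient the temperature should settle a few degrees above the coolant; with the weaker one, heat production always exceeds heat removal in this toy model and the temperature passes the limit.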
Known and unknown responses
There are many examples of systems for which one or more important responses were unknown. One of the most tragic involved the drug thalidomide. It was known that one of the responses thalidomide produced when administered to humans was that of a tranquilizer; it was not known that when taken during pregnancy it would also affect normal growth of the fetus and result in abnormally shortened limbs of the newborn. Another example of an unknown or unsuspected response is the astonished look of a young child when it first discovers that turning a crank on a box produces not only music but also the surprising “jack-in-the-box”.
TABLE 1.2 Possible outcomes in the identification of responses.
Type of output | Identified as a response | Not identified as a response
Important | Desirable for research and development | Possibility of unexpected serious consequences
Unimportant | Unnecessary data acquired (but available if response becomes important later) | Desirable for research and development
[Figure 1.12 diagram labels: System; Known Response; Unknown Response]
Figure 1.12 Symbols for a known response (solid arrow) and an unknown response (dotted arrow).
Unknown important responses are destructive in many systems: chemical plant explosions caused by impurity buildup in reactors; Minamata disease, the result of microorganisms metabolizing inorganic mercury and passing it up the food chain; the dust bowl of the 1930s - all are examples of important system responses that were initially unknown and unsuspected [Adams (1991)]. As Table 1.2 shows, it is desirable to know as many of the important responses from a system as possible. If all of the responses from a system are known, then the important ones can be correctly assessed according to any set of external criteria (see the section on important and unimportant responses). In addition, because most decision-making is based on the responses from a system, bad decisions will not be made because of incomplete information if all of the responses are known. A known response can be represented as a solid arrow pointing away from the system; an unknown response will be represented as a dotted arrow pointing away from the system (see Figure 1.12).
Controlled and uncontrolled responses
A controlled response is defined as a response capable of being set to a specified level by the experimenter; an uncontrolled response is a response not capable of being set to a specified level by the experimenter. It is important to realize that an experimenter has no direct control over the responses. The experimenter can, however, have direct control over the factors, and this control will be transformed by the system into an indirect control over the responses. If the behavior of a system is known, then the response from a system can be set to a specified level by setting the inputs to appropriate levels. If the behavior of a system is not known, then this type of control of the response is not possible.
Let us reconsider two of the systems discussed previously, looking first at the algebraic system, y = x + 2. In this system, the experimenter does not have any direct control over the response y, but does have an indirect control on y through the system by direct control of the factor x. If the experimenter wants y to have the value 7, the factor x can be set to the value 5. If y is to have the value -3, then by setting x = -5, the desired response can be obtained. This appears to be a rather trivial example. However, the example appears trivial only because the behavior of the system is known. If we represent Equation 1.1 as
y = Ax    (1.2)
where A is the transform that describes the behavior of the system (see Figure 1.13), then the inverse of this transform, A^{-1}, relates x to y:
x = A^{-1}y    (1.3)
Figure 1.13 Representation of system transform A: y = Ax.
In this example, the inverse transform may be obtained easily and is expressed as
x = y - 2    (1.4)
(see Figure 1.14). Because this inverse transform is easily known, the experimenter can cause y to have whatever value is desired.
Figure 1.14 Representation of system inverse transform A^{-1}: x = A^{-1}y.
Now let us consider the wine-making example and ask, “Can we control temperature to produce a product of any exactly specified alcohol content (up to, say, 12%)?” The answer is that we probably cannot, and the reason is that the system behavior is initially not known with certainty. We know that some transform relating the alcohol content to the important factors must exist - that is,
percent alcohol = W(temperature, ...)    (1.5)
However, the transform W is probably unknown or not known with certainty, and therefore the inverse of the transform (W^{-1}) cannot be known, or cannot be known with certainty. Thus,
temperature = W^{-1}(percent alcohol, ...)    (1.6)
is not available: we cannot specify an alcohol content and calculate what temperature to use. If the response from a system is to be set to a specified level by setting the system factors to certain levels, then the behavior of the system must be known. This statement is at the heart of most research and development projects.
Intensive and extensive responses
Another categorization of responses is based on their dependence on the size or throughput of a system. The value of an intensive response is not a function of the size or throughput of the system. Product purity is an example of an intensive response. If a manufacturing facility can consistently produce 95% pure material, then it will be 95% pure, whether we look at a pound or a ton of the material. Other examples of intensive responses might be color, density, percent yield, alcohol content, and flavor. The value of an extensive response is a function of the size of the system. Examples of extensive responses are total production in tons, cost, and profit.
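Returning briefly to the indirect control of responses discussed earlier in this section: for the algebraic system, where the behavior is known, setting a response amounts to applying the inverse transform. The following is a minimal sketch; the function names transform and inverse_transform are chosen here for illustration only.

```python
def transform(x):
    """System transform A for the algebraic system: y = x + 2."""
    return x + 2


def inverse_transform(y):
    """Inverse transform A^-1 for the same system: x = y - 2."""
    return y - 2


# The experimenter cannot set y directly, but can choose the factor level x
# that the system will transform into the desired response.
for desired_y in (7, -3):
    x_setting = inverse_transform(desired_y)
    achieved_y = transform(x_setting)
    print(f"desired y = {desired_y}: set x = {x_setting}, system gives y = {achieved_y}")
```

No such inverse_transform can be written for the wine-making system until its behavior has been established, which is the point of the comparison in the text.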
1.4 Transforms
The third basic element of general system theory is the transform. As we have seen, it is the link between the factors and the responses of a system, transforming levels of the system’s factors into levels of the system’s responses.
Just as there are many different types of systems, there are many different types of transforms. In the algebraic system pictured in Figure 1.2, the system transform is the algebraic relationship y = x + 2. In the wine-making system shown in Figure 1.3, the transform is the microbial colony that converts raw materials into a finished wine. Transforms in the chemical process industry are usually sets of chemical reactions that transform raw materials into finished products.
We will take the broad view that the system transform is that part of the system that actively converts system factors into system responses. A system transform is not a description of how the system behaves; a description of how the system behaves (or is thought to behave) is called a model. Only in rare instances are the system transform and the description of the system’s behavior the same - the algebraic system of Figure 1.2 is an example. In most systems, a complete description of the system transform is not possible - approximations of it (incomplete models) must suffice. Because much of the remainder of this book discusses models, their
formulation, their uses, and their limitations, only one categorization of models will be given here.
Mechanistic and empirical models
If a detailed theoretical knowledge of the system is available, it is often possible to construct a mechanistic model which will describe the general behavior of the system. For example, if a biochemist is dealing with an enzyme system and is interested in the rate of the enzyme catalyzed reaction as a function of substrate concentration (see Figure 1.15), the Michaelis-Menten equation might be expected to provide a general description of the system’s behavior.
rate = rate_max [substrate] / (K_m + [substrate])    (1.7)
The parameter K_m would have to be determined experimentally. The Michaelis-Menten equation represents a mechanistic model because it is based on an assumed chemical reaction mechanism of how the system behaves. If the system does indeed behave in the assumed manner, then the mechanistic model is adequate for describing the system. If, however, the system does not behave in the assumed manner, then the mechanistic model is inadequate. The only way to determine the adequacy of a model is to carry out experiments to see if the system does behave as the model predicts it will. (The design of such experiments will be discussed in later chapters.) In the present example, if “substrate inhibition” occurs, the Michaelis-Menten model would probably be found to be inadequate; a different mechanistic model would better describe the behavior of the system.
[Figure 1.15 diagram labels: Enzyme; Michaelis-Menten Mechanism; Reaction Rate]
Figure 1.15 General system theory view of a mechanistic model of enzyme activity.
TABLE 1.3 Hierarchical characteristics of the four types of measurement scale.
Type of measurement scale | Possesses name | Possesses order | Possesses distance | Possesses meaningful origin
Ratio | Yes | Yes | Yes | Yes
Interval | Yes | Yes | Yes | -
Ordinal | Yes | Yes | - | -
Nominal | Yes | - | - | -
If a detailed theoretical knowledge of the system is either not available or is too complex to be useful, it is often possible to construct an empirical model which will approximately describe the behavior of the system over some limited set of factor levels. In the enzyme system example, at relatively low levels of substrate concentration, the rate of the enzyme catalyzed reaction is found to be described rather well by the simple expression
rate = k[substrate]    (1.8)
The adequacy of empirical models should also be tested by carrying out experiments at many of the sets of factor levels for which the model is to be used. Most of the models in this book are empirical models that provide an approximate description of the true behavior of a system. However, the techniques presented for use with empirical models are applicable to many mechanistic models as well.
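The limited range over which the empirical model of Equation 1.8 agrees with the mechanistic model of Equation 1.7 can be checked numerically. In the sketch below, the parameter values RATE_MAX and K_M are hypothetical, and the choice k = rate_max / K_m is an assumption made here (it is the low-substrate limit of Equation 1.7, not a value given in the text).

```python
def michaelis_menten_rate(substrate, rate_max, k_m):
    """Mechanistic model, Equation 1.7."""
    return rate_max * substrate / (k_m + substrate)


def linear_rate(substrate, k):
    """Empirical model, Equation 1.8."""
    return k * substrate


RATE_MAX = 10.0       # hypothetical maximum rate, arbitrary units
K_M = 5.0             # hypothetical Michaelis constant, arbitrary units
K = RATE_MAX / K_M    # assumed slope: low-substrate limit of Equation 1.7


def relative_error(substrate):
    """Relative error of the empirical model against the mechanistic one."""
    mechanistic = michaelis_menten_rate(substrate, RATE_MAX, K_M)
    empirical = linear_rate(substrate, K)
    return (empirical - mechanistic) / mechanistic


for substrate in (0.05, 0.5, 5.0, 50.0):
    print(f"[substrate] = {substrate:6.2f}  relative error = {relative_error(substrate):8.1%}")
```

For this choice of k the relative error works out to [substrate] / K_m, so the linear model is a reasonable description only while the substrate concentration is small compared with K_m.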
1.5 Measurement scales
Four types of measurement scale can be used for assigning values to varying amounts of a property associated with a system input or system output [Summers, Peters, and Armstrong (1977)]. In order of increasing informing power, they are: nominal, ordinal, interval, and ratio scales. The characteristics that determine a measurement scale’s level of sophistication are name, order, distance, and origin. The characteristics of the four types of measurement scale are shown in Table 1.3. The nominal scale possesses only one of these characteristics; the ratio scale possesses all four characteristics.
Nominal Scales
When numbers are used simply to classify an observational unit, the nominal scale is used. For example, red wines might be described by the number 1, white wines might be assigned the number 2, and pink wines might be designated by the number 3. A key aspect of the nominal (or “named”) scale is that numbers are chosen arbitrarily to describe the categorical property. Any other numbers could be used to describe the property just as well. For example, red wines could be assigned the
number 27, white wines could be designated by the number 2, and pink wines could be described by the number -32. The numbers just represent names.
It is important to recognize when a nominal scale is being used, so that improper judgments can be avoided. For example, imagine the gardener who rushes to the local nursery to purchase fertilizer 3 because fertilizer 2 gave superior performance over fertilizer 1. Or consider the mathematical “proof” that if 1 = red, 2 = blue, and 3 = green then six red balls, one blue ball, and six green balls are, on the average, blue.
Questionnaires often use nominal scales to describe the responses. A true story involves a graduate teaching assistant who successfully protested that male teachers were unfairly rated lower than females because the students were asked to mark the teachers’ gender on the questionnaire: 1 = male, 2 = female. When inappropriately averaged with other (ordinal) responses for which 1 = unacceptable, 2 = poor, 3 = average, 4 = good, and 5 = excellent, all teaching assistants were penalized, but males more so than females.
Ordinal Scales
The ordinal scale uses the order or ranking characteristic in numbers. When a manager uses numbers to rank those who work for her, she applies an ordinal scale. Any number can be chosen to represent the performance of the worst or best worker, but most persons start with the number 1. The natural numbers that follow the initial number can be used to rank the other workers. Successively larger numbers can represent either increasingly better performance or increasingly worse performance - it is clearly important to communicate the direction of the measured property on the ordinal scale, otherwise lazy workers might be given large raises and excellent performers fired.
The ordinal scale possesses the characteristic of order, but it does not make use of the distance property. In a ranked group of twelve workers, for example, the numerical distance between the best worker (#1) and the next-to-the-best worker (#2) is the same as the numerical distance between the two middle-ranked workers (#5 and #6), but the ordinal scale does not imply that worker #5 is as much better than worker #6 as worker #1 is better than worker #2. Reward systems based on rank are a topic of much current interest [Scherkenbach (1986), Scherkenbach (1991), Scholtes (1988), and W. Deming (1986)].
Interval Scales
The interval scale makes use of both the order and the distance characteristics of numbers but does not use the origin property. The origin of an interval scale is arbitrary. For instance, the zero point on the scale of elevation is arbitrarily set at sea level. Other interval scales are Celsius and Fahrenheit temperature, date, latitude, and exam scores. The origin of latitude, for example, is the equator. Because this
reference point has no fundamental advantage over any other reference point, it represents an arbitrary or conventional origin.
When two interval scales are used to measure the amount of change in the same property, the proportionality of differences is preserved from one scale to the other. For example, Table 1.4 shows reduction potentials of three electrochemical half-cell reactions measured in volts with reference to the standard hydrogen electrode (SHE, E°) and in millivolts with reference to the standard silver-silver chloride electrode (Ag/AgCl, E). For the SHE potentials the proportion of differences between the intervals +0.54 to +0.80 and +0.34 to +0.80 is
(0.80 - 0.54) / (0.80 - 0.34) = 0.26 / 0.46 = 13/23    (1.9)
For the Ag/AgCl scale the proportion of the respective differences is
(580 - 320) / (580 - 120) = 260 / 460 = 13/23    (1.10)
Thus the proportionality of differences is preserved as we change from a standard hydrogen electrode scale in volts to a silver-silver chloride scale in millivolts or vice versa.

The origin or zero point on interval scales is arbitrary. For example, 0.00 vs. SHE does not represent the condition of having no voltage. Instead, the zero point is simply an arbitrarily chosen reference voltage with which we can compare other voltages. For some purposes it might be more convenient to define another voltage scale for which the zero point is different. For example, the Ag/AgCl reference electrode is popular because it is so much easier, cheaper, and safer to use than the SHE.

TABLE 1.4
Electrochemical half-cell potentials vs. the standard hydrogen electrode (SHE, E°) and vs. the standard silver-silver chloride electrode (Ag/AgCl, E).

Half-cell reaction      E°, SHE (V)     E, Ag/AgCl (mV)
Ag⁺ + e⁻ = Ag⁰          +0.80           +580
I₃⁻ + 2e⁻ = 3I⁻         +0.54           +320
Cu²⁺ + 2e⁻ = Cu⁰        +0.34           +120
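The arithmetic of Equations 1.9 and 1.10 can be checked with a short Python sketch. It is only an illustration: the dictionary keys and the helper names `difference_ratio` and `value_ratio` are hypothetical, and exact fractions are used so that no rounding enters the comparison. The sketch also shows that ratios of the raw potentials, unlike ratios of differences, are not preserved between the two interval scales.

```python
from fractions import Fraction

# Potentials from Table 1.4. Volt values are written as strings so that
# Fraction parses them exactly, without binary floating-point rounding.
she_volts = {"Ag": "0.80", "I": "0.54", "Cu": "0.34"}   # vs. SHE, volts
agcl_millivolts = {"Ag": 580, "I": 320, "Cu": 120}      # vs. Ag/AgCl, millivolts


def difference_ratio(potentials):
    """Ratio of the two differences used in Equations 1.9 and 1.10."""
    ag, iodine, cu = (Fraction(potentials[k]) for k in ("Ag", "I", "Cu"))
    return (ag - iodine) / (ag - cu)


def value_ratio(potentials):
    """Ratio of two raw potentials (silver to copper) on one scale."""
    return Fraction(potentials["Ag"]) / Fraction(potentials["Cu"])


ratio_she = difference_ratio(she_volts)
ratio_agcl = difference_ratio(agcl_millivolts)
print(ratio_she, ratio_agcl)          # 13/23 13/23

# Ratios of the raw values differ because the two scales have different origins.
print(value_ratio(she_volts), value_ratio(agcl_millivolts))   # 40/17 29/6
```

Only the ratios of differences agree, which is the defining behavior of an interval scale.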
Ratio Scales

Ratio scales possess not only order and interval characteristics but also have meaningful origins. Examples of properties that can be measured on ratio scales are mass, length, pressure (absolute, not gauge), and volume. In each case the origin (zero point) on the ratio scale signifies that none of the property is present. The Kelvin temperature scale is an obvious example of a ratio scale: its use gives accuracy of prediction to the gas law PV = nRT. Using the interval scale of Celsius degrees for T will not work.

Application of Scales

There are many reasons why it is important to understand which type of measurement scale is being used to describe system inputs and outputs. One reason is that most statistical techniques are not applicable to data arising from all four types of measurement scales: the majority of techniques are applicable to data from interval or ratio scales. The difference between interval and ratio scales can be important for including or not including an intercept term in mathematical models; for the correct calculation of the correlation coefficient; for deciding to mean center or not in principal component analysis; and for a host of other decisions in data treatment and modeling. It is important to be aware of the differences in measurement scales.
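The gas-law remark made for ratio scales can be illustrated with a minimal Python sketch. Everything specific in it is an assumption chosen for illustration: one mole of ideal gas, a 10 L vessel, the two temperatures, the offset 273.15 between the Celsius and Kelvin scales, and the function name `pressure_atm`.

```python
R = 0.082057  # gas constant in L*atm/(mol*K)
CELSIUS_OFFSET = 273.15  # kelvins to add to a Celsius temperature (assumed constant)


def pressure_atm(n_mol, volume_l, temp_kelvin):
    """Ideal-gas pressure from PV = nRT, with T on the Kelvin (ratio) scale."""
    return n_mol * R * temp_kelvin / volume_l


t1_celsius, t2_celsius = 25.0, 50.0
p1 = pressure_atm(1.0, 10.0, t1_celsius + CELSIUS_OFFSET)
p2 = pressure_atm(1.0, 10.0, t2_celsius + CELSIUS_OFFSET)

pressure_ratio = p2 / p1
kelvin_ratio = (t2_celsius + CELSIUS_OFFSET) / (t1_celsius + CELSIUS_OFFSET)
celsius_ratio = t2_celsius / t1_celsius

# The pressure ratio follows the Kelvin ratio, not the Celsius ratio of 2.
print(pressure_ratio, kelvin_ratio, celsius_ratio)
```

Doubling the Celsius reading raises the pressure by only about 8%, because the Celsius zero does not mean "no temperature"; ratios are meaningful only on the Kelvin scale.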
Exercises

1.1 General system theory. Complete the following table by listing other possible inputs to and outputs from the wine-making process shown in Figure 1.3. Categorize each of the inputs according to your estimate of its expected influence on each of the outputs - strong (S), weak (W), or none (N).
Inputs
outputs % alcohol
Amount of yeast
W
Type of yeast
S
“bouquet”
Amount of fruit
W
Type of fruit
S
Amount of sugar
clarity
...
S
1.2 General system theory. Discuss the usefulness of a table such as that in Problem 1.1 for planning experiments. What are its relationships to Tables 1.1 and 1.2?

1.3 General system theory. Choose a research or development project with which you are familiar. Create a table of inputs and outputs for it such as that in Problem 1.1. Draw a system diagram representing your project (see, for example, Figure 1.3).
1.4 General system theory. Suppose you are in charge of marketing the product produced in Problem 1.1. Which output is most important? Why? Does its importance depend on the market you wish to enter? What specifications would you place on the outputs to produce an excellent wine? A profitable wine?
1.5 Important and unimportant outputs. Lave (1981) has stated, “The automobile has provided an unprecedented degree of personal freedom and mobility, but its side effects, such as air pollution, highway deaths, and a dependence on foreign oil supplies, are undesirable. The United States has tried to regulate the social cost of these side effects through a series of major federal laws....however, each law has been aimed at a single goal, either emission reduction, safety, or fuel efficiency, with little attention being given to the conflicts and trade-offs between goals”. Comment, and generalize in view of Table 1.2.

1.6 Inputs. In the early stages of much research, it is not always known which of the system inputs actually affect the responses from the system; that is, it is not always known which inputs are factors, and which are not. One point of view describes all inputs as factors, and then seeks to discover which are significant factors, and which are not significant. How would you design an experiment (or set of experiments) to prove that a factor exerted a significant effect on a response? How would you design an experiment (or set of experiments) to prove that a factor had absolutely no effect on the response? [See, for example, Fisher (1971), or Draper and Smith (1981).]
1.7 Symbology. Indicate the meaning of the following symbols: $x$, $y$, $x_1$, $x_2$, $y_1$, $y_2$, $x_{11}$, $x_{13}$, $x_{31}$, $y_{11}$, $y_{12}$, $y_{21}$, $y_{3j}$, $y_{1i}$, $A$, $A^{-1}$.
1.8 Qualitative and quantitative factors. List five quantitative factors. List five qualitative factors. What do the key words “type” and “amount” suggest?
1.9 Known and unknown factors. In pharmaceutical and biochemical research, crude preparations of material will often exhibit significant desired biological activity. However, that activity will sometimes be lost when the material is purified. Comment.

1.10 Controlled and uncontrolled factors. Comment on the wisdom of trying to control the following in a study of factors influencing employee attitudes: size of the company, number of employees, salary, length of employment. Would such a study be more likely to use observations or experiments?
1.11 Intensive and extensive factors. List five extensive factors. List five intensive factors. How can the extensive factor mass be converted to the intensive factor density?
1.12 Masquerading factors. There is a strong positive correlation between how much persons smoke and the incidence of lung cancer. Does smoking cause cancer? [See Fisher (1959).]

1.13 Masquerading factors. Data exist that show a strong positive correlation between human population and the number of storks in a small village. Do storks cause an increase in population? Does an increase in population cause storks? [See, for example, Box, Hunter, and Hunter (1978).]

1.14 Important and unimportant responses. Discuss three responses associated with any systems with which you are familiar that were originally thought to be unimportant but were later discovered to be important.

1.15 Responses as factors. List five examples of positive feedback. List five examples of negative feedback.
1.16 Responses as factors. What is a “Rube Goldberg device”? Give an example.

1.17 Known and unknown responses. Give examples from your childhood of discovering previously unknown responses. Which were humorous? Which were hazardous? Which were discouraging?
1.18 Controlled and uncontrolled responses. Plot the algebraic relationship $y = 10x - x^2$. How can the response y be controlled to have the value 9? Plot the algebraic relationship $x = 10y - y^2$. Can the response y be controlled to have the value 9?
1.19 Transforms. What is the difference between a transform and a model? Are the two ever the same?

1.20 Mechanistic and empirical models. Is the model $y = kx$ an empirical model or a mechanistic model? Why?

1.21 Mechanistic and empirical models. Trace the evolution of a mechanistic model that started out as an empirical model. How was the original empirical model modified during the development of the model? Did experimental observations or theoretical reasoning force adjustments to the model?
1.22 Research and development. The distinction between “research” and “development” is not always clear. Create a table such as Table 1.1 or 1.2 in which one axis is “motivation” with subheadings of “fundamental” and “applied,” and the other axis is “character of work” with subheadings of “science” and “technology”. Into what quarter might “research” be placed? Into what quarter might “development” be placed? Should there be distinct boundaries among quarters? [See, for example, Mosbacher (1977) and Finlay (1977).]

1.23 General system theory. Composition Law: “The whole is more than the sum of its parts”. Decomposition Law: “The part is more than a fraction of the whole”. Are these two statements contradictory?

1.24 Measurement scales. Indicate which type of measurement scale (nominal, ordinal, interval, or ratio) is usually used for the following characteristics: time, mass, library holdings, gender, type of heart attack, cholesterol level as measured by a clinical chemical laboratory, cholesterol level as reported by a doctor to a patient, pipet volume, and leaves on a plant.
1.25 Measurement scales. Indicate which type of measurement scale (continuous or discrete) is usually used for the characteristics listed in Problem 1.24.

1.26 Measurement scales. Assume that a chemical analysis laboratory of a government agency determines that the lead (Pb) concentration in a sample taken from a waste site is 2,698.32 ppm (parts per million). Suppose that for a legal definition of contamination, the lead concentration must be greater than 1,000 ppm; the laboratory reports the lead concentration as 1,698.32 ppm above the legal threshold. The government agency uses this information to state that the lead concentration is “greater than the legal limit.” As a result, the waste site is classified as “toxic.” In terms of measurement scales (nominal, ordinal, interval, and ratio), what was the progression of information through the series of actions just described? [See Enke (1971).] Is it possible to start with the categorization “toxic” and be able to determine that the lead concentration was greater than the legal limit, that it was 1,698.32 ppm above the legal threshold, and that the lead concentration in the sample was 2,698.32 ppm? Comment on the corruption of data.
CHAPTER 2
Response Surfaces

Response surface methodology is an area of experimental design that deals with the optimization and understanding of system performance. In this chapter, several general concepts that provide a foundation for response surface methodology are presented, usually for the single-factor, single-response case (see Figure 2.1). In later chapters these concepts are expanded, and additional ones are introduced, to allow the treatment of multifactor systems. Response surface concepts are especially applicable to the description of multifactor systems [Box (1954), Box and Youle (1955), Box and Hunter (1957), Box and Draper (1959), Hill and Hunter (1968), S. Deming (1971), Morgan and Jacques (1978), Hendrix (1980), Carter (1983), Jenkins, Mocella, Allen, and Sawin (1986), Box and Draper (1987), Khuri and Cornell (1987), Solana, Chinchilli, Wilson, Carter, and Carchman (1987), and Coenegracht, Dijkman, Duineveld, Metting, Elema, and Malingre (1991)].
2.1 Elementary concepts

A response surface is the graph of a system response plotted against one or more of the system factors [Khuri and Cornell (1987)]. We consider here the simplest case, that of a single response plotted against only one factor. It is assumed that all other controllable factors are held constant, each at a specified level. As will be seen later, it is important that this assumption be true; otherwise, the single-factor response surface might appear to change shape or to be excessively noisy.

Figure 2.2 is a response surface showing a system response, y1, plotted against one of the system factors, x1. If there is no uncertainty associated with the continuous response, and if the response is known for all values of the continuous factor, then the response surface might be described by some continuous mathematical model M that relates the response y1 to the factor x1:

$$y_1 = M(x_1) \qquad (2.1)$$
Figure 2.1 Single-factor, single-response system.

Figure 2.2 Response surface showing the response, y1, as a function of the single factor, x1.

For Figure 2.2 the exact relationship is

$$y_1 = 0.80 + 1.20x_1 - 0.05x_1^2 \qquad (2.2)$$
Such a relationship (and corresponding response surface) might represent reaction yield (y1) as a function of reaction temperature (x1) for a chemical process.

It is more commonly the case that not all points on the response surface are known; instead, only a few values will have been obtained and the information will give an incomplete picture of the response surface, such as that shown in Figure 2.3. Common practice is to assume a functional relationship between the response and the factor (that is, to assume a model, either mechanistic or empirical) and find the values of the model parameters that fit the data.

Figure 2.3 Response values obtained from experiments carried out at different levels of the factor x1.

Figure 2.4 Response values obtained from experiments carried out at different levels of the factor x1 and the assumed response surface.

Figure 2.5 Response surface exhibiting a maximum at x1 = 7.

If a model of the form

$$y_1 = \beta_0 + \beta_1 x_1 + \beta_{11} x_1^2 \qquad (2.3)$$

is assumed, model fitting methods discussed in later chapters would give the estimated equation

$$y_1 = 8.00 - 1.20x_1 + 0.04x_1^2 \qquad (2.4)$$
for the relationship between y1 and x1 in Figure 2.3. This assumed response surface and the data points are shown in Figure 2.4.

The location of a point on the response surface must be specified by (1) stating the level of the factor, and (2) stating the level of the response. Stating only the coordinate of the point with respect to the factor locates the point in factor space; stating only the coordinate of the point with respect to the response locates the point in response space; and stating both coordinates locates the point in experiment space. If the seventh experimental observation gave the third point from the left in Figure 2.3, the location of this point in experiment space would be x17 = 3.00, y17 = 4.76 (the first subscript on x and y refers to the factor or response number; the second subscript refers to the experiment number).

Figures 2.2, 2.3 and 2.4 show relationships between y1 and x1 that always increase or always decrease over the domains shown. The lowest and highest values of the response y1 lie at the limits of the x1 factor domain. Figure 2.2 is a response surface that is monotonic increasing; that is, the response always increases as the factor level increases. Figures 2.3 and 2.4 show response surfaces that are monotonic decreasing; the response always decreases as the factor level increases.

Figure 2.5 shows a relationship between y1 and x1 for which a maximum lies within the domain, specifically at x1 = 7. Figure 2.6 is a response surface which exhibits a minimum at x1 = 3. Each of these extreme points could be considered to be an optimum, depending on what is actually represented by the response [Wilde and Beightler (1979)]. For example, if the response surface shown in Figure 2.5 represents the yield of product in kilograms per hour vs. the feed rate of some reactant for an industrial chemical process, an optimum (maximum) yield can be obtained by operating at the point x1 = 7. If the response surface shown in Figure 2.6 represents the percentage of impurity in the final product as a function of the concentration of reactant for the same industrial chemical process, there is clearly an optimum (minimum) at the point x1 = 3. As suggested by Figures 2.5 and 2.6, it is not always possible to simultaneously optimize multiple responses from the same system [Walters, Parker, Morgan, and Deming (1991)].

Figure 2.6 Response surface exhibiting a minimum at x1 = 3.

Figure 2.7 Response surface exhibiting two local maxima, one at x1 = 2, the other (the global maximum) at x1 = 7.

Figure 2.8 Response surface exhibiting three local minima, one at x1 = 3, one at x1 = 5, and the third (the global minimum) at x1 = 8.

A maximum is the point in a region of experiment space giving the algebraically largest value of response. A minimum is the point in a region of experiment space giving the algebraically smallest value of response. An optimum is the point in a region of experiment space giving the best response.

The response surfaces shown in Figures 2.5 and 2.6 are said to be unimodal; they exhibit only one optimum (maximum or minimum) over the domain of the factor x1. Multimodal response surfaces exhibit more than one optimum as shown in Figures 2.7 and 2.8. Each individual optimum in such response surfaces is called a local optimum; the best local optimum is called the global optimum. The response surface in Figure 2.7 has two local maxima, the rightmost of which is the global maximum. The response surface in Figure 2.8 has three local minima, the rightmost of which is the global minimum.
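Several statements in this section can be checked numerically. The following Python sketch evaluates Equations 2.2 and 2.4 over the factor domain 0 to 10, confirms that the first is monotonic increasing and the second monotonic decreasing, reproduces the point (3.00, 4.76) in experiment space, and picks out local and global maxima from a sampled multimodal surface. The function names are hypothetical, and the sampled multimodal responses are invented for illustration; they are not read from Figure 2.7.

```python
def surface_eq_2_2(x1):
    """Exact response surface of Figure 2.2 (Equation 2.2)."""
    return 0.80 + 1.20 * x1 - 0.05 * x1 ** 2


def fitted_eq_2_4(x1):
    """Fitted response surface of Figure 2.4 (Equation 2.4)."""
    return 8.00 - 1.20 * x1 + 0.04 * x1 ** 2


def is_monotonic(values, increasing):
    """True if every value is strictly above (or below) its predecessor."""
    pairs = list(zip(values, values[1:]))
    if increasing:
        return all(b > a for a, b in pairs)
    return all(b < a for a, b in pairs)


def local_maxima(levels, responses):
    """Interior points whose response exceeds both neighbouring responses."""
    return [
        (levels[i], responses[i])
        for i in range(1, len(responses) - 1)
        if responses[i - 1] < responses[i] > responses[i + 1]
    ]


# Factor levels 0, 0.5, ..., 10 spanning the domain shown in the figures.
grid = [0.5 * i for i in range(21)]
increasing_2_2 = is_monotonic([surface_eq_2_2(x) for x in grid], increasing=True)
decreasing_2_4 = is_monotonic([fitted_eq_2_4(x) for x in grid], increasing=False)
y_at_3 = round(fitted_eq_2_4(3.00), 2)   # response at x1 = 3.00

# Hypothetical responses at x1 = 0, 1, ..., 10 (illustrative values only).
levels = list(range(11))
responses = [1, 3, 5, 4, 3, 4, 6, 8, 6, 3, 1]
maxima = local_maxima(levels, responses)
global_maximum = max(maxima, key=lambda point: point[1])

print(increasing_2_2, decreasing_2_4, y_at_3)   # True True 4.76
print(maxima, global_maximum)                   # [(2, 5), (7, 8)] (7, 8)
```

A grid search of this kind finds only optima that the sampling interval can resolve; a narrow optimum lying between grid points would be missed.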
2.2 Continuous and discrete factors and responses

A continuous factor is a factor that can take on any value within a given domain. Similarly, a continuous response is a response that can take on any value within a given range. Examples of continuous factors are pressure, volume, weight, distance, time, current, flow rate, and reagent concentration. Examples of continuous responses are yield, profit, efficiency, effectiveness, impurity concentration, sensitivity, selectivity, and rate.

Continuous factors and responses are seldom realized in practice because of the finite resolution associated with most control and measurement processes. For example, although temperature is a continuous factor and may in theory assume any value from -273.15°C upward, if the temperature of a small chemical reactor is adjusted with a common autotransformer, temperature must vary in a discontinuous manner (in “jumps”) because the variable transformer can be changed only in steps as a sliding contact moves from one wire winding to the next. Another example arises in the digitization of a continuous voltage response: because analog-to-digital converters have fixed numbers of bits, the resulting digital values are limited to a finite set of possibilities. A voltage of 8.76397... would be digitized on a ten-bit converter as either 8.76 or 8.77. Although practical limitations of resolution exist, it is important to realize that continuous factors and continuous responses can, in theory, assume any of the infinite number of values possible within a given set of bounds.

A discrete factor is a factor that can take on only a limited number of values within a given domain. A discrete response is a response that can take on only a limited number of values within a given range. Examples of discrete factors are type of buffer, choice of solvent, number of extractions, type of catalyst, and real plates in a distillation column. Examples of discrete responses are the familiar postal rates, wavelengths of the hydrogen atom emission spectrum, number of radioactive decays in one second, number of peaks in a spectrum, and number of items that pass inspection in a sample of 100.

Figure 2.9 Response surface showing an inherently continuous response (percent purity of a protein) as a function of an inherently discrete factor (number of recrystallizations).

Figure 2.10 Response surface showing an inherently discrete response (number of lines in the Lyman series) as a function of an inherently continuous factor (excitation energy, wavenumbers).

Figures 2.2-2.8 are response surfaces that show inherently continuous responses as functions of inherently continuous factors. Figure 2.9 illustrates a response surface for an inherently continuous response (the percent purity of a protein) plotted against an inherently discrete factor (number of recrystallizations); any percent purity from zero to 100 is possible, but only integer values are meaningful for the number of recrystallizations. Figure 2.10 is a response surface for an inherently discrete response (number of lines in the Lyman series of the hydrogen atom emission spectrum) as a function of an inherently continuous factor (energy of exciting radiation); only integer values are meaningful for the number of lines observed, but any excitation energy is possible.

The discrete factor “number of recrystallizations” is naturally ordered (see Figure 2.9). It obviously makes sense to plot yield against the number of recrystallizations in the order 1, 2, 3, 4, 5, 6, 7, 8, 9; it would not make sense to plot the yield against a different ordering of the number of recrystallizations, say 3, 7, 5, 8, 2, 1, 4, 9, 6. Other discrete factors are not always as meaningfully ordered, especially if they are
expressed on a nominal scale. Consider a chemical reaction for which the type of solvent is thought to be an important discrete factor. If we choose nine solvents at random, arbitrarily number them one through nine, and then evaluate the percent yield of the same chemical reaction carried out in each of the nine solvents, the results might be
Solvent     Percent Yield
1           54.8
2           88.7
3           34.8
4           83.7
5           68.7
6           89.9
7           74.7
8           90.0
9           83.6
If we now plot the percent yield vs. the solvent number, we will get the “response surface” shown in Figure 2.11. There seems to be a trend toward higher yield with increasing solvent number, but such a trend must be meaningless: it is difficult to imagine how an arbitrarily assigned number could possibly influence the yield of a chemical reaction. If a relationship does appear to exist between yield and solvent number, it must be entirely accidental - the variation in response is probably caused by some other property of the solvents.

When it is suspected that some particular property of the discrete factor is important, reordering or ranking the discrete factor on an ordinal scale in increasing or decreasing order of the suspected property might produce a smoother response surface. Many properties of solvents could have an influence on the yield of chemical reactions: boiling point, Lewis basicity, and dipole moment are three such properties. Perhaps in this case it is felt that dipole moment is an important factor for the particular chemical reaction under investigation. If we look up the dipole moments of solvents one through nine in a handbook, we might find the following:
Solvent     Dipole Moment
1           0.175
2           0.437
3           0.907
4           0.362
5           0.753
6           0.521
7           0.286
8           0.491
9           0.639
Figure 2.11 Response surface obtained from experiments using different solvents arbitrarily numbered 1 - 9.
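The ranking of the solvents by dipole moment can be sketched in a few lines of Python using the two tables above. The dictionary and variable names are hypothetical; the data are the tabulated values.

```python
# Percent yield and dipole moment for solvents 1-9, taken from the two tables.
percent_yield = {1: 54.8, 2: 88.7, 3: 34.8, 4: 83.7, 5: 68.7,
                 6: 89.9, 7: 74.7, 8: 90.0, 9: 83.6}
dipole_moment = {1: 0.175, 2: 0.437, 3: 0.907, 4: 0.362, 5: 0.753,
                 6: 0.521, 7: 0.286, 8: 0.491, 9: 0.639}

# Rank the arbitrary solvent numbers by increasing dipole moment.
ranked_solvents = sorted(dipole_moment, key=dipole_moment.get)
print(ranked_solvents)   # [1, 7, 4, 2, 8, 6, 9, 5, 3]

# Yield listed in the new order rises smoothly to a maximum and then falls.
for solvent in ranked_solvents:
    print(solvent, dipole_moment[solvent], percent_yield[solvent])
```

Listed in this order, the yields rise from 54.8 to 90.0 and then fall to 34.8, which is the smoother pattern the reordering is meant to reveal.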
When arranged according to increasing dipole moment, the solvent “names” are expressed on an ordinal scale as 1, 7, 4, 2, 8, 6, 9, 5, 3. Figure 2.12 plots percent yield as a function of these reordered solvent numbers and suggests a discrete view of an underlying smooth functional relationship that might exist between percent yield and dipole moment. The discrete factor “solvent number” is recognized as a simple bookkeeping designation. We can replace it with the continuous factor dipole moment expressed on a ratio scale and obtain, finally, the “response surface” shown in Figure 2.13.

Figure 2.12 Response surface obtained from experiments using different solvents ranked in order of increasing dipole moment.

Figure 2.13 Response surface showing percent yield as a function of dipole moments of solvents.

A special note of caution is in order. Even when data such as those shown in Figure 2.13 are obtained, the suspected property might not be responsible for the observed effect; it may well be that a different, correlated property is the true cause (see Section 1.2 on masquerading factors).
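The digitization example given earlier in this section (a voltage of 8.76397... on a ten-bit converter) can be made concrete with a small Python sketch. The input span of 10.24 V is an assumption chosen so that the 1024 codes of a ten-bit converter are spaced 0.01 V apart; the text does not state the span, and the function name `bracketing_levels` is hypothetical.

```python
import math


def bracketing_levels(voltage, n_bits=10, full_scale=10.24):
    """Return the two adjacent converter levels that bracket a voltage.

    full_scale is an assumed input span; with 10 bits it gives
    1024 codes spaced 0.01 V apart.
    """
    n_codes = 2 ** n_bits
    step = full_scale / n_codes              # size of one code, in volts
    code = math.floor(voltage / step)        # code at or just below the input
    code = max(0, min(n_codes - 2, code))    # keep both neighbours on scale
    return code * step, (code + 1) * step


lower, upper = bracketing_levels(8.76397)
print(round(lower, 2), round(upper, 2))   # 8.76 8.77
```

Whether the converter reports the lower or the upper level depends on whether it truncates or rounds, a design detail the text leaves open.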
2.3 Constraints and feasible regions

A factor or response that is not variable from infinitely negative values to infinitely positive values is said to be constrained. Many factors and responses are naturally constrained. Temperature, for example, is usually considered to have a lower limit of -273.15°C; temperatures at or below that value are theoretically impossible. There is, however, no natural upper limit to temperature: temperature is said to have a natural lower bound, but it has no natural upper bound.

The voltage from an autotransformer has both a lower bound (0 V a.c.) and an upper bound (usually about 130 V a.c.) and is an example of a naturally constrained discrete factor. The upper constraint could be changed if the autotransformer were redesigned but, given a particular autotransformer, these constraints are “natural” in the sense that they are the limits available.

If the voltage from an autotransformer is to be used to adjust the temperature of a chemical reactor (see Figure 2.14), then the natural boundaries of the autotransformer voltage will impose artificial constraints on the temperature. The lower boundary of the autotransformer (0 V a.c.) would result in no heating of the chemical reactor. Its temperature would then be approximately ambient, say 25°C. The upper boundary of the autotransformer voltage would produce a constant amount of heat energy and might result in a reactor temperature of, say, 300°C. Thus, the use of an autotransformer to adjust temperature imposes artificial lower and upper boundaries on the factor of interest.

Figure 2.14 General system theory view showing how the use of an autotransformer imposes artificial constraints on the factor temperature.

The natural constraint on temperature, the natural constraints on autotransformer voltage, and the artificial constraints on temperature are all examples of inequality constraints. If T is used to represent temperature and E represents voltage, these inequality constraints can be expressed, in order, as

$$-273.15\,^{\circ}\mathrm{C} < T \qquad (2.5)$$

$$0\ \mathrm{V\ a.c.} \le E \le 130\ \mathrm{V\ a.c.} \qquad (2.6)$$

$$25\,^{\circ}\mathrm{C} \le T \le 300\,^{\circ}\mathrm{C} \qquad (2.7)$$
Here the “less than or equal to” symbol (≤) is used to indicate that the boundary values themselves are included in the set of possible values these variables may assume. If the boundary values are not included, “less than” symbols (<) are used. The presence of the additional “equal to” symbol in Equations 2.6 and 2.7 does not prevent the constraints from being inequalities.

It is often desirable to fix a given factor at some specified level. For example, if an enzymatic determination is always to be carried out at 37°C, we might specify

$$T = 37\,^{\circ}\mathrm{C} \qquad (2.8)$$

Figure 2.15 General system theory view of a process to which two different solvents are added.
This is a simple example of an externally imposed equality constraint. Note that by imposing this constraint, temperature is no longer a variable in the enzymatic system.

Natural equality constraints exist in many real systems. For example, consider a chemical reaction in which a binary mixed solvent is to be used (see Figure 2.15). We might specify two continuous factors, the amount of one solvent (represented by x1) and the amount of the other solvent (x2). These are clearly continuous factors and each has only a natural lower bound. However, each of these factors probably should have an externally imposed upper bound, simply to avoid adding more total solvent than the reaction vessel can hold. If the reaction vessel is to contain 10 liters, we might specify the inequality constraints

$$0\ \mathrm{L} \le x_1 \le 10\ \mathrm{L} \qquad (2.9)$$

$$0\ \mathrm{L} \le x_2 \le 10\ \mathrm{L} \qquad (2.10)$$
However, as Figure 2.16 shows, these inequality constraints are not sufficient to avoid overfilling the tank; any combination of x1 and x2 that lies to the upper right of the dashed line in Figure 2.16 will add a total of more than 10 liters and will cause the reaction vessel to overflow. Inequality constraints of

$$0\ \mathrm{L} \le x_1 \le 5\ \mathrm{L} \qquad (2.11)$$

$$0\ \mathrm{L} \le x_2 \le 5\ \mathrm{L} \qquad (2.12)$$
will avoid overfilling the tank. The problem in this case, however, is that the reaction vessel will seldom be filled to capacity: in fact, it will be filled to capacity only when both x1 and x2 are 5 liters (see Figure 2.16). It is evident that an equality constraint is desirable. If we would like to vary both x1 and x2, and if the total volume of solvent must equal 10 liters, then we want to choose x1 and x2 such that their location in factor space falls on the dashed line in Figure 2.16. An appropriate equality constraint is

$$x_1 + x_2 = 10\ \mathrm{L} \qquad (2.13)$$
Either (but not both) of the two inequality constraints given by Equations 2.9 and 2.10 also applies. The specification of an equality constraint takes away a degree of freedom from the factors. In this example, x1 and x2 cannot both be independently varied - mathematically, it is now a single-factor system.

Except in certain specialized areas, it is seldom practical to place an equality constraint on a continuous response: it is usually not possible to achieve an exact value for an output. For example, a purchasing agent for a restaurant might specify that the margarine it buys from a producer have a “spreadability index” of 0.50. When questioned further, however, the buyer will probably admit that if the spreadability index is between 0.45 and 0.55 the product will be considered acceptable. In general, if an equality constraint is requested for a continuous response, an inequality constraint is usually preferable.

When a system is constrained, the factor space is divided into feasible regions and nonfeasible regions. A feasible region contains permissible or desirable combinations of factor levels, or gives an acceptable response. A nonfeasible region contains prohibited or undesirable combinations of factor levels, or gives an unacceptable response.
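The mixed-solvent constraints of Equations 2.9 through 2.13 can be explored with a minimal Python sketch. The function names and the trial points are hypothetical, chosen only to show how inequality bounds, the overflow condition, and the equality constraint behave.

```python
TANK_CAPACITY_L = 10.0  # reaction vessel volume from the text, in liters


def satisfies_bounds(x1, x2, upper_l):
    """Inequality constraints 0 <= x <= upper_l applied to both solvents."""
    return 0.0 <= x1 <= upper_l and 0.0 <= x2 <= upper_l


def overflows(x1, x2):
    """True if the total solvent volume exceeds the vessel capacity."""
    return x1 + x2 > TANK_CAPACITY_L


def x2_on_equality_line(x1):
    """Solve x1 + x2 = 10 L (Equation 2.13) for x2; only x1 remains free."""
    if not 0.0 <= x1 <= TANK_CAPACITY_L:
        raise ValueError("x1 must lie between 0 and 10 L")
    return TANK_CAPACITY_L - x1


# The point (8, 8) obeys Equations 2.9 and 2.10 yet overfills the vessel.
print(satisfies_bounds(8.0, 8.0, 10.0), overflows(8.0, 8.0))   # True True

# The point (2, 3) obeys Equations 2.11 and 2.12 but fills only half the vessel.
print(satisfies_bounds(2.0, 3.0, 5.0), 2.0 + 3.0)              # True 5.0

# With the equality constraint, choosing x1 fixes x2.
print(x2_on_equality_line(3.0))                                # 7.0
```

Because `x2_on_equality_line` takes a single argument, it makes the loss of one degree of freedom explicit: the two-factor problem has become a one-factor problem.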
Figure 2.16 Graph of the factor space for Figure 2.15 showing possible constraints on the two different solvents.

2.4 Factor tolerances

Response surfaces provide a convenient means of discussing the subject of factor tolerances, limits within which a factor must be controlled to keep a system response within certain prescribed limits [Box, Hunter, and Hunter (1978)]. Consider the two response surfaces shown in Figures 2.17 and 2.18. Each response surface shows percent yield (y1) as a function of reaction temperature (x1), but for two different processes. In each process, the temperature is controlled at 85°C. The yield in each case is 93%. Let us suppose that the yield must be held between 92% and 94% for the process to be economically feasible. That is,

$$92\% \le y_1 \le 94\% \qquad (2.14)$$

Figure 2.17 Relationship between percent yield and temperature for chemical process A. Note the relatively large variations in temperature that are possible while maintaining yield between 92% and 94%.

The question of interest to us now is, “Within what limits must temperature (x1) be controlled to keep the yield within the specified limits?” Answering this question requires some knowledge of the response surface in the region around 85°C. We will assume that experiments have been carried out in this region for each process, and that the response surfaces shown in Figures 2.17 and 2.18 are good approximations of the true behavior of the system. Empirical models that adequately describe the dependence of yield on temperature in the region of factor space near 85°C are

$$y_1 = 50.5 + 0.5x_1 \qquad (2.15)$$

for the first process (Figure 2.17), and

$$y_1 = 8.0 + 1.0x_1 \qquad (2.16)$$
for the second process (Figure 2.18), where x1 is expressed in Celsius degrees. Because we know (approximately) the behavior of the system over the region of interest, we can use the inverse transform (see Section 1.4) to determine the limits within which the temperature must be controlled. For the first process,
X i = 2 ( ~ 1-50.5)
(2.17)
Substituting 92% and 94% for y1, we find that temperature must be controlled between 83°C and 87°C. These limits on temperature, and the transformed limits on yield, are illustrated in Figure 2.17. For the second process, the inverse transform is

$$x_1 = y_1 - 8.0 \tag{2.18}$$

Substituting 92% and 94% for y1, we find that temperature must be controlled between 84°C and 86°C for this second process (see Figure 2.18). This is a tighter factor tolerance than was required for the first process.

The specification of factor tolerances is clearly important for the achievement of reproducible processes. It is especially important to specify factor tolerances if the developed process is to be implemented at several different facilities, with different types of equipment, at different times, with different operators, etc.
Figure 2.18 Relationship between percent yield and temperature for chemical process B. Note the relatively small variations in temperature that are required to maintain yield between 92% and 94%. (Horizontal axis: temperature, x1.)
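The inverse-transform calculation of Section 2.4 can be sketched in a few lines of Python. This is only an illustration under the assumption of a straight-line model y1 = intercept + slope × x1, as in Equations 2.15 and 2.16; the function name `temperature_limits` and its argument names are hypothetical and do not come from the text.

```python
def temperature_limits(intercept, slope, y_low, y_high):
    """Invert y1 = intercept + slope * x1 at the two response limits.

    Returns the (lower, upper) factor limits that keep the response
    between y_low and y_high. Assumes a straight-line model.
    """
    if slope == 0:
        raise ValueError("a flat response surface has no inverse transform")
    x_a = (y_low - intercept) / slope
    x_b = (y_high - intercept) / slope
    # Sort so the result is (lower, upper) even for a negative slope.
    return min(x_a, x_b), max(x_a, x_b)


# Process A (Equation 2.15) and process B (Equation 2.16), yield held to 92%-94%.
process_a = temperature_limits(50.5, 0.5, 92.0, 94.0)
process_b = temperature_limits(8.0, 1.0, 92.0, 94.0)

print(process_a)  # (83.0, 87.0)
print(process_b)  # (84.0, 86.0)
```

The width of the permitted temperature interval is the width of the yield interval divided by the magnitude of the slope, which is why the steeper surface of process B gives the tighter tolerance.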
Exercises

2.1 Uncontrolled factors. Suppose the exact relationship for the response surface of Figure 2.2 is $y_1 = 0.80 + 1.20x_1 - 0.05x_1^2 + 0.50x_2$ (note the presence of the second factor, x2) and that Figure 2.2 was obtained with x2 held constant at the value zero. What would the response surface look like if x2 were held constant at the value 4? If x2 were not controlled, but were instead allowed to vary randomly between 0 and 1, what would repeated experiments at x1 = 5 reveal? What would the entire response surface look like if it were investigated experimentally while x2 varied randomly between 0 and 1?
2.2 Experiment space. Sketch the factor space, response space, and experiment space for Problem 2.1. 2.3 Terminology. Give definitions for the following: maximum, minimum, optimum, unimodal, multimodal, local optimum, global optimum, continuous, discrete, constraint, equality constraint, inequality constraint, lower bound, upper bound, natural constraint, artificial constraint, degree of freedom, feasible region, nonfeasible region, factor tolerance. 2.4 Factor tolerances. Consider the response surface shown in Figure 2.7. Suppose it is desired to control the response to within plus or minus 0.5 units. Would the required factor tolerances on x1 be larger around x1 = 2 or around x1 = 6?
2.5 Mutually exclusive feasible regions. Suppose that the response surface shown in Figure 2.5 represents the yield of product in kilograms per hour vs. the feed rate of a reactant x1, and that the response surface shown in Figure 2.6 represents the percentage of impurity in the same product. Is it possible to adjust the feed rate of reactant x1 to simultaneously achieve a yield greater than 7.25 kilograms per hour and a percentage impurity less than 2.1%? If not, are any other actions possible to achieve the goal of >7.25 kg h⁻¹ and <2.1% impurity? How could one or the other or both of the constraints be modified to achieve overlapping feasible regions? 2.6 Continuous and discrete factors. Specify which of the following factors are inherently continuous and which are inherently discrete: length, time, count, pressure, population, density, population density, energy, electrical charge, pages, temperature, and concentration.
2.7 Upper and lower bounds. Which of the factors listed in Problem 2.6 have lower bounds? What are the values of these lower bounds? Which of the factors listed in Problem 2.6 have upper bounds? What are the values of these upper bounds? 2.8 Maxima and minima. Locate the three local minima in Figure 2.7. Which is the global minimum? Locate the four local maxima in Figure 2.8. Which is the global maximum?
2.9 Degrees of freedom. A store has only three varieties of candy: A at 5 cents, B at 10 cents, and C at 25 cents. If a child intends to buy exactly $1.00 worth of candy in the store, how many degrees of freedom does the child have in choosing his selection of candy? Are the factors in this problem continuous or discrete? List all of the possible combinations of A, B, and C that would cost exactly $1.00. 2.10 Order arising from randomization. Suppose, by chance (one in 362,880) you happened to assign the letters H, C, I, D, G, B, F, A, E to the solvents numbered 1, 2, 3, 4, 5, 6, 7, 8, 9 in Figure 2.11. If these solvents and their percent yield responses were listed in alphabetical order A-I, and a plot of percent yield vs. solvent letter were made, what naive conclusion might be drawn? 2.11 Artificial constraints. List three factors for which artificial constraints are often imposed. What are the values of these constraints? If necessary, could the artificial constraints be removed easily, or with difficulty?
2.12 Equality constraints. Suppose you are given the task of preparing a ternary (three-component) solvent system such that the total volume be 1.00 liter. Write the equality constraint in terms of x1, x2, and x3, the volumes of each of the three solvents. Sketch the three-dimensional factor space and clearly draw within it the planar, two-dimensional constrained feasible region. (Hint: try a cube and a triangle after examining Figure 2.16.) 2.13 Two-factor response surfaces. Obtain a copy of a geological survey topographic map showing contours of constant elevation. Identify the following features, if present: a local maximum, a local minimum, the global maximum, the global minimum, a broad ridge, a narrow ridge, a broad valley, a narrow valley, a straight valley or ridge, a curved valley or ridge, a plateau region, and a saddle region.
2.14 Catastrophic response surfaces. Sketch a response surface showing length (y1) as a function of force exerted (x1) on a rubber band that is stretched until it breaks. Give examples of other catastrophic response surfaces. [See, for example, Saunders (1980).] 2.15 Hysteresis. Suppose you leave your home, go to the grocery store, visit a friend, stop at the drug store, and then return home. Draw a response surface of your round trip showing relative north-south distance (y1) as a function of relative east-west distance (x1). What does the term hysteresis mean and how does it apply to the round trip that you have described? Give examples of other systems and response surfaces that exhibit hysteresis. Why do there appear to be two different response surfaces in systems that exhibit hysteresis? Is there a factor that is not being considered? Is y1 really a response? Is x1 really a factor? 2.16 Equality and inequality constraints. Which would you sign, a $100,000 contract to supply 1000 2.000-meter long metal rods, or a $100,000 contract to supply 1000 metal rods that are 2.000 ± 0.001 meters in length? 2.17 Factor tolerances. Look up a standard method of measurement (e.g., the determination of iron in an ore, found in most quantitative analysis textbooks). Are factor tolerances specified? What might be the economic consequences of insufficiently controlling a factor that has narrow tolerances? What might be the economic consequences of controlling too closely a factor that actually has wide tolerances? 2.18 Factor tolerances. “The sponsoring laboratory may have all the fun it wants within its own walls by using nested factorials, components of variance, or anything else that the workers believe will help in the fashioning of a test procedure. At some time the chosen procedure should undergo the kind of mutilation that results from the departures from the specified procedure that occur in other laboratories” [Youden (1961b)]. Comment. 2.19 Sampling theory. The equation y = a sin(bx) describes a sine wave of period 360/b. If a = 1 and b = 10, evaluate y at x = 0, 40, 80, ..., 320 and 360. Plot the individual results. What is the apparent period of the plotted data? Do these discrete responses give an adequate representation of the true response from the system?
CHAPTER 3
Basic Statistics

It is usually not possible or practical to control a given system input at an exact level; in practice, most inputs are controlled around set levels within certain factor tolerances. Thus, controlled system inputs exhibit some variation. Variation is also observed in the levels of otherwise constant system outputs, either because of instabilities within the system itself (e.g., a system involving an inherently random process, such as nuclear decay); or because of the transformation of variations in the system inputs into variations of the system output (see Figures 2.17 and 2.18); or because of variations in an external measurement system that is used to measure the levels of the outputs. This latter source of apparent system variation also applies to measured values of system inputs.

Thus, the single value 85°C might not completely characterize the level of input x1. Similarly, the single value 93% yield might mislead us about the overall behavior of the output y1. Will y1 be 93% yield if we repeat the experiment using presumably identical conditions? Will it change? Over what limits? What might be expected as a typical value?

To simplify the presentation of this chapter, we will look only at variation in the level of measured system outputs, but the same approach can also be applied to variation in the level of measured system inputs. We will assume that all controllable inputs (known and unknown) are fixed at some specified level (see Figure 3.1). Any variation in the output from the system will be assumed to be caused by variation in the uncontrolled inputs (known and unknown) or by small, unavoidable variations about the set levels of the controlled inputs [Davies (1956)]. The two questions to be answered in this chapter are, “What can be used to describe the level of output from the system?” and “What can be used to describe the variation in the level of output from the system?”
3.1 The mean

Suppose we carry out a single evaluation of response from the system shown in Figure 3.1. This number is designated y11 and is found to be 32.53. Does this number, by itself, represent the level of output from the system? If we answer “yes”, then we have done so on the basis of the assumption that the variation associated with the output is zero. In the absence of other information, we cannot be certain that there is no variation, but we can test this assumption by carrying out a second evaluation of response from the same system.

Figure 3.1 General system theory view showing known, unknown, controlled, and uncontrolled factors.

If we did carry out another measurement and obtained the value y12 = 32.53, then we would feel more strongly that there is no measurable variation associated with the system output. But we have not proved that there is no uncertainty associated with the system; we have only increased our confidence in the assumption that there is no uncertainty. We will always be left with the question, “What if we evaluated the response one more time?” If, for example, we carried out our evaluation of response a third time and found y13 = 32.55, then we could conclude that the assumption about no measurable variation was false. (In general, it is easier to disprove an assumption or hypothesis than it is to prove it. The subject of hypothesis testing will be taken up again in Chapter 6.) We can see that a single evaluation of output from a system does not, by itself, necessarily represent the true level of the output from a system [Mandel (1964)].

Consider the following data set obtained by repetitive evaluation of the output of the system shown in Figure 3.1.

Evaluation    Response
y11           32.53
y12           32.53
y13           32.55
y14           32.52
y15           32.48
y16           32.54
y17           32.57
y18           32.51
y19           32.54
Figure 3.2 Plot showing frequency of obtaining a given response as a function of the response values themselves.
It is useful to reorder or rank the data set according to increasing (or decreasing) values of response. Such a reordering is

Evaluation    Response
y15           32.48
y18           32.51
y14           32.52
y11           32.53
y12           32.53
y16           32.54
y19           32.54
y13           32.55
y17           32.57
The responses vary from a low value of 32.48 to a high value of 32.57, a response range of 0.09. Notice that the responses tend to cluster about a central value of approximately 32.53. Most of the values are very close to the central value; only a few are far away. Another representation of this ordered data set is given in Figure 3.2, where the vertical bars indicate the existence of a response with the corresponding value on the horizontal number line. Bars twice as high as the others indicate that there are two responses with this value. Again, notice that the data set tends to be clustered about a central value.

The measure of central tendency used throughout this book is the mean, sometimes called the average [Arkin and Colton (1970)]. It is defined as the sum (Σ) of all the response values divided by the number of response values. In this book, we will use ȳ as the symbol for the mean (see Section 1.3).

$$\bar{y}_1 = \frac{\sum_{i=1}^{n} y_{1i}}{n} \tag{3.1}$$

where y1i is the value of the ith evaluation of response y1, and n is the number of response evaluations. For the data illustrated in Figure 3.2,

$$\bar{y}_1 = (32.48 + 32.51 + 32.52 + 32.53 + 32.53 + 32.54 + 32.54 + 32.55 + 32.57)/9 = 32.53 \tag{3.2}$$
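As a quick check of the ranking, the range, and the mean for the nine responses of Section 3.1, the arithmetic can be reproduced with a short Python sketch. The variable names are illustrative only; the data are those listed in the text.

```python
# The nine responses y11 ... y19 from Section 3.1, in the order obtained.
responses = [32.53, 32.53, 32.55, 32.52, 32.48, 32.54, 32.57, 32.51, 32.54]

# Rank the data from lowest to highest response.
ranked = sorted(responses)

# Range: highest response minus lowest response.
response_range = ranked[-1] - ranked[0]

# Mean: sum of the responses divided by the number of responses.
mean = sum(responses) / len(responses)

print(ranked)
print(round(response_range, 2))  # 0.09
print(round(mean, 2))            # 32.53
```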
3.2 Degrees of freedom

Suppose a person asks you to choose two integer numbers. You might reply, “Five and nine”. But you could just as easily have chosen one and six, 14 and 23, 398 and 437, etc. Because the person did not place any constraints on the integers, you had complete freedom in choosing the two numbers.

Suppose the same person again asks you to choose two integer numbers, this time with the restriction that the sum of the two numbers be ten. You might reply, “Four and six”. But you could also have chosen five and five, -17 and 27, etc. A close look at the possible answers reveals that you don’t have as much freedom to choose as you did before. In this case, you are free to choose the first number, but the second number is then fixed by the requirement that the sum of the numbers be ten. For example, you could choose three as the first number, but the second number must then be seven. In effect, the equality constraint that the sum of the two numbers be ten has taken away a degree of freedom (see Section 2.3).

Suppose the same person now asks you to choose two integer numbers, this time with the restrictions that the sum of the two numbers be ten and that the product of the two numbers be 24. Your only correct reply would be, “Four and six”. The two equality constraints (y11 + y12 = 10 and y11 × y12 = 24) have taken away two degrees of freedom and left you with no free choices in your answer.

The number of degrees of freedom (DF) states how many individual pieces of data in a set can be independently specified. In the first example, where there were no constraints, DF = 2. For each independent equality constraint placed on the system, the number of degrees of freedom is decreased by one. Thus, in the second example, DF = 2 - 1 = 1, and in the third example, DF = 2 - 2 = 0.

In Section 3.1, the mean value of a set of data was calculated. If this mean value is used to characterize the data set, then the data set itself loses one degree of freedom. Thus, if there are n values of response, calculation of the mean value leaves only n - 1 degrees of freedom in the data set. Only n - 1 items of data are independent - the final item is fixed by the other n - 1 items and the mean.
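The loss of a degree of freedom can be made concrete with a small Python sketch. The helper `last_value` is a hypothetical name introduced here for illustration: given the mean and n - 1 of the values, it returns the one value that is no longer free. The second part simply enumerates integer pairs (over an arbitrary search range of -50 to 50, chosen only for the illustration) that satisfy both constraints of the third example.

```python
def last_value(known_values, mean, n):
    """Return the single remaining value fixed by the mean and the other n-1 values."""
    if len(known_values) != n - 1:
        raise ValueError("exactly n-1 values must be supplied")
    # The n values must sum to n * mean, so the last one is whatever is left over.
    return n * mean - sum(known_values)


# Two integers that must sum to ten (mean of five): choosing three fixes seven.
print(last_value([3], mean=5, n=2))    # 7
print(last_value([-17], mean=5, n=2))  # 27

# Two constraints (sum = 10, product = 24) leave no free choice at all.
pairs = [
    (a, b)
    for a in range(-50, 51)
    for b in range(a, 51)  # b >= a, so each unordered pair is listed once
    if a + b == 10 and a * b == 24
]
print(pairs)  # [(4, 6)]
```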
3.3 The variance

For any given set of data, the specification of a mean value allows each individual response to be viewed as consisting of two components - a part that is described by the mean response (ȳ1), and a residual part (r1i) consisting of the difference or deviation that remains after the mean has been subtracted from the individual value of response.

$$r_{1i} = y_{1i} - \bar{y}_1 \tag{3.3}$$

The concept is illustrated in Figure 3.3 for data point seven of the example in Section 3.1. The total response (32.57) can be thought of as consisting of two parts - the contribution from the mean (32.53) and that part contributed by a deviation from the mean (0.04). Similarly, data point five of the same data set can be viewed as being made up of the mean response (32.53) and a deviation from the mean (-0.05), which add up to give the observed response (32.48). The residuals for the whole data set are shown in Figure 3.4. The set of residuals has n - 1 degrees of freedom [Youden (1951)].

Figure 3.3 Illustration of a residual as the difference between a measured response and the mean of all responses.

Information about the reproducibility of the data is contained in the residual parts of the responses. If the residuals are all small, then the dispersion is slight and the responses are said to cluster tightly - the reproducibility is good. On the other hand, if the residuals are all large, the variation in response is large, and the reproducibility is poor.

Figure 3.4 Plot showing frequency of obtaining a given residual as a function of the values of the residuals. The solid curve shows an estimated gaussian or normal distribution for the data.

One measure of variation about the mean of a set of data is the variance (s²), defined as the sum of squares of residuals, divided by the number of degrees of freedom associated with the residuals.

$$s^2 = \frac{\sum_{i=1}^{n} r_{1i}^2}{n-1} = \frac{\sum_{i=1}^{n} (y_{1i} - \bar{y}_1)^2}{n-1} \tag{3.4}$$

For the data in Section 3.1,

$$s^2 = [(-0.05)^2 + (-0.02)^2 + (-0.01)^2 + (0.01)^2 + (0.01)^2 + (0.02)^2 + (0.04)^2]/(9-1) = 0.0052/8 = 0.00065 \tag{3.5}$$

The square root of the variance of a set of data is called the standard deviation and is given the symbol s.

$$s = (s^2)^{1/2} \tag{3.6}$$

For the data in Section 3.1,

$$s = (0.00065)^{1/2} = 0.025 \tag{3.7}$$

In much experimental work, the distribution of residuals (i.e., the frequency of occurrence of a particular value of residual vs. the value itself) follows a normal or gaussian curve described by the normalized equation

$$\text{frequency} = \{1/[s(2\pi)^{1/2}]\}\exp[-(y_{1i} - \bar{y}_1)^2/(2s^2)] \tag{3.8}$$

A scaled gaussian curve for the data of Section 3.1 is superimposed on Figure 3.4.
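The residuals, variance, and standard deviation for the data of Section 3.1 can be verified with the following Python sketch, which also evaluates the normalized gaussian of Equation 3.8. The function name `gaussian_frequency` and the variable names are illustrative choices, not notation from the text.

```python
import math

# The nine responses from Section 3.1.
responses = [32.53, 32.53, 32.55, 32.52, 32.48, 32.54, 32.57, 32.51, 32.54]

n = len(responses)
mean = sum(responses) / n

# Equation 3.3: each residual is the response minus the mean.
residuals = [y - mean for y in responses]

# Variance: sum of squared residuals divided by the n-1 degrees of freedom.
sum_of_squares = sum(r * r for r in residuals)
variance = sum_of_squares / (n - 1)

# Equation 3.6: the standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)


def gaussian_frequency(y, mean, s):
    """Equation 3.8: normalized gaussian frequency at response y."""
    return math.exp(-((y - mean) ** 2) / (2 * s ** 2)) / (s * math.sqrt(2 * math.pi))


print(round(sum_of_squares, 4))  # 0.0052
print(round(variance, 5))        # 0.00065
print(round(std_dev, 3))         # 0.025
print(gaussian_frequency(mean, mean, std_dev))  # height of the curve at the mean
```

Note that the residuals sum to zero (within rounding), which is another way of seeing that only n - 1 of them are independent.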
Figure 3.5 Relationship of response frequency plot to response surface. (Horizontal axis: level of factor x1.)

Figure 3.6 Example of homoscedastic noise. (Horizontal axis: level of factor x1.)
Figure 3.5 shows the current data set superimposed on an arbitrary response surface. The system has been fixed so that all factors are held constant, and only one point in factor space (x1 = 4) has been investigated experimentally. It must be remembered that in practice we are not “all knowing” and would not have a knowledge of the complete response surface; we would know only the results of our experiments, and could only guess at what the remainder of the response surface looked like. If we can pretend to be all knowing for a moment, it is evident from Figure 3.5 that if a different region of factor space were to be explored, a different value for the mean would probably result. What is not evident from the figure, however, is what the variance would be in these other regions.

Two possibilities exist. If the variance of a response is constant throughout a region of factor space, the system is said to be homoscedastic over that region (see Figure 3.6). If the variance of a response is not constant throughout a region of factor space, the system is said to be heteroscedastic over that region (see Figure 3.7).
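The difference between the two cases can be illustrated with a small simulation. Everything in this sketch is an assumption made for illustration: the straight-line response surface, the noise levels, the factor levels, and the function name `simulate` are hypothetical and are not taken from Figures 3.6 or 3.7.

```python
import random
import statistics


def simulate(levels, noise_sd, replicates=2000, seed=0):
    """Return the sample variance of simulated responses at each factor level.

    noise_sd is a function giving the standard deviation of the noise
    at a given level of x1.
    """
    rng = random.Random(seed)
    variances = {}
    for x1 in levels:
        # Hypothetical response surface, chosen only for illustration.
        true_response = 30.0 + 0.5 * x1
        ys = [rng.gauss(true_response, noise_sd(x1)) for _ in range(replicates)]
        variances[x1] = statistics.variance(ys)
    return variances


levels = [2, 4, 6, 8]

# Homoscedastic: the noise is the same at every level of x1.
homo = simulate(levels, noise_sd=lambda x1: 0.5)

# Heteroscedastic: the noise grows with the level of x1.
hetero = simulate(levels, noise_sd=lambda x1: 0.1 * x1)

for x1 in levels:
    print(x1, round(homo[x1], 3), round(hetero[x1], 3))
```

In the homoscedastic case the estimated variances stay near a single value across the levels; in the heteroscedastic case they increase with x1.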
3.4 Sample statistics and population statistics

In the example we have been using (see Section 3.1), the response from the system was measured nine times. The resulting data are a sample of the conceptually infinite number of results that might be obtained if the system were continuously measured. This conceptually infinite number of responses constitutes what is called the population of measured values from the system. Clearly, in most real systems, it is not possible to tabulate all of the response values associated with this population. Instead, only a finite (and usually rather limited) number of values can be taken from it [Fisher (1970), Fisher (1971)].

If it were possible to know the whole population of responses, then a mean and variance could be calculated for it. These descriptors are known as the population mean (μ) and the population variance (σ²), respectively. For a given population, there can be only one value of μ and one value of σ². The population mean and the population variance are usually unknown ideal descriptors of a complete population about which only partial information is available.

Now consider a small sample (n = 9, say) drawn from an infinite population. The responses in this sample can be used to calculate the sample mean, ȳ1, and the sample variance, s². It is highly improbable that the sample mean will equal exactly the population mean (ȳ1 = μ), or that the sample variance will equal exactly the population variance (s² = σ²). It is true that the sample mean will be approximately equal to the population mean, and that the sample variance will be approximately equal to the population variance. It is also true (as would be expected) that as the number of responses in the sample increases, the sample mean more closely approximates the population mean, and the sample variance more closely approximates the population variance. The sample mean, ȳ1, is said to be an estimate of the population mean, μ, and the sample variance, s², is said to be an estimate of the population variance, σ².
Figure 3.7 Example of heteroscedastic noise. (Horizontal axis: level of factor x1.)
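The way sample statistics approach population statistics as the sample grows can be sketched by simulation. In practice μ and σ² are unknown; here a gaussian population with assumed values (MU and SIGMA below, chosen to resemble the example of Section 3.1) is used purely for illustration, and the function name `sample_statistics` is hypothetical.

```python
import random
import statistics

MU = 32.53     # assumed population mean (illustrative value only)
SIGMA = 0.025  # assumed population standard deviation (illustrative value only)


def sample_statistics(n, rng):
    """Draw n responses from the assumed population; return (sample mean, sample variance)."""
    sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
    # statistics.variance divides by n-1, matching the definition of s² in Section 3.3.
    return statistics.mean(sample), statistics.variance(sample)


rng = random.Random(1)
for n in (9, 100, 10_000):
    y_bar, s2 = sample_statistics(n, rng)
    print(n, round(y_bar, 4), round(s2, 6))
```

With only nine responses the estimates can differ noticeably from MU and SIGMA squared; with many responses they typically settle close to the population values.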
3.5 Enumerative vs. analytic studies

W. E. Deming (1950, 1953, 1975b) was one of the first to point out the distinction between analytic studies and enumerative studies. As Hahn and Meeker (1991) have noted, “Despite its central role in making inferences from the sample data, many traditional textbooks in statistics have, by and large, been slow in giving this distinction the attention that it deserves”. This situation is changing [see, for example, Box, Hunter, and Hunter (1978), Hahn and Meeker (1991), Gitlow, Gitlow, Oppenheim, and Oppenheim (1989)].

According to W. E. Deming (1975b), an enumerative study is one in which “action will be taken on the material in the frame studied”, where a frame is defined as “an aggregate of identifiable units of some kind, any or all of which may be selected and investigated. The frame may be lists of people, areas, establishments, materials, or other identifiable units that would yield useful results if the whole content were investigated”. Thus, the frame defines a specific, well-defined population about which inferences can be made. The correctness of statistical inferences requires random sampling from this population.

In contrast, an analytic study is one in which “action will be taken on the process or cause-system ... the aim being to improve practice in the future .... Interest centers in future product, not in the materials studied” [W. Deming (1975b)]. There is no longer an existing frame that delineates a specific, well-defined population about which inferences can be made: much of the population of an analytic study lies in the nonexistent future.

Most statistical concepts, theory, and results are based on the assumption of enumerative studies (e.g., the confidence interval of the mean). This has important implications for the use of statistics in research, development, and manufacturing in which inferences about future behavior are frequent. For example, although we might be able to state the mean and standard deviation of the output of a chemical manufacturing process based on measurements of yield every hour for the past two weeks, extrapolation of these results to the future is probably more uncertain than traditional statistical measures would suggest. It is possible, for instance, that the source of raw materials might change, the plant operator might be replaced with a new hire, the process might be shut down for routine maintenance and then restarted, or a distillation tray might bump loose and block a flow path in a purification unit. Any one of these events is sufficient to cause a shift in the behavior of the manufacturing process which would (probably) result in a different mean yield and (possibly) a different standard deviation. In general, inferences about the future of processes that have been brought to a state of “statistical control” are more reliable than inferences about poorly behaved systems [W. Deming (1982), Wheeler (1983), Wheeler and Chambers (1986), Wheeler (1987)].
Exercises

3.1 Mean. Calculate the mean of the following set of responses: 12.61, 12.59, 12.64, 12.62, 12.60, 12.62, 12.65, 12.58, 12.62, 12.61. 3.2 Mean. Calculate the mean of the following set of responses: 12.83, 12.55, 12.80, 12.57, 12.84, 12.85, 12.58, 12.59, 12.84, 12.77.
3.3 Dispersion. Draw frequency histograms at intervals of 0.1 for the data in Problems 3.1 and 3.2. Calculate the variance and standard deviation for each of the two sets of data. 3.4 Normal distribution. Use Equation 3.8 to draw a gaussian curve over each of the histograms in Problem 3.3. Do the gaussian curves describe both sets of data adequately? 3.5 Degrees of freedom. A set of five numbers has a mean of 6; four of the numbers are 8, 5, 6, and 2. What is the value of the fifth number in the set?
3.6 Degrees of freedom. Given Equation 2.13, how many independent factors are there in Figure 2.15?
3.7 Degrees of freedom. Is it possible to have two positive integer numbers a and b such that their mean is five, their product is 24, and a/b = 2? Why or why not? Is it possible to have two positive integers c and d such that their mean is five, their product is 24, and c/d = 1.5? Why or why not? 3.8 Sample and population statistics. A box contains only five slips of paper on which are found the numbers 3, 4, 3, 2, and 5. What is the mean of this data set, and what symbol should be used to represent it? Another box contains 5,362 slips of paper. From this second box are drawn five slips of paper on which are found the numbers 10, 12, 11, 12, and 13. What is the mean of this sample, and what symbol should be used to represent it? What inference might be made about the mean of the numbers on all 5,362 slips of paper originally in the box? What inference might be made about the standard deviation of the numbers on all 5,362 slips of paper originally in the box?
3.9 Sample statistics. The five slips of paper drawn from the second box in Problem 3.8 are replaced, and five more numbers are drawn from the 5,362 slips of paper. The numbers are 5, -271, 84, 298, and 12. By itself, is this set of numbers surprising? In view of the five numbers previously drawn in Problem 3.8, is this present set of numbers surprising? In view of the present set of numbers, is the previous set surprising? If the five slips of paper drawn from the first box in Problem 3.8 are replaced, and five more numbers are drawn, would you be surprised if the numbers are 12, 10, 12, 13, and 11? Why or why not? 3.10 Heteroscedastic noise. Poorly controlled factors are an important source of variation in responses. Consider the response surface of Figure 2.7 and assume that the factor x1 can be controlled to ±0.5 across its domain. What regions of the factor domain of x1 will give large variations in response? What regions of x1 will give small variations in response? Sketch a plot of variation in y1 vs. x1.
3.11 Reproducibility. “Reproducibility is desirable, but it should not be forgotten that it can be achieved just as easily by insensitivity as by an increase in precision. Example: All men are two meters tall give or take a meter” [Youden (1961b)]. Comment. 3.12 Accuracy. “Accuracy” is related to the difference between a measured value (or the mean of a set of measured values) and the true value. It has been suggested that the term “accuracy” is impossible to define quantitatively, and that the term “inaccuracy” should be used instead. Comment. 3.13 Precision. “Precision” is related to the variation in a set of measured values. It has been suggested that the term “precision” is impossible to define quantitatively, and that the term “imprecision” should be used instead. Comment. 3.14 Accuracy. A student repeatedly measures the iron content in a mineral and finds that the mean of five measurements is 5.62%. The student repeats the measurements again in groups of five and obtains means of 5.61%, 5.62%, and 5.61%. On the basis of this information, what can be said about the student’s accuracy (or inaccuracy)? If the true value of iron in the mineral is 6.21%, what can be said about the student’s inaccuracy?
3.15 Bias. The term “bias” is often used to express consistent inaccuracy of measurement. What is the bias of the measurement process in Problem 3.14 (sign and approximate magnitude)? 3.16 Accuracy and precision. Draw four representations of a dart board. Show dart patterns that might result if the thrower were a) inaccurate and imprecise; b) inaccurate but precise; c) accurate but imprecise; and d) accurate and precise. 3.17 Factor tolerances. Consider the response surface shown in Figure 2.17. A set of five experiments is to be carried out at x1 = 85°C. If the actual temperatures were 84.2, 85.7, 83.2, 84.1, and 84.4, what would the corresponding responses be (assuming the exact relationship of Equation 2.15 and no other variations in the system)? What is the standard deviation of x1? What is the standard deviation of y1? What is the relationship between the standard deviation of x1, the standard deviation of y1, and the slope of the response surface in this case? In the general case? 3.18 Youden plots. The term “rugged” can be used to indicate insensitivity of a process to relatively small but common variations in factor levels. One method of evaluating the ruggedness of measurement methods is to use interlaboratory testing in which samples of the same material are sent to a number of laboratories. Presumably, different laboratories will use slightly different factor levels (because of differences in calibrating the equipment, the use of different equipment, different individuals carrying out the measurements, etc.) and the aggregate results reported to a central location will reveal the differences in response and thus the “ruggedness” of the method. W. J. Youden suggested that although the actual factor levels of a given laboratory would probably be slightly different from the factor levels specified by the method, the factor levels would nevertheless be relatively constant within that laboratory. Thus, if a laboratory reported a value that was higher than expected for one sample of material, it would probably report a value that was higher than expected for other samples of the same material; if a laboratory reported a value that was lower than expected for one sample, it would probably report other low sample values as well. Youden suggested sending to each of the laboratories participating in an interlaboratory study two samples of material having slightly different values of the property to be measured (e.g., sample #1 might contain 5.03% sodium; sample #2, 6.49% sodium). When all of the results are returned to the central location, the value obtained for sample #1 is plotted against the value of sample #2 for each laboratory. What would the resulting “Youden plots” look like if the method was rugged? If the method was not rugged? If the laboratories were precise, but each was biased? If the laboratories were imprecise, but accurate? [See Youden (1959).]
3.19 Sources of variation. Will variables that are true factors, but are not identified as factors, always produce random and erratic behavior in a system (see Table 1.1)? Under what conditions might they not? 3.20 Homoscedastic and heteroscedastic response surfaces. In view of Problem 3.17 and whatever insight you might have on the shapes of response surfaces in general, what fraction of response surfaces do you think might be homoscedastic? Why? Is it possible to consider any local region of the response surface shown in Figure 3.7 to be approximately homoscedastic? 3.21 Uncertainty. Note that in Figure 3.5 the mean ȳ1 of the superimposed data set (32.53) does not coincide with the response surface (32.54). Assuming the drawing is correct, give two possible reasons for this discrepancy. 3.22 Enumerative vs. analytic studies. In academic quantitative chemical analysis laboratory courses, a common task is to determine the percent iron in an ore sample. Is this an enumerative or an analytic study? Why? 3.23 Enumerative vs. analytic studies. In the mining industry, a common task is to determine the percent iron in an ore. Is this an enumerative or an analytic study? Why? 3.24 Statistical process control. Statistical process control charts (such as the “x-bar” and range charts) plot measurements as a function of time [Grant and Leavenworth (1988)]. With reference to the current day, what part of these charts approximates an enumerative study? What part of these charts approximates an analytic study? Are the parts different? Are the uses different?
CHAPTER 4
One Experiment
Experiments that will be used to estimate the behavior of a system should not be chosen in a whimsical or unplanned way, but rather should be carefully designed with a view toward achieving a valid approximation to a region of the true response surface [Cochran and Cox (1950), Youden (1951), Wilson (1952), Mandel (1964), Fisher (1971)]. In the next several chapters, many of the important concepts of the design and analysis of experiments are introduced at an elementary level for the single-factor, single-response case. In later chapters, these concepts will be generalized to multifactor, multiresponse systems.
Let us imagine that we are presented with a system for which we know only the identities of the single factor $x_1$ and the single response $y_1$ (see Figure 4.1). Let us further suppose that we can carry out one experiment on the system - that is, we can fix the factor $x_1$ at a specified level and measure the associated response. We consider, now, the possible interpretations of this single observation of response. From this one piece of information, what can be learned about the effect of the factor $x_1$ on the behavior of the system?
4.1 A deterministic model
One possible model that might be used to describe the system pictured in Figure 4.1 is
$$y_{11} = \beta_0 \tag{4.1}$$
where $y_{11}$ is the single measured response and $\beta_0$ is a parameter of the model. Figure 4.2 is a graphical representation. This model, an approximation to the system’s true response surface, assumes that the response is constant (always the same) and does not depend on the levels of any of the system’s inputs. A model of this form might be appropriate for the description of many fundamental properties, such as the half-life of a radionuclide or the speed of light in vacuum.
The single observation of response from the system can be used to calculate the value of the parameter $\beta_0$: that is, $\beta_0 = y_{11}$. For example, if $y_{11} = 5$, then $\beta_0 = y_{11} = 5$. This model is deterministic in the sense that it does not take into account the possibility of uncertainty in the observed response [Daniel and Wood (1971), Daniel (1976)].
Figure 4.1 Single-factor, single-response system for discussion of a single experiment.
Figure 4.2 Graph of the deterministic model $y_{11} = \beta_0$ (horizontal axis: level of factor $x_1$).
4.2 A probabilistic model
A probabilistic or statistical model that does provide for uncertainty associated with the system is illustrated in Figure 4.3. For this example, it is assumed that the underlying response is zero and that any value of response other than zero is caused by some random process. This model might appropriately describe the vertical velocity (speed and direction) of a single gas molecule in a closed system, or white noise in an electronic amplifier - in each case, the average value is expected to be zero, and deviations are assumed to be random. The model is
$$y_{11} = r_{11} \tag{4.2}$$
but to explicitly show the absence of a $\beta_0$ term, we will write the model as
$$y_{11} = 0 + r_{11} \tag{4.3}$$
where $r_{11}$ is the residual or deviation between the response actually observed ($y_{11}$) and the response predicted by the model (the predicted response is given the symbol $\hat{y}_{11}$ and is equal to zero in the present model). Note that this residual is not the difference between a measured response and the average response (see Section 3.3). If $y_{11} = 5$, then $r_{11} = y_{11} - \hat{y}_{11} = 5 - 0 = 5$.
Figure 4.3 Graph of the probabilistic model $y_{11} = 0 + r_{11}$ (horizontal axis: level of factor $x_1$).
The term $r_{11}$ is not a parameter of the model but is a single value sampled from the population of possible deviations [Natrella (1963)]. The magnitude of $r_{11}$ might be used to provide an estimate of a parameter associated with that population of residuals, the population variance of residuals, $\sigma_r^2$. The population standard deviation of residuals is $\sigma_r$. The estimates of these two parameters are designated $s_r^2$ and $s_r$, respectively [Neter, Wasserman, and Kutner (1990)]. If $DF_r$ is the number of degrees of freedom associated with the residuals, then
$$s_r^2 = \sum_{i=1}^{n} r_{1i}^2 / DF_r \tag{4.4}$$
For this model and for only one experiment, $n = 1$ and $DF_r = 1$ (we have not calculated a mean and thus have not taken away any degrees of freedom) so that $s_r^2 = r_{11}^2$ and $s_r = r_{11}$. In this example, $s_r = r_{11} = 5$.
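The calculation of the residual variance for the zero-mean model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `residual_variance` is hypothetical (it does not come from the text), and the response value 5 is the one used in the example above.

```python
import math

def residual_variance(residuals, degrees_of_freedom):
    """Return s_r^2 = (sum of squared residuals) / DF_r."""
    if degrees_of_freedom <= 0:
        raise ValueError("no degrees of freedom left to estimate the variance")
    return sum(r * r for r in residuals) / degrees_of_freedom

# Model y11 = 0 + r11: the predicted response is zero, so r11 = y11 - 0.
y11 = 5.0
r11 = y11 - 0.0

# One observation and no parameter estimated from the data, so DF_r = 1.
s_r2 = residual_variance([r11], degrees_of_freedom=1)
s_r = math.sqrt(s_r2)
print(s_r2, s_r)
```

With a single residual the estimate rests entirely on one sampled deviation, which is why it should be read as a very rough indication of the population variance.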
4.3 A proportional model
A third possible model is a deterministic one that suggests a proportional relationship between the response and the factor.
$$y_{11} = \beta_1 x_{11} \tag{4.5}$$
where $\beta_1$ is a parameter that expresses the effect of the factor $x_1$ on the response $y_1$. This approximation to the response surface is shown graphically in Figure 4.4. In spectrophotometry, Beer’s Law ($A = abc$) is a model of this form where $\beta_1$ represents the product of absorptivity $a$ and path length $b$, and $x_{11}$ corresponds to the concentration $c$ (in grams per liter) of some chemical species that gives rise to the observed absorbance $A$. For a single observation of response at a given level of the factor $x_1$, the parameter $\beta_1$ can be calculated: $\beta_1 = y_{11}/x_{11} = 5/6$ for the example shown in Figure 4.4.
Figure 4.4 Graph of the proportional deterministic model $y_{11} = \beta_1 x_{11}$ (horizontal axis: level of factor $x_1$).
When using models of this type with real systems, caution should be exercised in assuming a causal relationship between the response $y_1$ and the factor $x_1$ [Youden (1951), Wilson (1952), Huff (1954), Campbell (1974), Moore (1979)]: a single experiment is not sufficient to identify which of any number of controlled or uncontrolled factors might be responsible for the measured response (see Section 1.2 on masquerading factors). In the case of Beer’s Law, for example, the observed absorbance might be caused not by the supposed chemical species but rather by some unknown impurity in the solvent.
4.4 Multiparameter models
A more general statistical (probabilistic) model of the system takes into account both an offset ($\beta_0$) and uncertainty ($r_{11}$).
$$y_{11} = \beta_0 + r_{11} \tag{4.6}$$
This model offers greater flexibility for obtaining an approximation to the true response surface. However, with only a single observation of response, it is not possible to assign unique values to both $\beta_0$ and $r_{11}$. Figure 4.5 illustrates this difficulty: it is possible to partition the response $y_{11}$ into an infinite number of combinations of $\beta_0$ and $r_{11}$ such that each combination adds up to the observed value of response. Examples in this case are $\beta_0 = 3.2$ and $r_{11} = 1.8$, and $\beta_0 = 1.9$ and $r_{11} = 3.1$.
Figure 4.5 Graphs of two fitted probabilistic models of the form $y_{11} = \beta_0 + r_{11}$.
The reason there are an infinite number of combinations of $\beta_0$ and $r_{11}$ that satisfy this model is that there are more degrees of freedom in the set of items to be estimated (two: $\beta_0$ and $r_{11}$) than there are degrees of freedom in the data set (one). It is as if someone asked you for two integer values and told you that their sum must equal a certain value (see Equation 4.6 and Section 3.2): there still remains one degree of freedom that could be utilized to choose an infinite number of pairs of integers that would satisfy the single constraint.
If we could reduce by one the number of degrees of freedom in the set of items to be estimated (the degrees of freedom would then be equal to one), then the single value of response would be sufficient to estimate $\beta_0$ and $r_{11}$. To accomplish this, let us impose the constraint that the absolute value of the estimated residual be as small as possible. For this particular example, the residual is smallest when it is equal to zero, and the estimated value of $\beta_0$ would then be 5.
As shown in Figures 4.6-4.8, this difficulty also exists for other multiparameter models containing more degrees of freedom in the set of items to be estimated than there are degrees of freedom in the data set. The other models are
$$y_{11} = \beta_1 x_{11} + r_{11} \tag{4.7}$$
$$y_{11} = \beta_0 + \beta_1 x_{11} \tag{4.8}$$
and
$$y_{11} = \beta_0 + \beta_1 x_{11} + r_{11} \tag{4.9}$$
respectively.
Figure 4.6 Graphs of two fitted probabilistic models of the form $y_{11} = \beta_1 x_{11} + r_{11}$.
Figure 4.7 Graphs of two fitted deterministic models of the form $y_{11} = \beta_0 + \beta_1 x_{11}$.
Figure 4.8 Graphs of two fitted probabilistic models of the form $y_{11} = \beta_0 + \beta_1 x_{11} + r_{11}$.
The probabilistic model represented by Equation 4.7 can be fit to the single experimental result by imposing the previous constraint that the residual be as small as possible; the models represented by Equations 4.8 and 4.9 cannot.
Each of the models discussed in this chapter can be made to fit the data perfectly. It is thus evident that a single observation of response does not provide sufficient information to decide which model represents the best approximation to the true response surface [Shewhart (1939), Himmelblau (1970)]. In the absence of additional knowledge about the system, it is not possible to know if the single observed response should be attributed to a constant effect ($y_{11} = \beta_0$), to uncertainty ($y_{11} = 0 + r_{11}$), to a proportional effect ($y_{11} = \beta_1 x_{11}$), to some combination of these, or perhaps even to some other factor. Because it is not possible to choose among several “explanations” of the observed response, a single experiment, by itself, provides no information about the effect of the factor $x_1$ on the behavior of the system.
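The ambiguity described in this section can be checked numerically. The following Python sketch is an illustration only: the observation $x_{11} = 6$, $y_{11} = 5$ is inferred from the value $\beta_1 = 5/6$ quoted for Figure 4.4, and the names `fitted_models` and `reproduce` are hypothetical. Each entry stores one candidate set of values (offset, slope, residual) for the general form $y_{11} = \beta_0 + \beta_1 x_{11} + r_{11}$, with unused terms set to zero.

```python
from fractions import Fraction

# The single experiment (assumed from Figure 4.4: beta1 = y11 / x11 = 5/6).
x11, y11 = Fraction(6), Fraction(5)

# Candidate "explanations": (beta0, beta1, r11) for y11 = beta0 + beta1*x11 + r11.
fitted_models = {
    "constant effect, y11 = b0":        (Fraction(5), Fraction(0), Fraction(0)),
    "uncertainty only, y11 = 0 + r11":  (Fraction(0), Fraction(0), Fraction(5)),
    "proportional, y11 = b1*x11":       (Fraction(0), Fraction(5, 6), Fraction(0)),
    "offset plus residual (first)":     (Fraction("3.2"), Fraction(0), Fraction("1.8")),
    "offset plus residual (second)":    (Fraction("1.9"), Fraction(0), Fraction("3.1")),
}

def reproduce(beta0, beta1, r11):
    """Return the response implied by one set of fitted values."""
    return beta0 + beta1 * x11 + r11

for name, values in fitted_models.items():
    # Every candidate reproduces the observed response exactly.
    print(name, reproduce(*values) == y11)
```

Because every candidate returns the observed response, the single data point cannot be used to prefer one explanation over another.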
Exercises
4.1 Inevitable events. Suppose a researcher develops a drug that is intended to cure the common cold. He gives the drug to a volunteer who has just contracted a viral cold. One week later the volunteer no longer has a cold, and the researcher announces to the press that his drug is a success. Comment. Suggest a better experimental design.
4.2 Coincidental events. It has been reported that one night a woman in New Jersey plugged her iron into an electrical outlet and then looked out the window to see all of the lights in New York City go out. She concluded it was her action that caused the blackout, and she called the power company to apologize for overloading the circuits. Comment.
4.3 Confounding. A researcher drank one liter of a beverage containing about one-third gin, two-thirds tonic water, and a few milliliters of lime juice. When asked if he would like a refill, he replied yes, but he requested that his host leave out the lime juice because it was making his speech slur and his eyes lose focus. Comment. Suggest an experimental design that might provide the researcher with better information.
4.4 Mill’s methods. In 1843, John Stuart Mill first published his System of Logic, a book that went through eight editions, each revised by Mill to “attempt to improve the work by additions and corrections, suggested by criticism or by thought....” Although much of Mill’s logic and philosophy has decreased in prestige over the years, his influence on methods in science is today still very much felt, primarily through the canons of experimental methods he set down in his Logic. The canons themselves were not initially formulated by Mill, but were given his name because of the popularization he gave to them. Mill’s First Canon states: “If two or more instances of the phenomenon under investigation have only one circumstance in common, the circumstance in which alone all the instances agree is the cause (or effect) of the given phenomenon”. Why is it necessary that there be “two or more” instances? Is this canon infallible? Why or why not?
4.5 Antecedent and consequent. “Smoking causes cancer”. Some might argue that “cancer causes smoking”. Comment. Is it always clear which is cause and which is effect?
4.6 Terminology. Define or describe the following: parameter, deterministic, probabilistic, statistical, residual, proportional, cause, effect.
4.7 Deterministic models. Write two models that are deterministic (e.g., F = ma). Are there any limiting assumptions behind these deterministic models? Are there any situations in which measured values would not be predicted exactly by the deterministic models?
4.8 Sources of variation. Suppose you have chosen the model $y_{1i} = \beta_0 + r_{1i}$ to describe measured values from a system. What is the source of variation in the data ($r_{1i}$)? Is it the system? Is it the measurement process? Is it both? Draw a general system theory diagram showing the relationship between a system of interest and an auxiliary system used to measure the system of interest (see Figures 2.14 and 3.1).
4.9 Single-point calibrations. Many measurement techniques are based on physical or chemical relationships of the form c = km, where m is a measured value, c is the property of the material being evaluated, and k is a proportionality constant. As an example, the weight of an object is often measured by suspending it from a spring; the length of extension of the spring, m, is proportional, 1/k, to the mass of the object, c. Such measurement techniques are often calibrated by the single-point method: i.e., measuring m for a known value of c and calculating k = c/m. Comment on the assumptions underlying this method of calibration. Are the assumptions always valid? Suggest a method of calibration that would allow the testing of these assumptions.
4.10 Confounding. A student wishes to measure the absorptivity (a) of compound X in water at a concentration (c) of 2 milligrams per liter in a one centimeter pathlength (b) glass cuvet at a wavelength of 150 nanometers. The measured absorbance A is found to be 4.0. What value would the student obtain for the absorptivity a? The true value is 0. Why might the student have obtained the wrong value? How could she have designed a set of experiments to avoid this mistake?
4.11 Measurement systems. An individual submitted a sample of material to a laboratory for the quantitative determination of a constituent of the material. The laboratory reported the value, 10.238%. When asked if this was a single datum or the mean of several data, the laboratory representative replied, “Oh, that’s just a single measurement. We have a lot of variability with this method, so we never repeat the measurements”. Comment.
4.12 Mean of zero. Name four systems for which the model $y_{1i} = 0 + r_{1i}$ is appropriate (see Section 4.2).
4.13 Choice of model. Equation 4.4 is a general equation for calculating the variance of residuals. Equation 3.4 is a specific equation for calculating the variance of residuals. What is the model that gives rise to Equation 3.4?
4.14 Dimensions, constraints, and degrees of freedom. How many straight lines can pass through a single point? Through two non-coincident points? Through three non-collinear points? How many flat planes can pass through a single point? Through two non-coincident points? Through three non-collinear points? Through four non-coplanar points?
4.15 Models. Graph the following models over the factor domain $-10 \le x_1 \le 10$:
a) $y_{1i} = 5.0$
b) $y_{1i} = 0.5 x_{1i}$
c) $y_{1i} = 5.0 + 0.5 x_{1i}$
d) $y_{1i} = 0.05 x_{1i}^2$
e) $y_{1i} = 5.0 - 0.05 x_{1i}^2$
f) $y_{1i} = 0.5 x_{1i} - 0.05 x_{1i}^2$
g) $y_{1i} = 0.5 x_{1i} + 0.05 x_{1i}^2$
h) $y_{1i} = 5.0 + 0.5 x_{1i} - 0.05 x_{1i}^2$
4.16 Calibration. Many electrochemical systems exhibit behavior that can be described by a relationship of the form $E = E^\circ - k \log[X]$, where $E$ is an observed voltage (measured with respect to a known reference electrode), $E^\circ$ is the voltage that would be observed under certain standard conditions, $k$ is a proportionality constant, and $[X]$ represents the concentration of an electroactive species. Rewrite this model in terms of $y$, $\beta$’s, and $x = \log[X]$. The “glass electrode” used with so-called “pH meters” to determine pH ($= -\log[\mathrm{H}^+]$) responds to $\log[\mathrm{H}^+]$ according to the above relationship. Comment on the practice of calibrating pH meters at only one $[\mathrm{H}^+]$. Why is “two-point” calibration preferred?
4.17 Hypotheses. A hungry student who had never cooked before followed the directions in a cookbook and baked a cake that was excellent. Based on this one experiment, he inwardly concluded that he was a good cook. Two days later he followed the same recipe, but the cake was a disaster. What might have gone wrong? How confident should he have been (after baking only one cake) that he was a good cook? How many cakes did it take to disprove this hypothesis? What are some other hypotheses he could have considered after his initial success?
4.18 Disproving hypotheses. Sometimes a single experiment can be designed to prove that a factor does not have an effect on a response. Suppose a stranger comes up to you and states that he keeps the tigers away by snapping his fingers. When you tell him it is a ridiculous hypothesis, he becomes defensive and tells you the reason there aren’t any tigers around is because he has been snapping his fingers. Design a single experiment (involving the stranger, not you) to attempt to disprove the hypothesis that snapping one’s fingers will keep tigers away.
4.19 Consequent and antecedent. “Thiotimoline” is a substance that has been (fictitiously) reported to dissolve a short time before being added to water [Asimov (1948)]. What inevitable effects could be caused by such a substance?
CHAPTER 5
Two Experiments
We consider now the possible interpretations of the results of two experiments on a system for which a single factor, $x_1$, is to be investigated. What can be learned about the effect of the factor $x_1$ on the behavior of the system from these two pieces of information?
For the moment, we will investigate the experimental design in which each experiment is carried out at a different level of the single factor. Later, in Section 5.6, we will consider the case in which both experiments are performed at the same level.
Before we begin, it is important to point out a common misconception that involves the definition of a linear model [Deming and Morgan (1979)]. Many individuals understand the term “linear model” to mean (and to be limited to) straight line relationships of the form
$$y_{1i} = \beta_0 + \beta_1 x_{1i} \tag{5.1}$$
where $y_1$ is generally considered to be the dependent variable, $x_1$ is the independent variable, and $\beta_0$ and $\beta_1$ are the parameters of the model (intercept and slope, respectively). Although it is true that Equation 5.1 is a linear model, the reason is not that its graph is a straight line. Instead, it is a linear model because it is constructed of additive terms, each of which contains one and only one multiplicative parameter. That is, the model is first-order or linear in the parameters. This definition of “linear model” includes models that are not first-order in the independent variables. The model of Equation 5.2 is a linear model by the above definition. The model
$$y_{1i} = \beta_1 \exp(-\beta_2 x_{1i}) \tag{5.3}$$
however, is a nonlinear model because it contains more than one parameter in a single term and because the parameter $\beta_2$ appears in an exponential term. For some nonlinear models it is possible to make transformations on the dependent and independent variables to “linearize” the model. Taking the natural logarithm of both sides of Equation 5.3, for example, gives a model that is linear in the parameters $\beta_1'$ and $\beta_2$: $\log_e(y_{1i}) = \beta_1' - \beta_2 x_{1i}$, where $\beta_1' = \log_e(\beta_1)$. This book is limited to models that are linear in the parameters.
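A short numerical sketch may help show what the logarithmic transformation accomplishes. It assumes the exponential model $y = \beta_1 \exp(-\beta_2 x)$ implied by the transformed form $\log_e(y) = \beta_1' - \beta_2 x$; the parameter values 4.0 and 0.5 and the factor levels 1 and 3 are invented purely for illustration, and all variable names are hypothetical.

```python
import math

# Hypothetical "true" parameters of the nonlinear model y = beta1 * exp(-beta2 * x).
beta1_true, beta2_true = 4.0, 0.5

# Two noise-free experiments at different (made-up) factor levels.
xs = [1.0, 3.0]
ys = [beta1_true * math.exp(-beta2_true * x) for x in xs]

# Transform the response: log(y) = beta1' - beta2 * x is linear in beta1' and beta2.
zs = [math.log(y) for y in ys]

# Straight line through the two transformed points.
slope = (zs[1] - zs[0]) / (xs[1] - xs[0])
intercept = zs[0] - slope * xs[0]

beta2_est = -slope                 # the slope of the transformed model is -beta2
beta1_est = math.exp(intercept)    # because beta1' = log(beta1)
print(beta1_est, beta2_est)
```

The recovered values agree with the assumed parameters to within floating-point rounding. With noisy data the transformation would also change how the uncertainty is distributed, a point this sketch does not address.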
5.1 Matrix solution for simultaneous linear equations
In the previous chapter, it was seen that a single observation of response from a system does not provide sufficient information to fit a multiparameter model. A geometric interpretation of this limitation for the linear model $y_{1i} = \beta_0 + \beta_1 x_{1i}$ (Equation 5.1) is that a straight line drawn through a single point is not fixed, but is free to rotate about that point (see Figure 4.7). The limitation is overcome when two experiments are carried out at different levels ($x_{11}$ and $x_{12}$) of the factor $x_1$ as shown in Figure 5.1. The best approximation to the true response surface is then uniquely defined by the solutions ($\beta_0$ and $\beta_1$) to the set of simultaneous linear equations that can be obtained by writing the linear model (Equation 5.1) for each of the two experiments.
$$y_{11} = \beta_0 + \beta_1 x_{11}$$
$$y_{12} = \beta_0 + \beta_1 x_{12} \tag{5.4}$$
where $y_{11}$ is the response obtained when factor $x_1$ is set at the level $x_{11}$, and $y_{12}$ is the response at $x_{12}$. Again, the parameter $\beta_0$ is interpreted as the intercept on the response axis (at $x_1 = 0$); the parameter $\beta_1$ is interpreted as the slope of the response surface with respect to the factor $x_1$ - i.e., it expresses the first-order (straight line) effect of the factor $x_1$ on the response $y_1$. Equation 5.4 may be rewritten as
$$1 \times \beta_0 + x_{11} \times \beta_1 = y_{11}$$
$$1 \times \beta_0 + x_{12} \times \beta_1 = y_{12} \tag{5.5}$$
Expressed in matrix form (see Appendix A), this becomes
$$\begin{bmatrix} 1 \times \beta_0 + x_{11} \times \beta_1 \\ 1 \times \beta_0 + x_{12} \times \beta_1 \end{bmatrix} = \begin{bmatrix} y_{11} \\ y_{12} \end{bmatrix} \tag{5.6}$$
Recall that in matrix multiplication, an element in the ith row (a row is a horizontal arrangement of numbers) and jth column (a column is a vertical arrangement of numbers) of the product matrix is obtained by multiplying the ith row of one matrix by the jth column of a second matrix, element by corresponding element, and summing the products. Thus, Equation 5.6 may be rewritten as
$$\begin{bmatrix} 1 & x_{11} \\ 1 & x_{12} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} = \begin{bmatrix} y_{11} \\ y_{12} \end{bmatrix} \tag{5.7}$$
Figure 5.1 Graph of the deterministic model $y_{1i} = \beta_0 + \beta_1 x_{1i}$ fitted to the results of two experiments at different levels of the factor $x_1$.
Note that the leftmost matrix contains the coefficients of the parameters as they appear in the model from left to right; each row contains information about a different experiment, and each column contains information about a different parameter. The next matrix contains the parameters, from top to bottom as they appear in the model from left to right. The matrix on the right of Equation 5.7 contains the corresponding responses; each row represents a different experiment.
Let the leftmost 2-row by 2-column (2x2) matrix be designated the matrix of parameter coefficients, X; let the 2-row by 1-column (2x1) matrix of $\beta$’s be designated the matrix of parameters, B; and let the rightmost 2x1 matrix be designated the matrix of measured responses, Y. Equation 5.5 may now be expressed in concise matrix notation as
$$XB = Y \tag{5.8}$$
Equation 5.8 is the matrix form of all deterministic linear models. The matrix solution for the parameters of the simultaneous linear equations is stated here without proof:
$$B = X^{-1}Y \tag{5.9}$$
where $X^{-1}$ is the inverse (see Appendix A) of the matrix of parameter coefficients. If the elements of the 2x2 X matrix are designated a, b, c, and d, then
$$X = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \tag{5.10}$$
and
$$X^{-1} = \begin{bmatrix} d/D & -b/D \\ -c/D & a/D \end{bmatrix} \tag{5.11}$$
where
$$D = a \times d - c \times b \tag{5.12}$$
is the determinant of the 2x2 X matrix. Thus, the notation of Equation 5.9 is equivalent to
$$\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} = \begin{bmatrix} d/D & -b/D \\ -c/D & a/D \end{bmatrix} \begin{bmatrix} y_{11} \\ y_{12} \end{bmatrix} \tag{5.13}$$
and
$$\beta_0 = (d \times y_{11} - b \times y_{12})/D \tag{5.14}$$
$$\beta_1 = (a \times y_{12} - c \times y_{11})/D \tag{5.15}$$
The matrix approach to the solution of a set of simultaneous linear equations is entirely general. Requirements for a solution are that there be a number of equations exactly equal to the number of parameters to be calculated and that the determinant D of the X matrix be nonzero. This latter requirement can be seen from Equations 5.14 and 5.15. Elements a and c of the X matrix associated with the present model are both equal to unity (see Equations 5.10 and 5.7); thus, with this model, the condition for a nonzero determinant (see Equation 5.12) is that element b ($x_{11}$) not equal element d ($x_{12}$). When the experimental design consists of two experiments carried out at different levels of the factor $x_1$ ($x_{11} \ne x_{12}$; see Figure 5.1), the condition is satisfied.
To illustrate the matrix approach to the solution of a set of simultaneous linear equations, let us use the data points in Figure 5.1: $x_{11} = 3$, $y_{11} = 3$, $x_{12} = 6$, and $y_{12} = 5$. Equation 5.5 becomes
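$$1 \times \beta_0 + 3 \times \beta_1 = 3$$
$$1 \times \beta_0 + 6 \times \beta_1 = 5 \tag{5.16}$$
In matrix form, the equation XB = Y is
$$\begin{bmatrix} 1 & 3 \\ 1 & 6 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} = \begin{bmatrix} 3 \\ 5 \end{bmatrix} \tag{5.17}$$
The determinant of the matrix of parameter coefficients is D = (1x6 - 1x3) = 3. Inverting the X array gives
$$X^{-1} = \begin{bmatrix} 6/3 & -3/3 \\ -1/3 & 1/3 \end{bmatrix} = \begin{bmatrix} 2 & -1 \\ -1/3 & 1/3 \end{bmatrix} \tag{5.18}$$
and the solution for $B = X^{-1}Y$ is
$$B = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} = \begin{bmatrix} 2 & -1 \\ -1/3 & 1/3 \end{bmatrix} \begin{bmatrix} 3 \\ 5 \end{bmatrix} = \begin{bmatrix} 2 \times 3 - 1 \times 5 \\ -3/3 + 5/3 \end{bmatrix} = \begin{bmatrix} 1 \\ 2/3 \end{bmatrix} \tag{5.19}$$
Thus, the intercept ($\beta_0$) is 1 and the slope with respect to the factor $x_1$ ($\beta_1$) is 2/3. Substitution of these values into Equation 5.16 serves as a check.
$$1 \times 1 + 3 \times (2/3) = 3$$
$$1 \times 1 + 6 \times (2/3) = 5 \tag{5.20}$$
This particular experimental design involving two experiments at two different levels of a single factor has allowed the exact fitting of the model $y_{1i} = \beta_0 + \beta_1 x_{1i}$. Note that both of the $\beta$’s are parameters of the model and use up the total degrees of freedom (DF = 2). It is not possible to estimate any uncertainty due to random processes that might be taking place; there are no degrees of freedom available for calculating $s_r^2$ (see Equation 4.4).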
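The 2x2 matrix solution worked through above can be reproduced with exact rational arithmetic. The sketch below is an illustration in Python rather than anything prescribed by the text; the function name `solve_2x2` is hypothetical, and the element names a, b, c, d, and D follow the designations used for the X matrix and its determinant.

```python
from fractions import Fraction

def solve_2x2(X, Y):
    """Solve X B = Y for a 2x2 matrix X using B = X^-1 Y."""
    (a, b), (c, d) = X
    D = a * d - c * b  # determinant of X
    if D == 0:
        raise ValueError("determinant is zero: the two experiments share one factor level")
    X_inv = [[d / D, -b / D],
             [-c / D, a / D]]
    return [X_inv[0][0] * Y[0] + X_inv[0][1] * Y[1],
            X_inv[1][0] * Y[0] + X_inv[1][1] * Y[1]]

# Data points of Figure 5.1: (x11, y11) = (3, 3) and (x12, y12) = (6, 5).
X = [[Fraction(1), Fraction(3)],
     [Fraction(1), Fraction(6)]]
Y = [Fraction(3), Fraction(5)]

beta0, beta1 = solve_2x2(X, Y)
print(beta0, beta1)  # 1 2/3
```

Passing two rows with the same factor level makes the determinant zero, and the function raises an error instead of returning a slope, which mirrors the requirement that the two experiments be carried out at different levels.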
5.2 Matrix least squares
Consider now the probabilistic model illustrated in Figure 5.2 and expressed as
$$y_{1i} = \beta_0 + r_{1i} \tag{5.21}$$
Note that the response is not a function of any factors. For this model, an estimate of $\beta_0$ (the estimate is given the symbol $b_0$) is the mean of the two responses, $y_{11}$ and $y_{12}$.
$$b_0 = (y_{11} + y_{12})/2 = (3 + 5)/2 = 4 \tag{5.22}$$
Of the total two degrees of freedom, one degree of freedom has been used to estimate the parameter $\beta_0$, leaving one degree of freedom for the estimation of the variance of the residuals, $\sigma_r^2$.
$$s_r^2 = \sum r_{1i}^2 / DF_r = [(3 - 4)^2 + (5 - 4)^2]/1 = 2 \tag{5.23}$$
and
$$s_r = 1.41 \tag{5.24}$$
Figure 5.2 Graph of the fitted probabilistic model $y_{1i} = \beta_0 + r_{1i}$ (horizontal axis: level of factor $x_1$).
Suppose these solutions had been attempted using simultaneous linear equations. One reasonable set of linear equations might appear to be
$$1 \times \beta_0 + r_{11} = y_{11}$$
$$1 \times \beta_0 + r_{12} = y_{12} \tag{5.25}$$
There is a problem with this approach, however - a problem with the residuals. The residuals are neither parameters of the model nor parameters associated with the uncertainty. They are quantities related to a parameter that expresses the variance of the residuals, $\sigma_r^2$. The problem, then, is that the simultaneous equations approach in Equation 5.25 would attempt to uniquely calculate three items ($\beta_0$, $r_{11}$, and $r_{12}$) using only two experiments, clearly an impossible task. What is needed is an additional constraint to reduce the number of items that need to be estimated. A unique solution will then exist. We will use the constraint that the sum of squares of the residuals be minimal.
The following is a brief development of the matrix approach to the least squares fitting of linear models to data. The approach is entirely general for all linear models. Again, let X be the matrix of parameter coefficients defined by the model to be fit and the coordinates of the experiments in factor space. Let Y be the response matrix associated with those experiments. Let B be the matrix of parameters, and let a new matrix R be the matrix of residuals. Equation 5.25 may now be rewritten in matrix notation as
$$XB + R = Y \tag{5.26}$$
Equation 5.26 is the matrix form of all probabilistic linear models. For the model of Equation 5.21 this can be written
$$\begin{bmatrix} 1 \\ 1 \end{bmatrix} \begin{bmatrix} \beta_0 \end{bmatrix} + \begin{bmatrix} r_{11} \\ r_{12} \end{bmatrix} = \begin{bmatrix} y_{11} \\ y_{12} \end{bmatrix} \tag{5.27}$$
(Recall that the coefficient of the single parameter $\beta_0$ is equal to one in each case; therefore, only 1’s appear in the X matrix.) Equation 5.26 can be rearranged to give
$$R = Y - XB \tag{5.28}$$
It is now useful to note that the R matrix premultiplied by its transpose
$$R' = \begin{bmatrix} r_{11} & r_{12} \end{bmatrix} \tag{5.29}$$
gives the sum of squares of residuals, abbreviated $SS_r$.
$$SS_r = R'R = \begin{bmatrix} r_{11} & r_{12} \end{bmatrix} \begin{bmatrix} r_{11} \\ r_{12} \end{bmatrix} = r_{11}^2 + r_{12}^2 \tag{5.30}$$
Although a complete proof of the following is beyond the scope of this presentation, it can be shown that partial differentiation of the sum of squares of residuals with respect to the B matrix gives, in a simple matrix expression, the partial derivatives of the sum of squares of residuals with respect to all of the $\beta$’s.
$$R'R = (Y - XB)'(Y - XB) \tag{5.31}$$
$$\partial(R'R)/\partial B = \partial[(Y - XB)'(Y - XB)]/\partial B = -2X'(Y - XB) = -2(X'Y - X'XB) \tag{5.32}$$
If this matrix of partial derivatives is set equal to zero (at which point the sum of squares of residuals with respect to each $\beta$ will be minimal) and the constant factor $-2$ is divided out, the matrix equation
$$0 = X'Y - X'X\hat{B} \tag{5.33}$$
is obtained, where $\hat{B}$ is the matrix of parameter estimates giving this minimum sum of squares. Rearranging gives
$$X'X\hat{B} = X'Y \tag{5.34}$$
The $(X'X)$ array can be eliminated from the left side of the matrix equation if both sides are multiplied by its inverse, $(X'X)^{-1}$.
$$(X'X)^{-1}(X'X)\hat{B} = (X'X)^{-1}(X'Y) \tag{5.35}$$
$$\hat{B} = (X'X)^{-1}(X'Y) \tag{5.36}$$
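The least squares solution $\hat{B} = (X'X)^{-1}(X'Y)$ can be sketched for an arbitrary linear model using only the Python standard library. This is an illustrative sketch under stated assumptions, not a definitive implementation: the function names `transpose`, `matmul`, `solve`, and `least_squares` are hypothetical, and rather than forming $(X'X)^{-1}$ explicitly the code solves the equivalent system $(X'X)\hat{B} = X'Y$ by Gauss-Jordan elimination, which gives the same estimates.

```python
from fractions import Fraction

def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Multiply two matrices: row of A times column of B, summed."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, C):
    """Solve A Z = C for square A by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(A[i]) + list(C[i]) for i in range(n)]  # augmented matrix [A | C]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("X'X is singular: the parameters are not uniquely determined")
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]

def least_squares(X, Y):
    """Return the parameter estimates that satisfy (X'X) B = X'Y."""
    X = [[Fraction(v) for v in row] for row in X]
    Y = [[Fraction(v) for v in row] for row in Y]
    Xt = transpose(X)
    return solve(matmul(Xt, X), matmul(Xt, Y))

# Model y = b0 + r with responses 3 and 5: X holds only 1's.
print(least_squares([[1], [1]], [[3], [5]]))
```

For the two responses 3 and 5 the single estimate returned is 4, the mean of the responses. The same function accepts a two-column X for the straight line model.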
This is the general matrix solution for the set of parameter estimates that gives the minimum sum of squares of residuals. Again, the solution is valid for all models that are linear in the parameters.
Let us use the matrix least squares method to obtain an algebraic expression for the estimate of $\beta_0$ in the model $y_{1i} = \beta_0 + r_{1i}$ (see Figure 5.2) with two experiments at two different levels of the factor $x_1$. The initial X, B, R, and Y arrays are given in Equation 5.27. Other matrices are
$$X' = \begin{bmatrix} 1 & 1 \end{bmatrix} \tag{5.37}$$
$$(X'X) = \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = [1 \times 1 + 1 \times 1] = [2] \tag{5.38}$$
The inverse of a 1x1 matrix is the reciprocal of the single value. Thus,
$$(X'X)^{-1} = [1/2] \tag{5.39}$$
$$(X'Y) = \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} y_{11} \\ y_{12} \end{bmatrix} = [1 \times y_{11} + 1 \times y_{12}] = [y_{11} + y_{12}] \tag{5.40}$$
The least squares estimate of the single parameter of the model is
$$\hat{B} = [b_0] = (X'X)^{-1}(X'Y) = [1/2][y_{11} + y_{12}] = [(y_{11} + y_{12})/2] \tag{5.41}$$
and shows that for this model the mean response is the best least squares estimate of $\beta_0$, the estimate for which the sum of squares of residuals is minimal. The data points in the present example are $x_{11} = 3$, $y_{11} = 3$, $x_{12} = 6$, and $y_{12} = 5$, and the least squares estimate of $\beta_0$ is
$$b_0 = (3 + 5)/2 = 4 \tag{5.42}$$
Figure 5.3 Squares of the individual residuals and the sum of squares of residuals, as functions of different values of $b_0$.
Figure 5.3 plots the squares of the individual residuals ($r_{11}^2$ and $r_{12}^2$) and the sum of squares of residuals ($SS_r$) for this data set as a function of different values of $b_0$, demonstrating that $b_0 = 4$ is the estimate of $\beta_0$ that does provide the best fit in the least squares sense.
Equations 5.28 and 5.30 provide a general matrix approach to the calculation of the sum of squares of residuals. This sum of squares, $SS_r$, divided by its associated number of degrees of freedom, $DF_r$, is the sample estimate, $s_r^2$, of the population variance of residuals, $\sigma_r^2$.
$$s_r^2 = SS_r/DF_r = (R'R)/DF_r = [(Y - X\hat{B})'(Y - X\hat{B})]/DF_r \tag{5.43}$$
For two experiments and the model $y_{1i} = \beta_0 + r_{1i}$, the degrees of freedom of residuals is one, and the estimate of the variance of residuals is calculated as follows:
$$R = Y - X\hat{B} = \begin{bmatrix} 3 \\ 5 \end{bmatrix} - \begin{bmatrix} 1 \\ 1 \end{bmatrix} [4] = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \tag{5.44}$$
$$R'R = \begin{bmatrix} -1 & 1 \end{bmatrix} \begin{bmatrix} -1 \\ 1 \end{bmatrix} = [-1 \times (-1) + 1 \times 1] = 2 \tag{5.45}$$
$$s_r^2 = SS_r/DF_r = 2/1 = 2 \tag{5.46}$$
$$s_r = 1.41 \tag{5.47}$$
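The behavior plotted in Figure 5.3 can be imitated by scanning trial values of $b_0$ and computing the sum of squares of residuals for each. The Python sketch below is illustrative only; the scan range (0 to 8) and step (1/4) are arbitrary choices, and the names `sum_of_squares` and `trial_values` are hypothetical.

```python
from fractions import Fraction
import math

responses = [Fraction(3), Fraction(5)]  # y11 and y12

def sum_of_squares(b0):
    """Sum of squared residuals for the model y = b0 + r at a trial value of b0."""
    return sum((y - b0) ** 2 for y in responses)

# Trial values of b0 from 0 to 8 in steps of 1/4.
trial_values = [Fraction(k, 4) for k in range(0, 33)]
best_b0 = min(trial_values, key=sum_of_squares)

ss_r = sum_of_squares(best_b0)
df_r = len(responses) - 1      # one degree of freedom used to estimate beta0
s_r2 = ss_r / df_r
s_r = math.sqrt(s_r2)
print(best_b0, ss_r, s_r2, round(s_r, 2))
```

The scan finds its minimum at the mean of the two responses, in agreement with the algebraic result; a grid search is used here only to mirror the figure, since the matrix solution gives the estimate directly.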
5.3 The straight line model constrained to pass through the origin
Another statistical model that might be fit to two experiments at two different levels is
$$y_{1i} = \beta_1 x_{1i} + r_{1i} \tag{5.48}$$
and is illustrated in Figure 5.4. The model is a probabilistic proportional model that is constrained to pass through the origin; there is no $\beta_0$ term, so only a zero intercept is allowed. In this model, the response is assumed to be directly proportional to the level of the factor $x_1$; any deviations are assumed to be random. The matrix least squares solution is obtained as follows.
$$X = \begin{bmatrix} x_{11} \\ x_{12} \end{bmatrix} \tag{5.49}$$
$$B = [\beta_1] \tag{5.50}$$
$$R = \begin{bmatrix} r_{11} \\ r_{12} \end{bmatrix} \tag{5.51}$$
$$Y = \begin{bmatrix} y_{11} \\ y_{12} \end{bmatrix} \tag{5.52}$$
$$X' = \begin{bmatrix} x_{11} & x_{12} \end{bmatrix} \tag{5.53}$$
$$(X'X) = \begin{bmatrix} x_{11} & x_{12} \end{bmatrix} \begin{bmatrix} x_{11} \\ x_{12} \end{bmatrix} = [x_{11}^2 + x_{12}^2] \tag{5.54}$$
$$(X'X)^{-1} = [1/(x_{11}^2 + x_{12}^2)] \tag{5.55}$$
$$(X'Y) = \begin{bmatrix} x_{11} & x_{12} \end{bmatrix} \begin{bmatrix} y_{11} \\ y_{12} \end{bmatrix} = [x_{11} y_{11} + x_{12} y_{12}] \tag{5.56}$$
$$\hat{B} = [b_1] = (X'X)^{-1}(X'Y) = [(x_{11} y_{11} + x_{12} y_{12})/(x_{11}^2 + x_{12}^2)] \tag{5.57}$$
For the data points in Figure 5.4 ($x_{11} = 3$, $y_{11} = 3$, $x_{12} = 6$, $y_{12} = 5$), the estimates of $\beta_1$ and $\sigma_r^2$ are
$$b_1 = (3 \times 3 + 6 \times 5)/(3^2 + 6^2) = 39/45 = 13/15 \tag{5.58}$$
$$s_r^2 = (R'R)/1 = \begin{bmatrix} 6/15 & -3/15 \end{bmatrix} \begin{bmatrix} 6/15 \\ -3/15 \end{bmatrix} = 36/225 + 9/225 = 1/5 \tag{5.59}$$
A plot of $r_{11}^2$, $r_{12}^2$, and the sum of squares of residuals vs. $b_1$ is shown in Figure 5.5; $SS_r$ is clearly minimal when $b_1 = 13/15$.
Note that the residuals do not add up to zero when the sum of squares of residuals is minimal for this example. (The residuals do add up to zero in Equation 5.44 for the model involving the $\beta_0$ parameter.) To understand why they are not equal and opposite in this example, look at Figure 5.4 and mentally increase the slope until the two residuals are equal in magnitude (this situation corresponds to the point in Figure 5.5 where the $r_{11}^2$ and $r_{12}^2$ curves cross between $b_1 = 13/15$ and $b_1 = 14/15$). If the slope of the straight line in Figure 5.4 is now decreased, the magnitude of $r_{11}^2$ will increase, but the magnitude of $r_{12}^2$ will decrease much faster (for the data shown) and this will tend to decrease the sum of squares (see Figure 5.5). However, as the slope is further decreased and $r_{12}^2$ gets smaller, the relative decrease in the $SS_r$ caused by $r_{12}^2$ also gets smaller (see Figure 5.5); the increase in the $SS_r$ caused by an increasing $r_{11}^2$ finally dominates, and at $b_1 = 13/15$ the sum of squares of residuals starts to increase again.
In general, the sum of residuals (not to be confused with the sum of squares of residuals) will equal zero for models containing a $\beta_0$ term; for models not containing a $\beta_0$ term, the sum of residuals usually will not equal zero.
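The estimates for the straight line constrained to pass through the origin can be verified with exact fractions. The Python sketch below is an illustration only; it uses the closed form $b_1 = \sum x y / \sum x^2$ that the 1x1 matrix algebra reduces to for this model, and the variable names are hypothetical.

```python
from fractions import Fraction

xs = [Fraction(3), Fraction(6)]  # factor levels x11 and x12
ys = [Fraction(3), Fraction(5)]  # responses y11 and y12

# Least squares slope for y = b1*x + r: (X'X)^-1 (X'Y) with one-column X.
b1 = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

residuals = [y - b1 * x for x, y in zip(xs, ys)]
ss_r = sum(r * r for r in residuals)
df_r = len(xs) - 1               # one parameter (beta1) estimated
s_r2 = ss_r / df_r

print(b1)               # 13/15
print(residuals)        # 6/15 and -3/15, printed in lowest terms as 2/5 and -1/5
print(sum(residuals))   # 1/5, not zero, because the model has no beta0 term
print(s_r2)             # 1/5
```

The two residuals have opposite signs but unequal magnitudes, so their sum is 1/5 rather than zero, while the sum of their squares is at its minimum.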
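Figure 5.4 Graph of the fitted probabilistic model $y_{1i} = \beta_1 x_{1i} + r_{1i}$. (Horizontal axis: level of factor $x_1$.)

Figure 5.5 Squares of the individual residuals and the sum of squares of residuals, as functions of different values of $b_1$. (Horizontal axis: value of $b_1$ in 15ths; curves labeled "From $r_{11}$" and "From $r_{12}$".)

5.4 Matrix least squares for the case of an exact fit

Figure 5.1 shows the unconstrained straight line model $y_{1i} = \beta_0 + \beta_1 x_{1i}$ passing exactly through the two experimental points. The matrix least squares approach can be used in situations such as this where there is an exact fit. Using the data $x_{11} = 3$, $y_{11} = 3$, $x_{12} = 6$, $y_{12} = 5$, the appropriate arrays are

$$Y = \begin{bmatrix} 3 \\ 5 \end{bmatrix}, \quad X = \begin{bmatrix} 1 & 3 \\ 1 & 6 \end{bmatrix}, \quad B = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}, \quad R = \begin{bmatrix} r_{11} \\ r_{12} \end{bmatrix} \tag{5.60-5.63}$$

$$(X'X) = \begin{bmatrix} 1 & 1 \\ 3 & 6 \end{bmatrix} \begin{bmatrix} 1 & 3 \\ 1 & 6 \end{bmatrix} = \begin{bmatrix} 1 \times 1 + 1 \times 1 & 1 \times 3 + 1 \times 6 \\ 3 \times 1 + 6 \times 1 & 3 \times 3 + 6 \times 6 \end{bmatrix} = \begin{bmatrix} 2 & 9 \\ 9 & 45 \end{bmatrix} \tag{5.64}$$

$$D = (2 \times 45 - 9 \times 9) = 9 \tag{5.65}$$

$$(X'X)^{-1} = \begin{bmatrix} 45/9 & -9/9 \\ -9/9 & 2/9 \end{bmatrix} \tag{5.66}$$

$$(X'Y) = \begin{bmatrix} 1 & 1 \\ 3 & 6 \end{bmatrix} \begin{bmatrix} 3 \\ 5 \end{bmatrix} = \begin{bmatrix} 1 \times 3 + 1 \times 5 \\ 3 \times 3 + 6 \times 5 \end{bmatrix} = \begin{bmatrix} 8 \\ 39 \end{bmatrix} \tag{5.67}$$

$$\hat{B} = (X'X)^{-1}(X'Y) = \begin{bmatrix} 45/9 & -9/9 \\ -9/9 & 2/9 \end{bmatrix} \begin{bmatrix} 8 \\ 39 \end{bmatrix} = \begin{bmatrix} 40 - 39 \\ -8 + 78/9 \end{bmatrix} = \begin{bmatrix} 1 \\ 2/3 \end{bmatrix} \tag{5.68}$$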
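Thus, the intercept $b_0 = 1$ and the slope $b_1 = 2/3$ are identical to those values obtained using the simultaneous linear equations approach (see Equation 5.19 and Figure 5.1). Because there is an exact fit, the residuals are equal to zero.

$$R = Y - X\hat{B} = \begin{bmatrix} 3 \\ 5 \end{bmatrix} - \begin{bmatrix} 1 & 3 \\ 1 & 6 \end{bmatrix} \begin{bmatrix} 1 \\ 2/3 \end{bmatrix} = \begin{bmatrix} 3 - (1 \times 1 + 3 \times (2/3)) \\ 5 - (1 \times 1 + 6 \times (2/3)) \end{bmatrix} = \begin{bmatrix} 3 - 3 \\ 5 - 5 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \tag{5.69}$$

The sum of squares of residuals must also be equal to zero.

$$SS_r = R'R = [0 \;\; 0] \begin{bmatrix} 0 \\ 0 \end{bmatrix} = 0 \tag{5.70}$$

The variance of residuals, $s_r^2 = SS_r/DF_r$, is undefined: the data set contains two values ($n = 2$) but one degree of freedom has been lost for each of the two parameters $\beta_0$ and $\beta_1$. Thus, $DF_r = 2 - 2 = 0$ and there are no degrees of freedom available for calculating the variance of residuals.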
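The exact-fit calculation can be reproduced with a short script. This is an illustrative sketch only: the helper `fit_straight_line` is a hypothetical name, and it writes out the 2x2 inverse explicitly (through the determinant $D$) rather than calling a matrix library.

```python
from fractions import Fraction

def fit_straight_line(x, y):
    """Matrix least squares for y = b0 + b1*x, written out for the 2x2 case.

    Returns (b0, b1, determinant of X'X). Raises ValueError when (X'X)
    cannot be inverted.
    """
    n = len(x)
    sx = sum(x)                                  # off-diagonal element of (X'X)
    sxx = sum(xi * xi for xi in x)               # lower-right element of (X'X)
    sy = sum(y)                                  # first element of (X'Y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))   # second element of (X'Y)
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("(X'X) is singular; no unique solution exists")
    b0 = Fraction(sxx * sy - sx * sxy, det)
    b1 = Fraction(n * sxy - sx * sy, det)
    return b0, b1, det

x = [3, 6]
y = [3, 5]
b0, b1, det = fit_straight_line(x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
ss_r = sum(ri * ri for ri in resid)
df_r = len(x) - 2    # two parameters estimated from two experiments

print("D =", det, "b0 =", b0, "b1 =", b1)
print("residuals =", resid, "SS_r =", ss_r, "DF_r =", df_r)
```

Because `df_r` is zero, the script stops short of dividing `ss_r` by it, mirroring the statement that the variance of residuals is undefined for an exact fit.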
5.5 Judging the adequacy of models

Let us return now to a question asked at the beginning of this chapter, “What can be learned about the behavior of the system from two experiments, each carried out at a different level of a single factor?” For the data used in this section, the unconstrained straight line model $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$ fits exactly. Although the constrained model $y_{1i} = \beta_1 x_{1i} + r_{1i}$ does not fit exactly, it does “explain” the observed data better than the simple model $y_{1i} = \beta_0 + r_{1i}$ ($s_r^2$ values of 1/5 and 2, respectively). Thus, the response of the system would seem to increase as the level of the factor $x_1$ increases. Is it justifiable to conclude that the factor $x_1$ has an influence on the output $y_1$? The answer requires a knowledge of the purely experimental uncertainty of the response, the variability that arises from causes other than intentional changes in the factor levels. This variance due to purely experimental uncertainty is given the symbol $\sigma_{pe}^2$, and its estimate is denoted $s_{pe}^2$.

Figure 5.6 Graphs of three models ($y_{1i} = \beta_0 + \beta_1 x_{1i}$, $y_{1i} = \beta_0 + r_{1i}$, and $y_{1i} = \beta_1 x_{1i} + r_{1i}$) fit to two data points having small purely experimental uncertainty. (Horizontal axis: level of factor $x_1$.)

If the purely experimental uncertainty is small compared with the difference in response for the two experiments (see Figure 5.6), then the observed responses would be relatively precise indicators of the true response surface and the simple model $y_{1i} = \beta_0 + r_{1i}$ would probably be an inadequate representation of the system’s behavior; the constrained model $y_{1i} = \beta_1 x_{1i} + r_{1i}$ or the unconstrained model $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$ might be better estimates of the true response surface. However, if the purely experimental uncertainty is large compared with the differences in response for the two observations (see Figure 5.7), then the observed responses might be relatively poor indicators of the underlying response surface and there would be less reason to question the adequacy of the model $y_{1i} = \beta_0 + r_{1i}$; the other two models would still fit the data better, but there would be less reason to believe they offered significantly better descriptions of the true response surface.

Unfortunately, two experiments at two different levels of a single factor cannot provide an estimate of the purely experimental uncertainty. The difference in the two observed responses might be due to experimental uncertainty, or it might be caused by a sloping response surface, or it might be caused by both. For this particular experimental design, the effects are confused (or confounded) and there is no way to separate the relative importance of these two sources of variation.

If the purely experimental uncertainty were known, it would then be possible to judge the adequacy of the model $y_{1i} = \beta_0 + r_{1i}$: if $s_r^2$ were very much greater than $s_{pe}^2$ (see Figure 5.6), then it would be unlikely the large residuals for that model occurred by chance, and we would conclude that the model does not adequately describe the true behavior of the system. However, if $s_r^2$ were approximately the same as $s_{pe}^2$ (see Figure 5.7), then we would conclude that the model was adequate. (The actual decision compares $s_{pe}^2$ to a variance slightly different from $s_r^2$, but the reasoning is similar.)
The estimation of purely experimental uncertainty is essential for testing the adequacy of a model. The material in Chapter 3 and especially in Figure 3.1 suggests one of the important principles of experimental design: the purely experimental uncertainty can be obtained only by setting all of the controlled factors at fixed levels and replicating the experiment.
5.6 Replication

Replication is the independent performance of two or more experiments at the same set of levels of all controlled factors. Replication allows both the calculation of a mean response, $\bar{y}_1$, and the estimation of the purely experimental uncertainty, $s_{pe}^2$, at that set of factor levels. We now consider the fitting of several models to two experiments carried out at the same levels of all controlled factors with the purpose, again, of learning something about the effect of the factor $x_1$ on the behavior of the system.
Figure 5.7 Graphs of three models ($y_{1i} = \beta_0 + \beta_1 x_{1i}$, $y_{1i} = \beta_0 + r_{1i}$, and $y_{1i} = \beta_1 x_{1i} + r_{1i}$) fit to two data points having large purely experimental uncertainty. (Horizontal axis: level of factor $x_1$.)
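The model $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$

It is instructive to try to fit the unconstrained model $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$ to the results of two experiments carried out at a single level of the factor $x_1$. Let the data be $x_{11} = 6$, $y_{11} = 3$, $x_{12} = 6$, $y_{12} = 5$ (see Figure 5.8). Then,

$$X = \begin{bmatrix} 1 & 6 \\ 1 & 6 \end{bmatrix} \tag{5.71}$$

$$(X'X) = \begin{bmatrix} 1 & 1 \\ 6 & 6 \end{bmatrix} \begin{bmatrix} 1 & 6 \\ 1 & 6 \end{bmatrix} = \begin{bmatrix} 2 & 12 \\ 12 & 72 \end{bmatrix} \tag{5.72}$$

$$D = (2 \times 72 - 12 \times 12) = 0 \tag{5.73}$$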
Because the determinant is equal to zero, the $(X'X)$ matrix cannot be inverted, and a unique solution does not exist. An “interpretation” of the zero determinant is that the slope $\beta_1$ and the response intercept $\beta_0$ are both undefined (see Equations 5.14 and 5.15). This interpretation is consistent with the experimental design used and the model attempted: the best straight line through the two points would have infinite slope (a vertical line) and the response intercept would not exist (see Figure 5.8).

The failure of this model when used with these replicate data points may also be considered in terms of degrees of freedom. Here, the total number of degrees of freedom is two, and we are trying to estimate two parameters, $\beta_0$ and $\beta_1$; this is not inconsistent with the arguments put forth in Sections 4.4, 5.1, and 5.4. However, replication requires that some degrees of freedom be used for the estimation of the purely experimental uncertainty. In general, if $m$ replicates are carried out at a given set of factor levels, then $m - 1$ degrees of freedom must go into the estimation of $\sigma_{pe}^2$. In the present example, one degree of freedom must be given to the estimation of purely experimental uncertainty; thus, only one other parameter can be estimated, and it is not possible to obtain a unique solution for the two parameters ($\beta_0$ and $\beta_1$) of the model $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$.

Figure 5.8 Graph of the deterministic model $y_{1i} = \beta_0 + \beta_1 x_{1i}$ fit to the results of two experiments at the same level of the factor $x_1$. (Horizontal axis: level of factor $x_1$.)

When replication is carried out at one set of experimental conditions (i.e., at a single point in factor space), the purely experimental variance is estimated as

$$s_{pe}^2 = \sum_{i=1}^{m} (y_{1i} - \bar{y}_{1i})^2 / (m - 1) \tag{5.74}$$

where $\bar{y}_{1i}$ and $y_{1i}$ refer only to the replicate responses at the single point in factor space (responses at other points in factor space are not used in this calculation). For this data set ($x_{11} = x_{12} = 6$), the purely experimental variance is

$$s_{pe}^2 = [(3 - 4)^2 + (5 - 4)^2]/(2 - 1) = (1 + 1)/1 = 2 \tag{5.75}$$
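Both points made above, the zero determinant for replicate data and the estimate of purely experimental variance, can be illustrated in a few lines. This is a sketch under stated assumptions: `xtx_determinant` and `pure_error_variance` are hypothetical helper names, and the first applies only to the two-parameter straight line model.

```python
from fractions import Fraction

def xtx_determinant(x):
    """Determinant of (X'X) for the model y = b0 + b1*x."""
    n = len(x)
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    return n * sxx - sx * sx

def pure_error_variance(y):
    """Variance of m replicate responses about their mean, with m - 1 DF."""
    m = len(y)
    mean = Fraction(sum(y), m)
    return sum((yi - mean) ** 2 for yi in y) / (m - 1)

# Two experiments at the same level of x1 (replicates).
x_replicates = [6, 6]
y_replicates = [3, 5]

d_replicates = xtx_determinant(x_replicates)   # zero: (X'X) cannot be inverted
d_distinct = xtx_determinant([3, 6])           # nonzero for two distinct levels
s2_pe = pure_error_variance(y_replicates)

print("D (replicates) =", d_replicates)
print("D (distinct levels) =", d_distinct)
print("s_pe^2 =", s2_pe)
```

The contrast between the two determinants shows the trade: the replicate design gives up the ability to estimate a slope in exchange for one degree of freedom for $s_{pe}^2$.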
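The model $y_{1i} = \beta_1 x_{1i} + r_{1i}$

Figure 5.9 contains the same two replicate experiments shown in Figure 5.8, but here the response surface for the model $y_{1i} = \beta_1 x_{1i} + r_{1i}$ is shown. The least squares solution is obtained as follows.

$$X = \begin{bmatrix} 6 \\ 6 \end{bmatrix} \tag{5.76}$$

$$(X'X) = [6 \;\; 6] \begin{bmatrix} 6 \\ 6 \end{bmatrix} = [72] \tag{5.77}$$

$$(X'X)^{-1} = [1/72] \tag{5.78}$$

$$(X'Y) = [6 \;\; 6] \begin{bmatrix} 3 \\ 5 \end{bmatrix} = [48] \tag{5.79}$$

$$\hat{B} = [b_1] = (X'X)^{-1}(X'Y) = [1/72][48] = 2/3 \tag{5.80}$$

$$R = Y - X\hat{B} = \begin{bmatrix} 3 \\ 5 \end{bmatrix} - \begin{bmatrix} 6 \\ 6 \end{bmatrix} [2/3] = \begin{bmatrix} 3 - 4 \\ 5 - 4 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \tag{5.81}$$

and

$$s_r^2 = (R'R)/1 = [-1 \;\; 1] \begin{bmatrix} -1 \\ 1 \end{bmatrix} = 2 \tag{5.82}$$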
Figure 5.9 Graph of the fitted probabilistic model $y_{1i} = \beta_1 x_{1i} + r_{1i}$. (Horizontal axis: level of factor $x_1$.)

Note that $b_1$ is used with the fixed level of $x_1 = 6$ to estimate the mean response for the replicate observations.

The model $y_{1i} = \beta_0 + r_{1i}$

Let us fit the probabilistic model, $y_{1i} = \beta_0 + r_{1i}$, to the same data (see Figure 5.10). If the least squares approach to the fitting of this model is employed, the appropriate matrices and results are exactly those given in Section 5.2 where the same model was fit to the different factor levels: $x_{11} = 3$, $y_{11} = 3$, $x_{12} = 6$, $y_{12} = 5$. This identical mathematics should not be surprising: the model does not include a term for the factor $x_1$ and thus the matrix of parameter coefficients, $X$, should be the same for both sets of data. The parameter $\beta_0$ is again estimated to be 4, and $\sigma_r^2$ is estimated to be 2.

Figure 5.10 Graph of the fitted probabilistic model $y_{1i} = \beta_0 + r_{1i}$. (Horizontal axis: level of factor $x_1$.)

The model $y_{1i} = 0 + r_{1i}$

Before leaving this chapter we will consider one final model, the purely probabilistic model $y_{1i} = 0 + r_{1i}$ (see Section 4.2 and Equation 4.3). Whether obtained at different levels of the factor $x_1$ or at the same level (replicates), there is a possibility that the two observed responses, $y_{11} = 3$ and $y_{12} = 5$, belong to a population for which the mean ($\mu$) is zero. The fact that the two numbers we have obtained give a sample mean ($\bar{y}_1$) of 4 might have occurred simply by chance. Thus, when judging the adequacy of various models, we should not overlook the possibility that the purely probabilistic model is adequate.

It is not possible to fit this model using matrix least squares techniques: the matrix of parameter coefficients, $X$, does not exist - it is a 0x0 matrix and has no elements because there are no parameters in the model. However, the matrix of residuals, $R$, is defined. It should not be surprising that for this model, $R = Y$; that is, the matrix of residuals is identical to the matrix of responses.
Exercises
5.2 Simultaneous equations. Use simultaneous equations (Section 5.1) to solve for $\beta_0$ and $\beta_1$ in the following set of equations:
$9 = \beta_0 + 3\beta_1$
$13 = \beta_0 + 5\beta_1$

5.3 Simultaneous equations. Use simultaneous equations (Section 5.1) to solve for $\beta_0$, $\beta_1$, and $\beta_2$ in the following set of equations:
$16 = \beta_0 + 5\beta_1 + 2\beta_2$
$11 = \beta_0 + 3\beta_1 + 1\beta_2$
$32 = \beta_0 + 6\beta_1 + 7\beta_2$
(See Appendix A for a method of finding the inverse of a 3x3 matrix.)
5.4 Matrix least squares. Use matrix least squares (Section 5.2) to estimate $b_0$ and $b_1$ in Problem 5.2.

5.5 Matrix least squares. Use matrix least squares (Section 5.2) to estimate $b_0$, $b_1$, and $b_2$ in Problem 5.3.

5.6 Sum of squares of residuals. Construct a graph similar to Figure 5.3 showing the individual squares of residuals and the sum of squares of residuals as a function of $b_0$ ($b_1 = 2$) for Problem 5.4. Construct a second graph as a function of $b_1$ ($b_0 = 3$).

5.7 Covariance. Construct five graphs showing the sum of squares of residuals as a function of $b_0$ for Problem 5.4. Graph 1: $b_1 = 1$; Graph 2: $b_1 = 2$ (see Problem 5.6); Graph 3: $b_1 = 3$; Graph 4: $b_1 = 4$; Graph 5: $b_1 = 5$. Why doesn’t the minimum occur at the same value of $b_0$ in all graphs? Which graph gives the overall minimum sum of squares of residuals?

5.8 Matrix least squares. Use matrix least squares to fit the model $y_{1i} = \beta_1 x_{1i} + r_{1i}$ to the data in Problem 5.2 ($x_{11} = 3$, $y_{11} = 9$, $x_{12} = 5$, $y_{12} = 13$). Does the sum of residuals (not the sum of squares of residuals) equal zero? Graph the individual squares of residuals and the sum of squares of residuals as a function of $b_1$.
5.9 Matrix least squares. Use matrix least squares to fit the data in Section 3.1 to the model $y_{1i} = \beta_0 + r_{1i}$. Compare $(R'R)/DF_r$ with the variance in Equation 3.5.

5.10 Residuals. Plot the following data pairs ($x_{1i}$, $y_{1i}$) on a piece of graph paper: (0, 3.0), (1, 5.0), (2, 6.8), (3, 8.4), (4, 9.8), (5, 11.0), (6, 12.0), (7, 12.8), (8, 13.4), (9, 13.8), (10, 14.0). Use a ruler or other straightedge to draw a “good” straight line ($y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$) through the data. Measure the residual for each point (record both sign and magnitude) and on a second piece of graph paper plot the residuals as a function of $x_1$. Is there a pattern to the residuals? [See, for example, Chapter 3 in Draper and Smith (1981).] Suggest a better empirical linear model that might fit the data more closely. How well does the model $y_{1i} = 3 + 2.1x_{1i} - 0.1x_{1i}^2$ fit?
5.11 Purely experimental uncertainty. Most textbooks refer to $s_{pe}^2$ as the “variance due to pure error”, or the “pure error variance”. In this textbook, $s_{pe}^2$ is called the “variance due to purely experimental uncertainty”, or the “purely experimental uncertainty variance”. What assumptions might underlie each of these systems of naming? See Problem 6.14. [See, also, pages 123-127 in Mandel (1964).]

5.12 Replication. A researcher carries out the following set of experiments. Which are replicates? If the factor $x_1$ is known to have no effect on $y_1$, and if $x_1$ is ignored, then which of the experiments can be considered to be replicates?

5.13 Purely experimental uncertainty. Refer to Figure 3.1. How can purely experimental uncertainty be decreased? What are the advantages of making the purely experimental uncertainty small? What are the disadvantages of making the purely experimental uncertainty small?
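Experiments for Problem 5.12:

i     x1i   x2i   y1i
1     3     7     12.6
2     7     1     4.5
3     2     3     2.9
4     7     5     11.5
5     2     4     11.4
6*    6     ?     3.2
7*    3     ?     12.9
8*    2     ?     3.2
9*    5     ?     4.3
10    2     3     2.8

* One of the two factor levels is missing from this copy for rows 6 to 9; the surviving level is listed under x1i only for layout and may belong to x2i.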
5.14 Purely experimental uncertainty. Assume that the nine values of measured response in Section 3.1 are replicates. What is the mean value of these replicates? How many degrees of freedom are removed by calculation of the mean? How many degrees of freedom remain for the estimation of $s_{pe}^2$? What is the value of $s_{pe}^2$?

5.15 Replication. Consider the following four experiments: $x_{11} = 3$, $y_{11} = 2$, $x_{12} = 3$, $y_{12} = 4$, $x_{13} = 6$, $y_{13} = 6$, $x_{14} = 6$, $y_{14} = 4$. At how many levels of $x_1$ have experiments been carried out? What is the mean value of $y_1$ at each level of $x_1$? How many degrees of freedom are removed by calculation of these two means? How many degrees of freedom remain for the calculation of $s_{pe}^2$? If $n$ is the number of experiments in this set and $f$ is the number of levels of $x_1$, then what is the relationship between $n$ and $f$ that expresses the number of degrees of freedom available for calculating the purely experimental uncertainty? Why is $s_{pe}^2 = 2$ for this set of data?

5.16 Matrix algebra. Given $XB = Y$ for a set of simultaneous equations, show that $B = X^{-1}Y$ (see Appendix A).
5.17 Matrix algebra. Given $XB = Y$ for an overdetermined set of equations, rigorously show that $\hat{B} = (X'X)^{-1}(X'Y)$ gives the minimum sum of squares. [See, for example, Kempthorne (1952).]

5.18 $(X'X)^{-1}$ matrix. Calculate the $(X'X)$ matrix, its determinant, and the $(X'X)^{-1}$ matrix for the model $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$ fit to each of the following three sets of data: a) $x_{11} = 1.9$, $x_{12} = 2.1$; b) $x_{11} = 0$, $x_{12} = 4$; c) $x_{11} = 0$, $x_{12} = 0$, $x_{13} = 4$, $x_{14} = 4$. From this information, can you offer any insight into conditions that make the elements of $(X'X)^{-1}$ small? Design a set of experiments to prove or disprove these insights.
5.19 Matrix least squares. Use matrix least squares to fit the model $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$ to the following eleven data points:

i     x1i   y1i
1     0     0.1
2     6     11.7
3     9     19.0
4     4     8.6
5     5     10.0
6     6     15.9
7     1     1.8
8     10    19.9
9     2     4.1
10    3     5.9
11    7     14.1
5.20 Importance of replication. Suppose a researcher who has never gambled before goes to Las Vegas, bets $10 on red in a roulette game, and wins $20. Based on the results of that one experiment, he makes a conclusion and bets the $20 and all of the rest of his money on red again. He loses. Comment about the usefulness of restrained replication. Suppose a researcher who has been sheltered in his laboratory for ten years goes outside and says hello to the first person he meets. The stranger stops, takes the researcher’s money, and runs away. The researcher makes a conclusion, becomes a recluse, and never speaks to another human again. Comment about the usefulness of bold replication.

5.21 Replication. Measurement laboratories are often sent blind duplicates of material for analysis (i.e., the measurement laboratory is not told they are duplicates). Why?
CHAPTER 6
Hypothesis Testing
In Section 5.5 a question was raised concerning the adequacy of models when fit to experimental data (see also Section 2.4). It was suggested that any test of the adequacy of a given model must involve an estimate of the purely experimental uncertainty. In Section 5.6 it was indicated that replication provides the information necessary for calculating $s_{pe}^2$, the estimate of $\sigma_{pe}^2$. We now consider in more detail how this information can be used to test the adequacy of linear models [Davies (1956)].
6.1 The null hypothesis

Figure 6.1 shows the replicate data points $x_{11} = 6$, $y_{11} = 3$, $x_{12} = 6$, $y_{12} = 5$, and their relationships to the two models, $y_{1i} = 0 + r_{1i}$ and $y_{1i} = \beta_0 + r_{1i}$. We might ask, “Does the purely probabilistic model $y_{1i} = 0 + r_{1i}$ adequately fit the data, or does the model $y_{1i} = \beta_0 + r_{1i}$ offer a significantly better fit?” Another way of phrasing this question is to ask if the parameter $\beta_0$ is significantly different from zero. Two possibilities exist:

(1) If $\beta_0$ is significantly different from zero, then the model $y_{1i} = \beta_0 + r_{1i}$ offers a significantly better fit.
(2) If $\beta_0$ is not significantly different from zero, then the model $y_{1i} = 0 + r_{1i}$ cannot be rejected; it provides an adequate fit to the data.

Figure 6.1 Relationships of two replicate data points ($x_{11} = 6$, $y_{11} = 3$, $x_{12} = 6$, $y_{12} = 5$) to two probabilistic models, $y_{1i} = \beta_0 + r_{1i}$ and $y_{1i} = 0 + r_{1i}$. (Horizontal axis: level of factor $x_1$.)

Figure 6.2 Common usage of the terms “accept” and “reject”.

In statistical terms, we seek to disprove the null hypothesis that the difference between $\beta_0$ and zero is “null”. The null hypothesis is written

$$H_0: \beta_0 = 0 \tag{6.1}$$

The alternative hypothesis is written

$$H_a: \beta_0 \ne 0 \tag{6.2}$$
If we can demonstrate to our satisfaction that the null hypothesis is false, then we can reject that hypothesis and accept the alternative hypothesis that $\beta_0 \ne 0$. However, failure to disprove the null hypothesis does not mean we can reject the alternative hypothesis and accept the null hypothesis. This is a subtle but extremely important point in hypothesis testing, especially when hypothesis testing is used to identify factors in research and development projects (see Section 1.2 and Table 1.1).

In everyday language, the words “accept” and “reject” are usually used as exact complements (see Figure 6.2): if something is not accepted, it is rejected; if something is not rejected, it is accepted. As an example, if someone orders a new car and it arrives with a gross defect, the purchaser will immediately reject it. If the new car arrives in good condition and seems to perform well in a test drive, the purchaser will probably accept the new car. However, the word “accept” might not be entirely correct in this automotive example - it could imply a level of commitment that does not exist. Perhaps a better choice of words would be to say that the owner “puts up with” or “tolerates” the new car. As long as the new car performs reasonably well and does not exhibit a serious malfunction, the owner will not reject it. Truly “accepting” the car, however, might occur only after many years of trouble-free operation.
Figure 6.3 (Scale running from “Accept” through “Undecided” to “Reject”.)
In this example, as in statistical hypothesis testing, there is a more or less broad region of “tolerance” or “indecision” between the commitments of “acceptance” and “rejection” (see Figure 6.3). In seeking to disprove the null hypothesis, we are exploring the region between “undecided” and ‘‘rejection”. The exploration of the region between “undecided” and “acceptance” involves a very different type of experimentation and requires a different set of criteria for hypothesis testing. In this book we will limit the testing of hypotheses to the “undecided-rejection” region, realizing, again, that failure to reject the null hypothesis means only to tolerate it as adequate, not necessarily to fully accept it.
6.2 Confidence intervals

Figure 6.4 shows the relationship between $b_0$ (the least squares estimate of $\beta_0$) and zero for the data in Figure 6.1. In this example, the parameter $\beta_0$ has been estimated on the basis of only two experimental results; if another independent set of two experiments were carried out on the same system, we would very likely obtain different values for $y_{13}$ and $y_{14}$, and thus obtain a different estimate for $\beta_0$ from the second set of experiments. It is for this reason that the estimation of $\beta_0$ is usually subject to uncertainty.

It can be shown that if the uncertainties associated with the measurements of the response are approximately normally distributed (see Equation 3.8), then parameter estimates obtained from these measurements are also normally distributed. The standard deviation of the estimate of a parameter will be called the standard uncertainty, $s_b$, of the parameter estimate (it is usually called the “standard error”) and can be calculated from the $(X'X)^{-1}$ matrix if an estimate of $\sigma_{pe}^2$ is available. For the data in Figure 6.1, the standard uncertainty in $b_0$ ($s_{b_0}$) is estimated as

$$s_{b_0} = [s_{pe}^2 (X'X)^{-1}]^{1/2} \tag{6.3}$$

Figure 6.4 Relationship between $b_0$ and 0 for the model $y_{1i} = \beta_0 + r_{1i}$ fit to the data of Figure 6.1. (Horizontal axis: value of parameter estimate.)

Figure 6.5 Five gaussian curves, each estimated from an independent set of two experiments on the same system, showing uncertainty of estimating $b_0$ and $s_{b_0}$. (Horizontal axis: value of parameter estimate.)

The matrix least squares solution for the model $y_{1i} = \beta_0 + r_{1i}$ and the data in Figure 6.1 was obtained in Section 5.2 (see Equations 5.26, 5.27, and 5.37-5.47) and gave the results $b_0 = 4$, $s_r^2 = 2$. Because the two experiments in this example are replicates, $s_{pe}^2 = s_r^2$ and Equation 6.3 becomes

$$s_{b_0} = [s_r^2 (X'X)^{-1}]^{1/2} = [2 \times (1/2)]^{1/2} = 1 \tag{6.4}$$
Both the parameter estimate $b_0$ and the standard uncertainty of its estimate $s_{b_0}$ depend on the $Y$ matrix of experimental responses (see Equations 5.36, 5.28, and 5.43): if one set of experiments yields responses that agree closely, the standard uncertainty of $b_0$ will be small; if a different set of experiments happens to produce responses that are dissimilar, the standard uncertainty of $b_0$ will be large. Thus, not only is the estimation of the parameter $\beta_0$ itself subject to uncertainty, but the estimation of its standard uncertainty is also subject to uncertainty. The estimation of $\beta_0$ is therefore doubly uncertain, first because the value of $\beta_0$ is not known with certainty, and second because the exact distribution of estimates for $\beta_0$ is unknown. Figure 6.5 illustrates the problem: each curve represents a pair of estimates of $b_0$ and $s_{b_0}$ obtained from an independent set of two experiments on the same system.

Suppose we could use $b_0$ and $s_{b_0}$ from only one set of experiments to construct a confidence interval about $b_0$ such that there is a given probability that the interval contains the population value of $\beta_0$ (see Section 3.4). The interpretation of such a confidence interval is this: if we find that the interval includes the value zero, then (with our data) we cannot disprove the null hypothesis that $\beta_0 = 0$; that is, on the basis of the estimates $b_0$ and $s_{b_0}$, it is not improbable that the true value of $\beta_0$ could be zero. Suppose, however, we find that the confidence interval does not contain the value zero: because we know that if $\beta_0$ were really equal to zero this lack of overlap could happen by chance only very rarely, we must conclude that it is highly unlikely the true value of $\beta_0$ is zero; the null hypothesis is rejected (with an admitted finite risk of being wrong) and we accept the alternative hypothesis that $\beta_0$ is significantly different from zero.

The other piece of information (in addition to $b_0$ and $s_{b_0}$) required to establish a confidence interval for a parameter estimate was not available until 1908 when W. S. Gosset, an English chemist who used the pseudonym “Student” (1908), provided a solution to the statistical problem [J. Box (1981)]. The resulting values are known as “critical values of Student’s $t$” and may be obtained from so-called “$t$-tables” (see Appendix B for values at the 95% level of confidence). Using an appropriate $t$-value, we can now estimate a confidence interval ($CI$) for $b_0$ obtained from the data in Figure 6.1:

$$CI = b_0 \pm t \times s_{b_0} \tag{6.5}$$

$$CI = 4 \pm 12.71 \times 1 = 4 \pm 12.71 \tag{6.6}$$
where 12.71 is the tabulated value of $t$ for one degree of freedom at the 95% level of confidence. Thus, we expect with 95% confidence that the population value of $\beta_0$ lies within the interval

$$-8.71 \le \beta_0 \le +16.71 \tag{6.7}$$

(see Figure 6.6). Because this confidence interval contains the value zero, we cannot disprove the null hypothesis that $\beta_0 = 0$. There is no strong reason to believe (at the 95% level of confidence) that $\beta_0$ is significantly different from zero, and we must therefore retain as adequate the model $y_{1i} = 0 + r_{1i}$.

Figure 6.6 Confidence interval (95% level) for $b_0$. (Horizontal axis: value of parameter estimate.)

Figure 6.7 Confidence intervals for $t$-test in which the null hypothesis $H_0$: $\beta = 0$ would be disproved at the specified level of confidence. (Intervals shown: critical $CI$ from $t$-table; $CI$ calculated to include zero. Horizontal axis: value of parameter estimate.)
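The confidence interval of Equation 6.7 can be reproduced from the quantities already given. This is a minimal sketch: the variable names are illustrative, the value 12.71 is the tabulated $t$ quoted in the text, and $(X'X)^{-1} = 1/2$ is the value for two experiments fit to the model $y_{1i} = \beta_0 + r_{1i}$.

```python
import math

b0 = 4.0          # least squares estimate of beta_0 for the data of Figure 6.1
s2_pe = 2.0       # estimate of purely experimental variance (replicates)
xtx_inv = 0.5     # (X'X)^-1 for X = [1, 1]', i.e. 1/2
t_crit = 12.71    # tabulated t, one degree of freedom, 95% level (from the text)

# Standard uncertainty of b0 (Equation 6.3).
s_b0 = math.sqrt(s2_pe * xtx_inv)

# Confidence interval about b0.
lower = b0 - t_crit * s_b0
upper = b0 + t_crit * s_b0
contains_zero = lower <= 0.0 <= upper

print(f"s_b0 = {s_b0:.2f}")
print(f"{lower:.2f} <= beta_0 <= {upper:.2f}")
print("interval contains zero:", contains_zero)
```

Since the interval contains zero, the script reaches the same conclusion as the text: the null hypothesis cannot be disproved with these two observations.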
6.3 The t-test

A somewhat different computational procedure is often used in practice to carry out the test described in the previous section. The procedure involves two questions: “What is the minimum calculated interval about $b_0$ that will include the value zero?” and, “Is this minimum calculated interval greater than the confidence interval estimated using the tabular critical value of $t$?” If the calculated interval is larger than the critical confidence interval (see Figure 6.7), a significant difference between $\beta_0$ and zero probably exists and the null hypothesis is disproved. If the calculated interval is smaller than the critical confidence interval (see Figure 6.8), there is insufficient reason to believe that a significant difference exists and the null hypothesis cannot be rejected.

Each of these confidence intervals (the calculated interval and the critical interval) can be expressed in terms of $b_0$, $s_{b_0}$, and some value of $t$ (see Equation 6.5). Because the same values of $b_0$ and $s_{b_0}$ are used for the construction of these intervals, the information about the relative widths of the intervals is contained in the two values of $t$. One of these, $t_{crit}$, is simply obtained from the table of critical values of $t$ - it is the value used to obtain the critical confidence interval shown in Figure 6.7 or 6.8. The other, $t_{calc}$, is calculated from the minimum confidence interval about $b_0$ that will include the value zero and is obtained from a rearrangement of Equation 6.5.

$$t_{calc} = |b_0 - 0| / s_{b_0} \tag{6.8}$$

Figure 6.8 Confidence intervals for $t$-test in which the null hypothesis $H_0$: $\beta = 0$ could not be disproved at the specified level of confidence. (Intervals shown: critical $CI$ from $t$-table; $CI$ calculated to include zero. Horizontal axis: value of parameter estimate.)

The “$t$-test” involves the comparison of these two values:

(1) If $t_{calc} > t_{crit}$ (see Figure 6.7), the minimum confidence interval is greater than the critical confidence interval and there is strong reason to believe that $\beta_0$ is significantly different from zero.
(2) If $t_{calc} < t_{crit}$ (see Figure 6.8), the minimum confidence interval is less than the critical confidence interval and there is no reason to believe $\beta_0$ is significantly different from zero.

In short, if $t_{calc} > t_{crit}$, then $H_0$ is rejected at the given level of confidence and the alternative hypothesis, $H_a$, is accepted.

As an example, consider again the data in Figure 6.1. From the $t$-table in Appendix B, $t_{crit} = 12.71$ (95% level of confidence). Based on the estimates $b_0$ and $s_{b_0}$,

$$t_{calc} = |4 - 0| / 1 = 4 \tag{6.9}$$

Thus, $t_{calc} < t_{crit}$ (4 < 12.71) and there is no reason to believe, at the 95% level of confidence and for the data in Figure 6.1, that $\beta_0$ is significantly different from zero. The model $y_{1i} = 0 + r_{1i}$ cannot be rejected; it provides an adequate fit to the data.
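The comparison of calculated and critical $t$ values amounts to a one-line decision rule. The sketch below is illustrative; `t_calculated` is a hypothetical helper name, and the critical value is the tabulated 12.71 quoted above rather than something computed here.

```python
def t_calculated(estimate, hypothesized, std_uncertainty):
    """Number of standard uncertainties separating an estimate from a hypothesized value."""
    return abs(estimate - hypothesized) / std_uncertainty

b0 = 4.0          # estimate of beta_0 for the data of Figure 6.1
s_b0 = 1.0        # standard uncertainty of b0
t_crit = 12.71    # tabulated t, one degree of freedom, 95% level (from the text)

t_calc = t_calculated(b0, 0.0, s_b0)
reject_null = t_calc > t_crit

print("t_calc =", t_calc, "t_crit =", t_crit)
print("reject H0:", reject_null)
```

Note the asymmetry discussed in Section 6.1: a `False` result here means only that the null hypothesis is tolerated, not that it has been shown to be true.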
6.4 Sums of squares

Let us now consider a slightly different question. Rather than inquiring about the significance of the specific parameter $\beta_0$, we might ask instead, “Does the model $y_{1i} = 0 + r_{1i}$ provide an adequate fit to the experimental data, or does this model show a significant lack of fit?” We begin by examining more closely the sum of squares of residuals between the measured response, $y_{1i}$, and the predicted response, $\hat{y}_{1i}$ ($\hat{y}_{1i} = 0$ for all $i$ of this model), which is given by

$$SS_r = \sum_{i=1}^{n} r_{1i}^2 = \sum_{i=1}^{n} (y_{1i} - \hat{y}_{1i})^2 = \sum_{i=1}^{n} (y_{1i} - 0)^2 \tag{6.10}$$

If the model $y_{1i} = 0 + r_{1i}$ does describe the true behavior of the system, we would expect replicate experiments to have a mean value of zero ($\bar{y}_{1i} = 0$); the sum of squares due to purely experimental uncertainty would be expected to be

$$SS_{pe} = \sum_{i=1}^{n} (y_{1i} - \bar{y}_{1i})^2 = \sum_{i=1}^{n} (y_{1i} - 0)^2 \tag{6.11}$$

if all $n$ of the experiments are replicates. Thus, for the experimental design of Figure 6.1, if $y_{1i} = 0 + r_{1i}$ is an adequate model, we would expect the sum of squares of residuals to be approximately equal to the sum of squares due to purely experimental uncertainty (Equations 6.10 and 6.11).

$$SS_r \approx SS_{pe} \tag{6.12}$$

Suppose now that the model $y_{1i} = 0 + r_{1i}$ does not adequately describe the true behavior of the system. Then we would not expect replicate experiments to have a mean value of zero ($\bar{y}_{1i} \ne 0$; see Equation 6.11) and the sum of squares due to purely experimental uncertainty would not be expected to be approximately equal to the sum of squares of residuals; instead, $SS_r$ would be larger than $SS_{pe}$. This imperfect agreement between what the model predicts and the mean of replicate experiments is called lack of fit of the model to the data; an interpretation is suggested in Figure 6.9.

Figure 6.9 Illustration of the lack of fit as the difference between an observed mean and the corresponding predicted mean. (Horizontal axis: value of parameter estimate.)

When replicate experiments have been carried out at the same set of levels of all controlled factors, the best estimate of the response at that set of levels is the mean of the observed response, $\bar{y}_{1i}$. The difference between this mean value of response and the corresponding value predicted by the model is a measure of the lack of fit of the model to the data.

If (and only if) replicate experiments have been carried out on a system, it is possible to partition the sum of squares of residuals, $SS_r$, into two components (see Figure 6.10): one component is the already familiar sum of squares due to purely experimental uncertainty, $SS_{pe}$; the other component is associated with variation attributed to the lack of fit of the model to the data and is called the sum of squares due to lack of fit, $SS_{lof}$.

Figure 6.10 Sums of squares tree illustrating the relationship $SS_r = SS_{lof} + SS_{pe}$.
(6.13) The proof of the partitioning of SS, into SS,ofand SS, identity for a single measured response.
rI,= (vll- 9 1 1 )= (yl, -h)+ ( Y l l- E l r )
is based on the following
(6.14)
Given a set of m replicate measurements,
(6.15)
(6.16)
m
m
(6.17)
108
Because iliis the same for each of these replicate measurements,
(6.20) m
m
However, m
c
m
Yli=
i= I
c
Ylr
(6.21)
i= I
and thus (6.22)
(6.23 )
which by Equation 6.11 is equivalent to
$$SS_r = SS_{lof} + SS_{pe} \tag{6.24}$$
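The partition can be sketched compactly for one set of m replicates, for which both the replicate mean $\bar{y}_{1i}$ and the prediction $\hat{y}_{1i}$ are constant. This is the standard algebraic argument written out as a reading aid; the grouping of the steps is an assumption and is not meant to reproduce the numbered equations one for one.

```latex
\begin{align*}
\sum_{i=1}^{m} r_{1i}^2
  &= \sum_{i=1}^{m} \left[ (y_{1i} - \bar{y}_{1i}) + (\bar{y}_{1i} - \hat{y}_{1i}) \right]^2 \\
  &= \sum_{i=1}^{m} (y_{1i} - \bar{y}_{1i})^2
     + 2 \sum_{i=1}^{m} (y_{1i} - \bar{y}_{1i})(\bar{y}_{1i} - \hat{y}_{1i})
     + \sum_{i=1}^{m} (\bar{y}_{1i} - \hat{y}_{1i})^2 \\
  &= \sum_{i=1}^{m} (y_{1i} - \bar{y}_{1i})^2
     + 2 (\bar{y}_{1i} - \hat{y}_{1i}) \sum_{i=1}^{m} (y_{1i} - \bar{y}_{1i})
     + \sum_{i=1}^{m} (\bar{y}_{1i} - \hat{y}_{1i})^2 \\
  &= \sum_{i=1}^{m} (y_{1i} - \bar{y}_{1i})^2
     + \sum_{i=1}^{m} (\bar{y}_{1i} - \hat{y}_{1i})^2
\end{align*}
```

The cross term vanishes because the deviations of replicates from their own mean sum to zero (Equation 6.21). The first surviving sum is the contribution to $SS_{pe}$ and the second is the contribution to $SS_{lof}$.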
Although it is beyond the scope of this presentation, it can be shown that if the model $y_{1i} = 0 + r_{1i}$ is a true representation of the behavior of the system, then the three sums of squares $SS_r$, $SS_{lof}$, and $SS_{pe}$ divided by the associated degrees of freedom (2, 1, and 1, respectively, for this example) will all provide unbiased estimates of $\sigma_{pe}^2$, and there will not be significant differences among these estimates. If $y_{1i} = 0 + r_{1i}$ is not the true model, the parameter estimate $s_{pe}^2$ will still be a good estimate of the purely experimental uncertainty, $\sigma_{pe}^2$ (the estimate of purely experimental uncertainty is independent of any model; see Sections 5.5 and 5.6). The parameter estimate $s_{lof}^2$, however, will be inflated because it now includes a non-random contribution from a nonzero difference between the mean of the observed replicate responses, $\bar{y}_{1i}$, and the responses predicted by the model, $\hat{y}_{1i}$ (see Equation 6.13). The less likely it is that $y_{1i} = 0 + r_{1i}$ is the true model, the more biased and therefore larger should be the term $s_{lof}^2$ compared to $s_{pe}^2$.
6.5 The F-test

The previous section suggests a method for testing the adequacy of the model $y_{1i} = 0 + r_{1i}$:

(1) If $y_{1i} = 0 + r_{1i}$ is not the true model, then $s_{lof}^2$ will not be an unbiased estimate of $\sigma_{pe}^2$ and there should be a difference between $s_{lof}^2$ and $s_{pe}^2$: we would expect $s_{lof}^2$ to be greater than $s_{pe}^2$; that is, $s_{lof}^2 - s_{pe}^2 > 0$.

(2) If $y_{1i} = 0 + r_{1i}$ is the true model, then $s_{lof}^2$ and $s_{pe}^2$ should both be good estimates of $\sigma_{pe}^2$ and there should not be a significant difference between them: we would expect $s_{lof}^2$ to be equal to $s_{pe}^2$; that is, $s_{lof}^2 - s_{pe}^2 = 0$.

Thus, we can test the null hypothesis
$$H_0: s_{lof}^2 - s_{pe}^2 = 0 \tag{6.25}$$
with the alternative hypothesis being
$$H_a: s_{lof}^2 - s_{pe}^2 > 0 \tag{6.26}$$
The test of this hypothesis makes use of the calculated Fisher variance ratio, F.
$$F_{(DF_n, DF_d)} = s_{lof}^2 / s_{pe}^2 \tag{6.27}$$
where $F_{(DF_n, DF_d)}$ represents the calculated value of the variance ratio with $DF_n$ degrees of freedom associated with the numerator and $DF_d$ degrees of freedom associated with the denominator. The variance $s_{lof}^2$ always appears in the numerator; the test does not merely ask if the two variances are different, but instead seeks to answer the question, “Is the variance due to lack of fit significantly greater than the variance due to purely experimental uncertainty?” The critical value of F at a given level of probability may be obtained from an appropriate F-table (see Appendix C for values at the 95% level of confidence). The calculated and critical F values are compared:

(1) If $F_{calc} > F_{crit}$, $s_{lof}^2$ is significantly greater than $s_{pe}^2$ and the null hypothesis can be rejected at the specified level of confidence. The model $y_{1i} = 0 + r_{1i}$ would thus exhibit a significant lack of fit.

(2) If $F_{calc} < F_{crit}$, there is no reason to believe that $s_{lof}^2$ is significantly greater than $s_{pe}^2$, and the null hypothesis cannot be rejected at the specified level of confidence. We would therefore retain as adequate the model $y_{1i} = 0 + r_{1i}$.

For the numerical data in Figure 6.1 and for the model $y_{1i} = 0 + r_{1i}$, the variance of residuals is obtained as follows.
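$$SS_r = \sum_{i=1}^{n} r_{1i}^2 = (3-0)^2 + (5-0)^2 = 34 \tag{6.28}$$
$$DF_r = n = 2 \tag{6.29}$$
$$s_r^2 = SS_r / DF_r = 34/2 = 17 \tag{6.30}$$
The purely experimental uncertainty variance is estimated as
$$SS_{pe} = \sum_{i=1}^{m} (y_{1i} - \bar{y}_{1i})^2 = (3-4)^2 + (5-4)^2 = 2 \tag{6.31}$$
$$DF_{pe} = m - 1 = 2 - 1 = 1 \tag{6.32}$$
$$s_{pe}^2 = SS_{pe} / DF_{pe} = 2/1 = 2 \tag{6.33}$$
Finally, the variance due to lack of fit may be obtained from
$$SS_{lof} = \sum_{i=1}^{n} (\bar{y}_{1i} - \hat{y}_{1i})^2 = (4-0)^2 + (4-0)^2 = 32 \tag{6.34}$$
An alternative method of obtaining $SS_{lof}$ provides the same result and demonstrates the additivity of sums of squares and degrees of freedom.
$$SS_{lof} = SS_r - SS_{pe} = 34 - 2 = 32 \tag{6.35}$$
$$DF_{lof} = DF_r - DF_{pe} = 2 - 1 = 1 \tag{6.36}$$
$$s_{lof}^2 = SS_{lof} / DF_{lof} = 32/1 = 32 \tag{6.37}$$
Finally,
$$F_{(1,1)} = s_{lof}^2 / s_{pe}^2 = 32/2 = 16 \tag{6.38}$$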
At first glance, this ratio might appear to be highly significant. However, the critical value of F at the 95% level of confidence for one degree of freedom in the numerator and one degree of freedom in the denominator is 161 (see Appendix C). Thus, the critical value is not exceeded and the null hypothesis is not rejected. We must retain as adequate the model $y_{1i} = 0 + r_{1i}$.

It is interesting to note that for the example we have been using (data from Figure 6.1 and the models $y_{1i} = 0 + r_{1i}$ and $y_{1i} = \beta_0 + r_{1i}$), the critical value of F is equal to the square of the critical value of t, and the calculated value of F is equal to the square of the calculated value of t. For a given level of confidence, Student’s t values are, in fact, identical to the square root of the corresponding F values with one degree of freedom in the numerator. For these simple models and this particular experimental design, the F-test for the adequacy of the model $y_{1i} = 0 + r_{1i}$ is equivalent to the t-test for the significance of the parameter $b_0$ in the model $y_{1i} = \beta_0 + r_{1i}$. However, for more complex models and different experimental designs, the two tests are not always equivalent. The t-test can be used to test the significance of a single parameter. The F-test can also be used to test the significance of a single parameter, but as we shall see, it is more generally useful as a means of testing the significance of a set of parameters, or testing the lack of fit of a multiparameter model.
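The arithmetic of this worked example can be retraced in a few lines. The sketch below assumes the two replicate responses 3 and 5 that appear in the sums above and the model prediction of zero; the variable names are illustrative and not taken from the text.

```python
# Two replicate responses at one factor level (the values used in the sums above).
responses = [3.0, 5.0]
predicted = 0.0  # the model y_1i = 0 + r_1i predicts zero for every experiment

n = len(responses)
mean_response = sum(responses) / n

# Sum of squares of residuals; no parameters are fitted, so DF_r = n.
ss_r = sum((y - predicted) ** 2 for y in responses)
df_r = n

# Purely experimental uncertainty: scatter of the replicates about their own mean.
ss_pe = sum((y - mean_response) ** 2 for y in responses)
df_pe = n - 1

# Lack of fit: distance from the replicate mean to the model prediction,
# counted once for each experiment.
ss_lof = sum((mean_response - predicted) ** 2 for _ in responses)
df_lof = df_r - df_pe

s2_pe = ss_pe / df_pe
s2_lof = ss_lof / df_lof
f_calc = s2_lof / s2_pe

print(ss_r, ss_pe, ss_lof)  # 34.0 2.0 32.0
print(f_calc)               # 16.0
```

The sums of squares add (32 + 2 = 34) and so do the degrees of freedom (1 + 1 = 2), which is the additivity the text points out.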
6.6 Level of confidence

In many situations it is appropriate to decide before a test is made the risk one is willing to take that the null hypothesis will be rejected when it is actually true. If an experimenter wishes to be wrong no more than one time in twenty, the risk $\alpha$ is set at 0.05 and the test has “95% confidence”. The calculated value of t or F is compared to the critical 95% threshold value found in tables: if the calculated value is equal to or greater than the tabular value, the null hypothesis can be rejected with a confidence equal to or greater than 95%.

If the null hypothesis can be rejected on the basis of a 95% confidence test, then the risk of falsely rejecting the null hypothesis is at most 0.05, but might be much less. We don’t know how much less it is unless we look up the critical value for, say, 99% confidence and find that the null hypothesis cannot be rejected at that high a level of confidence; the risk would then be somewhere between 0.01 and 0.05, but further narrowing with additional tables between 95% and 99% would be necessary to define the exact risk more precisely.

Similarly, if the null hypothesis cannot be rejected at the 95% level of confidence it does not mean that the quantity being tested is insignificant. Perhaps the null hypothesis could have been rejected at the 90% level of confidence. The quantity would still be rather significant, with a risk somewhere between 0.05 and 0.10 of having falsely rejected the null hypothesis.

In other situations it is not necessary to decide before a test is made the risk one is willing to take. Such a situation is indicated by the subtly different question, “What are my chances of being right if I reject this null hypothesis?” In this case, it is desirable to assign an exact level of confidence to the quantity being tested. Such a level of confidence would then designate the estimated level of risk associated with rejecting the null hypothesis. There are today computer programs that will accept a calculated value of t or F and the associated degrees of freedom and return the corresponding level of confidence or, equally, the risk. As an example, a calculated t value of 4 with one degree of freedom (see Equation 6.9) is significant at the 84.4% level of confidence ($\alpha$ = 0.156). Similarly, a calculated F value of 16 with one degree of freedom in the numerator and one degree of freedom in the denominator (see Equation 6.38) is also significant at the 84.4% level of confidence.

It is common to see the 95% level of confidence arbitrarily used as a threshold for statistical decision making. However, the price to be paid for this very conservative level of certainty is that many null hypotheses will not be rejected when they are in fact false.
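The 84.4% figure quoted above can be checked without tables in the special case of one degree of freedom, where Student's t coincides with the standard Cauchy distribution and has a closed-form cumulative distribution. The sketch below relies on that fact only; the function names are hypothetical, and the approach does not extend to other degrees of freedom.

```python
import math


def confidence_from_t_df1(t_value):
    """Two-tailed level of confidence for a t value with ONE degree of freedom.

    With one degree of freedom, t follows the standard Cauchy distribution,
    so P(|T| <= t) = (2 / pi) * atan(|t|).
    """
    return 2.0 / math.pi * math.atan(abs(t_value))


def critical_t_df1(confidence):
    """Two-tailed critical t for one degree of freedom (inverse of the above)."""
    return math.tan(confidence * math.pi / 2.0)


def confidence_from_f_1_1(f_value):
    """F with (1, 1) degrees of freedom is the square of t with 1 degree of freedom."""
    return confidence_from_t_df1(math.sqrt(f_value))


conf_t = confidence_from_t_df1(4.0)
conf_f = confidence_from_f_1_1(16.0)
t_crit = critical_t_df1(0.95)

print(round(100 * conf_t, 1))  # 84.4
print(round(1 - conf_t, 3))    # 0.156
print(round(t_crit, 3))        # 12.706
print(round(t_crit ** 2, 1))   # 161.4
```

The last two values correspond to the tabulated critical t and the critical F of about 161 used in Section 6.5, which illustrates the t-squared relationship for one numerator degree of freedom.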
A relevant example is the so-called “screening” of factors to see which ones are “significant” (see Section 1.2 and Table 1.1). In selecting for further investigation factors that are significant at a given level of probability, the investigator is assured that those factors will probably be useful in improving the response. But this “efficiency” is gained at the expense of omitting factors that might also be significant. Ideally, the question that should be asked is not, “What factors are significant at the P level of probability?” but rather, “What factors are insignificant at the P level of probability?” The number of factors retained when using the second criterion will in general be larger than the number retained when using the first; the investigator will, however, be assured that he is probably not omitting from investigation any factors that are important. Unfortunately, this type of statistical testing requires a very different type of experimentation and requires a different set of criteria for hypothesis testing (see Figure 6.3). An alternative is to relax the requirement for confidence in rejecting the null hypothesis.

The widespread use of the 95% level of confidence can be directly linked to Fisher’s opinion that scientists should be allowed to be wrong no more than one time in 20 [Moore (1979)]. There are many areas of decision making, however, where the arbitrary threshold level of 95% confidence is possibly too low (e.g., a doctor’s confidence that a patient will survive an unnecessary but life-threatening operation) or too high (e.g., one’s confidence in winning even money on a bet). Proper assessment of the risk one is willing to take and an exact knowledge of the level of significance of calculated statistics can lead to better decision making.
Exercises

6.1 Hypothesis testing. Suppose a researcher believes a system behaves according to the model $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$ over the domain $0 \le x_1 \le 10$. Suggest a set of ten experiments that might be useful in either disproving the researcher’s hypothesis or increasing the confidence that the researcher’s hypothesis is correct. Give reasons for your chosen design.

6.2 Null hypothesis. Write a null hypothesis that might be useful for testing the hypothesis that $b_0 = 13.62$. What is the alternative hypothesis?

6.3 Rejection of null hypothesis. Suppose you are told that a box contains one marble, and that it is either red or blue or green. You are asked to guess the color of the marble, and you guess red. What is your null hypothesis? What is the alternative hypothesis? You are now told that the marble in the box is not red. What might be the color of the marble?

6.4 Quality of information. If the person giving you information in Problem 6.3 is lying, or might be lying, what might be the color of the marble? What is the difference between direct evidence and hearsay evidence? Suppose the box has a false bottom and someone under the table is changing the marbles in the box. Is this relevant to the concept of an analytic study as opposed to an enumerative study?

6.5 Confidence intervals. Calculate the 95% confidence interval about $b_0 = 213.92$ if $s_{b_0} = 5.12$ is based on five degrees of freedom.

6.6 Level of confidence. If the null hypothesis is “rejected at the 95% level of confidence”, why can we be at least 95% confident about accepting the alternative hypothesis?

6.7 Risk and level of confidence. The relationship between the risk, $\alpha$, of falsely rejecting the null hypothesis and the level of confidence, P, placed in the alternative hypothesis is $P = 100(1 - \alpha)\%$. If the null hypothesis is rejected at the 87% level of confidence, what is the risk that the null hypothesis was rejected falsely?
6.8 Statistical significance. Criticize the following statement: “The results of this investigation were shown to be statistically significant”.

6.9 Threshold of significance. State five null hypotheses that could be profitably tested at less than the 95% level of confidence. State five null hypotheses that you would prefer be tested at greater than the 95% level of confidence.

6.10 Confidence intervals. Draw a figure similar to Figure 6.6 showing the following confidence intervals for the data in Figure 6.1 (see Equation 6.6): 50%, 80%, 90%, 95%, 99%, 99.9%.

6.11 One-sided t-test. If a null hypothesis is stated $H_0: b_0 = 0$ with an alternative hypothesis $H_a: b_0 \neq 0$, then the t-test based on Equation 6.8 will provide an exact decision. If the null hypothesis is rejected, then $b_0 \neq 0$ is accepted as the alternative hypothesis at the specified level of confidence. Values of $b_0$ significantly greater or less than zero would be accepted. Suppose, however, that the alternative hypothesis is $H_a: b_0 > 0$. Values of $b_0$ significantly less than zero would not satisfy the alternative hypothesis. If we did disprove the null hypothesis and $b_0$ were greater than zero, then we should be “twice” as confident about accepting the alternative hypothesis (or we should have only half the risk of being wrong). If $t_{crit}$ is obtained from a regular “two-tailed” t-table specifying a risk $\alpha$, then the level of confidence in the test is $100(1 - \alpha/2)\%$. If the null and alternative hypotheses in Problem 6.5 are $H_0: b_0 = 0$ and $H_a: b_0 > 0$, what should $\alpha$ be so the level of confidence associated with rejecting $H_0$ and accepting $H_a$ will be at least 95%?

6.12 t-tables. Go to the library and find eight or ten books that contain t-tables (critical values of t at specified levels of confidence and different degrees of freedom). Look at the value for 2 degrees of freedom and a “percentage point” (or “percentile”, or “probability point”, or “percent probability level”, or “confidence level”, or “probability”) of 0.95 (or 95%). How many tables give t = 2.920? How many give t = 4.303? Do those that give t = 2.920 at the 0.95 confidence level give t = 4.303 at the 0.975 confidence level? How many of the tables indicate that they are one-tailed tables? How many indicate that they are two-tailed tables? How many give no indication? Are t-values at the $100(1 - \alpha)\%$ level of confidence from two-tailed tables the same as t-values at the $100(1 - \alpha/2)\%$ level of confidence from one-tailed tables? In the absence of other information, how can you quickly tell if a t-table is for one- or two-tailed tests? [Hint: look at the level of confidence associated with the t-value of 1.000 at one degree of freedom.]
6.13 Sums of squares. Assume the model $y_{1i} = 0 + r_{1i}$ is used to describe the nine data points in Section 3.1. Calculate directly the sum of squares of residuals, the sum of squares due to purely experimental uncertainty, and the sum of squares due to lack of fit. How many degrees of freedom are associated with each sum of squares? Do $SS_{lof}$ and $SS_{pe}$ add up to give $SS_r$? Calculate $s_{lof}^2$ and $s_{pe}^2$. What is the value of the Fisher F-ratio for lack of fit (Equation 6.27)? Is the lack of fit significant at or above the 95% level of confidence?

6.14 True values. In some textbooks, a confidence interval is described as the interval within which there is a certain probability of finding the true value of the estimated quantity. Does the term “true” used in this sense indicate the statistical population value (e.g., $\mu$ if one is estimating a mean) or the bias-free value (e.g., 6.21% iron in a mineral)? Could these two interpretations of “true value” be a source of misunderstanding in conversations between a statistician and a geologist?

6.15 Sums of squares and degrees of freedom. Fit the model $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$ to the data $x_{11} = 3$, $y_{11} = 2$, $x_{12} = 3$, $y_{12} = 4$, $x_{13} = 6$, $y_{13} = 6$, $x_{14} = 6$, $y_{14} = 4$. Calculate the sum of squares of residuals, the sum of squares due to purely experimental uncertainty, and the sum of squares due to lack of fit. How many degrees of freedom are associated with each sum of squares? Can $s_{lof}^2$ be calculated? Why or why not?

6.16 Null hypotheses. State the following in terms of null and alternative hypotheses: a) the mean of a data set is equal to zero; b) the mean of a data set is equal to six; c) the mean of a data set is equal to or less than six; d) the variance of one data set is not different from the variance of a second data set; e) the variance of one data set is equal to or less than the variance of a second data set.

6.17 Distributions of means and standard deviations. Write the following numbers on 31 slips of paper, one number to each piece of paper, and place them in a container: 7, 8, 9, 9, 10, 10, 10, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 14, 14, 14, 15, 15, 16, 17. Carry out the following experiment 25 times: mix up the slips of paper, randomly draw out five pieces of paper, calculate the mean and variance of residuals of the numbers drawn, and put the pieces of paper back in the container. Round the values of the mean to the nearest 0.5 unit and the values of the variances to the nearest 0.5 unit and plot as frequency histograms. Is the histogram for the means roughly gaussian? Is the histogram for the variance of residuals roughly gaussian? What is the smallest value $s_r^2$ could assume? What is the largest value? What value appeared most frequently for $s_r^2$? [The values of $s_r^2$ are distributed according to the chi-square distribution, $\chi^2$, which is skewed. See, for example, Mandel (1964).]

6.18 Equivalence of t- and F-values. Compare the t-values in Appendix B with the square root of the F-values in Appendix C with one degree of freedom in the numerator.

6.19 Null hypothesis. Suppose you see a dog and hypothesize, “This animal is a stray and has no home”. What might you do to try to prove this hypothesis? What might you do to try to disprove it? Which would be easier, proving the hypothesis or disproving it?

6.20 Lack of fit. What is the relationship between Figures 3.5 and 6.9 (see Problem 3.21)?

6.21 Confidence intervals. The confidence interval of the mean is sometimes written $\bar{y} \pm t s / (n^{1/2})$. How is this related to Equation 6.5?
CHAPTER 7

The Variance-Covariance Matrix

In Section 6.2, the standard uncertainty of the parameter estimate $b_0$ was obtained by taking the square root of the product of the purely experimental uncertainty variance estimate, $s_{pe}^2$, and the $(X'X)^{-1}$ matrix (see Equation 6.3). A single number was obtained because the single-parameter model being considered ($y_{1i} = \beta_0 + r_{1i}$) produced a 1×1 $(X'X)^{-1}$ matrix. For the general, multiparameter case, the product of the purely experimental uncertainty variance estimate, $s_{pe}^2$, and the $(X'X)^{-1}$ matrix gives the estimated variance-covariance matrix, V.
$$V = s_{pe}^2 (X'X)^{-1} \tag{7.1}$$
Each of the upper left to lower right diagonal elements of V is an estimated variance of a parameter estimate, $s_b^2$; these elements correspond to the parameters as they appear in the model from left to right. Each of the off-diagonal elements is an estimated covariance between two of the parameter estimates [Dunn and Clark (1987)].

Thus, for a single-parameter model such as $y_{1i} = \beta_0 + r_{1i}$, the estimated variance-covariance matrix contains no covariance elements; the square root of the single variance element corresponds to the standard uncertainty of the single parameter estimate.

In this chapter, we will examine the variance-covariance matrix to see how the location of experiments in factor space (i.e., the experimental design) affects the individual variances and covariances of the parameter estimates. Throughout this section we will be dealing with the specific two-parameter first-order model $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$ only; the resulting principles are entirely general, however, and can be applied to all other linear models. We will also assume that we have a prior estimate of $\sigma_{pe}^2$ for the system under investigation and that the variance is homoscedastic (see Section 3.3). Our reason for assuming the availability of an estimate of $\sigma_{pe}^2$ is to obviate the need for replication in the experimental design so that the effect of the location of the experiments in factor space can be discussed by itself.
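7.1 Influence of the experimental design

Each element of the $(X'X)$ matrix is a summation of products (see Appendix A). A common algebraic representation of the $(X'X)$ matrix for the straight-line model $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$ is
$$(X'X) = \begin{bmatrix} n & \sum_{i=1}^{n} x_{1i} \\ \sum_{i=1}^{n} x_{1i} & \sum_{i=1}^{n} x_{1i}^2 \end{bmatrix} \tag{7.2}$$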
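Let $s_{b_0}^2$ be the estimated variance associated with the parameter estimate $b_0$; let $s_{b_1}^2$ be the estimated variance associated with $b_1$; and let $s_{b_0 b_1}^2$ (or $s_{b_1 b_0}^2$) represent the estimated covariance between $b_0$ and $b_1$. Then
$$V = \begin{bmatrix} s_{b_0}^2 & s_{b_0 b_1}^2 \\ s_{b_1 b_0}^2 & s_{b_1}^2 \end{bmatrix} = s_{pe}^2 (X'X)^{-1} = s_{pe}^2 \begin{bmatrix} n & \sum_{i=1}^{n} x_{1i} \\ \sum_{i=1}^{n} x_{1i} & \sum_{i=1}^{n} x_{1i}^2 \end{bmatrix}^{-1} \tag{7.3}$$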
From Equation 7.3 it is evident that the experimental design (i.e., the level of each $x_{1i}$) has a direct effect on the variances and covariances of the parameter estimates. The effect on the variance-covariance matrix of two experiments located at different positions in factor space can be investigated by locating one experiment at $x_{11} = 1$ and varying the location of the second experiment. The first row of the matrix of parameter coefficients for the model $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$ can be made to correspond to the fixed experiment at $x_1 = 1$.
(7.4)
Let us now locate a series of “second experiments” from $x_{12} = -5$ to $x_{12} = +5$. The X matrix for this changing experimental design may be represented as
$$X = \begin{bmatrix} 1 & 1 \\ 1 & x_{12} \end{bmatrix} \tag{7.5}$$
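Three numerical examples of the calculation of the $(X'X)^{-1}$ matrix are given here.

Example 7.1: $x_{12} = -4$
$$(X'X) = \begin{bmatrix} 1 & 1 \\ 1 & -4 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -4 \end{bmatrix} = \begin{bmatrix} 2 & -3 \\ -3 & 17 \end{bmatrix} \tag{7.6}$$
$$(X'X)^{-1} = \begin{bmatrix} 17/25 & 3/25 \\ 3/25 & 2/25 \end{bmatrix} \tag{7.7}$$

Example 7.2: $x_{12} = +0.5$
$$(X'X) = \begin{bmatrix} 1 & 1 \\ 1 & 0.5 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 0.5 \end{bmatrix} = \begin{bmatrix} 2 & 1.5 \\ 1.5 & 1.25 \end{bmatrix} \tag{7.8}$$
$$(X'X)^{-1} = \begin{bmatrix} 5 & -6 \\ -6 & 8 \end{bmatrix} \tag{7.9}$$

Example 7.3: $x_{12} = +4$
$$(X'X) = \begin{bmatrix} 1 & 1 \\ 1 & 4 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 4 \end{bmatrix} = \begin{bmatrix} 2 & 5 \\ 5 & 17 \end{bmatrix} \tag{7.10}$$
$$(X'X)^{-1} = \begin{bmatrix} 17/9 & -5/9 \\ -5/9 & 2/9 \end{bmatrix} \tag{7.11}$$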
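The three examples can be reproduced with exact rational arithmetic. The sketch below builds $(X'X)$ from the sums in the algebraic form given for the straight-line model and inverts the 2×2 matrix directly; the function name `xtx_inverse` is illustrative, not part of any library.

```python
from fractions import Fraction


def xtx_inverse(x_levels):
    """Return (X'X)^-1 for the model y = b0 + b1*x as a 2x2 tuple of Fractions.

    X'X has elements n, sum(x), sum(x), sum(x^2); its inverse is the
    adjugate divided by the determinant n*sum(x^2) - sum(x)^2.
    """
    xs = [Fraction(str(x)) for x in x_levels]
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    det = n * sum_x2 - sum_x * sum_x
    if det == 0:
        raise ValueError("X'X is singular: all experiments are at the same level")
    return ((sum_x2 / det, -sum_x / det),
            (-sum_x / det, Fraction(n) / det))


# One experiment fixed at x11 = 1, second experiment at x12 = -4, +0.5, +4.
for x12 in (-4, 0.5, 4):
    inverse = xtx_inverse([1, x12])
    # Lower right element: variance element for the slope.
    # Off-diagonal element: covariance element.
    print(x12, inverse[1][1], inverse[0][1])
```

Placing the second experiment at the same level as the first (x12 = 1) makes the determinant zero, which the function reports as an error rather than returning undefined elements.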
7.2 Effect on the variance of $b_1$

Let us first consider the effect of these experimental designs on the uncertainty of estimating the parameter $\beta_1$. This parameter represents the slope of a straight line relationship between $y_1$ and $x_1$. There is an intuitive feeling that for a given uncertainty in response, the farther apart the two experiments are located, the more precise the estimate of $\beta_1$ should be. An analogy is often made with the task of holding a meter stick at a particular angle using both hands: if the meter stick is grasped at the 5- and 95-cm marks, a small amount of up-and-down “jitter” or uncertainty in the hands will cause the stick to move from the desired angle, but only slightly; if the meter stick is held at the 45- and 55-cm marks, the same vertical jitter or uncertainty in the hands will be “amplified” and could cause the stick to wiggle quite far from the desired angle. In the first case, there would be only a slight uncertainty in the slope of the meter stick; in the second case, there would be much greater uncertainty in the slope of the meter stick.

These intuitive feelings are confirmed in the calculations of Examples 7.1-7.3. When the two experimental points are relatively far apart, as they are in Example 7.1 ($x_{11} = +1$, $x_{12} = -4$, distance apart = 5), the value of the element for the estimated variance of $b_1$ is relatively small, only 2/25 (the lower right hand element of the $(X'X)^{-1}$ matrix in Example 7.1). When the experiments are located closer in factor space, as they are in Example 7.2 ($x_{11} = +1$, $x_{12} = +0.5$, distance apart = 0.5), the value of the element for the estimated variance of $b_1$ is relatively large, 8 in Example 7.2. When the experiments are once again located far apart in factor space, as they are in Example 7.3 ($x_{11} = +1$, $x_{12} = +4$, distance apart = 3), the value of the element for the estimated variance of $b_1$ is again relatively small, 2/9 in Example 7.3.

A more complete picture is shown graphically in Figure 7.1, where the value of the element for the estimated variance of $b_1$ is shown as a function of the location of the second experiment. The stationary experiment is shown by a dot at $x_{11} = +1$. As the two experiments are located farther and farther apart, the uncertainty in the slope of the straight line decreases. Note that the curve in Figure 7.1 is symmetrical about the fixed experiment at $x_{11} = +1$.
7.3 Effect on the variance of $b_0$

Let us now consider the effect of experimental design on the uncertainty of estimating the parameter $\beta_0$. This parameter is interpreted as the $y_1$-intercept (at $x_1 = 0$) of the straight line $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$. Imagine that you are given a very long, rigid stick and are asked to position that stick as close to a given target point as possible. You can grasp the stick with your hands anywhere along its length, but your hands must always be one meter apart, thus keeping constant the amount of jitter in the slope of the stick. Suppose the target point is the corner of a desk immediately in front of you. If you position the stick so that each hand is equidistant from the desk corner, you would be able to make the stick touch the target point. Any jitter in your hands would not have much effect on the location of the stick at the desk corner. If, however, you walk 20 feet away from the desk, you would probably not be able to position the stick very precisely: any jitter in your hands would be amplified along the extended stick and cause considerable wiggling of the stick near the target point.

This effect is shown in Figure 7.2, where the upper left element of the $(X'X)^{-1}$ matrix (the element used to estimate the variance of $b_0$) is plotted as a function of the center of various two-experiment designs. The worst case illustrated involves two experiments separated by two units in the factor $x_1$: as the design is moved away from $x_1 = 0$, the uncertainty in $b_0$ begins to rise steeply. The best case illustrated is for two experiments separated by ten units in the factor $x_1$: the uncertainty in $b_0$ increases very slowly as the center of the design is moved away from $x_1 = 0$. Note that in all cases, the element of $(X'X)^{-1}$ associated with the variance of $b_0$ is equal to its minimum value of 0.5 when the design is centered at $x_1 = 0$. Note also that the
Figure 7.1 Value of the element of $(X'X)^{-1}$ for the estimated variance of $b_1$ as a function of the location of a second experiment, one experiment fixed at $x_1 = 1$. (Horizontal axis: location of second experiment.)
same element of $(X'X)^{-1}$ is equal to 1 when the design is centered so that one of the experiments is located at $x_1 = 0$. Thus, the uncertainty in $b_0$ is influenced by the uncertainty in the slope of the straight line relationship $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$; for a given uncertainty in the slope, the uncertainty in $b_0$ becomes larger as the center of the experimental design moves farther away from $x_1 = 0$.

The combined effects of both the distance between the two experimental points and the location of the center of the design are seen in Figure 7.3, where one of the points is located at $x_{11} = +1$ (see Section 7.1) and the second point is allowed to vary along the $x_1$ factor axis (see Equation 7.5). If the two experiments are located close together, the estimated uncertainty in $b_0$ is seen to be very great. The uncertainty goes through a minimum when the second experiment is carried out at $x_{12} = -1$, as expected from Figure 7.2. If $x_{12}$ is moved to values more negative than -1, even though the slope of the straight line model becomes more stable, the estimated uncertainty in $b_0$ increases because of the overriding effect caused by the center of the design moving away from the point $x_1 = 0$. If $x_{12}$ is moved to values more positive than -1, the estimated uncertainty in $b_0$ increases not only because the center of the design moves away from $x_1 = 0$, but also because the experimental points are getting closer together, resulting in greater jitter of the slope of the straight line model at $x_1 = 0$. When $x_{12} = +1$, the elements of the $(X'X)^{-1}$ matrix are undefined (the determinant goes to zero). If $x_{12}$ is moved to values more positive than +1, the slope of the straight line model gains stability and the $y_1$-intercept again becomes more certain, even though the center of the design is moving farther from $x_1 = 0$.
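The behavior described in Sections 7.2 and 7.3 can be summarized in closed form for any two-experiment design. The symbols c (center of the design) and h (half the distance between the two experiments) are introduced here for illustration only and do not appear in the text; the result follows from inverting the 2×2 $(X'X)$ matrix.

```latex
% Two experiments at x_{11} = c - h and x_{12} = c + h, with h \neq 0
\begin{align*}
(X'X) &= \begin{bmatrix} 2 & 2c \\ 2c & 2c^2 + 2h^2 \end{bmatrix},
\qquad \det(X'X) = 4h^2 \\[4pt]
(X'X)^{-1} &= \begin{bmatrix}
  \dfrac{1}{2} + \dfrac{c^2}{2h^2} & -\dfrac{c}{2h^2} \\[8pt]
  -\dfrac{c}{2h^2} & \dfrac{1}{2h^2}
\end{bmatrix}
\end{align*}
```

The slope element depends only on the separation, the intercept element equals 0.5 when c = 0 and equals 1 when one experiment sits at $x_1 = 0$ (c = ±h), and the off-diagonal element has the opposite sign to c. For Example 7.1 (c = -1.5, h = 2.5) the expressions give 17/25, 3/25, and 2/25.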
Figure 7.2 Value of the element of $(X'X)^{-1}$ for the estimated variance of $b_0$ as a function of the center of five different experimental designs, one each with $\Delta x_1$ = 2, 4, 6, 8, and 10. See text for discussion. (Horizontal axis: center of two-experiment design.)
7.4 Effect on the covariance of $b_0$ and $b_1$

Figure 7.4 plots the value of either of the off-diagonal elements (they are theoretically and numerically identical) of the $(X'X)^{-1}$ matrix as a function of the location of the second experiment. As stated previously, the off-diagonal elements of the $(X'X)^{-1}$ matrix are associated with the covariance of the parameter estimates; for this model, the off-diagonals of the matrix represent the covariance between $b_0$ and $b_1$ (or, identically, between $b_1$ and $b_0$).

To discover what covariance is, and what it means when the covariance is negative, let us assume that we have carried out the two experiments at $x_{11} = +1$ and $x_{12} = -4$ (see Example 7.1). The off-diagonal element of the $(X'X)^{-1}$ matrix (the covariance between $b_0$ and $b_1$) is positive 3/25. Assume also that we have obtained responses of $y_{11} = 4.5$ and $y_{12} = 2.0$. The first-order relationship through these points is shown as a solid line in Figure 7.5. For purposes of illustration, suppose the experiments are repeated twice, and values of $y_{13} = 4.75$, $y_{14} = 1.75$, $y_{15} = 4.25$, and $y_{16} = 2.25$ are obtained by chance because of uncertainty in the response; the corresponding relationships are shown as dashed lines in Figure 7.5. For these sets of data, both the slope and the $y_1$-intercept increase and decrease together. The same tendency exists for most other sets of experimental data obtained at $x_1 = +1$ and $x_1 = -4$: as the slope through these points increases, the intercept at $x_1 = 0$ will probably
Figure 7.3 Value of the element of $(X'X)^{-1}$ for the estimated variance of $b_0$ as a function of the location of a second experiment, one experiment fixed at $x_1 = 1$. (Horizontal axis: location of second experiment.)
increase; as the intercept increases, the slope will probably increase. Similarly, as one parameter decreases, the other will decrease. As one parameter estimate varies, so varies the other: the covariance is positive.

Assume now that we have carried out our two experiments at $x_{11} = +1$ and $x_{12} = +4$ (see Example 7.3). The off-diagonal element of the $(X'X)^{-1}$ matrix is negative 5/9. Assume also that we have obtained responses of $y_{11} = 4.5$ and $y_{12} = 6.0$. The first-order relationship through these points is shown as a solid line in Figure 7.6. Again, for purposes of illustration, suppose the experiments are repeated twice, and values of $y_{13} = 4.35$, $y_{14} = 6.15$, $y_{15} = 4.65$, and $y_{16} = 5.85$ are obtained; the corresponding relationships are shown as dashed lines in Figure 7.6. For these sets of data, the slope and the $y_1$-intercept do not tend to increase and decrease together. Instead, each parameter estimate varies oppositely as the other: the covariance is negative.

Let us now assume that we have carried out our two experiments at $x_{11} = +1$ and $x_{12} = -1$ and that we have obtained responses of $y_{11} = 4.5$ and $y_{12} = 3.5$. When the experiments are repeated twice, we obtain $y_{13} = 4.4$, $y_{14} = 3.6$, $y_{15} = 4.6$, and $y_{16} = 3.4$; the corresponding relationships are shown in Figure 7.7. Note that in this example, the intercept does not depend on the slope, and the slope does not depend on the intercept: the covariance is zero. As an illustration, if $x_{11} = +1$ and $x_{12} = -1$, then

Example 7.4: $x_{12} = -1$
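$$(X'X) = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \tag{7.12}$$
$$(X'X)^{-1} = \begin{bmatrix} 1/2 & 0 \\ 0 & 1/2 \end{bmatrix} \tag{7.13}$$
This is also seen in Figure 7.4, in which the covariance goes to zero when the second experiment is located at $x_1 = -1$.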
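The three illustrations of positive, negative, and zero covariance can be retraced by passing a straight line through each pair of responses quoted above. The helper `line_through` is a hypothetical name; exact fractions are used so the slopes and intercepts are not blurred by rounding.

```python
from fractions import Fraction


def line_through(x_a, y_a, x_b, y_b):
    """Slope and intercept (b1, b0) of the straight line through two points."""
    x_a, y_a, x_b, y_b = (Fraction(str(v)) for v in (x_a, y_a, x_b, y_b))
    slope = (y_b - y_a) / (x_b - x_a)
    intercept = y_a - slope * x_a
    return slope, intercept


# Location of the second experiment (first is at x = 1) and the three
# pairs of responses (y at x = 1, y at the second location) from the text.
designs = {
    -4: [(4.5, 2.0), (4.75, 1.75), (4.25, 2.25)],
    4: [(4.5, 6.0), (4.35, 6.15), (4.65, 5.85)],
    -1: [(4.5, 3.5), (4.4, 3.6), (4.6, 3.4)],
}

for x12, pairs in designs.items():
    fits = [line_through(1, y_first, x12, y_second) for y_first, y_second in pairs]
    print(x12, [(float(b1), float(b0)) for b1, b0 in fits])
# x12 = -4: slopes 0.5, 0.6, 0.4 with intercepts 4.0, 4.15, 3.85 (move together)
# x12 = +4: slopes 0.5, 0.6, 0.4 with intercepts 4.0, 3.75, 4.25 (move oppositely)
# x12 = -1: slopes 0.5, 0.4, 0.6 with intercept 4.0 every time
```

In the third design the intercept stays fixed while the slope changes, which matches the zero off-diagonal element found in Example 7.4.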
7.5 Optimal design

Consideration of the effect of experimental design on the elements of the variance-covariance matrix leads naturally to the area of optimal design [Box, Hunter, and Hunter (1978), Evans (1979), and Wolters and Kateman (1990)]. Let us suppose that our purpose in carrying out two experiments is to obtain good estimates of the intercept and slope for the model $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$. We might want to know what levels of the factor $x_1$ we should use to obtain the most precise estimates of $\beta_0$ and $\beta_1$.

The most precise estimate of $\beta_0$ is obtained when the two experiments are located such that the center of the design is at $x_1 = 0$ (see Figure 7.2). Any other arrangement
Figure 7.4 Value of the element of $(X'X)^{-1}$ for the estimated covariance between $b_0$ and $b_1$ as a function of the location of a second experiment, one experiment fixed at $x_1 = 1$. (Horizontal axis: location of second experiment.)
127
J -5
,
,
,
-4
-3
-2
, -1
,
,
1 2 Level o f F a c t o r X I
/
/
3
4
5
Figure 7.5 Illustration of positive covariance: if the slope (b,)increases, the intercept (b,) increases; if the slope decreases, the intercept decreases.
of experiments would produce greater uncertainty in the estimate of Po. With two experiments, this optimal value of sh would be 0 . 5 ~ ~(see ; Equations 7.1 and 7.13). The most precise estimate of p, is obtained when the two experiments are located
t
J
1 Level of
2
3
4
5
F a c t o r XI
Figure 7.6 Illustrationof negative covariance: if the slope (b,)increases, the intercept (bo)decreases; if the slope decreases, the intercept increases.
-5
-4
-3
-2
-1
0
2
1
3
4
5
Level of Factor X 1
Figure 7.7 Illustration of zero covariance: if the slope ( b , )increases or decreases, the intercept (b,) does not change.
as far apart as possible. If the two experiments could be located at --oo and +=, there would be no uncertainty in the slope. Example 7.5: xI1= -00, xIz= +oo
(XX)=[
l][1 -a +or, 1
( X X ) - l = [ 1/2 0 ]
-a]=[ 2 +or, 0
]
0 co2
(7.14)
(7.15)
In most real situations, boundaries exist (see Section 2.3) and it is not possible to obtain an experimental design giving zero uncertainty in the slope. However, the minimum uncertainty in the slope will be obtained when the two experiments are located as far from each other as practical.
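The two design rules of this section can be checked numerically. The following is a minimal sketch, not part of the original text: the function name `xtx_inverse` and the third design (experiments at -5 and +5) are our own illustrative assumptions. It evaluates the elements of $(X'X)^{-1}$ for the straight-line model using exact fractions.

```python
from fractions import Fraction


def xtx_inverse(levels):
    """Return (X'X)^-1 for the model y = b0 + b1*x + r as exact fractions."""
    n = len(levels)
    xs = [Fraction(x) for x in levels]
    sum_x = sum(xs)
    sum_xx = sum(x * x for x in xs)
    det = n * sum_xx - sum_x * sum_x  # determinant of X'X
    if det == 0:
        raise ValueError("singular design: all experiments at one level")
    return [[sum_xx / det, -sum_x / det],
            [-sum_x / det, Fraction(n) / det]]


# Two designs taken from Chapter 7, plus one hypothetical wider design.
for design in [(1, 4), (1, -1), (-5, 5)]:
    inv = xtx_inverse(design)
    print(design, "var(b0):", inv[0][0], "cov:", inv[0][1], "var(b1):", inv[1][1])
```

Under these assumptions, centering the design at $x_1 = 0$ makes the covariance element vanish, and spreading the two experiments farther apart shrinks only the element associated with the slope.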
Exercises

7.1 Variance-covariance matrix. Calculate the variance-covariance matrix associated with the straight line relationship $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$ for the following data (see Section 11.2 for a definition of $D$):
D=
-4 -4 1 1 -4 1
Y=
4.24 2.25 4.75
(Hint: there are no degrees of freedom for lack of fit; therefore, $s_r^2 = s_{pe}^2$ by Equation 6.24.)

7.2 Variance of $b_1$. What is the value of the lower right element of the $(X'X)^{-1}$ matrix (the element for estimating the variance of $b_1$) when $x_{12} = 1$ in Figure 7.1? Why?

7.3 Variance-covariance relationships. Figure 7.2 implies that for the model $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$, if one experiment is placed at $x_1 = 0$, the position of the second experiment will have no influence on the estimated value of $s_{b_0}$. However, the position of the second experiment will affect the values of $s_{b_1}$ and $s_{b_0 b_1}$. Can you discover a relationship between $s_{b_1}$ and $s_{b_0 b_1}$ for this case?

7.4 Coding effects. Calculate the variance-covariance matrix associated with the straight line relationship $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$ for the following data set (assume $s_r^2 = s_{pe}^2$):
I![=.
Y=[
'!]
Subtract 2 from each value of $D$ and recalculate. Plot the data points and straight line for each case. Comment on the similarities and differences in the two variance-covariance matrices.

7.5 Asymmetry. Why are the curves in Figures 7.3 and 7.4 not symmetrical?

7.6 Variance-covariance matrix. Equation 7.1 is one of the most important relationships in the area of experimental design. As we have seen in this chapter, the precision of estimated parameter values is contained in the variance-covariance matrix $V$: the smaller the elements of $V$, the more precise will be the parameter estimates. As we shall see in Chapter 11, the precision of estimating the response surface is also directly related to $V$: the smaller the elements of $V$, the less "fuzzy" will be our view of the estimated surface. Equation 7.1 indicates that there are two requirements for a favorable variance-covariance matrix: small values of $s_{pe}^2$, and small elements of the $(X'X)^{-1}$ matrix. How can $s_{pe}^2$ be made small? How can the elements of the $(X'X)^{-1}$ matrix be made small?

7.7 Precision of estimation. If $s_{pe}^2 = 4$, how large should $n$ be so that $s_{b_0}$ will be less than or equal to 0.5?

7.8 Precision of estimation. If $s_{pe}^2 = 4$, how large should $n$ be so that the 95% confidence interval associated with $b_0$ in Equation 6.5 will be no wider than $b_0 \pm 1.5$?
CHAPTER 8
Three Experiments
In a single-factor system, there are three possible designs for three experiments. We will discuss each of these possibilities in order, beginning with the situation in which all experiments are carried out at the same level of the single factor, and concluding with the case in which each experiment is carried out at a different level.
8.1 All experiments at one level

Let us suppose we have carried out the three experiments

$x_{11} = 6$, $y_{11} = 3.0$
$x_{12} = 6$, $y_{12} = 5.0$
$x_{13} = 6$, $y_{13} = 3.7$

The model $y_{1i} = 0 + r_{1i}$

If we assume the model $y_{1i} = 0 + r_{1i}$, the data and uncertainties are as shown in Figure 8.1. We can test this model for lack of fit because there is replication in the experimental design, which allows an estimate of the purely experimental uncertainty (with two degrees of freedom).
$$\bar{y}_{1i} = (3.0 + 5.0 + 3.7)/3 = 3.9 \tag{8.1}$$

$$SS_{pe} = \sum_{i=1}^{n} (y_{1i} - \bar{y}_{1i})^2 = 2.06 \tag{8.2}$$

$$s_{pe}^2 = SS_{pe}/(n - 1) = 2.06/2 = 1.03 \tag{8.3}$$
Figure 8.1 Relationship of three replicate data points to the model $y_{1i} = 0 + r_{1i}$. [Horizontal axis: Level of Factor $X_1$.]
The sum of squares of residuals, with three degrees of freedom, is given in this case by

$$SS_r = R'R = \begin{bmatrix} 3.0 & 5.0 & 3.7 \end{bmatrix} \begin{bmatrix} 3.0 \\ 5.0 \\ 3.7 \end{bmatrix} = 9.00 + 25.00 + 13.69 = 47.69 \tag{8.4}$$

The sum of squares due to lack of fit, with one degree of freedom, may be obtained by difference

$$SS_{lof} = SS_r - SS_{pe} = 47.69 - 2.06 = 45.63 \tag{8.5}$$

or by direct calculation

$$SS_{lof} = \sum_{i=1}^{n} (\bar{y}_{1i} - \hat{y}_{1i})^2 = 45.63 \tag{8.6}$$

The estimate of the variance due to lack of fit is given by

$$s_{lof}^2 = SS_{lof}/(3 - 2) = 45.63 \tag{8.7}$$

An appropriate test for lack of fit is the F-test described in Section 6.5.

$$F_{(1,2)} = s_{lof}^2/s_{pe}^2 = 45.63/1.03 = 44.30 \tag{8.8}$$
The critical value of $F_{(1,2)}$ at the 95% level of confidence is 18.51 (see Appendix C). Thus, $F_{calc} > F_{crit}$, which allows us to state (with a risk of being wrong no more than one time in 20) that the model $y_{1i} = 0 + r_{1i}$ exhibits a significant lack of fit to the three data we have obtained. The level at which $F_{calc} = F_{crit}$ is 97.8% confidence, or a risk $\alpha = 0.022$. A better model is probably possible.
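The arithmetic of this lack-of-fit test can be retraced in a few lines. This is only a sketch under our own naming (the function `lack_of_fit_zero_model` does not appear in the text); it assumes all replicates were obtained at a single factor level, as in this section.

```python
def lack_of_fit_zero_model(responses):
    """F-test quantities for the model y = 0 + r, replicates at one level."""
    n = len(responses)
    mean = sum(responses) / n
    ss_pe = sum((y - mean) ** 2 for y in responses)  # purely experimental
    ss_r = sum((y - 0.0) ** 2 for y in responses)    # residuals about y-hat = 0
    ss_lof = ss_r - ss_pe                            # lack of fit, by difference
    df_pe = n - 1
    df_lof = n - df_pe  # no parameters are fitted, so the residuals have n df
    f_calc = (ss_lof / df_lof) / (ss_pe / df_pe)
    return ss_pe, ss_r, ss_lof, f_calc


ss_pe, ss_r, ss_lof, f_calc = lack_of_fit_zero_model([3.0, 5.0, 3.7])
print(round(ss_pe, 2), round(ss_r, 2), round(ss_lof, 2), round(f_calc, 2))
```

The calculated F-ratio would then be compared with the tabulated critical value quoted in the text (18.51 at the 95% level for one and two degrees of freedom).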
The model $y_{1i} = \beta_0 + r_{1i}$

If we fit the model $y_{1i} = \beta_0 + r_{1i}$ to this data, we will find that $b_0 = 3.9$. The sum of squares of residuals (with two degrees of freedom) is

$$SS_r = R'R = \begin{bmatrix} -0.9 & 1.1 & -0.2 \end{bmatrix} \begin{bmatrix} -0.9 \\ 1.1 \\ -0.2 \end{bmatrix} = 0.81 + 1.21 + 0.04 = 2.06 \tag{8.9}$$

The variance of residuals is

$$s_r^2 = SS_r/(n - 1) = 2.06/2 = 1.03 \tag{8.10}$$

The variance due to purely experimental uncertainty (with two degrees of freedom) is

$$s_{pe}^2 = \frac{\sum_{i=1}^{n} (y_{1i} - \bar{y}_{1i})^2}{n - 1} = 1.03 \tag{8.11}$$

Further,

$$s_{b_0} = [s_{pe}^2 (X'X)^{-1}]^{1/2} = [1.03 \times (1/3)]^{1/2} = [0.343]^{1/2} = 0.586 \tag{8.12}$$
We can test the significance of the parameter estimate $b_0$ by calculating the value of $t$ required to construct a confidence interval extending from $b_0$ and including the value zero (see Section 6.3).

$$t_{calc} = |b_0 - 0|/s_{b_0} = |3.9 - 0|/0.586 = 6.66 \tag{8.13}$$

The critical value of $t$ at the 95% level of confidence for two degrees of freedom is 4.30 (see Appendix B). Thus, $t_{calc} > t_{crit}$, which allows us to state (with a risk of being wrong no more than one time in 20) that the null hypothesis $H_0$: $\beta_0 = 0$ is disproved and that $\beta_0$ is different from zero. The level at which $t_{calc} = t_{crit}$ is 97.8% confidence, or a risk $\alpha = 0.022$.

It should be noted that in this case the sum of squares due to lack of fit must be zero because the sum of squares of residuals is equal to the sum of squares due to purely experimental uncertainty. This would seem to suggest that there is a perfect fit. In fact, however, $s_{lof}^2$ cannot be calculated and tested because there are no degrees of freedom for it: one degree of freedom has been used to estimate $\beta_0$, leaving only two degrees of freedom for the residuals; but the estimation of purely experimental uncertainty requires these two degrees of freedom, leaving no degrees of freedom for lack of fit (see Figure 6.10).
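The significance test for $b_0$ can be sketched in the same way. The function name `t_for_offset` and the constant `T_CRIT_95_2DF` are our own labels; the critical value itself is simply the tabulated figure quoted above, not something the code computes.

```python
import math

T_CRIT_95_2DF = 4.30  # tabulated value quoted in the text (Appendix B)


def t_for_offset(responses):
    """Return b0, its standard uncertainty, and t for H0: beta0 = 0."""
    n = len(responses)
    b0 = sum(responses) / n
    s2_pe = sum((y - b0) ** 2 for y in responses) / (n - 1)
    s_b0 = math.sqrt(s2_pe * (1.0 / n))  # (X'X)^-1 = 1/n for y = b0 + r
    t_calc = abs(b0 - 0.0) / s_b0
    return b0, s_b0, t_calc


b0, s_b0, t_calc = t_for_offset([3.0, 5.0, 3.7])
print(round(b0, 2), round(s_b0, 3), round(t_calc, 2), t_calc > T_CRIT_95_2DF)
```

For these data the square of the calculated $t$ equals the F-ratio obtained for the model $y_{1i} = 0 + r_{1i}$, which is consistent with both tests reporting the same 97.8% level of confidence.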
8.2 Experiments at two levels

Let us now assume that the first experiment was not carried out at $x_{11} = 6$, but rather was performed at $x_{11} = 3$. The three experiments are shown in Figure 8.2. One experiment is carried out at one level of the single factor $x_1$, and two experiments are carried out at a different level.

The model $y_{1i} = \beta_0 + r_{1i}$

If the model $y_{1i} = \beta_0 + r_{1i}$ is fit to this data (see Figure 8.2), we find, as before, that $b_0 = 3.9$ and $s_r^2 = 1.03$ (see Section 8.1). If we now calculate $s_{pe}^2$, we find that in this case we have only two replicates, whereas before we had three.

$$\bar{y}_{1i} = (5.0 + 3.7)/2 = 4.35 \tag{8.14}$$

$$SS_{pe} = (5.0 - 4.35)^2 + (3.7 - 4.35)^2 = 0.845 \tag{8.15}$$

with one degree of freedom.

$$s_{pe}^2 = SS_{pe}/(n - 1) = 0.845/(2 - 1) = 0.845 \tag{8.16}$$

$$s_{b_0} = [s_{pe}^2 (X'X)^{-1}]^{1/2} = [0.845 \times (1/3)]^{1/2} = [0.282]^{1/2} = 0.531 \tag{8.17}$$
$$t_{calc} = |b_0 - 0|/s_{b_0} = |3.9 - 0|/0.531 = 7.34 \tag{8.18}$$

The tabular value of $t$ at the 95% level of confidence and one degree of freedom is 12.71 (see Appendix B). Thus, $t_{calc} < t_{crit}$ and we cannot conclude (at the 95% level of confidence) that $\beta_0$ is significantly different from zero for the data in this example. The level at which $t_{calc} = t_{crit}$ is 91.4% confidence, or a risk $\alpha = 0.086$.

Figure 8.2 Relationship of three experiments at two levels to the fitted model $y_{1i} = \beta_0 + r_{1i}$. [Horizontal axis: Level of Factor $X_1$.]
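The model $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$

We now fit the model $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$.

$$(X'X) = \begin{bmatrix} 1 & 1 & 1 \\ 3 & 6 & 6 \end{bmatrix} \begin{bmatrix} 1 & 3 \\ 1 & 6 \\ 1 & 6 \end{bmatrix} = \begin{bmatrix} 3 & 15 \\ 15 & 81 \end{bmatrix} \tag{8.19}$$

$$D = 3 \times 81 - 15 \times 15 = 243 - 225 = 18 \tag{8.20}$$

$$(X'X)^{-1} = \begin{bmatrix} 81/18 & -15/18 \\ -15/18 & 3/18 \end{bmatrix} = \begin{bmatrix} 9/2 & -5/6 \\ -5/6 & 1/6 \end{bmatrix} \tag{8.21}$$

$$(X'Y) = \begin{bmatrix} 1 & 1 & 1 \\ 3 & 6 & 6 \end{bmatrix} \begin{bmatrix} 3.0 \\ 5.0 \\ 3.7 \end{bmatrix} = \begin{bmatrix} 11.7 \\ 61.2 \end{bmatrix} \tag{8.22}$$

$$B = (X'X)^{-1}(X'Y) = \begin{bmatrix} 9/2 & -5/6 \\ -5/6 & 1/6 \end{bmatrix} \begin{bmatrix} 11.7 \\ 61.2 \end{bmatrix} = \begin{bmatrix} 1.65 \\ 0.45 \end{bmatrix} \tag{8.23}$$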
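The best least squares straight line is shown in Figure 8.3. Notice that the line goes through the point $x_{11} = 3$, $y_{11} = 3.0$ and through the average of the points $x_{12} = 6$, $y_{12} = 5.0$ and $x_{13} = 6$, $y_{13} = 3.7$. There is perhaps an intuitive feeling that this should be the case, and it is not surprising to see the residuals distributed as follows.

$$R = Y - XB = \begin{bmatrix} 3.0 \\ 5.0 \\ 3.7 \end{bmatrix} - \begin{bmatrix} 1 & 3 \\ 1 & 6 \\ 1 & 6 \end{bmatrix} \begin{bmatrix} 1.65 \\ 0.45 \end{bmatrix} = \begin{bmatrix} 3.0 \\ 5.0 \\ 3.7 \end{bmatrix} - \begin{bmatrix} 3.0 \\ 4.35 \\ 4.35 \end{bmatrix} = \begin{bmatrix} 0 \\ 0.65 \\ -0.65 \end{bmatrix} \tag{8.24}$$

$$SS_r = R'R = \begin{bmatrix} 0 & 0.65 & -0.65 \end{bmatrix} \begin{bmatrix} 0 \\ 0.65 \\ -0.65 \end{bmatrix} = 0.845 \tag{8.25}$$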
Figure 8.3 Relationship of three experiments at two levels to the fitted model $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$. [Horizontal axis: Level of Factor $X_1$.]

The sum of squares due to purely experimental uncertainty, already calculated for this data set (see Equation 8.15), is 0.845 with one degree of freedom. If we calculate $SS_{lof} = SS_r - SS_{pe}$, then $SS_{lof} = 0$ and we might conclude that there is no lack of fit; that is, that the model fits the data perfectly. However, as seen before in Section 8.1, an analysis of the degrees of freedom shows that this is not a correct conclusion. The total number of degrees of freedom is three; of these, two have been used to calculate $b_0$ and $b_1$, and the third has been used to calculate $s_{pe}^2$. Thus, there are no degrees of freedom available for calculating $s_{lof}^2$. If a variance due to lack of fit cannot be calculated, there is no basis for judging the inadequacy of the model.
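The matrix least squares fit of this section, together with the degrees-of-freedom bookkeeping, can be sketched as follows. The function `fit_line` is a hypothetical helper of our own; it uses the closed-form solution of the 2x2 normal equations rather than a general matrix inversion, and it counts the degrees of freedom for lack of fit as the number of distinct factor levels minus the number of parameters.

```python
from fractions import Fraction


def fit_line(x_levels, responses):
    """Least squares fit of y = b0 + b1*x + r using exact fractions."""
    xs = [Fraction(str(v)) for v in x_levels]
    ys = [Fraction(str(v)) for v in responses]
    n = len(xs)
    sum_x, sum_xx = sum(xs), sum(x * x for x in xs)
    sum_y, sum_xy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    det = n * sum_xx - sum_x ** 2            # determinant of X'X
    b0 = (sum_xx * sum_y - sum_x * sum_xy) / det
    b1 = (n * sum_xy - sum_x * sum_y) / det
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    ss_r = sum(r * r for r in residuals)
    df_lof = len(set(xs)) - 2                # distinct levels minus parameters
    return b0, b1, residuals, ss_r, df_lof


b0, b1, residuals, ss_r, df_lof = fit_line([3, 6, 6], [3.0, 5.0, 3.7])
print(float(b0), float(b1), [float(r) for r in residuals], float(ss_r), df_lof)
```

With two distinct levels and two parameters, `df_lof` comes out as zero, which is the point of the paragraph above: a zero sum of squares for lack of fit says nothing when no degrees of freedom remain to test it.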
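8.3 Experiments at three levels: first-order model

For this section, we will assume the model $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$. We will examine the $(X'X)^{-1}$ matrix to determine the effect of carrying out a third experiment $x_{13}$ between $x_1 = -5$ and $x_1 = +5$ given the very special case of two fixed experiments located at $x_{11} = +1$ and $x_{12} = -1$ (see Chapter 7 and Example 7.4). We give four numerical examples of the calculation of the $(X'X)^{-1}$ matrix, one each for $x_{13} = -4$, 0, +0.5, and +4.

Example 8.1: $x_{13} = -4$

$$(X'X) = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & -4 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ 1 & -4 \end{bmatrix} = \begin{bmatrix} 3 & -4 \\ -4 & 18 \end{bmatrix} \tag{8.26}$$

$$(X'X)^{-1} = \begin{bmatrix} 9/19 & 2/19 \\ 2/19 & 3/38 \end{bmatrix} \tag{8.27}$$

Example 8.2: $x_{13} = 0$

$$(X'X) = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix} \tag{8.28}$$

$$(X'X)^{-1} = \begin{bmatrix} 1/3 & 0 \\ 0 & 1/2 \end{bmatrix} \tag{8.29}$$

Example 8.3: $x_{13} = +0.5$

$$(X'X) = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 0.5 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ 1 & 0.5 \end{bmatrix} = \begin{bmatrix} 3 & 0.5 \\ 0.5 & 2.25 \end{bmatrix} \tag{8.30}$$

$$(X'X)^{-1} = \begin{bmatrix} 9/26 & -1/13 \\ -1/13 & 6/13 \end{bmatrix} \tag{8.31}$$

Example 8.4: $x_{13} = +4$

$$(X'X) = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 4 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ 1 & 4 \end{bmatrix} = \begin{bmatrix} 3 & 4 \\ 4 & 18 \end{bmatrix} \tag{8.32}$$

$$(X'X)^{-1} = \begin{bmatrix} 9/19 & -2/19 \\ -2/19 & 3/38 \end{bmatrix} \tag{8.33}$$

Figure 8.4 Value of the element of $(X'X)^{-1}$ for the estimated variance of $b_1$ as a function of the location of a third experiment, two experiments fixed at $x_1 = -1$ and $x_1 = 1$. [Horizontal axis: Location of Third Experiment.]

Figure 8.5 Value of the element of $(X'X)^{-1}$ for the estimated variance of $b_0$ as a function of the location of a third experiment, two experiments fixed at $x_1 = -1$ and $x_1 = 1$. [Horizontal axis: Location of Third Experiment.]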
A more complete picture of the effect of the location of the third experiment on the variance of the slope estimate ($b_1$) is shown in Figure 8.4. From this figure, it is evident that the uncertainty associated with $b_1$ can be decreased by placing the third experiment far away from the other two. Placing the experiment at $x_{13} = 0$ seems to give the worst estimate of $b_1$. For the conditions giving rise to Figure 8.4, this is true, but it must not be concluded that the additional third experiment at $x_{13} = 0$ is detrimental: the variance of $b_1$ ($s_{b_1}^2 = s_{pe}^2/2$) for the three experiments at $x_{11} = -1$, $x_{12} = +1$, and $x_{13} = 0$ is the same as the variance of $b_1$ for only two experiments at $x_{11} = -1$ and $x_{12} = +1$ (see Equations 7.1 and 7.13).
Figure 8.6 Value of the element of $(X'X)^{-1}$ for the estimated covariance between $b_0$ and $b_1$ as a function of the location of a third experiment, two experiments fixed at $x_1 = -1$ and $x_1 = 1$. [Horizontal axis: Location of Third Experiment.]

Figure 8.5 shows that $b_0$ can be estimated most precisely when the third experiment is located at $x_{13} = 0$. This is reasonable, for the contribution of the third experiment at $x_1 = 0$ to the variance associated with $b_0$ involves no interpolation or extrapolation of a model: if the third experiment is carried out at $x_1 = 0$, then any discrepancy between $y_{13}$ and the true intercept must be due to purely experimental uncertainty only. As the third experiment is moved away from $x_1 = 0$, $s_{b_0}^2$ does increase, but not drastically; the two stationary experiments remain positioned near $x_1 = 0$ and provide reasonably good estimates of $b_0$ by themselves.

Finally, the effect of the position of the third experiment on the covariance associated with $b_0$ and $b_1$ is seen in Figure 8.6 to equal zero at $x_{13} = 0$. If the third experiment is located at $x_1 < 0$, then the estimates of the slope and intercept vary together in the same way (the covariance is positive; see Section 7.4). If the third experiment is located at $x_1 > 0$, the estimates of the slope and intercept vary together in opposite ways (the covariance is negative).
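The curves of Figures 8.4 through 8.6 can be regenerated point by point. The sketch below is our own illustration (the name `inverse_elements` is hypothetical); it returns the three distinct elements of $(X'X)^{-1}$ for the straight-line model when a third experiment is added to the two fixed at $+1$ and $-1$. Exact fractions are used, so non-integer locations should be passed as `Fraction` objects.

```python
from fractions import Fraction


def inverse_elements(x13, fixed=(1, -1)):
    """Elements of (X'X)^-1 for y = b0 + b1*x + r: var(b0), var(b1), cov."""
    xs = [Fraction(v) for v in fixed] + [Fraction(x13)]
    n = len(xs)
    sum_x = sum(xs)
    sum_xx = sum(x * x for x in xs)
    det = n * sum_xx - sum_x ** 2  # determinant of X'X
    return sum_xx / det, Fraction(n) / det, -sum_x / det


# The four locations of Examples 8.1 through 8.4.
for x13 in (-4, 0, Fraction(1, 2), 4):
    var_b0, var_b1, cov = inverse_elements(x13)
    print(x13, var_b0, var_b1, cov)
```

Scanning `x13` over a finer grid between $-5$ and $+5$ would trace the same shapes described above: the slope element is largest at $x_{13} = 0$, the intercept element is smallest there, and the covariance element changes sign.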
8.4 Experiments at three levels: second-order model

Let us now consider another case in which each experiment is carried out at a different level of the single factor $x_1$: $x_{11} = 3$, $x_{12} = 6$, and $x_{13} = 8$. The three points are distributed as shown in Figure 8.7. It is possible to fit zero-, one-, two-, and three-parameter models to the data shown in Figure 8.7; for the moment, we will focus our attention on the three-parameter parabolic second-order model

$$y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_{11} x_{1i}^2 + r_{1i} \tag{8.34}$$

The subscript on the parameter $\beta_{11}$ is used to indicate that it is associated with the coefficient corresponding to $x_1$ times $x_1$; that is, $x_1 x_1$ or $x_1^2$. This notation anticipates later usage where it will be seen that $\beta_{12}$, for example, is associated with the coefficient corresponding to $x_1$ times a second factor $x_2$, $x_1 x_2$.

Figure 8.7 Graph of the probabilistic model $y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_{11} x_{1i}^2 + r_{1i}$ fit to the results of three experiments at different levels of the factor $x_1$. [Horizontal axis: Level of Factor $X_1$.]

Figure 8.8 Value of the element of $(X'X)^{-1}$ for the estimated variance of $b_0$ in the model $y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_{11} x_{1i}^2 + r_{1i}$ as a function of the location of a third experiment, two experiments fixed at $x_1 = -1$ and $x_1 = 1$. [Horizontal axis: Location of Third Experiment.]

Figure 8.9 Value of the element of $(X'X)^{-1}$ for the estimated variance of $b_1$. [Horizontal axis: Location of Third Experiment.]

Figure 8.10 Value of the element of $(X'X)^{-1}$ for the estimated variance of $b_{11}$. [Horizontal axis: Location of Third Experiment.]

Figure 8.11 Value of the element of $(X'X)^{-1}$ for the estimated covariance between $b_0$ and $b_1$. [Horizontal axis: Location of Third Experiment.]

Figure 8.12 Value of the element of $(X'X)^{-1}$ for the estimated covariance between $b_0$ and $b_{11}$. [Horizontal axis: Location of Third Experiment.]

Figure 8.13 Value of the element of $(X'X)^{-1}$ for the estimated covariance between $b_1$ and $b_{11}$. [Horizontal axis: Location of Third Experiment.]

Although the fit will be "perfect" and there will be no residuals, we will nevertheless use the matrix least squares method of fitting the model to the data. The equations for the experimental points in Figure 8.7 can be written as
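$$\begin{aligned} 3.0 &= 1 \times \beta_0 + 3 \times \beta_1 + 3^2 \times \beta_{11} + r_{11} \\ 5.0 &= 1 \times \beta_0 + 6 \times \beta_1 + 6^2 \times \beta_{11} + r_{12} \\ 3.7 &= 1 \times \beta_0 + 8 \times \beta_1 + 8^2 \times \beta_{11} + r_{13} \end{aligned} \tag{8.35}$$

The matrix of parameter coefficients is thus of the form

$$X = \begin{bmatrix} 1 & x_{11} & x_{11}^2 \\ 1 & x_{12} & x_{12}^2 \\ 1 & x_{13} & x_{13}^2 \end{bmatrix} = \begin{bmatrix} 1 & 3 & 9 \\ 1 & 6 & 36 \\ 1 & 8 & 64 \end{bmatrix} \tag{8.36}$$

Continuing with the data treatment procedures,

$$(X'X) = \begin{bmatrix} 1 & 1 & 1 \\ 3 & 6 & 8 \\ 9 & 36 & 64 \end{bmatrix} \begin{bmatrix} 1 & 3 & 9 \\ 1 & 6 & 36 \\ 1 & 8 & 64 \end{bmatrix} = \begin{bmatrix} 3 & 17 & 109 \\ 17 & 109 & 755 \\ 109 & 755 & 5473 \end{bmatrix} \tag{8.37}$$

Procedures for the inversion of a 3x3 matrix are given in Appendix A.

$$(X'X)^{-1} = \begin{bmatrix} 13266/450 & -5373/450 & 477/450 \\ -5373/450 & 2269/450 & -206/450 \\ 477/450 & -206/450 & 19/450 \end{bmatrix} \tag{8.38}$$

$$(X'Y) = \begin{bmatrix} 1 & 1 & 1 \\ 3 & 6 & 8 \\ 9 & 36 & 64 \end{bmatrix} \begin{bmatrix} 3.0 \\ 5.0 \\ 3.7 \end{bmatrix} = \begin{bmatrix} 11.7 \\ 68.6 \\ 443.8 \end{bmatrix} \tag{8.39}$$

$$B = (X'X)^{-1}(X'Y) = \begin{bmatrix} -3.740 \\ 3.037 \\ -0.2633 \end{bmatrix} \tag{8.40}$$

As a check,

$$\hat{Y} = XB = \begin{bmatrix} 1 & 3 & 9 \\ 1 & 6 & 36 \\ 1 & 8 & 64 \end{bmatrix} \begin{bmatrix} -3.740 \\ 3.037 \\ -0.2633 \end{bmatrix} = \begin{bmatrix} 3.001 \\ 5.003 \\ 3.705 \end{bmatrix} \tag{8.41}$$

which does reproduce the original data points. (The discrepancies are caused by rounding to obtain the four-significant-digit estimates of $\beta_1$ and $\beta_{11}$ in Equation 8.40.)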
The equation that best fits this data is therefore

$$y_{1i} = -3.740 + 3.037 x_{1i} - 0.2633 x_{1i}^2 + r_{1i} \tag{8.42}$$

Let us assume we have a prior estimate of $\sigma_{pe}^2$. We will now rewrite the $(X'X)^{-1}$ matrix and consider its meaning in terms of the variances and covariances of the parameter estimates.

$$(X'X)^{-1} = \begin{bmatrix} 29.48 & -11.94 & 1.06 \\ -11.94 & 5.04 & -0.46 \\ 1.06 & -0.46 & 0.04 \end{bmatrix} \tag{8.43}$$
The uncertainty in the estimate of $\beta_0$ (the value of $y_1$ at $x_1 = 0$) is relatively large ($s_{b_0}^2 = s_{pe}^2 \times 29.48$). Figure 8.7 suggests that this is reasonable: any uncertainty in the response will cause the parabolic curve to "wiggle", and this wiggle will have a rather severe effect on the intersection of the parabola with the response axis. The uncertainty in the estimate of $\beta_1$ is smaller ($s_{b_1}^2 = s_{pe}^2 \times 5.04$), and the uncertainty in the estimate of $\beta_{11}$ is smaller still ($s_{b_{11}}^2 = s_{pe}^2 \times 0.04$). The geometric interpretation of the parameters $\beta_1$ and $\beta_{11}$ in this model is not straightforward, but $\beta_1$ essentially moves the apex of the parabola away from $x_1 = 0$, and $\beta_{11}$ is a measure of the steepness of curvature. The geometric interpretation of the associated uncertainties in the parameter estimates is also not straightforward (for example, $\beta_0$, $\beta_1$, and $\beta_{11}$ are expressed in different units). We will simply note that such uncertainties do exist, and note also that there is covariance between $b_0$ and $b_1$, between $b_0$ and $b_{11}$, and between $b_1$ and $b_{11}$.
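The three-parameter fit of this section can be reproduced exactly, which also shows where the rounding in the check of the predicted responses comes from. This is a sketch with helper names of our own (`matmul`, `transpose`, `inverse_3x3`); the inversion uses the adjugate (cofactor) formula, which may or may not be the procedure given in Appendix A.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse_3x3(m):
    """Invert a 3x3 matrix by the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("matrix is singular")
    adjugate = [[e * i - f * h, c * h - b * i, b * f - c * e],
                [f * g - d * i, a * i - c * g, c * d - a * f],
                [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[Fraction(v) / det for v in row] for row in adjugate]


levels = [3, 6, 8]
responses = [Fraction("3.0"), Fraction("5.0"), Fraction("3.7")]
X = [[Fraction(1), Fraction(x), Fraction(x) ** 2] for x in levels]
Y = [[y] for y in responses]

xtx_inv = inverse_3x3(matmul(transpose(X), X))
B = matmul(xtx_inv, matmul(transpose(X), Y))
b0, b1, b11 = (row[0] for row in B)
print(float(b0), float(b1), float(b11))
```

Because the arithmetic is exact, multiplying `X` by `B` returns the original responses with no discrepancy; the small differences in the printed check arise only from rounding the estimates to four significant digits.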
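8.5 Centered experimental designs and coding

There is a decided interpretive advantage to a different three-experiment design, a symmetrical design centered about $x_1 = 0$. Let us assume that two of the experimental points are located at $x_{11} = -1$ and $x_{12} = +1$. Figures 8.8-8.13 show the effects of moving the third experimental point from $x_{13} = -5$ to $x_{13} = +5$ on the elements of the $(X'X)^{-1}$ matrix associated with the variances $s_{b_0}^2$, $s_{b_1}^2$, and $s_{b_{11}}^2$, and with the covariances $s_{b_0 b_1}^2$, $s_{b_0 b_{11}}^2$, and $s_{b_1 b_{11}}^2$, respectively. Of particular interest to us is the case for which $x_{13} = 0$.

$$(X'X) = \begin{bmatrix} 1 & 1 & 1 \\ -1 & 1 & 0 \\ 1 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & -1 & 1 \\ 1 & 1 & 1 \\ 1 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 3 & 0 & 2 \\ 0 & 2 & 0 \\ 2 & 0 & 2 \end{bmatrix} \tag{8.44}$$

$$(X'X)^{-1} = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1/2 & 0 \\ -1 & 0 & 3/2 \end{bmatrix} \tag{8.45}$$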
Using this design, the covariances between the estimates of $\beta_0$ and $\beta_1$ and between the estimates of $\beta_1$ and $\beta_{11}$ are zero. This is confirmed in Figures 8.11 and 8.13. Thus, the estimation of $\beta_0$ does not depend on the estimated value of $\beta_1$ (and vice versa), and the estimated values of $\beta_1$ and $\beta_{11}$ do not depend on the estimated values of each other.

This advantage of a symmetrical experimental design centered at $x_1 = 0$ is usually realized in practice not by actually carrying out experiments about $x_1 = 0$ (in many practical cases, lower natural boundaries would prevent this; see Section 2.3), but instead by mathematically translating the origin of the factor space to the center of the design located in the desired region of factor space. Often, as a related procedure, the interval between experimental points is normalized to give a value of unity. When one or the other or both of these manipulations of the factor levels have been carried out, the experimental design is said to be coded. If $c_{x_1}$ is the location of the center of the design along the factor $x_1$, and if $d_{x_1}$ is the distance along $x_1$ between experimental points, then the coded factor levels ($x_{1i}^*$) are given by

$$x_{1i}^* = (x_{1i} - c_{x_1})/d_{x_1} \tag{8.46}$$
Another, often major, advantage of using coded factor levels is that the numerical values involved in matrix manipulations are smaller (especially the products and sums of products), and therefore are simpler to handle and do not suffer as much from round-off errors. It is to be stressed, however, that the geometric interpretation of the parameter estimates obtained using coded factor levels is usually different from the interpretation of those parameter estimates obtained using uncoded factor levels. As an illustration, $b_0^*$ (the intercept in the coded system) represents the response at the center of the experimental design, whereas $b_0$ (the intercept in the uncoded system) represents the response at the origin of the original coordinate system; the two estimates ($b_0^*$ and $b_0$) are usually quite different numerically. This "difficulty" will not be important in the remainder of this book, and we will feel equally free to use either coded or uncoded factor levels as the examples require. Later, in Section 11.5, we will show how to translate coded parameter estimates back into uncoded parameter estimates.
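The coding transformation and its inverse are simple enough to sketch directly. The function names `code_levels` and `decode_levels`, and the raw factor levels 20, 30, and 40, are hypothetical values chosen by us purely for illustration; they do not come from the text.

```python
def code_levels(levels, center, spacing):
    """Apply the coding x* = (x - c) / d to each factor level."""
    return [(x - center) / spacing for x in levels]


def decode_levels(coded, center, spacing):
    """Invert the coding: x = x* * d + c."""
    return [c * spacing + center for c in coded]


raw = [20.0, 30.0, 40.0]  # hypothetical uncoded factor levels
coded = code_levels(raw, center=30.0, spacing=10.0)

# Compare the sizes of the sums that enter (X'X) for a second-order model.
print(coded, sum(coded))
print(sum(x ** 4 for x in raw), sum(c ** 4 for c in coded))
```

In this sketch the largest sum entering $(X'X)$ drops from millions to a value of 2, which illustrates the remark about smaller numbers and reduced round-off error; the coded levels also sum to zero, which is what removes the covariance terms discussed above.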
8.6 Self interaction

Before leaving this chapter, we introduce the concept of interaction by pointing out an interpretation of the second-order character in the model we have been using. Equation 8.34 can be rewritten as

$$y_{1i} = \beta_0 + (\beta_1 + \beta_{11} x_{1i}) x_{1i} + r_{1i} \tag{8.47}$$

Focusing our attention for a moment on the term $(\beta_1 + \beta_{11} x_{1i}) x_{1i}$, it can be seen that over a small region of factor space, the first-order effect of the factor $x_1$ is given by the "slope", $\beta_1 + \beta_{11} x_{1i}$. But this "slope" depends on the region of factor space one is observing! Another way of stating this is that the effect of the factor depends on the level of the factor. For the example of Figure 8.7, in the region where $x_1$ is small, the effect of $x_1$ is to increase the response $y_1$ as $x_1$ is increased; in the region where $x_1$ is large, the effect of $x_1$ is to decrease the response $y_1$ as $x_1$ is increased; in the region where $x_1$ is approximately equal to 5 or 6, there is very little effect of $x_1$ on $y_1$. This dependence of a factor effect on the level of the factor will be called "self interaction".
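The three regions described above can be seen by evaluating the fitted parabola of Figure 8.7 directly. In this sketch the coefficients are the rounded estimates reported in Section 8.4, and the "effect" is measured, as an assumption of ours, by the change in predicted response for a unit increase in $x_1$; the function names are hypothetical.

```python
def fitted_response(x):
    """Fitted second-order model of Section 8.4 (rounded coefficients)."""
    return -3.740 + 3.037 * x - 0.2633 * x ** 2


def unit_effect(x, step=1.0):
    """Change in predicted response when x1 is raised from x to x + step."""
    return fitted_response(x + step) - fitted_response(x)


# Small, intermediate, and large levels of the factor.
for level in (3.0, 5.0, 8.0):
    print(level, round(unit_effect(level), 3))
```

The same unit increase in the factor raises the predicted response at the low level, barely changes it near $x_1 = 5$, and lowers it at the high level, which is the behavior the text calls self interaction.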
Exercises

8.1 Number of experimental designs. In a single-factor system, how many possible designs are there for four experiments (e.g., four experiments at a single factor level, three at one factor level and one at another factor level, etc.)? For five experiments?

8.2 Calculation of $SS_{lof}$. Show by direct calculation that the numerical value in Equation 8.6 is correct.

8.3 Degrees of freedom. From discussions in this and previous chapters, formulate an algebraic equation that gives the number of degrees of freedom associated with lack of fit.

8.4 Effect of a fourth experiment. Recalculate Example 8.2 for four experiments, the fourth experiment located at $x_1 = 0$. Assuming $s_{pe}^2$ is the same as for the three experiment case, what effect does this fourth experiment have on the precision of estimating $\beta_0$ and $\beta_1$?

8.5 Variance of $b_1$. The graph in Figure 8.9 is discontinuous at $x_1 = -1$ and $x_1 = 1$. Why? Are there corresponding discontinuities in Figures 8.8 and 8.10-8.13?

8.6 Matrix inversion. Verify that Equation 8.38 is the correct inverse of Equation 8.37 (see Appendix A).

8.7 Residuals. Why would Equation 8.41 be expected to reproduce the original data points of Equation 8.35?

8.8 Matrix inversion. Verify that Equation 8.45 is the correct inverse of Equation 8.44.

8.9 Matrix least squares. Complete the development of Equations 8.44 and 8.45 to solve for $B$ assuming $y_{11} = 3$, $y_{12} = 5$, and $y_{13} = 4$.

8.10 Self interaction. Give three examples of self interaction. Does the sign of the slope invert in any of these examples?

8.11 Self interaction. Sketch a plot of "discomfort" vs. "extension" for Figure 1.11. Does the relationship you have plotted exhibit self interaction?

8.12 Self interaction. Indicate which of the following figures exhibit self interaction: Figures 1.2, 1.9, 1.15, 2.2, 2.5, 2.7, 2.8, 2.18, 4.2, 4.4, 5.3, 7.1, 7.4, 8.4, 8.9, 8.13.

8.13 Matrix least squares. Fit the model $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$ to the following data:

Calculate $s_r^2$, $s_{lof}^2$, and $s_{pe}^2$.

8.14 Matrix least squares. A single matrix least squares calculation can be employed when the same linear model is used to fit each of several system responses. The $D$, $X$, $(X'X)$, and $(X'X)^{-1}$ matrices remain the same, but the $Y$, $(X'Y)$, $B$, and $R$ matrices have additional columns, one column for each response. Fit the model $y_{ji} = \beta_0 + \beta_1 x_{1i} + r_{ji}$ to the following multiresponse data, $j$ = 1, 2, 3:

What are the $b$'s for each response? Calculate $R$. Calculate $s_r^2$ for each of the three responses.

8.15 Coding. Code the data in Figure 8.7 so that the distance between the left point and the center point is 1.5 in the factor $x_1^*$, and the center point is located at $x_1^* = 3$. What are $c_{x_1}$ and $d_{x_1}$ of Equation 8.46 for this transformation?

8.16 Coding. "Autoscaling" is a technique for coding data so that the mean is zero and the standard deviation is unity. What should $c_{y_1}$ and $d_{y_1}$ be to autoscale the nine responses in Section 3.1? (Hint: see Figure 3.3.)

8.17 Covariance. Give interpretations of the covariance plots shown in Figures 8.11-8.13. Why is the covariance between $\beta_0$ and $\beta_{11}$ not equal to zero when $x_{13} = 0$ (Figure 8.12)?

8.18 Covariance. Can you find an experimental design such that the off-diagonal elements of Equation 8.45 are all equal to zero? What would be the covariance between $b_0$ and $b_1$, between $b_0$ and $b_{11}$, and between $b_1$ and $b_{11}$ for such a design? [See, for example, Box and Hunter (1957).]
CHAPTER 9
Analysis of Variance (ANOVA) for Linear Models
In Section 6.4, it was shown for replicate experiments at one factor level that the sum of squares of residuals, $SS_r$, can be partitioned into a sum of squares due to purely experimental uncertainty, $SS_{pe}$, and a sum of squares due to lack of fit, $SS_{lof}$. Each sum of squares divided by its associated degrees of freedom gives an estimated variance. Two of these variances, $s_{lof}^2$ and $s_{pe}^2$, were used to calculate a Fisher F-ratio from which the significance of the lack of fit could be estimated. In this chapter we examine these and other sums of squares and resulting variances in greater detail. This general area of investigation is called the "analysis of variance" (ANOVA) applied to linear models [Scheffé (1953), Dunn and Clark (1987), Allus, Brereton, and Nickless (1989), and Neter, Wasserman, and Kutner (1990)].

9.1 Sums of squares

There is an old "paradox" that suggests that it is not possible to walk from one side of the room to the other because you must first walk halfway across the room, then halfway across the remaining distance, then halfway across what still remains, and so on; because it takes an infinite number of these "steps," it is supposedly not possible to reach the other side. This seeming paradox is, of course, false, but the idea of breaking up a continuous journey into a (finite) number of discrete steps is useful for understanding the analysis of variance applied to linear models.

Each response $y_{1i}$ may be viewed as a distance in response space (see Section 2.1). Statisticians find it useful to "travel" from the origin to each response in a number of discrete steps as illustrated in Figure 9.1. Each journey can be broken up as follows.
(1) From the origin (0) to the mean of the data set ($\bar{y}_1$). This serves to get us into the middle of all of the responses. It is allowed by the degree of freedom associated with $\beta_0$, the offset parameter.

(2) From the mean of the data set ($\bar{y}_1$) to the value predicted by the model ($\hat{y}_{1i}$). This distance is a measure of the effectiveness of the model in explaining the variation in the data set. It is allowed by the degrees of freedom associated with the coefficients of the factor effects (the other $\beta$'s).

(3) From the value predicted by the model ($\hat{y}_{1i}$) to the mean of replicate responses (if any) at the same factor level ($\bar{y}_{1i}$). This distance is a measure of the lack of fit of the model to the data; if the model does a good job of predicting the response, this distance should be small.

(4) Finally, from the mean of replicate responses ($\bar{y}_{1i}$) to the response itself ($y_{1i}$). This distance is a measure of the purely experimental uncertainty. If the measurement of response is precise, this distance should be small.

Alternative itineraries can be planned with fewer stops along the way (see Figure 9.1). Two useful combinations are

(5) From the mean of the data set ($\bar{y}_1$) to the response itself ($y_{1i}$). This is a distance that has been "corrected" or "adjusted" for the mean.

(6) From the value predicted by the model ($\hat{y}_{1i}$) to the response itself ($y_{1i}$). This distance corresponds to the already familiar "residual."

In this section we will develop matrix representations of these distances, show simple matrix calculations for associated sums of squares, and demonstrate that certain of these sums of squares are additive.

Figure 9.1 Discrete steps involved in "traveling" from the origin to a given response, $y_{1i}$. [Labeled spans along the response axis: Mean of data set, Factor effect, Total, Corrected for mean, Residual.]
Total sum of squares

The individual responses in a data set are conveniently collected in a matrix of measured responses, $Y$.

$$Y = \begin{bmatrix} y_{11} \\ y_{12} \\ \vdots \\ y_{1n} \end{bmatrix} \tag{9.1}$$

The total sum of squares, $SS_T$, is defined as the sum of squares of the measured responses. It may be calculated easily using matrix techniques.

$$SS_T = Y'Y = \sum_{i=1}^{n} y_{1i}^2 \tag{9.2}$$

The total sum of squares has $n$ degrees of freedom associated with it, where $n$ is the total number of experiments in a set.
Sum of squares due to the mean

Probably the most interesting and useful aspect of a data set is its ability to reveal how variations in the factor levels result in variations among the responses. The exact values of the responses are not as important as the variations among them (see Section 1.4). When this is the case, a $\beta_0$ term is usually provided in the model so that the model is not forced to go through the origin, but instead can be offset up or down the response axis by some amount. It is possible to offset the raw data in a similar way by subtracting the mean value of response from each of the individual responses (see Figures 9.1 and 3.3). For this and other purposes, it will be convenient to define a matrix of mean response, $\bar{Y}$, of the same form as the response matrix $Y$, but containing for each element the mean response $\bar{y}_1$ (see Section 1.3).

$$\bar{Y} = \begin{bmatrix} \bar{y}_1 \\ \bar{y}_1 \\ \vdots \\ \bar{y}_1 \end{bmatrix} \tag{9.3}$$

This matrix may be used to calculate a useful sum of squares, the sum of squares due to the mean, $SS_{mean}$.

$$SS_{mean} = \bar{Y}'\bar{Y} = \sum_{i=1}^{n} \bar{y}_1^2 \tag{9.4}$$

The sum of squares due to the mean always has only one degree of freedom associated with it.
Sum of squares corrected for the mean

The mean response can be subtracted from each of the individual responses to produce the so-called "responses corrected for the mean." This terminology is unfortunate because it wrongly implies that the original data was somehow "incorrect"; "responses adjusted for the mean" might be a better description, but we will use the traditional terminology here. It will be convenient to define a matrix of responses corrected for the mean, $C$.

$$C = Y - \bar{Y} = \begin{bmatrix} y_{11} - \bar{y}_1 \\ y_{12} - \bar{y}_1 \\ \vdots \\ y_{1n} - \bar{y}_1 \end{bmatrix} \tag{9.5}$$

This matrix may be used to calculate another useful sum of squares, the sum of squares corrected for the mean, $SS_{corr}$, sometimes called the sum of squares about the mean or the corrected sum of squares.

$$SS_{corr} = C'C = (Y - \bar{Y})'(Y - \bar{Y}) = \sum_{i=1}^{n} (y_{1i} - \bar{y}_1)^2 \tag{9.6}$$
The sum of squares corrected for the mean has n - 1 degrees of freedom associated with it. It is a characteristic of linear models and least squares parameter estimation that certain sums of squares are additive. One useful relationship is based on the partitioning Yli=YI
+(Yli-Yl)
(9.7)
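as shown in Figure 9.1. It is to be emphasized that just because certain quantities are additive (as they are in Equation 9.7), this does not mean that their sums of squares are necessarily additive; it must be shown that a resulting crossproduct is equal to zero before the additivity of sums of squares is proved. As an example, the total sum of squares may be written

$$SS_T = \sum_{i=1}^{n} y_{1i}^2 = \sum_{i=1}^{n} \left[ \bar{y}_1 + (y_{1i} - \bar{y}_1) \right]^2 \tag{9.8}$$

$$SS_T = \sum_{i=1}^{n} \left[ \bar{y}_1^2 + (y_{1i} - \bar{y}_1)^2 + 2\bar{y}_1 (y_{1i} - \bar{y}_1) \right] \tag{9.9}$$

$$SS_T = \sum_{i=1}^{n} \bar{y}_1^2 + \sum_{i=1}^{n} (y_{1i} - \bar{y}_1)^2 + 2 \sum_{i=1}^{n} \bar{y}_1 (y_{1i} - \bar{y}_1) \tag{9.10}$$

By Equations 9.4 and 9.6,

$$SS_T = SS_{mean} + SS_{corr} + 2\bar{y}_1 \sum_{i=1}^{n} (y_{1i} - \bar{y}_1) \tag{9.11}$$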
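The summation in Equation 9.11 can be shown to be zero, because the deviations from the mean sum to zero: $\sum (y_{1i} - \bar{y}_1) = \sum y_{1i} - n\bar{y}_1 = 0$. Thus,

$$SS_T = SS_{mean} + SS_{corr} \tag{9.12}$$

The degrees of freedom are partitioned in the same way.

$$DF_T = DF_{mean} + DF_{corr} = 1 + (n - 1) = n \tag{9.13}$$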
Although the partitioning of the total sum of squares ($SS_T$) into a sum of squares due to the mean ($SS_{mean}$) and a sum of squares corrected for the mean ($SS_{corr}$) may be carried out for any data set, it is meaningful only for the treatment of models containing a $\beta_0$ term. In effect, the $\beta_0$ term provides the degree of freedom necessary for offsetting the responses so the mean of the “corrected” responses can be equal to zero. Models that lack a $\beta_0$ term not only force the factors to explain the variation of the responses about the mean, but also require that the factors explain the offset of the mean as well (see Figure 5.9). Inclusion of a $\beta_0$ term removes this latter requirement [see S. Deming (1989b)].
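The partitioning of Equations 9.11 to 9.13 can be checked numerically. The following is a minimal sketch in plain Python; the four responses are hypothetical values invented for illustration and do not come from the text.

```python
# Hypothetical responses, invented for illustration only.
y = [2.0, 3.0, 5.0, 6.0]

n = len(y)
y_bar = sum(y) / n                               # mean response

ss_total = sum(yi ** 2 for yi in y)              # SS_T, n degrees of freedom
ss_mean = n * y_bar ** 2                         # SS_mean (Eq. 9.4), 1 degree of freedom
ss_corr = sum((yi - y_bar) ** 2 for yi in y)     # SS_corr (Eq. 9.6), n - 1 degrees of freedom

# Cross-product term of Eq. 9.11; it vanishes because the
# deviations from the mean sum to zero.
cross = 2 * y_bar * sum(yi - y_bar for yi in y)

print(ss_total, ss_mean, ss_corr, cross)
```

For these values the mean is 4, so the total sum of squares (74) splits into 64 due to the mean and 10 corrected for the mean, and the cross-product term is zero.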
Sum of squares due to the factors (due to regression)
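Using matrix least squares techniques (see Section 5.2), the chosen linear model may be fit to the data to obtain a set of parameter estimates, $\hat{B}$, from which predicted values of response, $\hat{y}_{1i}$, may be obtained. It is convenient to define a matrix of estimated responses, $\hat{Y}$.

$$\hat{Y} = X\hat{B} = \begin{bmatrix} \hat{y}_{11} \\ \hat{y}_{12} \\ \vdots \\ \hat{y}_{1n} \end{bmatrix} \tag{9.14}$$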
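Some of the variation of the responses about their mean is caused by variation of the factors. The effect of the factors as they appear in the model can be measured by the differences between the predicted responses ($\hat{y}_{1i}$) and the mean response ($\bar{y}_1$). For this purpose, it is convenient to define a matrix of factor contributions, $F$.

$$F = \hat{Y} - \bar{Y} = \begin{bmatrix} \hat{y}_{11} - \bar{y}_1 \\ \hat{y}_{12} - \bar{y}_1 \\ \vdots \\ \hat{y}_{1n} - \bar{y}_1 \end{bmatrix} \tag{9.15}$$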
This matrix may be used to calculate still another useful sum of squares, the sum of squares due to the factors as they appear in the model, $SS_{fact}$, sometimes called the sum of squares due to regression, $SS_{reg}$.

$$SS_{fact} = F'F = (\hat{Y} - \bar{Y})'(\hat{Y} - \bar{Y}) = \sum_{i=1}^{n} (\hat{y}_{1i} - \bar{y}_1)^2 \tag{9.16}$$
For models containing a $\beta_0$ term, the sum of squares due to the factors has $p - 1$ degrees of freedom associated with it, where $p$ is the number of parameters in the model. For models that do not contain a $\beta_0$ term, $SS_{fact}$ has $p$ degrees of freedom.

Sum of squares of residuals
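We have already defined the matrix of residuals, $R$, in Section 5.2. It may be obtained using matrix techniques as

$$R = Y - \hat{Y} = Y - X\hat{B} = \begin{bmatrix} y_{11} - \hat{y}_{11} \\ y_{12} - \hat{y}_{12} \\ y_{13} - \hat{y}_{13} \\ \vdots \\ y_{1n} - \hat{y}_{1n} \end{bmatrix} \tag{9.17}$$

This matrix may be used to calculate the sum of squares of residuals, $SS_r$, sometimes called the sum of squares about regression.

$$SS_r = R'R = (Y - \hat{Y})'(Y - \hat{Y}) = \sum_{i=1}^{n} (y_{1i} - \hat{y}_{1i})^2 \tag{9.18}$$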
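The sum of squares of residuals has $n - p$ degrees of freedom associated with it. The sum of squares corrected for the mean, $SS_{corr}$, is equal to the sum of squares due to the factors, $SS_{fact}$, plus the sum of squares of residuals, $SS_r$. This result can be obtained from the partitioning

$$(y_{1i} - \bar{y}_1) = (\hat{y}_{1i} - \bar{y}_1) + (y_{1i} - \hat{y}_{1i}) \tag{9.19}$$

(see Figure 9.1). The sum of squares corrected for the mean may be written

$$SS_{corr} = \sum_{i=1}^{n} (y_{1i} - \bar{y}_1)^2 = \sum_{i=1}^{n} \left[ (\hat{y}_{1i} - \bar{y}_1) + (y_{1i} - \hat{y}_{1i}) \right]^2 \tag{9.20}$$

$$SS_{corr} = \sum_{i=1}^{n} \left[ (\hat{y}_{1i} - \bar{y}_1)^2 + (y_{1i} - \hat{y}_{1i})^2 + 2(\hat{y}_{1i} - \bar{y}_1)(y_{1i} - \hat{y}_{1i}) \right] \tag{9.21}$$

$$SS_{corr} = \sum_{i=1}^{n} (\hat{y}_{1i} - \bar{y}_1)^2 + \sum_{i=1}^{n} (y_{1i} - \hat{y}_{1i})^2 + 2 \sum_{i=1}^{n} (\hat{y}_{1i} - \bar{y}_1)(y_{1i} - \hat{y}_{1i}) \tag{9.22}$$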
By Equations 9.16 and 9.18,

$$SS_{corr} = SS_{fact} + SS_r + 2 \sum_{i=1}^{n} (\hat{y}_{1i} - \bar{y}_1)(y_{1i} - \hat{y}_{1i}) \tag{9.23}$$
It can be shown that the rightmost summation in Equation 9.23 is equal to zero. Thus,

$$SS_{corr} = SS_{fact} + SS_r \tag{9.24}$$

The degrees of freedom are partitioned in the same way.

$$DF_{corr} = DF_{fact} + DF_r = (p - 1) + (n - p) = (n - 1) \tag{9.25}$$

for models containing a $\beta_0$ term. If a $\beta_0$ term is not included in the model, then the sum of squares due to the mean and the sum of squares corrected for the mean are not permitted, in which case

$$SS_T = SS_{fact} + SS_r \tag{9.26}$$

and

$$DF_T = DF_{fact} + DF_r = p + (n - p) = n \tag{9.27}$$
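The additivity of Equation 9.24 can be illustrated for the straight-line model $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$. The sketch below uses the closed-form least squares solution for a single factor rather than the general matrix solution of Section 5.2; the data and the helper name `fit_line` are hypothetical.

```python
def fit_line(x, y):
    """Least squares estimates (b0, b1) for y = b0 + b1*x (closed form)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.0]

n, p = len(x), 2
b0, b1 = fit_line(x, y)
y_bar = sum(y) / n
y_hat = [b0 + b1 * xi for xi in x]                            # estimated responses

ss_corr = sum((yi - y_bar) ** 2 for yi in y)                  # Eq. 9.6, n - 1 DF
ss_fact = sum((yh - y_bar) ** 2 for yh in y_hat)              # Eq. 9.16, p - 1 DF
ss_r = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))        # Eq. 9.18, n - p DF

# Cross-product term of Eq. 9.23; zero (to rounding) for a least squares fit.
cross = 2 * sum((yh - y_bar) * (yi - yh) for yi, yh in zip(y, y_hat))

print(ss_corr, ss_fact + ss_r, cross)
print((n - 1) == (p - 1) + (n - p))                           # Eq. 9.25
```

The cross-product term vanishes only because the parameters are least squares estimates; with arbitrary values of `b0` and `b1` the two sums of squares would not add up to `ss_corr`.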
Sum of squares due to lack of fit

Before discussing the sum of squares due to lack of fit and, later, the sum of squares due to purely experimental uncertainty, it is computationally useful to define a matrix of mean replicate responses, $J$, which is structured the same as the $Y$ matrix, but contains mean values of response from replicates. For those experiments that were not replicated, the “mean” response is simply the single value of response. The $J$ matrix is of the form

$$J = \begin{bmatrix} \bar{y}_{11} \\ \bar{y}_{12} \\ \vdots \\ \bar{y}_{1n} \end{bmatrix} \tag{9.28}$$
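As an example, suppose that in a set of five experiments, the first and second are replicates, and the third and fifth are replicates. If the matrix of measured responses is

$$Y = \begin{bmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{14} \\ y_{15} \end{bmatrix} \quad \begin{matrix} \text{(}y_{11}, y_{12}\text{: two replicates)} \\ \text{(}y_{13}, y_{15}\text{: two other replicates)} \end{matrix} \tag{9.29}$$

then the matrix of mean replicate responses is

$$J = \begin{bmatrix} (y_{11} + y_{12})/2 \\ (y_{11} + y_{12})/2 \\ (y_{13} + y_{15})/2 \\ y_{14} \\ (y_{13} + y_{15})/2 \end{bmatrix} \tag{9.30}$$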
Note that in the $J$ matrix, the first two elements are the mean of replicate responses one and two; the third and fifth elements are the mean of replicate responses three and five. The fourth elements in the $Y$ and $J$ matrices are the same because the experiment was not replicated. In a sense, calculating the mean replicate response removes the effect of purely experimental uncertainty from the data. It is not unreasonable, then, to expect that the deviation of these mean replicate responses from the estimated responses is due to a lack of fit of the model to the data. The matrix of lack-of-fit deviations, $L$, is obtained by subtracting $\hat{Y}$ from $J$.

$$L = J - \hat{Y} = \begin{bmatrix} \bar{y}_{11} - \hat{y}_{11} \\ \bar{y}_{12} - \hat{y}_{12} \\ \bar{y}_{13} - \hat{y}_{13} \\ \vdots \\ \bar{y}_{1n} - \hat{y}_{1n} \end{bmatrix} \tag{9.31}$$
This matrix may be used to calculate the sum of squares due to lack of fit, $SS_{lof}$.

$$SS_{lof} = L'L = (J - \hat{Y})'(J - \hat{Y}) = \sum_{i=1}^{n} (\bar{y}_{1i} - \hat{y}_{1i})^2 \tag{9.32}$$
If $f$ is the number of distinctly different factor combinations at which experiments have been carried out (also called design points), then the sum of squares due to lack of fit has $f - p$ degrees of freedom associated with it.
Sum of squares due to purely experimental uncertainty
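The matrix of purely experimental deviations, $P$, is obtained by subtracting $J$ from $Y$.

$$P = Y - J = \begin{bmatrix} y_{11} - \bar{y}_{11} \\ y_{12} - \bar{y}_{12} \\ y_{13} - \bar{y}_{13} \\ \vdots \\ y_{1n} - \bar{y}_{1n} \end{bmatrix} \tag{9.33}$$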
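TABLE 9.1 Summary of matrix operations used to calculate sums of squares.

| Sum of squares | Matrix operation | Degrees of freedom |
|---|---|---|
| $SS_T$ | $Y'Y$ | $n$ |
| $SS_{mean}$ | $\bar{Y}'\bar{Y}$ | 1 |
| $SS_{corr}$ | $C'C = (Y - \bar{Y})'(Y - \bar{Y})$ | $n - 1$ |
| $SS_{fact}$ | $F'F = (\hat{Y} - \bar{Y})'(\hat{Y} - \bar{Y}) = (X\hat{B} - \bar{Y})'(X\hat{B} - \bar{Y})$ | $p - 1$ |
| $SS_r$ | $R'R = (Y - \hat{Y})'(Y - \hat{Y}) = (Y - X\hat{B})'(Y - X\hat{B})$ | $n - p$ |
| $SS_{lof}$ | $L'L = (J - \hat{Y})'(J - \hat{Y}) = (J - X\hat{B})'(J - X\hat{B})$ | $f - p$ |
| $SS_{pe}$ | $P'P = (Y - J)'(Y - J)$ | $n - f$ |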
Zeros will appear in the $P$ matrix for those experiments that were not replicated ($\bar{y}_{1i} = y_{1i}$ for these experiments). The sum of squares due to purely experimental uncertainty is easily calculated.

$$SS_{pe} = P'P = (Y - J)'(Y - J) = \sum_{i=1}^{n} (y_{1i} - \bar{y}_{1i})^2 \tag{9.34}$$
The sum of squares due to purely experimental uncertainty has $n - f$ degrees of freedom associated with it. The sum of squares due to lack of fit, $SS_{lof}$, and the sum of squares due to purely experimental uncertainty, $SS_{pe}$, add together to give the sum of squares of residuals, $SS_r$.

$$SS_r = SS_{lof} + SS_{pe} \tag{9.35}$$

The degrees of freedom are similarly additive.

$$DF_r = DF_{lof} + DF_{pe} = (f - p) + (n - f) = (n - p) \tag{9.36}$$

Table 9.1 summarizes the matrix operations used to calculate the sums of squares discussed in this section for models that contain a $\beta_0$ term.
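The construction of the $J$ matrix and the split of $SS_r$ into $SS_{lof}$ and $SS_{pe}$ can be sketched as follows. The replication pattern matches the five-experiment example above (experiments one and two are replicates, as are three and five), but the factor levels and responses are hypothetical values invented for illustration, and a straight-line model is assumed.

```python
from collections import defaultdict

# Hypothetical factor levels and responses, invented for illustration only.
x = [1.0, 1.0, 2.0, 3.0, 2.0]
y = [1.2, 0.8, 2.1, 2.9, 1.9]

n, p = len(x), 2                       # straight-line model: b0 and b1

# Straight-line least squares fit (closed form for a single factor).
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# Mean replicate responses: group the responses by design point.
groups = defaultdict(list)
for xi, yi in zip(x, y):
    groups[xi].append(yi)
f = len(groups)                        # number of distinct design points
j = [sum(groups[xi]) / len(groups[xi]) for xi in x]

ss_r = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))     # Eq. 9.18
ss_lof = sum((ji - yh) ** 2 for ji, yh in zip(j, y_hat))   # Eq. 9.32, f - p DF
ss_pe = sum((yi - ji) ** 2 for yi, ji in zip(y, j))        # Eq. 9.34, n - f DF

print(j)
print(ss_r, ss_lof + ss_pe)            # Eq. 9.35
print(n - p, (f - p) + (n - f))        # Eq. 9.36
```

Grouping by exact factor level works here because the replicate levels are written identically; with measured or computed factor levels a tolerance-based grouping would be needed.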
9.2 Additivity of sums of squares and degrees of freedom

Figure 9.2 summarizes the additivity of sums of squares and degrees of freedom for models containing a $\beta_0$ term. For models that do not include a $\beta_0$ term, the partitioning is shown in Figure 9.3. The information contained in Figures 9.2 and 9.3 is commonly presented in an analysis of variance (ANOVA) table similar to that shown in Table 9.2 for models containing a $\beta_0$ term; Table 9.3 is an ANOVA table for models that do not contain a $\beta_0$ term. The column farthest to the right in each of these tables contains the values that are obtained by dividing a sum of squares by its associated degrees of freedom. These values are called “mean squares.” Statistically, the mean squares are estimated variances. Although ANOVA tables are often used to present least squares results for linear models, the diagrammatic tree structure of Figures 9.2 and 9.3 better illustrates several important statistical concepts. We will continue to use these diagrams throughout the remainder of the text.

Figure 9.2 Sums of squares and degrees of freedom tree illustrating additive relationships for linear models that contain a $\beta_0$ term.
TABLE 9.2 ANOVA table for linear models containing a $\beta_0$ term.

| Source | Degrees of freedom | Sum of squares | Mean square |
|---|---|---|---|
| Total | $n$ | $SS_T$ | $SS_T/n$ |
| Mean | 1 | $SS_{mean}$ | $SS_{mean}/1$ |
| Corrected | $n - 1$ | $SS_{corr}$ | $SS_{corr}/(n - 1)$ |
| Factor effects | $p - 1$ | $SS_{fact}$ | $SS_{fact}/(p - 1)$ |
| Residuals | $n - p$ | $SS_r$ | $SS_r/(n - p)$ |
| Lack of fit | $f - p$ | $SS_{lof}$ | $SS_{lof}/(f - p)$ |
| Purely experimental uncertainty | $n - f$ | $SS_{pe}$ | $SS_{pe}/(n - f)$ |

TABLE 9.3 ANOVA table for linear models lacking a $\beta_0$ term.

| Source | Degrees of freedom | Sum of squares | Mean square |
|---|---|---|---|
| Total | $n$ | $SS_T$ | $SS_T/n$ |
| Factor effects | $p$ | $SS_{fact}$ | $SS_{fact}/p$ |
| Residuals | $n - p$ | $SS_r$ | $SS_r/(n - p)$ |
| Lack of fit | $f - p$ | $SS_{lof}$ | $SS_{lof}/(f - p)$ |
| Purely experimental uncertainty | $n - f$ | $SS_{pe}$ | $SS_{pe}/(n - f)$ |

9.3 Coefficients of determination and correlation

Figure 9.4 emphasizes the relationship among three sums of squares in the ANOVA tree: the sum of squares due to the factors as they appear in the model, $SS_{fact}$ (sometimes called the sum of squares due to regression, $SS_{reg}$); the sum of squares of residuals, $SS_r$; and the sum of squares corrected for the mean, $SS_{corr}$ (or the total sum of squares, $SS_T$, if there is no $\beta_0$ term in the model). If the factors have very little effect on the response, we would expect that the sum of squares removed from $SS_{corr}$ by $SS_{fact}$ would be small, and therefore $SS_r$ would be large, about the same size as $SS_{corr}$. Conversely, if the factors do a very good job of explaining the responses, we would expect the residuals to be very small and $SS_{fact}$ to be relatively large, about the same size as $SS_{corr}$. The coefficient of multiple determination, $R^2$, is a measure of how much of $SS_{corr}$ is accounted for by the factor effects.

$$R^2 = SS_{fact}/SS_{corr} \tag{9.37}$$
The coefficient of multiple determination ranges from 0 (indicating that the factors, as they appear in the model, have no effect on the response) to 1 (indicating that the factors, as they appear in the model, “explain” the data “perfectly”). The square root of the coefficient of multiple determination is the coefficient of multiple correlation, $R$. If the model is the two-parameter ($\beta_0$ and $\beta_1$) single-factor straight-line relationship $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$, then $R^2$ is given the symbol $r^2$ and is called the coefficient of determination. It is defined the same way $R^2$ is defined.

$$r^2 = SS_{fact}/SS_{corr} \tag{9.38}$$
Figure 9.3 Sums of squares and degrees of freedom tree illustrating additive relationships for linear models that do not contain a $\beta_0$ term.

Figure 9.4 Relationships among $SS_{fact}$, $SS_r$, and $SS_{corr}$ for calculating both the coefficient of multiple determination, $R^2$, and the variance-ratio for the significance of the factor effects, $F_{(p-1,\,n-p)}$.
Again, $r^2$ ranges from 0 to 1. The square root of the coefficient of determination is the coefficient of correlation, $r$, often called the correlation coefficient.

$$r = \mathrm{SGN}(b_1)\sqrt{r^2} \tag{9.39}$$
where SGN($b_1$) is the sign of the slope of the straight-line relationship. Although $r$ ranges from $-1$ to $+1$, only the absolute value is indicative of how much the factor explains the data; the sign of $r$ simply indicates the sign of $b_1$ [S. Deming (1989c)].

It is important to realize that an $R$ or $r$ value (instead of an $R^2$ or $r^2$ value) might give a false sense of how well the factors explain the data. For example, an $R$ value of 0.956 arises when the factors explain 91.4% of the sum of squares corrected for the mean. An $R$ value of 0.60 indicates that only 36% of $SS_{corr}$ has been explained by the factors. Although most regression analysis programs will supply both $R$ (or $r$) and $R^2$ (or $r^2$) values, researchers seem to prefer to report the coefficients of correlation ($R$ and $r$) simply because they are numerically larger and make the fit of the model look better.

Although the coefficients of determination and the correlation coefficients are conceptually simple and attractive, and are frequently used as a measure of how well a model fits a set of data, they are not, by themselves, a good measure of the effectiveness of the factors as they appear in the model, primarily because they do not take into account the degrees of freedom. Thus, the value of $R^2$ can usually be increased by adding another parameter to the model (until $p = f$), but this increased $R^2$ value does not necessarily mean that the expanded model offers a significantly better fit. It should also be noted that the coefficient of determination gives no indication of whether the lack of perfect prediction is caused by an inadequate model or by purely experimental uncertainty.
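Equations 9.38 and 9.39 can be sketched for a straight-line fit as follows. The data are hypothetical and chosen to have a negative slope so that the role of SGN($b_1$) is visible; `math.copysign` is used here as a stand-in for the SGN function.

```python
import math

# Hypothetical data with a negative slope, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [4.1, 2.9, 2.2, 0.8]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

ss_corr = sum((yi - y_bar) ** 2 for yi in y)
ss_fact = sum((yh - y_bar) ** 2 for yh in y_hat)

r2 = ss_fact / ss_corr                     # Eq. 9.38
r = math.copysign(math.sqrt(r2), b1)       # Eq. 9.39: r takes the sign of b1

print(r2, r)

# The correlation coefficient looks more favorable than the fraction
# of SS_corr that is actually explained.
for big_r in (0.956, 0.60):
    print(big_r, round(big_r ** 2, 3))
```

The last loop reproduces the two figures quoted above: an $R$ of 0.956 corresponds to about 91.4% of $SS_{corr}$ explained, and an $R$ of 0.60 to only 36%.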
9.4 Statistical test for the effectiveness of the factors

A statistically valid measure of the effectiveness of the factors in fitting a model to the data is given by the Fisher variance ratio

$$F_{(p-1,\,n-p)} = \frac{s^2_{fact}}{s^2_r} = \frac{SS_{fact}/(p - 1)}{SS_r/(n - p)} \tag{9.40}$$

for linear models containing a $\beta_0$ term. (For models that do not contain a $\beta_0$ term, the ratio is still $s^2_{fact}/s^2_r$, but there are $p$ degrees of freedom in the numerator, not $p - 1$.) The F-test for the significance of the factor effects is usually called the test for the significance of the regression. Although it is beyond the scope of this presentation, it can be shown that $s^2_{corr}$, $s^2_{fact}$, and $s^2_r$ would all be expected to have the same value if the factors had no effect on the response. However, if the factors do have an effect on the response, then $s^2_{fact}$ will become larger than $s^2_r$.

The statistical test for the effectiveness of the factors asks the question, “Has a significant amount of variance in the data set been accounted for by the factors as they appear in the model?” Another way of asking this is to question whether or not one or more of the parameters associated with the factor effects is significant. For models containing a $\beta_0$ term, the null hypothesis to be tested is

$$H_0: \beta_1 = \cdots = \beta_{p-1} = 0 \tag{9.41}$$

(there are only $p - 1$ parameters because the parameter $\beta_0$ does not represent a factor effect) with an alternative hypothesis that one (or more) of the parameters is not equal to zero. We will let $(x)$ represent some function of the factor(s) $x$ and designate

$$y_{1i} = \beta_0 + \beta_1(x) + \cdots + \beta_{p-1}(x) + r_{1i} \tag{9.42}$$

as the expanded model and

$$y_{1i} = \beta_0 + r_{1i} \tag{9.43}$$

as the reduced model. If one or more of the parameters in the set ($\beta_1, \ldots, \beta_{p-1}$) is significant, then the variance of residuals for the expanded model should be significantly less than the variance of residuals for the reduced model. The difference, $s^2_{fact}$, must be due to one or more of the factor effects (i.e., the $\beta_1, \ldots, \beta_{p-1}$ in the expanded model). The more significant the factor effects, the larger $s^2_{fact}$ will be with respect to $s^2_r$.
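The variance ratio for the significance of the regression can be computed directly from the two sums of squares and their degrees of freedom. In this sketch the function name `f_regression`, its `has_b0` flag, and the numerical inputs are all hypothetical; converting the ratio to a level of confidence would additionally require F tables or an F-distribution routine, which is not shown.

```python
def f_regression(ss_fact, ss_r, n, p, has_b0=True):
    """Variance ratio s2_fact / s2_r for the significance of the regression.

    ss_fact : sum of squares due to the factors
    ss_r    : sum of squares of residuals
    n       : number of experiments
    p       : number of parameters in the model (including b0 if present)
    has_b0  : True if the model contains a b0 term (numerator DF = p - 1),
              False otherwise (numerator DF = p)
    """
    df_fact = p - 1 if has_b0 else p
    df_r = n - p
    if df_fact <= 0 or df_r <= 0:
        raise ValueError("both variances need at least one degree of freedom")
    s2_fact = ss_fact / df_fact
    s2_r = ss_r / df_r
    return s2_fact / s2_r

# Hypothetical sums of squares for a straight-line model fit to five experiments.
print(f_regression(ss_fact=9.409, ss_r=0.091, n=5, p=2))

# The same kind of ratio for a model without a b0 term: p DF in the numerator.
print(f_regression(ss_fact=8.0, ss_r=2.0, n=6, p=2, has_b0=False))
```

A large ratio indicates that the variance attributed to the factors is large compared with the residual variance; how large is "large" depends on both degrees of freedom and on the chosen level of confidence.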
9.5 Statistical test for the lack of fit

Figure 9.5 emphasizes the relationships among three other sums of squares in the ANOVA tree: the sum of squares due to lack of fit, $SS_{lof}$; the sum of squares due to purely experimental uncertainty, $SS_{pe}$; and the sum of squares of residuals, $SS_r$. Two of the resulting variances, $s^2_{lof}$ and $s^2_{pe}$, were used in Section 6.5 where a statistical test was developed for estimating the significance of the lack of fit of a model to a set of data. The null hypothesis

$$H_0: s^2_{lof} - s^2_{pe} = 0 \tag{9.44}$$

is tested with the Fisher variance ratio

$$F_{(f-p,\,n-f)} = \frac{s^2_{lof}}{s^2_{pe}} = \frac{SS_{lof}/(f - p)}{SS_{pe}/(n - f)} \tag{9.45}$$

The value of $F$ calculated in this manner can be tested against a tabular critical value of $F_{crit}$ at a given level of confidence (see Section 6.5), or the level of confidence at which the calculated value of $F_{calc}$ is critical may be determined (see Section 6.6). If the null hypothesis of Equation 9.44 is disproved (or is seriously questioned), the conclusion is that there is still a significant amount of the variation in the measured responses that is not explained by the model. That is, there is a significant lack of fit between the model and the data.

We emphasize that if the lack of fit of a model is to be tested, $f - p$ (the degrees of freedom associated with $SS_{lof}$) and $n - f$ (the degrees of freedom associated with $SS_{pe}$) must each be greater than zero; that is, the number of factor combinations must be greater than the number of parameters in the model, and there should be replication to provide an estimate of the variance due to purely experimental uncertainty.

Figure 9.5 Relationship between $SS_{lof}$ and $SS_{pe}$ for calculating the variance-ratio for the significance of the lack of fit, $F_{(f-p,\,n-f)}$.
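The two degrees-of-freedom requirements for the lack-of-fit test can be made explicit in code. The following sketch uses a hypothetical helper, `lack_of_fit_ratio`, and invented sums of squares; it refuses to compute the ratio of Equation 9.45 when either $f - p$ or $n - f$ is not greater than zero.

```python
def lack_of_fit_ratio(ss_lof, ss_pe, n, f, p):
    """Return (F, df_lof, df_pe) for the lack-of-fit variance ratio (Eq. 9.45).

    n : number of experiments
    f : number of distinct factor combinations (design points)
    p : number of parameters in the model
    """
    df_lof = f - p
    df_pe = n - f
    if df_lof <= 0:
        raise ValueError("need more design points than parameters (f > p)")
    if df_pe <= 0:
        raise ValueError("need replication to estimate pure error (n > f)")
    return (ss_lof / df_lof) / (ss_pe / df_pe), df_lof, df_pe

# Hypothetical case: seven experiments at three design points, straight-line model.
ratio, df_lof, df_pe = lack_of_fit_ratio(ss_lof=0.6, ss_pe=0.4, n=7, f=3, p=2)
print(ratio, df_lof, df_pe)

# Hypothetical case: only two design points for a two-parameter model.
try:
    lack_of_fit_ratio(ss_lof=0.0, ss_pe=0.4, n=7, f=2, p=2)
except ValueError as err:
    print("cannot test lack of fit:", err)
```

Raising an error rather than returning a ratio of zero or infinity keeps an untestable design from being mistaken for a model with no lack of fit.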
9.6 Statistical test for a set of parameters

Let us now consider the more general case in which the reduced model contains more parameters than the single parameter $\beta_0$. We assume that we have an expanded model

$$y_{1i} = \beta_0 + \beta_1(x) + \cdots + \beta_{g-1}(x) + \beta_g(x) + \cdots + \beta_{p-1}(x) + r_{1i} \tag{9.46}$$

containing $p$ parameters, and wish to test the significance of the set of parameters $\beta_g$ through $\beta_{p-1}$. The reduced model is then

$$y_{1i} = \beta_0 + \beta_1(x) + \cdots + \beta_{g-1}(x) + r_{1i} \tag{9.47}$$

and contains $g$ parameters. An appropriate partitioning of the sums of squares for these models is shown in Figure 9.6 and Table 9.4, where $SS_{rr}$ is the sum of squares of residuals for the reduced model (“rr” = “residuals, reduced”), and $SS_{re}$ is the sum of squares of residuals for the expanded model (“re” = “residuals, expanded”). The difference in these sums of squares ($SS_{exp}$) must be due to the presence of the additional $p - g$ parameters in the expanded model. (In Figure 9.6, the notation $SS_{red}$ refers to the sum of squares due to the factor effects in the reduced model only; $SS_{exp}$ refers to the sum of squares due to the additional factor effects in the expanded model.)

Figure 9.6 Sums of squares and degrees of freedom tree illustrating additive relationships for testing a subset of the parameters in a model.

TABLE 9.4 ANOVA table for testing the significance of a set of parameters.

| Source | Degrees of freedom | Sum of squares | Mean square |
|---|---|---|---|
| Total | $n$ | $SS_T$ | $SS_T/n$ |
| Mean | 1 | $SS_{mean}$ | $SS_{mean}/1$ |
| Corrected | $n - 1$ | $SS_{corr}$ | $SS_{corr}/(n - 1)$ |
| Factors, reduced model | $g - 1$ | $SS_{red}$ | $SS_{red}/(g - 1)$ |
| Residuals, reduced model | $n - g$ | $SS_{rr}$ | $SS_{rr}/(n - g)$ |
| Factors, additional | $p - g$ | $SS_{exp}$ | $SS_{exp}/(p - g)$ |
| Residuals, expanded model | $n - p$ | $SS_{re}$ | $SS_{re}/(n - p)$ |
| Lack of fit | $f - p$ | $SS_{lof}$ | $SS_{lof}/(f - p)$ |
| Purely experimental uncertainty | $n - f$ | $SS_{pe}$ | $SS_{pe}/(n - f)$ |

An appropriate test of the hypothesis that

$$H_0: \beta_g = \cdots = \beta_{p-1} = 0 \tag{9.48}$$

is

$$F_{(p-g,\,n-p)} = \frac{s^2_{exp}}{s^2_{re}} = \frac{SS_{exp}/(p - g)}{SS_{re}/(n - p)} \tag{9.49}$$

If $F_{calc} > F_{crit}$, then the null hypothesis is disproved at the given level of confidence, and one or more of the parameters $\beta_g, \ldots, \beta_{p-1}$ offers a significant reduction in the variance of the data. Alternatively, the level of confidence (or risk) at which one or more of the parameter estimates is significantly different from zero may be calculated (see Section 6.6). Equation 9.40 is seen to be a special case of Equation 9.49 for which $g = 1$.

Why bother with an F-test for the significance of a set of parameters? Why not simply use a single-parameter test to determine the significance of each parameter estimate individually? The answer is that the risk of falsely rejecting the null hypothesis that $\beta_i = 0$ for at least one of the parameters is no longer $\alpha$: the probability of falsely rejecting at least one of the null hypotheses when all $k$ null hypotheses are true is $[1 - (1 - \alpha)^k]$. If there are two parameters in the model, the overall risk becomes 0.0975 (if $\alpha = 0.05$); for three parameters, the risk is 0.1426; and so on. The use of the F-test developed in this section allows the simultaneous testing of a set of parameters with a specified risk of falsely rejecting the overall null hypothesis when it is true.

It should be pointed out that if the null hypothesis is disproved, it offers no indication of which parameter(s), either individually or jointly, are significantly different from zero. In addition, if the null hypothesis cannot be rejected, it does not mean that the parameter estimates in question are insignificant; it means only that they are not significant at the given level of probability (see Chapter 6). Again, determining the level of confidence at which $F$ is significant is useful.
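Both the overall risk of repeated single-parameter tests and the variance ratio of Equation 9.49 are short calculations. In the sketch below the function names `overall_risk` and `partial_f` and the sums of squares passed to `partial_f` are hypothetical; the risk calculation treats the individual tests as independent, which is the assumption behind the expression $[1 - (1 - \alpha)^k]$.

```python
def overall_risk(alpha, k):
    """Probability of at least one false rejection in k tests, each at risk alpha."""
    return 1.0 - (1.0 - alpha) ** k

def partial_f(ss_rr, ss_re, n, g, p):
    """Variance ratio of Eq. 9.49 for the additional p - g parameters.

    ss_rr : sum of squares of residuals, reduced model (g parameters)
    ss_re : sum of squares of residuals, expanded model (p parameters)
    """
    ss_exp = ss_rr - ss_re                 # due to the additional parameters
    return (ss_exp / (p - g)) / (ss_re / (n - p))

# Overall risk for two and for three single-parameter tests at alpha = 0.05.
print(round(overall_risk(0.05, 2), 4))
print(round(overall_risk(0.05, 3), 4))

# With g = 1 the reduced model is y = b0 + r, so SS_rr equals SS_corr and the
# ratio reduces to Eq. 9.40. The sums of squares here are hypothetical.
print(partial_f(ss_rr=9.5, ss_re=0.091, n=5, g=1, p=2))
```

The two rounded risks agree with the values 0.0975 and 0.1426 quoted in the text.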
9.7 Statistical significance and practical significance

The F-tests for the effectiveness of the factors and for the lack of fit sometimes give seemingly conflicting results: with some sets of data, it will happen that each of the F-tests will be highly significant. The question then arises, “How can a model exhibit both highly significant factor effects and a highly significant lack of fit?” Such a situation will often arise if the model does indeed fit the data well, and if the measurement process is highly precise. Recall that the F-test for lack of fit compares the variance due to lack of fit with the variance due to purely experimental uncertainty. The reference point of this comparison is the precision with which measurements can be made. Thus, although the lack of fit might be so small as to be of no practical importance, the F-test for lack of fit will show that it is statistically significant if the estimated variance due to purely experimental uncertainty is relatively very small. It is important in this case to keep in mind the distinction between “statistical significance” and “practical significance.” If, in a practical sense, the residuals are small enough to be considered acceptable for the particular application, it is not necessary to test for lack of fit.
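A small numerical sketch can make this distinction concrete. The data below are hypothetical: a very slightly curved response, measured with replicates that differ by only 0.002, is fit with a straight line. The tabulated 95% critical value of $F$ for 1 and 3 degrees of freedom is about 10.13.

```python
from collections import defaultdict

# Hypothetical data: slight curvature, very precise replicate measurements.
x = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
y = [1.009, 1.011, 2.039, 2.041, 3.089, 3.091]

n, p = len(x), 2                       # straight-line model: b0 and b1

# Straight-line least squares fit (closed form for a single factor).
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# Mean replicate response at each design point.
groups = defaultdict(list)
for xi, yi in zip(x, y):
    groups[xi].append(yi)
f = len(groups)
j = [sum(groups[xi]) / len(groups[xi]) for xi in x]

ss_lof = sum((ji - yh) ** 2 for ji, yh in zip(j, y_hat))   # f - p = 1 DF
ss_pe = sum((yi - ji) ** 2 for yi, ji in zip(y, j))        # n - f = 3 DF
f_lof = (ss_lof / (f - p)) / (ss_pe / (n - f))

largest_residual = max(abs(yi - yh) for yi, yh in zip(y, y_hat))

print(f_lof)               # well above the critical value of about 10.13
print(largest_residual)    # yet every residual is smaller than 0.01
```

The lack of fit is statistically significant only because the pure-error variance is so small; whether residuals of this size matter is a practical question that the F-test cannot answer.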
Exercises

9.1 Total sum of squares. Calculate the total sum of squares, $SS_T$, for the nine responses in Section 3.1 (see Equation 9.2). How many degrees of freedom are associated with this sum of squares?

9.2 Sum of squares due to the mean. Calculate the sum of squares due to the mean, $SS_{mean}$, for the nine responses in Section 3.1 (see Equation 9.4). How many degrees of freedom are associated with this sum of squares?

9.3 Sum of squares corrected for the mean. Use the $C$ matrix of Equation 9.5 to calculate the sum of squares corrected for the mean, $SS_{corr}$, for the nine responses in Section 3.1 (see Equation 9.6). How many degrees of freedom are associated with this sum of squares?

9.4 Variance. Calculation of $SS_{corr}$ is allowed if there is a $\beta_0$ term in the model. A sum of squares divided by its associated degrees of freedom gives an estimated variance. Comment on the model underlying the calculations in Sections 3.1 and 3.3. Comment on the fact that the numerical value of Equation 3.5 is equal to $SS_{corr}/DF_{corr}$ from Problem 9.3.

9.5 Additivity of sums of squares and degrees of freedom. Use the results of Problems 9.1-9.3 to illustrate that $SS_T = SS_{mean} + SS_{corr}$ and that $DF_T = DF_{mean} + DF_{corr}$.

9.6 Sum of squares due to factors. Fit the model $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$ to the following data:

Calculate $SS_T$, $SS_{mean}$, and $SS_{corr}$. Calculate $\hat{Y}$ (Equation 9.14), $F$ (Equation 9.15), and $SS_{fact}$ (Equation 9.16). How many degrees of freedom are associated with $SS_{fact}$?
9.7 Sum of squares of residuals. Use Equations 9.17 and 9.18 to calculate the sum of squares of residuals, $SS_r$, for the data and model of Problem 9.6. How many degrees of freedom are associated with this sum of squares? Use numerical values to show that $SS_{corr} = SS_{fact} + SS_r$ and that $DF_{corr} = DF_{fact} + DF_r$.

9.8 Matrix of mean replicate responses and sum of squares due to lack of fit. Calculate the $J$ matrix for Problem 9.6. Calculate the corresponding $L$ matrix and $SS_{lof}$. How many degrees of freedom are associated with this sum of squares?

9.9 Sum of squares due to purely experimental uncertainty. Calculate the sum of squares due to purely experimental uncertainty, $SS_{pe}$, for the model and data of Problem 9.6. How many degrees of freedom are associated with $SS_{pe}$? Use numerical values to show that $SS_r = SS_{lof} + SS_{pe}$ and that $DF_r = DF_{lof} + DF_{pe}$.

9.10 Sums of squares and degrees of freedom tree. Use the data in Problems 9.6-9.9 to construct a sums of squares and degrees of freedom tree (Figure 9.2).

9.11 Matrix operations for sums of squares. Note the additivities that exist between each odd-numbered line (except the last) and the two lines below it in Table 9.1. What are the relationships between Table 9.1 and Figure 9.1?

9.12 Coefficients of determination and correlation. Calculate the coefficient of determination, $r^2$, and the coefficient of correlation, $r$, for the model and data of Problem 9.6. What is the difference between the coefficient of determination and the coefficient of multiple determination?

9.13 Correlation coefficient. What information is contained in $r = -0.70$? What information is contained in $r = +1.36$?

9.14 Significance of the regression. Calculate the Fisher F-ratio for the significance of the factor effects (Equation 9.40) for the model and data of Problem 9.6. At approximately what level of confidence is the factor $x_1$ significant?

9.15 Significance of the regression. In Section 9.4 it is stated that the statistical test for the effectiveness of the factors asks the question, “Has a significant amount of variance in the data set been accounted for by the factors as they appear in the model?” Why is it necessary to qualify this with the phrase, “...as they appear in the model?”

9.16 Significance of the lack of fit. Calculate the Fisher F-ratio for the significance of the lack of fit (Equation 9.45) for the model and data of Problem 9.6. Is the lack of fit very significant?
9.17 Degrees of freedom. The following data were obtained by an experimenter who believed a straight line relationship would be a good model. How many degrees of freedom are there for lack of fit?

3.5    2.8
7.6    5.9
3.5    3.1
3.5    2.6
7.6    5.5
3.5    2.7
7.5    6.3
9.18 Statistical test for a set of parameters. Section 9.6 shows how to test the significance of a set of parameters in a model. This “set” could contain just one parameter. Is it possible to fit a large, multi-parameter model, and then test each parameter in turn, eliminating those parameters that do not have a highly significant effect, until a concise, “best” model is obtained? Is it possible to start with a small, single-parameter model, and then add parameters, testing each in turn, keeping only those that do have a highly significant effect, until a “best” model is obtained? Would the order of elimination or addition of parameters make a difference? [See, for example, Draper and Smith (1981).]

9.19 Degrees of freedom. The following data were obtained by an experimenter who believed a straight line relationship would be a good model. How many degrees of freedom are there for lack of fit? Can the experimenter test the significance of the lack of fit?

2.6    3.5
2.7    3.5
2.8    3.5
3.1    3.5
5.5    7.6
5.9    7.6
6.3    7.6
9.20 Degrees of freedom. An experimenter wanted to fit the model $y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_{11} x_{1i}^2 + r_{1i}$ to the data of Problem 9.17. Comment.

9.21 Hypothesis testing. Why is Equation 9.49 an appropriate test of the hypothesis $H_0$: $\beta_g = \cdots = \beta_{p-1} = 0$?
9.22 Hypothesis testing. Suppose a food manufacturer uses a “backward elimination” procedure in its recipes. When first developed, the recipe is a gourmet delight! Before long, however, someone asks the question, “I wonder if it would taste different if we used only half as many eggs?” The recipe is prepared with only half as many eggs, and is presented with the original product to a panel of tasters. The null hypothesis is usually $H_0$: (taste of original recipe) - (taste of modified recipe) = 0. The alternative hypothesis is that the difference in taste is significant, and the level of confidence is usually set at about 95%. Most of the time the null hypothesis cannot be disproved, so the modified recipe is used from then on. Soon, a second question is asked: “I wonder if the modified recipe would taste different if we used only half as much butter?” so the modified recipe is further modified and tested against the recipe that uses half as many eggs. And a third question, “I wonder if the modified-modified recipe would taste different if we used artificial flavoring?” so the modified-modified recipe is further modified and tested against the recipe that uses half as many eggs and half as much butter. And so on. The result is a recipe that “tastes like cardboard.” If the “cardboard” recipe were now tested against the original recipe, do you think the difference in taste would be significant at a high level of probability? What does all this say about testing parameters one at a time, rather than as a group? Consider the improvement of quality. Is it possible that small, incremental improvements might not individually show high statistical significance, yet all of them taken as a whole might show a highly statistically significant effect? Is, in fact, the correct question being asked? [See Section 1.2, Table 1.1, and Figure 6.3.]

9.23 Lack of fit. A good method of detecting lack of fit is to plot residuals as a function of the factor levels. The technique works best for single-factor systems, but often can be used effectively for multifactor systems. If a straight line model were fit to the data in Figure 2.3, what would the plot of residuals vs. the level of factor $x_1$ look like? What would this suggest about the adequacy of the model? Would the pattern of residuals suggest a form for an additional term in the model? What terms might be suggested if the residuals were distributed in a “W”-type pattern? An “M”-type pattern?

9.24 Statistical significance and practical significance. Look up an account of the discovery of the planet Pluto. What can be said about the effectiveness of the factors in the models that described the motions of the planets other than Pluto? What can be said about the lack of fit of these models to the available astronomical data? How was the lack of fit accounted for?

9.25 Experiment vs. observation. Are the data in Problem 9.24 the results of experimentation or observation (see Section 1.2)?

9.26 Effectiveness of factors and lack of fit. Science is often described as an iterative process that involves an interplay between hypothesis and experimentation. [See, for example, Box (1976).] How are the concepts of “goodness of fit” and “lack of fit” used in this iterative process?

9.27 Experimental design. A system is thought to be described by the model $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$; in fact, it can be better described by the model $y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_{11} x_{1i}^2 + r_{1i}$. Can you design a set of experiments that would tend to show that the factor effects in the first model are highly significant, and that the model exhibits very little lack of fit? Can you design a set of experiments that would tend to show that the factor effects in the first model are not very significant, and that the model exhibits a highly significant lack of fit? Can experimental design be used (or misused) to strengthen or otherwise influence the conclusions drawn from experimental data?
9.28 Significance of regression. It has been suggested that it is “unfair” to judge the effectiveness of the factors (i.e., the significance of the regression) on the basis of $s^2_{fact}/s^2_r$: one of the components of $s^2_r$ is $s^2_{pe}$, and the factor effects should not be asked (or expected) to account for imprecision. An alternative F-test might be $s^2_{fact}/s^2_{lof}$. Comment.
9.29 ANOVA table. Write an ANOVA table for the model $y_{1i} = \beta_1 x_{1i} + r_{1i}$ fit to the following data:

3.0    2.6
3.0    3.5
3.0    2.9
6.0    5.9
6.0    6.3
6.0    6.1
9.0    9.1
9.0    9.3
9.0    9.5
9.30 Mean squares. A “mean square residuals” is equal to 1.395. If the model contained $\beta_0$ and five additional parameters, and if the model was fit to data from twelve experiments, what is the variance of residuals? The sum of squares of residuals?

9.31 Sums of squares. The calculation of the mean and standard deviation in Chapter 3 can be viewed in terms of the linear model $y_{1i} = \beta_0 + r_{1i}$. Prepare a sums of squares and degrees of freedom tree (Figure 9.2) for this model and the nine data points of Section 3.1. How do you interpret the fact that $SS_{corr} = SS_r = SS_{pe}$? Why are $DF_{fact}$ and $DF_{lof}$ equal to zero?
CHAPTER 10
An Example of Regression Analysis on Existing Data
Although our purpose in introducing the subject of data treatment has been to provide insight into the design of experiments, the technique of least squares (regression analysis) is often used to fit models to data that have been acquired by observation rather than by experiment. In this chapter we discuss an example of regression analysis applied to observational data.
10.1 The data

A chemistry student who was enrolled in a master of business administration (MBA) statistics course was given the assignment, “Collect data and perform a regression analysis.” The student worked in the finance department of a large chemical company. She and some of her colleagues were interested in predicting the amount of money in checks that would clear the banks one or two days in the future as a function of the amount of money issued in checks on previous days. Her goal was to develop a useful predictive model: it was not expected that the model would be exact, but it was hoped that the model would reduce the amount of uncertainty in the check clearing process.

The student collected one data pair for each business day for the past nine months. Each data pair consisted of (1) the amount of money issued by her department in computer-generated checks on that day, and (2) the amount of money in checks that cleared the banks on that day. Table 10.1 is a four-column list of the 177 pairs of data she collected. Each entry gives: (1) the sequence, or acquisition number (“Seq”), starting with Thursday, August 8, and increasing by one each business day, five business days a week; (2) a nominal scale (that can also be used as an ordinal or interval scale) for the day of the week (“D”), where 1 = Monday, 2 = Tuesday, 3 = Wednesday, 4 = Thursday, and 5 = Friday; (3) the amount of money issued in checks (“Iss”); and (4) the amount of money in checks that cleared (“Clr”). Figure 10.1 plots the issues and clearings as a function of the sequence number. The large amount of variation in both issues and clearings is apparent.
Figure 10.1 The amounts of check clearings and check issues as a function of days.
10.2 Preliminary observations
Before “fitting a model to the data,” it is often wise to look at the data and listen with our eyes to hear what the data is trying to tell us about the form the model might take. Profound knowledge [W. Deming (1986)] can usually be used to develop mechanistic models that are superior to off-the-shelf empirical models (see Section 1.4). Inspection of Table 10.1 and Figure 10.1 shows that on eight days the amounts of issues and clearings are both zero (sequence numbers 8, 18, 68, 81, 100, 105, 118, and 138). Five of these eight days are Mondays and represent legal holidays (days 8, 18, 68, 118, and 138); one day (day 81, Thursday) is the American Thanksgiving holiday; and days 100 and 105 represent the Christmas and New Year's holidays, respectively, each of which fell on a Wednesday. These eight data points should probably be excluded from the data set used to model the non-holiday behavior of the system. A separate model can be used to accurately predict the clearings ($y_{1i}$) on holidays:
$$y_{1i} = 0 \qquad (10.1)$$
There is an extreme data point at day 128 where the clearings have the value 26.0. This is by far the largest value in the data set: it is half again as large as the next largest value (17.6 on day 99). Although the student did not find out the exact reason
TABLE 10.1 Data pairs for check clearing data. Seq D Iss 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
4
4.9 4.1 1 5.8 2 2.7 3 4.1 4 2.6 5 5.0 1 0.0 2 7.5 3 5.4 4 9.1 5 8.4 1 15.4 2 11.4 3 6.5 4 2.8 5 10.0 1 0.0 2 9.4 3 8.9 4 4.8 5 4.0 1 4.9 2 6.3 3 3.6 4 2.4 5 5.9 1 7.5 2 4.0 3 2.0 4 2.0 5 5.0 1 0.5 14.9 2 12.9 2.0 3 6.3 12.7 4 5.2 7.7 5 7.6 7.1 1 3.7 11.8 2 12.9 9.7 4.2 3 3.0 4 1.9 7.1 5 6.6 10.1 1 5.6 9.1 44 2 5.0 5.1 45 3 2.2 5.1
5
3.3 4.9 3.0 4.5 3.2 7.7 11.1 0.0 16.9 5.7 6.3 3.3 5.1 6.9 5.9 6.7 4.1 0.0 3.5 5.1 3.6 3.5 4.7 2.4 10.4 3.4 3.4 7.4 6.3 11.5 0.5 21.9
Clr
Seq D Iss
Clr
Seq D Iss
Clr
Seq D Iss
Clr
46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
2.9 3.1 8.3 0.0 5.5 3.7 5.4 9.8 10.9 7.0 8.7 14.1 15.8 8.8 9.4 7.8 6.9 8.3 7.2 6.1 2.4 6.0 0.0 8.3 9.4 3.3 3.4 8.8 10.9 5.9 5.0 12.5 13.3 14.5 14.5 0.0 2.8 10.6 9.9 7.8 3.5 2.7 8.8 2.5 7.4
91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135
3.3 3.9 6.5 7.4 7.5 5.9 15.0 8.6 17.6 0.0 14.0 10.8 11.8 15.3 0.0 6.4 4.6 6.5 8.7 3.9 1.5 3.6 6.2 6.4 6.8 4.7 3.4 0.0 7.9 10.8
136 4 137 5 138 1 139 2 140 3 141 4 142 5 143 1 144 2 145 3 146 4 147 5 148 1 149 2 150 3 151 4 152 5 153 1 154 2 155 3 156 4 157 5 158 1 159 2 160 3 161 4 162 5 163 1 164 2 165 3 166 4 167 5 168 1 169 2 170 3 171 4 172 5 173 1 174 2 175 3 176 4 177 5
2.1 1.6 0.0 7.2 5.7 5.7 2.3 9.4 16.9 11.6 4.0 4.5 10.0 7.6 6.4 3.5 5.0 1.0 6.2 3.8 1.7 2.4 5.4 5.1 4.7 3.8 5.1 10.0 7.0 12.1 7.2 4.8 10.3 8.3 4.8 3.8 4.5 7.4 4.7 4.8 1.9 4.8
64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3
2.9 0.0 0.5 9.9 11.8 5.5 0.0 10.0 7.9 8.0 6.4 7.1 9.7 2.3 5.8 4.4 8.2 1.2 0.0 2.9 4.5 7.2 0.0 6.3 8.1 6.0 14.0 6.2 14.1 7.2 8.0 13.6 4.0 12.4 5.6 0.0 7.2 0.0 2.7 4.5 1.4 5.1 6.2 3.7 6.0
4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3
4.6 7.8 5.9 18.8 19.7 17.5 4.3 11.2 11.9 0.0 9.8 13.2 4.8 5.1 0.0 3.6 2.7 3.2 1.8 3.0 3.1 5.2 18.2 5.6 5.5 9.3 7.6 0.0 10.1 10.2 9.5 6.0 4.0 2.9 6.6 5.5 3.3 2.1 3.9 3.8 5.8 9.0 3.6 2.7 3.6
11.9 5.5 15.1 15.7 3.9 5.2 2.5
26.0 5.0 4.7 2.6 2.6 9.9 6.4 9.9
7.1 3.6 0.0 5.3 11.5 9.1 9.2 4.8 6.0 10.7 5.5 4.0 3.2 1.0 3.9 2.4 3.9 2.0 12.0 3.9 4.2 0.0 4.9 10.2 10.0 6.5 5.6 7.1 1.3 10.5 4.8 2.5 3.5 4.6 5.3 2.3 3.9 2.3 4.3 2.8 1.8 9.5
for the aberration, discussions with individuals in her department led her to believe that a special cause was at work and that this data point did not represent the normal behavior of the system. Thus, it was decided to exclude this point from the data set.
10.3 Statistical process control charts of the data
Statistical process control charts [Wheeler and Chambers (1986), Wheeler (1987), and Grant and Leavenworth (1988)] were prepared to gain insight into the stability of the check clearing process. The top panel in Figure 10.2 shows the clearings as a function of measurement number (the original data minus the deletions discussed above). The middle panel, the “x-bar chart,” shows the sequential five-member subgroup averages as a function of subgroup number. The bottom panel shows the five-member subgroup ranges as a function of subgroup number. The range chart
Figure 10.2 Statistical process control charts for clearings. Top panel: runs chart showing clearings as a function of measurement number. Middle panel: x-bar chart with dashed upper control limit (UCL) and lower control limit (LCL); solid horizontal line is the grand mean, $\bar{\bar{x}}$. Bottom panel: range chart with dashed upper control limit (UCL); solid horizontal line is the average range, $\bar{R}$.
shows that the short-term variability of the clearings process is stable and in statistical control. The x-bar chart shows that the clearings process mean is stable and in statistical control, with only one exception - the excursion above the upper control limit at subgroup 20. More revealing, however, is the cyclical pattern in the x-bar chart. There appears to be a cycle every four or five subgroups. Each period in the cycle in the x-bar chart consists of two low points followed by two high points. Close inspection shows that the slopes defined by the two high points start out negative, become positive, and then become negative again over the course of the chart. Similarly, the slopes defined by the two low points start out negative, become positive, and then become negative again over the course of the chart. This is often indicative of a phase phenomenon. The phase might be related to the rational choice of five as the subgroup number (the number of days in a week) applied to a sequence of data that is no longer synchronized with weeks - some of the data has been omitted as discussed previously. The phase might also be related to an interacting monthly cycle that would have a period of between four and five weeks duration.
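The construction of the x-bar and range charts can be sketched in a few lines. The text does not say how the control limits were calculated, so this sketch assumes the conventional Shewhart constants for five-member subgroups (A2 = 0.577, D3 = 0, D4 = 2.114). The function name `xbar_r_limits` and the demonstration values are hypothetical and are not taken from Table 10.1.

```python
import numpy as np

# Conventional Shewhart constants for subgroups of size 5 (an assumption;
# the text does not state how its control limits were obtained).
A2, D3, D4 = 0.577, 0.0, 2.114

def xbar_r_limits(values, n=5):
    """Return subgroup means, subgroup ranges, and (LCL, center, UCL) limits."""
    values = np.asarray(values, dtype=float)
    k = len(values) // n                       # number of complete subgroups
    groups = values[:k * n].reshape(k, n)      # any incomplete final subgroup is dropped
    means = groups.mean(axis=1)
    ranges = groups.max(axis=1) - groups.min(axis=1)
    grand_mean, mean_range = means.mean(), ranges.mean()
    limits = {
        "xbar": (grand_mean - A2 * mean_range, grand_mean, grand_mean + A2 * mean_range),
        "range": (D3 * mean_range, mean_range, D4 * mean_range),
    }
    return means, ranges, limits

# Hypothetical clearings, NOT the values of Table 10.1.
demo = [4, 6, 5, 7, 3, 8, 10, 9, 7, 6]
means, ranges, limits = xbar_r_limits(demo)
print(means, ranges, limits)
```

A subgroup mean outside the "xbar" limits, such as the excursion at subgroup 20, would be flagged as a possible special cause. Cyclical patterns like the one described above lie inside the limits and have to be found by inspection.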
Figure 10.3 Statistical process control charts for issues. See Figure 10.2 for details.
Figure 10.4 Superimposed x-bar and range charts for clearings and issues. See text for details.
The top panel in Figure 10.3 shows the issues as a function of measurement number. The bottom range chart shows that the short-term variability of the issues process is stable and in statistical control. The x-bar chart shows that the issues process mean is stable and in statistical control, with one exception at subgroup 19. A cyclical pattern appears to be present in the x-bar chart, but the structure is not as well defined as it is for the clearings data. Figure 10.4 superimposes the x-bar and range charts for the clearings and issues data. The range chart suggests some correlation in subgroup variabilities for issues and clearings. The x-bar chart suggests a stronger correlation in subgroup means for issues and clearings with the issues generally leading the clearings, as expected.
10.4 Cyclical patterns in the data
Initial inspection of Figure 10.1 showed what appears to be a cyclical pattern in the clearings and issues. This was confirmed by the statistical process control charts in Figures 10.2-10.4. The broad peaks and valleys in Figure 10.1 seem to repeat every 20 to 25 days. At first, this seemed to be a strange number of days for a cycle - a monthly cycle of 30 or 31 days would have made more sense to some of us. However, the student was quick to point out that the average business month has between 21 and 22 days: (365.25 calendar days per year) × (5 business days per week) / (7 calendar days per week) = 260.89 business days per year which, when divided by 12 months, is 21.74 or approximately 22 business days per month. This observation of a cycle suggested including in the eventual model a trigonometric sine function of the form
$$y_{1i} = \cdots + \beta_s \sin[(x_s - \phi)/\tau] + \cdots \qquad (10.2)$$
where $x_s$ is the sequence number, $\phi$ is a phase constant, $\tau$ is the period, and $\beta_s$ is the magnitude of the cyclical effect. The phase $\phi$ serves to shift the cycle forward or backward in time (left or right in Figure 10.1). The period $\tau$ serves to stretch or compress the cycle. Both the phase and period can be used to align the cycle with the peaks and valleys of the observed data. If $\phi$ and $\tau$ were parameters of the model, then Equation 10.2 would be a non-linear model (see the introduction to Chapter 5). However, if the period ($\tau$) is fixed at the previously suggested value of 22 days, then it becomes a constant and is no longer a variable parameter. Inspection of Figure 10.1 suggests that the beginning of the cycle in clearings is approximately six to ten days into the sequence; thus, if the phase $\phi$ is fixed at an intermediate value of eight days, then it too becomes a constant and is no longer a variable parameter. If $\phi = 8$ and $\tau = 22$, then the only adjustable parameter in Equation 10.2 is $\beta_s$, and the model expressed by Equation 10.2 is linear:
$$y_{1i} = \cdots + \beta_s \sin[(x_s - 8)/22] + \cdots \qquad (10.3)$$
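A minimal sketch may make the point about linearity concrete: once the phase and period are held constant, the sine term is simply another column of known numbers, and its coefficient can be estimated by ordinary least squares. The function name `sine_regressor`, the sequence numbers, and the coefficients 7.0 and 1.5 are illustrative assumptions. The sine argument is taken literally as printed, in radians, which is also an assumption.

```python
import numpy as np

def sine_regressor(x, phase=8.0, period=22.0):
    """Sine column with phase and period held constant (argument in radians, assumed)."""
    return np.sin((np.asarray(x, dtype=float) - phase) / period)

x = np.arange(1, 101)                          # hypothetical sequence numbers
X = np.column_stack([np.ones(len(x)), sine_regressor(x)])
y = 7.0 + 1.5 * X[:, 1]                        # noise-free synthetic response
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # linear least squares recovers both values
print(beta)
```

If the phase or period were also to be estimated, the column itself would change with the parameters and a non-linear fitting method would be needed.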
10.5 Discussion of process details
At this point, the student reviewed the check clearing process in detail. Checks are issued to a supplier and an encumbrance is placed on an internal account in the chemical company; the supplier cashes the check at the supplier’s bank; the check proceeds through a series of intermediate banks until it arrives at the chemical company’s bank; the issued amount is removed from the chemical company’s bank account (cleared); and the check is then returned to the chemical company where the encumbrance on the internal account is removed. The system must achieve an overall “mass balance” such that the total or cumulative amount of money issued must eventually equal the total or cumulative amount of money cleared. In view of this mechanism, the amount of money cleared on a given day must be related to the amount of money issued on previous days. If it always took the same amount of time for a check to go through the system, then the clearings could be predicted perfectly. However, because checks are issued throughout the business day; because the business day is only one-third of a full day; because business days for various suppliers are offset by different changes of time zones; because suppliers submit checks for cashing at their own time; because different banks involved in the clearing process are in different time zones; and because of a large number of other reasons, it does not always take the same amount of time for checks to go through the system.
10.6 An abandoned search for a driving force
Given Le Chatelier’s Principle (when a stress is placed on a system at equilibrium, the system will adjust itself in such a way as to relieve the stress), it seemed that if
the cumulative amount of money being issued temporarily exceeded the cumulative amount of money being cleared (over a period of a few days), then the check clearing process would find itself in a situation of stress. This system of stress could be relieved by increasing the number of clearings. Thus, it was felt that an indicator of this type of stress might be useful for modeling the behavior of the system. The student suggested that the cumulative issues be subtracted from the cumulative clearings (or vice versa) to provide a stress indicator. This integral of issues minus clearings might signal an increase or decrease in the number of clearings on a future day. Figure 10.5 shows the complementary integral of clearings minus issues plotted above the individual clearings and issues. Two things are immediately apparent. First, the cyclical behavior is evident again. The integration process tends to smooth noise or variability in data and reveal more clearly the underlying trends in the data. The integral has the features of a sine wave with a period of slightly greater than 20 days. Second (and potentially more interesting), the system does not seem to be closed - the clearings are consistently larger than the issues, with the result that the integral increases over time. This increasing difference between clearings and issues was puzzling. The amount of each check issued should be balanced by an equal amount cleared. One explanation for the behavior seen in the top panel of Figure 10.5 is that over the 177 business days investigated in this study there was a gradual decrease in the amount of time it took the system to clear checks. But the amounts involved suggest that the average clearing time would have to have decreased by about seven or eight days, an improvement that the student thought was highly unlikely. Another
Figure 10.5 Integral of clearings minus issues as a function of days. See text for details.
explanation for the increasing integral involves embezzlement. However, the student felt that this was also a highly unlikely explanation because of rigid verification procedures within the finance department. Investigation by the student revealed that the amounts reported to her for issues had been obtained for computer-generated checks only; the total did not include checks that had been hand written. The clearings, however, were based on both computer-generated and hand-written checks. Thus, there was a simple explanation for the increasing integral. Although using the combined amounts of computer- and hand-written checks for modeling purposes would probably give more precise predictions, it turned out that it would be difficult to obtain the amounts for the hand-written checks on a timely basis. Thus, the modeling problem was redefined to predict the amount of money in checks that would clear the banks on a given day expressed as a function of the amount of money issued in computer-generated checks on previous days. Because the integral in Figure 10.5 reflects more than just a transient stress on the system (it also includes the cumulative contribution of unknown hand-written checks), this search for a useful indicator of system stress that could be used as a driving force for the estimation of future clearings was abandoned.
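The integral of clearings minus issues discussed in this section is just a difference of running totals, which can be sketched as follows. The function name `stress_indicator` and the five-day series are hypothetical; the numbers are chosen only to show a drift of the kind produced when clearings include checks that the issues do not.

```python
import numpy as np

def stress_indicator(issues, clearings):
    """Cumulative clearings minus cumulative issues, day by day."""
    return np.cumsum(np.asarray(clearings, dtype=float)) - np.cumsum(np.asarray(issues, dtype=float))

# Hypothetical amounts, NOT taken from Table 10.1.
issues = [5.0, 4.0, 6.0, 0.0, 5.0]        # computer-generated checks only
clearings = [3.0, 6.0, 7.0, 0.0, 6.0]     # computer-generated plus hand-written checks
integral = stress_indicator(issues, clearings)
print(integral)                           # running difference drifts upward
```

In a closed system the running difference would return toward zero. A steady upward drift, as in Figure 10.5, points to money entering the clearings that is missing from the issues.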
Figure 10.6 The amounts of check clearings and check issues as a function of days, suggesting that Mondays (solid dots) have higher clearings.
10.7 Observation of a day-of-the-week effect
A chance remark by the student suggested that the day of the week might be an important factor for predicting the number of check clearings. She stated that in her experience, check clearings were always large on Monday and tapered off throughout the rest of the week. This observation seemed reasonable from a mechanistic point of view: if the processing of checks included transmission over the weekend, then some of the checks would be bunched up and ready for processing on Monday, with some of them perhaps increasing the Tuesday clearings as well. To test this hypothesis, the plot shown in Figure 10.6 was generated, where the solid black dots indicate Mondays. (Monday holidays are not marked.) The "Monday
Figure 10.7 Plots similar to Figure 10.6 showing clearings on Tuesdays, Wednesdays, Thursdays, and Fridays.
effect" seems to be real: in general, Monday clearings are higher than the clearings on the subsequent four days. To further test this hypothesis, similar plots were made for the other days of the week; the plots are shown in Figure 10.7. Thursdays and Fridays appear to give the smallest values of clearings, while Tuesdays and Wednesdays are intermediate. Thus, in a coarse way, the student's observation is verified: as the day number increases, the clearings decrease. This observation of a day-of-the-week effect suggested adding to the model a term of the form
$$y_{1i} = \cdots + \beta_d x_d + \cdots \qquad (10.4)$$
Figure 10.8 Clearings vs. issues for a lag of 0, 1, 2, and 3 days.
where $x_d$ is the ordinal or interval code for the day of the week (1 = Monday, 2 = Tuesday, 3 = Wednesday, 4 = Thursday, and 5 = Friday). A negative value for the parameter $\beta_d$ would cause the clearings to decrease as the week progressed. This is an admittedly coarse way to model the day effect. It assumes that the change from day to day is constant.
10.8 Observations about prior check issues
Unless the check clearing process were instantaneous, it seems unlikely that the clearings on a given day would be a function of the issues on that same day.
Figure 10.9 Clearings vs. issues for a lag of 4, 5, 6, and 7 days.
However, it does seem reasonable that the amount of money in checks that would clear the banks on a given day should be a function of the amount of money issued in checks on previous days (the student's original hypothesis). This was seen in the data earlier when close observation of Figure 10.4 indicated that the 22-day cycle in the check clearings lags (or comes later than, or appears to the right of) the cycle in the check issues. To gain a better understanding of how the check clearings on a given day might depend on check issues on previous days, a series of scatterplots was developed showing check clearings as a function of check issues (Figures 10.8-10.10). In each subsequent scatterplot, the time between clearing and issue has been increased by one day. Because the issues and clearings are all positive values, the scatterplots all lie
Figure 10.10 Clearings vs. issues for a lag of 8, 9, 10, and 11 days.
in the first quadrant. Days for which the clearings are zero have been omitted (the eight holidays and day 49); the unexplained large clearing on day 128 has also been omitted. As expected, the scatterplot for clearings and issues on the same day (the upper left scatterplot in Figure 10.8) does not suggest much correlation between check issues and clearings ($r^2$ = 0.01059). The scatterplot of clearings vs. issues on the previous day (the upper right scatterplot in Figure 10.8) doesn’t look much better ($r^2$ = 0.02590). The scatterplot for two days difference between clearings and issues (the lower left scatterplot in Figure 10.8) also has low correlation ($r^2$ = 0.02128). However, the data for a difference of three days between issues and clearings (the lower right scatterplot in Figure 10.8) begins to show some apparent correlation. The coefficient of determination for this data is 0.18284. Scatterplots for lags of four through 11 days are shown in Figures 10.9 and 10.10. Figure 10.11 plots the coefficient of determination of clearings vs. issues as a function of the lag time in days between them. The largest values occur at lags of three and four days. It was speculated that the relatively high correlation at these lags might be an expression of the average amount of time it takes a check to get through the check clearing process. The secondary peak at eight days lag might be a one-week (five business days) shadow or ghost of the peak at a lag of three days. As a result of this analysis, it seemed worthwhile to include the amounts of checks issued three and four days previous to the day for which the clearings were to be predicted. Just to be on the safe side, days two and five were also included. This resulted in four more terms to be added to the model:
$$y_{1i} = \cdots + \beta_{-2}x_{-2} + \beta_{-3}x_{-3} + \beta_{-4}x_{-4} + \beta_{-5}x_{-5} + \cdots \qquad (10.5)$$
where $x_{-2}$, $x_{-3}$, $x_{-4}$, and $x_{-5}$ are the amounts of checks issued two, three, four, and five days previous.
Figure 10.11 Coefficient of determination as a function of lag in days.
TABLE 10.2 Parameter estimates and levels of confidence.
Parameter        Estimate    Confidence (%)
$\beta_0$          7.072      100.000
$\beta_s$          1.585       99.895
$\beta_{-2}$       0.114       91.983
$\beta_{-3}$       0.250       99.961
$\beta_{-4}$       0.051       51.856
$\beta_{-5}$       0.183       97.838
$\beta_d$         -1.206      100.000
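The lag analysis summarized in Figure 10.11 can be sketched as follows: for each lag, pair the clearings on a given day with the issues that many days earlier, drop the zero-clearing days, and compute $r^2$. The function name `lag_r_squared` is hypothetical, and the series below is synthetic (constructed so that clearings echo issues three days later); it is not the data of Table 10.1.

```python
import numpy as np

def lag_r_squared(issues, clearings, lag):
    """r^2 between clearings on day t and issues on day t - lag."""
    issues = np.asarray(issues, dtype=float)
    clearings = np.asarray(clearings, dtype=float)
    x = issues[:len(issues) - lag]       # issues, truncated so the pairs line up
    y = clearings[lag:]                  # clearings, shifted by the lag
    keep = y > 0                         # omit non-positive clearings (zero-clearing days in the text)
    r = np.corrcoef(x[keep], y[keep])[0, 1]
    return r ** 2

# Synthetic series: clearings follow issues three days later, plus noise.
rng = np.random.default_rng(0)
issues = rng.uniform(1.0, 15.0, size=120)
clearings = np.empty_like(issues)
clearings[:3] = issues[:3]
clearings[3:] = 0.8 * issues[:-3] + rng.normal(0.0, 0.5, size=117)

best_lag = max(range(0, 12), key=lambda k: lag_r_squared(issues, clearings, k))
print(best_lag)
```

With real data the peak is much less sharp than in this constructed example, which is why neighbouring lags were also carried into the model.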
10.9 The linear model
Given the above considerations, the following full linear model was constructed:
$$y_{1i} = \beta_0 + \beta_s \sin[(x_s - 8)/22] + \beta_{-2}x_{-2} + \beta_{-3}x_{-3} + \beta_{-4}x_{-4} + \beta_{-5}x_{-5} + \beta_d x_d \qquad (10.6)$$
The first term, $\beta_0$, represents an offset or intercept. The second term, $\beta_s \sin[(x_s - 8)/22]$, represents the monthly business cycle observed in the clearings data. The next four terms relate the number of clearings to the checks issued two, three, four, and five days previously. The last term represents the “day effect” discussed above.
TABLE 10.3 Analysis of variance table for check clearing data.
Source    Sum of Squares    D.F.    Variance
Total        6328.04         100       63.28
Mean         4813.58           1     4813.58
Corr         1514.46          99       15.30
Fact          918.38           6      153.06
Resid         596.07          93        6.41
LOF           596.07          93        6.41
PE              0.00           0        -
10.10 Regression analysis of the initial model
It was decided to fit the model expressed by Equation 10.6 to the most recent 100 data points only (i.e., starting with sequence number 72) and then use the earlier data points to test the predictive capability of the fitted model (although the “prediction” is backward in time; see Section 3.5 and Exercise 4.19). Table 10.2 lists the parameter estimates and levels of confidence. Table 10.3 gives the ANOVA table and other statistics for the fitted model. The offset parameter ($\beta_0$ = 7.072 at the 100.000% level of confidence) does not represent an average response, but rather corresponds to the clearings at day zero when all of the other factor effects have been removed. It represents a reference point to which the factor effects can be added. The second parameter ($\beta_s$ = 1.585 at the 99.895% level of confidence) is the hypothesized monthly cyclical term. The high level of significance indicates that it is extremely unlikely ($\alpha$ = 0.00105) that the correlation of the data with this cycle is accidental; it is more likely that the cyclical effect of time is real. The next four parameters represent check issues on previous days: $\beta_{-2}$ = 0.114 (91.983%), $\beta_{-3}$ = 0.250 (99.961%), $\beta_{-4}$ = 0.051 (51.856%), and $\beta_{-5}$ = 0.183 (97.838%). In view of Figure 10.11, the effects of the two- and three-day lags are as expected, but it is perhaps surprising that the four-day lag has such low relative importance and that the five-day lag has such high importance. Frequently, however, the presence of certain factor effects in a model (in this case, $\beta_s$ and $\beta_d$, for example) can change the apparent single-factor effects of other factors ($\beta_{-2}$ through $\beta_{-5}$ in this example). This phenomenon is discussed in more detail in Section 15.3. Finally, as expected, the day-of-the-week effect is negative and relatively large ($\beta_d$ = -1.206 at the 100.000% level of confidence). The $R^2$ value is 918.38/1514.46 = 0.6064.
Although this is clearly not a perfect fit, the model does account for approximately 60% of the variation in the data about the mean. Given the lack of information about hand-written checks and other uncertainties in the check clearing system, the performance of the model is probably reasonable. The Fisher F-ratio for the significance of regression, $F_{fact}$, is (918.38/6)/(596.07/93) = 23.88; with 6 degrees of freedom in the numerator and 93 degrees of freedom in the denominator, this is significant at nearly the 100 percent level of confidence. Thus, one (or more) of the factor effects in the model seems to be describing some of the variation in the data. It is unlikely that the structure in the data has occurred by chance. Note that the Fisher F-ratio for the significance of lack of fit cannot be tested because there are no degrees of freedom for purely experimental uncertainty. This lack of degrees of freedom for replication is a usual feature of observational data. Any information about lack of fit must be obtained from patterns in the residuals. Table 10.4 gives alternative integer combinations of phase and period for the sine term in Equation 10.6. The choice of 8 for the phase and 22 for the period seems optimal.
TABLE 10.4 Effect of phase and period of the sine term on the correlation coefficient.
Phase    Period 21    Period 22    Period 23
 -5         -          0.5773         -
 -6         -          0.5944         -
 -7       0.5641       0.6058       0.5617
 -8       0.5582       0.6064       0.5586
 -9       0.5611       0.5987       0.5582
-10         -          0.5875         -
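The $R^2$ value and the F-ratio for the significance of regression quoted above follow directly from the entries of Table 10.3. The short check below reproduces the arithmetic; the variable names are illustrative only.

```python
# Sums of squares and degrees of freedom as listed in Table 10.3.
ss_corr, ss_fact, ss_resid = 1514.46, 918.38, 596.07
df_fact, df_resid = 6, 93

r_squared = ss_fact / ss_corr                            # fraction of variation about the mean explained
f_ratio = (ss_fact / df_fact) / (ss_resid / df_resid)    # variance due to factors over residual variance

print(f"R^2 = {r_squared:.4f}, F = {f_ratio:.2f}")       # prints: R^2 = 0.6064, F = 23.88
```

The F-ratio for lack of fit has no counterpart here because its denominator, the variance due to purely experimental uncertainty, has zero degrees of freedom.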
10.11 Descriptive capability
The top panel in Figure 10.12 shows clearings as a function of days with the fitted model (dark line) superimposed over the original data (light line). Holidays and the unusual clearing at day 128 are included. The residuals are shown in the bottom panel. The vertical lines at days 71.5 and 177.5 in Figure 10.12 bracket the 100 data points that were used to fit the model and separate this data from the earlier data (to the left of the vertical line at 71.5) and the later data (to the right of the vertical line at 177.5). The fitted model does a reasonably good job of describing the clearings, both for the data on which the model is based as well as for the earlier data. The effects of the individual parameters in the fitted model are shown in Figures 10.13-10.15. The predictions based on the offset and cyclical term ($7.0715176 + 1.5854050 \sin[(x_s - 8)/22]$) are shown in the top panel of Figure 10.13; the residuals are shown in the bottom panel. The predictions based on the offset and lag terms ($7.0715176 + 0.1135450x_{-2} + 0.2495185x_{-3} + 0.0506921x_{-4} + 0.1827384x_{-5}$) are shown in the top panel of Figure 10.14; the residuals are shown in the bottom panel. The predictions that result from the offset and day-of-the-week terms ($7.0715176 - 1.2063679x_d$) are shown in the top panel of Figure 10.15; the residuals are shown in the bottom panel. Again, when all of these factor effects are combined into the full model, the results shown in Figure 10.12 are obtained. The student used the fitted model to predict future clearings with the results shown in Figure 10.16. If the check clearings remain in a state of statistical control, then the uncertainty of prediction in the future should be well represented by the residuals shown in the bottom panel of this figure (see Section 3.5).
10.12 Future work
This example illustrates some of the benefits of carefully considering the data before writing a model: mechanistic insights often provide meaningful terms in the model. But it is also important to carefully consider the data again after the model has been fitted: there might be opportunities for further improvements. The most obvious recommendation is to obtain a better estimate of the total amount of money issued in checks on each day by obtaining the information on manually-written checks. There appears to be evidence in the residuals that the clearings on Thursday are less than predicted by the first-order term involving the day of the week ($\beta_d x_{di}$).
Figure 10.12 Fitted model (dark line) superimposed on clearings data (light line). Data from days 72 through 177 (between vertical lines) were used to fit model.
Figure 10.13 Partial fitted model ($7.0715176 + 1.5854050 \sin[(x_s - 8)/22]$) superimposed on clearings data.
Figure 10.14 Partial fitted model ($7.0715176 + 0.1135450x_{-2} + 0.2495185x_{-3} + 0.0506921x_{-4} + 0.1827384x_{-5}$) superimposed on clearings data.
Figure 10.15 Partial fitted model ($7.0715176 - 1.2063679x_d$) superimposed on clearings data.
Figure 10.16 Predictive capability of fitted model illustrated by data beginning with day 178.
Adding indicator variables [Neter, Wasserman, and Kutner (1990)], one for each day, might further improve the fit of the model to the data:
$$y_{1i} = \cdots + \beta_{Tu}x_{Tu,i} + \cdots + \beta_F x_{F,i} + \cdots$$
where $\beta_{Tu}$ through $\beta_F$ are the individual day effects (Tuesday through Friday) and $x_{Tu,i}$ through $x_{F,i}$ are the indicator variables. The indicator variables take on the value 1 if the $y_{1i}$ occurs on that day; otherwise the indicator variables are assigned the value 0. One of the day effects, in this case the Monday effect, must not be included in the model or the (X’X) matrix will become singular. This seemingly nonexistent day effect is implicitly included in the offset term, $\beta_0$. (See Section 15.6.) It might be worthwhile to fit a model without the small “issue-2 day” effect ($\beta_{-2}$). If this revised model fits well, then it has the tremendous practical advantage of allowing predictions to be made one more day into the future [Box and Jenkins (1976)]. Inspection of the residuals in Figure 10.12 suggests a longer term (perhaps a yearly) cycle in the data; an additional term with a period of approximately 261 days might be added to the model. Finally, using a nonlinear model instead of a linear model would allow the phase and periods of the cyclical terms to be determined by the data itself [Wentworth (1965a, 1965b), Chambers (1973), Draper and Smith (1981), and Bates and Watts (1988)].
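The remark about singularity can be checked numerically. The sketch below builds the offset and day-indicator columns for four hypothetical full weeks, once with Monday omitted and once with all five days included, and compares the rank of (X'X) with the number of columns. The function name `day_indicators` is an assumption made for illustration.

```python
import numpy as np

def day_indicators(day_codes, drop_monday=True):
    """Offset column plus 0/1 indicator columns for the day codes 1..5."""
    day_codes = np.asarray(day_codes)
    first = 2 if drop_monday else 1
    dummies = [(day_codes == d).astype(float) for d in range(first, 6)]
    return np.column_stack([np.ones(len(day_codes)), *dummies])

days = np.tile(np.arange(1, 6), 4)                  # four hypothetical full weeks
X_ok = day_indicators(days)                         # offset + Tuesday..Friday (5 columns)
X_bad = day_indicators(days, drop_monday=False)     # offset + all five days (6 columns)

rank_ok = np.linalg.matrix_rank(X_ok.T @ X_ok)      # equals the number of columns
rank_bad = np.linalg.matrix_rank(X_bad.T @ X_bad)   # one less than the number of columns
print(X_ok.shape[1], rank_ok, X_bad.shape[1], rank_bad)
```

With all five indicators present, the indicator columns sum to the offset column, so the six columns are linearly dependent and (X'X) cannot be inverted. Dropping any one day removes the dependence, and the dropped day's effect is absorbed into the offset.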
CHAPTER 11
A Ten-Experiment Example
Enzymes are large biological molecules that act as catalysts for chemical reactions in living organisms. As catalysts, they allow chemical reactions to proceed at a rapid rate, a rate much faster than would occur if the enzymes were not present. The action of an enzyme is often represented as
$$A \xrightarrow{E} B \qquad (11.1)$$
where A represents one or more reactants and B represents one or more products of the chemical reaction. The enzyme E is assumed to take part in the reaction, but to be unchanged by it. Alternative expressions for enzyme catalyzed reactions are
$$A + E \longrightarrow B + E \qquad (11.2)$$
and
$$\begin{array}{c} A \longrightarrow B \\ E \longrightarrow E \end{array} \qquad (11.3)$$
which emphasize the recyclable nature of the enzyme. A systems view of enzyme catalyzed reactions is given in Figure 11.1. In this figure, an additional input to the system is shown - the pH, or “negative logarithm base ten of the hydrogen ion concentration.” If the pH of a chemical system is low (e.g., pH = l), the concentration of hydrogen ions is large and the system is said to be acidic. If the pH of a chemical system is high (e.g., pH = 13), the concentration of hydrogen ions is small and the system is said to be basic. The reasons for the effect of pH on the catalytic properties of enzymes are numerous and will not be discussed here. For most enzymes, however, there is a pH at which they are optimally effective: changing the pH to lower (more acidic) levels or to higher (more basic) levels will decrease the overall rate at which the associated chemical reaction occurs. In the region of the optimum pH, the reaction rate vs. pH response surface can usually be approximated reasonably well by a second-order, parabolic relationship.
[Figure 11.1: systems view of an enzyme catalyzed reaction. Recoverable diagram labels: Reactant A, System, E, Product, Reaction Rate.]
Let us assume that we are primarily interested in estimating the optimum pH for a particular enzyme catalyzed reaction. We might also be interested in determining if a parabolic relationship provides a significantly better fit than is provided by a straight line relationship over the domain of pH studied. If it does provide a better fit, then we might ask if the parabolic relationship is an adequate model or if it exhibits a highly significant lack of fit. We assume a limited budget that allows resources for only ten experiments. How can these experiments be distributed among factor combinations (design points) and replicates? At what levels of pH should the experiments be carried out? How can the results be interpreted?
11.1 Allocation of degrees of freedom
A set of $n$ measured responses has a total of $n$ degrees of freedom. Of these, $n - f$ degrees of freedom are given to the estimation of variance due to purely experimental uncertainty ($s_{pe}^2$), $f - p$ degrees of freedom are used to estimate the variance due to lack of fit ($s_{lof}^2$), and $p$ degrees of freedom are used to estimate the parameters of the model (see Table 9.2). In the present example, the total degrees of freedom is ten. The parabolic model that is to be fit,
$$\text{rate}_i = \beta_0 + \beta_1 \text{pH}_i + \beta_{11} \text{pH}_i^2 + r_i \tag{11.4}$$
or
$$y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_{11} x_{1i}^2 + r_{1i} \tag{11.5}$$
contains three parameters, $\beta_0$, $\beta_1$, and $\beta_{11}$. The number of degrees of freedom associated with the residuals is ten minus three ($n - p$), or seven. Thus, the number of degrees of freedom allocated to lack of fit and purely experimental uncertainty must together total seven.

TABLE 11.1 Possible allocation of seven residual degrees of freedom.

Number of factor      Degrees of freedom allocated to
combinations (f)      Lack of fit (f - 3)     Purely experimental uncertainty (10 - f)
4                     1                       6
5                     2                       5
6                     3                       4
7                     4                       3
8                     5                       2
9                     6                       1

What information could be obtained if all ten experiments were carried out at only three levels of pH? Three levels of $x_1$ ($f = 3$) provide sufficient factor combinations to fit a three-parameter model, but leave no degrees of freedom for estimating lack of fit: $f - p = 3 - 3 = 0$. Because one of our objectives was to determine if a parabolic relationship provides an adequate model for the observed rate, we must be able to estimate the variance due to lack of fit of the model; the number of factor combinations (levels of $x_1$ in this example) must therefore be greater than three. What would be the consequences of carrying out experiments at ten factor combinations ($f = 10$)? There would certainly be an adequate number of degrees of freedom available for estimating the variance due to lack of fit of the model: $f - p = 10 - 3 = 7$. However, the test of significance for the lack of fit is based on an F-ratio, the denominator of which is the estimated variance due to purely experimental uncertainty. Unfortunately, ten factor combinations leave no degrees of freedom for the estimation of $s_{pe}^2$ ($n - f = 10 - 10 = 0$) and the significance of the lack of fit could not be tested. Clearly, then, the number of factor combinations must be less than ten. This restriction and the previous one for lack of fit require that the number of factor combinations ($f$) be greater than or equal to four and less than or equal to nine: $4 \le f \le 9$. Thus, the number of degrees of freedom given to lack of fit can range from one to six [$1 \le (f - 3) \le 6$] and the number of degrees of freedom given to purely experimental uncertainty can range from six to one [$6 \ge (10 - f) \ge 1$].
Let us consider some of the possible allocations of degrees of freedom shown in Table 11.1, keeping in mind that the confidence of an estimated variance improves as the number of degrees of freedom associated with that estimate is increased. The effect of $s_{pe}^2$ on the values of the V matrix was shown in Equation 7.1. Thus, we might allocate most of the available degrees of freedom to the estimation of $s_{pe}^2$ to make it a more precise estimator of $\sigma_{pe}^2$ and decrease the uncertainties in the parameter estimates. The price to be paid, of course, is that $s_{lof}^2$ will be a less precise estimator of $\sigma_{lof}^2$. On the basis of this reasoning, let us allocate five degrees of freedom to the estimation of $\sigma_{pe}^2$ and two degrees of freedom to the estimation of $\sigma_{lof}^2$. The number of factor combinations will therefore be five.
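The bookkeeping behind this choice is simple enough to enumerate. The following is a minimal sketch, assuming only the relations stated above (lack of fit receives $f - p$ degrees of freedom, purely experimental uncertainty receives $n - f$); the function name `allocations` is our own and is not part of the original text.

```python
def allocations(n, p):
    """List (f, df_lack_of_fit, df_pure_error) for every number of factor
    combinations f that leaves at least one degree of freedom for each."""
    return [(f, f - p, n - f) for f in range(p + 1, n)]


# Ten experiments, three-parameter parabolic model.
for f, df_lof, df_pe in allocations(n=10, p=3):
    print(f"f = {f}: lack of fit = {df_lof}, pure error = {df_pe}")
```

The allocation adopted in the text corresponds to the row with five factor combinations.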
11.2 Placement of experiments
There remain two questions to be answered concerning the design of the experiments: what five levels of pH should be chosen, and how should the replicate experiments be allocated among these five levels? If the factor combinations are chosen too close together, the variances and covariances of the parameter estimates will be large (see Sections 7.2-7.4). Further, it might happen that the chosen levels of pH do not enclose the optimal pH and the extrapolated location of the optimum might be very imprecise. If the factor combinations are chosen far apart, the variances and covariances of the parameter estimates will be smaller, and the probability of bracketing the optimal pH will be greater. However, the assumed second-order model might not be as good an approximation to the true response surface over such a large domain of the factor as it would be over a smaller domain. In this as in all other problems of experimental design, prior information is helpful. For example, if the enzyme we are dealing with is naturally found in a neutral environment, then it would probably be most active at a neutral pH, somewhere near pH = 7. If it were found in an acidic environment, say in the stomach, it would be expected to exhibit its optimal activity at a low (acidic) pH. When information such as this is available, it is appropriate to center the experimental design about the “best guess” of where the desired region might be. In the absence of prior information, factor combinations might be centered about the midpoint of the factor domain. Let us assume that the enzyme of interest is naturally found in a basic environment for which the pH = 10; the midpoint of the factor combinations will therefore be taken to be 10.
From the known pH dependence of the activity of other enzymes, a domain of approximately 4 pH units should be sufficient to bracket the optimum, yet not be so wide as to seriously invalidate a parabolic approximation to the true behavior of the system. The chosen treatment combinations will therefore be pH = 8, 9, 10, 11, and 12. We will code them as -2, -1, 0, +1, and +2 ($c_{x_1} = 10$, $d_{x_1} = 1$; see Section 8.5). Given these five levels of pH, how can the five replicate experiments be allocated? One way is to place all of the replicates at the center factor level. Doing so would give a good estimate of $\sigma_{pe}^2$ at the center of the experimental design, but it would give no information about the heteroscedasticity of the response surface. Also, in a coded data system such as this, the $(X^{*\prime}X^{*})^{-1}$ matrix element associated with the variance of the parameter estimate $b_1^*$ is no better with this design than for a five-experiment design with one experiment at each factor level. For the five-experiment design,
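$$X^{*} = \begin{bmatrix} 1 & -2 & +4 \\ 1 & -1 & +1 \\ 1 & 0 & 0 \\ 1 & +1 & +1 \\ 1 & +2 & +4 \end{bmatrix} \tag{11.6}$$
The corresponding $(X^{*\prime}X^{*})$ matrix is
$$(X^{*\prime}X^{*}) = \begin{bmatrix} 5 & 0 & 10 \\ 0 & 10 & 0 \\ 10 & 0 & 34 \end{bmatrix} \tag{11.7}$$
and the inverse is
$$(X^{*\prime}X^{*})^{-1} = \begin{bmatrix} 0.486 & 0.000 & -0.143 \\ 0.000 & 0.100 & 0.000 \\ -0.143 & 0.000 & 0.071 \end{bmatrix} \tag{11.8}$$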
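For the ten-experiment design with six replicates at the center point,
$$X^{*} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ -2 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & +1 & +2 \\ +4 & +1 & 0 & 0 & 0 & 0 & 0 & 0 & +1 & +4 \end{bmatrix}^{\prime} \tag{11.9}$$
$$(X^{*\prime}X^{*}) = \begin{bmatrix} 10 & 0 & 10 \\ 0 & 10 & 0 \\ 10 & 0 & 34 \end{bmatrix} \tag{11.10}$$
and the inverse is
$$(X^{*\prime}X^{*})^{-1} = \begin{bmatrix} 0.142 & 0.000 & -0.042 \\ 0.000 & 0.100 & 0.000 \\ -0.042 & 0.000 & 0.042 \end{bmatrix} \tag{11.11}$$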
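Let us try instead an experimental design in which two replicates are carried out at the center point and three replicates are carried out at each of the extreme points (-2 and +2). Then
$$X^{*} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ -2 & -2 & -2 & -1 & 0 & 0 & +1 & +2 & +2 & +2 \\ +4 & +4 & +4 & +1 & 0 & 0 & +1 & +4 & +4 & +4 \end{bmatrix}^{\prime} \tag{11.12}$$
The corresponding $(X^{*\prime}X^{*})$ matrix is
$$(X^{*\prime}X^{*}) = \begin{bmatrix} 10 & 0 & 26 \\ 0 & 26 & 0 \\ 26 & 0 & 98 \end{bmatrix} \tag{11.13}$$
and the inverse is
$$(X^{*\prime}X^{*})^{-1} = \begin{bmatrix} 0.322 & 0.000 & -0.086 \\ 0.000 & 0.038 & 0.000 \\ -0.086 & 0.000 & 0.033 \end{bmatrix} \tag{11.14}$$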
The parameter estimates with this design will be more precise than they would be with either a five-experiment design (see Equation 11.8) or, with the exception of the estimate of $\beta_0^*$, a ten-experiment design with six replicates at the center point (see Equation 11.11). We will use as our design here the allocation represented in Equation 11.12. The corresponding coded experimental design matrix, $D^*$, contains the coded factor combinations at which experiments are to be carried out. The coded experimental design matrix is usually not the same as the matrix of coded parameter coefficients, $X^*$; the design matrix is determined by the chosen factor combinations only, while the matrix of parameter coefficients is determined also by the model to be fit. Each row of the experimental design matrix corresponds to a given experiment, and each column of the experimental design matrix indicates a particular factor [Kempthorne (1980)]. The experimental design matrix for this study of the effect of pH on reaction rate is
$$D^{*} = \begin{bmatrix} -2 & -2 & -2 & -1 & 0 & 0 & +1 & +2 & +2 & +2 \end{bmatrix}^{\prime} \tag{11.15}$$
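The comparison of the three candidate designs in Section 11.2 can be checked numerically. The sketch below rebuilds $(X^{*\prime}X^{*})^{-1}$ for each set of coded levels using exact fractions and plain Gauss-Jordan elimination; the helper name `xtx_inverse` and the dictionary labels are our own, and the routine assumes the three-column parabolic model $[1, x, x^2]$.

```python
from fractions import Fraction


def xtx_inverse(levels):
    """Exact (X'X)^-1 for model columns [1, x, x^2] at the given coded levels."""
    rows = [[Fraction(1), Fraction(x), Fraction(x) ** 2] for x in levels]
    p = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    # Gauss-Jordan elimination on the augmented matrix [X'X | I].
    aug = [xtx[i] + [Fraction(int(i == j)) for j in range(p)] for i in range(p)]
    for col in range(p):
        pivot = next(r for r in range(col, p) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


designs = {
    "five experiments (Eq. 11.6)": [-2, -1, 0, 1, 2],
    "six center replicates (Eq. 11.9)": [-2, -1, 0, 0, 0, 0, 0, 0, 1, 2],
    "chosen design (Eq. 11.12)": [-2, -2, -2, -1, 0, 0, 1, 2, 2, 2],
}
for name, levels in designs.items():
    inv = xtx_inverse(levels)
    # Diagonal elements scale the variances of b0*, b1*, and b11*.
    print(name, [float(round(inv[i][i], 3)) for i in range(3)])
```

Smaller diagonal elements mean more precise parameter estimates for a given $s_{pe}^2$, which is the basis for the comparison made in the text.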
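11.3 Results for the reduced model
Let us assume that we have carried out the indicated experiments and have obtained the following values for the measured reaction rates:
$$Y = \begin{bmatrix} 33 & 50 & 37 & 53 & 89 & 87 & 69 & 69 & 67 & 80 \end{bmatrix}^{\prime} \tag{11.16}$$
We will first fit the reduced model
$$y_{1i} = \beta_0^* + \beta_1^* x_{1i}^* + r_{1i} \tag{11.17}$$
The matrix of parameter coefficients is then
$$X^{*} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ -2 & -2 & -2 & -1 & 0 & 0 & +1 & +2 & +2 & +2 \end{bmatrix}^{\prime} \tag{11.18}$$
$$(X^{*\prime}X^{*}) = \begin{bmatrix} 10 & 0 \\ 0 & 26 \end{bmatrix} \tag{11.19}$$
$$(X^{*\prime}X^{*})^{-1} = \begin{bmatrix} 1/10 & 0 \\ 0 & 1/26 \end{bmatrix} \tag{11.20}$$
$$(X^{*\prime}Y) = \begin{bmatrix} 634 \\ 208 \end{bmatrix} \tag{11.21}$$
$$B^{*} = \begin{bmatrix} b_0^* \\ b_1^* \end{bmatrix} = \begin{bmatrix} 63.4 \\ 8.00 \end{bmatrix} \tag{11.22}$$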
The data and the least squares straight line relationship are shown in Figure 11.2. It is to be remembered that the parameter estimates are those for the coded factor levels (see Section 11.2) and refer to the model
$$y_{1i} = 63.4 + 8.00(\text{pH}_i - 10.00) + r_{1i} \tag{11.23}$$
Thus, the parameter estimate $b_0^*$ is the estimated response at pH = 10.00. The overall mean response is
$$\bar{y}_1 = 63.4 \tag{11.24}$$
The matrix expressing the contribution of the overall mean response to each experiment is
$$\bar{Y} = \begin{bmatrix} 63.4 & 63.4 & 63.4 & 63.4 & 63.4 & 63.4 & 63.4 & 63.4 & 63.4 & 63.4 \end{bmatrix}^{\prime} \tag{11.25}$$
Figure 11.2 Graph of the model $y_{1i} = \beta_0^* + \beta_1^* x_{1i}^* + r_{1i}$ fit to the experimental data. (Horizontal axis: Level of Factor X1 (pH).)
Figure 11.3 Sums of squares and degrees of freedom tree for Figure 11.2.
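The matrix of predicted responses, $\hat{Y}$, is given by
$$\hat{Y} = \begin{bmatrix} 47.4 & 47.4 & 47.4 & 55.4 & 63.4 & 63.4 & 71.4 & 79.4 & 79.4 & 79.4 \end{bmatrix}^{\prime} \tag{11.26}$$
The matrix of residuals, $R$, is given by
$$R = \begin{bmatrix} -14.4 & 2.6 & -10.4 & -2.4 & 25.6 & 23.6 & -2.4 & -10.4 & -12.4 & 0.6 \end{bmatrix}^{\prime} \tag{11.27}$$
The matrix of mean replicate responses, $J$, is
$$J = \begin{bmatrix} 40 & 40 & 40 & 53 & 88 & 88 & 69 & 72 & 72 & 72 \end{bmatrix}^{\prime} \tag{11.28}$$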
Using these matrices, the following sums of squares are easily calculated (see Section 9.1).
$$SS_T = Y'Y = 43668 \tag{11.29}$$
$$SS_{mean} = \bar{Y}'\bar{Y} = 40195.6 \tag{11.30}$$
$$SS_{corr} = C'C = 3472.4 \tag{11.31}$$
$$SS_{fact} = F'F = 1664 \tag{11.32}$$
$$SS_r = R'R = 1808.4 \tag{11.33}$$
$$SS_{lof} = L'L = 1550.4 \tag{11.34}$$
$$SS_{pe} = P'P = 258 \tag{11.35}$$
The relationships of the sums of squares and degrees of freedom are given in Figure 11.3 and Table 11.2. The coefficient of determination is
$$R^2 = SS_{fact}/SS_{corr} = 1664/3472.4 = 0.479 \tag{11.36}$$
and indicates in this case that the term $\beta_1^* x_{1i}^*$ removes 47.9% of the corrected sum of squares; 52.1% remains as residuals. The significance of the coefficient of determination is contained in
$$F_{(p-1,n-p)} = F_{(1,8)} = (SS_{fact}/DF_{fact})/(SS_r/DF_r) = (1664/1)/(1808.4/8) = 7.36 \tag{11.37}$$
which is significant at the 97.3% level of confidence. The significance of the lack of fit is determined by
$$F_{(f-p,n-f)} = F_{(3,5)} = (SS_{lof}/DF_{lof})/(SS_{pe}/DF_{pe}) = (1550.4/3)/(258/5) = 10.02 \tag{11.38}$$
which is significant at the 98.5% level of confidence.
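The reduced-model fit and its sums of squares can be reproduced with a few lines of arithmetic. The sketch below uses the closed-form least squares expressions for a straight line rather than the matrix route of the text; the variable names are our own, and the data are those of Equations 11.15 and 11.16.

```python
x = [-2, -2, -2, -1, 0, 0, 1, 2, 2, 2]          # coded pH levels, Equation 11.15
y = [33, 50, 37, 53, 89, 87, 69, 69, 67, 80]    # measured rates, Equation 11.16
n, p = len(y), 2

# Least squares estimates for y = b0 + b1*x (closed form for one factor).
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Mean response at each distinct factor level (the J matrix of the text).
groups = {}
for xi, yi in zip(x, y):
    groups.setdefault(xi, []).append(yi)
level_mean = {xi: sum(v) / len(v) for xi, v in groups.items()}
f = len(groups)                                  # number of factor combinations

y_hat = [b0 + b1 * xi for xi in x]
ss_total = sum(yi ** 2 for yi in y)
ss_mean = n * y_bar ** 2
ss_corr = ss_total - ss_mean
ss_fact = sum((yh - y_bar) ** 2 for yh in y_hat)
ss_resid = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ss_pe = sum((yi - level_mean[xi]) ** 2 for xi, yi in zip(x, y))
ss_lof = ss_resid - ss_pe

r_squared = ss_fact / ss_corr
f_regression = (ss_fact / (p - 1)) / (ss_resid / (n - p))
f_lack_of_fit = (ss_lof / (f - p)) / (ss_pe / (n - f))
print(round(b0, 2), round(b1, 2), round(r_squared, 3))
print(round(f_regression, 2), round(f_lack_of_fit, 2))
```

Converting the two F-ratios into levels of confidence requires tabulated or computed values of the F distribution, which this sketch does not attempt.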
TABLE 11.2 ANOVA table for reduced model.

Source                            Degrees of freedom   Sum of squares   Mean square
Total                             10                   43668.0          4366.8
Mean                              1                    40195.6          40195.6
Corrected                         9                    3472.4           385.82
Factor effects                    1                    1664.0           1664.0
Residuals                         8                    1808.4           226.05
Lack of fit                       3                    1550.4           516.8
Purely experimental uncertainty   5                    258.0            51.6
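11.4 Results for the expanded model
Fitting the expanded (parabolic) model
$$y_{1i} = \beta_0^* + \beta_1^* x_{1i}^* + \beta_{11}^* x_{1i}^{*2} + r_{1i} \tag{11.39}$$
to the data in Equation 11.16 gives the parameter estimates
$$B^{*} = \begin{bmatrix} b_0^* \\ b_1^* \\ b_{11}^* \end{bmatrix} = \begin{bmatrix} 79.0 \\ 8.00 \\ -6.00 \end{bmatrix} \tag{11.40}$$
Figure 11.5 Sums of squares and degrees of freedom tree for Figure 11.4.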
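As a check on these estimates, the normal equations $(X^{*\prime}X^{*})B^{*} = X^{*\prime}Y$ can be solved exactly. The following is a sketch under our own naming (`fit_polynomial` is a hypothetical helper, not a routine from the text); it uses rational arithmetic so that the coefficients come out without rounding.

```python
from fractions import Fraction


def fit_polynomial(x, y, degree):
    """Solve the normal equations (X'X)B = X'Y exactly for a polynomial model."""
    p = degree + 1
    rows = [[Fraction(xi) ** k for k in range(p)] for xi in x]
    # Augmented matrix [X'X | X'Y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * Fraction(yi) for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    for col in range(p):  # Gauss-Jordan elimination
        pivot = next(r for r in range(col, p) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[p] for row in aug]


x = [-2, -2, -2, -1, 0, 0, 1, 2, 2, 2]          # Equation 11.15
y = [33, 50, 37, 53, 89, 87, 69, 69, 67, 80]    # Equation 11.16

b_star = fit_polynomial(x, y, degree=2)
y_hat = [sum(b * Fraction(xi) ** k for k, b in enumerate(b_star)) for xi in x]
ss_resid = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
print([float(b) for b in b_star], float(ss_resid))
```

The residual sum of squares obtained this way is the value that appears in the residuals row of the ANOVA table for the expanded model.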
The estimated response surface and the data are shown in Figure 11.4. The allocation of the sums of squares and degrees of freedom are given in Figure 11.5 and Table 11.3. The coefficient of multiple determination is
$$R^2 = 2758.4/3472.4 = 0.794 \tag{11.41}$$
and indicates in this case that the terms $\beta_1^* x_{1i}^*$ and $\beta_{11}^* x_{1i}^{*2}$ remove 79.4% of the corrected sum of squares; only 20.6% remains as residuals. The significance of the coefficient of multiple determination is contained in
$$F_{(2,7)} = (2758.4/2)/(714/7) = 13.52 \tag{11.42}$$
which is significant at the 99.6% level of confidence. The significance of the lack of fit is determined by
$$F_{(2,5)} = (456/2)/(258/5) = 4.42 \tag{11.43}$$
which is significant at the 92.2% level of confidence.
TABLE 11.3 ANOVA table for expanded model.

Source                            Degrees of freedom   Sum of squares   Mean square
Total                             10                   43668.0          4366.8
Mean                              1                    40195.6          40195.6
Corrected                         9                    3472.4           385.82
Factor effects                    2                    2758.4           1379.2
Residuals                         7                    714.0            102.0
Lack of fit                       2                    456.0            228.0
Purely experimental uncertainty   5                    258.0            51.6
The partitioning of sums of squares and degrees of freedom shown in Figure 11.6 and Table 11.4 allows us to determine the individual significance of the “set of parameters” containing only $\beta_{11}^*$ (that is, the significance of adding the single term $\beta_{11}^* x_{1i}^{*2}$).
$$F_{(1,7)} = (1094.4/1)/(714/7) = 10.73 \tag{11.44}$$
which is significant at the 98.6% level of confidence.
Figure 11.6 Sums of squares and degrees of freedom tree for estimating the significance of the additional parameter $\beta_{11}^*$.
TABLE 11.4 ANOVA table for significance of $\beta_{11}^* x_{1i}^{*2}$.

Source                            Degrees of freedom   Sum of squares   Mean square
Total                             10                   43668.0          4366.8
Mean                              1                    40195.6          40195.6
Corrected                         9                    3472.4           385.82
Factor, reduced model             1                    1664.0           1664.0
Residuals, reduced model          8                    1808.4           226.05
Factors, additional               1                    1094.4           1094.4
Residuals                         7                    714.0            102.0
Lack of fit                       2                    456.0            228.0
Purely experimental uncertainty   5                    258.0            51.6
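The test for the added term amounts to comparing the drop in the residual sum of squares with the residual mean square of the larger model. A small sketch of that arithmetic follows, using the table values; the function name is our own.

```python
def extra_sum_of_squares_f(ss_resid_reduced, ss_resid_full, df_added, df_resid_full):
    """F-ratio for the parameters added in going from the reduced to the full model."""
    ss_added = ss_resid_reduced - ss_resid_full
    return (ss_added / df_added) / (ss_resid_full / df_resid_full)


# Residual sums of squares from Tables 11.2 and 11.3; one added parameter.
f_added = extra_sum_of_squares_f(1808.4, 714.0, df_added=1, df_resid_full=7)
print(round(f_added, 2))
```

The numerator here, 1808.4 - 714.0 = 1094.4, is the “Factors, additional” entry of Table 11.4.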
11.5 Coding transformations of parameter estimates
The parabolic relationship between reaction rate and uncoded pH is obtained by expansion of the coded model.
$$\text{rate}_i = 79.0 + 8.00(\text{pH}_i - 10.00) - 6.00(\text{pH}_i - 10.00)^2 \tag{11.45}$$
$$\text{rate}_i = 79.0 + 8.00\,\text{pH}_i - 80.0 - 6.00\,\text{pH}_i^2 + 120\,\text{pH}_i - 600 \tag{11.46}$$
$$\text{rate}_i = -601 + 128\,\text{pH}_i - 6.00\,\text{pH}_i^2 \tag{11.47}$$
(see Equation 11.4). The same transformation may be accomplished using general matrix techniques. It can be shown that
$$B = A^{-1}B^{*} \tag{11.48}$$
where $B$ is the matrix of parameter estimates in the uncoded coordinate system, $B^*$ is the matrix of parameter estimates in the coded coordinate system, and the transformation matrix, $A$, is obtained from the matrix of uncoded parameter coefficients, $X$, and from the matrix of coded parameter coefficients, $X^*$.
$$A = (X^{*\prime}X^{*})^{-1}(X^{*\prime}X) \tag{11.49}$$
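In the present example, $(X^{*\prime}X^{*})^{-1}$ is given by Equation 11.14. Other arrays are
$$X^{*} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ -2 & -2 & -2 & -1 & 0 & 0 & +1 & +2 & +2 & +2 \\ +4 & +4 & +4 & +1 & 0 & 0 & +1 & +4 & +4 & +4 \end{bmatrix}^{\prime} \tag{11.50}$$
$$X = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 8 & 8 & 8 & 9 & 10 & 10 & 11 & 12 & 12 & 12 \\ 64 & 64 & 64 & 81 & 100 & 100 & 121 & 144 & 144 & 144 \end{bmatrix}^{\prime} \tag{11.51}$$
$$(X^{*\prime}X) = \begin{bmatrix} 10 & 100 & 1026 \\ 0 & 26 & 520 \\ 26 & 260 & 2698 \end{bmatrix} \tag{11.52}$$
$$A = (X^{*\prime}X^{*})^{-1}(X^{*\prime}X) = \begin{bmatrix} 49/152 & 0 & -13/152 \\ 0 & 1/26 & 0 \\ -13/152 & 0 & 5/152 \end{bmatrix} \begin{bmatrix} 10 & 100 & 1026 \\ 0 & 26 & 520 \\ 26 & 260 & 2698 \end{bmatrix} = \begin{bmatrix} 1 & 10 & 100 \\ 0 & 1 & 20 \\ 0 & 0 & 1 \end{bmatrix} \tag{11.53}$$
$$A^{-1} = \begin{bmatrix} 1 & -10 & 100 \\ 0 & 1 & -20 \\ 0 & 0 & 1 \end{bmatrix} \tag{11.54}$$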
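Finally,
$$B = A^{-1}B^{*} = \begin{bmatrix} 1 & -10 & 100 \\ 0 & 1 & -20 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 79.0 \\ 8.00 \\ -6.00 \end{bmatrix} = \begin{bmatrix} -601 \\ 128 \\ -6.00 \end{bmatrix} \tag{11.55}$$
which may be compared with the coefficients obtained in Equation 11.47. The location of the predicted optimum pH may be found by differentiating the fitted model with respect to pH. Setting the derivative of Equation 11.47 equal to zero gives the location of the stationary point (in this case, a maximum).
$$0 = 128 - 12\,\text{pH} \tag{11.56}$$
$$\text{pH}_{opt} = 128/12 = 10\tfrac{2}{3} \tag{11.57}$$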
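For a single factor, the matrix transformation is equivalent to expanding the coded polynomial by hand. The sketch below does that expansion for a general center $c$ and scale $d$, assuming the coding $x^* = (x - c)/d$ of Section 8.5; the function name `uncode_quadratic` is our own.

```python
from fractions import Fraction


def uncode_quadratic(b_star, center, scale=1):
    """Convert [b0*, b1*, b11*] for x* = (x - center)/scale into uncoded [b0, b1, b11]."""
    b0s, b1s, b11s = (Fraction(v) for v in b_star)
    c, d = Fraction(center), Fraction(scale)
    b11 = b11s / d ** 2
    b1 = b1s / d - 2 * c * b11
    b0 = b0s - b1s * c / d + b11 * c ** 2
    return [b0, b1, b11]


b0, b1, b11 = uncode_quadratic([79, 8, -6], center=10, scale=1)
ph_opt = -b1 / (2 * b11)       # stationary point of b0 + b1*pH + b11*pH^2
print(b0, b1, b11, ph_opt)
```

Because $b_{11}$ is negative, the stationary point found this way is a maximum, in agreement with the text.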
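Anticipating a later section on canonical analysis of second-order polynomial models, we will show that the first-order term can be made to equal zero if we code the model using the stationary point as the center of the symmetrical design. For this new system of coding, $c_{x_1} = 10\tfrac{2}{3}$ and $d_{x_1} = 1$ (see Section 8.5).
$$X^{*} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ -8/3 & -8/3 & -8/3 & -5/3 & -2/3 & -2/3 & 1/3 & 4/3 & 4/3 & 4/3 \\ 64/9 & 64/9 & 64/9 & 25/9 & 4/9 & 4/9 & 1/9 & 16/9 & 16/9 & 16/9 \end{bmatrix}^{\prime} \tag{11.58}$$
$$(X^{*\prime}X^{*}) = \begin{bmatrix} 10 & -20/3 & 274/9 \\ -20/3 & 274/9 & -1484/27 \\ 274/9 & -1484/27 & 13714/81 \end{bmatrix} \tag{11.59}$$
$$(X^{*\prime}X^{*})^{-1} = \begin{bmatrix} 0.2699 & -0.06890 & -0.07091 \\ -0.06890 & 0.09694 & 0.04386 \\ -0.07091 & 0.04386 & 0.03289 \end{bmatrix} \tag{11.60}$$
$$(X^{*\prime}X) = \begin{bmatrix} 10 & 100 & 1026 \\ -20/3 & -122/3 & -492/3 \\ 274/9 & 2428/9 & 22146/9 \end{bmatrix} \tag{11.61}$$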
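$$A = (X^{*\prime}X^{*})^{-1}(X^{*\prime}X) = \begin{bmatrix} 1 & 32/3 & 1024/9 \\ 0 & 1 & 64/3 \\ 0 & 0 & 1 \end{bmatrix} \tag{11.62}$$
$$B^{*} = AB = \begin{bmatrix} 1 & 32/3 & 1024/9 \\ 0 & 1 & 64/3 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} -601 \\ 128 \\ -6.00 \end{bmatrix} = \begin{bmatrix} 81\tfrac{2}{3} \\ 0 \\ -6.00 \end{bmatrix} \tag{11.63}$$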
Thus, in this new system of coding, $b_1^* = 0$; the first-order parameter is zero.
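The vanishing of the first-order term can be confirmed by recoding the uncoded coefficients of Equation 11.47 about the stationary point. This is a sketch of the inverse of the expansion in Equations 11.45 to 11.47, again assuming the coding $x^* = (x - c)/d$; the function name `code_quadratic` is hypothetical.

```python
from fractions import Fraction


def code_quadratic(b, center, scale=1):
    """Convert uncoded [b0, b1, b11] into coded [b0*, b1*, b11*] for x* = (x - center)/scale."""
    b0, b1, b11 = (Fraction(v) for v in b)
    c, d = Fraction(center), Fraction(scale)
    # Substitute x = c + d*x* into b0 + b1*x + b11*x^2 and collect powers of x*.
    return [b0 + b1 * c + b11 * c ** 2, d * (b1 + 2 * b11 * c), d ** 2 * b11]


b_star_new = code_quadratic([-601, 128, -6], center=Fraction(32, 3))
print(b_star_new)
```

In this coding the zero-order coefficient is the predicted response at the stationary point itself, since $x^* = 0$ there.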
11.6 Confidence intervals for response surfaces
In Section 6.1, the concept of confidence intervals of parameter estimates was presented. In this section, we consider a general approach to the estimation of confidence intervals for parameter estimates and response surfaces based on models that have been shown to be adequate (i.e., the lack of fit is not highly significant, either in a statistical or in a practical sense). If a model does not show a serious lack of fit, then
$$s_r^2 = SS_r/(n - p) \tag{11.64}$$
is a valid estimate of $\sigma_{pe}^2$ (see Section 6.4) and the equation
$$V = s_r^2 (X'X)^{-1} \tag{11.65}$$
is often used to calculate the variance-covariance matrix. It is important to realize that this is a valid estimate only if the model is adequate.
The variances of the parameter estimates can be used to set confidence intervals that would include the true value of the parameter a certain percentage of the time. In general, the confidence interval for a parameter $\beta$, based on $s_r^2$, is given by
$$\beta = b \pm \sqrt{F_{(1,n-p)} \times s_b^2} \tag{11.66}$$
where $\beta$ is the true value of the parameter, $b$ is its estimated value, $s_b^2$ is the corresponding variance from the diagonal of the variance-covariance matrix, and $F_{(1,n-p)}$ is the tabular value of $F$ at the desired level of confidence. Although the derivation is beyond the scope of this presentation, it can be shown that the estimated variance of predicting a single new value of response at a given point in factor space, $s_{y_{10}}^2$, is equal to the purely experimental uncertainty variance, $s_{pe}^2$, plus the variance of estimating the mean response at that point, $s_{\hat{y}_{10}}^2$; that is,
$$s_{y_{10}}^2 = s_{pe}^2 + s_{\hat{y}_{10}}^2 \tag{11.67}$$
where the subscript “0” is used to indicate that the factor combination of interest does not necessarily correspond to one of the experiments that was previously carried out. Let a $1 \times p$ matrix $X_0$ contain only one row; let $X_0$ have columns that correspond to the columns of the $X$ matrix (the matrix of parameter coefficients); and let the elements of $X_0$ correspond to the factor combination of interest. For example, if we are interested in the point represented by pH = 8.5, then in the coded factor space given in Section 11.2, $x_{10} = -1.5$, and for the second-order model of Equation 11.39,
$$X_0 = \begin{bmatrix} 1 & x_{10} & x_{10}^2 \end{bmatrix} = \begin{bmatrix} 1 & -1.5 & 2.25 \end{bmatrix} \tag{11.68}$$
For a given experimental design (such as that of Equation 11.15), the variance of predicting the mean response at a point in factor space is
$$s_{\hat{y}_{10}}^2 = s_{pe}^2 [X_0 (X'X)^{-1} X_0'] \tag{11.69}$$
Thus,
$$s_{y_{10}}^2 = s_{pe}^2 \{1 + [X_0 (X'X)^{-1} X_0']\} \tag{11.70}$$
If the model is adequate but still not perfectly correct, then the estimate $s_{y_{10}}^2$ based on $s_{pe}^2$ (Equation 11.70) will be too low because it does not take into account the lack of fit of the model. To partially compensate for the possibility of a slight lack of fit between the model and the data, it is customary to use $s_r^2$ to estimate $s_{y_{10}}^2$ in setting confidence intervals for response surfaces.
$$s_{y_{10}}^2 = s_r^2 \{1 + [X_0 (X'X)^{-1} X_0']\} \tag{11.71}$$
It is to be stressed that if the model is grossly incorrect, it is of little practical use to estimate confidence intervals for response surfaces. For the example at pH = 8.5, by Equation 11.14 and Table 11.3,
$$s_{y_{10}}^2 = (102.0)(1.191) = 121.5 \tag{11.72}$$
The standard uncertainty in the single new value of response (the square root of the estimated variance of predicting the response, analogous to the standard uncertainty of a parameter estimate defined in Section 6.2) is
$$s_{y_{10}} = \sqrt{s_{y_{10}}^2} = 11.02 \tag{11.73}$$
It is evident that Equation 11.71 can be used to plot the variance (or uncertainty) of predicting a single new value of response if $X_0$ is made to vary across the domain of factor space. Such a plot of standard deviation of predicting a single new value of response as a function of pH is shown in Figure 11.7 for the experimental design of Equation 11.15, the data of Equation 11.16, and the second-order model of Equation 11.39.

Figure 11.7 Standard uncertainty for estimating one new response as a function of the factor $x_1$ for the model $y_{1i} = \beta_0^* + \beta_1^* x_{1i}^* + \beta_{11}^* x_{1i}^{*2} + r_{1i}$. (Horizontal axis: Coded Level of Factor X1 (pH - 10).)

It is possible to use $s_{y_{10}}^2$ to obtain confidence limits for predicting a single new value of response. The interval is given by
$$y_{10} = \hat{y}_{10} \pm \sqrt{F_{(1,n-p)} \times s_{y_{10}}^2} \tag{11.74}$$
where $F$ has one degree of freedom in the numerator and $n - p$ degrees of freedom in the denominator because $s_{y_{10}}^2$ is based on $s_r^2$. Because
$$\hat{y}_{10} = X_0 B \tag{11.75}$$
an equivalent expression for the confidence interval is obtained by substituting Equations 11.71 and 11.75 in Equation 11.74.
$$y_{10} = X_0 B \pm \sqrt{F_{(1,n-p)} \times s_r^2 \{1 + [X_0 (X'X)^{-1} X_0']\}} \tag{11.76}$$
At pH = 8.5, the 95% confidence interval is
$$y_{10} = 53.5 \pm \sqrt{5.59 \times 121.5} = 53.5 \pm 26.06 \tag{11.77}$$
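The worked example at pH = 8.5 can be retraced numerically. The sketch below assumes the standard quadratic form $X_0 (X'X)^{-1} X_0'$ with the inverse matrix of Equation 11.14 written as exact ratios, the residual mean square of Table 11.3, and the tabular value 5.59 used in Equation 11.77; the function names and the `extra` argument are our own. Setting `extra` to 1, to $1/m$, or to 0 gives the interval for one new response, for the mean of $m$ new responses, or for the true mean, respectively. Small differences from the printed 121.5 and 26.06 arise because the text rounds the factor 1.191 before multiplying.

```python
from math import sqrt

XTX_INV = [                      # (X*'X*)^-1 of Equation 11.14 as exact ratios
    [49 / 152, 0.0, -13 / 152],
    [0.0, 1 / 26, 0.0],
    [-13 / 152, 0.0, 5 / 152],
]
B_STAR = [79.0, 8.00, -6.00]     # parameter estimates, Equation 11.40
S2_R = 102.0                     # residual mean square, Table 11.3
F_1_7 = 5.59                     # tabular F(1,7) at 95%, as used in Equation 11.77


def quadratic_form(x):
    """X0 (X'X)^-1 X0' for the coded factor level x."""
    x0 = [1.0, x, x * x]
    return sum(x0[i] * XTX_INV[i][j] * x0[j] for i in range(3) for j in range(3))


def interval(x, extra):
    """Return (predicted response, half-width of the 95% interval) at coded level x."""
    predicted = sum(b * t for b, t in zip(B_STAR, [1.0, x, x * x]))
    half_width = sqrt(F_1_7 * S2_R * (extra + quadratic_form(x)))
    return predicted, half_width


for label, extra in [("single new value", 1.0), ("mean of four", 0.25), ("true mean", 0.0)]:
    predicted, half_width = interval(-1.5, extra)
    print(f"{label}: {predicted:.1f} +/- {half_width:.2f}")
```

Evaluating `interval` over a grid of coded levels would trace out confidence bands of the kind the text plots.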
Equation 11.76 can be used to plot the confidence limits for predicting a single new value of response if $X_0$ is made to vary across the domain of factor space. Figure 11.8 gives the 95% confidence limits for the data and model of Section 11.4. A related confidence interval is used for estimating a single mean of several new values of response at a given point in factor space. It can be shown that the estimated variance of predicting the mean of $m$ new values of response at a given point in factor space, $s_{\bar{y}_{10}}^2$, is
$$s_{\bar{y}_{10}}^2 = s_r^2 \{(1/m) + [X_0 (X'X)^{-1} X_0']\} \tag{11.78}$$
A plot of the corresponding confidence limits can be obtained from
$$\bar{y}_{10} = X_0 B \pm \sqrt{F_{(1,n-p)} \times s_r^2 \{(1/m) + [X_0 (X'X)^{-1} X_0']\}} \tag{11.79}$$
Figure 11.8 gives the 95% confidence limits for predicting a single mean of four responses at any point in factor space for the data and model of Section 11.4. If $m$ is large, Equation 11.79 reduces to
$$\hat{y}_{10} = X_0 B \pm \sqrt{F_{(1,n-p)} \times s_r^2 [X_0 (X'X)^{-1} X_0']} \tag{11.80}$$
which gives the confidence interval for a single estimate of the true mean at a given point in factor space. Figure 11.8 plots the 95% confidence limits for predicting the true mean at any single point in factor space for the data and model of Section 11.4.

Figure 11.8 Confidence bands (95% level) for predicting a single new value of response (outer band), a single mean of four new values of response (middle band), and a single estimate of the true mean response (inner band). (Horizontal axis: Level of Factor X1 (pH).)

Finally, we turn to an entirely different question involving confidence limits. Suppose we were to carry out the experiments indicated by the design matrix of Equation 11.15 a second time. We would probably not obtain the same set of responses we did the first time (Equation 11.16), but instead would have a different $Y$ matrix. This would lead to a different set of parameter estimates, $B$, and a predicted response surface that in general would not be the same as that shown in Figure 11.4. A third repetition of the experiments would lead to a third predicted response surface, and so on. The question, then, is what limits can we construct about these response surfaces so that in a given percentage of cases, those limits will include the entire true response surface? The answer was provided by Working and Hotelling and is of the same form as Equation 11.80, but the statistic $W^2$ (named after Working) replaces $F$ [Neter, Wasserman, and Kutner (1990)].
$$W^2 = p \times F_{(p,n-p)} \tag{11.81}$$
$$\hat{y}_{10} = X_0 B \pm \sqrt{W^2 \times s_r^2 [X_0 (X'X)^{-1} X_0']} \tag{11.82}$$
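To get a feel for how much wider the Working-Hotelling band is than the pointwise band of Equation 11.80, the two multipliers can be compared directly. In the sketch below, 5.59 is the tabular $F_{(1,7)}$ value used in the text; the value 4.35 for $F_{(3,7)}$ at the 95% level is an assumed table entry that does not appear in the text and should be checked against an F table.

```python
from math import sqrt

P = 3            # parameters in the parabolic model
F_1_7 = 5.59     # tabular F(1,7) at 95%, from Equation 11.77
F_3_7 = 4.35     # tabular F(3,7) at 95%: assumed table value, not given in the text

w_squared = P * F_3_7                  # Equation 11.81
widening = sqrt(w_squared / F_1_7)     # half-width ratio, Eq. 11.82 vs Eq. 11.80
print(round(w_squared, 2), round(widening, 2))
```

Under these assumed table values the simultaneous band is roughly one and a half times as wide as the pointwise band at every point in factor space, since both use the same $s_r^2 [X_0 (X'X)^{-1} X_0']$ term.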
The 95% confidence limits are plotted in Figure 11.9. The Working-Hotelling confidence limits are used when it is necessary to estimate true mean responses at several points in factor space from the same set of experimental data. Before leaving this example, we would point out that there is an excessive amount of purely experimental uncertainty in the system under study. The range of values obtained for replicate determinations is rather large, and suggests the existence of important factors that are not being controlled (see Figure 11.10). If steps are taken to bring the system under better experimental control, the parameters of the model can be estimated with better precision (see Equation 7.1). An old statement is very true - statistical treatment is no substitute for good data.

Figure 11.9 Confidence band (95% level) for predicting the entire true response surface. (Horizontal axis: Level of Factor X1 (pH).)
Figure 11.10 General system theory view emphasizing the effect of uncontrolled factors on the noise associated with responses.
Exercises 11.1 Placement of experiments. Rework Sections 11.3 and 11.4 with x ; ~= -1 and x;(,,, = 1. Compare (X*’x*)-’ with Equations 11.11 and 11.14. Which experimental design gives the most precise estimate of p;?
11.2 Effect of coding. Rework Sections 11.3 and 11.4 using $c_{x_1} = 9$, $d_{x_1} = 0.5$. Does coding appear to affect the significance of the statistical tests?
11.3 Residual plot. Plot the residuals of Equation 11.27 against the factor levels of Equation 11.15. Is there a probable pattern in these residuals? What additional term might be added to the model to improve the fit?
11.4 Residual plot. Plot the residuals of Figure 11.4 against the factor levels of Equation 11.15. Is there a probable pattern in these residuals? Would an additional term in the model be helpful?
11.5 Matrix inversion. Verify that Equation 11.8 is the correct inverse of Equation 11.7 (see Appendix A).
11.6 Coding transformations.
Calculate B from B* in Section 11.3 (see Section 11.5 and Equation 11.48). What is the interpretation of b,? What is the relationship between b; and b,? Why?
11.7 Experimental design. A system is thought to be described between $x_1 = 0$ and $x_1 = 10$ by the model $y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$. What is the minimum number of experiments that will allow the model to be fit, the significance of regression to be tested, and the lack of fit to be tested? How should these experiments be distributed along $x_1$? Why?
11.8 Experimental design.
A system is thought to be described between x1 = 0 and xI = 10 by the model y l i + p2(log x l i ) + rli. If resources are available for ten experiments, how would you place the experiments along x,? Why? =
Po +
11.9 Experimental design. A two-factor system is thought to be described by the model $y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + r_{1i}$ over the domain $0 \le x_1 \le 10$, $0 \le x_2 \le 10$ (see Figure 2.16). What is the minimum number of experiments that will allow the model to be fit? What are some of the ways you would not place these experiments in the domain of $x_1$ and $x_2$? [Hint: calculate $(X'X)^{-1}$ for questionable designs.]
11.10 Confidence intervals. Compare Equations 11.76, 11.79, and 11.80. In what ways are they similar? In what way are they different?
11.11 Confidence intervals. Rewrite Equation 11.82 substituting $p \times F_{(p,n-p)}$ for $W^2$. In what ways is it similar to Equation 11.80? In what way is it different?
11.12 System theory. What other factors might affect the reaction rate of the enzyme catalyzed system shown in Figure 11.10? Which of these can be easily controlled? Which would be very difficult to control? Which would be expected to exert a large effect on the system? Which would be expected to exert only a small effect on the system? [See Problem 1.1.]
11.13 Measurement systems. Measurement systems are used to obtain information from systems under study:
[Diagram: system under study feeding a measurement system. Recoverable labels: INPUT, STUDY, NUMBER.]
Comment on known and unknown, controlled and uncontrolled measurement system factors and their influence on the quality of information about the system under study.
11.14 Measurement systems. An ideal measurement system does not perturb the system under study. Give three examples of ideal measurement systems. Give three examples of non-ideal measurement systems.
11.15 Experimental design. Draw a systems diagram for a process with which you are familiar. Choose one input for investigation. Over what domain might this input be varied and controlled? What linear mathematical model might be used to approximate the effect of this factor on one of the responses from the system? Design a set of experiments to test the validity of this model.
11.16 Confidence intervals. Calculate the 99% confidence interval for predicting a single new value of response at pH = 7.0 for the data of Equation 11.16 and the second-order model of Equation 11.39. Calculate the 99% confidence interval for predicting the mean of seven new values of response for these conditions. Calculate the 99% confidence interval for predicting the true mean for these conditions. What confidence interval would be used if it were necessary to predict the true mean at several points in factor space?
11.17 Number of experiments. The experimental design represented by Equation 11.6 requires only half as many experiments as the design represented by Equation 11.12. Each design can be used to fit the same parabolic model. What are the advantages and disadvantages of each design?
11.18 Coefficients of correlation. In Sections 11.3 and 11.4 the $R^2$ value increased from 0.479 to 0.794 and the F-ratio for the significance of regression increased from the 97.3% to the 99.6% level of confidence when the term $\beta_{11}^* x_{1i}^{*2}$ was added to the model. Is it possible that in some instances $R^2$ will increase, but the significance of regression will decrease? If so, why? [See, for example, Neter, Wasserman, and Kutner (1990), p. 229.]
11.19 Confidence intervals. Comment on the practice of writing results in the following form: 5.63 ± 0.16. What do the numbers mean?
11.20 Variance-covariance matrix. In Section 11.3, is $s_r^2$ a “valid estimate” of $\sigma_{pe}^2$? Why or why not? In Section 11.4, is $s_r^2$ a “valid estimate” of $\sigma_{pe}^2$? Why or why not?
CHAPTER 12
Approximating a Region of a Multifactor Response Surface
In previous chapters, many of the fundamental concepts of experimental design have been presented for single-factor systems. Several of these concepts are now expanded and new ones are introduced to begin the treatment of multifactor systems. Although the complexity of multifactor systems increases roughly exponentially as the number of factors being investigated increases, most multifactor concepts can be introduced using the relatively simple two-factor case. Thus, in most of this chapter we will consider the system shown in Figure 12.1, a system having two inputs designated factor x1 and factor x2 (see Section 1.2), and a single output designated response y 1 (see Section 1.3).
12.1 Elementary concepts
In Chapter 2 it was seen that a response surface for a one-factor system can be represented by a line, either straight or curved, existing in the plane of two-dimensional experiment space (one factor dimension and one response dimension). In two-factor systems, a response surface can be represented by a true surface, either flat or curved, existing in the volume of three-dimensional experiment space (two factor dimensions and one response dimension). By extension, a response surface associated with three- or higher-dimensional factor space can be thought of as a hypersurface existing in the hypervolume of four- or higher-dimensional experiment space.

Figure 12.1 Two-factor, single-response system for discussion of multifactor experimentation.
Figure 12.2 Location of a single experiment in two-dimensional factor space.
Figure 12.2 is a graphic representation of a portion of two-dimensional factor space associated with the system shown in Figure 12.1. In this illustration, the domain of factor x, (the “horizontal axis”) lies between 0 and +lo; similarly, the domain of factor x2 (the “vertical axis”) lies between 0 and +lo. The response axis is not shown in this representation, although it might be imagined to rise perpendicularly from the intersection of the factor axes (at x1 = 0, x2 = 0). Figure 12.2 shows the location in factor space of a single experiment at xI1= +3, x21= +7. Figure 12.3 is a pseudo-three-dimensional representation of a portion of the three-dimensional experiment space associated with the system shown in Figure 12.1. The two-dimensional factor subspace is shaded with a one-unit grid. The factor domains are again 0 I x1 I +10 and 0 I x2 I +lo. The response axis ranges from 0 to +8. The location in factor space of the single experiment at x , , = +3, x21= +7 is shown as a point in the plane of factor space. The response (yll = 4.00) associated with this experiment is shown as a point above the plane of factor space, and is “connected” to the factor space by a dotted vertical line. A two-factor response surface is the graph of a system output or objective function plotted against the system’s two inputs. It is assumed that all other controllable factors are held constant, each at a specified level. Again, it is important that this assumption be true; otherwise, as will be seen in Section 12.2, the response surface might appear to change shape or to be excessively noisy. Figure 12.4 is a pseudo-three-dimensional representation of a response surface showing a system response, y l , plotted against the two system factors, x, and x2. The
Figure 12.3 Location of a single experiment in three-dimensional experiment space.
response surface might be described by some mathematical function $U$ that relates the response $y_1$ to the factors $x_1$ and $x_2$.

$$ y_1 = U(x_1, x_2) \tag{12.1} $$

For Figure 12.4, the exact relationship is (12.2)
Figure 12.4 A two-factor response surface.
Such a relationship (and corresponding response surface) might represent magnetic field strength as a function of position in a plane parallel to the pole face of a magnet, or reaction yield as a function of reactor pressure and temperature for a chemical process.
12.2 Factor interaction

The concept of “self interaction” was introduced in Section 8.6 where it was shown that for a single-factor, second-order model, “the (first-order) effect of the factor depends on the level of the factor”. It was shown that the model

$$ y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_{11} x_{1i}^2 + r_{1i} \tag{12.3} $$

can be written

$$ y_{1i} = \beta_0 + (\beta_1 + \beta_{11} x_{1i}) x_{1i} + r_{1i} \tag{12.4} $$

from which the “slope” with respect to $x_1$ (the first-order effect of $x_1$) can be obtained:

$$ \text{slope of factor } x_1 = (\beta_1 + \beta_{11} x_{1i}) \tag{12.5} $$

In multifactor systems, it is possible that the effect of one factor will depend on the level of a second factor. For example, the slope of factor $x_1$ might depend on the level of factor $x_2$ in the following way:

$$ \text{slope of factor } x_1 = (\beta_1 + \beta_{12} x_{2i}) \tag{12.6} $$

When incorporated into the complete second-order model, we obtain

$$ y_{1i} = \beta_0 + (\beta_1 + \beta_{12} x_{2i}) x_{1i} + r_{1i} \tag{12.7} $$

and finally

$$ y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_{12} x_{1i} x_{2i} + r_{1i} \tag{12.8} $$

The term $\beta_{12} x_{1i} x_{2i}$ is said to be an interaction term. The subscripts on $\beta_{12}$ indicate that the parameter is used to assess the interaction between the two factors $x_1$ and $x_2$.
According to Equation 12.8 (and seen more easily in Equation 12.7), the response $y_1$ should be offset from the origin an amount equal to $\beta_0$ and should change according to a first-order relationship with the factor $x_1$. However, the sign and magnitude of that change depend not only on the parameters $\beta_1$ and $\beta_{12}$, but also on the value of the factor $x_2$. If we set $\beta_0 = +4.0$, $\beta_1 = -0.4$, and $\beta_{12} = +0.08$, then Equation 12.7 becomes

$$ y_{1i} = 4.0 + (-0.4 + 0.08 x_{2i}) x_{1i} + r_{1i} \tag{12.9} $$

or

$$ y_{1i} = 4.0 - 0.4 x_{1i} + 0.08 x_{1i} x_{2i} + r_{1i} \tag{12.10} $$

When $x_2 = 0$, the effect of $x_1$ will be to decrease the response $y_1$ by 0.4 units for every unit increase in $x_1$. When $x_2 = 5$, $y_1$ will not be influenced by $x_1$; i.e., the slope of $y_1$ with respect to $x_1$ will be zero. When $x_2 = 10$, the effect of $x_1$ will be to increase the response $y_1$ by 0.4 units for every unit increase in $x_1$. Figure 12.5 graphs the response surface of Equation 12.10.

We could also look at this interaction from the point of view of the second factor $x_2$ and say that its effect depends on the level of the first factor $x_1$. Rewriting Equation 12.8 gives

$$ y_{1i} = \beta_0 + \beta_1 x_{1i} + (\beta_{12} x_{1i}) x_{2i} + r_{1i} \tag{12.11} $$

That is, the

$$ \text{slope of factor } x_2 = \beta_{12} x_{1i} \tag{12.12} $$
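The dependence of each slope on the level of the other factor can be checked numerically. The following is a minimal sketch using the parameter values of Equation 12.10; the function names are illustrative choices for this example, not notation from the text.

```python
# Parameter values of Equation 12.10: y1 = 4.0 - 0.4*x1 + 0.08*x1*x2
B0, B1, B12 = 4.0, -0.4, 0.08


def response(x1: float, x2: float) -> float:
    """Noise-free response of Equation 12.10 (residual term omitted)."""
    return B0 + B1 * x1 + B12 * x1 * x2


def slope_x1(x2: float) -> float:
    """First-order effect of x1 at a given level of x2 (Equation 12.6)."""
    return B1 + B12 * x2


def slope_x2(x1: float) -> float:
    """First-order effect of x2 at a given level of x1 (Equation 12.12)."""
    return B12 * x1


for level in (0.0, 5.0, 10.0):
    print(f"x2 = {level:4.1f}: slope of x1 = {slope_x1(level):+.2f}")
for level in (0.0, 5.0, 10.0):
    print(f"x1 = {level:4.1f}: slope of x2 = {slope_x2(level):+.2f}")
```

The slope of $x_1$ passes from negative through zero to positive as $x_2$ goes from 0 to 10, while the slope of $x_2$ vanishes at $x_1 = 0$ and grows with $x_1$.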
This interpretation is confirmed in Figure 12.5 where it can be seen that when $x_1 = 0$, the response does not depend on the factor $x_2$. However, the dependence of $y_1$ on $x_2$ gets larger as the level of the factor $x_1$ increases.

The concept of interaction is fundamental to an understanding of multifactor systems. Much time can be lost and many serious mistakes can be made if interaction is not considered. As a simple example, let us suppose that a research director wants to know the effect of temperature on yield in an industrial process. The yield is very low, so in an attempt to increase the yield he asks one of his research groups to find out the effect of temperature ($T$) on yield over the temperature domain from 0 °C to 10 °C. The group reports that yield decreases with temperature according to the equation

$$ \text{yield} = (4.0 - 0.4T)\% \tag{12.13} $$
Figure 12.5 Graph of the model $y_{1i} = 4.0 - 0.4x_{1i} + 0.08x_{1i}x_{2i}$.
Not certain of this result, he asks another of his research groups to repeat the work. They report that yield increases with temperature according to the equation

$$ \text{yield} = (4.0 + 0.4T)\% \tag{12.14} $$
Many research directors would be upset at this conflicting information - how can one group report that yield decreases with temperature and the other group report that yield increases with temperature?

In many instances, including this example, there could be a second factor in the system that is interacting with the factor of interest. It is entirely possible that the first research group understood their director to imply that catalyst should not be included in the process when they studied the effect of temperature. The second group, however, might have thought the research director intended them to add the usual 10 milligrams of catalyst. Catalyst could then be the second, interacting factor: if catalyst were absent, the yield would decrease; with catalyst present in the amount of 10 milligrams, the yield would increase. Presumably, intermediate amounts of catalyst would cause intermediate effects. The results of this example are illustrated in Figure 12.5 if temperature in °C is represented by the factor $x_1$, milligrams of catalyst is represented by $x_2$, and yield is represented by $y_1$. Equations 12.13 and 12.14 are obtained from Equation 12.10 by setting $x_2 = 0$ and $x_2 = 10$, respectively.

At the beginning of Section 2.1 where single-factor response surfaces were first discussed, the following caution was presented: “It is assumed that all other controllable factors are held constant, each at a specified level .... It is important that this assumption be true; otherwise, the single-factor response surface might appear to change shape or to be excessively noisy”. This caution was stated again for multifactor response surfaces in Section 12.1. We have just seen how controlling a second factor (catalyst) at different levels causes the single-factor response surface to change shape. Let us now see why an uncontrolled second factor often produces a single-factor response surface that appears to be excessively noisy.

Suppose a third research group were to attempt to determine the effect of temperature on yield. If they were not aware of the importance of catalyst and took no precautions to control the catalyst level at a specified concentration, then their results would depend not only on the known and controlled factor temperature ($x_1$), but also on the unknown and uncontrolled factor catalyst ($x_2$). A series of four experiments might give the results
3.65
3.31
9.00
1.76
6.33
1.28
5.31
0.70
Plotting these data and the results of several other similar experiments in which catalyst level was not controlled might give the results shown in Figure 12.6. The relationship between $y_1$ and $x_1$ is seen to be “fuzzy”; the data are noisy because catalyst ($x_2$) has not been controlled and interacts with the factor $x_1$. A similar effect is observed when non-interacting factors are not controlled. However, uncontrolled non-interacting factors usually produce homoscedastic noise (see Figure 3.6); uncontrolled interacting factors often produce heteroscedastic noise (see Figure 3.7), as they do in the present example.

Figure 12.6 Plot of response as a function of $x_1$ for random values of $x_2$ ($0 \le x_2 \le 10$) in the model $y_{1i} = 4.0 - 0.4x_{1i} + 0.08x_{1i}x_{2i}$.
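The difference between the two kinds of noise can be illustrated with a small simulation. This is a sketch under stated assumptions: the uncontrolled factor is drawn uniformly from 0 to 10, the interacting model is Equation 12.10, and the additive comparison model `additive` is hypothetical, introduced only to show the non-interacting case.

```python
import random
import statistics


def interacting(x1: float, x2: float) -> float:
    """Equation 12.10: x2 interacts with x1."""
    return 4.0 - 0.4 * x1 + 0.08 * x1 * x2


def additive(x1: float, x2: float) -> float:
    """Hypothetical comparison model in which x2 does not interact with x1."""
    return 4.0 - 0.4 * x1 + 0.4 * x2


def spread_at(model, x1: float, n: int = 500, seed: int = 0) -> float:
    """Standard deviation of the response at a fixed x1 when x2 is uncontrolled."""
    rng = random.Random(seed)
    responses = [model(x1, rng.uniform(0.0, 10.0)) for _ in range(n)]
    return statistics.stdev(responses)


for x1 in (1.0, 5.0, 9.0):
    print(
        f"x1 = {x1:3.1f}: "
        f"interacting spread = {spread_at(interacting, x1):.3f}, "
        f"additive spread = {spread_at(additive, x1):.3f}"
    )
```

With the interacting model the scatter grows in proportion to $x_1$ (heteroscedastic), whereas with the additive model it is the same at every level of $x_1$ (homoscedastic).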
12.3 Factorial designs

Factorial designs are an enormously popular class of experimental designs that are often used to investigate multifactor response surfaces. The word “factorial” does not have its usual mathematical meaning of an integer multiplied by all integers smaller than itself (e.g., 3! = 3 × 2 × 1), but instead indicates that many factors are varied simultaneously in a systematic way. One of the major advantages of factorial designs is that they can be used to reveal the existence of factor interaction when it is present in a system.

Historically, factorial designs were introduced by Sir R. A. Fisher to counter the then prevalent idea that if one were to discover the effect of a factor, all other factors must be held constant and only the factor of interest could be varied. Fisher showed that all factors of interest could be varied simultaneously, and the individual factor effects and their interactions could be estimated by proper mathematical treatment. The Yates algorithm and its variations are often used to obtain these estimates, but the use of least squares fitting of linear models gives essentially identical results.

Important descriptors of factorial designs are the number of factors involved in the design and the number of levels of each factor. For example, if a factorial design has three levels (low, middle, and high) of each of two factors ($x_1$ and $x_2$), it is said to be a 3 × 3 or $3^2$ design. A factorial design involving three factors with each factor at two levels (low and high) would be called a 2 × 2 × 2 or $2^3$ factorial. Figure 12.7 illustrates possible factor combinations for a $3^2$ factorial and Figure 12.8 shows factor combinations for a $2^3$ design. Note that all possible combinations of the chosen factor levels are present in the experimental design; thus, the description of a factorial design gives the number of factor combinations ($f$) contained in the design: $3^2 = 9$, and $2^3 = 8$.
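Generating every combination of the chosen levels is easy to sketch in code. The example below is illustrative only: the helper name `full_factorial` and the particular factor levels (2, 5, 8 and 3, 7) are assumptions made for the example.

```python
from itertools import product


def full_factorial(levels_per_factor):
    """Return all combinations of the given levels, one sequence of levels per factor."""
    return list(product(*levels_per_factor))


# A 3^2 design: two factors, each at three (assumed) levels.
design_3_squared = full_factorial([(2, 5, 8), (2, 5, 8)])

# A 2^3 design: three factors, each at two (assumed) levels.
design_2_cubed = full_factorial([(3, 7), (3, 7), (3, 7)])

print(len(design_3_squared), "factor combinations in the 3^2 design")
print(len(design_2_cubed), "factor combinations in the 2^3 design")
for combination in design_2_cubed:
    print(combination)
```

The number of rows is always the product of the numbers of levels, which is why the design name itself gives the number of factor combinations.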
Note also that as the number of factors increases, the number of factor combinations required by a full factorial design increases exponentially (see Table 12.1). In general, if $k$ is the number of factors being investigated, and $m$ is the number of levels for each factor, then $m^k$ factor combinations are generated by a full factorial design.

We will restrict our discussion here to the two-level, two-factor design shown in Figure 12.9. The linear model most commonly fit to the data from $2^2$ factorial designs is

$$ y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_{12} x_{1i} x_{2i} + r_{1i} \tag{12.15} $$
Figure 12.7 Factor combinations for a $3^2$ factorial experimental design in two-dimensional factor space.

Figure 12.8 Factor combinations for a $2^3$ factorial experimental design in three-dimensional factor space.
This model gives estimates of an offset term ($\beta_0$), a first-order effect ($\beta_1$) of the first factor $x_1$, a first-order effect ($\beta_2$) of the second factor $x_2$, and a second-order interaction effect ($\beta_{12}$) between the two factors $x_1$ and $x_2$. When the model of Equation 12.15 is fit to data from a $2^2$ factorial design, the number of factor combinations ($f = 4$) is equal to the number of parameters ($p = 4$), and the number of degrees of freedom for estimating lack of fit is zero ($f - p = 4 - 4 = 0$). Further, if there is no replication, the number of degrees of freedom for residuals is zero ($n - p = 4 - 4 = 0$); under these conditions, the number of degrees of freedom for purely experimental uncertainty is also zero ($n - f = 4 - 4 = 0$). [See, for example, Bose and Carter (1962).] If replication is carried out, purely experimental uncertainty can be estimated, but an estimate of lack of fit is still not possible unless a model with fewer parameters is fitted. (Frequently, if the factors are measured on interval or ratio scales, statisticians will recommend running an additional center point experiment to pick up a degree of freedom for lack of fit.)

Let us suppose the results of the experimental design shown in Figure 12.9 are obtained from the response surface shown in Figure 12.5. These results are

Experiment     x1     x2     y1
    1           3      3     3.52
    2           3      7     4.48
    3           7      3     2.88
    4           7      7     5.12

Then

$$ D = \begin{bmatrix} 3 & 3 \\ 3 & 7 \\ 7 & 3 \\ 7 & 7 \end{bmatrix} \tag{12.16} $$

$$ Y = \begin{bmatrix} 3.52 \\ 4.48 \\ 2.88 \\ 5.12 \end{bmatrix} \tag{12.17} $$
TABLE 12.1 Number of factor combinations required by full factorial designs as a function of the number of factors.

Number of        Number of factor combinations required for full factorial
factors, k       2-level        3-level        4-level
    1                  2              3              4
    2                  4              9             16
    3                  8             27             64
    4                 16             81            256
    5                 32            243          1,024
    6                 64            729          4,096
    7                128          2,187         16,384
    8                256          6,561         65,536
    9                512         19,683        262,144
   10              1,024         59,049      1,048,576
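$$ X = \begin{bmatrix} 1 & 3 & 3 & 9 \\ 1 & 3 & 7 & 21 \\ 1 & 7 & 3 & 21 \\ 1 & 7 & 7 & 49 \end{bmatrix} \tag{12.18} $$

Using the method of least squares, we obtain

$$ (X'X) = \begin{bmatrix} 4 & 20 & 20 & 100 \\ 20 & 116 & 100 & 580 \\ 20 & 100 & 116 & 580 \\ 100 & 580 & 580 & 3364 \end{bmatrix} \tag{12.19} $$

$$ (X'X)^{-1} = \begin{bmatrix} 13.1 & -2.27 & -2.27 & 0.391 \\ -2.27 & 0.453 & 0.391 & -0.0781 \\ -2.27 & 0.391 & 0.453 & -0.0781 \\ 0.391 & -0.0781 & -0.0781 & 0.0156 \end{bmatrix} \tag{12.20} $$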
Figure 12.9 Factor combinations for a $2^2$ factorial experimental design in two-dimensional factor space.
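$$ (X'Y) = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 3 & 3 & 7 & 7 \\ 3 & 7 & 3 & 7 \\ 9 & 21 & 21 & 49 \end{bmatrix} \begin{bmatrix} 3.52 \\ 4.48 \\ 2.88 \\ 5.12 \end{bmatrix} = \begin{bmatrix} 16.00 \\ 80.00 \\ 86.40 \\ 437.12 \end{bmatrix} \tag{12.21} $$

$$ \hat{B} = (X'X)^{-1}(X'Y) = \begin{bmatrix} 4.00 \\ -0.40 \\ 0.00 \\ 0.08 \end{bmatrix} \tag{12.22} $$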
Because four parameters were estimated from data obtained at four factor combinations, there are no degrees of freedom for lack of fit; further, there was no replication in this example, so there are no degrees of freedom for purely experimental uncertainty. Thus, there can be no degrees of freedom for residuals, and the estimated model will appear to fit the data “perfectly”. This is verified by estimating the responses using the fitted model parameters.

$$ \hat{Y} = X\hat{B} = \begin{bmatrix} 1 & 3 & 3 & 9 \\ 1 & 3 & 7 & 21 \\ 1 & 7 & 3 & 21 \\ 1 & 7 & 7 & 49 \end{bmatrix} \begin{bmatrix} 4.00 \\ -0.40 \\ 0.00 \\ 0.08 \end{bmatrix} = \begin{bmatrix} 3.52 \\ 4.48 \\ 2.88 \\ 5.12 \end{bmatrix} \tag{12.23} $$
This reproduces the original data exactly (see Equation 12.17). Because the data for this example were taken from Figure 12.5, it is not surprising that these parameter estimates are the same as the values in Equation 12.10 from which Figure 12.5 was drawn. Note that $\beta_2$ does not appear in Equation 12.10; thus, the estimated value of $\beta_2 = 0$ in Equation 12.22 is reasonable and expected for this example.

One might be tempted to conclude from this fitted model that the factor $x_2$ is not important - after all, the coefficient $\beta_2$ is equal to zero. This would be an incorrect conclusion. It is true that the $\beta_2 x_{2i}$ term in the fitted model does not influence the response, but $x_2$ is still an important factor because of its effect in the interaction term $\beta_{12} x_{1i} x_{2i}$.

Other models can be fit to data from two-level factorial designs. For example, fitting the model expressed by Equation 12.8 to the data used in this section will produce the fitted model given by Equation 12.10. Some models cannot be fit to data from two-level factorial experiments; for example, the model

$$ y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_{22} x_{2i}^2 + r_{1i} \tag{12.24} $$

produces an $(X'X)$ matrix that cannot be inverted.
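The calculations of this section can be reproduced with a short script. The sketch below solves the normal equations $(X'X)B = (X'Y)$ by exact rational Gauss-Jordan elimination rather than by forming $(X'X)^{-1}$ explicitly; that choice, and the helper names `solve` and `least_squares`, are assumptions of this example rather than the authors' procedure. It fits the interaction model of Equation 12.15 and then attempts the model of Equation 12.24.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a*x = b exactly by Gauss-Jordan elimination.

    Returns None when the matrix is singular (cannot be inverted).
    """
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def least_squares(x_rows, y):
    """Estimate the parameters from the normal equations (X'X) B = (X'Y)."""
    p = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(p)]
    return solve(xtx, xty)


# The 2^2 factorial design and responses used in this section.
points = [(3, 3), (3, 7), (7, 3), (7, 7)]
y = [Fraction(s) for s in ("3.52", "4.48", "2.88", "5.12")]

# Columns: 1, x1, x2, x1*x2 (Equation 12.15).
x_interaction = [[1, x1, x2, x1 * x2] for x1, x2 in points]
# Columns: 1, x1, x2, x2^2 (Equation 12.24).
x_curvature = [[1, x1, x2, x2 * x2] for x1, x2 in points]

b_interaction = least_squares(x_interaction, y)
b_curvature = least_squares(x_curvature, y)

print("interaction model:", [float(b) for b in b_interaction])
print("curvature model:  ", b_curvature)
```

The second fit fails because, with only two levels of $x_2$ (3 and 7), the $x_2^2$ column is an exact linear combination of the constant and $x_2$ columns ($x_2^2 = 10x_2 - 21$ at both levels), so the columns of $X$ are not independent.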
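12.4 Coding of factorial designs

Calculations for factorial designs are often greatly simplified if coding of factor levels is employed (see Section 8.5). For the present example, setting $c_{x_1} = c_{x_2} = 5$ and $d_{x_1} = d_{x_2} = 2$,

$$ X^* = \begin{bmatrix} 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & 1 & 1 & 1 \end{bmatrix} \tag{12.25} $$

$$ (X^{*\prime}X^*) = \begin{bmatrix} 4 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 0 & 0 & 0 & 4 \end{bmatrix} \tag{12.26} $$

Inversion of this matrix is trivial.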
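$$ (X^{*\prime}X^*)^{-1} = \begin{bmatrix} 0.25 & 0 & 0 & 0 \\ 0 & 0.25 & 0 & 0 \\ 0 & 0 & 0.25 & 0 \\ 0 & 0 & 0 & 0.25 \end{bmatrix} \tag{12.27} $$

$$ (X^{*\prime}Y) = \begin{bmatrix} 16.00 \\ 0.00 \\ 3.20 \\ 1.28 \end{bmatrix} \tag{12.28} $$

$$ \hat{B}^* = (X^{*\prime}X^*)^{-1}(X^{*\prime}Y) = \begin{bmatrix} 4.00 \\ 0.00 \\ 0.80 \\ 0.32 \end{bmatrix} \tag{12.29} $$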
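Because the coded columns are mutually orthogonal and $(X^{*\prime}X^*)$ is four times the identity matrix, each coded estimate is simply a signed average of the responses. A minimal sketch follows, with variable names chosen for this example only.

```python
CENTER = 5.0      # c for both factors
HALF_RANGE = 2.0  # d for both factors

points = [(3, 3), (3, 7), (7, 3), (7, 7)]
y = [3.52, 4.48, 2.88, 5.12]


def code(x: float) -> float:
    """Coded factor level x* = (x - c) / d."""
    return (x - CENTER) / HALF_RANGE


# Rows of X*: 1, x1*, x2*, x1* x2*
rows = []
for x1, x2 in points:
    c1, c2 = code(x1), code(x2)
    rows.append((1.0, c1, c2, c1 * c2))

n = len(rows)
n_params = len(rows[0])

# X*'X*: every off-diagonal element is zero, every diagonal element is n.
gram = [[sum(r[i] * r[j] for r in rows) for j in range(n_params)] for i in range(n_params)]

# With X*'X* = n*I, the estimates reduce to (X*'Y) / n.
b_coded = [sum(r[j] * yi for r, yi in zip(rows, y)) / n for j in range(n_params)]

print(gram)
print([round(b, 4) for b in b_coded])
```

This shortcut applies only when the coded design is orthogonal, as it is for the unreplicated two-level factorial used here.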
Transformation back to the original coordinate system is accomplished as described in Section 11.5.
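$$ (X^{*\prime}X) = \begin{bmatrix} 4 & 20 & 20 & 100 \\ 0 & 8 & 0 & 40 \\ 0 & 0 & 8 & 40 \\ 0 & 0 & 0 & 16 \end{bmatrix} \tag{12.30} $$

$$ A = (X^{*\prime}X^*)^{-1}(X^{*\prime}X) = \begin{bmatrix} 1 & 5 & 5 & 25 \\ 0 & 2 & 0 & 10 \\ 0 & 0 & 2 & 10 \\ 0 & 0 & 0 & 4 \end{bmatrix} \tag{12.31} $$

$$ \hat{B} = A^{-1}\hat{B}^* = \begin{bmatrix} 4.00 \\ -0.40 \\ 0.00 \\ 0.08 \end{bmatrix} \tag{12.32} $$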
which is the same result as was obtained in the previous section (see Equation 12.22).

Many workers do not transform the parameter estimates back to the original coordinate system, but instead work with the parameter estimates obtained in the coded factor space. This can often lead to surprising and seemingly contradictory results. As an example, the fitted model in the coded factor space was found to be

$$ \hat{y}_{1i} = 4.00 + 0.00x_{1i}^* + 0.80x_{2i}^* + 0.32x_{1i}^*x_{2i}^* \tag{12.33} $$

This suggests that the $\beta_1^*$ term can be omitted and leads to a simpler model of the form

$$ y_{1i} = \beta_0^* + \beta_2^* x_{2i}^* + \beta_{12}^* x_{1i}^* x_{2i}^* + r_{1i} \tag{12.34} $$

which is clearly not the form of the equation from which the data were generated (see Equations 12.8 and 12.10). The coded equation has replaced the $\beta_1 x_{1i}$ term with a $\beta_2^* x_{2i}^*$ term! This is not an uncommon phenomenon: the mathematical form of the model can often appear to change when factor levels are coded. The reason for the present change of model is seen in the algebra of coding.
Substituting coded values gives
Figure 12.10 Factor combinations for a star experimental design in two-dimensional factor space.

Figure 12.11 Factor combinations for a star experimental design in three-dimensional factor space.
(12.37)
(12.38)
(12.39) From this it is seen that
(12.40)
(12.41 )
$$ \beta_2^* = (2\beta_2 + 10\beta_{12}) = (0.00 + 0.80) = 0.80 \tag{12.42} $$
Thus, it is the algebraic effect of coding and the numerical values of the estimated parameters that cause some terms to be added to the model (e.g., $\beta_2^*$) and other terms to disappear from the model (e.g., $\beta_1^*$). Transformation of parameter estimates back to the original factor space usually avoids possible misinterpretation of results.
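The algebra of coding for this example can be written out in full. The derivation below is a reconstruction under the coding used in this section ($c = 5$, $d = 2$ for both factors, so that $x = 2x^* + 5$); it is not necessarily the sequence of steps the original equations followed.

```latex
\begin{aligned}
x_{1i} &= 2x_{1i}^* + 5, \qquad x_{2i} = 2x_{2i}^* + 5 \\[4pt]
y_{1i} &= \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_{12} x_{1i} x_{2i} + r_{1i} \\
       &= \beta_0 + \beta_1 (2x_{1i}^* + 5) + \beta_2 (2x_{2i}^* + 5)
          + \beta_{12} (2x_{1i}^* + 5)(2x_{2i}^* + 5) + r_{1i} \\
       &= (\beta_0 + 5\beta_1 + 5\beta_2 + 25\beta_{12})
          + (2\beta_1 + 10\beta_{12})\, x_{1i}^*
          + (2\beta_2 + 10\beta_{12})\, x_{2i}^*
          + 4\beta_{12}\, x_{1i}^* x_{2i}^* + r_{1i} \\[4pt]
\beta_0^* &= 4.00 + 5(-0.40) + 5(0.00) + 25(0.08) = 4.00 \\
\beta_1^* &= 2(-0.40) + 10(0.08) = 0.00 \\
\beta_2^* &= 2(0.00) + 10(0.08) = 0.80 \\
\beta_{12}^* &= 4(0.08) = 0.32
\end{aligned}
```

The interaction parameter contributes to both coded first-order terms. Here it exactly cancels the first-order effect of $x_1$ and creates an apparent first-order effect of $x_2$.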
12.5 Star designs

Two-level factorial designs are useful for estimating first-order factor effects and interaction effects between factors, but they cannot be used to estimate additional second-order curvature effects such as those represented by the terms $\beta_{11}x_{1i}^2$ and $\beta_{22}x_{2i}^2$ in the model

$$ y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_{11} x_{1i}^2 + \beta_{22} x_{2i}^2 + r_{1i} \tag{12.44} $$

A different class of experimental designs, the star designs, provide information that can be used to fit models of the general form described by Equation 12.44. Models of this class contain $2k + 1$ parameters, where $k$ is the number of factors included in the model.

Star designs are located by a center point from which other factor combinations are generated by moving a positive and a negative distance in each factor dimension, one factor dimension at a time. Star designs thus generate $2k + 1$ factor combinations, and are sufficient for estimating the $2k + 1$ parameters of models such as Equation 12.44. Star designs for two- and three-dimensional factor spaces are shown in Figures 12.10 and 12.11, respectively.

As an example of the use of star designs, let us fit the model of Equation 12.44 to data obtained at locations specified by the star design in Figure 12.10.
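$$ D = \begin{bmatrix} 2 & 5 \\ 5 & 2 \\ 5 & 5 \\ 5 & 8 \\ 8 & 5 \end{bmatrix}, \qquad Y = \begin{bmatrix} 3.50 \\ 5.75 \\ 8.00 \\ 5.75 \\ 3.50 \end{bmatrix} $$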
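Let us code the factor levels using $c_{x_1} = c_{x_2} = 5$ and $d_{x_1} = d_{x_2} = 3$. Then,

$$ (X^{*\prime}X^*) = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ -1 & 0 & 0 & 0 & 1 \\ 0 & -1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & -1 & 0 & 1 & 0 \\ 1 & 0 & -1 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ 1 & 1 & 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} 5 & 0 & 0 & 2 & 2 \\ 0 & 2 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 & 0 \\ 2 & 0 & 0 & 2 & 0 \\ 2 & 0 & 0 & 0 & 2 \end{bmatrix} \tag{12.45} $$

$$ (X^{*\prime}X^*)^{-1} = \begin{bmatrix} 1 & 0 & 0 & -1 & -1 \\ 0 & 0.5 & 0 & 0 & 0 \\ 0 & 0 & 0.5 & 0 & 0 \\ -1 & 0 & 0 & 1.5 & 1 \\ -1 & 0 & 0 & 1 & 1.5 \end{bmatrix} \tag{12.46} $$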
Figure 12.12 Factor combinations for a central composite experimental design in two-dimensional factor space.
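$$ (X^{*\prime}Y) = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ -1 & 0 & 0 & 0 & 1 \\ 0 & -1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 3.50 \\ 5.75 \\ 8.00 \\ 5.75 \\ 3.50 \end{bmatrix} = \begin{bmatrix} 26.5 \\ 0.0 \\ 0.0 \\ 7.0 \\ 11.5 \end{bmatrix} \tag{12.47} $$

$$ \hat{B}^* = (X^{*\prime}X^*)^{-1}(X^{*\prime}Y) = \begin{bmatrix} 8.00 \\ 0.00 \\ 0.00 \\ -4.50 \\ -2.25 \end{bmatrix} \tag{12.48} $$

$$ (X^{*\prime}X) = \begin{bmatrix} 5 & 25 & 25 & 143 & 143 \\ 0 & 6 & 0 & 60 & 0 \\ 0 & 0 & 6 & 0 & 60 \\ 2 & 10 & 10 & 68 & 50 \\ 2 & 10 & 10 & 50 & 68 \end{bmatrix} \tag{12.49} $$

$$ A = (X^{*\prime}X^*)^{-1}(X^{*\prime}X) = \begin{bmatrix} 1 & 5 & 5 & 25 & 25 \\ 0 & 3 & 0 & 30 & 0 \\ 0 & 0 & 3 & 0 & 30 \\ 0 & 0 & 0 & 9 & 0 \\ 0 & 0 & 0 & 0 & 9 \end{bmatrix} \tag{12.50} $$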
Figure 12.13 Factor combinations for a central composite experimental design in three-dimensional factor space.
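$$ \hat{B} = A^{-1}\hat{B}^* = \begin{bmatrix} -10.75 \\ 5.00 \\ 2.50 \\ -0.50 \\ -0.25 \end{bmatrix} \tag{12.51} $$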
Because 2k + 1 parameters were estimated from data obtained at 2k + 1 factor combinations, there are no degrees of freedom for lack of fit (f - p = 5 - 5 = 0); again, there was no replication in this example, so the estimated model should give a “perfect” fit to the data. This is verified by estimating the responses using the fitted model parameters.
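$$ \hat{Y} = X\hat{B} = \begin{bmatrix} 1 & 2 & 5 & 4 & 25 \\ 1 & 5 & 2 & 25 & 4 \\ 1 & 5 & 5 & 25 & 25 \\ 1 & 5 & 8 & 25 & 64 \\ 1 & 8 & 5 & 64 & 25 \end{bmatrix} \begin{bmatrix} -10.75 \\ 5.00 \\ 2.50 \\ -0.50 \\ -0.25 \end{bmatrix} = \begin{bmatrix} 3.50 \\ 5.75 \\ 8.00 \\ 5.75 \\ 3.50 \end{bmatrix} \tag{12.52} $$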
This reproduces the original data exactly, as expected.

Other models can be fit to data from star designs. For example, the model

$$ y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_{22} x_{2i}^2 + r_{1i} \tag{12.53} $$

can be fit to the data used in this section. Some models cannot be fit to data obtained from star designs; in particular, many models possessing factor interaction terms produce $(X'X)$ matrices that cannot be inverted.
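The construction of a star design, and the reason interaction terms cause trouble, can both be sketched briefly. The helper `star_design` and its argument names are assumptions made for this example. The center (5, 5) and step of 3 match the coding used in this section.

```python
def star_design(center, step):
    """Center point plus one step in the negative and positive direction of each factor.

    Returns 2k + 1 factor combinations for k factors.
    """
    k = len(center)
    points = [tuple(center)]
    for j in range(k):
        for sign in (-1, 1):
            point = list(center)
            point[j] += sign * step
            points.append(tuple(point))
    return points


two_factor_star = star_design(center=(5, 5), step=3)
three_factor_star = star_design(center=(5, 5, 5), step=3)

# Coded interaction column x1* x2* for the two-factor star design.
interaction_column = [((x1 - 5) / 3) * ((x2 - 5) / 3) for x1, x2 in two_factor_star]

print(two_factor_star)
print(len(three_factor_star), "factor combinations for k = 3")
print(interaction_column)
```

Every point of a star design lies on an axis through the center, so in coded units at least one factor is zero at each point and the interaction column is identically zero. A column of zeros carries no information about $\beta_{12}$, which is one way to see why $(X'X)$ becomes singular when an interaction term is added.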
12.6 Central composite designs

One of the most useful models for approximating a region of a multifactor response surface is the full second-order polynomial model. For two factors, the model is of the form

$$ y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_{11} x_{1i}^2 + \beta_{22} x_{2i}^2 + \beta_{12} x_{1i} x_{2i} + r_{1i} \tag{12.54} $$

In general, if $k$ is the number of factors being investigated, the full second-order polynomial model contains $\tfrac{1}{2}(k + 1)(k + 2)$ parameters. A rationalization for the widespread use of full second-order polynomial models is that they represent a truncated Taylor series expansion of any continuous function, and such models would therefore be expected to provide a reasonably good approximation of the true response surface over a local region of experiment space.

Figure 12.14 Graph of the fitted full second-order polynomial model $y_{1i} = 3.264 + 1.537x_{1i} + 0.5664x_{2i} - 0.1505x_{1i}^2 - 0.02734x_{2i}^2 - 0.05875x_{1i}x_{2i}$.
Figure 12.15 Contours of constant response as functions of $x_1$ and $x_2$ for the response surface of Figure 12.14.
TABLE 12.2 Efficiency of full second-order polynomial models fit to data from central composite designs without replication.

Experimental     Parameters,               Factor combinations,     Efficiency,
factors, k       p = ½(k + 1)(k + 2)       f = 2^k + 2k + 1         E = p/f
    1                  3                         5                   0.60
    2                  6                         9                   0.67
    3                 10                        15                   0.67
    4                 15                        25                   0.60
    5                 21                        43                   0.49
    6                 28                        77                   0.36
    7                 36                       143                   0.25
    8                 45                       273                   0.16
    9                 55                       531                   0.10
   10                 66                      1045                   0.06
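The entries of Table 12.2 follow directly from the two counting formulas in its column headings. A minimal sketch, with function names chosen for this example:

```python
def n_parameters(k: int) -> int:
    """Parameters in a full second-order polynomial model: (k + 1)(k + 2) / 2."""
    return (k + 1) * (k + 2) // 2


def n_combinations(k: int) -> int:
    """Factor combinations in an unreplicated central composite design: 2^k + 2k + 1."""
    return 2 ** k + 2 * k + 1


def efficiency(k: int) -> float:
    """Efficiency value E = p / f."""
    return n_parameters(k) / n_combinations(k)


print(" k    p     f     E")
for k in range(1, 11):
    print(f"{k:2d}  {n_parameters(k):3d}  {n_combinations(k):4d}  {efficiency(k):.2f}")
```

The factorial part grows as $2^k$ while the number of parameters grows only quadratically, which is why the efficiency falls off quickly once $k$ exceeds about five.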
If we choose to use a full second-order polynomial model, then we must use an appropriate experimental design for estimating the $\tfrac{1}{2}(k + 1)(k + 2)$ parameters of the model. Two-level factorial designs are an attractive possibility because they allow estimation of the $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_{12}$ parameters; however, they do not allow estimation of the second-order parameters $\beta_{11}$ and $\beta_{22}$, and for situations involving fewer than four factors, there are too few factor combinations ($2^k$) to estimate all $\tfrac{1}{2}(k + 1)(k + 2)$ parameters of the model.

Star designs are also an attractive possibility: they allow the estimation of the $\beta_{11}$ and $\beta_{22}$ parameters along with the $\beta_0$, $\beta_1$, and $\beta_2$ parameters; unfortunately, star designs do not allow the estimation of the $\beta_{12}$ interaction parameter, and for all situations involving more than one factor, there are too few factor combinations ($2k + 1$) to estimate all parameters of the full second-order polynomial model.

The juxtaposition of a two-level factorial design with a star design gives a composite design that can be used to estimate all parameters in a full second-order polynomial model. If the centers of the two separate experimental designs coincide, the resulting design is said to be a central composite design. If the centers do not coincide, the result is a non-central composite design. Figures 12.12 and 12.13 illustrate central composite designs for the two- and three-dimensional factor spaces.

Central composite designs are relatively efficient for small numbers of factors. “Efficiency” in this case means obtaining the required parameter estimates with little wasted effort. One measure of efficiency is the efficiency value, $E$

$$ E = p/f \tag{12.55} $$

where $p$ is the number of parameters in the model to be fit, and $f$ is the number of factor combinations in the experimental design. The efficiency value ranges from unity to zero; if it is greater than unity, the model cannot be fit. In general, it is not desirable that the efficiency value be unity - that is, the design should not be perfectly efficient. Some inefficiency should usually be included to estimate lack of fit.

Table 12.2 shows efficiency as a function of the number of experimental factors for fitting a full second-order polynomial model using a central composite experimental design. For up to about five factors the efficiency is very good, but it decreases rapidly for six or more factors.

Replication is often included in central composite designs. If the response surface is thought to be reasonably homoscedastic, only one of the factor combinations (commonly the center point) need be replicated, usually three or four times to provide sufficient degrees of freedom for $s_{pe}^2$. If the response surface is thought to be heteroscedastic, the replicates can be spread over the response surface to obtain an “average” purely experimental uncertainty.

As an illustration of the use of central composite experimental designs, let us fit the model of Equation 12.54 to the following responses $Y$ obtained at the coded factor combinations $D^*$. We will assume that $c_{x_1} = c_{x_2} = 5$, and $d_{x_1} = d_{x_2} = 2$.
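$$ Y = \begin{bmatrix} 7.43 \\ 7.93 \\ 6.84 \\ 6.40 \\ 6.50 \\ 4.42 \\ 7.45 \\ 7.41 \\ 7.95 \\ 7.81 \\ 7.86 \\ 7.85 \end{bmatrix} \tag{12.56} $$

$$ D^* = \begin{bmatrix} -1 & -1 \\ -1 & 1 \\ 1 & -1 \\ 1 & 1 \\ -2 & 0 \\ 2 & 0 \\ 0 & -2 \\ 0 & 2 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} \tag{12.57} $$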
Inspection of the coded experimental design matrix shows that the first four experiments belong to the two-level two-factor factorial part of the design, the next four experiments are the extreme points of the star design, and the last four experiments are replicates of the center point. The corresponding $X^*$ matrix for the six-parameter model of Equation 12.54 is

$$ X^* = \begin{bmatrix} 1 & -1 & -1 & 1 & 1 & 1 \\ 1 & -1 & 1 & 1 & 1 & -1 \\ 1 & 1 & -1 & 1 & 1 & -1 \\ 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & -2 & 0 & 4 & 0 & 0 \\ 1 & 2 & 0 & 4 & 0 & 0 \\ 1 & 0 & -2 & 0 & 4 & 0 \\ 1 & 0 & 2 & 0 & 4 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \tag{12.58} $$

Figure 12.16 Sums of squares and degrees of freedom tree for Figure 12.14.
Figure 12.17 Upper left panel: contours of constant response in two-dimensional factor space. Upper right panel: a subset of the contours of constant response. Lower left panel: canonical axes translated to stationary point of response surface. Lower right panel: canonical axes rotated to coincide with principal axes of response surface.
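$$ (X^{*\prime}X^*) = \begin{bmatrix} 12 & 0 & 0 & 12 & 12 & 0 \\ 0 & 12 & 0 & 0 & 0 & 0 \\ 0 & 0 & 12 & 0 & 0 & 0 \\ 12 & 0 & 0 & 36 & 4 & 0 \\ 12 & 0 & 0 & 4 & 36 & 0 \\ 0 & 0 & 0 & 0 & 0 & 4 \end{bmatrix} \tag{12.59} $$

$$ (X^{*\prime}X^*)^{-1} = \begin{bmatrix} 5/24 & 0 & 0 & -1/16 & -1/16 & 0 \\ 0 & 1/12 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1/12 & 0 & 0 & 0 \\ -1/16 & 0 & 0 & 3/64 & 1/64 & 0 \\ -1/16 & 0 & 0 & 1/64 & 3/64 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1/4 \end{bmatrix} \tag{12.60} $$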
Note that with this coded experimental design, the estimates of $b_1^*$, $b_2^*$, and $b_{12}^*$ will be independent of the other estimated parameters; the estimates of $b_0^*$, $b_{11}^*$, and $b_{22}^*$ will be interdependent.
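$$ (X^{*\prime}Y) = \begin{bmatrix} 85.85 \\ -6.280 \\ -0.02000 \\ 72.28 \\ 88.04 \\ -0.9400 \end{bmatrix} \tag{12.61} $$

$$ \hat{B}^* = (X^{*\prime}X^*)^{-1}(X^{*\prime}Y) = \begin{bmatrix} 7.865 \\ -0.5233 \\ -0.001667 \\ -0.6019 \\ -0.1094 \\ -0.2350 \end{bmatrix} \tag{12.62} $$

Thus, the fitted model in coded factor space is

$$ \hat{y}_{1i} = 7.865 - 0.5233x_{1i}^* - 0.001667x_{2i}^* - 0.6019x_{1i}^{*2} - 0.1094x_{2i}^{*2} - 0.2350x_{1i}^*x_{2i}^* \tag{12.63} $$

In uncoded factor space the parameter estimates are

$$ \hat{B} = \begin{bmatrix} 3.264 \\ 1.537 \\ 0.5664 \\ -0.1505 \\ -0.02734 \\ -0.05875 \end{bmatrix} \tag{12.64} $$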
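The whole calculation can be reproduced from the design and the responses. The sketch below builds the coded central composite design, fits the full second-order model by solving the normal equations with Gauss-Jordan elimination, and repeats the fit in uncoded factor space. The helper names and the choice of elimination with partial pivoting are assumptions of this example, not the authors' procedure.

```python
from itertools import product


def central_composite(k, alpha, n_center):
    """Coded design: 2^k factorial points, 2k star points at +/-alpha, n_center centers."""
    factorial = [tuple(float(v) for v in p) for p in product((-1, 1), repeat=k)]
    star = []
    for j in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[j] = sign * alpha
            star.append(tuple(point))
    return factorial + star + [(0.0,) * k] * n_center


def model_row(x1, x2):
    """One row of X, in the term order of Equation 12.54."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]


def solve(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def fit(points, y):
    """Least squares estimates from the normal equations (X'X) B = (X'Y)."""
    rows = [model_row(x1, x2) for x1, x2 in points]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


Y = [7.43, 7.93, 6.84, 6.40, 6.50, 4.42, 7.45, 7.41, 7.95, 7.81, 7.86, 7.85]

coded = central_composite(k=2, alpha=2.0, n_center=4)
uncoded = [(2.0 * x1 + 5.0, 2.0 * x2 + 5.0) for x1, x2 in coded]  # x = d*x* + c

b_coded = fit(coded, Y)
b_uncoded = fit(uncoded, Y)

print("coded:  ", [f"{b:.4g}" for b in b_coded])
print("uncoded:", [f"{b:.4g}" for b in b_uncoded])
```

Fitting directly in uncoded space avoids the back-transformation step, at the cost of a less well-conditioned $(X'X)$ matrix. For a problem of this size the difference is negligible in double precision.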
The fitted model of Equation 12.64 is drawn in Figure 12.14. The negative estimates of $b_{11}$ and $b_{22}$ cause the estimated response surface to fold downward quadratically in both factors, $x_1$ and $x_2$, although less rapidly in factor $x_2$ than in factor $x_1$. Careful inspection of Figure 12.14 reveals that the “ridge” of the surface is tilted with respect to the factor axes - the ridge runs obliquely from the middle of the left front side toward the far back corner of Figure 12.14. This “rotation” of the response surface with respect to the factor axes is caused by the interaction term $b_{12}x_1x_2$. This ridge is seen more clearly in Figure 12.15 which shows “contours of constant response” as a function of the factors $x_1$ and $x_2$ for Equation 12.64. Each contour in Figure 12.15 represents the intersection of the response surface in Figure 12.14 with a plane parallel to the $x_1$-$x_2$ plane at a given response level. Response contours for $y_1$ = 8, 7, 6, 5, and 4 are shown in Figure 12.15.

The sum of squares and degrees of freedom tree for the fitted model is given in Figure 12.16. The $R^2$ value is 0.9989. The Fisher F-ratio for the significance of the factor effects is $F_{5,6} = 1096.70$ which is significant at the 100.0000% level of confidence. The F-ratio for the lack of fit is $F_{3,3} = 0.19$ which is not very significant. As expected, the residuals are small:

$$ R = \begin{bmatrix} -0.014 \\ 0.019 \\ -0.028 \\ 0.006 \\ -0.005 \\ 0.009 \\ 0.019 \\ -0.015 \\ 0.085 \\ -0.055 \\ -0.005 \\ -0.015 \end{bmatrix} \tag{12.65} $$
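The summary statistics quoted above can be approximated from the responses and the listed residuals. This is a sketch only. It uses the residuals as printed (rounded to three decimals), so the F-ratios come out slightly different from the quoted values, and the variable names are choices made for this example.

```python
Y = [7.43, 7.93, 6.84, 6.40, 6.50, 4.42, 7.45, 7.41, 7.95, 7.81, 7.86, 7.85]
residuals = [-0.014, 0.019, -0.028, 0.006, -0.005, 0.009,
             0.019, -0.015, 0.085, -0.055, -0.005, -0.015]

n = len(Y)  # number of experiments
p = 6       # parameters in the full second-order model
f = 9       # distinct factor combinations (4 factorial + 4 star + 1 center)

# Sum of squares corrected for the mean.
mean_y = sum(Y) / n
ss_corrected = sum((y - mean_y) ** 2 for y in Y)

# Residual sum of squares and the part explained by the factors.
ss_residual = sum(r * r for r in residuals)
ss_factors = ss_corrected - ss_residual

# Purely experimental uncertainty from the four center-point replicates.
center = Y[8:]
mean_center = sum(center) / len(center)
ss_pure_error = sum((y - mean_center) ** 2 for y in center)
ss_lack_of_fit = ss_residual - ss_pure_error

r_squared = ss_factors / ss_corrected
f_factors = (ss_factors / (p - 1)) / (ss_residual / (n - p))
f_lack_of_fit = (ss_lack_of_fit / (f - p)) / (ss_pure_error / (n - f))

print(f"R^2               = {r_squared:.4f}")
print(f"F({p - 1},{n - p}) factors    = {f_factors:.1f}")
print(f"F({f - p},{n - f}) lack of fit = {f_lack_of_fit:.2f}")
```

The degrees of freedom follow the partition used throughout the chapter: $p - 1$ for the factor effects, $n - p$ for residuals, $f - p$ for lack of fit, and $n - f$ for purely experimental uncertainty.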
12.7 Canonical analysis

Full second-order polynomial models used with central composite experimental designs are very powerful tools for approximating the true behavior of many systems. However, the interpretation of the large number of estimated parameters in multifactor systems is not always straightforward. As an example, the parameter estimates of the coded and uncoded models in the previous section are quite different, even though the two models describe essentially the same response surface (see Equations 12.63 and 12.64). It is difficult to see this similarity by simple inspection of the two equations. Fortunately, canonical analysis is a mathematical technique that can be applied to full second-order polynomial models to reveal the essential features of the response surface and allow a simpler understanding of the factor effects and their interactions.

Canonical analysis achieves this geometric interpretation of the response surface by transforming the estimated polynomial model into a simpler form. The origin of the factor space is first translated to the stationary point of the estimated response surface, the point at which the partial derivatives of the response with respect to all of the factors are simultaneously equal to zero (see Section 11.5). The new factor axes are then rotated to coincide with the principal axes of the second-order response surface. The process of canonical analysis is illustrated in Figure 12.17 for a two-factor response surface with elliptical contours of constant response. Translation has the effect of removing the first-order terms from the polynomial model; rotation has the effect of removing the interaction terms. It is the signs and magnitudes of the remaining second-order terms that reveal the essential features of the response surface.

To find the coordinates of the stationary point, we first differentiate the full second-order polynomial model with respect to each of the factors and set each derivative equal to zero.
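For two-factor models we obtain

$$ \frac{\partial \hat{y}_1}{\partial x_1} = b_1 + 2b_{11}x_1 + b_{12}x_2 = 0, \qquad \frac{\partial \hat{y}_1}{\partial x_2} = b_2 + 2b_{22}x_2 + b_{12}x_1 = 0 \tag{12.66} $$

The coordinates of the stationary point ($s_{x_1}$ and $s_{x_2}$) are those values of $x_1$ and $x_2$ that simultaneously satisfy both of these partial derivatives. Equation 12.66 may be rewritten as

$$ 2b_{11}s_{x_1} + b_{12}s_{x_2} = -b_1, \qquad b_{12}s_{x_1} + 2b_{22}s_{x_2} = -b_2 \tag{12.67} $$

or in matrix notation

$$ \begin{bmatrix} 2b_{11} & b_{12} \\ b_{12} & 2b_{22} \end{bmatrix} \begin{bmatrix} s_{x_1} \\ s_{x_2} \end{bmatrix} = \begin{bmatrix} -b_1 \\ -b_2 \end{bmatrix} \tag{12.68} $$

Let us define a 1 × k matrix of stationary point coordinates, $s$;

$$ s = \begin{bmatrix} s_{x_1} & s_{x_2} \end{bmatrix} \tag{12.69} $$

Let us also define a k × 1 matrix of first-order parameter estimates,

$$ f = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} \tag{12.70} $$

Finally, we define a k × k matrix of second-order parameter estimates, $S$;

$$ S = \begin{bmatrix} b_{11} & b_{12}/2 \\ b_{12}/2 & b_{22} \end{bmatrix} \tag{12.71} $$

The single-factor second-order parameter estimates lie along the diagonal of the $S$ matrix, and the two-factor interaction parameter estimates are divided in half on either side of the diagonal. By Equations 12.69-12.71, Equation 12.68 may be rewritten

$$ 2Ss' = -f \tag{12.72} $$

or

$$ Ss' = -0.5f \tag{12.73} $$

The solution to this set of simultaneous equations gives the transpose of the stationary point coordinate matrix.

$$ s' = -0.5S^{-1}f \tag{12.74} $$

For the parameter estimates of Equation 12.64,

$$ s' = -0.5 \begin{bmatrix} -0.1505 & -0.05875/2 \\ -0.05875/2 & -0.02734 \end{bmatrix}^{-1} \begin{bmatrix} 1.537 \\ 0.5664 \end{bmatrix} = \begin{bmatrix} 3.903 \\ 6.165 \end{bmatrix} \tag{12.75} $$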
That is, the coordinates of the stationary point are $x_1 = 3.903$, $x_2 = 6.165$. The corresponding response at the stationary point is 8.009. These results may be verified qualitatively by inspection of Figures 12.14 and 12.15.

Translation of the origin of factor space to the location specified by $s$ (i.e., coding in which $c_{x_1} = s_{x_1}$, $c_{x_2} = s_{x_2}$, and $d_{x_1} = d_{x_2} = 1$) has the effect of making $b_1^t = 0$ and $b_2^t = 0$, where the superscript $t$ indicates coding involving translation only. The parameters $b_{11}^t$, $b_{22}^t$, and $b_{12}^t$ have the same values as their uncoded estimates. If the $b_0^t$ term is subtracted from both sides of the equation, a simpler model for the response surface is obtained, a form containing second-order terms only.
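For two factors, Equation 12.74 can be evaluated in closed form because the inverse of a 2 × 2 matrix is easy to write down. The following sketch uses the uncoded parameter estimates of Equation 12.64, and the function names are assumptions made for this example.

```python
# Uncoded parameter estimates (Equation 12.64).
b0, b1, b2 = 3.264, 1.537, 0.5664
b11, b22, b12 = -0.1505, -0.02734, -0.05875


def stationary_point(b1, b2, b11, b22, b12):
    """Solve s' = -0.5 * S^-1 * f for a two-factor second-order model."""
    s11, s12, s22 = b11, b12 / 2.0, b22
    det = s11 * s22 - s12 * s12
    if det == 0.0:
        raise ValueError("S is singular; the surface has no unique stationary point")
    # S^-1 * f, using the explicit inverse of a symmetric 2x2 matrix.
    u1 = (s22 * b1 - s12 * b2) / det
    u2 = (-s12 * b1 + s11 * b2) / det
    return -0.5 * u1, -0.5 * u2


def predicted(x1, x2):
    """Estimated response from the fitted full second-order model."""
    return b0 + b1 * x1 + b2 * x2 + b11 * x1 ** 2 + b22 * x2 ** 2 + b12 * x1 * x2


sx1, sx2 = stationary_point(b1, b2, b11, b22, b12)
print(f"stationary point: x1 = {sx1:.3f}, x2 = {sx2:.3f}")
print(f"response at stationary point: {predicted(sx1, sx2):.3f}")
```

At the stationary point both partial derivatives of Equation 12.66 vanish, which gives a quick independent check on the computed coordinates.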
Rotation of the translated factor axes is an eigenvalue-eigenvector problem, the complete discussion of which is beyond the scope of this presentation. It may be shown that there exists a set of rotated factor axes such that the off-diagonal terms of the resulting $\tilde{S}$ matrix are equal to zero (the tilde indicates rotation); that is, in the translated and rotated coordinate system, there are no interaction terms. The relationship between the rotated coordinate system and the translated coordinate system centered at the stationary point is given by
Figure 12.18 Response surface representing a parabolic bowl opening upward.
Figure 12.19 Response surface representing a parabolic bowl opening downward.
(12.77)
where $\tilde{x}_{1i}$ and $\tilde{x}_{2i}$ are the coordinates in the translated and rotated factor space, and the $e$’s are the elements of the rotation matrix $E$ that results from the solution of the eigenvector problem. The corresponding rotated parameter estimates become $\tilde{b}_{11}$ and $\tilde{b}_{22}$ ($\tilde{b}_{12} = 0$). Thus, all non-degenerate two-factor full second-order polynomial models can be reduced to the form
The signs and magnitudes of gI,xI: and reveal the essential features of the response surface. Table 12.3 gives some of the possibilities; these possibilities are illustrated in Figures 12.18-12.22. For the parameter estimates of Equation 12.64, rotation of the translated factor axes gives
(12.79)
Thus, the canonical factor axis $\tilde{x}_1$ is primarily $x_1$ in character ($0.9753x_1$) but with a slight amount of $x_2$ character ($0.2207x_2$). Similarly, $\tilde{x}_2$ is primarily $x_2$ ($0.9753x_2$) with a small amount of $x_1$ ($-0.2207x_1$).
[Equation 12.80: only the entries 0, -0.02069, and 1 survive]    (12.80)
Not unexpectedly, in this example the rotated parameter estimates $\tilde{b}_{11}$ and $\tilde{b}_{22}$ are not very different from the corresponding unrotated values. Because $\tilde{b}_{11}$ and $\tilde{b}_{22}$ are both negative and because they differ by an order of magnitude, we expect the response surface to be a flattened parabolic bowl opening downward (see Table 12.3). Figure 12.23 graphs the response surface of Equation 12.64 in canonical form.
Figure 12.20 Response surface representing a flattened parabolic bowl opening downward.
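The translation and rotation steps described above can be sketched numerically. The following is a minimal illustration, not the authors' procedure: the coefficients are hypothetical (they are not the estimates of Equation 12.64), and the names `canonical_analysis`, `b`, `S`, and `E` are assumptions introduced here. It assumes NumPy is available.

```python
import numpy as np

def canonical_analysis(b0, b1, b2, b11, b22, b12):
    """Return stationary point, response there, canonical coefficients, rotation matrix."""
    b = np.array([b1, b2], dtype=float)            # first-order estimates
    S = np.array([[b11, b12 / 2.0],
                  [b12 / 2.0, b22]], dtype=float)  # symmetric second-order matrix
    # Setting the gradient b + 2 S x to zero locates the stationary point.
    stationary = -0.5 * np.linalg.solve(S, b)
    y_stationary = b0 + 0.5 * stationary @ b
    # Eigenvalues of S are the canonical (rotated) second-order coefficients;
    # the columns of E give the directions of the canonical axes.
    canonical, E = np.linalg.eigh(S)
    return stationary, y_stationary, canonical, E

# Hypothetical estimates, chosen only so the arithmetic is easy to follow.
stationary, y_stationary, canonical, E = canonical_analysis(
    b0=5.0, b1=2.0, b2=2.0, b11=-2.0, b22=-2.0, b12=2.0)

# Rotating S with E removes the interaction (off-diagonal) term.
rotated_S = E.T @ np.array([[-2.0, 1.0], [1.0, -2.0]]) @ E
print(stationary, y_stationary, canonical)
```

For these hypothetical values both canonical coefficients are negative, which Table 12.3 associates with a bowl opening downward.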
TABLE 12.3
Interpretation of two-factor canonical parameter estimates.

Sign of $\tilde{b}_{11}$   Relationship   Sign of $\tilde{b}_{22}$   Interpretation                               Illustrated in figure
+                          =              +                          Parabolic bowl opening upward                12.18
-                          =              -                          Parabolic bowl opening downward              12.19
-                          <              -                          Flattened parabolic bowl opening downward    12.20
-                          <<             -                          Ridge                                        12.21
+                          =              -                          Saddle region or col                         12.22
Figure 12.21 Response surface representing a ridge.
12.8 Confidence intervals
Confidence intervals for single-factor response surfaces were discussed in Section 11.6. The equations developed for estimating different types of confidence intervals (Equations 11.76, 11.79, 11.80, and 11.81) are entirely general and can be used for multi-factor response surfaces as well. For example, plotting the standard uncertainty for the estimation of a single value of response (the square root of Equation 11.71) as a function of both $x_1$ and $x_2$ for the model expressed by Equation 12.54 and the data of Section 12.6 gives the uncertainty surface shown in Figure 12.24 (compare with Figure 11.7). The uncertainty is smallest in the region of greatest experimentation, especially at the center point. The uncertainty is greatest in those regions farthest away from the experimental points, as expected. The data in Figure 12.24 can be used with Equation 11.76 to generate confidence bands for the fitted model of Equation 12.64. The upper and lower 95% confidence bands are shown as transparent surfaces in Figure 12.25. Other confidence bands can be generated by Equations 11.79-11.81.
Figure 12.22 Response surface representing a saddle region.
12.9 Rotatable designs
The uncertainty contour in Figure 12.24 is “lumpy” because the uncertainty depends not only on the distance from the center of the design, but also on the distance from points at which the other experiments have been carried out. For some applications, it might be desirable that the uncertainty predicted by a full second-order polynomial model be dependent only on the distance from the center of the design and be independent of the location of the experimental points. Such a condition can be achieved by a class of central composite experimental designs known as rotatable designs. In effect, the star points are located the same distance from the center of the design as the factorial points. For a two-factor central composite design, the star points are located $\sqrt{2}$ coded units from the center. Thus, the design of Equation 12.57 could be made rotatable if it had specific elements of
$$D^* = \begin{bmatrix} -1 & -1 \\ -1 & 1 \\ 1 & -1 \\ 1 & 1 \\ -\sqrt{2} & 0 \\ \sqrt{2} & 0 \\ 0 & -\sqrt{2} \\ 0 & \sqrt{2} \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$$    (12.81)
Figure 12.23 Contours of constant response as functions of the canonical axes $\tilde{x}_1$ and $\tilde{x}_2$ for the response surface of Figure 12.14.
Figure 12.24 Standard uncertainty for estimating one new response as a function of the factors $x_1$ and $x_2$. See text for discussion.
Assuming the same $s_r^2$ as was used to draw Figure 12.24, the uncertainty surface for the rotatable design of Equation 12.81 is shown in Figure 12.26. The uncertainty depends only on the distance from the center of the design; i.e., the contours of constant uncertainty are circular about the center of the design. The use of rotatable designs usually makes sense only in normalized factor spaces (each factor divided by $d_{xi}$) because it is difficult to define distance if the factors are measured in different units. For example, if $x_1$ is measured in °C and $x_2$ is measured in minutes, the distance of a point $(x_{1i}, x_{2i})$ from the center of the design $(c_{x1}, c_{x2})$ would be calculated as
$$\text{distance} = \sqrt{[(x_{1i} - c_{x1})\ {}^{\circ}\text{C}]^2 + [(x_{2i} - c_{x2})\ \text{min}]^2}$$    (12.82)
However, it is not possible to add °C² and min². In a normalized factor space the factors are unitless and there is no difficulty with calculating distances. Coded rotatable designs do produce contours of constant response in the uncoded factor space, but in the uncoded factor space the contours are usually elliptical, not circular.
Figure 12.25 Upper and lower 95% confidence bands for the response surface of Figure 12.14.
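The rotatability property can be checked numerically. The sketch below, which assumes NumPy, evaluates the quantity $X_0(X'X)^{-1}X_0'$ for the full second-order polynomial model at points around a circle centered on the design. The function names, the circle radius of 1.5, and the comparison design with a star distance of 2 are all illustrative assumptions, not taken from the text.

```python
import numpy as np

def fsop_row(x1, x2):
    """One row of the model matrix for the full second-order polynomial model."""
    return np.array([1.0, x1, x2, x1**2, x2**2, x1 * x2])

def variance_function(design, x1, x2):
    """Compute X0 (X'X)^-1 X0' at the point (x1, x2) for the given design."""
    X = np.array([fsop_row(a, b) for a, b in design])
    x0 = fsop_row(x1, x2)
    return float(x0 @ np.linalg.solve(X.T @ X, x0))

def central_composite(star):
    """Two-factor central composite design with four center points."""
    square = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    stars = [(-star, 0), (star, 0), (0, -star), (0, star)]
    return square + stars + [(0, 0)] * 4

def spread_on_circle(design, radius=1.5):
    """Range of the variance function around a circle centered on the design."""
    angles = np.linspace(0.0, 2.0 * np.pi, 13)
    values = [variance_function(design, radius * np.cos(t), radius * np.sin(t))
              for t in angles]
    return max(values) - min(values)

rotatable_spread = spread_on_circle(central_composite(np.sqrt(2.0)))  # Equation 12.81
other_spread = spread_on_circle(central_composite(2.0))  # hypothetical star distance
print(rotatable_spread, other_spread)
```

For the design of Equation 12.81 the spread is zero to within rounding, so the contours are circular in the coded space; for the hypothetical star distance of 2 the spread is clearly nonzero.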
12.10 Orthogonal designs
An interesting phenomenon occurs if we add four more center points to the design of Equation 12.81.
$$D^* = \begin{bmatrix} -1 & -1 \\ -1 & 1 \\ 1 & -1 \\ 1 & 1 \\ -\sqrt{2} & 0 \\ \sqrt{2} & 0 \\ 0 & -\sqrt{2} \\ 0 & \sqrt{2} \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$$    (12.83)
Figure 12.26 Standard uncertainty for estimating one new response as a function of the factors $x_1$ and $x_2$ for a rotatable design. Compare with Figure 12.24. See text for discussion.
The corresponding matrix least squares treatment for the full second-order polynomial model proceeds as follows.
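$$X^* = \begin{bmatrix} 1 & -1 & -1 & 1 & 1 & 1 \\ 1 & -1 & 1 & 1 & 1 & -1 \\ 1 & 1 & -1 & 1 & 1 & -1 \\ 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & -\sqrt{2} & 0 & 2 & 0 & 0 \\ 1 & \sqrt{2} & 0 & 2 & 0 & 0 \\ 1 & 0 & -\sqrt{2} & 0 & 2 & 0 \\ 1 & 0 & \sqrt{2} & 0 & 2 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$    (12.84)
$$(X^{*\prime}X^*) = \begin{bmatrix} 16 & 0 & 0 & 8 & 8 & 0 \\ 0 & 8 & 0 & 0 & 0 & 0 \\ 0 & 0 & 8 & 0 & 0 & 0 \\ 8 & 0 & 0 & 12 & 4 & 0 \\ 8 & 0 & 0 & 4 & 12 & 0 \\ 0 & 0 & 0 & 0 & 0 & 4 \end{bmatrix}$$    (12.85)
$$(X^{*\prime}X^*)^{-1} = \begin{bmatrix} 1/8 & 0 & 0 & -1/16 & -1/16 & 0 \\ 0 & 1/8 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1/8 & 0 & 0 & 0 \\ -1/16 & 0 & 0 & 1/8 & 0 & 0 \\ -1/16 & 0 & 0 & 0 & 1/8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1/4 \end{bmatrix}$$    (12.86)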
From this $(X^{*\prime}X^*)^{-1}$ matrix it is clear that the estimates of $b_1^*$, $b_2^*$, $b_{11}^*$, $b_{22}^*$, and $b_{12}^*$ do not depend on each other; that is, there will be no covariance among the parameter estimates associated with the coded factor effects. (The parameter $b_0^*$ is not a factor effect.) If the parameter estimates associated with any one factor in a multifactor design are uncorrelated with those of another, the experimental design is said to be orthogonal. In this example, orthogonality of all factor effects has been achieved by including additional center points in the coded rotatable design of Equation 12.81. Orthogonality of some experimental designs may be achieved simply by appropriate coding (compare Equation 12.26 with Equation 12.20, for example). Because orthogonality is almost always achieved only in coded factor spaces, transformation of the coded parameter estimates ($B^*$) back to the uncoded factor space ($B$) usually destroys the condition of orthogonality. Orthogonal designs have been historically important because they often permit simple mathematical formulas to be used to calculate the factor effects (e.g., the Yates algorithm for treating factorial designs), thus avoiding the tedious manual calculations that were previously required for the matrix least squares fitting of linear models to data. A practical disadvantage of orthogonal designs used solely for the purpose of “easy” calculation is that a missed or incorrectly executed experiment prevents the use of simple calculational formulas. However, such an incomplete data set can usually be treated by matrix least squares; this is a relatively trivial matter using modern computers.
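The effect of the extra center points on the covariances can be reproduced with a few lines of matrix arithmetic. The sketch below assumes NumPy; the helper names `inverse_xtx` and `max_factor_covariance` are illustrative and do not come from the text.

```python
import numpy as np

def inverse_xtx(n_center):
    """(X'X)^-1 for the coded rotatable design with n_center center points."""
    r2 = np.sqrt(2.0)
    design = [(-1, -1), (-1, 1), (1, -1), (1, 1),
              (-r2, 0), (r2, 0), (0, -r2), (0, r2)] + [(0, 0)] * n_center
    # Columns: 1, x1, x2, x1^2, x2^2, x1*x2
    X = np.array([[1.0, x1, x2, x1**2, x2**2, x1 * x2] for x1, x2 in design])
    return np.linalg.inv(X.T @ X)

def max_factor_covariance(inv):
    """Largest off-diagonal magnitude among the factor-effect entries (b0 excluded)."""
    block = inv[1:, 1:]
    return float(np.max(np.abs(block - np.diag(np.diag(block)))))

eight_centers = inverse_xtx(8)   # design of Equation 12.83
four_centers = inverse_xtx(4)    # design of Equation 12.81
print(max_factor_covariance(eight_centers), max_factor_covariance(four_centers))
```

With eight center points every off-diagonal entry among the factor effects is zero. With only four, the entry linking the two squared terms is 1/32, so those two estimates covary.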
12.11 Scaling
Visual presentation of data treatment results can sometimes be misleading if scaling factors are not taken into account. For example, the parabolic bowl shown in Figure 12.19 could be made to look like the ridge in Figure 12.21 if the scaling for the $x_1$-axis were extended (e.g., if the domain $4 \le x_1 \le 6$, $0 \le x_2 \le 10$ were plotted full scale). Similarly, the ridge of Figure 12.21 could be made to look like the parabolic bowl of Figure 12.19 if the $x_1$-axis were compressed. Scaling can also affect apparent relationships in canonical analysis. Figure 12.27 shows different scalings of the lower right panel of Figure 12.17. The canonical axes are at right angles to each other at all times in the coded factor space; it is the scaling that distorts them in the visual presentations. The effects of visual scaling and numerical coding are very similar. However, the “distortions” caused by numerical coding are not always as readily apparent as some distortions caused by visual scaling. For example, rotatable designs in coded factor spaces might not produce rotatable designs in uncoded factor spaces (see Section 12.9). We simply warn the reader that concepts such as “rotatable”, “circular”, “orthogonal”, “perpendicular”, and “elliptical” are dependent on both numerical coding and visual scaling; caution should be exercised in the use and understanding of these concepts.
Figure 12.27 Upper left panel: replot of the lower right panel in Figure 12.17. Upper right panel: $x_1$ axis compressed. Lower left panel: $x_2$ axis compressed. Lower right panel: $x_1$ axis highly compressed.
12.12 Mixture designs
Mixtures (or formulations) consist of two or more components combined in definite proportions such that the sum of their fractions is unity [Claringbold (1955)]. The mathematical expression of this concept is
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n = 1$$    (12.87)
where $x_i$ is the fraction of the ith component in the mixture of n components and is constrained to lie in the range $0 \le x_i \le 1$. The fraction might be expressed as the mole fraction, the mass fraction, the weight fraction, or the volume fraction. Alternatively, each $x_i$ in Equation 12.87 might be expressed as a percent so that the sum of all of the $x_i$ components equals 100%. Many practical applications exist [see, for example, Cornell (1973), Glajch, Kirkland, and Snyder (1982), Glajch, Kirkland, Squire, and Minor (1980), Glajch and Snyder (1981), and Coenegracht, Dijkman, Duineveld, Metting, Elema, and Malingre (1991)].
Because the sum of all the fractions must equal unity (or 100%), only n - 1 of the components can be specified independently; the remaining component is a dependent or “slack” variable [Snee (1973)]. Such a system is said to have n - 1 degrees of freedom. It is impossible to increase the fraction of one component in the mixture without decreasing the fraction of at least one other component.
Placing one or more equality constraints on a system often leads to a confounded (or confused) interpretation of experimental data. Care must be exercised when assigning component effects to a particular response. In mixtures, for example, an observed change in response could be attributed to the increase in one component or to the concomitant decrease in one or more of the other components.
The upper left panel of Figure 12.28 shows the nine factor combinations (design points) of a 3² full factorial design in two independent mixture components, $x_1$ and $x_2$. The third component of the three-component mixture is not independent, and is obtained from the mixture constraint (Equation 12.87) by difference:
$$\%x_3 = 100\% - \%x_1 - \%x_2$$    (12.88)
Recognizing that the percentages of the components cannot be less than 0% or greater than 100%, the factor space can be constrained as shown by the solid lines connecting the outer design points in the upper right panel of Figure 12.28. However, not all of the nine design points are physically possible. For example, the upper right point for which $x_1$ = 100% and $x_2$ = 100% requires a total of 200%, clearly impossible for an intensive factor such as percent composition (see Section 1.2). In fact, as shown in the lower left panel of Figure 12.28, any compositions that lie to the upper right of the line connecting the points ($x_1$ = 0%, $x_2$ = 100%) and ($x_1$ = 100%, $x_2$ = 0%) are also impossible. This leaves the remaining six design points shown in the lower right panel of Figure 12.28.
Figure 12.28 Conversion of a 3² full factorial design to give a constrained mixture design. See text for details.
The upper left panel of Figure 12.29 shows the design points with the labelled factor axes drawn next to the resulting triangle of points. If the symbol X1 is used to
represent the vertex (corner) that corresponds to pure component $x_1$ (100% $x_1$), and if the symbol X2 is used to represent the vertex that corresponds to pure component $x_2$ (100% $x_2$), then the labelled factor axes can be omitted with the result shown in the upper right panel of Figure 12.29. It can be seen that the origin of the original factor space corresponds to 0% $x_1$ and 0% $x_2$, so that by Equation 12.88,
$$\%x_3 = 100\% - 0\% - 0\% = 100\%$$    (12.89)
Thus, the origin is labelled X3 as shown in the lower left panel of Figure 12.29.
Figure 12.29 Conversion of an orthogonal mixture space to an equilateral triangular mixture space. See text for details.
It is our impression that in the early days of mixture designs, a diagram like that in the lower left panel of Figure 12.29 was shown to a physical chemist. This physical chemist probably remarked that mixtures are not represented by right-angle triangles, but rather by equilateral triangles. Thus, the array of design points shown in the lower left panel was transformed into the array of design points shown in the lower right panel of Figure 12.29. This equilateral triangular (or tetrahedral) depiction dominates the statistical mixture design literature and, in our opinion, makes entry into the field unnecessarily difficult. The triangular representation does have the advantages of showing all components explicitly and giving them equal graphical weight, however.
Figure 12.30 shows an alternative derivation of the two-dimensional nature of three-component mixture designs. The upper panel in Figure 12.30 shows a shaded diagonal plane passing through the feasible three-dimensional volume of the three components $x_1$, $x_2$, and $x_3$. The shaded plane corresponds to all combinations of $x_1$, $x_2$, and $x_3$ that satisfy the equality constraint of Equation 12.87. Because each side of this constrained triangular plane forms a diagonal of the two-dimensional sides of the constrained cubic feasible volume, the sides of the triangle must be equal. If this shaded triangular plane is removed from the three-dimensional factor space and placed in the two-dimensional plane of the page, the diagram shown in the lower panel is the result.
Figure 12.31 shows contours of constant composition for the three individual components $x_1$ (upper left panel), $x_2$ (upper right panel), and $x_3$ (lower left panel). Although these contour lines show constant composition for the indicated factor, the relative composition of the other two components will change along any given line but their sum will remain constant. Note that the composition is 100% at the labelled apex of the triangle and decreases to 0% at the opposite base of the triangle. These composition contours are superimposed in the lower right panel. The design point (black dot) indicates a composition of 25% $x_1$, 50% $x_2$, and 25% $x_3$.
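The reduction from nine factorial points to six feasible mixtures, and the move to an equilateral triangle, can be sketched as follows. This is only an illustration under stated assumptions: the factor levels of 0%, 50%, and 100% and the placement of the vertices (X3 at the origin, X1 at (1, 0), X2 at the apex) are choices made here, and `triangle_xy` is a hypothetical helper name.

```python
import itertools
import math

# Assumed factor levels (percent) for the 3x3 factorial in x1 and x2.
levels = [0.0, 50.0, 100.0]

feasible = []
for x1, x2 in itertools.product(levels, repeat=2):
    x3 = 100.0 - x1 - x2          # Equation 12.88: third component by difference
    if x3 >= 0.0:                 # a negative percentage is physically impossible
        feasible.append((x1, x2, x3))

def triangle_xy(x1, x2, x3):
    """Map a composition to Cartesian coordinates in an equilateral triangle.

    Assumed layout: X3 at (0, 0), X1 at (1, 0), X2 at (0.5, sqrt(3)/2).
    """
    total = x1 + x2 + x3
    f1, f2 = x1 / total, x2 / total
    return (f1 + 0.5 * f2, math.sqrt(3.0) / 2.0 * f2)

print(len(feasible))                    # number of physically possible design points
print(triangle_xy(25.0, 50.0, 25.0))    # the design point discussed for Figure 12.31
```

Because the three vertices map to points that are all one unit apart, each component receives equal graphical weight, which is the advantage of the triangular depiction noted above.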
Mixture designs are used to supply data for fitting continuous response surface models, either first-order models, such as
$$y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + r_{1i}$$    (12.90)
or second-order models, such as
$$y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_{11} x_{1i}^2 + \beta_{22} x_{2i}^2 + \beta_{12} x_{1i} x_{2i} + r_{1i}$$    (12.91)
Because there are only three parameters in Equation 12.90, a design involving three design points, one each at pure $x_1$, pure $x_2$, and pure $x_3$, would be adequate but would provide no degrees of freedom for lack of fit [Anderson and McLean (1974)]. The design shown in the upper left panel of Figure 12.32 contains three edge points (binary mixtures) that would supply three degrees of freedom for lack of fit if the model of Equation 12.90 is to be used. However, this design could also be used to fit the model of Equation 12.91, but again there would be no degrees of freedom for lack of fit. A more serious deficiency of the experimental design shown at the upper left panel is that there are no true ternary mixtures represented by the design points. For these reasons, statisticians often recommend adding a center point (33⅓% $x_1$, 33⅓% $x_2$, 33⅓% $x_3$) and replicating it a total of four times to provide three degrees of freedom for purely experimental uncertainty and one degree of freedom for lack of fit (of the model in Equation 12.91).
Figure 12.30 Alternative visualization of the equilateral triangular mixture space.
The lower left panel in Figure 12.32 shows contours of constant response (viscosity of a cutting oil used in a machine shop, for example) from a model such as Equation 12.91. Perhaps there is a threshold of desired viscosity such that some compositions give results above this threshold and other compositions give results below this threshold. The shaded region in the lower right panel shows those compositions that give viscosities at or above the given threshold. Most research and development projects involve several responses (see Figure 1.4). Perhaps the shaded region in the upper left panel in Figure 12.33 shows compositions that have suitably low volatility. Superposition of the two response surfaces as shown in the upper right panel reveals an overlap region where both adequate viscosity and
adequate volatility can be found. Alternatives to the use of superimposed contour plots include the use of desirability functions [Harrington (1965), and Chapter 8 in Walters, Parker, Morgan, and Deming (1991)] and fuzzy logic [Zadeh (1965), Kaufmann and Gupta (1985), and Kandel (1986)].
Figure 12.31 Contours of constant composition for each of the three mixture components ($x_1$, $x_2$, and $x_3$), individually and superimposed. The single design point corresponds to 25% $x_1$, 50% $x_2$, and 25% $x_3$.
Many mixtures exhibit “edge effects” such that the behavior of the formulation shows drastic changes when one or more of the components is omitted from the mixture [Anderson and McLean (1974)]. Thus, if simple empirical models such as Equations 12.90 and 12.91 are to be used to model the system, it is often best to work in regions that have all components present. Such systems can be prepared with so-called “pseudo-components” [Cornell (1990)] as shown in the lower two panels of Figure 12.33. The pseudo-components correspond to the vertexes in these designs and are seen to be mixtures that are relatively rich in one of the components. In practice, the pseudo-components can be prepared first, and then the other mixtures in the design can be prepared from these pseudo-components. The first-order “Scheffé model”, sometimes called the “canonical model,” uses
three parameters to describe the response for each pure component:
$$y_{1i} = \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + r_{1i}$$    (12.92)
Figure 12.32 Mixture designs and fitted full second-order polynomial response surfaces. See text for details.
However, this model can be converted to the intercept model as follows:
$$x_3 = 1 - x_1 - x_2$$    (12.93)
$$y_{1i} = \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 (1 - x_{1i} - x_{2i}) + r_{1i}$$    (12.94)
$$y_{1i} = \beta_3 + (\beta_1 - \beta_3) x_{1i} + (\beta_2 - \beta_3) x_{2i} + r_{1i}$$    (12.95)
$$y_{1i} = \beta_0^* + \beta_1^* x_{1i} + \beta_2^* x_{2i} + r_{1i}$$    (12.96)
where $\beta_0^* = \beta_3$, $\beta_1^* = \beta_1 - \beta_3$, and $\beta_2^* = \beta_2 - \beta_3$. Without the stars that indicate coding, Equation 12.96 is identical to Equation 12.90.
Figure 12.33 Upper panels: overlapping contour plots. Lower panels: pseudo-component concepts. See text for details.
Figure 12.34 Pseudo-three-dimensional plot of response as a function of composition for a three-component mixture.
The Scheffé model appears to have no intercept term and three factor effects. Thus, the sums of squares and degrees of freedom tree for models that contain a $\beta_0$ term (Figure 9.2) would not seem to apply; instead, the sums of squares and degrees of freedom tree for models that lack a $\beta_0$ term (Figure 9.3) would seem appropriate. However, because of the mixture constraint, the three Scheffé parameters are not independent and, from a degrees of freedom point of view, the model should be treated as if it still had a $\beta_0$ term. See Cornell (1990) for details. Other representations are also possible [e.g., Cox (1971)]. The results of three-component mixture designs are often presented as response surfaces over the triangular mixture space as shown in Figure 12.34. The Scheffé model parameters are seen to be equivalent to the responses at the vertexes.
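The equivalence of the Scheffé form and the intercept form can be confirmed with a short calculation. In the sketch below the parameter values (10, 20, 40) are hypothetical, and the function names are illustrative assumptions rather than anything defined in the text.

```python
def scheffe_to_intercept(beta1, beta2, beta3):
    """Convert Scheffe parameters to the intercept form of Equation 12.96."""
    return beta3, beta1 - beta3, beta2 - beta3

def predict_scheffe(betas, x1, x2):
    """Predicted response from the Scheffe form (Equation 12.92, residual omitted)."""
    beta1, beta2, beta3 = betas
    x3 = 1.0 - x1 - x2            # mixture constraint, Equation 12.93
    return beta1 * x1 + beta2 * x2 + beta3 * x3

def predict_intercept(coded, x1, x2):
    """Predicted response from the intercept form (Equation 12.96, residual omitted)."""
    b0, b1, b2 = coded
    return b0 + b1 * x1 + b2 * x2

scheffe = (10.0, 20.0, 40.0)                 # hypothetical parameter values
coded = scheffe_to_intercept(*scheffe)
print(coded)
print(predict_scheffe(scheffe, 0.25, 0.5), predict_intercept(coded, 0.25, 0.5))
```

Evaluating the Scheffé form at a vertex, for example `predict_scheffe(scheffe, 1.0, 0.0)`, returns the first parameter, which illustrates the statement that the Scheffé parameters are the responses at the vertexes.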
Exercises
12.1 Multifactor systems. Draw a system diagram for a process with which you are familiar. Choose two inputs for investigation. Over what domain might these two inputs be varied and controlled? What linear mathematical model might be used to approximate the effects of these two factors on one of the responses from the system? Design a set of experiments to test the validity of this model.
12.2 Response surfaces. Sketch the response surface that might be expected for the system you chose in Problem 12.1.
12.3 Factor interaction. Assume the two inputs you chose in Problem 12.1 will exhibit factor interaction. If they would be expected to interact, what would be the mechanistic basis of that interaction? What might be the approximate mathematical form of that interaction? Would you use a mechanistic or empirical model to approximate the response surface? Why? Sketch the response surface predicted by this model.
12.4 Factor interaction. Eliminate the interaction terms from the model you chose in Problem 12.3 and sketch the response surface predicted by this simpler model. How dissimilar is this sketch from the sketch made in Problem 12.3? How important does the interaction term seem to be?
12.5 Homoscedasticity and heteroscedasticity. Is the “response surface” shown in Figure 12.6 homoscedastic or heteroscedastic?
12.6 Canonical analysis. What is the final canonical equation in Section 12.7?
12.7 Factorial designs. Sketch the location of the factor combinations in a 4² factorial design. In a 3³ factorial design. In a 2⁴ factorial design. What is f in each of these designs?
12.8 Matrix inversion. Verify that the (X’X) matrix obtained for the model of Equation 12.24 and the experimental design of Figure 12.9 cannot be inverted.
12.9 Coding. What is the relationship between Equations 12.40-12.43 and A of Equation 12.31? Could A be derived algebraically using $c_{xi}$ and $d_{xi}$ rather than using $X$ and $X^*$?
12.10 Corner designs. Corner designs are located by a starting point from which other factor combinations are generated by moving a positive distance in each factor dimension, one factor dimension at a time. How many factor combinations are there in corner designs? Sketch the location of the factor combinations in 2-, 3-, and 4-factor corner designs. How many parameters can be estimated by corner designs? What linear models are exactly fit by corner designs? Look up the definition of a “simplex”. Are corner designs a class of simplex designs?
12.11 Experimental designs. Can the following design be used to fit the model expressed by Equation 12.44? Can it be used to fit the model expressed by Equation 12.15? How is this design related to a star design? To a corner design?
12.12 Polynomial models. Write full second-order polynomial models for 1, 2, 3, 4, and 5 factors.
12.13 Composite designs. Sketch a 3-factor non-central composite design for which the center of the star coincides with one of the factorial points.
12.14 Canonical analysis. Perform a canonical analysis on the fitted equation $y_{1i} = 5.13 + 0.161x_{1i}$ [sign illegible] $0.373x_{2i} + 0.517x_{1i}^2 - 1.33x_{2i}^2 - 0.758x_{1i}x_{2i}$. What are the coordinates of the stationary point? What are the characteristics of the response surface in the region of the stationary point (see Table 12.3)?
12.15 Canonical analysis. Write a table similar to Table 12.3 for three-factor systems. What do the possible isoresponse contours look like in three-dimensional factor space? [See, for example, Box (1954).]
12.16 Efficiency of factorial designs. Derive an expression that describes the number of parameters p as a function of the number of factors k for the general model of Equation 12.15 (first-order with interaction and offset). Prepare a table of efficiency (E = p/n) of two-level factorial designs for this model (see Table 12.2).
12.17 Response surfaces. Sketch the response surface for Equation 12.10 over the factor space $-10 \le x_1 \le 0$, $0 \le x_2 \le 10$.
12.18 Matrix representations. Show that the full two-factor second-order polynomial model may be written y l i = Po + Do + D J D b where $D_0 = [x_{1i}\ x_{2i}]$. Show that this may be extended to full three-factor second-order polynomial models.
12.19 Experimental optimization. One important use of experimental designs is to achieve optimum operating conditions of industrial processes. For a discussion of this application, see Box and Wilson (1951). This paper is extraordinarily rich in response surface concepts. What is the “steepest ascent technique” discussed in this paper? What models are assumed, and what experimental designs are used?
12.20 Empirical and mechanistic models. A paper by Box and Youle (1955) explores the relationships between an empirical model and a fundamental mechanism. In Section 9 of their paper, they discuss some aspects of the process of scientific investigation. What do they perceive as the relationships among experiment, theory, and knowledge?
12.21 Wording. After Equation 15.12, the statement is made, “...it is unlikely that the sodium ion concentration has a statistically significant effect ....” Does this statement have content, or is it meaningless? What would be a better way of stating the results?
12.22 Central composite designs. It has been remarked that a three-level two-factor factorial design can be the same as a two-factor central composite design. Comment.
12.23 exercise A popular method of smoothing analytical measurement data is the least squares technique presented by Savitzky and Golay (1964) [see also Steinier, Termonia, and Deltour (1972)]. The technique is useful for data consisting of a single response as a function of a single factor with equally spaced factor levels. This type of data is often found in time series where the frequency of data acquisition is fixed, or in spectral data where emission, transmittance, or absorbance is measured at equally spaced wavelengths or energies. Show that if the $x_1$ values are coded -2, -1, 0, +1, and +2, calculation of $(X'X)^{-1}X'$ gives the five-point convolutes found in the Savitzky and Golay paper.
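One way to check this result numerically is sketched below, assuming NumPy. The exercise does not state which model to fit; the sketch assumes a quadratic model in the single coded factor (columns 1, x, x²), and the variable names are illustrative.

```python
import numpy as np

# Coded factor levels for a five-point window.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

# Assumed model: y = b0 + b1*x + b11*x^2, so X has columns 1, x, x^2.
X = np.column_stack([np.ones_like(x), x, x**2])

# Each row of (X'X)^-1 X' holds the weights that turn the five responses
# into one parameter estimate.
convolutes = np.linalg.inv(X.T @ X) @ X.T

smoothing = convolutes[0] * 35.0     # weights for b0, the smoothed center value
derivative = convolutes[1] * 10.0    # weights for b1, the slope at the center
print(smoothing)
print(derivative)
```

Under this assumption the first row, scaled by 35, is the familiar five-point smoothing set (-3, 12, 17, 12, -3), and the second row, scaled by 10, is (-2, -1, 0, 1, 2).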
CHAPTER 13
Confidence Intervals for Full Second-Order Polynomial Models
The art of experimental design is made richer by a knowledge of how the placement of experiments in factor space affects the quality of information in the fitted model. The basic concepts underlying this interaction between experimental design and information quality were introduced in Chapters 7 and 8. Several examples showed the effect of the location of one experiment (in an otherwise fixed design) on the variance and covariance of parameter estimates in simple single-factor models. In this chapter we investigate the interaction between experimental design and information quality in two-factor systems. However, instead of looking again at the uncertainty of parameter estimates, we will focus attention on uncertainty in the response surface itself. Although the examples are somewhat specific (i.e., limited to two factors and to full second-order polynomial models), the concepts are general and can be extended to other dimensional factor spaces and to other models.
13.1 The model and a design
In two factors, the full second-order polynomial (FSOP) model is
$$y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_{11} x_{1i}^2 + \beta_{22} x_{2i}^2 + \beta_{12} x_{1i} x_{2i} + r_{1i}$$    (13.1)
At least six distinctly different factor combinations (f = 6) are required to fit the six parameters of this model (p = 6). To provide three degrees of freedom for lack of fit, f must be increased to 9. To provide three degrees of freedom for purely experimental uncertainty, n must be increased to 12. One experimental design that can be used to provide data to fit the two-factor FSOP model is the central composite design with four replicates at the center point. This design was introduced in Chapter 12 and is shown in Figure 12.12. A sums of squares and degrees of freedom tree for the design with four center points is given in Figure 13.1.
Figure 13.1 Sums of squares and degrees of freedom tree for a two-factor full second-order polynomial model fitted to a central composite design with a total of four center point replicates.
13.2 Normalized uncertainty and normalized information
The confidence interval (C.I.) for one estimate of the true response ($\eta$) at a given factor combination is given by
or
The confidence interval for one estimate of a measured response ($y$) must also take into account the uncertainty of that one measurement, and the confidence interval then becomes
or
Half the width of the confidence interval is given by
$$\sqrt{F_{(1,n-p)}\, s_r^2\, \{1 + [X_0 (X'X)^{-1} X_0']\}}$$    (13.6)
The width of the confidence interval depends on both $F_{(1,n-p)}$ and $s_r^2$. But $F_{(1,n-p)}$ is a function of n, p, and the level of confidence the experimenter chooses to set for the particular confidence interval. And $s_r^2$ depends on both the lack of fit of the model to the data and the repeatability of experimentation. Because the values of these quantities depend on the experimenter, the model, and the system, $F_{(1,n-p)}$ and $s_r^2$ can be removed to give a normalized confidence interval half width that depends on the design only:
$$\sqrt{1 + [X_0 (X'X)^{-1} X_0']}$$    (13.7)
This normalized half width will be called the normalized uncertainty in the predicted response. It is bounded between 1 and ∞. Information theory states that uncertainty and information are related reciprocally.
Thus, we will define the normalized information as the reciprocal of the normalized uncertainty. The normalized information is bounded between 1 and 0:
$$\frac{1}{\sqrt{1 + [X_0 (X'X)^{-1} X_0']}}$$    (13.8)
Figure 13.2 Central composite design. Square points ±2, star points ±4, $DF_{lof} = 3$, $DF_{pe} = 3$.
The normalized uncertainty and normalized information are related to the variance function and information function, respectively, defined by Box and Draper (1987). One purpose of a good design is to minimize uncertainty and maximize information over the region of interest. We will use both normalized uncertainty and normalized information to discuss the effect of experimental design on the quality of information obtained from a two-factor FSOP model.
13.3 The central composite design
A central composite design is constructed from a two-level factorial design (a so-called “square”) and a multi-dimensional univariate design (a so-called “star”). A central composite design (or “star-square” design) is illustrated in Figure 13.2. This figure contains four smaller figures or panels. The lower left panel in Figure 13.2 shows the central composite design in the two factors $x_1$ and $x_2$. The factor domain extends from -5 to +5 in each factor dimension. The coordinate axes in this panel are rotated 45° to correspond to the orientation of the axes in the panel above. Each black dot represents a distinctly different factor combination, or design point. The pattern of dots shows a central composite design centered at ($x_1$ = 0, $x_2$ = 0). The factorial points are located ±2 units from the center. The star points are located ±4 units from the center. The three concentric circles indicate that the center point has been replicated a total of four times. The experimental design matrix is
$$D = \begin{bmatrix} -2 & -2 \\ -2 & 2 \\ 2 & -2 \\ 2 & 2 \\ -4 & 0 \\ 4 & 0 \\ 0 & -4 \\ 0 & 4 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$$    (13.9)
For a two-factor FSOP model fitted to this experimental design, n = 12, f = 9, and p = 6. The corresponding sums of squares and degrees of freedom tree is shown in Figure 13.1. The upper left panel shows a surface of normalized uncertainty (defined in Equation 13.7 above) as a function of factors $x_1$ and $x_2$. The normalized uncertainty is relatively small in the center (approximately 1.1) and relatively large at the corners (approximately 4.0). Note that this surface generally reflects the underlying design: the uncertainty surface is relatively low in those regions where experiments have been carried out and is relatively high in those regions where experiments have not been carried out. It is important to note that the normalized uncertainty surface shown in the upper left panel is not the response surface generated by the FSOP model itself. Instead, this upper left panel is a measure of how much the response surface might “flap around” in different regions of the factor space. Experiments serve to anchor the underlying model, to “pin it to the data,” and thereby reduce the amount of uncertainty in the model at those points. The large amount of uncertainty at the corners of this upper left panel is a reflection of the freedom the model has to move up and down in those regions where experiments have not been performed. The upper right panel shows a surface of normalized information (defined in Equation 13.8 above) as a function of factors $x_1$ and $x_2$. The normalized information is relatively large in the center (approximately 0.91) and relatively small at the corners (approximately 0.25). Note that this surface also reflects the underlying design: the information is relatively high in those regions where experiments have been performed and is relatively low in those regions where experiments have not been carried out.
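The center and corner values quoted above can be reproduced from Equations 13.7 and 13.8 and the design of Equation 13.9. The following sketch assumes NumPy; the function names are illustrative and are not taken from the text.

```python
import numpy as np

def fsop_row(x1, x2):
    """One row of the model matrix for the two-factor FSOP model."""
    return np.array([1.0, x1, x2, x1**2, x2**2, x1 * x2])

# Design of Equation 13.9: square points at +/-2, star points at +/-4,
# and four replicates at the center.
design = [(-2, -2), (-2, 2), (2, -2), (2, 2),
          (-4, 0), (4, 0), (0, -4), (0, 4)] + [(0, 0)] * 4
X = np.array([fsop_row(a, b) for a, b in design])
XtX_inv = np.linalg.inv(X.T @ X)

def normalized_uncertainty(x1, x2):
    """Equation 13.7: sqrt(1 + X0 (X'X)^-1 X0')."""
    x0 = fsop_row(x1, x2)
    return float(np.sqrt(1.0 + x0 @ XtX_inv @ x0))

def normalized_information(x1, x2):
    """Equation 13.8: reciprocal of the normalized uncertainty."""
    return 1.0 / normalized_uncertainty(x1, x2)

print(normalized_uncertainty(0, 0), normalized_information(0, 0))   # center
print(normalized_uncertainty(5, 5), normalized_information(5, 5))   # corner
```

The computed values are about 1.10 and 0.91 at the center and about 3.92 and 0.26 at the corner (5, 5), in line with the approximate figures read from the panels.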
Again, it is important to note that the normalized information surface shown in the upper right panel is not the response surface generated by the FSOP model itself. Instead, this upper right panel is a reflection of how “tight” the model is in different regions of factor space. Experiments serve to give information, to provide rigidity, and thereby to provide precision. The large amount of information at the center of this upper right panel is a reflection of the tightness of the model in this region.
The lower right panel plots normalized information as a function of factor x1 for x2 = -5, -4, -3, -2, -1, and 0. These lines show the left front edge (x2 = -5) and parallel slices through the normalized information surface in the panel above. (For this design, which is symmetric about the x1 axis, the graph lines for x2 = 1, 2, 3, 4, and 5 are identical to lines that are already present.)
One of the striking features of this central composite design is the flatness of the normalized uncertainty and normalized information surfaces near the center of the design.
In Figure 13.2, the experimental design, the normalized uncertainty surface, and the normalized information surface each have four planes of mirror-image symmetry, all of which are perpendicular to the x1-x2 plane. One reflection plane contains the x1 axis; a second plane contains the x2 axis; the third and fourth planes contain the +45° and -45° diagonals.
13.4 A rotatable central composite design
Figure 13.3 shows a similar set of four panels for a slightly different central composite design. The lower left panel shows the placement of experiments in factor space (i.e., it shows the experimental design). The upper left panel shows the normalized uncertainty as a function of factors x1 and x2. The upper right panel shows the normalized information as a function of factors x1 and x2. The lower right panel plots normalized information as a function of factor x1 for x2 = -5, -4, -3, -2, -1, and 0. The experimental design matrix is
$$
D = \begin{bmatrix}
-2 & -2 \\ -2 & 2 \\ 2 & -2 \\ 2 & 2 \\
-2\sqrt{2} & 0 \\ 2\sqrt{2} & 0 \\ 0 & -2\sqrt{2} \\ 0 & 2\sqrt{2} \\
0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0
\end{bmatrix}
\tag{13.10}
$$
Figure 13.3 Rotatable central composite design. Square points ±2, star points ±2√2, DF_lof = 3, DF_pe = 3.
This design is similar to the design in Figure 13.2, but the star points are located ±2√2 from the center, not ±4. This design is sometimes called a “circumscribed central composite design.” All of the peripheral points lie on a circle, equidistant from the center, and this makes the design rotatable: the uncertainty depends only on the distance from the center and not on the direction (see Section 12.9). Although the experimental design still has the four planes of mirror-image symmetry discussed in the previous section, the uncertainty and information contours each have an infinite number of planes of mirror-image symmetry passing through the origin. That is, they each have a C∞ axis of rotational symmetry at the origin and perpendicular to the x1-x2 plane.
The normalized information at the center (and at the edges) of the factor space in Figure 13.3 is less than the normalized information at the center (and at the edges) in Figure 13.2. These effects are a result of the relative compactness of the star points in this rotatable design, which allows the FSOP model to flex more at the corners of the factor space and, consequently, at the center as well.
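The geometric difference between the designs of Equations 13.9 and 13.10 can be sketched in a few lines. The example below lists the distances of the peripheral points from the center and also evaluates the fourth-moment ratio Σx1⁴ / Σx1²x2², which must equal 3 for a rotatable second-order design. That moment condition is a standard result brought in here for illustration; it is not stated in this section, and the function names are hypothetical.

```python
import math


def peripheral_radii(design):
    """Distinct distances from the origin of all non-center design points."""
    return sorted({round(math.hypot(x1, x2), 9)
                   for x1, x2 in design if (x1, x2) != (0, 0)})


def fourth_moment_ratio(design):
    """sum(x1^4) / sum(x1^2 * x2^2); rotatability requires this ratio to be 3."""
    pure = sum(x1 ** 4 for x1, _ in design)
    mixed = sum(x1 ** 2 * x2 ** 2 for x1, x2 in design)
    return pure / mixed


S = 2 * math.sqrt(2)
SQUARE = [(-2, -2), (-2, 2), (2, -2), (2, 2)]
CENTER = [(0, 0)] * 4

design_13_9 = SQUARE + [(-4, 0), (4, 0), (0, -4), (0, 4)] + CENTER
design_13_10 = SQUARE + [(-S, 0), (S, 0), (0, -S), (0, S)] + CENTER

for name, design in [("13.9", design_13_9), ("13.10", design_13_10)]:
    print(name, peripheral_radii(design), round(fourth_moment_ratio(design), 6))
```

For the design of Equation 13.10 all eight peripheral points share one radius and the ratio is 3; for the design of Equation 13.9 there are two radii and the ratio is 9.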
13.5 An orthogonal rotatable central composite design
Figure 13.4 shows four panels for still another central composite design. The lower left panel shows the experimental design itself. The upper left panel shows the normalized uncertainty associated with this design. The upper right panel shows the normalized information. The lower right panel plots normalized information as a function of factor x1 for fixed values of x2. The experimental design matrix is
$$
D = \begin{bmatrix}
-2 & -2 \\ -2 & 2 \\ 2 & -2 \\ 2 & 2 \\
-2\sqrt{2} & 0 \\ 2\sqrt{2} & 0 \\ 0 & -2\sqrt{2} \\ 0 & 2\sqrt{2} \\
0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\
0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0
\end{bmatrix}
\tag{13.11}
$$
The design in Figure 13.4 is similar to the design in Figure 13.3, but the center point has been replicated a total of eight times, not four. This makes the design not only rotatable but also orthogonal in the coded factor space: that is, the estimate of one factor effect (i.e., β1*, β2*, β11*, β22*, or β12*) is independent of the estimates of all other factor effects (see Section 12.10).
The normalized information at the center of the design in Figure 13.4 is somewhat greater than the normalized information at the center of the design in Figure 13.3. This is because of the additional information supplied by the extra four center-point replicates in the orthogonal design and because the additional experiments decrease the amount the FSOP model can flex. Note that orthogonality and greater information have been achieved at a relatively high cost: 16 instead of 12 (or 1/3 more!) experiments.
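One way to see the effect of the four extra center points is to look at the two quadratic columns of the model. The sketch below assumes that the orthogonality discussed in Section 12.10 corresponds to the usual criterion that the columns x1² and x2², once centered about their means, have a zero cross product (the linear and interaction columns are already mutually orthogonal by the symmetry of the design). The function name is hypothetical, and the check is carried out in the units of Equations 13.10 and 13.11 rather than in coded units.

```python
import math


def centered_square_cross_product(design):
    """Sum over experiments of (x1^2 - mean(x1^2)) * (x2^2 - mean(x2^2))."""
    n = len(design)
    sq1 = [x1 * x1 for x1, _ in design]
    sq2 = [x2 * x2 for _, x2 in design]
    mean1, mean2 = sum(sq1) / n, sum(sq2) / n
    return sum((a - mean1) * (b - mean2) for a, b in zip(sq1, sq2))


S = 2 * math.sqrt(2)
PERIPHERY = [(-2, -2), (-2, 2), (2, -2), (2, 2),
             (-S, 0), (S, 0), (0, -S), (0, S)]

design_13_10 = PERIPHERY + [(0, 0)] * 4  # four center points
design_13_11 = PERIPHERY + [(0, 0)] * 8  # eight center points

print(round(centered_square_cross_product(design_13_10), 4))
print(round(centered_square_cross_product(design_13_11), 4))
```

With four center points the cross product is clearly nonzero (about -21.3), so the estimates of the two quadratic effects are correlated; with eight center points it vanishes to within rounding error.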
13.6 A three-level full factorial design
Figure 13.5 shows still another central composite design. The experimental design matrix is
$$
D = \begin{bmatrix}
-2 & -2 \\ -2 & 2 \\ 2 & -2 \\ 2 & 2 \\
-2 & 0 \\ 2 & 0 \\ 0 & -2 \\ 0 & 2 \\
0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0
\end{bmatrix}
\tag{13.12}
$$
Figure 13.4 Orthogonal rotatable central composite design. Square points ±2, star points ±2√2, DF_lof = 3, DF_pe = 7.
Figure 13.5 Face centered central composite design. Square points ±2, star points ±2, DF_lof = 3, DF_pe = 3.
This design is similar to the designs in Figures 13.2 and 13.3, but the star points are located ±2 from the center, not ±4 or ±2√2. In effect, the star points have been brought in to the faces of the square. This design is sometimes called a “face centered central composite design”. This design is also equivalent to a two-factor three-level full factorial design. (The equivalence between a face centered central composite design and a three-level full factorial design does not hold for factor spaces of dimension greater than two, however.)
A comparison of the three-level full factorial design shown in Figure 13.5 with the rotatable central composite design shown in Figure 13.3 reveals that the normalized uncertainty and normalized information at the center (and at the corners) of the factor space are approximately equal for the two designs. However, both the normalized uncertainty and normalized information surfaces in Figure 13.5 are pushed in from the sides toward the center compared with the corresponding surfaces in Figure 13.3. The contours in Figure 13.5 are not circularly symmetrical but are “squared off”.
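The equivalence, and its failure in higher dimensions, can be illustrated by enumerating the distinct points of each design. This is a minimal sketch; the function name `face_centered_points` and its arguments are hypothetical, and it builds the face centered design in the usual way from the corners of the hypercube, one pair of star points per factor on the faces, and the center.

```python
from itertools import product


def face_centered_points(k, a=2):
    """Distinct points of a face centered central composite design in k factors."""
    corners = set(product((-a, a), repeat=k))
    stars = {tuple(sign * a if i == j else 0 for j in range(k))
             for i in range(k) for sign in (-1, 1)}
    center = {(0,) * k}
    return corners | stars | center


def three_level_points(k, a=2):
    """Distinct points of a three-level full factorial design in k factors."""
    return set(product((-a, 0, a), repeat=k))


for k in (2, 3):
    fcc, full = face_centered_points(k), three_level_points(k)
    print(k, len(fcc), len(full), fcc == full)
```

For two factors both designs contain the same nine factor combinations. For three factors the face centered design has 15 distinct points, whereas the three-level full factorial has 27.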
13.7 A small star design within a larger factorial design
Figure 13.6 continues the sequence of Figures 13.2, 13.3, and 13.5. The experimental design matrix is
$$
D = \begin{bmatrix}
-2 & -2 \\ -2 & 2 \\ 2 & -2 \\ 2 & 2 \\
-\sqrt{2} & 0 \\ \sqrt{2} & 0 \\ 0 & -\sqrt{2} \\ 0 & \sqrt{2} \\
0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0
\end{bmatrix}
\tag{13.13}
$$
Figure 13.6 Central composite design. Square points ±2, star points ±√2, DF_lof = 3, DF_pe = 3.
Figure 13.7 Rotatable central composite design. Square points ±2√2, star points ±4, DF_lof = 3, DF_pe = 3.
This design is similar to previous designs, but the star points are located ±√2 from the center, not ±4 or ±2√2 or ±2. The star points have been brought inside the faces of the square. This design is sometimes called an “inscribed central composite design”.
Note that the “sides” of the rectangular normalized information surface have been pinched inward. The shape of this surface is clearly related to the placement of the experiments in factor space as shown in the lower left panel. A constant theme of experimental design is that generally in those regions where experiments have been carried out, there is superior information; in those regions where experiments have not been carried out, there is inferior information.
13.8 A larger rotatable central composite design
The rotatable central composite design in Figure 13.7 is related to the rotatable central composite design in Figure 13.3 through expansion by a factor of √2: the square points expand from ±2 to ±2√2 from the center; the star points expand from ±2√2 to ±4 from the center. The experimental design matrix is
$$
D = \begin{bmatrix}
-2\sqrt{2} & -2\sqrt{2} \\ -2\sqrt{2} & 2\sqrt{2} \\ 2\sqrt{2} & -2\sqrt{2} \\ 2\sqrt{2} & 2\sqrt{2} \\
-4 & 0 \\ 4 & 0 \\ 0 & -4 \\ 0 & 4 \\
0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0
\end{bmatrix}
\tag{13.14}
$$
The design in Figure 13.7 is thus the same design as in Figure 13.3, but larger. As expected, the normalized uncertainty surface and the normalized information surface also expand in all directions from the center by the same amount (√2). The result is that the normalized uncertainty is generally lower (and the normalized information is generally higher) over the whole factor domain.
Further comparison of Figure 13.7 with Figure 13.3 suggests that over a given region of factor space, a broader design gives less uncertainty. This is another way of saying that if you want to find out what is happening in a certain region of factor space, the most informative method is to carry out an experiment there. Extrapolation of a narrower design is done with greater uncertainty. However, as the experimental design becomes broader, there will be a greater likelihood that the empirical FSOP model will fit less well and might exhibit a greater amount of lack of fit.
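The expansion argument can be checked numerically. The sketch below is illustrative only and again assumes that Equation 13.7 (not reproduced in this section) has the form [1 + x0(X'X)^-1 x0']^(1/2); the helper names are hypothetical. It compares the two rotatable designs at the corner (5, 5) and also confirms that the uncertainty of the larger design at a point equals the uncertainty of the smaller design at that point divided by √2.

```python
import math


def fsop_row(x1, x2):
    """Terms of the two-factor full second-order polynomial (FSOP) model."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]


def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def normalized_uncertainty(design, point):
    """Assumed form of Equation 13.7: sqrt(1 + x0 (X'X)^-1 x0')."""
    rows = [fsop_row(x1, x2) for x1, x2 in design]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    x0 = fsop_row(*point)
    z = solve(xtx, x0)
    return (1.0 + sum(a * b for a, b in zip(x0, z))) ** 0.5


ROOT2 = math.sqrt(2)
S = 2 * ROOT2
small = [(-2, -2), (-2, 2), (2, -2), (2, 2),
         (-S, 0), (S, 0), (0, -S), (0, S)] + [(0, 0)] * 4   # Equation 13.10
large = [(ROOT2 * x1, ROOT2 * x2) for x1, x2 in small]      # Equation 13.14

u_small_corner = normalized_uncertainty(small, (5, 5))
u_large_corner = normalized_uncertainty(large, (5, 5))
u_small_scaled = normalized_uncertainty(small, (5 / ROOT2, 5 / ROOT2))
print(round(u_small_corner, 2), round(u_large_corner, 2), round(u_small_scaled, 2))
```

Under the stated assumption, the corner uncertainty drops from roughly 4.9 for the smaller design to roughly 2.6 for the larger one, and the last two printed values agree.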
13.9 A larger three-level full factorial design
Figure 13.8 is related to Figure 13.5 through expansion by a factor of 2: the factorial and star points are all on the square, ±4 units from the center of the design. Again, this design is equivalent to a two-factor three-level full factorial design. The experimental design matrix is
$$
D = \begin{bmatrix}
-4 & -4 \\ -4 & 4 \\ 4 & -4 \\ 4 & 4 \\
-4 & 0 \\ 4 & 0 \\ 0 & -4 \\ 0 & 4 \\
0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0
\end{bmatrix}
\tag{13.15}
$$
Figure 13.8 Face centered central composite design. Square points ±4, star points ±4, DF_lof = 3, DF_pe = 3.
In Figure 13.8, a high degree of information is provided over most of the bounded factor domain. However, as before, the possibility of lack of fit of this broader design is greater than for the narrower design of Figure 13.5.
13.10 The effect of the distribution of replicates
Replicates don’t all have to be carried out at the center point. Using the experimental design of Figure 13.2 as a basis, Figures 13.9 and 13.10 show the effects of different distributions of replicates.
In Figure 13.9, instead of carrying out four replicate experiments at the center point (as in Figure 13.2), the four replicates are carried out such that one experiment is moved to each of the existing four factorial points. The experimental design matrix is
$$
D = \begin{bmatrix}
-2 & -2 \\ -2 & -2 \\ -2 & 2 \\ -2 & 2 \\
2 & -2 \\ 2 & -2 \\ 2 & 2 \\ 2 & 2 \\
-4 & 0 \\ 4 & 0 \\ 0 & -4 \\ 0 & 4
\end{bmatrix}
\tag{13.16}
$$
Figure 13.9 Central composite design. Square points ±2, star points ±4, DF_lof = 2, DF_pe = 4.
This allocation of experiments has the effect of making the normalized uncertainty and normalized information contours more axially symmetric (the design isn't quite rotatable; there are still only four mirror-image planes of reflection symmetry). However, because no experiments are now being carried out at the center point, the amount of uncertainty is greater there (and the amount of information is smaller there). The overall effect is to provide a normalized information surface that looks like a slightly square-shaped volcano. Because there are now only eight distinctly different factor combinations (f = 8; the center point is not present), there are four degrees of freedom for purely experimental uncertainty (n - f = 12 - 8 = 4) and only two degrees of freedom for lack of fit ( f - p = 8 - 6 = 2). Figure 13.10 shows the effect of placing the four replicates at each of the star points. The experimental design matrix is
$$
D = \begin{bmatrix}
-2 & -2 \\ -2 & 2 \\ 2 & -2 \\ 2 & 2 \\
-4 & 0 \\ -4 & 0 \\ 4 & 0 \\ 4 & 0 \\
0 & -4 \\ 0 & -4 \\ 0 & 4 \\ 0 & 4
\end{bmatrix}
\tag{13.17}
$$
Figure 13.10 Central composite design. Square points ±2, star points ±4, DF_lof = 2, DF_pe = 4.
This allocation of experiments has the effect of emphasizing the star points in the normalized uncertainty and normalized information contours. The contours are “bumpier” now, with the bumps occurring at the star points. Because no experiments are being carried out at the center point, the amount of uncertainty is greater there (and the amount of information is smaller there) than in the original design of Figure 13.2. As in Figure 13.9, there are only eight distinctly different factor combinations (f= 8). Thus, there are four degrees of freedom for purely experimental uncertainty and only two degrees of freedom for lack of fit.
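The bookkeeping for these three allocations of replicates can be written out explicitly. The sketch below uses the relationships given in the text (n - f degrees of freedom for purely experimental uncertainty and f - p for lack of fit, with p = 6 for the two-factor FSOP model); the function name and the dictionary keys are hypothetical.

```python
def degrees_of_freedom(design, p=6):
    """Return n, f, and the degrees of freedom for lack of fit and pure error."""
    n = len(design)       # number of experiments
    f = len(set(design))  # distinctly different factor combinations
    return {"n": n, "f": f, "lack_of_fit": f - p, "pure_error": n - f}


SQUARE = [(-2, -2), (-2, 2), (2, -2), (2, 2)]
STAR = [(-4, 0), (4, 0), (0, -4), (0, 4)]

center_replicates = SQUARE + STAR + [(0, 0)] * 4  # Equation 13.9
square_replicates = SQUARE * 2 + STAR             # Equation 13.16
star_replicates = SQUARE + STAR * 2               # Equation 13.17

for name, design in [("center", center_replicates),
                     ("square", square_replicates),
                     ("star", star_replicates)]:
    print(name, degrees_of_freedom(design))
```

Moving the replicates away from the center keeps n = 12 but reduces f from 9 to 8, which trades one degree of freedom for lack of fit for one additional degree of freedom for purely experimental uncertainty.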
13.11 Non-central composite designs
Figure 13.11 is a non-central composite design - the center of the square design and the center of the star design do not coincide. Thus, the two individual designs which make up the combined design are not centered. It is a composite design, but it is non-central. The experimental design matrix is
$$
D = \begin{bmatrix}
-2 & -2 \\ -2 & 0 \\ 2 & 0 \\ 0 & -2 \\ 0 & 2 \\
0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0
\end{bmatrix}
\tag{13.18}
$$
There are many ways to view the construction of the experimental design shown in Figure 13.11. Of these, perhaps the most straightforward is to note that the design in Figure 13.11 can be derived from the design in Figure 13.5 by removing three of the corner points. This leaves a smaller factorial design in the lower left quadrant of the original design while still retaining the original star design.
The design in Figure 13.11 has only nine experiments (n = 9) and only six distinctly different factor combinations (f = 6). Thus, there are still three degrees of freedom for purely experimental uncertainty (n - f = 9 - 6 = 3). However, the FSOP model has six parameters (p = 6), so there are now no degrees of freedom for lack of fit (f - p = 6 - 6 = 0). In the upper left panel of Figure 13.11, normalized uncertainties greater than 10 have been truncated to 10.
The experimental design in Figure 13.11 is more efficient than the experimental design in Figure 13.5 in the sense that fewer experiments are used to estimate the parameters of the model (E = p/f = 6/6 = 1.00), but the quality of information (as shown in the normalized uncertainty and normalized information surfaces) suffers as a result.
Figure 13.12 is another non-central composite design. The experimental design matrix is
$$
D = \begin{bmatrix}
-2 & -2 \\ -2 & 2 \\ 2 & -2 \\ -2 & 0 \\ 0 & -2 \\
0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0
\end{bmatrix}
\tag{13.19}
$$
Figure 13.11 Non-central composite design. Square point -2, star points ±2, DF_lof = 0, DF_pe = 3.
The design in Figure 13.12 can be derived from the design in Figure 13.5 by removing one of the corner points and two of the edge points. This leaves a small factorial design in the lower left quadrant of the original design. Now, however, the arms of the star have been pulled apart and placed along two perpendicular edges of the small factorial design. The degrees of freedom are the same as for the design in Figure 13.11. The design in Figure 13.12 is similar to the mixture design shown in the lower right panel of Figure 12.28.
13.12 A randomly generated design
The nine distinctly different factor combinations in Figure 13.13 were obtained from a random number generator. In this sense, the experimental design in Figure 13.13 is a “totally random design.” One factor combination was chosen (again at random) and three additional experiments were carried out there to provide three degrees of freedom for purely experimental uncertainty. There are three degrees of freedom for lack of fit. The experimental design matrix is
$$
D = \begin{bmatrix}
3.12920 & 2.56303 \\ 0.56375 & -4.13827 \\ -2.94542 & 1.23608 \\ -2.86738 & 0.62983 \\
-3.73338 & 4.95222 \\ -0.11982 & -1.48349 \\ 2.45737 & -0.45319 \\ -3.34651 & -1.53869 \\
4.15284 & 0.25859 \\ 4.15284 & 0.25859 \\ 4.15284 & 0.25859 \\ 4.15284 & 0.25859
\end{bmatrix}
\tag{13.20}
$$
Figure 13.12 Non-central composite design. Square points ±2, star point -2, DF_lof = 0, DF_pe = 3.
Note that information is greatest in regions where experiments have been carried out. In the lower left panel of Figure 13.13 there is an open area where few experiments have been carried out (between x1 = -3 and x1 = 3, and at x2 > 0). In the upper right figure this barren region appears as a valley sloping downward toward the back right. Even so, this randomly generated experimental design provides fairly good information across the factor domain.
Occasionally, random number generators will give a poor design (e.g., all experiments might end up in a very small fraction of the desired factor domain). However, as long as the resulting design spans the desired factor domain, has enough factor combinations to determine lack of fit, and has replicates to determine purely experimental uncertainty, random processes seem to generate “pretty good” experimental designs.
This is not a recommendation of random or haphazard designs, especially pseudo-random designs generated by researchers. Such designs frequently do not span the factor space, have far too many degrees of freedom for lack of fit, and have no degrees of freedom for purely experimental uncertainty. Using standard experimental designs (such as the central composite design) or creating new experimental designs from sound statistical principles is almost always more efficient and informative than any randomly generated design.
Of course, the design in Figure 13.13 could be improved by moving one of the two closely placed experiments near (x1 = -2.5, x2 = 1) into the open space discussed above. And the point near (x1 = -3, x2 = -1) might be moved a bit lower in x2. And then if ....
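The construction described for Figure 13.13 can be sketched with a seeded random number generator. This is a hypothetical reconstruction of the procedure, not the generator or the seed the authors used, so the coordinates it produces will differ from those in Equation 13.20; the function name, its arguments, and the seed are all assumptions made for illustration.

```python
import random


def random_design(n_points=9, extra_replicates=3, low=-5.0, high=5.0, seed=13):
    """Random factor combinations in a square domain, one of them replicated."""
    rng = random.Random(seed)
    points = [(rng.uniform(low, high), rng.uniform(low, high))
              for _ in range(n_points)]
    replicated = rng.choice(points)  # chosen "again at random"
    return points + [replicated] * extra_replicates


design = random_design()
p = 6                              # parameters in the two-factor FSOP model
n = len(design)                    # number of experiments
f = len(set(design))               # distinctly different factor combinations
df_lack_of_fit = f - p
df_pure_error = n - f

# Rough check of how much of the -5..+5 domain the design spans in each factor.
span_x1 = max(x1 for x1, _ in design) - min(x1 for x1, _ in design)
span_x2 = max(x2 for _, x2 in design) - min(x2 for _, x2 in design)
print(n, f, df_lack_of_fit, df_pure_error, round(span_x1, 2), round(span_x2, 2))
```

Changing the seed gives a different design with the same degrees of freedom, which makes it easy to see how often a purely random draw leaves large regions of the factor domain empty.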
13.13 A star design
Figure 13.14 shows a star design that can be used to fit a two-factor FSOP model. The experimental design matrix is
$$
D = \begin{bmatrix}
-4.0 & 1.5 \\ -2.5 & 2.0 \\ -2.5 & 2.0 \\ -1.0 & 1.5 \\
0.5 & 1.0 \\ 1.0 & -1.0 \\ 3.5 & -1.0 \\ 4.0 & 1.0
\end{bmatrix}
\tag{13.21}
$$
Figure 13.13 A totally random design. DF_lof = 3, DF_pe = 3.
Figure 13.14 A star design. (The replicates are Mizar and Alcor in Ursa Major.) DF_lof = 1, DF_pe = 1.
Because the number of distinctly different factor combinations is seven ( f = 7), and because the number of experiments is eight (n = 8), there is only one degree of freedom for lack of fit (f- p = 7 - 6 = 1) and only one degree of freedom for purely experimental uncertainty (n - f = 8 - 7 = 1).
13.14 Rotatable polyhedral designs
Figure 13.15 shows a design based on a regular polyhedron, the pentagon [Himmelblau (1970)]. The experimental design matrix is
$$
D = \begin{bmatrix}
2.000 & 0.000 \\ 0.618 & 1.902 \\ -1.618 & 1.176 \\ -1.618 & -1.176 \\ 0.618 & -1.902 \\
0.000 & 0.000 \\ 0.000 & 0.000 \\ 0.000 & 0.000 \\ 0.000 & 0.000
\end{bmatrix}
\tag{13.22}
$$
Because each of the pentagonal points is equidistant from the center of the design, the design is rotatable. This rotatability is seen in the axially symmetric surfaces for normalized uncertainty and normalized information.
Figure 13.15 A pentagonal rotatable design with center point. DF_lof = 0, DF_pe = 3.
In this design there are only six distinctly different factor combinations. Thus, there are no degrees of freedom for lack of fit when fitting a two-factor FSOP model with six parameters. There are three degrees of freedom for purely experimental uncertainty because of the four replicate experiments at the center point.
Figure 13.16 is a design based on a regular hexagon [Himmelblau (1970)]. The experimental design matrix is
$$
D = \begin{bmatrix}
2.000 & 0.000 \\ 1.000 & 1.732 \\ -1.000 & 1.732 \\ -2.000 & 0.000 \\ -1.000 & -1.732 \\ 1.000 & -1.732 \\
0.000 & 0.000 \\ 0.000 & 0.000 \\ 0.000 & 0.000 \\ 0.000 & 0.000
\end{bmatrix}
\tag{13.23}
$$
Because each of the hexagonal points is equidistant from the center of the design, the design is rotatable. There are seven distinctly different factor combinations. Thus, there is one degree of freedom for lack of fit when fitting a two-factor FSOP model with six parameters. There are three degrees of freedom for purely experimental uncertainty because of the four replicate experiments at the center point.
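The coordinates in Equations 13.22 and 13.23 follow from placing the vertices of a regular polygon on a circle of radius 2, starting on the positive x1 axis. A minimal sketch (the function name and its arguments are hypothetical):

```python
import math


def polygon_design(sides, radius=2.0, center_replicates=4):
    """Vertices of a regular polygon (rounded to 3 decimals) plus center points."""
    vertices = [(round(radius * math.cos(2 * math.pi * k / sides), 3),
                 round(radius * math.sin(2 * math.pi * k / sides), 3))
                for k in range(sides)]
    return vertices + [(0.0, 0.0)] * center_replicates


pentagon = polygon_design(5)  # compare with Equation 13.22
hexagon = polygon_design(6)   # compare with Equation 13.23

for x1, x2 in pentagon:
    print(f"{x1:7.3f} {x2:7.3f}")
```

Because every vertex is generated at the same radius, the equidistance that the text relies on is built into the construction.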
13.15 The flexing geometry of full second-order polynomial models
Figure 13.17 shows a design similar to the hexagonal design in the previous Figure 13.16, but in this new design only one experiment has been carried out at the center. The experimental design matrix is
$$
D = \begin{bmatrix}
2.000 & 0.000 \\ 1.000 & 1.732 \\ -1.000 & 1.732 \\ -2.000 & 0.000 \\ -1.000 & -1.732 \\ 1.000 & -1.732 \\
0.000 & 0.000
\end{bmatrix}
\tag{13.24}
$$
Figure 13.16 A hexagonal rotatable design with center point. DF_lof = 1, DF_pe = 3.
A comparison of the lower right panels of Figures 13.16 and 13.17 shows that at the center of the factor domain, the normalized information is less for the design with only one experiment at the center (Figure 13.17) than for the design with four replicates at the center (Figure 13.16). This is confirmed in the upper right panels of both figures: the design with fewer experiments at the center (Figure 13.17) has a depression in the center of its normalized information surface; the design with more experiments at the center (Figure 13.16) has a smoothly sloped dome at the center of its normalized information surface.
A moment's thought suggests that the decreased information at the center of the design in Figure 13.17 is reasonable and expected - the depression is a result of the principle that if many experiments give more information, then fewer experiments give less information.
But this simple explanation becomes inadequate when it is realized that the information is greater on the ring of hexagonal points that surround the center point than it is at the center point itself. After all, the six circumferential points and the center point are laid out on an equilateral triangular grid: why should one of them (especially the center point) provide less information than the other six?
The answer to this seeming conundrum lies in the geometry of the fitted model. In Chapter 12, Figures 12.18-12.22 show some of the possible response surfaces that can be represented by the two-factor FSOP model. (Here it is important to note that the response surfaces shown in Figures 12.18-12.22 are generated by the FSOP model and are not the normalized uncertainty or normalized information surfaces shown in the upper panels of Figures 13.16 or 13.17.)
Consider one of these response surfaces, the parabolic bowl (paraboloid) opening downward, shown in Figure 12.19. In this canonical form, the maximum of the dome corresponds to the center of the factor domain and the sides of the surface slope downward away from the center. The paraboloid is symmetrical about an axis at the origin and perpendicular to the x1-x2 plane. The discussion that follows will use the geometry of this particular response surface to rationalize the shapes of the normalized uncertainty and normalized information surfaces in Figure 13.17. (The results are identical for developments that involve other forms of the FSOP model, but the explanations are less straightforward.)
The six outer experiments of the hexagonal design are like a pair of hands clasping the paraboloid around its sides, much as an American football player might catch a well-thrown pass. Even though the response surface might flex and writhe over other regions of factor space, it will be held rigidly around this circumference and won't move much at all. Thus, the information content will be high over this circular region.
Geometrically, there are an infinite number of paraboloids that can pass through a circle (e.g., the circle of equal responses at the hexagonal points in this example). Some of the paraboloids will be tall and elongated, some will be short and compressed, some will point up, some will point down, one of them will even be a degenerate flat plane [Rider (1947)]. Although the hexagonal points hold the sides tightly, the tip (or apex) of the paraboloid is defined by the response at the center point only. Any noise or uncertainty at the center point will move the apex up or down without any resistance from the circumferential points. Any uncertainty in the single center point will not be averaged by the other points, and the uncertainty there will remain relatively large. This is why there is a depression at the top of the normalized information surface in Figure 13.17.
The hexagonal ring of points represents a circular node in one of the normal vibrations of a paraboloid. If the paraboloid is held rigidly around this circle, then pressing down on the top of the paraboloid will cause the sides of the paraboloid to flare outward below the circle. Pulling up on the top of the paraboloid will cause the sides of the paraboloid to squeeze inward below the circle. The greater the variation at the center point inside the circular node of points, the greater will be the variation in regions outside this node. This effect can be seen by comparing Figures 13.16 and 13.17. The upper left panels of the two figures show that, as expected, the normalized uncertainty for the design with only one center point (Figure 13.17) is greater at the corners of the factor domain than for the design with four center points (Figure 13.16).
At a very basic level, the shapes of the normalized uncertainty and normalized information surfaces for a given model are a result of the location of points in factor space simply because carrying out an experiment provides information - that is, information is greatest in the vicinity of the design. But at a more sophisticated and often more important level, the shapes of the normalized uncertainty and normalized information surfaces are caused by the geometric vibrations of the response surfaces themselves - the more rigidly the model is “pinned down” by the experiments and the less it can squirm and thrash about, then the less will be the uncertainty and the greater will be the information content.
Figure 13.17 A hexagonal rotatable design with center point. DF_lof = 1, DF_pe = 0.
Figure 13.18 A hexagonal design with no center point. DF_lof = 0, DF_pe = 0.
Figure 13.19 A hexagonal design with extra outer point. DF_lof = 1, DF_pe = 0.
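The depression and the dome can be reproduced numerically. The sketch below is an illustration under the assumption that the normalized information is the reciprocal of [1 + x0(X'X)^-1 x0']^(1/2) (Equations 13.7 and 13.8 are not reproduced in this section); the helper names are hypothetical. It evaluates the information at the center and at the hexagonal point (2, 0) for the designs with one and with four center points.

```python
import math


def fsop_row(x1, x2):
    """Terms of the two-factor full second-order polynomial (FSOP) model."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]


def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def normalized_information(design, point):
    """Assumed form of Equation 13.8: 1 / sqrt(1 + x0 (X'X)^-1 x0')."""
    rows = [fsop_row(x1, x2) for x1, x2 in design]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    x0 = fsop_row(*point)
    z = solve(xtx, x0)
    return 1.0 / (1.0 + sum(a * b for a, b in zip(x0, z))) ** 0.5


HEXAGON = [(2 * math.cos(math.pi * k / 3), 2 * math.sin(math.pi * k / 3))
           for k in range(6)]
one_center = HEXAGON + [(0.0, 0.0)]        # Equation 13.24
four_centers = HEXAGON + [(0.0, 0.0)] * 4  # Equation 13.23

info = {(name, where): normalized_information(design, point)
        for name, design in [("one", one_center), ("four", four_centers)]
        for where, point in [("center", (0.0, 0.0)), ("ring", (2.0, 0.0))]}
for key, value in info.items():
    print(key, round(value, 3))
```

Under the stated assumption, the information on the ring is about 0.74 for both designs, while at the center it is about 0.71 with a single center point (a depression) and about 0.89 with four (a dome).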
13.16 An extreme effect illustrated with the hexagonal design
The lower left panel of Figure 13.18 shows a hexagonal design without a center point. Considerations of degrees of freedom suggest that it should be possible to fit a FSOP model to data from these six experiments: n = 6, f = 6, p = 6; thus, although the degrees of freedom for residuals, lack of fit, and purely experimental uncertainty are all equal to zero, they are not negative, and the model would be expected to fit perfectly.
Even so, the determinant of the (X’X) matrix is zero, the matrix cannot be inverted, and the two-factor FSOP model cannot be fitted to the data in Figure 13.18. Geometrically, the zero determinant arises because the number of FSOP models that can be made to pass through the responses above the hexagonal factor combinations is now truly infinite (see Section 5.6). There is no center point to define the apex and prevent the model from fluttering about.
For purposes of illustration only, to circumvent the problem of a zero determinant but still show the distributions of uncertainty and information in this design, a seventh experiment was added at a factor combination just slightly removed from one of the hexagonal points (at x1 = 2.000, x2 = 0.001). The hexagonal points were also adjusted somewhat to coincide with the grid lines in the pseudo-three-dimensional plots (this is equivalent to a minor adjustment of scale in the x2 dimension). Neither of these modifications significantly affects the overall conclusions to be drawn from this example. The actual design is
$$
D = \begin{bmatrix}
2.000 & 0.000 \\ 1.000 & 1.750 \\ -1.000 & 1.750 \\ -2.000 & 0.000 \\ -1.000 & -1.750 \\ 1.000 & -1.750 \\
2.000 & 0.001
\end{bmatrix}
\tag{13.25}
$$
The results from this seven-experiment design are shown in Figure 13.18. The striking feature of this design is the set of six spikes in both the normalized uncertainty and normalized information surfaces. These spikes are an extreme expression of the basic idea that experiments provide information. Even if the experimental design is not a good match for the model, even if the (X’X) matrix is ill conditioned, even if the model doesn’t fit the data very well, there is still high-quality information at the points where experiments have been carried out.
Figure 13.19 shows the effect of moving the seventh experiment farther away from one of the hexagonal points. In this example, the seventh experiment is at x1 = 2.000, x2 = 0.500. This experiment gives enough leverage that the response surface becomes better defined around the ring of hexagonal points. Because the resolution of the plotting grid is not sufficient to resolve all of the sharp detail in the cylindrical surfaces, the normalized uncertainty was truncated at 4.0 and the normalized information was correspondingly truncated at 0.25; the actual surfaces extend below and above the capped surfaces shown in Figure 13.19.
Figure 13.20 shows the effect of moving the seventh experiment still farther away from the hexagonal points (to x1 = 4.000, x2 = 4.000). The more distant point has good leverage, and the fitted response surface becomes much more rigid (as indicated by the normalized uncertainty and normalized information surfaces in this figure). Note also that the information is high above the distant point.
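One way to see why the determinant vanishes for a hexagon without a center point is that every design point lies on the circle x1² + x2² = 4, so three columns of the X matrix (the column of ones, the x1² column, and the x2² column) are linearly dependent. The sketch below, with hypothetical function names and the unadjusted hexagon of radius 2, evaluates that linear combination for each experiment; this is offered as a complementary algebraic view and is not the argument given in the text.

```python
import math

HEXAGON = [(2 * math.cos(math.pi * k / 3), 2 * math.sin(math.pi * k / 3))
           for k in range(6)]


def fsop_row(x1, x2):
    """Terms of the two-factor full second-order polynomial (FSOP) model."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]


def dependency_residuals(design, radius=2.0):
    """Per-experiment value of (x1^2 column) + (x2^2 column) - radius^2 * (ones column)."""
    residuals = []
    for x1, x2 in design:
        one, _, _, x1_sq, x2_sq, _ = fsop_row(x1, x2)
        residuals.append(x1_sq + x2_sq - radius ** 2 * one)
    return residuals


ring_only = dependency_residuals(HEXAGON)
with_center = dependency_residuals(HEXAGON + [(0.0, 0.0)])

# Essentially zero everywhere: the three columns are linearly dependent.
print(max(abs(r) for r in ring_only))
# 4.0, contributed by the center point: this particular dependency is removed.
print(max(abs(r) for r in with_center))
```

Any experiment placed off the circle, such as a center point or the remote seventh experiment described above, removes this dependency, and the farther the point lies from the circle, the better conditioned the fit becomes.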
13.17 Two final examples
Figure 13.21 shows the effect of adding an extra, remote star point to a small face centered central composite design. The experimental design matrix is
[design matrix illegible in the source] (13.26)
Figure 13.21 An inscribed central composite design with distantly located extra star point. DF_lof = 4, DF_pe = 3.
The elephant-like contours result from the stabilizing leverage of the distant factor combination. Note that this additional point stabilizes the fitted model in the factor dimension in which the design was extended, but has little effect in the other factor dimension.
The coded design shown in Figure 13.22 is from a clinical chemical study investigating the interference of magnesium in the analytical chemical determination of calcium [Olansky, Parker, Morgan, and Deming (1977), Deming and Morgan (1979)]. The uncoded factors represent the concentrations of calcium and magnesium in human blood serum. The experimental design matrix is
$$
D = \begin{bmatrix}
-3 & 0 \\ -3 & 0 \\ -2 & 0 \\ -2 & 0 \\
-1 & -1 \\ -1 & -1 \\ -1 & 1 \\ -1 & 1 \\
0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 2 \\ 0 & 3 \\
1 & -1 \\ 1 & -1 \\ 1 & 1 \\ 1 & 1 \\
2 & 0 \\ 2 & 0
\end{bmatrix}
\tag{13.27}
$$
Figure 13.22 A non-central composite design with many distributed replicates. DF_lof = 4, DF_pe = 10.
It could be argued statistically that the number of replicates in Figure 13.22 is excessively large (the person carrying out these experiments would also argue that the number of replicates is excessively large!): the relatively small improvement in the quality of information in the region of the design has been gained at the relatively large expense of the additional experiments.
It might also be argued statistically (indeed, it is perhaps one of the major points of this chapter) that the domain of the experimental design represents only a small fraction of the factor space shown: a broader design would have given smaller uncertainties and more precise information over the whole factor space (see, for example, Figure 13.8).
However, in the example of Figure 13.22, the factorial part of the design adequately covers the combinations of calcium and magnesium found in living humans. The extended star points were used to obtain precise estimates of curvature in x1 and x2. There is no practical reason for investigating combinations of calcium and magnesium in the unexplored regions: serum samples with these combinations of concentrations could only have come from the morgue.
Exercises
13.1 Minimalist design. Add one design point to a two-factor star design to generate a design that is sufficient to fit a full second-order polynomial model ($y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_{11} x_{1i}^2 + \beta_{22} x_{2i}^2 + \beta_{12} x_{1i} x_{2i} + r_{1i}$). Hint: see Figure 13.11.
13.2 Optimal design. Assume a constrained factor space of $-5 \le x_1 \le +5$, $-5 \le x_2 \le +5$. Assume the full two-factor model with interaction, $y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_{12} x_{1i} x_{2i} + r_{1i}$. Assume a $2^2$ factorial design. How should the four design points be placed to maximize the determinant of the (X'X) matrix? Demonstrate with a few calculations.
13.3 Existing designs. Find a report of a two-factor experimental design. Speculate about the shape of the normalized uncertainty and normalized information surfaces for the design. Sketch their shape.
CHAPTER 14
Factorial-Based Designs
As indicated in Section 12.3, full factorial designs are a popular class of experimental designs, in part because of their ability to detect interaction among factors [Fisher (1971) and Yates (1937)]. With full factorial designs, all k factors are varied simultaneously over several (m) levels. The designs are usually symmetric (i.e., all factors have the same number of levels) which leads to the notation $m^k$ to describe the number of distinctly different factor combinations (f) in the full factorial designs. Figures 12.7, 12.8, and 12.9 are examples of $3^2$, $2^3$, and $2^2$ designs, respectively.

Up to this point we have used the symbol x with a subscript to indicate a factor (e.g., $x_1$, $x_2$, $x_3$). In the early days of experimental design, however, a capital letter was usually used to represent a factor [see, for example, Davies (1956)]. Thus, what we would now call factor $x_1$, the early workers might have called factor A. What we would now call factor $x_2$, statisticians might have called factor B. The notation $x_3$ would represent factor C. An $x_1 x_2$ interaction would be called an AB interaction. And so on. We have chosen to use the $x_1$-type notation because it is consistent with the mathematical notation used in both linear models and matrix least squares [Neter, Wasserman, and Kutner (1990)]. However, both systems are in use today. For that reason, in this chapter we will also use the classical notation, and will use it interchangeably with the $x_1$-type notation.

In this chapter we explore factorial-based experimental designs in more detail. We will show how these designs can be used in their full factorial form; how factorial designs can be taken apart into blocks to minimize the effect of (or, if desired, to estimate the effect of) an additional factor; and how only a portion of the full factorial design (a fractional replicate) can be used to screen many potentially useful factors in a very small number of experiments. Finally, we will illustrate the use of a Latin square design, a special type of fractionalized design.
14.1 Coding and symbols

Factorial designs are usually discussed in terms of coded factor spaces. Table 14.1 shows some of the common coding systems for two- and three-level designs. Our emphasis in this chapter will be on the two-level designs.
TABLE 14.1 Common coding systems for two- and three-level factorial designs.

Coding system      Two-level      Three-level
                   Lo    Hi       Lo    Mid   Hi
Mathematical       -1    +1       -1    0     +1
Abbreviated        -     +        -     o     +
Taguchi            1     2        1     2     3
Plackett-Burman    1     2        1     2     3
Combinatorial      1     a
As shown in Table 14.1, the notation for the combinatorial mathematical treatment of two-level factorial designs uses a “1” to indicate the low level of a factor, and a lowercase letter (e.g., a) to indicate the high level of a factor. These lowercase letters can then be used to indicate factor combinations of the design points: for example, the symbol ac specifies the high level of factor A, the high level of factor C, and the low level of factor B (by omission of “b”: “ac” = “a×1×c”). The use of such descriptive symbols for design points is one of the advantages of the older notation.
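The combinatorial labels can be generated mechanically from the coded levels. The following is a minimal sketch, not part of the original treatment; the names `FACTORS`, `combinatorial_label`, `design`, and `labels` are illustrative assumptions.

```python
from itertools import product

FACTORS = "abc"  # lowercase letters for factors A, B, C (illustrative choice)

def combinatorial_label(levels):
    """Return the combinatorial label for one design point.

    `levels` holds -1 (low) or +1 (high) for factors A, B, C in that order.
    """
    letters = "".join(f for f, lv in zip(FACTORS, levels) if lv == +1)
    return letters or "1"  # "1" means every factor is at its low level

# Enumerate the 2^3 design with factor A varying fastest.
design = [(a, b, c) for c, b, a in product((-1, +1), repeat=3)]
labels = [combinatorial_label(row) for row in design]
print(labels)
```

A factor at its low level simply contributes no letter, which is why the point with A and C high and B low is labelled ac.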
Figure 14.1 A two-factor two-level full factorial design in factors A and B.
To complicate matters, however, the notation for a design point is also used as a symbol for the response at that factor combination. Thus, ac might refer to the design point (the factor combination) at which an experiment was carried out, or ac might refer to the response that was obtained there. It is usually clear from the context which meaning should be attached to the symbol. Two-factor factorial designs like that shown in Figure 14.1 are often shown on the printed page as “square plots”:
[Square plot: design points 1, a, b, and ab at the corners of a square in factors A and B.]
Three-factor two-level factorial designs like that shown in Figure 14.2 can be shown on the printed page as “cube plots”:
[Cube plot: the responses 2.2 (1), 2.9 (a), 2.5 (b), 3.6 (ab), 5.9 (c), 6.2 (ac), 7.6 (bc), and 9.5 (abc) at the corners of a cube with axes A, B, and C.]
Table 14.2 gives a traditional tabular presentation of the information in this cube plot. This type of table is often used before and during the experiments - it provides instructions on how the experiments should be carried out, and provides a column where the responses can be recorded. The first column of Table 14.2 lists the experiment numbers (1-8). The next three columns list the abbreviated coded factor levels (- and +) for factors A, B, and C. Note that these three columns are equivalent to the abbreviated coded D matrix:
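$$
D = \begin{bmatrix}
- & - & - \\
+ & - & - \\
- & + & - \\
+ & + & - \\
- & - & + \\
+ & - & + \\
- & + & + \\
+ & + & +
\end{bmatrix} \tag{14.1}
$$

Figure 14.2 A three-factor two-level full factorial design in factors A, B, and C. The open circle locates the center of the design.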
The next column lists the factor combinations (again, each symbol is equivalent to a description of the factor combination represented by the design point). The last column gives the response and is equivalent to the matrix of measured responses, Y
$$
Y = \begin{bmatrix} 2.2 \\ 2.9 \\ 2.5 \\ 3.6 \\ 5.9 \\ 6.2 \\ 7.6 \\ 9.5 \end{bmatrix} \tag{14.2}
$$
TABLE 14.2 Traditional tabular presentation of data for carrying out a 2^3 full factorial design.

Experiment   Factor         Factor        Response
             A    B    C    combination
1            -    -    -    1             2.2
2            +    -    -    a             2.9
3            -    +    -    b             2.5
4            +    +    -    ab            3.6
5            -    -    +    c             5.9
6            +    -    +    ac            6.2
7            -    +    +    bc            7.6
8            +    +    +    abc           9.5
14.2 Classical mathematical treatment of a full 2^3 factorial design

With simple, symmetrical, orthogonal designs like the full factorial designs, when all of the experiments have been done exactly the same number of times, then the factor effects can be calculated using simple algebra. In the classical factorial design literature, a factor effect is defined as the difference in average response between the experiments carried out at the high level of the factor and the experiments carried out at the low level of the factor. Thus, in a $2^3$ full factorial design, the main effect of A would be calculated as:
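$$
A = \left[\text{Average response at high level of A}\right] - \left[\text{Average response at low level of A}\right]
= \left[\frac{a + ab + ac + abc}{4}\right] - \left[\frac{1 + b + c + bc}{4}\right] \tag{14.3}
$$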
where the symbols indicate the measured responses for the factor combinations. Note that the symbols for the responses at the high level of a factor include the lower case letter for that factor; the symbols for the responses at the low level of a factor do not have the lower case letter for that factor.

To illustrate this classical approach to the calculation of factor effects, consider the following $2^3$ full factorial design:

[Cube plot: the responses 45 (1), 75 (a), 49 (b), 111 (ab), 51 (c), 89 (ac), 55 (bc), and 125 (abc) at the corners of a cube with axes A, B, and C.]
Table 14.3 gives a traditional tabular presentation of the information used to treat the data from this factorial design. This type of table is used after the experiments have been carried out. The first column of Table 14.3 gives the response notation (or, equivalently, the factor combination). The next eight columns list the eight factor effects of the model: the three main effects (A, B, and C), the three two-factor interactions (AB, AC, and BC), the single three-factor interaction (ABC), and the single offset term (MEAN, analogous to $\beta_0$ in the equivalent linear model). Finally, the last column of Table 14.3 lists the response (RESP) at each experimental point and is equivalent to the matrix of measured responses:
$$
Y = \begin{bmatrix} 45 \\ 75 \\ 49 \\ 111 \\ 51 \\ 89 \\ 55 \\ 125 \end{bmatrix} \tag{14.4}
$$
Although it is true that the first three columns of plus and minus signs in Table 14.3 are equivalent to the abbreviated coded experimental design matrix D, the signs in Table 14.3 are used for a slightly different purpose than they were in Table 14.2. In fact, as we will see, the eight columns of signs in Table 14.3 are equivalent to the matrix of parameter coefficients, X.
TABLE 14.3 Traditional tabular presentation of data for calculating the classical factor effects in a 2^3 factorial design.

Response    Effect                                              Response
notation    A    B    C    AB   AC   BC   ABC  MEAN   RESP
1           -    -    -    +    +    +    -    +      45
a           +    -    -    -    -    +    +    +      75
b           -    +    -    -    +    -    +    +      49
ab          +    +    -    +    -    -    -    +      111
c           -    -    +    +    -    -    +    +      51
ac          +    -    +    -    +    -    -    +      89
bc          -    +    +    -    -    +    -    +      55
abc         +    +    +    +    +    +    +    +      125
Divisor     4    4    4    4    4    4    4    8
Note that the signs under the column labelled MEAN are positive = + = +1 = 1, which is the implicit coefficient of $\beta_0$ in linear models containing an offset term. Clearly, the signs representing the levels of the factors themselves are the coefficients of the factor effects in the model. Note also that for any experiment the signs under the two- and three-factor interaction columns are the products of the signs for the individual factors; for example, in the first experiment the sign of the ABC interaction is negative (-), which is the product of the signs of the factor levels for A, B, and C: (-1)×(-1)×(-1). Thus, although the coefficient of the offset term has been moved from the left side of the matrix to the right side of the matrix, the headings in Table 14.3 are analogous to the parameters (or factor effects) in the corresponding linear model:
$$
y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_{12} x_{1i} x_{2i} + \beta_{13} x_{1i} x_{3i} + \beta_{23} x_{2i} x_{3i} + \beta_{123} x_{1i} x_{2i} x_{3i} + r_{1i} \tag{14.5}
$$

This can be seen more clearly in the X matrix itself:
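Column order, linear model: $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$, $\beta_{12}$, $\beta_{13}$, $\beta_{23}$, $\beta_{123}$.
Column order, classical model: Mean, A, B, C, AB, AC, BC, ABC.

$$
X = \begin{bmatrix}
+1 & -1 & -1 & -1 & +1 & +1 & +1 & -1 \\
+1 & +1 & -1 & -1 & -1 & -1 & +1 & +1 \\
+1 & -1 & +1 & -1 & -1 & +1 & -1 & +1 \\
+1 & +1 & +1 & -1 & +1 & -1 & -1 & -1 \\
+1 & -1 & -1 & +1 & +1 & -1 & -1 & +1 \\
+1 & +1 & -1 & +1 & -1 & +1 & -1 & -1 \\
+1 & -1 & +1 & +1 & -1 & -1 & +1 & -1 \\
+1 & +1 & +1 & +1 & +1 & +1 & +1 & +1
\end{bmatrix} \tag{14.6}
$$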
Because this X matrix is orthogonal, the elements of any one column multiplied by the corresponding elements of any other column sum to zero. As a result, the product $(X'X)$ produces an identity matrix I multiplied by eight: $(X'X) = 8I$. Thus, the inverse matrix $(X'X)^{-1}$ has the form of the identity matrix multiplied by the reciprocal, one-eighth: $(X'X)^{-1} = (1/8)I$. Further multiplication gives the “pseudo-inverse” $(X'X)^{-1}X' = (1/8)IX' = (1/8)X'$ [Malinowski (1991)]. Finally, the response matrix Y can be multiplied by this pseudo-inverse to give the matrix least squares solution, where $B = [(X'X)^{-1}X']Y = (1/8)X'Y$
$$
B = \begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \\ b_{12} \\ b_{13} \\ b_{23} \\ b_{123} \end{bmatrix}
= \frac{1}{8}
\begin{bmatrix}
+1 & +1 & +1 & +1 & +1 & +1 & +1 & +1 \\
-1 & +1 & -1 & +1 & -1 & +1 & -1 & +1 \\
-1 & -1 & +1 & +1 & -1 & -1 & +1 & +1 \\
-1 & -1 & -1 & -1 & +1 & +1 & +1 & +1 \\
+1 & -1 & -1 & +1 & +1 & -1 & -1 & +1 \\
+1 & -1 & +1 & -1 & -1 & +1 & -1 & +1 \\
+1 & +1 & -1 & -1 & -1 & -1 & +1 & +1 \\
-1 & +1 & +1 & -1 & +1 & -1 & -1 & +1
\end{bmatrix} Y \tag{14.7}
$$

It is easy to see that the estimate of $b_0$ (the MEAN in the classical treatment) is obtained by multiplying each value of response in the single column of the Y matrix by a +1 from the top row of the $X'$ matrix in Equation 14.7, and then multiplying the sum of products by one-eighth (or, equivalently, dividing the sum by eight). This division by eight is the source of the divisor listed for the MEAN in Table 14.3.
$$
\text{MEAN} = \frac{45 + 75 + 49 + 111 + 51 + 89 + 55 + 125}{8} = 75 \tag{14.8}
$$

It can also be seen that the estimate of $b_1$, for example, will be obtained, in part, by multiplying each value of response in the single column of the Y matrix by the values in the second row of the $X'$ matrix in Equation 14.7. But the second row of the $X'$ matrix has the same elements in the same order as the second column of the original X matrix shown in Equation 14.6. Thus, the signs in the column under the factor named A in the X matrix (shown in Equation 14.6) and the signs in the column under the factor named A in Table 14.3 indicate which responses are added and which responses are subtracted to obtain the factor effect.

Going back to the idea that a classical factor effect is defined as the difference in average response between the experiments carried out at the high level of the factor and the experiments carried out at the low level of the factor, the responses can be grouped into those that correspond to the factor at a high level (the experiments with a plus sign for that factor) and those that correspond to the factor at the low level (the experiments with a minus sign for that factor). This information is obtained easily from Table 14.3. Because there are four experiments at the high level and four experiments at the low level, the divisor for each average is four as indicated in Table 14.3. Thus, the classical main effect of factor A is calculated as

$$
A = \frac{75 + 111 + 89 + 125}{4} - \frac{45 + 49 + 51 + 55}{4} = 50 \tag{14.9}
$$
The classical main effects of factors B and C can also be calculated in this manner:

$$
B = \frac{49 + 111 + 55 + 125}{4} - \frac{45 + 75 + 51 + 89}{4} = 20 \tag{14.10}
$$

$$
C = \frac{51 + 89 + 55 + 125}{4} - \frac{45 + 75 + 49 + 111}{4} = 10 \tag{14.11}
$$

In a similar way, the classical interaction effects AB, AC, BC, and ABC can be defined as the difference in average response between the experiments carried out at the high level of the interaction and the experiments carried out at the low level of the interaction. Again, the high level of an interaction is indicated by a plus sign in its column in Table 14.3 (either both of the individual factors are at a high level, or both of the individual factors are at a low level). The low level of a two-factor interaction is indicated by a minus sign in its column in Table 14.3 (one but not both of the individual factors is at a low level). Thus, the classical two-factor interaction effects are easily calculated:

$$
AB = \frac{45 + 111 + 51 + 125}{4} - \frac{75 + 49 + 89 + 55}{4} = 16 \tag{14.12}
$$

$$
AC = \frac{45 + 49 + 89 + 125}{4} - \frac{75 + 111 + 51 + 55}{4} = 4 \tag{14.13}
$$

$$
BC = \frac{45 + 75 + 55 + 125}{4} - \frac{49 + 111 + 51 + 89}{4} = 0 \tag{14.14}
$$

Note that the symbols for the factor combinations at the high level of an interaction either include the symbols for both individual factors or do not include the symbols for both individual factors; the symbols for the factor combinations at the low level of an interaction include one, but not both, of the symbols for the individual factors.
The three-factor interaction effect ABC is calculated in the same way:

$$
ABC = \frac{75 + 49 + 51 + 125}{4} - \frac{45 + 111 + 89 + 55}{4} = 0 \tag{14.15}
$$
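The calculations in Equations 14.8-14.15 can be checked with a short script. This is a minimal sketch in plain Python, assuming the standard-order responses of Table 14.3; the names `terms`, `column`, `classical_effect`, and `regression_estimate` are illustrative and not from the original text. It computes each classical effect as a difference of averages, and also the matrix least squares estimate $(1/8)X'Y$ of Equation 14.7.

```python
from itertools import product
from math import prod

# Responses in standard order: 1, a, b, ab, c, ac, bc, abc (Table 14.3).
responses = [45, 75, 49, 111, 51, 89, 55, 125]

# Coded levels (A, B, C) in standard order, with A varying fastest.
design = [(a, b, c) for c, b, a in product((-1, +1), repeat=3)]

# Each model term is a tuple of factor indices (0 = A, 1 = B, 2 = C).
terms = {"MEAN": (), "A": (0,), "B": (1,), "C": (2,),
         "AB": (0, 1), "AC": (0, 2), "BC": (1, 2), "ABC": (0, 1, 2)}

def column(term):
    """Signs for one column of X: product of the coded levels involved."""
    return [prod(row[i] for i in term) for row in design]

def classical_effect(term):
    """Average response at the + level minus average response at the - level."""
    signs = column(term)
    hi = [y for s, y in zip(signs, responses) if s > 0]
    lo = [y for s, y in zip(signs, responses) if s < 0]
    return sum(hi) / len(hi) - (sum(lo) / len(lo) if lo else 0.0)

def regression_estimate(term):
    """One element of (1/8) X'Y, valid because X'X = 8I for this design."""
    return sum(s * y for s, y in zip(column(term), responses)) / len(responses)

classical = {name: classical_effect(t) for name, t in terms.items()}
regression = {name: regression_estimate(t) for name, t in terms.items()}
print(classical)
print(regression)
```

The two dictionaries agree for the mean; for every other term the matrix least squares estimate is half the classical effect, because the sum of signed responses is divided by eight rather than by four.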
14.3 Classical vs. regression factor effects

Table 14.4 shows a typical regression analysis output for the $2^3$ factorial design in Table 14.3. Most of the output is self-explanatory. For the moment, however, note the regression analysis estimates for the parameters of the model given by Equation 14.5 and compare them to the estimates obtained in Equations 14.8-14.15 above. The mean is the same in both cases, but the other non-zero parameters (the factor effects and interactions) in the regression analysis are just half the values of the classical factor effects and interaction effects! How can the same data set provide two different sets of values for these effects?

TABLE 14.4 Regression analysis of the 2^3 factorial design in Table 14.3.

FACTOR COMBINATIONS
EXP   FACTOR 1      FACTOR 2      FACTOR 3      RESPONSE
1     -1.0000000    -1.0000000    -1.0000000     45.0000000
2      1.0000000    -1.0000000    -1.0000000     75.0000000
3     -1.0000000     1.0000000    -1.0000000     49.0000000
4      1.0000000     1.0000000    -1.0000000    111.0000000
5     -1.0000000    -1.0000000     1.0000000     51.0000000
6      1.0000000    -1.0000000     1.0000000     89.0000000
7     -1.0000000     1.0000000     1.0000000     55.0000000
8      1.0000000     1.0000000     1.0000000    125.0000000

PARAMETER ESTIMATES (DETERMINANT = 1.677722D+07)
B      ESTIMATE      % CONFIDENCE   RISK
1      75.0000000    INSUF DEGREES OF FREEDOM
A      25.0000000
B      10.0000000
C       5.0000000
AB      8.0000000
AC      2.0000000
BC      0.0000000
ABC     0.0000000

BREAKDOWN FOR SUMS OF SQUARES
EXP   RESPONSE       ADJUSTED       PREDICTED
1      45.0000000    -30.0000000     45.0000000
2      75.0000000      0.0000000     75.0000000
3      49.0000000    -26.0000000     49.0000000
4     111.0000000     36.0000000    111.0000000
5      51.0000000    -24.0000000     51.0000000
6      89.0000000     14.0000000     89.0000000
7      55.0000000    -20.0000000     55.0000000
8     125.0000000     50.0000000    125.0000000

SRCE   SUM OF SQR        D.F.   VARIANCE
T      51544.0000000     8      6443.0000000
MEAN   45000.0000000     1      45000.0000000
CORR    6544.0000000     7      934.8571429
FACT    6544.0000000     7      934.8571429
R          0.0000000     0      UNDEFINED
LOF        0.0000000     0      UNDEFINED
PE         0.0000000     0      UNDEFINED

DETERMINATION AND CORRELATION
R^2 VALUE = 1.0000000000
R VALUE = 1.0000000000

FISHER F-RATIOS      ESTIMATE      % CONFIDENCE   RISK
FACT F(7,0) =        UNDEFINED
LOF F(0,0) =         UNDEFINED

The classical definition of a factor effect is simply the difference in average response. This definition involves a change in response only, a bare $\delta y$. The change in the factor from low to high level, $\delta x$, is ignored. In the early types of designed experiments this was probably adequate because researchers (typically in agriculture) were usually using nominal or ordinal variables, not variables expressed on interval or ratio scales (see Section 1.5). For example, if the -1 level corresponded to “GrowFast” brand of fertilizer and the +1 level corresponded to “UpQuick” brand of fertilizer, then it would be useful to know that using GrowFast instead of UpQuick gave, say, 6.29% greater crop yield. The x values (and their difference, $\delta x$) don’t have any conventional meaning in this example: the values -1 and +1 are just surrogate names for the GrowFast and UpQuick brands of fertilizers, and $\delta x = 2$ has no meaning.

If $x_1$ is temperature and two experiments are carried out, one at a coded level of -1 and one at a coded level of +1, and we get a classical factor effect of +3.6% yield, it tells us that working at the +1 coded temperature gives more yield than working at the -1 coded temperature. But this classical factor effect by itself doesn’t tell us very much about how sensitive the reaction is to temperature because $\delta x_1$ isn’t included in the factor effect. Thus, in modern research using interval and ratio scales the $\delta x$ usually shouldn’t be ignored.

Let’s add $\delta x_1$ to the calculation to obtain $b_1^*$, as would be done with regression analysis. Because $x_1$ went from a coded level of -1 to a coded level of +1, $\delta x_1^* = 2$. Thus, $b_1^*$ (the factor effect in the coded factor space) = $\delta y_1 / \delta x_1^*$ = +3.6% per 2 coded units = +1.8% per coded unit. The fact that $\delta x$ is equal to 2 with this system of coding is why regression analysis of coded data gives results that are smaller by a factor of 1/2 than the results obtained from the classical approach!

This $b_1^*$ still isn’t very helpful because we don’t know the factor effect in terms of percent yield per unit of temperature. However, if we know how many real units (e.g., °C) correspond to a coded unit, then we can make the conversion. Suppose -1 = 20°C and +1 = 60°C. Then 2 coded units = +1 - (-1) = (60 - 20)°C = 40°C, or 1 coded unit = 20°C. Thus,

$$
b_1 = +1.8\% \text{ per coded unit} \times 1 \text{ coded unit per } 20\,^{\circ}\text{C} = 0.09\%/^{\circ}\text{C} \tag{14.16}
$$

This final, uncoded, $\delta y / \delta x$ information is more meaningful and useful to the researcher or engineer who deals with factors expressed on interval and ratio scales.
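The chain from classical effect to coded slope to uncoded slope can be written as one small function. This is a minimal sketch using the temperature example above (a classical effect of +3.6% yield between 20°C and 60°C); the function name `effect_to_slopes` and its parameters are illustrative assumptions.

```python
def effect_to_slopes(classical_effect, low, high, coded_low=-1.0, coded_high=+1.0):
    """Convert a classical factor effect (a bare delta-y) into slopes.

    Returns (slope per coded unit, slope per real factor unit).
    """
    coded_slope = classical_effect / (coded_high - coded_low)  # delta-y / delta-x (coded)
    real_slope = classical_effect / (high - low)               # delta-y / delta-x (real units)
    return coded_slope, real_slope

# +3.6% yield between coded levels -1 (20 degrees C) and +1 (60 degrees C)
per_coded_unit, per_degree = effect_to_slopes(3.6, low=20.0, high=60.0)
print(per_coded_unit, per_degree)
```

The coded slope is half the classical effect only because the coded levels are two units apart; a different coding would change that ratio, while the slope in real units would not change.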
14.4 Factor units and their effect on factor effects

Suppose regression analysis of a designed experiment gives the following uncoded first-order parameter estimates (factor effects) for factors $x_1$ and $x_2$: $b_1 = 3.781$, $b_2 = 0.0001032$. Because the effect of factor $x_2$ appears to be so much smaller than the effect of factor $x_1$, many researchers would conclude that factor $x_2$ is relatively unimportant. They might conclude from these numerical parameter estimates that the more important factor is $x_1$ - the factor that appears to have the greatest effect (b), the greatest slope ($\delta y / \delta x$), the greatest change in y for a given change in x.

But factor effects have units. Because a factor effect is a slope ($b = \delta y / \delta x$), the factor effect must have response units in its numerator and factor units in its denominator. If $y_1$ is the percent yield (%) and $x_1$ is the temperature expressed in degrees Celsius (°C), then $b_1$ = 3.781%/°C. Increasing $x_1$ by 1°C is predicted to give a 3.8% increase in yield. If $x_2$ is the pressure and we assume it is expressed in atmospheres (atm), then $b_2$ = 0.0001032%/atm. Increasing $x_2$ by 1 atm is predicted to give a 0.0001% increase in yield. As we saw before, it looks like pressure is not a very important factor.

But suppose that the researcher had expressed pressure in pascals (Pa), not atmospheres as we had assumed. Then $b_2$ = 0.0001032%/Pa, and increasing $x_2$ by 1 Pa is predicted to give a 0.0001% increase in yield. This still doesn't seem like a very large change in response for a unit change in pressure. However, recall that there are 101,325 pascals in an atmosphere. A pascal is a tiny unit of pressure. If we apply the appropriate conversion factor, then

$$
b_2 = 0.0001032\,\%/\text{Pa} \times 101{,}325\ \text{Pa/atm} = 10.46\,\%/\text{atm} \tag{14.17}
$$
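The unit conversion is a single multiplication, but writing it out makes the units explicit. This is a minimal sketch, assuming the standard atmosphere of exactly 101,325 Pa; the names `PA_PER_ATM` and `slope_per_atm` are illustrative.

```python
PA_PER_ATM = 101_325.0  # pascals per standard atmosphere (exact by definition)

def slope_per_atm(slope_per_pa):
    """Re-express a factor effect given in %/Pa as %/atm."""
    return slope_per_pa * PA_PER_ATM

b2_per_pa = 0.0001032                  # %/Pa, the estimate quoted in the text
b2_per_atm = slope_per_atm(b2_per_pa)  # roughly 10.46 %/atm
print(round(b2_per_atm, 2))
```

The same physical sensitivity therefore appears as a tiny number or a sizeable one depending only on the unit in the denominator.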
This is a sizeable effect. In this light, pressure appears to exert a useful influence on the percent yield.

As this example is intended to show, the numerical values of the parameter estimates provide an insufficient basis to decide which factors exert the greatest effects - the units in which the factor effects are expressed must also be taken into account. It is often necessary to convert the factor effects to operationally useful units.

Coding is sometimes used as a way of sweeping these considerations under the rug, of ignoring this effect of units. It is often assumed that the researcher will use an “experimentally relevant domain” or “meaningful coded levels” of the factors so that the bare $\delta y$ will show the effect of the factor at the extremes of this domain. If this is understood explicitly, then no harm comes of it. But if the experimenter is unaware of these influences of units and coding, misinterpretation of the results is easily possible.
14.5 An example of the use of a 2^3 factorial design

Consider an experimental project to reduce the number of defectives per 10,000 items coming out of a packaging machine. Three factors are thought to be important: A, the speed at which the machine is run (2000 and 3000 units/h); B, the pressure on a heated sealing plate (100 and 120 psi); and C, the weight of packaging film (4.0 or 4.7 mil polymer). Prior experience suggests that some of the factors probably interact. It is decided to use a $2^3$ full factorial design to fit an eight-parameter model: offset, first-order effects, and all possible interactions. Each design point is run until 100,000 items have been produced and the number of defectives (DEF) per 10,000 items is calculated. The results are:

Notation   A    B    C    AB   AC   BC   ABC  MEAN   DEF
1          -    -    -    +    +    +    -    +      18.1
a          +    -    -    -    -    +    +    +      19.9
b          -    +    -    -    +    -    +    +      12.3
ab         +    +    -    +    -    -    -    +      13.7
c          -    -    +    +    -    -    +    +      15.9
ac         +    -    +    -    +    -    -    +      18.1
bc         -    +    +    -    -    +    -    +      29.7
abc        +    +    +    +    +    +    +    +      32.3

Classical treatment of the data gives

$$
\text{MEAN} = \frac{18.1 + 19.9 + 12.3 + 13.7 + 15.9 + 18.1 + 29.7 + 32.3}{8} = 20.0 \tag{14.18}
$$

$$
A = \frac{19.9 + 13.7 + 18.1 + 32.3}{4} - \frac{18.1 + 12.3 + 15.9 + 29.7}{4} = 2.0 \tag{14.19}
$$

$$
B = \frac{12.3 + 13.7 + 29.7 + 32.3}{4} - \frac{18.1 + 19.9 + 15.9 + 18.1}{4} = 4.0 \tag{14.20}
$$

$$
C = \frac{15.9 + 18.1 + 29.7 + 32.3}{4} - \frac{18.1 + 19.9 + 12.3 + 13.7}{4} = 8.0 \tag{14.21}
$$

$$
AB = \frac{18.1 + 13.7 + 15.9 + 32.3}{4} - \frac{19.9 + 12.3 + 18.1 + 29.7}{4} = 0.0 \tag{14.22}
$$

$$
AC = \frac{18.1 + 12.3 + 18.1 + 32.3}{4} - \frac{19.9 + 13.7 + 15.9 + 29.7}{4} = 0.4 \tag{14.23}
$$

$$
BC = \frac{18.1 + 19.9 + 29.7 + 32.3}{4} - \frac{12.3 + 13.7 + 15.9 + 18.1}{4} = 10.0 \tag{14.24}
$$

$$
ABC = \frac{19.9 + 12.3 + 15.9 + 32.3}{4} - \frac{18.1 + 13.7 + 18.1 + 29.7}{4} = 0.2 \tag{14.25}
$$
The main effects all show a positive slope. Thus, the main effects (by themselves) suggest that the lower level of A (slower), the lower level of B (lighter pressure), and the lower level of C (fewer mils) will decrease the number of defectives. The weight of packaging material is predicted to have the greatest effect (8.0), the pressure on the heated sealing plate is predicted to have an intermediate effect (4.0), and the speed is predicted to have the smallest effect (2.0).

The economics of these results are undesirable for the recommendation regarding speed - running the machine at 2000 units per hour rather than 3000 units per hour would either take longer to package a fixed total number of units, or require more machines to package a fixed number of units per hour. Fortunately, only two more defectives per 10,000 items are produced at the faster speed, so a cost/benefit analysis would probably suggest running the machine at the faster speed.

The economics of the situation are desirable for factors B and C - lighter pressure would produce less wear on the machine and maintenance costs would be less; the lighter weight packaging material would presumably cost less.

However, an analysis of the main effects of B and C is insufficient in this example; the interaction effects must also be examined. The three-factor interaction ABC is small. The AB and AC interactions are small. But the BC interaction is very large, larger than the largest main effect. The mathematical calculation of the BC interaction suggests an interpretation of the BC effect: because it is the average response (defectives) of those factor combinations for which the product B×C is positive minus the average response of those factor combinations for which the product B×C is negative, a positive BC interaction effect suggests that fewer defectives should be produced at those factor combinations for which the product B×C is negative. The product B×C will be negative for B(+)C(-) and B(-)C(+).

However, neither of these conditions is consistent with the previously discussed main effects which suggest minimum defectives at B(-)C(-). Thus, a compromise is required. Because the main effect of C is greater than the main effect of B, fewer defectives will be produced when C is at its low level. Thus, the combination B(+)C(-) should produce the smallest number of defectives.

The cube plots for this experiment confirm the mathematical results. In the cube plot, the boxed factor combinations correspond to negative values of the BC interaction. Of these, the smallest numbers of defectives are found when the factor C is at its low level and B is at its high level (double boxes). Again, the main effect of A shows the fewest defectives at its lower (slower) level.

[Cube plot of the packaging experiment with axes A, B, and C: the factor combinations with a negative B×C product (b, ab, c, and ac) are boxed; b and ab (B high, C low) are double boxed.]

The mathematical treatment gives numbers that are consistent with the effects seen in the visual presentation of the cube plot. But both the numbers and the cube plot contain more information than just a simple description of the historical data. They suggest that to obtain even fewer defectives, the experimenters should consider running the machines with still lighter weight packaging film (factor C) and heavier pressure on the heated sealing plate (factor B). Under these extrapolated conditions it might even be possible to run the machines faster (perhaps 4000 units per hour) and still obtain very low numbers of defectives. The economics suggest that greater savings can be realized with the lighter weight material but more money would probably have to be spent on maintenance of the heated sealing plate. If the machines can be run at 4000 units per hour, then fewer machines would have to be purchased to achieve a fixed rate of output.
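The interpretation of the BC interaction can be made concrete by averaging the defectives over factor A for each combination of B and C. This is a minimal sketch using the DEF values of this section; the names `defectives`, `level`, `bc_means`, and `best` are illustrative assumptions.

```python
# Defectives per 10,000 items, keyed by combinatorial label.
defectives = {"1": 18.1, "a": 19.9, "b": 12.3, "ab": 13.7,
              "c": 15.9, "ac": 18.1, "bc": 29.7, "abc": 32.3}

def level(label, factor):
    """+1 if the lowercase factor letter appears in the label, else -1."""
    return +1 if factor in label else -1

# Group the responses by the (B, C) levels, pooling over factor A.
cells = {}
for label, value in defectives.items():
    cells.setdefault((level(label, "b"), level(label, "c")), []).append(value)

bc_means = {key: sum(values) / len(values) for key, values in cells.items()}

# BC effect: average where B*C is positive minus average where B*C is negative.
bc_effect = ((bc_means[(-1, -1)] + bc_means[(+1, +1)]) / 2
             - (bc_means[(+1, -1)] + bc_means[(-1, +1)]) / 2)

best = min(bc_means, key=bc_means.get)  # (B level, C level) with fewest defectives
print(bc_means, bc_effect, best)
```

The four cell means show directly why B(+)C(-) is the compromise: it has the lowest average, while B(+)C(+) has by far the highest.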
14.6 The Yates’ algorithm

The Yates’ algorithm is a formal procedure for estimating the β’s for full two-level factorial designs [Yates (1936)]. The Yates’ algorithm is related to the fast Fourier transform. We describe the Yates’ algorithm here, and illustrate its use for the $2^3$ full factorial design discussed in Section 14.2. The runs are first listed in “standard order” (i.e., a binary counting sequence, least significant bit on the right):
Notation   Std.    FACTOR       RES
           Order   C   B   A
1          1       -   -   -    45
a          2       -   -   +    75
b          3       -   +   -    49
ab         4       -   +   +    111
c          5       +   -   -    51
ac         6       +   -   +    89
bc         7       +   +   -    55
abc        8       +   +   +    125

Pass #1: A column is filled with sequential, pair-wise sums and differences of responses according to the following scheme of rows: {1+2}, {3+4}, {5+6}, {7+8}, {2-1}, {4-3}, {6-5}, {8-7}.

Pass #2: The same operation is carried out again, but on the results of Pass #1, not on the original responses.

Pass #3: The same operation is carried out again, but on the results of Pass #2.

Finally, the results of the third pass are divided by the number of experiments going into the averages for that effect. The results are the numerical values of the estimated factor effects.

Notation   Std.    FACTOR       RES    P #1   P #2   P #3   Div   Effect   Name
           Order   C   B   A
1          1       -   -   -    45     120    280    600    8     75       Mean
a          2       -   -   +    75     160    320    200    4     50       A
b          3       -   +   -    49     140    92     80     4     20       B
ab         4       -   +   +    111    180    108    64     4     16       AB
c          5       +   -   -    51     30     40     40     4     10       C
ac         6       +   -   +    89     62     40     16     4     4        AC
bc         7       +   +   -    55     38     32     0      4     0        BC
abc        8       +   +   +    125    70     32     0      4     0        ABC
Compare these results with those in Section 14.2.

The Yates’ algorithm is simple and can be adapted for use with other full factorial designs, with completely replicated full factorial designs, with fractional factorial designs, and with other grid designs. However, the Yates’ algorithm is not easy to use when data is missing. It is not easy to use when only a few points have been replicated and orthogonality is lost. In the early days of experimental design the Yates’ algorithm made hand calculations easier and minimized calculational errors. However, similar results can be obtained with modern regression analysis packages.
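The three passes described in this section can be sketched in a few lines of Python. This is a minimal sketch for a full two-level factorial with responses listed in standard order; the function name `yates` and the variable names are illustrative assumptions, not part of the original procedure.

```python
def yates(responses):
    """Apply the passes of the Yates' algorithm to responses in standard order.

    Expects n = 2**k responses and returns the column after k passes
    (the totals and contrasts, before division).
    """
    n = len(responses)
    k = n.bit_length() - 1
    if n < 2 or 2 ** k != n:
        raise ValueError("number of responses must be a power of two")
    column = list(responses)
    for _ in range(k):
        # Pair-wise sums fill the top half, pair-wise differences the bottom half.
        sums = [column[i] + column[i + 1] for i in range(0, n, 2)]
        diffs = [column[i + 1] - column[i] for i in range(0, n, 2)]
        column = sums + diffs
    return column

responses = [45, 75, 49, 111, 51, 89, 55, 125]  # 1, a, b, ab, c, ac, bc, abc
names = ["Mean", "A", "B", "AB", "C", "AC", "BC", "ABC"]

contrasts = yates(responses)
n = len(responses)
divisors = [n] + [n // 2] * (n - 1)  # 8 for the mean, 4 for every effect
effects = {name: c / d for name, c, d in zip(names, contrasts, divisors)}
print(contrasts)
print(effects)
```

Each pass halves the number of independent sums and doubles the number of differences, which is the same divide-and-combine structure that relates the procedure to the fast Fourier transform.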
14.7 Some comments about full factorial designs

Full factorial designs have been especially useful for describing the effects of qualitative factors, factors that are measured on nominal or ordinal scales. This environment of qualitative factors is where factorial designs originated. Because all possible factor combinations are investigated in a full design, the results using qualitative factors are essentially historical and have little, if any, predictive ability.

Full factorial designs are also useful for describing the effects of quantitative factors, factors that are measured on interval or ratio scales. This is how factorial designs are often used today. Because the factors are quantitative (e.g., 20 lbs or 40 lbs of catalyst, 1 hour or 2 hours reaction time), the results have some predictive ability (e.g., what would the results be with 50 lbs of catalyst and 2.5 hours reaction time?).

Factorial designs are often used in such a way that there are no degrees of freedom for residuals - the designs are said to be “saturated” in the sense that enough parameters are added to the model to use up all of the degrees of freedom available through the factor effects: i.e., p = f. Similarly, there is often no replication so it is impossible to estimate the purely experimental uncertainty. However, sometimes an estimate of $SS_r$ can be obtained by “pooling” - essentially eliminating from the model those effects that don’t appear to be significant and letting the variation that was ascribed to these effects appear in the residuals.

Full factorials are often praised because they can be used to reveal factor interactions - that is, they show that the effect of one factor is not constant but depends on the level of another factor. Many researchers feel satisfied when they discover such interaction effects and can describe these interactions quantitatively. However, as shown in Figure 14.3, in systems where curvature effects occur in addition to interaction effects, the interaction effect that will be observed usually depends on where the factorial was carried out in factor space.
14.8 Fractional replication

The degrees of freedom for lack of fit, f - p, must not be negative or the model cannot be fitted to the data (see Section 5.6 for example). However, it is possible to use all of the degrees of freedom from the factor combinations to estimate up to f parameters in a model. If p = f, there will be a “perfect fit”. The model usually fitted to data from a full $2^3$ factorial design is

$$y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_{12} x_{1i} x_{2i} + \beta_{13} x_{1i} x_{3i} + \beta_{23} x_{2i} x_{3i} + \beta_{123} x_{1i} x_{2i} x_{3i} + r_{1i}$$ (14.26)

where $\beta_0$ is the offset; the $\beta_1$, $\beta_2$, and $\beta_3$ terms are the first-order effects; the $\beta_{12}$, $\beta_{13}$, and $\beta_{23}$ terms are the two-factor interactions; the $\beta_{123}$ term is the three-factor interaction; and $r_{1i}$ is the residual.
Figure 14.3 Five two-factor two-level full factorial designs applied to different regions of a curved factor space.
Figure 14.4 A one-half replicate of the three-factor two-level full factorial design shown in Figure 14.2. The open circle locates the center of the design.
Statisticians seem to like the fact that p = f = 8 for this model and this design. The design is said to be “saturated” by the model (all of the available degrees of freedom are used up). If the two- and three-factor interactions are known or assumed to be negligible (in manufacturing or production, for example, where only a small portion of the larger response surface might be investigated), then the following model can be used:
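$$y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + r_{1i}$$ (14.27)

where $\beta_0$ is the offset, the next three terms are the first-order effects, and $r_{1i}$ is the residual.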
But now p < f and the design is not saturated. The model of Equation 14.27 requires only half the number of factor combinations available in a $2^3$ full factorial design. The design does not have 100% efficiency (efficiency = p/f). It is possible to selectively choose a subset of 4 of the original 8 factor combinations and use these to fit the reduced model with 100% efficiency. The resulting design is called a “fractional factorial design”. A full $2^3$ factorial design has two “half-replicates” as shown in Figures 14.4 and 14.5, or in cube plot form as:
Figure 14.5 The other one-half replicate of the three-factor two-level full factorial design shown in Figure 14.2. The open circle locates the center of the design. The reflection of this design is shown in Figure 14.4.

[Cube plots of the two half-replicates; axes A, B, and C.]

Note that one design can be turned into the other by reflection of each design point through the center of the design marked “0”. A one-half fractional two-level three-factor factorial design can be designated $\tfrac{1}{2} \times 2^3$ or $2^{3-1}$ ($\tfrac{1}{2} \times 2^3 = 2^{-1} \times 2^3 = 2^{3-1}$). Other fractionalizations are possible for more complicated designs [see, for example, Box and Hunter (1961)]. Consider the following $2^{3-1}$ fractional factorial design:
[Cube plot of the $2^{3-1}$ fractional factorial design; axes A and B.]
This design can be used to fit the model of Equation 14.27 with 100% efficiency. However, the first-order Koshal design [Koshal (1933), Kanemasu (1979), and Box and Draper (1987)] can also fit this model with 100% efficiency.

[Cube plot of the first-order Koshal (corner) design; axes A and B.]
In this corner design, if a mistake is made in the experiment at low levels of each factor, the estimation of all of the factor effects will be incorrect by the same amount. The fractional factorial design is generally considered to be superior to the corner design because of the averaging that takes place: in calculating the factor effects in the fractional factorial design, an error in any one point will be distributed in smaller proportion over all factor effects. As we will see in Section 14.11, fractional factorial designs are often used to look for important factors. If it turns out that one or more factors is unimportant (say factor $x_3$), then a fractional factorial design can be collapsed into a less fractional design in a lower-dimensional factor space. Figure 14.6 shows an example of collapsing the design.
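The averaging argument can be checked with a small calculation. The following is a minimal sketch under stated assumptions: the "true" first-order system and its coefficients are invented for illustration, the function names are hypothetical, and the quantities computed are regression slopes in coded units (half the size of the classical effects).

```python
# Two four-run designs that both saturate the four-parameter first-order model.
FRACTIONAL = [(-1, -1, -1), (+1, +1, -1), (+1, -1, +1), (-1, +1, +1)]  # runs 1, ab, ac, bc
CORNER = [(-1, -1, -1), (+1, -1, -1), (-1, +1, -1), (-1, -1, +1)]      # runs 1, a, b, c

def true_response(x):
    """Hypothetical noise-free first-order system (coefficients are made up)."""
    return 10.0 + 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[2]

def slopes_fractional(y):
    # Orthogonal design: each slope is a signed average over all four runs.
    return [sum(run[k] * yi for run, yi in zip(FRACTIONAL, y)) / 4.0 for k in range(3)]

def slopes_corner(y):
    # Corner design: each slope compares one run with the all-low run only.
    return [(y[k + 1] - y[0]) / 2.0 for k in range(3)]

def shift_from_bad_first_run(design, estimator, error=4.0):
    """Change in each slope when the all-low run is in error by `error`."""
    clean = [true_response(x) for x in design]
    bad = [clean[0] + error] + clean[1:]
    return [b - c for b, c in zip(estimator(bad), estimator(clean))]

fractional_shift = shift_from_bad_first_run(FRACTIONAL, slopes_fractional)
corner_shift = shift_from_bad_first_run(CORNER, slopes_corner)
print(fractional_shift, corner_shift)
```

With an error of 4 units in the all-low run, every slope from the corner design moves by 2 units, while every slope from the fractional factorial design moves by only 1 unit.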
14.9 Confounding

Consider the half-replicate taken from the following $2^3$ full factorial design (those experiments marked by arrows are taken):

    Response                       EFFECT
    Notation     A    B    C    AB   AC   BC   ABC  MEAN
    -> 1         -    -    -    +    +    +    -    +
       a         +    -    -    -    -    +    +    +
       b         -    +    -    -    +    -    +    +
    -> ab        +    +    -    +    -    -    -    +
       c         -    -    +    +    -    -    +    +
    -> ac        +    -    +    -    +    -    -    +
    -> bc        -    +    +    -    -    +    -    +
       abc       +    +    +    +    +    +    +    +
    divisor      4    4    4    4    4    4    4    8
Fractional replication causes confusion among the factor effects. This confusion is called “confounding” or “aliasing”. To see this, compare the signs in the columns in the fractional factorial design:
    Response                       EFFECT
    Notation     A    B    C    AB   AC   BC   ABC  MEAN
    1            -    -    -    +    +    +    -    +
    ab           +    +    -    +    -    -    -    +
    ac           +    -    +    -    +    -    -    +
    bc           -    +    +    -    -    +    -    +
The main effect of A would be calculated as

$$A = \frac{ab + ac}{2} - \frac{1 + bc}{2}$$ (14.28)

The BC interaction would be calculated as

$$BC = \frac{1 + bc}{2} - \frac{ab + ac}{2}$$ (14.29)
Thus, the calculation of the BC interaction would be identical in magnitude but opposite in sign to the A main effect. Because the four-experiment fractional factorial design does not have enough degrees of freedom to calculate all eight parameters, only the offset and first-order effects are usually considered. But if the assumption of no interactions is incorrect (e.g., if the fractional factorial design is being used in a research environment over a large domain of factor space where interactions might be important), then any existing BC interaction will show up in the A effect. In a similar way, the main effect of B is confounded with the AC interaction; the main
Figure 14.6 Collapsing the fractional factorial design of Figure 14.5 into the plane of factors A and B . See text for discussion.
effect of C is confounded with the AB interaction; and the offset term (MEAN) is confounded with the ABC three-factor interaction. With this fractional design, each main effect is confounded (or “aliased” or “inextricably intermingled” or “confused”) with a higher-order effect. We might say we are estimating A ($\beta_1$), for example, but the estimate is actually a combination of A and BC ($\beta_1$ and $\beta_{23}$). The results from Section 14.2 can be used to illustrate the effect of confounding:

    Response             EFFECT
    Notation     A    B    C    MEAN    RESP
    1            -    -    -    +         45
    ab           +    +    -    +        111
    ac           +    -    +    +         89
    bc           -    +    +    +         55
    divisor      2    2    2    4

$$A = \frac{111 + 89}{2} - \frac{45 + 55}{2} = 50 \qquad (A - BC = 50 - 0 = 50)$$ (14.31)

$$B = \frac{111 + 55}{2} - \frac{45 + 89}{2} = 16 \qquad (B - AC = 20 - 4 = 16)$$ (14.32)

$$C = \frac{89 + 55}{2} - \frac{45 + 111}{2} = -6 \qquad (C - AB = 10 - 16 = -6)$$ (14.33)
This demonstrates the confounding that takes place when a full factorial design is fractionalized: the main effects are no longer “pure”, but now include the effect of any existing interactions as well. Again, when using fractional factorial designs it is usually necessary to assume that the higher-order interaction effects are zero or negligible. If the higher-order interaction effects are actually non-zero, they will bias the results. Generally, the assumption of negligible interactions works well in manufacturing or production where only small changes in the factors are allowed and interactions are accordingly seldom seen. The assumption of negligible interactions is often a poor assumption in research and development where large changes are made in the factors and interactions are thereby more easily seen.
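The aliasing pattern can be verified directly from the four responses quoted in this section (45, 111, 89, and 55 for runs 1, ab, ac, and bc). The sketch below is illustrative only; the dictionary and function names are assumptions, not notation from the text.

```python
# Responses and factor signs (A, B, C) for the four runs kept in the half-replicate.
RESP = {"1": 45.0, "ab": 111.0, "ac": 89.0, "bc": 55.0}
SIGNS = {"1": (-1, -1, -1), "ab": (+1, +1, -1), "ac": (+1, -1, +1), "bc": (-1, +1, +1)}

def contrast(column):
    """Mean response at '+' minus mean response at '-' (two runs at each sign)."""
    return sum(column[run] * RESP[run] for run in RESP) / 2.0

def main_column(k):
    return {run: SIGNS[run][k] for run in RESP}

def interaction_column(j, k):
    # An interaction column is the element-by-element product of two main-effect columns.
    return {run: SIGNS[run][j] * SIGNS[run][k] for run in RESP}

A, B, C = (contrast(main_column(k)) for k in range(3))
BC = contrast(interaction_column(1, 2))
AC = contrast(interaction_column(0, 2))
AB = contrast(interaction_column(0, 1))
print(A, B, C)      # main-effect estimates from the half-replicate
print(BC, AC, AB)   # each is the negative of the main effect it is aliased with
```

Because the BC column is the exact negative of the A column in this half-replicate, the two contrasts cannot be distinguished; the same holds for B with AC and for C with AB.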
14.10 Blocking

Ideally, all factors other than the ones under study can be controlled and held constant. Then, because they don’t vary, they can’t cause variation in the response
(see Figure 12.6). In practice, not all factors other than the ones under study can be controlled. Because these factors vary, they will cause variation in the response. If the lack of control is unpredictable, then randomization offers some assurance that the variation in the response won’t systematically influence the results (see Section 15.2).

A block is a portion of the experimental material or of the experimental environment that is likely to be more homogeneous within itself than among different portions. For example, samples taken from a single lot of pharmaceutical product are likely to be more uniform than samples taken from different lots. A group of samples from one lot would be regarded as a block. Similarly, measurements taken on the same day are likely to be more uniform than measurements taken over several days. A group of measurements from one day would be regarded as a block.

Factorial designs are especially well suited for blocking. When a factorial design is broken up into blocks by fractionalization, the “block effect” must be assigned to, or confounded with, one of the effects normally obtained from the model. In practice, the interaction of least concern (usually the highest-order interaction) is sacrificed. To illustrate, we’ll use the ABC interaction to block the design of Section 14.6.

    Response                       EFFECT
    Notation     A    B    C    AB   AC   BC   ABC  MEAN     DEF
    1            -    -    -    +    +    +    -    +       18.1
    a            +    -    -    -    -    +    +    +       19.9
    b            -    +    -    -    +    -    +    +       12.3
    ab           +    +    -    +    -    -    -    +       13.7
    c            -    -    +    +    -    -    +    +       15.9
    ac           +    -    +    -    +    -    -    +       18.1
    bc           -    +    +    -    -    +    -    +       29.7
    abc          +    +    +    +    +    +    +    +       32.3
Collect into one block all runs for which ABC is “-”; collect into the other block all runs for which ABC is “+”:

             “-” block                          “+” block
    Response      EFFECT                Response      EFFECT
    Notation    A    B    C     DEF     Notation    A    B    C     DEF
    1           -    -    -    18.1     a           +    -    -    19.9
    ab          +    +    -    13.7     b           -    +    -    12.3
    ac          +    -    +    18.1     c           -    -    +    15.9
    bc          -    +    +    29.7     abc         +    +    +    32.3
Now, to illustrate the utility of blocking we will assume that the experiments weren’t done exactly as described before and we actually got results that are different than these. Because of the large number of items to be counted, two temporary workers (X and Y) had to be employed. Although workers X and Y received identical quality training on the operational definition of “defective,” it was noticed that worker X
didn’t seem to listen as well as worker Y. Because of this, it was suspected that the determination of defectives might be different for the two workers. To guard against this possibility, the design was blocked: the items that were produced as a result of the experiments in the (-) block on the left were counted by worker X; those on the right by worker Y. Assume that an item had to be grossly defective for worker X to count it as a defective; thus, this worker’s counts were relatively low by 3 per 10,000 items. Worker Y, however, was overly zealous and found “defectives” when, in fact, a defect did not exist; thus, this worker’s counts were relatively high by 7 per 10,000 items. The original results would then have been:

          “-” block (worker X)                “+” block (worker Y)
    Response      EFFECT                Response      EFFECT
    Notation    A    B    C     DEF     Notation    A    B    C     DEF
    1           -    -    -    15.1     a           +    -    -    26.9
    ab          +    +    -    10.7     b           -    +    -    19.3
    ac          +    -    +    15.1     c           -    -    +    22.9
    bc          -    +    +    26.7     abc         +    +    +    39.3
Recombining these blocks gives the original eight-experiment design:

    Response                       EFFECT
    Notation     A    B    C    AB   AC   BC   ABC  MEAN     DEF
    1            -    -    -    +    +    +    -    +       15.1
    a            +    -    -    -    -    +    +    +       26.9
    b            -    +    -    -    +    -    +    +       19.3
    ab           +    +    -    +    -    -    -    +       10.7
    c            -    -    +    +    -    -    +    +       22.9
    ac           +    -    +    -    +    -    -    +       15.1
    bc           -    +    +    -    -    +    -    +       26.7
    abc          +    +    +    +    +    +    +    +       39.3
$$\mathrm{MEAN} = \frac{15.1 + 26.9 + 19.3 + 10.7 + 22.9 + 15.1 + 26.7 + 39.3}{8} = 22.0$$ (14.34)

$$A = \frac{26.9 + 10.7 + 15.1 + 39.3}{4} - \frac{15.1 + 19.3 + 22.9 + 26.7}{4} = 2.0$$ (14.35)

$$B = \frac{19.3 + 10.7 + 26.7 + 39.3}{4} - \frac{15.1 + 26.9 + 22.9 + 15.1}{4} = 4.0$$ (14.36)

$$C = \frac{22.9 + 15.1 + 26.7 + 39.3}{4} - \frac{15.1 + 26.9 + 19.3 + 10.7}{4} = 8.0$$ (14.37)

$$AB = \frac{15.1 + 10.7 + 22.9 + 39.3}{4} - \frac{26.9 + 19.3 + 15.1 + 26.7}{4} = 0.0$$ (14.38)

$$AC = \frac{15.1 + 19.3 + 15.1 + 39.3}{4} - \frac{26.9 + 10.7 + 22.9 + 26.7}{4} = 0.4$$ (14.39)

$$BC = \frac{15.1 + 26.9 + 26.7 + 39.3}{4} - \frac{19.3 + 10.7 + 22.9 + 15.1}{4} = 10.0$$ (14.40)

$$ABC = \frac{26.9 + 19.3 + 22.9 + 39.3}{4} - \frac{15.1 + 10.7 + 15.1 + 26.7}{4} = 10.2$$ (14.41)
Note that the mean has changed by +2 compared to the results in Section 14.5. This is expected: if four of the results decreased by 3 (-12 total) and four of the results increased by 7 (+28 total), then the net change in the sum will be 28 - 12 = 16 which, when spread over 8 results, is an average change of +2. Note that the interaction effect ABC now includes (is confounded with) the block effect; it has increased from 0.2 to 10.2, an increase of 10. This is the increase in response between the average results for the “-” block (worker X) and the average results for the “+” block (worker Y): as expected, this goes from -3 to +7 for a change of 10. Thus, blocking has given us an opportunity to estimate the main effect of a fourth factor, the worker effect. However, all of this is minor compared with what has happened (or hasn’t happened!) to the other effects. The estimation of the other effects has not been influenced by the blocking factor! This is the whole point of blocking.
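The claim that only MEAN and ABC are disturbed by the block effect can be confirmed numerically. The sketch below uses the defective counts quoted in this section, taking the unblocked values to be the blocked values with the worker offsets (-3 and +7) removed; the function and variable names are assumptions made for the illustration.

```python
RUNS = ["1", "a", "b", "ab", "c", "ac", "bc", "abc"]
UNBLOCKED = dict(zip(RUNS, [18.1, 19.9, 12.3, 13.7, 15.9, 18.1, 29.7, 32.3]))
WORKER_BIAS = {-1: -3.0, +1: +7.0}   # worker X counts the "-" block, worker Y the "+" block

def sign(run, effect):
    """Sign of `run` in the column for `effect`, e.g. effect='ac'."""
    s = 1
    for factor in effect:
        s *= +1 if factor in run else -1
    return s

def estimate(data, effect):
    # Mean of the four "+" runs minus mean of the four "-" runs.
    return sum(sign(run, effect) * y for run, y in data.items()) / 4.0

# Assign each run to a block by the sign of its ABC column and add that worker's bias.
BLOCKED = {run: y + WORKER_BIAS[sign(run, "abc")] for run, y in UNBLOCKED.items()}

EFFECTS = ["a", "b", "c", "ab", "ac", "bc", "abc"]
change = {e: estimate(BLOCKED, e) - estimate(UNBLOCKED, e) for e in EFFECTS}
mean_change = (sum(BLOCKED.values()) - sum(UNBLOCKED.values())) / 8.0
print(change, mean_change)
```

Every column other than ABC contains two “+” and two “-” signs within each block, so a constant added to a whole block cancels out of those contrasts.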
14.11 Saturated fractional factorial designs and screening

Sieving (or screening) is a process that separates large from small. Suppose we are just beginning a research project and have no idea which of a large number of potentially important factors really are important. Wouldn’t it be nice if we could place all of these factors on a screen and sieve them so the unimportant factors fell through and only the important factors stayed on top? There is a class of experimental designs, called screening designs, that can be used to sieve factors. Behind almost all of these designs is an implicit linear model that is first-order in each factor. The model is

$$y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \ldots$$ (14.42)
Note that for k factors, the model will have k + 1 parameters - one for each of the factors being investigated plus one for the offset term ($\beta_0$). After this model has been fitted to the data obtained from a screening design, the $\beta$’s can be used to determine whether the factor effect is small and the factor can be discarded, or is large and the factor should be retained. Remember that a factor effect in a first-order model is a slope, $\partial y / \partial x$. A factor effect tells us how much power over the universe that factor gives us: that is, it tells us how much we can change y for a given change in x. Figure 14.7 shows graphically
the results of a screening design used to investigate four coded factors. It is assumed that the low (-1) and high (+1) levels cover an “experimentally relevant domain” (see Section 14.4). Clearly, the effects (slopes) of temperature and pH are small. The effects of pressure and catalyst concentration are much larger - the pressure effect is positive, but the catalyst effect is negative (maybe the catalyst is really an inhibitor). It looks like higher pressure and lower catalyst will produce better yield. If this project is continued, then temperature and pH can probably be held constant and not varied, while pressure and catalyst concentration can be studied in more detail with additional designed experiments. This reduction of four potentially useful factors to
[Figure 14.7 panels: response versus coded pressure, coded temperature, coded catalyst concentration, and coded pH, each plotted from -1 to +1.]
Figure 14.7 Graphical results of a screening design to detect the first-order effects of four factors.
only two useful factors greatly simplifies the further study of the system. The purpose of initial screening designs is to choose from a large number of potentially useful factors those few factors that probably exert the greatest influence on the system. Screening designs allow us to “pick the low-hanging fruit,” to use the most influential factors to make initial improvements in the system. Later, we might want to come back and reinvestigate some of the factors that were omitted initially. But for the time being, we will use those factors that give us the greatest power over the universe.

Most screening designs are based on saturated fractional factorial designs. The fractional factorial designs in Section 14.8 are said to be saturated by the first-order factor effects (parameters) in the four-parameter model (Equation 14.27). In other words, the efficiency E = p/f = 4/4 = 1.0. It would be nice if there were 100% efficient fractional factorial designs for any number of factors, but the algebra doesn’t work out that way. As suggested by Table 14.5, fractional factorials will give a saturated design for 3 factors ($2^{3-1} = f = p = 4$) and for 7 factors ($2^{7-4} = f = p = 8$), but full factorial designs cannot be fractionalized by powers of two to get 5, 6, or 7 factor combinations. Saturated fractional factorial designs exist only for numbers of factor combinations that are a power of two. If the design is to have 100% efficiency, then the number of factors must be a power of two, minus one for the $\beta_0$ term. The first six saturated fractional factorial designs are shown in Table 14.6. Table 14.7 shows a saturated fractional factorial design for k = 7 which can be used to supply data for estimating the seven main effects in the first-order model

$$y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_4 x_{4i} + \beta_5 x_{5i} + \beta_6 x_{6i} + \beta_7 x_{7i} + r_{1i}$$ (14.43)
Notice that this is an orthogonal design in coded factor space (-1 and +1): any one column multiplied by any other column will give a vector product of zero. Other saturated fractional factorial designs may be found in the literature [Box and Hunter (1961a, 1961b), Anderson and McLean (1974), Barker (1985), Bayne and Rubin (1986), Wheeler (1989), Diamond (1989)]. The saturated fractional factorial designs are satisfactory for exactly 3, or 7, or 15, or 31, or 63, or 127 factors, but if the number of factors is different from these, so-called “dummy factors” can be added to bring the number of factors up to the next largest saturated fractional factorial design. A dummy factor doesn’t really exist, but the experimental design and data treatment are allowed to think it exists. At the end of the data treatment, dummy factors should have very small factor effects that express the noise in the data. If the dummy factors have big effects, it usually indicates that the assumption of first-order behavior without interactions or curvature was wrong; that is, there is significant lack of fit. As an example of the use of dummy factors with saturated fractional factorial designs, suppose there are 11 factors to be screened. Just add four dummy factors and
TABLE 14.5 First-order linear models for 3-7 factors.
k Model
use the $2^{15-11}$ saturated fractional factorial design. It will have an efficiency of 12/16 or 75% (11 factor effects plus one offset term, all divided by 16 factor combinations). Now suppose there are 16 factors to be screened. We would have to add 15 dummy factors and use the $2^{31-26}$ saturated fractional factorial design, but this would give an efficiency of only 17/32 = 53%. This is not very efficient. Most researchers would rather eliminate one of their original 16 factors to give only 15 factors. There is a saturated fractional factorial design that will allow these factors to be screened in only 16 experiments.
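The dummy-factor bookkeeping in these examples is simple enough to automate. The following sketch, with hypothetical function names, finds the smallest saturated two-level fractional factorial that can hold k real factors and reports the number of dummy factors and the efficiency p/f.

```python
def saturated_fractional_runs(k):
    """Smallest power-of-two number of factor combinations f with f >= k + 1."""
    f = 4                      # smallest design considered here (3 factors in 4 runs)
    while f < k + 1:
        f *= 2
    return f

def screening_plan(k):
    """Return (runs f, dummy factors needed, efficiency p/f) for k real factors."""
    f = saturated_fractional_runs(k)
    dummies = (f - 1) - k      # a saturated design holds f - 1 factors
    efficiency = (k + 1) / f   # p = k factor effects plus one offset term
    return f, dummies, efficiency

for k in (11, 15, 16):
    print(k, screening_plan(k))
```

For 11 factors this gives 16 runs, four dummy factors, and 75% efficiency; for 16 factors it gives 32 runs, 15 dummy factors, and about 53% efficiency, in agreement with the figures quoted above.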
TABLE 14.6 The first six saturated fractional factorial designs.

      k     design            f
      3     $2^{3-1}$         4
      7     $2^{7-4}$         8
     15     $2^{15-11}$      16
     31     $2^{31-26}$      32
     63     $2^{63-57}$      64
    127     $2^{127-120}$   128
TABLE 14.7 A seven-factor saturated fractional factorial design.

                    Variable
    Run    A    B    C    D    E    F    G
    1      -    -    -    +    +    +    -
    2      +    -    -    -    -    +    +
    3      -    +    -    -    +    -    +
    4      +    +    -    +    -    -    -
    5      -    -    +    +    -    -    +
    6      +    -    +    -    +    -    -
    7      -    +    +    -    -    +    -
    8      +    +    +    +    +    +    +
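The orthogonality of the seven-factor design can be checked by construction. The sketch below assumes that the columns of Table 14.7 follow the generators D = AB, E = AC, F = BC, and G = ABC; this reading is consistent with the signs in the table, but the generators are not stated in the text, and the function name is hypothetical.

```python
from itertools import combinations, product

def design_2_7_4():
    """Eight runs of a 2^(7-4) design built from a full 2^3 design in A, B, C."""
    rows = []
    for c, b, a in product((-1, +1), repeat=3):   # A varies fastest, then B, then C
        # Assumed generators: D = AB, E = AC, F = BC, G = ABC.
        rows.append((a, b, c, a * b, a * c, b * c, a * b * c))
    return rows

design = design_2_7_4()

# Vector product of every pair of distinct columns.
column_products = {
    (i, j): sum(run[i] * run[j] for run in design)
    for i, j in combinations(range(7), 2)
}
print(design)
print(set(column_products.values()))
```

Every pair of columns has a vector product of zero, which is the property that lets each of the seven slopes be estimated independently of the others under the first-order model.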
14.12 Plackett-Burman designs

In 1946 R. L. Plackett and J. P. Burman showed how full factorial designs could be fractionalized in a different way to give numbers of factor combinations that are a multiple of four rather than a power of two. This has tremendous utility because the sequence of saturated fractional factorial designs can be augmented with the Plackett-Burman designs as shown in Table 14.8. Table 14.9 shows a Plackett-Burman design for determining the first-order effects of 11 factors in only 12 experiments. This is one of the cyclical Plackett-Burman designs: note that the first row is sequentially rotated to the right to give each succeeding row. The last experiment sets all factors at their low level. With the Plackett-Burman designs, at most only three dummy factors need to be added to reach a saturated design.

It is interesting that Plackett and Burman originally developed these designs for the purpose of testing the ruggedness of established processes, a purpose related to quality even back in 1946. In their original paper, the low level was represented by “1” instead of “-1” and the high level was represented by “2” instead of “+1.” The Plackett-Burman designs are also known as Hadamard matrix designs [Wheeler (1989)], $2^{k-p}$ fractional factorial designs [Box and Hunter (1961a, 1961b)], and Taguchi orthogonal arrays [Ross (1988)]. When Plackett-Burman designs are used, the next highest design is often chosen to supply
TABLE 14.8 The relationship of saturated fractional factorial and Plackett-Burman designs.

      k     design            f
      3     $2^{3-1}$         4
      7     $2^{7-4}$         8
     11     PB-11            12     <- 1 design fits here
     15     $2^{15-11}$      16
     19     PB-19            20
     23     PB-23            24     <- 3 designs fit here
     27     PB-27            28
     31     $2^{31-26}$      32
      .                             <- 7 designs fit here
     63     $2^{63-57}$      64
      .                             <- 15 designs fit here
    127     $2^{127-120}$   128
      .                             <- etc.
dummy factors that can provide degrees of freedom for assessing lack of fit of the model to the data. One difficulty with Plackett-Burman designs (and saturated fractional factorial designs in general) is that main effects are confounded with interactions, in particular with two-factor interactions. (The confounding scheme is beyond the scope of this presentation.) However, there is a way to remove the confounding between main effects and two-factor interactions by using the “foldover” or “reflection” design along with the original design [Box, Hunter, and Hunter (1978)]. There might still be confounding of main effects with higher-order interactions, but these are generally considered to be less important. Table 14.10 gives the foldover design that goes with the previous design in Table 14.9. The complete design is shown in Table 14.11.
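Both the cyclic construction and the effect of the foldover can be sketched in a few lines. The first row used below is an assumption (it is the row read here as run 1 of Table 14.9), and the function names are hypothetical; the check on aliasing simply sums the products of one main-effect column with one two-factor interaction column.

```python
from itertools import combinations

# Assumed first row for 11 factors; read here as the first run of Table 14.9.
FIRST_ROW = [+1, -1, +1, -1, -1, -1, +1, +1, +1, -1, +1]

def plackett_burman_12(first_row):
    rows, row = [], list(first_row)
    for _ in range(11):
        rows.append(row)
        row = [row[-1]] + row[:-1]        # rotate one place to the right
    rows.append([-1] * len(first_row))    # final run: every factor at its low level
    return rows

def foldover(design):
    """Reflect every run through the centre of the design."""
    return [[-level for level in run] for run in design]

def columns_orthogonal(design):
    k = len(design[0])
    return all(sum(run[i] * run[j] for run in design) == 0
               for i, j in combinations(range(k), 2))

def main_vs_two_factor_aliasing(design):
    """Largest |sum of x_i * x_j * x_m| over a main effect i and an interaction jm."""
    k = len(design[0])
    worst = 0
    for i in range(k):
        for j, m in combinations(range(k), 2):
            if i in (j, m):
                continue
            worst = max(worst, abs(sum(run[i] * run[j] * run[m] for run in design)))
    return worst

pb12 = plackett_burman_12(FIRST_ROW)
pb24 = pb12 + foldover(pb12)
print(columns_orthogonal(pb12), main_vs_two_factor_aliasing(pb12))
print(columns_orthogonal(pb24), main_vs_two_factor_aliasing(pb24))
```

In the 12-run design the main-effect columns are mutually orthogonal but are not orthogonal to the two-factor interaction columns; in the 24-run design with foldover those sums vanish, because each triple product changes sign in the reflected run.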
14.13 Taguchi ideas

Genichi Taguchi is an engineer who used screening designs to implement a small part of his very broad philosophy about quality improvement [Kackar (1985), Taguchi (1986), Ross (1988), Bendell, Disney, and Pridmore (1989)]. An 11-factor Taguchi design, called an $L_{12}$ orthogonal array, is shown in Table 14.12. Note that Taguchi designs use “1” to indicate the low level of a factor and “2” to indicate the high level of a factor. Most Taguchi arrays are identical to Plackett-Burman designs [Kackar, Lagergren, and Filliben (1991)]. The conversion of the Taguchi $L_{12}$ orthogonal array to the Plackett-Burman 11-factor fractional factorial design is outlined in Exercise 14.7. Because experiments can be arbitrarily assigned to rows, and because factors can be arbitrarily assigned to columns, the Taguchi $L_{12}$ orthogonal array and the Plackett-Burman 11-factor fractional factorial design are identical.

TABLE 14.9 A Plackett-Burman design for 11 factors in 12 runs.

                           Variable
    Run    1   2   3   4   5   6   7   8   9  10  11
     1     +   -   +   -   -   -   +   +   +   -   +
     2     +   +   -   +   -   -   -   +   +   +   -
     3     -   +   +   -   +   -   -   -   +   +   +
     4     +   -   +   +   -   +   -   -   -   +   +
     5     +   +   -   +   +   -   +   -   -   -   +
     6     +   +   +   -   +   +   -   +   -   -   -
     7     -   +   +   +   -   +   +   -   +   -   -
     8     -   -   +   +   +   -   +   +   -   +   -
     9     -   -   -   +   +   +   -   +   +   -   +
    10     +   -   -   -   +   +   +   -   +   +   -
    11     -   +   -   -   -   +   +   +   -   +   +
    12     -   -   -   -   -   -   -   -   -   -   -
A major part of Taguchi’s philosophy centers around the separation of effects caused by noise factors and effects caused by control factors. Control factors are variables that are under the control of the experimenter: flow rate, pH, concentration, reactor temperature, etc. In contrast, noise factors are variables that are not under the control of the experimenter: ambient temperature, ambient humidity, identity of process operator (Joe or Jane), source of raw material, etc. Control factors are sometimes called process factors; noise factors are sometimes called environmental factors.

Taguchi uses fractional factorial designs to determine the first-order effects of both the control factors and the noise factors, but he separates the factors and the designs into an “inner array” (involving the control factors only) and an “outer array” (involving the noise factors only). The concept is shown geometrically in Figure 14.8 for three control factors and three noise factors. The inner array is the large fractional factorial design shown in the center of

TABLE 14.10 The foldover design for the Plackett-Burman design in Table 14.9.

                           Variable
    Run    1   2   3   4   5   6   7   8   9  10  11
     1     -   +   -   +   +   +   -   -   -   +   -
     2     -   -   +   -   +   +   +   -   -   -   +
     3     +   -   -   +   -   +   +   +   -   -   -
     4     -   +   -   -   +   -   +   +   +   -   -
     5     -   -   +   -   -   +   -   +   +   +   -
     6     -   -   -   +   -   -   +   -   +   +   +
     7     +   -   -   -   +   -   -   +   -   +   +
     8     +   +   -   -   -   +   -   -   +   -   +
     9     +   +   +   -   -   -   +   -   -   +   -
    10     -   +   +   +   -   -   -   +   -   -   +
    11     +   -   +   +   +   -   -   -   +   -   -
    12     +   +   +   +   +   +   +   +   +   +   +
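TABLE 14.11 An 11-factor Plackett-Burman design with foldover.

                           Variable
    Run    1   2   3   4   5   6   7   8   9  10  11
     1     +   -   +   -   -   -   +   +   +   -   +
     2     +   +   -   +   -   -   -   +   +   +   -
     3     -   +   +   -   +   -   -   -   +   +   +
     4     +   -   +   +   -   +   -   -   -   +   +
     5     +   +   -   +   +   -   +   -   -   -   +
     6     +   +   +   -   +   +   -   +   -   -   -
     7     -   +   +   +   -   +   +   -   +   -   -
     8     -   -   +   +   +   -   +   +   -   +   -
     9     -   -   -   +   +   +   -   +   +   -   +
    10     +   -   -   -   +   +   +   -   +   +   -
    11     -   +   -   -   -   +   +   +   -   +   +
    12     -   -   -   -   -   -   -   -   -   -   -
    13     -   +   -   +   +   +   -   -   -   +   -
    14     -   -   +   -   +   +   +   -   -   -   +
    15     +   -   -   +   -   +   +   +   -   -   -
    16     -   +   -   -   +   -   +   +   +   -   -
    17     -   -   +   -   -   +   -   +   +   +   -
    18     -   -   -   +   -   -   +   -   +   +   +
    19     +   -   -   -   +   -   -   +   -   +   +
    20     +   +   -   -   -   +   -   -   +   -   +
    21     +   +   +   -   -   -   +   -   -   +   -
    22     -   +   +   +   -   -   -   +   -   -   +
    23     +   -   +   +   +   -   -   -   +   -   -
    24     +   +   +   +   +   +   +   +   +   +   +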
Figure 14.8 Taguchi concepts of inner and outer arrays. See text for discussion.
Figure 14.8. The inner array exists in the three-dimensional space of control factors $x_1$, $x_2$, and $x_3$; these might represent temperature, flow rate, and pH, respectively. The outer array is represented by the small fractional factorial designs shown at each factor combination of the inner array. It is important to understand that this outer array does not exist in the control factor space - it exists in a separate three-dimensional space of noise factors, designated for our purposes here as $z_1$, $z_2$, and $z_3$; these might represent ambient humidity, source of raw material, and identity of process operator.

Suppose viscosity, $y_{1j}$, is the quality response of interest. Then at one of the four factor combinations in the inner array (i.e., in the control factor space), experiments can be carried out at the four factor combinations in the outer array (i.e., in the noise factor space). Frequently the outer array experiments are “adventitious experiments” in the sense that the experimenter has to wait until, say, the ambient humidity approaches the desired value; then the appropriate source of raw material and the appropriate process operator can be brought in and the experiment can be carried out. Once the four experiments in the outer array have been completed, the following model can be fitted to the data:

$$y_{1j} = \alpha_0 + \alpha_1 z_{1j} + \alpha_2 z_{2j} + \alpha_3 z_{3j} + r_{1j}$$ (14.44)
The parameter estimates $a_1$, $a_2$, and $a_3$ represent the noise factor effects. In this example, they represent how much influence the ambient humidity, source of raw material, and name of person running the process have on the viscosity. Ideally, for a rugged system, we would like $a_1$, $a_2$, and $a_3$ to be zero. Taguchi realized that the effects of the noise factors might depend on the settings of the control factors. The four experiments of the outer array described in the
previous paragraph are now repeated for the other three factor combinations in the inner array. The model represented by Equation 14.44 can be fitted to each of these three additional sets of data. At this point, 16 experiments will have been carried out: four outer array experiments at each of the four factor combinations of the inner array, and four models of the form shown in Equation 14.44 will have been fitted. Taguchi’s great insight is summarized by the following model:

$$a_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + r_{1i}$$ (14.45)
It states that the effect of the ambient humidity is a function of the control factors! This information can be tremendously useful because it offers a quantitative means of adjusting the control factors $x_1$, $x_2$, and $x_3$ in such a way as to cause the value of $a_1$ to approach zero. In short, there might be a combination of the control factors that will cause the system to be rugged with respect to ambient humidity. The same type of model can be used to describe how the effects of the source of raw material and the identity of the process operator are in turn affected by the control factors:

$$a_{2i} = \beta_0' + \beta_1' x_{1i} + \beta_2' x_{2i} + \beta_3' x_{3i} + r_{1i}$$ (14.46)

$$a_{3i} = \beta_0'' + \beta_1'' x_{1i} + \beta_2'' x_{2i} + \beta_3'' x_{3i} + r_{1i}$$ (14.47)
With luck, there might be a combination of $x_1$, $x_2$, and $x_3$ that will cause the noise factor effects to all go to zero. In practice, some compromise usually must be made.

Taguchi’s use of fractional factorial designs is innovative and powerful when applied to appropriate problems. Other workers, however, have used Taguchi designs indiscriminately with mixed success. Taguchi arrays work well for improving processes that are in manufacturing or production; this is because the control factors have a very limited domain of variation about the set point and the first-order assumption of the underlying fractional factorial designs is usually satisfied - curvature and interaction are seldom seen over a small patch on the side of a hill. Taguchi arrays don’t always work well for improving processes that are still in research and development; this is because the control factors have a very wide domain of variation and the first-order assumption of the fractional factorial designs is usually not satisfied - curvature and interaction are frequently seen over the broader response surface [S. Deming (1985)].
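The two-stage fitting described in this section can be sketched with a small simulation. Everything specific in the example is an assumption made for illustration: the viscosity function (in which the humidity effect depends on control factor $x_2$), the particular half-fraction used for both arrays, and the function names.

```python
# One half-replicate of the 2^3 design, used here for both the inner and outer arrays.
HALF_FRACTION = [(-1, -1, +1), (+1, -1, -1), (-1, +1, -1), (+1, +1, +1)]

def fit_first_order(design, y):
    """Saturated first-order fit on an orthogonal four-run design: [b0, b1, b2, b3]."""
    b0 = sum(y) / 4.0
    return [b0] + [sum(run[k] * yi for run, yi in zip(design, y)) / 4.0 for k in range(3)]

def viscosity(x, z):
    """Hypothetical system: the humidity (z1) effect depends on control factor x2."""
    return 50.0 + 3.0 * x[0] + (1.0 + 1.0 * x[1]) * z[0] + 0.5 * z[1]

# Stage 1: at each inner-array point, fit the noise-factor model to the outer array.
noise_fits = []
for x in HALF_FRACTION:                               # inner array (control factors)
    y = [viscosity(x, z) for z in HALF_FRACTION]      # outer array (noise factors)
    noise_fits.append(fit_first_order(HALF_FRACTION, y))

# Stage 2: treat the humidity effect a1 as a response and model it in the control factors.
a1 = [fit[1] for fit in noise_fits]
beta = fit_first_order(HALF_FRACTION, a1)
print(a1)
print(beta)
```

In this made-up system the second-stage fit shows that the humidity effect is governed by $x_2$, and that running $x_2$ at its low level would make the process insensitive to humidity.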
TABLE 14.12 A Taguchi orthogonal array.

                           Variable
    Run    1   2   3   4   5   6   7   8   9  10  11
     1     1   1   1   1   1   1   1   1   1   1   1
     2     1   1   1   1   1   2   2   2   2   2   2
     3     1   1   2   2   2   1   1   1   2   2   2
     4     1   2   1   2   2   1   2   2   1   1   2
     5     1   2   2   1   2   2   1   2   1   2   1
     6     1   2   2   2   1   2   2   1   2   1   1
     7     2   1   2   2   1   1   2   2   1   2   1
     8     2   1   2   1   2   2   2   1   1   1   2
     9     2   1   1   2   2   2   1   2   2   1   1
    10     2   2   2   1   1   1   1   2   2   1   2
    11     2   2   1   2   1   2   1   1   1   2   2
    12     2   2   1   1   2   1   2   1   2   2   1

14.14 Latin square designs

In some applications, Latin square designs can be thought of as fractional three-level factorial designs that allow the estimation of one main factor effect while
simultaneously blocking against the effects of two other factors. As with most other highly fractional factorial designs, it is necessary to assume that no interactions exist among the three factors [Dunn and Clark (1987)]. Latin squares are usually used when the factors are expressed on nominal scales, and when models of the type discussed in Section 15.5 can be used with classical analysis of variance. However, if the factors are expressed on ordinal, interval, or ratio scales; if the interval between levels is the same; and if first-order behavior can be assumed, then Latin squares can provide information to fit other types of linear models. For a more thorough discussion of Latin square designs see Hunter (1989). As an example of the use of a Latin square as a fractional factorial design, suppose we want to find out the effect of increasing concentrations of a chemical added to retain gloss in an industrial paint formulation. The model is
$$y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i}$$ (14.48)
where $\beta_1$ expresses the effect of the additive. Three paints are formulated with low (2 grams per gallon), middle (4 g/gal), and high (6 g/gal) concentrations of gloss retainer. However, the evaluation of these three paints must be done on a single large metal test panel that will be erected vertically and perpendicular to a nearby desert highway. It is felt that the portion of the panel nearest the highway will experience more abrasion from rocks and gravel thrown up by passing cars than will the portion of the panel farthest from the highway. Similarly, it is felt that the portion of the panel closest to the ground will experience more abrasion from sand and debris thrown against it from blowing wind than will the portion of the panel farthest from the ground. Thus, we don't want the evaluation of the gloss retainer to be confused by the rock and gravel abrasion (the rock factor), and we don't want the evaluation of the gloss retainer to be confused by the sand and debris abrasion (the sand factor).

How can we distribute the three paints across the single panel to achieve these goals? Using three vertical stripes would confuse the additive factor with the rock factor. Using three horizontal stripes would confuse the additive factor with the sand factor. Fortunately, if we divide the panel into nine smaller panels, a Latin square can be used to block against the influence of these two troublesome factors.

Let the concentrations of the gloss retaining additive (factor $x_1$) be designated -1 (low = L), 0 (mid = M), and +1 (high = H). Let the rock factor (factor $x_2$) be assumed to run horizontally across the panel and let the levels be designated -1 (farthest from the highway), 0 (mid), and +1 (closest to the highway). Let the sand factor (factor $x_3$) be assumed to run vertically and let the levels be designated -1 (top), 0 (mid), and +1 (bottom).
The following Latin square allocation of three paints to a 3x3 grid of panels is such that each paint appears once and only once at each level of the rock factor, and each paint appears once and only once at each level of the sand factor:
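The experimental design matrix for the above nine panels (experiments), with columns $x_1$, $x_2$, and $x_3$, is

$$D = \begin{bmatrix} -1 & -1 & -1 \\ -1 & 0 & +1 \\ -1 & +1 & 0 \\ 0 & -1 & +1 \\ 0 & 0 & 0 \\ 0 & +1 & -1 \\ +1 & -1 & 0 \\ +1 & 0 & -1 \\ +1 & +1 & +1 \end{bmatrix}$$ (14.49)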
(14.49)
Although the original model is given by Equation 14.48, let us consider an expanded linear model that includes terms for the rock effect and the sand effect as well:

$$y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + r_{1i}$$ (14.50)
The X matrix that corresponds to Equation 14.50 is derived from the design matrix given in Equation 14.49 by adding an initial column of ones. Then

$$(X'X) = \begin{bmatrix} 9 & 0 & 0 & 0 \\ 0 & 6 & 0 & 0 \\ 0 & 0 & 6 & 0 \\ 0 & 0 & 0 & 6 \end{bmatrix}$$ (14.51)
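Thus,

$$(X'X)^{-1} = \begin{bmatrix} 1/9 & 0 & 0 & 0 \\ 0 & 1/6 & 0 & 0 \\ 0 & 0 & 1/6 & 0 \\ 0 & 0 & 0 & 1/6 \end{bmatrix}$$ (14.52)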
It is clear from this $(X'X)^{-1}$ matrix that the design is orthogonal for the model of Equation 14.50 and the effect of gloss retaining additive ($x_1$) will be estimated separately from the effects of rocks ($x_2$) and sand ($x_3$).
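The orthogonality of a Latin square for the first-order model can be confirmed by forming $X'X$ directly. The allocation used below is one 3x3 Latin square consistent with the description in this section; the particular assignment of paints to panels and the order of the rows are assumptions, and the function name is hypothetical.

```python
# Rows are (x1, x2, x3) = (additive level, rock position, sand position).
# Assumed allocation: any 3x3 Latin square gives the same (X'X) matrix.
DESIGN = [
    (-1, -1, -1), (-1, 0, +1), (-1, +1, 0),
    (0, -1, +1), (0, 0, 0), (0, +1, -1),
    (+1, -1, 0), (+1, 0, -1), (+1, +1, +1),
]

def is_latin_square(design):
    """Each pair of factors must show all nine level combinations exactly once."""
    def pairs(a, b):
        return {(run[a], run[b]) for run in design}
    return all(len(pairs(a, b)) == 9 for a, b in ((0, 1), (0, 2), (1, 2)))

# X is the design matrix with an initial column of ones for the offset term.
X = [(1,) + run for run in DESIGN]
XtX = [[sum(row[i] * row[j] for row in X) for j in range(4)] for i in range(4)]

# Because (X'X) is diagonal, its inverse is just the reciprocal of each diagonal element.
XtX_inv_diag = [1.0 / XtX[i][i] for i in range(4)]
print(is_latin_square(DESIGN))
print(XtX)
print(XtX_inv_diag)
```

All off-diagonal elements of $X'X$ are zero, so the estimate of the additive effect is unaffected by whatever first-order rock and sand effects are present.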
Exercises

14.1 Two-factor two-level full factorial designs.
Calculate the “grand average” (MEAN), the two classical main effects (A and B), and the single two-factor interaction (AB) for the two-factor two-level full factorial design shown in the “square plot” in Section 14.1. (Assume coded factor levels of -1 and +1.) What is the equivalent four-parameter linear model expressing $y_1$ as a function of $x_1$ and $x_2$? Use matrix least squares (regression analysis) to fit this linear model to the data. How are the classical factor effects and the regression factor effects related? Draw the sums of squares and degrees of freedom tree. How many degrees of freedom are there for $SS_r$, $SS_{lof}$, and $SS_{pe}$?

14.2 Three-factor two-level full factorial designs.
Calculate the grand average (MEAN), the three classical main effects (A, B, and C), the three two-factor interactions (AB, AC, and BC), and the single three-factor interaction (ABC) for the $2^3$ full factorial design shown in the “cube plot” in Section 14.1. (Assume coded factor levels of -1 and +1.) What is the equivalent eight-parameter linear model expressing $y_1$ as a function of $x_1$, $x_2$, and $x_3$? Use matrix least squares (regression analysis) to fit this linear model to the data. How are the classical factor effects and the regression factor effects related? Draw the sums of squares and degrees of freedom tree. How many degrees of freedom are there for $SS_r$, $SS_{lof}$, and $SS_{pe}$? Use matrix least squares (regression analysis) to fit the linear model $y_{1i} = b_0^* + b_1^* x_{1i}^* + b_2^* x_{2i}^* + b_3^* x_{3i}^* + r_{1i}$ to the data. How are the least squares parameter estimates in the eight-parameter model related to the parameter estimates in this four-parameter model? Why? Draw the sums of squares and degrees of freedom tree. How many degrees of freedom are there for $SS_r$, $SS_{lof}$, and $SS_{pe}$? Why?

14.3 Calculation of $(X'X)^{-1}(X')$.
Show by direct calculation that the pseudo-inverse $(X'X)^{-1}(X')$ is equivalent to the transpose of the X matrix for the design and model represented by Table 14.3 (see Equation 14.6).

14.4 Classical vs. regression factor effects.
What clue is there in Equation 14.7 that suggests that there will be a difference between the classical calculation of factor effects and the regression analysis calculation of factor effects?

14.5 Second-order vs. first-order factor effects.
In a set of experiments, $x_1$ is temperature expressed in degrees Celsius and is varied between 0°C and 100°C. Fitting a full second-order polynomial in one factor to the experimental data gives the fitted model $y_{1i} = 10.3 + 1.4x_{1i} + 0.0927x_{1i}^2 + r_{1i}$. The second-order parameter estimate $b_{11}$ is much smaller than the first-order parameter estimate $b_1$. How important is the second-order term compared to the first-order term when the temperature changes from 0°C to 1°C? How important is the second-order term compared to the first-order term when temperature changes from 99°C to 100°C? Should the second-order term be dropped from the model if it is necessary to predict response near the high end of the temperature domain?

14.6 Plackett-Burman and saturated fractional factorial designs.
Using row and column operations, convert the following 7-factor Plackett-Burman design to the saturated fractional factorial design shown in Table 14.7.

[Table: 8-run Plackett-Burman design. Rows are runs 1 to 8, columns are variables A to G, and each entry is + or -. The entries are not legible in this copy.]
Hint: row 8 of this design and row 8 of the saturated fractional factorial design in Table 14.7 suggest that the reflection or foldover must be carried out first. Repetitions of switching one row with another, and switching one column with another, will eventually yield the desired result. Retain the identities of the rows and columns. Remember that these row and column operations are the same as renumbering the experiments (arbitrary, anyway) and re-assigning the factor identities (arbitrary, anyway).

14.7 Plackett-Burman and Taguchi designs.
Copy each row labeled i in the following Taguchi array into the run of the next array, changing “1” to “-” and “2” to “+” as you do so.

[Table: Taguchi array with entries 1 and 2, followed by a blank Run × Variable array to be filled in. The entries and the row and column labels are not legible in this copy.]

Now copy the column labeled j in the above array into the variable of the following array.

[Table: blank Run × Variable array to be filled in. The labels are not legible in this copy.]

What is this resulting array? Think about what this means. See Kackar, Lagergren, and Filliben (1991).
14.8 Latin square design as a fractional factorial design.
Assume the gloss retention responses associated with Equation 14.49 are (in order) 98, 84, 70, 106, 92, 90, 114, 112, and 98. Using matrix least squares and the model of Equation 14.50, what is the estimated effect of the additive ($b_1$)? Using matrix least squares and the model of Equation 14.48, what is the effect of the additive? Are the estimates the same using the two models or are the estimates different? Why?
14.9 Classical vs. regression factor effects.
In Section 14.3, a coding of -1 and +1 gave linear model main effects ($b_1^*$, $b_2^*$, and $b_3^*$) that differed by a factor of 1/2 from the classical main effects (A, B, and C). If the coding had been -2 and +2 instead, by how much would they have differed? In Section 14.3, a coding of -1 and +1 gave linear model interaction effects ($b_{12}^*$, $b_{13}^*$, and $b_{23}^*$) that differed by a factor of 1/2 from the classical interaction effects (AB, AC, and BC). If the coding had been -2 and +2 instead, by how much would they have differed? With -2 and +2 coding, by how much would the three-factor linear model interaction effect ($b_{123}^*$) differ from the classical three-factor interaction effect (ABC)?

14.10 Yates' algorithm.
Use the Yates' algorithm to calculate the results for Table 14.2.
CHAPTER 15
Additional Multifactor Concepts and Experimental Designs
In this chapter we discuss the multifactor concepts of confounding and randomization. The ideas underlying these concepts are then used to develop experimental designs for discrete or qualitative variables.
15.1 Confounding

Consider the situation of a researcher who believes that the rate of an enzyme catalyzed reaction is affected not only by factors such as temperature, substrate concentration, and pH (see Section 11.1), but also by the concentration of sodium ion ([Na+]) in solution with the enzyme. To investigate this hypothesis, the researcher designs a set of experiments in which all factors are kept constant but one: the concentration of sodium ion is varied from 0 to 10 millimolar (mM) according to the design matrix

$$D = \begin{bmatrix} 0 \\ 1 \\ 2 \\ 3 \\ 4 \\ 5 \\ 6 \\ 7 \\ 8 \\ 9 \\ 10 \end{bmatrix} \tag{15.1}$$
That is, in the first experiment there is no added sodium ion; in the second experiment, [Na+] = 1 mM; in the third experiment, [Na+] = 2 mM; and so on, until in the eleventh and last experiment, [Na+] = 10 mM. Carrying out these experiments (one every ten minutes), the researcher obtains the response matrix

$$Y = \begin{bmatrix} 96 \\ 89 \\ 84 \\ 81 \\ 75 \\ 70 \\ 65 \\ 61 \\ 54 \\ 49 \\ 46 \end{bmatrix} \tag{15.2}$$

The fitted two-parameter straight line model is

$$y_{1i} = 95 - 5x_{1i} + r_{1i} \tag{15.3}$$

or

$$\text{Rate} = 95 - 5\,[\text{Na}^+]_{\text{mM}} + r_{1i} \tag{15.4}$$
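The straight line fit can be reproduced with a few lines of matrix least squares. The following is a minimal sketch using NumPy (the variable names are illustrative choices, not the book's notation); it fits the two-parameter model to the design of Equation 15.1 and the responses of Equation 15.2.

```python
import numpy as np

na = np.arange(11, dtype=float)   # [Na+] in mM, Equation 15.1
rate = np.array([96, 89, 84, 81, 75, 70, 65, 61, 54, 49, 46], dtype=float)

# X matrix for the straight line model: a column of ones plus the factor.
X = np.column_stack([np.ones_like(na), na])

# Least squares estimates of the intercept and slope.
b, *_ = np.linalg.lstsq(X, rate, rcond=None)
b0, b1 = b
print(b0, b1)   # intercept close to 95, slope close to -5
```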
The data and fitted model are shown in Figure 15.1. The parameter estimate $b_1$ is highly significant. Based on this information, the researcher would probably conclude that the concentration of sodium ion does have an effect on the rate of the enzyme catalyzed reaction. However, this conclusion might be wrong because of the lurking presence of a highly correlated, masquerading factor, time (see Section 1.2).

A better description of the design matrix would involve not only the concentration of sodium ion ($x_1$, mM), but also the time at which each experiment was carried out ($x_2$, min). If we begin our measurement of time with the first experiment, then the design matrix for the previous set of experiments would be

$$D = \begin{bmatrix} 0 & 0 \\ 1 & 10 \\ 2 & 20 \\ 3 & 30 \\ 4 & 40 \\ 5 & 50 \\ 6 & 60 \\ 7 & 70 \\ 8 & 80 \\ 9 & 90 \\ 10 & 100 \end{bmatrix} \tag{15.5}$$
The experimental design in two-dimensional factor space is shown in Figure 15.2. A simple model that would account for both [Na+] and time is

$$y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + r_{1i} \tag{15.6}$$

The (X'X) matrix is given by

$$(X'X) = \begin{bmatrix} 11 & 55 & 550 \\ 55 & 385 & 3850 \\ 550 & 3850 & 38500 \end{bmatrix} \tag{15.7}$$

Figure 15.1 Graph of the fitted model $y_{1i} = 95 - 5x_{1i}$ with experimental data.

Figure 15.2 Factor combinations for the highly correlated experimental design of Equation 15.5.
Calculation of $(X'X)^{-1}$ is not possible, however, because the determinant of Equation 15.7 is zero. Thus, there are an infinite number of solutions to this problem, two of which are

$$y_{1i} = 95 - 5x_{1i} + 0x_{2i} + r_{1i} \tag{15.8}$$

and

$$y_{1i} = 95 + 0x_{1i} - 0.5x_{2i} + r_{1i} \tag{15.9}$$

There is no unique combination of $b_1$ and $b_2$ that satisfies the condition of least squares; all combinations of $b_1$ and $b_2$ such that $b_1 = -(0.5 + b_2)/0.1$ will produce a minimum sum of squares of residuals. The reason for this difficulty is that we are trying to fit a planar response surface (Equation 15.6) to data that have been obtained in a line (see Figure 15.2). An infinite number of planes can be made to pass through this line; therefore, an infinite number of combinations of $b_1$ and $b_2$ satisfy Equation 15.6. (See Sections 4.4 and 5.6 where the analogous difficulty of fitting a straight line to a point was discussed.) Figure 15.3 shows the graph of Equation 15.8; Figure 15.4 is the graph of Equation 15.9. Both give equally good fits to the data.

Our researcher viewed the system as revealed by Equation 15.8 and Figure 15.3 ([Na+] responsible for the change in rate). However, Equation 15.9 and Figure 15.4 might be correct instead; the enzyme could be denaturing with time - that is, changing its structure and therefore losing its ability to catalyze the reaction. It might also be true that both effects are taking place ([Na+] could have an effect and the enzyme could be denaturing). We are hopelessly confused about which factor is causing the observed effect. We are confounded. The factors $x_1$ and $x_2$ are said to be confounded with each other. The experimental design is responsible for this confusion because the two factors, [Na+] and time, are so highly correlated.
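The confounding described above can be demonstrated numerically. The sketch below (variable and function names are illustrative assumptions) builds the X matrix for the design of Equation 15.5, evaluates the determinant of $(X'X)$ exactly with integer arithmetic, and shows that Equations 15.8 and 15.9, together with a third arbitrarily chosen pair of estimates on the line $b_1 = -(0.5 + b_2)/0.1$, all give the same sum of squares of residuals.

```python
import numpy as np

na = np.arange(11, dtype=float)      # x1: [Na+], mM
t = 10.0 * na                        # x2: time, min (Equation 15.5)
rate = np.array([96, 89, 84, 81, 75, 70, 65, 61, 54, 49, 46], dtype=float)

X = np.column_stack([np.ones_like(na), na, t])
XtX = X.T @ X

# All entries of X'X are whole numbers, so the 3x3 determinant can be
# evaluated exactly with Python integers (cofactor expansion).
A = [[int(round(v)) for v in row] for row in XtX]
det = (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
       - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
       + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))
print(det)   # 0: (X'X) cannot be inverted

def ss_resid(b):
    """Sum of squared residuals for parameter estimates b = (b0, b1, b2)."""
    r = rate - X @ np.asarray(b, dtype=float)
    return float(r @ r)

ss_a = ss_resid([95.0, -5.0, 0.0])    # Equation 15.8
ss_b = ss_resid([95.0, 0.0, -0.5])    # Equation 15.9
ss_c = ss_resid([95.0, -2.0, -0.3])   # another point on b1 = -(0.5 + b2)/0.1
print(ss_a, ss_b, ss_c)               # all three agree
```

Because the time column is exactly ten times the [Na+] column, only the combination $b_1 + 10b_2$ is determined by the data.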
15.2 Randomization

If we could somehow destroy the high correlation between [Na+] and time (see Figure 15.2), then we might be able to unravel the individual effects of these two factors. One way to avoid high correlations among factors is to use uncorrelated designs such as factorial designs (Section 12.3), star designs (Section 12.3), or central composite designs (Section 12.6). However, when time is a factor it is often difficult to use these highly structured designs because they require running several experiments simultaneously. (E.g., replicate center points demand that several experiments be run at the same fixed levels of all factors; when time is a factor, this means they must all be run at the same fixed level of time - that is, simultaneously.)

Figure 15.3 Response surface for the fitted model $y_{1i} = 95 - 5x_{1i} + 0x_{2i}$.

Figure 15.4 Response surface for the fitted model $y_{1i} = 95 + 0x_{1i} - 0.5x_{2i}$.

Randomization is another approach to avoiding high correlations among factors. For the example of Section 15.1, randomization is accomplished by mixing up the order in which the experiments are carried out. Any of a number of methods might be used to randomly assign times to concentrations of sodium ion (or, equivalently, randomly assign concentrations of sodium ion to times). One method would be to put eleven slips of paper on which are written the times 0, 10, 20, ... in one bowl, and eleven slips of paper on which are written the sodium ion concentrations 0, 1, 2, ... in another bowl; mix the contents of each bowl; and blindly draw pairs of slips of paper, one slip from each bowl. The resulting pairs would be a random assignment of times and sodium ion concentrations. Other methods of randomization include the use of random number tables and random number functions in computers. (Random number functions within computers should be used with caution, however; some of them are not as random as one might be led to believe.)

Figure 15.5 illustrates one completely random pairing of times and sodium ion concentration. Very little correlation is evident in this figure ($r^2$ = 0.056). It must be remembered that randomization does not guarantee that factor combinations will be uncorrelated; after all, the pairing shown in Figure 15.2 has exactly the same chance of occurring randomly as the pairing shown in Figure 15.5 does. Randomization is useful because the number of patterns that are very highly correlated is usually so small compared with the total number of possible patterns that there is only a small probability of randomly obtaining a highly correlated pattern.
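The two-bowl procedure amounts to a random permutation of one factor against the other. The sketch below is an illustration only: it uses NumPy's random generator with an arbitrary seed (an assumption made so the example is repeatable), computes the squared correlation coefficient for the pairing that the text gives in Equation 15.10, and does the same for a freshly drawn permutation and for the unrandomized design.

```python
import numpy as np

times = np.arange(0, 101, 10)        # min
na = np.arange(11)                   # [Na+], mM

def r_squared(x, y):
    """Squared correlation coefficient between two factor columns."""
    return float(np.corrcoef(x, y)[0, 1] ** 2)

# The pairing reported for Figure 15.5 (Equation 15.10), in time order.
na_book = np.array([7, 2, 9, 1, 0, 8, 4, 3, 6, 5, 10])
r2_book = r_squared(na_book, times)
print(round(r2_book, 3))             # 0.056

# A fresh random pairing; the seed is arbitrary and only makes the
# example repeatable.
rng = np.random.default_rng(seed=1)
na_random = rng.permutation(na)
r2_random = r_squared(na_random, times)
print(na_random, round(r2_random, 3))

# The unrandomized design of Equation 15.5 is perfectly correlated.
r2_sequential = r_squared(na, times)
print(r2_sequential)
```

Checking $r^2$ after each draw is one simple way to examine a randomized design before the experiments are run; a draw with a high value could be discarded and repeated.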
Nevertheless, it is wise to examine the randomized design to fully protect against the unlikely possibility of having obtained a correlated design. The experimental design matrix for the randomized pattern shown in Figure 15.5 is

$$D = \begin{bmatrix} 7 & 0 \\ 2 & 10 \\ 9 & 20 \\ 1 & 30 \\ 0 & 40 \\ 8 & 50 \\ 4 & 60 \\ 3 & 70 \\ 6 & 80 \\ 5 & 90 \\ 10 & 100 \end{bmatrix} \tag{15.10}$$

Let us assume that the corresponding response matrix is

$$Y = \begin{bmatrix} 96 \\ 89 \\ 84 \\ 81 \\ 75 \\ 70 \\ 65 \\ 61 \\ 54 \\ 49 \\ 46 \end{bmatrix} \tag{15.11}$$

Matrix least squares fitting of the model of Equation 15.6 gives an (X'X) matrix that can be inverted. The fitted model is

$$\text{Rate} = 95.04 - 0.009629\,[\text{Na}^+]_{\text{mM}} - 0.4998\,t_{\text{min}} + r_{1i} \tag{15.12}$$
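The fit of Equation 15.12 can be checked directly from the randomized design and the response matrix. The following is a minimal sketch of the matrix least squares calculation using NumPy; the variable names are illustrative.

```python
import numpy as np

na = np.array([7, 2, 9, 1, 0, 8, 4, 3, 6, 5, 10], dtype=float)   # x1, mM
t = np.arange(0, 101, 10, dtype=float)                            # x2, min
rate = np.array([96, 89, 84, 81, 75, 70, 65, 61, 54, 49, 46], dtype=float)

# X matrix for the model of Equation 15.6.
X = np.column_stack([np.ones_like(na), na, t])

# With the randomized pairing, (X'X) is no longer singular, so the
# normal equations (X'X) b = X'y have a unique solution.
XtX = X.T @ X
b = np.linalg.solve(XtX, X.T @ rate)
print(b)   # approximately [95.04, -0.009629, -0.4998]
```

Nearly all of the change in rate is now assigned to the time term, and the sodium ion coefficient is close to zero.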
Confidence intervals (Equation 11.66) suggest that it is unlikely the sodium ion concentration has a statistically significant effect on the rate of the enzyme catalyzed reaction; some other factor, one that is correlated with time, is probably responsible for the observed effect (see Section 1.2 on masquerading factors).

Figure 15.5 Factor combinations for uncorrelated experimental design of Equation 15.10.
15.3 Completely randomized designs

Suppose we are interested in investigating the effect of fermentation temperature on the percent alcohol response of the wine-making system shown in Figure 1.6. We will assume that ambient pressure has very little effect on the system and that the small variations in response caused by this uncontrolled factor can be included in the residuals. Further, we can use the same type and quantity of yeast in all of our experiments so there will be no (or very little) variation in our results caused by the factor “yeast”. Because we intend to carry out these experiments in the leisure winter months, our inventory of frozen fruit will be low and we will not have enough of any one type of fruit to be used in all of the experiments. However, we will have available modest amounts of each of twenty types of fruit, so we can randomly assign these fruit types to each experiment and expect to “average out” any effect caused by variability of the qualitative factor, “fruit”. A systems view of the experimental arrangement is shown in Figure 15.6.

Figure 15.6 General system theory view of a wine-making process.

Let us carry out a so-called “screening experiment” in which we attempt to discover if the fermentation temperature (factor $x_1$) has a “significant” effect on the response $y_1$ (% alcohol content). We will choose two levels of temperature, 23°C and 27°C. This is the minimal number of factor levels required to fit the two-parameter model

$$y_{1i} = \beta_0 + \beta_1 x_{1i} + r_{1i} \tag{15.13}$$

where $y_{1i}$ is the % alcohol content and $x_{1i}$ is the fermentation temperature. Table 15.1 contains the experimental design and the results of the investigation. Note that the time order of the experiments has been randomized, and the different fruits have been randomly assigned to the temperatures (with the restriction that ten fruits are assigned to each of the two temperatures). The sugar content of each fruit is also listed. Figure 15.7 shows the factor combinations of fruit number and temperature represented in Table 15.1; the number beside each combination indicates the time order in which the experiments are run.
Figure 15.7 Factor combinations for a completely randomized design investigating the effect of temperature. “Fruit number” is an arbitrarily assigned, qualitative factor. Numbers beside factor combinations indicate the time order in which experiments were run.

TABLE 15.1 Completely randomized experimental design for determining the effect of temperature on a wine-making system.

Experiment   Fruit   Sugar, %   Temperature, °C   % Alcohol
 1           18       3.38      23                 5.10
 2           14      10.35      23                11.08
 3            3       6.53      27                 9.62
 4           19       4.28      23                 6.62
 5           16       1.48      27                 5.98
 6            4       9.66      27                12.92
 7            5       8.26      23                10.20
 8           13       3.73      27                 8.58
 9            9       1.65      27                 7.32
10           17       5.30      27                10.64
11            2       7.02      23                 9.60
12            6       5.58      23                 8.86
13           20       0.82      23                 5.44
14           12       2.16      23                 6.92
15            7       7.81      27                13.04
16           15       2.60      27                 9.28
17           11       5.40      23                 9.92
18            8       9.45      27                15.16
19            1       5.73      27                12.58
20           10       9.04      23                13.22
Least squares fitting of the model expressed by Equation 15.13 to the data in Table 15.1 gives the fitted model

$$y_{1i} = -1.746 + 0.4540x_{1i} \tag{15.14}$$

Figure 15.8 gives the sums of squares and degrees of freedom tree. The data and fitted model are shown in Figure 15.9. Using Equation 11.66 as a basis, the parameter estimate $b_1$ = 0.4540 is significant at the 84.22% level of confidence:

$$F_{(1,18)} = b_1^2/s_{b_1}^2 = (0.4540)^2/(7.590 \times 0.0125) = 2.1726 \tag{15.15}$$
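The quantities in Equations 15.14 and 15.15 can be reproduced from the temperature and % alcohol columns of Table 15.1. The sketch below uses NumPy with illustrative variable names; it computes the parameter estimates, the variance of residuals, the variance of $b_1$, and the F-ratio.

```python
import numpy as np

# Temperature (deg C) and % alcohol from Table 15.1, in experiment order.
temp = np.array([23, 23, 27, 23, 27, 27, 23, 27, 27, 27,
                 23, 23, 23, 23, 27, 27, 23, 27, 27, 23], dtype=float)
alcohol = np.array([5.10, 11.08, 9.62, 6.62, 5.98, 12.92, 10.20, 8.58, 7.32, 10.64,
                    9.60, 8.86, 5.44, 6.92, 13.04, 9.28, 9.92, 15.16, 12.58, 13.22])

X = np.column_stack([np.ones_like(temp), temp])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ alcohol              # [b0, b1]

resid = alcohol - X @ b
dof = len(alcohol) - X.shape[1]          # 20 experiments - 2 parameters = 18
s2_r = float(resid @ resid) / dof        # variance of residuals
s2_b1 = s2_r * XtX_inv[1, 1]             # variance of b1; the element is 0.0125
F = b[1] ** 2 / s2_b1

print(b)            # approximately [-1.746, 0.4540]
print(s2_r, F)      # approximately 7.590 and 2.17
```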
Note that the F-ratio in Equation 15.15 would be larger (and therefore more significant) if the effect of temperature were stronger ($b_1$ > 0.4540); however, we have no control over the value of $b_1$. The F-ratio would also be larger if $s_{b_1}^2$ were smaller; we can have control over $s_{b_1}^2$. The value of $s_{b_1}^2$ depends on two quantities: the appropriate element of the $(X'X)^{-1}$ matrix and the quantity $s_r^2$ (see Equation 11.65). The first of these, the element of the $(X'X)^{-1}$ matrix associated with $b_1$, can be made smaller by increasing the number of experiments (which will also increase the number of degrees of freedom in the denominator of the F-ratio and improve the confidence of the estimate) and/or by using a broader region of experimentation in the factor space (e.g., by carrying out experiments at 21°C and 29°C instead of 23°C and 27°C). Decreasing $s_{b_1}^2$ by decreasing the value of $s_r^2$ might be accomplished by using a model that takes other factors into account, or by removing the effect of other factors by appropriate experimental design. We will discuss this latter technique in subsequent sections; in the remainder of this section, we show an example of decreasing $s_{b_1}^2$ by using an expanded model.

Figure 15.8 Sums of squares and degrees of freedom tree for the completely randomized design. [Legible values: $SS_{corr}$ = 153.10 with 19 degrees of freedom; $SS_{fact}$ = 16.49 with 1 degree of freedom; $SS_r$ = 136.61 with 18 degrees of freedom.]

Careful inspection of the information in Table 15.1 suggests that sugar content might be a significant factor in determining the percent alcohol content of wine. The plot of percent alcohol vs. percent sugar in Figure 15.10 seems to confirm this. Let us use the data of Table 15.1 to fit the model

$$y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + r_{1i} \tag{15.16}$$
where $y_{1i}$ is again the percent alcohol, $x_{1i}$ is the temperature, and $x_{2i}$ is the percent sugar in the fruit. Figure 15.11 gives the sums of squares and degrees of freedom tree. The fitted model is

$$y_{1i} = -7.423 + 0.5018x_{1i} + 0.8133x_{2i} \tag{15.17}$$

and is plotted in Figure 15.12. The variance of residuals has been greatly reduced by taking into account the effect of sugar content; $s_r^2$ is now 1.550 compared to 7.590 when sugar was not included as a factor.

Figure 15.9 Graph of the model $y_{1i} = -1.746 + 0.4540x_{1i}$ with experimental data.

Figure 15.10 Plot of percent alcohol response as a function of sugar content for the 20 fruits listed in Table 15.1.

Figure 15.11 Sums of squares and degrees of freedom tree for the two-factor model that includes both temperature and sugar content.

The parameter estimate $b_1$ = 0.5018 is now significant at the 99.78% level of confidence:

$$F_{(1,17)} = b_1^2/s_{b_1}^2 = (0.5018)^2/(1.550 \times 0.0125) = 12.98 \tag{15.18}$$
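The expanded model can be fitted in the same way once the sugar column of Table 15.1 is added to the X matrix. The sketch below (NumPy, illustrative variable names) is a check on Equations 15.17 and 15.18. It uses the exact element of $(X'X)^{-1}$ associated with $b_1$ rather than the rounded value 0.0125, so the F-ratio it prints may differ from 12.98 in the last digits.

```python
import numpy as np

# Columns of Table 15.1, in experiment order.
temp = np.array([23, 23, 27, 23, 27, 27, 23, 27, 27, 27,
                 23, 23, 23, 23, 27, 27, 23, 27, 27, 23], dtype=float)
sugar = np.array([3.38, 10.35, 6.53, 4.28, 1.48, 9.66, 8.26, 3.73, 1.65, 5.30,
                  7.02, 5.58, 0.82, 2.16, 7.81, 2.60, 5.40, 9.45, 5.73, 9.04])
alcohol = np.array([5.10, 11.08, 9.62, 6.62, 5.98, 12.92, 10.20, 8.58, 7.32, 10.64,
                    9.60, 8.86, 5.44, 6.92, 13.04, 9.28, 9.92, 15.16, 12.58, 13.22])

# X matrix for the two-factor model of Equation 15.16.
X = np.column_stack([np.ones_like(temp), temp, sugar])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ alcohol              # [b0, b1, b2]

resid = alcohol - X @ b
dof = len(alcohol) - X.shape[1]          # 20 experiments - 3 parameters = 17
s2_r = float(resid @ resid) / dof        # variance of residuals
s2_b1 = s2_r * XtX_inv[1, 1]             # variance of the temperature effect
F = b[1] ** 2 / s2_b1

print(b)          # approximately [-7.423, 0.5018, 0.8133]
print(s2_r, F)    # approximately 1.550 and 13
```

Comparing `s2_r` here with the one-factor value of 7.590 shows how much of the residual variance was attributable to sugar content.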
15.4 Randomized paired comparison designs

In the previous section, the discussion following Equation 15.15 suggested that certain experimental designs could be used to decrease the uncertainty associated with a parameter estimate by minimizing the effect of uncontrolled factors. In this section we discuss one of these experimental designs, the randomized paired comparison design.

We have seen that the percent alcohol content of a wine depends not only on the temperature of fermentation, but also on the type of fruit used (in part because of the masquerading factor, sugar content). When different fruit types were randomly assigned to temperature levels, the plot of percent alcohol vs. temperature was very noisy (Figure 15.9), and the effect of temperature was estimated with poor confidence (Equation 15.15). Because we knew that the lurking factor sugar content was important, and because we had measured the level of sugar content in each of the fruits, we were able to account for the effect of sugar in our expanded model (Equations 15.16 and 15.17) and thereby improve the confidence in our estimate of the temperature effect. But what if we did not know that sugar content was an important factor, or what if we were unable to measure the level of sugar content in each fruit? We would then not have been able to include this factor in our model and the confidence in the temperature effect could not have been improved.

Figure 15.12 Response surface for the fitted two-factor model.

In the completely randomized design, a different fruit was randomly assigned a temperature, either 23°C or 27°C. Let us consider now a different experimental design. We will still employ the same number of experiments (20), but we will use only half as many fruit types, assigning each fruit type to both temperatures. Thus, each fruit will be involved in a pair of experiments, one experiment at 23°C and the other experiment at 27°C. The experimental design is shown in Table 15.2 and in Figure 15.13. Fitting the model of Equation 15.13 to this data gives

$$y_{1i} = -1.233 + 0.4908x_{1i} \tag{15.19}$$

The data and fitted model are shown in Figure 15.14 (compare with Figure 15.9). Based on

$$F_{(1,18)} = (0.4908)^2/(5.6953 \times 0.0125) = 3.38 \tag{15.20}$$

the temperature effect is significant at the 91.76% level of confidence, only slightly more significant (by chance) than that of Equation 15.14 (84.22%) for the same model fit to data from the completely random design.

Up to this point, the randomized paired comparison design has not offered any great improvement over the completely randomized design. However, the fact that each fruit has been investigated at a pair of temperatures allows us to carry out a different type of data treatment based on a series of paired comparisons.

Figure 15.13 Factor combinations for a randomized paired comparison design investigating the effect of temperature. “Fruit number” is an arbitrarily assigned, qualitative factor. Numbers beside factor combinations indicate the time order in which experiments are run.

TABLE 15.2 Randomized paired comparison experimental design for determining the effect of temperature on a wine-making system.

Experiment   Fruit   Temperature, °C   % Alcohol
 1            4      27                12.99
 2            8      23                13.20
 3            7      27                12.98
 4           10      23                13.28
 5            1      23                10.59
 6            5      27                12.13
 7            3      23                 7.58
 8            6      27                10.82
 9            2      27                11.56
10            9      23                 5.40
11            7      23                11.04
12            9      27                 7.28
13            1      27                12.49
14            6      23                 8.90
15            4      23                10.83
16            2      23                 9.51
17           10      27                15.18
18            5      23                10.21
19            8      27                15.06
20            3      27                 9.68

Figure 15.14 Graph of the fitted model $y_{1i} = -1.233 + 0.4908x_{1i}$ with experimental data.

We realize that there are a number of factors in addition to temperature that influence the % alcohol response of the wine: sugar content, pressure, magnesium concentration in the fruit, phosphate concentration in the fruit, presence of natural bacteria, etc. Although we strive to keep as many of these factors as controlled and therefore as constant as we can (e.g., pressure), we have no control over many of the other factors, especially those associated with the fruit (see Section 1.2). However, even though we do not have control over these factors, it is nonetheless reasonable to expect that whatever the % alcohol response is at 23°C, the % alcohol response at 27°C should increase for each of the fruits in our study if temperature has a significant effect. That is, if we are willing to make the assumption that there are no interactions between the factor of interest to us (temperature) and the other factors that influence the system, the differences in responses at 27°C and 23°C should be about the same for each pair of experiments carried out on the same fruit.

Examination of Table 15.2 tends to confirm this idea. In experiments 10 and 12, for example, fruit number nine had responses of 5.40% and 7.28% at 23°C and 27°C, respectively. In experiments 4 and 17, fruit number ten had responses of 13.28% and 15.18% at 23°C and 27°C, respectively. Even though these two fruits give quite different individual responses at a given temperature, the temperature effect for each fruit is about the same: +1.88% for fruit number nine, and +1.90% for fruit number ten as the temperature is increased from 23°C to 27°C. Similar trends are observed for the other fruits.
There are a number of equivalent ways this paired comparison data can be evaluated; we base our treatment here on a linear model that can be used to estimate the effect of temperature and its significance. Of the twenty original pieces of experimental data, we form the ten pairwise differences listed in Table 15.3. Note that we lose half of our original 20 degrees of freedom in forming these differences, so our resulting data set has only ten degrees of freedom. Thus, we can arbitrarily set the “response” of each fruit equal to zero at 23°C; the “response” of each fruit at 27°C is then simply the calculated difference. Further, we can shift (code) the temperature axis so that 23°C becomes 0 and 27°C becomes +4. An appropriate model for assessing the temperature effect in this coded data system is

$$y_{1i} = \beta_1^* x_{1i}^* + r_{1i} \tag{15.21}$$

This is the straight line model constrained to pass through the origin (see Section 5.3). The fitted model is

$$y_{1i} = 0.4908x_{1i}^* \tag{15.22}$$

and is shown with the data in Figure 15.15. The sums of squares and degrees of freedom tree is given in Figure 15.16. The significance of $b_1^*$ is estimated from

$$F_{(1,9)} = b_1^{*2}/s_{b_1^*}^2 = (0.4908)^2/(0.0105 \times 0.00625) = 3670.62 \tag{15.23}$$

which in this case is entirely equivalent to the F-test for the significance of regression. F is significant at only slightly less than the 100.000% level of confidence.

Figure 15.15 Graph of the fitted model $y_{1i} = 0.4908x_{1i}^*$ with experimental data.

TABLE 15.3 Paired differences for determining the effect of temperature on a wine-making system.

Fruit   ΔT, °C   Δ% alcohol
 1      4        1.90
 2      4        2.05
 3      4        2.10
 4      4        2.16
 5      4        1.92
 6      4        1.92
 7      4        1.94
 8      4        1.86
 9      4        1.88
10      4        1.90

Thus, the randomized paired comparison design has allowed a more sensitive way of viewing our data, a view that ignores much of the variation caused by the use of different fruits and focuses on pairwise differences associated with the single factor of interest, temperature.
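The paired treatment can be traced from Table 15.2 to Equation 15.23 in a few steps. The sketch below (NumPy, illustrative variable names) forms the ten differences of Table 15.3, fits the straight line through the origin, and computes the F-ratio. Because it carries unrounded intermediate values, the F-ratio it prints lands near, but not exactly on, the 3670.62 of Equation 15.23.

```python
import numpy as np

# Columns of Table 15.2, in experiment order.
fruit = np.array([4, 8, 7, 10, 1, 5, 3, 6, 2, 9, 7, 9, 1, 6, 4, 2, 10, 5, 8, 3])
temp = np.array([27, 23, 27, 23, 23, 27, 23, 27, 27, 23,
                 23, 27, 27, 23, 23, 23, 27, 23, 27, 27])
alcohol = np.array([12.99, 13.20, 12.98, 13.28, 10.59, 12.13, 7.58, 10.82, 11.56, 5.40,
                    11.04, 7.28, 12.49, 8.90, 10.83, 9.51, 15.18, 10.21, 15.06, 9.68])

# Pairwise differences (27 C minus 23 C) for fruits 1..10, as in Table 15.3.
diffs = np.array([
    alcohol[(fruit == f) & (temp == 27)][0] - alcohol[(fruit == f) & (temp == 23)][0]
    for f in range(1, 11)
])

x = np.full(10, 4.0)                     # coded temperature change, 27 - 23
b1 = float(x @ diffs) / float(x @ x)     # least squares slope through the origin
resid = diffs - b1 * x
ss_r = float(resid @ resid)              # sum of squares of residuals
dof = len(diffs) - 1                     # 10 differences - 1 parameter = 9
s2_r = ss_r / dof
s2_b1 = s2_r / float(x @ x)              # (X'X)^-1 is 1/160 = 0.00625
F = b1 ** 2 / s2_b1

print(np.round(diffs, 2))   # matches the last column of Table 15.3
print(b1, ss_r, F)          # about 0.4908, 0.0948, and a few thousand
```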
15.5 Randomized complete block designs

The randomized paired comparison design discussed in the previous section separates the effect of a qualitative factor, fruit, from the effect of a quantitative factor, temperature (see Section 1.2). The randomized complete block design discussed in this section allows us to investigate more than one purely qualitative variable and to estimate their quantitative effects.

Figure 15.16 Sums of squares and degrees of freedom tree for the randomized paired comparison design.

Suppose the researcher involved with the sodium ion concentration study of Sections 15.1 and 15.2 becomes interested in the wine-making process we have been discussing in Sections 15.3 and 15.4. In particular, let us assume the researcher is interested in determining the effects on the percent alcohol response of adding 10 milligrams of three different univalent cations (Li+, Na+, and K+) and 10 milligrams of four different divalent cations (Mg2+, Ca2+, Sr2+, and Ba2+) in pairs - one univalent and one divalent cation per pair - to one-gallon batches of fermentation mixture. In planning the experiments, the researcher holds as many variables constant as possible, and randomizes to protect against confounding uncontrolled factors with the two factor effects of interest. If experiments are carried out at all combinations of both factors (3×4 = 12 factor combinations) with replicates at each factor combination, the resulting 24-experiment randomized design is shown in Table 15.4 and Figure 15.17.

Figure 15.17 Factor combinations for the randomized complete block design investigating two qualitative factors, type of univalent cation and type of divalent cation. Each factor combination is replicated.

TABLE 15.4 Randomized complete block design for determining the effects of univalent and divalent cations on a wine-making system.

Experiment   Univalent cation   Divalent cation   % Alcohol
 1           K                  Sr                 3.99
 2           K                  Ba                 1.17
 3           Na                 Sr                 8.25
 4           Li                 Sr                 4.90
 5           Li                 Ca                 6.43
 6           K                  Ba                 1.20
 7           Li                 Mg                 7.61
 8           K                  Ca                 5.76
 9           Na                 Mg                11.27
10           Na                 Ca                10.22
11           Li                 Ba                 1.97
12           Li                 Mg                 7.37
13           Na                 Ca                10.42
14           Na                 Ba                 5.60
15           Na                 Sr                 8.41
16           Li                 Sr                 4.65
17           Li                 Ca                 6.56
18           Na                 Mg                11.07
19           K                  Mg                 7.01
20           K                  Mg                 6.71
21           K                  Sr                 4.16
22           Na                 Ba                 5.96
23           K                  Ca                 6.02
24           Li                 Ba                 2.36

In the classical statistical literature, one of the two qualitative factors is referred to as the “treatments” and the other qualitative factor is referred to as the “blocks”. Hence, the term “block designs”. In some studies, one of the qualitative factors might be correlated with time, or might even be the factor “time” itself; by carrying out the complete set of experiments in groups (or “blocks”) based on this factor, estimated time effects can be removed and the “treatment” effects can be revealed more clearly. However, it is entirely equivalent to state that by carrying out the complete set of experiments in “treatments”, the “block” effects can be revealed more clearly.

In situations where the effect of time is not removed by blocking but is instead minimized by randomization, the assignment of the terms “blocks” and “treatments” to the remaining factors of interest is not always straightforward, and the terms are often interchangeable. For example, in the present study, one could view the experiments as involving univalent cation “treatments” and divalent cation “blocks”. But the experiments could also be viewed as consisting of univalent cation “blocks” and divalent cation “treatments”. In general, we will avoid the use of the terms “treatments” and “blocks” to refer to the levels of one or the other of the qualitative factors, but we will point out that the useful concept of blocking allows us to separate the effects of one factor from the effects of another factor.
Because nominal factors are not continuous, we cannot use a linear model such as $y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + r_{1i}$ to describe the behavior of this system. For example, if $x_1$ were to represent the factor “univalent cation”, what value would $x_1$ take when Li was used? Or Na? Or K? There is no rational basis for assigning numerical values to $x_1$, so we must abandon the familiar linear models containing continuous (quantitative) factors. Instead, we will view the percent alcohol response from the wine-making process as follows. Let us first pick a reference factor combination. Any of the twelve factor combinations could be used; we will choose the combination Li-Mg (the lower left corner of the design in Figure 15.17) as our reference. For that particular reference combination, we could write the linear model

$y_{1i} = \beta_{LiMg} + r_{1i}$    (15.24)

which is analogous to the zero-factor model $y_{1i} = \beta_0 + r_{1i}$. The subscript on $\beta_{LiMg}$ indicates which factor combination we are using as our reference. Let us now assume that no matter what level of divalent cation we are using, changing the univalent cation from Li to Na will give the same change in response. This is very similar to the reasoning used in treating the data from the randomized paired comparison design of Section 15.4. Thus, if Na were used instead of Li, we might write the model

$y_{1i} = \beta_{LiMg} + \beta_{Na} + r_{1i}$    (15.25)

where the new term $\beta_{Na}$ is a measure of the difference in response caused by using Na instead of Li. Likewise, if K were used instead of Li, we might write

$y_{1i} = \beta_{LiMg} + \beta_{K} + r_{1i}$    (15.26)

where $\beta_{K}$ is a measure of the difference in response caused by using K instead of Li. In a similar way, we might substitute Ca, Sr, or Ba for Mg and write the models

$y_{1i} = \beta_{LiMg} + \beta_{Ca} + r_{1i}$    (15.27)

$y_{1i} = \beta_{LiMg} + \beta_{Sr} + r_{1i}$    (15.28)

and

$y_{1i} = \beta_{LiMg} + \beta_{Ba} + r_{1i}$    (15.29)

where the terms $\beta_{Ca}$, $\beta_{Sr}$, and $\beta_{Ba}$ are measures of the differences in response caused by using Ca, Sr, or Ba instead of Mg. If we used Ca instead of Mg and used Na instead of Li, a valid model would be

$y_{1i} = \beta_{LiMg} + \beta_{Na} + \beta_{Ca} + r_{1i}$    (15.30)
If we were to continue this scheme of two-factor substitutions, we would accumulate a total of twelve different models, one for each factor combination in the study. However, it is possible to combine all twelve of these separate models in a single model through the use of “dummy variables”. These dummy variables can be used to “turn on” or “turn off” various terms in the combined model in such a way that the twelve separate models result. Let us create a dummy variable, $x_{Na}$. For a given experiment $i$, we will assign to $x_{Na}$ the value 0 if sodium ion is not included in that experiment, or the value 1 if sodium ion is included in that experiment. We might then write

$y_{1i} = \beta_{LiMg} + \beta_{Na} x_{Na,i} + r_{1i}$    (15.31)

If Li were included in a particular experiment, $x_{Na,i} = 0$ and Equation 15.31 would reduce to Equation 15.24; if Na were included in the experiment, $x_{Na,i} = 1$ and Equation 15.31 would be the same as Equation 15.25. We can also create the dummy variables $x_{K}$, $x_{Ca}$, $x_{Sr}$, and $x_{Ba}$ for use with the model

$y_{1i} = \beta_{LiMg} + \beta_{Na} x_{Na,i} + \beta_{K} x_{K,i} + \beta_{Ca} x_{Ca,i} + \beta_{Sr} x_{Sr,i} + \beta_{Ba} x_{Ba,i} + r_{1i}$    (15.32)

to designate any one of the factor combinations shown in Figure 15.17. Note that there is one reference parameter, $\beta_{LiMg}$, analogous to $\beta_0$ in the other linear models; two difference parameters ($\beta_{Na}$ and $\beta_{K}$) for the three univalent cations; and three difference parameters ($\beta_{Ca}$, $\beta_{Sr}$, and $\beta_{Ba}$) for the four divalent cations, giving a total of six parameters. The matrix of parameter coefficients X for the experiments listed in Table 15.4 and the model of Equation 15.32 is
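(columns, left to right: $\beta_{LiMg}$, Na, K, Ca, Sr, Ba; rows, top to bottom: experiments 1 to 24)

$X = \begin{bmatrix}
1 & 0 & 1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 \\
1 & 0 & 1 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}$    (15.33)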
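The dummy-variable coding of Equation 15.32 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `dummy_row` and the list names are hypothetical and do not come from the text. Each factor combination maps to one row of X, with the reference levels (Li and Mg) represented by zeros in every dummy column.

```python
# Reference levels (Li and Mg) get no dummy column of their own; they are
# carried by the leading 1, which multiplies beta_LiMg in every row.
UNIVALENT_DUMMIES = ["Na", "K"]         # Li is the reference level
DIVALENT_DUMMIES = ["Ca", "Sr", "Ba"]   # Mg is the reference level


def dummy_row(univalent, divalent):
    """Return one row of X in the order [1, x_Na, x_K, x_Ca, x_Sr, x_Ba]."""
    row = [1]
    row += [1 if univalent == level else 0 for level in UNIVALENT_DUMMIES]
    row += [1 if divalent == level else 0 for level in DIVALENT_DUMMIES]
    return row


# One row for each of the twelve factor combinations.
rows = {
    (u, d): dummy_row(u, d)
    for u in ["Li", "Na", "K"]
    for d in ["Mg", "Ca", "Sr", "Ba"]
}

print(rows[("Li", "Mg")])  # reference combination: only beta_LiMg is "on"
print(rows[("Na", "Ca")])  # beta_LiMg, beta_Na and beta_Ca are "on"
print(rows[("K", "Ba")])   # beta_LiMg, beta_K and beta_Ba are "on"
```

Because the twelve rows are all different, the single combined model reproduces each of the twelve separate substitution models.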
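Treating the data by conventional matrix least squares techniques gives

$\hat{B} = [\, b_{LiMg} \;\; b_{Na} \;\; b_{K} \;\; b_{Ca} \;\; b_{Sr} \;\; b_{Ba} \,]' = [\, 7.53 \;\; 3.67 \;\; -0.73 \;\; -0.94 \;\; -2.78 \;\; -5.46 \,]'$    (15.34)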
The interpretation of these parameter estimates is straightforward. The parameter $b_{LiMg}$ estimates the response for the reference factor combination containing Li and Mg. The estimated value of 7.53 compares favorably with the results of experiments 7 and 12 in Table 15.4, replicates for Li and Mg with responses of 7.61 and 7.37, respectively. The parameter estimate $b_{Na}$ = 3.67 suggests that replacing Li with Na causes an increase in response to 11.20; experiments 9 and 18 in Table 15.4 have responses of 11.27 and 11.07, respectively. Replacing Li with K, however, causes a decrease in response ($b_{K}$ = -0.73). Similarly, replacing Mg with Ca, Sr, or Ba causes decreases in response: -0.94 ($b_{Ca}$), -2.78 ($b_{Sr}$), and -5.46 ($b_{Ba}$). The response for experiment 6 in Table 15.4 would be estimated as 7.53 ($b_{LiMg}$) minus 0.73 ($b_{K}$, the difference caused by replacing Li with K) minus 5.46 ($b_{Ba}$, the difference caused by replacing Mg with Ba) = 1.34; the measured response was 1.20.

Figure 15.18 shows a sums of squares and degrees of freedom tree for the data of Table 15.4 and the model of Equation 15.32. The significance of the parameter estimates may be obtained from Equation 11.66 using $s_r^2$ and $(X'X)^{-1}$ to obtain the variance-covariance matrix. The $(X'X)^{-1}$ matrix for the present example is

$(X'X)^{-1} = \begin{bmatrix}
0.250 & -0.125 & -0.125 & -0.167 & -0.167 & -0.167 \\
-0.125 & 0.250 & 0.125 & 0 & 0 & 0 \\
-0.125 & 0.125 & 0.250 & 0 & 0 & 0 \\
-0.167 & 0 & 0 & 0.333 & 0.167 & 0.167 \\
-0.167 & 0 & 0 & 0.167 & 0.333 & 0.167 \\
-0.167 & 0 & 0 & 0.167 & 0.167 & 0.333
\end{bmatrix}$    (15.35)
Note that for the randomized complete block design, there is no covariance between the effects of the univalent cations and the effects of the divalent cations. The randomized complete block design has provided a sensitive way of viewing the data from this set of experiments involving two qualitative factors. The linear model using dummy variables ignores much of the variation in the data by again focusing on pairwise differences associated with the different discrete levels of the factors of interest.
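As a check on the calculations of this section, the least squares fit can be sketched in standard-library Python. Two things here are assumptions rather than part of the text. First, the univalent cation assigned to each experiment is inferred (it is the only assignment with two replicates per factor combination that is consistent with the parameter estimates quoted above); the divalent cations and responses are those listed for Table 15.4. Second, the helper names (`dummy_row`, `matmul`, `invert`) are hypothetical, and Gauss-Jordan elimination is used simply as one convenient way to form $(X'X)^{-1}$.

```python
# (univalent, divalent, % alcohol) for experiments 1-24.
# ASSUMPTION: the univalent cation of each run is inferred, not quoted.
DATA = [
    ("K", "Sr", 3.99), ("K", "Ba", 1.17), ("Na", "Sr", 8.25), ("Li", "Sr", 4.90),
    ("Li", "Ca", 6.43), ("K", "Ba", 1.20), ("Li", "Mg", 7.61), ("K", "Ca", 5.76),
    ("Na", "Mg", 11.27), ("Na", "Ca", 10.22), ("Li", "Ba", 1.97), ("Li", "Mg", 7.37),
    ("Na", "Ca", 10.42), ("Na", "Ba", 5.60), ("Na", "Sr", 8.41), ("Li", "Sr", 4.65),
    ("Li", "Ca", 6.56), ("Na", "Mg", 11.07), ("K", "Mg", 7.01), ("K", "Mg", 6.71),
    ("K", "Sr", 4.16), ("Na", "Ba", 5.96), ("K", "Ca", 6.02), ("Li", "Ba", 2.36),
]


def dummy_row(univalent, divalent):
    """Row of X in the order [1, x_Na, x_K, x_Ca, x_Sr, x_Ba]."""
    return ([1] + [int(univalent == c) for c in ("Na", "K")]
            + [int(divalent == c) for c in ("Ca", "Sr", "Ba")])


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def invert(m):
    """Gauss-Jordan inversion with partial pivoting (square, nonsingular m)."""
    n = len(m)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


X = [dummy_row(u, d) for u, d, _ in DATA]
y = [[pct] for _, _, pct in DATA]
Xt = transpose(X)
XtX_inv = invert(matmul(Xt, X))                          # (X'X)^-1
b = [row[0] for row in matmul(XtX_inv, matmul(Xt, y))]   # (X'X)^-1 X'y

for name, estimate in zip(["LiMg", "Na", "K", "Ca", "Sr", "Ba"], b):
    print(f"b_{name:<4} = {estimate:6.2f}")
```

Under these assumptions the fitted values round to the estimates quoted for Equation 15.34, and the off-diagonal elements of `XtX_inv` linking a univalent-cation parameter to a divalent-cation parameter are zero, which is the absence of covariance noted above.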
[Figure 15.18 Sums of squares and degrees of freedom tree for the randomized complete block design. Recoverable values: $SS_T$ = 1119.70; $SS_{lof}$ = 0.14 with 6 degrees of freedom; $SS_{pe}$ = 0.37 with 12 degrees of freedom.]

15.6 Coding of randomized complete block designs

Because of tradition, the model of Equation 15.32 is seldom used for the randomized complete block design. Instead, a somewhat different but essentially equivalent model is used:

$y_{ij} = \mu + \gamma_i + \tau_j + \varepsilon_{ij}$    (15.36)

where $y_{ij}$ is the response in the $i$th “block” receiving the $j$th “treatment” (note the different meaning attached to $i$), $\mu$ is the average response (equivalent to $\bar{y}_1$), $\gamma_i$ is a “block” effect, $\tau_j$ is a “treatment” effect, and the $\varepsilon_{ij}$ are “errors” (residuals) between what is observed and what the model predicts. Two equality constraints are placed on the $\gamma_i$’s and $\tau_j$’s (see Section 2.3):

$\sum_i \gamma_i = 0$    (15.37)

$\sum_j \tau_j = 0$    (15.38)
For the example used in Section 15.5, there would be three $\gamma$’s and four $\tau$’s (or, perhaps, four $\gamma$’s and three $\tau$’s). If $\gamma$ is associated with the qualitative factor “univalent cation”, then $\gamma_{Li}$ would be the average “block” difference in response between the experiments involving Li and the overall mean; $\gamma_{Na}$ and $\gamma_{K}$ would be the corresponding differences for experiments involving Na and K. Similarly, $\tau_{Mg}$, $\tau_{Ca}$, $\tau_{Sr}$, and $\tau_{Ba}$ would be the average “treatment” differences for experiments involving the divalent cations Mg, Ca, Sr, and Ba. Thus, the full model would be

$y_{ij} = \mu + \gamma_{Li} + \gamma_{Na} + \gamma_{K} + \tau_{Mg} + \tau_{Ca} + \tau_{Sr} + \tau_{Ba} + \varepsilon_{ij}$    (15.39)

Again, certain terms are “turned on” or “turned off” to correspond to a particular factor combination $ij$. Notice that Equation 15.39 has eight parameters; Equation 15.32 has only six parameters. It would appear that the two models are not equivalent because they contain different numbers of parameters; however, the equality constraints of Equations 15.37 and 15.38 take away two degrees of freedom from Equation 15.39 and it actually has only six independent parameters (see Section 2.3). The equality constraints require that

$\gamma_{Li} = -\gamma_{Na} - \gamma_{K}$    (15.40)

and

$\tau_{Mg} = -\tau_{Ca} - \tau_{Sr} - \tau_{Ba}$    (15.41)
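The relationships between the parameters of Equations 15.32 and 15.39 are easily discovered.

$\beta_{LiMg} = \mu + \gamma_{Li} + \tau_{Mg} = \mu - \gamma_{Na} - \gamma_{K} - \tau_{Ca} - \tau_{Sr} - \tau_{Ba}$    (15.42)

$\beta_{Na} = \gamma_{Na} - \gamma_{Li} = \gamma_{Na} - (-\gamma_{Na} - \gamma_{K}) = 2\gamma_{Na} + \gamma_{K}$    (15.43)

$\beta_{K} = \gamma_{K} - \gamma_{Li} = \gamma_{Na} + 2\gamma_{K}$    (15.44)

$\beta_{Ca} = \tau_{Ca} - \tau_{Mg} = 2\tau_{Ca} + \tau_{Sr} + \tau_{Ba}$    (15.45)

$\beta_{Sr} = \tau_{Sr} - \tau_{Mg} = \tau_{Ca} + 2\tau_{Sr} + \tau_{Ba}$    (15.46)

$\beta_{Ba} = \tau_{Ba} - \tau_{Mg} = \tau_{Ca} + \tau_{Sr} + 2\tau_{Ba}$    (15.47)

If we let

$B = [\, \beta_{LiMg} \;\; \beta_{Na} \;\; \beta_{K} \;\; \beta_{Ca} \;\; \beta_{Sr} \;\; \beta_{Ba} \,]'$    (15.48)

and

$B^* = [\, \mu \;\; \gamma_{Na} \;\; \gamma_{K} \;\; \tau_{Ca} \;\; \tau_{Sr} \;\; \tau_{Ba} \,]'$    (15.49)

then by Equation 11.48, $B = A^{-1}B^*$ and it is evident that Equations 15.42-15.47 are equivalent to

$A^{-1} = \begin{bmatrix}
1 & -1 & -1 & -1 & -1 & -1 \\
0 & 2 & 1 & 0 & 0 & 0 \\
0 & 1 & 2 & 0 & 0 & 0 \\
0 & 0 & 0 & 2 & 1 & 1 \\
0 & 0 & 0 & 1 & 2 & 1 \\
0 & 0 & 0 & 1 & 1 & 2
\end{bmatrix}$    (15.50)

Similarly, $B^* = AB$ and

$A = \begin{bmatrix}
1 & 1/3 & 1/3 & 1/4 & 1/4 & 1/4 \\
0 & 2/3 & -1/3 & 0 & 0 & 0 \\
0 & -1/3 & 2/3 & 0 & 0 & 0 \\
0 & 0 & 0 & 3/4 & -1/4 & -1/4 \\
0 & 0 & 0 & -1/4 & 3/4 & -1/4 \\
0 & 0 & 0 & -1/4 & -1/4 & 3/4
\end{bmatrix}$    (15.51)

The remaining parameters of Equation 15.36 ($\gamma_{Li}$ and $\tau_{Mg}$) are calculated from Equations 15.40 and 15.41. Thus, if the model of Equation 15.32 has been used to treat the data from a randomized complete block design, the results may be readily converted to the form of Equation 15.36.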
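The conversion just described can be sketched numerically. The following Python fragment is a sketch under stated assumptions: it takes the rounded estimates quoted in Section 15.5 (7.53, 3.67, -0.73, -0.94, -2.78, -5.46) as the vector B, uses the matrices of Equations 15.50 and 15.51 as reconstructed from Equations 15.42-15.47, and works in exact fractions so that the equality constraints can be checked without rounding error. The variable names are hypothetical.

```python
from fractions import Fraction as F


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# A maps [b_LiMg, b_Na, b_K, b_Ca, b_Sr, b_Ba]' to
# [mu, gamma_Na, gamma_K, tau_Ca, tau_Sr, tau_Ba]'.
A = [
    [1, F(1, 3), F(1, 3), F(1, 4), F(1, 4), F(1, 4)],
    [0, F(2, 3), F(-1, 3), 0, 0, 0],
    [0, F(-1, 3), F(2, 3), 0, 0, 0],
    [0, 0, 0, F(3, 4), F(-1, 4), F(-1, 4)],
    [0, 0, 0, F(-1, 4), F(3, 4), F(-1, 4)],
    [0, 0, 0, F(-1, 4), F(-1, 4), F(3, 4)],
]
A_INV = [
    [1, -1, -1, -1, -1, -1],
    [0, 2, 1, 0, 0, 0],
    [0, 1, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 1, 1, 2],
]
IDENTITY = [[int(i == j) for j in range(6)] for i in range(6)]

# Rounded estimates from the dummy-variable model (column vector).
b = [[F(v)] for v in ("7.53", "3.67", "-0.73", "-0.94", "-2.78", "-5.46")]

mu, g_na, g_k, t_ca, t_sr, t_ba = (row[0] for row in matmul(A, b))
g_li = -g_na - g_k            # constraint on the block effects
t_mg = -t_ca - t_sr - t_ba    # constraint on the treatment effects

print("mu       =", float(mu))
print("gamma    =", [float(v) for v in (g_li, g_na, g_k)])
print("tau      =", [float(v) for v in (t_mg, t_ca, t_sr, t_ba)])
print("b_LiMg   =", float(mu + g_li + t_mg))
```

The last line recovers the reference parameter from the coded form, which is a quick way to confirm that the two parameterizations describe the same fitted model.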
Exercises

15.1 Confounding.
Historical data from production facilities often give rise to highly confounded results. Consider a process in which temperature and pressure are considered to be inputs, and yield is an output. If the data are based on observations rather than experiments, why might temperature and pressure be confounded? [See, for example, Box (1954).]

15.2 Confounding.
Use matrix least squares to fit the model $y_{1i} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + r_{1i}$ to the following data:

Experiment    x1    x2      y1
2             14    34.0    12.3
3             20    43.0    33.5
4             13    32.5    87.4
5             19    41.5    3.0
6             16    37.0    105.0
7             22    46.0    47.3
8             15    35.5    88.6
9             18    40.0    88.2
10            21    44.5    51.7
15.3 Randomization.
The following ten random numbers were obtained in the following sequence from a computer: 1, 2, 10, 3, 4, 9, 5, 6, 7, 8. Comment.

15.4 Randomization.
Find a table of random numbers and use it to reorder the experimental design of Equation 15.1. [See, for example, Cochran and Cox (1950).]

15.5 Randomization.
“Randomization may be thought of as insurance, and, like insurance, may sometimes be too expensive” [Natrella (1963), p. 11-4]. Comment. Give a set of circumstances in which it might be too expensive to randomize.

15.6 Randomization.
“Randomization affords insurance against uncontrollable disturbances in the sense that such disturbances have the same chance of affecting each of the factors under study, and will be balanced out in the long run” [Natrella (1963), p. 13-1]. Will the uncontrollable disturbances be balanced out “in the short run”?

15.7 Blocking.
One of the advantages often put forth for grouping experiments in blocks is that it allows the “treatment effects” to be obtained with (usually) better precision. The “block effects” are usually ignored. Is information being wasted when this is done? Give an example for which the block effects might provide useful information.

15.8 Completely randomized designs.
“One fact compensates to some extent for the higher experimental errors [uncertainties of the completely randomized design] as compared with other designs. For a given number of treatments and a given number of experimental units, complete randomization provides the maximum number of degrees of freedom for the estimation of error” [Cochran and Cox (1950)]. Why is this an advantage? Is it more important for large n or small n?

15.9 Completely randomized designs.
“This plan is simple, and is the best choice when the experimental material is homogeneous and background conditions can be well controlled during the experiment” [Natrella (1963)]. Comment.

15.10 Randomized paired comparison designs.
Suppose someone comes to you with the hypothesis that shoes worn on the right foot receive more wear than shoes worn on the left foot. Design a randomized paired comparison experiment to test this hypothesis. How might it differ from a completely randomized design?

15.11 Randomized complete block design.
Suppose a sports enthusiast wants to see which of the following has the greatest effect on minimizing bicyclists’ time to complete a one-mile ride: resting 10 minutes immediately before the race, doing 50 pushups 20 minutes before the race, running in place 10 minutes immediately before the race, drinking 0.5 liters of orange juice 5 minutes before the race, or drinking 0.5 liters of water 15 minutes before the race. Design a randomized complete block design to help answer the question raised by the sports enthusiast. What are the “blocks” and “treatments” in your design?

15.12 Randomized complete block designs.
Randomized paired comparison designs can be thought of as randomized complete block designs in which the blocking is done in pairs to eliminate unwanted sources of variability. If the data in Section 15.4 are treated as a randomized complete block design, what is the model? How many degrees of freedom are available for estimating the temperature effect? How many “block effects” must be estimated? How many degrees of freedom are there for $s_r^2$?
15.13 Randomized complete block designs.
It has been suggested that a set of factor combinations such as that in Section 15.5 is simply a 3x4 factorial design. Comment.

15.14 Balanced incomplete block designs.
When the “block size” is smaller than the number of “treatments” to be evaluated, incomplete block designs may be used [Yates (1936)]. Balanced incomplete block designs give approximately the same precision to all pairs of treatments. The following is a symmetrical (i.e., the number of blocks equals the number of treatments) balanced incomplete block design:

Block    Treatments
1        ABD
2        BCE
3        CDF
4        DEG
5        EFA
6        FGB
7        GAC

How many times does a given treatment appear with each of the other treatments? Graph the design with blocks on the vertical axis and treatments on the horizontal axis. If the usual model for this design is $y_{ij} = \mu + \gamma_i + \tau_j + \varepsilon_{ij}$, write an equivalent linear model using $\beta$’s, and write the corresponding X matrix.
15.15 Latin square designs.
Randomized block designs are useful when there is a single type of inhomogeneity (the “block effects”) detracting from precise estimates of the factor effect of interest (the “treatment effects”). When there are two types of inhomogeneity, a Latin square design is often useful. The model is $y_{ijk} = \mu + \gamma_i + \omega_j + \tau_k + \varepsilon_{ijk}$. The following is a 4 x 4 Latin square design:

A  B  C  D
C  D  A  B
B  C  D  A
D  A  B  C

If the “columns” represent the $\gamma_i$ and the “rows” represent the $\omega_j$, what do the letters A, B, C, and D represent? If the corresponding numerical results are

46  48  46  40
21  28  36  21
40  49  51  44
40  55  48  43

what are the estimated differences ($\beta$’s) between A and B, C, and D?
15.16 Youden square designs.
Another useful experimental design for minimizing the effects of two types of inhomogeneity is the Youden square design. Latin squares must have the same number of levels for both of the blocking factors and the treatment factor; Youden squares must have the same number of levels for the treatment factor and one of the blocking factors, but the number of levels for the other blocking factor can be smaller. Thus, Youden squares are more efficient than Latin squares, especially as the number of “treatment” levels gets large. How is the following Youden square design related to the Latin square design of Problem 15.15?

A  B  C  D
C  D  A  B
D  A  B  C

If the results are

46  48  46  40
27  28  36  27
40  55  48  43

what are the differences ($\beta$’s) between A and B, C, and D? What is the relationship in the analogy Youden square designs : Latin square designs :: balanced incomplete block designs : randomized complete block designs?

15.17 Graeco-Latin square designs.
If there are three types of blocking factors, Graeco-Latin square designs can be used to minimize their effects. The following is a 4 x 4 Graeco-Latin square. What do $\alpha$, $\beta$, $\gamma$, and $\delta$ represent?

A$\alpha$  B$\beta$  C$\gamma$  D$\delta$
B$\delta$  A$\gamma$  D$\beta$  C$\alpha$
C$\beta$  D$\alpha$  A$\delta$  B$\gamma$
D$\gamma$  C$\delta$  B$\alpha$  A$\beta$

15.18 Other designs.
Look up one of the following experimental designs and explain the basis of the design, its strengths, its weaknesses, and its major areas of application: Plackett-Burman designs, Box-Behnken designs, cross-over designs, hyper-Graeco-Latin square designs, fractional factorial designs, split-plot designs, quasi-Latin square designs, partially balanced incomplete block designs, lattice designs, rectangular lattice designs, cubic lattice designs, chain block designs.

15.19 Experimental design.
“In general, we should try to think of all variables that could possibly affect the results, select as factors as many variables as can reasonably be studied, and use planned grouping where possible” [Natrella (1963), p. 11-4]. Comment.
APPENDIX A
Matrix Algebra
Matrix algebra provides a concise and practical method for carrying out the mathematical operations involved in the design of experiments and in the treatment of the resulting experimental data.
A.1 Definitions

A matrix is a rectangular array of numbers. Many types of data are tabulated in arrays. For example, baseball fans are familiar with a tabulation of data similar to the following array:

Team             Won    Lost    Pct
Houston          41     22      0.651
Cincinnati       40     25      0.615
Los Angeles      36     28      0.563
San Francisco    28     32      0.467
San Diego        29     35      0.453
Atlanta          25     36      0.410
Not only is the value of each element in the matrix important, but the location of each element is also significant. Fans of the Atlanta team would be dismayed to see the sixth row of the array,

Atlanta    25    36    0.410

ranking Atlanta in last place. The baseball fans might also be interested in the winning percentages given by the third column of the array,

0.651
0.615
0.563
0.467
0.453
0.410

If we omit the row and column headings and focus our attention on the arrays of numbers in this example, we are dealing with the matrices

$A = \begin{bmatrix} 41 & 22 & 0.651 \\ 40 & 25 & 0.615 \\ 36 & 28 & 0.563 \\ 28 & 32 & 0.467 \\ 29 & 35 & 0.453 \\ 25 & 36 & 0.410 \end{bmatrix}$

$B = \begin{bmatrix} 25 & 36 & 0.410 \end{bmatrix}$

$C = \begin{bmatrix} 0.651 \\ 0.615 \\ 0.563 \\ 0.467 \\ 0.453 \\ 0.410 \end{bmatrix}$
The dimensions of a matrix are given by stating first the number of rows and then the number of columns that it has. Thus, matrix A has six rows and three columns, and is said to be a 6x3 (read “six by three”) matrix. Matrix B has one row and three columns and is a 1x3 matrix. Matrix C is a 6x1 matrix. Generally, a matrix M that has r rows and c columns is called an rxc matrix and can be identified as such by the notation $M_{r \times c}$. If the number of rows is equal to the number of columns in the matrix, the matrix is said to be a square matrix. For example, given the two simultaneous equations
the coefficients of the two unknowns $x_1$ and $x_2$ constitute a 2x2 square matrix
If a matrix contains only one row, it is called a row matrix or a row vector. The matrix B above is an example of a 1x3 row vector. Similarly, a matrix containing only one column is known as a column matrix or column vector. The matrix C above is a 6x1 column vector. One use of vectors is to represent the location of a point in an orthogonal coordinate system. For example, a particular point in a three-dimensional space can be represented by the 1x3 row vector [7 4 9] where the first element (7) represents the $x_1$-coordinate, the second element (4) represents the $x_2$-coordinate, and the third element (9) represents the $x_3$-coordinate.

Capital italic letters in bold-face are used by typesetters to represent matrices. The values in an array, or the elements of the array, are denoted using the corresponding small italic letters with appropriate subscripts. Thus, $a_{ij}$ denotes the element in the $i$th row and $j$th column of the matrix A. The individual elements of the previously defined A matrix are

$a_{11} = 41$    $a_{12} = 22$    $a_{13} = 0.651$
$a_{21} = 40$    $a_{22} = 25$    $a_{23} = 0.615$
$a_{31} = 36$    $a_{32} = 28$    $a_{33} = 0.563$
$a_{41} = 28$    $a_{42} = 32$    $a_{43} = 0.467$
$a_{51} = 29$    $a_{52} = 35$    $a_{53} = 0.453$
$a_{61} = 25$    $a_{62} = 36$    $a_{63} = 0.410$
Two matrices are equal ($A = D$) if and only if their dimensions are identical and their corresponding elements are equal ($a_{ij} = d_{ij}$ for all $i$ and $j$). The transpose $X'$ of a matrix $X$ is formed by interchanging its rows and columns; that is, the element in row $i$ and column $j$ of the transpose matrix is equal to the element $x_{ji}$ in row $j$ and column $i$ of the original matrix. For example, if
gj
then
x="3
-1
'1
0
Note that the first row of X becomes the first column of X', the second row of X becomes the second column of X', and so on. If X is a pxq matrix, then X' is a qxp matrix. If the transpose of a matrix is identical in every element to the original matrix (that is, if A' = A), then the matrix is called a symmetric matrix. Thus, a symmetric matrix has every element $a_{ij}$ equal to the corresponding element $a_{ji}$; it is symmetric with respect to its principal diagonal from upper left to lower right. A symmetric matrix is necessarily a square matrix, because otherwise its transpose would have different dimensions and could not be identical to it. A special case of the symmetric matrix is the diagonal matrix, in which all the off-diagonal elements are zero. For example,

$M = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

is a 3x3 diagonal matrix where $m_{11} = 2$, $m_{22} = 3$, $m_{33} = 1$, and $m_{ij} = m_{ji} = 0$ for all $i \neq j$.

The identity matrix $I$ is a diagonal matrix which has all 1's on the diagonal; for example, the 3x3 identity matrix is

$I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
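The definitions of this section translate directly into short routines. The Python sketch below represents a matrix as a list of rows; the function names and the 2x3 matrix `P` are hypothetical and chosen only for illustration, while `M` is the 3x3 diagonal matrix with diagonal elements 2, 3, and 1 described above.

```python
def transpose(m):
    """Interchange rows and columns: element (i, j) becomes element (j, i)."""
    return [list(col) for col in zip(*m)]


def is_square(m):
    return len(m) == len(m[0])


def is_symmetric(m):
    """A matrix is symmetric when it is identical to its transpose."""
    return m == transpose(m)


def is_diagonal(m):
    """Square matrix whose off-diagonal elements are all zero."""
    return is_square(m) and all(
        value == 0
        for i, row in enumerate(m)
        for j, value in enumerate(row)
        if i != j
    )


def identity(n):
    """n x n diagonal matrix with all 1's on the diagonal."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


M = [[2, 0, 0], [0, 3, 0], [0, 0, 1]]   # diagonal matrix from the text
P = [[1, 2, 3], [4, 5, 6]]              # hypothetical 2x3 matrix

print(transpose(P))        # a 3x2 matrix
print(is_symmetric(M))     # diagonal matrices are symmetric
print(is_symmetric(P))     # a non-square matrix cannot be symmetric
print(is_diagonal(identity(3)))
```

Comparing a matrix with its transpose handles the non-square case automatically, because the two lists then have different shapes and cannot be equal.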
A.2 Matrix addition and subtraction

The sum of two matrices is obtained by adding the corresponding elements of the two matrices. For example, given
and
.=[-4
'I
1 3
-2
then the sum S is
S=A+B=
2+(-4)
3+2
(-l)+l
0+3
1+1 5+(-2)
Note that the resulting matrix has the same dimensions as the original matrices; two
matrices may be added together if and only if they have identical dimensions. For example, if
T = [ :0
:]
-1
then S and T cannot be added together. When the dimensions of two matrices are the same, they are said to be conformable for addition. Matrix addition is commutative and associative:

A + B = B + A    (commutative)

A + (B + C) = (A + B) + C    (associative)
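The rules for addition can be illustrated with a short Python sketch. The matrices `P`, `Q`, `R`, and `W` below are hypothetical examples, not the matrices of the text, and the function names are likewise assumptions made for the illustration.

```python
def shape(m):
    """Return (rows, columns) for a matrix stored as a list of rows."""
    return (len(m), len(m[0]))


def add(a, b):
    """Element-by-element sum; defined only for identical dimensions."""
    if shape(a) != shape(b):
        raise ValueError("matrices are not conformable for addition")
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def negative(a):
    return [[-x for x in row] for row in a]


def subtract(a, b):
    """a - b, computed as a + (-b)."""
    return add(a, negative(b))


# Hypothetical matrices chosen only for illustration.
P = [[2, 0, 1], [3, -1, 5]]
Q = [[-4, 3, 1], [2, 1, -2]]
R = [[1, 1, 1], [0, 2, 4]]
W = [[1, 0], [0, 1]]          # 2x2, so it cannot be added to a 2x3 matrix

print(add(P, Q))
print(subtract(P, Q))
print(add(P, Q) == add(Q, P))                   # commutative
print(add(P, add(Q, R)) == add(add(P, Q), R))   # associative
```

Attempting `add(P, W)` raises a `ValueError`, which mirrors the statement that only matrices of identical dimensions are conformable for addition.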
The negative -A of a matrix A is simply the matrix whose elements are the negatives of the corresponding elements of A. The difference between two matrices is obtained by subtracting the corresponding elements of the second matrix from the elements of the first. For example, given the two matrices A and B on the previous page, their difference is
A.3 Matrix multiplication

The product of two matrices AB exists if and only if the number of rows in the second matrix B is the same as the number of columns in the first matrix A. If this is the case, the two matrices are said to be conformable for multiplication. If A is an mxp matrix and B is a pxn matrix, then the product C is an mxn matrix:

$C_{m \times n} = A_{m \times p} B_{p \times n}$

The number of rows (m) in the product matrix C is given by the number of rows in the first matrix A, and the number of columns (n) in the product matrix is given by the number of columns in the second matrix B. Each of the elements of the product matrix, C = AB, is found by multiplying each of the p elements in a row of A by the corresponding p elements in a column of B and taking the sum of the intermediate products. Algebraically, an element $c_{ij}$ is calculated

$c_{ij} = \sum_{k=1}^{p} a_{ik} b_{kj}$
For example, given the matrices
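To calculate the element $c_{11}$ in the product matrix C = AB:

To form the element $c_{32}$ in the product matrix, we find

$c_{32} = \begin{bmatrix} 7 & 8 & 9 \end{bmatrix} \begin{bmatrix} 0 \\ 3 \\ -1 \end{bmatrix} = 7 \times 0 + 8 \times 3 + 9 \times (-1) = 15$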
The entire product matrix may be calculated similarly.
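The row-by-column rule can be written out as a short Python sketch. The row [7 8 9] and the column [0 3 -1] come from the worked element above; the remaining entries of `A` and `B` are assumptions chosen only to complete a 3x3 and a 3x2 matrix, and the function name `multiply` is hypothetical.

```python
def multiply(a, b):
    """Product C = AB, with c_ij = sum over k of a_ik * b_kj."""
    rows_a, cols_a = len(a), len(a[0])
    rows_b, cols_b = len(b), len(b[0])
    if cols_a != rows_b:
        raise ValueError("matrices are not conformable for multiplication")
    return [
        [sum(a[i][k] * b[k][j] for k in range(cols_a)) for j in range(cols_b)]
        for i in range(rows_a)
    ]


# Third row of A and second column of B follow the worked element in the
# text; all other entries are hypothetical.
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # 3x3
B = [[1, 0], [2, 3], [0, -1]]           # 3x2

C = multiply(A, B)                      # 3x2 product
for row in C:
    print(row)
```

The product has as many rows as `A` and as many columns as `B`. Calling `multiply(B, A)` raises a `ValueError`, because a 3x2 matrix followed by a 3x3 matrix is not conformable for multiplication.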
An alternative layout of the matrices is often useful, especially when the matrices are large. The right matrix is raised, and the product matrix is moved to the left into the space that has been made available. The rows of the left matrix and the columns of the right matrix now “point” to the location of the corresponding product element.
1
[I!
-; i I;] 2
-
-3
-1
1
2 1
11
3
-I
2
-1
1 1
2 1 3 -3
-1
-1 -2
-1
1
1
2
-1
1 1 1 1
I:]1
-1
5
Another example of matrix multiplication involves an identity matrix:
This example illustrates why the identity matrix $I$ is so named: it serves the same role as the number 1 does in the multiplication of ordinary real numbers.

$AI = IA = A$

Note that the multiplication of matrices is distributive

$A(B + C) = AB + AC$

and associative

$(AB)C = A(BC)$

but that it is not, in general, commutative,

$AB \neq BA$

This general non-commutative property of matrix multiplications is in contrast with ordinary algebra. The product of a number and a matrix is another matrix obtained by multiplying each of the elements of the matrix by the number. For example,
A.4 Matrix inversion

The inverse $A^{-1}$ of a matrix A serves the same role in matrix algebra that the reciprocal of a number serves in ordinary algebra. That is, for a nonzero number a in ordinary algebra,

$a(1/a) = (1/a)a = 1$

whereas in matrix algebra

$AA^{-1} = A^{-1}A = I$

where $I$ is an identity matrix. Multiplying a matrix by an inverse matrix is analogous to division in ordinary algebra, an operation not defined in matrix algebra. The inverse of a matrix exists only for square matrices, and for any square matrix there can exist only one inverse. The inverse of an nxn square matrix is another nxn square matrix. As will be seen, not all square matrices have inverses. Finding the inverse of a large matrix is a tedious process, usually requiring a computer. The inverse matrices for 2x2 and 3x3 matrices, however, can be easily calculated by hand using the following formulas. Given the 2x2 matrix

$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$

the inverse matrix may be found by calculating

$A^{-1} = \begin{bmatrix} d/D & -b/D \\ -c/D & a/D \end{bmatrix}$

where

$D = ad - cb$

is the determinant of the 2x2 matrix A. Given the 3x3 matrix

$B = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & k \end{bmatrix}$

then the inverse matrix is

$B^{-1} = \begin{bmatrix} p & q & r \\ s & t & u \\ v & w & x \end{bmatrix}$

where

$p = (ek - hf)/D$    $q = -(bk - hc)/D$    $r = (bf - ec)/D$
$s = -(dk - gf)/D$    $t = (ak - gc)/D$    $u = -(af - dc)/D$
$v = (dh - ge)/D$    $w = -(ah - gb)/D$    $x = (ae - db)/D$

and the determinant D is calculated as

$D = a(ek - hf) - b(dk - gf) + c(dh - ge) = aek + bgf + cdh - ahf - bdk - cge$
If the determinant of the matrix to be inverted is zero, the calculations to be performed are undefined. This suggests a general rule: a square matrix has an inverse if and only if its determinant is not equal to zero. A matrix having a zero determinant is said to be singular and has no inverse. As an example of matrix inversion, consider the 2x2 matrix

$A = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}$

The determinant is

$D = 1 \times 4 - 2 \times 3 = -2$

Thus, the inverse matrix is

$A^{-1} = \begin{bmatrix} 4/(-2) & -3/(-2) \\ -2/(-2) & 1/(-2) \end{bmatrix} = \begin{bmatrix} -2 & 1.5 \\ 1 & -0.5 \end{bmatrix}$

This result can be verified by multiplying the inverse by the original matrix

$AA^{-1} = \begin{bmatrix} 1 \times (-2) + 3 \times 1 & 1 \times 1.5 + 3 \times (-0.5) \\ 2 \times (-2) + 4 \times 1 & 2 \times 1.5 + 4 \times (-0.5) \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$

As an example of a 3x3 matrix inversion, consider the matrix
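$B = \begin{bmatrix} 4 & 3 & 2 \\ 6 & 5 & 8 \\ 10 & 1 & 6 \end{bmatrix}$

The determinant D is

$D = 4 \times (5 \times 6 - 1 \times 8) - 3 \times (6 \times 6 - 10 \times 8) + 2 \times (6 \times 1 - 10 \times 5) = 132$

and the inverse is

$B^{-1} = \frac{1}{132} \begin{bmatrix} (5 \times 6 - 1 \times 8) & -(3 \times 6 - 1 \times 2) & (3 \times 8 - 5 \times 2) \\ -(6 \times 6 - 10 \times 8) & (4 \times 6 - 10 \times 2) & -(4 \times 8 - 6 \times 2) \\ (6 \times 1 - 10 \times 5) & -(4 \times 1 - 10 \times 3) & (4 \times 5 - 6 \times 3) \end{bmatrix} = \begin{bmatrix} 0.167 & -0.121 & 0.106 \\ 0.333 & 0.030 & -0.152 \\ -0.333 & 0.197 & 0.015 \end{bmatrix}$

The verification that this inverse matrix is correct is left as an exercise. A special case for matrix inversion is that of a diagonal matrix. The inverse of the diagonal matrix

$C = \begin{bmatrix} c_{11} & 0 & \cdots & 0 \\ 0 & c_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & c_{nn} \end{bmatrix}$

is another diagonal matrix of the form

$C^{-1} = \begin{bmatrix} 1/c_{11} & 0 & \cdots & 0 \\ 0 & 1/c_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/c_{nn} \end{bmatrix}$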
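The hand formulas for 2x2 and 3x3 inverses can be sketched in Python as follows. The function names are hypothetical. The 2x2 example uses the matrix with rows [1 3] and [2 4]; the 3x3 matrix is taken to have rows [4 3 2], [6 5 8], and [10 1 6], as read from the terms of the determinant expansion above, which is an assumption about the printed matrix. Exact fractions are used so that the checks do not depend on rounding.

```python
from fractions import Fraction


def inverse_2x2(m):
    """Inverse of [[a, b], [c, d]] using D = ad - cb."""
    (a, b), (c, d) = m
    det = a * d - c * b
    if det == 0:
        raise ValueError("singular matrix: determinant is zero")
    D = Fraction(det)
    return [[d / D, -b / D], [-c / D, a / D]]


def inverse_3x3(m):
    """Inverse of [[a, b, c], [d, e, f], [g, h, k]] from its cofactors."""
    (a, b, c), (d, e, f), (g, h, k) = m
    det = a * (e * k - h * f) - b * (d * k - g * f) + c * (d * h - g * e)
    if det == 0:
        raise ValueError("singular matrix: determinant is zero")
    D = Fraction(det)
    return [
        [(e * k - h * f) / D, -(b * k - h * c) / D, (b * f - e * c) / D],
        [-(d * k - g * f) / D, (a * k - g * c) / D, -(a * f - d * c) / D],
        [(d * h - g * e) / D, -(a * h - g * b) / D, (a * e - d * b) / D],
    ]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


A = [[1, 3], [2, 4]]
B = [[4, 3, 2], [6, 5, 8], [10, 1, 6]]   # assumed reading of the 3x3 example

A_inv = inverse_2x2(A)
B_inv = inverse_3x3(B)

print([[float(v) for v in row] for row in A_inv])
print([[round(float(v), 3) for v in row] for row in B_inv])
print(matmul(B, B_inv) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])
```

Multiplying each matrix by its computed inverse returns the identity matrix, which is the verification suggested in the text. Passing a matrix with a zero determinant, such as one with rows [1 2] and [2 4], raises a `ValueError` in place of a division by zero.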
A.5 Exercises

Use the following array of numbers for exercises 1-8.

41    22    0.651
40    25    0.615
36    28    0.563
28    32    0.467
29    35    0.453
25    36    0.410

1. Write the numbers in the third row.
2. Write the numbers in the second column.
3. What value is in both the third row and second column?
4. What is the position of the value 29?
5. What is the position of the value 0.615?
6. What value is in both the fourth row and first column?
7. Write the numbers in the first column.
8. Write the numbers in the fifth row.
9. Write a row vector. Call it J. Use proper subscripts to indicate its size.
10. Write a column vector. Call it K. Use proper subscripts to indicate its size.
11. Write a 2x2 identity matrix I.
12. Write a row vector L to indicate the location of the point $x_1$ = 12, $x_2$ = 7 in the two-dimensional factor space of $x_1$ and $x_2$.
13. Write a row vector M to indicate the location of the point $x_1$ = 12, $x_2$ = 7, $x_3$ = 47 in the three-dimensional factor space of $x_1$, $x_2$, and $x_3$.
14. Write a row vector N to indicate the location of the point $x_1$ = 12, $x_2$ = 7, $x_3$ = 47, $x_4$ = 47 in the four-dimensional factor space of $x_1$, $x_2$, $x_3$, and $x_4$.
15. What is the value of the element $m_{13}$?
16. Write a symmetric matrix R.
17. Write the transpose R’ of your R matrix.
18. Write a diagonal matrix S.

Given the following matrices X, Y, and Z, answer the remaining questions:

20. What is the value of the element $x_{11}$?
21. What is the value of the element $z_{21}$?
22. What is the value of the element $y_{32}$?
23. Which matrix is a square matrix?
24. Which matrix has more rows than columns?
25. Which matrix has more columns than rows?
26. What is the element in the second row and second column of X?
27. What is the element in the third row and third column of Y?
28. Write a matrix Q equal to X.
29. Write the transpose X’ of the X matrix.
30. Write the transpose Y’ of the Y matrix.
31. Write the transpose Z’ of the Z matrix.
32. Which of the above six matrices (X, X’, Y, Y’, Z, Z’) are symmetric?
33. Add the matrices X and Y.
34. Add the matrices X’ and Y. Call the result B.
35. Add the matrices Y’ and X. Call the result C.
36. Subtract X’ from Y (i.e., Y - X’). Call the result D.
37. Subtract Y’ from X (i.e., X - Y’). Call the result E.
38. Write the transpose C’ of the C matrix.
39. Subtract C’ from B (i.e., B - C’).
40. Write the transpose E’ of the E matrix.
41. Add E’ and D.
42. Write F = -D.
43. Subtract Y from X’ (i.e., X’ - Y). Call the result G.
44. Add F and G.
45. Multiply X by Y (i.e., YX). Call the result T.
46. Multiply Y by X (i.e., XY). Call the result U.
47. Multiply T by U (i.e., UT).
48. Multiply U by T (i.e., TU).
49. Multiply Y’ by T (i.e., TY’).
50. Multiply U by 5 (i.e., 5U).
51. Multiply X by X’ (i.e., X’X). Call the result (X’X).
52. Multiply Y by Y’ (i.e., Y’Y). Call the result (Y’Y).
53. Take the inverse of (X’X).
54. Take the inverse of (Y’Y).
A.6 Answers

1. 36  28  0.563
2. 22  25  28  32  35  36
3. 28
4. Fifth row, first column.
5. Second row, third column.
6. 28
7. 41  40  36  28  29  25
8. 29  35  0.453
9. Variable answer.
10. Variable answer.
11. $I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$
12. L = [12 7]
13. M = [12 7 13]
14. N = [12 7 13 47]
15. 13
16. Variable answer.
17. R’ = R for a symmetric matrix.
18. Variable answer.
19. S’ = S for a diagonal matrix.
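20. 1
21. 8
22. 2
23. Z
24. Y
25. X
26. 5
27. There is no third column in Y.
28. $Q = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$
29. $X' = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}$
30. $Y' = \begin{bmatrix} 4 & 3 & 1 \\ 7 & 9 & 2 \end{bmatrix}$
31. $Z' = \begin{bmatrix} 2 & 8 \\ 5 & 4 \end{bmatrix}$
32. None.
33. The matrices are not conformable for addition.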
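34. $B = \begin{bmatrix} 5 & 11 \\ 5 & 14 \\ 4 & 8 \end{bmatrix}$
35. $C = \begin{bmatrix} 5 & 5 & 4 \\ 11 & 14 & 8 \end{bmatrix}$
36. $D = \begin{bmatrix} 3 & 3 \\ 1 & 4 \\ -2 & -4 \end{bmatrix}$
37. $E = \begin{bmatrix} -3 & -1 & 2 \\ -3 & -4 & 4 \end{bmatrix}$
38. $C' = \begin{bmatrix} 5 & 11 \\ 5 & 14 \\ 4 & 8 \end{bmatrix}$
39. $B - C' = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$
40. $E' = \begin{bmatrix} -3 & -3 \\ -1 & -4 \\ 2 & 4 \end{bmatrix}$
41. $E' + D = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$
42. $F = \begin{bmatrix} -3 & -3 \\ -1 & -4 \\ 2 & 4 \end{bmatrix}$
43. $G = \begin{bmatrix} -3 & -3 \\ -1 & -4 \\ 2 & 4 \end{bmatrix}$
44. $F + G = \begin{bmatrix} -6 & -6 \\ -2 & -8 \\ 4 & 8 \end{bmatrix}$
45. $T = \begin{bmatrix} 32 & 43 & 54 \\ 39 & 51 & 63 \\ 9 & 12 & 15 \end{bmatrix}$
46. $U = \begin{bmatrix} 13 & 31 \\ 37 & 85 \end{bmatrix}$
47. Matrices are not conformable for multiplication.
48. Matrices are not conformable for multiplication.
49. Matrices are not conformable for multiplication.
50. $5U = \begin{bmatrix} 65 & 155 \\ 185 & 425 \end{bmatrix}$
51. $(X'X) = \begin{bmatrix} 17 & 22 & 27 \\ 22 & 29 & 36 \\ 27 & 36 & 45 \end{bmatrix}$
52. $(Y'Y) = \begin{bmatrix} 26 & 57 \\ 57 & 134 \end{bmatrix}$
53. The determinant is equal to zero.
54. $(Y'Y)^{-1} = \begin{bmatrix} 134/235 & -57/235 \\ -57/235 & 26/235 \end{bmatrix} = \begin{bmatrix} 0.5702 & -0.2426 \\ -0.2426 & 0.1106 \end{bmatrix}$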
APPENDIX B
Critical Values of t
12.706  31.821  63.657  636.619
Taken from Table 111 of Fisher and Yates: Statistical Tables for Biological, Agricultural and Medical Research, published by Longman Group Ltd., London (previously published by Oliver and Boyd, Edinburgh), and by permission of the authors and publishers.
[Table of critical values of F; the entries are not recoverable from this copy.]
References

Abramowitz, M., and Stegun, I.A., Eds. (1968). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables (Nat. Bur. of Stand. Appl. Math. Ser., No. 55), 7th printing, US Govt. Printing Office, Washington, DC.
Adams, J.L. (1991). Flying Buttresses, Entropy, and O-Rings: The World of an Engineer, Harvard University Press, Cambridge, MA.
Allus, M.A., Brereton, R.G., and Nickless, G. (1988). “The Effect of Metals on the Growth of Plants: The Use of Experimental Design and Response Surfaces in a Study of the Influences of Tl, Cd, Zn, Fe and Pb on Barley Seedlings,” Chemom. Intel. Lab. Sys., 3, 215-231.
Allus, M.A., Brereton, R.G., and Nickless, G. (1989). “The Use of Experimental Design, Multilinear Regression, ANOVA, Confidence Bands and Leverage in a Study of the Influence of Metals on the Growth of Barley Seedlings,” Chemom. Intel. Lab. Sys., 6, 65-80.
Anderberg, E.K., Bisrat, M., and Nystrom, C. (1988). “Physicochemical Aspects of Drug Release. VII. The Effect of Surfactant Concentration and Drug Particle Size on Solubility and Dissolution Rate of Felodipine, a Sparingly Soluble Drug,” Internat. J. of Pharma., 47, 67-77.
Anderberg, E.K., and Nystrom, C. (1990). “Physicochemical Aspects of Drug Release X. Investigation of the Applicability of the Cube Root Law for Characterization of the Dissolution Rate of Fine Particulate Materials,” Internat. J. of Pharma., 62, 143-151.
Anderson, V.L., and McLean, R.A. (1974). Design of Experiments: A Realistic Approach, Dekker, New York, NY.
Arkin, H., and Colton, R.R. (1970). Statistical Methods, 5th ed., Barnes and Noble, New York, NY.
Asimov, I. (1948). “The Endochronic Properties of Resublimated Thiotimoline,” Astounding Science Fiction.
Bailar, J.C., III, and Mosteller, F., Eds. (1986). Medical Uses of Statistics, New England Journal of Medicine Books, Waltham, MA.
Barker, T.B. (1985). Quality by Experimental Design, Dekker, New York, NY.
Barnett, V., and Lewis, T. (1978). Outliers in Statistical Data, John Wiley & Sons, New York, NY.
Bates, D.M., and Watts, D.G. (1988). Nonlinear Regression Analysis and Its Applications, Wiley, New York, NY.
Bayne, C.K., and Rubin, I.B. (1986). Practical Experimental Designs and Optimization Methods for Chemists, VCH Publishers, Deerfield Beach, FL.
Beale, E.M.L. (1988). Introduction to Optimization, Wiley, New York, NY.
Bendell, A., Disney, J., and Pridmore, W.A., Eds. (1989). Taguchi Methods: Applications in World Industry, Springer-Verlag, London.
Berger, R.W., and Hart, T. (1986). Statistical Process Control: A Guide for Implementation, ASQC Quality Press, Milwaukee, WI.
Berridge, J.C. (1988). “Chemometrics and Method Development in High-performance Liquid Chromatography. Part 1: Introduction,” Chemom. Intel. Lab. Sys., 3, 175-188.
Berridge, J.C. (1989). “Chemometrics and Method Development in High-performance Liquid Chromatography. Part 2: Sequential Experimental Designs,” Chemom. Intel. Lab. Sys., 5, 195-207.
Bertalanffy, L. (1968). General System Theory: Foundations, Development, Applications, George Braziller, New York, NY.
Bidlingmeyer, B.A., Deming, S.N., Price, W.P., Jr., Sachok, B., and Petrusek, M. (1979). “Retention Mechanism for Reversed-Phase Ion-Pair Liquid Chromatography,” J. Chromatogr., 186, 419-434.
Bose, R.C., and Carter, R.L. (1962). “Response Model Coefficients and the Individual Degrees of Freedom of a Factorial Design,” Biometrics, 18, 160-171.
Box, G.E.P. (1952). “Statistical Design in the Study of Analytical Methods,” Anal. Chem., 77, 879.
Box, G.E.P. (1954). “The Exploration and Exploitation of Response Surfaces: Some General Considerations and Examples,” Biometrics, 10, 16-60.
418 Box, G.E.P. (1957). “Evolutionary Operation: A Method for Increasing Industrial Productivity,” Applied Statistics, 6, 81-101. Box, G.E.P. (1976). “Science and Statistics,” J. Amer. Statist. Assoc., 71,791-799. Box, G.E.P. (1984). “The Importance of Practice in the Development of Statistics,” Technometrics, 26, 1-8. Box, G.E.P., and Behnken, D.W. (1960a). “Simplex-Sum Designs: A Class of Second Order Rotatable Designs Derivable from those of First Order,” Ann. Math. Statist., 31, 838-864. Box, G.E.P., and Behnken, D.W. (1960b). “Some New Three Level Designs for the Study of Quantitave Variables.” Technometrics, 2, 455. Box, G.E.P., and Draper, N.R. (1959). “A Basis for the Selection of a Response Surface Design,” J. Amer. Statist. Assoc., 54, 622-654. Box, G.E.P., and Draper, N.R. (1969). Evolutionary Operation: A Statistical Method Prcess Improvement, Wiley, New York, NY. Box, G.E.P., and Draper, N.R. (1987). Empirical Model-Building and Response Surfaces, Wiley, New York, NY. Box, G.E.P., and Hunter, J.S. (1957). “Multi-Factor Experimental Designs for Exploring Response Surfaces,” Ann. Math. Statist., 28, 195-241. Box, G.E.P., and Hunter, J.S. (1961a). “The 2‘’ Fractional Factorial Designs: Part I.,” Techrrotnrtrics, 3, 3 11-35 1. Box, G.E.P., and Hunter, J.S. (1961b), “The 2k-FFractional Factorial Designs: Part II.,” Techrionretrics. 3, 449458. Box, G.E.P., Hunter, W.G., and Hunter, J.S. (1978). Statistics f o r Experimenters: An Introdrrcrion to Design, Data Analysis, and Model Building, Wiley, New York, NY. Box, G.E.P., and Jenkins, G.M. (1976), Time Series Analysis: Forecasting and Control, 2nd ed.. Holden-Day, San Francisco, CA. Box, G.E.P., and Wilson, K.B. (1951). “On the Experimental Attainment of Optimum Conditions,” J. Roy. Statist. SOC.,Series B, 13, 1 4 5 . Box, G.E.P., and Youle, P.V. (1955). 
“The Exploration and Exploitation of Response Surfaces: An Example of the Link between the Fitted Surface and the Basic Mechanism of the System,” Biornetrics, 1 1 , 287-323 BOX,J.F. (1978), R.A. Fisher, The Life of a Scientist, Wiley, New York, NY. BOX,J.F. (1980). “R.A. Fisher and the Design of Experiments, 1922-1926,” American Sturisticiurr, 34, 1-7. BOX,J.F. (1981). “Gosset, Fisher, and the t Distribution,” American Statistician, 35, 61-66. Bratchell, N. (1989). “Multivariate Response Surface Modelling by Principal Components Analysis,” J. Chemom., 3,579-588. Campbell, S.K. (1974), Flaws and Fallacies in Statistical Thinking, Prentice-Hall, Englewood Cliffs, NJ. Carter, W.H., Jr. (1985). “Response Surface Methodology and the Design of Clinical Trials for the Evaluation of Cancer Chemotherapy,” Cancer Treatment Reports, 69, 1049-1053. Carter, W.H., Jr., Jones, D.E., and Carchman, R.A. (1985), “Application of Response Surface Methods for Evaluating the Interactions of Soman, Atropine, and Pralidioxime Chloride,” Fundamental and Applied Tox., 5, S232-S241. Carter, W.H., Jr., and Wampler, G.L. (1986), “Review of the Application of Response Surfaacc Methodology in the Combination Therapy of Cancer,” Cancer Treatment Reports, 70, 133-140. Chambers, J.M. (1973). “Fitting Nonlinear Models: Numerical Techniques,” Biometrika, 60, 1-13. Chambers, J.M., Cleveland, W.S., Kleiner, B., and Tukey. P.A. (1983), Graphical Methods f o r Data Analysis, Wadsworth, Belmont, CA. Claringbold, P.J. (195.5). “Use of the Simplex Design in the Study of the Joint Action of Related Hormones”, Biometrics, 11, 174-185 (1955). Cleveland, W.S. (1985). The Elements of Graphing Data, Wadsworth. Monterey, CA. Cochran, W.G., and Cox, G.M. (1950). Experimental Designs, Wiley, New York, NY. Coenegracht. P.M.J., Dijkman, M., Duineveld, C.A.A., Metting, H.J.. Elema, E.T., and Malingre, T.M. (1991). 
“A New Quaternary Mobile Phase System for Optimization of TLC Separations of Alkaloids Using Mixture Designs and Response Surface Modelling,” J. Liquid Chromatogr., 14, 3213-3239. Colina, A., Palacios, J.L., and Sarabia, L.A. (1989). “Analysis of Response Surfaces in the Study of Overlapped Polarographic Waves,” Chemom. Intel. Lab. Sys., 6 , 8 1-87.
419 Connor, W.S., and Zelen, M. (1959). “Fractional Factorial Experiment Designs for Factors at Three Levels,” National Bureau of Standards Applied Mathematics Series, 54, 1-37. Cornell, J.A. (1973). “Experiments with Mixtures: A Review,” Technometrics. 15, 437. Comell, J. (1990), Experiments with Mixtures: Designs, Models, and the Analysis of Mixture Data, 2nd ed., Wiley, New York, NY. Cox, D.R. (1971). “A Note on Polynomial Response Functions for Mixtures,’’ Biometrika, 58, 155-159. Crisponi, G., Nurchi. V.. and Ganadu, M.L. (1990). “An Approach to Obtaining an Optimal Design in the NonLinear Least Squares Determination of Binding Parameters in a Complex Biochemical System,” J. Chemotn., 4, 123-133. Crosby, P.B. (1979), Qualify is Free: The Art of Making Qualify Certain, McGraw-Hill, New York, NY. Crosby, P.B. (1984). Qualify Without Tears: The Art of Hassle-Free Management, McGraw-Hill, New York,
NY. Daniel, C. (1958), “On Varying One Factor at a Time,” Biometrics, 14, 4 3 M 3 1 . Daniel, C. (1976). Applications of Statistics to Industrial Experimentation, Wiley, New York, NY. Daniel, C., and Wood, F.S. (1971). Fitting Equations to Data: Computer Analysis of Multifactor Data for Scientists and Engineers, Wiley, New York, NY. Davies, O.L., Ed. (1956), Design and Analysis of Industrial Experiments, 2nd ed., Hafner, New York, NY. Davis, J.C. (1986). Statistics and Data Analysis in Geology, 2nd ed., Wiley, New York, NY. Dean, W.K., Heald, K.J., and Deming, S.N. (1975). “Simplex Optimization of Reaction Yields,” Science, 189, 805-806. Deming, S.N. (1971). “Pseudo-Three-Dimensional Presentation of Reaction-Rate Data,” Anal. Chem., 43, 1726-1728. Deming, S.N. (1977). “Optimization of Experimental Parameters in Chemical Analysis,” Chapter 5 i n DeVoe, J.R., Ed., Validation of the Measurement Process, ACS Symposium Series 63, American Chemical Society, pp. 162-175. Deming, S.N. (1978), “Optimization of Methods,” Chapter 2 in Hirsch, R.F., Ed., Proceedings of the Eastern Analytical Symposium on Principles of Experimentation and Data Analysis, Franklin Institute Press, pp. 31-55. Deming, S.N. (1981), “The Role of Optimization Strategies in the Development of Analytical Chemical Methods,” American Laboratory, 13(6), 42, 44. Deming, S.N. (1985), “Optimization,” Journal of Research of the National Bureau of Standards, 90(6), 479-4233. Deming, S.N. (1986), “Optimization and Experimental Design in Analytical Chemical Methods Development,” Chapter 41 in Laing, W.R., Ed., Analytical Chemistry Instrumentation: Proceedings of the 28th Conference on Analyticd Chemistry in Energy Technology, Lewis Publishers, Chelsea. MI. pp. 293-297. Deming. S.N. (1988). “Quality by Design: Part 1,” CHEMTECH, 18, 560-566. Deming, S.N. (1989a). “Quality by Design: Part 2,” CHEMTECH, 19, 52-58. Deming, S.N. (1989b). “Quality by Design: Part 3,” CHEMTECH, 19, 249-255. Deming, S.N. 
(1989~).“Quality by Design: Part 4,” CHEMTECH, 19, 504-511. Deming, S.N. (1990a). “Quality by Design: Part 5,” CHEMTECH, 20, 118-126 778. Deming, S.N. (1990b). “Experimental Optimization for Quality Products and Processes,” Chapter 7 in Karjalainen, E.J., Ed., Scientific Computing and Automation (Europe) 1990. Elsevier, Amsterdam, pp. 7 1-83. Deming, S.N., Bower, J.G., and Bower, K.D. (1984). “Multifactor Optimization of HPLC Conditions,” Chapter 2 in Giddings, J.C., Ed., Advances in Chromatography, 24, 35-53. Deming, S.N., and Morgan, S.L. (1973). “Simplex Optimization of Variables in Analytical Chemistry,’’ Anal. Chem., 45, 278A-283A. Deming, S.N., and Morgan, S.L. (1977). “Advances in the Application of Optimization Methodology in Chemistry,” Chapter 1 in Kowalski, B.R.. Ed., Chemometrics: Theory and Application, ACS Symposium Series 52, American Chemical Society, pp. 1-13. Deming, S.N., and Morgan, S.L. (1979), “The Use of Linear Models and Matrix Least Squares in Clinical Chemistry,” Clin. Chem., 25, 840-855. For correspondence, see Clin. Chem., 25, 2052-2055 (1979); Clirr. Chem., 26, 1227-1228.
420 Deming, S.N., and Morgan, S.L. (1983). “Teaching the Fundamentals of Experimental Design,” Anal. Chim. Acta, 150, 183-198. Deming, S.N., Morgan, S.L., and Willcott, M.R. (1976). “Sequential Simplex Optimization,” American Laboratory, 8(10), 13-14, 16, 18-19. Deming, S.N.. Palasota. J.A., and Palasota, J.M. (1991). “Experimental Design in Chemometrics,” J. Chemom., 5, 181-192. Deming, S.N.. and Parker, L.R., Jr. (1978), “A Review of Simplex Optimization in Analytical Chemistry,” CRC Crit. Rev. Anal. Chem., 7 , 187-202. Deming, S.N., and Turoff, M.L.H. (1978), “Optimization of Reverse-Phase Liquid Chromatographic Separation of Weak Organic Acids,” Anal, Chem., 50, 546-548. Deming, W.E. (1943). Statistical Adjustment of Data, Wiley, New York, NY. Deming, W.E. (1950). Some Theory of Sampling, Wiley, New York, NY. Deming. W.E. (1953). “On the Distinction between Enumerative and Analytic Surveys,” J. Amer. Statist. Assoc., 48, 244-255. Deming, W.E. (1956). “On the Use of Theory,” Industrial Quality Control, 13, 12-14. Deming, W.E. (1960). Sample Design in Business Research, Wiley, New York, NY. Deming, W.E. (1975a), “On Some Statistical Aids toward Economic Production,” Inte$aces, 5, 1-5. Deming, W.E. (1975b). “On Probability as a Basis for Action,” American Statistician, 29, 146-152. Deming, W.E. (1982). Quality, Productivity, and Competitive Position, MIT Center for Advanced Engineering Study, Cambridge, MA. Deming, W.E. (1985). “Transformation of Western Style of Management,” Interfnces, 15, 6-1 1. Deming. W.E. (1986). Out of the Crisis, Center for Advanced Engineering Study, Massachusetts Institute of Technology, Cambridge, MA. Derringer, G., and Suich, R. (1980). “Simultaneous Optimization of Several Response Variables,” J. Qual. Technol.. 12(4), 214-219. Diamond, W.J. (1989). Practical Experiment Designs, 2nd ed., Van Nostrand Reinhold, New York, NY. Dobyns, L., and Crawford-Mason, C. (1991). 
Quality or Else: The Revolution in World Business, Houghton Mifflin, Boston, MA. Draper, N.R., and Smith, H. (1981). Applied Regression Analysis, 2nd ed., Wiley, New York, NY. Driver, R.M. (19701, “Statistical Methods and the Chemist.” Chem. Brit., 6, 154-158. Duncan, A.J. (1959), Qualify Control and Industrial Statistics, revised ed., Irwin, Homewood, IL. Dunn, O.J.. and Clark, V.A. (1987). Applied Statistics: Analysis of Variance and Regression, 2nd ed., Wiley. New York, NY. Eisenharf C. (1968), “Expression of the Uncertainties of Final Results,” Science, 160, 1201. Enke, C.G. (1971). “Data Domains-An Analysis of Digital and Analog Instrumentation Systems and Components,” Anal. Chem., 43, 69. Evans, J.W. (1979), “Computer Augmentation of Experimental Designs to Maximize [X’X],” Technometrics, 21. 321. Finlay, W.L. (1977). ResearcWDevelopment, (May) p. US. Fisher, R.A. (1959). Smoking: the Cancer Controversy, Oliver and Boyd, London. Fisher, R.A. (1970). Statistical Methods for Research Workers, 14th ed., Hafner, New York, NY. Fisher, R.A. (1971). The Design of Experiments, 8th ed., Hafner, New York, NY. Fisher, R.A., and Yates, F. (1963). Statistical Tablesfor Biological, Agricultural and Medical Research, 6th ed., Hafner, New York, NY. Fletcher, R . (1987). Practical Methods of Optimization, 2nd ed., Wiley. New York, NY. Franklin, N.L.. Pinchbeck, P.H.. and Popper, F. (1956). “A Statistical Approach to Catalyst Development Part I: The Effect of Process Variables on the Vapour Phase Oxidation of Naphthalene.” Trans. Instn. Chem. Engrs, 34,280-293. Franklin, N.L., Pinchbeck, P.H., and Popper, F. (1958). “A Statistical Approach to Catalyst Development. Part 11. The Integration of Process and Catalyst Variables in the Vapour Phase Oxidation of Naphthalene,” Trans. Instn. Chem. Engrs., 36, 259-369.
42 1 Gall, 1. (1986). Systemantics, 2nd ed., The General Systemantics Press, Ann Arbor, MI. Ghosh, S., Ed. (1990). Statistical Design and Analysis of Industrial Experiments, Dekker, New York, NY. Gibson, R.J. (1968), “Experimental Design, or Happinessis Planning the Experiment,” Bioscience, 18, 223-225. Gitlow, H., Gitlow, S., Oppenheim, A., and Oppenheim, R. (1989), Tools and Methods f o r the Improvement of Quality, Irwin, Homewood, IL. Glajch, J.L., Kirkland, J.J., and Snyder, L.R. (1982). “Practical Optimization of Solvent Selectivity in LiquidSolid Chromatography using a Mixture-Design Statistic1 Technique,” J. Chromatogr., 238, 269-280. Glajch, J.L., Kirkland, J.J., Squire, K.M., and Minor, J.M. (1980), “Optimization of Solvent Strength and Selectivity for Reversed-Phase Liquid Chromatography using an Interactive Mixture-Design Statistical Technique,” J. Chromatogr., 199, 57-79. Glajch, J.L. and Snyder, L.R. (1981). “Solvent Strength of Multicomponent Mobile Phases in Liquid-Solid Chromatography. Mixtures of Three or More Solvents,” J. Chromatogr., 214, 21-34. Gluckman, P., and Roome, D.R. (1990). Everyhy Heroes: From Taylor to Deming: The Journey lo Higher Productivity, SPC Press, Knoxville, TN. Gold, H.J. (1977). Mathematical Modelling of Biological Systems - an Introductory Guidebook, Wiley, New York, NY. Goldberg, R. (1930), Colliers, 21 June, p. 14. Golub, G.H., and Van Loan, C.F. (1989). Matrix Computations, 2nd ed., Johns Hopkins University Press, Baltimore, MD. Gould, S.J. (1981). The Mismeasure ofMan, W.W. Norton & Company, New York, NY. Grant, E.L., and Leavenworth, R.S. (1988). Statisrical Quality Control, 6th ed.. McGraw-Hill, New York, NY. Green, P.E. (1976), Mathematical Tools for Applied Multivariate Analysis: Student Edition, Academic Press, New York, NY. Hacking, I. (1973, The Emergence of Probability, Cambridge University Press, Cambridge. Hahn, G.J. (1973). “The Coefficient of Determination Exposed!,” CHEMTECH, 3, 609-612. Hahn, G.J. 
(1974a). “Don’t Let Statistical Significance F w l You!,” CHEMTECH, 4, 16-17, 55. Hahn, G.J. (1974b), “Putting Bounds on a Prediction,” CHEMTECH, 4, 381-383. Hahn, G.J. (1975a). “‘How Large a Sample Do I Need for 95% Confidence?’,’’ CHEMTECH, 5. 61-62. Hahn, G.J. (1975b). “What Confidence Level Should I Select?,” CHEMTECH. 5. 186-187. Hahn, G.J. (197%). “Designing Experiments - I,” CHEMTECH, 5, 496-498. Hahn, G.J. (1975d). “Designing Experiments - 11,” CHEMTECH, 5, 561-562. Hahn, G.J. (1976a), “How Big a Sample Do I Need?,” CHEMTECH, 6, 142-143. Hahn, G.J. (1976b), ‘‘Process Improvement Using Evolutionary Operation,” CHEMTECH, 6, 204-206. Hahn, G.J. (1976c), “Process Improvement through Simplex EVOP,” CHEMTECH, 6, 343-345. Hahn, G.J. (1976d), “Whys and Wherefores of Normal Distribution,” CHEMTECH, 6, 530-532. Hahn, G.J. (1977a). “Beware of Nonindependent Observations!,” CHEMTECH, 7 , 117-1 18. Hahn, G.J. (1977b). “Estimating Sources of Variability,” CHEMTECH, 7 , 580-582. Hahn, G.J. (1977c), “Must I Randomize?,” CHEMTECH, 7 , 630-632. Hahn, G.J. (1977d), “Some Things Engineers Should Know About Experimental Design,” J. Qualify Tech., 9, 13-20. Hahn, G.J. (1978a). “More on Randomization.” CHEMTECH, 8, 164-168. Hahn, G.J. (1978b). “Don’t Be Deceived!,” CHEMTECH. 8, 317-318. Hahn, G.J. (1978c), “The Hazards of Extrapolation,” CHEMTECH, 8, 699-701. Hahn, G.J. (1979a). “More Hazards of Extrapolation,” CHEMTECH, 9, 46-49. Hahn, G.J. (1979b). “Sample Size Determines Precision.” CHEMTECH, 9, 294-295. Hahn, G.J. (1979~).“What Do I Gain from Smoothing Data?,” CHEMTECH. 9, 492493. Hahn, G.J. (1980a). “Planning Experiments: An Annotated Bibliography.” CHEMTECH, 10, 36-39. Hahn, G.J. (1980b), “Retrospective Studies Versus Planned Experimentation.” CHEMTECH. 10, 372-373. Hahn, G.J. (1981a). “Some Bwks on Applied Regression Analysis,” CHEMTECH, 11, 96-99. Hahn, G.J. (1981b). “‘True’ Product Variability,” CHEMTECH, 11, 156-157.
422 Hahn, G.J. (1982a). “Demonstrating Performance with ‘High Statistical Confidence’,” CHEMTECH. 12, 286-289. Hahn, G.J. (1982b), “On-Line Comparison of Process Alternatives,” CHEMTECH, 12, 741-743. Hahn, G.J.(1984a). “Calculating Overall Error,” CHEMTECH, 14, 696-697. Hahn, G.J. (1984b). “Experimental Design in the Complex World,” Technometrics, 26, 19-31. Hahn, G.J. (1985). “More Intelligent Statistical Software and Statistical Expert Systems: Future Directions.” American Statistician. 39, 1-16. Hahn, G.J. (1986), “Improving Our Most Important Product,” Inst. of Math. Stat. Bull.. 15, 354-358. Hahn, G.J., and Boardman. T.J. (1987a), “Continuous Quality Improvement: Part 1,” CHEMTECH, 17, 346-349. Hahn. G.J., and Boardman, T.J. (1987b). “Statistical Approaches to Quality Improvement: Part 2,” CHEMTECH, 17,412417. Hahn, G.J., and Meeker, W.Q. (1983), “Product Life Data Analysis and Some of Its Hazards,” CHEMTECH, 13, 282-284. Hahn, G.J., and Meeker, W.Q., Jr. (1984). “An Engineer’s Guide to Books on Statistics and Data Analysis,” J. Qual. Technol., 16(4), 196-218. Hahn, G.J., and Meeker, W.Q. (1985). “From Data to Decision: A Bookshelf for the Technologist,” CHEMTECH, 15, 175-177. Hahn, G.J., and Meeker, W.Q. (1991). Statistical Intervals: A Guide for Practitioners, Wiley, New York, NY. Hahn, G.J.. Meeker, W.Q., Jr.. and Feder, P.I. (1976). “The Evaluation and Comparison of Experimental Designs for Fitting Regression Relationships,” J. Qual. Technol., 8(3), 14CL157. Harrington, E.C., Jr. (1965). “The Desirablity Function,” Industrial Quulify Control, 21, 494498. Havlicek, L.L., and Crain, R.D. (1988). Practical Statistics f o r the Physical Sciences, American Chemical Society, Washington, DC. Hawkins, D.M. (1980). “A Note on Fitting a Regression Without an Intercept Term,” American Statistician, 34,233. Healy, M.J.R. (1986). Matrices for Statistics, Oxford University Press, New York, NY. Hendrix, C. (1980). 
“Through the Response Surface with Test Tube and Pipe Wrench,” CHEMTECH, 10, 488497. Hendrix, C.D. (1986), “Sixteen Ways to Mess Up an Experiment,” CHEMTECH, 16, 223-231. Hicks, C.R. (1973). Fundamental Concepts in the Design of Experiments, 2nd. ed. Holt, Rinehart, and Winston, New York, NY. Hill, W.J., and Hunter, W.G. (1966), “A Review of Response Surface Methodology: A Literature Survey,” Technometrics, 8, 571. Himmelblau, D.M. (1970). Process Analysis by Statistical Methods, Wiley, New York, NY. Hitchcock, K., Kalivas. J.H., and Sutter, J.M. (1992), “Computer-Generated Multicomponent Calibration Designs for Optimal Analysis Sample Predictions,” J. Chemom., 6, 85-96. Holland, P.W. (1986). “Statistics and Causal Inference,” J. Amer. Statist. Assoc., 81, 945-960. Hooke, R. (1980). “Getting People to Use Statistics Properly.” American Statistician, 34, 3 9 4 2 . Horwitz, W. (1989),“Interlaboratory Studies,” J. Assoc. Of. Anal. Chem., 72, 145-147. Huff, D. (1954). How to Lie with Statistics. Norton, New York, NY. Hunter, J.S. (1985). “Statistical Design Applied to Product Design,” J. Qual. Technol., 17(4), 21&221. Hunter, J.S. (1987), “Applying Statistics to Solving Chemical Problems”, CHEMTECH, 17, 167-169. Hunter, J.S. (1989), “Let’s All Beware the Latin Square,” Qual. Eng., 1, 453465. Hunter, W.G. (1981), “The Practice of Statistics: The Real World Is an Idea Whose Time Has Come,” American Statistician, 35. 72-76. Hunter, W.G., and Kittrell, J.R. (1966). “Evolutionary Operation: A Review,” Technometrics, 8, 389-397. Iberall, A S . (1972). Toward a General Science of Viable Systems, McGraw-Hill, New York, NY. Ishikawa, K. (1982), Guide to Qualify Control, Asian Productivity Organization, Tokyo. Available from UNIPUB, Box 433 Murray Hill Station, New York, NY.
423 Jeffers, J.N.R. (1978). An Introduction to Systems Analysis: with Ecological Applications, University Park Press, Baltimore, MD. Jenkins, M.W., Mocella, M.T., Allen, K.D., and Sawin, H.H. (1986). “The Modeling of Plasma Etching Processes Using Response Surface Methodology,” Solid State Technol., 29, 175-182. Joiner, B.L. (1981). “Lurking Variables: Some Examples,” American Statistician, 35, 227-233. Joiner, B.L. (1985). “The Key Role of Statisticians in the Transformation of North American Industry,” American Statistician, 39(3), 224-227. Juran, J.M. (1988a), Juran on Planning for Quality, The Free Press, New York, NY. Juran, J.M. (1988b). Editor-in-Chief, Juran’s Quality Control Handbook, 4th ed.,McGraw-Hill, New York, NY. Kackar, R.N. (1985), “Off-Line Quality Control, Parameter Design, and the Taguchi Method,” J. Qual. Technol., 17, 176209. Kackar, R.N., Lagergren, E.S., and Filliben, J.J. (1991), “Taguchi’s Fixed-Element Arrays are Fractional Factorials,” J. Qual. Technol. 23, 107-116. Kandel, A. (1986). Fuzzy Mathematical Techniques with Applications, Addison-Wesley, Reading, MA. Kanemasu, H. (1979). “A Statistical Approach to Efficient Use and Analysis of Simulation Models,” Bull. Inter. Statist. Inst., 48, 573-604. Kaufmann, A., and Gupta, M.M. (1985), Introduction to Fuzzy Arithmetic: Theory and Applications, Van Nostrand Reinhold, New York, NY. Kempthome. 0. (1952). The Design and Analysis of Experiments, Wiley, New York, NY. Kempthorne, 0. (1980). “The T e r n Design Matrix,” American Statistician, 34, 249. Khuri, A.I., and Comell, J.A. (1987). Response Su$aces: Designs and Analyses, Dekker, New York, NY. Kilian, C.S. (1988), The World of W. Edwards Deming, Continuing Engineering Education, The George Washington University, Washington, DC. King, P.G., and Deming, S.N. (1974). “UNIPLEX: Single-Factor Optimization of Response in the Presence of Error,” Anal. Chem., 46, 14761481. King, P.G., Deming, S.N., and Morgan, S.L. (1975). 
“Difficulties in the Application of Simplex Optimization to Analytical Chemistry.” Anal. Lett., 8, 369-376. Kong, R.C., Sachok, B., and Deming, S.N. (1980). “Combined Effects of pH and Surface-Active-Ion Concentration in Reversed-Phase Liquid Chromatography,” J. Chromutogr., 199, 307-316. Kopas, D.A., and McAllister, P.R. (1992). “Process Improvement Exercises for the Chemical Industry,” American Statistician, 46, 34-41. Koshal, R.S. (1933), “Application of the Method of Maximum Likelihood to the Improvement of Curves Fitted by the Method of Moments,” J. Roy. Stutist. SOC.,A96, 303-313. Kowalski, B.R., Ed. (1977). Chemometrics: Theory and Application, ACS Symposium Series 52, American Chemical Society, Washington, DC. Lacy, M.E.(1986). “Systems Theory as a Conceptual and Organizational Framework for Computational and Inferential Chemistry,” J. Chem. Educ., 63(5), 392-396. Laszlo, E. (1972). Introduction to Systems Philosophy, Gordon and Breach, New York, NY. Lave, L.B. (1981). “Conflicting Objectives in Regulating the Automobile,” Science, 212, 893-899. Lewis, C.I. (1956). Mind and the World-Order: Outline of a Theory of Knowledge, Dover, New York. NY. Lightman, A,. and Gingerich, 0. (1991), “When Do Anomalies Begin?” Science, 255, 6904595. Lund, R.E. (1982a), “Plans for Blocking and Fractions of Nested Cube Designs,” Cornmun. Sfafisf.-Theor. Meth.. 11, 2287-2296. Lund, R.E. (1982b). “Description and Evaluation of a Nested Cube Experimental Design,” Commun. Statist.Theor. Meth., 11, 2297-2313. Malinowski, E.R. (1991), Factor Analysis in Chemistry, 2nd ed., Wiley, New York, NY. Mallows, C.L., Ed. (1987), Design, Data and Analysis: By Some Friends of Cuthbert Daniel, Wiley, New York,
NY. Mandel, J. (1964). The Statistical Analysis of Experimental Data, Wiley, New York, NY. Mann, N.R. (1985). The Keys to Excellence: The Story of the Deming Philosophy, Prestwick Books, Los Angeles, CA. Marquardt, D.W. (1973), “Strategy of Experimentation.” Dufont Innovation. 5, 1-5.
424 Marquardt, D.W. (1979), “Statistical Consulting in Industry,” American Statistician, 33, 102-107. Marquardt, D.W. (1984). “New Technical and Educational Directions for Managing Product Quality,” American Statistician, 38, 8-14. Massart, D.L., and Kaufman, L. (1983). The Interpretation of Analytical Chemical Data by the Use of Cluster Analysis, Wiley, New York, NY. Massart, D.L., Vandeginste, B.G.M., Deming, S.N., Michotte, Y., and Kaufman, L. (1988), Chemometrics, a Textbook, Elsevier Science Publishers, Amsterdam. Matson, J.V. (1991). How to Fail Successfully, Dynamo Publishing Co., Houston, TX. Meeker, W.Q., Jr., Hahn, G.J., and Feder, P.I. (1975), “A Computer Program for Evaluating and Comparing Experimental Designs and Some Applications,” American Sraristician, 29, 60-64. Mendenhall, W. (1968). Introduction to Linear Models and the Design and Analysis of Experiments, Wadsworth, Belmont, CA. Meyer, S.L. (1975). Data Analysis for Scientists and Engineers, Wiley. New York, NY. Miah, M.J., and Moore, J.M. (1988). “Parameter Design in Chemometry,” Chemom. Intel. Lab. Sys., 3. 31-37. Miller, L.M. (1984). American Spirit: Visions of a New Corporate Culfure, Morrow, New York, NY. Miller, J.C., and Miller, J.N. (1988). Statistics for Analytical Chemistry, 2nd ed., Ellis Honvood Limited, Chichester. Minkkinen, P. (1986). “Monitoring the Precision of Routine Analyses by Using Duplicate Determinations,” Anal. Chim. Acta, 191, 369-376. Minkkinen, P. (1987). “Evaluation of the Fundamental Sampling Error in the Sampling of Particulate Solids,’’ Anal. Chim. Acta, 196. 231-245. Minkkinen, P. (1989), “SAMPLEX - A Computer Program for Solving Sampling Problems,” Chemom. Intel. Lab. Sys.,7, 189-194. Minkkinen, P. (1990), “Analysis of Collaborative Experiments,” Euroanalysis VII, August 26-3 1. Montgomery, D.C. (1984), Design and Analysis of Experiments, 2nd ed., Wiley, New York, NY. Moore, D.S. (1979). Statistics: Concepts and Controversies, Freeman, San Francisco, CA. 
Morgan, E. (1991), Chemometrics: Experimental Design, Wiley, Chichester. Morgan, E., Burton, K.W., and Church, P.A. (1989). “Practical Exploratory Experimental Designs,” Chemom. Intel. Lab. Sys., 5, 283-302. Morgan, S.L., and Deming, S.N. (1974). “Simplex Optimization of Analytical Chemical Methods,” Anal. Chem., 46, 1170-1181. Morgan, S.L., and Deming, S.N. (1975). “Optimization Strategies for the Development of Gas-Liquid Chromatographic Methods,” J. Chromatogr., 112, 267-285. Morgan, S.L., and Deming, S.N. (1976). “Experimental Optimization of Chromatographic Systems,” Sep. Purij Methods, 5, 333-360. Morgan, S.L., and Jacques, C.A. (1978). “Response Surface Evaluation and Optimization in Gas Chromatography,” J. Chromatogr. Sci., 16, 500-505. Mosbacher, C.J. (1977), ResearchDevelopment, (January) p. 23. Mulholland, M., Naish, P.J., Stout, D.R., and Waterhouse, J. (1989). “Experimental Design for the Ruggedness Testing of High-performance Liquid Chromatography Methodology,” Chemom. Intel. Lab. Sys.,5,263-270. Mulholland, M., and Waterhouse, J. (1988). “Investigation of the Limitations of Saturated Factorial Experimental Designs, with Confounding Effects for an HPLC Ruggedness Test,” Chromatographia, 25(9), 769-774. Myers, R.H., Vining, G.G.. Giovannitti-Jensen, A,, and Myers, S.L. (1992). “Variance Dispersion Properties of Second-Order Response Surface Designs”. J. Quol. Technol., 24, pp. 1-1 1. Natrella, M.G. (1963). Experimental Statistics, National Bureau of Standards Handbook 9 1, Washington, DC. Neave. H.R. (1990), The Deming Dimension, SPC Press, Knoxville, TN. Negoita, C.V. (1985), Expert Systems and Fuzzy Systems, BenjamidCummings, Menlo Park, CA. Nelder, J.A. (1966). ‘‘Inverse Polynomials, A Useful Group of Multi-factor Response Functions,” Biornetrics, 22, 128-141.